When you want to know what is happening inside of your application, the first place most people turn are the log files. If there’s a problem, one of the first things an experienced developer will likely ask is, “Are there any errors in the log files?” Effective application logging can be the key to solving application problems quickly and easily. When used poorly, they can be nothing but a bunch of noise, ignored by all, serving only to take up disk space.
Why do we use log files? Initially, writing information to disk was the only way to debug an application. Once debuggers came along, log files were still an effective tool for debugging. This is especially true if you have multiple iterations of a loop to go through or need to display something in a human friendly way.
There’s another important reason for having log files, even in this day and age of fancy debuggers. What do you think would happen if you asked to have a debugger port opened on your production system? Most places I’ve been would have responded with an emphatic “No!”. Debugging a production system is usually a really bad idea as it can affect the stability and performance of the system and potentially impact the user experience.
In the case of software you deliver, connecting a debugger isn’t even an option. You must rely on the customer to deliver the log files from the application in order to troubleshoot the problem.
There are exceptions. Once, when a critical production system crashed and refused to come up, the only way I was able to figure out what was going on was to connect the debugger and trace through the code to find the problem. (It was some corrupt data in the database) These rare occasions aside, log files are usually one of the main ways of getting your applications to talk.
If log files are so important, it makes sense to carefully consider what information to include in your application log files. Just as important, you should consider what not to include. Writing a bunch of fluff can make that needle in the haystack of log files much harder to find.
Over the years, I’ve had lots of experience generating log files and combing through even more. In this time, I’ve come to understand that there are four main characteristics of an application that uses logging effectively.
1. They include relevant information
Which of the following log messages do you find more useful?
1 | There was an error processing the billing transaction |
or
1 | Unable to processing billing transaction 83720 for user 274373 due to IllegalArgumentException |
I’m hoping you find the second message more useful. The first message tells you that there was an error, but says nothing about the cause or who it affected. The second message gives us some identifiers we can use to go find the relevant data in our database. The second error is also likely to be followed by a stack trace, which would tell us what code was being executed at the time of the error.
The key is to include enough information to give you a trail to follow, allowing you to go back in time and figure out what was going on and why things failed. You may not get this right initially, but over time, with experience and practice, you will get better at predicting what would be helpful to include in the log message. As with any predictions, they won’t be 100% accurate. Sometimes you’ll need to go back and add the information that would have been helpful, once you identify an issue. Some things typically included in a useful log message:
- The primary key of the objects in question. This could be a transaction id, user id, or other identifier that tells you what data was involved.
- What was going on at the time of the error. What was the business use case being performed? File upload? Invoice creation? A billing batch job?
- The error that occurred. (duh)
So far, I’ve mainly talked about logging errors. However, it’s also useful to write out information that isn’t an error, but might be useful to know when errors do occur. This might include things like:
- The full path of configuration files that were used to initialize the system
- Attempts to open a persistent connection to a server (e.g. the database)
- Scheduled job start and stop times.
2. They show restraint when logging exceptions
For many developers, including myself, catching an exception usually means logging it, in all it’s glory, with a full stack trace. After all, it’s an exceptional condition, someone will want to know as much about the code execution path as possible, right?
Maybe.
In some extreme examples, I’ve seen applications catch an exception, log the error along with the stack trace, wrap the exception and rethrow it. The next layer up catches the exception, does it’s own logging, wrapping and throwing. The layer above that catches the exception…you get the idea. A single error ends up in the log file multiple times in a row. When you’re scanning the logs, it’s hard to know at a glance if you had three separate errors or a single error logged three times.
Generally speaking, log exceptions at the highest layer in the application you can. That is, let the layer that is able to do something with the exception (other than rethrow it) log the error. Often, this means that something close to the user interface (UI) layer (Controller, Servlet, etc) should be doing the logging.
Pay special attention to exceptions that are part of the normal application flow. Yes, i know, using exceptions to do your branching logic is a bad idea, but sometimes you don’t have a choice (third party libraries, for example). In this case, does it really make sense to log the exception? If you feel you absolutely must log something, think long and hard about whether the entire stack trace is necessary. A one line message is probably enough to indicate what’s going on.
3. They do not include sensitive information
While you should strive to include as much useful information as possible, log messages should not include sensitive information, such as passwords. This may seem obvious, but I’ve seen database connection error messages in log files containing the username and password! Fortunately, in my case, it was the incorrect password, copied over from a staging environment.
While passwords are an obvious piece of information to exclude, there are other bits of data that may not be so obvious. In this day an age, privacy laws are getting stronger and stricter all the time. These laws determine what constitutes private information, who can see it and how or if it can be distributed. Some things that may not belong in log files:
- Street addresses
- Phone numbers
- Email addresses
- Usernames
Information like a street address, in isolation may be fine. For example, a geocoding service may want to log the address that caused an error. When connected with a specific person, however, it becomes more problematic, especially when collected in bulk. When in doubt, leave them out or perhaps only display the user’s id. Someone with the proper credentials should be able to provide the other information, should it be needed to troubleshoot a problem.
4. They use appropriate categorization and severity levels
Most logging systems (at least, those I’m familiar with in the Java world) provide a way to categorize messages and assign each one an appropriate severity level. This allows the messages to be filtered, reducing the volume of messages until a problem arises and more detail is needed. While the number of available levels may vary, depending on the logging system you use, I’ve found that four levels are usually enough:
- ERROR – “Mayday! Something really bad happened and I don’t know what to do!”
- WARN – “Hey, um, something bad happened. Don’t worry, I was able to work around the problem, but I wanted to let someone know”
- INFO – “Just thought you’d like to know that I’m working on this thing here and I’m about to process this user’s purchase”
- DEBUG – “Really? You want me to tell you what I’m doing every step of the way? Well, ok, but it’s going to be a lot of detail. Are you debugging a problem?”
This is one I’ve seen used badly over the years. In one example, a REST client library was logging the full contents of each HTTP request (URL, HTTP headers, request parameters, etc.) as well as the corresponding response at INFO level, the default log level for our application. In production, all this really did was clutter up the logs and fill up the disk. Instead of removing the logging entirely, we changed the severity to DEBUG. This reduced the volume of logging, but still allowed us the ability to get more information to diagnose any problems that might come up.
On a related note, it is extremely helpful if the application has some way of altering the log filtering rules at runtime. This could be a HTTP endpoint, a JMX operation or simply reloading the logging configuration file when it changes. Unless you’re building small, interrelated applications (that is, microservices), rebooting your application to pick up a logging configuration change can be an expensive and disruptive proposition. In addition, restarting the application wipes out the state that may be part of the problem.
Log files are a valuable source of information when you need to understand how your application is working, or not working. Getting the right information into the log files is something of an art form. You can never guess with certainty what you might need, so you do your best, iterate, and learn from past troubleshooting. Follow these strategies and your will improve the effectiveness of your application’s logging.
Question: What do you think are the most important things to include in an application’s log files?