Why Your Analysts Should Always Check Their Log Files

When I started writing this blog, I made a promise not to turn this into a hardcore data science discussion. I’m not going to break that promise, but today I might bend it a little. Stay with me though, my friend, because I want to pull back the covers on a hidden analytic problem that could impact your analyses. This post is about why your analysts should always check their log files, and the show-stopping issues that could happen if they don’t.

What Is a Log File?

Your statistical analysis software generates a log file as a record of the actions performed when executing the analysis. The log file is typically a text file. In addition to recording commands, the log file also includes any errors or warnings encountered along the way. The log files from some software will also contain text-based results from analyses.

In the interest of full transparency, I have to say that the log file is not the ONLY place where errors, warnings, and results are shown. Virtually every statistical software program in existence has an interactive window showing this information as the analysis is being executed.

There are, however, two problems with the interactive window. First, modern computers process data so quickly it’s impossible for a human to read the information as it scrolls by. Second, the contents of the interactive window are not saved, unless the analyst explicitly asks for it.

While the interactive window offers another avenue for reviewing results, the log file provides an efficient way to save the results of an analysis for easy review any time afterward. Have you ever needed to review a project 12 months after completion? If so, then you know it’s difficult to remember exactly what happened. In those circumstances, the log file is your best friend.

Why Is the Log File Important?

The errors and warnings contained in log files provide analysts with information about whether their code encountered any problems. For example, an attempt to calculate an average for a text-based field like a product name produces an error. The error indicates the product name field contains the wrong kind of data (i.e., text) to calculate an average. When an error is encountered, statistical software programs will generally stop running without producing further results or output. The analysts then know exactly where to begin fixing the error.
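To make that concrete, here is a minimal sketch in Python of what "error, then stop" looks like. It is only an illustration, not how any particular statistical package works. The function names, the logger name, and the field names (unit_price, product_name) are all made up for the example, and an in-memory buffer stands in for a log file on disk.

```python
import io
import logging


def build_logger(stream):
    """Return a logger that writes 'LEVEL: message' lines to the given stream."""
    logger = logging.getLogger("analysis_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def average(values, field_name, logger):
    """Average a field; log an error and stop if the data is not numeric."""
    try:
        result = sum(values) / len(values)
    except TypeError:
        logger.error("cannot average '%s': field contains non-numeric data", field_name)
        raise  # stop here instead of producing further results
    logger.info("average of '%s' computed over %d rows", field_name, len(values))
    return result


log = io.StringIO()  # stands in for a log file on disk
logger = build_logger(log)

average([10.0, 20.0, 30.0], "unit_price", logger)  # numeric field: works
try:
    average(["Widget", "Gadget"], "product_name", logger)  # text field: fails
except TypeError:
    pass  # the run halts, and the reason is now recorded in the log

print(log.getvalue())
```

The second call never returns a number. The only trace of what went wrong is the ERROR line in the log, which is exactly where the analyst should look first.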

The log file also includes warnings about potential problems with the analysis. For example, if the code creates a new data table that has no observations, a warning will be issued. Warnings are considered less critical than errors and often will not stop a program in its tracks.

In our data table example, no observations in the new table might be expected. Perhaps the analyst created a table shell before adding data to it. Alternatively, there could be a logical problem with a query to join two tables that results in an empty table. Most statistical software isn’t smart enough to know the difference between those two situations. Therefore, the program issues a warning to let analysts know there might be something wrong.
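For the curious, here is a rough Python sketch of the second situation. It assumes two tiny made-up tables where the customer ID is stored as a number in one and as text in the other, so the join silently matches nothing. The inner_join helper and the logger name are hypothetical, written just for this example.

```python
import io
import logging


def inner_join(left, right, key, logger):
    """Join two lists of row dicts on `key`; warn if the result is empty."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    joined = [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right_by_key.get(l_row[key], [])
    ]
    if not joined:
        # The code cannot tell an intentional empty shell from a broken join,
        # so it warns and keeps going rather than stopping the run.
        logger.warning("join on '%s' produced a table with 0 observations", key)
    return joined


log = io.StringIO()  # stands in for a log file on disk
logger = logging.getLogger("join_sketch")  # hypothetical logger name
logger.handlers.clear()
logger.propagate = False
logger.addHandler(logging.StreamHandler(log))

customers = [{"customer_id": 101, "name": "Ana"}]
orders = [{"customer_id": "101", "total": 25.0}]  # ID stored as text, not a number

result = inner_join(customers, orders, "customer_id", logger)
print(len(result), "rows;", log.getvalue().strip())
```

Nothing crashes here. The program finishes and hands back an empty table, and the warning in the log is the only hint that the join went wrong.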

Without the log file, your analysts would not know if their code ran successfully. Additionally, without the results from some analytic tests, your analysts would not know whether the assumptions of their analytic methods had been met.

Log Files Are Not Foolproof

Now that I’ve explained the importance of log files, I need to tell you about one BIG caveat with these tools.

Statistical software programs don’t check everything when you run an analysis.

Yep, you read that correctly. Log files don’t provide you with tests of every assumption required for an analysis to be valid. The analyst needs to explicitly request tests of some assumptions. Even when the analyst requests these additional tests, many statistical software programs will still produce results when assumptions are violated. This is because some statistical methods are still reasonably accurate even when the underlying assumptions are not strictly true.

Additionally, there may be reasons why an analyst wants to see the results from an analysis they know is faulty. The analyst may want to compare results based on faulty assumptions to adjusted analyses to improve the validity of results. Even if the analyst cannot massage the data to meet all of the assumptions, the result may be valid enough.

In the end, the analyst is responsible for verifying the underlying assumptions have been met. The log file is a tool to help your analysts do that, but the responsibility remains with them.

The Two Reasons to Always Check the Log File

This all boils down to two important reasons for your analysts to always check their log files. First, they need to verify the analysis did not encounter any critical errors or warnings that need to be corrected. Second, the analysts need to confirm that they have tested all of the necessary underlying assumptions of the analysis.

If your analysts don’t review log files after every analysis, then it’s likely there are problems lurking in their code. Analysts are people, and even the best will make mistakes from time to time. They may have typos in the code, they may forget to change the name of a variable in one section of code after editing another section, or they may simply overlook a test for an assumption that needs to be confirmed. It happens to everyone.

Fortunately, you can prevent many of these challenges by requiring your analysts to always check their log files. If they couple routine checks of the logs with a good analytic validation plan, then many potential problems can be caught and corrected.
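Part of that routine check can be automated. The Python sketch below is one possible approach, assuming the log marks problem lines with a leading ERROR or WARNING. Log formats differ between software packages, so treat the pattern as an assumption to adjust. The scan_log function and the sample log lines are invented for illustration.

```python
import re

# Assumed convention: problem lines start with ERROR or WARNING.
PROBLEM_LINE = re.compile(r"^\s*(ERROR|WARNING)\b", re.IGNORECASE)


def scan_log(lines):
    """Collect (line number, text) pairs for every error and warning line."""
    findings = {"ERROR": [], "WARNING": []}
    for number, line in enumerate(lines, start=1):
        match = PROBLEM_LINE.match(line)
        if match:
            findings[match.group(1).upper()].append((number, line.rstrip()))
    return findings


# Made-up log lines for illustration; a real run would read the actual
# log file line by line, e.g. scan_log(open(path_to_log)).
sample_log = [
    "NOTE: 120 rows read from sales.",
    "WARNING: table joined has 0 observations.",
    "NOTE: step finished.",
    "ERROR: variable product_name is not numeric.",
]

for level, hits in scan_log(sample_log).items():
    for number, text in hits:
        print(f"{level} at line {number}: {text}")
```

A scan like this only finds what the software chose to report. It cannot tell you whether an assumption test was left out, so it supports the analyst’s review of the log rather than replacing it.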

One Last Caveat

No software package knows what you are trying to do. The log file can’t tell you if the choice of analytic method is the best choice or even makes sense. That is why you and your analysts need to decide on the analytic strategy at the outset. You can read more about that in my posts on 5 Things You Need to Tell Your Analysts Before They Begin Working on Your Analysis, and Transform your Team Culture to Data-Driven Decision-Making.