Have you ever received results from your analyst and known, immediately, that the information they handed you was wrong? I’m sure one of the first questions you asked was, “Did you check this result?” You knew they didn’t check the results because they were completely implausible. You are not alone, my friend. This scenario plays out over and over, every day, across industries worldwide, because there is a dirty little secret about analytic validation that no one wants to talk about. In this post, I’m pulling back the curtain on how your analysts are checking their work, and why they are probably making mistakes.
What is Analytic Validation?
Analytic validation is the process of confirming that your results provide an accurate answer to the question you asked. Many organizations assume analysts will perform validation, and only question the result when something appears to be obviously wrong. If your analysts are doing work that impacts clients or the public directly, errors could lead to embarrassment and a loss of credibility. Worse yet, you could lose future work, or incur legal liabilities for errors and omissions. Given the potential costs of improperly validated analyses, it’s surprising how many organizations don’t have specific policies or procedures in place as part of their analytic governance to cover such a critical function.
6 Methods of Analytic Validation
Analytic validation covers a continuum of techniques that range in levels of complexity and cost. At the simple end of the continuum, you rely on your analysts to validate their own work and catch their own mistakes. At the complex end of the continuum, you have multiple analysts independently perform the work and reconcile their results. Knowing that costs are a considerable factor in the choice of analytic validation methods, I use a specific process for selecting a validation method. I start at the most accurate – and complex – end of the spectrum and work back to a method that balances feasibility, cost effectiveness, and risk aversion.
Discussing the approach to analytic validation is an important part of the discussion you should have with your analysts before starting an analysis. Here is a list of different approaches to analytic validation, and the pros and cons of each approach:
Code-to-Code Validation on Independent Approaches
This approach uses parallel analyses. Two analysts independently tackle the problem, develop independent solutions, and then compare results when they are finished. If there are differences in their results, then the analysts discuss their approaches and underlying assumptions. They adjust their analyses until they have reconciled their differences and agree on the results.
Pros:
- The approach most likely to produce accurate results because multiple analysts agree on what the correct answer is.
- Development of multiple independent results reduces the risk of an undetected logical or programming error skewing the results.
- Requires analysts to become very specific about their approaches and underlying assumptions.
Cons:
- The most expensive approach because independent analysts are performing duplicative work to ensure accuracy.
- The approach is time-consuming. Because the analysts write their code independently, the reconciliation and reanalysis process takes time to complete and may require multiple iterations of comparison and adjustment to bring the results into agreement.
- Approach does not guarantee that errors cannot occur. I have seen competent analysts independently make logical or programming errors and converge on the same solution. What they agreed was correct turned out to be false. The process makes this outcome very unlikely, but it can still happen.
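The reconciliation step in any code-to-code approach can itself be automated. As a minimal sketch, suppose each analyst summarizes their run as a dictionary of named metrics (the metric names and values below are hypothetical); comparing the two within a numeric tolerance surfaces exactly which figures need to be reconciled:

```python
import math

def compare_results(results_a, results_b, rel_tol=1e-6):
    """Return (metric, value_a, value_b) tuples where the analysts disagree."""
    discrepancies = []
    for metric in sorted(set(results_a) | set(results_b)):
        a = results_a.get(metric)
        b = results_b.get(metric)
        # A metric missing from one side, or differing beyond tolerance, is flagged.
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            discrepancies.append((metric, a, b))
    return discrepancies

# Hypothetical summary metrics from two independent analyses
analyst_a = {"mean_revenue": 1042.50, "member_count": 8713.0}
analyst_b = {"mean_revenue": 1042.50, "member_count": 8719.0}

for metric, a, b in compare_results(analyst_a, analyst_b):
    print(f"Reconcile {metric}: analyst A = {a}, analyst B = {b}")
```

A tolerance-based comparison like this keeps the discussion focused on genuine disagreements rather than floating-point noise; the appropriate tolerance depends on the analysis.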
Code-to-Code Validation on A Common Approach
This approach helps to shortcut the process of code-to-code validation by having two analysts collaboratively develop an approach, rather than developing independent approaches. Once there is agreement on the approach, the analysts independently write code to perform the analysis. After completing the analysis, the analysts follow the same process of comparing results and adjusting their code until they agree on the correct result.
Pros:
- Takes less time than code-to-code validation with independent approaches. By developing an approach to the analysis they agree upon first, many potential differences can be resolved at the planning stage.
- Development of multiple results based on independent code reduces the risk of an undetected logical or programming error skewing the results.
Cons:
- The reconciliation process may still take time depending on the complexity of the analytic problem.
- Approach does not guarantee that errors cannot occur. Because the analysts agree upon a common approach at the beginning, a logical flaw in the approach can lead to consistent, yet inaccurate results.
Code-to-Code Validation on A Common Analytic File
This approach requires two analysts to collaborate in extracting, cleaning, and transforming the data used for the final analysis. The work proceeds independently in small stages, with comparison at each stage to ensure both analysts develop the same analytic file for the final analysis. Once the analysts agree on the structure and contents of the analytic file, the final analysis proceeds independently, with reconciliation as in the other code-to-code approaches to validation.
Pros:
- The approach takes less time than other code-to-code approaches for large-scale analytic projects because the analysts are incrementally building a final analytic file they agree is accurate. For smaller analytic problems, however, the time-savings may be minimal.
- Development of multiple results based on independent code reduces the risk of an undetected logical or programming error skewing the results.
Cons:
- The reconciliation process may still take time depending on the complexity of the analytic problem.
- Approach does not guarantee that errors cannot occur. Because the analysts collaborate to create an analytic file at the beginning, a logical flaw in the approach can lead to consistent, yet inaccurate results.
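Confirming that both analysts produced the same analytic file at each stage can be done mechanically. One simple sketch, using only a file hash, checks whether the two files are byte-for-byte identical (the helper name here is my own):

```python
import hashlib

def file_fingerprint(path):
    """Return a SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# If the fingerprints match, the analysts' files are identical:
#   file_fingerprint("analyst_a/claims.csv") == file_fingerprint("analyst_b/claims.csv")
```

Note that byte-identity is a strict standard: files that differ only in row order or formatting will produce different hashes, so a mismatch is a signal to diff the files, not necessarily evidence of an analytic error.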
Code Review on A Common Approach
In this approach, two analysts develop an analytic plan collaboratively. Then one analyst writes all the code and performs the analysis. The second analyst finishes the process by reviewing the code to confirm it works as intended and reviewing the results to confirm they appear plausible and correct.
Pros:
- Requires less analyst time than code-to-code approaches because only one analyst is writing the code.
Cons:
- Requires a more senior analyst to perform the code review because they must be able to spot flaws in the logic and execution that less experienced analysts may miss.
- More error-prone at the coding stage than code-to-code approaches because only one analyst is writing code. It is harder to catch small programming errors by visually inspecting the code than by comparing two independently generated sets of output.
Visual Validation of Results by Independent Analyst
This approach uses one analyst to develop an approach to the project and write all the code. After producing results, a second analyst reviews the output – not the code – to ensure that the results appear plausible.
Pros:
- Analyses using visual validation are completed faster because there is no reconciliation of results or code review process.
Cons:
- Requires the validating analyst to have very good understanding of the business problem, the data used in the analysis, and what the results should look like.
- A highly error-prone approach. Without a code review, programming or logic errors can easily lead to inaccurate results.
Solo Analytic Work and Self-Validation
This approach places the burden of producing accurate results entirely on a single analyst. There is no second analyst to review the approach, data, code, or results.
Pros:
- Often produces the fastest results because there is no independent review.
Cons:
- The most error-prone approach. It requires the analyst on the project to identify and fix all of their own errors in both logic and execution.
- Not recommended except in cases where only the roughest results with minimal accuracy requirements are needed.
Conclusion
The choice of analytic validation methods is often a tradeoff between the need for accuracy, time frame available for the analysis, and costs. While the most accurate results are obtained most often from code-to-code validation with independent approaches, this is also the most expensive and time-consuming method. You need to gauge your organization’s level of risk aversion and need for accuracy when determining which method to use for any given analysis. You should also consult your organization’s analytic governance policies with respect to any specific direction for validation.
Regardless of the analytic validation method used, I always recommend that analysts save their log files with the analytic results and take the time to review the logs after running an analysis to confirm there were no errors or warnings. While a log file without errors or warnings does not guarantee accuracy in the results, it can help confirm that there were no unexpected bugs in the programming code. Additionally, saving the log files allows them to be stored with the rest of the analytic documentation for review later if necessary.
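The log review itself is easy to partially automate. As a minimal sketch, a small script can flag any log lines containing error or warning keywords; the keywords below reflect common SAS-style log conventions and should be adjusted for whatever tools your analysts actually use:

```python
import re

# Hypothetical log-scanning helper: flag lines that indicate problems.
FLAG = re.compile(r"\b(ERROR|WARNING)\b", re.IGNORECASE)

def flagged_lines(log_text):
    """Return the log lines containing an error or warning keyword."""
    return [line for line in log_text.splitlines() if FLAG.search(line)]

# Example with a made-up log excerpt
sample_log = (
    "NOTE: Data set WORK.CLAIMS has 120034 observations.\n"
    "WARNING: Variable AGE has missing values.\n"
    "NOTE: PROCEDURE MEANS used (Total process time): 0.04 seconds.\n"
)
for line in flagged_lines(sample_log):
    print(line)
```

A scan like this does not replace reading the log, but it makes "zero errors or warnings" a quick, repeatable check that can be run on every saved log file.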
By selecting the level of analytic validation best suited for your project parameters, you can approach the results with good insight about the degree of accuracy and where to look for any potential problems if they come up. For additional information on how to ensure your results are rock solid, see my post on the seven (7) things to ask your analyst when they bring you results.