One beautiful aspect of analytics is your ability to rerun the same analysis over and over again. Once you write the code to do something, you can easily execute the same analysis on different data. Ideally, your analytic team would develop their code so that repeating an analysis with new data would require minimal updating. In reality, your analytic team will not do this the first time around. In this post, I discuss the process and best practices for developing and nurturing mature analytic code.
The Process of Creating Code from Scratch
Analysts in different organizations will follow slightly different processes for developing code from scratch. At a high level, however, the main components of the process are likely to be very similar. The last thing you want is for your analysts to dive immediately into writing code – a process we call thrashing. Instead, a typical process will include the following components:
- Outlining a problem solution
- Mapping out the analytic workflow
- Developing code
Outlining A Problem Solution
I will begin this discussion assuming that we have a well-defined research question, and data sources. You can learn more about this process in my post on Asking Better Questions.
The analysts should begin by working with the project team to identify the necessary characteristics of a successful solution to the research question. They can develop a checklist of information the analysis will need to provide.
With a checklist in hand, the analysts should turn next to the types of analytic methods that will be needed to generate the results. They will need to consider the metrics to be calculated, and the descriptive or inferential statistics required.
By the end of this process, the project team and analysts will have a solid outline of a problem solution. One litmus test to gauge success at this point is to ask yourself the following question: Would another team have a chance to complete this project if we gave them the research question and this outline?
Mapping Out the Analytic Workflow
After outlining the problem solution, the analysts should still not jump straight into writing code. Instead, experienced analysts will map out the analytic workflow.
You want your analysts to literally sketch out the logical flow of the code they will develop for the project. They should include components in the map for the following steps (the order of steps will depend on your project):
- Importing data
- Cleaning data
- Transforming data
- Decision logic
- Analytic processing
- Output generation
The analysts’ goal in mapping the workflow is to think about all the pieces of code that need to be written. During the process, your analysts will come to a deeper understanding of the code requirements, and the order of execution. Ultimately, your analysts will have a better coding experience by starting with a good workflow map.
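A workflow map can also be captured as a code skeleton before any real logic is written. The sketch below is one hypothetical way to do that in Python, with one small function per mapped step. The function names, the sample records, and the minimum-sales rule are all assumptions made for illustration.

```python
"""Skeleton of an analytic workflow: one small function per mapped step.

All function names, sample records, and the threshold rule are
hypothetical placeholders, not taken from a real project.
"""
from statistics import mean


def import_data():
    # Placeholder for reading from a file or database.
    return [
        {"region": "east", "sales": "120"},
        {"region": "west", "sales": ""},
        {"region": "east", "sales": "80"},
    ]


def clean_data(rows):
    # Drop records with a missing sales value.
    return [r for r in rows if r["sales"] != ""]


def transform_data(rows):
    # Convert sales from text to a number.
    return [{**r, "sales": float(r["sales"])} for r in rows]


def apply_decision_logic(rows, min_sales):
    # Keep only records at or above the assumed minimum sales threshold.
    return [r for r in rows if r["sales"] >= min_sales]


def run_analysis(rows):
    # Compute simple descriptive statistics.
    return {"n": len(rows), "mean_sales": mean(r["sales"] for r in rows)}


def generate_output(results):
    # Format the results as a one-line text report.
    return f"n={results['n']}, mean sales={results['mean_sales']:.1f}"


def run_workflow(min_sales=0):
    # Execute the steps in the order laid out on the workflow map.
    rows = import_data()
    rows = clean_data(rows)
    rows = transform_data(rows)
    rows = apply_decision_logic(rows, min_sales)
    return generate_output(run_analysis(rows))


if __name__ == "__main__":
    print(run_workflow(min_sales=50))
```

Each stub can be filled in and tested on its own, and the order of calls in `run_workflow` mirrors the order of steps on the map.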
Developing Code
If your analysts follow this process, they are finally ready to start developing code. At this point, they should have an excellent idea of what needs to be done. The analysts should know if there are any processes or procedures they need to research or learn. They should also be familiar with each of the components that need to be written.
Even with thorough preparation by your analysts, code is rarely perfect when first written. The analysts will need to debug and edit the code to complete the initial analysis. Furthermore, the analysts’ first shot at developing code is often targeted at having code that works. This is different from having mature code. Still, all the work your team put into planning the analysis will pay off by streamlining the actual code development.
Why Initial Code Isn’t Mature Code
The first time your analysts write code for a project, it will not be mature code. Instead, the analysts’ focus is on developing code that will accurately and efficiently execute the analysis. While the initial code works, if you needed to transition the analysis to a new data set, or use different decision logic, the analysts would need to make substantial revisions to the code.
In contrast, mature code is code that your team has used repeatedly. Your analysts refined the code to include a high degree of automation. They have included parameters for user input to guide the decision logic and easily adjust the analysis. In the next section, I go deeper into four key developments that transition initial code into mature code.
Key Characteristics of Mature Code
If your analytic project is not a one-off or custom analysis, then you will likely use the code again for the same analysis. For example, perhaps you are developing a customer relationship dashboard for your sales team. Or you might be creating a reporting service for a proprietary survey you field on behalf of clients. In these cases, your analysts will need to use their code repeatedly to execute the analysis.
After the initial code development and validation, the analysts will want to nurture their code to a more mature state. Here are four (4) ways in which your analysts can nurture a mature platform for analytic code.
Modularizing Code
No matter how your analysts learned to code, young analysts tend to write in blobs. A blob is what you get when an analyst puts all the code for a project into a single programming file. I did this myself when I was young.
My worst offense was writing a 10,000 line program to execute all of the statistical analysis for a project. While I was able to replicate the entire analytic journey from start to finish, the final program was a mess. It was difficult to navigate, and you wouldn’t want to run the code all at once.
Instead, your analysts will want to write modularized code. The idea is for the analyst to write small programs with very specific purposes instead of one large program. You can keep track of and edit smaller programs more easily than big ones. And nearly every analytic programming language today supports object-oriented programming, or OOP.
In an OOP framework, each piece of code can independently create and work with objects. By working with objects, analysts can write code that refers to objects created by other code. This provides a very flexible way to modularize code and link all the pieces together.
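To make this concrete, here is a minimal Python sketch in which one piece of code creates an object and a second piece works with it. The class name, the function names, the file names in the comments, and the 1 to 5 score range are all hypothetical. In a real project, the two parts would live in separate files.

```python
from dataclasses import dataclass
from statistics import mean


# --- would live in its own file, e.g. cleaning.py (hypothetical name) ---
@dataclass
class CleanedSurvey:
    """Object produced by the cleaning code and consumed by later code."""
    scores: list
    dropped: int


def clean_scores(raw_scores):
    """Keep scores in the assumed valid range of 1 to 5; count the rest."""
    valid = [s for s in raw_scores if s is not None and 1 <= s <= 5]
    return CleanedSurvey(scores=valid, dropped=len(raw_scores) - len(valid))


# --- would live in its own file, e.g. summary.py (hypothetical name) ---
def summarize(survey):
    """Work with the object created by the cleaning code."""
    return {
        "n": len(survey.scores),
        "mean": mean(survey.scores),
        "dropped": survey.dropped,
    }


if __name__ == "__main__":
    cleaned = clean_scores([4, 5, None, 9, 3])
    print(summarize(cleaned))
```

The summary code never needs to know how the cleaning was done. It only relies on the `CleanedSurvey` object, so either piece can be edited without touching the other.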
Adding Automation
Taking the concept of modular code one step further, analysts will want to add automation functions to their mature code. With experience, analysts will eventually begin writing all their code in a modularized manner. For initial code, however, the analysts often run each piece independently to ensure the entire process works properly. Once proof-of-concept is established, having an analyst spend the time to babysit the entire process becomes a waste of resources. Adding automation allows analysts to transition code into a process they set up and leave to run on its own.
Programmers have developed lots of methods for automating code. One of my preferred methods is to write a wrapper program that controls the workflow of the entire process. The wrapper program collects the user inputs for decision-making and calls the code modules to execute the analysis. This program is essentially the brain of the entire process, following the instructions of the analyst.
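A wrapper of this kind might look like the following Python sketch. It is only an illustration under stated assumptions: the `SETTINGS` dictionary stands in for the user inputs, and the three small functions stand in for separately written code modules. None of these names come from a real project.

```python
# User inputs collected in one place (hypothetical settings).
SETTINGS = {
    "minimum": 20,          # decision logic: smallest value to keep
    "label": "Q1 report",   # text used in the output
}


def load(source):
    # Stand-in for a data-import module.
    return list(source)


def filter_rows(values, minimum):
    # Stand-in for a decision-logic module.
    return [v for v in values if v >= minimum]


def report(values, label):
    # Stand-in for an output-generation module.
    return f"{label}: {len(values)} records, total {sum(values)}"


def run_pipeline(source, settings):
    """Wrapper: reads the analyst's settings and calls each module in order."""
    values = load(source)
    values = filter_rows(values, settings["minimum"])
    return report(values, settings["label"])


if __name__ == "__main__":
    print(run_pipeline([10, 25, 40], SETTINGS))
```

Changing the analysis then means editing `SETTINGS`, not the modules themselves.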
Clarifying File Path and Naming Conventions
It may seem like a basic issue, but analysts nurture mature code by using conventions for file paths and names. Typically, this means analysts decide where input and output files will be stored during the planning phases. It seems like such a small thing, but believe me, my friend, it makes a difference.
When analysts set file path and naming conventions at the outset, the automation process becomes easier. The programmer can bake those conventions into the code to streamline everything. For example, users can declare variables for input and output file paths in the wrapper program. The rest of the code simply refers to these two variables to get inputs and save outputs. If the analyst needs to change the file paths, they can do so easily in the wrapper.
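As a rough Python sketch of this idea, the code below declares the input and output locations once and has everything else refer to them. The folder layout (`data/input`, `data/output`), the `_summary.txt` naming pattern, and the function names are assumptions chosen for illustration. The demo writes to a temporary directory so it can be run safely.

```python
import tempfile
from pathlib import Path


def build_paths(project_root, run_label):
    """Declare input and output locations once, under an assumed convention."""
    root = Path(project_root)
    output_dir = root / "data" / "output"
    return {
        "input_dir": root / "data" / "input",
        "output_dir": output_dir,
        # Assumed naming convention: <run label>_summary.txt
        "output_file": output_dir / f"{run_label}_summary.txt",
    }


def save_summary(paths, text):
    """Downstream code refers to the declared paths and never hard-codes one."""
    paths["output_dir"].mkdir(parents=True, exist_ok=True)
    paths["output_file"].write_text(text)
    return paths["output_file"]


if __name__ == "__main__":
    # A temporary directory stands in for the real project folder.
    with tempfile.TemporaryDirectory() as tmp:
        paths = build_paths(tmp, "2024Q1")
        written = save_summary(paths, "example output")
        print(written.name)
```

If the storage location changes, only the call to `build_paths` needs to be updated.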
Improving Documentation
Analysts can also nurture mature code by improving their documentation. Every analyst is guilty of not documenting their code when they are new and inexperienced. I am no different from anyone else in this regard. I once wrote code for an entire project without a single line of documentation about what I was doing. I paid dearly for that mistake when I had to edit the code nearly a year later.
I now tell my analysts that documentation is part of the coding process. They need to document their code well enough that another analyst could quickly step in by reading the documentation and code alone.
If your analysts have developed sufficient analytic plans and code maps, then they have a good start on the documentation. See my article on better analytic planning. The analysts should supplement these documents with liberal comments written in the actual programming code. Additionally, it is helpful if your analysts document their thought processes on solutions to any particularly challenging problems they encounter.
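For illustration, here is a short Python sketch of what in-code documentation can look like. The function, its parameters, and the design note are hypothetical. The point is the pattern: a docstring that states purpose, inputs, and outputs, plus a comment that records the reasoning behind a non-obvious choice.

```python
def trimmed_mean(values, trim_fraction=0.25):
    """Return the mean after dropping the most extreme values.

    Purpose:
        Reduce the influence of outliers on a summary statistic.
    Inputs:
        values: a non-empty sequence of numbers.
        trim_fraction: share of observations to drop from EACH end,
            at least 0 and below 0.5.
    Returns:
        The mean of the remaining values, as a float.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be at least 0 and below 0.5")

    ordered = sorted(values)

    # Design note: the count to drop is truncated with int(), not rounded,
    # so that small samples never lose more observations than requested.
    k = int(len(ordered) * trim_fraction)

    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print(trimmed_mean([1, 2, 3, 100], trim_fraction=0.25))
```

An analyst picking this up a year later can see what the function is for and why the truncation was chosen without rerunning any experiments.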
Conclusion
One saying my staff get tired of hearing from me is, “Performing a task once makes you successful, not proficient.” The process of developing mature analytic code is no different. Analysts who develop mature code move beyond simply getting it to work. They consider and address downstream issues to allow their code to be automated and efficient. The next time your analysts develop code from scratch, encourage them to go a few steps further at the end to create a more mature process.