Help…My Data Aren’t Useful for Analytics!


I often hear concerns from professionals that their in-house data aren’t useful for analytics. While they want to use analytics in their organizations, they’re concerned that it’ll be a waste of time.

If you read articles online about the importance of data quality, you’ll quickly see why they’re concerned. Most people writing about it say that if your data aren’t high-quality, you should stop and fix them first.

I’ll echo the sentiment that bad data is something you should avoid. However, I have a different take on data quality and analytics.

Simply put, I believe that most data can provide at least some valuable insights. Additionally, my experience has shown me that improving data quality is a continual process, not a one-time event.

Two Major Camps of Data Quality Concern

The complaints I typically hear from concerned professionals fall into one of two camps:

  1. My data don’t include the fields I want for analysis
  2. My data quality isn’t good enough to use for analytics

Both concerns are related to data quality. The first is about comprehensiveness, the element of data quality requiring that your data set include all relevant fields or variables.

In contrast, the second complaint about data quality focuses more on the elements of data accuracy, completeness, timeliness, and uniqueness. These are the more traditional concerns about data quality.

In the end, neither of these concerns is likely to be so severe that you simply cannot begin analyzing your data.

My Data Are Missing Fields

Every analyst I know can identify a field that they wish had been included in their data set but wasn’t.

This is often the case when they analyze the data for reasons different from those for which the data were collected.

You may wish your data included customer feedback, satisfaction scores, or information about interests, desires, demographics, or fears.

It’s okay if you don’t have those fields at the beginning. You can create an organizational analytics plan to help move you forward. It doesn’t mean your data aren’t useful for analytics.

As you move forward, you’ll learn how to collect those missing fields. Start thinking about how you can obtain that information through surveys, phone calls, interviews, or focus groups.

In the meantime, just because you’re missing fields doesn’t mean the rest of your data are useless. Quite the contrary.

In fact, you should begin thinking about the questions your data can answer that you don’t already have answers for.

Examples of analyses that could be done with simple sales and customer database tables:

  • Customer segmentation for targeted marketing
  • Conversion rate calculations for operational efficiency
  • Identifying your most profitable products, services, and customers
  • Forecasting sales revenue to predict demand
  • Allocating staff and other resources to meet demand
  • Monitoring inventory to ensure proper supply to meet demand

And there are many more analyses you could pursue with these simple data tables.

Start by reviewing the list I provide here. If you are not performing these analyses and they are relevant to your business, then you have room for improvement.
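To make one item from the list concrete, here is a minimal sketch of ranking products by profit. The records and field names (`product`, `revenue`, `cost`) are invented for illustration; your own sales table will look different.

```python
from collections import defaultdict

# Hypothetical sales records; field names are assumptions for illustration.
sales = [
    {"product": "Widget", "revenue": 120.0, "cost": 70.0},
    {"product": "Gadget", "revenue": 200.0, "cost": 180.0},
    {"product": "Widget", "revenue": 80.0, "cost": 50.0},
    {"product": "Gizmo", "revenue": 150.0, "cost": 60.0},
]

def profit_by_product(rows):
    """Sum profit (revenue minus cost) per product, highest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["revenue"] - row["cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = profit_by_product(sales)
for product, profit in ranking:
    print(f"{product}: {profit:.2f}")
```

Even a table this simple shows that the product with the highest revenue is not necessarily the most profitable one.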

My Data Are Poor Quality

You have heard that poor data quality in analytics leads to poor decision-making. If you think your data aren’t useful for analytics, start by assessing data quality. In this section, data quality refers to accuracy, completeness, uniqueness, and timeliness.

The claim that poor data leads to poor decisions certainly holds some truth. However, do you know the difference between good and poor data quality?

Data can be good quality without being perfect. Similarly, data with a few errors or omissions are not necessarily poor quality.

In my 25+ year career as an analyst, I have yet to find a perfect data set. At the same time, only a handful of data sets were so bad I couldn’t learn anything from them.

All data sets will have some issues with quality. The question you must ask yourself is: how badly will these issues impact my analyses?

Inaccurate Data

Data accuracy should always be a concern. After all, it doesn’t matter how much data you have if the information is wrong.

Even so, almost every data set will have errors in accuracy.

Survey respondents may misremember details they are asked to recall.

People may choose the wrong item in a drop-down menu or from a list.

Your own staff could simply mistype the data into a form.

You can estimate how accurate your data are by performing periodic audits to identify any obvious errors.

Even when you audit your data, a few errors will likely slip past. These errors will not impact your analysis dramatically unless your data set is small or they introduce an outlier value.
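A periodic audit can start with a few simple validity checks. The sketch below is only one way to do it, and everything in it is an assumption made for illustration: the records, the plausible age range of 0 to 120, and the short list of valid state codes.

```python
# Hypothetical customer records; fields and valid ranges are assumptions.
records = [
    {"id": 1, "age": 34, "state": "OH"},
    {"id": 2, "age": 210, "state": "OH"},   # implausible age
    {"id": 3, "age": 45, "state": "ZZ"},    # unknown state code
    {"id": 4, "age": 29, "state": "TX"},
]

VALID_STATES = {"OH", "TX", "CA"}  # illustrative subset, not a full list

def audit(rows):
    """Return (record id, problem) pairs for values that fail simple checks."""
    problems = []
    for row in rows:
        if not 0 <= row["age"] <= 120:
            problems.append((row["id"], "age out of range"))
        if row["state"] not in VALID_STATES:
            problems.append((row["id"], "unknown state"))
    return problems

issues = audit(records)
# Share of records with at least one flagged problem.
error_rate = len({rec_id for rec_id, _ in issues}) / len(records)
print(issues)
print(f"Records with at least one flagged error: {error_rate:.0%}")
```

Checks like these only catch obvious errors. A value can pass every rule and still be wrong, which is why the resulting rate is an estimate rather than a measurement.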

Incomplete Data

Virtually every data set will have some kind of missing data. Typically, this happens when users are given an option to answer a question or provide information and choose not to.

Missing data itself is not always a problem. After all, if the pattern of the missingness is random across observations and fields, your analysis should still be accurate.

Similarly, if the percentage of data missing is small relative to the overall sample, your analysis should not lose much precision.

In contrast, missing data becomes problematic when it is systematic or extensive.

If the pattern of missing data is systematic, then some other factor, observed or not, predicts the missingness.

When missing data is systematic, your analytic results are likely to be skewed in the direction of the observable data.

Additionally, the greater the extent of missing data in your data set, the more likely your results will be both inaccurate and imprecise.
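One rough way to look for systematic missingness is to compare how often a field is missing across groups of another observed field. The sketch below assumes hypothetical `income` and `age_group` fields and uses `None` to mark a missing value. A large gap between groups is a hint, not proof, that the missingness is systematic.

```python
from collections import defaultdict

# Hypothetical survey rows; None marks a missing value.
rows = [
    {"age_group": "under_30", "income": None},
    {"age_group": "under_30", "income": None},
    {"age_group": "under_30", "income": 41000},
    {"age_group": "30_plus", "income": 58000},
    {"age_group": "30_plus", "income": 62000},
    {"age_group": "30_plus", "income": 55000},
]

def missing_rate_by_group(data, field, group_field):
    """Fraction of rows with a missing `field`, within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in data:
        counts[row[group_field]][1] += 1
        if row[field] is None:
            counts[row[group_field]][0] += 1
    return {group: missing / total for group, (missing, total) in counts.items()}

rates = missing_rate_by_group(rows, "income", "age_group")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} missing")
```

In this made-up example, income is missing only for the younger group, so an average income computed from the observed values would lean toward the older respondents.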

I often get the question, “How much missing data can I have before it’s a problem?”

Typically, I get concerned about having more than 5 to 10 percent missing data on a single variable in my analysis. This is simply a rule of thumb.

There are no hard rules about how much missing data is too much. However, the less risk you are willing to accept in your results, the less missing data you should accept in your analysis.
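Applying the rule of thumb is straightforward. This sketch computes the share of missing values for each field and flags any field above a threshold. The rows and field names are hypothetical, and the 10 percent cutoff is just the upper end of my rule of thumb, not a standard.

```python
# Hypothetical rows; None marks a missing value. Field names are assumptions.
rows = [
    {"age": 34, "income": 52000, "region": "North"},
    {"age": None, "income": 61000, "region": "South"},
    {"age": 45, "income": None, "region": "South"},
    {"age": 29, "income": None, "region": "North"},
    {"age": 51, "income": 48000, "region": "North"},
]

def missing_share(data):
    """Fraction of rows where each field is None."""
    fields = data[0].keys()
    return {f: sum(r[f] is None for r in data) / len(data) for f in fields}

THRESHOLD = 0.10  # upper end of the 5 to 10 percent rule of thumb

shares = missing_share(rows)
flagged = [f for f, share in shares.items() if share > THRESHOLD]
for field, share in shares.items():
    print(f"{field}: {share:.0%} missing")
print("Fields to review:", flagged)
```

Lower the threshold if you are willing to accept less risk in your results.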

Duplicate Data

It pretty much goes without saying that if you have duplicate records in your data set, you will get inaccurate results, because those records are counted more than once.

Fortunately, most data analysis software packages include simple procedures to check for, and remove, duplicate records.

Every analyst should check their data for duplicate records as part of the data validation process. During data validation, the analyst checks a data set to identify any potential issues and confirm suitability for analysis.
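The logic behind a duplicate check is simple enough to sketch in plain Python. The records and the choice of `customer_id` and `email` as the identifying fields are assumptions for illustration. In practice you would more likely use the built-in procedure in your analysis package.

```python
# Hypothetical customer records; the key fields are an assumption.
records = [
    {"customer_id": 101, "email": "a@example.com"},
    {"customer_id": 102, "email": "b@example.com"},
    {"customer_id": 101, "email": "a@example.com"},  # exact duplicate
]

def split_duplicates(rows, key_fields):
    """Keep the first record for each key; collect later repeats separately."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates

unique, duplicates = split_duplicates(records, ["customer_id", "email"])
print(f"{len(unique)} unique records, {len(duplicates)} duplicates removed")
```

Keeping the duplicates in a separate list, rather than silently dropping them, lets you review them before deciding they really are the same record.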

Old Data

The data quality gurus always include timeliness as a characteristic of high-quality data. By timely, they mean the organization captures data and has it available for analysis in a time frame that is valuable.

To be fair, timely doesn’t always mean fast.

If the context of your business data doesn’t change quickly, then older data might be perfectly valuable.

In contrast, the faster your business context changes, the more recent your data needs to be to add value.

With all of this in mind, there is one caveat. If no recent data are available, then using old data may be better than using nothing.
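If you want a quick check on timeliness, you can compare the age of your most recent record against the oldest data you are willing to use. This is a sketch only. The dates and the maximum age are hypothetical, and the right maximum depends entirely on how fast your business context changes.

```python
from datetime import date

def data_age_days(latest_record, as_of):
    """Days between the most recent record and the analysis date."""
    return (as_of - latest_record).days

def is_timely(latest_record, as_of, max_age_days):
    """True if the newest record is no older than max_age_days."""
    return data_age_days(latest_record, as_of) <= max_age_days

# Hypothetical dates for illustration.
latest = date(2024, 1, 1)
today = date(2024, 3, 1)

age = data_age_days(latest, today)
slow_context_ok = is_timely(latest, today, max_age_days=90)
fast_context_ok = is_timely(latest, today, max_age_days=30)
print(age, slow_context_ok, fast_context_ok)
```

The same data pass the check for a slow-moving context and fail it for a fast-moving one, which is the point: timely doesn’t always mean fast.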

Improving Data Quality Is a Process

If your data have quality issues – and chances are they do – fixing them may take some time. This is especially true if you need to revise your data collection processes and tools.

Because you need time to make data quality improvements, I don’t recommend waiting to begin analyzing your data.

I say this for three reasons:

  1. You need to do some analyses simply to understand the nature and extent of your data quality issues.
  2. If your data quality issues are not extensive, then your analysis can still help answer substantive questions.
  3. If you wait, you put perfection ahead of progress and invite unnecessary delays.

So I encourage you to perform a self-appraisal of your data quality. Be as honest with yourself as possible and call out any areas of concern.

Next, you should develop two lists.

The first list includes questions and analyses with minimal data quality concerns. You can begin implementing these immediately.

The second list includes questions and analyses for which you have concerns. Include the specific data quality concern on the list. Prioritize those concerns based on the importance of the analysis to your organization.
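The two lists don’t need special tooling. As a sketch, assuming a hypothetical inventory of analyses where each entry records its data quality concern (or `None`) and an importance score you assign yourself:

```python
# Hypothetical inventory of analyses; names, concerns, and scores are invented.
analyses = [
    {"name": "Sales forecast", "concern": None, "importance": 3},
    {"name": "Customer segmentation", "concern": "missing demographics", "importance": 5},
    {"name": "Conversion rates", "concern": None, "importance": 4},
    {"name": "Inventory monitoring", "concern": "duplicate SKUs", "importance": 2},
]

# List 1: analyses with no data quality concern; these can start now.
ready_now = [a["name"] for a in analyses if a["concern"] is None]

# List 2: analyses with a concern, most important first.
needs_work = sorted(
    (a for a in analyses if a["concern"] is not None),
    key=lambda a: a["importance"],
    reverse=True,
)

print("Start now:", ready_now)
for a in needs_work:
    print(f"Fix first: {a['name']} ({a['concern']})")
```

A spreadsheet works just as well. What matters is that each concern is written down next to the analysis it blocks.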

Begin working to resolve your data quality concerns, one at a time. As you work through the list of improvements, you will be able to stand up more of your analytics platform.

Finally, be sure to monitor your data quality periodically so that you catch new issues as they come up and can fix them when they do.

Conclusion

Even though many gurus claim you need to fix your data quality issues before using analytics, waiting until everything is fixed simply isn’t feasible. If you think your data aren’t useful for analytics, you’re probably wrong.

All data have some kind of quality issue that analysts should be aware of. However, if those quality issues are not systematic or extensive, the data can still yield insights with the proper cautions.

Through careful data quality assessment, you can determine where your data need fixing and where they hold insights. You can manage these efforts by prioritizing which analyses to implement and which quality improvements to make.
