Statistical Power Made Easy

When you track key performance indicators (KPIs), quality improvement (QI) statistics, or other important analytics, do you have enough statistical power? If you aren’t sure how to answer that question, then this article should be on your MUST READ list.

If your analyses lack power, there is a good chance you can’t detect important changes in your metrics. And if your analytics are overpowered, you could be chasing after trivial issues. In this article, you’ll learn why statistical power matters and how to view your analytics through this lens.

A Simple Explanation of Statistical Power

In simple terms, statistical power is the probability that your analysis will identify an effect as statistically significant when that effect really exists.

In every statistical analysis you perform, one of four possible outcomes will occur:

1. You correctly identify a result as significant.
2. You correctly identify a result as non-significant.
3. You incorrectly identify a result as significant (aka, Type I Error).
4. You incorrectly identify a result as non-significant (aka, Type II Error).

Statistical power is the probability of achieving #1 when a real effect exists. Put another way, power equals 1 minus the probability of a Type II error (#4).

Traditionally, analysts aim for a power of at least 0.80, or 80 percent. In other words, if the effect you care about really exists, you want at least an 80 percent chance of flagging it as significant.
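One way to see outcome #1 in action is to simulate it. The Python sketch below assumes two normally distributed groups with a known, shared standard deviation and uses a simple two-sided z-test; the function name, its parameters, and the example numbers are illustrative choices, not a standard tool. It draws many pairs of samples whose true means differ and reports the share that come out significant, which is an estimate of power.

```python
import math
import random
from statistics import NormalDist, fmean

def simulated_power(mean_a, mean_b, sd, n, alpha=0.05, trials=2000, seed=1):
    """Estimate power as the share of simulated studies that reach significance.

    Assumes both groups are normal with a known, shared SD, so a two-sided
    z-test is used (a t-test is more typical when the SD is estimated).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n)  # standard error of the difference in means
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(mean_a, sd) for _ in range(n)]
        b = [rng.gauss(mean_b, sd) for _ in range(n)]
        z = (fmean(b) - fmean(a)) / se
        if abs(z) > z_crit:
            hits += 1  # outcome #1: a real difference correctly flagged
    return hits / trials

# Hypothetical study: true difference of 0.5 SD, 64 observations per group.
print(simulated_power(0, 0.5, 1, 64))
```

With a true difference of half a standard deviation and 64 observations per group, the estimate should land close to 0.80. Setting both means equal instead makes the function estimate the Type I error rate, which should hover around alpha (0.05).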

Why Statistical Power is Important

Even if you are simply tracking KPIs, QI statistics, or other metrics over time, statistical power and sample size play a role in how precise your descriptive statistics will be.

The larger your data set, the more precise your statistical estimates are. The smaller your data set, the less precise your estimates are, and the less power your analyses have.

So, for any given analysis, your preference should be to use as much data as you can get.

More data = more precision = more power.
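The link between sample size and precision can be made concrete with the standard error of a mean, which is the standard deviation divided by the square root of n. The short Python sketch below assumes a standard deviation of 20 units purely for illustration; the function name is hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: the typical size of its sampling error."""
    return sd / math.sqrt(n)

# Assumed SD of 20 units; each fourfold increase in n halves the standard error.
for n in (4, 16, 64, 256):
    print(n, standard_error(20, n))  # 10.0, 5.0, 2.5, 1.25
```

Because precision grows with the square root of n rather than with n itself, doubling precision costs four times as much data.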

A Statistical Power Thought Experiment

A simple thought experiment demonstrates this.

There are about 330 million people in the United States.

If you asked 2 people at random to tell you their age, you probably wouldn’t have a very good estimate of the average age in the country.

If you wanted to compare the average age for men and women, two people might not be enough to do it at all. If you happened to sample two men or two women, the comparison would be impossible.

But if you asked 329,999,998 people at random, you would have a pretty good estimate of the average age for the entire population.

With such a large sample size, you should also have a very good idea about the difference in average age for men and women.

Thus, you improve your ability to detect differences between groups or over time when you use a larger sample because of the increased precision.
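The thought experiment can be mimicked on a small scale. The sketch below does not use real census data: it assumes a hypothetical population whose ages are spread uniformly between 0 and 90, repeatedly draws random samples of a given size, and measures how much the estimated average age varies from sample to sample. The function name and all numbers are illustrative.

```python
import random
from statistics import fmean, pstdev

def spread_of_estimates(n, repeats=500, seed=7):
    """Standard deviation of the sample mean age across repeated samples of size n.

    Ages come from a hypothetical uniform 0-90 distribution, not real data.
    """
    rng = random.Random(seed)
    estimates = [fmean(rng.uniform(0, 90) for _ in range(n)) for _ in range(repeats)]
    return pstdev(estimates)

print(spread_of_estimates(2))      # wide: two people say little about the average
print(spread_of_estimates(1_000))  # narrow: estimates cluster near the true mean of 45
```

With samples of 2, the estimated average swings by roughly 18 years (as a standard deviation) from one sample to the next; with samples of 1,000, that spread shrinks to under a year.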

How Does Effect Size Play into the Sample Size – Power Relationship?

My description of the relationship between sample size and power above assumes your effect size is held constant.

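The effect size is the magnitude of a difference, or the strength of a relationship. In an analysis, it is what you estimate from the data; when planning, it is the size of the effect you want to be able to detect.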

Taking our thought experiment one step further, you can consider the impact of effect size on sample sizes and statistical power.

For example, an average age difference of 10 years between men and women is a much larger effect than an average difference of 1 month.

If you believe the difference in the population is actually 10 years, then you shouldn’t need many observations in your sample to see that large difference.

In contrast, if you wanted to detect a difference of 1 month, you would need a much larger sample to get more precise estimates of the average.

In the end, the larger your effect size is, the less data you need to identify the effect as statistically significant.
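The effect-size side of the relationship can be sketched with a common normal approximation for the power of a two-sided, two-sample comparison. Everything here is an assumption for illustration: the function name, the use of a standardized effect size (difference in means divided by the standard deviation), and a standard deviation of 20 years for age, which turns a 10-year gap into an effect of 0.5 and a 1-month gap into an effect of about 0.004.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    effect_size is standardized: difference in means divided by the SD.
    The tail opposite the effect is ignored; it is negligible unless the
    effect is close to zero.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit)

# Assumed SD of 20 years for age, 50 people per group.
print(approx_power(10 / 20, 50))        # 10-year gap: roughly 0.7
print(approx_power((1 / 12) / 20, 50))  # 1-month gap: around 0.03
```

With the same 50 people per group, the large effect is detected most of the time and the tiny one almost never.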

If you don’t have enough data, your analysis may fail to flag even large effects as statistically significant.

Conversely, if your data set is very large, your analysis can flag very small, and potentially meaningless, effects as highly significant.
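The flip side, very large samples making tiny effects significant, can be shown the same way. This sketch assumes a 20-year standard deviation for age and a 1-month gap between groups, and computes a two-sided z-test p-value at two different sample sizes; the helper name and the numbers are hypothetical.

```python
import math

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in two group means (z-test, known common SD)."""
    z = diff / (sd * math.sqrt(2 / n_per_group))
    # erfc keeps very small p-values accurate instead of rounding them to 0
    return math.erfc(abs(z) / math.sqrt(2))

one_month = 1 / 12  # in years
print(two_sided_p(one_month, sd=20, n_per_group=100))        # about 0.98: not significant
print(two_sided_p(one_month, sd=20, n_per_group=1_000_000))  # about 0.003: "significant"
```

The effect is the same trivial 1 month in both calls; only the sample size changed.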

When Should You Think About Statistical Power?

The best time to think about statistical power is before you begin collecting data, when you can determine how much data you’ll need to detect an effect of a given size with a given level of power.

Because collecting data is costly, you don’t want to go through the effort only to find that your data set is too small to be useful.
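Planning ahead usually means solving for sample size. Below is a minimal sketch, assuming a two-sided, two-sample comparison, a standardized effect size, and a normal approximation; dedicated power-analysis software based on the t-distribution will typically return a slightly larger number. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate observations needed per group to detect effect_size.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    rounded up, where d is the standardized effect size.
    """
    z = NormalDist()
    total_z = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (total_z / effect_size) ** 2)

print(n_per_group(0.5))  # 63 per group
print(n_per_group(0.2))  # 393 per group
```

Shrinking the effect from 0.5 to 0.2 standard deviations raises the requirement from 63 to 393 observations per group, which is the effect-size relationship described earlier expressed in numbers.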

If you are working with data that have already been collected, you should still think about statistical power to determine whether your data are capable of identifying significant results.

To be clear, running a power analysis after the fact won’t fix an underpowered data set. It only helps you put the results in context when you don’t find anything significant.
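For data that are already collected, one way to put a non-significant result in context is to ask what effect size the existing sample could reliably have detected, often called the minimum detectable effect. The sketch assumes a two-sided, two-sample comparison, standardized effect sizes, and a normal approximation; the function name is hypothetical.

```python
from statistics import NormalDist

def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with the given power (normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * (2 / n_per_group) ** 0.5

print(minimum_detectable_effect(20))   # about 0.89
print(minimum_detectable_effect(500))  # about 0.18
```

If a sample could only reliably detect effects of about 0.9 standard deviations, a non-significant result says little about smaller effects.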

What Should I Do if My Analysis is Under/Over-Powered?

So far, my discussion of statistical power has focused on making a somewhat technical topic more accessible. However, I haven’t addressed the elephant in the room yet: how to handle under/overpowered data.

Practically speaking, if your data are underpowered for your analysis, you’ll need to collect more data.

If you have missing data that is reducing the power of your analysis, you can try to impute the missing values.

Statistical imputation is a family of methods used to guess what missing values should be. I won’t get into the details here, but I used the word “guess” for a reason: all imputation methods are fancy forms of guessing.
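To show what “guessing” looks like in its simplest form, here is mean imputation, which fills every missing value with the average of the observed ones. It is only one of many imputation methods and rarely the best; missing values are represented as None, and the function name is hypothetical.

```python
from statistics import fmean

def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed values.

    Every gap gets the same guess, which keeps the sample size but
    understates the data's true variability.
    """
    observed = [v for v in values if v is not None]
    fill = fmean(observed)
    return [fill if v is None else v for v in values]

print(mean_impute([10, None, 14, None, 12]))  # [10, 12.0, 14, 12.0, 12]
```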

When it comes to overpowered analyses, your next steps are easier to take. Simply ignore statistically significant results that are too small to matter in practice.

To do this, however, you need to be able to translate your statistical results into practical terms about the underlying hypotheses. It is easier to argue that a result is statistically significant but too small to matter than to argue that a large effect is important despite not being significant.
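Translating results into practical terms means checking two things separately: whether a result is statistically significant and whether it is large enough to matter. The sketch below labels a result on both counts. The smallest meaningful effect is an assumed input that you, not the data, must supply; the function name and labels are illustrative.

```python
def interpret(p_value, effect, meaningful_effect, alpha=0.05):
    """Label a result by both statistical and practical significance.

    meaningful_effect is the smallest effect that matters in practice; it is
    a judgment call supplied by the analyst, not something the data provide.
    """
    significant = p_value < alpha
    meaningful = abs(effect) >= meaningful_effect
    if significant and meaningful:
        return "significant and meaningful"
    if significant:
        return "significant but trivial (possibly overpowered)"
    if meaningful:
        return "large but not significant (possibly underpowered)"
    return "not significant and small"

# Hypothetical age comparison where only gaps of 1 year or more matter.
print(interpret(0.003, 1 / 12, meaningful_effect=1))
print(interpret(0.30, 10, meaningful_effect=1))
```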

Conclusion

If you’ve made it this far, you have the insight needed to review your analytics and consider whether they have sufficient statistical power to be useful.

Going forward, you know that statistical significance is not a measure of how important or meaningful a result is. Depending on how much statistical power your data have, you could easily flag meaningless effects as significant (overpowered) or fail to identify large effects as significant (underpowered).

In either case, you now have a better sense of how to approach the next steps in your decision-making process.