Correlation Is Not Causation

Anyone who works with statistics has heard the phrase "correlation is not causation."

What it means is that just because A is correlated with B you can't conclude that A causes B. It's also possible that B causes A, or that A and B are both caused by C, or that A and B are mutually dependent, or that it's all just a big coincidence.
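The "both caused by C" case is easy to simulate. The sketch below is an illustration on made-up data, not a real dataset: the variable names, the noise levels, and the small `pearson` helper are all assumptions introduced for the example. A and B never influence each other, yet they come out strongly correlated because both depend on C.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden driver C, plus independent noise for A and for B.
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [ci + rng.gauss(0, 0.5) for ci in c]  # A depends only on C
b = [ci + rng.gauss(0, 0.5) for ci in c]  # B depends only on C

r_ab = pearson(a, b)
print(f"correlation(A, B) = {r_ab:.2f}")  # high, although A never touches B
```

Changing A in this setup would do nothing to B. The only lever that moves both is C.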

Similarly, you can't assume that lack of correlation means lack of causation. Just because A isn't correlated with B doesn't mean that A does not cause B. It's possible that A causes B but with a time delay, or through some more complex relationship than the simple linear formula most correlation analysis assumes. It's also possible that B is caused by many different factors, including A, C, D, E, F, G, and the rest of the alphabet.
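Here is a minimal sketch of the nonlinear case, again on invented data. B is completely determined by A (B is A squared), but because the relationship is symmetric rather than linear, the linear correlation coefficient comes out at zero. The `pearson` helper is the same hypothetical one-off function, repeated so the snippet runs on its own.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# A takes values symmetric around zero; B is fully caused by A.
a = list(range(-3, 4))   # -3, -2, ..., 3
b = [x * x for x in a]   # B = A squared

r_ab = pearson(a, b)
print(f"correlation(A, B) = {r_ab:.2f}")  # zero, despite total dependence
```

A scatter plot would reveal the relationship immediately, which is one reason to look at the data before trusting a coefficient.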

In reality, a linear correlation analysis mostly tells you the degree to which A and B are measuring the same thing. That's useful information, but it doesn't necessarily tell you how to drive improvement in B.

I'm always a little disappointed when, in a business setting, someone does a linear correlation of a bunch of different variables against some key metric and then assumes that the things with the highest correlation coefficient are the ones to focus on. Correlation analysis can't actually tell you what causes the metric to go up or down: it's the wrong tool for the job. At best, it's a simple way to get a few hints about what might be worth a deeper look.

Understanding what really drives a business metric requires a more sophisticated set of tools. A/B testing (where you actually perform an experiment) is the gold standard, but you can also learn a lot from natural experiments (taking advantage of events that happen in the normal course of business), and from the basic exercise of formulating theories about what causes the metric to change and testing those theories against existing data.
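To make the A/B testing idea concrete, here is one common way to read the result of a simple experiment: a two-proportion z-test on conversion counts. This is a sketch under stated assumptions, not the only valid analysis. The function name, the conversion counts, and the group sizes are all invented for illustration, and the test assumes users were randomly assigned to the two groups and that the samples are large enough for the normal approximation.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical experiment: control (A) vs. variant (B), 2000 users each.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because the assignment is random, a small p-value here points at the change itself rather than at some hidden third factor, which is exactly what a correlation coefficient cannot offer.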
