# There Are Lies, Damned Lies and Descriptive Statistics

#### Anscombe’s quartet

Here are some interesting numbers for you. Interesting in a geeky sort of way…

They are 4 sets of readings for two variables, X and Y.

That is horribly algebraic. It brings back memories of a comprehensive school deep in the 1980s. All polyester blazers and fermenting gym kit. Let me bring the X’s and the Y’s a little more up to date:

• X versus Y
• Sales rate versus call handle time
• Information collected versus first time fix
• Quality versus cost

There are lots of X’s and Y’s in business. My time in Mr Gilpin’s maths class wasn’t wasted.

#### The question

Are these X’s and Y’s different or are they the same? How do they relate to each other?

In the world of big data and analytics (when did analytics become a noun?) the solution is easy to find: draw up some “descriptive statistics”. Summarise the data so that you can look at it.

#### Nothing to see here

I have saved your spreadsheet blushes and calculated the statistics for you. Here they are…

What do they say?

• The mean of X and the mean of Y: the same in all four data sets (to two decimal places).
• The standard deviation of X and the standard deviation of Y: the same in all four data sets.
• The correlation between X and Y: the same in all four data sets.
• The regression equation for X and Y: the same in all four data sets.
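
If you would rather check those four claims than take my word for it, here is a minimal sketch in Python, using only the standard library. The numbers are the published Anscombe (1973) values, which I am assuming match the table in the original post; the names `QUARTET` and `summarise` are purely illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet as published in 1973 (assumed to match the post's table).
# Sets I to III share the same X values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return the descriptive statistics from the list, rounded to 2 places."""
    mx, my = mean(xs), mean(ys)
    # Sums of squares and cross products about the means.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx              # least-squares slope of Y on X
    intercept = my - slope * mx    # least-squares intercept
    return {
        "mean_x": round(mx, 2),
        "mean_y": round(my, 2),
        "sd_x": round(stdev(xs), 2),   # sample standard deviation
        "sd_y": round(stdev(ys), 2),
        "r": round(sxy / sqrt(sxx * syy), 2),  # Pearson correlation
        "slope": round(slope, 2),
        "intercept": round(intercept, 2),
    }


summaries = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, stats)
```

Rounded to two decimal places, each of the four rows should show the same values: a mean of 9 for X and 7.5 for Y, a correlation of about 0.82 and a regression line of roughly Y = 3 + 0.5X.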

So there you have it: the 4 groups of X’s and Y’s are all the same. Nothing has changed; there is nothing to worry about.

#### Or is there?

If I draw some scatter plots the data sets look very different.
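
You don't need a charting package to see the difference. As a rough sketch, again assuming the published Anscombe (1973) values, the Python below draws each data set as a crude text scatter plot. The function `ascii_scatter`, the grid size and the axis ranges are my own illustrative choices, not anything from the original charts.

```python
# Anscombe's quartet as published in 1973 (assumed to match the post's table).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=32, height=10, x_range=(0, 20), y_range=(2, 14)):
    """Return a text scatter plot; every data set uses the same fixed axes."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each point onto the character grid.
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top line
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


plots = {name: ascii_scatter(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, plot in plots.items():
    print(f"Data set {name}")
    print(plot)
    print()
```

Even at this resolution the four pictures should look nothing alike: a loose upward cloud, a curve, a straight line with one stray point, and a vertical stack of points with a single outlier off to the right.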

#### The point of the story

If you want to understand what is going on in your business, don’t rely on the accountants and analysts with their beautiful tables of numbers. Draw some graphs and look at the data instead.

Better still, go and have a look at the shop floor. It will be far more interesting than your maths class ever was.

Image by Blondinrikard Fröberg

Republished with author's permission from original post.

James Lawther
James Lawther is a middle-aged middle manager. To reach this highly elevated position he has worked for many organisations, from supermarkets to tax collectors and has had multiple roles from running a night shift to doing operational research. He gets upset by operations that don't work and mildly apoplectic about poor customer service.

1. This is an absolutely delightful post using a clever way to make an important point. It reminds me of a great piece I read in the ’70s on “Salt Passage Research” (http://bit.ly/2GRGyjB). But the best warning about the illusion that stats communicate the truth comes from John Steinbeck’s book Sea of Cortez, in which he describes a fishing expedition off the coast of Los Cabos, Mexico in this way:

“The Mexican sierra has 17 plus 15 plus nine spines in the dorsal fin. These can easily be counted. But if the sierra strikes hard on the line so that our hands are burned, if the fish sounds and nearly escapes and finally comes in over the rail, his colors pulsing and his tail beating in the air, a whole new relational externality has come into being-an entity which is more than the sum of the fish plus the fisherman.

The only way to count the spines of the sierra unaffected by this second relational reality is to sit in a laboratory, open an evil-smelling jar, remove a stiff colorless fish from the formalin solution, count the spines and write the truth. There you have recorded a reality which cannot be assailed-probably the least important reality concerning either the fish or yourself.”

I have fished for Mexican Sierra off the coast of Los Cabos and I can verify his accurate description of the experience, despite what the research may report.

Thanks, James, for your great work on this! May you never make a Type 2 error and always remain a few standard deviations away from the mean of the gullible crowd!

2. Once again, this makes the case for understanding the difference between correlation or regression (simple or multiple) and causation, that is, the real drivers of results, whether drawn qualitatively or quantitatively. Maybe it shouldn’t surprise me that, even with so much evidence challenging the use of correlation data, so few analysts and companies seem to comprehend this.

3. Interesting read; you summed it up well. Data visualization is crucial in developing a practical statistical model but, just like statistics, it can be misinterpreted or misrepresented because users may not understand the fundamental concerns with charts, graphs and maps that apply to their world.