Correlation is Not Causation: Big Data Challenges and Related Truths That Will Impact Business Success in 2013


For years, social scientists and consultants have warned the corporate world about making too much of correlation analysis, the simple statistical technique that measures the strength of the relationship between one set of attitudes or behaviors and another. As an example, “The Service Profit Chain”, a model developed by three Harvard professors in the ’90s, is generally summarized as happy employees = happy customers = happy shareholders. In other words, at the core of employee engagement efforts is the tacit belief that there is a direct linkage between higher employee satisfaction and better customer experience. But, as noted customer experience expert Frank Capek has observed, although elevated levels of customer service and increased profitability may result from enhanced employee engagement, the link is not guaranteed: “…just because employee satisfaction and engagement are correlated with customer satisfaction doesn’t mean that making employees happier will lead to better customer experience. This is one of those classic traps your college professors warned you about: confusing correlation with causation. I’ve observed that this flaw in logic has led many organizations to invest in trying to make their employees happier in the hope that those happier employees will turn around and deliver a better experience for customers. We’ve just seen too many companies where, at best, more highly engaged employees simply deliver a sub-par experience more enthusiastically.”

What is true in the world of employee behavior optimization is, if anything, even more true in the broader landscape of marketing, brand positioning, communication, and customer experience. CMOs, for instance, have relied on correlation as a core analytical approach for connecting basic customer value performance, and the identification of unmet needs, to forecasts of potential sales and profitability. Today’s customer, however, is less patterned, more self-educated, more socially connected, and more independent-thinking; and this sea change has put a great deal of pressure on the kinds of customer information, and analytics, that CMOs are accustomed to using. There are new streams, and types, of data that CMOs will need to understand; and there are also new analytical approaches, aimed at the key causative drivers of customer behavior, that will be required for insights into why customers think what they think, say what they say, and do what they do.



This is, to a great extent, where “big data” comes into play. Marketing has always had some volume of macro quantitative data available (customer profiles, purchase history, ad hoc research, historical brand and transactional tracking reports, and so on) for decisions involving target audience, new products and concepts, value proposition, and competitive positioning. That said, marketing managers, and the cultures in which they function, have historically relied more on the conventional wisdom of their own experience and instincts, on the creative concepts under consideration, and on engaging communication.

Customer data analysis, where applied, has largely been of the straightforward correlation or cross-tab type, for evaluating simple, core business elements. In sum, marketing has been more about supporting big ideas than about having objective, insightful information. Now, with the analytical tools that are emerging, marketers can crunch terabytes and petabytes of data from the sources just identified, plus third-party statistical information exchanges and public and industry statistics, to create all manner of fascinating correlations. The worthy goal is to identify connections, or data stream correlations, between one set of customer information and another, in ever smaller audience microsegments, so that marketing dollars can be spent more effectively.



This is where users of such data need to exercise care. Correlation is not causation, and the insights produced by big data analysis tend to be only directional, tentative, and preliminary. Even though big data are more complete, and more available, than ever before, there are often missing elements in databases, plus disruptive, or confounding, factors which can compromise data quality. Another way of saying this is that correlation analysis of big data sets generates results that should be seen as more hypothetical than actual; there is little assurance that acting on any correlation uncovered by big data will actually change customer behavior.
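To make the confounding problem concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the variable names, the hidden "store quality" factor, and the noise levels are hypothetical, not drawn from any real data set. In this toy world, employee satisfaction has no effect at all on customer satisfaction, yet the two correlate strongly because both depend on the same hidden factor. When employee satisfaction is instead assigned at random, the correlation should largely vanish.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    """Return (observational r, experimental r) for a hypothetical toy world.

    Assumption: a hidden factor (store quality) drives BOTH employee and
    customer satisfaction. Employee satisfaction never influences customers.
    """
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n)]  # hidden confounder

    # Observational data: both measures inherit the confounder.
    employee = [q + rng.gauss(0, 0.5) for q in quality]
    customer = [q + rng.gauss(0, 0.5) for q in quality]
    observed_r = pearson(employee, customer)

    # Experiment: employee satisfaction is set at random, independent of
    # store quality. Customers still respond only to quality.
    employee_randomized = [rng.gauss(0, 1) for _ in range(n)]
    customer_after = [q + rng.gauss(0, 0.5) for q in quality]
    experimental_r = pearson(employee_randomized, customer_after)

    return observed_r, experimental_r


if __name__ == "__main__":
    observed_r, experimental_r = simulate()
    print(f"correlation in observational data: {observed_r:.2f}")
    print(f"correlation after randomization:   {experimental_r:.2f}")
```

The point of the sketch is only that a strong observed correlation can coexist with zero causal effect; real customer data will rarely be this clean.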

Rather than serve as an unchallenged platform for decision-making, the insights spun out by big data analysis, and especially correlation analysis, are better used when distilled into testable value propositions. A marketing and corporate culture built on testing and controlled experimentation leads to more financially sound, proof-based answers that will truly help grow the business. Testing of ideas and concepts certainly isn’t new, but it needs to become an enterprise mantra. Leading companies have already embraced testing and experiments as the fuel for an engine of success.
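As one illustration of what such a test might look like, the Python sketch below compares conversion rates between a randomly assigned control group and a test group using a standard two-proportion z-test. The function name and the sample figures in the usage lines are hypothetical, chosen only for the example; a real program would also need to settle sample sizes and success metrics before the test is run.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and test (b) groups.

    Returns (lift, z, p_value), where lift is rate_b - rate_a and
    p_value is two-sided under the normal approximation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 customers randomly assigned to each offer.
    lift, z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"lift: {lift:.3f}, z: {z:.2f}, p-value: {p_value:.4f}")
```

Because customers are assigned to the two groups at random, a difference that survives this kind of check is far better evidence of cause and effect than a correlation mined from historical data.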



3 COMMENTS

  1. Hello Michael

    Well said. And it occurs to me that data is simply that: data. Data is only useful when it is interpreted. Interpretation is the act of making sense – giving meaning and significance to the data so that it becomes relevant and useful for how one lives one’s life.

    This of course is where the fun starts. For a given set of data many interpretations are possible. And in the course of interpretation there is bias – has to be, because all interpretation proceeds from a specific way (out of many) of looking at / making sense of the world. Including what is worth making sense of and how one goes about making sense of it. This is in addition to cognitive pitfalls like confusing correlation with causation. Or seeing that A leads to B and not seeing that B might lead to C and C lead to B. So that A influences/leads to B and B influences/leads to A. That is to say that there is a circular relationship rather than a linear relationship.

    Then there is wisdom. Wisdom comes through experience of being in the game – again and again, getting present to the structure of the system, the hidden rules of the game, what should not work according to theory and does work in practice and vice versa. This is where the real fun starts. The data tells you one thing. And wisdom tells you another.

    I say that big data is a goldmine for the IT vendors. And a honeytrap for the many who as yet cannot even use small data effectively.

    As always I thank you for writing a thoughtful, considered piece. A refreshing change in an ocean of ‘bullshit’. Incidentally, it is quite likely that I am contributing to that ocean of ‘bullshit’.

    Maz

  2. …and I always find your perspectives, and the stories behind them, both welcome and refreshing.

    Per your points, researchers can, rather easily I’ve found, fall into the Venus Flytrap of logical fallacy. In the haste to make correlations work, they miss that the reasoning behind an argument or hypothesis can be flawed. And, by the way, it’s just as inappropriate to make the opposite assumption, that correlation proves causation, so when two events occur together they have a cause-and-effect relationship. This, from my long-ago memory of Latin, is “cum/post hoc ergo propter hoc” (“with/after this, therefore because of this”). Implication, or suggestion, of connection between events, however, doesn’t make them real – in science or in marketing.
