Lies and big data

The other week someone brought to my attention an article titled "Lies Data Tell Us" by Steven J. Thompson, CEO at Johns Hopkins Medicine International. The title took me aback, but as I read it I realized the article was really about the better practices required for data to be more useful. The provocative and somewhat misleading title earned nearly 12K views, dozens of comments and hundreds of shares on social media. When I started looking for the article again, the search returned a number of links that associate data, big data, etc. with "lies". Most of the authors blame the data, or unscrupulous mining and analysis technology vendors, for all sorts of business problems that result from "data lies". It seems some of these authors use the following definition:

 Data Scientist (n): A machine for turning data you don’t have into infographics you don’t care about.

I would like to examine a process people often follow when they deal with data.

Since the term "big data" is thrown around a lot, I would like to define it in the context of this article. Mere volume and velocity of data do not constitute "big data", but multiplicity of data sources and data formats does. From that perspective, the term "big data" describes enterprise data aggregated from multiple departments and multiple databases (i.e. the data warehouse model), linked with data from sources external to the company, in structured and/or unstructured formats (a rough sketch of this kind of aggregation follows the list below). Mining such a set of the "right" data may produce very valuable intelligence. However, it can also result in a waste of money, effort and opportunity if

  • The mining process does not produce relevant new intelligence, or
  • The intelligence is not used for action.
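To make the definition above concrete, here is a minimal sketch of that kind of multi-source aggregation using Python and pandas. The file names and the customer_id key are hypothetical placeholders, not a reference to any particular system; in practice the sources might be departmental database exports, a CRM extract and an external review feed.

```python
import pandas as pd

# Internal, company-controlled sources (hypothetical exports from departmental databases)
orders = pd.read_csv("orders.csv")            # sales department
contacts = pd.read_csv("crm_contacts.csv")    # CRM / customer service department

# External, uncontrolled source (e.g. product reviews collected outside the company)
reviews = pd.read_json("external_reviews.json")

# Link the sources on a shared key to form one multi-source working set
working_set = (
    orders
    .merge(contacts, on="customer_id", how="left")
    .merge(reviews, on="customer_id", how="left")
)

print(working_set.head())
```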

We act when we believe the action will result in a desirable outcome. We never know for sure, but we estimate the probability based on our experiences in similar circumstances. These dynamics influence how we select, search and interpret data into intelligence, or the lack thereof. Subconsciously, we select data that is likely to confirm our existing beliefs. This usually means that we rely heavily on internally generated (controlled) data and heavily discount externally generated data.

We like to use terms such as unbiased and objective, but the very process of selecting a data set introduces bias and subjectivity. It is unavoidable. It is a much better practice to acknowledge and understand that bias, be pragmatic, and define the purpose of the inquiry. You don't see people mining a mountain to find "whatever" is there. They carefully select and test an area for indications of a high concentration of the desired mineral before exploration and mining start.

If the purpose of your inquiry is improvement of customer experience, assemble a data set from the most relevant internal and external data sources available. If you limit your data set to company-controlled data, you introduce a company bias. In that case the likelihood of discovering any new intelligence for improving your customers' experience is quite low; forget about data mining and just continue your archaic "guess and validate" surveying exercises. If you include data generated by customers without solicitation and control, you introduce a customer bias. Adding channel-generated return data and customer service data allows for balancing these biases. Correlating trends in controlled and external data sources will help discover potential gaps between your beliefs and emerging evidence, as sketched below. However, even the best evidence cannot automatically make people abandon their beliefs and start acting differently, but that is a subject for another article.
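As an illustration of that trend comparison, here is a minimal sketch, again in Python/pandas, that correlates an internally controlled metric (a hypothetical survey_score) with an externally generated one (a hypothetical review_rating) over time. The file and column names are assumptions for illustration only; a weak or negative correlation between the two trends is the kind of gap between belief and evidence described above.

```python
import pandas as pd

# Hypothetical monthly roll-up with one internally controlled metric (survey_score)
# and one externally generated metric (review_rating)
df = pd.read_csv("monthly_metrics.csv", parse_dates=["month"]).set_index("month")

# Smooth both series so we compare trends rather than month-to-month noise
internal_trend = df["survey_score"].rolling(window=3).mean()
external_trend = df["review_rating"].rolling(window=3).mean()

# A low or negative correlation flags a potential gap between what the company
# believes (its own surveys) and what customers are saying elsewhere
gap_signal = internal_trend.corr(external_trend)
print(f"Internal vs. external trend correlation: {gap_signal:.2f}")
```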

The point is: data cannot lie to us; we do that to ourselves by not mining it honestly and competently.

Republished with author's permission from original post.

Gregory Yankelovich
Gregory Yankelovich is a Technologist who is agnostic to technology, but "religious" about Customer Experience and ROI. He has solid experience delivering high ROI projects with a focus on both Profitability AND Customer Experience improvements, as one without another does not support long-term business growth. Gregory currently serves as co-founder of https://demo-wizard.com, the software (SaaS) used by traditional retailers and CPG brand builders to create Customer Experiences that raise traffic in stores and boost sales per customer visit.

2 COMMENTS

  1. Completely agree with your perspective, and your contextual definition. In order for sources and streams of information to be identified as ‘big intelligence’ rather than ‘big data’, there has to be action-driven objectivity in its analysis and application.

    One of the several ways in which data can lie is to use analytical tools of omission, rather than commission, i.e. slavishly looking for connections and correlations from the various databases rather than for causation. There’s a big difference in the intelligence and insight this produces: http://customerthink.com/a_big_big_big_data_challenge/

  2. Michael, thank you for your comment. Finding causation while examining open systems is a very difficult, and some would say impossible, proposition. More than one statistician has told me that "causation is an ideological term". As in many debates, the key to resolution lies in a very tight definition of "the truth". Even gravity works only under limited conditions. I hope you would agree that these limitations should not stop us from using correlations to build models, IMHO. I addressed this issue in http://blog.amplifiedanalytics.com/2014/01/customer-experience-correlations-and-predictions-of-the-future/
