Defining Big Data: it’s all about the analytics



As is true of most analytic types, I've grown tired of the hype and hubbub surrounding Big Data, but I love having lots of data to analyze. So I accept the furor, because it gets companies to pay attention to all the data they have that is not being analyzed and used to create insight. And that brings me to my main point in this post: Big Data is a topic of conversation ONLY because of the desire to analyze and extract insight from data that was previously untouchable.

A recent post by Information Management titled "Defining Big Data" references this open-source article by two computer science students (Ward and Barker). The article reviews different definitions of Big Data put forth by leading companies and organizations. Of course, they mention Gartner's 4 V's definition (with no mention of the 5th and 6th V's or my proposed 7th V), but the thing that really caught my attention is this quote from the paper: "... big data is intrinsically related to data analytics and the discovery of meaning from data." I firmly believe this. The discussion of Big Data often revolves around the question "How big?", but I think the real focus should be on how we can extract insight and value from data that has previously been too big or too inaccessible to analyze. This is the real big data challenge to me.

And this leads to my new favorite definition of Big Data highlighted by Ward and Barker, which comes from the Method for an Integrated Knowledge Environment (MIKE2.0) project: Big Data is defined by a high degree of permutations and interactions within a dataset. This highlights the most important and challenging characteristic that has kept us from analyzing data in the past: its complexity. Complex data doesn't need to be big. If you are analyzing a terabyte of data to determine the correlation between two fields, I would question whether you are really doing Big Data analytics. However, if you are mining the interrelationships of 100 variables in a sparsely populated dataset that is 50MB, then you will need to push the envelope of data science to extract a useful insight. This qualifies as Big Data analytics to me.
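
To put rough numbers on that contrast, here is a minimal sketch in Python. It is my own illustration, not something taken from Ward and Barker or MIKE2.0, and it assumes that "interactions" can be approximated by counting the distinct groups of variables you might need to examine together. The function name `interaction_count` and the `max_order` parameter are hypothetical names introduced only for this example.

```python
from math import comb


def interaction_count(n_variables: int, max_order: int) -> int:
    """Count distinct groups of 2..max_order variables (illustrative proxy
    for the number of potential interactions in a dataset)."""
    return sum(comb(n_variables, k) for k in range(2, max_order + 1))


# Two fields: a single pairwise relationship, regardless of how many rows.
two_fields = interaction_count(2, 2)

# 100 variables: pairwise relationships only.
hundred_pairs = interaction_count(100, 2)

# 100 variables: pairwise plus three-way groupings.
hundred_up_to_three = interaction_count(100, 3)

print(two_fields, hundred_pairs, hundred_up_to_three)
```

Under this assumption, the two-field case has exactly one relationship to study no matter how many terabytes of rows there are, while the 100-variable case has thousands of pairs and far more three-way groupings, even if the file itself is small.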

So maybe the best definition of Big Data is simply this: "Data that you haven't been able to extract new insights from before." 

Republished with author's permission from original post.

Troy Powell, Ph.D.
Troy consults on solutions to derive insights from customer information that optimize business performance. He has primary responsibility for deploying advanced analytics and developing innovative solutions for understanding and driving customer behavior. Troy has fifteen years of research experience across multiple disciplines for both academic and corporate organizations. Troy holds a Ph.D. from Duke University.

