Defining Big Data: it’s all about the analytics



As is true of most analytic types, I've grown tired of the hype and hubbub surrounding Big Data, but I love having lots of data to analyze. So I accept the furor, because it gets companies to pay attention to all the data they have that is not being analyzed and used to create insight. And that brings me to my main point in this post: Big Data is a topic of conversation ONLY because of the desire to analyze and extract insight from data that was previously untouchable.

A recent post by Information Management titled "Defining Big Data" references this open-source article by two computer science students (Ward and Barker). The article reviews different definitions of Big Data put forth by leading companies and organizations. Of course, they mention Gartner's 4 V's definition (with no mention of the 5th and 6th V's or my proposed 7th V), but the thing that really caught my attention is this quote from the paper: "... big data is intrinsically related to data analytics and the discovery of meaning from data." I firmly believe this. The discussion around Big Data often revolves around the question of "How big?" but I think the real focus should be on how we can extract insight and value from data that has previously been too big or too inaccessible to analyze. This is the real big data challenge to me.

And this leads to my new favorite definition of Big Data highlighted by Ward and Barker, which comes from the Method for an Integrated Knowledge Environment (MIKE2.0) project: Big Data is defined by a high degree of permutations and interactions within a dataset. This highlights the most important and challenging characteristic that has kept us from analyzing data in the past: its complexity. Complex data doesn't need to be big. If you are analyzing a terabyte of data to determine the correlation between two fields, I would question whether you are really doing Big Data analytics. However, if you are mining the inter-relationships of 100 variables in a sparsely populated dataset that is 50MB, then you will need to push the envelope of data science to extract a useful insight. This qualifies as Big Data analytics to me.
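
To put rough numbers on that contrast, here is a small Python sketch. It assumes that "interactions" can be approximated by counting the groups of variables you might examine together. That is my simplification for illustration, not part of the MIKE2.0 definition, and the names interaction_count and max_order are hypothetical.

```python
from math import comb


def interaction_count(n_variables: int, max_order: int) -> int:
    """Count the distinct groups of 2..max_order variables (illustrative only)."""
    return sum(comb(n_variables, k) for k in range(2, max_order + 1))


# Two fields give a single pairwise relationship, however many rows there are.
print(interaction_count(2, 2))

# 100 variables: pairwise relationships only, then pairwise plus three-way.
print(interaction_count(100, 2))
print(interaction_count(100, 3))
```

Under this assumption the count depends only on the number of variables and how many of them you consider at once, not on the number of rows, which is why a 50MB dataset can be the harder problem.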

So maybe the best definition of Big Data is simply this: "Data that you haven't been able to extract new insights from before." 

