The Four V’s, Big Data, and Big Data Skeptics: Half Right and Wholly Wrong


This past week I did an internal webinar for a client on big data systems. It was a good session with lots of back-and-forth and questions – which not only makes for a more interesting and relaxed presentation, it’s more likely to stimulate my thinking. I hear more than enough of myself talking!

So even though this is a digression (from my posts on Forecasting and Dashboards) inside a digression (my larger series on re-engineering Voice of Customer at the enterprise level), I wanted to recap and expand on a couple of the central themes of that presentation.

First, a quick bit of table-setting. I’ve taken a crack in the past at defining what “big data” is all about. It’s always hard when a concept takes off to keep control over it – the Hype Cycle (Gartner’s lovely concept) can take anything – from Presidential candidates to epistemology to piano-playing cats – and so over-expose it as to make the underlying reality nearly impossible to discern.

Naturally, this creates a kind of backlash. Plenty of web analytics pundits are more than willing to describe big data as just hype. Not only is there a certain cachet in running counter to a popular trend, there’s a certain self-interest here too. Web analytics companies nearly all come from Web analytics tool backgrounds – said tools being, in most respects, the antithesis of big data tools. I imagine that it’s always safest to assume that anything you don’t understand must be unimportant!

I’m not a fan of the industry standard definition of big data (probably best exemplified by the four Vs: Volume, Variety, Velocity, Veracity) in part because I think it’s vulnerable to the criticism of skeptics that we’ve always had these exact same factors.

The four Vs do describe most big data situations; they just don’t get to the heart of what big data is all about. Back in the early ’90s when I was doing credit-card work, we had volumes that most of today’s big data companies would still consider massive. We had plenty of velocity too, and veracity was pretty darn important when clearing card transactions. We didn’t have variety, but it’s implausible to argue that variety is essential to every big data application. There are plenty of big data applications that are single source. If I’m trying to mine CNN’s digital data stream, I don’t need variety to be in the big data universe.

So were all of us in credit-card working on big data in the early ’90s? Some might say yes, but I don’t think so.

Instead, I’ve proposed a simpler, more basic, and more fundamental definition of what uniquely defines big data. Big data happens when you drive your data capture and analysis down from the traditional levels of analysis (like customer or transaction) to a level where the meaning of each event can only be interpreted in relationship to the stream of events. Digital is a paradigm case for this. Web site page events are not, in and of themselves, meaningful. The meaningful level of aggregation is somewhere in the sequence of events and that’s what you have to interpret.
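A minimal sketch of this idea: individual page events carry little meaning on their own, while the grouped sequence does. The event names, the 30-minute session gap, and the classification rules here are illustrative assumptions, not anyone’s production logic.

```python
# Sketch: the page event is not the meaningful unit; the session is.
# Event names and the 30-minute inactivity gap are illustrative assumptions.
from datetime import datetime, timedelta

events = [  # (timestamp, page) for one visitor, in time order
    (datetime(2013, 5, 1, 9, 0), "home"),
    (datetime(2013, 5, 1, 9, 1), "product"),
    (datetime(2013, 5, 1, 9, 3), "cart"),
    (datetime(2013, 5, 1, 13, 0), "support"),
    (datetime(2013, 5, 1, 13, 2), "support"),
]

def sessionize(events, gap=timedelta(minutes=30)):
    """Group ordered events into sessions, splitting on inactivity gaps."""
    sessions, current = [], [events[0]]
    for prev, ev in zip(events, events[1:]):
        if ev[0] - prev[0] > gap:
            sessions.append(current)
            current = []
        current.append(ev)
    sessions.append(current)
    return sessions

def classify(session):
    """The 'meaning' only exists at the session level, not the page level."""
    pages = [p for _, p in session]
    if "cart" in pages:
        return "shopping"
    if pages.count("support") >= 2:
        return "support-seeking"
    return "browsing"

for s in sessionize(events):
    print(classify(s))
```

No single row here answers the question “what was this visitor doing?” – only the reconstructed sequence does, which is the shift in the unit of analysis described above.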

It’s not too different in utilities. When you move from the once a period reading of a meter per customer to constant collection, you change the nature of the analysis and data capture problem. No single meter reading is, in and of itself, important. It’s in the flow and pattern of the readings that meaning emerges. This is a different type of analysis.
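The same point can be made with a toy meter-reading sketch: no single reading is interesting, but a reading judged against the trailing pattern can be. The readings, window size, and spike threshold are all made-up illustrative values.

```python
# Sketch: with interval meter data, meaning emerges from the pattern of
# readings, not from any single one. All numbers here are illustrative.

hourly_kwh = [1.1, 1.0, 1.2, 1.1, 5.8, 6.1, 5.9, 1.2, 1.1, 1.0]

def usage_spikes(readings, window=3, factor=3.0):
    """Flag readings that far exceed the trailing-window average --
    a judgment that cannot be made from one reading in isolation."""
    spikes = []
    for i in range(window, len(readings)):
        baseline = sum(readings[i - window:i]) / window
        if readings[i] > factor * baseline:
            spikes.append(i)
    return spikes

print(usage_spikes(hourly_kwh))
```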

It should also be clear from this why the Four V’s look like a reasonable definition of big data to those in the field. When you drive your unit of analysis down a level, you increase by one or more orders of magnitude the volume of your data capture and the velocity of your data. You place additional demands on data collection that can result in poor data quality. And while variety isn’t necessarily wrapped up in the concept, you have created a whole new set of challenges around joining data that lives at the stream level – making multiple sources (variety) far more difficult to handle.

But the beauty of the definition I’ve provided is that it makes it clear why my ’90s credit card work – despite hitting the V’s pretty well – wasn’t necessarily big data. No amount of the four V’s make for big data if you’re just scaling up the same exact types of data and analysis as you’ve always done. It also explains much of why today’s generation of big data technologies are built the way they are and why they provide unique advantages that traditional transactional systems don’t. Those traditional systems sure-enough handled lots of volume – just not in the ways we need it handled now.

It also explains another aspect of big data that is particularly important and represents one of the biggest risks if you’re building a big data system. The nature of the analysis and the methods necessary to join, process, and understand the data all change at the stream level.

Traditional analysis techniques – from joining methods to SQL queries to aggregations to traditional statistical techniques like correlation, regression, and clustering – all work differently, if they work at all, when applied to this type of detailed, stream-level data.
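One way to see why: set-based aggregation (the GROUP BY mindset) cannot distinguish two event streams with identical compositions but different orderings, while stream-level analysis turns on exactly that ordering. A toy sketch with hypothetical event names:

```python
# Sketch: two visitors generate the same *set* of events in different order.
# An order-blind aggregate treats them as identical; an order-aware stream
# computation does not. Event names are illustrative assumptions.
from collections import Counter

visitor_a = ["promo", "product", "purchase"]   # promo preceded the purchase
visitor_b = ["purchase", "product", "promo"]   # promo came after

def as_aggregate(events):
    """Order-blind view: just counts per event type."""
    return Counter(events)

def promo_influenced(events):
    """Order-aware view: did a promo occur before the purchase?"""
    seen_promo = False
    for ev in events:
        if ev == "promo":
            seen_promo = True
        if ev == "purchase":
            return seen_promo
    return False

print(as_aggregate(visitor_a) == as_aggregate(visitor_b))  # aggregates match
print(promo_influenced(visitor_a), promo_influenced(visitor_b))
```

The counts are identical; the answer to the business question is opposite. That gap between composition and sequence is where the standard toolbox starts to fail.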

I’ve seen this cast as a debate between machine learning and traditional analysis; it isn’t.

Machine learning may (though I think it’s debatable) be particularly useful in big data situations because of the symptoms (the four Vs) that spring from detailed stream-level analysis. As far as I can tell, there is nothing about detailed stream-level analysis itself that makes machine-learning particularly suitable. The really important point isn’t about machine learning – it’s that your standard analytics toolbox is mostly out the window.

So while defining big data by the four V’s may miss the mark, it’s far, far more misleading to suggest that big data is just “more of the same” – perhaps with a bit of an emphasis on the “more”. If proponents of the more-of-the-same view are claiming that we still need to decide how to structure data, how to join data, how to query data, and how to analyze data, then their claim is trivially true. Of course we do. But if they mean to claim that we should use the same methods to join, structure, query, and analyze the data as we always have in traditional transactional or BI systems, then they are flat-out wrong.

It’s nearly always a safe bet that opposing the hype cycle will make you at least half-right. But if the big data skeptics are half-right about what’s wrong with the hype, they are wholly wrong about the alternative.

[I should mention that I’m going to be speaking on a Big Data panel at the DAA Symposium in Washington DC on June 4th. The Symposiums are probably the single best thing the DAA has created – I’ve attended a fair number and been consistently impressed. If you’re in the area, do come out!]

Republished with author's permission from original post.

Gary Angel
Gary is the CEO of Digital Mortar. DM is the leading platform for in-store customer journey analytics. It provides near real-time reporting and analysis of how stores performed including full in-store funnel analysis, segmented customer journey analysis, staff evaluation and optimization, and compliance reporting. Prior to founding Digital Mortar, Gary led Ernst & Young's Digital Analytics practice. His previous company, Semphonic, was acquired by EY in 2013.

6 COMMENTS

  1. Hi Gary,

    Although I mostly agree with the arguments in your article, I wish to point out one thing – the definition of Big Data needs to be enhanced to incorporate one more V – value. All the analysis around data is worth being called Big if it creates value, else it’s junk. I have presented my arguments in a blog here -> http://www.customerthink.com/blog/what_actually_is_big_data_the_different_definitions

    Please see if you can spare a few moments to critique and improve my definition

    regards
    Abhishek

  2. Abhishek,

    I’m certainly on board with the necessity to deliver value. Indeed, I think most of us are saying pretty much the same thing – and I doubt that if you, and I, and the IBM or Oracle or Aster/Teradata big data teams sat down we’d have any deep disagreement. It’s really more a matter of perspective – I definitely come at the problem of big data (and definition) from an analyst’s perspective not a technology vendor, and I suspect you’re pretty much the same.

    That being said, this isn’t entirely a “difference that makes no difference is no difference” kind of issue. Definitions that position big data as “beyond what can be handled in traditional technology” leave open to skeptics the charge that big data is just more of the same and is more hype than reality.

    I’m not sure that adding Value to the equation avoids that risk – since that’s also a part of what we’ve always wanted in traditional data processing and analytics.

  3. I find “big data” a useful way to draw attention to new kinds of data, and the growth of data overall.

    I’m not convinced it will have a much longer life than “Social CRM,” however. For a few years it was all the rage, but as I expected/predicted, “social” has now become an integral part of CRM solutions.

    In the case of Big Data, I think the same will happen. Web data has been around since, um, web sites existed. Social and unstructured data is hardly new, either. Of course now there’s sensor data and more, and that’s the beauty of Big Data — it’s extensible!

    As for adding Value as one of the Vs, I agree with Gary. I don’t see how it differentiates Big Data from all the boring old Small Data. Value is something you should get out of any data/analytics, else what’s the point?

  4. Indeed, agree with you. The only reason I wanted to point that out is because there are many companies that are chasing Big Data for the sake of Big Data without having visibility on how much value it can create.
    Some retailers for example believe they can increase consumption by “predictive analytics” when there are so many other simpler problems they are getting wrong.
    Anyway, on the whole I agree with what you are saying

  5. Hi Bob,

    You and I are on the same page. I agree with you on the “new kind of data”. On the note of Social and Big Data, I believe that is a place of probable impact when it comes to Big Data. I have penned some thoughts in my post on that very matter -> http://www.customerthink.com/blog/social_media_big_data_declared_preferences_vs_discovered_preferences

    But whether Big Data gets subsumed under Social, or whether we indeed find unique ways to create value out of it – well, time will tell.

    But I think another point where Gary, you, and I agree is that there is fundamentally nothing new with data analytics that Big Data is proposing. Only maybe the data sources have evolved. Let’s see…

  6. I just attended Predictive Analytics World and visited with some of the vendors there. In particular I was trying to figure out what was different about “big data” analytics.

    In short, they convinced me that analytics tools designed to mine the newer forms of data (clickstream, social, … ) are different than traditional BI tools designed to work on relational data.

    That said, it’s probably only a matter of time before the analytics software industry responds, by creating their own solutions (SAS, IBM) or making acquisitions of the startups.
