One man’s trash is another man’s veracity



As many of you may know, Big Data has added a 4th ‘V’ to its definition: the concept of Veracity. Oh wait, apparently a 5th and 6th ‘V’ have been added as well, Value and Viability. A 7th? Not yet? It’s only a matter of time. I nominate vaticination.

The Four V's of Big Data

Since Veracity was suggested by IBM, I’ll focus on this one for now. Veracity is meant to represent the fact that Big Data contains a lot of noise and error that can easily obscure the ‘truth’ that exists within the data. The data tend to be so big and move so quickly that standard data cleaning and management procedures are difficult to apply. As a result, the trustworthiness of Big Data is an important factor to consider and to question.

However, we have to recognize that Veracity cannot be singularly defined for a given set of data. It depends on the question the data is answering and the purpose for which the data is being collected. For instance, let’s say I have a massive dataset of all the local deliveries made by a construction supply company. This dataset contains information on every order over the last year, and includes the products in each order, the customer who ordered it, the site it was delivered to, and whether the delivery was on-time. This data could have high veracity when answering a question about problem areas or problem products for on-time delivery. However, it will have lower veracity when answering a question about the next product a customer is going to order. This particular dataset will be missing information about will-call orders, returns, and detailed customer demographics – all things that would be important for creating a next-best offer model.
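To make the point concrete, here is a minimal sketch of checking the same dataset against two different questions. Everything in it is an assumption for illustration: the field names are loosely based on the delivery dataset described above, and the list of fields each question "requires" is my own rough guess, not a standard measure of veracity.

```python
# Hypothetical fields in the delivery dataset (illustrative names only).
AVAILABLE_FIELDS = {"order_id", "products", "customer", "delivery_site", "on_time"}

# Assumed fields each business question would need (illustrative only).
QUESTION_REQUIREMENTS = {
    "on_time_problem_areas": {"products", "delivery_site", "on_time"},
    "next_best_offer": {
        "products",
        "customer",
        "will_call_orders",
        "returns",
        "customer_demographics",
    },
}


def coverage(question, available=AVAILABLE_FIELDS):
    """Return (share of required fields present, sorted list of missing fields)."""
    required = QUESTION_REQUIREMENTS[question]
    missing = sorted(required - available)
    share_present = (len(required) - len(missing)) / len(required)
    return share_present, missing


for question in QUESTION_REQUIREMENTS:
    share, missing = coverage(question)
    print(f"{question}: {share:.0%} of required fields present, missing {missing}")
```

Field coverage is only a crude proxy, since noise and error within the fields matter too, but it shows how one dataset can be well suited to the on-time delivery question and poorly suited to the next-best-offer question.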

In my opinion, we spend too much time trying to define Big Data, and not enough time figuring out what data and analytic approaches are necessary to answer key business questions. Defining and scoping Big Data is a worthwhile discussion to have. Harnessing and understanding Big Data will lead to new questions and new answers for the world, but there are still plenty of big questions that need to be answered in the meantime using whatever size and type of data we have available.

Republished with author's permission from original post.

Troy Powell, Ph.D.
Troy consults on solutions to derive insights from customer information that optimize business performance. He has primary responsibility for deploying advanced analytics and developing innovative solutions for understanding and driving customer behavior. Troy has fifteen years of research experience across multiple disciplines for both academic and corporate organizations. Troy holds a Ph.D. from Duke University.

