Google Flu Trends: Importance of Veracity, the 4th “V” in Big Data



A lot has been written recently criticizing Google’s Flu Trends, a flu tracker service that uses aggregated Google search data for specific search terms to estimate current flu activity around the world in near real time. For more, read “How does this work?”

Science magazine recently published an article titled “The Parable of Google Flu: Traps in Big Data Analysis,” and Steve Lohr published a great piece on the Bits blog of The New York Times titled “Google Flu Trends: The Limits of Big Data.”

It is important to note that the over-estimation of flu activity in Google Flu Trends is NOT a limitation of Big Data or of the Analytics used to estimate flu activity, as some writers have suggested. Rather, it highlights the importance of the fourth “V” of Big Data: Veracity.

It is often said that Big Data has three defining attributes, the three Vs as they are called: Data Volume, Data Variety and Data Velocity (for more, check out the TDWI Best Practices Report titled “Big Data Analytics”). But this definition misses a very important dimension of Big Data: Data Veracity, that is, how accurate and trustworthy the data is.

I think Google Flu Trends estimates would be much more realistic if we incorporated Data Veracity, the fourth dimension of Big Data, into the estimation models and adjusted the estimates based on a “Veracity Score”.
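To make the idea concrete, here is a minimal sketch of what such an adjustment might look like. It is only an illustration under stated assumptions, not how Google Flu Trends works: the function name `adjust_estimate`, the 0-to-1 scale for the Veracity Score, the use of a separate baseline estimate, and the simple weighted blend are all hypothetical choices made for this example.

```python
def adjust_estimate(search_estimate: float, baseline: float, veracity_score: float) -> float:
    """Blend a search-based flu estimate with a baseline estimate.

    Assumption: veracity_score runs from 0 (search data not trusted at all)
    to 1 (search data fully trusted). The lower the score, the more the
    result leans on the baseline instead of the search-based number.
    """
    if not 0.0 <= veracity_score <= 1.0:
        raise ValueError("veracity_score must be between 0 and 1")
    return veracity_score * search_estimate + (1.0 - veracity_score) * baseline


# Made-up illustrative numbers, not real flu data.
print(adjust_estimate(search_estimate=10.0, baseline=6.0, veracity_score=0.5))  # 8.0
```

With a Veracity Score of 1 the search-based estimate is used unchanged, and with a score of 0 it is ignored in favor of the baseline. How the score itself should be measured is the harder question, and this sketch does not address it.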

In other words, the inaccurate estimates of flu activity reported by Google Flu Trends are NOT a limitation of Big Data or Analytics; rather, we need to incorporate the Data Veracity element into the estimation model.

What do you think? Do you agree that the inaccurate estimates of flu activity reported by Google Flu Trends are NOT a limitation of Big Data or Analytics?

Republished with author's permission from original post.

Harish Kotadia, Ph.D.
Dr. Harish Kotadia has more than twelve years' work experience as a hands-on CRM Program and Project Manager implementing CRM and Analytics solutions for Fortune 500 clients in the US. He also has about five years' work experience as a Research Executive in the Marketing Research and Consulting industry, working for leading MR organizations. Dr. Harish currently lives in Dallas, Texas, USA and works as a Practice Leader, Data Analytics and Big Data, at a US-based Global Consulting Company. Views and opinions expressed in this blog are his own.

