The Importance of Data Quality and How to Achieve It

While every organization recognizes the value of business data, there is still a lot of uncertainty and confusion about how to unlock its value to optimize business outcomes. One barrier is simply the sheer volume of customer data from an increasing number of sources; the velocity alone can be overwhelming. Complicating matters, data comes to the enterprise in multiple formats and from different entry systems, often lacks context, is subject to varying rules for use, and is frequently inaccessible in time to meet business needs.

The financial and operational costs can be staggering. In a recent study from independent research firm Vanson Bourne on the impact of data distrust, 91% of IT decision makers agree that they need to improve the quality of data in the organization, and 77% said they lack trust in their business data. Largely because of poor data quality, 76% said they are missing out on revenue opportunities. A few years ago, IBM famously pegged the lost opportunity of “bad data” at a whopping $3 trillion.


To learn more about why data quality is so confounding, and to explore what organizations can do to alleviate the common pain points, we invited Redpoint Global CTO George Corugedo and Redpoint CMSO John Nash to discuss the state of data quality. Redpoint Global is a software company based in Wellesley, Mass., experienced in solving enterprise challenges around customer data, actions and insights. It recently released Redpoint In Situ, a cloud-native, data quality as-a-service (DQaaS) solution with a mission to perfect customer data within the enterprise in a seamless and simple way.

John Nash, Redpoint CMSO: George, it seems like someone could ask 10 different companies why they struggle with data quality and receive 10 different answers. Data siloes, an unintegrated technology stack, update cycles, governance and data privacy concerns – the list goes on. Is there a common undercurrent that jumps out to you as one of the main reasons organizations struggle?

George Corugedo, Redpoint Global CTO: Aversion to complexity and a poor understanding of strategy. Mastering data – creating that idealized version of what your data should be – is complicated. It requires a detailed analysis of both the data that is available and its ability to support the brand’s strategy. Unfortunately, most companies don’t have a clearly defined strategy against which to apply the data, and it is virtually impossible to master your data if you don’t know the purpose you’re mastering it for. Getting this right requires some really hard questions, detailed analysis and rationalization, but the pace at which most companies typically run makes it extremely difficult, if not impossible, to slow down and do the required analysis to determine how best to leverage and monetize one’s data. The technology for processing data exists and has been in the market for many years. It’s the will and the purpose that are missing.

Even though a majority of organizations do struggle with data quality, the good news is that most also recognize the critical need to harness business data to realize its value and, ultimately, monetize it. Every businessperson shares this ambition, but it’s hard to unlock that potential without the skill and technical expertise to perfect your customer data accurately and consistently. Though the technology exists to address the problem, for the most part there is a lack of skills, capabilities and expertise in sufficient numbers to make a dent in such a widespread problem. That’s what I see as the main stumbling block. The market is ripe for disruption, and there’s a lot of opportunity for organizations that find a better way.

JN: It’s difficult to know the solution without understanding the problem. What can organizations do today to at least get on the right path and shed some of the complexity?

GC: The first thing they should do is define a strategy. In the absence of a clear brand strategy, at least define specific use cases where data can be used to drive new streams of revenue or improve established ones. By defining these use cases, those responsible for mastering the data now have an explicit set of criteria and actions they need to support with the data. This then translates into prioritizing data elements, defining transformations and KPIs and identifying gaps in the data that will need to be filled to support the use cases. The good news is that the mechanics of getting the data cleaned up are well understood and straightforward.

Comprehensive data quality entails three main steps, and all three require attention to enable the monetization of business data through bold, data-driven actions and aggressive strategies. First, data must be translated into a common lexicon across the enterprise. Second, every PII element must be standardized. Third, advanced algorithms are needed to probabilistically bring together data records from all data sources into a group that can be used to construct a golden record of complete, accurate and timely data about any entity the business is focused on, be it a customer, a household, manufacturing parts or financial entities.
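As a concrete illustration of the second step, the snippet below sketches what PII standardization might look like in practice. It is a minimal sketch, not Redpoint’s implementation; the field names (`name`, `email`, `phone`) and normalization rules are illustrative assumptions.

```python
import re

def standardize(record: dict) -> dict:
    """Normalize common PII elements into one canonical form.

    Hypothetical field names and rules, for illustration only.
    """
    out = {}
    # Names: collapse internal whitespace, trim, title-case
    name = re.sub(r"\s+", " ", record.get("name", "")).strip()
    out["name"] = name.title()
    # Emails: trim and lowercase
    out["email"] = record.get("email", "").strip().lower()
    # Phones: keep digits only, then the last 10 (drops a leading country code)
    digits = re.sub(r"\D", "", record.get("phone", ""))
    out["phone"] = digits[-10:] if len(digits) >= 10 else digits
    return out

raw = {"name": "  jane   DOE ", "email": " Jane.Doe@Example.COM ", "phone": "+1 (617) 555-0100"}
print(standardize(raw))
# {'name': 'Jane Doe', 'email': 'jane.doe@example.com', 'phone': '6175550100'}
```

Once every source emits the same canonical shapes, the matching step that follows can compare like with like.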

An abundance of data, data sources, various systems, and data that is siloed by channel, department or processes make those three steps more difficult than they need to be. Just think about how hard it is for most companies to reach a consensus on what data means, whether it’s a question of standardization or a business definition or even agreeing on an apples-to-apples comparison.

But data quality depends on completing those three steps with a high degree of competence in order to link the data, which is the final step required to actually unlock the value of business data. Data linkage takes the harmonized formats and lexicons and the standardized PII elements and, through AI-based algorithms and other techniques, matches all data about a person or entity to a record. In situations where you have a consumer and a brand – with marketing as the prime example – that record is a real-time snapshot of a person’s history, interests, likely behaviors, utilization and preferences. Whenever the person appears on a channel, physical or digital, a brand is primed to deliver relevant content, a next-best action that is perfectly in sync with a customer’s journey and optimized for the channel of engagement.
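To make the linkage step concrete, here is a hedged sketch of probabilistic matching: score candidate record pairs on their standardized PII fields and, above a threshold, merge them into a golden record. The field weights, the 0.8 threshold, the use of string-similarity ratios, and the merge rule (prefer the longest non-empty value) are all illustrative assumptions, not a description of Redpoint’s algorithms.

```python
from difflib import SequenceMatcher

# Illustrative field weights for scoring candidate pairs.
WEIGHTS = {"name": 0.4, "email": 0.4, "phone": 0.2}

def match_score(rec1: dict, rec2: dict) -> float:
    """Weighted similarity over the fields where both records have a value."""
    total, weight_used = 0.0, 0.0
    for field, weight in WEIGHTS.items():
        a, b = rec1.get(field, ""), rec2.get(field, "")
        if a and b:
            total += weight * SequenceMatcher(None, a, b).ratio()
            weight_used += weight
    return total / weight_used if weight_used else 0.0

def golden_record(records):
    """Merge linked records, preferring the longest non-empty value per field."""
    golden = {}
    for rec in records:
        for field, value in rec.items():
            if value and len(value) > len(golden.get(field, "")):
                golden[field] = value
    return golden

crm = {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": ""}
web = {"name": "Jane A. Doe", "email": "jane.doe@example.com", "phone": "6175550100"}

if match_score(crm, web) >= 0.8:  # link only when the score clears the threshold
    print(golden_record([crm, web]))
    # {'name': 'Jane A. Doe', 'email': 'jane.doe@example.com', 'phone': '6175550100'}
```

Production systems replace the simple similarity ratio with trained models and blocking strategies, but the shape of the computation – score, threshold, merge – is the same.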

Comprehensive data quality fuels a virtuous cycle of data, quality and action that breaks free from the technology, channel and process siloes that prevent the business from harnessing the true power of business data.

JN: You mentioned that the skills and capabilities needed to wield existing tools and technology are inadequate to drive that virtuous cycle. What are some of today’s approaches to data quality that attempt to power through this deficiency?

GC: Deception and distraction. Many companies claiming to do this type of work are literally deceiving and distracting customers away from the important topics of data quality and identity resolution. Using gamified UIs, buzzwords and hype, they set out to confuse the issue rather than clarify it. This is largely how most companies in this space “power through it.”

If you think of capability as the combination of skills and technology, the capabilities for mastering data are indeed insufficient for the size of the problem in the market. However, the technology is there and has been for many years, so this is a skills and resource supply problem that fails to address the unmet need for data quality. Given the velocity of data growth and the overall pace of business, it’s hard to imagine how that skills gap gets filled. Thanks to office supply giant Staples, everybody understands the concept of the easy button. That’s what everyone wants, expects and is continually searching for: the data “easy button.”

Because it’s hard to do it right, many solutions offer shortcuts or other workarounds that either shift the problem downstream or ignore it altogether. Data transformations are a good example. Many vendors promote a data architecture that separates storage and compute – a holdover from the Hadoop “Big Data” hype – which promises fast data ingestion without “cumbersome” transformations. But transformations are only cumbersome if a lack of skills or technology makes them so. In addition, high-speed ingest into a data lake is fine, but data lakes are awful for managing quality and data governance. Some vendors promote an easy-to-use transformation tool, but a closer look shows that it falls outside the sphere of data governance. What happens then is that transformations either must be built by someone else upstream, or inconsistencies develop because users who access the transformation tool may have access to different data, which makes it difficult to manage transformations consistently. They are all just poking at the problem, not embracing and solving it. In the meantime, they go to market promising “the easy button” and end up delivering nothing but disappointment and failure.

Other approaches include sending your customer data out to a data quality and identity resolution vendor. These are typically SaaS organizations, essentially cloud versions of a managed services provider. These companies require you to send them your data to join it up against a reference file built over years and years of data processing. In addition to the inherent risk of sending out your customer data, the results of such a process are never as good as the sales pitch. The reference files go stale quickly, over-matching is always a problem because vendors are paid on matched records, and it takes time – lots of time. Often, these processing cycles have a latency of days, or at the very least tens of hours. This kind of latency is completely inconsistent with the idea of providing your customers a high-quality omnichannel customer experience that aligns with your brand values. Customers have been won or lost by the time you get the data back.

JN: Lastly, George, data privacy is obviously a very hot topic right now. With the phasing out of the third-party cookie, we’re seeing brands and organizations focus more on first-party data. How does data privacy tie into data quality, and what can enterprises do to ensure that one is not sacrificed for the other?

GC: That’s a great question, John, and the forthcoming privacy requirements are really the buzz in the market. Most companies are in a bit of a panic that they will no longer be able to acquire audiences as easily as they used to from the DMPs and others. However, in an interesting way, there are more opportunities here than restrictions. While purchasing audiences will go away, a new context for engaging your customers directly is emerging. In the coming world where a customer’s personal data must be requested directly from the customer, for limited use, in a specific window of time, there is an opportunity to exchange value and establish trust. Now, trust may be a somewhat ephemeral word, yet in recent surveys of consumers’ trust in the brands they buy from, only 46% claim to trust the brands they do business with. This means that more than half of the consumers in the United States have little brand trust or loyalty. I see this as an enormous greenfield opportunity for brands to earn those customers’ trust and secure them as loyal, engaged supporters of the brand. I can’t recall a reset as potentially massive as this one could be. Yet it’s all about trust; as McKinsey calls it, we are entering the “Trust Economy.”

The other complexity for many of the “pseudo” data quality vendors, particularly those using reference files, is that consumers will be validating their own data in exchange for some value. If they do that, there is simply no need for an external reference file. What will be needed are tools that can immediately ingest and probabilistically match new data against existing records, to prevent the creation of duplicate records and make sure addresses and other contact information are correct. What is needed is functionality at the edge, not a “Vlookup” table that takes two days to process. In addition, I would ask consumers to think about the last time they opted in to have their data used in a reference file. Most likely they don’t remember, but probably opted in as part of some privacy policy they accepted but never read. It just so happens that it’s exactly this dynamic of undisclosed uses of personal data that privacy legislation is going after and making illegal.
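The ingest-time matching described here can be sketched as below, simplified to deterministic lookups on standardized email and phone keys rather than full probabilistic scoring; the `CustomerStore` class, its key choice and its merge behavior are illustrative assumptions, not Redpoint’s API.

```python
class CustomerStore:
    """Toy in-memory store that matches new data on ingest to avoid duplicates."""

    def __init__(self):
        self.records = []   # list of customer dicts
        self.index = {}     # (field, value) -> position in self.records

    def _keys(self, rec):
        # Standardized identifying keys; a real system would use more.
        for field in ("email", "phone"):
            if rec.get(field):
                yield (field, rec[field])

    def ingest(self, rec):
        # Probe the index; a hit means this is an existing customer.
        for key in self._keys(rec):
            if key in self.index:
                pos = self.index[key]
                # Merge non-empty values into the existing record...
                self.records[pos].update({k: v for k, v in rec.items() if v})
                # ...and index any newly learned keys.
                for key2 in self._keys(self.records[pos]):
                    self.index[key2] = pos
                return pos
        # No hit: create a new record and index its keys.
        self.records.append(dict(rec))
        pos = len(self.records) - 1
        for key in self._keys(rec):
            self.index[key] = pos
        return pos

store = CustomerStore()
store.ingest({"email": "jane.doe@example.com", "phone": ""})
store.ingest({"email": "jane.doe@example.com", "phone": "6175550100"})
print(len(store.records))  # 1 – the second event updated the first record
```

The point of the sketch is the latency profile: the duplicate check happens at the moment of ingest, in memory, rather than in a batch cycle measured in days.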

The bottom line is that unless an “easy button” presents itself, the market will see disruption. The unmet need is so great and so valuable that any vendor who can provide instantaneous data quality where the data resides, in real time and without depending on deep data quality and data matching skills, will cause a significant disruption in this market and change the dysfunctional dynamics between business and data forever.
