Neither Your CDP Nor Your MDM Decides Who Your Customer Is

A man applies for a mortgage and is turned down. The reason, when he finally drags it out from the mortgage company, is a defaulted loan he never took out. The debt belongs to a stranger who happens to share his first and last name. Somewhere in the machinery that assembles credit reports, two people had been folded into one. No human had ever decided they were the same person. A piece of software decided it, on the strength of a matching name, and moved on.

This kind of error is common enough to have a bureaucratic label, the “mixed file,” and common enough that the Consumer Financial Protection Bureau felt obliged to say the obvious out loud. Matching information to a person on the basis of name alone, the bureau advised in 2021, “is not using reasonable procedures to assure maximum possible accuracy.” Names are not identities. And yet, across a surprising amount of enterprise software, a matching name plus a rough resemblance is exactly how the question of identity gets answered.

I spend my working life on that question, and I have come to believe it is the most quietly consequential problem in customer data. Every company wants the so-called “single customer view”: one trustworthy picture of each person it serves, assembled from the dozens of systems that each hold a fragment. Marketing has its version, support has another, billing a third. The promise of a unified view is that these fragments resolve into a person. The difficulty is that “resolve” is carrying an enormous amount of weight in that sentence, and most organizations have never decided who, or what, actually does it.

The map everyone half-draws

Ask a data leader how they reach that single customer view and you will usually be handed a familiar choice: the customer data platform or the master data management system. The two camps have debated their merits for a decade. Years ago, on this site, the analyst David Raab drew the clearest map of the difference. A CDP, he wrote, is built for customer data and bought mostly by marketers, pulling information into a usable profile. MDM is a general-purpose discipline owned by IT, maintaining a governed “golden record” and a master ID that connected systems are required to use.

It is a good map. But buried in Raab’s own description is a line most readers skim past. Both the CDP and MDM, he noted, “require identity resolution.” Both rest on a prior step. Before a CDP can unify a profile or an MDM can govern a golden record, something has to decide which scattered records belong to the same human being. That decision is its own discipline. It is not really a feature of the CDP, and it is not really a feature of the MDM. It sits between them and feeds both. It is the box the map leaves off.

What matching actually is

The technical name for this missing box is entity resolution, sometimes referred to as record linkage, and the reason it is hard is that human data is messy in ways that defeat the obvious methods. The obvious method is to match on exact keys: same email, same account number, same national ID. When those keys exist and agree, the work is trivial. They often do not exist. People mistype. They marry and change names. They give a work address on one form and a home address on another, a nickname in one place and a legal name in the next. Most people have multiple email addresses. The same person arrives as “Bob,” “Robert,” and “Rob.” A system that joins only on exact keys is blind to all of it.
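To make that blindness concrete, here is a minimal sketch of an exact-key join. The record layout, the field names (`email`, `account_id`) and the sample values are all hypothetical, chosen only to illustrate the point; no particular product works exactly this way.

```python
def exact_key_match(a, b, keys=("email", "account_id")):
    """Return True only if some key is present in both records and identical."""
    for key in keys:
        va, vb = a.get(key), b.get(key)
        if va is not None and vb is not None and va == vb:
            return True
    return False


# Three hypothetical records that a human would read as the same person.
crm = {"name": "Robert Smith", "email": "robert.smith@example.com", "account_id": None}
support = {"name": "Rob Smith", "email": "robert.smith@example.com", "account_id": None}
billing = {"name": "Bob Smith", "email": "bsmith@work.example", "account_id": "A-1001"}

# The shared email links the CRM and support records.
crm_support = exact_key_match(crm, support)

# The billing record uses a work email and a nickname, so no key agrees
# and the join leaves it as a separate "customer".
crm_billing = exact_key_match(crm, billing)
```

In this sketch the first pair links and the second does not, even though nothing about the billing record contradicts the other two. The join simply has no way to weigh the evidence it does have.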

This is why the discipline has two halves rather than one. Deterministic matching handles the exact cases. Probabilistic, or fuzzy, matching handles everything else, by weighing partial evidence. As the record-linkage specialist Robin Linacre describes it, the fact that two records share the name “John Smith” is “evidence that these two records may refer to the same person, but this evidence is inconclusive because it’s possible there are two different John Smiths.” Good matching quantifies that evidence, for and against, and reaches a judgment. It behaves less like a database join and more like a careful person reasoning under uncertainty. Statisticians have studied this for decades. The empirical comparisons summarized in the survey “(Almost) all of entity resolution” show probabilistic methods consistently outperforming exact-match rules on noisy data, and the United States Census Bureau treats record linkage as a core statistical activity, because the integrity of a national count depends on it.
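One common way to quantify evidence for and against a match is the Fellegi-Sunter style of scoring, in which each field contributes a weight of log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees. Here m is the chance the field agrees for a true match and u is the chance it agrees for two different people. The sketch below is illustrative only: the field list and every m and u value are assumptions I have made up for the example, not estimates from real data.

```python
import math

# Hypothetical (m, u) values per field, for illustration only.
# m: P(field agrees | same person), u: P(field agrees | different people)
FIELD_PARAMS = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.01),
    "postcode": (0.85, 0.001),
    "birth_year": (0.98, 0.02),
}


def field_weight(field, agrees):
    """Log2 evidence contributed by one field agreeing or disagreeing."""
    m, u = FIELD_PARAMS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(a, b):
    """Sum the evidence over fields; a missing value adds no evidence either way."""
    total = 0.0
    for field in FIELD_PARAMS:
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue
        total += field_weight(field, va == vb)
    return total


john_a = {"first_name": "John", "last_name": "Smith", "postcode": "1010", "birth_year": 1980}
john_b = {"first_name": "John", "last_name": "Smith", "postcode": "1010", "birth_year": 1980}
john_c = {"first_name": "John", "last_name": "Smith", "postcode": "8020", "birth_year": 1962}
john_d = {"first_name": "John", "last_name": "Smith"}

strong = match_weight(john_a, john_b)       # everything agrees
name_only = match_weight(john_a, john_d)    # only the name can be compared
conflicting = match_weight(john_a, john_c)  # name agrees, postcode and birth year do not
```

With these assumed values, the fully agreeing pair scores highest, the name-only pair sits in the middle, and the pair with a conflicting postcode and birth year scores lowest. That ordering is the point: a shared name counts for something, but it is neither the whole answer nor immune to contrary evidence. Where to draw the threshold between "same" and "different" is a separate decision that depends on how costly each kind of error is.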

None of this is exotic. It is a step. And the question every leader should be able to answer is simple: where, in our stack, is that step performed, and how well?

The quiet failures

When the step is assumed rather than owned, the failures are rarely dramatic. They are quiet, and they compound.

Assume the CDP handles identity. Most platforms stitch profiles on exact keys, which is fast and clean and works right up until it does not. The fuzzy cases fall through: the typos, the variants, the missing identifiers. One customer becomes two profiles, or three, each looking unified and each incomplete. Marketing trusts the picture and acts on it, and the same person is counted three times, emailed three times, and understood not at all.

Assume the MDM handles identity. Its strength is governance: one master record, carefully reconciled. But that strength carries an assumption of stability and a batch cadence. The golden record is cleaned on a schedule, and the rigid master ID, as Raab’s own sources observed, expects every system to fall in line. Real customer data is not stable and does not wait for the schedule. By the time the overnight reconciliation runs, the customer has already been served, or mis-served.

The cost is not theoretical, though it is usually invisible on any single dashboard. Gartner has estimated that poor data quality costs organizations an average of $12.9 million a year. In MIT Sloan Management Review, the data-quality expert Thomas Redman put the figure at 15 to 25 percent of revenue for most companies. A 2026 IBM analysis found that more than a quarter of organizations believe they lose over $5 million a year to poor data quality, with 7 percent losing more than $25 million. A meaningful share of that waste is not exotic corruption. It is the same person, counted twice.

A publisher in Vienna

Recently, I worked with a large Austrian news publisher, the company behind the Kronen Zeitung and the Kurier, as it tried to build one view of its readers across multiple product lines. There was no shared identifier across its systems, so the same subscriber existed as several different people, and the combined data was not reliable enough to act on. The remedy was not a new platform. It was making the resolution step explicit: removing duplicates across sources, reconciling the spelling variants, and assigning a single cross-system identifier. The unique identification of individuals improved by more than 24 percent, and only then did the analytics and the AI work that depended on knowing the customer become viable. The lesson had nothing to do with any one tool. It was that the step had been there all along, unowned, and naming it changed the result.
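The last of those steps, assigning a single cross-system identifier, can be sketched generically. Once pairs of records have been judged to be the same person, grouping them is a connected-components problem, and a union-find structure is one standard way to solve it. The code below is not the publisher's implementation; the record IDs, the `assign_entity_ids` function and the choice of the smallest ID as the group's identifier are all assumptions made for illustration.

```python
def assign_entity_ids(record_ids, matched_pairs):
    """Group records linked by pairwise matches and give each group one shared ID."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the root of the group, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the root so the result is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {rid: find(rid) for rid in record_ids}


# Hypothetical record IDs from three source systems.
records = ["crm:17", "billing:204", "app:9", "crm:18"]

# Pairs that some earlier matching step judged to be the same person.
pairs = [("crm:17", "billing:204"), ("billing:204", "app:9")]

entity_ids = assign_entity_ids(records, pairs)
```

Note that `crm:17` and `app:9` were never compared directly, yet they end up under the same identifier because each matched `billing:204`. That transitivity is useful, and it is also why a single bad pairwise match can fold two people into one, which is how a mixed file comes about.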

Why it suddenly matters more

For most of its history this was a back-office concern. A duplicate inflated a report or wasted a mailing, and someone tidied it up the following quarter. There was a human between the data and any real consequence.

That buffer is vanishing. AI agents now read a customer profile and act on it in the same motion, with no person in between. Gartner predicts that by 2029 agentic AI will autonomously resolve 80 percent of common customer service issues, and the firm is explicit that these systems “act autonomously to complete tasks,” from canceling a membership to issuing a refund. When the identity beneath that action is wrong or stale, the agent does not hesitate. It acts, with confidence, on the wrong person.

We already have a preview of how the courts will see this. When an Air Canada chatbot gave a grieving passenger incorrect advice about bereavement fares, the airline argued that the chatbot was, in effect, responsible for itself. A British Columbia tribunal was unpersuaded. “It should be obvious to Air Canada that it is responsible for all the information on its website,” the decision read. “It makes no difference whether the information comes from a static page or a chatbot.” The airline was held liable for negligent misrepresentation. The principle reaches well past chatbots: a company owns what its automated systems do, including what they do to the wrong customer. McKinsey’s most recent global survey found that nearly a third of organizations have already seen consequences from AI inaccuracy. Who the agent thinks it is talking to is part of that accuracy.

A short test

The point of all this is not to add a box to a diagram or to begin a migration. The resolution step sits next to the CDP and the MDM you already run. It feeds them; it does not replace them. The point is to stop leaving the decision to chance between systems that each assume someone else has made it. If you want to know whether you own that step or only assume it, these questions are a place to start.

  • Can you point to where, in your stack, the decision that two records are the same person actually gets made, or does each system quietly decide on its own?
  • Does your matching weigh the messy real-world cases, the typos and name variants and missing identifiers, or does it only join on exact keys?
  • Is resolution applied as data comes together, or is it hoped for later, at report time?
  • When the same person appears in your CDP, your CRM, and your MDM, do they reconcile to one entity, and can you prove it?
  • Can you explain why two records were merged, or kept apart? Is the decision auditable, or a black box?
  • Who owns this, by name? If the answer is no one, it is being decided by default.
  • When a system, or an AI agent, acts on a customer in the moment, is it working from the freshest resolved view, or from last night’s batch?

Most companies, asked these questions, find that the most important decision in their customer data is the one nobody is making on purpose. The work begins where the man denied his mortgage wishes it had begun: with someone deciding, deliberately and well, who is who.

Sources

  1. Consumer Financial Protection Bureau, “Fair Credit Reporting; Name-Only Matching Procedures” (advisory opinion, November 2021).
  2. David Raab, “Customer Data Platforms vs Master Data Management: How They Differ,” CustomerThink.
  3. Robin Linacre, “An Introduction to Probabilistic Record Linkage.”
  4. Olivier Binette and Rebecca C. Steorts, “(Almost) all of entity resolution,” Science Advances, 2022.
  5. U.S. Census Bureau, “Record Linkage.”
  6. Gartner, “Data Quality: Why It Matters and How to Achieve It.”
  7. Thomas C. Redman, “Seizing Opportunity in Data Quality,” MIT Sloan Management Review.
  8. IBM, “The True Cost of Poor Data Quality,” 2026.
  9. Data Institute, “Case Study: 24% Better Data Quality (MediaPrint).”
  10. Gartner, “Gartner Predicts Agentic AI Will Autonomously Resolve 80% of Common Customer Service Issues Without Human Intervention by 2029,” press release, March 5, 2025.
  11. CBC News, “Air Canada found liable for chatbot’s bad advice on bereavement rates.”
  12. American Bar Association, “BC Tribunal Confirms Companies Remain Liable for Information Provided by AI Chatbot,” Business Law Today, February 2024.
  13. McKinsey & Company, “The State of AI,” global survey.

Steven Renwick, Ph.D. MBA
Steven is the CEO and founder of Tilores.io, where he works on real-time customer entity resolution. He has a PhD from UCL and an MBA from the University of Oxford.
