I’ve been working for months to find a way to help marketers understand the differences between Customer Data Platform vendors. After several trial balloons and with considerable help from industry friends, I recently published a set of criteria that I think will do the job. You can see the full explanation on the CDP Institute blog. But since this blog has its own readership, I figured I’d post the basics here as well.
The primary goal is to give marketers a relatively easy way to decide which CDPs are likely to meet their needs. To do this I’ve come up with a small list of features that relate directly to working with particular data sources and supporting particular applications. The theory is that marketers know what sources and applications they need to support, even if they’re not experts in the fine points of CDP technology.
In other words, read these items as meaning: if you want your CDP to support [this data type or application] then it should have [this feature].
Obviously this list covers just a tiny fraction of all possible CDP features. It’s up to marketers to dig into the details of each system to determine how well it supports their specific needs. We have detailed lists of CDP features in the Evaluation section of the CDP Institute Library.
The final list also includes a few features that are present in all CDPs (or, more precisely, in all systems that I consider a CDP; we can’t control what vendors say about themselves). These are included because there’s still some confusion about how CDPs differ from other types of systems.
Now that the list is set, the next step is to research which features are actually present in which vendors and publish the results. That will take a while but when it’s done I’ll certainly announce it here.
Here’s the list:
Shared CDP Features: Every CDP does all of these. Non-CDPs may or may not.
- Retain original detail. The system stores data with all the detail provided when it was loaded. This means all details associated with purchase transactions, promotion history, Web browsing logs, changes to personal data, etc. Inputs might be physically reformatted when they’re loaded into the CDP but can be reconstructed if needed.
- Persistent data. The system retains the input data as long as the customer chooses. (This is implied by the previous item but is listed separately to simplify comparison with non-CDP systems.)
- Individual detail. The system can access all detailed data associated with each person. (This is also implied by the first item but is a critical difference from systems that only store and access segment tags on customer records.)
- Vendor-neutral access. All stored data can be exposed to any external system, not only components of the vendor’s own suite. Exposing particular items might require some setup, and access is not necessarily through a real-time query.
- Manage Personally Identifiable Information (PII). The system manages Personally Identifiable Information such as name, address, email, and phone number. PII is subject to privacy and security regulations that vary based on data type, location, permissions, and other factors.
Differentiating CDP Features: A CDP doesn’t have to do any of these although many do some and some do many. These are divided into three subclasses: data management, analytics, and customer engagement.
Data Management. These are features that gather, assemble, and expose the CDP data.
Base Features. These apply to all types of data.
- API/query access. External systems can access CDP data via an API or standard query language such as SQL. It’s just barely acceptable for a CDP to not offer this function and instead provide access through data extracts. But API or query access is much preferred and usually available. API or query access often requires some intermediate configuration, reformatting, or indexing to expose items within the CDP’s primary data store. Those are important details that buyers must explore separately.
- Persistent ID. The system assigns each person an internal identifier and maintains it over time despite changes or multiple versions of other identifiers, such as email address or phone number. This allows the CDP to maintain individual history over time, even when source systems might discard old identifiers. CDPs that use a persistent ID applied outside of the system do not meet this requirement.
- Deterministic match (a.k.a. “identity stitching”). The system can store multiple identifiers known to belong to the same person and link them to a shared ID (usually the persistent ID). This enables the system to connect identifiers indirectly: for example, if an email linked to an account is opened on a particular device, subsequent activity on that device can also be linked to the account.
- Probabilistic match (a.k.a. “cross device match”). The system can apply statistical methods and rules to identify multiple devices used by the same person, such as computers, tablets, smart phones, and home appliances. While many CDPs rely on third party services for this sort of matching, this item refers only to matching done by the CDP itself.
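The persistent ID and deterministic match items can be illustrated with a small union-find structure. This is a minimal sketch, not any vendor's method: the `IdentityGraph` class, the `type:value` identifier strings, and the sequential `P000001`-style IDs are all hypothetical. When two identifiers known to belong together are linked, the older persistent ID survives, so history stays attached to it even as emails or devices change. Probabilistic (cross-device) matching is not shown; it would add statistical scoring on top of links like these.

```python
import itertools


class IdentityGraph:
    """Toy identity graph: links identifiers known to belong to the same person
    and maps each group to one persistent internal ID (hypothetical design)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}   # union-find parent pointers
        self._pid: dict[str, str] = {}      # root identifier -> persistent ID
        self._counter = itertools.count(1)  # sequential, so older IDs sort first

    def _find(self, ident: str) -> str:
        # First sighting of an identifier: give it its own persistent ID.
        if ident not in self._parent:
            self._parent[ident] = ident
            self._pid[ident] = f"P{next(self._counter):06d}"
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression keeps later lookups fast.
        while self._parent[ident] != root:
            nxt = self._parent[ident]
            self._parent[ident] = root
            ident = nxt
        return root

    def link(self, a: str, b: str) -> str:
        """Record a deterministic link (e.g. a login or a clicked email)."""
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return self._pid[ra]
        # Keep the older persistent ID so existing history stays attached to it.
        keep, drop = (ra, rb) if self._pid[ra] < self._pid[rb] else (rb, ra)
        self._parent[drop] = keep
        del self._pid[drop]
        return self._pid[keep]

    def persistent_id(self, ident: str) -> str:
        return self._pid[self._find(ident)]


g = IdentityGraph()
g.link("email:ann@example.com", "account:A-17")      # email known for the account
g.link("email:ann@example.com", "device:d-9f2")      # that email opened on a device
g.link("email:ann.new@example.com", "account:A-17")  # customer changes email
print(g.persistent_id("device:d-9f2"))  # P000001
```

The device was never seen with the account directly, but it still resolves to the account's persistent ID through the shared email, which is the indirect connection described above.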
Unstructured and Semi-Structured Data. This refers to loading data from unstructured or semi-structured sources such as Web logs, social media comments, voice, video, or images. These are typically managed with “big data” technologies such as Hadoop. Nearly all CDPs use some version of this technology, but it’s only essential if clients have unstructured or semi-structured sources and/or very high data volumes. Some CDPs handle very high data volumes in structured databases such as Amazon Redshift.
- JSON load. The system can accept and store data through JSON feeds without the user specifying in advance the specific attributes that will be included. Additional configuration may later be required to access this data. There are some alternatives to JSON that offer similar capabilities.
- Schema-free data store. The system uses a data store that does not require advance specification of the elements to be stored. Examples include Hadoop, Cassandra, MongoDB, and Neo4j.
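To illustrate what loading JSON "without specifying attributes in advance" means, here is a toy in-memory stand-in for a schema-free store. It assumes each JSON record carries a `customer_id` field (a hypothetical convention); everything else is stored exactly as received, and a dotted path such as `cart.total` is resolved only when someone asks for it, which plays the role of the later configuration mentioned above. A real system would sit on something like Hadoop or MongoDB rather than a Python dict.

```python
import json
from collections import defaultdict
from typing import Any


class SchemaFreeStore:
    """Toy schema-free store: records are kept as received, grouped by
    customer, with no attributes declared up front."""

    def __init__(self) -> None:
        self._events: dict[str, list[dict[str, Any]]] = defaultdict(list)

    def load(self, raw_json: str, id_field: str = "customer_id") -> None:
        record = json.loads(raw_json)
        self._events[record[id_field]].append(record)

    def attribute_values(self, customer_id: str, path: str) -> list[Any]:
        """Extract a possibly nested attribute (e.g. 'cart.total') on demand."""
        values = []
        for event in self._events.get(customer_id, []):
            node: Any = event
            for key in path.split("."):
                if not isinstance(node, dict) or key not in node:
                    break  # this event doesn't carry the attribute
                node = node[key]
            else:
                values.append(node)
        return values


store = SchemaFreeStore()
store.load('{"customer_id": "c1", "type": "page_view", "url": "/shoes"}')
store.load('{"customer_id": "c1", "type": "cart", "cart": {"total": 79.5, "items": 2}}')
print(store.attribute_values("c1", "cart.total"))  # [79.5]
```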
Web Site. This refers to interactions with the company’s own Web site, whether on a desktop computer or mobile device.
- Cookie management. The system can deploy and maintain Web browser cookies associated with the client’s own Web site. The cookies can be linked to customer records in the CDP database.
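A minimal sketch of first-party cookie management using Python's standard `http.cookies` module. The cookie name `cdp_vid`, the one-year lifetime, and the `visitor_to_customer` lookup table are illustrative assumptions; the point is that an anonymous visitor ID is issued once, read back on later visits, and linked to a customer record when the visitor identifies themselves.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "cdp_vid"                   # hypothetical first-party cookie name
ONE_YEAR = 60 * 60 * 24 * 365             # illustrative lifetime in seconds
visitor_to_customer: dict[str, str] = {}  # cookie ID -> CDP customer ID


def ensure_visitor_cookie(cookie_header: str) -> tuple[str, Optional[str]]:
    """Return (visitor_id, Set-Cookie value), or (visitor_id, None) if the
    browser already sent our cookie."""
    incoming = SimpleCookie(cookie_header)
    if COOKIE_NAME in incoming:
        return incoming[COOKIE_NAME].value, None
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = ONE_YEAR
    out[COOKIE_NAME]["secure"] = True
    out[COOKIE_NAME]["httponly"] = True
    return visitor_id, out[COOKIE_NAME].OutputString()


def link_visitor(visitor_id: str, customer_id: str) -> None:
    """Attach an anonymous visitor to a known customer record."""
    visitor_to_customer[visitor_id] = customer_id


vid, set_cookie = ensure_visitor_cookie("")                   # first visit: issue cookie
same_vid, _ = ensure_visitor_cookie(f"{COOKIE_NAME}={vid}")  # return visit: reuse it
link_visitor(vid, "P000001")                                  # e.g. after the visitor logs in
```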
Mobile Apps. This refers to interactions with mobile apps created by the company.
- SDK load. The system offers a Software Development Kit (SDK) that can load data from a mobile app into the CDP database. It must be able to associate the data with individual customers in the CDP database. This is usually done through an app ID. Other SDK features such as message delivery are not a requirement for this item.
Display Ads. This refers to interactions through display advertising networks, including social media networks.
- Audience API. The system has an API that can send customer lists from the CDP to systems that will use them as advertising audiences. The receiving systems might be Data Management Platforms, Demand Side Platforms, advertising exchanges, social media publishers, or others. Ability to receive information back from the advertising systems is not a requirement for this item.
- Cookie synch. The CDP can match its own cookie IDs with third party cookie IDs to allow the marketer to enrich profiles with external data or reach users through advertising networks.
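The audience API and cookie synch items can be sketched together. The code below builds upload payloads for a hypothetical advertising endpoint: customers with an email are sent as normalized SHA-256 hashes (a common convention for customer-list uploads, though each platform sets its own rules), and customers whose CDP cookie has been synched with a partner are sent under the partner's ID. The `cookie_sync` table, the field names, and the payload shape are assumptions, and nothing is actually transmitted.

```python
import hashlib
import json
from typing import Iterator

# Hypothetical cookie-sync table: CDP visitor ID -> ad partner's user ID,
# filled in when a partner's sync redirect reports its own ID for our visitor.
cookie_sync: dict[str, str] = {}


def record_sync(cdp_visitor_id: str, partner_user_id: str) -> None:
    cookie_sync[cdp_visitor_id] = partner_user_id


def hash_email(email: str) -> str:
    # Normalize (trim, lowercase) before hashing so the same address always
    # produces the same digest on both sides.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def audience_batches(customers: list[dict], audience: str,
                     batch_size: int = 1000) -> Iterator[str]:
    """Yield JSON payloads for a hypothetical audience-upload endpoint."""
    members = []
    for c in customers:
        member = {}
        if c.get("email"):
            member["email_sha256"] = hash_email(c["email"])
        if c.get("visitor_id") in cookie_sync:
            member["partner_uid"] = cookie_sync[c["visitor_id"]]
        if member:  # skip customers with no usable identifier
            members.append(member)
    for i in range(0, len(members), batch_size):
        yield json.dumps({"audience": audience,
                          "members": members[i:i + batch_size]})


record_sync("vid-1", "ptnr-42")
segment = [{"email": " Ann@Example.com ", "visitor_id": "vid-1"},
           {"email": "bob@example.com"},
           {"name": "no identifiers"}]
for payload in audience_batches(segment, "high_value", batch_size=1):
    print(payload)  # in practice, each payload would be POSTed to the partner's API
```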
Offline. This refers to interactions managed through offline sources such as direct mail and retail stores, where the customer’s primary identifier is name and postal address.
- Postal Address. The system can clean, standardize, verify, and otherwise work with postal addresses. This processing reduces inconsistencies and makes matching more effective. Systems meet this requirement so long as the address processing is built into system process flows, even if they rely on third party software. Systems that send records to external systems in a batch process do not meet this requirement.
- Name/Address Match. The system can find matches between different postal name/address records despite variations in spelling, missing data elements, and similar differences. As with postal processing, systems can meet this requirement with third party matching software so long as the software is embedded in their processing flows.
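Postal standardization and name/address matching might look roughly like this in miniature. The suffix table is a tiny illustrative subset of standard abbreviations, `difflib.SequenceMatcher` stands in for the much richer logic commercial matching software uses, and the 0.85 threshold is an arbitrary assumption. Because the functions run inline, this mirrors the "embedded in the processing flow" condition rather than a batch file sent to an outside vendor.

```python
import re
from difflib import SequenceMatcher

# Tiny illustrative subset of street-suffix and unit abbreviations; real
# systems use full reference tables and address-verification data.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD",
            "DRIVE": "DR", "LANE": "LN", "APARTMENT": "APT", "SUITE": "STE"}


def _tokens(text: str) -> list[str]:
    # Uppercase, turn punctuation into spaces, split on whitespace.
    return re.sub(r"[^\w\s]", " ", text.upper()).split()


def standardize_address(addr: str) -> str:
    return " ".join(SUFFIXES.get(t, t) for t in _tokens(addr))


def standardize_name(name: str) -> str:
    return " ".join(_tokens(name))


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def is_match(rec1: dict, rec2: dict, threshold: float = 0.85) -> bool:
    """Both the name and the address must be close enough after cleanup."""
    name_score = similarity(standardize_name(rec1["name"]),
                            standardize_name(rec2["name"]))
    addr_score = similarity(standardize_address(rec1["address"]),
                            standardize_address(rec2["address"]))
    return min(name_score, addr_score) >= threshold


a = {"name": "Jonathan Smith", "address": "12 Oak Street, Apt. 4"}
b = {"name": "Jonathon Smith", "address": "12 Oak St Apt 4"}
print(standardize_address(a["address"]))  # 12 OAK ST APT 4
print(is_match(a, b))                     # True
```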
Business to Business. This refers to companies that sell to other businesses rather than to consumers.
- Account-level data. The system can maintain separate customer records for accounts (i.e., businesses) and for individuals within those accounts. This means account information is stored and updated separately from individual information. It also means that selections, campaigns, reports, analyses, and other system activities can combine data from both levels.
- Lead to Account Match. The system can determine which individuals should be associated with which account records, using information such as company name, address, email domain, and telephone number. This excludes processing done by sending batch files to external vendors.
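A lead-to-account match could start with two of the signals listed above: email domain first, then normalized company name. This is a sketch under stated assumptions: the account record layout, the free-mail domain list, and the company-suffix list are hypothetical and far from complete, and a real system would also weigh address and telephone number.

```python
from typing import Optional

# Illustrative, not exhaustive: personal-mail domains say nothing about employer.
FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}
COMPANY_SUFFIXES = {"INC", "LLC", "LTD", "CORP", "CORPORATION", "CO", "COMPANY"}


def normalize_company(name: str) -> str:
    tokens = [t.strip(".,") for t in name.upper().split()]
    return " ".join(t for t in tokens if t and t not in COMPANY_SUFFIXES)


def match_lead_to_account(lead: dict, accounts: list[dict]) -> Optional[str]:
    """Return the matching account_id, or None if no signal matches."""
    domain = lead.get("email", "").rsplit("@", 1)[-1].lower()
    if domain and domain not in FREE_MAIL:
        for acct in accounts:
            if domain in acct["domains"]:
                return acct["account_id"]
    company = normalize_company(lead.get("company", ""))
    if company:
        for acct in accounts:
            if normalize_company(acct["name"]) == company:
                return acct["account_id"]
    return None


accounts = [{"account_id": "ACCT-1", "name": "Acme Corporation",
             "domains": {"acme.com"}}]
print(match_lead_to_account({"email": "pat@acme.com"}, accounts))  # ACCT-1
print(match_lead_to_account({"email": "pat@gmail.com", "company": "ACME Corp."},
                            accounts))                             # ACCT-1
```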
Analytics. These are applications that use the CDP data but don’t extend to selecting messages, which is the province of customer engagement.
- Segmentation. The system lets non-technical users define customer segments and automatically send segment member information to external systems on a user-defined schedule. Ideally, all data would be available to use in the segment definitions and to include in the extract files. In practice, some configuration may be needed to expose particular elements. Systems meet this requirement regardless of whether segments are defined manually or discovered by automated processes such as cluster analysis.
- Incremental attribution. The system has algorithms to estimate the incremental impact of different marketing activities on specified outcomes such as a purchase or conversion. Attribution is a specialized analytical process that relies on the unified customer data assembled by the CDP. Algorithms vary greatly. To qualify for this item, the algorithm must estimate the contribution of different marketing contacts to the final result. That is, fixed approaches such as “first touch” or “U-shaped distribution” are not included.
- Automated predictive. The system can generate, deploy, and refresh predictive models without involvement of a technical user such as a data scientist or statistician. This usually employs some form of machine learning. There are many different types of automated predictive modeling; systems meet this requirement if they have any of them.
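To show the difference between algorithmic and fixed attribution, here is a Shapley-value sketch. Shapley values are one common algorithmic approach, swapped in here for illustration; the criteria above don't prescribe any particular method. The input is assumed to be a conversion rate for each combination of channels a customer was exposed to, and each channel is credited with its average marginal lift across all combinations. The credits sum to the lift over the no-contact baseline, so conversions that would have happened anyway go to no channel. Real inputs would also need to account for who was targeted, which this sketch ignores.

```python
from itertools import combinations
from math import factorial


def shapley_attribution(channels: list[str],
                        value: dict[frozenset, float]) -> dict[str, float]:
    """Credit each channel with its Shapley value.

    value[S] is assumed to be the conversion rate among customers exposed to
    exactly the channel set S; missing combinations count as 0.
    """
    n = len(channels)
    credit = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for k in range(len(others) + 1):
            # Probability that exactly a given size-k set precedes ch
            # in a uniformly random ordering of the channels.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value.get(s | {ch}, 0.0) - value.get(s, 0.0))
        credit[ch] = total
    return credit


rates = {
    frozenset(): 0.01,                     # no marketing contact (baseline)
    frozenset({"email"}): 0.03,
    frozenset({"display"}): 0.02,
    frozenset({"email", "display"}): 0.05,
}
credit = shapley_attribution(["email", "display"], rates)
print({k: round(x, 4) for k, x in credit.items()})  # {'email': 0.025, 'display': 0.015}
```

A fixed rule such as "first touch" would hand the whole result to whichever channel came first regardless of these rates; here the credit depends on the measured lift of each channel.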
Engagement. This refers to applications that select messages for individual customers. It does not include content delivery, which is typically handled outside of the CDP.
- Content selection. The system can select appropriate marketing or editorial content for individual customers in the current situation, based on the data it stores about them, other information, and user instructions. The instructions may employ fixed rules, predictive models, or a combination. Selections may be made as part of a batch process.
- Multi-step campaigns. The system can select a series of marketing messages for individual customers over time, based on data and user instructions. The message sequence is defined in advance but may change or be terminated depending on customer behaviors as the sequence is executed.
- Real-time interactions. The system can select appropriate marketing or editorial content for individual customers during a real-time interaction. This requires accepting input about the customer from a customer-facing system, finding that customer’s data within the CDP, selecting appropriate content, and sending the results back to the customer-facing system for delivery. The results might include the actual message or instructions that enable the customer-facing system to generate the message.
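The engagement items can be sketched as a single decision function that a customer-facing system calls with a customer ID and some context. The profile store, rule names, content IDs, and request format are all hypothetical. Rules are checked in priority order, and one of them uses a stored model score (`churn_score`), which corresponds to combining fixed rules with predictive models. Calling the same function in a loop over many customers would give the batch form of content selection; multi-step campaigns would add per-customer sequence state, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical profile store keyed by persistent ID; churn_score stands in for
# the output of a predictive model stored on the profile.
PROFILES = {
    "P000001": {"last_category": "running", "churn_score": 0.7},
    "P000002": {"last_category": "hiking", "churn_score": 0.1},
}


@dataclass
class Rule:
    name: str
    condition: Callable[[dict, dict], bool]  # (profile, context) -> bool
    content_id: str


# Checked in priority order; the first matching rule wins.
RULES = [
    Rule("retention", lambda p, ctx: p["churn_score"] > 0.5, "loyalty_discount"),
    Rule("checkout_addon", lambda p, ctx: ctx.get("page") == "checkout",
         "checkout_accessories"),
    Rule("category_affinity", lambda p, ctx: p["last_category"] == "running",
         "new_running_shoes"),
]


def decide(request: dict) -> dict:
    """Handle one request from a customer-facing system: look up the customer,
    pick content, and return instructions for that system to deliver."""
    profile = PROFILES.get(request.get("customer_id"))
    if profile is None:
        return {"content_id": "generic_welcome", "rule": "unknown_customer"}
    context = request.get("context", {})
    for rule in RULES:
        if rule.condition(profile, context):
            return {"content_id": rule.content_id, "rule": rule.name}
    return {"content_id": "generic_welcome", "rule": "default"}


print(decide({"customer_id": "P000002", "context": {"page": "checkout"}}))
# {'content_id': 'checkout_accessories', 'rule': 'checkout_addon'}
```

Returning a content ID rather than the finished message matches the case above where the customer-facing system generates the message itself.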