One of my favorite objections from potential buyers of Customer Data Platforms is that CDPs are simply “too good to be true”. It’s a reasonable response from people who hear CDP vendors say they can quickly build a unified customer database but have seen many similar-seeming projects fail in the past. I like the objection because I can so easily refute it by pointing to real-world case histories where CDPs have actually delivered on their promise.
One of the vendors I have in mind when I’m referring to those histories is Treasure Data. They’ve posted several case studies on the CDP Institute Library, including one where data was available within one month and another where it was ready in two hours. Your mileage may vary, of course, but these cases illustrate the core CDP advantage of using preassembled components to ingest, organize, access, and analyze data. Without that preassembly, accessing just one source can take days, weeks, or even months to complete.
Even among CDP systems, Treasure Data stands out for its ability to connect with massive data sources quickly. The key is a proprietary data format that lets it access new data sources with little explicit mapping: in slightly more technical terms, Treasure Data uses a columnar data structure where new attributes automatically appear as new columns. It also helps that the system runs on Amazon S3, so little time is spent setting up new clients or adding resources as existing clients grow.
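To make the "new attributes automatically appear as new columns" idea concrete, here is a minimal Python sketch of a column-oriented table that grows its schema as records arrive. It is an illustration only, not Treasure Data's proprietary format; the `ColumnarTable` class and the sample field names are hypothetical.

```python
class ColumnarTable:
    """Toy column store: one list per attribute (hypothetical, for illustration)."""

    def __init__(self):
        self.columns = {}   # attribute name -> list of values, one per row
        self.row_count = 0

    def append(self, record):
        # A previously unseen attribute becomes a new column,
        # backfilled with None for the rows loaded earlier.
        for name in record:
            if name not in self.columns:
                self.columns[name] = [None] * self.row_count
        # Every column gets one value per row; missing attributes are None.
        for name, values in self.columns.items():
            values.append(record.get(name))
        self.row_count += 1


table = ColumnarTable()
table.append({"user_id": 1, "event": "login"})
table.append({"user_id": 2, "event": "purchase", "amount": 19.99})

print(sorted(table.columns))    # ['amount', 'event', 'user_id']
print(table.columns["amount"])  # [None, 19.99]
```

The second record introduces `amount` without any mapping step, which is the property that keeps source onboarding short.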
Treasure Data ingests data using two open source connectors: Fluentd for streaming inputs and Embulk for batch transfers. It provides deterministic and probabilistic identity matching, integrated machine learning, always-on encryption, and precise control over which users can access which pieces of data. One caveat is that there’s no user interface to manage this sort of processing: users basically write scripts and query statements. Treasure Data is working on a user interface to make this easier and to support complex workflows.
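For readers unfamiliar with deterministic identity matching, the sketch below shows one common way to do it: records that share an exact identifier value (an email, a cookie, a device ID) are merged into the same customer using a union-find structure. This is a generic technique, not a description of how Treasure Data implements matching; the function name and the sample identifiers are assumptions made for the example.

```python
def resolve_identities(records):
    """Group record indexes that share any exact (field, value) identifier."""
    parent = {}

    def find(key):
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    def keys_of(record):
        return [(field, value) for field, value in record.items() if value is not None]

    # Link all identifiers that appear together on one record.
    for record in records:
        keys = keys_of(record)
        if keys:
            find(keys[0])  # register records that carry a single identifier
        for key in keys[1:]:
            union(keys[0], key)

    # Collect record indexes by the root of their first identifier.
    groups = {}
    for index, record in enumerate(records):
        keys = keys_of(record)
        if keys:
            groups.setdefault(find(keys[0]), []).append(index)
    return sorted(groups.values())


records = [
    {"email": "a@example.com", "cookie": "c1"},
    {"cookie": "c1", "device_id": "d9"},
    {"email": "b@example.com"},
    {"device_id": "d9", "email": "a2@example.com"},
]
print(resolve_identities(records))  # [[0, 1, 3], [2]]
```

Records 0, 1, and 3 end up as one customer because they are chained through a shared cookie and a shared device ID. Probabilistic matching, which scores near-matches rather than requiring exact ones, is not covered here.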
Data loaded into Treasure Data can be accessed through an integrated reporting tool and an interface that shows the set of events associated with a customer. But most users will rely on prebuilt connectors for Python, R, Tableau, and Power BI. Other SQL access is available using Hive, Presto and ODBC. While there’s no user interface for creating audiences, Treasure Data does provide the functions needed to assign customers to segments and then push those segments to email, Facebook, or Google. It also has an API that lets external systems retrieve the list of all segments associated with a single customer.
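As a rough picture of what script-based segmentation looks like, the Python sketch below assigns customers to segments with simple rules and then answers the "which segments does this customer belong to?" question that an external system might ask. The rule names, profile fields, and function names are all hypothetical; this does not reproduce Treasure Data's actual functions or API.

```python
# Hypothetical segment rules: each maps a profile dict to True or False.
SEGMENT_RULES = {
    "high_value": lambda profile: profile["lifetime_spend"] >= 500,
    "lapsed": lambda profile: profile["days_since_last_order"] > 90,
}


def assign_segments(profiles, rules):
    """Return segment name -> set of customer IDs that satisfy the rule."""
    membership = {name: set() for name in rules}
    for customer_id, profile in profiles.items():
        for name, rule in rules.items():
            if rule(profile):
                membership[name].add(customer_id)
    return membership


def segments_for_customer(membership, customer_id):
    """Return the sorted list of segments that contain one customer."""
    return sorted(name for name, members in membership.items() if customer_id in members)


profiles = {
    "c1": {"lifetime_spend": 820, "days_since_last_order": 12},
    "c2": {"lifetime_spend": 40, "days_since_last_order": 200},
    "c3": {"lifetime_spend": 650, "days_since_last_order": 120},
}
membership = assign_segments(profiles, SEGMENT_RULES)

print(sorted(membership["high_value"]))           # ['c1', 'c3']
print(segments_for_customer(membership, "c3"))    # ['high_value', 'lapsed']
```

In practice the membership sets are what would be pushed to an email or ad platform, while the per-customer lookup corresponds to the kind of request an external system would make.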
Treasure Data clearly isn’t an all-in-one solution for customer data management. But organizations with the necessary technical skills and systems can find it hugely increases the productivity of their resources. The company was founded in 2011 and now has over 250 clients, about half from the data-intensive worlds of games, ecommerce, and ad tech. Annual cost starts around $100,000. The actual pricing model varies with the situation but is usually based on either the number of customer profiles being managed or total resource consumption.