Choosing the Right Big Data Technology Stack for Digital Marketing



IBM and Semphonic just partnered on a new Whitepaper tackling one of the hottest and most challenging topics in digital analytics – choosing the right big data technology stack. I finished it a couple of weeks back and it’s now gone into general release. In addition, I’m going to be doing a webinar about it with IBM’s CTO of Big Data Solutions, Krishnan Parasuraman.

I’m very excited about both.

In the Whitepaper, I got to combine some of the big themes that have been emerging in our practice: the unique challenges of digital analytics for traditional statistical and database methods, the impact of those challenges on the selection of a technology stack, and the best ways to structure a digital analytics technology initiative to address the issues and build an effective digital big data solution.

Over the last twelve months, Semphonic has been incredibly active in this area. We never used to focus that much on strategic measurement engagements. But the confluence of Big Data and Digital Analytics has changed that. With our extensive background in database marketing, we’re comfortable getting (indeed, eager to get) our hands on the detailed customer data and the database, BI, and statistical tools that support that deep access. We’ve had fifteen hard years trying to figure out how to measure, segment, and use digital data effectively. We’ve also seen first-hand how easy it is to break traditional technology stacks with digital data, having done it repeatedly! That combination of big data technology and digital measurement chops is pretty unique, and I think that’s why we’ve been getting asked so often to help large enterprises craft a strategy that blends these elements effectively.

In the Whitepaper, I’ve tried to distill that experience down into a useful framework for thinking about digital marketing analytics in a big data world.

So Just What is Big Data?

The Whitepaper starts with a pretty deep discussion of the challenges of digital and why digital is a paradigm case of big data. I know people are already starting to hate the term big data, and I don’t really blame them. In the broader market, it doesn’t have a specific meaning. It’s lots of data. We get that. But how much data is big data? And why does having lots of data really change anything?

I try to tackle this definitional morass in the Whitepaper. At Semphonic we’ve come to have a pretty specific view about what big data means and why it really is somewhat different – not just “more rows than normal.” We believe that big data is really about a drive to “detail” data and to algorithmic analytics techniques that don’t work off of aggregates. Yes, volume does count. But big data isn’t just big, it’s big because we’ve shifted the level of analysis.

This shift to detail-level analysis has a much bigger impact than you might suppose. From a technology standpoint, it does drive more row volume. But from an analysis perspective, it makes many traditional BI techniques (that depend on cube-based aggregates) impossible or irrelevant.

In digital, it has even deeper implications. Which brings me to the part of the Whitepaper that I think is the most interesting and important.

The Challenge of Stream Data

You’ll often hear digital data described as “unstructured.” I think that’s wrong (at least in part). Yes, social media data is truly unstructured. But analytics data collected from the Web and Mobile channels is certainly structured. The SiteCatalyst data-feed (our most common source of this information) is just a classic, big, comma-delimited flat file with 400 or so fields per row. Structure!

In fact, almost every digital data source except social is structured data.

So why this persistent description of digital data as unstructured?

Well, digital data does drive IT folks and data architects crazy. But it’s not the lack of structure that does it, it’s the level of meaning.

In most digital data, there’s no meaning inherent in a single detailed row. The server call (or page view) is not, on its own, the unit of analysis. Worse, digital data doesn’t aggregate cleanly. Adding server calls to create page view counts or time on site isn’t, in most cases, the path to meaning. Meaning comes by interpreting a stream of server calls (on the Web this is a Visit or Path). So digital data is (mostly) semi-structured. Each row is structured just fine, but to get to anything interesting requires interpretation (effectively the addition of structure).
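To make the idea of "adding structure" concrete, here is a minimal Python sketch of turning individual server calls into Visits and Paths. Everything in it is an assumption for illustration: the three-field rows, the sample data, the function name build_visits, and the 30-minute inactivity timeout (a common convention, not something a data feed dictates).

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical server-call rows: (visitor_id, timestamp, page).
# Real feeds carry hundreds of fields; three are enough to show the idea.
SERVER_CALLS = [
    ("v1", "2013-01-05 10:00:00", "home"),
    ("v1", "2013-01-05 10:02:00", "search"),
    ("v1", "2013-01-05 10:05:00", "product"),
    ("v1", "2013-01-05 14:00:00", "home"),
    ("v2", "2013-01-05 10:01:00", "home"),
]

# Assumed rule: a gap of more than 30 minutes starts a new visit.
VISIT_TIMEOUT = timedelta(minutes=30)


def build_visits(rows, timeout=VISIT_TIMEOUT):
    """Group server calls into visits; each visit is an ordered path of pages."""
    by_visitor = defaultdict(list)
    for visitor_id, ts, page in rows:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor_id].append((parsed, page))

    visits = []
    for visitor_id, calls in by_visitor.items():
        calls.sort()  # order each visitor's calls by time
        path, last_ts = [], None
        for ts, page in calls:
            if last_ts is not None and ts - last_ts > timeout:
                visits.append((visitor_id, path))  # close the previous visit
                path = []
            path.append(page)
            last_ts = ts
        if path:
            visits.append((visitor_id, path))
    return visits


visits = build_visits(SERVER_CALLS)
for visitor_id, path in visits:
    print(visitor_id, " > ".join(path))
```

No single row above says "this visitor searched and then looked at a product." That only shows up once the rows are ordered and grouped, which is the interpretation step described above.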

Why is this important?

The vast majority of ETL, query and statistical analysis techniques have been built to operate on individual rows. That doesn’t work in digital. In digital, meaning exists only in the combination of multiple rows (paths) and that combination isn’t a straightforward aggregation.

Stream data creates a second big problem. Stream data defeats classic join strategies. One-to-One and One-to-Many joins are almost the only types of joins ever used in classic database work. With streams, you get Many-to-Many joins. Many-to-Many joins don’t work well: every row in one stream matches many rows in the other, so the result multiplies rows without adding meaning.

We’ve seen a number of cases where our clients dump digital data streams into a warehouse, find join keys, and think they are done. In a traditional world, putting two data sources on the same box with a join key makes it easy for an analyst to put them together. In a stream world, it doesn’t quite solve the problem.
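A small Python sketch shows what goes wrong. The two streams, the customer IDs, and the helper names (join_on_key, summarize) are all hypothetical. The second half shows one common workaround, collapsing each stream to a row per customer before combining; it is offered as an illustration, not as the approach the Whitepaper recommends.

```python
from collections import Counter

# Two hypothetical event streams that share a customer_id join key.
PAGE_VIEWS = [("c1", "home"), ("c1", "search"), ("c1", "product"), ("c2", "home")]
EMAIL_EVENTS = [("c1", "open"), ("c1", "click"), ("c2", "open")]


def join_on_key(left, right):
    """Naive inner join on the first field of each row."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


# Many-to-Many: every page view pairs with every email event for that customer.
joined = join_on_key(PAGE_VIEWS, EMAIL_EVENTS)

# Counting rows after the join inflates the numbers for c1.
naive_page_views = Counter(cid for cid, _, _ in joined)
true_page_views = Counter(cid for cid, _ in PAGE_VIEWS)


def summarize(rows):
    """Collapse a stream to one count per customer."""
    return Counter(cid for cid, _ in rows)


# Summarize each stream first, then combine on the key (now One-to-One).
views_per_customer = summarize(PAGE_VIEWS)
emails_per_customer = summarize(EMAIL_EVENTS)
customer_summary = {
    cid: (views_per_customer[cid], emails_per_customer[cid])
    for cid in views_per_customer
}

print(len(joined), dict(naive_page_views), dict(true_page_views))
print(customer_summary)
```

Customer c1 has three page views and two email events, but the join produces six rows for that customer, none of which says anything about how a particular page view relates to a particular email. The join key was there and the join ran fine; the output just isn't an answer.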

In the Whitepaper, I take a real deep-dive into this topic because I think it is, quite simply, the key to understanding the challenge of digital big data warehousing.

Translating Problems into Solutions

It’s nice to have a good definition of big data. It’s certainly interesting to know why digital data is such a challenge. But how does that knowledge translate into a useful framework for moving forward?

Well, that’s the third part of the Whitepaper. Because once you understand some of the unique challenges of big data analysis and digital, you can start to map different applications of digital to specific attributes of different technology stacks.

In the Whitepaper, I look at a whole set of different decision factors (from handling very large row counts, to supporting algorithmic queries, to real-time analytics, to the availability of expertise) and match them to another set of digital marketing use-cases (things like email Targeting, Personalization, Customer Analytics and Attribution).

Not every digital marketing application has the same requirements or puts the same stress on the technology decision factors. So if you know what types of digital marketing applications you have, the Whitepaper gives you a great framework for evaluating what types of technology capabilities you need.

If you’re at the point a lot of our clients are, you know that the range of new technologies and big data capabilities, while welcome, makes choosing the right approach harder, not easier. It can just be too many choices. Without a way to think about which trade-offs are appropriate (and believe me, EVERY technology has trade-offs), making a decision can feel random.

Yes, IBM has put together a pretty comprehensive big data solution set. It will probably be on just about any enterprise short-list for big data. But our (both Semphonic and IBM’s) goal in this Whitepaper wasn’t to evaluate the IBM solution or even to talk much about it. It was to lay out a way for ANY organization evaluating ANY big data technology stack to think more clearly about what’s needed and why.

Download it here! Register for the webinar here!

Republished with author's permission from original post.

Gary Angel
Gary is the CEO of Digital Mortar. DM is the leading platform for in-store customer journey analytics. It provides near real-time reporting and analysis of how stores performed including full in-store funnel analysis, segmented customer journey analysis, staff evaluation and optimization, and compliance reporting. Prior to founding Digital Mortar, Gary led Ernst & Young's Digital Analytics practice. His previous company, Semphonic, was acquired by EY in 2013.

