Retail Metrics, eCommerce Data Models and Personalization



I’ve been writing about (and working on) data warehousing, ETL and data models quite a lot this year. And between webinars on Celebrus, Infobright, and Semphonic ETL, plus white papers on warehouse data collection infrastructure and Semphonic’s ETL and digital data warehousing models, it may seem like all the work I’ve been doing is around data warehousing. That isn’t true for me and it certainly isn’t true for Semphonic. Increasingly, however, much of the most interesting work does revolve around data warehousing or intersects with customer analytics and personalization in ever richer ways.

I wanted to discuss a few examples of this type of intersection from some recent work on eCommerce implementations and reporting.

Traditional eCommerce Web analytics implementations have focused on capturing the basic product funnel: from view to detail to cart to checkout. By cross-tabulating this information, you can easily see which products are moving and which aren’t. You can also tell which campaigns drive to which products. It’s certainly useful information and, used well, it can help a product manager with a wide range of tasks.

However, if all you’re capturing in your Web analytics solution is product/category information relative to the funnel, you’re missing out on many of the most interesting and most actionable (from a targeting and personalization perspective) variables. You may not realize this, because the analytic opportunities in a Web analytics tool can be quite limiting. But if you are moving this data to a warehouse, you can do far more. And the data you’re moving to the warehouse is – in the majority of cases – the data you’re capturing in the Web analytics tool.

There are three types of data capture that I think are a critical part of a good retail analytics implementation: sequenced customer journey data, merchandising data, and filtering data. In each case, capturing this data not only leads to superior analytics, but to segmentation, targeting, and personalization opportunities. All of these can (and should, at least to some extent) be captured in a Web analytics implementation. But in many cases, analysis or customer modeling from this information will take a warehouse-based data model.

My first example of this uses the Two-Tiered segmentation concepts of visit intent to answer a question of considerable importance to any product or marketing manager: how well did the site marketing facilitate the customer purchase decision? If you’re going to evaluate the effectiveness of your site merchandising, you have to be able to answer this question and all of the surrounding questions (such as: did they buy the product they originally considered, did they buy anything, did they switch categories, did they buy up or down, and did they add additional products).

In traditional Web analytics, these questions are by no means easy to answer. You can easily generate a product report that shows which products are purchased and which products are purchased together. But if you think about it, you’ll see that this doesn’t really answer any of the questions. You have no way of assessing whether or not customers intended to buy those products, in which order they shopped for them, what products they evaluated but didn’t buy, and whether or not you increased or decreased their order size for any given type of product consideration. Ouch.

To answer these types of questions, you need to establish a visit intent by sequencing a customer’s browsing/buying behavior. Structurally, your goal is to build a kind of customer journey relative to a specific shopping experience.

The milestones to capture include, at minimum, the type of entry, summary of any pre-shopping navigation, any browsing behavior within shopping (more on this later), any product detail behavior (ditto), and any cart or checkout behavior. Here’s an example of a couple of such streams:

External Search Entry on Product -> Detail View -> Image View -> Cart -> Checkout

General Site Entry -> Category Navigation -> Product View -> Category Navigation -> Product 2 View

Note that good journey tracking generally involves several different types of variables. This can make “pathing” reports useless (and pathing reports are very challenging to work with anyway) since you often don’t have all the things you need in a single variable.

If you build this type of journey, however, you can answer some of the questions that really matter to a site merchandiser. You can trivially figure out how often consideration of one product leads to purchase of a different product in the category and how often consideration in one category leads to consideration of another category.
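
As a rough sketch of what this could look like on the warehouse side, the following Python groups event rows into ordered journeys and labels each visit by how the purchase relates to the first product considered. The `Event` fields, the milestone labels, and the outcome names are all assumptions made for illustration; they are not taken from any particular tool or data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    visit_id: str
    seq: int                       # position of the event within the visit
    milestone: str                 # hypothetical labels: "entry", "product_view", "purchase"
    product: Optional[str] = None
    category: Optional[str] = None


def build_journeys(events):
    """Group events by visit and order them by sequence number."""
    journeys = {}
    for e in sorted(events, key=lambda e: (e.visit_id, e.seq)):
        journeys.setdefault(e.visit_id, []).append(e)
    return journeys


def classify_outcome(journey):
    """Compare the first product considered with what was purchased."""
    viewed = [e for e in journey if e.milestone == "product_view"]
    bought = [e for e in journey if e.milestone == "purchase"]
    if not viewed:
        return "no_consideration"
    if not bought:
        return "no_purchase"
    first = viewed[0]
    if any(e.product == first.product for e in bought):
        return "bought_first_considered"
    if any(e.category == first.category for e in bought):
        return "switched_within_category"
    return "switched_category"


# Made-up sample data covering three visits.
events = [
    Event("v1", 1, "entry"),
    Event("v1", 2, "product_view", "boot-a", "boots"),
    Event("v1", 3, "purchase", "boot-a", "boots"),
    Event("v2", 1, "product_view", "boot-a", "boots"),
    Event("v2", 2, "product_view", "boot-b", "boots"),
    Event("v2", 3, "purchase", "boot-b", "boots"),
    Event("v3", 1, "product_view", "boot-a", "boots"),
    Event("v3", 2, "purchase", "hat-c", "hats"),
]

outcomes = Counter(classify_outcome(j) for j in build_journeys(events).values())
```

Using the first product viewed as the stand-in for intent is a simplification; a fuller model would also weigh the entry type and any pre-shopping navigation.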

If you pull in meta-data about the products (such as price and margin), you can also answer the next level of question: how often did you shift a visitor to a more expensive or higher margin product and how often did a visitor shift to a lower margin product?
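
A minimal sketch of that comparison, assuming a product lookup keyed by product ID with `price` and `margin` fields (the table, the IDs, and the numbers are hypothetical):

```python
# Hypothetical product metadata; in practice this would come from a product table.
PRODUCT_META = {
    "boot-a": {"price": 120.0, "margin": 0.30},
    "boot-b": {"price": 150.0, "margin": 0.25},
}


def shift_direction(considered, purchased, meta, field="price"):
    """Return "up", "down", or "same" for the purchased product
    relative to the product originally considered."""
    before = meta[considered][field]
    after = meta[purchased][field]
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "same"


price_shift = shift_direction("boot-a", "boot-b", PRODUCT_META)
margin_shift = shift_direction("boot-a", "boot-b", PRODUCT_META, field="margin")
```

In this made-up data the visitor shifts up on price but down on margin, which is why it can be worth tracking both fields rather than price alone.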

And here’s where I want to stop and make my real point – good analysis and personalization opportunities are intertwined. At the customer level, this data allows you to create some powerful shopping oriented visitor segmentations:

  • Is this visitor a focused shopper or are they open to alternatives?
  • When they shop, are they likely to shift up or down?
  • When they add products to the cart, are they adding small amounts or significant upsells?

If you’re building rules to target upsells or cross-sells to customers, these are the types of segmentation your data model MUST support.
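
One way such a segment might be derived, sketched under the assumption that each purchase visit has already been labeled with an outcome such as `bought_first_considered` or `switched_within_category`. The labels, the minimum visit count, and the 0.7 threshold are illustrative choices, not recommendations:

```python
from collections import Counter


def shopper_segment(outcomes, min_visits=3, focus_threshold=0.7):
    """Label a customer as focused or open to alternatives based on the
    share of purchase visits where they bought what they first considered.

    `outcomes` is a list of per-visit labels (hypothetical names).
    """
    if len(outcomes) < min_visits:
        return "unknown"  # too little history to say anything
    counts = Counter(outcomes)
    focused_share = counts["bought_first_considered"] / len(outcomes)
    if focused_share >= focus_threshold:
        return "focused"
    return "open_to_alternatives"


focused = shopper_segment(["bought_first_considered"] * 4)
flexible = shopper_segment(
    ["bought_first_considered", "switched_within_category",
     "switched_category", "switched_within_category"]
)
new_customer = shopper_segment(["bought_first_considered"])
```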

My second example focuses on merchandising action data. We see many retail sites that collect virtually no data about what merchandising cues are present on the Product Detail and Product List pages. Is the product listed as out-of-stock? Not captured. Is it listed as in-store? Not captured. Is it listed with a discount? Not captured. Is the price hidden? Not captured. Are the reviews good? Not captured. Is the margin high? Not captured. Is there a video? Not captured. In situations where inventory pricing is used, what’s the price and how does it compare to rack rate? Not captured.

How are you supposed to do merchandising analysis when every single important merchandising action is removed from the data?

By integrating this type of data into the customer journey, you dramatically expand the ability of the analyst to understand what drove customer decision-making. This not only allows you to answer a host of basic analytics questions:

  • How much do Out-of-Stock items cost the Web channel?
  • How much do negative reviews impact sales?
  • Do negative reviews shift product purchases or lose them altogether?
  • Do negative reviews shift product purchases to more or less expensive products?
  • Do inventory prices track to market (is our conversion rate fairly steady across rate shifts)?

It also allows you to drive interesting segmentation and personalization:

  • Is a customer review sensitive?
  • Will a customer shift to the store when an item is out-of-stock?
  • Will a customer shift products when the margin or inventory price is high?
  • Is a customer sensitive to discount language and drives?
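
To make the idea concrete, here is a small sketch that assumes each product-view row already carries the merchandising cues that were shown at the time (the field names and the sample rows are invented for illustration) and compares conversion rates by cue value:

```python
from collections import defaultdict

# Hypothetical product-view rows with the merchandising state attached.
views = [
    {"visit": "v1", "product": "A", "out_of_stock": False, "purchased": True},
    {"visit": "v2", "product": "A", "out_of_stock": False, "purchased": True},
    {"visit": "v3", "product": "B", "out_of_stock": False, "purchased": False},
    {"visit": "v4", "product": "C", "out_of_stock": True, "purchased": False},
    {"visit": "v5", "product": "C", "out_of_stock": True, "purchased": False},
]


def conversion_by_cue(rows, cue):
    """Return {cue value: purchases / views} for one merchandising cue."""
    totals = defaultdict(lambda: [0, 0])  # cue value -> [views, purchases]
    for row in rows:
        bucket = totals[row[cue]]
        bucket[0] += 1
        bucket[1] += int(row["purchased"])
    return {value: bought / seen for value, (seen, bought) in totals.items()}


rates = conversion_by_cue(views, "out_of_stock")
```

The same function can be pointed at any other cue column (discount shown, review score band, video present), and grouping the rows by customer first gives the per-customer sensitivity measures listed above.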

This is powerful stuff when it comes to personalization and remarketing. It’s also part of the reason why, when you’re constructing a data model for retail, you don’t want to rely too much on Web analytics constructs. Page-based Web analytics constructs just aren’t a very good basis for tracking merchandising.

Which brings me to my final example. More and more eCommerce behavior is being pushed up-stream into product lists and multi-product browsing experiences. For many of our clients, it’s becoming much less common to dive down into the Product Detail page. With popups, better product listings, and more early-stage research, the product list page is often the place where all the merchandising work is going on.

Of course, all of my earlier comments apply here in spades. If you aren’t capturing the state of the product on a detail page, what are the chances you’re capturing it on a list page? Pretty much none. In fact, most Web analytics implementations aren’t even set up to capture the products shown on the list page (which can, by the way, be really challenging), much less any meta-data about their merchandising state.

Capturing this type of information via tagging is so challenging that it may not be a good solution. With a warehouse, you may be better off capturing a “merchandising state” file on a daily basis or creating a product merchandising state history file that is constantly updated. A warehouse-based approach gives you this opportunity.
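
A state history file like this is typically joined to behavior with an "as of" lookup: for each product view, find the most recent state row on or before the view date. The sketch below shows one way to do that with the standard library; the history layout, field names, and dates are assumptions for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history: product -> list of (effective_date, state),
# sorted by effective_date. Each row holds until the next one starts.
STATE_HISTORY = {
    "A": [
        (date(2024, 1, 1), {"in_stock": True, "discounted": False}),
        (date(2024, 1, 10), {"in_stock": False, "discounted": False}),
        (date(2024, 1, 15), {"in_stock": True, "discounted": True}),
    ],
}


def state_as_of(history, product, day):
    """Return the merchandising state in effect for `product` on `day`,
    or None if there is no history on or before that date."""
    rows = history.get(product, [])
    dates = [effective for effective, _ in rows]
    i = bisect_right(dates, day)  # number of rows effective on or before `day`
    return rows[i - 1][1] if i else None


state_on_view = state_as_of(STATE_HISTORY, "A", date(2024, 1, 12))
```

With daily snapshots instead of a change history, the lookup collapses to a plain join on product and date, at the cost of a much larger file.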

On the other hand, there are some interface actions that really should be captured in the Web analytics tool. Particularly important are the filtering and sorting options applied by a user. Faceting is a critical aspect of the customer’s buying journey and should be captured as such. But what about sorting options? They are often deeply revealing.

Let me give you an example. If I’m looking for a nice night out with my wife, I’ll sort the restaurant listing on OpenTable by Star Rating. If I’m looking for a family dinner, I won’t. It’s a simple cue, and it’s repeated over and over on countless retail and eCommerce sites. Could OpenTable use this (plus the number of people in my party) to highlight restaurants as romantic or family friendly? You bet. Do they? Nope.

List sorting choices on a well-designed site are, almost invariably, a powerful classifier of shopping intent and customer behavior. But nobody in Web analytics seems to want them.

If you capture the faceting and sorting information, you can answer questions like these:

  • Which types of sort orders drive conversions?
  • If a filter reduces a set to zero items, what’s the best behavior?
  • Which sort orders are optimal by category?

That’s interesting stuff, but this is a case where the personalization opportunities are considerably richer than the analytic opportunities. By capturing this information at the Customer level, you can build personalization around:

  • What factor is most influential to the user?
  • What type of purchasing decision is involved right now?
  • Is a customer consistent in their preferences or do they vary frequently (can we anticipate what they want)?
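
The consistency question in particular lends itself to a simple measure. This sketch assumes you have, per customer, the list of sort options they applied across visits; the option names, the minimum event count, and the 0.6 threshold are all illustrative assumptions.

```python
from collections import Counter


def sort_preference(sort_events, min_events=3, consistency_threshold=0.6):
    """Return (preferred_sort, share) for one customer.

    preferred_sort is None when there is too little history or when no
    single sort option accounts for at least `consistency_threshold`
    of the customer's sort actions.
    """
    if len(sort_events) < min_events:
        return None, 0.0
    top_sort, count = Counter(sort_events).most_common(1)[0]
    share = count / len(sort_events)
    if share >= consistency_threshold:
        return top_sort, share
    return None, share


consistent = sort_preference(["price_asc", "price_asc", "rating", "price_asc"])
varied = sort_preference(["price_asc", "rating", "newest"])
```

A customer with a stable preference could be shown lists pre-sorted that way, while a customer whose sorts vary is probably better treated visit by visit, using the current sort as the cue for what type of purchasing decision is in play.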

Here’s the bottom line:

If you’re building a measurement infrastructure for eCommerce, it’s not enough to capture product level information through the funnel. This misses nearly everything a product marketer can actually do to merchandise more effectively. By capturing sequenced customer journey information, product merchandising information, and sorting and filtering information, you create a vastly better merchandising analysis data stream. What’s more, if you’re moving this data to the warehouse, you have a slew of deeply interesting personalization and segmentation variables. And that, I think, is the deepest point. The limitations of traditional Web analytics have made it seem unnecessary to capture this type of information. If you live within those confines when you build a warehouse (or stick with the page-based shopping paradigm), you’ll be creating an impoverished model for both analytics and targeting.

Republished with author's permission from original post.

Gary Angel
Gary is the CEO of Digital Mortar. DM is the leading platform for in-store customer journey analytics. It provides near real-time reporting and analysis of how stores performed including full in-store funnel analysis, segmented customer journey analysis, staff evaluation and optimization, and compliance reporting. Prior to founding Digital Mortar, Gary led Ernst & Young's Digital Analytics practice. His previous company, Semphonic, was acquired by EY in 2013.

