Repeatable Analytics



In Matthew Fryer’s Huddle on “Getting the Data to Tell Its Secrets,” the broad consensus of the group seemed to be that deep-dive analysis is more art than science, that it is largely exploratory, and that its value is necessarily hard to predict. This is all certainly true with regard to the current state of digital analytics, but none of it, in my opinion, should be true.

It isn’t impossible to imagine such beasts: analytic techniques that are fairly standard, highly repeatable, and extremely likely to add value. Response modeling in targeted marketing is a good example of a repeatable, obviously valuable analytics method. So is market basket analysis in the grocery and retail shelf world. These are true analytic techniques, and they are exploratory, but they are done in a highly standardized fashion with the full expectation that they will deliver value of a specific kind.

Is there a digital/Web equivalent?

I think Attribution Analysis is starting to emerge as one potential equivalent. In my recent short Facebook product write-up of ClearSaleing, I lamented that good attribution systems were forking off from Web analytics. As I’ve thought about this, however, I’ve begun to think that this process may be inevitable. Web analytics systems (and other general purpose analysis tools) are the systems we use for broad exploration. When we find a technique that really works, it’s often more convenient and more elegant to split that system off into a dedicated tool.

Are there other examples in digital measurement or techniques that might be treated similarly to our advantage? At Semphonic, we use two analytics methodologies (Functionalism and Use-Case Analysis) on a highly repeatable basis. Both of these methods work consistently and turn out useful results of an expected type. I’ve written about both, and while I think these represent the best attempts at analytic standardization in broad digital measurement, both are more generic and require more analyst tuning than I think ideal.

Since X Change, I’ve been thinking about the problem and I’ve found several types of analysis that Semphonic has done on a one-off basis that might, with a little bit of effort, lend themselves to true standardization. I think the development of standardized analytics products would be a huge benefit to our industry (and to Semphonic of course). So I’m going to issue a kind of challenge/offer around the methods I’m about to describe.

Give us your data, pay the cloud processing costs (which shouldn’t be too much), and throw in a little extra (say 5K for shipping and handling!), and we’ll do a full analysis based on one of these methods. My goal, of course, is to quickly generate enough analysis projects to test whether these methods are truly repeatable and to figure out how/whether they can best be done in consistently valuable fashion.

Here they are:

Site Navigational Structure Analysis

Description: Every site has a specific navigational structure that’s been designed to achieve a balance between business goals and visitor intent. In one of my earlier posts on Web analytics theory, I described the tension between visitor intent and navigational structure as a core problem in Web analytics. This analysis is designed to take advantage of that tension to help you understand how customers’ actual use of your Website matches or differs from your expectations. The goal of this analysis is to create a complete map of the behavioral navigational structure of your site, which can then be compared to the designed structure to surface mismatches between design and consumer intent.

Method: We’ll model the entry and next page behavior for every significant page on your Website. For each page, we’ll identify the actual behavior-based parents and children of that page. Using this data, we’ll create a full mapping of the actual “behavioral” structure of your web site. As a bonus (where possible), we’ll use this child/parent data to automatically assign functional categories to the pages of your site. The analysis will be entirely mechanical and will cover the entire Web site.

Deliverable: A complete hierarchical mapping of the pages on your Web site based on actual visitor navigation with any possible functional mappings added in. This mapping can be easily compared to the actual site structure to identify significant differences between the way people use your Web site and the way you expected/designed the Web site for their use.

Requirements: A data feed from your web analytics solution. This analysis can be performed on a relatively short time-slice of data – a week would suffice for most sites.
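To make the mechanical core of this method concrete, here is a minimal Python sketch. The visit records and page paths are invented for illustration; a real run would consume the page-sequence data from your analytics feed. For each page, it counts which pages most often precede and follow it, and treats the most common predecessor as that page’s behavioral parent:

```python
from collections import Counter, defaultdict

# Hypothetical page-view records: (visit_id, ordered pages viewed).
# These paths are illustrative, not from any real data feed.
visits = [
    ("v1", ["/home", "/products", "/products/widget"]),
    ("v2", ["/home", "/support", "/support/faq"]),
    ("v3", ["/products", "/products/widget", "/checkout"]),
]

# For each page, count which pages precede it (parents) and which
# pages follow it (children) across all visits.
parents = defaultdict(Counter)
children = defaultdict(Counter)
for _, pages in visits:
    for prev_page, next_page in zip(pages, pages[1:]):
        children[prev_page][next_page] += 1
        parents[next_page][prev_page] += 1

# The behavioral parent of a page is the page that most often led to it;
# chaining these parent links yields the behavioral site hierarchy.
behavioral_parent = {
    page: counts.most_common(1)[0][0] for page, counts in parents.items()
}
```

The resulting parent map is what would be laid side-by-side with the designed navigation to spot mismatches.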

Media or Publishing Site Content Correlation Analysis

Description: One of the staples of our analytic techniques on content-heavy sites is to model the viewing relationships between content categories. Typical questions include “What type of content are visitors who view type X most likely to be interested in?” and “Does the order of viewing content (starting with X versus starting with Y) alter their likelihood of being interested in other types of content?” These questions of relation and order are similar, in some respects, to attribution analysis. However, answering them produces a different set of actionable tactics focused on suggesting (or helping editors choose the best suggestions for) additional content consumption. These answers can also help model the potential value of new visitors acquired by content type.

Method: We use Factor Analysis and Tetrachoric Correlation to identify content associations. Separately, we’ll analyze the problem set with order of viewing treated as a content attribute.

Deliverable: A full content type correlation matrix with significant relationships called out. For the order analysis, we’ll identify significant ordering relationships (cases where the order viewed changes the content associations).

Requirements: A data feed from your web analytics solution that includes some means of identifying content pages and their category type. This analysis usually requires a short-moderate time slice of data. We’d recommend a week to a month of data.
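For a sense of what the correlation step looks like, here is a minimal Python sketch on an invented visitor-by-category viewing matrix. Note one simplification: it uses the Pearson correlation of the binary indicators (the phi coefficient) as a stand-in for true tetrachoric correlation, which requires a dedicated estimator:

```python
import numpy as np

# Visitor x content-category viewing matrix (1 = visitor viewed at
# least one page in that category). Categories and data are invented.
categories = ["news", "sports", "finance", "opinion"]
views = np.array([
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
])

# Pearson correlation on binary columns is the phi coefficient -- a
# simple stand-in here for tetrachoric correlation.
corr = np.corrcoef(views, rowvar=False)

# Call out strong positive associations between distinct categories.
pairs = [
    (categories[i], categories[j], round(corr[i, j], 2))
    for i in range(len(categories))
    for j in range(i + 1, len(categories))
    if corr[i, j] > 0.5
]
```

In a real analysis the flagged pairs feed the correlation matrix deliverable; the order analysis would repeat this with first-viewed category encoded as an additional attribute.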

Customer Support Content Effectiveness Analysis

Description: Most customer support sites have lots of long-tail support content. It’s hard to analyze, and analyzing it piecemeal is too expensive – unlike conversion funnel pages, the effort it takes to find out which pages are working well and which aren’t is simply more than the result is worth. On the other hand, efficient optimization of the entire content base would yield significant benefits (à la long-tail SEO). What’s needed is an efficient, comprehensive method of identifying the best support content for each type of problem (based on internal search keyword or stubbed navigation path – the two pages prior to the content) that involves little or no manual effort.

Method: We’ll model exit behavior for each type of problem. We’ll identify the most common true end-point (the content page from which the user no longer needs to continue their journey). If available, we’ll refine the analysis with data from your page-based user-feedback mechanisms. The analysis will be entirely mechanical and will cover every common search term, stubbed navigation path, and content support page on the site.

Deliverable: A list of the entire problem space and, for each problem type (such as individual internal search term), the best content to drive to. For significant problem-spaces (higher volume of search), the best content may be an ordered list of the best content links.

Requirements: A data feed from your web analytics solution that includes internal search terms and some means of identifying support and support content pages. Because this is a long tail analysis, we recommend using a fairly significant time-slice of data – 3 to 6 months is optimal though the analysis can be done on much shorter time-slices for heavily trafficked support sites.
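A minimal sketch of the end-point modeling in Python, with invented search terms and session paths: for each problem type, it counts the page on which each session ended and takes the most common as the best candidate content:

```python
from collections import Counter, defaultdict

# Hypothetical support sessions: (internal search term, pages viewed
# after the search). Terms and paths are invented for illustration.
sessions = [
    ("reset password", ["/search", "/kb/passwords", "/kb/reset-password"]),
    ("reset password", ["/search", "/kb/reset-password"]),
    ("reset password", ["/search", "/kb/passwords", "/contact"]),
    ("billing error", ["/search", "/kb/billing"]),
]

# For each problem type (search term), count the page each session
# ended on. A fuller version would exclude escalation pages (e.g. a
# contact form) so only "true" end points count, and would fold in
# page-level feedback scores where available.
end_points = defaultdict(Counter)
for term, pages in sessions:
    end_points[term][pages[-1]] += 1

# The most common end point per term is the best candidate content.
best_content = {
    term: counts.most_common(1)[0][0] for term, counts in end_points.items()
}
```

For high-volume terms, `counts.most_common()` without the argument yields the ordered list of candidate links the deliverable describes.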

There you have it, three highly structured analytic techniques for “getting the data to tell its secrets.” One (site navigational analysis) is appropriate for any Website; a second (Content Correlation Analysis) is appropriate for any content-rich site; the third (Customer Support Content Effectiveness) is targeted to support sites with long-tail content. In each case, we at Semphonic have done the analysis at least once, but we’ve never standardized the process. We’re probably closest with Content Correlation Analysis, since we’ve done that many times.

I’d very much like to see if these techniques can be done super-efficiently in a repeatable and valuable manner. Ultimately, I think the most important fruits of analytics will be this type of highly-structured and consistently valuable technique. In their absence, it will always be an open question how valuable analytics actually is, and the degree to which companies are willing to commit to the creation and use of an analytics practice will always be limited.

If you’re interested in giving one or more of these a try with your data, drop me a line and we can talk!

Republished with author's permission from original post.

Gary Angel
Gary is the CEO of Digital Mortar. DM is the leading platform for in-store customer journey analytics. It provides near real-time reporting and analysis of how stores performed including full in-store funnel analysis, segmented customer journey analysis, staff evaluation and optimization, and compliance reporting. Prior to founding Digital Mortar, Gary led Ernst & Young's Digital Analytics practice. His previous company, Semphonic, was acquired by EY in 2013.

