In my last post on building a digital data model for the analytics warehouse, I described the concept of statistical ETL and argued for its critical importance in creating a robust and easily used visit-level data model. That post drew an interesting comment from a friend of mine arguing not with the concept of statistical ETL, but against the idea of a visit-level aggregation. Here’s the comment (from John Stansbury):
Gary, As usual, great blog. It just seems that we’ve got to get past the notion of visits and sessions. Those entities only exist for the convenience of measurement tools–consumers don’t consciously think of a “visit” to a site, and they certainly don’t think, “well, it’s been 30 minutes of inactivity so if I click now it will be entirely different activity from just a little while ago.” Marketers and Analysts have got to address customers in terms of their goals. I know it’s difficult, but until Marketers demand it the tool vendors won’t change paradigms. Customers don’t visit, they shop. And if your customers’ shopping crosses time and technology, you had better market to them that way.
There’s a lot in here that I agree with. John’s surely right that the “visit” as we know it is largely an artifact of the convenience of our technology vendors. The thirty-minute time-out, in particular, is so clearly arbitrary that not much except convenience can be said in its defense. Not that I would completely denigrate convenience, particularly when it comes with an element of plausibility. We draw these arbitrary lines in the sand all the time (draft age, voting age, drinking age, age of consent) and then we gradually begin to assume they have some weightier meaning than convenience and custom. But it’s also true that we often HAVE to draw some lines in the sand, and if we don’t invest them with meaning then we haven’t really got a line.
So the more interesting question is do we need to draw this particular line at all – is the concept of a visit useful?
I think it is, but for a concept I described in my last post as central, I’m rather less confident of that than you might expect. In Mobile App measurement, for example, we’ve experimented with the notion of replacing the idea of a “visit” with something I call a “unit-of-work”.
What’s the difference?
A unit-of-work is the complete set of activities, over any period of time, focused on a single task. If, for example, I was planning an international trip, the unit-of-work on a travel website would include ALL of the activity dedicated to booking that trip. That unit-of-work might span months, numerous periods of activity, and many, many different actions. At the same time, if I use a travel site to book my plane travel and then my hotel, I may have two units-of-work in one sustained period of activity lasting only a few minutes.
The unit-of-work paradigm is, therefore, quite distinct from the visit. What we call a visit might contain a part of a unit-of-work or two or three complete units-of-work. For a unit-of-work, we’d likely measure the number of periods of activity, the total calendar time elapsed, the actual activity time spent, and many other metrics that don’t exist when you think about visits. I happen to think these metrics are often more interesting than the visit-constrained ones we typically use.
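To make those metrics concrete, here is a minimal Python sketch of a unit-of-work aggregation. It rests on assumptions that are mine, not part of any tool: each event already carries a task label (something the use-case tier of a segmentation would have to supply), the record layout of `(visitor_id, task, timestamp)` is hypothetical, and a 30-minute gap is used only as a placeholder for separating periods of activity.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def unit_of_work_metrics(events, gap=timedelta(minutes=30)):
    """Aggregate events into one record per (visitor, task) unit-of-work.

    events: iterable of (visitor_id, task, timestamp) tuples. The task
    label is assumed to be assigned upstream; this function does not
    infer it.
    """
    grouped = defaultdict(list)
    for visitor_id, task, ts in events:
        grouped[(visitor_id, task)].append(ts)

    metrics = {}
    for key, stamps in grouped.items():
        stamps.sort()
        periods = 1
        active = timedelta(0)
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > gap:
                periods += 1          # a long gap starts a new period of activity
            else:
                active += cur - prev  # short gaps count as time actually spent
        metrics[key] = {
            "activity_periods": periods,
            "calendar_elapsed": stamps[-1] - stamps[0],
            # Sums gaps between events only, so time on the final event
            # of each period is not counted.
            "active_time": active,
        }
    return metrics


# Hypothetical data: a flight booked across two sittings a month apart,
# with a hotel booking squeezed into the first sitting.
events = [
    ("v1", "book_flight", datetime(2024, 1, 1, 10, 0)),
    ("v1", "book_flight", datetime(2024, 1, 1, 10, 10)),
    ("v1", "book_hotel", datetime(2024, 1, 1, 10, 12)),
    ("v1", "book_hotel", datetime(2024, 1, 1, 10, 15)),
    ("v1", "book_flight", datetime(2024, 2, 1, 9, 0)),
    ("v1", "book_flight", datetime(2024, 2, 1, 9, 5)),
]
metrics = unit_of_work_metrics(events)
```

In this toy data the flight booking comes out as one unit-of-work with two periods of activity spread over a month, while the hotel booking is a separate unit-of-work that a visit-based model would have folded into the same session.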
There’s one advantage to the unit-of-work concept that is particularly attractive to us here at EY. It eliminates the problem of multi-purpose sessions in a two-tiered segmentation while fitting that segmentation scheme beautifully in every other respect. A unit-of-work just is a use-case – the 2nd tier of digital segmentation. And by eliminating visits, the problem of how to classify those visits with two or more clearly different use cases is also eliminated.
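Interestingly, you can ONLY switch to a concept like unit-of-work if you deploy something similar to our two-tiered segmentation and statistical ETL. Without some way of assigning activity to a use-case, there is nothing to tell you where one unit-of-work ends and the next begins.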
Now as John rightly points out, in our digital analytics tools, we are locked in to the concept of a visit. But in the analytics warehouse, we have no such constraint. Just as I recommended unpacking some Adobe data feed records in my first post on building a clean event-level feed, there’s no reason why we can’t re-organize our data into a unit-of-work level aggregation in the warehouse.
So should you scrap the concept of a visit in your warehouse?
For a few types of Websites, I think the answer may just be yes. If you have lots of “mixed” sessions or lots of multi-session transactional behavior, then it seems to me that the unit-of-work concept might work better than the traditional visit. For other Websites, however, I think the visit still makes sense as a logical way to organize the data.
Think of a visit as a sustained short-term period of activity on a Website. As such, I think it captures an important and real aspect of digital behavior.
As a sports fan, I usually keep a sports site in a tab on my browser. Periodically throughout the day, I’ll come back to that tab, refresh the main page, and check scores or look at new stuff. Each time I do that, it seems to me that I’ve started afresh – giving the site a new opportunity to engage me. Do those events necessarily happen thirty minutes apart? Sadly, no (at least on Football Sundays and other sports-heavy days). On the other hand, we don’t necessarily have a good way to capture and track when a browser session actually has focus, and even if we did track detailed focus events, we’d still probably be reluctant to describe very short losses of focus as creating distinct periods of activity. If I flip away from my sports site for 2 minutes and then come back, it’s quite different from when I’ve been away for 2 hours.
Which brings us back to drawing lines and the thirty-minute boundary question. Is 30 minutes a good boundary line? Probably not for most sites, but that’s a question to be answered empirically, not in the abstract. You have to draw the line somewhere, and the right place is going to be arbitrary but justifiable as being largely correct and conservative in whatever direction it seems most important to protect.
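One simple way to put the boundary question to the data is to re-sessionize the same events under several candidate time-outs and see how much the visit count moves. The Python sketch below does that for a single visitor. The function names and the candidate values are illustrative assumptions of mine, and the approach is one possible diagnostic, not a recommendation for any particular cutoff.

```python
from datetime import datetime, timedelta


def count_visits(timestamps, timeout):
    """Count visits for one visitor: a gap longer than `timeout` starts a new visit."""
    stamps = sorted(timestamps)
    if not stamps:
        return 0
    breaks = sum(1 for prev, cur in zip(stamps, stamps[1:]) if cur - prev > timeout)
    return 1 + breaks


def sweep_timeouts(timestamps, candidate_minutes):
    """Return {timeout_in_minutes: visit_count} for each candidate time-out."""
    return {
        minutes: count_visits(timestamps, timedelta(minutes=minutes))
        for minutes in candidate_minutes
    }


# Hypothetical day of activity, expressed as minutes after 9:00.
start = datetime(2024, 9, 8, 9, 0)
stamps = [start + timedelta(minutes=m) for m in (0, 2, 20, 65, 70, 200)]
counts = sweep_timeouts(stamps, [1, 15, 30, 60])
```

Run across a real population, a range of time-outs where the count barely changes would suggest the line is sitting in a natural lull in activity; a count that keeps shifting with every candidate suggests the choice really is arbitrary for that site.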
Indeed, it might make sense to blend the traditional time-based visit concept with the activity-based unit-of-work model and think about some time period in conjunction with some change in activity. If, for example, I pop the Gametracker for the Giants v. Dodgers into a browser window, then go away for 35 minutes and come back to check the score, it hardly seems like I’ve changed my activity in any significant way. It feels to me like the same visit as long as I’m tracking the same game. But if I pop that same Gametracker, come back in 15 minutes, refresh to the home page and read an article about the NFL draft, then I’ve changed my activity pattern and probably generated a new visit. On the other hand, if I do that same pattern inside a minute, I’d be inclined to call it all part of the same “mixed use-case” session.
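The Gametracker examples can be written down as a rule. The Python sketch below is one possible reading of them, under assumptions I am supplying: every event carries an activity label, a change of activity only starts a new visit if the gap exceeds a short threshold (`switch_gap`, set to 5 minutes here), and a long absence (`hard_gap`, set to 2 hours here) starts a new visit regardless of activity. Both thresholds are placeholders to be tuned, not values the examples determine.

```python
from datetime import datetime, timedelta


def assign_visits(events,
                  switch_gap=timedelta(minutes=5),
                  hard_gap=timedelta(hours=2)):
    """Number the visits for one visitor's events using time AND activity.

    events: list of (timestamp, activity) tuples; the activity labels are
    assumed to come from an upstream classification.
    Returns a list of visit numbers, one per event in time order.
    """
    visit_ids = []
    current = 0
    prev = None
    for ts, activity in sorted(events):
        if prev is None:
            current = 1
        else:
            gap = ts - prev[0]
            changed = activity != prev[1]
            # New visit after a long absence, or after a change of
            # activity that was not near-instant.
            if gap > hard_gap or (changed and gap > switch_gap):
                current += 1
        visit_ids.append(current)
        prev = (ts, activity)
    return visit_ids


t0 = datetime(2024, 4, 1, 13, 0)
events = [
    (t0, "gametracker"),
    (t0 + timedelta(minutes=35), "gametracker"),            # same game: same visit
    (t0 + timedelta(minutes=50), "nfl_draft"),               # 15 min later, new activity
    (t0 + timedelta(minutes=50, seconds=30), "gametracker"),  # quick switch: mixed session
    (t0 + timedelta(hours=4), "gametracker"),                # long absence
]
visits = assign_visits(events)  # [1, 1, 2, 2, 3]
```

Note that the 35-minute return to the same game stays inside the first visit, which a plain thirty-minute time-out would have split.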
It’s complicated!
I think blending time and activity might yield a model that comes closest to representing real digital behaviors in an intuitive fashion. But don’t kid yourself – it will still contain some largely arbitrary distinctions and will regularly misrepresent certain types of browsing behaviors. That’s the price we nearly always have to pay for creating an abstract model of any complicated reality.
So is the visit dead? Maybe. But long live the visit!