
Src: http://jonathonbalogh.com/2012/04/01/how-to-do-cohort-analysis-in-google-analytics/2/

Cohort analysis example: engagement

Never use analytics to track information that uniquely identifies a particular person, including their real name, email address or IP. It’s not only against Google Analytics’ terms of service, it’s also a lousy and unnecessary violation of privacy.

Most cohort analysis is based on users grouped by a common date range. We do this to see if their behavior from one period to the next has changed. It’s also possible to group users based on other attributes that they share, such as membership level or achieved goals. The objective is to learn whether users with this attribute tend to achieve our product goals at a significantly different rate than a baseline cohort over time.
What types of data should we track? This depends on the type of product you have and the level of detail you need. Ask yourself: what are the long term attributes of your users that Google Analytics doesn’t provide? Which properties best differentiate your users and are most relevant to your product? What questions are you trying to answer?

Good examples: total downloads, donated, sign up date, Klout score, gender, membership type, games played, referred friend, test group
Bad examples: number of visits, location, browser, referrer, number of pageviews, IP address, last name

Yes, there are exceptions to virtually every one of those examples. Use your judgement. If it’s important for you to know the number of people who started with Internet Explorer last year but are using Chrome this year then go ahead and record the user’s “Initial Browser”, for example.


In Google Analytics the majority of metrics are associated with a visit or session – this includes goals and events. When selecting trackable cohort attributes you’re making a decision about which user data to track across visits. Want to know how many downloads you had last week? Just use events or virtual pageviews. Cohort tracking doesn’t help with that. Need to track the number of visits in which users opened your pricing page, clicked a Learn More link and then signed up for your premium plan? Use a funnel; that’s what they’re for. Curious if last year’s paying members are as likely to pay this year as new members? Use a cohort analysis and track both sign up date and transactions.
There are, in fact, other ways to get this type of information. The best way is to just query your database directly. If users need to sign in to your product to use it then they likely have an account stored in your database. Want the number of users who’ve signed up in the last month and donated at least once? Just log in to your live database and execute the appropriate SQL query. Want to graph that for the last 6 months and compare it against the referring medium? No problem. Just parse your site log file to correlate visits to logins so you can update a new DB table on visitor attributes, then run another query, likely involving a join, on a replicated DB (to ensure stability), export the results, import the data into a spreadsheet or something else and then create the graphs. Heck, you can even manage funnel reports if you’re willing to work at it.
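To make the "just query your database" option concrete, here is a minimal sketch of the first question (users who signed up recently and donated at least once). The `users` and `donations` tables, their columns and the sample rows are all assumptions made up for illustration; your own schema will differ.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, signup_date TEXT);
    CREATE TABLE donations (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, '2012-03-20'), (2, '2012-03-25'), (3, '2011-10-19');
    INSERT INTO donations VALUES (1, 1, 5.0), (2, 1, 10.0), (3, 3, 20.0);
""")


def recent_donor_count(conn, since):
    """Count users who signed up on or after `since` and donated at least once."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT u.id)
        FROM users u
        JOIN donations d ON d.user_id = u.id
        WHERE u.signup_date >= ?
        """,
        (since,),
    ).fetchone()
    return row[0]


print(recent_donor_count(conn, "2012-03-01"))
```

The `COUNT(DISTINCT ...)` matters here: a user who donated twice should still count once. Dates are stored as ISO strings so that plain string comparison sorts them correctly.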
A homegrown analytics solution gives you lots of power and flexibility without having to rely on a third party service. And honestly, as involved as it may be, if you know what you’re doing you can automate your solution to the point where it’s just as fast and easy to use as a dedicated service. Maybe better. So why wouldn’t you? If you’re comfortable with this stuff, don’t mind investing the time and believe it’s critical for your product’s success then you probably should. For the rest of us, the investment in learning, building and maintaining this type of solution just isn’t worth it. (Though there are analytics services around that can help you with this.)

Blog example: Guido’s Mosquitos

I find things much easier to understand when looking at a real world situation. Let’s try a quick tutorial showing how you might use cohort analysis in Google Analytics to track engagement. Imagine your product is a blog advocating respect for your friend, the misunderstood mosquito. Your goal for “Guido’s Mosquitos” is to understand how well you retain your readers as well as record a few goals that they might reach on your site. In this case, you need to decide which cohort retention intervals you care about and which goals matter most. Let’s start with something like this:

Data layout:

Slot     Name              Example value   Description
Slot 1   Signup date       20111019        Date of user’s first visit
Slot 2   Weekly cohort     42              Week of user’s first visit
Slot 3   Ebook downloads   3               Number of ebooks downloaded
Slot 4   Goal tracking     RefSent         User referred a friend
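As a rough sketch of how the values in that layout could be derived, the helper below builds the four slot values from a user's first-visit date and activity. The function name `slot_values` is hypothetical, and using the ISO week number for the weekly cohort is my assumption; the layout only says "week of user's first visit". It does reproduce the example values (20111019 falls in week 42).

```python
from datetime import date


def slot_values(first_visit, ebook_downloads, referred_friend):
    """Build slot -> (name, value) pairs for the data layout.

    Assumption: the weekly cohort is the ISO week number of the first visit.
    """
    values = {
        1: ("Signup date", first_visit.strftime("%Y%m%d")),
        2: ("Weekly cohort", str(first_visit.isocalendar()[1])),
        3: ("Ebook downloads", str(ebook_downloads)),
    }
    # Slot 4 is only set once the user has actually referred a friend.
    if referred_friend:
        values[4] = ("Goal tracking", "RefSent")
    return values


for slot, (name, value) in slot_values(date(2011, 10, 19), 3, True).items():
    print(slot, name, value)
```

All values are emitted as strings, since that is how they would be matched later by a segment's regular expression.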

It’s a new year and you’re considering adding more ebooks for readers to download from your blog. However, you only want to do so if it’s likely to increase donations. How do you proceed? In this case, the cohort, the group of people you’re most interested in, is made up of users who have downloaded at least x of your ebooks. You don’t care when they started coming to your site, or even how long they stayed, just that they engaged in an activity of interest to you.

“Cohort: 0 downloads”
    Custom var: 3
    Matching RegExp: ^0$
“Cohort: 1 download”
    Custom var: 3
    Matching RegExp: ^1$
“Cohort: 2+ downloads”
    Custom var: 3
    Matching RegExp: ^([2-9]|[1-9][0-9]+)$
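If you want to sanity-check segment patterns like these before trusting a report, a few lines of Python will do. This is only a sketch: `segment_for` is a hypothetical helper, and the "2+" pattern here is written so that it also matches counts of 10 or more, which a single-digit character class on its own would miss.

```python
import re

# Segment name -> pattern applied to the slot 3 value (ebook downloads).
SEGMENTS = [
    ("Cohort: 0 downloads", re.compile(r"^0$")),
    ("Cohort: 1 download", re.compile(r"^1$")),
    # Either a single digit 2-9, or any number with two or more digits.
    ("Cohort: 2+ downloads", re.compile(r"^([2-9]|[1-9][0-9]+)$")),
]


def segment_for(slot3_value):
    """Return the name of the first segment whose pattern matches, or None."""
    for name, pattern in SEGMENTS:
        if pattern.search(slot3_value):
            return name
    return None


for value in ["0", "1", "3", "12"]:
    print(value, "->", segment_for(value))
```

Every download count should land in exactly one segment; a value that returns `None` (or would match two patterns) means the segments don't partition your users cleanly.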

With this segmentation you can jump over to an appropriately configured custom report and attempt to answer your initial question. For example, you might try to plot the number of goals achieved (donations) by each of the 3 user segments during the last couple months of the year.

Aak! The abundance of ebooks is killing your business! Ok, not really. This is a rather limited analysis and it’s important that we understand exactly what it says. Looking at the “Cohort: 1 download” segment, for example, the results might be read something like this: 14.49% of users who downloaded exactly 1 ebook made a donation in the last 2 months. These users may have downloaded their one ebook during the analysis period or any time before that.
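To spell out what a figure like that means, here is a small sketch of the same calculation done by hand: the share of users in each download cohort who donated during the analysis period. The function name and the sample records are invented for illustration and are not data from Guido’s Mosquitos.

```python
from collections import defaultdict


def donation_rate_by_cohort(users):
    """users: iterable of (ebook_downloads, donated_in_period) pairs.

    Returns cohort label -> fraction of that cohort's users who donated.
    """
    totals = defaultdict(int)
    donors = defaultdict(int)
    for downloads, donated in users:
        cohort = "0" if downloads == 0 else "1" if downloads == 1 else "2+"
        totals[cohort] += 1
        if donated:
            donors[cohort] += 1
    return {cohort: donors[cohort] / totals[cohort] for cohort in totals}


# Made-up sample: (downloads, donated during the period)
sample = [
    (0, False), (0, False), (0, True), (0, False),
    (1, True), (1, False),
    (2, True), (5, False),
]

for cohort, rate in sorted(donation_rate_by_cohort(sample).items()):
    print(f"{cohort} downloads: {rate:.2%} donated")
```

Note that the denominator is everyone in the cohort, regardless of when they downloaded; only the donation is restricted to the analysis period.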

Correlation between users who download ebooks and make donations

What we are trying to do is establish a correlation between our test segments (users who download ebooks) and our target goals (in this case, donations). The graph suggests that those who download ebooks are significantly more likely to donate but that those who download 1 ebook are just as likely to donate (if not more) as those who download 2 or more. The graph says nothing about why this is the case. Perhaps each of the downloaded ebooks repeats the same message and you’re boring your audience to tears. I don’t know. A more detailed attribution analysis would be required. But the investigation here should at least make you stop and think: maybe I should investigate this further before adding more ebooks, or perhaps there’s a better way to increase donations (preferably one with more promising data).