Understanding data discrepancy when applying secondary dimensions in Google Analytics


When it comes to tracking and analyzing metrics for a website or a mobile app, nothing beats good ol’ GA. But GA will often put you in situations where the data simply makes no sense, leaving you doubtful about every report you have ever made.

One such situation is when metrics (page views, users, sessions, etc.) shrink after you apply a secondary dimension or a custom dimension to a standard report in Google Analytics. There are two reasons why you might see reduced metrics:

When a secondary dimension is a derivative of a primary dimension:

This is the most obvious and the simplest reason why values drop in GA when you apply a secondary dimension. It occurs when the secondary dimension is derived from the primary dimension, so each row covers only a slice of the primary dimension's traffic. Say, for example, you want an overview based on the geographical location of your users. When you head over to Audience > Geo > Location, you will be served a report similar to the one below.

Note the traffic for the United States, then apply Browser as a secondary dimension. Filter the report by exact-matching Country to United States and Browser to Chrome. You will see a report similar to the one below:

This reduction is expected: selecting Chrome as the browser narrows the data. GA restricts traffic and other behavioral metrics to users in the United States who are using Google Chrome as their browser. The stripped-down metrics therefore exclude traffic from every other browser.

Note that both reports above cover the same date range.
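To make the arithmetic concrete, here is a minimal sketch of what that filter does. The rows and session counts are made up for illustration, and `sessions_for` is a hypothetical helper, not anything GA exposes:

```python
# Hypothetical report rows: (country, browser, sessions). All numbers are invented.
ROWS = [
    ("United States", "Chrome", 620),
    ("United States", "Safari", 270),
    ("United States", "Firefox", 110),
    ("India", "Chrome", 400),
]

def sessions_for(rows, country, browser=None):
    """Sum sessions for a country, optionally restricted to a single browser."""
    return sum(
        sessions
        for row_country, row_browser, sessions in rows
        if row_country == country and (browser is None or row_browser == browser)
    )

us_total = sessions_for(ROWS, "United States")             # primary dimension only
us_chrome = sessions_for(ROWS, "United States", "Chrome")  # with the Browser filter
print(us_total, us_chrome)
```

The Chrome figure is smaller simply because it is one slice of the United States total. If you add up all the browser rows for the United States, you get the original number back.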

The not-so-obvious reduction in metrics when applying secondary dimensions

The second scenario, where values drop sharply after you apply a secondary dimension, will simply leave you flabbergasted. Even though the date range stays the same, the metrics for a primary dimension shrink drastically for no apparent reason. To understand this, have a look at this example:

The above image shows behavioral metrics for the homepage of our example site. Now, if you wanted to segment the users by gender, you would apply Gender as a secondary dimension to split the data into male and female users. But as soon as you apply the secondary dimension, this is what happens:

Even if you total the individual metrics manually, they never add up to the metrics of the primary dimension. So why is this happening? Have you been reporting wrongly all this time?

DATA SAMPLING: THE CAUSE!

This issue arises from data sampling, which Google Analytics applies automatically whenever a large amount of data is requested. Sampling reduces processing time and latency: you are served results computed from a subset of the original data as quickly as possible, at the cost of accuracy.

Whenever you see a warning highlighted in yellow (generally when you apply a secondary or custom dimension), you can be sure that you are viewing a report based on a sample rather than on the full data.

Data sampling can be very damaging for your reports and can throw your numbers off by anywhere from 10% to 30%. If you run a blog where authors are paid by the number of page views, for example, sampling can create a mess, and there are numerous other cases like it.
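To see why a sampled report drifts from the real totals, here is a rough sketch of the general idea: take a random subset of sessions, total the metric, and scale it back up. This is not GA's actual sampling algorithm. The data is randomly generated and `sampled_estimate` is a hypothetical helper:

```python
import random

def sampled_estimate(page_views_per_session, rate, seed=0):
    """Estimate total page views from a random sample of sessions.

    The sample total is scaled up by (all sessions / sampled sessions).
    This only illustrates sampling in general, not GA's own method.
    """
    rng = random.Random(seed)
    total_sessions = len(page_views_per_session)
    sample_size = max(1, int(total_sessions * rate))
    sample = rng.sample(page_views_per_session, sample_size)
    return sum(sample) * total_sessions / sample_size

# Invented data: page views for each of 10,000 sessions.
data_rng = random.Random(42)
sessions = [data_rng.choice([1, 1, 1, 2, 3, 10]) for _ in range(10_000)]

true_total = sum(sessions)
estimate = sampled_estimate(sessions, rate=0.05)
error_pct = abs(estimate - true_total) / true_total * 100
print(f"true={true_total} estimate={estimate:.0f} error={error_pct:.1f}%")
```

The smaller the sample and the more uneven the sessions, the further the estimate tends to land from the true total. Narrow slices such as a single page split by gender are where this hurts most.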

How to overcome the problem of Data Sampling in Google Analytics?

By default, the free version of Google Analytics has an upper limit of 250k sessions per report request. Once the data for your selected date range goes past this limit, you are served a sampled report. So if you see the yellow sampling warning, the first thing to try is shortening the date range.
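If you need a long period, one option is to pull it in shorter windows and combine the results yourself. Here is a minimal sketch of the splitting step. `split_date_range` is a hypothetical helper, and the 30-day window is an arbitrary choice that you would tune until the warning disappears:

```python
from datetime import date, timedelta

def split_date_range(start, end, max_days):
    """Split the inclusive range [start, end] into windows of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=max_days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows

windows = split_date_range(date(2024, 1, 1), date(2024, 3, 31), max_days=30)
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Additive metrics such as page views and sessions can be summed across the windows. Users cannot, because the same person may appear in more than one window and would be counted twice.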

The other way is to use standard reports. Standard reports are not sampled, so you get accurate results. When you need advanced segmentation, secondary dimensions or custom dimensions, the best way to go is often to export the data to Excel and apply formulas to build a custom report. You will find most of the data you need in GA's standard reports without having to apply any custom or secondary dimensions.
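The same export-and-combine approach works in a script instead of a spreadsheet. The sketch below takes the author-payout case: it assumes a CSV export with `Page` and `Pageviews` columns and a page-to-author mapping that you maintain yourself. The column names, the mapping and the numbers are all assumptions for illustration:

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported, unsampled standard report (assumed column names).
EXPORT = """Page,Pageviews
/post-a,1200
/post-b,800
/post-c,300
"""

# Hypothetical mapping kept outside GA, e.g. in your CMS or a spreadsheet.
PAGE_AUTHOR = {"/post-a": "Alice", "/post-b": "Bob", "/post-c": "Alice"}

def pageviews_per_author(csv_text, page_author):
    """Total exported page views per author; unmapped pages go to 'unknown'."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        author = page_author.get(row["Page"], "unknown")
        totals[author] += int(row["Pageviews"])
    return dict(totals)

totals = pageviews_per_author(EXPORT, PAGE_AUTHOR)
print(totals)
```

Because the breakdown happens after the export, the page view counts come straight from the unsampled standard report rather than from a sampled custom view.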

Thanks, that clearly describes the issue with "not-so-obvious reduction in metrics when applying secondary dimensions", but in my case problem is not caused by sampled data, report shows that exploration is unsampled. Any clue on why it can happen?


More articles by Syed Afzal Ali
