How does collection sampling affect the "live" stats for Google Analytics? - google-analytics

We've noticed lately that as our site is growing, our data in Google Analytics is getting less reliable.
One of the places we've noticed this most strongly is on the "Realtime Dashboard".
When we were getting 30k users per day, it would show about 500-600 people on line at a time. Now that we are hitting 50k users per day, it's showing 200-300 people on line at a time.
(Other custom metrics from within our product show that the user behavior hasn't changed much; if anything, users are currently spending longer on the site than ever!)
The daily totals in analytics are still rising, so it's not like it's just missing the hits or something... Does anyone have any thoughts?

The only thing I can think of is that there is probably a difference in interpretation of what constitutes a user being on line.
How do you determine if the user is on line?
Unless there is an explicit login/logout tracking, is it possible that it assumes that a user has gone if there is no user generated event or a request from the browser within an interval of X seconds?
If that is the case then it may be worth while adding a hidden iframe with some Javascript code that keeps sending a request every t seconds.

You can't compare instant measures of unique, concurrent users to different time-slices of unique users.
For example, you could have a small number of concurrent unique users (say 10) and a much higher daily unique users number like 1000, because 1000 different people were there over the course of the day, but only 10 at any given time. The number of concurrent users isn't correlated to the total daily uniques, the distribution over the course of the day may be uneven and it's almost apples and oranges.
This is the same way that monthly unique and daily uniques can't be combined, but average daily uniques are a lower bound for monthly uniques.

Related

How can I calculate total user time on site across sessions in Google Analytics?

I want to know the total amount of time any one of our users has spent on the site, across sessions. Is there a way to find this stat in Google Analytics?
If you want to track an individual customer through different sessions, it is not natively possible under Google Analytics, and there are quite a few traps you need to avoid to make sure to be complying with the conditions of use of the service. You can find more information here
If you mean tracking the total duration of sessions, Session Duration and Time on Page should give you what you want.
They basically measure the same thing, but can give you slightly different results depending on your implementation.

Unique Users in Google Analytics

I'm trying to get all unique visitors for a selected time period, but I want to filter them by date on the server. However, the sum of unique visitors for each day isn't the number of unique visitors for the time period.
For example:
Monday: 2 unique visitors
Tuesday: 3 unique visitors
The unique visitors for the two days period isn't necessarily 5.
Is there a way to get the results I want using the Google Analytics API (v3)?
You're right that Users aren't additive, so you can't simply add them day by day. There are several ways around this.
The fist and most obvious is that if you've implemented the User-ID you should be able to straight up pull and interrogate the data about which users saw your site on which days.
Another way I've implemented before is to dynamically pull the number of Users from the Google Analytics API whenever you need it. Obviously this only works if you're populating a live web dashboard or similar, but since it's just the one figure you're asking for, it wouldn't slow down the load time by much. Eg. if you're using a dashboarding tool such as Klipfolio, you may be able to define a dynamic data source, and query Google whenever you needthe figure (https://support.klipfolio.com/hc/en-us/articles/216183237-BETA-Working-with-dynamic-data-sources)
You could also limit the number of ways that the data can be interrogated, and calculate all of them. For example, if you only allow users to look at data month-by-month or day-by-day, then you only need those figures.
Finally, you can estimate the figure with reasonable accuracy by splitting it into two parts. New Users are equal to New Sessions (you're only new on your first Session), which is additive, so that figure can be separated out and combined as required.
Then, you could take a rough ratio of new to returning Users (% New Users) from, say, 1 year of data, and use that with the New Users figure to generate an average on any level.

Data sampling in google analytics goal flow report

The goal flow report on my google analytics account shows some strange sampling behavior. While I can usually select up to a month of data before sampling starts it seems to be different for the goal flow report.
As soon as I select more than one day of data the used data set is getting smaller very fast. At three days the report ist based on only 50% of the sessions, which, according to analytics, comes to only 35 sessions.
Has anyone experienced a similar behavior of sampling although only very small data-sets are used?
Sampling is induced when your request is calculation-intensive; there's no 'garunteed point at which it trips.
Goal flow complexity will increase exponentially as you add goals, so even a low number of goals might make this report demand a lot of processing.
Meanwhile you'll find that moast of the standard reports can cover large periods of time without sampling; they are preaggreated, so it's very cheap to load them.
If you want to know more about sampling, see here:
https://stackoverflow.com/a/37386181/5815149

Why are Google Analytics Dashboard statistics changing?

Background:
I have a Google Analytics account using which I am tracking user activity for web and mobile app. After logging into your account and choosing the web property and the corresponding view, you generally see a dashboard with quick stats like Pageviews, Users, Sessions, Pages/Sessions, Avg. Session Duration, Bounce Rate and percentage of new sessions. You can change the time period (from the top right area of the Dashboard) to get the same stats for that period.
Problem:
Last week, I was interested in the three main stats: Page views, Users and Sessions for a particular day - say, day A. The dashboard showed the following stats:
Pageviews - 1,660,137
Users - 496,068
Sessions - 983,549
This report was based on 100% of sessions.
I go back to the dashboard TODAY and check the same stats for the same day A. Here's what I saw:
Pageviews - 1,660,137
Users - 511,071
Sessions - 1,005,517
This report is also based on 100% of sessions.
Nothing was changed in the tracking code for the web and mobile app. Could someone explain why I have this difference in the stats? Is this normal?
They need some time to update the system, otherwise their system would overwhelm
When you first create a profile it can take up to 48 -72 hours for it to start showing data.
After that time data will appear instantly in the Real-time reports.
Standard reports take longer to finish processing. You need to remember the amount of data that is being processed. Some of the data may appear in the standard reports after a few hours. The numbers have not completed processing for at least 24 hours, so anything you look at then will not be accurate.
When checking Google Analytics never look at todays or yesterdays numbers in the standards reports, if you want accurate information. Things get even more confusing when you consider time zones. When exactly is it yesterday? I have noticed numbers changing as far back as 48 hours. But Google Says in there documentation 24 hours. I am looking for the link in the documentation will post it when I find it.
Found it: Data Limits
Data processing latency
Processing latency is 24-48 hours. Standard accounts that send more
than 200,000 sessions per day to Google Analytics will result in the
reports being refreshed only once a day. This can delay updates to
reports and metrics for up to two days. To restore intra-day
processing, reduce the number of sessions you send to < 200,000 per
day. For Premium accounts, this limit is extended to 2 billion hits
per month.
So try doing the same thing again today but check your last day being Monday. When you check again next week the numbers should be correct.

big difference in "visitor" count

I try to pull out the (unique) visitor count for a certain directory using three different methods:
* with a profile
* using an dynamic advanced segment
* using custom report filter
On a smaller site the three methods give the same result. But on the large site (> 5M visits/month) I get a big discrepancy between the profile on one hand and the advanced segment and filter on the other. This might be because of sampling - but the difference is smaller when it comes to pageviews. Is the estimation of visitors worse and the discrepancy bigger when using sampled data? Also when extracting data from the API (using filters or profiles) I still get DIFFERENT data even if GA doesn't indicate that the data is sampled - ie I'm looking at unsampled data.
Another strange thing is that the pageviews are higher in the profile than the filter, while the visitor count is higher for the filter vs the profile. I also applied a filter at the profile to force it to use sample data - and I again get quite similar results to the filter and segment-data.
profile filter segment filter#profile
unique 25550 37778 36433 37971
pageviews 202761 184130 n/a 202761
What I am trying to achieve is to find a way to get somewhat accurat data on unique visitors when I've run out of profiles to use.
More data with discrepancies can be found in this google docs: https://docs.google.com/spreadsheet/ccc?key=0Aqzq0UJQNY0XdG1DRFpaeWJveWhhdXZRemRlZ3pFb0E
Google Analytics (free version) tracks only 10 mio page interactions [0] (pageviews and events, any tracker method that start with "track" is an interaction) per month [1], so presumably the data for your larger site is already heavily sampled (I guess each of you 5 Million visitors has more than two interactions) [2]. Ad hoc reports use only 1 mio datapoints at max, so you have a sample of a sample. Naturally aggregated values suffer more from smaller sample sizes.
And I'm pretty sure the data limits apply to api access too (Google says that there is "no assurance that the excess hits will be processed"), so for the large site the api returns sampled (or incomplete) data, too - so you cannot really be looking at unsampled data.
As for the differences, I'd say that different ad hoc report use different samples so you end up with different results. With GA you shouldn't rely too much an absolute numbers anyway and look more for general trends.
[1] Analytics Premium tracks 50 mio interactions per month (and has support from Google) but comes at 150 000 USD per year
[2] Google suggests to use "_setSampleRate()" on large sites to make sure you have actually sampled data for each day of the month instead of random hit or miss after you exceed the data limits.
Data limits:
http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983).
setSampleRate:
https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate
Yes, the sampled data is less accurate, especially with visitor counts.
I've also seen them miss 500k pageviews over two days, only to see them appear in their reporting a few days later. It also doesn't surprise me to see different results from different interfaces. The quality of Google Analytics has diminished, even as they have tried to become more real-time. It appears that their codebase is inconsistent across API's, and their algorithms are all over the map.
I usually stick with the same metrics and reporting methods, so that my results remain comparable to one another. I also run GA in tandem with Gaug.es, as a validation and sanity check. With that extra data, I choose the reporting method in GA that I am most confident with and I rely on that exclusively.

Resources