Unique Users in Google Analytics - google-analytics

I'm trying to get all unique visitors for a selected time period, but I want to filter them by date on the server. However, the sum of unique visitors for each day isn't the number of unique visitors for the time period.
For example:
Monday: 2 unique visitors
Tuesday: 3 unique visitors
The unique visitors for the two days period isn't necessarily 5.
Is there a way to get the results I want using the Google Analytics API (v3)?

You're right that Users aren't additive, so you can't simply add them day by day. There are several ways around this.
The fist and most obvious is that if you've implemented the User-ID you should be able to straight up pull and interrogate the data about which users saw your site on which days.
Another way I've implemented before is to dynamically pull the number of Users from the Google Analytics API whenever you need it. Obviously this only works if you're populating a live web dashboard or similar, but since it's just the one figure you're asking for, it wouldn't slow down the load time by much. Eg. if you're using a dashboarding tool such as Klipfolio, you may be able to define a dynamic data source, and query Google whenever you needthe figure (https://support.klipfolio.com/hc/en-us/articles/216183237-BETA-Working-with-dynamic-data-sources)
You could also limit the number of ways that the data can be interrogated, and calculate all of them. For example, if you only allow users to look at data month-by-month or day-by-day, then you only need those figures.
Finally, you can estimate the figure with reasonable accuracy by splitting it into two parts. New Users are equal to New Sessions (you're only new on your first Session), which is additive, so that figure can be separated out and combined as required.
Then, you could take a rough ratio of new to returning Users (% New Users) from, say, 1 year of data, and use that with the New Users figure to generate an average on any level.

Related

How can I view individual hits to pages within a GA custom report

I would like to compare some data between a 3rd party analytics tool and GA.
Now I would love to see the IP addresses that Ga is receiving however it seems that they do not reveal this information, fine, however, I cannot find a way to use the flat table in the GA custom report to show me the following if possible;
Full Date Time (Seems as though they don't want you to have this either)
Browser Version
Browser Width & Height
Page (from the hit)
And I would like this data not to be grouped by the metric, this way I can see that if the same user has hit a page 3 times it isn't grouped.
If anyone can help please let me know. If the question is poorly phrased please let me know.
Thanks,
Connor.
This requires some work, and it will allow the breakdown only for future hits, not for hits that are already collected.
To view individual hits you need to create a hit based dimension that is unique per hit. Unless your page has an amazing amount of traffic a timestamp in milliseconds (e.g. new Date().getTime()) will be sufficient (for your report you might want to format that in a nice way). So in the admin section of your GA property you go to custom definitions, create a hit scoped custom dimension, and then modify your pagecode to send the timestamp to that dimension. Hit scoped means it is attached to the pageview (or other interacton hit) it is sent with.
If you want to break down your report by user you need the clientid (clientid is how Google recognizes that hits belong to the same user). Again, send it as a custom dimension.
This does not tell you how many sessions the user had (there is no session identifier in GA). If you need to know that you can create a session scoped custom dimension and send a random number along ("session scope" means that GA only stores the last value in a session, so you don't need to maintain a session id over multiple pageviews, since the last value will be set for all hits within the session). The number of different sessions ids per client id then tells you the number of sessions per user.
The takeaway is that GA only shows aggregated data, and if you want to defeat this mechanism you need to throw data at it that cannot be aggregated further. You might run into other constraints (i.e. there is a limited number of rows per report).

Google Analytics API: Creating arbitrary "dimensions" to group metrics

I have a requirement to programmatically get unique visitors grouped by partial matches on some fields. For example, assume I want to group my users by the source domain like "google" or "facebook".
A single user's visits might come in with a ga:source of "m.facebook.com" and then "www.facebook.com" on another visit, or "m.google.com" and "www.google.co.uk", etc. I can perform an API query specifying "ga:source" as the dimension, and it will give me the unique visitors for "m.facebook.com", "www.facebook.com", "m.google.com" and "www.google.co.uk" respectively. However users who visited via more than one of them in the requested period are counted in each group, so aggregating this data subsequently into "facebook" and "google" groups results in duplicate users being counted.
Would it be possible to group the "ga:source" dimension using a Regex (^(?:.*?\.)(.*?)(?:\..*) for instance) or some similar arbitrary mechanism so that I can get two groups of unique visitors instead: "facebook" and "google"?
I can of course, use filters to get each category and then perform multiple requests and that works fine, but being the lazy programmer I am, I was wondering if I could do it all in one go, or if anyone had alternative suggestions I haven't thought of.
The conclusion appears to be that the only way is indeed to submit a filtered query for each desired grouping of unique users. So I shall do that. :)

How does collection sampling affect the "live" stats for Google Analytics?

We've noticed lately that as our site is growing, our data in Google Analytics is getting less reliable.
One of the places we've noticed this most strongly is on the "Realtime Dashboard".
When we were getting 30k users per day, it would show about 500-600 people on line at a time. Now that we are hitting 50k users per day, it's showing 200-300 people on line at a time.
(Other custom metrics from within our product show that the user behavior hasn't changed much; if anything, users are currently spending longer on the site than ever!)
The daily totals in analytics are still rising, so it's not like it's just missing the hits or something... Does anyone have any thoughts?
The only thing I can think of is that there is probably a difference in interpretation of what constitutes a user being on line.
How do you determine if the user is on line?
Unless there is an explicit login/logout tracking, is it possible that it assumes that a user has gone if there is no user generated event or a request from the browser within an interval of X seconds?
If that is the case then it may be worth while adding a hidden iframe with some Javascript code that keeps sending a request every t seconds.
You can't compare instant measures of unique, concurrent users to different time-slices of unique users.
For example, you could have a small number of concurrent unique users (say 10) and a much higher daily unique users number like 1000, because 1000 different people were there over the course of the day, but only 10 at any given time. The number of concurrent users isn't correlated to the total daily uniques, the distribution over the course of the day may be uneven and it's almost apples and oranges.
This is the same way that monthly unique and daily uniques can't be combined, but average daily uniques are a lower bound for monthly uniques.

Doing cohort analytics on Google Analytics

Suppose I have 65 people that register on January 1, 2012.
I want to find out how many of those 65 people returned to the site that same week. (More generally, if n people signup on date A, I want to be able to find out how many of those n people return in a given date range.)
Is there a way to do this using Google Analytics? If so, how? I am currently getting the user's username for each page hit.
If you only need to track people who sign in then you don't need to get very fancy. You can copy the relevant user attributes, such as sign up date, from your DB to GA using events or session level custom variables.
But if you want to track everyone, including those who don't sign up, then you'll need to use visitor level custom variables (GA cookies).
I explain how to set this up in detail in this post so I'll just highlight the key points here:
First, decide how to layout the data in Google Analytic's custom variables based on your requirements. For example, are you storing retention dates for daily, weekly or monthly tracking? Do you also want to track cohort goals? Partition this data into the available custom variable slots.
Write the cohort data to these custom variables when visitors arrive or achieve goals using Google Analytic's _setCustomVar function. Setting the fourth parameter of that function to 1 indicates you want to do visitor-level (cookie) tracking.
For each cohort you wish to analyze, create an advanced segment in Google Analytics. Using a regex expression in the condition will give you the flexibility to segment for interesting cohorts. ex: "All users whose first visit was the week before Christmas".
Analyze the results with reports by specifying a date range and the corresponding cohort-sliced advanced segments. Another option is to extract the data using the Google Analytics Data Feed Query Explorer or their API.
Once you've put in the work your new visitors will be stamped by their first visit date and nicely fall into each daily or weekly retention bucket. This is what it might look like if you were tracking weekly retention, for example:
This is not a full solution, but here are some points on how I would approach this problem with the help of Google Analytics:
You have to make sure that you somehow store the registration date of each user, either in your database or in a cookie. Then have a look at Google Analytics Event Tracking. You could for example set up a new category based on the registration date. On every page load in your page, you then have to set up this event tracking call, for example like:
_trackEvent("returns", "2012-01-01", "UserId:123123123")
This way you will receive all page views for users that registered on that particular date. To add a date range in this, you have to make sure that these events only get fired for the number of dates after the signup (e.g. 7 days).
After your date range, you will be able to see how many page views and how many users returned - you even know which users came back.

big difference in "visitor" count

I try to pull out the (unique) visitor count for a certain directory using three different methods:
* with a profile
* using an dynamic advanced segment
* using custom report filter
On a smaller site the three methods give the same result. But on the large site (> 5M visits/month) I get a big discrepancy between the profile on one hand and the advanced segment and filter on the other. This might be because of sampling - but the difference is smaller when it comes to pageviews. Is the estimation of visitors worse and the discrepancy bigger when using sampled data? Also when extracting data from the API (using filters or profiles) I still get DIFFERENT data even if GA doesn't indicate that the data is sampled - ie I'm looking at unsampled data.
Another strange thing is that the pageviews are higher in the profile than the filter, while the visitor count is higher for the filter vs the profile. I also applied a filter at the profile to force it to use sample data - and I again get quite similar results to the filter and segment-data.
profile filter segment filter#profile
unique 25550 37778 36433 37971
pageviews 202761 184130 n/a 202761
What I am trying to achieve is to find a way to get somewhat accurat data on unique visitors when I've run out of profiles to use.
More data with discrepancies can be found in this google docs: https://docs.google.com/spreadsheet/ccc?key=0Aqzq0UJQNY0XdG1DRFpaeWJveWhhdXZRemRlZ3pFb0E
Google Analytics (free version) tracks only 10 mio page interactions [0] (pageviews and events, any tracker method that start with "track" is an interaction) per month [1], so presumably the data for your larger site is already heavily sampled (I guess each of you 5 Million visitors has more than two interactions) [2]. Ad hoc reports use only 1 mio datapoints at max, so you have a sample of a sample. Naturally aggregated values suffer more from smaller sample sizes.
And I'm pretty sure the data limits apply to api access too (Google says that there is "no assurance that the excess hits will be processed"), so for the large site the api returns sampled (or incomplete) data, too - so you cannot really be looking at unsampled data.
As for the differences, I'd say that different ad hoc report use different samples so you end up with different results. With GA you shouldn't rely too much an absolute numbers anyway and look more for general trends.
[1] Analytics Premium tracks 50 mio interactions per month (and has support from Google) but comes at 150 000 USD per year
[2] Google suggests to use "_setSampleRate()" on large sites to make sure you have actually sampled data for each day of the month instead of random hit or miss after you exceed the data limits.
Data limits:
http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983).
setSampleRate:
https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate
Yes, the sampled data is less accurate, especially with visitor counts.
I've also seen them miss 500k pageviews over two days, only to see them appear in their reporting a few days later. It also doesn't surprise me to see different results from different interfaces. The quality of Google Analytics has diminished, even as they have tried to become more real-time. It appears that their codebase is inconsistent across API's, and their algorithms are all over the map.
I usually stick with the same metrics and reporting methods, so that my results remain comparable to one another. I also run GA in tandem with Gaug.es, as a validation and sanity check. With that extra data, I choose the reporting method in GA that I am most confident with and I rely on that exclusively.

Resources