Unsampled reports automation for historical data - google-analytics

We have a client who receives 2-4 million visits a day, so off the bat we can only get accurate numbers through unsampled reports, because the traffic exceeds Google's limit:
500,000 maximum sessions for special queries where the data is not already stored.
We are attempting to collect Unique Visitors and Visits for a 1-day period. Using the Google API has proved fruitless, as the data comes back sampled.
We have set up unsampled reports on a daily basis that get dumped into Google Drive, and our application picks up the new files and downloads them just fine. The problem we are running into is that we need 2 years' worth of daily data for 20 reports. The maximum range over which we can run an unsampled report in the Google Analytics web interface is 1 week before we exceed a query limit. So 52 weeks x 2 years x 20 different reports comes to 2,080 scheduled unsampled reports to set up, and that is for 1 client only.
EDIT: Can we automate unsampled reports using the GA API, or any programming method, to pull historical data under the constraints mentioned above? Also, we do have Google Analytics Premium.

Cris G, the only way to avoid data sampling in Google Analytics without access to Premium is the day-parting technique: you split a data request for the selected time period into shorter-period queries (typically single days) and then add all the numbers up. If your profiles/views are not sampled when you look at daily numbers, this could solve your issue.
However, this doesn't work for Unique Visitors, since they will be counted as unique every single time (you are running data requests on a daily basis), so there will most likely be duplicates and inflated totals if your site attracts lots of returning visitors.
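For illustration, here is a minimal day-parting sketch against the Core Reporting API (v3) in Python. It assumes an authorized `service` object built with google-api-python-client, and the view ID is a placeholder:

```python
# Day-parting sketch: one query per day, then sum. Sessions are additive, so
# summing daily values is safe; ga:users is not (returning visitors would be
# double-counted across days, as noted above).
from datetime import date, timedelta

PROFILE_ID = "ga:12345678"  # placeholder view (profile) ID

def daily_sessions(service, start, end):
    total = 0
    day = start
    while day <= end:
        ds = day.isoformat()
        result = service.data().ga().get(
            ids=PROFILE_ID,
            start_date=ds,
            end_date=ds,  # a one-day window keeps the query small
            metrics="ga:sessions",
        ).execute()
        # The response flags whether this slice was still sampled.
        if result.get("containsSampledData"):
            print(f"warning: {ds} is sampled even at one-day granularity")
        total += int(result.get("totalsForAllResults", {}).get("ga:sessions", 0))
        day += timedelta(days=1)
    return total
```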
To automate some of the work, I suggest using a tool like Analytics Canvas. It can make your life much easier, and I think it could be the perfect tool for what you need to do. Bear in mind the limitation around unique visitors (and some other metrics).
Having said that, I still think the best choice would be to use the benefits of Premium and its ability to get unsampled data for your reports.
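Since you do have Premium, one route worth checking is the Management API's unsampledReports collection, which can create unsampled reports programmatically instead of through the web interface. The sketch below is a rough illustration only: the IDs are placeholders, and `service` is assumed to be an authorized Analytics v3 service object.

```python
# Hedged sketch: creating daily unsampled reports through the Management API
# (unsampledReports.insert is only available to Premium/360 accounts).
from datetime import date, timedelta

def request_unsampled_report(service, report_date):
    body = {
        "title": f"daily-{report_date.isoformat()}",
        "start-date": report_date.isoformat(),
        "end-date": report_date.isoformat(),
        "metrics": "ga:sessions,ga:users",
    }
    return service.management().unsampledReports().insert(
        accountId="12345",           # placeholder account ID
        webPropertyId="UA-12345-1",  # placeholder web property ID
        profileId="67890",           # placeholder view (profile) ID
        body=body,
    ).execute()

def backfill(service, first_day, last_day):
    # One request per day replaces the 2,080 hand-configured weekly reports.
    day = first_day
    while day <= last_day:
        request_unsampled_report(service, day)
        day += timedelta(days=1)
```

The finished reports could then land in Google Drive and be picked up by the downloader you already have in place.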

Related

Google Analytics Real-time + historical data

I work for a non-profit that needs to see how our fundraising efforts are going in 'real-time'.
We look at results in blocks of about a half hour, so we need to report on how we finished the last 24 hours or so and also where we're at in the current half-hour. We're accomplishing this through Google Analytics, as we have multiple fundraising streams all pointing to a common GA account.
I have tried using Data Studio to report against the GA API, but that connector does not seem to refresh at a reliable rate: sometimes it'll pull fresh data within a minute, sometimes it can take twenty minutes to report on recent transactions. I believe the 'real-time' API could be used to get fresher GA data, but as far as I can tell, that will only report 'live' data, and not prior/historical data (say, from four hours ago). Does anyone know what API, if any, I could use to pull all data from history through the current datetime?
I apologize if this request is vague, but I'm just looking for a conceptual approach at this point to get the freshest data, preferably in one fell swoop (one API call). There is more complexity after data intake (I then have to compare it to the goals we've set for each half-hour, amongst other nuances to the transactions themselves), so I wanted to start with this fundamental piece/question.
Thanks!
Given the context provided, I believe that an API solution would not be feasible. Among other reasons:
The Real Time API only offers a limited set of dimensions and metrics. For example, e-commerce data is not available.
https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/
https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets
The standard intraday-processing SLA for the Core Reporting API is < 24 hours for standard properties, and the processing occurs on a best-effort basis. That means hourly availability can occur from time to time but cannot be guaranteed.
https://support.google.com/analytics/answer/7084038?hl=en
As an alternative to the API solution, you could consider using an App + Web property, which would allow you to stream event data in real time to BigQuery (a rough sketch follows the links below). However, this solution has some cost implications and would introduce you to a new tracking paradigm.
https://developers.google.com/analytics/devguides/collection/app-web/tag-guide
https://support.google.com/firebase/answer/6318765?hl=en
https://www.simoahava.com/analytics/getting-started-with-google-analytics-app-web/
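To make the BigQuery option concrete, here is a hedged Python sketch of the kind of query you could run against the intraday export once streaming is set up. The project, dataset, and reliance on the 'purchase' event are assumptions to adapt to your own export:

```python
# Sketch: half-hour purchase counts from the App + Web (GA4) intraday export.
# Assumes the standard BigQuery export schema (events_intraday_* tables with
# event_name and event_timestamp in microseconds since the epoch).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

QUERY = """
SELECT
  TIMESTAMP_SECONDS(1800 * DIV(UNIX_SECONDS(TIMESTAMP_MICROS(event_timestamp)), 1800))
    AS half_hour,
  COUNT(*) AS purchases
FROM `my-project.analytics_123456.events_intraday_*`   -- placeholder dataset
WHERE event_name = 'purchase'
GROUP BY half_hour
ORDER BY half_hour
"""

for row in client.query(QUERY).result():
    print(row.half_hour, row.purchases)
```

Because events are streamed into the intraday table as they happen, a single query covers both the historical and the near-real-time window, which is close to the "one fell swoop" the question asks for.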

Google API Returning Sampled Data

I'm using the Google Analytics API to query my analytics data via the Google Analytics Spreadsheet Add-on, and we then use the spreadsheet data in Google Data Studio for a dashboard.
Everything had been going well for the last few months; however, over the last 48 hours we have begun to receive sampled data when we query the API using the spreadsheet add-on. This is undesirable for how we are using the data.
The total number of results being returned before was about 1,100. We have altered the date range of the query to be only 3 days, whereas before we were querying since the start of the year.
Initially that worked and the results were no longer sampled. Then, 24 hours later, the data appears to be sampled again.
The documentation says the following regarding sampling for the free account:
Analytics Standard: 500k sessions at the view level for the date range you are using
We are not using our analytics that heavily, so I cannot understand why we would have hit the 500k limit.
It is also not clear to me what the "view level" is. Any help on this would be greatly appreciated.

How to extract Google Analytics historical data using APIs. Pros and cons?

I'm doing a quick proof of concept to understand the procedure for extracting historical data from Google Analytics, to be used later for offline data stitching to generate a holistic view of the data and its analysis. I have not found any detailed online documentation covering the pros and cons.
Would like to know any limitations on:
The time period for which data can be extracted or any limitation for max. calendar days?
Whether all dimensions/metrics can be extracted or any specific ones?
Will the data be real-time or sampled?
Can all data be pulled into a single table or separate ones?
Will it be available for both freeware and premium version?
The time period for which data can be extracted or any limitation for max. calendar days?
The start date cannot be before the launch of Google Analytics on 2005-01-01. Due to processing lag, extracting data that is newer than 2 days old can result in incomplete data; I recommend checking the isDataGolden flag on the response.
Requesting large date ranges can result in sampling, which cannot be prevented. It's best to request the data in small chunks (a sketch of both checks follows).
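As a minimal illustration, assuming an authorized Reporting API v4 service object (`analytics`) and a placeholder view ID, both conditions can be checked on each response like this:

```python
# Sketch: pull one day and inspect the v4 response for completeness
# (isDataGolden) and sampling (samplesReadCounts only appears when sampled).
VIEW_ID = "12345678"  # placeholder view ID

def fetch_day(analytics, day):
    response = analytics.reports().batchGet(body={
        "reportRequests": [{
            "viewId": VIEW_ID,
            "dateRanges": [{"startDate": day, "endDate": day}],
            "metrics": [{"expression": "ga:sessions"}],
        }]
    }).execute()
    data = response["reports"][0]["data"]
    if not data.get("isDataGolden", False):
        print(f"{day}: still being processed, pull again later")
    if "samplesReadCounts" in data:
        print(f"{day}: sampled, shrink the request window")
    return data
```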
Whether all dimensions/metrics can be extracted or any specific ones?
A list of the dimensions and metrics you can extract can be found in the Dimensions & Metrics Explorer. Each request can contain a maximum of 7 dimensions and 10 metrics.
Will the data be real-time or sampled?
The Real Time API and the Reporting API are two different APIs. The Real Time API is not, to my knowledge, sampled, but since it only covers roughly the last five minutes of data, I find it hard to imagine anyone but a really big website hitting that problem if it is.
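For clarity, here is how differently the two are called; this is a sketch with a placeholder view ID, assuming an authorized Analytics v3 `service` object:

```python
IDS = "ga:12345678"  # placeholder view ID

# Real Time API: right-now data only, rt: metric namespace.
now = service.data().realtime().get(ids=IDS, metrics="rt:activeUsers").execute()

# Core Reporting API: processed historical data, ga: metric namespace.
past = service.data().ga().get(
    ids=IDS, start_date="7daysAgo", end_date="today", metrics="ga:sessions"
).execute()

print(now.get("totalsForAllResults"), past.get("totalsForAllResults"))
```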
Will it be available for both freeware and premium version?
Accessing the Google Analytics APIs is free; there is no charge. There are, however, limits on how much data you can extract in a given day.
By default your application can make a maximum of 50k requests per day. This can be extended.
Each view you are extracting from can serve a maximum of 10k requests per day. This cannot be extended.
See: limits and quotas for more info.
Note: I am a developer on a business intelligence application that extracts Google Analytics data, so I can tell you that it's definitely doable.

Google analytics data adjustment?

I've been using an SSIS integration component to download data from Google Analytics in order to keep a historical view of some websites and track their evolution. Basically, the metrics we track are Visits (now Sessions) and Visitors (now Users), and the dimensions are Year and Month.
However, today I noticed that the data I downloaded for July had a variation in the Users metric. I heard that Google Analytics uses an estimation method to "calculate" some (if not all) of their metrics; could it be that they later "adjust" the data with more accurate information? If so, is this mentioned in the documentation? (A link would be highly appreciated, since our users are complaining that we are not delivering the real GA data.) I looked on the Google Analytics documentation pages with no luck.
Thanks for your time.
PS: Sorry for my English, it isn't my native language.
If you are using the standard version of Google Analytics (you'll know, since Premium costs $150k), data is sampled depending on volume. Have a read of this article: can-you-trust-your-google-analytics-data.
I have seen very slightly differing results returned when calling the API repeatedly with the same historical parameters. In my case the figures only differed by 1-2 over a daily set of several thousand, but they nevertheless differed.
If you want to guarantee your results, consider upgrading to Premium.
Sampling could be an issue if what you are requesting covers more than 50,000 rows for the time period in question. To avoid it you can download more often, such as daily.
But I think your issue is Google Analytics' processing time: if you are downloading at 3 am on the 1st, it is probable that the processing for the previous day has not finished.
The Google Analytics Premium SLA is for 4-hour data freshness, so even that would have trouble. Pragmatically, you should allow 24 hours before downloading data for the previous day, and 48 hours for e-commerce data (see the helper below).
Thirdly, make sure it is not Unique Visitors you are requesting, as that metric depends on the time period you request.
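If you schedule the downloads, a tiny helper along these lines encodes that rule of thumb; the lag values mirror this answer's advice, not an official constant:

```python
from datetime import date, timedelta

def latest_safe_date(today=None, ecommerce=False):
    # Only pull days old enough to be fully processed: 24 h for standard
    # metrics, 48 h for e-commerce data (rule of thumb from this answer).
    today = today or date.today()
    return today - timedelta(days=2 if ecommerce else 1)
```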

big difference in "visitor" count

I'm trying to pull out the (unique) visitor count for a certain directory using three different methods:
* with a profile
* using a dynamic advanced segment
* using a custom report filter
On a smaller site the three methods give the same result, but on the large site (> 5M visits/month) I get a big discrepancy between the profile on the one hand and the advanced segment and filter on the other. This might be because of sampling, but the difference is smaller when it comes to pageviews. Is the estimation of visitors worse, and the discrepancy bigger, when using sampled data? Also, when extracting data from the API (using filters or profiles) I still get DIFFERENT data even though GA doesn't indicate that the data is sampled, i.e. I should be looking at unsampled data.
Another strange thing is that the pageviews are higher in the profile than with the filter, while the visitor count is higher for the filter than for the profile. I also applied a filter to the profile to force it to use sampled data, and I again get results quite similar to the filter and segment data.
             profile   filter   segment   filter#profile
unique         25550    37778     36433            37971
pageviews     202761   184130       n/a           202761
What I am trying to achieve is a way to get somewhat accurate data on unique visitors once I've run out of profiles to use.
More data with discrepancies can be found in this google docs: https://docs.google.com/spreadsheet/ccc?key=0Aqzq0UJQNY0XdG1DRFpaeWJveWhhdXZRemRlZ3pFb0E
Google Analytics (free version) tracks only 10 million page interactions [0] (pageviews and events; any tracker method that starts with "track" is an interaction) per month [1], so presumably the data for your larger site is already heavily sampled (I'd guess each of your 5 million visitors has more than two interactions) [2]. Ad hoc reports use at most 1 million data points, so you have a sample of a sample. Naturally, aggregated values suffer more from smaller sample sizes.
And I'm pretty sure the data limits apply to API access too (Google says that there is "no assurance that the excess hits will be processed"), so for the large site the API returns sampled (or incomplete) data as well; you cannot really be looking at unsampled data.
As for the differences, I'd say that different ad hoc reports use different samples, so you end up with different results. With GA you shouldn't rely too much on absolute numbers anyway; look more for general trends.
[0] Data limits: http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983
[1] Analytics Premium tracks 50 million interactions per month (and comes with support from Google) but costs 150,000 USD per year.
[2] Google suggests using _setSampleRate() on large sites so that you get consistently sampled data for each day of the month, instead of random hit-or-miss once you exceed the data limits: https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate
Yes, sampled data is less accurate, especially for visitor counts.
I've also seen them miss 500k pageviews over two days, only for those to appear in the reporting a few days later. It also doesn't surprise me to see different results from different interfaces. The quality of Google Analytics has diminished even as they have tried to become more real-time. It appears that their codebase is inconsistent across APIs, and their algorithms are all over the map.
I usually stick with the same metrics and reporting methods so that my results remain comparable to one another. I also run GA in tandem with Gaug.es as a validation and sanity check. With that extra data, I choose the reporting method in GA that I am most confident in, and I rely on it exclusively.
