I've been using an SSIS integration component to download data from Google Analytics in order to keep a historical view of some websites and track their evolution. Basically the metrics we track are Visits (now Sessions) and Visitors (now Users), and the dimensions are Year and Month. However, today I noticed that the data I downloaded for July showed a variation in the Users metric. I've heard that Google Analytics uses an estimation method to "calculate" some (if not all) of its metrics; could it be that they later "adjust" the data with more accurate information? If so, is this mentioned in the documentation? (A link would be highly appreciated, since users are complaining that we are not delivering the real GA data.) I looked on the Google Analytics documentation pages with no luck.
Thanks for your time.
PS: Sorry for my English, it isn't my native language.
If you are using the standard version of Google Analytics (you'd know if you were on Premium, since it costs around $150k), data is sampled depending on volume. Have a read of this article: can-you-trust-your-google-analytics-data
I have seen slightly different results returned when repeatedly calling the API with the same historical parameters. In my case the figures only differed by 1-2 over a daily set of several thousand, but they differed nonetheless.
If you want to guarantee your results, consider upgrading to Premium.
Sampling could be an issue if what you are requesting covers more than 50,000 rows for the requested time period. To avoid it you can download more often, such as daily.
But I think your real issue is that Google Analytics needs processing time: if you are downloading at 3 am on the 1st, it is probable that processing for the previous day has not finished.
The Google Analytics Premium SLA is 4-hour data freshness, so even Premium would have trouble there. Pragmatically, you should allow 24 hours before you download data for the previous day, and 48 hours for e-commerce data.
Thirdly, make sure it is not Unique Visitors you are requesting, as that metric depends on the time period you request.
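As a rough illustration of the daily-download approach (not from the original answer), here is a minimal Python sketch against the v3 Core Reporting API; the view ID, credentials object, and the google-api-python-client dependency are all assumptions. The containsSampledData flag in the response tells you whether Google sampled the result.

```python
# Illustrative sketch only: pull one day at a time from the v3 Core Reporting API
# and flag any response that Google sampled. The credentials object and view ID
# are placeholders, not values from the thread.
from googleapiclient.discovery import build

def fetch_day(credentials, view_id, day):
    analytics = build('analytics', 'v3', credentials=credentials)
    result = analytics.data().ga().get(
        ids='ga:' + view_id,
        start_date=day,                     # e.g. '2015-07-01'
        end_date=day,                       # one day per request keeps volumes small
        metrics='ga:sessions,ga:users',
        dimensions='ga:year,ga:month',
    ).execute()

    if result.get('containsSampledData'):
        # sampleSize / sampleSpace show how many sessions were actually read
        print('%s is sampled: %s of %s sessions' % (
            day, result.get('sampleSize'), result.get('sampleSpace')))
    return result.get('rows', [])
```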
I work for a non-profit that needs to see how our fundraising efforts are going in 'real-time'.
We look at results in blocks of about a half hour - so we need to report on how we finished the last 24 hours or so and also where we're at in the current half-hour. We're accomplishing this through google analytics, as we have multiple fundraising streams all pointing to a common GA account.
I have tried using Data Studio to report against the GA API, but that connector does not seem to refresh at a reliable rate - sometimes it will pull fresh data within a minute, sometimes it can take twenty minutes to report on recent transactions. I believe the 'real-time' API could be used to get fresher GA data, but as far as I can tell, it will only report 'live' data, not prior/historical data (say, from four hours ago). Does anyone know what API, if any, I could use to pull all data from history through the current datetime?
I apologize if this request is vague, but I'm just looking for a conceptual approach at this point to get the freshest data - preferably in one fell swoop (a single API call). There is more complexity after data intake (I then have to compare it to the goals we've set for each half-hour, amongst other nuances to the transactions themselves), so I wanted to start with this fundamental piece/question.
Thanks!
Given the context provided, I believe that the API solution would not be feasible. Among other reasons:
The Real Time API only offers a limited number of dimensions and metrics. For example, e-commerce data is not available.
https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/
https://developers.google.com/analytics/devguides/reporting/realtime/dimsmets
The standard intraday processing SLA for the Core Reporting API is < 24 hours for standard properties. Processing occurs on a best-effort basis, meaning that hourly availability can occur from time to time but cannot be guaranteed.
https://support.google.com/analytics/answer/7084038?hl=en
As an alternative approach to the API solution, you could consider the use of an App + Web property which would allow you to stream event data in real time to BigQuery. However, this solution has some cost implications and would introduce you to a new tracking paradigm.
https://developers.google.com/analytics/devguides/collection/app-web/tag-guide
https://support.google.com/firebase/answer/6318765?hl=en
https://www.simoahava.com/analytics/getting-started-with-google-analytics-app-web/
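For context, here is a hedged sketch of what a Real Time API call looks like (the view ID, credentials object, and google-api-python-client dependency are placeholders/assumptions): it accepts only rt: dimensions and metrics and has no date range at all, which is why it cannot return data from, say, four hours ago.

```python
# Sketch for illustration: the v3 Real Time API answers "right now" questions only.
from googleapiclient.discovery import build

def active_users_right_now(credentials, view_id):
    analytics = build('analytics', 'v3', credentials=credentials)
    response = analytics.data().realtime().get(
        ids='ga:' + view_id,
        metrics='rt:activeUsers',
        dimensions='rt:medium',
    ).execute()
    # There is no start-date/end-date parameter, so historical hours are unreachable here.
    return response.get('totalsForAllResults', {}).get('rt:activeUsers')
```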
We are using the free level of GA and have been creating reports using Custom Dimensions and Metrics since last summer.
We also use the Google Sheets Analytics add-on to post process data pulled from the API.
Overnight on 16-17 May (UK time), our reports suddenly started showing as sampled. Prior to that we had no sampling at all; because our reports are scheduled, I can look back through the revision history to see the changes made when the scheduled reports ran.
This sampling occurs both in custom reports viewed in the GA platform and in GA Sheets. I've done some analysis and it appears to occur only when more than one Custom Dimension is added to a report, or when the GA dimensions ga:hour or ga:dateHour are used (ga:date does not trigger sampling).
All our Custom Dimensions and Custom Metrics are set at hit level (I've read a post claiming this is caused by mixing scopes on dimensions and metrics, but we are not doing that).
If I reduce the date range of a query (suggested as a solution on many blogs), the sampling level actually gets worse rather than better.
For the month of May we didn't even hit 4k sessions at property level. I can't find any reference anywhere to any changes being made to GA that would cause sampling to apply to our reports (change documentation, Google Blogs etc).
Is anyone else experiencing this or can anyone shed any light on why this might be happening? Given how we use GA if we can't resolve this then it's a year of work down the drain, so I'm really keen to at least know why this has suddenly happened even if ultimately nothing can be done about it.
I'm doing a quick proof of concept to understand the procedure for extracting historical data from Google Analytics, to be used later for offline data stitching to generate a holistic view of the data for analysis. I have not found any detailed online documentation to help me understand the pros and cons.
Would like to know any limitations on:
The time period for which data can be extracted or any limitation for max. calendar days?
Whether all dimensions/metrics can be extracted or any specific ones?
Will the data be real-time or sampled?
Can all data be pulled into a single table or separate ones?
Will it be available for both the free and the premium version?
The time period for which data can be extracted or any limitation for max. calendar days?
The start date cannot be before the launch of Google Analytics on 2005-01-01. Due to the processing time lag, extracting data that is newer than 2 days old can result in incomplete data; I recommend checking the isDataGolden flag on the response.
Requesting large date ranges can result in sampling, which cannot be prevented. It's best to request the data in small chunks.
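A minimal sketch of both points, assuming the Analytics Reporting API v4 and the google-api-python-client library (view ID and credentials are placeholders): request a small date chunk and check isDataGolden before trusting the numbers.

```python
# Hedged example: one small date chunk per request, with an isDataGolden check.
from googleapiclient.discovery import build

def get_chunk(credentials, view_id, start_date, end_date):
    analytics = build('analyticsreporting', 'v4', credentials=credentials)
    response = analytics.reports().batchGet(body={
        'reportRequests': [{
            'viewId': view_id,
            'dateRanges': [{'startDate': start_date, 'endDate': end_date}],
            'metrics': [{'expression': 'ga:sessions'}, {'expression': 'ga:users'}],
            'dimensions': [{'name': 'ga:date'}],   # at most 7 dimensions and 10 metrics per request
        }]
    }).execute()

    report = response['reports'][0]
    if not report['data'].get('isDataGolden', False):
        # Data is still being processed; re-fetch this chunk later.
        print('Data for %s to %s is not golden yet.' % (start_date, end_date))
    return report['data'].get('rows', [])
```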
Whether all dimensions/metrics can be extracted or any specific ones?
A list of the dimensions and metrics you can extract can be found here. Each request can contain a maximum of 7 dimensions and 10 metrics.
Will the data be real-time or sampled?
The Real-Time API and the Reporting API are two different APIs. The Real-Time API is not, to my knowledge, sampled, but since it only covers about five minutes of data, I find it hard to imagine anyone but a really big website hitting that problem even if it is.
Will it be available for both the free and the premium version?
Accessing the Google Analytics APIs is free; there is no charge. There are, however, limits on how much data you can extract in a given day.
By default your application can make a maximum of 50k requests a day. This can be extended.
Each view you are extracting from allows a maximum of 10k requests per day. This cannot be extended.
See: limits and quotas for more info.
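Because a long historical extraction makes many requests, it is worth wrapping each call in a retry with exponential backoff. This is only a generic sketch, not an official pattern from the docs; it assumes the google-api-python-client library.

```python
# Generic backoff wrapper around any prepared API request object (an assumption,
# not from the answer): retry on 403/429 responses, which usually indicate rate
# or quota problems, and re-raise anything else.
import time
from googleapiclient.errors import HttpError

def execute_with_backoff(request, max_retries=5):
    for attempt in range(max_retries):
        try:
            return request.execute()
        except HttpError as err:
            if err.resp.status in (403, 429) and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
            else:
                raise
```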
Note: I am a developer on a business intelligence application that extracts Google Analytics data, and I can tell you that it's definitely doable.
In our project we have stored all user event data in our database for over a year, but it is not indexed.
Now we are going to use Google Analytics to store our analytics and analyze the reports using the Google Analytics dashboard.
But before we start using Google Analytics, I would like to migrate all the old statistics (about 2 million events) to Google Analytics.
For this I should use the Measurement Protocol, and its limits allow me to transfer 2 million hits with no problem.
But I haven't been able to work out how to set the time of the event. The Measurement Protocol has Queue Time, but Google says:
Values greater than four hours may lead to hits not being processed.
How is it possible to transfer 2 million events to Google Analytics with their original event times?
Thanks
You are correct: you can use the Measurement Protocol to send event data directly to Google Analytics, and I don't see any problem with sending 2 million events. However, it is not possible to set the event time to more than four hours ago.
Queue time is used to set the time that the event occurred; as you can see, it can't be more than four hours in the past, and I have found that even at four hours it is a bit fuzzy whether the data is recorded correctly. This feature is probably most useful on mobile devices that may go offline for a short time: you can store the data and then send it all once the device is online again.
So the dates will be the dates you sent the events to Google Analytics; you can't back-date the data to more than four hours ago. I am therefore not sure how much use the data will be to you once it is all inserted.
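For illustration, here is roughly what a single Measurement Protocol event hit looks like in Python (the tracking ID and client ID are placeholders, and the requests library is an assumption). The qt parameter is a millisecond delta between when the hit happened and when it is sent, which is exactly the value that cannot usefully exceed about four hours.

```python
# Hypothetical sketch of one Measurement Protocol event hit with a queue time.
import time
import requests

def send_event(tracking_id, client_id, category, action, occurred_at_epoch):
    queue_time_ms = int((time.time() - occurred_at_epoch) * 1000)
    payload = {
        'v': '1',             # protocol version
        'tid': tracking_id,   # e.g. 'UA-XXXXXX-Y' (placeholder)
        'cid': client_id,     # anonymous client id
        't': 'event',
        'ec': category,
        'ea': action,
        'qt': queue_time_ms,  # hits queued much longer than ~4 hours may be dropped
    }
    return requests.post('https://www.google-analytics.com/collect', data=payload)
```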
There is no way to do this, but you can make it easier on yourself.
Unfortunately, there is no way to add, remove, or otherwise edit Google Analytics hit data retrospectively, except to delete all of it. You also cannot copy it, move it between accounts, or download all of it.
You are not the first to have to come to terms with this.
In this situation, we recommend to our clients that they run their new and old systems in parallel for a testing period (usually 6 months or a year), before switching off one of them.
Yes, it's difficult to let go of old data, but sometimes it has to be done.
We have a client who receives 2-4 million visits a day, so off the bat we can only use unsampled reports, because the volume exceeds Google's limit:
500,000 maximum sessions for special queries where the data is not already stored.
We are attempting to collect Unique Visitors and Visits for a 1-day period. Using the Google API has proved fruitless, as the data is sampled.
We have set up unsampled reports on a daily basis that get dumped into Google Drive, and our application picks up the new files and downloads them just fine. The problem we are running into is that we need 2 years' worth of daily data for 20 reports. The maximum range over which we can run an unsampled report using the Google Analytics web interface is 1 week before we exceed a query limit. So 52 weeks of reports x 2 years x 20 different reports comes to 2,080 scheduled unsampled reports to set up, and that is for 1 client only.
EDIT: Can we automate unsampled reports using the GA API or any other programmatic method to pull historical data under the constraints mentioned above? Also, we do have Google Analytics Premium.
Cris G, the only way to avoid data sampling in Google Analytics without access to Premium is the day-parting technique: you split a data request for the selected time period into shorter-period queries (typically days) and then add all the numbers up. If your profiles/views are not sampled when you look at daily numbers, this could solve your issue; see the sketch below.
However, this doesn't work for Unique Visitors, since they will be counted as unique in every single daily request, so there will most likely be duplicates and inflated totals if your site attracts lots of returning visitors.
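A rough sketch of the day-parting idea, under the same caveat (the view ID, credentials object, and google-api-python-client dependency are assumptions): one request per day, summing sessions but deliberately not users, because unique visitors are not additive across days.

```python
# Illustrative day-parting loop: daily requests against the v3 Core Reporting API,
# summing ga:sessions only (ga:users would double-count returning visitors).
from datetime import timedelta
from googleapiclient.discovery import build

def sessions_by_day(credentials, view_id, start, end):
    analytics = build('analytics', 'v3', credentials=credentials)
    total_sessions = 0
    day = start                                # datetime.date objects
    while day <= end:
        d = day.isoformat()
        result = analytics.data().ga().get(
            ids='ga:' + view_id,
            start_date=d,
            end_date=d,
            metrics='ga:sessions',
        ).execute()
        total_sessions += int(result['totalsForAllResults']['ga:sessions'])
        day += timedelta(days=1)
    return total_sessions
```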
To automate some of the work, I suggest using a tool like Analytics Canvas. It can make your life much easier, and I think it could be the perfect tool for what you need. Bear in mind the limitation regarding unique visitors (and some other metrics).
Having said that, I still think the best choice would be to use the benefits of Premium and the ability to get unsampled data for your reports.