Why is this Google Analytics query unsampled via the web UI but sampled via the GA API?

I recently attempted to query the Google Analytics API for a report using device category, source, and medium as dimensions. The report covered about four weeks of time. Despite the fact that I was able to build the equivalent ad-hoc report in the UI and get results based on 100% of sessions, I couldn't get the API to give me results based on any more than 1.3% or so of sessions. The client I'm using is based on the v3 API, but I got the same results when using Google's v4 testing tool, so it's not a function of the API version.
According to Google's documentation, ad-hoc reports are supposed to use pre-aggregated unsampled data where possible:
Ad-hoc reports are based on any non-standard query of Analytics data. For example, if you apply a segment or secondary dimension to a standard report, then Analytics has to issue a new, non-standard query of the data to return that information.
The new query goes first to the tables of aggregated data to see if all of the requested information is available there. If the information is not available there, then Analytics queries the complete, unfiltered set of data and computes new aggregates to satisfy the application of the segment or secondary dimension.
This is apparently true of the web UI, but not necessarily of the API. I was under the impression that the web UI was making calls equivalent to those exposed in the API under the hood, but it seems that this isn't the case. Does anybody know whether it's possible to force the API query to use the pre-aggregated data sets that I know are available?

The difference in sampling threshold between the web UI and the API does indeed explain this. This happens to be a 360 account, for which the sampling threshold is much higher than the API permits (the documentation is cagey about exact numbers but apparently it can be "up to 100M sessions"). The same test on a standard account showed equivalent behavior between the API and the web UI. Google's issue tracker for the GA API indicates that they do not plan to increase the sampling threshold beyond 1M sessions even for 360 accounts.
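To see how heavily a given API response was sampled, one option is to read the sampling fields from the response itself. The sketch below assumes a Reporting API v4 style report, parsed into a Python dict, where `samplesReadCounts` and `samplingSpaceSizes` appear under `data` only when sampling occurred; the helper name `sampling_fraction` and the sample dicts are made up for illustration.

```python
def sampling_fraction(report):
    """Return the share of sessions the report is based on, or None if unsampled."""
    data = report.get("data", {})
    read = data.get("samplesReadCounts")
    space = data.get("samplingSpaceSizes")
    if not read or not space:
        # Assumption: both fields are absent when the report is not sampled.
        return None
    # int64 values arrive as strings, one entry per requested date range;
    # this sketch only looks at the first date range.
    return int(read[0]) / int(space[0])


# Hypothetical responses, shaped like the cases described above.
sampled = {"data": {"samplesReadCounts": ["13000"], "samplingSpaceSizes": ["1000000"]}}
unsampled = {"data": {"rowCount": 42}}

print(sampling_fraction(sampled))
print(sampling_fraction(unsampled))
```

A result close to 0.013 would correspond to the "1.3% or so of sessions" seen in the question.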

Related

Pushing specific visitor ID into GA as personal identifier (Pardot)

I am trying to get to a point where I can identify the visitors who are generating website Goals, and identify them in GA by their Pardot IDs.
Do you think that's possible?
On the site every visitor gets a Pardot cookie and in that there is a readable Visitor ID which via an API query can be turned into a Pardot ID.
But how can this piece of information get stitched to the rest of the GA parameters? How to push this into GA as a custom data point so I can create a report on who are the Pardot IDs that completed a certain goal this week?
Is there any guidance you can give?
Assuming that the Pardot ID itself is not Personally Identifiable Information (PII) in terms of Google Analytics, there are several ways to accomplish this.
You could provide this data as the User ID, which helps Google Analytics identify users across several browsers and devices. However, this dimension is not exposed in the reporting GUI or the reporting API. (Available dimensions and metrics can be browsed here.)
Instead, or in parallel, you could store this information in a custom dimension, which can be used in standard or custom reports, or via the reporting API as well. There are a couple of things to consider. According to the Measurement Protocol reference, the maximum length of this field is 150 bytes. You should also decide whether this dimension is most useful for your needs at hit, session, or user level, about which you can read here.
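As a rough illustration of the custom dimension route, here is a minimal sketch that builds a Measurement Protocol (v1) event payload carrying the Pardot ID. The custom dimension index (1), the event category and action, the tracking ID placeholder, and the helper name `build_hit` are all assumptions; the dimension must first be created in the property settings with the scope you chose.

```python
from urllib.parse import urlencode

MAX_CD_BYTES = 150  # limit quoted above from the Measurement Protocol reference


def build_hit(tracking_id, client_id, pardot_id, cd_index=1):
    """Build a URL-encoded event hit that carries the Pardot ID in cd<index>."""
    value = str(pardot_id)
    if len(value.encode("utf-8")) > MAX_CD_BYTES:
        raise ValueError("custom dimension value exceeds 150 bytes")
    params = {
        "v": "1",               # protocol version
        "tid": tracking_id,     # property ID, placeholder here
        "cid": client_id,       # anonymous client ID
        "t": "event",           # hit type
        "ec": "pardot",         # event category (hypothetical)
        "ea": "identify",       # event action (hypothetical)
        f"cd{cd_index}": value, # custom dimension slot (index is an assumption)
    }
    return urlencode(params)


payload = build_hit("UA-XXXXX-Y", "555", "123456")
print(payload)
```

The payload would then be sent as the body of a POST to the Measurement Protocol collect endpoint; on a normal website the same value is more commonly set through the tracking snippet instead.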

How to replicate the GA field Visits in Big Query

In a typical GA session, after picking a View ID and a date range, we can get a week's worth of data like this:
Users: 146,207
New Users: 124,582
Sessions: 186,191
The question is, what BQ field(s) to query in order to get this Users value?
Here is an example query with 2 methods (the 2nd method is commented out).
SELECT
  COUNT(DISTINCT CONCAT(CAST(visitId AS STRING), CAST(visitNumber AS STRING))) AS visitors,
  -- COUNT(DISTINCT fullVisitorId) AS visitors
I noticed the fullVisitorId (FVID) method was fairly close to what I see in GA (Users is understated by about 3% in BQ), while the other method, the CONCAT of visitId and visitNumber, gives a value that is about 15% overstated compared to GA. Is there a more reliable method in BQ to reproduce the Users value from GA?
The COUNT(DISTINCT fullVisitorId) method is the most correct method, but it won't match what Analytics 360 reports by default. Since last year, Google Analytics 360 by default uses a different calculation for the Users metric than it previously did. The old calculation, which is still used in unsampled reports, is more likely to match what you get out of BigQuery. You can verify this by exporting your report as an unsampled report, or using the unsampled reporting features in the Management API.
If you want the numbers to match exactly, you can turn off the new calculation by using the instructions here. The new calculation's precise details are not public, so duplicating that value in BigQuery is quite difficult.
There are still some reasons you might see different numbers, even with the old calculation. One is if the site has implemented User ID, in which case the GA number will be lower than BigQuery for fullVisitorId. Another is sampling, though that's unlikely in Analytics 360 at the volumes you're talking about.
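To make the difference between the two counting methods concrete, here is a small in-memory sketch in Python rather than BigQuery SQL. The rows are invented, using only the three export fields named in the question; the point is that a key built from visitId and visitNumber tracks sessions, not people, so it runs higher than a distinct count of fullVisitorId whenever visitors return.

```python
# Hypothetical session rows, one per visit, mimicking the export fields above.
rows = [
    {"fullVisitorId": "A", "visitId": 1001, "visitNumber": 1},
    {"fullVisitorId": "A", "visitId": 1002, "visitNumber": 2},  # A comes back
    {"fullVisitorId": "B", "visitId": 1003, "visitNumber": 1},
    {"fullVisitorId": "C", "visitId": 1004, "visitNumber": 1},
]


def count_users(rows):
    """Equivalent of COUNT(DISTINCT fullVisitorId)."""
    return len({r["fullVisitorId"] for r in rows})


def count_concat_keys(rows):
    """Equivalent of COUNT(DISTINCT CONCAT(visitId, visitNumber))."""
    return len({f'{r["visitId"]}{r["visitNumber"]}' for r in rows})


print(count_users(rows), count_concat_keys(rows))
```

Here visitor A has two sessions, so the concatenated key yields 4 while the distinct visitor count yields 3.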

Unsampled data with Google Analytics API

I am trying to automate a weekly report. Currently I get the data for the report from the Google Analytics website, with the sampling level set to higher precision.
I tried to get the same data through the Google Analytics API with samplingLevel set to HIGHER_PRECISION. However, I am still getting sampled data.
With FASTER, the response is based on roughly 25% of sessions, whereas with DEFAULT and HIGHER_PRECISION it is based on roughly 50%.
On the Google Analytics website, it says 'This report is based on 100% of sessions'. Can I get the same level of accuracy with the Google API? I am using Google Apps Script. The response for HIGHER_PRECISION does not match the website.
Sumit, the API and the Google Analytics UI are certainly different, and sampling affects them differently; it has to be handled properly to get anything useful out of the API.
As was mentioned in the comment, you can achieve high precision unsampled reports by (typically) shortening your date range that you're querying for and then "walking" the data.
To walk the data, you are essentially just gradually incrementing that small date range as you move through the desired data.
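A minimal sketch of such a walk, assuming v3 Core Reporting style query parameters (`ids`, `start-date`, `end-date`, `metrics`, `samplingLevel`). The view ID, the metric, and the helper names are placeholders, and the actual request call is left out; each dict would be passed to whatever client you use.

```python
from datetime import date, timedelta


def walk_date_ranges(start, end, days_per_chunk=1):
    """Yield consecutive (chunk_start, chunk_end) pairs covering start..end inclusive."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=days_per_chunk - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


def build_queries(view_id, start, end, days_per_chunk=1):
    """Build one set of query parameters per small date range."""
    return [
        {
            "ids": f"ga:{view_id}",
            "start-date": s.isoformat(),
            "end-date": e.isoformat(),
            "metrics": "ga:sessions",           # placeholder metric
            "samplingLevel": "HIGHER_PRECISION",
        }
        for s, e in walk_date_ranges(start, end, days_per_chunk)
    ]


queries = build_queries("12345678", date(2017, 1, 1), date(2017, 1, 7), days_per_chunk=3)
for q in queries:
    print(q["start-date"], q["end-date"])
```

Results from the chunks then need to be combined on your side. Additive metrics such as sessions can be summed across chunks, but user counts cannot, since the same user may appear in several chunks.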
The "unsampled reports API" is... well, not the best. Considering that's what they are avoiding giving the end user in the first place, the offering available is not a very good long term or large project friendly solution. I would recommend small date ranges and then doing a data walk.
Happy Coding
There are several solutions that avoid the sampling issue in Google Analytics by automating the export of data for short date ranges.
I prefer this tool, it's pretty simple to use: MadStats.io

How to extract Google Analytics historical data using APIs. Pros and cons?

I'm doing a quick proof of concept to understand the procedure to extract historical data from Google Analytics to be further used for offline data stitching to generate a holistic view of data and its analysis. I have not found any detailed online documentation available to understand pros and cons.
Would like to know any limitations on:
The time period for which data can be extracted or any limitation for max. calendar days?
Whether all dimensions/metrics can be extracted or any specific ones?
Will the data be real-time or sampled?
Can all data be pulled into a single table or separate ones?
Will it be available for both freeware and premium version?
The time period for which data can be extracted or any limitation for max. calendar days?
The start date cannot be before the launch of Google Analytics on '2005-01-01'. Because of processing lag, extracting data that is less than two days old can return incomplete data. I recommend checking the isDataGolden flag on the response.
Requesting large date ranges can result in sampling, which cannot be prevented. It's best to request the data in small chunks.
Whether all dimensions/metrics can be extracted or any specific ones?
A list of the dimensions and metrics you can extract can be found here. Each request can contain a maximum of 7 dimensions and 10 metrics.
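The limits mentioned so far in this answer can be checked before a request is sent. This is only a sketch: the numbers are the ones quoted above (earliest start date, 7 dimensions, 10 metrics), the function names are made up, and `is_final` assumes a v4 style response dict in which `isDataGolden` sits under `data`.

```python
from datetime import date

EARLIEST_START = date(2005, 1, 1)  # earliest start date quoted above
MAX_DIMENSIONS = 7                 # per-request limit quoted above
MAX_METRICS = 10                   # per-request limit quoted above


def check_request(start, end, dimensions, metrics):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if start < EARLIEST_START:
        problems.append("start date is before 2005-01-01")
    if end < start:
        problems.append("end date is before start date")
    if len(dimensions) > MAX_DIMENSIONS:
        problems.append(f"too many dimensions: {len(dimensions)}")
    if len(metrics) > MAX_METRICS:
        problems.append(f"too many metrics: {len(metrics)}")
    return problems


def is_final(report):
    """True when the response says the data will not change on a later request."""
    return bool(report.get("data", {}).get("isDataGolden", False))


print(check_request(date(2016, 1, 1), date(2016, 1, 7), ["ga:source"], ["ga:sessions"]))
print(is_final({"data": {"isDataGolden": True}}))
```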
Will the data be real-time or sampled?
The Real-time API and the Reporting API are two different APIs. To my knowledge the Real-time API is not sampled, but since it only covers about five minutes of data, I find it hard to imagine anyone but really big websites hitting this problem even if it is.
Will it be available for both freeware and premium version?
Accessing the Google Analytics APIs is free; there is no charge. There are, however, limits on how much data you can extract in a given day.
By default your application can make a maximum of 50k requests a day. This can be extended.
Each view you are extracting from can receive a maximum of 10k requests per day. This cannot be extended.
See: limits and quotas for more info.
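For a historical backfill it can help to estimate up front how many days the extraction will take under these quotas. The sketch below uses only the two default limits quoted above and assumes requests are spread evenly across views and that retries are not counted; the function name and the example numbers are hypothetical.

```python
import math

PROJECT_DAILY_LIMIT = 50_000  # default per-application limit quoted above
VIEW_DAILY_LIMIT = 10_000     # per-view limit quoted above


def days_needed(requests_per_view, view_count):
    """Estimate the calendar days needed to stay within both daily quotas."""
    per_view_days = math.ceil(requests_per_view / VIEW_DAILY_LIMIT)
    project_days = math.ceil(requests_per_view * view_count / PROJECT_DAILY_LIMIT)
    return max(per_view_days, project_days)


# Example: ten years of one-day chunks (3,650 requests) for each of 20 views.
print(days_needed(3650, 20))
```

In this example the per-view quota is not the constraint; the 73,000 total requests exceed the 50k application limit, so the backfill spills into a second day.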
Note: I am a developer on a business intelligence application that extracts Google Analytics data. I can tell you that it's definitely doable.

enabling hourly data in google analytics

I have two views/profiles linked to my Google Analytics account. I want to fetch the hourly data for the current day, i.e.
start date:today
end date: today
with a few filters and dimensions.
I am getting a response for one view, which means this is possible in Google Analytics; for the other view, however, all the values show as 0. This applies to both the GUI and the API.
Can anyone suggest how to enable it for the other view as well?
You cannot. Google Analytics needs some processing time. It might be that some data appears immediately, especially on small accounts, but it's not guaranteed and not a thing you can "enable" or count on.
Updated: Okay, that was a dumb answer. Still, there is processing latency even in GA Premium. It is possible to get realtime data, but that's a different API with limited data (the core reporting API might return data, but there are no guarantees of that).
But I admit, since your problem is that you do not get data for the whole day, you have a different problem. With a Premium account, though, you should be able to contact your account manager or technical support.
