I'm seeing some pretty weird behavior from the Google Analytics API.
Below are the examples of the queries I'm making using the Query Explorer. Dates and all other options are absolutely identical in both queries.
First query
metrics=ga:pageviews
filters=ga:pagePath==/my_article_uri/
Result: 975
Second query
metrics=ga:pageviews
dimensions=ga:country
filters=ga:pagePath==/my_article_uri/
Result: Azerbaijan: 3, Netherlands: 60, Russia: 2067, Singapore: 22, United States: 157
Total: 2309!
Question: Why are the results not equal?
I tried to google this, and it seems that the problem is the scope of the metrics/dimensions, but I still don't really understand it.
EDIT
For the first query, the results are NOT sampled.
However, for the second query (with the dimension specified) there IS sampling:
"containsSampledData": true,
"sampleSize": "499365",
"sampleSpace": "1568579"
Might this be the reason?
Based on the sampling reported back in the query, I believe the inaccuracies are due to sampling. The second query is based on about 32% of sessions.
If you decrease your date range so that both are not sampled, then they should match.
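The 32% figure follows from the sampleSize and sampleSpace fields quoted in the question. A minimal sketch of that arithmetic is below; the function name samplingInfo is hypothetical, and only the three response fields come from the output above.

```javascript
// Hypothetical helper: summarizes the sampling fields of a query response.
function samplingInfo(response) {
  if (!response.containsSampledData) {
    return { sampled: false, percentOfSessions: 100 };
  }
  // sampleSize and sampleSpace arrive as strings, so convert them first.
  var size = Number(response.sampleSize);
  var space = Number(response.sampleSpace);
  return { sampled: true, percentOfSessions: (size / space) * 100 };
}

var info = samplingInfo({
  containsSampledData: true,
  sampleSize: '499365',
  sampleSpace: '1568579'
});
// Logs the share of sessions used for the sampled query (roughly 32%).
console.log(info.percentOfSessions.toFixed(1) + '% of sessions');
```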
Related
I'm having a tough time with Google Analytics, trying to understand why the value of metrics changes when segments are applied.
There is a standard audience overview report, which is based on 100% of sessions (no sampling) and the view is not filtered. The period is March of 2017.
Standard "All visitors" segment looks like this:
Then, there is another built-in segment called "Bounced Sessions". When I apply this segment, the "All visitors" values change:
Amount of users increases, but the count of pageviews decreases.
Any ideas how to explain this? Thank you in advance!
OK, there can be multiple reasons. Let me first explain how these numbers are calculated, then we'll move on to your question.
There are two types of data gathering and processing in Google Analytics.
Pre-calculated data -- pre-aggregated tables
This is precalculated data that Google uses to speed up the UI. Google does not specify when the calculation happens; it can be at any point in time. These are known as pre-aggregated tables.
Data calculated on the fly
Anything you do that requires computation or manipulation falls under this category, such as using segments or creating custom reports.
Coming to your problem: when you apply a segment, every metric that it affects is calculated again. This may result in numbers greater than those you see in the normal view.
The standard audience overview report is pre-aggregated at some point during the day. When you apply a segment, the results are recalculated from the latest data. Since the latter is more recent, the metrics will often come out higher, though you can also see a decrease; it all depends on your data and user behavior.
Resolution: If you are a premium user, use BigQuery. Rely on BigQuery for every metric, since its results are fresh and calculated on the fly.
We are running the Google Analytics free version and I'm seeing some inconsistent results regarding data sampling. I have tried my requests in Google Analytics Query Explorer, the GA Sheets add-on, and within the GA interface.
Basically, I am comparing results from a complete date range against the sum of results for that date range broken into smaller chunks (to reduce/remove the chance of sampling occurring). Metrics are sessions, transactions, and revenue. I have a session-level dynamic segment applied: sessions::condition::!ga:landingPagePath=#/thanks
As you may expect, the results from the single request are different (counts are lower) than those from summing the multiple smaller requests. For example, sessions are 45,311 vs. 51,596 and income is further apart. This implies that sampling is being used for the larger request. The trouble is that the API response explicitly says that sampling is not used in any case, i.e. "Contains Sampled Data" equals "No", even for the full date range within which our property should be exceeding the 500,000 session threshold for sampling to kick in.
I'm almost certain that the results from summing smaller date ranges are correct, as these are pretty close to what we see in our CMS analytics.
Can anyone explain the mechanics behind this? Is GA doing some sort of behind-the-scenes sampling to produce this inconsistency?
Thanks,
Daniel
Sounds like sampling. Check all your sources to see if they contain sampling, and make sure you have the sampling level set to "HIGHER_PRECISION".
1) Google Sheets Google Analytics Add-On: in cell B6 of the data for each query, check whether it says "Yes" for "Contains Sampled Data".
2) Google Analytics Query Explorer: in the header below your profile name, check whether it says "Contains Sampled Data: Yes".
You are on the right track in breaking your query down into smaller chunks with smaller date ranges to avoid sampling. Here is a post on how to Avoid Google Analytics Sampling using Python
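The chunking approach can also be sketched in JavaScript rather than Python. This is only an illustration under stated assumptions: splitDateRange and sumOverChunks are hypothetical names, dates are treated as UTC, and fetchTotal is an assumed callback that would perform the actual API request for one chunk and return a number.

```javascript
// Hypothetical helper: splits an inclusive date range into chunks of at most
// `days` days, returned as {start, end} pairs in YYYY-MM-DD format.
function splitDateRange(startDate, endDate, days) {
  var DAY_MS = 24 * 60 * 60 * 1000;
  var chunks = [];
  // A date-only string is parsed as UTC midnight, which avoids DST drift.
  var cursor = new Date(startDate).getTime();
  var last = new Date(endDate).getTime();
  while (cursor <= last) {
    var chunkEnd = Math.min(cursor + (days - 1) * DAY_MS, last);
    chunks.push({
      start: new Date(cursor).toISOString().slice(0, 10),
      end: new Date(chunkEnd).toISOString().slice(0, 10)
    });
    cursor = chunkEnd + DAY_MS;
  }
  return chunks;
}

// Hypothetical helper: runs one query per chunk and sums the returned totals.
// fetchTotal(start, end) is an assumed callback, not part of any GA library.
function sumOverChunks(chunks, fetchTotal) {
  return chunks.reduce(function (sum, chunk) {
    return sum + fetchTotal(chunk.start, chunk.end);
  }, 0);
}

console.log(splitDateRange('2017-03-01', '2017-03-31', 7));
```

Summing per-chunk totals works for additive metrics such as sessions, transactions, and revenue. A metric like users would be counted once per chunk in which the user appears, so it should not be summed this way.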
I have a problem with the Google Analytics API.
I have two almost identical queries, the only difference is the selected metric (ga:pageviews vs ga:uniquePageviews)
The call for ga:pageviews returns much lower totalResults than the call for ga:uniquePageviews.
Neither query uses sampling, so this should not be the problem.
The otherwise identical Query with pageviews
At the bottom of the pictures you can see the total Results are much lower with the pageviews metric, but there shouldn't be any pagePaths that have uniquePageviews but no pageviews, so this data can't be accurate.
Am I missing something?
Can you help me get the correct numbers for the pageviews query?
Thanks!
Edit: Changing the date range or removing the filter does not resolve the problem
Edit2: after adding a sort by -ga:pageviews it looks like the results are grouped:
Hostname Page Pageviews
(other) (other) 204617
when I remove the hostname-dimension (and only use pagePath) this grouping does not occur and I get the complete list of URLs
I have no idea why the results are grouped, is there a way to prevent that?
Edit3: It looks like the results are those that have a session associated with them ("Total results found: 1156" is the same number that a query for the ga:sessions metric returns). This only happens with ga:pageviews (ga:uniquePageviews and also ga:hits return all 200k+ rows).
Is this a bug or intended behaviour?
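To check whether a given response contains such grouped rows, one option is to scan the returned rows for the literal value (other). The sketch below assumes the rows come back as arrays of strings with the dimension columns first, as in the table above; findOtherRows is a hypothetical name and example.com is a placeholder hostname.

```javascript
// Hypothetical helper: returns the rows whose dimension columns contain "(other)".
// dimensionCount is the number of leading columns that are dimensions.
function findOtherRows(rows, dimensionCount) {
  return rows.filter(function (row) {
    return row.slice(0, dimensionCount).some(function (value) {
      return value === '(other)';
    });
  });
}

var rows = [
  ['(other)', '(other)', '204617'],
  ['example.com', '/my_article_uri/', '975'] // placeholder row for illustration
];
console.log(findOtherRows(rows, 2));
```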
I understand that this is a question which has been asked elsewhere, but I haven't yet found an answer which is especially helpful.
The problem I'm having is that the data on the regular web version of analytics doesn't match the data I've pulled from the API.
From what I've read, this can sometimes be an issue with the type of query being used. Here's what I've been using:
var requiredArguments = {
  'dimensions': 'ga:medium',
  'metrics': 'ga:users, ga:sessions, ga:uniquePageviews, ga:newUsers',
  'sort': 'ga:medium',
  'start-index': '1',
  'max-results': '1000',
  'sampling-level': 'DEFAULT',
};
and then...
var results = Analytics.Data.Ga.get(
  tableId,
  startDate,
  finishDate,
  'ga:users, ga:sessions, ga:uniquePageviews, ga:newUsers',
  requiredArguments);
Sessions across a month, for instance, can sometimes vary by over 1000. I've tried using different sampling levels; I don't think it's that, because I'm not going over 50,000 sessions in a query.
Any help on this is much appreciated.
You need to check the result returned. If the data is sampled, the response will tell you so:
"containsSampledData":false
samplingLevel
samplingLevel=DEFAULT
Optional. Use this parameter to set the sampling level (i.e. the number of sessions used to calculate the result) for a reporting query. The allowed values are consistent with the web interface and include:
DEFAULT: Returns response with a sample size that balances speed and accuracy.
FASTER: Returns a fast response with a smaller sample size.
HIGHER_PRECISION: Returns a more accurate response using a large sample size, but this may result in the response being slower.
If not supplied, the DEFAULT sampling level will be used. See the Sampling section for details on how to calculate the percentage of sessions that were used for a query.
Sampling should return results that are close to, but not exactly the same as, the website. The only way to completely remove sampling from the API is to have a Premium Google Analytics account.
Also remember to consider processing latency. If you request data that is under 48 hours old it will also be different from the website.
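A quick way to flag queries that may be affected by processing latency is to compare the end date of the request with the current time. This is only a sketch: mayBeAffectedByLatency is a hypothetical name, the 48-hour window is taken from the note above, and dates are treated as UTC rather than in the view's configured time zone.

```javascript
// Hypothetical helper: true if the end of the requested range is under 48 hours old.
function mayBeAffectedByLatency(endDate, now) {
  var HOUR_MS = 60 * 60 * 1000;
  // The end date is inclusive, so measure from the end of that day (UTC assumed).
  var endOfDay = new Date(endDate).getTime() + 24 * HOUR_MS;
  return now.getTime() - endOfDay < 48 * HOUR_MS;
}

// Example: a March report requested at noon on April 1 is still within the window.
console.log(mayBeAffectedByLatency('2017-03-31', new Date('2017-04-01T12:00:00Z')));
```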
I'm trying to use the GA API to run session queries that exclude certain page paths. I've tried using the following dimensions with varying levels of success, but am generally confused by the results.
pagePath
pagePathLevel1
pagePathLevel2
pagePathLevel3
Queries excluding visits based on pagePathLevel1 and pagePathLevel2 seem to yield expected results, but as soon as I use pagePathLevel3 it starts acting weird.
Example:
ga:pagePathLevel3==/page1/ yields no results
ga:pagePathLevel3==/page1/
yields the following:
desktop 882
mobile 124
tablet 38
Queries using regex and pagePath also act weird.
Example:
ga:pagePath=~/page/.*
yields the same result as
ga:pagePathLevel3!~/page/.*
I looked through Stack Overflow and the GA API docs and couldn't find anything useful on the subject. Any help would be greatly appreciated! I also tried including the hostname as a filter in conjunction with the pagePath dimensions, with no improvement.