My goal is to get ga:users, ga:sessions and ga:avgSessionDuration from the Google Analytics Reporting API v4 for all sessions that visited a page path containing /mockeroo. Ignoring session duration for now, it is my understanding that a filter clause like ga:page contains /mockeroo and a segment like sessions::condition::ga:page contains /mockeroo should produce the same results, at least for the number of users and the number of sessions. However, I cannot replicate the results shown in the Google Analytics UI (which uses the segment condition) with my API call (which uses the filter clause). Am I misunderstanding the difference between a filter clause and a segment?
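For reference, here is a minimal sketch of how the two variants could be expressed as Reporting API v4 request bodies. The view ID and date range are placeholders, and I am assuming the dimension meant above is ga:pagePath. As far as I understand, a dimension filter restricts the rows that are aggregated, while a segment selects whole sessions that contain a matching hit, so the two are not guaranteed to return the same totals.

```python
VIEW_ID = "XXXXXXXX"  # placeholder, replace with a real view ID


def base_request():
    """Fields shared by both variants (date range is an arbitrary example)."""
    return {
        "viewId": VIEW_ID,
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "yesterday"}],
        "metrics": [
            {"expression": "ga:users"},
            {"expression": "ga:sessions"},
            {"expression": "ga:avgSessionDuration"},
        ],
    }


def filter_request():
    """Variant 1: dimension filter clause (ga:pagePath contains /mockeroo)."""
    req = base_request()
    req["dimensionFilterClauses"] = [{
        "filters": [{
            "dimensionName": "ga:pagePath",  # assumed dimension name
            "operator": "PARTIAL",           # substring match
            "expressions": ["/mockeroo"],
        }]
    }]
    return req


def segment_request():
    """Variant 2: session-level segment using the v3-style segment syntax."""
    req = base_request()
    # Requests that use segments must also include the ga:segment dimension.
    req["dimensions"] = [{"name": "ga:segment"}]
    req["segments"] = [
        {"segmentId": "sessions::condition::ga:pagePath=@/mockeroo"}
    ]
    return req


# The two variants go in separate batchGet bodies, because requests inside
# one batch have to share the same segments.
filter_body = {"reportRequests": [filter_request()]}
segment_body = {"reportRequests": [segment_request()]}
```

Sending both bodies for the same view and date range and comparing ga:sessions side by side should show whether the discrepancy comes from the filter/segment distinction or from something else, such as sampling.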
Related
I recently attempted to query the Google Analytics API for a report using device category, source, and medium as dimensions. The report covered about four weeks of time. Despite the fact that I was able to build the equivalent ad-hoc report in the UI and get results based on 100% of sessions, I couldn't get the API to give me results based on any more than 1.3% or so of sessions. The client I'm using is based on the v3 API, but I got the same results when using Google's v4 testing tool, so it's not a function of the API version.
According to Google's documentation, ad-hoc reports are supposed to use pre-aggregated unsampled data where possible:
Ad-hoc reports are based on any non-standard query of Analytics data. For example, if you apply a segment or secondary dimension to a standard report, then Analytics has to issue a new, non-standard query of the data to return that information.
The new query goes first to the tables of aggregated data to see if all of the requested information is available there. If the information is not available there, then Analytics queries the complete, unfiltered set of data and computes new aggregates to satisfy the application of the segment or secondary dimension.
This is apparently true of the web UI, but not necessarily of the API. I was under the impression that the web UI was making calls equivalent to those exposed in the API under the hood, but it seems that this isn't the case. Does anybody know whether it's possible to force the API query to use the pre-aggregated data sets that I know are available?
The difference in sampling threshold between the web UI and the API does indeed explain this. This happens to be a 360 account, for which the sampling threshold is much higher than the API permits (the documentation is cagey about exact numbers but apparently it can be "up to 100M sessions"). The same test on a standard account showed equivalent behavior between the API and the web UI. Google's issue tracker for the GA API indicates that they do not plan to increase the sampling threshold beyond 1M sessions even for 360 accounts.
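To check how heavily a given API response was sampled, the v4 report data includes `samplesReadCounts` and `samplingSpaceSizes` when sampling was applied. A small sketch that turns those into a rate per date range (the helper name is my own, and the numbers in the usage example are made up):

```python
def sampling_rates(report):
    """Return the fraction of sessions read for each date range,
    or None if the report was not sampled.

    `report` is one element of the `reports` list in a v4 batchGet response.
    The API returns the counts as strings, one entry per date range.
    """
    data = report.get("data", {})
    read_counts = data.get("samplesReadCounts")
    space_sizes = data.get("samplingSpaceSizes")
    if not read_counts or not space_sizes:
        return None  # both fields are absent when the data is unsampled
    return [int(read) / int(space)
            for read, space in zip(read_counts, space_sizes)]


# Usage with an illustrative response fragment:
example = {"data": {"samplesReadCounts": ["13000"],
                    "samplingSpaceSizes": ["1000000"]}}
rates = sampling_rates(example)
```

Setting `samplingLevel` to `LARGE` in the request asks for a larger sample, but as described above it does not lift the API's own threshold.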
I have a custom segment that has been in use for more than a year and that very recently started returning some very odd results in the interface. Session numbers seemed to drop to match the transaction numbers, so the ecommerce conversion rate was ~100%.
The interface seemed to "recover" last week, but the API still will not return the session numbers I expect. Other custom segments return normal session numbers. I have queried the API with two different spreadsheet add-ons and the Query Explorer, and they all return the same numbers, even for very short time periods where sampling is not an issue.
I am implementing Google Analytics (via GTM) on multiple ecommerce sites. I need to record transactions to each client's Google Analytics account as well as to our single master Google Analytics account, which accrues data for multiple sites.
I am wondering whether there will be any issues with sending duplicate transaction IDs to our master Google Analytics account, i.e. if orders are placed on two different client sites and they happen to have the same transaction ID. Would Google Analytics recognise these as two separate transactions, or would one overwrite the other?
If you send two identical transaction IDs to the same account, Google Analytics will add the second transaction to the first and show the products for both under the same transaction ID (before refund data upload became available, this was actually used to cancel out unwanted transactions by sending negative revenue values).
However, internally they will be counted as two transactions (e.g. in the calculation of the ecommerce conversion rate). Also, if you add a time-based secondary dimension (hour of day, minute index or the like), or one such as hostname, you will see that Google untangles the two transactions (so they appear added up only for timeframes that encompass both).
To avoid this I would recommend that you use an advanced filter to add the hostname to the transaction id.
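To illustrate the effect of such a filter, here is a small Python sketch that mimics it outside of Google Analytics: the hostname is joined to the transaction ID so that identical IDs from different sites no longer collide. The function name, separator and sample data are my own choices for illustration, not anything GA prescribes.

```python
def namespaced_transaction_id(hostname, transaction_id, sep="-"):
    """Combine hostname and transaction ID into one unique key,
    roughly what an advanced filter writing "$A1-$B1" back into the
    transaction ID field would produce."""
    return f"{hostname.lower()}{sep}{transaction_id}"


# Two client sites that happen to issue the same order number:
orders = [
    ("shop-a.example.com", "10001"),
    ("shop-b.example.com", "10001"),
]
raw_ids = {tid for _, tid in orders}
unique_ids = {namespaced_transaction_id(host, tid) for host, tid in orders}
```

Here `raw_ids` collapses to a single entry while `unique_ids` keeps both orders apart, which is the behaviour you want in the master account.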
I am using Google BigQuery to extract data on conversion paths from Google Analytics (GA).
When I analyze these conversion paths from the exported dataset, the last-click conversions match the Acquisition report in GA, but not the Multi-Channel Funnels (MCF) data. Apparently BigQuery doesn't really export raw data, but transforms it by replacing direct visits with the last non-direct click, as described here: https://support.google.com/analytics/answer/1319312?hl=en.
Is it possible to get the BigQuery data to correspond to the Multi-Channel Funnels (MCF) conversion path data? In other words, can the last non-direct click attribution be undone to get proper 'raw' user-level data?
All of the trafficSource fields in BigQuery Export for Google Analytics use campaign attribution as described in this processing flow, which will overwrite direct traffic with the most recent campaign (if there is one and it is within the specified timeout), as you mentioned.
If you are using Universal Analytics, you can adjust the campaign timeout to be shorter than the six-month default. For example, if you set the campaign timeout to one day, any direct visit that comes in at least one day after a visit with a campaign will be attributed to direct instead of to the previous campaign. This can be done with Classic Analytics as well, using _setCampaignCookieTimeout. This technique only affects data collected from the time it is implemented onward.
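To make the effect of the timeout concrete, here is a simplified simulation of the attribution rule described above. It is a sketch of the logic only, not Google's actual processing code, and it assumes the timeout is measured from the most recent campaign visit.

```python
from datetime import datetime, timedelta


def attribute_sessions(sessions, campaign_timeout):
    """Apply last non-direct attribution with a campaign timeout.

    `sessions` is a time-ordered list of (timestamp, source) pairs,
    where source is None for a direct visit.
    """
    last_campaign = None  # (timestamp, source) of the latest campaign visit
    attributed = []
    for ts, source in sessions:
        if source is not None:
            last_campaign = (ts, source)
            attributed.append(source)
        elif last_campaign and ts - last_campaign[0] < campaign_timeout:
            # Direct visit inside the timeout: inherit the previous campaign.
            attributed.append(last_campaign[1])
        else:
            attributed.append("(direct)")
    return attributed


start = datetime(2017, 1, 1, 9, 0)
visits = [
    (start, "google"),                      # campaign visit
    (start + timedelta(hours=2), None),     # direct, two hours later
    (start + timedelta(days=3), None),      # direct, three days later
]
with_default = attribute_sessions(visits, timedelta(days=180))
with_one_day = attribute_sessions(visits, timedelta(days=1))
```

With the long default, all three visits are credited to `google`; with a one-day timeout, the visit three days later falls back to `(direct)`.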
This thread is rather dated, so I thought I'd update just in case anyone else comes across this same question.
There is a field that was introduced (both in the Google Analytics interface and the BigQuery export) that allows you to match the numbers in the MCF reports. In BigQuery, look for the field trafficSource.isTrueDirect
BigQuery Export Schema
trafficSource.isTrueDirect
True if the source of the session was Direct (meaning the user typed
the name of your website URL into the browser or came to your site via
a bookmark). This field will also be true if 2 successive but distinct
sessions have exactly the same campaign details. Otherwise NULL.
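As a rough sketch of how this field could be used, the helper below builds a standard SQL query that splits sessions into "true direct" and everything else. The table name and date range are hypothetical placeholders, and the string formatting assumes trusted, hard-coded values rather than user input.

```python
def true_direct_query(table="my-project.my_dataset.ga_sessions_*",
                      start="20170101", end="20170131"):
    """Build a BigQuery standard SQL query over the GA export tables.

    `table`, `start` and `end` are placeholders for illustration.
    isTrueDirect is NULL (not FALSE) for other sessions, so IF() sends
    those rows to the 'other' bucket.
    """
    return f"""
SELECT
  IF(trafficSource.isTrueDirect, 'true direct', 'other') AS direct_flag,
  trafficSource.source AS source,
  trafficSource.medium AS medium,
  SUM(totals.visits) AS sessions
FROM `{table}`
WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
GROUP BY direct_flag, source, medium
ORDER BY sessions DESC
"""


query = true_direct_query()
```

The resulting string can be pasted into the BigQuery console or passed to a client library. Rows flagged as true direct still carry the inherited campaign in source and medium, so they are the ones to relabel as direct when reconciling with the MCF reports.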
This may be a possible duplicate of this question, but according to all the Google Analytics documentation I really should be able to pull my list of custom segments.
Since I have a very large list of them, it would be suboptimal for me to manually copy the segment ids over one at a time.
I'm following this walkthrough. Steps to reproduce:
Create a custom segment using date of first session in your Google Analytics account.
Authorize the Google Analytics guide to access your Google Analytics account.
Try their on-page query tester, and inspect whether your custom segment is there.
One thing I've already ruled out was the user that created the segment. I've manually created a segment with the same user that I'm querying the API with and it still does not show. Is there a flag I need to set somewhere to include custom segments?
Edit:
It turns out that it will list some custom segments, but not ones created with date of first session, so this is a duplicate of this question, which means that there is a bug in the Google Analytics API.
There was a bug, which is now fixed. It is now possible to list segments based on date of first session through the Google Analytics Management API by calling the segments.list() method.
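A minimal sketch of pulling all custom segments with the Python client, so the IDs don't have to be copied by hand. It assumes `service` is an already authorized Analytics v3 service object (for example from `googleapiclient.discovery.build("analytics", "v3", ...)`); the helper name and page size are my own choices.

```python
def list_custom_segments(service, page_size=1000):
    """Return (segmentId, name) pairs for every custom segment the
    authorized user can see, following pagination via start_index."""
    found = []
    start_index = 1  # the Management API uses 1-based indexing
    while True:
        response = service.management().segments().list(
            start_index=start_index, max_results=page_size
        ).execute()
        items = response.get("items", [])
        # Built-in segments have type BUILT_IN; keep only custom ones.
        found.extend(s for s in items if s.get("type") == "CUSTOM")
        start_index += len(items)
        if not items or start_index > response.get("totalResults", 0):
            break
    return [(s["segmentId"], s["name"]) for s in found]
```

The `segmentId` values returned here (of the form `gaid::...`) are what the reporting APIs expect in their segment parameter.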
So after days of trying to solve this one I've come to the conclusion that it cannot be done as asked.
There is, however, another way to do it. For every segment, set up a daily (or weekly, etc.) email report that is sent to an email address as a TSV attachment. In each email body, specify the name of the segment, so that when you're consuming the emails you know which segment the attached TSV is for. It doesn't look like the daily reports were designed with segments in mind, since none of the metadata included in the TSV mentions which segment it is for.
From there it's trivial. Connect to the email address using an IMAP client once a day and update the numbers.
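As a sketch of the consuming side, the function below takes the raw bytes of one report email (as fetched over IMAP, e.g. with `imaplib.IMAP4_SSL`) and returns the segment name from the body together with the rows of any attached TSV. The function name is my own, and it assumes the body contains nothing but the segment name; the IMAP fetch itself is left out because it depends on your mailbox.

```python
import csv
import email
import io
from email import policy


def parse_report_email(raw_bytes):
    """Return (segment_name, rows) for one scheduled report email.

    Assumes the plain-text body holds only the segment name and the
    report is attached as a .tsv file.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)

    body = msg.get_body(preferencelist=("plain",))
    segment_name = body.get_content().strip() if body is not None else ""

    rows = []
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        if not filename.lower().endswith(".tsv"):
            continue
        payload = part.get_content()
        # Text attachments come back as str, binary ones as bytes.
        text = payload.decode("utf-8") if isinstance(payload, bytes) else payload
        rows.extend(csv.reader(io.StringIO(text), delimiter="\t"))
    return segment_name, rows
```

The first row returned is the header row of the TSV, so matching on its column names keeps the import in step with the report definition.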
Note that the daily email only contains the numbers for that day (not a specified range), so you'll need to first generate the report one time with the historical data to load in.
While hacky, one nice thing about this approach is that it keeps your reports in sync with your (faked through email) API code, provided you match the column headings in the TSV. So if, for example, a new filter is added to a report, the daily fields will continue to update with the new definition.
Unfortunately though, the past data won't be reflected in the change.
Obviously this isn't great, but if you are monitoring daily cohorts it's the best you've got if you need to stay with Google Analytics. I have raised this as a bug to the Google Analytics developers, but I haven't heard back as to whether or not they plan to fix it.