How to replicate the GA field Visits in Big Query - google-analytics

In a typical GA session, after picking a View ID and a date range, we can get a week's worth of data like this:
Users: 146,207
New Users: 124,582
Sessions: 186,191
The question is: which BQ field(s) should be queried in order to get this Users value?
Here is an example query with 2 methods (the 2nd method is commented out).
SELECT
  COUNT(DISTINCT CONCAT(CAST(visitId AS STRING), CAST(visitNumber AS STRING))) AS visitors,
  -- COUNT(DISTINCT fullVisitorId) AS visitors
I noticed the fullVisitorId (FVID) method was fairly close to what I see in GA (with Users understated by about 3% in BQ), and if I use the commented out method, I get a value that is about 15% overstated compared to GA. Is there a more reliable method in BQ to get the GA Users value?

The COUNT(DISTINCT fullVisitorId) method is the most correct method, but it won't match what Analytics 360 reports by default. Since last year, Google Analytics 360 by default uses a different calculation for the Users metric than it previously did. The old calculation, which is still used in unsampled reports, is more likely to match what you get out of BigQuery. You can verify this by exporting your report as an unsampled report, or using the unsampled reporting features in the Management API.
If you want the numbers to match exactly, you can turn off the new calculation by using the instructions here. The new calculation's precise details are not public, so duplicating that value in BigQuery is quite difficult.
There are still some reasons you might see different numbers, even with the old calculation. One is if the site has implemented User ID, in which case the GA number will be lower than BigQuery for fullVisitorId. Another is sampling, though that's unlikely in Analytics 360 at the volumes you're talking about.
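As a small illustration of why the two counting methods in the question diverge, here is a sketch in Python over a few invented session rows shaped like the GA export (one row per session). The field names follow the export schema; the values are made up, and this is not a replacement for running the query in BigQuery.

```python
# Hypothetical session rows (one per session); values are invented.
sessions = [
    {"fullVisitorId": "1001", "visitId": 1530000000, "visitNumber": 1},
    {"fullVisitorId": "1001", "visitId": 1530090000, "visitNumber": 2},
    {"fullVisitorId": "1002", "visitId": 1530000500, "visitNumber": 1},
    {"fullVisitorId": "1003", "visitId": 1530001000, "visitNumber": 1},
    {"fullVisitorId": "1003", "visitId": 1530200000, "visitNumber": 2},
]

# Equivalent of COUNT(DISTINCT fullVisitorId): one entry per visitor.
users = len({row["fullVisitorId"] for row in sessions})

# Equivalent of COUNT(DISTINCT CONCAT(visitId, visitNumber)): the key changes
# with every visit, so returning visitors are counted more than once.
visit_keys = len({f'{row["visitId"]}{row["visitNumber"]}' for row in sessions})

print(users, visit_keys)
```

The second count grows with every return visit, which is why it tracks sessions more closely than users.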

Related

Google Analytics Hit Quotas

I wonder whether someone can help me please.
I have a user who under a specific property, sporadically receives the following error:
Some hits sent on 03-Jul-2018 to property ...... exceeded one or more hit quotas and were therefore not processed.
Hits can be dropped when daily or monthly hit limits are exceeded. You can view your hit volume levels in Property Settings in Analytics.
Hits can also be dropped if visitor hit limits are exceeded. This can happen when your site is incorrectly generating the visitor ID for a GA session. Contact your website administrator to check that the visitor ID generation has been correctly implemented.
They are not using a Premium account, but when I look at the data for the day in question, I don't see any 'High Cardinality' issues, which, unless I've misunderstood, I'd expect to see.
Could someone look at this please and offer some guidance on where the issue may be? This area is fairly new to me.
Many thanks and kind regards
Chris
Collection limits are influenced by 2 factors:
The tracker: whether you use ga.js, gtag.js, analytics.js, etc. Here are the details.
The property type: whether you are using GA (10M hits / month) or GA 360 (2B hits / month).
In your case you are facing a property limit. To find out when such limits were reached, you can create a custom report using a time dimension (e.g. date + time) combined with the hits metric. You can also combine the hits metric with other dimensions (country, browser, device) to see if you find any patterns as to why you're getting so many hits.
Cardinality is something else: it refers to the number of unique value combinations for your dimensions. For instance, if you have 500K events where each event category is different, you'll have a cardinality of 500K on the event category dimension. The more hits, the more likely you'll have a high cardinality, but the 2 aren't necessarily related (if you send 10B events with the same category, the cardinality on the category is 1).
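To make the distinction between hit volume and cardinality concrete, here is a minimal Python sketch. The hit records and the `cardinality` helper are hypothetical and only model the idea; GA computes this internally.

```python
def cardinality(hits, dimension):
    """Return the number of distinct values seen for one dimension."""
    return len({hit[dimension] for hit in hits})

# Same hit volume (1000 hits each), very different cardinality.
many_categories = [{"eventCategory": f"category-{i}"} for i in range(1000)]
one_category = [{"eventCategory": "video"} for _ in range(1000)]

print(cardinality(many_categories, "eventCategory"))
print(cardinality(one_category, "eventCategory"))
```

Both lists hold the same number of hits, yet only the first would push the dimension toward a cardinality limit.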
So focus on identifying and solving your limits/quotas issue, as it's the real issue here:
If the number of hits is legitimate (you have a huge amount of traffic), then the only options are to upgrade to GA 360 or to reduce the number of hits sent in each session.
If the number of hits is abnormally high (e.g. traffic is stable but hits increased dramatically), look for implementation issues, especially generic event trackers such as error tracking set up with tools like Google Tag Manager.
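Once the custom report described above is exported, looking for patterns can be as simple as summing hits per day and per dimension value. The sketch below assumes hypothetical rows of (date, event category, hits); the numbers are invented.

```python
from collections import Counter

# Hypothetical rows from a custom report: (date, event category, hits).
report_rows = [
    ("2018-07-01", "video", 1200),
    ("2018-07-01", "js-error", 900),
    ("2018-07-02", "video", 1150),
    ("2018-07-03", "video", 1300),
    ("2018-07-03", "js-error", 48000),
]

hits_per_day = Counter()
hits_per_category = Counter()
for date, category, hits in report_rows:
    hits_per_day[date] += hits
    hits_per_category[category] += hits

# The largest totals point at when, and from which tracker, the spike came.
busiest_day, busiest_day_hits = hits_per_day.most_common(1)[0]
top_category, top_category_hits = hits_per_category.most_common(1)[0]
print(busiest_day, busiest_day_hits, top_category, top_category_hits)
```

In this made-up data a single error tracker accounts for most hits on one day, which is the kind of implementation issue worth checking first.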

How can I view individual hits to pages within a GA custom report

I would like to compare some data between a 3rd party analytics tool and GA.
Now I would love to see the IP addresses that GA is receiving, but it seems they do not reveal this information. Fine. However, I cannot find a way to use the flat table in the GA custom report to show me the following, if possible:
Full Date Time (Seems as though they don't want you to have this either)
Browser Version
Browser Width & Height
Page (from the hit)
And I would like this data not to be grouped by the metric, this way I can see that if the same user has hit a page 3 times it isn't grouped.
If anyone can help please let me know. If the question is poorly phrased please let me know.
Thanks,
Connor.
This requires some work, and it will allow the breakdown only for future hits, not for hits that are already collected.
To view individual hits you need to create a hit-based dimension that is unique per hit. Unless your page has an amazing amount of traffic, a timestamp in milliseconds (e.g. new Date().getTime()) will be sufficient (for your report you might want to format that in a nice way). So in the admin section of your GA property, go to Custom Definitions, create a hit-scoped custom dimension, and then modify your page code to send the timestamp to that dimension. Hit-scoped means it is attached to the pageview (or other interaction hit) it is sent with.
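The answer describes a change to the page's JavaScript. The sketch below swaps in a different technique, purely to show what ends up travelling with each hit: it builds a Measurement Protocol style pageview payload in Python with the timestamp in a custom dimension. The helper name, the tracking ID, the client ID and the choice of dimension index 1 (`cd1`) are all assumptions for illustration, and nothing is actually sent.

```python
import time
from urllib.parse import urlencode

def build_pageview_payload(tracking_id, client_id, page,
                           timestamp_ms=None, dimension_index=1):
    """Build a pageview payload with a per-hit timestamp custom dimension."""
    if timestamp_ms is None:
        # Milliseconds since the epoch, like new Date().getTime() in the page.
        timestamp_ms = int(time.time() * 1000)
    params = {
        "v": "1",                # protocol version
        "tid": tracking_id,      # property ID (placeholder below)
        "cid": client_id,        # client ID
        "t": "pageview",         # hit type
        "dp": page,              # document path
        f"cd{dimension_index}": str(timestamp_ms),  # hit-scoped dimension
    }
    return urlencode(params)

payload = build_pageview_payload("UA-XXXXX-Y", "555", "/pricing",
                                 timestamp_ms=1530576000000)
print(payload)
```

Because the timestamp differs for every hit, a report that includes this dimension can no longer group rows together.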
If you want to break down your report by user you need the clientid (clientid is how Google recognizes that hits belong to the same user). Again, send it as a custom dimension.
This does not tell you how many sessions the user had (there is no session identifier in GA). If you need to know that, you can create a session-scoped custom dimension and send a random number along ("session scope" means that GA only stores the last value in a session, so you don't need to maintain a session ID over multiple pageviews, since the last value will be set for all hits within the session). The number of different session IDs per client ID then tells you the number of sessions per user.
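With those custom dimensions in place, counting sessions per user from an exported report is a small grouping step. The following Python sketch assumes hypothetical rows of (client ID, session ID, hit timestamp, page); all values are invented.

```python
from collections import defaultdict

# Hypothetical report rows: (client ID, session ID, hit timestamp in ms, page).
rows = [
    ("client-a", "s-101", 1530576000000, "/home"),
    ("client-a", "s-101", 1530576005000, "/pricing"),
    ("client-a", "s-202", 1530662400000, "/home"),
    ("client-b", "s-303", 1530576010000, "/home"),
]

# Collect the distinct session IDs seen for each client ID.
sessions_by_client = defaultdict(set)
for client_id, session_id, _timestamp, _page in rows:
    sessions_by_client[client_id].add(session_id)

sessions_per_client = {
    client: len(session_ids) for client, session_ids in sessions_by_client.items()
}
print(sessions_per_client)
```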
The takeaway is that GA only shows aggregated data, and if you want to defeat this mechanism you need to throw data at it that cannot be aggregated further. You might run into other constraints (e.g. there is a limited number of rows per report).

Hits Processed Per Month?

If you refer to http://www.google.com/intl/en_uk/analytics/premium/features.html, you will notice that Standard allows for 10 million hits processed per month and Premium allows for 1 billion.
I have a website on an account, with multiple "folders" for different sub-domains, and also different "Views" or dashboards for some of these sub-domains.
The website I am on recently lost tracking for conversion rates, and everything has plummeted to near 0%, which is an incorrect statistic. I am curious how I can figure out whether this account is reaching the 10 million limit on the standard version, or at least how to work out the actual hits processed per day, week, or month.
Any ideas?
Thanks!
I don't know how Google enforces hit limits in 2015. However in 2013 a Google representative sent one of our bigger clients a document (answering a question about data limits) that contained the following paragraph:
How do data limits impact sampling? Google Analytics does not sample
your clients data at the point of collection or processing, regardless
of how far they exceed our stated limits. So no hits are discarded.
The only way to sample data at the point of collection is for clients
to use _setSampleRate in their tracking code.
[...]
[...] we reserve the right to shutdown their account [sc. if limits are exceeded], but it won't
happen before we have attempted to contact the account Admins multiple times
and we have exhausted all other options.
Unless Google has changed its policy in the last 1.5 years, I would say no: unprocessed hits are not your problem. It seems Google would have contacted you with a request to limit your hits or upgrade to Analytics Premium before problems occur.
Plus, since you mentioned that you have several views - views do not count towards your quota (they display the same data in different ways). However properties (I think that is what you mean by "folders") do.
Updated 2017: It seems that Google intends to enforce limits more strictly. One of my clients now has the following warning in his GA interface:
Your data volume (XXX hits) exceeds the limit of 10M hit per month as
outlined in our terms of service. If you continue to exceed the limit
you will lose access to future data.
You can create a database table, like this:
create table visits(
  id bigint primary key auto_increment,
  ip text,
  visit_date timestamp default current_timestamp
);
Upon each page visit, you can insert a record into the table. Later you can view statistics. For instance, the visit count for a given day would look like:
select count(*) as visit_count
from visits
where visit_date >= '2015-07-21 00:00:00' and visit_date < '2015-07-22 00:00:00';
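To try the idea end to end without a database server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made for convenience (hence `integer primary key autoincrement` in place of `bigint ... auto_increment`), and the IP addresses and dates are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table visits(
        id integer primary key autoincrement,
        ip text,
        visit_date timestamp default current_timestamp
    )
    """
)

# One row per page visit; in a real site this insert runs on each request.
conn.executemany(
    "insert into visits (ip, visit_date) values (?, ?)",
    [
        ("203.0.113.5", "2015-07-20 23:59:59"),
        ("203.0.113.5", "2015-07-21 08:15:00"),
        ("198.51.100.7", "2015-07-21 17:40:12"),
        ("198.51.100.7", "2015-07-22 00:00:00"),
    ],
)

# Count the visits that fall inside a single day (half-open interval).
(visit_count,) = conn.execute(
    "select count(*) from visits where visit_date >= ? and visit_date < ?",
    ("2015-07-21 00:00:00", "2015-07-22 00:00:00"),
).fetchone()
print(visit_count)
```

The half-open interval (`>=` start, `<` next day) keeps a visit at exactly midnight from being counted in two days.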

How can I query Google Analytics condition on TWO different dates?

I wish to extract (via the Analytics Core Reporting API) all the transactions made TODAY by users that had a specific ga:eventCategory a few weeks ago.
I'm looking to see the date of a transaction and all dates of events that are related to that transaction.
If GA were SQL, I would join on the GA user and select both the user's transaction date and the date the dimension was updated...
Thanks.
Noam.
Like I have indicated in my comment you can segment the data to include only those users who have the specific event. Segmentation works fine with the core reporting API.
Your segment definition would look like this:
users::condition::ga:eventCategory==[myEventCategory]
(where obviously the thing in [brackets] is a placeholder that needs to be substituted for the event category name). The "users::" prefix means you are segmenting by user scope (as opposed to sessions), so this will include all sessions in the selected timeframe for users who had the event in at least one of their sessions (even if the event was outside the selected timeframe).
Select transactionId as dimension and some metric (revenue) and today's date and you are done. Or you would be done if this was actually going to work, but there are at least two caveats:
Google Analytics does not work in realtime, so it's unlikely that today's transactions are fully available (Google says it's 24 hours until the data is processed; actually it might happen faster, but you cannot rely on it).
If a user has deleted their cookie, they won't be recognized as a returning user and GA will be unable to include them in the segment. The longer the interval between the event and the transaction, the less likely it is that the GA cookie is still present.
So even with a technically correct query it might be that you won't get the data you need.
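The query described above can be sketched as a Core Reporting API (v3) request URL. This Python snippet only builds the URL and leaves out authentication; the view ID, the event category value and the `build_query` helper are placeholders, and `ga:transactionRevenue` is one possible choice for the revenue metric.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"

def build_query(view_id, event_category):
    """Build a request URL for today's transactions in a user-scoped segment."""
    params = {
        "ids": f"ga:{view_id}",
        "start-date": "today",
        "end-date": "today",
        "metrics": "ga:transactionRevenue",
        "dimensions": "ga:transactionId",
        # User scope: keep users who triggered the event category.
        "segment": f"users::condition::ga:eventCategory=={event_category}",
    }
    return API_ENDPOINT + "?" + urlencode(params)

# "12345678" and "signup" are made-up example values.
url = build_query("12345678", "signup")
print(url)
```

The caveats above still apply: the request can be well formed and still return incomplete data for today.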

Getting MCF Conversions path data from Google Bigquery

I am using Google Bigquery to extract data on conversion paths from Google Analytics (GA).
When I analyze these conversion paths from the exported dataset, the last-click conversions match the Acquisition report in GA, but not the Multi-Channel Funnel (MCF) data. Apparently BigQuery doesn't really export raw data, but transforms it by deleting all last direct clicks, as described here: https://support.google.com/analytics/answer/1319312?hl=en.
Is it possible to get the BigQuery data to correspond to the MCF conversion path data, i.e. to undo the last non-direct click attribution and get proper 'raw' user-level data?
All of the trafficSource fields in BigQuery Export for Google Analytics use campaign attribution as described in this processing flow, which will overwrite direct traffic with the most recent campaign (if there is one and it is within the specified timeout), as you mentioned.
If you are using Universal Analytics, you can adjust the campaign timeout to be shorter than the 6 month default. For example, if you set the campaign timeout to be one day, any direct visits that come in at least one day after a visit with a campaign will be attributed to direct instead of the previous campaign. This can be done with Classic Analytics as well using _setCampaignCookieTimeout. This technique will affect data collection from the time it is implemented going forward.
This thread is rather dated, so I thought I'd update just in case anyone else comes across this same question.
There is a field that was introduced (both in the Google Analytics interface and the BigQuery export) that allows you to match the numbers in the MCF reports. In BigQuery, look for the field trafficSource.isTrueDirect.
BigQuery Export Schema
trafficSource.isTrueDirect
True if the source of the session was Direct (meaning the user typed
the name of your website URL into the browser or came to your site via
a bookmark), This field will also be true if 2 successive but distinct
sessions have exactly the same campaign details. Otherwise NULL.
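One possible way to use this flag when rebuilding paths is to treat a session as Direct whenever isTrueDirect is true and otherwise keep the attributed source and medium. The Python sketch below models that on invented session rows; the `path_channel` helper is hypothetical, and because the flag is also true for successive sessions with identical campaign details, treat the result as an approximation to check against the MCF reports rather than a guaranteed match.

```python
# Hypothetical sessions with the attributed trafficSource values; the second
# row is a direct visit whose source was overwritten by the earlier campaign.
sessions = [
    {"source": "google", "medium": "cpc", "isTrueDirect": None},
    {"source": "google", "medium": "cpc", "isTrueDirect": True},
    {"source": "(direct)", "medium": "(none)", "isTrueDirect": True},
    {"source": "newsletter", "medium": "email", "isTrueDirect": None},
]

def path_channel(session):
    """Return Direct for flagged sessions, else the attributed source / medium."""
    if session["isTrueDirect"]:
        return "(direct) / (none)"
    return f'{session["source"]} / {session["medium"]}'

path = [path_channel(session) for session in sessions]
print(path)
```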
