How can I query Google Analytics condition on TWO different dates? - google-analytics

I wish to extract (via the Analytics Core Reporting API) all the transactions made TODAY by users that had a specific ga:eventCategory few weeks ago.
I'm looking to see the date of a transaction and all dated of event that are related to that transaction.
If GA was sql I would join by the ga user and take in the dimension both his transactions date and his dimension update date...
Thanks.
Noam.

Like I have indicated in my comment you can segment the data to include only those users who have the specific event. Segmentation works fine with the core reporting API.
Your segment defintion would look like this:
users::condition::ga:eventCategory==[myEventCategory]
(where obviously the thing in [brackets] is a placeholder that needs to be substituted for the event category name). The "users::" prefix means you are segmenting by user scope (as opposed to sessions), so this will include all sessions in the selected timeframe for users who had the event at least in one of their session (even if the event was outside the selected timeframe).
Select transactionId as dimension and some metric (revenue) and todays date and you are done. Or you would be done if this was actually going to work, but there are at least two caveats:
Google Analytics does not work in realtime, so it's unlikely that TODAYs transactions are fully available (Google says it's 24 hours until the data is processed - actually it might happen faster, but you cannot rely on it).
If a user has deleted his or her cookie she won't be recognized as a recurring user and GA will be unable to segment her out. The longer the interval between the event and the transaction the less likey it is that the GA cookie is still present.
So even with a technically correct query it might be that you won't get the data you need.

Related

How can I view individual hits to pages within a GA custom report

I would like to compare some data between a 3rd party analytics tool and GA.
Now I would love to see the IP addresses that Ga is receiving however it seems that they do not reveal this information, fine, however, I cannot find a way to use the flat table in the GA custom report to show me the following if possible;
Full Date Time (Seems as though they don't want you to have this either)
Browser Version
Browser Width & Height
Page (from the hit)
And I would like this data not to be grouped by the metric, this way I can see that if the same user has hit a page 3 times it isn't grouped.
If anyone can help please let me know. If the question is poorly phrased please let me know.
Thanks,
Connor.
This requires some work, and it will allow the breakdown only for future hits, not for hits that are already collected.
To view individual hits you need to create a hit based dimension that is unique per hit. Unless your page has an amazing amount of traffic a timestamp in milliseconds (e.g. new Date().getTime()) will be sufficient (for your report you might want to format that in a nice way). So in the admin section of your GA property you go to custom definitions, create a hit scoped custom dimension, and then modify your pagecode to send the timestamp to that dimension. Hit scoped means it is attached to the pageview (or other interacton hit) it is sent with.
If you want to break down your report by user you need the clientid (clientid is how Google recognizes that hits belong to the same user). Again, send it as a custom dimension.
This does not tell you how many sessions the user had (there is no session identifier in GA). If you need to know that you can create a session scoped custom dimension and send a random number along ("session scope" means that GA only stores the last value in a session, so you don't need to maintain a session id over multiple pageviews, since the last value will be set for all hits within the session). The number of different sessions ids per client id then tells you the number of sessions per user.
The takeaway is that GA only shows aggregated data, and if you want to defeat this mechanism you need to throw data at it that cannot be aggregated further. You might run into other constraints (i.e. there is a limited number of rows per report).

How to replicate the GA field Visits in Big Query

In a typical GA session, after picking a View ID and a date range,
We can get a week's worth of data like this:
Users
146,207
New Users
124,582
Sessions
186,191
The question is, what BQ field(s) to query in order to get this Users value?
Here is an example query with 2 methods (the 2nd method is commented out).
SELECT
count(DISTINCT(CONCAT(CAST(visitID as STRING),cast(visitNumber as
STRING)))) as visitors,
-- count(DISTINCT(fullVisitorId)) as visitors
I noticed the FVID method was fairly close to what I see in GA (with Users being a little understated by a 3% in BQ) and if I use the commented out method, I get a value that is about 15% overstated as compared to GA. Is there a more reliable method in BQ to acquire the Users value in GA?
The COUNT(DISTINCT fullVisitorId) method is the most correct method, but it won't match what Analytics 360 reports by default. Since last year, Google Analytics 360 by default uses a different calculation for the Users metric than it previously did. The old calculation, which is still used in unsampled reports, is more likely to match what you get out of BigQuery. You can verify this by exporting your report as an unsampled report, or using the unsampled reporting features in the Management API.
If you want the numbers to match exactly, you can turn off the new calculation by using the instructions here. The new calculation's precise details are not public, so duplicating that value in BigQuery is quite difficult.
There are still some reasons you might see different numbers, even with the old calculation. One is if the site has implemented User ID, in which case the GA number will be lower than BigQuery for fullVisitorId. Another is sampling, though that's unlikely in Analytics 360 at the volumes you're talking about.

Multiple Ecommerce Transaction because of cached webpage

I am stuck in a case where Google Analytics is recording multiple eCommerce transaction. We have added code on server side to execute GA eCommerce posting code only one time. Still this issue is reproducible for some transaction. The multiple eCommerce transaction are for same transaction Id but on different dates.
On research I found that this case is with small devices (mobile, tablet). The small devices browser caches whole webpage. And when the browser is opened it reload webpage from cache. So each time user opens the browser and page loads from cache hence the causing this issue.
Can anyone help me on this?
Thanks
"Ignore double transaction ids" would be quite a useful setting and we should try and make this a feature request. However at the moment it does not exist.
The only way I can think of would be to use an API script that selects the transaction ids for the last "n" days and then inserts a heap of filters via the management API to exclude hits with that transaction id. After some time (when the caches have presumably expired) you could throw out old filters. This would be only feasible if you have a small number of transactions (I think there is an upper limit to the number of filters a view can have).
Or if your transaction ids are somehow sequential (e.g. if they contain the date) you might be able to construct a regex that matches earlier parts of the sequence (e.g. previous dates) and only let's a transaction pass if it is higher up in the sequence than the last recorded transaction id (or does not let it pass if the date in the transaction id is lower than the current date - remember to update your filter at midnight).
Caveat: I have not actually tried something like this, but it sounds like it should work.

Discrepancies on "active users metric" between Firebase Analytics dashboard and BigQuery export

According to Firebase Analytics docs (https://support.google.com/firebase/answer/6317517#active-users), the active number of users is the number of unique users who initiated sessions on a given day. Also according to the docs, every time a session is started an event with session_start name is sent. I am trying to get that metric using BigQuery's export, but my query is giving me different results (15636 on BigQuery, 14908 on FB analytics)
I have also tried converting to different timezones to see if that might be the issue, but no matter which timezone I try I never get the same (or similar) results
Which query should I run to get the same results I get on Firebase Analytics dashboard for active users?
My query is
SELECT EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id)
FROM table_date_range([XXXXX.app_events_], timestamp('2016-11-26'), timestamp('2016-11-29'))
WHERE DATE(event_dim.timestamp_micros) = '2016-11-27'
AND event_dim.name ='session_start'
Thanks
Update
After #djabi's answer I changed my query to use user_engagement rather than session_start and it works much better now. Still some minor differences though (they range from under ten to under 50 out of 16K, depending on the date).
I have tried once again using different timezones by playing around with DATE(date_add(event_dim.timestamp_micros,1,'hour')) but I never got the exact number I get on Firebase Analytics dashboard.
The new numbers are good enough to be considered statistically acceptable, but wondering if anyone has a suggestion to improve the query and get exact results?
The current query is:
SELECT
COUNT(*) AS active_users
FROM (
SELECT
COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
FROM
TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
WHERE
DATE(event_dim.timestamp_micros) = '2016-11-25'
AND event_dim.name ='user_engagement'
GROUP BY
user_id )
Note: At the moment we are not sending user_id, so the COALESCE will always return the app_instance_id, in case anyone was going to suggest that could be the problem
You need to wait for full 3 days for data from offline devices to be uploaded. Your query correctly filter the events based on the event timestamp and you pull data from 3 days but that is only day and half from today and that is enough for all data to be uploaded. Try including 3 days from yesterday.
Also try using user_engagement event instead of session_start. I believe active user count is based on user_engagement and not on session_start events.
Also FB reports take a bit to process so you wight want and check the FB reports the next day.
FB reports are done on the time zone on the account and events are timestamped in UTC so the day in FB reports is different from UTC calendar day. You want to control for that discrepancy as well to get matching numbers.
Sessions are by-default measured after user activity of 10 seconds in the respective app which you can change. Try changing the sessions start time count to the least number possible and then you may arrive at a number closer to what you are expecting.
For Android stats I used:
user_dim.device_info.resettable_device_id
instead of
user_dim.app_info.app_instance_id
and it produced better results.

How are dynamic segments in Google Analytics retroactive?

Are dynamic advanced segments retroactive at the session or visitor level? Can it retroactively recalculate session data or can it retroactively recalculate visitor data?
Here is an example as this is a foggy question.
Say I add an event tag to GA today. Tomorrow i run a report where the dynamic segment is for visitors who have triggered the event. The report requests unique visitors over time.
Now, if it is retroactive at the visitor level, the visitor is now tagged as having triggered the event. The report should show data going back in time (assuming these are not first time visitors). In this scenario GA will see if the visitors tagged arrived 2 days ago even though the events did not exist yet.
This answer no longer reflects up to date information.
Advanced Segments are not queried at the visitor level, and are thus not able to query data across sessions. They query particular sessions (or, visits), not visitors.
So, if you visit the site today, trigger an event, and then visit the site again tomorrow and don't trigger the event, an advanced segment for that event will be a query that says "Show me all sessions in which this event was trigger"; the former will be included and the latter excluded.
Similarly, if you do an advanced segment for a particular page, what you're saying is "Filter down to all the sessions in which this page was viewed" (this can be confusing for people who apply an advanced segment for a particular page, and the result contains more than just that page.)
However, they are dynamic and can be applied to the retroactively. In other words, the results of the advanced segmentation are not contingent on when the advanced segment itself was created. (This stands in contrasts to, say, account filters, that do not apply themselves retroactively.) They tend to be calculated on the fly; you'll notice that complex advanced segments can often take a long time to process, and tend to increase the likelihood that Google Analytics will return sampled (or, "fast access") data.
There is no way to use advanced segmentation to query across sessions.

Resources