Google API shows duplicate rows for transactionIds - google-analytics

I've got a strange problem.
I'm trying to pull out data from GA API.
metrics: ga:users
dimensions: ga:date,ga:source,ga:medium,ga:transactionId
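In google-api-python-client terms the request would look roughly like this (a minimal sketch; the key file, view ID and date range are placeholders):

# Sketch of the Core Reporting API (v3) request described above.
# Key file, view (profile) ID and dates are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    'key.json', scopes=['https://www.googleapis.com/auth/analytics.readonly'])
service = build('analytics', 'v3', credentials=credentials)

response = service.data().ga().get(
    ids='ga:XXXXXXXX',
    start_date='2017-01-01',
    end_date='2017-01-31',
    metrics='ga:users',
    dimensions='ga:date,ga:source,ga:medium,ga:transactionId',
).execute()
rows = response.get('rows', [])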
After reviewing the data I can see that I have duplicate transaction IDs.
Usually 5 to 7 duplicates per month - the same transaction ID appears on two different dates.
In Google Analytics itself there are no duplicates.
The duplicates are in the exported data, and Query Explorer also shows them.
Does anybody know why?
Thanks,
Krzysztof

First of all, are you sure you use unique transaction IDs for each transaction? I've seen cases where the ERP makes certain transaction or order IDs available again after an order was cancelled.
If you look at the transaction ID in GA (click on the ID itself to drill down into it) and switch the graph line to Quantity or product revenue, does the ID occur on two different dates?
This behaviour is often seen if you forget to prevent the transaction pixel from firing again on things like a page refresh. Another example is if the customer receives an email with "Click here to view your order/transaction" and the pixel fires again on the receipt page.
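If it helps to confirm which IDs are affected in the export itself, a quick check is to group the exported rows by transaction ID and flag any ID that shows up under more than one date. A sketch in Python, assuming rows holds the API rows in the order date, source, medium, transactionId, users:

# Flag transaction IDs that appear on more than one date in the export.
# Assumes rows = [[date, source, medium, transactionId, users], ...]
from collections import defaultdict

dates_by_txn = defaultdict(set)
for date, source, medium, txn_id, users in rows:
    dates_by_txn[txn_id].add(date)

for txn_id, dates in sorted(dates_by_txn.items()):
    if len(dates) > 1:
        print(txn_id, sorted(dates))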

Related

Google Analytics shows delayed data after removed filter

Has anybody ever experienced a delay in data between views on the same property before and after applying a filter in Google Analytics?
Basically I have an unfiltered view (View1) which shows, let's say, 100K users.
I have another view (View2) where I applied a filter, which shows 20K.
The very next day I removed the filter from View2, but when I compare the numbers for day 0, with neither view filtered, the numbers are not the same. View2 only gets updated and matches View1 about 30 hours later.
I wonder if there is a way to completely reset the view, as it seems that it is still filtering and then doing a recomputation in the background.
Thanks,
Removing a filter only affects the data collected after the filter was removed; past data are not changed.
For the current day, filters are reprocessed after midnight. Anything that was filtered out on previous days can no longer be restored.
The root cause of the delay was a Google AdWords account that was linked incorrectly to GA and was causing lag in the data; once it was removed, the data came back to normal.

Core Reporting API v4 sampling limit for click data

After how many clicks will the Core Reporting API start to sample the click data, when using samplingLevel=LARGE?
I'm trying to retrieve data from a large account (i.e. more than 30,000 clicks/day on average) and the number of clicks doesn't always match what I can see on Google Analytics. This, however, seems to happen only on this large account, and not every day. Strangely, on those days where the click count doesn't match, transactions and revenue match what I can see in Google Analytics.
In my query, I'm only trying to retrieve the data for a given account, without applying any filter.
EDIT: If I retrieve data aggregated at the account level, i.e. without ga:adwordsCampaignID, ga:adwordsAdGroupID and ga:adwordsCriteriaID in the dimensions, I can retrieve all the clicks.
EDIT 2: If I add the ga:deviceCategory dimension along with ga:adwordsCampaignID, ga:adwordsAdGroupID and ga:adwordsCriteriaID, I can also retrieve all clicks. I'm not sure if this helps narrow down the issue.
Google Analytics has a cardinality limit of 50K; beyond that you will receive an "(other)" row.
Based on the EDIT, I can safely assume the reason is that when a click doesn't have a ga:adwordsCampaignID associated with it, that click will not be retrieved. This happened to me with custom dimensions.
EDIT: Try using the 'include-empty-rows' parameter on your query. https://developers.google.com/analytics/devguides/reporting/core/v3/reference#includeEmptyRows
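Since the question is about the v4 API, the corresponding setting there is the includeEmptyRows field on each report request. A minimal sketch (key file, view ID and dates are placeholders, and ga:adClicks is assumed to be the click metric in use):

# Reporting API v4 request that keeps empty rows.
# Key file, view ID and dates are placeholders; ga:adClicks is an assumption.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    'key.json', scopes=['https://www.googleapis.com/auth/analytics.readonly'])
analytics = build('analyticsreporting', 'v4', credentials=credentials)

response = analytics.reports().batchGet(body={
    'reportRequests': [{
        'viewId': 'XXXXXXXX',
        'dateRanges': [{'startDate': '2017-01-01', 'endDate': '2017-01-31'}],
        'metrics': [{'expression': 'ga:adClicks'}],
        'dimensions': [
            {'name': 'ga:adwordsCampaignID'},
            {'name': 'ga:adwordsAdGroupID'},
            {'name': 'ga:adwordsCriteriaID'},
        ],
        'samplingLevel': 'LARGE',
        'includeEmptyRows': True,
    }]
}).execute()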

Discrepancies on "active users metric" between Firebase Analytics dashboard and BigQuery export

According to the Firebase Analytics docs (https://support.google.com/firebase/answer/6317517#active-users), the number of active users is the number of unique users who initiated sessions on a given day. Also according to the docs, every time a session starts an event with the name session_start is sent. I am trying to get that metric using the BigQuery export, but my query gives me different results (15,636 on BigQuery, 14,908 on Firebase Analytics).
I have also tried converting to different timezones to see if that might be the issue, but no matter which timezone I try I never get the same (or similar) results
Which query should I run to get the same results I get on Firebase Analytics dashboard for active users?
My query is
SELECT EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id)
FROM table_date_range([XXXXX.app_events_], timestamp('2016-11-26'), timestamp('2016-11-29'))
WHERE DATE(event_dim.timestamp_micros) = '2016-11-27'
AND event_dim.name ='session_start'
Thanks
Update
After #djabi's answer I changed my query to use user_engagement rather than session_start and it works much better now. Still some minor differences though (they range from under ten to under 50 out of 16K, depending on the date).
I have tried once again using different timezones by playing around with DATE(date_add(event_dim.timestamp_micros,1,'hour')) but I never got the exact number I get on Firebase Analytics dashboard.
The new numbers are good enough to be considered statistically acceptable, but wondering if anyone has a suggestion to improve the query and get exact results?
The current query is:
SELECT
COUNT(*) AS active_users
FROM (
SELECT
COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
FROM
TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
WHERE
DATE(event_dim.timestamp_micros) = '2016-11-25'
AND event_dim.name ='user_engagement'
GROUP BY
user_id )
Note: At the moment we are not sending user_id, so the COALESCE will always return the app_instance_id, in case anyone was going to suggest that could be the problem
You need to wait a full 3 days for data from offline devices to be uploaded. Your query correctly filters the events based on the event timestamp, and you pull data from 3 days, but that is only a day and a half counted from today, which is not enough time for all data to be uploaded. Try including 3 days counted from yesterday.
Also try using the user_engagement event instead of session_start. I believe the active user count is based on user_engagement events and not on session_start events.
Also, Firebase reports take a while to process, so you might want to check the reports again the next day.
Firebase reports are done in the account's time zone while events are timestamped in UTC, so a day in Firebase reports is not the same as a UTC calendar day. You want to control for that discrepancy as well to get matching numbers.
By default a session is measured after 10 seconds of user activity in the respective app, which you can change. Try changing the session start threshold to the lowest possible value and you may arrive at a number closer to what you are expecting.
For Android stats I used:
user_dim.device_info.resettable_device_id
instead of
user_dim.app_info.app_instance_id
and it produced better results.

How can I query Google Analytics condition on TWO different dates?

I wish to extract (via the Analytics Core Reporting API) all the transactions made TODAY by users that had a specific ga:eventCategory a few weeks ago.
I'm looking to see the date of a transaction and all the dates of the events that are related to that transaction.
If GA were SQL, I would join on the GA user and take as dimensions both the dates of his transactions and the dates his dimension was updated...
Thanks.
Noam.
As I have indicated in my comment, you can segment the data to include only those users who had the specific event. Segmentation works fine with the Core Reporting API.
Your segment definition would look like this:
users::condition::ga:eventCategory==[myEventCategory]
(where obviously the thing in [brackets] is a placeholder that needs to be substituted with the event category name). The "users::" prefix means you are segmenting by user scope (as opposed to sessions), so this will include all sessions in the selected timeframe for users who had the event in at least one of their sessions (even if the event happened outside the selected timeframe).
Select ga:transactionId as the dimension, some metric (revenue) and today's date, and you are done. Or you would be done if this were actually going to work, but there are at least two caveats:
Google Analytics does not work in real time, so it's unlikely that TODAY's transactions are fully available (Google says it takes 24 hours until the data is processed; it might actually happen faster, but you cannot rely on it).
If a user has deleted their cookie they won't be recognized as a returning user and GA will be unable to pick them up in the segment. The longer the interval between the event and the transaction, the less likely it is that the GA cookie is still present.
So even with a technically correct query it might be that you won't get the data you need.
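For reference, the complete Core Reporting API (v3) call could be sketched like this with the google-api-python-client (key file, view ID and the event category are placeholders):

# Today's transactions, restricted (user scope) to users who had the event.
# Key file, view ID and the event category are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    'key.json', scopes=['https://www.googleapis.com/auth/analytics.readonly'])
service = build('analytics', 'v3', credentials=credentials)

response = service.data().ga().get(
    ids='ga:XXXXXXXX',
    start_date='today',
    end_date='today',
    metrics='ga:transactionRevenue',
    dimensions='ga:transactionId',
    segment='users::condition::ga:eventCategory==myEventCategory',
).execute()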

Google Analytics unique events is incorrect

We have a Google Analytics account set up to track downloads on certain files. When you create a report with, for example, Event Label (user) as the primary field, and Event Action (file name) as the secondary field, GA will say that the number of unique events is 168. When you add up the numbers in the unique events column, however, they add up to 322. Exporting the table as a CSV file and viewing it in Excel will also give you 322.
I should also add that there are 270 rows in the table, so for there to be 168 unique events, that would mean some user/file combinations would have 0 unique events, which doesn't make any sense.
Can anybody shed some light on why this is happening?
There is a lot of confusion around the Unique Events metric. Instead of counting the number of times an event with a unique combination of category/action/label happened, GA was counting unique combinations of every dimension included in the report!
That metric is now deprecated and has been renamed Unique Events (legacy).
Instead we get a real Unique Events (new) metric, which behaves as expected.
More explanation in my blog post
http://www.internetrix.com.au/blog/google-analytics-unique-events-are-dead-long-live-unique-events/
In all Google Analytics custom reports, the Unique Events field actually reports the number of Visits (or sometimes a slightly higher number).
Built-in Google Analytics reports will show you the correct number of Unique Events.
It's a bug, plain and simple. I reported it to Google back in August, but it's still broken.
The number in the Google Analytics standard reports can be explained...but as Aaron pointed out, it sure looks broken. I wrote an article explaining it all:
http://www.analyticsedge.com/2014/09/misunderstood-metrics-unique-events/
