Missing significant number of transactions when using measurement protocol and non-interactive - google-analytics

Using google analytics and it's measurement protocol, I am trying to track eCommerce transactions based on my customers (who aren't end-consumers meaning not sparse unique userid's, locations, etc...) which have a semantic idea of a "sale" with revenue.
The problem is that not all of my logged requests to the ga mp API are resulting in "rows" of transactions when looking at conversions->ecommerce->transactions. And additionally, the revenue reported is respectively missing too. An example of the discrepancy is listing all my non-zero transaction revenue API calls, I should see 321 transactions in the analytics dashboard. However, I see only 106... 30%!!! This is about the same every day even tweaking some attributes which I would think would force uniqueness of a session or transaction.
A semantic difference is that a unique consumer (cid or uid) can send a "t=transaction" with a unique "ti" (transaction id) which overlap and are not serial. I say this to suggest that maybe there is some session related deduplication happening even though my "ti" attribute is definitely unique across my notion of a "transaction". In other words, a particular cid/uid maybe have many different ti's in the same minute.
I have no google analytics javascript or client-side components in use and are simply not applicable to how I need to use google analytics which takes me to using the measurement protocol.
Using the hit-builder, /debug/collect, and logging of any http non-200 responses, I see absolutely no indication that all of my "t=transaction" messages would not be received and processed. Some of the typical debugging points I think are eliminated with this list of what I have tried
sent message via /collect
sent multiple message via /batch (t=transaction and t=item)
sent my UUID of my consumer as cid=, uid= and both
tried with and without "sc=start" to ensure there was no session deduplication of a transaction
tried with and without ua (user-agent) and uip (ip override) since it's server side but hits from consumers do come from different originations sometimes
took into consideration my timezone (UTC-8) and how my server logs these requests (UTC)
waited 24 to 48 hours to ensure data
ecommerce is turned on for my view
amount of calls to measurement protocol are < 10000 per day so I don't think I am hitting any limits
I have t=event messages too although I am taking a step back from using them for now until I can see that data is represented at least to 90%+.
Here is an example t=transaction call.
curl \
--verbose \
--request POST \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data 'ta=customer1&t=transaction&sc=start&v=1&cid=4b014cff-ccce-44c2-bfb8-e0f05fc7827c&tr=0.0&uid=4b014cff-ccce-44c2-bfb8-e0f05fc7827c&tid=UA-xxxxxxxxx-1&ti=5ef618370b01009807f780c5' \
'https://www.google-analytics.com/collect'

You've done a very good job debugging so unfortunately there isn't much left to do, a few things left to check:
View bot/spider filter: disable this option to be on the safe sife
500 hits / session: if you're sending lots of hits for the same cid/uid within 30 minutes (whatever your session-timeout is), then these would be recorded as per of the same session and thus you could reach quota limit.
10M hits / property / month: you didn't mention overall properly volume so I'm mentioning this in case
Paylod limit of 1KB = 8192 bytes: I've seen people running into that issue when tracking transactions with a crazy amount of products attached to it
Other view filters: same thing, you didn't mention so I'm mentioning just in case
Further debugging could include:
Using events instead of transactions: the problem with transactions is that they're a black box, if they don't show up you don't have any debug. I personally always track my transactions via events and I set a copy of the Ecommerce payload as event label (JSON string) for debugging, so if the event is present I know it's not a data ingestion issue but most likely my ecommerce payload which is malformed (and I have the event label to debug it), and if the event is missing then it's a data ingestion problem. See below example, replace UA-XXXXXXX-1 with your own:
v=1&t=event&tid=UA-XXXXXXX-1&cid=1373730658.1593409595&ec=Ecommerce&ea=Purchase&ti=T12345&ta=Online%20Store&tr=15.25&tt=0.00&ts=0.00&tcc=SUMMER_SALE&pa=purchase&pr1nm=Triblend%20Android%20T-Shirt&pr1id=12345&pr1pr=15.25&pr1br=Google&pr1ca=Apparel&pr1va=Gray&pr1qt=1&pr1cc=&el=%7B%22purchase%22%3A%7B%22actionField%22%3A%7B%22id%22%3A%22T12345%22%2C%22affiliation%22%3A%22Online%20Store%22%2C%22revenue%22%3A%2215.25%22%2C%22tax%22%3A%220.00%22%2C%22shipping%22%3A%220.00%22%2C%22coupon%22%3A%22SUMMER_SALE%22%7D%2C%22products%22%3A%5B%7B%22name%22%3A%22Triblend%20Android%20T-Shirt%22%2C%22id%22%3A%2212345%22%2C%22price%22%3A%2215.25%22%2C%22brand%22%3A%22Google%22%2C%22category%22%3A%22Apparel%22%2C%22variant%22%3A%22Gray%22%2C%22quantity%22%3A1%2C%22coupon%22%3A%22%22%7D%5D%7D%7D
Using a data collection platform like Segment: which will give you an extra level of debugging via their debugger (although their payload syntax is != than GA so that introduces another level of complexity, it does help me from time to time to spot issues with the underlying data though as I'm familiar with its syntax).

Related

Can the Google Calendar API events watch be used without risking to exceed the usage quotas?

I am using the Google Calendar API to preprocess events that are being added (adjust their content depending on certain values they may contain). This means that theoretically I need to update any number of events at any given time, depending on how many are created.
The Google Calendar API has usage quotas, especially one stating a maximum of 500 operations per 100 seconds.
To tackle this I am using a time-based trigger (every 2 minutes) that does up to 500 operations (and only updates sync tokens when all events are processed). The downside of this approach is that I have to run a check every 2 minutes, whether or not anything has actually changed.
I would like to replace the time-based trigger with a watch. I'm not sure though if there is any way to limit the amount of watch calls so that I can ensure the 100 seconds quota is not exceeded.
My research so far shows me that it cannot be done. I'm hoping I'm wrong. Any ideas on how this can be solved?
AFAIK, that is one of the best practice suggested by Google. Using watch and push notification allows you to eliminate the extra network and compute costs involved with polling resources to determine if they have changed. Here are some tips to best manage working within the quota from this blog:
Use push notifications instead of polling.
If you cannot avoid polling, make sure you only poll when necessary (for example poll very seldomly at night).
Use incremental synchronization with sync tokens for all collections instead of repeatedly retrieving all the entries.
Increase page size to retrieve more data at once by using the maxResults parameter.
Update events when they change, avoid re-creating all the events on every sync.
Use exponential backoff for error retries.
Also, if you cannot avoid exceeding to your current limit. You can always request for additional quota.

Different Ways to Call Google Analytics from Server Side?

Currently, I am using Measurement Protocol to push the data to GA. The problem is I didn't get any response back for Success or Error on Production, If yes Please suggest?
Due to this, I am looking if there is any other options available for the same like can we achieve it using analytics 360?
The google analytics production data collection endpoint does not return a request status back (always 200 OK) by design to ensure ultra-light processing speed.
What I usually recommend to clients using Measurement Protocol server-side is to
To log a reasonable amount (or all of them) of requests somewhere. Storage is extremely cheap nowadays and knowing the data format if an emergency happens you will be able to manually extract the data
Every once in a while (one on thousand or one on a million or even more oftne depending on the importance of the data randomly) validate request on GA debug endpoint and parse the returned json. If there are any warnings or error send a notification for further investigation. This way, if anything went wrong you will be on top of the problem by the time BI & Marketing Team would be affected

What is the rate limit for direct use of the Google Analytics Measurement Protocol API?

In the Documentation for Google Analytics Collection Limits and Quotas
It gives the rate limits that are implemented by the various Google-provided libraries. I can't seem to find a published rate limit for users that are POSTing directly to measurement protocol (https://www.google-analytics.com/collect).
Is there one and if so what is it?
Edit on 10 July 2015 -
A few commenters asked for an example of the kind of data I am sending in.
Using a series of calls to wget with a sleep of one second between each call.
Here is an example with the app name and tracking code removed:
wget -nv --post-data 'ul=en&qt=7150000&av=0.0.1&ea=PLET&v=1&tid=<my_tracking_code>&ec=Move+to+Object&cid=1434738538-738-654031&an=<my_app_name>&t=event' -O /dev/null 'https://www.google-analytics.com/collect'
I've tried sending these queries to the /debug endpoint and all of them are valid. My first upload worked as expected and reports looked good. Subsequent uploads of the same data set to different GA properties have had mixed results. Sometimes no data appears in reports. Sometimes partial data appears in reports. During upload, realtime reports always show activity, though.
Directly from the documentation Google Analytics Collection Limits and Quotas
These limits apply to the Web Property / Property / Tracking ID.
10 million hits per month per property
Measurement protocol
Universal Analytics Enabled
This applies to analytics.js, Android iOS SDK, and the Measurement
Protocol.
200,000 hits per user per day 500 hits per session not including
ecommerce (item and transaction hit types). If you go over either of
these limits, additional hits will not be processed for that session /
day, respectively. These limits apply to Premium as well.
Now I agree it doesn't specifically state the per second it rate for measurement protocol but the above one dumped Measurement in with analytics.js so I think we can assume its
analytics.js:
Each analytics.js tracker object starts with 20 hits that are
replenished at a rate of 2 hit per second. Applies to all hits except
for ecommerce (item or transaction).
But just to make sure I am sending an email off to the development team they should make it more clear where the per second rate of the measurement protocol lies. I will repost here when I hear from them
Response from Google
The Measurement Protocol does not do any kind of rate limiting or
quota-ing by IP address or tracking ID or anything like that. However,
most of the client libraries do rate limit in some form or another.
As Linda points out in her answer, there are various limits and quotas
imposed by the back end, but those are done at processing time, not
collection time.
Conclusion
There is no limit to sending data through the measurement protocol. But when the data is processed limit may be applied. I think they may be referring to the max 2 million hits a month. It seems it's the libraries that apply limits on how fast you can send data not the measurement protocol directly.
Last Update: Please watch this video which explains all GA quotas policies:
https://youtu.be/1UfER93ALxo
In particular, your issue might be result of 10 requests / 1 second limitation:
https://youtu.be/1UfER93ALxo?t=5m27s
I can confirm the same thing. In my case I had own buildHitTask which constructs URL for a measurement protocol request (MPR) and stores it in the hitPayload field. But instead of original GA reporting - I was saving those URLs into cookies for delayed reporting.
In my experiment, only 10-20% of 2,000 measurement protocol requests were actually "stored".
Rest of hits are not available in GA Reporting UI, neither API or BigQuery. Each request was sent with 2 seconds delay via new Image() method, and slowdown in case of errors. Received results are not consistent. Both success and failed hits are randomly distributed across whole time period.
Please let me know in case if you find more details on this constraint!

Google measurement protocol transaction passing

I am making such request to Google Measurement API:
curl --request POST 'http://www.google-analytics.com/collect' --data "v=1&tid=UA-58879752-1&cid=1526087851.1425399573&t=transaction&dh=somedomain.com&ti=1&ta=clothing&tr=17.98&ts=2.0&tt=2.5&cu=EUR"
it returns me 1x1 GIF file:
GIF89a�����,D;%
But I don't see anything in transactions list:
It takes 24 - 48 hours for data to appear in the standard reports, this is because Google needs time to process things.
You can check real-time reports but only send basic data. This will tell you if your hits are in fact being recorded.
You will have to wait until tomorrow to see if the transaction data is being recorded that isn't available though the Real-time reports.

What is the X-REQUEST-ID http header?

I have already googled a lot this subject, read various articles about this header, its use in Heroku, and projects based on Django.
However, it's still all confused in my head.
What is the purpose of this header?
Does it violate user privacy?
Can it help tracking a user?
When you're operating a webservice that is accessed by clients, it might be difficult to correlate requests (that a client can see) with server logs (that the server can see).
The idea of the X-Request-ID is that a client can create some random ID and pass it to the server. The server then include that ID in every log statement that it creates. If a client receives an error it can include the ID in a bug report, allowing the server operator to look up the corresponding log statements (without having to rely on timestamps, IPs, etc).
As this ID is generated (randomly) by the client it does not contain any sensitive information, and should thus not violate the user's privacy. As a unique ID is created per request it does also not help with tracking users.
Purpose: Idempotency
With an ID that changes for every request, but stays the same in case of a retry of a request, the receiver can ensure the request won't get processed more than once.
This is a quote from some API provider:
All POST, PUT, and PATCH HTTP requests should contain a unique
X-Request-Id header which is used to ensure idempotent message
processing in case of a retry
If you make it a random string, unique per request, it won't infringe on your privacy, nor enable tracking.
If you want to know more of what idempotency has to offer, read this insightful article.
N.B. As Stefan Kögl comments, this header is not standardized - hence the (deprecated) "X-" prefix.
Explanation using a story/analogy
You can think of X-Request-ID like your driver's license (some type of ID card).
Imagine visiting the DMV:
You present your ID card to gain admission, and then you
Stand in line, for 16 hours,
after 16 hours - the DMV tells you to go home. i.e. your request timed out. The petty tyrants at the DMV don't work a second past 4:31 pm.
An entire day wasted - you complain to the congressman - hey: I waited in line for 16 hours etc. The congressman replies:
"Buddy, we get 1000s of people visiting the DMV everyday - When I look through the DMV records, how am I meant to identify you - when you came etc.?
That's where the X-Request-ID comes in.
Application of story to HTTP
The same applies to http requests - it's an id used to help back end devs find out what went wrong. Clients submit requests with that id - and it's a ID that they create (i.e. some random number etc.). Now servers can keep track of it.
Story given to help you remember. Hopefully you're not confused further - post a comment if I have and i'll try to clear it up. thx.
This request header can be used for syncrhonization. Let's say you've built a ToDo list that offers offline capability. Your user creates 3 items and each of them are given a unique UUID on the offline application. When network connectivity is available, the records are POSTed to the server and the corresponding IDs auto-generated from the database are returned. You can then replace the IDs in your app (e.g. "id" attribute of HTML "li" element).

Resources