I'm using Measurement Protocol (for Google Analytics 4) and I'd like to send multiple events in batches to send fewer overall requests, but I'm not sure how to specify the timestamp of each individual event.
I'm sending an HTTP POST request to the Measurement Protocol endpoint. The documentation states that the POST body has a timestamp_micros field, which is used for all events in the request unless an event overrides it: "This value can be overridden via user_property or event timestamps".
I could not find a reference for the structure of an event item though, so I'm not sure how to override it. A key name of timestamp would make sense, but the post body itself uses timestamp_micros so maybe events follow that pattern? Or do events even use microseconds?
It's not easy to test because raw metadata such as timestamps isn't available in GA.
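For concreteness, the request body in question could be assembled roughly as follows. This is a sketch only: the body-level timestamp_micros field is the one quoted from the documentation above, while putting a timestamp_micros key on each event is an assumption that should be checked against the current reference. The helper names to_micros and build_batch are hypothetical.

```python
import json
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment):
    """Convert a timezone-aware datetime to Unix time in microseconds."""
    delta = moment - EPOCH
    return delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds


def build_batch(client_id, events, default_time):
    """Build one POST body holding several events.

    `events` is a list of (name, params, occurred_at) tuples; occurred_at
    may be None to fall back to the body-level timestamp.
    """
    body = {
        "client_id": client_id,
        # Body-level field from the docs: applies to every event in the request.
        "timestamp_micros": to_micros(default_time),
        "events": [],
    }
    for name, params, occurred_at in events:
        event = {"name": name, "params": params}
        if occurred_at is not None:
            # ASSUMPTION: the per-event override uses the same key name.
            event["timestamp_micros"] = to_micros(occurred_at)
        body["events"].append(event)
    return body


if __name__ == "__main__":
    sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    earlier = datetime(2024, 1, 1, 11, 59, 30, tzinfo=timezone.utc)
    batch = build_batch(
        "1234567890.1234567890",
        [("page_view", {}, earlier), ("purchase", {"value": 10}, None)],
        sent,
    )
    print(json.dumps(batch, indent=2))
```

Using integer arithmetic on the timedelta avoids the rounding that multiplying a float timestamp by one million can introduce.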
I am using Google Analytics 4 (GA4) on the client to track a whole bunch of different events. However, there are 2 scenarios that I can't cover client side:
1. A user completing checkout on a payment page hosted by a third party (Stripe in this case).
2. A refund that is made by the support team.
These events are handled by the server using webhooks. To me, the most straightforward solution would be to let the server send the event to GA4 (as opposed to the client sending it). I believe the Measurement Protocol should be used for this.
For each event submitted through the Measurement Protocol a client_id is required. When the client is submitting an event, this is an automatically generated ID which is used to track a particular device.
My question thus is, what should the client_id be when submitting an event server-side?
Should the same client_id perhaps be used for all events, so that the server is recognized as one device? I have read some people proposing a randomly generated client_id for each event, but this would result in a new user being recognized for every server-side event...
EDIT:
One of the answers proposes to use the client_id, which is part of the request as a cookie. However, for both examples given above, this cookie is not present as the request is made by a third-party webhook and not by the user.
I could of course store the client_id in the DB, but the refund in the second example is given by the support team. Conceptually, it feels odd to associate that event with the user's client_id, since the client_id is just a way to recognize the user's device. I.e. it is not the user's device that triggered the refund event here.
Another refund event example would be when user A makes a purchase with user B and user B refunds this purchase a week later. In this situation, should the client_id be the one of user A or of user B? Again, it feels odd to use a stored client_id here. Because, what if user A is logged in on two devices? Which client_id should be used here then?
Great question. Yes, using the Measurement Protocol is the proper solution here.
Do not hardcode the client id. It's gonna be a hellish mess in reports. User-based reporting (which GA is) needs client ids to uniquely identify users, to the best of your ability.
GA stores the client id in a cookie. You should have convenient and immediate access to it on every client hit to the BE. The cookie name is _ga. GA4 additionally sets a cookie with an ID appended to that name, _ga_<container-id>, which holds session state rather than the client id. Here are Google's docs on it: https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage But you can easily find it if you inspect "collect" hits and look at their payloads. There's another cookie named _gid that contains a different value. It also distinguishes users, but it is short-lived. Set it too if you can, but don't use it as the normal client id. It has a different purpose. Here's how the cookie looks here, on Stack:
And here it is in Network. You will need it for proper debugging, mostly to make sure your FE client ids are the same as your BE client ids:
Keep an eye on the cases when the cookie is not set. When a cookie is not set, that most frequently means the user is using an ad-blocker. Your analysts will still want to know that the transaction happened even if there's a lack of context about the user. You still can track them properly.
3.1 The laziest solution would be giving them an "AnonymousUser" client id and then appending a random number to it, so that it both indicates that a user is anonymous and still makes it possible for GA to separate them.
3.2 A better solution would be to make a fingerprint client id for such users, say, by hashing a concatenated string of their user agent + IP + locale + screen resolution. It is up to your analysts to work out the definition of a unique user when the Google Analytics library is unable to do it.
3.3 Finally, one of the best solutions would be generating a client id on your own, keeping GA's format and maybe adding an indicator that it was generated on your end (just for easier debugging in the future), then setting it as a cookie and using it instead of _ga. Just use a different cookie name so that ad-blockers wouldn't know to block it.
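A rough sketch of how the cookie lookup and the fallbacks from 3.2 and 3.3 could fit together on the backend. It assumes the usual _ga value shape of "GA1.2.<random>.<timestamp>", where the last two fields form the client id. The function names and the "fp." prefix are hypothetical, and the exact inputs to the fingerprint are something to agree with your analysts.

```python
import hashlib
import random
import time


def client_id_from_ga_cookie(cookie_value):
    """Return the client id held in a _ga cookie value, or None.

    Assumes a value such as "GA1.2.1234567890.1609459200"; the client id
    is taken to be the last two dot-separated fields.
    """
    if not cookie_value:
        return None
    parts = cookie_value.split(".")
    if len(parts) < 4:
        return None
    return ".".join(parts[-2:])


def fingerprint_client_id(user_agent, ip, locale, resolution):
    """Option 3.2: hash a few request attributes into a stable id."""
    raw = "|".join([user_agent, ip, locale, resolution])
    # "fp." is a hypothetical marker showing the id was derived server-side.
    return "fp." + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


def generated_client_id(now=None):
    """Option 3.3: mimic the "<random>.<unix seconds>" shape of GA's ids."""
    seconds = int(time.time()) if now is None else int(now)
    return f"{random.randint(1, 2**31 - 1)}.{seconds}"


def resolve_client_id(cookies, user_agent, ip, locale, resolution):
    """Prefer the _ga cookie; fall back to a fingerprint when it is missing."""
    from_cookie = client_id_from_ga_cookie(cookies.get("_ga"))
    if from_cookie is not None:
        return from_cookie
    return fingerprint_client_id(user_agent, ip, locale, resolution)


if __name__ == "__main__":
    print(resolve_client_id({"_ga": "GA1.2.1234567890.1609459200"},
                            "Mozilla/5.0", "203.0.113.7", "en-US", "1920x1080"))
    print(resolve_client_id({}, "Mozilla/5.0", "203.0.113.7", "en-US", "1920x1080"))
```

A value from generated_client_id would be stored in your own first-party cookie on the first request and read back on later ones, which is why it is not called inside resolve_client_id.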
If you want to indicate that a hit was sent through the server, that's a good idea. Use a custom dimension for that. Just sync it with your analysts first. Maybe they wouldn't want that, or maybe they would want it in a different dimension.
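As a sketch, a server-side GA4 Measurement Protocol request carrying such a marker might be built like this. The endpoint and the measurement_id / api_secret query parameters are the standard ones for GA4; the parameter name hit_source is hypothetical and would need to be registered as a custom dimension in GA before it shows up in reports. The function only builds the request and does not send it.

```python
import json
from urllib.parse import urlencode

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_server_event(measurement_id, api_secret, client_id, name, params):
    """Return (url, json_body) for a GA4 Measurement Protocol POST."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    event_params = dict(params)  # copy so the caller's dict is left untouched
    # HYPOTHETICAL parameter name marking the hit as server-originated.
    event_params["hit_source"] = "server"
    body = {
        "client_id": client_id,
        "events": [{"name": name, "params": event_params}],
    }
    return f"{MP_ENDPOINT}?{query}", json.dumps(body)


if __name__ == "__main__":
    url, body = build_server_event(
        "G-XXXXXXX", "your-api-secret", "1234567890.1609459200",
        "refund", {"transaction_id": "T-1001"},
    )
    print(url)
    print(body)
```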
Now, this is very trivial. There are ways to go much deeper and to improve the quality of data from here, like gluing the order id, the transaction id, and the user id to that, using them to generate the client id, or doing some custom client tracking for the future. But I must say that it's better than what more than 90% of, say, Shopify clients have.
Also, GA4 is not good enough for deeper production usage. Many things there are still very rudimentary and lacking. I would suggest concentrating on Universal Analytics and having GA4 as a backup for when Google makes GA4 actually good enough to replace UA. That is, unless you're downloading your data elsewhere and not using GA's interface for analysis.
It seems that this page (relevant portion in the screenshot below) advises sending the data along with either the client_id or the user_id. However, it fails to address the fact that client_id is a mandatory field, as stated here.
I believe it is probably safe to assume that randomly generating this field should work. At least it seems to on my end; however, be warned that I am unsure whether this has any impact on attribution.
* In the above image, Device ID refers to client_id
I have traffic coming from Salesloft emails sent by sales reps that gets bounced off a subdomain and then has a sbrc parameter appended to the URL. I'd like to ensure that this traffic gets counted by Google Analytics as coming from Email, whereas it's currently falling under the "Direct" bucket. Ideally, users would also be appending utm parameters to their links, but this isn't happening consistently.
I tried creating a filter to search and replace the following regex (?:^|\?|&)(sbrc=[^&]*&?) with ?utm_medium=email&utm_source=salesloft in the Request URI. This changed the displayed URL when looking at my realtime traffic, but did not change how GA categorized the channel of the traffic (still direct).
I then tried editing the default channel groupings for Email to be the system categorized OR Landing Page URL contains "sbrc" and dragged Email to the first channel grouping at the top. This doesn't seem to have done anything at all.
How can I make GA recognize this custom parameter as being attributable to the Email channel?
You could use advanced filters to solve this task. The Advanced filter lets you construct Fields for reporting from one or two existing Fields.
In your case, advanced filters let you assign values to campaign source and campaign medium, based on the content of the request URL. This is a sample setup for the source field, and you need a second similar filter for medium as well:
"Ideally, users would also be appending utm parameters to their links, but this isn't happening consistently."
If they add their own and use a utm_medium other than utm_medium=Email, then the traffic will not be assigned to the default Email channel.
Filters are not retroactive.
Filters also run after data has already been processed by Google, so they only change how it displays as it gets sent back into the GA reports.
A Search and Replace Filter applied to the Request URI for ?utm_medium=email&utm_source=salesloft is not going to have the result you want.
It will change the appearance of the Request URI in reports, not actually change how that traffic is attributed.
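To make that concrete, here is a small sketch of what the search-and-replace from the question does to a URI string. Python's re module stands in for GA's filter engine here, which is an assumption; the pattern and replacement are copied from the question, and the function name is hypothetical.

```python
import re

# Pattern and replacement exactly as given in the question.
SBRC_PATTERN = re.compile(r"(?:^|\?|&)(sbrc=[^&]*&?)")
REPLACEMENT = "?utm_medium=email&utm_source=salesloft"


def rewrite_request_uri(uri):
    """Apply the search-and-replace to a Request URI string."""
    return SBRC_PATTERN.sub(REPLACEMENT, uri)


if __name__ == "__main__":
    for uri in ["/demo?sbrc=abc123", "/pricing?plan=pro"]:
        print(uri, "->", rewrite_request_uri(uri))
```

The output is just a different string in the page dimension. Nothing in this step feeds the source or medium fields, which is why the channel stays Direct.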
Changes to the default Channel grouping are not retroactive; they should take effect going forward, though.
To see the changes in historical data for the Channels, you would do best to create a custom Channel grouping at the View level. This can then be applied retroactively to the historic data.
I need to track events in Google Analytics from a server through the Measurement Protocol. I can do this just fine, but my problem is that I want to send additional/custom data along with the event. Specifically, I want to send a UUID along with the event so that it is possible for me to fetch data from the Google Analytics API in the future and correlate events with rows in a relational database.
Is there any decent way to send custom data along with events? I looked at using the event value, but it must be an integer, and it is not intended for things like this. The event category, action, and label are reserved for other purposes.
I am not that proficient in Google Analytics, so the solutions off the top of my head would be:
1. Send an additional event containing the UUID in the event label or something like that. Seems like a bit of a hack/workaround to send two events, with one being used exclusively behind the scenes.
2. Perhaps using a custom dimension or metric. I am not 100% sure about the implications of this and if that's a decent approach or not.
So basically my question is: what would be the best way for me to send a UUID along with a Google Analytics event from a server, taking into consideration that I cannot use the event category, action, and label for the current event? Is there any other way in which I could link events retrieved from the Google Analytics API to rows in a database?
So let's say I trigger a "Completed Order" event to GA, and I also have orders in a MySQL database. So what I want to do, is to link an event to an order row in the database.
There are several things that you can do and it pretty much depends on what you want to do with the information you are storing. For starters, all your requests should include the uid field with the value being the user ID within your system. This way all Google Analytics data will be calculated on the same user. Note: this is an internal value used within Google Analytics and you won't be able to see it.
Second, I would create a custom dimension of the name user_id and store the user information in that. You will then be able to use this information within your reports to see what each user is doing. Note: it's against TOS to send user name, email, or any other PII (personally identifiable information) to Google Analytics. But you can send your internal user ID.
I have done both of these in the past and found it to work quite well.
More info on User-ID.
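A minimal sketch of what such a hit could look like when built on the server. The parameter names (v, tid, cid, uid, t, ec, ea, cd<index>) are the Universal Analytics Measurement Protocol ones; the dimension indexes are assumptions and must match whatever is configured in the property, and build_event_hit is a hypothetical helper. An order UUID could travel in a second custom dimension in the same way.

```python
from urllib.parse import parse_qs, urlencode

# Universal Analytics Measurement Protocol collection endpoint.
UA_ENDPOINT = "https://www.google-analytics.com/collect"


def build_event_hit(tracking_id, client_id, user_id, category, action,
                    custom_dimensions=None):
    """Return the form-encoded body for one event hit.

    custom_dimensions maps a dimension index (as configured in the GA
    property) to its value, e.g. {1: "<internal user id>"}.
    """
    fields = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property id, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client id
        "uid": user_id,      # internal user id, not visible in reports
        "t": "event",
        "ec": category,
        "ea": action,
    }
    for index, value in (custom_dimensions or {}).items():
        fields[f"cd{index}"] = value
    return urlencode(fields)


if __name__ == "__main__":
    body = build_event_hit(
        "UA-XXXXX-Y", "555", "user-42", "Orders", "Completed Order",
        # Index 1 and 2 are placeholders for your own dimension slots.
        custom_dimensions={1: "user-42", 2: "example-order-uuid"},
    )
    print(UA_ENDPOINT)
    print(parse_qs(body))
```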
I have a single site with analytics.js loaded on it. If JS is disabled (usually if the visitor is a bot or scraper), I have a mechanism to fall back to using server-side Measurement Protocol calls so that the data is still recorded.
Is there a way I can segment the results to show only the data recorded with analytics.js? Would using appId creatively (ga('set', 'appId', 'analytics.js');) or a custom variable when recording the data be the right way to do this?
This is fairly straightforward if you manipulate the Data Source field in your measurement protocol hits.
First, when sending measurement protocol hits, set the ds field to server-side (or a similar name of your choosing).
Then, in Google Analytics you can create a segment with the following conditions:
Dimension: Data Source
Operator: exactly matches
Value: server-side (or whatever value you are sending via the ds field)
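For illustration, the server-side fallback hit might be built along these lines. The ds parameter is the Measurement Protocol's Data Source field; the function name and the default value "server-side" are just the placeholders used above.

```python
from urllib.parse import parse_qs, urlencode


def build_fallback_pageview(tracking_id, client_id, page_path,
                            data_source="server-side"):
    """Return the form-encoded body for a pageview tagged with a data source."""
    return urlencode({
        "v": "1",           # protocol version
        "tid": tracking_id,
        "cid": client_id,
        "t": "pageview",
        "dp": page_path,    # document path
        "ds": data_source,  # value the segment condition matches on
    })


if __name__ == "__main__":
    body = build_fallback_pageview("UA-XXXXX-Y", "555", "/home")
    print(parse_qs(body))
```

A segment that excludes this Data Source value would then leave only the hits recorded by analytics.js.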
I wonder if I need to send all parameters with each hit, or if some of the parameters are 'cached' in the current session. I.e. do I require to send the resolution, view port, etc. with each hit, or is it enough to send those once per session?
I can't find any source that confirms what the behaviour is.
Thanks.
It depends on which specific parameters you mean, but the fields you specify when you create your tracker apply to all hits for that tracker. See the GA documentation section called Specifying fields at creation time. More specifically, the send documentation states (emphasis mine):
The fields that are sent are the values specified in the ...fields parameters and fieldsObject, merged with the fields currently stored on the tracker.
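The merge described in that quote can be pictured with a toy model. This is not analytics.js itself, only an illustration of the semantics; it assumes that fields passed with a single send take precedence for that hit and are not written back to the tracker.

```python
def fields_for_hit(tracker_fields, fields_object=None):
    """Toy model: merge per-hit fields over the fields stored on the tracker."""
    merged = dict(tracker_fields)        # start from what the tracker remembers
    merged.update(fields_object or {})   # per-hit values win for this hit only
    return merged


if __name__ == "__main__":
    # Fields set once, e.g. at tracker creation or via a set call.
    tracker = {"screenResolution": "1920x1080", "viewportSize": "1200x800"}

    # A later hit only needs to name what is specific to it.
    hit = fields_for_hit(tracker, {"hitType": "event", "eventCategory": "Video"})
    print(hit)
```

In this model the resolution and viewport ride along on every hit without being passed again, which matches the behaviour the documentation describes for fields stored on the tracker.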