Measurement protocol and two tracking IDs - google-analytics

We're looking to set up Measurement Protocol requests to import non-PII CRM data into Google Analytics. Each request will be tied to a client ID stored in the CRM and will be sent as a non-interaction hit that populates user-scoped custom dimensions.
The CRM data is populated via two portals, one for the global market and one for China, each with its own property (tracking ID).
The problem is that we don't know which portal a given CRM record originated from, and therefore which tracking ID to use in the Measurement Protocol request.
Even with the request being a non-interaction hit, if a request were sent to both tracking IDs, what would happen to the data in a property where the client ID didn't already exist?
Would a new user still be created?
Would it be disregarded?
Are there any ramifications of such actions?

Just to make sure my assumptions were correct, I tested this (i.e. sending a single non-interaction event with a set client ID) against an empty view, and the answers are:
A new user is created.
No.
Yes, there is at least one ramification: GA counted a user but no session (I guess because technically there was no interaction), which looks rather odd in the reports. And of course there is no pageview for that user.
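For reference, the hit described in the question (a Universal Analytics Measurement Protocol event with `ni=1` and user-scoped custom dimensions) could be assembled roughly as below. This is a minimal sketch, assuming the standard UA parameters `v`, `tid`, `cid`, `t`, `ec`, `ea`, `ni` and `cd<index>`; the event category/action values and the helper names are hypothetical. Sending the same payload to both tracking IDs is shown only to make the trade-off concrete: per the test above, the property that has never seen the client ID would record a new user without a session.

```python
from urllib.parse import urlencode

# UA Measurement Protocol endpoint; POST the payload as the request body.
COLLECT_ENDPOINT = "https://www.google-analytics.com/collect"


def build_crm_hit(tracking_id: str, client_id: str, custom_dimensions: dict) -> str:
    """Build a non-interaction event payload carrying CRM custom dimensions."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # UA property, e.g. "UA-XXXXX-Y"
        "cid": client_id,    # client ID previously stored in the CRM
        "t": "event",        # hit type
        "ec": "crm",         # hypothetical event category
        "ea": "import",      # hypothetical event action
        "ni": "1",           # non-interaction hit
    }
    for index, value in custom_dimensions.items():
        # The dimension's scope (user vs. hit) is configured in the property
        # admin, not in the hit itself.
        params[f"cd{index}"] = value
    return urlencode(params)


def build_hits_for_all_properties(tracking_ids, client_id, custom_dimensions):
    """One payload per property; hypothetical helper for the 'send to both' case."""
    return {tid: build_crm_hit(tid, client_id, custom_dimensions) for tid in tracking_ids}


if __name__ == "__main__":
    for tid, payload in build_hits_for_all_properties(
        ["UA-11111-1", "UA-22222-1"], "555.1234", {1: "gold"}
    ).items():
        print(tid, payload)
```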

Related

Google Analytics 4 - Measurement Protocol API used without gtag.js or firebase

Is it possible to use the GA4 Measurement Protocol to send events to Google Analytics, and view and analyze them in the GA dashboard, without using gtag.js or any other front-end script? The use case is that some events are sent to my server, and I would just push these events to GA through the API.
What makes me doubt it is that the official Measurement Protocol documentation says:
In order for an event to be valid, it must have a client_id that has already been used to send an event from gtag.js. You will need to capture this ID client-side and include it in your call to the measurement protocol. In send an event to your property, we use "client_id" as the client_id. You will need to replace this with a real client_id that comes from gtag.js.
(https://developers.google.com/analytics/devguides/collection/protocol/ga4/verify-implementation?client_type=gtag)
That suggests that only events whose client_id originated from gtag.js will be counted.
I experimented with randomly generated client_ids and found that I could see my events in the Realtime section of the GA4 console (the Event count by Event name section), but all the other sections stayed empty and the Users in last 30 min section always showed 0.
Can someone explain why it's zero, and whether such a use case is valid at all? Thanks
tl;dr
You can use any value in client_id, as long as it uniquely identifies the user (we use a GUID/UUID), but it seems like you also need to send a value in user_id. We use the same value for both.
Also, you need to add the 'engagement_time_msec' parameter to get any user metrics to register.
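A request body following the tl;dr above might look like this: the same GUID in `client_id` and `user_id`, plus `engagement_time_msec` in the event params. This is a sketch, not an official recipe; the helper names and the default of 100 ms are assumptions, and the need for `user_id` is the answerer's observation rather than documented behavior. The body is POSTed as JSON to the `/mp/collect` URL, with an API secret created in the data stream settings.

```python
import json
import uuid
from urllib.parse import urlencode

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def mp_collect_url(measurement_id: str, api_secret: str) -> str:
    """GA4 Measurement Protocol URL; both values come from the GA4 admin UI."""
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    return f"{MP_ENDPOINT}?{query}"


def build_ga4_body(visitor_id: str, event_name: str, params: dict,
                   engagement_time_msec: int = 100) -> str:
    """JSON body with the same ID in client_id and user_id, as described above."""
    event_params = dict(params)
    # Without this parameter the answerer saw no user metrics.
    event_params["engagement_time_msec"] = engagement_time_msec
    body = {
        "client_id": visitor_id,
        "user_id": visitor_id,
        "events": [{"name": event_name, "params": event_params}],
    }
    return json.dumps(body)


if __name__ == "__main__":
    visitor = str(uuid.uuid4())  # in practice, read from your own first-party cookie
    print(mp_collect_url("G-XXXXXXXXXX", "YOUR_API_SECRET"))
    print(build_ga4_body(visitor, "server_event", {"source": "backend"}))
```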
Longer answer:
We're trying to do the same, i.e. send all events to the GA4 Measurement Protocol from the server, so that it is not dependent on the current user's GDPR cookie settings.
We currently do this for a Universal Analytics property with no issues, but it seems that Google is trying to prevent this in future, by restricting the scope of the Measurement Protocol in GA4, whilst forcing everyone to move to it by July 1st 2023. See the documentation at https://developers.google.com/analytics/devguides/collection/protocol/ga4#full_server-to-server, where it states:
While it is possible to send events to Google Analytics solely with measurement protocol, only partial reporting may be available. The purpose of measurement protocol is to augment existing events collected via gtag, GTM, or Firebase.
We have something working with GA4, in that the events are being registered on the GA4 property correctly, using a client id that is just a GUID/UUID that we define in our own site cookies. So, any value can be used in the client id, as long as it uniquely identifies the user. The same value is used to populate the user_id parameter.
When we sent events, the realtime event details showed up on the GA4 dashboard, but user metrics did not until we also populated the 'engagement_time_msec' parameter, as described in https://stackoverflow.com/a/71482548/7205473
We still have issues getting the user location and platform details, which were previously populated automatically from the IP address and User Agent we passed, but which no longer seem to work in GA4.
We were also passing page load timing events through the Measurement Protocol, but again, these features seem to have been removed in GA4.
It is possible to use GA4 directly without gtag.js or the Firebase SDK. It's not supported, so it takes some work, but we have this working reasonably well in a desktop app. There are a couple of things that need to be done.
As stated elsewhere, the engagement time ("engagement_time_msec") must be set; on this endpoint it goes in the "_et" parameter. This is the number of milliseconds between now and the previous event.
The client id "cid" has a specific format; it should be:
"randomNumbers(10).unixTimeStamp()"
The session id "sid" format is:
"randomNumbers(10)"
The "_z" parameter needs to be set; I think it is a cache buster. Digging into the gtag.js code, it is a URL-safe base64 encoding of "CCD", which always results in the value "ccd.v9b".
The page hash parameter "_p" can be set as follows; I'm not totally sure it's correct, but it works:
"randomString(3).randomString(3)"
Set the "User-Agent" HTTP request header in whatever framework/lib you are using. GA4 uses this to determine many things including Operating System. You will need to create a fake user agent based on the local device information. This is what we use for a Windows 11 x64:
"myco.testapp/4.0.0 (Windows NT 10.0; Win64; x64)"
The IP address is taken from the web request, and that is where the geolocation data comes from.
Since a full working example is worth 1,000 words of documentation, here is a "test" event with a parameter "animal=dog":
https://www.google-analytics.com/g/collect?cid=0078745494.1659679529&_et=364&_p=pfJ.Aev&seg=1&sid=2678664821&tid=G-???&ul=en&v=2&_z=ccd.v9b&en=test&ep.animal=dog
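The recipe above can be sketched in Python. This only assembles the request; it follows the answerer's reverse-engineered formats for `cid`, `sid`, `_p`, `_z` and `_et`, and since `/g/collect` is undocumented, any of these may stop working. The helper names are hypothetical, and sending is deliberately left out because the answer does not say which HTTP method it uses.

```python
import random
import string
import time
from urllib.parse import urlencode
from urllib.request import Request

COLLECT_URL = "https://www.google-analytics.com/g/collect"


def random_digits(n: int) -> str:
    return "".join(random.choices(string.digits, k=n))


def random_letters(n: int) -> str:
    return "".join(random.choices(string.ascii_letters, k=n))


def make_client_id() -> str:
    # "randomNumbers(10).unixTimeStamp()"
    return f"{random_digits(10)}.{int(time.time())}"


def make_session_id() -> str:
    # "randomNumbers(10)"
    return random_digits(10)


def build_collect_request(measurement_id, client_id, session_id, event_name,
                          event_params, ms_since_last_event, user_agent,
                          language="en"):
    params = {
        "v": "2",
        "tid": measurement_id,
        "cid": client_id,
        "sid": session_id,
        "_p": f"{random_letters(3)}.{random_letters(3)}",  # "randomString(3).randomString(3)"
        "_z": "ccd.v9b",                     # constant value reported above
        "_et": str(ms_since_last_event),     # engagement time in ms
        "seg": "1",
        "ul": language,
        "en": event_name,
    }
    for name, value in event_params.items():
        params[f"ep.{name}"] = value         # custom event parameters
    url = f"{COLLECT_URL}?{urlencode(params)}"
    # GA4 derives OS/platform from the User-Agent header.
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_collect_request("G-XXXXXXXXXX", make_client_id(), make_session_id(),
                                "test", {"animal": "dog"}, 364,
                                "myco.testapp/4.0.0 (Windows NT 10.0; Win64; x64)")
    print(req.full_url)
```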
It's possible to capture the outgoing GA4 requests in a GTM container's debug/preview view and use them to map any GA4 event (automatically collected or custom).
Example page_view request URL:
https://www.google-analytics.com/g/collect?v=2&tid=G-XXXXXXXXXX&gtm=3oes1i1&_p=1545013558&_dbg=1&cid=P%2FdJWyULMwcT21TMrzn7pZdlNt%2FxtttGVqGUmqNYbhc%3D.1669722847&ul=nl-nl&sr=2560x1440&uaa=x86&uab=64&uafvl=Not_A%2520Brand%3B99.0.0.0%7CGoogle%2520Chrome%3B109.0.5414.75%7CChromium%3B109.0.5414.75&uamb=0&uam=&uap=Windows&uapv=10.0.0&uaw=0&_s=1&_uip=XXX.XXX.XXX.X&sid=1674235261&sct=1&dl=https%3A%2F%2FXXXXXXXXXX.com%2F%3Fgtm_debug%3D1674235654105&dr=https%3A%2F%2Ftagassistant.google.com%2F&dt=OM%20test&jscid=XXXXXXXXXX.1669722847&seg=1&en=page_view
Tip: use Postman to analyse and experiment with the parameters.
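As a scriptable complement to Postman, a captured request URL like the page_view example can be split into its parameters with the standard library. A small sketch; `decode_collect_url` is a hypothetical helper name. Note that `parse_qsl` decodes one level of percent-encoding only, so values that were double-encoded in the capture (such as `uafvl`) still contain `%20` afterwards.

```python
from urllib.parse import parse_qsl, urlsplit


def decode_collect_url(url: str) -> dict:
    """Return the query parameters of a captured /g/collect request URL."""
    # keep_blank_values keeps empty parameters such as "uam=" visible
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


if __name__ == "__main__":
    captured = ("https://www.google-analytics.com/g/collect?v=2&tid=G-XXXXXXXXXX"
                "&uam=&dt=OM%20test&en=page_view")
    for name, value in sorted(decode_collect_url(captured).items()):
        print(f"{name} = {value!r}")
```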
Regardless of the platform used to call the Measurement Protocol, you should use a client ID generated by gtag.js, or the app instance ID (app_instance_id) if using Firebase.

How can I get website analytics data of bots and users with no javascript and/or cookies?

Is it possible to get the info about website traffic without Google Analytics, for example?
For tracking without JavaScript you can use the Measurement Protocol, i.e. you send a request to the Google Analytics server with query parameters that specify the type of interaction (pageview, event, etc.) and the data you want to track.
General info is found in the protocol reference, a list of valid parameters is here, and if you want to test your tracking requests you can use the hit builder, which allows you to assemble, validate and send a hit to Google Analytics.
As for tracking without cookies, that probably won't work very well. You need a persistent ID to assemble hits into sessions, and sessions into users. Such an ID is usually stored in a cookie (your "no JavaScript" requirement rules out things like local storage). You can either decorate all your links with a client ID and use it to carry the parameter from page to page, or use some form of server-side browser fingerprinting.
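The link-decoration approach could be sketched server-side as follows: reuse a client ID from the incoming URL, or mint a new UUID, then append it to outgoing links and send a pageview hit. Assumptions: the query parameter name `cid` on your own links is hypothetical, and the hit uses the UA Measurement Protocol parameters `dp` (page path), `ua` (user agent override) and `uip` (IP override), so bot traffic can be attributed to the original client rather than your server.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

LINK_PARAM = "cid"  # hypothetical parameter name used on your own links


def client_id_from_url(url: str) -> str:
    """Reuse the client ID carried in the link, or start a new one."""
    values = parse_qs(urlsplit(url).query).get(LINK_PARAM)
    return values[0] if values else str(uuid.uuid4())


def decorate_link(url: str, client_id: str) -> str:
    """Append the client ID to an outgoing link so the next page can read it."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query[LINK_PARAM] = [client_id]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def pageview_payload(tracking_id, client_id, path, user_agent, ip):
    """UA Measurement Protocol pageview, sent from the server."""
    return urlencode({
        "v": "1", "tid": tracking_id, "cid": client_id, "t": "pageview",
        "dp": path, "ua": user_agent, "uip": ip,
    })


if __name__ == "__main__":
    cid = client_id_from_url("https://example.com/a")
    print(decorate_link("https://example.com/b?x=1", cid))
    print(pageview_payload("UA-12345-1", cid, "/a", "Mozilla/5.0", "203.0.113.7"))
```

The obvious weakness, as the answer notes, is that the ID is lost as soon as a visitor arrives via an undecorated link.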
All in all, this might be less trivial than you assume, especially if you want to track not only pageviews but also events that do not load a new page.

Google UA not connecting event to previous sessions

I am sending an offline event to Google UA using the Measurement Protocol. I am trying to tie it to the user's previous visits to get attribution, using Google's own client ID from its cookie. While the event does appear in Google UA, it is not tied to the other sessions with that client ID.
Here is an example of the API call:
In this example, "1859919454.1455744839" are the X.Y elements parsed from the _ga cookie's client id.
Am I doing something wrong, or making wrong assumptions about Google Analytics accepting its own client ID instead of one I create myself, as suggested in the Measurement Protocol parameter reference? I have seen plenty of forum threads suggesting that Google's own client ID is acceptable.
I checked your API call and you are missing the Measurement Protocol parameter "t" in the URL (https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#t), which defines the type of hit you are sending, e.g. event or pageview.
Google has created a debug tool to check whether the generated URL is valid. You can also send hits to your GA property using the tool.
https://ga-dev-tools.appspot.com/hit-builder/
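Both points in this thread can be checked before a hit is sent: extracting the X.Y client ID from a `_ga` cookie value (format `GA1.2.X.Y`) and making sure the required UA parameters `v`, `tid`, `t`, plus `cid` or `uid`, are present. A minimal sketch with hypothetical helper names; the hit builder above (or the `/debug/collect` endpoint) remains the authoritative validator.

```python
REQUIRED_PARAMS = ("v", "tid", "t")


def client_id_from_ga_cookie(cookie_value: str) -> str:
    """Return the X.Y client ID from a _ga cookie value such as "GA1.2.X.Y"."""
    parts = cookie_value.split(".")
    if len(parts) < 4 or not parts[-2].isdigit() or not parts[-1].isdigit():
        raise ValueError(f"unexpected _ga cookie format: {cookie_value!r}")
    return f"{parts[-2]}.{parts[-1]}"


def missing_required_params(params: dict) -> list:
    """List required UA Measurement Protocol parameters absent from a hit."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if "cid" not in params and "uid" not in params:
        missing.append("cid or uid")
    return missing


if __name__ == "__main__":
    hit = {
        "v": "1",
        "tid": "UA-12345-1",
        "cid": client_id_from_ga_cookie("GA1.2.1859919454.1455744839"),
        "ec": "crm",
        "ea": "offline",
    }
    print(missing_required_params(hit))  # the hit type "t" is missing
```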
It turns out there is an unpublished parameter in the newer UA interface that switches between strict and loose user ID handling. If strictly enforced, the user ID MUST be a UUID; if strict is false, it will accept Google's own ID. Once that parameter was passed, everything worked.

Google Analytics - flagging PII/NPI (personally identifiable information & non-public information)

Can you set up alerts in Google Analytics to flag potential PII/NPI such as name, email address, billing address, billing details etc.? If so, how?
First, I have to say I do not understand the downvote(s). For example, I have seen applications with user logins where the user's full name was part of the page title; combined with time-based dimensions, that produced a profile of which user looked at which page at what time, which would clearly be illegal. Even worse, I have seen a case where security tokens that allowed access to secured resources were transmitted to GA. So the accidental transmission of PII to Google Analytics is clearly a real thing.
Unfortunately, there is not much you can do about it. You can either create a custom report with the relevant dimensions and have it sent to you for a manual audit, or pull the data via the API and examine it programmatically with regular expressions that look for patterns like email addresses. But by the time you can do that, it is already too late: the data is already permanently recorded in the GA property.
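The regular-expression audit could look roughly like this once the dimension values (page paths, titles, etc.) have been pulled via the API; the fetching itself is omitted. The two patterns are illustrative assumptions, not a complete PII detector: an email pattern and a run of 13 to 19 digits (a possible card number). Values are percent-decoded first, because emails in query strings often arrive as `%40`.

```python
import re
from urllib.parse import unquote

PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "long_number": re.compile(r"\b\d{13,19}\b"),
}


def find_pii(values):
    """Return (pattern_label, original_value) for every suspicious value."""
    hits = []
    for value in values:
        decoded = unquote(value)
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(decoded):
                hits.append((label, value))
    return hits


if __name__ == "__main__":
    sample = ["/account?email=jane%40example.com", "/home", "/order/4111111111111111"]
    for label, value in find_pii(sample):
        print(f"{label}: {value}")
```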
You have to stop this before the data is collected: if at all possible already on the website (via form validation etc.), or with Google Tag Manager custom JavaScript variables that apply validation rules, or with filters in the analytics view (the latter being cumbersome and not very promising for this purpose).
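The same "validate before collection" idea applies to any code you control that sends data, for example a server-side Measurement Protocol sender. A minimal redaction sketch, assuming the same illustrative patterns as an audit would use; it returns the percent-decoded value with matches replaced, so it is meant for free-text fields such as page paths, not as a guarantee that no PII gets through.

```python
import re
from urllib.parse import unquote

REDACTION = "[REDACTED]"
PII_PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email addresses
    re.compile(r"\b\d{13,19}\b"),                                   # possible card numbers
]


def redact_pii(value: str) -> str:
    """Percent-decode a value and replace anything matching a PII pattern."""
    decoded = unquote(value)
    for pattern in PII_PATTERNS:
        decoded = pattern.sub(REDACTION, decoded)
    return decoded


if __name__ == "__main__":
    print(redact_pii("/thanks?email=jane%40example.com&x=1"))
```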
The good news is that GA will not suddenly start to track PII on its own, so you only need to check whether your GA account tracks PII when you set it up. Collect a few days of data, validate that everything is okay, make changes as necessary, and once all flaws are straightened out, copy the view to start data collection from scratch and drop the old view if it contains PII.

Universal Analytics clientId vs userId

The docs describe the clientId as:
This anonymously identifies a particular user, device, or browser instance.
https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters#cid
It can be used to send server side hits to analytics while still tying them to a particular user.
There is also a feature in closed beta called userId, which you will be able to pass once a user has authenticated: https://developers.google.com/analytics/devguides/collection/analyticsjs/user-id
userId is fairly self-explanatory. However, UA also allows you to pass your own client ID if you choose to. When developing CRM-type tools, can one just associate the client ID with a user the same way one would with a user ID? The goal is primarily to be able to track offline interactions and connect them with visitors in Analytics.
maembe,
The client ID is a random number generated by Google Analytics. Keep in mind that it is always required and that its value should always be a random UUID (version 4); you could technically use your own, but I am not sure how practical and reliable that would be. Most importantly, you can easily read it with the tracker's predefined get function (see documentation).
For your needs, this is exactly what you should do: when someone signs up, store their client ID in your CRM, and if there is later an offline purchase, record the transaction with the Measurement Protocol using the stored client ID. Google Analytics will then make the link (attribution) with that visitor, and you will see this in your reports. Also, take advantage of the newly available custom metrics and dimensions, which can store pretty much anything you want (think customer segmentation etc.). Beware of storing PII, though.
Hope this helps :)
I am curious how User ID is going to work; it might change everything, but for now I wouldn't rely on it, as there is very little information available.
This Analytics support page now states the differences between Client ID and User ID - https://support.google.com/analytics/answer/6205850?hl=en#clientid-userid
Essentially client IDs represent unauthenticated users, and are automatically randomly generated.
User IDs represent authenticated users, and must be set manually.
It's worth noting that user IDs cannot be things like an email address or other data that would allow Google to identify the user:
You will not upload any data that allows Google to personally identify an individual (such as certain names, Social Security Numbers, email addresses, or any similar data), or data that permanently identifies a particular device (such as a unique device identifier if such an identifier cannot be reset).
If you upload any data that allows Google to personally identify an individual, your Google Analytics account can be terminated, and you may lose your Google Analytics data.
Taken from: https://developers.google.com/analytics/devguides/collection/protocol/policy
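In a server-side sender, the split between the two IDs could be kept explicit: always send the client ID, and add a user ID (UA parameter `uid`) only for authenticated users, refusing anything that looks like an email address. A sketch under those assumptions; the helper name and the email check are hypothetical, and a simple pattern check is no substitute for using an opaque internal account ID in the first place.

```python
import re
from typing import Optional

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def identity_params(client_id: str, user_id: Optional[str] = None) -> dict:
    """Build the identity part of a UA Measurement Protocol hit."""
    params = {"cid": client_id}  # unauthenticated identifier, always sent here
    if user_id is not None:
        if EMAIL_RE.fullmatch(user_id):
            raise ValueError("user_id looks like an email address; use an opaque internal ID")
        params["uid"] = user_id  # authenticated identifier, set manually
    return params


if __name__ == "__main__":
    print(identity_params("555.1234"))
    print(identity_params("555.1234", "acct-42"))
```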
I'd imagine User ID is designed to differentiate the behavior of authenticated users.

Resources