Is there any non-valid contents for a referer field? - google-analytics

I'd like to know the identities of users who come to my website by clicking on a link in my native application.
Could my app populate the referer field with a unique user ID?
Does the referer field have to be a valid URL?
Will Google Analytics handle a referer that isn't a URL?
I could mock up a URL like http://www.user.com/656 where 656 is the user ID.
Any caveats?

Good question. On one hand, I have remapped quite a few of these GA fields (including Referral) and have never had a problem. In addition, doing so is core functionality in GA: as I am sure you know, GA actually provides its users with the templates (via Advanced Filters) to munge these fields beyond recognition.
On the other hand, I can think of one situation (only one, actually) in which I believe Google would object, and that's the case where the replacement information violates the GA Privacy Policy. In particular, displaying or storing or tracking via GA any personally identifiable information violates the Policy [italics are mine, not a quote from the GA Privacy Policy].
The actual definition from Google:
Personal information is information that personally identifies you, such as your name, email address or billing information, or other data which can be reasonably linked to such information
Is a unique user ID personally identifiable information? There is a difference between data intended to clearly distinguish one user from another, and data intended to identify each user. The distinction might depend on security, i.e., how easily could someone who had in their possession a list of your unique user IDs discover the user identities behind them?

Technically, the referrer header just specifies a URL. If you want to construct "fake" URLs, come up with a URI scheme specific to your application. Something like userid:656.
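A minimal sketch of the application-specific scheme suggested above, assuming the `userid:` scheme from the example. The helper names `buildUserReferrer` and `parseUserReferrer` are hypothetical, and the sketch only covers the string handling; whether Google Analytics accepts such a referrer is not established here.

```javascript
// Hypothetical helpers for the "userid:" scheme used as an example above.
function buildUserReferrer(userId) {
  // Percent-encode the id so it is always a valid URI component.
  return "userid:" + encodeURIComponent(String(userId));
}

function parseUserReferrer(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (err) {
    return null; // not an absolute URI at all
  }
  // Only accept our own scheme; ordinary http(s) referrers return null.
  if (url.protocol !== "userid:") return null;
  return decodeURIComponent(url.pathname);
}

console.log(buildUserReferrer(656));
console.log(parseUserReferrer("http://www.user.com/656"));
```

A dedicated scheme keeps these synthetic values easy to tell apart from real web referrers when they are parsed later.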

Related

What should the client_id be when sending events to Google Analytics 4 using the Measurement Protocol?

I am using Google Analytics 4 (GA4) on the client to track a whole bunch of different events. However, there are 2 scenarios that I can't cover client side:
A user completing check out on a payment page hosted by a third-party (Stripe in this case).
A refund that is made by the support team.
These events are handled by the server using webhooks. To me it seems like the most straightforward solution would be to let the server send the event to GA4 (as opposed to the client sending it). I believe the Measurement Protocol should be used for this.
For each event submitted through the Measurement Protocol a client_id is required. When the client is submitting an event, this is an automatically generated ID which is used to track a particular device.
My question thus is, what should the client_id be when submitting an event server-side?
Should the same client_id perhaps be used for all events, as to recognize the server as one device? I have read some people proposing to use a randomly generated client_id for each event, but this would result in a new user to be recognized for every server-side event...
EDIT:
One of the answers proposes to use the client_id, which is part of the request as a cookie. However, for both examples given above, this cookie is not present as the request is made by a third-party webhook and not by the user.
I could of course store the client_id in the DB, but the refund in the second example is given by the support team. And thus conceptually it feels odd to associate that event with the user's client_id as the client_id is just a way to recognize the user's device? I.e. it is not the user's device which triggered the refund event here.
Another refund event example would be when user A makes a purchase with user B and user B refunds this purchase a week later. In this situation, should the client_id be the one of user A or of user B? Again, it feels odd to use a stored client_id here. Because, what if user A is logged in on two devices? Which client_id should be used here then?
Great question. Yes, your aim to use Measurement Protocol is a proper solution here.
Do not hardcode the client id. It's gonna be a hellish mess in reports. The nature of user-based reporting (which GA is) demands client ids to uniquely identify users. To your best ability.
GA stores the client id in a cookie. You should have convenient and immediate access to it on every client hit to the BE. The cookie name is _ga. GA4 also sets a cookie with the measurement id appended to that name. Here are Google's docs on it: https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage But you can easily find it if you inspect "collect" hits and look at their payloads. There's another cookie named _gid that contains a different value. It is also a unique id, but it has a different purpose. Set it too if you can, but don't use it as the normal client id. Here is how the cookie looks here, on Stack:
And here it is in Network. You will need it for proper debugging, mostly to make sure your FE client ids are the same as your BE client ids:
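Reading the client id from the cookie on the back end could look roughly like this. It is a sketch that assumes the commonly seen `GA1.<n>.<random>.<timestamp>` layout of the `_ga` value, where the last two fields form the client id; `clientIdFromCookieHeader` is a hypothetical helper name.

```javascript
// Extract the GA client id from a raw Cookie request header.
// Assumes the _ga value looks like "GA1.2.1234567890.1600000000".
function clientIdFromCookieHeader(cookieHeader) {
  if (!cookieHeader) return null;
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    // Exact match only: _gid and _ga_<id> cookies are skipped.
    if (name !== "_ga") continue;
    const fields = value.split(".");
    if (fields.length < 4) return null; // unexpected layout
    return fields.slice(-2).join(".");
  }
  return null; // cookie not set, e.g. blocked by an ad-blocker
}

console.log(clientIdFromCookieHeader("sid=abc; _ga=GA1.2.1234567890.1600000000"));
```

Comparing this value with the `cid` seen in the browser's "collect" requests is one way to check that FE and BE ids agree.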
Keep an eye on the cases when the cookie is not set. When a cookie is not set, that most frequently means the user is using an ad-blocker. Your analysts will still want to know that the transaction happened even if there's a lack of context about the user. You still can track them properly.
3.1 The laziest solution would be giving them an "AnonymousUser" client id and then appending a random number to it, so that it both indicates that a user is anonymous and still makes it possible for GA to separate them.
3.2 A better solution would be to make a fingerprint client id for such users, say, by hashing a concatenated string of their useragent+ip+locale+screen resolution. It is up to your analysts to work out the actual definition of a unique user if the Google Analytics library is unable to do it.
3.3 Finally, one of the best solutions would be generating a client id on your own, keeping GA's format and maybe adding an indicator that it has been generated on your end (just for easier debugging in the future), setting it as a cookie, and using it instead of _ga. Just use a different cookie name so that ad-blockers wouldn't know to block it.
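Options 3.2 and 3.3 could be sketched as follows. The function names, the `fp.` prefix, and the exact set of fingerprint inputs are assumptions made for illustration; the `<random>.<unix seconds>` shape is only an approximation of how GA client ids usually look.

```javascript
const crypto = require("crypto");

// 3.2: fingerprint-style id built from request attributes (hypothetical inputs).
function fingerprintClientId({ userAgent, ip, locale, screen }) {
  const raw = [userAgent, ip, locale, screen].join("|");
  const digest = crypto.createHash("sha256").update(raw).digest("hex");
  // "fp." marks the id as generated on our side, for easier debugging.
  return "fp." + digest.slice(0, 32);
}

// 3.3: self-generated id in a "<random>.<unix seconds>" shape.
function generateClientId(nowMs = Date.now()) {
  const random = crypto.randomInt(0, 2 ** 31);
  return `${random}.${Math.floor(nowMs / 1000)}`;
}

console.log(fingerprintClientId({ userAgent: "UA", ip: "203.0.113.7", locale: "en-US", screen: "1920x1080" }));
console.log(generateClientId());
```

The generated id would then be stored in a first-party cookie under a name of your choosing and reused on later requests.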
If you want to indicate that a hit was sent through the server, that's a good idea. Use custom dimension for that. Just sync it with your analysts first. Maybe they wouldn't want that, or maybe they would want it in a different dimension.
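A server-side hit carrying such a marker might be assembled like this. The sketch assumes the GA4 Measurement Protocol `mp/collect` endpoint with `measurement_id` and `api_secret` query parameters; the `hit_source` parameter name and all ids below are placeholders, not values from the question.

```javascript
// Build (but do not send) a GA4 Measurement Protocol request.
function buildMeasurementProtocolRequest({ measurementId, apiSecret, clientId, eventName, params }) {
  const query = new URLSearchParams({ measurement_id: measurementId, api_secret: apiSecret });
  return {
    url: `https://www.google-analytics.com/mp/collect?${query}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      client_id: clientId,
      // hit_source is a hypothetical parameter; it would need to be
      // registered as a custom dimension to show up in reports.
      events: [{ name: eventName, params: { ...params, hit_source: "server" } }],
    }),
  };
}

const refundRequest = buildMeasurementProtocolRequest({
  measurementId: "G-XXXXXXX",
  apiSecret: "placeholder-secret",
  clientId: "1234567890.1600000000",
  eventName: "refund",
  params: { transaction_id: "T-1" },
});
console.log(refundRequest.url);
// To send it: fetch(refundRequest.url, refundRequest)
```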
Now, this is very trivial. There are ways to go much deeper and to improve the quality of data from here. Like gluing the order id, the transaction id, the user id to that, using them to generate client id, do some custom client tracking for the future. But I must say that it's better than what more than 90% of, say, shopify clients have.
Also, GA4 is not good enough for deeper production usage. Many things there are still very rudimentary and lacking. I would suggest concentrating on Universal Analytics and having GA4 as a backup for when Google makes GA4 actually good enough to replace UA. That is, unless you're downloading your data elsewhere and not using GA's interface for analysis.
It seems that this page (relevant portion in the screenshot below) advises sending the data along with either the client_id or the user_id. However, it fails to address the fact that client_id is a mandatory field, as stated here.
I believe it is probably safe to assume that randomly generating this field should work. At least it seems to on my end; however, be warned that I am unsure whether this has any impact on attribution.
* In the above image, Device ID refers to client_id

GTM dataLayer restrictions

I am about to send a user's email address via dataLayer.push(), and I was wondering if there are any dataLayer restrictions that I should be aware of.
For instance, in Google Analytics, it is not recommended to send email addresses or any PII (personally identifiable information).
Is it a good idea to send email address, first and last name via dataLayer.push()?
I was not able to get any definitive answers online. Maybe someone can shed some light here. Thanks
dataLayer.push will not do anything with the data except put it into a variable that is local to the browser. So at this point there is no harm done, and there is no legal or other prohibition against doing so (which is what you are asking).
Of course having it on the dataLayer does not do anything by itself.
The question is what you are going to do with the data, and that depends on your jurisdiction (in Europe the GDPR applies, other countries have their own privacy laws) and on what the TOS for your tracking tools say (e.g. in GA you cannot have PII, even if the GDPR or comparable laws do not apply in your country).
But for example if you have consenting users, and storing their email address is essential for delivering the service you are advertising on the page, then pushing this to the dataLayer and using it for your essential purpose should be fine (IANAL).
Also, once the data is in a variable, every other tracking tool that you have implemented on your page can access the value even without your knowledge (but then they can already read it from input fields or other elements in the page code, so that doesn't add much to the danger).
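If some of the tools reading the dataLayer must not receive PII, one option is to strip those keys before pushing. This is a sketch with a hypothetical helper (`pushWithoutKeys`) and made-up key names; a plain array stands in for `window.dataLayer` so it also runs outside a page.

```javascript
// Hypothetical helper: copy an object without the listed keys, then push it.
function pushWithoutKeys(layer, data, blockedKeys) {
  const safe = {};
  for (const [key, value] of Object.entries(data)) {
    if (!blockedKeys.includes(key)) safe[key] = value;
  }
  layer.push(safe);
  return safe;
}

// In a page this would be: window.dataLayer = window.dataLayer || [];
const dataLayer = [];
pushWithoutKeys(
  dataLayer,
  { event: "signup", plan: "pro", email: "a@example.com" },
  ["email", "first_name", "last_name"]
);
console.log(dataLayer);
```

This only controls what goes through the dataLayer; as noted above, scripts on the page can still read the same values from form fields.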

can google analytics tell me http referrer for each specific goal conversion?

I have a website that allows people to create an account (that is the conversion I wish to track).
I wish to know where a specific person is coming from. I have google analytics installed and have set up the registration page as a goal, but the reporting tells me traffic sources as an aggregated pie chart. It doesn't report down to the user account level to say that 'person with email xyz' came from 'facebook' for example.
What custom variables or mark up would I need to add to GA to report at that detailed level, if that is at all possible?
Otherwise, I will just have to record the first http_referer in a cookie and stick it in a database during the registration process.
Any advice?
Firstly I must ask you, how actionable do you think it is to look at data at that granular of a level? Finding out what % of people who registered came from facebook or some other place is actionable, because it helps you do things like determine where to focus marketing efforts. But individual users? How is this actionable to you? (hint: it's not)
However, if you are still determined to know this, you should first note that it is against Google's ToS to record personally identifiable data either directly (recording the actual value in GA) or indirectly (e.g. recording a unique id that you can use to tie to personal info stored within your own system). If this is something you don't want to risk, I suggest moving to another analytics tool that does not have this sort of thing in its ToS (e.g. Adobe SiteCatalyst, which costs money, or perhaps you may instead prefer an "in-house" approach, like Piwik).
If you are still determined to follow through with this and hope not to get caught or whatever, Google Analytics doesn't record data like what info a visitor filled out in a form (like their email address) unless you populate that data in a custom field/dimension/metric/event to be sent along with the request. Usually you would populate this on the form "thank you" page (which is usually the same page you use as your goal url or goal event if you're popping and using an event for your goal). So you would populate the email address in one of those custom variables and then have it as a dimension to break down the http referrer by.
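The fallback mentioned in the question, keeping the first referrer in a cookie and saving it at registration, could be sketched like this. The function name and the `(direct)` / `(unknown)` labels are assumptions; reading and writing the actual cookie is left out.

```javascript
// Decide what the "first referrer" cookie should contain.
// existingValue: current cookie value ("" if not set yet)
// referrer: document.referrer or the Referer request header
function firstReferrerValue(existingValue, referrer) {
  if (existingValue) return existingValue; // never overwrite the first touch
  if (!referrer) return "(direct)";
  try {
    return new URL(referrer).hostname;
  } catch {
    return "(unknown)"; // referrer was not a parseable URL
  }
}

console.log(firstReferrerValue("", "https://www.facebook.com/some/post"));
```

At registration time the stored value would be written to the user's row in your own database, which keeps the per-user data out of GA.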

Are hash referral codes in URL necessary?

Other sites' referral programs generate URLs with hash codes to represent the referrer. When the URL is sent to and followed by friends and family, some kind of points or recognition system awards the referrer defined by the hash code... but why the hash code? Why not the user ID?
I can see a few reasons:
Obscure the user ID for privacy reasons
Adds an abstraction layer so you can track where the referral came from, e.g. Hash #1 for links from stackoverflow, Hash #2 for links from expert-sexchange (sic), etc.
Security, so that a malicious user couldn't simply try all possible user IDs sequentially and rack up a lot of bogus referrals, which is trivial if the user IDs are simply numbers.
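One way to get codes with these three properties is a keyed hash of the user ID plus a source label. This is a sketch, not how any particular site does it; `referralCode`, the 12-character length, and the secret are all assumptions.

```javascript
const crypto = require("crypto");

// Derive a non-guessable referral code from a user id and a source label.
// The secret stays on the server, so codes cannot be enumerated by
// counting through user ids.
function referralCode(userId, source, secret) {
  return crypto
    .createHmac("sha256", secret)
    .update(`${userId}:${source}`)
    .digest("hex")
    .slice(0, 12);
}

console.log(referralCode(656, "stackoverflow", "server-side-secret"));
```

Because the hash cannot be reversed, the server would store each issued code next to its user ID and source and look it up when a referral link is followed.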

Is it possible to use Google Analytics to track single user account?

I've got a website that requires users to log in before they can use it, and I want to track the behavior of each individual user. Is it possible to do this? Any advice? Thanks very much!
Yes this is possible.
The simplest way might be to define a Custom Variable scoped to the visitor, and bind it to the value equal to the (obfuscated) user's ID (the one you assigned them when they registered):
pageTracker._setCustomVar(1, "Registered TempID", "345X34RT", 1)
The four arguments that you pass into a Custom Variable are: the slot number (any integer 1 through 5, which won't change in this case); 'Registered TempID', which is the name I chose for this variable; '345X34RT', the value for that variable, which maps to a registered user (but must not personally identify them or it will violate Google's Privacy Policy); and the final argument, '1', which is the scope: it is '1' in this case because this variable is scoped to the visitor.
This new variable is sent to the GA server via a call to _trackPageview(), so make sure the custom variable is set before _trackPageview() is called.
There are several excellent resources on GA Custom Variables, including step-by-step tutorials; a blog post by ROI Analytics is, I think, one of the best.
Once you've done this, to view the Custom Variable in the Google Analytics Web Client, go to the left-hand panel and click on the Visitor heading; as the last item under this heading (just before the next major heading, which is Traffic Sources) you will see the Custom Variables subheading.
This is where you can view the data for the custom variables you set. For instance, the panel will look something like this:
It is technically possible, but prohibited by the terms of service that you agreed to when you installed Google Analytics (you read them, right?).
From: http://www.google.com/analytics/tos.html
7. PRIVACY . You will not (and will not allow any third party to) use the Service to track or collect personally identifiable information of Internet users, nor will You (or will You allow any third party to) associate any data gathered from Your website(s) (or such third parties' website(s)) with any personally identifying information from any source as part of Your use (or such third parties' use) of the Service. You will have and abide by an appropriate privacy policy and will comply with all applicable laws relating to the collection of information from visitors to Your websites. You must post a privacy policy and that policy must provide notice of your use of a cookie that collects anonymous traffic data.
Seems pretty clear.
It's possible via the User-ID feature in JavaScript: User-ID enables the analysis of groups of sessions, across devices, using a unique and persistent ID.
ga('create', 'UA-XXXX-Y', 'auto');
ga('set', '&uid', {{ USER_ID }});
ga('send', 'pageview');
{{ USER_ID }} is a unique, persistent, and non-personally identifiable string ID that represents a user or signed-in account across devices.
https://developers.google.com/analytics/devguides/collection/analyticsjs/user-id
The cookies that Google Analytics uses will track the same user, so long as they use the same PC and don't clear their cookies. That's how GA can tell if a customer is a new user or a returning user. However, it is limited for the reasons I gave above.

Resources