User ID Clearing - google-analytics

This is not SEO related; it is more of a client-side question.
There is a website in particular that uses Google Analytics tracking to uniquely identify each one of their users.
This website has a blocking system that is preventing me from viewing some of their content, which I'd like to view.
I figured out that the website uses the Google Analytics cookie to somehow identify me.
Is there a way I can somehow spoof the client-ID from Google Analytics to basically make a new identity for myself? Thanks community.

GA has two IDs that could identify a user: clientId and userId. The client ID is set by the JavaScript tracking code. To delete it, you simply delete the _ga cookie. You will then get a new, randomly generated ID that does not identify you other than in the sense that multiple pageviews with that clientId will be treated as coming from the same user.
The userId is set by the server when a user logs in (to connect all sessions of that user; it must not personally identify the user in question), so at that point the website already has to know you in order to set a userId.
It seems more than unlikely (and technologically not even feasible) to use Google Analytics to limit access to a website. For starters, Google provides an opt-out plugin for Analytics that would make such a system rather ineffectual, and I'm not quite sure how such a system would work.
To answer your question anyway: you can change the clientId by editing the _ga cookie and replacing its value, and you could spoof the userId, e.g. by using a browser plugin that lets you manipulate HTTP requests. However, I don't think this will let you bypass any access protection.
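To see which client ID the _ga cookie currently carries, something like the following sketch could be used. It assumes the commonly seen "GA1.<depth>.<random>.<timestamp>" layout of the cookie value, where the last two fields make up the client ID; `parseClientId` is a hypothetical helper name, not part of any GA library.

```javascript
// Extract the GA client ID from a raw cookie string such as document.cookie.
// Assumption: the _ga value looks like "GA1.2.1234567890.1600000000".
function parseClientId(cookieString) {
  const entry = cookieString
    .split(';')
    .map((part) => part.trim())
    // Match "_ga=" exactly so that cookies like "_ga_XXXX" or "_gat" are skipped.
    .find((part) => part.startsWith('_ga='));
  if (!entry) return null;

  const fields = entry.slice('_ga='.length).split('.');
  // The first two fields are version and domain depth; the last two form the ID.
  return fields.length >= 4 ? fields.slice(-2).join('.') : null;
}
```

In a browser this would be called as `parseClientId(document.cookie)`. After the cookie is deleted and the page reloaded, the returned value should differ.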

Related

Google cross domain analytics add params in URL when cookies are not accepted

One of my clients has a cross domain analytics set up.
Everything works well, but the behavior differs depending on whether the user gives full cookie consent or allows only strictly necessary cookies.
Behavior in case of full cookie consent:
GA stores data in cookies, i.e. the _ga and _ga_ID cookies can be found in the cookie tab of the browser console.
Behavior in case of only strictly necessary cookie consent:
GA stores some data in URL, for example:
https://www.example-page.com/?_gl=1*XXXXXXX*_up*MQ..*_ga*ZZZZZZZ.*_ga_YYYYYYY*YYYYYYY..
According to the Google documentation, the second case is the default behavior, and cross-domain measurement works when the _gl param is added to the URL.
What I do not understand is why the URL params are not added every time, but only when some cookies are not accepted, so I would like to get a better understanding of this.
There is also a possible issue which I do not understand and that is:
GA params are also added to the URL when the user is just switching between pages on the same domain, i.e. from www.example-page.com/home-page to www.example-page.com/about-page. If I understand correctly, this should not happen, as I am staying within the domain.
The questions I am most interested in are:
How is GA determining if it should store its data as cookies or push it to url?
Where are these parameters stored before user redirect first time? Is it part of datalayer / google_tag_manager global variables?
Is there a way to store the params somewhere other than in the URL when full cookie consent is not granted?
Is adding GA params to the URL even when staying within the same domain correct behavior?
Project details:
The site is running on WordPress and uses OneTrust for cookie management.
EDIT: Issue with URL resolved.
In my case this issue was caused by an update of the consent mode template (gtm-templates-simo-ahava). Reverting to the previous version fixed the problem. The cause may be connected to a pull request in the template repository.
How is GA determining if it should store its data as cookies or push it to url?
Pushing the data to the URL is the mechanism of cross-domain tracking. You set a list of domains that cross-domain tracking should work for. This is likely your problem here: in the vast majority of cases you are supposed to list only the root domains (for example example-page.com), not individual subdomains.
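As a rough illustration of the intended behavior (not GA's actual implementation), a link should only be decorated with _gl when it leads from one configured domain to a different one. The helper name `shouldDecorateLink` and the example hostnames are made up for this sketch.

```javascript
// Hypothetical helper: decide whether a link should carry the _gl parameter.
// linkedDomains is the list of root domains configured for cross-domain tracking.
function shouldDecorateLink(currentHost, targetUrl, linkedDomains) {
  const targetHost = new URL(targetUrl).hostname;
  // A host belongs to a root domain if it equals it or is one of its subdomains.
  const belongsTo = (host, domain) => host === domain || host.endsWith('.' + domain);

  const currentDomain = linkedDomains.find((d) => belongsTo(currentHost, d));
  const targetDomain = linkedDomains.find((d) => belongsTo(targetHost, d));

  // Decorate only when both ends are configured and belong to different entries.
  return Boolean(currentDomain && targetDomain && currentDomain !== targetDomain);
}
```

With `['example-page.com', 'example-shop.com']` as the list, a link from www.example-page.com to its own /about-page is left alone, while a link to www.example-shop.com is decorated. If subdomains are listed as separate entries, same-site navigation can start to look like a cross-domain hop.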
Where are these parameters stored before user redirect first time? Is it part of datalayer / google_tag_manager global variables?
This data is stored in cookies before the user goes to a different domain. If cookies are deleted, then it's stored in the JS scope of the GA library. This implies that they would be erased and regenerated on JS context loss. Loss on a page unload, regeneration on a page load.
Is there a way to store the params somewhere other than in the URL when full cookie consent is not granted?
Well, yes, but it is tricky and expensive, and the immediate question is why you would do that: it would defeat the purpose of blocking the cookie. Natively, GA doesn't support other methods of passing the value, but if you're into tinkering, you can store the value on your backend and then retrieve it using some "primary functionality" cookie. Another option is using a third-party server's cookies, but that would defeat the purpose even more.
Is adding GA params to the URL even when staying within the same domain correct behavior?
No, it's most likely a mistake.
Now, you really asked all the right questions, so I don't have much to add, except that disabling your primary anonymized behavioral tracking is usually a lazy "safe" choice. And lazy here implies wrong.
Normally, larger corps don't block primary tracking. They only block third party marketing-related tracking. Basically, pixels. They consider their main analytics part of the primary functionality, which is a strong case given that main analytics data is often used in debugging, performance measurement and even for app security audits.
Finally, using OneTrust or a similar solution to completely manage your tracking is suboptimal. They basically just destroy all "offending" cookies all the time. This will mess up your behavioral data very significantly.
The proper way to use a consent management system is to declare the user's consent choice in your tag management system and then, in it, block rules/tags from firing when consent is not given. You normally just carefully block marketing tags there based on consent. Remember, consent management systems only delete cookies, because that's trivial. They don't block network requests. The absence of cookies may not prevent data from being sent, often even data that uniquely identifies the client via the primary cookie's user ID, which allows the activity to be matched to the backend database.
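One way to declare the consent choice to Google's tags is the gtag consent command. The sketch below follows the policy described above (analytics kept, marketing storage blocked until opt-in); the `dataLayer` array stands in for `window.dataLayer`, and `onConsentChanged` is a hypothetical callback that would be wired to the consent banner.

```javascript
// Stand-in for the usual gtag bootstrap; in a browser this is window.dataLayer.
const dataLayer = [];
function gtag() {
  // gtag queues its raw arguments object for the tag library to process.
  dataLayer.push(arguments);
}

// Defaults declared before any tag fires: analytics on, marketing storage off.
gtag('consent', 'default', {
  analytics_storage: 'granted',
  ad_storage: 'denied',
});

// Hypothetical hook: call this when the user changes their marketing choice.
function onConsentChanged(marketingAccepted) {
  gtag('consent', 'update', {
    ad_storage: marketingAccepted ? 'granted' : 'denied',
  });
}
```

Tags that depend on `ad_storage` can then be held back by the tag manager itself instead of having their cookies deleted after the fact.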

Disable GA Analytics Cookies and Deleting

Part of our GDPR requirement is to disable GA tracking. We have already achieved this by setting the window property window['ga-disable-UA-XXXXX-Y'] = true;. Beyond that, we also want to delete the cookies that were set (i.e. "_ga" and "_gid"). Will expiring these cookies suffice to delete them, or is there a better approach?
Or, if these cookies are not deleted, what are they for if the tracking is disabled?
These are first-party cookies set via JavaScript, so expiring them will work just fine.
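A minimal sketch of expiring the two cookies follows. The helper names are hypothetical, and the domain and path are assumptions: they have to match whatever the cookies were originally set with, otherwise the browser treats the assignment as a different cookie.

```javascript
// Build a cookie string with an expiry date in the past, which removes the cookie.
function expiredCookie(name, domain) {
  return `${name}=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/; domain=${domain}`;
}

// Removal strings for the GA cookies named in the question.
function buildGaCookieRemovals(domain) {
  return ['_ga', '_gid'].map((name) => expiredCookie(name, domain));
}

// In a browser (domain is an assumption for illustration):
// buildGaCookieRemovals('.example.com').forEach((c) => { document.cookie = c; });
```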
what are they for if the tracking is disabled.
I would venture that Google did not put a lot of thought into this. After all, their cookie solution predates the GDPR and any widespread notion of data protection. The cookies just remain there because that is the default behaviour for cookies. The obvious downside is that if the opt-out is revoked (either on purpose or by accident), the client ID from the cookie might be reused and the new tracking data would be connected to the existing data. So deleting the cookies is a really good idea. If you want to be particularly thorough, you could pick up the client ID of a user who opts out and send a request to the User Deletion API. This will not remove aggregated data, but it will remove PII (namely client ID and user ID) to anonymize the data.
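For the User Deletion API step, the request body could be assembled roughly as below. The field names reflect my reading of the v3 userDeletionRequest resource and should be checked against the current reference before use; `buildUserDeletionRequest` is a hypothetical helper, and the authenticated call that sends the body is left out.

```javascript
// Assemble the body of a deletion request for one client ID.
// Assumption: field names follow the v3 userDeletionRequest resource.
function buildUserDeletionRequest(clientId, webPropertyId) {
  return {
    kind: 'analytics#userDeletionRequest',
    // CLIENT_ID tells the API that the value comes from the _ga cookie.
    id: { type: 'CLIENT_ID', userId: clientId },
    webPropertyId,
  };
}
```

The client ID has to be read before the _ga cookie is expired, since afterwards it is no longer available in the browser.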

Google Analytics refresh token invalidates

We allow our users to connect their Google Analytics account to our CMS. However, many of them choose to have us manage their GA, so we have a lot of properties.
Each property has its own View ID, and we connect a property by entering its View ID. Then the account selection prompt shows up. We select the account that owns the property (which is usually the same account, say admin#company.com), and then retrieve the access and refresh tokens.
This all works well, except that the refresh token sometimes seems to be invalidated.
Is this because of the refresh token limit (we definitely manage more than 25 clients)?
If so, what would be a better way to connect the property to the site, while still allowing users to use their own GA account if they wish to do so?
I was thinking of trying to retrieve which Google account is being used for the connection, but I am not sure how I would do that.
Any ideas?
I figured it out. The refresh token limit is per actual google account, took me a while to figure that out. I now store a default value in the main DB and if that one can access the ID that is to be connected, the default is used. Otherwise it will redirect to the Google auth window and authenticate normally.
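The fallback described above could look roughly like this. Everything here is a hypothetical sketch: `canAccess` stands for whatever check is used to test a token against a View ID (for example a lightweight API call), and the return shape is made up for illustration.

```javascript
// Prefer the shared default token; fall back to the interactive Google auth flow.
async function pickRefreshToken(viewId, defaultToken, canAccess) {
  // canAccess(token, viewId) is an injected async check returning true or false.
  if (defaultToken && (await canAccess(defaultToken, viewId))) {
    return { token: defaultToken, needsUserAuth: false };
  }
  // The default account cannot see this view: the user must authenticate.
  return { token: null, needsUserAuth: true };
}
```

Reusing one stored token for every property owned by the default account avoids issuing a new refresh token per connection, which is what was running into the per-account limit.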

If google analytics is used without cookie storage, are all pageviews a "new" user?

I have a little web browser in my application that hits a webpage using Google analytics. That little web browser has cookies and local disk storage disabled.
Are my user analytics going to be skewed because of this? Is every user reported as a new user when in actuality they are an existing one?
Yes, your Analytics data is going to be impacted. For example, you will not be able to distinguish new visitors from returning ones, or to group hits from the same visitor across sessions. As you say, each visitor will be reported as a new one.
Analytics uses the Client ID parameter to uniquely identify a Visitor. As the official Field Reference states:
Client ID
Required for all hit types.
Anonymously identifies a browser instance. By default, this value is
stored as part of the first-party analytics tracking cookie with a
two-year expiration.
If your application can generate a unique key for each user and persist it somewhere other than in cookies or localStorage, you could still create your own Client ID:
Disabling Cookies
By default, analytics.js uses a single cookie to persist a unique
client identifier across pages. In some cases you might want to use
your own storage mechanism and send data directly to Google Analytics
without the use of cookies.
You can disable analytics.js from setting cookies using the following:
ga('create', 'UA-XXXX-Y', {
  'storage': 'none',
  'clientId': '35009a79-1a05-49d7-b876-2b884d0f825b'
});
When you disable cookie storage, you will have to supply your own
clientId parameter except for the special case where you are using
cross-domain linking parameters.
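A sketch of supplying your own client ID from application storage might look like this. The `store` object is an assumption (anything with `get`/`set`, such as a Map backed by the host application's own persistence), and both helper names are hypothetical. The generator uses Math.random for brevity and is not cryptographically strong.

```javascript
// Produce a random identifier in UUID v4 format.
function randomUuidV4() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    // 'y' positions carry the variant bits (8, 9, a or b).
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Return the stored client ID, creating and persisting one on first use.
function getOrCreateClientId(store, generate = randomUuidV4) {
  let id = store.get('gaClientId');
  if (!id) {
    id = generate();
    store.set('gaClientId', id);
  }
  return id;
}

// In the embedded browser (appStore is the application's own storage):
// ga('create', 'UA-XXXX-Y', { storage: 'none', clientId: getOrCreateClientId(appStore) });
```

As long as the host application keeps `store` between runs, the same visitor keeps the same client ID even though the embedded browser has no cookies.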
Yes. Google Analytics uses the client ID to determine if a user is new or returning.
Note, if your users are logged in (probably not though without cookies), then you can use the user ID feature to determine new from returning users.

Shibboleth being passed as referrer in Analytics

I have been searching for an answer to this for a while but have found nothing useful.
Basically in the organisation I work in we use Shibboleth for user authentication.
We probably have 200+ sites & Shibboleth works effectively for these.
However, one site in particular is protected using Shibboleth (if the user is not a valid user they have to sign up for an account in this process also) which is causing an issue with our Analytics (Google Analytics). The referrer to the site is ALWAYS the shibboleth authenticator.
What we need is a facility to track conversions on this site based on the source/campaign the user arrived from. The proposed solution right now is to use a Tag Manager product to fire off specific tags based on a combination of the referrer and campaign. This one single site is used for a multitude of things (all prospective leads to EPR), but we need different information based on how the user landed on that page.
We are tracking all the interaction points that lead up to this (i.e. tracking potential leads), but a lot drop out during the signup process and right now we do not have actual conversion data, meaning the marketing budget is being spent based on the highest traffic source rather than on which source or campaign is most effective.
The problem is that when a user signs in or signs up using Shibboleth, Shibboleth sets a new cookie for that session. In Google Analytics, all the referrers to this site are the authenticator.
Is there some configuration issue with GA that I am overlooking for this scenario, or is there something that can be done with Shibboleth so that the initial referrer (rather than the authenticator) is passed as the referrer, which in turn would facilitate creating the Tag Manager rules that fire off the required tags?
If you use Universal Analytics, you might be able to overcome this using the new Referral Exclusion feature:
You can exclude specific domains from being recognized as referral
traffic sources in your Analytics reports. A common use for this
feature is to exclude traffic from a third-party shopping cart to
prevent customers from being counted in new session and as a referral
when they return to your order confirmation page after checking out on
the third-party site.
Google Analytics recognizes the URL you use to set up a new property
in your account and automatically excludes this domain from your
referral traffic, so you won’t see self-referrals in your Analytics
reports.
https://support.google.com/analytics/answer/2795830?hl=en
Another option might be to pass a parameter across the login page when you request an authentication. If you set your fields in the request correctly, you could have Shibboleth send your user back to any page or with any query string, such as adding ?source=original_page_name. After authentication, these parameters should be available to manipulate or pass on to GA. This won't actually spoof your referrers, but it will get you the data you need.
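On the landing page after authentication, the passed-through parameter could be picked up along these lines. The parameter name `source` comes from the example above; `extractOriginalSource` and the example URL are hypothetical, and the commented analytics.js call is one possible way to hand the value to GA.

```javascript
// Read the original traffic source that was carried through the login redirect.
function extractOriginalSource(landingUrl) {
  const source = new URL(landingUrl).searchParams.get('source');
  // Treat a missing or blank parameter as "no source available".
  return source && source.trim() !== '' ? source : null;
}

// In the browser, before sending the pageview:
// const source = extractOriginalSource(window.location.href);
// if (source) { ga('set', 'campaignSource', source); }
```

The same value could equally be exposed to the Tag Manager so that its rules fire on the original source rather than on the authenticator.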

Resources