Shibboleth being passed as referrer in Analytics - google-analytics

I have been searching for an answer to this for a while but have found nothing useful.
Basically in the organisation I work in we use Shibboleth for user authentication.
We probably have 200+ sites & Shibboleth works effectively for these.
However, one site in particular is protected using Shibboleth (a visitor who is not already a valid user also has to sign up for an account during this process), and this is causing an issue with our Analytics (Google Analytics): the referrer to the site is ALWAYS the Shibboleth authenticator.
What we need is a way to track conversions on this site based on the source/campaign the user arrived from. The proposed solution right now is to use a Tag Manager product to fire specific tags based on a combination of the referrer and campaign. This single site is used for a multitude of things (all prospective leads to EPR), but we need different information depending on how the user landed on the page.
We are tracking all the interaction points that lead up to this (i.e. tracking potential leads), but a lot of users drop out during the signup process and right now we have no actual conversion data, meaning the marketing budget is being spent based on the highest traffic source rather than on which source or campaign is most effective.
The problem is that when a user signs in or signs up using Shibboleth, Shibboleth sets a new cookie for that session. In Google Analytics, all the referrers to this site are the authenticator.
Is there some GA configuration that I am overlooking for this scenario, or is there something that can be done with Shibboleth so that the initial referrer (rather than the authenticator) is passed as the referrer, which in turn would make it possible to create Tag Manager rules that fire the required tags?

If you use Universal Analytics, you might be able to overcome this using the new Referral Exclusion feature:
You can exclude specific domains from being recognized as referral
traffic sources in your Analytics reports. A common use for this
feature is to exclude traffic from a third-party shopping cart to
prevent customers from being counted in new session and as a referral
when they return to your order confirmation page after checking out on
the third-party site.
Google Analytics recognizes the URL you use to set up a new property
in your account and automatically excludes this domain from your
referral traffic, so you won’t see self-referrals in your Analytics
reports.
https://support.google.com/analytics/answer/2795830?hl=en

Another option might be to pass a parameter across the login page when you request an authentication. If you set your fields in the request correctly, you could have Shibboleth send your user back to any page or with any query string, such as adding ?source=original_page_name. After authentication, these parameters should be available to manipulate or pass on to GA. This won't actually spoof your referrers, but it will get you the data you need.
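A minimal sketch of the parameter round trip described above, assuming the `source` parameter name from the example. The helper names `buildReturnUrl` and `readSource` and the example URL are hypothetical, and how the return URL is handed to Shibboleth depends on your deployment and is not shown here.

```javascript
// Hypothetical helpers; only the `source` parameter name comes from the answer.

// Build the URL the user should land on after authentication,
// carrying the original source along as a query parameter.
function buildReturnUrl(targetUrl, source) {
  const url = new URL(targetUrl);
  url.searchParams.set('source', source);
  return url.toString();
}

// After authentication, read the source back out of the landing URL.
// Returns null when the parameter is absent.
function readSource(landingUrl) {
  return new URL(landingUrl).searchParams.get('source');
}

const returnUrl = buildReturnUrl('https://app.example.org/signup-complete', 'spring_campaign');
console.log(returnUrl);
console.log(readSource(returnUrl));
```

The value read back on the landing page can then be used in a Tag Manager rule or sent to GA, instead of relying on the referrer.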

Related

Google cross domain analytics add params in URL when cookies are not accepted

One of my clients has a cross domain analytics set up.
Everything works well, but the behavior differs depending on whether the user gives full cookie consent or allows only strictly necessary cookies.
Behavior in case of full cookie consent:
GA stores data in cookies, i.e. the _ga and _ga_ID cookies can be found in the console's cookie tab.
Behavior in case of only strictly necessary cookie consent:
GA stores some data in URL, for example:
https://www.example-page.com/?_gl=1*XXXXXXX*_up*MQ..*_ga*ZZZZZZZ.*_ga_YYYYYYY*YYYYYYY..
According to Google's documentation the second case is the default behavior, and cross-domain measurement works when the _gl param is added to the URL.
What I do not understand is why the URL params are not added every time, but only when some cookies are not accepted, so I would like to get a better understanding of this.
There is also a possible issue which I do not understand:
GA params are also added to the URL when the user is just switching between pages on the same domain, i.e. from www.example-page.com/home-page to www.example-page.com/about-page. If I understand correctly, this should not happen as I am staying within the domain.
The questions I am most interested in are:
How is GA determining if it should store its data as cookies or push it to url?
Where are these parameters stored before user redirect first time? Is it part of datalayer / google_tag_manager global variables?
Is there a way to store the params somewhere other than in the URL when full cookie consent is not granted?
Is adding GA params to the URL even when staying within the same domain correct behavior?
Project details:
The site runs on WordPress and uses OneTrust for cookie management.
EDIT: Issue with URL resolved.
In my case this issue was caused by an update of the consent mode template (gtm-templates-simo-ahava). Reverting to the previous version fixed the problem. The cause may be connected to this pull request in the template repository.
How is GA determining if it should store its data as cookies or push it to url?
Pushing the data to the URL is the mechanism of cross-domain tracking. You set a list of domains that cross-domain tracking should work for. This is likely your problem here: in the vast majority of cases you're supposed to list only root domains (e.g. example-page.com), not subdomains.
Where are these parameters stored before user redirect first time? Is it part of datalayer / google_tag_manager global variables?
This data is stored in cookies before the user goes to a different domain. If cookies are deleted, then it's stored in the JS scope of the GA library. This implies that they would be erased and regenerated on JS context loss. Loss on a page unload, regeneration on a page load.
Is there a way to store the params somewhere other than in the URL when full cookie consent is not granted?
Well, yes, but it is tricky and expensive, and the immediate question is why you would do that: it would defeat the purpose of blocking the cookie. Natively, GA doesn't support other methods of passing the value, but if you're into tinkering, you can either store the value on your backend and then retrieve it using some "primary functionality" cookie, or use a third-party server's cookies, though that would defeat the purpose even more.
Is adding GA params to the URL even when staying within the same domain correct behavior?
No, it's most likely a mistake.
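To illustrate why same-domain decoration looks like a misconfiguration, here is a simplified model of the decision, not GA's actual implementation. The function names and the second domain `example-shop.com` are hypothetical, and the root-domain extraction is deliberately naive (it takes the last two labels, so it would be wrong for suffixes such as co.uk).

```javascript
// Simplified model only; GA's real linker logic is more involved.

// Naive root-domain extraction: keeps the last two hostname labels.
function rootDomain(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

// A link should carry linker params only when it leaves the current
// root domain AND points at a domain on the configured list.
function shouldDecorate(currentUrl, targetUrl, linkerDomains) {
  const current = rootDomain(new URL(currentUrl).hostname);
  const target = rootDomain(new URL(targetUrl).hostname);
  return target !== current && linkerDomains.includes(target);
}

const domains = ['example-page.com', 'example-shop.com'];
const sameSite = shouldDecorate(
  'https://www.example-page.com/home-page',
  'https://www.example-page.com/about-page',
  domains
);
const crossSite = shouldDecorate(
  'https://www.example-page.com/home-page',
  'https://shop.example-shop.com/cart',
  domains
);
console.log(sameSite, crossSite);
```

Under this model, navigation from /home-page to /about-page never qualifies, which matches the expectation in the question.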
Now, you really asked all the right questions, so I don't have much to add, except that disabling your primary anonymized behavioral tracking is usually a lazy "safe" choice. And lazy here implies wrong.
Normally, larger corps don't block primary tracking. They only block third party marketing-related tracking. Basically, pixels. They consider their main analytics part of the primary functionality, which is a strong case given that main analytics data is often used in debugging, performance measurement and even for app security audits.
Finally, using OneTrust or a similar solution to completely manage your tracking is suboptimal. They basically just destroy all "offending" cookies all the time. This will mess up your behavioral data very significantly.
The proper way to use consent management systems is to declare the user's consent choice in your tag management system and then, in it, block rules/tags from firing when consent is not given. You normally just carefully block marketing tags there based on consent. Remember, consent management systems only delete cookies, because that's trivial. They don't block network requests. The absence of cookies may not prevent data from being sent, and that data often still uniquely identifies the client via the primary cookie's user id, which allows the activity to be matched to the backend database.

User ID Clearing

This is not SEO related, just client related.
There is a website in particular that uses Google Analytics tracking to uniquely identify each one of their users.
This website has a blocking system that is preventing me from viewing some of their content, which I'd like to view.
I figured out that the website uses the Google Analytics cookie to somehow identify me.
Is there a way I can somehow spoof the client-ID from Google Analytics to basically make a new identity for myself? Thanks community.
GA has two ids that could identify a user, clientId and userId. The client id is set by the javascript tracking code. To delete that you would simply have to delete the _ga cookie. You will get a new ID randomly generated that does not identify you other than in the sense that multiple pageviews with that clientId will be treated as coming from the same user.
The userId is set by the server when a user logs in (to connect all sessions of that user, however it must not personally identify the user in question), so at that point the website already has to know you to set a userid.
It seems more than unlikely (and technologically not even feasible) to use Google Analytics to limit access to a website. For starters, Google provides an opt-out plugin for Analytics that would make such a system rather ineffectual, and I'm not quite sure how such a system would work.
To answer your question anyway: you can change the clientId by manipulating the _ga cookie and replacing the value in it, and you could spoof the userId, e.g. by using a browser plugin that allows you to manipulate HTTP requests. However, I don't think this will let you bypass any access protection.
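For reference, a small sketch of where the clientId sits inside the _ga cookie, assuming the common Universal Analytics layout `GA1.<n>.<random>.<timestamp>`, in which the last two dot-separated fields form the client ID. The helper names and the sample cookie values are made up for illustration.

```javascript
// Assumes the usual Universal Analytics cookie layout: GA1.<n>.<random>.<timestamp>

// Find a cookie by name in a "k=v; k2=v2" cookie string. Returns null if absent.
function readCookie(cookieString, name) {
  for (const pair of cookieString.split(';')) {
    const [key, ...rest] = pair.trim().split('=');
    if (key === name) return rest.join('=');
  }
  return null;
}

// The client ID is the last two dot-separated fields of the _ga value.
function clientIdFromGaCookie(cookieValue) {
  const parts = cookieValue.split('.');
  if (parts.length < 4) return null; // not the expected layout
  return parts.slice(-2).join('.');
}

const cookies = 'theme=dark; _ga=GA1.2.1033501218.1368477899; lang=en';
const clientId = clientIdFromGaCookie(readCookie(cookies, '_ga'));
console.log(clientId);
```

Deleting the cookie, as described above, simply causes a fresh value of this shape to be generated on the next page load.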

How to track & ID users in Google Analytics while registration/login via an other domain?

We have multiple products, all using one SSO (single sign-on and registration). So when visiting product A at www.AAA.com, a user who registers or signs in has to go to sso.mainsite.com to complete registration or login, and is then redirected back to the product page he came from.
I want to track unique users properly so:
I know each user from where he was acquired. (e.g. Mailchimp campaign? Social Media?..etc)
Each user activity in the product is always linked to him (Made purchase, did an activity...etc)
Statistics are not affected by the common SSO site, where users of multiple products are directed to log in or register. I want to be able to identify product A users.
We have an Analytics account for the main site (incl. SSO) and an account for product A. I'm having a difficult time deciding:
Should I use the product A tracking code in the SSO (to have multiple tracking codes on that page)?
When identifying users by Google's new User ID, should we let the SSO identify them (assign the ID), or product A, when the user gets transferred there?
I know I'm asking a lot, but I'm having a difficult time working out the best approach that won't damage any statistics. Thank you!
If you want to track the session on www.AAA.com without interruption it should be enough to add sso.mainsite.com to the referral exclusion list in the property settings in www.AAA.com's GA account.
That way sso.mainsite.com will not appear as a referrer, instead the session including channel attribution will be continued when you redirect back from sso.mainsite.com. This will completely ignore the pageviews on your sso page.
The alternative would be to set up cross domain tracking, if you want to include the detour to sso.mainsite.com into the tracking. That would be somewhat complicated, and unless you signal that this is really what you want I will not even bother to explain the setup.

Google Analytics referral sources

We're using Google and Facebook SSO, allowing our users to sign up and log in with these services. However, if a user signs up or logs in with either service (rather than creating a standard email login), we lose the referral source in Google Analytics, and instead sign-up and upgrade sources are attributed to accounts.google.com or Facebook.com. Anyone have some thoughts on a workaround?
This requires some backend work. Whenever one of your users clicks the login button for either service, your backend should 'remember' him using a cookie or some other parameter. That way, whenever he comes back from exactly the URL of your Facebook SSO or accounts.google.com, you can set the GA tracker's referrer parameter to your own site's URL. You can do this in basic JS code like this:
ga('set', 'referrer', 'mydomain.com');
In this way you won't see these invalid referrals anymore.
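The decision of when to override the referrer could be sketched as below. This is an illustration under assumptions: the function name `resolveReferrer` and the exact host list are hypothetical (the hosts are taken from the providers named in the question), and the replacement value is whatever you chose to remember before the redirect.

```javascript
// Hypothetical host list, based on the SSO providers named in the question.
const SSO_HOSTS = ['accounts.google.com', 'facebook.com', 'www.facebook.com'];

// Return the referrer GA should record: the replacement when the visitor
// is coming back from an SSO host, otherwise the referrer unchanged.
function resolveReferrer(currentReferrer, replacement) {
  if (!currentReferrer) return currentReferrer;
  let host;
  try {
    host = new URL(currentReferrer).hostname;
  } catch {
    return currentReferrer; // not a parseable URL, leave it alone
  }
  return SSO_HOSTS.includes(host) && replacement ? replacement : currentReferrer;
}

const fromSso = resolveReferrer('https://accounts.google.com/signin', 'https://mydomain.com/');
const fromElsewhere = resolveReferrer('https://news.example.net/post', 'https://mydomain.com/');
console.log(fromSso, fromElsewhere);

// In the page, before the pageview is sent, this would be used roughly as:
// ga('set', 'referrer', resolveReferrer(document.referrer, 'https://mydomain.com/'));
```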

If google analytics is used without cookie storage, are all pageviews a "new" user?

I have a little web browser in my application that hits a webpage using Google analytics. That little web browser has cookies and local disk storage disabled.
Are my user analytics going to be skewed because of this? Is every user reported as a new user when in actuality they are an existing one?
Yes, your Analytics data is going to be impacted. For example, you will not be able to differentiate hits between Sessions and Returning Visitors. As you say, each Visitor will be reported as a new one.
Analytics uses the Client ID parameter to uniquely identify a Visitor. As the official Field Reference states:
Client ID
Required for all hit types.
Anonymously identifies a browser instance. By default, this value is
stored as part of the first-party analytics tracking cookie with a
two-year expiration.
If your application can generate a unique key for each user and persist it somewhere other than in cookies or localStorage, you could still create your own Client ID:
Disabling Cookies
By default, analytics.js uses a single cookie to persist a unique
client identifier across pages. In some cases you might want to use
your own storage mechanism and send data directly to Google Analytics
without the use of cookies.
You can disable analytics.js from setting cookies using the following:
ga('create', 'UA-XXXX-Y', {
  'storage': 'none',
  'clientId': '35009a79-1a05-49d7-b876-2b884d0f825b'
});
When you disable cookie storage, you will have to supply your own
clientId parameter except for the special case where you are using
cross-domain linking parameters.
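A rough sketch of supplying your own clientId from application-side storage. Everything here except the `ga('create', ...)` call quoted above is an assumption: the `store` object stands in for whatever persistence your application has, the key name `ga_client_id` is made up, and the ID generator uses `Math.random`, which is fine for an anonymous identifier but not cryptographically strong.

```javascript
// Hypothetical storage interface: any object with getItem/setItem.
function createMemoryStore() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, value); },
  };
}

// Generate a random UUID-v4-shaped string (Math.random based).
function randomId() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Reuse the stored ID if there is one; otherwise create and persist a new one.
function getOrCreateClientId(store, generateId) {
  let id = store.getItem('ga_client_id');
  if (!id) {
    id = generateId();
    store.setItem('ga_client_id', id);
  }
  return id;
}

const store = createMemoryStore();
const firstVisitId = getOrCreateClientId(store, randomId);
const secondVisitId = getOrCreateClientId(store, randomId);
console.log(firstVisitId === secondVisitId);

// The ID would then be passed to the tracker as in the snippet above:
// ga('create', 'UA-XXXX-Y', { 'storage': 'none', 'clientId': firstVisitId });
```

As long as the store survives between launches of the embedded browser, the same user keeps the same clientId and is no longer counted as new on every visit.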
Yes. Google Analytics uses the client ID to determine if a user is new or returning.
Note, if your users are logged in (probably not though without cookies), then you can use the user ID feature to determine new from returning users.
