Tracking iframe origin

I am creating a web widget: a page that customers can load in an HTML iframe to embed our experience on third-party and vendor sites.
The site will be public, and I am not willing to ask consumers to register just to get a key or unique identity that would be passed as a query param (e.g. ?id=<unique_id>).
On the other hand, I need to track who is using the iframe. What are my options? A colleague suggested using request headers, such as Origin, to track usage on the server side. Is that a good strategy? I'm not sure how far I can trust the Origin header.
What if I fire an event (i.e. a client-to-server call) at page load, as analytics tools do, that logs the current page URL? Would that work from within an iframe?
I am pretty sure I am reinventing the wheel here. What would be some good recommendations?
Thanks!

For anyone working on a similar solution: my fix was simply to hook proper client-side analytics into the page and fire a page-load event when the page loads, pushing not just the page but quite a few other properties to our analytics.
We also added a clientId query param to our URLs, so that we could identify precisely who was serving the iframe the user visited.
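A minimal server-side sketch of that approach, assuming the widget is loaded as `https://widget.example.com/embed?clientId=...` (the host and the helper name `identify_embedder` are hypothetical). It prefers the explicit `clientId` query param and only falls back to the hostname in the `Referer` header as a hint, since browsers may trim or omit that header and non-browser clients can put anything in it. That is also why request headers alone are a weak basis for attributing usage.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlparse


def identify_embedder(widget_url: str, headers: Mapping[str, str]) -> dict:
    """Work out who is serving the iframe for one widget request.

    Prefers the explicit clientId query parameter; falls back to the
    hostname in the Referer header, which is only a hint (it can be
    trimmed, omitted, or forged).
    """
    query = parse_qs(urlparse(widget_url).query)
    client_id: Optional[str] = query.get("clientId", [None])[0]

    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    referer = lowered.get("referer", "")
    referer_host = urlparse(referer).hostname if referer else None

    if client_id:
        source = "clientId"  # the embedder chose this value on purpose
    elif referer_host:
        source = "referer"   # best-effort guess from the browser
    else:
        source = "unknown"

    return {"client_id": client_id, "referer_host": referer_host, "source": source}
```

Logging the `"unknown"` requests separately gives a rough idea of how much traffic header-based tracking alone would have missed.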

Related

Trying to understand GTM Docs

A client of ours is asking us to move to server-side tagging in GTM.
So I created a container, verified the domain and all of that, and then followed this:
https://developers.google.com/tag-platform/tag-manager/server-side/send-data
Once the server container URL is set, data is sent to a client in your Tag Manager server-side container. By default, GA4 and UA clients are pre-installed on your server-side container.
Key Term: "Clients" are adapters between the software running on a user's device and your server-side Tag Manager container. They receive measurement data from a device, transform that data into one or more events, process the data in the container, and package the results to be sent back to the device.
In your Tag Manager server container, click Clients in the left navigation to view the list of clients. Click the name of the client to view or edit details. In most cases, the client will require no modifications. However, there are settings that may require edits for certain cases:
So now my site (Drupal) sends events to a GTM web container as it always did; that web container fires the events to the server container and also to its GA4 tags, and then the server container fires the same events to GA4 again.
Isn't that redundant and pointless?
I thought the idea was to cut out the middleman and avoid crossing domain boundaries, so that cookies don't get blocked.
Am I reading the docs wrong? Maybe I'm assuming this is how it should be set up, when it's really just an example of getting data from an existing feed?
As @Darrellwan indicated, you don't need to run two containers in parallel.
What a lot of people do is use the front-end GTM to send events about what happens on the page to the back-end GTM.
Using front-end GTM in this case, however, is questionable, since most ad blockers block it easily. So people often do the tracking with arbitrary on-page JS that sends the details to the sGTM endpoint. But ad blockers can block the sGTM endpoint too, which is not ideal either, so people deploy a proxying mirror endpoint on their own back end that relays events to the sGTM endpoint.
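A rough standard-library sketch of such a proxying mirror endpoint. The sGTM URL, the names `build_relay_request` and `relay`, and the choice of forwarded headers are all assumptions; in particular, whether your tagging server reads the client IP from `X-Forwarded-For` depends on how it is hosted. The function only builds the upstream request, so it can be wired into whatever web framework serves the first-party path.

```python
import urllib.request
from urllib.parse import parse_qsl, urlencode

# Hypothetical sGTM endpoint; replace with your own tagging server URL.
SGTM_BASE = "https://sgtm.example.com"

# Headers passed upstream so the server container sees the real client.
# Cookie is included so cookies the server container relies on keep working.
FORWARDED_HEADERS = ("User-Agent", "Accept-Language", "Cookie")


def build_relay_request(path: str, query: str, client_ip: str,
                        headers: dict, sgtm_base: str = SGTM_BASE) -> urllib.request.Request:
    """Mirror one incoming first-party tracking request onto the sGTM endpoint."""
    # Re-encode the query so only well-formed key/value pairs are relayed.
    clean_query = urlencode(parse_qsl(query, keep_blank_values=True))
    url = sgtm_base.rstrip("/") + path
    if clean_query:
        url += "?" + clean_query
    req = urllib.request.Request(url, method="GET")
    # Exact-case lookup for brevity; a real server should match case-insensitively.
    for name in FORWARDED_HEADERS:
        if name in headers:
            req.add_header(name, headers[name])
    # Assumption: the upstream uses X-Forwarded-For to recover the client IP.
    req.add_header("X-Forwarded-For", client_ip)
    return req


def relay(req: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the mirrored request and return the upstream HTTP status."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A production relay would also copy response headers such as `Set-Cookie` back to the browser, so cookies set by the server container still reach the client.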
As for cookies, there's very little concern. GTM doesn't use third-party cookies for tracking, so there are no issues on that front. And very few people disable cookies in their browsers, since that cuts off a lot of functionality.
If all you use is GA4, then having a server-side GTM is a bit pointless: you still have to integrate a full GA4 installation on the client via GTM (because some GA4 events, like session_start, are only generated by the client-side GA4 code), so it does not even save page weight by reducing JS code.
The situation changes somewhat when you have additional marketing tags, since you do not have to send a request to server-side GTM for every tag but can instead reuse the data from the GA request for the other tags. With a bit of programming you can also convert JavaScript cookies into server-set cookies, which makes them a tad more robust against some ad-blocking features, and you can redact, e.g., personal data before you fire tags to a marketing vendor.
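Two of those server-side ideas, sketched in Python under stated assumptions: `redact_event` drops parameters whose names look like personal data (the key list is hypothetical) and masks email addresses before anything goes to a vendor, and `reissue_cookie` builds a `Set-Cookie` value so that a cookie originally written by JavaScript is re-set by the server, since some browsers cap the lifetime of script-written cookies more aggressively than server-set ones. This is an illustration of the idea, not how sGTM itself implements either feature.

```python
import re
from http.cookies import SimpleCookie

# Hypothetical parameter names treated as personal data.
PII_KEYS = {"email", "phone", "first_name", "last_name"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_event(params: dict) -> dict:
    """Drop PII-named params and mask email addresses inside string values."""
    clean = {}
    for key, value in params.items():
        if key.lower() in PII_KEYS:
            continue  # never forward these keys to the vendor
        if isinstance(value, str):
            value = EMAIL_RE.sub("[redacted]", value)
        clean[key] = value
    return clean


def reissue_cookie(name: str, value: str, domain: str, max_age_days: int = 400) -> str:
    """Build a Set-Cookie header value that re-sets a JS cookie from the server."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["domain"] = domain
    morsel["path"] = "/"
    morsel["max-age"] = max_age_days * 24 * 60 * 60
    morsel["secure"] = True
    morsel["samesite"] = "Lax"  # requires Python 3.8+
    # Deliberately not HttpOnly: the client-side script still needs to read it.
    return morsel.OutputString()
```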
The limits are tags that interact with the website and tags that rely on third-party cookies (neither is possible from a server-side container), so feedback tools, remarketing tags and the like cannot be run from a server-side container.
But yes, if all you want to do is to run GA4, then you do not need GTM at all (neither server-side nor client side) and would be better off with gtag.js (a sort of trimmed-down tag deployment solution for Google tags only).

How can I get website analytics data of bots and users with no javascript and/or cookies?

Is it possible to get the info about website traffic without Google Analytics, for example?
For tracking without JavaScript you can use the Measurement Protocol: you send a request to the Google Analytics server with query parameters that specify the type of interaction (pageview, event, etc.) and the data you want to track.
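A sketch of a Universal Analytics (v1) Measurement Protocol pageview built on the server in Python. The parameter names (`v`, `tid`, `cid`, `t`, `dh`, `dp`, `dt`) are the v1 ones; the function name and example values are illustrative. Standard Universal Analytics properties stopped processing data in July 2023, and GA4's Measurement Protocol uses a different endpoint and payload, so this only illustrates the approach the answer describes.

```python
import uuid
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol (v1) endpoint.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id: str, client_id: str, host: str, path: str,
                       title: str = "") -> str:
    """Return the URL of a v1 Measurement Protocol pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": host,          # document host name
        "dp": path,          # document path
    }
    if title:
        params["dt"] = title  # document title (optional)
    return f"{COLLECT_URL}?{urlencode(params)}"


# Without the JS snippet, the server has to mint (and persist) the client ID itself.
new_client_id = str(uuid.uuid4())
```

Sending the hit is then a plain GET from the server, e.g. `urllib.request.urlopen(build_pageview_hit(...))`.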
General info is in the protocol reference, the valid parameters are listed in the parameter reference, and if you want to test your tracking requests you can use the Hit Builder, which lets you assemble, validate and send a hit to Google Analytics.
As for tracking without cookies, that probably won't work very well. You need a persistent ID to assemble hits into sessions, and sessions into visitors. Such an ID is usually stored in a cookie (your "no JavaScript" requirement rules out things like local storage). You can either decorate all your links with a client ID and use it to carry the parameter from page to page, or use some sort of server-side browser fingerprinting.
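The link-decoration option can be sketched in Python as follows (the `cid` parameter name and the helper names are hypothetical). Internal links get the visitor's ID appended, external links are left alone so the ID doesn't leak to other sites, and on the next request the server reads the ID back from the URL or mints a fresh one with `uuid4`.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

CID_PARAM = "cid"  # hypothetical query parameter carrying the client ID


def decorate_link(href: str, client_id: str, own_host: str) -> str:
    """Append the client ID to links that stay on our own site."""
    parts = urlparse(href)
    if parts.netloc and parts.hostname != own_host:
        return href  # external link: don't leak the ID
    # Drop any stale ID, keep the other params, then add the current one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != CID_PARAM]
    query.append((CID_PARAM, client_id))
    return urlunparse(parts._replace(query=urlencode(query)))


def client_id_from_request(url: str) -> str:
    """Reuse the ID carried in the URL, or mint a new one on the first visit."""
    for key, value in parse_qsl(urlparse(url).query):
        if key == CID_PARAM and value:
            return value
    return str(uuid.uuid4())
```

Every link in every rendered page has to go through `decorate_link`; a single undecorated link starts a new "visitor", which is part of why this is less trivial than it looks.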
All in all this might be somewhat less trivial than you assume, especially if you do not only want to track pageviews but also events that do not load a new page.

Best way to exclude traffic from homepage to app?

I need to set up a filter (or segment) that excludes web app traffic.
On our site we have a button on the top right that takes the visitor to portal.domain.com which is where the web app is hosted.
I need a GA view that shows me traffic in which they have not clicked this button and gone on to use the web app.
What's the best way to set this up?
Thanks
Add an exclude filter on Campaign Source and set it to your domain. That will remove hits in the view that came from that URL.
You should be able to read the traffic that you want using a filter.
Is that page tracked by the same GA Account? If so, you set the filter to have a condition that excludes that page (although you'd probably identify it using its hostname, and you'd also have to be careful about cookie domains).
Alternatively, you could fire an event when people leave for that page, and then use that event in the segment.
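To make the difference between the two suggestions concrete, here is a toy Python model. It is not GA's API: the `Hit` type and the `go_to_app` event name are hypothetical, and `portal.domain.com` comes from the question. A view filter works hit by hit, so it only removes the app's own hits; a segment can exclude whole sessions that reached the app or fired the departure event, which is closer to "traffic in which they have not clicked this button".

```python
from dataclasses import dataclass
from typing import List

APP_HOST = "portal.domain.com"  # from the question
DEPARTURE_EVENT = "go_to_app"   # hypothetical event action fired on the button


@dataclass
class Hit:
    hostname: str
    event_action: str = ""  # empty for pageviews


def hit_filter(hits: List[Hit]) -> List[Hit]:
    """View filter: drops individual hits on the app host, keeps the rest."""
    return [h for h in hits if h.hostname != APP_HOST]


def session_reached_app(session: List[Hit]) -> bool:
    """True if the session ever hit the app or fired the departure event."""
    return any(h.hostname == APP_HOST or h.event_action == DEPARTURE_EVENT
               for h in session)


def sessions_without_app(sessions: List[List[Hit]]) -> List[List[Hit]]:
    """Segment: keeps only sessions that never went on to the web app."""
    return [s for s in sessions if not session_reached_app(s)]
```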

Techniques to Trigger Google Analytics Tracking from PDF Links

Here's the scenario:
I have a mailing list that contains a PDF download link. The PDF contains ads with clickable links. I need to get analytic data on the link clicks - preferably via Google Analytics (due to the richness of information available).
The solution I have in mind is for the link to go to a web page that I host with some sort of ad-specific token. GA records the request and then I use a client-side technique to redirect to the actual target URL. The redirect page serves no purpose other than to track the click and so I'm not worried about it being perceived as cloaking by search engines.
What I want to know is:
Are there any alternative ways to achieve the tracking without using an intermediate redirect page (could I perhaps call GA server-side somehow)?
If I do use the redirect page approach, what potential pitfalls could I encounter?
Thanks in advance for any advice.
Not sure what server-side environment/language you use, but in PHP, for instance, you can use cURL to send the image request to Google, with your custom data appended to the URL. The easiest way to get the format right is to output the tracking code with JavaScript, including your custom data, and capture the image request URL with a sniffer, so you can replicate the format in your cURL request. Make sure to send header info, including (fake) browser info, so GA doesn't weed the hit out as a bot. Then forward to the ad URL. That way you don't need to output a page.
Yes, you still have a 'redirect' happening, but you no longer need the client to download a page or worry about JavaScript being disabled, etc.
Unfortunately, there really isn't anything better you can do.
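A sketch of that redirect endpoint's logic in Python, under these assumptions: the PDF links look like `https://track.example.com/click?ad=<token>`, tokens map to destinations through a hypothetical `AD_TARGETS` table, and the hit uses the Universal Analytics v1 event parameters (standard UA properties stopped processing data in July 2023). Looking the destination up in a fixed table instead of taking it from the URL keeps the page from becoming an open redirect, one of the pitfalls the question asks about. As a variation on the answer, the sketch passes the visitor's real user agent through the v1 `ua` override parameter rather than faking one.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode

COLLECT_URL = "https://www.google-analytics.com/collect"
TRACKING_ID = "UA-XXXXX-Y"  # placeholder property ID

# Hypothetical token -> destination table; only these targets are allowed.
AD_TARGETS = {
    "spring-sale": "https://shop.example.com/sale",
    "partner-42": "https://partner.example.org/offer",
}


def handle_click(query_string: str, client_id: str,
                 user_agent: str) -> Tuple[int, Optional[str], Optional[str]]:
    """Return (status, redirect location, hit URL) for one PDF ad click."""
    token = parse_qs(query_string).get("ad", [""])[0]
    target = AD_TARGETS.get(token)
    if target is None:
        return 404, None, None  # unknown token: refuse rather than redirect anywhere
    hit = {
        "v": "1", "tid": TRACKING_ID, "cid": client_id,
        "t": "event", "ec": "pdf_ad", "ea": "click", "el": token,
        "ua": user_agent,  # user agent override: forward the visitor's own UA
    }
    return 302, target, f"{COLLECT_URL}?{urlencode(hit)}"
```

The web layer would send `hit_url` (ideally without making the visitor wait for it) and then answer with an HTTP 302 to the returned location.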

How does Google Analytics guarantee that the tracking record is coming from the real site?

When you sign up to google analytics it instructs you to use a javascript snippet on every page you want to track. This code includes an API key, which is visible to everyone who views your source code.
How does it guarantee that the request is coming from the real site and not from a third party who wants to mess with your statistics? Does it check the HTTP Referer header? Even that is not safe, as it can be forged.
GA doesn't (to the best of my knowledge) attempt to verify that the site ID (the UA-XXXXX-XX code) matches a domain specified in the GA setup - I think this is a good thing, as you can track a bunch of related sites as though they were a single site (think single-product minisites, for example). However, this does leave the GA profile open to accidental or malicious use of the UA code on other unrelated sites.
The easiest way to fix this is to add a filter to the GA profile that restricts reported data to a specified set of hostnames. This will clean out the accidental typo problem, but malicious actors could work around it if they were really interested (though they'd be more likely to grief your PPC campaigns instead).
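An include filter on Hostname is usually written as an anchored regular expression; the Python sketch below applies one to a list of hit hostnames (the `example.com` hostnames and the function name are hypothetical). It removes hits from unrelated sites that copied the tracking ID, but anyone sending hits directly can simply set an allowed hostname, which is the limitation the answer mentions.

```python
import re
from typing import Iterable, List

# Hypothetical allow-list, anchored so look-alikes such as
# "example.com.evil.test" do not slip through.
VALID_HOSTNAMES = re.compile(r"^(www\.)?example\.com$|^shop\.example\.com$")


def keep_valid_hits(hostnames: Iterable[str]) -> List[str]:
    """Mimic an include filter on Hostname: keep only hits from our own sites."""
    return [h for h in hostnames if VALID_HOSTNAMES.search(h)]
```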
