Handling Google Analytics in an offline-enabled web app

I am a developer of Codiva, a Java IDE and online compiler. I am working on improving offline support, reducing network usage, and reducing latency by pre-caching as much as possible.
I want to know how to handle requests to Google Analytics.
First is the GA script. I use Google Tag Manager to set up GA. Is it okay to cache that request, that is, can I use a networkFirst strategy for this request? Or should it always be networkOnly?
How can I make sure that actions that happened offline get tracked correctly?
I am planning to start using Firebase for some features, and Firebase also has some kind of analytics. Would it automatically handle analytics when the device goes offline?

Use the Service Worker helper for Google Analytics:
https://developers.google.com/web/updates/2016/07/offline-google-analytics?hl=en
Try PWA Template https://github.com/StartPolymer/progressive-web-app-template

First is the GA script. I use Google Tag Manager to set up GA. Is it okay to cache that request, that is, can I use a networkFirst strategy for this request? Or should it always be networkOnly?
I'm not sure it's wise to cache the GTM script. The analytics.js script is relatively static, but the GTM script can be updated by anyone who has access to your GTM account. Changes made in there obviously wouldn't get propagated to users of the cached version of the script.
How can I make sure that actions that happened offline get tracked correctly?
The key is to use the qt parameter, which allows you to send a hit after the fact, and specify its time offset.
There's an unofficial service worker script that does this today that you should take a look at. It will probably become officially supported sometime soon:
https://gist.github.com/jeffposnick/466ef7578c4c880a78c7270e6ac69620
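To illustrate the qt idea, here is a minimal sketch. It assumes each failed hit was stored together with the time it was originally attempted. The helper name withQueueTime and the stored-hit shape are hypothetical and are not taken from analytics.js or from the linked script.

```javascript
// Hypothetical helper: add (or overwrite) the Measurement Protocol `qt`
// parameter on a stored hit payload before replaying it.
// `payload` is the URL-encoded hit body; `hitTime` and `now` are epoch ms.
function withQueueTime(payload, hitTime, now) {
  const params = new URLSearchParams(payload);
  // qt is the number of milliseconds between the hit occurring and being sent.
  // Clamp at 0 in case the clock moved backwards in between.
  const queueTime = Math.max(0, now - hitTime);
  params.set('qt', String(queueTime));
  return params.toString();
}

// Example: a pageview recorded 90 seconds before it could be sent.
const replayed = withQueueTime('v=1&t=pageview&dp=%2Fhome', 1000, 91000);
// replayed === 'v=1&t=pageview&dp=%2Fhome&qt=90000'
```

In a real service worker, hitTime would be saved when the request first fails and now would be Date.now() at replay time.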
I am planning to start using Firebase for some features, and Firebase also has some kind of analytics. Would it automatically handle analytics when the device goes offline?
At this point Firebase analytics is mobile-only. If you're using their web SDK, I don't think you get any analytics at this point.

Related

Workbox Google Analytics doesn't log events to resend later

I'm trying to implement the Workbox Google Analytics plugin in my application, which runs offline most of the time. I also used the webpack plugin to generate the service worker (SW), via GenerateSW({offlineGoogleAnalytics: true}). Unfortunately, after the SW is registered, it doesn't create a database for Workbox background sync, nor a table for the requests (as seen in this demo). In my application the SW is active and running, but it doesn't make requests or set up IndexedDB. I'm using a local version of gtag.js, and it works as expected when the application is online: the events are logged in the Analytics real-time dashboard. But offline, the requests aren't saved to be resent later. What did I miss for this feature to work?

How can you prevent fake events in firebase analytics

I've recently set up Firebase Analytics with my website. I was beginning to add some events to be logged and realized any arbitrary event could artificially be called. I could just go into my browser's console and run the command firebase.analytics().logEvent('some_fake_event').
If you know a website is using firebase analytics, what's to stop you from simply spamming fake events into your console? The website owner's analytics would become fairly screwed up. Also, firebase mentions that there's a 500 event-type limit. One could also run firebase.analytics().logEvent('fake_event_1'), firebase.analytics().logEvent('fake_event_2'), etc. Oops, the website owner can't create any more new (legitimate) event types.
What is in place to prevent this?
The logEvent method is available to anyone as long as you rely on the standard Firebase SDK. The only way to prevent this would be to obfuscate the library you're using.

Should I use Workbox runtime caching with staleWhileRevalidate to cache gtm.js?

I'm using GTM in my Next.js app, together with next-offline, which uses workbox-webpack-plugin internally for offline support. Is it a good idea to use a runtime caching staleWhileRevalidate strategy to cache gtm.js?
My app works offline: it saves analytics hits while offline and sends them when back online, by importing this script:

    // Initialize offline google analytics which will store failed analytics requests and try again later when connection is back
    // it will also cache the analytics.js library
    workbox.googleAnalytics.initialize({
        // using a custom dimension(cd1) to track online vs. offline interactions
        parameterOverrides: {
            cd1: "offline"
        },
        // Using a custom metric to track time requests spent in the queue
        hitFilter: params => {
            const queueTimeInSeconds = Math.round(params.get("qt") / 1000);
            params.set("cm1", queueTimeInSeconds);
        }
    });

Say a user opens my home page on a second visit. I use runtime caching with a networkFirst strategy for the HTML, so if the user revisits the home page while completely offline, they get a fully working app, especially since I use the same networkFirst runtime caching strategy for the API requests. But while completely offline, the request to fetch gtm.js will fail, so analytics won't work offline: gtm.js never initiates the request for analytics.js, which would otherwise be served from the Workbox cache. My idea is to cache gtm.js with a staleWhileRevalidate strategy, so that offline analytics works even if the user opens the app offline, and those hits are resent by Workbox once the user is back online.
Is this a good idea? Will it work as expected, or is there something I'm missing?
I'm not familiar with gtm.js, but workbox-google-analytics will automatically create an appropriate runtime caching route to handle offline access to the analytics.js and gtag.js scripts for you:
Workbox Google Analytics does exactly this. It also adds fetch handlers to cache the analytics.js and gtag.js scripts, so they can also be run offline. Lastly, when failed requests are retried Workbox Google Analytics also automatically sets (or updates) the qt in the request payload to ensure timestamps in Google Analytics reflect the time of the original user interaction.
It sounds like gtm.js is loaded from a different URL than gtag.js and might have a different syntax for its collection pings, so filing a feature request in the Workbox GitHub repo asking for gtm.js support sounds like your best bet.
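For reference, the runtime-caching idea from the question could be sketched roughly as follows. This is only an illustration under assumptions: it assumes gtm.js is served from www.googletagmanager.com/gtm.js, that workbox-sw is loaded in the service worker, and that the Workbox version exposes strategies as classes (older releases use a workbox.strategies.staleWhileRevalidate() factory instead). The isGtmScript helper and the 'gtm-script' cache name are made up for this example.

```javascript
// Hypothetical matcher: true only for the GTM container script.
// Assumption: the container is loaded from www.googletagmanager.com/gtm.js.
function isGtmScript(url) {
  return url.hostname === 'www.googletagmanager.com' && url.pathname === '/gtm.js';
}

// Register the route only where workbox-sw is available (i.e. inside the
// service worker); elsewhere this block is skipped.
if (typeof workbox !== 'undefined') {
  workbox.routing.registerRoute(
    // Workbox passes the parsed request URL to the match callback.
    ({ url }) => isGtmScript(url),
    // Serve the cached copy immediately and refresh it in the background.
    new workbox.strategies.StaleWhileRevalidate({ cacheName: 'gtm-script' })
  );
}
```

With this strategy, a container change published in GTM would only reach a returning user on the visit after the cache has been refreshed, so tag updates lag by one load.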

Avoiding Google Tag Manager blocking by AdBlockers

I have used Amplitude analytics in the past in my React web app to send event data. However, I just started with Google Tag Manager and noticed it does not run because it is blocked by ad blockers. Amplitude was always functional because I loaded their JavaScript SDK through npm install 'github:amplitude/Amplitude-Javascript' and initialized it at app load with the client API key. I like the approach of Google Tag Manager, where I don't have to redeploy the app to make changes to my analytics logic. How can I take a similar approach to avoid being blocked by ad blockers?
It may well be that, because Google products are popular, ad blockers specifically block Google analytics products and not other analytics products.
You don't. If people don't want to be tracked, that is their decision. You should not be forcing people to provide you with any data they do not want to provide, especially by using some shady "bypassing" measures. Instead:
You could use a cookie to permanently disable your tracking of those who do not wish to be tracked, to help you preserve reliable analytics. See: http://www.multiminds.eu/2016/05/19/how-to-disable-tracking-via-google-tag-manager/
Or, better yet, simply measure the percentage of visitors who have disabled tracking so your analytical data can remain accurate. See: https://marthijnhoiting.com/detect-if-someone-is-blocking-google-analytics-or-google-tag-manager/
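As a rough sketch of the "measure the percentage" suggestion: the function below checks whether analytics.js actually loaded, relying on the ga.loaded flag that the library sets on the global ga function. The function name isAnalyticsBlocked and the idea of passing in a window-like object are assumptions made for this example, and it is not taken from the linked article.

```javascript
// Hypothetical check, meant to run some time after the page has loaded.
// `win` is a window-like object; in a browser, pass `window`.
function isAnalyticsBlocked(win) {
  // The standard snippet defines a stub `ga` function even when the script
  // itself is blocked, so the existence of `ga` alone proves nothing.
  // analytics.js sets `ga.loaded = true` once the library has really loaded.
  const gaLoaded = typeof win.ga === 'function' && win.ga.loaded === true;
  return !gaLoaded;
}
```

The result could then be reported to your own first-party endpoint and aggregated, which gives an estimate of how much traffic is missing from the analytics reports without tracking those visitors individually.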
Yes, it's possible.
You can use a reverse proxy for Google Tag Manager.
First, download the Google Analytics JavaScript library itself and host it on your server.
Then alter the code in the downloaded library to change the target host from www.google-analytics.com to your own domain name using find-replace.
Replace the link from the default Google Analytics script in your codebase to modified one.
Create a proxy endpoint to Google Analytics servers on your back end. One important step here is to additionally detect the client’s IP address and write it explicitly in requests to Google Analytics servers to preserve correct location detection.
Test the results. You’re done!
More details at freecodecamp.org/news/save-your-analytics-from-content-blockers and https://analytics-bypassing-adblockers.netlify.com
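The IP-preservation step above could look something like the sketch below. It assumes the proxy forwards hits to the Measurement Protocol /collect endpoint and uses the uip (IP override) parameter. The helper name buildUpstreamUrl and the placeholder base host are made up for this example, and how the client IP is obtained depends on your server setup.

```javascript
// Hypothetical helper for the proxy endpoint: build the upstream URL by
// copying the incoming query string and adding the client's IP as `uip`.
function buildUpstreamUrl(requestPathAndQuery, clientIp) {
  // The base host is only a placeholder needed to parse a relative URL.
  const incoming = new URL(requestPathAndQuery, 'https://proxy.invalid');
  const upstream = new URL('https://www.google-analytics.com/collect');
  incoming.searchParams.forEach((value, key) => {
    upstream.searchParams.append(key, value);
  });
  // Without this, Google Analytics would see the proxy server's IP and
  // attribute every hit to the server's location.
  upstream.searchParams.set('uip', clientIp);
  return upstream.toString();
}

const example = buildUpstreamUrl('/collect?v=1&t=pageview', '203.0.113.7');
// example === 'https://www.google-analytics.com/collect?v=1&t=pageview&uip=203.0.113.7'
```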
There's dataunlocker.com as well as some other open source alternatives (1, 2) which can help to fix reporting accuracy of Google Tag Manager, Amplitude, Google Analytics etc.
Talking about ethics and privacy, tools like DataUnlocker are just tools which allow you to bypass ad blockers as if you have implemented server-side analytics. I think by correctly implementing that "we use cookies" consent one can solve any privacy concerns.
I've managed to get around some blockers with the following in a node app:
    const express = require('express');
    const request = require('request');
    const app = express();

    // Forward /proxy/<absolute-url> to that URL and stream the response back.
    app.get('/proxy*', function (req, res) {
        const newurl = req.url.split('/proxy/')[1];
        const data = request(newurl);
        //data.on('response', function(response){console.log(JSON.stringify(response))});
        data.pipe(res);
    });

Then, in your GTM snippets, prepend "/proxy/" to the URL so that the call goes via your server.
The caveat with the above is that without additional code you can't preview the container, but the container does load correctly. Lack of preview is a different issue to deal with.

Google Tag Manager and sending data offline

I have a question about the following case. We want to track a content platform using Google Tag Manager. However, the platform is not always online, and GTM would send data to our internal server. Our concern is therefore whether data collected during an offline period will be kept or whether we lose it.
Do you know if there is some period during which data collected offline through Google Tag Manager is kept, so that it is sent to Google Analytics once the platform gets online?
Thank you,
Lukas
No, that is not how Google Tag Manager works. GTM for web is basically a JavaScript injection engine. It bundles your configured tags, triggers and variables with a selector engine and injects that into your page. There is no server-side component that stores data.
I'm sure one could come up with a solution to your problem, e.g. store your data with localStorage in the browser, poll your server to see if it is available, and when it's online send the data with a queue time parameter to Google Analytics. However, that has nothing to do with GTM.
Having said this, it is hard to understand your use case: if your server is offline, then where does the data come from?
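The localStorage approach mentioned above could be sketched as follows. This is a minimal illustration, not a tested solution: the class name OfflineHitQueue, the storage key and the send callback are all hypothetical, and the storage object only needs getItem and setItem, so window.localStorage would fit in a browser.

```javascript
// Hypothetical queue: stores hit payloads with their timestamps and replays
// them later with a `qt` (queue time, in ms) parameter added.
class OfflineHitQueue {
  constructor(storage, key = 'pendingHits') {
    this.storage = storage; // any object with getItem/setItem
    this.key = key;
  }

  _read() {
    return JSON.parse(this.storage.getItem(this.key) || '[]');
  }

  _write(hits) {
    this.storage.setItem(this.key, JSON.stringify(hits));
  }

  // Remember a URL-encoded hit payload and when it happened.
  add(payload, now = Date.now()) {
    const hits = this._read();
    hits.push({ payload, time: now });
    this._write(hits);
  }

  // Try to send every stored hit; hits whose send fails stay in the queue.
  // Returns the number of hits still pending.
  async flush(send, now = Date.now()) {
    const remaining = [];
    for (const hit of this._read()) {
      const params = new URLSearchParams(hit.payload);
      params.set('qt', String(Math.max(0, now - hit.time)));
      try {
        await send(params.toString());
      } catch (err) {
        remaining.push(hit);
      }
    }
    this._write(remaining);
    return remaining.length;
  }
}
```

In a browser, send would typically POST the payload to the analytics endpoint, and flush would be called once polling shows that the server is reachable again.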
If you have an offline-capable PWA (with a service worker), you can use the Workbox Google Analytics module to handle the collection of data and to report it upstream when your site comes back online.
This module has a service worker fetch handler that intercepts the calls that you would make with analytics.js or gtag.js, and stores your data locally in IndexedDB in the event that a call fails because the device is offline.

Resources