Google Analytics 4 vs. UA: Why are Event Counts Different?

My event counts between GA4 and UA are different. They are not drastically different (maybe about 10%), but the numbers are still off. If the tags and triggers are all the same in GTM, shouldn't the event counts be identical? What would cause them to be about 10% off, or is this normal?

First of all, it has by now become widely agreed that GA4's pre-built reports are less than reliable. We suggest avoiding the pre-cooked reports in GA4 and instead either using Explore, or exporting the data and using something else entirely (preferably), if you have the resources for it.
Secondly, make sure you don't have Google Signals enabled, since it changes the thresholding/sampling logic. Also, switch to device-only reporting; it helps with thresholding. More on it here. It's important that Signals are never enabled: it looks like the thresholding logic won't be fixed even if you disable them after enabling. Some report that the thresholding decreases after you give it a few months; others claim that once Signals have affected the data, the data can't be restored.
Event parameter cardinality. Here's more about it. Despite what Google claims, GA4 is still full of bugs and unpleasant features. One of them is the cardinality of your event parameters' values. Keep it low. Otherwise, row limits kick in hard and you end up seeing a large percentage of your events lumped under "(other)", even when you don't use the high-cardinality dimension in your report.
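As a rough illustration of keeping cardinality down, here is a minimal gtag.js sketch (assuming gtag.js is already loaded on the page; the event name, parameter name and thresholds are made up) that buckets a raw numeric value into a few ranges before sending it, instead of sending the exact number as a parameter value:

// Hypothetical helper: collapse an exact result count into a handful of buckets
// so the event parameter keeps a low cardinality in GA4.
function bucketResultsCount(count) {
  if (count === 0) return '0';
  if (count <= 10) return '1-10';
  if (count <= 100) return '11-100';
  return '100+';
}

var resultsCount = 137; // e.g. read from the page

gtag('event', 'search_results', {
  // Send the bucketed range instead of the raw integer.
  results_bucket: bucketResultsCount(resultsCount)
});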
Data retention. See the limitations on it here. Yep, no more free access to old data for your precious ad-hoc YoY analysis, so if you're counting old events, no luck. UA will show them to you, but GA4 wipes them. GA4 still maintains the pre-cooked aggregated reports beyond the retention window, but you can't drill into them the way you could in UA, and they're not accurate anyway.
These are generic suggestions. More debugging would have to be done on your side to find out exactly which data points at which times aren't being counted. Data exports to BQ would help narrow it down. But at this point, the general consensus among analysts is that we shouldn't compare GA4's data to that of UA. I personally don't agree with that consensus, since it's always good to know where the difference comes from, but it has become almost an industry standard today.

I posted a finding to this thread.
I had the same issue with purchase events differing between UA and GA4.
Universal Analytics was always showing higher numbers, even though the triggers were exactly the same.
Then I enabled the data export to BigQuery, and it turned out that GA4 only shows those transactions in the GA4 UI that have a value for the field user_pseudo_id (you only see this field in the BigQuery data export). There were transactions where the field was null, and apparently these don't show up in the UI.
I would recommend looking at the raw events in BigQuery; the data export is free as long as you don't go crazy with ETLs and queries.
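If you want to check this on your own property, a minimal Node.js sketch along these lines (using the official @google-cloud/bigquery client; the project and dataset names are placeholders for your own GA4 export, and the date range is arbitrary) would compare total purchase events against those missing user_pseudo_id:

// Sketch: count purchase events with and without user_pseudo_id in the GA4 export.
const { BigQuery } = require('@google-cloud/bigquery');

async function checkPurchases() {
  const bigquery = new BigQuery();
  const query = `
    SELECT
      COUNTIF(user_pseudo_id IS NULL) AS purchases_without_pseudo_id,
      COUNT(*) AS total_purchases
    FROM \`my-project.analytics_123456789.events_*\`
    WHERE event_name = 'purchase'
      AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'`;
  const [rows] = await bigquery.query({ query });
  console.log(rows[0]); // e.g. { purchases_without_pseudo_id: 12, total_purchases: 550 }
}

checkPurchases().catch(console.error);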

Related

Explore Free Form report in Google Analytics

I am trying to generate a report using the Google Analytics Explore tab with the Free Form technique. A few weeks ago I could use Message name, Stream name and Time to see all the notification names, platforms and total number of clicks. I exported the result to an Excel file.
But today when I tried to generate the same report, I couldn't find the "Message Name" dimension. Has this field been removed from the predefined/custom dimensions in GA, or am I doing something wrong?
My main purpose is to get all list of notifications sent via Firebase.
Any help will be deeply appreciated.
Given that you've excluded the obvious issues, like querying data that is too fresh, the proper way to debug this is to export the data into a sample BQ table and then conduct exactly the same analysis there that you're trying to conduct in GA4's Explore. From there, if your issue is with Explore's filters, you will quickly see it.
If, however, you're able to see your event properties in BQ but not able to get Explore to display them... well, Google likely saved quite a lot of money on GA4. UA was pretty expensive to run. GA4 now introduces all these amazing features like data retention limits, parameter-value cardinality bugs, odd inconsistencies between Explore reports and default reports, and so on.
For now, the best way to really access your data minus all the artificial limitations of GA4 is to ETL your data from there either through the reporting API or exporting it to BQ.

Google Analytics suddenly started Sampling Data, 3k sessions for property over time period

We are using the free level of GA and have been creating reports using Custom Dimensions and Metrics since last summer.
We also use the Google Sheets Analytics add-on to post process data pulled from the API.
Overnight on 16-17 May (UK time), our reports suddenly started showing as sampled. Prior to that we had no sampling at all; because our reports are scheduled, I can look back through the revision history and see exactly when the change appeared in the scheduled runs.
This sampling is occurring in custom reports viewed in the GA platform and in GA sheets. I've done some analysis and it appears to only occur at the point that more than one Custom Dimension is added to a report, or when the GA dimensions ga:hour or ga:dateHour are used (ga:date does not trigger sampling).
All our Custom Dimensions and Custom Metrics are set at Hit level (I've read a post where it was claimed to be due to mixing scopes on Dimensions & Metrics, but we are not doing this).
If I reduce the date range of a query (suggested as a solution on many blogs), the sampling level actually gets worse rather than better.
For the month of May we didn't even hit 4k sessions at property level. I can't find any reference anywhere to any changes being made to GA that would cause sampling to apply to our reports (change documentation, Google Blogs etc).
Is anyone else experiencing this or can anyone shed any light on why this might be happening? Given how we use GA if we can't resolve this then it's a year of work down the drain, so I'm really keen to at least know why this has suddenly happened even if ultimately nothing can be done about it.

How to Sample Adobe Analytics (Omniture) Data

I can't find anything on the web about how to sample Adobe Analytics data? I need to integrate Adobe Analytics into a new website with a ton of traffic so the stakeholders want to sample the data to avoid exorbitant server calls. I'm using DTM but not sure if that will help or be a non-factor? Can anyone either point me to some documentation or give me some direction on how to do this?
Adobe Analytics does not have any built-in method for sampling data, either on their end or in the JS code.
DTM doesn't offer anything like this either. It doesn't have any (exposed) mechanisms in place to evaluate all requests made to a given property (container); any rules that extend state beyond "hit" scope are cookie-based.
Adobe Target does offer the ability to output code based on a percentage of traffic, so you could achieve sampling that way, but really, you're just trading one server call cost for another.
Basically, your only solution would be to create your own server-side framework for conditionally outputting the Adobe Analytics (or DTM) tag, to achieve sampling with Adobe Analytics.
Update:
@MichaelJohns' comment below:
We have a file that we use as a boot strap file to serve the DTM file.
What I think we are going to do is use some JS logic and cookies
around that to determine if a visitor should be served the DTM code.
Okay, well maybe I'm misunderstanding what your goal here is (though I don't think I am), but that's not going to work.
For example, if you only want to output tracking for 50% of visitors, how would you use JavaScript and cookies alone to achieve this? In order to know that you are only tracking 50%, you need to know the total number of people in play. By themselves, JavaScript and cookies only know about ONE browser, ONE person. They have no way of knowing anything about the other visitors unless you have some sort of shared state between all of them, like keeping a count in a database server-side.
The best you can do solely with JavaScript and cookies is basically flip a coin. In this example of 50%, you'd pick a random number between 1 and 100; the lower half gets no tracking, the higher half gets tracking.
The problem with this is that it is possible for the pendulum to swing 100% one way or the other. It is the same principle as flipping a coin 100 times in a row: it is entirely possible that it can land on tails all 100 times.
In theory, the trend over time should show an overall average of 50/50, but this has a major flaw: you may have one month with a ton of traffic and another with very little, or a week with very little traffic followed by one day with a lot. You really have no idea how that will manifest over time; you can't know which way your pendulum is swinging unless you ARE actually recording 100% of the traffic to begin with. The effect of all this is that it will absolutely destroy your trended data, which is the foundation of any kind of meaningful analysis.
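To make that concrete, here is a minimal sketch of the client-side "coin flip" version (cookie name and 50% split are arbitrary) - i.e. exactly the approach being warned against:

// Sketch of the client-side coin flip. Each NEW visitor is independently
// assigned at random; nothing guarantees the overall split stays at 50/50.
function shouldTrackVisitor() {
  var match = document.cookie.match(/(?:^|;\s*)track=(yes|no)/);
  if (match) {
    return match[1] === 'yes'; // returning visitor: reuse the earlier decision
  }
  var track = Math.random() < 0.5 ? 'yes' : 'no'; // the coin flip
  document.cookie = 'track=' + track + '; path=/; max-age=' + 60 * 60 * 24 * 365;
  return track === 'yes';
}

if (shouldTrackVisitor()) {
  // output / load the DTM + Adobe Analytics tag here
}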
So basically, if you really want to reliably output tracking to a % of traffic, you will need a mechanism in place that does in fact record 100% of traffic. If I were going to roll my own homebrewed "sampler", I would do this:
In either a flatfile or a database table I would have two columns, one representing "yes", one representing "no". And each time a request is made, I look for the cookie. If the cookie does NOT exist, I count this as a new visitor. Since it is a new visitor, I will increment one of those columns by 1.
Which one? It depends on what percent of traffic I am wanting to (not) track. In this example, we're doing a very simple 50/50 split, so really, all I need to do is increment whichever one is lower, and in the case that they are currently both equal, I can pick one at random. If you want to do a more uneven split, e.g. 30% tracked, 70% not tracked, then the formula becomes a bit more complex. But that's a different topic for discussion (also, there are a lot of papers, documents and wikis out there, published by people a lot smarter than me, that can explain it far better than I can!).
Then, if it is fated that I incremented the "yes" column, I set the "track" cookie to "yes". Otherwise I set the "track" cookie to "no".
Then in my controller (or bootstrap, router, whatever all requests go through), I look for the cookie called "track" and see if it has a value of "yes" or "no". If "yes", I output the tracking script. If "no", I do not.
So in summary, the process would be (see the code sketch below):
Request is made
Look for cookie.
If cookie is not set, update database/flatfile incrementing either yes or no.
Set cookie with yes or no.
If cookie is set to yes, output tracking
If cookie is set to no, don't output tracking
Note: Depending on language/technology of your server, cookie won't actually be set until next request, so you may need to throw in logic to look for a returned value from db/flatfile update, then fallback to looking for cookie value in last 2 steps.
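For illustration only, here is a rough Node.js/Express sketch of that flow, under a few stated assumptions: the in-memory counters stand in for the database/flatfile, the cookie and route names are arbitrary, and the script path is a placeholder for your DTM embed code.

// Sketch of the server-side sampler described above (Node.js + Express).
const express = require('express');
const cookieParser = require('cookie-parser');

const counts = { yes: 0, no: 0 }; // would live in a database/flatfile in reality
const app = express();
app.use(cookieParser());

app.use(function (req, res, next) {
  var track = req.cookies.track;
  if (track !== 'yes' && track !== 'no') {
    // New visitor: increment whichever bucket is lower; pick at random on a tie.
    if (counts.yes < counts.no) track = 'yes';
    else if (counts.no < counts.yes) track = 'no';
    else track = Math.random() < 0.5 ? 'yes' : 'no';
    counts[track] += 1;
    res.cookie('track', track); // the browser only sends this back on the NEXT request
  }
  res.locals.track = track; // so THIS request can already use the decision
  next();
});

app.get('/', function (req, res) {
  var tag = res.locals.track === 'yes'
    ? '<script src="/path/to/your-dtm-embed.js"></script>' // placeholder embed code
    : '';
  res.send('<html><head>' + tag + '</head><body>...</body></html>');
});

app.listen(3000);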
Another (more general) note: In general, you should beware sampling. It is true that some tracking tools (most notably Google Analytics) samples data. But the thing is, it initially records all of the data, and then uses complex algorithms to sample from there, including excluding/exempting certain key metrics from being sampled (like purchases, goals, etc.).
Just think about that for a minute. Even if you take the time to set up a proper "sampler" as described above, you are basically throwing out data proving that people are doing key things on your site - the important things that help you decide how to give visitors a better experience - so the only way around that is to start recording everything internally and factor those things into whether or not to send the data to AA.
But all that aside.. Look, I will agree that hits are something to be concerned about on some level. I've worked with very, very large clients with effectively unlimited budgets, and even they worry about hit costs racking up.
But the bottom line is you are paying for an enterprise level tool. If you are concerned about the cost from Adobe Analytics as far as your site traffic.. maybe you should consider moving away from Adobe Analytics, and towards a different tool like GA, or some other tool that doesn't charge by the hit. Adobe Analytics is an enterprise level tool that offers a lot more than most other tools, and it is priced accordingly. No offense, but IMO that's like leasing a Mercedes and then cheaping out on the quality of gasoline you use.

Could "filling up" Google Analytics with millions of events slow down query performance / increase sampling?

Considering doing some relatively large scale event tracking on my website.
I estimate this would create up to 6 million new events per month in Google Analytics.
My questions are, would all of this extra data that I'm now hanging onto:
a) Slow down GA UI performance
and
b) Increase the amount of data sampling
Notes:
I have noticed that GA seems to be taking longer to retrieve results for longer time ranges on my website lately, but I don't know whether it has to do with the increased amount of event tracking I've been doing or not – it may be that GA is fighting for resources as it matures and as more and more people collect more and more data...
Finally, one might guess that adding events would only slow down reporting on events, but that isn't necessarily so, is it?
Drewdavid,
The amount of data being loaded will influence GA's performance, but nothing really dramatic, I would say. I am running a website/app with 15+ million events per month, and even though all the reporting is automated via the API, every now and then we need to find something specific and use the regular GA UI.
More than speed, I would be worried about sampling. That's the reason we automated the reporting in the first place, as there are some ways you can eliminate it (with some limitations). See this post, for instance, which describes using Analytics Canvas, one of my favorite tools (I am not affiliated in any way :-).
Also, let me ask what the purpose of your events would be. Think twice about whether you would actually use them later on...
Slow down GA UI performance
Standard Reports are precompiled and will display as usual. Reports that are generated ad hoc (because you apply filters, segments etc.) will take a little longer, but not so much that it hurts.
Increase the amount of data sampling
If by "sampling" you mean throwing away raw data, Google does not do that (I actually have that in writing from a Google representative). However the reports might not be able to resolve all data points (e.g. you get Top 10 Keywords and everything else is lumped under "other").
However, those events will count towards your data limit, which is ten million interaction hits (pageviews, events, transactions, any single product in a transaction, user timings and possibly others). Google will not drop data or close your account without warning (again, I have that in writing from a Google Sales Manager), but they reserve the right either to force you to collect fewer interaction hits or to close your account some time after they have issued a warning (actually they will ask you to upgrade to Premium first, but chances are you don't want to spend that much money).
Google is pretty lenient when it comes to violations of the data limit, but other people's leniency is not a good basis for a reliable service, so you want to make sure that you stay within the limits.

Google Analytics Event Tracking reporting inflated event values

We recently released two typefaces on our website for free (albeit suggesting an optional donation). I decided we should track downloads through Google Analytics using the event feature, so we ended up adding the corresponding JS snippet to the download form (on submit), something akin to this:
_gaq.push(['_trackEvent', 'Typeface', 'Download', 'Typeface #1', parseInt($('input[name=amount]').val(), 10) || 0]);
I also decided we might as well use GA to keep track of donations, so as you might have noticed the optional donation amount is being sent as the event value argument. There's already a browser-side numeric-only verification, and it will set it to 0 in case it's empty (NaN), so we're completely sure it's always an integer (required type for the argument).
I configured two different goals (one for each typeface) in our GA profile, using the two different events as their respective conditions, as recommended by every howto I've been reading about this subject.
However, some of the reported data appears to be somewhat inflated. According to GA there have been, as of now, 455 unique events out of 550 total events, which seems to be okay, but apparently they are worth a value of over a million dollars. And, believe me on this, we have not received such a huge amount, at least not yet.
According to GA: Event Value is the total value of an event or set of events. It is calculated by multiplying the per-event value by the number of times the event occurred.
I assumed I could set individual values to different instances of the same event, even GA documentation leads me to believe so with their examples, so I don't really understand why it's being reported as such an inflated total value.
Is there something wrong with my assumption? Is this the correct approach to what I'm trying to accomplish? Should I just forget about keeping track of donations using this method and resort to using the e-commerce feature instead, as I've also been reading about?
I'm not checking for any verification of a donation successfully completing, so I'm left with an estimate and I'm okay with that. Maybe someone jokingly wrote off some exaggerated amount then never completed the donation process?
Your assumption is right: you can set individual values for each event, and "the report adds the total values based on each event count" (as explained in the docs).
The main problem with your approach is the one you mentioned: you count the donation at form validation, before it is confirmed and even before you tell your visitor that the donation must be made via PayPal. So yes: some people probably wrote in an exaggerated amount or simply didn't complete the donation process.
I recommend using e-commerce tracking after the PayPal payment, to avoid tracking unconfirmed donations and to avoid the lack of deduplication you get when using goal values to monitor amounts.
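For reference, a minimal sketch of the classic (ga.js) e-commerce calls on the post-payment "thank you" page could look like this - the transaction ID, SKU and amounts are placeholders you would fill in from the confirmed PayPal transaction:

// Record the confirmed donation as a transaction (values are placeholders).
_gaq.push(['_addTrans',
  '1234',            // transaction ID, e.g. the PayPal transaction ID
  'Typeface Store',  // affiliation (optional)
  '25.00',           // grand total
  '0.00',            // tax
  '0.00',            // shipping
  '', '', ''         // city, state, country (optional)
]);
_gaq.push(['_addItem',
  '1234',            // same transaction ID as above
  'TYPEFACE-1',      // SKU
  'Typeface #1',     // product name
  'Donation',        // category
  '25.00',           // unit price (the donation amount)
  '1'                // quantity
]);
_gaq.push(['_trackTrans']); // submit the transaction to Google Analytics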
