GA4 limits and sampling - google-analytics

In the internet i tryed to find information, but it very hard ,because in any forums i found diffrent information.
How many hits i can collct in GA4 property in one month without sampling and limits?
In GA3 i could collect only 10M in one month, but in GA4 i don't know.
And after 10M in month, data will be collecting or Google will stop collect new data more than 10M hits?
In official Google docs - https://support.google.com/analytics/answer/11202874?hl=en
They say Explore sampling limits = 10M events per query
What does it mean?
It is impossible that in the report there were more than 10M lines?

It means that when you generate a non-standard report involving 10 million hits, sampling is applied. In that case you can reduce the time frame to involve fewer hits.

Related

Google Analytics MCF - how is sampling done?

According to the Google sampling documentation, sampling is done based on sessions (fact). In Google MCF queries, sessions are not available as a metric or dimension (fact?) Now I wonder: how is sampling done if there is no metric or dimension for sessions (and date)?
I suspect that sampling for Google MCF is done based on a maximum number of rows of 10.000 for each query. Are there more than 10.000 results? Than sampling is true. Is this correct?
Use-case: let's say I make a query and the "TotalResults" are 67540. Does this mean that I can get all the unsampled data with querying 7 calls (6x10000 and one time 7540 rows)? How do I know which dateranges to query than?
Tried some different approached and found out that the sampling is doe based on sessions and dates, same as the other data from the API.

Data sampling in google analytics goal flow report

The goal flow report on my google analytics account shows some strange sampling behavior. While I can usually select up to a month of data before sampling starts it seems to be different for the goal flow report.
As soon as I select more than one day of data the used data set is getting smaller very fast. At three days the report ist based on only 50% of the sessions, which, according to analytics, comes to only 35 sessions.
Has anyone experienced a similar behavior of sampling although only very small data-sets are used?
Sampling is induced when your request is calculation-intensive; there's no 'garunteed point at which it trips.
Goal flow complexity will increase exponentially as you add goals, so even a low number of goals might make this report demand a lot of processing.
Meanwhile you'll find that moast of the standard reports can cover large periods of time without sampling; they are preaggreated, so it's very cheap to load them.
If you want to know more about sampling, see here:
https://stackoverflow.com/a/37386181/5815149

Google analytics data adjustment?

I've been using a SSIS Integration component to download data from Google Analytics in order to keep an historical view of some websites and track the evolution of them. Basically the metrics we track are Visits (now Sessions) and Visitros (now Users), and the dimensions are Year and Month. However, today I noticed that the data I downloaded for july had a variation on the Users metric. I heard that google analytics uses an estimation method to "calculate" some (if not all) of their metrics, could it be that after that they "adjust" the data with more acurate information? If so, is this mentioned in the documentation? (a link would be highly appreciated) Since the users are complaining that we are not delivering the real GA Data. I tried looked on the Google analytics documentation page with no luck.
Thanks for your time.
PS: Sorry for my english, it isnĀ“t my native language
If you are using the standard version of Google Analytics (you'll know if you are paying $150k for premium), data is sampled depending on volume. Have a read of this article can-you-trust-your-google-analytics-data
I have seen very slightly differing results being returned if you repeatedly call the api with the same historical parameters repeatedly. In my case the figures only differed by 1-2 over a daily set of several thousand, but nevertheless it differed.
If you want to guarantee your results, consider upgrading to premium
Sampling could be an issue if what you are requesting is over 50,000 rows for the time period you are requesting. To avoid it you can download more often, such as daily.
But I think your issue is that there is a processing time for Google Analytics - if you are downloading at 3 am on the 1st it is probable that the processing for the previous day has not finished.
Google Analytics Premium SLA is for 4 hour data freshness, so even that would have trouble. Pragmatically you should allow 24 hours before you download data for the previous day, 48 hours for e-commerce data.
Thirdly make sure it is not Unique Visitors you are requesting, as this is dependent on the time period you are requesting.

Unsampled reports automation for historical data

We have a client who receives 2-4 million visits a day, so off the bat we can only get unsampled reports because it exceeds google's limit :
500,000 maximum sessions for special queries where the data is not already stored.
We are attempting to collect Unique Visitors and Visits for a 1 day period. Using the Google API has proved frivolous as the data is sampled.
We have set up Unsampled reports on a daily basis that get dumped into Google Drive and our application picks up the new files and downloads them just fine. The problem we are running into is that we need 2 years worth of daily data for 20 reports. The maximum range we can run an unsampled report using google analytics web interface is 1 week before we exceed a query limit. So 52 weeks of reports x 2 years x 20 different reports to set up is 2080 scheduled unsampled reports and this is for 1 client only.
EDIT: Can we automate unsampled reports using GA API or any programming method to pull historical data with the constraints previously mentioned? Also we do have Google Analytics Premium
Cris G, the only way to avoid data-sampling in Google Analytics without having an access to Premium is day-parting technique = you split a data-request for selected time period into shorter period queries (typically days) and then add all the numbers up. If your profiles/views are not sampled if you look at daily numbers, this could solve you issue.
However, this doesn't work on Unique Visitors, since they will be unique every single time (you are running data requests on daily basis), so there will be most likely duplicates and inflated totals if your site is attracting lots of returning visitors.
To automate some of the work, I suggest using tools like Analytics Canvas. It can make your life much easier and I think it could be the perfect tool for what you need to. Bear in mind the limitations about unique visitors (and some other metrics).
Having said that, I still think the best choice would be to use the benefits of Premium and the ability to get unsampled data for your reports.

big difference in "visitor" count

I try to pull out the (unique) visitor count for a certain directory using three different methods:
* with a profile
* using an dynamic advanced segment
* using custom report filter
On a smaller site the three methods give the same result. But on the large site (> 5M visits/month) I get a big discrepancy between the profile on one hand and the advanced segment and filter on the other. This might be because of sampling - but the difference is smaller when it comes to pageviews. Is the estimation of visitors worse and the discrepancy bigger when using sampled data? Also when extracting data from the API (using filters or profiles) I still get DIFFERENT data even if GA doesn't indicate that the data is sampled - ie I'm looking at unsampled data.
Another strange thing is that the pageviews are higher in the profile than the filter, while the visitor count is higher for the filter vs the profile. I also applied a filter at the profile to force it to use sample data - and I again get quite similar results to the filter and segment-data.
profile filter segment filter#profile
unique 25550 37778 36433 37971
pageviews 202761 184130 n/a 202761
What I am trying to achieve is to find a way to get somewhat accurat data on unique visitors when I've run out of profiles to use.
More data with discrepancies can be found in this google docs: https://docs.google.com/spreadsheet/ccc?key=0Aqzq0UJQNY0XdG1DRFpaeWJveWhhdXZRemRlZ3pFb0E
Google Analytics (free version) tracks only 10 mio page interactions [0] (pageviews and events, any tracker method that start with "track" is an interaction) per month [1], so presumably the data for your larger site is already heavily sampled (I guess each of you 5 Million visitors has more than two interactions) [2]. Ad hoc reports use only 1 mio datapoints at max, so you have a sample of a sample. Naturally aggregated values suffer more from smaller sample sizes.
And I'm pretty sure the data limits apply to api access too (Google says that there is "no assurance that the excess hits will be processed"), so for the large site the api returns sampled (or incomplete) data, too - so you cannot really be looking at unsampled data.
As for the differences, I'd say that different ad hoc report use different samples so you end up with different results. With GA you shouldn't rely too much an absolute numbers anyway and look more for general trends.
[1] Analytics Premium tracks 50 mio interactions per month (and has support from Google) but comes at 150 000 USD per year
[2] Google suggests to use "_setSampleRate()" on large sites to make sure you have actually sampled data for each day of the month instead of random hit or miss after you exceed the data limits.
Data limits:
http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983).
setSampleRate:
https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate
Yes, the sampled data is less accurate, especially with visitor counts.
I've also seen them miss 500k pageviews over two days, only to see them appear in their reporting a few days later. It also doesn't surprise me to see different results from different interfaces. The quality of Google Analytics has diminished, even as they have tried to become more real-time. It appears that their codebase is inconsistent across API's, and their algorithms are all over the map.
I usually stick with the same metrics and reporting methods, so that my results remain comparable to one another. I also run GA in tandem with Gaug.es, as a validation and sanity check. With that extra data, I choose the reporting method in GA that I am most confident with and I rely on that exclusively.

Resources