If you refer to http://www.google.com/intl/en_uk/analytics/premium/features.html, you will notice that Standard allows for 10 million hits processed per month and Premium allows for 1 billion.
I have a website on an account, with multiple "folders" for different sub-domains, and also different "Views" or dashboards for some of these sub-domains.
The website I am on recently lost tracking for conversion rates, and everything has plummeted to near 0%, which is an incorrect statistic. I am curious as to how I can figure up if this account is reaching the 10 million limit on the standard version. Or at least how to figure actual hits processed a day, week, or month?
Any ideas?
Thanks!
I don't know how Google enforces hit limits in 2015. However in 2013 a Google representative sent one of our bigger clients a document (answering a question about data limits) that contained the following paragraph:
How do data limits impact sampling? Google Analytics does not sample
your clients data at the point of collection or processing, regardless
of how far they exceed our stated limits. So no hits are discarded.
The only way to sample data at the point of collection is for clients
to use_setSampleRate in their tracking code.
[...]
[...] we reserve the right to shutdown their account [sc. if limits are exceeded], but it won't
happen before we have attempted to contact the account Admins multiple times
and we have exhausted all other options.
Unless Google has changed it's policy in the last 1,5 years I would say not, unprocessed hits are not your problem; it seems Google would have contacted you with an request to limit your hits or upgrade to Analytics Premium before problems occurr.
Plus, since you mentioned that you have several views - views do not count towards your quota (they display the same data in different ways). However properties (I think that is what you mean by "folders") do.
Updated 2017: It seems that Google intends to enforce limits more strictly. One of my clients now has the following warning in his GA interface:
Your data volume (XXX hits) exceeds the limit of 10M hit per month as
outlined in our terms of service. If you continue to exceed the limit
you will lose access to future data.
You can create a database table, like this:
visits(
id bigint primary key auto_increment,
ip text,
visit_date timestamp default current_timestamp
)
Upon each page visit, you can insert a record into the table. Later you can view statistics. For instance, visit count in a given day would look like:
select id, ip, visit_date
from visits
where visit_date >= '2015-07-21 00:00:00' and visit_date < '2015-07-22 00:00:00'
Related
I wonder whether someone can help me please.
I have a user who under a specific property, sporadically receives the following error:
Some hits sent on 03-Jul-2018 to property ...... exceeded one or more hit quotas and were therefore not processed.
Hits can be dropped when daily or monthly hit limits are exceeded. You can view your hit volume levels in Property Settings in Analytics.
Hits can also be dropped if visitor hit limits are exceeded. This can happen when your site is incorrectly generating the visitor ID for a GA session. Contact your website administrator to check that the visitor ID generation has been correctly implemented.
They are not using the Premium account but when I look at the data for the day in question, there aren't any issues with regards to 'High Cardinality' which unless I've misunderstood I'd expect to see.
Could someone look at this please and offer some guidance where the issue may be because this area is fairly new to me.
Many thanks and kind regards
Chris
Collection limits are influenced by 2 factors:
The tracker: whether you use ga.js,gtag.js,analytics.js etc... here are the details.
The property type: whether you are using GA (10M hits / month) or GA 360 (2B hits / month).
In your case you are facing a property limit. To find out when such limits where reached, you can create a custom report using a time dimension (eg date+time) combined with the hits metric. You can also combine the hit metrics with other dimensions (country, browser, device) to see if you find any patterns as to why you're getting so many hits.
Cardinality is something else: it refers to the number of unique value combinations for your dimensions. For instance if you have 500K events where each event category is different, you'll have a Cardinality of 500K on the event category dimension. The more hits, the more likely you'll have a high cardinality, but the 2 aren't necessary related (if you send 10B events with the same category, the cardinality on the category is 1).
So focus on identifying and solving your limits/quotas issue, as it's the real issue here:
If the number of hits is legitimate (you have a huge amount of traffic), then the only options are to upgrade to GA 360 or reduce the number of hits for each session
If the number of hits is abnormally high (eg traffic is stable but hits increased dramatically), look for implementation issues, especially generic event trackers such as error tracking with tools like Google Tag Manager
In a typical GA session, after picking a View ID and a date range,
We can get a week's worth of data like this:
Users
146,207
New Users
124,582
Sessions
186,191
The question is, what BQ field(s) to query in order to get this Users value?
Here is an example query with 2 methods (the 2nd method is commented out).
SELECT
count(DISTINCT(CONCAT(CAST(visitID as STRING),cast(visitNumber as
STRING)))) as visitors,
-- count(DISTINCT(fullVisitorId)) as visitors
I noticed the FVID method was fairly close to what I see in GA (with Users being a little understated by a 3% in BQ) and if I use the commented out method, I get a value that is about 15% overstated as compared to GA. Is there a more reliable method in BQ to acquire the Users value in GA?
The COUNT(DISTINCT fullVisitorId) method is the most correct method, but it won't match what Analytics 360 reports by default. Since last year, Google Analytics 360 by default uses a different calculation for the Users metric than it previously did. The old calculation, which is still used in unsampled reports, is more likely to match what you get out of BigQuery. You can verify this by exporting your report as an unsampled report, or using the unsampled reporting features in the Management API.
If you want the numbers to match exactly, you can turn off the new calculation by using the instructions here. The new calculation's precise details are not public, so duplicating that value in BigQuery is quite difficult.
There are still some reasons you might see different numbers, even with the old calculation. One is if the site has implemented User ID, in which case the GA number will be lower than BigQuery for fullVisitorId. Another is sampling, though that's unlikely in Analytics 360 at the volumes you're talking about.
According to Firebase Analytics docs (https://support.google.com/firebase/answer/6317517#active-users), the active number of users is the number of unique users who initiated sessions on a given day. Also according to the docs, every time a session is started an event with session_start name is sent. I am trying to get that metric using BigQuery's export, but my query is giving me different results (15636 on BigQuery, 14908 on FB analytics)
I have also tried converting to different timezones to see if that might be the issue, but no matter which timezone I try I never get the same (or similar) results
Which query should I run to get the same results I get on Firebase Analytics dashboard for active users?
My query is
SELECT EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id)
FROM table_date_range([XXXXX.app_events_], timestamp('2016-11-26'), timestamp('2016-11-29'))
WHERE DATE(event_dim.timestamp_micros) = '2016-11-27'
AND event_dim.name ='session_start'
Thanks
Update
After #djabi's answer I changed my query to use user_engagement rather than session_start and it works much better now. Still some minor differences though (they range from under ten to under 50 out of 16K, depending on the date).
I have tried once again using different timezones by playing around with DATE(date_add(event_dim.timestamp_micros,1,'hour')) but I never got the exact number I get on Firebase Analytics dashboard.
The new numbers are good enough to be considered statistically acceptable, but wondering if anyone has a suggestion to improve the query and get exact results?
The current query is:
SELECT
COUNT(*) AS active_users
FROM (
SELECT
COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
FROM
TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
WHERE
DATE(event_dim.timestamp_micros) = '2016-11-25'
AND event_dim.name ='user_engagement'
GROUP BY
user_id )
Note: At the moment we are not sending user_id, so the COALESCE will always return the app_instance_id, in case anyone was going to suggest that could be the problem
You need to wait for full 3 days for data from offline devices to be uploaded. Your query correctly filter the events based on the event timestamp and you pull data from 3 days but that is only day and half from today and that is enough for all data to be uploaded. Try including 3 days from yesterday.
Also try using user_engagement event instead of session_start. I believe active user count is based on user_engagement and not on session_start events.
Also FB reports take a bit to process so you wight want and check the FB reports the next day.
FB reports are done on the time zone on the account and events are timestamped in UTC so the day in FB reports is different from UTC calendar day. You want to control for that discrepancy as well to get matching numbers.
Sessions are by-default measured after user activity of 10 seconds in the respective app which you can change. Try changing the sessions start time count to the least number possible and then you may arrive at a number closer to what you are expecting.
For Android stats I used:
user_dim.device_info.resettable_device_id
instead of
user_dim.app_info.app_instance_id
and it produced better results.
I wish to extract (via the Analytics Core Reporting API) all the transactions made TODAY by users that had a specific ga:eventCategory few weeks ago.
I'm looking to see the date of a transaction and all dated of event that are related to that transaction.
If GA was sql I would join by the ga user and take in the dimension both his transactions date and his dimension update date...
Thanks.
Noam.
Like I have indicated in my comment you can segment the data to include only those users who have the specific event. Segmentation works fine with the core reporting API.
Your segment defintion would look like this:
users::condition::ga:eventCategory==[myEventCategory]
(where obviously the thing in [brackets] is a placeholder that needs to be substituted for the event category name). The "users::" prefix means you are segmenting by user scope (as opposed to sessions), so this will include all sessions in the selected timeframe for users who had the event at least in one of their session (even if the event was outside the selected timeframe).
Select transactionId as dimension and some metric (revenue) and todays date and you are done. Or you would be done if this was actually going to work, but there are at least two caveats:
Google Analytics does not work in realtime, so it's unlikely that TODAYs transactions are fully available (Google says it's 24 hours until the data is processed - actually it might happen faster, but you cannot rely on it).
If a user has deleted his or her cookie she won't be recognized as a recurring user and GA will be unable to segment her out. The longer the interval between the event and the transaction the less likey it is that the GA cookie is still present.
So even with a technically correct query it might be that you won't get the data you need.
We've noticed lately that as our site is growing, our data in Google Analytics is getting less reliable.
One of the places we've noticed this most strongly is on the "Realtime Dashboard".
When we were getting 30k users per day, it would show about 500-600 people on line at a time. Now that we are hitting 50k users per day, it's showing 200-300 people on line at a time.
(Other custom metrics from within our product show that the user behavior hasn't changed much; if anything, users are currently spending longer on the site than ever!)
The daily totals in analytics are still rising, so it's not like it's just missing the hits or something... Does anyone have any thoughts?
The only thing I can think of is that there is probably a difference in interpretation of what constitutes a user being on line.
How do you determine if the user is on line?
Unless there is an explicit login/logout tracking, is it possible that it assumes that a user has gone if there is no user generated event or a request from the browser within an interval of X seconds?
If that is the case then it may be worth while adding a hidden iframe with some Javascript code that keeps sending a request every t seconds.
You can't compare instant measures of unique, concurrent users to different time-slices of unique users.
For example, you could have a small number of concurrent unique users (say 10) and a much higher daily unique users number like 1000, because 1000 different people were there over the course of the day, but only 10 at any given time. The number of concurrent users isn't correlated to the total daily uniques, the distribution over the course of the day may be uneven and it's almost apples and oranges.
This is the same way that monthly unique and daily uniques can't be combined, but average daily uniques are a lower bound for monthly uniques.