Events table is created too late in BigQuery - firebase

I am exporting my app's events from Firebase to BigQuery on a regular basis. This system creates an 'events_intraday_YYYYMMDD' table for the current day, and the data for previous days is stored in 'events_YYYYMMDD' tables, one per day. When the day ends, the events_intraday table is converted to an events table, and a new intraday table is created for the new day. This normally happens early in the morning.
Recently, this process started to take too long. The new intraday tables are created on time, but the older ones are not converted to events tables. For example, today (15 Apr 2021) the data for the 14th is still in an intraday table, although it should have been saved as an events table much earlier. In the last couple of days, intraday tables have been kept for as long as 2-3 days.
I've looked through the Firebase, Google Analytics, and BigQuery documentation, but could not find a solution to this, nor a way to edit the settings for it, if there are any. Firebase says that I should check the 'event export settings' in Google Analytics, but there is no such option there.
Has anyone else experienced the same problem?
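To see which days are affected, the naming scheme described above can be checked mechanically. The following is a minimal Python sketch, not part of the original question: it assumes you already have the dataset's table names as a list of strings, and the function name `unconverted_days` is made up for illustration.

```python
import re

# Table naming scheme described in the question.
INTRADAY_PATTERN = re.compile(r"^events_intraday_(\d{8})$")
DAILY_PATTERN = re.compile(r"^events_(\d{8})$")


def unconverted_days(table_names, today):
    """Return past days (YYYYMMDD) that have an intraday table but no daily table.

    `today` is a YYYYMMDD string; the current day is excluded because it is
    expected to exist only as an intraday table.
    """
    intraday_days = set()
    daily_days = set()
    for name in table_names:
        match = INTRADAY_PATTERN.match(name)
        if match:
            intraday_days.add(match.group(1))
            continue
        match = DAILY_PATTERN.match(name)
        if match:
            daily_days.add(match.group(1))
    # YYYYMMDD strings sort chronologically, so string comparison is enough.
    return sorted(day for day in intraday_days - daily_days if day < today)
```

With the situation from the question (the 14th still intraday on the 15th), the function would report only the 14th as not yet converted.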

Related

How To Change Firebase BigQuery Integration 'Dataset Time To Live' From 60 Days To Does Not Expire?

We've recently upgraded our BigQuery integration in Firebase from Sandbox to Blaze, but the 'Dataset Time To Live' is still 60 days. We've updated the dataset's 'Default table expiration' in BigQuery from 60 days to 'Never', but it's still only retaining the last 60 days of data in the historic event table and didn't change the 'Dataset Time To Live' field in Firebase. We've also updated our Data Retention settings in GA4 to retain data for the last 14 months, but it didn't have an effect on the BQ integration table expiration either.
Any help on how to get the 'Dataset Time To Live' to be set to 'Does not expire' would be greatly appreciated.
Thanks!
Once you've updated the dataset's Default table expiration, all new tables in this dataset will have Table expiration = Never, but existing tables keep the old value. You need to update it manually for each existing table. Also check whether any partition expiration is configured.
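Updating many existing tables by hand is tedious, so one option is to generate the DDL statements and run them in BigQuery. The sketch below only builds the statement text using BigQuery's `ALTER TABLE ... SET OPTIONS (expiration_timestamp = NULL)` syntax; the project, dataset, and table names are placeholders, and the helper name is hypothetical.

```python
def build_expiration_reset_statements(project, dataset, table_names):
    """Build one ALTER TABLE statement per table that clears its expiration."""
    statements = []
    for name in table_names:
        # Setting expiration_timestamp to NULL removes the table's expiration.
        statements.append(
            f"ALTER TABLE `{project}.{dataset}.{name}` "
            "SET OPTIONS (expiration_timestamp = NULL);"
        )
    return statements


if __name__ == "__main__":
    # Placeholder identifiers; replace them with your own before running the output.
    for statement in build_expiration_reset_statements(
        "my-project", "my_dataset", ["events_20210413", "events_20210414"]
    ):
        print(statement)
```

Review the printed statements before executing them, since they apply to every table you list.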

Yesterday's data from BigQuery

We are having some issues pulling yesterday's Google Analytics data from BigQuery. Can anyone explain at what point a previous day's GA data is finalized?
There is some explanation here of the intraday tables, but it's not very clear:
https://support.google.com/analytics/answer/3437719?hl=en
To get the previous day's data, do you need to use the intraday tables at all? Do you have access to the fully processed dataset at 8am local time? Or is it 8 hours after the current day UTC+14:00 (etc)?
I had a similar question and asked their support, this is the reply:
"According to this Google Analytics documentation , it states that '1 file will be exported each day that contains the previous day’s data, and 3 files will be exported each day that contain the current day's data'. In such, the minimum time that the data from Google Analytics to be exported to BigQuery was 8 hours. Although Google Analytics can be linked to BigQuery, the availability of data depends on how it was served by Google Analytics 360."
But based on experience, it's really a minimum time. Sometimes there are delays of 4-5 hours.
My team has been pressing Google's support to provide an SLA for the BigQuery export, so they updated the documentation:
This feature is not governed by a service-level agreement (SLA).
In practice we are experiencing regular delays of anywhere between 2 and 12 hours.

my google analytics shows data from realtime stream but no hits in standard report

1. I can see traffic in the real-time stream; there are active users on my site.
2. But I can't see visitor information in the standard reports.
3. My site was set up a long time ago, and data collection previously worked fine, so it shouldn't be the 'new property does not display data within 24 hours' issue.
4. I did modify my property yesterday: I added a 'referral exclusion' item and deleted it a few minutes later.
5. There are no filters in my view anymore.
The hits most likely stopped being collected after item 4 (the property change).
How can I fix this issue, any ideas?
Did you check your monthly quota? (admin > property settings > property hit volume > last month).
In the free version of GA you have 10 million hits per month per property (https://developers.google.com/analytics/devguides/collection/analyticsjs/limits-quotas).
Moreover, sending more than 200,000 sessions per day to Analytics results in the reports being refreshed only once a day. This can delay updates to reports and metrics for up to two days.
(https://support.google.com/analytics/answer/1070983?hl=en)
Maybe one of these conditions is the cause of the issue.
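As a quick sanity check, the two thresholds quoted above can be compared against your own numbers. This is only an illustrative sketch: the constants are the figures from this answer, and the function name and messages are invented.

```python
# Thresholds as quoted in the answer above (free version of GA).
MONTHLY_HIT_LIMIT = 10_000_000
DAILY_SESSION_THRESHOLD = 200_000


def check_limits(hits_last_month, sessions_per_day):
    """Return a list of the conditions from the answer that the property hits."""
    findings = []
    if hits_last_month > MONTHLY_HIT_LIMIT:
        findings.append("monthly hit limit exceeded")
    if sessions_per_day > DAILY_SESSION_THRESHOLD:
        findings.append("reports refreshed only once a day")
    return findings
```

An empty result means neither condition applies, and the cause is probably elsewhere.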

Discrepancies on "active users metric" between Firebase Analytics dashboard and BigQuery export

According to Firebase Analytics docs (https://support.google.com/firebase/answer/6317517#active-users), the active number of users is the number of unique users who initiated sessions on a given day. Also according to the docs, every time a session is started an event with session_start name is sent. I am trying to get that metric using BigQuery's export, but my query is giving me different results (15636 on BigQuery, 14908 on FB analytics)
I have also tried converting to different timezones to see if that might be the issue, but no matter which timezone I try I never get the same (or similar) results
Which query should I run to get the same results I get on Firebase Analytics dashboard for active users?
My query is
SELECT EXACT_COUNT_DISTINCT(user_dim.app_info.app_instance_id)
FROM table_date_range([XXXXX.app_events_], timestamp('2016-11-26'), timestamp('2016-11-29'))
WHERE DATE(event_dim.timestamp_micros) = '2016-11-27'
AND event_dim.name ='session_start'
Thanks
Update
After @djabi's answer I changed my query to use user_engagement rather than session_start and it works much better now. Still some minor differences though (they range from under ten to under 50 out of 16K, depending on the date).
I have tried once again using different timezones by playing around with DATE(date_add(event_dim.timestamp_micros,1,'hour')) but I never got the exact number I get on Firebase Analytics dashboard.
The new numbers are good enough to be considered statistically acceptable, but wondering if anyone has a suggestion to improve the query and get exact results?
The current query is:
SELECT
  COUNT(*) AS active_users
FROM (
  SELECT
    COALESCE(user_dim.user_id, user_dim.app_info.app_instance_id) AS user_id
  FROM
    TABLE_DATE_RANGE([XXXXX.app_events_], TIMESTAMP('2016-11-24'), TIMESTAMP('2016-11-29'))
  WHERE
    DATE(event_dim.timestamp_micros) = '2016-11-25'
    AND event_dim.name = 'user_engagement'
  GROUP BY
    user_id )
Note: At the moment we are not sending user_id, so the COALESCE will always return the app_instance_id, in case anyone was going to suggest that could be the problem
You need to wait a full 3 days for data from offline devices to be uploaded. Your query correctly filters the events by event timestamp, and you pull data from 3 days of tables, but that covers only a day and a half before today, which is not enough for all the data to be uploaded. Try including 3 days from yesterday.
Also try using the user_engagement event instead of session_start. I believe the active user count is based on user_engagement and not on session_start events.
Also, FB reports take a while to process, so you might want to check the FB reports the next day.
FB reports use the time zone of the account, while events are timestamped in UTC, so a day in FB reports differs from the UTC calendar day. You need to control for that discrepancy as well to get matching numbers.
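To illustrate the time zone point, here is a small Python sketch that counts distinct users for a reporting day after shifting UTC event timestamps by a fixed offset. It is an assumption-laden illustration, not the dashboard's actual logic: the event tuples, the function name, and the fixed `utc_offset_hours` parameter (which ignores daylight saving time) are all made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def active_users_on_day(events, day, utc_offset_hours):
    """Count distinct app instances with a user_engagement event on `day`.

    events: iterable of (app_instance_id, event_name, timestamp_micros) tuples,
            with timestamps in microseconds since the Unix epoch (UTC).
    day: reporting day as 'YYYY-MM-DD' in the account's time zone.
    utc_offset_hours: fixed offset of the account's time zone (no DST handling).
    """
    reporting_tz = timezone(timedelta(hours=utc_offset_hours))
    users = set()
    for app_instance_id, event_name, timestamp_micros in events:
        if event_name != "user_engagement":
            continue
        # Convert the UTC timestamp to the reporting time zone before bucketing.
        local_time = (EPOCH + timedelta(microseconds=timestamp_micros)).astimezone(reporting_tz)
        if local_time.date().isoformat() == day:
            users.add(app_instance_id)
    return len(users)
```

The same events can fall on different days depending on the offset, which is one reason a query filtering on the UTC date may not match the dashboard.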
By default, a session is counted only after 10 seconds of user activity in the app, which you can change. Try lowering this threshold as far as possible, and you may arrive at a number closer to what you are expecting.
For Android stats I used:
user_dim.device_info.resettable_device_id
instead of
user_dim.app_info.app_instance_id
and it produced better results.

Last “end date” with data in Analytics

I'm using the Google Analytics Reporting API and I can’t find information about what the last “end date” with data in Analytics is.
For example, let's suppose you want to retrieve the last month’s data.
When do you have to perform the query?
The first day of the current month?
...or the second one?
...or maybe the third one?
One more question: is the returned data for days in Pacific time?
Google Analytics API is supposed to have access to the same data you have in the interface.
Google says that data can take up to 24h to process. The time it takes to really update the data depends on the type and size of the account. Small accounts are updated multiple times a day and can have data available in just a few hours. Once you reach 1M hits a month you are moved to a different mode where the data on your account is updated only once a day. Google Analytics Premium customers get updates more often, even for large amounts of traffic.
There's no way to tell through the API what is exactly the time of the last hit processed. You can query the data for today by the hour and see for yourself though.
Usually you don't care and just want to make sure that the data you're querying has been fully processed for that day.
So if you query data for yesterday, there's a chance it has not been completely updated. For example, if it's just after midnight, yesterday ended only a couple of minutes ago and its data probably hasn't been completely processed yet. The safest bet in this case is to query data up to 2 days ago.
So if today is 2012-06-15 and you want to get 1 month of data, a safe approach is to query with start-date=2012-05-13 and end-date=2012-06-13. Most of the time this will give you data for days that have been fully processed, but it's not 100% safe either. Google Analytics has had outages in the past where data took longer than that to process, although these are unusual. Once you get the data out, it's really hard to tell from the API alone whether those days have been fully processed; using the 2-days-ago idea just makes it more likely that they have.
The days are aggregated according to the time zone configured on the Google Analytics profile.
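The 2-days-ago rule above can be turned into a small date calculation. This is a minimal sketch using only the Python standard library; the function name and the `lag_days` parameter are illustrative, and the result is meant to be passed as start-date and end-date to whatever API client you use.

```python
import calendar
from datetime import date, timedelta


def safe_date_range(today, lag_days=2):
    """Return (start, end) covering one month and ending `lag_days` before today."""
    end = today - timedelta(days=lag_days)
    # Step back one calendar month, wrapping the year in January.
    if end.month == 1:
        year, month = end.year - 1, 12
    else:
        year, month = end.year, end.month - 1
    # Clamp the day for shorter months (e.g. the 31st has no match in February).
    day = min(end.day, calendar.monthrange(year, month)[1])
    return date(year, month, day), end


if __name__ == "__main__":
    start, end = safe_date_range(date(2012, 6, 15))
    print(f"start-date={start.isoformat()} end-date={end.isoformat()}")
```

For today = 2012-06-15 this reproduces the range from the answer, 2012-05-13 to 2012-06-13.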

Resources