Why is the intraday table sometimes missing from the BigQuery dataset? - firebase

My team has linked our Firebase and BigQuery projects and set up the intraday table. However, the table is created unpredictably. I was able to use it yesterday (events_intraday_20200701), but it is already noon as I write this, and there is still no intraday table for today (events_intraday_20200702) in the dataset. (The regular daily event tables are there, as usual.) In the streaming area of the Firebase console, I can see hundreds of events being generated, but I cannot query an intraday table to see them in real time.
I also struggle to find resources clarifying when the table is created, besides the statement "raw event data is streamed into a separate intraday BigQuery table in real-time" from https://support.google.com/firebase/answer/6318765?hl=en. Are there reasons why the table may not be created, or more details about what time of day I can expect it to exist?
On a related note, is it true that Web events are not supported for the intraday table?
Thanks!
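
Because the intraday table for the current day may not exist yet, a job that reads it can check which tables are present before choosing one. The following is a minimal sketch, assuming the list of table IDs in the dataset has already been fetched by whatever client you use; the function name `pick_events_table` is hypothetical and not part of any Firebase or BigQuery API.

```python
from datetime import date
from typing import Iterable, Optional


def pick_events_table(table_ids: Iterable[str], day: date) -> Optional[str]:
    """Return the table holding events for `day`, or None if neither exists.

    Prefers the finalized daily table (events_YYYYMMDD) and falls back to
    the streaming table (events_intraday_YYYYMMDD).
    """
    suffix = day.strftime("%Y%m%d")
    present = set(table_ids)
    for candidate in (f"events_{suffix}", f"events_intraday_{suffix}"):
        if candidate in present:
            return candidate
    return None


# Table IDs as they might appear in the dataset (illustrative values).
tables = ["events_20200630", "events_20200701", "events_intraday_20200701"]
print(pick_events_table(tables, date(2020, 7, 1)))
print(pick_events_table(tables, date(2020, 7, 2)))
```

Returning `None` rather than raising lets the caller decide whether a missing table means "retry later" or "skip this run".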

Related

Firebase Export to BigQuery: event_timestamp changes when going from intraday to full day table

I have a mobile application that is connected to a BigQuery warehouse through the Firebase export. To keep my dashboards up to date, I run incremental jobs (dbt) several times a day to extract data from the tables that BigQuery creates for the imported Firebase data (see this article).
For real-time data streaming, a table with the suffix "_intraday" is created. Once that day is over, the data is moved over to the table which only contains full days and the intraday table is deleted.
It looks like when this happens (moving from intraday to full day), the event_timestamp (UNIX) changes slightly (by a few milliseconds) for each entry. The problem: I defined the combination of user_id and event_timestamp as the unique key. Because of the shift, the first job that processes the moved table treats each row as a new, unique row, which exactly doubles my resulting data.
Has anyone seen this issue, and is it expected? Do you know of any solution other than generating an event ID on the client, attaching it to each event through custom event params, and using it instead of user_id + timestamp?
Thank you.
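
One alternative to a per-row unique key, sketched here under assumptions, is to treat the event date as the unit of replacement: when the daily table supersedes the intraday table, every previously loaded row for that date is dropped and reloaded, so small timestamp shifts no longer matter. The code below is plain Python over in-memory rows to illustrate the idea only; the function name `replace_days` and the row shape are hypothetical. In dbt on BigQuery, the `insert_overwrite` incremental strategy works along similar lines.

```python
from typing import Dict, List

Row = Dict[str, object]


def replace_days(existing: List[Row], incoming: List[Row]) -> List[Row]:
    """Drop every existing row whose event_date appears in `incoming`,
    then append the incoming rows."""
    refreshed_dates = {row["event_date"] for row in incoming}
    kept = [row for row in existing if row["event_date"] not in refreshed_dates]
    return kept + incoming


# Rows loaded earlier, one of them from the intraday table (illustrative values).
loaded = [
    {"event_date": "20200701", "user_id": "u1", "event_timestamp": 1593561600000123},
    {"event_date": "20200630", "user_id": "u1", "event_timestamp": 1593475200000000},
]
# The same event re-exported in the daily table with a slightly shifted timestamp.
daily = [
    {"event_date": "20200701", "user_id": "u1", "event_timestamp": 1593561600000456},
]

result = replace_days(loaded, daily)
print(len(result))
```

The trade-off is that a whole day is rewritten on each refresh, which costs more than a keyed merge but does not depend on any field staying stable between the intraday and daily tables.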

BigQuery - Large Amount of Rows in Newest Event Table

I recently linked my Firebase project's Analytics with BigQuery using the free-tier sandbox, and now I'm nearing the 10GB storage ceiling.
I noticed that the majority of that exported data populated to the earliest event table created (earliest table was 6.5GB and other tables were ~50-100MB), so yesterday I just deleted that earliest event table to get rid of all of those old rows I didn't want.
However, I noticed after checking today that the newest event table is roughly the same size as the one I deleted.
My questions are:
Is the latest table so big because older rows are repopulating?
Is it possible to delete that large chunk of data from storage without a similarly sized amount flowing into the next event table that's created?

Saving data in bigquery simultaneously from a source other than firebase

We are trying to log some events from our application. There are two types of events:
Client-side events: logged by an Android application using the Firebase SDK, and saved to BigQuery by Firebase
Server-side events: logged by us using BigQuery's Go client
Firebase stores the current day's events in the events_intraday_$date table and then flushes that table into the partitioned table events_$date.
So, I also logged the current day's server-side events into the events_intraday_$date table.
The events were written to the table successfully, but they were deleted the next day when the events_intraday_$date table was flushed into the events_$date table.
I can't understand how that is happening.
Looks like this is intended behaviour:
Within each dataset, a table is imported for each day of export. Daily tables have the format "ga_sessions_YYYYMMDD".
Intraday data is imported approximately three times a day. Intraday tables have the format "ga_sessions_intraday_YYYYMMDD". During the same day, each import of intraday data overwrites the previous import in the same table.
When the daily import is complete, the intraday table from the previous day is deleted. For the current day, until the first intraday import, there is no intraday table. If an intraday-table write fails, then the previous day's intraday table is preserved.
Data for the current day is not final until the daily import is complete. You may notice differences between intraday and daily data based on active user sessions that cross the time boundary of last intraday import.
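
Given that the export process owns the intraday table and deletes it, one possible workaround (a sketch under assumptions, not something the quoted documentation prescribes) is to write server-side events to a separate table that the export never touches and to combine the two at query time. The helper below only builds the query text. The dataset name `analytics_123`, the table name `server_events`, and the `event_date` column on the server table are all hypothetical, and the query assumes both tables expose the selected columns.

```python
def combined_events_sql(dataset: str, day: str, server_table: str = "server_events") -> str:
    """Build a standard SQL query that unions client and server events for one day.

    `day` is a YYYYMMDD string. All names are illustrative assumptions.
    """
    columns = "event_name, event_timestamp"
    return (
        f"SELECT {columns}, 'client' AS source\n"
        f"FROM `{dataset}.events_intraday_{day}`\n"
        "UNION ALL\n"
        f"SELECT {columns}, 'server' AS source\n"
        f"FROM `{dataset}.{server_table}`\n"
        f"WHERE event_date = '{day}'"
    )


sql = combined_events_sql("analytics_123", "20200702")
print(sql)
```

Keeping the server rows in a table of your own means the daily flush cannot remove them, at the cost of maintaining the union yourself.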

Google analytics realtime data in BigQuery

We have enabled continuous export of Google Analytics data to BigQuery which means we get ga_realtime_sessions_YYYYMMDD tables with data dumps throughout the day.
These tables are – usually! – left in place, so we accumulate a stack of the realtime tables for the previous n dates (n does not seem to be configurable).
However, every once in a while, one of the tables disappears, so there will be gaps in the sequence of dates and we might not have a table for e.g. yesterday.
Is this behaviour documented somewhere?
It would be nice to know which guarantees we have, as we might rely on e.g. realtime data from yesterday while we wait for the “finished” ga_sessions_YYYYMMDD table to show up. The support document linked above does not mention this.
As stated in this help article, the internal ga_realtime_sessions_YYYYMMDD tables should not be queried directly. Query the ga_realtime_sessions_view_YYYYMMDD view instead, in order to get fresh data and avoid unexpected results.
If you want to use data from a previous day while you wait for today's internal ga_realtime_sessions_YYYYMMDD table to be created, you can copy the results of querying the ga_realtime_sessions_view_YYYYMMDD view into a separate table at the end of each day.

Can I add a field to the app_events_intraday table in BigQuery?

I am currently extracting my Firebase event data from BigQuery to an onsite database for analysis. I extract the Firebase intraday table(s) along with the previous 4 days (since previous days' tables continue to be updated) every time I run the ETL job. Since there is no key or unique ID for events, I am deleting & re-inserting the past 4 days of data locally in order to refresh the data from BigQuery.
Would it be possible for me to create a new field called event_dim.etl_status on the intraday table to keep track of events that have been moved locally? And if so, would this field make its way into the app_events_yyyymmdd table once it is renamed from *_intraday to *_yyyymmdd?
Edit:
Some more context based on comments from dsesto:
A magical Firebase-BigQuery wizard automatically copies/renames the Event "intraday" table into a daily table, so I have no way to reproduce or test this. It is part of the Firebase->BigQuery black box.
Since I only have a production environment (Firebase has no mechanism for a sandbox environment), testing this theory would require potentially breaking my production environment which is why I posed a "is it possible" scenario in case someone else has done something similar.
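
For reference, the refresh window described in the question (today's intraday table plus the previous 4 daily tables) can be computed as in the sketch below. The function name `tables_to_refresh` is hypothetical, and the table-name prefixes are taken from the names used in the question; the sketch only produces names and does not touch BigQuery.

```python
from datetime import date, timedelta
from typing import List


def tables_to_refresh(today: date, days_back: int = 4) -> List[str]:
    """List today's intraday table followed by the previous `days_back` daily tables."""
    names = [f"app_events_intraday_{today:%Y%m%d}"]
    for offset in range(1, days_back + 1):
        day = today - timedelta(days=offset)
        names.append(f"app_events_{day:%Y%m%d}")
    return names


for name in tables_to_refresh(date(2017, 3, 2)):
    print(name)
```

Tracking which of these tables have already been copied in the local database, rather than in a new column on the exported table, keeps the ETL state out of the tables that the export process manages.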
