I moved a dataset from the US region to the EU region, following the instructions given by Google.
If you choose the wrong region and need to change it after you've created the link:
1. Delete the link to BigQuery (see below).
2. Back up the data to another dataset in BigQuery (move or copy).
3. Delete the original dataset. Take note of the name: you'll need it in the next step.
4. Create a new dataset with the same name as the dataset you just deleted, and select the location for the data.
5. Copy the backup data into the new dataset.
6. Repeat the procedure above to create a new link to BigQuery.
After changing the location, you'll have a gap in your data: streaming and daily exports will not be processed between the deletion of the existing link and the creation of the new link.
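For illustration, here is a minimal SQL sketch of the backup and recreate steps, assuming a hypothetical project `myproject`, export dataset `analytics_123456789`, and backup dataset `backup_us`. One caveat: plain table copies cannot cross regions, so moving the backed-up data from the US into the recreated EU dataset typically means using the BigQuery Data Transfer Service's cross-region dataset copy, or an export and reload through Cloud Storage.

```sql
-- Back up one day's export table into a backup dataset in the same (US)
-- region before deleting the original dataset (all names are hypothetical).
CREATE TABLE `myproject.backup_us.events_20230101`
COPY `myproject.analytics_123456789.events_20230101`;

-- After deleting the original dataset, recreate it under the same name,
-- this time pinned to the EU location.
CREATE SCHEMA `myproject.analytics_123456789`
OPTIONS (location = 'EU');
```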
After creating the new dataset in the EU and activating the link, I have the following issues: the intraday tables haven't been deleted completely on the following days, and the backup datasets that I created (the copies) are still being updated by the Firebase exports.
What is happening with the exports?
Related
I have a mobile application that is connected to a BigQuery warehouse through the Firebase export. To keep my dashboards up to date, I run incremental dbt jobs several times a day to extract data from the tables BigQuery creates to hold the imported Firebase data (see this article).
For real-time data streaming, a table with the suffix "_intraday" is created. Once that day is over, the data is moved to the table that contains only full days, and the intraday table is deleted.
It looks like when this happens (moving from intraday to full day), the event_timestamp (UNIX) is slightly changed (by a few milliseconds) for each entry. The problem: I defined the combination of user_id and event_timestamp as my unique key. Because of this shift, the first job that processes the moved table identifies every row as a new, unique row, exactly doubling my resulting data.
Has anyone seen this issue before, and is it expected? Do you know of any solution other than implementing an event ID on the client, giving each event a unique identifier (through custom event params), and using that instead of user_id + timestamp?
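If the drift between the intraday and daily tables really is only a few milliseconds, one workaround is a surrogate key built from the timestamp truncated to whole seconds plus a few extra columns, which survives the move. This is only a sketch under that assumption, with a hypothetical dataset name; a client-side event ID remains the more robust fix.

```sql
-- Sketch: a surrogate key that tolerates sub-second drift in event_timestamp.
-- event_timestamp is in microseconds since epoch in the Firebase export, so
-- DIV(..., 1000000) truncates it to whole seconds. Collisions remain possible
-- for identical user/event/second combinations.
SELECT
  TO_HEX(MD5(CONCAT(
    COALESCE(user_id, ''), '|',
    event_name, '|',
    CAST(DIV(event_timestamp, 1000000) AS STRING)
  ))) AS surrogate_key,
  *
FROM `myproject.analytics_123456789.events_*`;
```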
Thank you.
I recently connected my Google Analytics (GA4) property to BigQuery and set up a daily export of the data. The data doesn't seem to get exported, as I don't find any tables under the dataset in BigQuery. Things I tried:
Deleted the link and the dataset and started from scratch. - Didn't work.
Checked for service accounts - all were present, and it still didn't work.
Issue: the dataset is getting created, but the tables that should contain the raw data are not.
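As a first check, it can help to confirm what the export has actually created so far; note that the first daily export can take up to roughly a day to appear after linking. A minimal sketch, with a hypothetical dataset name:

```sql
-- List every table the export has created in the dataset so far.
SELECT table_name, table_type, creation_time
FROM `myproject.analytics_123456789.INFORMATION_SCHEMA.TABLES`
ORDER BY creation_time DESC;
```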
My team has linked our Firebase and BigQuery projects and set up the intraday table. However, the table is created unpredictably. I was able to use it yesterday (events_intraday_20200701), but it is already noon as I write this, and there is still no intraday table (events_intraday_20200702) in the dataset. (The regular event tables are there, as usual.) In the streaming area of the Firebase console, I can see hundreds of events being generated, but I cannot query an intraday table to see them in real time.
I also struggle to find resources clarifying when the table is created, beyond "raw event data is streamed into a separate intraday BigQuery table in real-time" from https://support.google.com/firebase/answer/6318765?hl=en. Are there reasons why the table may not be created, or more details about what time of day I can expect it to exist?
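In the meantime, a quick way to see whether (and when) an intraday shard has appeared is to list the matching tables from the dataset's metadata. A sketch, with a hypothetical dataset name:

```sql
-- Show which intraday shards exist and when each one was created.
SELECT table_name, creation_time
FROM `myproject.analytics_123456789.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'events_intraday_%'
ORDER BY table_name DESC;
```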
On a related note, is it true that Web events are not supported for the intraday table?
Thanks!
After upgrading to Google Analytics in my Firebase project, I linked it to a new GA property and had to set up my BigQuery integration again (after I accidentally linked/unlinked my GA account, the integration was turned off). I fixed it by linking again, but now all new data is fed into a new analytics_* dataset.
Since all my queries refer to the old dataset, it would be quite an effort to rewrite all of them to point at both the new dataset and the old dataset. Is it possible to either:
Change the destination table in the Firebase BigQuery export (choosing the old dataset instead of the newly created one), or
Somehow merge the two datasets (instead of copying them)?
I understand it's impossible to rename datasets; otherwise I could solve my issue by changing the name of the new set to the old name and copying the contents of the old set into the new one.
UPDATE
I was able to fix it by:
unlinking the project again
using the Firebase Management API to link my Firebase project to the original GA property again:
https://firebase.google.com/docs/projects/api/reference/rest/v1beta1/projects/addGoogleAnalytics#request-body
This started feeding data back into my old dataset. I subsequently copied the date-sharded tables from the newly created dataset back into the old one (in BigQuery) using the same naming convention (e.g. events_20190101), which copied them correctly as shards of the old dataset. I also had to append some intraday events to the existing tables, but this solved my problem in the end.
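For anyone repeating this, below is a minimal SQL sketch of the copy-back step, with hypothetical dataset names (analytics_OLD and analytics_NEW). Each day's shard needs its own statement (or a small script to generate them), and both datasets must live in the same location.

```sql
-- Copy one day's shard from the new export dataset back into the old one,
-- keeping the events_YYYYMMDD naming so it lands as the matching shard.
CREATE TABLE IF NOT EXISTS `myproject.analytics_OLD.events_20190101`
COPY `myproject.analytics_NEW.events_20190101`;

-- Append intraday rows that only ever landed in the new dataset; the intraday
-- and daily tables share the same schema in this export.
INSERT INTO `myproject.analytics_OLD.events_20190102`
SELECT * FROM `myproject.analytics_NEW.events_intraday_20190102`;
```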
According to the BigQuery Export schema documentation, for each Firebase project linked to BigQuery a single dataset named "analytics_<property_id>" is added to your BigQuery project, where the ID refers to your Analytics Property ID, found in the Analytics settings in Firebase (Settings -> Integrations -> Google Analytics).
It seems that this ID is generated automatically when you set up a property and cannot be manually changed to a custom one.
Additionally, there is no way to merge datasets other than copying the data between them. You could therefore consider using scheduled queries to append the new data to your old dataset.
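A minimal sketch of such a scheduled query, with hypothetical dataset names. @run_date is supplied automatically by BigQuery scheduled queries; the destination table in the old dataset is configured on the schedule with an append write disposition (and its name can be templated on the run date if you want to keep the sharded naming).

```sql
-- Scheduled query: select yesterday's shard from the new export dataset so it
-- can be appended to the configured destination table in the old dataset.
SELECT *
FROM `myproject.analytics_NEW.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY));
```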
Hope it helps
I am currently extracting my Firebase event data from BigQuery to an on-site database for analysis. Every time I run the ETL job, I extract the Firebase intraday table(s) along with the previous four days (since previous days' tables continue to be updated). Since there is no key or unique ID for events, I delete and re-insert the past four days of data locally in order to refresh the data from BigQuery.
Would it be possible for me to create a new field called event_dim.etl_status on the intraday table to keep track of events that have already been moved locally? And if so, would this field make its way into the app_events_yyyymmdd table once it is renamed from *_intraday to *_yyyymmdd?
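For context, the refresh pull can be expressed with table wildcards, as in the sketch below (the dataset name is hypothetical, and the older export schema used app_events_* instead of events_*):

```sql
-- Re-pull the previous four finalized daily shards; intraday shards fall
-- outside this lexical suffix range and are excluded.
SELECT *
FROM `myproject.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));

-- ...plus today's intraday shard(s).
SELECT *
FROM `myproject.analytics_123456789.events_intraday_*`;
```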
Edit:
Some more context based on comments from dsesto:
A magical Firebase-BigQuery wizard automatically copies/renames the "intraday" events table into a daily table, so I have no way to reproduce or test this; it is part of the Firebase-to-BigQuery black box.
Since I only have a production environment (Firebase has no mechanism for a sandbox environment), testing this theory would risk breaking my production environment, which is why I posed an "is it possible" scenario in case someone else has done something similar.