Is there a way to create custom day partition on BigQuery Tables? - google-cloud-datastore

I'm new to BigQuery and I'm trying to create a table with a day partition other than the default that Google assigns. Is it possible to backdate the DAY partition? Since I'm loading historical data, it wouldn't be helpful to use today's date for those partitions. I'm creating the table in BigQuery from Google Cloud Storage.
Thanks!

Yes, you can use partition decorators to insert data into a specific partition: https://cloud.google.com/bigquery/docs/partitioned-tables#addressing_table_partitions.
bq load 'mydataset.table$20160519' ...
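When backfilling many historical days, one `$YYYYMMDD` decorator is needed per date. A minimal sketch of generating those partition specs (the dataset/table name and date range are placeholders):

```python
from datetime import date, timedelta

def partition_decorators(table, start, end):
    """Yield 'table$YYYYMMDD' specs for each day in [start, end]."""
    day = start
    while day <= end:
        yield f"{table}${day.strftime('%Y%m%d')}"
        day += timedelta(days=1)

# Each spec can be passed to `bq load` to target that day's partition.
for spec in partition_decorators("mydataset.table", date(2016, 5, 17), date(2016, 5, 19)):
    print(spec)  # e.g. mydataset.table$20160517
```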

Related

Firebase Export to BigQuery: event_timestamp changes when going from intraday to full day table

I have a mobile application that is connected to a BigQuery Warehouse through Firebase export. For keeping my dashboards up to date, I run incremental jobs (dbt) several times a day to extract data from the tables BigQuery creates that contain imported Firebase data. (see this article).
For real-time data streaming, a table with the suffix "_intraday" is created. Once that day is over, the data is moved over to the table which only contains full days and the intraday table is deleted.
It looks like when this happens (moving from intraday to full day), the event_timestamp (UNIX) is slightly changed (by a few milliseconds) for each entry. The problem: I defined the combination of user_id and event_timestamp as the unique key. Due to this issue, the first job that touches the moved table identifies each row as a new, unique row, exactly doubling my resulting data.
Has anyone seen this issue and knows whether it's expected? Do you know any solution other than implementing an event ID on the client, giving each event a unique identifier (through a custom event parameter) and using that instead of user_id + timestamp?
Thank you.
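To illustrate the failure mode described above: a composite key built from user_id plus the raw microsecond event_timestamp changes whenever the timestamp shifts by a few milliseconds during the intraday-to-daily move (the user ID and timestamp values below are made up):

```python
def composite_key(user_id, event_timestamp_us):
    # Unique key as defined in the dbt model: user_id + raw UNIX timestamp.
    return f"{user_id}:{event_timestamp_us}"

intraday_ts = 1_657_000_000_123_456  # microseconds, as in the intraday table
daily_ts = intraday_ts + 3_000       # same event, shifted by 3 ms in the daily table

# The two copies of the same event no longer share a key, so the
# incremental job treats the daily row as new and the data doubles.
print(composite_key("user_42", intraday_ts) == composite_key("user_42", daily_ts))  # False
```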

DLP data scan from bigquery table showing row_index as null

I have scanned a BigQuery table from the Google DLP console. The scan results are saved back into a BigQuery table. DLP has identified sensitive information, but the row_index ("location.content_locations.record_location.table_location.row_index") is shown as null. Can anyone help me understand why?
We no longer populate row_index for BigQuery because it's not meaningful: BigQuery tables are unordered. If you want to identify the row a finding came from, I suggest using identifyingFields, which lives in BigQueryOptions, when you create your job.
https://cloud.google.com/dlp/docs/creating-job-triggers#job-identifying-fields
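A sketch of what that looks like in an inspect job's storage config (field names follow the DLP API's BigQueryOptions message; the project/dataset/table and the column name are placeholders):

```python
# Hypothetical inspect-job storage config: identifying_fields tells DLP
# which columns to echo back with each finding, since row_index is no
# longer populated for BigQuery sources.
storage_config = {
    "big_query_options": {
        "table_reference": {
            "project_id": "my-project",  # placeholder
            "dataset_id": "my_dataset",  # placeholder
            "table_id": "my_table",      # placeholder
        },
        # Columns that uniquely identify a row, e.g. a primary key.
        "identifying_fields": [{"name": "customer_id"}],
    }
}

print(storage_config["big_query_options"]["identifying_fields"])
```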

Is there any way to access Firebase raw event data prior to linking to BigQuery?

I need to access raw event data stored in Firebase, so I linked Firebase to BigQuery last month. BigQuery has been creating daily tables containing event data for a month now. However, as the BigQuery documentation states, it is not possible to export data from before the link was created. Does anyone know how that data can be exported?
The exact dataset from before the linking cannot be exported in any way.
The only workaround, if you want to look up specific information, is to try the GA4 Data API to fetch it. Again, this will not give you the entire dataset export.

Why is the intraday table sometimes missing from the BigQuery dataset?

My team has linked our Firebase and BigQuery projects and set up the intraday table. However, the table is created unpredictably. I was able to use it yesterday (events_intraday_20200701), but it is already noon as I write this and there is still no intraday table (events_intraday_20200702) in the dataset (the regular event tables are there, as usual). In the streaming area of the Firebase console I can see hundreds of events being generated, but I cannot query an intraday table to see them in real time.
I also struggle to find resources clarifying when the table is created besides "raw event data is streamed into a separate intraday BigQuery table in real-time" from
https://support.google.com/firebase/answer/6318765?hl=en. Are there reasons why the table may not be created, or more details about what time during the day I can expect it to exist?
On a related note, is it true that Web events are not supported for the intraday table?
Thanks!

Create a table from a query result with record fields in BigQuery

We have a set of tables with data about users' online interactions, and we want to create a table with a schema similar to the GA BigQuery Export Schema (this feature is not yet available in Russia).
I couldn't find any information on how to create a RECORD field in BQ by querying existing tables.
On the contrary, it is written that "This type is only available when using JSON source files."
Is there any workaround, or is this feature expected in the near future? Can I submit a feature request?
Currently the only way to get nested and repeated records into BigQuery is loading JSON files. Once a query is run, all structure is flattened.
Feature request noted, hopefully BigQuery will support emitting nested records results!
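As a sketch of that workaround: newline-delimited JSON with nested objects and arrays maps onto RECORD and REPEATED fields when loaded (the field names below are invented for illustration):

```python
import json

# One row with a nested RECORD ("device") and a REPEATED RECORD ("hits"),
# loosely mirroring the shape of the GA export schema.
row = {
    "user_id": "u1",
    "device": {"browser": "Chrome", "os": "Linux"},
    "hits": [
        {"page": "/home", "time": 1},
        {"page": "/cart", "time": 2},
    ],
}

# Write newline-delimited JSON, suitable for
#   bq load --source_format=NEWLINE_DELIMITED_JSON ...
with open("rows.json", "w") as f:
    f.write(json.dumps(row) + "\n")
```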
