Uploading files automatically from Google Cloud to BigQuery

How to automate the process of uploading .CSV files from Google Cloud Storage to BigQuery.

Google Cloud Storage provides access logs and storage logs as CSV files that can be imported directly into BigQuery for analysis. In order to access these logs, you must set up log delivery and enable logging. The schemas are available online in JSON format for both the storage access logs and the storage bucket data. More information is available in the Cloud Storage access logs and storage data documentation.
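Enabling log delivery is done with gsutil. A minimal sketch, assuming the log bucket gs://my_logs used in the example below and a hypothetical monitored bucket gs://my_bucket:
gsutil mb gs://my_logs
gsutil acl ch -g cloud-storage-analytics@google.com:W gs://my_logs
gsutil logging set on -b gs://my_logs gs://my_bucket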
In order to load storage and access logs into BigQuery from the command line, use a command such as:
bq load --schema=cloud_storage_usage_schema.json my_dataset.usage_2012_06_18_v0 gs://my_logs/bucket_usage_2012_06_18_14_v0
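Since the goal is automation, one approach (an assumption, not spelled out in the original) is to schedule that load with cron; a sketch with hypothetical paths:
#!/bin/bash
# load_yesterdays_logs.sh - load yesterday's hourly usage logs into BigQuery
DAY=$(date -d yesterday +%Y_%m_%d)  # GNU date
bq load --schema=cloud_storage_usage_schema.json \
  my_dataset.usage_${DAY}_v0 \
  "gs://my_logs/bucket_usage_${DAY}_*_v0"
A crontab entry such as 0 6 * * * /path/to/load_yesterdays_logs.sh then runs the load every morning.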

Related

Copy Firestore Database Data to BigQuery using Cloud Shell Terminal

Does anyone know how I can manually copy/transfer data from a Firestore database to BigQuery using the Cloud Shell terminal?
I did this in the past, but I'm unable to find the documentation/video that I used. I find a lot of sources stating that once BigQuery is connected to Firebase it should be automatic, but mine is not.
When I ran code in the Cloud Shell terminal to pull data from Firebase, the collection was copied as a table into a BigQuery dataset. Two tables, "raw_latest" and "raw_changelog", were created.
I'm not sure how to transfer another collection now.
I specifically need to transfer data from a subcollection in the Firestore database.
You can now export data from Cloud Firestore to BigQuery with a Firebase Extension. Install the extension before importing your existing data: any writes made while exporting without the extension installed will be lost.
See: https://firebase.google.com/products/extensions/firestore-bigquery-export
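The extension can be installed from the Firebase console or with the Firebase CLI; a minimal sketch, assuming the CLI is configured (project ID hypothetical):
firebase ext:install firebase/firestore-bigquery-export --project=my-project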
Firestore also allows importing/exporting data to BigQuery through a GCS bucket: the data is exported to a Cloud Storage bucket and from there it can be imported into BigQuery.
The commands for this are:
Export the data:
gcloud beta firestore export --collection-ids=users gs://mybucket/users
Load the backup into BigQuery:
bq load --source_format=DATASTORE_BACKUP mydataset.users gs://mybucket/users/all_namespaces/kind_users/all_namespaces_kind_users.export_metadata
Here are some links that might be helpful:
https://firebase.google.com/docs/firestore/manage-data/export-import
https://cloud.google.com/bigquery/docs/loading-data-cloud-datastore
https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/IMPORT_EXISTING_DOCUMENTS.md
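The last guide above covers backfilling documents that existed before the extension was installed; it ships an import script that prompts for the project ID, the collection path, and the target dataset. A minimal sketch:
npx @firebaseextensions/fs-bq-import-collection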

How to remove EXIF data from files uploaded to Cloud Storage via Firebase

I am uploading files, including pictures and videos taken on phones with location data, to a Cloud Storage bucket via the Firebase Storage SDK. How can I remove the EXIF data from these files? It would be nice to remove them on the client side, but if it requires a Cloud Function, that's fine.
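One server-side option (an assumption, not from the original thread) is to strip the metadata after upload, for example by shelling out to exiftool from a worker triggered on new objects; a sketch with hypothetical object names:
gsutil cp gs://my-bucket/uploads/photo.jpg /tmp/photo.jpg
exiftool -all= -overwrite_original /tmp/photo.jpg
gsutil cp /tmp/photo.jpg gs://my-bucket/uploads/photo.jpg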

Firebase events export from BigQuery

Is there a recommended way of exporting Firebase events to Google Cloud Storage (for example, in Parquet format)?
If I export my data to BigQuery, what is the best way to have the data consistently pushed to Google Cloud Storage?
The reason is that I have Dataproc jobs dealing with Parquet files in Cloud Storage, and I want my Firebase data to be accessible in the same way.
Exporting data from BigQuery directly as Parquet files is not supported currently.
BigQuery supports three export formats now:
CSV
Avro
JSON
You have the option of transforming the data into Parquet files using Apache Beam and Google Cloud Dataflow: read the data from BigQuery, convert it with ParquetIO, and write it to Cloud Storage.
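If a full Beam pipeline is more than you need, a lighter alternative (an assumption, not part of the original answer) is to export to Avro, which BigQuery does support, and convert Avro to Parquet inside the Dataproc/Spark jobs you already run. The export is a single command (table and bucket names hypothetical):
bq extract --destination_format=AVRO my_dataset.firebase_events gs://my_bucket/avro/firebase_events_*.avro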
References:
Exporting Data (BigQuery)
https://cloud.google.com/bigquery/docs/exporting-data#export_formats_and_compression_types
ParquetIO (Apache Beam)
https://beam.apache.org/releases/javadoc/2.5.0/org/apache/beam/sdk/io/parquet/ParquetIO.html

How to obtain Firebase/Google Cloud Storage Download Analytics

I understand that many of Firebase's services are wrappers around the Google Cloud platform (Functions, Storage, etc.). I would like to obtain analytics for Google Cloud Storage downloads on a per-object basis (downloads, time). The Firebase console shows the number of requests as well as the amount of data downloaded, but not which objects were downloaded or how often.
Is there a logging method or API in Google Cloud I can use to obtain this data?
Access logs and storage logs let you get information about all the requests made to a bucket.
You will have to set up a bucket dedicated to storing the access and storage logs, and then enable logging on each bucket, pointing it at the dedicated one.
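For example (bucket names hypothetical):
gsutil mb gs://my_access_logs
gsutil acl ch -g cloud-storage-analytics@google.com:W gs://my_access_logs
gsutil logging set on -b gs://my_access_logs gs://my_app_bucket
The delivered CSV log objects can then be loaded into BigQuery with bq load, as shown at the top of this page.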

Access Google Datastore via BigQuery

It looks like querying Google Cloud Bigtable data is possible with BigQuery, with a URL like:
https://googleapis.com/bigtable/projects/[PROJECT_ID]/instances/[INSTANCE_ID]/tables/[TABLE_NAME]
Even though Google Datastore is built on Google Bigtable, there's no indication of what the PROJECT_ID, INSTANCE_ID or TABLE_NAME would be, where
[PROJECT_ID] is the project containing your Cloud Bigtable instance
[INSTANCE_ID] is the Cloud Bigtable instance ID
[TABLE_NAME] is the name of the table you're querying
Is connecting to Datastore via a live connection possible via BigQuery? (i.e. not just via datastore-backup)
BigQuery allows you to query the following sources:
CSV files
Google sheets
Newline-delimited JSON
Avro files
Google Cloud Datastore backups
[Beta] Google Cloud Bigtable
BigQuery does allow you to query Google Cloud Datastore backups, but to do that you need to create a table in BigQuery from the Datastore backup.
Follow the steps (a consolidated command sketch follows the example below):
Step 1 - Create a bucket on GCS to store your backup. (link)
Step 2 - Take a backup of Datastore. (link)
Step 3 - On BigQuery, load your backup, creating a table. (link)
Some considerations about Step 3:
You need to import table by table.
The location will be the files ending in [Entity Name].backup_info.
Ex:
gs://bck-kanjih/ag9zfmdvb2dsLWNpdC1nY3ByQQsSHF9BRV9EYXRhc3RvcmVBZG1pbl9PcGVyYXRpb24YwZ-rAwwLEhZfQUVfQmFja3VwX0luZm9ybWF0aW9uGAEM.Conference.backup_info
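A consolidated sketch of the three steps using the newer managed export (the legacy Datastore Admin backups shown above end in .backup_info instead; all names hypothetical):
gsutil mb gs://my_datastore_backups
gcloud datastore export --kinds=Conference gs://my_datastore_backups
bq load --source_format=DATASTORE_BACKUP \
  my_dataset.conference \
  gs://my_datastore_backups/[EXPORT_DIR]/all_namespaces/kind_Conference/all_namespaces_kind_Conference.export_metadata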
