Google Analytics export to BigQuery

I have a question about the data export from Google Analytics into BigQuery.
I have configured the streaming export on the Google Analytics side to export the data into BigQuery in real time (table ga_realtime_sessions_YYYYMMDD). This streaming is working fine.
At some point at the end of the day, the data from this real-time table is exported into ga_sessions_YYYYMMDD.
What I need explained is how this export (from the real-time table into the ga_sessions one) works.
I have several automatic processes that run around 8 AM (Portugal time zone) and, in recent days, these processes have been failing because the ga_sessions table for the previous day has not been created yet.
I checked the time at which the ga_sessions tables are created each day and it is very volatile: in some cases it is around 2 AM or 3 AM, but in other cases it is around 7 AM or 8 AM. Could this time difference be due to the amount of data that needs to be exported from the real-time table into the ga_sessions one?

The daily session exports in BigQuery are indeed not completed at the same time every day. This is due to a fully managed backend, which depends on workloads worldwide.
I suggest that you create an event listener on the creation of ga_sessions_YYYYMMDD, so that you only run dependent processes once the table has actually been created.
E.g. you can export the file to a Cloud Storage bucket, then use a trigger with a Cloud Function.
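A minimal sketch of that trigger (Node.js, firebase-functions), assuming the daily export lands as a file named ga_sessions_YYYYMMDD.json in a hypothetical bucket my-ga-exports, with runDependentProcesses standing in for your downstream jobs:

const functions = require('firebase-functions');

// Hypothetical stand-in for the processes that currently run at 8 AM.
async function runDependentProcesses(fileName) {
  console.log(`Running jobs that depend on ${fileName}`);
}

exports.onGaExportCreated = functions.storage
  .bucket('my-ga-exports') // hypothetical bucket receiving the daily export
  .object()
  .onFinalize((object) => {
    // Ignore anything that is not a daily ga_sessions export file.
    if (!/^ga_sessions_\d{8}/.test(object.name)) return null;
    return runDependentProcesses(object.name);
  });

This way the 8 AM schedule no longer matters: the dependent work starts whenever the export actually appears, whether that is 2 AM or 8 AM.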

Related

Firebase Functions returns error of Bandwidth Exhausted

We are using Firebase Functions with a few different HTTP functions.
One of the functions runs via a manual trigger from our website. It then pulls in a lot of data from an external resource and saves it into our Firestore database. Our function runs on Node.js 10 with 1 GB of memory and a 540 s timeout.
However, when we have large datasets that we need to pull in, e.g. 5 000 - 10 000 records to write to the database, we start running into issues. On large datasets we receive the following error:
8 RESOURCE_EXHAUSTED: Bandwidth exhausted
The full error in the Firebase Functions Health Dashboard logs looks like this:
Error: 8 RESOURCE_EXHAUSTED: Bandwidth exhausted
at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:176:52)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:342:141)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:305:181)
at Http2CallStream.outputStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:117:74)
at Http2CallStream.maybeOutputStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:156:22)
at Http2CallStream.endCall (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:142:18)
at ClientHttp2Stream.stream.on (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:420:22)
at ClientHttp2Stream.emit (events.js:198:13)
at ClientHttp2Stream.EventEmitter.emit (domain.js:466:23)
Our Firebase project is on the Blaze plan and is connected to an active billing account on GCP.
Upon inspection on GCP, it seems like we are NOT exceeding our writes-per-minute quota, as previously thought; however, we are exceeding our Cloud Build limit. We are also using batched writes when we save data to Firestore from within the function, which also reduces the number of database writes, e.g.:
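Our batching looks roughly like this (simplified; the records collection name and data shape are placeholders, and Firestore allows at most 500 operations per batch, so we chunk the import):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Write the imported records in chunks of 500, Firestore's per-batch limit.
async function saveInBatches(records) {
  for (let i = 0; i < records.length; i += 500) {
    const batch = db.batch();
    for (const record of records.slice(i, i + 500)) {
      batch.set(db.collection('records').doc(), record); // auto-generated ID
    }
    await batch.commit(); // one RPC per 500 writes instead of one per document
  }
}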
We don't use Cloud Build directly, so I assume that Firebase Functions uses Cloud Build in the back end to run the functions or something, but I can't find any documentation on the matter. We also have a few Firestore database functions that run when documents are created. Not sure if those use Cloud Build in the back end or not.
Any idea why this would happen? Whenever this happens, our function gets terminated with that error, which causes us to only import half of our data. The data import works flawlessly with smaller amounts of data.
Cloud Build is used during the deployment of Cloud Functions. If you check this documentation you can see that:
Deployments work by uploading an archive containing your function's source code to a Google Cloud Storage bucket. Once the source code has been uploaded, Cloud Build automatically builds your code into a container image and pushes that image to Container Registry. Cloud Functions uses that image to create the container that executes your function.
This by itself is not enough to justify the charges you are seeing, but if you check the container image documentation it says:
Because the entire build process takes place within the context of your project, the project is subject to the pricing of the included resources:
For Cloud Build pricing, see the Pricing page. This process uses the default instance size of Cloud Build, as these instances are pre-warmed and are available more quickly. Cloud Build does provide a free tier: please review the pricing document for further details.
So with that information in mind, I would make an educated guess that your website is triggering the HTTP function enough times to make Cloud Functions scale up this particular function with new instances of it, which triggers a build process for the container that hosts the function and is billed as a Cloud Build charge. So to keep doing what you are doing, you are going to have to increase your Cloud Build quota to meet the demand of your website.
There was a Firestore trigger that was firing on new records of the same type I was importing.
So in short, I was creating thousands of records in a collection, and for every one of those the Firestore trigger (function) ran. What I did not know at the time is that this created a new build process in the background for each trigger execution, which is not documented anywhere.

Firebase cloud functions dynamic time zones

In my Android app, I am using the Realtime Database to store information about my users. That information should be updated every Monday at 00:00. I am using a Cloud Function to do this, but the problem is time zones. Right now I have set the time zone to 'Europe/Sofia' for testing purposes. The documentation says that the time zone for Cloud Functions must be from the TZ database. So I figured I could ask users for their preferred time zone before they register in my app and save it in the database. My question is: after getting a user's preferred time zone, is there a way to write only one Cloud Function and execute it dynamically for each time zone in the TZ database, or do I have to create individual functions for each time zone?
If I understand your question correctly, you could have a scheduled Cloud Function that runs every hour from 00:00 to 23:00 UTC+14:00 on Mondays and, for every execution (i.e. for every hour within this range), queries for the users that should be updated and executes the updates.
I can't go into more detail based on the info you have provided.
It's not possible to schedule a Cloud Function using a dynamic timezone. You must know the timezone at the time you write the function and declare it statically in your code.
If you want to schedule something dynamically, read through your options in this other question: https://stackoverflow.com/a/42796988/807126
So, you could schedule a repeating function that runs every hour and check whether something should be run for a user at the moment it is invoked. Or, you can schedule a single future invocation of a function with a service like Cloud Tasks, and keep rescheduling it as needed.
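A minimal sketch of the hourly approach (the /users path, the timezone field, and the weeklyScore reset are assumptions, not from the question):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Current weekday and hour in a given IANA time zone (needs a Node runtime
// with full ICU data).
function localWeekdayAndHour(timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone, weekday: 'short', hour: 'numeric', hourCycle: 'h23',
  }).formatToParts(new Date());
  const get = (type) => parts.find((p) => p.type === type).value;
  return { weekday: get('weekday'), hour: Number(get('hour')) };
}

// Runs at the top of every hour; updates only the users for whom it is
// currently Monday 00:xx in their own time zone.
exports.weeklyReset = functions.pubsub.schedule('0 * * * *').onRun(async () => {
  const users = await admin.database().ref('users').once('value');
  const jobs = [];
  users.forEach((user) => {
    const { weekday, hour } = localWeekdayAndHour(user.child('timezone').val());
    if (weekday === 'Mon' && hour === 0) {
      jobs.push(user.ref.update({ weeklyScore: 0 })); // hypothetical reset field
    }
  });
  return Promise.all(jobs);
});

Users in zones with 30- or 45-minute offsets (e.g. Asia/Kathmandu) would be updated at 00:30 or 00:45 local time rather than exactly at midnight; run the schedule more often if that matters.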

Google Cloud Scheduler Run at Set Times Every Minute

I am trying to call an API every minute for ski lift status and check for changes. I am going to store whether the lift is open or closed in Firebase (Realtime Database), compare the value from the API against it, and only update/write to that node when the value is different. Then I can set up a Cloud Function that listens for database changes and sends push notifications to the list of FCM tokens for that channel. I am not sure if this is the most efficient way, but I was going to set up scheduled functions to call the third-party API.
I have been using these docs:
https://firebase.google.com/docs/functions/schedule-functions
I was planning to do something like this:
const functions = require('firebase-functions');

exports.scheduledFunction = functions.pubsub.schedule('every 5 minutes').onRun(async (context) => {
  // Call my API in here and update the database if the snapshot that comes back is different.
});
I was wondering how I would run this only between set times, say 8 AM-6 PM EST. I am struggling to find anything about run times. Should I just run the function every minute and then pause and resume by checking the time? In which case, how does it know to keep checking the time while it is paused?
Firebase scheduled functions use Cloud Scheduler to implement the schedule. It accepts cron-style time specifiers to indicate when a job should run; the full spec for that can be found here. You will have to use ranges of numbers to indicate the valid times and frequency of the schedule. For example, you might use "8-18" in the hour field to limit the hours of execution.
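Putting that together, a minimal sketch (the cron string and time zone are illustrative; "* 8-18 * * *" fires every minute from 08:00 through 18:59):

const functions = require('firebase-functions');

// Every minute, but only during the 8-18 hour range, New York time.
exports.liftStatusPoll = functions.pubsub
  .schedule('* 8-18 * * *')
  .timeZone('America/New_York')
  .onRun(async (context) => {
    // Poll the third-party lift-status API here and write to the database
    // only when the open/closed value has changed.
  });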

Why is my export sink from Stackdriver only loading the latest audit logs into BigQuery and no historical?

I created an export sink in Stackdriver to load audit logs into BigQuery. I want to be able to see audit logs from the past 3 months. However, when I query the tables in BigQuery, I only see logs from today and nothing earlier.
I applied the following filters to my export sink. I also tried removing the timestamp filter, but I am still only seeing logs from today:
resource.type="bigquery_dataset"
timestamp > "2019-05-01T23:59:09.739Z"
Exports only work for new entries.
Per the documentation --
"exporting happens for new log entries only, you cannot export log entries that Logging received before your sink was created."
https://cloud.google.com/logging/docs/export/#how_sinks_work

Time Period of Firebase realtime profile operations

The official document of Firebase Realtime profiler says:
The profiler tool logs all the activity in your database over a given period of time, then generates a detailed report.
But it doesn't say the specific time period, like the last 24 hours.
My database usage shows that on a particular day the bandwidth consumed is X, so I want to specify a particular day or a time duration, like the last 24 hours, in the Firebase Realtime Database profiler.
Q1. Is it possible to specify the duration in profile like last 24 hours?
Q2. How does profiler work?
I think the profiler just scans some log and keeps writing/streaming the operations to the user console until the user stops the profiling tool. Correct me if I am wrong here.
Q1. Is it possible to specify the duration in profile like last 24 hours?
No, it's not possible to profile the "last" hours. But you can profile the next 24 (I'll get to that in Q2).
Q2. How does profiler work?
What the profiler does is log all the operations happening on your database from the time you run the command until the time you stop it. When you run the command, the console will show you how many operations have been logged so far, and you can press Enter to stop logging. It will then show you (or save to a file, if you prefer) speed and bandwidth reports.
But it also has an option to set the logging duration (in seconds). For example, if you want to log the next 24 hours you can use:
firebase database:profile -d 86400
But bear in mind that logging only happens while the computer that started it stays on. This means you'll need to keep your computer on for the next 24 hours.
