Stackdriver Logging as Data Pipeline for Analytics - stackdriver

Is it appropriate to use Stackdriver Logging as a means of recording events for analytics such as ad impressions?
Stackdriver Logging can be used to record events (expressed in JSON) that can then be written to GCS, Pub/Sub, and/or BigQuery. Is it appropriate to use this as a means of recording events such as ad impressions for use in OLAP processes? For example, are its reliability and throughput adequate for such use cases?

I can't see any reason why Stackdriver Logging wouldn't technically work. It appears that you could use the Logging API to write the records to Stackdriver and then have an export pump them outbound to BigQuery, GCS, or Pub/Sub. Combine that with a Stackdriver Logging exclusion and the written records need not be stored in the actual log, and hence apparently would not count toward your logging utilization.
A possible downside is that by passing through Stackdriver you are likely increasing the path length of writes, so it may take longer for a write to reach its final destination. However, for historical analytics, this should not pose a problem.
I'd also suggest that you encapsulate your record writes in a function. Your initial implementation might use Stackdriver, but if you change the implementation in the future, you will only have to modify the function body.
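As a minimal sketch of that encapsulation idea: callers only ever use record_event, and the backend behind it can be swapped. The names record_event, set_event_sink, and stdout_sink are hypothetical, and the stdout backend is only a stand-in; a real implementation would call the Stackdriver Logging client (or whatever replaces it) inside the sink.

```python
import json
import sys
import time
from typing import Any, Callable, Dict

# A sink is any callable that accepts one JSON-serializable event dict.
EventSink = Callable[[Dict[str, Any]], None]


def stdout_sink(event: Dict[str, Any]) -> None:
    """Hypothetical stand-in backend: one JSON object per line on stdout."""
    sys.stdout.write(json.dumps(event, sort_keys=True) + "\n")


# The currently active backend. Only this binding changes when the
# implementation moves from one logging/transport system to another.
_sink: EventSink = stdout_sink


def set_event_sink(sink: EventSink) -> None:
    """Swap the backend without touching any call site."""
    global _sink
    _sink = sink


def record_event(event_type: str, **fields: Any) -> Dict[str, Any]:
    """Build an event record and hand it to the active sink."""
    event = {"type": event_type, "ts": time.time(), **fields}
    _sink(event)
    return event
```

A call site would then look like record_event("ad_impression", ad_id="a1"), regardless of which sink is active.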

Related

Monitoring reads (per user) in Firebase

I have thought about monitoring reads for users in Firebase.
Let's say that a user makes a lot of "reads", then I will monitor these "reads" by writing to a counter in a separate document. Once the user has reached 104 reads I will tell the user that they have reached their daily quota. In such a case I also want to disable them from making any further reads (for a day), is this possible in Firebase?
No, that's not currently possible. At least not in any secure way that would prevent the user from manipulating the value.
Your best solution at present would be to use Firestore or RTDB as a non-public database and then create an API behind an API gateway to handle your rate limiting. Reads then go through your API, which gives you much more control but of course means losing realtime updates.
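A rough sketch of the server-side check such an API could run before serving a read, assuming a per-user, per-day counter. DailyReadQuota and try_read are hypothetical names, and the in-memory dictionary is only a stand-in: a real service would keep the counter in a database and update it in a transaction.

```python
import datetime
from collections import defaultdict
from typing import Optional


class DailyReadQuota:
    """Counts reads per (user, calendar day) and refuses reads over the limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # In-memory stand-in for a server-side counter store.
        self._counts = defaultdict(int)

    def try_read(self, user_id: str, today: Optional[datetime.date] = None) -> bool:
        """Return True and count the read, or False if the quota is used up."""
        today = today or datetime.date.today()
        key = (user_id, today)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

Because the counter lives on the server, the client cannot manipulate it, and the quota resets naturally when the date changes.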

Is there a Google API that reports Firestore database metrics, health checks, current active connections, exceptions, or performance?

Context: I am a total Google Cloud beginner and I have just convinced my company's leadership to use Firestore Realtime Database for pushing transaction status to our mobile application. We have around 4 million users who will use our application heavily for small money transfers. Nowadays we poll our microservice endpoints from Android/iOS; this will be replaced by the Firebase SDK imported into our mobile app, which will listen to/observe our Firestore collection, subject to a few Firestore rules. Since every money transfer will be confirmed or denied within a short time (from a few seconds to 1 or 2 minutes), the idea of replacing polling with a truly reactive approach straight from Firestore sounded good, and the coding is already under way.
The issue: Firstly, I don't want to compare solutions. It is just my reality: the production support operators must use our internal dashboard. They are not allowed to look at the Google console dashboard (please accept this for this question). I need to get on-demand metrics for our Firestore. This has nothing to do with Google pricing. It is just our requirement: they want to see metrics like:
how many users listening at the same time now
how many users took some exception during connection
is there any user holding connection for more than X minute
when was the connection peak this morning
any exception of any type surrounding our Firestore database
I read the Code Samples carefully and followed a sample step by step, trying to figure out whether there is an API providing the answers I am looking for.
So, my straight question is: is there a Google API that provides metrics about my Firestore database? Maybe something following the same idea as Performance Monitoring, which works on the mobile side, but with a similar approach on the Firestore side.
*** Edited
Future readers may also find it worth reading about a way to get Firestore metrics straight from curl/Postman.
A couple of things: You mentioned both Firestore and Realtime Database; just wanted to make sure that you are aware that those are two different databases offered under the Firebase umbrella.
how many users listening at the same time now
is there any user holding connection for more than X minute
Yes, there's a dashboard: https://support.google.com/firebase/answer/6317517?hl=en. Including lots of options, like users active in the last 30 mins.
how many users took some exception during connection
any exception of any type surrounding our Firestore database
Yes, you can track errors and other logging via Stackdriver Logging. These can give you reports on your Cloud Functions.
https://cloud.google.com/functions/docs/monitoring
Where can I find Stackdriver in Firebase console?
when was the connection peak this morning
For this one, I'm not sure if you mean A. when somebody logged on in the morning, or B. the time of peak usage. If B, see the dashboard linked above. If A,
Realtime Database has the concept of presence, which lets you know if a user is currently logged in or not. See examples here from the official documentation:
https://firebase.google.com/docs/firestore/solutions/presence
and this post
How to make user presence mechanism using Firebase?
Also applies to your
is there any user holding connection for more than X minute
..............
Edit in response to comments: I believe you are experiencing the XY problem https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem where you are focused on a particular solution, even though your problem has other solutions. User metrics, database events, and errors are all accessible through both dashboards and cloud functions. You can cURL cloud functions if you wish, or set up cron functions to auto report, or set up database trigger functions to log errors. So, while the exact way you want this to work may not exist, you just need to connect existing tools to get the result you want.

Throttling callable https firebase cloud function execution per user?

I was not able to find any resources about this, hence wanted to ask if it is a good idea / necessary to add throttling to callable https cloud functions in firebase on per user basis?
For example, I want to limit a user to calling the HTTPS function at most once every 5 seconds.
If it is a viable thing to do, how would it be achieved?
There are no built-in per-user throttling capabilities in Cloud Functions. You have a few options for doing your own:
Put logic in your client-side apps that tracks the number of times a user calls the functions and denies the call if it is too frequent.
The issue here is that if someone is trying to game you, this wouldn't be 100% effective, as they could use multiple windows, etc.
You could implement a database solution where you track their usage, and at the beginning of your function you check whether they are violating your rate limit.
The issue here is that the function is still triggered on every call, so you still incur the invocation costs.
If it was a really big issue for you, I would recommend looking at an API management platform such as Apigee, where you can apply policies such as rate limiting.
This is a heavyweight solution with an increased cost, so I wouldn't do it unless necessary.
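To illustrate the second option (a check at the beginning of the function), here is a minimal sketch of a per-user minimum-interval throttle. PerUserThrottle and allow are hypothetical names, and the dictionary is an in-memory stand-in for the database; since function instances do not share memory, a real version would store the last-call timestamp in Firestore or RTDB and update it in a transaction.

```python
import time
from typing import Callable, Dict


class PerUserThrottle:
    """Allows each user at most one call per min_interval_seconds."""

    def __init__(
        self,
        min_interval_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval = min_interval_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        # user_id -> time of the last accepted call (stand-in for a DB record)
        self._last_call: Dict[str, float] = {}

    def allow(self, user_id: str) -> bool:
        """Return True if the call may proceed, False if it is too soon."""
        now = self._clock()
        last = self._last_call.get(user_id)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_call[user_id] = now
        return True
```

The function body would call allow(uid) first and return an error to the caller when it yields False. Rejected calls do not reset the window here; whether they should is a design choice.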

What Firebase mechanism to use for system logging, Fabric, Firebase Analytics or RealTime DB

I have a problem understanding my data in the Firebase Realtime Database;
it is not always clear, and it is not easy to navigate and query the data.
The database entries are not easily understood and interpreted. Adding more data to make the entries clearer will add to the data storage costs.
Extra data added for a certain duration cannot be cleared out easily. It is also not visible when database read operations are performed. Dates are stored as integers and cannot be interpreted easily. I have to use a bash command "date -r ", etc.
Clearly the database alone is not enough to debug flow of events,
and as the database grows, analysis of data will be harder.
I have no record of the sequence of the events that describes the database entries.
Possible solutions:
(1) Use Realtime Database
I create another Realtime Database "node" and log all the events in a human-friendly way to this "node". This way I have control over this data and can clear it at any time I wish, minimising the data "costs". The only problem I see with this is that I will have to remember to clear this data periodically. (Perhaps Firebase has some periodic scheduler to call some process.) Or use some mobile clients to trigger the events...
(2) Fabric
My other option is to use Fabric's Answers, but after looking at the reporting of this data, it does not really suit my needs: there is no filtering, and the details of the messages are not as verbose as I had expected.
(3) Firebase Analytics
I am not sure about Firebase Analytics, I see no mechanism to clear events,
will this add to my costs? Will it be easy enough to filter / query logs
to analyse a certain sequence of events.
Typically I would like to see data something like this:
data_time_user_friendly,user_id,user_friendly_id,event_action,payload
What is the best practice for having a remote syslog to analyse my data and flow of events?
Edited ....
After some searching, I found there are numerous products that seem more suitable and are specifically developed for logging.
I did a quick comparison of some "free" products for system logging:
Papertrail: 100 MB/month, 7 days retention, 48 hours search
Loggly: 200 MB/day, 7 days retention
Bugfender: 100K lines/day, 1 day retention
Logz.io: 1 GB/day, 3 days retention
This is just a quick comparison; I have not evaluated any of the options.
Does Firebase have a solution for this or is it best to use one of the above mentioned products?
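Whichever destination ends up being used, the record layout described above (data_time_user_friendly,user_id,user_friendly_id,event_action,payload) can be produced from the integer dates with a small helper. This is only a sketch under assumptions: format_log_line is a hypothetical name, the integer is assumed to be epoch seconds (divide by 1000 first if the database stores milliseconds), and the payload is assumed to be JSON-serializable.

```python
import csv
import io
import json
from datetime import datetime, timezone
from typing import Any


def format_log_line(
    epoch_seconds: int,
    user_id: str,
    user_friendly_id: str,
    event_action: str,
    payload: Any,
) -> str:
    """Return one CSV line with a human-readable UTC timestamp first."""
    readable = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )
    buf = io.StringIO()
    # csv.writer quotes fields that contain commas or quotes (e.g. JSON).
    csv.writer(buf).writerow(
        [
            readable,
            user_id,
            user_friendly_id,
            event_action,
            json.dumps(payload, sort_keys=True),
        ]
    )
    # Drop the line terminator the csv writer appends.
    return buf.getvalue().rstrip("\r\n")
```

Writing the payload as a quoted JSON field keeps each event on a single line, which makes the log easy to filter with ordinary text tools.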

Google Analytics real-time - keep alive

I have a realtime platform where users stay on pages for a long time. I found that after about 5 minutes GA realtime stops showing them, so I created a timer that sends a pageview every 4 minutes; this way all users remain "connected" to GA.
I wonder whether this is a good approach or whether it may produce "un-accurate" data on the reports later.
Has anyone experienced that?
Your terminology seems a little off: users do not become "disconnected" from Google Analytics. The difference between realtime reports and data from the Reporting API is that the former show only a subset of ad hoc computed dimensions and metrics, whereas the Reporting API shows, after some processing latency, the full set of metrics and dimensions, including those that require more processing time, like session- and user-scoped data.
Other than that, your approach is fine. There is a limit on the number of API calls you are allowed to make. The documentation has an example of how to calculate your calls to stay within the limits, and Google suggests implementing some sort of server-side caching if you do need a lot of realtime dashboards.
But this is not going to affect the data quality of reports in any way. The Realtime API is a read-only API; the worst thing that can happen is that you exceed your quota and get blocked for the rest of the day. So there is no way this would create "un-accurate data on the reports later".
