Google Datalab: Can I query Google Cloud Datastore for training data for a model? - google-cloud-datastore

I am planning to create an ML model using Google Datalab.
I plan to keep the source data (JSON, structured) in Datastore.
Still, I am not finding many examples of how to query Datastore from Datalab.
Is that something that can be done? Is it good practice?
Would it be better to write a process that sends the training data to a CSV on Google Cloud Storage?
Thanks!

@Kolban answered it in the comments.
This is a duplicate of "Google Datastore API from Datalab".
Also, there are not many examples because it is not used as much as the other products.
Thanks!
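
For readers weighing the two options in the question, here is a minimal sketch of both: reading entities with the google-cloud-datastore Python client, and serializing them to CSV text that could then be uploaded to Cloud Storage. The kind name and field names are hypothetical, and the client call assumes default credentials and a default project are available in the notebook environment.

```python
import csv
import io


def fetch_entities(kind):
    """Yield every entity of one Datastore kind as a plain dict."""
    # Imported lazily so the CSV helper below works without the library.
    from google.cloud import datastore  # pip install google-cloud-datastore

    client = datastore.Client()  # assumes default credentials and project
    for entity in client.query(kind=kind).fetch():
        yield dict(entity)


def entities_to_csv(entities, fieldnames):
    """Serialize dict-like entities to CSV text; missing fields stay empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for entity in entities:
        # Keep only the requested columns so extra properties are ignored.
        writer.writerow({name: entity.get(name, "") for name in fieldnames})
    return buffer.getvalue()
```

Something like `entities_to_csv(fetch_entities("TrainingExample"), ["x", "label"])` would produce the CSV text, where `TrainingExample`, `x` and `label` are placeholder names rather than anything from the question.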

Related

Is it possible to export usage data by collection?

We use Firebase with a Firestore database.
I would like to do some data analysis to identify business logic that performs unnecessary read/write operations. Is it possible to export detailed data on read/write operations, or am I limited to what Google gives us via Firestore Usage?
Ideally, I would like to export detailed usage data for analysis in R / Python.
Is this possible?
Google Cloud Platform does not provide a straightforward way to analyze read/write operations by document or collection.
In the end, I solved this problem by exporting the Firebase audit logs to BigQuery and cleaning the data manually.
https://cloud.google.com/logging/docs/audit
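
The answer does not show its cleaning step, so the following is only a sketch of what it might look like once the exported rows are loaded into Python as dictionaries. It assumes each entry carries `protoPayload.methodName` and `protoPayload.resourceName`, that the method name ends in the RPC name, and that document paths follow `/documents/<collection>/...`. The read/write method sets are assumptions to adjust against your own logs.

```python
from collections import Counter

# Assumed RPC names; check them against the method names in your own logs.
READ_METHODS = {"GetDocument", "ListDocuments", "BatchGetDocuments", "RunQuery", "Listen"}
WRITE_METHODS = {"CreateDocument", "UpdateDocument", "DeleteDocument", "Commit", "Write"}


def collection_of(resource_name):
    """Return the top-level collection in a document path, or None."""
    _, sep, path = resource_name.partition("/documents/")
    if not sep or not path:
        return None
    return path.split("/")[0]


def count_operations(entries):
    """Count read and write operations per top-level collection."""
    counts = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        # Keep only the last segment, e.g. "GetDocument".
        method = payload.get("methodName", "").rsplit(".", 1)[-1]
        collection = collection_of(payload.get("resourceName", ""))
        if collection is None:
            continue  # no document path in this entry, so it cannot be attributed
        if method in READ_METHODS:
            counts[(collection, "read")] += 1
        elif method in WRITE_METHODS:
            counts[(collection, "write")] += 1
    return counts
```

Entries whose resource name points at the database rather than a document are skipped here, which is one reason manual cleaning is still needed.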

Firebase - Perform Analytics from database/firestore data

I am using Firebase as my authentication and database platform in my React Native-Expo app. I have not yet decided whether I will use the Realtime Database or Firestore.
I need to perform statistical analysis on daily data gathered from my users, which is stored in the database. For example, the users type in their daily protein intake, and from it I would like to calculate their weekly average and expected monthly average, suggest types of food if their protein intake is too low, and so on.
What would be the best approach to achieve this in my specific situation?
I am stepping into uncharted territory and am really unfamiliar with how to accomplish this. I have read that Firebase Analytics generates various basic analytics on app usage, the number of crash-free users, etc. But can it perform analytics on custom events? Can I create a custom event for Firebase Analytics to keep track of a certain node in my database, and output analytics from that? And if so, does it work with React Native-Expo, or do I need to detach from Expo? In addition, I have read that Firebase Analytics can be combined with Google BigQuery. Would this be an alternative for my case?
Are there any other ways of performing such analysis on my data stored in a Firebase database? For example, exporting the data and using Python and scikit-learn?
Whatever opinion or advice you may have, I would be grateful if you could share it!
You're not alone - many people building web apps on GCP have this question, and there is no single answer.
I'm not too familiar with Firebase Analytics, but I can answer the question for Firestore and for your custom analytics (e.g. weekly average protein consumption).
The first thing to point out is that Firestore, unlike other NoSQL databases, is storage only. You can't perform aggregations in real time like you can with MongoDB, so the calculations have to be done somewhere else.
The best practice recommended by GCP in this case is indeed to export your Firestore data regularly into BigQuery (BQ) and run your analytical calculations there. You could also, when a user inputs some data, send it to Pub/Sub and use one of GCP Dataflow's streaming templates to stream the data into BQ, so that everything is available in near real time.
Here's the issue with that however: while this solution gives you real time, and is very scalable, it gets expensive fast, and if you're more used to Python than SQL to run analytics it can be a steep learning curve. Here's an alternative I've used for smaller webapps, which scales well for <100k users and costs <$20 a month on GCP's current pricing:
1. Write a Python script that grabs the data from Firestore (using the Firestore Python SDK), generates the analytics you need on it, and writes the results back to a Firestore collection.
2. Create an endpoint for that function using Flask or Django.
3. Deploy that server application on Cloud Run, preventing unauthenticated invocations (you'll only be calling it from within GCP) - see this article, steps 1 and 2 only. You can also deploy the Python script(s) to GCP's Vertex AI or hosted Jupyter notebooks if you're more comfortable with that.
4. Use Cloud Scheduler to call that function every x minutes - see these docs for authentication.
5. Have your React app query the "analytics results" collection to get the results.
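
As a rough illustration of the first step, the sketch below computes a weekly average per user and writes it back with the Firestore Python SDK. The collection names (`daily_intake`, `analytics_results`) and the document fields (`user_id`, `date`, `protein_g`) are hypothetical, and the average is taken over logged entries rather than over all seven days of the week.

```python
from collections import defaultdict
from datetime import date


def weekly_average(records):
    """Average protein per user per ISO week.

    records: iterable of dicts with the assumed keys "user_id",
    "date" (ISO string, YYYY-MM-DD) and "protein_g".
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for record in records:
        iso = date.fromisoformat(record["date"]).isocalendar()
        key = (record["user_id"], iso[0], iso[1])  # (user, ISO year, ISO week)
        totals[key][0] += record["protein_g"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def run_job(source="daily_intake", target="analytics_results"):
    """Read raw entries, aggregate them, and write the results back."""
    # Imported lazily so weekly_average can be tested without the SDK.
    from google.cloud import firestore  # pip install google-cloud-firestore

    db = firestore.Client()  # assumes default credentials and project
    records = [doc.to_dict() for doc in db.collection(source).stream()]
    for (user_id, year, week), average in weekly_average(records).items():
        doc_id = f"{user_id}_{year}_W{week:02d}"
        db.collection(target).document(doc_id).set(
            {"user_id": user_id, "year": year, "week": week, "avg_protein_g": average}
        )
```

Keeping the aggregation in a pure function separate from the Firestore calls makes it easy to test locally before wrapping `run_job` in a Flask or Django endpoint.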
My solution is a Flutter Web-based dashboard that displays relevant data in (near) real time, like the regular Flutter iOS/Android app, along with some aggregated data.
The aggregated data is compiled by a few Node.js-based database triggers that do the analytic lifting, so it is also near real time. If you study the pricing you will learn that function invocations are pretty cheap, unless of course you happen to make a 'desphew' :)
I came up with a great solution.
I used the built-in Firebase BigQuery plugin. Then I used Cube.js (deployed on GCP, on Cloud Run with Docker) on top of BigQuery.
Cube.js makes everything so easy. You don't need to write queries manually, and it tries to optimize the queries it runs. On top of that, it uses caching, so you won't get big bills on GCP. I think this is the best solution I was able to find. It also scales very well and is close to real time.
Also, if you are a small startup, it is mostly free on GCP thanks to the free limits on Cloud Run and BigQuery.
Note: I am not affiliated in any way with Cube.js.

Firebase Performance Monitoring data with BigQuery

I want to load Firebase Performance Monitoring data into BigQuery so that I can create custom visualizations in Google Data Studio.
Is it possible to do this with Performance Monitoring? I am not able to find this anywhere in the docs.
There is currently no export of Firebase Performance Monitoring data.
It's available now. You can export your Firebase Performance Monitoring data to BigQuery for custom queries or reports:
https://firebase.google.com/docs/perf-mon/bigquery-export

Using Firebase for persistent storage

I'm working on an IoT project, where we are evaluating Firebase. We have implemented a prototype that works fine for real-time data. Since the data is critical for the client, they want to be able to retrieve it at a later date. I couldn't find much about persistent storage on the Firebase website. Does anyone know whether we can use Firebase alone, or do we need to transfer the data to a cloud-based store such as one of Google Cloud's storage products? If so, how long can Firebase be relied on to hold the data, assuming we choose the right plan?
Thanks
Kamal
Firebase is a persistent database. While you certainly can set up a way to transfer the data to an archive, if you structure your data correctly, there shouldn't be any need to do so. You haven't really indicated what your requirements are, however, so it is difficult to answer your question thoroughly. (If you have one.)
There have been several demonstrations of using Firebase as part of an IoT infrastructure. Consider, for example, this presentation by Jenny Tong (@mimming) about doing exactly this.

Does Omniture support providing APIs?

Can someone tell me whether SiteCatalyst offers any API that a user can call to access data from its database (that is, the data SiteCatalyst has collected from the web)?
We have the option of scheduling reports, but what if the number of users is in the tens of thousands? We cannot schedule reports for everyone. Hence my question: does SiteCatalyst offer an API that users can call to fetch data according to their needs?
Thanks,
Adi
You can utilize the reporting API as outlined here: https://marketing.adobe.com/developer/documentation/analytics-reporting-1-4/whatsnew
There aren't any APIs that let you interact with the scheduled reports interface yet, so hopefully the reporting API gives you what you're looking for.

Resources