How to "aggregate" entities on DataStore - google-cloud-datastore

I understand that joins and aggregate queries aren't supported by the Cloud Datastore query engine.
But if I have to aggregate, how should I aggregate entities?
Which of these approaches is better, or is there another way?
Whenever an entity's value is updated, also update a running sum on a separate aggregate entity at the same time.
Read every entity and reduce with Dataflow.
Export to BigQuery.

Don't need live data
If you don't need live data, do periodic exports from Cloud Datastore to BigQuery:
gcloud datastore export --kinds="myKind" gs://${BUCKET}
Someone has even written a shell script that exports a kind to GCS and then imports it into BigQuery.
Need live data
One thing to look into here is projection queries. They are much faster and cheaper than fetching whole entities, because you can tell Datastore to return only the property you want to aggregate, and it will stream it straight out of the index.
If you have a larger dataset, Dataflow can then come in handy.
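The projection approach could look roughly like the following Python sketch. It assumes `client` is a `google.cloud.datastore.Client` created elsewhere; the kind `myKind` comes from the export command above, while the property name `amount` and the helper `sum_property` are hypothetical. The property has to be indexed for a projection query to return it.

```python
def sum_property(client, kind, prop):
    """Sum one numeric property over every entity of a kind.

    `client` is assumed to be a google.cloud.datastore.Client.
    `prop` (e.g. "amount") is a hypothetical, indexed property.
    """
    query = client.query(kind=kind)
    # Projection query: only `prop` is returned for each entity,
    # so the full entities are never loaded.
    query.projection = [prop]

    total = 0
    for entity in query.fetch():
        total += entity[prop]
    return total
```

Usage would be along the lines of `sum_property(client, "myKind", "amount")`. The sum is still computed client-side, so the cost grows with the number of entities.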

Related

How to order by firebase storage list

I can't find any function in Firebase Storage to list all files in a specific order, such as ascending or descending. I have tried ListOptions, but it supports only two arguments: maxResults and pageToken.
The Cloud Storage List API does not have the ability to sort anything by some criteria you choose. If you need the ability to query objects in a bucket, you should consider also storing information about your objects in a database that can be queried with the flexibility that you require. You will need to keep that database up to date as the contents of your bucket change (perhaps using Cloud Functions triggers). This is a common thing to implement, since Cloud Storage is optimized only for storing huge amounts of data for fast retrieval at extremely low cost. It is not also trying to be a database for object metadata.
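As a rough illustration of that idea, the sketch below keeps object metadata in a queryable store and sorts there instead of in the bucket. It uses an in-memory SQLite database purely as a stand-in for whatever database you choose; the table layout and the handler names `on_object_finalized` and `on_object_deleted` are hypothetical and would be called from your own storage triggers.

```python
import sqlite3

def create_index(conn):
    # One row of metadata per object in the bucket.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects "
        "(name TEXT PRIMARY KEY, size INTEGER, updated TEXT)"
    )

def on_object_finalized(conn, name, size, updated):
    # Call when an object is created or overwritten.
    # `updated` is an ISO 8601 timestamp string, which sorts correctly as text.
    conn.execute(
        "INSERT OR REPLACE INTO objects (name, size, updated) VALUES (?, ?, ?)",
        (name, size, updated),
    )

def on_object_deleted(conn, name):
    # Call when an object is removed from the bucket.
    conn.execute("DELETE FROM objects WHERE name = ?", (name,))

def list_objects(conn, order_by="updated", descending=True):
    # Whitelist the column name, since it is interpolated into the SQL.
    if order_by not in {"name", "size", "updated"}:
        raise ValueError(f"cannot order by {order_by!r}")
    direction = "DESC" if descending else "ASC"
    rows = conn.execute(
        f"SELECT name FROM objects ORDER BY {order_by} {direction}"
    )
    return [row[0] for row in rows]
```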
Please also see:
gsutil / gcloud storage file listing sorted date descending?

do you need to create a separate collection/document for reading an aggregated document with firebase cloud function

I want to use a Cloud Function to produce an aggregated document containing all the data I need for the first page of my app. The aggregated document will be updated each time a document is added or updated in a Firestore collection A.
To do so, I have to create a separate collection B containing a single document (the aggregated doc from the Cloud Function), which the app will fetch when it starts, right? Hence, my Cloud Function will be updating the single document in collection B? Am I correct in my understanding of how using a Cloud Function to aggregate data works? Thank you very much.
This is indeed a totally valid approach. Note that you may use a Transaction, in the Cloud Function, if there is a risk that source data is updated by several users in parallel.
You don't give any detail on what is in collection A (identical docs or different docs?) and what is aggregated (numbers, headlines,...), but you should try to avoid reading all the docs from collection A each time the Cloud Function is triggered. This may generate some unnecessary extra cost. If the first page of your app aggregates some figures, you may use some counters.
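The counter idea can be sketched as an incremental update: each write to collection A adjusts the aggregate by a delta, so collection A is never re-read. The Python below only models the arithmetic; the field name `amount` and the aggregate fields `count` and `total` are hypothetical, and in a real Cloud Function the before/after snapshots would come from the trigger and the update would run inside a transaction.

```python
def apply_change(aggregate, before, after, field="amount"):
    """Return the aggregate after one document write in collection A.

    `before` / `after` are the document data prior to and following the
    write; None means the document did not exist (create or delete).
    """
    old_value = before[field] if before is not None else 0
    new_value = after[field] if after is not None else 0
    # +1 on create, -1 on delete, 0 on update.
    count_delta = int(after is not None) - int(before is not None)
    return {
        "count": aggregate["count"] + count_delta,
        "total": aggregate["total"] + new_value - old_value,
    }
```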

Which of the Azure Cosmos DB types of database should I use to log simple events from my mobile application?

I would like to set up event logging for my application. Simple information such as date (YYYYMMDD), activity and appVersion. Later I would like to query this to give me some simple information such as how many times a certain activity occurred for each month.
From what I see, there are a few different types of database in Cosmos DB, such as NoSQL and Cassandra.
Which would be the most suitable to meet my simple needs?
You can use Cosmos DB SQL API for storing this data. It has rich querying capabilities and also has a great support for aggregate functions.
One thing you would need to keep in mind is your data partitioning strategy and design your container's partition key accordingly. Considering you're going to do data aggregation on a monthly basis, I would recommend creating a partition key for year and month so that data for a month (and year) stays in a single logical partition. However, please note that a logical partition can only contain 10GB data (including indexes) so you may have to rethink your partitioning strategy if you expect the data to go above 10GB.
A cheaper alternative would be Azure Table Storage. However, it doesn't have such rich querying capabilities, and it has no aggregation capability either. With some code (running in Azure Functions), you can aggregate the data yourself.
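To make the year-and-month partition key concrete, here is a small Python sketch that derives such a key from the `YYYYMMDD` date in the question and counts activities per month in code. The `"YYYY-MM"` key format and the helper names are assumptions for illustration, not something Cosmos DB or Table Storage prescribes.

```python
from collections import Counter

def partition_key(date_yyyymmdd):
    """Derive a year-and-month key, e.g. '20240315' -> '2024-03'."""
    if len(date_yyyymmdd) != 8 or not date_yyyymmdd.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {date_yyyymmdd!r}")
    return f"{date_yyyymmdd[:4]}-{date_yyyymmdd[4:6]}"

def monthly_activity_counts(events):
    """Count how often each activity occurred in each month.

    Each event is a dict with 'date' (YYYYMMDD) and 'activity' keys,
    mirroring the fields described in the question.
    """
    counts = Counter()
    for event in events:
        counts[(partition_key(event["date"]), event["activity"])] += 1
    return dict(counts)
```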

Firebase Realtime Database vs Cloud Firestore

Edit: After posting the question, I thought I could also make this post a quick reference for those of you who need a quick look at some of the differences between these two technologies, which might help you decide on one of them eventually. I will be editing this question and adding more info as I learn more.
I have decided to use Firebase for the backend of my project. For Firestore, it says "the next generation of the realtime database". Now I am trying to decide which way to go: Realtime Database or Cloud Firestore?
Billing:
At first glance, it looks like Firestore charges per number of results returned, number of reads, number of writes/updates, etc. The Realtime Database charges based on the data transmitted; the number of read/write operations is irrelevant. They both also charge for the data stored on Google's servers (I think Firestore is the cheaper one in this respect). Why am I mentioning price? Because from my point of view, although it might carry less weight, it is also a point to consider when choosing one over the other.
Scaling:
Cloud Firestore seems to scale horizontally seamlessly. I think this is not possible with the Realtime Database.
Edit:
In the Realtime Database, you need to shard your data yourself using multiple databases. And you can only do this if you are on the Blaze pricing plan.
ref: https://firebase.google.com/docs/database/usage/sharding
Performance & Indexing:
Another thing is that the data structure is different in the two. The Realtime Database stores the data as a JSON object in whatever way we structure it. Firestore structures the data as collections and documents. Hence, querying also differs between the two.
I think Firestore does automatic indexing, which greatly increases read performance (at some cost to write performance). I am not sure if this is also the case with the Realtime Database.
Edit:
The real-time database does not automatically index your data. You need to do it yourself after a solid inspection of your data and your needs.
ref: https://firebase.google.com/docs/database/security/indexing-data
What other differences can you think of?
What would be (or has been) your choice for different types of projects?
Do you still go with the real-time database or have you migrated from that to the firestore? If so why?
And one last thing. How would you compare the SDKs of these two?
Thanks a lot!
What other differences can you think of?
Here is what I think. I have about 6 months of experience with the Realtime Database, and the difference I notice is that Firestore makes sorting data easy. For example, to retrieve user names ordered by timestamp:
Query firstQuery = firestore.collection("Names").orderBy("timestamp", Query.Direction.DESCENDING).limit(10); // load 10 names
What would be (or has been) your choice for different types of projects?
For me, the Realtime Database is for data streaming: when I work with an Arduino, I use it to store drone speed.
And Firestore is for smart-office data, like air conditioners or room lights, and for enterprise data, like inventory quantities.
Do you still go with the real-time database or have you migrated from that to the firestore? If so why?
I still go with the Realtime Database, because I need a tree to display the streaming data structure rather than querying tables as in Firestore.

Create dashboard on Firebase Database for various metrics

I have events in a Firebase Database table, where each event has certain fields. One of the fields is event_type. What I want to achieve is to visualize, in graphical form, how many events of each type come in daily.
How do I do something like that in firebase database?
Q1. Is it possible to directly do this in firebase?
Q2. Do I need to move the data to some other data source (like BigQuery) and set up the dashboard there?
It is definitely possible to create a dashboard with aggregate data directly on the Firebase Realtime Database. But you'll have to take a different approach than with e.g. BigQuery.
With relational databases, you'll create a dashboard by running aggregation queries. For example to show how many events of each type, you'll run something like SELECT type, COUNT(*) FROM events GROUP BY type.
The Firebase Realtime Database (and most NoSQL databases) doesn't have such a GROUP BY operation, nor a COUNT() method. That means you'd have to load all data into your dashboard and group/count it there, which is quite expensive. That's why on NoSQL databases you'll typically keep a running count for each type in the database and update it on every write operation. While this puts an overhead on each write operation, the dashboard itself suddenly becomes very simple when you do this. For an example of a simple counter, see the functions-samples repo.
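A minimal Python sketch of the running-count idea, using a nested dict as a stand-in for the database tree. The paths `events` and `counts/<day>/<event_type>` and the `day` field are hypothetical; `event_type` is the field from the question. In a real database the increment would need to be atomic (for example, done in a transaction).

```python
def record_event(db, event_id, event):
    """Store an event and bump the running count for its day and type."""
    db.setdefault("events", {})[event_id] = event
    # counts/<day>/<event_type> holds the running total for the dashboard.
    day_counts = db.setdefault("counts", {}).setdefault(event["day"], {})
    event_type = event["event_type"]
    day_counts[event_type] = day_counts.get(event_type, 0) + 1

def dashboard_rows(db, day):
    """Read the precomputed counts for one day; no scan over events."""
    return sorted(db.get("counts", {}).get(day, {}).items())
```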
This approach only works if you know up front what counters (and other aggregates) you want to show in the dashboard. If that isn't the case, many developers use the nightly backups from the Realtime Database to ingest the data into another system that lends itself more to exploratory querying, such as BigQuery.
Either approach can work fine. The right approach is a matter of your exact use-case (e.g. do you know the exact data you want in the dashboard, or are you still figuring that out?) and what you're most comfortable with.
