Does GCP Datastore export read all entities? - google-cloud-datastore

I am investigating the best approach to backing up our data every night. The GCP documentation is here:
https://cloud.google.com/datastore/docs/export-import-entities
However I am concerned about this line:
Entity reads and writes performed by export and import operations count towards your Firestore in Datastore mode costs.
Does this mean that every night we're going to be reading all records? How expensive can this backup solution get?

I think yes, all records will be read; this is quite clear in the documentation you linked.
You can check the pricing here as well. Fortunately there is a free quota per day; I'm not sure how big your database is...
If you want to estimate the total cost of your whole solution, you can use the very friendly pricing calculator. If you pick or search for the Datastore product, it will help you estimate the cost, and if you use other products as well you can add them to the total calculation.
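For reference, the nightly export itself is typically a single gcloud command (a sketch; the bucket name is a placeholder, and you would normally trigger this from a scheduled job such as Cloud Scheduler or cron):

```sh
# Export all entities (or pass --kinds to restrict) to a Cloud Storage bucket.
# Every exported entity counts as a billed entity read, which is where the
# cost the documentation warns about comes from.
gcloud datastore export gs://my-backup-bucket --async
```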
I hope this helps!

Related

firebase realtime database download pricing vs firestore read

It is well known that the read/write cost of the Firebase Realtime Database is free. With a little more digging, I found out that reads and writes can actually cost money in indirect ways. I've been searching through the docs and SO questions to figure out the exact difference between the Firestore read cost ($0.06 per 100,000 documents) and the Realtime Database download cost ($1/GB), but sadly I couldn't manage it.
The stored-data cost for RTDB ($5/GB) is really clear, and I understand that it is billed monthly (this is true, right?). But what exactly is a download cost? Through a few SO questions and the official docs, I figured out that the RTDB download cost is really similar to the Firestore read cost, and that it is important to specify the db.ref path precisely by drilling down to the final path. But if the download cost covers operations like reading JSON data at a specific field or path, what is the difference between the concept of a Firestore read and these RTDB download operations?
If all of that is the case, then the cost of 'conceptually reading' from RTDB is never free, even speaking plainly. So why do some community members and articles always say "the read/write cost for RTDB is free"? I was considering migrating some features from Firestore to RTDB, since RTDB is widely said to be free for reads and writes. The feature updates a single path (a document, in Firestore terms) of 500 B in size hundreds of times every month, so this issue is really confusing me.
Let's say 100,000 Firestore reads cost $0.04 and an RTDB download (which looks a lot like reading) costs $1/GB. By my calculation, 2,500,000 document reads from Firestore cost the same as a single GB downloaded from RTDB. That means if a single operation reads a chunk of data larger than roughly 400 B, the Firestore read cost is actually cheaper than the RTDB read cost, and there would be no reason for me to use RTDB to read data if a single operation retrieves more than ~400 B. It feels like I've been caught by a wrong concept, but it is not easy to get out of this swamp.
So I hope to make the RTDB read/write cost clear (whether it is really free of charge by itself), and to understand why it is better to use RTDB than Firestore when an app has to do lots of read operations (for me, roughly 1,000 operations retrieving 400 B of data per month per user). I understand that a few Firebase gurus thankfully contribute to SO's firebase tag. I've tried to write the question as clearly as possible, but there are probably some unclear parts, so comments will be really appreciated. Thanks in advance!
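For clarity, here is the arithmetic behind my ~400 B break-even figure as a quick script (it assumes the rates above and ignores Firestore's separate network egress charge and RTDB protocol overhead):

```js
// Back-of-the-envelope comparison using the assumed rates:
// $0.04 per 100,000 Firestore reads, $1 per GB downloaded from RTDB.
const firestoreCostPerRead = 0.04 / 100000; // $4e-7 per document read
const rtdbCostPerByte = 1 / 1e9;            // ~$1e-9 per byte (1 GB ~ 1e9 B)

// Break-even payload size: below this many bytes per read, RTDB is
// cheaper; above it, Firestore's per-document pricing wins.
const breakEvenBytes = firestoreCostPerRead / rtdbCostPerByte;
console.log(breakEvenBytes); // 400
```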
I have created a handy spreadsheet calculator that estimates the rough size of the payload and scales it per user, while also factoring in the free-tier usage. You can enter your values at the top and get a decent result.
To summarize: the Realtime Database is highly expensive to read per KB, while a single Firestore read covers a document of up to 1 MB (potentially). Writing to the Realtime Database, on the other hand, is extremely cheap; I have confirmed that, apart from protocol overhead, writing to the Realtime Database is free.
The Realtime Database is not as economical as Firestore and is designed to cover some of Firestore's caveats. Realtime Database billing for reads (downloads) is the data plus overhead, rounded up to the nearest KB.
TLDR:
Firestore is ideal for high reads, low writes, and static information.
Realtime is better suited to low reads, high writes, and volatile information.
When reading documents from Firestore you pay for:
Document reads - The cost to read the document on the server.
Network egress - The cost to download the data to the client.
In most scenarios we see the cost for developers using Firestore coming more from document reads, as the cost per GB is comparatively low.
When reading data from the Realtime Database, you only pay for:
GB downloaded - The cost to download the data to the client.
Here the cost mostly comes from the size of the data you download. It's quite similar to the Network egress from Firestore, but at a higher cost per byte read (and of course you then don't pay for the read operation on the server itself).
While a calculator (such as the one from DIGI Byte above, or the one on the pricing page) will give you the best estimate, the rough guidance is: if you perform many small reads and writes, RTDB is going to be the better choice, while if you perform fewer writes and/or larger reads, Firestore is often the better choice.

How to avoid Firestore document write limit when implementing an Aggregate Query?

I need to keep track of the number of photos I have in a Photos collection. So I want to implement an Aggregate Query as detailed in the linked article.
My plan is to have a Cloud Function that runs whenever a Photo document is created or deleted, and then increment or decrement the aggregate counter as needed.
This will work, but I worry about running into the 1 write/document/second limit. Say that a user adds 10 images in a single import action. That is 10 executions of the Cloud Function in more-or-less the same time, and thus 10 writes to the Aggregate Query document more-or-less at the same time.
Looking around, I have seen several mentions (like here) that the 1 write/doc/sec limit is for sustained periods of constant load, not short bursts. That sounds reassuring, but it isn't reassuring enough to convince an employer that your choice of DB is safe and sound when all you have to go on is that 'some guy said it was OK on Google Groups'. Are there any official sources stating that short write bursts are OK, and if so, how is a 'short burst' defined?
Or are there other ways to maintain an Aggregate Query result document without subjecting all the aggregated documents to a very restrictive shared 1 write/second limitation?
If you think that you'll see a sustained write rate of more than once per second, consider dividing the aggregation up in shards. In this scenario you have N aggregation docs, and each client/function picks one at random to write to. Then when a client needs the aggregate, it reads all these subdocuments and adds them up client-side. This approach is quite well explained in the Firebase documentation on distributed counters, and is also the approach used in the distributed counter Firebase Extension.
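A minimal sketch of that sharding approach as a Cloud Function (names such as NUM_SHARDS, the counters/photos path, and the photos collection are illustrative, not from the question):

```js
const functions = require("firebase-functions");
const admin = require("firebase-admin");
admin.initializeApp();

const NUM_SHARDS = 10; // spreads the write load across 10 documents

exports.onPhotoCreated = functions.firestore
  .document("photos/{photoId}")
  .onCreate(async () => {
    // Pick a shard at random so concurrent writes hit different docs,
    // keeping each individual document under the 1 write/sec guideline.
    const shardId = Math.floor(Math.random() * NUM_SHARDS);
    await admin.firestore()
      .doc(`counters/photos/shards/${shardId}`)
      .set({ count: admin.firestore.FieldValue.increment(1) }, { merge: true });
  });

// Reading the total means summing all shards.
async function getPhotoCount() {
  const snap = await admin.firestore()
    .collection("counters/photos/shards").get();
  return snap.docs.reduce((sum, d) => sum + (d.data().count || 0), 0);
}
```

A matching onDelete trigger calling increment(-1) handles deletions; the trade-off is that reading the total now costs NUM_SHARDS document reads instead of one.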

Firestore Document "Too much contention": is there such a thing in the realtime database?

I've built an app that lets people sell tickets for events. Whenever a ticket is sold, I update the document that represents that event's tickets in Firestore to update the stats.
At peak times, this document is updated quite a lot (maybe 10x a second). Sometimes transactions on this document fail due to "too much contention", which results in inaccurate stats since those updates are dropped. I guess this is the result of the high load on the document.
To resolve this problem, I am considering moving the items' stats from the item document in Firestore to the Realtime Database. Before I do, I want to be sure that this will actually resolve the contention problem I've had with the item document. Can the Realtime Database handle such a load better than a Firestore document? Is it considered good practice to move such data to the Realtime Database?
The issue you're running into is a documented limit of Firestore. There is a limit to the rate of sustained writes to a single document of 1 per second. You might be able to burst writes faster than that for a while, but eventually the writes will fail, as you're seeing.
Realtime Database has different documented limits. It's measured in the total volume of data written to the entire database. That limit is 64MB per minute. If you want to move to Realtime Database, as long as you are under that limit, you should be OK.
If you are effectively implementing a counter or some other data aggregation in Firestore, you should also look into the distributed counter solution that works around the per-document write limit by sharding data across multiple documents. Your client code would then have to use all of these document shards in order to present data.
As for whether or not any one of these is a "good practice", that's a matter of opinion, which is off topic for Stack Overflow. Do whatever works for your use case. I've heard of people successfully using either one.
At peak times, this document is updated quite a lot (maybe 10x a second). Sometimes transactions on this document fail due to "too much contention"
This is happening because Firestore cannot handle such a rate. According to the official documentation regarding quotas for writes and transactions:
Maximum write rate to a document: 1 per second
It might sometimes work for two or even three writes per second, but at some point it will definitely fail. Ten writes per second is way too much.
To resolve this problem, I am considering moving the items' stats from the item document in Firestore to the Realtime Database.
That's a solution I use myself for such cases.
According to the official documentation regarding usage and limits in the Firebase Realtime Database, there is no such limitation there. But it's up to you to decide whether it fits your needs or not.
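For illustration, moving such a counter to the Realtime Database could look like this (a sketch using the v8 web SDK, assuming an initialized app; the events/... path and eventId are placeholders):

```js
import firebase from "firebase/app";
import "firebase/database";

const eventId = "event-123"; // placeholder event ID

// ServerValue.increment applies the delta atomically on the server,
// so concurrent ticket sales don't overwrite each other.
const statsRef = firebase.database().ref(`events/${eventId}/stats/ticketsSold`);
statsRef.set(firebase.database.ServerValue.increment(1));
```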
There is one more thing you need to take into consideration: a distributed counter. It can solve your problem for sure.

Is there any cost-efficient way to find the number of docs in a collection in firestore

I'm new to Firebase and I'm trying to integrate it into my Vue.js project.
If I just want the number of docs in a collection instead of the actual docs, I read all the docs and use snapshot.size to find the number. Is this cost-efficient? If not, is there a better way to approach it?
I read all the docs and use snapshot.size
As the docs state, this is what snapshot.size is meant for, so it works. Be aware, though, that you are still billed one document read for every document in the snapshot, so it is not cheap for large collections.
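For context, that pattern looks like this in the v8 web SDK ("photos" is just an example collection name, and an initialized app is assumed):

```js
import firebase from "firebase/app";
import "firebase/firestore";

// Fetch the whole collection just to count it. snapshot.size is the
// number of documents, but each one is billed as a separate read.
firebase.firestore().collection("photos").get().then((snapshot) => {
  console.log(snapshot.size); // e.g. 1234 — and 1234 billed reads
});
```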
If not, is there a better way to approach it?
Yes, there is: you can implement distributed counters, as explained in the official documentation on distributed counters.
As a personal hint, don't store this kind of counter in Cloud Firestore, because every time you increase, decrease, or check the counter it will cost you a write or a read operation. Host this counter in the Firebase Realtime Database instead, "at almost no cost".

Understand where the bandwidth usage is coming from in Firebase database

My app is growing in terms of bandwidth usage with the Firebase database, and I am trying to optimize my queries to use less bandwidth (and thus reduce cost). But I am doing this quite blindly, because there are no statistics about my database usage (I can't tell which queries take the most bandwidth).
Is there a way to know which queries are taking a lot of bandwidth? How do you go about optimizing usage with the Firebase database?
Edit:
I have a chat website, and I use observers such as messagesRef.child(conversationID).limitToLast(25).on('child_added', ...) and
conversationsRef.child(conversationID).child('participants').on('value', ...)
The Firebase Profiler saved my life for this: https://firebase.google.com/docs/database/usage/profile
I was able to pinpoint exactly which reference (including its children) was hogging the bandwidth, which made it much easier to figure out which part of the code was problematic.
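For anyone who hasn't used it: the profiler runs from the Firebase CLI (assuming firebase-tools is installed and the project is initialized):

```sh
# Streams operations against the Realtime Database; press Enter to stop
# and print a report of speed and downloaded bytes broken down by path.
firebase database:profile
```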
There are no query tuning tools, if that's what you're looking for. You could build in simple time logging that captures timestamps just before and after queries are issued, log that data, and harvest it from the client to narrow down the most poorly performing ones.
It's hard to help more without seeing the actual queries or the data model.
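A rough sketch of that kind of logging, reusing the messagesRef query from the question (the size estimate via JSON length is approximate, not the exact wire size):

```js
// Time a read and estimate its payload size, then log both so the
// heaviest queries can be identified later.
const start = Date.now();
messagesRef.child(conversationID).limitToLast(25).once('value', (snapshot) => {
  const approxBytes = JSON.stringify(snapshot.val()).length;
  console.log(`messages query: ${Date.now() - start} ms, ~${approxBytes} bytes`);
});
```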
Just in case you are not already using Firebase's .indexOn, which is the best way to improve your query performance (as the docs say below), take a look at Index Your Data.
The Firebase docs say:
If you know in advance what your indexes will be, you can define them via the .indexOn rule in your Firebase Realtime Database Rules to improve query performance.
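For example, if the messages in the chat app above were queried with orderByChild('timestamp') (an assumed child key), the matching rule would look like this:

```json
{
  "rules": {
    "messages": {
      "$conversationID": {
        ".indexOn": "timestamp"
      }
    }
  }
}
```

Without the index, the server sends the whole node and the client sorts and filters locally, which is exactly the kind of hidden bandwidth cost being discussed here.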
I highly agree with ZagNut's answer: logging query completion in then() will help you here.
You can also keep a count of requests per node on the client side and save that request count, keyed by client ID, in a separate node outside your data structure in the Firebase database.
Then filter these requests to find usage patterns.
Thanks.
