I have seen the documentation and the pricing calculator, but I am still not clear on whether a bucket is charged separately for data transfer (bandwidth consumed) or not.
It shows:

- Storage cost: $0.026 per GB-month
- Retrieval cost: Free
- Class A operations (create/delete): $0.005 per 1,000 ops
- Class B operations (downloads): $0.0004 per 1,000 ops
Does this "Retrieval cost: Free" mean that no extra charges are levied for any amount of data transfer/bandwidth used for the stored files?
The "retrieval cost" is a special charge that is specific to nearline and coldline objects. Because these tiers are designed around data being seldomly read, there is a fee to read objects stored under these tiers in addition to any other costs.
Bandwidth consumed when reading from GCS is billed as a network usage fee. It's a bit complicated, but essentially you are paying for GCS to send the data somewhere. This may be free if you're sending the data to a nearby GCP service; otherwise the cost depends on where the data is stored and where it is being read from. For example, if the data is stored in Iowa, it will be more expensive to read it from China than from Oregon.
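To make that concrete, here is a rough back-of-the-envelope sketch using the rates quoted in the question. The egress rate is purely an assumption for illustration; actual network pricing varies by source and destination region:

```typescript
// Rough GCS monthly cost estimate. Storage and operation rates are the ones
// quoted above; EGRESS_PER_GB is an assumed, illustrative internet rate.
const STORAGE_PER_GB_MONTH = 0.026; // standard storage, per GB-month
const CLASS_A_PER_1000_OPS = 0.005; // create/delete, per 1,000 ops
const CLASS_B_PER_1000_OPS = 0.0004; // downloads, per 1,000 ops
const EGRESS_PER_GB = 0.12; // assumption: internet egress, varies by region

function estimateMonthlyCost(
  storedGb: number,
  classAOps: number,
  classBOps: number,
  egressGb: number,
): number {
  return (
    storedGb * STORAGE_PER_GB_MONTH +
    (classAOps / 1000) * CLASS_A_PER_1000_OPS +
    (classBOps / 1000) * CLASS_B_PER_1000_OPS +
    egressGb * EGRESS_PER_GB // the bandwidth charge the question asks about
  );
}

// 100 GB stored, 10k writes, 1M downloads, 50 GB served to the internet:
console.log(estimateMonthlyCost(100, 10_000, 1_000_000, 50).toFixed(2)); // "9.05"
```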
For more details, see https://cloud.google.com/storage/pricing or contact the sales team.
You are charged for bandwidth.
Retrieval cost is a separate charge associated with Nearline and Coldline storage.
"A retrieval cost applies when you read data or metadata that is stored as Nearline Storage or Coldline Storage. This cost is in addition to any network charges associated with reading the data."
DynamoDB's pricing page contains the following text explaining how much storing continuous backups (a.k.a. PITR, point-in-time recovery) costs:
DynamoDB charges for PITR based on the size of each DynamoDB table (table data and local secondary indexes) on which it is enabled. DynamoDB monitors the size of your PITR-enabled tables continuously throughout the month to determine your backup charges and continues to bill you until you disable PITR on each table.
This seems to say that the user is charged for continuous backups based on the size of the table they are enabled on, not the size of the backup stored. It means that if a user continuously modifies existing data instead of adding new data, Amazon may need huge amounts of storage to hold 35 days' worth of modifications, space the user does not pay for. That doesn't make sense to me. I suspect their pricing needs to correspond to the size of the backup, not the table, but this is not claimed in the above text or in any of its similar variants I found on Amazon's site.
So my question is - how does Amazon charge for continuous-backup storage? By the table size, or by the backup size (i.e., the amount of changes)? Is this documented anywhere?
Curiously, I couldn't find any other source on the web which discusses this question.
I found many slightly modified versions of the above text copied into all sorts of tutorials, but none of them gives an example or answers my question. It's as if nobody really cares how much this feature will cost before they start using it :-)
Pricing
Your reading is correct: it is the size of the table that you pay for. This makes PITR extremely cost-effective compared to taking multiple on-demand backups. Moreover, PITR also lets you restore to any point in time in the previous 35 days.
But How?
How is it done? Simply put, it's smarts from the DynamoDB team, which uses S3 and snapshotting to store your backups. Learn more from this re:Invent presentation, which goes into further detail.
The pricing model seems to be clearly stated in your quote: it is based on the size of the table. An on-demand backup's size can easily be estimated from the size of the table, but in the absence of details about how continuous backups are implemented, it seems the uncertainty about the total backup size is accounted for by setting the price at double the price of on-demand backup storage, perhaps guided by analysis of typical activity levels.
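As a rough illustration of how the table-size model works out (both rates below are assumptions for the example; check the pricing page for your region):

```typescript
// Sketch of the PITR pricing model: you pay for the size of the table (plus
// local secondary indexes), NOT for the volume of changes. Both rates below
// are assumptions for illustration; check your region's pricing page.
const PITR_PER_GB_MONTH = 0.2; // assumed continuous-backup rate
const ON_DEMAND_PER_GB_MONTH = 0.1; // assumed on-demand backup rate

const tableSizeGb = 50;

// PITR: charged on table size, no matter how many modifications happened.
const pitrMonthly = tableSizeGb * PITR_PER_GB_MONTH; // $10.00

// Rough on-demand equivalent: keeping, say, 4 weekly snapshots around.
const onDemandMonthly = 4 * tableSizeGb * ON_DEMAND_PER_GB_MONTH; // $20.00

console.log({ pitrMonthly, onDemandMonthly });
```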
I'm working on a product that displays the results of running races. Races could have thousands of participants. So, in the days after a medium-sized event, there might be 3000 non-authenticated users wanting to browse 3000 results.
Although not every visitor will view all the results, the maximum damage at 3000 * 3000 would be 9,000,000 reads, which at $0.06 per read (as I misread the Google Cloud pricing) would cost $540,000. (Update: I'm a dummy, I missed the "per 100,000 documents" part, so this would only be $5.40.)
Obviously, I wouldn't deliver all 3000 results for each visit - there would be paging and limits. Though, there's something inherently scary about the possibility of those costs.
Questions:
Is Firebase simply the wrong technology for this type of product?
Is Firebase not really intended for non-authenticated apps? Obviously DDoS becomes a concern for public access, and there's no real protection in Firebase for this.
Every post I've read on these topics assumes developers are building apps for authenticated users.
9,000,000 reads which at $.06 (Google cloud pricing) would cost $540,000
The Firestore pricing of $0.06 is per 100,000 document reads, so 9 million document reads cost $5.40.
Aside from that: you should model your data in a way that ensures you only read the data that the user actually sees. For example, if all users will read the entirety of all 3,000 documents, consider using a data bundle to distribute that to them.
Realistically though, it is more likely that each user will read just a subset of the documents, and probably not all 3,000 of them. So consider whether you can combine the parts they'll read into a more cost-efficient structure. For example, if these were news articles: you could store the headline and intro paragraph of the first 100 articles in a single document, and just read that document (let's call it the frontpage) into each client when it starts.
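A minimal sketch of that frontpage idea with the modular Web SDK; the summaries/frontpage path and the results field are hypothetical names:

```typescript
import { initializeApp } from 'firebase/app';
import { getFirestore, doc, getDoc } from 'firebase/firestore';

const app = initializeApp({ /* your Firebase config */ });
const db = getFirestore(app);

// One aggregated document (hypothetical path: summaries/frontpage) holds the
// first 100 results, so rendering the landing page costs 1 read, not 100.
const snap = await getDoc(doc(db, 'summaries', 'frontpage'));
if (snap.exists()) {
  const { results } = snap.data() as {
    results: Array<{ name: string; time: string }>;
  };
  results.forEach((r) => console.log(r.name, r.time));
}
```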
There are many more ways to model the data, depending on the use-cases of your app. To learn more on how to think about such data modeling, I recommend reading NoSQL data modeling and watching the excellent Get to know Cloud Firestore video series.
I am considering storing multiple tenants in a single Firebase Firestore database. There will only be one collection per tenant and a few shared collections. Some will have more data than others. Some tenants may have a few million records while others may end up with a few billion. I want to confirm that the size of data in one collection will not impact the performance or storage of another collection in the same database.
I couldn't find much in the documentation about how the data is physically stored. Is all the data in Firestore stored in a single blob/file? If so, this could be a problem when there are hundreds of tenants with billions of records each. In an ideal world, each collection would be a physically separate file, and the server orchestration would spread the collections across multiple servers so that a single server is not sharing load between a very heavy tenant and a very light tenant. The single-file scenario, by contrast, would mean that a heavy tenant could slow down a light tenant.
My basic question is: can a single Firestore database infinitely scale up in size assuming that no single collection is bigger than a few billion records?
I know that there are two types of databases: Native mode and Datastore mode. Which of these seems more appropriate, and does the answer to my question differ depending on which one I select?
If the answer is that Firestore cannot scale infinitely in this way, what is the alternative approach? Should I be using Bigtable instead? Cassandra? Or, is there another way to physically divide my Firestore database other than collections?
Some tenants may have a few million records while others may end up with a few billion. I want to confirm that the size of data in one collection will not impact the performance or storage of another collection in the same database.
The performance in Firestore isn't related to the number of documents that exist in a collection. In terms of speed, it doesn't matter if you perform a query on:
A top-level (root-level) collection.
A sub-collection, which basically represents a collection that is nested under a document.
A collection group, which actually means querying collections and sub-collections that exist across the entire database.
The speed will always be the same, as long as the query returns the same number of documents. This is because query performance depends on the number of documents you request, not on the number of documents you search. So it doesn't really matter if you query a collection with 1 MILLION documents or even 1 BILLION documents: the time to get the same results will be the same.
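For illustration, here is what the three query shapes look like with the modular Web SDK (the collection names are hypothetical); each costs and performs the same for the same number of returned documents:

```typescript
import {
  getFirestore,
  collection,
  collectionGroup,
  query,
  where,
  limit,
  getDocs,
} from 'firebase/firestore';

const db = getFirestore();

// 1. Top-level collection (hypothetical tenant collection "tenantA").
const topLevel = query(collection(db, 'tenantA'), limit(50));

// 2. Sub-collection nested under a document.
const sub = query(collection(db, 'tenants/tenantA/records'), limit(50));

// 3. Collection group: all collections named "records" across the database
//    (requires a collection-group index for the filtered field).
const group = query(
  collectionGroup(db, 'records'),
  where('active', '==', true),
  limit(50),
);

// Each query's latency scales with the 50 documents returned,
// not with how many documents the collections contain.
for (const q of [topLevel, sub, group]) {
  const snap = await getDocs(q);
  console.log(snap.size);
}
```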
I couldn't find much in the documentation about how the data is physically stored. Is all the data in Firestore stored in a single blob/file? If so, this could be a problem when there are hundreds of tenants with billions of records each.
In Cloud Firestore, the unit of storage is the document. Documents live in collections, which are simply containers for documents. Please note that Firestore is optimized for storing large collections of small documents. And when I say large, I mean extremely large. So when you perform a query against a collection of 1 MILLION documents, the speed depends on the number of results you return and it does not depend on the number of the documents in which you search, or on the number of documents that exist in other collections in which you aren't performing a search.
Can a single Firestore database infinitely scale up in size assuming that no single collection is bigger than a few billion records?
While with the Firebase Realtime Database you had to scale using multiple databases, in Firestore this practice is not necessary. However, there are some techniques that are explained really well in the official docs:
Building scalable applications with Firestore
If the answer is that Firestore cannot scale infinitely in this way, what is the alternative approach?
It can definitely massively scale.
See the Firestore best practices and security rules.
You may conceptualize Firestore as one service shared by all of Google's customers. Just as Google attempts to ensure that one customer (a so-called "noisy neighbor") does not impact the service for others, you don't want to be a noisy neighbor to yourself.
You need to consider more than just performance.
Security. E.g. see security rules as a mechanism you may be able to use to help enforce segregation of your tenants' data. You will want to understand fully how to keep different customers' data securely separated. Your customers will want to understand what measures you're employing to ensure their data is kept separate too.
Multitenancy. Google Cloud Platform has no intrinsic (platform-wide) multitenant capabilities, and often a way to manifest tenancy has been to use different Google Projects for different customers, because Projects provide a well-defined security perimeter. You may want to investigate whether some subset of your customers would benefit from a one-customer, one-project model.
Quota. Every Cloud Platform method is constrained by some quota. You will want to be careful to ensure that quota is distributed fairly across customers, so that some customers don't consume all of it and deny other customers access to the service.
It is well known that read/write operations in the Firebase Realtime Database are free. With a little more digging, I found out that reads/writes can actually cost money in indirect ways. I've been searching through the docs and SO questions to figure out the exact difference between the Firestore read cost ($0.06 per 100,000 documents) and the Realtime Database download cost ($1/GB), but sadly I haven't managed to work it out.
The stored-data cost for RTDB ($5/GB) is really clear, and I understand that it is billed monthly (that's right, isn't it?). But what exactly is a download cost? From a few SO questions and the official docs, I gathered that the RTDB download cost is really similar to the Firestore read cost, and that it is important to keep db.ref paths specific by drilling down to the final path. But if the download cost covers these operations, such as reading JSON data at a specific field or path, what is the difference between the concept of a Firestore read and these RTDB download operations?
If that is the case, then RTDB is never really free when it comes to 'conceptual reads', even speaking in a direct manner. So why do some community members and articles keep saying "read/write is free in RTDB"? I was considering migrating some features from Firestore to RTDB, since it is supposedly free for reads and writes. The feature updates a single path (a document, in Firestore terms) of about 500 B hundreds of times every month. This issue is making me really confused.
Let's say that 100,000 reads in Firestore cost $0.04 and an RTDB download (which looks like a read) costs $1/GB. By my calculation, 2,500,000 document reads in Firestore equal a single GB downloaded from RTDB. That means if a single operation reads more than roughly 400 B of data, the Firestore read cost is actually cheaper than the RTDB read cost. Then there is no reason for me to use RTDB for reading data if a single operation retrieves more than ~400 B. It feels like I've got hold of the wrong concepts, but it is not easy to get out of this swamp.. ]:
So I hope to clarify the RTDB read/write cost (whether it really is free by itself), and why it would be better to use RTDB than Firestore when the app has to do lots of read operations (in my case, roughly 1,000 operations retrieving 400 B of data per month per user). I understand that a few Firebase gurus thankfully contribute to SO's firebase tag. I've tried to write this question as clearly as possible, but I suspect some parts are still unclear, so comments are really appreciated! Hope this question reaches you.. Thanks in advance [:
I have created a very handy spreadsheet calculator that estimates the rough size of the payload and scales it per user, while also factoring in the free-tier usage. You can enter your values at the top and get a decent result.
But to summarize: the Realtime Database is expensive to read per KB, while a Firestore read is rated for up to 1 MB (potentially) per document read. Writing to the Realtime Database, on the other hand, is extremely cheap; I have confirmed that, apart from overhead, it is free to write to it.
The Realtime Database is not as economical as Firestore and is designed to cover some of Firestore's caveats. Realtime billing for reads (downloads) is the data plus overhead, rounded up to the nearest KB.
TLDR:
Firestore is ideal for high reads, low writes, static information.
Realtime is better suited with low reads, high writes, volatile information.
When reading documents from Firestore you pay for:
Document reads - The cost to read the document on the server.
Network egress - The cost to download the data to the client.
In most scenarios we see the cost for developers using Firestore coming more from document reads, as the cost per GB is comparatively low.
When reading data from the Realtime Database, you only pay for:
GB downloaded - The cost to download the data to the client.
Here the cost mostly comes from the size of the data you download. It's quite similar to the Network egress from Firestore, but at a higher cost per byte read (and of course you then don't pay for the read operation on the server itself).
While a calculator (such as the one from DIGI Byte, or the one on the pricing page) is going to be best, the rough guidance is: if you perform many small reads and writes, RTDB is going to be the better choice, while if you perform fewer writes and/or larger reads, Firestore is often the better choice.
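Here is a rough breakeven sketch under the rates discussed above (all illustrative; real RTDB billing also rounds each download up to the nearest KB and adds protocol overhead):

```typescript
// Breakeven sketch between Firestore reads and RTDB downloads.
// Rates are illustrative, taken from the discussion above. This ignores
// RTDB's rounding of each download up to the nearest KB, plus overhead.
const FIRESTORE_PER_READ = 0.06 / 100_000; // $ per document read
const RTDB_PER_GB = 1.0; // $ per GB downloaded

const firestoreCost = (reads: number) => reads * FIRESTORE_PER_READ;
const rtdbCost = (reads: number, bytesPerRead: number) =>
  ((reads * bytesPerRead) / 1e9) * RTDB_PER_GB;

// Payload size at which one RTDB download costs as much as one Firestore read:
console.log((FIRESTORE_PER_READ / RTDB_PER_GB) * 1e9); // 600 bytes at these rates

// Below ~600 B per operation RTDB is cheaper; above it, Firestore wins.
console.log(firestoreCost(1_000_000)); // $0.60
console.log(rtdbCost(1_000_000, 400)); // $0.40 for 400 B payloads
console.log(rtdbCost(1_000_000, 2_000)); // $2.00 for 2 KB payloads
```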
I understand Firestore charges based on read/write operations.
But I notice that Firestore reads from the server on each app launch, which will cause a big read count if many users open the app frequently.
Q1: Can I limit users to reading from the server only on first login, and after that have each app launch read only the documents that were updated?
For example, take a chat app group:
100 users
100 message
100 app launch / user / day
That becomes 1,000,000 reads per day?
Which is ridiculously high.
Q2: Reads are counted per document, regardless of whether it's in a root collection or a subcollection, right?
For example, if I read from a root collection that contains 10 subcollections, each holding 10 documents, that will result in a read count of 100, am I right?
Thanks.
That’s correct, Cloud Firestore cares less about the amount of downloaded data and more about the number of performed operations.
As Cloud Firestore’s pricing depends on the number of reads, writes, and deletes that you perform, it means that if you had 100 users communicating within one chat room, each of the users would get an update once someone sends a message in that chat, therefore, increasing the number of read operations.
Since the number of read operations would be very much affected by the number of people in the same chatroom, Cloud Firestore suits best (price-wise) for a person-to-person chat app.
However, you could structure your app to have more chat rooms in order to decrease the volume of reads. Here you can see how to store different chat rooms, while the following link will guide you to the best practices on how to optimize your Cloud Firestore realtime updates.
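As a minimal sketch (room and field names are hypothetical), you can listen to only the most recent messages of a room, so each client pays for a bounded number of reads rather than the whole history:

```typescript
import {
  getFirestore,
  collection,
  query,
  orderBy,
  limit,
  onSnapshot,
} from 'firebase/firestore';

const db = getFirestore();

// Subscribe to just the 25 newest messages of one room
// ("rooms/{roomId}/messages" is a hypothetical layout).
const recent = query(
  collection(db, 'rooms/room42/messages'),
  orderBy('sentAt', 'desc'),
  limit(25),
);

const unsubscribe = onSnapshot(recent, (snap) => {
  // After the initial 25 reads, only changed documents are re-read and billed.
  snap.docChanges().forEach((c) => console.log(c.type, c.doc.data()));
});

// Call unsubscribe() when the user leaves the room to stop further reads.
```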
Please keep in mind that Cloud Firestore itself does not have any rate limiting by default. However, Google Cloud Platform has configurable billing alerts that apply to your entire project.
You can also limit the billing to $25/month by using the Flame plan, and if there is anything unclear in your bill, you can always contact Firebase support for help.
Regarding your second question, a read occurs any time a client gets data from a document. Remember, only the documents that are retrieved are counted - Cloud Firestore does searching through indexes, not the documents themselves.
By using subcollections, you can still retrieve data from a single document, which counts as only 1 read, or you can use a collection group query that retrieves all the documents within the subcollections, counting as multiple reads depending on the number of documents (in the example you gave, it would be 10 x 10 = 100).
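As a sketch of that difference (collection names hypothetical): reading one parent document is a single read, while a collection group query bills one read per document it returns:

```typescript
import {
  getFirestore,
  doc,
  getDoc,
  collectionGroup,
  getDocs,
} from 'firebase/firestore';

const db = getFirestore();

// 1 read: fetching a parent document does NOT pull in its subcollections.
const parent = await getDoc(doc(db, 'groups', 'groupA'));
console.log(parent.id);

// Up to 100 reads: a collection group query over every "messages"
// subcollection bills one read per document actually returned
// (10 subcollections x 10 documents each = 100).
const all = await getDocs(collectionGroup(db, 'messages'));
console.log(all.size); // each returned document counts as one read
```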