We have about 10 million small files (blobs): most are around 1-2 kB, some around 100 kB, and very few are above 1 MB, for a total size of 50 GB. Each has a unique integer id, which is its key in a MySQL database. They will always be accessed by that key, never searched. Access time for one blob should be under 50 ms. But we occasionally want to archive them, or do bulk deletes or updates. I need to choose the proper storage option on Google Cloud:
Storage - the natural choice for blobs, but archiving and restoring a large number of objects is awkward: there are no bulk download/upload operations.
Datastore - is it suitable as a blob store? It has a 1 MB limit per entity.
CloudSQL - is it suitable as a blob store?
BigTable - too expensive :-)
Custom file server - NFS, Gluster, ...
Custom NoSQL? Also an option, but we would prefer a hosted Google solution.
Other?
You are going to be slightly limited on Google Cloud Platform options based on your requirements.
The ideal approach would be to use Datastore, but as you note it has size limitations for entities. CloudSQL is an option, however it is really designed for transactional workloads and therefore comes with a higher running cost.
I would therefore say you would need to either use CloudSQL or run your own storage setup on an instance until those limits are increased.
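If CloudSQL ends up being the choice, a minimal sketch of keyed blob access could look like the following (TypeScript with the mysql2 driver; the table name, schema, and connection settings are illustrative assumptions, not anything prescribed by Cloud SQL):

```typescript
import mysql, { RowDataPacket } from 'mysql2/promise';

// Assumed schema, sized for blobs up to a few MB:
//   CREATE TABLE blobs (
//     id   BIGINT UNSIGNED PRIMARY KEY,  -- the existing MySQL key
//     data MEDIUMBLOB NOT NULL           -- up to 16 MB per row
//   );

const pool = mysql.createPool({
  host: '10.0.0.5',                 // Cloud SQL private IP or proxy address (placeholder)
  user: 'app',
  password: process.env.DB_PASSWORD,
  database: 'blobstore',
});

// Fetch one blob by its integer key.
async function getBlob(id: number): Promise<Buffer | null> {
  const [rows] = await pool.execute<RowDataPacket[]>(
    'SELECT data FROM blobs WHERE id = ?', [id]);
  return rows.length ? (rows[0].data as Buffer) : null;
}

// Bulk delete by key range, e.g. as part of an archive-then-delete job.
async function deleteRange(fromId: number, toId: number): Promise<void> {
  await pool.execute('DELETE FROM blobs WHERE id BETWEEN ? AND ?', [fromId, toId]);
}
```

With an index-only primary-key lookup like this, single-blob reads should comfortably stay within the 50 ms target, and range predicates on the key give you the bulk delete/archive operations the question asks about.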
I've been searching for the concurrent users limit for the Cloud Firestore Spark plan but couldn't find it.
https://firebase.google.com/docs/firestore/quotas
It does list a 1,000,000 concurrent users limit, but does not mention whether that applies to the Spark plan or the Blaze plan. I've also tried searching for an answer elsewhere, but did not find it answered specifically (with a source).
Help would be appreciated, thank you.
Per the Cloud Firestore pricing information (which Firebase uses):
When you use Firestore, you are charged for the following:
The number of documents you read, write, and delete.
The amount of storage that your database uses, including overhead for metadata and indexes.
The amount of network bandwidth that you use.
There is also no mention of any connection limits on Firebase's pricing page or the quotas documentation that you linked.
Unlike the Realtime Database, Cloud Firestore does not charge on a per-connection basis.
This video series also covers the ins and outs of Firebase products and is well worth sitting through.
Think of Cloud Firestore like a folder on your computer, which can contain thousands of little text files, similar to how documents in Cloud Firestore are stored. Users can update them with little chance of collision and grabbing a single document file would only require feeding 1s and 0s back to the requestor. This is why you are charged for network bandwidth rather than by individual connection.
In comparison, the RTDB is more like one large JSON text file, with many people all trying to update it at once. Because the server has to parse this text file to read and write data, every connection consumes compute resources. For this reason (among others), the number of connections the RTDB processes handle on behalf of Spark plans is rate-limited to prevent abuse.
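To make that concrete, here is a minimal sketch of a single keyed read with the Firebase web SDK (the project config and collection name are placeholders): fetching one document costs one document read plus the bandwidth for its bytes, regardless of how many clients are connected.

```typescript
import { initializeApp } from 'firebase/app';
import { getFirestore, doc, getDoc } from 'firebase/firestore';

// Placeholder config; use your own project's values.
const app = initializeApp({ projectId: 'my-demo-project' });
const db = getFirestore(app);

// Reading one document bills exactly one document read plus the
// bandwidth for the bytes returned -- open connections are not billed.
async function readProfile(userId: string) {
  const snap = await getDoc(doc(db, 'profiles', userId));
  return snap.exists() ? snap.data() : null;
}
```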
Despite not performing many read/write operations (170 reads, 4 writes) against my Firestore database, I appear to have a huge number of API calls: 15,093 (see image below). The high ratio of API calls to reads/writes is probably down to my application's use of network streams. My question is: should this be considered a billable metric, or should I not worry even if it runs into the millions, so long as reads/writes stay within limits (theoretically; I've never seen this happen on my own account)?
I'm inclined to believe that I needn't worry about this metric, as I can't seem to find it under either the Firebase or GCP quotas page.
It may also be relevant that I use the Google Maps and Directions APIs from GCP, although these aren't used nearly as much as Firestore.
Thanks.
According to this doc, you are charged for the following:
The number of documents you read, write, and delete.
The amount of storage that your database uses, including overhead for metadata and indexes.
The amount of network bandwidth that you use.
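The API-call count itself is not on that list. If the high number comes from realtime listeners (the "network streams" you mention), a sketch like the one below keeps a stream open and generates RPCs that show up in the API-calls graph, but what you pay for still follows the items above; the collection name here is a placeholder:

```typescript
import { initializeApp } from 'firebase/app';
import { getFirestore, collection, onSnapshot } from 'firebase/firestore';

const app = initializeApp({ projectId: 'my-demo-project' }); // placeholder config
const db = getFirestore(app);

// A realtime listener keeps a stream open and issues periodic RPCs
// under the hood, which inflate the API-calls graph. Billing, however,
// follows the pricing list above: the documents actually read, storage,
// and bandwidth.
const unsubscribe = onSnapshot(collection(db, 'rides'), (snapshot) => {
  snapshot.docChanges().forEach((change) => {
    console.log(change.type, change.doc.id);
  });
});

// Detach the listener when the UI no longer needs it, so the stream
// (and its background traffic) goes away.
// unsubscribe();
```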
In the project I am working on, we have a database per tenant, and each tenant consists of at least one department. One of the requirements is that when an admin user deletes a department through the custom frontend we've provided, the system should first archive that department's data to blob storage before the data is deleted. The same applies at the tenant level: we need to archive the data before that tenant's database is removed from the account.
Now, my question: is there a best practice for doing this? We are planning to retrieve all the data from all collections with a Mongo query on the department id (which is also the partition key) and then send it to blob storage. The challenge is executing the query that retrieves all the data: it can be a huge amount, and the RUs that operation consumes may affect the performance of the system, since other users may be using it while we remove the data.
I looked at mongodump and mongoexport, but those are standalone applications, so we cannot execute them from our code?
Any ideas? Thanks a lot.
I think one way to solve this is by using the Change Feed, as it really helps and simplifies writing a carbon copy somewhere else.
However, as of now the change feed processor won't notify you about deleted documents, so you can't listen for them; that feature is planned but not yet available.
Your best bet is to write a custom application that does the archiving using the query language support.
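A minimal sketch of such an archiver, assuming the MongoDB Node driver against the Cosmos DB Mongo API plus the Azure Blob SDK (the container name, connection strings, and the `departmentId` field name are assumptions based on the question):

```typescript
import { MongoClient } from 'mongodb';
import { BlobServiceClient } from '@azure/storage-blob';

// Connection strings are placeholders; read them from configuration.
const mongo = new MongoClient(process.env.COSMOS_MONGO_URI!);
const blobs = BlobServiceClient.fromConnectionString(process.env.STORAGE_CONN!);

// Archive every collection's documents for one department, then delete them.
async function archiveDepartment(tenantDb: string, departmentId: string) {
  await mongo.connect();
  const db = mongo.db(tenantDb);
  const container = blobs.getContainerClient('department-archives');
  await container.createIfNotExists();

  for (const { name } of await db.listCollections().toArray()) {
    const docs = await db.collection(name)
      .find({ departmentId })          // departmentId is also the partition key
      .toArray();
    if (docs.length === 0) continue;

    const payload = JSON.stringify(docs);
    await container
      .getBlockBlobClient(`${tenantDb}/${departmentId}/${name}.json`)
      .upload(payload, Buffer.byteLength(payload));

    await db.collection(name).deleteMany({ departmentId });
  }
  await mongo.close();
}
```

In practice you would page the cursor in smaller batches and throttle the loop so the archiving job stays within an RU budget that doesn't disturb other users of the system.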
I am using firebase realtime database.
The project works like a chat application.
We are constantly downloading / uploading.
But the cost seems too high. All of the data is downloaded again every time we get one message. What can I do to reduce this cost?
Instead of downloading the data again every time, I'm thinking of creating a cache. What should we pay attention to when creating the cache?
What are the solutions Firebase offers to reduce cost for realtime database?
Thanks, best regards
The pricing page of Firebase is pretty clear. The cost for the Realtime Database is based on:
The amount of data you store in the database.
The amount of data that is read from the database.
So those are the two factors you'll need to pay attention to if you want to reduce the cost.
Which one has the highest impact really depends on where your cost is coming from, which you didn't say. But the most common approach is to see whether you can reduce the number of times each client downloads the same data by caching it locally. If you're using the native mobile SDKs for iOS and Android (which you also don't mention), you can often already achieve some reduction by enabling disk persistence.
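As an illustration with the JavaScript SDK (the `messages/{roomId}` path and the page size of 50 are assumptions about your data layout, not something Firebase prescribes), combining a limited query with a long-lived listener means each client downloads a bounded initial page and then only new messages, instead of re-fetching the whole node on every message:

```typescript
import { initializeApp } from 'firebase/app';
import { getDatabase, ref, query, limitToLast, onChildAdded } from 'firebase/database';

const app = initializeApp({ databaseURL: 'https://my-demo-project.firebaseio.com' }); // placeholder
const db = getDatabase(app);

// Listen to only the last 50 messages of a room. The SDK keeps the
// listener open, so after the initial page only newly added children
// are downloaded -- not the entire message history on every message.
function watchRoom(roomId: string, onMessage: (id: string, msg: unknown) => void) {
  const lastMessages = query(ref(db, `messages/${roomId}`), limitToLast(50));
  return onChildAdded(lastMessages, (snapshot) => {
    onMessage(snapshot.key!, snapshot.val());
  });
}
```

Keeping one listener attached per room (instead of issuing a fresh read for every incoming message) is usually where most of the download savings come from.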
I'm starting a project with Cloud Firestore and I added my first collections and documents.
In order to be sure to use the right tool, I tried to search if there were limitations with Cloud Firestore.
I saw there were some limitations on bandwidth, number of commits, etc., but I didn't find (or didn't understand) whether there is a limitation on the size of the database (number of collections / documents).
Is there a limitation? If yes, do plans/bundles exist to extend those limits?
Best Regards,
Cloud Firestore scales effortlessly. It will store as much data as you're willing to put into it (and pay for). Practically speaking, volume of data is not a concern.