Currently I am using Firestore for my database and I have a users collection. Whenever a user document is created or updated in the users collection, a cloud function takes the user document and saves it in Elasticsearch.
I am starting to be concerned about the scalability of this architecture. For example, suppose that several thousand cloud functions started writing documents to Elasticsearch at once: is Elasticsearch going to handle this load? Is there a better solution for this in Google Cloud?
For example, can those cloud functions write the user documents to a queue and have cloud functions at the other end of the queue take 100 documents at a time and bulk write them to Elasticsearch?
I am new to Google Cloud and would appreciate any ideas, videos, and things to read.
Thanks
Elasticsearch has no limit on the number of documents an index can hold, but there are some limits, such as maximum document size and bulk write size, mentioned in their documentation:
Maximum Document Size: 100KB [configurable in 7.7+]
Maximum Indexing Payload Size: 10MB
Bulk Indexing Maximum: 100 documents per batch
As far as I know, Google Cloud has no full text search API.
Talking of bulk writes, if realtime availability (data being available immediately after it is added) is not a concern, then you can store the new documents in Firestore along with a timestamp of when they were added and a boolean value indicating whether the document has been indexed in Elasticsearch.
Then instead of running a cloud function with onCreate trigger, you can run a scheduled cloud function every N minutes which will:
Query documents which have not been added in Elasticsearch
Make batches of 100 (to stay within the 100 documents per batch limit)
Upload them to Elasticsearch
This way you process more documents per cloud function run, which is a bit more efficient, but if you need your new data to be available immediately then this won't work.
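The batching step in the list above can be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions, not the answerer's implementation: `make_batches` and its parameters are hypothetical names, and the defaults simply reuse the 100-document and 10MB figures quoted above. It only groups documents; querying Firestore and sending each batch to Elasticsearch are left out.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def make_batches(
    docs: Iterable[Dict[str, Any]],
    max_docs: int = 100,                # per-batch document limit quoted above
    max_bytes: int = 10 * 1024 * 1024,  # indexing payload limit quoted above (10MB)
) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into batches that respect both limits (hypothetical helper)."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in docs:
        # Approximate the payload size by the JSON-encoded size of the document.
        size = len(json.dumps(doc).encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        # Close the current batch if adding this document would break a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    pending = [{"id": i, "name": f"user{i}"} for i in range(250)]
    for batch in make_batches(pending):
        print(len(batch))
```

With 250 small pending documents this yields batches of 100, 100 and 50, so a single scheduled run would make three bulk requests instead of 250 individual ones.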
Related
I am in the process of migrating my Realtime Database to Cloud Firestore. Ideally, I need to keep the same Realtime Database node keys that were generated using push() and use them as the document IDs in Firestore. Is this safe to do?
I have read information at https://firebase.google.com/docs/firestore/best-practices and I am still unsure whether this will be safe. I am aware that auto generated document IDs in Cloud Firestore are in a different format to those automatically generated in Realtime Database.
Am I likely to run into problems by using Realtime Database generated keys such as -M_NHw525_IxMqiGPUvd as the document ID in Cloud Firestore?
I really appreciate any help, Thanks.
Firestore is sensitive to hot spots in its writing process, meaning that write throughput is best when the writes are randomly distributed across the address space. In other words, throughput is best if the IDs of the documents being written, and the values being written to the indexes, are randomly distributed.
Firebase Realtime Database push IDs start with an encoded timestamp, so they are definitely not randomly distributed. They are (by design) largely sequential: subsequent calls to push() typically lead to keys that are next to each other. This is exactly what they were designed for in Realtime Database, but it doesn't meet the requirement of a random distribution that is needed for maximizing write throughput in Firestore.
Whether you'll run into problems when using existing push keys for your Firestore writes really depends on the implementation. For example, during a data migration you'll want to be ready to throttle the writes (sooner) when they're not randomly distributed. Hopefully the above helps you to know what to keep an eye out for when performing the data migration.
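To make the "encoded timestamp" point concrete, here is a small Python sketch that decodes the timestamp prefix of a push ID. It assumes the layout Firebase has described publicly for push IDs (20 characters, the first 8 encoding a millisecond timestamp in a 64-character alphabet that is in ASCII order); `push_id_timestamp_ms` is a hypothetical helper name, not part of any Firebase SDK.

```python
# Alphabet assumed from Firebase's public description of push IDs (ASCII-ordered).
PUSH_CHARS = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def push_id_timestamp_ms(push_id: str) -> int:
    """Decode the millisecond timestamp stored in the first 8 characters."""
    if len(push_id) != 20:
        raise ValueError("expected a 20-character push ID")
    timestamp = 0
    for ch in push_id[:8]:
        # str.index raises ValueError for characters outside the alphabet.
        timestamp = timestamp * 64 + PUSH_CHARS.index(ch)
    return timestamp


if __name__ == "__main__":
    key = "-M_NHw525_IxMqiGPUvd"  # the key from the question
    print(push_id_timestamp_ms(key))
```

Because the alphabet is in ASCII order, keys created close together in time share a long common prefix and sort next to each other, which is the sequential pattern to watch for when throttling migration writes.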
Does reading (or writing) data from Realtime Database within a Firebase Cloud Function use transfer volume?
In my function almost the whole database is analyzed/read. Can the free download transfer limit be exceeded this way?
Reading from the Realtime Database in any way is counted as a charged download. There is no exemption for traffic from Cloud Functions.
How to check Firebase Cloud Firestore size in a project?
According to Firebase, the free plan gives you 1GB for Cloud Firestore. I have already created some collections with documents inside, but where can I check the total size I am using?
I have already checked some statistics in the Firebase console, but I can only see the numbers of reads and writes.
You can check your Cloud Firestore stored data size on the App Engine Quotas page in Google Cloud. You can go directly to Google Cloud because when you create a Firebase project, you're also creating a project in Google Cloud.
On the App Engine Quotas page you can also see the other Cloud Firestore usage information, including reads, writes, index writes, deletes, and network egress.
UPDATE:
You can check your usage for up to the last 30 days in Firestore Database > Usage:
Go to the Firestore Usage tab
Click "View in Usage and Billing" (bottom right)
This will show a summary of usage including total bytes stored, bandwidth, reads, writes, and deletes.
The answer is that you can’t check bytes stored per month in Firestore, only per day.
Usage per day can be checked on the App Engine Quotas page.
It’s confusing because you don’t know how much data you have stored in total in Firestore.
https://firebase.google.com/docs/firestore/manage-data/transactions
If we use the batch API to write something to Firestore, is it going to be counted as a single write operation for pricing?
No. You are still charged for each individual document write.
It seems odd to me that Firestore would charge me for read queries to locally cached data, but I can't find any clarification to the contrary in the Firestore Pricing document. If I force Firebase into offline mode and then perform reads on my locally cached data, am I still charged for each individual entity that I retrieve?
Second, offline users in my app write many small updates to a single entity. I want the changes to persist locally each time (in case they quit the app), but I only need eventually consistent saves to the cloud. When a user reconnects to the internet and Firestore flushes the local changes, will I be charged a single write request for the entity or one per update call that I made while offline?
Firestore could potentially fit my use case very well, but if offline reads and writes are charged at the same rate as online ones it would not be an affordable option.
As the official documentation says,
Cloud Firestore supports offline data persistence. This feature caches a copy of the Cloud Firestore data that your app is actively using, so your app can access the data when the device is offline. You can write, read, listen to, and query the cached data. When the device comes back online, Cloud Firestore synchronizes any local changes made by your app to the data stored remotely in Cloud Firestore.
So, every client that is using a Firestore database and sets PersistenceEnabled to true maintains its own internal (local) version of the database. When data is inserted or updated, it is first written to this local version of the database. As a result, all writes to the database are added to a queue. This means that all the operations that were stored there will be committed on Firebase servers once you are back online. This also means that those operations will be seen as independent operations and not as a whole.
But remember, don't use Firestore as an offline-only database. It is really designed as an online database that can work for short to intermediate periods of being disconnected. While offline it will keep a queue of write operations. As this queue grows, local operations and app startup will slow down. Nothing major, but over time these may add up.
If the Google Cloud Firestore pricing model does not fit your use case very well, then use Firebase Realtime Database. As also mentioned in this post from the official Firebase blog, one of the reasons you still might want to use the Realtime Database is:
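The queueing behavior described above can be illustrated with a toy model. This Python sketch is not the Firestore SDK and does not reproduce its internals; `OfflineWriteQueue`, `update`, and `flush` are hypothetical names used only to show why several offline updates to one document end up as several independent writes when the client reconnects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class OfflineWriteQueue:
    """Toy stand-in for a client's pending-write queue (hypothetical)."""

    pending: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)

    def update(self, doc_path: str, changes: Dict[str, Any]) -> None:
        # Each call is queued as its own operation, even for the same document.
        self.pending.append((doc_path, dict(changes)))

    def flush(self) -> int:
        # On reconnect every queued operation is sent independently;
        # return how many writes were sent.
        sent = len(self.pending)
        self.pending.clear()
        return sent


if __name__ == "__main__":
    queue = OfflineWriteQueue()
    for score in range(5):
        queue.update("users/alice", {"score": score})
    print(queue.flush())
```

In this model, five offline updates to the same document produce five writes on flush rather than one merged write, which matches the "independent operations" behavior described above.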
As we noted above, Cloud Firestore's pricing model means that applications that perform very large numbers of small reads and writes per second per client could be significantly more expensive than a similarly performing app in the Realtime Database.
So it's up to you which option you choose.
According to this, if you want to work completely offline with Cloud Firestore, you can disable the network with:
FirebaseFirestore.getInstance().disableNetwork()
However, Firestore will raise a client-offline error for the first get request a user makes, and you must treat this error as an empty response.