We recently started getting write stream exhausted errors:
#firebase/firestore: Firestore (8.10.0): FirebaseError: [code=resource-exhausted]: 8 RESOURCE_EXHAUSTED: Write stream exhausted maximum allowed queued writes.
#firebase/firestore: Firestore (8.10.0): Using maximum backoff delay to prevent overloading the backend.
We're sending a few million transactions in batches of 500 to a Firestore collection every 15 seconds. If we run it any more often than that, we get the error above. If we exceed it by too much, it eventually hard crashes. The advice we got from Firebase support was to write to a bunch of subcollections instead, but their limits clearly show 500 writes/second even when writing sequentially to a single collection. Our document IDs aren't quite sequential but are often close, like client-id:guid, where the client ID is often the same for most of the write.
Any ideas on what we might be doing wrong or how to fix it? We've tried sending smaller, more frequent batches and larger, less frequent batches. We created a new project without any indexes to see if they were the problem. We've sent the requests from different clients, none of which are taxed for resources.
Cloud Firestore does not stop you from exceeding the thresholds below, but doing so affects performance; the "batches of 500" figure comes from this hard limit and this soft limit.
The main issue here is hotspotting, since the document IDs are client-id:guid, where the client ID is often the same for most of the write. I'd advise following the best practices; otherwise your requests will get queued up and eventually return RESOURCE_EXHAUSTED.
Our issue turned out to be the batch sizes. We started sending batches of 50 instead of 500 and were able to write about 400/s. We still see the backoff errors, but we're not losing any data. I haven't had time to test parallel individual writes to see if we can gain a little more write speed, but I'll update this answer once I do.
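As a rough illustration of the smaller batches described above, here is a minimal sketch that splits a list of pending writes into groups of 50. The helper name `chunked` is hypothetical, and the actual Firestore batch commit is deliberately left out; each yielded group would be handed to whatever batch-write call you already use.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    pending = [f"doc-{i}" for i in range(120)]
    for group in chunked(pending):
        # Replace this print with your own batch commit.
        print(len(group))
```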
Also, we tried to increase our batch write speeds with the 500/50/5 method they recommend, and we still hit the resource exhausted errors.
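For reference, the 500/50/5 rule starts a new collection at 500 operations per second and raises traffic by 50% every 5 minutes. A small sketch of that schedule follows; the function name and parameters are hypothetical, and it only computes the target ceiling, it does not throttle anything by itself.

```python
def max_ops_per_second(elapsed_seconds: float,
                       base: float = 500.0,
                       growth: float = 1.5,
                       step_seconds: float = 300.0) -> float:
    """Target ceiling on operations/second under a 500/50/5 ramp-up.

    Starts at `base` and multiplies by `growth` after every full
    `step_seconds` interval that has elapsed.
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    completed_steps = int(elapsed_seconds // step_seconds)
    return base * growth ** completed_steps


if __name__ == "__main__":
    for minutes in (0, 5, 10, 15):
        print(minutes, max_ops_per_second(minutes * 60))
```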
For example, I have 10 documents in my collection. 10 requests come in near the same second and will run the same query. They each will start their own transaction that will try and read 1 document, and then delete that document. Given the firestore documentation with document contention, it is made to seem like contention errors happen when more than one transaction occurs on the same document X amount of times (it is not documented how many times).
Cloud Firestore resolves data contention by delaying or failing one of the operations.
https://cloud.google.com/firestore/docs/transaction-data-contention
However, in this case since 1 of those transactions committed, I am assuming the other 9 that tried to operate on that same document, will retry because the document from the query was "changed" and couldn't commit. Then the next 9 transactions will try to do the same thing, but on another document, and this will continue until all requests finished deleting 1 document and there are no more active transactions.
Would these repeatedly retried transactions eventually be ABORTED due to contention, even though it's been a different document each time? Or would they just keep getting delayed and retried because the contention happens on a different document on each attempt?
According to the documentation, the transaction will retry a "finite number of times". This number is dependent on how the SDK itself is configured, which may be different for various SDK platforms and versions. It doesn't matter which contended document(s) caused the retry. The max number of retries is absolute for that transaction in order to avoid excessive work.
Newer versions of the SDK allow configuration of the number of retries (e.g. Android 24.4.0 lets you specify TransactionOptions).
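To make the "absolute" retry cap concrete, here is a minimal sketch of a bounded retry loop. It is not the SDK's implementation: `Contention`, `run_transaction`, and the default of 5 attempts are all assumptions for illustration. The point is that the counter does not care which document caused each failure.

```python
class Contention(Exception):
    """Stand-in for a transaction attempt aborted by data contention."""


def run_transaction(body, max_attempts: int = 5):
    """Call `body(attempt)` until it succeeds or the attempts run out.

    The attempt budget is per transaction; it is not reset when a
    later attempt contends on a different document.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return body(attempt)
        except Contention:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure to the caller


if __name__ == "__main__":
    def flaky(attempt):
        # Pretend a different document is contended on each early attempt.
        if attempt < 3:
            raise Contention(f"doc-{attempt} changed")
        return f"deleted doc-{attempt}"

    print(run_transaction(flaky))
```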
I have a large, one-million-document collection in Firebase that I treat as a stack, where the first element gets read and removed from the stack. My main problem is that I have over a thousand connections trying to access the collection, and I am having issues with connections receiving the same document. To prevent duplicate results, I've resorted to using a mutex, as referenced in the post below.
Cloud Firestore document locking
I am using a mutex to lock each document before removing it from the collection. I use transactions to ensure the mutex owner is not overwritten by other connections and to check that the document has not been removed yet.
The problem I have with this solution is that as we scale up, more connections fight over the mutex lock. Each connection spends a long time retrying until it successfully locks a document. Avoiding long retries would allow for faster response times and fewer reads.
So, in summary, a connection tries to retrieve a document. It retrieves the document but fails to create a lock because another incoming connection just locked it. So it looks for another document and also fails. It keeps retrying until it beats another connection to locking a document.
Is it possible to increase throughput and keep read costs low as I scale up?
Yeah, I doubt such a mutex is going to help your throughput.
How important is it that documents are processed in the exact order that they are in the queue? If it is not crucial, you could consider having each client request the first N documents, and then picking one at random to lock-and-process. That would improve your throughput up to N times.
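A minimal sketch of that idea, assuming the client has already fetched the IDs of the first N documents: shuffle the candidates and try to lock each one until a lock succeeds. `claim_one` and `try_lock` are hypothetical names; `try_lock` stands in for whatever transaction you use to take the mutex.

```python
import random
from typing import Callable, Optional, Sequence


def claim_one(candidate_ids: Sequence[str],
              try_lock: Callable[[str], bool],
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Try the first-N candidates in random order; return the ID we locked.

    Returns None if every candidate was already locked by someone else,
    in which case the caller would fetch a fresh set of candidates.
    """
    rng = rng or random.Random()
    order = list(candidate_ids)
    rng.shuffle(order)  # spread competing clients across the N documents
    for doc_id in order:
        if try_lock(doc_id):
            return doc_id
    return None


if __name__ == "__main__":
    already_locked = {"doc-1", "doc-2"}
    winner = claim_one(["doc-1", "doc-2", "doc-3"],
                       lambda doc_id: doc_id not in already_locked)
    print(winner)
```

With a thousand clients all reaching for the single head document, almost every attempt loses the race; spreading them over N candidates lowers the chance that two clients pick the same one.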
Is it possible to limit the speed at which Google Firestore pushes writes made in an app to the online database?
I'm investigating the feasibility of using Firestore to store a data stream from an IoT device via a mobile device/bluetooth.
The main concern is battery cost: I receive a new data packet roughly every two minutes, and I'm concerned about the additional battery drain that an internet round-trip every two minutes, 24 hours a day, will cost. I would also want to limit updates to Wi-Fi connections only.
It's not important for the data to be available online in real time. However, it is possible for multiple sources to add to the same data stream in a two-way sync (i.e. the online DB and all devices have the merged data).
I'm currently handling that myself, but when I saw the offline capability of Datastore I hoped I could get all that functionality for free.
I know we can't directly control offline-mode in Firestore, but is there any way to prevent it from always and immediately pushing write changes online?
The only technical question I can see here has to do with how batch writes operate, and more importantly, cost. Simply put, a batch write of 100 writes is the same as writing 100 writes individually. The function is not a way to avoid the write costs of firestore. Same goes for transactions. Same for editing a document (that's a write). If you really want to avoid those costs then you could store the values for the thirty minutes and let the client send the aggregated data in a single document. Though you mentioned you need data to be immediate so I'm not sure that's an option for you. Of course, this would be dependent on what one interprets "immediate" as based off the relative timespan. In my opinion, (I know those aren't really allowed here but it's kind of part of the question) if the data is stored over months/years, 30 minutes is fairly immediate. Either way, batch writes aren't quite the solution I think you're looking for.
EDIT: You've updated your question so I'll update my answer. You can do a local cache system and choose how you update however you wish. That's completely up to you and your own code. Writes aren't really automatic. So if you want to only send a data packet every hour then you'd send it at that time. You're likely going to want to do this in a transaction if multiple devices will write to the same stream so one doesn't overwrite the other if they're sending at the same time. Other than that I don't see firestore being a problem for you.
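A minimal sketch of such a local cache, under the assumption that you control when the upload happens: packets are held in memory and handed to a `send` callback only when the interval has elapsed and the device is on Wi-Fi. `BufferedUploader`, `send`, and the `on_wifi` flag are hypothetical names; `send` is where your own Firestore write (ideally a single aggregated document) would go.

```python
import time
from typing import Any, Callable, List


class BufferedUploader:
    """Hold packets locally and upload them in one go on a schedule."""

    def __init__(self, send: Callable[[List[Any]], None],
                 interval_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._interval = interval_seconds
        self._clock = clock
        self._pending: List[Any] = []
        self._last_flush = clock()

    def add(self, packet: Any, on_wifi: bool) -> bool:
        """Queue a packet; return True if this call triggered an upload."""
        self._pending.append(packet)
        return self.maybe_flush(on_wifi)

    def maybe_flush(self, on_wifi: bool) -> bool:
        due = self._clock() - self._last_flush >= self._interval
        if not (due and on_wifi and self._pending):
            return False
        self._send(list(self._pending))  # one upload for the whole buffer
        self._pending.clear()
        self._last_flush = self._clock()
        return True
```

This keeps everything in memory, so a real app would also persist the pending packets locally in case the process is killed before the next upload.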
Firestore offers 50000 documents read operations as part of its free bundle.
However, in my application, the client is fetching a collection containing price data. The price data is created over time. Hence, starting from a specific timestamp, the client can read up to 1000 documents. Each document represents one timestamp with the price information.
This means that if the client refreshes their web browser 50 times, it will exhaust my quota immediately. And that is just for a single client.
That is what happened, and I got this error:
Error: 8 RESOURCE_EXHAUSTED: Quota exceeded
The price data is static. Once it has been written, it is not supposed to change.
Is there a solution for this issue, or should I consider a database other than Firestore?
The error message indicates that you've exhausted the quota that is available. On the free plan the quota is 50,000 document reads per day, so you've read that number of documents already.
Possible solutions:
Upgrade to a paid plan, which has a much higher quota.
Wait until tomorrow to continue, since the quota resets every day.
Try in another free project, since each project has its own quota.
If you have a dataset that will never (or rarely) change, why not write it as a JSON object in the app itself? You could make it a separate .js file and then import it to build your table.
Alternatively, is there a reason your users would ever navigate through all 1,000 records? You can simulate a full table while limiting each call to, say, 10 documents, and then paginate to get more results if needed.
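The pagination idea can be sketched with a plain in-memory list standing in for the collection. `fetch_page` is a hypothetical helper, not a Firestore call; in a real query the same cursor logic would be expressed with the SDK's ordering, start-after, and limit operators. Each page then costs at most `page_size` document reads instead of 1,000.

```python
from typing import Any, Dict, List, Optional, Tuple

Price = Dict[str, Any]


def fetch_page(prices: List[Price],
               after_timestamp: Optional[int],
               page_size: int = 10) -> Tuple[List[Price], Optional[int]]:
    """Return the next page and the cursor to pass in for the page after it.

    `prices` is assumed to be sorted by ascending "timestamp".
    """
    remaining = [p for p in prices
                 if after_timestamp is None or p["timestamp"] > after_timestamp]
    page = remaining[:page_size]
    next_cursor = page[-1]["timestamp"] if page else None
    return page, next_cursor


if __name__ == "__main__":
    data = [{"timestamp": t, "price": 100 + t} for t in range(25)]
    page, cursor = fetch_page(data, None)
    while page:
        print([p["timestamp"] for p in page])
        page, cursor = fetch_page(data, cursor)
```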
I have an application, which runs all the time and receives some messages (rate of them varies from several per second to none per hour). Every message should be put into a SQLite database. What's the best way to do this?
Opening and closing the database on each message doesn't sound good: if there are tens of them per second, it will be extremely slow.
On the other hand, opening the database once and just writing to it can lead to loss of data if the process unexpectedly terminates.
It sounds like whatever you do, you'll have to make a trade-off.
If safety is your top-most concern, then update the database on each message and take the speed hit.
If you want a compromise, then update the database every so many messages. For instance, maintain a buffer and, on every 100th message, issue an update wrapped in a transaction.
The transaction wrapping is important for two reasons. First, it maximizes speed. Second, it can help you recover from errors if you employ logging.
If you do the batch update above, you can add an additional level of safety by logging each message to a file as it arrives. You reset this log every time a database update is successfully issued. That way, if an update fails, you know it failed on the entire block (since you are using transactions), and your log will have the information that was not written. This allows you to re-issue the update, or even see whether a problem with the data caused the failure. This of course assumes that keeping a log is cheaper than updating the database, which can be the case depending on how you are connecting.
If your top rate is "several per second", then I don't see a real problem with opening and closing the DB. This is especially true if it's critical that the data be recorded right away in case of server failure.
We use SQLite in a reporting product, and the best performance we have been able to eke out comes from recording rows in blocks of several thousand at a time. Our default setting is around 50k. That means our app waits until 50k rows of data have been collected and then commits them as one transaction.
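A minimal sketch of the buffer-plus-log approach using Python's built-in `sqlite3` module. The class name, the `messages` table, and the log format are assumptions for illustration; messages are assumed to be single-line strings so that one log line equals one message.

```python
import os
import sqlite3
from typing import List


class BufferedWriter:
    """Log every message to a file, insert into SQLite once per batch."""

    def __init__(self, db_path: str, log_path: str, batch_size: int = 100) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS messages (body TEXT NOT NULL)")
        self.conn.commit()
        self.log_path = log_path
        self.batch_size = batch_size
        self.buffer: List[str] = []

    def add(self, body: str) -> None:
        # Append to the safety log first and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(body + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.buffer.append(body)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # `with conn` commits on success and rolls back on an exception,
        # so the batch is written entirely or not at all.
        with self.conn:
            self.conn.executemany("INSERT INTO messages (body) VALUES (?)",
                                  [(b,) for b in self.buffer])
        self.buffer.clear()
        # The batch is durable now, so the log can be reset.
        open(self.log_path, "w", encoding="utf-8").close()
```

If the insert raises, the buffer and the log are left untouched, so the same block can be re-issued or inspected. After a crash, any lines still in the log file are the messages that never reached the database.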
There is an easy algorithm to adjust your application's behaviour to the message rate:
When you have just written a message, check if there is any new message.
If yes, write that message too, and repeat.
Only when you have run out of immediately available messages, commit the transaction and close the database.
In that manner, every message will be saved immediately, unless the message rate becomes too high for that.
Note: closing the database will not increase data durability (that's what transaction commit is for), it will just free up a little bit of memory.
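The steps above can be sketched with `sqlite3` and a standard `queue.Queue`. The function name and the `messages` table are assumptions, and the connection is kept open here rather than closed after each commit, since, as noted, closing does not affect durability.

```python
import queue
import sqlite3


def write_available(conn: sqlite3.Connection,
                    messages: "queue.Queue[str]",
                    first: str) -> int:
    """Write `first` plus every message already waiting, then commit once.

    Returns the number of rows written in this transaction.
    """
    written = 0
    body = first
    while True:
        conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        written += 1
        try:
            body = messages.get_nowait()  # is another message ready right now?
        except queue.Empty:
            break  # nothing immediately available: stop and commit
    conn.commit()
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (body TEXT NOT NULL)")
    inbox: "queue.Queue[str]" = queue.Queue()
    for text in ("second", "third"):
        inbox.put(text)
    # In a long-running app, `first` would come from a blocking inbox.get().
    print(write_available(conn, inbox, "first"))
```

At low message rates each call commits a single row, so every message is saved immediately; under a burst the loop naturally groups whatever has piled up into one transaction.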