In my project I have to update a field's value concurrently from multiple devices and check whether the count has reached 0. Each time I decrement the field by 1, I have to make an entry in another table. When I update concurrently from 2 devices, the field decrements by only 1 instead of 2. And when I tried using a transaction, multiple entries were added to the other table.
If multiple clients need to modify the same documents concurrently, you'll need to use a transaction. In that transaction you get() the documents and then write the updates back to the database.
Once you do this in a transaction, the Firestore servers will detect the conflicting updates and force the second client to retry, at which point it reads the correct value (0, IIRC) and can then alert the user that there's no more stock.
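A minimal sketch of that pattern with the Node.js Admin SDK (the items/stock/stockEvents names are just placeholders for your own schema):
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";

initializeApp();
const db = getFirestore();

// Decrement the counter and record the ledger entry in one transaction.
// If two devices run this concurrently, Firestore retries the losing
// transaction, so it re-reads the already-decremented value.
async function claimOneUnit(itemId: string, deviceId: string): Promise<boolean> {
  return db.runTransaction(async (tx) => {
    const itemRef = db.collection("items").doc(itemId);
    const snap = await tx.get(itemRef);
    const stock = snap.get("stock") ?? 0;

    if (stock <= 0) {
      return false; // nothing left; surface "out of stock" to the user
    }

    tx.update(itemRef, { stock: stock - 1 });
    // The ledger entry is created inside the same transaction, so it is
    // written exactly once per successful decrement.
    tx.create(db.collection("stockEvents").doc(), {
      itemId,
      deviceId,
      createdAt: new Date(),
    });
    return true;
  });
}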
I have diagnostic data for devices being written to Cosmos; some devices write thousands of messages a day while others write just a few. I always want there to be diagnostics data regardless of when it was added, but I don't want to retain all of it forever. Adding a TTL of 90 days works fine for the very active devices: they will always have diagnostics data because they send it in daily. The less active devices, however, will lose their diagnostics logs after the TTL.
Is there a way to use the TTL feature of Cosmos DB but always keep at least n records?
I am looking for something like keeping only records from the last 90 days (TTL) but always keeping at least 100 documents, regardless of the last-updated timestamp.
There are no built-in quantity-based filters for TTL: you either have collection-based TTL, or collection+item TTL (item-level TTL overriding the default set on the collection).
You'd need to create something yourself, where you'd mark eligible documents for deletion (based on time period, perhaps?), and then run a periodic cleanup routine based on item counts, age of delete-eligible items, etc.
Alternatively, you could treat low-volume and high-volume devices differently, with high-volume device telemetry written to TTL-based collections, and low-volume device telemetry written to no-TTL collections (or something like that)...
tl;dr this isn't something built-in.
Short answer: there's no such built-in functionality.
You could create your own Function App running on a schedule trigger that fires a query like this:
SELECT *
FROM c
WHERE NOT IS_DEFINED(c.ttl) --only update items that have no ttl
ORDER BY c._ts DESC
OFFSET 100 LIMIT 2147483647 --skip the newest 100
and then updates the items it returns by setting a ttl on them. That way you're assured that the newest 100 records remain available (assuming you don't have another process deleting documents), while the other items are cleaned up periodically. Keep in mind that the update resets the ttl clock, since _ts will be updated.
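A rough sketch of that cleanup with the @azure/cosmos SDK; the database/container names, the deviceId partition key, and the 90-day ttl are assumptions, and per-item ttl only takes effect when TTL is enabled on the container. Hook this into a timer-triggered Function:
import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING!);
const container = client.database("telemetry").container("diagnostics");

// Gives every document except the newest 100 a 90-day ttl so Cosmos DB
// can expire it; documents that already have a ttl are left alone.
export async function markOldItemsForExpiry(): Promise<void> {
  const query =
    "SELECT * FROM c WHERE NOT IS_DEFINED(c.ttl) " +
    "ORDER BY c._ts DESC OFFSET 100 LIMIT 2147483647";

  const { resources } = await container.items.query(query).fetchAll();

  for (const item of resources) {
    // Replacing the item bumps _ts, so the 90-day clock starts from now.
    await container
      .item(item.id, item.deviceId) // assumes /deviceId is the partition key
      .replace({ ...item, ttl: 60 * 60 * 24 * 90 });
  }
}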
I am building a Flutter app with a Firebase backend in which I need multiple clients to update the same document simultaneously, but the fields that each client updates are different. Would there be any benefit to using a transaction for the update rather than updating the document normally?
You would want to use a transaction if a client is updating a field using the contents of the other fields for reference. So if field2 needs to be computed consistently based on the contents of field1, you would need a transaction to ensure that there is no race condition between the update of either field.
If the fields are entirely logically separate from each other and there is no race condition between their updates (they can all change independently of each other), then it should be safe to update each of them without a transaction. But bear in mind that each document has a sustained maximum write rate of 1 per second, so if you have a lot of concurrent updates coming in, some of those updates could fail. In that case, you'd want each field to exist in its own document.
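In JavaScript/TypeScript terms (the Flutter API is analogous), a sketch of the difference; the games document and field names are made up:
import { doc, getFirestore, runTransaction, updateDoc } from "firebase/firestore";

// Assumes initializeApp(...) has already been called elsewhere.
const db = getFirestore();
const gameRef = doc(db, "games", "game42"); // hypothetical document

// Independent fields: a plain update only touches the fields you name,
// so clients writing different fields don't overwrite each other.
async function setPlayerScore(score: number) {
  await updateDoc(gameRef, { playerAScore: score });
}

// Dependent fields: field2 is derived from field1, so the read and the
// write must happen atomically inside a transaction.
async function recomputeField2() {
  await runTransaction(db, async (tx) => {
    const snap = await tx.get(gameRef);
    const field1 = snap.get("field1") ?? 0;
    tx.update(gameRef, { field2: field1 * 2 });
  });
}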
Can I read multiple documents from Firestore in one connection, similar to transactions and batched writes but without writing?
For example:
I logged in via the Google button and the player name is Player1.
First connection: I want to read the 10 players with the most diamonds.
Second connection: I want to read Player1's diamonds.
Can I combine the first and second connections into one connection?
I want this because if the first connection fails, the second connection should be cancelled, and if the first connection succeeds but the second connection fails, the first should be cancelled, etc. I hope you understand what I mean.
Transactions and batched writes are write operations. There's nothing similar for read operations, nor should it really be needed.
If you want subsequent reads to fail when an earlier one fails, you can either:
Start each read sequentially, only after the previous read has completed, or
Start all reads at the same time, check the status of each read as it completes, and only continue once all read operations have completed successfully.
From reading your question it sounds like you want to do a client-side join of the player info for the top 10 ranked players. This typically leads to 11 reads:
The query to get the top 10 scores, which includes the players' UIDs.
The 10 individual document reads to get each top player's profile.
In this case you could for example keep a counter to track how many of the player profiles you've already successfully read. Once that counter reaches 10, you know you have all the player profiles, and can start any subsequent operation you might have. If you want to fail the entire operation once any player profile fails to load, you'll want a separate flag for that too.
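A sketch of that client-side join with the web SDK; the scores/players collection names and the diamonds/uid fields are assumptions:
import {
  collection, doc, getDoc, getDocs, getFirestore, limit, orderBy, query,
} from "firebase/firestore";

// Assumes initializeApp(...) has already been called elsewhere.
const db = getFirestore();

// Reads the top-10 score docs, then loads each player's profile.
// Promise.all rejects as soon as any single read fails, which gives the
// "if one part fails, treat the whole operation as failed" behaviour.
async function loadLeaderboard() {
  const top = await getDocs(
    query(collection(db, "scores"), orderBy("diamonds", "desc"), limit(10))
  );

  const profiles = await Promise.all(
    top.docs.map((d) => getDoc(doc(db, "players", d.get("uid"))))
  );
  return profiles.map((p) => p.data());
}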
Background:
I have a Firestore database with a users collection. Each user is a document which contains a contacts collection. Each document in that collection is a single contact.
Since Firestore does not have a "count" feature for all documents, and since I don't want to read all contacts just to count how many a user has, I trigger a Cloud Function when a contact is added or deleted that increments or decrements numberOfContacts in the user document. In order to make the function idempotent, it has to do multiple reads and writes to prevent incrementing the counter more than once if it's called more than once for the same document. This means I need a separate collection of eventIDs I've already handled so I don't process the same event twice, which in turn requires running another function once a month to go through each user and delete all those documents (a lot of reads and some writes).
Issue
Now the challenge is that the user can import his/her contacts. So if a user imports 10,000 contacts, this function will get fired 10,000 times in quick succession.
How do I prevent that?
Current approach:
Right now I am adding a field in the contact document that indicates the addition was part of an import. This tells the Cloud Function not to increment the counter.
I perform the operation from the client 499 contacts at a time in a transaction, which also increments the count as the 500th write. That way the count stays consistent if something fails halfway.
Is this really the best way? It seems so complicated to just have a count of contacts available. I end up doing multiple reads and writes each time a single contact changes plus I have to run a cleanup function every month.
I keep thinking there's gotta be a simpler way.
For those who are curious, it seems the approach I am taking is the best one.
I add a field in the contact document that indicates that the addition was part of an import (bulkAdd = true). This gets the cloud function to not increment.
I have another Cloud Function add the contacts 200 at a time (I also write a FieldValue timestamp, which counts as another write, so that's 400 writes). I do this in a batch, and the 401st write in the batch is the counter increment. That way I can bulk-import contacts without bombarding a single document with writes.
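For illustration, a sketch of one such chunk with the Admin SDK; the contact shape, the bulkAdd flag, and the 200-per-chunk size follow the description above, while the rest of the names are assumptions:
import { initializeApp } from "firebase-admin/app";
import { FieldValue, getFirestore } from "firebase-admin/firestore";

initializeApp();
const db = getFirestore();

// Writes up to 200 imported contacts as one batch. Each contact is marked
// bulkAdd so the onCreate trigger skips its own increment, and the final
// operation in the batch bumps numberOfContacts, so the counter only moves
// if the whole chunk commits.
async function importChunk(uid: string, contacts: Record<string, unknown>[]) {
  const chunk = contacts.slice(0, 200);
  const batch = db.batch();
  const userRef = db.collection("users").doc(uid);

  for (const contact of chunk) {
    batch.set(userRef.collection("contacts").doc(), {
      ...contact,
      bulkAdd: true,
      createdAt: FieldValue.serverTimestamp(),
    });
  }

  batch.update(userRef, {
    numberOfContacts: FieldValue.increment(chunk.length),
  });

  await batch.commit();
}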
Problem with increments
There are duplicate-safe operations like FieldValue.arrayUnion() & FieldValue.arrayRemove(). I wrote a bit about that approach here: Firebase function document.create and user.create triggers firing multiple times
With this approach, your user document contains a special array field of contact IDs. Once a contact is added to the subcollection and your function is triggered, the contact's id is written to this field. If the function is triggered two or more times for one contact, only one instance of the id ends up in the master user doc. The actual count can then be fetched on the client, or computed by one more function triggered on updates to the user doc. This is a bit simpler than keeping eventIDs.
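A sketch of such a trigger using the gen-1 Cloud Functions API; the contactIds field and the users/{uid}/contacts path are assumptions:
import { initializeApp } from "firebase-admin/app";
import { FieldValue, getFirestore } from "firebase-admin/firestore";
import * as functions from "firebase-functions/v1";

initializeApp();
const db = getFirestore();

// Even if onCreate fires more than once for the same contact, arrayUnion
// stores each contact id at most once, so the array stays correct and the
// count can be derived from its length.
export const onContactCreated = functions.firestore
  .document("users/{uid}/contacts/{contactId}")
  .onCreate((_snap, context) =>
    db.collection("users").doc(context.params.uid).update({
      contactIds: FieldValue.arrayUnion(context.params.contactId),
    })
  );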
Problem with importing 10k+ contacts
This is a bit philosophical.
If I understand correctly, the problem is that a user performs 10k writes. Then these 10k writes trigger 10k functions, which perform an additional 10k writes to the master doc (and the same number of reads if they use the eventIDs documents)?
You could make a special subcollection just for importing multiple contacts into your DB. Instead of writing 10k docs, the client would create one big document containing all 10k contacts, which triggers a single Cloud Function. That function would read them all and make the necessary 10k contact writes plus 1 write to the master doc with all the arrayUnions. You would just need to think about how to keep those 10k resulting writes from incrementing anything (for example with a special metadata field like your bulkAdd).
This is just an opinion.
Assume we're using an AWS trigger on a DynamoDB table, and that trigger runs a Lambda function whose job is to update an entry in CloudSearch (to keep DynamoDB and CloudSearch in sync).
I'm not clear on how Lambda would always keep the data in sync with the data in DynamoDB. Consider the following flow:
1. The application updates a DynamoDB table's record A (say to A1).
2. Very shortly afterwards, the application updates the same table's same record A (to A2).
3. The trigger for step 1 causes its Lambda invocation to start executing.
4. The trigger for step 2 causes its Lambda invocation to start executing.
5. Step 4 completes first, so CloudSearch sees A2.
6. Now step 3 completes, so CloudSearch sees A1.
Lambda triggers are not guaranteed to start only after the previous invocation is complete (correct me if I'm wrong, and please provide a link).
As we can see, the thing goes out of sync.
The closest thing I can think of that would work is to use AWS Kinesis Streams, but only with a single shard (with its 1 MB/s ingestion limit). If that restriction is acceptable, the consumer application can be written so that records are processed strictly sequentially, i.e., the next record is processed only after the previous record has been put into CloudSearch. Assuming the aforementioned statement is true, how can the sync be kept correct if there is so much data being ingested into DynamoDB that more than one shard is needed in Kinesis?
You may achieve that using DynamoDB Streams:
DynamoDB Streams
"A DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table."
DynamoDB Streams guarantees the following:
Each stream record appears exactly once in the stream.
For each item that is modified in a DynamoDB table, the stream records appear in the same sequence as the actual modifications to the item.
Another cool thing about DynamoDB Streams: if your Lambda fails to handle a stream record (any error while indexing into CloudSearch, for example), the event will keep being retried and the other stream records will wait until your handler succeeds.
We use Streams to keep our Elasticsearch indexes in sync with our DynamoDB tables.
AWS Lambda FAQ link:
Q: How does AWS Lambda process data from Amazon Kinesis streams and Amazon DynamoDB Streams?
The Amazon Kinesis and DynamoDB Streams records sent to your AWS Lambda function are strictly serialized, per shard. This means that if you put two records in the same shard, Lambda guarantees that your Lambda function will be successfully invoked with the first record before it is invoked with the second record. If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record. The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.
So that means Lambda picks up the records in a shard one by one, in the order they appear in the shard, and does not process a new record until the previous record has been processed!
However, the other problem that remains is: what if changes to the same record end up in different shards? Thankfully, DynamoDB Streams ensures that a given primary key always resides in a single shard. (Essentially, I think, the primary key is what is hashed to determine the shard.) AWS Slide Link. See more from the AWS blog below:
The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.
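Putting the pieces together, a minimal sketch of such a stream handler in Node.js; indexIntoCloudSearch is a placeholder for your own indexing call:
import type { DynamoDBStreamEvent } from "aws-lambda";

// Records arrive per shard in order, so processing them sequentially and
// throwing on failure makes Lambda retry the batch before moving on,
// which keeps the search index in sync with the table.
export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    const newImage = record.dynamodb?.NewImage;
    if (record.eventName === "REMOVE" || !newImage) {
      continue; // handle deletes however your index requires
    }
    await indexIntoCloudSearch(newImage); // a throw here triggers a retry
  }
};

// Placeholder: post the document to the CloudSearch documents/batch endpoint.
async function indexIntoCloudSearch(image: unknown): Promise<void> {
  // ...
}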