The docs state that a transaction fails when:
The transaction read a document that was modified outside of the transaction.
I'm wondering if this also applies to read queries that might return more than one document (via .where('x', '==', 'y')). Does the transaction still fail if the read query would return more results when executed again sometime during the transaction?
To illustrate my question, let's say I have a collection of cars with the following schema:
{
  ownerId: string,
  make: string,
  horsepower: int,
  ...
}
Now I'm querying the cars of a certain owner in a transaction.get() call:
transaction.get(firestore.collection('cars').where('ownerId', '==', '123'))...
Let's say I receive a snapshot with two cars, and based on these cars I want to do some magic in the transaction. During the transaction, another car is added for this owner (so it's not part of the initial snapshot). Will the transaction fail in this case?
PS: I'm not looking for a different solution; the example above is fictitious. I just want to understand how the transaction behaves in this kind of case.
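For concreteness, the kind of transaction I have in mind looks roughly like this (a sketch assuming the Node.js Admin SDK, since only the server SDKs allow a query inside transaction.get; the horsepower update is just a placeholder for the "magic"):

// Sketch only: `firestore` is the Admin SDK Firestore instance from the snippet above.
const carsQuery = firestore.collection('cars').where('ownerId', '==', '123');

firestore.runTransaction(async (transaction) => {
  // e.g. the snapshot initially contains 2 cars
  const snapshot = await transaction.get(carsQuery);
  // ...do some magic based on the cars that were read...
  snapshot.forEach((doc) => {
    transaction.update(doc.ref, { horsepower: doc.get('horsepower') + 10 });
  });
});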
Let's say I receive a snapshot with two cars, and based on these cars I want to do some magic in the transaction. During the transaction, another car is added for this owner (so it's not part of the initial snapshot). Will the transaction fail in this case?
Definitely not. Since the newly added car is not part of the initial transaction read, the transaction won't fail. The added car is considered a newly created document, not a document that was modified.
They mention in the docs that:
The transaction read a document that was modified outside of the transaction.
This is because a transaction requires round-trip communication with the server in order to ensure that the code inside the transaction completes successfully. That's why a transaction will fail if a document it has read is modified by an operation other than the transaction it is already involved in.
This is a follow-up/elaboration to a previous question of mine.
In the case of a collection of documents containing a time range represented by two timestamp fields (start and end), how does one go about guaranteeing that two documents don't get added with overlapping time ranges?
Say I had the following JavaScript on form submit:
var bookingsRef = db.collection('bookings')
  .where('start', '<', booking.end)
  .where('end', '>', booking.start);

bookingsRef.get().then(snapshot => {
  // if a booking is found (hence there is an overlap), display error
  // if booking is not found (hence there is no overlap), create booking
});
Now if two people were to submit overlapping bookings at the same time, could transactions be used (either on the client or the server) to guarantee that between the get and add calls no other documents were created that would invalidate the original query's where clauses?
Or would my only option be some sort of security rule on create that checks for overlapping time ranges in other documents before allowing a new write (if this is at all possible)? One approach to guaranteeing document uniqueness via security rules seems to be encoding field values in the document ID, but I'm not entirely sure how exposing the start and end timestamp values in the ID would allow a rule to check for overlapping time ranges.
I think a transaction is the proper approach. According to the documentation:
..., if a transaction reads documents and another client modifies any of those documents, Cloud Firestore retries the transaction. This feature ensures that the transaction runs on up-to-date and consistent data.
This seems to answer your problem: all reads will be retried if anything changes in the meantime. I think the transaction mechanism exists for exactly that reason.
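A minimal sketch of how that could look, assuming the Node.js Admin SDK (queries inside transactions are only available in the server SDKs); the collection and field names are taken from your snippet:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function createBookingIfFree(booking) {
  return db.runTransaction(async (transaction) => {
    // Look for any existing booking that overlaps the requested range.
    // (Range filters on two different fields may need a composite index,
    // and older Firestore versions rejected such queries entirely.)
    const overlapQuery = db.collection('bookings')
      .where('start', '<', booking.end)
      .where('end', '>', booking.start);
    const snapshot = await transaction.get(overlapQuery);

    if (!snapshot.empty) {
      throw new Error('Requested time range overlaps an existing booking');
    }

    // No overlap found: create the booking inside the same transaction.
    const newRef = db.collection('bookings').doc();
    transaction.set(newRef, booking);
    return newRef.id;
  });
}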
I am implementing a one-to-one chat app using Firestore, in which there is a collection named chat such that each document of the collection is a different thread.
When the user opens the app, the screen should display all threads/conversations of that user, including those which have new messages (just like in WhatsApp). Obviously one method is to fetch all documents from the chat collection which are associated with this user.
However, that seems like a very costly operation: the user might have only a few updated threads (threads with new messages), but I have to fetch all the threads.
Is there an optimized and less costly method of doing the same, where only those threads are fetched which have new messages, or more precisely, threads which are not present in the user's device cache (either newly created or modified threads)?
Each document in the chat collection has these fields:
senderID: (id of the user who initiated the thread/conversation)
receiverID: (id of the other user in the conversation)
messages: [],
lastMsgTime: (timestamp of last message in this thread)
Currently to load all threads of a certain user, I am applying the following query:
const userID = firebase.auth().currentUser.uid
firebase.firestore().collection('chat').where('senderID', '==', userID)
firebase.firestore().collection('chat').where('receiverID', '==', userID)
Finally, I merge the docs returned by these two queries into an array to render in a FlatList.
In order to know whether a specific thread/document has been updated, the server will have to read that document, which is the charged operation that you're trying to avoid.
The only common way around this is to have the client track when it was last online, and then do a query for documents that were modified since that time. But if you want to show both existing and new documents, this would have to be a separate query, which means that it'd end up in a separate area of the cache. So in that case you'll have to set up your own offline storage on top of Firestore's, which is more work than I'm typically willing to do.
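If you do go that route, a rough sketch might look like this (it assumes the client persists its own lastSyncedAt timestamp, e.g. in AsyncStorage, and uses the lastMsgTime field from the schema above; composite indexes on (senderID, lastMsgTime) and (receiverID, lastMsgTime) would likely be required):

const userID = firebase.auth().currentUser.uid;

// Only fetch threads that changed since the last sync.
const sentThreads = firebase.firestore().collection('chat')
  .where('senderID', '==', userID)
  .where('lastMsgTime', '>', lastSyncedAt)
  .get();

const receivedThreads = firebase.firestore().collection('chat')
  .where('receiverID', '==', userID)
  .where('lastMsgTime', '>', lastSyncedAt)
  .get();

Promise.all([sentThreads, receivedThreads]).then(([sentSnap, receivedSnap]) => {
  const changedThreads = [...sentSnap.docs, ...receivedSnap.docs];
  // Merge changedThreads into the locally stored thread list and
  // advance lastSyncedAt to the newest lastMsgTime seen.
});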
I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table; let's call that table version. In a transaction I would create a new row in the version table and do the updates to the other table (including updating its version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality where you can control change feed, for specific operations such as updates only and not inserts is not yet available. You can add a “soft marker” on the item for updates and filter based on that when processing items in the change feed. Currently change feed doesn’t log deletes. Similar to the previous example, you can add a soft marker on the items that are being deleted, for example, you can add an attribute in the item called "deleted" and set it to "true" and set a TTL on the item, so that it can be automatically deleted. You can read the change feed for historic items, for example, items that were added five years ago. If the item is not deleted you can read the change feed as far as the origin of your container.
So the change feed by itself does not satisfy your requirements.
My idea:
Use an Azure Functions Cosmos DB trigger to collect all the operations in your specific Cosmos DB collection. Follow this document to configure the input of the Azure Function as Cosmos DB, then follow this document to configure the output as Azure Queue Storage.
Get the IDs of the changed items and send them into queue storage as messages. When you want to query the changed items, just read the messages from the queue, consume them at a specific time, and then clear the entire queue. No items will be missed.
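A minimal sketch of such a function, assuming a JavaScript Azure Function whose Cosmos DB trigger binding is named documents and whose Queue Storage output binding is named outputQueue (both binding names are placeholders configured in function.json as described in the linked documents):

// Triggered for every batch of inserts/updates in the monitored collection.
module.exports = async function (context, documents) {
  if (documents && documents.length > 0) {
    // Push the id of every changed item onto the queue, one message per item.
    context.bindings.outputQueue = documents.map(doc => doc.id);
  }
};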
With your approach, you can get the added/updated documents and save a reference value (the _ts and id fields) somewhere (like a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs and store checkpointing information (epoch number, sequence number, and offset value) in a blob; at any given time only one function can take a lease on that blob.
If you go with the change feed, you can create a listener (a Function or a job) to listen for all added/updated data in the collection and store those values in another collection; while saving the data you can add an identity/version field to every document. This approach may increase your Cosmos DB bill.
This is what the consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.
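For illustration, requesting strong consistency from the JavaScript SDK might look like this (a sketch assuming the @azure/cosmos package and an account whose default consistency level is Strong; a client can only request a level equal to or weaker than the account default, and the endpoint/key values are placeholders):

const { CosmosClient } = require('@azure/cosmos');

const client = new CosmosClient({
  endpoint: 'https://<your-account>.documents.azure.com:443/', // placeholder
  key: process.env.COSMOS_KEY,
  consistencyLevel: 'Strong',
});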
When creating a new transaction, can I reference an uncommitted state as an input state?
E.g.
Issue a new Painting state to the ledger with its color attribute set to "Blue".
Before all parties have signed (so the Painting state remains uncommitted), I issue a new transaction consuming the uncommitted state as input and a new Painting state as output (setting the color attribute to "Red").
Inputs are referenced in transactions using StateRefs, which are defined as follows:
data class StateRef(val txhash: SecureHash, val index: Int)
Where:
txhash is the hash of the transaction that created the input state
index is the index of the input state in the outputs of the transaction that created it
If you wanted to create a second transaction consuming an input state before you've committed the first transaction, you could proceed as follows:
Completely build the first transaction
Manually construct a StateRef based on the first transaction's ID
Pass this StateRef as an input to the second transaction
You could even build an entire transaction chain in this way.
However...
Until you commit the first transaction, you will see odd behaviour when you do stuff with the second transaction. For example:
If you try to verify the second transaction, your node will use its vault to convert all the input StateRefs into actual states. This will fail because you haven't stored the first transaction yet.
If you send the second transaction to a counterparty and they try to resolve its dependency graph, your node will try to retrieve the first transaction from its transaction storage and send it over. Again, this will fail because you haven't stored the first transaction yet.
So if you're building a transaction chain in this way without committing each state as you create it, you must be very careful about the order in which you later commit the transactions and what operations you do on the second, third, etc. transaction in the meantime.
My database uses redundant data to speed up fetches and minimise the number of documents that need to be read for certain queries. For example, I'd store the names of followed users in a map in a user's document so I don't have to read another document to retrieve the name of each followed user.
User: (Collection) {
  userID: (Document) {
    // user state
    name: ...,
    followingUsers: (Map) {
      followingUserID: nameOfUser,
      followingUserID: nameOfUser
    }
  }
}
If a user was to change their name, what is the best way to propagate these changes to all places with the redundant data?
Good question!
For starters, I'd recommend doing this kind of administrative task in a server SDK or cloud function, since you don't want a client to necessarily have the ability to start mucking with every single User doc.
The good news is that, once you start using the server SDKs, you can then put a query into a transaction. So let's say user_123 changes their name from "Jenny" to "Jen". Your transaction would look something like this in pseudo-code:
Start transaction
  transaction.get(usersRef.where("followingUsers.user_123", ">=", ""))
  Loop through the query results. Grab the doc_id from each doc and use that to start building out the writes in your transaction:
    transaction.update("/users/<doc_id>/", {"followingUsers.user_123": "Jen"})
  Also make sure you add transaction.update("/users/user_123", {"name": "Jen"})
End transaction
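If it helps, here is a rough runnable version of that pseudo-code, assuming the Node.js Admin SDK. I've used the collection name User from the schema in the question (the pseudo-code above says /users/, so adjust to whatever your collection is actually called); the ">=" trick simply matches every doc whose followingUsers map contains the given user ID.

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function renameUser(userId, newName) {
  const usersRef = db.collection('User');
  await db.runTransaction(async (transaction) => {
    // Find every user doc that follows userId.
    const followers = await transaction.get(
      usersRef.where(`followingUsers.${userId}`, '>=', '')
    );
    // Update the denormalized copy of the name in every follower's doc.
    followers.forEach((doc) => {
      transaction.update(doc.ref, { [`followingUsers.${userId}`]: newName });
    });
    // And update the user's own doc.
    transaction.update(usersRef.doc(userId), { name: newName });
  });
}

// Usage: renameUser('user_123', 'Jen');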
This general approach would also work on the client-side, but you just wouldn't be able to do this in a transaction. (You could still put all of these changes into a batch write, though.)