I am using a multi-region write (and read) Cosmos DB account. I have multiple change feed observers on the same collection, each updating a different search index (each with its own lease prefix). The default consistency level is set to Session.
Using SDK v2 (and change feed processor library v2):
new ChangeFeedProcessorBuilder()
.WithHostName(hostName)
.WithProcessorOptions(hostOptions)
.WithFeedCollection(collectionInfo)
.WithLeaseCollection(leaseInfo)
.WithObserverFactory(observerFactory)
.BuildAsync();
My logs show a situation where 2 out of 3 of those observers received an older version of the updated document:
time t1: document1 created
time t2 (days after t1): document1 updated
time t3:
observer1 received document1 (version at t2)
observer2 received document1 (version at t1)
observer3 received document1 (version at t1)
Question: does the change feed processor instance have an affinity to a particular region? In other words, is it possible that it reads the LSN from one region and pulls the documents from another? I was not able to find clear documentation on change feed observers and multi-region. Is it incorrect to assume that once the processor instance acquires the lease, it will observe changes from the same region consistently?
The region contacted is the default region (in the case of multi-master, the hub region, the first one in the Portal list), unless you specify a PreferredLocation in the collectionInfo you are using in WithFeedCollection.
DocumentCollectionInfo has a ConnectionPolicy property you can use to define your preference through the PreferredLocations (just like you can do with the normal SDK client). Reference: https://learn.microsoft.com/dotnet/api/microsoft.azure.documents.changefeedprocessor.documentcollectioninfo?view=azure-dotnet
All changes are pulled from that region; the LSN returned and the documents both come from that region (they arrive in the same change feed response).
Once an observer acquires a lease, it will read changes for the partition that lease is for, from the region defined in the configuration (default is Hub or whatever you define in PreferredLocations).
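For example, a minimal sketch of pinning the feed reads to a specific region (the region name, endpoint, key, and database/collection names below are placeholders; this assumes the v2 DocumentCollectionInfo and ConnectionPolicy types):
using System;
using Microsoft.Azure.Documents.ChangeFeedProcessor;
using Microsoft.Azure.Documents.Client;

// Sketch: pin the change feed reads to one region via PreferredLocations.
var feedConnectionPolicy = new ConnectionPolicy();
feedConnectionPolicy.PreferredLocations.Add("West US"); // placeholder region name

var collectionInfo = new DocumentCollectionInfo
{
    Uri = new Uri("https://myaccount.documents.azure.com:443/"), // placeholder endpoint
    MasterKey = "<primary-key>",                                 // placeholder key
    DatabaseName = "myDatabase",                                 // placeholder
    CollectionName = "myCollection",                             // placeholder
    ConnectionPolicy = feedConnectionPolicy
};
// Pass this collectionInfo to .WithFeedCollection(collectionInfo) as in your builder snippet.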
EDIT: Are you doing a ReadDocument in your observer after getting the changes? If so, with Session consistency you will need the SessionToken from the IChangeFeedObserverContext.FeedResponse (reference https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.changefeedprocessor.feedprocessing.ichangefeedobservercontext.feedresponse?view=azure-dotnet#Microsoft_Azure_Documents_ChangeFeedProcessor_FeedProcessing_IChangeFeedObserverContext_FeedResponse)
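If you are, a rough sketch of what forwarding that session token could look like in a v2 observer (the database, collection, and partition key names are placeholders):
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.ChangeFeedProcessor.FeedProcessing;
using Microsoft.Azure.Documents.Client;

public class SearchIndexObserver : IChangeFeedObserver
{
    private readonly DocumentClient client;

    public SearchIndexObserver(DocumentClient client) => this.client = client;

    public Task OpenAsync(IChangeFeedObserverContext context) => Task.CompletedTask;

    public Task CloseAsync(IChangeFeedObserverContext context, ChangeFeedObserverCloseReason reason) => Task.CompletedTask;

    public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> docs, CancellationToken cancellationToken)
    {
        foreach (var doc in docs)
        {
            // Forward the session token from the change feed response so the point read
            // is guaranteed to see at least the version delivered by the feed.
            var response = await client.ReadDocumentAsync(
                UriFactory.CreateDocumentUri("myDatabase", "myCollection", doc.Id), // placeholders
                new RequestOptions
                {
                    SessionToken = context.FeedResponse.SessionToken,
                    PartitionKey = new PartitionKey(doc.GetPropertyValue<string>("partitionKey")) // placeholder key path
                });
            // ... push response.Resource to this observer's search index ...
        }
    }
}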
This is a follow-up/elaboration to a previous question of mine.
In the case of a collection of documents containing a time range represented by two timestamp fields (start and end), how does one go about guaranteeing that two documents don't get added with overlapping time ranges?
Say I had the following JavaScript on form submit:
var bookingsRef = db.collection('bookings')
.where('start', '<', booking.end)
.where('end', '>', booking.start);
bookingsRef.get().then(snapshot => {
// if a booking is found (hence there is an overlap), display error
// if booking is not found (hence there is no overlap), create booking
});
Now, if two people were to submit overlapping bookings at the same time, could transactions be used (either on the client or the server) to guarantee that, between the get and add calls, no other documents were created that would invalidate the original query's where clauses?
Or would my option be to use some sort of create security rule that checks for time overlaps with other documents prior to allowing a new write (if this is at all possible)? One approach to guaranteeing document uniqueness via security rules seems to be exposing field values in the document ID, but I'm not entirely sure how exposing the start and end timestamp values in the ID would allow a rule to check for overlapping time ranges.
I think a transaction is the proper approach. According to the documentation:
..., if a transaction reads documents and another client
modifies any of those documents, Cloud Firestore retries the
transaction. This feature ensures that the transaction runs on
up-to-date and consistent data.
This seems to be the answer to your problem: the transaction will be retried if any of the documents it has read change in the meantime. I think the transaction mechanism exists exactly for this reason.
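As a rough sketch of that transactional check, here it is with the server-side .NET client (Google.Cloud.Firestore); the web client SDK only lets a transaction read individual document references, so a query-based check like this would run in a backend. The collection and field names follow the question; everything else is a placeholder:
using System.Threading.Tasks;
using Google.Cloud.Firestore;

public static class BookingService
{
    // Sketch: create the booking only if no overlapping booking is visible inside the
    // transaction. If any document read here is modified before commit, Firestore
    // retries the whole callback, as described in the quoted documentation.
    public static Task<bool> TryCreateBookingAsync(FirestoreDb db, Timestamp start, Timestamp end)
    {
        return db.RunTransactionAsync(async transaction =>
        {
            Query overlapping = db.Collection("bookings")
                .WhereLessThan("start", end)
                .WhereGreaterThan("end", start);

            QuerySnapshot snapshot = await transaction.GetSnapshotAsync(overlapping);
            if (snapshot.Count > 0)
            {
                return false; // overlap found: surface an error to the caller
            }

            DocumentReference newBooking = db.Collection("bookings").Document();
            transaction.Create(newBooking, new { start, end });
            return true;
        });
    }
}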
I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table; let's call that table version. In a transaction I would create a new row in the version table and do the updates to the other table (including updating its version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality
where you can control change feed, for specific operations such as
updates only and not inserts is not yet available. You can add a “soft
marker” on the item for updates and filter based on that when
processing items in the change feed. Currently change feed doesn’t log
deletes. Similar to the previous example, you can add a soft marker on
the items that are being deleted, for example, you can add an
attribute in the item called "deleted" and set it to "true" and set a
TTL on the item, so that it can be automatically deleted. You can read
the change feed for historic items, for example, items that were added
five years ago. If the item is not deleted you can read the change
feed as far as the origin of your container.
So the change feed by itself does not meet your requirements.
My idea:
Use an Azure Function with a Cosmos DB trigger to collect all the operations on your specific Cosmos DB collection. Follow this document to configure the Cosmos DB input of the Azure Function, then follow this document to configure the output as Azure Queue Storage.
Get the ids of the changed items and send them to queue storage as messages. When you want to query the changed items, just consume the messages from the queue at a specific time and then clear the entire queue. No items will be missed.
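A minimal sketch of such a function (the database, collection, connection setting, and queue names are placeholders; this assumes the CosmosDBTrigger and Queue bindings for C# functions):
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangeCollector
{
    // Sketch: forward the id of every created/updated document to a storage queue.
    [FunctionName("CollectChangedIds")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "myDatabase",                     // placeholder
            collectionName: "myCollection",                 // placeholder
            ConnectionStringSetting = "CosmosDBConnection", // placeholder app setting
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        [Queue("changed-ids", Connection = "AzureWebJobsStorage")] ICollector<string> queue, // placeholder queue
        ILogger log)
    {
        foreach (var doc in changes)
        {
            queue.Add(doc.Id); // later, consume these messages and clear the queue
        }
        log.LogInformation($"Forwarded {changes.Count} changed document ids");
    }
}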
With your approach, you can get the added/updated documents and save a reference value (the _ts and id fields) somewhere (like a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs and store checkpoint information (epoch number, sequence number, and offset) in a blob; at any time only one function can take a lease on that blob.
If you go with the change feed, you can create a listener (Function or WebJob) to listen to all adds/updates on the collection and store those values in another collection; while saving the data you can add an Identity/version field to every document. This approach may increase your Cosmos DB bill.
This is what the consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The
reads are guaranteed to return the most recent committed version of an
item. A client never sees an uncommitted or partial write. Users are
always guaranteed to read the latest committed write.
I am trying to do a simple fund transfer from one account to another using a simple state. Two flows have been created: one to issue the transfer request with the cash transfer, and one to just consume that transaction. My question is: is it possible to transfer and consume the state in one flow?
In my opinion the transaction must be consumed after the transfer, but I also want to show it in the UI.
Corda RPC queries seem to bring back information about unconsumed states only. If I consume the above transaction, is there a way to show the last consumed state?
CordaRPCOps allows you to query for unconsumed states, consumed states, or both. Here's a simple way of querying for all states:
val criteria = QueryCriteria.VaultQueryCriteria(Vault.StateStatus.ALL)
val results = proxy.vaultQueryBy<ContractState>(criteria)
To show the last consumed state, you could retrieve all the consumed states in descending order of consumption and grab the first one, as follows:
val criteria = QueryCriteria.VaultQueryCriteria(Vault.StateStatus.CONSUMED)
val sortColumn = Sort.SortColumn(SortAttribute.Standard(Sort.VaultStateAttribute.CONSUMED_TIME), Sort.Direction.DESC)
val sorting = Sort(listOf(sortColumn))
val results = proxy.vaultQueryBy<Obligation>(criteria, sorting = sorting)
val lastConsumedState = results.states[0]
From any flow, you need to call FinalityFlow in order to notarize and record the transaction in the individual parties' vaults. So I think after the issuance (or transfer), you need to call FinalityFlow first; only then can you use the issued state as input for the new transaction.
The notary is responsible for preventing a double spend of the input states of any transaction, so you cannot use a newly issued state (as input to a new transaction) until the notary is aware of it.
Thus I think in your case, you will need to call FinalityFlow twice, once after each transaction (i.e. issuance & consumption).
There were some changes in the API for 1.0 that removed isRelevant(). What are the best workarounds for this?
Given a use case: If there are 100 parties that want to see this queryable state and all updates pertaining to it (but read-only and doesn't need to sign), do I need to add them all to the Participants list? The "observer" role doesn't exist yet? Is there an alsoInformed or something similar for the use case of just seeing static reference data?
The general goal here is a shared linear queryable state where the issuer holds the master copy to change/update, with updates propagating to all parties that want to "subscribe" to those changes. I believe this might work with a broadcast to a "club", but I don't think clubs exist yet, and I'm not sure whether they would be dynamic groupings of the network map.
I'll go into a bit of background before answering... The concept of relevancy still exists in the platform. As you know, in Corda there are two data persistence stores: the storage service and the vault.
The storage service
The storage service is a key -> value store that persists data such as:
Serialised flow state machines
Attachments
Transactions
The storage service is great for storing large amounts of serialised data that can be indexed and retrieved by hash. However, it is awkward if one wishes to search for data within one of the stored objects. E.g. one cannot easily search for transaction output states of a specific type when using the transaction store. The approach would be to iterate through all transactions, deserialise them one by one, and filter by output type. It's cumbersome and not very efficient. This is why the vault and the concept of relevancy exists!
The vault
The vault exists to store state objects, as opposed to transactions. There is a master states table where the state reference, the transaction id that generated the output state, and some other metadata (such as whether the state is consumed or not) are stored. There's also a table for LinearStates and a table for OwnableStates. Also, if one wishes to add an ORM to their state, a database table is created for each type of state object reflecting the ORM definition. These properties can then be queried to pull out states from the vault that meet specific queries, e.g. "any obligation states over £1000 with Alice as the lender that have not yet been consumed". That's the power of the vault!
Relevancy
Now, it is the case that not all transactions a node receives produce states that are relevant to that node. An example would be a payment vs payment transaction where Alice sends dollars to Bob and Bob sends pounds to Alice. As Bob now owns the dollars Alice previously owned, those dollars are now not relevant for Alice. As such, Alice shouldn't record the output state representing those dollars as she does not hold the right and obligations to those dollars. What Alice does do is to mark the old dollar state as consumed, thus it will now not count towards her total dollars balance and cannot be used as an input in another transaction (as it has already been spent).
How relevancy works in Corda
So, when a node receives a new transaction, it intersects the public keys defined in the participants property of each output state with all the public keys that the VaultService is aware of. If the resultant set for a particular state is not empty, then the state is relevant for the node. Simple.
What this means is that if a node receives a transaction where their public keys are not listed in an output state's participants field, then they will not store that output state in the vault. However, they will store the transaction in the transaction store, and it can still be queried.
The concept of relevancy for OwnableStates is simple, one either owns it or they don't. The concept for LinearStates that represent multi-lateral agreements is more complex. In versions M14 and below, one could override the functionality of isRelevant in a LinearState, however in V1 this has been removed in favour of an easier approach which just compares the participants keys to the vault keys (as described above).
Implications of the V1 approach to relevancy
As the OP notes, in V1, there will be the concept of transaction observers, where nodes that were not participants of a state can still store the state in their vault and query it as a "third party" state. I.e. it cannot be consumed or contribute to balance totals but it can be queried. In the meantime, we will have to work around the absence of that feature and the options are:
For LinearStates, add all intended observers to the participants list. Then, add an additional property to the state object called something like consumers that just contains the list of participants that can consume this state in a valid transaction, i.e. sign a transaction containing it. The contract for that state will then compare those public keys in the commands to those in the consumers list. This way all the observers will still store the state in their vaults. The FinalityFlow will broadcast this transaction to all participants. You can use randomly generated public keys if you don't want the observers to be known to other participants.
For OwnableStates, like Cash, there can only be one identity in participants, the owner. So the approach would be to use the FinalityFlow to send the transaction to a set of observers, then those observers would have to get the output states directly from the transaction. Cumbersome but temporary as we are working on transaction observers at this moment: https://r3-cev.atlassian.net/browse/CORDA-663.
Just a strawman of what I understood, if this were to be expressed in code, i.e. using the obligation CorDapp example.
Obligation State
val consumers: List<AbstractParty> = listOf(lender, borrower)
override val participants: List<AbstractParty> get() = listOf(lender, borrower, extraActor)
Contract code verify
override fun verify(tx: LedgerTransaction) {
    val command = tx.commands.requireSingleCommand<Commands>()
    // Assumes the single output state of the transaction is the Obligation being issued
    val obligation = tx.outputsOfType<Obligation>().single()
    when (command.value) {
        is Obligation.Issue -> requireThat {
            "The signers should only be the consumers for the obligation" using
                (command.signers.toSet() == obligation.consumers.map { it.owningKey }.toSet())
        }
    }
}
Add a command specifying that the signers need only be the consumers during transaction creation:
val utx = TransactionBuilder(notary = notary)
.addOutputState(state, OBLIGATION_CONTRACT_ID)
.addCommand(Obligation.Issue(), state.consumers.map { it.owningKey })
.setTimeWindow(serviceHub.clock.instant(), 30.seconds)
In this way, the first transaction allows the extraActor to commit the state to the ledger without signing. In a future transaction proposal, the extraActor can query its state table and propose a change of lifecycle in the state using a different command, which this time may require all participants (if need be) to sign, i.e. an Obligation.DoSomethingExtra command with all participants signing: (command.signers.toSet() == obligation.participants.map { it.owningKey }.toSet()).
Imagine that there are two clients, client1 and client2, both writing the same key. This key has three replicas named A, B, and C. A receives client1's request first and then client2's, while B receives client2's request first and then client1's. Now A and B must be inconsistent with each other, and they cannot resolve the conflict even using vector clocks. Am I right?
If so, it seems that write conflicts can easily occur in Dynamo. Why are so many open source projects based on Dynamo's design?
If you're using DynamoDB and are worried about race conditions (which you should be if you're using Lambda), you can put conditions on putItem or updateItem and handle the case where the condition fails.
For example: during getItem the timestamp was 12345, so you add a condition that the timestamp must still equal 12345. If another process updates the item and changes the timestamp to 12346, your put/update will now fail. In Java, for example, you can catch ConditionalCheckFailedException, do another getItem, apply your changes on top, and then resubmit the put/update.
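For example, a rough sketch of that optimistic check with a condition expression, shown here with the AWS SDK for .NET (the Java SDK mentioned above exposes the same ConditionExpression / ConditionalCheckFailedException pattern); the table, key, and attribute names are placeholders:
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class OptimisticUpdate
{
    // Sketch: only apply the update if the item still carries the timestamp we read earlier.
    public static async Task UpdateIfUnchangedAsync(IAmazonDynamoDB client, string id, long readTimestamp, long newTimestamp)
    {
        var request = new UpdateItemRequest
        {
            TableName = "myTable", // placeholder
            Key = new Dictionary<string, AttributeValue> { ["id"] = new AttributeValue { S = id } },
            UpdateExpression = "SET #ts = :new",
            ConditionExpression = "#ts = :expected",
            ExpressionAttributeNames = new Dictionary<string, string> { ["#ts"] = "timestamp" },
            ExpressionAttributeValues = new Dictionary<string, AttributeValue>
            {
                [":expected"] = new AttributeValue { N = readTimestamp.ToString() },
                [":new"] = new AttributeValue { N = newTimestamp.ToString() }
            }
        };

        try
        {
            await client.UpdateItemAsync(request);
        }
        catch (ConditionalCheckFailedException)
        {
            // Another writer won the race: get the item again, reapply the change, resubmit.
        }
    }
}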
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
For more information about PutItem, see Working with Items in the Amazon DynamoDB Developer Guide.
Parameters:
putItemRequest - Represents the input of a PutItem operation.
Returns:
Result of the PutItem operation returned by the service.
Throws:
ConditionalCheckFailedException - A condition specified in the operation could not be evaluated.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#putItem-com.amazonaws.services.dynamodbv2.model.PutItemRequest-
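A rough sketch of that attribute_not_exists guard (again with the AWS SDK for .NET; the table, key, and attribute names are placeholders):
using System.Collections.Generic;
using Amazon.DynamoDBv2.Model;

// Sketch: fail the put instead of silently overwriting an existing item with the same key.
var putRequest = new PutItemRequest
{
    TableName = "myTable", // placeholder
    Item = new Dictionary<string, AttributeValue>
    {
        ["id"] = new AttributeValue { S = "item-123" },  // placeholder partition key value
        ["payload"] = new AttributeValue { S = "..." }   // placeholder attribute
    },
    ConditionExpression = "attribute_not_exists(id)" // "id" is the table's partition key attribute
};
// client.PutItemAsync(putRequest) throws ConditionalCheckFailedException if the item already exists.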
I can't talk about HBase, but I can about Cassandra, which is inspired by Dynamo.
If that happens in Cassandra, the most recent write wins.
Cassandra uses coordinator nodes (any node can act as one) that receive the client requests and forward them to all replica nodes, so each request gets its own timestamp.
Imagine that Client2's request is the most recent one, arriving milliseconds after Client1's.
Replica A receives Client1, which is saved, and then Client2, which is saved over Client1 since Client2 is the most recent information for that key.
Replica B receives Client2, which is saved, and then Client1, which is rejected since it has an older timestamp.
Both replicas A and B have Client2, the most recent information, and therefore are consistent.
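A tiny, purely illustrative sketch of why both replicas converge under this last-write-wins rule (not driver code; the timestamps are made up):
using System;

// Each write carries the coordinator-assigned timestamp; a replica keeps whichever
// value has the higher timestamp, regardless of the order in which writes arrive.
public record Write(string Value, long Timestamp);

public static class LastWriteWinsDemo
{
    public static Write Reconcile(Write current, Write incoming) =>
        incoming.Timestamp > current.Timestamp ? incoming : current;

    public static void Main()
    {
        var initial = new Write("", 0);
        var fromClient1 = new Write("client1-value", 1000);
        var fromClient2 = new Write("client2-value", 1001); // milliseconds later

        // Replica A sees client1 first; replica B sees client2 first.
        var replicaA = Reconcile(Reconcile(initial, fromClient1), fromClient2);
        var replicaB = Reconcile(Reconcile(initial, fromClient2), fromClient1);

        Console.WriteLine(replicaA == replicaB); // True: both end up holding client2's write
    }
}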