Write conflict in Dynamo - amazon-dynamodb

Imagine that there are two clients, client1 and client2, both writing the same key. This key has three replicas named A, B, and C. A receives client1's request first and then client2's, while B receives client2's request first and then client1's. Now A and B are inconsistent with each other, and they cannot resolve the conflict even using vector clocks. Am I right?
If so, it seems that write conflicts occur easily in Dynamo. Why are so many open source projects based on Dynamo's design?

If you're using DynamoDB and are worried about race conditions (which you should be if you're using Lambda), you can attach condition expressions to putItem or updateItem; if the condition fails, the write is rejected.
For example, suppose that during getItem the item's timestamp was 12345. Add a condition that the timestamp must still equal 12345. If another process updates the item in the meantime and changes the timestamp to 12346, your put/update will now fail. In Java, for example, you can catch ConditionalCheckFailedException, do another getItem, apply your changes on top, and then resubmit the put/update.
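For illustration, a minimal sketch of that read-check-retry loop with the AWS SDK for Java v1 (the table name "MyTable" and the "pk" / "timestamp" attributes are assumptions, not from the original post):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

import java.util.HashMap;
import java.util.Map;

public class ConditionalWriteSketch {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> key = new HashMap<>();
        key.put("pk", new AttributeValue("item-1")); // hypothetical key

        while (true) {
            // Read the current item and remember the timestamp we saw.
            Map<String, AttributeValue> item =
                    ddb.getItem(new GetItemRequest("MyTable", key)).getItem();
            String seen = item.get("timestamp").getN();

            Map<String, AttributeValue> values = new HashMap<>();
            values.put(":seen", new AttributeValue().withN(seen));
            values.put(":next", new AttributeValue().withN(
                    Long.toString(Long.parseLong(seen) + 1)));
            try {
                // The write succeeds only if nobody changed the item since our read.
                ddb.updateItem(new UpdateItemRequest()
                        .withTableName("MyTable")
                        .withKey(key)
                        .withUpdateExpression("SET #ts = :next")
                        .withConditionExpression("#ts = :seen")
                        .addExpressionAttributeNamesEntry("#ts", "timestamp")
                        .withExpressionAttributeValues(values));
                return; // success
            } catch (ConditionalCheckFailedException e) {
                // Lost the race: loop, re-read, reapply our change, resubmit.
            }
        }
    }
}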
To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.
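And a hedged sketch of that attribute_not_exists guard, reusing the ddb client from the sketch above (table and attribute names are again assumptions):

// Refuse to overwrite an existing item with the same partition key.
Map<String, AttributeValue> newItem = new HashMap<>();
newItem.put("pk", new AttributeValue("item-1"));
newItem.put("name", new AttributeValue("first-value"));
try {
    ddb.putItem(new PutItemRequest()
            .withTableName("MyTable")
            .withItem(newItem)
            .withConditionExpression("attribute_not_exists(pk)"));
} catch (ConditionalCheckFailedException e) {
    // An item with this partition key already exists; nothing was overwritten.
}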
For more information about PutItem, see Working with Items in the Amazon DynamoDB Developer Guide. From the putItem javadoc:
Parameters: putItemRequest - Represents the input of a PutItem operation.
Returns: Result of the PutItem operation returned by the service.
Throws: ConditionalCheckFailedException - A condition specified in the operation could not be evaluated.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/AmazonDynamoDB.html#putItem-com.amazonaws.services.dynamodbv2.model.PutItemRequest-

I can't talk about HBase, but I can talk about Cassandra, which is inspired by Dynamo.
If that happens in Cassandra, the most recent write wins.
Cassandra uses coordinator nodes (any node can act as one) that receive the client requests and resend them to all replica nodes, so each request carries its own timestamp.
Imagine that Client2's request is the most recent, arriving milliseconds after Client1's.
Replica A receives Client1's write, which is saved, and then Client2's, which is saved over Client1's since Client2's is the most recent information for that key.
Replica B receives Client2's write, which is saved, and then Client1's, which is rejected since it has an older timestamp.
Both replicas A and B end up with Client2's value, the most recent information, and are therefore consistent.
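As a toy illustration of that last-write-wins rule (this is not Cassandra's actual code, just the shape of the idea):

// Toy last-write-wins register, roughly what each replica applies.
public class LwwDemo {
    static final class LwwRegister {
        private String value;
        private long timestamp = Long.MIN_VALUE;

        // Apply a write only if it is newer than what we already have.
        synchronized void apply(String newValue, long writeTimestamp) {
            if (writeTimestamp > timestamp) {
                value = newValue;
                timestamp = writeTimestamp;
            }
            // Older writes are ignored, so replicas converge regardless of
            // the order in which the writes arrive.
        }

        synchronized String get() { return value; }
    }

    public static void main(String[] args) {
        LwwRegister replicaA = new LwwRegister();
        LwwRegister replicaB = new LwwRegister();

        // Replica A sees Client1 then Client2; replica B sees the reverse.
        replicaA.apply("client1-value", 1000);
        replicaA.apply("client2-value", 1001);
        replicaB.apply("client2-value", 1001);
        replicaB.apply("client1-value", 1000);

        // Both end up with the newest write: client2-value.
        System.out.println(replicaA.get() + " / " + replicaB.get());
    }
}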

Related

Change feed observer on multi-region write

I am using a multi-region write (and read) Cosmos DB account. I have multiple change feed observers on the same collection, each updating a different search index (each with its own lease prefix). The default consistency level is set to Session.
Using SDK v2 (and change feed processor library v2):
var processor = await new ChangeFeedProcessorBuilder()
    .WithHostName(hostName)
    .WithProcessorOptions(hostOptions)
    .WithFeedCollection(collectionInfo)
    .WithLeaseCollection(leaseInfo)
    .WithObserverFactory(observerFactory)
    .BuildAsync();
My logs show a situation where 2 out of 3 of those observers received an older version of the updated document:
time t1: document1 created
time t2 (days after t1): document1 updated
time t3:
observer1 received document1 (version at t2)
observer2 received document1 (version at t1)
observer3 received document1 (version at t1)
Question: Does the change feed processor instance have an affinity to a particular region? In other words, is it possible that it reads the LSN from one region and pulls the documents from another? I was not able to find clear documentation on change feed observers and multi-region. Is it incorrect to assume that once the processor instance acquires the lease, it will consistently observe changes from the same region?
The region contacted is the default region (in the case of multi-master, the Hub region, the first one in the Portal list), unless you specify a PreferredLocation in the collectionInfo you are passing to WithFeedCollection.
DocumentCollectionInfo has a ConnectionPolicy property you can use to define your preference through the PreferredLocations (just like you can do with the normal SDK client). Reference: https://learn.microsoft.com/dotnet/api/microsoft.azure.documents.changefeedprocessor.documentcollectioninfo?view=azure-dotnet
All changes are pulled from that region; the LSN returned and the documents are from that region (they come in the same Change Feed response).
Once an observer acquires a lease, it will read changes for the partition that lease is for, from the region defined in the configuration (default is Hub or whatever you define in PreferredLocations).
EDIT: Are you doing a ReadDocument in your observer after getting the changes? If so, with Session consistency you will need the SessionToken from IChangeFeedObserverContext.FeedResponse (reference https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.changefeedprocessor.feedprocessing.ichangefeedobservercontext.feedresponse?view=azure-dotnet#Microsoft_Azure_Documents_ChangeFeedProcessor_FeedProcessing_IChangeFeedObserverContext_FeedResponse).

Is an HTTP request considered idempotent if it changes a record's last modified time?

Suppose that I have a table called persons and that a request to change any information about a person also updates that record's last_modified column. Would such a request still be considered idempotent? What I'm trying to find out is if auxiliary fields can be exempted from the criteria of idempotence.
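For concreteness, the update in question might look like this in JDBC form (the schema, driver URL, and values are illustrative assumptions): applying the same request twice leaves the person's data unchanged, but last_modified moves each time.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Illustrative only: the kind of update being asked about.
// The schema (persons.id, persons.name, persons.last_modified) is hypothetical.
public class UpdatePerson {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app");
             PreparedStatement ps = conn.prepareStatement(
                     "UPDATE persons SET name = ?, last_modified = CURRENT_TIMESTAMP WHERE id = ?")) {
            ps.setString(1, "Alice");
            ps.setLong(2, 42L);
            ps.executeUpdate();
            // Repeating this request leaves name identical, but last_modified
            // moves each time, which is exactly the grey area being asked about.
        }
    }
}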
If any information is changed in the database after a request (a POST request, obviously; you would not alter a person record on a GET request), then it's not idempotent. By definition. Unless you only store stats (like logs).
Here it's not the last_modified column which is important, it's the "change any information about a person" part.
A GET request is idempotent: you can take any URI and put it in an <IMG> tag in a web page, and browsers will load it without asking; it must not alter anything in the database or in the session (destroying a session, for example, is not idempotent). An idempotent request can be prefetched, can run at any priority (no need to care about the order of several idempotent requests, since none of them can impact the others), etc.

Android SyncAdapter and concurrent write accesses

I am developing a RESTful Android app using SyncAdapter. I have watched some screencasts and read some tutorials, but they only address basic dynamics. Before starting to write the code, I would like to try and receive feedback from expert users about concurrency issues.
Step 1. The user U inserts a new entry E into the table T of the database. The entry has a column status = TO_SYNC.
Step 2. Before the entry E is synced, U decides to modify it. An activity A starts for modifying the column values of E.
Step 3. While the user is modifying the entry, the SyncAdapter starts and sends the entries with status == TO_SYNC to the server. For each entry, the SyncAdapter sets status = SYNCED once it receives a positive answer from the server.
Step 4. Let's say that a concurrent access to the same entry E causes a conflict:
The SyncAdapter reads E, sends E to the server
The activity A completes and sets E's status to TO_SYNC
The SyncAdapter receives the ok from the server and sets the status of E to SYNCED
At this point the new values of E will not be synced since the value TO_SYNC has been overwritten by the SyncAdapter.
My question is: how can I avoid such an issue without blocking the entire database with a begin/end transaction while syncing (which could take a long time to complete)? Should I fall back on a classical Java lock on single entries? Is there a more elegant way?
You can sync to the server based on a timestamp (call it LAST_UPDATED_TIME) rather than a simple flag, and along with that, in another column (call it HASH), you store the hash of the concatenated string of all the values in a row. With this you check the timestamps since the last sync, get the incremental data to be synced, sync it to the server (in a separate thread, of course), and when you get back the results you do the following steps:
Compare the hash of the data currently in the database row against what was synced.
Depending on the result of that comparison, two things can happen:
If the hash of the data in the db row is equal to the value that was synced, you just update the LAST_UPDATED_TIME field with the time of sync.
If the hash of the data in the db row is not equal to the value that was synced, you immediately sync the row again (you can obviously optimize this even more; see the sketch after this list).
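A minimal sketch of that completion check on Android (a method fragment; loadRow, rowHash, requestSync, the "entries" table, and the column names are hypothetical helpers, not from the original answer):

import android.content.ContentValues;
import android.database.sqlite.SQLiteDatabase;

// Sketch of the post-sync check described above.
void onSyncCompleted(SQLiteDatabase db, long rowId, String hashAtSyncTime, long syncTimeMillis) {
    ContentValues row = loadRow(db, rowId);   // current state of the row (hypothetical helper)
    String currentHash = rowHash(row);        // hash of the concatenated values (hypothetical helper)

    if (currentHash.equals(hashAtSyncTime)) {
        // Nothing changed while syncing: just record the sync time.
        ContentValues update = new ContentValues();
        update.put("LAST_UPDATED_TIME", syncTimeMillis);
        db.update("entries", update, "_id = ?", new String[]{ String.valueOf(rowId) });
    } else {
        // The user edited the row mid-sync: local data is newer than what the
        // server has, so schedule another sync for this row immediately.
        requestSync(rowId);
    }
}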
Another way, with a second column "updated_at":
The SyncAdapter reads E and sends E to the server.
The activity A completes, sets E's status to TO_SYNC and updated_at to NOW.
The SyncAdapter receives the OK from the server.
The SyncAdapter re-reads the entry as E_bis and compares E.updated_at with E_bis.updated_at; if they differ => request a new sync, else => set the status of E to SYNCED (see the sketch below).
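A sketch of that final compare step (Entry, loadEntry, markSynced, and requestSync are hypothetical helpers):

// Compare step from the list above, run when the server acknowledges the sync.
void onServerAck(Entry e) {                 // e = the snapshot that was sent
    Entry eBis = loadEntry(e.id);           // re-read the row (E_bis)
    if (eBis.updatedAt == e.updatedAt) {
        markSynced(e.id);                   // nothing changed mid-sync
    } else {
        requestSync(e.id);                  // row changed while syncing; go again
    }
}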

Modality work list - Which items are returned for a C-FIND request of a sequence?

My question is a really basic one. Consider querying a modality worklist to get some work items with a C-FIND query, using a sequence (SQ) as a Return Key attribute, for example (0040,0100) (Scheduled Procedure Step Sequence) and universal matching.
What should I expect in the SCP's C-FIND response? Or, better said, what should I expect to find with regard to the scheduled procedure step for a specific work item? All the mandatory attributes that the Modality Worklist Information Model declares as encapsulated in the sequence? Should I instead explicitly issue a C-FIND request for those keys I want the SCP to return in the response?
For example: if I want the SCP to return the Scheduled Procedure Step Start Time and Scheduled Procedure Step Start Date, do I need to issue a specific C-FIND request with those keys, or is querying for the Scheduled Procedure Step Sequence key enough to force the SCP to send all attributes related to the Scheduled Procedure Step itself?
Yes, you should include the Scheduled Procedure Step Start Time / Date tags inside the (0040,0100) sequence.
See also the Service Class Specifications (PS 3.4, K.6.1.2.2).
This does not guarantee that you will retrieve the information, because which information is returned depends on the Modality Worklist provider.
You could also request a DICOM Conformance Statement from the modality provider to know the necessary tags for request/retrieve.
As for Table K.6-1, you can consider it as showing only the requirements on the SCP side: which attributes the SCP is required to support as matching keys (i.e., query filters) and which additional attribute values it is required to return (i.e., Return Keys) for a successful match. It is up to the SCP's implementation to support matching against a required key, but you can always expect the SCP to use the values of the matching keys as the query filter.
Also note that the SCP is only required to return values for attributes that are present in the C-FIND request. One exception is sequence matching, where a universal-matching-like mechanism lets you pass a zero-length item to retrieve the entire sequence. So, as stated in PS 3.4 section C.2.2.2.6, you can just include an empty item (FFFE,E000) under the Scheduled Procedure Step Sequence (0040,0100), which has a VR of SQ, for universal matching.
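For illustration, here is how the query keys could be built with dcm4che 3 (the library choice is mine; the original answers do not name a toolkit):

import org.dcm4che3.data.Attributes;
import org.dcm4che3.data.Sequence;
import org.dcm4che3.data.Tag;
import org.dcm4che3.data.VR;

// Sketch: building MWL C-FIND query keys with dcm4che 3.
public class MwlKeys {
    public static Attributes buildKeys() {
        Attributes keys = new Attributes();
        keys.setNull(Tag.PatientName, VR.PN); // example top-level return key

        // Name the sequence attributes you want back explicitly:
        Sequence sps = keys.newSequence(Tag.ScheduledProcedureStepSequence, 1);
        Attributes item = new Attributes();
        item.setNull(Tag.ScheduledProcedureStepStartDate, VR.DA);
        item.setNull(Tag.ScheduledProcedureStepStartTime, VR.TM);
        sps.add(item);

        // Alternatively, for universal matching of the whole sequence,
        // send a single zero-length item instead:
        // keys.newSequence(Tag.ScheduledProcedureStepSequence, 1).add(new Attributes());
        return keys;
    }
}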

Transmit data to client on specific page, based on SQL Server column or row update

I want to achieve something specific using ASP.NET and SQL Server. Let's say, for example, that I have several pages, each one with its own identification (i.e. id=1, id=5). Furthermore, let's assume that for each one of those ids I have a row in the database.
So in short, what I want to achieve is: pushing database changes indirectly to specific clients on specific pages while taking advantage of web sockets (persistent connections).
For example:
Row 1:
id = 1
name = myname1
Row 2:
id = 2
name = myname2
What I want is that when a specific row, or even a specific column value, changes, it triggers an event that sends specific data ONLY to those clients that are visiting the page whose id was changed.
For example: if row 1's name column changed from 'name1' to 'name2', and the ID of the primary key is 5, I want all those who visit the page with id=5 to receive an event on the client side.
I want to avoid writing client code that continuously sends requests to a web service and queries that specific row by id to see whether it was updated or a specific column value was changed.
One solution I thought about is to keep a key/value pair in memory (i.e. memcache), where the key represents the id and the value is the datetime last updated. Then I can query the memory and, if the entry is, for example, [5, 05/11/2012 12:03:45], I can tell whether the data was updated by saving the last time I queried the memory on the client side and comparing the dates. If the client's datetime value is older than the one in the key/value entry in memory, then I would query the database again.
However, it's still a passive approach.
Let me draw how it should work:
Client and server have a persistent connection [can be done using ASP.NET 4.5 socket protocols].
The server knows how to differentiate between connections that come from different pages, i.e. those with different query strings, for example id=1, id=2, etc. One option I thought about is to create an array in memory that stores the connection ids for each query string id value. For example: {1: 2346,6767,87878, 2: 876,8765,3455}. 1 and 2 are the pages' identifications (i.e. id=1, id=2), and the other values are the connection ids of the persistent connections that I get using ASP.NET 4.5.
A column value in the row with primary key id=5 has its 'count' column updated from value '1' to '2'.
A trigger calls a function and passes the id (let's assume value X) of the changed row. I would prefer to also be able to send specific columns' values (some columns of my choice) [this can be done using CLR triggers].
The function has a list of connections for the clients who are visiting the page whose id has value X (a number).
The server sends the client the column values or, if that's not possible, just sends true or false, notifying the client that a change to that row has taken place.
Solved until now:
1] Can be done using ASP.NET 4.5 socket protocols.
4] Using CLR triggers I can have a function that receives the column data and id of a specific row that was altered.
I am developing my app using ASP.NET 4.5.
Thanks
SQL Server Service Broker can accomplish a good portion of your requirements.
Service Broker allows for async messaging in SQL Server. Since it's asynchronous, let's split the functionality into 2 parts.
The first part is a trigger on the table that writes a message to the Service Broker queue. This is just straight T-SQL, and fairly straightforward. The payload of the message is anything you can convert to varbinary(max): it could be XML, a varchar(100) that contains comma-separated values, or some other representation.
The second part is the handling of the message. You issue a Transact-SQL RECEIVE statement to get the next message from the queue. This statement blocks until something arrives. A queue can have multiple conversations, so each client gets their own notifications.
Conceptually, it could work like this (the client is ASP.NET code):
Client opens a Service Broker conversation.
Client sends a message which says "I'm interested in Page=3".
Client does a RECEIVE, which blocks indefinitely.
An UPDATE changes data for Page=3.
A trigger on the table sends a message to every conversation that is interested in Page=3.
Client receives the message and sends the updated data to the web browser.
No CLR required, no periodic polling of the database.
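As a rough illustration of the blocking read, here is a JDBC sketch that waits on a Service Broker queue (the queue name "PageChangeQueue", the connection string, and the payload format are assumptions; the embedded T-SQL mirrors the RECEIVE described above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: blocking on a Service Broker queue from a server-side worker.
public class PageChangeListener {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=MyDb;integratedSecurity=true");
             Statement stmt = conn.createStatement()) {
            // WAITFOR(RECEIVE ...) blocks until a message arrives or 60s pass.
            ResultSet rs = stmt.executeQuery(
                    "WAITFOR (RECEIVE TOP(1) " +
                    "  CAST(message_body AS NVARCHAR(MAX)) AS body " +
                    "  FROM PageChangeQueue), TIMEOUT 60000");
            if (rs.next()) {
                String payload = rs.getString("body");
                // payload identifies the changed row/page (e.g. "page=3");
                // here you would push it to the websocket clients for that page.
                System.out.println("change notification: " + payload);
            }
        }
    }
}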
