I have a transaction system in my web service. Basically you can send (fake) currency back and forth between accounts.
This requires a strongly consistent write model. Here is the scenario I'm worried about: A has $0, B has $0, and C has $100. C sends A $100, and C's balance is currently being updated. Before the update has propagated across all DynamoDB nodes, C sends $100 to B, since on another instance his balance is still $100. Now C has a balance of -$100 and A and B both have $100. This would be an invalid state.
Since this scenario would require both consistent reads and writes (to check and update balances), are DynamoDB writes suited for this?
I'm not a DynamoDB expert, but I'm almost certain you can pull this off.
To make this work, make sure to use the ConditionExpression feature on the UpdateItem request. That way you can protect against what you're talking about.
Reads in DynamoDB are "eventually consistent", so it is possible, as you stated, that a reader could think that "C has $100" just after you've changed C's balance. However, I'm pretty sure using the condition expression you can fully guarantee that you never get into this bad state. You can structure your update request as a totally atomic operation that doesn't execute unless a condition is met: "Decrement C's balance by $100 only if C's balance >= $100". DynamoDB can execute that update atomically for you, and thus prevent the issue.
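As a rough illustration of that conditional decrement, a minimal sketch with boto3 might look like the following; the Accounts table, account_id key, and balance attribute are just assumptions made up for this example:

```python
# A minimal sketch with boto3. The "Accounts" table, "account_id" key, and
# "balance" attribute are assumptions made up for this example.
import boto3
from botocore.exceptions import ClientError

accounts = boto3.resource("dynamodb").Table("Accounts")

def debit(account_id, amount):
    """Atomically subtract `amount`, but only if the balance covers it."""
    try:
        accounts.update_item(
            Key={"account_id": account_id},
            UpdateExpression="SET balance = balance - :amt",
            ConditionExpression="balance >= :amt",
            ExpressionAttributeValues={":amt": amount},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # insufficient funds; nothing was written
        raise
```

If the condition check fails, the whole update is rejected, so the balance can never go negative.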
P.S. check out this talk. If I remember correctly the speaker covers some topics adjacent to this.
The Firestore docs say that both transactions and batched writes are atomic operations: either all changes are written or nothing is changed.
This question is about whether the changes made by an atomic operation in Firestore can be partially observed, or whether the all-or-nothing guarantee applies to readers too.
Example:
Let's say that we have a Firestore database with at least two documents, X and Y.
Let's also say that there are at least two clients (A and B) connected to this database.
At some point client A executes a batched write that updates both documents X and Y.
Later, client B reads document X and observes the change that client A made.
Now, if client B reads document Y too, is there a guarantee that the change made by A (in the same batched write operation) will be observed?
(Assuming that no other changes were made to those documents.)
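For concreteness, client A's batched write might look roughly like this with the Python client; the document paths and field values are made up for this example:

```python
# A rough sketch of client A's batched write using google-cloud-firestore.
# The paths "docs/X" and "docs/Y" and the field values are made up.
from google.cloud import firestore

db = firestore.Client()

batch = db.batch()
batch.update(db.document("docs/X"), {"state": "changed-together"})
batch.update(db.document("docs/Y"), {"state": "changed-together"})
batch.commit()  # atomic: either both documents are updated or neither is
```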
I've tested it and I've never detected any inconsistencies. However, testing alone can't settle this; it comes down to the level of consistency provided by Firestore under all circumstances (high write frequency, large data sets, failover, etc.).
It might be the case that Firestore is allowed (for a limited amount of time) to expose the change of document X to client B but still not expose the change of document Y. Both changes will eventually be exposed.
The question is: will they be exposed as an atomic operation, or is this atomicity only provided for the write?
I've received an excellent response from Gil Gilbert in the Firebase Google Group.
In short: Firestore does guarantee that reads are consistent too. There are no partial observations of the kind I was worried about.
However, Gil mentions two cases where a client could observe this kind of inconsistency anyway, due to offline caching and session handling.
Please refer to Gil's response (link above) for details.
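If you want to be explicit about reading both documents from a single consistent snapshot, a transactional read is one way to do it. A rough sketch with the Python client, reusing the made-up paths from the example above:

```python
# A rough sketch of reading X and Y from one consistent snapshot via a
# read-only transaction with the Python client (paths are made up).
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def read_x_and_y(transaction):
    # Both reads observe the same database snapshot.
    x = db.document("docs/X").get(transaction=transaction)
    y = db.document("docs/Y").get(transaction=transaction)
    return x.to_dict(), y.to_dict()

x_data, y_data = read_x_and_y(db.transaction())
```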
The data in our vault is manageable for now, but eventually we would accumulate a large volume, and it is not feasible to retain such a large volume for everyday transactions. We would want to periodically archive or warehouse the data so that query performance is maintained.
May I know if you have thought about handling large-scale datasets, and what would be your advice?
From the corda-dev mailing list:
Yep, we should do some design work around this. As you note it’s not a pressing issue right now but may become one in future.
Our current implementation is actually designed to keep data around even when it’s no longer ‘current’ on the ledger. The ORM mapped vault tables prefer to mark a row as obsolete rather than actually delete the data from the underlying database. Also, the transaction store has no concept of garbage collection or pruning so it never deletes data either. This has clear benefits from the perspective of understanding the history of the ledger and how it got into its current state, but it poses operational issues as well.
I think people will have different preferences here depending on their resources and jurisdiction. Let’s tackle the two data stores separately:
Making the relationally mapped tables delete data is easy, it’s just a policy change. Instead of marking a row as gone, we actually issue a SQL DELETE call.
The transaction store is trickier. Corda benefits from its blockless design here; in theory we can garbage collect old transactions. The devil is in the details however because for nodes that use SGX the tx store will be encrypted. Thus not only do we need to develop a parallel GC for the tx graph, but also, run it entirely inside the enclaves. A fun systems engineering problem.
If the concern is just query performance, one obvious move is to shift the tx store into a scalable K/V store like Cassandra, hosted BigTable etc. There’s no deep reason the tx store must be in the same RDBMS as the rest of the data, it’s just convenient to have a single database to backup. Scalable K/V stores don’t really lose query performance as the dataset grows, so, this is also a nice solution.
W.R.T. things like the GDPR, being able to delete data might help or it might be irrelevant. As with all things GDPR related nobody knows because the EU didn’t bother to define any answers - auditing a distributed ledger might count as a “legitimate need” for data, or it might not, depending on who the judge is on the day of the case.
It is at any rate only an issue when personal data is stored on ledger, which is not most use cases today.
Does a push key need to be written to the Firebase database to stay unique in the future, or is it considered taken as soon as it's computed client-side? For example, if client A's key hasn't been written to the database yet and another client (client B) generates a key, might that key turn out to be the same one client A has?
I hope it's clear what I'm asking.
It will most likely be a GUID, meaning a globally unique identifier that will never be generated again once it has been issued.
Have a look here: link
Just to attempt to reason about it a bit more: based on what I've read and understand, there doesn't seem to be a global mechanism to ensure push IDs are unique, other than the fact that the chance of one being computed twice is infinitesimal (see here), plus the fact that part of each ID is computed from the server timestamp. So basically, if two users create push IDs at the exact same server time (an infinitesimally small chance), there is still a chance of identical push IDs, but it is practically very close to 0, because the second part of each push ID is generated by a mechanism that gives roughly a 1 in 2^120 chance of repetition.
Combine the two segments and you have a practically zero chance of re-occurrence. However, mathematically it is not 0, for those paranoid fellows like myself ;P
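To make that two-segment structure concrete, here is a toy sketch of the general idea (a timestamp part plus a random part). This is not the actual Firebase push-ID algorithm, and the real one also handles several IDs generated within the same millisecond; it only illustrates why collisions are astronomically unlikely:

```python
# Toy illustration only: an ID made of a timestamp segment plus a random
# segment, mimicking the shape described above. NOT the real Firebase code.
import secrets
import time

# A 64-character alphabet encodes 6 bits per character.
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

def toy_push_id():
    # First segment: current time in milliseconds, encoded as 8 chars (48 bits).
    millis = int(time.time() * 1000)
    ts_chars = []
    for _ in range(8):
        ts_chars.append(ALPHABET[millis % 64])
        millis //= 64
    timestamp_part = "".join(reversed(ts_chars))

    # Second segment: 12 random chars (72 bits of entropy), so two clients
    # generating an ID in the very same millisecond still almost never collide.
    random_part = "".join(ALPHABET[secrets.randbelow(64)] for _ in range(12))

    # Because the timestamp comes first, IDs sort roughly by creation time.
    return timestamp_part + random_part
```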
Feel free to correct me if I'm wrong.
In DynamoDB, an atomic counter is a number that avoids race conditions:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html#WorkingWithItems.AtomicCounters
What makes a number atomic, and can I add/subtract from a float in non-unit values?
Currently I am doing: "SET balance = balance + :change"
(long version) I'm trying to use DynamoDB for user balances, so accuracy is paramount. The balance can be updated from multiple sources simultaneously. There is no need to pre-fetch the balance, we will never deny a transaction, I just care that when all the operations are finished we are left with the right balance. The operations can also be applied in any order, as long as the final result is correct.
From what I understand, this should be fine, but I haven't seen any atomic increment examples that change the value by anything other than "1".
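For reference, the kind of non-unit update I mean looks roughly like this with boto3; the table and attribute names are placeholders, and the amounts are Decimals because the DynamoDB SDK rejects Python floats:

```python
# Roughly the update described above, sketched with boto3. Table and attribute
# names are placeholders; this assumes the item and its balance already exist.
from decimal import Decimal

import boto3

balances = boto3.resource("dynamodb").Table("Balances")

def apply_change(user_id, change):
    """Atomically add `change` (positive or negative, any magnitude)."""
    balances.update_item(
        Key={"user_id": user_id},
        UpdateExpression="SET balance = balance + :change",
        ExpressionAttributeValues={":change": Decimal(change)},
    )

apply_change("user-123", "19.99")   # credit
apply_change("user-123", "-0.05")   # debit
```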
My hesitation arises because questions like Amazon DynamoDB Conditional Writes and Atomic Counters suggest using conditional writes for a similar situation, which sounds like a terrible idea. If I fetch the balance, change it, and do a conditional write, the write could fail if the value has changed in the meantime. However, the balance is the definition of business critical, and I'm always nervous when ignoring documentation.
-Additional Info-
All writes will originate from a Lambda function, and I expect pretty much 100% success rates for writes. However, I also maintain a history of all changes, and in the event the balance is in an "unknown" state (e.g. a network timeout), I could lock the table and recalculate the correct balance from the history.
This, I think, gives the best "normal" operation: 99.999% of the time, all updates will work with a single write. Failure could be very costly, as we would need to scan a client's entire history to recreate the balance, but in terms of trade-offs that seems a pretty safe bet.
The documentation for atomic counters is pretty clear, and in my opinion they will not be safe for your use case.
The problem you are solving is pretty common; AWS recommends using optimistic locking in such scenarios.
Please refer to the following AWS documentation:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html
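For readers not on the Java SDK, the pattern DynamoDBMapper automates with @DynamoDBVersionAttribute can be expressed directly as a version-checked conditional write. A rough boto3 sketch, where the table and attribute names are assumptions:

```python
# A rough sketch of the version-number pattern behind optimistic locking
# (DynamoDBMapper automates this via @DynamoDBVersionAttribute in Java).
# Table and attribute names are assumptions; the item is assumed to exist.
import boto3
from botocore.exceptions import ClientError

balances = boto3.resource("dynamodb").Table("Balances")

def set_balance_optimistically(user_id, new_balance):
    item = balances.get_item(Key={"user_id": user_id})["Item"]
    seen_version = item["version"]
    try:
        balances.update_item(
            Key={"user_id": user_id},
            UpdateExpression="SET balance = :bal, version = :next",
            # Succeeds only if nobody bumped the version since our read;
            # on failure the caller should re-read and retry.
            ConditionExpression="version = :seen",
            ExpressionAttributeValues={
                ":bal": new_balance,
                ":next": seen_version + 1,
                ":seen": seen_version,
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # lost the race; retry with a fresh read
        raise
```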
It appears that this concept is workable, per an AWS staff reply:
Often application writers will use a combination of both approaches, where you can have an atomic counter for real-time counting, and an audit table for perfect accounting later on.
https://forums.aws.amazon.com/thread.jspa?messageID=470243
There is also confirmation that the update will be atomic and that any update operation will be consistent:
All non batch requests you send to DynamoDB gets processed atomically - there is no interleaving involved of any sort between requests. Write requests are also consistent, so any write request will update the latest version of the item at the time the request is received.
https://forums.aws.amazon.com/thread.jspa?messageID=621994
In fact, every write to a given item is strongly consistent: in DynamoDB, all operations against a given item are serialized.
https://forums.aws.amazon.com/thread.jspa?messageID=324353
Using DynamoDB, suppose two independent clients try to write to the same item at the same time, using conditional writes, and both try to change the value that the condition references. Obviously, one of these writes is doomed to fail the condition check; that's OK.
Suppose during the write operation, something bad happens, and some of the various DynamoDB nodes fail or lose connectivity to each other. What happens to my write operations?
Will they both block or fail (sacrifice of "A" in the CAP theorem)? Will they both appear to succeed and only later it turns out that one of them actually was ignored (sacrifice of "C")? Or will they somehow both work correctly due to some magic (consistent hashing?) going on in the DynamoDB system?
It just seems like a really hard problem, but I can't find anything discussing the possibility of availability issues with conditional writes (unlike with, for instance, consistent reads, where the possibility of availability reduction is explicit).
There is a lack of clear information in this area, but we can make some pretty strong inferences. Many people assume that DynamoDB implements all of the ideas from its predecessor "Dynamo", but that doesn't seem to be the case, and it is important to keep the two separate in your mind. The original Dynamo system was carefully described by Amazon in the Dynamo paper. In thinking about this, it also helps to be familiar with the distributed databases based on the Dynamo ideas, like Riak and Cassandra; in particular Apache Cassandra, which provides a full range of trade-offs with respect to CAP.
By comparing DynamoDB, which is clearly distributed, to the options available in Cassandra, I think we can see where it is placed in the CAP space. According to Amazon, "DynamoDB maintains multiple copies of each item to ensure durability. When you receive an 'operation successful' response to your write request, DynamoDB ensures that the write is durable on multiple servers. However, it takes time for the update to propagate to all copies." (Data Read and Consistency Considerations). Also, DynamoDB does not require the application to do conflict resolution the way Dynamo does. Assuming they want to provide as much availability as possible, since they say they are writing to multiple servers, writes in DynamoDB are equivalent to Cassandra's QUORUM level. Also, it would seem DynamoDB does not support hinted handoff, because that can lead to situations requiring conflict resolution. For maximum availability, an inconsistent read would only have to be at the equivalent of Cassandra's ONE level. However, getting a consistent read given the quorum writes would require a QUORUM-level read, following the R + W > N rule for consistency: for example, with N = 3 replicas, W = 2 and R = 2 give R + W = 4 > 3, so a quorum read always overlaps the latest quorum write on at least one replica, whereas R = 1 gives no such guarantee. For more information on levels in Cassandra see About Data Consistency in Cassandra.
In summary, I conclude that:
Writes are "Quorum", so a majority of the nodes the row is replicated to must be available for the write to succeed
Inconsistent Reads are "One", so only a single node with the row need be available, but the data returned may be out of date
Consistent Reads are "Quorum", so a majority of the nodes the row is replicated to must be available for the read to succeed
So writes have the same availability as a consistent read.
To specifically address your question about two simultaneous conditional writes: one or both will fail depending on how many nodes are down, but there will never be an inconsistency. I think the availability of the writes really has nothing to do with whether they are conditional or not.