When do distributed transactions make sense in a service-oriented architecture?
Distributed transactions are used very often in SOA environments. If you have a composite service calling multiple services, the underlying service calls should be handled as a single transaction, and business processes should allow for a roll-back of their steps. If the underlying resources allow for it, you can use two-phase commit, but in many cases that is impossible. In those cases, compensating actions should be run on the services/resources invoked before the failed step - in other words, undo the succeeded steps in reverse order.
Imaginary example: a telecom company provisions a new VoIP product for a customer with six service calls:
query inventory to check the customer has the right equipment
configure customer equipment via mediation
update inventory with new configuration
set up the rating engine to count CDRs for the customer
set up billing software to charge the customer with the correct price plan
update CRM system with the result of the provisioning process
The six steps above should be parts of one transaction; e.g. if the inventory update fails, you may need to undo the customer equipment configuration, as in the sketch below.
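A minimal compensation ("saga") sketch of the above, in Java. The Step interface and ProvisioningSaga runner are hypothetical names for illustration, not any real telecom or orchestration API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Each step pairs an action with its compensating (undo) action.
interface Step {
    void execute() throws Exception;
    void compensate();
}

public final class ProvisioningSaga {
    // Runs the steps in order; on failure, compensates the completed steps
    // in reverse order, mirroring the inventory/mediation example above.
    public static void run(Step... steps) throws Exception {
        Deque<Step> completed = new ArrayDeque<>();
        try {
            for (Step step : steps) {
                step.execute();
                completed.push(step); // remember for potential rollback
            }
        } catch (Exception failure) {
            while (!completed.isEmpty()) {
                completed.pop().compensate(); // undo in reverse order
            }
            throw failure;
        }
    }
}
```

Each provisioning call (configure equipment, update inventory, and so on) would be wrapped in a Step whose compensate() knows how to undo it; if step 3 fails, steps 2 and 1 are compensated in that order.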
This is not really a case of when they make sense: a transaction (distributed or not) is implemented by necessity rather than arbitrary choice, to guarantee consistency. The alternative is to implement a reconciliation process to ensure eventual consistency.
In the classic bank example (money out of account A, into account B), transactional consistency is pretty much essential. In some inventory systems (check inventory, decrement inventory, sell to customer), it may be acceptable for stock levels to be roughly accurate rather than guaranteed, in which case ignoring a failure (decrement inventory, sale fails to complete) could be dealt with by reconciling later.
Been reading up on Corda (no actual use yet) and other DLTs to see if we could use it in a project. What I was wondering after reading all the Corda key concepts: what would be the way to share data with everyone, including nodes that are only added later?
I've been reading things like https://corda.net/blog/broadcasting-a-transaction-to-external-organisations/ and https://stackoverflow.com/a/53205303/1382108. But what if another node joins later?
As an example use case: say an organization wants to advertise goods it's selling to all nodes in a network, while price negotiations or actual sales can then happen in private. If a new node joins, what's the best/easiest way to make sure they are also aware of the advertised goods? With blockchain technologies I'd think they'd just replicate the chain that has these facts upon joining but how would it work in Corda? Any links or code samples would be greatly appreciated!
You can share transactions with other nodes that have not previously seen them, but this sort of functionality doesn't come out of the box and has to be implemented using flows by the CorDapp developer.
As the author of ONIXLabs, I've implemented much of this functionality generally to make it easier for CorDapp developers to consume. There are quite a few feature-rich APIs available on GitHub.
In order to publish a transaction, the ONIXLabs Corda Core API contains functions that extend FlowLogic<*> to provide generalised transaction publishing:
publishTransaction, called on the initiating side of the flow, specifies the transaction to be published, and to whom.
publishTransactionHandler, called on the initiated-by/handler side of the flow, specifies the transaction to be recorded and who it's from.
As an example of how these APIs are consumed, take a look at the ONIXLabs Corda Identity Framework, where we have a mechanism for publishing accounts from one node to a collection of counterparties.
PublishAccountFlow consumes the publishTransaction function.
PublishAccountFlowHandler consumes the publishTransactionHandler function.
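Those repositories are the authoritative reference for the ONIXLabs functions; for orientation, here is a rough sketch of what such a publish/record pair can look like using only the core Corda flow framework (SendTransactionFlow / ReceiveTransactionFlow, which ship with Corda 4). Class names here are invented for illustration:

```java
import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.flows.*;
import net.corda.core.identity.Party;
import net.corda.core.node.StatesToRecord;
import net.corda.core.transactions.SignedTransaction;

// Initiating side: send an already-finalised transaction to a counterparty.
@InitiatingFlow
class BroadcastTransactionFlow extends FlowLogic<Void> {
    private final SignedTransaction stx;
    private final Party counterparty;

    BroadcastTransactionFlow(SignedTransaction stx, Party counterparty) {
        this.stx = stx;
        this.counterparty = counterparty;
    }

    @Suspendable
    @Override
    public Void call() throws FlowException {
        subFlow(new SendTransactionFlow(initiateFlow(counterparty), stx));
        return null;
    }
}

// Handler side: receive and record the transaction even though this node
// is not a participant in any of its states.
@InitiatedBy(BroadcastTransactionFlow.class)
class BroadcastTransactionFlowHandler extends FlowLogic<Void> {
    private final FlowSession session;

    BroadcastTransactionFlowHandler(FlowSession session) {
        this.session = session;
    }

    @Suspendable
    @Override
    public Void call() throws FlowException {
        // ALL_VISIBLE is the crucial part: by default the vault only records
        // states in which the node is a participant.
        subFlow(new ReceiveTransactionFlow(session, true, StatesToRecord.ALL_VISIBLE));
        return null;
    }
}
```

For nodes that join later, the advertising node would simply run the initiating flow again against the new member (for example, on a membership event), since Corda has no global chain that new nodes replicate on joining.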
I have a .NET Core 5 microservices project. The client has a search module which queries data from many objects, and these objects live in different services:
The first microservice is for products.
This microservice has a table product {productId, productName}.
The second microservice is for vendors.
This microservice has a table account {vendorId, vendorName}.
The third microservice is for purchases.
This microservice has a table purchase {title, productId (this id comes from the product table in the first microservice), accountId (this id comes from the account table in the second microservice)}.
Now: the user wants to search for purchases where the product name is like "clothes" and the vendor name is like "x".
How can I do this query with the microservice pattern?
Taking a monolithic system and slapping service interfaces on top of each table will just make your life hell, as you will begin to implement the database logic yourself - like in this question, where you try to recreate database joins.
Putting aside that your partitioning into services doesn't sound right (or needed) for this case: purchases describe events that happened (even if they are later deleted), so they can capture the state of the product and the vendor. You can augment the data in the purchases to represent the product and vendor data as they were at the time of the purchase (see the sketch below). This isolates purchase queries to the purchase service, and has the added benefit of preserving history as products/vendors evolve over time.
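A minimal sketch of such an augmented purchase record; the copied fields are assumptions layered on the question's schema, not an existing API:

```java
// The purchase service stores copies of the product/vendor attributes as
// they were at purchase time, so searches never leave this service.
public final class Purchase {
    public final String title;
    public final String productId;   // reference into the product service
    public final String productName; // copied at purchase time
    public final String accountId;   // reference into the vendor service
    public final String vendorName;  // copied at purchase time

    public Purchase(String title, String productId, String productName,
                    String accountId, String vendorName) {
        this.title = title;
        this.productId = productId;
        this.productName = productName;
        this.accountId = accountId;
        this.vendorName = vendorName;
    }
}
```

With productName and vendorName captured on each row, the search for product name like "clothes" and vendor name like "x" becomes a single query inside the purchase service.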
For this case you would have to build at least three queue managers in your message-queue integration, one each for Purchase, Vendor and Products,
and set up the request and reply queues.
The requests and replies can carry the data as either JSON or XML, however you like.
You need to create a listener that is responsible for listening to the reply queues; you can use something like SignalR streaming to listen continuously.
Once all the integration is complete you can inject the result directly into your client application.
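A broker-agnostic, in-memory sketch of that request/reply correlation; real queue managers and a push channel such as SignalR would replace the in-memory structures, and all names here are invented:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

public final class RequestReply {
    private final BlockingQueue<String[]> requestQueue = new LinkedBlockingQueue<>();
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Client side: send a request tagged with a correlation id, await the reply.
    public Future<String> send(String payload) throws InterruptedException {
        String correlationId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        requestQueue.put(new String[] { correlationId, payload });
        return reply;
    }

    // Listener side: consume one request from the queue and complete the
    // matching pending reply by correlation id.
    public void serveOne() throws InterruptedException {
        String[] request = requestQueue.take();
        pending.remove(request[0]).complete("reply to: " + request[1]);
    }
}
```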
In the Riak documentation, there are often examples suggesting you could model your e-commerce datastore in a certain way. But this is also written:
In a production Riak cluster being hit by lots and lots of concurrent writes, value conflicts are inevitable, and Riak Data Types are not perfect, particularly in that they do not guarantee strong consistency and in that you cannot specify the rules yourself.
From http://docs.basho.com/riak/latest/theory/concepts/crdts/#Riak-Data-Types-Under-the-Hood, last paragraph.
So, is it safe enough to use Riak as the primary datastore in an e-commerce app, or is it better to use another database with stronger consistency?
Riak out of the box
In my opinion out of the box Riak is not safe enough to use as the primary datastore in an e-commerce app. This is because of the eventual consistency nature of Riak (and a lot of the NoSQL solutions).
Per the CAP theorem, distributed datastores (Riak being one of them) can guarantee at most two of:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
Riak specifically errs on the side of availability and partition tolerance, offering eventual consistency for the data held in its datastore.
What Riak can do for an e-commerce app
Using Riak out of the box, it would be a good source for the content about the items being sold in your e-commerce app (content that is generally written once and read many times is a great use case for Riak). However, values such as:
the count of how many items are left
the money in a user's account
need to be carefully handled in a distributed datastore.
Implementing consistency in an eventually consistent datastore
There are several methods you can use, they include:
Implement a serialization method when writing updates to values that you need to be consistent (i.e. go through a single/controlled service that guarantees it will only update a single item sequentially); this would need to be done outside of Riak, in your API layer
Change the replication properties of your consistent buckets so that you can 'guarantee' you never retrieve out-of-date data:
At the bucket level, you can choose how many copies of data you want to store in your cluster (N, or n_val), how many copies you wish to read from at one time (R, or r), and how many copies must be written to be considered a success (W, or w).
The above method is similar to using the strong consistency model available in the latest versions of Riak.
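A tiny illustration of the quorum arithmetic behind those settings - the standard Dynamo-style rule that R + W > N forces every read set to overlap every successful write set, so a read always sees at least one up-to-date copy:

```java
public final class Quorum {
    // With N copies, reading R and writing W of them, R + W > N guarantees
    // read/write quorum overlap ("read your writes").
    public static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(readsSeeLatestWrite(3, 2, 2)); // true: quorum reads and writes
        System.out.println(readsSeeLatestWrite(3, 1, 1)); // false: stale reads possible
    }
}
```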
Important note: in all of these data store systems (distributed or not) you will in general:
Read the current data
Make a decision based on the current value
Change the data (decrement the Item count)
If all three of the above actions cannot be done in an atomic way (either by locking, or by failing the third step if the value was changed by something else), an e-commerce app is open to abuse. This issue exists in traditional SQL storage solutions too (which is why you have SQL transactions).
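A self-contained illustration of making that read-decide-write cycle atomic; the compare-and-set loop below is a stand-in for whatever the datastore actually provides (conditional writes, locks, or SQL transactions):

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class ItemCounter {
    private final AtomicInteger remaining = new AtomicInteger(100);

    public boolean tryDecrement() {
        while (true) {
            int current = remaining.get();  // 1. read the current data
            if (current <= 0) return false; // 2. decide based on the value
            if (remaining.compareAndSet(current, current - 1)) {
                return true;                // 3. write only if unchanged
            }
            // another writer changed the value in between; retry with fresh data
        }
    }
}
```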
I'm considering using Amazon's DynamoDB. Naturally, if you're bothering to use a highly available distributed data store, you want to make sure your client deals with outages in a sensible way!
While I can find documentation describing Amazon's "Dynamo" database, it's my understanding that "DynamoDB" derives its name from Dynamo, but is not at all related in any other way.
For DynamoDB itself, the only documentation I can find is a brief forum post which basically says "retry 500 errors". For most other databases much more detailed information is available.
Where should I be looking to learn more about DynamoDB's outage handling?
Amazon DynamoDB indeed lacks a detailed statement about its choices regarding the CAP theorem (I'm still hoping for a DynamoDB edition of Kyle Kingsbury's most excellent Jepsen series - Call me maybe: Cassandra analyzes a Dynamo-inspired database). However, Jeff Walker Code Ranger's answer to DynamoDB: Conditional writes vs. the CAP theorem confirms the lack of clear information in this area, but asserts that we can make some pretty strong inferences.
In fact, the referenced forum post also suggests a strong emphasis on availability:
DynamoDB does indeed synchronously replicate across multiple availability zones within the region, and is therefore tolerant to a zone failure. If a zone becomes unavailable, you will still be able to use DynamoDB and the service will persist any successful writes that we have acknowledged (including writes we acknowledged at the time that the availability zone became unavailable).
The customer experience when a complete availability zone is lost ranges from no impact at all to delayed processing times in cases where failure detection and service-side redirection are necessary. The exact effects in the latter case depend on whether the customer uses the service's API directly or connects through one of our SDKs.
Beyond that, Werner Vogels' posts on Dynamo/DynamoDB provide more insight:
Amazon's Dynamo - about the original paper
Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications - main introductory article including:
History of NoSQL at Amazon – Dynamo
Lessons learned from Amazon's Dynamo
Introducing DynamoDB - this features the most relevant information regarding the subject matter
Durable and Highly Available. Amazon DynamoDB replicates its data over at least 3 different data centers so that the system can continue to operate and serve data even under complex failure scenarios.
Flexible. Amazon DynamoDB is an extremely flexible system that does not force its users into a particular data model or a particular consistency model. DynamoDB tables do not have a fixed schema but instead allow each data item to have any number of attributes, including multi-valued attributes. Developers can optionally use stronger consistency models when accessing the database, trading off some performance and availability for a simpler model. They can also take advantage of the atomic increment/decrement functionality of DynamoDB for counters. [emphasis mine]
DynamoDB One Year Later: Bigger, Better, and 85% Cheaper… - about improvements
Finally, Aditya Dasgupta's presentation about Amazon's DynamoDB also analyzes its modus operandi regarding the CAP theorem.
Practical Guidance
In terms of practical guidance for retry handling, the DynamoDB team has meanwhile added a dedicated section about Handling Errors, including Error Retries and Exponential Backoff.
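A minimal sketch of that retry advice - capped exponential backoff with jitter in plain Java. Note the AWS SDKs implement this for you, so the class below is illustrative rather than something you would need to write:

```java
import java.util.Random;
import java.util.concurrent.Callable;

public final class RetryWithBackoff {
    private static final Random RANDOM = new Random();

    // Retries a request with capped exponential backoff plus jitter.
    // In practice you would only retry throttling/5xx errors, not every exception.
    public static <T> T call(Callable<T> request, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return request.call();
            } catch (Exception retryable) {
                if (attempt >= maxRetries) throw retryable;
                long base = Math.min(1L << attempt, 64) * 50;               // 50ms, 100ms, ... capped
                long sleepMs = base / 2 + RANDOM.nextInt((int) base / 2 + 1); // add jitter
                Thread.sleep(sleepMs);
            }
        }
    }
}
```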
I have a situation where a single Oracle system is the data master for two separate CRM systems (PeopleSoft & Siebel). The Oracle system sends CRUD messages to BizTalk for customer data, inventory data, product info and product pricing. BizTalk formats and forwards the messages on to PeopleSoft & Siebel web service interfaces for action. After initial synchronization of the data, ongoing operation has created a situation where the data isn't accurate in the outlying Siebel and PeopleSoft systems, despite successful delivery of the data (this is another conversation about what these systems mean when they return a 'Success' to BizTalk).
What do other similar implementations do to reconcile system data in this distributed service-oriented approach? Do they run a periodic dump from all systems for comparison? Are there any other techniques or methodologies for spotting failed updates and ensuring synchronization?
Your thoughts and experiences are appreciated. Thanks!
Additional Info
So why do the systems get out of sync? Whenever a destination system acknowledges to BizTalk that it has received a message, it can mean many things. Sometimes an HTTP 200 means "I've got it and put it in a staging table and I'll commit it in a bit" - sometimes that commit succeeds, sometimes it fails for various data issues. Sometimes the HTTP 200 means "yes, I have received and committed the data". Using HTTP, there can also be issues with ordered delivery. All of these problems could have been solved with a lot of architectural planning up front. It was not done. There are no update/create timestamps to prevent un-ordered delivery from stepping on data. There is no full round-trip acknowledgement of data commit from the destination systems. All of this adds up to things getting out of sync.
(Sorry, this is an answer and not a comment - working my way up to 50 points.)
Can the data be updated in the other systems or is it essentially read only?
Could you implement some further validation in the BizTalk layer to ensure that updates wouldn't fail because of data issues?
Can you grab any sort of notification that the update failed from the destination systems which would allow you to compensate in the BizTalk layer?
FWIW, in situations like this I have usually ended up with a central data store that contains at least the data keys from the three systems and acts as the new golden repository for the data; however, that is usually to compensate for multiple update sources. It also seems we usually operate some sort of manual error queue that users must maintain.
As for your idea of batch reconciliation, I have seen that quite commonly used to compensate for transactional errors, especially in the financial services realm.
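A toy sketch of such a key-based batch reconciliation; the class names and the key+checksum scheme are assumptions, and a real job would dump keys plus checksums or timestamps from each system and feed mismatches into the manual error queue:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class Reconciler {
    // Compares master records (key -> checksum) against a destination system
    // and returns the keys that are missing or stale in the destination.
    public static Set<String> outOfSyncKeys(Map<String, String> master,
                                            Map<String, String> replica) {
        Set<String> bad = new HashSet<>();
        for (Map.Entry<String, String> e : master.entrySet()) {
            if (!e.getValue().equals(replica.get(e.getKey()))) {
                bad.add(e.getKey()); // missing, or present with a different checksum
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, String> oracle = new HashMap<>();
        oracle.put("CUST-1", "hashA");
        oracle.put("CUST-2", "hashB");
        Map<String, String> siebel = new HashMap<>();
        siebel.put("CUST-1", "hashA");     // in sync
        siebel.put("CUST-2", "hashSTALE"); // failed update
        System.out.println(outOfSyncKeys(oracle, siebel)); // [CUST-2]
    }
}
```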