Transaction management in Web services

Transaction management in Web services - soa

Our client follows SOA principles and have design web services that are very fine grained like createCustomer, deleteCustomer, etc.
I am not sure if fine grained services are desirable as they create transactional related issues. for e.g. if a business requirement is every Customer must have a Address when it's created. So in this case, the presentation component will invoke createCustomer first and then createAddress. The services internally use simple JDBC to update the respective tables in db. As a service is invoked by external component, it has not way of fulfilling transactional requirement here i.e. if createAddress fails, createCustomer operation must be rolledback.
I guess, one of the approach to deal with this is to either design course grained services (that creates a Customer and associated Address in one single JDBC transaction) or
perhaps simple create a reversing service (deleteCustomer) that simply reverses the action of createCustomer.
any suggestions. thanks

The short answer: services should be designed for the convenience of the service client. If the client is told "call this, then cdon't forget to call that" you're making their lives too difficult. There should be a coarse-grained service.
A long answer: Can a Customer reasonably be entered with no Address? So we call
createCustomer( stuff but no address)
and the result is a valid (if maybe not ideal) state for a customer. Later we call
changeCustomerAddress ( customerId, Address)
and now the persisted customer is more useful.
In this scenario the API is just fine. The key point is that the system's integrity does not depend upon the client code "remembering" to do something, in this case to add the address. However, more likely we don't want a customer in the system without an address in which case I see it as the service's responsibility to ensure that this happens, and to give the caller the fewest possibilities of getting it wrong.
I would see a coarse-grained createCompleteCustomer() method as by far the best way to go - this allows the service provider to solve the problem once rather then require every client programmer to implement the logic.
Alternatives:
a). There are web Services specs for Atomic Transactions and major vendors do support these specs. In principle you could actually implement using fine-grained methods and true transactions. Practically, I think you enter a world of complexity when you go down this route.
b). A stateful interface (work, work, commit) as mentioned by #mtreit. Generally speaking statefulness either adds complexity or obstructs scalability. Where does the service hold the intermediate state? If in memeory, then we require affinity to a particular service instance and hence introduce scaling and reliability problems. If in some State or Work-in-progress database then we have significant additional implementation complexity.

Ok, lets start:
Our client follows SOA principles and
have design web services that are very
fine grained like createCustomer,
deleteCustomer, etc.
No, the client has forgotten to reach the SOA principles and put up what most people do - a morass of badly defined interfaces. For SOA principles, the clinent would have gone to a coarser interface (such asfor example the OData meachsnism to update data) or followed the advice of any book on multi tiered architecture written in like the last 25 years. SOA is just another word for what was invented with CORBA and all the mistakes SOA dudes do today where basically well known design stupidities 10 years ago with CORBA. Not that any of the people doing SOA today has ever heard of CORBA.
I am not sure if fine grained services
are desirable as they create
transactional related issues.
Only for users and platforms not supporting web services. Seriously. Naturally you get transactional issues if you - ignore transactional issues in your programming. The trick here is that people further up the food chain did not, just your client decided to ignore common knowledge (again, see my first remark on Corba).
The people designing web services were well aware of transactional issues, which is why web service specification (WS*) contains actually mechanisms for handling transactional integrity by moving commit operations up to the client calling the web service. The particular spec your client and you should read is WS-Atomic.
If you use the current technology to expose your web service (a.k.a. WCF on the MS platform, similar technologies exist in the java world) then you can expose transaction flow information to the client and let the client handle transaction demarcation. This has its own share iof problems - like clients keeping transactions open maliciously - but is still pretty much the only way to handle transactions that do get defined in the client.
As you give no platform and just mention java, I am pointing you to some MS example how that can look:
http://msdn.microsoft.com/en-us/library/ms752261.aspx
Web services, in general, are a lot more powerfull and a lot more thought out than what most people doing SOA ever think about. Most of the problems they see have been solved a long time ago. But then, SOA is just a buzz word for multi tiered architecture, but most people thinking it is the greatest thing since sliced bread just dont even know what was around 10 years ago.
As your customer I would be a lot more carefull about the performance side. Fine grained non-semantic web services like he defines are a performance hog for non-casual use because the amount of times you cross the network to ask / update small small small small stuff makes the network latency kill you. Creating an order for like 10 goods can easily take 30-40 network calls in this scenario which will really possibly take a lot of time. SOA preaches, ever since the beginning (if you ignore the ramblings of those who dont know history) to NOT use fine grained calls but to go for a coarse grained exchange of documents and / or a semantical approach, much like the OData system.

If transactionality is required, a coarser-grained single operation that can implement transaction-semantics on the server is definitely going to be much simpler to implement.
That said, certainly it is possible to construct some scheme where the target of the operations is not committed until all of the necessary fine-grained operations have succeeded. For instance, have a Commit operation that checks some flag associated with the object on the server; the flag is not set until all of the necessary steps in the transaction have completed, and Commit fails if the flag is not set.
Of course, if having light-weight, fine grained operations is an important design requirement, perhaps the need to have transactionality should be re-thought.

Related

Approach for disconnected application development

Our company has people in every catastrophic event here in the U.S. and parts of Canada. An example is they were quite prevalent in Katrina immediately after the event.
We are constructing an application to improve their job in the field which may be either ASP.NET or WPF, and the disconnect requirement makes us believe it will be a WPF application. Our people need to be able to create their jobs, provide all of the insurance and measurement data, and save it as if in the database whether or not the internet is available.
The issue we are trying to get our heads around is that when at catastrophic events our people need to be able to use our new application even when the internet is not available. (They were offline for 3 days in Katrina)
Has anyone else had to address requirements like this and suggestions on how they approached functioning on small-footprint devices while saving data as if they were still connected to the backend services and database? We also have to incorporate security into this as well, and do it well enough that their entered data loads into the connected database without issues.
Our longterm goal is to also provide this application for Android and IPad Tablet devices as well as laptops. Our initial desire for ASP.NET was it gave us an immediate application for the tablet environment. In the old application they have, they run a local server, run remote connections on the tablets and run the application through terminal server. Not pretty. Not pretty.
I feel this is a serious question that is not subjective so hopefully this won't get deleted.
Our current architecture on the server side is Entity Framework with a repository pattern, WCF services to satisfy CRUD requests returning composite data transfer objects, and a proxy for use by the clients.
I'm interested in hearing other developers' input and this design puzzle.
Additional Information Added to the Discussion
Lots of good information provided!!! I'll have to look at Microsoft Sync for sure. For the disconnected database I would be placing only list tables (enumerations) in the initial database. Jobs and, if needed, an item we call dry books, will be added for each client we are helping. (though I hope the internet returns by the time we are cleaning and drying out the homes) These are the tables that would then populate back to the host once we have a stable link. In the case of Katrina we also lost internet connectivity in our offices which meant the office provided no communication relief for days as well.
Last night I realized that our client proxy is the key to everything working! The client remains unaware of the fact that it is online or offline and leaves the synchronization process within that library. We are discovering how much data we are talking about today. I also want to make it clear that ASP.NET was a like-to-have but a thick client (actually WPF with XAML) may end up being our end state.
Now -- for multiple updates. The disconnected work will be going to individual homes by a single franchise. In fact our home office dispatches specific franchises to specific events. So we have a reduced likelihood (if any) of the problem of multiple people updating a record. The reason is that they are creating records for each job (person's home/office/business) and only that one franchise will deal with it. Of course this also means that if they are disconnected for days that the device that creates the job (record of who, where, condition, insurance company, etc) is also the only device that knows of the job. But that can be lived with. In fact we may be able to have a facility to sync the franchise devices on a hub.
I'm looking forward to hearing additional stories of how you've implemented your disconnected environment.
Thanks!!!
Looking at new technology from Microsoft
I was directed to look at a video from TechEd 2012 and thought I might have an answer. The talk was on using ASP.NET and MVC4 along with 2 libraries for disconnected behavior. At first I thought it would be great but then as it continued it worried me quite a bit.
First the use of a javascript backend to support disconnected I/O does not generate confidence. As a compiler guy (and one who wrote two interpretive languages) I really do not like having a critical business model reliant upon interpretive javascript. And script at that! It may be me but it just makes me shudder.
Then they show their "great"(???) programming model having your ViewModel exist as just javascript. I do not care for an application (asp.net and javascript) that can be, and may as well be (for lack of intellisense ) written in notepad.
No offense meant to any asp lovers, but a well written C# program that has been syntactically and type checked gives me stronger confidence in software than something written with a hope and prayer that a class namespace has been properly typed without any means of cross check. I've seen too many hours of debugging looking for a bug that ended up in a huge namespace with transposed ie in it's name. I ran my thought past the other senior developers in my group and we are all in consensus on this technology.
But we continue to look. (I feel this is becoming more of a diary than a question) :)

Looks like a perfect example for Microsoft Sync Framework
http://msdn.microsoft.com/en-us/sync/bb736753.aspx
A comprehensive synchronization platform that enables collaboration
and offline access for applications, services, and devices with
support for any data type, any data store, any transfer protocol, and
any network topology.

I often find that building a lightweight framework to fit my specific needs is more beneficial to me than using an existing one. However, always look at what's available and weigh the pros and cons before making that decision.
I haven't use the Microsoft Sync Framework, but it sounds like that's a good one to research first. If you have Sql Server Standard (or some other version other than the Express version) then replication might also be an option.
If you want to develop your own homegrown solution, then be sure to put lastupdated and dateadded fields on any tables that need to stay in sync. It doesn't 'sound' like your scenario will be burdened by concurrency issues (i.e. if person A and B both modify a field at the same time, who wins?). If that's the case then developing your own lightweight solution will be pretty straightforward.
As Jeremy pointed out, you will need a way to get the changes. In addition to using a web service, you can also use WCF which is similar to a web service in some ways. But my personal bias would be towards just accessing a SQL server remotely over the internet. The downside of that solution is added security concerns, while the upside is decreased development overhead (i.e. faster/easier development now and less maintenance over time). Also, the direct SQL solution is also assuming that this is an internal application... that you're in charge of all development and not working with 3rd parties who need access to your data and wouldn't be allowed to access it this way.

Not really a full answer but too much for a comment.
I have two apps one that synchs one way and the other two way.
I do a one way synch to client for disconnected operation. At the server full SQL Server and at the client Compact Edition. TimeStamp is a prefect for finding any rows that needs to be synched. I also don't copy the whole database as some of the largest table are non nonessential. The common use is the user marks identified records they want to synch.
If synch does what you need great +1 for Jakub. For me I don't have the option to synch the whole MSSQL both based on size and security.
Have another smaller application that synchs two way but in this case it has regions and update are only within the region. So a region only synchs their data and in disconnected mode they can only add new records. Update to an existing records must be performed in connected mode. That was mangeable. In that case MSSQL for the master and used XML for the client.
No news to you but the hard part of a raw synch is that two parties may have added or revised the same record.

Workflow Foundation - Can I make it fit?

I have been researching workflow foundation for a week or so now, but have been aware of it and the concepts and use cases for it for many years, just never had the chance to dedicate any time to going deeper.
We now have some projects where we would benifit from a centralized business logic exposed as services as these projects require many different interfaces on different platforms I can see the "Business Logic Silos" occuring.
I have had a play around with some proof of concepts to discover what is possible and how it can be achieved and I must say, its a bit of a fundamental phase shift for a regular C# developer.
There are 3 things that I want to achieve:
Runtime instanciated state machines
Customizable by the user (perform different tasks in different orders and have unique functions called between states).
WCF exposed
So I have gone down the route of testing state machine workflows, xamlx wcf services, appfabric hosted services with persistance and monitoring, loading xamlx services from the databse at runtime, etc, but all of these examples seem not to play nicely together. For example, a hosted state machine service, when in appfabric, has issues with the sequence of service method calls such as:
"Operation 'MethodName' on service instance with identifier 'efa6654f-9132-40d8-b8d1-5e611dd645b1' cannot be performed at this time. Please ensure that the operations are performed in the correct order and that the binding in use provides ordered delivery guarantees".
Also, if you call instancial workflow services at runtime from an sql store, they cannot be tracked in appfabric.
I would like to Thank Ron Jacobs for all of his very helpful Hands On Labs and blog posts.
Are there any examples out there that anyone knows of that will tie together all of these concepts?
Am I trying to do something that is not possible or am I attempting this in the right way?
Thanks for all your help and any comments that you can make to assist.
Nick

Regarding the error, it seems like you have modified the WF once deployed (is that #2 in your list?), hence the error you mention.
Versioning (or for this case, modifying a WF after it's been deployed) is something that will be improved in the coming version, but I don't think it will achieve what you need in #2 (if it is what I understood), as the same WF is used for every instance.

How is an SOA architecture really supposed to be implemented?

My project is converting a legacy fat-client desktop application into the web. The database is not changing as a result. Consequently, we are being forced to call external web services to access data in our own database. Couple this with the fact that some parts of our application are allowed to access the database directly through DAOs (a practice that is much faster and easier). The functionality we're supposed to call web services for are what has been deemed necessary for downstream, dependent systems.
Is this really how SOA is supposed to work? Admittedly, this is my first foray into the SOA world, but I have to think this is the complete wrong way to go about this.

I agree that it's the wrong approach. Calling your own database via a webservice should raise red flags in a design review, and a simple DAO is the way to go (KISS principle).
Now, if it's data that truly needs to be shared across your company (accounts, billing, etc) THEN it's time to consider a more heavy-duty solution such as SOAP or REST. But your team could still access it directly, which would be faster.
My team had the same thing happen with a web service that we wanted to call in batch mode. Rather than call our own SOAP endpoint, we instead set it up to call a POJO (plain old java object) interface. There's no XML transformation or extra network hop through an SOA appliance.
It's overkill to put an XML interface between MVC layers when your team owns the whole application. It may not be traditional SOA... but IMO it's traditional common sense. ;)

I've seen people try to jam SOA at too low a level and this may be such a case. I would certainly not equate DAO and SOA at the same level.
I agree with #ewernli
What is SOA "in plain english"?
IMHO, SOA makes sense only at the enterprise-level, and means nothing for a single application.
If I'm reading into your question correctly, your web services are for C/R/U/D data into the database. If so, providing C/R/U/D services directly to the database and its tables are likely too low level to be SOA services.
I'd look for services at a higher level and try to determine whether they are interesting at to the enterprise. If so, those are your services. I'd also ask myself whether my former desktop app is providing services (i.e. should you be looking to make your new app an SOA service itself rather than trying to force an SOA architecture into the desktop app at a low level.

Consequently, we are being forced to
call external web services to access
data in our own database.
Man, that gotta hurt. As far as services in SOA go,
a service is a repeatable logical manifestation of a business task - that means you are not implementing SOA if you are not 'service enabling' business processes. If you are putting some web services to select data out of your data base, all you got is a bunch of webservices, which would slowdown your applications which could have been faster by conventional data access patterns (like DAO)
When you equate SOA with Web services there is a risk of replacing existing APIs with Web services without proper architecture. This will result in identifying many services that are not business aligned.
Also, service orientation is a way of integrating a business as a group of linked services - so ask yourself is the organization making use of these atomic services to achieve further benefits?
Do a google search for SOA anti-patterns and you will find what are the different ways to end up with a pile of web-services instead of SOA.

SOA... SOA... is the bane of my existence, for just this reason. What, or what not, constitutes SOA? I support SOA products in my day job, and some people get it, some don't. SOA.. SOA is about wrapping discrete business services in XML. ZIP+4 validation services. Payment gateways. B2B messaging.
SOA CAN be used to decouple desktop apps from backend databases. Sometimes it doesn't make sense, sometimes it does. What almost NEVER makes sense is low-latency high-query-count logic. If you ever have to use an application in France directly connected to a database in California, you'll get what I mean. SOA pretty much forces you to then smartly about how you model and return your data (look into SDO - Service Data Objects). The devil's in the details though. Marshalling data to/from XML can be costly.

Good SOA design is all about separation of behavior and data.
I repeat behavior and data need to be separate or else you will have lots or problems whether its CORBA/SOAP/REST/XMLRPC or even plain old in-the-same-JVM-method calls.
Lots of people will talk about service end points, message handling, and contracts making SOA one of the more soporific areas of computing when its surprisingly not complicated.
If you are doing Java its really easy. Make POJOs for your domain objects with no weird state behavior and no weird collaborators and then make Service classes with the behavior. More often then not you can just use your DAO as the service (I mean you should have a thin layer over the DAO but if you don't need one....).
OOP lovers will disagree of this separation of data and behavior but this design pattern scales extremely well and is infact what most functional programming languages like Erlang do.
That being said if you are making a video game or something very state based then this design philosophy is a bad idea. BTW SOA is about as vacuous as the term enterprise.

Which part do you think is wrong? The part that you have to hit the web service, or the part you are hitting the database directly?
SOA is more of an API design guideline, not a development methodology. It's not an easy thing to implement, but the reward of reusability is often worth it.
See Service-Oriented Architecture expands the vision of Web services or any technical book on SOA. Simply wrapping function calls with web call does not make it a Service Oriented Architecture. The idea of the SOA is to make reusable services, and then you make higher level services (like website) by compositing or orchestrating underlying low-level services. At the very low level, you should focus on things like statelessness, loose coupling, and granularity. Modern frameworks like Microsoft's WCF supports wiring protocols like SOAP, REST, and faster binary side by side.
If your application is designed to run over the Internet, you should be mindful of the network latency issues. In a traditional client-server application that is deployed on a LAN, because the latency is sub 10 msec, you could hit the database every time you need the data without interrupting the user experience. However, on the Internet, it is not uncommon to have 200 msec latency if you go across proxies or oceans. If you hit the database 100 times, and that will add up to 20 seconds of pause. In SOA, you would try to pack the whole thing into a single document, and you exchange the document back and forth, similar to the way tax is filed using Form 1040 if you live in the US.
You may say that the latency issue is irrelevant because the web service is only consumed by your web application layer. But you could hit the web service from the browser using AJAX reload the data, which should give the user shorter response time.

Asynchronously Decoupled Three-Tier Architecture

Maybe I just expected "three-tier architecture" to deliver a little more than just a clean separation of responsibilities in the source code (see here)...
My expectations to such a beast that can safely call its self "three-tier architecture" are a lot higher... so, here they are:
If you were to build something like a "three tier architecture" system but this time with these, additional requirements and constraints:
Up and running at all times from a Users point of viewExpect when the UI gets replacedWhen other parts of the system are down, the UI has to handle that
Never get into a undefined state or one from which the system cannot recover automatically
The system has to be "pausable"
The middle-tier has to contain all the business logic
Obviously using an underlying Database, itself in the data-tier (if you like)
The business logic can use a big array of core services (here in the data-tier, not directly accessible by the UI, only through business logic tier facade)
Can be unavailable at times
Can be available as many parallel running, identical processes
The UI's may not contain any state other than the session in case of web UI's and possibly transient view baking models
Presentation-tier, logic-tier and data/core-services-tier have to be scalable independently
The only thing you can take for granted is the network
Note: The mentioned "core services" are heavy-weight components that access various external systems within the enterprise. An example would be the connection to an Active Directory or to a "stock market ticker"...
1. How would you do it?
If you don't have an answer right now, maybe read on and let me know what you think about this:
Sync considered harmful. Ties your system together in a bad way (Think: "weakest link"). Thread blocked while waiting for timeout. Not easy to recover from.
Use asynchronous messaging for all inter-process communication (between all tiers). Allows to suspend the system anytime you like. When part of the system is down, no timeout happens.
Have central routing component where all requests get routed through and core services can register themselves.
Add heartbeat component that can e.g. inform the UI that a component is not currently available.
State is a necessary evil: Allow no state other than in the business logic tier. This way the beast becomes manageable. While the core services might well need to access data themselves, all that data should be fed in by the calling middle tier. This way the core services can be implemented in a fire and forget fashion.
2. What do you think about this "solution"?

I think that, in the real world, high-availability systems are implemented using fail-over: for example, it isn't that the UI can continue to work without the business layer, instead it's that if the business layer becomes unavailable then the UI fails over to using a backup instance of the business layer.
Apart from that, they might operate using store-and-forward: e.g. a mail system might store a piece of mail, and retransmit it periodically, if it can't deliver it immediately.

Yep its the way most large websites do it. Look at nosql databases, Google's bigtable architecture etc.
1. This is the general approach I'd take.
I'd use a mixture of memcached , a nosql-cloud (couch-db or mongo-db) and enterprise grade RDBMS systems (core data storage) for the data layer. I'd then write the service layer ontop of the data layer. nosql database API's are massively parallel (look at couchdb with its ngingx service layer parallizer). I'd then provide "oldschool each request is a web-page" generating web-servers and also direct access to the service layer for new style AJAX application; both these would depend on the service layer.
p.s. the RDBMS is an important component here, it holds the authoritative copy of the all the data in the memchached/nosql cloud. I would use an enterprise grade RDBMS to do data-centre to data-centre replication. I don't know how the big boys do their cloud based site replication, it would scare me if they did data-cloud to data-cloud replication :P
Some points:
yYu do not need heartbeat, with nosql
the approach taken is that if content
becomes unavailable, you regenerate it
onto another server using the
authoratitve copy of the data.
The burden of state-less web-design
is carried to the nosql and memcached
layer which is infinitely scalable.
So you do not need to worry about
this. Just have a good network
infrastructure.
In terms of sync, when you are
talking to the RDBMS you can expect
acceptable synchronous response
times. Your cloud you should treat as
an asynchronous resource, you will
get help from the API's that
interface with your cloud so you
don't even have to think about this.
Advice I can give about networking
and redundancy is this: do not go for
fancy Ethernet bonding, as its not worth
it -- things always go wrong. Just
set up redundant switches, ethernet cards
and have multiple routes to all your
machines. You can use OpenBSD and
CARP for your routers, as they work
great - routers are your worst point of failure -- openbsd solves this problem.
2. You've described the general components of a web 2.0 farm, so no comment:D

Are middleware apps required to do business logic?

Let's suppose I have a large middleware infrastructure mediating requests between several business components (customer applications, network, payments, etc). The middleware stack is responsible for orchestration, routing, transformation and other stuff (similar to the Enterprise Integration Patterns book by Gregor Hohpe).
My question is: is it good design to put some business logic on the middleware?
Let's say my app A requests some customer data from the middleware. But in order to get this data, I have to supply customer id and some other parameter. The fetching of this parameter should be done by the requesting app or is the middleware responsible for 'facilitating' and providing an interface that receives customer ids and internally fetches the other parameter?
I realize this is not a simple question (because of the definition of business logic), but I was wondering if it is a general approach or some guidelines.

Apart from the routing, transformation and orchestration, performance should be kept in mind while loading middleware with functional requirements. Middlware should take a fraction of the entire end-to-end transaction life time. This can be achieved only by concentrating on the middleware core functionalities, rather than trying to complement the host system functionalities.

This is the "Composite Application" pattern; the heart of a Service Oriented Architecture. That's what the ESB vendors are selling: a way to put additional business logic somewhere that creates a composite application out of existing applications.
This is not simple because your composite application is not just routing. It's a proper new composite transaction layered on top of the routing.
Hint. Look at getting a good ESB before going too much further. This rapidly gets out of control and having some additional support is helpful. Even if you don't buy something like Sun's JCAPS or Open ESB, you'll be happy you learned what it does and how they organize complex composite applications.

Orchestration, Routing and Transformation.
You don't do any of these for technical reasons, at random, or just for fun, you do these because you have some business requirement -- ergo there is business logic involved.
The only thing you are missing for a complete business system is calculation and reporting (let us assume you already have security in place!).
Except for very low level networking, OS and storage issues almost everything that comprises a computer system is there because the business/government/end users wants it to be there.
The choice of 'Business Logic' as terminoligy was very poor and has led to endless distortions of design and architecture.
What most good designers/architects mean by business logic is calculation and analysis.
If you "%s/Business Logic/Calculation/g" most of the architectural edicts make more sense.

The middleware application should do it. System A should have no idea that the other parameter exists, and will certainly have no idea about how to get it.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex