Maybe I just expected "three-tier architecture" to deliver a little more than just a clean separation of responsibilities in the source code (see here)...
My expectations of such a beast that can safely call itself "three-tier architecture" are a lot higher... so, here they are:
If you were to build something like a "three-tier architecture" system, but this time with these additional requirements and constraints:
Up and running at all times from a user's point of view (except when the UI gets replaced)
When other parts of the system are down, the UI has to handle that
Never get into an undefined state, or one from which the system cannot recover automatically
The system has to be "pausable"
The middle-tier has to contain all the business logic
Obviously using an underlying Database, itself in the data-tier (if you like)
The business logic can use a big array of core services (here in the data-tier, not directly accessible by the UI, only through a business-logic-tier facade)
Can be unavailable at times
Can be available as many parallel running, identical processes
The UIs may not contain any state other than the session (in the case of web UIs) and possibly transient view-backing models
Presentation-tier, logic-tier and data/core-services-tier have to be scalable independently
The only thing you can take for granted is the network
Note: The mentioned "core services" are heavy-weight components that access various external systems within the enterprise. An example would be the connection to an Active Directory or to a "stock market ticker"...
1. How would you do it?
If you don't have an answer right now, maybe read on and let me know what you think about this:
Sync considered harmful. Synchronous calls tie your system together in a bad way (think "weakest link"), threads block while waiting for timeouts, and failures are not easy to recover from.
Use asynchronous messaging for all inter-process communication (between all tiers). This allows you to suspend the system any time you like. When part of the system is down, no timeout happens.
Have a central routing component through which all requests are routed and with which core services can register themselves.
Add a heartbeat component that can, e.g., inform the UI that a component is not currently available.
State is a necessary evil: allow no state other than in the business logic tier. This way the beast becomes manageable. While the core services might well need to access data themselves, all that data should be fed in by the calling middle tier. This way the core services can be implemented in a fire-and-forget fashion.
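As a toy illustration of the asynchronous-messaging idea (not the design proposed above verbatim; the message types and the in-memory queues are stand-ins for a real message broker), the calling tier puts a request on a queue and carries on, and a consumer in another tier picks it up whenever it happens to be available:

import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy stand-in for broker-backed queues between tiers.
public class AsyncMessagingDemo {

    record Request(String correlationId, String payload) {}
    record Reply(String correlationId, String result) {}

    static final BlockingQueue<Request> requests = new LinkedBlockingQueue<>();
    static final BlockingQueue<Reply> replies = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        // Business-logic tier: consumes requests whenever it happens to be running.
        Thread businessTier = new Thread(() -> {
            try {
                while (true) {
                    Request req = requests.take();
                    replies.put(new Reply(req.correlationId(), "processed " + req.payload()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        businessTier.setDaemon(true);
        businessTier.start();

        // UI tier: enqueue the request and return immediately; no thread sits
        // blocked on a synchronous call into the business tier.
        requests.put(new Request("42", "customer lookup"));

        // Later, the UI checks for the reply; absence simply means "not yet",
        // which the UI can surface instead of failing on a timeout.
        Optional<Reply> reply = Optional.ofNullable(replies.poll(500, TimeUnit.MILLISECONDS));
        System.out.println(reply.map(Reply::result).orElse("business tier not available yet"));
    }
}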
2. What do you think about this "solution"?
I think that, in the real world, high-availability systems are implemented using fail-over: for example, it isn't that the UI can continue to work without the business layer; rather, if the business layer becomes unavailable, the UI fails over to a backup instance of the business layer.
Apart from that, they might operate using store-and-forward: e.g. a mail system might store a piece of mail, and retransmit it periodically, if it can't deliver it immediately.
Yep, it's the way most large websites do it. Look at NoSQL databases, Google's Bigtable architecture, etc.
1. This is the general approach I'd take.
I'd use a mixture of memcached, a NoSQL cloud (CouchDB or MongoDB) and enterprise-grade RDBMS systems (core data storage) for the data layer. I'd then write the service layer on top of the data layer. NoSQL database APIs are massively parallel (look at CouchDB with its nginx service-layer parallelizer). I'd then provide "old-school, each request is a web page" generating web servers, and also direct access to the service layer for new-style AJAX applications; both of these would depend on the service layer.
P.S. The RDBMS is an important component here: it holds the authoritative copy of all the data in the memcached/NoSQL cloud. I would use an enterprise-grade RDBMS to do data-centre-to-data-centre replication. I don't know how the big boys do their cloud-based site replication; it would scare me if they did data-cloud to data-cloud replication :P
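A rough sketch of how the cache/NoSQL layer sits in front of the authoritative RDBMS copy described above (the KeyValueCache and AuthoritativeStore interfaces below are hypothetical placeholders, not any specific client API): reads hit the fast layer first and are regenerated from the authoritative copy on a miss.

// Hypothetical abstractions; a real system would bind these to memcached,
// a NoSQL store, and the enterprise RDBMS respectively.
interface KeyValueCache {
    String get(String key);
    void put(String key, String value);
}

interface AuthoritativeStore {
    String load(String key); // the RDBMS holds the authoritative copy
}

// Read-through access: serve from the cache layer, regenerate from the
// authoritative copy when the cached content is unavailable.
public class ReadThroughRepository {
    private final KeyValueCache cache;
    private final AuthoritativeStore store;

    public ReadThroughRepository(KeyValueCache cache, AuthoritativeStore store) {
        this.cache = cache;
        this.store = store;
    }

    public String fetch(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String regenerated = store.load(key); // fall back to the RDBMS
        cache.put(key, regenerated);          // repopulate the fast layer
        return regenerated;
    }
}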
Some points:
You do not need a heartbeat: with NoSQL, the approach taken is that if content becomes unavailable, you regenerate it onto another server using the authoritative copy of the data.
The burden of stateless web design is carried by the NoSQL and memcached layer, which is infinitely scalable. So you do not need to worry about this. Just have a good network infrastructure.
In terms of sync, when you are talking to the RDBMS you can expect acceptable synchronous response times. Your cloud you should treat as an asynchronous resource; you will get help from the APIs that interface with your cloud, so you don't even have to think about this.
Advice I can give about networking and redundancy is this: do not go for fancy Ethernet bonding, as it's not worth it -- things always go wrong. Just set up redundant switches and Ethernet cards, and have multiple routes to all your machines. You can use OpenBSD and CARP for your routers, as they work great -- routers are your worst point of failure, and OpenBSD solves this problem.
2. You've described the general components of a web 2.0 farm, so no comment:D
Related
We have a process that involves loading a large block of data, applying some transformations to it, and then outputting what has changed. We currently run a web app where multiple instances of these large blocks of data are processed in the same CLR instance, and this leads to garbage collection thrashing and OOM errors.
We have proven that hosting some tracked state in a longer running process works perfectly to solve our main problem. The issue we now face is, as a stateful system, we need to host it and manage coordination with other parts of the system (also change tracking instances).
I'm evaluating Actors in Service Fabric and Akka at the moment. There are a number of other options, but before I proceed, I would like people's thoughts on this approach with the following considerations:
We have a natural partition point in our system (Authority) which means we can divide our top level data set easily. Each partition will be represented by a top level instance that needs to organise a few sub-actors in its own local cluster, but we would expect a single host machine to be able to run multiple clusters.
Each Authority Cluster of actors would ideally be hosted together on a single machine to benefit from local communication and some use of shared local resources to get around limits on message size.
The actors themselves should be separate processes on the same box (Akka seems to run local Actors in the same CLR instance, which would crash everything on OOM - is this true?); this will enable me to spin up a process, run the transformation through it, emit the results and tear it down without impacting the other instances' memory / GC. I appreciate hardware resource contention would still be a problem, but I expect this to be more memory- than CPU-intensive, so expect a RAM-heavy box.
Because the data model is quite large, and the messages can contain either model fragments or changes to model fragments, it's difficult to work with immutability. We do not want to clone every message payload into internal state and apply it to the model, so ideally any actor solution used would enable us to work with the original message payload. This may cause problems with restoring an actor state as it wants to save and replay these on wakeup, but as we have state tracking internally, we can just store the resulting output of this on sleep.
We need a coordinator that can spin up instances of an Authority Cluster. There needs to be some elasticity in terms of the number of VMs/machines and the number of Authority Clusters hosted on them, and something needs to handle creation and destruction of these.
We have a lot of .NET code; all our models, transformations and validation are defined in it and will need to be heavily re-used. Whatever the solution, it will need to support .NET.
My questions then are:
While this feels like a good fit for Actors, I have reservations and wonder if there is something more appropriate? Everything I have tried has come back to a hosted processes of some kind.
If actors are the right way to go, which tech stack would put me closest to what I am trying to achieve with the above concerns taken into account?
IMO (coming at this from a JVM Akka perspective, which is why I changed the akka tag to akka.net; I don't have great knowledge of the CLR side of things), there seems to be a mismatch between
We do not want to clone every message payload into internal state and apply it to the model, so ideally any actor solution used would enable us to work with the original message payload.
and
The actors themselves should be separate processes on the same box (Akka seems to run local Actors in the same CLR instance, which would crash everything on OOM - is this true?)
Assuming that you're talking about separate OS processes, those two are almost certainly mutually incompatible: exchanging messages across processes strongly suggests serialization and is thus isomorphic to a copy operation. It's possible that something using shared memory between OS processes could work, but you may well have to make a choice about which is more important.
Likewise, the parent/child relationship in the "traditional" (Erlang/Akka) style actor model trivially gives you the local cluster of actors (which, since they're running in the same OS process allows the Akka optimization of not copying messages until you cross an OS process boundary), while "virtual actor" implementations as found in Service Fabric or Orleans (or, I'd argue Cloudstate or Lagom) basically assume distribution.
Semantically, the virtual actor models implicitly assume that actors are eternal (though their eternal essence may not always be incarnate). For your use-case, this doesn't necessarily seem to be the case.
I think a cluster of Akka.Net instances with sharded Authority actors spawning shorter-lived child actors best fits, assuming that you're getting OOM issues from trying to process multiple large blocks of data simultaneously. You would have to implement the instance scale-up/down logic yourself.
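For flavor only, here is a minimal sketch of that parent/child shape using the JVM Akka classic Java API (the Akka.NET API mirrors it closely); every actor and message name here is made up for illustration. An Authority parent spawns a short-lived child per transformation job and stops it once the results have been emitted.

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class AuthorityExample {

    // Hypothetical messages.
    record RunTransformation(String jobId, String payload) {}
    record TransformationDone(String jobId, String result) {}

    // Short-lived child: does one transformation, reports back, then is stopped.
    static class TransformWorker extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(RunTransformation.class, job -> {
                    String result = job.payload().toUpperCase(); // stand-in for the real transformation
                    getSender().tell(new TransformationDone(job.jobId(), result), getSelf());
                })
                .build();
        }
    }

    // Parent "Authority" actor: owns its local cluster of workers.
    static class AuthorityActor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(RunTransformation.class, job -> {
                    ActorRef worker = getContext().actorOf(
                        Props.create(TransformWorker.class), "worker-" + job.jobId());
                    worker.tell(job, getSelf());
                })
                .match(TransformationDone.class, done -> {
                    // Emit the change set downstream, then tear the worker down.
                    getContext().stop(getSender());
                })
                .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("authorities");
        ActorRef authority = system.actorOf(Props.create(AuthorityActor.class), "authority-1");
        authority.tell(new RunTransformation("job-1", "large block of data"), ActorRef.noSender());
    }
}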
I have not worked with Akka.net so I can't speak to that at all, but I'd be happy to speak to what you're talking about in a Service Fabric context.
Service Fabric has no issue with the concept of running multiple clusters. In its terminology, the whole of your system would be called an Application and would have a version when deployed to the SF cluster. If you wanted to create multiple instances of it, all you'd need to do is select what you wanted to call the deployed app instance and it'll stand up provisioning for you.
SF has a notion of placement constraints, metric balancing and custom rules that you can utilize if you think you can better balance the various resources than its automatic balancing (or you need to for network DMZ purposes). While I've never personally grouped things down to a single machine, I frequently limit access of services to single VM scale sets (we host in Azure).
To the last point though, you'll still have message size limits, but you can also override them to some degree. In your project containing service interfaces, just set the following attribute above your namespace:
[assembly:FabricTransportRemotingSettings(MaxMessageSize=<(long)new size in bytes>)] and you're good to go.
Services can be configured to run using a Shared or Exclusive process model.
Regarding your state requirement, it's not necessarily clear to me what you're trying to do, but I think you're saying that it's not critical that your actors store any state since they can work from some centrally provided model.
You might then look at volatile state persistence, as it means state is kept in memory for the actors, but should you lose the replicas, nothing is written to disk so it's all lost. Or, if you don't care and are OK just sending the model to the actors for any work, you can configure them to be stateless.
On the other hand, if you're still looking to retain state in the actors and are simply concerned about immutability, rest assured that actor state isn't immutable and can be updated trivially. There are simply order-of-operations concerns to keep in mind: e.g. if you retrieve the state, make a change and save it, 1) you must commit the transaction for it to take effect, and 2) if you modify the state but don't save it, it obviously won't persist - pull a fresh copy in a new transaction for any modifications. There's a whole pile of guidelines here.
Assuming your coordinator is intended to save some sort of state, might I recommend a singleton stateful service. Presumably it's not receiving an inordinate amount of use so a single instance is sufficient and it can easily save state (without the annoyance of identifying which state is on which partition). As for spinning up services, I covered this in the first bullet, but use the ApplicationManager on the built-in FabricClient to set up new applications and the ServiceManager to create instances of necessary services within each.
Service Fabric supports .NET Core 3.1 through .NET 5 as of the latest 8.0 release, though note a minor serialization issue with an easy workaround on .NET 5.
If you have an Azure support subscription, I'd encourage you to write to the team under Development questions and share your concerns. Alternatively, on the third Thursday of each month at 10 AM PST, they also have a community call on Teams that you're welcome to join and you can find past calls here.
Again, I can't speak to whether this is a better fit than Akka.NET, but our stack is built atop Service Fabric. While it has some shortcomings (what framework doesn't?) it's an excellent platform for distributed software development.
Our client follows SOA principles and has designed web services that are very fine-grained, like createCustomer, deleteCustomer, etc.
I am not sure if fine-grained services are desirable, as they create transaction-related issues. For example, say a business requirement is that every Customer must have an Address when it's created. In this case, the presentation component will invoke createCustomer first and then createAddress. The services internally use plain JDBC to update the respective tables in the database. As each service is invoked by an external component, there is no way of fulfilling the transactional requirement here, i.e. if createAddress fails, the createCustomer operation must be rolled back.
I guess one approach to deal with this is to either design coarse-grained services (that create a Customer and the associated Address in one single JDBC transaction), or
perhaps simply create a reversing service (deleteCustomer) that reverses the action of createCustomer.
Any suggestions? Thanks.
The short answer: services should be designed for the convenience of the service client. If the client is told "call this, then don't forget to call that", you're making their lives too difficult. There should be a coarse-grained service.
A long answer: Can a Customer reasonably be entered with no Address? So we call
createCustomer( stuff but no address)
and the result is a valid (if maybe not ideal) state for a customer. Later we call
changeCustomerAddress ( customerId, Address)
and now the persisted customer is more useful.
In this scenario the API is just fine. The key point is that the system's integrity does not depend upon the client code "remembering" to do something, in this case to add the address. However, more likely we don't want a customer in the system without an address, in which case I see it as the service's responsibility to ensure that this happens, and to give the caller the fewest possibilities of getting it wrong.
I would see a coarse-grained createCompleteCustomer() method as by far the best way to go - this allows the service provider to solve the problem once rather than requiring every client programmer to implement the logic.
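To make the idea concrete, here is a minimal sketch of what such a coarse-grained operation might look like with plain JDBC. The table and column names are hypothetical; the point is simply that both inserts share one transaction, so the customer insert is undone if the address insert fails.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

// Hypothetical coarse-grained service: the customer and its address are
// created in one JDBC transaction, so they succeed or fail together.
public class CustomerService {

    private final DataSource dataSource;

    public CustomerService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void createCompleteCustomer(String name, String street, String city) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false); // both inserts belong to the same transaction
            try {
                long customerId;
                try (PreparedStatement insertCustomer = con.prepareStatement(
                        "INSERT INTO CUSTOMER (NAME) VALUES (?)", Statement.RETURN_GENERATED_KEYS)) {
                    insertCustomer.setString(1, name);
                    insertCustomer.executeUpdate();
                    try (ResultSet keys = insertCustomer.getGeneratedKeys()) {
                        keys.next();
                        customerId = keys.getLong(1);
                    }
                }
                try (PreparedStatement insertAddress = con.prepareStatement(
                        "INSERT INTO ADDRESS (CUSTOMER_ID, STREET, CITY) VALUES (?, ?, ?)")) {
                    insertAddress.setLong(1, customerId);
                    insertAddress.setString(2, street);
                    insertAddress.setString(3, city);
                    insertAddress.executeUpdate();
                }
                con.commit(); // customer and address become visible together
            } catch (SQLException e) {
                con.rollback(); // if the address insert fails, the customer insert is undone
                throw e;
            }
        }
    }
}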
Alternatives:
a). There are web services specs for atomic transactions, and major vendors do support these specs. In principle you could actually implement using fine-grained methods and true transactions. Practically, I think you enter a world of complexity when you go down this route.
b). A stateful interface (work, work, commit) as mentioned by @mtreit. Generally speaking, statefulness either adds complexity or obstructs scalability. Where does the service hold the intermediate state? If in memory, then we require affinity to a particular service instance and hence introduce scaling and reliability problems. If in some state or work-in-progress database, then we have significant additional implementation complexity.
OK, let's start:
Our client follows SOA principles and has designed web services that are very fine-grained, like createCustomer, deleteCustomer, etc.
No, the client has forgotten the SOA principles and put up what most people do - a morass of badly defined interfaces. Following SOA principles, the client would have gone for a coarser interface (such as, for example, the OData mechanism to update data) or followed the advice of any book on multi-tiered architecture written in the last 25 years. SOA is just another word for what was invented with CORBA, and all the mistakes SOA dudes make today were basically well-known design stupidities 10 years ago with CORBA. Not that any of the people doing SOA today has ever heard of CORBA.
I am not sure if fine-grained services are desirable, as they create transaction-related issues.
Only for users and platforms not supporting web services. Seriously. Naturally you get transactional issues if you ignore transactional issues in your programming. The trick here is that the people further up the food chain did not; it's just that your client decided to ignore common knowledge (again, see my first remark on CORBA).
The people designing web services were well aware of transactional issues, which is why the web service specifications (WS-*) actually contain mechanisms for handling transactional integrity by moving commit operations up to the client calling the web service. The particular spec your client and you should read is WS-AtomicTransaction.
If you use current technology to expose your web service (a.k.a. WCF on the MS platform; similar technologies exist in the Java world) then you can expose transaction-flow information to the client and let the client handle transaction demarcation. This has its own share of problems - like clients keeping transactions open maliciously - but is still pretty much the only way to handle transactions that do get defined in the client.
As you give no platform and just mention Java, I am pointing you to an MS example of how that can look:
http://msdn.microsoft.com/en-us/library/ms752261.aspx
Web services, in general, are a lot more powerful and a lot more thought out than most people doing SOA ever think about. Most of the problems they see have been solved a long time ago. But then, SOA is just a buzzword for multi-tiered architecture, and most people who think it is the greatest thing since sliced bread just don't even know what was around 10 years ago.
As your customer, I would be a lot more careful about the performance side. Fine-grained, non-semantic web services like he defines are a performance hog for non-casual use, because the number of times you cross the network to ask for / update small stuff means network latency kills you. Creating an order for, say, 10 goods can easily take 30-40 network calls in this scenario, which may really take a lot of time. SOA has preached, ever since the beginning (if you ignore the ramblings of those who don't know history), NOT to use fine-grained calls but to go for a coarse-grained exchange of documents and / or a semantic approach, much like the OData system.
If transactionality is required, a coarser-grained single operation that can implement transaction-semantics on the server is definitely going to be much simpler to implement.
That said, certainly it is possible to construct some scheme where the target of the operations is not committed until all of the necessary fine-grained operations have succeeded. For instance, have a Commit operation that checks some flag associated with the object on the server; the flag is not set until all of the necessary steps in the transaction have completed, and Commit fails if the flag is not set.
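As a rough illustration of that scheme (all names here are hypothetical, not from the question's system), the server could record which fine-grained steps have completed for a pending customer and only let Commit succeed once every required step has been recorded:

import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the server tracks which fine-grained steps have run
// for a pending customer; Commit fails while any required step is missing.
public class PendingCustomerRegistry {

    enum Step { CUSTOMER_CREATED, ADDRESS_CREATED }

    private final Map<Long, Set<Step>> pending = new ConcurrentHashMap<>();

    // Called by each fine-grained operation (createCustomer, createAddress, ...).
    public void recordStep(long customerId, Step step) {
        pending.computeIfAbsent(customerId, id -> ConcurrentHashMap.<Step>newKeySet()).add(step);
    }

    // Called by the client's final Commit request.
    public boolean commit(long customerId) {
        Set<Step> done = pending.get(customerId);
        if (done == null || !done.containsAll(EnumSet.allOf(Step.class))) {
            return false; // the "flag" is not set: a required step is still missing
        }
        pending.remove(customerId);
        // ...here the customer row would be marked committed/visible in the database...
        return true;
    }
}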
Of course, if having light-weight, fine grained operations is an important design requirement, perhaps the need to have transactionality should be re-thought.
My project is converting a legacy fat-client desktop application into the web. The database is not changing as a result. Consequently, we are being forced to call external web services to access data in our own database. Couple this with the fact that some parts of our application are allowed to access the database directly through DAOs (a practice that is much faster and easier). The functionality we're supposed to call web services for is what has been deemed necessary for downstream, dependent systems.
Is this really how SOA is supposed to work? Admittedly, this is my first foray into the SOA world, but I have to think this is the complete wrong way to go about this.
I agree that it's the wrong approach. Calling your own database via a webservice should raise red flags in a design review, and a simple DAO is the way to go (KISS principle).
Now, if it's data that truly needs to be shared across your company (accounts, billing, etc) THEN it's time to consider a more heavy-duty solution such as SOAP or REST. But your team could still access it directly, which would be faster.
My team had the same thing happen with a web service that we wanted to call in batch mode. Rather than call our own SOAP endpoint, we instead set it up to call a POJO (plain old java object) interface. There's no XML transformation or extra network hop through an SOA appliance.
It's overkill to put an XML interface between MVC layers when your team owns the whole application. It may not be traditional SOA... but IMO it's traditional common sense. ;)
I've seen people try to jam SOA at too low a level and this may be such a case. I would certainly not equate DAO and SOA at the same level.
I agree with @ewernli
What is SOA "in plain English"?
IMHO, SOA makes sense only at the enterprise-level, and means nothing for a single application.
If I'm reading your question correctly, your web services are for C/R/U/D of data into the database. If so, providing C/R/U/D services directly against the database and its tables is likely too low-level for them to be SOA services.
I'd look for services at a higher level and try to determine whether they are interesting to the enterprise. If so, those are your services. I'd also ask myself whether my former desktop app is providing services (i.e. should you be looking to make your new app an SOA service itself rather than trying to force an SOA architecture into the desktop app at a low level).
Consequently, we are being forced to call external web services to access data in our own database.
Man, that gotta hurt. As far as services in SOA go,
a service is a repeatable logical manifestation of a business task - that means you are not implementing SOA if you are not 'service enabling' business processes. If you are putting up some web services to select data out of your database, all you've got is a bunch of web services, which will slow down applications that could have been faster with conventional data access patterns (like DAOs).
When you equate SOA with Web services there is a risk of replacing existing APIs with Web services without proper architecture. This will result in identifying many services that are not business aligned.
Also, service orientation is a way of integrating a business as a group of linked services - so ask yourself is the organization making use of these atomic services to achieve further benefits?
Do a google search for SOA anti-patterns and you will find what are the different ways to end up with a pile of web-services instead of SOA.
SOA... SOA... is the bane of my existence, for just this reason. What, or what not, constitutes SOA? I support SOA products in my day job, and some people get it, some don't. SOA.. SOA is about wrapping discrete business services in XML. ZIP+4 validation services. Payment gateways. B2B messaging.
SOA CAN be used to decouple desktop apps from backend databases. Sometimes it doesn't make sense, sometimes it does. What almost NEVER makes sense is low-latency, high-query-count logic. If you ever have to use an application in France directly connected to a database in California, you'll get what I mean. SOA pretty much forces you to think smartly about how you model and return your data (look into SDO - Service Data Objects). The devil's in the details, though. Marshalling data to/from XML can be costly.
Good SOA design is all about separation of behavior and data.
I repeat: behavior and data need to be separate, or else you will have lots of problems, whether it's CORBA/SOAP/REST/XML-RPC or even plain old in-the-same-JVM method calls.
Lots of people will talk about service endpoints, message handling, and contracts, making SOA one of the more soporific areas of computing when it's surprisingly not complicated.
If you are doing Java it's really easy. Make POJOs for your domain objects with no weird state behavior and no weird collaborators, and then make service classes with the behavior. More often than not you can just use your DAO as the service (I mean, you should have a thin layer over the DAO, but if you don't need one....). A rough sketch of the split follows below.
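A minimal sketch of that separation, with hypothetical names (the real DAO would talk to the database): the domain object carries data only, and the service class carries the behavior as a thin layer over the DAO.

// Domain object: plain data, no collaborators, no hidden state machine.
public class Customer {
    private final long id;
    private final String name;

    public Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    public long getId() { return id; }
    public String getName() { return name; }
}

// Hypothetical DAO interface; the real one would do the JDBC/ORM work.
public interface CustomerDao {
    Customer findById(long id);
    void save(Customer customer);
}

// Behavior lives in a service class, typically a thin layer over the DAO.
public class CustomerRenameService {
    private final CustomerDao dao;

    public CustomerRenameService(CustomerDao dao) {
        this.dao = dao;
    }

    public Customer rename(long customerId, String newName) {
        Customer current = dao.findById(customerId);
        Customer renamed = new Customer(current.getId(), newName);
        dao.save(renamed);
        return renamed;
    }
}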
OOP lovers will disagree with this separation of data and behavior, but this design pattern scales extremely well and is in fact what most functional programming languages like Erlang do.
That being said if you are making a video game or something very state based then this design philosophy is a bad idea. BTW SOA is about as vacuous as the term enterprise.
Which part do you think is wrong? The part that you have to hit the web service, or the part you are hitting the database directly?
SOA is more of an API design guideline, not a development methodology. It's not an easy thing to implement, but the reward of reusability is often worth it.
See Service-Oriented Architecture expands the vision of Web services, or any technical book on SOA. Simply wrapping function calls with web calls does not make a Service-Oriented Architecture. The idea of SOA is to make reusable services, and then to make higher-level services (like a website) by compositing or orchestrating the underlying low-level services. At the very low level, you should focus on things like statelessness, loose coupling, and granularity. Modern frameworks like Microsoft's WCF support wire protocols like SOAP, REST, and faster binary protocols side by side.
If your application is designed to run over the Internet, you should be mindful of network latency issues. In a traditional client-server application deployed on a LAN, because the latency is sub-10 msec, you can hit the database every time you need data without interrupting the user experience. On the Internet, however, it is not uncommon to have 200 msec latency if you go across proxies or oceans. If you hit the database 100 times, that adds up to 20 seconds of pauses. In SOA, you would try to pack the whole thing into a single document, and you exchange the document back and forth, similar to the way tax is filed using Form 1040 if you live in the US.
You may say that the latency issue is irrelevant because the web service is only consumed by your web application layer. But you could hit the web service from the browser using AJAX to reload the data, which should give the user a shorter response time.
Let's suppose I have a large middleware infrastructure mediating requests between several business components (customer applications, network, payments, etc). The middleware stack is responsible for orchestration, routing, transformation and other stuff (similar to the Enterprise Integration Patterns book by Gregor Hohpe).
My question is: is it good design to put some business logic on the middleware?
Let's say my app A requests some customer data from the middleware, but in order to get this data, I have to supply a customer id and some other parameter. Should the fetching of this other parameter be done by the requesting app, or is the middleware responsible for 'facilitating' and providing an interface that receives customer ids and internally fetches the other parameter?
I realize this is not a simple question (because of the definition of business logic), but I was wondering whether there is a general approach or some guidelines.
Apart from the routing, transformation and orchestration, performance should be kept in mind when loading middleware with functional requirements. Middleware should take only a fraction of the entire end-to-end transaction lifetime. This can be achieved only by concentrating on the middleware's core functionalities, rather than trying to complement the host systems' functionalities.
This is the "Composite Application" pattern; the heart of a Service Oriented Architecture. That's what the ESB vendors are selling: a way to put additional business logic somewhere that creates a composite application out of existing applications.
This is not simple because your composite application is not just routing. It's a proper new composite transaction layered on top of the routing.
Hint. Look at getting a good ESB before going too much further. This rapidly gets out of control and having some additional support is helpful. Even if you don't buy something like Sun's JCAPS or Open ESB, you'll be happy you learned what it does and how they organize complex composite applications.
Orchestration, Routing and Transformation.
You don't do any of these for technical reasons, at random, or just for fun, you do these because you have some business requirement -- ergo there is business logic involved.
The only thing you are missing for a complete business system is calculation and reporting (let us assume you already have security in place!).
Except for very low level networking, OS and storage issues almost everything that comprises a computer system is there because the business/government/end users wants it to be there.
The choice of 'Business Logic' as terminology was very poor and has led to endless distortions of design and architecture.
What most good designers/architects mean by business logic is calculation and analysis.
If you "%s/Business Logic/Calculation/g" most of the architectural edicts make more sense.
The middleware application should do it. System A should have no idea that the other parameter exists, and will certainly have no idea about how to get it.
When starting a new ASP.NET application, with the knowledge that at some point in the future it must scale, what are the most important design decisions that will allow future scalability without wholesale refactoring?
My top three decisions are:
Disabling session state, or storing it in a database.
Storing as little as possible in session state.
Good N-Tier Architecture. Separating business logic and using web services instead of directly accessing DLLs ensures that you can scale out both the business layer and the presentation layer. Your database will likely be able to handle anything you throw at it, although you can probably cluster that too if needed.
You could also look at partitioning data in the database too.
I have to admit though I do this regardless of whether the site has to scale or not.
These are our internal ASP.NET Do's and Don'ts for massively visited web applications:
General Guidelines
Don't use Sessions - SessionState=Off
Disable ViewState completely - EnableViewState=False
Don't use any of the complex ASP.NET UI controls; stick to basic ones (DataGrid vs. a simple Repeater)
Use the fastest and shortest data access mechanisms (stick to SqlDataReaders on the front site)
Application Architecture
Create a caching manager with an abstraction layer. This will allow you to replace the simple System.Web.Cache with a more complex distributed caching solution in the future when you start scaling your application.
Create a dedicated I/O manager with an abstraction layer to support future growth (S3 anyone?)
Build timing tracing into your main pipelines which you can switch on and off; this will allow you to detect bottlenecks when they occur.
Employ a background processing mechanism and move onto it whatever is not required for rendering the current page, for it to chew on.
Better yet - consider firing events from your application to other applications so they can do that async work.
Prepare for database scalability: place your own layer so that you can later decide whether to partition your database or, alternatively, work with several read servers in a master-slave scenario.
Above all, learn from others successes and failures and stay positive.
Ensure you have a solid caching policy for transient / static data. Database calls are expensive especially with separate physical servers so be aggressive with your caching.
There are so many considerations, that one could write a book on the subject. In fact, there is a great book and it is free. ;-)
Microsoft has released Improving .NET Application Performance and Scalability as a PDF eBook.
It is worth reading cover to cover, if you don't mind the droll writing style. Not only does it identify key performance scenarios, it also covers establishing benchmarks, measuring performance, and how to apply what you learn.