I'm reading all over the net that you should separate your "external schemas" from your "internal schemas" and never expose the "internal schemas" to any external actor.
If my solution only acts as a message bus to create loose coupling between 2 existing systems, will I really need any internal schemas?
System A makes a Request (message with SchemaA) to BizTalk
BizTalk maps SchemaA to SchemaB
BizTalk forwards the request of type SchemaB to System B
System B returns ResponseB
BizTalk maps ResponseB to ResponseA
BizTalk routes the result back to System A
I can't see the pros of having an internal schema and mapping:
SchemaA -> SchemaInternal -> SchemaB
?
The term canonical schema is often used to describe the creation of schemas internal (SchemaInternal in your last example) to an integration mechanism such as BizTalk.
Use of canonical schemas is widely regarded as a best practice, as it decouples your BizTalk flow-control mapping from any 'other' system's schemas (the other system here could be internal to your organisation or external to it, e.g. a supplier, customer or partner system). This way, if any of the systems integrated via BizTalk change, only the external schemas and the maps to the canonical schemas need to change. It also prevents foreign conventions, naming and hierarchy differences inherent in external schemas from leaking into your internal BizTalk artefacts.
Generally, transformation of incoming messages to a canonical schema is done as early as possible, e.g. in a receive port map; similarly, transformation out of the canonical schema is done as late as possible, e.g. in a send port map.
A common scenario for Canonical Schemas (CS) is where a single orchestration or message flow is common to multiple trading parties (e.g. you may have many suppliers with different systems, yet all of them submit invoices for processing). In this case, each new supplier system just needs to be integrated with your CS; no new processing logic needs to be added or duplicated, so CS can actually reduce the overall effort in such instances. (This is the n x m problem, explained in detail here: with n sending and m receiving systems, point-to-point integration needs up to n x m maps, whereas a canonical schema needs only n + m.) Another example of where CS are vital is where your business IS the switching of messages, e.g. a medical industry switch will have many doctor and practice systems sending authorisation requests and invoices, and these need to be mapped and routed to multiple medical fund (medical aid) systems.
And FWIW:
IMO CS make the most sense when BizTalk is the end-to-end solution in an EAI or ESB scenario, e.g. direct integration of 2 or more line-of-business systems. Otherwise, if BizTalk is just one endpoint on a larger corporate ESB, then it probably makes sense to use the corporate ESB schemas internally, and hence map external schemas directly to the ESB schemas (i.e. no need for another set of CS within BizTalk, provided that you have a good change management / version control mechanism across your enterprise).
If standard schemas (e.g. EDIFACT) exist for your industry, it is moot whether it is a goal to adopt these as internal CS. In general they may conflict with the meaning of canonical as being 'simple', as industry schemas often need to be verbose in order to model all flavours and 'edge cases' of the document. Personally I would ensure that I have mappings to / from said industry schemas, but would use a custom schema internally.
In the described solution you have no need for internal schemas. You could hide the schemas of System X from users of System Y, but that is not so important.
In this context, External = Public, meaning outside your organization.
The guidance is to protect internal implementation details, naming conventions and such, from others.
If both System A and System B are inside your organization then 'security' is less of an issue but your application can still offer an 'external' schema to consumers in order to protect them from internal changes to your application.
Related
I have a REST API that will be facilitating CRUD from multiple databases. These databases all represent the same data for different locations within the organization (i.e. we have 20 or so implementations of a software package and we want to read from all of the supporting databases via one API).
I was wondering what the "best practice" would be for deciding which database to access resources from?
For example, right now in my request headers I have a custom "X-" header that would represent the database id. Unfortunately, this sort of thing feels a bit like a workaround.
I was thinking of a few other options:
I could bake the Database Id into the URI (/:db_id/resource/...)
I could modify the Accept Header like someone would with an API version
I could split up the API to be one service per database
Would one of the aforementioned options be considered "better" than the others, and if not what is considered the "best" option for this sort of architecture?
I am, at the moment, using ASP.NET Web API 2.
These databases all represent the same data for different locations within the organization
I think this is the key to your answer: you don't want to expose internal implementation details (like database IDs) outside your API. What if you consolidate, or change your internal implementation, one day?
However, this sentence reveals a distinction that is meaningful to the business - the location.
So - I'd make the location part of the URI:
/api/location/{locationId}/resource...
Then map the locationId internally to a database ID. LocationId could also be a name, or a code, or something unique that would be meaningful to the API client.
Then - if you later consolidate multiple locations to the same database or otherwise change your internal implementation, the clients don't have to change.
In addition, whoever is configuring the client applications, can do so thinking about something meaningful to the business - the location they are interested in.
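For illustration, here is a minimal sketch of that approach in ASP.NET Web API 2 (which the question mentions); the controller, route, lookup table and connection strings are all hypothetical:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Web.Http;

    public class CustomersController : ApiController
    {
        // Hypothetical lookup from a business-meaningful location ID to the
        // connection string of the database backing that location.
        private static readonly Dictionary<string, string> LocationDatabases =
            new Dictionary<string, string>
            {
                { "london",  "Server=db01;Database=CustomersLondon;Integrated Security=true" },
                { "newyork", "Server=db02;Database=CustomersNewYork;Integrated Security=true" }
            };

        [Route("api/location/{locationId}/customers/{id:int}")]
        public IHttpActionResult GetCustomer(string locationId, int id)
        {
            string connectionString;
            if (!LocationDatabases.TryGetValue(locationId, out connectionString))
                return NotFound(); // unknown location; no internal database IDs leak out

            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                // ... query this location's database and map the row to a DTO ...
            }
            return Ok();
        }
    }

If locations are later consolidated onto fewer databases, only the lookup table changes; the location-based URIs the clients use stay the same. (Attribute routing requires config.MapHttpAttributeRoutes() in the Web API configuration.)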
I am trying to understand the definitions in this document.
http://www.opengroup.org/soa/source-book/ontologyv2/service.htm
Their definitions of service, service interface and service contract are either unclear or seem different from what I normally encounter.
Service:
“A service is a logical representation of a repeatable activity that has a specified outcome. It is self-contained and is a ‘black box’ to its consumers.”
Let's say I have a WCF project and it has two operations:
StoreFront
+GetPrice
+AddToCart
The definition says "a repeatable activity". So is the service StoreFront? Or do I have two services (GetPrice and AddToCart)?
Service Contract:
Has an "effect" class. Is the effect "return price" and "added to cart"?
From the same article:
“A capability offered by one entity or entities to others using well-defined ‘terms and conditions’ and interfaces.” (Source: OMG SoaML Specification - my italics)
This is, in my opinion, a preferable definition to the one talking about "repeatable activities".
The key word in the definition is capability. Capability refers to Business Capability which is a carry-over from the BPM industry, but in an SOA context refers to a business domain with distinct boundaries.
So from this definition we can surmise that services should be exposed, or should operate, within a business capability/process boundary. This leads us towards the idea (from the principles, or tenets, of SOA) that services should be autonomous within well-defined boundaries.
In your example, you are asking
So is the service StoreFront? Or do I have two services (GetPrice and AddToCart)?
The answer to that as always is "it depends". However, generally Pricing (GetPrice) would belong to a different business capability to Ordering (AddToCart). Additionally, the operations differ in some other important ways:
GetPrice is a read operation, while AddToCart is a write operation.
GetPrice is a synchronous operation, while AddToCart could very well be asynchronous
So from these we should probably assume that they are two different services from a business perspective.
This assumption has some radical repercussions. If they are two services, then according to SOA they should be autonomous. That means we should be looking to minimize coupling between the services in every possible way, so that as much as possible they can be planned, developed, tested, built, deployed, hosted, supported, and managed as separate concerns.
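To make that concrete, here is a rough sketch of how the split might look as two separate WCF service contracts (the interface names and signatures are my own assumptions, not taken from the article):

    using System.ServiceModel;

    // Pricing: a read-only capability with its own boundary.
    [ServiceContract]
    public interface IPricingService
    {
        // Synchronous request/reply: the caller waits for the price.
        [OperationContract]
        decimal GetPrice(string productId);
    }

    // Ordering: a separate capability that changes state.
    [ServiceContract]
    public interface IOrderingService
    {
        // IsOneWay = true makes this fire-and-forget, so the service
        // can process it asynchronously.
        [OperationContract(IsOneWay = true)]
        void AddToCart(string cartId, string productId, int quantity);
    }

Each contract can then be versioned, deployed, and scaled independently of the other.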
Another repercussion is that when you physically separate services to this extent, how can you show this stuff together to your users? They may be different capabilities but they still need to work together on the screen.
Additionally, from a back end perspective Ordering needs to know about Pricing data, otherwise how can order fulfillment happen? If you've separated the database into two, how can the Checkout service know how much stuff costs, what discounts to apply, etc?
I have posted about this stuff before, so please feel free to have a read. I would recommend reading the excellent article on Microservices by Lewis and Fowler also.
A lot of our use cases for BizTalk involve simply mapping and routing HL7 2.x messages from one system to another. Implementing maps and associating them with send/receive ports is generally straightforward, but we also need to do some content-based filtering on the sending side.
For example, we may want to send ADT A04 and ADT A08 messages to System X only if the sending facility is one of 200 facilities (out of a possible 1000 facilities we have in our organization), but System Y needs ADT A04, A05, and A08 for a totally different set of facilities, and only for renal patients.
Because we're just routing messages and not really managing business processes here, utilizing orchestrations for the sole purpose of calling out to the business rule engine is a little overkill, especially considering that we'd probably need a separate orchestration for each ADT type because of how the schemas work. Is it possible to implement filter rules like this without using orchestrations? The filters functionality of send ports looks a little too rudimentary for what we need, but at the same time I'd rather not develop and manage orchestrations.
You might be able to do this with property schemas...
You need to create a property schema and include the properties (from the other schemas) that you want to use for routing. Once you deploy the schema, those properties will be available for use as a filter in the send port. Start from here, you should be able to find examples somewhere...
As others have suggested you can use a custom pipeline component to call the Business Rules Engine.
And rather than trying to create your own, there is already an open source one available, called the BizTalk Business Rules Engine Pipeline Framework.
By calling BRE from the pipeline you can create complex rules which then set simple context properties on which you can route your messages.
Full disclosure: I've worked with the author of that framework when we were both at the same company.
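As a rough sketch of the promotion step such a component performs (the property name and namespace below are hypothetical, and the other pipeline-component interfaces are omitted for brevity):

    using Microsoft.BizTalk.Component.Interop;
    using Microsoft.BizTalk.Message.Interop;

    public class RoutingPropertyComponent : IComponent
    {
        public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
        {
            // ... evaluate your rules (e.g. via a BRE call) against the message ...
            bool sendToSystemY = true; // placeholder for a real rule result

            // Promote a simple context property so send port filters can route on it.
            pInMsg.Context.Promote(
                "SendToSystemY",                          // hypothetical property name
                "https://example.org/routing-properties", // hypothetical property schema namespace
                sendToSystemY);

            return pInMsg;
        }
    }

A send port filter can then simply test SendToSystemY == true, which keeps the routing logic out of orchestrations. (As the previous answer notes, the property must also exist in a deployed property schema.)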
Our client follows SOA principles and has designed web services that are very fine-grained, like createCustomer, deleteCustomer, etc.
I am not sure that fine-grained services are desirable, as they create transaction-related issues. For example, suppose a business requirement is that every Customer must have an Address when it's created. In this case, the presentation component will invoke createCustomer first and then createAddress. The services internally use simple JDBC to update the respective tables in the db. As each service is invoked by an external component, there is no way to fulfill the transactional requirement here, i.e. if createAddress fails, the createCustomer operation must be rolled back.
I guess one approach to dealing with this is to either design coarse-grained services (that create a Customer and associated Address in one single JDBC transaction) or
perhaps simply create a reversing service (deleteCustomer) that reverses the action of createCustomer.
Any suggestions? Thanks.
The short answer: services should be designed for the convenience of the service client. If the client is told "call this, then don't forget to call that", you're making their lives too difficult. There should be a coarse-grained service.
A long answer: Can a Customer reasonably be entered with no Address? So we call
createCustomer( stuff but no address)
and the result is a valid (if maybe not ideal) state for a customer. Later we call
changeCustomerAddress ( customerId, Address)
and now the persisted customer is more useful.
In this scenario the API is just fine. The key point is that the system's integrity does not depend upon the client code "remembering" to do something, in this case to add the address. However, more likely we don't want a customer in the system without an address, in which case I see it as the service's responsibility to ensure that this happens, and to give the caller the fewest possibilities of getting it wrong.
I would see a coarse-grained createCompleteCustomer() method as by far the best way to go; this allows the service provider to solve the problem once rather than requiring every client programmer to implement the logic.
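A minimal sketch of that coarse-grained shape, here in C#/ADO.NET rather than the question's JDBC (the table and column names are made up); both inserts commit or roll back together:

    using System.Data.SqlClient;

    public class CustomerOnboardingService
    {
        public void CreateCompleteCustomer(string name, string street, string city)
        {
            using (var connection = new SqlConnection("...")) // connection string elided
            {
                connection.Open();
                using (var transaction = connection.BeginTransaction())
                {
                    try
                    {
                        var insertCustomer = new SqlCommand(
                            "INSERT INTO Customer (Name) OUTPUT INSERTED.Id VALUES (@name)",
                            connection, transaction);
                        insertCustomer.Parameters.AddWithValue("@name", name);
                        var customerId = (int)insertCustomer.ExecuteScalar();

                        var insertAddress = new SqlCommand(
                            "INSERT INTO Address (CustomerId, Street, City) VALUES (@id, @street, @city)",
                            connection, transaction);
                        insertAddress.Parameters.AddWithValue("@id", customerId);
                        insertAddress.Parameters.AddWithValue("@street", street);
                        insertAddress.Parameters.AddWithValue("@city", city);
                        insertAddress.ExecuteNonQuery();

                        // Either the customer and its address both exist, or neither does.
                        transaction.Commit();
                    }
                    catch
                    {
                        transaction.Rollback();
                        throw;
                    }
                }
            }
        }
    }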
Alternatives:
a). There are web services specs for Atomic Transactions, and major vendors do support these specs. In principle you could actually implement using fine-grained methods and true transactions. Practically, though, I think you enter a world of complexity when you go down this route.
b). A stateful interface (work, work, commit), as mentioned by @mtreit. Generally speaking, statefulness either adds complexity or obstructs scalability. Where does the service hold the intermediate state? If in memory, then we require affinity to a particular service instance and hence introduce scaling and reliability problems. If in some state or work-in-progress database, then we have significant additional implementation complexity.
OK, let's start:
Our client follows SOA principles and has designed web services that are very fine-grained, like createCustomer, deleteCustomer, etc.
No, the client has forgotten to reach the SOA principles and has put up what most people do: a morass of badly defined interfaces. For SOA principles, the client would have gone for a coarser interface (such as, for example, the OData mechanism to update data) or followed the advice of any book on multi-tiered architecture written in roughly the last 25 years. SOA is just another word for what was invented with CORBA, and all the mistakes SOA dudes make today were basically well-known design stupidities 10 years ago with CORBA. Not that any of the people doing SOA today have ever heard of CORBA.
I am not sure that fine-grained services are desirable, as they create transaction-related issues.
Only for users and platforms not supporting web services. Seriously: naturally you get transactional issues if you ignore transactional issues in your programming. The trick here is that the people further up the food chain did not; it's just that your client decided to ignore common knowledge (again, see my first remark on CORBA).
The people designing web services were well aware of transactional issues, which is why the web service specifications (WS-*) actually contain mechanisms for handling transactional integrity by moving commit operations up to the client calling the web service. The particular spec your client and you should read is WS-AtomicTransaction.
If you use current technology to expose your web service (i.e. WCF on the MS platform; similar technologies exist in the Java world) then you can expose transaction flow information to the client and let the client handle transaction demarcation. This has its own share of problems, such as clients keeping transactions open maliciously, but it is still pretty much the only way to handle transactions that are demarcated in the client.
As you give no platform and just mention Java, I am pointing you to an MS example of how that can look:
http://msdn.microsoft.com/en-us/library/ms752261.aspx
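The shape of that example in WCF is roughly the following (a minimal sketch of my own; the service and operation names are invented):

    using System.ServiceModel;

    [ServiceContract]
    public interface ICustomerService
    {
        // Allows a transaction started by the client to flow into the call.
        [OperationContract]
        [TransactionFlow(TransactionFlowOption.Allowed)]
        void CreateCustomer(string name);

        [OperationContract]
        [TransactionFlow(TransactionFlowOption.Allowed)]
        void CreateAddress(string customerName, string street);
    }

    public class CustomerService : ICustomerService
    {
        // TransactionScopeRequired enlists the operation in the flowed
        // transaction; if the client aborts, this work rolls back too.
        [OperationBehavior(TransactionScopeRequired = true)]
        public void CreateCustomer(string name) { /* write to the Customer table */ }

        [OperationBehavior(TransactionScopeRequired = true)]
        public void CreateAddress(string customerName, string street) { /* write to the Address table */ }
    }

The client then wraps both calls in a System.Transactions.TransactionScope and calls scope.Complete() only when both succeed; this also requires a binding with transaction flow enabled, e.g. wsHttpBinding with transactionFlow="true".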
Web services, in general, are a lot more powerful and a lot more thought out than most people doing SOA ever realise. Most of the problems they see were solved a long time ago. But then, SOA is just a buzzword for multi-tiered architecture, and most people who think it is the greatest thing since sliced bread just don't even know what was around 10 years ago.
As your customer, I would be a lot more careful about the performance side. Fine-grained, non-semantic web services like the ones he defines are a performance hog for non-casual use, because the number of times you cross the network to ask for / update small pieces of data means the network latency kills you. Creating an order for, say, 10 goods can easily take 30-40 network calls in this scenario, which may really take a lot of time. SOA has preached, ever since the beginning (if you ignore the ramblings of those who don't know history), NOT to use fine-grained calls but to go for a coarse-grained exchange of documents and / or a semantic approach, much like the OData system.
If transactionality is required, a coarser-grained single operation that can implement transaction-semantics on the server is definitely going to be much simpler to implement.
That said, certainly it is possible to construct some scheme where the target of the operations is not committed until all of the necessary fine-grained operations have succeeded. For instance, have a Commit operation that checks some flag associated with the object on the server; the flag is not set until all of the necessary steps in the transaction have completed, and Commit fails if the flag is not set.
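Sketched very roughly (every name here is hypothetical), that flag-based scheme could look like:

    using System;

    public class CustomerDraft
    {
        public bool AddressAdded { get; set; }
        public bool PaymentDetailsAdded { get; set; }

        // The flag from the description above: true only once all of the
        // necessary fine-grained steps have completed.
        public bool ReadyToCommit
        {
            get { return AddressAdded && PaymentDetailsAdded; }
        }
    }

    public class CustomerCommitService
    {
        public void Commit(CustomerDraft draft)
        {
            if (!draft.ReadyToCommit)
                throw new InvalidOperationException(
                    "Cannot commit: not all fine-grained steps have succeeded.");
            // ... only now persist the draft as a real, visible customer ...
        }
    }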
Of course, if having lightweight, fine-grained operations is an important design requirement, perhaps the need for transactionality should be rethought.
Maybe I just expected "three-tier architecture" to deliver a little more than just a clean separation of responsibilities in the source code (see here)...
My expectations of such a beast, one that can safely call itself a "three-tier architecture", are a lot higher... so, here they are:
If you were to build something like a "three-tier architecture" system, but this time with these additional requirements and constraints:
Up and running at all times from a user's point of view (except when the UI gets replaced); when other parts of the system are down, the UI has to handle that
Never get into an undefined state, or one from which the system cannot recover automatically
The system has to be "pausable"
The middle-tier has to contain all the business logic
Obviously using an underlying Database, itself in the data-tier (if you like)
The business logic can use a big array of core services (here in the data-tier, not directly accessible by the UI, only through business logic tier facade)
Can be unavailable at times
Can be available as many parallel running, identical processes
The UIs may not contain any state other than the session (in the case of web UIs) and possibly transient view-backing models
Presentation-tier, logic-tier and data/core-services-tier have to be scalable independently
The only thing you can take for granted is the network
Note: The mentioned "core services" are heavy-weight components that access various external systems within the enterprise. An example would be the connection to an Active Directory or to a "stock market ticker"...
1. How would you do it?
If you don't have an answer right now, maybe read on and let me know what you think about this:
Sync considered harmful. It ties your system together in a bad way (think: "weakest link"). A thread sits blocked while waiting for a timeout, and that is not easy to recover from.
Use asynchronous messaging for all inter-process communication (between all tiers); a minimal sketch follows this list. This allows you to suspend the system any time you like, and when part of the system is down, no timeout happens.
Have a central routing component through which all requests get routed and with which core services can register themselves.
Add a heartbeat component that can, e.g., inform the UI that a component is not currently available.
State is a necessary evil: allow no state other than in the business logic tier. This way the beast becomes manageable. While the core services might well need to access data themselves, all that data should be fed in by the calling middle tier. This way the core services can be implemented in a fire-and-forget fashion.
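As a minimal illustration of that fire-and-forget messaging style (using the classic System.Messaging/MSMQ API; the queue path and message body are made up):

    using System.Messaging;

    public static class BusinessTierClient
    {
        public static void SubmitRequest(string request)
        {
            // The sender returns immediately; if the middle tier is down,
            // the message waits in the queue instead of causing a timeout.
            using (var queue = new MessageQueue(@".\Private$\businessRequests"))
            {
                queue.Send(request);
            }
        }
    }

A receiver in the business-logic tier drains the queue at its own pace, which is also what makes the system pausable.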
2. What do you think about this "solution"?
I think that, in the real world, high-availability systems are implemented using fail-over: for example, it isn't that the UI can continue to work without the business layer, instead it's that if the business layer becomes unavailable then the UI fails over to using a backup instance of the business layer.
Apart from that, they might operate using store-and-forward: e.g. a mail system might store a piece of mail, and retransmit it periodically, if it can't deliver it immediately.
Yep, it's the way most large websites do it. Look at NoSQL databases, Google's Bigtable architecture, etc.
1. This is the general approach I'd take.
I'd use a mixture of memcached, a NoSQL cloud (CouchDB or MongoDB) and an enterprise-grade RDBMS (the core data storage) for the data layer. I'd then write the service layer on top of the data layer. NoSQL database APIs are massively parallel (look at CouchDB with its nginx service-layer parallelizer). I'd then provide old-school "each request is a web page" web servers, and also direct access to the service layer for new-style AJAX applications; both of these would depend on the service layer.
P.S. The RDBMS is an important component here: it holds the authoritative copy of all the data in the memcached/NoSQL cloud. I would use an enterprise-grade RDBMS to do data-centre to data-centre replication. I don't know how the big boys do their cloud-based site replication; it would scare me if they did data-cloud to data-cloud replication :P
Some points:
You do not need a heartbeat; with NoSQL the approach taken is that if content becomes unavailable, you regenerate it onto another server using the authoritative copy of the data.
The burden of stateless web design is carried by the NoSQL and memcached layer, which is infinitely scalable, so you do not need to worry about this. Just have a good network infrastructure.
In terms of sync, when you are talking to the RDBMS you can expect acceptable synchronous response times. Your cloud you should treat as an asynchronous resource; you will get help from the APIs that interface with your cloud, so you don't even have to think about this.
The advice I can give about networking and redundancy is this: do not go for fancy Ethernet bonding, as it's not worth it; things always go wrong. Just set up redundant switches and Ethernet cards, and have multiple routes to all your machines. You can use OpenBSD and CARP for your routers, as they work great; routers are your worst point of failure, and OpenBSD solves that problem.
2. You've described the general components of a web 2.0 farm, so no comment :D