I have a REST API that will be facilitating CRUD against multiple databases. These databases all represent the same data for different locations within the organization (i.e. we have 20 or so implementations of a software package and we want to read from all of the supporting databases via one API).
I was wondering what the "best practice" would be for specifying which database to access resources from?
For example, right now in my request headers I have a custom "X-" header that would represent the database id. Unfortunately, this sort of thing feels a bit like a workaround.
I was thinking of a few other options:
I could bake the Database Id into the URI (/:db_id/resource/...)
I could modify the Accept Header like someone would with an API version
I could split up the API to be one service per database
Would one of the aforementioned options be considered "better" than the others, and if not what is considered the "best" option for this sort of architecture?
I am, at the moment, using ASP.NET Web API 2.
These databases all represent the same data for different locations within the organization
I think this is the key to your answer - you don't want to expose internal implementation details (like database IDs etc.) outside your API. What if you consolidate, or change your internal implementation, one day?
However, this sentence reveals a distinction that is meaningful to the business - the location.
So - I'd make the location part of the URI:
/api/location/{locationId}/resource...
Then map the locationId internally to a database ID. LocationId could also be a name, or a code, or something unique that would be meaningful to the API client.
Then - if you later consolidate multiple locations to the same database or otherwise change your internal implementation, the clients don't have to change.
In addition, whoever is configuring the client applications, can do so thinking about something meaningful to the business - the location they are interested in.
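As a rough sketch of how that mapping could look in ASP.NET Web API 2 with attribute routing (the location table, the OrdersController, and the route names here are illustrative, not something from your system):

    using System;
    using System.Collections.Generic;
    using System.Web.Http;

    [RoutePrefix("api/location/{locationId}")]
    public class OrdersController : ApiController
    {
        // Hypothetical mapping from a business-meaningful location id to an internal connection string.
        private static readonly Dictionary<string, string> ConnectionStringsByLocation =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
            {
                { "london",  "Server=db01;Database=Orders_London;Integrated Security=True" },
                { "newyork", "Server=db02;Database=Orders_NewYork;Integrated Security=True" }
            };

        [Route("orders/{orderId:int}")]
        public IHttpActionResult GetOrder(string locationId, int orderId)
        {
            string connectionString;
            if (!ConnectionStringsByLocation.TryGetValue(locationId, out connectionString))
                return NotFound(); // unknown location; internal database names never leak to the client

            // Open a connection against the resolved database and load the resource here;
            // the data access itself is omitted from this sketch.
            return Ok(new { locationId, orderId });
        }
    }

If locations are later consolidated or moved, only this lookup changes; the URIs the clients use stay the same.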
Related
I’m redesigning the REST API for a small SaaS I built. Currently there’s a route /entries that doesn’t require any authentication. However, if the client authenticates with sufficient privileges, the server will send additional information (ex: the account associated with each entry).
The main problem I see with this is that a client attempting to request protected data with insufficient privileges will still receive a 200 response, but without the expected data, instead of a 401 Unauthorized.
The alternatives I came up with are:
Split the endpoint into two endpoints, ex /entries and /admin/entries. The problem with this approach is that there are now two different endpoints for essentially the same resource. However, it has the advantage of being easy to document with OpenAPI. (Additionally, it allows for the addition of a /entries/:id/account endpoint.)
Accept a query parameter ?admin=true. This option is harder to document. On the other hand, it avoids having multiple URIs for a single entry.
Is there a standard way to structure something like this?
Related question: Different RESTful representations of the same resource
The alternatives I came up with are
Note that, as far as HTTP/REST are concerned, your two alternatives are the same: in both cases you are introducing a new resource.
The fact that in one case you use path segments to distinguish the two identifiers and in the other case you are using the query part doesn't change the fact that you have two resources.
Having two resources with the same information is fine - imagine two web pages built from the same information.
It's a trade-off - the HTTP application isn't going to know that these resources have common information, and so won't know that invalidating one cached resource should also invalidate the other. So just like with web pages, you can get into situations where the representations that you have in your cache aren't consistent with each other.
Sometimes, the right answer is to use links between different resources - have "the" information in one place, and everywhere else has links that allow you to find that one place. Again, trade-offs.
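For illustration only (the question doesn't name a stack, so this is a hypothetical C#/Web API sketch with invented routes and field names), the linking approach could look like this - the entry itself stays public, while the account details live at their own URI and can return 401 on their own:

    using System.Web.Http;

    public class EntriesController : ApiController
    {
        // Public representation: no account details, just a link to where they live.
        [Route("entries/{id:int}")]
        public IHttpActionResult GetEntry(int id)
        {
            return Ok(new
            {
                id,
                text = "an example entry",
                account = "/entries/" + id + "/account" // link to the protected resource
            });
        }

        // The protected data is its own resource, so an unauthorized request to it
        // fails with 401 instead of silently returning a thinner document.
        [Authorize]
        [Route("entries/{id:int}/account")]
        public IHttpActionResult GetEntryAccount(int id)
        {
            return Ok(new { id, owner = "example-account" });
        }
    }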
HTTP isn't an infinitely flexible application protocol. It's really good at transferring documents over a network, especially at "web scale".
There have been attempts at using Link headers to trigger invalidation of other cached resources, but as far as I have been able to tell, none of them has made it past the proposal stage.
We are using a microservices architecture where the top services expose REST APIs to the end user and the backend services do the work of querying the database.
When we get one user request we make ~30k requests to the backend services. We are using RxJava for the top service, so all 30k requests get executed in parallel.
We are using HAProxy to distribute the load between the backend services.
However, when we get 3-5 user requests we start getting network connection exceptions, "No route to host" exceptions, and socket connection exceptions.
What are the best practices for this kind of use case?
Well, you ended up with the classical microservice mayhem. It's completely irrelevant what technologies you employ - the problem lies within the way you applied the concept of microservices!
It is natural in this architecture that services call each other (preferably that should happen asynchronously!!). Since I know only a little about your service APIs, I'll have to make some assumptions about what went wrong in your backend:
I assume that a user makes a request to one service. This service will now (obviously synchronously) query another service and receive these 30k records you described. Since you probably have to know more about these records you now have to make another request per record to a third service/endpoint to aggregate all the information your frontend requires!
This shows me that you probably got the whole thing with bounded contexts wrong! So much for the analytical part. Now to the solution:
Your API should return all the required information along with the query that enumerates the records! Sometimes that can seem like a contradiction of the kind of isolation and authority over data/state that the microservices pattern prescribes - but it is not feasible to isolate data/state in one service only, because that leads to the problem you currently have: all other services HAVE to query that data every time to be able to return correct data to the frontend! However, it is possible to duplicate it as long as the authority over the data/state is clear!
Let me illustrate that with an example: let's assume you have a classical shop system. Articles are grouped. Now you would probably write two microservices - one that handles articles and one that handles groups! And you would be right to do so! You might have already decided that the group-service will hold the relation to the articles assigned to a group! Now if the frontend wants to show all items in a group, what happens? The group-service receives the request and returns 30,000 article numbers in a beautiful JSON array that the frontend receives. This is where it all goes south: the frontend now has to query the article-service for every article it received from the group-service!!! Aaand you're screwed!
Now there are multiple ways to solve this problem: one is (as previously mentioned) to duplicate article information to the group-service. So every time an article is assigned to a group using the group-service, it has to read all the information for that article from the article-service and store it, to be able to return it with the get-me-all-the-articles-in-group-x query. This is fairly simple, but keep in mind that you will need to update this information when it changes in the article-service, or you'll be serving stale data from the group-service. Event sourcing can be a very powerful tool in this use case and I suggest you read up on it! You can also use simple messages sent from one service (in this case the article-service) to a message bus of your preference and make the group-service listen and react to these messages.
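As a rough sketch of that duplication (the message shape, the handler, and the projection store below are invented for illustration; any message bus that can deliver the article-service's events would do):

    // Hypothetical message published by the article-service whenever an article changes.
    public class ArticleUpdated
    {
        public string ArticleId { get; set; }
        public string Name { get; set; }
        public decimal Price { get; set; }
    }

    // Hypothetical local store inside the group-service (a table, document store, cache, ...).
    public interface IArticleProjectionStore
    {
        void Upsert(string articleId, string name, decimal price);
    }

    // Inside the group-service: keep a read-optimised copy of the article data,
    // so "get all articles in group X" never has to call the article-service per item.
    public class ArticleProjectionHandler
    {
        private readonly IArticleProjectionStore _store;

        public ArticleProjectionHandler(IArticleProjectionStore store)
        {
            _store = store;
        }

        public void Handle(ArticleUpdated message)
        {
            // The article-service stays the authority over this data;
            // the group-service only holds a copy it can serve locally.
            _store.Upsert(message.ArticleId, message.Name, message.Price);
        }
    }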
Another very simple, quick-and-dirty solution to your problem could be just to provide a new REST endpoint on the article-service that takes an array of article ids and returns the information for all of them, which would be much quicker. This could probably solve your problem very quickly.
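Your stack is Java/RxJava, but the shape of such a batch endpoint is language-agnostic; here is a minimal, hypothetical sketch in C#/Web API terms:

    using System.Collections.Generic;
    using System.Linq;
    using System.Web.Http;

    public class ArticlesController : ApiController
    {
        // One round trip for N articles instead of N round trips.
        // POST is used because tens of thousands of ids won't fit in a query string.
        [HttpPost]
        [Route("articles/batch")]
        public IHttpActionResult GetMany([FromBody] List<string> articleIds)
        {
            var articles = articleIds.Select(LoadArticle).ToList();
            return Ok(articles);
        }

        private object LoadArticle(string id)
        {
            // Placeholder for the article-service's own (local) data access;
            // in a real implementation this would be a single query for the whole batch.
            return new { id };
        }
    }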
A good rule of thumb in a backend with microservices is to aim for a constant number of these cross-service calls, which means the number of calls that go across service boundaries should never be directly related to the amount of data that was requested! We closely monitor which service calls are made for a given request that comes through our API, to keep track of which services call which other services and where our performance bottlenecks will arise or have been caused. Whenever we detect that a service makes many calls to other services (there is no fixed threshold, but every time I see >4 I start asking questions!) we investigate why and how this could be fixed! There are some great metrics tools out there that can help you with tracing requests across service boundaries!
Let me know if this was helpful or not, and whatever solution you implemented!
I'm reading all over the net that you should separate your "external schemas" from your "internal schemas" and never expose the "internal schemas" to any external actor.
If my solution only acts as a message bus to create loose coupling between 2 existing systems, will I really need any internal schemas?
System A makes a Request (message with SchemaA) to BizTalk
BizTalk maps SchemaA to SchemaB
BizTalk forwards the request of type SchemaB to SystemB
SystemB returns ResponseB
BizTalk maps ResponseB to ResponseA
BizTalk routes the result back to System A
I can't see the pros of having an internal schema and mapping:
SchemaA -> SchemaInternal -> SchemaB
?
The term canonical schema is often used to describe the creation of schemas internal (SchemaInternal in your last example) to an integration mechanism such as BizTalk.
Use of canonical schemas is widely regarded as a best practice, as it decouples your BizTalk flow-control mapping from any 'other' system's schemas (the other system here could be internal to your organisation or external to it, e.g. a supplier, customer or partner system). This way, if any of the systems integrated via BizTalk change, only the external schemas and the maps to the canonical schemas need to change. It also prevents foreign conventions, naming and hierarchy differences inherent in external schemas from leaking into your internal BizTalk artefacts.
Generally, transformation of incoming messages to a canonical schema is done as early as possible, e.g. on a receive port map, and similarly, transformation out of the canonical schema is done as late as possible, e.g. on a send port map.
A common scenario for Canonical Schemas (CS) is where a single orchestration or message flow is common to multiple trading parties (e.g. you may have many suppliers with different systems, but all of them submit invoices for processing). In this case, each new supplier system just needs to be integrated with your CS - no new processing logic needs to be added or duplicated - so CS can actually reduce the overall effort in such instances. (The n x m problem is explained in detail here: briefly, with n external formats and m internal systems, point-to-point integration needs up to n x m maps, while a canonical schema needs only n + m.) Another example of where CS are vital is where your business is the switching of messages - e.g. a medical industry switch will have many doctor and practice systems sending authorisation requests and invoices, and these need to be mapped and routed to multiple medical fund (medical aid) systems.
And FWIW:
IMO, CS make the most sense when BizTalk is the end-to-end solution in an EAI or ESB scenario, e.g. direct integration of 2 or more line-of-business systems. Otherwise, if BizTalk is just one endpoint on a larger corporate ESB, then it probably makes sense to use the corporate ESB schemas internally, and hence map external schemas directly to the ESB schemas (i.e. no need for another set of CS within BizTalk, provided that you have a good change management / version control mechanism across your enterprise).
If standard schemas (e.g. EDIFACT) exist for your industry, it is moot whether adopting these as your internal CS should be a goal. In general they may conflict with the meaning of Canonical as being 'simple', as industry schemas often need to be verbose in order to model all flavours and 'edge cases' of the document. Personally I would ensure that I have a mapping to / from said industry schemas, but would use a custom schema internally.
In the described solution you don't need internal schemas. You can hide the schemas of System X from the users of System Y, but that is not so important.
In this context, External = Public, meaning outside your organization.
The guidance is to protect internal implementation details, naming conventions and such, from others.
If both System A and System B are inside your organization then 'security' is less of an issue but your application can still offer an 'external' schema to consumers in order to protect them from internal changes to your application.
I am using SignalR with my ASP.NET application. What my application needs is to persist the groups data that is updated from various servers. According to the SignalR documentation it's my responsibility to do this. It means that I need to use an external server/service that will collect the data from one or more servers, so I can query that data from a single place.
I first thought that Memcached would be the best candidate, because it's fast and the data that I need to put there is volatile. The problem is that I need to store collections, for example: collection A with user ids, so I can have collection A with 2,000 user ids and collection B with 40,000 ids. The problem is that I need to update these collections, removing and inserting ids very quickly. I'm afraid that because the commands will be initiated from several servers, and because I might need to read the entire collection and update it from any of the web servers, the data won't be consistent. Web server A might update the data, but server B will read the data before server A has finished updating it. There is a concurrency conflict.
I'm searching for the best way to implement this kind of strategy in my ASP.NET 4.5 application. I think this might be a case for an in-memory database, or for something else that ensures data integrity.
I want to ask you what is the best solution for my problem.
Here's an example of my problem:
Memcached server - stores the collections (e.g. collections A, B, C, D); each collection stores user ids, which can be thousands of ids or even many more.
Web servers - my Amazon EC2 web servers with SignalR installed. They can be behind a load balancer. Those servers need to access the Memcached server and get a collection's complete set of items by collection name (e.g. "Collection_23"). They need to be able to remove items (user ids) and add items. All this should be as fast as possible.
I hope that I explained myself right. Thanks.
Alternatively, you can use Redis; like Memcached, everything is served from memory. Redis has many other capabilities beyond a simple key-value datastore; for your specific case you might use Redis transactions, which ensure data consistency.
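As a rough sketch of how your collections could map onto Redis (using the StackExchange.Redis client; the GroupStore class and its method names are just illustration): Redis sets fit collections of user ids well, because SADD/SREM are single atomic server-side commands, and a transaction can group several operations when they must be applied together.

    using StackExchange.Redis;

    public class GroupStore
    {
        private readonly IDatabase _db;

        public GroupStore(ConnectionMultiplexer connection)
        {
            _db = connection.GetDatabase();
        }

        // SADD / SREM are single atomic commands, so two web servers adding and
        // removing ids concurrently never observe a half-updated collection.
        public void AddUser(string collectionName, string userId)
        {
            _db.SetAdd(collectionName, userId);
        }

        public void RemoveUser(string collectionName, string userId)
        {
            _db.SetRemove(collectionName, userId);
        }

        public RedisValue[] GetUsers(string collectionName)
        {
            return _db.SetMembers(collectionName);
        }

        // When several operations must succeed or fail together, wrap them in a
        // transaction (sent to the server as MULTI/EXEC).
        public bool MoveUser(string fromCollection, string toCollection, string userId)
        {
            var transaction = _db.CreateTransaction();
            transaction.SetRemoveAsync(fromCollection, userId);
            transaction.SetAddAsync(toCollection, userId);
            return transaction.Execute();
        }
    }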
A comment on another post links to the Redis provider. That link is broken; it seems it is now integrated into the main SignalR project: https://github.com/SignalR/SignalR/tree/master/src/Microsoft.AspNet.SignalR.Redis
You have the Redis NuGet package here:
http://www.nuget.org/packages/Microsoft.AspNet.SignalR.Redis
and documentation here:
http://www.asp.net/signalr/overview/signalr-20/performance-and-scaling/scaleout-with-redis
I'm considering an SOA architecture for a set of services to support a business that I'm consulting for. Previously we used database integration, where each application picked out what it needed from a shared MS SQL database and worked with it. We had various apps integrating with the monster database, including Java, .NET and Microsoft Access; there was referential integrity, as everything was tightly coupled.
I'm a bit confused about how to support data sharing between services.
Let's take the Product service, which sits on top of the Product database provided by the wholesaler each month. We build a domain model and sit this on top of the database with Hibernate or whatever; implementation-wise, Product is a large object graph, given the information the wholesaler provides about the product.
Now let's say the Review service, Pricing service, Shipping service, and Stock service will subscribe to ProductUpdated, ProductAdded, and ProductDeleted. The problem is that each service only needs part, or some parts, of the information about the Product. Shipping might only need the dimensions and weight. Pricing might only need the product id, wholesale cost, volume discount, and price effective-to date. Review might need the product id, product name, and producer.
Is it standard practice just to publish the whole Product (with suitable non-subscriber-specific contracts, e.g. ProductUpdated, and a suitable schema representing the whole product object graph) and let the subscribers map whatever they need to their domain models (or heck, do what they want with it; they might not even have a domain model)...
Or as I write this I'm thinking maybe:
Product Service Publishes ProductAdded message (does not included product details just an ID of product and maybe a timestamp)
Pricing Service subscribes to ProductAdded and publishes RequestPricingForProduct message
Product Service Publishes ResultForPricingForProduct message
Hmm... seems a little better, but it feels like I'm building the contract for the Product service based on which other services I can identify and what they are going to need; perhaps in the future an XYZ service will require something different. I'm going to stop there, as I think it's getting clearer where I'm confused... perhaps the above will work, because I should simply expose a way to return whatever data ought to be public. Hmmm, right.
Any comments or direction greatly appreciated. Sorry if this appears half baked.
I actually think the solution to this problem is to NOT share the data. SOA means that data is owned by a service - it is the technical authority of that data. I suggest reading a few Pat Helland articles, such as Data On The Inside, Data On The Outside.
The only thing that should be shared between these different services is the primary key - the ProductId in your example. Otherwise, for each service, the data that needs to be transactionally consistent goes together.
There does not need to be one "Product". Each service can have its own view of the product. For the Pricing service, you have a productId and a price. For the Review service, a productId and a review. And so on.
Where this starts to confuse people is how to display this data in the UI if it's from all these disparate services. How can you show a list of reviews for a product that has the product name from the ProductService and the review text from the ReviewService?
The answer to that is to compose the UI from all the different services. Get the product from the product service and get the review data from the review service and then combine that data in the UI.
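A rough sketch of that composition (the two client interfaces are hypothetical wrappers around whatever transport the services expose; only the shared ProductId ties the data together):

    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical clients for the two services; each only knows its own service's data.
    public interface IProductServiceClient
    {
        string GetProductName(string productId);
    }

    public interface IReviewServiceClient
    {
        IEnumerable<string> GetReviewTexts(string productId);
    }

    // Composition happens at the edge (the UI / view layer), keyed only by ProductId.
    public class ProductReviewsViewModelBuilder
    {
        private readonly IProductServiceClient _products;
        private readonly IReviewServiceClient _reviews;

        public ProductReviewsViewModelBuilder(IProductServiceClient products, IReviewServiceClient reviews)
        {
            _products = products;
            _reviews = reviews;
        }

        public object Build(string productId)
        {
            return new
            {
                ProductId = productId,
                Name = _products.GetProductName(productId),           // owned by the Product service
                Reviews = _reviews.GetReviewTexts(productId).ToList() // owned by the Review service
            };
        }
    }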
I was in your position recently. The problem with directly exposing the underlying object through the service is that you increase coupling between layers, and there is little point in using a Service-Oriented Architecture at all. You would not be able to change these objects or business rules without affecting the web service too.
It sounds like you are on the right track. If you are serious about separating your layers, then the most common pattern is to create a new, separate set of message classes just for the web service (potentially one per service) and translate your internal objects back and forth.
For an example of how to set up your service layer in this manner see the "Service Interface" pattern. On the client side of the service, there is an opposite pattern called "Service Gateway".
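A minimal sketch of that translation (the property names and the extra internal field are invented for illustration):

    // Message (contract) class that exists only for the service boundary.
    public class ProductDto
    {
        public string ProductId { get; set; }
        public string Name { get; set; }
        public decimal WholesaleCost { get; set; }
    }

    // Internal domain object; free to change without breaking service consumers.
    public class Product
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public decimal WholesaleCost { get; set; }
        public int InternalStockLocationCode { get; set; } // never exposed over the service
    }

    public static class ProductTranslator
    {
        public static ProductDto ToDto(Product product)
        {
            return new ProductDto
            {
                ProductId = product.Id,
                Name = product.Name,
                WholesaleCost = product.WholesaleCost
                // InternalStockLocationCode deliberately not mapped
            };
        }
    }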
The Application Architecture Guide 2.0 has a whole chapter dedicated to the types of the decisions you are making (http://apparchguide.codeplex.com/Wiki/View.aspx?title=Chapter%2013%20-%20Service%20Layer%20Guidelines). I would download the whole guide.
Here is the portion most relevant to you. Long story short, if you take the time to create new coarse-grained methods, and message-based objects, you'll end up with a much better web service:
Consider the following guidelines when designing a service interface:
Consider using a coarse-grained interface to batch requests and minimize the number of calls over the network.
Design service interfaces in such a way that changes to the business logic do not affect the interface.
Do not implement business rules in a service interface.
Consider using standard formats for parameters to provide maximum compatibility with different types of clients.
Do not make assumptions in your interface design about the way that clients will use the service.
Do not use object inheritance to implement versioning for the service interface.