REST: Proper way to handle partially-restricted resources - http

I’m redesigning the REST API for a small SaaS I built. Currently there’s a route /entries that doesn’t require any authentication. However, if the client authenticates with sufficient privileges, the server will send additional information (ex: the account associated with each entry).
The main problem I see with this is that a client attempting to request protected data with insufficient privileges will still receive a 200 response, but without the expected data, instead of a 401 Unauthorized.
The alternatives I came up with are:
Split the endpoint into two endpoints, ex /entries and /admin/entries. The problem with this approach is that there are now two different endpoints for essentially the same resource. However, it has the advantage of being easy to document with OpenAPI. (Additionally, it allows for the addition of a /entries/:id/account endpoint.)
Accept a query parameter ?admin=true. This option is harder to document. On the other hand, it avoids having multiple URIs for a single entry.
Is there a standard way to structure something like this?
Related question: Different RESTful representations of the same resource

The alternatives I came up with are
Note that, as far as HTTP/REST are concerned, your two alternatives are the same: in both cases you are introducing a new resource.
The fact that in one case you use path segments to distinguish the two identifiers and in the other case you are using the query part doesn't change the fact that you have two resources.
Having two resources with the same information is fine - imagine two web pages built from the same information.
It's a trade off - the HTTP application isn't going to know that these resources have common information, and so won't know that invalidating one cached resource should also invalidate the other. So just like with web pages, you can get into situations where the representations that you have in your cache aren't consistent with each other.
Sometimes, the right answer is to use links between different resources - have "the" information in one place, and everywhere else has links that allow you to find that one place. Again, trade-offs.
HTTP isn't an infinitely flexible application protocol. It's really good at transferring documents over a network, especially at "web scale".
There have been attempts at using Link headers to trigger invalidation of other cached resources, but as far as I have been able to tell, none of them has made it past the proposal stage.

Related

How to handle network calls in Microservices architecture

We are using Micro services architecture where top services are used for exposing REST API's to end user and backend services does the work of querying database.
When we get 1 user request we make ~30k requests to backend service. We are using RxJava for top service so all 30K requests gets executed in parallel.
We are using haproxy to distribute the load between backend services.
However when we get 3-5 user requests we are getting network connection Exceptions, No Route to Host Exception, Socket connection Exception.
What are the best practices for this kind of use case?
Well you ended up with the classical microservice mayhem. It's completely irrelevant what technologies you employ - the problem lays within the way you applied the concept of microservices!
It is natural in this architecture, that services call each other (preferably that should happen asynchronously!!). Since I know only little about your service APIs I'll have to make some assumptions about what went wrong in your backend:
I assume that a user makes a request to one service. This service will now (obviously synchronously) query another service and receive these 30k records you described. Since you probably have to know more about these records you now have to make another request per record to a third service/endpoint to aggregate all the information your frontend requires!
This shows me that you probably got the whole thing with bounded contexts wrong! So much for the analytical part. Now to the solution:
Your API should return all the information along with the query that enumerates them! Sometimes that could seem like a contradiction to the kind of isolation and authority over data/state that the microservices pattern specifies - but it is not feasible to isolate data/state in one service only because that leads to the problem you currently have - all other services HAVE to query that data every time to be able to return correct data to the frontend! However it is possible to duplicate it as long as the authority over the data/state is clear!
Let me illustrate that with an example: Let's assume you have a classical shop system. Articles are grouped. Now you would probably write two microservices - one that handles articles and one that handles groups! And you would be right to do so! You might have already decided that the group-service will hold the relation to the articles assigned to a group! Now if the frontend wants to show all items in a group - what happens: The group service receives the request and returns 30'000 Article numbers in a beautiful JSON array that the frontend receives. This is where it all goes south: The frontend now has to query the article-service for every article it received from the group-service!!! Aaand your're screwed!
Now there are multiple ways to solve this problem: One is (as previously mentioned) to duplicate article information to the group-service: So every time an article is assigned to a group using the group-service, it has to read all the information for that article form the article-service and store it to be able to return it with the get-me-all-the-articles-in-group-x query. This is fairly simple but keep in mind that you will need to update this information when it changes in the article-service or you'll be serving stale data from the group-service. Event-Sourcing can be a very powerful tool in this use case and I suggest you read up on it! You can also use simple messages sent from one service (in this case the article-service) to a message bus of your preference and make the group-service listen and react to these messages.
Another very simple quick-and-dirty solution to your problem could also be just to provide a new REST endpoint on the articles services that takes an array of article-ids and returns the information to all of them which would be much quicker. This could probably solve your problem very quickly.
A good rule of thumb in a backend with microservices is to aspire for a constant number of these cross-service calls which means your number of calls that go across service boundaries should never be directly related to the amount of data that was requested! We closely monitory what service calls are made because of a given request that comes through our API to keep track of what services calls what other services and where our performance bottlenecks will arise or have been caused. Whenever we detect that a service makes many (there is no fixed threshold but everytime I see >4 I start asking questions!) calls to other services we investigate why and how this could be fixed! There are some great metrics tools out there that can help you with tracing requests across service boundaries!
Let me know if this was helpful or not, and whatever solution you implemented!

Is it dangerous if a web resource POSTs to itself?

While reading some articles about writing web servers using Twisted, I came across this page that includes the following statement:
While it's convenient for this example, it's often not a good idea to
make a resource that POSTs to itself; this isn't about Twisted Web,
but the nature of HTTP in general; if you do this, make sure you
understand the possible negative consequences.
In the example discussed in the article, the resource is a web resource retrieved using a GET request.
My question is, what are the possible negative consequences that can arrive from having a resource POST to itself? I am only concerned about the aspects related to the HTTP protocol, so please ignore the fact that I mentioned about Twisted.
The POST verb is used for making a new resource in a collection.
This means that POSTing to a resource has no direct meaning (POST endpoints should always be collections, not resources).
If you want to update your resource, you should PUT to it.
Sometimes, you do not know if you want to update or create the resource (maybe you've created it locally and want to create-or-update it). I think that in that case, the PUT verb is more appropriate because POST really means "I want to create something new".
There's nothing inherently wrong about a page POSTing back to itself - in fact, many of the widely-used frameworks (ASP.NET, etc.) use that method to handle various events that happen on the client - some data is posted back to the same page where the server processes it and sends a new reponse.

What is the difference between REST and HTTP protocols?

What is the REST protocol and what does it differ from HTTP protocol ?
REST is a design style for protocols, it was developed by Roy Fielding in his PhD dissertation and formalised the approach behind HTTP/1.0, finding what worked well with it, and then using this more structured understanding of it to influence the design of HTTP/1.1. So, while it was after-the-fact in a lot of ways, REST is the design style behind HTTP.
Fielding's dissertation can be found at http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm and is very much worth reading, and also very readable. PhD dissertations can be pretty hard-going, but this one is wonderfully well-described and very readable to those of us without a comparable level of Computer Science. It helps that REST itself is pretty simple; it's one of those things that are obvious after someone else has come up with it. (It also for that matter encapsulates a lot of things that older web developers learnt themselves the hard way in one simple style, which made reading it a major "a ha!" moment for many).
Other application-level protocols as well as HTTP can also use REST, but HTTP is the classic example.
Because HTTP uses REST, all uses of HTTP are using a REST system. The description of a web application or service as RESTful or non-RESTful relates to whether it takes advantage of REST or works against it.
The classic example of a RESTful system is a "plain" website without cookies (cookies aren't always counter to REST, but they can be): Client state is changed by the user clicking a link which loads another page, or doing GET form queries which brings results. POST form queries can change both server and client state (the server does something on the basis of the POST, and then sends a hypertext document that describes the new state). URIs describe resources, but the entity (document) describing it may differ according to content-type or language preferred by the user. Finally, it's always been possible for browsers to update the page itself through PUT and DELETE though this has never been very common and if anything is less so now.
The classic example of a non-RESTful system using HTTP is something which treats HTTP as if it was a transport protocol, and with every request sends a POST of data to the same URI which is then acted upon in an RPC-like manner, possibly with the connection itself having shared state.
A RESTful computer-readable (i.e. not a website in a browser, but something used programmatically) system would obtain information about the resources concerned by GETting URI which would then return a document (e.g. in XML, but not necessarily) which would describe the state of the resource, including URIs to related resources (hypermedia therefore), change their state through PUTting entities describing the new state or DELETEing them, and have other actions performed by POSTing.
Key advantages are:
Scalability: The lack of shared state makes for a much more scalable system (demonstrated to me massively when I removed all use of session state from a heavily hit website, while I was expecting it to give a bit of extra performance, even a long-time anti-session advocate like myself was blown-away by the massive gain from removing what had been pretty slim use of sessions, it wasn't even why I had been removing them!)
Simplicity: There are a few different ways in which REST is simpler than more RPC-like models, in particular there are only a few "verbs" that are ever possible, and each type of resource can be reasoned about in reasonable isolation to the others.
Lightweight Entities: More RPC-like models tend to end up with a lot of data in the entities sent both ways just to reflect the RPC-like model. This isn't needed. Indeed, sometimes a simple plain-text document is all that is really needed in a given case, in which case with REST, that's all we would need to send (though this would be an "end-result" case only, since plain-text doesn't link to related resources). Another classic example is a request to obtain an image file, RPC-like models generally have to wrap it in another format, and perhaps encode it in some way to let it sit within the parent format (e.g. if the RPC-like model uses XML, the image will need to be base-64'd or similar to fit into valid XML). A RESTful model would just transmit the file the same as it does to a browser.
Human Readable Results: Not necessarily so, but it is often easy to build a RESTful webservice where the results are relatively easy to read, which aids debugging and development no end. I've even built one where an XSLT meant that the entire thing could be used by humans as a (relatively crude) website, though it wasn't primarily for human-use (essentially, the XSLT served as a client to present it to users, it wasn't even in the spec, just done to make my own development easier!).
Looser binding between server and client: Leads to easier later development or moves in how the system is hosted. Indeed, if you keep to the hypertext model, you can change the entire structure, including moving from single-host to multiple hosts for different services, without changing client code at all.
Caching: For the GET operations where the client obtains information about the state of a resource, standard HTTP caching mechanisms allow both for statements that the resource won't meaningfully change until a certain date at the earliest (no need to query at all until then) or that it hasn't changed since the last query (send a couple hundred bytes of headers saying this rather than several kilobytes of data). The improvement in performance can be immense (big enough to move the performance of something from the point where it is impractical to use to the point where performance is no longer a concern, in some cases).
Availability of toolkits: Because it works at a relatively simple level, if you have a webserver you can build a RESTful system's server and if you have any sort of HTTP client API (XHR in browser javascript, HttpWebRequest in .NET, etc) you can build a RESTful system's client.
Resiliance: In particular, the lack of shared state means that a client can die and come back into use without the server knowing, and even the server can die and come back into use without the client knowing. Obviously communications during that period will fail, but once the server is back online things can just continue as they were. This also really simplifies the use of web-farms for redundancy and performance - each server acts like it's the only server there is, and it doesn't matter that its actually only dealing with a fraction of the requests from a given client.
REST is an approach that leverages the HTTP protocol, and is not an alternative to it.
http://en.wikipedia.org/wiki/Representational_State_Transfer
Data is uniquely referenced by URL and can be acted upon using HTTP operations (GET, PUT, POST, DELETE, etc). A wide variety of mime types are supported for the message/response but XML and JSON are the most common.
For example to read data about a customer you could use an HTTP get operation with the URL http://www.example.com/customers/1. If you want to delete that customer, simply use the HTTP delete operation with the same URL.
The Java code below demonstrates how to make a REST call over the HTTP protocol:
String uri =
"http://www.example.com/customers/1";
URL url = new URL(uri);
HttpURLConnection connection =
(HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/xml");
JAXBContext jc = JAXBContext.newInstance(Customer.class);
InputStream xml = connection.getInputStream();
Customer customer =
(Customer) jc.createUnmarshaller().unmarshal(xml);
connection.disconnect();
For a Java (JAX-RS) example see:
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-45.html
REST is not a protocol, it is a generalized architecture for describing a stateless, caching client-server distributed-media platform. A REST architecture can be implemented using a number of different communication protocols, though HTTP is by far the most common.
REST is not a protocol, it is a way of exposing your application, mostly done over HTTP.
for example, you want to expose an api of your application that does getClientById
instead of creating a URL
yourapi.com/getClientById?id=4
you can do
yourapi.com/clients/id/4
since you are using a GET method it means that you want to GET data
You take advantage over the HTTP methods: GET/DELETE/PUT
yourapi.com/clients/id/4 can also deal with delete, if you send a delete method and not GET, meaning that you want to dekete the record
All the answers are good.
I hereby add a detailed description of REST and how it uses HTTP.
REST = Representational State Transfer
REST is a set of rules, that when followed, enable you to build a distributed application that has a specific set of desirable constraints.
It is stateless, which means that ideally no connection should be maintained between the client and server.
It is the responsibility of the client to pass its context to the server and then the server can store this context to process the client's further request. For example, session maintained by server is identified by session identifier passed by the client.
Advantages of Statelessness:
Web Services can treat each method calls separately.
Web Services need not maintain the client's previous interaction.
This in turn simplifies application design.
HTTP is itself a stateless protocol unlike TCP and thus RESTful Web Services work seamlessly with the HTTP protocols.
Disadvantages of Statelessness:
One extra layer in the form of heading needs to be added to every request to preserve the client's state.
For security we may need to add a header info to every request.
HTTP Methods supported by REST:
GET: /string/someotherstring:
It is idempotent(means multiple calls should return the same results every time) and should ideally return the same results every time a call is made
PUT:
Same like GET. Idempotent and is used to update resources.
POST: should contain a url and body
Used for creating resources. Multiple calls should ideally return different results and should create multiple products.
DELETE:
Used to delete resources on the server.
HEAD:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The meta information contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
OPTIONS:
This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.
HTTP Responses
Go here for all the responses.
Here are a few important ones:
200 - OK
3XX - Additional information needed from the client and url redirection
400 - Bad request
401 - Unauthorized to access
403 - Forbidden
The request was valid, but the server is refusing action. The user might not have the necessary permissions for a resource, or may need an account of some sort.
404 - Not Found
The requested resource could not be found but may be available in the future. Subsequent requests by the client are permissible.
405 - Method Not Allowed
A request method is not supported for the requested resource; for example, a GET request on a form that requires data to be presented via POST, or a PUT request on a read-only resource.
404 - Request not found
500 - Internal Server Failure
502 - Bad Gateway Error

HTTP Verbs and Content Negotiation or GET Strings for REST service?

I am designing a REST service and am trying to weight the pros and cons of using the full array of http verbs and content negotiation vs GET string variables. Does my choice affect cacheability? Neither solution may be right for every area.
Which is best for the crud and queries (e.g. ?action=PUT)?
Which is best for api version picking (e.g. ?version=1.0)?
Which is best for return data type(e.g. ?type=json)?
CRUD/Queries are best represented with HTTP verbs. A create and update is usually a PUT or POST. A retrieve would be a GET. Deletes would be a DELETE. Thats the generally mapping. The main point is that a GET doesn't cause side effects, and that the verbs do what you'd expect them to do.
Putting the action in the URI is OK if thats the -only- way to pass it (e.g, the http client library doesn't allow you to send non-GET/POST requests). Most libraries do, though, so its strongly advised not to pass the verb via the URL.
The "best" way to version the API would be using HTTP headers on a per-request basis; this lets clients upgrade/downgrade specific requests instead of every single one. Of course, that granularity of versioning needs to be baked in at the start and could severely complicate the server-side code. Most people just use the URL used the access the servers. A longer explanation is in a blog post by Peter Williams, "Versioning Rest Web Services"
There is no best return data type; it depends on your app. JSON might be easier for Ajax websites, whereas XML might be easier for complicated structures you want to query with Xpath. Protocol Buffers are a third option. Its also debated whether its better to put the return protocol is best specified in the URL or in the HTTP headers.
For the most part, headers will have the largest affect on caching, since proxies are suppose to respect them when told, as are user agents (obviously UA's behave differently, though). Caching based on URL alone is very dependent on the layers. Some user agents don't cache anything with a query string (Safari, iirc), and proxies are free to cache or not cache as they feel appropriate.

Is it wrong to switch client logic in the service tier?

We have two client apps (a web app and an agent app) accessing methods on the same service, but with slightly different requirements. My team wants to control behaviour on the service side by passing in a ApplicationType parameter to every method - which is essentially an enum containing the name of the calling client application - which is then used as a key for a database lookup to configure the service with client-specific options.
Something about this makes me uneasy as I don't think the service should really have to be aware of which client is calling it. I'm being told that it's easier to do it this way than pass a load of options dynamically through the method call.
Is there anything wrong with the client application telling the service who they are? Or is there really no difference between passing a config key versus a set of parameterized options?
One immediate problem I can see is that if we ever opened the service to another client run by a third party, we'd have to maintain their configuration settings locally for them. At the moment we own both client apps so it's not so much of a problem.
How would you do it?
In a layered solution, you should always consider your layers as onion-like layers, and dependencies should always go inwards, never outwards.
So your GUI/App layer should depend on the businesslogic layer, the businesslogic layer should depend on the data access layer, and similar.
Unless you categorize the clients (web, win, wpf, cli), or generalize it with client profiles (which client applications can configure), I would never pass in the name of the calling application, as this would make the business logic layer aware of and dependent upon the outside layer.
What kind of differences are we talking about that would depend on the type of application? If you elaborate a bit on the differences here, perhaps someone can come up with some helpful advice on other ways to solve this.
But I would definitely look for other ways before going down your described path.
Can't you create two different services, one for each application? The two services will share a lot of code or call a single internal service with different parameterization depending on what outer service was called.
From a design perspective, this is no different than having users with different profiles. From a security perspective, I hope your applications are doing something to identify themselves, lest users of one application figure out a way to invoke the other applications logic as a hack. (Image a HR application being used by the mafia and a bank at the same time, one customer would be interesting in hacking the other customer's application on a shared application host)
In .net the design doesn't feel this way because the credentials live on the thread (i.e. when you set the IIPrincipal, that info rides on the thread-- it is communicated along with each method call, but not as a parameter.)
Maybe what you are looking for in terms of a more elegant design is an ApplicationIdentity attribute. You'd have to write a custom one, I don't know of one in the framework right now.
This is a hard topic to discuss without a solid example.
You are right for feeling that way. Sending in the client type to change behaviour is not correct. It's not a bad idea for logging... but that's about it.
Here is what I would do:
Review each method to see what needs to be different and why.
Create different methods for different usages. The method name should be self explanatory. If you ever need to break compatibility, you have more control (assuming you're not using a versioning system which would be overkill for an in-house-only service).
In some cases request parameters (flags/enum values) are more appropriate.
In some cases knowing the operating environment is more appropriate (especially for data security). The operating environment almost always sent during a login request. Something like "attended"/"secure" (agent client) vs "unattended"/"not secure" (web client). Now you must exchange a session key (HTTP cookie or an application level session id). Sessions obviously doesn't work if you need to be 100% stateless -- especially if you want to scale-out without session replication... if you have that requirement, send a structure in every request.
Think of requests like functions in your code. You wouldn't put a magic parameter that changes the behaviour of the function. You would create multiple functions that each behave differently. Whoever is using the function makes the decision which one to call.
So why is client type so wrong? Client type has no specific meaning on its own. It has many meanings and they may change over time. It's simply informational which is why it is a handy thing to log. An operating environment does have a specific meaning.
Here is a scenario to consider: What if a new client type is developed that is slightly different in a way that would break compatibility with the original request? Now you have two requests. 2 clients use Request A and 1 client uses Request B. If you pass in a client type to each request, the server is expected to work for every possible client type. Much harder to test and maintain!!

Resources