Should I use the Content-Location header this way? - http

Preface:
After reading a lot about HTTP and REST, you have spent a few hours devising a cunning content-negotiation scheme. So that your web API can serve XML, JSON and HTML from a single URL. Because, you know, a resource should only have one URL and different representations should be requested using Accept headers. You start to wonder why it took the web 20 years for that realization.
And that is when reality slaps you in the face.
So to help browsers (and yourself trying to debug) with coercing your service to serve the desired content type you do what every self-respecting REST evangelist would despise you for: Filename extensions.
Eternal torment in hell notwithstanding, is the following use of Content-Location + .ext acceptable?
Say we have users at /users/:loginname for example /users/bob. This would be the API endpoint for anything that is capable of setting a proper Accept header. But for any possible Content-Type (or at least some), we allow an alternate method of access and that is a URL with a filetype suffix. For example /users/bob.html for an HTML representation. Let's assume (and that is a big assumption to make) login names will never contain a period/dot.
Request:
GET /users/bob.json HTTP/1.1
Host: example.com
Response:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 14
Content-Location: /users/bob
{"foo": "bar"}
This would allow me to encode alternative ways to access (in this case) the user information.
For example a link to a user page could be Bob.
A link to a vCard (to add the user to the Address-Book/Outlook/anything) would be Bob.
Are there any pitfalls I have missed? What would be pros/cons of this?
Edit: This popped up a bit late for me to notice. And even though it touches the subject and is really helpful, I think it's not exactly what I'm looking for...

As far as I can tell, you use Content-Location exactly the wrong way; it should point to the more specific URI.
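To make that reading concrete, here is a minimal sketch of a server that negotiates on /users/bob and then reports the more specific URI in Content-Location. The suffix table and the function name are hypothetical, and q-values in the Accept header are deliberately ignored to keep it short.

```python
# Hypothetical mapping from media type to URL suffix; not taken from the question.
SUFFIX_BY_TYPE = {
    "application/json": ".json",
    "text/html": ".html",
    "text/vcard": ".vcf",
}

def negotiated_headers(path, accept):
    """Pick a representation for `path` and report its specific URI."""
    # Simplification: take the first listed type we support, ignoring q-values.
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in SUFFIX_BY_TYPE:
            return {
                "Content-Type": media_type,
                "Content-Location": path + SUFFIX_BY_TYPE[media_type],
            }
    return None  # nothing acceptable; the caller decides what to do

print(negotiated_headers("/users/bob", "application/json"))
```

With this shape, a request for /users/bob carries Content-Location: /users/bob.json in the response, rather than the other way round.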

According to RFC 2616:
The Content-Location entity-header field MAY be used to supply
the resource location for the entity enclosed in the message
when that entity is accessible from a location separate from
the requested resource's URI.
and
The Content-Location value is not a replacement for the original
requested URI; it is only a statement of the location of the resource
corresponding to this particular entity at the time of the request.
So generally, yes, you can use the Content-Location header to identify the origin resource. The main disadvantage of using an extension suffix is that you are creating additional URLs, e.g. /users/bob, /users/bob.vcf and /users/bob.html are three different resources.

Related

What determines request equivalence for HTTP caching?

I feel like this has to be easy to Google, but I can't find it: from the perspective of an HTTP cache, what determines if two requests are equivalent?
I imagine one ingredient is that their URLs need to be identical; for example, rearranging (but not changing) query string parameters seems to cause a cache miss. Presumably they need to have the same Accept header. What else determines if a request can be served from cache?
This is mostly described in this RFC: https://www.rfc-editor.org/rfc/rfc7234#section-4
Summary:
The method
The full URI
Caching-related headers in response influence whether something got stored.
Any request headers that appeared in the list of the Vary response header.
It also matters whether you are caching for a specific user (for example a browser), or many users (for example a proxy).
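The summary above can be sketched as a lookup key made of the method, the URI, and the request headers named in Vary. This is only an illustration under simplifying assumptions: the function name is made up, `Vary: *` is not handled, and a real cache selects among stored responses rather than building a single flat key.

```python
def cache_key(method, uri, request_headers, vary):
    """Build a lookup key from the primary key plus the headers named in Vary."""
    primary = (method.upper(), uri)
    # Header names are case-insensitive, so normalise them before looking up.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    vary_names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    secondary = tuple((name, lowered.get(name, "")) for name in vary_names)
    return (primary, secondary)

a = cache_key("GET", "/users/bob", {"Accept": "application/json"}, "Accept")
b = cache_key("GET", "/users/bob", {"Accept": "text/html"}, "Accept")
print(a == b)
```

Because the URI is compared as an opaque string here, /a?x=1&y=2 and /a?y=2&x=1 produce different keys, which matches the cache miss described in the question.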
I also struggled with this. Changing my google search to use "http cache key" generated better results. Using the URL seems to be the most common. Query strings are also generally included.
https://support.cloudflare.com/hc/en-us/articles/115004290387-Using-Custom-Cache-Keys describes what the default is for Cloudflare, along with a discussion of the impact of using different keys.
Another useful parameter is the type of asset you want to cache. Or you can leave it open (no filtering).
"Authorization" header is specifically mentioned in the HTTP spec (https://www.rfc-editor.org/rfc/rfc7234) and needs to be handled.
Upon further reading, I noticed the section on "Secondary keys" in the standard (https://www.rfc-editor.org/rfc/rfc7234#section-4.1) and the use of "Vary" header in a response. Headers presented in the "Vary" response header have to match in both the original and the new request for the cache to declare it as a match.
And as for the primary key, the standard says "The primary cache key consists of the request method and target URI." in https://www.rfc-editor.org/rfc/rfc7234#section-2
There are also the conditional request headers used for cache validation: If-Match, If-Unmodified-Since, If-None-Match and If-Modified-Since. For example, If-Modified-Since works this way: suppose you have already requested a page and now you want to reload it. If the header is present, the server sends back the full page ONLY if it was modified after the date given as the value of If-Modified-Since; otherwise it returns a 304 (Not Modified) status.
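A rough sketch of the If-Modified-Since decision on the server side might look like this. The function name is hypothetical, `last_modified` is assumed to be a timezone-aware datetime, and interaction with If-None-Match is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def status_for(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date is ignored
    if since.tzinfo is None:
        # Treat a date without zone information as UTC.
        since = since.replace(tzinfo=timezone.utc)
    return 304 if last_modified <= since else 200

modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
print(status_for("Wed, 21 Oct 2015 07:28:00 GMT", modified))
```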
Accept and Accept-* instead are necessary for Content-Negotiation, like in which language the page should be returned.
More on conditional requests here: https://www.rfc-editor.org/rfc/rfc7232#page-13

Specify supported media types when sending "415 unsupported media type"

If a client sends data in an unsupported media type to an HTTP server, the server answers with status "415 unsupported media type". But how can it tell the client which media types are supported? Is there a standard or at least a recommended way to do so? Or would it just be written to the response body as text?
There is no specification at all for what to do in this case, so expect implementations to be all over the place. (What would be sensible would be if the server's response included something like an Accept: header since that has pretty much the right semantics, if currently in the wrong direction.)
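As a sketch of the idea in the parenthesis, a server could attach an Accept header listing what it would have taken. The names below are hypothetical. As far as I know, the newer HTTP semantics document (RFC 9110) does describe an Accept response field for 415, but clients are not guaranteed to look at it.

```python
SUPPORTED_TYPES = ("application/json", "application/xml")  # hypothetical list

def check_request_type(content_type):
    """Return (status, extra_headers) for a request with the given Content-Type."""
    # Drop parameters such as charset and compare case-insensitively.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in SUPPORTED_TYPES:
        return 200, {}
    # Advertise the media types the server would have accepted.
    return 415, {"Accept": ", ".join(SUPPORTED_TYPES)}

print(check_request_type("text/csv"))
```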
I believe you can do this with the OPTIONS HTTP verb.
Also the status code of 300 Multiple Choices could be used if your scenario fits a certain use case. If they send a request with an Accept header of application/xml and you only support text/plain and that representation lives at a distinct URL then you can respond with a 300 and in the Location header the URL of that representation. I realize this might not exactly fit your question, but it's another possible option.
And from the HTTP Spec:
10.4.7 406 Not Acceptable
The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request.
Unless it was a HEAD request, the response SHOULD include an entity containing a list of available entity characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. Depending upon the format and the capabilities of the user agent, selection of the most appropriate choice MAY be performed automatically. However, this specification does not define any standard for such automatic selection.
Note: HTTP/1.1 servers are allowed to return responses which are
not acceptable according to the accept headers sent in the
request. In some cases, this may even be preferable to sending a
406 response. User agents are encouraged to inspect the headers of
an incoming response to determine if it is acceptable.
tl;dr;
Edited the generated proxy class to inherit from Microsoft.Web.Services3.WebServicesClientProtocol.
I came across this question when troubleshooting this error, so I thought I would help the next person who might come through here, although not sure if it answers the question as stated. I ran into this error when at some point I had to take over an existing solution which was utilizing WSE and MTOM encoding. It was a windows client calling a web service.
To the point, the client was calling the web service where it would throw that error.
Something that contributed to resolving that error for me was to check the web service proxy class that apparently is generated by default to inherit from System.Web.Services.Protocols.SoapHttpClientProtocol.
Essentially that meant that it didn't actually use WSE3.
Anyhow I manually edited the proxy and changed it to inherit from Microsoft.Web.Services3.WebServicesClientProtocol.
BTW, to see the generated proxy class in VS click on the web reference and then click the 'Show All Files' toolbar button. The reference.cs is da place of joy!
Hope it helps.
In his book "HTTP Developer's Handbook" on page 81 Chris Shiflett explains what a 415 means, and then he says, "The media type used in the content of the HTTP response should be indicated in the Content-Type entity header."
1) So is Content-Type a possible answer? It would presumably be a comma-separated list of accepted content types. The obvious problem with this possibility is that Content-Type is an entity header not a response header.
2) Or is this a typo in the book? Did he really mean to say "the HTTP request"?

HTTP POST parameters order / REST urls

Let's say that I'm uploading a large file via a POST HTTP request.
Let's also say that I have another parameter (other than the file) that names the resource which the file is updating.
The resource cannot be part of the URL the way you can do it with REST (e.g. foo.com/bar/123). Let's say this is due to a combination of technical and political reasons.
The server needs to ignore the file if the resource name is invalid or, say, the IP address and/or the logged in user are not authorized to update the resource. This can easily be done if the resource parameter came first in the POST request.
Looks like, if this POST came from an HTML form that contains the resource name first and file field second, for most (all?) browsers, this order is preserved in the POST request. But it would be naive to fully rely on that, no?
In other words the order of HTTP parameters is insignificant and a client is free to construct the POST in any order. Isn't that true?
Which means that, at least in theory, the server may end up storing the whole large file before it can deny the request.
It seems to me that this is a clear case where RESTful urls have an advantage, since you don't have to look at the POST content to perform certain authorization/error checking on the request.
Do you agree? What are your thoughts, experiences?
More comments please! In my mind, everyone who's doing large file uploads (or any file uploads for that matter) should have thought about this.
You can't rely on the order of POST variables, that's for sure. In particular, you can't trust form arrays to be in the correct order when the form is submitted. If you want to save the bandwidth, you might want to check the credentials etc. somewhere else before getting to the point of posting the actual data.
I'd just stick whatever variables you need first in the request's querystring.
On the client,
<form action="/yourhandler?user=0&resource=name" method="post" enctype="multipart/form-data">
<input type="file" name="upload" /></form>
Gets you
POST /yourhandler?user=0&resource=name HTTP/1.1
Content-Type: multipart/form-data; boundary=-----
...
-----
Content-Disposition: form-data; name="upload"; filename="somebigfile.txt"
Content-Type: text/plain
...
On the server, you'd then be able to check the querystring before the upload completes and shut it down if necessary. (This is more or less the same as REST but may be easier to implement based on what you have to work with, technically and politically speaking.)
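A minimal sketch of that server-side check, using Python's standard http.server, could look like the following. The handler name and the set of allowed resources are made up for illustration, and whether the client actually stops sending the file after the early 403 depends on the client.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit, parse_qs

ALLOWED_RESOURCES = {"name"}  # hypothetical resources this user may update

def resource_allowed(request_target):
    """Check the `resource` query parameter without touching the body."""
    query = parse_qs(urlsplit(request_target).query)
    values = query.get("resource", [])
    return len(values) == 1 and values[0] in ALLOWED_RESOURCES

class UploadHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not resource_allowed(self.path):
            # Reject before reading self.rfile, so the upload is never stored.
            self.send_response(403)
            self.send_header("Connection", "close")
            self.end_headers()
            self.close_connection = True
            return
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # a real handler would stream this to storage
        self.send_response(204)
        self.end_headers()
```

The point of the design is that everything needed for the decision is in the request line, which arrives before any of the multipart body.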
Your other option might be to use a cookie to store that data, but this obviously fails when a user has cookies disabled. You might also be able to use the Authorization header.
You should provide more background on your use case. So far, I see absolutely no reason, why you should not simply PUT the large entity to the resource you intend to create or update.
PUT /documents/some-doc-name
Content-Type: text/plain
[many bytes of text data]
Why do you think that is not a possible solution in your case?
Jan

HTTP Get content type

I have a program that is supposed to interact with a web server and retrieve a file containing structured data using http and cgi. I have a couple questions:
The cgi script on the server needs to specify a body right? What should the content-type be?
Should I be using POST or GET?
Could anyone tell me a good resource for reading about HTTP?
If you just want to retrieve the resource, I’d use GET. And with GET you don’t need a Content-Type since a GET request has no body. As for HTTP, I’d suggest you read the HTTP 1.1 specification.
The content-type specified by the server will depend on what type of data you plan to return. As Jim said if it's JSON you can use 'application/json'. The obvious payload for the request would be whatever data you're sending to the client.
From the server's perspective it shouldn't matter that much. In general, if you're not expecting a lot of information from the client, I'd set up the server to respond to GET requests as opposed to POST requests. An advantage I like is simply being able to specify what I want in the URL (this can't be done if it's expecting a POST request).
I would point you to the RFC for HTTP. It is probably the best source of information: maybe not the most user-friendly way to get your answers, but it should have all the answers you need.
For (1) the Content-Type depends on the structured data. If it's XML you can use application/xml, JSON can be application/json, etc. Content-Type is set by the server. Your client would ask for that type of content using the Accept header. (Try to use existing data format standards and content types if you can.)
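On the client side, that exchange could be sketched with Python's standard urllib as below. The URL is a placeholder and `media_type` is a hypothetical helper; nothing is actually sent until the request is passed to urlopen.

```python
from urllib.request import Request

# Hypothetical URL; substitute the real location of the CGI script.
req = Request(
    "http://example.com/cgi-bin/data",
    headers={"Accept": "application/json"},  # the type the client would like
    method="GET",
)
print(req.get_method(), req.get_header("Accept"))

def media_type(content_type):
    """Strip parameters such as charset from a Content-Type value."""
    return content_type.split(";")[0].strip().lower()

# After urllib.request.urlopen(req), the server's choice is in the response's
# Content-Type header, which can be normalised with media_type() before parsing.
```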
For (2) GET is best (you aren't sending up any data to the server).
I found RESTful Web Services by Richardson and Ruby a very interesting introduction to HTTP. It takes a very strict, but very helpful, view of HTTP.

Opinions Needed on the Atomicity of a RESTful PUT

My colleagues and I are implementing a number of RESTful HTTP services, and we're trying to make sure we are a) following the spec, and b) doing the "right" thing where the spec is short of detail.
Here is a particular situation that we have come to and are looking for opinions from the community on:
Suppose you have a resource /People/Bob, and your client is going to update it with a PUT. The server can produce representations for /People/Bob in application/json and text/html. The server can interpret representations for /People/Bob in application/json.
Given this request:
PUT /People/Bob
Content-Type: application/json
Accept: application/xml
{ "name": "Still Bob" }
The server can't produce an XML representation, but it can process the incoming JSON. So we know the correct answer is for the server to return status 406.
The question is: should the server have performed the update to /People/Bob?
+1 for Philosophy of REST.
Without detailed knowledge of the HTTP spec, I would simply choose one of the options and document the quandary and the choice.
My preference would be that if the server cannot respond as requested, then it should not process any of the request at all.
But that may not work in some scenarios, so you might have to do the opposite.
The question is: should the server have performed the update to /People/Bob?
From the HTTP spec, a 406 means:
The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request.
Unless it was a HEAD request, the response SHOULD include an entity containing a list of available entity characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. Depending upon the format and the capabilities of the user agent, selection of the most appropriate choice MAY be performed automatically. However, this specification does not define any standard for such automatic selection.
Note: HTTP/1.1 servers are allowed to return responses which are
not acceptable according to the accept headers sent in the
request. In some cases, this may even be preferable to sending a
406 response. User agents are encouraged to inspect the headers of
an incoming response to determine if it is acceptable.
If the response could be unacceptable, a user agent SHOULD temporarily stop receipt of more data and query the user for a decision on further actions.
That note in the middle about HTTP/1.1 may be your answer. I read it as saying "you may return a 200 in response to the PUT request to /People/Bob when the user agent specifies application/xml in the Accept header, selecting any suitable content-type, and that this outcome may be preferable to returning 406."
Under this scenario, the PUT would succeed on the server, return a 200, but the client would get an application/json representation. The client needs to be able to handle that possibility by making sure that it understands the media type given in the Content-type header, and behaving in a well-defined manner if it doesn't.
But this is always true anyway.
One more thing: you may want to consider not using plain-vanilla media types like application/xml and application/json, but instead define your own custom media types, maybe based on XHTML or JSON. All of the client-server coupling in a RESTful application happens through media types. Without media types rich enough to capture your domain concepts, you're incompletely specifying your REST API.
I would argue 'yes' in theory, but 'no' for real-world application.
I see the logic in not processing if there's an error. Since you return a 406, not a 500, I would know that it's not an error in the data I provided, but rather in the way the result is being presented to me.
That said, some applications won't check for error codes; they will just see that it came back with an error rather than the XML it asked for, and assume the transaction failed.
I assume your not handling application/xml is not an actual problem, but for the purposes of the question - if this is actually being deployed as a real-world service, you'd almost certainly want to be able to have an XML representation, as that's (I suspect) the most common RESTful interaction, and many callers would probably be hard-coded to use XML.
To sum up: if you actually aren't providing application/xml, then I would say, don't perform the update. If you're handling all the standards, but you're planning for the contingency where a user will ask for application/fooSomethingNonStandard, then go ahead and perform the update, but be sure you respond with a 406.
One way out of your conundrum is to have a successful PUT return a 204 (No Content). That way the client's Accept header is irrelevant to the issue of whether the update is performed.
A "RESTful" (or at least "HTTP-embracing") client will know not to update its current "page", and that it will have to do a GET in order to refresh its view of the just-PUT resource. The Accept header on that GET is now, of course, a separate concern from the update atomicity.
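A small sketch of that approach: the PUT handler below never looks at Accept, so the update either happens and yields 204, or fails for reasons that concern only the request body. The function name and the in-memory `store` dictionary are assumptions made for the example.

```python
import json

def handle_put(store, path, content_type, body):
    """Apply a JSON PUT and return a status code; Accept is never consulted."""
    if content_type.split(";")[0].strip().lower() != "application/json":
        return 415  # the server cannot interpret this representation
    try:
        # json.loads raises before the assignment, so a bad body changes nothing.
        store[path] = json.loads(body)
    except json.JSONDecodeError:
        return 400
    return 204  # success, with no representation to negotiate

people = {}
print(handle_put(people, "/People/Bob", "application/json", '{"name": "Still Bob"}'))
```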
I would either succeed and return a 200 using the method Rich suggests above or a 406 and fail. The protocol does not allow for a more nuanced approach mixing 2xx (Success) with 4xx (Error) codes so 4xx can be read to imply NOT Success.
