Does HTTP content negotiation respect media type parameters - http

An HTTP request can include an Accept header, indicating the media type(s) of responses that the client will find acceptable. The server should honour the request by providing a response that has a Content-Type that matches (one of) the requested media type(s). A media type may include parameters. Does HTTP require that this process of content-negotiation respect parameters?
That is, if the client requests
Accept: application/vnd.example; version=2
(here the version parameter has a value of 2), and the server can serve media-type application/vnd.example; version=1, but not application/vnd.example; version=2, is it OK for the server to provide a response with
Content-Type: application/vnd.example; version=1
Is it OK for the server to provide a response labelled
Content-Type: application/vnd.example; version=2
but for the body of the response to actually be encoded as media-type application/vnd.example; version=1? That is, for the parameters of the media-type of a response to be an inaccurate description of the body of the response?
It seems that Spring MVC 4.1.0 does not respect media-type parameters when doing content negotiation, and gives responses for which the parameters of the media-type of the response are an inaccurate description of the body of the response. This seems to be because the org.springframework.util.MimeType.isCompatibleWith(MimeType) method does not examine the parameters of the MimeType objects.

The relevant standard, RFC 7231 section 3.1.1.1, says the following about media-types:
The type/subtype MAY be followed by parameters in the form of
name=value pairs.
So, Accept and Content-Type headers may contain media type parameters. It adds:
The presence or absence of a
parameter might be significant to the processing of a media-type,
depending on its definition within the media type registry.
That suggests that the server code that uses parameter types should pay attention to them, and not simply discard them, because for some media types they will be significant. It has to implement some smarts in whether to consider whether the media type parameters are significant.
Spring MVC 4.1.0 therefore seems to be wrong to completely ignore the parameters when doing content negotiation: the class org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor is incorrect to use org.springframework.util.MimeType.isCompatibleWith(MimeType), or that MimeType.isCompatibleWith(MimeType) method is incorrect. If you provide Spring with several HTTP message converters that differ only in the parameters of their supported media type, Spring will not reliably choose the HTTP message converter that has the media type that exactly matches the requested media type.
In section 3.1.1.5, where it describes the Content-Type header, it says:
The indicated media type defines both the data
format and how that data is intended to be processed by a recipient
As the parameters of a media type in general could vary the data format, the behaviour of Spring MVC 4.1.0 is wrong, in providing parameters that are an inaccurate description of the body of the response: the method AbstractMessageConverterMethodProcessor.getMostSpecificMediaType(MediaType, MediaType) is wrong to return the acceptType rather than the produceTypeToUse when the two types are equally specific.
However, section 3.4.1, which discusses content negotiation (Proactive Negotiation), notes:
A user agent cannot rely on proactive negotiation preferences being
consistently honored, since the origin server might not implement
proactive negotiation for the requested resource or might decide that
sending a response that doesn't conform to the user agent's
preferences is better than sending a 406 (Not Acceptable) response.
So the server is permitted to give a response that does not exactly match the media-type parameters requested, as a fall-back when it can not provide an exact match. That is, it may choose to respond with a application/vnd.example; version=1 response body, with a Content-Type: application/vnd.example; version=1 header, despite the request saying Accept: application/vnd.example; version=2, if, and only if generating a valid application/vnd.example; version=2 response would be impossible.
This apparently incorrect behaviour of Spring already has a Spring bug report, SPR-10903. The Spring developers closed it as "Works as Designed", noting
I don't know any rule for comparing media types with their parameters effectively. It really depends on the media type...If you're actually trying to achieve REST versioning through media types, it seems that the most common solution is to use different media types, since their format obviously changed between versions:
"application/vnd.spring.foo.v1+json"
"application/vnd.spring.foo.v2+json"

The relevant spec for content negotiation in HTTP/1.1 is RFC2616, Section 14.1.
It contains the following example, relevant to your question:
Accept: text/*, text/html, text/html;level=1, */*
and gives the precedence as
1) text/html;level=1
2) text/html
3) text/*
4) */*
So I think it is safe to say that text/html;level=1 and text/html are different media types.
I would also consider text/html;level=1 and text/html;level=2 as different.
So in your example I think it would be correct to respond with a 406 error and not respond with a different media type.

Related

HTTP Status Code Priority and Processing

Let's say a web application gets the following request:
POST /some/endpoint HTTP/1.1
Host: <something>
Accept: application/json
Accept-Language: pt
Content-Type: application/json
If-Match: "blabla"
Some body
If the server doesn't support HTTP 1.1 and the endpoint /some/endpoint does not exist, the former problem should likely be checked first, and a 505 rather than 404 should be returned.
If it just so happens that none of the endpoints of the server accept POST and the endpoint /some/endpoint doesn't exist, the latter should get priority, and 404 should be returned rather than 405.
If the Accept can't be provided and the body can't be appropriately decoded/validated, probably 406 should take precedence over 400.
These are cases where intuition might suffice. But there are a myriad other ones where it is not clear which of two non-2XX status codes should be preferred/checked first. For example, should Content-Type (resulting in 415) or Accept-Language (406) be returned if both would fail? 415 or 412? And on it goes...
Much of the time errors are pairwise independent: if the aspect that is relevant to one error being thrown (such as a particular header value) is fixed, the success/error status of another will not be affected. In those cases, the wrong error "priority" is perhaps only a nuisance. But sometimes it may be the case that these errors are not independent: I might have a resources as HTML in Portuguese, but in JSON only in English (humour me), so that if a client expects me to prioritise Accept-Language over Accept, and I do the opposite, the result will be quite bad.
The question should be evident now: are there any standards about which errors should be prioritised?
I haven't come across any relevant RFCs, or even much serious and general discussion. I know of the webmachine diagram, which sort of helps, but primarily just seems to describe a particular (well thought out) implementation rather than any standard.
Obviously, you can’t expect this question to be answered “no,” even though that’s probably the correct answer.
So let me address a particular point of yours instead:
I might have a resources as HTML in Portuguese, but in JSON only in English (humour me), so that if a client expects me to prioritise Accept-Language over Accept, and I do the opposite, the result will be quite bad.
In your example, you tell the server that Portuguese JSON is good, but all other combinations are equally bad. If that’s not the case, you can elaborate your preferences like this:
Accept: text/json
Accept-Language: pt, en;q=0.1
The server can then multiply your weights, getting 1×0.1=0.1 for English JSON and 0×1=0 for Portuguese HTML, and choosing the former.
(Sidenote 1: there is no text/json media type in the registry. You probably want application/json.)
(Sidenote 2: 415 Unsupported Media Type is not a correct response code for the scenarios you mention. It concerns the request body. If you cannot honor the Accept header, you can respond with 406 Not Acceptable, just as with Accept-Language.)
TL;DR: The specifications give the server ultimate authority in how it honors the request, even allowing the server to ignore the acceptable formats the client requests. However, the specifications instruct the server to make a best effort and to respond in a way that best helps the client recover from errors.
The specifications provide guidance, even if they don't (or can't) prioritize all possible error modes.
RFC 2616 § 10.4.7 says:
HTTP/1.1 servers are allowed to return responses which are
not acceptable according to the accept headers sent in the
request. In some cases, this may even be preferable to sending a
406 response. User agents are encouraged to inspect the headers of
an incoming response to determine if it is acceptable.
RFC 7231 § 3 says:
An origin server might be provided with, or be capable of generating,
multiple representations that are each intended to reflect the
current state of a target resource. In such cases, some algorithm is
used by the origin server to select one of those representations as
most applicable to a given request, usually based on content
negotiation.
RFC 7231 § 3.4 says:
Note that, in all cases, HTTP is not aware of the resource semantics.
The consistency with which an origin server responds to requests ... is determined entirely by whatever entity or algorithm selects
or generates those responses. HTTP pays no attention to the man
behind the curtain.
RFC 7231 § 3.3 says:
Response messages with an error status code
usually contain a payload that represents the error condition, such
that it describes the error state and what next steps are suggested
for resolving it.
RFC 2616 § 14.46 says:
The Warning general-header field is used to carry additional information about the status or transformation of a message which might not be reflected in the message. This information is typically used to warn about a possible lack of semantic transparency from caching operations or transformations applied to the entity body of the message.
(Emphases all mine.)
Section 3 of RFC 7231 gives the origin server ultimate authority to decide the appropriate response, even if that response is repugnant. Simultaneously, section 3 encourages the origin server to satisfy the request, or provide notice that it satisfied some of the request (Vary), or provide selectable options ("Passive negotiation").
Even though the server has ultimate authority, the specification makes clear to me that the responses should help the user resolve the problem. In my mind, the best error code is the one that helps the user best fix the problem!
Considering your pair-wise examples:
"If the server doesn't support HTTP 1.1 and the endpoint /some/endpoint does not exist, the former problem should likely be checked first, and a 505 rather than 404 should be returned."
No. Per the spec, an HTTP 1.1 client can GET from 1.0 server by protocol downgrade, so this kind of version negotiation is handled by the specification. Send a 404 (or a 301 if that's known) so the user can correct it.
"If it just so happens that none of the endpoints of the server accept POST and the endpoint /some/endpoint doesn't exist, the latter should get priority, and 404 should be returned rather than 405."
Yes, 404. If you're not getting to a resource, the method hardly matters.
"If the Accept can't be provided and the body can't be appropriately decoded/validated, probably 406 should take precedence over 400."
Never send 400 when you know 406 applies. You're giving the client less information, which is less helpful. However, the origin server is free to ignore the Accept header per RFC 7231 § 5.3.2:
If the [Accept] header field is
present in a request and none of the available representations for
the response have a media type that is listed as acceptable, the
origin server can either honor the header field by sending a 406 (Not
Acceptable) response or disregard the header field by treating the
response as if it is not subject to content negotiation.
"I might have a resources as HTML in Portuguese, but in JSON only in English (humour me), so that if a client expects me to prioritise Accept-Language over Accept, and I do the opposite, the result will be quite bad."
I disagree that the result will be bad. See RFC 7231 § 5.3.5:
the origin server can either disregard the [Accept-Language] header field by treating the response as if it is not subject to content negotiation or honor the header field by sending a 406 (Not Acceptable) response. However, the latter is not encouraged, as doing so can prevent users from accessing content that they might be able to use (with translation software, for example).
This pattern of specification language occurs more than once. "The server may disregard [whatever the client requested] by treating the response as if it's not subject to [this part of the specification], or the server may honor [the client request] and send [an applicable error code]. But, it's better to [send something intelligible] than only send [an inscrutable error code]."
At the end of the day, it's your API. HTTP provides only a window into your semantics. Document what you accept, how you respond, and with what. Send intelligible responses (HATEOAS is good) and, when applicable, the most specific error codes available.

How are parameters sent in an HTTP POST request?

In an HTTP GET request, parameters are sent as a query string:
http://example.com/page?parameter=value&also=another
In an HTTP POST request, the parameters are not sent along with the URI.
Where are the values? In the request header? In the request body? What does it look like?
The values are sent in the request body, in the format that the content type specifies.
Usually the content type is application/x-www-form-urlencoded, so the request body uses the same format as the query string:
parameter=value&also=another
When you use a file upload in the form, you use the multipart/form-data encoding instead, which has a different format. It's more complicated, but you usually don't need to care what it looks like, so I won't show an example, but it can be good to know that it exists.
The content is put after the HTTP headers. The format of an HTTP POST is to have the HTTP headers, followed by a blank line, followed by the request body. The POST variables are stored as key-value pairs in the body.
You can see this in the raw content of an HTTP Post, shown below:
POST /path/script.cgi HTTP/1.0
From: frog#jmarshall.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32
home=Cosby&favorite+flavor=flies
You can see this using a tool like Fiddler, which you can use to watch the raw HTTP request and response payloads being sent across the wire.
Short answer: in POST requests, values are sent in the "body" of the request. With web-forms they are most likely sent with a media type of application/x-www-form-urlencoded or multipart/form-data. Programming languages or frameworks which have been designed to handle web-requests usually do "The Right Thing™" with such requests and provide you with easy access to the readily decoded values (like $_REQUEST or $_POST in PHP, or cgi.FieldStorage(), flask.request.form in Python).
Now let's digress a bit, which may help understand the difference ;)
The difference between GET and POST requests are largely semantic. They are also "used" differently, which explains the difference in how values are passed.
GET (relevant RFC section)
When executing a GET request, you ask the server for one, or a set of entities. To allow the client to filter the result, it can use the so called "query string" of the URL. The query string is the part after the ?. This is part of the URI syntax.
So, from the point of view of your application code (the part which receives the request), you will need to inspect the URI query part to gain access to these values.
Note that the keys and values are part of the URI. Browsers may impose a limit on URI length. The HTTP standard states that there is no limit. But at the time of this writing, most browsers do limit the URIs (I don't have specific values). GET requests should never be used to submit new information to the server. Especially not larger documents. That's where you should use POST or PUT.
POST (relevant RFC section)
When executing a POST request, the client is actually submitting a new document to the remote host. So, a query string does not (semantically) make sense. Which is why you don't have access to them in your application code.
POST is a little bit more complex (and way more flexible):
When receiving a POST request, you should always expect a "payload", or, in HTTP terms: a message body. The message body in itself is pretty useless, as there is no standard (as far as I can tell. Maybe application/octet-stream?) format. The body format is defined by the Content-Type header. When using a HTML FORM element with method="POST", this is usually application/x-www-form-urlencoded. Another very common type is multipart/form-data if you use file uploads. But it could be anything, ranging from text/plain, over application/json or even a custom application/octet-stream.
In any case, if a POST request is made with a Content-Type which cannot be handled by the application, it should return a 415 status-code.
Most programming languages (and/or web-frameworks) offer a way to de/encode the message body from/to the most common types (like application/x-www-form-urlencoded, multipart/form-data or application/json). So that's easy. Custom types require potentially a bit more work.
Using a standard HTML form encoded document as example, the application should perform the following steps:
Read the Content-Type field
If the value is not one of the supported media-types, then return a response with a 415 status code
otherwise, decode the values from the message body.
Again, languages like PHP, or web-frameworks for other popular languages will probably handle this for you. The exception to this is the 415 error. No framework can predict which content-types your application chooses to support and/or not support. This is up to you.
PUT (relevant RFC section)
A PUT request is pretty much handled in the exact same way as a POST request. The big difference is that a POST request is supposed to let the server decide how to (and if at all) create a new resource. Historically (from the now obsolete RFC2616 it was to create a new resource as a "subordinate" (child) of the URI where the request was sent to).
A PUT request in contrast is supposed to "deposit" a resource exactly at that URI, and with exactly that content. No more, no less. The idea is that the client is responsible to craft the complete resource before "PUTting" it. The server should accept it as-is on the given URL.
As a consequence, a POST request is usually not used to replace an existing resource. A PUT request can do both create and replace.
Side-Note
There are also "path parameters" which can be used to send additional data to the remote, but they are so uncommon, that I won't go into too much detail here. But, for reference, here is an excerpt from the RFC:
Aside from dot-segments in hierarchical paths, a path segment is considered
opaque by the generic syntax. URI producing applications often use the
reserved characters allowed in a segment to delimit scheme-specific or
dereference-handler-specific subcomponents. For example, the semicolon (";")
and equals ("=") reserved characters are often used to delimit parameters and
parameter values applicable to that segment. The comma (",") reserved
character is often used for similar purposes. For example, one URI producer
might use a segment such as "name;v=1.1" to indicate a reference to version
1.1 of "name", whereas another might use a segment such as "name,1.1" to
indicate the same. Parameter types may be defined by scheme-specific
semantics, but in most cases the syntax of a parameter is specific
to the implementation of the URIs dereferencing algorithm.
You cannot type it directly on the browser URL bar.
You can see how POST data is sent on the Internet with Live HTTP Headers for example.
Result will be something like that
http://127.0.0.1/pass.php
POST /pass.php HTTP/1.1
Host: 127.0.0.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://127.0.0.1/pass.php
Cookie: passx=87e8af376bc9d9bfec2c7c0193e6af70; PHPSESSID=l9hk7mfh0ppqecg8gialak6gt5
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 30
username=zurfyx&pass=password
Where it says
Content-Length: 30
username=zurfyx&pass=password
will be the post values.
The default media type in a POST request is application/x-www-form-urlencoded. This is a format for encoding key-value pairs. The keys can be duplicate. Each key-value pair is separated by an & character, and each key is separated from its value by an = character.
For example:
Name: John Smith
Grade: 19
Is encoded as:
Name=John+Smith&Grade=19
This is placed in the request body after the HTTP headers.
Form values in HTTP POSTs are sent in the request body, in the same format as the querystring.
For more information, see the spec.
Some of the webservices require you to place request data and metadata separately. For example a remote function may expect that the signed metadata string is included in a URI, while the data is posted in a HTTP-body.
The POST request may semantically look like this:
POST /?AuthId=YOURKEY&Action=WebServiceAction&Signature=rcLXfkPldrYm04 HTTP/1.1
Content-Type: text/tab-separated-values; charset=iso-8859-1
Content-Length: []
Host: webservices.domain.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: identity
User-Agent: Mozilla/3.0 (compatible; Indy Library)
name id
John G12N
Sarah J87M
Bob N33Y
This approach logically combines QueryString and Body-Post using a single Content-Type which is a "parsing-instruction" for a web-server.
Please note: HTTP/1.1 is wrapped with the #32 (space) on the left and with #10 (Line feed) on the right.
First of all, let's differentiate between GET and POST
Get: It is the default HTTP request that is made to the server and is used to retrieve the data from the server and query string that comes after ? in a URI is used to retrieve a unique resource.
this is the format
GET /someweb.asp?data=value HTTP/1.0
here data=value is the query string value passed.
POST: It is used to send data to the server safely so anything that is needed, this is the format of a POST request
POST /somweb.aspHTTP/1.0
Host: localhost
Content-Type: application/x-www-form-urlencoded //you can put any format here
Content-Length: 11 //it depends
Name= somename
Why POST over GET?
In GET the value being sent to the servers are usually appended to the base URL in the query string,now there are 2 consequences of this
The GET requests are saved in browser history with the parameters. So your passwords remain un-encrypted in browser history. This was a real issue for Facebook back in the days.
Usually servers have a limit on how long a URI can be. If have too many parameters being sent you might receive 414 Error - URI too long
In case of post request your data from the fields are added to the body instead. Length of request params is calculated, and added to the header for content-length and no important data is directly appended to the URL.
You can use the Google Developer Tools' network section to see basic information about how requests are made to the servers.
and you can always add more values in your Request Headers like Cache-Control , Origin , Accept.
There are many ways/formats of post parameters
formdata
raw data
json
encoded data
file
xml
They are controlled by content-type in Header that are representes as mime-types.
In CGI Programming on the World Wide Web the author says:
Using the POST method, the server sends the data as an input stream to
the program. ..... since the server passes information to this program
as an input stream, it sets the environment variable CONTENT_LENGTH to
the size of the data in number of bytes (or characters). We can use
this to read exactly that much data from standard input.

MIME type for HTTP requests other than form submissions

For requests not sent by HTML forms, does HTTP limit the Content-Type of a request to application/x-www-form-urlencoded for non-file uploads, or is that MIME type "right"/standard/semantically meaningful in any other way?
For example, PHP automatically parses the content into $_POST, which seems to indicate that x-www-form-urlencoded is expected by the server. On the other hand, I could use Ajax to send a JSON object in the HTTP request content and set the Content-Type to application/json. At least some server technologies (e.g. WSGI) would not try to parse that, and instead provide it in original form to the script.
What MIME type should I use in POST and PUT requests in a RESTful API to ensure compliance with all server implementations of HTTP? I'm disregarding such technologies as SOAP and JSON-RPC because they tunnel protocols through HTTP instead of using HTTP as intended.
Short Answer
You should specify whichever content type best describes the HTTP message entity body.
Long Answer
For example, PHP automatically parses the content into $_POST, which seems to indicate that x-www-form-urlencoded is expected by the server.
The server is not "expecting" x-www-form-urlencoded. PHP -- in an effort to make the lives of developers simpler -- will parse the form-encoded entity body into the $_POST superglobal if and only if Content-Type: x-www-form-urlencoded AND the entity body is actually a urlencoded key-value string. A similar process is followed for messages arriving with Content-Type: multipart/form-data to generate the $_FILES array. While helpful, these superglobals are unfortunately named and they obfuscate what's really happening in terms of the actual HTTP transactions.
What MIME type should I use in POST and PUT requests in a RESTful API
to ensure compliance with all server implementations of HTTP?
You should specify whichever content type best describes the HTTP message entity body. Always adhere to the official HTTP specification -- you can't go wrong if you do that. From RFC 2616 Sec 7.2.1 (emphasis added):
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If and
only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream".
Any mainstream server technology will adhere to these rules. Thoughtful web applications will not trust your Content-Type header, because it may or may not be correct. The originator of the message is free to send a totally bogus value. Usually the Content-Type header is checked as a preliminary validation measure, but the content is further verified by parsing the actual data. For example, if you're PUTing JSON data to a REST service, the endpoint might first check to make sure that you've sent Content-Type: application/json, but then actually parse the entity body of your message to ensure it really is valid JSON.

Appropriate HTTP status code for request specifying invalid Content-Encoding header?

What status code should be returned if a client sends an HTTP request and specifies a Content-Encoding header which cannot be decoded by the server?
Example
A client POSTs JSON data to a REST resource and encodes the entity body using the gzip coding. However, the server can only decode DEFLATE codings because it failed the gzip class in server school.
What HTTP response code should be returned? I would say 415 Unsupported Media Type but it's not the entity's Content-Type that is the problem -- it's the encoding of the otherwise supported entity body.
Which is more appropriate: 415? 400? Perhaps a custom response code?
Addendum: I have, of course, thoroughly checked rfc2616. If the answer is there I may need some new corrective eyewear, but I don't believe that it is.
Update:
This has nothing to do with sending a response that might be unacceptable to a client. The problem is that the client is sending the server what may or may not be a valid media type in an encoding the server cannot understand (as per the Content-Encoding header the client packaged with the request message).
It's an edge-case and wouldn't be encountered when dealing with browser user-agents, but it could crop up in REST APIs accepting entity bodies to create/modify resources.
As i'm reading it, 415 Unsupported Media Type sounds like the most appropriate.
From RFC 2616:
10.4.16 415 Unsupported Media Type
The server is refusing to service the request because the entity of the request is in a format not supported by the requested resource for the requested method.
Yeah, the text part says "media type" rather than "encoding", but the actual description doesn't include any mention of that distinction.
The new hotness, RFC 7231, is even explicit about it:
6.5.13. 415 Unsupported Media Type
The 415 (Unsupported Media Type) status code indicates that the
origin server is refusing to service the request because the payload
is in a format not supported by this method on the target resource.
The format problem might be due to the request's indicated
Content-Type or Content-Encoding, or as a result of inspecting the
data directly.
They should make that the final question on Who Wants To Be a Millionaire!
Well the browser made a request that the server cannot service because the information the client provided is in a format that cannot be handled by the server. However, this isn't the server's fault for not supporting the data the client provided, it's the client's fault for not listening to the server's Acccept-* headers and providing data in an inappropriate encoding. That would make it a Client Error (400 series error code).
My first instinct is 400 Bad Request is the appropriate response in this case.
405 Method Not Allowed isn't right because it refers to the HTTP verb being one that isn't allowed.
406 Not Acceptable looks like it might have promise, but it refers to the server being unable to provide data to the client that satisfies the Accept-* request headers that it sent. This doesn't seem like it would fit your case.
412 Precondition Failed is rather vaguely defined. It might be appropriate, but I wouldn't bet on it.
415 Unsupported Media Type isn't right because it's not the data type that's being rejected, it's the encoding format.
After that we get into the realm of non-standard response codes.
422 Unprocessable Entity describes a response that should be returned if the request was well-formed but if it was semantically incorrect in some way. This seems like a good fit, but it's a WebDAV extension to HTTP and not standard.
Given the above, I'd personally opt for 400 Bad Request. If any other HTTP experts have a better candidate though, I'd listen to them instead. ;)
UPDATE: I'd previously been referencing the HTTP statuses from their page on Wikipedia. Whilst the information there seems to be accurate, it's also less than thorough. Looking at the specs from W3C gives a lot more information on HTTP 406, and it's leading me to think that 406 might be the right code after all.
10.4.7 406 Not Acceptable
The resource identified by the request is only capable of generating
response entities which have content characteristics not acceptable
according to the accept headers sent in the request.
Unless it was a HEAD request, the response SHOULD include an entity
containing a list of available entity characteristics and location(s)
from which the user or user agent can choose the one most appropriate.
The entity format is specified by the media type given in the
Content-Type header field. Depending upon the format and the
capabilities of the user agent, selection of the most appropriate
choice MAY be performed automatically. However, this specification
does not define any standard for such automatic selection.
Note: HTTP/1.1 servers are allowed to return responses which are
not acceptable according to the accept headers sent in the
request. In some cases, this may even be preferable to sending a
406 response. User agents are encouraged to inspect the headers of
an incoming response to determine if it is acceptable.
If the response could be unacceptable, a user agent SHOULD temporarily
stop receipt of more data and query the user for a decision on further
actions.
While it does mention the Content-Type header explicitly, the wording mentions "entity characteristics", which you could read as covering stuff like GZIP versus DEFLATE compression.
One thing worth noting is that the spec says that it may be appropriate to just send the data as is, along with the headers to tell the client what format it's in and what encoding it uses, and just leave it for the client to sort out. So if the client sends a header indicating it accepts GZIP compression, but the server can only generate a response with DEFLATE, then sending that along with headers saying it's DEFLATE should be okay (depending on the context).
Client: Give me a GZIPPED page.
Server: Sorry, no can do. I can DEFLATE pack it for you. Here's the DEFLATE packed page. Is that okay for you?
Client: Welllll... I didn't really want DEFLATE, but I can decode it okay so I'll take it.
(or)
Client: I think I'll have to clear that with my user. Hold on.

Is Content-Transfer-Encoding an HTTP header?

I'm writing a web service that returns a base64-encoded PDF file, so my plan is to add two headers to the response:
Content-Type: application/pdf
Content-Transfer-Encoding: base64
My question is: Is Content-Transfer-Encoding a valid HTTP header? I think it might only be for MIME. If not, how should I craft my HTTP response to represent the fact that I'm returning a base64-encoded PDF? Thanks.
EDIT:
It looks like HTTP does not support this header. From RFC2616 Section 14:
Note: while the definition of Content-MD5 is exactly the same for HTTP
as in RFC 1864 for MIME entity-bodies, there are several ways in which
the application of Content-MD5 to HTTP entity-bodies differs from its
application to MIME entity-bodies. One is that HTTP, unlike MIME, does
not use Content-Transfer-Encoding, and does use Transfer-Encoding and
Content-Encoding.
Any ideas for what I should set my headers to? Thanks.
EDIT 2
Many of the code samples found in the comments of this PHP reference manual page seem to suggest that it actually is a valid HTTP header:
http://php.net/manual/en/function.header.php
According to RFC 1341 (made obsolete by RFC 2045):
A Content-Transfer-Encoding header field, which can be used to
specify an auxiliary encoding that was applied to the data in order to
allow it to pass through mail transport mechanisms which may have
data or character set limitations.
and later:
Many Content-Types which could usefully be transported via email
are represented, in their "natural" format, as 8-bit character or
binary data. Such data cannot be transmitted over some transport
protocols. For example, RFC 821 restricts mail messages to 7-bit
US-ASCII data with 1000 character lines.
It is necessary, therefore, to define a standard mechanism for
re-encoding such data into a 7-bit short-line format. (...) The
Content-Transfer-Encoding field is used to indicate the type of
transformation that has been used in order to represent the body
in an acceptable manner for transport.
Since you have a webservice, which has nothing in common with emails, you shouldn't use this header.
You can use Content-Encoding header which indicates that transferred data has been compressed (gzip value).
I think that in your case
Content-Type: application/pdf
is enough. Additionally, you can set Content-Length header, but in my opinion, if you are building webservice (it's not http server / proxy server) Content-Type is enough. Please bear in mind that some specific headers (e.g. Transfer-Encoding) if not used appropriately, may cause unexpected communication issues, so if you are not 100% sure about usage of some header - if you really need it or not - just don't use it.
Notes in rfc2616 section 14.15 are explicit: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
"Note: while the definition of Content-MD5 is exactly the same for
HTTP as in RFC 1864 for MIME entity-bodies, there are several ways
in which the application of Content-MD5 to HTTP entity-bodies
differs from its application to MIME entity-bodies. One is that
HTTP, unlike MIME, does not use Content-Transfer-Encoding, and
does use Transfer-Encoding and Content-Encoding. Another is that
HTTP more frequently uses binary content types than MIME, so it is
worth noting that, in such cases, the byte order used to compute
the digest is the transmission byte order defined for the type.
Lastly, HTTP allows transmission of text types with any of several
line break conventions and not just the canonical form using CRLF."
As been answered before and also here, a valid Content-Transfer-Encoding HTTP response header does not exist. Also the known headers Content-Encoding and Transfer-Encoding have no appropriate value to express a Base64 encoded response body.
Starting from here, no client would expect a response declared as application/pdf to be encoded as Base64! If you wand to do so, better use a different content type like:
Content-Type: application/pdf+base64
In this case, a client would know some Base64 encoded data is coming (the basic subtype is the suffix after the plus sign) and has a hint there is PDF in there.
Even this is a little hacky (+base64 is no official media type suffix) but at least would somehow meet some standards. Better use a custom content type than misusing standard HTTP headers!
Of course no browser would be able to directly open such a response anyway. Maybe your project should consider creating another endpoint offering a binary PDF response and marking this one deprecated.

Resources