Is Content-Transfer-Encoding an HTTP header? - http

I'm writing a web service that returns a base64-encoded PDF file, so my plan is to add two headers to the response:
Content-Type: application/pdf
Content-Transfer-Encoding: base64
My question is: Is Content-Transfer-Encoding a valid HTTP header? I think it might only be for MIME. If not, how should I craft my HTTP response to represent the fact that I'm returning a base64-encoded PDF? Thanks.
EDIT:
It looks like HTTP does not support this header. From RFC2616 Section 14:
Note: while the definition of Content-MD5 is exactly the same for HTTP
as in RFC 1864 for MIME entity-bodies, there are several ways in which
the application of Content-MD5 to HTTP entity-bodies differs from its
application to MIME entity-bodies. One is that HTTP, unlike MIME, does
not use Content-Transfer-Encoding, and does use Transfer-Encoding and
Content-Encoding.
Any ideas for what I should set my headers to? Thanks.
EDIT 2
Many of the code samples found in the comments of this PHP reference manual page seem to suggest that it actually is a valid HTTP header:
http://php.net/manual/en/function.header.php

According to RFC 1341 (made obsolete by RFC 2045):
A Content-Transfer-Encoding header field, which can be used to
specify an auxiliary encoding that was applied to the data in order to
allow it to pass through mail transport mechanisms which may have
data or character set limitations.
and later:
Many Content-Types which could usefully be transported via email
are represented, in their "natural" format, as 8-bit character or
binary data. Such data cannot be transmitted over some transport
protocols. For example, RFC 821 restricts mail messages to 7-bit
US-ASCII data with 1000 character lines.
It is necessary, therefore, to define a standard mechanism for
re-encoding such data into a 7-bit short-line format. (...) The
Content-Transfer-Encoding field is used to indicate the type of
transformation that has been used in order to represent the body
in an acceptable manner for transport.
Since you have a webservice, which has nothing in common with emails, you shouldn't use this header.
You can use Content-Encoding header which indicates that transferred data has been compressed (gzip value).
I think that in your case
Content-Type: application/pdf
is enough. Additionally, you can set Content-Length header, but in my opinion, if you are building webservice (it's not http server / proxy server) Content-Type is enough. Please bear in mind that some specific headers (e.g. Transfer-Encoding) if not used appropriately, may cause unexpected communication issues, so if you are not 100% sure about usage of some header - if you really need it or not - just don't use it.

Notes in rfc2616 section 14.15 are explicit: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
"Note: while the definition of Content-MD5 is exactly the same for
HTTP as in RFC 1864 for MIME entity-bodies, there are several ways
in which the application of Content-MD5 to HTTP entity-bodies
differs from its application to MIME entity-bodies. One is that
HTTP, unlike MIME, does not use Content-Transfer-Encoding, and
does use Transfer-Encoding and Content-Encoding. Another is that
HTTP more frequently uses binary content types than MIME, so it is
worth noting that, in such cases, the byte order used to compute
the digest is the transmission byte order defined for the type.
Lastly, HTTP allows transmission of text types with any of several
line break conventions and not just the canonical form using CRLF."

As been answered before and also here, a valid Content-Transfer-Encoding HTTP response header does not exist. Also the known headers Content-Encoding and Transfer-Encoding have no appropriate value to express a Base64 encoded response body.
Starting from here, no client would expect a response declared as application/pdf to be encoded as Base64! If you wand to do so, better use a different content type like:
Content-Type: application/pdf+base64
In this case, a client would know some Base64 encoded data is coming (the basic subtype is the suffix after the plus sign) and has a hint there is PDF in there.
Even this is a little hacky (+base64 is no official media type suffix) but at least would somehow meet some standards. Better use a custom content type than misusing standard HTTP headers!
Of course no browser would be able to directly open such a response anyway. Maybe your project should consider creating another endpoint offering a binary PDF response and marking this one deprecated.

Related

Using "Content-Encoding":"GZIP"

I want to send large amount of json over http to sever.
If I use "Content-Encoding":"GZIP" in my httpClient, does it automatically convert the request body to compressed format?
No, the RFC 7231 describes content encoding. If you are sending Content-Encoding you need to make sure that the content is in that encoding.
If you send Content-Encoding: gzip and the message in plain text you will (quite rightly) receive an HTTP 400. The body of a gzip message will always start with 0x1f 0x8b and if the server does not find that int he POST request it is right to complain.
Another reason for this is that you need an appropriate Content-Length header. This will not be the length of the original JSON, it must be the length (in bytes) of the gzipped JSON.
You need to perform the gzip of the JSON before sending anything since you need to know what to place in Content-Length beforehand.
Extra note: If the JSON is that huge (e.g. several gigabytes) you probably will need Transfer-Encoding: chunked, which comes with its own complications. (You do not send Content-Length but add the length of the chuck to the body itself.)
If it automatically does this, is 100% dependent on which http client you are using and if they implemented it that way. Usually setting a header will not automatically encode it, at least in the clients I regularly use.

Does HTTP content negotiation respect media type parameters

An HTTP request can include an Accept header, indicating the media type(s) of responses that the client will find acceptable. The server should honour the request by providing a response that has a Content-Type that matches (one of) the requested media type(s). A media type may include parameters. Does HTTP require that this process of content-negotiation respect parameters?
That is, if the client requests
Accept: application/vnd.example; version=2
(here the version parameter has a value of 2), and the server can serve media-type application/vnd.example; version=1, but not application/vnd.example; version=2, is it OK for the server to provide a response with
Content-Type: application/vnd.example; version=1
Is it OK for the server to provide a response labelled
Content-Type: application/vnd.example; version=2
but for the body of the response to actually be encoded as media-type application/vnd.example; version=1? That is, for the parameters of the media-type of a response to be an inaccurate description of the body of the response?
It seems that Spring MVC 4.1.0 does not respect media-type parameters when doing content negotiation, and gives responses for which the parameters of the media-type of the response are an inaccurate description of the body of the response. This seems to be because the org.springframework.util.MimeType.isCompatibleWith(MimeType) method does not examine the parameters of the MimeType objects.
The relevant standard, RFC 7231 section 3.1.1.1, says the following about media-types:
The type/subtype MAY be followed by parameters in the form of
name=value pairs.
So, Accept and Content-Type headers may contain media type parameters. It adds:
The presence or absence of a
parameter might be significant to the processing of a media-type,
depending on its definition within the media type registry.
That suggests that the server code that uses parameter types should pay attention to them, and not simply discard them, because for some media types they will be significant. It has to implement some smarts in whether to consider whether the media type parameters are significant.
Spring MVC 4.1.0 therefore seems to be wrong to completely ignore the parameters when doing content negotiation: the class org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor is incorrect to use org.springframework.util.MimeType.isCompatibleWith(MimeType), or that MimeType.isCompatibleWith(MimeType) method is incorrect. If you provide Spring with several HTTP message converters that differ only in the parameters of their supported media type, Spring will not reliably choose the HTTP message converter that has the media type that exactly matches the requested media type.
In section 3.1.1.5, where it describes the Content-Type header, it says:
The indicated media type defines both the data
format and how that data is intended to be processed by a recipient
As the parameters of a media type in general could vary the data format, the behaviour of Spring MVC 4.1.0 is wrong, in providing parameters that are an inaccurate description of the body of the response: the method AbstractMessageConverterMethodProcessor.getMostSpecificMediaType(MediaType, MediaType) is wrong to return the acceptType rather than the produceTypeToUse when the two types are equally specific.
However, section 3.4.1, which discusses content negotiation (Proactive Negotiation), notes:
A user agent cannot rely on proactive negotiation preferences being
consistently honored, since the origin server might not implement
proactive negotiation for the requested resource or might decide that
sending a response that doesn't conform to the user agent's
preferences is better than sending a 406 (Not Acceptable) response.
So the server is permitted to give a response that does not exactly match the media-type parameters requested, as a fall-back when it can not provide an exact match. That is, it may choose to respond with a application/vnd.example; version=1 response body, with a Content-Type: application/vnd.example; version=1 header, despite the request saying Accept: application/vnd.example; version=2, if, and only if generating a valid application/vnd.example; version=2 response would be impossible.
This apparently incorrect behaviour of Spring already has a Spring bug report, SPR-10903. The Spring developers closed it as "Works as Designed", noting
I don't know any rule for comparing media types with their parameters effectively. It really depends on the media type...If you're actually trying to achieve REST versioning through media types, it seems that the most common solution is to use different media types, since their format obviously changed between versions:
"application/vnd.spring.foo.v1+json"
"application/vnd.spring.foo.v2+json"
The relevant spec for content negotiation in HTTP/1.1 is RFC2616, Section 14.1.
It contains the following example, relevant to your question:
Accept: text/*, text/html, text/html;level=1, */*
and gives the precedence as
1) text/html;level=1
2) text/html
3) text/*
4) */*
So I think it is safe to say that text/html;level=1 and text/html are different media types.
I would also consider text/html;level=1 and text/html;level=2 as different.
So in your example I think it would be correct to respond with a 406 error and not respond with a different media type.

MIME type for HTTP requests other than form submissions

For requests not sent by HTML forms, does HTTP limit the Content-Type of a request to application/x-www-form-urlencoded for non-file uploads, or is that MIME type "right"/standard/semantically meaningful in any other way?
For example, PHP automatically parses the content into $_POST, which seems to indicate that x-www-form-urlencoded is expected by the server. On the other hand, I could use Ajax to send a JSON object in the HTTP request content and set the Content-Type to application/json. At least some server technologies (e.g. WSGI) would not try to parse that, and instead provide it in original form to the script.
What MIME type should I use in POST and PUT requests in a RESTful API to ensure compliance with all server implementations of HTTP? I'm disregarding such technologies as SOAP and JSON-RPC because they tunnel protocols through HTTP instead of using HTTP as intended.
Short Answer
You should specify whichever content type best describes the HTTP message entity body.
Long Answer
For example, PHP automatically parses the content into $_POST, which seems to indicate that x-www-form-urlencoded is expected by the server.
The server is not "expecting" x-www-form-urlencoded. PHP -- in an effort to make the lives of developers simpler -- will parse the form-encoded entity body into the $_POST superglobal if and only if Content-Type: x-www-form-urlencoded AND the entity body is actually a urlencoded key-value string. A similar process is followed for messages arriving with Content-Type: multipart/form-data to generate the $_FILES array. While helpful, these superglobals are unfortunately named and they obfuscate what's really happening in terms of the actual HTTP transactions.
What MIME type should I use in POST and PUT requests in a RESTful API
to ensure compliance with all server implementations of HTTP?
You should specify whichever content type best describes the HTTP message entity body. Always adhere to the official HTTP specification -- you can't go wrong if you do that. From RFC 2616 Sec 7.2.1 (emphasis added):
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If and
only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream".
Any mainstream server technology will adhere to these rules. Thoughtful web applications will not trust your Content-Type header, because it may or may not be correct. The originator of the message is free to send a totally bogus value. Usually the Content-Type header is checked as a preliminary validation measure, but the content is further verified by parsing the actual data. For example, if you're PUTing JSON data to a REST service, the endpoint might first check to make sure that you've sent Content-Type: application/json, but then actually parse the entity body of your message to ensure it really is valid JSON.

Matching HTTP responses with their corresponding HTTP pipelined requests

I'm trying to write a program to match HTTP requests with their corresponding responses. Seems that everything is working well for most of the scenarios (when the transfer is perfectly ordered and even when its not, by using TCP sequence numbers).
The only problem I found is for when I have pipelined requests. After that, I get several responses but I don't know which packets are the answer to a specific request and which are not. I read in another post that the responses will come sequentially and combining this property with information on the Content-Length field seems to be a solution. The problem is that Content-length is not a mandatory field, so I'm not sure if I can always rely on that.
Does anyone know how the web-browsers that support this feature (btw, not most of them do) actually do it?
The information about the bodies length has to be present in the headers. It's just not always in 'content-length'. In order to work it all out you will have to study the relevant RFC 2616. Most notably section 4.4 deals with the different headers
Some more relevant rules from the RFC 2616:
When pipelining:
A server MUST send its responses to those requests in the same order that the requests were received.
From 9.2
If no response body is included, the response MUST include a Content-Length field with a field-value of "0".
From 10.2.7 206 Partial Content
The response MUST include .... Either a Content-Range header field ... or a multipart/byteranges
Content-Type including Content-Range fields for each part.
From 14.13 Content-Length
Applications SHOULD use this field to indicate the transfer-length of the message-body, unless this is prohibited by the rules in section 4.4.
Current responses are a bit old. Need a refresh.
The new HTTP 1.1 RFC is RFC 7230. And contains more precise information on parsing the messages size.
Message Body Length
Associating a response to a request
Security Considerations
Detecting the size of a message is quite complex. You can have a Content-length, or Transfer-Encoding: chunked, or both, or none. And some sepcial codes like 100 Continue which may alter all this.
The first link contains 7 entries that should be checked in the right order to guess the right size.
And as stated in the last link, failing to detect the right message length may lead to HTTP Smuggling (splitting, cache poisoning) issues.
Pipelining support is the source of most smuggling issues. You should really take care of the whole RFC7230 document if you want to implement it.

How do web servers know the charset using in forms posted to them?

When a web server gets a POST of a form, parsing it into param-value(s) pairs is quite straightforward. However, if the values contain non-English chars that have been encoded by the browser, it must know the charset used in order to decode them.
I've examined the requests sent by two posts. One was done from a page using UTF-8, and one from a page using Windows-1255. The same text was encoded differently. AFAIK, the Content-type header could contain a charset after the application/x-www-form-urlencoded, but it wasn't (using Firefox).
In a servlet, when you use request.getParameter(), you're supposed to get the decoded value. How does the servlet container do that? Does it always bet on UTF-8, use some heuristics, or is there some deterministic way I'm missing?
From the Serlvet 3.0 Spec, section 3.10 Request Data Encoding (emphasis mine)
Currently, many browsers do not send a char encoding qualifier with the ContentType header, leaving open the determination of the character encoding for reading
HTTP requests. The default encoding of a request the container uses to create the
request reader and parse POST data must be “ISO-8859-1” if none has been specified
by the client request. However, in order to indicate to the developer, in this case, the
failure of the client to send a character encoding, the container returns null from
the getCharacterEncoding method.
If the client hasn’t set character encoding and the request data is encoded with a
different encoding than the default as described above, breakage can occur. To
remedy this situation, a new method setCharacterEncoding(String enc) has
been added to the ServletRequest interface. Developers can override the
character encoding supplied by the container by calling this method. It must be
called prior to parsing any post data or reading any input from the request. Calling
this method once data has been read will not affect the encoding.
In practice, I find that setting the charset in a response influences the charset used in the subsequent POST. To be extra sure, you can write a Servlet Filter that calls the setCharacterEncoding on every request object before it is used.
You may also find this thread useful - Detecting the character encoding of an HTTP POST request
The apropriate header for specifying charsets is Accept-Charset.
Latest Chrome for linux, e.g., spits:
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
on each request.
Section 14.2 from http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html states:
The Accept-Charset request-header field can be used to indicate what character sets are acceptable for the response. This field allows clients capable of understanding more comprehensive or special- purpose character sets to signal that capability to a server which is capable of representing documents in those character sets.
(...)
If no Accept-Charset header is
present, the default is that any
character set is acceptable. If an
Accept-Charset header is present, and
if the server cannot send a response
which is acceptable according to the
Accept-Charset header, then the server
SHOULD send an error response with the
406 (not acceptable) status code,
though the sending of an unacceptable
response is also allowed.
So if you receive such a header from a client, the value with highest q can be the encoding you're receiving from it.

Resources