Why is the `Content-Encoding` header named that way? - http

Background
A server can send a Content-Encoding header to indicate if, and how, the content of the response body has been compressed. E.g.
Content-Encoding: gzip
A server can also send a Content-Type header to indicate the media type and, optionally, the character encoding of the content. E.g.
Content-Type: text/html; charset=utf-8
Therefore, it seems that the encoding of the content is specified in the Content-Type header, and not the Content-Encoding header.
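To make the two layers concrete, here is a minimal Python sketch. The sample text and the headers dict are made up for illustration; the point is only that the charset turns text into bytes, while Content-Encoding describes a further transformation applied to those bytes.

```python
import gzip

text = "<p>café</p>"

# Layer 1: character encoding (the charset parameter of Content-Type).
# Turns text into bytes.
body = text.encode("utf-8")

# Layer 2: content coding (the Content-Encoding header).
# Turns bytes into compressed bytes.
compressed = gzip.compress(body)

headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}

# A receiver undoes the layers in reverse order: decompress, then decode.
decoded = gzip.decompress(compressed).decode("utf-8")
```

Note that Content-Length counts the compressed bytes, not the characters of the original text.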
Question
During the design of the HTTP standard, what was the rationale behind naming the header Content-Encoding (rather than, say, Content-Compression)? Is this another case of bad naming, like the 401 Unauthorized response code?

Related

Charles - How to rewrite a HTTP response encoded in GZIP?

I am trying to change the body of an HTTP response (for debugging purposes) using the Charles web proxy. However, the response is gzipped, and when I modify something in the body (using the Tools-->Rewrite options) two problems emerge:
The Content-Length header is not updated to reflect the new data I've added to the response. I have to manually update it myself.
Even if I update the Content-Length manually, it seems the new GZIP body is invalid (ie, it is not GZIPed correctly).
I can't just remove the Content-Encoding: gzip header from the response because I don't control the code that consumes it. That code always expects a gzip body and fails to parse the data if the gzip encoding is removed.
How can I modify a GZIPed response body using Charles?
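Independent of Charles, both symptoms follow from how gzip works: the compressed bytes cannot be edited in place, so any rewrite has to decompress, edit, recompress, and then recompute Content-Length. The following Python sketch illustrates that sequence. The function name rewrite_gzip_body and the plain dict of headers are assumptions for illustration, not a Charles feature.

```python
import gzip

def rewrite_gzip_body(headers, body, old, new):
    """Replace `old` with `new` inside a gzip-encoded body.

    Returns a (headers, body) pair with Content-Length recomputed.
    """
    if headers.get("Content-Encoding", "").lower() != "gzip":
        raise ValueError("body is not gzip-encoded")

    plain = gzip.decompress(body)          # 1. undo the content coding
    edited = plain.replace(old, new)       # 2. edit the uncompressed bytes
    recompressed = gzip.compress(edited)   # 3. re-apply the content coding

    new_headers = dict(headers)
    # 4. Content-Length must describe the new compressed size.
    new_headers["Content-Length"] = str(len(recompressed))
    return new_headers, recompressed
```

The consumer still receives a valid gzip body with a matching Content-Encoding header, so it does not need to change.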

How should I specify HTTP accept header for multipart response?

I want to tell an HTTP endpoint that, in its multipart response, part 1 (or the part with a given Content-Disposition filename) should be JSON and part 2 should be XML. What's the right way to do that?
I can list both JSON and XML in the Accept header, but that alone doesn't communicate that I want a different format for each part.
EDIT:
Suppose I have a service and right now it's returning something along the line of:
Content-Type: multipart/mixed; boundary=--37adc569155a4943b203e28a422cb96f
Content-Length: ...
----37adc569155a4943b203e28a422cb96f
Content-Type: application/xml; charset=utf-8
Content-Disposition: result
<Result>
...
</Result>
----37adc569155a4943b203e28a422cb96f
Content-Type: application/json; charset=utf-8
Content-Disposition: state
{ "Score": 42, ... }
----37adc569155a4943b203e28a422cb96f--
I can, and want to, support passing the data back in different formats, for instance using Protocol Buffers for the state or JSON for the result.
I figured the right way to do it would be via the HTTP Accept header, but how do I communicate to the service that I want the state in JSON and result in protocol buffer? If the Accept header is not the way to go, what should I be using instead?
I think what you are looking to do is not defined by any of the RFCs and calls for using custom header fields. I would add headers like
X-Foo-Accept-State: application/json; charset=utf-8
X-Foo-Accept-Result: application/xml; charset=utf-8
(Where "Foo" is your company name. See elsewhere for arguments about the best way to name custom header fields.)
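On the server side, honoring such headers could look roughly like the sketch below. Everything here is hypothetical: the X-Foo-Accept-* header names come from the suggestion above, while the DEFAULTS and SUPPORTED tables, the function name, and the use of application/x-protobuf as the protocol buffer media type are assumptions made for illustration.

```python
# Hypothetical per-part defaults and supported formats.
DEFAULTS = {"state": "application/json", "result": "application/xml"}
SUPPORTED = {
    "state": {"application/json", "application/x-protobuf"},
    "result": {"application/xml", "application/json"},
}

def pick_part_types(request_headers):
    """Choose a media type for each part from X-Foo-Accept-<Part> headers."""
    # Header field names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    chosen = {}
    for part, default in DEFAULTS.items():
        raw = lowered.get(f"x-foo-accept-{part}")
        if raw is None:
            chosen[part] = default
            continue
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = raw.split(";", 1)[0].strip().lower()
        chosen[part] = media_type if media_type in SUPPORTED[part] else default
    return chosen
```

This sketch silently falls back to the default when a part's requested type is unsupported; a stricter service might reject the request instead.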

How to get file type of HTTP response?

Is there a way to get the file type (extension) of an HTTP response? Parsing the HTTP request is not an option because sometimes there is no file name in the URL, for example, "GET www.stackoverflow.com".
HTTP isn't concerned about file types or file extensions, but uses MIME types to distinguish between different content types. As mentioned by shyam, it is represented by the Content-Type header, which for normal web pages may look like this:
Content-Type: text/html; charset=utf-8
An exception is when the HTTP response is serving a file which is supposed to be stored on the client side, in which case a Content-Disposition header may be included to indicate a filename and thus a file extension:
Content-Disposition: attachment; filename="fname.ext"
You can try to guess the file type from the Content-Type header.
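A rough Python sketch of that guessing, using only the standard library. The function name guess_extension and the plain dict of response headers are assumptions for illustration; the extension returned for a given MIME type comes from Python's mimetypes tables and can vary between systems.

```python
import mimetypes
import os
from email.message import Message

def guess_extension(response_headers):
    """Guess a file extension from Content-Disposition or Content-Type."""
    msg = Message()
    for name, value in response_headers.items():
        msg[name] = value

    # Prefer an explicit filename from Content-Disposition, if present.
    filename = msg.get_filename()
    if filename:
        ext = os.path.splitext(filename)[1]
        if ext:
            return ext

    # Otherwise map the MIME type to a conventional extension.
    if "Content-Type" not in msg:
        return None
    return mimetypes.guess_extension(msg.get_content_type())
```

For the two header examples above, this yields the extension of "fname.ext" in the first case and an HTML extension in the second.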

Chunked transfer encoding - browser behavior

I'm trying to send data in chunked mode. All headers are set properly and the data is encoded accordingly. Browsers recognize my response as a chunked one, accept the headers, and start receiving data.
I was expecting the browser would update the page on each received chunk, instead it waits until all chunks are received then displays them all. Is this the expected behavior?
I was expecting to see each chunk displayed right after it was received. When using curl, each chunk is shown right after it is received. Why does the same not happen with GUI browsers? Are they using some sort of buffering/cache?
I set the Cache-Control header to no-cache, so not sure it is about cache.
As far as I know, browsers need some payload before they start rendering chunks as they are received.
Curl is of course an exception.
Try sending about 1 KB of arbitrary data before your first chunk.
If you are doing everything correctly, browsers should render chunks as they are received.
Fix your headers.
As of 2019, if you use Content-type: text/html, no buffering occurs in Chrome.
If you just want to stream text, similar to text/plain, then just using Content-type: text/event-stream will also disable buffering.
If you use Content-type: text/plain, then Chrome will still buffer 1 KiB, unless you additionally specify X-Content-Type-Options: nosniff.
RFC 2045 specifies that if no Content-Type is specified, Content-type: text/plain; charset=us-ascii should be assumed:
5.2. Content-Type Defaults
Default RFC 822 messages without a MIME Content-Type header are taken
by this protocol to be plain text in the US-ASCII character set,
which can be explicitly specified as:
Content-type: text/plain; charset=us-ascii
This default is assumed if no Content-Type header field is specified.
It is also recommend that this default be assumed when a
syntactically invalid Content-Type header field is encountered. In
the presence of a MIME-Version header field and the absence of any
Content-Type header field, a receiving User Agent can also assume
that plain US-ASCII text was the sender's intent. Plain US-ASCII
text may still be assumed in the absence of a MIME-Version or the
presence of an syntactically invalid Content-Type header field, but
the sender's intent might have been otherwise.
Browsers buffer a certain amount of text/plain content so that they can check whether it really is plain text or some other media type, such as an image. This also covers the case where the Content-Type was omitted, which is treated the same as text/plain. This is called MIME type sniffing.
MIME type sniffing is defined by Mozilla as:
In the absence of a MIME type, or in certain cases where browsers
believe they are incorrect, browsers may perform MIME sniffing —
guessing the correct MIME type by looking at the bytes of the
resource.
Each browser performs MIME sniffing differently and under different
circumstances. (For example, Safari will look at the file extension in
the URL if the sent MIME type is unsuitable.) There are security
concerns as some MIME types represent executable content. Servers can
prevent MIME sniffing by sending the X-Content-Type-Options header.
According to Mozilla's documentation:
The X-Content-Type-Options response HTTP header is a marker used by
the server to indicate that the MIME types advertised in the
Content-Type headers should not be changed and be followed. This
allows to opt-out of MIME type sniffing, or, in other words, it is a
way to say that the webmasters knew what they were doing.
Therefore adding X-Content-Type-Options: nosniff makes it work.
The browser can process and render the data as it comes in, whether it is sent chunked or not. Whether the browser renders partial data depends on the structure of the data and on the kind of buffering it employs. For example, before the browser can render an image, it needs to have the document (or enough of the document), the style sheet, etc.
Chunking is mostly useful when the length of a resource is unknown at the time the resource response is generated (a "Content-Length" can't be included in the response headers) and the server doesn't want to close the connection after the resource is transferred.
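For reference, the wire format of chunked transfer coding is simple: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-length chunk ends the body. Here is a minimal Python sketch of the framing; the function name encode_chunked is made up, and chunk extensions and trailers are left out.

```python
def encode_chunked(chunks):
    """Frame an iterable of bytes objects as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would terminate the body early, so skip it.
            continue
        size_line = f"{len(chunk):x}".encode("ascii")  # size in hex
        out += size_line + b"\r\n" + chunk + b"\r\n"
    # The last chunk has size 0 and is followed by an empty trailer section.
    out += b"0\r\n\r\n"
    return bytes(out)
```

Because every chunk carries its own length, the server never has to know the total size up front, which is why no Content-Length header is needed.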

How to interpret HTTP Accept headers?

According to the HTTP/1.1 spec, the following Accept header
Accept: text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c
is interpreted to mean
text/html and text/x-c are the preferred media types, but if they do not
exist, then send the text/x-dvi entity, and if that does not exist, send
the text/plain entity
Let's change the header to:
Accept: text/html, text/x-c
What is returned if neither of these can be served? E.g. let's pretend that I only support application/json.
Maybe you should respond with a 406 Not Acceptable. That's how I read this.
Or a 415 Unsupported Media Type?
I would opt for a 406, because in that case, according to the specs, a response SHOULD include a list of alternatives, although it is not clear to me what that list should look like.
"If an Accept header field is present, and if the server cannot send a response which is acceptable according to the combined Accept field value, then the server SHOULD send a 406 (not acceptable) response." -- RFC2616, Section 14.1
You have a choice. You can either reply with 406 and include an "entity" (e.g. HTML or text file) describing the available formats; OR if you are using HTTP 1.1, you can send the format you support even though it wasn't listed in the Accept header.
(see section 10.4.7 of RFC 2616)
"Note: HTTP/1.1 servers are allowed
to return responses which are not
acceptable according to the accept
headers sent in the request. In some
cases, this may even be preferable to
sending a 406 response. User agents
are encouraged to inspect the headers
of an incoming response to determine
if it is acceptable."
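Both options can be sketched in a few lines of Python. This is a simplified illustration rather than a complete implementation of the spec: the function names and the fallback_when_unacceptable flag are made up, media type parameters other than q are ignored, and the rule that more specific media ranges take precedence over wildcards is not implemented.

```python
def parse_accept(header):
    """Return (media_range, q) pairs sorted by descending q."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)

def matches(media_range, media_type):
    """True if media_type falls within media_range (supports */* and type/*)."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type

def negotiate(accept_header, supported, fallback_when_unacceptable=False):
    """Return (status, media_type) for the given Accept header."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in supported:
            if matches(media_range, media_type):
                return 200, media_type
    if fallback_when_unacceptable and supported:
        # Send a supported format even though it was not listed.
        return 200, supported[0]
    return 406, None
```

With Accept: text/html, text/x-c and a server that only supports application/json, negotiate returns a 406 by default and application/json when the fallback flag is set.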

Resources