Chunked transfer encoding - browser behavior - http

I'm trying to send data in chunked mode. All headers are set properly and data is encoded accordingly. Browsers recognize my response as a chunked one, accepting headers and start receiving data.
I was expecting the browser would update the page on each received chunk, instead it waits until all chunks are received then displays them all. Is this the expected behavior?
I was expecting to see each chunk displayed right after it was received. When using curl, each chunk is shown right after it is received. Why does the same not happen with GUI browsers? Are they using some sort of buffering/cache?
I set the Cache-Control header to no-cache, so not sure it is about cache.

afaik browsers needs some payload to start render chunks as they received.
Curl is of course an exception.
Try to send about 1KB of arbitrary data before your first chunk.
If you are doing everything correctly, browsers should render chunks as they received.

Fix your headers.
As of 2019, if you use Content-type: text/html, no buffering occurs in Chrome.
If you just want to stream text, similar to text/plain, then just using Content-type: text/event-stream will also disable buffering.
If you use Content-type: text/plain, then Chrome will still buffer 1 KiB, unless you additionally specify X-Content-Type-Options: nosniff.
RFC 2045 specifies that if no Content-Type is specified, Content-type: text/plain; charset=us-ascii should be assumed
5.2. Content-Type Defaults
Default RFC 822 messages without a MIME Content-Type header are taken
by this protocol to be plain text in the US-ASCII character set,
which can be explicitly specified as:
Content-type: text/plain; charset=us-ascii
This default is assumed if no Content-Type header field is specified.
It is also recommend that this default be assumed when a
syntactically invalid Content-Type header field is encountered. In
the presence of a MIME-Version header field and the absence of any
Content-Type header field, a receiving User Agent can also assume
that plain US-ASCII text was the sender's intent. Plain US-ASCII
text may still be assumed in the absence of a MIME-Version or the
presence of an syntactically invalid Content-Type header field, but
the sender's intent might have been otherwise.
Browsers will start to buffer text/plain for a certain amount in order to check if they can detect if the content sent is really plain text or some media type like an image, in case the Content-Type was omitted, which would then equal a text/plain content type. This is called MIME type sniffing.
MIME type sniffing is defined by Mozilla as:
In the absence of a MIME type, or in certain cases where browsers
believe they are incorrect, browsers may perform MIME sniffing —
guessing the correct MIME type by looking at the bytes of the
resource.
Each browser performs MIME sniffing differently and under different
circumstances. (For example, Safari will look at the file extension in
the URL if the sent MIME type is unsuitable.) There are security
concerns as some MIME types represent executable content. Servers can
prevent MIME sniffing by sending the X-Content-Type-Options header.
According to Mozilla's documentation:
The X-Content-Type-Options response HTTP header is a marker used by
the server to indicate that the MIME types advertised in the
Content-Type headers should not be changed and be followed. This
allows to opt-out of MIME type sniffing, or, in other words, it is a
way to say that the webmasters knew what they were doing.
Therefore adding X-Content-Type-Options: nosniff makes it work.

The browser can process and render the data as it comes in whether data is sent chunked or not. Whether a browser renders the response data is going to be a function of the data structure and what kind of buffering it employs. e.g. Before the browser can render an image, it needs to have the document (or enough of the document), the style sheet, etc.
Chunking is mostly useful when the length of a resource is unknown at the time the resource response is generated (a "Content-Length" can't be included in the response headers) and the server doesn't want to close the connection after the resource is transferred.

Related

Can the "Accept" HTTP header have a "charset" parameter?

I have encountered an HTTP client that uses the following header:
Accept: application/vnd.api+json; charset=utf-8'
According to the HTTP spec, Accept headers can have parameters. The most common one being the q parameter, which sets the priority of different content types. However, there are a number of reasons I don't think charset is a valid Accept parameter:
Accept already has the Accept-Charset parameter, which seems like it makes this redundant
MDN doesn't include it in their documentation on Accept, even though they do include in on their Content-Type page
Werkzeug, the flask HTTP parser, doesn't bother to parse charsets for Accept, even though it does for Content-Type
So, it seems this Accept; charset is unusual. But is it wrong?
You quoted the spec, which says they are ok. What else needs to be said?

ETag with Accept-Language

I have a web service which puts an ETag on to each response so future calls can make use of the HTTP 304 (Not Modified) status. The ETag I generate really just a Base64 encoding of the query type along with the timestamp.
The problem I have is if the browser requests the same resource with a difference Accept-Language. The browser currently sends the same If-None-Match header, so the response is a 304, even thought the actual resource would have come back in a different language. So I thought the way to do this was to add a Vary Header, to specify to the client that the response varys with Accept & Accept-Language, as shown below.
Vary:Accept, Accept-Language
However my browser (Chrome) uses the same ETag regardless of the accept-language. What is the correct convention to use here?
Thanks
E-Tag identifies the response contents.
So better use a response body hash for E-Tag construction.
At least you can use hash of a query and a language concatenated.

Using "Content-Encoding":"GZIP"

I want to send large amount of json over http to sever.
If I use "Content-Encoding":"GZIP" in my httpClient, does it automatically convert the request body to compressed format?
No, the RFC 7231 describes content encoding. If you are sending Content-Encoding you need to make sure that the content is in that encoding.
If you send Content-Encoding: gzip and the message in plain text you will (quite rightly) receive an HTTP 400. The body of a gzip message will always start with 0x1f 0x8b and if the server does not find that int he POST request it is right to complain.
Another reason for this is that you need an appropriate Content-Length header. This will not be the length of the original JSON, it must be the length (in bytes) of the gzipped JSON.
You need to perform the gzip of the JSON before sending anything since you need to know what to place in Content-Length beforehand.
Extra note: If the JSON is that huge (e.g. several gigabytes) you probably will need Transfer-Encoding: chunked, which comes with its own complications. (You do not send Content-Length but add the length of the chuck to the body itself.)
If it automatically does this, is 100% dependent on which http client you are using and if they implemented it that way. Usually setting a header will not automatically encode it, at least in the clients I regularly use.

Is Content-Transfer-Encoding an HTTP header?

I'm writing a web service that returns a base64-encoded PDF file, so my plan is to add two headers to the response:
Content-Type: application/pdf
Content-Transfer-Encoding: base64
My question is: Is Content-Transfer-Encoding a valid HTTP header? I think it might only be for MIME. If not, how should I craft my HTTP response to represent the fact that I'm returning a base64-encoded PDF? Thanks.
EDIT:
It looks like HTTP does not support this header. From RFC2616 Section 14:
Note: while the definition of Content-MD5 is exactly the same for HTTP
as in RFC 1864 for MIME entity-bodies, there are several ways in which
the application of Content-MD5 to HTTP entity-bodies differs from its
application to MIME entity-bodies. One is that HTTP, unlike MIME, does
not use Content-Transfer-Encoding, and does use Transfer-Encoding and
Content-Encoding.
Any ideas for what I should set my headers to? Thanks.
EDIT 2
Many of the code samples found in the comments of this PHP reference manual page seem to suggest that it actually is a valid HTTP header:
http://php.net/manual/en/function.header.php
According to RFC 1341 (made obsolete by RFC 2045):
A Content-Transfer-Encoding header field, which can be used to
specify an auxiliary encoding that was applied to the data in order to
allow it to pass through mail transport mechanisms which may have
data or character set limitations.
and later:
Many Content-Types which could usefully be transported via email
are represented, in their "natural" format, as 8-bit character or
binary data. Such data cannot be transmitted over some transport
protocols. For example, RFC 821 restricts mail messages to 7-bit
US-ASCII data with 1000 character lines.
It is necessary, therefore, to define a standard mechanism for
re-encoding such data into a 7-bit short-line format. (...) The
Content-Transfer-Encoding field is used to indicate the type of
transformation that has been used in order to represent the body
in an acceptable manner for transport.
Since you have a webservice, which has nothing in common with emails, you shouldn't use this header.
You can use Content-Encoding header which indicates that transferred data has been compressed (gzip value).
I think that in your case
Content-Type: application/pdf
is enough. Additionally, you can set Content-Length header, but in my opinion, if you are building webservice (it's not http server / proxy server) Content-Type is enough. Please bear in mind that some specific headers (e.g. Transfer-Encoding) if not used appropriately, may cause unexpected communication issues, so if you are not 100% sure about usage of some header - if you really need it or not - just don't use it.
Notes in rfc2616 section 14.15 are explicit: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
"Note: while the definition of Content-MD5 is exactly the same for
HTTP as in RFC 1864 for MIME entity-bodies, there are several ways
in which the application of Content-MD5 to HTTP entity-bodies
differs from its application to MIME entity-bodies. One is that
HTTP, unlike MIME, does not use Content-Transfer-Encoding, and
does use Transfer-Encoding and Content-Encoding. Another is that
HTTP more frequently uses binary content types than MIME, so it is
worth noting that, in such cases, the byte order used to compute
the digest is the transmission byte order defined for the type.
Lastly, HTTP allows transmission of text types with any of several
line break conventions and not just the canonical form using CRLF."
As been answered before and also here, a valid Content-Transfer-Encoding HTTP response header does not exist. Also the known headers Content-Encoding and Transfer-Encoding have no appropriate value to express a Base64 encoded response body.
Starting from here, no client would expect a response declared as application/pdf to be encoded as Base64! If you wand to do so, better use a different content type like:
Content-Type: application/pdf+base64
In this case, a client would know some Base64 encoded data is coming (the basic subtype is the suffix after the plus sign) and has a hint there is PDF in there.
Even this is a little hacky (+base64 is no official media type suffix) but at least would somehow meet some standards. Better use a custom content type than misusing standard HTTP headers!
Of course no browser would be able to directly open such a response anyway. Maybe your project should consider creating another endpoint offering a binary PDF response and marking this one deprecated.

How to interpret HTTP Accept headers?

According to the HTTP1.1 spec, an Accept header of the following
Accept: text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c
is interpreted to mean
text/html and text/x-c are the preferred media types, but if they do not
exist, then send the text/x-dvi entity, and if that does not exist, send
the text/plain entity
Let's change the header to:
Accept: text/html, text/x-c
What is returned if neither of this is accepted ? e.g. let's pretend that I only support application/json
Maybe you should respond with a 406 Not Acceptable. That's how I read this.
Or a 415 Unsupported Media Type?
I would opt for a 406, because in that case and according to the specs, a response SHOULD include a list of alternatives. Although is not clear to me how that list should look like.
"If an Accept header field is present, and if the server cannot send a response which is acceptable according to the combined Accept field value, then the server SHOULD send a 406 (not acceptable) response." -- RFC2616, Section 14.1
You have a choice. You can either reply with 406 and include an "entity" (e.g. HTML or text file) describing the available formats; OR if you are using HTTP 1.1, you can send the format you support even though it wasn't listed in the Accept header.
(see section 10.4.7 of RFC 2616)
"Note: HTTP/1.1 servers are allowed
to return responses which are not
acceptable according to the accept
headers sent in the request. In some
cases, this may even be preferable to
sending a 406 response. User agents
are encouraged to inspect the headers
of an incoming response to determine
if it is acceptable."

Resources