Using "Content-Encoding":"GZIP" - http

I want to send a large amount of JSON over HTTP to a server.
If I use "Content-Encoding":"GZIP" in my httpClient, does it automatically convert the request body to compressed format?

No. RFC 7231 describes content encoding: if you send a Content-Encoding header, you must make sure the content actually is in that encoding.
If you send Content-Encoding: gzip and the message is in plain text, you will (quite rightly) receive an HTTP 400. The body of a gzip message always starts with 0x1f 0x8b, and if the server does not find that in the POST request it is right to complain.
Another reason for this is that you need an appropriate Content-Length header. This will not be the length of the original JSON, it must be the length (in bytes) of the gzipped JSON.
You need to perform the gzip of the JSON before sending anything since you need to know what to place in Content-Length beforehand.
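As a minimal sketch of the compress-first approach described above, the following Python builds the gzipped body and then derives Content-Length from it. The helper name build_gzip_request and the sample payload are illustrative assumptions, and the actual sending step is left to whichever HTTP client you use.

```python
import gzip
import json


def build_gzip_request(payload):
    """Serialize payload to JSON, gzip it, and return (headers, body).

    Hypothetical helper for illustration; it does not send anything.
    """
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        # Length of the compressed bytes, not of the original JSON.
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_gzip_request({"items": list(range(1000))})
# The body starts with the gzip magic bytes 0x1f 0x8b.
print(body[:2].hex(), headers["Content-Length"])
```

Because the headers are computed from the already-compressed bytes, the declared encoding and length cannot drift apart from what is actually sent.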
Extra note: If the JSON is that huge (e.g. several gigabytes), you will probably need Transfer-Encoding: chunked, which comes with its own complications. (You do not send Content-Length; instead, the length of each chunk is written into the body itself.)

Whether it does this automatically depends entirely on which HTTP client you are using and whether it was implemented that way. Usually, setting a header will not automatically encode the body, at least in the clients I regularly use.

Related

Http Header Accept Encoding

I have difficulty in understanding how this header works.
Briefly my question is
Suppose I send a POST to a certain resource. In the first case the response is a JSON string, and in the second case the response is a .jar file.
1. Should the client include Accept-Encoding: gzip, deflate in both cases when sending the HTTP request, knowing that the first one results in a JSON string?
2. What if the response is already zipped? Doesn't zipping the response over already zipped data create problems?
3. What happens if I include Accept-Encoding: gzip in the first case, where a JSON string is received? Do I receive zipped data as my response? (I am not even sure whether I get zipped data or some encoded data as the response. I think zipped data means something zipped like .jar/.zip, and encoded data means an encoded form of the original data. Which one is happening, zipping or encoding?)
4. Let's say the server sends the response with a Content-Type header of "application/octet-stream". Is it then a must to use Accept-Encoding: gzip, deflate?
A client can use Accept-Encoding HTTP request header to tell the server that it can accept a compressed response.
The server can use the request header to decide if it should send a compressed response or not. It can ignore the header and always send a non-compressed response (possibly less efficient). It can ignore the header and always send a compressed response (risking giving a client a response it can't decode).
Should client include Accept-Encoding: gzip, deflate in both cases
I can't think of any reason to not tell the server that a client can handle a compressed response (assuming that fact is true).
What if the response is already zipped,now zipping the response over the already zipped data doesn't create problems
It might be a waste of processor power for little or no saving in bytes.
That's not a reason for the client to say it can't handle a compressed response though. That's a decision to be made on the server.
what happens if i include accept-encoding:gzip in first case where json string is received.
Then the client has told the server that a compressed response is acceptable.
So i receive a zipped data as my response
The server might send a compressed response. It might ignore the header.
i am not even sure if get zipped data or some encoded data as response
There isn't an "or" here.
The data is encoded using a compression algorithm.
Let's say the server sends the response with Content-Type header as "application/octet-stream"
That just means the server doesn't know what type of data it is sending. Instead of saying "This is JSON" or "This is a jar file" it is saying "I dunno what this is, it's just a stream of bytes to me".
Now is it must to use Accept-Encoding: gzip, deflate
It doesn't make a difference.
The server can compress the data. It can send uncompressed data. It can use the Accept-Encoding request header to decide which of the two.
1. Yes, why not? If the JSON payload is big, compressing it makes a lot of sense.
2. It's just overhead.
3. You might receive gzipped data, which is not a ZIP file. You may want to read RFC 7230 and RFC 7231 for details.
4. The internet media type of the payload is completely independent of the content coding.
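To illustrate the client side of this, here is a rough Python sketch that decodes a response body based on the Content-Encoding the server actually chose, regardless of the media type. The function name decode_body and the plain-dict headers are assumptions for illustration; real client libraries often do this for you, and a response with several stacked codings is not handled here.

```python
import gzip
import zlib


def decode_body(headers, body):
    """Return the decoded payload bytes for a response.

    Hypothetical helper: headers is assumed to be a plain dict keyed by
    the exact name "Content-Encoding", holding a single coding.
    """
    coding = headers.get("Content-Encoding", "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        # Assumes the zlib-wrapped form; some servers send raw deflate instead.
        return zlib.decompress(body)
    if coding == "identity":
        # The server ignored Accept-Encoding or chose not to compress.
        return body
    raise ValueError(f"unsupported content coding: {coding}")


compressed = gzip.compress(b'{"ok": true}')
plain = decode_body(
    {"Content-Type": "application/json", "Content-Encoding": "gzip"}, compressed
)
same = decode_body({"Content-Type": "application/json"}, b'{"ok": true}')
```

The Content-Type value plays no part in the decision, which matches the point that the media type and the content coding are independent.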

ETag with Accept-Language

I have a web service which puts an ETag on each response so future calls can make use of the HTTP 304 (Not Modified) status. The ETag I generate is really just a Base64 encoding of the query type along with the timestamp.
The problem I have is when the browser requests the same resource with a different Accept-Language. The browser currently sends the same If-None-Match header, so the response is a 304, even though the actual resource would have come back in a different language. So I thought the way to do this was to add a Vary header, to tell the client that the response varies with Accept and Accept-Language, as shown below.
Vary:Accept, Accept-Language
However my browser (Chrome) uses the same ETag regardless of the accept-language. What is the correct convention to use here?
Thanks
An ETag identifies the response contents.
So it is better to construct the ETag from a hash of the response body.
At a minimum, you can use a hash of the query concatenated with the language.
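Both suggestions can be sketched in a few lines of Python. The function names, the choice of SHA-256, and the newline separator are assumptions made for this example, not something the answer prescribes.

```python
import hashlib


def etag_from_body(body: bytes) -> str:
    """ETag derived from the exact bytes of the response body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def etag_from_query(query: str, language: str) -> str:
    """Cheaper fallback: hash the query together with the language."""
    # The separator keeps ("ab", "c") and ("a", "bc") from colliding.
    digest = hashlib.sha256(f"{query}\n{language}".encode("utf-8")).hexdigest()
    return f'"{digest}"'


english = etag_from_body(b"Hello")
french = etag_from_body(b"Bonjour")
```

With either approach, a request that negotiates a different language yields a different ETag, so a stale If-None-Match value no longer matches and the server returns the full response instead of a 304.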

Can I make a WebRequest without setting ContentLength

This is a goofy question but I'm testing a tool and need to create an unusual situation. I want to make a POST request that sends several megabytes of data to a web server but I don't want the Content-Length set. (It would be OK if it was set to 0 or -1.)
HttpWebRequest automatically sets request.ContentLength to the length of the RequestStream buffer. Is there a way to prevent or circumvent this?
You can send a POST request without setting the Content-Length header by using chunked encoding. With chunked encoding, you send the data in segments (or "chunks") instead of in a single piece. This is useful when you need to send data to a server but you don't know the size.
Chunked encoding is part of HTTP 1.1 as defined by RFC 2616.
.NET provides the SendChunked property on HttpWebRequest to support this scenario.
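For readers curious what chunked framing looks like on the wire, here is a small Python sketch (not .NET, and not how HttpWebRequest is implemented internally). The function name encode_chunked is an assumption for illustration; it omits optional chunk extensions and trailers.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would terminate the body early, so skip it.
            continue
        # Each chunk is prefixed by its size in hexadecimal, then CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii")
        out += chunk
        out += b"\r\n"
    # A zero-size chunk followed by an empty line ends the body.
    out += b"0\r\n\r\n"
    return bytes(out)


framed = encode_chunked([b"Hello", b", world"])
# framed == b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
```

The sender never needs to know the total size up front, which is why no Content-Length header is sent with this encoding.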

Is Content-Transfer-Encoding an HTTP header?

I'm writing a web service that returns a base64-encoded PDF file, so my plan is to add two headers to the response:
Content-Type: application/pdf
Content-Transfer-Encoding: base64
My question is: Is Content-Transfer-Encoding a valid HTTP header? I think it might only be for MIME. If not, how should I craft my HTTP response to represent the fact that I'm returning a base64-encoded PDF? Thanks.
EDIT:
It looks like HTTP does not support this header. From RFC2616 Section 14:
Note: while the definition of Content-MD5 is exactly the same for HTTP
as in RFC 1864 for MIME entity-bodies, there are several ways in which
the application of Content-MD5 to HTTP entity-bodies differs from its
application to MIME entity-bodies. One is that HTTP, unlike MIME, does
not use Content-Transfer-Encoding, and does use Transfer-Encoding and
Content-Encoding.
Any ideas for what I should set my headers to? Thanks.
EDIT 2
Many of the code samples found in the comments of this PHP reference manual page seem to suggest that it actually is a valid HTTP header:
http://php.net/manual/en/function.header.php
According to RFC 1341 (made obsolete by RFC 2045):
A Content-Transfer-Encoding header field, which can be used to
specify an auxiliary encoding that was applied to the data in order to
allow it to pass through mail transport mechanisms which may have
data or character set limitations.
and later:
Many Content-Types which could usefully be transported via email
are represented, in their "natural" format, as 8-bit character or
binary data. Such data cannot be transmitted over some transport
protocols. For example, RFC 821 restricts mail messages to 7-bit
US-ASCII data with 1000 character lines.
It is necessary, therefore, to define a standard mechanism for
re-encoding such data into a 7-bit short-line format. (...) The
Content-Transfer-Encoding field is used to indicate the type of
transformation that has been used in order to represent the body
in an acceptable manner for transport.
Since you have a web service, which has nothing to do with email, you shouldn't use this header.
You can use the Content-Encoding header, which indicates that the transferred data has been compressed (for example, with the value gzip).
I think that in your case
Content-Type: application/pdf
is enough. Additionally, you can set the Content-Length header, but in my opinion, if you are building a web service (not an HTTP server or proxy server), Content-Type is enough. Please bear in mind that some headers (e.g. Transfer-Encoding) may cause unexpected communication issues if not used appropriately, so if you are not 100% sure whether you really need a header, just don't use it.
The notes in RFC 2616 section 14.15 are explicit: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
"Note: while the definition of Content-MD5 is exactly the same for
HTTP as in RFC 1864 for MIME entity-bodies, there are several ways
in which the application of Content-MD5 to HTTP entity-bodies
differs from its application to MIME entity-bodies. One is that
HTTP, unlike MIME, does not use Content-Transfer-Encoding, and
does use Transfer-Encoding and Content-Encoding. Another is that
HTTP more frequently uses binary content types than MIME, so it is
worth noting that, in such cases, the byte order used to compute
the digest is the transmission byte order defined for the type.
Lastly, HTTP allows transmission of text types with any of several
line break conventions and not just the canonical form using CRLF."
As has been answered before and also here, a valid Content-Transfer-Encoding HTTP response header does not exist. Also, the known headers Content-Encoding and Transfer-Encoding have no appropriate value to express a Base64-encoded response body.
Starting from here, no client would expect a response declared as application/pdf to be encoded as Base64! If you want to do so, better use a different content type like:
Content-Type: application/pdf+base64
In this case, a client would know some Base64 encoded data is coming (the basic subtype is the suffix after the plus sign) and has a hint there is PDF in there.
Even this is a little hacky (+base64 is no official media type suffix) but at least would somehow meet some standards. Better use a custom content type than misusing standard HTTP headers!
Of course no browser would be able to directly open such a response anyway. Maybe your project should consider creating another endpoint offering a binary PDF response and marking this one deprecated.
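If the PDF is only available as Base64 text, a binary endpoint could decode it before responding. The following Python is a rough sketch under that assumption; binary_pdf_response is a hypothetical helper that only builds headers and body, and wiring it into a real web framework is left out.

```python
import base64


def binary_pdf_response(pdf_base64: str):
    """Decode Base64 text and return (headers, body) for a binary PDF."""
    # validate=True rejects characters outside the Base64 alphabet,
    # including line breaks, instead of silently skipping them.
    body = base64.b64decode(pdf_base64, validate=True)
    headers = {
        "Content-Type": "application/pdf",
        "Content-Length": str(len(body)),
    }
    return headers, body


encoded = base64.b64encode(b"%PDF-1.4 sample").decode("ascii")
pdf_headers, pdf_body = binary_pdf_response(encoded)
```

Served this way, the declared application/pdf type matches the bytes on the wire, so no extra encoding header is needed.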

How do web servers know the charset used in forms posted to them?

When a web server gets a POST of a form, parsing it into param-value(s) pairs is quite straightforward. However, if the values contain non-English chars that have been encoded by the browser, it must know the charset used in order to decode them.
I've examined the requests sent by two posts. One was done from a page using UTF-8, and one from a page using Windows-1255. The same text was encoded differently. AFAIK, the Content-type header could contain a charset after the application/x-www-form-urlencoded, but it wasn't (using Firefox).
In a servlet, when you use request.getParameter(), you're supposed to get the decoded value. How does the servlet container do that? Does it always bet on UTF-8, use some heuristics, or is there some deterministic way I'm missing?
From the Servlet 3.0 Spec, section 3.10 Request Data Encoding (emphasis mine):
Currently, many browsers do not send a char encoding qualifier with the Content-Type header, leaving open the determination of the character encoding for reading
HTTP requests. The default encoding of a request the container uses to create the
request reader and parse POST data must be “ISO-8859-1” if none has been specified
by the client request. However, in order to indicate to the developer, in this case, the
failure of the client to send a character encoding, the container returns null from
the getCharacterEncoding method.
If the client hasn’t set character encoding and the request data is encoded with a
different encoding than the default as described above, breakage can occur. To
remedy this situation, a new method setCharacterEncoding(String enc) has
been added to the ServletRequest interface. Developers can override the
character encoding supplied by the container by calling this method. It must be
called prior to parsing any post data or reading any input from the request. Calling
this method once data has been read will not affect the encoding.
In practice, I find that setting the charset in a response influences the charset used in the subsequent POST. To be extra sure, you can write a Servlet Filter that calls the setCharacterEncoding on every request object before it is used.
You may also find this thread useful - Detecting the character encoding of an HTTP POST request
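The effect of picking the wrong charset can be reproduced outside a servlet container. The Python sketch below is an analogy, not servlet code: passing the charset explicitly to parse_qs plays the role of calling setCharacterEncoding before parameters are read. The field name and sample text are assumptions for illustration.

```python
from urllib.parse import parse_qs, quote

# The Hebrew word "shalom", written with escapes to keep the source ASCII.
text = "\u05e9\u05dc\u05d5\u05dd"

# The same text percent-encoded as a UTF-8 page and a Windows-1255 page would.
body_utf8 = "name=" + quote(text, encoding="utf-8")
body_cp1255 = "name=" + quote(text, encoding="cp1255")


def parse_form(body, charset):
    """Decode a urlencoded form body using an explicitly chosen charset."""
    return parse_qs(body, encoding=charset, errors="replace")


right_utf8 = parse_form(body_utf8, "utf-8")["name"][0]
right_cp1255 = parse_form(body_cp1255, "cp1255")["name"][0]
# Decoding UTF-8 bytes with the ISO-8859-1 default produces mojibake.
wrong = parse_form(body_utf8, "iso-8859-1")["name"][0]
```

Nothing in either request body identifies its charset, which is why the receiving side has to be told which one to apply.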
The appropriate header for specifying charsets is Accept-Charset.
The latest Chrome for Linux, for example, sends:
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
on each request.
Section 14.2 from http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html states:
The Accept-Charset request-header field can be used to indicate what character sets are acceptable for the response. This field allows clients capable of understanding more comprehensive or special-purpose character sets to signal that capability to a server which is capable of representing documents in those character sets.
(...)
If no Accept-Charset header is
present, the default is that any
character set is acceptable. If an
Accept-Charset header is present, and
if the server cannot send a response
which is acceptable according to the
Accept-Charset header, then the server
SHOULD send an error response with the
406 (not acceptable) status code,
though the sending of an unacceptable
response is also allowed.
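So if you receive such a header from a client, the value with the highest q may be the encoding you're receiving from it. This is only a guess, since the header strictly describes what the client accepts in responses.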
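Picking the value with the highest q can be sketched as follows in Python. The function name preferred_charset is an assumption for illustration, ties go to the first entry listed, and an entry without a q parameter counts as q=1.

```python
def preferred_charset(accept_charset: str):
    """Return the charset with the highest q value, or None if empty."""
    best_name, best_q = None, -1.0
    for item in accept_charset.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # Default weight when no q parameter is given.
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # Treat a malformed weight as not acceptable.
        if q > best_q:
            best_name, best_q = name, q
    return best_name


guess = preferred_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.3")
# guess == "iso-8859-1"
```

For the Chrome header quoted above, this yields ISO-8859-1, because it carries the implicit weight of 1.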
