HTTP header Accept-Encoding - http

I have difficulty understanding how this header works.
Briefly, my question is this: if I send a POST request to a certain resource, let's say that in the first case the response is a JSON string and in the second case the response is a .jar file.
1. Should the client include Accept-Encoding: gzip, deflate in both cases when sending the HTTP request, knowing that the first one results in a JSON string?
2. What if the response is already zipped? Doesn't zipping the already zipped data create problems?
3. What happens if I include Accept-Encoding: gzip in the first case, where a JSON string is received? Do I receive zipped data as my response? (I am not even sure whether I get zipped data or some encoded data as the response. I think zipped data means something zipped like a .jar/.zip, and encoded data means an encoding of the original data. Which one is happening, zipping or encoding?)
4. Let's say the server sends the response with a Content-Type header of "application/octet-stream". Is it then mandatory to use Accept-Encoding: gzip, deflate?

A client can use the Accept-Encoding HTTP request header to tell the server that it can accept a compressed response.
The server can use the request header to decide if it should send a compressed response or not. It can ignore the header and always send a non-compressed response (possibly less efficient). It can ignore the header and always send a compressed response (risking giving a client a response it can't decode).
Should the client include Accept-Encoding: gzip, deflate in both cases?
I can't think of any reason not to tell the server that the client can handle a compressed response (assuming that is true).
What if the response is already zipped? Doesn't zipping the already zipped data create problems?
It might be a waste of processor power for little or no saving in bytes.
That's not a reason for the client to say it can't handle a compressed response, though. That's a decision to be made on the server.
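To make the "waste of processor power" point concrete, here is a minimal Python sketch, assuming a made-up repetitive JSON payload. It gzips the payload once and then gzips the result again. The second pass typically saves little or nothing, but it also breaks nothing, because the coding is lossless.

```python
import gzip
import json

# Hypothetical payload: repetitive JSON, which compresses well.
payload = json.dumps([{"id": i, "status": "ok"} for i in range(1000)]).encode("utf-8")

once = gzip.compress(payload)   # first pass: large saving
twice = gzip.compress(once)     # second pass over already compressed bytes

# Compare the three sizes; the exact numbers depend on the payload.
print(len(payload), len(once), len(twice))

# Undoing both passes gives back the original bytes.
restored = gzip.decompress(gzip.decompress(twice))
print(restored == payload)
```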
What happens if I include Accept-Encoding: gzip in the first case, where a JSON string is received?
Then the client has told the server that a compressed response is acceptable.
Do I receive zipped data as my response?
The server might send a compressed response. It might ignore the header.
I am not even sure whether I get zipped data or some encoded data as the response
There isn't an "or" here.
The data is encoded using a compression algorithm.
Let's say the server sends the response with a Content-Type header of "application/octet-stream"
That just means the server doesn't know what type of data it is sending. Instead of saying "This is JSON" or "This is a jar file" it is saying "I dunno what this is, it's just a stream of bytes to me".
Is it then mandatory to use Accept-Encoding: gzip, deflate?
It doesn't make a difference.
The server can compress the data. It can send uncompressed data. It can use the Accept-Encoding request header to decide which of the two.
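As a rough illustration of that server-side decision, here is a Python sketch. The function names parse_accept_encoding and encode_body are hypothetical, and the parsing is deliberately simplified rather than a full implementation of the RFC grammar. It only considers gzip, and it treats a missing header conservatively by sending the body as-is.

```python
import gzip

def parse_accept_encoding(value):
    """Parse an Accept-Encoding value into {coding: q}. Simplified sketch."""
    codings = {}
    for item in (value or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # a coding listed without a q-value has weight 1
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val.strip())
                except ValueError:
                    q = 0.0  # unparseable weight: treat as not acceptable
        codings[name] = q
    return codings

def encode_body(body, accept_encoding):
    """Return (body_bytes, content_encoding); content_encoding is None if unchanged."""
    codings = parse_accept_encoding(accept_encoding)
    # An explicit gzip entry wins; otherwise fall back to the "*" wildcard.
    q = codings.get("gzip", codings.get("*", 0.0))
    if q > 0:
        return gzip.compress(body), "gzip"
    return body, None

body, encoding = encode_body(b'{"hello": "world"}', "gzip, deflate")
print(encoding)
```

The media type of the body never enters the decision, which matches the point about "application/octet-stream" above. A real server would probably also skip compression for payloads it knows are already compressed.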

1. Yes, why not? If the JSON payload is big, compressing it will make a lot of sense.
2. It's just overhead.
3. You might receive gzipped data, not a ZIP file. You may want to read RFC 7230 and RFC 7231 for details.
4. The internet media type of the payload is completely independent of the content coding.
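The difference between gzipped data and a ZIP file can be seen in a short Python sketch. The entry name hello.json is made up for the example. A gzip stream is one compressed blob, while a ZIP archive (a .jar is one too) is a container of named entries, and the two start with different signature bytes.

```python
import gzip
import io
import zipfile

data = b'{"hello": "world"}'

# gzip content coding: a single compressed stream with no directory of files.
gz = gzip.compress(data)

# ZIP archive: a container holding named entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("hello.json", data)  # hypothetical entry name
zipped = buf.getvalue()

print(gz[:2].hex())   # gzip signature bytes: 1f8b
print(zipped[:4])     # ZIP local file header signature: b'PK\x03\x04'
```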

Related

Using "Content-Encoding":"GZIP"

I want to send a large amount of JSON over HTTP to a server.
If I use "Content-Encoding":"GZIP" in my httpClient, does it automatically convert the request body to compressed format?
No. RFC 7231 describes content encoding. If you send Content-Encoding, you need to make sure that the content is actually in that encoding.
If you send Content-Encoding: gzip and the message is plain text, you can expect (quite rightly) an HTTP 400. The body of a gzip message always starts with 0x1f 0x8b, and if the server does not find that in the POST request it is right to complain.
Another reason is that you need an appropriate Content-Length header. It will not be the length of the original JSON; it must be the length (in bytes) of the gzipped JSON.
You need to gzip the JSON before sending anything, since you need to know beforehand what to place in Content-Length.
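A minimal Python sketch of that order of operations, using only the standard library: compress first, then measure the compressed bytes for Content-Length. The helper name build_gzip_post and the URL are placeholders, the request is only built and never sent, and whether a given server accepts gzip-encoded request bodies at all is an assumption you would need to check.

```python
import gzip
import json
import urllib.request

def build_gzip_post(url, obj):
    """Build (but do not send) a POST request whose JSON body is gzip-compressed."""
    raw = json.dumps(obj).encode("utf-8")
    body = gzip.compress(raw)              # compress first...
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),  # ...then measure the compressed bytes
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder URL; nothing is sent over the network here.
req = build_gzip_post("http://example.invalid/api", {"items": list(range(1000))})
print(req.data[:2].hex(), req.get_header("Content-length"))
```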
Extra note: If the JSON is that huge (e.g. several gigabytes), you will probably need Transfer-Encoding: chunked, which comes with its own complications. (You do not send Content-Length but instead add the length of each chunk to the body itself.)
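For a feel of what "adding the length of the chunk to the body" means, here is a small Python sketch of HTTP/1.1 chunked framing. The function name iter_chunked and the chunk size are arbitrary choices for illustration; in practice an HTTP client library would normally do this framing for you.

```python
def iter_chunked(data, chunk_size=4096):
    """Yield the bytes of `data` framed with HTTP/1.1 chunked transfer coding."""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk: its size in hexadecimal, CRLF, the chunk bytes, CRLF.
        yield b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    # A zero-length chunk followed by an empty trailer ends the body.
    yield b"0\r\n\r\n"

framed = b"".join(iter_chunked(b"hello world", chunk_size=5))
print(framed)
```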
Whether it does this automatically depends 100% on which HTTP client you are using and whether its authors implemented it that way. Usually, setting a header will not automatically encode the body, at least in the clients I regularly use.

Sending a SOAP XML manually and receiving a HTTP 500 error code and binary data in the response

I am using Ruby to send a SOAP request to a very enterprisey bla bla service, so unfortunately I cannot attach any samples. There's nobody to send any server-side logs, and nobody knows what's wrong on the provider side or what the actual HTTP requests need to look like (except a single XML example I got, but no HTTP headers). The docs are very Microsoft-centric, with C# examples and whatnot ("instantiate AbstractFactoryFactory..."). Long live enterprise software.
But the bottom line is, eventually I took one of their own XMLs from their logs and sent it via HTTP to the endpoint from the WSDL, using the Savon gem's raw XML option, and got an HTTP 500 error from their host with a bunch of non-ASCII binary data inside. Literally, no ASCII characters are in the body.
I guessed that maybe Savon does some bad magic or that the XML option is not working as expected, so I tried sending the same request via Faraday, but got the same thing.
The HTTP response headers say it's an HTTP response, XML encoded, from an ASP.NET host:
"content-type"=>"text/xml; charset=utf-8",
"server"=>"Microsoft-IIS/7.5",
"x-aspnet-version"=>"2.0.50727",
"x-powered-by"=>"ASP.NET",
but again, 440 bytes' worth of binary data in the response:
method=:post,
body=
"\x1F\x8B\b\x00\x00...
etc.
Am I missing some weird aspect of the SOAP specification and I need to do something to decode this data or has their server gone bonkers from my XML, HTTP headers or something else and I need to ping the provider?
Update 1
I noticed that their original XML had UTF-16 encoding set, so I tried encoding the raw string to UTF-16. Savon then spewed errors at me about bad data, so I updated the encoding in the Savon client config. But I still get an HTTP 500 error and binary data as the response, and if I try to log anything Savon reports a bug:
Encoding::CompatibilityError: incompatible encoding regexp match (US-ASCII regexp with UTF-16 string)
from /home/bbozo/.rvm/gems/ruby-2.2.4/gems/savon-2.11.1/lib/savon/log_message.rb:13:in `to_s'
Faraday basically reported the same behavior, a binary blob.
Update 2
I tried forcing the body into every known encoding and got nothing. Even though the HTTP headers imply the encoding is UTF-8, it obviously isn't:
Encoding.name_list.map{ |e_in| [ e_in, ( response.body.dup.force_encoding(e_in).encode('utf-8') rescue 'incompatible' ) ] }
There is nothing that would indicate the encoding in the WSDL files, and the API spec doesn't even mention encoding except that the request XMLs need to be UTF-8 encoded. I tried encoding the body and changing the XML encoding declaration and the HTTP headers, but I still get the same binary blob with the same leading bytes (\x1F\x8B\b\x00\x00), so it's not some weird encryption either.
Compression maybe?
I tried with https for good measure and nothing.
Question
Am I missing some weird aspect of the SOAP specification and I need to do something to decode this data or has their server gone bonkers from my XML, HTTP headers or something else and I need to ping the provider?
The response body was compressed! In the end I just gunzipped it and there it was. See "How to decompress Gzip string in ruby?" for how.
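The giveaway was in the first bytes of the body: \x1F\x8B is the gzip signature, and the following \b (0x08) is the byte gzip uses for the deflate method. Here is a minimal sketch of the check in Python rather than Ruby. The function name decode_body is hypothetical, and sniffing the signature when no Content-Encoding header is present is a workaround for servers like this one, not standard behavior.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"

def decode_body(body, content_encoding=None):
    """Return the decoded bytes of an HTTP body, gunzipping when needed."""
    declared = (content_encoding or "").strip().lower() in ("gzip", "x-gzip")
    # Fall back to sniffing the signature if the header is missing.
    if declared or body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body

# Stand-in for the server's compressed error response.
sample = gzip.compress(b"<soap:Fault>...</soap:Fault>")
print(decode_body(sample))
```

If the header says gzip but the body is not actually gzip data, gzip.decompress raises an error, which is probably what you want to see in that case.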

HTTP request and response flow for get

I am having difficulties understanding the HTTP request and response flow. I am working with a system where I can "hijack" an incoming HTTP request and give my own response. The problem I am having is that some types of GET requests seem to assume that all data is sent back in the first response.
For instance, JPEG image requests, no matter the size (my tests include 0-20 MB JPEG files), seem to assume that the entire data is sent in the first response. Even if I don't send any data and explicitly set the range header to 0, I never get a follow-up request from the client asking for the data.
For other data types, such as MP4 video, the client seems perfectly fine with getting a response with only header information and no data, and then sends a new request explicitly asking for byte range 0-.
Is there some kind of agreement between the client and server that some types should be sent back in one response while others can be split up over a number of requests?

Parsing a previously saved HTTP response

I have previously captured HTTP responses on disk (in the WARC format to be specific) and need a way to parse the responses.
I need to be able to access the headers and header values and optionally get the body if headers fit my criteria.
A WARC file contains the full response, including the HTTP/1.1 line, all headers and the body. I can find enough Clojure libraries to do HTTP requests, but I'm unable to use only the response parsing of such libraries. I'm a Clojure beginner, so I might be missing something.
The WARC parser I currently use gives me a stream with the HTTP response, if this matters.

Does a server that doesn't accept POST data still receive the POST data?

I am setting up a back-end API in a script of mine that contacts one of my sites by sending XML to my web server as POST data. This script will be used by many people, and I want to limit the bandwidth wasted by people who accidentally turn the feature on without a proper access key.
I will be denying requests that do not have the correct access key, probably by generating a 403 status code.
Let's say the POST data is ~500 KB. Does the server receive all 500 KB of data when this attempt is made, regardless of the status code?
What if I made the URL contain the key, as in mydomain/api/123456789, and generated a 403 status for all bad access keys?
Does the POST data still get sent and received regardless, or is it negotiated before the data is finally sent?
Thanks in advance!
Generally speaking, the entire request will be sent, including post data. There is often no way for the application layer to return a response like a 403 until it has received the entire request.
In reality, it will depend on the language/framework used and how closely it is linked to the HTTP server. Section 8.2.2 of RFC 2616, the HTTP/1.1 specification, has this to say:
An HTTP/1.1 (or later) client sending a message-body SHOULD monitor the network connection for an error status while it is transmitting the request. If the client sees an error status, it SHOULD immediately cease transmitting the body. If the body is being sent using a "chunked" encoding (section 3.6), a zero length chunk and empty trailer MAY be used to prematurely mark the end of the message. If the body was preceded by a Content-Length header, the client MUST close the connection.
So, if you can find a language environment closely linked with the HTTP server (for example, mod_perl), you could do this in a way that complies with the standards.
An alternative approach you could take is to make an initial, smaller request to obtain a URL to use for the larger POST. The application can then deny providing the URL to clients without an appropriate key.
Here is a great book about RESTful Web Services that explains how HTTP works: http://oreilly.com/catalog/9780596529260
You can think of any request as an envelope: on the outside are written the address (the URL) and some properties (the HTTP headers), and inside there is some data (if the request was made with the POST method). So, as you might guess, you can't receive an envelope partially.
Oh, I forgot: that applies when you are using HTTP POST with the standard Content-Type "application/x-www-form-urlencoded". If you are uploading files (and so using "multipart/form-data"), Django gives you control over streamed chunks of files using Middleware classes: http://docs.djangoproject.com/en/dev/topics/http/middleware/

Resources