How to zip/compress the Multipart HTTP request sent by the client

How to zip/compress the Multipart HTTP request sent by the client - http

I am using apache httpclient 4.2 to send Multipart HTTP PUT request. The client has to upload large size binary data of the order of 500 MB. Hence compression is required.
I wish to compress the whole Multipart HTTP request and inform the server through header Content-Encoding: gzip. I control the server as well as client code.
Note: I am aware of the approach where I can selectively compress the large size binary data & not the whole request but for now I am ruling out that approach.
The httpclient's HttpRequestInterceptor class doesn't provide a handle to the request input stream or an entity to compress the request.
I have searched on the web & found few related links(below) but none of them works
1. http://old.nabble.com/compressing-multipart-request-from-custom-client-tt27804438.html#a27811350
2. http://hc.apache.org/httpcomponents-client-ga/tutorial/html/fundamentals.html#protocol_interceptors - This link says that "Protocol interceptors can also manipulate content entities enclosed with messages - transparent content compression / decompression being a good example." but don't know how to get the desired functionality.
Please give me some direction, if possible some sample code.

I was finally able to solve this requirement by
creating a org.apache.http.entity.mime.MultipartEntity object reqEntity.
Adding the compressed binary part to the HTTP request
ByteArrayBody binaryData =
new ByteArrayBody(compressedData, "binary/octet-stream", "N/A");
binaryParts.add(binaryData);
reqEntity.addPart("uniqueId", binaryPart);

Related

Is there a way to set the http Header values for an esp_https_ota call?

I'm trying to download a firmware.bin file that is produced in a private Github repository. I have the code that is finding the right asset url to download the file and per Github instructions the accept header needs to be set to accept: application/octet-stream in order to get the binary file. I'm only getting JSON in response. If I run the same request through postman I'm getting a binary file as the body. I've tried downloading it using HTTPClient and I get the same JSON request. It seems the headers aren't being set as requested to tell Github to send the binary content as I'm just getting JSON. As for the ArduinoOTA abstraction, I can't see how to even try to set headers and in digging into the esp_https_ota functions and http_client functions there doesn't appear to be a way to set headers for any of these higher level abstractions because the http_config object has no place for headers as far as I can tell. I might file a feature request to allow for this, but am new to this programming area and want to check to see if I'm missing something first.
Code returns JSON, not binary. URL is github rest api url to the asset (works in postman)
HTTPClient http2;
http2.setAuthorization(githubname,githubpass);
http2.addHeader("Authorization","token MYTOKEN");
http2.addHeader("accept","application/octet-stream");
http2.begin( firmwareURL, GHAPI_CERT); //Specify the URL and certificate

With the ESP IDF HTTP client you can add headers to an initialized HTTP client using function esp_http_client_set_header().
esp_http_client_handle_t client = esp_http_client_init(&config);
esp_http_client_set_header(client, "HeaderKey", "HeaderValue");
err = esp_http_client_perform(client);
If using the HTTPS OTA API, you can register for a callback which gives you a handle to the underlying HTTP client. You can then do the exact same as in above example.

Is it possible for an HTTP server to interrupt and replace a response in progress?

Supposing I was to stream a large response to a browser, or other HTTP client; is there a way for the server to return an HTTP 500 error page mid-way through the download?
For example, I may be serving a very large HTML page, or perhaps displaying a large JPEG file in its own tab, and the data-source is terminated prematurely. At this point I want to show my standard HTTP 500 error page.
If it helps, I can use Content-Transfer-Encoding: chunked, or even other hackish HTTP headers to make this work.
Is it possible to override a download result on the HTTP level?
I know one could add a JavaScript alert if the response were in HTML format, but I'm just wondering if I can override / restart the response on the HTTP level instead?

Appropriate HTTP status code for request specifying invalid Content-Encoding header?

What status code should be returned if a client sends an HTTP request and specifies a Content-Encoding header which cannot be decoded by the server?
Example
A client POSTs JSON data to a REST resource and encodes the entity body using the gzip coding. However, the server can only decode DEFLATE codings because it failed the gzip class in server school.
What HTTP response code should be returned? I would say 415 Unsupported Media Type but it's not the entity's Content-Type that is the problem -- it's the encoding of the otherwise supported entity body.
Which is more appropriate: 415? 400? Perhaps a custom response code?
Addendum: I have, of course, thoroughly checked rfc2616. If the answer is there I may need some new corrective eyewear, but I don't believe that it is.
Update:
This has nothing to do with sending a response that might be unacceptable to a client. The problem is that the client is sending the server what may or may not be a valid media type in an encoding the server cannot understand (as per the Content-Encoding header the client packaged with the request message).
It's an edge-case and wouldn't be encountered when dealing with browser user-agents, but it could crop up in REST APIs accepting entity bodies to create/modify resources.

As i'm reading it, 415 Unsupported Media Type sounds like the most appropriate.
From RFC 2616:
10.4.16 415 Unsupported Media Type
The server is refusing to service the request because the entity of the request is in a format not supported by the requested resource for the requested method.
Yeah, the text part says "media type" rather than "encoding", but the actual description doesn't include any mention of that distinction.
The new hotness, RFC 7231, is even explicit about it:
6.5.13. 415 Unsupported Media Type
The 415 (Unsupported Media Type) status code indicates that the
origin server is refusing to service the request because the payload
is in a format not supported by this method on the target resource.
The format problem might be due to the request's indicated
Content-Type or Content-Encoding, or as a result of inspecting the
data directly.

They should make that the final question on Who Wants To Be a Millionaire!
Well the browser made a request that the server cannot service because the information the client provided is in a format that cannot be handled by the server. However, this isn't the server's fault for not supporting the data the client provided, it's the client's fault for not listening to the server's Acccept-* headers and providing data in an inappropriate encoding. That would make it a Client Error (400 series error code).
My first instinct is 400 Bad Request is the appropriate response in this case.
405 Method Not Allowed isn't right because it refers to the HTTP verb being one that isn't allowed.
406 Not Acceptable looks like it might have promise, but it refers to the server being unable to provide data to the client that satisfies the Accept-* request headers that it sent. This doesn't seem like it would fit your case.
412 Precondition Failed is rather vaguely defined. It might be appropriate, but I wouldn't bet on it.
415 Unsupported Media Type isn't right because it's not the data type that's being rejected, it's the encoding format.
After that we get into the realm of non-standard response codes.
422 Unprocessable Entity describes a response that should be returned if the request was well-formed but if it was semantically incorrect in some way. This seems like a good fit, but it's a WebDAV extension to HTTP and not standard.
Given the above, I'd personally opt for 400 Bad Request. If any other HTTP experts have a better candidate though, I'd listen to them instead. ;)
UPDATE: I'd previously been referencing the HTTP statuses from their page on Wikipedia. Whilst the information there seems to be accurate, it's also less than thorough. Looking at the specs from W3C gives a lot more information on HTTP 406, and it's leading me to think that 406 might be the right code after all.
10.4.7 406 Not Acceptable
The resource identified by the request is only capable of generating
response entities which have content characteristics not acceptable
according to the accept headers sent in the request.
Unless it was a HEAD request, the response SHOULD include an entity
containing a list of available entity characteristics and location(s)
from which the user or user agent can choose the one most appropriate.
The entity format is specified by the media type given in the
Content-Type header field. Depending upon the format and the
capabilities of the user agent, selection of the most appropriate
choice MAY be performed automatically. However, this specification
does not define any standard for such automatic selection.
Note: HTTP/1.1 servers are allowed to return responses which are
not acceptable according to the accept headers sent in the
request. In some cases, this may even be preferable to sending a
406 response. User agents are encouraged to inspect the headers of
an incoming response to determine if it is acceptable.
If the response could be unacceptable, a user agent SHOULD temporarily
stop receipt of more data and query the user for a decision on further
actions.
While it does mention the Content-Type header explicitly, the wording mentions "entity characteristics", which you could read as covering stuff like GZIP versus DEFLATE compression.
One thing worth noting is that the spec says that it may be appropriate to just send the data as is, along with the headers to tell the client what format it's in and what encoding it uses, and just leave it for the client to sort out. So if the client sends a header indicating it accepts GZIP compression, but the server can only generate a response with DEFLATE, then sending that along with headers saying it's DEFLATE should be okay (depending on the context).
Client: Give me a GZIPPED page.
Server: Sorry, no can do. I can DEFLATE pack it for you. Here's the DEFLATE packed page. Is that okay for you?
Client: Welllll... I didn't really want DEFLATE, but I can decode it okay so I'll take it.
(or)
Client: I think I'll have to clear that with my user. Hold on.

HTTP Get content type

I have a program that is supposed to interact with a web server and retrieve a file containing structured data using http and cgi. I have a couple questions:
The cgi script on the server needs to specify a body right? What should the content-type be?
Should I be using POST or GET?
Could anyone tell me a good resource for reading about HTTP?

If you just want to retrieve the resource, I’d use GET. And with GET you don’t need a Content-Type since a GET request has no body. And as of HTTP, I’d suggest you to read the HTTP 1.1 specification.

The content-type specified by the server will depend on what type of data you plan to return. As Jim said if it's JSON you can use 'application/json'. The obvious payload for the request would be whatever data you're sending to the client.
From the servers prospective it shouldn't matter that much. In general if you're not expecting a lot of information from the client I'd set up the server to respond to GET requests as opposed to POST requests. An advantage I like is simply being able to specify what I want in the url (this can't be done if it's expecting a POST request).
I would point you to the rfc for HTTP...probably the best source for information..maybe not the most user friendly way to get your answers but it should have all the answers you need. link text

For (1) the Content-Type depends on the structured data. If it's XML you can use application/xml, JSON can be application/json, etc. Content-Type is set by the server. Your client would ask for that type of content using the Accept header. (Try to use existing data format standards and content types if you can.)
For (2) GET is best (you aren't sending up any data to the server).
I found RESTful Web Services by Richardson and Ruby a very interesting introduction to HTTP. It takes a very strict, but very helpful, view of HTTP.

Does sending POST data to a server that doesn't accept post data recieve the data?

I am setting up a back end API in a script of mine that contacts one of my sites by sending XML to my web server in the form of POST data. This script will be used by many and I want to limit the bandwidth waste for people that accidentally turn the feature on without a proper access key.
I will be denying requests that do not have the correct access key by maybe generating a 403 access code.
Lets say the POST data is ~500kb of data. Does the server receive all 500kb of data when this attempt is made regardless of the status code?
How about if I made the url contain the key mydomain/api/123456789 and generate 403 status on all bad access keys.
Does the POST data still get sent/received regardless or is it negotiated before the data is finally sent.
Thanks in advance!

Generally speaking, the entire request will be sent, including post data. There is often no way for the application layer to return a response like a 403 until it has received the entire request.
In reality, it will depend on the language/framework used and how closely it is linked to the HTTP server. Section 8.2.2 of RFC2616 HTTP/1.1 specification has this to say
An HTTP/1.1 (or later) client sending
a message-body SHOULD monitor the
network connection for an error status
while it is transmitting the request.
If the client sees an error status, it
SHOULD immediately cease transmitting
the body. If the body is being sent
using a "chunked" encoding (section
3.6), a zero length chunk and empty trailer MAY be used to prematurely
mark the end of the message. If the
body was preceded by a Content-Length
header, the client MUST close the
connection.
So, if you can find a language environemnt closely linked with the HTTP server (for example, mod_perl), you could do this in a way which does comply with standards.
An alternative approach you could take is to make an initial, smaller request to obtain a URL to use for the larger POST. The application can then deny providing the URL to clients without an appropriate key.

Here is great book about RESTful Web Services, where it's explained how HTTP works: http://oreilly.com/catalog/9780596529260
You can consider any request as envelope, where on top of it it's written address (URL), some properties (HTTP Headers) and inside it there's some data (if request is initiated by post method). So as you might guess you can't receive envelope partially.
Oh I forgot, it's when you are using HTTP Post with standard HTTP header "application/x-www-form-urlencoded" but if you are uploading files (correspondingly using ""multipart/form-data") Django gives you control over streamed chunks of files using Middleware classes: http://docs.djangoproject.com/en/dev/topics/http/middleware/