As far as I know, it is possible to transfer binary files over HTTP. But HTTP is a text-based protocol; a typical HTTP response looks as follows:
HTTP/1.1 200 OK
Date: Wed, 23 May 2012 22:38:34 GMT
Content-Length: 438
Content-Type: text/html; charset=UTF-8
Here goes content
If so, how should a binary file be encoded in this frame? What is the Content-Type? Is the content encoded with base64, the same as attachments in the POP3 protocol? Or is it raw data (and if so, can that be done without causing problems)?
The header fields are text based, but the actual payload is binary. You can transfer whatever you want.
And no, it doesn't have anything to do with the Content-Type. That is just a label so that the recipient knows how to process the data; it does not affect the format in the protocol itself.
Binary files are usually transferred with the application/octet-stream MIME type (unless they match another, more specific MIME type, of course). For transmission you use the raw data; no base64 is needed.
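To make the "text headers, raw payload" point concrete, here is a minimal sketch in Python. The helper names `build_binary_response` and `split_response` are made up for this illustration and are not part of any library; the sketch only shows that the body bytes go on the wire unchanged and that Content-Length counts those raw bytes.

```python
def build_binary_response(payload: bytes,
                          content_type: str = "application/octet-stream") -> bytes:
    """Return a complete HTTP/1.1 response that carries `payload` unmodified."""
    # The status line and header fields are plain ASCII text.
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The body is appended as-is: no base64, no escaping.
    return head + payload


def split_response(raw: bytes) -> tuple[bytes, bytes]:
    """Split a response into (header block, body) at the first blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


# Every possible byte value, including NUL, CR and LF.
payload = bytes(range(256))
raw = build_binary_response(payload)
head, body = split_response(raw)
```

The receiver relies on Content-Length (or chunked framing), not on the byte values, to find the end of the body, which is why arbitrary bytes cause no problems.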
If I send binary data over HTTP using the standard Content-Type: application/octet-stream header, what would be the overhead? Ignoring the HTTP headers, I'm talking only about the data/encoding overhead in terms of byte count.
Thanks
Why should there be any overhead? HTTP does not usually do things like base64 encoding (you may be thinking of e-mail, which mostly uses 7-bit-safe encodings that create a lot of overhead). The Content-Type header has little to nothing to do with how your data is encoded; it tells clients how to handle it. With application/octet-stream, browsers will typically offer the response as a download, even when the content is actually plain text.
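For comparison, the overhead that base64 would add (and that a raw HTTP body avoids) can be checked with a few lines of Python. This is only an arithmetic illustration; the function name `base64_size` is invented for the example.

```python
import base64
import os


def base64_size(n: int) -> int:
    """Length of the padded base64 encoding of n raw bytes (no line breaks)."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((n + 2) // 3)


raw = os.urandom(3000)              # stand-in for a binary file
encoded = base64.b64encode(raw)

raw_len = len(raw)                  # bytes sent in a raw HTTP body
encoded_len = len(encoded)          # bytes that base64 would need instead
overhead = encoded_len / raw_len - 1
```

For 3000 raw bytes the base64 form is 4000 bytes, roughly a one-third increase, whereas the raw body costs exactly 3000 bytes.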
I have a question on usage of Content-Encoding and Transfer-Encoding:
Please let me know if my below understanding is right:
In its request, the client can specify which encoding types it is willing to accept using the Accept-Encoding header. So, if the server wishes to encode the message before transmission, e.g. with gzip, it can compress the entity (content), add Content-Encoding: gzip, and send the HTTP response. On reception, the client can decompress and parse the entity.
In the case of Transfer-Encoding, the client may specify which transfer codings it is willing to accept and handle on the fly. That is, if the client sends TE: gzip; q=1, the server may, if it wishes, send a 200 OK with Transfer-Encoding: gzip, compress the stream as it sends it, and the client, upon receiving the content, can decompress it on the fly and parse it.
Is my understanding right here? Please comment.
Also, what is the basic advantage of compressing the entity on the fly versus compressing the entity first and then transmitting it? Is Transfer-Encoding valid only for chunked responses, since we do not know the size of the entity before transmission?
The difference really is not about on-the-fly or not -- Content-Encoding can be both pre-computed and on the fly.
The differences are:
Transfer Encoding is hop-by-hop, not end-to-end
Transfer Encodings other than "chunked" (sadly) aren't implemented in practice
Transfer Encoding is on the message layer, Content Encoding on the payload layer
Using Content Encoding affects entity tags etc.
See http://greenbytes.de/tech/webdav/rfc7230.html#transfer.codings and http://greenbytes.de/tech/webdav/rfc7231.html#data.encoding.
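As a rough illustration of the "payload layer" and "affects entity tags" points in the list above, the following Python sketch applies gzip as a Content-Encoding. The function name, the simplistic Accept-Encoding parsing (q-values are ignored), and the idea of deriving the ETag from a hash of the encoded bytes are all assumptions made for the example, not a description of any particular server.

```python
import gzip
import hashlib


def encode_representation(body: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Apply gzip as a Content-Encoding if the client lists it (q-values ignored)."""
    headers = {}
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        # mtime=0 keeps the output deterministic, so the ETag below is stable.
        body = gzip.compress(body, mtime=0)
        headers["Content-Encoding"] = "gzip"
    # Length and ETag describe the encoded representation, not the original.
    headers["Content-Length"] = str(len(body))
    headers["ETag"] = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return headers, body


data = b"hello world " * 100
plain_headers, plain_body = encode_representation(data, "identity")
gz_headers, gz_body = encode_representation(data, "gzip, deflate")
```

Because the gzip variant is a different representation, it gets a different ETag and Content-Length, and `gzip.decompress(gz_body)` recovers the original bytes. A transfer coding such as chunked, by contrast, only frames the message on one hop and leaves those headers alone.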
I am writing an HTTP web server. My server has to handle HTTP multipart requests. In my previous implementation, I was extracting the data with the help of the Content-Length header present in every part of the request. The client I was using gives a Content-Length header with every part (file) in the multipart request.
But another client does not give the Content-Length of each file. In my implementation I use the Content-Length header to extract that many bytes and save them into a file.
Please tell me how can I extract data now.
The Headers which I am getting now are:
POST xxxxxxxxxxxxxxxxxxxxxxx&currentTab=PHOTOxxxxxxxxxxxxxxxx HTTP/1.1
Content-Length: 6829
Content-Type: multipart/form-data; boundary=SnlCg9JqTpQIl6t_mPzByTjZ8bD24kUj; charset=UTF-8
Host: host
Connection: Keep-Alive
User-Agent: Apache-HttpClient/xxxxxxxx
Accept-Encoding: gzip
--SnlCg9JqTpQIl6t_mPzByTjZ8bD24kUj
Content-Disposition: form-data; name="file"; filename="imagesCA5L2CL6_jpg(2)_jpg.jpg"
Content-Type: photo/jpg
**Some Data byte array**
--SnlCg9JqTpQIl6t_mPzByTjZ8bD24kUj--
In this request, there is no Content-Length header in the part data.
EDIT:
Earlier this client used to send a Content-Length header in every part, but for some reason it no longer does. Can anybody suggest a reason for that?
thanks
Like this : Reading file input from a multipart/form-data POST
Take a look at RFC 2616 if you want to implement an HTTP/1.1 server. See section 4.4 on how to determine message length. See RFC 2388 on how to implement multipart/form-data.
The real answer is: don't reinvent the wheel, or you'll have to reimplement a few hundred pages of RFCs. There are tons of libraries and servers out there.
If you do want to write your own web server, for example as an exercise, you would have found those RFCs already, right?
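For the question above, the key point from the multipart RFCs is that parts are delimited by the boundary string, so a per-part Content-Length is not needed. Below is a minimal Python sketch of that idea. The function name `split_multipart` is invented for the example, the whole body is assumed to be buffered in memory already, and edge cases such as whitespace after the boundary or folded header lines are ignored.

```python
def split_multipart(body: bytes, boundary: str) -> list[tuple[dict, bytes]]:
    """Split a multipart body into (headers, content) pairs using the boundary."""
    # A delimiter is CRLF + "--" + boundary. Prepending CRLF lets the very
    # first delimiter (which starts the body) match the same pattern.
    delimiter = b"\r\n--" + boundary.encode("ascii")
    chunks = (b"\r\n" + body).split(delimiter)

    parts = []
    for chunk in chunks[1:]:            # chunks[0] is the (usually empty) preamble
        if chunk.startswith(b"--"):     # "--boundary--" marks the end
            break
        # The part headers end at the first blank line; the rest is raw content.
        raw_headers, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in raw_headers.split(b"\r\n"):
            if not line:
                continue
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, content))
    return parts


# Hypothetical request body with one binary part and no per-part length.
example_body = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="a.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"\x00\x01\r\n\xff"
    b"\r\n--XYZ--\r\n"
)
parts = split_multipart(example_body, "XYZ")
```

The length of each file falls out of the split: it is simply everything between the blank line after the part headers and the CRLF that precedes the next boundary. A production server would scan for the delimiter incrementally instead of buffering the whole request.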
I'm using Apache Abdera to POST atom multipart data to my server, and am having some odd problems that I can't pin down.
It looks like an issue with chunked transfer encoding, but I'm insufficiently experienced to be certain. The problem manifests as the server throwing an error indicating that the request I sent it contains only one mime part, not two as required. I attached Wireshark to the interface and captured the conversation, and it went like this:
POST /sss/col-uri/2ee98ea1-f9ad-4f01-9b1c-cfa3c4a6dc3c HTTP/1.1
Host: localhost
Expect: 100-continue
Transfer-Encoding: chunked
Content-Type: multipart/related; boundary="1306399868259";type="application/atom+xml;type=entry"
The server's response:
HTTP/1.1 100 Continue
My client continues:
198
--1306399868259
Content-Type: application/atom+xml;type=entry
Content-Disposition: attachment; name="atom"
<entry xmlns="http://www.w3.org/2005/Atom"><title xmlns="http://purl.org/dc/terms/">Richard Woz Ere</title><bibliographicCitation xmlns="http://purl.org/dc/terms/">this is my citation</bibliographicCitation><content type="application/zip" src="cid:48bd9436-e8b6-4f68-aa83-5c88eda52fd4" /></entry>
0
b0e9
--1306399868259
Content-Type: application/zip
Content-Disposition: attachment; name="payload"; filename="example.zip"
Content-ID: <48bd9436-e8b6-4f68-aa83-5c88eda52fd4>
Packaging: http://purl.org/net/sword/package/SimpleZip
And at this point the server responds with:
HTTP/1.1 400 Bad Request
Date: Thu, 26 May 2011 08:51:08 GMT
Server: Apache/2.2.17 (Unix) mod_ssl/2.2.17 OpenSSL/0.9.8l DAV/2 mod_wsgi/3.3 Python/2.6.1
Connection: close
Transfer-Encoding: chunked
Content-Type: text/xml
Indicating the error (which is well understood). My client goes on to stream a pile of base64-encoded bits onto the output stream, but in the meantime the server is not listening; it has already decided that the request was erroneous.
Unfortunately, I'm not in charge of the HTTP layer - this is all handled by Abdera using Apache httpclient. My code that does this looks like this:
client.execute("POST", url.toString(), new SWORDMultipartRequestEntity(deposit), options);
Here, the SWORDMultipartRequestEntity is a copy of the standard Abdera MultipartRequestEntity class, with a few extra headers thrown in (see, for example, Packaging in the above snippet); the "deposit" argument is just an object holding the atom part and the inputstream.
When attaching a debugger I get to this line of code fine, and then it disappears into a rat hole and then I get this error back.
Any hints or tips? I've pretty much exhausted my angles of attack!
The only thing that stands out for me is that immediately after the atom:entry document, there is a newline with "0" on it alone, which appears to be chunked transfer encoding speak for "I'm finished". Not sure how it got there, or whether it really has any effect. Help much appreciated.
Cheers,
Richard
The lonely 0 may indeed be a problem. My uninformed guess is that it results from some call to flush(), which then writes the whole buffer as another HTTP chunk. Unfortunately, at the point where flush is called, the buffer had already been flushed and its size is therefore zero. So the HttpChunkedOutputFilter (or whatever it is called) should be taught that an empty buffer does not need to be flushed.
[update:] You should set a breakpoint in the ChunkedOutputStream class, especially the flush method. I just looked at its code and it seems to be ok, but maybe I missed something.
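To show why a stray zero-length chunk would produce the "only one mime part" symptom, here is a small Python sketch of chunked framing. It is not Abdera's or httpclient's code; `encode_chunk` and `decode_chunked` are hypothetical helpers, and the multipart bodies are shortened stand-ins for the captured request.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame data as one chunk: hex size, CRLF, data, CRLF."""
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"


def decode_chunked(stream: bytes) -> tuple[bytes, bytes]:
    """Return (decoded body, bytes left over after the terminating chunk)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = stream.index(b"\r\n", pos)
        # Ignore any chunk extensions after ';'.
        size = int(stream[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            # Last chunk: skip optional trailer fields up to the blank line.
            while True:
                line_end = stream.index(b"\r\n", pos)
                line = stream[pos:line_end]
                pos = line_end + 2
                if not line:
                    return bytes(body), stream[pos:]
        body += stream[pos:pos + size]
        pos += size + 2                 # skip the data and its trailing CRLF


part1 = b"--b\r\nContent-Type: application/atom+xml\r\n\r\n<entry/>\r\n"
part2 = b"--b\r\nContent-Type: application/zip\r\n\r\nPK...\r\n--b--\r\n"

good = encode_chunk(part1) + encode_chunk(part2) + encode_chunk(b"")
# A flush of an empty buffer emits a zero-length chunk in the middle.
bad = encode_chunk(part1) + encode_chunk(b"") + encode_chunk(part2) + encode_chunk(b"")

good_body, good_rest = decode_chunked(good)
bad_body, bad_rest = decode_chunked(bad)
```

With the `bad` stream, the decoder stops at the empty chunk, so the body contains only the first part and the second part is left on the wire as unread bytes. That matches what the capture suggests, although it does not prove where the empty chunk came from.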
What is the difference between HTTP headers Content-Range and Range? When should each be used?
I am trying to stream an audio file from a particular byte offset. Should I use Content-Range or Range header?
Actually, the accepted answer is not complete. Content-Range is not only used in responses. It is also legal in requests that provide an entity body.
For example, an HTTP PUT provides an entity body, it might provide only a portion of an entity. Thus the PUT request can include a Content-Range header indicating to the server where the partial entity body should be merged into the entity.
For example, let's first create and then append to a file using HTTP:
Request 1:
PUT /file HTTP/1.1
Host: server
Content-Length: 1
a
Request 2:
PUT /file HTTP/1.1
Host: server
Content-Range: bytes 1-1/*
Content-Length: 1
a
Now, let's see the file's contents...
Request 3:
GET /file HTTP/1.1
Host: server
HTTP/1.1 200 OK
Content-Length: 2
aa
This allows random file access, both READING and WRITING over HTTP. I just wanted to clarify, as I was researching the use of Content-Range in a WebDAV client I am developing, so perhaps this expanded information will prove useful to somebody else.
Range is used in the request, to ask for a particular range (or ranges) of bytes. Content-Range is used in the response, to indicate which bytes the server is giving you (which may be different than the range you requested), as well as how long the entire content is (if known).
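For the audio streaming case, that means the client sends Range and the server answers with 206 and Content-Range. The Python sketch below shows one way a server might turn a single-range Range header into such a response. The function name `serve_range` and its return shape are assumptions for the example; multi-range requests and malformed headers are simply answered with the full content.

```python
import re
from typing import Optional


def serve_range(content: bytes, range_header: Optional[str]):
    """Return (status, headers, body) for an optional single 'bytes=' range."""
    total = len(content)
    full = (200, {"Content-Length": str(total)}, content)
    if range_header is None:
        return full

    match = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if not match or match.groups() == ("", ""):
        return full                      # unsupported or malformed: ignore it
    first, last = match.groups()

    if first:
        start = int(first)
        if last and int(last) < start:
            return full                  # invalid range spec: ignore it
        # The server may clamp the end to what it actually has.
        end = min(int(last), total - 1) if last else total - 1
    else:
        # Suffix form "bytes=-N": the last N bytes.
        start = max(total - int(last), 0)
        end = total - 1

    if start >= total:
        return 416, {"Content-Range": f"bytes */{total}"}, b""

    chunk = content[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk


audio = bytes(range(100))                # stand-in for an audio file
status, headers, body = serve_range(audio, "bytes=50-500")
```

Here the client asked for bytes 50-500 of a 100-byte resource, and the Content-Range in the response reports `bytes 50-99/100`, which illustrates how the served range can differ from the requested one.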