http content type, and binary data

http content type, and binary data - http

I thought I knew this already but now I'm not sure: Is all content sent over http always encoded to character data? ie, if my content type is a binary file type, is it always converted to binhex, or is it possible to send "actual" binary data across the wire?

In HTTP there is no content transfer encoding (e.g. base64) done, so binary data is sent just binary, byte-by-byte.

Character data is just binary data with special meaning to humans :p
The actual body of the HTTP request may be encoded and/or compressed, and this is specified in the headers.

Related

Is HTTP response a single ASCII Encoded text file ? or the data and the headers are sent separately?

So according to mozilla docs, http messages are composed of textual information encoded in ASCII.
HTTP messages are composed of textual information encoded in ASCII,
and span over multiple lines.
So, when we request a binary file such as an Image, how is it represented in the response message?
(Assuming the response is a single text file containing both headers and data)
Is it also ASCII encoded?
If not, then are headers and data transferred separately? If yes, please share some resources where I can learn the working.

In HTTP/1.1 (not HTTP/2 or HTTP/3), message header and control information is transferred as plain text. The actual request payloads are binary, and thus no encoding is needed.
See IERF RFCs 9110..9114.

Does IMAP protocol support binary inside multi-part body?

IMAP RFC:
8-bit textual and binary mail is supported through the use of a
[MIME-IMB] content transfer encoding. IMAP4rev1 implementations MAY
transmit 8-bit or multi-octet characters in literals, but SHOULD do
so only when the [CHARSET] is identified.
Although a BINARY body encoding is defined, unencoded binary
strings are not permitted. A "binary string" is any string with
NUL characters. Implementations MUST encode binary data into a
textual form, such as BASE64, before transmitting the data. A
string with an excessive amount of CTL characters MAY also be
considered to be binary.
If implementation has to convert to base64, why RFC is saying "BINARY body encoding is defined". Since every time we need to send the data as base64 (or some other format) effectively binary is not supported. Or am i reading some thing wrong?
IMAP supports MIME multi-part, can the parts inside this have binary data? that is content-transfer-encoding?
I am new to IMAP/HTTP, reason for asking this question is, i have to develop a server which supports both HTTP and IMAP, in HTTP server recive the data in binary (HUGE multipart data, with content-transfer-encoding as binary), FETCH can be done in IMAP. Problem is i need to parse the data and convert each parts inside multipart to base64 if IMAP doesnt support binary. Which i think is severe performance issue.

The answer is unfortunately "maybe".
The MIME RFC supports binary, but the IMAP RFC specifically disallows sending NULL characters. This is likely because they can be confusing for text based parsers, especially those written in C, where NULL has the meaning of End of String.
Some IMAP servers just consider the body to be a "bag of bytes" and I doubt few, if any, actually do re-encoding. So if you ask for the entire message, you will probably get the literal content of it.
If your clients can handle MIME-Binary, you will probably be fine.
There is RFC 3516 for an IMAP extension to support BINARY properly, but this is not widely deployed.
As a side note: why are you using Multipart MIME? That is an odd implementation choice for HTTP.

What to chose application/x-www-form-urlencoded / multipart/form-data for file size in GB?

I am sending some video files (size could be even in GB) as application/x-www-form-urlencodedover HTTP POST.
The following link link suggests that it would be better to transmit it over Multipart form data when we have non-alphanumeric content.
Which encoding would be better to transmit data of this kind?
Also how can I find the length of encoded data (data encoded with application/x-www-form-urlencoded)?
Will encoding the binary data consume much time?
In general, encoding skips the non-alphanumeric characters with some others. So, can we skip encoding for binary data (like video)? How can we skip it?

x-www-form-urlencoded treats the value of an entry in the form data set as a sequence of bytes (octets).
Of the possible 256 values, only 66 are left as it or still encoded as a single byte value, the others are replaced by the hexadecimal representation of the value of their code-point.
This usually takes three to five bytes depending on the encoding.
So in average (256-66)/256 or 74% of the file will be encoded to take three-to-five as much space as originally.
This encoding however has no header nor significant overhead.
multipart/form-data instead works by dividing the data into parts and then finding a string of any length that doesn't occur in said part.
Such string is called the boundary and it is used to delimit the end of the part, that is transmitted as a stream of octects.
So the file is mostly send as it, with negligible size overhead for big enough data.
The draw back is that the user-agent need to find a suitable boundary, however given a string of length k there is only a probability of 2-8k of finding that string in a uniformly generated binary file.
So the user-agent can simply generate a random string and do a quick search and exploit the network transmission time to hide the latency of the search.
You should use multipart/form-data.
This depends on the platform you are using, in general if you cannot access the request body you have to re-perform the encoding your self.
For multipart/form-data encoding there is a little, usually negligible (compared to the transmission time) overhead.

Are there ascii characters that can not be sent across the internet?

I am building a web application that will send a set of flag states to its server by converting the flags into binary, then converting the binary into ascii characters. The ascii characters will be sent using a post command, then a response (encoded the same way) will be sent back. I would like to know if there are ascii character that can cause the HTTP requests and data transfer to break down or get misdirected. Are there standard ascii characters (0-127) that need to be avoided?

Despite its name, HTTP is agnostic to the format and semantics of the entity-body content. It doesn't need to be text. Describing it as text and giving a character encoding is metadata for the sending and receiving applications. Your actual entity-data isn't text but if you've added a layer of re-interpretation so you could provide that metadata.
HTTP bodies are of two types: counted or chunked. For counted, the message-body is the same as the entity-body. Counted is used unless you want to start streaming data before knowing its entire length. Just send the Content-Length header with the number of octets in the entity-body and copy it into the output stream.

application/x-www-form-urlencoded or multipart/form-data?

In HTTP there are two ways to POST data: application/x-www-form-urlencoded and multipart/form-data. I understand that most browsers are only able to upload files if multipart/form-data is used. Is there any additional guidance when to use one of the encoding types in an API context (no browser involved)? This might e.g. be based on:
data size
existence of non-ASCII characters
existence on (unencoded) binary data
the need to transfer additional data (like filename)
I basically found no formal guidance on the web regarding the use of the different content-types so far.

TL;DR
Summary; if you have binary (non-alphanumeric) data (or a significantly sized payload) to transmit, use multipart/form-data. Otherwise, use application/x-www-form-urlencoded.
The MIME types you mention are the two Content-Type headers for HTTP POST requests that user-agents (browsers) must support. The purpose of both of those types of requests is to send a list of name/value pairs to the server. Depending on the type and amount of data being transmitted, one of the methods will be more efficient than the other. To understand why, you have to look at what each is doing under the covers.
For application/x-www-form-urlencoded, the body of the HTTP message sent to the server is essentially one giant query string -- name/value pairs are separated by the ampersand (&), and names are separated from values by the equals symbol (=). An example of this would be:
MyVariableOne=ValueOne&MyVariableTwo=ValueTwo
According to the specification:
[Reserved and] non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character
That means that for each non-alphanumeric byte that exists in one of our values, it's going to take three bytes to represent it. For large binary files, tripling the payload is going to be highly inefficient.
That's where multipart/form-data comes in. With this method of transmitting name/value pairs, each pair is represented as a "part" in a MIME message (as described by other answers). Parts are separated by a particular string boundary (chosen specifically so that this boundary string does not occur in any of the "value" payloads). Each part has its own set of MIME headers like Content-Type, and particularly Content-Disposition, which can give each part its "name." The value piece of each name/value pair is the payload of each part of the MIME message. The MIME spec gives us more options when representing the value payload -- we can choose a more efficient encoding of binary data to save bandwidth (e.g. base 64 or even raw binary).
Why not use multipart/form-data all the time? For short alphanumeric values (like most web forms), the overhead of adding all of the MIME headers is going to significantly outweigh any savings from more efficient binary encoding.

READ AT LEAST THE FIRST PARA HERE!
I know this is 3 years too late, but Matt's (accepted) answer is incomplete and will eventually get you into trouble. The key here is that, if you choose to use multipart/form-data, the boundary must not appear in the file data that the server eventually receives.
This is not a problem for application/x-www-form-urlencoded, because there is no boundary. x-www-form-urlencoded can also always handle binary data, by the simple expedient of turning one arbitrary byte into three 7BIT bytes. Inefficient, but it works (and note that the comment about not being able to send filenames as well as binary data is incorrect; you just send it as another key/value pair).
The problem with multipart/form-data is that the boundary separator must not be present in the file data (see RFC 2388; section 5.2 also includes a rather lame excuse for not having a proper aggregate MIME type that avoids this problem).
So, at first sight, multipart/form-data is of no value whatsoever in any file upload, binary or otherwise. If you don't choose your boundary correctly, then you will eventually have a problem, whether you're sending plain text or raw binary - the server will find a boundary in the wrong place, and your file will be truncated, or the POST will fail.
The key is to choose an encoding and a boundary such that your selected boundary characters cannot appear in the encoded output. One simple solution is to use base64 (do not use raw binary). In base64 3 arbitrary bytes are encoded into four 7-bit characters, where the output character set is [A-Za-z0-9+/=] (i.e. alphanumerics, '+', '/' or '='). = is a special case, and may only appear at the end of the encoded output, as a single = or a double ==. Now, choose your boundary as a 7-bit ASCII string which cannot appear in base64 output. Many choices you see on the net fail this test - the MDN forms docs, for example, use "blob" as a boundary when sending binary data - not good. However, something like "!blob!" will never appear in base64 output.

I don't think HTTP is limited to POST in multipart or x-www-form-urlencoded. The Content-Type Header is orthogonal to the HTTP POST method (you can fill MIME type which suits you). This is also the case for typical HTML representation based webapps (e.g. json payload became very popular for transmitting payload for ajax requests).
Regarding Restful API over HTTP the most popular content-types I came in touch with are application/xml and application/json.
application/xml:
data-size: XML very verbose, but usually not an issue when using compression and thinking that the write access case (e.g. through POST or PUT) is much more rare as read-access (in many cases it is <3% of all traffic). Rarely there where cases where I had to optimize the write performance
existence of non-ascii chars: you can use utf-8 as encoding in XML
existence of binary data: would need to use base64 encoding
filename data: you can encapsulate this inside field in XML
application/json
data-size: more compact less that XML, still text, but you can compress
non-ascii chars: json is utf-8
binary data: base64 (also see json-binary-question)
filename data: encapsulate as own field-section inside json
binary data as own resource
I would try to represent binary data as own asset/resource. It adds another call but decouples stuff better. Example images:
POST /images
Content-type: multipart/mixed; boundary="xxxx"
... multipart data
201 Created
Location: http://imageserver.org/../foo.jpg
In later resources you could simply inline the binary resource as link:
<main-resource&gt
...
<link href="http://imageserver.org/../foo.jpg"/>
</main-resource>

I agree with much that Manuel has said. In fact, his comments refer to this url...
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
... which states:
The content type
"application/x-www-form-urlencoded" is
inefficient for sending large
quantities of binary data or text
containing non-ASCII characters. The
content type "multipart/form-data"
should be used for submitting forms
that contain files, non-ASCII data,
and binary data.
However, for me it would come down to tool/framework support.
What tools and frameworks do you
expect your API users to be building
their apps with?
Do they have
frameworks or components they can use
that favour one method over the
other?
If you get a clear idea of your users, and how they'll make use of your API, then that will help you decide. If you make the upload of files hard for your API users then they'll move away, of you'll spend a lot of time on supporting them.
Secondary to this would be the tool support YOU have for writing your API and how easy it is for your to accommodate one upload mechanism over the other.

Just a little hint from my side for uploading HTML5 canvas image data:
I am working on a project for a print-shop and had some problems due to uploading images to the server that came from an HTML5 canvas element. I was struggling for at least an hour and I did not get it to save the image correctly on my server.
Once I set the
contentType option of my jQuery ajax call to application/x-www-form-urlencoded everything went the right way and the base64-encoded data was interpreted correctly and successfully saved as an image.
Maybe that helps someone!

If you need to use Content-Type=x-www-urlencoded-form then DO NOT use FormDataCollection as parameter: In asp.net Core 2+ FormDataCollection has no default constructors which is required by Formatters. Use IFormCollection instead:
public IActionResult Search([FromForm]IFormCollection type)
{
return Ok();
}

In my case the issue was that the response contentType was application/x-www-form-urlencoded but actually it contained a JSON as the body of the request. Django when we access request.data in Django it cannot properly converted it so access request.body.
Refer this answer for better understanding:
Exception: You cannot access body after reading from request's data stream

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex