MJPEG over HTTP Specification

I was trying to create a tool to grab frames from an MJPEG stream that is transmitted over HTTP. I did not find any specification, so I looked at what Wikipedia says here:
In response to a GET request for a MJPEG file or stream, the server
streams the sequence of JPEG frames over HTTP. A special mime-type
content type multipart/x-mixed-replace;boundary=<boundary-name>
informs the client to expect several parts (frames) as an answer
delimited by <boundary-name>. This boundary name is expressly
disclosed within the MIME-type declaration itself.
But this doesn't seem to be very accurate in practice. I dumped some streams to find out how they behave. Most streams have the following format (where CRLF is a carriage return followed by a line feed, and a "partial header" is a set of header fields without a status line):
1. Status line (e.g. HTTP/1.0 200 OK) CRLF
2. Header fields (e.g. Cache-Control: no-cache) CRLF
3. Content-Type header field (e.g. Content-Type: multipart/x-mixed-replace; boundary=--myboundary) CRLF
4. CRLF (Denotes that the header is over)
5. Boundary (Denotes that the first frame is over) CRLF
6. Partial header fields (mostly: Content-type: image/jpeg) CRLF
7. CRLF (Denotes that this "partial header" is over)
8. Actual frame data CRLF
9. (Sometimes here is an optional CRLF)
10. Boundary
11. Starting again at partial header (line 6)
The first frame never contained actual image data.
All of the analyzed streams had the Content-Type header, with the type set to multipart/x-mixed-replace.
But some of the streams get things wrong here:
Two servers claimed boundary="MOBOTIX_Fast_Serverpush" but then used --MOBOTIX_Fast_Serverpush as the frame delimiter.
This irritated me quite a bit, so I thought of another approach to get the frames.
Since each JPEG starts with 0xFF 0xD8 as the Start of Image marker and ends with 0xFF 0xD9, I could just start looking for these. This seems to be a very dirty approach and I don't really like it, but it might be the most robust one.
Before I start implementing this, are there some points I missed about MJPEG over HTTP? Is there any real specification for transmitting MJPEG over HTTP?
What are the caveats when just watching for the Start and End markers of a JPEG instead of using the boundary to delimit frames?

this doesn't seem to be very accurate in practice.
It is very accurate in practice. You are just not handling it correctly.
The first frame never contained actual image data.
Yes, it does. There is always a starting boundary before the first MIME entity (as MIME can contain prologue data before the first entity). You are thinking that MIME boundaries exist only after each MIME entity, but that is simply not true.
I suggest you read the MIME specification, particularly RFC 2045 and RFC 2046. MIME works fine in this situation, you are just not interpreting the results correctly.
Actual frame data CRLF
(Sometimes here is an optional CRLF)
Boundary
Actually, that last CRLF is NOT optional: it is part of the boundary that follows a MIME entity's data (see RFC 2046 Section 5). MIME boundaries must appear on their own lines, so a CRLF is artificially inserted after the entity data, which is especially important for data types (like images) that are not naturally terminated by their own CRLF.
Two Servers claimed boundary="MOBOTIX_Fast_Serverpush" but then used --MOBOTIX_Fast_Serverpush as frame delimiter
That is how MIME is supposed to work. The boundary specified in the Content-Type header is always prefixed with -- in the actual entity stream, and the terminating boundary after the last entity is also suffixed with -- as well.
For example:
Content-Type: multipart/x-mixed-replace; boundary="MOBOTIX_Fast_Serverpush"
--MOBOTIX_Fast_Serverpush
Content-Type: image/jpeg
<jpeg bytes>
--MOBOTIX_Fast_Serverpush
Content-Type: image/jpeg
<jpeg bytes>
--MOBOTIX_Fast_Serverpush
... and so on ...
--MOBOTIX_Fast_Serverpush--
This irritated me quite a bit, so I thought of another approach to get the frames.
What you have in mind will not work, and is not as robust as you think. You really need to process the MIME stream correctly instead.
When processing multipart/x-mixed-replace, what you are supposed to do is:
1. Read and discard the HTTP response body until you reach the first MIME boundary, as specified by the Content-Type response header.
2. Read a MIME entity's headers and data until you reach the next matching MIME boundary.
3. Process the entity's data as needed, according to its headers (for instance, displaying an image/jpeg entity onscreen).
4. If the connection has not been closed, and the last boundary read is not the termination boundary, go back to step 2; otherwise stop processing the HTTP response.
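To make that concrete, here is a minimal Python sketch of the loop above. It is not authoritative: the URL is a placeholder, the boundary extraction is simplified, and stripping a leading "--" from the advertised boundary is my own workaround for servers like the MOBOTIX ones mentioned in the question.

import urllib.request

def mjpeg_frames(url):
    """Yield one JPEG (as bytes) per MIME entity in the stream."""
    resp = urllib.request.urlopen(url)
    ctype = resp.headers.get('Content-Type', '')
    # Simplified parameter extraction; a real client should parse the
    # header properly (quoting, extra parameters, etc.).
    boundary = ctype.split('boundary=')[1].split(';')[0].strip().strip('"')
    # RFC 2046: the delimiter in the body is "--" + boundary. Strip any
    # leading dashes first, in case the server already included them.
    delimiter = b'--' + boundary.lstrip('-').encode('ascii')
    buf = b''
    while True:
        chunk = resp.read(4096)
        if not chunk:
            return  # connection closed
        buf += chunk
        parts = buf.split(delimiter)
        # parts[0] is the MIME prologue before the first boundary;
        # parts[-1] may still be incomplete, so keep it for the next read.
        for part in parts[:-1]:
            head, sep, body = part.partition(b'\r\n\r\n')
            if sep and b'image/jpeg' in head.lower():
                # The trailing CRLF belongs to the next delimiter.
                yield body.rstrip(b'\r\n')
        buf = parts[-1]

# Usage sketch: save the first 10 frames from a (hypothetical) camera.
for i, frame in enumerate(mjpeg_frames('http://camera.example/stream')):
    with open('frame%03d.jpg' % i, 'wb') as f:
        f.write(frame)
    if i == 9:
        break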

Related

Is an HTTP response a single ASCII-encoded text file? Or are the data and the headers sent separately?

So, according to the Mozilla docs:
HTTP messages are composed of textual information encoded in ASCII,
and span over multiple lines.
So, when we request a binary file such as an image, how is it represented in the response message?
(Assuming the response is a single text file containing both headers and data)
Is it also ASCII encoded?
If not, are headers and data transferred separately? If so, please share some resources where I can learn how this works.
In HTTP/1.1 (unlike HTTP/2 or HTTP/3), message headers and control information are transferred as plain text. Message payloads (request and response bodies) are sent as raw binary, so no text encoding is needed.
See the IETF RFCs 9110 through 9114.
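A quick way to convince yourself of this is to read a response off a raw socket and split it at the blank line. A minimal Python sketch (example.com is just a convenient test host):

import socket

# Fetch a page with a raw socket so we can inspect the bytes directly.
sock = socket.create_connection(('example.com', 80))
sock.sendall(b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n')
raw = b''
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    raw += chunk
sock.close()

# Headers end at the first blank line (CRLF CRLF).
head, _, body = raw.partition(b'\r\n\r\n')
print(head.decode('ascii'))          # header block: always ASCII text
print(len(body), 'payload bytes')    # payload: raw bytes, possibly binary

For an HTML page the payload happens to be text too, but for an image the same position after the blank line would hold raw JPEG/PNG bytes, with no extra encoding.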

Seeming contradiction in RFC for HTTP/2 case sensitivity

There is a confusing bit of terminology in the RFC for HTTP/2 that I wish was clearer.
Per the RFC https://www.rfc-editor.org/rfc/rfc7540#section-8.1.2
Just as in HTTP/1.x, header field names are strings of ASCII
characters that are compared in a case-insensitive fashion. However,
header field names MUST be converted to lowercase prior to their
encoding in HTTP/2. A request or response containing uppercase header
field names MUST be treated as malformed
This seems to outline two conflicting ideas:
Header field names are case-insensitive in HTTP/2
If you receive or send fields that are not lowercase, the request/response is malformed.
If a request or response that contains non-lowercase headers is invalid, how can it be considered case-insensitive?
There are two levels of "HTTP": a more abstract, upper, layer with the HTTP semantic (e.g. PUT resource r1), and a lower layer where that semantic is encoded. Think of these two as, respectively, the application layer of HTTP, and the network layer of HTTP.
The application layer can be completely unaware of whether the semantic HTTP request PUT r1 has arrived in HTTP/1.1 or HTTP/2 format.
On the other hand, the same semantic, PUT r1, is encoded differently in HTTP/1.1 (textual) vs HTTP/2 (binary) by the network layer.
The first sentence of the referenced section should be read as referring to the application layer: "as in HTTP/1.1, header names should be compared case-insensitively".
This means that if an application is asked "is header ACCEPT present?", the application should look at the header names in a case-insensitive fashion (or be sure that the implementation provides such a feature), and return true if Accept or accept is present.
The second sentence should be interpreted as referring to the network layer: a compliant HTTP/2 implementation MUST send the headers over the network lowercase, because that is how HTTP/2 encodes header names to be sent over the wire.
Nothing forbids a compliant HTTP/2 implementation from receiving content-length: 128 (lowercase) and then converting this header into Content-Length: 128 when it makes it available to the application - for example, for maximum compatibility with HTTP/1.1, where the header has uppercase first letters (for example, to be printed on screen).
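A toy Python sketch of the two layers (the class and function names are mine, purely for illustration):

# Network layer: HTTP/2 only ever puts lowercase names on the wire.
def encode_header_for_wire(name, value):
    return name.lower().encode('ascii'), value.encode('ascii')

# Application layer: lookups are case-insensitive, per HTTP semantics.
class Headers:
    def __init__(self, pairs):
        # Store names lowercase, as they arrive off an HTTP/2 wire.
        self._pairs = [(name.lower(), value) for name, value in pairs]

    def get(self, name):
        name = name.lower()
        return next((v for n, v in self._pairs if n == name), None)

h = Headers([('content-length', '128')])  # received lowercase
print(h.get('Content-Length'))            # '128': case does not matter here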
It's a mistake, or at least an unnecessarily confusing specification.
Header names must be compared case-insensitive, but that's irrelevant since they are only transmitted or received lowercase.
I.e. in the RFC, "Content-Length" refers to the header "content-length".
RFC 7540 has been obsoleted by RFC 9113, which states simply:
Field names MUST be converted to lowercase when constructing an HTTP/2 message.
https://www.rfc-editor.org/rfc/rfc9113#section-8.2
And the prose now uses the lowercase header names.

Content-Range for resuming a file of unknown length

I create a ZIP archive on-the-fly of unknown length from existing material (using Node), which is already compressed. In the ZIP archive, files just get stored; the ZIP is only used to have a single container. That's why caching the created ZIP files makes no sense - there's no real computation involved.
So far, OK. Now I want to permit resuming downloads, and I'm reading about the Accept-Ranges, Range, and Content-Range HTTP headers. A client with a broken download would ask for an open-ended range, say: Range: bytes=8000000-.
How do I answer that? My answer must include a Content-Range header, and there, according to RFC 2616 § 14.16:
Unlike byte-ranges-specifier values (see section 14.35.1), a byte-range-resp-spec MUST only specify one range, and MUST contain absolute byte positions for both the first and last byte of the range.
So I cannot just send "everything starting from position X"; I must specify the last byte sent too - either by sending only a part of known size, or by calculating the length in advance. Neither idea is convenient in my situation. Is there any other possibility?
Answering myself: it looks like I have to choose between (1) chunked encoding of a file of yet-unknown length, or (2) knowing its Content-Length (or at least the size of the current part), allowing for resuming downloads (as well as for progress bars).
I can live with that - for each of my ZIP files, the length will be the same, so I can store it somewhere and re-use it for subsequent downloads. I'm just surprised the HTTP protocol does not allow for resuming downloads of unknown length.
Respond with a "multipart/byteranges" Content-Type, including Content-Range fields for each part.
Reasoning:
1. When replying to requests with a "Range" header, successful partial responses should report the 206 HTTP status code (section 14.35.1, Byte Ranges).
2. A 206 response suggests either a "Content-Range" header or a "multipart/byteranges" Content-Type (section 10.2.7, 206 Partial Content).
3. A "Content-Range" header cannot be used here, as it does not allow omitting the end position, so the only remaining way is the "multipart/byteranges" Content-Type.

HTTP response with no HTTP header

I have written a mini-minimalist HTTP server prototype (heavily inspired by the Boost Asio examples), and for the moment I haven't put any HTTP header in the server response, only the HTML string content. Surprisingly, it works just fine.
In that question the OP wonders about the necessary fields in the HTTP response, and one of the comments states that they may not be really important from the server side.
I have not yet tried to respond with binary image files or gzip-compressed files, in which cases I suppose it is mandatory to have an HTTP header.
But for text-only responses (HTML, CSS, and XML output), would it be OK never to include the HTTP header in my server responses? What are the possible risks / errors?
At a minimum, you must provide a header with a status line and a date.
As someone who has written many protocol parsers, I am begging you, on my digital metaphoric knees, please oh please oh please don't just totally ignore the specification just because your favorite browser lets you get away with it.
It is perfectly fine to create a program that is minimally functional, as long as the data it produces is correct. This should not be a major burden, since all you have to do is add three lines to the start of your response. And one of those lines is blank! Please take a few minutes to write the two glorious lines of code that will bring your response data into line with the spec.
The headers you really should supply are:
the status line (required)
a date header (required)
content-type (highly recommended)
content-length (highly recommended), unless you're using chunked encoding
if you're returning HTTP/1.1 status lines, and you're not providing a valid content-length or using chunked encoding, then add Connection: close to your headers
the blank line to separate header from body (required)
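Putting that list together, a minimal well-formed response might look like this (the date, length, and body are placeholders):

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 40
Connection: close

<html><body><h1>Hello</h1></body></html>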
You can choose not to send a content-type with the response, but you have to understand that the client might not know what to do with the data. The client has to guess what kind of data it is. A browser might decide to treat it as a downloaded file instead of displaying it. An automated process (someone's bash/curl script) might reasonably decide that the data isn't of the expected type so it should be thrown away.
From the HTTP/1.1 Specification section 3.1.1.5. Content-Type:
A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.

application/x-www-form-urlencoded or multipart/form-data?

In HTTP there are two ways to POST data: application/x-www-form-urlencoded and multipart/form-data. I understand that most browsers are only able to upload files if multipart/form-data is used. Is there any additional guidance for when to use one of the encoding types in an API context (no browser involved)? This might e.g. be based on:
data size
existence of non-ASCII characters
existence of (unencoded) binary data
the need to transfer additional data (like filename)
I basically found no formal guidance on the web regarding the use of the different content-types so far.
TL;DR
Summary: if you have binary (non-alphanumeric) data (or a significantly sized payload) to transmit, use multipart/form-data. Otherwise, use application/x-www-form-urlencoded.
The MIME types you mention are the two Content-Type headers for HTTP POST requests that user-agents (browsers) must support. The purpose of both of those types of requests is to send a list of name/value pairs to the server. Depending on the type and amount of data being transmitted, one of the methods will be more efficient than the other. To understand why, you have to look at what each is doing under the covers.
For application/x-www-form-urlencoded, the body of the HTTP message sent to the server is essentially one giant query string -- name/value pairs are separated by the ampersand (&), and names are separated from values by the equals symbol (=). An example of this would be:
MyVariableOne=ValueOne&MyVariableTwo=ValueTwo
According to the specification:
[Reserved and] non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character
That means that for each non-alphanumeric byte that exists in one of our values, it's going to take three bytes to represent it. For large binary files, tripling the payload is going to be highly inefficient.
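You can see the blow-up directly in Python (a small illustration; the 1000 random bytes are arbitrary):

import os
from urllib.parse import urlencode

payload = os.urandom(1000)               # arbitrary binary data
body = urlencode({'file': payload})
# Most random bytes are not alphanumeric, so each becomes a
# three-character %HH escape and the body is nearly 3x the input.
print(len(payload), 'raw bytes ->', len(body), 'encoded characters')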
That's where multipart/form-data comes in. With this method of transmitting name/value pairs, each pair is represented as a "part" in a MIME message (as described by other answers). Parts are separated by a particular string boundary (chosen specifically so that this boundary string does not occur in any of the "value" payloads). Each part has its own set of MIME headers like Content-Type, and particularly Content-Disposition, which can give each part its "name." The value piece of each name/value pair is the payload of each part of the MIME message. The MIME spec gives us more options when representing the value payload -- we can choose a more efficient encoding of binary data to save bandwidth (e.g. base 64 or even raw binary).
Why not use multipart/form-data all the time? For short alphanumeric values (like most web forms), the overhead of adding all of the MIME headers is going to significantly outweigh any savings from more efficient binary encoding.
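For comparison, the two name/value pairs from the urlencoded example above look like this as multipart/form-data (the boundary AaB03x is just an example value):

Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="MyVariableOne"

ValueOne
--AaB03x
Content-Disposition: form-data; name="MyVariableTwo"

ValueTwo
--AaB03x--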
READ AT LEAST THE FIRST PARA HERE!
I know this is 3 years too late, but Matt's (accepted) answer is incomplete and will eventually get you into trouble. The key here is that, if you choose to use multipart/form-data, the boundary must not appear in the file data that the server eventually receives.
This is not a problem for application/x-www-form-urlencoded, because there is no boundary. x-www-form-urlencoded can also always handle binary data, by the simple expedient of turning one arbitrary byte into three 7BIT bytes. Inefficient, but it works (and note that the comment about not being able to send filenames as well as binary data is incorrect; you just send it as another key/value pair).
The problem with multipart/form-data is that the boundary separator must not be present in the file data (see RFC 2388; section 5.2 also includes a rather lame excuse for not having a proper aggregate MIME type that avoids this problem).
So, at first sight, multipart/form-data is of no value whatsoever in any file upload, binary or otherwise. If you don't choose your boundary correctly, then you will eventually have a problem, whether you're sending plain text or raw binary - the server will find a boundary in the wrong place, and your file will be truncated, or the POST will fail.
The key is to choose an encoding and a boundary such that your selected boundary characters cannot appear in the encoded output. One simple solution is to use base64 (do not use raw binary). In base64 3 arbitrary bytes are encoded into four 7-bit characters, where the output character set is [A-Za-z0-9+/=] (i.e. alphanumerics, '+', '/' or '='). = is a special case, and may only appear at the end of the encoded output, as a single = or a double ==. Now, choose your boundary as a 7-bit ASCII string which cannot appear in base64 output. Many choices you see on the net fail this test - the MDN forms docs, for example, use "blob" as a boundary when sending binary data - not good. However, something like "!blob!" will never appear in base64 output.
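A Python sketch of that recipe (the "!blob!" boundary follows the suggestion above; the field and file names are made up):

import base64
import os

raw = os.urandom(1024)              # arbitrary binary payload
encoded = base64.b64encode(raw)     # output alphabet: [A-Za-z0-9+/=]
boundary = b'!blob!'                # '!' can never occur in base64 output

# Safety check: with this encoding/boundary pair, no collision is possible.
assert boundary not in encoded

body = (b'--' + boundary + b'\r\n' +
        b'Content-Disposition: form-data; name="file"; filename="f.bin"\r\n' +
        b'Content-Transfer-Encoding: base64\r\n' +
        b'\r\n' +
        encoded + b'\r\n' +
        b'--' + boundary + b'--\r\n')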
I don't think HTTP POST is limited to multipart or x-www-form-urlencoded. The Content-Type header is orthogonal to the HTTP POST method (you can use whatever MIME type suits you). This is also the case for typical HTML-representation-based webapps (e.g. JSON payloads have become very popular for transmitting Ajax request payloads).
Regarding RESTful APIs over HTTP, the most popular content types I have come in touch with are application/xml and application/json.
application/xml:
data size: XML is very verbose, but usually not an issue when using compression, and considering that the write-access case (e.g. through POST or PUT) is much rarer than read access (in many cases it is <3% of all traffic). There were rarely cases where I had to optimize write performance.
existence of non-ASCII chars: you can use UTF-8 as the encoding in XML
existence of binary data: you would need to use base64 encoding
filename data: you can encapsulate this inside a field in XML
application/json
data size: more compact than XML, still text, but you can compress
non-ASCII chars: JSON is UTF-8
binary data: base64 (also see json-binary-question)
filename data: encapsulate as its own field inside the JSON
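For instance, a JSON payload carrying a file as base64 might look like this (the field names are made up, and the base64 content is truncated):

{
  "filename": "foo.jpg",
  "content": "/9j/4AAQSkZJRgABAQEASABIAAD..."
}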
binary data as its own resource
I would try to represent binary data as its own asset/resource. It adds another call but decouples things better. Example with images:
POST /images
Content-type: multipart/mixed; boundary="xxxx"
... multipart data
201 Created
Location: http://imageserver.org/../foo.jpg
In later resources you could simply inline the binary resource as link:
<main-resource>
...
<link href="http://imageserver.org/../foo.jpg"/>
</main-resource>
I agree with much that Manuel has said. In fact, his comments refer to this url...
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4
... which states:
The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
However, for me it would come down to tool/framework support.
What tools and frameworks do you expect your API users to be building their apps with?
Do they have frameworks or components they can use that favour one method over the other?
If you get a clear idea of your users, and how they'll make use of your API, then that will help you decide. If you make the upload of files hard for your API users, then they'll move away, or you'll spend a lot of time supporting them.
Secondary to this would be the tool support YOU have for writing your API, and how easy it is for you to accommodate one upload mechanism over the other.
Just a little hint from my side for uploading HTML5 canvas image data:
I am working on a project for a print shop and had some problems uploading images to the server that came from an HTML5 canvas element. I was struggling for at least an hour and could not get the image to save correctly on my server.
Once I set the contentType option of my jQuery ajax call to application/x-www-form-urlencoded, everything went the right way and the base64-encoded data was interpreted correctly and successfully saved as an image.
Maybe that helps someone!
If you need to use Content-Type: application/x-www-form-urlencoded, then DO NOT use FormDataCollection as the parameter: in ASP.NET Core 2+, FormDataCollection has no default constructor, which is required by formatters. Use IFormCollection instead:
public IActionResult Search([FromForm] IFormCollection type)
{
    return Ok();
}
In my case the issue was that the request's Content-Type was application/x-www-form-urlencoded, but the body actually contained JSON. When you access request.data in Django, it cannot properly convert the payload, so access request.body instead.
Refer to this answer for a better understanding:
Exception: You cannot access body after reading from request's data stream
