HTTP byte ranges and multipart/byteranges alternatives? - http

rfc2616 (HTTP/1.1):
A response to a request for a single range MUST NOT be sent using the
multipart/byteranges media type.
A response to a request for multiple ranges, whose result is a single
range, MAY be sent as a multipart/byteranges media type with one part.
A client that cannot decode a multipart/byteranges message MUST NOT
ask for multiple byte-ranges in a single request.
If I understand this correctly, multiple ranges in a single request MAY use multipart/byteranges and clients MUST be able to decode it or shouldn't request it at all.
Does the "MAY" imply that there are also alternatives to multipart/byteranges that could be used? Do any exist? If so, are there headers to request them?
For example, could a server potentially concatenate all byte ranges into a single part response?

If a request asks for multiple ranges and the server can coalesce the requested ranges into a single contiguous range, then the response can either:
use multipart/byteranges with a single MIME part for the concatenated range, where the part has its own Content-Range header.
send the concatenated data by itself and include a top-level Content-Range header.

In my experience (as of 2012), I would recommend sticking to the first rule quoted from the RFC, i.e. "A response to a request for a single range MUST NOT be sent using the multipart/byteranges media type.", because some clients will choke on a multipart response that carries only one part.
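To make the coalescing decision concrete, here is a minimal sketch in Python. The function names (parse_ranges, coalesce, plan_response) are hypothetical, and the parser is deliberately lenient: it simply drops malformed or unsatisfiable items, which a production server would handle more strictly.

```python
def parse_ranges(header, size):
    """Parse 'bytes=a-b,c-d,-n' into inclusive (first, last) pairs."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("unsupported range unit")
    ranges = []
    for item in spec.split(","):
        first, _, last = item.strip().partition("-")
        if first == "":
            # Suffix form "-n": the last n bytes of the body.
            start, end = max(size - int(last), 0), size - 1
        else:
            start = int(first)
            end = min(int(last), size - 1) if last else size - 1
        if start <= end:
            ranges.append((start, end))  # malformed or unsatisfiable items are dropped
    return ranges


def coalesce(ranges):
    """Merge overlapping or directly adjacent ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def plan_response(header, size):
    """Decide between a plain 206, a multipart 206, and a 416."""
    merged = coalesce(parse_ranges(header, size))
    if not merged:
        return "416", {"Content-Range": f"bytes */{size}"}
    if len(merged) == 1:
        first, last = merged[0]
        # One contiguous range: send the data by itself, no multipart wrapper.
        return "206-single", {
            "Content-Range": f"bytes {first}-{last}/{size}",
            "Content-Length": str(last - first + 1),
        }
    return "206-multipart", merged
```

With this sketch, a request for bytes=0-99,100-199 collapses into one range and gets a top-level Content-Range header, while bytes=0-9,50-59 still needs multipart/byteranges.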

Related

Is splitting http request header meaningful in golang http package?

Most servers limit the length of an HTTP request header (typically 4k to 8k).
Usually we split long headers into several parts.
As far as I remember, the golang http package combines headers with the same key into one giant header. Is this correct?
For example, suppose I have a token whose length exceeds the 8k limit. I'd like to split it into several parts, each sent under the same header key, Authorization.
Then I would send the request using the http package.
Does this split make sense or not?
Hmm, I'm not sure that's quite valid. The Header type is actually a map of string keys pointing to string slices.
https://golang.org/pkg/net/http/#Header
As such, if you try to Set the same key again, the previous value will be overwritten as per standard golang map functionality; Add, by contrast, appends another value to the slice stored under that key.

Generating a multipart/byterange response without scanning the parts ahead of sending

I would like to generate a multipart byte range response. Is there a way for me to do it without scanning each segment I am about to send out, since I need to generate multipart boundary strings?
For example, a user can request a byte range that would have me fetch and scan 2GB of data, which in my case involves loading that data into my (slow) VM as strings and so forth. Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option? I see that many developers just grab a UUID as the boundary and are probably willing to risk the tiny probability that it will appear somewhere within the part; that risk seems to be small enough that multiple people are taking it.
To explain in more detail: scanning the parts ahead of time (before generating the response) is not really feasible in my case since I need to fetch them via HTTP from an upstream service. This means that I effectively have to prefetch the entire part first to compute a non-matching multipart boundary, and only then can I splice that part into the response.
Assuming the data can be arbitrary, I don’t see how you could guarantee absence of collisions without scanning the data.
If the format of the data is very limited (like... base 64 encoded?), you may be able to pick a boundary that is known to be an illegal sequence of bytes in that format.
Even if your boundary does collide with the data, it must be followed by headers such as Content-Range, which is even more improbable, so the client is likely to treat it as an error rather than consume the wrong data.
Major Web servers use very simple strategies. Apache grabs 8 random bytes at startup and renders them in hexadecimal. nginx uses a sequential counter left-padded with zeroes.
UUIDs are designed to avoid collisions with other UUIDs, not with arbitrary data. A UUID is no more likely to be a good boundary than a completely random string of the same length. Moreover, some UUID variants include information that you may not want to disclose, such as your machine’s MAC address.
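As a rough illustration of the random-string approach, the sketch below generates a boundary from 8 random bytes rendered in hexadecimal, similar in spirit to what is described for Apache above. The helper names are hypothetical, and the collision estimate rests on a strong assumption: that the payload bytes are uniformly random, which real data usually is not.

```python
import secrets


def make_boundary(num_bytes=8):
    """Return a random boundary: num_bytes random bytes as 2*num_bytes hex characters."""
    return secrets.token_hex(num_bytes)


def collision_upper_bound(payload_len, boundary_chars=16):
    """Union bound on the chance that the boundary text appears in the payload.

    Assumes uniformly random payload bytes (an assumption, not a guarantee):
    each position matches a fixed string of boundary_chars bytes with
    probability 256 ** -boundary_chars.
    """
    return payload_len * 256.0 ** -boundary_chars


boundary = make_boundary()
bound_for_2_gib = collision_upper_bound(2 * 1024 ** 3)
```

Under that assumption the bound for a 2 GiB payload is vanishingly small, but for structured data (for example, a payload that itself contains hex dumps) the real risk depends entirely on the data.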
Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option?
Maybe you can avoid supporting multiple ranges and simply tell the clients to request each range separately. In that case, you don’t use the multipart format, so there is no problem.
If you do want to send multiple ranges in one response, then RFC 7233 requires the multipart format, which requires the boundary string.
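Note that the multipart format does not force you to buffer the parts: once a boundary is chosen, the total Content-Length follows from the range sizes and the part headers alone, and the part bodies can be streamed through unread. A minimal sketch, where fetch is a hypothetical callback that yields the bytes of one range from the upstream service, and the boundary is assumed to have been picked already:

```python
def part_header(boundary, content_type, first, last, total):
    """Delimiter line plus headers that precede one part's data."""
    return (
        f"\r\n--{boundary}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Range: bytes {first}-{last}/{total}\r\n"
        "\r\n"
    ).encode("ascii")


def closing_delimiter(boundary):
    """Final delimiter that terminates the multipart body."""
    return f"\r\n--{boundary}--\r\n".encode("ascii")


def multipart_length(boundary, content_type, ranges, total):
    """Content-Length of the whole body, computed without touching the data."""
    length = len(closing_delimiter(boundary))
    for first, last in ranges:
        length += len(part_header(boundary, content_type, first, last, total))
        length += last - first + 1
    return length


def stream_multipart(boundary, content_type, ranges, total, fetch):
    """Yield the body piece by piece; fetch(first, last) yields byte chunks."""
    for first, last in ranges:
        yield part_header(boundary, content_type, first, last, total)
        yield from fetch(first, last)  # passed through without scanning
    yield closing_delimiter(boundary)
```

The response itself would carry Content-Type: multipart/byteranges; boundary=... with the same boundary string. This does not remove the collision risk discussed above; it only shows that the scanning, not the format, is the expensive part.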
You can, of course, invent your own mechanism instead of that of RFC 7233. In that case:
You cannot use 206 (Partial Content). You must use 200 (OK) or some other applicable status code.
You cannot use the multipart/byteranges media type. You must come up with your own media type.
You cannot use the Range request header.
Because a 200 (OK) response to a GET request is supposed to carry a (full) representation of the resource, you must do one of the following:
encode the requested ranges in the URL; or
use something like POST instead of GET; or
use a custom, non-standard status code instead of 200 (OK); or
(not sure if this is a correct approach) use media type parameters, send them in Accept, and add Accept to Vary.
The chunked transfer coding may be useful, but you cannot rely on it alone, because it is a property of the connection, not of the payload.

Are there ascii characters that can not be sent across the internet?

I am building a web application that will send a set of flag states to its server by converting the flags into binary, then converting the binary into ASCII characters. The ASCII characters will be sent using a POST request, then a response (encoded the same way) will be sent back. I would like to know if there are ASCII characters that can cause the HTTP requests and data transfer to break down or get misdirected. Are there standard ASCII characters (0-127) that need to be avoided?
Despite its name, HTTP is agnostic to the format and semantics of the entity-body content. It doesn't need to be text. Describing it as text and giving a character encoding is metadata for the sending and receiving applications. Your actual entity data isn't text, but you've added a layer of re-interpretation, so you could provide that metadata.
HTTP bodies are of two types: counted or chunked. For counted, the message-body is the same as the entity-body. Counted is used unless you want to start streaming data before knowing its entire length. Just send the Content-Length header with the number of octets in the entity-body and copy it into the output stream.
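As a sketch of the counted case, the flags can be packed straight into octets and sent with a Content-Length header, with no ASCII detour at all. The helper names, the bit order (first flag in the most significant bit) and the application/octet-stream media type are assumptions made for illustration.

```python
def pack_flags(flags):
    """Pack booleans into bytes, 8 per byte, first flag in the most significant bit."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_flags(data, count):
    """Inverse of pack_flags; count says how many flags were packed."""
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(count)]


def build_post(path, host, body):
    """Build a counted HTTP/1.1 request: headers, blank line, raw octets."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

Because the receiver reads exactly Content-Length octets, any byte value from 0 to 255 can appear in the body.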

Content-Range for resuming a file of unknown length

I create a ZIP archive on-the-fly of unknown length from existing material (using Node), which is already compressed. In the ZIP archive, files just get stored; the ZIP is only used to have a single container. That's why caching the created ZIP files makes no sense: there's no real computation involved.
So far, OK. Now I want to permit resuming downloads, and I'm reading about the Accept-Ranges, Range and Content-Range HTTP headers. A client with a broken download would ask for an open-ended range, say: Range: bytes=8000000-.
How do I answer that? My answer must include a Content-Range header, and there, according to RFC 2616 § 14.16 :
Unlike byte-ranges-specifier values (see section 14.35.1), a byte-range-resp-spec MUST only specify one range, and MUST contain absolute byte positions for both the first and last byte of the range.
So I cannot just send "everything starting from position X"; I must specify the last byte sent, too, either by sending only a part of known size, or by calculating the length in advance. Neither idea is convenient in my situation. Is there any other possibility?
Answering myself: Looks like I have to choose between (1) chunked-encoding of a file of yet unknown length, or (2) knowing its Content-Length (or at least the size of the current part), allowing for resuming downloads (as well as for progress bars).
I can live with that - for each of my ZIP files, the length will be the same, so I can store it somewhere and re-use it for subsequent downloads. I'm just surprised the HTTP protocol does not allow for resuming downloads of unknown length.
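Once the total length is stored, answering an open-ended range is mostly arithmetic. A minimal sketch, assuming the stored length is passed in as total; the function name is hypothetical, and only a single range per request is handled (anything else falls back to a full 200 response):

```python
def resume_headers(range_header, total):
    """Return (status, headers) for a single-range request against a body of known length."""
    unit, _, spec = range_header.partition("=")
    first_text, _, last_text = spec.strip().partition("-")
    full = (200, {"Accept-Ranges": "bytes", "Content-Length": str(total)})
    if unit.strip() != "bytes" or "," in spec or not first_text:
        return full  # unsupported form: ignore Range and send everything
    first = int(first_text)
    if first >= total:
        return 416, {"Content-Range": f"bytes */{total}"}
    # An open-ended range ("8000000-") runs to the last byte of the file.
    last = min(int(last_text), total - 1) if last_text else total - 1
    if last < first:
        return full  # invalid range: ignore it
    return 206, {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {first}-{last}/{total}",
        "Content-Length": str(last - first + 1),
    }
```

For the request above and a 10,000,000-byte archive, this yields Content-Range: bytes 8000000-9999999/10000000.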
Respond with a "multipart/byteranges" Content-Type, including a Content-Range field for each part.
Reasoning:
When replying to a request with a "Range" header, a successful partial response should report the 206 HTTP status code (section 14.35.1, Byte Ranges).
A 206 response requires either a "Content-Range" header or a "multipart/byteranges" Content-Type (section 10.2.7, 206 Partial Content).
A "Content-Range" header cannot be added to the response because it does not allow omitting the end position, so the only remaining option is to use the "multipart/byteranges" Content-Type.

How to determine valid Range Header w.r.t HTTP Entity?

As per HTTP/1.1 spec for Range header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35), it is stated that
Byte range specifications in HTTP apply to the sequence of bytes in the entity-body (not necessarily the same as the message-body).
My question is this: suppose I am requesting a binary file of size 1GB that consists of multiple encrypted blocks of 128MB each. Since an HTTP byte range applies to the HTTP entity-body rather than to the file as such, how can I download these blocks in parallel from the server without breaking the block boundaries? Please note that I don't want to reassemble the file; I want to process the blocks separately to decrypt them. Which Range header would be most suitable, and how do I derive the correct value to send in that Range header?
Thanks,
The Range header applies not to the full HTTP entity but only to the entity-body of that entity. The HTTP Message RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html) says
The message-body (if any) of an HTTP message is used to carry the entity-body associated with the request or response. The message-body differs from the entity-body only when a transfer-coding has been applied, as indicated by the Transfer-Encoding header field (section 14.41).
Another good reference to read is http://www.ietf.org/rfc/rfc3229.txt (section 4 - The HTTP message-generation sequence), which explains how the HTTP response is generated. Conceptually, when a Range header and a transfer encoding are both requested, the Range is applied first when generating the response, and the transfer encoding is applied afterwards. I think most HTTP servers should conform to this, so we can apply the Range header with respect to the content length of the message.
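Under that reading, and assuming the server applies no content-coding so that entity-body offsets equal file offsets, the Range values for the blocks can be derived from the total size and the block size alone. A minimal sketch with a hypothetical helper name, taking 1GB and 128MB as binary units:

```python
def block_range_headers(total_size, block_size):
    """Return one Range header value per block; byte positions are inclusive."""
    headers = []
    for first in range(0, total_size, block_size):
        # The last block may be shorter if total_size is not a multiple of block_size.
        last = min(first + block_size, total_size) - 1
        headers.append(f"bytes={first}-{last}")
    return headers


MIB = 1024 * 1024
ranges = block_range_headers(1024 * MIB, 128 * MIB)
```

Each value can be sent in its own request, so the blocks arrive independently and can be decrypted without reassembling the file. The total size can be taken from the Content-Length of a HEAD response, provided the server reports one.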
