The HTTP/1.1 spec for the Range header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35) states:
Byte range specifications in HTTP apply to the sequence of bytes in the entity-body (not necessarily the same as the message-body).
My question: suppose I am downloading a 1GB binary file that consists of multiple encrypted blocks of 128MB each. Since HTTP byte ranges refer to the HTTP entity rather than to the file itself, how can I download these chunks in parallel from the server without breaking the block boundaries? Please note that I don't want to reassemble the file; I want to process each block separately to decrypt it. Which Range header would be most suitable, and how do I derive the correct value to send in it?
Thanks,
The Range header applies not to the full HTTP entity but only to the entity-body of that HTTP entity. The HTTP message section of the RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html) says:
The message-body (if any) of an HTTP message is used to carry the entity-body associated with the request or response. The message-body differs from the entity-body only when a transfer-coding has been applied, as indicated by the Transfer-Encoding header field (section 14.41).
Another good reference is http://www.ietf.org/rfc/rfc3229.txt (section 4, "The HTTP message-generation sequence"), which explains how an HTTP response is generated. Conceptually, when both a Range header and a transfer-coding are in play, the range is applied first when generating the response, and the transfer-coding is applied afterwards. I think most HTTP servers conform to this, so we can apply the Range header with respect to the message content length.
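A minimal sketch of deriving the Range values for the 128MB blocks, assuming the server sends the file as-is (no Content-Encoding such as gzip, so the representation bytes are exactly the file bytes) and that the blocks are contiguous, fixed-size, and start at offset 0. `block_ranges` is a hypothetical helper; each value would go into the Range header of a separate request, and those requests can be issued in parallel.

```python
from typing import List

def block_ranges(file_size: int, block_size: int) -> List[str]:
    """Return one Range header value per fixed-size block of a file."""
    if file_size <= 0 or block_size <= 0:
        raise ValueError("file_size and block_size must be positive")
    values = []
    for first in range(0, file_size, block_size):
        # Positions are zero-based and inclusive; the final block may be shorter.
        last = min(first + block_size, file_size) - 1
        values.append(f"bytes={first}-{last}")
    return values
```

For a 1 GiB file with 128 MiB blocks this yields eight values, from `bytes=0-134217727` to `bytes=939524096-1073741823`. Because byte positions are inclusive, each block's last position is one less than the next block's first.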
I have a case where I am uploading a file. I know what its final size will be once the upload is complete, and I provide this information when someone requests the file while the upload is still in progress.
My problem is how to handle the resource size for range requests vs. non-range requests.
In the case of a non-range request, I always set Content-Length to the expected/final size of the resource and let the server tail the file.
In the case of a range request, I set */{complete-length} to the current size of the resource, and the server returns the current payload without tailing the file.
What I'm unsure of is whether this is spec-compliant.
For non-range requests, the spec says:
the Content-Length indicates the size of the selected representation
For range requests, the spec says:
The complete-length in a 416 response indicates the current length of the selected representation.
So the question is whether the spec means that these two are the same or not. My current implementation assumes they are not the same, i.e.:
Content-Length is the final/expected size of the resource
complete-length is the current size of the resource
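For concreteness, the scheme described above can be restated as code. This only mirrors the question's design and says nothing about whether it is spec-compliant. `resource_headers` is a hypothetical helper, and reporting the current size in the 416 case is an assumption based on the quoted text about the complete-length in a 416 response.

```python
from typing import Dict, Optional, Tuple

def resource_headers(final_size: int, current_size: int,
                     byte_range: Optional[Tuple[int, int]] = None) -> Tuple[int, Dict[str, str]]:
    """Status and size headers under the scheme described in the question."""
    if byte_range is None:
        # Non-range request: advertise the expected final size and tail the file.
        return 200, {"Content-Length": str(final_size)}
    first, last = byte_range
    if first >= current_size:
        # Not satisfiable against what exists now: report the current length.
        return 416, {"Content-Range": f"bytes */{current_size}"}
    # Range request: complete-length is the current size; clip to the bytes present.
    last = min(last, current_size - 1)
    return 206, {"Content-Range": f"bytes {first}-{last}/{current_size}",
                 "Content-Length": str(last - first + 1)}
```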
What does Content-Range look like if I request a range and the size is unknown?
For example, my request is "bytes=100-200" and the stream will end at 150, but I do not know that before I start streaming. What should I send as the Content-Range header?
bytes 100-/*
bytes 100-200/*
bytes 100-*/*
Or is this not a legal situation at all?
Same question if the request is open-ended: "bytes=100-"
If you request a range that is satisfiable, the server should respond with a 206 (partial content) response. See RFC7233, sec. 4.1.
If the first byte position of the requested range is at or beyond the length of the resource, the server should respond with a 416 (range not satisfiable). See section 4.4. A last byte position beyond the end of the resource is not an error: the range is simply truncated to the end of the representation (sec. 2.1).
To skip the first 100 bytes of the content, you are indeed right in that the request should contain a Range: bytes=100- header. See sec. 2.1 and sec. 3.1.
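A sketch of how a server that does know the total length might resolve a single byte-range spec, following the RFC 7233 sec. 2.1 rules: an open-ended range runs to the end, a last byte position past the end is clipped, `bytes=-N` selects the last N bytes, and a first byte position at or beyond the length is unsatisfiable. `resolve_range` is a hypothetical helper; `None` stands for "respond with 416", a `ValueError` stands for a header the server would ignore, and multiple ranges are not handled.

```python
import re
from typing import Optional, Tuple

_SINGLE_RANGE = re.compile(r"bytes=(\d*)-(\d*)")

def resolve_range(header: str, length: int) -> Optional[Tuple[int, int]]:
    """Resolve one byte-range spec to inclusive (first, last), or None for a 416."""
    match = _SINGLE_RANGE.fullmatch(header.strip())
    if match is None or match.groups() == ("", ""):
        raise ValueError(f"unsupported Range header: {header!r}")
    first_s, last_s = match.groups()
    if first_s == "":
        # Suffix range "bytes=-N": the last N bytes (the whole body if it is shorter).
        suffix = int(last_s)
        if suffix == 0 or length == 0:
            return None
        return max(length - suffix, 0), length - 1
    first = int(first_s)
    if last_s and int(last_s) < first:
        raise ValueError("last-byte-pos is before first-byte-pos")
    if first >= length:
        return None  # starts at or past the end: not satisfiable
    # Open-ended, or ending past the last byte: clip to the end of the representation.
    last = length - 1 if last_s == "" else min(int(last_s), length - 1)
    return first, last
```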
As for a resource of unknown length that is read in a way that yields content chunks of unpredictable size: this is undefined behaviour not sanctioned by any RFC. The Content-Range header can express either an unknown range (only in a 416, as bytes */length) or an unknown total size (as bytes first-last/*), but not both. You cannot fall back on the HTTP envelope to convey the range length, because a server must provide a Content-Range header when responding with a 206 code (cf. sec. 4.1).
The correct way of handling the situation would be:
Validating the range request
Attempting to read a sufficient number of bytes from the requested resource
If enough bytes could be retrieved, creating the HTTP envelope, specifying the range, and attaching the body, cutting it off if needed
In any other case, responding with a 416
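The steps above can be sketched for a single closed range such as the question's "bytes=100-200". The sketch reads "a sufficient number of bytes" as "at least one byte of the requested range", which is an interpretation: if the stream ends early, the bytes that were read are sent, and because the end has now been reached, the complete length is known and can be reported. `serve_range` is a hypothetical helper; it buffers the skipped prefix in memory, assumes `read(n)` returns fewer than n bytes only at end of stream, and does not handle open-ended ranges.

```python
import io
from typing import BinaryIO, Dict, Tuple

def serve_range(stream: BinaryIO, first: int, last: int) -> Tuple[int, Dict[str, str], bytes]:
    """Answer "Range: bytes=first-last" from a stream whose total length is unknown."""
    # 1. Validate the range request itself.
    if first < 0 or last < first:
        raise ValueError("invalid byte range")
    # 2. Skip to the first requested byte, then read at most the requested amount.
    #    (Assumes read(n) returns fewer than n bytes only at end of stream.)
    skipped = stream.read(first)
    wanted = last - first + 1
    body = stream.read(wanted) if len(skipped) == first else b""
    if not body:
        # The stream ended at or before `first`, so its length is now known.
        return 416, {"Content-Range": f"bytes */{len(skipped)}"}, b""
    # 3. Describe exactly the bytes being sent; reading at most `wanted` is the cut-off.
    end = first + len(body) - 1
    # A short read means the stream ended, so the complete length became known.
    complete = str(end + 1) if len(body) < wanted else "*"
    headers = {"Content-Range": f"bytes {first}-{end}/{complete}",
               "Content-Length": str(len(body))}
    return 206, headers, body

if __name__ == "__main__":
    # The example from the question: "bytes=100-200" on a stream that ends at 150.
    status, headers, _ = serve_range(io.BytesIO(bytes(150)), 100, 200)
    print(status, headers["Content-Range"])  # prints: 206 bytes 100-149/150
```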
RFC 2616 (HTTP/1.1):
A response to a request for a single range MUST NOT be sent using the
multipart/byteranges media type.
A response to a request for multiple ranges, whose result is a single
range, MAY be sent as a multipart/byteranges media type with one part.
A client that cannot decode a multipart/byteranges message MUST NOT
ask for multiple byte-ranges in a single request.
If I understand this correctly, a response to a request for multiple ranges MAY use multipart/byteranges, and clients MUST be able to decode it or must not request multiple ranges at all.
Does the "MAY" imply that there are also alternatives to multipart/byteranges that could be used? Do any exist? If so, are there headers to request them?
For example, could a server concatenate all the byte ranges into a single-part response?
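On the client side, decoding multipart/byteranges mainly means splitting the body on the boundary and reading each part's Content-Range. A minimal sketch, assuming the boundary has already been taken from the response's Content-Type parameter and, as boundaries are chosen to be, never occurs inside the data. `parse_byteranges` is a hypothetical helper, not a library API; a production client would more likely use a full MIME parser.

```python
from typing import List, Optional, Tuple

Part = Tuple[int, int, Optional[int], bytes]  # (first, last, complete_length, data)

def parse_byteranges(body: bytes, boundary: str) -> List[Part]:
    """Split a multipart/byteranges body into its ranges."""
    delimiter = b"--" + boundary.encode("ascii")
    parts: List[Part] = []
    # Anything before the first delimiter is a preamble and is ignored.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # "--boundary--" closes the multipart body
        # Skip the rest of the delimiter line, then split headers from data.
        _, _, rest = chunk.partition(b"\r\n")
        head, _, data = rest.partition(b"\r\n\r\n")
        if data.endswith(b"\r\n"):
            data = data[:-2]  # this CRLF belongs to the next delimiter
        content_range = None
        for line in head.split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-range":
                content_range = value.strip().decode("ascii")
        if content_range is None:
            raise ValueError("part without Content-Range")
        # e.g. "bytes 500-999/8000" or "bytes 500-999/*"
        unit, _, spec = content_range.partition(" ")
        if unit != "bytes":
            raise ValueError(f"unexpected range unit: {unit}")
        span, _, complete = spec.partition("/")
        first, _, last = span.partition("-")
        total = None if complete == "*" else int(complete)
        parts.append((int(first), int(last), total, data))
    return parts
```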
If a request asks for multiple ranges and the server can concatenate the requested ranges into a single continuous range, then the response can either:
use multipart/byteranges with a single MIME part for the concatenated range, where the part has its own Content-Range header.
send the concatenated data by itself and include a top-level Content-Range header.
Based on my experience back in 2012, I would recommend sticking to the first rule quoted above, i.e. "A response to a request for a single range MUST NOT be sent using the multipart/byteranges media type", because some clients will choke.
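To make the two response shapes concrete, here is a sketch that merges overlapping or adjacent ranges and then picks a shape: a single continuous range goes out as a plain body with a top-level Content-Range, while several ranges become a multipart/byteranges body with one Content-Range per part. `coalesce` and `byteranges_response` are hypothetical helpers; the boundary string is arbitrary, the ranges are assumed to be validated already, and sorting them is a simplification.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) byte positions

def coalesce(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or adjacent ranges into continuous ones (sorted for simplicity)."""
    merged: List[Range] = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged

def byteranges_response(data: bytes, ranges: List[Range],
                        content_type: str = "application/octet-stream",
                        boundary: str = "THIS_STRING_SEPARATES") -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a 206 response; ranges must already be validated."""
    total = len(data)
    merged = coalesce(ranges)
    if len(merged) == 1:
        # One continuous range: plain body with a top-level Content-Range.
        first, last = merged[0]
        return {"Content-Type": content_type,
                "Content-Range": f"bytes {first}-{last}/{total}",
                "Content-Length": str(last - first + 1)}, data[first:last + 1]
    # Several ranges: one MIME part per range, each with its own Content-Range.
    body = b""
    for first, last in merged:
        part_head = (f"--{boundary}\r\n"
                     f"Content-Type: {content_type}\r\n"
                     f"Content-Range: bytes {first}-{last}/{total}\r\n\r\n")
        body += part_head.encode("ascii") + data[first:last + 1] + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return {"Content-Type": f"multipart/byteranges; boundary={boundary}",
            "Content-Length": str(len(body))}, body
```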
I create a ZIP archive of unknown length on the fly (using Node) from existing material that is already compressed. In the ZIP archive, files just get stored; the ZIP is only used to have a single container. That's why caching the created ZIP files makes no sense: there's no real computation involved.
So far, so good. Now I want to support resuming downloads, and I'm reading about the Accept-Ranges, Range, and Content-Range HTTP headers. A client with a broken download would ask for an open-ended range, say: Range: bytes=8000000-.
How do I answer that? My answer must include a Content-Range header, and for that header, according to RFC 2616 § 14.16:
Unlike byte-ranges-specifier values (see section 14.35.1), a byte-range-resp-spec MUST only specify one range, and MUST contain absolute byte positions for both the first and last byte of the range.
So I cannot just send "everything starting from position X"; I must specify the last byte sent, too, either by sending only a part of known size or by calculating the length in advance. Neither option suits my situation. Is there any other possibility?
Answering myself: it looks like I have to choose between (1) chunked encoding of a file of as-yet unknown length, or (2) knowing its Content-Length (or at least the size of the current part), which allows resuming downloads (as well as progress bars).
I can live with that: each of my ZIP files will always have the same length, so I can store it somewhere and reuse it for subsequent downloads. I'm just surprised that the HTTP protocol does not allow resuming downloads of unknown length.
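Because the files are only stored, the archive length can in principle also be computed up front from the entry names and sizes. A sketch under strong assumptions: no ZIP64, no extra fields, no comments, and no data descriptors. Node archivers may add extra fields (such as timestamps) or data descriptors when streaming, each of which changes the per-entry overhead, so check against what your library actually writes. `stored_zip_size`, `actual_stored_zip_size`, and `resume_headers` are hypothetical helpers; the comparison uses Python's zipfile, which for small stored entries written to an in-memory buffer should produce exactly these records.

```python
import io
import zipfile
from typing import Dict, Iterable, Tuple

# Fixed record sizes in a ZIP without ZIP64, extra fields, comments or data descriptors.
LOCAL_FILE_HEADER = 30
CENTRAL_DIRECTORY_HEADER = 46
END_OF_CENTRAL_DIRECTORY = 22

def stored_zip_size(entries: Iterable[Tuple[str, int]]) -> int:
    """Predict the byte length of a STORED (uncompressed) ZIP from (name, size) pairs."""
    total = END_OF_CENTRAL_DIRECTORY
    for name, size in entries:
        name_len = len(name.encode("utf-8"))
        # Local header + file name + raw file data ...
        total += LOCAL_FILE_HEADER + name_len + size
        # ... plus one central directory record that repeats the name.
        total += CENTRAL_DIRECTORY_HEADER + name_len
    return total

def actual_stored_zip_size(entries: Iterable[Tuple[str, int]]) -> int:
    """Build the same archive in memory with Python's zipfile, for comparison."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, size in entries:
            archive.writestr(name, b"\0" * size)
    return len(buffer.getvalue())

def resume_headers(first: int, total: int) -> Dict[str, str]:
    """206 headers answering an open-ended "Range: bytes=first-" once total is known."""
    if not 0 <= first < total:
        raise ValueError("range not satisfiable; answer with 416 instead")
    return {"Content-Range": f"bytes {first}-{total - 1}/{total}",
            "Content-Length": str(total - first)}
```

With the length known, the request from the question, Range: bytes=8000000-, can be answered with absolute first and last byte positions as RFC 2616 requires.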
Respond with a "multipart/byteranges" Content-Type, including Content-Range fields for each part.
Reasoning:
When replying to requests with a "Range" header, successful partial responses should report the 206 HTTP status code (section 14.35.1, Byte Ranges)
A 206 response must include either a "Content-Range" header or a "multipart/byteranges" Content-Type (section 10.2.7, 206 Partial Content)
A "Content-Range" header cannot be added to the response because it does not allow omitting the end position, so the only remaining option is the "multipart/byteranges" Content-Type
I have written a minimalist HTTP server prototype (heavily inspired by the Boost.Asio examples), and for the moment I haven't put any HTTP header in the server response, only the HTML string content. Surprisingly, it works just fine.
In that question the OP wonders about the necessary fields in the HTTP response, and one of the comments states that they may not be really important from the server side.
I haven't yet tried serving binary image files or gzip-compressed files, in which case I suppose an HTTP header is mandatory.
But for text-only responses (HTML, CSS, and XML output), would it be OK never to include the HTTP header in my server responses? What are the possible risks or errors?
At a minimum, you must provide a header with a status line and a date.
As someone who has written many protocol parsers, I am begging you, on my digital metaphoric knees, please oh please oh please don't just totally ignore the specification just because your favorite browser lets you get away with it.
It is perfectly fine to create a program that is minimally functional, as long as the data it produces is correct. This should not be a major burden, since all you have to do is add three lines to the start of your response. And one of those lines is blank! Please take a few minutes to write the two glorious lines of code that will bring your response data into line with the spec.
The headers you really should supply are:
the status line (required)
a date header (required)
content-type (highly recommended)
content-length (highly recommended), unless you're using chunked encoding
if you're returning HTTP/1.1 status lines, and you're not providing a valid content-length or using chunked encoding, then add Connection: close to your headers
the blank line to separate header from body (required)
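A minimal sketch of those header lines, assuming the complete body is already in memory (so Content-Length is known and Connection: close is not needed). `minimal_response` is a hypothetical helper; `email.utils.formatdate(usegmt=True)` from the standard library produces the GMT date format HTTP uses.

```python
from email.utils import formatdate

def minimal_response(body: bytes, content_type: str = "text/html; charset=utf-8",
                     status: str = "200 OK") -> bytes:
    """Serialize an HTTP/1.1 response carrying the headers listed above."""
    header_lines = [
        f"HTTP/1.1 {status}",                # status line (required)
        f"Date: {formatdate(usegmt=True)}",  # e.g. "Sun, 06 Nov 1994 08:49:37 GMT"
        f"Content-Type: {content_type}",     # tells the client how to interpret the body
        f"Content-Length: {len(body)}",      # tells the client where the body ends
    ]
    # Each header line ends with CRLF, and an empty line separates headers from body.
    return ("\r\n".join(header_lines) + "\r\n\r\n").encode("ascii") + body
```

For example, `minimal_response(b"<p>hi</p>")` starts with `HTTP/1.1 200 OK` and carries `Content-Length: 9`.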
You can choose not to send a content-type with the response, but you have to understand that the client might not know what to do with the data. The client has to guess what kind of data it is. A browser might decide to treat it as a downloaded file instead of displaying it. An automated process (someone's bash/curl script) might reasonably decide that the data isn't of the expected type so it should be thrown away.
From the HTTP/1.1 specification (RFC 7231, section 3.1.1.5, Content-Type):
A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.