I have a case where a file is being uploaded and I already know the size it will have once the upload is complete. I provide this size when someone requests the file while the upload is still in progress.
My problem is how to report the resource size for range requests versus non-range requests.
For a non-range request, I always set Content-Length to the expected (final) size of the resource and let the server tail the file.
For a range request, I set the complete-length in Content-Range (the */{complete-length} part) to the current size of the resource, and the server returns the current payload without tailing the file.
What I'm unsure of is whether this is spec-compliant.
For non-range requests the spec says:
the Content-Length indicates the size of the selected representation
For range requests, on the other hand, the spec says:
The complete-length in a 416 response indicates the current length of the selected representation.
So the question is whether the spec means these two to be the same or not. My current implementation assumes they are not the same, i.e.:
Content-Length is the final/expected size of the resource
complete-length is the current size of the resource
Related
I have read many questions about the URL length limit in HTTP, but I still cannot find an answer to how many parameters HTTP supports at most.
What is the maximum number of parameters supported in HTTP? By parameters I mean:
https://www.google.com/search?q=cookies&ie=utf-8&oe=utf-8
Here there are 3 parameters:
q, ie and oe, with their corresponding values.
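To make "parameters" concrete, here is a minimal sketch that counts them in the example URL using Python's standard urllib.parse module. The helper name query_parameters is just an illustration, and the parser follows the common name=value&name=value convention rather than any rule from the URI spec.

```python
from urllib.parse import urlsplit, parse_qsl

def query_parameters(url):
    """Return the (name, value) pairs found in the query component of url."""
    # keep_blank_values=True so that "a=&b=1" still counts as two parameters
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)

params = query_parameters("https://www.google.com/search?q=cookies&ie=utf-8&oe=utf-8")
print(len(params), [name for name, _ in params])  # 3 ['q', 'ie', 'oe']
```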
The query string is governed by RFC 3986, section 3.4, which does not specify any limit apart from the set of allowed characters. You will also struggle to find any limitation on the logical number of parameters, since there has never been a real specification of the format; what you find in there is rather a best practice that has been heavily influenced by what CGI does. So the number of parameters is very much bound by what the client or server is willing to transfer/accept (the lower bound wins, obviously). Per this answer, you can find a rough estimate here.
There is no limit on the number of parameters; what matters is the data size, i.e. how many KB you send with your GET request. This limit is configurable on the web server side (Apache, Tomcat, etc.).
In Apache, the default limit for the length of the request line is 8190 bytes, and this value can be increased or decreased.
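As a rough illustration of why size matters rather than parameter count, the sketch below builds a request line and compares its length with the 8190-byte default quoted above. The function names are hypothetical, and exactly how a given server counts the bytes may differ.

```python
APACHE_DEFAULT_LIMIT = 8190  # default quoted above; configurable on the server

def request_line(method, target, version="HTTP/1.1"):
    """Build the request line, e.g. 'GET /search?q=x HTTP/1.1'."""
    return f"{method} {target} {version}"

def fits_request_line_limit(method, target, limit=APACHE_DEFAULT_LIMIT):
    """True if the encoded request line is no longer than limit bytes."""
    return len(request_line(method, target).encode("ascii")) <= limit

short_target = "/search?q=cookies&ie=utf-8&oe=utf-8"
# 2000 tiny parameters: the count is not the problem, the total length is
long_target = "/search?" + "&".join(f"p{i}=v" for i in range(2000))

print(fits_request_line_limit("GET", short_target))  # True
print(fits_request_line_limit("GET", long_target))   # False
```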
What does Content-Range look like if I am requesting some range and the size is unknown?
For example my request is "bytes=100-200" and the stream will end at 150. But I do not know it before I start to stream. What should I send as Content-Range header?
bytes 100-/*
bytes 100-200/*
bytes 100-*/*
Or is this not a legal situation at all?
Same question if the request is open ended: "bytes=100-"
If you request a range that is satisfiable, the server should respond with a 206 (partial content) response. See RFC7233, sec. 4.1.
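If the first offset of the range interval is at or beyond the current byte length of the requested resource, the server should respond with a 416 (range not satisfiable). See section 4.4. A closing offset beyond the resource length does not by itself make the range unsatisfiable: per sec. 2.1, the server replaces it with the position of the last available byte.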
To skip the first 100 bytes of the content, you are indeed right in that the request should contain a Range: bytes=100- header. See sec. 2.1 and sec. 3.1.
As for a resource that has unknown length and is read in a way that yields content chunks of unpredictable size: this is undefined behaviour not sanctioned by any RFC. The Content-Range header is specified so that either the current range or the total content size may be unknown, but not both. You cannot fall back on the HTTP envelope to convey the range length, because a server must provide a Content-Range header when responding with a 206 code (cf. sec. 4.1).
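To check the three candidates from the question against that rule, here is a small sketch of a validator for the bytes form of Content-Range as I read the grammar in RFC 7233, sec. 4.2. The function name is hypothetical and the regular expression is my own transcription of the grammar, not something taken from the RFC text.

```python
import re

# byte-content-range, roughly:
#   "bytes" SP ( first "-" last "/" ( complete-length | "*" )  |  "*/" complete-length )
_CONTENT_RANGE = re.compile(r"bytes (?:([0-9]+)-([0-9]+)/([0-9]+|\*)|\*/([0-9]+))")

def is_valid_content_range(value):
    """Check a Content-Range value against the bytes grammar sketched above."""
    match = _CONTENT_RANGE.fullmatch(value)
    if match is None:
        return False
    first, last, length, _unsatisfied_length = match.groups()
    if first is None:
        return True  # "*/length" form, used with 416 responses
    if int(last) < int(first):
        return False  # range must not run backwards
    # a known complete-length must be greater than last-byte-pos
    return length == "*" or int(last) < int(length)

for candidate in ("bytes 100-/*", "bytes 100-200/*", "bytes 100-*/*"):
    print(candidate, is_valid_content_range(candidate))
```

Of the three candidates, only bytes 100-200/* passes: the length may be unknown, but both ends of the range must be given.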
The correct way of handling the situation would be:
Validate the range request.
Attempt to read a sufficient number of bytes from the requested resource.
If enough bytes could be retrieved, create the HTTP envelope, specify the range and attach the body. Cut off if needed.
In any other case, respond with a 416.
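The steps above can be sketched as follows. This is only an illustration under assumptions: serve_range is a hypothetical helper, the resource is a blocking, buffered binary stream (so read(n) returns fewer than n bytes only at end of stream), and the function follows the listed steps literally. A stricter reading of sec. 2.1 would shorten the range instead of rejecting it.

```python
import io

def serve_range(stream, first, last):
    """Return (status, headers, body) for bytes first..last of a stream of unknown length."""
    # Step 1: validate the range request
    if first < 0 or last < first:
        return 416, {}, b""
    # Step 2: try to read enough bytes (discard everything before the range)
    skipped = stream.read(first)
    wanted = last - first + 1
    body = stream.read(wanted) if len(skipped) == first else b""
    # Step 3: enough bytes were read, so answer with 206 and an unknown total ("*")
    if len(body) == wanted:
        headers = {
            "Content-Range": f"bytes {first}-{last}/*",
            "Content-Length": str(wanted),
        }
        return 206, headers, body
    # Step 4: the stream ended early, which also reveals its total length
    total = len(skipped) + len(body)
    return 416, {"Content-Range": f"bytes */{total}"}, b""

data = bytes(150)  # stand-in for a stream that ends at byte 150
print(serve_range(io.BytesIO(data), 100, 200)[:2])  # (416, {'Content-Range': 'bytes */150'})
print(serve_range(io.BytesIO(data), 100, 149)[0])   # 206
```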
If a server is set up to handle Range requests, for instance by accepting the bytes unit, then there are roughly two ways of sending a valid request to that server.
The first is to set the Range header to something like Range: bytes=0-, meaning that we want the first byte of whatever the server is serving and are lazy about the rest of the content. Just give us as much as the server can.
The second is to set the Range header to something like Range: bytes=0-2000, meaning that we explicitly want bytes 0 through 2000 and we don't care how large the file is that the server is serving.
These requests, if deemed valid by the server (which we assume here), are then answered with HTTP 206 responses.
My question is: what is the most professional approach to serving a file through partial responses, as discussed above? Would it be to serve, all at once, every byte the requesting client does not yet have, i.e. if a request with Range: bytes=0- comes in, to serve bytes 0 through the last byte? Or would it be to split up the file and, when a request with Range: bytes=0- comes in, serve only a number of bytes, e.g. bytes 0 through 500? Or something different altogether?
I have this question because a current code base of mine seems to be blocking, thus no more than five or so requests can be handled at a given time.
From RFC7233, section 2.1:
A client can limit the number of bytes requested without knowing
the size of the selected representation. If the last-byte-pos
value is absent, or if the value is greater than or equal to the
current length of the representation data, the byte range is
interpreted as the remainder of the representation (i.e., the
server replaces the value of last-byte-pos with a value that is one
less than the current length of the selected representation).
So this means that a range like:
Range: bytes=0-
should be interpreted as 'serve the entire file'.
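The rule quoted above can be written down in a few lines. This is a minimal sketch, not a full Range parser: resolve_byte_range is a hypothetical name, last=None stands for an absent last-byte-pos, and the 416 case for a first position at or beyond the current length is my reading of the rest of RFC 7233.

```python
def resolve_byte_range(first, last, current_length):
    """Apply the last-byte-pos rule quoted above.

    Returns (first, last) as absolute positions, or None when the range
    cannot be satisfied (the server would then answer 416).
    """
    if last is not None and last < first:
        raise ValueError("last-byte-pos must not be less than first-byte-pos")
    if first >= current_length:
        return None
    if last is None or last >= current_length:
        # absent or too large: replace with one less than the current length
        last = current_length - 1
    return first, last

print(resolve_byte_range(0, None, 1000))   # (0, 999), i.e. the entire file
print(resolve_byte_range(100, 200, 150))   # (100, 149)
print(resolve_byte_range(200, None, 150))  # None
```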
I create a ZIP archive of unknown length on the fly from existing material (using Node), which is already compressed. In the ZIP archive, files just get stored; the ZIP is only used to have a single container. That's why caching the created ZIP files makes no sense: there's no real computation involved.
So far, OK. Now I want to permit resuming downloads, and I'm reading about the Accept-Ranges, Range and Content-Range HTTP headers. A client with a broken download would ask for an open-ended range, say: Range: bytes=8000000-.
How do I answer that? My answer must include a Content-Range header, and there, according to RFC 2616 § 14.16:
Unlike byte-ranges-specifier values (see section 14.35.1), a byte-range-resp-spec MUST only specify one range, and MUST contain absolute byte positions for both the first and last byte of the range.
So I cannot just send "everything starting from position X"; I must specify the last byte sent, too, either by sending only a part of known size or by calculating the length in advance. Neither option is convenient in my situation. Is there any other possibility?
Answering myself: Looks like I have to choose between (1) chunked-encoding of a file of yet unknown length, or (2) knowing its Content-Length (or at least the size of the current part), allowing for resuming downloads (as well as for progress bars).
I can live with that - for each of my ZIP files, the length will be the same, so I can store it somewhere and re-use it for subsequent downloads. I'm just surprised the HTTP protocol does not allow for resuming downloads of unknown length.
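With a stored length, answering the open-ended resume request becomes simple arithmetic. The sketch below is in Python rather than Node, purely for illustration; resume_headers is a hypothetical helper, it only handles a single open-ended range of the form bytes=N-, and the total of 10000000 bytes in the usage line is made up.

```python
import re

def resume_headers(range_header, total_length):
    """Return (status, headers) for an open-ended 'bytes=N-' request, or None.

    None means the header is not a single open-ended byte range; a caller
    (hypothetical) could then ignore it and send the full response.
    """
    match = re.fullmatch(r"bytes=([0-9]+)-", range_header.strip())
    if match is None:
        return None
    first = int(match.group(1))
    if first >= total_length:
        # nothing left to send from that offset
        return 416, {"Content-Range": f"bytes */{total_length}"}
    last = total_length - 1  # last byte of the stored, known length
    return 206, {
        "Content-Range": f"bytes {first}-{last}/{total_length}",
        "Content-Length": str(last - first + 1),
    }

print(resume_headers("bytes=8000000-", 10000000))
# (206, {'Content-Range': 'bytes 8000000-9999999/10000000', 'Content-Length': '2000000'})
```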
Respond with a "multipart/byteranges" Content-Type, including Content-Range fields for each part.
Reasoning:
When replying to requests with a "Range" header, successful partial responses should report the 206 HTTP status code (14.35.1 Byte Ranges section)
A 206 response requires either a "Content-Range" header or a "multipart/byteranges" Content-Type (10.2.7 206 Partial Content)
A "Content-Range" header cannot be added to the response, as it does not allow omitting the end position, so the only remaining way is to use the "multipart/byteranges" Content-Type
As per HTTP/1.1 spec for Range header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35), it is stated that
Byte range specifications in HTTP apply to the sequence of bytes in the entity-body (not necessarily the same as the message-body).
My question is: suppose I want to download a binary file of size 1 GB that consists of multiple encrypted blocks of 128 MB each. Since HTTP byte ranges apply to the HTTP entity rather than to the file size, how can I download these blocks in parallel from the server without breaking the block boundaries? Please note that I don't want to reassemble the file; I want to process the blocks separately to decrypt them. Which Range header would be most suitable, and how do I derive the correct value to send in it?
Thanks,
The Range header applies not to the full HTTP entity but only to the entity-body of that entity. The HTTP message section of the RFC (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html) says
The message-body (if any) of an HTTP message is used to carry the entity-body associated with the request or response. The message-body differs from the entity-body only when a transfer-coding has been applied, as indicated by the Transfer-Encoding header field (section 14.41).
Another good reference to read is http://www.ietf.org/rfc/rfc3229.txt (section 4 - The HTTP message-generation sequence), which explains how the HTTP response is generated. Conceptually, when both a Range header and a transfer encoding are requested, the Range is applied first when generating the response, and then the transfer encoding is applied. I think most HTTP servers should conform to this, so we can apply the Range header with respect to the message content length.
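Under that conclusion, the Range values for the blocks follow directly from the file size and the block size. Here is a minimal sketch, assuming that "1 GB" and "128 MB" mean binary units (2**30 and 128 * 2**20 bytes), that the blocks are laid out back to back from offset 0, and that no content-coding changes the byte positions; block_range_headers is a hypothetical helper.

```python
def block_range_headers(total_size, block_size):
    """Return one Range header value per block, aligned to block boundaries."""
    headers = []
    for start in range(0, total_size, block_size):
        # byte positions are inclusive, so the last byte is one before the next block
        end = min(start + block_size, total_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers

TOTAL = 2**30          # assumed meaning of "1 GB"
BLOCK = 128 * 2**20    # assumed meaning of "128 MB"

ranges = block_range_headers(TOTAL, BLOCK)
print(len(ranges))   # 8
print(ranges[0])     # bytes=0-134217727
print(ranges[-1])    # bytes=939524096-1073741823
```

Each value can be sent as the Range header of its own request, so the blocks can be fetched in parallel and decrypted independently.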