What is the professional way of serving HTTP 206 responses?

If a server is set up to handle Range requests, by accepting bytes for instance, then there are roughly two ways of sending a valid request to that server.
The first is to set the Range header to something like Range: bytes=0-, meaning that we want everything from the first byte onwards and leave the end of the range open. Just give us as much as the server can.
The second is to set the Range header to something like Range: bytes=0-2000, meaning that we explicitly want bytes 0 through 2000, regardless of how large the file is that the server is serving.
These requests, if deemed valid by the server (assume here that they are), are then answered with HTTP 206 responses.
My question is: what is the most professional approach to serving a file through partial responses, as discussed above? Would it be to serve all the bytes the client does not yet have at once, i.e. if a request with Range: bytes=0- comes in, serve bytes 0 through the last byte? Or would it be to split up the file and, when a request with Range: bytes=0- comes in, serve only a limited number of bytes, e.g. bytes 0 through 500? Or something different altogether?
I have this question because a current code base of mine seems to be blocking, thus no more than five or so requests can be handled at a given time.

From RFC7233, section 2.1:
A client can limit the number of bytes requested without knowing
the size of the selected representation. If the last-byte-pos
value is absent, or if the value is greater than or equal to the
current length of the representation data, the byte range is
interpreted as the remainder of the representation (i.e., the
server replaces the value of last-byte-pos with a value that is one
less than the current length of the selected representation).
So this means that a range like:
Range: bytes=0-
should be interpreted as 'serve the entire file'.
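To make the quoted rule concrete, here is a minimal Python sketch of how a server might resolve a single byte range against the current length of the representation. The function name resolve_byte_range is hypothetical, multi-range requests are not handled, and malformed and unsatisfiable ranges are both reported as None for brevity (a real server would distinguish them).

```python
import re
from typing import Optional, Tuple

# Matches a single range such as "bytes=0-", "bytes=0-2000" or "bytes=-500".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$", re.ASCII)


def resolve_byte_range(header: str, length: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) byte positions to serve, or None."""
    match = _RANGE_RE.match(header.strip())
    if match is None:
        return None
    first_text, last_text = match.groups()

    if first_text == "":
        # Suffix form "bytes=-N": the last N bytes of the representation.
        if last_text == "":
            return None
        suffix = int(last_text)
        if suffix == 0 or length == 0:
            return None
        return (max(length - suffix, 0), length - 1)

    first = int(first_text)
    if first >= length:
        # Nothing to serve at or beyond the end of the representation.
        return None
    if last_text == "":
        # Absent last-byte-pos: the remainder of the representation.
        last = length - 1
    else:
        # A last-byte-pos at or beyond the end is clamped to the final byte.
        last = min(int(last_text), length - 1)
    if last < first:
        return None
    return (first, last)
```

For a 1000-byte file, both "bytes=0-" and "bytes=0-2000" resolve to the whole file under this sketch, which is the behaviour the RFC text describes.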

Related

Content-Length vs complete-length

I have a case where a file is being uploaded, I know what its final size will be once the upload is complete, and I provide this information when someone requests the file while the upload is still in progress.
My problem is how to report the resource size for range requests versus non-range requests.
For a non-range request, I always set Content-Length to the expected (final) size of the resource and let the server tail the file.
For a range request, I set the complete-length in */{complete-length} to the current size of the resource, and the server returns the current payload without tailing the file.
What I'm unsure of is whether this is spec compliant.
For non-range requests, the spec says:
the Content-Length indicates the size of the selected representation
For range requests, the spec says:
The complete-length in a 416 response indicates the current length of the selected representation.
So the question here is whether the spec means that these two are the same or not. My current implementation assumes they are not the same, i.e.:
Content-Length is the final/expected size of the resource
complete-length is the current size of the resource

Generating a multipart/byterange response without scanning the parts ahead of sending

I would like to generate a multipart byte range response. Is there a way for me to do it without scanning each segment I am about to send out, since I need to generate multipart boundary strings?
For example, a user can request a byte range that would have me fetch and scan 2 GB of data, which in my case involves loading that data into my (slow) VM as strings and so forth. Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option? I see that many developers just grab a UUID as the boundary and accept the tiny probability that it will appear somewhere within the part; that risk seems to be small enough that multiple people are taking it.
To explain in more detail: scanning the parts ahead of time (before generating the response) is not really feasible in my case since I need to fetch them via HTTP from an upstream service. This means that I effectively have to prefetch the entire part first to compute a non-matching multipart boundary, and only then can I splice that part into the response.
Assuming the data can be arbitrary, I don’t see how you could guarantee absence of collisions without scanning the data.
If the format of the data is very limited (like... base 64 encoded?), you may be able to pick a boundary that is known to be an illegal sequence of bytes in that format.
Even if your boundary does collide with the data, it must be followed by headers such as Content-Range, which is even more improbable, so the client is likely to treat it as an error rather than consume the wrong data.
Major Web servers use very simple strategies. Apache grabs 8 random bytes at startup and renders them in hexadecimal. nginx uses a sequential counter left-padded with zeroes.
UUIDs are designed to avoid collisions with other UUIDs, not with arbitrary data. A UUID is no more likely to be a good boundary than a completely random string of the same length. Moreover, some UUID variants include information that you may not want to disclose, such as your machine’s MAC address.
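As an illustration of the random-boundary approach, here is a minimal Python sketch that picks a boundary from 8 random bytes rendered in hexadecimal (the strategy attributed to Apache above, not Apache's actual code) and frames each part without inspecting its payload. The names make_boundary and multipart_byteranges are hypothetical, and each part's byte positions and the complete length are assumed to be known up front.

```python
import secrets
from typing import Iterable, Iterator, Tuple

# One part: (first byte position, last byte position, iterable of payload chunks).
Part = Tuple[int, int, Iterable[bytes]]


def make_boundary() -> str:
    """Return 8 random bytes rendered as 16 hexadecimal characters."""
    return secrets.token_hex(8)


def multipart_byteranges(
    parts: Iterable[Part],
    boundary: str,
    content_type: str,
    complete_length: int,
) -> Iterator[bytes]:
    """Yield a multipart/byteranges body piece by piece.

    Payload chunks are passed through untouched, so nothing is buffered
    or scanned for the boundary string.
    """
    for first, last, chunks in parts:
        part_header = (
            f"--{boundary}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Range: bytes {first}-{last}/{complete_length}\r\n"
            "\r\n"
        )
        yield part_header.encode("ascii")
        yield from chunks
        yield b"\r\n"
    # Closing delimiter marks the end of the multipart body.
    yield f"--{boundary}--\r\n".encode("ascii")
```

Because the chunks are forwarded as they arrive, each part can be streamed straight from an upstream service. The trade-off is the one discussed above: a collision between the boundary and the data is improbable rather than impossible.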
Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option?
Maybe you can avoid supporting multiple ranges and simply tell the clients to request each range separately. In that case, you don’t use the multipart format, so there is no problem.
If you do want to send multiple ranges in one response, then RFC 7233 requires the multipart format, which requires the boundary string.
You can, of course, invent your own mechanism instead of that of RFC 7233. In that case:
You cannot use 206 (Partial Content). You must use 200 (OK) or some other applicable status code.
You cannot use the multipart/byteranges media type. You must come up with your own media type.
You cannot use the Range request header.
Because a 200 (OK) response to a GET request is supposed to carry a (full) representation of the resource, you must do one of the following:
encode the requested ranges in the URL; or
use something like POST instead of GET; or
use a custom, non-standard status code instead of 200 (OK); or
(not sure if this is a correct approach) use media type parameters, send them in Accept, and add Accept to Vary.
The chunked transfer coding may be useful, but you cannot rely on it alone, because it is a property of the connection, not of the payload.

Response to Partial Content, if size is unknown. Range request like "bytes=100-"

What does the Content-Range header look like if I request a range and the total size is unknown?
For example, my request is "bytes=100-200" and the stream will end at byte 150, but I do not know that before I start streaming. What should I send as the Content-Range header?
bytes 100-/*
bytes 100-200/*
bytes 100-*/*
Or is this not a legal situation at all?
The same question applies if the request is open-ended: "bytes=100-"
If you request a range that is satisfiable, the server should respond with a 206 (partial content) response. See RFC7233, sec. 4.1.
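If the byte length of the requested resource is smaller than or equal to the starting offset of the range, the server should respond with a 416 (Range Not Satisfiable). See section 4.4. A closing offset beyond the resource length does not make the range unsatisfiable on its own: per section 2.1, it is interpreted as the last byte of the resource.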
To skip the first 100 bytes of the content, you are indeed right in that the request should contain a Range: bytes=100- header. See sec. 2.1 and sec. 3.1.
As for a resource that has unknown length and is read in a way that yields content chunks of unpredictable size: this is undefined behaviour not sanctioned by any RFC. In a 206 response, the Content-Range header may leave the complete length unknown (for example bytes 100-200/*), but it must always state the first and last byte positions. Only the 416 form (for example bytes */1234) omits the range, and that form requires the complete length. So either the range or the total size may be unknown, but not both. You cannot fall back on the HTTP envelope to signal where the range ends, because a server must provide a Content-Range header when responding with a single-range 206 (cf. sec. 4.1).
The correct way of handling the situation would be:
Validate the range request.
Attempt to read a sufficient number of bytes from the requested resource.
If enough bytes could be retrieved, create the HTTP envelope, specify the range and attach the body, cutting it off if needed.
In any other case, respond with a 416.
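The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name answer_range is hypothetical, the stream is assumed to return fewer bytes than requested only at end of data (as io.BytesIO or a buffered file does), and "sufficient" is taken to mean that at least one byte exists at the starting offset, in line with RFC 7233 section 2.1.

```python
import io
from typing import BinaryIO, Dict, Tuple


def answer_range(
    stream: BinaryIO, first: int, last: int
) -> Tuple[int, Dict[str, str], bytes]:
    """Buffer the requested range, then decide between 206 and 416."""
    # Step 1: validate the range request.
    if first < 0 or last < first:
        raise ValueError("invalid byte range")

    # Step 2: try to read enough bytes. Skip the bytes before the range first.
    skipped = stream.read(first)
    if len(skipped) < first:
        # The stream ended before the range started, so its length is now known.
        return 416, {"Content-Range": f"bytes */{len(skipped)}"}, b""

    wanted = last - first + 1
    # Read one extra byte to learn whether the stream continues past the range.
    data = stream.read(wanted + 1)
    if not data:
        return 416, {"Content-Range": f"bytes */{first}"}, b""

    # Step 3: build the envelope, cutting the body off if needed.
    if len(data) > wanted:
        body, complete = data[:wanted], "*"  # more data follows: length unknown
    else:
        body, complete = data, str(first + len(data))  # reached the end
    headers = {
        "Content-Range": f"bytes {first}-{first + len(body) - 1}/{complete}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body
```

With a 151-byte stream and a request for bytes 100-200, this sketch answers with Content-Range: bytes 100-150/151. The cost is that the whole range is buffered before the first header is sent, which is what allows the header to be truthful.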

Content-Range for resuming a file of unknown length

I create a ZIP archive of unknown length on the fly from existing material (using Node), which is already compressed. In the ZIP archive, files just get stored; the ZIP is only used to have a single container. That's why caching the created ZIP files makes no sense: there's no real computation involved.
So far, OK. Now I want to permit resuming downloads, and I'm reading about the Accept-Ranges, Range and Content-Range HTTP headers. A client with a broken download would ask for an open-ended range, say: Range: bytes=8000000-.
How do I answer that? My answer must include a Content-Range header, and there, according to RFC 2616 § 14.16:
Unlike byte-ranges-specifier values (see section 14.35.1), a byte-range-resp-spec MUST only specify one range, and MUST contain absolute byte positions for both the first and last byte of the range.
So I cannot just send "everything starting from position X"; I must specify the last byte sent, too, either by sending only a part of known size or by calculating the length in advance. Neither idea is convenient in my situation. Is there any other possibility?
Answering myself: Looks like I have to choose between (1) chunked-encoding of a file of yet unknown length, or (2) knowing its Content-Length (or at least the size of the current part), allowing for resuming downloads (as well as for progress bars).
I can live with that - for each of my ZIP files, the length will be the same, so I can store it somewhere and re-use it for subsequent downloads. I'm just surprised the HTTP protocol does not allow for resuming downloads of unknown length.
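Once the length has been stored, answering an open-ended range becomes simple arithmetic. A minimal Python sketch, assuming the stored length is correct and the regenerated archive is byte-for-byte identical on every run (otherwise resuming would splice mismatched data); the function name resume_response_headers is hypothetical:

```python
from typing import Dict, Tuple


def resume_response_headers(
    first: int, stored_length: int
) -> Tuple[int, Dict[str, str]]:
    """Status and headers for "Range: bytes=<first>-" given a known total length."""
    if first >= stored_length:
        # Nothing left to send: report the complete length in the 416 form.
        return 416, {"Content-Range": f"bytes */{stored_length}"}
    last = stored_length - 1
    return 206, {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {first}-{last}/{stored_length}",
        "Content-Length": str(last - first + 1),
    }
```

For the request in the question, Range: bytes=8000000-, and a stored length of 10000000 bytes (an illustrative figure), this yields Content-Range: bytes 8000000-9999999/10000000 and Content-Length: 2000000.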
Respond with a "multipart/byteranges" Content-Type, including a Content-Range field for each part.
Reasoning (section numbers refer to RFC 2616):
When replying to requests with a "Range" header, successful partial responses should report the 206 HTTP status code (14.35.1 Byte Ranges section)
A 206 response requires either a "Content-Range" header or a "multipart/byteranges" Content-Type (10.2.7 206 Partial Content)
A "Content-Range" header cannot be added to the response, as it does not allow omitting the end position, so the only remaining way is to use the "multipart/byteranges" Content-Type

How many A records can fit in a single DNS response?

What are the size limits on DNS responses? For instance, how many 'A' resource records can be present in a single DNS response? The DNS response should still be cacheable.
According to this RFC, the limit is based on the UDP message size limit, which is 512 octets. The EDNS standard supports a negotiated response with a virtually unlimited response size, but at the time of that writing (March 2011), only 65% of clients supported it (which means you can't really rely on it).
The largest guaranteed supported DNS message size is 512 bytes.
Of those, 12 are used up by the header (see §4.1.1 of RFC 1035).
The Question Section appears next, but is of variable length - specifically it'll be:
the domain name (in wire format)
two bytes each for QTYPE and QCLASS
Hence the longer your domain name is, the less room you have left over for answers.
Assuming that label compression is used (§4.1.4), each A record will require:
two bytes for the compression pointer
two bytes each for TYPE and CLASS
four bytes for the TTL
two bytes for the RDLENGTH
four bytes for the A record data itself
i.e. 16 bytes for each A record (§4.1.3).
You should if possible also include your NS records in the Authority Section.
Given all that, you might squeeze around 25 records into one response.
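The arithmetic above can be checked with a short Python sketch. The function names and the reserved parameter (bytes set aside for the Authority Section) are hypothetical, and the domain name is assumed to be plain ASCII.

```python
def wire_name_length(domain: str) -> int:
    """Length of a domain name in DNS wire format.

    Each label costs one length byte plus its characters, and the name
    ends with a single zero byte for the root label.
    """
    labels = [label for label in domain.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def max_a_records(domain: str, message_limit: int = 512, reserved: int = 0) -> int:
    """How many compressed A records fit after the header and the question."""
    header = 12                              # fixed header (RFC 1035, section 4.1.1)
    question = wire_name_length(domain) + 4  # name + QTYPE + QCLASS
    per_record = 2 + 2 + 2 + 4 + 2 + 4       # pointer, TYPE, CLASS, TTL, RDLENGTH, RDATA
    return (message_limit - header - question - reserved) // per_record
```

For example.com, the question takes 17 bytes, so max_a_records("example.com") gives 30 when nothing else is in the message. Setting aside 80 bytes for NS records (an illustrative figure) brings that down to 25, matching the estimate above.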

Resources