HTTP multirange requests - headers in response - http

I'm using multirange http requests like this
"curl --range 1-2,2-3 http://some.url"
The response is like
--00000000000000030705 Content-Type: text/html; charset=utf-8 Content-Range: bytes 1-2/13882393
il
--00000000000000030705 Content-Type: text/html; charset=utf-8 Content-Range: bytes 2-3/13882393
le
--00000000000000030705--
How can I remove the Content-Type and Content-Range fields from the response, so that I get the raw data from the server (without parsing on the client side)?
I want to get response like:
"ille"
Thanks a lot!

You probably can't. The server is conforming to the spec, as described by RFC 7233, section 4.1:
If multiple parts are being transferred, the server generating the 206 response must generate a "multipart/byteranges" payload, as defined in Appendix A, and a Content-Type header field containing the multipart/byteranges media type and its required boundary parameter. To avoid confusion with single-part responses, a server must not generate a Content-Range header field in the HTTP header section of a multiple part response (this field will be sent in each part instead).
In the case of contiguous multi ranges the server may send the response without the multipart boundaries but this is optional.
When multiple ranges are requested, a server may coalesce any of the ranges that overlap, or that are separated by a gap that is smaller than the overhead of sending multiple parts, regardless of the order in which the corresponding byte-range-spec appeared in the received Range header field. Since the typical overhead between parts of a multipart/byteranges payload is around 80 bytes, depending on the selected representation's media type and the chosen boundary parameter length, it can be less efficient to transfer many small disjoint parts than it is to transfer the entire selected representation.
Within the header area of each body part in the multipart payload, the server must generate a Content-Range header field corresponding to the range being enclosed in that body part. If the selected representation would have had a Content-Type header field in a 200 (OK) response, the server should generate that same Content-Type field in the header area of each body part. For example:
Assuming your server conforms to the spec, if you send a single range such as 1-3 you will get a single body, without multipart boundaries or per-part headers.
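Since the server is allowed but not required to coalesce ranges, one option is to merge overlapping or adjacent ranges on the client before building the Range header. The following is only a sketch of that idea; the helper names `coalesce_ranges` and `range_header` are made up for illustration and are not part of curl or any library.

```python
def coalesce_ranges(ranges):
    """Merge overlapping or directly adjacent inclusive (first, last) byte ranges."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], last)
        else:
            merged.append([first, last])
    return [tuple(r) for r in merged]


def range_header(ranges):
    """Build a Range header value from inclusive byte ranges (hypothetical helper)."""
    return "bytes=" + ",".join(f"{a}-{b}" for a, b in coalesce_ranges(ranges))
```

For the ranges in the question, `range_header([(1, 2), (2, 3)])` gives `bytes=1-3`. Note that the overlapping byte 2 is then returned once, so the body would be "ile" rather than "ille"; if the duplicate is really needed, it has to be re-created on the client.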

Related

Multiple content length headers and multiple transfer encoding headers

If there are multiple Content-Length headers, should I
fail (I don't think so?),
use the first one, or
use the last one?
Then I have the same question for the Transfer-Encoding header too. I think with Transfer-Encoding we are supposed to use the last one.
Then, the same question for the Host header as well.
thanks,
Dean
Content-Length is a single-value header. Usually the last header would have authority; however RFC 7230, section 3.3.2 states:
If a message is received that has multiple Content-Length header fields with field-values consisting of the same decimal value, or a single Content-Length header field with a field value containing a list of identical decimal values (e.g., "Content-Length: 42, 42"), indicating that duplicate Content-Length header fields have been generated or combined by an upstream message processor, then the recipient MUST either reject the message as invalid or replace the duplicated field-values with a single valid Content-Length field containing that decimal value prior to determining the message body length or forwarding the message.
Transfer-Encoding is a different matter, as it contains a list. There can be multiple header lines and all of them are valid. The important thing here is that the applied encodings have to be listed in the order in which they were applied. E.g. if content has been gzipped and then chunk-encoded, the headers have to look like
Transfer-Encoding: gzip, chunked
or
Transfer-Encoding: gzip
Transfer-Encoding: chunked
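As a rough sketch of the two rules above (not taken from any particular server), a parser could collapse Content-Length values only when they all agree, and concatenate Transfer-Encoding lines in the order received. The function names are hypothetical, and each input is assumed to be a list with one string per header line. The RFC also permits rejecting identical duplicates outright; this sketch takes the "replace with a single value" option.

```python
def resolve_content_length(values):
    """Return the single Content-Length from all received field values.

    `values` holds one string per Content-Length header line.
    Raises ValueError if the values are malformed or disagree.
    """
    parts = [p.strip() for v in values for p in v.split(",")]
    if not parts:
        return None  # no Content-Length header at all
    if any(not (p.isascii() and p.isdigit()) for p in parts):
        raise ValueError("malformed Content-Length")
    if len({int(p) for p in parts}) != 1:
        raise ValueError("conflicting Content-Length values")
    return int(parts[0])


def transfer_codings(values):
    """Combine all Transfer-Encoding lines, in order, into one list of codings."""
    return [p.strip().lower() for v in values for p in v.split(",") if p.strip()]
```

With this, `["42", "42"]` and `["42, 42"]` both resolve to 42, while `["42", "43"]` is rejected, and both header layouts shown above yield the coding list gzip, chunked.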
WRT Content-Length: yes, you actually MUST fail (unless both values are the same, in which case you MAY pick one). See RFC 7230.
"Transfer-Encoding" is different in that it allows multiple values; so you'll have to process them all in order.

What is the difference between a request payload and request body?

I am learning HTTP. I enclose a request payload in XML or JSON format in my POST requests. What I wanted to know is whether a request payload and request body mean the same thing?
Definition of: payload : The "actual data" in a packet or file minus all headers attached for transport and minus all descriptive meta-data. In a network packet, headers are appended to the payload for transport and then discarded at their destination.
Edit: In the HTTP protocol, an HTTP packet has HTTP headers and an HTTP payload. The payload section of an HTTP packet may or may not have a body, depending on the type of request (e.g. POST vs GET). So payload and body are not the same thing.
Payload is the "wrapper" to the body
Payload is something one carries. A paperboy's payload is a pile of newspapers and a HTTP POST request's payload is whatever comes in the "body".
What I wanted to know is whether a request payload and request body mean the same thing?
No, they have different meanings. A payload (a.k.a. content) is a part of representation data while a body is a part of a message, which are two different HTTP concepts. A representation (data and metadata) is transferred as a single or multiple messages, so a message encloses a complete or partial representation. The representation metadata are enclosed in the header fields of a message and the representation data, the payload, are enclosed in the body of a message, as is or transfer-encoded.
References
RFC 9110: HTTP Semantics defines the term representation:
3.2. Representations
A "representation" is information that is intended to reflect a past, current, or desired state of a given resource, in a format that can be readily communicated via the protocol. A representation consists of a set of representation metadata and a potentially unbounded stream of representation data (Section 8).
Notice that the definition is independent of the version of HTTP because it is about semantics.
RFC 9112: HTTP/1.1 defines the term message:
2.1. Message Format
An HTTP/1.1 message consists of a start-line followed by a CRLF and a sequence of octets in a format similar to the Internet Message Format [RFC5322]: zero or more header field lines (collectively referred to as the "headers" or the "header section"), an empty line indicating the end of the header section, and an optional message body.
HTTP-message   = start-line CRLF
                 *( field-line CRLF )
                 CRLF
                 [ message-body ]
Notice that the definition depends on the version of HTTP because it is about syntax.
RFC 9110: HTTP Semantics defines the term content:
6.4. Content
HTTP messages often transfer a complete or partial representation as the message "content": a stream of octets sent after the header section, as delineated by the message framing.
This abstract definition of content reflects the data after it has been extracted from the message framing. For example, an HTTP/1.1 message body (Section 6 of [HTTP/1.1]) might consist of a stream of data encoded with the chunked transfer coding -- a sequence of data chunks, one zero-length chunk, and a trailer section -- whereas the content of that same message includes only the data stream after the transfer coding has been decoded; it does not include the chunk lengths, chunked framing syntax, nor the trailer fields (Section 6.5).
Note: Some field names have a "Content-" prefix. This is an informal convention; while some of these fields refer to the content of the message, as defined above, others are scoped to the selected representation (Section 3.2). See the individual field's definition to disambiguate.
RFC 9110: HTTP Semantics substitutes the term content for payload used in previous RFCs:
B.3. Changes from RFC 7231
[…]
The terms "payload" and "payload body" have been replaced with "content", to better align with its usage elsewhere (e.g., in field names) and to avoid confusion with frame payloads in HTTP/2 and HTTP/3. (Section 6.4)
The header identifies the source and destination of the sent packet, whereas the actual data, i.e. the body, is referred to as the payload.
The start-line and HTTP headers of the HTTP message are collectively known as the head of the requests, whereas its payload is known as the body
So Yes, they are the same thing.
Got this from https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages
The payload of an HTTP message is known as the body.
The HTTP message payload body is the information ("payload") part of the data that is sent in the HTTP Message Body (if any), prior to transfer encoding being applied. If transfer encoding is not used, the payload body and message body are the same thing!
So basically the only difference between the HTTP message body and the HTTP message payload body is the transfer encoding (and only if one is present). So, generalizing, request payload = request body.
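To make the one case where the two differ concrete, here is a small sketch that frames some content with the chunked transfer coding. The function name `chunk_encode`, the sample JSON, and the tiny chunk size are assumptions chosen for illustration; real servers pick their own chunk sizes.

```python
def chunk_encode(content, chunk_size=5):
    """Frame `content` with the chunked transfer coding (no trailers)."""
    body = b""
    for i in range(0, len(content), chunk_size):
        chunk = content[i:i + chunk_size]
        # Each chunk is: size in hex, CRLF, the data, CRLF.
        body += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    return body + b"0\r\n\r\n"  # last-chunk followed by the closing empty line


content = b'{"name":"x"}'       # the payload / content
body = chunk_encode(content)    # the message body as it appears on the wire
```

Here `body` carries the chunk sizes and CRLF framing in addition to the JSON, so it is longer than `content`. Without Transfer-Encoding, the body would be exactly the bytes of `content`.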

What does ajax GET data look like at the lowest level?

GET requests can be used to retrieve images and text and other things I'm guessing. There is no need to specify Content-type.
What does this data look like at the bit level? If you are looking at the 1s and 0s in the HTTP packet, what specification governs what can be put here.
Using the client, when I send / receive data via ajax GET, is the data directly transferred into 1s and 0s in the packet or is there some sort of transformation?
For example:
xhr = new win.XMLHttpRequest();
xhr.open('GET', config_ajax.url, true);
xhr.onload = function () {
    if (this.status === 200) {
        config_ajax.callback(xhr.responseText);
    }
};
xhr.send(send);
// example data
send = "0xFF";
xhr.responseText = "0x0A"
Would one see 11111111 being sent and 00001010 being received if they were analyzing the bit stream?
I guess there is no need to specify Content-type. What does this data look like at the bit level? If you are looking at the 1s and 0s in the HTTP packet, what specification governs what can be put here.
You're looking for the HTTP specification itself (RFC 2616), section 7. Indeed it works quite like you assumed:
7 Entity
Request and Response messages MAY transfer an entity if not otherwise
restricted by the request method or response status code. An
entity consists of entity-header fields and an entity-body, although
some responses will only include the entity-headers.
In this section, both sender and recipient refer to either the client
or the server, depending on who sends and who receives the entity.
7.1 Entity Header Fields
Entity-header fields define metainformation about the entity-body
or, if no body is present, about the resource identified by the
request. Some of this metainformation is OPTIONAL; some might be
REQUIRED by portions of this specification.
entity-header  = Allow                    ; Section 14.7
               | Content-Encoding         ; Section 14.11
               | Content-Language         ; Section 14.12
               | Content-Length           ; Section 14.13
               | Content-Location         ; Section 14.14
               | Content-MD5              ; Section 14.15
               | Content-Range            ; Section 14.16
               | Content-Type             ; Section 14.17
               | Expires                  ; Section 14.21
               | Last-Modified            ; Section 14.29
               | extension-header
extension-header = message-header
The extension-header mechanism allows additional entity-header
fields to be defined without changing the protocol, but these
fields cannot be assumed to be recognizable by the recipient.
Unrecognized header fields SHOULD be ignored by the recipient and
MUST be forwarded by transparent proxies.
7.2 Entity Body
The entity-body (if any) sent with an HTTP request or response is
in a format and encoding defined by the entity-header fields.
entity-body = *OCTET
An entity-body is only present in a message when a message-body is
present, as described in section 4.3. The entity-body is obtained
from the message-body by decoding any Transfer-Encoding that might
have been applied to ensure safe and proper transfer of the message.
7.2.1 Type
When an entity-body is included with a message, the data type of
that body is determined via the header fields Content-Type and
Content-Encoding. These define a two-layer, ordered encoding
model:
entity-body := Content-Encoding( Content-Type( data ) )
Content-Type specifies the media type of the underlying data.
Content-Encoding may be used to indicate any additional content
codings applied to the data, usually for the purpose of data
compression, that are a property of the requested resource. There is
no default encoding.
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body. If
and only if the media type is not given by a Content-Type field, the
recipient MAY attempt to guess the media type via inspection of its
content and/or the name extension(s) of the URI used to identify the
resource. If the media type remains unknown, the recipient SHOULD
treat it as type "application/octet-stream".
7.2.2 Entity Length
The entity-length of a message is the length of the message-body
before any transfer-codings have been applied. Section 4.4 defines
how the transfer-length of a message-body is determined.
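As for the bits themselves: since entity-body = *OCTET, the body is just a sequence of bytes, and text is carried as its encoded characters rather than as the number it spells. The sketch below assumes the text is encoded as UTF-8 (the actual charset comes from Content-Type); the helper name `to_bits` is made up.

```python
def to_bits(text, charset="utf-8"):
    """Return the bit pattern of `text` as it would appear in the entity-body."""
    return " ".join(f"{byte:08b}" for byte in text.encode(charset))


bits = to_bits("0x0A")
```

Under that assumption, the four-character string "0x0A" is four octets (the characters 0, x, 0 and A), so `bits` holds four groups of eight bits. The single pattern 00001010 would only appear if the body were the one-byte line feed character.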

HTTP Chunked Encoding. Need an example of 'Trailer' mentioned in SPEC

I am writing an HTTP parser for a transparent proxy. What is stumping me is the Trailer: mentioned in the specs for Transfer-Encoding: chunked. What does it look like?
Normally, an HTTP chunked body ends like this.
0\r\n
\r\n
What I am confused about is how to detect the end of the chunk if there is some sort of trailing headers...
UPDATE: I believe that a simple \r\n\r\n i.e. an empty line is enough to detect the end of trailing headers... Is that correct?
Below is a copy of an example trailer I copied from The TCP/IP Guide site.
As we can see, to use a trailer header we need to add a "Trailer: header_name" header field that names it, and then send the trailer header itself after the chunked body.
We can add zero or more trailer headers to an HTTP body per the RFC.
Section 4.1.2 of RFC 7230 bans the following headers from the trailer section:
A sender MUST NOT generate a trailer that contains a field necessary
for message framing (e.g., Transfer-Encoding and Content-Length),
routing (e.g., Host), request modifiers (e.g., controls and
conditionals in Section 5 of RFC7231), authentication (e.g., see
RFC7235 and RFC6265), response control data (e.g., see Section 7.1
of RFC7231), or determining how to process the payload (e.g.,
Content-Encoding, Content-Type, Content-Range, and Trailer).
This means we can use other standard headers and custom headers in the trailer section.
0\r\n
SomeAfterHeader: TheData \r\n
\r\n
In other words, to detect the end of a chunked transmission it is sufficient to look for \r\n\r\n, in layman's terms a blank line. But it is very important that each chunk is read before doing this, because the chunk data itself can contain blank lines, which would erroneously be detected as the end of the stream.
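The reading order described above (consume every chunk by its declared size first, and only then look for the empty line) can be sketched as follows. This is a simplified illustration that assumes the whole body is already in memory; `parse_chunked` is a hypothetical name, and a real proxy would read incrementally and enforce size limits.

```python
def parse_chunked(body):
    """Decode a chunked message body; return (content, trailers, bytes_consumed)."""
    pos = 0
    content = b""
    while True:
        line_end = body.index(b"\r\n", pos)
        # Chunk size is hex; anything after ';' is a chunk extension.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last-chunk reached
        content += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    trailers = []
    while True:
        line_end = body.index(b"\r\n", pos)
        line = body[pos:line_end]
        pos = line_end + 2
        if not line:
            break  # empty line: end of the trailer section and of the message
        name, _, value = line.partition(b":")
        trailers.append((name.strip().decode("ascii"), value.strip().decode("ascii")))
    return content, trailers, pos
```

Because chunk data is skipped by length, a blank line inside a chunk is never mistaken for the end of the message. With no trailers, the trailer loop sees the empty line immediately, which matches the plain `0\r\n\r\n` ending.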
Regarding trailer:
The list of trailing headers should be specified in the Trailer header, as you note.
The BNF in Section 14.40 of RFC 2616 is this:
Trailer = "Trailer" ":" 1#field-name
Gourley and Totty give this example:
Trailer: Content-Length
(It's odd that they give this example, since Content-Length is explicitly forbidden to be a trailing header in 14.40.)
Shiflett gives this example:
Trailer: Date
Regarding end of message with trailing headers:
The BNF in Section 3.6.1 of RFC 2616 is what you're looking for. Here's part:
Chunked-Body   = *chunk
                 last-chunk
                 trailer
                 CRLF
last-chunk     = 1*("0") [ chunk-extension ] CRLF
trailer        = *(entity-header CRLF)
So the last chunk and 2 trailing headers might look like this:
0<CRLF>
Date:Sun, 06 Nov 1994 08:49:37 GMT<CRLF>
Content-MD5:1B2M2Y8AsgTpgAmY7PhCfg==<CRLF>
<CRLF>

How can I find out whether a server supports the Range header?

I have been trying to stream audio from a particular point by using the Range header values but I always get the song right from the beginning. I am doing this through a program so am not sure whether the problem lies in my code or on the server.
How can I find out whether the server supports the Range header param?
Thanks.
The way the HTTP spec defines it, if the server knows how to support the Range header, it will. That in turn, requires it to return a 206 Partial Content response code with a Content-Range header, when it returns content to you. Otherwise, it will simply ignore the Range header in your request, and return a 200 response code.
This might seem silly, but are you sure you're crafting a valid HTTP request header? All too commonly, I forget to specify HTTP/1.1 in the request, or forget to specify the Range specifier, such as "bytes".
Oh, and if all you want to do is check, then just send a HEAD request instead of a GET request. Same headers, same everything, just "HEAD" instead of "GET". If you receive a 206 response, you'll know Range is supported, and otherwise you'll get a 200 response.
This is for others searching how to do this. You can use curl:
curl -I http://exampleserver.com/example_video.mp4
In the header you should see
Accept-Ranges: bytes
You can go further and test retrieving a range
curl --header "Range: bytes=100-107" -I http://exampleserver.com/example_video.mp4
and in the headers you should see
HTTP/1.1 206 Partial Content
and
Content-Range: bytes 100-107/10000000
Content-Length: 8
[instead of 10000000 you'll see the length of the file]
Although I am a bit late in answering this question, I think my answer will help future visitors. Here is a python method that detects whether a server supports range queries or not.
def accepts_byte_ranges(self, effective_url):
    """Test if the server supports multi-part file download.

    Expects an effective (absolute) URL.
    """
    import io
    import re
    import pycurl

    # Fetch only the response headers (NOBODY makes this a HEAD request).
    header = io.BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, effective_url)
    c.setopt(c.NOBODY, 1)
    c.setopt(c.HEADERFUNCTION, header.write)
    c.perform()
    c.close()
    header_text = header.getvalue().decode('iso-8859-1')
    header.close()

    # Check if the server accepts byte ranges (header names are case-insensitive).
    if re.search(r'^Accept-Ranges:\s*bytes', header_text, re.I | re.M):
        return True
    # If the server explicitly says "Accept-Ranges: none", do not attempt a partial download.
    if re.search(r'^Accept-Ranges:\s*none', header_text, re.I | re.M):
        return False

    # There is still hope: try a simple byte-range query for the first byte.
    c = pycurl.Curl()
    c.setopt(c.RANGE, '0-0')
    c.setopt(c.URL, effective_url)
    c.setopt(c.NOBODY, 1)
    c.perform()
    http_code = c.getinfo(c.HTTP_CODE)
    c.close()
    # HTTP status code 206 means byte ranges are accepted.
    return http_code == 206
One way is just to try, and check the response. In your case, it appears the server doesn't support ranges.
Alternatively, do a GET or HEAD on the URI, and check for the Accept-Ranges response header.
You can use the GET method with a 0-0 Range request header and check whether the response code is 206. If it is, the response body contains only the requested range, here the single byte at offset 0.
You can also use the HEAD method to do the same thing; it returns the same response headers and status code without the response body.
Furthermore, you can check Accept-Ranges in the response headers to judge whether ranges are supported. Note that a value of none means ranges are not supported, but if the response has no Accept-Ranges field at all, you cannot conclude from that alone that ranges are unsupported.
One more thing to know: if you use an open-ended range such as 0- with the GET method just to check the response code, the server starts sending the body anyway, and it will be buffered in the TCP receive window until the window is full.
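The checks above can be combined into one small decision function using only the standard library. This is a sketch under stated assumptions rather than a definitive test: `classify_range_support` and `probe` are hypothetical names, and treating "Accept-Ranges: bytes" on a 200 response as support is a judgment call.

```python
import urllib.request


def classify_range_support(status, headers):
    """Interpret a response to a request that carried 'Range: bytes=0-0'.

    `headers` maps lower-cased header names to values.
    Returns True, False, or None when the response does not settle it.
    """
    if status == 206:
        return True  # the server honoured the range
    accept = headers.get("accept-ranges", "").strip().lower()
    if accept == "none":
        return False  # the server explicitly opts out
    if accept == "bytes":
        return True  # advertised, even though this reply ignored the range
    return None  # 200 without Accept-Ranges: no evidence either way


def probe(url):
    """Send a HEAD request with a one-byte range and classify the reply."""
    req = urllib.request.Request(url, method="HEAD", headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(req) as resp:
        headers = {k.lower(): v for k, v in resp.headers.items()}
        return classify_range_support(resp.status, headers)
```

Calling `probe` with the URL of the audio file returns True or False when the server's answer is explicit, and None when the only way to find out is to request a range with GET and see what comes back.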

Resources