How to configure `Content-Length' header in HTTP protocol - http

I am not clear about how to calculate the `Content-Length' header in HTTP.
Take an example,
HEADER
...
Content-Type: text/html
(blank line `\r\n')
<html></html>
(blank line `\r\n')
This is a working HTTP request sending an empty HTML page (correct me if there is any problem :-)). Then what should the content length be? 15, or 17 (taking the blank line between the headers and the entity into account)?
Thanks in advance. Best regards.

According to W3, Content-Length is defined as follows:
The Content-Length entity-header field indicates the size of the
entity-body, in decimal number of OCTETs, sent to the recipient or, in
the case of the HEAD method, the size of the entity-body that would
have been sent had the request been a GET.
As far as I understand it, you have to count everything after the blank line that ends the headers. My answer to your question would therefore be 15.

15 is the correct answer. That counts the line break at the END of the entity data, which means that line break is part of the entity, not the http protocol. DO NOT count the line break between the headers and entity.
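The count can be checked with a few lines of Python. This is only a minimal sketch: `build_response` is a made-up helper, and the status line is an assumption, since the question elides everything above Content-Type.

```python
def build_response(body: bytes) -> bytes:
    """Assemble a minimal HTTP/1.1 response around an entity body."""
    headers = [
        b"HTTP/1.1 200 OK",  # assumed status line; the question elides it
        b"Content-Type: text/html",
        b"Content-Length: " + str(len(body)).encode("ascii"),
    ]
    # The blank line (CRLF CRLF) ends the headers and is NOT counted.
    return b"\r\n".join(headers) + b"\r\n\r\n" + body


# 13 octets of markup plus the trailing CRLF that belongs to the entity.
body = b"<html></html>\r\n"
response = build_response(body)
print(len(body))  # 15
```

If the trailing line break is not sent, the entity is just the 13 octets of markup and the header has to say 13 instead.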

Related

How many URLs/URIs in an HTTP POST request, and what are they?

Questions like this and material like this provide lots of useful information about HTTP POST... but none that I have found clarifies the role of that second URL (or is it a URI, or something else, and is it even secondary?) in the very first line of the POST request header:
POST /second/url/here/ HTTP/1.1
The request itself is sent to a URL (URI?), which feels "primary" to me in some sense. Can someone please clarify the role of both, and why they would be the same or different as seems to be possible?
(P.S. it was probably once so obvious that nobody thought it might ever need explaining. But now when you google for "HTTP POST", half the internet appears, and it's near impossible to see the forest for the trees...)
I don't know what the 1st and the 2nd URL are for you. I see only one. And it's not in the body but in the first line.
L01 POST /something?query=string HTTP/1.1\r\n
L02 Host: www.example.com\r\n
L03 Foo: here another header content\r\n
L04 Content-Length: 26\r\n
L05 \r\n
L06 This=is+the+body&arg=val\r\n
Let's analyze this from the bottom:
L06: this is the body, with a size of 26 octets, containing some data. The format of this body may be more complex: it may be form-url-encoded, gzipped, may contain other \r and \n characters, etc. That depends on the list of headers.
L05: body separator
L04: one of the headers, with the size of the body
L03: another header (you can have plenty of headers)
L02: important header, the Host header must be used with HTTP version 1.1, to tell the server which virtual host you really want
L01: the first line.
The first line is:
METHOD URL PROTOCOL
Where:
METHOD: is POST
URL: is /something?query=string, everything after the ? is the query string, which does not indicate a document (this is the job of the first part) but some extra parameters (the only one you can use with GET queries).
PROTOCOL: is HTTP/1.1, means you are talking with version 1.1 of the HTTP protocol
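The breakdown above can be reproduced with a small parser. This is a rough sketch rather than a complete HTTP parser: `parse_request` is a hypothetical helper, and it assumes the whole request is already in memory and that no header is repeated.

```python
raw = (
    b"POST /something?query=string HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Foo: here another header content\r\n"
    b"Content-Length: 26\r\n"
    b"\r\n"
    b"This=is+the+body&arg=val\r\n"
)


def parse_request(data: bytes):
    """Split a raw request into (method, target, version, headers, body)."""
    # Everything before the first empty line is the head; the rest is the body.
    head, _, body = data.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


method, target, version, headers, body = parse_request(raw)
path, _, query = target.partition("?")
print(method, path, query, version)
print(int(headers["content-length"]) == len(body))  # True
```

The single URL in the first line is combined with the Host header to identify the resource, which is why the request line usually carries only a path and a query string.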

Http Get response

I created an HTTP client in C. I want to get just the data sent by the server. The data normally comes after the empty line (\r\n\r\n). The problem is that when I try a GET on an HTML page, after the empty line I get a number, then the line \n0.
I don't know the significance of these two numbers.
When I try a GET on an image file I don't get these two numbers.
Can someone explain this to me?
Does the response have "Transfer-Encoding: chunked" header?
If so, the response uses chunked encoding, and the numbers are probably the chunk-size and the last-chunk. The response is split into chunks; each chunk-size line gives the size of the chunk that follows (as a hexadecimal number), and the last-chunk must be "0\r\n" according to the HTTP/1.1 specification.
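To make the framing concrete, here is a minimal decoder sketch in Python (the question uses C, but the logic carries over). `decode_chunked` is a hypothetical helper; it assumes the complete body is already in memory and it ignores any trailer fields after the last chunk.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked body; trailers after the last chunk are ignored."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].decode("ascii")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_field.split(";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:  # last-chunk: no more data follows
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


sample = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(sample))  # b'Hello, world'
```

The leading number and the final 0 seen in the raw response are these size lines, so they have to be stripped by the client rather than treated as page content.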

What is the boundary parameter in an HTTP multi-part (POST) Request?

I am trying to develop a sidebar gadget that automates the process of checking a web page for the evolution of my transfer quota. I am almost at it but there is one last step I need to get it working: Sending an HttpRequest with the correct POST data to a php page. Using a firefox plugin, here is what the "Content-Type" of the header looks like:
Content-Type=multipart/form-data; boundary=---------------------------99614912995
with the parameter "boundary" seeming to be random, and the POSTDATA is this:
POSTDATA =-----------------------------99614912995
Content-Disposition: form-data; name="SOMENAME"
Formulaire de Quota
-----------------------------99614912995
Content-Disposition: form-data; name="OTHERNAME"
SOMEDATA
-----------------------------99614912995--
I do not understand how to correctly emulate the POSTDATA with the mystery "boundary" parameter coming back.
Would someone know how I can solve this?
To quote from the RFC 1341, section 7.2.1, what I consider to be the relevant bits on the boundary parameter of the Content-Type header (for MIME):
All subtypes of "multipart" share a common syntax ...
The Content-Type field for multipart entities requires one parameter, "boundary", which is used to specify the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.
and then clarifies:
Thus, a typical multipart Content-Type header field might look like this:
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p
This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line
--gc0p4Jq0M2Yt08jU534c0p
Things to Note:
The encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF (Carriage Return-Line Feed)
The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).
Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.
Last but not least:
The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:
--gc0p4Jq0M2Yt08jU534c0p--
I hope this helps someone else in the future, as I had to roam for a while before getting the full picture (please ensure to read the necessary RFCs to get the deepest understanding).
The boundary parameter is set to a number of hyphens plus a random string at the end, but you can set it to anything at all. The problem is, if the boundary string shows up in the request data, it will be treated as a boundary.
For some tips, and an example function for sending multipart/form-data see my answer to this question. It wouldn't be too difficult to modify that function to use a loop for each part you would like to send.
The actual specification for multipart/form-data is in RFC 7578. Boundary is defined in Section 4.1.
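As an illustration of the rules quoted above, the POSTDATA from the question could be rebuilt roughly like this. This is a sketch under assumptions: `encode_multipart` is a made-up helper, it only handles plain text fields (no file uploads), and it uses the boundary value from the question's Content-Type header.

```python
import uuid
from typing import Optional


def encode_multipart(fields: dict, boundary: Optional[str] = None):
    """Return (content_type, body) for simple text form fields."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        if boundary in value:
            # The boundary must never occur inside the data it separates.
            raise ValueError("boundary string appears in field data")
        lines.append(f"--{boundary}")  # delimiter: two hyphens + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from part data
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter: two extra hyphens
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


BOUNDARY = "---------------------------99614912995"
content_type, body = encode_multipart(
    {"SOMENAME": "Formulaire de Quota", "OTHERNAME": "SOMEDATA"},
    boundary=BOUNDARY,
)
print(content_type)
print(body.decode("utf-8"))
```

The returned `content_type` goes into the Content-Type header and `len(body)` into Content-Length; note that each delimiter line in the body has two more hyphens than the boundary parameter itself.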

getting gmail contacts, can't figure out what to set content length to

I'm trying to retrieve my contacts using curl. I've succeeded in getting my authToken, and now am getting an error stating that I need to set the content-length in the header, but when I set the content length to 0 I get a "bad request" error.
Does anyone know what the content length is? Is it the length of the Auth key? or the length of the entire header field that contains it? I'm just poking around in the dark, and the google api doesn't seem to explain what it's looking for.
According to the HTTP standard, Content-Length must be greater than or equal to zero. This header can cause a "bad request" error if:
a Transfer-Encoding header is included in the request with certain values,
the Content-Length is less than the actual length, or
a Content-Length less than zero is sent.
Content length should be the size of the message body (not including headers). This would include POST data (presumably how your authToken is sent) sent with the request.
The length sent shouldn't need to be exact (though you should try!). Most browsers don't care about the length (as long as it is greater than the actual content length). If it is less than the actual content length, most browsers choke, but not the other way around. I'm assuming Google's servers will act similarly.
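For a form POST, the value can be computed from the encoded body itself. The sketch below uses hypothetical field names (the real ones depend on the API being called) and only builds the headers; it does not send anything.

```python
from urllib.parse import urlencode

# Hypothetical form fields, chosen only to show the effect of encoding.
fields = {"name": "café", "q": "a b"}

# Content-Length counts the octets of the encoded body, not the characters
# of the original values and not the headers.
body = urlencode(fields).encode("ascii")
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),
}
print(body)     # b'name=caf%C3%A9&q=a+b'
print(headers["Content-Length"])  # 20
```

A request without a body, such as a plain GET, has nothing to measure, which is consistent with the fix described below.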
So, the solution appears to be that
a) the second request is a GET not a POST
and
b) the username I was passing in requires a fully qualified email (boo#gmail.com, not just boo)

How can I find out whether a server supports the Range header?

I have been trying to stream audio from a particular point by using the Range header values but I always get the song right from the beginning. I am doing this through a program so am not sure whether the problem lies in my code or on the server.
How can I find out whether the server supports the Range header param?
Thanks.
The way the HTTP spec defines it, if the server knows how to support the Range header, it will. That in turn, requires it to return a 206 Partial Content response code with a Content-Range header, when it returns content to you. Otherwise, it will simply ignore the Range header in your request, and return a 200 response code.
This might seem silly, but are you sure you're crafting a valid HTTP request header? All too commonly, I forget to specify HTTP/1.1 in the request, or forget to specify the Range specifier, such as "bytes".
Oh, and if all you want to do is check, then just send a HEAD request instead of a GET request. Same headers, same everything, just "HEAD" instead of "GET". If you receive a 206 response, you'll know Range is supported, and otherwise you'll get a 200 response.
This is for others searching how to do this. You can use curl:
curl -I http://exampleserver.com/example_video.mp4
In the header you should see
Accept-Ranges: bytes
You can go further and test retrieving a range
curl --header "Range: bytes=100-107" -I http://exampleserver.com/example_video.mp4
and in the headers you should see
HTTP/1.1 206 Partial Content
and
Content-Range: bytes 100-107/10000000
Content-Length: 8
[instead of 10000000 you'll see the length of the file]
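If you check this from a program rather than by eye, the Content-Range value can be parsed and compared with Content-Length. A minimal sketch, assuming the single-range "bytes first-last/total" form shown above; `parse_content_range` is a made-up helper name.

```python
import re

# first-last/total, where total may be "*" if the server does not know it
CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str):
    """Return (first, last, total) from a Content-Range value; total may be None."""
    match = CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


first, last, total = parse_content_range("bytes 100-107/10000000")
# Both ends of the range are inclusive, hence the + 1.
print(last - first + 1)  # 8
```

The result, 8, is the Content-Length of the partial response, while `total` is the size of the whole file.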
Although I am a bit late in answering this question, I think my answer will help future visitors. Here is a python method that detects whether a server supports range queries or not.
def accepts_byte_ranges(self, effective_url):
    """Test if the server supports multi-part file download. Method expects effective (absolute) url."""
    import io
    import re
    import pycurl

    c = pycurl.Curl()
    header = io.BytesIO()  # pycurl passes header lines to the callback as bytes
    # Get http header
    c.setopt(c.URL, effective_url)
    c.setopt(c.NOBODY, 1)
    c.setopt(c.HEADERFUNCTION, header.write)
    c.perform()
    c.close()
    header_text = header.getvalue().decode('iso-8859-1')
    header.close()
    # Check if server accepts byte-ranges (header names are case-insensitive)
    if re.search(r'Accept-Ranges:\s*bytes', header_text, re.IGNORECASE):
        return True
    # If server explicitly specifies "Accept-Ranges: none" in the header, we do not attempt partial download.
    if re.search(r'Accept-Ranges:\s*none', header_text, re.IGNORECASE):
        return False
    # There is still hope, try a simple byte range query
    c = pycurl.Curl()
    c.setopt(c.RANGE, '0-0')  # First byte
    c.setopt(c.URL, effective_url)
    c.setopt(c.NOBODY, 1)
    c.perform()
    http_code = c.getinfo(c.HTTP_CODE)
    c.close()
    # Http status code 206 means byte-ranges are accepted
    return http_code == 206
One way is just to try, and check the response. In your case, it appears the server doesn't support ranges.
Alternatively, do a GET or HEAD on the URI, and check for the Accept-Ranges response header.
You can send a GET request with a Range header of bytes=0-0 and check whether the response code is 206; if it is, the response body will contain only the first byte.
You can also use the HEAD method to do the same thing as the first approach; it returns the same response headers and status code without the response body.
Furthermore, you can check the Accept-Ranges response header to judge whether the server supports ranges. Note that a value of none means ranges are not supported, but if the response has no Accept-Ranges field at all, you cannot conclude from that alone that ranges are unsupported.
There is one more thing to know: if you use an open-ended range such as 0- in a GET request to check the response code, the response body will be buffered automatically in the TCP receive window until the window is full.

Resources