Let's say this is the http header:
Content-length: 67728\r\n
Content-type: application/x-genericbytedata-octet-stream\r\n
\r\n
Does it include \0 at the end of the header, or do the "Content-length" bytes start directly after the last \n ?
There is no null byte at the end, so no, it's not included.
To complement the answer above:
The body starts right after the end of the headers. The end of the headers is "\r\n\r\n", "\n\r\n", "\r\n\n" or "\n\n" (that is, two consecutive end-of-line sequences, where a tolerant parser may accept a bare "\n" as well as "\r\n").
Adding a "\0" somewhere in your headers will almost certainly make the server reject your request (silently or not). Injecting a NUL character into a request is usually treated as an attack attempt.
In HTTP, the line separator is "\r\n", and this is what parsers usually look for, not the NUL character. There is one optional exception in headers: a line starting with a space may be treated as a continuation of the previous header (this is the obs-fold syntax), so "X:B\r\n Z\r\n" is in fact the header "X" with value "B Z".
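To make this concrete, here is a minimal Python sketch of how a parser might find the end of the headers and unfold an obs-fold line. The function names split_head_and_body and unfold are made up for this illustration, and the sketch only handles the strict "\r\n" form, not the bare "\n" variants.

```python
import re

def split_head_and_body(raw: bytes):
    """Split a raw HTTP message at the first blank line (CRLF CRLF)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section is not terminated by a blank line")
    return head, body

def unfold(head: bytes) -> bytes:
    """Replace obs-fold (CRLF followed by spaces or tabs) with one space."""
    return re.sub(rb"\r\n[ \t]+", b" ", head)

raw = b"X:B\r\n Z\r\nContent-Length: 3\r\n\r\nabc"
head, body = split_head_and_body(raw)
lines = unfold(head).split(b"\r\n")
print(lines)  # [b'X:B Z', b'Content-Length: 3']
print(body)   # b'abc' (starts right after the blank line, no NUL byte involved)
```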
When a webserver claims Content-Type: text/plain in an HTTP response, can the client assume newlines are '\n', or '\r\n', or something else, or should it allow both?
What do the standards specify? I am lost and confused among the standards. RFC 2046 appears to define the 'plain' subtype, but it refers to RFC 822.
I've skimmed RFC 822 but I'm confused about whether it is saying CRLF (\r\n) is explicitly not allowed (in the message body), or whether CRLF should implicitly be allowed because any ASCII character is legal after the blank line?
RFC 5322 defines the 'internet message format' and I'm not sure if that applies to HTTP (it seems intended for email), but it specifically says the ONLY CR or LF in the message body you should see is the CRLF combination..?
RFC 2046 section 4.1.1 says:
"The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME "text" MUST represent a line break. Use of CR and LF outside of line break sequences is also forbidden."
To be honest though, if you're using this for parsing or display purposes I wouldn't rely on it. Most webservers are going to set the content-type from the file extension, so any Unixy file with a .txt extension is going to get the text/plain content-type (illegally, as far as the paragraph above is concerned).
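If you do have to parse such a body, one tolerant approach is to accept every common line ending rather than only CRLF. This is just a sketch of that idea in Python; split_lines is a name invented here, and whether a lone "\r" should count as a line break is an assumption you may not want to make.

```python
import re

def split_lines(text: str):
    """Split on CRLF, lone CR or lone LF; CRLF is tried first so it counts once."""
    return re.split(r"\r\n|\r|\n", text)

print(split_lines("a\r\nb\nc\rd"))  # ['a', 'b', 'c', 'd']
```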
I am making a request like so in Fiddler2
User-Agent: Fiddler
Host: asdf.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 0
Key=asdf:qwer
When I click Execute, Fiddler edits the last line to read:
Key=asdf: qwer
Note the additional space.
Why is this happening and could it cause problems with my request?
RFC 2616, 4.2 Message Headers:
Each header field consists
of a name followed by a colon (":") and the field value. Field names
are case-insensitive. The field value MAY be preceded by any amount
of LWS, though a single SP is preferred.
[...]
The field-content does not include any leading or trailing LWS:
linear white space occurring before the first non-whitespace
character of the field-value or after the last non-whitespace
character of the field-value. Such leading or trailing LWS MAY be
removed without changing the semantics of the field value. Any LWS
that occurs between field-content MAY be replaced with a single SP
before interpreting the field value or forwarding the message
downstream.
In other words: leading whitespace is to be ignored for the field value, and a space is even preferred. When you do want to send a space, you'll have to quote the string: Some-Header: " foo".
So it's nice of Fiddler to display (and probably send) it like that, though a custom HTTP server that doesn't expect a space there is faulty and should be repaired.
As for your comment regarding the "invalid header name" error the server returns: an HTTP header is defined as such:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = [...]
As you can see, field-name can only consist of a token, which cannot include = (as that is a separator).
So the header name Key=asdf you use is invalid, and the server responds with a 400 Bad Request because of malformed syntax. The more specific "Invalid header name" message you report suggests you're running your site in IIS. Change the = to - for example, and you'll see it'll work.
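A rough way to check a header name against the token rule, written in Python. The separator list is the one from RFC 2616 section 2.2; the name is_token is only for this example, and real servers may be stricter or more lenient.

```python
# Separators from RFC 2616 section 2.2, including SP and HT.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

def is_token(name: str) -> bool:
    """True if name is a non-empty token: printable US-ASCII, no separators."""
    return bool(name) and all(
        32 < ord(ch) < 127 and ch not in SEPARATORS for ch in name
    )

print(is_token("Key=asdf"))  # False, '=' is a separator
print(is_token("Key-asdf"))  # True
```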
This isn't really related to programming, but I'm using this in a program, so I thought it would be best to ask here. Essentially this is a question about handling anomalies in HTTP requests.
A standard request might look like:
GET / HTTP/1.1
Host: example.com
User-Agent: Firefox
My question is, how should HTTP handle "special characters" in parts of the HTTP request that aren't usually tampered with. For instance, what if the method was "POST ME" instead of "GET" (i.e. inclusion of a space); would this be encoded to %20?
Another example, suppose I want one of my headers to be "Class:Test: example", with the extra ":" in the header name (the header value being "example"). Would this be encoded to %3A?
Note: this isn't about whether any web servers out there would accept such encoding; this is about how it should be done. My program is a fuzz tester, so it is supposed to be testing this sort of thing!
The two questions must be answered with "no" and "yes, BUT..."
The "percent encoding" you suggest is defined for content, that is for values, not for the HTTP language syntax. You are mixing up protocol and payload.
You may want to take a look at the RFC that defines HTTP. It clearly defines a syntax. If you stick to that syntax you can create valid extensions (which is what you are trying to do). If you break that syntax you create invalid HTTP requests. That would be something you can do in-house, but most likely such requests won't work on the open internet, where for example proxies come into play. These have to understand your requests on a syntactical level.
For question 2 the answer is "yes, BUT", as I wrote. So a few words about the BUT:
You can specify such headers and they are valid, if you encode the second ':' as you suggested. However, you should understand what you are doing there: you are NOT introducing a hierarchy into header names. Instead you specify a header's content to contain a ':'. That is perfectly fine. It is up to your server component to understand, interpret and react as intended to that content.
The HTTP specification says that the method is a token, so it can't contain any delimiter characters. So "POST ME" would not be a valid method.
Similarly, header names are also tokens, so they can't contain ":". The colon is always taken to be the delimiter between the header name and its contents.
As arkascha says, you should read RFC 2616, which specifies the HTTP protocol.
For your method containing a space, this is not possible, since a request-line is defined as this:
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Method is defined as one of the HTTP/1.1 verbs or an extension-method, being a token (which cannot contain spaces). So the first space the server encounters marks the end of the method. Therefore, a method cannot contain spaces. You can percent-encode it, but the server won't know what to do with a verb like GET%20ME.
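To illustrate why the space cannot be part of the method, here is a small Python sketch that splits a request line on single spaces the way the grammar describes. parse_request_line is a made-up helper, and rejecting anything other than exactly three parts is a simplification.

```python
def parse_request_line(line: str):
    """Split 'Method SP Request-URI SP HTTP-Version' into its three parts."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    return method, target, version

print(parse_request_line("GET / HTTP/1.1\r\n"))  # ('GET', '/', 'HTTP/1.1')

try:
    parse_request_line("POST ME / HTTP/1.1\r\n")
except ValueError as exc:
    print(exc)  # the extra space yields four parts, so the line is rejected
```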
For your Class:Test: example, the http header is defined as:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value
                and consisting of either *TEXT or combinations
                of token, separators, and quoted-string>
And TEXT is defined as:
TEXT = <any OCTET except CTLs,
but including LWS>
And CTL is defined as:
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
So no, you don't have to escape further colons (58); the first one in a header line is always treated as the separator, since a colon is not allowed in a token.
So in your example the field-name is Class, while the field-value is Test: example.
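In code, that rule amounts to splitting at the first colon only. A minimal Python sketch follows; split_header_line is a hypothetical helper, and stripping spaces and tabs around the value follows the leading/trailing whitespace rule for field values.

```python
def split_header_line(line: str):
    """Split a header line at the FIRST colon; later colons belong to the value."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("no colon in header line")
    return name, value.strip(" \t")

print(split_header_line("Class:Test: example"))  # ('Class', 'Test: example')
```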
I am writing an HTTP parser for a transparent proxy. What is stumping me is the Trailer: mentioned in the specs for Transfer-Encoding: chunked. What does it look like?
Normally, a chunked HTTP body ends like this.
0\r\n
\r\n
What I am confused about is how to detect the end of the chunk if there is some sort of trailing headers...
UPDATE: I believe that a simple \r\n\r\n i.e. an empty line is enough to detect the end of trailing headers... Is that correct?
Below is an example trailer copied from The TCP/IP Guide site.
As we can see, if we want to use a trailer header, we need to add a "Trailer: header_name" header field naming that header, and then add the trailer header itself after the chunked body.
We can add zero or more trailer headers to an HTTP body per the RFC.
Section 4.1.2 of RFC 7230 bans the use of the following headers in the trailer:
A sender MUST NOT generate a trailer that contains a field necessary
for message framing (e.g., Transfer-Encoding and Content-Length),
routing (e.g., Host), request modifiers (e.g., controls and
conditionals in Section 5 of RFC7231), authentication (e.g., see
RFC7235 and RFC6265), response control data (e.g., see Section 7.1
of RFC7231), or determining how to process the payload (e.g.,
Content-Encoding, Content-Type, Content-Range, and Trailer).
This means we can use other standard headers and custom headers in the trailer.
0\r\n
SomeAfterHeader: TheData \r\n
\r\n
In other words, to detect the end of a chunked transmission it is sufficient to look for a \r\n\r\n (in layman's terms, a blank line) after the last chunk. It is very important that each chunk is read by its declared size before doing this, because the chunk data itself can contain blank lines which would erroneously be detected as the end of the stream.
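Here is a rough Python sketch of that order of operations: read every chunk by its declared size first, and only look for the blank line once the zero-size chunk has been seen. read_chunked is a name invented for this example, it does no error handling, and it simply ignores chunk extensions.

```python
import io

def read_chunked(stream):
    """Decode a chunked body from a binary stream; return (payload, trailers)."""
    payload = bytearray()
    while True:
        size_line = stream.readline()                 # e.g. b"1a;ext=1\r\n"
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break                                     # last-chunk reached
        payload += stream.read(size)                  # data may contain blank lines
        stream.readline()                             # CRLF that ends the chunk data
    trailers = []
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):                    # blank line ends the trailer
            break
        trailers.append(line.rstrip(b"\r\n"))
    return bytes(payload), trailers

wire = (b"6\r\na\r\n\r\nb\r\n"                        # one chunk holding a blank line
        b"0\r\n"
        b"Date: Sun, 06 Nov 1994 08:49:37 GMT\r\n"
        b"\r\n")
payload, trailers = read_chunked(io.BytesIO(wire))
print(payload)   # b'a\r\n\r\nb'
print(trailers)  # [b'Date: Sun, 06 Nov 1994 08:49:37 GMT']
```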
Regarding trailer:
The list of trailing headers should be specified in the Trailer header, as you note.
The BNF in Section 14.40 of RFC 2616 is this:
Trailer = "Trailer" ":" 1#field-name
Gourley and Totty give this example:
Trailer: Content-Length
(It's odd that they give this example, since Content-Length is explicitly forbidden to be a trailing header in 14.40.)
Shiflett gives this example:
Trailer: Date
Regarding end of message with trailing headers:
The BNF in Section 3.6.1 of RFC 2616 is what you're looking for. Here's part:
Chunked-Body = *chunk
               last-chunk
               trailer
               CRLF
last-chunk = 1*("0") [ chunk-extension ] CRLF
trailer = *(entity-header CRLF)
So the last chunk and 2 trailing headers might look like this:
0<CRLF>
Date:Sun, 06 Nov 1994 08:49:37 GMT<CRLF>
Content-MD5:1B2M2Y8AsgTpgAmY7PhCfg==<CRLF>
<CRLF>
I am trying to develop a sidebar gadget that automates checking a web page to track my transfer quota. I am almost there, but one last step is missing: sending an HttpRequest with the correct POST data to a PHP page. According to a Firefox plugin, here is what the "Content-Type" of the header looks like:
Content-Type=multipart/form-data; boundary=---------------------------99614912995
with the parameter "boundary" seeming to be random, and the POSTDATA is this:
POSTDATA =-----------------------------99614912995
Content-Disposition: form-data; name="SOMENAME"
Formulaire de Quota
-----------------------------99614912995
Content-Disposition: form-data; name="OTHERNAME"
SOMEDATA
-----------------------------99614912995--
I do not understand how to correctly emulate the POSTDATA with the mystery "boundary" parameter coming back.
Would someone know how I can solve this?
To quote from RFC 1341, section 7.2.1, what I consider to be the relevant bits on the boundary parameter of the Content-Type header (for MIME):
All subtypes of "multipart" share a common syntax ...
The Content-Type field for multipart entities requires one parameter, "boundary", which is used to specify the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.
and then clarifies:
Thus, a typical multipart Content-Type header field might look like this:
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p
This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line
--gc0p4Jq0M2Yt08jU534c0p
Things to Note:
The encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF (Carriage Return-Line Feed)
The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).
Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.
Last but not least:
The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:
--gc0p4Jq0M2Yt08jU534c0p--
I hope this helps someone else in the future, as I had to roam for a while before getting the full picture (please ensure to read the necessary RFCs to get the deepest understanding).
The boundary parameter is set to a number of hyphens plus a random string at the end, but you can set it to anything at all. The problem is, if the boundary string shows up in the request data, it will be treated as a boundary.
For some tips, and an example function for sending multipart/form-data see my answer to this question. It wouldn't be too difficult to modify that function to use a loop for each part you would like to send.
The actual specification for multipart/form-data is in RFC 7578. Boundary is defined in Section 4.1.
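As a rough illustration of the format described in the answers above, here is a Python sketch that builds a multipart/form-data body for simple text fields. encode_multipart is a hypothetical helper, the boundary is just a random hex string, and field names and values are assumed not to need any escaping.

```python
import uuid

def encode_multipart(fields: dict):
    """Return (content_type, body) for plain text form fields."""
    boundary = uuid.uuid4().hex
    # Pick another boundary in the unlikely case it occurs inside the data.
    while any(boundary in value for value in fields.values()):
        boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            "--" + boundary,                            # delimiter before each part
            f'Content-Disposition: form-data; name="{name}"',
            "",                                         # blank line ends part headers
            value,
        ]
    lines += ["--" + boundary + "--", ""]               # closing delimiter
    body = "\r\n".join(lines).encode("utf-8")
    return "multipart/form-data; boundary=" + boundary, body

content_type, body = encode_multipart(
    {"SOMENAME": "Formulaire de Quota", "OTHERNAME": "SOMEDATA"}
)
print(content_type)
print(body.decode("utf-8"))
```

The returned content_type goes into the Content-Type request header and body is sent as the POST data, so the boundary in the header always matches the one used in the body.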