I am making a request like so in Fiddler2
User-Agent: Fiddler
Host: asdf.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 0
Key=asdf:qwer
When I click Execute, Fiddler edits the last line to read:
Key=asdf: qwer
Note the additional space.
Why is this happening and could it cause problems with my request?
RFC 2616, 4.2 Message Headers:
Each header field consists
of a name followed by a colon (":") and the field value. Field names
are case-insensitive. The field value MAY be preceded by any amount
of LWS, though a single SP is preferred.
[...]
The field-content does not include any leading or trailing LWS:
linear white space occurring before the first non-whitespace
character of the field-value or after the last non-whitespace
character of the field-value. Such leading or trailing LWS MAY be
removed without changing the semantics of the field value. Any LWS
that occurs between field-content MAY be replaced with a single SP
before interpreting the field value or forwarding the message
downstream.
In other words: leading whitespace is to be ignored for the field value, and a space is even preferred. When you do want to send a space, you'll have to quote the string: Some-Header: " foo".
So it's nice of Fiddler to display (and probably send) it like that, though a custom HTTP server that doesn't expect a space there is faulty and should be repaired.
As for your comment regarding the "invalid header name" error the server returns: an HTTP header is defined as such:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = [...]
As you can see, field-name can only exist of token, which does not include = (as that is a separator).
So the header name Key=asdf you use is invalid and the server throws a 400 Bad Request because of malformed syntax. The more specific Invalid header name you claim to get, sounds like you're running your site in IIS. Change the = to - for example, and you'll see it'll work.
Related
Let's say this is the http header:
Content-length: 67728\r\n
Content-type: application/x-genericbytedata-octet-stream\r\n
\r\n
Does it include \0 at the end of the header or the "Content-length" bytes start directly after the last \n ?
There is no null byte at the end, so no, it's not included.
In complement.
The Body will start after the end of the headers. And of the headers is "\r\n\r\n", "\n\r\n", "\r\n\n" or "\n\n" (that is two valid end-of-line sequences).
Adding an "\0" somewhere in your headers will almost certainly makes the server reject your request (silently or not). Trying to inject NULL character in a request is usually considered as an attack attempt.
In HTTP line separator syntax is "\r\n", and usually this is what the parsers are seeking, not the NULL character. With one optional exception, in headers, where a line starting with a space may be considered as part of the previous header (this is the obs-fold syntax), so "X:B\r\n Z\r\n" is in fact the header "X" with value "B Z".
This isn't really related to programming, but I'm using this in a program, so I thought it would be best to ask here. Essentially this is a question about handling anomalies in HTTP requests.
A standard request might look like:
GET / HTTP/1.1
Host: example.com
User-Agent: Firefox
My question is, how should HTTP handle "special characters" in parts of the HTTP request that aren't usually tampered with. For instance, what if the method was "POST ME" instead of "GET" (i.e. inclusion of a space); would this be encoded to %20?
Another example, suppose I want one of my headers to be "Class:Test: example", with the extra ":" in the header name (the header value being "example"). Would this be encoded to %3A?
Note: this isn't about whether any web servers out there would accept such encoding; this is about how it should be done. My program is a fuzz tester, so it is supposed to be testing this sort of thing!
The two question must be answered as "no" and "yes, BUT..."
The "percent encoding" you suggest is defined for content, values, not for the http language syntax. You mix protocol and payload.
You may want to take a look at the RFC that defines HTTP. It clearly defines a syntax. If you stick to that syntax you can create valid extensions (which is what you are trying to do). If you break that syntax you create invalid http requests. That would be a thing you can do inhouse, but most likely such requests won't work in the open internet, where for example proxies come into play. These have to understand your requests on y syntactical level.
For question 2 the answer is "yes, BUT", I wrote. So a few words to the BUT:
You can specify such headers and they are valid, if you encode the second ':' as you suggested. However you should understand what you are doing there: you are NOT introducing a hierarchy into header names. Instead you specify a headers content to contain a ':'. That is perfectly fine. It is up to your server component to understand, interpret and react as intended to that content.
The HTTP specification says that the method is a token, so it can't contain any delimiter characters. So "POST ME" would not be a valid method.
Similarly, header names are also tokens, so they can't contain ":". The colon is always taken to be the delimiter between the header name and its contents.
As arkascha says, you should read RFC 2616, which specifies the HTTP protocol.
For your method containing a space, this is not possible, since a request-line is defined as this:
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Method is defined as one of the HTTP/1.1 verbs or an extension-method, being a token (which cannot contain spaces). So the first space the server encounters marks the end of the method. Therefore, a method cannot contain spaces. You can percent-encode it, but the server won't know what to do with a verb like GET%20ME.
For your Class:Test: example, the http header is defined as:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value
and consisting of either *TEXT or combinations
of token, separators, and quoted-string>
And TEXT is defined as:
TEXT = <any OCTET except CTLs,
but including LWS>
And CTL is defined as:
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
So no, you don't have to escape further colons (58), the first one in a header-line is always accounted as being a separator, since a colon is not allowed in a token.
So in your example the field-name is Class, while the field-value is Test: example.
I am reading through the HTTP 1.1 RFCs and I am not able to answer the following question.
We have this header:
Authorization: Basic Qmxvb21iZXJnOnRjbG1lU1JT, Basic
which is causing troubles because Rails 3 authorization parser incorrectly decodes the string because of the "," character. This is very uncommon I know, but we add this using this Apache httpd configuration:
RequestHeader append Authorization "Basic" early
The Apache mod_header documentation says:
The response header is appended to any existing header of the same
name. When a new value is merged onto an existing header it is
separated from the existing header with a comma. This is the HTTP
standard way of giving a header multiple values.
But I don't think it is correct for this Authorization header. The RFC definition does not allow this. But some headers permit comma-separated list. I am not sure if this is a general rule for all HTTP headers.
I am looking for a paragraph in the HTTP 1.1 RFC that prooves my idea this is not correct. I have already found something that is saying "this is valid only for headers that can be separated", but this is not a proof.
Multiple message-header fields with the same field-name MAY be present
in a message if and only if the entire field-value for that header
field is defined as a comma-separated list [i.e., #(values)]. It MUST
be possible to combine the multiple header fields into one
"field-name: field-value" pair, without changing the semantics of the
message, by appending each subsequent field-value to the first, each
separated by a comma. The order in which header fields with the same
field-name are received is therefore significant to the interpretation
of the combined field value, and thus a proxy MUST NOT change the
order of these field values when a message is forwarded.
It really does not make sense, but I am looking for a clear proof.
The answer is in the text you quoted:
"Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]."
This is not the case for "Authorization".
As far as I know it is allowed by the HTTP spec to set more than one HTTP header with the same name. Is there any use case to do so (from client to server and vice versa)?
HTTP 1.1 Section 4.2:
Multiple message-header fields with
the same field-name MAY be present in
a message if and only if the entire
field-value for that header field is
defined as a comma-separated list
[i.e., #(values)]. It MUST be possible
to combine the multiple header fields
into one "field-name: field-value"
pair, without changing the semantics
of the message, by appending each
subsequent field-value to the first,
each separated by a comma. The order
in which header fields with the same
field-name are received is therefore
significant to the interpretation of
the combined field value, and thus a
proxy MUST NOT change the order of
these field values when a message is
forwarded.
If I'm not wrong there is no case where multiple headers with the same name are needed.
It's commonly used for Set-Cookie:. Many servers set more than one cookie.
Of course, you can always set them all in a single header.
Actually, I think you cannot set multiple cookies in one header. So that's a necessary use-case.
The Cookie spec (RFC 2109) does claim that you can combine multiple cookies in one header the same way other headers can be combined (comma-separated), but it also points out that non-conforming syntaxes (like the Expires parameter, which has ,s in its value) are still common and must be dealt with by implementations.
So, if you use Expires params in your Set-Cookie headers and you don't want all your cookies to expire at the same time, you probably need to use multiple headers.
Update: Evolution of the Cookie spec
RFC 2109 has been obsoleted by RFC 2965 that in turn got obsoleted by RFC 6265, which is stricter on the issue:
Origin servers SHOULD NOT fold multiple Set-Cookie header fields into a single header field. The usual mechanism for folding HTTP headers fields (i.e., as defined in [RFC2616]) might change the semantics of the Set-Cookie header field because the %x2C (",") character is used by Set-Cookie in a way that conflicts with such folding.
Side note
RFC 6265 uses the verb "folding" when it refers to combining multiple header fields into one, which is ambiguous in the context of the HTTP/1 specs (both by RFC2616, and its successor, RFC 7230) where:
"folding" consistently refers to line folding, and
the verb "combine" is used to describe merging same headers.
Combining header fields:
See RFC 2616, Section 4.2, Message Headers (quoted in the question), but searching for the for the word "combine" will bring up special cases.
The above item obsoleted by RFC 7230, Section 3.2.2, Field Order:
A recipient MAY combine multiple header fields with the same field name into one field-name: field-value pair, without changing the semantics of the message, by appending each subsequent field value to the combined field value in order, separated by a comma. The order in which header fields with the same field name are received is therefore significant to the interpretation of the combined field value; a proxy MUST NOT change the order of these field values when forwarding a message.
Note: In practice, the "Set-Cookie" header field (RFC6265) often appears multiple times in a response message and does not use the list syntax, violating the above requirements on multiple header fields with the same name. Since it cannot be combined into a single field-value, recipients ought to handle Set-Cookie as a special case while processing header fields. (See Appendix A.2.3 of [Kri2001] for details.)
Line folding:
From RFC 2616, Section 2.2, Basic Rules:
HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab. All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.
The above section obsoleted by RFC 7230, Section 3.2.4, Field Parsing:
Historically, HTTP header field values could be extended over multiple lines by preceding each extra line with at least one space or horizontal tab (obs-fold). This specification deprecates such line folding except within the message/http media type (Section 8.3.1). A sender MUST NOT generate a message that includes line folding (i.e., that has any field-value that contains a match to the obs-fold rule) unless the message is intended for packaging within the message/http media type.
A server that receives an obs-fold in a request message that is not within a message/http container MUST either reject the message by sending a 400 (Bad Request), preferably with a representation explaining that obsolete line folding is unacceptable, or replace each received obs-fold with one or more SP octets prior to interpreting the field value or forwarding the message downstream.
A proxy or gateway that receives an obs-fold in a response message that is not within a message/http container MUST either discard the message and replace it with a 502 (Bad Gateway) response, preferably with a representation explaining that unacceptable line folding was received, or replace each received obs-fold with one or more SP octets prior to interpreting the field value or forwarding the message downstream.
A user agent that receives an obs-fold in a response message that is not within a message/http container MUST replace each received obs-fold with one or more SP octets prior to interpreting the field value.
Since duplicate headers can cause issues with various web-servers and APIs (regardless of what the spec says), I doubt there is any general purpose use case where this is best practice. That's not to say someone somewhere isn't doing it, of course.
As you're looking for use-cases, maybe Accept would be a valid one.
Accept: application/json
Accept: application/xml
It's only allowed for headers using a very specific format, see RFC 2616, Section 4.2.
Old thread, but I was looking into this same issue. Anyway, the Accept and Accept-Encoding headers are typical examples that uses multiple values, comma separated. Even if these are request specific header, the specs do not differentiate between request and response at this level. Check the one from this page.
What the spec says is that if you have commas as character in the value of the header, you cannot use multiple headers of the same name, unless you disambiguate the use of the comma.
Are urls of the form http://asdf.com/something.do?param1=true?param2=false valid?
I don't think the second ? is allowed in valid urls and that it should instead be an ampersand (&), but I'm unable to find anything about this in the http 1.1 rfc. Any ideas?
It is not valid to use ? again. ? should indicate the start of the parameter list. & should separate parameters.
From RFC 3986:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
From RFC 1738 :
An HTTP URL takes the form:
http:// <host> : <port> / <path> ? <searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
The search part/query part is described here.
use & for the second and third
i.e. http://asdf.com/something.do?param1=true¶m2=false
application/x-www-form-urlencoded
This is the default content type. Forms submitted with this content type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).
The control names/values are listed in the order they appear in the document. The name is separated from the value by = and name/value pairs are separated from each other by &.
— application/x-www-form-urlencoded
As mentioned, it's not valid to use it again. However, if you have the ? character as part of a parameter value, you can encode it as %63 (just like the space character which gets encoded as %20).