Is white space allowed between MIME header field-name and ':' separator - http

Within a mime header, is white space allowed between the header field-name and ':' separator? For example, are:
Content-Type: <value>
and
Content-Type : <value>
equivalent?
Also, can you please provide a pointer to the mime standard where this is described? I checked a few but did not find it.
Thanks

Depends on what you mean by 'allowed'. RFCs 2822 (which obsoleted the 1982 RFC822) and 5322 (which obsoleted 2822) specifically forbid the insertion of WS between the field name and the colon (these are not 'MIME' standards, BTW). Note that : is not a token, and is only referenced as part of a field name, for example:
from = "From:" mailbox-list CRLF
However, the ancient RFC822 did allow space here, and the newer RFCs state that the obsolete syntax "MUST be accepted and parsed by a conformant receiver". The obsolete From: header definition, for example, was
obs-from = "From" *WSP ":" mailbox-list CRLF
Section 4 covers the obsolete syntax. I don't actually allow obsolete syntax in my own receiver, and I've never had a problem.
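To make the difference between the two grammars concrete, here is a minimal sketch in Python. The helper name `split_header` and the `allow_obsolete` flag are made up for illustration and are not part of any library. It assumes the line has already been unfolded, and it treats any printable US-ASCII character except the colon as valid in a field name.

```python
import re

# Strict form: the field name is immediately followed by the colon.
STRICT = re.compile(r"^([!-9;-~]+):[ \t]*(.*)$")
# Obsolete form: optional spaces or tabs (WSP) may precede the colon.
OBSOLETE = re.compile(r"^([!-9;-~]+)[ \t]*:[ \t]*(.*)$")


def split_header(line, allow_obsolete=True):
    """Return (name, value) for one unfolded header line, or None if it does not parse."""
    pattern = OBSOLETE if allow_obsolete else STRICT
    match = pattern.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)


# A lenient receiver accepts both spellings; a strict one rejects the second.
print(split_header("Content-Type: text/plain"))
print(split_header("Content-Type : text/plain"))
print(split_header("Content-Type : text/plain", allow_obsolete=False))
```

Whether to default to the lenient or the strict behaviour is a design choice; the sketch defaults to lenient only because that is what the obsolete-syntax rule asks of a receiver.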

It isn't entirely clear from the standard whether it is allowed. In practice, implementations vary in how they handle whitespace between header field names and the colon, so I would highly recommend avoiding whitespace there if you can.
The RFC for reference. This somewhat old article discusses the issue for HTTP headers, a similar standard.

If the question is about HTTP, then the answer is "no, not allowed". See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-21.html#rfc.section.3.2

Related

Is the "?" in URLs completely arbitrary (disregarding reserved/non-escaped character problems, etc.)?

For example, if for whatever stupid reason I configured my server to parse the URL by splitting the queries by the "^" symbol (escaped if necessary) and the "-" symbol instead of the "?" and "&", would I run into any trouble at all apart from a confused user?
Will the browser/HTTP request sent treat it differently in a way that may be detrimental to my up and coming "power minus" business?
? is not arbitrary; it is defined in the URI RFC, section 3.4 (Query), and I don't think you can change that.
The internal syntax of the query component (how name=value pairs are encoded) is not defined by the URI RFC; separators can be defined by other specifications:
& is defined as the separator of the application/x-www-form-urlencoded content type by the HTML spec. You may change this, for example by supporting ; as a separator, but you would in any case have to support & when processing requests produced by an HTML form.
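As a rough illustration in Python, a generic URL parser only recognises "?" as the start of the query, so a "^"-delimited URL ends up entirely in the path and the server has to split it itself. The function `parse_custom` and the example URLs are hypothetical and only serve the illustration.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

standard = urlsplit("http://example.com/shop?a=1&b=2")
custom = urlsplit("http://example.com/shop^a=1-b=2")

# The standard URL has a query component; the custom one does not.
print(repr(standard.path), repr(standard.query))
print(repr(custom.path), repr(custom.query))

# The conventional "&" separator is handled by the standard library.
print(parse_qsl(standard.query))


def parse_custom(path, query_marker="^", pair_sep="-"):
    """Split a path that uses home-grown query delimiters (hypothetical scheme)."""
    base, _, raw = path.partition(query_marker)
    pairs = [p.partition("=") for p in raw.split(pair_sep) if p]
    # Literal "^" or "-" inside names or values would need percent-encoding.
    return base, {unquote(k): unquote(v) for k, _, v in pairs}


print(parse_custom(custom.path))
```

Anything that builds URLs for you (HTML forms, HTTP client libraries, proxies that log or rewrite queries) will keep producing and expecting "?" and "&", which is where the home-grown scheme is likely to cause friction.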

What is the correct newline to use with text/plain ContentType?

When a webserver claims Content-Type: text/plain in an HTTP response, can the client assume newlines are '\n', or '\r\n', or something else, or should it allow both?
What do the standards specify? I am lost and confused among them. RFC 2046 appears to define the 'plain' subtype, but it refers to RFC 822 there.
I've skimmed RFC 822 but I'm confused about whether it is saying CRLF (\r\n) is explicitly not allowed (in the message body), or whether CRLF should implicitly be allowed because any ASCII character is legal after the blank line?
RFC 5322 defines the 'internet message format' and I'm not sure if that applies to HTTP (it seems intended for email), but it specifically says the ONLY CR or LF in the message body you should see is the CRLF combination..?
RFC 2046 section 4.1.1 says:
"The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME "text" MUST represent a line break. Use of CR and LF outside of line break sequences is also forbidden."
To be honest though, if you're using this for parsing or display purposes I wouldn't rely on it. Most webservers are going to set the content-type from the file extension, so any Unixy file with a .txt extension is going to get the text/plain content-type (illegally, as far as the paragraph above is concerned).
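Since real servers send all three conventions, a tolerant client can simply accept CRLF, bare CR and bare LF alike. A minimal Python sketch follows; the function names are made up for this example.

```python
import re

# Match CRLF first so that "\r\n" counts as one line break, not two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def to_canonical_crlf(body):
    """Rewrite every line break in a text/plain body to the canonical CRLF."""
    return _LINE_BREAK.sub(b"\r\n", body)


def split_lines(body):
    """Split a text/plain body into lines, whatever line break the sender used."""
    return _LINE_BREAK.split(body)


print(to_canonical_crlf(b"unix\nwindows\r\nold mac\rend"))
print(split_lines(b"unix\nwindows\r\nold mac\rend"))
```

Working on bytes rather than decoded text keeps the sketch independent of the charset, on the assumption that the charset is ASCII-compatible.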

Reason Phrase charset

What is the charset used for the HTTP Reason Phrase?
If I use a special character such as è (UTF-8 encoded), Chrome works well, but Firefox shows "é".
I can't find anything about that in the reference: http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6.1.1
The production in RFC 2616 is
Reason-Phrase = *<TEXT, excluding CR, LF>
and the RFC explains: “The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047”. This suggests that the implied encoding is ISO-8859-1, so Firefox would be right here.
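The effect is easy to reproduce in Python: a client that assumes ISO-8859-1 reads each byte of a UTF-8 sequence as a separate character. The helper names below are hypothetical, and the check for plain ASCII is only one conservative way to sidestep the problem.

```python
def as_seen_by_latin1_client(reason_phrase):
    """Show how a UTF-8 encoded phrase looks when decoded as ISO-8859-1."""
    return reason_phrase.encode("utf-8").decode("iso-8859-1")


def is_ascii(reason_phrase):
    """True if the phrase reads the same under ASCII, ISO-8859-1 and UTF-8."""
    try:
        reason_phrase.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# "è" is two bytes in UTF-8 (0xC3 0xA8), so it turns into two characters.
print(ascii(as_seen_by_latin1_client("è")))
print(is_ascii("Not Found"), is_ascii("è"))
```

Keeping reason phrases to plain ASCII avoids depending on which encoding a given client assumes.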

Is Request.Headers["Header-Name"] in ASP.NET case-sensitive?

Is Request.Headers["Header-Name"] in ASP.NET case-sensitive? And if it is, how should I get a certain header (e.g. "X-requested-with") if I don't know for sure what case the client will send it in?
No, they are case-insensitive, as per RFC 2616:
4.2 Message Headers
HTTP header fields, which include general-header (section 4.5),
request-header (section 5.3), response-header (section 6.2), and
entity-header (section 7.1) fields, follow the same generic format as
that given in Section 3.1 of RFC 822 [9]. Each header field consists
of a name followed by a colon (":") and the field value. Field names
are case-insensitive. The field value MAY be preceded by any amount
of LWS, though a single SP is preferred. Header fields can be
extended over multiple lines by preceding each extra line with at
least one SP or HT. Applications ought to follow "common form", where
one is known or indicated, when generating HTTP constructs, since
there might exist some implementations that fail to accept anything
beyond the common forms.
Request.Headers is case-insensitive.
Borrowing from this answer:
From RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", §4.2, "Message Headers":
Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive.
I have never used ASP.NET, but the HTTP/1.1 RFC defines message-header field names as case-insensitive.
If ASP.NET follows the HTTP specification, Request.Headers["Header-Name"] will return the same value as Request.Headers["header-name"].
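The same behaviour can be seen outside ASP.NET. As an analogy only (this is Python's standard library, not the ASP.NET API), `email.message.Message` stores headers with the case they were given in but looks them up case-insensitively:

```python
from email.message import Message

headers = Message()
# Stored with whatever case the sender happened to use.
headers["X-Requested-With"] = "XMLHttpRequest"

# Lookups ignore case, so the caller does not need to know the original spelling.
print(headers["x-requested-with"])
print(headers["X-REQUESTED-WITH"])
print(headers["X-Missing"])
```

A missing header yields `None` rather than raising, so the result still needs a check before use.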

What is the boundary parameter in an HTTP multi-part (POST) Request?

I am trying to develop a sidebar gadget that automates checking a web page for the evolution of my transfer quota. I am almost there, but there is one last step I need to get it working: sending an HttpRequest with the correct POST data to a PHP page. Using a Firefox plugin, here is what the "Content-Type" of the header looks like:
Content-Type=multipart/form-data; boundary=---------------------------99614912995
with the parameter "boundary" seeming to be random, and the POSTDATA is this:
POSTDATA =-----------------------------99614912995
Content-Disposition: form-data; name="SOMENAME"
Formulaire de Quota
-----------------------------99614912995
Content-Disposition: form-data; name="OTHERNAME"
SOMEDATA
-----------------------------99614912995--
I do not understand how to correctly emulate the POSTDATA with the mystery "boundary" parameter coming back.
Would someone know how I can solve this?
To quote from RFC 1341, section 7.2.1, what I consider to be the relevant bits on the boundary parameter of the Content-Type header (for MIME):
All subtypes of "multipart" share a common syntax ...
The Content-Type field for multipart entities requires one parameter, "boundary", which is used to specify the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.
and then clarifies:
Thus, a typical multipart Content-Type header field might look like this:
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p
This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line
--gc0p4Jq0M2Yt08jU534c0p
Things to Note:
The encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF (Carriage Return-Line Feed)
The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).
Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.
Last but not least:
The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:
--gc0p4Jq0M2Yt08jU534c0p--
I hope this helps someone else in the future, as I had to roam for a while before getting the full picture (please ensure to read the necessary RFCs to get the deepest understanding).
Browsers typically set the boundary parameter to a number of hyphens plus a random string at the end, but you can choose almost any value yourself. The problem is, if the boundary string shows up in the request data, it will be treated as a boundary.
For some tips, and an example function for sending multipart/form-data see my answer to this question. It wouldn't be too difficult to modify that function to use a loop for each part you would like to send.
The actual specification for multipart/form-data is in RFC 7578. Boundary is defined in Section 4.1.
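Putting the rules above together, a request body like the one in the question can be assembled by hand. This is a minimal Python sketch for plain text fields only (no file uploads); the function name `encode_multipart` and the use of a random hex string as the default boundary are assumptions of this example.

```python
import uuid


def encode_multipart(fields, boundary=None):
    """Return (content_type, body) for a dict of simple text form fields."""
    if boundary is None:
        # Any unique token of at most 70 characters will do; random hex is one choice.
        boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in fields.items():
        data = value.encode("utf-8")
        if delimiter in data:
            raise ValueError("boundary occurs inside the field data; pick another one")
        chunks.append(delimiter + b"\r\n")
        chunks.append(b'Content-Disposition: form-data; name="' + name.encode("utf-8") + b'"\r\n')
        chunks.append(b"\r\n")  # blank line separates the part headers from the part body
        chunks.append(data + b"\r\n")
    chunks.append(delimiter + b"--\r\n")  # closing delimiter: two extra hyphens
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, b"".join(chunks)


content_type, body = encode_multipart(
    {"SOMENAME": "Formulaire de Quota", "OTHERNAME": "SOMEDATA"}
)
print(content_type)
print(body.decode("utf-8"))
```

The returned `content_type` goes into the request's Content-Type header and `body` is sent as the POST data. The boundary does not have to match the one a browser would have generated, only the one announced in the header.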

Resources