Email vs http: line endings - http

Suppose you have a plain content part:
Header: value
Header: value
content...
In HTTP, there is no requirement to add an additional "\r\n\r\n" at the end of the content (see "Should newline be included in http response content length?"); HTTP uses only Content-Length (or chunked encoding) to determine the size of the message.
However, the question is: is it necessary for email? I can't find the exact place in the (many) RFCs related to mail that defines how a "normal" content part should end.
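To illustrate the HTTP side of the comparison, here is a minimal sketch of framing by Content-Length. The function name read_body and the sample response are made up for illustration, and chunked transfer encoding is deliberately not handled.

```python
def read_body(raw: bytes) -> bytes:
    """Return the body of one HTTP message, sized by Content-Length only."""
    # The header block ends at the first empty line.
    head, _, rest = raw.partition(b"\r\n\r\n")
    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status/request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header (chunked framing is not handled here)")
    # Exactly `length` bytes belong to the body; no trailing newline is implied.
    return rest[:length]


# Hypothetical response used only for this example.
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
body = read_body(response)
```

Anything that follows the counted bytes, including a stray "\r\n\r\n", is simply not part of this message's body.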

MIME content parts end just before the newline before the content separator. Thus in the following fragment
--moo
Foo
--moo
Bar

--moo--
... the first part lacks a trailing newline, whereas the second ends with one.
Pre-MIME messages were not explicitly standardized on this particular point; but due to the requirements of SMTP, it wasn't possible to transmit a message which didn't have a final newline at the end.
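The rule about where a MIME part ends can be sketched with a small hand-written splitter. This is only an illustration under simplifying assumptions: split_mime_parts is a hypothetical helper, it treats each part as opaque bytes (no header parsing, as in the fragment above), and it does not guard against a longer boundary that merely starts with the same characters.

```python
def split_mime_parts(body: bytes, boundary: bytes) -> list[bytes]:
    """Split a multipart body into raw parts; headers are not parsed."""
    # The CRLF *before* "--boundary" belongs to the delimiter, not to the part.
    delimiter = b"\r\n--" + boundary
    # Prefix a CRLF so a delimiter on the very first line is matched too.
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:          # chunks[0] is the preamble
        if chunk.startswith(b"--"):   # close delimiter, e.g. "--moo--"
            break
        # Drop the CRLF that ends the delimiter line itself.
        parts.append(chunk.split(b"\r\n", 1)[1])
    return parts


# Same shape as the fragment in the answer: the second part has an extra
# empty line before the closing delimiter.
fragment = b"--moo\r\nFoo\r\n--moo\r\nBar\r\n\r\n--moo--\r\n"
parts = split_mime_parts(fragment, b"moo")
```

With this input the first part comes out without a trailing newline, while the second keeps one CRLF, because only the final CRLF before each delimiter is consumed.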

Related

Do I need to explicitly fold header lines in TidHttp ver 10.5498?

Please can someone (Remy Lebeau?) clarify the point on header line folding in TidHTTP?
My server expects headers to be folded if the line exceeds 998 characters, which one of mine certainly will.
Among many other posts discussing this, I found one more or less definitive post from a while back where Remy says
by default the TIdHeaderList.FoldLines property is set to True
and
the default value of the TIdHeaderList.FoldLength property is 78
which seem to indicate that I don't need to do anything special to get my headers folded when using TIdHTTP.
However, looking at the source code of TidHTTP I find comments from Remy such as these
(in TIdCustomHTTP.Post)
Currently when issuing a POST, IdHTTP will automatically set the
protocol to version 1.0 independently of the value it had initially.
and
(in TIdHTTPProtocol.BuildAndSendRequest)
TODO: disable header folding for HTTP 1.0 requests
These appear to indicate that my request will use HTTP 1.0 anyway, whether or not I ask for 1.1, and that the header lines will not be folded in either case.
My question therefore is simply: when using TidHttp ver 10.5498 do I need the lines
IdHTTP1.Request.CustomHeaders.FoldLines := true;
IdHTTP1.Request.CustomHeaders.FoldLength := 998; //could be less, but not more
or can I simply accept the defaults and be confident that my headers will be correctly folded?
The default FoldLength is 78 chars unless the QuoteType is QuoteHTTP, then the default is MaxInt instead (effectively disabling folding for HTTP headers even if FoldLines is True). So, if you want your HTTP headers folded at 998 chars, you do need to set the FoldLength manually.
Note that while RFC 1945 (for HTTP 1.0) and RFC 2616 (for HTTP 1.1) do allow headers to be folded:
Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT, though this is not recommended.
Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.
RFC 7230 (which updates HTTP 1.1) deprecates that practice:
Historically, HTTP header field values could be extended over multiple lines by preceding each extra line with at least one space or horizontal tab (obs-fold). This specification deprecates such line folding except within the message/http media type (Section 8.3.1). A sender MUST NOT generate a message that includes line folding (i.e., that has any field-value that contains a match to the obs-fold rule) unless the message is intended for packaging within the message/http media type.
As for TIdHTTP forcing HTTP 1.0 for POST requests, you can prevent that by enabling the hoKeepOrigProtocol flag in the TIdHTTP.HTTPOptions property.
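For readers unfamiliar with what folding actually does on the wire, here is a language-neutral sketch in Python. It is not Indy's implementation: fold_header and unfold_header are hypothetical helpers, the fold is only made at a space, and the sample header is invented.

```python
import re


def fold_header(line: str, limit: int = 998) -> str:
    """Fold one header line at spaces so no physical line exceeds `limit` chars."""
    out = []
    while len(line) > limit:
        # Last space at or before the limit; start at 1 so a continuation
        # line is never cut at its own leading space.
        cut = line.rfind(" ", 1, limit + 1)
        if cut == -1:
            break  # nowhere to fold; leave the remainder as it is
        out.append(line[:cut])
        line = line[cut:]  # the continuation keeps its leading space
    out.append(line)
    return "\r\n".join(out)


def unfold_header(raw: str) -> str:
    """Replace each obs-fold (CRLF followed by SP/HT) with a single space."""
    return re.sub(r"\r\n[ \t]+", " ", raw)


# Tiny limit chosen only to make the effect visible.
line = "X-Long: aaa bbb ccc"
folded = fold_header(line, limit=12)
restored = unfold_header(folded)
```

Each continuation line starts with whitespace, which is how a receiver recognises it as part of the previous header; since RFC 7230 forbids senders from generating such folds, a sketch like this is mainly useful for talking to servers that still expect them.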

HTTP header - about the delimiter

Recently I read something about the HTTP header format. I think I found a rule about it, but I cannot confirm it.
For example:
Some-Header:Foo;x=foo_attr,Bar;y=bar_attr\r\n
Foo and Bar are the list items of Some-Header, x is the attribute of Foo, y is the attribute of Bar, right?
If it is right, "," should be the delimiter of header items, and ";" should be the delimiter of the attributes of header item.
Unfortunately, how a HTTP header should be parsed depends on the header. You can't really look at a header and make assumptions about the structural format, because the format differs on a per-header basis.
What can be said is that for almost all HTTP headers, the comma separates multiple values, so your example is equivalent to:
Some-Header:Foo;x=foo_attr
Some-Header:Bar;y=bar_attr
However, there are exceptions to this rule. For example, you can't do the same thing with the Set-Cookie header. Set-Cookie is the only exception I can recall off the top of my head, though there might be more.
But aside from that, it's basically up to you. If you're defining Some-Header then you need to tell implementors how to parse it.
There is currently an effort to come up with a standard way to describe structures in headers. You can read the current draft here:
draft-ietf-httpbis-header-structure
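As a rough illustration of the comma/semicolon convention discussed above, the following sketch parses the hypothetical Some-Header value from the question. parse_list_header is an invented helper; it is deliberately naive (no quoted strings, no escaping) and would be wrong for headers such as Set-Cookie that do not follow the convention.

```python
def parse_list_header(value: str) -> list[tuple[str, dict[str, str]]]:
    """Naively split a header value into (item, attributes) pairs."""
    items = []
    for element in value.split(","):          # "," separates list items
        token, *params = [piece.strip() for piece in element.split(";")]
        # ";" separates an item from its name=value attributes
        attrs = dict(p.split("=", 1) for p in params if "=" in p)
        items.append((token, attrs))
    return items


# One combined header line versus the same items sent as two header lines.
combined = parse_list_header("Foo;x=foo_attr,Bar;y=bar_attr")
separate = parse_list_header("Foo;x=foo_attr") + parse_list_header("Bar;y=bar_attr")
```

Under this convention both forms yield the same list, which is what the equivalence between the single-line and two-line versions means in practice.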

What standard specifies inner structure of the query component in HTTP URI

According to RFC3986 (URI),
The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
It also specifies which characters are allowed inside. That's the generic URI syntax.
In daily interaction with various HTTP/Web servers, we see query components in the http URI scheme represented as key=value pairs separated by the & sign. RFC7230 (HTTP/1.1) says nothing about this, only that the query component follows the generic RFC3986 definition.
The only standard defining said key-value pairs is HTML 4.01 while talking about content type application/x-www-form-urlencoded. It's also the only standard saying + should be treated as Space character in the query component.
However, as far as I could dig up in the specs, Content-Type header only applies to the message body, not its URI. And when, as an experiment, I'm googling for "asd zxc" Chrome sends the request /search?q=zxc+asd to Google without specifying said application/x-www-form-urlencoded content type at all.
Is it just conventional or am I missing something?
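The difference between the generic URI rules and the form-urlencoded convention is visible in Python's standard urllib.parse module, which exposes both behaviours. This is only a small demonstration using the query string from the question, not a statement about what any particular server does.

```python
from urllib.parse import parse_qsl, unquote, unquote_plus, urlencode

query = "q=zxc+asd"

# Form-style parsing: splits on "&" and "=", and treats "+" as a space.
pairs = parse_qsl(query)

# Generic percent-decoding leaves "+" untouched ...
generic = unquote("zxc+asd")
# ... while the form-urlencoded variant turns it into a space.
form = unquote_plus("zxc+asd")

# Encoding a mapping the form-urlencoded way produces "+" for spaces.
encoded = urlencode({"q": "zxc asd"})
```

The library keeps the two decoders separate precisely because the "+" rule comes from the form encoding rather than from the generic URI syntax.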

What are the risks of allowing quote characters as part of a URL parameter?

I need to allow the user to submit queries as follows:
/search/"my search string"
but it's failing because of request validation, as outlined in the following 2 questions:
How to include quote characters as a route parameter? Getting "Illegal characters in path" message
How to modify request validation?
I'm currently trying to figure out how to disable request validation for the quote character, but I'd like to know the risks before I actually put the site live with this disabled. I will not disable request validation unless I can disable it for the quote character only, so I do intend to keep disallowing every other character that's currently not allowed.
According to the URI generic syntax specification (RFC 2396), the double-quote character is explicitly excluded and must be escaped (i.e. %22). See section 2.4.3. The reason given in the spec:
The angle-bracket "<" and ">" and double-quote (") characters are excluded because they are often used as the delimiters around URI in text documents and protocol fields.
You can see easily why this is the case -- imagine trying to create a link in HTML to your URL:
<a href="http://somesite/search/"my search string""/>
That would fail HTML parsing (and also breaks SO's syntax highlighting). You also would have trouble doing basic things with the URL like emailing it to someone (the email client wouldn't parse the URL correctly), posting it on a message board, sending it in an instant message, etc.
For what it's worth, spaces are also explicitly excluded (same section of the RFC explains why).
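A short sketch of the escaping the answer recommends, using Python's standard library rather than the asker's framework. The host somesite and the search text come from the example above; the variable names are arbitrary.

```python
from urllib.parse import quote, unquote

segment = '"my search string"'

# Percent-encode everything that is not unreserved, including '"' and spaces.
escaped = quote(segment, safe="")

url = "http://somesite/search/" + escaped
# The escaped URL no longer collides with the attribute's own quotes.
link = f'<a href="{url}">search</a>'

# The server side can recover the original text by decoding the segment.
original = unquote(escaped)
```

Because the quotes travel as %22, the link can be embedded in HTML, email, or chat without being cut off at the first double quote.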

What is the boundary parameter in an HTTP multi-part (POST) Request?

I am trying to develop a sidebar gadget that automates the process of checking a web page for the evolution of my transfer quota. I am almost there, but there is one last step I need to get it working: sending an HttpRequest with the correct POST data to a PHP page. Using a Firefox plugin, here is what the "Content-Type" of the header looks like:
Content-Type=multipart/form-data; boundary=---------------------------99614912995
with the parameter "boundary" seeming to be random, and the POSTDATA is this:
POSTDATA =-----------------------------99614912995
Content-Disposition: form-data; name="SOMENAME"
Formulaire de Quota
-----------------------------99614912995
Content-Disposition: form-data; name="OTHERNAME"
SOMEDATA
-----------------------------99614912995--
I do not understand how to correctly emulate the POSTDATA with the mystery "boundary" parameter coming back.
Would someone know how I can solve this?
To quote from RFC 1341, section 7.2.1, what I consider to be the relevant bits on the boundary parameter of the Content-Type header (for MIME):
All subtypes of "multipart" share a common syntax ...
The Content-Type field for multipart entities requires one parameter, "boundary", which is used to specify the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.
and then clarifies:
Thus, a typical multipart Content-Type header field might look like this:
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p
This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line
--gc0p4Jq0M2Yt08jU534c0p
Things to Note:
The encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF (Carriage Return-Line Feed)
The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).
Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.
Last but not least:
The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:
--gc0p4Jq0M2Yt08jU534c0p--
I hope this helps someone else in the future, as I had to roam for a while before getting the full picture (please ensure to read the necessary RFCs to get the deepest understanding).
The boundary parameter is set to a number of hyphens plus a random string at the end, but you can set it to anything at all. The problem is, if the boundary string shows up in the request data, it will be treated as a boundary.
For some tips, and an example function for sending multipart/form-data see my answer to this question. It wouldn't be too difficult to modify that function to use a loop for each part you would like to send.
The actual specification for multipart/form-data is in RFC 7578. Boundary is defined in Section 4.1.
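Putting the pieces together, a multipart/form-data body like the one captured in the question could be produced along these lines. This is a minimal sketch, not the function referred to in the answer above: encode_multipart is a hypothetical helper, it handles only simple text fields, it assumes field names contain no quotes or line breaks, and the boundary format (hyphens plus random hex) merely imitates what browsers tend to send.

```python
import secrets


def encode_multipart(fields: dict[str, str]) -> tuple[str, bytes]:
    """Return (Content-Type header value, body) for simple text fields."""
    values = [value.encode("utf-8") for value in fields.values()]
    # Pick a random boundary and retry if it happens to occur in the data.
    while True:
        boundary = "-" * 27 + secrets.token_hex(8)  # 43 chars, under the 70 limit
        if not any(boundary.encode("ascii") in value for value in values):
            break
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode("ascii"))  # two hyphens + boundary
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        lines.append(b"")                              # empty line ends the part headers
        lines.append(value.encode("utf-8"))
    lines.append(f"--{boundary}--".encode("ascii"))    # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body


# Field names and values taken from the question's captured request.
content_type, body = encode_multipart(
    {"SOMENAME": "Formulaire de Quota", "OTHERNAME": "SOMEDATA"}
)
```

The returned content_type string goes into the request's Content-Type header and body is sent as the POST data; the boundary in the header and in the body must be the same string.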

Resources