HTTP post multipart-form data length format? - http

In an HTTP POST multipart-form content-type stream, what do the long ------------- lines mean? What is hex-encoded at the end of these lines? Can you figure out the length of the variables from them? Or is this a specially designed sequence so you can find the break between variables?
-----------------------------7dc34719970524
Content-Disposition: form-data; name="my variable"
blah content here
-----------------------------7dc34719970524
Content-Disposition: form-data; name="asdfasdf"
heaps of data here

It is a boundary that is used to separate the different sets of data in case of a multi-part data submission. Read more about it:
http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
To quote from the link:
In the case of multiple part messages, in which one or more different sets of data are combined in a single body, a "multipart" Content-Type field must appear in the entity's header. The body must then contain one or more "body parts," each preceded by an encapsulation boundary, and the last one followed by a closing boundary. Each part starts with an encapsulation boundary, and then contains a body part consisting of header area, a blank line, and a body area. Thus a body part is similar to an RFC 822 message in syntax, but different in meaning.
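To make the structure described in the quote concrete, here is a minimal sketch of splitting a multipart body at its boundary lines. The function name `split_multipart` is hypothetical, the boundary is shortened from the one in the question, and the sketch assumes CRLF line endings and does no validation; a real parser (for example the one in your web framework) handles many more cases.

```python
def split_multipart(body: bytes, boundary: bytes) -> list[tuple[dict[str, str], bytes]]:
    """Split a multipart body into (headers, content) pairs (naive sketch)."""
    # A delimiter is CRLF + "--" + boundary; prepend CRLF so the first one matches too.
    chunks = (b"\r\n" + body).split(b"\r\n--" + boundary)
    parts = []
    for chunk in chunks[1:]:          # chunks[0] is the preamble, ignored here
        if chunk.startswith(b"--"):   # "--boundary--" is the closing delimiter
            break
        # Header area and body area are separated by a blank line.
        head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.decode("latin-1").strip().split("\r\n"):
            if line:
                name, _, value = line.partition(":")
                headers[name.strip().lower()] = value.strip()
        parts.append((headers, content))
    return parts


# Example body modelled on the question (boundary shortened for readability).
boundary = b"7dc34719970524"
body = (
    b"--7dc34719970524\r\n"
    b'Content-Disposition: form-data; name="my variable"\r\n'
    b"\r\n"
    b"blah content here\r\n"
    b"--7dc34719970524\r\n"
    b'Content-Disposition: form-data; name="asdfasdf"\r\n'
    b"\r\n"
    b"heaps of data here\r\n"
    b"--7dc34719970524--\r\n"
)
parts = split_multipart(body, boundary)
for headers, content in parts:
    print(headers["content-disposition"], len(content))
```

Note that the boundary carries no length information: the length of each variable is only known once the next boundary line has been found.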

Related

Multiple content length headers and multiple transfer encoding headers

If there are multiple content length headers, should I
fail (I don't think so?)
use the first one,
use the last one
Then I have the same question for transfer_encoding header too. I think with transfer_encoding we are supposed to use the last one.
then, same question for 'Host' header as well.
thanks,
Dean
Content-Length is a single-value header. Usually the last header would have authority; however RFC 7230, section 3.3.2 states:
If a message is received that has multiple Content-Length header fields with field-values consisting of the same decimal value, or a single Content-Length header field with a field value containing a list of identical decimal values (e.g., "Content-Length: 42, 42"), indicating that duplicate Content-Length header fields have been generated or combined by an upstream message processor, then the recipient MUST either reject the message as invalid or replace the duplicated field-values with a single valid Content-Length field containing that decimal value prior to determining the message body length or forwarding the message.
Transfer-Encoding is a different matter, as its value is a list. The header can appear multiple times and all occurrences are valid. The important thing here is that the applied encodings have to be listed in the order in which they were applied. E.g. if content has been gzipped and then chunk-encoded, the headers have to look like
Transfer-Encoding: gzip, chunked
or
Transfer-Encoding: gzip
Transfer-Encoding: chunked
WRT Content-Length: yes, you actually MUST fail (unless both values are the same, in which case you MAY pick one). See RFC 7230.
"Transfer-Encoding" is different in that it allows multiple values; so you'll have to process them all in order.
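A rough sketch of the Content-Length rule quoted above, taking the "reject unless all values agree" option. The function name `resolve_content_length` is made up for illustration; it takes the raw field values of every Content-Length header received.

```python
def resolve_content_length(field_values: list[str]) -> int:
    """Return the one agreed Content-Length, or raise ValueError."""
    candidates = set()
    for field_value in field_values:
        # A single field may itself carry a list such as "42, 42".
        for item in field_value.split(","):
            item = item.strip()
            if not (item.isascii() and item.isdigit()):
                raise ValueError(f"invalid Content-Length value: {item!r}")
            candidates.add(int(item))
    if len(candidates) != 1:
        raise ValueError("missing or conflicting Content-Length values")
    return candidates.pop()


print(resolve_content_length(["42", "42"]))   # duplicated header, same value
print(resolve_content_length(["42, 42"]))     # list inside one header
```

Calling it with `["42", "43"]` raises `ValueError`, which a server would typically turn into a 400 response.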

How to configure `Content-Length' header in HTTP protocol

I am not clear about how to count the `Content-Length' header in HTTP.
Take an example,
HEADER
...
Content-Type: text/html
(blank line `\r\n')
<html></html>
(blank line `\r\n')
This is a working HTTP request sending an empty HTML page (correct me if there is any problem :-)). Then what should the content length be? 15, or 17 (taking the blank line between the headers and the entity into account)?
Thanks in advance. Best regards.
According to the W3C, Content-Length is defined as follows:
The Content-Length entity-header field indicates the size of the
entity-body, in decimal number of OCTETs, sent to the recipient or, in
the case of the HEAD method, the size of the entity-body that would
have been sent had the request been a GET.
As far as I understand it, you have to count everything after the blank line that ends the headers. My answer to your question would be 15 then.
15 is the correct answer. That counts the line break at the END of the entity data, which means that line break is part of the entity, not the http protocol. DO NOT count the line break between the headers and entity.
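The count can be checked with a few lines of Python. This is only a sketch: the status line is an assumption added to make the message complete, and the body is the one from the question including its trailing line break.

```python
# Entity body from the question: the markup plus the trailing CRLF.
body = b"<html></html>\r\n"

# The blank line (CRLF CRLF) ends the header section and is not part of the body.
head = (
    b"HTTP/1.1 200 OK\r\n"          # assumed status line, not in the question
    b"Content-Type: text/html\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n"
)
message = head + body

# A receiver splits at the first blank line and reads Content-Length octets.
received_head, _, received_body = message.partition(b"\r\n\r\n")
print(len(received_body))
```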

Set more than one HTTP header with the same name?

As far as I know it is allowed by the HTTP spec to set more than one HTTP header with the same name. Is there any use case to do so (from client to server and vice versa)?
HTTP 1.1 Section 4.2:
Multiple message-header fields with
the same field-name MAY be present in
a message if and only if the entire
field-value for that header field is
defined as a comma-separated list
[i.e., #(values)]. It MUST be possible
to combine the multiple header fields
into one "field-name: field-value"
pair, without changing the semantics
of the message, by appending each
subsequent field-value to the first,
each separated by a comma. The order
in which header fields with the same
field-name are received is therefore
significant to the interpretation of
the combined field value, and thus a
proxy MUST NOT change the order of
these field values when a message is
forwarded.
If I'm not wrong there is no case where multiple headers with the same name are needed.
It's commonly used for Set-Cookie:. Many servers set more than one cookie.
Of course, you can always set them all in a single header.
Actually, I think you cannot set multiple cookies in one header. So that's a necessary use-case.
The Cookie spec (RFC 2109) does claim that you can combine multiple cookies in one header the same way other headers can be combined (comma-separated), but it also points out that non-conforming syntaxes (like the Expires parameter, which has ,s in its value) are still common and must be dealt with by implementations.
So, if you use Expires params in your Set-Cookie headers and you don't want all your cookies to expire at the same time, you probably need to use multiple headers.
Update: Evolution of the Cookie spec
RFC 2109 has been obsoleted by RFC 2965 that in turn got obsoleted by RFC 6265, which is stricter on the issue:
Origin servers SHOULD NOT fold multiple Set-Cookie header fields into a single header field. The usual mechanism for folding HTTP headers fields (i.e., as defined in [RFC2616]) might change the semantics of the Set-Cookie header field because the %x2C (",") character is used by Set-Cookie in a way that conflicts with such folding.
Side note
RFC 6265 uses the verb "folding" when it refers to combining multiple header fields into one, which is ambiguous in the context of the HTTP/1 specs (both by RFC2616, and its successor, RFC 7230) where:
"folding" consistently refers to line folding, and
the verb "combine" is used to describe merging same headers.
Combining header fields:
See RFC 2616, Section 4.2, Message Headers (quoted in the question), but searching for the word "combine" will bring up special cases.
The above item was obsoleted by RFC 7230, Section 3.2.2, Field Order:
A recipient MAY combine multiple header fields with the same field name into one field-name: field-value pair, without changing the semantics of the message, by appending each subsequent field value to the combined field value in order, separated by a comma. The order in which header fields with the same field name are received is therefore significant to the interpretation of the combined field value; a proxy MUST NOT change the order of these field values when forwarding a message.
Note: In practice, the "Set-Cookie" header field (RFC6265) often appears multiple times in a response message and does not use the list syntax, violating the above requirements on multiple header fields with the same name. Since it cannot be combined into a single field-value, recipients ought to handle Set-Cookie as a special case while processing header fields. (See Appendix A.2.3 of [Kri2001] for details.)
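As an illustration of the two quoted paragraphs, here is a small sketch that combines repeated fields with a comma while keeping Set-Cookie values separate. The helper name `combine_fields` and the sample values are invented for the example.

```python
def combine_fields(fields: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Combine repeated header fields in order; keep Set-Cookie uncombined."""
    combined: dict[str, list[str]] = {}
    for name, value in fields:
        key = name.lower()                  # field names are case-insensitive
        values = combined.setdefault(key, [])
        if key == "set-cookie" or not values:
            values.append(value)            # first value, or a separate cookie
        else:
            values[0] = values[0] + ", " + value
    return combined


fields = [
    ("Accept", "application/json"),
    ("Set-Cookie", "a=1; Expires=Sat, 15 Jul 2017 23:58:22 GMT"),
    ("Accept", "application/xml"),
    ("Set-Cookie", "b=2"),
]
result = combine_fields(fields)
print(result["accept"])
print(len(result["set-cookie"]))
```

Each key maps to a list so that the Set-Cookie special case fits in the same structure; every other field ends up with exactly one combined value.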
Line folding:
From RFC 2616, Section 2.2, Basic Rules:
HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab. All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.
The above section was obsoleted by RFC 7230, Section 3.2.4, Field Parsing:
Historically, HTTP header field values could be extended over multiple lines by preceding each extra line with at least one space or horizontal tab (obs-fold). This specification deprecates such line folding except within the message/http media type (Section 8.3.1). A sender MUST NOT generate a message that includes line folding (i.e., that has any field-value that contains a match to the obs-fold rule) unless the message is intended for packaging within the message/http media type.
A server that receives an obs-fold in a request message that is not within a message/http container MUST either reject the message by sending a 400 (Bad Request), preferably with a representation explaining that obsolete line folding is unacceptable, or replace each received obs-fold with one or more SP octets prior to interpreting the field value or forwarding the message downstream.
A proxy or gateway that receives an obs-fold in a response message that is not within a message/http container MUST either discard the message and replace it with a 502 (Bad Gateway) response, preferably with a representation explaining that unacceptable line folding was received, or replace each received obs-fold with one or more SP octets prior to interpreting the field value or forwarding the message downstream.
A user agent that receives an obs-fold in a response message that is not within a message/http container MUST replace each received obs-fold with one or more SP octets prior to interpreting the field value.
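The "replace each received obs-fold with one or more SP octets" option can be sketched with a regular expression. This assumes obs-fold is a CRLF followed by at least one space or tab, as the quoted text describes; the names `OBS_FOLD` and `unfold` and the `X-Long` header are made up for the example.

```python
import re

# obs-fold: a line break followed by one or more spaces or horizontal tabs.
OBS_FOLD = re.compile(rb"\r\n[ \t]+")


def unfold(header_section: bytes) -> bytes:
    """Replace each obs-fold in a raw header section with a single space."""
    return OBS_FOLD.sub(b" ", header_section)


raw = b"X-Long: first\r\n\tsecond\r\nHost: example.com\r\n"
print(unfold(raw))
```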
Since duplicate headers can cause issues with various web-servers and APIs (regardless of what the spec says), I doubt there is any general purpose use case where this is best practice. That's not to say someone somewhere isn't doing it, of course.
As you're looking for use-cases, maybe Accept would be a valid one.
Accept: application/json
Accept: application/xml
It's only allowed for headers using a very specific format, see RFC 2616, Section 4.2.
Old thread, but I was looking into this same issue. Anyway, the Accept and Accept-Encoding headers are typical examples that use multiple comma-separated values. Even though these are request-specific headers, the specs do not differentiate between request and response at this level. Check the one from this page.
What the spec says is that if you have commas as character in the value of the header, you cannot use multiple headers of the same name, unless you disambiguate the use of the comma.

Standard for adding multiple values of a single HTTP Header to a request or response

If I want to add a list of values as an HTTP Header, is there a standard way to do this? I couldn't find anything (that I could easily understand) in RFC 822. For example, are comma-separated values standard, or semicolon-separated values? Is there a standard at all?
Example:
Key: value1;value2;value3
You'll want to take a look at the HTTP spec RFC 2616 where it says:
Multiple message-header fields with
the same field-name MAY be present in
a message if and only if the entire
field-value for that header field is
defined as a comma-separated list
[i.e., #(values)]. It MUST be possible
to combine the multiple header fields
into one "field-name: field-value"
pair, without changing the semantics
of the message, by appending each
subsequent field-value to the first,
each separated by a comma. The order
in which header fields with the same
field-name are received is therefore
significant to the interpretation of
the combined field value, and thus a
proxy MUST NOT change the order of
these field values when a message is
forwarded.
What this means is that you can send the same header multiple times in a response with different values, as long as those values can be appended to each other using a comma. This also means that you can send multiple values in a single header by concatenating them with commas.
So in your case it will be:
Key: value1,value2,value3
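On the receiving side, a list-valued header like this can be split back into its items. The sketch below is deliberately naive: it assumes no item contains a comma (so it would not be safe for quoted strings or dates), and `split_list_value` is a hypothetical helper name.

```python
def split_list_value(field_value: str) -> list[str]:
    """Split a comma-separated field value, dropping empty items and padding."""
    return [item.strip() for item in field_value.split(",") if item.strip()]


items = split_list_value("value1,value2, value3")
print(items)

# Sending the list: join the items back with a comma.
print("Key: " + ", ".join(items))
```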
@marc-novakowski, your answer narrows the problem a little :)
Normally (per the HTTP spec) each value is delimited from the next with a comma ','.
But consider a simple case:
Set-Cookie: language=pl; expires=Sat, 15-Jul-2017 23:58:22 GMT; path=/; domain=x.com
Set-Cookie: id=123; expires=Sat, 15-Jul-2017 23:58:22 GMT; path=/; domain=x.com; httponly
How do you join such headers when the values are delimited by commas, yet a comma can also appear inside a value?
In that case it is the client's responsibility to choose a strategy, e.g. drop or merge (and if merge, how).
Please take a look at the Mozilla implementation of nsHttpHeaderArray:
https://github.com/bnoordhuis/mozilla-central/blob/master/netwerk/protocol/http/nsHttpHeaderArray.h#L185
Mozilla chose to use a newline delimiter '\n' in this case (for certain header field names).
When you face such a situation, I encourage you to look at common existing solutions, as they provide a familiar scheme.
Flag explanations:
Cookies are no part of the HTTP standard. Cookies are defined in an
own RFC, 6265 (formerly 2965 and 2109). Even the HTTP 2 RFC only
mentions cookies but does not define them as part of the standard. –
@mecki Aug 25 at 18:56
Please look one more time at the sentence:
"per the HTTP spec we delimit each value from the other using a comma ','". There is no word "cookie" in it :)
To be precise: we are talking about header fields (possibly repeated). "Set-Cookie" is a header field and it has a value; that value is what we consider to be the cookie(s), so client and server implementations have to handle such cookies.
See field names and values in the HTTP/1.1 spec:
https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.2
https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.2
However, not all fields with the same field name may be combined into a single field-value list. For example, RFC 7230 says:
Note: In practice, the "Set-Cookie" header field ([RFC6265]) often
appears multiple times in a response message and does not use the
list syntax, violating the above requirements on multiple header
fields with the same name. Since it cannot be combined into a
single field-value, recipients ought to handle "Set-Cookie" as a
special case while processing header fields. (See Appendix A.2.3
of [Kri2001] for details.)

What is the boundary parameter in an HTTP multi-part (POST) Request?

I am trying to develop a sidebar gadget that automates the process of checking a web page for the evolution of my transfer quota. I am almost there, but there is one last step I need to get it working: sending an HttpRequest with the correct POST data to a PHP page. Using a Firefox plugin, here is what the "Content-Type" of the header looks like:
Content-Type=multipart/form-data; boundary=---------------------------99614912995
with the parameter "boundary" seeming to be random, and the POSTDATA is this:
POSTDATA =-----------------------------99614912995
Content-Disposition: form-data; name="SOMENAME"
Formulaire de Quota
-----------------------------99614912995
Content-Disposition: form-data; name="OTHERNAME"
SOMEDATA
-----------------------------99614912995--
I do not understand how to correctly emulate the POSTDATA with the mystery "boundary" parameter coming back.
Would someone know how I can solve this?
To quote from the RFC 1341, section 7.2.1, what I consider to be the relevant bits on the boundary parameter of the Content-Type header (for MIME):
All subtypes of "multipart" share a common syntax ...
The Content-Type field for multipart entities requires one parameter, "boundary", which is used to specify the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the boundary parameter value from the Content-Type header field.
and then clarifies:
Thus, a typical multipart Content-Type header field might look like this:
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p
This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line
--gc0p4Jq0M2Yt08jU534c0p
Things to Note:
The encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF (Carriage Return-Line Feed)
The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).
Encapsulation boundaries must not appear within the encapsulations, and must be no longer than 70 characters, not counting the two leading hyphens.
Last but not least:
The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:
--gc0p4Jq0M2Yt08jU534c0p--
I hope this helps someone else in the future, as I had to roam for a while before getting the full picture (please ensure to read the necessary RFCs to get the deepest understanding).
The boundary parameter is set to a number of hyphens plus a random string at the end, but you can set it to anything at all. The problem is, if the boundary string shows up in the request data, it will be treated as a boundary.
For some tips, and an example function for sending multipart/form-data see my answer to this question. It wouldn't be too difficult to modify that function to use a loop for each part you would like to send.
The actual specification for multipart/form-data is in RFC 7578. Boundary is defined in Section 4.1.
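To emulate the POSTDATA from the question, the body can be assembled by hand. The following is a minimal sketch, not a complete RFC 7578 encoder: `encode_multipart` is a hypothetical helper, it handles plain text fields only, and it assumes field names contain no quotes or line breaks. The boundary is regenerated until it does not occur in any value, which addresses the collision problem mentioned above.

```python
import secrets


def encode_multipart(fields: dict[str, str]) -> tuple[str, bytes]:
    """Return (Content-Type header value, body) for simple text fields."""
    encoded = {name: value.encode("utf-8") for name, value in fields.items()}
    while True:
        # Any string works as a boundary as long as it never appears in the data.
        boundary = ("----sketch" + secrets.token_hex(16)).encode("ascii")
        if not any(boundary in value for value in encoded.values()):
            break
    lines = []
    for name, value in encoded.items():
        lines.append(b"--" + boundary)      # delimiter line before each part
        lines.append(b'Content-Disposition: form-data; name="'
                     + name.encode("utf-8") + b'"')
        lines.append(b"")                   # blank line between headers and body
        lines.append(value)
    lines.append(b"--" + boundary + b"--")  # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=" + boundary.decode("ascii"), body


content_type, body = encode_multipart(
    {"SOMENAME": "Formulaire de Quota", "OTHERNAME": "SOMEDATA"}
)
print(content_type)
print(body.decode("utf-8"))
```

The returned `content_type` goes into the request's Content-Type header and `body` is sent as the POST data, with Content-Length set to `len(body)`.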

Resources