Does HTTP limit the length of the Start-Line (Request-Line or Status-Line)?
If it does, which status code should an HTTP server respond with when it receives a request whose Request-Line is longer than the maximum length?
Quoting from the HTTP/1.1 RFC (2616):
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
This does not specify a limit on the length.
The Request-URI can itself be long, and the RFC says this about it:
The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).
So a "too long" status exists for Request-URI, but it means "too long for this server to handle" and not "longer than the spec allows."
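A minimal Python sketch of how a server might enforce its own limit. The method set and MAX_URI_LENGTH are hypothetical configuration, not values from RFC 2616. The later RFC 7230 (section 3.1.1) recommends supporting request lines of at least 8000 octets, says a server SHOULD answer 501 for a method longer than any it implements, and MUST answer 414 for a request-target longer than it wishes to parse; the sketch follows that.

```python
from typing import Optional

# Hypothetical server configuration: no RFC fixes these values.
# RFC 7230 recommends supporting request lines of at least 8000 octets.
IMPLEMENTED_METHODS = {b"GET", b"HEAD", b"POST"}
MAX_URI_LENGTH = 8000

def check_request_line(line: bytes) -> Optional[int]:
    """Return an error status code for a bad request line, or None if the
    line is acceptable and processing should continue."""
    if not line.endswith(b"\r\n"):
        return 400
    content = line[:-2]
    if b"\r" in content or b"\n" in content:
        return 400  # no CR or LF allowed except in the final CRLF
    parts = content.split(b" ")
    if len(parts) != 3 or not all(parts):
        return 400  # expected: Method SP Request-URI SP HTTP-Version
    method, uri, version = parts
    if not version.startswith(b"HTTP/"):
        return 400
    if method not in IMPLEMENTED_METHODS:
        return 501  # covers methods longer than any this server implements
    if len(uri) > MAX_URI_LENGTH:
        return 414  # too long for *this* server, not for the spec
    return None
```

Note that the limit lives entirely in the server's configuration; a different server may accept much longer URIs and still be compliant.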
Related
A question that arose while reading RFC 7230, Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, was "what is the simplest possible server that would be compliant with this standard?". Searching the document for "MUST", it seems to me that the only response a server is ever required to return is a 400 error, in cases where the request is malformed:
A server MUST reject any received request message that contains whitespace between a header field-name and colon with a response code of 400 (Bad Request).
(3.2.4. Field Parsing)
If a Transfer-Encoding header field is present in a request and the chunked transfer coding is not the final encoding, the message body length cannot be determined reliably; the server MUST respond with the 400 (Bad Request) status code and then close the connection.
(3.3.3. Message Body Length, item 3)
If this is a request message, the server MUST respond with a 400 (Bad Request) status code and then close the connection.
(3.3.3. Message Body Length, item 4)
among others.
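The three MUST-400 rules above are mechanical enough to sketch in Python. This is an illustration only: the function name is hypothetical, it assumes the header lines have already been split off the request (CRLF removed), and it checks nothing beyond those three rules. Item 4 of section 3.3.3 applies to a request without Transfer-Encoding whose Content-Length values differ or are invalid.

```python
from typing import List, Optional, Tuple

def mandatory_400_reason(header_lines: List[str]) -> Optional[str]:
    """Return why the request MUST be rejected with 400, or None.

    header_lines are the raw field lines as received, CRLF removed.
    Only the three MUST rules quoted above are checked.
    """
    fields: List[Tuple[str, str]] = []
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a header field line: {line!r}")
        if name != name.rstrip(" \t"):
            return "whitespace between field-name and colon (3.2.4)"
        fields.append((name.lower(), value.strip(" \t")))

    te = [v for n, v in fields if n == "transfer-encoding"]
    cl = [v for n, v in fields if n == "content-length"]
    if te:
        # Transfer-Encoding overrides Content-Length; chunked must come last.
        codings = [c.strip().lower() for c in ",".join(te).split(",")]
        if codings[-1] != "chunked":
            return "chunked is not the final transfer coding (3.3.3, item 3)"
    elif cl:
        # Repeated values are tolerated only if they are one identical number.
        values = {v.strip() for field in cl for v in field.split(",")}
        if len(values) != 1 or not all(v.isascii() and v.isdigit() for v in values):
            return "invalid or conflicting Content-Length (3.3.3, item 4)"
    return None
```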
There are some cases where the standard dictates that the server MUST return a certain status code, but they all seem to be made optional by some stronger clause, for example:
If a server receives both an Upgrade and an Expect header field with the "100-continue" expectation (Section 5.1.1 of [RFC7231]), the server MUST send a 100 (Continue) response before sending a 101 (Switching Protocols) response.
(6.7 Upgrade)
which is made optional by
A server MAY ignore a received Upgrade header field if it wishes to continue using the current protocol on that connection. Upgrade cannot be used to insist on a protocol change.
(6.7 Upgrade)
All of this leads me to believe that a 400-only server is technically allowed by the standard.
This seems rather odd to me: I thought a 400 response meant that the request itself was malformed in some way, and that a server had to respond with some other error iff the request was well-formed but invalid in some other way.
Have I missed something in the standard, or some other relevant standard, or is a 400-only server allowed?
(As a side note, section 2.6 states:
A server can send a 505 (HTTP Version Not Supported) response if it wishes, for any reason, to refuse service of the client's major protocol version.
which leads me to believe that a 505-only server would also be allowed, although quite a boring one.)
Both the HTTP Request-Line and the Status-Line have three components:
Request-Line= Method SP Request-URI SP HTTP-Version CRLF
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
The Status-Line (the server's response) is fine:
it begins with the HTTP-Version (like any protocol), so the decoder can adapt its parsing according to this first field
it is followed by a protocol-defined value (the Status-Code), which is a single word and doesn't need any SP/CR/LF character
it ends with any TEXT characters (except CR/LF) as the Reason-Phrase.
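To illustrate why this layout is easy to decode, here is a small Python sketch (not taken from any particular library) of parsing a Status-Line: since the only free-text field comes last, splitting on the first two spaces is enough. Accepting a line with no Reason-Phrase at all is a leniency choice of this sketch; RFC 2616's grammar still requires the second SP.

```python
from typing import Tuple

def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Parse 'HTTP-Version SP Status-Code SP Reason-Phrase' (CRLF removed)."""
    # maxsplit=2: whatever follows the second space is the Reason-Phrase,
    # spaces and all.
    parts = line.split(" ", 2)
    if len(parts) == 2:  # lenient: tolerate a missing Reason-Phrase
        parts.append("")
    version, code, reason = parts
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    return version, int(code), reason
```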
What I'm failing to understand is why the Request-Line is so different:
the HTTP-Version is at the end
the Request-URI must be escaped so that it contains no SP/CR/LF character (hence the famous %20)
Why doesn't it follow the same (clean) pattern as the Status-Line?
Request-Line= HTTP-Version SP Method SP Request-URI CRLF
This way the Request-URI could be any TEXT character (except CR/LF)
So it would look like this:
HTTP/1.1 GET /user/with space
...
HTTP/1.1 404 NOT FOUND
...
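A Python sketch contrasting the two layouts (function names are hypothetical). In the real Request-Line the URI sits between two spaces, so a literal space must be percent-encoded; in the proposed "version first" layout the URI would be last and could keep raw spaces, exactly like the Reason-Phrase.

```python
from typing import Tuple
from urllib.parse import quote, unquote

def build_request_line(method: str, path: str) -> str:
    # Real HTTP/1.x: a raw space in the URI would create an extra field,
    # so quote() turns it into %20 (and leaves "/" alone by default).
    return f"{method} {quote(path)} HTTP/1.1"

def parse_request_line(line: str) -> Tuple[str, str, str]:
    method, uri, version = line.split(" ")  # exactly three fields, or ValueError
    return method, unquote(uri), version    # decoded here only for display

def parse_proposed_line(line: str) -> Tuple[str, str, str]:
    # Hypothetical layout from the question: with the URI last, maxsplit=2
    # lets it keep its spaces.
    version, method, uri = line.split(" ", 2)
    return method, uri, version
```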
See:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
It may come from HTTP/0.9, the early protocol version.
The request part was:
GET http://www.example.com/foo.html\r\n
And the response was just the response body, without headers: your HTML document, starting directly with <html>, for example.
The Request-Line is:
METHOD OSP Absolute-Request-URL CRLF
where OSP can be one or more whitespace characters, including tab or form feed,
and the URL includes the host part (absolute URLs in the request line are still supported by the protocol today).
The important point is that there is no protocol version at all, in either the request or the response.
When HTTP/1.0 was created, there was an implicit need to keep supporting HTTP/0.9 requests and responses, something some servers still do today.
On the response side, headers were added (stating the MIME type of the response, for instance!), and the first line was built around the nice idea of starting with the protocol version of the response.
On the request side, the protocol version was added as an optional suffix, so you could still choose to make an HTTP/0.9 request or a request in the new version and, most importantly, an HTTP/0.9 server might still understand your query (ignoring the SP PROTOCOL addition, and even the optional headers added to the request).
Today, if you omit the protocol version from your request, HTTP/0.9-compatible servers will parse only the first line of your request and ignore any extra headers.
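A rough Python sketch of how a server that still accepts HTTP/0.9 might tell the two forms apart, simply by counting the tokens on the first line. The function name is hypothetical, and real servers apply more checks than this.

```python
def request_version(request_line: bytes) -> str:
    """Guess the protocol version from the first line (CRLF removed).

    Two tokens (method and URI, no version) are treated as an HTTP/0.9
    simple request: no header lines follow, and the reply is the bare
    body with no status line or headers.
    """
    tokens = request_line.split()  # no-arg split() tolerates tabs, form feeds, runs of spaces
    if len(tokens) == 2:
        return "HTTP/0.9"
    if len(tokens) == 3 and tokens[2].startswith(b"HTTP/"):
        return tokens[2].decode("ascii")
    raise ValueError("malformed request line")
```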
These are equivalent queries (but the first one is HTTP/0.9 and would get no headers in the response):
# HTTP 0.9:
GET http://www.example.com/foo.html\r\n
# HTTP/1.0 version:
GET http://www.example.com/foo.html HTTP/1.0\r\n
\r\n
# or
GET /foo.html HTTP/1.0\r\n
Host: www.example.com\r\n
\r\n
# or
GET http://www.example.com/foo.html HTTP/1.0\r\n
Host: www.foo.com\r\n
\r\n
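The last example works because an absolute URI in the request line takes precedence over the Host header (RFC 2616 section 5.2 says the Host value MUST be ignored in that case). A minimal Python sketch of that rule, assuming a hypothetical headers dict with lower-cased names:

```python
from typing import Dict
from urllib.parse import urlsplit

def effective_host(request_target: str, headers: Dict[str, str]) -> str:
    # Absolute URI (e.g. "http://www.example.com/foo.html"): its host wins
    # and any Host header is ignored.
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        return parts.netloc
    # Plain path (e.g. "/foo.html"): fall back to the Host header.
    return headers.get("host", "")
```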
I think they were thinking about the code changes needed in parsers, and adding the protocol at the end of the first line was easier to implement. An old parser might even still send an HTTP/0.9 response to an HTTP/1.0 query (which is bad, but easy to write).
Also, appending something to an existing line probably looks more like an improvement than prefixing the line of the existing protocol.
Maybe you should have been old enough to comment on the RFC at the time and tell them that your way would be more elegant (which it is) :-)
When trying to connect to localhost (with Terminal), I got this answer:
HTTP/1.1 426 Upgrade Required
Server: WebSocket++/0.3.0-alpha4
How can I respond to that to Upgrade?
You are clearly connecting to a WebSocket server, not a plain HTTP server:
Server: WebSocket++/0.3.0-alpha4
The WebSocket protocol starts with an HTTP-based request/response handshake where the client asks the server for permission to upgrade communications to full duplex WebSocket messaging.
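For illustration, a Python sketch that builds such an upgrade request by hand instead of typing a plain request into Terminal. The function name and the example host are hypothetical; the headers are the required ones from the RFC 6455 client requirements quoted below (a browser would also send Origin).

```python
import base64
import os
from typing import Tuple

def opening_handshake(host: str, resource: str = "/") -> Tuple[bytes, str]:
    """Build a minimal RFC 6455 client handshake; return (request, key)."""
    # Fresh random 16-byte nonce for every connection, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {resource} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # Each line ends with CRLF; an empty line terminates the request.
    request = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
    return request, key
```

The key is returned alongside the bytes because the server's 101 response must carry a Sec-WebSocket-Accept value derived from it.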
The 426 response means that your initial handshake is not requesting a proper WebSocket upgrade. Per RFC 6455 Section 4.1, Client Requirements:
Once a connection to the server has been established (including a connection via a proxy or over a TLS-encrypted tunnel), the client MUST send an opening handshake to the server. The handshake consists of an HTTP Upgrade request, along with a list of required and optional header fields. The requirements for this handshake are as follows.
The handshake MUST be a valid HTTP request as specified by [RFC2616].
The method of the request MUST be GET, and the HTTP version MUST be at least 1.1.
For example, if the WebSocket URI is "ws://example.com/chat", the first line sent should be "GET /chat HTTP/1.1".
The "Request-URI" part of the request MUST match the /resource name/ defined in Section 3 (a relative URI) or be an absolute http/https URI that, when parsed, has a /resource name/, /host/, and /port/ that match the corresponding ws/wss URI.
The request MUST contain a |Host| header field whose value contains /host/ plus optionally ":" followed by /port/ (when not using the default port).
The request MUST contain an |Upgrade| header field whose value MUST include the "websocket" keyword.
The request MUST contain a |Connection| header field whose value MUST include the "Upgrade" token.
The request MUST include a header field with the name |Sec-WebSocket-Key|. The value of this header field MUST be a nonce consisting of a randomly selected 16-byte value that has been base64-encoded (see Section 4 of [RFC4648]). The nonce MUST be selected randomly for each connection.
NOTE: As an example, if the randomly selected value was the sequence of bytes 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f 0x10, the value of the header field would be "AQIDBAUGBwgJCgsMDQ4PEC=="
The request MUST include a header field with the name |Origin| [RFC6454] if the request is coming from a browser client. If the connection is from a non-browser client, the request MAY include this header field if the semantics of that client match the use-case described here for browser clients. The value of this header field is the ASCII serialization of origin of the context in which the code establishing the connection is running. See [RFC6454] for the details of how this header field value is constructed.
As an example, if code downloaded from www.example.com attempts to establish a connection to ww2.example.com, the value of the header field would be "http://www.example.com".
The request MUST include a header field with the name |Sec-WebSocket-Version|. The value of this header field MUST be 13.
NOTE: Although draft versions of this document (-09, -10, -11, and -12) were posted (they were mostly comprised of editorial changes and clarifications and not changes to the wire protocol), values 9, 10, 11, and 12 were not used as valid values for Sec-WebSocket-Version. These values were reserved in the IANA registry but were not and will not be used.
The request MAY include a header field with the name |Sec-WebSocket-Protocol|. If present, this value indicates one or more comma-separated subprotocol the client wishes to speak, ordered by preference. The elements that comprise this value MUST be non-empty strings with characters in the range U+0021 to U+007E not including separator characters as defined in [RFC2616] and MUST all be unique strings. The ABNF for the value of this header field is 1#token, where the definitions of constructs and rules are as given in [RFC2616].
The request MAY include a header field with the name |Sec-WebSocket-Extensions|. If present, this value indicates the protocol-level extension(s) the client wishes to speak. The interpretation and format of this header field is described in Section 9.1.
The request MAY include any other header fields, for example, cookies [RFC6265] and/or authentication-related header fields such as the |Authorization| header field [RFC2616], which are processed according to documents that define them.
Once the client's opening handshake has been sent, the client MUST wait for a response from the server before sending any further data.
The client MUST validate the server's response as follows:
If the status code received from the server is not 101, the client handles the response per HTTP [RFC2616] procedures. In particular, the client might perform authentication if it receives a 401 status code; the server might redirect the client using a 3xx status code (but clients are not required to follow them), etc. Otherwise, proceed as follows.
If the response lacks an |Upgrade| header field or the |Upgrade| header field contains a value that is not an ASCII case-insensitive match for the value "websocket", the client MUST Fail the WebSocket Connection.
If the response lacks a |Connection| header field or the |Connection| header field doesn't contain a token that is an ASCII case-insensitive match for the value "Upgrade", the client MUST Fail the WebSocket Connection.
If the response lacks a |Sec-WebSocket-Accept| header field or the |Sec-WebSocket-Accept| contains a value other than the base64-encoded SHA-1 of the concatenation of the |Sec-WebSocket-Key| (as a string, not base64-decoded) with the string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" but ignoring any leading and trailing whitespace, the client MUST Fail the WebSocket Connection.
If the response includes a |Sec-WebSocket-Extensions| header field and this header field indicates the use of an extension that was not present in the client's handshake (the server has indicated an extension not requested by the client), the client MUST Fail the WebSocket Connection. (The parsing of this header field to determine which extensions are requested is discussed in Section 9.1.)
If the response includes a |Sec-WebSocket-Protocol| header field and this header field indicates the use of a subprotocol that was not present in the client's handshake (the server has indicated a subprotocol not requested by the client), the client MUST Fail the WebSocket Connection.
If the server's response does not conform to the requirements for the server's handshake as defined in this section and in Section 4.2.2, the client MUST Fail the WebSocket Connection.
Please note that according to [RFC2616], all header field names in both HTTP requests and HTTP responses are case-insensitive.
If the server's response is validated as provided for above, it is said that The WebSocket Connection is Established and that the WebSocket Connection is in the OPEN state.
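A Python sketch of the main client-side checks above, under stated assumptions: the function names are hypothetical, headers are passed as a dict with lower-cased names, and the extension and subprotocol checks are left out.

```python
import base64
import hashlib
from typing import Dict

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def expected_accept(key: str) -> str:
    """base64(SHA-1(key + GUID)), using the key as sent (not base64-decoded)."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def handshake_ok(status: int, headers: Dict[str, str], key: str) -> bool:
    """Apply the status, Upgrade, Connection and Accept checks quoted above."""
    if status != 101:
        return False  # a real client would now handle this as ordinary HTTP
    if headers.get("upgrade", "").strip().lower() != "websocket":
        return False
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        return False
    return headers.get("sec-websocket-accept", "").strip() == expected_accept(key)
```

With the sample key from RFC 6455 section 1.3, dGhlIHNhbXBsZSBub25jZQ==, expected_accept returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=, the value shown there. Relatedly, base64.b64encode(bytes(range(1, 17))) yields AQIDBAUGBwgJCgsMDQ4PEA== for the example nonce bytes in the NOTE above; the printed AQIDBAUGBwgJCgsMDQ4PEC== differs only in padding bits that decoders discard.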
After a connection has been established between client and server, the client starts sending an HTTP request. This consists of a line that looks something like GET / HTTP/1.1 followed by several lines of headers. My question is, how does a web server know when to start returning data? Does the client somehow close its side of the connection to indicate it is done with the request and is ready to start receiving the response? Does the server just know after the "\r\n\r\n" string at the end of the headers? Is it something else entirely?
Thanks!
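You need to read the HTTP/1.1 specification. The server has to read the entire request before it can formulate and send the response. A request with neither a Content-Length nor a Transfer-Encoding header has no body, so it ends at the empty line after the headers (the "\r\n\r\n" you mention). If the request does have a body, there are at least two ways the server can know where it ends: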
Content-length header
Chunked transfer encoding.
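A minimal blocking Python sketch of the server side, assuming the body (if any) is delimited by Content-Length: read until the empty line that ends the headers, then read exactly that many more bytes. The function name is hypothetical, and chunked bodies, pipelining and size limits are deliberately not handled.

```python
import socket

def read_request(conn: socket.socket) -> bytes:
    """Read one request whose body, if any, is delimited by Content-Length."""
    buf = b""
    while b"\r\n\r\n" not in buf:  # headers end at the first empty line
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("client closed before finishing headers")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")
    length = 0  # no Content-Length (and no Transfer-Encoding) means no body
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    while len(body) < length:
        chunk = conn.recv(4096)
        if not chunk:
            raise ConnectionError("client closed in the middle of the body")
        body += chunk
    # Any bytes beyond `length` (a pipelined request) are dropped here.
    return head + b"\r\n\r\n" + body[:length]
```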
You should read the HTTP/1.1 core specifications.
19.4.6 Introduction of Transfer-Encoding
HTTP/1.1 introduces the Transfer-Encoding header field (section 14.41). Proxies/gateways MUST remove any transfer-coding prior to forwarding a message via a MIME-compliant protocol.
A process for decoding the "chunked" transfer-coding (section 3.6) can be represented in pseudo-code as:
length := 0
read chunk-size, chunk-extension (if any) and CRLF
while (chunk-size > 0) {
   read chunk-data and CRLF
   append chunk-data to entity-body
   length := length + chunk-size
   read chunk-size and CRLF
}
read entity-header
while (entity-header not empty) {
   append entity-header to existing header fields
   read entity-header
}
Content-Length := length
Remove "chunked" from Transfer-Encoding
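The RFC pseudocode above translates fairly directly into Python. In this sketch the function name is hypothetical, it reads from any binary file-like object (for example socket.makefile("rb")), and it returns the trailer lines rather than merging them into the header map, leaving the Content-Length/Transfer-Encoding bookkeeping to the caller.

```python
import io
from typing import BinaryIO, List, Tuple

def decode_chunked(stream: BinaryIO) -> Tuple[bytes, List[bytes]]:
    """Decode a chunked message body; return (entity_body, trailer_lines)."""
    body = bytearray()
    while True:
        size_line = stream.readline()  # chunk-size [; chunk-extension] CRLF
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last-chunk
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("truncated chunk or missing CRLF after chunk-data")
        body += data
    trailers: List[bytes] = []
    while True:  # trailer fields, terminated by an empty line
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        trailers.append(line.rstrip(b"\r\n"))
    # len(body) plays the role of the pseudocode's final Content-Length.
    return bytes(body), trailers

if __name__ == "__main__":
    raw = io.BytesIO(b"4\r\nWiki\r\n5;ext=1\r\npedia\r\n0\r\nExpires: never\r\n\r\n")
    print(decode_chunked(raw))  # (b'Wikipedia', [b'Expires: never'])
```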
What's the difference between these two response statuses:
HTTP/1.1 101 Web Socket Protocol Handshake
HTTP/1.1 101 Switching Protocols
Does it matter which one I get?
There is no difference whatsoever. What is important is the 101 response code to indicate the handshake is progressing. This is defined in RFC 6455:
The leading line from the client follows the Request-Line format. The leading line from the server follows the Status-Line format. The Request-Line and Status-Line productions are defined in [RFC2616].
...
The handshake from the server is much simpler than the client handshake. The first line is an HTTP Status-Line, with the status code 101:
HTTP/1.1 101 Switching Protocols
Any status code other than 101 indicates that the WebSocket handshake has not completed and that the semantics of HTTP still apply.
The Reason-Phrase text of the Status-Line is arbitrary; the server can use whatever text it wants, per RFC 2616:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
...
The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. These codes are fully defined in section 10. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.
Switching Protocols just happens to be what the examples in RFC 6455 use, but that is not a requirement.