Web Socket Protocol Handshake vs. Switching Protocols - http

What's the difference between these two response statuses:
HTTP/1.1 101 Web Socket Protocol Handshake
HTTP/1.1 101 Switching Protocols
Does it matter which one I get?

There is no difference whatsoever. What is important is the 101 response code to indicate the handshake is progressing. This is defined in RFC 6455:
The leading line from the client follows the Request-Line format. The leading line from the server follows the Status-Line format. The Request-Line and Status-Line productions are defined in [RFC2616].
...
The handshake from the server is much simpler than the client handshake. The first line is an HTTP Status-Line, with the status code 101:
HTTP/1.1 101 Switching Protocols
Any status code other than 101 indicates that the WebSocket handshake has not completed and that the semantics of HTTP still apply.
The Reason-Phrase of the Status-Line is arbitrary; the server can use whatever text it wants, per RFC 2616:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
...
The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. These codes are fully defined in section 10. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.
Switching Protocols just happens to be what the examples in RFC 6455 use, but that is not a requirement.
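To illustrate, here is a minimal sketch of how a client might evaluate the server's Status-Line, assuming it looks only at the Status-Code and ignores the Reason-Phrase. The function names parse_status_line and handshake_accepted are made up for this example and do not come from any WebSocket library.

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split an HTTP/1.x Status-Line into (version, status code, reason phrase)."""
    # Split on the first two spaces only: the Reason-Phrase may itself contain spaces.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed Status-Line: {line!r}")
    version, code = parts[0], parts[1]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"Status-Code must be a 3-digit integer: {code!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return version, int(code), reason


def handshake_accepted(line: str) -> bool:
    """Hypothetical helper: only the 101 code matters, never the reason text."""
    _version, code, _reason = parse_status_line(line)
    return code == 101


# Both status lines from the question are treated identically.
print(handshake_accepted("HTTP/1.1 101 Web Socket Protocol Handshake\r\n"))
print(handshake_accepted("HTTP/1.1 101 Switching Protocols\r\n"))
```

A real client would also have to validate the Upgrade, Connection and Sec-WebSocket-Accept headers before treating the handshake as complete; that is left out here.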

Related

Is a standard-compliant (RFC 7230) HTTP-server allowed to respond with only 400 Bad Request?

A question that arose while reading RFC 7230 (Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing) was "what is the simplest possible server that would be compliant with this standard?". Searching for "MUST" in the document, it seems to me that the only response a server has to return is a 400 error, in the case of the request being malformed:
A server MUST reject any received request message that contains
whitespace between a header field-name and colon with a response code
of 400 (Bad Request).
(3.2.4. Field Parsing)
If a Transfer-Encoding header field
is present in a request and the chunked transfer coding is not
the final encoding, the message body length cannot be determined
reliably; the server MUST respond with the 400 (Bad Request)
status code and then close the connection.
(3.3.3. Message Body Length .3)
If this is a
request message, the server MUST respond with a 400 (Bad Request)
status code and then close the connection.
(3.3.3. Message Body Length .4)
among others.
There are some cases where the standard dictates that the server MUST return a certain status code, but they all seem to be made optional by some stronger clause, for example:
If a server receives both an Upgrade and an Expect header field with
the "100-continue" expectation (Section 5.1.1 of [RFC7231]), the
server MUST send a 100 (Continue) response before sending a 101
(Switching Protocols) response.
(6.7 Upgrade)
being made optional by
A server MAY ignore a received Upgrade
header field if it wishes to continue using the current protocol on
that connection. Upgrade cannot be used to insist on a protocol
change.
(6.7 Upgrade)
All of this leads me to believe that a 400-only server is technically allowed by the standard.
This does seem rather odd to me: I thought a 400 status meant that the request itself was malformed in some way, and that a server had to respond with some other error iff the request was well-formed but invalid in some other way.
Have I missed something in the standard, or some other relevant standard, or is a 400-only server allowed?
(As a side note, section 2.6 states
A server can send a 505
(HTTP Version Not Supported) response if it wishes, for any reason,
to refuse service of the client's major protocol version.
which leads me to believe a 505-only server would actually be allowed, although quite boring)
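For reference, the first MUST quoted above (section 3.2.4) could be sketched roughly as follows. The function name check_header_line is hypothetical, and returning 400 for a line with no colon at all is an assumption of this sketch, not something taken from the quoted text.

```python
from typing import Optional


def check_header_line(line: bytes) -> Optional[int]:
    """Return 400 if this header line must be rejected, otherwise None."""
    name, sep, _value = line.partition(b":")
    if not sep:
        # Assumption of this sketch: a line without a colon is not a header field.
        return 400
    if name != name.rstrip(b" \t"):
        # Whitespace between the field-name and the colon: the case the RFC names.
        return 400
    return None


print(check_header_line(b"Host: example.com"))
print(check_header_line(b"Host : example.com"))
```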

Why is HTTP status-line different from the request-line

Both the HTTP Request-Line and the Status-Line have 3 components:
Request-Line= Method SP Request-URI SP HTTP-Version CRLF
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
The Status-Line (the server response) is fine:
it begins with the HTTP-Version (like any protocol) so the decoder can adapt its parsing according to this first field
followed by a protocol-defined value (the Status-Code) that is a single word and doesn't need any SP/CR/LF character
it ends with any TEXT character (except CR/LF) as the Reason-Phrase.
What I'm failing to understand is why the Request-Line is so different:
The HTTP-Version is at the end
the Request-URI must be escaped to avoid having an SP/CR/LF character (hence the famous %20)
Why does it not follow the same (clean) pattern as the Status-Line?
Request-Line= HTTP-Version SP Method SP Request-URI CRLF
This way the Request-URI could be any TEXT character (except CR/LF)
So it would look like this:
HTTP/1.1 GET /user/with space
...
HTTP/1.1 404 NOT FOUND
...
See:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
It may come from HTTP/0.9, the early protocol version.
The request part was:
GET http://www.example.com/foo.html\r\n
And the response part was the response body (without headers), so directly your html response starting with <html> for example.
The Request Line is:
METHOD OSP Absolute-Request-URL CRLF
With a lot of optional whitespace allowed for OSP, like tab or form feed,
and with the location part also including the host part (which is still supported by the protocol today).
The important point is that there is no protocol version and no protocol part, in either the request or the response.
When HTTP/1.0 was created, there was an implicit need to keep supporting HTTP/0.9 requests and responses, something that some servers still do today.
On the response side, all the response header parts were added (like stating the MIME type of the response!), and the first line was built with the nice idea of starting with the protocol version of the response.
On the request side, the protocol version was added as an optional addition, so you could still decide to make an HTTP/0.9 request or use a newer version and, most importantly, an HTTP/0.9 server could maybe still understand your query (and ignore the SP PROTOCOL addition, and even the optional headers added to the request).
Today, if you omit the protocol part of your request, HTTP/0.9-compatible servers will only parse the first line of your request and ignore any extra headers.
These are equivalent queries (but the first one is in HTTP/0.9 and would get no headers in the response):
# HTTP 0.9:
GET http://www.example.com/foo.html\r\n
# HTTP/1.0 version:
GET http://www.example.com/foo.html HTTP/1.0\r\n
\r\n
# or
GET /foo.html HTTP/1.0\r\n
Host: www.example.com\r\n
\r\n
#or
GET http://www.example.com/foo.html HTTP/1.0\r\n
Host: www.foo.com\r\n
\r\n
I think they were considering the code updates needed in parsers, and adding the protocol at the end of the first line was easier to implement. Maybe an old parser could still send a 0.9 response to an HTTP/1.0 query (which is bad, but easy to write).
Maybe appending something to an existing line just seemed more like an improvement than prefixing the line of the existing protocol.
Maybe you should have been old enough to comment on the RFC at the time and tell them that your way would be more elegant (which it is) :-)
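The backward-compatibility argument can be sketched as a parser that treats the trailing version as optional. This is only an illustration under simplifying assumptions (any run of whitespace separates the fields, and a missing version is reported as "HTTP/0.9"); parse_request_line is a made-up name, not code from a real server.

```python
from typing import Tuple


def parse_request_line(line: str) -> Tuple[str, str, str]:
    """Return (method, target, version) for a 0.9-style or 1.x-style request line."""
    # str.split() with no argument splits on any whitespace run.
    parts = line.rstrip("\r\n").split()
    if len(parts) == 2:
        # No version field: an old-style simple request.
        method, target = parts
        return method, target, "HTTP/0.9"
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        method, target, version = parts
        return method, target, version
    raise ValueError(f"malformed request line: {line!r}")


print(parse_request_line("GET http://www.example.com/foo.html\r\n"))
print(parse_request_line("GET /foo.html HTTP/1.0\r\n"))
```

Because the version sits at the end, a parser that only ever reads the first two fields keeps working on newer requests, which is the compatibility property described above.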

How to know the full request has been received with HTTP 1.0 and HTTP 1.1?

I'm implementing an ultra-simple dummy HTTP server that responds with a Hello world message to any request. It is just for benchmarking asynchronous event handling with wrk or an equivalent web server benchmarking tool.
After some searching on the Web I can't find a clear EndOfMessage (EOM) marker. It seems that with HTTP 1.0 we know we have received the full request when the connection is closed. Is that right?
For HTTP 1.1, how do we know if pipelining is used ? What is the EOM in this case ?
After some searching on the Web I can't find a clear EndOfMessage (EOM) marker.
You can't find one because such a thing doesn't exist. The only marker you may find is the empty line (a bare CRLF) indicating the end of the header fields. In general, the length of the enclosed entity (for requests and responses alike!) is communicated either beforehand via the Content-Length header or through the transfer coding.
with HTTP 1.0 we know we have received the full request when the connection is closed. Is that right?
That is one of two ways mandated by RFC 1945. So generally speaking: no. From RFC 1945, section 7.2.2:
When an Entity-Body is included with a message, the length of that body may be determined in one of two ways. If a Content-Length header field is present, its value in bytes represents the length of the Entity-Body. Otherwise, the body length is determined by the closing of the connection by the server.
This may read like you were generally in the right with your assertion. BUT:
Closing the connection cannot be used to indicate the end of a request body, since it leaves no possibility for the server to send back a response.
With you being on the receiving side, your assumption is simply wrong on every conceivable level: If the request contains a body, announcing the size of said body through the Content-Length header is an absolute requirement.
HTTP/1.1 is a bit more relaxed in this regard, as it allows for more options. As Julian pointed out, please consult RFC 7230, section 3.3.3. That section is straightforward to read, and to answer your question I'd have to copy and paste it as a whole.
For HTTP 1.1, how do we know if pipelining is used ?
You do if you receive multiple requests through one connection. The strongest indicator that the client is not engaging in pipelining is the presence of Connection: close in the first received request. See RFC 7230, section 6.3 and section 6.3.2. If you are worried about having to support this, you are always free to just read the first request and send back a response with Connection: close in it. The client will know it has to establish a new connection.
What is the EOM in this case ?
Again, there is no marker, as there is no special treatment for requests during pipelining. All pipelining really enables is issuing multiple requests in one go. See section 3.3.3 from above on how to determine the message length.
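For a dummy benchmarking server, the framing rules can be sketched roughly like this. It is a simplified illustration, not a full implementation of section 3.3.3: it assumes the request either has a Content-Length or no body, and it refuses Transfer-Encoding rather than decoding chunks. The name request_length is hypothetical.

```python
from typing import Optional


def request_length(buffer: bytes) -> Optional[int]:
    """Return the byte length of the first complete request in buffer,
    or None if more data has to be read from the socket."""
    head_end = buffer.find(b"\r\n\r\n")
    if head_end == -1:
        return None  # the header section is not complete yet
    body_start = head_end + 4

    headers = {}
    # Skip the request line, then collect the header fields.
    for raw in buffer[:head_end].split(b"\r\n")[1:]:
        name, _, value = raw.partition(b":")
        headers[name.strip().lower()] = value.strip()

    if b"transfer-encoding" in headers:
        raise NotImplementedError("chunked decoding is left out of this sketch")

    # A request without Content-Length (and without Transfer-Encoding) has no body.
    body_len = int(headers.get(b"content-length", b"0"))
    total = body_start + body_len
    return total if len(buffer) >= total else None


# Two pipelined requests in one buffer: the first one ends where its framing says.
first = b"GET / HTTP/1.1\r\nHost: a\r\n\r\n"
pipelined = first + b"GET /next HTTP/1.1\r\nHost: a\r\n\r\n"
n = request_length(pipelined)
print(n == len(first), pipelined[n:].startswith(b"GET /next"))
```

With pipelining, whatever remains in the buffer after the returned length is simply the start of the next request, so the same function is called again on the remainder.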

Jetty returning "HTTP/1.1 400 Bad Request" on malformed HTTP POST header. Is this expected?

Does having a space in the HTTP POST headers result in a Bad Request?
I see this in one of the requests:
Content-Type "text/xml; c harset=utf-8"
and I get a HTTP/1.1 400 Bad Request
But if the same request is posted with
Content-Type "text/xml; charset=utf-8"
i.e. with no space in charset, it works.
In my implementation I am not doing any validation.
So I am assuming my Jetty server returns a Bad Request because there is a space in charset?
Am I right, or is my interpretation wrong?
Thanks!!
Yes, having a space where you are putting it should result in a bad request.
HTTP 1.1 is a protocol defined by a standard. By referencing the standard documentation, it is possible to determine what is and what isn't a valid request.
You can find the standard for HTTP/1.1 at RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1, and you might specifically want to look at sections 14.17 Content-Type and 3.7 Media Types.
Essentially, by inserting the space into "charset", you are creating an invalid HTTP request because the protocol doesn't understand the "c" and "harset" portions. Those aren't defined as valid text in that context.
Moreover, while the protocol knows what appears to be valid and what doesn't, it isn't intelligent enough to infer how to fix even a simple typo like this. As such, for the server to respond "400 Bad Request" is appropriate and conforming to the protocol standard. For what it's worth, you'll also find the HTTP status codes in the RFC. Status code 400 Bad Request means:
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.
i.e. Don't do that. :)
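To see why the space breaks the syntax, here is a rough sketch of a Content-Type check. It is not Jetty's actual validation logic; the token character set approximates the HTTP token grammar, splitting on ";" ignores the possibility of a ";" inside a quoted value, and valid_content_type is a made-up name.

```python
import re

# Characters allowed in an HTTP token (no spaces, no separators).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# parameter = attribute "=" value, where value is a token or a quoted string.
PARAM = re.compile(rf'{TOKEN}=(?:{TOKEN}|"[^"]*")')
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}")


def valid_content_type(value: str) -> bool:
    """Check 'type/subtype' followed by zero or more ';'-separated parameters."""
    media_type, *params = [part.strip() for part in value.split(";")]
    if not MEDIA_TYPE.fullmatch(media_type):
        return False
    return all(PARAM.fullmatch(param) for param in params)


print(valid_content_type("text/xml; charset=utf-8"))
print(valid_content_type("text/xml; c harset=utf-8"))
```

In the second call the attribute name would have to be a single token, but "c harset" is two tokens separated by a space, so the parameter does not match.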

The maximum length of HTTP Start-Line

Does HTTP limit the length of the Start-Line (Request-Line or Status-Line)?
If it does, which status code should an HTTP server respond with when it receives a request whose Request-Line is longer than the maximum length?
Quoting from the HTTP/1.1 RFC (2616):
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
This does not specify a limit on the length.
The Request-URI can itself be long, and the RFC also says this about it:
The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).
So a "too long" status exists for Request-URI, but it means "too long for this server to handle" and not "longer than the spec allows."
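A server-imposed limit could be sketched like this. The 8192-byte limit is an arbitrary value picked for the example, not something the spec prescribes, and status_for_request_line is a hypothetical helper.

```python
from typing import Optional

MAX_URI_BYTES = 8192  # hypothetical server-specific limit, not from the RFC


def status_for_request_line(line: bytes, max_uri: int = MAX_URI_BYTES) -> Optional[int]:
    """Return an error status for an unacceptable Request-Line, or None if it is fine."""
    parts = line.rstrip(b"\r\n").split(b" ")
    if len(parts) != 3:
        return 400  # not "Method SP Request-URI SP HTTP-Version"
    _method, uri, _version = parts
    if len(uri) > max_uri:
        return 414  # Request-URI Too Long: too long for this server to handle
    return None


print(status_for_request_line(b"GET /index.html HTTP/1.1\r\n"))
print(status_for_request_line(b"GET /" + b"a" * 9000 + b" HTTP/1.1\r\n"))
```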

Resources