Why is HTTP status-line different from the request-line - http

Both HTTP Request-Line and the Status-Line have 3 components :
Request-Line= Method SP Request-URI SP HTTP-Version CRLF
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
The Status-Line (the Server response) is fine:
it begin with the HTTP-Version (like any protocol) so the decoder can adapt it parsing according to this first field
followed by some protocol-defined values (the Status Code) that a single word and don't need any SP/CR/LF character
end with any TEXT character (except CR/LF) as the Reason-Phrase.
What I'm failing to understand is why the Request-Line is so different:
The HTTP-Version is at the end
the Request-URI must be escaped to avoid having an SP/CR/LF character (here it goes the famous %20)
Why it does not follow the same (clean) pattern as the Status-line ?
Request-Line= HTTP-Version SP Method SP Request-URI CRLF
This way the Request-URI could be any TEXT character (except CR/LF)
So it would look like this:
HTTP/1.1 GET /user/with space
...
HTTP/1.1 404 NOT FOUND
...
See:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html

It may come from HTTP/0.9, the early protocol version.
The request part was:
GET http://www.example.com/foo.html\r\n
And the response part was the response body (without headers), so directly your html response starting with <html> for example.
The Request Line is:
METHOD OSP Absolute-Request-URL CRLF
With a lot of optionnal spaces for OSP, like tab or formfeed
with the location part having also the Host part (which is still supported on the protocol today)
The important point is there is no protocol version, and no protocol part. Both in the response and the request.
When HTTP/1.0 was created there was the implicit need of still supporting HTTP/0.9 requests and responses. Something that some servers are still doing today.
On the response side all the response headers parts were added (like stating the mime type of the response!), and the first line was built with this nice idea of starting by the protocol version of the response.
On the request side the protocol version was added as an optional addition so you could still decide to make a HTTP/0.9 request or a new version, and most importantly, an HTTP/0.9 server could maybe still understand your query (and ignore the SP PROTOCOL addition (and even optionnal headers added in the request).
Today if you forgot the protocol part of your request the HTTP/0.9 compatible servers will only parse the first line of your request and ignore extra headers.
These are equivalent queries (but the first one is in http 0.9 and would get no headers in the response):
# HTTP 0.9:
GET http://www.example.com/foo.html\r\n
# HTTP/1.0 version:
GET http://www.example.com/foo.html HTTP/1.0\r\n
\r\n
# or
GET /foo.html HTTP/1.0\r\n
Host: www.example.com\r\n
\r\n
#or
GET http://www.example.com/foo.html HTTP/1.0\r\n
Host: www.foo.com\r\n
\r\n
I think they've been thinking about code updates needed in the parsers and that adding the protocol at the end of the first line was easier to implement. Maybe an old parser could still send a 0.9 response to a HTTP/1.0 query (which is bad but easy to write).
Maybe just adding something on an existing line seems more like an improvment than prefixing the line of the existing protocol.
Maybe you should have been old enough to comment the RFC at this time and tell them that it would be more elegant your way (which is right) :-)

Related

How to know the full request has been received with HTTP 1.0 and HTTP 1:1?

I'm implementing a ultra simple dummy HTTP server responding a message with Hello world to any requests. It is just for benchmarking the asynchronous event handling with wrk or equivalent web server benchmarking tool.
After some searching on the Web I can't find a clear EndOfMessage (EOM) marker. It seam that with HTTP 1.0 we know we have received the full request when the connection is closed. Is that right ?
For HTTP 1.1, how do we know if pipelining is used ? What is the EOM in this case ?
After some searching on the Web I can't find a clear EndOfMessage (EOM) marker.
You can't find one because such a thing doesn't exist. The only marker you may find is the CRLF pair indicating the end of the header fields. In general, the enclosed entity length (that is for requests and responses!) is either communicated beforehand via the Content-Length header or through the transport coding.
with HTTP 1.0 we know we have received the full request when the connection is closed. Is that right?
That is one of two ways mandated by RFC 1945. So generally speaking: no. From RFC 1945, section 7.2.2:
When an Entity-Body is included with a message, the length of that body may be determined in one of two ways. If a Content-Length header field is present, its value in bytes represents the length of the Entity-Body. Otherwise, the body length is determined by the closing of the connection by the server.
This may read like you were generally in the right with your assertion. BUT:
Closing the connection cannot be used to indicate the end of a request body, since it leaves no possibility for the server to send back a response.
With you being on the receiving side, your assumption is simply wrong on every conceivable level: If the request contains a body, announcing the size of said body through the Content-Length header is an absolute requirement.
HTTP/1.1 is a bit relaxed in this regard, as it allows for more options. As Julian pointed out, please consult RFC 7230, section 3.3.3. That section is straightforward to read and to answer your question, I'd have to c&p it as whole.
For HTTP 1.1, how do we know if pipelining is used ?
You do if you receive multiple requests through one connection. The strongest indicator for the client non engaging into pipelining is the presence of Connection: close in the first received request. See RFC 7230, section 6.3 and section 6.3.2. If you are worried about having to support this, you are always free to just read the first request and send back a response with Connection: close in it. The client will know it has to establish a new connection.
What is the EOM in this case ?
Again, there is no marker as there is no special treatment for requests during pipelining. All pipelining is really enabling is to have multiple requests being issued in one go. See section 3.3.3 from above on how to determine the message length.

How to tell if an HTTP response terminates in C

I'm implementing a minimum HTTPS layer for my embedded project where I'm using mbedTLS for TLS and hard-coding HTTP headers to talk with HTTPS servers.
It works fine with normal websites. But so far my implementation detects the end of HTTPS response by checking if the last byte read is \n.
if( ret > 0 && output[len-1] == '\n' )
{
ret = 0;
output[len] = 0;
break;
}
This, however, is not always working for obvious reason. I tried openssl s_client, and it behaves the same - if an HTTP response terminates with \n, then s_client returns immediately after fetching all data. Otherwise it blocks forever, waiting for more data.
An real browser seems to be able to handle this properly. Is there anything I can do beyond setting a timeout?
How to tell if an HTTP response terminates in C...
But so far my implementation detects the end of HTTPS response by checking if the last byte read is \n...
This, however, is not always working for obvious reason...
HTTP calls out \r\n, and not \n. See RFC 2616, Hypertext Transfer Protocol - HTTP/1.1 and page 15:
HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications). The end-of-line marker within an entity-body
is defined by its associated media type, as described in section 3.7.
CRLF = CR LF
Now, what various servers send is a whole different ballgame. There will be duplicate end-of-line markers, missing end-of-line markers, and incorrect end-of-line markers. Its the wild, wild west.
You might want to look at a reference implementation of a HTTP parser. If so, check out libevent or cURL's parsers and how they maintain their state machine.

Web Socket Protocol Handshake vs. Switching Protocols

What's the difference between these two response statuses:
HTTP/1.1 101 Web Socket Protocol Handshake
HTTP/1.1 101 Switching Protocols
Does it matter which one I get?
There is no difference whatsoever. What is important is the 101 response code to indicate the handshake is progressing. This is defined in RFC 6455:
The leading line from the client follows the Request-Line format. The leading line from the server follows the Status-Line format. The Request-Line and Status-Line productions are defined in [RFC2616].
...
The handshake from the server is much simpler than the client handshake. The first line is an HTTP Status-Line, with the status code 101:
HTTP/1.1 101 Switching Protocols
Any status code other than 101 indicates that the WebSocket handshake has not completed and that the semantics of HTTP still apply.
The text of the Status-Line is arbitrary, the server can use whatever text it wants, per RFC 2616:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
...
The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. These codes are fully defined in section 10. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.
Switching Protocols just happens to be what the examples in RFC 6455 use, but that is not a requirement.

Jetty returning "HTTP/1.1 400 Bad Request" on malformed HTTP POST header. Is this expected?

does having a space in the http post headers result in BAD request??
I see this in one of the requests:
Content-Type "text/xml; c harset=utf-8"
and I get a HTTP/1.1 400 Bad Request
But if the same request is posted with
Content-Type "text/xml; charset=utf-8"
i.e no space in charset it works.
In my implementation I am not doing any validation.
So I am assuming my Jetty server throws a bad request since there is a space in charset??
Am I right or is my interpretation wrong.
Thanks!!
Yes, having a space where you are putting it should result in a bad request.
HTTP 1.1 is a protocol defined by a standard. By referencing the standard documentation, it is possible to determine what is and what isn't a valid request.
You can find the standard for HTTP/1.1 at RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1, and you might specifically want to look at sections 14.17 Content-Type and 3.7 Media Types.
Essentially, by inserting the space into "charset", you are creating an invalid HTTP request because the protocol doesn't understand the "c" and "harset" portions. Those aren't defined as valid text in that context.
Moreover, while the protocol knows what appears to be valid and what doesn't, it isn't intelligent enough to infer how to fix even a simple typo like this. As such, for the server to respond "400 Bad Request" is appropriate and conforming to the protocol standard. For what it's worth, you'll also find the HTTP status codes in the RFC. Status code 400 Bad Request means:
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.
i.e. Don't do that. :)

Chunked encoding and content-length header

Is it possible to set the content-length header and also use chunked transfer encoding? and does doing so solve the problem of not knowing the length of the response at the client side when using chunked?
the scenario I'm thinking about is when you have a large file to transfer and there's no problem in determining its size, but it's too large to be buffered completely.
(If you're not using chunked, then the whole response must get buffered first? Right??)
thanks.
No:
"Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored." (RFC 2616, Section 4.4)
And no, you can use Content-Length and stream; the protocol doesn't constrain how your implementation works.
Well, you can always send a header stating the size of the file.
Something like response.addHeader("File-Size","size of the file");
And ignore the Content-Length header.
The client implementation has to be tweaked to read this value, but hey you can achieve both the things you want :)
You have to use either Content-Length or chunking, but not both.
If you know the length in advance, you can use Content-Length instead of chunking even if you generate the content on the fly and never have it all at once in your buffer.
However, you should not do that if the data is really large because a proxy might not be able to handle it. For large data, chunking is safer.
This headers can be cause of Postman Parse Error:
"Content-Length" and "Transfer-Encoding" can't be present in the response headers together.
Using parametrized ResponseEntity<?> except raw ResponseEntity in controller can fixed the issue.
The question asks:
Is it possible to set the content-length header and also use chunked transfer encoding?
The RFC HTTP/1.1 spec, quoted in Julian's answer, says:
Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding.
There is an important difference between what's possible, and what's allowed by a protocol. It is certainly possible, for example, for you to write your own HTTP/1.1 client which sends malformed messages with both headers. You would be violating the HTTP/1.1 spec in doing so, and so you'd imagine some alarm bells would go off and a bunch of Internet police would burst into your house and say, "Stop, arrest that client!" But that doesn't happen, of course. Your request will get sent to wherever it's going.
OK, so you can send a malformed message. So what? Surely on the receiving end, the server will detect the HTTP/1.1 protocol client-side violation, vanquish your malformed request, and serve you back a stern 400 response telling you that you are due in court the following Monday for violating the protocol. But no, actually, that probably won't happen. Of course, it's beyond the scope of HTTP/1.1 to prescribe what happens to misbehaving clients; i.e. while the HTTP/1.1 protocol is analogous to the "law", there is nothing in HTTP/1.1 analogous to the judicial system.
The best that the HTTP/1.1 protocol can do is dictate how a server must act/respond in the case of receiving such a malformed request. However, it's quite lenient in this case. In particular, the server does not have to reject such malformed requests. In fact, in such a scenario, the rule is:
If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored.
Unfortunately, though, some HTTP servers will violate that part of the HTTP/1.1 protocol and will actually give precedence to the Content-Length header, if both headers are present. This can cause a serious problem, if the message visits two servers in sequence in the same system and they disagree about where one HTTP message ends and the next one starts. It leaves the system vulnerable to HTTP Desync attacks a.k.a. Request Smuggling.

Resources