Is it valid to return different text in the response header than the usual fare?
For example if the request is invalid, could I respond with:
HTTP/1.1 400 Here be Dragons
And have that header properly handled by proxies, etc?
The HTTP spec says:
The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.
and:
The reason phrases listed here are only recommendations -- they MAY be replaced by local equivalents without affecting the protocol.
So yes, it's valid to use any text you'd like as the Reason-Phrase AKA "Status text" or "Status name".
Yes, it conforms to the HTTP protocol to have arbitrary text on the response line. No, proxies aren't required to forward that as-is (but typically will).
Related
There is a confusing bit of terminology in the RFC for HTTP/2 that I wish was clearer.
Per the RFC https://www.rfc-editor.org/rfc/rfc7540#section-8.1.2
Just as in HTTP/1.x, header field names are strings of ASCII
characters that are compared in a case-insensitive fashion. However,
header field names MUST be converted to lowercase prior to their
encoding in HTTP/2. A request or response containing uppercase header
field names MUST be treated as malformed
This seems to outline two conflicting ideas
Header field names are case-insensitive in HTTP/2
If you receive or send fields that are not lowercase, the request/response is malformed.
If a request or response that contains non-lowercase headers is invalid, how can it be considered case-insensitive?
There are two levels of "HTTP": a more abstract, upper, layer with the HTTP semantic (e.g. PUT resource r1), and a lower layer where that semantic is encoded. Think of these two as, respectively, the application layer of HTTP, and the network layer of HTTP.
The application layer can be completely unaware of whether the semantic HTTP request PUT r1 has arrived in HTTP/1.1 or HTTP/2 format.
On the other hand, the same semantic, PUT r1, is encoded differently in HTTP/1.1 (textual) vs HTTP/2 (binary) by the network layer.
The referenced section of the specification should be interpreted in the first sentence as referring to the application layer: "as in HTTP/1.1 header names should compared case insensitively".
This means that if an application is asked "is header ACCEPT present?", the application should look at the header names in a case insensitive fashion (or be sure that the implementation provides such feature), and return true if Accept or accept is present.
The second sentence should be interpreted as referring to the network layer: a compliant HTTP/2 implementation MUST send the headers over the network lowercase, because that is how HTTP/2 encodes header names to be sent over the wire.
Nothing forbids a compliant HTTP/2 implementation to receive content-length: 128 (lowercase), but then convert this header into Content-Length: 128 when it makes it available to the application - for example for maximum compatibility with HTTP/1.1 where the header has uppercase first letters (for example to be printed on screen).
It's a mistake, or at least an unnecessarily confusing specification.
Header names must be compared case-insensitive, but that's irrelevant since they are only transmitted or received lowercase.
I.e. in the RFC, "Content-Length" refers to to the header "content-length."
RFC 7540 has been obsoleted by RFC 9113, which states simply:
Field names MUST be converted to lowercase when constructing an HTTP/2 message.
https://www.rfc-editor.org/rfc/rfc9113#section-8.2
And the prose now uses the lowercase header names.
Is the description component of a HTTP status code used? For example, in the HTTP response '200 OK', is the OK (i.e. the description) ever used? Or is it just for humans to read?
No, the "reason phrase" is purely there for humans to read. Nothing should be using it programmatically - especially because in HTTP/2, it's been eliminated:
HTTP/2 does not define a way to carry the version or reason phrase that is included in an HTTP/1.1 status line.
The status code is often used. For example, if you are using AJAX, you will most likely check the HTTP status code before using the data returned.
However the description is just for humans as computers recognize the status code and what it entails without the description.
The description is a standard message associated with the code itself to get a quick description of the code. It is used for a diagnostic, along with its full description wich can be found on the https status code list. So, it is not "used" as i think you're implying, but indeed only used for diagnostic by humans
The data sections of messages Error, Forward and redirection responses may be used to contain human-readable diagnostic information.
I need to do a spot of server-side parsing of raw HTTP headers - in particular the Content-type header. Whilst what I see for this header in different browser is appears to confirm to the same rules of capitalization and space usage I need to be sure. For the mainstream browsers is it safe to assume that this header string will bear the form (for multipart form data)
Content-type:...; boundary=...
or is it necessary to check for redundant spaces, e.g. boundary = etc?
Is adding an extra redudandant header to the HTTP request may cause any functionality harm?
for example:
Adding :
myheader=blablabla
An X- prefix was customary for those headers, but no longer. It shouldn't break anything as long as your headers are formatted correctly (so: myheader: blablabla, not myheader=blablabla)
The HTTP 1.1 specification says (about entity headers);
Unrecognized header fields SHOULD be ignored by the recipient and MUST be forwarded by transparent proxies.
In other words - since the wording is SHOULD, not MUST - recipients are allowed to react to unknown headers, so technically your extra header could cause harm.
In practice though I have never seen a recipient do this, and with the surfacing of newer RFCs regarding custom header use seeing an adverse effect is very unlikely.
I have written a mini-minimalist http server prototype ( heavily inspired by boost asio examples ), and for the moment I haven't put any http header in the server response, only the html string content. Surprisingly it works just fine.
In that question the OP wonders about necessary fields in the http response, and one of the comments states that they may not be really important from the server side.
I have not tried yet to respond binary image files, or gzip compressed file for the moment, in which cases I suppose it is mandatory to have a http header.
But for text only responses (html, css, and xml outputs), would it be ok never to include the http header in my server responses ? What are the risks / errors possible ?
At a minimum, you must provide a header with a status line and a date.
As someone who has written many protocol parsers, I am begging you, on my digital metaphoric knees, please oh please oh please don't just totally ignore the specification just because your favorite browser lets you get away with it.
It is perfectly fine to create a program that is minimally functional, as long as the data it produces is correct. This should not be a major burden, since all you have to do is add three lines to the start of your response. And one of those lines is blank! Please take a few minutes to write the two glorious line of code that will bring your response data into line with the spec.
The headers you really should supply are:
the status line (required)
a date header (required)
content-type (highly recommended)
content-length (highly recommended), unless you're using chunked encoding
if you're returning HTTP/1.1 status lines, and you're not providing a valid content-length or using chunked encoding, then add Connection: close to your headers
the blank line to separate header from body (required)
You can choose not to send a content-type with the response, but you have to understand that the client might not know what to do with the data. The client has to guess what kind of data it is. A browser might decide to treat it as a downloaded file instead of displaying it. An automated process (someone's bash/curl script) might reasonably decide that the data isn't of the expected type so it should be thrown away.
From the HTTP/1.1 Specification section 3.1.1.5. Content-Type:
A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.