HTTP Request\Response Header Grammar - http

In the header of an HTTP request or response will the header keys be constant in terms of capitalization, between servers.
I ask so I can expect in my code: (Using Fake Function names)
Safe Precise Python Code
for hdr in header.keys():
if 'content-length' == hdr.lower():
recv_more_data( header[hdr] ) # header[hdr] == Content-Length (5388) bytes
break # Exit for loop when if statement is met.
Code I Would Like To Use
recv_more_data (header['Content-Length'])
# I know to expect 'Content-Length' not 'content-Length' or some other variation
Meaning will a server ever return a header with the keys like so.
Standard Request
GET / HTTP/1.1
Host: www.example-host.com
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Accept: */*
Accept-Language: en-US
Accept-Encoding: gzip
Connection: closed
Content-Length: 0
A Bad But Possible Response?
HTTP/1.1 200 OK
Server: nginx/1.0.15
date: Thu, 23 Oct 2014 00:25:37 GMT
content-Type: text/html; charset=iso-8859-1
transfer-encoding: chunked
Connection: close
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Encoding: gzip
Clarification will help my code neatness.

HTTP header names are case-insensitive, per the HTTP specification.
RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1
Section 4.2 - Message Headers
HTTP header fields, which include general-header (section 4.5),
request-header (section 5.3), response-header (section 6.2), and
entity-header (section 7.1) fields, follow the same generic format as
that given in Section 3.1 of RFC 822 [9]. Each header field consists
of a name followed by a colon (":") and the field value. Field names
are case-insensitive. The field value MAY be preceded by any amount
of LWS, though a single SP is preferred. Header fields can be
extended over multiple lines by preceding each extra line with at
least one SP or HT. Applications ought to follow "common form", where
one is known or indicated, when generating HTTP constructs, since
there might exist some implementations that fail to accept anything
beyond the common forms.
RFC 7230 - Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing
Section 3.2 - Header Fields:
Each header field consists of a case-insensitive field name followed
by a colon (":"), optional leading whitespace, the field value, and
optional trailing whitespace.

HTTP header names are case insensitive.
It looks like you're using python. Check out the requests library. It'll make your life much easier: http://docs.python-requests.org/en/latest/

Bear in mind that, even though most major servers will have consistent capitalization, any Joe PHP Developer can set the response headers manually in their code - and there is no way to police what that guy uses as a capitalization standard.

Related

Managing "HTTP/1.1 502 Bad gateway error"

I need to interact with a remote HTTP server at the lowest possible level (i.e.: at socket level) because my target is a very small embedded system with no support for higher level libraries (it's a bare-metal uController wit no O.S. at all and talking to a GSM modem via serial line; modem has some support for sockets, but nothing above that).
Basic need is to upload a "file" using POST.
I have all needed Header/Body in place and it "usually works".
Problem is I randomly get a "HTTP/1.1 502 Bad gateway error" response and this is more likely to happen as the size of "file" increases.
I understand this means there's some problem between the reverse proxy frontend (nginx, apparently) and the backends, but I have absolutely no control on those (actually I dont't really know how the atual setup besides what can be gleamed from (light) probing).
My current strategy is to open a plain socket and send the folowing sequence (dots represent binary data):
POST /path/to/websend.php HTTP/1.0
Host: host.domain.tld
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Type: multipart/form-data; boundary=AaB03x
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Accept: */*
Content-Length: <full_length>
--AaB03x
Content-Disposition: form-data; name="IV"
Content-Type: application/data
Content-Transfer-Encoding: binary
000102030405060708090A0B0C0D0E0F
--AaB03x
Content-Disposition: form-data; name="S_TXT_FILE"; filename="FILENAME_s.txt"
Content-Type: application/data
Content-Transfer-Encoding: binary
..............................................................
..............................................................
...... several 512byte blocks ................................
..............................................................
..............................................................
--AaB03x--
Is there something I could do to enhance reliability?
I already do multiple retries and this actually works, but sometimes I need to retry six or more times to have a positive answer (200 OK).
Note I send exactly the same sequence on rety and it succeeds... eventually.
I need to send two parts because content is encrypted and first part is the neded "Initialization Vector".

is it a bug to send response gzip-compressed to clients that doesn't specify Accept-Encoding: gzip?

is it a bug in the server if it sends content gzip-compressed to clients that did not specify Accept-Encoding: gzip ? is it breaking the http specs? or is it legal?
i'm curious because https://www.amazon.com always sends content gzip-compressed, regardless of the Accept-Encoding header, as a simple test to confirm:
$ curl https://www.amazon.com
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
$ curl https://www.amazon.com -I
HTTP/2 405
content-type: text/html; charset=UTF-8
server: Server
date: Sat, 03 Nov 2018 11:27:35 GMT
set-cookie: skin=noskin; path=/; domain=.amazon.com
strict-transport-security: max-age=47474747; includeSubDomains; preload
x-amz-id-1: 2M3HZHHA9J21D3MTHH4K
allow: POST, GET
vary: Accept-Encoding,User-Agent,X-Amazon-CDN-Cache
content-encoding: gzip
x-amz-rid: 2M3HZHHA9J21D3MTHH4K
x-frame-options: SAMEORIGIN
x-cache: Error from cloudfront
via: 1.1 1cc4305a3ce000ca199328864ca1c98e.cloudfront.net (CloudFront)
x-amz-cf-id: OKz61IdKmCBfC97pPg-zmDhQnJzK3THXL2iYwegU5EtDaRf6yjBGzw==
curl complains that it's recieving binary data here because it's not responding with HTML, but gzip-compressed html, which is binary data. to actually see the html, add the --compressed argument, which tells curl to add the header Accept-Encoding: gzip, deflate and automatically decompress the response.
A request without an Accept-Encoding header field implies that the user agent has no preferences regarding content-codings. Although this allows the server to use any content-coding in a response, it does not imply that the user agent will be able to correctly process all encodings.
-- https://greenbytes.de/tech/webdav/rfc7231.html#rfc.section.5.3.4.p.4

Debugging Multiple Disposition Headers

I am developing a web application using Java and keep getting this error in Chrome on some particular pages:
net::ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION
So, I checked WireShark for the corresponding TCP stream and this was the header of the response:
HTTP/1.0 200 OK
Date: Mon, 10 Sep 2012 08:48:49 GMT
Server: Apache-Coyote/1.1
Content-Disposition: attachment; filename=KBM 80 U (50/60Hz,220/230V)_72703400230.pdf
Content-Type: application/pdf
Content-Length: 564449
X-Cache: MISS from my-company-proxy.local
X-Cache-Lookup: MISS from my-company-proxy.local:8080
Via: 1.0 host-of-application.com, 1.1 my-company-proxy.local:8080 (squid/2.7.STABLE5)
Connection: keep-alive
Proxy-Connection: keep-alive
%PDF-1.4
[PDF data ...]
I only see one content disposition header in there. Why does chrome tell me there were several?
Because the filename parameter is unquoted, and contains a comma character (which is not allowed in unquoted values, and in this case indicates that multiple header values have been folded into a single one).
See http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.4.2.p.5 and http://greenbytes.de/tech/webdav/rfc6266.html

Why is Connection: keep-alive still being specified in http headers (isn't it deprecated)?

According to "HTTP: The Definitive Guide", using
Connection: keep-alive
to specify a persistent connection is deprecated in HTTP/1.1, since HTTP/1.1 specifies that connections are persistent by default and must be closed manually by sending
Connection: close
Thus, my simple assumption is that "Connection: keep-alive" shouldn't really be used anymore. However, it still seems alive and well. For example, keep-alive is being returned in the following query:
curl -I https://foursquare.com
HTTP/1.1 200 OK
Server: nginx/0.8.52
Date: Thu, 11 Aug 2011 21:15:45 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Expires: Thu, 11 Aug 2011 21:15:45 UTC
Set-Cookie: XSESSIONID=w19~kqtn4bpqmfq51p8qolstpk6ti;Path=/;Secure;HttpOnly
Set-Cookie: LOCATION=49.25::-123.13330078125::Hockeytown::CA;Path=/;Secure
Set-Cookie: bbhive=OQ32XATE0OQAEVCY0IVSWUDPQ1A2GT
Content-Length: 38815
Cache-Control: no-cache, private, no-store
Pragma: no-cache
My question is: Why is Connection: keep-alive still being specified in HTTP headers?
A corollary question is: Are there still (clients, servers, proxies, etc) that still only speak HTTP/1.0 and its variants, or are most such entities on HTTP/1.1 as of 2011?
Here are my working hypotheses:
1) HTTP/1.0 is no longer in use, b/c that was "many years" ago
2) Given (1), keep-alive shouldn't be used anymore, but is purely for vestigial reasons (that is, certain technologies haven't bothered to remove it, or keep it around as voodoo code, etc.)
If (1) is incorrect, and HTTP/1.0 is still in use, then sure it seems plausible to keep using keep-alive, despite follow-up questions on HTTP 1.0-1.1 interop.
Thanks in advance for any insights shared!
HTTP/1.0 have no headers like Connection, but there is many different implementation of HTTP/1.0 and HTTP/1.1.
so Connection: keep-alive is used 'Just in case'

Content-Length header with HEAD requests?

The http spec says about the HEAD request:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
Should the response to a HEAD request contain a Content-Length header? Should it be the value which would be returned on a GET request, even if there is no response body? Or should the Content-Length be 0?
To me it looks like the HTTP 1.1 RFC is pretty specific:
The Content-Length
entity-header field indicates the size of the entity-body, in decimal
number of OCTETs, sent to the recipient or, in the case of the HEAD
method, the size of the entity-body that would have been sent had
the request been a GET.
Section 14.13 of the HTTP/1.1 spec detailed the Content-Length header, and says this:
Applications SHOULD use this field to
indicate the transfer-length of the
message-body, unless this is
prohibited by the rules in section
4.4.
The word 'SHOULD' has a very specific meaning in RFCs:
SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
So, you may not always see a Content-Length. Typically you might not see it for any content which is dynamically generated, since that might be too expensive to service an exploratory HEAD request. For example, a HEAD request to Apache for a static file will have a Content-Length, but a request for a PHP script may not.
For example, try this very website...
telnet stackoverflow.com 80
HEAD / HTTP/1.0
Host:stackoverflow.com
HTTP/1.1 200 OK
Date: Mon, 11 Jan 2016 10:58:25 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Set-Cookie: __cfduid=c2eb4742a1e02d89cab0402220736c0bd1452509905; expires=Tue, 10-Jan-17 10:58:25 GMT; path=/; domain=.stackoverflow.com; HttpOnly
Cache-Control: public, no-cache="Set-Cookie", max-age=36
Expires: Mon, 11 Jan 2016 10:59:02 GMT
Last-Modified: Mon, 11 Jan 2016 10:58:02 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
X-Request-Guid: 487e80bc-3783-4cfd-d883-a3bc84253234
Set-Cookie: prov=8dc24306-c067-45eb-bf5d-cffa855c2b03; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Server: cloudflare-nginx
CF-RAY: 26303c15f8e035a2-LHR
No content-length there.
Yes, the Content-Length of a HEAD response SHOULD, but not always does (see #Paul's answer) include the Content-Length value of a GET response:
Stack Overflow does:
> telnet stackoverflow.com 80
HEAD / HTTP/1.1
Host: stackoverflow.com
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Length: 362245 <--------
Content-Type: text/html; charset=utf-8
Expires: Mon, 04 Oct 2010 11:51:49 GMT
Last-Modified: Mon, 04 Oct 2010 11:50:49 GMT
Vary: *
Date: Mon, 04 Oct 2010 11:50:49 GMT
Google doesn't:
> telnet www.google.com 80
HEAD / HTTP/1.1
Host: www.google.ie
HTTP/1.1 200 OK
Date: Mon, 04 Oct 2010 11:55:36 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Server: gws
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
The HTTP-spec at W3C states:
If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, ...
Which (to me) means it should hold the "correct" value as you would in a GET response.
Contra the accepted answer, section 4.3.2 of RFC 7231 states:
The server SHOULD send the same header fields in response to a HEAD request as it would have sent if the request had been a GET, except that the payload header fields (Section 3.3)
—which is to say, Content-Length, Content-Range, Trailer, and Transfer-Encoding—
MAY be omitted.
This is even weaker than the note on SHOULD in Paul Dixon's answer:
MAY This word, or the adjective "OPTIONAL", mean that an item is
truly optional. One vendor may choose to include the item because a
particular marketplace requires it or because the vendor feels that
it enhances the product while another vendor may omit the same item.
So the real answer is, you don't need to include Content-Length, but if you do, you should give the correct value.

Resources