When receiving a raw response from http://www.google.com/ the Content-Length header is missing. Instead, the number of bytes to receive appears to be placed after the end-of-header marker \r\n\r\n but before the actual content.
I looked at the raw response, and the 8000 line is terminated by \r\n.
Partial Google Response
Header
HTTP/1.1 200 OK
Date: Tue, 28 Oct 2014 18:38:37 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: ...
Set-Cookie: ...
P3P: ...
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic,p=0.01
Transfer-Encoding: chunked
End of Header (signified by '\r\n\r\n')
8000 # has '\r\n', I am assuming this is the content-length?
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content...
End Response
So my question is why is Google so damn special, that they can screw up my re-invention of the HTTP wheel. And if I should account for this happening in all my responses or just from Google.
You should account for this possibility from all responses, as it is quite common. Nothing about this is "special" and this is indeed completely legitimate behavior.
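What you're assuming is the content-length is actually the chunk size, as per the chunked transfer encoding that the server is responding with (note the Transfer-Encoding: chunked header). The chunk size is written in hexadecimal, so 8000 means 0x8000 = 32768 bytes, not 8000 bytes.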
The response is complete when a chunk size of 0 is encountered, thus the effective content-length of a chunked response is equal to the sum of chunk sizes.
This has been part of the HTTP/1.1 spec since at least RFC 2616 (1999).
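To make the chunk handling concrete, here is a minimal sketch of a chunked-body reader in Python. The function name read_chunked_body is made up for illustration, and it assumes you already have a binary file-like object (for example from socket.makefile("rb")) positioned right after the \r\n\r\n that ends the headers. It discards chunk extensions and trailer fields rather than interpreting them.

```python
import io


def read_chunked_body(stream):
    """Decode a chunked HTTP body from a binary file-like object.

    `stream` must be positioned just after the blank line ending the headers.
    (Hypothetical helper, not part of any library.)
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("connection closed before the terminating chunk")
        # The size is hexadecimal and may be followed by ";extension" text.
        size_text = size_line.split(b";", 1)[0].strip().decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        chunk = stream.read(size)
        if len(chunk) != size:
            raise EOFError("connection closed in the middle of a chunk")
        body += chunk
        stream.read(2)  # discard the CRLF that follows each chunk's data
    # Skip optional trailer fields up to the final blank line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = read_chunked_body(io.BytesIO(raw))
print(decoded)  # b'Wikipedia'
```

The total length is only known once the zero-size chunk arrives, which is why the server cannot send Content-Length up front in this mode.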
The send and receive content on the server I work on is with type "application/xml".
In my init section I added the line below so that the header is automatically added to all my requests:
web_add_auto_header("Content-Type","application/xml");
When I run the script, the response headers show the correct content-type, but in the body I get this message:
351-byte response headers for "http://172.29.67.68/svc/bw/cti/monitor/event/bw_perfuser1000_60a439f7-599d-4fe1-baa6-598391312954" (RelFrameId=1, Internal ID=5)
HTTP/1.1 200 OK\r\n
Date: Mon, 11 Mar 2019 18:20:09 GMT\r\n
Content-Length: 681\r\n
Content-Type: application/xml\r\n
X-Frame-Options: SAMEORIGIN\r\n
Expires: Thu, 01 Jan 1970 00:00:00 GMT\r\n
Cache-Control: no-cache, private, must-revalidate, max-stale=0, post-check=0, pre-check=0
no-store\r\n
Pragma: no-cache\r\n
Keep-Alive: timeout=15, max=96\r\n
Connection: Keep-Alive\r\n
message I get:
HTML parsing not performed for Content-Type "application/xml" ("ParseHtmlContentType" Run-Time Setting is "TEXT").
To fix this issue, I need to add the below line before each request
web_add_header("Content-Type","application/xml");
Can anyone please explain why I need to explicitly mention the content-type before each request although I used the web_add_auto_header() function?
In the HTTP protocol, you need to specify the request header fields in each HTTP request. For details on HTTP headers, please refer to the wiki.
From this wikipedia example
a typical HTTP response may look like this
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Encoding: UTF-8
Content-Length: 138
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
ETag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Connection: close
<html>
<head>
<title>An Example Page</title>
</head>
<body>
Hello World, this is a very simple HTML document.
</body>
</html>
In this case because of this header:
Connection: close
the client will know that the HTTP body has finished, because the TCP connection will be closed. If, for example, Connection: close wasn't present, the client would still know the HTTP body was finished because this header:
Content-Length: 138
is saying: once you get 138 bytes, we're done here. But in the case where neither of these headers is used and the server instead sends Transfer-Encoding: chunked, how does a browser know when a response has completed, so that it can move on to other things?
From Wikipedia:
The chunked keyword in the Transfer-Encoding header is used to
indicate chunked transfer. Each chunk is preceded by its size. The
transmission ends when a zero-length chunk is encountered.
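Putting the three cases from this thread together, a client could pick the end-of-body rule roughly as below. This is a simplified sketch based on my reading of RFC 7230 section 3.3.3, not a complete implementation; the function name body_framing and the returned labels are invented for the example, and headers is assumed to be a dict with lower-case field names.

```python
def body_framing(status, headers, method="GET"):
    """Return (rule, length) describing how the response body ends.

    Hypothetical helper; `headers` maps lower-case names to values.
    """
    # Responses to HEAD, and 1xx/204/304 responses, never carry a body.
    if method == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return ("none", 0)
    # Transfer-Encoding takes precedence over Content-Length.
    codings = [t.strip().lower()
               for t in headers.get("transfer-encoding", "").split(",")
               if t.strip()]
    if codings and codings[-1] == "chunked":
        return ("chunked", None)  # read chunks until the zero-length one
    if "content-length" in headers:
        return ("length", int(headers["content-length"]))
    # Neither header: the body runs until the server closes the connection.
    return ("until-close", None)


print(body_framing(200, {"content-length": "138", "connection": "close"}))
print(body_framing(200, {"transfer-encoding": "chunked"}))
print(body_framing(200, {"connection": "close"}))
```

So for the Wikipedia example above the client would stop after 138 bytes, and for a chunked response it would stop at the zero-length chunk, without needing the connection to close in either case.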
I'm working on a Delphi API for Google Docs and having a hard time getting the upload to work. I'm following Google's development guide here, and from what I understand the process should go like this:
1. Make a POST request to this url: https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false with these headers: X-Upload-Content-Type and X-Upload-Content-Length
2. Get a 200 OK response with the next upload location stored in the Location header
3. Make a PUT request to the Location header with the header Content-Type set to whatever I had X-Upload-Content-Type set to in step 1 and the header Content-Range set to something like this: bytes 0-524287/2097152 and the first 512kb of data in the body
4. Get a 308 Resume Incomplete Response that has the next upload location in the Location header
5. Go back to step 3 until all bytes are uploaded, at which point I will receive a 201 Created response that will have the XML data describing the file I uploaded
Everything up to and including step 3 works fine. It is at step 4 that things start to go wrong.
The one thing that confuses me the most is that the response on step 4 doesn't contain a Location header. I figured that meant I should just send the next request to the same url, but that causes me to get a 504 error. I tried the entire process with Fiddler just to see whether the problem was the Delphi code, a lack of understanding on my part, or something that Google is doing.
Here are the requests and responses I sent and received using Fiddler:
POST https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false HTTP/1.1
Content-Type: application/x-www-form-urlencoded
X-Upload-Content-Type: application/octet-stream
X-Upload-Content-Length: 2097152
Content-Length: 0
Host: docs.google.com
HTTP/1.1 200 OK
Server: HTTP Upload Server Built on May 16 2012 12:03:24 (1337195004)
Location: https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false&upload_id=AEnB2Ur9-9VxMSI6kaFzbybY2qiyzK6kVoKzcZ6Yo02H8Ni4FlQFl_N06DdjZXzp3vSjOPH3CEb_4vDlKZp7VlC0hxpkypzlKg
Date: Tue, 22 May 2012 16:53:27 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/html
PUT https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false&upload_id=AEnB2Ur9-9VxMSI6kaFzbybY2qiyzK6kVoKzcZ6Yo02H8Ni4FlQFl_N06DdjZXzp3vSjOPH3CEb_4vDlKZp7VlC0hxpkypzlKg HTTP/1.1
Content-Type: application/octet-stream
Content-Length: 524288
Content-Range: bytes 0-524287/2097152
Host: docs.google.com
[first 512kb of data here]
HTTP/1.1 308 Resume Incomplete
Server: HTTP Upload Server Built on May 16 2012 12:03:24 (1337195004)
Range: bytes=0-524287
X-Range-MD5: bd9d4ee7afa24b7da0e685f05b5f1f44
Date: Tue, 22 May 2012 16:54:29 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
Content-Length: 0
Content-Type: text/html
PUT https://docs.google.com/feeds/upload/create-session/default/private/full/?access_token=my_access_token&v=3&convert=false&upload_id=AEnB2Ur9-9VxMSI6kaFzbybY2qiyzK6kVoKzcZ6Yo02H8Ni4FlQFl_N06DdjZXzp3vSjOPH3CEb_4vDlKZp7VlC0hxpkypzlKg HTTP/1.1
Content-Type: application/octet-stream
Content-Length: 524288
Content-Range: bytes 524288-1048575/2097152
Host: docs.google.com
[next 512kb of data]
HTTP/1.1 504 Fiddler - Send Failure
Content-Type: text/html; charset=UTF-8
Connection: close
Timestamp: 10:54:14.056
The only thing I could establish is that the problem is not just in the Delphi code, and since I doubt the fault is Google's, I must be misunderstanding something about how the process is supposed to work. What am I missing?
Edit
I was able to get the upload working. I'm not entirely sure what I did differently, but the documentation is a little misleading, at least to me. When you send a PUT request, you don't get a new location; you just continue to upload to the same one. Also, when you finish the upload, the 201 response doesn't contain the actual XML data; instead, it has a Location header that points to where you can grab the XML data from. Not a huge deal, but a little confusing.
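For anyone repeating this, the Content-Range values for the successive PUT requests can be computed rather than typed by hand. The sketch below only builds the header strings and sends nothing; the function name content_ranges is made up, and the 512 KB default simply mirrors the chunk size used in the requests above.

```python
def content_ranges(total_size, chunk_size=512 * 1024):
    """Yield one Content-Range header value per upload chunk.

    Hypothetical helper: byte positions are zero-based and inclusive,
    matching "bytes 0-524287/2097152" in the first PUT above.
    """
    for start in range(0, total_size, chunk_size):
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size, total_size) - 1
        yield "bytes %d-%d/%d" % (start, end, total_size)


ranges = list(content_ranges(2097152))
for value in ranges:
    print("Content-Range:", value)
```

For the 2097152-byte file in the trace this gives four ranges, the first two being exactly the ones sent in the two PUT requests. Each PUT's Content-Length should equal end - start + 1 for its range.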
It seems like the 504 error is generated by Fiddler itself rather than by Google (note the "504 Fiddler - Send Failure" status line). These two links should help:
https://urda.com/blog/2010/09/28/iis-services-504s-and-fiddler/
https://urda.com/blog/2010/09/30/follow-up-iis-services-504s-and-fiddler/
According to "HTTP: The Definitive Guide", using
Connection: keep-alive
to specify a persistent connection is deprecated in HTTP/1.1, since HTTP/1.1 specifies that connections are persistent by default and must be closed manually by sending
Connection: close
Thus, my simple assumption is that "Connection: keep-alive" shouldn't really be used anymore. However, it still seems alive and well. For example, keep-alive is being returned in the following query:
curl -I https://foursquare.com
HTTP/1.1 200 OK
Server: nginx/0.8.52
Date: Thu, 11 Aug 2011 21:15:45 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Expires: Thu, 11 Aug 2011 21:15:45 UTC
Set-Cookie: XSESSIONID=w19~kqtn4bpqmfq51p8qolstpk6ti;Path=/;Secure;HttpOnly
Set-Cookie: LOCATION=49.25::-123.13330078125::Hockeytown::CA;Path=/;Secure
Set-Cookie: bbhive=OQ32XATE0OQAEVCY0IVSWUDPQ1A2GT
Content-Length: 38815
Cache-Control: no-cache, private, no-store
Pragma: no-cache
My question is: Why is Connection: keep-alive still being specified in HTTP headers?
A corollary question is: Are there clients, servers, proxies, etc. that still only speak HTTP/1.0 and its variants, or are most such entities on HTTP/1.1 as of 2011?
Here are my working hypotheses:
1) HTTP/1.0 is no longer in use, b/c that was "many years" ago
2) Given (1), keep-alive shouldn't be used anymore, but is purely for vestigial reasons (that is, certain technologies haven't bothered to remove it, or keep it around as voodoo code, etc.)
If (1) is incorrect, and HTTP/1.0 is still in use, then sure it seems plausible to keep using keep-alive, despite follow-up questions on HTTP 1.0-1.1 interop.
Thanks in advance for any insights shared!
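The HTTP/1.0 spec does not define a Connection header, but there are many different implementations of HTTP/1.0 and HTTP/1.1 in use, and keep-alive was widely supported as an extension to HTTP/1.0.
So Connection: keep-alive is sent "just in case" the other side is one of those implementations.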
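As a rough illustration of why the header is harmless for HTTP/1.1 peers and still useful for HTTP/1.0 ones, here is how a recipient might decide whether a connection stays open. This is a sketch of my understanding of RFC 7230 section 6.3 and ignores the caveats about HTTP/1.0 proxies; the function name connection_persists is invented for the example.

```python
def connection_persists(http_version, connection_header=""):
    """Decide whether the connection stays open after this message.

    Hypothetical helper. `http_version` is a tuple such as (1, 0) or (1, 1);
    `connection_header` is the raw Connection field value, if any.
    """
    tokens = {t.strip().lower()
              for t in connection_header.split(",") if t.strip()}
    if "close" in tokens:
        return False               # an explicit close always wins
    if http_version >= (1, 1):
        return True                # HTTP/1.1 is persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 must opt in explicitly


print(connection_persists((1, 1)))                # True
print(connection_persists((1, 1), "keep-alive"))  # True
print(connection_persists((1, 0)))                # False
print(connection_persists((1, 0), "Keep-Alive"))  # True
```

For an HTTP/1.1 peer the keep-alive token changes nothing, while for an HTTP/1.0 peer it is the only way to ask for a persistent connection.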
The http spec says about the HEAD request:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
Should the response to a HEAD request contain a Content-Length header? Should it be the value which would be returned on a GET request, even if there is no response body? Or should the Content-Length be 0?
To me it looks like the HTTP 1.1 RFC is pretty specific:
The Content-Length
entity-header field indicates the size of the entity-body, in decimal
number of OCTETs, sent to the recipient or, in the case of the HEAD
method, the size of the entity-body that would have been sent had
the request been a GET.
Section 14.13 of the HTTP/1.1 spec details the Content-Length header, and says this:
Applications SHOULD use this field to
indicate the transfer-length of the
message-body, unless this is
prohibited by the rules in section
4.4.
The word 'SHOULD' has a very specific meaning in RFCs:
SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
So, you may not always see a Content-Length. Typically you might not see it for any content which is dynamically generated, since that might be too expensive to service an exploratory HEAD request. For example, a HEAD request to Apache for a static file will have a Content-Length, but a request for a PHP script may not.
For example, try this very website...
telnet stackoverflow.com 80
HEAD / HTTP/1.0
Host:stackoverflow.com
HTTP/1.1 200 OK
Date: Mon, 11 Jan 2016 10:58:25 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Set-Cookie: __cfduid=c2eb4742a1e02d89cab0402220736c0bd1452509905; expires=Tue, 10-Jan-17 10:58:25 GMT; path=/; domain=.stackoverflow.com; HttpOnly
Cache-Control: public, no-cache="Set-Cookie", max-age=36
Expires: Mon, 11 Jan 2016 10:59:02 GMT
Last-Modified: Mon, 11 Jan 2016 10:58:02 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
X-Request-Guid: 487e80bc-3783-4cfd-d883-a3bc84253234
Set-Cookie: prov=8dc24306-c067-45eb-bf5d-cffa855c2b03; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
Server: cloudflare-nginx
CF-RAY: 26303c15f8e035a2-LHR
No content-length there.
Yes, a HEAD response SHOULD include the Content-Length value of the corresponding GET response, but it does not always do so (see Paul's answer):
Stack Overflow does:
> telnet stackoverflow.com 80
HEAD / HTTP/1.1
Host: stackoverflow.com
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Length: 362245 <--------
Content-Type: text/html; charset=utf-8
Expires: Mon, 04 Oct 2010 11:51:49 GMT
Last-Modified: Mon, 04 Oct 2010 11:50:49 GMT
Vary: *
Date: Mon, 04 Oct 2010 11:50:49 GMT
Google doesn't:
> telnet www.google.com 80
HEAD / HTTP/1.1
Host: www.google.ie
HTTP/1.1 200 OK
Date: Mon, 04 Oct 2010 11:55:36 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Server: gws
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
The HTTP-spec at W3C states:
If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, ...
Which (to me) means it should hold the "correct" value as you would in a GET response.
Contra the accepted answer, section 4.3.2 of RFC 7231 states:
The server SHOULD send the same header fields in response to a HEAD request as it would have sent if the request had been a GET, except that the payload header fields (Section 3.3)
—which is to say, Content-Length, Content-Range, Trailer, and Transfer-Encoding—
MAY be omitted.
This is even weaker than the note on SHOULD in Paul Dixon's answer:
MAY This word, or the adjective "OPTIONAL", mean that an item is
truly optional. One vendor may choose to include the item because a
particular marketplace requires it or because the vendor feels that
it enhances the product while another vendor may omit the same item.
So the real answer is, you don't need to include Content-Length, but if you do, you should give the correct value.
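On the client side this means code should treat Content-Length in a HEAD response as optional. A minimal sketch, assuming you have the raw response bytes in hand; the function name head_content_length is made up, and the sample responses are shortened versions of the ones quoted in this thread.

```python
def head_content_length(raw_response):
    """Return the advertised Content-Length of a raw HEAD response, or None.

    Hypothetical helper; it only looks at the header section.
    """
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            return int(value.strip())
    return None  # header omitted, which the spec allows for HEAD


with_length = (b"HTTP/1.1 200 OK\r\n"
               b"Cache-Control: public, max-age=60\r\n"
               b"Content-Length: 362245\r\n"
               b"Content-Type: text/html; charset=utf-8\r\n\r\n")
without_length = (b"HTTP/1.1 200 OK\r\n"
                  b"Server: gws\r\n"
                  b"Transfer-Encoding: chunked\r\n\r\n")

print(head_content_length(with_length))     # 362245
print(head_content_length(without_length))  # None
```

Note that the value, when present, describes the body a GET would have returned; the HEAD response itself still has no body to read.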