HTTP protocol Content-Length

I am working on a simple download application. When requesting the following file, neither Firefox nor my application receives the Content-Length field. But if I make the request using wget, the server does send the Content-Length field. I changed wget's user agent string to test, and it still got the Content-Length field.
Any ideas why this is happening?
wget request
---request begin---
GET /dc-13/video/2005_Defcon_V2-P_Zimmerman-Unveiling_My_Next_Big_Project.mp4 HTTP/1.0
User-Agent: test
Accept: */*
Host: media.defcon.org
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Server: lighttpd
Date: Sun, 05 Apr 2009 04:40:08 GMT
Last-Modified: Tue, 23 May 2006 22:18:19 GMT
Content-Type: video/mp4
Content-Length: 104223909
Connection: keep-alive
Firefox request
GET /dc-13/video/2005_Defcon_V2-P_Zimmerman-Unveiling_My_Next_Big_Project.mp4 HTTP/1.1
Host: media.defcon.org
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.0.8) Gecko/2009032608 Firefox/3.0.8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.defcon.org/html/links/defcon-media-archives.html
Pragma: no-cache
Cache-Control: no-cache
HTTP/1.x 200 OK
Server: lighttpd
Date: Sun, 05 Apr 2009 05:20:12 GMT
Last-Modified: Tue, 23 May 2006 22:18:19 GMT
Content-Type: video/mp4
Transfer-Encoding: chunked
Update:
Is there a header I can send that will tell lighttpd not to use chunked encoding? My original problem is that I am using URLConnection to grab the file in my Java application, which automatically sends an HTTP 1.1 request.
I would like to know the size of the file so I can update my progress percentage.

GET /dc-13/video/2005_Defcon_V2-P_Zimmerman-Unveiling_My_Next_Big_Project.mp4 HTTP/1.1
Firefox is performing an HTTP 1.1 GET request. lighttpd understands that the client supports chunked transfer encoding and returns the content in chunks, with each chunk reporting its own length.
GET /dc-13/video/2005_Defcon_V2-P_Zimmerman-Unveiling_My_Next_Big_Project.mp4 HTTP/1.0
Wget, on the other hand, performs an HTTP 1.0 GET request. lighttpd, understanding that the client doesn't support HTTP 1.1 (and thus chunked transfer encoding), returns the content as a single stream, with the total length reported in the Content-Length response header.

Looks like it's because of the chunked transfer encoding:
Transfer-Encoding: chunked
This will send the video down in chunks, each with its own size. Chunked transfer encoding is defined in HTTP 1.1, which is what Firefox is using, while wget is using HTTP 1.0, which doesn't support it, so the server has to send the whole file at once and declare its total length up front.
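To make that concrete: a chunked body interleaves hexadecimal chunk sizes with the data itself and ends with a zero-size chunk, so the total length is never stated up front. A made-up example (CRLF pairs shown as \r\n) whose decoded body is the 9 bytes "Wikipedia":
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
0\r\n
\r\n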

I was having the same problem and found a solution that works regardless of the HTTP version:
First send a HEAD request, to which the server responds with just the HTTP header and no content. This header correctly includes the wanted Content-Length: the byte size of the file to download.
Then proceed with the GET request to download the file (even when the header of the GET response fails to include Content-Length).
An Objective-C example:
NSString *zipURL = @"http://1.bp.blogspot.com/_6-cw84gcURw/TRNb3PDWneI/AAAAAAAAAYM/YFCZP1foTiM/s1600/paragliding1.jpg";
NSURL *url = [NSURL URLWithString:zipURL];
// Configure the HTTP request for a HEAD header fetch
NSMutableURLRequest *urlRequest = [NSMutableURLRequest requestWithURL:url];
urlRequest.HTTPMethod = @"HEAD"; // Default is "GET"
// Response object
__autoreleasing NSHTTPURLResponse *response;
// Send the HEAD request to the server
NSData *contentsData = [NSURLConnection sendSynchronousRequest:urlRequest returningResponse:&response error:nil];
// Response header fields
NSDictionary *headerDeserialized = response.allHeaderFields;
// The content length, parsed from the header value
long long contents_length = [(NSString *)headerDeserialized[@"Content-Length"] longLongValue];
//printf("HEAD Response header: %s\n", headerDeserialized.description.UTF8String);
printf("HEAD:\ncontentsData.length: %lu\n", (unsigned long)contentsData.length);
printf("contents_length = %lld\n\n", contents_length);
urlRequest.HTTPMethod = @"GET";
// Send the GET request to download the file
contentsData = [NSURLConnection sendSynchronousRequest:urlRequest returningResponse:&response error:nil];
// Response header fields
headerDeserialized = response.allHeaderFields;
// The content length
contents_length = [(NSString *)headerDeserialized[@"Content-Length"] longLongValue];
printf("GET Response header: %s\n", headerDeserialized.description.UTF8String);
printf("GET:\ncontentsData.length: %lu\n", (unsigned long)contentsData.length);
printf("contents_length = %lld\n", contents_length);
return;
And the output:
HEAD:
contentsData.length: 0
contents_length = 146216
GET:
contentsData.length: 146216
contents_length = 146216
(Note: this example URL does provide the Content-Length header in the GET response as well, but it illustrates the approach for cases where that header is missing.)
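Since the original question used Java's URLConnection, here is the same HEAD-then-GET idea as a minimal Java sketch. It assumes the server answers HEAD with a Content-Length, as the example above suggests; error handling is omitted:
import java.net.HttpURLConnection;
import java.net.URL;

public class ContentLengthProbe {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://media.defcon.org/dc-13/video/2005_Defcon_V2-P_Zimmerman-Unveiling_My_Next_Big_Project.mp4");

        // HEAD returns the headers only, so the server reports the full size
        // in Content-Length instead of streaming a chunked body.
        HttpURLConnection head = (HttpURLConnection) url.openConnection();
        head.setRequestMethod("HEAD");
        long size = head.getContentLengthLong(); // -1 if the header is absent
        head.disconnect();
        System.out.println("Expected size: " + size + " bytes");

        // Follow up with a normal GET and use 'size' to drive the progress percentage.
    }
}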

Related

Getting strange http response codes, but the site is actually working

When I view the URL below (or the other one in the code) in a browser, it displays fine, and I don't see anything unusual in the network tab when I press F12. But with the code below I get response code 403 or 400. When I use the response code checker at http://httpstatus.io/ it comes back with a 200 response for both URLs.
I get a 403 for http://psychsignal.com/ using my code below.
import java.net.HttpURLConnection;
import java.net.URL;

URL u = new URL("http://www.nasdaqomxnordic.com/"); // returns 400 response code
//u.toURI(); // to check the syntax
HttpURLConnection huc = (HttpURLConnection) u.openConnection();
huc.setRequestMethod("GET");
//huc.setRequestMethod("HEAD");
huc.connect();
System.out.println(huc.getResponseCode());
Thanks if anyone has any ideas! This is actually my first post!
My guess is that there are some restrictions placed on the User-Agent of the client. Some testing seems to support my theory:
If I use the curl default user agent:
# curl -I -H "User-Agent: curl/7.35.0" "http://www.nasdaqomxnordic.com/"
HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache
Pragma: no-cache
Expires: 0
Connection: close
If I use a hacked up standard browser agent string:
# curl -I -H "User-Agent: Mozilla/5.0" -0 "http://www.nasdaqomxnordic.com/"
HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 0
Content-Type: text/html;charset=UTF-8
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Wed, 22 Jul 2015 15:06:22 GMT
Connection: close
And then if I use a Java agent string (which is my guess as to what you're using):
# curl -I -H "User-Agent: Java/1.6.0_26" "http://www.nasdaqomxnordic.com/"
HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache
Pragma: no-cache
Expires: 0
Connection: close
Only the "browser" user agent gets through. I'd try tweaking your code to set the user agent string to something commonly found in a web browser.
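For example, a minimal sketch of that tweak with HttpURLConnection (any browser-like string should do; "Mozilla/5.0" matches the curl test above):
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentFix {
    public static void main(String[] args) throws Exception {
        URL u = new URL("http://www.nasdaqomxnordic.com/");
        HttpURLConnection huc = (HttpURLConnection) u.openConnection();
        // Replace the default "Java/1.x.y" agent, which this server appears to reject
        huc.setRequestProperty("User-Agent", "Mozilla/5.0");
        huc.setRequestMethod("GET");
        huc.connect();
        System.out.println(huc.getResponseCode()); // expect 200 instead of 400
    }
}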

HTTP 400 Bad Request Error

I am trying to post an XML file (test1.xml) to a webservice API and receive its output. I get an error HTTP/1.1 400 Bad Request. This is the code:
myheader = c(Connection = "close",
             'Content-Type' = "application/xml",
             'Content-length' = nchar("test1.xml"))
data = getURL("http://abcd/efg/requests/",
              userpwd = "m12345:123456", httpauth = 1L,
              postfields = "test1.xml",
              httpheader = myheader,
              verbose = TRUE)
This is the output
* Hostname was NOT found in DNS cache
* Trying 123.456.789.123...
* Connected to rcftomdev1 (123.456.789.123) port 8086 (#0)
* Server auth using Basic with user 'm12345'
> POST /dart/requests/ HTTP/1.1
Authorization: Basic bTEzNDQ4M#$#$#$#$%5==
Host: abcd:8086
Accept: */*
Connection: close
Content-Type: application/xml
Content-length: 9
* upload completely sent off: 9 out of 9 bytes
< HTTP/1.1 400 Bad Request
< Server: Apache-Coyote/1.1
< Content-Type: text/html;charset=utf-8
< Content-Length: 968
< Date: Tue, 20 Jan 2015 07:50:05 GMT
< Connection: close
<
* Closing connection 0
Not sure where I am going wrong; any help is appreciated.
If that was indeed exactly what was sent in the Authorization: header for the given user and password, then something seriously wrong happened, as your combo should have produced bTEyMzQ1OjEyMzQ1Ng==.
The other aspects of the request look fine, so only reading up on the server-side requirements for this server and API will give you the answer.
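A quick way to check the expected header value, sketched in Java (the credentials are the ones from the question):
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthValue {
    public static void main(String[] args) {
        // Basic auth is simply Base64 over the raw "user:password" string,
        // with no trailing newline.
        String credentials = "m12345:123456";
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        System.out.println("Authorization: Basic " + encoded);
        // Prints: Authorization: Basic bTEyMzQ1OjEyMzQ1Ng==
    }
}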

HTTP Content length less than File byte-size, did it fully download?

Trying to determine whether a user actually downloaded an executable file from a website. I examined the pcap and see that the Content-Length field is 784,536 bytes, but the Server->User traffic is only 430,380 bytes. This tells me that the user did not fully download the file. I also downloaded the file myself and see that it is 766 KB. Is it possible that the Content-Length value in the HTTP header will not be EQUAL TO the local file size of that EXE file once downloaded? Is this correct?
Packet Capture Data (I can't post screenshots)
GET /ChromasLite211Setup.exe HTTP/1.1
Host: www.technelysium.com.au
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Firefox/17.0
Accept: text/html, application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://technelysium.com.au/
HTTP/1.1 200 OK
Date: Thu, 01 Aug 2013 17:28:17 GMT
Server: Apache
Last-Modified: Mon, 15 Apr 2013 08:29:57 GMT
Accept-Ranges: bytes
Content-Length: 784536
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: application/x-msdownload
MZP........................#.............................!..L..This program must be run under Win32
Entire conversation (430,722 bytes)
User's IP -> Server IP (342 bytes)
Server IP -> User's IP (430,380 bytes)
When I download the file from the site it shows as "Binary File (766 KB)".
Converting Bytes to Kilobytes
784,536/1024 = 766.14
No. The user did not download all the bytes.
If a server sends a Content-Length header, that is exactly how many bytes of content it intends to send as the HTTP response body. If fewer bytes than that were sent, then something happened (the client terminated the connection, the client timed out, etc.).
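If you need to verify completeness programmatically, a minimal Java sketch (the URL is reconstructed from the GET line in the capture above) is to compare the bytes actually read against the advertised Content-Length:
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class DownloadCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.technelysium.com.au/ChromasLite211Setup.exe");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        long expected = conn.getContentLengthLong(); // Content-Length header, -1 if absent
        long received = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = conn.getInputStream()) {
            int n;
            while ((n = in.read(buf)) != -1) {
                received += n; // count only body bytes, not headers
            }
        }
        // A complete download reads exactly Content-Length body bytes.
        System.out.println("expected=" + expected + ", received=" + received
                + (received == expected ? " (complete)" : " (truncated)"));
    }
}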

HTTP Get not using Port

I am trying to call a page in PHP with http_get:
$url = "http://mysite.fr:9090/neolane-webservice/campagnesclient/Coclico=1135446";
http_get($url, $appelOptions, $appelInfos);
My problem is that it does not work every time.
I installed Wireshark to see what I was really sending and found something odd: sometimes the port is not used for the HTTP request.
When it works, I have:
Hypertext Transfer Protocol
GET http://mysite.fr:9090/neolane-webservice/campagnesclient/Coclico=1135446 HTTP/1.1\r\n
Request Method: GET
Request URI: http://mysite.fr:9090/neolane-webservice/campagnesclient/Coclico=1135446
Request Version: HTTP/1.1
User-Agent: PECL::HTTP/1.6.5 (PHP/5.2.4-2ubuntu5.7)\r\n
Host: mysite.fr:9090\r\n
Pragma: no-cache\r\n
Accept: */*\r\n
Proxy-Connection: Keep-Alive\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Date: Fri, 15 Jun 2012 16:40:46 +0200\r\n
Accept-Charset: utf-8\r\n
Accept-Encoding: gzip;q=1.0,deflate;q=0.5\r\n
\r\n
And when it does not:
Hypertext Transfer Protocol
GET http://mysite.fr:9090/neolane-webservice/campagnesclient/Coclico=1135446 HTTP/1.1\r\n
Request Method: GET
Request URI: http://mysite.fr:9090/neolane-webservice/campagnesclient/Coclico=1135446
Request Version: HTTP/1.1
User-Agent: PECL::HTTP/1.6.5 (PHP/5.2.4-2ubuntu5.7)\r\n
Host: mysite.fr\r\n
Pragma: no-cache\r\n
Accept: */*\r\n
Proxy-Connection: Keep-Alive\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Date: Fri, 15 Jun 2012 16:40:34 +0200\r\n
Accept-Charset: utf-8\r\n
Accept-Encoding: gzip;q=1.0,deflate;q=0.5\r\n
\r\n
I tried to call the page with wget and it always works:
wget http://mysite.fr:9090/neolane-webservice/campagnesclient/Coclico=1135446
So I'm guessing that my problem is due to the Apache config, but I don't know where to look. Could you help me, please?
You will need to set the port in the $appelOptions array:
$appelOptions['port'] = 9090;
http_get($url, $appelOptions, $appelInfos);
Unfortunately, http_get does not seem to respect the :port syntax in the URL.

HTTP 500 error in wget

Take a look at this page:
http://www.ptmytrade.com/product.asp?id=61363
It's loading fine (at least here). Now I would like to grab it with wget.
$ wget http://www.ptmytrade.com/product.asp?id=61363 --debug
DEBUG output created by Wget 1.12 on linux-gnu.
--2011-05-21 18:24:51-- http://www.ptmytrade.com/product.asp?id=61363
Resolving www.ptmytrade.com... 205.209.150.134
Caching www.ptmytrade.com => 205.209.150.134
Connecting to www.ptmytrade.com|205.209.150.134|:80... connected.
Created socket 3.
Releasing 0x0890e260 (new refcount 1).
---request begin---
GET /product.asp?id=61363 HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.ptmytrade.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 500 Internal Server Error
Connection: keep-alive
Date: Sat, 21 May 2011 16:24:56 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCACCAQA=FOCCMJODFHHMOKNKPAIHJCIL; path=/
Cache-control: private
---response end---
500 Internal Server Error
Stored cookie www.ptmytrade.com -1 (ANY) / <session> <insecure> [expiry none] ASPSESSIONIDSCACCAQA FOCCMJODFHHMOKNKPAIHJCIL
Registered socket 3 for persistent reuse.
Disabling further reuse of socket 3.
Closed fd 3
2011-05-21 18:24:57 ERROR 500: Internal Server Error.
OK, so I check the headers when fetching the page using my browser (using Live HTTP Headers add-on):
http://www.ptmytrade.com/product.asp?id=61361
GET /product.asp?id=61361 HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM
HTTP/1.1 500 Internal Server Error
Date: Sat, 21 May 2011 16:20:46 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Cache-Control: private
----------------------------------------------------------
http://www.ptmytrade.com/images/index_117.jpg
GET /images/index_117.jpg HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.ptmytrade.com/product.asp?id=61361
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM
HTTP/1.1 404 Not Found
Content-Length: 1635
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sat, 21 May 2011 16:20:48 GMT
I'm not sure what's going on here. The page displays just fine, but I'm getting the 500 error code in the header.
The problem was solved by using curl instead (which also got a 500, but fetched the page just fine), but I'm curious what's going on here.
I had this problem too and I don't know why, but the solution was to add a user agent:
wget -U "Opera 11.0" "http://your_link" -O out.csv
I found it in "Curl and wget return error 500 for helloworld.php on new install but browser is fine".
Using this option will fix the issue:
--content-on-error
If this is set to on, wget will not skip the content when the
server responds with a http status code that indicates error.
So the command looks like this:
wget --content-on-error "https://stackoverflow.com"
NOTE: It's important to put the URL inside double quotes; otherwise wget will get stuck on "Redirecting output to 'wget-log'".
Or, as stated in the comments and by the OP, use curl instead.
Note, though, that curl cannot download whole webpages (CSS, JS, images, etc.) because it cannot parse HTML.
It's a bug in the webpage: the HTTP status is seemingly set to 500 even though the content is served normally. Firefox/Firebug also confirms this. Basically, you're facing an HTTP 500 error page with "normal" content.
Report it to the site admin.
Try enclosing the URL in quotes, so the shell doesn't interpret special characters such as ? and &:
wget "http://www.ptmytrade.com/product.asp?id=61363"
instead of:
wget http://www.ptmytrade.com/product.asp?id=61363
