wget says 406 Not acceptable - http

I have a simple file on my web server, and when I request it in a browser, it loads without problems:
http://example.server/report.php
But when I request the file with wget from a Raspberry Pi, I get this:
$ wget -d --spider http://example.server/report.php
Setting --spider (spider) to 1
DEBUG output created by Wget 1.18 on linux-gnueabihf.
Reading HSTS entries from /home/pi/.wget-hsts
URI encoding = 'ANSI_X3.4-1968'
converted 'http://example.server/report.php' (ANSI_X3.4-1968) -> 'http://example.server/report.php' (UTF-8)
Converted file name 'report.php' (UTF-8) -> 'report.php' (ANSI_X3.4-1968)
Spider mode enabled. Check if remote file exists.
--2018-06-03 07:29:29-- http://example.server/report.php
Resolving example.server (example.server)... 49.132.206.71
Caching example.server => 49.132.206.71
Connecting to example.server (example.server)|49.132.206.71|:80... connected.
Created socket 3.
Releasing 0x00832548 (new refcount 1).
---request begin---
HEAD /report.php HTTP/1.1
User-Agent: Wget/1.18 (linux-gnueabihf)
Accept: */*
Accept-Encoding: identity
Host: example.server
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 406 Not Acceptable
Date: Fri, 15 Jun 2018 08:25:17 GMT
Server: Apache
Keep-Alive: timeout=3, max=200
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
---response end---
406 Not Acceptable
Registered socket 3 for persistent reuse.
URI content encoding = 'iso-8859-1'
Remote file does not exist -- broken link!!!
I read somewhere that it might be an encoding problem, so I tried
$ wget -d --spider --header="Accept-encoding: *" http://example.server/report.php
but that gives me the exact same error.

That's because the server you're connecting to serves only to certain User-Agents.
Change the user agent and it works fine:
wget -d --user-agent="Mozilla/5.0 (Windows NT x.y; rv:10.0) Gecko/20100101 Firefox/10.0" http://example.server/report.php

Related

Multiple sites share one IP address: I can't reach special site using Host header

In a book I'm reading now the author shows what HTTP headers mean. Namely he said that there are servers that host multiple web site.
Let's do this:
ping fideloper.com
We can see the IP address: 198.211.113.202.
Now let's use the IP address only:
curl -I 198.211.113.202
We catch:
$ curl -I 198.211.113.202
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 03 Aug 2017 14:48:33 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: https://book.serversforhackers.com/
Let’s next see what happens when we add a Host header to the HTTP request:
$ curl -I -H "Host: fideloper.com" 198.211.113.202
HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=86400, public
Date: Thu, 03 Aug 2017 13:23:58 GMT
Last-Modified: Fri, 30 Dec 2016 22:32:12 GMT
X-Frame-Options: SAMEORIGIN
Set-Cookie: laravel_session=eyJpdiI6IjhVQlk2UWcyRExsaDllVEpJOERaT3dcL2d2aE9mMHV4eUduSjFkQTRKU0R3PSIsInZhbHVlIjoiMmcwVUpNSjFETWs1amJaNzhGZXVGZjFPZ3hINUZ1eHNsR0dBV1FvdE9mQ1RFak5IVXBKUEs2aEZzaEhpRHRodE1LcGhFbFI3OTR3NzQxZG9YUlN5WlE9PSIsIm1hYyI6ImRhNTVlZjM5MDYyYjUxMTY0MjBkZjZkYTQ1ZTQ1YmNlNjU3ODYzNGNjZTBjZWUyZWMyMjEzYjZhOWY1MWYyMDUifQ%3D%3D; expires=Thu, 03-Aug-2017 15:23:58 GMT; Max-Age=7200; path=/; httponly
X-Fastcgi-Cache: HIT
This means that serversforhackers.com is the default site.
Then the author said that we could request Servers for Hackers on the same server:
$ curl -I -H "Host: serversforhackers.com” 198.211.113.202
Here in the book HTTP/1.1 200 OK is received.
But I receve this:
curl -I -H "Host: serversforhackers.com" 198.211.113.202
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 03 Aug 2017 14:55:14 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: https://book.serversforhackers.com/
Well, the author organized a 301 redirect and uses HTTPS now.
I could do this:
curl -I https://serversforhackers.com
But this doesn't illustrate the whole idea of what default site is and how Host header can address a special site on a shared IP address.
Is it still possible somehow to get 200 Ok addressing via IP address?
In HTTP/1.1, without HTTPS, the Host header is the only place where the hostname is sent to the server.
With HTTPS, things are more interesting.
First, your client will normally try to check the server’s TLS certificate against the expected name:
$ curl -I -H "Host: book.serverforhackers.com" https://198.211.113.202
curl: (51) SSL: certificate subject name (book.serversforhackers.com) does not match target host name '198.211.113.202'
Most clients provide a way to override this check. curl has the -k/--insecure option for that:
$ curl -k -I -H "Host: book.serverforhackers.com" https://198.211.113.202
HTTP/1.1 200 OK
Server: nginx
[...]
But then there’s the second issue. I can’t illustrate it with your example server, but here’s one I found on the Internet:
$ curl -k -I https://analytics.usa.gov
HTTP/1.1 200 OK
Content-Type: text/html
[...]
$ host analytics.usa.gov | head -n 1
analytics.usa.gov has address 54.240.184.142
$ curl -k -I -H "Host: analytics.usa.gov" https://54.240.184.142
curl: (35) gnutls_handshake() failed: Handshake failed
This is caused by server name indication (SNI) — a feature of TLS (HTTPS) whereby the hostname is also sent in the TLS handshake. It is necessary because the server needs to present the right certificate (for the right hostname) before it can receive any HTTP headers at all. In the example above, when we use https://54.240.184.142, curl doesn’t send the correct SNI, and the server refuses the handshake. Other servers might accept the connection but route it to a wrong place, where the Host header will end up being ignored.
With curl, you can’t set SNI with a separate option like you set the Host header. curl will always take it from the request URL. But curl has a special --resolve option:
Provide a custom address for a specific host and port pair. Using this, you can make the curl requests(s) use a specified address and prevent the otherwise normally resolved address to be used. Consider it a sort of /etc/hosts alternative provided on the command line.
In this case:
$ curl -I --resolve analytics.usa.gov:443:54.240.184.142 https://analytics.usa.gov
HTTP/1.1 200 OK
Content-Type: text/html
[...]
(443 is the standard TCP port for HTTPS)
If you want to experiment at a lower level, you can use the openssl tool to establish a raw TLS connection with the right SNI:
$ openssl s_client -connect 54.240.184.142:443 -servername analytics.usa.gov -crlf
You will then be able to type an HTTP request and see the right response:
HEAD / HTTP/1.1
Host: analytics.usa.gov
HTTP/1.1 200 OK
Content-Type: text/html
[...]
Lastly, note that in HTTP/2, there’s a special header named :authority (yes, with a colon) that may be used instead of Host by some clients. The distinction between them exists for backward compatibility with HTTP/1.1 and proxies: see RFC 7540 § 8.1.2.3 and RFC 7230 § 5.3 for details.

Why does curl not work, but wget works?

I am using both curl and wget to get this url: http://opinionator.blogs.nytimes.com/2012/01/19/118675/
For curl, it returns no output at all, but with wget, it returns the entire HTML source:
Here are the 2 commands. I've used the same user agent, and both are coming from the same IP, and are following redirects. The URL is exactly the same. For curl, it returns immediately after 1 second, so I know it's not a timeout issue.
curl -L -s "http://opinionator.blogs.nytimes.com/2012/01/19/118675/" --max-redirs 10000 --location --connect-timeout 20 -m 20 -A "Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" 2>&1
wget http://opinionator.blogs.nytimes.com/2012/01/19/118675/ --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
If NY Times might be cloaking, and not returning the source to curl, what could be different in the headers curl is sending? I assumed since the user agent is the same, the request should look exactly the same from both of these requests. What other "footprints" should I check?
The way to solve is to analyze your curl request by doing curl -v ... and your wget request by doing wget -d ... which shows that curl is redirected to a login page
> GET /2012/01/19/118675/ HTTP/1.1
> User-Agent: Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
> Host: opinionator.blogs.nytimes.com
> Accept: */*
>
< HTTP/1.1 303 See Other
< Date: Wed, 08 Jan 2014 03:23:06 GMT
* Server Apache is not blacklisted
< Server: Apache
< Location: http://www.nytimes.com/glogin?URI=http://opinionator.blogs.nytimes.com/2012/01/19/118675/&OQ=_rQ3D0&OP=1b5c69eQ2FCinbCQ5DzLCaaaCvLgqCPhKP
< Content-Length: 0
< Content-Type: text/plain; charset=UTF-8
followed by a loop of redirections (which you must have noticed, because you have already set the --max-redirs flag).
On the other hand, wget follows the same sequence except that it returns the cookie set by nytimes.com with its subsequent request(s)
---request begin---
GET /2012/01/19/118675/?_r=0 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: */*
Host: opinionator.blogs.nytimes.com
Connection: Keep-Alive
Cookie: NYT-S=0MhLY3awSMyxXDXrmvxADeHDiNOMaMEZFGdeFz9JchiAIUFL2BEX5FWcV.Ynx4rkFI
The request sent by curl never includes the cookie.
The easiest way I see to modify your curl command and obtain the desired resource is by adding -c cookiefile to your curl command. This stores the cookie in the otherwise unused temporary "cookie jar" file called "cookiefile" thereby enabling curl to send the needed cookie(s) with its subsequent requests.
For example, I added the flag -c x directly after "curl " and I obtained the output just like from wget (except that wget writes it to a file and curl prints it on STDOUT).
In my case was because the https_proxy enviroment variable for utility cURL needs set the port in the URL, for example :
Not work with cURL :
https_proxy=http://proxyapp.net.com/
Works with cURL :
https_proxy=http://proxyapp.net.com:80/
With "wget" utility works with and without the port in url, but curl needs it, in case of not set the utility "curl" return error "(56) Proxy CONNECT aborted".
When you get verbosity of the command "curl -v" could see "curl" use port "1080" as default if port in not set at proxy url.

Unable to test HTTP PUT-based file upload via Squid Proxy

I can upload a file to my Apache web server using Curl just fine:
echo "[$(date)] file contents." | curl -T - http://WEB-SERVER/upload/sample.put
However, if I put a Squid proxy server in between, then I am not able to:
echo "[$(date)] file contents." | curl -x http://SQUID-PROXY:3128 -T - http://WEB-SERVER/upload/sample.put
Curl reports the following error:
Note: This error response was in HTML format, but I've removed the tags for ease of reading.
ERROR: The requested URL could not be retrieved
ERROR
The requested URL could not be retrieved
While trying to retrieve the URL:
http://WEB-SERVER/upload/sample.put
The following error was encountered:
Unsupported Request Method and Protocol
Squid does not support all request methods for all access protocols.
For example, you can not POST a Gopher request.
Your cache administrator is root.
My squid.conf doesn't seem to be having any ACL/rule that should disallow based on the src or dst IP addresses, or the protocol, or the HTTP method... as I can do an HTTP POST just fine between the same client and the web server, with the same proxy sitting in between.
In case of the failing HTTP PUT case, to see the request and response traffic that was actually occurring, I placed a netcat process in between Curl and Squid, and this is what I saw:
Request:
PUT http://WEB-SERVER/upload/sample.put HTTP/1.1
User-Agent: curl/7.15.5 (i686-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
Host: WEB-SERVER
Pragma: no-cache
Accept: */*
Proxy-Connection: Keep-Alive
Transfer-Encoding: chunked
Expect: 100-continue
Response:
HTTP/1.0 501 Not Implemented
Server: squid/2.6.STABLE21
Date: Sun, 13 May 2012 02:11:39 GMT
Content-Type: text/html
Content-Length: 1078
Expires: Sun, 13 May 2012 02:11:39 GMT
X-Squid-Error: ERR_UNSUP_REQ 0
X-Cache: MISS from SQUID-PROXY-FQDN
X-Cache-Lookup: NONE from SQUID-PROXY-FQDN:3128
Via: 1.0 SQUID-PROXY-FQDN:3128 (squid/2.6.STABLE21)
Proxy-Connection: close
<SNIPPED the HTML error response already shown earlier above>
Note: I have anonymized the IP addresses and server names throughout for readability reasons.
Thanks to Amos Jeffries for answering this on squid-users forum. The issue is basically that Squid before version 3.1 does not implement HTTP 1.1 and thus rejects the chunked transfer encoding.

WGET 401 Unauthorized

I'm trying to use a batch file with WGET to download the public FCC file from here
http://wireless.fcc.gov/uls/data/complete/l_micro.zip
When I intially run the batch file with parameters
wget --server-response -owget.log http://wireless.fcc.gov/uls/data/complete/l_micro.zip
It fails with an HTTP 401 unauthorized error. I can retry at this point and it keeps failing. However I noticed if I open up IE, start a download and cancel when prompted to save, I can rerun the batch file and it executes perfectly!
Here is my detailed server response from the log
--2012-02-06 14:32:24-- http://wireless.fcc.gov/uls/data/complete/l_micro.zip
Resolving wireless.fcc.gov (wireless.fcc.gov)... 192.104.54.158
Connecting to wireless.fcc.gov (wireless.fcc.gov)|192.104.54.158|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 302 Found
Location: REMOVED - appears to have my IP
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Connection: close
Content-Length: 513
Location: REMOVED [following]
--2012-02-06 14:32:24-- REMOVED
Resolving REMOVED... 192.168.2.11
Connecting to REMOVED|192.168.2.11|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 401 Unauthorized
Cache-Control: no-cache
Pragma: no-cache
WWW-Authenticate: NTLM
WWW-Authenticate: BASIC realm="AD_BCAAA"
Content-Type: text/html; charset=utf-8
Proxy-Connection: close
Set-Cookie: BCSI-CS-8ECFB6B4AA642EF0=2; Path=/
Connection: close
Content-Length: 575
Authorization failed.
Here is the log after doing my little IE procedure and getting it to work
--2012-02-08 15:52:43-- http://wireless.fcc.gov/uls/data/complete/l_micro.zip
Resolving wireless.fcc.gov (wireless.fcc.gov)... 192.104.54.158
Connecting to wireless.fcc.gov (wireless.fcc.gov)|192.104.54.158|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: Sun-Java-System-Web-Server/7.0
Date: Fri, 27 Jan 2012 18:37:51 GMT
Content-type: application/zip
Last-modified: Sun, 22 Jan 2012 11:18:09 GMT
Etag: "46fa95c-4f1bf071"
Accept-ranges: bytes
Content-length: 74426716
Connection: Keep-Alive
Age: 1045014
Length: 74426716 (71M) [application/zip]
Saving to: `l_micro.zip'
Any help is appreciated!
If the website has simply a htpassword setup, you can try:
wget --user=admin --ask-password https://www.yourwebsite.com/file.zip
I used --auth-no-challenge and the exact error get solved .
You have a Blue Coat secure web gateway on your network, as evidenced by the line in the response:
Set-Cookie: BCSI-CS-8ECFB6B4AA642EF0=2; Path=/
It looks like it wants you to authenticate, presumably with your domain credentials. Try passing them with --http-user and --http-passwd.
I had a similar issue with the xwiki based site. after several attempts I found some combination that worked for me just fine
wget --no-check-certificate --auth-no-challenge -k -nc -p -l 1 -r https://user:password#host.domain
I think the key was --auth-no-challenge
Try using this extension for firefox. It generates a wget or a curl command that can be copied and run from bash.
I came here trying to find out why wget was giving a 401 unauthorized message when on another system the problem did not occur.
After installing a later version of wget from source (binary was not available in my distro) it worked. I can't explain why, except that it must be some kind of bug so if none of the above fixes your problem, consider upgrading wget.
Try setting a user-agent string with wget - e.g.
--user-agent=Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
it's entirely feasible for a site to reject requests from certain user agents, particularly if they look to be circumventing the "usual" routes to information (i.e. through webpages).
Although this doesn't explain your problem, it's a good idea anyway. Perhaps the site implements a mechanism whereby when you browse with a "known" browser (e.g. IE) it then caches your IP as "safe" then allows any user agent from your IP to download anything :)

HTTP 500 error in wget

Take a look at this page:
http://www.ptmytrade.com/product.asp?id=61363
It's loading fine (at least here). Now I would like to grab it with wget.
$ wget http://www.ptmytrade.com/product.asp?id=61363 --debug
DEBUG output created by Wget 1.12 on linux-gnu.
--2011-05-21 18:24:51-- http://www.ptmytrade.com/product.asp?id=61363
Resolving www.ptmytrade.com... 205.209.150.134
Caching www.ptmytrade.com => 205.209.150.134
Connecting to www.ptmytrade.com|205.209.150.134|:80... connected.
Created socket 3.
Releasing 0x0890e260 (new refcount 1).
---request begin---
GET /product.asp?id=61363 HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.ptmytrade.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 500 Internal Server Error
Connection: keep-alive
Date: Sat, 21 May 2011 16:24:56 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCACCAQA=FOCCMJODFHHMOKNKPAIHJCIL; path=/
Cache-control: private
---response end---
500 Internal Server Error
Stored cookie www.ptmytrade.com -1 (ANY) / <session> <insecure> [expiry none] ASPSESSIONIDSCACCAQA FOCCMJODFHHMOKNKPAIHJCIL
Registered socket 3 for persistent reuse.
Disabling further reuse of socket 3.
Closed fd 3
2011-05-21 18:24:57 ERROR 500: Internal Server Error.
OK, so I check the headers when fetching the page using my browser (using Live HTTP Headers add-on):
http://www.ptmytrade.com/product.asp?id=61361
GET /product.asp?id=61361 HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM
HTTP/1.1 500 Internal Server Error
Date: Sat, 21 May 2011 16:20:46 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Content-Length: 471822
Content-Type: text/html
Cache-Control: private
----------------------------------------------------------
http://www.ptmytrade.com/images/index_117.jpg
GET /images/index_117.jpg HTTP/1.1
Host: www.ptmytrade.com
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.ptmytrade.com/product.asp?id=61361
Cookie: ASPSESSIONIDSCACBBRA=AMPBLLNDGMFLNPNCPEBPNNLB; ASPSESSIONIDSCACCAQA=FJNBMJODLHHJNDHPFBIEEPEM
HTTP/1.1 404 Not Found
Content-Length: 1635
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sat, 21 May 2011 16:20:48 GMT
I'm not sure what's going on here. The page displays just fine, but I'm getting the 500 error code in the header.
The problem was solved by using curl (which was also getting a 500, but fetched the page just fine) instead, but I'm curious what's going here.
Hi everybody I had this huge problem too, I don't know why, but the solution was add this:
wget -U "Opera 11.0" "http://your_link" -O out.csv
I found it on
[Curl and wget return error 500 for helloworld.php on new install but browser is fine
Using this option will fix the issue:
--content-on-error
If this is set to on, wget will not skip the content when the
server responds with a http status code that indicates error.
So the command looks like this:
wget --content-on-error "https://stackoverflow.com"
NOTE: It's important to put the URL inside double-quotes, otherwise, wget will get stuck on Redirecting output to ‘wget-log’..
Or as stated in the comments and by OP, use curl instead.
But I should note that curl cannot download whole webpages (css, js, images etc.) because it cannot parse HTML. Source and Taken from.
It's a bug in the webpage. The HTTP status is indeed seemingly incorrectly set to HTTP 500. Firefox/Firebug also confirms this. Basically, you're facing a HTTP 500 error page with "normal" content.
Report it to the site admin.
Try enclosing it in quotes:
wget "http://www.ptmytrade.com/product.asp?id=61363"
instead of:
wget http://www.ptmytrade.com/product.asp?id=61363

Resources