wget connection reset by peer - tcp

I am trying to access www.indeed.com from our web server using wget, but it raises a "Connection reset by peer" error.
wget www.indeed.com
--2013-02-05 03:03:12-- (try: 3) http://www.indeed.com/
Connecting to www.indeed.com|208.43.224.140|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.
It was working before; I had been using their API for a while, but now I cannot even reach their public website.
What could the problem be? Could Indeed have added the server's IP to their blacklist, or is this related to my firewall, etc.?
Is there a way to debug/trace where the problem is?

You should use wget with a user agent, like the following sample:
wget "http://www.indeed.com/" --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"

Related

Why wget fails to download a file but browser succeeds?

I am trying to download the virus database for ClamAV from http://database.clamav.net/main.cvd. I am able to download main.cvd from a web browser (Chrome or Firefox) but unable to do the same with wget; I get the following error:
--2021-05-03 19:06:01-- http://database.clamav.net/main.cvd
Resolving database.clamav.net (database.clamav.net)... 104.16.219.84, 104.16.218.84, 2606:4700::6810:db54, ...
Connecting to database.clamav.net (database.clamav.net)|104.16.219.84|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-05-03 19:06:01 ERROR 403: Forbidden.
Any leads on this issue?
Edit 1:
This is what my Chrome cookies look like when I try to download main.cvd.
It might be that the blocking is based on the User-Agent header. You can use the --user-agent= option to set the same User-Agent as your browser. For example:
wget --user-agent="Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0" https://www.example.com
will download the example.com page and identify itself to the server as Firefox. If you want to know more about what the parts of the User-Agent string mean, you can read the Mozilla Developer docs for the User-Agent header.
Also check for session cookies or tokens from the browser, as some websites rely on that kind of protection.
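If cookies turn out to be the issue, one option is to copy the relevant cookie from the browser's developer tools and send it along with the browser-like User-Agent (the cookie name and value below are only placeholders):
wget --user-agent="Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0" \
     --header="Cookie: cookie_name=value_copied_from_browser" \
     http://database.clamav.net/main.cvd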

wget runs locally but not from a server

I am trying to fetch a URL via wget. My wget command is as follows:
wget --header "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" --header "Host: www.zomato.com" --header "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" "https://www.zomato.com/bangalore/restaurants/biryani"
This runs perfectly when I run it locally. However, when I host it on a server and run it there, it fails: it connects to the host, waits forever, and then fails with a timeout error.
Just to ensure that it is not because of a particular server, I have now tried this on:
AWS
Heroku
Heroku with Fixie for a fixed outbound static IP address
Google Compute in different regions
Netmagix
and it fails every single time. Can anybody give me any pointers as to why this could be happening?
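One way to narrow this down from the failing servers (assuming curl is available there) is to watch where the request stalls; a connect-then-hang pattern from several hosting providers usually suggests the site is filtering datacenter IP ranges rather than a problem on your side:
# -v prints the handshake and header exchange; --max-time turns the endless hang into a bounded failure
curl -v --max-time 30 -o /dev/null \
     -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" \
     "https://www.zomato.com/bangalore/restaurants/biryani"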

How to differentiate request coming from command-line and browsers?

To check whether a request comes from the CLI or over HTTP, the PHP function php_sapi_name can be used; take a look here. I am trying to replicate that in the Apache conf file. The underlying idea is: if the request comes from the CLI, 'minimal info' is served; if the request comes from a web browser, the user is redirected to a different location. Is this possible?
MY PSEUDO CODE:
IF (REQUEST_COMING_FROM_CLI) {
    ProxyPass        / http://${IP_ADDR}:5000/
    ProxyPassReverse / http://${IP_ADDR}:5000/
} ELSE IF (REQUEST_COMING_FROM_WEB_BROWSERS) {
    ProxyPass        / http://${IP_ADDR}:8585/welcome/
    ProxyPassReverse / http://${IP_ADDR}:8585/welcome/
}
Addition: cURL supports a host of different protocols, including HTTP, FTP, and Telnet. Can Apache figure out whether the request comes from the CLI or a browser?
As far as I know, there is no way to tell the difference using Apache.
If a request from the command line is set up properly, Apache cannot distinguish it from a browser request.
When you check it in PHP (using php_sapi_name, as you suggested), it only reports where PHP itself was called from (CLI, Apache, etc.), not where the HTTP request came from.
Using telnet from the command line, you can connect to Apache, set the required HTTP headers, and send the request as if you were using a browser (the only difference being that a browser sets the headers for you).
So I do not think Apache can differentiate between the console and a browser.
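For example, a command-line client can copy a browser's headers verbatim (the values below just mirror what a Firefox install sends), and the resulting request is indistinguishable on the Apache side:
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" \
     -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
     -H "Accept-Language: en-US,en;q=0.8" \
     http://localhost/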
The only way to do this is to test the User-Agent sent in the request headers, but this information can easily be changed.
By default, every PHP HTTP request looks like this to the Apache server:
192.168.1.15 - - [01/Oct/2008:21:52:43 +1300] "GET / HTTP/1.0" 200 5194 "-" "-"
This information can easily be changed to look like it comes from a browser, for example using this:
ini_set('user_agent',
'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');
The HTTP request will then look like this:
192.168.1.15 - - [01/Oct/2008:21:54:29 +1300] "GET / HTTP/1.0" 200 5193
"-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"
At that point, Apache will think that the connection came from Firefox 3.0.3 on Windows.
So there is no exact way to get this information.
You can use a BrowserMatch directive if the CLI requests are not spoofing a real browser in the User-Agent header. Otherwise, as everyone else has said, there is no way to tell the difference.
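A sketch of that approach for the pseudo-code above (assuming mod_setenvif, mod_rewrite and mod_proxy are enabled; the User-Agent pattern is only illustrative and will miss CLI tools that spoof a browser):
# Tag requests whose User-Agent looks like a command-line tool
BrowserMatchNoCase "^(curl|wget|python-requests|libwww-perl)" is_cli

RewriteEngine On
# CLI clients go to the minimal backend on port 5000
RewriteCond %{ENV:is_cli} =1
RewriteRule ^/(.*)$ http://${IP_ADDR}:5000/$1 [P,L]
# Everything else goes to the browser-facing app on port 8585
RewriteRule ^/(.*)$ http://${IP_ADDR}:8585/welcome/$1 [P,L]

ProxyPassReverse / http://${IP_ADDR}:5000/
ProxyPassReverse / http://${IP_ADDR}:8585/welcome/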

HTTP over TCP using Telnet/Hercules/Raw socket/

I'm connecting to real-time data on a remote server as a client. I want to send the following to a server and keep the connection open. This is a 'push' protocol.
http://server.domain.com:80/protocol/dosomething.txt?POSTDATA=thePostData
I can call this in a browser and it's fine. However, if I try to use telnet directly in a Windows command prompt, the prompt just exits.
GET protocol/dosomething.txt?POSTDATA=thePostData
The same happens if I use Putty.exe and select Telnet as the protocol. I can't see a way to do this with Hercules at all, as I don't think the server will interpret the GET request.
Is there any way I can do this?
Thanks.
You have to follow the HTTP protocol (RFC 2616) to the letter if you want to use telnet. Try something like:
shell$ telnet www.google.com 80
Trying 173.194.43.50...
Connected to www.google.com (173.194.43.50).
Escape character is '^]'.
GET / HTTP/1.1
Host: www.google.com:80
Connection: close
HTTP/1.1 200 OK
Date: Tue, 11 Sep 2012 15:09:51 GMT
...
You need to type the following lines, including an empty line after the "Connection" line:
GET / HTTP/1.1
Host: www.google.com:80
Connection: close
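Applied to the URL in the question (assuming the server honours keep-alive for its push stream), the session would look roughly like this; note the leading slash in the request path and the empty line that ends the headers:
telnet server.domain.com 80
GET /protocol/dosomething.txt?POSTDATA=thePostData HTTP/1.1
Host: server.domain.com
Connection: keep-alive

After the empty line the connection stays open and the server can keep pushing data.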

gWidgetsWWW2 Error

Hi, I have installed the FastRWeb, Rserve and gWidgetsWWW2 packages and followed the instructions on the respective sites, on my Linux (Ubuntu 10.04.3) machine with the Apache web server.
I have loaded the test.R app, and when I go to the URL
//localhost/cgi-bin/R/app?app=test
as described on the following GitHub site
https://github.com/jverzani/gWidgetsWWW2/tree/master/inst/FastRWeb
I can see the app in my browser. When I click on the "Click for a message" button, nothing happens. Inspecting the element in my Chrome browser, I can see that there is an error when executing the runHandler.R function. The error I see is:
Error in rawToChar(request$body) : object 'request' not found
When I look at the headers I see that the POST is passing the request, so why is R not seeing it as an object?
Request URL:http://localhost/cgi-bin/R/gwappAJAX/runHandler
Request Method:POST
Status Code:200 OK
Request Headers
Accept:*/*
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Content-Length:77
Content-Type:application/json
Host:localhost
Origin:http://localhost
Referer:http://localhost/cgi-bin/R/app?app=test
User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/myIP Safari/536.11
X-Requested-With:XMLHttpRequest
Request Payload
{"id":"ogWidget_ID3","signal":"click","value":null,"session_id":"0BJS1QKLM9"}
Response Headers
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:78
Content-Type:text/html; charset=utf-8
Date:Thu, 12 Jul 2012 17:17:50 GMT
Keep-Alive:timeout=15, max=96
Server:Apache/2.2.14 (Ubuntu)
Vary:Accept-Encoding
Did I miss something in the setup? Why isn't my R session seeing the request object?
You are better off running such scripts under Rook; the FastRWeb setup is much less responsive. I've found that running Rook on a local port, like 9000, and using Apache to reverse proxy to that port works fine, though it doesn't scale the way a FastRWeb solution should.
With that said, does it run locally under Rook through load_app? If so, then it may be that the newer FastRWeb + Rserve isn't working. I haven't tested this since Simon updated his work; I hope to get to it this summer. The promise of using websockets for the communication with R should bypass this responsiveness issue.
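For reference, the reverse-proxy part of that setup can be as small as the following in the Apache configuration (the /rook/ prefix is just an example path; it assumes Rook is listening on localhost:9000 and mod_proxy_http is enabled):
ProxyPass        /rook/ http://localhost:9000/
ProxyPassReverse /rook/ http://localhost:9000/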
