wget runs locally but not from a server - http

I am trying to fetch a url via wget.My wget command is as follows:
wget --header "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" --header "Host: www.zomato.com" --header "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" "https://www.zomato.com/bangalore/restaurants/biryani"
This runs perfectly when I run it locally.However, when I host this on a server and then run it,it fails.It connects to the host and then waits forever and then fails due to a timeout error.
Just to ensure that it is not because of a particular server,I have now tried this on :
AWS
Heroku
Heroku with Fixie for a fixed outbound static IP address
Google Compute in different regions
Netmagix
and it fails every single time. Can anybody give me any pointers as to why this could be happening ?

Related

Airflow 2.0 API response 403 Forbidden

I'm trying to trigger a new dag run via Airflow 2.0 REST API. If I am logged in to the Airflow webserver on the remote machine and I go to the swagger documentation page to test the API, the call is successful. If I log out or if the API call is sent through Postman or curl, then I get a 403 forbidden message. The same 403 error message is received in curl or postman whether I provide the web server username password or not.
curl -X POST --user "admin:blabla" "http://10.0.0.3:7863/api/v1/dags/tutorial_taskflow_api_etl/dagRuns" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"conf\":{},\"dag_run_id\":\"string5\"}"
{
"detail": null,
"status": 403,
"title": "Forbidden",
"type": "https://airflow.apache.org/docs/2.0.0/stable-rest-api-ref.html#section/Errors/PermissionDenied"
}
The security for API has been changed to default, instead of deny_all (auth_backend = airflow.api.auth.backend.default). The installation of airflow has been done using pip using ubuntu 18 bionic. Dags are running fine if triggered manually or scheduled. The database backend is postgres.
Also tried copying the cookie details from Chrome into postman to get past this issue, but it did not work.
Here is the log on the web server for the two calls mentioned above.
airflowWebserver_container | 10.0.0.4 - - [05/Jan/2021:06:35:33 +0000] "POST /api/v1/dags/tutorial_taskflow_api_etl/dagRuns HTTP/1.1" 403 170 "http://10.0.0.3:7863/api/v1/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
airflowWebserver_container | 10.0.0.4 - - [05/Jan/2021:06:35:07 +0000] "POST /api/v1/dags/tutorial_taskflow_api_etl/dagRuns HTTP/1.1" 409 251 "http://10.0.0.3:7863/api/v1/ui/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
I am using basic_auth for Airflow v2.0. The AIRFLOW__API__AUTH_BACKEND environment variable should be set to airflow.api.auth.backend.basic_auth. You will have to restart the webserver container. Then you should be able to access all stable APIs using the cURL commands with --user option.
In Airflow 2.0, There seems to be some bug.
If you set this auth configuration in airflow.cfg, it doesn't work.
auth_backend = airflow.api.auth.backend.basic_auth
But setting this as an environment variable works
AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.basic_auth"
#AmitSingh was correct. Setting security to default only works with the experimental api. I changed the relevant configuration in airflow, restarted and added 'experimental' in the api path. Please see https://airflow.apache.org/docs/apache-airflow/stable/rest-api-ref.html
Maybe also good to know:
You can only disable authentication for experimental API, not the stable REST API.
See: https://airflow.apache.org/docs/apache-airflow/stable/security/api.html#disable-authentication

How to make an HTTP GET request manually with netcat?

So, I have to retrieve temperature from any one of the cities from http://www.rssweather.com/dir/Asia/India.
Let's assume I want to retrieve of Kanpur's.
How to make an HTTP GET request with Netcat?
I'm doing something like this.
nc -v rssweather.com 80
GET http://www.rssweather.com/wx/in/kanpur/wx.php HTTP/1.1
I don't know exactly if I'm even in the right direction or not. I am not able to find any good tutorials on how to make an HTTP get request with netcat, so I'm posting it on here.
Of course you could dig in standards searched for google, but actually if you want to get only a single URL, it isn't​‎​‎ worth the effort.
You could also start a netcat in listening mode on a port:
nc -l 64738
(Sometimes nc -l -p 64738 is the correct argument list)
...and then do a browser request into this port with a real browser. Just type in your browser http://localhost:64738 and see.
In your actual case the problem is that HTTP/1.1 doesn't close the connection automatically, but it waits your next URL you want to retrieve. The solution is simple:
Use HTTP/1.0:
GET /this/url/you/want/to/get HTTP/1.0
Host: www.rssweather.com
<empty line>
or use a Connection: request header to say the server you want to close after that:
GET /this/url/you/want/to/get HTTP/1.1
Host: www.rssweather.com
Connection: close
<empty line>
Extension: After the GET header write only the path part of the request. The hostname from which you want to get data belongs to a Host: header as you can see in my examples. This is because multiple websites can run on the same webserver, so the browsers need to say him, from which site it wants to load the page.
This works for me:
$ nc www.rssweather.com 80
GET /wx/in/kanpur/wx.php HTTP/1.0
Host: www.rssweather.com
And then hit double <enter>, i.e. once for the remote http server and once for the nc command.
source: pentesterlabs
You don't even need to use/install netcat
Create a tcp socket via an unused file-descriptor i.e I use 88 here
Write the request into it
use the fd
exec 88<>/dev/tcp/rssweather.com/80
echo -e "GET /dir/Asia/India HTTP/1.1\nhost: www.rssweather.com\nConnection: close\n\n" >&88
sed 's/<[^>]*>/ /g' <&88
On MacOS, you need the -c flag as follows:
Little-Net:~ minfrin$ nc -c rssweather.com 80
GET /wx/in/kanpur/wx.php HTTP/1.1
Host: rssweather.com
Connection: close
[empty line]
The response then appears as follows:
HTTP/1.1 200 OK
Date: Thu, 23 Aug 2018 13:20:49 GMT
Server: Apache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
The -c flag is described as "Send CRLF as line-ending".
To be HTTP/1.1 compliant, you need the Host header, as well as the "Connection: close" if you want to disable keepalive.
Test it out locally with python3 http.server
This is also a fun way to test it out. On one shell, launch a local file server:
python3 -m http.server 8000
Then on the second shell, make a request:
printf 'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n' | nc localhost 8000
The Host: header is required in HTTP 1.1.
This shows an HTML listing of the directory, just as you would see from:
firefox http://localhost:8000
Next you can try to list files and directories and observe the response:
printf 'GET /my-subdir/ HTTP/1.1\n\n' | nc localhost 8000
printf 'GET /my-file HTTP/1.1\n\n' | nc localhost 8000
Every time you make a successful request, the server prints:
127.0.0.1 - - [05/Oct/2018 11:20:55] "GET / HTTP/1.1" 200 -
confirming that it was received.
example.com
This IANA maintained domain is another good test URL:
printf 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n' | nc example.com 80
and compare with: http://example.com/
https SSL
nc does not seem to be able to handle https URLs. Instead, you can use:
sudo apt-get install nmap
printf 'GET / HTTP/1.1\r\nHost: github.com\r\n\r\n' | ncat --ssl github.com 443
See also: https://serverfault.com/questions/102032/connecting-to-https-with-netcat-nc/650189#650189
If you try nc, it just hangs:
printf 'GET / HTTP/1.1\r\nHost: github.com\r\n\r\n' | nc github.com 443
and trying port 80:
printf 'GET / HTTP/1.1\r\nHost: github.com\r\n\r\n' | nc github.com 443
just gives a redirect response to the https version:
HTTP/1.1 301 Moved Permanently
Content-Length: 0
Location: https://github.com/
Connection: keep-alive
Tested on Ubuntu 18.04.

How to differentiate request coming from command-line and browsers?

To check whether it is a cli or http request, in PHP this method php_sapi_namecan be used, take a look here. I am trying to replicate that in apache conf file. The underlying idea is, if the request is coming from cli a 'minimal info' is served, if the request is from browsers then the users are redirected to different location. Is this possible?
MY PSEUDO CODE:
IF (REQUEST_COMING_FROM_CLI) {
ProxyPass / http://${IP_ADDR}:5000/
ProxyPassReverse / http://${IP_ADDR}:5000/
}ELSE IF(REQUEST_COMING_FROM_WEB_BROWSERS){
ProxyPass / http://${IP_ADDR}:8585/welcome/
ProxyPassReverse / http://${IP_ADDR}:8585/welcome/
}
Addition: cURL uses host of different protocols including http, ftp & telnet. Can apache figure out if the request is from cli or browser?
For as far as I know, there is no way to find the difference using apache.
if a request from the command-line is set up properly, apache can not make a difference between command-line and browser.
When you check it in PHP (using php_sapi_name, as you suggested), it only checks where php itself was called from (cli, apache, etc.), not where the http request came from.
using telnet for the command line, you can connect to apache, set the required http-headers and send the request as if you were using a browser(only, the browser sets the headers for you)
so, i do not think apache could differentiate between console or browser
The only way to do this is to test the user agent sent in the header of the request but this information can be easily changed.
By default every php http request looks like this to the apache server:
192.168.1.15 - - [01/Oct/2008:21:52:43 +1300] "GET / HTTP/1.0" 200 5194 "-" "-"
this information can be easily changed to look like a browser, for example using this
ini_set('user_agent',
'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');
the http request will look like this
192.168.1.15 - - [01/Oct/2008:21:54:29 +1300] "GET / HTTP/1.0" 200 5193
"-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3"
At this moment the apache will think that the received connection come from a windows firefox 3.0.3.
So there is no a exact way to get this information.
You can use a BrowserMatch directive if the cli requests are not spoofing a real browser in the User-Agent header. Else, like everyone else has said, there is no way to tell the difference.

Why does curl not work, but wget works?

I am using both curl and wget to get this url: http://opinionator.blogs.nytimes.com/2012/01/19/118675/
For curl, it returns no output at all, but with wget, it returns the entire HTML source:
Here are the 2 commands. I've used the same user agent, and both are coming from the same IP, and are following redirects. The URL is exactly the same. For curl, it returns immediately after 1 second, so I know it's not a timeout issue.
curl -L -s "http://opinionator.blogs.nytimes.com/2012/01/19/118675/" --max-redirs 10000 --location --connect-timeout 20 -m 20 -A "Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" 2>&1
wget http://opinionator.blogs.nytimes.com/2012/01/19/118675/ --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
If NY Times might be cloaking, and not returning the source to curl, what could be different in the headers curl is sending? I assumed since the user agent is the same, the request should look exactly the same from both of these requests. What other "footprints" should I check?
The way to solve is to analyze your curl request by doing curl -v ... and your wget request by doing wget -d ... which shows that curl is redirected to a login page
> GET /2012/01/19/118675/ HTTP/1.1
> User-Agent: Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
> Host: opinionator.blogs.nytimes.com
> Accept: */*
>
< HTTP/1.1 303 See Other
< Date: Wed, 08 Jan 2014 03:23:06 GMT
* Server Apache is not blacklisted
< Server: Apache
< Location: http://www.nytimes.com/glogin?URI=http://opinionator.blogs.nytimes.com/2012/01/19/118675/&OQ=_rQ3D0&OP=1b5c69eQ2FCinbCQ5DzLCaaaCvLgqCPhKP
< Content-Length: 0
< Content-Type: text/plain; charset=UTF-8
followed by a loop of redirections (which you must have noticed, because you have already set the --max-redirs flag).
On the other hand, wget follows the same sequence except that it returns the cookie set by nytimes.com with its subsequent request(s)
---request begin---
GET /2012/01/19/118675/?_r=0 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: */*
Host: opinionator.blogs.nytimes.com
Connection: Keep-Alive
Cookie: NYT-S=0MhLY3awSMyxXDXrmvxADeHDiNOMaMEZFGdeFz9JchiAIUFL2BEX5FWcV.Ynx4rkFI
The request sent by curl never includes the cookie.
The easiest way I see to modify your curl command and obtain the desired resource is by adding -c cookiefile to your curl command. This stores the cookie in the otherwise unused temporary "cookie jar" file called "cookiefile" thereby enabling curl to send the needed cookie(s) with its subsequent requests.
For example, I added the flag -c x directly after "curl " and I obtained the output just like from wget (except that wget writes it to a file and curl prints it on STDOUT).
In my case was because the https_proxy enviroment variable for utility cURL needs set the port in the URL, for example :
Not work with cURL :
https_proxy=http://proxyapp.net.com/
Works with cURL :
https_proxy=http://proxyapp.net.com:80/
With "wget" utility works with and without the port in url, but curl needs it, in case of not set the utility "curl" return error "(56) Proxy CONNECT aborted".
When you get verbosity of the command "curl -v" could see "curl" use port "1080" as default if port in not set at proxy url.

wget connection reset by peer

I try to access www.indeed.com from our web server by using wget but it raises "Connection reset by peer" error.
wget www.indeed.com
--2013-02-05 03:03:12-- (try: 3) http://www.indeed.com/
Connecting to www.indeed.com|208.43.224.140|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
Retrying.
It was working before cause I'm using their API for a while but now I'm not even reach their public website.
What could it be the problem? Could Indeed add to their blacklist the server's IP or is this related with my firewall etc.?
Is there a way to debug/trace where the problem is?
You should use with user-agent like following sample
wget "http://www.indeed.com/" --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"

Resources