How to avoid some sites rejecting HTTP get using go - http

We have a script that on a daily basis checks all of the web links in all of our database records (the users want notifications when a link becomes out of date).
There are a couple of sites that work fine through a web browser from this IP address, but when fetched through GO, they either disconnect before completing the request or return a HTTP authorisation denied message.
I am assuming some sort of firewall (F5) is filtering/blocking the request. This occurs even when I change the HTTP request to use a common user agent. What can we do to ensure a GO request looks like a standard browser?
func fetch_url(url string, d time.Duration) (int, error) {
client := &http.Client{
Timeout: d,
}
req, err := http.NewRequest("GET", url, nil)
if err != nil {
return 0, err
}
req.Header.Set("User-Agent", "Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53")
resp, err := client.Do(req)
if err != nil {
return 0, err
}
status := resp.StatusCode
resp.Body.Close()
return status, nil
}

Try matching the exact headers from a request from your web browser to eliminate other factors. A smart firewall could have heuristics on what looks like a web browser versus a robot.
Notice that the go http client sends only a minimal HTTP request:
GET /foo HTTP/1.1
Host: localhost:3030
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
Whereas a web browser is more chatty:
GET /foo HTTP/1.1
Host: localhost:3030
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

Related

Why does the HTTP Client Force an Accept-Encoding header

Sample Code:
package main
import (
"fmt"
"net/http"
"net/http/httputil"
)
func main() {
client := &http.Client{
Transport: &http.Transport{
DisableCompression: true,
},
}
url := "https://google.com"
req, err := http.NewRequest(http.MethodGet, url, nil)
if err != nil {
return
}
//req.Header.Set("Accept-Encoding", "*")
//req.Header.Del("Accept-Encoding")
requestDump, err := httputil.DumpRequestOut(req, false)
if err != nil {
fmt.Println(err)
}
fmt.Println(string(requestDump))
client.Do(req)
}
Output:
GET / HTTP/1.1
Host: google.com
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
With only req.Header.Set("Accept-Encoding", "*" uncommented:
GET / HTTP/1.1
Host: google.com
User-Agent: Go-http-client/1.1
Accept-Encoding: *
With only req.Header.Del("Accept-Encoding") uncommented:
GET / HTTP/1.1
Host: google.com
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
With both lines uncommented:
GET / HTTP/1.1
Host: google.com
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip
Does DisableCompression actually do anything to the HTTP Request itself?
According to the godocs:
// DisableCompression, if true, prevents the Transport from
// requesting compression with an "Accept-Encoding: gzip"
// request header when the Request contains no existing
// Accept-Encoding value. If the Transport requests gzip on
// its own and gets a gzipped response, it's transparently
// decoded in the Response.Body. However, if the user
// explicitly requested gzip it is not automatically
// uncompressed.
As per document:
DumpRequestOut is like DumpRequest but for outgoing client requests.
It includes any headers that the standard http.Transport adds, such as User-Agent.
That means it adds "Accept-Encoding: gzip" to the printed wire format.
To test what is actually written to the connection, you need to wrap Transport.Dial or Transport.DialContext to provide connection that logs written data.
If you are using a transport that supports httptrace (which all built-in and "x/http/..." transport implementation supports), you may set up a WroteHeaderField callback to inspect written header fields.
If you just need to inspect the headers, however, you can spawn up a httptest.Server.
Playground link provided by #EmilePels:
https://play.golang.org/p/ZPi-_mfDxI8

IdHttp returns HTTP 1.1/ 403 Forbidden. HttpClient works fine

I'm using IdHTTP for a simple Get request:
procedure TForm1.FormCreate(Sender: TObject);
const
URL = 'https://www.udemy.com/course/the-modern-angular-bootcamp/';
begin
IdSSLIOHandler.SSLOptions.Method := sslvSSLv23;
IdHttp1.Request.UserAgent :=
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';
Memo1.Lines.Text := IdHTTP1.Get(URL);
The call to IdHTTP1.Get results in the error "HTTP/1.1 403 Forbidden".
A similar question (Why do I get "403 Forbidden" when I connect to whatismyip.com?) says the UserAgent should be set to a modern value. I have done so, but it does not help.
I'm using OpenSSL Win32 v1.0.2u. Tokyo, Windows 10
Similar code using THttpClient works without any problem:
HttpClient := THTTPClient.Create;
try
Response := HttpClient.Get(URL);
Memo1.Lines.Text := Response.ContentAsString();
finally
HttpClient.Free;
end;

Convert XHR (XML Http Request) into R command

I am trying to turn an XHR (XMLHttpRequest) request into an R command.
I am using the following code:
library(httr)
x <- POST("https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/getDataTableDetailData/?name=&defaultValue=false&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00%3A00%7CUTC%7CDAYTIMERANGE&dateTime.endDateTime=17.03.2017+00%3A00%7CUTC%7CDAYTIMERANGE&area.values=CTY%7C10YBE----------2!BZN%7C10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B20&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable-detail_22WAMERCO000010Y_22WAMERCO000008L_length=10&dv-datatable_length=50&detailId=22WAMERCO000010Y_22WAMERCO000008L",
user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.50 Safari/537.36"),
add_headers(`Referer`="https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/show?name=&defaultValue=true&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&dateTime.endDateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&area.values=CTY|10YBE----------2!BZN|10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&productionType.values=B20&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable_length=100",
Connection = "keep-alive",
Host = "https://transparency.entsoe.eu/",
Accept = "application/json, text/javascript, */*; q=0.01",
`Accept-Encoding` = "gzip, deflate, br",
Origin = "https://transparency.entsoe.eu",
`X-Requested-With` = "XMLHttpRequest",
`Content-Type` = "application/json;charset=UTF-8",
`Accept-Language`= "en-US,en;q=0.8,nl;q=0.6,fr-FR;q=0.4,fr;q=0.2"))
But I keep getting an 400 error: bad request instead of the 200 which would mark a successful response.
I've extracted the values via the Chrome network monitor from this website. The XHR request is sent when the plus button is clicked. I can send it repeatedly from my browser, but it doesn't seem to work from R.
What am I doing wrong in creating the Post request?

Why are there 2 requests from my browser?

I have a simple node server. All it does is log the req.headers and res (I am learning!).
let http = require('http');
function handleIncomingRequest(req, res) {
console.log('---------------------------------------------------');
console.log(req.headers);
console.log('---------------------------------------------------');
console.log();
console.log('---------------------------------------------------');
res.writeHead(200, {'Content-Type': 'application/json'});
res.end(JSON.stringify( {error: null}) + '\n');
}
let s = http.createServer(handleIncomingRequest);
s.listen(8080);
When I use curl to test the server it sends 1 request. When I use chrome it sends 2 different requests.
{ host: 'localhost:8080',
connection: 'keep-alive',
'cache-control': 'max-age=0',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'accept-encoding': 'gzip, deflate, sdch, br',
'accept-language': 'en-GB,en;q=0.8' }
and
{ host: 'localhost:8080',
connection: 'keep-alive',
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
accept: 'image/webp,image/*,*/*;q=0.8',
referer: 'http://localhost:8080/',
'accept-encoding': 'gzip, deflate, sdch, br',
'accept-language': 'en-GB,en;q=0.8' }
This is in incognito mode as in normal mode there are 3 requests!
What is the browser doing and why?
Hard to tell without seeing the full transaction data (for example, what was the request, i.e. what came after GET or POST - and what were the answers from the server).
But it could be caused by the 'upgrade-insecure-requests': '1' header:
When a server encounters this preference in an HTTP request’s headers,
it SHOULD redirect the user to a potentially secure representation of
the resource being requested.
See this.
accept: 'image/webp,image/*,*/*;q=0.8'
On the other hand, the second request is probably for an image only, most likely the favicon.ico or a (bigger) icon for iPad/iPhone maybe (that could explain the 3 requests). You should check out the full request data to be sure.
You can use F12 en select network in the browser to see what's really happening.

POST raw to server Processing

I have an Intel Edison running a Node.JS server that is printing everything I post to it into the console. I can successfully post to it using Postman and see the sent raw data in the console.
Now I'm using Processing to POST to it, which will fire off different events on the Node.JS server.
My problem is that I can't seem to successfully POST the raw body to the server, I've been trying to get this working for several hours already.
import processing.net.*;
String url = "192.168.0.107:3000";
Client myClient;
void setup(){
myClient = new Client(this, "192.168.0.107", 3000);
myClient.write("POST / HTTP/1.1\n");
myClient.write("Cache-Control: no-cache\n");
myClient.write("Content-Type: text/plain\n");
//Attempting to write the raw post body
myClient.write("test");
//2 newlines tells the server that we're done sending
myClient.write("\n\n");
}
The console shows that the server received the POST, and the correct headers, but it doesn't show any data in it.
How do I specify the that "test" is the raw POST data?
The HTTP code from Postman:
POST HTTP/1.1
Host: 192.168.0.107:3000
Content-Type: text/plain
Cache-Control: no-cache
Postman-Token: 6cab79ad-b43b-b4d3-963f-fad11523ec0b
test
The server output from a POST from Postman:
{ host: '192.168.0.107:3000',
connection: 'keep-alive',
'content-length': '4',
'cache-control': 'no-cache',
origin: 'chrome-extension://fhbjgbiflinjbdggehcddcbncdddomop',
'content-type': 'text/plain',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36',
'postman-token': 'd17676a6-98f4-917c-955c-7d8ef01bb024',
accept: '*/*',
'accept-encoding': 'gzip, deflate',
'accept-language': 'en-US,en;q=0.8' }
test
The server output from my POST from Processing:
{ host: '192.168.0.107:3000',
'cache-control': 'no-cache',
'content-type': 'text/plain' }
{}
I just figured out what was wrong, I needed to add the content-length header to tell the server how much data to listen for, and then a newline before the data.
Final code:
import processing.net.*;
String url = "192.168.0.107:3000";
Client myClient;
void setup(){
myClient = new Client(this, "192.168.0.107", 3000);
myClient.write("POST / HTTP/1.1\n");
myClient.write("Cache-Control: no-cache\n");
myClient.write("Content-Type: text/plain\n");
myClient.write("content-length: 4\n");
myClient.write("\n");
myClient.write("test");
myClient.write("\n\n");
}

Resources