Why is there a 60 second delay on my HTTP POST request when using a Go HTTP client?

My goal is to scrape a website that requires me to log in first using HTTP requests in Golang. I actually succeeded by finding out I can send a post request to the website writing form-data into the body of the request. When I test this through an API development software I use called Postman, the response is instantaneous with no delays. However, when performing the request with an HTTP client in Go, there is a consistent 60 second delay every single time. I end up getting a logged in page, but for my program I need the response to be nearly instantaneous.
As you can see in my code, I've tried adding a bunch of headers to the request like "Connection", "Content-Type", "User-Agent" since I thought maaaaaybe the website can tell I'm requesting from a program and is forcing me to wait 60 seconds for a response. Adding these headers to make my request more legitimate(?) doesn't work at all.
Is the delay coming from Go's HTTP client being slow, or is there something wrong with how I'm forming my HTTP POST request? Also, was I on to something with my headers, and is the HTTP client rewriting them when the request is sent?
Here's my simple program...
package main
import (
	"bytes"
	"log"
	"mime/multipart"
	"net/http"
	"net/http/cookiejar"
	"os"
)
func main() {
	url := "https://easypronunciation.com/en/log-in"
	method := "POST"
	// Build the multipart/form-data body.
	payload := &bytes.Buffer{}
	writer := multipart.NewWriter(payload)
	_ = writer.WriteField("email", "foo@bar.com")
	_ = writer.WriteField("password", "*********")
	_ = writer.WriteField("persistent_login", "on")
	_ = writer.WriteField("submit", "")
	if err := writer.Close(); err != nil {
		log.Fatal(err)
	}
	cookieJar, err := cookiejar.New(nil)
	if err != nil {
		log.Fatal(err)
	}
	client := &http.Client{
		Jar: cookieJar,
	}
	req, err := http.NewRequest(method, url, payload)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", writer.FormDataContentType())
	req.Header.Set("Connection", "Keep-Alive")
	req.Header.Set("Accept-Language", "en-US")
	req.Header.Set("User-Agent", "Mozilla/5.0")
	res, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	f, err := os.Create("response.html")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// res.Write emits the status line and headers as well as the body.
	if err := res.Write(f); err != nil {
		log.Fatal(err)
	}
}

I also doubt that this is the Go client library. I would suggest printing out the latencies of the different components of the request to see if and where the 60-second delay occurs. I would also try different URLs to check whether the delay is specific to this site.
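One way to print per-phase latencies is the standard net/http/httptrace package. The sketch below is only an illustration: the timings type, timedDo, demo, and serverDelay are names made up for this example, and a local httptest server that stalls for 50 ms stands in for the real login page. Pointing timedDo at the real URL should show which phase (DNS, connect, TLS, or waiting for the first response byte) accounts for the delay.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"strings"
	"sync"
	"time"
)

// serverDelay is how long the stand-in server stalls before answering (assumption).
const serverDelay = 50 * time.Millisecond

// timings holds the duration of each phase of a single request.
type timings struct {
	DNS, Connect, TLS, FirstByte, Total time.Duration
}

// timedDo sends req and records phase durations using httptrace hooks.
func timedDo(client *http.Client, req *http.Request) (timings, error) {
	var (
		mu                            sync.Mutex // hooks may run on other goroutines
		t                             timings
		dnsStart, connStart, tlsStart time.Time
	)
	locked := func(f func()) { mu.Lock(); defer mu.Unlock(); f() }
	start := time.Now()
	trace := &httptrace.ClientTrace{
		DNSStart:     func(httptrace.DNSStartInfo) { locked(func() { dnsStart = time.Now() }) },
		DNSDone:      func(httptrace.DNSDoneInfo) { locked(func() { t.DNS = time.Since(dnsStart) }) },
		ConnectStart: func(network, addr string) { locked(func() { connStart = time.Now() }) },
		ConnectDone: func(network, addr string, err error) {
			locked(func() { t.Connect = time.Since(connStart) })
		},
		TLSHandshakeStart: func() { locked(func() { tlsStart = time.Now() }) },
		TLSHandshakeDone: func(tls.ConnectionState, error) {
			locked(func() { t.TLS = time.Since(tlsStart) })
		},
		GotFirstResponseByte: func() { locked(func() { t.FirstByte = time.Since(start) }) },
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	res, err := client.Do(req)
	if err != nil {
		return timings{}, err
	}
	defer res.Body.Close()
	if _, err := io.Copy(io.Discard, res.Body); err != nil {
		return timings{}, err
	}
	mu.Lock()
	defer mu.Unlock()
	t.Total = time.Since(start)
	return t, nil
}

// demo times a POST against a local server that waits serverDelay before replying.
func demo() (timings, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(serverDelay)
		fmt.Fprintln(w, "logged in")
	}))
	defer srv.Close()
	req, err := http.NewRequest(http.MethodPost, srv.URL, strings.NewReader("email=foo"))
	if err != nil {
		return timings{}, err
	}
	return timedDo(srv.Client(), req)
}

func main() {
	tm, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("dns=%v connect=%v tls=%v first-byte=%v total=%v\n",
		tm.DNS, tm.Connect, tm.TLS, tm.FirstByte, tm.Total)
}
```

If nearly all of the time lands between the end of the connect/TLS phase and the first response byte, the wait is on the server side rather than in the client.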

Related

Golang HTTP Get Request Not Resolving for some URL

I was trying to build some sort of website status checker. I figured out that the Go HTTP GET request never resolves and hangs forever for some URLs, such as https://www.hetzner.com. But the same URL works with curl.
Golang
No error is thrown here. It just hangs on http.Get.
package main
import (
	"fmt"
	"io"
	"net/http"
)
func main() {
	resp, err := http.Get("https://www.hetzner.com")
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return // resp is nil on error, so stop here
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}
CURL
I get a response when running the following command.
curl https://www.hetzner.com
What may be the reason? And how do I resolve this issue from golang HTTP?
Your specific case can be fixed by specifying HTTP User-Agent Header:
package main
import (
	"fmt"
	"io"
	"net/http"
)
func main() {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://www.hetzner.com", nil)
	if err != nil {
		fmt.Println("Error while building request", err)
		return
	}
	// Replace Go's default User-Agent with an explicit one.
	req.Header.Set("User-Agent", "Golang_Spider_Bot/3.0")
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return // resp is nil on error, so stop here
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}
Note: many other hosts will reject requests from your server because of some security rules on their side. Some ideas:
Empty or bot-like User-Agent HTTP header
Location of your IP address. For example, online shops in the USA don't need to handle requests from Russia.
Autonomous System or CIDR of your provider. Some ASNs are completely blackholed because of the large volume of malicious activity originating from them.
Note 2: Many modern websites have DDoS protection or CDN systems in front of them. If Cloudflare protects your target website, your HTTP request may be blocked even though the status code is 200. To handle this, you need to build something able to render JavaScript-based websites and add some scripts to resolve a captcha.
Also, if you check a considerable number of websites in a short time, you may be blocked by your DNS servers, as they have built-in rate limits. In this case, you may want to take a look at massdns or similar solutions.
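For a status checker, the User-Agent fix can be combined with a client timeout so that a host which never answers cannot hang the program. The sketch below rests on assumptions: checkStatus and demo are hypothetical names, the 200 ms timeout is arbitrary, and the local test server that stalls on Go's default User-Agent only imitates the symptom from the question; it is not a claim about how any real site behaves. The timeout itself is an addition to the answer above, not part of it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// checkStatus issues a GET and returns the status code.
// An empty userAgent leaves Go's default User-Agent in place.
func checkStatus(client *http.Client, url, userAgent string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	if userAgent != "" {
		req.Header.Set("User-Agent", userAgent)
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}

// demo runs both variants against a stand-in server that stalls
// whenever it sees Go's default User-Agent.
func demo() (defaultUAErr error, status int, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.UserAgent(), "Go-http-client") {
			// Stall until the client gives up, or two seconds at most.
			select {
			case <-r.Context().Done():
			case <-time.After(2 * time.Second):
			}
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	client := srv.Client()
	client.Timeout = 200 * time.Millisecond // bound the whole request

	_, defaultUAErr = checkStatus(client, srv.URL, "")
	status, err = checkStatus(client, srv.URL, "Golang_Spider_Bot/3.0")
	return defaultUAErr, status, err
}

func main() {
	defaultUAErr, status, err := demo()
	fmt.Println("default User-Agent:", defaultUAErr)
	fmt.Println("custom User-Agent: ", status, err)
}
```

With the timeout set, the stalled request comes back as an error instead of blocking forever, which lets a checker report the site as unreachable and move on.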

How to bypass golang's HTTP request (net/http) RFC compliance

I'm developing a security scanner and therefore need to send HTTP requests which don't honor RFC specifications. However, Go is very strict about complying with them.
Issue
I want to send an HTTP request which contains prohibited special characters such as "\".
For example: "Ill\egal": "header value"
However, golang always throws the error: 'net/http: invalid header field name "Ill\egal"'.
This error is thrown on line 523 at https://go.dev/src/net/http/transport.go
Issue
I want to send a single HTTP request which contains either two Content-Length headers, two Transfer-Encoding headers, or one Content-Length and one Transfer-Encoding header (for HTTP request smuggling). These sometimes need to have wrong values.
However, it isn't possible to set those headers oneself; they are generated automatically. So it's only possible to use one of these headers, with a correct value.
I've bypassed this by using a raw TCP stream; however, this solution isn't satisfying, as I can't use a proxy this way (see "Use Dialer with Proxy. Route TCP stream through Proxy").
Issue
I want to send a HTTP request where the header name is mixed upper and lowercase. E.g. "HeAdErNaMe": "header value".
This is possible for HTTP 1 requests by writing directly to the header map (req.Header["HeAdErNaMe"] = []string{"header value"})
However, for HTTP/2 requests the header names are still normalized to meet the RFC specifications (HTTP/2 requires lowercase field names).
You can dump the request into a buffer, modify the buffer (with a regexp or a string replace), and send the modified buffer to the host over a raw connection (net.Dial, or tls.Dial for HTTPS).
Example:
package main
import (
	"bufio"
	"crypto/tls"
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"strings"
)
func main() {
	// create and dump request
	req, err := http.NewRequest(http.MethodGet, "https://golang.org", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Add("User-Agent", "aaaaa")
	buf, err := httputil.DumpRequest(req, true)
	if err != nil {
		log.Fatal(err)
	}
	// Corrupt request
	str := string(buf)
	str = strings.Replace(str, "User-Agent: aaaaa", "UsEr-AgEnT: aaa\"aaa", 1)
	fmt.Println(str)
	// Dial and send raw request text
	conn, err := tls.Dial("tcp", "golang.org:443", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Use Fprint, not Fprintf: the request text must not be treated as a format string.
	if _, err := fmt.Fprint(conn, str); err != nil {
		log.Fatal(err)
	}
	// Read response
	br := bufio.NewReader(conn)
	resp, err := http.ReadResponse(br, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Printf("%+v", resp)
}
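For the mixed-case header case from the question, the direct map write on an HTTP/1.1 request can be checked locally without touching a real host. The following is a rough sketch under assumptions: sendMixedCase and preservedCase are hypothetical helper names, and a throwaway TCP listener on 127.0.0.1 plays the server so that the raw bytes Go puts on the wire can be inspected.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"strings"
)

// sendMixedCase sends one HTTP/1.1 request to a local listener and
// returns the raw request head exactly as it arrived on the wire.
func sendMixedCase() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	raw := make(chan string, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			raw <- ""
			return
		}
		defer conn.Close()
		br := bufio.NewReader(conn)
		var sb strings.Builder
		for {
			line, err := br.ReadString('\n')
			sb.WriteString(line)
			if err != nil || line == "\r\n" {
				break // end of the header block
			}
		}
		raw <- sb.String()
		io.WriteString(conn, "HTTP/1.1 204 No Content\r\nConnection: close\r\n\r\n")
	}()

	req, err := http.NewRequest(http.MethodGet, "http://"+ln.Addr().String()+"/", nil)
	if err != nil {
		return "", err
	}
	// Writing to the map directly skips the canonicalization done by Header.Set.
	req.Header["HeAdErNaMe"] = []string{"header value"}

	client := &http.Client{Transport: &http.Transport{}}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-raw, nil
}

// preservedCase reports whether the header reached the wire with its casing intact.
func preservedCase(raw string) bool {
	return strings.Contains(raw, "HeAdErNaMe: header value\r\n")
}

func main() {
	raw, err := sendMixedCase()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(raw)
	fmt.Println("case preserved:", preservedCase(raw))
}
```

This only covers plain HTTP/1.1; it does not help with invalid characters in header names or with HTTP/2, where the raw-connection approach above is still needed.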

How to know proxy used by http client in given request

I'm making some requests through proxy servers. The function that defines which proxy URL to use chooses randomly from a list of proxies. I would like to know, for a given request, which proxy URL is being used. As far as I know, when using a proxy server the HTTP headers remain the same, and the TCP headers are the ones that change.
Here's some code illustrating it (no error handling for simplicity):
func main() {
	transport := &http.Transport{Proxy: chooseProxy}
	client := http.Client{Transport: transport}
	request, err := http.NewRequest(http.MethodGet, "https://www.google.com", nil)
	checkErr(err)
	// How to know here which proxy was used? Suppose the same client will perform several requests to different URL's.
	response, err := client.Do(request)
	checkErr(err)
	dump, _ := httputil.DumpRequest(response.Request, false)
	fmt.Println(string(dump)) // dump is a []byte; convert it to print readable text
}
func chooseProxy(request *http.Request) (*url.URL, error) {
	proxies := []string{"proxy1", "proxy2", "proxy3"}
	proxyToUse := proxies[rand.Intn(len(proxies))]
	return url.Parse(proxyToUse)
}
I'm assuming that the Proxy function in the transport is called for each request even if the same client is used, as per the docs that say "Proxy specifies a function to return a proxy for a given Request". Am I right?
Some HTTP proxies add a Via header that tells who they are:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Via
You can modify your chooseProxy function so that it saves the proxy selected.
To do that, you can transform the chooseProxy func into a method of a type that will be used as storage for the information you want to keep:
type proxySelector string
func (sel *proxySelector) chooseProxy(request *http.Request) (*url.URL, error) {
	proxies := []string{"proxy1", "proxy2", "proxy3"}
	proxyToUse := proxies[rand.Intn(len(proxies))]
	*sel = proxySelector(proxyToUse) // <-----
	return url.Parse(proxyToUse)
}
func main() {
	var proxy proxySelector
	transport := &http.Transport{Proxy: proxy.chooseProxy} // <-----
	client := http.Client{Transport: transport}
	request, err := http.NewRequest(http.MethodGet, "https://www.google.com", nil)
	checkErr(err)
	// How to know here which proxy was used? Suppose the same client will perform several requests to different URL's.
	response, err := client.Do(request)
	checkErr(err)
	dump, _ := httputil.DumpRequest(response.Request, false)
	fmt.Println(string(dump)) // dump is a []byte; convert it to print readable text
	fmt.Println("Proxy:", string(proxy)) // <-----
}
The request, which contains the target URI, is passed as the argument request to chooseProxy. So you already have the correct mapping inside your chooseProxy function; all you need to do is record proxyToUse against request.URL there.
If you don't trust that the code actually applies this mapping, then you need to look outside the code. For example, you can look at the actual network traffic with Wireshark to see which proxy gets accessed.
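When one client sends several requests, possibly concurrently, a single shared variable only remembers the last choice. One possible variation is to attach a small record to each request's context and let the proxy function fill it in. The sketch below is an assumption-laden illustration: proxyKey, proxyRecord, withProxyRecord, newClient, and the proxy URLs are all hypothetical, and demo calls chooseProxy directly in place of the transport so that no network access is needed.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"net/http"
	"net/url"
)

// proxyKey is the context key under which the per-request record is stored.
type proxyKey struct{}

// proxyRecord is filled in by chooseProxy for the request that carries it.
type proxyRecord struct{ URL *url.URL }

// Hypothetical proxy addresses, for illustration only.
var proxies = []string{
	"http://proxy1.example:8080",
	"http://proxy2.example:8080",
	"http://proxy3.example:8080",
}

// chooseProxy picks a random proxy and, if the request carries a
// proxyRecord in its context, stores the choice there.
func chooseProxy(req *http.Request) (*url.URL, error) {
	u, err := url.Parse(proxies[rand.Intn(len(proxies))])
	if err != nil {
		return nil, err
	}
	if rec, ok := req.Context().Value(proxyKey{}).(*proxyRecord); ok {
		rec.URL = u
	}
	return u, nil
}

// withProxyRecord returns a copy of req whose context carries a fresh record.
func withProxyRecord(req *http.Request) (*http.Request, *proxyRecord) {
	rec := &proxyRecord{}
	return req.WithContext(context.WithValue(req.Context(), proxyKey{}, rec)), rec
}

// newClient shows how chooseProxy would be wired into a real client.
func newClient() *http.Client {
	return &http.Client{Transport: &http.Transport{Proxy: chooseProxy}}
}

// demo calls chooseProxy directly, standing in for the transport.
func demo() (chosen, recorded string, err error) {
	req, err := http.NewRequest(http.MethodGet, "https://www.google.com", nil)
	if err != nil {
		return "", "", err
	}
	req, rec := withProxyRecord(req)
	u, err := chooseProxy(req)
	if err != nil {
		return "", "", err
	}
	return u.String(), rec.URL.String(), nil
}

func main() {
	chosen, recorded, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("returned:", chosen)
	fmt.Println("recorded:", recorded)
}
```

In a real program the request returned by withProxyRecord would be passed to newClient().Do, and rec.URL read afterwards; each request then has its own record instead of sharing one.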

Golang http.Get too many redirects

I'm trying to download a file from the web. It should be a simple process, one that I've already done before. But this particular link (a 135 kB zip file) gives me an error message: Get "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip": stopped after 10 redirects. If I copy the link into the browser, the file is downloaded without any issues, but when using the code below, the error pops up.
package main
import (
	"io"
	"net/http"
	"os"
)
func main() {
	link := "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip"
	resp, err := http.Get(link)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	// Create the file
	out, err := os.Create("ms.zip")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	// Write the body to file
	_, err = io.Copy(out, resp.Body)
	if err != nil {
		panic(err)
	}
}
Any ideas on why this happens and how to get around it?
Thanks for your attention.
After investigating this URL, I see that it sets a cookie:
Set-Cookie: security=true; path=/
You can set the cookie manually, or use a CookieJar:
c := http.Client{}
req, err := http.NewRequest("GET", link, nil)
if err != nil {
	panic(err)
}
// Send the cookie the server expects so it stops redirecting.
req.AddCookie(&http.Cookie{Name: "security", Value: "true", Path: "/"})
resp, err := c.Do(req)
if err != nil {
	panic(err)
}
defer resp.Body.Close()
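The CookieJar route can be sketched with the standard net/http/cookiejar package, which stores the Set-Cookie value and replays it on the redirected request. In this sketch, fetch and demo are hypothetical names, and a local httptest server that redirects until it sees security=true is an assumption standing in for the real host.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// fetch downloads url with the given client and returns the body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demo fetches the same URL without and with a cookie jar.
func demo() (withoutJar error, body string, err error) {
	// Stand-in server: redirect to the same path until the cookie is present.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if c, err := r.Cookie("security"); err != nil || c.Value != "true" {
			http.SetCookie(w, &http.Cookie{Name: "security", Value: "true", Path: "/"})
			http.Redirect(w, r, r.URL.Path, http.StatusFound)
			return
		}
		fmt.Fprint(w, "file contents")
	}))
	defer srv.Close()
	link := srv.URL + "/D_megase.zip"

	// Without a jar the cookie is dropped and the redirect loop never ends.
	client := srv.Client()
	_, withoutJar = fetch(client, link)

	// With a jar the cookie set by the first response is sent on the redirect.
	jar, err := cookiejar.New(nil)
	if err != nil {
		return withoutJar, "", err
	}
	client.Jar = jar
	body, err = fetch(client, link)
	return withoutJar, body, err
}

func main() {
	withoutJar, body, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("without jar:", withoutJar)
	fmt.Println("with jar:   ", body)
}
```

A jar has the advantage over a hard-coded cookie that it keeps working if the server changes the cookie name or value.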
Your code is totally fine; this issue usually has more to do with the source you're trying to download the file from than with Go.
You would have had the same issue with other tools and languages, because the host you are trying to reach keeps redirecting you when the 'User-Agent' header is invalid. This is often the case when a site wants its files to be downloadable only from browsers, rather than by crawlers, automated scripts, etc.
With Go, you can add the header with req.Header.Set("User-Agent", "<some-user-agent-value>") before sending the request. Create the request, set the header, and execute it with an http.Client{} via client.Do(req).
Eg:
link := "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip"
req, err := http.NewRequest("GET", link, nil)
if err != nil {
	panic(err)
}
// Doesn't even have to be a full, proper user agent string.
req.Header.Set("User-Agent", "Mozilla/4.0")
client := &http.Client{}
resp, err := client.Do(req)
You can read more in Go's http package docs, which state:
"For control over HTTP client headers, redirect policy, and other
settings, create a Client..."
See also the http.Request and http.Client docs.
You can find more about this in general in, e.g., Mozilla's HTTP docs, as well as many other great docs and resources out there.
By the way, the zip archive you're trying to download seems to be invalid. :-)

Unexpected EOF using Go http client

I am learning Go and came across this problem.
I am just downloading web page content using HTTP client:
package main
import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)
func main() {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://mail.ru/", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Close = true
	response, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer response.Body.Close()
	content, err := ioutil.ReadAll(response.Body)
	if err != nil {
		fmt.Println(err)
	}
	// Print at most the first 100 bytes; slicing a shorter body would panic.
	if len(content) > 100 {
		content = content[:100]
	}
	fmt.Println(string(content))
}
I get an unexpected EOF error when reading the response body. At the same time, the content variable has the full page content.
This error appears only when I download the content of https://mail.ru/. With other URLs everything works fine, without any errors.
I used curl to download this page's content, and everything works as expected.
I am confused a bit - what's happening here?
Go v1.2, tried on Ubuntu and MacOS X
It looks like that server (Apache 1.3, wow!) is serving up a truncated gzip response. If you explicitly request the identity encoding (preventing the Go transport from adding gzip itself), you won't get the ErrUnexpectedEOF:
req.Header.Add("Accept-Encoding", "identity")
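The effect of that header on what Go actually sends can be observed locally. This is a minimal sketch, assuming a local httptest server that echoes back the Accept-Encoding value it received; acceptEncodingSeen and demo are hypothetical names used only here.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// acceptEncodingSeen returns the Accept-Encoding value the server received.
func acceptEncodingSeen(client *http.Client, url string, forceIdentity bool) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	if forceIdentity {
		// An explicit value stops the transport from adding gzip on its own.
		req.Header.Set("Accept-Encoding", "identity")
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// demo compares the default request with one that asks for identity.
func demo() (byDefault, explicit string, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, r.Header.Get("Accept-Encoding"))
	}))
	defer srv.Close()

	byDefault, err = acceptEncodingSeen(srv.Client(), srv.URL, false)
	if err != nil {
		return "", "", err
	}
	explicit, err = acceptEncodingSeen(srv.Client(), srv.URL, true)
	return byDefault, explicit, err
}

func main() {
	byDefault, explicit, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("default: ", byDefault)
	fmt.Println("explicit:", explicit)
}
```

Note that once Accept-Encoding is set by hand, the transport no longer decompresses responses transparently, so any value other than identity leaves decoding to the caller.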
