I was trying to build a website status checker. I found that a Go HTTP GET request never completes and hangs forever for some URLs, such as https://www.hetzner.com. The same URL works with curl.
Golang
No error is returned here; the program just hangs on http.Get:
func main() {
	resp, err := http.Get("https://www.hetzner.com")
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return // resp is nil here, so do not fall through
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}
CURL
I get a response when running the following command:
curl https://www.hetzner.com
What may be the reason? And how do I resolve this issue from golang HTTP?
Your specific case can be fixed by setting the HTTP User-Agent header:
package main
import (
	"fmt"
	"io"
	"net/http"
)
func main() {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://www.hetzner.com", nil)
	if err != nil {
		fmt.Println("Error while creating request", err)
		return
	}
	// Send an explicit User-Agent instead of Go's default one.
	req.Header.Set("User-Agent", "Golang_Spider_Bot/3.0")
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}
Note: many other hosts will reject requests from your server because of security rules on their side. Common triggers:
Empty or bot-like User-Agent HTTP header.
Location of your IP address. For example, online shops in the USA don't need to handle requests from Russia.
Autonomous System or CIDR of your provider. Some ASNs are completely blackholed because of the volume of malicious activity originating from them.
Note 2: Many modern websites have DDoS protection or CDN systems in front of them. If Cloudflare protects your target website, your HTTP request may be blocked or answered with a challenge page even when the status code is 200. To handle this, you need something that can render JavaScript-based pages, plus scripts to solve the captcha.
Also, if you check a large number of websites in a short time, your DNS servers may block you, as they have built-in rate limits. In this case, you may want to take a look at massdns or similar solutions.
Related
I have a use case where I want to use an HTTP client in Go with pooled connections (connection re-use), but with the special case where a connection is intentionally closed (not allowed for re-use) if a request on that connection returns a specific HTTP status code.
I've implemented a custom http.RoundTripper, which wraps an http.Transport, and can inspect the response status code. However, I can't seem to find a way to prevent the http.Transport from re-using that connection, without also preventing it from re-using any other connection.
Is this possible using the net/http package? If not, any suggested workaround for accomplishing this?
My current code looks something like this:
type MyTransport struct {
	transport *http.Transport
}
func (mt *MyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := mt.transport.RoundTrip(req)
	if err != nil {
		return resp, err
	}
	if resp.StatusCode == 567 {
		// HERE:
		// Do something to prevent re-use of this connection
	}
	return resp, err
}
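As far as I know, net/http has no call that marks one pooled connection as non-reusable after the response has arrived. One possible workaround, sketched below under that assumption, is to capture the connection with httptrace's GotConn hook and close it yourself once the caller closes the response body; the transport then finds the connection dead and dials a new one. The names closeOnStatus, connClosingBody and demoDialCount are hypothetical, and this only makes sense for HTTP/1.x, since an HTTP/2 connection is shared by many requests.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync/atomic"
)

// closeOnStatus wraps a RoundTripper and arranges for the underlying
// connection to be closed when a response carries the given status code.
type closeOnStatus struct {
	base   http.RoundTripper
	status int
}

func (t *closeOnStatus) RoundTrip(req *http.Request) (*http.Response, error) {
	var conn net.Conn
	trace := &httptrace.ClientTrace{
		// GotConn reports the connection chosen for this request.
		GotConn: func(info httptrace.GotConnInfo) { conn = info.Conn },
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	resp, err := t.base.RoundTrip(req)
	if err != nil {
		return resp, err
	}
	if resp.StatusCode == t.status && conn != nil {
		resp.Body = &connClosingBody{ReadCloser: resp.Body, conn: conn}
	}
	return resp, nil
}

// connClosingBody closes the network connection when the body is closed,
// so the caller can still read the response first.
type connClosingBody struct {
	io.ReadCloser
	conn net.Conn
}

func (b *connClosingBody) Close() error {
	err := b.ReadCloser.Close()
	b.conn.Close()
	return err
}

// demoDialCount requests /ok, /bad (567) and /ok from a local test server
// and returns how many connections the transport had to dial.
func demoDialCount() int64 {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/bad" {
			w.WriteHeader(567)
		}
	}))
	defer srv.Close()

	var dials int64
	base := &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			atomic.AddInt64(&dials, 1)
			return (&net.Dialer{}).DialContext(ctx, network, addr)
		},
	}
	client := &http.Client{Transport: &closeOnStatus{base: base, status: 567}}
	for _, path := range []string{"/ok", "/bad", "/ok"} {
		resp, err := client.Get(srv.URL + path)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&dials)
}

func main() {
	fmt.Println("connections dialed:", demoDialCount())
}
```

Other pooled connections are left alone; only the one that produced the flagged status is closed. If the status is known before sending, setting req.Close = true is the simpler route.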
I'm developing a security scanner and therefore need to send HTTP requests that don't honor the RFC specifications. However, Go's net/http is strict about complying with them.
Issue
I want to send an HTTP request which contains prohibited special characters such as "\".
For example: "Ill\egal": "header value"
However, golang always throws the error: 'net/http: invalid header field name "Ill\egal"'.
This error is thrown on line 523 at https://go.dev/src/net/http/transport.go
Issue
I want to send a single HTTP request which contains either two Content-Length headers, two Transfer-Encoding headers, or one Content-Length and one Transfer-Encoding header (for HTTP request smuggling). These sometimes need to carry incorrect values.
However, it isn't possible to set those headers yourself because they are generated automatically. So it's only possible to send one of these headers, with a correct value.
I've worked around this by using a raw TCP stream, but that solution isn't satisfying because I can't use a proxy this way: Use Dialer with Proxy. Route TCP stream through Proxy
Issue
I want to send an HTTP request where the header name is mixed upper and lowercase, e.g. "HeAdErNaMe": "header value".
This is possible for HTTP/1 requests by writing directly to the header map (req.Header["HeAdErNaMe"] = []string{"header value"}).
However, for HTTP/2 requests the header names are still normalized (HTTP/2 requires lowercase field names) to meet the RFC specifications.
You can dump the request into a buffer, modify the buffer (for example with a regexp or a string replacement), and send the modified buffer to the host over a raw connection (tls.Dial for HTTPS as below, net.Dial for plain HTTP).
Example:
package main
import (
"bufio"
"crypto/tls"
"fmt"
"log"
"net/http"
"net/http/httputil"
"strings"
)
func main() {
// create and dump request
req, err := http.NewRequest(http.MethodGet, "https://golang.org", nil)
if err != nil {
log.Fatal(err)
}
req.Header.Add("User-Agent", "aaaaa")
buf, err := httputil.DumpRequest(req, true)
if err != nil {
log.Fatal(err)
}
// Corrupt request
str := string(buf)
str = strings.Replace(str, "User-Agent: aaaaa", "UsEr-AgEnT: aaa\"aaa", 1)
println(str)
// Dial and send raw request text
conn, err := tls.Dial("tcp", "golang.org:443", nil)
if err != nil {
log.Fatal(err)
}
defer conn.Close()
fmt.Fprint(conn, str) // Fprint, not Fprintf: str is data, not a format string
// Read response
br := bufio.NewReader(conn)
resp, err := http.ReadResponse(br, nil)
if err != nil {
log.Fatal(err)
}
log.Printf("%+v", resp)
}
My goal is to scrape a website that requires me to log in first using HTTP requests in Golang. I actually succeeded by finding out I can send a post request to the website writing form-data into the body of the request. When I test this through an API development software I use called Postman, the response is instantaneous with no delays. However, when performing the request with an HTTP client in Go, there is a consistent 60 second delay every single time. I end up getting a logged in page, but for my program I need the response to be nearly instantaneous.
As you can see in my code, I've tried adding a bunch of headers to the request like "Connection", "Content-Type", "User-Agent" since I thought maaaaaybe the website can tell I'm requesting from a program and is forcing me to wait 60 seconds for a response. Adding these headers to make my request more legitimate(?) doesn't work at all.
Is the delay coming from Go's HTTP client being slow, or is there something wrong with how I'm forming my HTTP POST request? Also, was I on to something with my headers, and is the HTTP client rewriting them when it sends the request?
Here's my simple program...
package main
import (
"bytes"
"fmt"
"mime/multipart"
"net/http"
"net/http/cookiejar"
"os"
)
func main() {
url := "https://easypronunciation.com/en/log-in"
method := "POST"
payload := &bytes.Buffer{}
writer := multipart.NewWriter(payload)
_ = writer.WriteField("email", "foo#bar.com")
_ = writer.WriteField("password", "*********")
_ = writer.WriteField("persistent_login", "on")
_ = writer.WriteField("submit", "")
err := writer.Close()
if err != nil {
fmt.Println(err)
}
cookieJar, _ := cookiejar.New(nil)
client := &http.Client{
Jar: cookieJar,
}
req, err := http.NewRequest(method, url, payload)
if err != nil {
fmt.Println(err)
}
req.Header.Set("Content-Type", writer.FormDataContentType())
req.Header.Set("Connection", "Keep-Alive")
req.Header.Set("Accept-Language", "en-US")
req.Header.Set("User-Agent", "Mozilla/5.0")
res, err := client.Do(req)
if err != nil {
fmt.Println(err)
return
}
defer res.Body.Close()
f, err := os.Create("response.html")
if err != nil {
fmt.Println(err)
return
}
defer f.Close()
res.Write(f)
}
I doubt this is the Go client library either. I would suggest printing the latencies of the different request phases to see where the 60-second delay occurs. I would also try different URLs to check whether the delay is specific to this site.
I'm trying to download a file from the web. It should be a simple process, one that I've already done before. But this particular link (a 135 kB zip file) gives me an error: Get "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip": stopped after 10 redirects. If I paste the link into the browser, the file downloads without any issues, but the code below produces the error.
package main
import (
"io"
"net/http"
"os"
)
func main() {
link := "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip"
resp, err := http.Get(link)
if err != nil {
panic(err)
}
defer resp.Body.Close()
// Create the file
out, err := os.Create("ms.zip")
if err != nil {
panic(err)
}
defer out.Close()
// Write the body to file
_, err = io.Copy(out, resp.Body)
if err != nil {
panic(err)
}
}
Any ideas on why this happens and how to get around it?
Thanks for the attention.
After investigating this URL, I see that it sets a cookie:
Set-Cookie: security=true; path=/
You can set the cookie manually, or give the client a CookieJar so it is stored and sent back automatically. Setting it manually:
c := http.Client{}
req, err := http.NewRequest("GET", link, nil)
if err != nil {
panic(err)
}
req.AddCookie(&http.Cookie{Name: "security", Value: "true", Path: "/"})
resp, err := c.Do(req)
if err != nil {
panic(err)
}
Your code is fine; this kind of issue usually comes from the server you're downloading from rather than from Go.
You would have had the same issue with other tools and languages, because the host keeps redirecting you when the 'User-Agent' header is missing or not accepted. Sites often do this to make files downloadable only from browsers rather than from crawlers and automated scripts.
With Go, you can set the header with req.Header.Set("User-Agent", "<some-user-agent-value>") before sending the request: create a request with http.NewRequest, set the header, and execute it with an http.Client via client.Do(req).
Eg:
link := "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip"
req, err := http.NewRequest("GET", link, nil)
if err != nil {
panic(err)
}
req.Header.Set("User-Agent", "Mozilla/4.0") // Doesn't even have to be a full
// proper user agent string
client := &http.Client{}
resp, err := client.Do(req)
You can read more in Go's net/http package docs, which state:
"For control over HTTP client headers, redirect policy, and other
settings, create a Client..."
See also the http.Request and http.Client docs.
You can find more about this in general in, for example, Mozilla's HTTP docs, as well as many other good docs and resources.
Btw. the zip archive you're trying to download seems to be invalid. :-)
I am learning Go and came across this problem.
I am just downloading web page content using HTTP client:
package main
import (
"fmt"
"io/ioutil"
"log"
"net/http"
)
func main() {
client := &http.Client{}
req, err := http.NewRequest("GET", "https://mail.ru/", nil)
if err != nil {
log.Fatal(err)
}
req.Close = true
response, err := client.Do(req)
if err != nil {
log.Fatal(err)
}
defer response.Body.Close()
content, err := ioutil.ReadAll(response.Body)
if err != nil {
fmt.Println(err)
}
fmt.Println(string(content)[:100])
}
I get an unexpected EOF error when reading the response body, even though the content variable holds the full page content.
This error appears only when downloading https://mail.ru/. With other URLs everything works fine, without any errors.
I used curl for downloading this page content - everything works as expected.
I am confused a bit - what's happening here?
Go v1.2, tried on Ubuntu and MacOS X
It looks like that server (Apache 1.3, wow!) is serving a truncated gzip response. If you explicitly request the identity encoding (which prevents the Go transport from adding gzip itself), you won't get the ErrUnexpectedEOF:
req.Header.Add("Accept-Encoding", "identity")
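To see what the transport does with this header, the sketch below points the default client at a local test server that echoes back the Accept-Encoding value it received: with the header unset the transport adds gzip on its own, and with an explicit value it passes that value through unchanged. The names echoEncoding and seenEncoding are hypothetical, and the server is only a stand-in for the real site.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// echoEncoding is a hypothetical test server that replies with the
// Accept-Encoding header it received.
func echoEncoding() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("Accept-Encoding"))
	}))
}

// seenEncoding returns the Accept-Encoding value the server saw. An empty
// acceptEncoding leaves the header unset on the request.
func seenEncoding(url, acceptEncoding string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	if acceptEncoding != "" {
		req.Header.Set("Accept-Encoding", acceptEncoding)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	srv := echoEncoding()
	defer srv.Close()

	for _, enc := range []string{"", "identity"} {
		seen, err := seenEncoding(srv.URL, enc)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("requested %q, server saw %q\n", enc, seen)
	}
}
```

Note that once you set Accept-Encoding yourself, the transport no longer decompresses the body for you, so asking for gzip explicitly means decoding it by hand.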