Go HTTP RoundTripper: Preventing Connection Reuse Based on Response

I have a use case where I want to use an HTTP client in Go with pooled connections (connection re-use), but with the special case where a connection is intentionally closed (not allowed for re-use) if a request on that connection returns a specific HTTP status code.
I've implemented a custom http.RoundTripper, which wraps an http.Transport, and can inspect the response status code. However, I can't seem to find a way to prevent the http.Transport from re-using that connection, without also preventing it from re-using any other connection.
Is this possible using the net/http package? If not, any suggested workaround for accomplishing this?
My current code looks something like this:
type MyTransport struct {
transport *http.Transport
}
func (mt *MyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
resp, err := mt.transport.RoundTrip(req)
if err != nil {
return resp, err
}
if resp.StatusCode == 567 {
// HERE:
// Do something to prevent re-use of this connection
}
return resp, err
}

Related

Golang HTTP Get Request Not Resolving for some URL

I was trying to build a website status checker. I found that a Go HTTP GET request never completes and hangs forever for some URLs, such as https://www.hetzner.com. The same URL works with curl.
Golang
Here there is no error thrown. It just hangs on http.Get
func main() {
resp, err := http.Get("https://www.hetzner.com")
if err != nil {
fmt.Println("Error while retrieving site", err)
return // resp is nil on error, so do not touch resp.Body
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
fmt.Println("Error while reading response body", err)
return
}
fmt.Println("RESPONSE", string(body))
}
CURL
I get the response while running following command.
curl https://www.hetzner.com
What may be the reason? And how do I resolve this issue from golang HTTP?
Your specific case can be fixed by specifying the HTTP User-Agent header:
package main
import (
"fmt"
"io"
"net/http"
)
func main() {
client := &http.Client{}
req, err := http.NewRequest("GET", "https://www.hetzner.com", nil)
if err != nil {
fmt.Println("Error while building request", err)
return
}
// Set an explicit User-Agent instead of Go's default one.
req.Header.Set("User-Agent", "Golang_Spider_Bot/3.0")
resp, err := client.Do(req)
if err != nil {
fmt.Println("Error while retrieving site", err)
return // resp is nil on error, so do not touch resp.Body
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
fmt.Println("Error while reading response body", err)
return
}
fmt.Println("RESPONSE", string(body))
}
Note: many other hosts will reject requests from your server because of security rules on their side. Some things they may check:
An empty or bot-like User-Agent HTTP header.
The location of your IP address. For example, online shops in the USA don't need to handle requests from Russia.
The Autonomous System or CIDR range of your provider. Some ASNs are completely blackholed because of the volume of malicious activity originating from them.
Note 2: Many modern websites have DDoS protection or CDN systems in front of them. If Cloudflare protects your target website, your HTTP request may be blocked even though the status code is 200. To handle this, you need something that can render JavaScript-based websites, plus scripts to solve a captcha.
Also, if you check a considerable number of websites in a short time, you may be blocked by your DNS servers, as they have built-in rate limits. In this case, you may want to take a look at massdns or similar solutions.
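For a status checker it may also help to give the client a timeout, so that a site that never answers cannot hang the program. Below is a minimal sketch of both ideas together. The helper names fetchStatus and demoUserAgent are made up for illustration, and the local test server that rejects Go's default User-Agent is only an assumption about how a picky site might behave, not a description of what hetzner.com actually does.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchStatus sends a GET request and returns the response status code.
// If userAgent is empty, Go's default User-Agent header is sent.
func fetchStatus(client *http.Client, url, userAgent string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	if userAgent != "" {
		req.Header.Set("User-Agent", userAgent)
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demoUserAgent starts a local test server with hypothetical behaviour
// (it answers 403 to Go's default User-Agent) and returns the status
// seen with the default header and with a custom one.
func demoUserAgent() (defaultStatus, customStatus int, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.UserAgent(), "Go-http-client") {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// The timeout bounds the whole exchange, so a silent server cannot hang us.
	client := &http.Client{Timeout: 5 * time.Second}
	if defaultStatus, err = fetchStatus(client, srv.URL, ""); err != nil {
		return 0, 0, err
	}
	if customStatus, err = fetchStatus(client, srv.URL, "Golang_Spider_Bot/3.0"); err != nil {
		return 0, 0, err
	}
	return defaultStatus, customStatus, nil
}

func main() {
	d, c, err := demoUserAgent()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("default User-Agent:", d, "custom User-Agent:", c)
}
```

Against a real site you would call fetchStatus with the site's URL; the test server is there only so the sketch runs without network access.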

How to know proxy used by http client in given request

I'm making some requests through proxy servers. The function that decides which proxy URL to use picks one at random from a list of proxies. For a given request, I would like to know which proxy URL is being used. As far as I know, when using a proxy server the HTTP headers remain the same, and it is the TCP headers that change.
Here's some code illustrating it (no error handling for simplicity):
func main() {
transport := &http.Transport{Proxy: chooseProxy}
client := http.Client{Transport: transport}
request, err := http.NewRequest(http.MethodGet, "https://www.google.com", nil)
checkErr(err)
// How to know here which proxy was used? Suppose the same client will perform several requests to different URL's.
response, err := client.Do(request)
checkErr(err)
dump, _ := httputil.DumpRequest(response.Request, false)
fmt.Println(string(dump))
}
func chooseProxy(request *http.Request) (*url.URL, error) {
proxies := []string{"proxy1", "proxy2", "proxy3"}
proxyToUse := proxies[rand.Intn(len(proxies))]
return url.Parse(proxyToUse)
}
I'm assuming that the Proxy function in the transport is called for each request even if the same client is used, as per the docs that say "Proxy specifies a function to return a proxy for a given Request". Am I right?
Some HTTP proxies add a Via header that tells who they are.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Via
You can modify your chooseProxy function so that it saves the proxy selected.
To do that, you can transform the chooseProxy func into a method of a type that will be used as storage for the information you want to keep:
type proxySelector string
func (sel *proxySelector) chooseProxy(request *http.Request) (*url.URL, error) {
proxies := []string{"proxy1", "proxy2", "proxy3"}
proxyToUse := proxies[rand.Intn(len(proxies))]
*sel = proxySelector(proxyToUse) // <-----
return url.Parse(proxyToUse)
}
func main() {
var proxy proxySelector
transport := &http.Transport{Proxy: proxy.chooseProxy} // <-----
client := http.Client{Transport: transport}
request, err := http.NewRequest(http.MethodGet, "https://www.google.com", nil)
checkErr(err)
// How to know here which proxy was used? Suppose the same client will perform several requests to different URL's.
response, err := client.Do(request)
checkErr(err)
dump, _ := httputil.DumpRequest(response.Request, false)
fmt.Println(string(dump))
fmt.Println("Proxy:", string(proxy)) // <-----
}
The request, which contains the target URI, is passed as the argument request to chooseProxy. So you already have the correct mapping inside your chooseProxy function; all you need to do is record proxyToUse against request.URL there.
If you don't trust that the code actually performs this mapping, you need to look outside the code. For example, you can inspect the actual network traffic with Wireshark to see which proxy gets accessed.

Is resp.Body.Close() required if I don't need response? [duplicate]

I'm making a request whose response I don't need. Would it cause any problems if I do it like this?
client = &http.Client{
Timeout: time.Duration(15 * time.Second),
}
...
...
_, err := client.Do(req)
Quoting from the doc of Client.Do()
If the returned error is nil, the Response will contain a non-nil Body which the user is expected to close. If the Body is not both read to EOF and closed, the Client's underlying RoundTripper (typically Transport) may not be able to re-use a persistent TCP connection to the server for a subsequent "keep-alive" request.
So yes, you always have to close it if there is no error. You are also expected to read the body to EOF before closing. Quoting from http.Response:
// The default HTTP client's Transport may not
// reuse HTTP/1.x "keep-alive" TCP connections if the Body is
// not read to completion and closed.
If you don't need the body, you may discard it like this:
resp, err := client.Do(req)
if err != nil {
// handle error and return
return
}
defer resp.Body.Close()
io.Copy(ioutil.Discard, resp.Body)
If there is an error, see related question: Do we need to close the response object if an error occurs while calling http.Get(url)?

Terminate http request from IP layer using golang

I am making an HTTP POST request to a server using Go. Suppose the server is currently turned off (meaning the machine on which the server runs is powered down). The request then gets stuck at the IP layer and never reaches the application layer, so my program cannot proceed. Is there any way in Go to stop this?
I am using the following code.
req, err := http.NewRequest("POST", url, bytes.NewReader(b))
if err != nil {
return errors.Wrap(err, "new request error")
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return errors.Wrap(err, "http request error")
}
defer resp.Body.Close()
Is there anything that can be added to this to terminate the request if it gets no answer at the IP layer?
The default http Client has no timeout. You can create an explicit http.Client yourself and set the timeout:
var cl = &http.Client{
Timeout: time.Second * 10,
}
resp, err := cl.Do(req)
if err != nil {
// err will be set on timeout
return errors.Wrap(err, "http request error")
}
defer resp.Body.Close()
If the server stops answering in the middle of a request, the call returns a timeout error that you can handle.
Use a non-default http.Transport with its DialContext field set to a function which uses a custom context with the properly configured timeout/deadline. Another option is to use a custom net.Dialer.
Something like this:
cli := http.Client{
Transport: &http.Transport{
DialContext: func (ctx context.Context, network, address string) (net.Conn, error) {
dialer := net.Dialer{
Timeout: 3 * time.Second,
}
return dialer.DialContext(ctx, network, address)
},
},
}
req, err := http.NewRequest(...)
resp, err := cli.Do(req)
Note that, as per the net.Dialer docs, the context passed to its DialContext might trump the timeout set on the dialer itself. This is exactly what we need: the dialer's Timeout field controls only the "dialing" (TCP connection establishment), while you might also arm your HTTP request with a context (using http.Request.WithContext) that controls the timeout of the whole request and lets you cancel it at any time (including during the dialing step).
Playground example.
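To illustrate the per-request context mentioned above, here is a minimal sketch. getWithDeadline and demoDeadline are made-up names, and the deliberately slow local test server exists only so the example can run without an unreachable host. It uses http.NewRequestWithContext (Go 1.13+), which is equivalent to building the request and then calling WithContext on it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// getWithDeadline performs a GET that is abandoned once timeout has elapsed,
// whether it is still dialing, sending, or waiting for response headers.
func getWithDeadline(client *http.Client, url string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// demoDeadline runs getWithDeadline against a deliberately slow local test
// server and reports whether the call ended with a deadline error.
func demoDeadline() bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // the client gave up
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()

	err := getWithDeadline(&http.Client{}, srv.URL, 100*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", demoDeadline())
}
```

The same context can be combined with the dialer timeout shown earlier: the dialer bounds connection establishment, and the context bounds the request as a whole.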
The Transport @kostix refers to is definitely what you're looking for in this case. Transports, like Clients, are safe for concurrent use. But please read the documentation for Transport (and I advise reading about Client as well), as there are a number of different ways to control timeouts and idle connections, not just the aforementioned DialContext.
For example, you may want to set ResponseHeaderTimeout:
ResponseHeaderTimeout, if non-zero, specifies the amount of
time to wait for a server's response headers after fully
writing the request (including its body, if any). This
time does not include the time to read the response body.
Or, if you are using a secure connection, you may want to set your TLSHandshakeTimeout:
TLSHandshakeTimeout specifies the maximum amount of time waiting to
wait for a TLS handshake. Zero means no timeout.
For readability and maintainability, I also suggest creating a function to build your Client, something along the lines of:
func buildClient(timeout time.Duration) *http.Client {
tr := &http.Transport{
IdleConnTimeout: timeout,
ResponseHeaderTimeout: timeout,
TLSHandshakeTimeout: timeout,
}
client := &http.Client{
Transport: tr,
Timeout: timeout,
}
return client
}

Empty HTTP Response Using http.Client.Do in Golang

I am using Go to make an HTTP GET request to an external web service. For some reason, the body of the response is always empty; the content length is always zero bytes. The response status code is always 200, however, and the call to Client.Do returns no error. The request requires an Authorization header, so I am using the http.NewRequest / http.Client.Do pattern to submit the request, as you'll see below. I have done requests similar to these in the past, but never using a GET that required a header. It seems unlikely that this is the cause, but I wonder if it may be related. If anyone can spot any potential issues with the pattern used or perhaps has had a similar experience, I'd really appreciate any help.
Thank you.
if req, err := http.NewRequest("GET", "https://api.molt.in/v1/orders/11111111/items", nil); err != nil {
return nil, err
} else {
client := &http.Client{}
req.Header.Add("Authorization", "secretToken")
if resp, err := client.Do(req); err != nil {
return nil, err
} else {
defer resp.Body.Close()
return readBody(resp.Body)
}
}
I finally discovered the source of the problem. It had nothing to do with the request being made, or the response being received. It had to do with the parsing of the response.
I was using bufio.NewScanner.Text to attempt to convert the response body into a string. Replacing this call with a call to ioutil.ReadAll produced the string that I originally expected.
Thanks for all of your help, and apologies for the misleading question.
