Dial tcp I/O timeout on simultaneous requests - http

I am building a tool in Go that needs to make a very large number of simultaneous HTTP requests to many different servers. My initial prototype in Python had no problem doing a few hundred simultaneous requests.
However, I have found that in Go this almost always results in a Get http://www.google.com: dial tcp 216.58.205.228:80: i/o timeout error for some of the requests once the number of simultaneous requests exceeds ~30-40.
I've tested on macOS, openSUSE, different hardware, in different networks and with different domain lists, and changing the DNS server as described in other Stack Overflow answers does not work either.
The interesting thing is that the failed requests do not even produce a packet, as can be seen when checking with Wireshark.
Is there anything that I am doing wrong or is that a bug in Go?
Minimum reproducible program below:
package main
import (
"fmt"
"net/http"
"sync"
)
func main() {
domains := []string{/* large domain list here, eg from https://moz.com/top500 */}
limiter := make(chan string, 50) // Limits simultaneous requests
wg := sync.WaitGroup{} // Needed to not prematurely exit before all requests have been finished
for i, domain := range domains {
wg.Add(1)
limiter <- domain
go func(i int, domain string) {
defer func() { <-limiter }()
defer wg.Done()
resp, err := http.Get("http://"+domain)
if err != nil {
fmt.Printf("%d %s failed: %s\n", i, domain, err)
return
}
fmt.Printf("%d %s: %s\n", i, domain, resp.Status)
}(i, domain)
}
wg.Wait()
}
Two particular error messages are happening, a net.DNSError that does not make any sense and a non-descript poll.TimeoutError:
&url.Error{Op:"Get", URL:"http://harvard.edu", Err:(*net.OpError)(0xc00022a460)}
&net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:net.Addr(nil), Err:(*net.DNSError)(0xc000aca200)}
&net.DNSError{Err:"no such host", Name:"harvard.edu", Server:"", IsTimeout:false, IsTemporary:false}
&url.Error{Op:"Get", URL:"http://latimes.com", Err:(*net.OpError)(0xc000d92730)}
&net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:net.Addr(nil), Err:(*poll.TimeoutError)(0x14779a0)}
&poll.TimeoutError{}
Update:
Running the requests with a separate http.Client as well as http.Transport and net.Dialer does not make any difference, as can be seen when running code from this playground.

I think many of your net.DNSErrors are actually too many open files errors in disguise. You can see this by running your sample code with the netgo tag (recommendation from here) (go run -tags netgo main.go) which will emit errors like:
…dial tcp: lookup buzzfeed.com on 192.168.1.1:53: dial udp 192.168.1.1:53: socket: too many open files
instead of
…dial tcp: lookup buzzfeed.com: no such host
Make sure you're closing the request's response body (resp.Body.Close()). You can find more about this specific problem at What's the best way to handle "too many open files"? and How to set ulimit -n from a golang program?. (On my machine (macOS), increasing file limits manually seemed to help, but I don't think it's a good solution since it doesn't really scale, and I'm not sure how many open files you'd need overall.)
As suggested by #liam-kelly, I think the i/o timeout error is coming from a DNS server or some other security mechanism. Setting a custom (bad) DNS server IP gives me the same error.

Related

Using Go standard libs, why do I leak TCP connections constantly in this two-tier architecture?

In this situation, I'm using all standard Go libraries -- net/http, most importantly.
The application consists of two layers. The first layer is the basic web application. The web application serves out the UI, and proxies a bunch of API calls back to the second layer based on username -- so, it's effectively a load balancer with consistent hashing -- each user is allocated to one of these second-layer nodes, and any requests pertaining to that user must be sent to that particular node.
Quick details
These API endpoints in the first layer effectively read in a JSON body, check the username, use that to figure out which of the layer 2 nodes to send the JSON body to, and then it sends it there. This is done using a global http.Client that has timeouts set on it, as appropriate.
The server side does a defer request.Body.Close() in each of the handlers after ensuring no error comes back from decoder.Decode(&obj) calls that unmarshal the JSON. If there is any codepath where that could happen, it isn't one that's likely to get followed very often.
Symptoms
On the node in the second layer (the application server) I get log lines like this because it's leaking sockets presumably and sucking up all the FDs:
2019/07/15 16:16:59 http: Accept error: accept tcp [::]:8100: accept4: too many open files; retrying in 1s
2019/07/15 16:17:00 http: Accept error: accept tcp [::]:8100: accept4: too many open files; retrying in 1s
And, when I do lsof 14k lines are output, of which 11,200 are TCP sockets. When I look into the contents of lsof, I see that nearly all these TCP sockets are in connection state CLOSE_WAIT, and are between my application server (second layer node) and the web server (the first layer node).
Interestingly, nothing seems to go wrong with the web application server (layer 1) during this timeframe.
Why does this happen?
I've seen lots of explanations, but most either point out that you need to specify custom defaults on a custom http.Client and not use the default, or they tell you to make sure to close the request bodies after reading from them in the layer 2 handlers.
Given all this information, does anyone have any idea what I can do to at least put this to bed once and for all? Everything I search on the internet is user error, and while I certainly hope that's the case here, I worry that I've nailed down every last quirk of the Go standard library I can find.
Been having trouble nailing down exactly how long it takes for this to happen -- the last time it happened, it was up for 3 days before I started to see this error, and at that point obviously nothing recovers until I kill and restart the process.
Any help would be hugely appreciated!
EDIT: example of client-side code
Here is an example of what I'm doing in the web application (layer 1) to call the layer 2 node:
var webHttpClient = &http.Client{
Transport: &http.Transport{
MaxIdleConnsPerHost: MaxIdleConnections,
},
Timeout: time.Second * 20,
}
// ...
uri := fmt.Sprintf("http://%s/%s", tsUri, "pms/all-venue-balances")
req, e := http.NewRequest("POST", uri, bytes.NewBuffer(b))
resp, err := webHttpClient.Do(req)
if err != nil {
log.Printf("Submit rebal error 3: %v\n", err)
w.WriteHeader(500)
return
}
defer resp.Body.Close()
body, _ := ioutil.ReadAll(resp.Body)
w.WriteHeader(200)
w.Write(body)

Check for Internet connection from application

I need to check if the user is connected to internet before I can proceed.
I am hitting the endpoint using HttpClient as follows:
client := &http.Client{}
req, _ := http.NewRequest("GET", url, nil)
req.SetBasicAuth(username, password)
res, err := client.Do(req)
if err != nil {
fmt.Println(err)
ui.Failed("Check your internet connection")
}
1) I need to show clear messages to the user if the user is not connected to the internet in this case, display "Please check your internet connection"
2) In case the server is not responding, and receives 504 bad gateway,
display "504 Bad gateway"
Can someone help how to proceed and distinguish between these two scenarios and I would like to display only simple messages and not the entire error messages received from the server.
Checking for an established Internet connection isn't as simple as making a single HTTP request to an arbitrary URL, like Ivan de la Beldad suggests. This can fail for any number of reasons, none of which will necessarily stop you from doing what you actually intend to do with the connection. To name a few:
clients3.google.com may be deliberately blocked by the local network (firewall, corporate or school proxy) or any en-route network (think Great Firewall of China)
clients3.google.com may be unreachable for some clients due to network outages
clients3.google.com itself may block the client for whatever reason (perhaps unintentionally)
clients3.google.com may have an outage
port 80 and 443 may work fine, but all other ports are blocked by shitty hotel/coffee shop wifi
shitty hotel/coffee shop wifi presents fake TLS certificates to clients, so HTTPS requests will fail in many cases
So instead of relying on a single arbitrary HTTP request it is much better to send some kind of liveliness probe to whatever service(s) you actually want to use.
If your app wants to communicate with an API, see if there is a health or status endpoint that you can call. If there isn't, look for some kind of cheap no-op. And try not to tell users to simply "check their Internet connection". Try to at least explain why your app concludes that there might be an issue: "We can't connect to Twitter right now. If you are connected to the Internet, try again in a few minutes" is much better.
On the off-chance that you really only want to check if the Internet itself is available, I would suggest making a DNS query to several DNS servers on the Internet. DNS is not likely to be blocked through local policies and cheaper than HTTP requests. Pick your DNS queries wisely and be prepared for NXDOMAIN responses.
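A rough sketch of the DNS idea, assuming Go's built-in resolver (PreferGo) pointed at explicitly chosen servers. The function names resolvesVia and online, the server addresses in main and the timeouts are my own picks for illustration. An NXDOMAIN answer is treated as success, because it still proves that the server was reached.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"time"
)

// resolvesVia reports whether the DNS server at addr (host:port) answered a
// query for name. A "no such host" answer still counts as an answer.
func resolvesVia(ctx context.Context, addr, name string) bool {
	r := &net.Resolver{
		PreferGo: true, // use Go's resolver so that Dial below is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			// Ignore the system-configured server and talk to addr instead.
			return d.DialContext(ctx, network, addr)
		},
	}
	_, err := r.LookupHost(ctx, name)
	if err == nil {
		return true
	}
	var dnsErr *net.DNSError
	return errors.As(err, &dnsErr) && dnsErr.IsNotFound
}

// online reports whether at least one of the given servers answered.
func online(servers []string, name string) bool {
	for _, s := range servers {
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		ok := resolvesVia(ctx, s, name)
		cancel()
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// Example public resolvers; swap in whichever servers suit your users.
	servers := []string{"8.8.8.8:53", "1.1.1.1:53", "9.9.9.9:53"}
	fmt.Println("online:", online(servers, "example.com"))
}
```

Even when this returns true, the service you actually care about can still be unreachable, so it complements the health-endpoint probe rather than replacing it.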
To check if you're connected to internet you can use this:
func connected() (ok bool) {
resp, err := http.Get("http://clients3.google.com/generate_204")
if err != nil {
return false
}
// Close the body so the underlying connection is not leaked.
resp.Body.Close()
return true
}
And to get the status code you can have it from res.StatusCode.
The final result would be something like that:
if !connected() {
ui.Failed("Check your internet connection")
}
client := &http.Client{}
req, _ := http.NewRequest("GET", url, nil)
req.SetBasicAuth(username, password)
res, _ := client.Do(req)
if res.StatusCode == 504 {
ui.Failed("504 Bad gateway")
}
(I'm ignoring other errors that obviously you should check)

How do I use nginx as a reverse proxy alongside Golang?

I want to use Golang as my server side language, but everything I've read points to nginx as the web server rather than relying on net/http (not that it's bad, but it just seems preferable overall, not the point of this post though).
I've found a few articles on using fastcgi with Golang, but I have no luck in finding anything on reverse proxies and HTTP and whatnot, other than this benchmark which doesn't go into enough detail unfortunately.
Are there any tutorials/guides available on how this operates?
For example, there is a big post on Stack Overflow detailing it with Node, but I cannot find a similar one for Go.
That's not needed at all anymore unless you're using nginx for caching; Golang 1.6+ is more than good enough to serve HTTP and HTTPS directly.
However if you're insisting, and I will secretly judge you and laugh at you, here's the work flow:
Your go app listens on a local port, say "127.0.0.1:8080"
nginx listens on 0.0.0.0:80 and 0.0.0.0:443 and proxies all requests to 127.0.0.1:8080.
Be judged.
The nginx setup in Node.js + Nginx - What now? is exactly the same setup you would use for Go, or any other standalone server for that matter that isn't cgi/fastcgi.
I use Nginx in production very effectively, using Unix sockets instead of TCP for the FastCGI connection. This code snippet comes from Manners, but you can adapt it for the normal Go api quite easily.
func isUnixNetwork(addr string) bool {
return strings.HasPrefix(addr, "/") || strings.HasPrefix(addr, ".")
}
func listenToUnix(bind string) (listener net.Listener, err error) {
_, err = os.Stat(bind)
if err == nil {
// socket exists and is "already in use";
// presume this is from earlier run and therefore delete it
err = os.Remove(bind)
if err != nil {
return
}
} else if !os.IsNotExist(err) {
return
}
listener, err = net.Listen("unix", bind)
return
}
func listen(bind string) (listener net.Listener, err error) {
if isUnixNetwork(bind) {
logger.Printf("Listening on unix socket %s\n", bind)
return listenToUnix(bind)
} else if strings.Contains(bind, ":") {
logger.Printf("Listening on tcp socket %s\n", bind)
return net.Listen("tcp", bind)
} else {
return nil, fmt.Errorf("error while parsing bind arg %v", bind)
}
}
Take a look around about line 252, which is where the switching happens between HTTP over a TCP connection and FastCGI over Unix-domain sockets.
With Unix sockets, you have to adjust your startup scripts to ensure that the sockets are created in an orderly way with the correct ownership and permissions. If you get that right, the rest is easy.
To answer other remarks about why you would want to use Nginx, it always depends on your use-case. I have Nginx-hosted static/PHP websites; it is convenient to use it as a reverse-proxy on the same server in such cases.

How to query TCP connection state in go?

On the client side of a TCP connection, I am attempting to reuse established connections as much as possible to avoid the overhead of dialing every time I need a connection. Fundamentally, it's connection pooling, although technically, my pool size just happens to be one.
I'm running into a problem in that if a connection sits idle for long enough, the other end disconnects. I've tried using something like the following to keep connections alive:
err = conn.(*net.TCPConn).SetKeepAlive(true)
if err != nil {
fmt.Println(err)
return
}
err = conn.(*net.TCPConn).SetKeepAlivePeriod(30*time.Second)
if err != nil {
fmt.Println(err)
return
}
But this isn't helping. In fact, it's causing my connections to close sooner. I'm pretty sure this is because (on a Mac) this means the connection health starts being probed after 30 seconds and is then probed 8 times at 30-second intervals. The server side must not be supporting keepalive, so after 4 minutes and 30 seconds, the client is disconnecting.
There might be nothing I can do to keep an idle connection alive indefinitely, and that would be absolutely ok if there were some way for me to at least detect that a connection has been closed so that I can seamlessly replace it with a new one. Alas, even after reading all the docs and scouring the blogosphere for help, I can't find any way at all in go to query the state of a TCP connection.
There must be a way. Does anyone have any insight into how that can be accomplished? Many thanks in advance to anyone who does!
EDIT:
Ideally, I'd like to learn how to handle this, low-level with pure go-- without using third-party libraries to accomplish this. Of course if there is some library that does this, I don't mind being pointed in its direction so I can see how they do it.
The socket API doesn't give you access to the state of the connection. You can query the current state in various ways from the kernel (/proc/net/tcp[6] on Linux, for example), but that doesn't guarantee that further sends will succeed.
I'm a little confused on one point here. My client is ONLY sending data. Apart from acking the packets, the server sends nothing back. Reading doesn't seem an appropriate way to determine connection status, as there's nothing TO read.
The socket API is defined such that you detect a closed connection by a read returning 0 bytes. That's the way it works. In Go, this is translated to a Read returning io.EOF. This will usually be the fastest way to detect a broken connection.
So am I supposed to just send and act on whatever errors occur? If so, that's a problem because I'm observing that I typically do not get any errors at all when attempting to send over a broken pipe-- which seems totally wrong
If you look closely at how TCP works, this is the expected behavior. If the connection is closed on the remote side, then your first send will trigger an RST from the server, fully closing the local connection. You either need to read from the connection to detect the close, or if you try to send again you will get an error (assuming you've waited long enough for the packets to make a round trip), like "broken pipe" on linux.
To clarify... I can dial, unplug an ethernet cable, and STILL send without error. The messages don't get through, obviously, but I receive no error
If the connection is actually broken, or the server is totally unresponsive, then you're sending packets off to nowhere. The TCP stack can't tell the difference between packets that are really slow, packet loss, congestion, or a broken connection. The system needs to wait for the retransmission timeout, and retry the packet a number of times before failing. The standard configuration for retries alone can take between 13 and 30 minutes to trigger an error.
What you can do in your code is
Turn on keepalive. This will notify you of a broken connection more quickly, because the idle connection is always being tested.
Read from the socket. Either have a concurrent Read in progress, or check for something to read first with select/poll/epoll (Go usually uses the first)
Set timeouts (deadlines in Go) for everything.
If you're not expecting any data from the connection, checking for a closed connection is very easy in Go; dispatch a goroutine to read from the connection until there's an error.
notify := make(chan error)
go func() {
buf := make([]byte, 1024)
for {
n, err := conn.Read(buf)
if err != nil {
notify <- err
return
}
if n > 0 {
fmt.Printf("unexpected data: %s\n", buf[:n])
}
}
}()
There is no such thing as 'TCP connection state', by design. There is only what happens when you send something. There is no TCP API, at any level down to the silicon, that will tell you the current state of a TCP connection. You have to try to use it.
If you're sending keepalive probes, the server doesn't have any choice but to respond appropriately. The server doesn't even know that they are keepalives. They aren't. They are just duplicate ACKs. Supporting keepalive just means supporting sending keepalives.

golang errors with bind address already in use even though nothing is running on the port

I have a setup in golang which basically gets a free port from the OS and then starts an HTTP server on it. It started to give random errors with port signup failures. I simplified it into the following program, which seems to error after grabbing a few free ports. It happens very randomly, and there is no real process running on the port it errors on. It doesn't make sense to me why this should fail. Any help would be appreciated.
Output of the program:
..
..
58479
..
..
58867
58868
58869
..
bound well! 58867
bound well! 58868
bound well! 58869
..
..
..
2015/04/28 09:05:09 Error while binding port: listen tcp :58479: bind: address already in use
I made sure to check that the free port that came out never repeated.
package main
import (
"net"
"net/http"
"log"
"fmt"
)
func main() {
for {
l, _ := net.Listen("tcp", ":0")
var port = l.Addr().String()[5:]
l.Close()
fmt.Println(port)
go func() {
l1, err := net.Listen("tcp", ":"+port)
if (err != nil) {
log.Fatal("Error while binding port: ", err.Error())
} else {
fmt.Println("bound well! ", port)
}
http.Serve(l1, nil)
}()
}
}
What you are doing is checking whether the port is free at one point in time and then trying to use it later on the assumption that it is still free. This is not going to work.
What happens: with every iteration of the for loop, you generate a port number and make sure it's free. Then you spawn a goroutine with the intention of using this port (which has already been released back to the pool of free ports). You don't really know when the goroutine kicks in. It might be activated while the main goroutine (the for loop) has just generated another free port – maybe the same one again? Or maybe another process has taken this port in the meantime. Essentially you can have a race condition on a single port.
After some more research:
There's a small caveat though. Two different sockets can be bound to the same ip+port as long as the local+remote pair is unique. I've once written a response about it. So when I've been creating listener with :0 I was able to get a "collision"; proved by netstat -an:
10.0.1.11.65245 *.* LISTEN
10.0.1.11.65245 17.143.164.101.5223 ESTABLISHED
Now, the thing is that if you want to explicitly bind the socket to the port being used, this is not possible. Probably because you would be able to specify the local address only, and the remote address wouldn't be known until the call to listen or connect (we're talking about syscalls now, not the Go interface). In other words, when you leave the port unspecified, the OS has a wider choice. So if it happens that you got a local address that is also being used by another socket, you're unable to bind to it manually.
How to solve it:
As I've mentioned in the comments, your server process should be using :0 notation in order to be able to choose available resource from OS. Once it's listening, the address should be announced to interested processes. You can do it, for example, through a file or a standard output.
Firstly I check the port:
$ lsof -i :8080
The results are:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
WeChat 1527 wonz 241u IPv4 0xbfe135d7b32e86f1 0t0 TCP wonzdembp:63849->116.128.133.101:http-alt (ESTABLISHED)
__debug_b 41009 wonz 4u IPv6 0xbfe135e149b1e659 0t0 TCP *:http-alt (LISTEN)
So I kill PID:
$ kill 41009
Then it works.
It is possible you were previously running or debugging an application on this port, and it did not shut down cleanly. The process might still be hanging out in your system's memory. Kill that process completely, and any other network daemons that might be lurking in the shadows, and try to run your application again.
If you haven't checked for this already, you can use (if using Linux) top, htop, or any GUI system monitor like Windows' Task Manager, Gnome3's System Monitor or KDE's KSysGuard to search for the offending process.
For an example, I have observed that Visual Studio Code's debugger/runner utility (F5/Ctrl+F5) does not always clean up the process, especially if you hit F5 too quickly and the old debugger did not shut down.
Use reuseport to effectively reuse the port when listening:
"github.com/libp2p/go-reuseport"
l, err := reuseport.Listen("tcp", ":"+strconv.Itoa(tcpPort))
instead of
l1, err := net.Listen("tcp", ":"+port)

Resources