Is ioutil.ReadAll blocking my server?

I'm trying to write a server in Go, using the net/http package. I only have one route, and it's pretty simple. It downloads a file from S3 and returns it to the client:
response, err := http.Get("some S3 url")
if err != nil {
	return
}
body, err := ioutil.ReadAll(response.Body)
w.Write(body)
Downloading the url myself takes about 0.25 seconds. So I start this server and send it 250 requests/sec. Initially I get responses back within 0.25 seconds. But that number keeps going up until it starts taking 45 seconds for a response. I'm running this on a 40 core machine, with GOMAXPROCS=40. I started wondering if somehow the downloads aren't happening in parallel.
But if I comment out this line:
body, err := ioutil.ReadAll(response.Body)
And just return some garbage data of equal length, suddenly my server consistently responds in 0.25 seconds. Why is it faster after removing the ReadAll?

A few things come to mind:
1. You're not closing response.Body, so the server is running out of file descriptors.
2. The garbage collector can't keep up, and you're running out of memory because ReadAll buffers every file in full.
3. You're choking the network because of #1.
Try something like this and see if it helps:
response, err := http.Get("some S3 url")
if err != nil {
	return
}
defer response.Body.Close()
_, err = io.Copy(w, response.Body)
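For context, here's a minimal sketch of the whole handler built around this fix; the route, port, and S3 URL are placeholders, not from the original post:
package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/file", func(w http.ResponseWriter, r *http.Request) {
		response, err := http.Get("https://example-bucket.s3.amazonaws.com/file") // placeholder URL
		if err != nil {
			http.Error(w, "upstream fetch failed", http.StatusBadGateway)
			return
		}
		// Always close the body so the connection (and its file descriptor) is released.
		defer response.Body.Close()

		// Stream straight to the client instead of buffering the whole
		// file in memory with ioutil.ReadAll.
		if _, err := io.Copy(w, response.Body); err != nil {
			log.Printf("copy failed: %v", err)
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
io.Copy moves the data in small chunks, so memory use stays flat regardless of the object's size, and the deferred Close keeps the file-descriptor count bounded under load.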

Related

Speed up reading HTTP response body for speed test

I'm writing a (hopefully zero-dependency) speed test in Go leveraging Netflix's fast.com servers.
The code is pulling down several pieces of 25MB content, reading the response into a buffer and counting the bytes read along the way.
The code (speed test) works as expected on my development computer, but when I run it on a much tinier Linux machine, the speed test caps out at measuring ~75Mbps (despite being hardwired into a network that reliably provides 400+Mbps).
I believe the issue must be that because the machine is small, it's relatively slow at either reading the response or writing into the buffer.
I did a Go trace of the program on the 2 machines, and sure enough the heap on the small Linux machine continually fills up before the GC clears it out; rinse and repeat.
The question is: what can I do about this to make my speed test accurate? More specifically, since I don't actually need the response data (because this is just a speed test), is there a way I can download and count the bytes from the HTTP response without actually bothering to write them anywhere, thus potentially saving time?
The relevant code is below. (Note: I'm using http.NewRequest because in some cases I build up URL params.)
client := &http.Client{}
req, err := http.NewRequest("GET", url, nil)
resp, err := client.Do(req)
defer resp.Body.Close()
buffer := make([]byte, 128*1024)
for {
	b, err := resp.Body.Read(buffer)
	if err == io.EOF {
		break
	}
	func() {
		mu.Lock()
		defer mu.Unlock()
		*bytesRead += b
		if *done {
			return
		}
	}()
}
Edit: I should also add that the Linux device has been tested and validated via other speed tests; it can achieve well over 75Mbps.
You can use the io.Discard writer to speed up the code. For example:
var bytesRead int64
client := &http.Client{}
req, err := http.NewRequest("GET", url, nil)
if err != nil {
	panic(err)
}
resp, err := client.Do(req)
if err != nil {
	panic(err)
}
defer resp.Body.Close()
nBytes, err := io.Copy(io.Discard, resp.Body)
if err != nil {
	panic(err)
}
bytesRead += nBytes
This way you don't need to iterate over a byte buffer yourself.
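If you do need a live byte count mid-download (as the mutex-guarded bytesRead in the question suggests), a small custom io.Writer keeps io.Copy's speed while still counting. A sketch, assuming another goroutine samples the counter; countingWriter is a made-up helper, not a stdlib type (imports: io, net/http, sync/atomic):
// countingWriter discards the data but atomically tracks how many
// bytes passed through, so progress can be read from another goroutine.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	atomic.AddInt64(&c.n, int64(len(p)))
	return len(p), nil
}

func measure(url string) (int64, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	cw := &countingWriter{}
	_, err = io.Copy(cw, resp.Body) // no per-chunk locking in the hot path
	return atomic.LoadInt64(&cw.n), err
}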

Why is there a 60 second delay on my HTTP POST request when using a Go HTTP client?

My goal is to scrape a website that requires me to log in first using HTTP requests in Golang. I actually succeeded by finding out I can send a post request to the website writing form-data into the body of the request. When I test this through an API development software I use called Postman, the response is instantaneous with no delays. However, when performing the request with an HTTP client in Go, there is a consistent 60 second delay every single time. I end up getting a logged in page, but for my program I need the response to be nearly instantaneous.
As you can see in my code, I've tried adding a bunch of headers to the request like "Connection", "Content-Type", "User-Agent" since I thought maaaaaybe the website can tell I'm requesting from a program and is forcing me to wait 60 seconds for a response. Adding these headers to make my request more legitimate(?) doesn't work at all.
Is the delay coming from Go's HTTP client being slow, or is there something wrong with how I'm forming my HTTP POST request? Also, was I on to something with my headers, and is the HTTP client rewriting them when they're sent out?
Here's my simple program...
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"net/http/cookiejar"
	"os"
)

func main() {
	url := "https://easypronunciation.com/en/log-in"
	method := "POST"
	payload := &bytes.Buffer{}
	writer := multipart.NewWriter(payload)
	_ = writer.WriteField("email", "foo@bar.com")
	_ = writer.WriteField("password", "*********")
	_ = writer.WriteField("persistent_login", "on")
	_ = writer.WriteField("submit", "")
	err := writer.Close()
	if err != nil {
		fmt.Println(err)
	}
	cookieJar, _ := cookiejar.New(nil)
	client := &http.Client{
		Jar: cookieJar,
	}
	req, err := http.NewRequest(method, url, payload)
	if err != nil {
		fmt.Println(err)
	}
	req.Header.Set("Content-Type", writer.FormDataContentType())
	req.Header.Set("Connection", "Keep-Alive")
	req.Header.Set("Accept-Language", "en-US")
	req.Header.Set("User-Agent", "Mozilla/5.0")
	res, err := client.Do(req)
	if err != nil {
		fmt.Println(err)
	}
	defer res.Body.Close()
	f, err := os.Create("response.html")
	defer f.Close()
	res.Write(f)
}
I doubt this is the Go client library. I would suggest printing out the latencies of the different components to see if/where the 60-second delay is. I would also try swapping in different URLs to compare.
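To break those latencies down, a sketch using the standard net/http/httptrace package (extra imports: crypto/tls, net/http/httptrace, time), slotted in just before client.Do(req):
start := time.Now()
trace := &httptrace.ClientTrace{
	DNSDone: func(httptrace.DNSDoneInfo) {
		fmt.Println("dns done:", time.Since(start))
	},
	ConnectDone: func(network, addr string, err error) {
		fmt.Println("connect done:", time.Since(start))
	},
	TLSHandshakeDone: func(tls.ConnectionState, error) {
		fmt.Println("tls done:", time.Since(start))
	},
	WroteRequest: func(httptrace.WroteRequestInfo) {
		fmt.Println("request written:", time.Since(start))
	},
	GotFirstResponseByte: func() {
		fmt.Println("first response byte:", time.Since(start))
	},
}
req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
res, err := client.Do(req)
Whichever gap dominates (for example, a long pause between "request written" and "first response byte") points at the dialing, the handshake, or the server's own processing.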

Terminate http request from IP layer using golang

I am making an HTTP POST request to a server using Go. Suppose the server is currently turned off (meaning the machine the server runs on is powered off); then the request gets stuck at the IP layer, and my program's execution cannot proceed. It never reaches the application layer. Is there any way in Go to stop this?
I am using the following code.
req, err := http.NewRequest("POST", url, bytes.NewReader(b))
if err != nil {
	return errors.Wrap(err, "new request error")
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
	return errors.Wrap(err, "http request error")
}
defer resp.Body.Close()
Is there anything that can be added to this to terminate the request if nothing comes back from the IP layer?
The default http Client has no timeout. You can create an explicit http.Client yourself and set the timeout:
var cl = &http.Client{
	Timeout: time.Second * 10,
}
resp, err := cl.Do(req)
if err != nil {
	// err will be set on timeout
	return errors.Wrap(err, "http request error")
}
defer resp.Body.Close()
If the server stops answering in the middle of a request, you can then handle the timeout.
Use a non-default http.Transport with its DialContext field set to a function which uses a custom context with the properly configured timeout/deadline. Another option is to use a custom net.Dialer.
Something like this:
cli := http.Client{
	Transport: &http.Transport{
		DialContext: func(ctx context.Context, network, address string) (net.Conn, error) {
			dialer := net.Dialer{
				Timeout: 3 * time.Second,
			}
			return dialer.DialContext(ctx, network, address)
		},
	},
}
req, err := http.NewRequest(...)
resp, err := cli.Do(req)
Note that, as per the net.Dialer docs, the context passed to its DialContext might trump the timeout set on the dialer itself. This is exactly what we need: the dialer's Timeout field controls only the "dialing" (TCP connection establishment), while you might also arm your HTTP request with a context (using http.Request.WithContext) controlling the timeout of the whole request, and be able to cancel it at any time (including the dialing step).
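For example, a sketch of arming the request with a 10-second overall deadline on top of the 3-second dial timeout above (assumes the context and time packages are imported):
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel() // releases the context's resources; calling it early cancels the request
req = req.WithContext(ctx)
resp, err := cli.Do(req) // returns a context deadline error after 10s at the latest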
The Transport @kostix refers to is definitely what you're looking for in this case. Transports, like Clients, are safe for concurrent use. But please read about the Transport (and I also advise reading about the Client), as there are a number of different ways to control how idle connections are handled, not just the aforementioned DialContext.
For example, you may want to set ResponseHeaderTimeout:
ResponseHeaderTimeout, if non-zero, specifies the amount of time to wait for a server's response headers after fully writing the request (including its body, if any). This time does not include the time to read the response body.
Or, if you are using a secure connection, you may want to set your TLSHandshakeTimeout:
TLSHandshakeTimeout specifies the maximum amount of time to wait for a TLS handshake. Zero means no timeout.
For readability and maintainability, I also suggest creating a function to build your Client, something along the lines of:
func buildClient(timeout time.Duration) *http.Client {
	tr := &http.Transport{
		IdleConnTimeout:       timeout,
		ResponseHeaderTimeout: timeout,
		TLSHandshakeTimeout:   timeout,
	}
	client := &http.Client{
		Transport: tr,
		Timeout:   timeout,
	}
	return client
}
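Usage is then a one-liner, for instance with a 10-second budget (the URL is a placeholder):
client := buildClient(10 * time.Second)
resp, err := client.Get("https://example.com/api")
Note that the Client's Timeout bounds the whole exchange end to end, while the Transport fields bound the individual stages.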

Why is my golang http server failing with "broken pipe" when the response exceeds 8KB?

I have a example web server below where if you call curl localhost:3000 -v then ^C (cancel) it immediately (before 1 second), it will report write tcp 127.0.0.1:3000->127.0.0.1:XXXXX: write: broken pipe.
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	log.Fatal(http.ListenAndServe(":3000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(1 * time.Second)
		// Why 8061 bytes? Because the response header on my computer
		// is 132 bytes, adding up the entire response to 8193 (1 byte
		// over 8kb)
		if _, err := w.Write(make([]byte, 8061)); err != nil {
			fmt.Println(err)
			return
		}
	})))
}
Based on my debugging, I have concluded that this only happens when the entire response writes more than 8192 bytes (8KB). If the entire response writes fewer than 8192 bytes, the broken pipe error is not returned.
My question is: where is this 8192-byte (8KB) buffer limit set? Is this a limit in Go's HTTP write buffer? Is it related to the response being chunked? Is it related only to the curl client or a browser client? How can I change this limit so I can have a bigger buffer written before the connection is closed (for debugging purposes)?
Thanks!
In net/http/server.go the output buffer is set to 4<<10, i.e. 4KB.
The reason you see the error at 8KB, is that it takes at least 2 writes to a socket to detect a closed remote connection. The first write succeeds, but the remote host sends an RST packet. The second write will be to a closed socket, which is what returns the broken pipe error.
Depending on the socket write buffer, and the connection latency, it's possible that even more writes could succeed before the first RST packet is registered.
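If you'd rather detect the disconnect before writing (instead of via the write error), note that since Go 1.8 the server cancels the request's context when the client goes away. A sketch of the question's handler with that check added:
http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	time.Sleep(1 * time.Second)
	// r.Context() is canceled by the server once the client disconnects,
	// so we can bail out without attempting the write at all.
	select {
	case <-r.Context().Done():
		fmt.Println("client went away:", r.Context().Err())
		return
	default:
	}
	if _, err := w.Write(make([]byte, 8061)); err != nil {
		fmt.Println(err)
		return
	}
})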
It is a broken pipe error, but on the reading side you should use ioutil.ReadAll for small response bodies and io.Copy for large ones.
For ioutil.ReadAll:
defer response.Body.Close()
body, err := ioutil.ReadAll(response.Body)
if err != nil {
	logger.Errorf(ctx, "err is %+v", err)
	return nil, err
}
For io.Copy:
// Preallocate a 10MB buffer.
buf := bytes.NewBuffer(make([]byte, 0, 10485760))
written, err := io.Copy(buf, response.Body)
if err != nil {
	return nil, err
}
// Read from buf.Bytes() rather than the original backing slice: if the
// response outgrows the preallocated capacity, bytes.Buffer reallocates
// and the old slice no longer holds the data.
body := buf.Bytes()[:written]

Program halts after successive timeout while performing GET request

I'm making a crawler that fetches HTML, CSS and JS pages. The crawler is a typical one, with 4 goroutines running concurrently to fetch the resources. For testing, I've been using 3 sites. The crawler works fine and shows the program-completion log when testing two of them.
On the 3rd website, however, too many timeouts happen while fetching CSS links. This eventually causes my program to stop. It fetches the links, but after 20+ successive timeouts the program stops logging. Basically it halts. I don't think it's a problem with the event log console.
Do I need to handle timeouts separately? I'm not posting the full code because it won't relate to the conceptual answer I'm seeking. However, the code goes something like this:
for {
	site, more := <-sites
	if more {
		url, err := url.Parse(site)
		if err != nil {
			continue
		}
		response, error := http.Get(url.String())
		if error != nil {
			fmt.Println("There was an error with Get request: ", error.Error())
			continue
		}
		// Crawl function
	}
}
The default behavior of the HTTP client is to block forever. Set a timeout when you create the client (http://godoc.org/net/http#Client):
func main() {
	client := http.Client{
		Timeout: time.Second * 30,
	}
	res, err := client.Get("http://www.google.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(res)
}
After 30 seconds Get will return an error.
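Applied to the loop from the question, it might look like the sketch below. Ranging over the channel also avoids spinning once sites is closed (the original if more never breaks out), and closing the body on every iteration lets connections be reused:
client := &http.Client{Timeout: time.Second * 30}
for site := range sites { // exits cleanly when the channel is closed
	u, err := url.Parse(site)
	if err != nil {
		continue
	}
	response, err := client.Get(u.String())
	if err != nil {
		fmt.Println("There was an error with Get request: ", err.Error())
		continue
	}
	// Crawl function
	response.Body.Close() // required for connection reuse
}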
