Speed up reading HTTP response body for speed test - http

I'm writing a (hopefully zero-dependency) speed test in Go leveraging Netflix's fast.com servers.
The code is pulling down several pieces of 25MB content, reading the response into a buffer and counting the bytes read along the way.
The code (speed test) works as expected on my development computer, but when I run it on a much tinier linux machine, the speed test caps out at measuring ~75Mbps (despite being hardwired into a network reliably providing 400+Mbps).
I believe the issue must be that because the machine is small, it's relatively slow at either reading the response or writing into the buffer.
I did a Go trace of the program on the 2 machines, and sure enough the Heap on the small linux machine continually gets full before GC clears it out; rinse and repeat.
The question is: what can I do about this to make my speed test accurate? More specifically, since I don't actually need the response data (because this is just a speed test), is there a way I can download and count the bytes from the HTTP response without actually bothering to write them anywhere, thus potentially saving time?
The relevant code is below. (Note: the reason I'm using http.NewRequest is because in some cases I build on URL params).
client := &http.Client{}
req, err := http.NewRequest("GET", url, nil)
resp, err := client.Do(req)
defer resp.Body.Close()
buffer := make([]byte, 128 * 1024)
for {
b, err := resp.Body.Read(buffer)
if err == io.EOF {
break
}
func() {
mu.Lock()
defer mu.Unlock()
*bytesRead += b
if *done {
return
}
}()
}
Edit: I should also add that the linux device has been tested and validated via other speed tests that it can achieve greater than 75Mbps.

You can speed this up by copying the body into io.Discard, a writer that throws the data away. io.Copy still returns the number of bytes it copied. For example:
var bytesRead int64
client := &http.Client{}
req, err := http.NewRequest("GET", url, nil)
if err != nil {
panic(err)
}
resp, err := client.Do(req)
if err != nil {
panic(err)
}
defer resp.Body.Close()
nBytes, err := io.Copy(io.Discard, resp.Body)
if err != nil {
panic(err)
}
bytesRead += nBytes
This way you don't need to manage a read buffer or write the read loop yourself.

Related

Golang HTTP Get Request Not Resolving for some URL

I was trying to build a website status checker. I found that a Go HTTP GET request never completes and hangs forever for some URLs, such as https://www.hetzner.com. The same URL works with curl.
Golang
No error is returned here. The call to http.Get just hangs.
func main() {
resp, err := http.Get("https://www.hetzner.com")
if err != nil {
fmt.Println("Error while retrieving site", err)
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
fmt.Println("Error while reading response body", err)
}
fmt.Println("RESPONSE", string(body))
}
CURL
I get the response while running following command.
curl https://www.hetzner.com
What may be the reason? And how do I resolve this issue from golang HTTP?
Your specific case can be fixed by setting the HTTP User-Agent header:
package main
import (
	"fmt"
	"io"
	"net/http"
)
func main() {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://www.hetzner.com", nil)
	if err != nil {
		fmt.Println("Error while building request", err)
		return
	}
	// Set an explicit User-Agent instead of Go's default one.
	req.Header.Set("User-Agent", "Golang_Spider_Bot/3.0")
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return // resp is nil here, so it must not be used
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}
Note: many other hosts will reject requests from your server because of security rules on their side. Common triggers:
An empty or bot-like User-Agent HTTP header.
The location of your IP address. For example, online shops in the USA don't need to handle requests from Russia.
The Autonomous System or CIDR of your provider. Some ASNs are completely blackholed because of the enormous amount of malicious activity from their residents.
Note 2: Many modern websites have DDoS protection or CDN systems in front of them. If Cloudflare protects your target website, your HTTP request may be blocked even though the status code is 200. To handle this, you need to build something that can render JavaScript-based websites and add some scripts to resolve a captcha.
Also, if you check a considerable number of websites in a short time, your DNS servers will block you, as they have built-in rate limits. In this case, you may want to take a look at massdns or similar solutions.

Why is there a 60 second delay on my HTTP POST request when using a Go HTTP client?

My goal is to scrape a website that requires me to log in first using HTTP requests in Golang. I actually succeeded by finding out I can send a post request to the website writing form-data into the body of the request. When I test this through an API development software I use called Postman, the response is instantaneous with no delays. However, when performing the request with an HTTP client in Go, there is a consistent 60 second delay every single time. I end up getting a logged in page, but for my program I need the response to be nearly instantaneous.
As you can see in my code, I've tried adding a bunch of headers to the request like "Connection", "Content-Type", "User-Agent" since I thought maaaaaybe the website can tell I'm requesting from a program and is forcing me to wait 60 seconds for a response. Adding these headers to make my request more legitimate(?) doesn't work at all.
Is the delay coming from Go's HTTP client being slow, or is there something wrong with how I'm forming my HTTP POST request? Also, was I on to something with my headers, and is the HTTP client rewriting them when it sends the request?
Here's my simple program...
package main
import (
"bytes"
"fmt"
"mime/multipart"
"net/http"
"net/http/cookiejar"
"os"
)
func main() {
url := "https://easypronunciation.com/en/log-in"
method := "POST"
payload := &bytes.Buffer{}
writer := multipart.NewWriter(payload)
_ = writer.WriteField("email", "foo#bar.com")
_ = writer.WriteField("password", "*********")
_ = writer.WriteField("persistent_login", "on")
_ = writer.WriteField("submit", "")
err := writer.Close()
if err != nil {
fmt.Println(err)
}
cookieJar, _ := cookiejar.New(nil)
client := &http.Client{
Jar: cookieJar,
}
req, err := http.NewRequest(method, url, payload)
if err != nil {
fmt.Println(err)
}
req.Header.Set("Content-Type", writer.FormDataContentType())
req.Header.Set("Connection", "Keep-Alive")
req.Header.Set("Accept-Language", "en-US")
req.Header.Set("User-Agent", "Mozilla/5.0")
res, err := client.Do(req)
if err != nil {
fmt.Println(err)
}
defer res.Body.Close()
f, err := os.Create("response.html")
defer f.Close()
res.Write(f)
}
I doubt this is the Go client library. I would suggest timing the individual phases of the request to see where the 60-second delay occurs. I would also try the same code against different URLs.

Golang http.Get too many redirects

I'm trying to download a file from the web. It should be a simple process, one that I've already done before. But this particular link (a 135 kB zip file) gives me an error message: Get "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip": stopped after 10 redirects. If I copy the link into the browser, the file downloads without any issues, but with the code below the error pops up.
package main
import (
"io"
"net/http"
"os"
)
func main() {
link := "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip"
resp, err := http.Get(link)
if err != nil {
panic(err)
}
defer resp.Body.Close()
// Create the file
out, err := os.Create("ms.zip")
if err != nil {
panic(err)
}
defer out.Close()
// Write the body to file
_, err = io.Copy(out, resp.Body)
if err != nil {
panic(err)
}
}
Any ideas on why this happens and how to get around it?
Thanks for the attention.
After investigating this URL, I see that it sets a cookie:
Set-Cookie: security=true; path=/
You can set the cookie manually, or give the client a CookieJar:
c := http.Client{}
req, err := http.NewRequest("GET", link, nil)
if err != nil {
panic(err)
}
req.AddCookie(&http.Cookie{Name: "security", Value: "true", Path: "/"})
resp, err := c.Do(req)
if err != nil {
panic(err)
}
Your code is fine. This kind of issue usually comes from the server you're downloading from rather than from Go.
You would have had the same issue with other tools or languages, because the host keeps redirecting you when the 'User-Agent' header is not one it accepts. Sites often do this when they want files to be downloadable only from browsers, not by crawlers, automated scripts, etc.
In Go, you can set the header with req.Header.Set("User-Agent", "<some-user-agent-value>") before sending the request: create the request, set the header, and execute it with an http.Client via client.Do(req).
For example:
link := "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_megase.zip"
req, err := http.NewRequest("GET", link, nil)
if err != nil {
panic(err)
}
// Doesn't even have to be a full, proper user agent string.
req.Header.Set("User-Agent", "Mozilla/4.0")
client := &http.Client{}
resp, err := client.Do(req)
You can read more in Go's net/http package docs, which state:
"For control over HTTP client headers, redirect policy, and other
settings, create a Client..."
See also the http.Request and http.Client docs.
More on this in general can be found in, e.g., Mozilla's HTTP docs, as well as many other good resources.
By the way, the zip archive you're trying to download seems to be invalid. :-)

is ioutil.ReadAll blocking my server?

I'm trying to write a server in Go, using the net/http package. I only have one route, and it's pretty simple. It downloads a file from S3 and returns it to the client:
response, err := http.Get("some S3 url")
if err != nil {
return
}
body, err := ioutil.ReadAll(response.Body)
w.Write(body)
Downloading the url myself takes about 0.25 seconds. So I start this server and send it 250 requests/sec. Initially I get responses back within 0.25 seconds. But that number keeps going up until it starts taking 45 seconds for a response. I'm running this on a 40 core machine, with GOMAXPROCS=40. I started wondering if somehow the downloads aren't happening in parallel.
But if I comment out this line:
body, err := ioutil.ReadAll(response.Body)
And just return some garbage data of equal length, suddenly my server consistently responds in 0.25 seconds. Why is it faster after removing the ReadAll?
A few things come to mind:
1. You're not closing response.Body, and the server is running out of file descriptors.
2. The garbage collector is slow and you're running out of memory, because ReadAll loads every file fully into memory.
3. You're choking the network because of #1.
Try something like this and see if it helps:
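response, err := http.Get("some S3 url")
if err != nil {
	return
}
// Close the body so the connection and its file descriptor are released.
defer response.Body.Close()
// Stream the body to the client instead of buffering it all in memory.
// err is already declared above, so assign with = rather than :=.
_, err = io.Copy(w, response.Body)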

Strange behaviour of golang UDP server

I wrote a simple UDP server in go.
When I do go run udp.go, it prints all the packets I send to it. But when I run go run udp.go > out, it stops writing stdout to the out file when the client stops.
The client is a simple program that sends 10k requests. So in the file I have around 50% of the sent packets. When I run the client again, the out file grows again until the client script finishes.
Server code:
package main
import (
"net"
"fmt"
)
func main() {
addr, _ := net.ResolveUDPAddr("udp", ":2000")
sock, _ := net.ListenUDP("udp", addr)
i := 0
for {
i++
buf := make([]byte, 1024)
rlen, _, err := sock.ReadFromUDP(buf)
if err != nil {
fmt.Println(err)
}
fmt.Println(string(buf[0:rlen]))
fmt.Println(i)
//go handlePacket(buf, rlen)
}
}
And here is the client code:
package main
import (
"net"
"fmt"
)
func main() {
num := 0
for i := 0; i < 100; i++ {
for j := 0; j < 100; j++ {
num++
con, _ := net.Dial("udp", "127.0.0.1:2000")
fmt.Println(num)
buf := []byte("bla bla bla I am the packet")
_, err := con.Write(buf)
if err != nil {
fmt.Println(err)
}
}
}
}
As you suspected, this looks like UDP packet loss, which is inherent to UDP. Because UDP is connectionless, the client doesn't care whether the server is available or ready to receive data. So if the server is busy processing, it can't pick up the next incoming datagram, and once the socket's receive buffer is full, further datagrams are dropped. You can check with netstat -su (the UDP statistics include receive errors). I ran into the same thing: the server (receive side) could not keep up with the packets being sent.
You can try two things (the second worked for me with your example):
Call SetReadBuffer. Ensure the receive socket has enough buffering to handle everything you throw at it.
sock, _ := net.ListenUDP("udp", addr)
sock.SetReadBuffer(1048576)
Do all packet processing in a goroutine. Try to increase the datagrams per second by ensuring the server isn't busy doing other work when you want it to be available to receive, i.e. move the processing work to a goroutine so you don't hold up ReadFromUDP().
// Reintroduce your go handlePacket(buf, rlen) call with a count param
func handlePacket(buf []byte, rlen int, count int) {
	fmt.Println(string(buf[0:rlen]))
	fmt.Println(count)
}
...
go handlePacket(buf, rlen, i)
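The two suggestions can be combined. The sketch below is one possible arrangement, not the answerer's exact code: instead of one goroutine per packet it uses a dedicated reader goroutine that hands packets to a channel. The serve and sendAndCount helpers, the loopback address with an OS-chosen port, and the one-second wait are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

// serve reads datagrams from conn as fast as possible and hands them to the
// packets channel, so slow processing does not hold up ReadFromUDP.
// It returns when conn is closed or a read fails.
func serve(conn *net.UDPConn, packets chan<- []byte) {
	defer close(packets)
	for {
		buf := make([]byte, 1024) // fresh buffer per packet, safe to hand off
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			return
		}
		packets <- buf[:n]
	}
}

// sendAndCount starts a receiver on a loopback port, sends n datagrams to it
// and reports how many arrived within one second.
func sendAndCount(n int) (int, error) {
	// Port 0 lets the OS pick a free port.
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	if err := conn.SetReadBuffer(1 << 20); err != nil {
		log.Printf("SetReadBuffer: %v", err) // not fatal, the default size is used
	}
	packets := make(chan []byte, n)
	go serve(conn, packets)

	client, err := net.Dial("udp", conn.LocalAddr().String())
	if err != nil {
		return 0, err
	}
	defer client.Close()
	for i := 0; i < n; i++ {
		if _, err := client.Write([]byte("bla bla bla I am the packet")); err != nil {
			return 0, err
		}
	}

	received := 0
	deadline := time.After(time.Second)
	for received < n {
		select {
		case _, ok := <-packets:
			if !ok {
				return received, nil // reader stopped early
			}
			received++
		case <-deadline:
			return received, nil // the rest were lost or are late
		}
	}
	return received, nil
}

func main() {
	got, err := sendAndCount(1000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d of 1000 datagrams\n", got)
}
```

Unlike the question's client, this sketch dials once and reuses the connection. The number received can still be lower than the number sent, since UDP gives no delivery guarantee.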
One final option, probably not what you want: put a sleep in your client. That slows down the send rate and also removes the problem, e.g.
buf := []byte("bla bla bla I am the packet")
time.Sleep(100 * time.Millisecond)
_, err := con.Write(buf)
Try syncing stdout after the write statements.
os.Stdout.Sync()
