I'm making a crawler that fetches HTML, CSS, and JS pages. It's a typical crawler with 4 goroutines running concurrently to fetch the resources. For study purposes, I've been using 3 test sites. The crawler works fine and prints the program-completion log when testing two of them.
On the 3rd website, however, there are many timeouts while fetching CSS links, and this eventually causes my program to stop. It fetches the links, but after 20+ successive timeouts the program stops printing logs; basically it halts. I don't think it's a problem with the event log console.
Do I need to handle timeouts separately? I'm not posting the full code because it isn't relevant to the conceptual answer I'm looking for. The code goes something like this:
for {
    site, more := <-sites
    if more {
        url, err := url.Parse(site)
        if err != nil {
            continue
        }
        response, error := http.Get(url.String())
        if error != nil {
            fmt.Println("There was an error with Get request: ", error.Error())
            continue
        }
        // Crawl function
    }
}
The default behavior of the http client is to block forever (there is no timeout). Set a timeout when you create the client (http://godoc.org/net/http#Client):
package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    client := http.Client{
        Timeout: time.Second * 30,
    }
    res, err := client.Get("http://www.google.com")
    if err != nil {
        panic(err)
    }
    fmt.Println(res)
}
After 30 seconds Get will return an error.
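Applied to the loop from the question, a minimal sketch could look like the following (the sites channel is from your code; crawl is an assumed stand-in for your crawl function):
client := &http.Client{Timeout: 30 * time.Second}

for site := range sites {
    u, err := url.Parse(site)
    if err != nil {
        continue
    }
    response, err := client.Get(u.String()) // returns an error after 30s instead of blocking
    if err != nil {
        fmt.Println("There was an error with Get request: ", err.Error())
        continue // treat a timeout like any other fetch error: log it and move on
    }
    crawl(response)       // assumed crawl function, as in the question
    response.Body.Close() // close the body so the connection can be reused
}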
I recently ran into a problem while developing a high-concurrency HTTP client with valyala/fasthttp: the client works fine for the first ~15K requests, but after that more and more "dial tcp4 127.0.0.1:80: i/o timeout" and "dialing to the given TCP address timed out" errors occur.
Sample Code
var Finished = 0
var Failed = 0
var Success = 0

func main() {
    for i := 0; i < 1000; i++ {
        go get()
    }
    start := time.Now().Unix()
    for {
        fmt.Printf("Rate: %.2f/s Success: %d, Failed: %d\n", float64(Success)/float64(time.Now().Unix()-start), Success, Failed)
        time.Sleep(100 * time.Millisecond)
    }
}

func get() {
    ticker := time.NewTicker(time.Duration(100+rand.Intn(2900)) * time.Millisecond)
    defer ticker.Stop()
    client := &fasthttp.Client{
        MaxConnsPerHost: 10000,
    }
    for {
        req := &fasthttp.Request{}
        req.SetRequestURI("http://127.0.0.1:80/require?number=10")
        req.Header.SetMethod(fasthttp.MethodGet)
        req.Header.SetConnectionClose()
        res := &fasthttp.Response{}
        err := client.DoTimeout(req, res, 5*time.Second)
        if err != nil {
            fmt.Println(err.Error())
            Failed++
        } else {
            Success++
        }
        Finished++
        client.CloseIdleConnections()
        <-ticker.C
    }
}
Detail
The server is built on labstack/echo/v4. When the client gets a timeout error, the server doesn't log any error, and performing the same request manually via Postman or a browser like Chrome works fine.
The client runs fine for the first ~15K requests, but after that more and more timeout errors occur and the output Rate decreases. I searched Google and GitHub and found that this issue may be the most relevant one, but I didn't find a solution there.
Another tiny problem...
As you may notice, when the client starts it first generates a few "the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection" errors, then works fine until around the 15K mark, and then starts generating more and more timeout errors. Why would it generate the connection-closed errors at the beginning?
Machine Info
Macbook Pro 14 2021 (Apple M1 Pro) with 16GB Ram and running macOS Monterey 12.4
So basically, if you open a connection and then close it as soon as possible, it doesn't work like "connection #1 uses a port and then immediately hands it back": there is more processing to be done, and closed TCP connections linger (e.g. in TIME_WAIT) before their ports become reusable. So if you want to send many requests at the same time, I think it's better to reuse connections as much as you can.
For example, in fasthttp:
req := fasthttp.AcquireRequest()
res := fasthttp.AcquireResponse()
defer fasthttp.ReleaseRequest(req)
defer fasthttp.ReleaseResponse(res)
// Then do the request below
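As a rough sketch of how the loop in get() might be reworked to reuse connections: one shared client, pooled request/response objects, and no Connection: close header (the URL and timeout are carried over from the question's code; this is an illustration, not a definitive fix):
// One client shared by every goroutine, so its connection pool is actually reused.
var client = &fasthttp.Client{
    MaxConnsPerHost: 1000,
}

func get() {
    for {
        req := fasthttp.AcquireRequest()
        res := fasthttp.AcquireResponse()

        req.SetRequestURI("http://127.0.0.1:80/require?number=10")
        req.Header.SetMethod(fasthttp.MethodGet)
        // No SetConnectionClose(): keep-alive lets fasthttp reuse the TCP connection.

        if err := client.DoTimeout(req, res, 5*time.Second); err != nil {
            fmt.Println(err.Error())
        }

        fasthttp.ReleaseRequest(req)
        fasthttp.ReleaseResponse(res)
    }
}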
I have an API that receives a CSV file to process. I'd like to be able to send back a 202 Accepted (or any status, really) while processing the file in the background. I have a handler that checks the request, writes the success header, and then continues processing via a producer/consumer pattern. The problem is that, due to the WaitGroup.Wait() calls, the Accepted header isn't sent back. The errors from the handler validation are sent back correctly, but that's because of the return statements.
Is it possible to send that 202 Accepted back with the wait groups as I'm hoping (and if so, what am I missing)?
func SomeHandler(w http.ResponseWriter, req *http.Request) {
    endAccepted := time.Now()
    err := verifyRequest(req)
    if err != nil {
        w.WriteHeader(http.StatusBadRequest)
        data := JSONErrors{Errors: []string{err.Error()}}
        json.NewEncoder(w).Encode(data)
        return
    }
    // ...FILE RETRIEVAL CLIPPED (not relevant)...
    // e.g. csvFile, openErr := os.Open(tmpFile.Name())

    //////////////////////////////////////////////////////
    // TODO this isn't sending due to the WaitGroup.Wait()s below
    w.WriteHeader(http.StatusAccepted)
    //////////////////////////////////////////////////////

    // START PRODUCER/CONSUMER
    jobs := make(chan *Job, 100)    // buffered channel
    results := make(chan *Job, 100) // buffered channel

    // start consumers
    for i := 0; i < 5; i++ { // 5 consumers
        wg.Add(1)
        go consume(i, jobs, results)
    }
    // start producing
    go produce(jobs, csvFile)
    // start processing
    wg2.Add(1)
    go process(results)

    wg.Wait() // wait for all workers to finish processing jobs
    close(results)
    wg2.Wait() // wait for process to finish
    log.Println("===> Done Processing.")
}
You're doing all the processing in the background, but you're still waiting for it to finish. The solution is simply to not wait. The cleanest approach would be to move all of the handling into a separate function you can call with go to run it in the background, but the simplest fix, leaving it inline, is:
w.WriteHeader(http.StatusAccepted)

go func() {
    // START PRODUCER/CONSUMER
    jobs := make(chan *Job, 100)    // buffered channel
    results := make(chan *Job, 100) // buffered channel

    // start consumers
    for i := 0; i < 5; i++ { // 5 consumers
        wg.Add(1)
        go consume(i, jobs, results)
    }
    // start producing
    go produce(jobs, csvFile)
    // start processing
    wg2.Add(1)
    go process(results)

    wg.Wait() // wait for all workers to finish processing jobs
    close(results)
    wg2.Wait() // wait for process to finish
    log.Println("===> Done Processing.")
}()
Note that you elided the CSV file handling, so you'll need to ensure it's safe to use this way (i.e. that you haven't deferred closing or deleting the file in the handler, which would cause that to happen as soon as the handler returns, while the background goroutine is still reading it).
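A minimal sketch of that idea, assuming the file comes from os.Open as in the clipped snippet (tmpFile and csvFile are placeholders for your own names), is to close and, if needed, remove the file inside the goroutine rather than deferring it in the handler:
csvFile, openErr := os.Open(tmpFile.Name()) // from the clipped section; names are placeholders
if openErr != nil {
    w.WriteHeader(http.StatusInternalServerError)
    return
}
// Note: no `defer csvFile.Close()` in the handler; it would fire before the goroutine is done.

w.WriteHeader(http.StatusAccepted)

go func() {
    defer csvFile.Close()           // clean up when the background work finishes
    defer os.Remove(tmpFile.Name()) // only if the temp file should be removed afterwards
    // ... producer/consumer pipeline from above ...
}()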
I have a Go app running on a GCloud Compute Engine instance with Ubuntu 20.04 and whatever the latest version of Go is for 20.04.
The app relies heavily on a Firestore snapshot listener. After a random period of time (hours to a week), the listener just stops working. No errors, nothing. It works flawlessly until then. I can even update Firestore myself and it still doesn't see the change. Restarting the app fixes it.
I've reported this possible bug to Firebase but haven't heard back yet. Does anyone see anything wrong with this code that might cause that?
I don't think there are any coding mistakes here; if something looks off, it's probably because I anonymized the code a bit.
func UsersSnapShots() {
    ctx := context.Background()
    iter := usersdb.Where("connected", "==", 1).Where("status", "==", 1).OrderBy("name", firestore.Asc).Snapshots(ctx)
    defer iter.Stop()
    for {
        next, err := iter.Next()
        log.Printf("User change detected")
        users = make(map[string]interface{})
        if err == iterator.Done {
            log.Printf("users done. That shouldn't happen! Run this function again.")
            go UsersSnapShots()
            break
        }
        if err != nil {
            sendOnlineUsers()
            log.Printf("USERS SNAPS ERROR: %v", err)
            go UsersSnapShots()
            return
        }
        doc := next.Documents
        for {
            docIter, err := doc.Next()
            if err == iterator.Done {
                break
            }
            t := docIter.Data()
            temp := []string{}
            // clients is a map of my websocket clients.
            safeClients := clients // maps are not thread safe. So let's make a copy.
            // Add the websocket IDs for these certain users to the map.
            for k, v := range safeClients {
                if v.email == t["email"] {
                    temp = append(temp, k)
                }
            }
            t["sockets"] = temp
            users[t["email"].(string)] = t
        }
        sendOnlineUsers() // Send all the logged in users to all the users' browsers.
        //log.Printf("users: %v", users)
    }
}
I'm trying to write a server in Go, using the net/http package. I only have one route, and it's pretty simple. It downloads a file from S3 and returns it to the client:
response, err := http.Get("some S3 url")
if err != nil {
return
}
body, err := ioutil.ReadAll(response.Body)
w.Write(body)
Downloading the URL myself takes about 0.25 seconds. So I start this server and send it 250 requests/sec. Initially I get responses back within 0.25 seconds, but that number keeps climbing until responses take 45 seconds. I'm running this on a 40-core machine with GOMAXPROCS=40. I started wondering whether the downloads aren't actually happening in parallel.
But if I comment out this line:
body, err := ioutil.ReadAll(response.Body)
and just return garbage data of equal length instead, my server suddenly responds consistently in 0.25 seconds. Why is it faster after removing the ReadAll?
A few things come to mind:
1. You're not closing response.Body, so the server is running out of file descriptors.
2. The garbage collector is falling behind and you're running out of memory, because ReadAll buffers each file entirely in memory.
3. You're choking the network because of #1.
Try something like this and see if it helps:
response, err := http.Get("some S3 url")
if err != nil {
    return
}
defer response.Body.Close()

_, err = io.Copy(w, response.Body) // stream to the client instead of buffering with ReadAll
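Put together as a complete handler, that might look something like the sketch below (fileHandler is a hypothetical name and the S3 URL is a placeholder; forwarding the upstream content type and status is optional):
func fileHandler(w http.ResponseWriter, r *http.Request) { // hypothetical handler name
    response, err := http.Get("some S3 url")
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadGateway)
        return
    }
    defer response.Body.Close() // frees the connection/FD when the handler returns

    // Optionally pass the upstream content type and status through.
    w.Header().Set("Content-Type", response.Header.Get("Content-Type"))
    w.WriteHeader(response.StatusCode)

    // Stream to the client instead of buffering the whole file in memory.
    if _, err := io.Copy(w, response.Body); err != nil {
        log.Printf("copy error: %v", err)
    }
}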
In Go, a TCP connection (net.Conn) is an io.ReadWriteCloser. I'd like to test my network code by simulating a TCP connection. I have two requirements:
the data to be read is stored in a string
whenever data is written, I'd like it to be stored in some kind of buffer which I can access later
Is there a data structure for this, or an easy way to make one?
No idea if this existed when the question was asked, but you probably want net.Pipe(), which provides two full-duplex net.Conn instances that are linked to each other.
EDIT: I've rolled this answer into a package which makes things a bit simpler - see here: https://github.com/jordwest/mock-conn
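As a rough illustration of the net.Pipe() approach inside a test (serveEcho is a stand-in for whatever connection handler you want to test):
func TestServeEcho(t *testing.T) {
    serverConn, clientConn := net.Pipe() // two linked in-memory net.Conn ends
    defer clientConn.Close()

    go serveEcho(serverConn) // hand one end to the code under test

    // Act as the client on the other end.
    fmt.Fprintf(clientConn, "ping\n")
    reply, err := bufio.NewReader(clientConn).ReadString('\n')
    if err != nil {
        t.Fatal(err)
    }
    if reply != "ping\n" {
        t.Errorf("unexpected reply: %q", reply)
    }
}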
While Ivan's solution will work for simple cases, keep in mind that a real TCP connection is actually two buffers, or rather pipes. For example:
Server | Client
---------+---------
reads <=== writes
writes ===> reads
If you use a single buffer that the server both reads from and writes to, you could end up with the server talking to itself.
Here's a solution that allows you to pass a MockConn type as a ReadWriteCloser to the server. The Read, Write and Close functions simply proxy through to the functions on the server's end of the pipes.
type MockConn struct {
    ServerReader *io.PipeReader
    ServerWriter *io.PipeWriter
    ClientReader *io.PipeReader
    ClientWriter *io.PipeWriter
}

func (c MockConn) Close() error {
    if err := c.ServerWriter.Close(); err != nil {
        return err
    }
    if err := c.ServerReader.Close(); err != nil {
        return err
    }
    return nil
}

func (c MockConn) Read(data []byte) (n int, err error)  { return c.ServerReader.Read(data) }
func (c MockConn) Write(data []byte) (n int, err error) { return c.ServerWriter.Write(data) }

func NewMockConn() MockConn {
    serverRead, clientWrite := io.Pipe()
    clientRead, serverWrite := io.Pipe()
    return MockConn{
        ServerReader: serverRead,
        ServerWriter: serverWrite,
        ClientReader: clientRead,
        ClientWriter: clientWrite,
    }
}
When mocking a 'server' connection, simply pass the MockConn wherever you would use the net.Conn (it obviously implements only the ReadWriteCloser interface; you could easily add dummy methods for LocalAddr() etc. if you need to support the full net.Conn interface).
In your tests you can act as the client by reading and writing to the ClientReader and ClientWriter fields as needed:
func TestTalkToServer(t *testing.T) {
    /*
     * Assumes that NewMockConn has already been called and
     * the server is waiting for incoming data
     */
    // Send a message to the server
    fmt.Fprintf(mockConn.ClientWriter, "Hello from client!\n")

    // Wait for the response from the server
    rd := bufio.NewReader(mockConn.ClientReader)
    line, err := rd.ReadString('\n')
    if err != nil {
        t.Fatal(err)
    }
    if line != "Hello from server!\n" { // ReadString keeps the trailing newline
        t.Errorf("Server response not as expected: %s\n", line)
    }
}
Why not use bytes.Buffer? It's an io.ReadWriter and has a String method to get the stored data. If you need to make it an io.ReadWriteCloser, you could define your own type:
type CloseableBuffer struct {
    bytes.Buffer
}
and define a Close method:
func (b *CloseableBuffer) Close() error {
    return nil
}
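A hedged usage sketch (handleConn is an assumed stand-in for the code under test):
buf := &CloseableBuffer{}
buf.WriteString("data for the code under test to read\n")

handleConn(buf) // any function taking an io.ReadWriteCloser

fmt.Println(buf.String()) // the unread data plus anything the code wrote back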
In the majority of cases you do not need to mock net.Conn.
You only have to mock things that add time to your tests, prevent tests from running in parallel (shared resources like a hardcoded file name), or can lead to outages (you can potentially exhaust the connection limit or ports, but in most cases that is not a concern when you run your tests in isolation).
Not mocking has the advantage of testing what you actually want to test, against the real thing.
https://www.accenture.com/us-en/blogs/software-engineering-blog/to-mock-or-not-to-mock-is-that-even-a-question
Instead of mocking net.Conn, you can write a mock server, run it in a goroutine in your test, and connect to it using a real net.Conn.
A quick and dirty example:
port := someRandomPort()
srv := &http.Server{Addr: port}
http.HandleFunc("/hello", myHandleFunc)

go func() {
    srv.ListenAndServe()
}()

myTestCodeUsingConn(port)
srv.Shutdown(context.TODO())
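For an HTTP server specifically, the standard library's net/http/httptest package already handles picking a free port and cleaning up; a rough sketch, run inside a test (the handler body is just an illustration):
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "hello")
}))
defer srv.Close()

// srv.URL points at a real listening server, e.g. http://127.0.0.1:54321
res, err := http.Get(srv.URL + "/hello")
if err != nil {
    t.Fatal(err)
}
defer res.Body.Close()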