Respond to HTTP request while processing in the background

I have an API that receives a CSV file to process. I'd like to be able to send back a 202 Accepted (or any status really) while processing the file in the background. I have a handler that checks the request, writes the success header, and then continues processing via a producer/consumer pattern. The problem is that, due to the WaitGroup.Wait() calls, the accepted header isn't being sent back. The errors from the handler validation are sent back correctly, but that's because of the return statements.
Is it possible to send that 202 Accepted back with the wait groups as I'm hoping (and if so, what am I missing)?
func SomeHandler(w http.ResponseWriter, req *http.Request) {
endAccepted := time.Now()
err := verifyRequest(req)
if err != nil {
w.WriteHeader(http.StatusBadRequest)
data := JSONErrors{Errors: []string{err.Error()}}
json.NewEncoder(w).Encode(data)
return
}
// ...FILE RETRIEVAL CLIPPED (not relevant)...
// e.g. csvFile, openErr := os.Open(tmpFile.Name())
//////////////////////////////////////////////////////
// TODO this isn't sending due to the WaitGroup.Wait()s below
w.WriteHeader(http.StatusAccepted)
//////////////////////////////////////////////////////
// START PRODUCER/CONSUMER
jobs := make(chan *Job, 100) // buffered channel
results := make(chan *Job, 100) // buffered channel
// start consumers
for i := 0; i < 5; i++ { // 5 consumers
wg.Add(1)
go consume(i, jobs, results)
}
// start producing
go produce(jobs, csvFile)
// start processing
wg2.Add(1)
go process(results)
wg.Wait() // wait for all workers to finish processing jobs
close(results)
wg2.Wait() // wait for process to finish
log.Println("===> Done Processing.")
}

You're doing all the processing in the background, but the handler still waits for it to finish, and the response isn't completed until the handler returns. The solution is to just not wait. The best solution would be to move all of the processing into a separate function that you can call with go to run it in the background, but the simplest solution, leaving it inline, would be:
w.WriteHeader(http.StatusAccepted)
go func() {
// START PRODUCER/CONSUMER
jobs := make(chan *Job, 100) // buffered channel
results := make(chan *Job, 100) // buffered channel
// start consumers
for i := 0; i < 5; i++ { // 5 consumers
wg.Add(1)
go consume(i, jobs, results)
}
// start producing
go produce(jobs, csvFile)
// start processing
wg2.Add(1)
go process(results)
wg.Wait() // wait for all workers to finish processing jobs
close(results)
wg2.Wait() // wait for process to finish
log.Println("===> Done Processing.")
}()
Note that you elided the CSV file handling, so you'll need to ensure that it's safe to use this way (i.e. that you haven't deferred closing or deleting the file, which would cause that to occur as soon as the handler returns).

Related

How to interrupt an HTTP handler?

Say I have a http handler like this:
func ReallyLongFunction(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Hello World!")
// run code that takes a long time here
// Executing dd command with cmd.Exec..., etc.
}
Is there a way I can interrupt this function if the user refreshes the page or kills the request some other way without running the subsequent code and how would I do it?
I tried doing this:
notify := r.Context().Done()
go func() {
<-notify
println("Client closed the connection")
s.downloadCleanup()
return
}()
but whenever I interrupt the request, the code after it still runs anyway.
There's no way to forcibly tear a goroutine down from any code external to that goroutine.
Hence the only way to actually interrupt processing is to periodically check whether the client is gone (or whether there's another signal to stop processing).
Basically that would amount to structuring your handler something like this
func ReallyLongFunction(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Hello World!")
done := r.Context().Done()
// check whether we're done
// do some small piece of stuff
// check whether we're done
// do another small piece of stuff
// …rinse, repeat
}
Now, a way to check whether something was written to a channel without blocking is to use the "select with default" idiom:
select {
case <- done:
// We're done
default:
}
This statement executes the code in the "// We're done" block if and only if done was written to or was closed (which is the case with contexts); otherwise the empty block in the default branch is executed.
So we can refactor that to something like
func ReallyLongFunction(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Hello World!")
done := r.Context().Done()
closed := func () bool {
select {
case <- done:
return true
default:
return false
}
}
if closed() {
return
}
// do some small piece of stuff
if closed() {
return
}
// do another small piece of stuff
// …rinse, repeat
}
Stopping an external process started in an HTTP handler
To address the OP's comment…
The os/exec.Cmd type has the Process field, which is of type *os.Process, and that type supports the Kill method, which forcibly brings the running process down.
The only problem is that exec.Cmd.Run blocks until the process exits, so the goroutine which is executing it cannot execute other code, and if exec.Cmd.Run is called in an HTTP handler, there's no way to cancel it.
How to best handle running a program in such an asynchronous manner heavily depends on how the process itself is organized but I'd roll like this:
1) In the handler, prepare the process and then start it using exec.Cmd.Start (as opposed to Run).
2) Check the error value Start has returned: if it's nil, the process has managed to start OK. Otherwise, somehow communicate the failure to the client and quit the handler.
3) Once the process is known to have started, the exec.Cmd value has some of its fields populated with process-related information; of particular interest is the Process field, which is of type *os.Process: that type has the Kill method, which may be used to forcibly bring the process down.
4) Start a goroutine and pass it that exec.Cmd value and a channel of some suitable type (see below). That goroutine should call Wait on it and, once it returns, communicate that fact back to the originating goroutine over that channel. Exactly what to communicate is an open question, as it depends on whether you want to collect what the process wrote to its standard output and error streams and/or maybe some other data related to the process's activity. After sending the data, that goroutine exits.
5) The main goroutine (executing the handler) should just call exec.Cmd.Process.Kill when it detects the handler should terminate. Killing the process eventually unblocks the goroutine which is executing Wait on that same exec.Cmd value as the process exits.
6) After killing the process, the handler goroutine waits on the channel to hear back from the goroutine watching the process. The handler does something with that data (maybe logs it or whatever) and exits.
You should cancel the goroutine from inside: for a long calculation task, you can add checkpoints at which the code stops and checks for cancellation.
Here is tested code for a server that has a long calculation task with checkpoints for cancellation:
package main
import (
"fmt"
"io"
"log"
"net/http"
"time"
)
func main() {
http.HandleFunc(`/`, func(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
log.Println("wait a couple of seconds ...")
for i := 0; i < 10; i++ { // long calculation
select {
case <-ctx.Done():
log.Println("Client closed the connection:", ctx.Err())
return
default:
fmt.Print(".")
time.Sleep(200 * time.Millisecond) // long calculation
}
}
io.WriteString(w, `Hi`)
log.Println("Done.")
})
log.Println(http.ListenAndServe(":8081", nil))
}
Here is the client code, which times out:
package main
import (
	"io"
	"log"
	"net/http"
	"time"
)
func main() {
	log.Println("HTTP GET")
	// the client gives up after 1 second, before the server's 2-second loop finishes
	client := &http.Client{
		Timeout: 1 * time.Second,
	}
	r, err := client.Get(`http://127.0.0.1:8081/`)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Body.Close()
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
	bs, err := io.ReadAll(r.Body)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("HTTP Done.")
	log.Println(string(bs))
}
You can use a normal browser to check the case without cancellation, or close it, refresh it, disconnect it, and so on, to trigger the cancellation.

Is the Go HTTP handler goroutine expected to exit immediately in this case?

I have one Go HTTP handler like this:
mux.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
if cn, ok := w.(http.CloseNotifier); ok {
go func(done <-chan struct{}, closed <-chan bool) {
select {
case <-done:
case <-closed:
fmt.Println("client cancelled....................!!!!!!!!!")
cancel()
}
}(ctx.Done(), cn.CloseNotify())
}
time.Sleep(5 * time.Second)
fmt.Println("I am still running...........")
fmt.Fprint(w, "cancellation testing......")
})
The API works fine. When I deliberately terminate the curl command with Control-C before the request finishes, I do see client cancelled....................!!!!!!!!! logged on the server side, but after a while I am still running........... gets logged as well. I thought this goroutine would be terminated immediately!
So, is this the desired behaviour, or did I do something wrong?
If it is expected, and the goroutine will complete its work regardless, then what is the point of the early cancellation?
If I did something wrong, please point me to the correct way.
You create a context.Context that can be cancelled, and you do cancel it when the client closes the connection, BUT you do not check the context, and your handler does nothing differently if it is cancelled. The context only carries timeout and cancellation signals; it has neither the power nor the intent to kill / terminate goroutines. The goroutines themselves have to monitor such cancellation signals and act upon them.
So what you see is the expected output of your code.
What you want is to monitor the context, and if it is cancelled, return "immediately" from the handler.
Of course if you're "sleeping", you can't monitor the context meanwhile. So instead use time.After(), like in this example:
mux.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
if cn, ok := w.(http.CloseNotifier); ok {
go func(done <-chan struct{}, closed <-chan bool) {
select {
case <-done:
case <-closed:
fmt.Println("client cancelled....................!!!!!!!!!")
cancel()
}
}(ctx.Done(), cn.CloseNotify())
}
select {
case <-time.After(5 * time.Second):
fmt.Println("5 seconds elapsed, client didn't close")
case <-ctx.Done():
fmt.Println("Context closed, client closed connection?")
return
}
fmt.Fprint(w, "cancellation testing......")
})

Process concurrent HTTP requests in Golang

I'm trying to process a file which contains 200 URLs and use each URL to make an HTTP request. I need to process at most 10 URLs concurrently (the code should block until one of the 10 in-flight URLs finishes processing). I tried to solve it in Go, but I keep getting the whole file processed with 200 concurrent connections created.
for scanner.Scan() { // loop through each url in the file
// send each url to golang HTTPrequest
go HTTPrequest(scanner.Text(), channel, &wg)
}
fmt.Println(<-channel)
wg.Wait()
What should I do?
A pool of 10 goroutines reading from a channel should fulfill your requirements.
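work := make(chan string)
// get original 200 urls
var urlsToProcess []string = seedUrls()
// start a pool of 10 goroutines that read urls from the work channel
for i := 0; i < 10; i++ {
	go func(w <-chan string) {
		// keep taking urls until the work channel is closed
		for url := range w {
			process(url) // hypothetical helper: make the HTTP request for this url
		}
	}(work)
}
// write urls to the work channel, blocking until a worker goroutine
// is able to start work
for _, url := range urlsToProcess {
	work <- url
}
close(work) // no more urls: lets the workers leave their range loops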
Cleanup and request results are left as an exercise for you. A send on an unbuffered Go channel blocks until one of the worker goroutines is ready to receive.
Alternatively, use a buffered channel as a counting semaphore, with code like this:
longTimeAct := func(index int, w chan struct{}, wg *sync.WaitGroup) {
defer wg.Done()
time.Sleep(1 * time.Second)
println(index)
<-w
}
wg := new(sync.WaitGroup)
ws := make(chan struct{}, 10)
for i := 0; i < 100; i++ {
ws <- struct{}{}
wg.Add(1)
go longTimeAct(i, ws, wg)
}
wg.Wait()

Golang writing to http response breaks input reading?

I'm attempting to write a small webapp in Go where the user uploads a gzipped file in a multipart form. The app unzips and parses the file and writes some output to the response. However, I keep running into an error where the input stream looks corrupted when I begin writing to the response. Not writing to the response fixes the problem, as does reading from a non-gzipped input stream. Here's an example http handler:
func(w http.ResponseWriter, req *http.Request) {
//Get an input stream from the multipart reader
//and read it using a scanner
multiReader, _ := req.MultipartReader()
part, _ := multiReader.NextPart()
gzipReader, _ := gzip.NewReader(part)
scanner := bufio.NewScanner(gzipReader)
//Strings read from the input stream go to this channel
inputChan := make(chan string, 1000)
//Signal completion on this channel
donechan := make(chan bool, 1)
//This goroutine just reads text from the input scanner
//and sends it into the channel
go func() {
for scanner.Scan() {
inputChan <- scanner.Text()
}
close(inputChan)
}()
//Read lines from input channel. They all either start with #
//or have ten tab-separated columns
go func() {
for line := range inputChan {
toks := strings.Split(line, "\t")
if len(toks) != 10 && line[0] != '#' {
panic("Dang.")
}
}
donechan <- true
}()
//periodically write some random text to the response
go func() {
for {
time.Sleep(10*time.Millisecond)
w.Write([]byte("write\n some \n output\n"))
}
}()
//wait until we're done to return
<-donechan
}
Weirdly, this code panics every time because it always encounters a line with fewer than 10 tokens, although at different spots every time. Commenting out the line that writes to the response fixes the issue, as does reading from a non-gzipped input stream. Am I missing something obvious? Why would writing to the response break if reading from a gzip file, but not a plain text formatted file? Why would it break at all?
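HTTP/1.x is not full-duplex: it is request-response based. You should only send output once you're done reading the input. The net/http documentation for ResponseWriter makes the same point: depending on the HTTP protocol version and the client, it may not be possible to read from Request.Body after writing to the ResponseWriter, so cautious handlers read the body first and then reply.
In your code you use a for with range on a channel. This keeps receiving from the channel until it is closed, so the sending goroutine has to close inputChan.
If inputChan is never closed, the following line is never reached: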
donechan <- true
And therefore receiving from donechan blocks:
<-donechan
You have to close the inputChan when EOF is reached:
go func() {
for scanner.Scan() {
inputChan <- scanner.Text()
}
close(inputChan) // THIS IS NEEDED
}()

Go bufio.Scanner stops while reading TCP connection to Redis

I'm reading from a TCP connection to a Redis server using bufio.Scanner:
fmt.Fprintf(conn, "*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$7\r\nHello!!\r\n")
scanner := bufio.NewScanner(conn)
for {
// fmt.Println("marker00")
if ok := scanner.Scan(); !ok {
// fmt.Println("marker01")
break
}
// fmt.Println("marker02")
fmt.Println(scanner.Text())
}
"+OK" comes back from the first scan, but the second scan hangs inside the Scan method (marker00 -> marker02 -> marker00, and then no more output).
Why does Scan stop, and how can I detect the end of the TCP response (without using bufio.Reader)?
Redis does not close the connection after replying to a command. Scan() only returns false at EOF or on a read error, and EOF never arrives while the connection stays open, so the second Scan() blocks waiting for more data.
Check out this:
package main
import (
"bufio"
"fmt"
"net"
)
// start `redis-server` before running this program
func main() {
conn, _ := net.Dial("tcp", "localhost:6379")
message := "*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\nb\r\n"
go func(conn net.Conn) {
for i := 0; i < 10; i++ {
fmt.Fprint(conn, message) // Fprint, since message is data, not a format string
}
}(conn)
scanner := bufio.NewScanner(conn)
for {
if ok := scanner.Scan(); !ok {
break
}
fmt.Println(scanner.Text())
}
fmt.Println("Scanning ended")
}
Old question, but I had the same issue. Two solutions:
1) Add a "QUIT\r\n" command to your Redis message. This will cause Redis to close the connection which will terminate the scan. You'll have to deal with the extra "+OK" that the quit outputs.
2) Add
conn.SetReadDeadline(time.Now().Add(time.Second*5))
just before you start scanning. This will cause the scan to stop trying after 5 seconds. Unfortunately, it will always take 5 seconds to complete the scan so choose this time wisely.

Resources