Golang writing to http response breaks input reading?

I'm attempting to write a small webapp in Go where the user uploads a gzipped file in a multipart form. The app unzips and parses the file and writes some output to the response. However, I keep running into an error where the input stream looks corrupted when I begin writing to the response. Not writing to the response fixes the problem, as does reading from a non-gzipped input stream. Here's an example http handler:
func(w http.ResponseWriter, req *http.Request) {
//Get an input stream from the multipart reader
//and read it using a scanner
multiReader, _ := req.MultipartReader()
part, _ := multiReader.NextPart()
gzipReader, _ := gzip.NewReader(part)
scanner := bufio.NewScanner(gzipReader)
//Strings read from the input stream go to this channel
inputChan := make(chan string, 1000)
//Signal completion on this channel
donechan := make(chan bool, 1)
//This goroutine just reads text from the input scanner
//and sends it into the channel
go func() {
for scanner.Scan() {
inputChan <- scanner.Text()
}
close(inputChan)
}()
//Read lines from input channel. They all either start with #
//or have ten tab-separated columns
go func() {
for line := range inputChan {
toks := strings.Split(line, "\t")
if len(toks) != 10 && line[0] != '#' {
panic("Dang.")
}
}
donechan <- true
}()
//periodically write some random text to the response
go func() {
for {
time.Sleep(10*time.Millisecond)
w.Write([]byte("write\n some \n output\n"))
}
}()
//wait until we're done to return
<-donechan
}
Weirdly, this code panics every time because it always encounters a line with fewer than 10 tokens, although at different spots every time. Commenting out the line that writes to the response fixes the issue, as does reading from a non-gzipped input stream. Am I missing something obvious? Why would writing to the response break if reading from a gzip file, but not a plain text formatted file? Why would it break at all?

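HTTP is request-response based, not full-duplex: you should only send output once you're done reading the input. The net/http documentation for ResponseWriter says the same thing: depending on the HTTP protocol version and the client, it may not be possible to read from Request.Body after writing to the ResponseWriter, so cautious handlers read the body first and then reply.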
Separately, your code uses a for loop with range on a channel. Such a loop keeps receiving until the channel is closed, so inputChan has to be closed once the scanner is done.
If inputChan is never closed, the following line is never reached:
donechan <- true
And therefore receiving from donechan blocks:
<-donechan
So make sure you close inputChan when EOF is reached:
go func() {
    for scanner.Scan() {
        inputChan <- scanner.Text()
    }
    close(inputChan) // THIS IS NEEDED
}()
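The advice to finish reading before writing can be sketched as follows. This is a minimal illustration, not the asker's actual handler: countLinesHandler and demoUpload are hypothetical names, and the "valid"/"invalid" summary format is an assumption made only so the result can be checked. The handler reads the whole gzipped part first and writes to the response only afterwards.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"strings"
)

// countLinesHandler (hypothetical) consumes the entire gzipped upload
// before it writes a single byte to the response.
func countLinesHandler(w http.ResponseWriter, req *http.Request) {
	mr, err := req.MultipartReader()
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	part, err := mr.NextPart()
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	zr, err := gzip.NewReader(part)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer zr.Close()

	var valid, invalid int
	sc := bufio.NewScanner(zr)
	for sc.Scan() {
		line := sc.Text()
		// A line is valid if it is a comment or has ten tab-separated columns.
		if strings.HasPrefix(line, "#") || len(strings.Split(line, "\t")) == 10 {
			valid++
		} else {
			invalid++
		}
	}
	if err := sc.Err(); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Input is fully read; only now start the response.
	fmt.Fprintf(w, "valid=%d invalid=%d\n", valid, invalid)
}

// demoUpload builds a gzipped multipart request in memory, runs the
// handler against it, and returns the response body.
func demoUpload(content string) string {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)
	fw, err := mw.CreateFormFile("file", "data.gz")
	if err != nil {
		return err.Error()
	}
	zw := gzip.NewWriter(fw)
	zw.Write([]byte(content))
	zw.Close()
	mw.Close()

	req := httptest.NewRequest(http.MethodPost, "/upload", &body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	rec := httptest.NewRecorder()
	countLinesHandler(rec, req)
	return rec.Body.String()
}

func main() {
	tenCols := strings.Repeat("x\t", 9) + "x"
	fmt.Print(demoUpload("# header\n" + tenCols + "\nshort\tline\n"))
}
```

If progress output is needed while a large upload is still being parsed, a separate status endpoint is probably a safer design than interleaving writes with reads on the same request.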

Parse Content Disposition header in GO

I am trying to retrieve the filename from this http writer for testing purposes.
On the server I have:
func servefile(w http.ResponseWriter, r *http.Request ) {
...
// file.Name() is randomized with os.CreateTemp(dir, temp+"*"+ext) above
w.Header().Set("Content-Disposition", "attachment; filename="+file.Name())
w.Header().Set("Content-Type", "application/octet-stream")
http.ServeFile(w, r, file.Name()+".xyz") // serve file to user to download
...
}
*I put .xyz as a placeholder for this demonstration.
I am testing this function programmatically with Go and I want to access the filename to be able to save it in a variable in the client code.
I have looked at this post, How can I parse the Content-Disposition header to retrieve the filename property?, but I have not gotten it to work. I have no clue what the filename is on the client side, so I don't know how to reference it specifically. I know my code on the server side works, because when I send a request (through the browser) to this endpoint/function, the "Downloads" popup shows the file download progress with the name of the file.
EDIT: This is the client code I am calling it from:
func TestGetFile(t *testing.T) {
...
cid := "some string"
// requestfile() creates, executes, and returns an httptest.ResponseRecorder to the requestFile endpoint
reqfileRespRecorder := requestfile()
// createTmpFile creates a new file out of the contents received in requestfile()
filePath := "/tmp/temp.xyz"
file := createTmpFile(reqfileRespRecorder , filePath)
// CreateWriter() - writes file contents to body of multipart.Writer
w, body := createWriter(file)
// Create request to postRecord endpoint
req, err := http.NewRequest("POST", "/PostRecord?CID="+cid, body)
check(err)
req.Header.Add("Content-Type", w.FormDataContentType())
// execute request to PostRecord endpoint. returns an httptest.ResponseRecorder
respRecorder := executeRequest(PostRecord, req)
disposition, params, err := mime.ParseMediaType(`Content-Disposition`)
...
}
Based on @BaytaDarrell's comment, it dawned on me that I could print out the responses. That helped me realize that I was looking for the Content-Disposition header on the wrong request/response. The linked post still didn't help, but I got my code working like this:
func TestGetFile(t *testing.T) {
    ...
    cid := "some string"
    // requestfile() creates, executes, and returns an httptest.ResponseRecorder to the requestFile endpoint
    reqfileRespRecorder := requestfile()
    disp := reqfileRespRecorder.Header().Get("Content-Disposition")
    line := strings.Split(disp, "=")
    filename := line[1]
    fmt.Println("filename: ", filename)
    // createTmpFile creates a new file out of the contents received in requestfile()
    filePath := "/tmp/temp.xyz"
    file := createTmpFile(reqfileRespRecorder, filePath)
    // createWriter() writes file contents to body of multipart.Writer
    w, body := createWriter(file)
    ...
}
Their comment made me realize I should take another look at the httptest package documentation. There I found the Header() method, and that I can read a single header from it with Get().
The call reqfileRespRecorder.Header().Get("Content-Disposition") returns attachment; filename=temp37bf73gd.xyz, and to store the filename in a variable I split on =.
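Splitting on = works for the header shown, but it would break if the value were quoted or carried further parameters. As an alternative sketch, the header value itself (rather than the literal header name, as in the question's attempt) can be passed to mime.ParseMediaType from the standard library. The helper name filenameFromDisposition is an assumption made for this example.

```go
package main

import (
	"fmt"
	"mime"
)

// filenameFromDisposition (hypothetical helper) extracts the filename
// parameter from a Content-Disposition header value.
func filenameFromDisposition(header string) (string, error) {
	_, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "", err
	}
	name, ok := params["filename"]
	if !ok {
		return "", fmt.Errorf("no filename parameter in %q", header)
	}
	return name, nil
}

func main() {
	// In the test this value would come from
	// reqfileRespRecorder.Header().Get("Content-Disposition").
	name, err := filenameFromDisposition("attachment; filename=temp37bf73gd.xyz")
	fmt.Println(name, err)
}
```

The same helper also accepts a quoted value such as attachment; filename="my file.xyz", which a plain split on = would return with the quotes still attached.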

Respond to HTTP request while processing in the background

I have an API that receives a CSV file to process. I'd like to be able to send back a 202 Accepted (or any status really) while processing the file in the background. I have a handler that checks the request, writes the success header, and then continues processing via a producer/consumer pattern. The problem is that, due to the WaitGroup.Wait() calls, the accepted header isn't sent back until processing has finished. The errors from the handler validation are sent back correctly, but that's because of the return statements.
Is it possible to send that 202 Accepted back with the wait groups as I'm hoping (and if so, what am I missing)?
func SomeHandler(w http.ResponseWriter, req *http.Request) {
endAccepted := time.Now()
err := verifyRequest(req)
if err != nil {
w.WriteHeader(http.StatusBadRequest)
data := JSONErrors{Errors: []string{err.Error()}}
json.NewEncoder(w).Encode(data)
return
}
// ...FILE RETRIEVAL CLIPPED (not relevant)...
// e.g. csvFile, openErr := os.Open(tmpFile.Name())
//////////////////////////////////////////////////////
// TODO this isn't sending due to the WaitGroup.Wait()s below
w.WriteHeader(http.StatusAccepted)
//////////////////////////////////////////////////////
// START PRODUCER/CONSUMER
jobs := make(chan *Job, 100) // buffered channel
results := make(chan *Job, 100) // buffered channel
// start consumers
for i := 0; i < 5; i++ { // 5 consumers
wg.Add(1)
go consume(i, jobs, results)
}
// start producing
go produce(jobs, csvFile)
// start processing
wg2.Add(1)
go process(results)
wg.Wait() // wait for all workers to finish processing jobs
close(results)
wg2.Wait() // wait for process to finish
log.Println("===> Done Processing.")
}
You're doing all the processing in the background, but you're still waiting for it to finish before the handler returns. The solution is to just not wait. The best solution would move all of the handling into a separate function that you call with go to run it in the background, but the simplest solution that leaves it inline would be:
w.WriteHeader(http.StatusAccepted)
go func() {
    // START PRODUCER/CONSUMER
    jobs := make(chan *Job, 100)    // buffered channel
    results := make(chan *Job, 100) // buffered channel
    // start consumers
    for i := 0; i < 5; i++ { // 5 consumers
        wg.Add(1)
        go consume(i, jobs, results)
    }
    // start producing
    go produce(jobs, csvFile)
    // start processing
    wg2.Add(1)
    go process(results)
    wg.Wait() // wait for all workers to finish processing jobs
    close(results)
    wg2.Wait() // wait for process to finish
    log.Println("===> Done Processing.")
}()
Note that you elided the CSV file handling, so you'll need to ensure that it's safe to use this way (i.e. that you haven't deferred closing or deleting the file, which would cause that to occur as soon as the handler returns).
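The "move the handling elsewhere" variant might look roughly like the sketch below. The names acceptHandler and demoAccepted are hypothetical, and the WaitGroup passed in exists only so the demo can observe that the background work completed; a real service would more likely hand the job to a queue or a long-lived worker. The background function should not rely on the request's context or body, since the request context is canceled once the handler returns.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// acceptHandler (hypothetical) replies 202 right away and runs process
// in a goroutine that outlives the handler call.
func acceptHandler(process func(), done *sync.WaitGroup) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
		done.Add(1)
		go func() {
			defer done.Done()
			process() // must not use r or w: the handler has returned
		}()
		// Returning here is what lets the server send the response.
	}
}

// demoAccepted calls the handler once and reports the status code and
// whether the background work eventually ran.
func demoAccepted() (status int, processed bool) {
	var wg sync.WaitGroup
	ran := false
	h := acceptHandler(func() { ran = true }, &wg)

	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodPost, "/csv", nil))
	status = rec.Code

	wg.Wait() // demo only: the handler itself never waits
	return status, ran
}

func main() {
	status, processed := demoAccepted()
	fmt.Println(status, processed)
}
```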

How to avoid running into max open files limit

I'm building an application that will download roughly 5000 CSV files in parallel, using goroutines and plain HTTP GET requests.
I'm currently running into open file limits imposed by OS X.
The CSV files are served over http. Are there any other network protocols that I can use to batch each request into one? I don't have access to the server, so I can't zip them. I'd also prefer not to change the ulimit because once in production, I probably won't have access to that configuration.
You probably want to limit active concurrent requests to a more sensible number than 5000. Possibly spin up 10 to 20 workers and send individual files to them over a channel.
The http client should reuse connections for requests, assuming you always read the entire response body and close it.
Something like this:
package main

import (
    "io"
    "log"
    "net/http"
    "sync"
)

var ch = make(chan string)
var wg sync.WaitGroup

func main() {
    // Allow more idle connections per host so workers can reuse them.
    http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 100
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go worker()
    }
    csvs := []string{"http://example.com/a.csv", "http://example.com/b.csv"}
    for _, u := range csvs {
        ch <- u
    }
    close(ch)
    wg.Wait()
}

func worker() {
    defer wg.Done()
    for u := range ch {
        get(u)
    }
}

func get(u string) {
    resp, err := http.Get(u)
    if err != nil {
        log.Println(err)
        return
    }
    // Deferred calls run last-in, first-out: drain what is left of the
    // body, then close it, so the connection can be reused.
    defer resp.Body.Close()
    defer io.Copy(io.Discard, resp.Body)
    // Read and decode/handle the body here. Make sure to read all of it.
}
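To check that a worker pool like the one above really bounds the number of requests in flight, one option is to swap the network for a fake http.RoundTripper that counts concurrent calls. Everything in this sketch (fakeTransport, downloadAll, demoDownload, the example.invalid URLs) is made up for illustration and no real connection is opened.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"sync"
	"time"
)

// fakeTransport (hypothetical) answers every request with a tiny CSV
// body and records the highest number of simultaneous requests.
type fakeTransport struct {
	mu                    sync.Mutex
	inFlight, maxInFlight int
}

func (t *fakeTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.mu.Lock()
	t.inFlight++
	if t.inFlight > t.maxInFlight {
		t.maxInFlight = t.inFlight
	}
	t.mu.Unlock()

	time.Sleep(time.Millisecond) // pretend the download takes a moment

	t.mu.Lock()
	t.inFlight--
	t.mu.Unlock()

	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader("a,b,c\n")),
		Request:    req,
	}, nil
}

// downloadAll fetches every URL with a fixed number of workers and
// returns how many downloads succeeded.
func downloadAll(client *http.Client, urls []string, workers int) int {
	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				resp, err := client.Get(u)
				if err != nil {
					continue
				}
				// Drain and close so a real transport could reuse the connection.
				_, err = io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
				if err == nil && resp.StatusCode == http.StatusOK {
					mu.Lock()
					succeeded++
					mu.Unlock()
				}
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return succeeded
}

// demoDownload runs n fake downloads and reports the number of
// successes plus the peak concurrency seen by the transport.
func demoDownload(n, workers int) (succeeded, peak int) {
	ft := &fakeTransport{}
	urls := make([]string, n)
	for i := range urls {
		urls[i] = fmt.Sprintf("http://example.invalid/%d.csv", i)
	}
	succeeded = downloadAll(&http.Client{Transport: ft}, urls, workers)
	return succeeded, ft.maxInFlight
}

func main() {
	succeeded, peak := demoDownload(50, 5)
	fmt.Println("succeeded:", succeeded, "peak concurrency:", peak)
}
```

Because each worker holds at most one open response at a time, the number of sockets in use stays near the worker count rather than the number of files.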

why is golang http server failing with "broken pipe" when response exceeds 8kb?

I have an example web server below. If you call curl localhost:3000 -v and then cancel it with ^C immediately (within 1 second), the server reports write tcp 127.0.0.1:3000->127.0.0.1:XXXXX: write: broken pipe.
package main

import (
    "fmt"
    "log"
    "net/http"
    "time"
)

func main() {
    log.Fatal(http.ListenAndServe(":3000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(1 * time.Second)
        // Why 8061 bytes? Because the response header on my computer
        // is 132 bytes, adding up the entire response to 8193 (1 byte
        // over 8kb)
        if _, err := w.Write(make([]byte, 8061)); err != nil {
            fmt.Println(err)
            return
        }
    })))
}
Based on my debugging, I have been able to conclude that this only happens when the entire response is larger than 8192 bytes (8 KB). If the entire response is 8192 bytes or smaller, no broken pipe error is returned.
My question is where is this 8192 bytes (or 8kb) buffer limit set? Is this a limit in Golang's HTTP write buffer? Is this related to the response being chunked? Is this only related to the curl client or the browser client? How can I change this limit so I can have a bigger buffer written before the connection is closed (for debugging purposes)?
Thanks!
In net/http/server.go the output buffer is set to 4<<10, i.e. 4KB.
The reason you see the error at 8KB, is that it takes at least 2 writes to a socket to detect a closed remote connection. The first write succeeds, but the remote host sends an RST packet. The second write will be to a closed socket, which is what returns the broken pipe error.
Depending on the socket write buffer, and the connection latency, it's possible that even more writes could succeed before the first RST packet is registered.
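The role of the output buffer can be illustrated with a deliberately simplified model. The sketch below is not net/http's internal code; it only wraps a counting writer (a hypothetical stand-in for the socket) in a bufio.Writer with the 4<<10 size mentioned above, and shows that a small write never reaches the underlying writer until a flush. An error from a closed socket therefore cannot surface on that write.

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter (hypothetical) stands in for the socket and counts how
// many times it is actually written to.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// underlyingWrites reports how many writes reach the underlying writer
// when payload bytes are written through a buffer of bufSize bytes,
// before any explicit Flush.
func underlyingWrites(bufSize, payload int) int {
	cw := &countingWriter{}
	bw := bufio.NewWriterSize(cw, bufSize)
	bw.Write(make([]byte, payload))
	return cw.calls
}

func main() {
	fmt.Println("100 bytes: ", underlyingWrites(4<<10, 100))
	fmt.Println("8061 bytes:", underlyingWrites(4<<10, 8061))
}
```

In the real server there are additional layers, and, as the answer notes, the kernel's socket buffer and connection latency also affect when the error shows up, so the exact threshold can differ between machines.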
It is a broken pipe, but when reading a response on the client side you should use ioutil.ReadAll for small responses or io.Copy for large ones.
For ioutil.ReadAll
defer response.Body.Close()
body, err := ioutil.ReadAll(response.Body)
if err != nil {
    logger.Errorf(ctx, "err is %+v", err)
    return nil, err
}
For io.Copy
// Pre-allocate 10 MB; the buffer grows if the response is larger.
buf := bytes.NewBuffer(make([]byte, 0, 10485760))
if _, err := io.Copy(buf, response.Body); err != nil {
    return nil, err
}
body := buf.Bytes()

Go bufio.Scanner stops while reading TCP connection to Redis

I'm reading from a TCP connection to redis-server using bufio.Scanner:
fmt.Fprintf(conn, "*3\r\n$3\r\nSET\r\n$5\r\nmykey\r\n$7\r\nHello!!\r\n")
scanner := bufio.NewScanner(conn)
for {
// fmt.Println("marker00")
if ok := scanner.Scan(); !ok {
// fmt.Println("marker01")
break
}
// fmt.Println("marker02")
fmt.Println(scanner.Text())
}
"+OK" comes back as the result of the first scan, but the second call to Scan blocks and never returns (marker00 -> marker02 -> marker00, then no more output).
Why does Scan stop, and how can I detect the end of the TCP response (without using bufio.Reader)?
Redis does not close the connection after replying to a command. Scan() only returns false once it hits io.EOF or another read error, and with the connection still open neither occurs, so the second Scan() blocks waiting for more data.
Check out this:
package main

import (
    "bufio"
    "fmt"
    "log"
    "net"
)

// Start redis-server before running this.
func main() {
    conn, err := net.Dial("tcp", "localhost:6379")
    if err != nil {
        log.Fatal(err)
    }
    message := "*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\nb\r\n"
    go func(conn net.Conn) {
        for i := 0; i < 10; i++ {
            fmt.Fprint(conn, message)
        }
    }(conn)
    scanner := bufio.NewScanner(conn)
    for {
        if ok := scanner.Scan(); !ok {
            break
        }
        fmt.Println(scanner.Text())
    }
    // Not reached while Redis keeps the connection open.
    fmt.Println("Scanning ended")
}
Old question, but I had the same issue. Two solutions:
1) Add a "QUIT\r\n" command to your Redis message. This will cause Redis to close the connection which will terminate the scan. You'll have to deal with the extra "+OK" that the quit outputs.
2) Add
conn.SetReadDeadline(time.Now().Add(time.Second*5))
just before you start scanning. This will cause the scan to stop trying after 5 seconds. Unfortunately, it will always take 5 seconds to complete the scan so choose this time wisely.
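A third option, sketched here under the assumption that every command sent produces exactly one single-line reply such as +OK, is to stop scanning after the expected number of replies instead of waiting for EOF. The function names readReplies and demoReplies are hypothetical, and net.Pipe stands in for the Redis connection so the example runs without a server. Multi-line replies (for example bulk strings) would need real RESP parsing.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
)

// readReplies (hypothetical) scans exactly n lines from r and returns
// them, so the caller never blocks waiting for an EOF that won't come.
func readReplies(r io.Reader, n int) ([]string, error) {
	sc := bufio.NewScanner(r)
	replies := make([]string, 0, n)
	for len(replies) < n {
		if !sc.Scan() {
			if err := sc.Err(); err != nil {
				return replies, err
			}
			return replies, io.ErrUnexpectedEOF
		}
		replies = append(replies, sc.Text())
	}
	return replies, nil
}

// demoReplies simulates a server that answers two commands and keeps
// the connection open, as Redis does.
func demoReplies() ([]string, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	go func() {
		// Stand-in for Redis: two replies, no close afterwards.
		io.WriteString(server, "+OK\r\n+OK\r\n")
	}()
	return readReplies(client, 2)
}

func main() {
	replies, err := demoReplies()
	fmt.Println(replies, err)
}
```

The default line splitter of bufio.Scanner strips the trailing \r\n, so each reply arrives as plain +OK.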
