Alternative to ioutil.ReadAll in Go? - http

For a program I'm making, this function is run as a goroutine in a for loop, once for each URL that is passed in (there is no set amount).
func makeRequest(url string, ch chan<- string, errors map[string]error) {
    res, err := http.Get(url)
    if err != nil {
        errors[url] = err
        close(ch)
        return
    }
    defer res.Body.Close()
    body, _ := ioutil.ReadAll(res.Body)
    ch <- string(body)
}
The entire body of the response has to be used, so ioutil.ReadAll seemed like the perfect fit. But with no limit on how many URLs can be passed in, and ReadAll holding every response entirely in memory, it's starting to feel less like the golden ticket. I'm fairly new to Go, so if you do decide to answer, some explanation behind your solution would be greatly appreciated!

One insight I got as I learned Go is that ReadAll is often inefficient for large readers, and, as in your case, arbitrarily large input can exhaust memory. When I started out, I used to do JSON parsing like this:
data, err := ioutil.ReadAll(r)
if err != nil {
    return err
}
json.Unmarshal(data, &v)
Then, I learned of a much more efficient way of parsing JSON, which is to simply use the Decoder type.
err := json.NewDecoder(r).Decode(&v)
if err != nil {
    return err
}
Not only is this more concise, it is much more efficient, both memory-wise and time-wise:
The decoder doesn't have to allocate a huge byte slice to hold all the data that was read - it can reuse a small buffer, calling Read repeatedly to pull in and parse the data piece by piece. This saves a lot of time in allocations and takes pressure off the GC.
The JSON decoder can start parsing as soon as the first chunk of data comes in - it doesn't have to wait for everything to finish downloading.
Now, of course your question has nothing to do with JSON, but this example is useful to illustrate that if you can use Read directly and parse data chunks at a time, do it. Especially with HTTP requests, parsing is faster than reading/downloading, so this can lead to parsed data being almost immediately ready the moment the request body finishes arriving.
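For example, here is a minimal sketch of decoding straight from an HTTP response body (the result type and the fetchJSON name are made up for illustration; your endpoint's fields would differ):
type result struct {
    Name string `json:"name"`
}

func fetchJSON(url string) (*result, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    var v result
    // The decoder pulls data from the body in small chunks as it arrives,
    // so the full payload is never buffered in memory.
    if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
        return nil, err
    }
    return &v, nil
}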
In your case, you don't seem to be doing any handling of the data yet, so there isn't much to suggest that would help you specifically. But the io.Reader and io.Writer interfaces are the Go equivalent of UNIX pipes, so you can use them in many different places:
Writing data to a file:
f, err := os.Create("file")
if err != nil {
    return err
}
defer f.Close()
// Copy will put all the data from Body into f, without creating a huge buffer in memory
// (it moves a chunk at a time)
io.Copy(f, resp.Body)
Printing everything to stdout:
io.Copy(os.Stdout, resp.Body)
Pipe a response's body to a request's body:
req, err := http.NewRequest("POST", "https://example.com", resp.Body)
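For instance, here is a sketch of streaming one server's response straight into an upload to another (both URLs and the relay name are placeholders), so the body is never buffered in memory:
func relay() error {
    src, err := http.Get("https://example.com/source")
    if err != nil {
        return err
    }
    defer src.Body.Close()

    // The GET body becomes the POST body; chunks flow through as they arrive.
    req, err := http.NewRequest("POST", "https://example.com/upload", src.Body)
    if err != nil {
        return err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    return nil
}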

To bound the amount of memory your application is using, the common approach is to read into a fixed-size buffer, which directly addresses your ioutil.ReadAll problem.
Go's bufio package offers utilities (Scanner) which support reading until a delimiter, or reading a line from the input, which is closely related to #Howl's question.
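For example, a minimal sketch of that idea, reading a response body one line at a time so only a single line is ever held in memory (the handleLines name and the Println are placeholders for whatever per-line processing you need):
func handleLines(r io.Reader) error {
    scanner := bufio.NewScanner(r)
    for scanner.Scan() {
        line := scanner.Text() // one line at a time, never the whole body
        fmt.Println(line)      // placeholder for real handling
    }
    return scanner.Err() // non-nil if reading failed before EOF
}
You would call it as handleLines(res.Body) inside makeRequest instead of ioutil.ReadAll.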

That is pretty simple to do in Go.
Here is the client program:
package main

import (
    "fmt"
    "net/http"
)

var data []byte

func main() {
    data = make([]byte, 128)
    ch := make(chan string)
    go makeRequest("http://localhost:8080", ch)
    for v := range ch {
        fmt.Println(v)
    }
}

func makeRequest(url string, ch chan<- string) {
    res, err := http.Get(url)
    if err != nil {
        close(ch)
        return
    }
    defer res.Body.Close()
    defer close(ch) // don't forget to close the channel as well
    for {
        n, err := res.Body.Read(data)
        if n > 0 {
            ch <- string(data[:n]) // send whatever was read, even on the final chunk
        }
        if err != nil { // io.EOF (or a real error) ends the loop
            return
        }
    }
}
Here is the server program:
package main

import (
    "net/http"
)

func main() {
    http.HandleFunc("/", hello)
    http.ListenAndServe("localhost:8080", nil)
}

func hello(w http.ResponseWriter, r *http.Request) {
    http.ServeFile(w, r, "movie.mkv")
}

Related

Golang multiple timers with map+channel+mutex

So I'm implementing multiple timers using a map, channels, and a mutex. To allow a timer to be cancelled, I have a map of channels that stores the cancellation signals; below is the code:
var timerCancelMap = make(map[string]chan interface{})
var mutexLocker sync.Mutex

func cancelTimer(timerIndex string) {
    mutexLocker.Lock()
    defer mutexLocker.Unlock()
    timerCancelMap[timerIndex] = make(chan interface{})
    timerCancelMap[timerIndex] <- struct{}{}
}

func timerStart(timerIndex string) {
    fmt.Println("###### 1. start timer: ", timerIndex)
    timerStillActive := true
    newTimer := time.NewTimer(time.Second * 10)
    for timerStillActive {
        mutexLocker.Lock()
        select {
        case <-newTimer.C:
            timerStillActive = false
            fmt.Println("OOOOOOOOO timer time's up: ", timerIndex)
        case <-timerCancelMap[timerIndex]:
            timerCancelMap[timerIndex] = nil
            timerStillActive = false
            fmt.Println("XXXXXXXXX timer canceled: ", timerIndex)
        default:
        }
        mutexLocker.Unlock()
    }
    fmt.Println("###### 2. end timer: ", timerIndex)
}

func main() {
    for i := 0; i < 10; i++ {
        go timerStart(strconv.Itoa(i))
        if i%10 == 0 {
            cancelTimer(strconv.Itoa(i))
        }
    }
}
Now this one gives me a deadlock. If I remove all the mutex lock/unlock calls, it gives me "concurrent map read and map write" instead. So what am I doing wrong?
I know sync.Map solves my problem, but the performance suffers significantly, so I kinda wanna stick with the map solution.
Thanks in advance!
There are a few things going on here which are going to cause problems with your program:
cancelTimer creates a channel with make(chan interface{}), which has no buffer (a buffered version would be, e.g., make(chan interface{}, 1)). This means that sending to the channel blocks until another goroutine attempts to receive from that same channel. So when you call cancelTimer from the main goroutine, it locks mutexLocker and then blocks on sending the cancellation; meanwhile no other goroutine can lock mutexLocker to receive from the cancellation channel, and you have a deadlock.
After adding a buffer, the cancelTimer call will return immediately.
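In other words, the smallest change is to give the cancellation channel a one-slot buffer when it is created; a sketch of cancelTimer with only that line changed:
func cancelTimer(timerIndex string) {
    mutexLocker.Lock()
    defer mutexLocker.Unlock()
    // The 1-slot buffer lets the send below complete even though
    // no other goroutine is receiving yet.
    timerCancelMap[timerIndex] = make(chan interface{}, 1)
    timerCancelMap[timerIndex] <- struct{}{}
}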
We will then run into a few other little issues. The first is that the program will immediately quit without printing anything. This happens because after launching the test goroutines and sending the cancel, the main thread has done all of its work, which tells the program it is finished. So we need to tell the main thread to wait for the goroutines, which sync.WaitGroup is very good for:
func main() {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            timerStart(strconv.Itoa(i))
        }(i)
        if i%10 == 0 {
            cancelTimer(strconv.Itoa(i))
        }
    }
    wg.Wait()
}
I can see you've added mutexLocker to protect the map, and later added the for loop to give each goroutine an opportunity to acquire mutexLocker and check its timer. This results in a lot of busy work for the computer and more complicated code than necessary. Instead of having timerStart look up its index in the cancellations map, we can provide the cancellation channel as an argument:
func testTimer(i int, cancel <-chan interface{}) {
and have the main function create the channels. You will then be able to remove the map access, the mutexLocker locking, and the for loop from testTimer. If you still need the map for purposes not shown here, you can store the same channel in the map that you pass to testTimer; if not, you can remove all of that code too.
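A sketch of what the simplified goroutine might look like with the channel passed in directly (the duration and print statements mirror the original; the complete version is in the playground link below):
func testTimer(i int, cancel <-chan interface{}) {
    fmt.Println("###### 1. start timer: ", i)
    t := time.NewTimer(time.Second * 10)
    select {
    case <-t.C:
        fmt.Println("OOOOOOOOO timer time's up: ", i)
    case <-cancel:
        // No map lookup, no mutex, no polling loop: the goroutine owns its channel.
        fmt.Println("XXXXXXXXX timer canceled: ", i)
    }
    fmt.Println("###### 2. end timer: ", i)
}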
This all ends up looking something like https://play.golang.org/p/iQUvc52B6Nk
Hope that helps 👍

PUT upload a file's byte range with streams and progress

I just got started with Go and need some help. I would like to upload a certain range of bytes from a file.
I already accomplished this by reading the bytes into a buffer. But this increases memory usage.
Instead of reading bytes into memory, I want to stream them while uploading and have an upload progress. I did something like this in Node.js but struggle to get the puzzle pieces together for Go.
The code that I have now looks like this:
func uploadChunk(id, mimeType, uploadURL, filePath string, offset, size uint) {
    // open file
    file, err := os.Open(filePath)
    panicCheck(err, ErrorFileRead) // custom error handler
    defer file.Close()

    // move to the proper byte
    file.Seek(int64(offset), 0)

    // read byte chunk into buffer
    buffer := make([]byte, size)
    file.Read(buffer)
    fileReader := bytes.NewReader(buffer)

    request, err := http.NewRequest(http.MethodPut, uploadURL, fileReader)
    client := &http.Client{
        Timeout: time.Second * 10,
    }
    response, err := client.Do(request)
    panicCheck(err, ErrorFileRead)
    defer response.Body.Close()

    b, err := httputil.DumpResponse(response, true)
    panicCheck(err, ErrorFileRead)
    fmt.Println("response\n", string(b))
}
Could you guys help me to figure out how to stream and get progress for an upload?
Thanks
You can use an io.LimitedReader to wrap the file and only read the amount of data you want. The implementation returned by io.LimitReader is an *io.LimitedReader.
file.Seek(int64(offset), 0)
fileReader := io.LimitReader(file, size)
request, err := http.NewRequest(http.MethodPut, uploadURL, fileReader)
And for S3 you will want to ensure that you don't use chunked encoding by explicitly setting the ContentLength:
request.ContentLength = int64(size)
As for upload progress, see: Go: Tracking POST request progress
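The idea from that answer, roughly, is to wrap the reader you pass to http.NewRequest in a type that counts bytes as the client reads them; a minimal sketch (progressReader is a made-up name, and the Printf is a placeholder for real progress reporting):
type progressReader struct {
    r    io.Reader
    sent int64
    size int64
}

func (p *progressReader) Read(b []byte) (int, error) {
    n, err := p.r.Read(b)
    p.sent += int64(n)
    fmt.Printf("\rupload progress: %d/%d bytes", p.sent, p.size)
    return n, err
}
You would then pass &progressReader{r: fileReader, size: int64(size)} as the request body, and keep setting request.ContentLength explicitly, since the client can no longer infer the length from the reader's type.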

Why does my code work correctly when I run wg.Wait() inside a goroutine?

I have a list of urls that I am scraping. What I want to do is store all of the successfully scraped page data into a channel, and when I am done, dump it into a slice. I don't know how many successful fetches I will get, so I cannot specify a fixed length. I expected the code to reach wg.Wait() and then wait until all the wg.Done() methods are called, but I never reached the close(queue) statement. Looking for a similar answer, I came across this SO answer
https://stackoverflow.com/a/31573574/5721702
where the author does something similar:
ports := make(chan string)
toScan := make(chan int)

var wg sync.WaitGroup

// make 100 workers for dialing
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for p := range toScan {
            ports <- worker(*host, p)
        }
    }()
}

// close our receiving ports channel once all workers are done
go func() {
    wg.Wait()
    close(ports)
}()
As soon as I wrapped my wg.Wait() inside the goroutine, close(queue) was reached:
urls := getListOfURLS()
activities := make([]Activity, 0, limit)
queue := make(chan Activity)

for i, activityURL := range urls {
    wg.Add(1)
    go func(i int, url string) {
        defer wg.Done()
        activity, err := extractDetail(url)
        if err != nil {
            log.Println(err)
            return
        }
        queue <- activity
    }(i, activityURL)
}

// calling it like this without the goroutine causes the execution to hang
// wg.Wait()
// close(queue)

// calling it like this successfully waits
go func() {
    wg.Wait()
    close(queue)
}()

for a := range queue {
    // block channel until valid url is added to queue
    // once all are added, close it
    activities = append(activities, a)
}
Why does the code not reach the close if I don't use a goroutine for wg.Wait()? I would think that all of the defer wg.Done() statements are eventually called, so things would clear up once execution gets to wg.Wait(). Does it have to do with receiving values on my channel?
You need to wait for the goroutines to finish in a separate goroutine, because queue needs to be read from. When you do the following:
queue := make(chan Activity)
for i, activityURL := range urls {
    wg.Add(1)
    go func(i int, url string) {
        defer wg.Done()
        activity, err := extractDetail(url)
        if err != nil {
            log.Println(err)
            return
        }
        queue <- activity // nothing is reading data from queue.
    }(i, activityURL)
}

wg.Wait()
close(queue)

for a := range queue {
    activities = append(activities, a)
}
Each goroutine blocks at queue <- activity since queue is unbuffered and nothing is reading data from it. This is because the range loop on queue is in the main thread after wg.Wait.
wg.Wait will only unblock once all the goroutines return. But as mentioned, all the goroutines are blocked at the channel send.
When you use a separate goroutine to wait, code execution actually reaches the range loop on queue.
// wg.Wait does not block the main thread.
go func() {
    wg.Wait()
    close(queue)
}()
This unblocks the goroutines at the queue <- activity statement (the main goroutine starts reading off queue) and lets them run to completion, which in turn calls each individual wg.Done.
Once the waiting goroutine gets past wg.Wait, queue is closed and the main goroutine exits its range loop.
The queue channel is unbuffered, so every goroutine trying to write to it blocks because the reader hasn't started yet. So no goroutine can write, they all hang, and as a result wg.Wait waits forever.
Try launching the reader in a separate goroutine:
go func() {
    for a := range queue {
        // block channel until valid url is added to queue
        // once all are added, close it
        activities = append(activities, a)
    }
}()
and then start the waiter:
wg.Wait()
close(queue)
This way you don't accumulate all the data in the channel and overload it; you get the data as it comes and put it into the target slice.

In Go, how do I write a streaming http response body to a seek position in a file effectively?

I have a program that combines multiple http responses and writes to the respective seek positions on a file. I am currently doing this by
client := new(http.Client)
req, _ := http.NewRequest("GET", os.Args[1], nil)
resp, _ := client.Do(req)
defer resp.Body.Close()
reader, _ := ioutil.ReadAll(resp.Body) // Reads the entire response into memory
// Some func that gets the seek value someval
fs.Seek(int64(someval), 0)
fs.Write(reader)
This sometimes results in large memory usage because of the ioutil.ReadAll.
I tried bytes.Buffer as
buf := new(bytes.Buffer)
offset, _ := buf.ReadFrom(resp.Body) //Still reads the entire response to memory.
fs.Write(buf.Bytes())
but it was still the same.
My intention was to use a buffered write to the file, then seek to the offset again, and continue writing until the end of the stream was received (hence capturing the offset value from buf.ReadFrom). But it also kept everything in memory and wrote it all at once.
What is the best way to write a stream like this directly to disk, without keeping the entire content in a buffer?
An example would be much appreciated.
Thank you.
Use io.Copy to copy the response body to the file:
resp, _ := client.Do(req)
defer resp.Body.Close()
// Some func that gets the seek value someval
fs.Seek(int64(someval), 0)
n, err := io.Copy(fs, resp.Body)
// n is the number of bytes copied
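Putting the pieces together, here is a sketch of the whole download-to-offset path with errors checked (the writeAt name and the open flags are just one way to set it up):
func writeAt(path string, offset int64, url string) error {
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    fs, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        return err
    }
    defer fs.Close()

    if _, err := fs.Seek(offset, io.SeekStart); err != nil {
        return err
    }
    // io.Copy streams the body to disk a chunk at a time (32 KB internal buffer),
    // so memory use stays flat no matter how large the response is.
    _, err = io.Copy(fs, resp.Body)
    return err
}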

Golang: Why does increasing the size of a buffered channel eliminate output from my goroutines?

I am trying to understand why making the buffer size of a channel larger causes my code to behave unexpectedly. If the buffer is smaller than my input (100 ints), the output is as expected: 7 goroutines each read a subset of the input and send output on another channel, which prints it. If the buffer is the same size as or larger than the input, I get no output and no error. Am I closing a channel at the wrong time? Do I have the wrong expectation about how buffers work? Or something else?
package main

import (
    "fmt"
    "sync"
)

var wg1, wg2 sync.WaitGroup

func main() {
    share := make(chan int, 10)
    out := make(chan string)
    go printChan(out)
    for j := 1; j <= 7; j++ {
        go readInt(share, out, j)
    }
    for i := 1; i <= 100; i++ {
        share <- i
    }
    close(share)
    wg1.Wait()
    close(out)
    wg2.Wait()
}

func readInt(in chan int, out chan string, id int) {
    wg1.Add(1)
    for n := range in {
        out <- fmt.Sprintf("goroutine:%d was sent %d", id, n)
    }
    wg1.Done()
}

func printChan(out chan string) {
    wg2.Add(1)
    for l := range out {
        fmt.Println(l)
    }
    wg2.Done()
}
To run this:
Small buffer, expected output. http://play.golang.org/p/4r7rTGypPO
Big buffer, no output. http://play.golang.org/p/S-BDsw7Ctu
This has nothing directly to do with the size of the buffer. Adding the buffer exposes a bug in where you're calling waitGroup.Add(1).
You have to add to the WaitGroup before you dispatch the goroutine, otherwise you may end up calling Wait() before the waitGroup.Add(1) executes.
http://play.golang.org/p/YaDhc6n8_B
The reason it works in the first case and not the second is that the synchronous sends ensure the goroutines have executed at least that far. In the second example, the for loop fills up the channel, closes it, and calls Wait before anything else can happen.
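Applied to the code above, that means moving the Add calls into main, before each go statement, and leaving only the Done calls inside the goroutines; a sketch of the changed part of main:
wg2.Add(1)
go printChan(out)
for j := 1; j <= 7; j++ {
    wg1.Add(1) // registered before the goroutine is launched, so Wait can't run early
    go readInt(share, out, j)
}
Inside readInt and printChan, drop the Add calls and keep (ideally defer) the Done calls.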
