downloading files with goroutines? - http

I'm new to Go and I'm learning how to work with goroutines.
I have a function that downloads images:
func imageDownloader(uri string, filename string) {
    fmt.Println("starting download for ", uri)
    outFile, err := os.Create(filename)
    defer outFile.Close()
    if err != nil {
        os.Exit(1)
    }
    client := &http.Client{}
    req, err := http.NewRequest("GET", uri, nil)
    resp, err := client.Do(req)
    defer resp.Body.Close()
    if err != nil {
        panic(err)
    }
    header := resp.ContentLength
    bar := pb.New(int(header))
    rd := bar.NewProxyReader(resp.Body)
    // and copy from reader
    io.Copy(outFile, rd)
}
When I call it by itself as part of another function, it downloads images completely and there is no truncated data.
However, when I modify it to run as a goroutine, images are often truncated or zero-length files.
func imageDownloader(uri string, filename string, wg *sync.WaitGroup) {
    ...
    io.Copy(outFile, rd)
    wg.Done()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    go imageDownloader(url, file, &wg)
    wg.Wait()
}
Am I using WaitGroups incorrectly? What could cause this and how can I fix it?
Update:
Solved it. I had placed the wg.Add() call outside of a loop. :(

While I'm not sure exactly what's causing your issue, here are two options for getting it back into working order.
First, following the WaitGroup example in the sync package docs, call defer wg.Done() at the beginning of your function, so that even if the goroutine ends unexpectedly, the WaitGroup is still decremented properly.
Second, io.Copy returns an error that you're not checking. That's not great practice anyway, but in your particular case it's preventing you from seeing whether there is indeed an error in the copying routine. Check it and deal with it appropriately. It also returns the number of bytes written, which might help you as well.
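Putting both of those together, here's a minimal sketch of the goroutine version. I've dropped the progress bar and used http.Get and log.Printf for brevity; your error handling may differ:
func imageDownloader(uri string, filename string, wg *sync.WaitGroup) {
    defer wg.Done() // decrements the counter even on an early return

    outFile, err := os.Create(filename)
    if err != nil {
        log.Printf("create %s: %v", filename, err)
        return
    }
    defer outFile.Close()

    resp, err := http.Get(uri)
    if err != nil {
        log.Printf("get %s: %v", uri, err)
        return
    }
    defer resp.Body.Close()

    written, err := io.Copy(outFile, resp.Body)
    if err != nil {
        log.Printf("copy %s: %v (wrote %d bytes)", uri, err, written)
    }
}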

Your example doesn't have anything obviously wrong with its use of WaitGroups. As long as you call wg.Add() with the same number as the number of goroutines you launch, or increment it by 1 every time you start a new goroutine, that part is correct.
However, you call os.Exit and panic for certain error conditions in the goroutine, so if you have more than one of these running, a failure in any one of them will terminate all of them, regardless of the use of WaitGroups. If it's failing without a panic message, I would take a look at the os.Exit(1) line.
It would also be good practice in Go to use defer wg.Done() at the start of your function, so that even if an error occurs, the goroutine still decrements its counter. That way your main thread won't hang on completion if one of the goroutines returns an error.
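To illustrate that last point about os.Exit, here's a tiny demonstration (hypothetical, not from your code) of why calling it in any goroutine is dangerous:
package main

import (
    "fmt"
    "os"
    "time"
)

func main() {
    go func() {
        os.Exit(1) // terminates the whole process, not just this goroutine
    }()
    time.Sleep(time.Second)
    fmt.Println("never printed") // the program exits before reaching this
}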

One change I would make in your example is to leverage defer for the Done call. I think defer wg.Done() should be the first statement in your function.
I like WaitGroup's simplicity. However, I do not like that we need to pass the reference into the goroutine, because that means the concurrency logic gets mixed in with your business logic.
So I came up with this generic function to solve this problem for me:
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
    var waitGroup sync.WaitGroup
    waitGroup.Add(len(functions))
    defer waitGroup.Wait()

    for _, function := range functions {
        go func(copy func()) {
            defer waitGroup.Done()
            copy()
        }(function)
    }
}
So your example could be solved this way:
func imageDownloader(uri string, filename string) {
    ...
    io.Copy(outFile, rd)
}
func main() {
    functions := []func(){}
    list := make([]Object, 5) // Object is a placeholder type with uri and filename fields
    for _, object := range list {
        object := object // capture a fresh copy for the closure
        function := func() {
            imageDownloader(object.uri, object.filename)
        }
        functions = append(functions, function)
    }
    Parallelize(functions...)
    fmt.Println("Done")
}
If you would like to use it, you can find it here https://github.com/shomali11/util

How to solve concurrency access of Golang map?

Now I have a map with only one write/delete goroutine and many read goroutines. There are several solutions for concurrent map access, such as RWMutex, sync.Map, concurrent-map, sync/atomic, sync.Value - what's the best choice for me?
RWMutex's read lock seems a little redundant.
sync.Map and concurrent-map focus on many write goroutines.
Your question is a little vague - so I'll break it down.
What form of concurrent access should I use for a map?
The choice depends on the performance you require from the map. I would opt for a simple Mutex (or RWMutex) based approach.
Granted, you can get better performance from a concurrent map: a single sync.Mutex guards all of a map's buckets, whereas in a concurrent map each bucket has its own sync.Mutex.
Again - it all depends on the scale of your program and the performance you require.
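That said, for your specific one-writer/many-readers workload, sync.Map (which you already mention) is documented as optimized for keys that are written once but read many times. A quick sketch, just to show the shape of the API (the keys and values here are made up):
var m sync.Map

// the single writer goroutine can store and delete freely
m.Store("session:42", "alice")
m.Delete("session:17")

// reader goroutines need no explicit locking
if v, ok := m.Load("session:42"); ok {
    fmt.Println(v)
}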
How would I use a mutex for concurrent access?
To ensure the map is being used correctly, you can wrap this in a struct.
type Store struct {
    mux  sync.Mutex // protects Data; T below stands in for your key/value type
    Data map[T]T
}
This is a more object-oriented solution, but it lets us make sure any reads and writes happen safely under the lock. As well as this, we can easily store other information that may be useful for debugging or security, such as an author.
Now, we would implement this with a set of methods like so:
// New initialises a Store with an empty map
func New() *Store {
    return &Store{
        Data: map[T]T{},
    }
}

// Insert adds a new key i to the store and places the value x at this location
// If there is an error, this is returned - if not, this is nil
func (s *Store) Insert(i, x T) error {
    s.mux.Lock()
    defer s.mux.Unlock()
    _, ok := s.Data[i]
    if ok {
        return fmt.Errorf("index %s already exists; use update", i)
    }
    s.Data[i] = x
    return nil
}

// Update changes the value found at key i to x
// If there is an error, this is returned - if not, this is nil
func (s *Store) Update(i, x T) error {
    s.mux.Lock()
    defer s.mux.Unlock()
    _, ok := s.Data[i]
    if !ok {
        return fmt.Errorf("value at index %s does not exist; use insert", i)
    }
    s.Data[i] = x
    return nil
}

// Fetch returns the value found at index i in the store
// If there is an error, this is returned - if not, this is nil
func (s *Store) Fetch(i T) (T, error) {
    s.mux.Lock()
    defer s.mux.Unlock()
    v, ok := s.Data[i]
    if !ok {
        return "", fmt.Errorf("no value for key %s exists", i)
    }
    return v, nil
}

// Delete removes the index i from the store
// If there is an error, this is returned - if not, this is nil
func (s *Store) Delete(i T) (T, error) {
    s.mux.Lock()
    defer s.mux.Unlock()
    v, ok := s.Data[i]
    if !ok {
        return "", fmt.Errorf("index %s already empty", i)
    }
    delete(s.Data, i)
    return v, nil
}
In my solution, I've used a simple sync.Mutex - but you can simply change this code to accommodate RWMutex.
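For example, here's a sketch of Fetch with a sync.RWMutex, so concurrent readers don't block each other (the writer methods would keep using Lock/Unlock):
type Store struct {
    mux  sync.RWMutex
    Data map[T]T
}

// Fetch takes only a read lock, so many readers can proceed in parallel
func (s *Store) Fetch(i T) (T, error) {
    s.mux.RLock()
    defer s.mux.RUnlock()
    v, ok := s.Data[i]
    if !ok {
        return "", fmt.Errorf("no value for key %s exists", i)
    }
    return v, nil
}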
I recommend you take a look at How to use RWMutex in Golang?.

Recursion when overwriting in variable

I hope you can help me out since this gave me quite a headache.
I'm creating a chain of middleware to be executed afterwards, but it looks like it has become recursive: the variable next within the anonymous function points to itself.
type MiddlewareInterface interface {
    // Run the middleware for the given request, and receive the next handler.
    Run(http.ResponseWriter, *http.Request, http.Handler)
}
func createChain(collection []MiddlewareInterface, handler http.Handler) http.Handler {
    next := handler
    for _, middlew := range collection {
        next = http.HandlerFunc(func(w http.ResponseWriter, res *http.Request) {
            middlew.Run(w, res, next)
        })
    }
    return next
}
I know it's kind of a noob question, but I sincerely do want to understand what causes this and how this can be resolved. Looking forward to your answers!
Seems this is the classic closure-variable-in-a-loop problem. In each iteration you are creating a function that captures the variable next itself, not its value at that point, so all of the functions share the same variable and see whatever it holds when they finally run - which, after the last iteration, is the last closure itself. You can fix it by introducing new temporary variables inside the loop scope:
func createChain(collection []MiddlewareInterface, handler http.Handler) http.Handler {
    next := handler
    for _, middlew := range collection {
        thisNext := next
        mw := middlew
        next = http.HandlerFunc(func(w http.ResponseWriter, res *http.Request) {
            mw.Run(w, res, thisNext)
        })
    }
    return next
}
The placement of the new variable definitions may not be exactly right, but the closure issue is definitely the source of your problem. This also isn't how HTTP middleware handlers normally work: they usually wrap each other rather than being chained like this.
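For comparison, here's a minimal sketch of that more common wrapping style, where each middleware takes the next handler and returns a new one. The Middleware type and logging example are illustrative, not from your code:
// Middleware is a function that wraps a handler in another handler.
type Middleware func(http.Handler) http.Handler

func chain(h http.Handler, middlewares ...Middleware) http.Handler {
    // apply in reverse so the first middleware is the outermost
    for i := len(middlewares) - 1; i >= 0; i-- {
        h = middlewares[i](h)
    }
    return h
}

func logging(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        log.Println(r.Method, r.URL.Path)
        next.ServeHTTP(w, r)
    })
}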

How can I block (and join) on a channel fed by an unknown number of goroutines?

I have a recursive function. The function will call itself with various different values depending on the data it gets, so the arity and depth of recursion are not known: each call may call itself zero or more times. The function may return any number of values.
I want to parallelise it by getting goroutines and channels involved. Each recursion of inner runs in its own goroutine, and sends back a value on the channel. The outer function deals with those values.
func outer() []int {
    results := []int{}
    resultsChannel := make(chan int)
    inner := func(...) {
        resultsChannel <- «some result»
        // Recurse in a new goroutine.
        for _, recursionArgument := range «some calculated data» {
            go inner(recursionArgument)
        }
    }
    go inner(«initial values»)
    for {
        result := <-resultsChannel
        results = append(results, result)
        // HELP! How do I decide when to break?
    }
    return results
}
The problem comes with escaping the results channel loop. Because of the 'shape' of the recursion (unknown arity and depth) I can't say "finish after n events" and I can't send a sentinel value.
How do I detect when all my recursions have happened and return from outer? Is there a better way to approach this?
You can use a sync.WaitGroup to manage the collection of goroutines you spawn: call Add(1) before spawning each new goroutine, and Done when each goroutine completes. So something like this:
var wg sync.WaitGroup

inner := func(...) {
    ...
    // Recurse in a new goroutine.
    for _, recursionArgument := range «some calculated data» {
        wg.Add(1)
        go inner(recursionArgument)
    }
    ...
    wg.Done()
}

wg.Add(1)
go inner(«initial values»)
Now waiting on wg will tell you when all the goroutines have completed.
If you are reading the results from a channel, the obvious way to tell when there are no more results is to close the channel. You can arrange this with another goroutine:
go func() {
    wg.Wait()
    close(resultsChannel)
}()
You should now be able to simply range over resultsChannel to read all the results.
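Putting the pieces together, here's a self-contained sketch; the recursive splitting of a number stands in for «some calculated data», since your actual recursion isn't shown:
package main

import (
    "fmt"
    "sync"
)

func outer(initial int) []int {
    results := []int{}
    resultsChannel := make(chan int)
    var wg sync.WaitGroup

    var inner func(n int)
    inner = func(n int) {
        defer wg.Done()
        resultsChannel <- n
        // stand-in recursion: split n until it reaches 1
        for _, arg := range []int{n / 2, n - n/2} {
            if arg > 0 && arg < n {
                wg.Add(1)
                go inner(arg)
            }
        }
    }

    wg.Add(1)
    go inner(initial)

    // close the channel once every recursive goroutine has finished
    go func() {
        wg.Wait()
        close(resultsChannel)
    }()

    for result := range resultsChannel {
        results = append(results, result)
    }
    return results
}

func main() {
    fmt.Println(outer(8))
}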

Using reflection with structs to build generic handler function

I have some trouble building a function that can dynamically use parametrized structs. For that reason my code has 20+ functions that are similar except for basically one type that gets used. Most of my experience is with Java, where I'd just write basic generic functions, or use plain Object as the parameter (and reflection from that point on). I need something similar in Go.
I have several types like:
// The List structs are mostly needed for json marshalling
type OrangeList struct {
    Oranges []Orange
}

type BananaList struct {
    Bananas []Banana
}

type Orange struct {
    Orange_id string
    Field_1   int
    // The fields are different for different types, I am simplifying the code example
}

type Banana struct {
    Banana_id string
    Field_1   int
    // The fields are different for different types, I am simplifying the code example
}
Then I have a function, basically one for each list type:
// In the end there are 20+ of these, the only difference is basically in two types!
// This is very un-DRY!
func buildOranges(rows *sqlx.Rows) ([]byte, error) {
    oranges := OrangeList{} // This type changes
    for rows.Next() {
        orange := Orange{} // This type changes
        err := rows.StructScan(&orange) // This can handle each case already, could also use reflect myself too
        checkError(err, "rows.Scan")
        oranges.Oranges = append(oranges.Oranges, orange)
    }
    checkError(rows.Err(), "rows.Err")
    jsontext, err := json.Marshal(oranges)
    return jsontext, err
}
Yes, I could switch to a more intelligent ORM or SQL framework, but that's beside the point. I want to learn how to build a generic function that can handle this the same way for all my different types.
I got this far, but it still doesn't work properly (target isn't expected struct I think):
func buildWhatever(rows *sqlx.Rows, tgt interface{}) ([]byte, error) {
    tgtValueOf := reflect.ValueOf(tgt)
    tgtType := tgtValueOf.Type()
    targets := reflect.SliceOf(tgtValueOf.Type())
    for rows.Next() {
        target := reflect.New(tgtType)
        err := rows.StructScan(&target) // At this stage target still isn't a 1:1 similar struct, so the StructScan fails... It's some perverted "Value" object instead. Meh.
        // Removed appending to the list because the solutions for that would be similar
        checkError(err, "rows.Scan")
    }
    checkError(rows.Err(), "rows.Err")
    jsontext, err := json.Marshal(targets)
    return jsontext, err
}
So umm, I would need to give the list type and the vanilla type as parameters, then build one of each, and the rest of my logic would probably be fixable quite easily.
Turns out there's an sqlx.StructScan(rows, &destSlice) function that will do your inner loop, given a slice of the appropriate type. The sqlx docs refer to caching the results of reflection operations, so it may have some additional optimizations compared to writing your own.
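A minimal sketch of that route, reusing the Orange types from your question:
func buildOranges(rows *sqlx.Rows) ([]byte, error) {
    oranges := OrangeList{}
    // StructScan consumes all remaining rows into the destination slice
    if err := sqlx.StructScan(rows, &oranges.Oranges); err != nil {
        return nil, err
    }
    return json.Marshal(oranges)
}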
Sounds like the immediate question you're actually asking is "how do I get something out of my reflect.Value that rows.StructScan will accept?" And the direct answer is target.Interface(); it should return an interface{} representing an *Orange you can pass directly to StructScan (no additional & operation needed). Then, I think targets = reflect.Append(targets, reflect.Indirect(target)) will turn your target into a reflect.Value representing an Orange and append it to the slice. targets.Interface() should get you an interface{} representing an []Orange that json.Marshal understands. I say all these 'should's and 'I think's because I haven't tried that route.
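In the same untested spirit, the reflection route with those fixes might look like this, called as buildWhatever(rows, Orange{}). Note that the question's targets needs to be built with reflect.MakeSlice; reflect.SliceOf alone only gives you a type:
func buildWhatever(rows *sqlx.Rows, tgt interface{}) ([]byte, error) {
    tgtType := reflect.TypeOf(tgt)                                // e.g. Orange
    targets := reflect.MakeSlice(reflect.SliceOf(tgtType), 0, 0)  // an empty []Orange as a reflect.Value
    for rows.Next() {
        target := reflect.New(tgtType) // a *Orange wrapped in a reflect.Value
        // target.Interface() unwraps it to an interface{} holding *Orange,
        // which StructScan accepts directly - no extra & needed.
        if err := rows.StructScan(target.Interface()); err != nil {
            return nil, err
        }
        // reflect.Indirect dereferences *Orange to Orange before appending.
        targets = reflect.Append(targets, reflect.Indirect(target))
    }
    if err := rows.Err(); err != nil {
        return nil, err
    }
    return json.Marshal(targets.Interface()) // marshals the underlying []Orange
}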
Reflection, in general, is verbose and slow. Sometimes it's the best or only way to get something done, but it's often worth looking for a way to get your task done without it when you can.
So, if it works in your app, you can also convert Rows straight to JSON, without going through intermediate structs. Here's a sample program (requires sqlite3 of course) that turns sql.Rows into map[string]string and then into JSON. (Note it doesn't try to handle NULL, represent numbers as JSON numbers, or generally handle anything that won't fit in a map[string]string.)
package main

import (
    _ "code.google.com/p/go-sqlite/go1/sqlite3"
    "database/sql"
    "encoding/json"
    "os"
)

func main() {
    db, err := sql.Open("sqlite3", "foo")
    if err != nil {
        panic(err)
    }
    tryQuery := func(query string, args ...interface{}) *sql.Rows {
        rows, err := db.Query(query, args...)
        if err != nil {
            panic(err)
        }
        return rows
    }
    tryQuery("drop table if exists t")
    tryQuery("create table t(i integer, j integer)")
    tryQuery("insert into t values(?, ?)", 1, 2)
    tryQuery("insert into t values(?, ?)", 3, 1)

    // now query and serialize
    rows := tryQuery("select * from t")
    names, err := rows.Columns()
    if err != nil {
        panic(err)
    }
    // vals stores the values from one row
    vals := make([]interface{}, 0, len(names))
    for range names {
        vals = append(vals, new(string))
    }
    // rowMaps stores all rows
    rowMaps := make([]map[string]string, 0)
    for rows.Next() {
        rows.Scan(vals...)
        // now make value list into name=>value map
        currRow := make(map[string]string)
        for i, name := range names {
            currRow[name] = *(vals[i].(*string))
        }
        // accumulating rowMaps is the easy way out
        rowMaps = append(rowMaps, currRow)
    }
    json, err := json.Marshal(rowMaps)
    if err != nil {
        panic(err)
    }
    os.Stdout.Write(json)
}
In theory, you could build this to do fewer allocations by reusing the same rowMap each time and using a json.Encoder to append each row's JSON to a buffer. You could go a step further and not use a rowMap at all, just the lists of names and values. I should say I haven't compared the speed against a reflect-based approach, though I know reflect is slow enough that it might be worth comparing them if you can put up with either strategy.
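A rough sketch of that idea, reusing the names and vals setup from the program above; note it emits one JSON object per line rather than a single array:
enc := json.NewEncoder(os.Stdout)
currRow := make(map[string]string, len(names)) // allocated once, reused per row
for rows.Next() {
    if err := rows.Scan(vals...); err != nil {
        panic(err)
    }
    for i, name := range names {
        currRow[name] = *(vals[i].(*string))
    }
    enc.Encode(currRow) // writes this row's JSON immediately
}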

How to use run time error in golang?

I am trying to download something from 3 servers. My idea is that if the first server is closed, it will use the second server. I noticed that if the first server has been closed, it creates a run time error. I want to know how to use this error; what I need is something like this:
if run time err != nil { do something }
I am new to Golang; I hope someone can help me.
Thank you!
To elaborate on what FUZxxl explained, Go makes a distinction between an error (something which could go wrong indeed went wrong) and an exception (something which could not possibly go wrong actually went wrong).
The distinction can sometimes be subtle (as it relies on what is 'unexpected'), but it can also be clearer than the 'everything is an exception' that you see in other languages.
For instance, consider integers which might overflow. One possibility is to consider it a 'normal' behaviour, which should be handled appropriately:
func safe_add(x, y uint32) (uint32, error) {
    z := x + y
    if z < x || z < y {
        return 0, fmt.Errorf("Integer overflow")
    }
    return z, nil
}
Another is to consider it 'never happens' and have the runtime panic in the unlikely case when it happens against all odds:
func panic_add(x, y uint32) uint32 {
    z, err := safe_add(x, y)
    if err != nil {
        panic(err)
    }
    return z
}
(Note that I use my own 'safe_add' here, but you don't have to of course)
The main difference is in the way you handle the error afterwards. Adding a number to itself until it overflows with errors gives:
func safeloop(u uint32) {
    var err error
    for {
        if u, err = safe_add(u, u); err != nil {
            fmt.Println(err)
            return
        } else {
            fmt.Println(u)
        }
    }
}
While handling panics uses the recover built-in function:
func panicloop(u uint32) {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    for {
        u = panic_add(u, u)
        fmt.Println(u)
    }
}
(full examples on the playground)
Note that the panic version has a much simpler loop, as you basically never expect anything to go wrong and never check for errors. The counterpart is that handling the panic is quite cumbersome, even for a very simple example like this: you defer a function which calls recover, captures the error when it arises, and breaks out of the enclosing function. When your code becomes more complex, tracking exactly where and how the panic arose, and acting on it accordingly, can become much harder than checking for errors where they can arise with the result, err := func_which_may_fail(...) pattern.
You can even alternate between panics, recovers which return errors, errors converted to panics, and so on, but this is (understandably) considered poor design.
There are some good resources on error handling and panics on the Go blog. The spec is a good read too.
In your case, as you expect 'the server is closed' to be a pretty frequent behaviour, you should definitely go the error way, as FUZxxl suggested, but I hope this might be useful to you (or others) to understand how error handling works in Go.
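For your concrete case, here's a minimal sketch of the error-based fallback across servers; the function name and structure are illustrative only:
// downloadWithFallback tries each server in order and returns the first
// response that succeeds.
func downloadWithFallback(urls []string) (*http.Response, error) {
    var lastErr error
    for _, u := range urls {
        resp, err := http.Get(u)
        if err != nil {
            lastErr = err // this server is unreachable; try the next one
            continue
        }
        return resp, nil // the caller must close resp.Body
    }
    return nil, fmt.Errorf("all servers failed, last error: %v", lastErr)
}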
When you do something which could go wrong, you get an error object.
bytes, err = stream.Read(buffer)
To check whether what you tried actually went wrong, compare the error object against nil. nil signals that no error has happened. In case the error is not nil, you can do something about it.
if err != nil {
    // insert error handling here
}
