How can I block (and join) on a channel fed by an unknown number of goroutines? - recursion

I have a recursive function. The function will call itself with different values depending on the data it gets, so the arity and depth of the recursion are not known: each call may call itself zero or more times. The function may produce any number of values.
I want to parallelise it by getting goroutines and channels involved. Each recursion of inner runs in its own goroutine, and sends back a value on the channel. The outer function deals with those values.
func outer(response []int) []int {
    results := make([]int, 0)
    resultsChannel := make(chan int)
    inner := func(...) {
        resultsChannel <- «some result»
        // Recurse in a new goroutine.
        for _, recursionArgument := range «some calculated data» {
            go inner(recursionArgument)
        }
    }
    go inner(«initial values»)
    for {
        result := <-resultsChannel
        results = append(results, result)
        // HELP! How do I decide when to break?
    }
    return results
}
The problem comes with escaping the results channel loop. Because of the 'shape' of the recursion (unknown arity and depth) I can't say "finish after n events" and I can't send a sentinel value.
How do I detect when all my recursions have happened and return from outer? Is there a better way to approach this?

You can use a sync.WaitGroup to manage the collection of goroutines you spawn: call Add(1) before spawning each new goroutine, and Done when each goroutine completes. So something like this:
var wg sync.WaitGroup
inner := func(...) {
    ...
    // Recurse in a new goroutine.
    for _, recursionArgument := range «some calculated data» {
        wg.Add(1)
        go inner(recursionArgument)
    }
    ...
    wg.Done()
}
wg.Add(1)
go inner(«initial values»)
Now waiting on wg will tell you when all the goroutines have completed.
If you are reading the results from a channel, the obvious way to tell when there are no more results is to close the channel. You can achieve this with another goroutine that closes the channel once all the workers are done:
go func() {
    wg.Wait()
    close(resultsChannel)
}()
You should now be able to simply range over resultsChannel to read all the results.
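Putting it all together, here is a minimal, self-contained sketch of the pattern. The recursion rule (splitting n into two halves) is made up purely for illustration; the point is the WaitGroup bookkeeping, the closer goroutine, and the range over the results channel:
package main

import (
    "fmt"
    "sync"
)

func outer(initial int) []int {
    resultsChannel := make(chan int)
    var wg sync.WaitGroup

    // inner sends one result, then recurses in new goroutines.
    var inner func(n int)
    inner = func(n int) {
        defer wg.Done()
        resultsChannel <- n
        if n > 1 {
            // Hypothetical recursion rule: split n until it reaches 1.
            for _, arg := range []int{n / 2, n - n/2} {
                wg.Add(1)
                go inner(arg)
            }
        }
    }

    wg.Add(1)
    go inner(initial)

    // Close the channel once every spawned goroutine has finished.
    go func() {
        wg.Wait()
        close(resultsChannel)
    }()

    var results []int
    for result := range resultsChannel {
        results = append(results, result)
    }
    return results
}

func main() {
    fmt.Println(outer(5))
}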

Related

How to solve concurrency access of Golang map?

I have a map with only one write/delete goroutine and many read goroutines. There are several solutions for a map with concurrent access, such as RWMutex, sync.Map, concurrent-map, sync/atomic, and sync.Value. What's the best choice for me?
RWMutex's read lock seems a little redundant for this case.
sync.Map and concurrent-map focus on the case of many write goroutines.
Your question is a little vague - so I'll break it down.
What form of concurrent access should I use for a map?
The choice depends on the performance you require from the map. I would opt for a simple mutex (or RWMutex) based approach.
Granted, you can get better performance from a concurrent map: a single sync.Mutex guards all of a map's buckets, whereas in a concurrent map each bucket has its own sync.Mutex.
Again - it all depends on the scale of your program and the performance you require.
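To make the "one lock per bucket" idea concrete, here is a rough sketch of a sharded map; the shard count, hash function, and type names are arbitrary choices of mine, not taken from any particular library:
package main

import (
    "fmt"
    "hash/fnv"
    "sync"
)

const shardCount = 16

type shard struct {
    sync.RWMutex
    items map[string]string
}

// ShardedMap spreads keys over several independently locked shards,
// so writers to different shards do not contend with each other.
type ShardedMap [shardCount]*shard

func NewShardedMap() *ShardedMap {
    var m ShardedMap
    for i := range m {
        m[i] = &shard{items: make(map[string]string)}
    }
    return &m
}

// shardFor picks the shard responsible for a key by hashing it.
func (m *ShardedMap) shardFor(key string) *shard {
    h := fnv.New32a()
    h.Write([]byte(key))
    return m[h.Sum32()%shardCount]
}

func (m *ShardedMap) Set(key, value string) {
    s := m.shardFor(key)
    s.Lock()
    defer s.Unlock()
    s.items[key] = value
}

func (m *ShardedMap) Get(key string) (string, bool) {
    s := m.shardFor(key)
    s.RLock()
    defer s.RUnlock()
    v, ok := s.items[key]
    return v, ok
}

func main() {
    m := NewShardedMap()
    m.Set("a", "1")
    fmt.Println(m.Get("a"))
}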
How would I use a mutex for concurrent access?
To ensure the map is accessed safely, you can wrap it in a struct.
type Store struct {
    mu   sync.Mutex
    Data map[T]T // T is a placeholder for your key/value type (e.g. string)
}
This is a more object-oriented solution, but it lets us make sure any reads/writes are performed safely. It also lets us easily store other information that may be useful for debugging or security, such as the author.
Now, we would implement this with a set of methods like so:
// New initialises a Store type with an empty map
func New() *Store {
    return &Store{
        Data: map[T]T{},
    }
}

// Insert adds a new key i to the store and places the value of x at this location
// If there is an error, this is returned - if not, this is nil
func (s *Store) Insert(i, x T) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    _, ok := s.Data[i]
    if ok {
        return fmt.Errorf("index %s already exists; use update", i)
    }
    s.Data[i] = x
    return nil
}

// Update changes the value found at key i to x
// If there is an error, this is returned - if not, this is nil
func (s *Store) Update(i, x T) error {
    s.mu.Lock()
    defer s.mu.Unlock()
    _, ok := s.Data[i]
    if !ok {
        return fmt.Errorf("value at index %s does not exist; use insert", i)
    }
    s.Data[i] = x
    return nil
}

// Fetch returns the value found at index i in the store
// If there is an error, this is returned - if not, this is nil
func (s *Store) Fetch(i T) (T, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    v, ok := s.Data[i]
    if !ok {
        return "", fmt.Errorf("no value for key %s exists", i)
    }
    return v, nil
}

// Delete removes the index i from the store and returns the removed value
// If there is an error, this is returned - if not, this is nil
func (s *Store) Delete(i T) (T, error) {
    s.mu.Lock()
    defer s.mu.Unlock()
    v, ok := s.Data[i]
    if !ok {
        return "", fmt.Errorf("index %s is already empty", i)
    }
    delete(s.Data, i)
    return v, nil
}
In my solution, I've used a simple sync.Mutex, but you can easily change this code to use an RWMutex instead.
I recommend you take a look at How to use RWMutex in Golang?.
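As a quick illustration of the RWMutex variant (still using the placeholder type T from above): swap the Mutex for a sync.RWMutex, keep the full lock for Insert/Update/Delete, and take only a read lock in Fetch so that many readers can proceed at once. A sketch:
type RWStore struct {
    mu   sync.RWMutex
    Data map[T]T
}

// Fetch returns the value found at index i in the store
// Multiple readers may hold the read lock at the same time
func (s *RWStore) Fetch(i T) (T, error) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    v, ok := s.Data[i]
    if !ok {
        return "", fmt.Errorf("no value for key %s exists", i)
    }
    return v, nil
}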

Recursion when overwriting in variable

I hope you can help me out since this gave me quite a headache.
I'm creating a chain of middleware to be executed afterwards, but it looks like it has become recursive: the variable next within the anonymous function points to itself.
type MiddlewareInterface interface {
    // Run the middleware for the given request, and receive the next handler.
    Run(http.ResponseWriter, *http.Request, http.Handler)
}
func createChain(collection []MiddlewareInterface, handler http.Handler) http.Handler {
    next := handler
    for _, middlew := range collection {
        next = http.HandlerFunc(func(w http.ResponseWriter, res *http.Request) {
            middlew.Run(w, res, next)
        })
    }
    return next
}
I know it's kind of a noob question, but I sincerely do want to understand what causes this and how this can be resolved. Looking forward to your answers!
This is the classic closure-over-a-loop-variable problem. You are creating a function that captures next on each iteration, but all of the functions share the same next variable, so they all end up with whatever value it holds after the last iteration. You can fix it by introducing new variables scoped to the loop body:
func createChain(collection []MiddlewareInterface, handler http.Handler) http.Handler {
    next := handler
    for _, middlew := range collection {
        thisNext := next
        mw := middlew
        next = http.HandlerFunc(func(w http.ResponseWriter, res *http.Request) {
            mw.Run(w, res, thisNext)
        })
    }
    return next
}
Possibly the placement of the new variable definitions isn't quite right, but the closure issue is definitely the source of your problem. Note that this isn't how HTTP middleware handlers normally work: they usually wrap each other rather than being chained like this, as in the sketch below.
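For reference, a rough sketch of that wrapping style, where each middleware is a func(http.Handler) http.Handler and the chain is built by wrapping the final handler from the inside out (the names here are illustrative, not from any particular framework):
package middleware

import (
    "log"
    "net/http"
)

// Middleware wraps a handler and returns a new handler.
type Middleware func(http.Handler) http.Handler

// Chain applies the middlewares so that the first one in the slice
// becomes the outermost wrapper around the final handler.
func Chain(h http.Handler, middlewares ...Middleware) http.Handler {
    for i := len(middlewares) - 1; i >= 0; i-- {
        h = middlewares[i](h)
    }
    return h
}

// logRequests is an example middleware: it logs the request path,
// then hands off to the next handler in the chain.
func logRequests(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        log.Println(r.URL.Path)
        next.ServeHTTP(w, r)
    })
}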

How can I create a first-class map iterator in Go?

I am writing a function that iterates over the entries in a map. I want to deal cleanly with items that are added to or deleted from the map while iterating, just as for k, v := range myMap { ... } does, but I am processing only one key/value pair per call, so I can't use range. I want something like:
func processItem(i iterator) bool {
    _, v, ok := i.next()
    if !ok {
        return false
    }
    process(v)
    return true
}
var m = make(map[string]widget)
// ...
i := makeIterator(m)
for processItem(i) {
    // code which might add/remove item from m here
}
I know that range is using a 'hiter' struct and associated functions, as defined in src/runtime/hashmap.go, to perform iteration. Is there some way to gain access to this iterator as a reified (first-class) Go object?
Is there an alternative strategy for iterating over a map which would deal well with insertions/deletions but give a first-class iterator object?
Bonus question: is there an alternative strategy for iterating over a map which could also deal with the map and iterator being serialised to disk and then restored, with iteration continuing from where it left off? (Obviously the built-in range iterator does not have this capability!)
You can't :(
The only way to iterate over a map is by using for range and you can't get an iterator object out of that.
You can use channels as iterators.
Your iterator would be a function returning a channel that communicates the current iteration value to whoever receives it:
func iterator(m map[string]widget) chan iteration {
    c := make(chan iteration)
    go func() {
        for k, v := range m {
            c <- iteration{k, v}
        }
        close(c)
    }()
    return c
}
This is of course not generic; you could make it generic using interface{} and/or reflection, but that shouldn't be too hard if you actually need it.
Closing the channel at the end signals that there are no more values, as demonstrated below.
The iteration type is just there so you can send the key and value at the same time; it would look something like this:
type iteration struct {
    key   string
    value widget
}
With this you can then do this (on play):
m := map[string]widget{"foo": widget{3}, "bar": widget{4}}
i := iterator(m)
iter, ok := <- i
fmt.Println(iter, ok)
iter, ok = <- i
fmt.Println(iter, ok)
iter, ok = <- i
fmt.Println(iter, ok)
which yields
{foo {3}} true
{bar {4}} true
{ {0}} false
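Because the goroutine closes the channel when it is done, you can also just range over it, which is usually the tidier way to consume the iterator:
for iter := range iterator(m) {
    fmt.Println(iter.key, iter.value)
}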
A very simple approach is to obtain a list of all the keys in the map, and package the list and the map up in an iterator struct. When we want the next key, we take the next one from the list that hasn't been deleted from the map:
type iterator struct {
    m    map[string]widget
    keys []string
}

func newIterator(m map[string]widget) *iterator {
    it := iterator{m, make([]string, len(m))}
    i := 0
    for k := range m {
        it.keys[i] = k
        i++
    }
    return &it
}

func (it *iterator) next() (string, widget, bool) {
    for len(it.keys) > 0 {
        k := it.keys[0]
        it.keys = it.keys[1:]
        if _, exists := it.m[k]; exists {
            return k, it.m[k], true
        }
    }
    return "", widget{0}, false
}
See running example on play.
You can define your own map type. This is also a good way to solve the concurrency problem:
type ConcurrentMap struct {
    sync.RWMutex
    items map[string]interface{}
}

type ConcurrentMapItem struct {
    Key   string
    Value interface{}
}

func (cm *ConcurrentMap) Iter() <-chan ConcurrentMapItem {
    c := make(chan ConcurrentMapItem)
    f := func() {
        // A read lock is enough for iteration; it is held until the
        // consumer has received every item and the channel is closed.
        cm.RLock()
        defer cm.RUnlock()
        for k, v := range cm.items {
            c <- ConcurrentMapItem{k, v}
        }
        close(c)
    }
    go f()
    return c
}
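A usage sketch for the type above. Note that the read lock is held until the channel is closed, so always drain the channel to completion:
cm := &ConcurrentMap{items: map[string]interface{}{"foo": 1, "bar": 2}}
for item := range cm.Iter() {
    fmt.Println(item.Key, item.Value)
}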

downloading files with goroutines?

I'm new to Go and I'm learning how to work with goroutines.
I have a function that downloads images:
func imageDownloader(uri string, filename string) {
    fmt.Println("starting download for ", uri)
    outFile, err := os.Create(filename)
    defer outFile.Close()
    if err != nil {
        os.Exit(1)
    }
    client := &http.Client{}
    req, err := http.NewRequest("GET", uri, nil)
    resp, err := client.Do(req)
    defer resp.Body.Close()
    if err != nil {
        panic(err)
    }
    header := resp.ContentLength
    bar := pb.New(int(header))
    rd := bar.NewProxyReader(resp.Body)
    // and copy from reader
    io.Copy(outFile, rd)
}
When I call it by itself as part of another function, it downloads images completely and there is no truncated data.
However, when I try to modify it to make it a goroutine, images are often truncated or zero length files.
func imageDownloader(uri string, filename string, wg *sync.WaitGroup) {
    ...
    io.Copy(outFile, rd)
    wg.Done()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    go imageDownloader(url, file, &wg)
    wg.Wait()
}
Am I using WaitGroups incorrectly? What could cause this and how can I fix it?
Update:
Solved it. I had placed the wg.Add() call outside of a loop. :(
While I'm not sure exactly what's causing your issue, here's two options for how to get it back into working order.
First, looking at the example of how to use WaitGroups from the sync library, try calling defer wg.Done() at the beginning of your function to ensure that even if the goroutine ends unexpectedly, the WaitGroup is properly decremented.
Second, io.Copy returns an error that you're not checking. That's not great practice anyway, but in your particular case it's preventing you from seeing if there is indeed an error in the copying routine. Check it and deal with it appropriately. It also returns the number of bytes written, which might help you as well.
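Putting those two suggestions together, here is a sketch of what the goroutine version might look like. I've dropped the progress bar for brevity, used http.Get instead of building the request by hand, and reduced error handling to logging, so adapt as needed:
package main

import (
    "io"
    "log"
    "net/http"
    "os"
    "sync"
)

// imageDownloader is a sketch of the goroutine version with the fixes
// above applied: wg.Done is deferred first, and every error is checked.
func imageDownloader(uri, filename string, wg *sync.WaitGroup) {
    defer wg.Done() // always decrement, even on an early return

    outFile, err := os.Create(filename)
    if err != nil {
        log.Println("create:", err)
        return
    }
    defer outFile.Close()

    resp, err := http.Get(uri)
    if err != nil {
        log.Println("get:", err)
        return
    }
    defer resp.Body.Close()

    if _, err := io.Copy(outFile, resp.Body); err != nil {
        log.Println("copy:", err)
    }
}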
Your example doesn't have anything obviously wrong with its use of WaitGroups. As long as you are calling wg.Add() with the same number as the number of goroutines you launch, or incrementing it by 1 every time you start a new goroutine, that should be correct.
However, you call os.Exit and panic for certain error conditions in the goroutine, so if you have more than one of these running, a failure in any one of them will terminate all of them, regardless of the use of WaitGroups. If it's failing without a panic message, I would take a look at the os.Exit(1) line.
It would also be good practice in Go to use defer wg.Done() at the start of your function, so that even if an error occurs the goroutine still decrements its counter. That way your main thread won't hang on completion if one of the goroutines returns an error.
One change I would make in your example is to leverage defer for Done: the defer wg.Done() should be the first statement in your function.
I like WaitGroup's simplicity. However, I do not like that we need to pass a reference to it into the goroutine, because that mixes the concurrency logic with the business logic.
So I came up with this generic function to solve this problem for me:
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
    var waitGroup sync.WaitGroup
    waitGroup.Add(len(functions))
    defer waitGroup.Wait()
    for _, function := range functions {
        go func(fn func()) {
            defer waitGroup.Done()
            fn()
        }(function)
    }
}
So your example could be solved this way:
func imageDownloader(uri string, filename string) {
    ...
    io.Copy(outFile, rd)
}

func main() {
    functions := []func(){}
    list := make([]Object, 5) // Object is a placeholder type with uri and filename fields
    for _, object := range list {
        obj := object // copy the loop variable so each closure captures its own value
        function := func() {
            imageDownloader(obj.uri, obj.filename)
        }
        functions = append(functions, function)
    }
    Parallelize(functions...)
    fmt.Println("Done")
}
If you would like to use it, you can find it here https://github.com/shomali11/util

Is it safe to remove selected keys from map within a range loop?

How can one remove selected keys from a map?
Is it safe to combine delete() with range, as in the code below?
package main

import "fmt"

type Info struct {
    value string
}

func main() {
    table := make(map[string]*Info)
    for i := 0; i < 10; i++ {
        str := fmt.Sprintf("%v", i)
        table[str] = &Info{str}
    }
    for key, value := range table {
        fmt.Printf("deleting %v=>%v\n", key, value.value)
        delete(table, key)
    }
}
https://play.golang.org/p/u1vufvEjSw
This is safe! You can also find a similar sample in Effective Go:
for key := range m {
    if key.expired() {
        delete(m, key)
    }
}
And the language specification:
The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next. If the map is nil, the number of iterations is 0.
Sebastian's answer is accurate, but I wanted to know why it was safe, so I did some digging into the map source code. It looks like on a call to delete(m, k), it basically just sets a flag (as well as changing the count value) instead of actually deleting the value:
b->tophash[i] = Empty;
(Empty is a constant for the value 0)
What the map appears to actually be doing is allocating a set number of buckets depending on the size of the map, which grows as you perform inserts at the rate of 2^B (from this source code):
byte *buckets; // array of 2^B Buckets. may be nil if count==0.
So there are almost always more buckets allocated than you're using, and when you do a range over the map, it checks the tophash value of each of those 2^B buckets to see whether it can skip over it.
To summarize, the delete within a range is safe because the data is technically still there, but when it checks the tophash it sees that it can just skip over it and not include it in whatever range operation you're performing. The source code even includes a TODO:
// TODO: consolidate buckets if they are mostly empty
// can only consolidate if there are no live iterators at this size.
This explains why using the delete(m, k) function doesn't actually free up memory: it just removes the entry from the buckets you're allowed to access. If you want to free up the actual memory, you'll need to make the entire map unreachable so that garbage collection can step in. You can do this with a line like
m = nil
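To make that concrete, a small sketch (the variable names and sizes are mine):
package main

import "strconv"

func main() {
    m := make(map[string][]byte)
    for i := 0; i < 1000; i++ {
        m[strconv.Itoa(i)] = make([]byte, 1<<10)
    }
    for k := range m {
        // The values become collectible once deleted, but the map's
        // bucket array itself is not shrunk by delete.
        delete(m, k)
    }
    // Dropping the last reference makes the whole map, buckets and all,
    // eligible for garbage collection.
    m = nil
}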
I was wondering if a memory leak could happen. So I wrote a test program:
package main

import (
    log "github.com/Sirupsen/logrus"
    "math/rand"
    "os"
    "os/signal"
    "time"
)

func main() {
    log.Info("=== START ===")
    defer func() { log.Info("=== DONE ===") }()

    go func() {
        m := make(map[string]string)
        for {
            k := GenerateRandStr(1024)
            m[k] = GenerateRandStr(1024 * 1024)
            for k2 := range m {
                delete(m, k2)
                break
            }
        }
    }()

    osSignals := make(chan os.Signal, 1)
    signal.Notify(osSignals, os.Interrupt)
    for {
        select {
        case <-osSignals:
            log.Info("Received ^C command. Exit")
            return
        }
    }
}

func GenerateRandStr(n int) string {
    rand.Seed(time.Now().UnixNano())
    const letterBytes = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b := make([]byte, n)
    for i := range b {
        b[i] = letterBytes[rand.Int63()%int64(len(letterBytes))]
    }
    return string(b)
}
Looks like the GC does free the memory, so it's okay.
In short, yes. See previous answers.
And also this, from here:
ianlancetaylor commented on Feb 18, 2015
I think the key to understanding this is to realize that while executing the body of a for/range statement, there is no current iteration. There is a set of values that have been seen, and a set of values that have not been seen. While executing the body, one of the key/value pairs that has been seen--the most recent pair--was assigned to the variable(s) of the range statement. There is nothing special about that key/value pair, it's just one of the ones that has already been seen during the iteration.
The question he's answering is about modifying map elements in place during a range operation, which is why he mentions the "current iteration". But it's also relevant here: you can delete keys during a range, and that just means that you won't see them later on in the range (and if you already saw them, that's okay).

Resources