Golang: Why does increasing the size of a buffered channel eliminate output from my goroutines? - asynchronous

I am trying to understand why making the buffer size of a channel larger causes my code to behave unexpectedly. If the buffer is smaller than my input (100 ints), the output is as expected: 7 goroutines each read a subset of the input and send output on another channel, which prints it. If the buffer is the same size as or larger than the input, I get no output and no error. Am I closing a channel at the wrong time? Do I have the wrong expectation about how buffers work? Or is it something else?
package main
import (
"fmt"
"sync"
)
var wg1, wg2 sync.WaitGroup
func main() {
share := make(chan int, 10)
out := make(chan string)
go printChan(out)
for j:= 1; j<=7; j++ {
go readInt(share, out, j)
}
for i:=1; i<=100; i++ {
share <- i
}
close(share)
wg1.Wait()
close(out)
wg2.Wait()
}
func readInt(in chan int, out chan string, id int) {
wg1.Add(1)
for n := range in {
out <- fmt.Sprintf("goroutine:%d was sent %d", id, n)
}
wg1.Done()
}
func printChan(out chan string){
wg2.Add(1)
for l := range out {
fmt.Println(l)
}
wg2.Done()
}
To run this:
Small buffer, expected output. http://play.golang.org/p/4r7rTGypPO
Big buffer, no output. http://play.golang.org/p/S-BDsw7Ctu

This has nothing directly to do with the size of the buffer. Adding the buffer exposes a bug in where you call wg1.Add(1) and wg2.Add(1).
You have to add to the WaitGroup before you dispatch the goroutine; otherwise you may end up calling Wait() before Add(1) executes, in which case Wait() returns immediately because the counter is still zero.
http://play.golang.org/p/YaDhc6n8_B
The reason it works in the first example and not the second is that the blocking sends on the full channel ensure the goroutines have executed at least far enough to call Add(1). In the second example, the for loop fills the channel, closes it, and calls Wait() before anything else can happen.
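For reference, here is a minimal sketch of the corrected ordering. It is not the code behind the playground link; the helper name fanOut is hypothetical, the WaitGroups are local rather than global, and the printer counts lines instead of printing them so the result is easy to check.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut (hypothetical helper) sends 1..n through a channel with the given
// buffer size to the given number of workers and returns how many lines the
// printer stage received.
func fanOut(n, bufSize, workers int) int {
	var readers, printer sync.WaitGroup
	share := make(chan int, bufSize)
	out := make(chan string)

	count := 0
	printer.Add(1) // Add happens before the goroutine is started
	go func() {
		defer printer.Done()
		for range out {
			count++
		}
	}()

	for id := 1; id <= workers; id++ {
		readers.Add(1) // Add happens before the goroutine is started
		go func(id int) {
			defer readers.Done()
			for v := range share {
				out <- fmt.Sprintf("goroutine:%d was sent %d", id, v)
			}
		}(id)
	}

	for i := 1; i <= n; i++ {
		share <- i
	}
	close(share)
	readers.Wait() // every worker has drained share
	close(out)
	printer.Wait() // the printer has drained out
	return count
}

func main() {
	fmt.Println(fanOut(100, 10, 7))  // buffer smaller than the input
	fmt.Println(fanOut(100, 200, 7)) // buffer larger than the input
}
```

With Add called in the launching goroutine, both calls should report all 100 lines regardless of the buffer size.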

Related

how to create a real race condition in golang

problem description
Recently I learned about the -race option, which checks for race conditions in Go. The full command is go run -race xxx.go, and it has helped me a lot. But for the code below, I think the check result is wrong. I tried several approaches (see "My try to get a panic" below) to get a REAL panic, but failed. So I want to know whether the code is correct and the race check is wrong, or whether you can revise my code so that I can SEE a real panic. Thanks a lot.
The code
package main
import "fmt"
type myType struct {
A int
}
func main(){
c:=make(chan bool)
x := new(myType)
go func(){
x = new(myType) // write to x
c <- true
}()
_ = *x // read from x
<-c
fmt.Println("end")
}
The race check result
go run -race test.go
==================
WARNING: DATA RACE
Write at 0x00c00009c010 by goroutine 6:
main.main.func1()
/Users/yaodongen/test.go:12 +0x56
Previous read at 0x00c00009c010 by main goroutine:
main.main()
/Users/yaodongen/test.go:15 +0xe2
Goroutine 6 (running) created at:
main.main()
/Users/yaodongen/test.go:11 +0xd4
==================
end
Found 1 data race(s)
exit status 66
My point
I tried to find the reason for the race condition report. A post (in Chinese) mentions that the operation a = int64(0) is not atomic. For example, on a 32-bit machine, where an int64 is 64 bits long, the CPU could copy half of the data and then be interrupted. In the following code ("Prove the golang copy is not atomic"), I show that this is true. But in my case, x = new(myType) copies a pointer value, and I think that can be done in a single CPU copy. In other words, the operation is atomic and will never cause a race condition.
Prove the golang copy is not atomic
package main
import "fmt"
import "time"
func main(){
var x = [...]int{1,1,1,1,1,1}
c := make(chan int, 100)
go func(){
for i:=0;;i++{
if i&1 == 0 {
x = [...]int{2,2,2,2,2,2} // write to x
}else{
x = [...]int{1,1,1,1,1,1} // write to x
}
c<-0 // let other goroutine see the change of x
<-c
}
}()
go func(){
for {
d := x // read from x
if d[0] != d[5] {
fmt.Println(d)
panic("error") // proved the copy operation is not atomic
}
c<-0
<-c
}
}()
time.Sleep(time.Millisecond * 10000)
fmt.Println("end")
}
My try to get a panic
But it failed. The code below should panic if a race condition exists (by dereferencing a wrong memory address).
package main
import "fmt"
import "time"
type myType struct {
A int
}
func main(){
x := new(myType)
c := make(chan int, 100)
go func(){
for {
x = new(myType) // write to x
c<-0
<-c
}
}()
for i:=0; i<4; i++{
go func(){
for {
_ = *x // if exists a race condition, `*x` will visit a wrong memory address, and will panic
c<-0
<-c
}
}()
}
time.Sleep(time.Second * 10)
fmt.Println("end")
}
Go's race detector never gives false positives. If it tells you there's a race, then there is a race. It might not recognize all races (they have to actually happen during the run to be detectable), but what it reports is always a real race (bugs in the race detector not counting).
The race condition in your example is clear and simple. You have 2 goroutines; one reads a variable and the other writes it, without synchronization. This is the recipe for a race condition. Even if a pointer-sized store happens to be a single machine instruction, without synchronization nothing guarantees when, or whether, the other goroutine observes it.
Race conditions make your app unpredictable. A race condition's behavior is undefined. Any observed behavior falls under undefined, including the absence of a panic. Don't tempt the devil: if there's a race condition, use proper synchronization. End of story.
See Is it safe to read a function pointer concurrently without a lock?
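As one illustration of "proper synchronization", here is a minimal sketch that guards the shared pointer with a sync.Mutex. The guarded type and the set, get, and run helpers are hypothetical names introduced for this example; a channel or sync/atomic would be other reasonable choices.

```go
package main

import (
	"fmt"
	"sync"
)

type myType struct {
	A int
}

// guarded (hypothetical) keeps the shared pointer next to the mutex that protects it.
type guarded struct {
	mu sync.Mutex
	x  *myType
}

func (g *guarded) set(v *myType) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.x = v
}

func (g *guarded) get() *myType {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.x
}

// run performs one concurrent write and one read, then returns the final A.
func run() int {
	g := &guarded{x: new(myType)}
	done := make(chan bool)
	go func() {
		g.set(&myType{A: 1}) // write to x under the lock
		done <- true
	}()
	_ = *g.get() // read x under the lock
	<-done       // the write is complete once this receive returns
	return g.get().A
}

func main() {
	fmt.Println(run()) // prints 1
}
```

Because every access to x goes through the mutex, the read and the write are ordered with respect to each other and the unsynchronized access from the question is gone.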

Golang multiple timers with map+channel+mutex

So I'm implementing multiple timers using map/channel/mutex. To cancel a timer, I keep a map of channels that carries the cancel signals. The code is below:
var timerCancelMap = make(map[string]chan interface{})
var mutexLocker sync.Mutex
func cancelTimer(timerIndex string) {
mutexLocker.Lock()
defer mutexLocker.Unlock()
timerCancelMap[timerIndex] = make(chan interface{})
timerCancelMap[timerIndex] <- struct{}{}
}
func timerStart(timerIndex string) {
fmt.Println("###### 1. start timer: ", timerIndex)
timerStillActive := true
newTimer := time.NewTimer(time.Second * 10)
for timerStillActive {
mutexLocker.Lock()
select {
case <-newTimer.C:
timerStillActive = false
fmt.Println("OOOOOOOOO timer time's up: ", timerIndex)
case <-timerCancelMap[timerIndex]:
timerCancelMap[timerIndex] = nil
timerStillActive = false
fmt.Println("XXXXXXXXX timer canceled: ", timerIndex)
default:
}
mutexLocker.Unlock()
}
fmt.Println("###### 2. end timer: ", timerIndex)
}
func main() {
for i := 0; i < 10; i++ {
go timerStart(strconv.Itoa(i))
if i%10 == 0 {
cancelTimer(strconv.Itoa(i))
}
}
}
This version deadlocks. If I remove all the mutex Lock/Unlock calls, it fails with "concurrent map read and map write" instead. So what am I doing wrong?
I know sync.Map would solve my problem, but the performance suffers significantly, so I'd like to stick with the plain map.
Thanks in advance!
There are a few things going on here that will cause problems with your program:
cancelTimer creates the channel with make(chan interface{}), which has no buffer (a buffered channel would be created with, e.g., make(chan interface{}, 1)). This means that sending to the channel blocks until another goroutine attempts to receive from that same channel. So when you call cancelTimer from the main goroutine, it locks mutexLocker and then blocks on sending the cancellation; meanwhile, no other goroutine can lock mutexLocker to receive from the cancellation channel, which causes a deadlock.
After adding a buffer, the cancelTimer call will return immediately.
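The difference between the two kinds of channel can be seen in a small sketch. The trySend helper is a hypothetical name introduced here; it uses select with default to report whether a send would go through without blocking.

```go
package main

import "fmt"

// trySend (hypothetical helper) reports whether a value could be sent on ch
// without blocking.
func trySend(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan struct{})
	buffered := make(chan struct{}, 1)

	fmt.Println(trySend(unbuffered)) // false: no receiver is waiting
	fmt.Println(trySend(buffered))   // true: the value fits in the buffer
	fmt.Println(trySend(buffered))   // false: the buffer is now full
}
```

A plain send on the unbuffered channel would block at the first call, which is the situation cancelTimer is in while it still holds the mutex.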
We then run into a few other small issues. The first is that the program quits immediately without printing anything. This happens because, after launching the test goroutines and sending the cancel, the main goroutine has done all of its work and returns from main, which ends the program. So we need to make the main goroutine wait for the others, which sync.WaitGroup is very good for:
func main() {
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
timerStart(strconv.Itoa(i))
}(i)
if i%10 == 0 {
cancelTimer(strconv.Itoa(i))
}
}
wg.Wait()
}
I can see you've added mutexLocker to protect the map and later added the for loop to give each goroutine an opportunity to acquire mutexLocker and check its timer. This results in a lot of work for the computer, and more complicated code than is necessary. Instead of having timerStart (called testTimer here) look up its index in the cancellations map, we can provide the cancellation channel as an argument:
func testTimer(i int, cancel <-chan interface{}) {
and have the main function create the channels. You will then be able to remove the map access, the mutexLocker locking, and the for loop from testTimer. If you still require the map for purposes not shown here, you can put the same channel in the map that you pass to testTimer, and if not you can remove all of that code too.
This all ends up looking something like https://play.golang.org/p/iQUvc52B6Nk
Hope that helps 👍

Golang persistent channel accepting input from multiple function calls

I have a function a:
func a(input *some_type) {
// do sth.
b(input)
}
This function gets called multiple times.
I want a function b to wait indefinitely for input from function a and perform an action when it has collected n inputs.
func b(input *some_type) {
// wait until received n inputs then do sth. with all inputs
}
How would I go about doing this? My first thought was to use a sync.WaitGroup with a channel between a and b.
This is a common producer-consumer problem. Use a channel to wait on input from another goroutine. Does something like this help?
In this particular example, you would have to call go b(c) again after collecting the inputs, because b terminates, but you could easily wrap whatever b does in an infinite for loop, or do whatever else needs to happen.
Please note that in this example an unbuffered channel is used, which forces both goroutines to meet at the same time to "hand off" the *Thing. If you want the producer (a's process) not to have to wait, you can use a buffered channel, which is created like so:
c := make(chan *Thing, n)
where n is the number of items the channel can store. This allows the producer to queue several items.
https://play.golang.org/p/X14_QsSSU4
package main
import (
"fmt"
"time"
)
type Thing struct {
N int
}
func a(t *Thing, c chan (*Thing)) {
// stuff happens. whee
c <- t
}
func b(c chan (*Thing)) {
things := []*Thing{}
for i := 0; i < 10; i++ {
t := <-c
things = append(things, t)
fmt.Printf("I have %d things\n", i+1)
}
fmt.Println("I now have 10 things! Let's roll!")
// do stuff with your ten things
}
func main() {
fmt.Println("Hello, playground")
c := make(chan (*Thing))
go b(c)
// this would probably be done producer-consumer like in a go-routine
for i := 0; i < 10; i++ {
a(&Thing{i}, c)
time.Sleep(time.Second)
}
time.Sleep(time.Second)
fmt.Println("Program finished")
}
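If b should keep running and act on every group of n inputs, the loop variant mentioned above might look like the following sketch. The collect and batchSizes helpers and the out channel are hypothetical; a trailing partial batch is simply dropped when the input channel is closed.

```go
package main

import "fmt"

type Thing struct {
	N int
}

// collect (hypothetical) reads from in until it is closed. Every time it has
// gathered size inputs it sends them on out as one batch. A trailing partial
// batch is discarded.
func collect(in <-chan *Thing, out chan<- []*Thing, size int) {
	defer close(out)
	batch := make([]*Thing, 0, size)
	for t := range in {
		batch = append(batch, t)
		if len(batch) == size {
			out <- batch
			batch = make([]*Thing, 0, size) // start a fresh batch
		}
	}
}

// batchSizes feeds total things through collect and returns the size of every
// batch that came out.
func batchSizes(total, size int) []int {
	in := make(chan *Thing)
	out := make(chan []*Thing)
	go collect(in, out, size)
	go func() {
		for i := 0; i < total; i++ {
			in <- &Thing{N: i}
		}
		close(in)
	}()
	var sizes []int
	for b := range out {
		sizes = append(sizes, len(b))
	}
	return sizes
}

func main() {
	fmt.Println(batchSizes(25, 10)) // prints [10 10]
}
```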

Golang http server implementation

I have read that net/http starts a goroutine for each connection, and I have a few questions. I haven't seen any parameter to limit the number of goroutines it spawns. For example, if I have to handle 1 million concurrent requests per second, what will happen? Do we have any control over the spawned goroutines? If it spawns one goroutine per connection, won't it choke my entire system? What is the recommended way of handling a huge number of concurrent requests in a Go web server? I have to handle both synchronous and asynchronous responses.
The job/worker pattern is a well-known Go concurrency pattern suited to this task.
Multiple goroutines can read from a single channel, distributing the work across CPU cores, hence the name "workers". In Go, this pattern is easy to implement: start a number of goroutines that take a channel as a parameter and send values to that channel. The Go runtime handles the distribution and multiplexing.
package main
import (
"fmt"
"sync"
"time"
)
func worker(tasksCh <-chan int, wg *sync.WaitGroup) {
defer wg.Done()
for {
task, ok := <-tasksCh
if !ok {
return
}
d := time.Duration(task) * time.Millisecond
time.Sleep(d)
fmt.Println("processing task", task)
}
}
func pool(wg *sync.WaitGroup, workers, tasks int) {
tasksCh := make(chan int)
for i := 0; i < workers; i++ {
go worker(tasksCh, wg)
}
for i := 0; i < tasks; i++ {
tasksCh <- i
}
close(tasksCh)
}
func main() {
var wg sync.WaitGroup
wg.Add(36)
go pool(&wg, 36, 50)
wg.Wait()
}
All the worker goroutines run concurrently, waiting for the channel to give them work. They receive their tasks almost immediately, one after another.
Here is a great article about how you can handle 1 million requests per minute in go: http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang/
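A related technique, different from the worker pool above, is to cap the number of requests being handled at once with a buffered channel used as a semaphore. The sketch below is an assumption-laden illustration, not something from the linked article: limit, statusFor, and okHandler are hypothetical names, and rejecting excess requests with 503 is only one possible policy.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limit (hypothetical) wraps next so that at most n requests run at the same
// time. Requests beyond that are rejected with 503 instead of queuing.
func limit(n int, next http.Handler) http.Handler {
	sem := make(chan struct{}, n)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // acquire a slot
			defer func() { <-sem }() // release it when the handler returns
			next.ServeHTTP(w, r)
		default:
			http.Error(w, "server busy", http.StatusServiceUnavailable)
		}
	})
}

// okHandler is a stand-in for the real application handler.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
})

// statusFor sends one in-memory request to h and returns the status code.
func statusFor(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(limit(1, okHandler))) // 200: a slot is free
	fmt.Println(statusFor(limit(0, okHandler))) // 503: no slots at all
}
```

In a real server the wrapped handler would be passed to http.ListenAndServe; net/http still starts a goroutine per connection, but the rejected ones return almost immediately.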

How to find out nothing is being received in an unbuffered channel without closing it?

Is there a way to know whether all the values in a channel have been consumed? I'm making a crawler which recursively fetches sites starting from a seed site. I'm not closing the channel because it is fed by the server and should crawl every time a new site is sent. For a given seed site, I can't find a better way to detect completion of a subtask other than timing out. If there were a way to know that no value is left in the channel to be consumed, my program could leave the subtask and continue listening to the server.
There is no such thing as "queued in an unbuffered channel." If the channel is unbuffered, it is by definition always empty. If it is buffered, it may hold any number of elements up to its capacity. But acting on how many elements are in it is inherently racy, because the count (which len reports for a buffered channel) can change before you use it, so don't design that way.
Ideally, avoid designs that need to know when children are complete, but when you must, pass them a channel on which to respond to you. When they respond, you know they're complete.
The kind of problem you're describing is well covered in the Go blogs and talks:
Go Concurrency Patterns: Pipelines and cancellation
Go Concurrency Patterns: Context
Concurrency is not parallelism
Go Concurrency Patterns
Advanced Go Concurrency Patterns
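The "send them a channel to respond on" idea from this answer can be sketched as follows. The crawl and crawlAll functions are hypothetical stand-ins for the real work; the only point is that the parent counts one reply per child.

```go
package main

import "fmt"

// crawl (hypothetical) stands in for one unit of work. It reports completion
// by sending on done.
func crawl(site string, done chan<- string) {
	// ... fetch and process site here ...
	done <- site
}

// crawlAll starts one goroutine per site and returns once every one of them
// has replied.
func crawlAll(sites []string) int {
	done := make(chan string)
	for _, s := range sites {
		go crawl(s, done)
	}
	finished := 0
	for range sites {
		<-done // exactly one reply per child
		finished++
	}
	return finished
}

func main() {
	fmt.Println(crawlAll([]string{"a.example", "b.example", "c.example"})) // prints 3
}
```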
You can determine whether or not a goroutine is blocked on the other end of a channel by using default in a select statement. For example:
package main
import (
"fmt"
"time"
)
var c = make(chan int)
func produce(i int) {
c <- i
}
func consume() {
for {
select {
case i := <-c:
fmt.Println(i)
default:
return
}
}
}
func main() {
for i := 0; i < 10; i++ {
go produce(i)
}
time.Sleep(time.Millisecond)
consume()
}
Keep in mind that this isn't a queue, though. If you had a single producing goroutine that looped and produced multiple values, then in the time between sending one value and getting back around the loop to send the next, the default case would fire and your consumer would move on.
You could use a timeout:
case <-time.After(time.Second):
Which would give your producer a second to produce another value, but you're probably better off using a terminal value. Wrap whatever you're sending in a struct:
type message struct {
err error
data theOriginalType
}
And send that thing instead. Then use io.EOF or a custom error var Done = errors.New("DONE") to signal completion.
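Here is a minimal sketch of the terminal-value approach. It assumes int as the payload in place of theOriginalType, and the produce, consume, and sumTo helpers are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Done is the sentinel error that marks the end of a stream.
var Done = errors.New("DONE")

type message struct {
	err  error
	data int // stands in for theOriginalType
}

// produce sends n values followed by the terminal message.
func produce(c chan<- message, n int) {
	for i := 0; i < n; i++ {
		c <- message{data: i}
	}
	c <- message{err: Done}
}

// consume reads until it sees Done and returns the sum of the data received.
// The channel itself is never closed.
func consume(c <-chan message) int {
	sum := 0
	for m := range c {
		if errors.Is(m.err, Done) {
			break
		}
		sum += m.data
	}
	return sum
}

// sumTo wires one producer to one consumer.
func sumTo(n int) int {
	c := make(chan message)
	go produce(c, n)
	return consume(c)
}

func main() {
	fmt.Println(sumTo(5)) // 0+1+2+3+4 = 10
}
```

The channel stays open after the subtask finishes, so the consumer can go back to listening for the next seed on the same channel.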
Since you have a recursive problem, why not use a WaitGroup? Each time you start a new task, increment the wait group, and each time a task completes, decrement it. Then have an outer task wait for completion. For example, here's a really inefficient way of calculating a Fibonacci number:
package main
import (
"fmt"
"sync"
)
var wg sync.WaitGroup
func fib(c chan int, n int) {
defer wg.Done()
if n < 2 {
c <- n
} else {
wg.Add(2)
go fib(c, n - 1)
go fib(c, n - 2)
}
}
func main() {
wg.Add(1)
c := make(chan int)
go fib(c, 18)
go func() {
wg.Wait()
close(c)
}()
sum := 0
for i := range c {
sum += i
}
fmt.Println(sum)
}

Resources