Where does named pipe (FIFO) data go when reader disconnects? - unix

Let's say I have a producer.go and consumer.go. The consumer.go reads from a UNIX named pipe, and the producer writes to the named pipe.
As expected, if you start just one of the producer or consumer programs, it blocks, because there's no reader or writer on the other side of the pipe.
Now, if I start both programs and then immediately CTRL-C the consumer, the producer continues sending data to the pipe, and as far as I can tell there's no limit to the size of that data (I've sent 80 MB).
If I start the consumer program again (while the producer is still running), it starts pulling data off the named pipe, but not the data that I "missed" while the consumer program was not running.
My question is: When a reader of a named pipe disconnects, what happens to the data that's sent to the named pipe?
Here are my consumer.go and producer.go programs:
consumer.go
package main

import (
    "io"
    "io/ioutil"
    "log"
    "os"
    "syscall"
)

func main() {
    syscall.Mkfifo("fifo0", 0777)
    fp, err := os.OpenFile("fifo0", os.O_RDONLY, 0777)
    if err != nil {
        log.Fatalf("Could not open fifo0: %s", err)
    }
    tee := io.TeeReader(fp, os.Stdout)
    ioutil.ReadAll(tee)
}
producer.go
package main

import (
    "fmt"
    "io"
    "log"
    "os"
    "strings"
    "time"
)

func main() {
    dots := strings.Repeat(".", 990)
    fifo, err := os.OpenFile("fifo0", os.O_WRONLY, 0777)
    if err != nil {
        log.Fatalf("Could not open fifo0: %s", err)
    }
    defer fifo.Close()

    w := io.MultiWriter(os.Stdout, fifo)
    for i := 0; i < 8000; i++ {
        fmt.Fprintf(w, "%010d%s\n", i, dots)
        time.Sleep(time.Millisecond * 10)
    }
}

A FIFO requires at least one reader and one writer for data to be transferred anywhere; apart from a small kernel buffer, the pipe holds no data of its own. A reader alone blocks waiting for a writer, and a writer alone blocks waiting for a reader, so in a one-to-one pipe there are no gaps.
Once the last reader closes its end, though, the pipe is effectively disconnected, and data written to it goes nowhere: each write simply fails with EPIPE. What happens next depends on how your code handles that situation.
In producer.go, the loop keeps running even when there's no longer a reader. Fprintf does return the write error (MultiWriter stops at the first failing writer), but the code never checks it, so nothing stops the loop. You could add a check in the loop and stop, or reopen the FIFO, when a write fails.
The reason there seems to be a gap of disappearing data is that the loop keeps iterating over i, generating strings it can't deliver.
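For illustration, a minimal sketch of how the loop in producer.go could check the write error; treating any failure as "the reader went away" is an assumption, since the error could be something else:

for i := 0; i < 8000; i++ {
    // Fprintf returns the first error from any writer in the MultiWriter;
    // once the last reader closes the FIFO, the write fails with EPIPE.
    if _, err := fmt.Fprintf(w, "%010d%s\n", i, dots); err != nil {
        log.Printf("write failed (reader likely gone): %s", err)
        break // or: reopen the FIFO and wait for a new reader
    }
    time.Sleep(time.Millisecond * 10)
}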

Related

panic: runtime error: invalid memory address or nil pointer dereference with bigger data

I am working on a recommendation engine with Apache PredictionIO. In front of the event server I have a Go API that listens for events from the customer and the importer. In the particular case where a customer uses the importer, I collect the imported identities and send them as JSON from the importer API to the Go API. For example, if a user imports a CSV that contains 45000 records, I send those 45000 identities to the Go API as JSON like {"barcodes":[...]}. The PredictionIO event server wants data in a particular shape.
type ItemEvent struct {
    Event      string              `json:"event"`
    EntityType string              `json:"entityType"`
    EntityId   string              `json:"entityId"`
    Properties map[string][]string `json:"properties"`
    EventTime  time.Time           `json:"eventTime"`
}

type ItemBulkEvent struct {
    Event     string    `json:"event"`
    Barcodes  []string  `json:"barcodes"`
    EventTime time.Time `json:"eventTime"`
}
ItemEvent is the final data that I send to the event server from the Go API. ItemBulkEvent is the data that I receive from the importer API.
func HandleItemBulkEvent(w http.ResponseWriter, r *http.Request) {
    var itemBulk model.ItemBulkEvent
    err := decode(r, &itemBulk)
    if err != nil {
        log.Fatalln("handleitembulkevent -> ", err)
        util.RespondWithError(w, 400, err.Error())
    } else {
        var item model.ItemEvent
        item.EventTime = itemBulk.EventTime
        item.EntityType = "item"
        item.Event = itemBulk.Event
        itemList := make([]model.ItemEvent, 0, 50)
        for index, barcode := range itemBulk.Barcodes {
            item.EntityId = barcode
            if index > 0 && (index%49) == 0 {
                itemList = append(itemList, item)
                go sendBulkItemToEventServer(w, r, itemList)
                itemList = itemList[:0]
            } else if index == len(itemBulk.Barcodes)-1 {
                itemList = append(itemList, item)
                itemList = itemList[:((len(itemBulk.Barcodes) - 1) % 49)]
                go sendBulkItemToEventServer(w, r, itemList) // line 116
                itemList = itemList[:0]
            } else {
                itemList = append(itemList, item)
            }
        }
        util.RespondWithJSON(w, 200, "OK")
    }
}
HandleItemBulkEvent is the handler function for bulk updates. At this point I should mention PredictionIO's batch uploads: via its REST API the PredictionIO event server takes 50 events per request. So I created a list with a capacity of 50 and a single item. I reused the same item, only changing the identity part (barcode) on each turn, and appended it to the list. On every 50th item I called a function that sends the list to the event server and then cleared the list, and so on.
func sendBulkItemToEventServer(w http.ResponseWriter, r *http.Request, itemList []model.ItemEvent) {
    jsonedItem, err := json.Marshal(itemList)
    if err != nil {
        log.Fatalln("err marshalling -> ", err.Error())
    }
    // todo: change url to event server url
    resp, err2 := http.Post(fmt.Sprintf("http://localhost:7070/batch/events.json?accessKey=%s",
        r.Header.Get("Authorization")),
        "application/json",
        bytes.NewBuffer(jsonedItem))
    if err2 != nil {
        log.Fatalln("err http -> ", err.Error()) // line 141
    }
    defer resp.Body.Close()
}
The sendBulkItemToEventServer function marshals the incoming item list and makes a POST request to PredictionIO's event server. When I try with around 5000 items it works well, but when I try with 45000 items the application crashes with the error below.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xc05938]
goroutine 620 [running]:
api-test/service.sendBulkItemToEventServer(0x1187860, 0xc00028e0e0, 0xc00029c200, 0xc00011c000, 0x31, 0x32)
/home/kadirakinkorkunc/Desktop/playground/recommendation-engine/pio_api/service/CollectorService.go:141 +0x468
created by api-test/service.HandleItemBulkEvent
/home/kadirakinkorkunc/Desktop/playground/recommendation-engine/pio_api/service/CollectorService.go:116 +0x681
Debugger finished with exit code 0
Any idea how I can solve this problem?
Edit: as Burak Serdar mentioned in the answers, I fixed the err/err2 confusion and the data race problem by marshalling before sending. Now it gives me the real error (resp, err2), I guess.
2020/08/03 15:11:55 err http -> Post "http://localhost:7070/batch/events.json?accessKey=FJbGODbGzxD-CoLTdwTN2vwsuEEBJEZc4efrSUc6ekV1qUYAWPu5anDTyMGDoNq1": read tcp 127.0.0.1:54476->127.0.0.1:7070: read: connection reset by peer
Any idea on this?
There are several errors in your program. The runtime error is because you are checking if err2 is not nil, but then you're printing err, not err2. err is nil, thus the runtime error.
This means err2 is not nil, so you should see what that error is.
You mentioned you are sending messages in batches of 50, but that implementation is wrong. You add elements to the itemList, then start a goroutine with that itemList, then truncate it and start filling it again. That is a data race, and your goroutines will see itemList instances that are being modified by the handler. Instead of truncating, simply create a new itemList when you submit one to a goroutine, so each goroutine has its own copy.
If you want to keep using the same slice, you can marshal the slice, and then pass the JSON message to the goroutine instead of the slice.
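A minimal sketch of the first suggestion, adapted to the handler above (only the changed branch is shown):

if index > 0 && (index%49) == 0 {
    itemList = append(itemList, item)
    go sendBulkItemToEventServer(w, r, itemList)
    // Hand the old slice to the goroutine and allocate a fresh one,
    // instead of re-slicing to length 0; each goroutine then owns its batch.
    itemList = make([]model.ItemEvent, 0, 50)
}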
The error you are getting is the one sent by the server you are making the request to. Check this out to understand more about the error.
Most likely the following for loop
for index, barcode := range itemBulk.Barcodes {
has too many iterations, and because you are using separate goroutines to create the requests, all the requests happen concurrently, which either overloads the server or makes it deliberately close the connection.
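One common remedy (a sketch, not from the original answer; the buffered-channel semaphore, the batches variable, and the limit of 8 concurrent requests are all assumptions) is to cap the number of in-flight requests:

sem := make(chan struct{}, 8) // at most 8 requests in flight (arbitrary limit)
var wg sync.WaitGroup
for _, batch := range batches { // batches: hypothetical [][]model.ItemEvent built as above
    sem <- struct{}{} // acquire a slot; blocks while 8 requests are running
    wg.Add(1)
    go func(b []model.ItemEvent) {
        defer wg.Done()
        defer func() { <-sem }() // release the slot
        sendBulkItemToEventServer(w, r, b)
    }(batch)
}
wg.Wait()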

Is there a better way to parse this Map?

I'm fairly new to Go. In the actual code I'm writing I plan to read from a file that will contain environment variables, e.g. API_KEY=XYZ, which means I can keep them out of version control. The solution below 'works', but I feel like there is probably a better way of doing it.
The end goal is to be able to access the elements from the file like so:
m["API_KEY"] and that would print XYZ. This may even already exist and I'm re-inventing the wheel; I saw Go has environment variables, but that didn't seem to be what I was after specifically.
So any help is appreciated.
Playground
Code:
package main

import (
    "fmt"
    "strings"
)

var m = make(map[string]string)

func main() {
    text := `Var1=Value1
Var2=Value2
Var3=Value3`
    arr := strings.Split(text, "\n")
    for _, value := range arr {
        tmp := strings.Split(value, "=")
        m[strings.TrimSpace(tmp[0])] = strings.TrimSpace(tmp[1])
    }
    fmt.Println(m)
}
First, I would recommend reading this related question: How to handle configuration in Go
Next, I would really consider storing your configuration in another format, because what you propose isn't a standard. It's close to Java's property file format (.properties), but even property files may contain Unicode escape sequences, so your code is not a valid .properties parser, as it doesn't handle those sequences at all.
Instead I would recommend using JSON, so you can easily parse it with Go or with any other language; there are many tools to edit JSON texts, and it is still human-friendly.
Going with the JSON format, decoding it into a map is just one function call: json.Unmarshal(). It could look like this:
text := `{"Var1":"Value1", "Var2":"Value2", "Var3":"Value3"}`

var m map[string]string
if err := json.Unmarshal([]byte(text), &m); err != nil {
    fmt.Println("Invalid config file:", err)
    return
}
fmt.Println(m)
Output (try it on the Go Playground):
map[Var1:Value1 Var2:Value2 Var3:Value3]
The json package will handle formatting and escaping for you, so you don't have to worry about any of those. It will also detect and report errors for you. JSON is also more flexible: your config may contain numbers, texts, arrays, etc. All of those come for "free" just because you chose the JSON format.
Another popular format for configuration is YAML, but the Go standard library does not include a YAML parser. See Go implementation github.com/go-yaml/yaml.
If you don't want to change your format, then I would just use the code you posted, because it does exactly what you want it to do: process input line by line, and parse a name = value pair from each line. And it does it in a clear and obvious way. Using a CSV or any other reader for this purpose is bad, because readers hide what's under the hood (they intentionally and rightfully hide format-specific details and transformations). A CSV reader is a CSV reader first: even if you change the comma symbol, it will interpret certain escape sequences and might give you different data than what you see in a plain text editor. That is unintended behavior from your point of view, but hey, your input is not in CSV format and yet you asked a reader to interpret it as CSV!
One improvement I would add to your solution is the use of bufio.Scanner. It can be used to read an input line-by-line, and it handles different styles of newline sequences. It could look like this:
text := `Var1=Value1
Var2=Value2
Var3=Value3`

scanner := bufio.NewScanner(strings.NewReader(text))
m := map[string]string{}
for scanner.Scan() {
    parts := strings.Split(scanner.Text(), "=")
    if len(parts) == 2 {
        m[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
    }
}
if err := scanner.Err(); err != nil {
    fmt.Println("Error encountered:", err)
}
fmt.Println(m)
Output is the same. Try it on the Go Playground.
Using bufio.Scanner has another advantage: bufio.NewScanner() accepts an io.Reader, the general interface for "all things that are a source of bytes". This means that if your config is stored in a file, you don't even have to read the whole config into memory: you can just open the file, e.g. with os.Open(), which returns an *os.File that also implements io.Reader, and pass that value directly to bufio.NewScanner() (so the bufio.Scanner will read from the file and not from an in-memory buffer like in the example above).
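A minimal sketch of that file-based variant (the file name config.txt is an assumption):

f, err := os.Open("config.txt") // hypothetical config file
if err != nil {
    log.Fatal(err)
}
defer f.Close()

m := map[string]string{}
scanner := bufio.NewScanner(f) // reads the file line by line, no full read into memory
for scanner.Scan() {
    parts := strings.Split(scanner.Text(), "=")
    if len(parts) == 2 {
        m[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
    }
}
if err := scanner.Err(); err != nil {
    log.Fatal(err)
}
fmt.Println(m)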
1- You may read it all with just one function call, r.ReadAll(), using csv.NewReader from encoding/csv with:
r.Comma = '='
r.TrimLeadingSpace = true
The result is [][]string, and the input order is preserved. Try it on The Go Playground:
package main

import (
    "encoding/csv"
    "fmt"
    "strings"
)

func main() {
    text := `Var1=Value1
Var2=Value2
Var3=Value3`

    r := csv.NewReader(strings.NewReader(text))
    r.Comma = '='
    r.TrimLeadingSpace = true

    all, err := r.ReadAll()
    if err != nil {
        panic(err)
    }
    fmt.Println(all)
}
output:
[[Var1 Value1] [Var2 Value2] [Var3 Value3]]
2- You may write a fine-tuned ReadAll() that converts the output to a map, but the order is not preserved. Try it on The Go Playground:
package main

import (
    "encoding/csv"
    "fmt"
    "io"
    "strings"
)

func main() {
    text := `Var1=Value1
Var2=Value2
Var3=Value3`

    r := csv.NewReader(strings.NewReader(text))
    r.Comma = '='
    r.TrimLeadingSpace = true

    all, err := ReadAll(r)
    if err != nil {
        panic(err)
    }
    fmt.Println(all)
}

func ReadAll(r *csv.Reader) (map[string]string, error) {
    m := make(map[string]string)
    for {
        tmp, err := r.Read()
        if err == io.EOF {
            return m, nil
        }
        if err != nil {
            return nil, err
        }
        m[tmp[0]] = tmp[1]
    }
}
output:
map[Var2:Value2 Var3:Value3 Var1:Value1]

downloading files with goroutines?

I'm new to Go and I'm learning how to work with goroutines.
I have a function that downloads images:
func imageDownloader(uri string, filename string) {
    fmt.Println("starting download for ", uri)
    outFile, err := os.Create(filename)
    defer outFile.Close()
    if err != nil {
        os.Exit(1)
    }
    client := &http.Client{}
    req, err := http.NewRequest("GET", uri, nil)
    resp, err := client.Do(req)
    defer resp.Body.Close()
    if err != nil {
        panic(err)
    }
    header := resp.ContentLength
    bar := pb.New(int(header))
    rd := bar.NewProxyReader(resp.Body)
    // and copy from reader
    io.Copy(outFile, rd)
}
When I call it by itself as part of another function, it downloads images completely and there is no truncated data.
However, when I try to modify it into a goroutine, images are often truncated or zero-length files.
func imageDownloader(uri string, filename string, wg *sync.WaitGroup) {
    ...
    io.Copy(outFile, rd)
    wg.Done()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(1)
    go imageDownloader(url, file, &wg)
    wg.Wait()
}
Am I using WaitGroups incorrectly? What could cause this and how can I fix it?
Update:
Solved it. I had placed the wg.Add() call outside of a loop. :(
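For reference, a minimal sketch of that fix (urls is a hypothetical slice of uri/filename pairs), with Add(1) inside the loop so the counter matches the number of goroutines launched:

var wg sync.WaitGroup
for _, u := range urls { // urls: hypothetical []struct{ uri, filename string }
    wg.Add(1) // increment once per goroutine, inside the loop
    go imageDownloader(u.uri, u.filename, &wg)
}
wg.Wait() // blocks until every goroutine has called wg.Done()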
While I'm not sure exactly what's causing your issue, here are two options for how to get it back into working order.
First, looking at the example of how to use waitgroups from the sync library, try calling defer wg.Done() at the beginning of your function to ensure that even if the goroutine ends unexpectedly, the waitgroup is properly decremented.
Second, io.Copy returns an error that you're not checking. That's not great practice anyway, but in your particular case it's preventing you from seeing if there is indeed an error in the copying routine. Check it and deal with it appropriately. It also returns the number of bytes written, which might help you as well.
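Putting those two suggestions together, a sketch of the changed lines only (the elided body is the same as in the question):

func imageDownloader(uri string, filename string, wg *sync.WaitGroup) {
    defer wg.Done() // decrements the counter even if the function exits early
    ...
    // io.Copy reports the number of bytes written and any error.
    if n, err := io.Copy(outFile, rd); err != nil {
        log.Printf("download of %s failed after %d bytes: %v", uri, n, err)
    }
}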
Your example doesn't have anything obviously wrong with its use of WaitGroups. As long as you are calling wg.Add() with the same number as the number of goroutines you launch, or incrementing it by 1 every time you start a new goroutine, that should be correct.
However, you call os.Exit and panic for certain error conditions in the goroutine, so if you have more than one of these running, a failure in any one of them will terminate all of them, regardless of the use of WaitGroups. If it's failing without a panic message, I would take a look at the os.Exit(1) line.
It would also be good practice in Go to use defer wg.Done() at the start of your function, so that even if an error occurs, the goroutine still decrements its counter. That way your main thread won't hang on completion if one of the goroutines returns an error.
One change I would make in your example is to leverage defer when marking the WaitGroup as done. I think defer wg.Done() should be the first statement in your function.
I like WaitGroup's simplicity. However, I do not like that we need to pass a reference to it into the goroutine, because that would mean mixing the concurrency logic with the business logic.
So I came up with this generic function to solve this problem for me:
// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
    var waitGroup sync.WaitGroup
    waitGroup.Add(len(functions))
    defer waitGroup.Wait()

    for _, function := range functions {
        go func(copy func()) {
            defer waitGroup.Done()
            copy()
        }(function)
    }
}
So your example could be solved this way:
func imageDownloader(uri string, filename string) {
    ...
    io.Copy(outFile, rd)
}

func main() {
    functions := []func(){}
    list := make([]Object, 5)
    for _, object := range list {
        object := object // capture the loop variable for the closure
        function := func() {
            imageDownloader(object.uri, object.filename)
        }
        functions = append(functions, function)
    }
    Parallelize(functions...)
    fmt.Println("Done")
}
If you would like to use it, you can find it here https://github.com/shomali11/util

Working out the number of times a (request handler) function has been called in Go

Context
I'm making a web app that serves dynamically generated pdfs. These contain content from the internet, so every time it serves a pdf, it downloads a number of files to a new temporary folder.
The Problem
I end up with a large number of folders after I load the page once, so it seems that, for some reason, the handler is being called multiple times, which is an issue because I'm downloading the (not insubstantial) files many more times than I need to. I'd like to check at what stage of the process the multiple requests are occurring.
The Question
Is there a way of working out how many times a function has been called, quite possibly using closures? (I haven't quite got closures into my mental model for programming yet; I don't completely understand them/how they're used).
This would preferably be something involving an int in the language rather than printing something at every stage and counting by hand - I'm looking for a more scalable solution than that (for later situations as well as this one).
Thanks!
Here are two ways you can count function calls, and one for method calls. There are plenty of other ways too, but just to get you started:
Using a closure (not what I would recommend):
package main

import (
    "fmt"
    "sync/atomic"
)

var Foo = func() func() uint64 {
    var called uint64
    return func() uint64 {
        atomic.AddUint64(&called, 1)
        fmt.Println("Foo!")
        return called
    }
}()

func main() {
    Foo()
    c := Foo()
    fmt.Printf("Foo() is called %d times\n", c)
}
Playground: http://play.golang.org/p/euKbamdI7h
Using global counter:
package main

import (
    "fmt"
    "sync/atomic"
)

var called uint64

func Foo() {
    atomic.AddUint64(&called, 1)
    fmt.Println("Foo!")
}

func main() {
    Foo()
    Foo()
    fmt.Printf("Foo() is called %d times\n", called)
}
Playground: http://play.golang.org/p/3Ib29VCnoF
Counting method calls:
package main

import (
    "fmt"
    "sync/atomic"
)

type T struct {
    Called uint64
}

func (t *T) Foo() {
    atomic.AddUint64(&t.Called, 1)
    fmt.Println("Foo!")
}

func main() {
    var obj T
    obj.Foo()
    obj.Foo()
    fmt.Printf("obj.Foo() is called %d times\n", obj.Called)
}
Playground: http://play.golang.org/p/59eOQdUQU1
Edit:
I just realized that the handler might not be in your own package. In such a case, you might want to write a wrapper:
var called uint64

func Foo() {
    atomic.AddUint64(&called, 1)
    importedPackage.Foo()
}
Edit 2:
Updated the examples to use atomic +1 operations.
Counting Calls
To answer the specific question you asked, here is one quick way to count handler executions:
func countCalls(h http.HandlerFunc) http.HandlerFunc {
    var lock sync.Mutex
    var count int
    return func(w http.ResponseWriter, r *http.Request) {
        lock.Lock()
        count++
        w.Header().Set("X-Call-Count", fmt.Sprintf("%d", count))
        lock.Unlock()
        h.ServeHTTP(w, r)
    }
}

http.Handle("/foobar", countCalls(foobarHandler))
This will add a header that you can inspect with your favorite web developer tools; you could also just log it to standard output or something.
Logging Handlers
To expand upon the answers mentioned above, what you probably want to do to debug this and have in place for future use is to log details of each request.
package main

import (
    "flag"
    "log"
    "net/http"
    "os"

    "github.com/gorilla/handlers"
)

var (
    accessLogFile = flag.String("log", "/var/log/yourapp/access.log", "Access log file")
)

func main() {
    accessLog, err := os.OpenFile(*accessLogFile, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
    if err != nil {
        log.Fatalf("Failed to open access log: %s", err)
    }

    wrap := func(f http.HandlerFunc) http.Handler {
        return handlers.LoggingHandler(accessLog, f)
    }
    http.Handle("/foobar", wrap(foobarHandler))
    ...
}
This uses LoggingHandler (or CombinedLoggingHandler) to write a standard Apache format log message that you can either inspect yourself or analyze with various tools.
An example of a log line would be
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
which tells you who made the request, when, what the method and URL was, how your server responded, and how long the response was. From this log, you should be able to see exactly what requests are being made, to determine not only how many times your handlers are being called, but exactly what is generating the requests and whether they're to another endpoint (like /favicon.ico).

How can I block (and join) on a channel fed by an unknown number of goroutines?

I have a recursive function. The function will call itself with various different values depending on the data it gets, so the arity and depth of recursion are not known: each call may call itself zero or more times. The function may return any number of values.
I want to parallelise it by getting goroutines and channels involved. Each recursion of inner runs in its own goroutine, and sends back a value on the channel. The outer function deals with those values.
func outer(response []int) []int {
    results := make([]int, 0)
    resultsChannel := make(chan int)

    inner := func(...) {
        resultsChannel <- «some result»
        // Recurse in a new goroutine.
        for _, recursionArgument := range «some calculated data» {
            go inner(recursionArgument)
        }
    }

    go inner(«initial values»)
    for {
        result := <-resultsChannel
        results = append(results, result)
        // HELP! How do I decide when to break?
    }
    return results
}
The problem comes with escaping the results channel loop. Because of the 'shape' of the recursion (unknown arity and depth) I can't say "finish after n events" and I can't send a sentinel value.
How do I detect when all my recursions have happened and return from outer? Is there a better way to approach this?
You can use a sync.WaitGroup to manage the collection of goroutines you spawn: call Add(1) before spawning each new goroutine, and Done when each goroutine completes. So something like this:
var wg sync.WaitGroup

inner := func(...) {
    ...
    // Recurse in a new goroutine.
    for _, recursionArgument := range «some calculated data» {
        wg.Add(1)
        go inner(recursionArgument)
    }
    ...
    wg.Done()
}

wg.Add(1)
go inner(«initial values»)
Now waiting on wg will tell you when all the goroutines have completed.
If you are reading the results from a channel, the obvious way to tell when there are no more results is by closing the channel. You can achieve this through another goroutine to do this for us:
go func() {
    wg.Wait()
    close(resultsChannel)
}()
You should now be able to simply range over resultsChannel to read all the results.
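Putting the pieces together, here is a sketch of outer with the question's «» placeholders left intact (still pseudocode to be filled in, not a drop-in implementation):

func outer(«initial values» ...) []int {
    results := []int{}
    resultsChannel := make(chan int)
    var wg sync.WaitGroup

    var inner func(...)
    inner = func(...) {
        defer wg.Done()
        resultsChannel <- «some result»
        for _, recursionArgument := range «some calculated data» {
            wg.Add(1) // account for each child before it starts
            go inner(recursionArgument)
        }
    }

    wg.Add(1)
    go inner(«initial values»)

    // Close the channel once every recursive goroutine has finished.
    go func() {
        wg.Wait()
        close(resultsChannel)
    }()

    for result := range resultsChannel { // the loop ends when the channel is closed
        results = append(results, result)
    }
    return results
}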
