Asynchronous data processing by several functions

I have data arriving over HTTP that needs to be processed by two different functions. It is important that each function processes the items in order: if the file receives 1, 2, 3, 4, 5, the database must also record 1, 2, 3, 4, 5, a FIFO model.
The problem is that the data arrives continuously, and the database can sometimes take quite a long time to complete an update, so I cannot update the file in a timely manner.
What matters to me is that each item is eventually written to the file and to the database, each as soon as it can be. I could use buffered channels, but I cannot know in advance how much data may be waiting in the queue, and I would rather not just pick some very large buffer size.
I tried running more goroutines in the NewData function, but then my data is no longer written sequentially.
This code demonstrates the problem:
package main

import (
	"fmt"
	"time"
)

type procHandler interface {
	Start()
	NewData(newdata []byte)
}

type fileWriter struct {
	Data chan []byte
}

func (proc *fileWriter) Start() {
	proc.Data = make(chan []byte)
	go func() {
		for {
			obj := <-proc.Data
			fmt.Printf("proc %T ", proc)
			fmt.Println(obj)
		}
	}()
}

func (proc *fileWriter) NewData(newdata []byte) {
	proc.Data <- newdata
}

type sqlWriter struct {
	Data chan []byte
}

func (proc *sqlWriter) Start() {
	proc.Data = make(chan []byte)
	go func() {
		for {
			obj := <-proc.Data
			time.Sleep(5 * time.Second)
			fmt.Printf("proc %T ", proc)
			fmt.Println(obj)
		}
	}()
}

func (proc *sqlWriter) NewData(newdata []byte) {
	proc.Data <- newdata
}

var processors = []procHandler{}

func receiver() {
	newDataImitateByteRange := 30
	for i := 0; i < newDataImitateByteRange; i++ {
		pseudoData := []byte{byte(i)}
		for _, handler := range processors {
			handler.NewData(pseudoData)
		}
	}
}

func main() {
	// file writer
	fileUpdate := &fileWriter{}
	processors = append(processors, fileUpdate)

	// sql writer
	sqlUpdate := &sqlWriter{}
	processors = append(processors, sqlUpdate)

	sqlUpdate.Start()
	fileUpdate.Start()

	go receiver()

	fmt.Scanln()
}
The code works: https://play.golang.org/p/rSshsJYZ4h
Output:
proc *main.fileWriter [0]
proc *main.fileWriter [1]
proc *main.sqlWriter [0] (sleep)
proc *main.fileWriter [2] (displayed after 5 seconds, once the previous item has been processed)
proc *main.sqlWriter [1] (sleep)
proc *main.fileWriter [3] (displayed after 5 seconds, once the previous item has been processed)
proc *main.sqlWriter [2]
proc *main.fileWriter [4]
proc *main.sqlWriter [3]
proc *main.fileWriter [5]
proc *main.sqlWriter [4]
proc *main.fileWriter [6]
I want:
proc *main.fileWriter [0]
proc *main.fileWriter [1]
proc *main.fileWriter [2]
proc *main.fileWriter [3]
proc *main.fileWriter [4]
proc *main.fileWriter [5]
proc *main.fileWriter [6]
proc *main.sqlWriter [0] (the handler starts executing after the 5-second sleep)
proc *main.sqlWriter [1] (sleep)
proc *main.sqlWriter [2] (sleep)
proc *main.sqlWriter [3] (sleep)
proc *main.sqlWriter [4] (sleep)
proc *main.sqlWriter [5] (sleep)
proc *main.sqlWriter [6] (sleep)
I am hoping for some help, thank you!

It sounds like what you are looking for is something that works like a channel that resizes (grows or shrinks) with the data that is enqueued on it. This could be implemented by having a queue between an input and an output channel, with a goroutine servicing those channels. Here is such a solution:
https://github.com/gammazero/bigchan#bigchan
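For illustration, here is a minimal sketch of that idea; unboundedChan is a hypothetical helper written for this answer, not the BigChan API. An input and an output channel are connected by a goroutine that queues pending items in a slice, so senders never block however slow the consumer is.
// unboundedChan is a hypothetical sketch: senders write to in, the consumer
// reads from out, and the goroutine in the middle buffers whatever has not
// been consumed yet. A nil channel in a select never fires, which is how the
// send case is disabled while the queue is empty.
func unboundedChan() (chan<- []byte, <-chan []byte) {
	in := make(chan []byte)
	out := make(chan []byte)
	go func() {
		var queue [][]byte
		for {
			var send chan []byte
			var head []byte
			if len(queue) > 0 {
				send = out
				head = queue[0]
			}
			select {
			case v := <-in:
				queue = append(queue, v)
			case send <- head:
				queue = queue[1:]
			}
		}
	}()
	return in, out
}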
I have used a BigChan as the Data channel in your fileWriter and sqlWriter, and it appears to produce the results you are looking for. Here is your reworked code:
package main

import (
	"fmt"
	"time"

	"github.com/gammazero/bigchan"
)

// Maximum number of items to buffer. Set to -1 for unlimited.
const limit = 65536

type procHandler interface {
	Start()
	NewData(newdata []byte)
}

type fileWriter struct {
	Data *bigchan.BigChan
}

func (proc *fileWriter) Start() {
	proc.Data = bigchan.New(limit)
	go func() {
		for {
			_obj := <-proc.Data.Out()
			obj := _obj.([]byte)
			fmt.Printf("proc %T ", proc)
			fmt.Println(obj)
		}
	}()
}

func (proc *fileWriter) NewData(newdata []byte) {
	proc.Data.In() <- newdata
}

type sqlWriter struct {
	Data *bigchan.BigChan
}

func (proc *sqlWriter) Start() {
	proc.Data = bigchan.New(limit)
	go func() {
		for {
			_obj := <-proc.Data.Out()
			obj := _obj.([]byte)
			time.Sleep(5 * time.Second)
			fmt.Printf("proc %T ", proc)
			fmt.Println(obj)
		}
	}()
}

func (proc *sqlWriter) NewData(newdata []byte) {
	proc.Data.In() <- newdata
}

var processors = []procHandler{}

func receiver() {
	newDataImitateByteRange := 30
	for i := 0; i < newDataImitateByteRange; i++ {
		pseudoData := []byte{byte(i)}
		for _, handler := range processors {
			handler.NewData(pseudoData)
		}
	}
}

func main() {
	// file writer
	fileUpdate := &fileWriter{}
	processors = append(processors, fileUpdate)

	// sql writer
	sqlUpdate := &sqlWriter{}
	processors = append(processors, sqlUpdate)

	sqlUpdate.Start()
	fileUpdate.Start()

	go receiver()

	fmt.Scanln()
}

Related

Save to struct a local pointer

How come this program prints nil instead of hello? How can I solve this situation and successfully store that pointer in the struct? Shouldn't Go be able to figure out when local pointers are used outside the scope of a function?
package main

import (
	"fmt"
)

type test struct {
	name *string
}

func (t test) test() {
	h := "hello"
	t.name = &h
	return
}

func main() {
	a := test{nil}
	a.test()
	fmt.Println(a.name)
}
Your test method has a value receiver, so test operates on a copy of a. If you want to mutate a struct with a method, you should give the method a pointer receiver; when you call it, Go automatically takes the address of the value for you:
func (t *test) test() {
	h := "hello"
	t.name = &h
}
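With that change, a minimal complete sketch of the same program prints the stored value (dereferencing the pointer so the string itself is shown rather than its address):
package main

import "fmt"

type test struct {
	name *string
}

// Pointer receiver: the method mutates the caller's struct, not a copy.
func (t *test) test() {
	h := "hello"
	t.name = &h
}

func main() {
	a := test{nil}
	a.test()
	fmt.Println(*a.name) // prints "hello"
}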

How to retrieve website source without using ioutil.ReadAll in golang

My code:
func getSourceUrl(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		fmt.Println("Error getSourceUrl: ")
		return "", err
	}
	defer resp.Body.Close()
	body := resp.Body
	// time = 0
	sourcePage, err := ioutil.ReadAll(body)
	// time > 5 minutes
	return string(sourcePage), err
}
I have a link to a page whose source is around 100,000 lines. Using ioutil.ReadAll takes very long (more than 5 minutes for one link). Is there a way to get the page source faster? Thank you!
@Minato, try this code and play with the M throttling parameter. If you get too many errors, reduce it.
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"runtime"
	"time"
)

// Token is an empty struct for signalling
type Token struct{}

// N files to get
var N = 301 // at the source 00000 - 00300

// M max go routines
var M = runtime.NumCPU() * 16

// Throttle to max M go routines
var Throttle = make(chan Token, M)

// DoneStatus is used to signal the end of one download
type DoneStatus struct {
	length   int
	sequence string
	duration float64
	err      error
}

// ExitOK is simple exit counter
var ExitOK = make(chan DoneStatus)

// TotalBytes read
var TotalBytes = 0

// TotalErrors captured
var TotalErrors = 0

// URLTempl is the template for URL construction
var URLTempl = "https://virusshare.com/hashes/VirusShare_%05d.md5"

func close(c io.Closer) {
	err := c.Close()
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	log.Printf("start main. M=%d\n", M)
	startTime := time.Now()
	for i := 0; i < N; i++ {
		go func(idx int) {
			// slow ramp up: fire getData after idx seconds
			time.Sleep(time.Duration(idx) * time.Second)
			url := fmt.Sprintf(URLTempl, idx)
			_, _ = getData(url) // errors captured as data
		}(i)
	}
	// Count N byte count signals
	for i := 0; i < N; i++ {
		status := <-ExitOK
		TotalBytes += status.length
		if status.err != nil {
			TotalErrors++
			log.Printf("[%d] : %v\n", i, status.err)
			continue
		}
		log.Printf("[%d] file %s, %.1f MByte, %.1f min, %.1f KByte/sec\n",
			i, status.sequence,
			float64(status.length)/(1024*1024),
			status.duration/60,
			float64(status.length)/(1024)/status.duration)
	}
	// totals
	duration := time.Since(startTime).Seconds()
	log.Printf("Totals: %.1f MByte, %.1f min, %.1f KByte/sec\n",
		float64(TotalBytes)/(1024*1024),
		duration/60,
		float64(TotalBytes)/(1024)/duration)
	// using fatal to verify only one go routine is running at the end
	log.Fatalf("TotalErrors: %d\n", TotalErrors)
}

func getData(url string) (data []byte, err error) {
	var startTime time.Time
	defer func() {
		// release token
		<-Throttle
		// signal end of go routine, with some status info
		ExitOK <- DoneStatus{
			len(data),
			url[41:46],
			time.Since(startTime).Seconds(),
			err,
		}
	}()
	// acquire one of M tokens
	Throttle <- Token{}
	log.Printf("Started file: %s\n", url[41:46])
	startTime = time.Now()
	resp, err := http.Get(url)
	if err != nil {
		return
	}
	defer close(resp.Body)
	data, err = ioutil.ReadAll(resp.Body)
	if err != nil {
		return
	}
	return
}
Per-transfer variation is about 10-40 KByte/sec, and as a final total for all 301 files I get 928 MB in 11.1 min at 1425 KByte/sec. I believe you should be able to get similar results.
// outside the scope of the question but maybe useful
Also give this a try: http://www.dslreports.com/speedtest/ (go to settings, select a bunch of US servers for testing, and set the duration to 60 sec). This will tell you what your actual effective total rate to the US is.
Good luck!
You could iterate over the response a section at a time, something like:
responseSection := make([]byte, 128)
body.Read(responseSection)
return string(responseSection), err
That would read 128 bytes at a time. However, I would suggest confirming that the download speed is not what is causing the slow load.
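As a rough sketch of that idea written as an actual loop (readInChunks is a hypothetical helper; it assumes io and strings are imported, and the 128-byte chunk size is arbitrary):
// readInChunks is a hypothetical helper that drains body 128 bytes at a time
// and accumulates the result in a strings.Builder.
func readInChunks(body io.Reader) (string, error) {
	var sb strings.Builder
	buf := make([]byte, 128)
	for {
		n, err := body.Read(buf)
		sb.Write(buf[:n])
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
	}
}
Note that this still reads the whole body, so it will not be faster than ioutil.ReadAll; its only advantage is that you can process or discard each chunk as it arrives.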
The 5 minutes is probably network time.
That said, you generally would not want to buffer enormous objects in memory.
resp.Body is a Reader.
So you could use io.Copy to copy its contents into a file.
Converting sourcePage into a string is a bad idea as it forces another allocation.
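A minimal sketch of that approach (saveSourceToFile and the destination path are placeholders, not part of the original code; it assumes net/http, os, and io are imported):
// saveSourceToFile streams the response body straight to disk instead of
// buffering the whole page in memory. (Hypothetical helper for illustration.)
func saveSourceToFile(url, path string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, resp.Body)
	return err
}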

Serving content of unbounded size in HTTP in golang

I have the following server code:
package main

import (
	"fmt"
	"net/http"
	"time"
)

type serveData struct {
}

func (s *serveData) Read(p []byte) (int, error) {
	l := len(p)
	fmt.Println("p size is ", l)
	time.Sleep(200 * time.Millisecond)
	return l, nil
}

func (s *serveData) Seek(offset int64, whence int) (int64, error) {
	fmt.Println("in seek ", offset)
	return offset, nil
}

func handler(w http.ResponseWriter, r *http.Request) {
	reader := new(serveData)
	//w.WriteHeader(206);
	w.Header().Set("Content-type", "application/octet-stream")
	fmt.Println("got request")
	http.ServeContent(w, r, "cool", time.Date(2009, time.November, 10, 23, 0, 0, 0, time.UTC), reader)
}

func main() {
	http.HandleFunc("/check", handler)
	http.ListenAndServe(":8080", nil)
}
When I connect a client to the server above (curl -X GET http://127.0.0.1:8080/check), nothing happens and curl just exits. The server calls the Seek() function twice, but never calls Read().
I am serving partial content, as the size is unbounded (pseudo-live data). Also, when I uncomment w.WriteHeader(206), the server complains about "http: multiple response.WriteHeader calls".
What could be going wrong here?

Should I RLock map before range?

Is it safe to range over a map without locking if multiple goroutines will run the notifyAll func? Inside the range I also sometimes need to remove entries from the map.
var mu sync.RWMutex

func (self *Server) notifyAll(event *Event) {
	ch := make(chan int, 64)
	num := 0
	for k, v := range self.connections {
		num++
		ch <- num
		go func(k int, conn *Conn) {
			err := conn.sendMessage(event)
			<-ch
			if err != nil {
				self.removeConn(k)
			}
		}(k, v)
	}
}

func (self *Server) removeConn(k int) {
	mu.Lock()
	defer mu.Unlock()
	delete(self.connections, k)
}

// Somewhere in another goroutine
func (self *Server) addConn(conn *Conn, k int) {
	mu.Lock()
	defer mu.Unlock()
	self.connections[k] = conn
}
Or must I RLock the map before the range?
func (self *Server) notifyAll(event *Event) {
	mu.RLock()
	defer mu.RUnlock()
	// Skipped previous body...
}
Short answer: maps are not concurrent-safe (or thread-safe, if you prefer that term) in Go.
So, if you need to access a map from different go-routines, you must employ some form of access orchestration, otherwise "uncontrolled map access can crash the program" (see this).
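One way to apply that orchestration to your first snippet is the RWMutex you already have. A minimal sketch, keeping your map[int]*Conn layout and the package-level mu: take a snapshot of the entries under the read lock, then do the slow sends outside the lock, so removeConn (which needs the write lock) is not blocked while messages are in flight.
func (self *Server) notifyAll(event *Event) {
	// Snapshot the connections under the read lock so the range is safe.
	mu.RLock()
	conns := make(map[int]*Conn, len(self.connections))
	for k, v := range self.connections {
		conns[k] = v
	}
	mu.RUnlock()

	// Send outside the lock; failed connections are removed afterwards.
	for k, conn := range conns {
		go func(k int, conn *Conn) {
			if err := conn.sendMessage(event); err != nil {
				self.removeConn(k)
			}
		}(k, conn)
	}
}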
Edit:
Here is another implementation (ignoring housekeeping concerns such as timeouts, quit, and logging) that drops the mutex altogether and uses a more Go-ish approach: all access to the map goes through a single agent goroutine. This is just to demonstrate the approach, which keeps the access-orchestration concerns in one place; it may or may not be right for your case:
type Server struct {
	connections           map[*Conn]struct{}
	_removeConn, _addConn chan *Conn
	_notifyAll            chan *Event
}

func NewServer() *Server {
	s := new(Server)
	s.connections = make(map[*Conn]struct{})
	s._addConn = make(chan *Conn)
	s._removeConn = make(chan *Conn, 1)
	s._notifyAll = make(chan *Event)
	go s.agent()
	return s
}

func (s *Server) agent() {
	for {
		select {
		case c := <-s._addConn:
			s.connections[c] = struct{}{}
		case c := <-s._removeConn:
			delete(s.connections, c)
		case e := <-s._notifyAll:
			for c := range s.connections {
				closure := c
				go func() {
					err := closure.sendMessage(e)
					if err != nil {
						s._removeConn <- closure
					}
				}()
			}
		}
	}
}

func (s *Server) removeConn(c *Conn) {
	s._removeConn <- c
}

func (s *Server) addConn(c *Conn) {
	s._addConn <- c
}
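For completeness, the notifyAll method is not shown above; presumably it would just hand the event to the agent:
// notifyAll forwards the event to the agent goroutine, which owns the map.
func (s *Server) notifyAll(event *Event) {
	s._notifyAll <- event
}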
Edit:
I stand corrected; according to Damian Gryski, maps are safe for concurrent reads. The reason that the map order changes on each iteration is "the random seed chosen for map iteration order, which is local to the goroutine iterating" (another tweet of his). This fact does not affect the first edit or the suggested solution.

Unable to send gob data over TCP in Go Programming

I have a client/server application that uses a TCP connection.
Client:
type Q struct {
	sum int64
}

type P struct {
	M, N int64
}

func main() {
	...
	//read M and N
	...
	tcpAddr, err := net.ResolveTCPAddr("tcp4", service)
	...
	var p P
	p.M = M
	p.N = N
	err = enc.Encode(p)
}
Server:
type Q struct {
	sum int64
}

type P struct {
	M, N int64
}

func main() {
	...
	tcpAddr, err := net.ResolveTCPAddr("ip4", service)
	listener, err := net.ListenTCP("tcp", tcpAddr)
	...
	var connB bytes.Buffer
	dec := gob.NewDecoder(&connB)
	var p P
	err = dec.Decode(p)
	fmt.Printf("{%d, %d}\n", p.M, p.N)
}
The result on the server is {0, 0} because I don't know how to obtain a bytes.Buffer variable from net.Conn.
Is there any way to send gob variables over TCP?
If so, how can this be done? Or is there any alternative for sending numbers over TCP?
Any help or sample code would really be appreciated.
Here's a complete example.
Server:
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

type P struct {
	M, N int64
}

func handleConnection(conn net.Conn) {
	dec := gob.NewDecoder(conn)
	p := &P{}
	dec.Decode(p)
	fmt.Printf("Received : %+v", p)
	conn.Close()
}

func main() {
	fmt.Println("start")
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		// handle error
	}
	for {
		conn, err := ln.Accept() // this blocks until connection or error
		if err != nil {
			// handle error
			continue
		}
		go handleConnection(conn) // a goroutine handles conn so that the loop can accept other connections
	}
}
Client:
package main

import (
	"encoding/gob"
	"fmt"
	"log"
	"net"
)

type P struct {
	M, N int64
}

func main() {
	fmt.Println("start client")
	conn, err := net.Dial("tcp", "localhost:8080")
	if err != nil {
		log.Fatal("Connection error", err)
	}
	encoder := gob.NewEncoder(conn)
	p := &P{1, 2}
	encoder.Encode(p)
	conn.Close()
	fmt.Println("done")
}
Launch the server, then the client, and you will see the server display the received P value.
A few observations to make it clear:
When you listen on a socket, you should pass the open socket to a goroutine that will handle it.
Conn implements the Reader and Writer interfaces, which makes it easy to use: you can give it to a Decoder or an Encoder.
In a real application you would probably have the P struct definition in a package imported by both programs.
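For example, a minimal sketch of such a shared package (the package name and path are placeholders, not something from the original code):
// message/message.go, imported by both the client and the server so that
// both sides encode and decode exactly the same type.
package message

type P struct {
	M, N int64
}
Both programs would then import it and use message.P instead of declaring their own copy of the struct.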
