I am trying to implement an HTTP server that:
Calculates a further redirect using some logic
Redirects the user
Logs user data
The goal is to achieve maximum throughput (at least 15k rps). To do this, I want to save logs asynchronously. I'm using Kafka as the logging system and have separated the logging code into its own goroutine. An overall example of the current implementation:
package main

import (
    "encoding/json"
    "net/http"
    "time"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

type log struct {
    RuntimeParam  string `json:"runtime_param"`
    AsyncParam    string `json:"async_param"`
    RemoteAddress string `json:"remote_address"`
}

var (
    producer, _ = kafka.NewProducer(&kafka.ConfigMap{
        "bootstrap.servers":      "localhost:9092,localhost:9093",
        "queue.buffering.max.ms": 1 * 1000,
        "go.delivery.reports":    false,
        "client.id":              1,
    })
    topicName = "log"
)
func main() {
    siteMux := http.NewServeMux()
    siteMux.HandleFunc("/", httpHandler)
    srv := &http.Server{
        Addr:         ":8080",
        Handler:      siteMux,
        ReadTimeout:  2 * time.Second,
        WriteTimeout: 5 * time.Second,
        IdleTimeout:  10 * time.Second,
    }
    if err := srv.ListenAndServe(); err != nil {
        panic(err)
    }
}
func httpHandler(w http.ResponseWriter, r *http.Request) {
    handlerLog := new(log)
    handlerLog.RuntimeParam = "runtimeDataString"
    http.Redirect(w, r, "http://google.com", 301)
    go func(goroutineLog *log, request *http.Request) {
        goroutineLog.AsyncParam = "asyncDataString"
        goroutineLog.RemoteAddress = request.RemoteAddr
        jsonLog, err := json.Marshal(goroutineLog)
        if err == nil {
            producer.ProduceChannel() <- &kafka.Message{
                TopicPartition: kafka.TopicPartition{Topic: &topicName, Partition: kafka.PartitionAny},
                Value:          jsonLog,
            }
        }
    }(handlerLog, r)
}
The questions are:
Is it correct/efficient to use a separate goroutine to implement async logging, or should I use a different approach (workers and channels, for example)?
Is there a way to further improve the performance of the server that I'm missing?
Yes, this is a correct and efficient use of a goroutine (as Flimzy pointed out in the comments). I couldn't agree more; this is a good approach.
The problem is that the handler may finish executing before the goroutine has started processing everything, so the request (which is a pointer) may be gone, or you may hit races down the middleware stack. I read your comments that this isn't your case, but in general you shouldn't pass a request to a goroutine. As I can see from your code, you only really use RemoteAddr from the request, so why not redirect straight away and put the logging in a defer statement? So, I'd rewrite your handler a bit:
func httpHandler(w http.ResponseWriter, r *http.Request) {
    http.Redirect(w, r, "http://google.com", 301)
    defer func(runtimeDataString, remoteAddr string) {
        handlerLog := new(log)
        handlerLog.RuntimeParam = runtimeDataString
        handlerLog.AsyncParam = "asyncDataString"
        handlerLog.RemoteAddress = remoteAddr
        jsonLog, err := json.Marshal(handlerLog)
        if err == nil {
            producer.ProduceChannel() <- &kafka.Message{
                TopicPartition: kafka.TopicPartition{Topic: &topicName, Partition: kafka.PartitionAny},
                Value:          jsonLog,
            }
        }
    }("runtimeDataString", r.RemoteAddr)
}
The goroutines are unlikely to improve the performance of your server, since you just send the response earlier, and the Kafka sends could pile up in the background and slow the whole server down. If you find this to be the bottleneck, you may consider saving logs locally and sending them to Kafka from another process (or pool of workers) outside of your server. This can spread the workload over time (sending fewer logs when you have more requests, and vice versa).
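For illustration, here is a minimal sketch of the in-process worker-pool variant, reusing the producer and topicName from above; logCh, startLogWorkers, enqueueLog, the buffer size, and the drop-on-full policy are assumptions for this sketch, not part of the original code:
// Sketch only: a buffered channel decouples handlers from Kafka, and a small
// pool of workers drains it.
var logCh = make(chan *log, 10000) // buffered so handlers rarely block

func startLogWorkers(numWorkers int) {
    for i := 0; i < numWorkers; i++ {
        go func() {
            for entry := range logCh {
                if jsonLog, err := json.Marshal(entry); err == nil {
                    producer.ProduceChannel() <- &kafka.Message{
                        TopicPartition: kafka.TopicPartition{Topic: &topicName, Partition: kafka.PartitionAny},
                        Value:          jsonLog,
                    }
                }
            }
        }()
    }
}

// In the handler, enqueue without ever blocking the request:
func enqueueLog(entry *log) {
    select {
    case logCh <- entry:
    default: // buffer full: dropping a log line is cheaper than stalling a request
    }
}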
Related
Are there any options for logging request times in grpc-node? I've been able to log response times using OpenTelemetry and Jaeger, but I couldn't find any npm packages for request times either; just wanted to ask if you found any options for grpc-node.
You don't need a package to do it; you can do it using a simple gRPC client interceptor.
This is how you would do it in Go. Just check how you can create a gRPC interceptor in Node. Sorry for not having a JS example; hope it helps somehow.
func UnaryRequestTimeInterceptor() grpc.UnaryClientInterceptor {
    return func(
        ctx context.Context,
        method string,
        req interface{},
        reply interface{},
        cc *grpc.ClientConn,
        invoker grpc.UnaryInvoker,
        opts ...grpc.CallOption,
    ) error {
        start := time.Now()
        err := invoker(ctx, method, req, reply, cc, opts...)
        log.Printf("request to %s took %s", method, time.Since(start))
        return err
    }
}
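A minimal usage sketch, wiring the interceptor into a client connection; the address is a placeholder and the insecure transport is just for illustration:
conn, err := grpc.Dial(
    "localhost:50051", // placeholder address
    grpc.WithInsecure(),
    grpc.WithUnaryInterceptor(UnaryRequestTimeInterceptor()),
)
if err != nil {
    log.Fatalf("dial failed: %v", err)
}
defer conn.Close()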
Say I have an HTTP handler like this:
func ReallyLongFunction(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello World!")
    // run code that takes a long time here
    // Executing dd command with cmd.Exec..., etc.
}
Is there a way I can interrupt this function if the user refreshes the page or kills the request some other way, without running the subsequent code, and how would I do it?
I tried doing this:
notify := r.Context().Done()
go func() {
    <-notify
    println("Client closed the connection")
    s.downloadCleanup()
}()
but the code after it still runs anyway whenever I interrupt the request.
There's no way to forcibly tear a goroutine down from any code external to that goroutine.
Hence the only way to actually interrupt processing is to periodically check whether the client is gone (or whether there's another signal to stop processing).
Basically that would amount to structuring your handler something like this
func ReallyLongFunction(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello World!")
    done := r.Context().Done()
    // check whether we're done
    // do some small piece of stuff
    // check whether we're done
    // do another small piece of stuff
    // …rinse, repeat
}
Now, a way to check whether something was written to a channel without blocking is the "select with default" idiom:
select {
case <-done:
    // We're done
default:
}
This statement executes the code in the "// We're done" branch if and only if done was written to or was closed (which is what happens with contexts); otherwise the empty block in the default branch is executed.
So we can refactor that to something like
func ReallyLongFunction(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello World!")
    done := r.Context().Done()
    closed := func() bool {
        select {
        case <-done:
            return true
        default:
            return false
        }
    }
    if closed() {
        return
    }
    // do some small piece of stuff
    if closed() {
        return
    }
    // do another small piece of stuff
    // …rinse, repeat
}
Stopping an external process started in an HTTP handler
To address the OP's comment…
The os/exec.Cmd type has the Process field, which is of type *os.Process, and that type supports the Kill method, which forcibly brings the running process down.
The only problem is that exec.Cmd.Run blocks until the process exits,
so the goroutine which is executing it cannot execute other code, and if exec.Cmd.Run is called in an HTTP handler, there's no way to cancel it.
How best to handle running a program in such an asynchronous manner depends heavily on how the process itself is organized, but I'd roll like this:
In the handler, prepare the process and then start it using exec.Cmd.Start (as opposed to Run).
Check the error value Start has returned: if it's nil, the process has managed to start OK. Otherwise, somehow communicate the failure to the client and quit the handler.
Once the process is known to have started, the exec.Cmd value has some of its fields populated with process-related information; of particular interest is the Process field, which is of type *os.Process: that type has the Kill method, which may be used to forcibly bring the process down.
Start a goroutine and pass it that exec.Cmd value and a channel of some suitable type (see below). That goroutine should call Wait on it, and once Wait returns, the goroutine should communicate that fact back to the originating goroutine over that channel. Exactly what to communicate is an open question, as it depends on whether you want to collect what the process wrote to its standard output and error streams and/or maybe some other data related to the process' activity. After sending the data, that goroutine exits.
The main goroutine (executing the handler) should just call exec.Cmd.Process.Kill when it detects the handler should terminate. Killing the process eventually unblocks the goroutine executing Wait on that same exec.Cmd value, as the process exits.
After killing the process, the handler goroutine waits on the channel to hear back from the goroutine watching the process, does something with that data (maybe logs it), and exits. A sketch of this flow follows below.
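Here is a minimal sketch of the flow just described; the dd invocation is a placeholder, and the channel payload and error handling are trimmed for brevity:
func ReallyLongFunction(w http.ResponseWriter, r *http.Request) {
    cmd := exec.Command("dd", "if=/dev/zero", "of=/dev/null", "count=1000000") // placeholder command
    if err := cmd.Start(); err != nil {
        http.Error(w, "failed to start process", http.StatusInternalServerError)
        return
    }

    // Watcher goroutine: Wait reaps the process and reports back.
    done := make(chan error, 1)
    go func() {
        done <- cmd.Wait()
    }()

    select {
    case <-r.Context().Done():
        cmd.Process.Kill() // client went away: bring the process down
        <-done             // hear back from the watcher goroutine
    case err := <-done:
        if err != nil {
            log.Printf("process failed: %v", err) // process exited on its own with an error
        }
    }
}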
You should cancel the goroutine from the inside; for a long calculation task, you can provide checkpoints at which to stop and check for cancellation.
Here is tested code for a server with a long calculation task and cancellation checkpoints:
package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "time"
)

func main() {
    http.HandleFunc(`/`, func(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()
        log.Println("wait a couple of seconds ...")
        for i := 0; i < 10; i++ { // long calculation
            select {
            case <-ctx.Done():
                log.Println("Client closed the connection:", ctx.Err())
                return
            default:
                fmt.Print(".")
                time.Sleep(200 * time.Millisecond) // long calculation
            }
        }
        io.WriteString(w, `Hi`)
        log.Println("Done.")
    })
    log.Println(http.ListenAndServe(":8081", nil))
}
Here is the client code, which times out:
package main

import (
    "io/ioutil"
    "log"
    "net/http"
    "time"
)

func main() {
    log.Println("HTTP GET")
    client := &http.Client{
        Timeout: 1 * time.Second,
    }
    r, err := client.Get(`http://127.0.0.1:8081/`)
    if err != nil {
        log.Fatal(err)
    }
    defer r.Body.Close()
    bs, err := ioutil.ReadAll(r.Body)
    if err != nil {
        log.Fatal(err)
    }
    log.Println("HTTP Done.")
    log.Println(string(bs))
}
You can use a normal browser to check the non-cancellation case, or close it, refresh it, disconnect it, etc., to trigger the cancellation.
I'm trying to process a file which contains 200 URLs and use each URL to make an HTTP request. I need to process at most 10 URLs concurrently (the code should block until 10 URLs finish processing). I tried to solve it in Go, but I keep getting the whole file processed with 200 concurrent connections created.
for scanner.Scan() { // loop through each url in the file
    // send each url to golang HTTPrequest
    go HTTPrequest(scanner.Text(), channel, &wg)
}
fmt.Println(<-channel)
wg.Wait()
What should I do?
A pool of 10 goroutines reading from a channel should fulfill your requirements.
work := make(chan string)

// get original 200 urls
var urlsToProcess []string = seedUrls()

// start a pool of 10 goroutines that read urls from the work channel
for i := 0; i < 10; i++ {
    go func(w chan string) {
        for url := range w {
            process(url) // hypothetical: do the HTTP request for this url
        }
    }(work)
}

// write urls to the work channel, blocking until a worker goroutine
// is able to start work
for _, url := range urlsToProcess {
    work <- url
}
close(work) // no more urls: let the workers drain the channel and exit
Cleanup and collecting the request results are left as an exercise for you. Writes to a Go channel block until one of the worker goroutines is able to read.
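For the cleanup part, one common approach is a sync.WaitGroup, so the caller can block until every worker has drained the channel; a minimal sketch (process remains a hypothetical stand-in for the per-URL request logic):
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for url := range work {
            process(url) // hypothetical per-url work
        }
    }()
}
for _, url := range urlsToProcess {
    work <- url
}
close(work) // signal the workers that no more work is coming
wg.Wait()   // block until all workers have finished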
Another option is to use a buffered channel as a counting semaphore, with code like this:
longTimeAct := func(index int, w chan struct{}, wg *sync.WaitGroup) {
    defer wg.Done()
    time.Sleep(1 * time.Second) // simulate slow work
    println(index)
    <-w // release one slot in the semaphore
}

wg := new(sync.WaitGroup)
ws := make(chan struct{}, 10) // buffered channel used as a counting semaphore
for i := 0; i < 100; i++ {
    ws <- struct{}{} // acquire a slot; blocks while 10 tasks are in flight
    wg.Add(1)
    go longTimeAct(i, ws, wg)
}
wg.Wait()
Generating a dynamic web page for the client with the template package is very slow.
Testing code below, Go 1.4.1:
http.Handle("/js/", (http.FileServer(http.Dir(webpath))))
http.Handle("/css/", (http.FileServer(http.Dir(webpath))))
http.Handle("/img/", (http.FileServer(http.Dir(webpath))))
http.HandleFunc("/test", TestHandler)
func TestHandler(w http.ResponseWriter, r *http.Request) {
Log.Info("Entering TestHandler ...")
r.ParseForm()
filename := NiConfig.webpath + "/test.html"
t, err := template.ParseFiles(filename)
if err != nil {
Log.Error("template.ParseFiles err = %v", err)
}
t.Execute(w, nil)
}
According to the log, it took about 3 seconds in t.Execute(w, nil), and I do not know why it uses so much time. I also tried serving test.html with Apache, and it responded very fast.
You should not parse templates every time you serve a request!
There is a significant time cost to reading a file, parsing its content and building the template. Also, since templates do not change (varying parts should be parameters!), you only have to read and parse a template once.
Parsing and creating the template on every request also allocates lots of values in memory which are then thrown away (because they are not reused), giving additional work to the garbage collector.
Parse the templates when your application starts, store the result in a variable, and then you only have to execute the template when a request comes in. For example:
var t *template.Template

func init() {
    filename := NiConfig.webpath + "/test.html"
    t = template.Must(template.ParseFiles(filename))
    http.HandleFunc("/test", TestHandler)
}

func TestHandler(w http.ResponseWriter, r *http.Request) {
    Log.Info("Entering TestHandler ...")
    // Template is ready, just Execute it
    if err := t.Execute(w, nil); err != nil {
        log.Printf("Failed to execute template: %v", err)
    }
}
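If you have many templates, the same pattern works with template.ParseGlob; a small sketch, assuming the .html files live under NiConfig.webpath (the handler name is illustrative):
// Parse every .html file under webpath once, at startup.
var templates = template.Must(template.ParseGlob(NiConfig.webpath + "/*.html"))

func testGlobHandler(w http.ResponseWriter, r *http.Request) {
    // Select the parsed template by its file name.
    if err := templates.ExecuteTemplate(w, "test.html", nil); err != nil {
        log.Printf("Failed to execute template: %v", err)
    }
}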
In Go, a TCP connection (net.Conn) is an io.ReadWriteCloser. I'd like to test my network code by simulating a TCP connection. There are two requirements that I have:
the data to be read is stored in a string
whenever data is written, I'd like it to be stored in some kind of buffer which I can access later
Is there a data structure for this, or an easy way to make one?
No idea if this existed when the question was asked, but you probably want net.Pipe(), which provides you with two full-duplex net.Conn instances that are linked to each other.
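A minimal sketch of using net.Pipe in a test; the echo loop stands in for the code under test:
serverConn, clientConn := net.Pipe()

// Hand serverConn to the code under test; here, a trivial echo loop.
go func() {
    defer serverConn.Close()
    io.Copy(serverConn, serverConn)
}()

// The test acts as the client on the other end.
fmt.Fprintf(clientConn, "ping\n")
buf := make([]byte, 5)
if _, err := io.ReadFull(clientConn, buf); err == nil {
    fmt.Printf("got back: %s", buf) // "ping\n"
}
clientConn.Close()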
EDIT: I've rolled this answer into a package which makes things a bit simpler - see here: https://github.com/jordwest/mock-conn
While Ivan's solution will work for simple cases, keep in mind that a real TCP connection is actually two buffers, or rather pipes. For example:
Server | Client
---------+---------
reads <=== writes
writes ===> reads
If you use a single buffer that the server both reads from and writes to, you could end up with the server talking to itself.
Here's a solution that allows you to pass a MockConn type as a ReadWriteCloser to the server. The Read, Write and Close functions simply proxy through to the functions on the server's end of the pipes.
type MockConn struct {
ServerReader *io.PipeReader
ServerWriter *io.PipeWriter
ClientReader *io.PipeReader
ClientWriter *io.PipeWriter
}
func (c MockConn) Close() error {
if err := c.ServerWriter.Close(); err != nil {
return err
}
if err := c.ServerReader.Close(); err != nil {
return err
}
return nil
}
func (c MockConn) Read(data []byte) (n int, err error) { return c.ServerReader.Read(data) }
func (c MockConn) Write(data []byte) (n int, err error) { return c.ServerWriter.Write(data) }
func NewMockConn() MockConn {
serverRead, clientWrite := io.Pipe()
clientRead, serverWrite := io.Pipe()
return MockConn{
ServerReader: serverRead,
ServerWriter: serverWrite,
ClientReader: clientRead,
ClientWriter: clientWrite,
}
}
When mocking a 'server' connection, simply pass the MockConn in place of where you would use the net.Conn (this obviously implements only the ReadWriteCloser interface; you could easily add dummy methods for LocalAddr() etc. if you need to support the full net.Conn interface).
In your tests you can act as the client by reading and writing to the ClientReader and ClientWriter fields as needed:
func TestTalkToServer(t *testing.T) {
    /*
     * Assumes that NewMockConn has already been called and
     * the server is waiting for incoming data
     */
    // Send a message to the server
    fmt.Fprintf(mockConn.ClientWriter, "Hello from client!\n")

    // Wait for the response from the server
    rd := bufio.NewReader(mockConn.ClientReader)
    line, err := rd.ReadString('\n')
    if err != nil {
        t.Fatalf("failed to read from server: %v", err)
    }
    // ReadString keeps the trailing delimiter, so include it in the comparison
    if line != "Hello from server!\n" {
        t.Errorf("Server response not as expected: %s\n", line)
    }
}
Why not use bytes.Buffer? It's an io.ReadWriter and has a String method to get the stored data. If you need to make it an io.ReadWriteCloser, you could define your own type:
type CloseableBuffer struct {
    bytes.Buffer
}
and define a Close method:
func (b *CloseableBuffer) Close() error {
    return nil
}
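A short usage sketch; the seeded string is illustrative:
buf := &CloseableBuffer{}
buf.WriteString("data for the code under test to read") // promoted from bytes.Buffer

// Pass buf anywhere an io.ReadWriteCloser is expected. Afterwards,
// buf.String() returns the buffered data (writes not yet consumed by reads).
fmt.Println(buf.String())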
In the majority of cases you do not need to mock net.Conn.
You only have to mock things that add time to your tests, prevent tests from running in parallel (e.g. shared resources like a hardcoded file name), or can lead to outages (you could potentially exhaust the connection limit or ports, but in most cases this is not a concern when you run your tests in isolation).
Not mocking has the advantage of testing more precisely what you want to test, against the real thing.
https://www.accenture.com/us-en/blogs/software-engineering-blog/to-mock-or-not-to-mock-is-that-even-a-question
Instead of mocking net.Conn, you can write a mock server, run it in a goroutine in your test, and connect to it using a real net.Conn.
A quick and dirty example:
port := someRandomPort()
srv := &http.Server{Addr: port}
http.HandleFunc("/hello", myHandleFunc)

go func() {
    srv.ListenAndServe()
}()

myTestCodeUsingConn(port)
srv.Shutdown(context.TODO())