Remove file once served?

Is there a way to remove the whole static directory from the server once its content has been served one time? (By served I mean displayed in the browser once.)
func main() {
	fs := http.FileServer(http.Dir(tempDir))
	http.Handle("/", fs)
	http.HandleFunc("/app/wo", workOrderApp)
	log.Fatal(http.ListenAndServe(":"+os.Args[1], nil))
}
func workOrderApp(w http.ResponseWriter, r *http.Request) {
	workOrderAppProcess(w)
	time.Sleep(time.Duration(4 * time.Second)) // some time to let the html render
	os.RemoveAll(tempDir)
}
The sleep before os.RemoveAll was hit and miss. I had to adjust the sleep time to a few seconds, otherwise the file was sometimes served and sometimes not, I believe because of bandwidth or other network factors. It also had the side effect of delaying the rendering of the whole page.
In this example I remove the whole directory, which is what I want.
func workOrderAppProcess(aid, date, language, token string, w http.ResponseWriter) {
	zipDir := os.Args[2]
	if _, err := os.Stat(tempDir); os.IsNotExist(err) {
		log.Printf("Creating directory: %v", tempDir)
		err := os.MkdirAll(tempDir, 0777)
		if err != nil {
			log.Print(err.Error())
		}
	}
	log.Printf("Extracting file: %v to: %v", date+".zip", tempDir)
	zipPath, _ := filepath.Abs(zipDir + "/" + date + ".zip")
	app.ExtractZip(zipPath, tempDir)
	batch := app.ReturnBatchNumber(tempDir + date)
	typesData := app.ReturnWorkTypeData(app.ParseXML(tempDir + date + "/" + batch + "_type_list.xml"))
	record := app.FindAppointmentRecord(aid, app.ParseXML(tempDir+date+"/"+batch+"_appt.xml"))
	signatureFileURL := app.ReturnSignatureFileURL(tempDir+date, aid, date)
	app.RenderTemplate(record, typesData, "template/wo.html", language, "/"+signatureFileURL, w)
}

Your code is causing the HTTP handler to wait 4 seconds, delete the files, then finalize the HTTP response. Just remove the sleep.
func workOrderApp(w http.ResponseWriter, r *http.Request) {
	workOrderAppProcess(w)
	os.RemoveAll(tempDir)
}
This is more efficient, more directly reflects your intention, and doesn't leave the HTTP connection open needlessly for an extra 4 seconds.
If you have other logic in your handler not shown, and you want to ensure that the delete happens in all cases, a defer can be useful:
func workOrderApp(w http.ResponseWriter, r *http.Request) {
	workOrderAppProcess(w)
	defer os.RemoveAll(tempDir)
	/* Other logic that may do things */
}
After discussion in chat, it's apparent that workOrderAppProcess is rendering your HTML, and that os.RemoveAll is removing the images needed by that HTML. To solve this, you need to delay the removal until after the HTML has been served. This can be done with a simple goroutine:
func workOrderApp(w http.ResponseWriter, r *http.Request) {
	workOrderAppProcess(w)
	go func() {
		time.Sleep(60 * time.Second)
		os.RemoveAll(tempDir)
	}()
}

Related

Why is there a 60 second delay on my HTTP POST request when using a Go HTTP client?

My goal is to scrape a website that requires me to log in first, using HTTP requests in Golang. I actually succeeded by finding out that I can send a POST request to the website with form-data in the body of the request. When I test this through an API development tool I use called Postman, the response is instantaneous with no delays. However, when performing the request with an HTTP client in Go, there is a consistent 60-second delay every single time. I end up getting a logged-in page, but for my program I need the response to be nearly instantaneous.
As you can see in my code, I've tried adding a bunch of headers to the request like "Connection", "Content-Type", and "User-Agent", since I thought maybe the website can tell I'm requesting from a program and is forcing me to wait 60 seconds for a response. Adding these headers to make my request look more legitimate doesn't work at all.
Is the delay coming from Go's HTTP client being slow, or is there something wrong with how I'm forming my HTTP POST request? Also, was I on to something with my headers, and is the HTTP client rewriting them when they are sent out?
Here's my simple program...
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"net/http/cookiejar"
	"os"
)

func main() {
	url := "https://easypronunciation.com/en/log-in"
	method := "POST"

	payload := &bytes.Buffer{}
	writer := multipart.NewWriter(payload)
	_ = writer.WriteField("email", "foo@bar.com")
	_ = writer.WriteField("password", "*********")
	_ = writer.WriteField("persistent_login", "on")
	_ = writer.WriteField("submit", "")
	err := writer.Close()
	if err != nil {
		fmt.Println(err)
	}

	cookieJar, _ := cookiejar.New(nil)
	client := &http.Client{
		Jar: cookieJar,
	}
	req, err := http.NewRequest(method, url, payload)
	if err != nil {
		fmt.Println(err)
	}
	req.Header.Set("Content-Type", writer.FormDataContentType())
	req.Header.Set("Connection", "Keep-Alive")
	req.Header.Set("Accept-Language", "en-US")
	req.Header.Set("User-Agent", "Mozilla/5.0")

	res, err := client.Do(req)
	if err != nil {
		fmt.Println(err)
	}
	defer res.Body.Close()

	f, err := os.Create("response.html")
	defer f.Close()
	res.Write(f)
}
I doubt this is the Go client library. I would suggest printing out the latencies for the different components to see if/where the 60-second delay is. I would also try a few different URLs and compare.
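For example, here is a minimal sketch of that kind of latency breakdown using net/http/httptrace; the URL and the hooks chosen are only illustrative, not taken from the question:

package main

import (
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	req, _ := http.NewRequest("GET", "https://example.com/", nil) // placeholder URL
	start := time.Now()
	trace := &httptrace.ClientTrace{
		DNSDone: func(httptrace.DNSDoneInfo) {
			fmt.Println("dns done:", time.Since(start))
		},
		ConnectDone: func(network, addr string, err error) {
			fmt.Println("tcp connect done:", time.Since(start))
		},
		GotFirstResponseByte: func() {
			fmt.Println("first response byte:", time.Since(start))
		},
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer res.Body.Close()
	fmt.Println("total:", time.Since(start))
}

If most of the 60 seconds falls between the connection being established and the first response byte, the server is sitting on the request; if it shows up earlier, the problem is on the client or network side.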

Async work after response

I am trying to implement an HTTP server that:
Calculates a further redirect using some logic
Redirects the user
Logs user data
The goal is to achieve maximum throughput (at least 15k rps). In order to do this, I want to save the log asynchronously. I'm using Kafka as the logging system and have separated the logging block of code into its own goroutine. An overall example of the current implementation:
package main

import (
	"encoding/json"
	"net/http"
	"time"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

type log struct {
	RuntimeParam  string `json:"runtime_param"`
	AsyncParam    string `json:"async_param"`
	RemoteAddress string `json:"remote_address"`
}

var (
	producer, _ = kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers":      "localhost:9092,localhost:9093",
		"queue.buffering.max.ms": 1 * 1000,
		"go.delivery.reports":    false,
		"client.id":              1,
	})
	topicName = "log"
)
func main() {
	siteMux := http.NewServeMux()
	siteMux.HandleFunc("/", httpHandler)
	srv := &http.Server{
		Addr:         ":8080",
		Handler:      siteMux,
		ReadTimeout:  2 * time.Second,
		WriteTimeout: 5 * time.Second,
		IdleTimeout:  10 * time.Second,
	}
	if err := srv.ListenAndServe(); err != nil {
		panic(err)
	}
}
func httpHandler(w http.ResponseWriter, r *http.Request) {
	handlerLog := new(log)
	handlerLog.RuntimeParam = "runtimeDataString"
	http.Redirect(w, r, "http://google.com", 301)
	go func(goroutineLog *log, request *http.Request) {
		goroutineLog.AsyncParam = "asyncDataString"
		goroutineLog.RemoteAddress = r.RemoteAddr
		jsonLog, err := json.Marshal(goroutineLog)
		if err == nil {
			producer.ProduceChannel() <- &kafka.Message{
				TopicPartition: kafka.TopicPartition{Topic: &topicName, Partition: kafka.PartitionAny},
				Value:          jsonLog,
			}
		}
	}(handlerLog, r)
}
The questions are:
Is it correct/efficient to use a separate goroutine to implement async logging, or should I use a different approach (workers and channels, for example)?
Maybe there is a way to further improve the performance of the server that I'm missing?
Yes, this is a correct and efficient use of a goroutine (as Flimzy pointed out in the comments). I couldn't agree more; this is a good approach.
The problem is that the handler may finish executing before the goroutine has started processing everything, so the request (which is a pointer) may be gone, or you may run into races down the middleware stack. I read your comments that this isn't your case, but in general you shouldn't pass a request to a goroutine. As I can see from your code, you're really only using RemoteAddr from the request, so why not redirect straight away and put the logging in a defer statement? I'd rewrite your handler a bit:
func httpHandler(w http.ResponseWriter, r *http.Request) {
	http.Redirect(w, r, "http://google.com", 301)
	defer func(runtimeDataString, RemoteAddr string) {
		handlerLog := new(log)
		handlerLog.RuntimeParam = runtimeDataString
		handlerLog.AsyncParam = "asyncDataString"
		handlerLog.RemoteAddress = RemoteAddr
		jsonLog, err := json.Marshal(handlerLog)
		if err == nil {
			producer.ProduceChannel() <- &kafka.Message{
				TopicPartition: kafka.TopicPartition{Topic: &topicName, Partition: kafka.PartitionAny},
				Value:          jsonLog,
			}
		}
	}("runtimeDataString", r.RemoteAddr)
}
The goroutines are unlikely to improve the performance of your server; you just send the response earlier, and those Kafka sends could pile up in the background and slow down the whole server. If you find this to be the bottleneck, you may consider buffering logs locally and sending them to Kafka from another process (or a pool of workers) outside of your handlers. This can spread the workload over time (e.g. sending fewer logs when you have more requests, and vice versa).
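As an illustration of the worker-pool idea (the names logBuf, startLogWorkers, and enqueueLog are made up for this sketch, not part of the question's code), the handlers could do a non-blocking send into a bounded buffer that a few sender goroutines drain:

// Bounded buffer between request handlers and the log sink.
var logBuf = make(chan []byte, 10000)

// startLogWorkers starts n goroutines that drain the buffer and hand
// each entry to send (e.g. the Kafka producer, or a local file writer).
func startLogWorkers(n int, send func([]byte)) {
	for i := 0; i < n; i++ {
		go func() {
			for entry := range logBuf {
				send(entry)
			}
		}()
	}
}

// enqueueLog never blocks the handler: if the buffer is full,
// the entry is dropped (it could also be counted) instead.
func enqueueLog(entry []byte) {
	select {
	case logBuf <- entry:
	default:
	}
}

A handler would then call enqueueLog(jsonLog) instead of writing to the producer channel directly.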

How to avoid running into max open files limit

I'm building an application that will download roughly 5000 CSV files concurrently, using goroutines and plain ol' HTTP GET requests.
I'm currently running into open file limits imposed by OS X.
The CSV files are served over http. Are there any other network protocols that I can use to batch each request into one? I don't have access to the server, so I can't zip them. I'd also prefer not to change the ulimit because once in production, I probably won't have access to that configuration.
You probably want to limit active concurrent requests to a more sensible number than 5000. Possibly spin up 10/20 workers and send individual files to them over a channel.
The http client should reuse connections for requests, assuming you always read the entire request body, and close it.
Something like this:
func main() {
	http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 100
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go worker()
	}
	var csvs = []string{"http://example.com/a.csv", "http://example.com/b.csv"}
	for _, u := range csvs {
		ch <- u
	}
	close(ch)
	wg.Wait()
}

var ch = make(chan string)
var wg sync.WaitGroup

func worker() {
	defer wg.Done()
	for u := range ch {
		get(u)
	}
}

func get(u string) {
	resp, err := http.Get(u)
	if err != nil {
		return // handle the error here
	}
	// make sure we always read the rest of the body, and close it
	defer resp.Body.Close()
	defer io.Copy(ioutil.Discard, resp.Body)
	// read and decode / handle it. Make sure to read all of the body.
}

Process concurrent HTTP requests in Golang

I'm trying to process a file which contains 200 URLs and use each URL to make an HTTP request. I need to process at most 10 URLs concurrently at a time (the code should block until 10 URLs finish processing). I tried to solve it in Go, but I keep getting the whole file processed with 200 concurrent connections created.
for scanner.Scan() { // loop through each url in the file
	// send each url to golang HTTPrequest
	go HTTPrequest(scanner.Text(), channel, &wg)
}
fmt.Println(<-channel)
wg.Wait()
What should I do?
A pool of 10 goroutines reading from a channel should fulfill your requirements.
work := make(chan string)

// get the original 200 urls
var urlsToProcess []string = seedUrls()

// start a pool of 10 goroutines that read urls from the work channel
for i := 0; i < 10; i++ {
	go func(w chan string) {
		for url := range w {
			// process url here (left as an exercise)
			_ = url
		}
	}(work)
}

// write urls to the work channel, blocking until a worker goroutine
// is able to start work
for _, url := range urlsToProcess {
	work <- url
}
Cleanup and request results are left as an exercise for you. Go channel sends will block until one of the worker goroutines is able to read.
Another approach is to use a buffered channel as a semaphore, capping the number of goroutines running at once, with code like this:
longTimeAct := func(index int, w chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	time.Sleep(1 * time.Second)
	println(index)
	<-w
}
wg := new(sync.WaitGroup)
ws := make(chan struct{}, 10)
for i := 0; i < 100; i++ {
	ws <- struct{}{}
	wg.Add(1)
	go longTimeAct(i, ws, wg)
}
wg.Wait()

It takes too much time when using "template" package to generate a dynamic web page to client in Golang

It is very slow when using the template package to generate a dynamic web page for the client.
Testing code is below (Go 1.4.1):
http.Handle("/js/", (http.FileServer(http.Dir(webpath))))
http.Handle("/css/", (http.FileServer(http.Dir(webpath))))
http.Handle("/img/", (http.FileServer(http.Dir(webpath))))
http.HandleFunc("/test", TestHandler)
func TestHandler(w http.ResponseWriter, r *http.Request) {
Log.Info("Entering TestHandler ...")
r.ParseForm()
filename := NiConfig.webpath + "/test.html"
t, err := template.ParseFiles(filename)
if err != nil {
Log.Error("template.ParseFiles err = %v", err)
}
t.Execute(w, nil)
}
According to the log, it took about 3 seconds in t.Execute(w, nil), and I do not know why it uses so much time. I also tried serving test.html from an Apache server, and it responded very fast.
You should not parse templates every time you serve a request!
There is a significant time cost in reading a file, parsing its content and building the template. Also, since templates do not change (the varying parts should be parameters!), you only have to read and parse a template once.
Parsing and creating the template each time a request is served also generates lots of values in memory which are then thrown away (because they are not reused), giving additional work to the garbage collector.
Parse the templates when your application starts, store them in a variable, and then you only have to execute the template when a request comes in. For example:
var t *template.Template

func init() {
	filename := NiConfig.webpath + "/test.html"
	t = template.Must(template.ParseFiles(filename))
	http.HandleFunc("/test", TestHandler)
}

func TestHandler(w http.ResponseWriter, r *http.Request) {
	Log.Info("Entering TestHandler ...")
	// Template is ready, just Execute it
	if err := t.Execute(w, nil); err != nil {
		log.Printf("Failed to execute template: %v", err)
	}
}
