Why is Golang http.ResponseWriter execution being delayed?

I am trying to send a page response as soon as a request is received, then process something, but I found that the response is not sent out "first" even though it comes first in the code. In real life I have a page for uploading an Excel sheet which gets saved into the database, which takes time (50,000+ rows), and I would like to update the user on the progress. Here is a simplified example (depending on how much RAM you have, you may need to add a couple of zeros to the counter to see the result):
package main

import (
	"fmt"
	"net/http"
)

func writeAndCount(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("Starting to count"))
	for i := 0; i < 1000000; i++ {
		if i%1000 == 0 {
			fmt.Println(i)
		}
	}
	w.Write([]byte("Finished counting"))
}

func main() {
	http.HandleFunc("/", writeAndCount)
	http.ListenAndServe(":8080", nil)
}

The original concept of the HTTP protocol is a simple request-response server-client computation model. There was no support for streaming or "continuous" client updates. It is (was) always the client who first contacts the server should it need some kind of information.
Also, since most web servers buffer the response until it is fully ready (or until a certain limit is reached, which is typically the buffer size), data you write (send) to the client won't be transmitted immediately.
Several techniques were "developed" to get around this "limitation" so that the server is able to notify the client about changes or progress, such as HTTP Long polling, HTTP Streaming, HTTP/2 Server Push or Websockets. You can read more about these in this answer: Is there a real server push over http?
So to achieve what you want, you have to step around the original "borders" of the HTTP protocol.
If you want to send data periodically, or stream data to the client, you have to tell this to the server. The easiest way is to check if the http.ResponseWriter handed to you implements the http.Flusher interface (using a type assertion), and if it does, calling its Flusher.Flush() method will send any buffered data to the client.
Using http.Flusher is only half of the solution. Since this is a non-standard usage of the HTTP protocol, usually client support is also needed to handle this properly.
First, you have to let the client know about the "streaming" nature of the response by setting the Content-Type=text/event-stream response header.
Next, to avoid clients caching the response, be sure to also set Cache-Control=no-cache.
And last, to let the client know that you might not send the response as a single unit (but rather as periodic updates or as a stream) and so that the client should keep the connection alive and wait for further data, set the Connection=keep-alive response header.
Once the response headers are set as the above, you may start your long work, and whenever you want to update the client about the progress, write some data and call Flusher.Flush().
Let's see a simple example that does everything "right":
package main

import (
	"fmt"
	"net/http"
	"time"
)

func longHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "Server does not support Flusher!",
			http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")
	start := time.Now()
	for rows, max := 0, 50*1000; rows < max; {
		time.Sleep(time.Second) // Simulating work...
		rows += 10 * 1000
		fmt.Fprintf(w, "Rows done: %d (%d%%), elapsed: %v\n",
			rows, rows*100/max, time.Since(start).Truncate(time.Millisecond))
		flusher.Flush()
	}
}

func main() {
	http.HandleFunc("/long", longHandler)
	panic(http.ListenAndServe("localhost:8080", nil))
}
Now if you open http://localhost:8080/long in your browser, you will see the output "growing" every second:
Rows done: 10000 (20%), elapsed: 1s
Rows done: 20000 (40%), elapsed: 2s
Rows done: 30000 (60%), elapsed: 3s
Rows done: 40000 (80%), elapsed: 4.001s
Rows done: 50000 (100%), elapsed: 5.001s
Also note that when using SSE, you should "pack" updates into SSE frames, that is you should start them with "data:" prefix, and end each frame with 2 newline chars: "\n\n".
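For example, the update in the handler above could be wrapped in a small helper that emits proper SSE frames (a sketch; writeSSE is an illustrative name, not part of the original answer):

// writeSSE frames one progress update as a Server-Sent Event: a "data:" prefix
// and a terminating blank line ("\n\n"), then flushes it to the client.
func writeSSE(w http.ResponseWriter, flusher http.Flusher, rows, max int) {
	fmt.Fprintf(w, "data: Rows done: %d (%d%%)\n\n", rows, rows*100/max)
	flusher.Flush()
}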
"Literature" and further reading / tutorials
Read more about Server-sent events on Wikipedia.
See a Golang HTML5 SSE example.
See a Golang SSE server example with client code using it.
See w3schools.com's tutorial on Server-Sent Events - One Way Messaging.

You can check if the ResponseWriter is a http.Flusher, and if so, force the flush to network:
if f, ok := w.(http.Flusher); ok {
	f.Flush()
}
However, bear in mind that this is a very unconventional HTTP handler. Streaming out progress messages to the response as if it were a terminal presents a few problems, particularly if the client is a web browser.
You might want to consider something more fitting with the nature of HTTP, such as returning a 202 Accepted response immediately, with a unique identifier the client can use to check on the status of processing using subsequent calls to your API.
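A rough sketch of that pattern is below; the route names, in-memory job store, and ID scheme are illustrative assumptions, not part of the original answer.

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"time"
)

var (
	mu   sync.Mutex
	jobs = map[string]string{} // job ID -> status (illustrative in-memory store)
)

func startImport(w http.ResponseWriter, r *http.Request) {
	id := fmt.Sprintf("job-%d", time.Now().UnixNano()) // illustrative ID scheme

	mu.Lock()
	jobs[id] = "processing"
	mu.Unlock()

	// Do the long work in the background.
	go func() {
		time.Sleep(5 * time.Second) // simulate saving 50,000+ rows
		mu.Lock()
		jobs[id] = "done"
		mu.Unlock()
	}()

	// Tell the client the work was accepted and where to poll for status.
	w.Header().Set("Location", "/status/"+id)
	w.WriteHeader(http.StatusAccepted) // 202
	json.NewEncoder(w).Encode(map[string]string{"id": id})
}

func status(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Path[len("/status/"):]
	mu.Lock()
	s, ok := jobs[id]
	mu.Unlock()
	if !ok {
		http.NotFound(w, r)
		return
	}
	json.NewEncoder(w).Encode(map[string]string{"id": id, "status": s})
}

func main() {
	http.HandleFunc("/import", startImport)
	http.HandleFunc("/status/", status)
	http.ListenAndServe(":8080", nil)
}

The client then polls /status/<id> until the job is reported as done.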

Related

Is it possible to concurrently download a single file without range requests?

With HTTP/1.1 Ranged Requests there is a non-zero chance of corruption. Various transparent proxies usually used by ISPs can mangle the requests and thereafter return junk.
Also, not all servers support ranged requests and content length headers.
Is there a KISS way to concurrently download a file using Go without using these tricks?
Is there a KISS way to concurrently download a file using Go without using these tricks?
No, there isn't. But this is not related to Go, this is a plain fact of how HTTP works. If you do not want to use range headers you cannot download just "the first" and "the second" half of a request.
Not only is there no "KISS way", there is absolutely no way (except by using the appropriate tools like range request).
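For completeness, a ranged request in Go is just a matter of setting the Range header. A minimal sketch follows; the URL and byte range are placeholders:

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Placeholder URL; substitute a server that supports range requests.
	req, err := http.NewRequest("GET", "http://example.com/file.bin", nil)
	if err != nil {
		log.Fatal(err)
	}
	// Ask only for the first 500,000 bytes; another goroutine could fetch the rest.
	req.Header.Set("Range", "bytes=0-499999")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// A server that honours the range replies with 206 Partial Content.
	fmt.Println(resp.Status)
}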
One possible method (depending on exactly how you interpret "concurrently") would be something like the code below. It will NOT let you set time-outs or monitor the progress of the download, but it will allow you to start a download and then read it at leisure later.
package main

import (
	"fmt"
	"io"
	"net/http"
)

type download struct {
	err error
	r   io.ReadCloser
}

// startDownload fetches a URL in a goroutine and returns a channel through
// which we can read a download struct. The struct will either contain an
// error or an io.ReadCloser through which we can read the file.
// (Named startDownload rather than download to avoid clashing with the type.)
func startDownload(url string) chan download {
	rv := make(chan download)
	go func() {
		resp, err := http.Get(url)
		if err != nil {
			rv <- download{err: err}
			return
		}
		switch {
		case resp.StatusCode == 200:
			// This can be made better, but is probably good
			// enough for an illustration
			rv <- download{r: resp.Body}
		default:
			rv <- download{err: fmt.Errorf("HTTP GET gave status %s", resp.Status)}
		}
	}()
	return rv
}
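Reading from the returned channel might look like this (a usage sketch, assuming the startDownload function and imports above; the URL and the io.Discard sink are placeholders):

func main() {
	// Start the download; this returns immediately.
	dl := startDownload("http://example.com/big-file")

	// ... do other work here ...

	// Later, block until the result is available.
	d := <-dl
	if d.err != nil {
		fmt.Println("download failed:", d.err)
		return
	}
	defer d.r.Close()

	n, err := io.Copy(io.Discard, d.r) // read the body; replace io.Discard with a real sink
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("downloaded", n, "bytes")
}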

Golang : ExecuteTemplate results in error: `write tcp 127.0.0.1:8080->127.0.0.1:35212: write: broken pipe` with multiple requests

I have something like this:
t, err1 := template.ParseFiles("exampleFile.tmpl")
if err1 != nil {
	panic(err1)
}
err2 := t.ExecuteTemplate(w, "example", someStruct)
if err2 != nil {
	panic(err2)
}
With simple requests, there are no problems.
But if I send two requests very close to each other (simple page refreshes from Chrome plus a few AJAX requests), about 40% of the time it results in various different errors (err2):
write tcp 127.0.0.1:8080->127.0.0.1:35212: write: broken pipe
write: connection reset by peer
use of closed network connection
http: panic serving 127.0.0.1:35814: runtime error: slice bounds out of range
goroutine 340 (...)
Others seem to have had similar issues, but with no clear resolution.
Filter out broken pipe errors from template execution
https://groups.google.com/forum/#!topic/golang-nuts/g6UNu4Mrv28
Another post seems to suggest just to expect this, and retry failed requests:
https://stackoverflow.com/a/31409281/1923095
What's the best Golang way to handle this?
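One common approach (a sketch of the general idea, not an authoritative answer from this thread) is to treat a failed write as a client that went away: log it and return instead of panicking, since there is nobody left to send the page to. This reuses the variables from the snippet above and assumes the log package is imported.

err2 := t.ExecuteTemplate(w, "example", someStruct)
if err2 != nil {
	// The client may have closed the connection (page refresh, cancelled AJAX call),
	// in which case the write fails with "broken pipe" or similar.
	// There is nothing useful to send back, so log and stop instead of panicking.
	log.Printf("template execution failed: %v", err2)
	return
}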

I don't know the time unit to use for av_dict_set to set a timeout

I am confused. I am using the av_dict_set function to set a timeout, but when I searched for information about av_dict_set, the time unit seemed to differ between sources. I don't know how to set it now. Can anyone help?
I found some code like the following:
pFormatCtx = avformat_alloc_context();
av_dict_set(&opts, "rtsp_transport", "tcp", 0);
//av_dict_set(&opts, "timeout", "5000000", 0);
if(strncmp(stream_url, "rtmp:", sizeof("rtmp:")) == 0){
    av_dict_set(&opts, "timeout", "6", 0); // in secs
}
else if(strncmp(stream_url, "http:", sizeof("http:")) == 0){
    av_dict_set(&opts, "timeout", "6000", 0); // in ms
}
if(avformat_open_input(&pFormatCtx, stream_url, NULL, &opts) != 0)
{
    return 1;
}
Maybe the time unit should be set according to the protocol (HTTP or RTSP).
Is the code above right?
TL;DR
RTMP and RTSP protocol, time base: seconds;
HTTP protocol, time base: microseconds (not milliseconds).
So just fix the HTTP branch accordingly by multiplying your current value by 1000.
FULL
I have a C++ application that uses libav to encode an H.264/AAC RTSP stream and push it to a local RTSP server that then serves it. Also I have another C++ application that uses libav to decode this RTSP stream, extract video/audio data from packets, rescale them and show pixels data from buffers using SFML.
In the decoding application I use the timeout option to determine whether the RTSP stream is available. This is an optional parameter, but if the decoding process starts before the RTSP stream is available and timeout is not set, the decoding process hangs. This happens because the default value for the RTSP and HTTP protocols is -1, which means "wait indefinitely". If you instead set it to a different value and this situation happens, avformat_open_input will return an AVERROR code that you can analyze further; for example, you can attempt to reconnect to the RTSP stream by simply starting over, giving you finer control over your execution flow.
So the question is: "What is the correct time base for this value so I can use it accordingly?"
As documented here, for the RTSP protocol you can set the timeout option to establish the maximum amount of time to wait when opening your stream. In the RTSP section, the guide explicitly says that this value is measured in seconds:
timeout
Set maximum timeout (in seconds) to wait for incoming connections.
A value of -1 means infinite (default). This option implies the rtsp_flags set to ‘listen’.
While the documentation doesn't specify it for the RTMP protocol, I have tested it by changing my RTSP URL to an RTMP URL without changing the time base and it worked as expected, so my deduction is that both protocols share the same time base.
Also, on the same page, here, for the HTTP protocol, you can set the timeout value for the same purpose, but the time base must be in microseconds.
timeout
Set timeout in microseconds of socket I/O operations used by the underlying low level operation. By default it is set to -1, which means that the timeout is not specified.
So in your case the time base you expected is not correct (I assume you meant milliseconds); the correct one is microseconds. In order to have a 6 s timeout and not a 0.006 s timeout, replace this:
else if(strncmp(stream_url, "http:", sizeof("http:")) == 0){
    av_dict_set(&opts, "timeout", "6000", 0); // in ms
}
with this:
else if(strncmp(stream_url, "http:", sizeof("http:")) == 0){
    av_dict_set(&opts, "timeout", "6000000", 0); // in microseconds
}
As your example already shows, you allocate a format context; then, before you open your stream, you create an AVDictionary and set the timeout value with av_dict_set. You can also set other options. All this information is passed to avformat_open_input by passing the dictionary just created and configured by reference.
As described in line 405 in libavformat\utils.c, the dictionary info will be copied to the decoder format context priv_data and will be used in order to open the stream.
If timeout is triggered, the function will return an AVERROR code.
avformat_network_init();
AVFormatContext* muxer_receiver = avformat_alloc_context();
AVDictionary* options = NULL;
av_dict_set(&options, "timeout", "3", 0);
if(avformat_open_input(&muxer_receiver, "rtsp://:32400/live/1", NULL, &options) != 0){
    return EXIT_FAILURE;
}
if(avformat_find_stream_info(muxer_receiver, NULL) < 0){
    return EXIT_FAILURE;
}
// Do stuff like retrieving the video and audio stream indexes
av_read_play(muxer_receiver);

multiple response.WriteHeader calls in a really simple example?

I have the most basic net/http program that I'm using to learn the namespace in Go:
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Println(r.URL)
		go HandleIndex(w, r)
	})
	fmt.Println("Starting Server...")
	log.Fatal(http.ListenAndServe(":5678", nil))
}

func HandleIndex(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(200)
	w.Write([]byte("Hello, World!"))
}
When I run the program and connect to localhost:5678 in Chrome, I get this in the console:
Starting Server...
/
2015/01/15 13:41:29 http: multiple response.WriteHeader calls
/favicon.ico
2015/01/15 13:41:29 http: multiple response.WriteHeader calls
But I don't see how that's possible. I print the URL, start up a new goroutine, write the header once, and give it a static body of Hello, World! It seems like one of two things is happening. Either something behind the scenes is writing another header or somehow HandleIndex is called twice for the same request. What can I do to stop writing multiple headers?
EDIT: It seems to have something to do with the go HandleIndex(w, r) line, because if I remove go and just make it a plain function call instead of a goroutine, I don't get any issues and the browser gets its data. With it being a goroutine, I get the multiple WriteHeader error and the browser doesn't show "Hello, World!". Why does making this a goroutine break it?
Take a look at the anonymous function you register as the handler of incoming requests:
func(w http.ResponseWriter, r *http.Request) {
	fmt.Println(r.URL)
	go HandleIndex(w, r)
}
It prints the URL (to the standard output) then calls HandleIndex() in a new goroutine and continues execution.
If you have a handler function where you do not set the response status before the first call to Write, Go will automatically set the response status to 200 (HTTP OK). If the handler function does not write anything to the response (and does not set the response status and completes normally), that is also treated as a successful handling of the request and the response status 200 will be sent back. Your anonymous function does not set it, it does not even write anything to the response. So Go will do just that: set the response status to 200 HTTP OK.
Note that handling each request runs in its own goroutine.
So if you call HandleIndex in a new goroutine, your original anonymous function continues and returns, so the response header gets set; meanwhile (concurrently) the new goroutine you started will also set the response header, hence the "multiple response.WriteHeader calls" error.
If you remove the "go", your HandleIndex function will set the response header in the same goroutine before your handler function returns, the net/http package will know about this and will not try to set the response header again, and so the error you experienced will not happen.
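For reference, a minimal corrected registration (just the go removed), matching the main from the question:

http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	fmt.Println(r.URL)
	HandleIndex(w, r) // runs in the request's own goroutine; no extra goroutine needed
})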
You already received a correct answer which addresses your problem; I will give some information about the general case (such errors appear often).
From the documentation, you can see that WriteHeader sends an HTTP status code, and you can't send more than one status code. If you Write anything, that is equivalent to sending a 200 status code and then writing.
So the message you see appears if you either call w.WriteHeader more than once explicitly, or call w.Write before w.WriteHeader.
From the documentation:
// WriteHeader sends an HTTP response header with status code.
// If WriteHeader is not called explicitly, the first call to Write
// will trigger an implicit WriteHeader(http.StatusOK).
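For example, this handler (a minimal sketch, not from the question) would log the same warning, because Write has already sent an implicit 200 before WriteHeader is called:

func demoHandler(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("body first"))      // triggers an implicit WriteHeader(200)
	w.WriteHeader(http.StatusNotFound) // too late: "http: multiple response.WriteHeader calls"
}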
What is happening in your case is that you are launching go HandleIndex from the handler.
The outer handler finishes, and the implicit WriteHeader writes to the ResponseWriter. Then the goroutine running HandleIndex also tries to write a header and a body.
Just remove the go from HandleIndex and it will work.
The root cause is that you called WriteHeader more than once. From the source code:
func (w *response) WriteHeader(code int) {
	if w.conn.hijacked() {
		w.conn.server.logf("http: response.WriteHeader on hijacked connection")
		return
	}
	if w.wroteHeader {
		w.conn.server.logf("http: multiple response.WriteHeader calls")
		return
	}
	w.wroteHeader = true
	w.status = code

	if w.calledHeader && w.cw.header == nil {
		w.cw.header = w.handlerHeader.clone()
	}

	if cl := w.handlerHeader.get("Content-Length"); cl != "" {
		v, err := strconv.ParseInt(cl, 10, 64)
		if err == nil && v >= 0 {
			w.contentLength = v
		} else {
			w.conn.server.logf("http: invalid Content-Length of %q", cl)
			w.handlerHeader.Del("Content-Length")
		}
	}
}
So once you have written, the variable wroteHeader is true; if you then write the header again, it won't take effect and will give the warning "http: multiple response.WriteHeader calls".
Actually the function Write also calls WriteHeader, so putting a call to WriteHeader after a call to Write also causes that error, and the later WriteHeader doesn't work.
In your case, go HandleIndex runs in another goroutine while the original handler has already returned; since the handler wrote nothing, the server calls WriteHeader to set 200. When HandleIndex then runs, it calls WriteHeader again; at that point wroteHeader is true, so the message "http: multiple response.WriteHeader calls" is output.
Yes, using HandleIndex(w, r) instead of go HandleIndex(w, r) will fix your issue; I think you have already figured that out.
The reason is simple: when handling multiple requests at the same time, the HTTP server starts multiple goroutines, and your handler function is called separately in each of those goroutines without blocking the others.
You don't need to start your own goroutine in the handler unless you actually need it, but that is another topic.
Because modern browsers send an extra request for /favicon.ico which is also handled in your / request handler.
If you ping your server with curl for example, you'll see only one request being sent:
curl localhost:5678
To be sure, you can register a more specific endpoint in your http.HandleFunc:
http.HandleFunc("/Home", func(w http.ResponseWriter, r *http.Request) {
	// ...
})

Parallel HTTP web crawler in Erlang

I'm coding a simple web crawler and have generated a bunch of static files that I try to crawl with the code at the bottom. I have two issues/questions I don't have an idea for:
1.) Looping over the sequence 1..200 throws me an error exactly after 100 pages have been crawled:
** exception error: no match of right hand side value {error,socket_closed_remotely}
in function erlang_test_01:fetch_page/1 (erlang_test_01.erl, line 11)
in call from lists:foreach/2 (lists.erl, line 1262)
2.) How to parallelize the requests, e.g. 20 concurrent requests?
-module(erlang_test_01).
-export([start/0]).

-define(BASE_URL, "http://46.4.117.69/").

to_url(Id) ->
    ?BASE_URL ++ io_lib:format("~p", [Id]).

fetch_page(Id) ->
    Uri = to_url(Id),
    {ok, {{_, Status, _}, _, Data}} = httpc:request(get, {Uri, []}, [], [{body_format, binary}]),
    Status,
    Data.

start() ->
    inets:start(),
    lists:foreach(fun(I) -> fetch_page(I) end, lists:seq(1, 200)).
1. Error message
socket_closed_remotely indicates that the server closed the connection, maybe because you made too many requests in a short timespan.
2. Parallelization
Create 20 worker processes and one process holding the URL queue. Let each process ask the queue for a URL (by sending it a message). This way you can control the number of workers.
An even more "Erlangy" way is to spawn one process for each URL! The upside to this is that your code will be very straightforward. The downside is that you cannot control your bandwidth usage or number of connections to the same remote server in a simple way.
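For illustration only, here is the queue-plus-20-workers structure sketched in Go rather than Erlang (the shape of the solution is the same: one channel acts as the URL queue and each worker pulls from it; the base URL is taken from the question):

package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	urls := make(chan string) // the "URL queue"
	var wg sync.WaitGroup

	// Start 20 workers, each asking the queue for the next URL.
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range urls {
				resp, err := http.Get(url)
				if err != nil {
					fmt.Println("fetch failed:", err)
					continue
				}
				resp.Body.Close()
			}
		}()
	}

	// Feed the queue, then close it so the workers stop.
	for id := 1; id <= 200; id++ {
		urls <- fmt.Sprintf("http://46.4.117.69/%d", id)
	}
	close(urls)
	wg.Wait()
}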
