With HTTP/1.1 range requests there is a non-zero chance of corruption: the transparent proxies some ISPs run can mangle the requests and then return junk.
Also, not all servers support range requests or Content-Length headers.
Is there a KISS way to concurrently download a file using GO without using these tricks?
Is there a KISS way to concurrently download a file using GO without using these tricks?
No, there isn't. But this is not related to Go; it is simply how HTTP works. If you do not want to use range headers, you cannot download just "the first" and "the second" half of a resource.
Not only is there no "KISS way", there is no way at all (except by using the appropriate tool, i.e. range requests).
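For reference, a range request is just an ordinary GET with a Range header; a minimal sketch (the URL and byte range here are placeholders):

req, err := http.NewRequest("GET", "https://example.com/big.iso", nil)
if err != nil {
	log.Fatal(err)
}
req.Header.Set("Range", "bytes=0-1048575") // ask for the first 1 MiB only
resp, err := http.DefaultClient.Do(req)
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()
// 206 Partial Content means the server honoured the range;
// a plain 200 means it ignored the header and is sending the whole file.
fmt.Println(resp.Status)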
One possible approach (depending on exactly how you interpret "concurrently") would be something like the code below. It will NOT let you set timeouts or monitor the progress of the download, but it will let you start a download and then read it at leisure later.
// download holds the outcome of an asynchronous download.
type download struct {
	err error
	r   io.ReadCloser
}

// startDownload fetches a URL in a goroutine and returns a channel through
// which we can read a download struct. The struct will either contain an
// error or an io.ReadCloser through which we can read the file.
func startDownload(url string) chan download {
	rv := make(chan download)
	go func() {
		resp, err := http.Get(url)
		if err != nil {
			rv <- download{err: err}
			return
		}
		switch {
		case resp.StatusCode == 200:
			// This can be made better, but is probably good
			// enough for an illustration.
			rv <- download{r: resp.Body}
		default:
			resp.Body.Close()
			rv <- download{err: fmt.Errorf("HTTP GET gave status %s", resp.Status)}
		}
	}()
	return rv
}
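A hedged usage sketch, assuming the snippet above is compiled with the usual imports (io, log, net/http, os); the URL is a placeholder:

d := <-startDownload("https://example.com/big.iso") // placeholder URL
if d.err != nil {
	log.Fatal(d.err)
}
defer d.r.Close()
// Read the body whenever convenient, e.g. stream it to stdout or a file.
if _, err := io.Copy(os.Stdout, d.r); err != nil {
	log.Fatal(err)
}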
I am trying to send a page response as soon as a request is received and then process something, but I found the response does not get sent out "first" even though it is first in the code sequence. In real life I have a page for uploading an Excel sheet that gets saved into the database, which takes time (500,000+ rows), and I would like to show the user the progress. Here is a simplified example (depending on your machine, you may need to add a couple of zeros to the counter to see the result):
package main

import (
	"fmt"
	"net/http"
)

func writeAndCount(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("Starting to count"))
	for i := 0; i < 1000000; i++ {
		if i%1000 == 0 {
			fmt.Println(i)
		}
	}
	w.Write([]byte("Finished counting"))
}

func main() {
	http.HandleFunc("/", writeAndCount)
	http.ListenAndServe(":8080", nil)
}
The original concept of the HTTP protocol is a simple request-response, server-client computation model. There was no streaming or "continuous" client-update support. It is (was) always the client that first contacts the server when it needs some kind of information.
Also, since most web servers buffer the response until it is fully ready (or until a certain limit is reached, which is typically the buffer size), data you write (send) to the client will not be transmitted immediately.
Several techniques were "developed" to get around this "limitation" so that the server is able to notify the client about changes or progress, such as HTTP long polling, HTTP streaming, HTTP/2 Server Push and WebSockets. You can read more about these in this answer: Is there a real server push over http?
So to achieve what you want, you have to step around the original "borders" of the HTTP protocol.
If you want to send data periodically, or stream data to the client, you have to tell this to the server. The easiest way is to check if the http.ResponseWriter handed to you implements the http.Flusher interface (using a type assertion), and if it does, calling its Flusher.Flush() method will send any buffered data to the client.
Using http.Flusher is only half of the solution. Since this is a non-standard usage of the HTTP protocol, usually client support is also needed to handle this properly.
First, you have to let the client know about the "streaming" nature of the response by setting the Content-Type: text/event-stream response header.
Next, to keep clients from caching the response, be sure to also set Cache-Control: no-cache.
And last, to let the client know that you might not send the response as a single unit (but rather as periodic updates or as a stream), and that it should keep the connection alive and wait for further data, set the Connection: keep-alive response header.
Once the response headers are set as above, you may start your long work; whenever you want to update the client about the progress, write some data and call Flusher.Flush().
Let's see a simple example that does everything "right":
package main

import (
	"fmt"
	"net/http"
	"time"
)

func longHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "Server does not support Flusher!",
			http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")

	start := time.Now()
	for rows, max := 0, 50*1000; rows < max; {
		time.Sleep(time.Second) // Simulating work...
		rows += 10 * 1000
		fmt.Fprintf(w, "Rows done: %d (%d%%), elapsed: %v\n",
			rows, rows*100/max, time.Since(start).Truncate(time.Millisecond))
		flusher.Flush()
	}
}

func main() {
	http.HandleFunc("/long", longHandler)
	panic(http.ListenAndServe("localhost:8080", nil))
}
Now if you open http://localhost:8080/long in your browser, you will see output "growing" every second:
Rows done: 10000 (20%), elapsed: 1s
Rows done: 20000 (40%), elapsed: 2s
Rows done: 30000 (60%), elapsed: 3s
Rows done: 40000 (80%), elapsed: 4.001s
Rows done: 50000 (100%), elapsed: 5.001s
Also note that when using SSE you should "pack" updates into SSE frames: start each frame with the "data:" prefix and end it with two newline characters: "\n\n".
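A hedged sketch of what that framing could look like inside the loop above:

// Each SSE event is a "data:" line terminated by a blank line.
fmt.Fprintf(w, "data: Rows done: %d (%d%%)\n\n", rows, rows*100/max)
flusher.Flush()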
"Literature" and further reading / tutorials
Read more about Server-sent events on Wikipedia.
See a Golang HTML5 SSE example.
See Golang SSE server example with client codes using it.
See w3schools.com's tutorial on Server-Sent Events - One Way Messaging.
You can check if the ResponseWriter is a http.Flusher, and if so, force the flush to network:
if f, ok := w.(http.Flusher); ok {
	f.Flush()
}
However, bear in mind that this is a very unconventional HTTP handler. Streaming out progress messages to the response as if it were a terminal presents a few problems, particularly if the client is a web browser.
You might want to consider something more fitting with the nature of HTTP, such as returning a 202 Accepted response immediately, with a unique identifier the client can use to check on the status of processing using subsequent calls to your API.
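A minimal sketch of that pattern, with a hypothetical in-memory job store and handler names (imports assumed: fmt, net/http, sync, time; none of the names come from the question):

var (
	mu   sync.Mutex
	jobs = map[string]string{} // job ID -> status; a stand-in for real storage
)

func startJob(w http.ResponseWriter, r *http.Request) {
	id := fmt.Sprintf("%d", time.Now().UnixNano()) // placeholder ID scheme
	mu.Lock()
	jobs[id] = "processing"
	mu.Unlock()

	go func() {
		time.Sleep(5 * time.Second) // simulate the long import
		mu.Lock()
		jobs[id] = "done"
		mu.Unlock()
	}()

	w.WriteHeader(http.StatusAccepted) // 202: work started, not finished
	fmt.Fprintf(w, "job %s accepted, poll /status?id=%s\n", id, id)
}

func jobStatus(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	status, ok := jobs[r.URL.Query().Get("id")]
	mu.Unlock()
	if !ok {
		http.NotFound(w, r)
		return
	}
	fmt.Fprintln(w, status)
}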
I'm a newbie in Go and have a simple question about building a web server.
Say my web server has users, and the users can change their names and their passwords. Here is how I designed the URLs:
/users/Test GET
/users/Test/rename POST newname=Test2
/users/Test/newpassword POST newpassword=PWD
The first line shows the information of the user named Test. The second and the third rename the user and reset the password.
So I'm thinking that I need some regular expression to match the HTTP requests, something like http.HandleFunc("/users/{\w}+", controller.UsersHandler).
However, Go's standard library doesn't seem to support such a thing. Does that mean I have to change my design? For example, to show the information of the user Test, would I have to do /users GET name=Test?
You may want to run pattern matching on r.URL.Path using the regexp package (in your case you may need it on POST). This post shows some pattern-matching samples. As #Eugene suggests, there are also routers and HTTP utility packages which can help.
Here's something which can give you some ideas, in case you don't want to use other packages:
In main:
http.HandleFunc("/", multiplexer)
...
func multiplexer(w http.ResponseWriter, r *http.Request) {
switch r.Method {
case "GET":
getHandler(w, r)
case "POST":
postHandler(w, r)
}
}
func getHandler(w http.ResponseWriter, r *http.Request) {
//Match r.URL.path here as required using switch/use regex on it
}
func postHandler(w http.ResponseWriter, r *http.Request) {
//Regex as needed on r.URL.Path
//and then get the values POSTed
name := r.FormValue("newname")
}
Unfortunately, the standard HTTP router is not that clever. There are two ways:
1. Manually check methods and URLs and extract the usernames yourself.
2. Use a router from another package, such as
https://github.com/gorilla/mux
(gorilla/mux, echo, gin-gonic, etc.).
I have something like this:
t, err1 := template.ParseFiles("exampleFile.tmpl")
if err1 != nil {
	panic(err1)
}
err2 := t.ExecuteTemplate(w, "example", someStruct)
if err2 != nil {
	panic(err2)
}
With simple requests there are no problems.
But if I send two requests very close to each other (simple page refreshes from Chrome plus a few AJAX requests), about 40% of the time it results in various different errors (err2):
write tcp 127.0.0.1:8080->127.0.0.1:35212: write: broken pipe
write: connection reset by peer
use of closed network connection
http: panic serving 127.0.0.1:35814: runtime error: slice bounds out of range
goroutine 340 (...)
Others seem to have had similar issues, but with no clear resolution.
Filter out broken pipe errors from template execution
https://groups.google.com/forum/#!topic/golang-nuts/g6UNu4Mrv28
Another post seems to suggest just to expect this, and retry failed requests:
https://stackoverflow.com/a/31409281/1923095
What's the best Golang way to handle this?
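One common way to cope (a sketch only; the render helper and template name are placeholders, imports assumed: bytes, html/template, log, net/http) is to render into a buffer first and treat write errors as the client having gone away rather than as a reason to panic:

func render(w http.ResponseWriter, t *template.Template, data interface{}) {
	var buf bytes.Buffer
	if err := t.ExecuteTemplate(&buf, "example", data); err != nil {
		// A genuine template error: report it instead of panicking.
		log.Printf("template: %v", err)
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	if _, err := buf.WriteTo(w); err != nil {
		// "broken pipe" / "connection reset by peer" usually just means the
		// client refreshed or cancelled; log it and move on.
		log.Printf("write: %v", err)
	}
}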
I am trying to upload some videos to YouTube. Somewhere in the stack it comes down to an http.Client, and this part somehow behaves weirdly.
The request and everything else is created inside the youtube package.
After doing my request, it eventually fails with:
Error uploading video: Post https://www.googleapis.com/upload/youtube/v3/videos?alt=json&part=snippet%2Cstatus&uploadType=multipart: Post : unsupported protocol scheme ""
I debugged the library a bit and printed the content of URL.Scheme. As a string the result is https, and as a []byte it is [104 116 116 112 115].
https://golang.org/src/net/http/transport.go on line 288 is the location where the error is thrown.
https://godoc.org/google.golang.org/api/youtube/v3 the library I use
My code where I prepare/upload the video:
//create video struct which holds info about the video
video := &yt3.Video{
//TODO: set all required video info
}
//create the insert call
insertCall := service.Videos.Insert("snippet,status", video)
//attach media data to the call
insertCall = insertCall.Media(tmp, googleapi.ChunkSize(1*1024*1024)) //1MB chunk
video, err = insertCall.Do()
if err != nil {
log.Printf("Error uploading video: %v", err)
return
//return errgo.Notef(err, "Failed to upload to youtube")
}
So I have no idea why the scheme check fails.
Ok, I figured it out. The problem was not the call to YouTube itself.
The library tried to refresh the token in the background but there was something wrong with the TokenURL.
Ensuring there is a valid URL fixed the problem.
A nicer error message would have helped a lot, but well...
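A hedged sketch of that kind of fix, assuming the token source is built with golang.org/x/oauth2 (the client ID/secret and the tok variable are placeholders; imports assumed: context, log, golang.org/x/oauth2, golang.org/x/oauth2/google, google.golang.org/api/youtube/v3):

conf := &oauth2.Config{
	ClientID:     "client-id",     // placeholder
	ClientSecret: "client-secret", // placeholder
	Scopes:       []string{youtube.YoutubeUploadScope},
	// google.Endpoint carries valid AuthURL/TokenURL values; an empty
	// TokenURL is what leads to the confusing `unsupported protocol scheme ""`
	// error when the library tries to refresh the token in the background.
	Endpoint: google.Endpoint,
}
client := conf.Client(context.Background(), tok) // tok: a previously obtained *oauth2.Token
service, err := youtube.New(client)
if err != nil {
	log.Fatal(err)
}
_ = service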
This will probably apply to very, very few who arrive here: but my problem was that a RoundTripper was overriding the Host field with an empty string.
I have the most basic net/http program that I'm using to learn the package in Go:
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Println(r.URL)
		go HandleIndex(w, r)
	})

	fmt.Println("Starting Server...")
	log.Fatal(http.ListenAndServe(":5678", nil))
}

func HandleIndex(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(200)
	w.Write([]byte("Hello, World!"))
}
When I run the program and connect to localhost:5678 in Chrome, I get this in the console:
Starting Server...
/
2015/01/15 13:41:29 http: multiple response.WriteHeader calls
/favicon.ico
2015/01/15 13:41:29 http: multiple response.WriteHeader calls
But I don't see how that's possible. I print the URL, start a new goroutine, write the header once, and give it a static body of "Hello, World!". It seems like one of two things is happening: either something behind the scenes is writing another header, or somehow HandleIndex is called twice for the same request. What can I do to stop writing multiple headers?
EDIT: It seems to have something to do with the go HandleIndex(w, r) line, because if I remove go and just make it a plain function call instead of a goroutine, I don't get any issues and the browser gets its data. With it being a goroutine, I get the multiple WriteHeader error and the browser doesn't show "Hello, World!". Why does making this a goroutine break it?
Take a look at the anonymous function you register as the handler of incoming requests:
func(w http.ResponseWriter, r *http.Request) {
fmt.Println(r.URL)
go HandleIndex(w, r)
}
It prints the URL (to the standard output) then calls HandleIndex() in a new goroutine and continues execution.
If you have a handler function where you do not set the response status before the first call to Write, Go automatically sets the response status to 200 (HTTP OK). If the handler function does not write anything to the response (and does not set the response status and completes normally), that is also treated as successful handling of the request, and response status 200 is sent back. Your anonymous function does not set it and does not even write anything to the response, so Go does just that: it sets the response status to 200 HTTP OK.
Note that each request is handled in its own goroutine.
So if you call HandleIndex in a new goroutine, your original anonymous function continues: it returns, and so the response header is set; meanwhile (concurrently) the new goroutine you started also sets the response header, hence the "multiple response.WriteHeader calls" error.
If you remove the go, your HandleIndex function sets the response header in the same goroutine, before your handler function returns; net/http knows about this and will not try to set the response header again, so the error you experienced does not happen.
You already received a correct answer which addresses your problem; I will add some information about the general case (this error appears often).
From the documentation, you can see that WriteHeader sends an HTTP status code and you can't send more than one status code. If you Write anything, that is equivalent to sending status code 200 and then writing.
So the message you see appears if you either use w.WriteHeader more than once explicitly, or use w.Write before w.WriteHeader.
From the documentation:
// WriteHeader sends an HTTP response header with status code.
// If WriteHeader is not called explicitly, the first call to Write
// will trigger an implicit WriteHeader(http.StatusOK).
What is happening in your case is that you are launching go HandleIndex from the handler.
The outer handler finishes first, and the standard library writes the (implicit) header to the ResponseWriter. Then the goroutine running HandleIndex also tries to write a header and a body.
Just remove the go from HandleIndex and it will work.
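That is, the registration from the question simply becomes:

http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
	fmt.Println(r.URL)
	// Call synchronously: the server already runs each handler in its own goroutine.
	HandleIndex(w, r)
})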
The root cause is that you called WriteHeader more than once. From the source code:
func (w *response) WriteHeader(code int) {
	if w.conn.hijacked() {
		w.conn.server.logf("http: response.WriteHeader on hijacked connection")
		return
	}
	if w.wroteHeader {
		w.conn.server.logf("http: multiple response.WriteHeader calls")
		return
	}
	w.wroteHeader = true
	w.status = code

	if w.calledHeader && w.cw.header == nil {
		w.cw.header = w.handlerHeader.clone()
	}

	if cl := w.handlerHeader.get("Content-Length"); cl != "" {
		v, err := strconv.ParseInt(cl, 10, 64)
		if err == nil && v >= 0 {
			w.contentLength = v
		} else {
			w.conn.server.logf("http: invalid Content-Length of %q", cl)
			w.handlerHeader.Del("Content-Length")
		}
	}
}
So once a header has been written, wroteHeader is true; if you write a header again it has no effect and you get the warning "http: multiple response.WriteHeader calls".
The Write function also calls WriteHeader, so calling WriteHeader after Write causes the same error, and the later WriteHeader does not take effect.
In your case, go HandleIndex runs in another goroutine while the original handler has already returned; since it wrote nothing, the server calls WriteHeader to set 200. When HandleIndex then calls WriteHeader again, wroteHeader is already true, and the message "http: multiple response.WriteHeader calls" is printed.
Yes, using HandleIndex(w, r) instead of go HandleIndex(w, r) will fix your issue; I think you have already figured that out.
The reason is simple: when handling multiple requests at the same time, the HTTP server starts multiple goroutines, and your handler function is called separately in each of them without blocking the others.
You don't need to start your own goroutine in the handler unless you really need it, but that is another topic.
Because modern browsers send an extra request for /favicon.ico which is also handled in your / request handler.
If you ping your server with curl for example, you'll see only one request being sent:
curl localhost:5678
To be sure, you can register a more specific endpoint with http.HandleFunc:
http.HandleFunc("/Home", func(w http.ResponseWriter, r *http.Request) { /* ... */ })