How to retrieve website source without using ioutil.ReadAll in golang - http

My code:
func getSourceUrl(url string) (string, error) {
    resp, err := http.Get(url)
    if err != nil {
        fmt.Println("Error getSourceUrl: ")
        return "", err
    }
    defer resp.Body.Close()
    body := resp.Body
    // time = 0
    sourcePage, err := ioutil.ReadAll(body)
    // time > 5 minutes
    return string(sourcePage), err
}
I have a website link whose source is more than 100,000 lines. Using ioutil.ReadAll takes very long (about 5 minutes for one link). Is there a way to get the website source faster? Thank you!

@Minato try this code, and play with the M throttling parameter. Reduce it if you get too many errors.
package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "runtime"
    "time"
)

// Token is an empty struct for signalling
type Token struct{}

// N files to get
var N = 301 // at the source 00000 - 00300

// M max go routines
var M = runtime.NumCPU() * 16

// Throttle to max M go routines
var Throttle = make(chan Token, M)

// DoneStatus is used to signal the end of a goroutine
type DoneStatus struct {
    length   int
    sequence string
    duration float64
    err      error
}

// ExitOK is simple exit counter
var ExitOK = make(chan DoneStatus)

// TotalBytes read
var TotalBytes = 0

// TotalErrors captured
var TotalErrors = 0

// URLTempl is a template for URL construction
var URLTempl = "https://virusshare.com/hashes/VirusShare_%05d.md5"

func closeBody(c io.Closer) {
    err := c.Close()
    if err != nil {
        log.Fatal(err)
    }
}

func main() {
    log.Printf("start main. M=%d\n", M)
    startTime := time.Now()
    for i := 0; i < N; i++ {
        go func(idx int) {
            // slow ramp-up: fire getData after idx seconds
            time.Sleep(time.Duration(idx) * time.Second)
            url := fmt.Sprintf(URLTempl, idx)
            _, _ = getData(url) // errors captured as data
        }(i)
    }
    // Count N byte count signals
    for i := 0; i < N; i++ {
        status := <-ExitOK
        TotalBytes += status.length
        if status.err != nil {
            TotalErrors++
            log.Printf("[%d] : %v\n", i, status.err)
            continue
        }
        log.Printf("[%d] file %s, %.1f MByte, %.1f min, %.1f KByte/sec\n",
            i, status.sequence,
            float64(status.length)/(1024*1024),
            status.duration/60,
            float64(status.length)/(1024)/status.duration)
    }
    // totals
    duration := time.Since(startTime).Seconds()
    log.Printf("Totals: %.1f MByte, %.1f min, %.1f KByte/sec\n",
        float64(TotalBytes)/(1024*1024),
        duration/60,
        float64(TotalBytes)/(1024)/duration)
    // using fatal to verify only one go routine is running at the end
    log.Fatalf("TotalErrors: %d\n", TotalErrors)
}

func getData(url string) (data []byte, err error) {
    var startTime time.Time
    defer func() {
        // release token
        <-Throttle
        // signal end of go routine, with some status info
        ExitOK <- DoneStatus{
            len(data),
            url[41:46],
            time.Since(startTime).Seconds(),
            err,
        }
    }()
    // acquire one of M tokens
    Throttle <- Token{}
    log.Printf("Started file: %s\n", url[41:46])
    startTime = time.Now()
    resp, err := http.Get(url)
    if err != nil {
        return
    }
    defer closeBody(resp.Body)
    data, err = ioutil.ReadAll(resp.Body)
    if err != nil {
        return
    }
    return
}
Per-transfer rates vary between about 10 and 40 KByte/sec; the final total for all 301 files was 928 MB in 11.1 min at 1425 KByte/sec. You should be able to get similar results.
// outside the scope of the question but maybe useful
Also give this a try: http://www.dslreports.com/speedtest/ - go to settings, select a bunch of US servers for testing, and set the duration to 60 sec. This will tell you what your actual effective total rate to the US is.
Good luck!

You could read a section of the response at a time, something like:
    responseSection := make([]byte, 128)
    body.Read(responseSection)
    return string(responseSection), err
which would read 128 bytes at a time. However, I would suggest first confirming that the download speed is not what's causing the slow load.
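Spelled out as a full loop, that idea looks something like the sketch below; it is a minimal version, assuming you still want the whole page back as a string (the strings.Builder accumulator and the 128-byte chunk size are illustrative choices, not part of the original answer):

package main

import (
    "io"
    "strings"
)

// readInChunks drains r a fixed-size chunk at a time and
// returns everything read as a string.
func readInChunks(r io.Reader) (string, error) {
    var sb strings.Builder
    buf := make([]byte, 128) // chunk size is arbitrary; bigger is usually faster
    for {
        n, err := r.Read(buf)
        sb.Write(buf[:n]) // always consume the n bytes read before checking err
        if err == io.EOF {
            return sb.String(), nil
        }
        if err != nil {
            return sb.String(), err
        }
    }
}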

The 5 minutes is probably network time.
That said, you generally would not want to buffer enormous objects in memory.
resp.Body is a Reader, so you could use io.Copy to copy its contents into a file.
Converting sourcePage into a string is a bad idea, as it forces another allocation.
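A minimal sketch of that io.Copy approach (the function name and file path are my own, illustrative choices): it streams the body straight to disk, so the page never has to fit in memory.

package main

import (
    "io"
    "net/http"
    "os"
)

// saveSource streams the response body into a file and
// returns the number of bytes written.
func saveSource(url, path string) (int64, error) {
    resp, err := http.Get(url)
    if err != nil {
        return 0, err
    }
    defer resp.Body.Close()
    f, err := os.Create(path)
    if err != nil {
        return 0, err
    }
    defer f.Close()
    // io.Copy moves the data in small chunks; nothing is buffered whole.
    return io.Copy(f, resp.Body)
}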

Related

Content Length in Golang

I couldn't find anything helpful online on this one.
I am writing a REST API, and I want to log the size of the request body in bytes for metrics. The Go net/http API does not provide that directly. http.Request does have a ContentLength field, but it can be empty or the client might send false data.
Is there a way to get that at the middleware level? The brute-force method would be to read the full body and check the size, but if I do that in the middleware, the handler will not have access to the body because it would have already been read and closed.
Why do you want a middleware here?
The simple way is b, err = io.Copy(anyWriterOrMultiwriter, r.Body). When err == nil, b is the total content length of the request. Use the request body however you want. If you only need the count, b, err = io.Copy(ioutil.Discard, r.Body) works too.
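As a hedged sketch of that inside a single handler (the bytes.Buffer is my choice of writer so the handler keeps a copy of the body; any io.Writer or a MultiWriter would do, and the usual bytes, io, log, and net/http imports are assumed):

func handler(w http.ResponseWriter, r *http.Request) {
    var body bytes.Buffer
    // io.Copy counts the bytes while capturing the body for later use.
    n, err := io.Copy(&body, r.Body)
    if err != nil {
        http.Error(w, "error reading body", http.StatusBadRequest)
        return
    }
    log.Printf("request body: %d bytes", n)
    // body.Bytes() now holds the full request body for unmarshalling etc.
}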
You could write a custom ReadCloser that proxies an existing one and counts bytes as it goes. Something like:
type LengthReader struct {
    Source io.ReadCloser
    Length int
}

func (r *LengthReader) Read(b []byte) (int, error) {
    n, err := r.Source.Read(b)
    r.Length += n
    return n, err
}

func (r *LengthReader) Close() error {
    var buf [32]byte
    var n int
    var err error
    for err == nil {
        n, err = r.Source.Read(buf[:])
        r.Length += n
    }
    closeerr := r.Source.Close()
    if err != nil && err != io.EOF {
        return err
    }
    return closeerr
}
This will count bytes as you read them from the stream, and when closed it will consume and count all remaining unread bytes first. After you're finished with the stream, you can then access the length.
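For completeness, here is a hedged sketch of how LengthReader might be wired into middleware; the wrapper shape and the log call are my assumptions (plus the usual log and net/http imports), not part of the original answer:

func countingMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        lr := &LengthReader{Source: r.Body}
        r.Body = lr // the handler now reads through the counter
        next.ServeHTTP(w, r)
        lr.Close() // consume any unread remainder so Length is complete
        log.Printf("request body: %d bytes", lr.Length)
    })
}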
Option 1
Use a TeeReader. Everything read from the body is mirrored into a second writer as a side effect of normal reading, so you can compute the size afterwards without consuming the body yourself.
maxmem := 4096
var buf bytes.Buffer
// comment this line out if you want to disable gathering metrics;
// NopCloser is needed because TeeReader returns a plain io.Reader
resp.Body = ioutil.NopCloser(io.TeeReader(resp.Body, &buf))
readsize := func(r io.Reader) int {
    chunk := make([]byte, maxmem)
    var size int
    for {
        read, err := r.Read(chunk)
        size += read
        if err != nil { // bytes.Buffer returns io.EOF once drained
            break
        }
    }
    return size
}
// call this only after the body has been fully consumed,
// since TeeReader copies only what was actually read
log.Printf("Size is %d", readsize(&buf))
Option 2: the unscalable way (original answer)
You can just read the whole body, take its length, then unmarshal into your struct:
    b, _ := ioutil.ReadAll(r.Body) // b can be empty, so check the read error in your app
    size := len(b)
    if err := json.Unmarshal(b, &input); err != nil {
        s.BadReq(w, errors.New("error reading body"))
        return
    }

What is the proper way to detect a dropped connection in Go?

I am working with a Go HTTP server implementation that reads an upload from a mobile client. However, I'm experiencing problems where, because of a long keep-alive, the server will hang reading the request buffer for quite a long time if the mobile client goes offline (as often happens).
What is the proper way to detect a dropped connection and close the input buffer?
Set a reasonable timeout on the server, for example:
srv := &http.Server{
    Addr:         ":443",
    ReadTimeout:  time.Minute * 2,
    WriteTimeout: time.Minute * 2,
}
log.Fatal(srv.ListenAndServeTLS(certFile, keyFile))
Because I wanted to drop the connection only when writes stop (and quickly, so that I can record the data received so far and allow the client to resume), the ReadTimeout isn't the right solution for me.
I found the answer in this gist. You need to set a read deadline on the connection itself.
package nettimeout

import (
    "net"
    "time"
)

// Listener wraps a net.Listener, and gives a place to store the timeout
// parameters. On Accept, it will wrap the net.Conn with our own Conn for us.
type Listener struct {
    net.Listener
    ReadTimeout  time.Duration
    WriteTimeout time.Duration
}

func (l *Listener) Accept() (net.Conn, error) {
    c, err := l.Listener.Accept()
    if err != nil {
        return nil, err
    }
    tc := &Conn{
        Conn:         c,
        ReadTimeout:  l.ReadTimeout,
        WriteTimeout: l.WriteTimeout,
    }
    return tc, nil
}

// Conn wraps a net.Conn, and sets a deadline for every read
// and write operation.
type Conn struct {
    net.Conn
    ReadTimeout  time.Duration
    WriteTimeout time.Duration
}

func (c *Conn) Read(b []byte) (int, error) {
    err := c.Conn.SetReadDeadline(time.Now().Add(c.ReadTimeout))
    if err != nil {
        return 0, err
    }
    return c.Conn.Read(b)
}

func (c *Conn) Write(b []byte) (int, error) {
    err := c.Conn.SetWriteDeadline(time.Now().Add(c.WriteTimeout))
    if err != nil {
        return 0, err
    }
    return c.Conn.Write(b)
}

func NewListener(addr string, readTimeout, writeTimeout time.Duration) (net.Listener, error) {
    l, err := net.Listen("tcp", addr)
    if err != nil {
        return nil, err
    }
    tl := &Listener{
        Listener:     l,
        ReadTimeout:  readTimeout,
        WriteTimeout: writeTimeout,
    }
    return tl, nil
}
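A short usage sketch: serving HTTP over the wrapped listener (the import path and the 2-minute timeouts are placeholders of mine, not from the gist):

package main

import (
    "log"
    "net/http"
    "time"

    "example.com/yourapp/nettimeout" // hypothetical path to the package above
)

func main() {
    // Every Read and Write must complete within 2 minutes of the
    // previous operation, so a silently dropped client times out quickly.
    ln, err := nettimeout.NewListener(":8080", 2*time.Minute, 2*time.Minute)
    if err != nil {
        log.Fatal(err)
    }
    srv := &http.Server{Handler: http.DefaultServeMux}
    log.Fatal(srv.Serve(ln))
}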

Trying to implement a port scanner in go

I recently started to learn Go. The only reason for that is the goroutine thing, which seems to exist only in this language (I have a Java background and, to be honest, won't ever completely switch to Go).
I wanted to implement a simple port scanner that finds every HTTP server (host with port 80 open) in a given network range. Here's how I am doing this:
package main

import (
    "fmt"
    "net"
    "regexp"
    "strconv"
    "time"
)

// next two functions are shamelessly copied from somewhere
func ip2long(ipstr string) (ip uint32) {
    r := `^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})`
    reg, err := regexp.Compile(r)
    if err != nil {
        return
    }
    ips := reg.FindStringSubmatch(ipstr)
    if ips == nil {
        return
    }
    ip1, _ := strconv.Atoi(ips[1])
    ip2, _ := strconv.Atoi(ips[2])
    ip3, _ := strconv.Atoi(ips[3])
    ip4, _ := strconv.Atoi(ips[4])
    if ip1 > 255 || ip2 > 255 || ip3 > 255 || ip4 > 255 {
        return
    }
    ip += uint32(ip1 * 0x1000000)
    ip += uint32(ip2 * 0x10000)
    ip += uint32(ip3 * 0x100)
    ip += uint32(ip4)
    return
}

func long2ip(ip uint32) string {
    return fmt.Sprintf("%d.%d.%d.%d", ip>>24, ip<<8>>24, ip<<16>>24, ip<<24>>24)
}

// the actual code
func main() {
    seconds := 10                                   // timeout
    fmt.Println(seconds)                            // just to see it
    timeOut := time.Duration(seconds) * time.Second // time out to pass to the DialTimeout
    can := make(chan int)                           // a chan
    req := func(ip string) { // parallelized function to do requests
        c, err := net.DialTimeout("tcp", ip+":80", timeOut) // connect to ip with given timeout
        if err == nil { // if we're connected
            fmt.Println(ip) // output the successful ip
            c.Close()       // close connection
        }
        can <- 0 // tell that we're done
    }
    startIp := ip2long("50.97.99.0") // starting ip
    endIp := ip2long("50.97.102.0")
    curIp := startIp // current ip
    go func() { // a daemon function run as a goroutine which listens to the chan
        count := 0 // how many ips we processed
    looper: // label to break
        for {
            <-can   // wait for some goroutine to finish
            curIp++ // next ip
            count++
            go req(long2ip(curIp)) // start new goroutine
            if curIp > endIp { // if we've walked through the range
                fmt.Println("final")
                break looper
            }
        }
    }()
    numGoes := 100 // number of goroutines run at one time
    for i := 0; i < numGoes; i++ {
        can <- 0 // start 100 goroutines
    }
    // standard way to make the program hang
    var input string
    fmt.Scanln(&input)
}
I hope the code is well-commented so you can see what I'm trying to do.
The IP range belongs to some hosting company, and I know for sure that the IP 50.97.99.189 runs an HTTP server, but this IP never shows up in the console when I run the code above, although the host is up and its ping time is around 156 ms, so 10 seconds is more than enough.
So the question - what am I doing wrong?
Here is a slightly reworked version that is more idiomatic Go.
There are shorter ways to write it, but this is probably more clear.
Logic is basically the same. I just ran it and it worked fine, printed out several ips it connected to. This version also prints why it fails, which is more for troubleshooting.
Do you still have issues running this version? If so, what error are you getting?
My version, on Play.
package main

import (
    "fmt"
    "net"
    "regexp"
    "strconv"
    "sync"
    "time"
)

// next two functions are shamelessly copied from somewhere
func ip2long(ipstr string) (uint32, error) {
    r := `^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})`
    reg, err := regexp.Compile(r)
    if err != nil {
        return 0, err
    }
    ips := reg.FindStringSubmatch(ipstr)
    if ips == nil {
        return 0, fmt.Errorf("Invalid ip address")
    }
    var ip1, ip2, ip3, ip4 int
    if ip1, err = strconv.Atoi(ips[1]); err != nil {
        return 0, err
    }
    if ip2, err = strconv.Atoi(ips[2]); err != nil {
        return 0, err
    }
    if ip3, err = strconv.Atoi(ips[3]); err != nil {
        return 0, err
    }
    if ip4, err = strconv.Atoi(ips[4]); err != nil {
        return 0, err
    }
    if ip1 > 255 || ip2 > 255 || ip3 > 255 || ip4 > 255 {
        return 0, fmt.Errorf("Invalid ip address")
    }
    ip := uint32(ip1 * 0x1000000)
    ip += uint32(ip2 * 0x10000)
    ip += uint32(ip3 * 0x100)
    ip += uint32(ip4)
    return ip, nil
}

func long2ip(ip uint32) string {
    return fmt.Sprintf("%d.%d.%d.%d", ip>>24, ip<<8>>24, ip<<16>>24, ip<<24>>24)
}

// the actual code
func main() {
    timeOut := 10 * time.Second // time out to pass to the DialTimeout
    fmt.Println("Timeout is:", timeOut)
    req := func(ip string) { // parallelized function to do requests
        c, err := net.DialTimeout("tcp", ip+":80", timeOut) // connect to ip with given timeout
        if err == nil { // if we're connected
            fmt.Println(ip) // output the successful ip
            c.Close()       // close connection
        } else {
            fmt.Println("Error is:", err)
        }
    }
    startIp, err := ip2long("50.97.99.0") // starting ip
    if err != nil {
        fmt.Println(err)
        return
    }
    endIp, err := ip2long("50.97.102.0")
    if err != nil {
        fmt.Println(err)
        return
    }
    var wg sync.WaitGroup    // synchronizer for main routine to wait for spawned workers
    ips := make(chan uint32) // channel to feed ip addrs
    // spawn 100 workers
    for idx := 0; idx < 100; idx++ {
        wg.Add(1)
        go func() {
            for ip := range ips {
                req(long2ip(ip)) // perform check of ip
            }
            wg.Done()
        }()
    }
    // send ip addrs to workers to process
    for curIp := startIp; curIp <= endIp; curIp++ {
        ips <- curIp
    }
    close(ips) // signal goroutines to end
    wg.Wait()  // wait for all goroutines to complete
}

Go: Tracking POST request progress

I'm coding a ShareX clone for Linux in Go that uploads files and images to file-sharing services through HTTP POST requests.
I'm currently using http.Client and Do() to send my requests, but I'd like to be able to track the upload progress for bigger files that take up to a minute to upload.
The only way I can think of at the moment is manually opening a TCP connection to the site on port 80 and writing the HTTP request in chunks, but I don't know whether that would work on https sites, and I'm not sure it's the best way to do it.
Is there any other way to achieve this?
You can create your own io.Reader to wrap the actual reader and then you can output the progress each time Read is called.
Something along the lines of:
package main

import (
    "fmt"
    "io"
    "io/ioutil"
    "os"
)

type ProgressReader struct {
    io.Reader
    Reporter func(r int64)
}

func (pr *ProgressReader) Read(p []byte) (n int, err error) {
    n, err = pr.Reader.Read(p)
    pr.Reporter(int64(n))
    return
}

func main() {
    file, _ := os.Open("/tmp/blah.go")
    total := int64(0)
    pr := &ProgressReader{file, func(r int64) {
        total += r
        if r > 0 {
            fmt.Println("progress", r)
        } else {
            fmt.Println("done", r)
        }
    }}
    io.Copy(ioutil.Discard, pr)
}
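Applied to the actual upload, the same wrapper can be handed to http.Post as the request body; a hedged sketch, where the URL, content type, and file path are placeholders and the ProgressReader type above (plus log and net/http imports) is assumed:

file, err := os.Open("/tmp/blah.go")
if err != nil {
    log.Fatal(err)
}
defer file.Close()

pr := &ProgressReader{file, func(r int64) {
    fmt.Println("sent", r, "bytes")
}}
// The transport reads the body through the wrapper, so every
// chunk it sends on the wire triggers the Reporter callback.
resp, err := http.Post("https://example.com/upload", "application/octet-stream", pr)
if err != nil {
    log.Fatal(err)
}
resp.Body.Close()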
Wrap the reader passed as the request body with something that reports progress. For example,
type progressReader struct {
    r     io.Reader
    max   int
    sent  int
    atEOF bool
}

func (pr *progressReader) Read(p []byte) (int, error) {
    n, err := pr.r.Read(p)
    pr.sent += n
    if err == io.EOF {
        pr.atEOF = true
    }
    pr.report()
    return n, err
}

func (pr *progressReader) report() {
    fmt.Printf("sent %d of %d bytes\n", pr.sent, pr.max)
    if pr.atEOF {
        fmt.Println("DONE")
    }
}
If previously you called
    client.Post(u, contentType, r)
then change the code to
    client.Post(u, contentType, &progressReader{r: r, max: max})
where max is the number of bytes you expect to send. Modify the progressReader.report() method and add fields to progressReader to meet your specific needs.
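For a file upload, max would typically come from the file size; a minimal sketch, reusing client, u, and contentType from the snippet above (the path is a placeholder):

f, err := os.Open("/tmp/upload.bin")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

fi, err := f.Stat()
if err != nil {
    log.Fatal(err)
}

// fi.Size() is the expected byte count, reported as max.
resp, err := client.Post(u, contentType, &progressReader{r: f, max: int(fi.Size())})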

Unable to send gob data over TCP in Go Programming

I have a client-server application that uses a TCP connection.
Client:
type Q struct {
    sum int64
}

type P struct {
    M, N int64
}

func main() {
    ...
    // read M and N
    ...
    tcpAddr, err := net.ResolveTCPAddr("tcp4", service)
    ...
    var p P
    p.M = M
    p.N = N
    err = enc.Encode(p)
}
Server:
type Q struct {
    sum int64
}

type P struct {
    M, N int64
}

func main() {
    ...
    tcpAddr, err := net.ResolveTCPAddr("ip4", service)
    listener, err := net.ListenTCP("tcp", tcpAddr)
    ...
    var connB bytes.Buffer
    dec := gob.NewDecoder(&connB)
    var p P
    err = dec.Decode(p)
    fmt.Printf("{%d, %d}\n", p.M, p.N)
}
The result on the server is {0, 0} because I don't know how to obtain a bytes.Buffer variable from a net.Conn.
Is there any way to send gob variables over TCP? If so, how can it be done? Or is there an alternative for sending numbers over TCP?
Any help or sample code would really be appreciated.
Here's a complete example.
Server:
package main

import (
    "encoding/gob"
    "fmt"
    "net"
)

type P struct {
    M, N int64
}

func handleConnection(conn net.Conn) {
    dec := gob.NewDecoder(conn)
    p := &P{}
    dec.Decode(p)
    fmt.Printf("Received : %+v", p)
    conn.Close()
}

func main() {
    fmt.Println("start")
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        // handle error
    }
    for {
        conn, err := ln.Accept() // this blocks until connection or error
        if err != nil {
            // handle error
            continue
        }
        go handleConnection(conn) // a goroutine handles conn so that the loop can accept other connections
    }
}
Client:
package main

import (
    "encoding/gob"
    "fmt"
    "log"
    "net"
)

type P struct {
    M, N int64
}

func main() {
    fmt.Println("start client")
    conn, err := net.Dial("tcp", "localhost:8080")
    if err != nil {
        log.Fatal("Connection error", err)
    }
    encoder := gob.NewEncoder(conn)
    p := &P{1, 2}
    encoder.Encode(p)
    conn.Close()
    fmt.Println("done")
}
Launch the server, then the client, and you see the server displaying the received P value.
A few observations to make it clear:
When you listen on a socket, you should pass the accepted connection to a goroutine that will handle it.
Conn implements the Reader and Writer interfaces, which makes it easy to use: you can hand it straight to a Decoder or an Encoder.
In a real application you would probably have the P struct definition in a package imported by both programs.
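One caveat worth adding: gob only encodes exported (capitalized) struct fields, so the Q struct from the question, whose only field is the unexported sum, would fail to encode. If the server should answer with a result, the reply can travel back over the same connection; a hedged sketch (renaming sum to Sum is my adjustment, and the P type above is assumed):

type Q struct {
    Sum int64 // must be exported for gob to see it
}

func handleConnection(conn net.Conn) {
    defer conn.Close()
    dec := gob.NewDecoder(conn)
    var p P
    if err := dec.Decode(&p); err != nil {
        return
    }
    // reply with the sum on the same connection
    enc := gob.NewEncoder(conn)
    enc.Encode(Q{Sum: p.M + p.N})
}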
