Reading non-UTF-8 encoded data from a network call in golang - http

I am trying to read bytes from an HTTP response body in golang. My problem is that the response body is encoded in ISO-8859-1. I want to read the response body in that same encoding and write the contents to a file, also in ISO-8859-1.
Is there a way using which I can accomplish this? I don't want to convert the data into UTF-8 at all.

Here is a good read about encoding, which you might benefit from.
You are seemingly assuming Go decodes the raw bytes it receives when it performs a request. It does not.
Take this example:
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// We perform a request to a Latin-1 encoded page.
	resp, err := http.Get("http://andrew.triumf.ca/multilingual/samples/german.meta.html")
	if err != nil {
		log.Fatalln(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("/tmp/latin1")
	if err != nil {
		log.Fatalln(err)
	}
	defer f.Close()

	// Stream the raw bytes to the file, untouched.
	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatalln(err)
	}
}
In the documentation, you can read that resp.Body conforms to the io.ReadCloser interface, which allows you to read the raw bytes and stream them to a file.
Once we run this code, this is the output of file -i /tmp/latin1:
/tmp/latin1: text/html; charset=iso-8859-1

Read and write the response body as a slice of bytes, []byte, an opaque data type.
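For instance, a minimal sketch of that approach (the URL and file name are placeholders; io.ReadAll and os.WriteFile need Go 1.16+):

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://example.com/latin1-page")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The body is just bytes; nothing is decoded or re-encoded.
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("page.latin1.html", raw, 0644); err != nil {
		log.Fatal(err)
	}
}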

Related

How to bypass golang's HTTP request (net/http) RFC compliance

I'm developing a Security Scanner and therefore need to send HTTP requests which don't honor RFC specifications. However, Go is very strict about complying with them.
Issue
I want to send an HTTP request which contains prohibited special characters such as "".
For example: "Ill\egal": "header value"
However, golang always throws the error: 'net/http: invalid header field name "Ill\egal"'.
This error is thrown on line 523 at https://go.dev/src/net/http/transport.go
Issue
I want to send a single HTTP request which contains either two Content-Length headers, two Transfer-Encoding headers, or one Content-Length and one Transfer-Encoding header (for HTTP request smuggling). Those sometimes need to have wrong values.
However, it isn't possible to set those headers oneself; they are generated automatically. So it's only possible to use one of these headers, with a correct value.
I've bypassed this by using a raw TCP stream; however, this solution isn't satisfying, as I can't use a proxy this way: Use Dialer with Proxy. Route TCP stream through Proxy
Issue
I want to send an HTTP request where the header name is mixed upper- and lowercase, e.g. "HeAdErNaMe": "header value".
This is possible for HTTP/1 requests by writing directly to the header map (req.Header["HeAdErNaMe"] = []string{"header value"}).
However, for HTTP/2 requests the headers will still be normalized to meet the RFC specifications.
You can dump the request into a buffer, modify the buffer (with a regexp or a string replace), and send the modified buffer to the host using net.Dial.
Example:
package main

import (
	"bufio"
	"crypto/tls"
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"strings"
)

func main() {
	// Create and dump the request.
	req, err := http.NewRequest(http.MethodGet, "https://golang.org", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Add("User-Agent", "aaaaa")
	buf, err := httputil.DumpRequest(req, true)
	if err != nil {
		log.Fatal(err)
	}

	// Corrupt the request.
	str := string(buf)
	str = strings.Replace(str, "User-Agent: aaaaa", "UsEr-AgEnT: aaa\"aaa", 1)
	fmt.Println(str)

	// Dial and send the raw request text.
	conn, err := tls.Dial("tcp", "golang.org:443", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Use Fprint, not Fprintf: the dumped request may contain '%' characters.
	fmt.Fprint(conn, str)

	// Read the response.
	br := bufio.NewReader(conn)
	resp, err := http.ReadResponse(br, nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%+v", resp)
}

Downloading just part of a file

I want to download just part of a text-like file from the internet using Go. It appears that curl --max-filesize and --range aren't respected by the websites I want to download from. Additionally, I read that http.MaxBytesReader still downloads the entire file, but only stores part of it.
Is there a way to just get the first kb of a file and then close the connection? The equivalent of pressing "x" when a large page is loading, on chrome.
I'm thinking that I can run a thread that reads a website to a file, and then kill the thread after a ms or two. Is this possible?
One simple way to do it, without much error handling (which would have to be added), could be:
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
)

const readLimit = 1024 // bytes

func main() {
	resp, err := http.Get("http://example.com/")
	if err != nil {
		// handle error
		return
	}

	// Read at most readLimit bytes of the body. Note the pointer:
	// LimitedReader's Read method is on the pointer receiver.
	fixedReader := &io.LimitedReader{R: resp.Body, N: readLimit}
	data, _ := ioutil.ReadAll(fixedReader)

	// Closing a partially-read body makes Go drop the connection
	// instead of downloading the rest of the file.
	resp.Body.Close()
	fmt.Println(string(data))
}
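An alternative sketch along the same lines, using io.CopyN to write just the first kilobyte straight to a file (the URL and file name are placeholders):

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://example.com/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("first-kb.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Copy at most 1024 bytes; io.EOF here only means the body was shorter.
	if _, err := io.CopyN(f, resp.Body, 1024); err != nil && err != io.EOF {
		log.Fatal(err)
	}
}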

How to request a page with a specific charset in Go?

I am rewriting a piece of software from Python to Go. I am facing an issue with http.Get while fetching a page encoded in ISO-8859-1. The Python version works, but the Go one does not.
This is working: Python
r = requests.get("https://www.bger.ch/ext/eurospider/live/de/php/aza/http/index.php?lang=de&type=show_document&print=yes&highlight_docid=aza://27-01-2016-5A_718-2015")
r.encoding = 'iso-8859-1'
file = open('tmp_python.txt', 'w')
file.write(r.text.strip())
file.close()
This is not working: Go
package main

import (
	"golang.org/x/net/html/charset"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	link := "https://www.bger.ch/ext/eurospider/live/de/php/aza/http/index.php?lang=de&type=show_document&print=yes&highlight_docid=aza://27-01-2016-5A_718-2015"
	resp, err := http.Get(link)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	reader, err := charset.NewReader(resp.Body, "iso-8859-1")
	if err != nil {
		panic(err)
	}
	content, err := ioutil.ReadAll(reader)
	if err != nil {
		panic(err)
	}
	log.Println(string(content))
}
My browser and Python give the same result but not the Go version. How can I fix that?
Edit
I think there is redirection with Go. This does not happen with Python.
Edit 2
My question was badly written. I had two problems: 1) the encoding, 2) the wrong page being returned. I do not know if they are related.
I will open a new thread for the second question.
The second argument of NewReader is documented as contentType and not as a character encoding. This means it expects the value of the Content-Type field in the HTTP header instead. Thus, the proper usage would be:
reader, err := charset.NewReader(resp.Body, "text/html; charset=iso-8859-1")
And this works perfectly.
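Equivalently, you can pass along whatever Content-Type the server actually sent, which avoids hard-coding the charset (a small variation on the question's code):
reader, err := charset.NewReader(resp.Body, resp.Header.Get("Content-Type"))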
Note that if the given contentType contains no useful charset definition, NewReader will look at the body itself in order to determine the charset. And while the HTTP header of this page clearly declares
Content-Type: text/html;charset=iso-8859-1
the actual HTML document returned defines a different charset encoding:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
With the wrong contentType in your code, it thus picks up the charset encoding that is wrongly declared in the HTML document.

why is golang http server failing with "broken pipe" when response exceeds 8kb?

I have an example web server below where, if you call curl localhost:3000 -v and then ^C (cancel) it immediately (before 1 second), it will report write tcp 127.0.0.1:3000->127.0.0.1:XXXXX: write: broken pipe.
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	log.Fatal(http.ListenAndServe(":3000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(1 * time.Second)
		// Why 8061 bytes? Because the response header on my computer
		// is 132 bytes, adding up the entire response to 8193 (1 byte
		// over 8kb)
		if _, err := w.Write(make([]byte, 8061)); err != nil {
			fmt.Println(err)
			return
		}
	})))
}
Based on my debugging, I have concluded that this only happens when the entire response writes more than 8192 bytes (8KB). If my entire response writes fewer than 8192 bytes, the broken pipe error is not returned.
My question is where is this 8192 bytes (or 8kb) buffer limit set? Is this a limit in Golang's HTTP write buffer? Is this related to the response being chunked? Is this only related to the curl client or the browser client? How can I change this limit so I can have a bigger buffer written before the connection is closed (for debugging purposes)?
Thanks!
In net/http/server.go the output buffer is set to 4<<10, i.e. 4KB.
The reason you see the error at 8KB, is that it takes at least 2 writes to a socket to detect a closed remote connection. The first write succeeds, but the remote host sends an RST packet. The second write will be to a closed socket, which is what returns the broken pipe error.
Depending on the socket write buffer, and the connection latency, it's possible that even more writes could succeed before the first RST packet is registered.
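To see that two-writes behavior directly, here is a rough sketch (not part of the original answer) that flushes each 4KB chunk to the socket; if you curl it and hit ^C, the first write after the disconnect typically still succeeds, and a later one returns the broken pipe error:

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	log.Fatal(http.ListenAndServe(":3000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		chunk := make([]byte, 4<<10)
		for i := 0; i < 4; i++ {
			time.Sleep(500 * time.Millisecond) // give the client time to disconnect
			if _, err := w.Write(chunk); err != nil {
				log.Printf("write %d failed: %v", i, err)
				return
			}
			if f, ok := w.(http.Flusher); ok {
				f.Flush() // force each chunk onto the socket immediately
			}
		}
	})))
}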
It is a broken pipe error, but you should use ioutil.ReadAll for small response bodies or io.Copy for large ones.
For ioutil.ReadAll:
defer response.Body.Close()
body, err := ioutil.ReadAll(response.Body)
if err != nil {
	logger.Errorf(ctx, "err is %+v", err)
	return nil, err
}
For io.Copy:
// Pre-size the buffer to 10MB to avoid repeated allocations.
buf := bytes.NewBuffer(make([]byte, 0, 10485760))
if _, err := io.Copy(buf, response.Body); err != nil {
	return nil, err
}
body := buf.Bytes() // stays valid even if the buffer had to grow

Reading image from HTTP request's body in Go

I'm playing with Go (first time ever) and I want to build a tool to retrieve images from the Internet and cut them (even resize), but I'm stuck on the first step.
package main

import (
	"fmt"
	"http"
)

var client = http.Client{}

func cutterHandler(res http.ResponseWriter, req *http.Request) {
	reqImg, err := client.Get("http://www.google.com/intl/en_com/images/srpr/logo3w.png")
	if err != nil {
		fmt.Fprintf(res, "Error %d", err)
		return
	}
	buffer := make([]byte, reqImg.ContentLength)
	reqImg.Body.Read(buffer)
	res.Header().Set("Content-Length", fmt.Sprint(reqImg.ContentLength)) /* value: 7007 */
	res.Header().Set("Content-Type", reqImg.Header.Get("Content-Type")) /* value: image/png */
	res.Write(buffer)
}

func main() {
	http.HandleFunc("/cut", cutterHandler)
	http.ListenAndServe(":8080", nil) /* TODO Configurable */
}
I'm able to request an image (let's use Google logo) and to get its kind and size.
Indeed, I'm just re-writing the image (look at this as a toy "proxy"), setting Content-Length and Content-Type and writing the byte slice back, but I get it wrong somewhere. See how the final image looks when rendered on Chromium 12.0.742.112 (90304):
Also, I checked the downloaded file and it is a 7007-byte PNG image. It should be working properly if we look at the request:
GET /cut HTTP/1.1
User-Agent: curl/7.22.0 (i486-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.23 libssh2/1.2.8 librtmp/2.3
Host: 127.0.0.1:8080
Accept: */*
HTTP/1.1 200 OK
Content-Length: 7007
Content-Type: image/png
Date: Tue, 27 Dec 2011 19:51:53 GMT
[PNG data]
What do you think I'm doing wrong here?
Disclaimer: I'm scratching my own itch, so probably I'm using the wrong tool :) Anyway, I can implement it on Ruby but before I would like to give Go a try.
Update: still scratching itches but... I think this is going to be a good side project, so I'm opening it up: https://github.com/imdario/go-lazor. If it is not useful, at least somebody may find the references used to develop it useful. They were for me.
I think you jumped to the serving part too fast.
Focus on the first step, downloading the image.
Here you have a little program that downloads that image to memory.
It works on my 2011-12-22 weekly version, for r60.3 you just need to gofix the imports.
package main

import (
	"io/ioutil"
	"log"
	"net/http"
)

const url = "http://www.google.com/intl/en_com/images/srpr/logo3w.png"

func main() {
	// Just a simple GET request to the image URL.
	// We get back a *Response and an error.
	res, err := http.Get(url)
	if err != nil {
		log.Fatalf("http.Get -> %v", err)
	}
	// We read all the bytes of the image.
	// Types: data []byte
	data, err := ioutil.ReadAll(res.Body)
	if err != nil {
		log.Fatalf("ioutil.ReadAll -> %v", err)
	}
	// You have to manually close the body, check docs.
	// This is required if you want to use things like
	// Keep-Alive and other HTTP sorcery.
	res.Body.Close()
	// You can now save it to disk or whatever...
	ioutil.WriteFile("google_logo.png", data, 0666)
	log.Println("I saved your image buddy!")
}
Voilà!
This will get the image into memory, inside data.
Once you have that, you can decode it, crop it and serve back to the browser.
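As a rough sketch of that decode-and-crop step (not from the original answer; the file names and crop rectangle are made up), using the standard image packages:

package main

import (
	"bytes"
	"image"
	"image/png"
	"log"
	"os"
)

func main() {
	data, err := os.ReadFile("google_logo.png")
	if err != nil {
		log.Fatal(err)
	}
	// Importing image/png registers the PNG decoder for image.Decode.
	img, _, err := image.Decode(bytes.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	// Most decoded image types (e.g. *image.NRGBA) implement SubImage.
	sub, ok := img.(interface {
		SubImage(r image.Rectangle) image.Image
	})
	if !ok {
		log.Fatal("image type does not support cropping")
	}
	cropped := sub.SubImage(image.Rect(0, 0, 100, 100))

	out, err := os.Create("cropped.png")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	if err := png.Encode(out, cropped); err != nil {
		log.Fatal(err)
	}
}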
Hope this helps.
I tried your code and noticed that the image you were serving was the right size, but the contents of the file past a certain point were all 0x00.
Review the io.Reader documentation. The important thing to remember is that Read reads up to the number of bytes you request. It can read fewer with no error returned. (You should be checking the error too, but that's not an issue here.)
If you want to make sure your buffer is completely full, use io.ReadFull. In this case it's simpler to just copy the entire contents of the Reader with io.Copy.
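For illustration, a minimal io.ReadFull sketch (not the answer's code; it assumes the server reports a valid Content-Length, since make would panic on the -1 it otherwise reports):

package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://www.google.com/intl/en_com/images/srpr/logo3w.png")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Unlike a single Read call, io.ReadFull keeps reading until
	// the buffer is full or an error occurs.
	buffer := make([]byte, resp.ContentLength)
	if _, err := io.ReadFull(resp.Body, buffer); err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes", len(buffer))
}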
It's also important to remember to close HTTP request bodies.
I would rewrite the code this way:
package main

import (
	"fmt"
	"io"
	"net/http"
)

var client = http.Client{}

func cutterHandler(res http.ResponseWriter, req *http.Request) {
	reqImg, err := client.Get("http://www.google.com/intl/en_com/images/srpr/logo3w.png")
	if err != nil {
		fmt.Fprintf(res, "Error %v", err)
		return
	}
	defer reqImg.Body.Close()
	res.Header().Set("Content-Length", fmt.Sprint(reqImg.ContentLength))
	res.Header().Set("Content-Type", reqImg.Header.Get("Content-Type"))
	if _, err = io.Copy(res, reqImg.Body); err != nil {
		// handle error
	}
}

func main() {
	http.HandleFunc("/cut", cutterHandler)
	http.ListenAndServe(":8080", nil) /* TODO Configurable */
}
