How to request a page with a specific charset in Go?

I am rewriting a program from Python to Go. I am having trouble with http.Get when fetching a page encoded in ISO-8859-1. The Python version works, but the Go version does not.
This works (Python):
r = requests.get("https://www.bger.ch/ext/eurospider/live/de/php/aza/http/index.php?lang=de&type=show_document&print=yes&highlight_docid=aza://27-01-2016-5A_718-2015")
r.encoding = 'iso-8859-1'
file = open('tmp_python.txt', 'w')
file.write(r.text.strip())
file.close()
This does not work (Go):
package main
import (
	"golang.org/x/net/html/charset"
	"io/ioutil"
	"log"
	"net/http"
)
func main() {
	link := "https://www.bger.ch/ext/eurospider/live/de/php/aza/http/index.php?lang=de&type=show_document&print=yes&highlight_docid=aza://27-01-2016-5A_718-2015"
	resp, err := http.Get(link)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	reader, err := charset.NewReader(resp.Body, "iso-8859-1")
	if err != nil {
		panic(err)
	}
	content, err := ioutil.ReadAll(reader)
	if err != nil {
		panic(err)
	}
	log.Println(string(content))
}
My browser and Python give the same result but not the Go version. How can I fix that?
Edit
I think a redirect happens with Go. This does not happen with Python.
Edit 2
My question was badly written. I had two problems: 1) the encoding, and 2) the wrong page being returned. I do not know if they are related.
I will open a new thread for the second question.

The second argument of NewReader is documented as contentType, not as a character encoding. It therefore expects the value of the Content-Type field of the HTTP header. The proper usage is thus:
reader, err := charset.NewReader(resp.Body, "text/html; charset=iso-8859-1")
And this works perfectly.
Note that if the given contentType contains no usable charset definition, NewReader inspects the body itself to determine the charset. And while the HTTP header of this page clearly states
Content-Type: text/html;charset=iso-8859-1
the actual HTML document returned defines a different charset encoding:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
With the wrong contentType value in your code, NewReader thus falls back to the charset that is wrongly declared in the HTML.
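The difference between the two strings can be checked with the standard library's mime.ParseMediaType, which parses Content-Type values. This is only a sketch: charsetParam is a hypothetical helper, and whether charset.NewReader parses its argument in exactly this way is an assumption, not something stated above.

```go
package main

import (
	"fmt"
	"mime"
)

// charsetParam is a hypothetical helper: it parses a Content-Type value
// and returns its charset parameter, or "" if there is none.
func charsetParam(contentType string) string {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return params["charset"]
}

func main() {
	// A full Content-Type value carries the charset as a parameter.
	fmt.Printf("%q\n", charsetParam("text/html; charset=iso-8859-1"))
	// A bare encoding name has no charset parameter at all.
	fmt.Printf("%q\n", charsetParam("iso-8859-1"))
}
```

The bare name parses as a media type without parameters, so a parser of this kind finds no charset in it and has to look elsewhere.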

Related

How to bypass golang's HTTP request (net/http) RFC compliance

I'm developing a security scanner and therefore need to send HTTP requests that do not honor the RFC specifications. However, Go's net/http is very strict about complying with them.
Issue
I want to send an HTTP request which contains prohibited special characters such as "\".
For example: "Ill\egal": "header value"
However, golang always throws the error: 'net/http: invalid header field name "Ill\egal"'.
This error is thrown on line 523 at https://go.dev/src/net/http/transport.go
Issue
I want to send a single HTTP request which contains either two Content-Length headers, two Transfer-Encoding headers, or one Content-Length and one Transfer-Encoding header (for HTTP request smuggling). These sometimes need to have wrong values.
However, it isn't possible to set those headers oneself, they are generated automatically. So it's only possible to use one of these headers with a correct value.
I've bypassed this by using a Raw TCP Stream, however this solution isn't satisfying, as I can't use a proxy this way: Use Dialer with Proxy. Route TCP stream through Proxy
Issue
I want to send a HTTP request where the header name is mixed upper and lowercase. E.g. "HeAdErNaMe": "header value".
This is possible for HTTP 1 requests by writing directly to the header map (req.Header["HeAdErNaMe"] = []string{"header value"})
However for HTTP 2 requests the headers will still be capitalized to meet the RFC specifications.
You can dump the request into a buffer, modify the buffer (with a regexp or a string replacement), and send the modified buffer to the host over a raw connection (net.Dial, or tls.Dial for HTTPS).
Example:
package main
import (
	"bufio"
	"crypto/tls"
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"strings"
)
func main() {
	// Create and dump the request
	req, err := http.NewRequest(http.MethodGet, "https://golang.org", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Add("User-Agent", "aaaaa")
	buf, err := httputil.DumpRequest(req, true)
	if err != nil {
		log.Fatal(err)
	}
	// Corrupt the request
	str := string(buf)
	str = strings.Replace(str, "User-Agent: aaaaa", "UsEr-AgEnT: aaa\"aaa", 1)
	fmt.Println(str)
	// Dial and send the raw request text
	conn, err := tls.Dial("tcp", "golang.org:443", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// Use Fprint, not Fprintf: the request text is data, not a format string
	if _, err := fmt.Fprint(conn, str); err != nil {
		log.Fatal(err)
	}
	// Read the response
	br := bufio.NewReader(conn)
	resp, err := http.ReadResponse(br, nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%+v", resp)
}

Reading non-utf8 encoded data from a network call in golang

I am trying to read bytes from http response body in golang. My problem is that the response body is encoded using ISO-8859-1. I want to read the response body in the same encoding and write the contents to a file in the ISO-8859-1 encoding.
Is there a way to accomplish this? I don't want to convert the data into UTF-8 at all.
Here is a good read about encoding, which you might benefit from.
You seem to assume that Go decodes the raw bytes it receives when it performs a request. It does not.
Take this example:
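package main
import (
	"io"
	"log"
	"net/http"
	"os"
)
func main() {
	// We perform a request to a Latin-1 encoded page
	resp, err := http.Get("http://andrew.triumf.ca/multilingual/samples/german.meta.html")
	if err != nil {
		log.Fatalln(err)
	}
	defer resp.Body.Close()
	// Create the output file; defer Close only after the error check
	f, err := os.Create("/tmp/latin1")
	if err != nil {
		log.Fatalln(err)
	}
	defer f.Close()
	// Stream the raw, undecoded bytes to the file
	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatalln(err)
	}
}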
In the documentation, you can read that resp.Body conforms to the io.ReadCloser interface, which allows you to read the raw bytes and stream them to a file.
Once we run this code, this is the output of file -i /tmp/latin1:
/tmp/latin1: text/html; charset=iso-8859-1
Read and write the response body as a slice of bytes, []byte, an opaque data type.

Unexpected EOF using Go http client

I am learning Go and came across this problem.
I am just downloading web page content using HTTP client:
package main
import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)
func main() {
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://mail.ru/", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Close = true
	response, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer response.Body.Close()
	content, err := ioutil.ReadAll(response.Body)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(string(content)[:100])
}
I get an unexpected EOF error when reading the response body. At the same time, the content variable holds the full page content.
This error appears only when I download https://mail.ru/. With other URLs everything works fine, without any errors.
I used curl to download this page and everything works as expected.
I am a bit confused: what is happening here?
Go v1.2, tried on Ubuntu and MacOS X
It looks like that server (Apache 1.3, wow!) is serving up a truncated gzip response. If you explicitly request the identity encoding (preventing the Go transport from adding gzip itself), you won't get the ErrUnexpectedEOF:
req.Header.Add("Accept-Encoding", "identity")

In golang, how to determine the final URL after a series of redirects?

So, I'm using the net/http package. I'm GETting a URL that I know for certain is redirecting. It may even redirect a couple of times before landing on the final URL. Redirection is handled automatically behind the scenes.
Is there an easy way to figure out what the final URL was without a hackish workaround that involves setting the CheckRedirect field on a http.Client object?
I guess I should mention that I think I came up with a workaround, but it's kind of hackish, as it involves using a global variable and setting the CheckRedirect field on a custom http.Client.
There's got to be a cleaner way to do it. I'm hoping for something like this:
package main
import (
	"fmt"
	"log"
	"net/http"
)
func main() {
	// Try to GET some URL that redirects. Could be 5 or 6 unseen redirections here.
	resp, err := http.Get("http://some-server.com/a/url/that/redirects.html")
	if err != nil {
		log.Fatalf("http.Get => %v", err.Error())
	}
	// Find out what URL we ended up at
	finalURL := magicFunctionThatTellsMeTheFinalURL(resp)
	fmt.Printf("The URL you ended up at is: %v", finalURL)
}
package main
import (
	"fmt"
	"log"
	"net/http"
)
func main() {
	resp, err := http.Get("http://stackoverflow.com/q/16784419/727643")
	if err != nil {
		log.Fatalf("http.Get => %v", err.Error())
	}
	// Your magic function. The Request in the Response is the last URL the
	// client tried to access.
	finalURL := resp.Request.URL.String()
	fmt.Printf("The URL you ended up at is: %v\n", finalURL)
}
Output:
The URL you ended up at is: http://stackoverflow.com/questions/16784419/in-golang-how-to-determine-the-final-url-after-a-series-of-redirects
I would add that the http.Head method should be enough to retrieve the final URL. In theory it should be faster than http.Get, since the server is expected to send back only the headers:
resp, err := http.Head("http://stackoverflow.com/q/16784419/727643")
...
finalURL := resp.Request.URL.String()
...
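Both answers can be tried without depending on an external site. The sketch below assumes a local test server from net/http/httptest; the /start and /end paths and the helpers finalURL and newRedirectingServer are made up for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// finalURL is a hypothetical helper: it issues a HEAD request and
// returns the URL of the last request the client made.
func finalURL(client *http.Client, url string) (string, error) {
	resp, err := client.Head(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Request.URL.String(), nil
}

// newRedirectingServer starts a local test server whose /start path
// redirects to /end (both paths are made up for this example).
func newRedirectingServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/start", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/end", http.StatusFound)
	})
	mux.HandleFunc("/end", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newRedirectingServer()
	defer srv.Close()

	u, err := finalURL(srv.Client(), srv.URL+"/start")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(u) // the test server's address followed by /end
}
```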

Reading image from HTTP request's body in Go

I'm playing with Go (first time ever) and I want to build a tool to retrieve images from the Internet and crop (or even resize) them, but I'm stuck on the first step.
package main
import (
	"fmt"
	"net/http"
)
var client = http.Client{}
func cutterHandler(res http.ResponseWriter, req *http.Request) {
	reqImg, err := client.Get("http://www.google.com/intl/en_com/images/srpr/logo3w.png")
	if err != nil {
		fmt.Fprintf(res, "Error %v", err)
		return
	}
	buffer := make([]byte, reqImg.ContentLength)
	reqImg.Body.Read(buffer)
	res.Header().Set("Content-Length", fmt.Sprint(reqImg.ContentLength)) /* value: 7007 */
	res.Header().Set("Content-Type", reqImg.Header.Get("Content-Type")) /* value: image/png */
	res.Write(buffer)
}
func main() {
	http.HandleFunc("/cut", cutterHandler)
	http.ListenAndServe(":8080", nil) /* TODO Configurable */
}
I'm able to request an image (let's use Google logo) and to get its kind and size.
Indeed, I'm just re-writing the image (look at this as a toy "proxy"), setting Content-Length and Content-Type and writing the byte slice back, but I get it wrong somewhere: the final image does not render correctly in Chromium 12.0.742.112 (90304).
I also checked the downloaded file and it is a 7007-byte PNG image. It should be working properly if we look at the request:
GET /cut HTTP/1.1
User-Agent: curl/7.22.0 (i486-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.23 libssh2/1.2.8 librtmp/2.3
Host: 127.0.0.1:8080
Accept: */*
HTTP/1.1 200 OK
Content-Length: 7007
Content-Type: image/png
Date: Tue, 27 Dec 2011 19:51:53 GMT
[PNG data]
What do you think I'm doing wrong here?
Disclaimer: I'm scratching my own itch, so probably I'm using the wrong tool :) Anyway, I can implement it on Ruby but before I would like to give Go a try.
Update: still scratching itches but... I think this is going to be a good side-of-side project so I'm opening it https://github.com/imdario/go-lazor If it is not useful, at least somebody can find usefulness with the references used to develop it. They were for me.
I think you jumped to the serving part too quickly.
Focus on the first step, downloading the image.
Here you have a little program that downloads that image to memory.
It works on my 2011-12-22 weekly version, for r60.3 you just need to gofix the imports.
package main
import (
	"io/ioutil"
	"log"
	"net/http"
)
const url = "http://www.google.com/intl/en_com/images/srpr/logo3w.png"
func main() {
	// Just a simple GET request to the image URL
	// We get back a *Response, and an error
	res, err := http.Get(url)
	if err != nil {
		log.Fatalf("http.Get -> %v", err)
	}
	// We read all the bytes of the image
	// Types: data []byte
	data, err := ioutil.ReadAll(res.Body)
	if err != nil {
		log.Fatalf("ioutil.ReadAll -> %v", err)
	}
	// You have to manually close the body, check docs
	// This is required if you want to use things like
	// Keep-Alive and other HTTP sorcery.
	res.Body.Close()
	// You can now save it to disk or whatever...
	if err := ioutil.WriteFile("google_logo.png", data, 0666); err != nil {
		log.Fatalf("ioutil.WriteFile -> %v", err)
	}
	log.Println("I saved your image buddy!")
}
Voilà!
This will get the image to memory inside data.
Once you have that, you can decode it, crop it and serve back to the browser.
Hope this helps.
I tried your code and noticed that the image you were serving was the right size, but the contents of the file past a certain point were all 0x00.
Review the io.Reader documentation. The important thing to remember is that Read reads up to the number of bytes you request. It can read fewer with no error returned. (You should be checking the error too, but that's not an issue here.)
If you want to make sure your buffer is completely full, use io.ReadFull. In this case it's simpler to just copy the entire contents of the Reader with io.Copy.
It's also important to remember to close HTTP response bodies.
I would rewrite the code this way:
package main
import (
	"fmt"
	"io"
	"net/http"
)
var client = http.Client{}
func cutterHandler(res http.ResponseWriter, req *http.Request) {
	reqImg, err := client.Get("http://www.google.com/intl/en_com/images/srpr/logo3w.png")
	if err != nil {
		fmt.Fprintf(res, "Error %v", err)
		return
	}
	// Close the upstream response body when the handler returns
	defer reqImg.Body.Close()
	res.Header().Set("Content-Length", fmt.Sprint(reqImg.ContentLength))
	res.Header().Set("Content-Type", reqImg.Header.Get("Content-Type"))
	// io.Copy keeps reading until EOF, so the whole image is forwarded
	if _, err = io.Copy(res, reqImg.Body); err != nil {
		// handle error
	}
}
func main() {
	http.HandleFunc("/cut", cutterHandler)
	http.ListenAndServe(":8080", nil) /* TODO Configurable */
}
