I've been implementing a Kong plugin that needs to make HTTP requests to retrieve information and share it with the upstream services.
There is an excellent library called lua-resty-http that can be used to make HTTP requests.
The service that contains the needed information is configured behind the proxy and matches the path /endpoint-providing-info.
The goal is to rely on the proxy's capabilities to avoid having to parse the hostname, which has a particular form that is not relevant to this question.
By playing around, I was able to achieve the desired behavior by doing the following:
local ok, err = http_client:connect("127.0.0.1", ngx.var.server_port)
if not ok then
  return nil, 'there was a failure opening a connection: ' .. err
end

local res, err = http_client:request({
  method = 'GET',
  path = '/endpoint-providing-info'
})

-- parse the response, etc...
The request gets routed to the upstream service and works as expected.
My primary concern is the following:
By connecting to localhost, I assumed that the current Nginx node is the one attending the request. Will this affect the performance? Is it better/possible to connect to the cluster directly?
I suppose that on the current Nginx you configure a location that matches /endpoint-providing-info, use the proxy module, and configure an upstream for the cluster.
If you use lua-resty-http:
Pros - you may use body_reader, an iterator function for reading the body in a streaming fashion.
Cons - your request will cross the kernel boundary (loopback interface).
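Such a configuration might be sketched as follows (the upstream name and server addresses are purely illustrative):

```nginx
# Illustrative cluster behind the proxy
upstream info_cluster {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;

    # Requests made by the plugin to this path are proxied to the cluster
    location /endpoint-providing-info {
        proxy_pass http://info_cluster;
    }
}
```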
Another possibility is to issue a subrequest using ngx.location.capture API
Pros - subrequests just mimic the HTTP interface but there is no extra HTTP/TCP traffic nor IPC involved. Everything works internally, efficiently, on the C level.
Cons - it is a fully buffered approach and will not work efficiently for big responses.
Update - IMO:
If you expect big responses from the upstream server, lua-resty-http is your choice.
If you expect a lot of small responses from the upstream server, ngx.location.capture should be used.
Related
I have a toy proxy server that accepts connections on a port. I set some deadlines for read/write operations to avoid having too many idle connections from bad clients that fail to close properly.
The problem is that I would like to set a higher deadline for connections towards websockets (wss in particular). For plain HTTP requests I can see the 101 Switching Protocols response, but https/wss is trickier: I mostly do an io.CopyBuffer from the src connection to the dst connection, and I don't see anything websocket-related in the initial proxy connection that would let me differentiate between https and wss and apply the proper deadline.
I've included a debug screen to such a request towards a wss:// demo server.
Any ideas?
One cannot reliably distinguish between "normal" HTTP traffic and Websockets just from looking at the encrypted data.
One can try to do some heuristics by looking at traffic patterns, i.e. in which direction how many data are transferred in which time and with which idle times between data. Such heuristics can be based on the assumption that HTTP is a request + response protocol with typically small requests shortly followed by larger responses, while Websockets can show arbitrary traffic patterns.
But arbitrary traffic patterns also mean that Websockets might be used in a request + response way too. Also, in some use cases the usage pattern for HTTP consists of mostly larger requests followed by small responses. Thus, depending on the kind of application, such heuristics might be successful or they might fail.
It is always good practice to define a global Server timeout to make sure that resources are not locked forever. That timeout should not be smaller than the longest timeout across all handlers.
DefaultServer = &http.Server{
Handler: http.TimeoutHandler(handler, wssTimeout, timeoutResponse),
...
}
In the handler that processes http and wss requests, we need to set the timeout dynamically.
func (proxy *ProxyHttpServer) handleHttps(w http.ResponseWriter, r *http.Request) {
	// The request Context is cancelled if the client's connection closes,
	// the request is canceled (with HTTP/2), or the Server we created above times out.
	// All code down the stack should respect that ctx.
	ctx := r.Context()
	timeout := httpTimeout
	if itIsWSS(r) {
		timeout = wssTimeout
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	// All code below should use ctx instead of context.Background()/TODO().
}
In this situation, I'm using all standard Go libraries -- net/http, most importantly.
The application consists of two layers. The first layer is the basic web application. The web application serves out the UI, and proxies a bunch of API calls back to the second layer based on username -- so, it's effectively a load balancer with consistent hashing -- each user is allocated to one of these second-layer nodes, and any requests pertaining to that user must be sent to that particular node.
Quick details
These API endpoints in the first layer effectively read in a JSON body, check the username, use that to figure out which of the layer 2 nodes to send the JSON body to, and then it sends it there. This is done using a global http.Client that has timeouts set on it, as appropriate.
The server side does a defer request.Body.Close() in each of the handlers after ensuring no error comes back from the decoder.Decode(&obj) calls that unmarshal the JSON. If there is any code path where the body isn't closed, it isn't one that's likely to be followed very often.
Symptoms
On the node in the second layer (the application server) I get log lines like this, presumably because it's leaking sockets and using up all the FDs:
2019/07/15 16:16:59 http: Accept error: accept tcp [::]:8100: accept4: too many open files; retrying in 1s
2019/07/15 16:17:00 http: Accept error: accept tcp [::]:8100: accept4: too many open files; retrying in 1s
And when I run lsof, 14k lines are output, of which 11,200 are TCP sockets. When I look into the contents of lsof, I see that nearly all these TCP sockets are in connection state CLOSE_WAIT, and are between my application server (second-layer node) and the web server (the first-layer node).
Interestingly, nothing seems to go wrong with the web application server (layer 1) during this timeframe.
Why does this happen?
I've seen lots of explanations, but most either point out that you need to specify custom defaults on a custom http.Client and not use the default, or they tell you to make sure to close the request bodies after reading from them in the layer 2 handlers.
Given all this information, does anyone have any idea what I can do to at least put this to bed once and for all? Everything I search on the internet is user error, and while I certainly hope that's the case here, I worry that I've nailed down every last quirk of the Go standard library I can find.
Been having trouble nailing down exactly how long it takes for this to happen -- the last time it happened, it was up for 3 days before I started to see this error, and at that point obviously nothing recovers until I kill and restart the process.
Any help would be hugely appreciated!
EDIT: example of client-side code
Here is an example of what I'm doing in the web application (layer 1) to call the layer 2 node:
var webHttpClient = &http.Client{
Transport: &http.Transport{
MaxIdleConnsPerHost: MaxIdleConnections,
},
Timeout: time.Second * 20,
}
// ...
uri := fmt.Sprintf("http://%s/%s", tsUri, "pms/all-venue-balances")
req, e := http.NewRequest("POST", uri, bytes.NewBuffer(b))
if e != nil {
	log.Printf("Submit rebal error: %v\n", e)
	w.WriteHeader(500)
	return
}
resp, err := webHttpClient.Do(req)
if err != nil {
log.Printf("Submit rebal error 3: %v\n", err)
w.WriteHeader(500)
return
}
defer resp.Body.Close()
body, _ := ioutil.ReadAll(resp.Body)
w.WriteHeader(200)
w.Write(body)
I have built a router with extended logging capabilities using Go. It works properly for most use cases. However, it encounters problems when clients send non-standard HTTP messages on port 80.
To date, I have solved this by implementing my own version of ServeHTTP():
func (myproxy *MyProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Inspect headers
	// Determine if it is a custom protocol (ie: websockets, CONNECT requests)
	// Implement handlers for each type
}
In the event that I determine a request is a non-standard HTTP protocol, the request is played back to the original destination (via http.Request.Write()) and everyone is happy.
At least, most of the time. The problem arises with edge cases. Tivo clients do not send "Host" headers and appear not to like all kinds of other standard things that Go does (such as capitalizing header names). The number of possible variations on this is endless, so what I would very much like to do is buffer the original request - exactly as it was sent to me, without any modifications - and replay it back to the original destination server.
I could hack this by re-implementing net/http's Server, but this seems like an extraordinarily difficult and brittle approach. What I would prefer to do is somehow hook into net/http/server.go right around the point where it receives a connection, then wrap the connection in a spy wrapper that logs the request to a buffer.
func (srv *Server) Serve(l net.Listener) error {
// Lots of listener code...
c := srv.newConn(rw)
c.setState(c.rwc, StateNew) // before Serve can return
// I'd like to wrap c here and save the request for possible replay later.
go c.serve(ctx)
}
https://golang.org/src/net/http/server.go?s=93229:93284#L2805
Edit: I have looked at httputil.DumpRequest, but it slightly modifies the original request (it changes the case and order of headers). If it didn't do that, it would be an ideal solution.
https://godoc.org/net/http/httputil#DumpRequest
Is there a way to hook connections around this point before they are parsed by http.Request?
In the interest of helping others, I wanted to answer my question. The approach I suggested above does in fact work and is the proper way to do this. In summary:
Implement ListenAndServe()
Wrap the incoming net.Conn in a TeeReader or other multiplexed connection wrapper
Record the request
Dial the original destination and connect with the inbound connection, replaying the original request if necessary.
A similar use case is required when upgrading connection requests for websockets servers. A nice writeup can be found here:
https://medium.freecodecamp.org/million-websockets-and-go-cc58418460bb
I am having trouble understanding ProxyFromEnvironment and ProxyURL in the net/http package. Can someone please explain to me when and why these two functions are used?
My current understanding (at least for ProxyFromEnvironment) is that it is used to get the URL of a proxy server from the environment variables, and that proxy server is then used to make HTTP requests.
Both functions are related to how you use the http.Transport mechanism.
One can be used to allow the transport to dynamically retrieve the proxy settings from the environment, the other can be used to provide a static URL to be used by the transport for the proxy every time.
ProxyFromEnvironment is a func that returns a URL describing the proxy that is configured in the Environment; it can be assigned to the Transport.Proxy field, and every time the Transport makes a request, the Proxy will depend on the values in the Environment.
ProxyURL is a func that returns a general func which returns the given URL every time it is invoked; it can be used to generate a helper function to assign to the Transport.Proxy field, so that your Transport has a consistent Proxy every time the Transport makes a request.
I am working with a client/server application which uses HTTP, and my goal is to add new features to it. I can extend the client by hooking my own code to some specific events, but unfortunately the server is not customizable. Both client and server are in a Windows environment.
My current problem is that performance is awful when a lot of data is received from the server: it takes time to transmit it and time to process it. The solution could be to have an application on the server side do the processing and send only the result (which is much smaller). The problem is that there are no built-in functions to manipulate responses from the server before sending them.
I was thinking to listen to all traffic on port 80, identifying relevant HTTP responses and send them to my application while blocking the response (to avoid sending huge data volume which won't be processed by the client). As I am lacking a lot of network knowledge, I am a bit lost when thinking about how to do it.
I had a look at some low-level packet intercepting methods like WinPCap, but it seems to require a lot of work to do what I need. Moreover I think it is not possible to block or modify responses with this API.
A reverse proxy which allows user scripts to be triggered by specific requests or responses would be perfect, but I am wondering if there is no simpler way to do this interception/send elsewhere work.
What would be the simplest and cleanest method to enable this behavior?
Thanks!
I ended up making a simple reverse proxy in front of the HTTP server. The reverse proxy extracts the relevant information from the server response and sends it to the server-side processing component, then replaces the extracted information in the response with an ID that the client uses to request the processing results from the other component.
The article at http://www.codeproject.com/KB/web-security/HTTPReverseProxy.aspx was very helpful to make the first draft of the reverse proxy.
Hmm... too many choices.
2 ideas:
configure an HTTP proxy on all clients. There are some out there that let you manipulate what goes through in both directions (with scripts, plugins).
or
make a pass-through project that listens on port 80 and forwards the needed stuff to port 8080 (where your original server app runs)
The question is what software the server app is running,
and what (dev) knowledge you have.
Ah, and what is "huge data"? Kilobytes? Megabytes? Gigabytes?