I'm using java-websocket for my websocket needs, inside a Wowza application, and nginx for SSL, proxying the requests to Java.
The problem is that the connection seems to be cut after exactly 1 hour, server-side. The client-side doesn't even know that it was disconnected for quite some time. I don't want to just adjust the timeout on nginx, I want to understand why the connection is being terminated, as the socket is functioning as usual until it isn't.
EDIT:
Forgot to post the configuration:
location /websocket/ {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    include conf.d/proxy_websocket;
    proxy_connect_timeout 1d;
    proxy_send_timeout 1d;
    proxy_read_timeout 1d;
}
And that included config:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_pass http://127.0.0.1:1938/;
Nginx/1.12.2
CentOS Linux release 7.5.1804 (Core)
Java WebSocket 1.3.8 (GitHub)
The timeout could be coming from the client, nginx, or the back-end. When you say that it is being cut "server side" I take that to mean that you have demonstrated that it is not the client. Your nginx configuration looks like it shouldn't timeout for 1 day, so that leaves only the back-end.
Test the back-end directly
My first suggestion is that you try connecting directly to the back-end and confirm that the problem still occurs (taking nginx out of the picture for troubleshooting purposes). Note that you can do this with command line utilities like curl, if using a browser is not practical. Here is an example test command:
time curl --trace-ascii curl-dump.txt -i -N \
    -H "Host: example.com" \
    -H "Connection: Upgrade" \
    -H "Upgrade: websocket" \
    -H "Sec-WebSocket-Version: 13" \
    -H "Sec-WebSocket-Key: BOGUS+KEY+HERE+IS+FINE==" \
    http://127.0.0.1:8080
In my (working) case, running the above example stayed open indefinitely (I stopped with Ctrl-C manually) since neither curl nor my server was implementing a timeout. However, when I changed this to go through nginx as a proxy (with default timeout of 1 minute) as shown below I saw a 504 response from nginx after almost exactly 1 minute.
time curl -i -N --insecure \
    -H "Host: example.com" \
    https://127.0.0.1:443/proxied-path
HTTP/1.1 504 Gateway Time-out
Server: nginx/1.14.2
Date: Thu, 19 Sep 2019 21:37:47 GMT
Content-Type: text/html
Content-Length: 183
Connection: keep-alive
<html>
<head><title>504 Gateway Time-out</title></head>
<body bgcolor="white">
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.14.2</center>
</body>
</html>
real 1m0.207s
user 0m0.048s
sys 0m0.042s
Other ideas
Someone mentioned trying proxy_ignore_client_abort, but that shouldn't make any difference unless the client is closing the connection. Besides, although that might keep the inner connection open, I don't think it is able to keep the end-to-end stream intact.
You may want to try proxy_socket_keepalive, though that requires nginx >= 1.15.6.
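If you try that, here is a minimal sketch of where the directive would go (syntax per the nginx docs; the location and backend address are taken from the question's config):

```nginx
location /websocket/ {
    proxy_socket_keepalive on;   # TCP keepalive towards the backend, nginx >= 1.15.6
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_pass http://127.0.0.1:1938/;
}
```

Note this only enables TCP-level keepalive probes on the upstream socket; it does not reset proxy_read_timeout, which is driven by actual data on the connection, so it helps detect dead peers rather than keep idle streams alive.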
Finally, there's a note in the WebSocket proxying doc that hints at a good solution:
Alternatively, the proxied server can be configured to periodically send WebSocket ping frames to reset the timeout and check if the connection is still alive.
If you have control over the back-end and want connections to stay open indefinitely, periodically sending "ping" frames to the client should prevent the connection from being closed due to inactivity, no matter how long it's open or how many middle-boxes are involved (and it makes a long proxy_read_timeout unnecessary). If the client is a web browser, no change is needed on its side, since ping/pong handling is part of the WebSocket spec.
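The bookkeeping behind such periodic pings is simple; here is a Python sketch of the decision logic (the interval value and function name are mine, not part of any library):

```python
import time

# Keep the interval well under the smallest proxy read timeout in the chain.
PING_INTERVAL = 30.0  # seconds

def should_send_ping(last_frame_sent_at, now=None, interval=PING_INTERVAL):
    """Return True when the connection has been quiet long enough that a
    ping frame should be sent to keep intermediaries from timing it out."""
    now = time.time() if now is None else now
    return (now - last_frame_sent_at) >= interval
```

The server's write loop would call this periodically and send a ping frame whenever it returns True, resetting the last-sent timestamp afterwards.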
Most likely it's because your websocket proxy configuration needs a little tweaking, but since you asked:
There are some challenges that a reverse proxy server faces in
supporting WebSocket. One is that WebSocket is a hop‑by‑hop protocol,
so when a proxy server intercepts an Upgrade request from a client it
needs to send its own Upgrade request to the backend server, including
the appropriate headers. Also, since WebSocket connections are long
lived, as opposed to the typical short‑lived connections used by HTTP,
the reverse proxy needs to allow these connections to remain open,
rather than closing them because they seem to be idle.
Within the location directive which handles your websocket proxying, you need to include the headers; this is the example Nginx gives:
location /wsapp/ {
    proxy_pass http://wsbackend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
}
This should now work because:
NGINX supports WebSocket by allowing a tunnel to be set up between a
client and a backend server. For NGINX to send the Upgrade request
from the client to the backend server, the Upgrade and Connection
headers must be set explicitly, as in this example
I'd also recommend you have a look at the Nginx Nchan module, which adds websocket functionality directly into Nginx. It works well.
Related
My ASP.NET Core 3.1 application runs in Kubernetes. The ingress (load balancer) terminates SSL and talks to the pod over plain HTTP.
Let's say the ingress is reachable at https://my-app.com:443, and talks to the pod (where my app is running) at http://10.0.0.1:80.
When handling a request, the middleware pipeline sees an HttpRequest object with the following:
Scheme == "http"
Host.Value == "my-app.com"
IsHttps == False
This is weird:
Judging by the Scheme (and IsHttps), it seems the HttpRequest object describes the forwarded request from the ingress to the pod, the one that goes over plain HTTP. However, why isn't Host.Value equal to 10.0.0.1 in that case?
Or vice versa: if HttpRequest is trying to be clever and represent the original request, the one to the ingress, why doesn't it show "https" along with "my-app.com"?
At no point in handling the request is there a request coming to http://my-app.com. It's either https://my-app.com or http://10.0.0.1. The combination is inconsistent.
Other details
Digging deeper, the HttpRequest object has the following headers (among others) that show the reverse proxying in action:
Host: my-app.com
Referer: https://my-app.com/swagger/index.html
X-Real-IP: 10.0.0.1
X-Forwarded-For: 10.0.0.1
X-Forwarded-Host: my-app.com
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Scheme: https
I'm guessing HttpRequest is using these to get hold of the original host (my-app.com rather than 10.0.0.1) but it doesn't do the same for the original scheme (https rather than http).
Q1: Is this expected, and if so, what is the rationale?
Q2: What's the best way to get at the original URL (https://my-app.com)? The best I've found so far was to check if the X-Scheme and X-Forwarded-Host headers were present (by inspecting HttpRequest.Headers) and if so, using those. However, it's a little weird having to go to the raw HTTP headers in the middleware pipeline.
Q1: Is this expected, and if so, what is the rationale?
I would say yes, it's the expected behavior.
The Host.Value == "my-app.com" of the HttpRequest object reflects the Host request header field as originated by the client (web browser, curl, ...), e.g.:
curl --insecure -H 'Host: my-app.com' https://<FRONTEND_FOR_INGRESS-MOST_OFTEN_LB-IP>
which is set [1] in the location block for the server 'my-app.com' inside the generated nginx.conf file:
...
set $best_http_host $http_host;
# [1] - Set Host value
proxy_set_header Host $best_http_host;
...
# Pass the extracted client certificate to the backend
...
proxy_set_header Connection $connection_upgrade;
proxy_set_header X-Request-ID $req_id;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header X-Forwarded-Host $best_http_host;
...
whereas the $http_host variable comes from the following ngx_http_core_module core functionality:
$http_name -
arbitrary request header field; the last part of a variable name is the field name converted to lower case with dashes replaced by underscores
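That naming rule can be mirrored in a few lines of Python (the function name is mine; the rule itself is the one quoted above):

```python
def nginx_http_variable(header_name):
    """Map an HTTP header name to the corresponding nginx $http_... variable
    name: lower-cased, with dashes replaced by underscores."""
    return "http_" + header_name.lower().replace("-", "_")

print(nginx_http_variable("X-Forwarded-Host"))  # http_x_forwarded_host
```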
So the described behavior is not unique to ASP.NET Core: what you see is simply a dictionary of key/value pairs for the headers set explicitly by the nginx ingress controller, according to its current configuration, that's it.
You can inspect the current nginx controller's configuration yourself with the following command:
kubectl exec -it po/<nginx-ingress-controller-pod-name> -n <ingress-controller-namespace> -- cat /etc/nginx/nginx.conf
Q2: What's the best way to get at the original URL
(https://my-app.com)?
I would try constructing it with a configuration snippet, introducing another custom header in which you concatenate the values.
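To illustrate the reconstruction logic itself (plain Python rather than ASP.NET, purely as a sketch; the header names are the ones listed in the question):

```python
def original_url(headers, path="/"):
    """Reconstruct the client-facing URL from reverse-proxy headers,
    falling back to the direct values when the proxy headers are absent."""
    scheme = headers.get("X-Forwarded-Proto") or headers.get("X-Scheme") or "http"
    host = headers.get("X-Forwarded-Host") or headers.get("Host", "")
    return f"{scheme}://{host}{path}"

hdrs = {
    "Host": "my-app.com",
    "X-Forwarded-Host": "my-app.com",
    "X-Forwarded-Proto": "https",
}
print(original_url(hdrs, "/swagger/index.html"))
# https://my-app.com/swagger/index.html
```

In ASP.NET Core itself, the forwarded-headers middleware (UseForwardedHeaders from Microsoft.AspNetCore.HttpOverrides) performs essentially this mapping, rewriting Request.Scheme and Request.Host from X-Forwarded-Proto and X-Forwarded-Host, so you don't have to read the raw headers in the pipeline.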
We have the following config for our reverse proxy:
location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    set $sometoken $1;
    set $some_detokener "foo";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Authorization "Basic $do_token_decoding";
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_redirect https://place/ https://place_with_token/$1/;
    proxy_redirect http://place/ http://place_with_token/$1/;
    resolver 10.0.0.2 valid=10s;
    set $backend https://real_storage$2;
    proxy_pass $backend;
}
Now, all of this works... until real_storage rotates a server. For example, say real_storage comes from foo.com. This is a load balancer which directs to two servers: 1.1.1.1 and 1.1.1.2. Now, 1.1.1.1 is removed and replaced with 1.1.1.3. However, nginx continues to try 1.1.1.1, resulting in:
epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream, client: ..., server: ..., request: "GET ... HTTP/1.1", upstream: "https://1.1.1.1:443/...", host: "..."
Note that the upstream is the old server, shown by a previous log:
[debug] 1888#1888: *570837 connect to 1.1.1.1:443, fd:60 #570841
Is this something misconfigured on our side or the host for our real_storage?
*The best I could find that sounds even close to my issue is https://mailman.nginx.org/pipermail/nginx/2013-March/038119.html ...
Further Details
We added
proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
and it still failed. I am now beginning to suspect that, since there are two ELBs involved (ours and theirs), the resolver we are using is the problem: it is Amazon-specific (per https://serverfault.com/a/929517/443939), so Amazon still sees the address as valid, but it won't resolve externally (our server trying to hit theirs).
I have removed the resolver altogether from one configuration and will see where that goes. We have not been able to reproduce this using internal servers, so we must rely on waiting for the third party servers to cycle (about once per week).
I'm a bit uncertain about this resolver being the issue, only because a restart of nginx solves the problem and picks up the latest IP pair :/
Is it possible that I have to set the dns variable without the https?:
set $backend real_storage$2;
proxy_pass https://$backend;
I know that you have to use a variable or else the re-resolve won't happen, but maybe it matters which part of the URL goes into the variable; I have only ever seen it set up as above, and no reason was ever given. I'll set that up on a 2nd server and see what happens.
And on my 3rd server I am trying this comment's suggestion and moving the set outside of the location block. Of course, if anybody else has a concrete idea then I'm open to changing my testing for this go-round :D
set $rootbackend https://real_storage;

location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    set $backend $rootbackend$2;
    proxy_pass $backend;
}
Note that I still have to set $backend inside the location block, because it uses the dynamic capture $2.
As was correctly noted by @cnst, using a variable in proxy_pass makes nginx re-resolve the address of real_storage for every request, but there are further details:
Before version 1.1.9 nginx used to cache DNS answers for 5 minutes.
After version 1.1.9 nginx caches DNS answers for a duration equal to their TTL, and the default TTL of Amazon ELB is 60 seconds.
So it is entirely expected that after a rotation nginx keeps using the old address for some time. As per the documentation, the expiration time of the DNS cache can be overridden:
resolver 127.0.0.1 [::1]:5353 valid=10s;
or
resolver 127.0.0.1 ipv6=off valid=10s;
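The expiry rule is simple enough to state as code; here is a Python sketch (names are mine) of when a cached answer stops being used:

```python
def dns_answer_expired(cached_at, ttl, now, valid_override=None):
    """A cached DNS answer expires after its TTL, unless the resolver's
    `valid=` parameter overrides the cache lifetime."""
    lifetime = valid_override if valid_override is not None else ttl
    return (now - cached_at) >= lifetime

# ELB default TTL of 60s: still cached at t=30, expired at t=61,
# but valid=10s forces re-resolution much sooner.
assert dns_answer_expired(0, ttl=60, now=30) is False
assert dns_answer_expired(0, ttl=60, now=61) is True
assert dns_answer_expired(0, ttl=60, now=30, valid_override=10) is True
```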
There's nothing special about using variables within http://nginx.org/r/proxy_pass: any variable use makes nginx involve the resolver on each request (unless the name matches a configured server group; perhaps you have a clash?). You can even get rid of $backend if you're already using $2 in there.
As to interpreting the error message — you have to figure out whether this happens because the existing connections get dropped, or whether it's because nginx is still trying to connect to the old addresses.
You might also want to look into lowering the *_timeout values within http://nginx.org/en/docs/http/ngx_http_proxy_module.html; they all appear to default to 60s, which may be too long for your use-case:
http://nginx.org/r/proxy_connect_timeout
http://nginx.org/r/proxy_send_timeout
http://nginx.org/r/proxy_read_timeout
I'm not surprised that you're not able to reproduce this issue, because there doesn't seem to be anything wrong with your existing configuration; perhaps the problem manifested itself in an earlier revision?
The documentation says the following:
Sets the HTTP protocol version for proxying. By default, version 1.0 is used. Version 1.1 is recommended for use with keepalive connections and NTLM authentication.
In my nginx config I have
location / {
    proxy_http_version 1.1;
    proxy_pass http://127.0.0.1:1980;
}
Hitting http://127.0.0.1:1980 directly, I can see my app get many requests (when I refresh) on one connection. This is the response I send:
HTTP/1.1 200 OK\nContent-Type:text/html\nContent-Length: 14\nConnection: keep-alive\n\nHello World!
However, nginx makes one request and then closes the connection. WTH? I can see nginx sends the "Connection: keep-alive" header, and I can see it added the Server and Date headers. I tried adding proxy_set_header Connection "keep-alive"; but that didn't help.
How do I get nginx to not close the connection every time?
In order for nginx to keep the connection alive, the following configuration is required:
Configure the appropriate headers: HTTP 1.1, and a Connection header value that does not contain "close" (the actual value doesn't matter; "keep-alive" or just an empty value works)
Use an upstream block with the keepalive directive; a plain proxy_pass url won't work
The origin server must have keep-alive enabled
So the following nginx configuration makes keepalive work for you:
upstream keepalive-upstream {
    server 127.0.0.1:1980;
    keepalive 64;
}

server {
    location / {
        proxy_pass http://keepalive-upstream;
        proxy_set_header Connection "";
        proxy_http_version 1.1;
    }
}
Make sure your origin server doesn't terminate the connection itself; see RFC 793, Section 3.5:
A TCP connection may terminate in two ways: (1) the normal TCP close
sequence using a FIN handshake, and (2) an "abort" in which one or
more RST segments are sent and the connection state is immediately
discarded. If a TCP connection is closed by the remote site, the local
application MUST be informed whether it closed normally or was
aborted.
A bit more detail can be found in this other answer on Stack Overflow.
keepalive should be enabled in an upstream block, not via a direct proxy_pass http://ip:port.
For HTTP, the proxy_http_version directive should be set to “1.1” and the “Connection” header field should be cleared
like this:
upstream keepalive-upstream {
    server 127.0.0.1:1980;
    keepalive 23;
}

location / {
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_pass http://keepalive-upstream;
}
I have a website on www.mydomain.com and would like to direct traffic via proxy to another server when the URL ends with /blog. This other server runs a Wordpress blog and WORDPRESS_URL and SITE_URL both point to its IP address.
My current setup is, DNS points to mydomain.com, where nginx works as a reverse proxy. Any request is directed via proxy_pass to a web application running on localhost:3000, except the ones matching /blog. For these requests I have in my NGINX conf, inside the single server block, the following:
location /blog {
    rewrite ^/blog/(.*)$ /$1 break;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
    proxy_pass http://<blog-ip>;
    proxy_redirect off;
}
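For reference, the rewrite in that block only strips the /blog prefix before proxying; in Python terms (illustrative only):

```python
import re

def strip_blog_prefix(uri):
    """Mirror `rewrite ^/blog/(.*)$ /$1 break;`: /blog/foo/bar -> /foo/bar.
    URIs that don't match the pattern pass through unchanged."""
    return re.sub(r"^/blog/(.*)$", r"/\1", uri)

print(strip_blog_prefix("/blog/wp-content/themes/style.css"))
# /wp-content/themes/style.css
```

Note that a request for /blog without a trailing slash does not match the pattern, so it is proxied with its URI unchanged.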
The proxy works well and requests do go to the blog server, but page resources like fonts and themes are not returned properly due to CORS. I assume I have to change WORDPRESS_URL and SITE_URL to www.mydomain.com/blog, and everything does function properly at first; but about 10 minutes after the URL changes, it stops working completely and www.mydomain.com/blog starts returning Bad Gateway.
The strangest part is, before the URL changes, ping and curl work just fine running from mydomain server:
$ ping <blog-ip> -W 2 -c 3
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.391/0.507/0.690/0.132 ms
$ curl -I <blog-ip>
HTTP/1.1 200 OK
After Bad Gateway begins, ping still works:
$ ping <blog-ip> -W 2 -c 3
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.391/0.507/0.690/0.132 ms
but curl does not:
$ curl -Iv <blog-ip>
Hostname was NOT found in DNS cache
* Trying <blog-ip>...
* connect to <blog-ip> port 80 failed: Connection refused
* Failed to connect to <blog-ip> port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to <blog-ip> port 80: Connection refused
Interestingly, running curl or ping from my local machine works just fine, but the WordPress server becomes invisible to curl from the mydomain server. The only way to stop the Bad Gateway is to change WORDPRESS_URL and SITE_URL back to the server's IP, and even then it only starts working again after some time.
I am completely clueless about what is going on. Both servers are DigitalOcean droplets. I had issues before with undocumented blocks on their side (they do not allow sending email from a droplet by default, so you have to contact support for that) and wondered if this is a similar case. Their support, however, doesn't seem to know what is happening either, so I decided to post the question here.
Any thoughts or suggestions are much appreciated.
I have an older tornado server that handles vanilla WebSocket connections. I proxy these connections, via Nginx, from wss://info.mydomain.com to wss://mydomain.com:8080 in order to get around customer proxies that block non standard ports.
After the recent upgrade to Tornado 4.0 all connections get refused with a 403. What is causing this problem and how can I fix it?
Tornado 4.0 introduced an on-by-default same-origin check. This checks that the Origin header set by the browser is the same as the Host header.
The code looks like:
def check_origin(self, origin):
    """Override to enable support for allowing alternate origins.

    The ``origin`` argument is the value of the ``Origin`` HTTP header,
    the url responsible for initiating this request.

    .. versionadded:: 4.0
    """
    parsed_origin = urlparse(origin)
    origin = parsed_origin.netloc
    origin = origin.lower()
    host = self.request.headers.get("Host")
    # Check to see that origin matches host directly, including ports
    return origin == host
In order for your proxied websocket connection to still work, you will need to override check_origin on the WebSocketHandler and whitelist the domains you care about. Something like this:
import re
from urllib.parse import urlparse

from tornado import websocket

class YouConnection(websocket.WebSocketHandler):

    def check_origin(self, origin):
        # Compare only the host part of the Origin header, and anchor the
        # pattern so that e.g. "mydomain.com.evil.com" is rejected.
        host = urlparse(origin).netloc.lower()
        return bool(re.match(r'^([^.]+\.)*mydomain\.com(:\d+)?$', host))
This will let connections coming from info.mydomain.com through as before.
I would like to propose an alternative solution: instead of messing with the Tornado application code, I solved the issue by telling nginx to fix the Host header:
location /ws {
    proxy_set_header Host $host;
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}