Ingress support for websocket - nginx

I have a Jetty web app running under k8s. This web app has a websocket endpoint. The deployed service is exposed via an nginx ingress over https.
Everything mostly works: the web app runs and the websockets function (i.e. messages get pushed and received), but the websockets close with a 1006 error code, which, to be honest, doesn't stop my code from working but doesn't look good either.
The websocket is exposed at /notifications. In a "normal" config, i.e. not k8s, just plain software installed on a VM, I would need to add the following to nginx.conf:
location /notifications {
    proxy_pass http://XXX/notifications;
    proxy_read_timeout 3700s;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Origin '';
}
I tried doing the same via the ingress:
nginx.ingress.kubernetes.io/configuration-snippet: |
  location /notifications {
      proxy_pass http://webapp:8080/notifications;
      proxy_http_version 1.1;
      proxy_set_header Upgrade "websocket";
      proxy_set_header Connection "Upgrade";
  }
But it has no effect: I checked the generated nginx.conf and there is no such block added...
Has anybody had issues like this before? Any clue on how to solve the 1006 issue?
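For reference: as far as I know, the configuration-snippet annotation is injected inside the location block that ingress-nginx generates, so a nested location stanza is not valid there, which would explain why the block never shows up. ingress-nginx proxies WebSocket upgrades out of the box; a common cause of 1006 closes after idle periods is the default 60-second proxy timeout. A minimal sketch, assuming the community ingress-nginx controller (the host is illustrative; the service name and port echo the question):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp
  annotations:
    # raise the 60s defaults that commonly trigger 1006 closes
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3700"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3700"
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /notifications
        pathType: Prefix
        backend:
          service:
            name: webapp
            port:
              number: 8080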

1006 meaning
As per RFC 6455, 1006 means Abnormal Closure:
Used to indicate that a connection was closed abnormally (that is, with no close frame being sent) when a status code is expected.
Also see CloseReason.CloseCodes (Java(TM) EE 7 Specification APIs)
There are many possible causes, on either the server or the client.
Client errors: try the websocket.org Echo Test
To isolate and debug client-side errors, you may use the websocket.org Echo Test.
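A quick way to see what the client observes is to log the close event from the browser console; a minimal sketch (the echo endpoint URL is an assumption, check websocket.org for the current one):

var ws = new WebSocket("wss://echo.websocket.org/");
ws.onopen = function () { ws.send("hello"); };
ws.onmessage = function (e) { console.log("echo:", e.data); };
ws.onclose = function (e) {
    // e.code is 1006 when the connection died with no close frame
    console.log("closed:", e.code, e.reason, "clean:", e.wasClean);
};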
As for server errors:
Jetty
Jetty-related discussion is here: Question regarding abnormal · Issue #604 · eclipse/jetty.project. But it doesn't contain any solutions.
Race detector for Go server code
If your server is written in Go, you may try the Data Race Detector - The Go Programming Language:
Data races are among the most common and hardest to debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. See The Go Memory Model for details.
Here is an example of a data race that can lead to crashes and memory corruption:
package main

import "fmt"

func main() {
    c := make(chan bool)
    m := make(map[string]string)
    go func() {
        m["1"] = "a" // First conflicting access.
        c <- true
    }()
    m["2"] = "b" // Second conflicting access.
    <-c
    for k, v := range m {
        fmt.Println(k, v)
    }
}
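To actually catch the race, run the program with the race detector enabled:

go run -race main.go

The detector then reports both conflicting accesses with goroutine stack traces at runtime.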
Case for PHP code
The case for PHP code is discussed here: Unclean a closed connection by close() websocket's method (1006) · Issue #236 · walkor/Workerman

Related

Vaadin 23 WebSockets recovery logic

I'm still fighting with the proper configuration of WebSockets in a Vaadin 23 application behind NGINX.
For NGINX I configured the following:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_read_timeout 300;
proxy_connect_timeout 300;
proxy_send_timeout 300;
reset_timedout_connection on;
I also tried 600 (10 min) instead of 300. Same issue.
For Vaadin application:
@Push(transport = Transport.WEBSOCKET_XHR)
vaadin.heartbeatInterval=300
vaadin.maxMessageSuspendTimeout=5000
Everything works relatively well on my computer. Most of the issues I see are on my iPhone. For example, after being idle for a while, I click a button that runs @Async ListenableFuture logic and may only see the progress bar I made visible before the following block:
progressBar.setVisible(true);
ListenableFuture listenableFuture = // some async method call
var ui = UI.getCurrent();
listenableFuture.addCallback(result -> {
    ui.access(() -> {
        // some UI updates
        progressBar.setVisible(false);
    });
}, err -> {
    logger.error("Error", err);
});
After that I don't see any issues in my NGINX/Tomcat error logs... nothing. I just see a browser with an infinite ProgressBar. But if I refresh the page, everything starts working properly again.
So I'm trying to figure out what could be wrong, how Vaadin is supposed to detect a failed WS connection and recover it, which properties are responsible for this, and how quickly it can be done. Could you please help me with this?
Also, is there any correlation between vaadin.heartbeatInterval and WebSockets? And do I need to specify vaadin.pushLongPollingSuspendTimeout in the case of Transport.WEBSOCKET_XHR?

dotnet core - Server hangs on Production

We are currently experiencing an issue when we run our dotnet core server setup in production. We publish it in Bamboo and run it on an AWS Linux server, and it sits behind an nginx reverse proxy.
Essentially, every few days our dotnet core server process will go mute. It silently accepts and hangs on web requests, and even silently ignores our (more polite) attempts to stop it. We have verified that it is actually the netcore process that hangs by sending curl requests directly to port 5000 from within the server. We've replicated our production deployment to the best of our ability to our test environment and have not been able to reproduce this failure mode.
We've monitored the server with NewRelic and have inspected it at times when it's gone into failure mode. We've not been able to correlate this behaviour with any significant level of traffic, RAM usage, CPU usage, or open file descriptor usage. Indeed, these measurements all seem to stay at very reasonable levels.
My team and I are a bit stuck as to what might be causing our hung server, or even what we can do next to diagnose it. What might be causing our server process to hang? What further steps can we take to diagnose the issue?
Extra Information
Our nginx conf template:
upstream wfe {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}
server {
    listen 80 default_server;
    location / {
        proxy_set_header Host $http_host;
        proxy_pass http://wfe;
        proxy_read_timeout 20s;
        # Attempting a fix suggested by:
        # https://medium.com/@mshanak/soved-dotnet-core-too-many-open-files-in-system-when-using-postgress-with-entity-framework-c6e30eeff6d1
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection keep-alive;
        proxy_cache_bypass $http_upgrade;
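        # note: the fastcgi_* directives below apply to fastcgi_pass, not
        # proxy_pass, so they have no effect in this proxied location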
        fastcgi_buffers 16 16k;
        fastcgi_buffer_size 32k;
    }
}
Our Program.cs:
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Net;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Logging;
using Serilog;

namespace MyApplication.Presentation
{
    [ExcludeFromCodeCoverage]
    public class Program
    {
        public static void Main(string[] args)
        {
            IWebHost host = WebHost.CreateDefaultBuilder(args)
#if DEBUG
                .UseKestrel(options => options.Listen(IPAddress.Any, 5000))
#endif
                .UseStartup<Startup>()
                .UseSerilog()
                .Build();
            host.Run();
        }
    }
}
During our CD build process, we publish our application for deployment with:
dotnet publish --self-contained -c Release -r linux-x64
We then deploy the folder bin/Release/netcoreapp2.0/linux-x64 to our server, and run publish/<our-executable-name> from within.
EDIT: dotnet --version outputs 2.1.4, both on our CI platform and on the production server.
When the outage starts, nginx logs show that server responses to requests change from 200 to 502, with a single 504 being emitted at the time of the outage.
At the same time, logs from our server process just stop. There are warnings in those logs, but they're all explicit warnings that we've put into our application code; none of them indicate that any exceptions have been thrown.
After a few days of investigation I found the reason for the issue. It is caused by glibc >= 2.27, which leads to a GC hang under certain conditions, so there is almost nothing to do about it directly. However, you have a few options:
Use Alpine Linux. It doesn't rely on glibc (see the publish note after this list).
Use an older distro like Debian 9, Ubuntu 16.04, or any other with glibc < 2.27.
Try to patch glibc yourself at your own risk: https://sourceware.org/bugzilla/show_bug.cgi?id=25847
Or wait for the glibc patch to be reviewed by the community and included in your favorite distro.
More information can be found here: https://github.com/dotnet/runtime/issues/47700
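If you go the Alpine route, note that Alpine uses musl instead of glibc, so the self-contained publish needs the musl runtime identifier (a sketch, assuming a runtime version new enough to support that RID):

dotnet publish --self-contained -c Release -r linux-musl-x64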

Rails 5 Action Cable deployment with Nginx, Puma & Redis

I am trying to deploy an Action Cable-enabled application to a VPS using Capistrano. I am using Puma, Nginx, and Redis (for Cable). After a couple of hurdles, I was able to get it working in a local development environment. I'm using the default in-process /cable URL. But when I try deploying it to the VPS, I keep getting these two errors in the JS log:
Establishing connection to host ws://{server-ip}/cable failed.
Connection to host ws://{server-ip}/cable was interrupted while loading the page.
And in my app-specific nginx.error.log I'm getting these messages:
2016/03/10 16:40:34 [info] 14473#0: *22 client 90.27.197.34 closed keepalive connection
Turning on ActionCable.startDebugging() in the JS prompt shows nothing of interest, just ConnectionMonitor trying to reopen the connection indefinitely. I'm also getting a load of 301 Moved Permanently responses for /cable in my network monitor.
Things I've tried:
Using the async adapter instead of Redis. (This is what is used in the development env.)
Adding something like this to my /etc/nginx/sites-enabled/{app-name}:
location /cable/ {
    proxy_pass http://puma;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
}
Setting Rails.application.config.action_cable.allowed_request_origins to the proper host (tried "http://{server-ip}" and "ws://{server-ip}")
Turning on Rails.application.config.action_cable.disable_request_forgery_protection
No luck. What is causing the issue?
$ rails -v
Rails 5.0.0.beta3
Please inform me of any additional details that may be useful.
Finally, I got it working! I had been trying various things for about a week...
The 301 redirects were caused by nginx actually trying to redirect the browser to /cable/ instead of /cable. This is because I had specified /cable/ instead of /cable in the location stanza! I got the idea from this answer.
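For reference, a sketch of the corrected stanza (assuming an upstream named puma, as in the attempt above), with no trailing slash on the location:

location /cable {
    proxy_pass http://puma;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
}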

NGINX configuration for Rails 5 ActionCable with puma

I am using Jelastic for my development environment (not yet in production).
My application runs with Unicorn, but I discovered websockets with ActionCable and integrated it into my application.
Everything works fine locally, but when deploying to my Jelastic environment (with the default NGINX/Unicorn configuration), I get this message in my JavaScript console and see nothing in my access log:
WebSocket connection to 'ws://dev.myapp.com:8080/' failed: WebSocket is closed before the connection is established.
I used to have this on my local environment, and I solved it by adding the needed ActionCable.server.config.allowed_request_origins in my config file. So I double-checked my development config for this, and it is OK.
That's why I was wondering if there is something specific to the NGINX config, other than what is explained on the ActionCable git page:
bundle exec puma -p 28080 cable/config.ru
For my application, I followed an external guide, but nothing was mentioned there about NGINX configuration.
I know that websockets with ActionCable are quite new, but I hope someone will be able to give me a lead on that.
Many thanks
OK, so I finally managed to fix my issue. Here are the different steps that made this work:
1. nginx: I don't really know if this is needed, but as my application is running with Unicorn, I added this to my nginx conf:
upstream websocket {
    server 127.0.0.1:28080;
}
server {
    location /cable/ {
        proxy_pass http://websocket/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
    }
}
And then in my config/environments/development.rb file:
config.action_cable.url = "ws://my.app.com/cable/"
2. Allowed request origins: I then noticed that my connection was refused even though I was using ActionCable.server.config.allowed_request_origins in my config/environments/development.rb file. I wonder if this is due to the development default being http://localhost:3000, as stated in the documentation. So I added this:
ActionCable.server.config.disable_request_forgery_protection = true
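As an aside: rather than disabling request forgery protection outright, explicitly listing the allowed origins should also work once the origin matches. A sketch, with the host taken from the config.action_cable.url above:

ActionCable.server.config.allowed_request_origins = ['http://my.app.com']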
I don't have a production environment yet, so I'm not yet able to test how it will behave there.
3. Redis password: as stated in the documentation, I was using a config/redis/cable.yml, but I was getting this error:
Error raised inside the event loop: Replies out of sync: #<RuntimeError: ERR operation not permitted>
/var/www/webroot/ROOT/public/shared/bundle/ruby/2.2.0/gems/em-hiredis-0.3.0/lib/em-hiredis/base_client.rb:130:in `block in connect'
So I understood that the way I was setting the password for my Redis server was not good.
In fact, you have to do something like this:
development:
  <<: *local
  :url: redis://user:password@my.redis.com:6379
  :host: my.redis.com
  :port: 6379
And now everything is working fine, and ActionCable is really impressive.
Maybe some of my issues were trivial, but I'm sharing them and how I resolved them so everyone can pick up something if needed.

How can I avoid uwsgi_modifier1 30 and keep my WSGI application location-independent?

I have a WSGI application using CherryPy, hosted using uWSGI behind an nginx server.
I would like the application itself to be "portable". That is, the application should not know or care what URL it is mapped to, and should even work if mapped to multiple different URLs. I want to stay DRY by keeping the URL mapping information in one place only. Unfortunately, the only way I have found to do this involves using uwsgi_modifier1 30, which has been called an ugly hack. Can I avoid that hack?
For the present purposes, I have created a tiny application called sample that demonstrates my question.
The nginx config looks like this:
location /sample/ {
    uwsgi_pass unix:/run/uwsgi/app/sample/socket;
    include uwsgi_params;
    uwsgi_param SCRIPT_NAME /sample;
    uwsgi_modifier1 30;
}
The uwsgi config in /etc/uwsgi/apps-enabled/sample.js:
{
    "uwsgi": {
        "uid": "nobody",
        "gid": "www-data",
        "module": "sample:app"
    }
}
...and the application itself:
#!/usr/bin/python
import cherrypy

class Root(object):
    @cherrypy.expose
    def default(self, *path):
        return "hello, world; path=%r\n" % (path,)

app = cherrypy.Application(Root(), script_name=None)
It works:
The URL under which the application is mapped (/sample) appears in only one place: the nginx config file.
The application does not see that prefix and does not have to worry about it; it only receives whatever appears after /sample:
$ curl http://localhost/sample/
hello, world; path=()
$ curl http://localhost/sample/foo
hello, world; path=('foo',)
$ curl http://localhost/sample/foo/bar
hello, world; path=('foo', 'bar')
To motivate the reason for my question: let's say I have a development version of the application. I can make a second uwsgi app and point it to a different copy of the source code, add an extra location /sample.test/ { ... } to nginx pointing to the new uwsgi app, and hack on it using the alternate URL without affecting the production version.
But it makes use of uwsgi_modifier1 30 which is supposedly an ugly hack:
http://uwsgi-docs.readthedocs.org/en/latest/Nginx.html
Note: ancient uWSGI versions used to support the so called “uwsgi_modifier1 30” approach. Do not do it. it is a really ugly hack
Now, I can do this:
location /something/ {
uwsgi_pass unix:/run/uwsgi/app/sample/socket;
include uwsgi_params;
}
...and this...
{
    "uwsgi": {
        "uid": "nobody",
        "gid": "www-data",
        "pythonpath": "",  # no idea why I need this, btw
        "mount": "/something=sample:app",
        "manage-script-name": true
    }
}
But it requires that I hardcode the path (/something) in two places instead of one. Can I avoid that? Or should I stick with the original setup, which uses uwsgi_modifier1 30?
My answer is really about simplifying things, because the following chain and the amount of configuration you have indicate one thing: overkill.
CherryPy ⇐ WSGI ⇒ uWSGI ⇐ uwsgi ⇒ Nginx ⇐ HTTP ⇒ Client
CherryPy has a production-ready server that natively speaks HTTP; no intermediary protocol, namely WSGI, is required. For low traffic you can use it on its own; for high traffic, put Nginx in front, like:
CherryPy ⇐ HTTP ⇒ Nginx ⇐ HTTP ⇒ Client
CherryPy has the notion of an application, and you can serve several applications with one CherryPy instance. CherryPy can also serve other WSGI applications. I recently answered a related question.
Portability
The portability you are talking about is natively supported by CherryPy. That means you can mount an app at a given path prefix and there's nothing else to configure (well, as long as you build URLs with cherrypy.url and generally keep in mind that the app can be mounted at different path prefixes).
server.py
#!/usr/bin/env python3
import cherrypy

config = {
    'global' : {
        'server.socket_host' : '127.0.0.1',
        'server.socket_port' : 8080,
        'server.thread_pool' : 8
    }
}

# proxy tool is optional
stableConf = {'/': {'tools.proxy.on': True}}
develConf = {'/': {'tools.proxy.on': True}}

class AppStable:
    @cherrypy.expose
    def index(self):
        return 'I am stable branch'

class AppDevel:
    @cherrypy.expose
    def index(self):
        return 'I am development branch'

cherrypy.config.update(config)
cherrypy.tree.mount(AppStable(), '/stable', stableConf)
cherrypy.tree.mount(AppDevel(), '/devel', develConf)

if __name__ == '__main__':
    cherrypy.engine.signals.subscribe()
    cherrypy.engine.start()
    cherrypy.engine.block()
server.conf (optional)
server {
    listen 80;
    server_name localhost;

    # settings for serving static content with nginx directly, logs, ssl, etc.

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
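A quick usage sketch against the server above, hitting CherryPy directly on port 8080 (through nginx you would use port 80):

$ python3 server.py &
$ curl http://localhost:8080/stable
I am stable branch
$ curl http://localhost:8080/devel
I am development branch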
