I'm still struggling with a proper WebSocket configuration for a Vaadin 23 application behind NGINX.
For NGINX I configured the following:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
proxy_read_timeout 300;
proxy_connect_timeout 300;
proxy_send_timeout 300;
reset_timedout_connection on;
I also tried 600 (10 minutes) instead of 300; same issue.
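For reference, this is roughly how I understand these directives are meant to fit together (a sketch only; the vaadin_backend upstream name and the location path are placeholders, not my real config):
# The map block goes in the http {} context:
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
# Inside the server block that fronts the Vaadin app:
location / {
    proxy_pass http://vaadin_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header Host $host;
    proxy_read_timeout 300;
    proxy_connect_timeout 300;
    proxy_send_timeout 300;
}
The map-based Connection header is the pattern from the nginx WebSocket proxying guide; it only requests an upgrade when the client actually sends an Upgrade header, instead of hard-coding Connection "Upgrade" on every proxied request.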
For Vaadin application:
@Push(transport = Transport.WEBSOCKET_XHR)
vaadin.heartbeatInterval=300
vaadin.maxMessageSuspendTimeout=5000
Everything works relatively well on my computer. Most of the issues show up on my iPhone: for example, after being idle for a while, I click a button with @Async ListenableFuture logic and only see the progress bar that I show before the following block:
progressBar.setVisible(true);
ListenableFuture listenableFuture = // some async method call
var ui = UI.getCurrent();
listenableFuture.addCallback(result -> {
    ui.access(() -> {
        // some UI updates
        progressBar.setVisible(false);
    });
}, err -> {
    logger.error("Error", err);
});
After that I don't see any issues in my NGINX/Tomcat error logs, nothing at all. I just see a browser with an infinitely spinning ProgressBar. But if I refresh the page, everything starts working properly again.
So I'm trying to figure out what could be wrong, how Vaadin is supposed to detect a failed WebSocket connection and recover it, which properties are responsible for this, and how quickly the recovery can happen. Could you please help me with this?
Also, is there any correlation between vaadin.heartbeatInterval and WebSockets? And do I need to specify vaadin.pushLongPollingSuspendTimeout in the case of Transport.WEBSOCKET_XHR?
Related
I have a Jetty web app running under k8s. This web app has a websocket endpoint. The deployed service is exposed via an nginx ingress over https.
Everything works fine: the web app runs and the websockets work (i.e. messages get pushed and received), but the websockets close with a 1006 error code, which, to be honest, doesn't stop my code from working but doesn't look good either.
The websocket is exposed at /notifications. In a "normal" config, i.e. not k8s, just plain software installed on a VM, I would need to add the following to nginx.conf:
location /notifications {
    proxy_pass http://XXX/notifications;
    proxy_read_timeout 3700s;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Origin '';
}
I tried doing this via the ingress:
nginx.ingress.kubernetes.io/configuration-snippet: |
  location /notifications {
    proxy_pass http://webapp:8080/notifications;
    proxy_http_version 1.1;
    proxy_set_header Upgrade "websocket";
    proxy_set_header Connection "Upgrade";
  }
But it has no effect; I checked the generated nginx.conf and there is no such block added...
Has anybody had issues like this before? Any clue on how to solve the 1006 issue?
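For comparison, the ingress-nginx controller also exposes these proxy timeouts as plain annotations, which may be simpler than a configuration-snippet; a sketch, assuming the community ingress-nginx controller and mirroring the 3700 s timeout above:
nginx.ingress.kubernetes.io/proxy-read-timeout: "3700"
nginx.ingress.kubernetes.io/proxy-send-timeout: "3700"
ingress-nginx is generally documented as handling the HTTP/1.1 Upgrade/Connection negotiation for WebSocket traffic by itself, so raising the read/send timeouts above their 60-second default is usually the main change needed.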
1006 meaning
As per RFC-6455 1006 means Abnormal Closure:
Used to indicate that a connection was closed abnormally (that is, with no close frame being sent) when a status code is expected.
Also see CloseReason.CloseCodes (Java(TM) EE 7 Specification APIs)
There are many possible causes, on either the server or the client.
Client errors: try the websocket.org Echo Test
To isolate and debug client-side errors, you can use the websocket.org Echo Test.
As for server errors:
Jetty
A Jetty-related discussion is here: Question regarding abnormal · Issue #604 · eclipse/jetty.project, but it doesn't contain any solutions.
Race detector for golang server code
If your server is written in Go, you can try the Data Race Detector - The Go Programming Language:
Data races are among the most common and hardest to debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. See The Go Memory Model for details.
Here is an example of a data race that can lead to crashes and memory corruption:
package main

import "fmt"

func main() {
    c := make(chan bool)
    m := make(map[string]string)
    go func() {
        m["1"] = "a" // First conflicting access.
        c <- true
    }()
    m["2"] = "b" // Second conflicting access.
    <-c
    for k, v := range m {
        fmt.Println(k, v)
    }
}
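The detector itself is enabled with the -race flag (for example go run -race or go test -race); when the conflicting accesses above happen, it should print a WARNING: DATA RACE report showing the stacks of both goroutines.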
Case for PHP code
A case involving PHP code is discussed here: Unclean a closed connection by close() websocket's method (1006) · Issue #236 · walkor/Workerman
I have a Nginx reverse proxy in front of my Grafana server.
I'm trying to use Nginx auth_basic to automatically log the user into Grafana.
I would like to do this to be able to automatically log in an embedded iframe graph placed in another web application (not on the same network).
nginx.conf
server {
    server_name grafana.mydomain.com;
    ...
    location / {
        proxy_pass http://grafana.mydomain.com;
    }
    location /grafana/ {
        proxy_pass http://grafana.mydomain.com;
        auth_basic "Restricted grafana.mydomain.com";
        auth_basic_user_file /etc/nginx/htpasswd/grafana.mydomain.com;
        proxy_set_header X-WEBAUTH-USER $remote_user;
        proxy_set_header Authorization "";
    }
}
grafana.ini
[auth.basic]
enabled = true
[security]
allow_embedding = true
cookie_samesite = lax
root_url = https://grafana.mydomain.com/grafana/
[auth.proxy]
enabled = true
header_name = X-WEBAUTH-USER
header_property = username
auto_sign_up = true
sync_ttl = 60
enable_login_token = true
What happens with this setup is that if I go to grafana.mydomain.com, the normal login appears and everything works fine.
But if I go to grafana.mydomain.com/grafana/ after logging in through Nginx, Grafana returns this:
If I try to click any link on the page, a lot of unauthorized errors appear and I get logged out.
I've been playing with those settings a lot:
proxy_set_header X-WEBAUTH-USER
root_url
enable_login_token
cookie_samesite
But I was unable to make things work.
The user is created inside Grafana, so I have tried giving the created user full permissions:
But I still get unauthorized errors and 404 errors.
I'm not even sure this is the right path to achieve what I'm trying to do. Any suggestions?
I've removed the two locations and placed the authentication on the / location.
Then I switched cookie_samesite back to none and it started working as it was supposed to.
By doing this I lost the ability to log into Grafana normally, though.
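Based on that description, the merged block would look roughly like this (a sketch only, reusing the host name and paths from the question; not a verified config):
server {
    server_name grafana.mydomain.com;
    ...
    location / {
        proxy_pass http://grafana.mydomain.com;
        auth_basic "Restricted grafana.mydomain.com";
        auth_basic_user_file /etc/nginx/htpasswd/grafana.mydomain.com;
        proxy_set_header X-WEBAUTH-USER $remote_user;
        proxy_set_header Authorization "";
    }
}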
Django Channels is working both on my local server and on the development server in my production environment; however, I cannot get it to respond in production, nor can I get it to work with the following Daphne command (dojos is the project name):
daphne -b 0.0.0.0 -p 8001 dojos.asgi:channel_layer
Here is a sample of what happens after the command:
2019-05-08 08:17:18,463 INFO Starting server at tcp:port=8001:interface=0.0.0.0, channel layer dojos.asgi:channel_layer.
2019-05-08 08:17:18,464 INFO HTTP/2 support not enabled (install the http2 and tls Twisted extras)
2019-05-08 08:17:18,464 INFO Using busy-loop synchronous mode on channel layer
2019-05-08 08:17:18,464 INFO Listening on endpoint tcp:port=8001:interface=0.0.0.0
127.0.0.1:57186 - - [08/May/2019:08:17:40] "WSCONNECTING /chat/stream/" - -
127.0.0.1:57186 - - [08/May/2019:08:17:44] "WSDISCONNECT /chat/stream/" - -
127.0.0.1:57190 - - [08/May/2019:08:17:46] "WSCONNECTING /chat/stream/" - -
127.0.0.1:57190 - - [08/May/2019:08:17:50] "WSDISCONNECT /chat/stream/" - -
127.0.0.1:57192 - - [08/May/2019:08:17:52] "WSCONNECTING /chat/stream/" - -
(forever)
Meanwhile on the client side I get the following console info:
websocketbridge.js:121 WebSocket connection to 'wss://www.joinourstory.com/chat/stream/' failed: WebSocket is closed before the connection is established.
Disconnected from chat socket
I have a feeling that the problem is with the nginx configuration, so here is my config file's server block:
location /chat/stream/ {
    proxy_pass http://0.0.0.0:8001;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $http_host;
}
location /static/ {
    root /home/adam/LOCdojos;
}
I made sure that the consumers.py file had these lines:
def ws_connect(message):
message.reply_channel.send(dict(accept=True))
I tried installing the Django Debug Toolbar with the channels panel, per this question:
Debugging django-channels
but it did not help in the production environment.
I am stuck. What is the next step?
I am also stuck with this kind of problem, but this:
proxy_pass http://0.0.0.0:8001;
looks really weird to me. Does it even work that way? Maybe:
proxy_pass http://127.0.0.1:8001;
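Applied to the location block from the question, that suggestion would look something like this (a sketch; only the proxy_pass address changes, assuming Daphne is reachable on the loopback interface, which it should be when bound to 0.0.0.0):
location /chat/stream/ {
    proxy_pass http://127.0.0.1:8001;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $http_host;
}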
We are currently experiencing an issue when we run our dotnet core server setup on Production. We publish it in Bamboo and run it from an AWS linux server, and it sits behind an nginx reverse proxy.
Essentially, every few days our dotnet core server process will go mute. It silently accepts and hangs on web requests, and even silently ignores our (more polite) attempts to stop it. We have verified that it is actually the netcore process that hangs by sending curl requests directly to port 5000 from within the server. We've replicated our production deployment to the best of our ability to our test environment and have not been able to reproduce this failure mode.
We've monitored the server with NewRelic and have inspected it at times when it's gone into failure mode. We've not been able to correlate this behaviour with any significant level of traffic, RAM usage, CPU usage, or open file descriptor usage. Indeed, these measurements all seem to stay at very reasonable levels.
My team and I are a bit stuck as to what might be causing our hung server, or even what we can do next to diagnose it. What might be causing our server process to hang? What further steps can we take to diagnose the issue?
Extra Information
Our nginx conf template:
upstream wfe {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}
server {
    listen 80 default_server;
    location / {
        proxy_set_header Host $http_host;
        proxy_pass http://wfe;
        proxy_read_timeout 20s;
        # Attempting a fix suggested by:
        # https://medium.com/@mshanak/soved-dotnet-core-too-many-open-files-in-system-when-using-postgress-with-entity-framework-c6e30eeff6d1
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection keep-alive;
        proxy_cache_bypass $http_upgrade;
        fastcgi_buffers 16 16k;
        fastcgi_buffer_size 32k;
    }
}
Our Program.cs:
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Net;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Logging;
using Serilog;

namespace MyApplication.Presentation
{
    [ExcludeFromCodeCoverage]
    public class Program
    {
        public static void Main(string[] args)
        {
            IWebHost host = WebHost.CreateDefaultBuilder(args)
#if DEBUG
                .UseKestrel(options => options.Listen(IPAddress.Any, 5000))
#endif
                .UseStartup<Startup>()
                .UseSerilog()
                .Build();
            host.Run();
        }
    }
}
During our CD build process, we publish our application for deployment with:
dotnet publish --self-contained -c Release -r linux-x64
We then deploy the folder bin/Release/netcoreapp2.0/linux-x64 to our server, and run publish/<our-executable-name> from within.
EDIT: dotnet --version outputs 2.1.4, both on our CI platform and on the production server.
When the outage starts, nginx logs show that server responses to requests change from 200 to 502, with a single 504 being emitted at the time of the outage.
At the same time, logs from our server process just stop. There are warnings in them, but they're all explicit warnings that we've put into our application code ourselves; none of them indicate that any exceptions have been thrown.
After a few days of investigation I've found the reason for this issue. It is caused by glibc >= 2.27, which can lead to a GC hang under certain conditions, so there is almost nothing you can do about it directly. However, you have a few options:
Use Alpine Linux. It doesn't rely on glibc.
Use an older distro like Debian 9, Ubuntu 16.04, or any other with glibc < 2.27.
Try to patch glibc yourself, at your own risk: https://sourceware.org/bugzilla/show_bug.cgi?id=25847
Or wait for the glibc patch to be reviewed by the community and included in your favorite distro.
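(To check which glibc version a particular host ships, ldd --version usually prints it on glibc-based distros.)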
More information can be found here: https://github.com/dotnet/runtime/issues/47700
I am trying to deploy an Action Cable-enabled application to a VPS using Capistrano. I am using Puma, Nginx, and Redis (for Cable). After a couple of hurdles, I was able to get it working in a local development environment. I'm using the default in-process /cable URL. But when I try deploying it to the VPS, I keep getting these two errors in the JS log:
Establishing connection to host ws://{server-ip}/cable failed.
Connection to host ws://{server-ip}/cable was interrupted while loading the page.
And in my app-specific nginx.error.log I'm getting these messages:
2016/03/10 16:40:34 [info] 14473#0: *22 client 90.27.197.34 closed keepalive connection
Turning on ActionCable.startDebugging() in the JS prompt shows nothing of interest, just ConnectionMonitor trying to reopen the connection indefinitely. I'm also getting a load of 301 Moved Permanently responses for /cable in my network monitor.
Things I've tried:
Using the async adapter instead of Redis. (This is what is used in the development env.)
Adding something like this to my /etc/nginx/sites-enabled/{app-name}:
location /cable/ {
    proxy_pass http://puma;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
}
Setting Rails.application.config.action_cable.allowed_request_origins to the proper host (tried "http://{server-ip}" and "ws://{server-ip}")
Turning on Rails.application.config.action_cable.disable_request_forgery_protection
No luck. What is causing the issue?
$ rails -v
Rails 5.0.0.beta3
Please inform me of any additional details that may be useful.
Finally, I got it working! I've been trying various things for about a week...
The 301-redirects were caused by nginx actually trying to redirect the browser to /cable/ instead of /cable. This is because I had specified /cable/ instead of /cable in the location stanza! I got the idea from this answer.
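For reference, the working stanza then looks like the attempt above with the trailing slash dropped (the puma upstream name is carried over from the earlier snippet):
location /cable {
    proxy_pass http://puma;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
}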