SignalR disconnects and does not reconnect - signalr

Whenever my application gets reset, signalR disconnects but does not reconnect.
I have a long running server task which sends updates to clients when each task is completed.
// inside action executed on every completion of a task
var h = new ForceHub();
h.MessageSent(email);
The above code stops sending updates when the application gets reset (I can emulate this problem by touching web.config).
I'd like a way to reconnect to a client. Currently the user has to reload the page for it to get updates again.
Here is my hub definition
public class ForceHub : Hub
{
public void MessageSent(string text)
{
GetContext().Clients.All.sent(text);
}
public void UpdateStatus(string msg)
{
GetContext().Clients.All.status(msg);
}
IHubContext GetContext()
{
return GlobalHost.ConnectionManager.GetHubContext<ForceHub>();
}
public override Task OnConnected()
{
try {
IoC.Resolve<ILogger>().Info("SignalR Connected -----------");
}catch (Exception){}
return base.OnConnected();
}
public override Task OnDisconnected()
{
try {
IoC.Resolve<ILogger>().Info("SignalR Disconnected -----------");
}
catch (Exception) { }
return base.OnDisconnected();
}
public override Task OnReconnected()
{
try {
IoC.Resolve<ILogger>().Info("SignalR Re-Connected -----------");
}
catch (Exception) { }
return base.OnReconnected();
}
}
I can see Connected and Re-Connected events triggered after startup, however after touching web.config, I don't see any of these events triggered.
I tried catching this on the client, but this event is not triggered:
$.connection.hub.disconnected(function () {
console.error('signalR disconnected, retrying connection');
logError('Signal lost.');
setTimeout(function () { connection.start(); }, 1000);
});
update
I also hooked into State Changed event, which does get triggered, but the re-connection attempt below does not work.
$.connection.hub.stateChanged(function (state) {
console.debug('signalR state changed', state);
if (state.newState == 1) {
console.debug('restarting');
setTimeout(function () { $.connection.hub.start(); }, 1000);
}
});
This event gets triggered twice: newState is 2, and then 1.

I might have a clue... Touching the Web.config produces an appPool recycle, meaning that a new worker process is created for new requests while the existing process continues for a while, until its remaining requests end or the timeout is reached. Requests that do not end within the timeout period are terminated.
The SignalR client reconnects to the new process while the long-running task is still running in the old process, so when the long-running task calls
GlobalHost.ConnectionManager.GetHubContext<ForceHub>();
you actually get a reference to the "old" hub while the client is connected to the "new" hub.
That's why the test performed by Wasp worked: he was making a new request to publish on the SignalR hub, and that request was processed in the newly created worker process.
You could try to configure a SignalR backplane (https://www.asp.net/signalr/overview/performance/scaleout-in-signalr); it's really easy to configure using SQL Server (https://www.asp.net/signalr/overview/performance/scaleout-with-sql-server). The backplane should be able to connect the two worker processes, and hopefully you will get the notification on the client.
If this is the problem, notifications generated by new requests will work even without the backplane. Note that the real purpose of the backplane is to scale out SignalR, that is, to connect the servers of a web farm to each other.
Also keep in mind that running a long-running task inside IIS is hard to get right because, among other things, IIS does regular appPool recycles and enforces timeout limits on request execution. I recommend that you read the following post: http://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
“If you think you can just write a background task yourself, it's likely you'll get it wrong. I'm not impugning your skills, I'm just saying it's subtle. Plus, why should you have to?”
Hope this helps

Related

Configure Windows Service to restart on both graceful AND ungraceful shutdown

I am aware of the Recovery section of the Service control, and how we can set an app to restart after failure.
I have created a .NET 6 worker service to run as a windows service. The problem is that whenever there is an exception in the code, the app logs the error but then shuts down gracefully. This does not signal to windows that the service should be restarted since it returns an exit code of 0.
I've tried returning an exit code of -1 (by setting Environment.ExitCode and returning -1 from Main()) but it's ignored.
I've also tried setting the exit code of the underlying WindowsServiceLifetime and that also does not work.
Are there any ways to have the SCM restart the service no matter how it shut down?
Exceptions should not bring down the host. Exceptions do not bring down IIS and they should not bring down a Windows Service.
You should put try/catch where work begins – every endpoint and background service. In the catch you should log the error.
Here is an endpoint example:
[Route("Get")]
[HttpGet]
public async Task<IActionResult> GetAsync()
{
try
{
return Ok(await BusinessRules.GetSomethingAsync());
}
catch (Exception e)
{
_logger.LogError(e, e.Message);
throw;
}
}
Here is a background service example:
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        try
        {
            // Need a try/catch around Task.Delay because an exception will be thrown
            // if stoppingToken is activated, and we don't care about logging this exception.
            try
            {
                await Task.Delay(TimeSpan.FromMinutes(1), stoppingToken);
            }
            catch { }
            await BusinessRules.DoSomethingAsync(stoppingToken);
        }
        catch (Exception e)
        {
            _logger.LogError(e, e.Message);
            // In a loop, the log file can fill up quickly unless we slow it down after an error.
            try
            {
                await Task.Delay(TimeSpan.FromSeconds(10), stoppingToken);
            }
            catch { }
        }
    }
}

gRPC client failing with "CANCELLED: io.grpc.Context was cancelled without error"

I have a gRPC server written in C++ and a client written in Java.
Everything was working fine using a blocking stub. Then I decided that I want to change one of the calls to be asynchronous, so I created an additional stub in my client, this one is created with newStub(channel) as opposed to newBlockingStub(channel). I didn't make any changes on the server side. This is a simple unary RPC call.
So I changed
Empty response = blockingStub.callMethod(request);
to
asyncStub.callMethod(request, new StreamObserver<Empty>() {
    @Override
    public void onNext(Empty response) {
        logInfo("asyncStub.callMethod.onNext");
    }
    @Override
    public void onError(Throwable throwable) {
        logError("asyncStub.callMethod.onError " + throwable.getMessage());
    }
    @Override
    public void onCompleted() {
        logInfo("asyncStub.callMethod.onCompleted");
    }
});
Ever since then, onError is called when I use this RPC (Most of the time) and the error it gives is "CANCELLED: io.grpc.Context was cancelled without error". I read about forking Context objects when making an RPC call from within an RPC call, but that's not the case here. Also, the Context seems to be a server side object, I don't see how it relates to the client. Is this a server side error propagating back to the client? On the server side everything seems to complete successfully, so I'm at a loss as to why this is happening. Inserting a 1ms sleep after calling asyncStub.callMethod seems to make this issue go away, but defeats the purpose. Any and all help in understanding this would be greatly appreciated.
Some notes:
The processing time on the server side is around 1 microsecond
Until now, the round trip time for the blocking call was several hundred microseconds (This is the time I'm trying to cut down, as this is essentially a void function, so I don't need to wait for a response)
This method is called multiple times in a row, so before it used to wait until the previous one finished, now they just fire off one after the other.
Some snippets from the proto file:
service EventHandler {
rpc callMethod(Msg) returns (Empty) {}
}
message Msg {
uint64 fieldA = 1;
int32 fieldB = 2;
string fieldC = 3;
string fieldD = 4;
}
message Empty {
}
So it turns out that I was wrong. The context object is used by the client too.
The solution was to do the following:
Context newContext = Context.current().fork();
Context origContext = newContext.attach();
try {
// Call async RPC here
} finally {
newContext.detach(origContext);
}
Hopefully this can help someone else in the future.

How to run an async task daily in a Kestrel process?

How do I run an async task in a Kestrel process with a very long time interval (say daily or perhaps even longer)? The task needs to run in the memory space of the web server process to update some global variables that slowly go out of date.
Bad answers:
Trying to use an OS scheduler is a poor plan.
Calling await from a controller is not acceptable. The task is slow.
The delay is too long for Task.Delay() (about 16 hours or so and Task.Delay will throw).
HangFire, etc. make no sense here. It's an in-memory job that doesn't care about anything in the database. Also, we can't call the database without a user context (from a logged-in user hitting some controller) anyway.
System.Threading.Timer. It's reentrant.
Bonus:
The task is idempotent. Old runs are completely irrelevant.
It doesn't matter if a particular page render misses the change; the next one will get it soon enough.
As this is a Kestrel server we're not really worried about stopping the background task. It'll stop when the server process goes down anyway.
The task should run once immediately on startup. This should make coordination easier.
Some people are missing this. The method is async. If it wasn't async the problem wouldn't be difficult.
I am going to add an answer to this, because this is the only logical way to accomplish such a thing in ASP.NET Core: an IHostedService implementation.
This is a non-reentrant timer background service that implements IHostedService.
public sealed class MyTimedBackgroundService : IHostedService
{
    // Milliseconds; change this to 24*60*60*1000 to fire off every 24 hours
    private const int TimerInterval = 5000;
    private Timer _t;
    public async Task StartAsync(CancellationToken cancellationToken)
    {
        // Requirement: "fire" timer method immediately.
        await OnTimerFiredAsync();
        // Set up a one-shot timer (period = Timeout.Infinite) so it is non-reentrant; fires in 5 seconds.
        _t = new Timer(async _ => await OnTimerFiredAsync(),
            null, TimerInterval, Timeout.Infinite);
    }
    public Task StopAsync(CancellationToken cancellationToken)
    {
        _t?.Dispose();
        return Task.CompletedTask;
    }
    private async Task OnTimerFiredAsync()
    {
        try
        {
            // do your work here
            Debug.WriteLine($"{TimerInterval / 1000} second tick. Simulating heavy I/O bound work");
            await Task.Delay(2000);
        }
        finally
        {
            // Re-arm the timer only after the work has finished.
            _t?.Change(TimerInterval, Timeout.Infinite);
        }
    }
}
So, I know we discussed this in comments, but a System.Threading.Timer callback method is considered an event handler. It is perfectly acceptable to use async void in this case, since an exception escaping the method will be raised on a thread pool thread, just the same as if the method were synchronous. You probably should put a try/catch in there anyway to log any exceptions.
You brought up timers not being safe at some interval boundary. I looked high and low for that information and could not find it. I have used timers on 24 hour intervals, 2 day intervals, 2 week intervals... I have never had them fail. I have a lot of them running in ASP.NET Core in production servers for years, too. We would have seen it happen by now.
OK, so you still don't trust System.Threading.Timer...
Let's say that, no... There is just no fricken way you are going to use a timer. OK, that's fine... Let's go another route. Let's move from IHostedService to BackgroundService (which is an implementation of IHostedService) and simply count down.
This will alleviate any fears of the timer boundary, and you don't have to worry about async void event handlers. It is also non-reentrant for free.
public sealed class MyTimedBackgroundService : BackgroundService
{
    // Seconds; change this to 24*60*60 to fire off every 24 hours
    private const long TimerIntervalSeconds = 5;
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Requirement: "fire" timer method immediately.
        await OnTimerFiredAsync(stoppingToken);
        var countdown = TimerIntervalSeconds;
        while (!stoppingToken.IsCancellationRequested)
        {
            if (countdown-- <= 0)
            {
                try
                {
                    await OnTimerFiredAsync(stoppingToken);
                }
                catch (Exception ex)
                {
                    // TODO: log exception
                }
                finally
                {
                    // Restart the countdown only after the work has finished.
                    countdown = TimerIntervalSeconds;
                }
            }
            // One-second tick; throws when stoppingToken is cancelled, which ends the loop.
            await Task.Delay(1000, stoppingToken);
        }
    }
    private async Task OnTimerFiredAsync(CancellationToken stoppingToken)
    {
        // do your work here
        Debug.WriteLine($"{TimerIntervalSeconds} second tick. Simulating heavy I/O bound work");
        await Task.Delay(2000);
    }
}
A bonus side-effect is that you can use a long as your interval, allowing more than 25 days between firings, as opposed to a Timer with an int millisecond interval, which is capped at about 25 days.
You would register either of these like so:
services.AddHostedService<MyTimedBackgroundService>();

Reliably counting the number of client connections to a SignalR hub

I'm creating a web dashboard that will display the status of our test environments.
I use a hub to connect the browser to the server and have a background task that polls the status of the environment. I only want to perform this check if at least one client is connected.
My hub looks a little like this:
public class StatusHub : Hub
{
private static int connectionCount = 0;
public override Task OnConnected()
{
Interlocked.Increment(ref connectionCount);
return base.OnConnected();
}
public override Task OnReconnected()
{
Interlocked.Increment(ref connectionCount);
return base.OnReconnected();
}
public override Task OnDisconnected()
{
Interlocked.Decrement(ref connectionCount);
return base.OnDisconnected();
}
// other useful stuff
}
This mainly works but sometimes OnConnected is called but OnDisconnected is not.
One specific case is if I open chrome and type the address of the page but don't actually navigate to it. It seems Chrome is pre-fetching the page and connecting, but never disconnecting.
So two questions:
Is this a good approach to counting connections (I'm never going to be running in a web farm environment)?
Will these zombied connections from Chrome eventually timeout (I tried setting timeouts very low but still didn't get a disconnect)?
The events will always fire. If they don't, file a bug with repro steps on GitHub. To get a more accurate number, you can store a hashset of connection ids and get the count from that.
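The hashset idea can be sketched independently of SignalR. The following is a minimal illustration in Java, not SignalR code: `ConnectionTracker` and its method names are hypothetical, and it assumes the framework hands each callback a stable connection id, with a reconnect reporting the same id as the original connection. Because the ids live in a set, a connect followed by a reconnect for the same id is counted once, which a plain counter would count twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Tracks live connections by id. All names here are hypothetical, not SignalR API. */
final class ConnectionTracker {
    // Thread-safe set: adding an id that is already present is a no-op.
    private final Set<String> connectionIds = ConcurrentHashMap.newKeySet();

    /** Intended to be called from both the connected and the reconnected callbacks. */
    void onConnected(String connectionId) {
        connectionIds.add(connectionId);
    }

    /** Intended to be called from the disconnected callback. */
    void onDisconnected(String connectionId) {
        connectionIds.remove(connectionId);
    }

    /** Number of distinct connection ids currently tracked. */
    int count() {
        return connectionIds.size();
    }
}
```

A background poller could then check `count() > 0` before doing any work. A duplicate disconnect for the same id is also harmless, since removing an absent id does nothing.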

How to call properly HTTP client from HTTP server request handler in netty?

I am developing custom HTTP server with netty 3.3.1.
I need to implement something like this
1. HTTP Server receives request
2. HTTP Server parses it and invokes HTTP request as a client to other machine
3. HTTP Server waits for the response of request sent in (2)
4. HTTP Server sends response to request from (1) based on what had received in (3)
It means that client request (2) has to behave as synchronous.
What I wrote is based on HttpSnoopClient example but it does not work, because I receive
java.lang.IllegalStateException:
await*() in I/O thread causes a dead lock or sudden performance drop. Use addListener() instead or call await*() from a different thread.
I've refactored the code from the example mentioned above and now it looks more or less like this (starting from line 7f of HttpSnoopClient):
ChannelFuture future = bootstrap.connect(new InetSocketAddress(host, port));
future.addListener(new ChannelFutureListener() {
public void operationComplete(ChannelFuture future) {
if (!future.isSuccess()) {
System.err.println("Cannot connect");
future.getCause().printStackTrace();
bootstrap.releaseExternalResources();
return;
}
System.err.println("Connected");
Channel channel = future.getChannel();
// Send the HTTP request.
channel.write(request);
channel.close();
// Wait for the server to close the connection.
channel.getCloseFuture().addListener(new ChannelFutureListener() {
public void operationComplete(ChannelFuture future) {
System.err.println("Disconnected");
bootstrap.releaseExternalResources(); // DOES NOT WORK?
}
});
}
});
}
}
The run() command from the above example is invoked in the messageReceived function of my server handler.
So it became asynchronous and avoids the await* functions. The request is invoked properly. But, for a reason unknown to me, the line
bootstrap.releaseExternalResources(); // DOES NOT WORK?
does not work. It throws an exception saying that I cannot kill the thread I am currently using (which sounds reasonable, but still does not give me an answer how to do that in a different way).
I am also not sure whether this is the correct approach.
Maybe you can recommend a tutorial on such event programming techniques in netty? How does one deal, in general, with a few asynchronous requests that have to be invoked in a specified order and wait for each other?
Thank you,
If you really want to release the bootstrap on close you can do it like this:
channel.getCloseFuture().addListener(new ChannelFutureListener() {
public void operationComplete(ChannelFuture future) {
System.err.println("Disconnected");
new Thread(new Runnable() {
public void run() {
bootstrap.releaseExternalResources();
}
}).start();
}
});
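The principle behind this answer (a thread cannot wait for the shutdown of the pool it is itself running on, so the release is handed to another thread) can be shown with plain `java.util.concurrent`, without Netty. This is only an analogy under stated assumptions: `SelfReleasingPool` is a hypothetical stand-in for the bootstrap, and the five-second wait is an arbitrary choice.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a bootstrap that owns its own worker thread. */
final class SelfReleasingPool {
    private final ExecutorService workers = Executors.newSingleThreadExecutor();

    /**
     * Runs the task on the worker thread, then releases the pool from a
     * separate thread. The returned future completes with true once the
     * pool has fully terminated.
     */
    CompletableFuture<Boolean> runThenRelease(Runnable task) {
        CompletableFuture<Boolean> released = new CompletableFuture<>();
        workers.execute(() -> {
            try {
                task.run();
            } finally {
                // Still on the worker thread here: awaiting termination from
                // this thread cannot succeed while it is running, so the
                // shutdown is handed off to a fresh thread.
                new Thread(() -> {
                    workers.shutdown();
                    try {
                        released.complete(workers.awaitTermination(5, TimeUnit.SECONDS));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        released.complete(false);
                    }
                }, "pool-releaser").start();
            }
        });
        return released;
    }
}
```

Callers that need to know when the resources are gone can attach a callback to the returned future instead of blocking, which mirrors the addListener() style used in the question.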
