In my application, I'm hosting a fairly CPU-intensive engine on a web server, which is connected to clients via SignalR. The client signals the server to do some work (via an AJAX request), and every 200ms the server sends down a queue of "animation events" describing the work being done.
This is the code used to set up the connection on the client:
$.connection.hub.start({ transport: ['webSockets', 'serverSentEvents', 'longPolling'] })
And here's the related code in the backend:
private const int PUSH_INTERVAL = 200;
private ManualResetEvent _mrs;

private void SetupTimer(bool running)
{
    if (running)
    {
        UpdateTimer = new Timer(PushEventQueue, null, 0, PUSH_INTERVAL);
    }
    else
    {
        /* Lock here to prevent a race condition where the final call to PushEventQueue()
         * could be followed by the timer calling PushEventQueue() one last time, and
         * thus the End event would not be the final event to arrive client-side,
         * which causes a crash */
        _mrs = new ManualResetEvent(false);
        UpdateTimer.Dispose(_mrs);
        _mrs.WaitOne();

        Observer.End();
        PushEventQueue(null);
    }
}

private void PushEventQueue(object state)
{
    SentMessages++;
    SignalRConnectionManager<SimulationHub>.PushEventQueueToClient(
        ConnectionId,
        new AnimationEventSeries
        {
            AnimationPackets = SimulationObserver.EventQueue.FlushQueue(),
            UpdateTime = DateTime.UtcNow
        });
}

public static void PushEventQueueToClient(string connectionId, AnimationEventSeries series)
{
    HubContext.Clients.Client(connectionId).queue(series);
}
And for completeness' sake, the related Javascript method:
self.hub.client.queue = function (data) {
    self.eventQueue.addEvents(data);
};
When testing this functionality on localhost, it works absolutely smoothly, with no delay (as you would expect), using serverSentEvents as a transport method.
However, when used in production, this more often than not takes a very long time to complete. Using SignalR's logging and a bit of my own instrumentation, it can be seen that the first series of events reaches the client within a couple of seconds, which is totally acceptable. However, after that SignalR often gives the following error:
Keep alive has been missed, connection may be dead/slow.
Followed soon after by:
Keep alive timed out. Notifying transport that connection has been lost.
This will happen a few times, and then eventually, up to a minute later, the events will arrive, with my own instrumentation showing that they were sent from the server approximately 200ms apart, as expected. It can also be seen that in production they were sent with the primary transport method, web sockets.
Is anyone aware of any issues that sending multiple SignalR requests on a timer might cause? Like I say, this primarily seems to happen with web sockets. I've been told that using web sockets is best practice, so I'm keen to keep using them, but if there isn't a workaround to these kinds of issues, then I'm afraid I'll have to remove them permanently.
Edit
I've now removed the option to use web sockets on the live site, and I'm running into the same issues with server sent events - several failed attempts to reconnect after the first queue update arrives.
Summing up our discussion, I don't think there are specific issues with WebSockets/SignalR on Azure.
I have sample code here: https://github.com/jonegerton/SignalR.StockTicker which can be used for testing, with some minor tweaks (I'll probably develop it as a test platform at some point).
It's based on the sample project from MS, which can be found here: https://github.com/SignalR/SignalR-StockTicker.
I've put an example on Azure here (http://stockticker.azurewebsites.net) for testing purposes. It has the default transport configuration enabled (i.e. webSockets >> serverSentEvents >> longPolling).
Related
I have a Quarkus application where I use the event bus.
the code in question looks like this:
@ConsumeEvent(value = "execution-request", blocking = true)
@Transactional
@TransactionConfiguration(timeout = 3600)
public void consume(final Message<ExecutionRequest> msg) {
    try {
        execute(...);
    } catch (final Exception e) {
        // some logging
    }
}

private void execute(...)
        throws InterruptedException {
    // it actually runs a long-running task, but for
    // this example this has the same effect
    Thread.sleep(65000);
}
Why do I still get a
WARN [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-worker-thread-0,5,main] has been blocked for 63066 ms, time limit is 60000 ms: io.vertx.core.VertxException: Thread blocked
Am I doing something wrong? Is the blocking parameter on the @ConsumeEvent annotation not enough to have it handled on a separate worker thread?
Your annotation is working as designed; the method is running in a worker thread. You can tell by both the name of the thread "vert.x-worker-thread-0", and by the 60 second timeout before the warnings were logged. The eventloop thread only has a 3 second timeout, I believe.
The default Vert.x worker thread pool is not designed for "very" long running blocking code, as stated in their docs:
Warning:
Blocking code should block for a reasonable amount of time (i.e no more than a few seconds). Long blocking operations or polling operations (i.e a thread that spin in a loop polling events in a blocking fashion) are precluded. When the blocking operation lasts more than the 10 seconds, a message will be printed on the console by the blocked thread checker. Long blocking operations should use a dedicated thread managed by the application, which can interact with verticles using the event-bus or runOnContext
That message mentions blocking for more than 10 seconds triggers a warning, but I think that's a typo; the default is actually 60.
To avoid the warning, you'll need to create a dedicated WorkerExecutor (via vertx.createSharedWorkerExecutor) configured with a very high maxExecuteTime. However, it does not appear you can tell the @ConsumeEvent annotation to use it instead of the default worker pool, so you'd need to manually create an event bus consumer as well, or use a regular @ConsumeEvent annotation but call workerExecutor.executeBlocking inside of it.
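Here's a minimal sketch of the second option, assuming the Quarkus-managed io.vertx.core.Vertx instance can be injected; the pool name, size, and two-hour limit are made-up values, and whether you import from javax or jakarta depends on your Quarkus version:

import java.util.concurrent.TimeUnit;

import javax.annotation.PostConstruct;
import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;

import io.quarkus.vertx.ConsumeEvent;
import io.vertx.core.Vertx;
import io.vertx.core.WorkerExecutor;
import io.vertx.core.eventbus.Message;

@ApplicationScoped
public class ExecutionRequestConsumer {

    @Inject
    Vertx vertx;

    private WorkerExecutor longRunningPool;

    @PostConstruct
    void init() {
        // Dedicated pool whose blocked-thread limit is far above the default 60s,
        // so long tasks no longer trip the BlockedThreadChecker warning.
        longRunningPool = vertx.createSharedWorkerExecutor(
                "execution-request-pool", 2, 2, TimeUnit.HOURS);
    }

    // Note: no blocking = true here - the handler returns immediately after
    // handing the work off to the dedicated executor.
    @ConsumeEvent("execution-request")
    public void consume(final Message<ExecutionRequest> msg) {
        longRunningPool.<Void>executeBlocking(promise -> {
            try {
                execute(msg.body());
                promise.complete();
            } catch (final Exception e) {
                promise.fail(e);
            }
        }, false, result -> {
            if (result.failed()) {
                // some logging
            }
        });
    }

    private void execute(final ExecutionRequest request) throws InterruptedException {
        Thread.sleep(65000); // stand-in for the real long-running task
    }
}

With this in place the consumer itself returns quickly, and the blocked-thread checker only watches the dedicated pool, whose limit you control.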
I have a server-side streaming gRPC service that may have messages coming in very rapidly. A nice-to-have client feature would be to know whether more updates are already queued by the time this onNext execution is ready to display in the UI, as I would simply display the next one instead.
StreamObserver<Info> streamObserver = new StreamObserver<Info>()
{
    @Override
    public void onNext(Info info)
    {
        doStuffForALittleWhile();
        if (!someHasNextFunction())
            render();
    }
};
Is there some hasNext function or detection method I'm unaware of?
There's no API to determine if additional messages have been received, but not yet delivered to the application.
The client-side stub API (e.g., StreamObserver) is implemented using the more advanced ClientCall/ClientCall.Listener API. It does not provide any received-but-not-delivered hint.
Internally, gRPC processes messages lazily. gRPC waits until the application is ready for more messages (typically by returning from StreamObserver.onNext()) to try to decode another message. If it decodes another message then it will immediately begin delivering that message.
One way would be to have a small buffer of messages from onNext. That would let you show the current message, and then check whether another has arrived in the meantime.
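As a rough sketch of that idea (not part of the stub API; BufferingObserver, scheduleRender, and display are hypothetical names, and Info is the message type from the question), the observer can enqueue each message and let the render side drain the queue, keeping only the newest entry:

import java.util.concurrent.ConcurrentLinkedQueue;

import io.grpc.stub.StreamObserver;

public class BufferingObserver implements StreamObserver<Info>
{
    private final ConcurrentLinkedQueue<Info> pending = new ConcurrentLinkedQueue<>();

    @Override
    public void onNext(Info info)
    {
        pending.add(info);   // cheap, so gRPC can immediately decode the next message
        scheduleRender();    // hypothetical: post render() to the UI thread
    }

    @Override
    public void onError(Throwable t) { /* surface the failure */ }

    @Override
    public void onCompleted() { /* final render, if needed */ }

    // Runs on the UI thread.
    void render()
    {
        Info latest = null;
        Info next;
        while ((next = pending.poll()) != null)
        {
            latest = next;   // skip stale updates, keep only the most recent one
        }
        if (latest != null)
        {
            display(latest); // hypothetical UI call
        }
    }

    private void scheduleRender() { /* implementation-specific */ }

    private void display(Info info) { /* implementation-specific */ }
}

Because onNext returns almost immediately, gRPC keeps decoding and delivering messages, so the queue naturally fills up whenever updates outpace rendering.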
I have an HTTP web server where I'm trying to detect long-running requests and abort them. The following code successfully returns to the client upon timeout, but the async zone still continues to run to completion. How can I actually kill the request handler?
var zone = runZoned(() {
    var timer = new Timer(new Duration(seconds: Config.longRequestTimeoutSeconds), () {
        if (!completer.isCompleted) { // -- not already completed
            log.severe('request timed out');

            // TODO: This successfully responds to the client early, but it does nothing to abort the zone/handler that is already running.
            // Even though the client will never see the result (and won't have to wait for it), the zone/handler will continue to run to completion as normal.
            // TODO: Find a way to kill/abort/cancel the zone
            completer.complete(new shelf.Response(HttpStatus.SERVICE_UNAVAILABLE, body: 'The server timed out while processing the request'));
        }
    });

    return innerHandler(request) // -- handle request as normal (this may consist of several async futures within)
        .then((shelf.Response response) {
            timer.cancel(); // -- prevent the timeout intercept
            if (!completer.isCompleted) { // -- not already completed (not timed out)
                completer.complete(response);
            }
        })
        .catchError(completer.completeError);
});
Nothing can kill running code except itself. If you want code to be interruptible, you need some way to tell it, and the code itself needs to terminate (likely by throwing an error). In this case, the innerHandler needs to be able to interrupt itself when requested. If it's not your code, that might not be possible.
You can write a zone that stops execution of asynchronous events when a flag is set (by modifying Zone.run etc.), but you must be very careful about that - it might never get to an asynchronous finally block and release resources if you start throwing away asynchronous events. So, that's not recommended as a general solution, only for very careful people willing to do manual resource management.
We have a Java class that listens to a database (Oracle) queue table and processes records placed in that queue. It worked normally in UAT and development environments. Upon deployment to production, there are times when it cannot read a record from the queue: when a record is inserted, it does not detect it and the record remains in the queue. This seldom happens, but it happens. To give a statistic, out of 30 records queued in a day, about 8 don't make it. We need to restart the whole app for it to be able to read the records.
Here is a code snippet of my class:
public class SomeListener implements MessageListener {

    public void onMessage(Message msg) {
        InputStream input = null;
        try {
            TextMessage txtMsg = (TextMessage) msg;
            String text = txtMsg.getText();
            input = new ByteArrayInputStream(text.getBytes());
        } catch (Exception e1) {
            // TODO Auto-generated catch block
            logger.error("Parsing from the queue.... failed", e1);
            e1.printStackTrace();
        }
        // process text message
    }
}
The weird thing is we can't find any traces of exceptions in the logs.
Can anyone help? By the way, we set the receiveTimeout to 10 seconds.
We would need to restart the whole app for it to be able to read the records.
The most common reason for this is the listener thread is "stuck" in user code (//process text message). You can take a thread dump with jstack or jvisualvm or similar to see what the thread is doing.
Another possibility (with low volume apps like this) is the network (most likely a router someplace in the network) silently closes an idle socket because it has not been used for some time. If the container (actually the broker's JMS client library) doesn't know the socket is dead, it will never receive any more messages.
The solution to the first is to fix the code; the solution to the second is to enable some kind of heartbeat or keepalives on the connection so that the network/router does not close the socket when it has no "real" traffic on it.
You would need to consult your broker's documentation about configuring heartbeats/keepalives.
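For the first cause, a thread dump will usually point straight at the offending code. If you also want a stuck handler to surface in the logs instead of silently blocking the container's listener thread, one possible sketch (not a fix for the root cause; PROCESS_TIMEOUT_SECONDS and processText are made-up names, and it assumes the javax.jms API plus SLF4J for the logger the snippet already references) is to run the processing on a separate executor and bound how long onMessage() waits for it:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SomeListener implements MessageListener {

    private static final Logger logger = LoggerFactory.getLogger(SomeListener.class);
    private static final long PROCESS_TIMEOUT_SECONDS = 30;

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    @Override
    public void onMessage(Message msg) {
        Future<?> result = null;
        try {
            TextMessage txtMsg = (TextMessage) msg;
            String text = txtMsg.getText();
            InputStream input = new ByteArrayInputStream(text.getBytes());

            result = worker.submit(() -> processText(input));
            // Bound the wait so a hung handler shows up in the logs instead of
            // quietly blocking this listener thread forever.
            result.get(PROCESS_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        } catch (TimeoutException te) {
            result.cancel(true); // best effort: interrupt the stuck task
            logger.error("Processing exceeded " + PROCESS_TIMEOUT_SECONDS + "s", te);
        } catch (Exception e1) {
            logger.error("Parsing from the queue.... failed", e1);
        }
    }

    private void processText(InputStream input) {
        // process text message
    }
}

Whether the interrupt actually stops the work depends on the processing code responding to interruption, which is exactly the "fix the code" part of the first solution.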
I noticed that when I set up an onDisconnect(), hit my application on a different computer, and turn the Wi-Fi off, my db is not updated; however, it is updated when I turn the Wi-Fi back on. This worries me because I am building an application with expected mobile users, and I want to gracefully handle temporary connection drops.
On the other hand, /.info/connected knows about the disconnection and reconnection immediately.
Can anyone explain why this is happening, and whether there is a way to prevent the onDisconnect write from firing once the connection has been re-established?
Updated code:
var connectedRef, userRef;

connectedRef = new Firebase('https://{fb}/.info/connected');
userRef = new Firebase('https://{fb}/users/myUser');

connectedRef.on('value', function (snap) {
    if (snap.val()) {
        userRef.update({ online: true });
        userRef.onDisconnect().update({ online: false }, function () {
            console.log('Turn the Wi-Fi off after seeing this log.');
        });
    }
});
Result: The db does not set online to false when I turn the Wi-Fi off, unless I wait about 1 minute. The db does set online to false when I turn the Wi-Fi back on.
Turning off your Wi-Fi does not close the sockets cleanly. Thus, the server has to wait for the socket to time out before it can fire onDisconnect. Since this is an entirely server-side process, the only possible outcomes are:
1) The user isn't allowed to perform the onDisconnect op (indicated in the callback immediately upon establishing the onDisconnect).
2) The event will fire when the socket times out or disconnects (the length of time is completely up to the browser/server negotiation; 1 minute is not unreasonable).
3) Some data changes in Firebase between the time of establishing onDisconnect and the event firing that makes it invalid (the security rules won't allow it because the op is no longer valid).
To see your onDisconnect() fire a bit faster, try using goOffline(), which I believe will properly close the socket connections.