I have started experiencing a performance issue, particularly after upgrading to SignalR 2.2.0.
On more complicated pages, which use several different hubs, it can take up to 30 seconds to initiate a connection:
[16:20:35 GMT+0100 (GMT Daylight Time)] SignalR: serverSentEvents transport connected. Initiating start request.
[16:21:05 GMT+0100 (GMT Daylight Time)] SignalR: The start request succeeded. Transitioning to the connected state.
To demonstrate the issue, I have created a test page that uses only one hub. I call a server method on the hub and retrieve a single int value, without doing any database calls or complicated calculations.
define(["knockout", "jquery", "signalr"],
function (ko, $) {
function SignalRTestViewModel() {
var self = this;
var connection = $.connection.dashboardHub;
self.init = function () {
connection.server.signalRTest();
};
self.test = ko.observable();
connection.client.populateSignalRTest = function (test) {
self.test(test);
};
}
return SignalRTestViewModel;
})
It still takes 2-3 seconds to initiate the connection and another 2 seconds to retrieve just that one int value.
The issue is the same for different transport methods (serverSentEvents, longPolling) and exists in all browsers.
I'm not sure how to narrow down the issue or what could be causing these delays. Any help would be appreciated.
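One way to separate connection time from invocation time is to instrument both phases with the SignalR JS client; a minimal sketch, assuming the same hub as the test page above:

var hub = $.connection.dashboardHub; // same hub as the test page
$.connection.hub.logging = true;     // verbose client-side timing in the console

var t0 = Date.now();
$.connection.hub.start().done(function () {
    console.log('hub start took ' + (Date.now() - t0) + ' ms');
    var t1 = Date.now();
    // hub method invocations return a promise, so the round trip can be timed
    hub.server.signalRTest().done(function () {
        console.log('server call round-trip took ' + (Date.now() - t1) + ' ms');
    });
});

If the start request is slow but the round trip is fast, the delay is in connection negotiation/start rather than in the hub method itself.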
After a lot of debugging I found out that one of the dependencies, which was being resolved in quite a few of my services, was verifying the Lucene search index every time it was resolved (which was completely unnecessary).
I commented it out and everything now seems OK.
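For anyone hitting something similar, the general shape of the fix, sketched in JavaScript (the real code was a server-side service; verifyLuceneIndex and createSearchService are hypothetical names): run the expensive verification once and cache the result, instead of repeating it on every resolution.

var indexVerified = false; // cache the one-time check

function getSearchService() {
    if (!indexVerified) {
        verifyLuceneIndex(); // expensive: run once, not on every resolution
        indexVerified = true;
    }
    return createSearchService();
}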
I have an HTTP web server and I'm trying to detect long-running requests and abort them. The following code successfully returns to the client upon timeout, but the async zone still continues to run to completion. How can I actually kill the request handler?
var zone = runZoned(() {
  var timer = new Timer(new Duration(seconds: Config.longRequestTimeoutSeconds), () {
    if (!completer.isCompleted) { // -- not already completed
      log.severe('request timed out');
      // TODO: This successfully responds to the client early, but it does nothing
      // to abort the zone/handler that is already running. Even though the client
      // will never see the result (and won't have to wait for it), the zone/handler
      // will continue to run to completion as normal.
      // TODO: Find a way to kill/abort/cancel the zone
      completer.complete(new shelf.Response(HttpStatus.SERVICE_UNAVAILABLE,
          body: 'The server timed out while processing the request'));
    }
  });
  return innerHandler(request) // -- handle request as normal (this may consist of several async futures within)
      .then((shelf.Response response) {
        timer.cancel(); // -- prevent the timeout intercept
        if (!completer.isCompleted) { // -- not already completed (not timed out)
          completer.complete(response);
        }
      })
      .catchError(completer.completeError);
});
Nothing can kill running code except itself. If you want code to be interruptible, you need some way to tell it, and the code itself needs to terminate (likely by throwing an error). In this case, the innerHandler needs to be able to interrupt itself when requested. If it's not your code, that might not be possible.
You can write a zone that stops execution of asynchronous events when a flag is set (by modifying Zone.run etc.), but you must be very careful about that - it might never get to an asynchronous finally block and release resources if you start throwing away asynchronous events. So, that's not recommended as a general solution, only for very careful people willing to do manual resource management.
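To make the cooperative pattern concrete, here is a sketch in JavaScript (the Dart zone mechanics aside, the idea is the same; every name here is hypothetical): the handler polls a shared flag between its async steps and terminates itself by throwing.

function makeCancellable(innerHandler) {
    var cancelled = false;

    function checkCancelled() {
        if (cancelled) {
            throw new Error('request aborted'); // rejects the promise chain
        }
    }

    return {
        cancel: function () { cancelled = true; },
        handle: function (request) {
            // innerHandler must call checkCancelled() between its async steps
            // (e.g. after each .then) so a timed-out request stops early.
            return innerHandler(request, checkCancelled);
        }
    };
}

A handler that never calls checkCancelled() still runs to completion, which is exactly the cooperative part: the code being cancelled has to participate.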
I am working with:
let callTheAPI = async {
    printfn "\t\t\tMAKING REQUEST at %s..." (System.DateTime.Now.ToString("yyyy-MM-ddTHH:mm:ss"))
    let! response = Http.AsyncRequestStream(url, query, headers, httpMethod, requestBody)
    printfn "\t\t\t\tREQUEST MADE."
}
And
let cts = new System.Threading.CancellationTokenSource()
let timeout = 1000 * 60 * 4 // 4 minutes (no grace period)
cts.CancelAfter(timeout)
Async.RunSynchronously(callTheAPI, timeout, cts.Token)

use respStrm = response.ResponseStream
respStrm.Flush()
writeLinesTo output (responseLines respStrm)
to call a web API (REST). The let! response = Http.AsyncRequestStream(url,query,headers,httpMethod,requestBody) line just hangs on certain queries, particularly ones that take a long time (over 4 minutes). That is why I made it Async and added a 4-minute timeout. (I collect the calls that time out and retry them with smaller time-range parameters.)
I started with Http.RequestStream from FSharp.Data, but I couldn't add a timeout to it, so the script would just hang.
I have looked at the API's IIS server and at the application pool's Worker Process active requests in IIS Manager, and I can see the requests come in and go again. They then 'vanish' and the F# script hangs. I can't find an error message anywhere, on either the script side or the server side.
I included the Flush() and removed the timeout (removing the Async in the process), and it still hung.
Additional:
Successful calls are made, and failed calls can be followed by successful calls. However, it seems to get to a point where all the calls time out, and they do so without even reaching the server any more (Worker Process Active Requests doesn't show the query).
Update:
I made the .fsx script output the queries and ran them through IRM with no issues (I set a timeout and it never locks up). I have a suspicion that there is an issue with FSharp.Data.Http.
Async.RunSynchronously blocks. Read the remarks section in the docs: RunSynchronously. Instead, use Async.AwaitTask.
In my application, I'm hosting a fairly CPU-intensive engine on a web server, which is connected to clients via SignalR. From the client, the server will be signalled to do some work (via an AJAX request), and every 200ms will send down a queue of "animation events" which describe the work being done.
This is the code used to set up the connection on the client:
$.connection.hub.start({ transport: ['webSockets', 'serverSentEvents', 'longPolling'] })
And here's the related code in the backend:
private const int PUSH_INTERVAL = 200;
private ManualResetEvent _mrs;

private void SetupTimer(bool running)
{
    if (running)
    {
        UpdateTimer = new Timer(PushEventQueue, null, 0, PUSH_INTERVAL);
    }
    else
    {
        /* Lock here to prevent race condition where the final call to PushEventQueue()
         * could be followed by the timer calling PushEventQueue() one last time and
         * thus the End event would not be the final event to arrive clientside,
         * which causes a crash */
        _mrs = new ManualResetEvent(false);
        UpdateTimer.Dispose(_mrs);
        _mrs.WaitOne();
        Observer.End();
        PushEventQueue(null);
    }
}

private void PushEventQueue(object state)
{
    SentMessages++;
    SignalRConnectionManager<SimulationHub>.PushEventQueueToClient(
        ConnectionId,
        new AnimationEventSeries
        {
            AnimationPackets = SimulationObserver.EventQueue.FlushQueue(),
            UpdateTime = DateTime.UtcNow
        });
}

public static void PushEventQueueToClient(string connectionId, AnimationEventSeries series)
{
    HubContext.Clients.Client(connectionId).queue(series);
}
And for completeness' sake, the related JavaScript method:
self.hub.client.queue = function (data) {
    self.eventQueue.addEvents(data);
};
When testing this functionality on localhost, it works absolutely smoothly, with no delay (as you would expect), using serverSentEvents as a transport method.
However, when used in production, this more often than not takes a very long time to complete. Using SignalR's logging and a bit of my own instrumentation, it can be seen that the first series of events reaches the client within a couple of seconds, which is totally acceptable. However, after that SignalR often gives the following error:
Keep alive has been missed, connection may be dead/slow.
Followed soon after by:
Keep alive timed out. Notifying transport that connection has been lost.
This will happen a few times, and then eventually, up to a minute later, the events will arrive, with my own instrumentation showing that they were sent from the server approximately 200ms apart, as expected. It can also be seen that in production they were sent over the primary transport method, web sockets.
Is anyone aware of any issues that sending multiple SignalR requests on a timer might cause? Like I say, this primarily seems to happen with web sockets. I've been told that using web sockets is best practice, so I'm keen to keep using them, but if there isn't a workaround to these kinds of issues, then I'm afraid I'll have to remove them permanently.
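To put timestamps on those drops, the SignalR 2.x JavaScript client exposes connection-lifecycle callbacks; a minimal instrumentation sketch (wired up before calling start()):

$.connection.hub.connectionSlow(function () {
    console.log(new Date().toISOString() + ' connectionSlow: keep-alive missed');
});
$.connection.hub.reconnecting(function () {
    console.log(new Date().toISOString() + ' reconnecting');
});
$.connection.hub.reconnected(function () {
    console.log(new Date().toISOString() + ' reconnected');
});
$.connection.hub.disconnected(function () {
    console.log(new Date().toISOString() + ' disconnected');
});

Correlating these timestamps with the server-side send times makes it easier to tell whether messages are delayed in transit or the connection itself is flapping.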
Edit
I've now removed the option to use web sockets on the live site, and I'm running into the same issues with server-sent events - several failed attempts to reconnect after the first queue update arrives.
Summing up our discussion: I don't think there are specific issues with WebSockets/SignalR on Azure.
I have sample code here: https://github.com/jonegerton/SignalR.StockTicker which can be used for testing, with some minor tweaks (I'll probably develop it as a test platform at some point).
It's based on the sample project from MS, which can be found here: https://github.com/SignalR/SignalR-StockTicker.
I've put an example on Azure here (http://stockticker.azurewebsites.net) for testing purposes. It has the default transport configuration enabled (i.e. webSockets >> serverSentEvents >> longPolling).
I noticed that when I set up an onDisconnect(), open my application on a different computer, and turn the Wi-Fi off, my db is not updated; however, it is updated when I turn the Wi-Fi back on. This worries me because I am building an application with expected mobile users, and I want to gracefully handle temporary connection drops.
On the other hand, /.info/connected knows about the disconnection and reconnection immediately.
Can anyone explain why this is happening, and whether there is a way to have the disconnect registered before the connection is re-established?
Updated code:
var connectedRef, userRef;

connectedRef = new Firebase('https://{fb}/.info/connected');
userRef = new Firebase('https://{fb}/users/myUser');

connectedRef.on('value', function (snap) {
    if (snap.val()) {
        userRef.update({ online: true });
        userRef.onDisconnect().update({ online: false }, function () {
            console.log('Turn the Wi-Fi off after seeing this log.');
        });
    }
});
Result: The db does not set online to false when I turn the Wi-Fi off, unless I wait about 1 minute. The db does set online to false when I turn the Wi-Fi back on.
Turning off your Wi-Fi does not close the sockets cleanly. Thus, the server has to wait for the socket to time out before it can fire onDisconnect. Since this is an entirely server-side process, the only possible outcomes are:
1) The user isn't allowed to perform the onDisconnect op (indicated in the callback immediately upon establishing the onDisconnect).
2) The event will fire when the socket times out or disconnects (the length of time is completely up to the browser/server negotiation; 1 minute is not unreasonable).
3) Some data changes in Firebase between the time of establishing onDisconnect and the event firing that makes it invalid (the security rules won't allow it because the op is no longer valid).
To see your onDisconnect() fire a bit faster, try using goOffline(), which I believe will properly close the socket connections.
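For disconnects you control (e.g. an explicit sign-out), a minimal sketch using the same legacy SDK as the question (signOut is a hypothetical name):

function signOut() {
    // Set the presence flag yourself, then close the socket cleanly so the
    // server registers the disconnect immediately instead of waiting for a timeout.
    userRef.update({ online: false }, function () {
        Firebase.goOffline();
    });
}

Calling Firebase.goOnline() later re-opens the connection, and the /.info/connected handler above will re-establish the onDisconnect hook.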
When the Meteor server connection is lost, how can I verify that Meteor.call() failed? Meteor.call() doesn't return any value. To reproduce: press Ctrl+Z in the Meteor shell while your app is running, then do something in the app that triggers a Meteor.call, e.g. adding a new blog post:
Meteor.call('createPhrase', phrase, function (error) {
    console.log("This NEVER gets called if server is down.");
    if (error) {
        throwError(error.reason);
    }
});
I tried using Session vars, but reactivity screws it up: the code below triggers an error in my template handler (which gets flashed to the browser quickly) as soon as isMyError is set to true; then, when the Meteor.call succeeds, the error goes away because isMyError is set back to false. This looks really sloppy.
Session.set("isMyError", true);
Meteor.call('createPhrase', phrase, function(error) {
console.log("This NEVER gets called if server is down.");
Session.set("isMyError", false);
if (error) {
throwError(error.reason);
}
});
Template.index.isMeteorStatus = function () {
    var myClientStatus = Meteor.status();
    if ((myClientStatus.connected === false) || (Session.get("isMyError") === true)) {
        return false;
    } else {
        return true;
    }
};
Meteor's calls are generally entered into a queue and sent to the server in the order they are made. If there is no connection, they stay in the queue until the server is connected once more.
This is why nothing is returned: Meteor hopes it can reconnect, send the call, and eventually return a result when it does.
If you want to check whether the server is connected at the point of the call, it's best to check Meteor.status().connected (which is reactive) and only run Meteor.call if it is; otherwise, throw an error:
if (Meteor.status().connected) {
    Meteor.call('createPhrase', phrase, function (error) {
        if (error) throwError(error.reason);
    });
} else {
    throwError("Error - not connected");
}
You could also use navigator.onLine to check whether the network is connected.
The reason you would see a 60-second delay before Meteor.status().connected reflects the true connection state is that there isn't really a way for a browser to check whether it is connected or not.
Meteor sends a periodic heartbeat, an 'h' on the websocket/long-polling wire, to check that it is connected. Once it realizes it didn't get a heartbeat back from the other end, it marks the connection as disconnected.
However, it also marks the connection as disconnected if a Meteor.call or some other data is sent through and the socket isn't able to deliver it. So if you use a Meteor.call before checking Meteor.status().connected, Meteor will realize much sooner that it is disconnected. I'm not sure it would realize it immediately enough to use them one line after the next, but you could use a Meteor.setTimeout of a second or two before firing the call, as sketched below.
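A sketch of that approach (callWhenConnected is a hypothetical helper name; throwError is the helper from the question): check the reactive status first, and re-check once after a short delay before giving up, since the heartbeat may not have caught up yet.

function callWhenConnected(name, arg, callback) {
    if (Meteor.status().connected) {
        Meteor.call(name, arg, callback);
        return;
    }
    // Give the heartbeat a second or two to notice the dead socket, then re-check.
    Meteor.setTimeout(function () {
        if (Meteor.status().connected) {
            Meteor.call(name, arg, callback);
        } else {
            throwError("Error - not connected");
        }
    }, 2000);
}

Usage would mirror the original call: callWhenConnected('createPhrase', phrase, function (error) { ... });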
Attempt to succeed:
Meteor is designed very well to attempt to succeed. Instead of 'attempting to fail' with an error stating the network is not available, it's better to try and queue everything up until the connection is back.
The best thing to do would be to avoid telling the user the network is down, because usually they already know. The queued tasks ensure the user flow is unchanged as soon as the connection is back.
So it would be better to work with the queues that are built into the reconnection process rather than to avoid them.