gRPC not dropping disconnected channel - grpc

Steps to reproduce
Start server
Send a client RPC to server
Restart server
Using the same client, send another RPC. The call will fail
Send another RPC, this call will success
Also I found that if the server is leave stopped for a long time before starting up again, the call in step 5 will return "channel is in state TRANSIENT_FAILURE" as well.
Example code: https://github.com/whs/grpc-repro
(Install from requirements.txt then run main.py)
Expected result
All call should success.
Tested with Python grpcio==1.19.0 server/client and with go-grpc server. I tried setting grpc.max_connection_age_grace_ms, grpc.max_connection_age_ms, grpc.max_connection_idle_ms, grpc.keepalive_time_ms, grpc.keepalive_permit_without_calls but they doesn't seems to help.

The question is duplicated with https://groups.google.com/forum/#!msg/grpc-io/199V_iF0NMw/NahHz_vMBwAJ.
The feature you want probably is "wait_for_ready". In case of TRANSIENT_FAILURE (server not available temporarily), it will automatically wait for the channel become READY again without failing. Read more about wait for ready.

Related

handle server shutdown while serving http request

Scenario : The server is in middle of processing a http request and the server shuts down. There are multiple points till where the code has executed. How are such cases typically handled ?. A typical example could be that some downstream http calls had to be made as a part of the incoming http request. How to find whether such calls were made or not made when the shutdown occurred. I assume that its not possible to persist every action in the code flow. Suggestions and views are welcome.
There are two kinds of shutdowns to consider here.
There are graceful shutdowns: when the execution environment politely asks your process to stop (e.g. systemd sends a SIGTERM) and expects it to exit on its own. If your process doesn’t exit within a few seconds, the environment proceeds to kill the process in a more forceful way.
A typical way to handle a graceful shutdown is:
listen for the signal from the environment
when you receive the signal, stop accepting new requests...
...and then wait for all current requests to finish
Exactly how you do this depends on your platform/framework. For instance, Go’s standard net/http library provides a Server.Shutdown method.
In a typical system, most shutdowns will be graceful. For example, when you need to restart your process to deploy a new version of code, you do a graceful shutdown.
There can also be unexpected shutdowns: e.g. when you suddenly lose power or network connectivity (a disconnected server is usually as good as a dead one). Such faults are harder to deal with. There’s an entire body of research dedicated to making distributed systems robust to arbitrary faults. In the simple case, when your server only writes to a single database, you can open a transaction at the beginning of a request and commit it before returning the response. This will guarantee that either all the changes are saved to the database or none of them are. But if you call multiple downstream services as part of one upstream HTTP request, you need to coordinate them, for example, with a saga.
For some applications, it may be OK to ignore unexpected shutdowns and simply deal with any inconsistencies manually if/when they arise. This depends on your application.

SQL to resume suspended messages in order

We have an upcoming deploy for a system that processes a lot of messages through BizTalk. Since those messages are cumulative updates they need to be queued up during the deployment outage then processed in order when the deploy is finished. Since there may be a large number of them it’s difficult to do this manually.
One possible solution is to leave the send port stopped and let the messages suspend. We can then resume them in order when the deployment is completed.
Is it possible to run a SQL script (or a tool) against the BizTalk messagebox database that will resume suspended messages, for a specific port, in order of receipt?
If you have an ordered requirement (you either do or don't), then the Send Port should be marked for Ordered Delivery.
If so, then when you Start a Stopped Send Port, the messages will be processed in the same order they were submitted.
If you stop the port (but leave it subscribed) and start it again afterwards it should resume the message itself, or if not it is simple enough to go into the Administration Console and batch resume them.
However if the response messages of the send port are subscribed too by running Orchestrations you will not be able to un-deploy the Orchestrations until they have all completed, so stopping the send port would not work in this scenario.
Sometimes one option is if the initiating port is a one way receive, is to stop the receive location and let everything complete. You can then stop the application and redeploy and restart it and the send port will pick up all the waiting message to process.
If the above is not possible you may want to look at doing a side by side deployment where you increment the version numbers of all the assemblies in the solution so you can have both versions deployed at the same time and you can then allow the old version to finish running but have the new version processing any new messages.
The better option is to send messages to msmq, usually there is no extra coding required for this. You can just route messages to msmq using MSMQ adapter and then after deployment receive them in order as MSMQ adapter allows to receive in order. Just make sure you do a small test in yr QA environment before doing it in production.

Client Reconnection

My understanding of the (JavaScript) hub client is that if a connection is lost, it enters a 'Reconnecting...' phase which attempts to reconnect. If it can't do so, it will enter a 'Disconnected' state which is where it'll stay until asked to start again.
How long is the 'Reconnecting...' phase meant to last before it gives up? I've read 40 seconds before, but my client seems to take much less time - about 10, maybe less. [EDIT: Nevermind this part, I had configured a 10 disconnect on the server as a test... and forgot. I understand this is set by the server during the negotiate. Makes sense!] ... I'd prefer to have the client continually retry until it is told to abort - can this be done, and would it cause issues?
Another question; during the Reconnecting... phase, if I attempt to call a hub method (again, in JS) it never seems to complete. I'm using the returned Deferred to check for 'done' and 'fail' events, but neither seems to get called. Is this by design?
Thanks.
You can definitely have it continually reconnect.
Handle the disconnected event on the client and call connection.start:
$.connection.hub.disconnected(function() {
setTimeout(function() {
$.connection.hub.start();
}, 5000); // Re-start connection after 5 seconds
});
The only issues this would cause is that you could potentially be triggering infinite requests to a server that isn't there for client machines. This becomes even more troublesome when you introduce the mobile market into the situation (drains battery like crazy).
When you attempt to call a hub method while reconnecting SignalR will try to send your command. Since there are 2 channels, one for receiving data and one for sending, (for all transports except web sockets) in some cases it can still be possible to send requests while your offline. Therefore SignalR does not know if a request fails until the browser tells it that it could not successfully make the request.
Hope this helps!
I might have a clue... Touching the Web.config produces an appPool Recycle, meaning that a new worker process will be created for new requests while the existing process will continue for a while until the remaining requests end or the timeout is reached. Request that do not end in the timeout period are terminated.
Signalr client reconnects to the new process while the long running task is running in the old process, so when on the long running task you do
GlobalHost.ConnectionManager.GetHubContext<ForceHub>();
you actually get a reference for "old" hub while the client is connected to the "new" hub.
That's why the test preformed by Wasp worked: he was making a new request to publish on the signalr hub that was processed in the newly created worker process.
You could try to configure a singalr backplane (https://www.asp.net/signalr/overview/performance/scaleout-in-signalr), it’s really easy to configure it using Sql Server (https://www.asp.net/signalr/overview/performance/scaleout-with-sql-server). The backplane should be capable of connect the two worker processes and hopefully you will get the notification on the client.
If this is the problem, notifications generated by new requests will work even without the backplane. Notice that the real purpose of the backplane is to scale out signalr, this is, to connect a farm of WebServers between them.
Also keep in mind that running long-running task inside IIS is as task hard to achieve as, among other things, IIS does regular appPool recycles and has timeout limits for the requests to execute. I recommend that you read the following post: http://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx
“If you think you can just write a background task yourself, it's likely you'll get it wrong. I'm not impugning your skills, I'm just saying it's subtle. Plus, why should you have to?”
Hope this helps

SignalR - how to detect client connect/reconnect failure (e.g. timeout)

How do I detect when a client is failing to connect (or re-connect) to a Signalr hub?
Is there an event that fires on the js client?
Responding to your second comment:
Currently in 0.5.3 SignalR does not handle the case when the server goes away. However, this will be/is handled in the next release 1.0alpha.
For the interim I'd recommend pinging the server every 5 seconds and seeing if the request fails. If it fails say 2 times then chances are the server is down and you can handle the logic from there.
If you'd like to see how we do it in the next release here's the link to the github feature: https://github.com/SignalR/SignalR/issues/469

How to detect if a client has crashed (or exit) for a server using Qt

The client use ssh login and start up a server on remote machine, then the clinet create a tcp connect to the server.
The server need exit when the client has exit normally or crashed or network is dropped.
So the question is how to detect if the client which the server has connected to is crashed.
The first try is using error() signal, catch QAbsoluteSocket::NetworkError to determine the network has dropped. But I can't receive error() signal at all even if i pull out the network cable.
The second try is using the SocketState, i think whenever SocketState is UnconnectedState,the client may has exit normally and the server should exit too. This way works fine for "normal exit", but I don't know how to deal with "crash" and "dead network".
Help me, thanks!
I'd recommend using TCP keep alive. It is not exposed through the public QTcpSocket interface, but you can use setsockopt with QAbstractSocker::socketDescriptor to activate the SO_KEEPALIVE feature.
EDIT: It appears that keep alive was added to QAbstractSocket at some point. So, simply call QAbstractSocket::setSocketOption with QAbstractSocket::KeepAliveOption.
You can find information about adjusting the timeout of keep alive request here: http://www.gnugk.org/keepalive.html
Most of the time, the only way you will know there is a problem with a socket connection is when you try to read or write with it. There are some exceptions: Windows will change the state of sockets if the network cable is unplugged, Linux (in my experience) will not.
The most reliable way to detect connection problems is to have the client regularly send a small message at an agreed upon interval with the server. If the server does not see this message within a reasonable time, it should consider the client dead and drop the connection. This will also give both sides regular opportunities to detect a problem via reads and writes.

Resources