I am working on a project using gRPC with a Node.js server and an Android client written in Kotlin.
We are able to set up the connection, and server streaming works as expected.
The problem we are seeing is that the server is pinging the client every 30 seconds for keepalive, even though we have set the keepalive time to 5 minutes.
const options = {
  'grpc.keepalive_permit_without_calls': false,
  'grpc.keepalive_time_ms': 300000
};
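For reference, channel options like these only take effect if they are passed to the server when it is constructed. A minimal sketch of that wiring, assuming @grpc/grpc-js and a placeholder bind address:

const grpc = require('@grpc/grpc-js');

// Keepalive-related channel options must be supplied when the Server is created.
const server = new grpc.Server({
  'grpc.keepalive_permit_without_calls': 0, // same intent as `false` above
  'grpc.keepalive_time_ms': 300000          // 5 minutes between server keepalive pings
});

// server.addService(...) with the generated service definition goes here.
server.bindAsync('0.0.0.0:50051', grpc.ServerCredentials.createInsecure(), () => {
  server.start();
});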
On the client side:
val builder = ManagedChannelBuilder.forAddress(ipaddress, portno)
    .keepAliveTime(5, TimeUnit.MINUTES)
    .keepAliveWithoutCalls(false)
    .proxyDetector { null }
    .usePlaintext()
channel = builder.build()
From the Wireshark logs I can see that the server is pinging the client every 30 seconds. Is there any way to increase the ping interval? If so, what is the maximum interval we can use, and what code changes are needed?
Related
I am building a system with a microservice architecture that communicates between services using gRPC. At the beginning there is a long-running request to a central endpoint, which in turn makes a bunch of requests to other services. The first request to the central service waits until the other services finish computing their requests before it receives a response from the central endpoint. This may take minutes to complete. The problem is that I keep getting a gRPC error saying "Too many pings". I have set keepalive params on my Go servers as follows:
ka_params := keepalive.ServerParameters{
    Time:    10 * time.Second,
    Timeout: 5 * time.Second,
}

opts := []grpc.ServerOption{
    grpc.KeepaliveParams(ka_params),
}

s = grpc.NewServer(opts...)
And on my Python servers like this:
opts = [("grpc.keepalive_time_ms", 10000),
        ("grpc.keepalive_timeout_ms", 5000),
        ("grpc.keepalive_permit_without_calls", True),
        ("grpc.http2.max_pings_without_data", 0)]
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=opts)
I'm not sure why I get a "too many pings" error. Aren't the pings expected because of the keepalive?
I think I found a solution to this. The crux of the issue is that the Python and Go gRPC implementations have different default settings, and the Python gRPC options are poorly documented.
To solve the issue, you have to set grpc.http2.max_ping_strikes to 0 on the Python server. The Python server should have the following options:
opts = [("grpc.keepalive_time_ms", 10000),
        ("grpc.keepalive_timeout_ms", 5000),
        ("grpc.keepalive_permit_without_calls", True),
        ("grpc.http2.max_ping_strikes", 0)]
On the Python server side, to configure the acceptable keepalive period, you would want to set "grpc.http2.min_ping_interval_without_data_ms" to 10 seconds (maybe a bit higher to account for network latency).
The default for this parameter is 5 minutes, so if the client is sending pings every 10 seconds, the server will send a GOAWAY frame with "too_many_pings".
(Also, setting "grpc.keepalive_time_ms" on the server side results in the server itself sending keepalive pings every 10 seconds. This might not be what you want.)
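A rough sketch of what that combination could look like on the Python server (the 10-second value mirrors the client's keepalive interval from the question; treat the exact numbers as assumptions to tune):

import grpc
from concurrent import futures

opts = [
    ("grpc.keepalive_timeout_ms", 5000),
    ("grpc.keepalive_permit_without_calls", True),
    # Tolerate any number of "bad" pings instead of sending GOAWAY("too_many_pings").
    ("grpc.http2.max_ping_strikes", 0),
    # Accept client pings without data as often as every 10 s.
    ("grpc.http2.min_ping_interval_without_data_ms", 10000),
    # "grpc.keepalive_time_ms" is intentionally omitted so the server itself
    # does not send pings every 10 s (see the note above).
]
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=opts)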
Reference:
https://github.com/grpc/grpc/blob/master/doc/keepalive.md
We have a public gRPC API. We have a client that is consuming our API based on the REST paradigm of creating a connection (channel) for every request. We suspect that they are not closing this channel once the request has been made.
On the server side, everything functions OK for a while, then something appears to be exhausted. Requests back up on the servers and are not processed, which results in our proxy timing out and sending an unavailable response. Restarting the server fixes the issue, and I can see backed-up requests being flushed in the logs as the servers shut down.
Unfortunately, there seems to be no way to monitor what is happening on the server side and prune these connections. We have the following keepalive settings, but they don't appear to have an impact:
grpc.KeepaliveParams(keepalive.ServerParameters{
    MaxConnectionIdle:     time.Minute * 5,
    MaxConnectionAge:      time.Minute * 15,
    MaxConnectionAgeGrace: time.Minute * 1,
    Time:                  time.Second * 60,
    Timeout:               time.Second * 10,
})
We have also tried upping MaxConcurrentStreams from the default 250 to 1000, but the pod
Is there any way that we can monitor channel creation, usage and destruction on the server side, if only to prove or disprove that the client's method of consumption is causing the problems?
Verbose logging has not been helpful, as it seems to only log the client activity on the server (i.e. the server consuming pub/sub and logging as a client). I have also looked at channelz, but we have mutual TLS auth and I have been unsuccessful in getting it to work on our production pods.
We have instructed our client to use a single channel and, if that is not possible, to close the channels they are creating, but they are a large corporation and move very slowly. We've also not been able to examine their code; we only know that they are developing with .NET. We're also unable to replicate the behaviour running our own Go client at similar volumes.
The culprit is MaxConnectionIdle: it will always create a new http2Server after the specified amount of time, and eventually your service will crash due to a goroutine leak.
Remove MaxConnectionIdle and MaxConnectionAge, then (preferably) make sure both ServerParameters and ClientParameters are using the same Time and Timeout.
const (
    Time    = 5 * time.Second // wait X seconds, then send ping if there is no activity
    Timeout = 5 * time.Second // wait for ping back
)

// server code...
grpc.KeepaliveParams(keepalive.ServerParameters{
    Time:                  Time,
    Timeout:               Timeout,
    MaxConnectionAgeGrace: 10 * time.Second,
})

// client code...
grpc.WithKeepaliveParams(keepalive.ClientParameters{
    Time:                Time,
    Timeout:             Timeout,
    PermitWithoutStream: true,
}),
I have this code to test asynchronous programming in SignalR. This code sends the text back to the client after 10 seconds.
public class TestHub : Hub
{
    public async Task BroadcastMessage(string text)
    {
        await DelayResponse(text);
    }

    async Task DelayResponse(string text)
    {
        await Task.Delay(10000);
        Clients.All.displayText(text);
    }
}
This code works fine, but there is an unexpected behavior: when 5 messages are sent in less than 10 seconds, the client can't send more messages until the previous "DelayResponse" calls end. It happens per connection, and if I close and reopen the connection before the 10 seconds elapse, the client can send 5 messages again. I tested it with Chrome, Firefox and IE.
Did I make a mistake, or is this a SignalR limitation?
You are most likely hitting a browser limit. With the longPolling and serverSentEvents transports, each send is a separate HTTP request. Since you are delaying the response, these requests are long-running, and browsers limit how many concurrent connections can be open to the same host. Once you reach the limit, a new connection will not be opened until one of the previous ones completes.
More details on concurrent requests limit:
Max parallel http connections in a browser?
That is not how SignalR is meant to be used, with the hub call waiting on a long-running task. For that, SignalR supports a server push mechanism.
So if you have something that needs more time, trigger it from the client, and when the calculation is finished, send a message from the server to the client.
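A minimal sketch of that pattern, reusing the TestHub from the question (assuming a recent ASP.NET SignalR; the StartLongCalculation method name and the Task.Run offload are illustrative assumptions, not the only way to do this):

using System.Threading.Tasks;
using Microsoft.AspNet.SignalR;

public class TestHub : Hub
{
    // The client triggers the long-running work; the hub method returns immediately.
    public void StartLongCalculation(string text)
    {
        Task.Run(async () =>
        {
            await Task.Delay(10000); // stand-in for the real long-running calculation

            // Push the result when it is ready instead of holding the original request open.
            var context = GlobalHost.ConnectionManager.GetHubContext<TestHub>();
            context.Clients.All.displayText(text);
        });
    }
}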
I have two Meteor applications on different servers connected via DDP, and Server A sends data to Server B. This is how they work.
Server A
Items = new Meteor.Collection('items');
Items.insert({name: 'item 1'});

if (Meteor.isServer) {
  Meteor.publish('items', function() {
    return Items.find();
  });
}
Server B
var remote = DDP.connect('http://server-a/');
Items = new Meteor.Collection('items', remote);
remote.subscribe('items');

Items.find().observe({
  added: function(item) {
    console.log(item);
  }
});
Every time I call Items.insert(something) on Server A, I get a log on Server B's console with the object I saved on Server A. But if Server B loses its Internet connection, the data inserted on Server A no longer appears on Server B when it reconnects to the Internet.
Server B is connected to the Internet through a router. This problem only happens when I disconnect and reconnect the router, not when I disconnect and reconnect the server from the router. The servers are on different networks and connect via the Internet.
I created a timer on Server B that calls remote.status(), but I always get { status: 'connected', connected: true, retryCount: 0 }, whether I am connected to or disconnected from the Internet.
Update: steps to reproduce
I created a project on github with the testing code https://github.com/camilosw/ddp-servers-test. Server A is installed on http://ddpserverstest-9592.onmodulus.net/
My computer is connected to the Internet through a wireless cable modem.
1. Run mrt in the server-b folder.
2. Go to http://ddpserverstest-9592.onmodulus.net/ and click the Insert link (you can click Delete to remove all previous inserts). You should see a message on your local console with the added item.
3. Turn off the wireless on the computer and click the insert link again. (You will need to click from another computer with Internet access; I used a smartphone to click the link.)
4. Turn the wireless on the computer back on. You should see a message on your local console with the second item.
5. Now turn off the cable modem and click the insert link again.
6. Turn the cable modem back on. This time, the new item doesn't appear on the console.
I also tried it with an Android smartphone, using the option to share its Internet connection with my computer over wireless. First I turned the wireless on my computer off and on, and it worked correctly. Then I turned the Internet connection on the smartphone off and on, and I got the same problem.
Update 2
I have two wireless routers in my office. I found that the same problem happens if I move between routers.
Emily Stark, from the Meteor team, confirmed that this is due to a missing feature in the current implementation (version 0.7.0.1 at the time I write this answer). Her answer is at https://github.com/meteor/meteor/issues/1543. Below is her answer and the workaround she suggests:
The server-to-server connection is not reconnecting because Meteor currently doesn't do any heartbeating on server-to-server DDP connections. Just as in any other TCP connection, once you switch to a different router, no data can be sent or received on the connection, but the client will not notice unless it attempts to send some data and times out. This differs from browser-to-server DDP connections, which run over SockJS. SockJS does its own heartbeating that we can use to detect dead connections.
To see this in action, here is some code that I added to server-b in your example:
var heartbeatOutstanding = false;
Meteor.setInterval(function () {
  if (! heartbeatOutstanding) {
    console.log("Sending heartbeat");
    remote.call("heartbeat", function () {
      console.log("Heartbeat returned");
      heartbeatOutstanding = false;
    });
    heartbeatOutstanding = true;
  }
}, 3000);

remote.onReconnect = function () {
  console.log("RECONNECTING REMOTE");
};
With this code added in there, server-b will reconnect after a long enough time goes by without an ACK from server-a for the TCP segments that are delivering the heartbeat method call. On my machine, this is just a couple minutes, and I get an ETIMEDOUT followed by a reconnect.
I've opened a separate task for us to think about implementing heartbeating on server-to-server DDP connections during our next bug week. In the meantime, you can always implement heartbeating in your application to ensure that a DDP reconnection happens if the client can no longer talk to the server.
I think you are not passing the DDP connection object to the Collection correctly. Try:
var remote = DDP.connect('http://server-a/');
Items = new Meteor.Collection('items', { connection: remote });
It might be useful for debugging to try all these connection games from the browser console first, since Meteor provides the same connection/collection API on the client (except for the control flow). Just open any Meteor application and try these lines from the console.
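For example, a quick console sketch along those lines, reusing the URL and publication name from the question (adjust them to your own app):

var remote = DDP.connect('http://server-a/');
var Items = new Meteor.Collection('items', { connection: remote });
remote.subscribe('items');
Items.find().observe({
  added: function (item) { console.log('added', item); }
});
remote.status(); // reactive connection status, e.g. { status: 'connected', ... }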
I revised a sample of communication between two DDP servers, based on camilosw's code.
Server A acts as a cloud data center. Server B acts as a data source; if some data changes, it should be sent to Server A.
You can find the code at https://github.com/iascchen/ddp-servers-test
I'm using SignalR 0.5.3 with hubs, and I'm explicitly setting the transport to long polling like this:
$.connection.hub.start({ transport: 'longPolling' }, function () {
    console.log('connected');
});
with configuration like this (in the Application_Start method of Global.asax.cs):
GlobalHost.DependencyResolver.UseRedis(server, port, password, pubsubDB, "FooBar");
GlobalHost.Configuration.DisconnectTimeout = TimeSpan.FromSeconds(2);
GlobalHost.Configuration.KeepAlive = TimeSpan.FromSeconds(15);
However, long polling doesn't seem to be working in either the development (IIS Express) or production (IIS 7.5) environment. The connection seems to be made properly, but the long-poll request always times out (after ~2 minutes) and a reconnect happens afterwards. Logs from IIS are here. Response from the first timed-out request:
{"MessageId":"3636","Messages":[],"Disconnect":false,"TimedOut":true,"TransportData":{"Groups":["NotificationHub.56DDB6692001Ex"],"LongPollDelay":0}}
Timed-out reconnect responses look like this:
{"MessageId":"3641","Messages":[],"Disconnect":false,"TimedOut":true,"TransportData":{"Groups":["NotificationHub.56DDB6692001Ex"],"LongPollDelay":0}}
I would appreciate any help regarding this issue. Thanks.
Edit
If a reconnect means the beginning of a new long-poll cycle, why is it initiated after ~2 minutes when the KeepAlive setting in Global.asax.cs is set to 15 seconds? The problem with this is that I have a reverse proxy in front of IIS which times out keep-alive requests after 25 seconds, so I get a 504 response when that reverse proxy timeout is reached.
Take a look at this post: How SignalR works internally. The way long polling works is that after a set time the connection either times out or receives a response, and then re-polls (reconnects).
Keep-alive functionality is disabled for long polling. It seems that ConnectionTimeout is used instead:
This setting represents the amount of time to leave a transport connection open and waiting for a response before closing it and opening a new connection. The default value is 110 seconds.
https://learn.microsoft.com/en-us/aspnet/signalr/overview/guide-to-the-api/handling-connection-lifetime-events#timeoutkeepalive
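A minimal sketch of adjusting that timeout alongside the existing configuration in Application_Start; the ConnectionTimeout property is the one from the linked docs (worth verifying it exists in 0.5.3), and 20 seconds is only an assumed value chosen to stay below the 25-second proxy timeout from the question:

// In Application_Start, next to the existing GlobalHost.Configuration lines:
GlobalHost.Configuration.ConnectionTimeout = TimeSpan.FromSeconds(20); // default is 110 s; keep it below the proxy timeout
GlobalHost.Configuration.DisconnectTimeout = TimeSpan.FromSeconds(2);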
If the request times out and the server is not sending any data when you expect it to, maybe there is some issue on the server side that you don't yet see.