gRPC Error on Long Connections: Too Many Pings

I am building a system with a microservice architecture that communicates between services using gRPC. At startup there is a long-running request that hits a central endpoint, which in turn fans out requests to other services. The request to the central service waits until the other services finish their computations before it receives a response, and this may take minutes to complete. The problem is that I keep getting a gRPC error saying "Too many pings". I have set keepalive params on my Go servers the following way:
ka_params := keepalive.ServerParameters{
    Time:    10 * time.Second,
    Timeout: 5 * time.Second,
}
opts := []grpc.ServerOption{
    grpc.KeepaliveParams(ka_params),
}
s = grpc.NewServer(opts...)
And in my Python servers like this:
opts = [("grpc.keepalive_time_ms", 10000),
("grpc.keepalive_timeout_ms", 5000),
("grpc.keepalive_permit_without_calls", True),
("grpc.http2.max_pings_without_data", 0)]
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=opts)
I'm not sure why I'm getting a "too many pings" error. Aren't the pings expected because of the keepalive?

I think I found a solution to this. The crux of the issue is that the Python and Go gRPC implementations have different default settings, and the Python gRPC options are poorly documented.
To solve the issue, set "grpc.http2.max_ping_strikes" to 0 on the Python server. The Python server should have the following options:
opts = [("grpc.keepalive_time_ms", 10000),
("grpc.keepalive_timeout_ms", 5000),
("grpc.keepalive_permit_without_calls", True),
("grpc.http2.max_ping_strikes", 0)]

On the Python server side, to configure the acceptable keepalive period, you would want to set "grpc.http2.min_ping_interval_without_data_ms" to 10 seconds or a little lower, so that network jitter doesn't make the client's pings arrive slightly early and get counted as ping strikes.
The default for this parameter is 5 minutes, so if the client sends pings every 10 seconds the server responds with a GOAWAY frame carrying "too_many_pings".
(Also, setting "grpc.keepalive_time_ms" on the server side makes the server send its own keepalive pings every 10 seconds. This might not be what you want.)
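For reference, here is a minimal sketch of how these options fit together on the Python server. The 9000 ms value is my assumption, chosen slightly below the client's 10-second ping interval; setting max_ping_strikes to 0 already disables the GOAWAY response, so the min-interval option only matters if you keep ping strikes enabled.

import grpc
from concurrent import futures

opts = [("grpc.keepalive_time_ms", 10000),
        ("grpc.keepalive_timeout_ms", 5000),
        ("grpc.keepalive_permit_without_calls", True),
        # Tolerate any number of "too frequent" pings instead of answering
        # with GOAWAY("too_many_pings").
        ("grpc.http2.max_ping_strikes", 0),
        # Alternative knob: lower the 5-minute default so pings every
        # ~10 seconds are acceptable (9000 ms is an assumed value, a bit
        # below the client's ping interval).
        ("grpc.http2.min_ping_interval_without_data_ms", 9000)]
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=opts)

Register your servicers and call server.start() as usual; only the options list changes.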
Reference:
https://github.com/grpc/grpc/blob/master/doc/keepalive.md

Related

GRPC server streaming, how to increase server ping time

I am working on a project using gRPC with a Node.js server and an Android client written in Kotlin.
We are able to set up the connection, and the server streaming works as expected.
The problem we are seeing is that the server pings the client every 30 seconds for keepalive, even though we have set the keepalive time to 5 minutes.
const options = { 'grpc.keepalive_permit_without_calls': false, "grpc.keepalive_time_ms": 300000 };
On the client side:
val builder = ManagedChannelBuilder.forAddress(ipaddress, portno)
    .keepAliveTime(5, TimeUnit.MINUTES)
    .keepAliveWithoutCalls(false)
    .proxyDetector { null }
    .usePlaintext()
channel = builder.build()
From Wireshark logs I can see that the server pings the client every 30 seconds. Is there any way to increase the ping interval? If yes, what is the maximum interval we can use, and what code changes are needed?

limit number of concurrent_rpcs in grpc c++

I am using gRPC for a bidirectional streaming service. Because this is an online real-time service and we don't want our clients to wait too long, we want to limit the RPCs to a certain number and reject the extra requests.
We are able to do it in Python with the following code:
grpcServer = grpc.server(futures.ThreadPoolExecutor(max_workers=10), maximum_concurrent_rpcs=10)
In that case, only 10 requests are processed at a time; additional requests are rejected and the clients receive a RESOURCE_EXHAUSTED error.
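To illustrate what that rejection looks like from the caller's side, here is a rough Python sketch; the Echo service and the echo_pb2/echo_pb2_grpc modules are hypothetical stand-ins for your own generated stubs.

import grpc

# Hypothetical stubs generated from your own .proto; names are placeholders.
import echo_pb2
import echo_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = echo_pb2_grpc.EchoStub(channel)

try:
    reply = stub.Echo(echo_pb2.EchoRequest(message="hello"))
except grpc.RpcError as err:
    if err.code() == grpc.StatusCode.RESOURCE_EXHAUSTED:
        # The server is already handling maximum_concurrent_rpcs calls;
        # this call was rejected immediately rather than queued.
        print("server busy, try again later")
    else:
        raise

The rejection happens up front, so callers fail fast instead of waiting in a queue.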
However, we find it hard to do this in C++. We used the code from "grpc sync server limit handle thread":
ServerBuilder builder;
builder.SetSyncServerOption(ServerBuilder::SyncServerOption::NUM_CQS, 10);
grpc::ResourceQuota quota;
quota.SetMaxThreads(10 * 2);
builder.SetResourceQuota(quota);
We are using gRPC in many services; some use Kaldi, some use libtorch.
In some cases, the above code behaves normally: it processes 10 requests at a time and rejects the others, and the processing speed (our service requires a lot of CPU computation) is fine.
In some cases, it only accepts 9 requests at a time.
In some cases, it accepts 10 requests at a time, but the processing speed is significantly lower than before.
We also tried
builder.AddChannelArgument(GRPC_ARG_MAX_CONCURRENT_STREAMS, 10);
But it is useless because GRPC_ARG_MAX_CONCURRENT_STREAMS limits the concurrent RPCs on a single HTTP/2 connection, not across connections.
Could someone please point out the C++ equivalent of the Python code? We want our service to handle 10 requests at a time and reject the rest; we do not want any request to wait in a queue.

gRPC-go server becomes unresponsive and starts backing up requests after some time

We have a public gRPC API and a client that consumes it following the REST paradigm of creating a connection (channel) for every request. We suspect that they are not closing these channels once the request has completed.
On the server side, everything functions ok for a while, then something appears to be exhausted. Requests back up on the servers and are not processed, which results in our proxy timing out and sending an unavailable response. Restarting the server fixes the issue, and I can see backed-up requests being flushed in the logs as the servers shut down.
Unfortunately, it seems that there is no way to monitor what is happening on the server side and prune these connections. We have the following keepalive settings, but they don't appear to have an impact:
grpc.KeepaliveParams(keepalive.ServerParameters{
    MaxConnectionIdle:     time.Minute * 5,
    MaxConnectionAge:      time.Minute * 15,
    MaxConnectionAgeGrace: time.Minute * 1,
    Time:                  time.Second * 60,
    Timeout:               time.Second * 10,
})
We also have tried upping MaxConcurrentStreams from the default 250 to 1000, but the pod
Is there any way that we can monitor channel creation, usage and destruction on the server side, if only to prove or disprove that the client's method of consumption is causing the problems?
Verbose logging has not been helpful, as it seems to only log the client activity on the server (i.e. the server consuming pub/sub and logging as a client). I have also looked at channelz, but we have mutual TLS auth and I have been unable to get it to work on our production pods.
We have instructed our client to use a single channel, and if that is not possible, to close the channels that they are creating, but they are a large corporation and move very slowly. We've also not been able to examine their code; we only know that they are developing with dotnet. We're also unable to replicate the behaviour running our own Go client at similar volumes.
The culprit is MaxConnectionIdle: it will always create a new http2server after the specified amount of time, and eventually your service will crash due to a goroutine leak.
Remove MaxConnectionIdle and MaxConnectionAge, then (preferably) make sure both ServerParameters and ClientParameters use the same Time and Timeout.
const (
    Time    = 5 * time.Second // wait X seconds, then send ping if there is no activity
    Timeout = 5 * time.Second // wait for ping back
)

// server code...
grpc.KeepaliveParams(keepalive.ServerParameters{
    Time:                  Time,
    Timeout:               Timeout,
    MaxConnectionAgeGrace: 10 * time.Second,
})

// client code...
grpc.WithKeepaliveParams(keepalive.ClientParameters{
    Time:                Time,
    Timeout:             Timeout,
    PermitWithoutStream: true,
}),

Apache Async http client performance vs sync client

I am trying to switch my application to the async version of the Apache HttpComponents client. The goal is to be able to handle more outbound connections (in the near future).
The payload of the requests is quite small (<5KB).
The endpoints I hit are around 20 in number.
With the sync version of the Apache HTTP client, the throughput is about 200 requests/sec.
The average response time is about 100ms/request.
I abort the requests after a max of 180ms.
After switching to async, the response time went up by 20ms/request.
The throughput also dropped to 160/sec, and the number of aborted requests doubled.
This is after fine-tuning the application a lot.
Is there anything I can do to improve the performance of the async client?
I set maxConnectionsPerRoute high and have a large connection pool.
Are there any params that are key to getting the most out of the async client?
Did you forget to set maxConnTotal?
The default maxConnTotal is 20, which is a global limit.
I forgot to set it once.

IBM WebSphere MQ creates an additional TCP "ACK" and waits 13 sec for an answer

I have two Tomcats with identical Java apps ('clients') that communicate with a single MQ server.
On one client, the communication is fast, but on the other it takes a lot of time (> 13 sec).
In the infrastructure, the only difference is that the 'slow client' is on a different domain than the server, while the 'fast client' is on the same domain; as far as I know, that should not cause any trouble.
I've sniffed the communication both ways with Wireshark, and I've found one difference: the MQ server sends an additional "ACK" to the slow client, gets nothing back, waits for 13 seconds, and then sends the "INITIAL DATA" packet that actually starts the work, like this:
Client -> server: IBM WebSphere SYN
Server -> client: IBM WebSphere SYN-ACK
Client -> server: IBM WebSphere ACK
Client -> server: INITIAL DATA
Server -> client: IBM WebSphere ACK   // only on slow client!!!
// waits for 13 seconds - only on slow client
Server -> client: INITIAL DATA        // from here, it flows on both clients
Notes:
1) The communication to the server is not SSL, and uses IP addresses rather than DNS names, so name-resolution issues should not occur.
2) I've published a similar question before - this is not the same case! Last time it was a NetBIOS over TCP issue.
The way we overcame the issue was to use SimpleConnectionManager and a long timeout.
It's a workaround, but it works.
