I want to know how many clients can have an open connection
to a gRPC server running on an average machine.
The clients should connect to the server and open a stream.
Thus I am searching for a benchmark on how many gRPC streams
a gRPC server can handle.
There is currently no such benchmark to my knowledge; however, I will attempt to answer what I think is your question.
In terms of the number of gRPC connections, your typical gRPC server will be bounded by the amount of memory those connections take up. Based on data we've collected in the past, a channel will take up on the order of 40 KB of memory on the server side. So taking into account the amount of memory your server has available, you can estimate the max number of gRPC connections that your server will accept.
If you want to dynamically control how much memory gets used (and, thus, how many connections get accepted), gRPC has a ResourceQuota object that you can configure [1]. If accepting a connection would put the server over the resource quota, your server will instead refuse the connection. This provides a much better alternative to OOM'ing.
[1] https://grpc.github.io/grpc/cpp/classgrpc__impl_1_1_resource_quota.html
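To put rough numbers on that estimate, here is a back-of-the-envelope sketch in Go (purely for the arithmetic; the ~40 KB per-connection figure comes from the answer above, and the 4 GiB memory budget is a made-up assumption you would replace with your own server's headroom):

    package main

    import "fmt"

    func main() {
        // Assumptions, not measured values: substitute numbers from your own setup.
        const bytesPerConnection = 40 * 1024        // ~40 KB of server-side state per channel
        const memoryBudget = 4 * 1024 * 1024 * 1024 // 4 GiB set aside for connection state

        maxConnections := memoryBudget / bytesPerConnection
        fmt.Printf("rough upper bound: %d concurrent connections\n", maxConnections)
    }

Under those assumptions the server tops out at roughly a hundred thousand connections before memory becomes the limiting factor, which is exactly the kind of budget the ResourceQuota mechanism can enforce for you instead of leaving it to an estimate.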
Related
I have an application server which does nothing but send requests to an upstream service, wait, and then respond to the client with data received from the upstream service. The microservice takes Xms to respond, or sometimes Yms, where X<<Y. The client response time is (in steady state) essentially equal to the amount of time the upstream microservice takes to process the request; any additional latency is negligible, as the client, application server, and upstream microservice are all located in the same datacenter and communicate over private IPs with a very large network bandwidth.
When the client starts sending requests at a rate of N, the application server becomes overloaded and response times spike dramatically as the server becomes unsteady. The client and the microservice have minimal CPU usage, and the application server is at maximum CPU usage. (The application server is on a much weaker bare-metal machine than the other two services; this is a testing environment used to monitor the application server's behavior under stress.)
Intuitively, I would expect N to be the same value regardless of how long the microservice takes to respond, but I'm finding that the maximum throughput in steady state is significantly lower when the microservice takes Yms than when it only takes Xms. The number of ephemeral ports in use when this happens is also well below the limit. Since the amount of reading and writing being done is the same, and memory usage is the same, I can't figure out why N depends on the microservice's execution time. The input/output of the services is also identical regardless of execution time, so the number of bytes being written doesn't change. Since the only difference is the execution time, which simply means more TCP connections are open while responses are pending, I'm not sure why maximum throughput is affected. From my understanding, the cost of a TCP connection is negligible once it has been established.
Am I missing something?
Thanks,
Additional details:
The services use HTTP/1.1 with keepalive, with no pipelining.
I should also have mentioned that I'm using an IO-thread model. If I were using a thread per request I could understand this behavior, but with only a thread per core it's confusing.
With HTTP/1.1 (RFC 2616), there used to be a recommended limit of 2 connections per domain. More recent HTTP RFCs have relaxed this limitation but still warn to be conservative when opening multiple connections:
According to RFC 7230, section 6.4, "a client ought to limit the number of simultaneous open connections that it maintains to a given server".
More specifically, these days (HTTP/2 aside), browsers impose a per-domain limit of 6-8 connections when using HTTP/1.1. From what I'm reading, these guidelines are intended to improve HTTP response times and avoid congestion.
Can someone help me understand what would happen to congestion and response times if many connections were opened per domain? It doesn't sound like an HTTP server problem, since the number of connections a server can handle seems like an implementation detail. The explanation above seems to say it's about TCP performance, but I can't find any more precise explanation for why HTTP clients limit the number of connections per domain.
The primary reasoning for this is resources on the server side.
Imagine that you have a server running Apache with the default of 256 worker threads. Imagine that this server is hosting an index page that has 20 images on it. Now imagine that 20 clients simultaneously connect and download the index page; each of these clients closes these connections after obtaining the page.
Since each of them will now establish connections to download the images, you can see that the number of connections grows multiplicatively. Consider what happens if every client is configured to establish up to ten simultaneous connections in parallel to optimize the display of the page with its images. This very quickly takes us to 200 simultaneous connections, already approaching the number of worker processes that Apache has available (again, by default, with prefork).
For the server, resources must be balanced to be able to serve the most likely load, but the clients help with this tremendously by throttling connections. If every client felt free to establish 100+ connections to a server in parallel, we would very quickly DoS lots of hosts. :)
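To illustrate what that client-side throttling looks like in practice, here is a minimal sketch using Go's net/http client (the cap of 6 and the example URL are placeholders, not recommendations):

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    func main() {
        // Cap how many connections this client will open to any one host,
        // much like a browser's per-domain limit of 6-8 connections.
        transport := &http.Transport{
            MaxConnsPerHost:     6, // hypothetical cap; pick what your server can absorb
            MaxIdleConnsPerHost: 6, // keep the same connections alive for reuse
            IdleConnTimeout:     30 * time.Second,
        }
        client := &http.Client{Transport: transport, Timeout: 10 * time.Second}

        resp, err := client.Get("https://example.com/") // placeholder URL
        if err != nil {
            fmt.Println("request failed:", err)
            return
        }
        defer resp.Body.Close()
        fmt.Println("status:", resp.Status)
    }

With MaxConnsPerHost set, additional requests simply wait for one of the existing connections to free up rather than opening new ones, which is the same behavior browsers apply per domain.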
We are trying to do load tests on our servers; for this we are currently using JMeter.
However, we have decided to use golang's concurrency model to create simultaneous HTTP requests to the server and perform the load test.
Are there any limitations on how many HTTP requests or TCP connections a machine can open/send to any other machine, and is there any way to find this number?
Edit:
We need this number since it will help us identify how many HTTP requests can be sent simultaneously to the server.
Thanks
Are there any limitations on how many HTTP requests or TCP connections a machine can open/send to any other machine, and is there any way to find this number?
Yes. When connecting to a single target, you are limited by the number of outbound (ephemeral) ports, which is at most 65535 per source IP. In practice it is somewhat less, as not all ports are available for use as outbound ports.
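On Linux you can read the configured ephemeral port range directly; a small sketch in Go (assuming a Linux host, where /proc/sys/net/ipv4/ip_local_port_range is the standard location):

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // On Linux the kernel exposes the ephemeral (outbound) port range here;
        // the default is typically "32768 60999", i.e. roughly 28k usable ports.
        data, err := os.ReadFile("/proc/sys/net/ipv4/ip_local_port_range")
        if err != nil {
            fmt.Println("could not read port range:", err)
            return
        }
        fmt.Print("ephemeral port range: ", string(data))
    }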
We need this number since it will help us identify how many HTTP requests can be sent simultaneously to the server.
That limit applies to any one client machine; it has nothing to do with the maximum number of connections the server can accept from different machines.
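Since the question mentions golang's concurrency model, here is a minimal sketch of a bounded load generator (the target URL, request count, and concurrency level are placeholders to adjust for your test); the buffered channel acts as a semaphore so the number of in-flight requests, and thus open connections, never exceeds an explicit limit:

    package main

    import (
        "fmt"
        "net/http"
        "sync"
        "time"
    )

    func main() {
        const (
            target      = "http://localhost:8080/" // placeholder target
            totalReqs   = 10000                    // how many requests to send in total
            concurrency = 500                      // max simultaneous requests / connections
        )

        sem := make(chan struct{}, concurrency) // counting semaphore bounding in-flight requests
        var wg sync.WaitGroup
        var mu sync.Mutex
        var errs int

        client := &http.Client{Timeout: 5 * time.Second}
        start := time.Now()

        for i := 0; i < totalReqs; i++ {
            wg.Add(1)
            sem <- struct{}{} // blocks once 'concurrency' requests are in flight
            go func() {
                defer wg.Done()
                defer func() { <-sem }()
                resp, err := client.Get(target)
                if err != nil {
                    mu.Lock()
                    errs++
                    mu.Unlock()
                    return
                }
                resp.Body.Close()
            }()
        }
        wg.Wait()
        fmt.Printf("%d requests in %v, %d errors\n", totalReqs, time.Since(start), errs)
    }

Keeping the concurrency limit well below the ephemeral port range, and reusing keep-alive connections, avoids running into the per-machine limit described above.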
When we talk about capacity of a web application, we often mention the concurrent requests it could handle.
As discussed in another question of mine, Ethernet uses TDM (Time Division Multiplexing), so no two signals can pass along the wire simultaneously. So if the web server is connected to the outside world through an Ethernet connection, there will be literally no concurrent requests at all; all requests will come in one after another.
But if the web server is connected to the outside world through something like a wireless network card, I believe multiple signals could arrive at the same time as electromagnetic waves. Only in that situation would there be real concurrent requests to talk about.
Am I right on this?
Thanks.
I imagine "concurrent requests" for a web application doesn't get down to the link level. It's more a question of how the application processes a request and how many requests arrive during that processing.
For example, if a request takes on average 2 seconds to fulfill (from receiving it at the web server, through processing by the application, to sending back the response) and requests arrive at, say, 100 per second, then roughly 200 requests are in flight at any moment (arrival rate times average latency), all of which need to be handled concurrently.
The requests need to overlap and be handled concurrently, otherwise the queue of requests would just fill up indefinitely. This may seem like common sense, but for a lot of web applications it's a real concern because the flood of requests can bog down a resource for the application, such as a database. Thus, if the application has poor database interactions (overly complex procedures, poor indexing/optimization, a slow link to a database shared by many other applications, etc.) then that creates a bottleneck which limits the number of concurrent requests the application can handle, even though the application itself should be able to handle them.
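To make that concrete, here is a small self-contained Go sketch (the 2-second delay and the pool size of 10 are invented numbers standing in for a slow shared resource such as a database): the web server handles every request concurrently, yet throughput is capped by the bottleneck resource, not by the server itself.

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "time"
    )

    // dbPool stands in for a limited shared resource such as a database connection pool.
    var dbPool = make(chan struct{}, 10) // only 10 "queries" may run at once

    func handler(w http.ResponseWriter, r *http.Request) {
        dbPool <- struct{}{}        // wait for a free slot on the bottleneck resource
        time.Sleep(2 * time.Second) // pretend the query takes 2 seconds
        <-dbPool                    // release the slot
        fmt.Fprintln(w, "done")
    }

    func main() {
        // net/http serves each request on its own goroutine, so requests overlap freely;
        // throughput is nevertheless capped at about 5 requests/second (10 slots / 2 s each),
        // no matter how fast the web server itself is.
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }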
Imagine an HTTP server listening on port 80; what happens is:
a client connects to the server to request some page; it is connecting from some origin IP address, using some origin local port.
the OS (actually the network stack) looks at the incoming request's destination IP (since the server may have more than one NIC) and destination port (80) and verifies that some application is registered to handle data on that port (the HTTP server). The combination of 4 numbers (origin IP, origin port, destination IP, port 80) uniquely identifies a connection. If such a connection does not exist yet, a new one is added to the network stack's internal table and a connection request is passed on to the HTTP server's listening socket. From then on, the network stack just passes data for that connection to the application.
Multiple clients can be sending requests, and for each one the above happens. So from the network perspective, everything happens serially, since data arrives one packet at a time.
From the software perspective, the HTTP server is listening for incoming requests. The number of requests it can have queued before clients start getting errors is determined by the programmer based on the hardware capacity (this is the first bit of concurrency: there can be multiple requests waiting to be processed). For each one it will create a new socket (as fast as possible, in order to keep emptying the request queue) and let the actual processing of the request be done by another part of the application (different threads). These processing routines will (ideally) spend most of their time waiting for data to arrive and react (ideally) quickly to it.
Since the processing of data is usually many times faster than the network I/O, the server can handle many requests while processing network traffic, even if the hardware consists of only one processor. Multiple processors increase this capability. So from the software perspective, everything happens concurrently.
How the actual processing of the data is implemented is where the key to performance lies (you want it to be as efficient as possible). Several possibilities exist (asynchronous socket operations as provided by the Socket class, thread pools, dedicated threads, the new parallel features in .NET 4).
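The answer above is .NET-flavored, but the accept-and-hand-off pattern it describes is language-agnostic; here is a minimal sketch of the same idea in Go (the port and the trivial echo-style handling are placeholders): connections wait in the listen backlog until Accept picks them up, and each accepted connection is then processed concurrently.

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "net"
    )

    func handleConn(conn net.Conn) {
        defer conn.Close()
        // "Processing" here is trivial; in a real server this is where the request
        // is parsed and the response produced, mostly waiting on I/O.
        reader := bufio.NewReader(conn)
        line, err := reader.ReadString('\n')
        if err != nil {
            return
        }
        fmt.Fprintf(conn, "received: %s", line)
    }

    func main() {
        // The OS queues incoming connections (the listen backlog) until Accept picks them up.
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept() // drain the queue as fast as possible...
            if err != nil {
                continue
            }
            go handleConn(conn) // ...and hand each connection to its own goroutine
        }
    }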
It's true that no two packets can arrive at exactly the same time (unless multiple network cards are in use, per Gabe's comment). However, a web request usually requires a number of packets. The arrival of these packets is interspersed when multiple requests come in at nearly the same time (whether over wired or wireless access). Also, the processing of these requests can overlap.
Add multi-threading (or multiple processors / cores) to the picture, and you can see how lengthy operations such as reading from a database (which requires a lot of waiting around for a response) can easily overlap even though the individual packets are arriving in a serial fashion.
Edit: Added note above to incorporate Gabe's feedback.
What is the maximum number of concurrent connections possible in BlazeDS using only the remoting service?
Remoting calls are simple HTTP POST calls; every remote call is going to be executed on one of the application server's threads. So the maximum number of concurrent connections will depend on your server configuration (the thread pool size).
If I'm understanding what you mean by remoting (the HTTP proxy?), there is a place to set max connections and max connections per client in the proxy-config.xml file. There may also be issues for BlazeDS if you're using data push: not just the maximum number of threads, but OS settings may come into question as well, such as the maximum number of file descriptors that can be opened.