Can we Scale gRPC like worker processes in the same server - grpc

Our application is expected to receive thousands of requests every second, and we are considering gRPC because one of our main services is written in a different language.
My questions are:
Can we use something like supervisor to spawn multiple workers (one gRPC server per service) as gRPC servers listening on the same port, or are gRPC servers limited to only one per server/port?
How would I go about performance testing to determine the maximum number of requests per gRPC server?
Thanks in advance

While you can certainly use supervisord to spawn multiple gRPC server processes, port sharing would be a problem. However, this is a POSIX socket limitation, not a gRPC limitation: by default, multiple processes cannot listen on the same port. (To be clear, multiple processes can bind to the same port with SO_REUSEPORT, but the kernel then distributes incoming connections among the listeners without regard to which service a request targets, so this would not result in the behavior you presumably want.)
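To make the port-sharing point concrete, here is a minimal sketch using plain sockets (not gRPC). It assumes a platform that defines SO_REUSEPORT (Linux, macOS, the BSDs); the function names are made up for illustration.

```python
import socket


def bind_listener(port, reuse_port):
    """Open a TCP listener on 127.0.0.1:port, optionally with SO_REUSEPORT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_port:
            # Assumption: the platform defines SO_REUSEPORT.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def demo():
    """Try a second bind on a busy port, without and with SO_REUSEPORT."""
    # Default behavior: the second listener on the same port is refused.
    first = bind_listener(0, reuse_port=False)  # port 0 = let the OS pick one
    port = first.getsockname()[1]
    try:
        bind_listener(port, reuse_port=False).close()
        plain_ok = True
    except OSError:
        plain_ok = False
    first.close()

    # With SO_REUSEPORT set on both sockets, the second bind is allowed.
    a = bind_listener(0, reuse_port=True)
    shared_port = a.getsockname()[1]
    try:
        b = bind_listener(shared_port, reuse_port=True)
        b.close()
        reuse_ok = True
    except OSError:
        reuse_ok = False
    a.close()
    return {"plain_second_bind_ok": plain_ok, "reuseport_second_bind_ok": reuse_ok}
```

Even when the second bind succeeds, each new connection goes to whichever listener the kernel picks, which is fine for identical workers but not for one-service-per-process.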
So you have two options in order to get traffic routed to the proper service on a single port. The first option is to run all of the gRPC services in the same process and attached to the same server.
If having only a single server process won't work for you, then you'll have to start looking at load balancing. You'd front all of your services with any HTTP/2-capable load balancer (e.g. Envoy, Nginx) and have it listen on your single desired port and route to each gRPC server process as appropriate.
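For what "route to each gRPC server process as appropriate" means in practice: gRPC puts the service and method name in the HTTP/2 request path, in the form /package.Service/Method, so a proxy can choose a backend from the path alone. The sketch below only illustrates that decision; it is not Envoy or Nginx configuration, and the service names and addresses are hypothetical.

```python
# Hypothetical mapping from fully qualified gRPC service name to backend address.
ROUTES = {
    "billing.Billing": "127.0.0.1:50051",
    "search.Search": "127.0.0.1:50052",
}


def pick_backend(path, routes=ROUTES):
    """Return the backend for a gRPC request path such as '/billing.Billing/Charge'."""
    parts = path.split("/")
    # A well-formed path splits into ['', '<service>', '<method>'].
    if len(parts) != 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError(f"not a gRPC request path: {path!r}")
    service = parts[1]
    if service not in routes:
        raise LookupError(f"no backend registered for service {service!r}")
    return routes[service]
```

A real load balancer expresses the same idea as path-prefix routing rules, one per service.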
As for performance testing, this is a very broad question. The answer is "the way you'd benchmark any non-gRPC server." This site is a great resource for some principles behind benchmarking.
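As a rough starting point, a closed-loop load generator can be sketched with the standard library alone. Everything here is an assumption for illustration: `call` stands for whatever issues one request to your server (for gRPC, a stub method call), and `fake_rpc` is just a placeholder so the sketch runs by itself.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(call):
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


def run_benchmark(call, total_requests, concurrency):
    """Issue total_requests calls from `concurrency` threads and summarize them."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_call(call), range(total_requests)))
    elapsed = time.perf_counter() - started

    latencies.sort()
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "requests": total_requests,
        "requests_per_second": total_requests / elapsed,
        "p50_seconds": statistics.median(latencies),
        "p99_seconds": latencies[p99_index],
    }


def fake_rpc():
    """Placeholder for a real request; sleeps for about a millisecond."""
    time.sleep(0.001)
```

Raising `concurrency` until requests per second stops improving (or the tail latency becomes unacceptable) gives a first estimate of what one server process can take. Run the load generator on a separate machine so it does not compete with the server for CPU.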

Related

Correct way to get a gRPC client to communicate with one of many ECS instances of the gRPC service?

I have a gRPC client, not dockerised, and server application, which I want to dockerise.
What I don't understand is that gRPC first creates a connection with a server, which involves a handshake. So, if I want to deploy the dockerised server on ECS with multiple instances, then how will the client switch from one to the other (e.g., if one gRPC server falls over).
I know AWS load balancers now work with HTTP/2, but I can't find information on how to handle the fact that the server might change after the client has already opened a connection to another one.
What is involved?
You don't necessarily need an in-line load balancer for this. By using a Round Robin client-side load balancing policy along with a DNS record that points to multiple backend instances, you should be able to get some level of redundancy.
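Conceptually, a round-robin policy keeps the list of addresses the DNS record resolved to and rotates through them, skipping any that are currently unreachable. The sketch below shows only that idea; it is not gRPC's implementation, and the class and addresses are hypothetical. In a real gRPC Python client the library does this for you, commonly requested with a `dns:///` target and the channel option `("grpc.lb_policy_name", "round_robin")`.

```python
import itertools


class RoundRobinPicker:
    """Rotate through backend addresses, skipping ones marked as down."""

    def __init__(self, addresses):
        self._addresses = list(addresses)
        if not self._addresses:
            raise ValueError("at least one backend address is required")
        self._cycle = itertools.cycle(self._addresses)
        self._down = set()

    def mark_down(self, address):
        """Record that a backend failed (e.g. its connection dropped)."""
        self._down.add(address)

    def mark_up(self, address):
        """Record that a backend is reachable again."""
        self._down.discard(address)

    def pick(self):
        """Return the next healthy backend, trying each address at most once."""
        for _ in range(len(self._addresses)):
            candidate = next(self._cycle)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy backends available")
```

So when one ECS instance falls over, calls simply go to the remaining addresses; the client does not need to "switch" a single connection from one server to another.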

Do client services need ports?

Recently, I was having a chat with a much more experienced engineer. We had a service on the server that only initiated requests to a partner. I suggested that this service required us to configure a port, and he turned down my suggestion. I believe he said something along the lines of "Since we are not hosting a service that is accessed by anyone, but rather are accessing a partner's service, we don't require a port." It got me thinking: given that we have so many services on the same server, how does the server know that a given response is for this particular service?
Broadly, the server is really acting as a client here, and the ports used for outgoing connections are assigned dynamically by the networking stack.
Under normal conditions, the assigned port
is numbered 1024 or above (ports below 1024 are privileged and normally reserved for root processes)
is not already in use
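You can watch this happen with a few lines of Python. In this sketch a local listener stands in for the partner's service (an assumption so the example runs on its own); the client never chooses a port, yet the OS gives it one, and that port is how replies find their way back to the right process.

```python
import socket


def connect_and_report():
    """Connect a client to a local listener and report the ports involved."""
    # Stand-in for the partner's service: a listener on an OS-chosen port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    server_port = server.getsockname()[1]

    # The "client service": it only names the remote address, never a local port.
    client = socket.create_connection(("127.0.0.1", server_port))
    client_port = client.getsockname()[1]  # assigned by the networking stack

    conn, peer = server.accept()
    report = {
        "server_port": server_port,
        "client_port": client_port,
        "peer_port_seen_by_server": peer[1],
    }
    conn.close()
    client.close()
    server.close()
    return report
```

The server addresses its replies to the client's IP and that dynamically assigned port, so two client services on the same machine never receive each other's responses.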

How should I healthcheck an event-driven service

Suppose I have a service which, rather than listening for HTTP requests or gRPC procedure calls, only consumes messages from a broker (Kafka, RabbitMQ, Google Pub/Sub, what have you). How should I go about healthchecking the service (e.g. k8s liveness and readiness probes)?
Should the service also listen for http solely for the purpose of healthchecking or is there some other technique which can be used ?
Having the service listen on HTTP solely to expose a liveness/readiness check isn't really a problem, and it also opens up the potential to expose diagnostic and control endpoints. (In services that pull their input from a message broker, readiness isn't necessarily something that a container scheduler like k8s would be concerned with.)
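A minimal sketch of such a side-channel endpoint, using only the standard library. The `/healthz` path, the `HealthState` class, and the idea that the consumer loop flips the flag are all assumptions for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthState:
    """Thread-safe flag that the consumer loop updates and the probe reads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._healthy = True

    def set(self, healthy):
        with self._lock:
            self._healthy = healthy

    def get(self):
        with self._lock:
            return self._healthy


def make_handler(state):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":  # hypothetical probe path
                self.send_error(404)
                return
            ok = state.get()
            body = b"ok" if ok else b"unhealthy"
            self.send_response(200 if ok else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of the service's logs

    return HealthHandler


def start_health_server(state, host="127.0.0.1", port=0):
    """Serve the health endpoint from a background thread; returns the server.

    In a container you would bind to 0.0.0.0 and a fixed port so the
    kubelet can reach it; the defaults here are only for local testing.
    """
    server = HTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The consumer loop would call `state.set(False)` when, say, it loses its broker connection and cannot recover, and the probe then starts seeing 503.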
Kubernetes supports three different types of probes, see also Kubernetes docs:
Running a command
Making an HTTP request
Checking a TCP socket
So, in your case you can run a command that fails when your service is unhealthy.
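Also be aware that liveness probes can be dangerous: a failing liveness probe makes Kubernetes restart the container, so a probe that is too strict, or that depends on an external system such as the broker, can cause unnecessary restarts.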
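One common way to build the "command that fails" is a heartbeat file: the consumer loop touches a file after each successful poll, and the probe command exits non-zero when the file is too old. This is only a sketch; the path and the 30-second threshold are hypothetical and would need tuning to your poll interval.

```python
import os
import time

HEARTBEAT_PATH = "/tmp/consumer-heartbeat"  # hypothetical location
MAX_AGE_SECONDS = 30.0                      # hypothetical threshold


def record_heartbeat(path=HEARTBEAT_PATH):
    """Called by the consumer loop after each successful poll of the broker."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def is_healthy(path=HEARTBEAT_PATH, max_age=MAX_AGE_SECONDS, now=None):
    """True if the heartbeat file exists and was updated recently enough."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return False  # no heartbeat file yet, or it is unreadable
    return age <= max_age


def probe_exit_code(path=HEARTBEAT_PATH, max_age=MAX_AGE_SECONDS):
    """Exit status for the probe command: 0 means healthy, 1 means unhealthy."""
    return 0 if is_healthy(path, max_age) else 1
```

The probe itself would be a tiny script that ends with `sys.exit(probe_exit_code())`, wired up as an exec probe. Note that this checks that the loop is still turning, not that the broker is reachable, which keeps a broker outage from restarting every consumer.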

HTTP connection pools to share among processes

Where I work our main web application is served with nginx+uwsgi+Django. A given production box has 80 uwsgi worker processes running on it. Our Django application makes moderately frequent requests to Amazon S3 but, if each of those 80 workers has to use its own HTTP connection for such requests, they're not frequent enough to take advantage of the (relatively short) HTTP Keep-Alive allowed for by Amazon's servers. So, we frequently have to pay a reconnection penalty after the connection is dropped on Amazon's side.
What I would like is if there were a proxy service running on the same box that could "concentrate" the S3 connections from those 80 processes down into a smaller pool of HTTP connections that would get enough use that they would be kept alive. The Django app would connect to the proxy, and the proxy would use its pool of kept-alive connections to forward the requests to S3. I see that it is possible to use nginx itself as a forward proxy, but it's not clear to me if or how this can take advantage of connection pooling the way I have in mind. An ideal solution would be good at auto-scaling so that a uwsgi worker would never have to wait on the proxy itself for a connection, but would pare back connections as load drops so as to keep the connections as "hot" as possible (perhaps keeping 1 or 2 spare to handle occasional upticks).
I've run across other forward proxies such as Squid but these products seem designed to fulfill the more traditional caching proxy role for use by e.g. ISPs that have many disparate remote clients.
Does anyone know of an existing solution for this type of problem? Many thanks!

How can a web server handle multiple user's incoming requests at a time on a single port (80)?

How does a web server handle multiple incoming requests at the same time on a single port(80)?
Example :
At the same time 300k users want to see an image from www.abcdef.com which is assigned IP 10.10.100.100 and port 80. So how can www.abcdef.com handle this incoming users' load?
Can one server (which is assigned with IP 10.10.100.100) handle this vast amount of incoming users? If not, then how can one IP address be assigned to more than one server to handle this load?
A port is just a magic number. It doesn't correspond to a piece of hardware. The server opens a socket that 'listens' at port 80 and 'accepts' new connections from that socket. Each new connection is represented by a new socket whose local port is also port 80, but whose remote IP:port is as per the client who connected. So they don't get mixed up. You therefore don't need multiple IP addresses or even multiple ports at the server end.
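Here is a small sketch of that, with an OS-chosen port standing in for port 80 (binding to 80 itself normally needs root). Two clients connect to one listener; each accepted socket has the same local port but a different remote port.

```python
import socket


def accept_two_clients():
    """Accept two connections on one listener and return their address pairs."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # stand-in for port 80
    listener.listen()
    listen_port = listener.getsockname()[1]

    clients = [socket.create_connection(("127.0.0.1", listen_port)) for _ in range(2)]
    accepted = [listener.accept()[0] for _ in range(2)]

    # Each pair is ((local_ip, local_port), (remote_ip, remote_port)).
    pairs = [(conn.getsockname(), conn.getpeername()) for conn in accepted]

    for sock in accepted + clients + [listener]:
        sock.close()
    return listen_port, pairs
```

The local half of every pair is identical; only the remote half differs, and that is all TCP needs to keep the connections apart.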
From tcpipguide
This identification of connections using both client and server sockets is what provides the flexibility in allowing multiple connections between devices that we take for granted on the Internet. For example, busy application server processes (such as Web servers) must be able to handle connections from more than one client, or the World Wide Web would be pretty much unusable. Since the connection is identified using the client's socket as well as the server's, this is no problem. At the same time that the Web server maintains the connection mentioned just above, it can easily have another connection to say, port 2,199 at IP address 219.31.0.44. This is represented by the connection identifier:
(41.199.222.3:80, 219.31.0.44:2199).
In fact, we can have multiple connections from the same client to the same server. Each client process will be assigned a different ephemeral port number, so even if they all try to access the same server process (such as the Web server process at 41.199.222.3:80), they will all have a different client socket and represent unique connections. This is what lets you make several simultaneous requests to the same Web site from your computer.
Again, TCP keeps track of each of these connections independently, so each connection is unaware of the others. TCP can handle hundreds or even thousands of simultaneous connections. The only limit is the capacity of the computer running TCP, and the bandwidth of the physical connections to it—the more connections running at once, the more each one has to share limited resources.
TCP Takes care of client identification
As a.m. said, TCP takes care of the client identification, and the server only sees a "socket" per client.
Say a server at 10.10.100.100 listens to port 80 for incoming TCP connections (HTTP is built over TCP). A client's browser (at 10.9.8.7) connects to the server using the client port 27143. The server sees: "the client 10.9.8.7:27143 wants to connect, do you accept?". The server app accepts, and is given a "handle" (a socket) to manage all communication with this client, and the handle will always send packets to 10.9.8.7:27143 with the proper TCP headers.
Packets are never simultaneous
Now, physically, there are generally only one or two connections linking the server to the internet, so packets can only arrive in sequential order. The question becomes: what is the maximum throughput through the fiber, and how many responses can the server compute and send in return? Besides the CPU time and memory spent while responding to requests, the server also has to keep some resources alive (at least one active socket per client) until the communication is over, which consumes RAM. Throughput is improved via some optimizations (not mutually exclusive): non-blocking sockets (to avoid pipelining/socket latencies) and multi-threading (to use more CPU cores/threads).
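To illustrate the non-blocking approach, here is a minimal single-threaded echo server built on Python's `selectors` module. It is a sketch under simplifying assumptions: messages are small, so replies are written directly instead of being buffered, and `stop_after` exists only so the loop can end.

```python
import selectors
import socket


def serve_echo(listener, stop_after):
    """Echo data on every connection until `stop_after` clients have closed."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    closed = 0

    while closed < stop_after:
        # One thread waits on all sockets at once instead of blocking on one.
        for key, _events in sel.select(timeout=1.0):
            sock = key.fileobj
            if sock is listener:
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
                continue
            data = sock.recv(4096)
            if data:
                # Simplification: fine for small replies; a real server would
                # queue the data and wait for the socket to become writable.
                sock.sendall(data)
            else:
                sel.unregister(sock)
                sock.close()
                closed += 1
    sel.close()
```

A slow client here never stalls the others, because the loop only touches sockets that already have data waiting.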
Improving request throughput further: load balancing
And last, the servers on the "front side" of websites generally do not do all the work by themselves (especially the more complicated stuff, like database querying, calculations, etc.). They defer tasks or even forward HTTP requests to distributed servers, while they keep handling as many requests per second as they can with only trivial work (e.g. forwarding). Distributing work over several servers is called load balancing.
1) How does a web server handle multiple incoming requests at the same time on a single port (80)?
==> a) One instance of the web service (for example, a Spring Boot microservice) runs on the server machine and listens on port 80.
b) This web service (the Spring Boot app) needs a servlet container, most commonly Tomcat.
This container has a thread pool configured.
c) Whenever requests come in from different users simultaneously, the container
assigns a thread from the pool to each incoming request.
d) Since the beans in the server-side code (in the case of Java) are mostly
singletons, the thread handling each request calls the same singleton APIs.
If database access is needed, these concurrent operations must be coordinated,
which is typically done with the @Transactional annotation. This
annotation runs the database operation inside a transaction.
2) Can one server (which is assigned with IP 10.10.100.100) handle this vast amount of incoming users?
If not, then how can one IP address be assigned to more than one server to handle this load?
==> This is taken care of by a load balancer along with a route table.
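The thread-pool model from point 1 c) can be sketched in a few lines. This is a Python illustration of the idea, not how Tomcat is implemented; `handle`, `serve`, and `max_connections` are made-up names, and the reply format is arbitrary.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Serve one request on its own pool thread, then close the connection."""
    with conn:
        data = conn.recv(4096)
        conn.sendall(b"handled: " + data)


def serve(listener, pool, max_connections):
    """Accept connections on one port and hand each to a thread from the pool."""
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        # The accepting thread goes straight back to accept(); the work
        # happens on whichever pool thread is free.
        pool.submit(handle, conn)
```

The pool size caps how many requests are processed at once; further connections wait in the pool's queue (and, beyond that, in the listener's backlog) until a thread frees up.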
The answer is virtual hosts: the HTTP Host header carries the domain name, so the web server knows which site's files to run or send to the client.

Resources