HTTP connection pools to share among processes - http

Where I work our main web application is served with nginx+uwsgi+Django. A given production box has 80 uwsgi worker processes running on it. Our Django application makes moderately frequent requests to Amazon S3 but, if each of those 80 workers has to use its own HTTP connection for such requests, they're not frequent enough to take advantage of the (relatively short) HTTP Keep-Alive allowed for by Amazon's servers. So, we frequently have to pay a reconnection penalty after the connection is dropped on Amazon's side.
What I would like is if there were a proxy service running on the same box that could "concentrate" the S3 connections from those 80 processes down into a smaller pool of HTTP connections that would get enough use that they would be kept alive. The Django app would connect to the proxy, and the proxy would use its pool of kept-alive connections to forward the requests to S3. I see that it is possible to use nginx itself as a forward proxy, but it's not clear to me if or how this can take advantage of connection pooling the way I have in mind. An ideal solution would be good at auto-scaling so that a uwsgi worker would never have to wait on the proxy itself for a connection, but would pare back connections as load drops so as to keep the connections as "hot" as possible (perhaps keeping 1 or 2 spare to handle occasional upticks).
I've run across other forward proxies such as Squid but these products seem designed to fulfill the more traditional caching proxy role for use by e.g. ISPs that have many disparate remote clients.
Does anyone know of an existing solution for this type of problem? Many thanks!

Related

Correct way to get a gRPC client to communicate with one of many ECS instances of the gRPC service?

I have a gRPC client, not dockerised, and server application, which I want to dockerise.
What I don't understand is that gRPC first creates a connection with a server, which involves a handshake. So, if I want to deploy the dockerised server on ECS with multiple instances, then how will the client switch from one to the other (e.g., if one gRPC server falls over).
I know AWS loadbalancer now works with HTTP 2, but I can't find information on how to handle the fact that the server might change after the client has already opened a connection to another one.
What is involved?
You don't necessarily need an in-line load balancer for this. By using a Round Robin client-side load balancing policy along with a DNS record that points to multiple backend instances, you should be able to get some level of redundancy.

Can we Scale gRPC like worker processes in the same server

Our application is expected to receive thousands of request every second and we are considering gRPC as one of our main service is in a different language.
My queries are
Can we use something like supervisor to spawn multiple workers (one gRPC server per service) as gRPC servers listening to the same port, Or is gRPC servers limited to only one per server/port
How would i go about the performance testing to determine maximum requests per gRPC server.
Thanks in advance
While you can certainly use supervisord to spawn multiple gRPC server processes, port sharing would be a problem. However, this is a Posix limitation, not a gRPC limitation. By default, multiple processes cannot listen on the same port. (to be clear, multiple processes can bind to the same port with SO_REUSEPORT, but this would not result in the behavior you presumably want).
So you have two options in order to get traffic routed to the proper service on a single port. The first option is to run all of the gRPC services in the same process and attached to the same server.
If having only a single server process won't work for you, then you'll have to start looking at load balancing. You'd front all of your services with any HTTP/2-capable load balancer (e.g. Envoy, Nginx) and have it listen on your single desired port and route to each gRPC server process as appropriate.
This is a very broad question. The answer is "the way you'd benchmark any non-gRPC server." This site is a great resource for some principles behind benchmarking.

How can a web server handle multiple user's incoming requests at a time on a single port (80)?

How does a web server handle multiple incoming requests at the same time on a single port(80)?
Example :
At the same time 300k users want to see an image from www.abcdef.com which is assigned IP 10.10.100.100 and port 80. So how can www.abcdef.com handle this incoming users' load?
Can one server (which is assigned with IP 10.10.100.100) handle this vast amount of incoming users? If not, then how can one IP address be assigned to more than one server to handle this load?
A port is just a magic number. It doesn't correspond to a piece of hardware. The server opens a socket that 'listens' at port 80 and 'accepts' new connections from that socket. Each new connection is represented by a new socket whose local port is also port 80, but whose remote IP:port is as per the client who connected. So they don't get mixed up. You therefore don't need multiple IP addresses or even multiple ports at the server end.
From tcpipguide
This identification of connections using both client and server sockets is what provides the flexibility in allowing multiple connections between devices that we take for granted on the Internet. For example, busy application server processes (such as Web servers) must be able to handle connections from more than one client, or the World Wide Web would be pretty much unusable. Since the connection is identified using the client's socket as well as the server's, this is no problem. At the same time that the Web server maintains the connection mentioned just above, it can easily have another connection to say, port 2,199 at IP address 219.31.0.44. This is represented by the connection identifier:
(41.199.222.3:80, 219.31.0.44:2199).
In fact, we can have multiple connections from the same client to the same server. Each client process will be assigned a different ephemeral port number, so even if they all try to access the same server process (such as the Web server process at 41.199.222.3:80), they will all have a different client socket and represent unique connections. This is what lets you make several simultaneous requests to the same Web site from your computer.
Again, TCP keeps track of each of these connections independently, so each connection is unaware of the others. TCP can handle hundreds or even thousands of simultaneous connections. The only limit is the capacity of the computer running TCP, and the bandwidth of the physical connections to it—the more connections running at once, the more each one has to share limited resources.
TCP Takes care of client identification
As a.m. said, TCP takes care of the client identification, and the server only sees a "socket" per client.
Say a server at 10.10.100.100 listens to port 80 for incoming TCP connections (HTTP is built over TCP). A client's browser (at 10.9.8.7) connects to the server using the client port 27143. The server sees: "the client 10.9.8.7:27143 wants to connect, do you accept?". The server app accepts, and is given a "handle" (a socket) to manage all communication with this client, and the handle will always send packets to 10.9.8.7:27143 with the proper TCP headers.
Packets are never simultaneous
Now, physically, there is generally only one (or two) connections linking the server to internet, so packets can only arrive in sequential order. The question becomes: what is the maximum throughput through the fiber, and how many responses can the server compute and send in return. Other than CPU time spent or memory bottlenecks while responding to requests, the server also has to keep some resources alive (at least 1 active socket per client) until the communication is over, and therefore consume RAM. Throughput is achieved via some optimizations (not mutually-exclusive): non-blocking sockets (to avoid pipelining/socket latencies), multi-threading (to use more CPU cores/threads).
Improving request throughput further: load balancing
And last, the server on the "front-side" of websites generally do not do all the work by themselves (especially the more complicated stuff, like database querying, calculations etc.), and defer tasks or even forward HTTP requests to distributed servers, while they keep on handling trivially (e.g. forwarding) as many requests per second as they can. Distribution of work over several servers is called load-balancing.
1) How does a web server handle multiple incoming requests at the same time on a single port(80)
==> a) one instance of the web service( example: spring boot micro service) runs/listens in the server machine at port 80.
b) This webservice(Spring boot app) needs a servlet container like mostly tomcat.
This container will have thread pool configured.
c) when ever request come from different users simultaneously, this container will
assign each thread from the pool for each of the incoming requests.
d) Since the server side web service code will have beans(in case java) mostly
singleton, each thread pert aining to each request will call the singleton API's
and if there is a need for Database access , then synchronization of these
threads is needed which is done through the #transactional annotation. This
annotation synchronizes the database operation.
2) Can one server (which is assigned with IP 10.10.100.100) handle this vast amount of incoming users?
If not, then how can one IP address be assigned to more than one server to handle this load?
==> This will taken care by loadbalancer along with routetable
answer is: virtual hosts, in HTTP Header is name of domain so the web server know which files run or send to client

Round robin load balancing options for a single client

We have a biztalk server that makes frequent calls to a web service that we also host.
The web service is hosted on 4 servers with a DNS load balancer sitting between them. The theory is that each subsequent call to the service will round robin the servers and balance the load.
However this does not work presumably because the result of the DNS lookup is cached for a small amount of time on the client. The result is that we get a flood of requests to each server before it moves on to the next.
Is that presumption correct and what are the alternative options here?
a bit more googling has suggested that I can disable client side caching for DNS: http://support.microsoft.com/kb/318803
...however this states the default cache time is 1 day which is not consistent with my experiences
You need to load balance at a lower level with NLB Clustering on Windows or LVS on Linux (or other equivalent piece of software). If you are letting clients to the web service keep an HTTP connection open for longer than a single request/response you still might not get the granularity of load balancing you are looking for so you might have to reconfigure your application servers if that is the case.
The solution we finally decided to go with was Application Request Routing which is an IIS extension. In tests this has shown to do what we want and is far easier for us (as developers) to get up and running as compared to a hardware load balancer.
http://www.iis.net/download/ApplicationRequestRouting

Can a load balancer recognize when an ASP.NET worker process is restarting and divert traffic?

Say I have a web farm of six IIS 7 web servers, each running an identical ASP.NET application.
They are behind a hardware load balancer (say F5).
Can the load balancer detect when the ASP.NET application worker process is restarting and divert traffic away from that server until after the worker process has restarted?
What happens during an IIS restart is actually a roll-over process. A new IIS worker process starts that accepts new connections, while the old worker process continues to process existing connections.
This means that if you configure your load balancer to use a balancing algorithm other than simple round-robin, that it will tend to "naturally" divert some, but not all, connections away from the machine that's recycling. For example, with a "least connections" algorithm, connections will tend to hang around longer while a server is recycling. Or, with a performance or latency algorithm, the recycling server will suddenly appear slower to the load balancer.
However, unless you write some code or scripts that explicitly detect or recognize a recycle, that's about the best you can do--and in most environments, it's also really all you need.
We use a Cisco CSS 11501 series load balancer. The keepalive "content" we are checking on each server is actually a PHP script.
service ac11-192.168.1.11
ip address 192.168.1.11
keepalive type http
keepalive method get
keepalive uri "/LoadBalancer.php"
keepalive hash "b026324c6904b2a9cb4b88d6d61c81d1"
active
Because it is a dynamic script, we are able to tell it to check various aspects of the server, and return "1" if all is well, or "0" if not all is well.
In your case, you may be able to implement a similar check script that will not work if the ASP.NET application worker process is restarting (or down).
It depends a lot on the polling interval of the load balancer, a request from the balancer has to fail in order before it can decide to divert traffic
IIS 6 and 7 restart application pools every 1740 minutes by default. It does this in an overlapped manner so that services are not impacted.
http://technet.microsoft.com/en-us/library/cc770764%28v=ws.10%29.aspx
http://forums.iis.net/t/1153736.aspx/1
On the other hand, in case of a fault, a good load balancer (I'm sure F5 is) can detect a fault with one of the web servers and send requests to the remaining, healthy web servers. That's a critical part of a high-availability web infrastructure.

Resources