NGINX - HTTPS Load Balancer Configuration

I have created 2 CentOS servers in different zones of the same region and installed NGINX on them.
Created instance groups ig1 and ig2 and added those servers to them.
Created the external load balancer.
I'm able to open the web page using the public static IP, but the result is not as expected.
Is there a round-robin method in the LB config? If yes, how do we achieve it?
I have set the max RPS to 1 on both instance groups and the health-check interval to 1 second.
The requirement is that whenever I refresh the load balancer IP, it should load the page from a different instance. But in practice I have to refresh the page a number of times before it loads from a different instance.
I'm not sure what configuration is missing. Can someone help me with this?

Most load balancers use round-robin distribution by default.
In GCP, the HTTP(S) LB has two methods of determining instance load. Within the backend service resource, the balancingMode property selects between the requests per second (RPS) and CPU utilization modes.
You can override round-robin distribution by configuring session affinity. However, note that session affinity works best if you also set the balancing mode to requests per second (RPS).
Session affinity sends all requests from the same client to the same virtual machine instance as long as the instance stays healthy and has capacity.
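As a rough sketch of the RPS side of this (the names web-backend-service, ig1 and us-central1-a are placeholders for your own backend service, instance group and zone), the balancing mode and per-instance rate can be set with gcloud:

# Put the backend into rate-based (RPS) balancing, capped at 1 request/second per instance
gcloud compute backend-services update-backend web-backend-service \
    --global \
    --instance-group=ig1 \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=1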
================
Now, GCP HTTP(S) LB offers two types of session affinity:
a) Client IP affinity: forwards all requests from the same client IP address to the same instance.
Client IP affinity directs requests from the same client IP address to the same backend instance based on a hash of the client's IP address. Client IP affinity is an option for every GCP load balancer that uses backend services.
But when using client IP affinity, keep the following in mind:
The client IP address as seen by the load balancer might not be the originating client's if the client is behind NAT or makes requests through a proxy. Requests made through NAT or a proxy use the IP address of the NAT router or proxy as the client IP address. This can cause incoming traffic to clump unnecessarily onto the same backend instances.
If a client moves from one network to another, its IP address changes, resulting in broken affinity.
b) Generated cookie affinity: sets a client cookie, then sends all requests with that cookie to the same instance.
When generated cookie affinity is set, the load balancer issues a cookie named GCLB on the first request and then directs each subsequent request that carries the same cookie to the same instance. Cookie-based affinity allows the load balancer to distinguish different clients that use the same IP address, so it can spread those clients across the instances more evenly, and it lets the load balancer maintain instance affinity even when the client's IP address changes.
The path of the cookie is always /, so if there are two backend services on the same hostname that enable cookie-based affinity, the two services are balanced by the same cookie.
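For illustration, session affinity is a property of the backend service and can be switched with gcloud (web-backend-service is a placeholder name):

# Hash of the client IP address decides the backend instance
gcloud compute backend-services update web-backend-service \
    --global \
    --session-affinity=CLIENT_IP

# Or let the load balancer issue the GCLB cookie instead
gcloud compute backend-services update web-backend-service \
    --global \
    --session-affinity=GENERATED_COOKIE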
===========================
Main sources: the "Load distribution algorithm" and "Requests per second" sections of the GCP HTTP(S) load balancing documentation.

Related

How Does Running Multiple Server Instances Work In Regards to the User IP Address

If I'm running multiple web server instances, can a client application (like a user using a web browser) end up on different instances, or would it be routed to the same instance every time? Let's say they duplicate a tab or open a new tab: are those tabs still using the same instance too?
This would be in Azure with IIS/ASP.NET.
When you are using load balancing in any environment, you almost always have the option to set session affinity. It basically means that a client who is directed to server 1 on their first request will always be routed to the same server. Azure provides this flexibility too, without question. Here is the documentation with some details on how to do that configuration.
There are a couple of ways you could configure session affinity. One prominent way is by source IP. So using a different tab or a different browser instance will not make any difference: requests from a client machine will always carry the same IP address and hence will go to the same server.
Here is the Powershell sample to set source IP based affinity:
Set-AzureLoadBalancedEndpoint -ServiceName MyService -LBSetName LBSet1 -Protocol TCP -LocalPort 80 -ProbeProtocolTCP -ProbePort 8080 -LoadBalancerDistribution sourceIP
Here is some detail on a more specific scenario that arises when users access a load-balanced site from behind a company's firewall.

target pools vs backend services vs regional backend service difference?

While exploring Google Cloud Platform's load balancer options, the Advanced menu shows multiple choices which are a bit confusing.
There are multiple backend types:
backend service -> HTTP(S) LB
backend bucket -> HTTP(S) LB
regional backend service -> internal LB
target pools -> TCP LB
Just going through the documentation for target pools and backend services, it looks to me like they have similar parameters to configure, and in the basic menu both are listed as backends.
I understand that target pools are used by TCP forwarding rules, whereas a backend service is used by a URL map (HTTP(S) load balancer).
But are there any other differences between them, or is it just the names?
A backend bucket allows you to use a Google Cloud Storage bucket with HTTP(S) load balancing. It can handle requests for static content. This option is useful for a webpage with static content, and it avoids the cost of the resources an instance would need.
The Backend Service is a centralized service that manages backends, which in turn manage an indeterminate number of instances that handle user requests.
The Target Pools resource defines a group of instances that should receive incoming traffic from forwarding rules. When a forwarding rule directs traffic to a target pool, Google Compute Engine picks an instance from these target pools based on a hash of the source IP and port and the destination IP and port.
This is why they are both listed as backends: in the end they do the same job, but each is specific to a different kind of load balancer. The backend service works with the HTTP(S) load balancer, and target pools are used by forwarding rules.
"A network load balancer (unlike the HTTP(S) load balancer) is a pass-through load balancer. It does not proxy connections from clients." On the same note, target pools are used with forwarding rules, while backend services are used with target proxies. A request is sent to an instance in a target pool "based on a hash of the source IP and port, destination IP and port, and protocol". A backend service has a different mechanism for choosing an instance group, e.g. URL maps.

How do GCP load balancers manage websocket connections?

Clients connect to an API gateway server through a websocket connection. This server just orchestrates a swarm of cloud functions that handle all of the data requesting and transforming. The server is stateful: it holds essential session data that defines, for example, which cloud functions a given user is allowed to request.
This server doesn't use the socket to broadcast data, so socket connections don't interact with each other and will not need to. So all it needs to handle is single-client-to-server communication.
What will happen if I create a bunch of replicas and put a load balancer in front of all of them (regular horizontal scaling)? If a user gets connected to a certain server instance, will the connection stick there, or will the load balancer switch it between instances?
There is a parameter available for the load balancer that allows you to do what you are looking for: session affinity.
"Session affinity, if set, attempts to send all network requests from the same client to the same virtual machine instance."
Actually, even though it seems to be a property of the load balancer itself, you set it while creating target pools and/or backend services. You should check whether this solution can be applied to your particular configuration.
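As a sketch (resource names and the region are placeholders), the affinity setting lives on the target pool or the backend service rather than on the forwarding rule:

# Network LB: pin each client IP to one instance in the target pool
gcloud compute target-pools create ws-pool \
    --region=us-central1 \
    --session-affinity=CLIENT_IP

# HTTP(S) LB: pin clients to an instance via the generated GCLB cookie
gcloud compute backend-services update ws-backend-service \
    --global \
    --session-affinity=GENERATED_COOKIE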

Google cloud HTTP load balancer always returns unhealthy instance for meteor app

I am trying to set up an HTTP load balancer for my Meteor app on Google Cloud. I have the application set up correctly, and I know this because I can visit the IP given by the network load balancer.
However, when I try to set up an HTTP load balancer, the health checks always say that the instances are unhealthy (even though I know they are not). I tried including a route in my application that returns a status 200 and pointing the health check at that route.
Here is exactly what I did, step by step:
Create new instance template/group for the app.
Upload image to google cloud.
Create replication controller and service for the app.
The network load balancer was created automatically. Additionally, there were two firewall rules allowing HTTP/HTTPS traffic on all IPs.
Then I try to create the HTTP load balancer. I create a backend service in the load balancer with all the VMs corresponding to the Meteor app. Then I create a new global forwarding rule. No matter what, the instances are labelled "unhealthy" and the IP from the global forwarding rule returns a "Server Error".
In order to use HTTP load balancing on Google Cloud with Kubernetes, you have to take a slightly different approach than for network load balancing, due to the current lack of built-in support for HTTP balancing.
I suspect you created your service in step 3 with type: LoadBalancer. This won't work properly because of how the LoadBalancer type is implemented, which causes the service to be available only on the network forwarding rule's IP address, rather than on each host's IP address.
What will work, however, is using type: NodePort, which will cause the service to be reachable on the automatically-chosen node port on each host's external IP address. This plays more nicely with the HTTP load balancer. You can then pass this node port to the HTTP load balancer that you create. Once you open up a firewall on the node port, you should be good to go!
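A minimal sketch of those steps, assuming the replication controller is called meteor-app and the Meteor container listens on port 3000 (both names are placeholders):

# Expose the app on a node port instead of type: LoadBalancer
kubectl expose rc meteor-app --type=NodePort --port=80 --target-port=3000

# Look up the node port Kubernetes picked (in 30000-32767 by default)
kubectl describe svc meteor-app | grep NodePort

# Open the node port range to the HTTP(S) LB and its health checkers
gcloud compute firewall-rules create allow-lb-to-nodeport \
    --allow=tcp:30000-32767 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16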
If you want more concrete steps, a walkthrough of how to use HTTP load balancers with Container Engine was actually recently added to GKE's documentation. The same steps should work with normal Kubernetes.
As a final note, now that version 1.0 is out the door, the team is getting back to adding some missing features, including native support for L7 load balancing. We hope to make it much easier for you soon!

How can a web server handle multiple user's incoming requests at a time on a single port (80)?

How does a web server handle multiple incoming requests at the same time on a single port(80)?
Example :
At the same time, 300k users want to see an image from www.abcdef.com, which is assigned IP 10.10.100.100 and port 80. So how can www.abcdef.com handle this incoming user load?
Can one server (which is assigned IP 10.10.100.100) handle this vast number of incoming users? If not, then how can one IP address be assigned to more than one server to handle the load?
A port is just a magic number. It doesn't correspond to a piece of hardware. The server opens a socket that 'listens' at port 80 and 'accepts' new connections from that socket. Each new connection is represented by a new socket whose local port is also port 80, but whose remote IP:port is as per the client who connected. So they don't get mixed up. You therefore don't need multiple IP addresses or even multiple ports at the server end.
From tcpipguide
This identification of connections using both client and server sockets is what provides the flexibility in allowing multiple connections between devices that we take for granted on the Internet. For example, busy application server processes (such as Web servers) must be able to handle connections from more than one client, or the World Wide Web would be pretty much unusable. Since the connection is identified using the client's socket as well as the server's, this is no problem. At the same time that the Web server maintains the connection mentioned just above, it can easily have another connection to say, port 2,199 at IP address 219.31.0.44. This is represented by the connection identifier:
(41.199.222.3:80, 219.31.0.44:2199).
In fact, we can have multiple connections from the same client to the same server. Each client process will be assigned a different ephemeral port number, so even if they all try to access the same server process (such as the Web server process at 41.199.222.3:80), they will all have a different client socket and represent unique connections. This is what lets you make several simultaneous requests to the same Web site from your computer.
Again, TCP keeps track of each of these connections independently, so each connection is unaware of the others. TCP can handle hundreds or even thousands of simultaneous connections. The only limit is the capacity of the computer running TCP, and the bandwidth of the physical connections to it—the more connections running at once, the more each one has to share limited resources.
TCP Takes care of client identification
As a.m. said, TCP takes care of the client identification, and the server only sees a "socket" per client.
Say a server at 10.10.100.100 listens to port 80 for incoming TCP connections (HTTP is built over TCP). A client's browser (at 10.9.8.7) connects to the server using the client port 27143. The server sees: "the client 10.9.8.7:27143 wants to connect, do you accept?". The server app accepts, and is given a "handle" (a socket) to manage all communication with this client, and the handle will always send packets to 10.9.8.7:27143 with the proper TCP headers.
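You can watch this on a Linux server (assuming the ss utility is installed): every established connection shares the same local :80 endpoint and differs only in the remote ip:port.

# List established TCP connections whose local (server-side) port is 80
ss -tn state established '( sport = :80 )'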
Packets are never simultaneous
Now, physically, there are generally only one or two connections linking the server to the internet, so packets can only arrive in sequential order. The question becomes: what is the maximum throughput through the fiber, and how many responses can the server compute and send in return? Other than the CPU time spent or memory bottlenecks while responding to requests, the server also has to keep some resources alive (at least one active socket per client) until the communication is over, and therefore consumes RAM. Throughput is achieved via some optimizations (not mutually exclusive): non-blocking sockets (to avoid pipelining/socket latencies) and multi-threading (to use more CPU cores/threads).
Improving request throughput further: load balancing
And last, the servers on the "front side" of websites generally do not do all the work by themselves (especially the more complicated stuff, like database querying, calculations, etc.); they defer tasks or even forward HTTP requests to distributed servers, while they keep handling the trivial part (e.g. forwarding) of as many requests per second as they can. Distribution of work over several servers is called load balancing.
1) How does a web server handle multiple incoming requests at the same time on a single port (80)?
==> a) One instance of the web service (for example, a Spring Boot microservice) runs and listens on the server machine at port 80.
b) This web service (the Spring Boot app) needs a servlet container, most commonly Tomcat. The container has a thread pool configured.
c) Whenever requests come in from different users simultaneously, the container assigns a thread from the pool to each incoming request.
d) Since the server-side beans (in the Java case) are mostly singletons, each thread pertaining to each request calls the singleton APIs, and where database access is needed, the database work is wrapped in a transaction via the @Transactional annotation.
2) Can one server (which is assigned IP 10.10.100.100) handle this vast number of incoming users? If not, then how can one IP address be assigned to more than one server to handle this load?
==> This is taken care of by a load balancer along with a route table.
Another answer: virtual hosts. The HTTP Host header carries the name of the domain, so the web server knows which site's files to serve or send to the client.
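For example, both of these requests hit the same IP and port, and only the Host header tells the server (e.g. NGINX) which virtual host to serve (the second hostname is just an illustration):

curl -H "Host: www.abcdef.com" http://10.10.100.100/
curl -H "Host: www.example.org" http://10.10.100.100/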
