Load Balancing and TCP Connections

Apologies if this is a very naive question, but in the case of a TCP connection from a client to a backend server behind a load balancer, how does the load balancer know which server to send the data to after the initial connection?
Thanks
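
For an L4 balancer the short answer is flow affinity: every packet of a TCP connection carries the same source/destination 5-tuple, so the balancer can map that tuple to a backend, either by keeping per-flow state or by hashing the tuple. Below is a minimal Python sketch of the hashing idea; the backend IPs and the use of SHA-256 are illustrative assumptions, not any particular product's algorithm.

    import hashlib

    # Hypothetical backend pool.
    BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    def pick_backend(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
        """Deterministically map a connection 5-tuple to one backend.

        Every packet of a given TCP connection carries the same 5-tuple,
        so every packet of that connection lands on the same server.
        """
        key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
        digest = hashlib.sha256(key).digest()
        return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

    # All packets from one client socket map to the same backend:
    print(pick_backend("198.51.100.7", 51344, "203.0.113.10", 443))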

Related

TCP Packet loss after Azure load balancer

Suppose at first we have one VM behind the (external) Azure load balancer. A TCP connection will be routed to VM1:
--connection1 packet--> Azure Load Balancer (20.20.20.20) --connection1 packet--> VM1
Then we add a new VM behind the load balancer:
--connection1 packet--> Azure Load Balancer (20.20.20.20) ----> VM1
                                                           \--> VM2
Normally the connection would still be routed to VM1, since there is connection tracking. But this is not guaranteed, because the Azure load balancer is implemented as a distributed software load balancer, as described in this article.
So the packet might be routed to VM2. The expected behaviour is that the packet would reach VM2, which would reply with a TCP RST to end the connection. But it turns out the packet is dropped before it ever gets inside VM2.
I would like to know why this packet is dropped. Is it because of the NAT?
I have gone through the article, and I am not sure I am correctly spotting the reason:
The reason may lie in Azure's Traffic Manager routing methods. As the article notes, they've used a weighted random load policy. You can look at this link for a reference to the various traffic-routing methods.
I am not completely sure; I am just suggesting that this could be the reason.
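
One way to see why the packet is dropped rather than answered with a RST: a host's dataplane (or the SDN/NAT layer in front of it) typically does stateful connection tracking, and a mid-stream packet that matches no tracked flow and is not a SYN is treated as invalid and silently discarded. A toy model of that rule, under the assumption that Azure's dataplane behaves like a stateful filter (this is not Azure's actual implementation):

    # Toy stateful connection-tracking model (an assumption, for illustration).
    flows = set()  # 5-tuples with established state

    def receive(five_tuple, is_syn):
        if is_syn:
            flows.add(five_tuple)   # a SYN creates new flow state
            return "ACCEPT"
        if five_tuple in flows:
            return "ACCEPT"         # mid-stream packet with matching state
        return "DROP"               # no state and not a SYN: silently dropped

    # VM2 never saw connection1's SYN, so its mid-stream packet is dropped
    # before the TCP stack could reply with a RST:
    print(receive(("1.2.3.4", 51344, "20.20.20.20", 443), is_syn=False))  # DROP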

Network optimizations while web crawling - using UDP and connection pooling?

I'm looking at Donne Martin's design for a web crawler.
It suggests the following network optimizations:
The Crawler Service can improve performance and reduce memory usage by keeping many open connections at a time, referred to as connection pooling
Switching to UDP could also boost performance
I don't understand either suggestion. What does connection pooling have to do with web crawling? Isn't each crawler service opening its own connection to the host it's currently crawling? What good would connection pooling do here?
And about UDP: isn't crawling just issuing HTTP-over-TCP requests to web hosts? How is UDP relevant here?
What does connection pooling have to do with web crawling? Isn't each crawler service opening its own connection to the host it's currently crawling?
I think you are assuming that the crawler will send a request to a host only once. This is not the case: a host may have hundreds of pages that you want to crawl, and opening a new connection for each request is not efficient.
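
As an illustration, here is a sketch using Python's requests library, whose Session object pools and reuses TCP (and TLS) connections per host; the URLs are hypothetical:

    import requests

    # Crawling many pages of one host: the Session reuses the same TCP
    # connection, so the handshake cost is paid once, not 100 times.
    urls = [f"https://example.com/page/{i}" for i in range(100)]  # hypothetical

    with requests.Session() as session:
        for url in urls:
            response = session.get(url, timeout=10)
            # ... parse response.text, extract links, enqueue new URLs ...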
And about UDP: isn't crawling just issuing HTTP-over-TCP requests to web hosts? How is UDP relevant here?
Taken from the book Web Data Mining:
The crawler needs to resolve host names in URLs to IP addresses. The connections to the Domain Name System (DNS) servers for this purpose are one of the major bottlenecks of a naïve crawler, which opens a new TCP connection to the DNS server for each URL. To address this bottleneck, the crawler can take several steps. First, it can use UDP instead of TCP as the transport protocol for DNS requests. While UDP does not guarantee delivery of packets and a request can occasionally be dropped, this is rare. On the other hand, UDP incurs no connection overhead, with a significant speed-up over TCP.
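
To make the "no connection overhead" point concrete, here is a minimal sketch of a DNS A-record query over UDP using only Python's standard library; the resolver address is an assumption, and response parsing is omitted:

    import socket
    import struct

    def dns_query_udp(hostname, resolver="8.8.8.8", timeout=2.0):
        """Send one DNS A-record query over UDP and return the raw reply."""
        # Minimal RFC 1035 query: header (ID, flags, 1 question) + question.
        header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
        qname = b"".join(bytes([len(p)]) + p.encode() for p in hostname.split("."))
        question = qname + b"\x00" + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            # One datagram out, one datagram in: no handshake, no teardown.
            sock.sendto(header + question, (resolver, 53))
            reply, _ = sock.recvfrom(512)
            return reply
        finally:
            sock.close()

    print(len(dns_query_udp("example.com")), "bytes in reply")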

Can HAProxy load balance multiple requests sent through a single TCP connection

Our client establishes a single TCP connection to HAProxy and continues to send messages over it. We would like to know: is there a way to load balance those messages across the services sitting behind HAProxy? Please advise.

How to load balance multiple Netty TCP socket servers with the nginx stream module?

I need to load balance TCP socket connections to multiple netty.io servers.
Nginx 1.9 has the stream module, which supports load balancing TCP sockets.
I tested successfully with one Traccar server: nginx listens on port 5095 and forwards packets to port 5005 of the Traccar server.
But with multiple servers, a problem will happen: DeviceX opens a socket to serverA, but its packets may be sent to serverB.
Please give me advice!
Thank you very much.
I tested with the stream module. I have 3 servers (X, Y, Z) and 3 clients (A, B, C), and I don't see the connection-loss problem I expected.
ClientA opens a socket to serverX, and its traffic is always sent to serverX.
If clientA closes the socket and reopens it, it will connect to serverY, and from then on its traffic is always sent to serverY.
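
The observed behaviour is exactly what a TCP (L4) balancer does: it picks a backend once per connection, at accept time, and then proxies every byte of that connection to the same backend. A stripped-down Python sketch of that per-connection model (round-robin choice; the backend addresses are placeholders, not your actual netty.io servers):

    import itertools
    import socket
    import threading

    # Placeholder backends; substitute your netty.io servers.
    BACKENDS = itertools.cycle([("10.0.0.1", 5005),
                                ("10.0.0.2", 5005),
                                ("10.0.0.3", 5005)])

    def pipe(src, dst):
        """Copy bytes one way until the peer closes."""
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        finally:
            try:
                dst.shutdown(socket.SHUT_WR)  # propagate the half-close
            except OSError:
                pass

    def main(listen_port=5095):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("0.0.0.0", listen_port))
        listener.listen()
        while True:
            client, _ = listener.accept()
            # The backend is chosen once per connection; every packet on
            # this socket then goes to the same server, so DeviceX cannot
            # be switched to serverB mid-connection.
            backend = socket.create_connection(next(BACKENDS))
            threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
            threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

    if __name__ == "__main__":
        main()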

Load on load balancer

In our TCP server deployment, we have a load balancer to which all clients initially connect. The load balancer then gives each of them the actual server IP address to which they are supposed to connect. The client disconnects from the load balancer and opens a TCP connection to the server IP address it was given. Thus, load is distributed among the servers.
This arrangement works perfectly well for thousands of connections. But we are worried whether it would work for millions of connections. Our nightmare is that the load balancer itself would not be able to hand out server IP addresses to all those clients in a timely manner. What are the alternatives here?
It really depends on the load balancer you are using and whether it can keep up; some load balancers can handle millions of L4 connections. Also, I don't think having connections go directly to the server is a good idea: what happens to those connections if the server becomes unavailable? I would keep all traffic going through the load balancer. You could also consider Direct Server Return (DSR), where requests from clients go through the load balancer but responses go directly back to the client, bypassing the load balancer.
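
For reference, here is a minimal sketch of the redirect arrangement the question describes; its per-client cost on the balancer is one accept, one small send, and a close, which is the part that has to keep up at millions of clients (addresses and port are hypothetical):

    import itertools
    import socket

    # Hypothetical pool of backends the redirector hands out.
    BACKENDS = itertools.cycle([b"10.0.0.1:9000", b"10.0.0.2:9000"])

    def serve_redirects(port=8000):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("0.0.0.0", port))
        listener.listen()
        while True:
            client, _ = listener.accept()
            # Per client: one accept, one send, one close. The balancer
            # never carries the long-lived connection itself.
            client.sendall(next(BACKENDS))
            client.close()

    serve_redirects()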
