TCP connections and traffic routing under HTTP load balancing - http

I'm trying to get my head around what type of load (e.g. in terms of network traffic) a load balancer for HTTP traffic can be expected to receive / forward / redirect.
Consider the following two different HTTP traffic scenarios.
Single HTTP request with a payload that significantly exceeds what fits in a single TCP/IP packet (i.e. the request requires many network packets)
Multiple HTTP requests over a single persistent TCP connection
For each of them:
Would all network traffic (e.g. TCP/IP packets) go through the load balancer itself? (i.e. the load balancer "acting like a cable", seeing and forwarding all traffic, so to speak)
Or would the load balancer get the client to establish a direct TCP connection with the IP address of one of the app servers, relieving the LB of the network traffic?
Does this depend on the protocol (e.g. plain HTTP vs. WebSocket)? Or do these protocols require that a TCP connection with the load balancer stays available, with the load balancer receiving and forwarding all IP traffic for the full duration of the WebSocket connection or HTTP request-response exchange?

It depends.
There are application-layer load balancers like HAProxy, where the full HTTP request and response pass through the proxy. There are two separate TCP connections here: one between client and load balancer, and another between load balancer and upstream server. The choice of upstream server can be based on the contents of the HTTP request, for example the Host header and/or path, but also session cookies - to make sure that the same session is always handled by the same upstream server. If the choice depends on the contents of the HTTP request, the connection to the upstream server can only be established after the request has been read, since the target is not known before that. But the request does not need to fit inside a single packet.
There are network- or transport-layer load balancers which do not act on the packet payload at all. Instead the choice of upstream server is usually based on the client IP, so that the same client ends up on the same upstream server. In this case the decision which upstream to use is already made on the first packet (i.e. the SYN starting the TCP handshake), and the client essentially establishes the connection directly with the upstream server - the load balancer only forwards the packets like a router does. The size of the HTTP request does not matter here either, since the TCP payload is not even inspected to make the routing decision.
With a network or transport layer load balancer there can be asymmetric routing, i.e. the response might go a different way and not pass through the load balancer. With application layer load balancing the response goes back through the load balancer instead.
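The application-layer case can be sketched in a few lines of Python. This is a toy forwarder, not a real proxy: the routing table and upstream addresses are made up, and real balancers like HAProxy add connection pooling, streamed bodies, health checks, and timeouts.

```python
import socket

# Hypothetical routing table: Host header -> upstream address.
UPSTREAMS = {
    b"app.example.com": ("10.0.0.1", 8080),
    b"api.example.com": ("10.0.0.2", 8080),
}
DEFAULT_UPSTREAM = ("10.0.0.1", 8080)

def pick_upstream(request: bytes):
    """Choose the upstream from the Host header of a buffered HTTP request.

    The decision can only be made once the header block has been read,
    which may take several TCP segments - hence the buffering below.
    """
    for line in request.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            return UPSTREAMS.get(line.split(b":", 1)[1].strip(), DEFAULT_UPSTREAM)
    return DEFAULT_UPSTREAM

def forward(client_sock: socket.socket) -> None:
    """One proxied exchange over two separate TCP connections."""
    data = b""
    while b"\r\n\r\n" not in data:          # read until the headers are complete
        chunk = client_sock.recv(4096)
        if not chunk:
            return                          # client closed early
        data += chunk
    with socket.create_connection(pick_upstream(data)) as upstream:
        upstream.sendall(data)              # LB -> upstream connection
        client_sock.sendall(upstream.recv(65536))  # response returns via the LB
```

Note that both the request and the response traverse the balancer here, which is exactly the "acting like a cable" behaviour from the question.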

Related

Can Load Balancer be only responsible for TCP handshake and selecting appropriate workers?

Can a load balancer be responsible only for the TCP handshake and for selecting an appropriate worker, with subsequent network flows transferred directly between client and worker? I have searched a lot of information about load balancing, like LVS, but it seems that IP Tunneling and Direct Routing mode can only achieve the following:
The response data can be sent directly from worker to client.
The request data is still relayed by the load balancer, i.e. it is passed from client to load balancer to worker.
What I want to achieve is as follows:
Client sends request to Load Balancer.
Load Balancer assigns a proper worker to Client.
Then the client and worker communicate directly, without going through the load balancer.
The following steps might work, but they seem a little clumsy:
The client sends a request to the load balancer.
The load balancer responds to the client with a proper worker's IP.
The client disconnects from the load balancer and establishes a TCP connection directly to the worker.
Then the client and worker can communicate as expected.
Does anyone have other better ideas?
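The redirect scheme described in the question maps naturally onto an HTTP 307 redirect: the balancer's only job is the assignment, after which the client talks to the worker directly. A rough sketch (worker addresses are hypothetical):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

WORKERS = ["http://10.0.1.1:8080", "http://10.0.1.2:8080"]  # hypothetical pool

def pick_worker(workers, counter):
    # Round-robin assignment; any other policy (least connections,
    # consistent hashing) would slot in here.
    return workers[counter % len(workers)]

class RedirectBalancer(BaseHTTPRequestHandler):
    counter = 0

    def do_GET(self):
        target = pick_worker(WORKERS, RedirectBalancer.counter)
        RedirectBalancer.counter += 1
        self.send_response(307)             # 307 preserves method and body
        self.send_header("Location", target + self.path)
        self.end_headers()

# To try the sketch: HTTPServer(("", 8000), RedirectBalancer).serve_forever()
```

The downside, as the question already notes, is an extra round trip per connection, and the workers' addresses must be directly reachable by clients.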

Load balancer for websockets

I know how load balancers work for http requests. A client opens a connection with the LB, LB forwards the request to the backend servers, LB gets a response and from the same connection it sends the response to the client and closes the connection. I want to know the internal details of load balancer for websockets. How the connections are maintained and how the response is sent to the client. I read many questions on stackoverflow but none of them gave a clear picture of the internal implementation of LB
The LB just routes the connection to a server behind it.
So as long as you keep the connection open, you stay connected to the same server and the LB does not make a new balancing decision.
Depending on the client, on reconnection you could be routed to another server.
I'm not sure how it works when some libraries fall back to JSON-P, though.
Implementations of load balancers vary greatly. There are load balancers that support WebSockets, like F5's BIG-IP (https://support.f5.com/kb/en-us/solutions/public/14000/700/sol14754.html), and LBs that I don't think support WebSockets, like AWS ELB (there is a thread where somebody says they made it work with ELB, but I suppose they added some other component behind the ELB: How do you get Amazon's ELB with HTTPS/SSL to work with Web Sockets?).
Load balancers not only act as terminators of HTTP connections; they can also terminate HTTPS, SSL, and TCP connections. They can implement stickiness based on different parameters, like cookies or origin IP (as F5 does). ELBs use only cookies, either application-generated or LB-generated (both only with HTTP or HTTPS). Stickiness can also be kept for a certain, sometimes configurable, time.
Now, in order to forward data corresponding to WebSockets, they need to terminate and forward connections at the SSL or TCP level (not HTTP or HTTPS), unless they understand the WebSocket protocol (I don't know whether any do). Additionally, they need to keep stickiness to the server with which the connection was opened. This is not possible with ELB but is possible with more capable LBs like BIG-IP.
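The origin-IP stickiness mentioned above can be sketched as a deterministic hash of the client address; the server pool here is hypothetical:

```python
import hashlib

SERVERS = ["10.0.2.1", "10.0.2.2", "10.0.2.3"]  # hypothetical backend pool

def sticky_server(client_ip: str, servers=SERVERS) -> str:
    """Map a client IP to a fixed backend.

    The same client always hashes to the same server (as long as the
    pool is unchanged), so a long-lived WebSocket connection - and any
    reconnect from the same address - lands on the same backend.
    """
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]
```

Cookie-based stickiness works the same way, except the key comes from a cookie the LB set on the first response instead of the source address.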

WebSockets - why is the handshake HTTP? Sharing port 80

I'm not clear on why the handshake for WebSocket is HTTP. Wikipedia says "The handshake resembles HTTP so that servers can handle HTTP connections as well as WebSocket connections on the same port." What is the benefit of this? Once you start communicating over WebSocket you are also using port 80, so why can't the initial handshake be in WebSocket format?
Also, how do you have both WebSocket and HTTP servers listening on port 80? Or is it typically the same application functioning as HTTP and WebSocket servers?
Thanks y'all :)
WebSockets are designed to work almost flawlessly with existing web infrastructure. That is the reason why a WS connection starts as HTTP and then switches to a persistent binary connection.
This way the deployment is simplified. You don't need to modify your router's port forwarding or your server's listen ports... Also, because it starts as HTTP, it can be load balanced in the same way as a normal HTTP request, firewalls are more likely to let the connection through, etc. Last but not least, the HTTP handshake also carries cookies, which is great for integrating with the rest of the app in the same way AJAX does.
Both traditional HTTP request-response and WS can operate on the same port. Basically, the WS client sends an HTTP request asking for "Upgrade: websocket"; if the server accepts WS connections, it replies with an HTTP response indicating "101 Switching Protocols". From that point on the connection remains open and both ends treat it as a binary connection.

Why are HTTP proxies able to support protocols like IRC and FTP?

I understand that a SOCKS proxy only establishes a connection at the TCP level, while an HTTP proxy interprets traffic at the HTTP level. Thus a SOCKS proxy can work for any kind of protocol, while an HTTP proxy can only handle HTTP traffic. But why can an HTTP proxy like Squid support protocols like IRC and FTP? When we use an HTTP proxy for an IRC or FTP connection, what specifically happens? Is any metadata added to the packets when they are sent to the proxy over the HTTP protocol?
An HTTP proxy is able to support high-level protocols other than HTTP if it supports the CONNECT method, which is primarily used for HTTPS connections. Here is the description from the Squid wiki:
The CONNECT method is a way to tunnel any kind of connection through an HTTP proxy. By default, the proxy establishes a TCP connection to the specified server, responds with an HTTP 200 (Connection Established) response, and then shovels packets back and forth between the client and the server, without understanding or interpreting the tunnelled traffic
If the client software supports connecting through an 'HTTP CONNECT'-enabled (HTTPS) proxy, it can use any high-level protocol that works with such a proxy (VPN, SSH, SQL, version control, etc.).
As others have mentioned, the "HTTP CONNECT" method allows you to establish any TCP-based connection via a proxy. This functionality is needed primarily for HTTPS, since there the entire HTTP request is encrypted (so it appears to the proxy as a "meaningless" TCP connection). In other words, an HTTPS session over a proxy, or an SSH/FTPS session over a proxy, will both appear as "encrypted sessions" to the proxy, and it won't be able to tell them apart, so it has to either allow them all or none of them.
During normal operation, the HTTP proxy receives the HTTP request and is "smart enough" to understand it and do high-level things with it (e.g. search its cache to see if it can serve the response without going to the destination server, or consult a whitelist/blacklist to see if the URL is allowed, etc.). In "CONNECT" mode, none of this happens. The proxy establishes a TCP connection to the destination server and simply forwards all traffic from the client to the destination server and all traffic from the destination server to the client. That means any TCP protocol can work (HTTPS, SSH, FTP, even plain HTTP).
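A CONNECT tunnel is simple enough to sketch with raw sockets; the proxy and destination addresses here are placeholders:

```python
import socket

def connect_request(dest_host: str, dest_port: int) -> bytes:
    """The request asking a proxy for a raw TCP tunnel to dest_host:dest_port."""
    target = f"{dest_host}:{dest_port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode()

def open_tunnel(proxy_host, proxy_port, dest_host, dest_port):
    """Open a CONNECT tunnel through an HTTP proxy.

    After the proxy answers 200, the socket carries raw bytes for any
    TCP protocol - TLS, SSH, an IRC session - and the proxy no longer
    interprets the traffic at all.
    """
    sock = socket.create_connection((proxy_host, proxy_port))
    sock.sendall(connect_request(dest_host, dest_port))
    status_line = sock.recv(4096).split(b"\r\n", 1)[0]
    if b" 200" not in status_line:
        sock.close()
        raise OSError(f"proxy refused tunnel: {status_line!r}")
    return sock  # speak the tunnelled protocol directly over this socket
```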

http push - http streaming method with ssl - do proxies interfere with https traffic?

My Question is related to the HTTP Streaming Method for realizing HTTP Server Push:
The "HTTP streaming" mechanism keeps a request open indefinitely. It
never terminates the request or closes the connection, even after the
server pushes data to the client. This mechanism significantly
reduces the network latency because the client and the server do not
need to open and close the connection.
The HTTP streaming mechanism is based on the capability of the server
to send several pieces of information on the same response, without
terminating the request or the connection. This result can be
achieved by both HTTP/1.1 and HTTP/1.0 servers.
The HTTP protocol allows for intermediaries
(proxies, transparent proxies, gateways, etc.) to be involved in
the transmission of a response from server to the client. There
is no requirement for an intermediary to immediately forward a
partial response and it is legal for it to buffer the entire
response before sending any data to the client (e.g., caching
transparent proxies). HTTP streaming will not work with such
intermediaries.
Do I avoid the described problems with proxy servers if I use HTTPS?
HTTPS doesn't go through HTTP proxies as HTTP - that would void its security. An HTTPS connection can be routed via an HTTP proxy (or a plain TCP redirector) using the HTTP CONNECT command, which establishes a transparent tunnel to the destination host. This tunnel is completely opaque to the proxy, and the proxy cannot know what is being transferred, i.e. what has been encrypted by SSL (it can attempt to modify the data flow, but the SSL layer will detect the modification and send an alert and/or close the connection).
Update: for your task you can try to use one of the NULL cipher suites (if the server allows it) to reduce the amount of computation: no encryption is performed, key exchange can be anonymous, etc. (this does not affect the proxy's inability to alter your data).
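For completeness, the "several pieces of information on the same response" mechanism quoted in the question is what HTTP/1.1 chunked transfer encoding provides; a minimal encoder shows the wire format:

```python
def chunk(piece: bytes) -> bytes:
    """Encode one piece in HTTP/1.1 chunked transfer encoding:
    hex length, CRLF, payload, CRLF. A streaming server emits one of
    these per push without closing the response."""
    return b"%x\r\n" % len(piece) + piece + b"\r\n"

def last_chunk() -> bytes:
    """A zero-length chunk terminates the stream."""
    return b"0\r\n\r\n"
```

A buffering intermediary is free to hold these chunks until the terminating chunk arrives, which is why the quoted text warns that streaming can break behind such proxies; inside a CONNECT/TLS tunnel the proxy only sees ciphertext and cannot do that.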
