HAproxy single-arm loadbalancing - networking

I am trying to set up a load-balancing lab for HAProxy in single-arm mode (the frontend IP and the backend servers reside in the same subnet, while the actual clients are always remote). Another requirement is to make client source IPs visible to the backend nodes. Since we load-balance a custom TCP-based app, the option 'source 0.0.0.0 usesrc clientip' seems to be the right choice here. I have also configured the backends to use HAProxy's IP address as their default gateway.
However, strange things happen once I enable this backend option. The connection to the frontend VIP is established properly and the 3-way handshake completes. But when the HAProxy server tries to open the second session to a backend server using the client's spoofed IP, this is exactly what I see:
The proxy sends a SYN with the client's spoofed IP address to one of the backends;
The backend responds normally with a SYN-ACK packet;
The proxy does NOT send the final ACK; it just blindly resends SYN packets after a timeout, with the same outcome;
On the proxy, netstat shows this connection in the SYN_SENT state, so it looks like the proxy server doesn't accept the SYN-ACK packet for some reason.
Any comment would be appreciated.

The source option makes HAProxy bind to a specific IP address before it relays the request to the server. If you just need to load balance servers over TCP/IP (not HTTP), then you do not need this.
Set mode tcp in your frontend and backend, which enables load balancing of TCP-enabled applications.
To forward the client's IP address to the server, can you modify your custom application to support the Proxy Protocol? https://www.haproxy.com/blog/using-haproxy-with-the-proxy-protocol-to-better-secure-your-database/
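If the custom application can be changed, the receiving side of the PROXY protocol is small. The sketch below parses the human-readable version 1 header that HAProxy prepends to the stream when the server line carries the send-proxy keyword. The names ProxyInfo and parse_proxy_v1 are hypothetical helpers made up for this illustration, and the addresses in the docstring are placeholder values.

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    """Addresses carried in a PROXY protocol v1 header (hypothetical helper type)."""
    family: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def parse_proxy_v1(header: bytes) -> Optional[ProxyInfo]:
    """Parse one PROXY protocol v1 line, for example
    b"PROXY TCP4 203.0.113.7 10.0.0.5 51234 9000\r\n".

    Returns None for "PROXY UNKNOWN" (no usable address information).
    Raises ValueError if the header is malformed.
    """
    if not header.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = header[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("missing PROXY signature")
    if len(parts) >= 2 and parts[1] == "UNKNOWN":
        return None
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported or malformed header")
    family, src_ip, dst_ip, src_port, dst_port = parts[1:]
    # ip_address() and int() both raise ValueError on bad input.
    ipaddress.ip_address(src_ip)
    ipaddress.ip_address(dst_ip)
    sport, dport = int(src_port), int(dst_port)
    if not (0 <= sport <= 65535 and 0 <= dport <= 65535):
        raise ValueError("port out of range")
    return ProxyInfo(family, src_ip, dst_ip, sport, dport)
```

The application would read this one line from each new connection before handing the rest of the stream to its normal protocol handler, and use src_ip as the real client address.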

Related

HTTP proxy SSL tunneling relay details

I am trying to wrap my head around the SSL tunneling process that an HTTP proxy performs after receiving the CONNECT method from a client.
Things I can't seem to find or understand in docs, blogs, and RFCs:
1) When setting up the tunnel, are the client-proxy and proxy-destination connections two separate connections or one and the same? E.g. is there a TCP handshake between client and proxy and another between proxy and destination?
2) When starting the SSL handshake, which node (IP address/hostname) does the client target: the proxy or the destination host? Since SSL requires a point-to-point connection to make the authentication work, my feeling is that it should be the destination host. But then again, that wouldn't make sense, since the destination host isn't (directly) accessible from the client's perspective (hence the proxy).
"when setting up the tunnel, are the two connections from client-proxy and proxy-destination two separate connections or just one and the same? E.g. is there a TCP handshake between client-proxy and another between proxy-destination?"
They are two separate connections. The client's TCP connection terminates at the proxy, so the proxy has to open a second TCP connection to the server. There is no way to change an existing TCP connection so that it is connected to a different IP:port.
"when starting the ssl handshake what node is targeted (ip address/hostname) by the client? The proxy or the destination host?"
The SSL handshake is done with the destination host, not the proxy.
"Since ssl requires a point-to-point connection to make the authentication"
It doesn't need a point-to-point connection. It only needs all data to be exchanged unmodified between client and server, which is the case when the proxy simply forwards the data.
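The two answers above can be seen in a client-side sketch: one TCP connection goes to the proxy, and the TLS handshake on that same socket is addressed to the destination host. This is a minimal illustration using Python's standard socket and ssl modules. The function names are hypothetical, and it assumes the proxy accepts an unauthenticated CONNECT.

```python
import socket
import ssl


def build_connect_request(host: str, port: int) -> bytes:
    """Return the plain-text CONNECT request that is sent to the proxy."""
    target = f"{host}:{port}"
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


def open_tls_through_proxy(proxy_host: str, proxy_port: int,
                           dest_host: str, dest_port: int = 443) -> ssl.SSLSocket:
    """Open a TLS session to dest_host through an HTTP proxy (sketch)."""
    # TCP connection 1: client -> proxy. The client never connects to the
    # destination directly.
    sock = socket.create_connection((proxy_host, proxy_port), timeout=10)
    sock.sendall(build_connect_request(dest_host, dest_port))

    # Read the proxy's reply headers. The proxy opens TCP connection 2
    # (proxy -> destination) itself before it answers with a 2xx status.
    reply = b""
    while b"\r\n\r\n" not in reply:
        chunk = sock.recv(4096)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection")
        reply += chunk
    status_parts = reply.split(b"\r\n", 1)[0].split(b" ")
    if len(status_parts) < 2 or not status_parts[1].startswith(b"2"):
        sock.close()
        raise ConnectionError("proxy refused to open the tunnel")

    # The TLS handshake targets the destination: SNI and certificate
    # verification use dest_host. The proxy only relays the bytes.
    context = ssl.create_default_context()
    return context.wrap_socket(sock, server_hostname=dest_host)
```

Note that server_hostname is the destination's name, not the proxy's. That is why certificate validation still authenticates the real server even though every byte passes through the proxy.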

How does firewall handle incoming http traffic to a browser?

When a browser sends a request to a web server, the web server has to send a response.
From what I have understood from reading so far, the server then dispatches the packets of response data with the dest-port/dest-ip parts being the client browser's.
1) If the above is right, then doesn't it mean that the browser always has to be listening on a port for incoming traffic from the server?
2) And if the client is listening for incoming connections on a port, isn't that a security concern?
3) If 2 is right, then how are most corporate firewalls configured for employees (seeing as they probably need to browse the net)? A quick overview will do; details are unnecessary.
"doesn't it mean that the browser has to always be listening to a port for incoming traffic from the server?"
No. Layman's explanation: a browser initiates a TCP connection to the web server. This connection is identified by its source IP and port, destination IP and port, and protocol by every intermediate device that tracks it (e.g. routers, firewalls).
In a TCP connection, one party listens (the web server) while the other party connects (the browser). Traffic can flow over this connection in both directions, until either party (or intermediate machine) closes the connection.
Corporate firewalls allow outbound connections over port 80 (and 443), so their employees can browse the web over HTTP(S). The data the server returns is sent over the connection initiated by the client.
Of course if an outside attacker knows of a connection, they can send packets with a spoofed IP, so they can send data pretending to be the server. Those packets will be dropped if anything is wrong, like the sequence number, so they won't end up in the user's browser.
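The firewall behaviour described above can be modelled in a few lines. This is a toy sketch, not how any particular firewall product is implemented: the Flow and StatefulFirewall names are made up for the illustration, and real firewalls also track TCP state and sequence numbers.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    """One direction of a connection as a packet header would show it."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reply_direction(self) -> "Flow":
        # Swap the endpoints to get the header of a returning packet.
        return Flow(self.proto, self.dst_ip, self.dst_port,
                    self.src_ip, self.src_port)


class StatefulFirewall:
    """Toy model: outbound flows to allowed ports are tracked, and inbound
    packets pass only if they mirror a tracked outbound flow."""

    def __init__(self, allowed_outbound_ports=(80, 443)) -> None:
        self.allowed_outbound_ports = set(allowed_outbound_ports)
        self.established: Set[Flow] = set()

    def outbound(self, flow: Flow) -> bool:
        if flow.dst_port not in self.allowed_outbound_ports:
            return False
        self.established.add(flow)
        return True

    def inbound(self, flow: Flow) -> bool:
        # No listening port is opened for the browser; the reply is matched
        # against the connection the client itself initiated.
        return flow.reply_direction() in self.established
```

With this model, a reply from 198.51.100.20:80 to the client's source port is accepted only after the client has connected out from exactly that port. Everything else is unsolicited and dropped.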

How do browsers detect which HTTP response is theirs?

Given that you have multiple web browsers running, all of which obviously listen on port 80, how would a browser figure out whether an incoming HTTP response originated from its own request? And whether or not to catch the response and show it?
As part of the connection process, a TCP/IP connection is assigned a client port. Browsers do not "listen on port 80"; rather, a browser (the client) initiates a connection to port 80 on the server and waits for a reply on the client port from the server's IP.
After the client port is assigned (locally), each client [TCP/IP] connection is uniquely identified by (server IP, server port, client IP, client port), and the connection (and any response sent over it) can be "connected back" to the correct browser. This same connection-identifying tuple is how a server doesn't confuse multiple requests coming from the same client/IP [1].
HTTP sits on top of the TCP/IP layer and doesn't have to concern itself with mixing up connection streams. (HTTP/2 introduces multiplexing, but that is a different beast and only affects connection from the same browser.)
See The Ephemeral Port Range for an overview:
A TCP/IPv4 connection consists of two endpoints, and each endpoint consists of an IP address and a port number. Therefore, when a client user connects to a server computer, an established connection can be thought of as the 4-tuple of (server IP, server port, client IP, client port). Usually three of the four are readily known -- client machine uses its own IP address and when connecting to a remote service, the server machine's IP address and service port number are required [leaving only the client port unknown and to be automatically assigned].
What is not immediately evident is that when a connection is established that the client side of the connection uses a port number. Unless a client program explicitly requests a specific port number, the port number used is an ephemeral port number. Ephemeral ports are temporary ports assigned by a machine's IP stack, and are assigned from a designated range of ports for this purpose. When the connection terminates, the ephemeral port is available for reuse, although most IP stacks won't reuse that port number until the entire pool of ephemeral ports have been used. So, if the client program reconnects, it will be assigned a different ephemeral port number for its side of the new connection.
See TCP/IP Client (Ephemeral) Ports and Client/Server Application Port Use for an additional gentle explanation:
To know where to send the reply, the server must know the port number the client is using. This [client port] is supplied by the client as the Source Port in the request, and then used by the server as the destination port to send the reply. Client processes don't use well-known or registered ports. Instead, each client process is assigned a temporary port number for its use. This is commonly called an ephemeral port number.
[1] If there are multiple client computers (i.e. different TCP/IP stacks, each assigning possibly-duplicate ephemeral ports) using the same external IP, then something like Network Address Translation must be used so the server still has a unique tuple per connection:
Network address translation (NAT) is a methodology of modifying network address information in Internet Protocol (IP) datagram packet headers while they are in transit across a traffic routing device for the purpose of remapping one IP address space into another.
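The 4-tuple described in this answer can be observed directly with two clients on one machine. The sketch below uses only the standard socket module and a throwaway listener on 127.0.0.1. The function name open_two_clients is hypothetical, and the listener stands in for the web server.

```python
import socket


def open_two_clients():
    """Connect two clients to one local listener and return each client's
    (server IP, server port, client IP, client port) tuple."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(2)
    server_addr = server.getsockname()

    # Neither client asks for a specific local port, so the OS assigns
    # each one an ephemeral port.
    clients = [socket.create_connection(server_addr) for _ in range(2)]
    accepted = [server.accept()[0] for _ in range(2)]
    try:
        return [c.getpeername() + c.getsockname() for c in clients]
    finally:
        for s in clients + accepted + [server]:
            s.close()
```

Calling open_two_clients() returns two tuples that share the server IP, server port, and client IP but differ in the client port, which is what lets the OS hand each response to the right program.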
Thank you all for the answers.
The whole "listening on port 80" thing was my bad; I must have been dizzy last night :D
Anyway, as I have read, HTTP is connectionless:
"browser initiates an HTTP request and after a request is made, the client disconnects from the server and waits for a response. The server process the request and re-establish the connection with the client to send response back."
Therefore the browser does not maintain a connection while waiting for a response, so the answer is not as simple as just sending the response back to the open socket.
Here's the source
Note that browsers do not listen on a specific port to receive HTTP responses. The web server listens on specific ports (usually 80 or 443). The browser opens a connection to the web server and sends its HTTP request over it. The browser does not close the connection before it receives the HTTP response. The web server writes the HTTP response to that open connection.
"Given that you have multiple web browsers running, all which obviously listen on port 80"
Not obvious: just wrong. The HTTP server listens on port 80. The browsers connect to port 80.
"how would a browser figure if an incoming HTTP response was originated by itself?"
Because it comes back on the same connection and socket that was used to send the request.
"And whether or not catch the response and show it?"
Anything that comes back on the connected socket belongs to the guy who connected the socket.
And in any case all this is the function of TCP, not the browser.
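A minimal local demonstration of that point, assuming nothing beyond Python's standard library: the client below never calls listen() or accept(), yet it receives the response, because the server writes it to the very connection the client opened. The function names and the "response to" prefix are made up for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and answer on that same connection."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:          # client finished sending
                break
            request += chunk
        conn.sendall(b"response to " + request)


def request_response(payload: bytes) -> bytes:
    """Send payload to a throwaway local server and return its reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    try:
        # The client only connects out; the reply arrives on this socket.
        with socket.create_connection(listener.getsockname()) as client:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # signal end of request
            reply = b""
            while True:
                data = client.recv(1024)
                if not data:
                    break
                reply += data
            return reply
    finally:
        worker.join(timeout=5)
        listener.close()
```

For instance, request_response(b"GET /") comes back as the bytes "response to GET /" without the client ever having opened a port for incoming connections.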

Can you send outbound request from a VPS if all ports are closed?

Suppose I have a VPS with private networking setup such that the only ports that are open are the port for SSHing into the server and the port that connects the server to other servers on the private network. Can this same server still send requests through the internet and receive back responses? If so, through what 'channel' are the requests/responses being sent/received?
It depends on what the outbound firewall settings are on the server. If the firewall allows all outbound connections then you can connect out to any server on any protocol.
However, depending on the hosting provider, they may limit the ports you can use for outbound connections. Most likely (but not guaranteed) you'll be able to use HTTP (80) and HTTPS (443). It is quite possible that SSH (22) would be open as well. Those three should cover most, if not all, of the needs you would have.
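One way to find out what the provider actually permits is to try an outbound connection from the VPS. The helper below is a small sketch using the standard socket module. The name can_connect and the 3-second default timeout are arbitrary choices for the example.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        # The local (source) port is chosen by the OS; no inbound port
        # needs to be opened on this machine for the connection to work.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Running it against a host you know is up, for ports 80, 443, and 22 in turn, shows which outbound destinations are reachable. A False result cannot distinguish a firewall block from a server that is simply down.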

how can an application use port 80/HTTP without conflicting with browsers?

If I understand right, applications sometimes use HTTP to send messages, since using other ports is liable to cause firewall problems. But how does that work without conflicting with other applications such as web-browsers? In fact how do multiple browsers running at once not conflict? Do they all monitor the port and get notified... can you share a port in this way?
I have a feeling this is a dumb question, but not something I ever thought of before, and in other cases I've seen problems when 2 apps are configured to use the same port.
There are 2 ports: a source port (browser) and a destination port (server). The browser asks the OS for an available source port (let's say it receives 33123) then makes a socket connection to the destination port (usually 80/HTTP, 443/HTTPS).
When the web server receives the request, it sends a response that has 80 as source port and 33123 as destination port.
So if you have 2 browsers concurrently accessing stackoverflow.com, you'd have something like this:
Firefox (localhost:33123) <-----------> stackoverflow.com (69.59.196.211:80)
Chrome (localhost:33124) <-----------> stackoverflow.com (69.59.196.211:80)
Outgoing HTTP requests don't originate from port 80. When an application opens a connection, the OS usually assigns it a port at random. This is the Source port.
Port 80 is for serving HTTP content (by the server, not the client). This is the Destination port.
Each browser uses a different Source port to generate requests. That way, the packets make it back to the correct application.
It is the 5-tuple of (IP protocol, local IP address, local port, remote IP address, remote port) that identifies a connection. Multiple browsers (or in fact a single browser loading multiple pages simultaneously) will each use destination port 80, but the local port (which is allocated by the O/S) is distinct in each case. Therefore there is no conflict.
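The lookup that the 5-tuple makes possible can be sketched as a plain dictionary. This is only a model of the idea, not how any OS implements its connection table. ConnectionTable is a hypothetical name, and 192.0.2.10 is a placeholder local address standing in for "localhost" in the Firefox/Chrome example above.

```python
from typing import Dict, Optional, Tuple

# (protocol, local IP, local port, remote IP, remote port)
FiveTuple = Tuple[str, str, int, str, int]


class ConnectionTable:
    """Toy connection table mapping each 5-tuple to the owning application."""

    def __init__(self) -> None:
        self._owners: Dict[FiveTuple, str] = {}

    def register(self, key: FiveTuple, owner: str) -> None:
        if key in self._owners:
            raise ValueError(f"5-tuple already in use: {key}")
        self._owners[key] = owner

    def deliver(self, proto: str, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
        """Return the application that should receive an incoming packet."""
        # For an incoming packet, the destination is the local endpoint
        # and the source is the remote endpoint.
        return self._owners.get((proto, dst_ip, dst_port, src_ip, src_port))


table = ConnectionTable()
table.register(("tcp", "192.0.2.10", 33123, "69.59.196.211", 80), "Firefox")
table.register(("tcp", "192.0.2.10", 33124, "69.59.196.211", 80), "Chrome")
```

Both entries share remote port 80, yet a packet from 69.59.196.211:80 addressed to local port 33124 resolves to Chrome and one addressed to 33123 resolves to Firefox, so there is no conflict.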
Clients usually pick a port between 1024 and 65535.
How this is handled depends on the operating system. I think Windows clients increment the value for each new connection, while Unix clients pick a random port number.
Some services rely on a static client port, such as NTP (123 UDP).
A browser is a client application that you use in order to see content on a web server which is usually on a different machine.
The web server is the one listening on port 80, not the browser on the client.
You need to be careful in making the distinction between "listening on port 80" and "connecting to port 80".
When you say "applications sometimes use HTTP to send messages, since using other ports is liable to cause firewall problems", you actually mean "applications sometimes send messages to port 80".
The server is listening on port 80, and can accept multiple connections on that port.
The port 80 you're talking about here is the remote port on the server; locally, the browser opens a high port for each connection it establishes.
Each connection has port numbers on both ends: one is called the local port, the other the remote port.
The firewall will allow traffic to the browser's high port because it knows that the connection was established from your computer.

Resources