Port Forward based on condition in Twisted - networking

I am trying to write a port forwarder in Twisted that forwards to port 8000 if an IP address is already in the cache, and to another port (say 4000) if it is not. I have already written the cache, but I am not sure where to add this logic to the port forwarding in Twisted.
Take this simple example:
from twisted.protocols import portforward

class LoggingProxyServer(portforward.ProxyServer):
    def dataReceived(self, data):
        # Pass the data through unchanged; logging would go here.
        portforward.ProxyServer.dataReceived(self, data)

class LoggingProxyFactory(portforward.ProxyFactory):
    protocol = LoggingProxyServer
Which Twisted method do I override to add the cache check?

ProxyServer.connectionMade is responsible for setting up the TCP connection that is outgoing from the proxy process. It uses the host and port attributes of its factory to decide what it's going to use as the destination of that connection attempt.
If you want to vary the behavior of the proxy, that's the code you'll need to override.
You can easily find the IP address of the client which has connected to ProxyServer. The ProxyServer instance has a transport attribute that refers to an ITransport provider (probably an ITCPTransport provider if your proxy is listening for incoming TCP connections).
Transports have methods to tell you the addresses of their two endpoints. getHost tells you the local address and getPeer tells you the remote address.
So, for example, you could write a conditional that had one behavior for all TCP clients with IP addresses starting with a 1 and something else for all TCP clients with other IP addresses:
if self.transport.getPeer().host.startswith("1"):
    ...
else:
    ...

Related

gRPC ServerPort.PickUnused - how will client connect?

When establishing the server-side of grpc you can specify that it automatically chooses an unused port for you. However, if you use this feature then how will the clients know which port to connect to since it is dynamic?
In my particular case, I'll be using local ipc, though I suppose the question can pertain to remote ipc as well.
https://grpc.github.io/grpc/csharp/api/Grpc.Core.ServerPort.html
They must be told.
PickUnused is a convenience to save the server (developer) determining an available port. It does not change nor simplify the client's port determination.
Clients need to know a remote host address (if any) and a socket/port in order to connect to a server.
Host addresses can be looked up (e.g. DNS) and the only solutions for the socket's discovery are:
Static (not PickUnused)
Well-known e.g. HTTP on 80 and HTTP/S on 443
Service/port lookup (the server publishes its port to some service discovery mechanism [1])
Port scanning
[1] Perhaps another gRPC service using PickUnused 😃 Turtles all the way down!
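The lookup option can be illustrated without gRPC. This is a rough sketch using plain Python sockets, where binding to port 0 plays the role of PickUnused and a JSON file stands in for a service discovery mechanism. The function names and the file name are made up for the example.

```python
import json
import os
import socket
import tempfile


def listen_on_unused_port():
    """Bind to port 0 so the OS picks a free port; return (socket, port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    return server, server.getsockname()[1]


def publish_port(registry_path, port):
    """Server side: tell the world which port was picked."""
    with open(registry_path, "w") as f:
        json.dump({"port": port}, f)


def lookup_port(registry_path):
    """Client side: the port must be looked up, it cannot be guessed."""
    with open(registry_path) as f:
        return json.load(f)["port"]


registry = os.path.join(tempfile.mkdtemp(), "service.json")
server, port = listen_on_unused_port()
publish_port(registry, port)

# The client only knows the registry location, not the port itself.
client = socket.create_connection(("127.0.0.1", lookup_port(registry)))
connection, _ = server.accept()

for s in (client, connection, server):
    s.close()
```

For local IPC the registry could be as simple as a file or an environment variable passed to the client process. For remote clients it would have to be something both sides can reach.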

How does a server know which domain name was used?

As far as I know, what we get from a DNS query is an IP address. So if that's true, at the end of the day we are still using IP addresses to connect to the server, and domains are just pretty names for them.
So how does a server know which domain I used to look up that IP address?
How do vhosts work if the domain name is lost during the DNS query?
The Internet works in layers. Each layer uses different kind of parameters to do its work.
Layer 3 is typically IP, the Internet Protocol. It works with IP addresses: each computer has at least one so that it can communicate with other computers. There are in fact two families: version 4 and version 6.
Since multiple services can run on any given computer, you need a layer on top of that, layer 4, that deals with transport. The predominant one is TCP, the Transmission Control Protocol, but there is also UDP. TCP and UDP use ports: a 2-byte integer that, by convention, identifies a specific service or protocol.
For example, HTTP was given port number 80 (completely arbitrary), and HTTPS port 443.
The DNS, which itself uses UDP and TCP (on port 53), makes it possible, among other things, to map a given hostname to one or more IP addresses. These are the typical A and AAAA records. There is also the CNAME record, which maps one domain name to another, and the SRV record, which maps a service (a protocol name plus a transport) to a given hostname and port number.
When one computer connects to another, its first step for all the above is to find out which IP address to use to connect to. It can use the DNS for that. Typically it will get only the IP address, but, depending on the protocol (layer above 4), may also get a port (if using SRV records).
The HTTP world does not use SRV records. So a browser just uses the hardcoded 80 or 443 ports, or the port number appearing in the URL.
Then we are at the transport level, let us say TCP.
The connection is done (since now the remote IP address and port are known) and the protocol above TCP, like HTTP, is free to convey any kind of extra data, such as the hostname that the client initially used (as taken from the URL) to find out the IP address.
This is done through the HTTP Host header; see RFC 2616, section 14.23.
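To make this concrete, here is a small sketch of how a server could pick a site from the Host header alone. The hostnames and directory paths are invented for the example, and the parsing is deliberately simplified (it does not handle IPv6 literals in the header, for instance).

```python
from typing import Optional

# Hypothetical virtual hosts, all served from the same IP address and port.
VHOSTS = {
    "example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}


def host_header(raw_request: bytes) -> Optional[str]:
    """Return the value of the Host header, or None if it is absent."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower()
    return None


def document_root(raw_request: bytes) -> Optional[str]:
    """Choose the site to serve based only on the Host header."""
    host = host_header(raw_request)
    if host is None:
        return None
    hostname = host.split(":")[0]  # drop an optional ":port"
    return VHOSTS.get(hostname)


request = b"GET / HTTP/1.1\r\nHost: blog.example.com\r\nAccept: */*\r\n\r\n"
print(document_root(request))  # /var/www/blog
```

Both hostnames could resolve to the same IP address. The TCP connection looks identical in either case, and only the header tells the server which site was meant.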
Note that if you do things through TLS (which conceptually sits between TCP and HTTP) there is even something else happening: SNI or Server Name Indication.
During the TLS handshake, so before any HTTP headers or content are sent, the client sends the desired hostname in a specific TLS message. Why? So that the server can find out which certificate it should answer with. Otherwise it could not know which hostname is being requested, because that information sits in an HTTP header that does not exist until the TLS handshake is finished.
A webserver will be able to see both the SNI content to find out which certificate to send back and then the host header to find out which VirtualHost (in Apache) section is relevant to the query being processed.
If you are not in HTTP world, then it all depends on the protocol used. Older protocols, like FTP, did not plan for "multihoming" at the beginning, a given IP address meant only one hostname and service for example.

What hostname did the client use to connect to my TCP server?

In HTTP the client supplies the hostname it used to connect to the service. Is there something similar one can do for bare TCP connections? My scenario is that I have a service with multiple open TCP ports, and that works fine, but for convenience I would like to use a single port and subdomains instead. Is there any layer I can add on top (like a load balancer), or should I change the service? I have control over most things, so basically anything goes.
Example:
Today I can connect to two TCP services like so: foobar.com:1001 and foobar.com:1002. Is it possible to have e.g. service-1.foobar.com:1000 go to foobar.com:1001 and service-2.foobar.com:1000 go to foobar.com:1002?
Different services can bind to the same port on different IP addresses. Hence each domain should resolve to a different IP address, with the port being the same for all services. You can then use a proxy such as HAProxy to route connections to their final destination.
If I understand your question correctly, based on your example, then no, it is not possible. In this case, there is no difference between an HTTP and a TCP connection.
In both cases, the hostname is simply resolved to an IP address. If you set up DNS resolution for foobar.com, service-1.foobar.com, and service-2.foobar.com to point to the same IP address, then they will all go to the same machine.
I have at times needed to have a service running on a different port internally than it is accessible externally. For that, if you are running on Linux, you can simply use iptables to do the port forwarding.
You can find other questions and answers about setting up the port forwarding, for example:
https://serverfault.com/questions/140622/how-can-i-port-forward-with-iptables

Why two HTTP and TCP addresses can use the same port and two IPC addresses cannot use the same named pipe?

My understanding of a port is this: whenever a message arrives at a machine, it is copied to a memory area that is mapped to the specified port, and the application or service concerned is notified that a message has arrived for it.
If this is true, then what happens if two messages arrive for two different services listening on the same port (either HTTP or TCP)?
And why can't two named pipe addresses use the same named pipe?
TCP identifies "connections" via a tuple of { local ip, local port, remote ip, remote port }. Therefore, since each incoming connection has a different remote ip/port pair, your local machine can distinguish between them.
HTTP uses TCP for its transport. Thus, an HTTP port is a TCP port.
If you've ever had your machine get a new IP address while you had connections open, you'll note that they break the first time they send any data out since the remote host does not recognize the (new) address and sends a RST response.
A pipe has only its name to distinguish it so there is only one "connection" no matter how many writers it has.
Your description is one way to handle incoming messages.
In the case of two web sites listening on the same port, there is one web server listening on that port, which then looks at the http host header to find the correct web site to forward the request to.
The same is true for named pipes: the RPC listener listens on the TCP port, finds out that an incoming message is a named pipe message, and then forwards it to the right named pipe.

How are different TCP connections in HTTP requests identified?

From what I understand, each HTTP request uses its own TCP connection (please correct me if I'm wrong). So, let's say that there are two concurrent connections to the same server. For example, client-side JavaScript code triggers a couple of AJAX POST requests using the XMLHttpRequest object, one right after the other, before getting the response to the first one. So we're talking about two connections to the same server, each waiting for a response in order to route it to its own callback function.
Now here's the thing that I don't understand: the TCP packet includes source and destination IP and port, but won't both of these connections have the same source and destination IP addresses, and port 80? How can the packets be differentiated and routed appropriately? Does it have anything to do with the packet sequence number, which is different for each connection?
When your browser creates a new connection to the HTTP server, it uses a different source port.
For example, say your browser creates two connections to a server and that your IP address is 60.12.34.56. The first connection might originate from source port 60123 and the second from 60127. This is embedded in the TCP header of each packet sent to the server. When the server replies to each connection, it uses the appropriate port (e.g. 60123 or 60127) so that the packet makes it back to the right spot.
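You can also observe this without a browser. The sketch below opens two connections from one process to a throwaway listener on the loopback interface and compares the source ports. The actual port numbers are chosen by the operating system and will differ from run to run.

```python
import socket

# A throwaway server on the loopback interface; port 0 lets the OS pick.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_addr = server.getsockname()

# Two connections from the same client IP to the same server IP and port.
first = socket.create_connection(server_addr)
second = socket.create_connection(server_addr)

# The server sees each connection with its own (client IP, client port).
accepted = [server.accept() for _ in range(2)]
peer_ports = sorted(addr[1] for _, addr in accepted)
local_ports = sorted(s.getsockname()[1] for s in (first, second))

print("source ports seen by the client:", local_ports)
print("source ports seen by the server:", peer_ports)

for s in [first, second, server] + [conn for conn, _ in accepted]:
    s.close()
```

Both sides report the same pair of port numbers, and the two numbers differ from each other, which is what lets replies find their way back to the right connection.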
One of the best ways to learn about this is to download Wireshark and just observe traffic on your own network. It will show you this and much more.
Additionally, this gives insight into how Network Address Translation (NAT) works on a router. You can have many computers share the same IP address and the router will rewrite the request to use a different port so that two computers can simultaneously connect to places like AOL Instant Messenger.
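The port rewriting can be pictured as a lookup table. The following is a simplified sketch of that idea, not how any particular router implements it: real NAT devices also track the protocol, the destination, and timeouts. The class name, the IP addresses, and the starting port are all assumptions made for illustration.

```python
from itertools import count


class Nat:
    """Toy port-translation table for one shared public IP address."""

    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self._next_port = count(first_port)
        self._out = {}  # (private_ip, private_port) -> public_port
        self._in = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet."""
        key = (private_ip, private_port)
        if key not in self._out:
            public_port = next(self._next_port)
            self._out[key] = public_port
            self._in[public_port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        """Find which private host a reply belongs to, if any."""
        return self._in.get(public_port)


nat = Nat("203.0.113.7")
# Two computers happen to pick the same source port.
a = nat.outbound("192.168.1.10", 60123)
b = nat.outbound("192.168.1.11", 60123)
print(a, b)
print(nat.inbound(b[1]))
```

Even though both computers chose source port 60123, the router hands out distinct public ports, so replies can be mapped back to the right machine.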
They're differentiated by the source port.
Incidentally, not every HTTP request generates a separate TCP connection: with keep-alive (persistent connections), several requests can reuse the same connection.
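Connection reuse is easy to see with the Python standard library. In this sketch, a local test server replies with the client's source port, and two requests sent over one http.client connection come back with the same port, meaning they travelled over the same TCP connection. The handler name is made up, and the server is forced to HTTP/1.1 because http.server otherwise defaults to HTTP/1.0 and closes the connection after each response.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def do_GET(self):
        # Reply with the TCP source port this request arrived from.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
ports = []
for _ in range(2):
    conn.request("GET", "/")
    ports.append(conn.getresponse().read())

conn.close()
server.shutdown()
server.server_close()

print(ports[0] == ports[1])  # True: both requests used one connection
```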
A connection, in packet network communications, is identified by the combination of four elements: server IP, server port, client IP, and client port. The server port is usually fixed by the protocol (e.g. HTTP usually listens on port 80), while the client port is a number chosen by the operating system, usually in the range 1024-65535, because the ports below 1024 are reserved for well-known server protocols (e.g. 21 for FTP, 22 for SSH). A client cannot reuse the same source port for two simultaneous connections to the same server IP and port, and if two different clients happen to use the same source port, the server can tell them apart by their IP addresses. That is how the operating system can dispatch each packet to the correct process once the network card has received it.
