Understanding the SOCKS5 protocol RFC

Understanding the SOCKS5 protocol RFC - tcp

I am reading the SOCKS5 RFC, it has:
CONNECT
In the reply to a CONNECT, BND.PORT contains the port number that the
server assigned to connect to the target host, while BND.ADDR
contains the associated IP address. The supplied BND.ADDR is often
different from the IP address that the client uses to reach the SOCKS
server, since such servers are often multi-homed. It is expected
that the SOCKS server will use DST.ADDR and DST.PORT, and the
client-side source address and port in evaluating the CONNECT
request.
For the last part of this paragraph, I have two questions:
The doc states that SOCKS servers are often multi-homed, and will reply to the client different bound address and port than the ones the client originally connects to. Does this mean the SOCKS server the client connects to redirects the connection to another SOCKS server? If so, what is point of letting the client sense the presence of the redirected SOCKS server? What will a client normally do with the bound address and port the SOCKS server replies?
The doc states It is expected that the SOCKS server will use DST.ADDR and DST.PORT, and the client-side source address and port in evaluating the CONNECT request, what exactly does it mean by evaluating the CONNECT request? What am I supposed to do in this evaluating process if I am implementing a SOCKS server?

SOCKS proxies are multi-homed because they are often installed at network boundaries like firewalls. The client connects to the internal interface of the firewall but the outgoing address is the external face. Since some protocols like FTP need to include the external visible IP address and port in-band (see FTP data transfer, i.e. PORT and PASV) they need to know this externally visible IP and port.
A normal socks proxy will connect where the clients wants to, i.e. DST. But when an upstream proxy is configured or when firewall ACL say different the proxy might behave differently.

No. It means the server has 2 (or more) network cards/connections -- you communicate with the server on cardA, but when that server connects to the device downstream, it uses cardB.
That's up to you really...perhaps you want to blacklist/whitelist certain clients/servers/ports (ex. only allow clients from your country, or only allow connections to a specific country). Good example is not letting a client connect back to itself (?). Just a guess. Usually RFCs are good about saying "MUST, MIGHT, MUST NOT, etc" ..if it says "expected", to me that sounds like 'might' which basically means 'can, but doesn't have to.'

Related

How picky is your average Stateful Firewall system?

Given a scenario:
The Client, described as a user of sorts on a home network with a standard router running NAT/PAT on a residential network. Imagine all traffic within their network is allowed, but the router is configured with a Stateful Firewall that performs basic SPI on incoming and outgoing data.
A Primary server that the client can always reach, and has no security, available on the open internet. Only responds to HTTP, and relays the information (IP:PORT) to the secondary server for later processing
A secondary server that can always contact the primary server, has a fundamentally different physical location and IP address/Network scheme from both the primary and client. It cannot contact the Client until the client contacts the Primary Server to relay its ip and port.
Given this data, my question is this:
If the client sends an HTTP GET request to the primary server, and then immediately takes the ephemeral source port attached to that GET request and sets up an HTTP server with that same source port, after the GET is finished, could the secondary server solicit that harvested port and IP address combination with a GET request and receive a response?
This would be effectively, NAT Traversal using ONLY HTTP. Would the stateful firewall reject the packets from the secondary server given that a different IP is now poking at a port that originally had the primary server poke at? Could multiple computers connect to this now opened server on the client?

How does a server know which domain name was used?

As far as i know what we get from a dns query is a ip address. So in the end of the day if thats true we are still using ip addresses to connect the server and domains are pretty names for them.
So how does a server know which domain i used to query that ip address?
How does vhosts work an understand that if the domain data is lost during dns query?

The Internet works in layers. Each layer uses different kind of parameters to do its work.
Layer 3 is typically IP aka Internet Protocol. To work it uses IP addresses, each computer has at least one to be able to discuss with another one. And there are two families in fact: version 4 and version 6.
Since multiple services can be on any given computer at some point, you need a layer on top of that, layer 4, that deals with transport. The "predominant" one is TCP aka Transport Control Protocol, but there is also UDP. TCP and UDP uses ports: a 2 bytes integer that encodes for a specific protocol.
For example, HTTP was given port number 80 (completely arbitrary), and HTTPS port 443.
The DNS, which itself uses UDP and TCP (on port 53), allows, among other things, to map a given hostname to a given IP address or multiple IP addresses. This is the typical A and AAAA records. There is also a CNAME record that maps one domain name to another. There also exists a SRV record that maps a service (which is a protocol name + a transport) to a given hostname and port number.
When one computer connects to another, its first step for all the above is to find out which IP address to use to connect to. It can use the DNS for that. Typically it will get only the IP address, but, depending on the protocol (layer above 4), may also get a port (if using SRV records).
The HTTP world does not use SRV records. So a browser just uses the hardcoded 80 or 443 ports, or the port number appearing in the URL.
Then we are at the transport level, let us say TCP.
The connection is done (since now the remote IP address and port are known) and the protocol above TCP, like HTTP, is free to convey any kind of extra data, such as the hostname that the client initially used (as taken from the URL) to find out the IP address.
This is done through the HTTP host header, see RFC 2616
Note that if you do things through TLS (which conceptually sits between TCP and HTTP) there is even something else happening: SNI or Server Name Indication.
When doing the TLS handshake, so before any kind of HTTP headers or content, the client will send the final hostname desired in some specific TLS message. Why? So that the server can find which specific certificate it should answer which as otherwhise it would not be able to know which hostname is requested as this sits in some HTTP header which do not exist until the TLS handshake is finished.
A webserver will be able to see both the SNI content to find out which certificate to send back and then the host header to find out which VirtualHost (in Apache) section is relevant to the query being processed.
If you are not in HTTP world, then it all depends on the protocol used. Older protocols, like FTP, did not plan for "multihoming" at the beginning, a given IP address meant only one hostname and service for example.

RPC & TCP Behavior

Can someone describe from a network point of view what RPC (SUN and/or DCE) is and why it deviates from standard TCP behavior?
The way that I understand it is a client reaches out to a server with a unique source port and then switches the source port after the TCP three way handshake finishes. I work with ASA firewalls so this behavior becomes very apparent when the inspection of DCE RPC is not enabled since the firewall will block it because it sees it as a threat. I have read a few MS TechNet articles and other website definitions to include watching about five Youtube videos which all seem to explain it from a programmers perspective but I have yet to fully understand this concept since I am not a programmer.

Note that there is nothing that deviates from standard TCP regarding the RPC protocols.
SunRPC or DCE RPC works on top of UDP(at least SunRPC can use UDP) or on top of TCP.
Typically in order for an RPC client to contact/call an RPCserver, it first contacts some sort of lookup server (called portmapper or rpcbind in the case of SunRPC), which replies with the location (IP address and port number) of where the actual server is running.
So from a networking perspective:
RPC Servers listens on a random port number, which may change each time that server program is (re)started.
At startup the RPC server connects to the portmapper, which runs on a well known port and register itself with which IP address and port number it's listening on.
Normally the portmapper service runs on the same machine as the RPC server programs.
When a client wants to connect to or call an RPC service it performs these actions:
Connects to the portmapper, on a well known/standard destination port and asks it where the particular service it wants to connect to is.
portmapper replies with the IP address and port number of the service the client asked for.
client tears down the connection to the portmapper
client establishes a new connection to the service using the IP address and port number that pormapper gave it.
client calls the RPC service over this new connection, which the client may use for multiple RPC calls.
These RPC calls are just application message exchanged on top of a TCP connection.
(In the case UDP is used instead of TCP, it works much the same, but there's no naturally no connection setup/teardown being performed over the network)
This presents a problem for firewalls, since the servers listens on randomly chosen port numbers, one cannot administratively allow access to a particular port number. Instead a firewall wanting to support this kind of setup would need to open up the portmapper port, catch the RPC messages going to that well known port of portmapper, inspect the message content exchanged with the portmapper to extract the IP address and port number from the RPC messages(the portmapper is itself implemented as an RPC server) in order to dynamically open a port between the RPC server and client.

How the clients (client sockets) are identified?

To my understanding by serverSocket = new ServerSocket(portNumber) we create an object which potentially can "listen" to the indicated port. By clientSocket = serverSocket.accept() we force the server socket to "listen" to its port and to "accept" a connection from any client which tries to connect to the server through the port associated with the server. When I say "client tries to connect to the server" I mean that client program executes "nameSocket = new Socket(serverIP,serverPort)".
If client is trying to connect to the server, the server "accepts" this client (i.e. creates a "client socket" associated with this client).
If a new client tries to connect to the server, the server creates another client socket (associated with the new client). But how the server knows if it is a "new" client or an "old" one which has already its socket? Or, in other words, how the clients are identified? By their IP? By their IP and port? By some "signatures"?
What happens if an "old" client tries to use Socket(serverIP,serverIP) again? Will server create the second socket associated with this client?

The server listens on an address and port. For example, your server's IP address is 10.0.0.1, and it is listening on port 8000.
Your client IP address is 10.0.0.2, and the client "connects" to the server at 10.0.0.1 port 8000. In the TCP connect, you are giving the port of the server that you want to connect to. Your client will actually get its own port number, but you don't control this, and it will be different on each connection. The client chooses the server port that it wants to connect to and not the client port that it is connecting from.
For example, on the first connection, your client may get client-side port 12345. It is connecting from 10.0.0.2 port 12345 to the server 10.0.0.1 port 8000. Your server can see what port the client is connecting from by calling getpeername on its side of the connection.
When the client connects a second time, the port number is going to be different, say port 12377. The server can see this by calling getpeername on the second connection -- it will see a different port number on the client side. (getpeername also shows the client's IP address.)
Also, each time you call accept on the server, you are getting a new socket. You still have the original socket listening, and on each accept you get a new socket. Call getpeername on the accepted socket to see which client port the connection is coming from. If two clients connect to your server, you now have three sockets -- the original listening socket, and the sockets of each of the two clients.
You can have many clients connected to the same server port 8000 at the same time. And, many clients can be connected from the same client port (e.g. port 12345), only not from the same IP address. From the same client IP address, e.g. 10.0.0.2, each client connection to the server port 8000 will be from a unique client port, e.g. 12345, 12377, etc. You can tell the clients apart by their combination of IP address and port.
The same client can also have multiple connections to the server at the same time, e.g. one connection from client port 12345 and another from 12377 at the same time. By client I mean the originating IP address, and not a particular software object. You'll just see two active connections having the same client IP address.
Also, eventually over time, the combination of client-address and client-port can be reused. That is, eventually, you may see a new client come in from 10.0.0.2 port 12345, long after the first client at 10.0.0.2 port 12345 has disconnected.

Every TCP connection has as identifier the quadruple (src port, src address, dest port, dest address).
Whenever your server accepts a new client, a new Socket is created and it's indipendent from every other socket created so far. The identification of clients is not implictly handled somehow..
You don't have to think sockets as associated to "clients", they are associated with an ip and a port, but there is not direct correlation between these two.
If the same client tries to open another socket by creating a new one you'll have two unrelated sockets (because ports will be different for sure). This because the client cannot use the same port to open the new connection so the quadruple will be different, same client ip, same server ip, same server port but different client port.
EDIT for your questions:
clients don't specify a port because it's randomly choosen from the free ones (> 1024 if I'm not wrong) from the underlying operating system
a connection cannot be opened from a client using the same port, the operating system won't let you do that (actually you don't specify any port at all) and in any case it would tell you that port is already bound to a socket so this issue cannot happen.
whenever the server receives a new connection request it's is considered new, because also if ip is the same port will be different for sure (in case of old packet resend or similar caveats I think that the request will be discarded)
By the way all these situations are clearly explained in TCP RFC here.

I think the question here is why do you care if the client is new or old. What is new and old?
For example, a web browser could connect to a web server to request a web page. This will create a connection so serverSocket.accept() will return a new Socket. Then the connection is closed by the web browser.
Afer a couple of minutes, the end used click on a link in the web page and the browser request a new page to the server. This will create a connection so serverSocket.accept() will return a new Socket.
Now, the web server do not care if this is a new or old client. It just need to server the requested page. If the server do care if the "client" already requested a page in the past, it should do so using some information in the protocol used on the socket. Check out http://en.wikipedia.org/wiki/OSI_model
In this case, the ServerSocket and Socket ack on the transport level. The question "does this client already requested a page on the server" should be answered by information on the session or even application layer.
In the web browser/server example, the http protocol (which is an application) protocol hold information about who is this browser in the parameters of the request (the browser transmit cookie informations with every request). The http server can then set/read cookie information to known if the browser connected before and eventually maintain a server side session for that browser.
So back to your question: why do you care if it's a new or old client?

A socket is identified by:
(Local IP,Local Port, Remote IP,
Remote Port,IP Protocol(UDP/TCP/SCTP/etc.)
And that's the information the OS uses to map the packets/data to the right handle/file descriptor of your program. For some kinds of sockets,(e.g. an non-connected UDP socket)the remote port/remote IP might be wildcards.

By definition, this is not a Java related question, but about networking in general, since Sockets and SeverSockets apply to any networking-enabled programming language.
A Socket is bounded to a local-port. The client will open a connection to the server (by the Operating System/drivers/adapters/hardware/line/.../line/hardware/adapters/drivers/Server OS). This "connection" is done by a protocol, called the IP (Internet Protocol) when you are connected to the Internet. When you use "Sockets", it will use another protocol, which is the TCP/IP-protocol.
The Internet Protocol will identify nodes on a network by two things: their IP-address and their port. The TCP/IP-protocol will send messages using the IP, and making sure messages are correctly received.
Now; to answer your question: it all depends! It depends on your drivers, your adapters, your hardware, your line. When you connect to your localhost machine, you will not get further than the adapter. The hardware isn't necessairy, since no data is actually sent over the line. (Though often you need hardware before you can have an adapter.)
By definition, the Internet Protocol defines a connection as pair of nodes (thus four things: two IP-adresses and two ports). Also, the Internet Protocol defines that one node can only use one port at a time to initiate a connection with another node (note: this only applies for the client, not the server).
To answer your second question: if there are two Sockets: the "new" and the "old". Since, by the Internet Protocol, a connection is a pair of nodes, and nodes can only use one port at a time for a connection, the ports of "new" and "old" must be different. And because this is different, the "new" client can be discriminated from the "old", since the port-number is differently.

How are different TCP connections in HTTP requests identified?

From what I understand, each HTTP request uses its own TCP connection (please correct me if i'm wrong). So, let's say that there are two current connections to the same server. For example, client side javascript code triggering a couple of AJAX POST requests using the XMLHttpRequest object, one right after the other, before getting the response to the first one. So we're talking about two connections to the same server, each waiting for a response in order to route it to each separate callback function.
Now here's the thing that I don't understand: The TCP packet includes source and destination ip and port, but won't both of these connections have the same src and dest ip addresses, and port 80? How can the packets be differentiated and routed to appropriately? Does it have anything to do with the packet sequence number which is different for each connection?

When your browser creates a new connection to the HTTP server, it uses a different source port.
For example, say your browser creates two connections to a server and that your IP address is 60.12.34.56. The first connection might originate from source port 60123 and the second from 60127. This is embedded in the TCP header of each packet sent to the server. When the server replies to each connection, it uses the appropriate port (e.g. 60123 or 60127) so that the packet makes it back to the right spot.
One of the best ways to learn about this is to download Wireshark and just observe traffic on your own network. It will show you this and much more.
Additionally, this gives insight into how Network Address Translation (NAT) works on a router. You can have many computers share the same IP address and the router will rewrite the request to use a different port so that two computers can simultaneously connect to places like AOL Instant Messenger.

They're differentiated by the source port.
The main reason for each HTTP request to not generate a separate TCP connection is called keepalives, incidentally.

A socket, in packet network communications, is considered to be the combination of 4 elements: server IP, server port, client IP, client port. The second one is usually fixed in a protocol, e.g. http usually listen in port 80, but the client port is a random number usually in the range 1024-65535. This is because the operating system could use those ports for known server protocols (e.g. 21 for FTP, 22 for SSH, etc.). The same network device can not use the same client port to open two different connections even to different servers and if two different clients use the same port, the server can tell them apart by their IP addresses. If a port is being used in a system either to listen for connection or to establish a connection, it can not be used for anything else. That's how the operating system can dispatch packets to the correct process once received by the network card.