How TCP works from the client point of view - networking

I know how TCP protocols work from the server point of view. NAT is used to know on which local machine is the port number xx reserved.
When I establish a connection with a server, it opens a non-reserved port. How does my livebox, for example, knows I reserved my port and not an other computer ?
Additional questions:
- For the Client/Server, is there one port per socket ? One per processus ? One per Thread ?
- On a server, there is a socket listening on a port, once it accepts a connection, does it open a new port ?

How does my livebox, for example, knows I reserved my port and not an
other computer ?
I assume you are talking if the client is behind a NAT.
Is ever the client to contact the server. So when the client send the first packet to the server through your router(that provide for you NATting). The router save into is NAT table the address and the port of the client, so when i get the answer he know who is the client.
Additional questions: - For the Client/Server, is there one port per
socket ? One per processus ? One per Thread ? - On a server, there is
a socket listening on a port, once it accepts a connection, does it
open a new port ?
I think you are confusing some concepts.
Into the OSI protocol and into TCP/IP protocol there are differents kind of layers.
As you can see TCP (Transmission Control Protocol) is under the Application Layer.
When you are talking of socket you are into the application layer. So all decisions:
"Use same port, bind new port for every incoming request" are made by specific application.
A network socket is an endpoint of a connection across a computer
network. Today, most communication between computers is based on the
Internet Protocol; therefore most network sockets are Internet
sockets. More precisely, a socket is a handle (abstract reference)
that a local program can pass to the networking application
programming interface (API) to use the connection, for example "send
this data on this socket". Sockets are internally often simply
integers, which identify which connection to use.
For more about NAT read here.
For more about Port Forwarding read here.

Related

RPC & TCP Behavior

Can someone describe from a network point of view what RPC (SUN and/or DCE) is and why it deviates from standard TCP behavior?
The way that I understand it is a client reaches out to a server with a unique source port and then switches the source port after the TCP three way handshake finishes. I work with ASA firewalls so this behavior becomes very apparent when the inspection of DCE RPC is not enabled since the firewall will block it because it sees it as a threat. I have read a few MS TechNet articles and other website definitions to include watching about five Youtube videos which all seem to explain it from a programmers perspective but I have yet to fully understand this concept since I am not a programmer.
Note that there is nothing that deviates from standard TCP regarding the RPC protocols.
SunRPC or DCE RPC works on top of UDP(at least SunRPC can use UDP) or on top of TCP.
Typically in order for an RPC client to contact/call an RPCserver, it first contacts some sort of lookup server (called portmapper or rpcbind in the case of SunRPC), which replies with the location (IP address and port number) of where the actual server is running.
So from a networking perspective:
RPC Servers listens on a random port number, which may change each time that server program is (re)started.
At startup the RPC server connects to the portmapper, which runs on a well known port and register itself with which IP address and port number it's listening on.
Normally the portmapper service runs on the same machine as the RPC server programs.
When a client wants to connect to or call an RPC service it performs these actions:
Connects to the portmapper, on a well known/standard destination port and asks it where the particular service it wants to connect to is.
portmapper replies with the IP address and port number of the service the client asked for.
client tears down the connection to the portmapper
client establishes a new connection to the service using the IP address and port number that pormapper gave it.
client calls the RPC service over this new connection, which the client may use for multiple RPC calls.
These RPC calls are just application message exchanged on top of a TCP connection.
(In the case UDP is used instead of TCP, it works much the same, but there's no naturally no connection setup/teardown being performed over the network)
This presents a problem for firewalls, since the servers listens on randomly chosen port numbers, one cannot administratively allow access to a particular port number. Instead a firewall wanting to support this kind of setup would need to open up the portmapper port, catch the RPC messages going to that well known port of portmapper, inspect the message content exchanged with the portmapper to extract the IP address and port number from the RPC messages(the portmapper is itself implemented as an RPC server) in order to dynamically open a port between the RPC server and client.

What is the technology behind port numbers?

I know that port numbers are used for identifying different processes running on a server, so that multiple processes can use the same networking resources. But how does it work internally?
For example, if a request to a website http://www.my-awesome-website.com:80 reaches a server, how does the server know that there is a web server running on port 80? I mean, what does the request pipeline look like between getting the request to finding out that a web server is running on port 80 and forwarding the request to the web server?
Port numbers are merely addresses for some transport-layer protocols, such as TCP and UDP, in the same way that IP addresses are for layer-3 protocols, and MAC addresses are for layer-2 protocols. Not all transport-layer protocols use ports, and each transport-layer protocol independently maintains its ports so that TCP port 80 is not the same as UDP port 80, and each can be used simultaneously by different applications.
Layer-2 addresses are only relevant to the LAN links, layer-3 addresses are only relevant host-to-host over the layer-3 network, and layer-4 addresses are relevant application-to-application.
IANA registers ports and maintains the official registry list at Service Name and Transport Protocol Port Number Registry.
From RFC 793, TRANSMISSION CONTROL PROTOCOL:
Multiplexing:
To allow for many processes within a single Host to use TCP
communication facilities simultaneously, the TCP provides a set of
addresses or ports within each host. Concatenated with the network
and host addresses from the internet communication layer, this forms
a socket. A pair of sockets uniquely identifies each connection.
That is, a socket may be simultaneously used in multiple
connections.
The binding of ports to processes is handled independently by each
Host. However, it proves useful to attach frequently used processes
(e.g., a "logger" or timesharing service) to fixed sockets which are
made known to the public. These services can then be accessed
through the known addresses. Establishing and learning the port
addresses of other processes may involve more dynamic mechanisms.
Connections:
The reliability and flow control mechanisms described above require
that TCPs initialize and maintain certain status information for
each data stream. The combination of this information, including
sockets, sequence numbers, and window sizes, is called a connection.
Each connection is uniquely specified by a pair of sockets
identifying its two sides.
When two processes wish to communicate, their TCP's must first
establish a connection (initialize the status information on each
side). When their communication is complete, the connection is
terminated or closed to free the resources for other uses.
Since connections must be established between unreliable hosts and
over the unreliable internet communication system, a handshake
mechanism with clock-based sequence numbers is used to avoid
erroneous initialization of connections.
After opening a socket(which is like an open file but used for network communications), the user of the socket may use it directly with an ephemeral port(selected by the OS), which is typical if the application is a client application.
What server processes do is to call the bind() socket API call to set a port for the socket, and then call listen() in case of a TCP socket to start listening for incoming connection requests.
Because of the bind() call the OS will know that this particular socket is the one receiving the data sent to the particular port number.
The packets sent over the network contain the source and destination IP addresses as well as the source and destination ports:
http://www.techrepublic.com/article/exploring-the-anatomy-of-a-data-packet/
So the OS has a data structure with open sockets listed by their port numbers and it will pass the received data to the correct socket's input buffer. Sent data will be marked by the port number of the sending socket.

Client Server Architecture

I'm struggle with what technique to choose for a server client aspect of my application.
Defining design
Windows, C# on .net 2
On many machines there is a .net 2 service. I call that the Client.
Machines can be in different networks behind NAT's (or not) connected to Internet.
Server services are public.
Requirements
To communicate with the Clients on demand.
Client must listen for incoming connections.
The server can be or not online.
Port forwarding is not possible.
What are my choices to do something like that?
Now I'm looking in the UDP Hole punching technique. The difference between the UDP hole punching technique setup and my setup is that instead of having 2 clients behind a NAT and a mediation Server, I got only the client behind the NAT that must communicate with the server. That must be easier but I'm having hard time to understand and implement.
I'm on the right way with the this kind of NAT traversal or may be some other methods much easier to implement?
Other methods that I've taken in consideration:
When the service sees the server online, creates a connection to the server using TCP. The problem is that I have something around 200 clients, and the number is rising and I was afraid that this is a resource killer.
When the service sees the server online, checks a database table for commands then at every 30 seconds checks again. This is also a resource killer for my server.
Bottom of line is, if the UDP Hole Punching tehnique is the right way for this scenario, please provide some code ideas for de UDPServer that will run on the service behind NAT.
Thank you.
Hole punching and p2p
You might be interested in a high level discussion of UDP hole punching. Hole punching is needed if you want clients (who both might be behind a firewall) to communicate directly without an relaying server. This is how many peer 2 peer (p2p) communications work.
With p2p, typically NAT'ed clients must use some external server to determine each other's "server reflexive address." When NAT translation occurs, behind the firewall ports can be mapped to some arbitrary port to the public. A client can use a STUN server to determine its "server reflexive address." Clients then will, through an intermediary server, exchange server reflexive addresses and can initiate communication (with hole punching to initiate the session).
Often, a NAT will not behave in a manner to allow direct communication as described above. Sending packets to different destinations will cause a NAT to map ports to entirely different values depending on the destination. In this case, a TURN server is needed.
Links
Nat traversal and different types of NAT behaviors
STUN RFC
TURN RFC
Server-client communication
If your client only needs to communicate with a server, hole punching is not needed. As long as the client can communicate with the public Internet, then you can use any C# socket API (I'm not familiar with C#) to make connections to the server's public IP/port combination. Typically, clients making socket connections don't specify a source port and let the underlying socket API make that decision since it really doesn't matter.
Your server should be listening to a specific port (you make this determination), and when it receives a packet from the client, the source address of the packet will be some NAT'ed address. In other words, the source address will be the public IP of whatever firewall your client is behind. If the NAT changed the source port of the client's packet, the server will see this NAT'ed port as the source port. It really doesn't matter, since when the server sends back a response packet, the NAT on the client machine will translate the destination port (it stores translations internally) and correctly send the packet back to the correct private host (the client).

How the clients (client sockets) are identified?

To my understanding by serverSocket = new ServerSocket(portNumber) we create an object which potentially can "listen" to the indicated port. By clientSocket = serverSocket.accept() we force the server socket to "listen" to its port and to "accept" a connection from any client which tries to connect to the server through the port associated with the server. When I say "client tries to connect to the server" I mean that client program executes "nameSocket = new Socket(serverIP,serverPort)".
If client is trying to connect to the server, the server "accepts" this client (i.e. creates a "client socket" associated with this client).
If a new client tries to connect to the server, the server creates another client socket (associated with the new client). But how the server knows if it is a "new" client or an "old" one which has already its socket? Or, in other words, how the clients are identified? By their IP? By their IP and port? By some "signatures"?
What happens if an "old" client tries to use Socket(serverIP,serverIP) again? Will server create the second socket associated with this client?
The server listens on an address and port. For example, your server's IP address is 10.0.0.1, and it is listening on port 8000.
Your client IP address is 10.0.0.2, and the client "connects" to the server at 10.0.0.1 port 8000. In the TCP connect, you are giving the port of the server that you want to connect to. Your client will actually get its own port number, but you don't control this, and it will be different on each connection. The client chooses the server port that it wants to connect to and not the client port that it is connecting from.
For example, on the first connection, your client may get client-side port 12345. It is connecting from 10.0.0.2 port 12345 to the server 10.0.0.1 port 8000. Your server can see what port the client is connecting from by calling getpeername on its side of the connection.
When the client connects a second time, the port number is going to be different, say port 12377. The server can see this by calling getpeername on the second connection -- it will see a different port number on the client side. (getpeername also shows the client's IP address.)
Also, each time you call accept on the server, you are getting a new socket. You still have the original socket listening, and on each accept you get a new socket. Call getpeername on the accepted socket to see which client port the connection is coming from. If two clients connect to your server, you now have three sockets -- the original listening socket, and the sockets of each of the two clients.
You can have many clients connected to the same server port 8000 at the same time. And, many clients can be connected from the same client port (e.g. port 12345), only not from the same IP address. From the same client IP address, e.g. 10.0.0.2, each client connection to the server port 8000 will be from a unique client port, e.g. 12345, 12377, etc. You can tell the clients apart by their combination of IP address and port.
The same client can also have multiple connections to the server at the same time, e.g. one connection from client port 12345 and another from 12377 at the same time. By client I mean the originating IP address, and not a particular software object. You'll just see two active connections having the same client IP address.
Also, eventually over time, the combination of client-address and client-port can be reused. That is, eventually, you may see a new client come in from 10.0.0.2 port 12345, long after the first client at 10.0.0.2 port 12345 has disconnected.
Every TCP connection has as identifier the quadruple (src port, src address, dest port, dest address).
Whenever your server accepts a new client, a new Socket is created and it's indipendent from every other socket created so far. The identification of clients is not implictly handled somehow..
You don't have to think sockets as associated to "clients", they are associated with an ip and a port, but there is not direct correlation between these two.
If the same client tries to open another socket by creating a new one you'll have two unrelated sockets (because ports will be different for sure). This because the client cannot use the same port to open the new connection so the quadruple will be different, same client ip, same server ip, same server port but different client port.
EDIT for your questions:
clients don't specify a port because it's randomly choosen from the free ones (> 1024 if I'm not wrong) from the underlying operating system
a connection cannot be opened from a client using the same port, the operating system won't let you do that (actually you don't specify any port at all) and in any case it would tell you that port is already bound to a socket so this issue cannot happen.
whenever the server receives a new connection request it's is considered new, because also if ip is the same port will be different for sure (in case of old packet resend or similar caveats I think that the request will be discarded)
By the way all these situations are clearly explained in TCP RFC here.
I think the question here is why do you care if the client is new or old. What is new and old?
For example, a web browser could connect to a web server to request a web page. This will create a connection so serverSocket.accept() will return a new Socket. Then the connection is closed by the web browser.
Afer a couple of minutes, the end used click on a link in the web page and the browser request a new page to the server. This will create a connection so serverSocket.accept() will return a new Socket.
Now, the web server do not care if this is a new or old client. It just need to server the requested page. If the server do care if the "client" already requested a page in the past, it should do so using some information in the protocol used on the socket. Check out http://en.wikipedia.org/wiki/OSI_model
In this case, the ServerSocket and Socket ack on the transport level. The question "does this client already requested a page on the server" should be answered by information on the session or even application layer.
In the web browser/server example, the http protocol (which is an application) protocol hold information about who is this browser in the parameters of the request (the browser transmit cookie informations with every request). The http server can then set/read cookie information to known if the browser connected before and eventually maintain a server side session for that browser.
So back to your question: why do you care if it's a new or old client?
A socket is identified by:
(Local IP,Local Port, Remote IP,
Remote Port,IP Protocol(UDP/TCP/SCTP/etc.)
And that's the information the OS uses to map the packets/data to the right handle/file descriptor of your program. For some kinds of sockets,(e.g. an non-connected UDP socket)the remote port/remote IP might be wildcards.
By definition, this is not a Java related question, but about networking in general, since Sockets and SeverSockets apply to any networking-enabled programming language.
A Socket is bounded to a local-port. The client will open a connection to the server (by the Operating System/drivers/adapters/hardware/line/.../line/hardware/adapters/drivers/Server OS). This "connection" is done by a protocol, called the IP (Internet Protocol) when you are connected to the Internet. When you use "Sockets", it will use another protocol, which is the TCP/IP-protocol.
The Internet Protocol will identify nodes on a network by two things: their IP-address and their port. The TCP/IP-protocol will send messages using the IP, and making sure messages are correctly received.
Now; to answer your question: it all depends! It depends on your drivers, your adapters, your hardware, your line. When you connect to your localhost machine, you will not get further than the adapter. The hardware isn't necessairy, since no data is actually sent over the line. (Though often you need hardware before you can have an adapter.)
By definition, the Internet Protocol defines a connection as pair of nodes (thus four things: two IP-adresses and two ports). Also, the Internet Protocol defines that one node can only use one port at a time to initiate a connection with another node (note: this only applies for the client, not the server).
To answer your second question: if there are two Sockets: the "new" and the "old". Since, by the Internet Protocol, a connection is a pair of nodes, and nodes can only use one port at a time for a connection, the ports of "new" and "old" must be different. And because this is different, the "new" client can be discriminated from the "old", since the port-number is differently.

How does the socket API accept() function work?

The socket API is the de-facto standard for TCP/IP and UDP/IP communications (that is, networking code as we know it). However, one of its core functions, accept() is a bit magical.
To borrow a semi-formal definition:
accept() is used on the server side.
It accepts a received incoming attempt
to create a new TCP connection from
the remote client, and creates a new
socket associated with the socket
address pair of this connection.
In other words, accept returns a new socket through which the server can communicate with the newly connected client. The old socket (on which accept was called) stays open, on the same port, listening for new connections.
How does accept work? How is it implemented? There's a lot of confusion on this topic. Many people claim accept opens a new port and you communicate with the client through it. But this obviously isn't true, as no new port is opened. You actually can communicate through the same port with different clients, but how? When several threads call recv on the same port, how does the data know where to go?
I guess it's something along the lines of the client's address being associated with a socket descriptor, and whenever data comes through recv it's routed to the correct socket, but I'm not sure.
It'd be great to get a thorough explanation of the inner-workings of this mechanism.
Your confusion lies in thinking that a socket is identified by Server IP : Server Port. When in actuality, sockets are uniquely identified by a quartet of information:
Client IP : Client Port and Server IP : Server Port
So while the Server IP and Server Port are constant in all accepted connections, the client side information is what allows it to keep track of where everything is going.
Example to clarify things:
Say we have a server at 192.168.1.1:80 and two clients, 10.0.0.1 and 10.0.0.2.
10.0.0.1 opens a connection on local port 1234 and connects to the server. Now the server has one socket identified as follows:
10.0.0.1:1234 - 192.168.1.1:80
Now 10.0.0.2 opens a connection on local port 5678 and connects to the server. Now the server has two sockets identified as follows:
10.0.0.1:1234 - 192.168.1.1:80
10.0.0.2:5678 - 192.168.1.1:80
Just to add to the answer given by user "17 of 26"
The socket actually consists of 5 tuple - (source ip, source port, destination ip, destination port, protocol). Here the protocol could TCP or UDP or any transport layer protocol. This protocol is identified in the packet from the 'protocol' field in the IP datagram.
Thus it is possible to have to different applications on the server communicating to to the same client on exactly the same 4-tuples but different in protocol field. For example
Apache at server side talking on (server1.com:880-client1:1234 on TCP)
and
World of Warcraft talking on (server1.com:880-client1:1234 on UDP)
Both the client and server will handle this as protocol field in the IP packet in both cases is different even if all the other 4 fields are same.
What confused me when I was learning this, was that the terms socket and port suggest that they are something physical, when in fact they're just data structures the kernel uses to abstract the details of networking.
As such, the data structures are implemented to be able to distinguish connections from different clients. As to how they're implemented, the answer is either a.) it doesn't matter, the purpose of the sockets API is precisely that the implementation shouldn't matter or b.) just have a look. Apart from the highly recommended Stevens books providing a detailed description of one implementation, check out the source in Linux or Solaris or one of the BSD's.
As the other guy said, a socket is uniquely identified by a 4-tuple (Client IP, Client Port, Server IP, Server Port).
The server process running on the Server IP maintains a database (meaning I don't care what kind of table/list/tree/array/magic data structure it uses) of active sockets and listens on the Server Port. When it receives a message (via the server's TCP/IP stack), it checks the Client IP and Port against the database. If the Client IP and Client Port are found in a database entry, the message is handed off to an existing handler, else a new database entry is created and a new handler spawned to handle that socket.
In the early days of the ARPAnet, certain protocols (FTP for one) would listen to a specified port for connection requests, and reply with a handoff port. Further communications for that connection would go over the handoff port. This was done to improve per-packet performance: computers were several orders of magnitude slower in those days.

Resources