How does TCP/Application layer identifies the destination port number? - networking

When the application layer sends the data to the Transport layer to deliver to the server, how does it know which port number to communicate to?
Precisely, the TCP segment contains as a header the destination port no., how does it determine it?

The application has to be told. Either the port is a standard port listed in etc/services, in which case the getaddrinfo() API tells you, or else it is provided via the application's configuration, or it's hard-wired into the source code.

The application establishes the port number when it creates a socket connection to the server. The socket knows which local IP/Port it is bound to and which remote IP/Port it is connected to. Those values are used whenever data is sent using that socket. The transport layer knows which values to put in the IP and TCP headers.

Related

destination port in UDP protocol

My question is that how the destination port address in UDP is chosen/given?
I mean what matters to set a destination port in a UDP packet?
Because when we send a packet, just the destination address(ip) is important and we want to send data to our destination.
It has nothing to do with the port!
Do we assign a random port?
Typically, whatever documentation tells you what to put in the UDP datagram you're sending should also tell you what port to send it to.
For example, if you're trying to talk to an NTP server, RFC5905 tells you what to put in the UDP datagrams you send. It also tells you, on page 16, to send it to port 123.
If you're writing a DNS resolver, RFC1035 is one place you might look for the information needed to know what to put in your UDP datagrams. It also tells you, in section 4.2, to send the datagrams to port 53.
So however you're figuring out what to put in the UDP datagrams you're going to send, that's typically what tells you either what port to send them to or, in some cases, how to determine what port to send them to.
For example, a media streaming protocol might start with the information about the stream being delivered by a web server. In that case, the information delivered by the web server to the client might include the destination port to send datagrams to.
Generally, there's either a well-known port that at least one side listens for datagrams on or there's some external method using a different protocol that tells whichever end sends the first datagram what port to send it to. The other end then just replies, sending its response datagrams to whatever port that first datagram was sent from.
Generally, the sending port is chose randomly for the ephemeral ports available.
The destination port is the port to which the destination application is listening. To facilitate this, IANA maintains the Service Name and Transport Protocol Port Number Registry for standard applications and protocols.
If you create your own application or protocol, there is a range for you to use, but you should always check the registry to make sure you will not step on some other application or protocol.
When you design your listening application or protocol, you choose a port on which it listens, and the sending application will need to send to that port.

Why two HTTP and TCP addresses can use the same port and two IPC addresses cannot use the same named pipe?

What I think of a port is: Whenever a message arrives to a machine, it is copied to a memory area which is mapped to the port specified and the concerned application or service is notified that a message has arrived for it.
If this is true, then what happens if two messages arrive for two different services listening on the same port ? ( either http or tcp )
And why can not two named pipe addresses use the same named pipe ?
TCP identifies "connections" via a tuple of { local ip, local port, remote ip, remote port }. Therefore, since each incoming connection has a different remote ip/port pair, your local machine can distinguish between them.
HTTP uses TCP for its transport. Thus, an HTTP port is a TCP port.
If you've ever had your machine get a new IP address while you had connections open, you'll note that they break the first time they send any data out since the remote host does not recognize the (new) address and sends a RST response.
A pipe has only its name to distinguish it so there is only one "connection" no matter how many writers it has.
Your description is one way to handle incoming messages.
In the case of two web sites listening on the same port, there is one web server listening on that port, which then looks at the http host header to find the correct web site to forward the request to.
The same is true for named pipes, the RPC listener listens on the TCP port, and then finds out that it is a named pipe message and then forwards the message to the right named pipe.

How the clients (client sockets) are identified?

To my understanding by serverSocket = new ServerSocket(portNumber) we create an object which potentially can "listen" to the indicated port. By clientSocket = serverSocket.accept() we force the server socket to "listen" to its port and to "accept" a connection from any client which tries to connect to the server through the port associated with the server. When I say "client tries to connect to the server" I mean that client program executes "nameSocket = new Socket(serverIP,serverPort)".
If client is trying to connect to the server, the server "accepts" this client (i.e. creates a "client socket" associated with this client).
If a new client tries to connect to the server, the server creates another client socket (associated with the new client). But how the server knows if it is a "new" client or an "old" one which has already its socket? Or, in other words, how the clients are identified? By their IP? By their IP and port? By some "signatures"?
What happens if an "old" client tries to use Socket(serverIP,serverIP) again? Will server create the second socket associated with this client?
The server listens on an address and port. For example, your server's IP address is 10.0.0.1, and it is listening on port 8000.
Your client IP address is 10.0.0.2, and the client "connects" to the server at 10.0.0.1 port 8000. In the TCP connect, you are giving the port of the server that you want to connect to. Your client will actually get its own port number, but you don't control this, and it will be different on each connection. The client chooses the server port that it wants to connect to and not the client port that it is connecting from.
For example, on the first connection, your client may get client-side port 12345. It is connecting from 10.0.0.2 port 12345 to the server 10.0.0.1 port 8000. Your server can see what port the client is connecting from by calling getpeername on its side of the connection.
When the client connects a second time, the port number is going to be different, say port 12377. The server can see this by calling getpeername on the second connection -- it will see a different port number on the client side. (getpeername also shows the client's IP address.)
Also, each time you call accept on the server, you are getting a new socket. You still have the original socket listening, and on each accept you get a new socket. Call getpeername on the accepted socket to see which client port the connection is coming from. If two clients connect to your server, you now have three sockets -- the original listening socket, and the sockets of each of the two clients.
You can have many clients connected to the same server port 8000 at the same time. And, many clients can be connected from the same client port (e.g. port 12345), only not from the same IP address. From the same client IP address, e.g. 10.0.0.2, each client connection to the server port 8000 will be from a unique client port, e.g. 12345, 12377, etc. You can tell the clients apart by their combination of IP address and port.
The same client can also have multiple connections to the server at the same time, e.g. one connection from client port 12345 and another from 12377 at the same time. By client I mean the originating IP address, and not a particular software object. You'll just see two active connections having the same client IP address.
Also, eventually over time, the combination of client-address and client-port can be reused. That is, eventually, you may see a new client come in from 10.0.0.2 port 12345, long after the first client at 10.0.0.2 port 12345 has disconnected.
Every TCP connection has as identifier the quadruple (src port, src address, dest port, dest address).
Whenever your server accepts a new client, a new Socket is created and it's indipendent from every other socket created so far. The identification of clients is not implictly handled somehow..
You don't have to think sockets as associated to "clients", they are associated with an ip and a port, but there is not direct correlation between these two.
If the same client tries to open another socket by creating a new one you'll have two unrelated sockets (because ports will be different for sure). This because the client cannot use the same port to open the new connection so the quadruple will be different, same client ip, same server ip, same server port but different client port.
EDIT for your questions:
clients don't specify a port because it's randomly choosen from the free ones (> 1024 if I'm not wrong) from the underlying operating system
a connection cannot be opened from a client using the same port, the operating system won't let you do that (actually you don't specify any port at all) and in any case it would tell you that port is already bound to a socket so this issue cannot happen.
whenever the server receives a new connection request it's is considered new, because also if ip is the same port will be different for sure (in case of old packet resend or similar caveats I think that the request will be discarded)
By the way all these situations are clearly explained in TCP RFC here.
I think the question here is why do you care if the client is new or old. What is new and old?
For example, a web browser could connect to a web server to request a web page. This will create a connection so serverSocket.accept() will return a new Socket. Then the connection is closed by the web browser.
Afer a couple of minutes, the end used click on a link in the web page and the browser request a new page to the server. This will create a connection so serverSocket.accept() will return a new Socket.
Now, the web server do not care if this is a new or old client. It just need to server the requested page. If the server do care if the "client" already requested a page in the past, it should do so using some information in the protocol used on the socket. Check out http://en.wikipedia.org/wiki/OSI_model
In this case, the ServerSocket and Socket ack on the transport level. The question "does this client already requested a page on the server" should be answered by information on the session or even application layer.
In the web browser/server example, the http protocol (which is an application) protocol hold information about who is this browser in the parameters of the request (the browser transmit cookie informations with every request). The http server can then set/read cookie information to known if the browser connected before and eventually maintain a server side session for that browser.
So back to your question: why do you care if it's a new or old client?
A socket is identified by:
(Local IP,Local Port, Remote IP,
Remote Port,IP Protocol(UDP/TCP/SCTP/etc.)
And that's the information the OS uses to map the packets/data to the right handle/file descriptor of your program. For some kinds of sockets,(e.g. an non-connected UDP socket)the remote port/remote IP might be wildcards.
By definition, this is not a Java related question, but about networking in general, since Sockets and SeverSockets apply to any networking-enabled programming language.
A Socket is bounded to a local-port. The client will open a connection to the server (by the Operating System/drivers/adapters/hardware/line/.../line/hardware/adapters/drivers/Server OS). This "connection" is done by a protocol, called the IP (Internet Protocol) when you are connected to the Internet. When you use "Sockets", it will use another protocol, which is the TCP/IP-protocol.
The Internet Protocol will identify nodes on a network by two things: their IP-address and their port. The TCP/IP-protocol will send messages using the IP, and making sure messages are correctly received.
Now; to answer your question: it all depends! It depends on your drivers, your adapters, your hardware, your line. When you connect to your localhost machine, you will not get further than the adapter. The hardware isn't necessairy, since no data is actually sent over the line. (Though often you need hardware before you can have an adapter.)
By definition, the Internet Protocol defines a connection as pair of nodes (thus four things: two IP-adresses and two ports). Also, the Internet Protocol defines that one node can only use one port at a time to initiate a connection with another node (note: this only applies for the client, not the server).
To answer your second question: if there are two Sockets: the "new" and the "old". Since, by the Internet Protocol, a connection is a pair of nodes, and nodes can only use one port at a time for a connection, the ports of "new" and "old" must be different. And because this is different, the "new" client can be discriminated from the "old", since the port-number is differently.

How are different TCP connections in HTTP requests identified?

From what I understand, each HTTP request uses its own TCP connection (please correct me if i'm wrong). So, let's say that there are two current connections to the same server. For example, client side javascript code triggering a couple of AJAX POST requests using the XMLHttpRequest object, one right after the other, before getting the response to the first one. So we're talking about two connections to the same server, each waiting for a response in order to route it to each separate callback function.
Now here's the thing that I don't understand: The TCP packet includes source and destination ip and port, but won't both of these connections have the same src and dest ip addresses, and port 80? How can the packets be differentiated and routed to appropriately? Does it have anything to do with the packet sequence number which is different for each connection?
When your browser creates a new connection to the HTTP server, it uses a different source port.
For example, say your browser creates two connections to a server and that your IP address is 60.12.34.56. The first connection might originate from source port 60123 and the second from 60127. This is embedded in the TCP header of each packet sent to the server. When the server replies to each connection, it uses the appropriate port (e.g. 60123 or 60127) so that the packet makes it back to the right spot.
One of the best ways to learn about this is to download Wireshark and just observe traffic on your own network. It will show you this and much more.
Additionally, this gives insight into how Network Address Translation (NAT) works on a router. You can have many computers share the same IP address and the router will rewrite the request to use a different port so that two computers can simultaneously connect to places like AOL Instant Messenger.
They're differentiated by the source port.
The main reason for each HTTP request to not generate a separate TCP connection is called keepalives, incidentally.
A socket, in packet network communications, is considered to be the combination of 4 elements: server IP, server port, client IP, client port. The second one is usually fixed in a protocol, e.g. http usually listen in port 80, but the client port is a random number usually in the range 1024-65535. This is because the operating system could use those ports for known server protocols (e.g. 21 for FTP, 22 for SSH, etc.). The same network device can not use the same client port to open two different connections even to different servers and if two different clients use the same port, the server can tell them apart by their IP addresses. If a port is being used in a system either to listen for connection or to establish a connection, it can not be used for anything else. That's how the operating system can dispatch packets to the correct process once received by the network card.

How does the socket API accept() function work?

The socket API is the de-facto standard for TCP/IP and UDP/IP communications (that is, networking code as we know it). However, one of its core functions, accept() is a bit magical.
To borrow a semi-formal definition:
accept() is used on the server side.
It accepts a received incoming attempt
to create a new TCP connection from
the remote client, and creates a new
socket associated with the socket
address pair of this connection.
In other words, accept returns a new socket through which the server can communicate with the newly connected client. The old socket (on which accept was called) stays open, on the same port, listening for new connections.
How does accept work? How is it implemented? There's a lot of confusion on this topic. Many people claim accept opens a new port and you communicate with the client through it. But this obviously isn't true, as no new port is opened. You actually can communicate through the same port with different clients, but how? When several threads call recv on the same port, how does the data know where to go?
I guess it's something along the lines of the client's address being associated with a socket descriptor, and whenever data comes through recv it's routed to the correct socket, but I'm not sure.
It'd be great to get a thorough explanation of the inner-workings of this mechanism.
Your confusion lies in thinking that a socket is identified by Server IP : Server Port. When in actuality, sockets are uniquely identified by a quartet of information:
Client IP : Client Port and Server IP : Server Port
So while the Server IP and Server Port are constant in all accepted connections, the client side information is what allows it to keep track of where everything is going.
Example to clarify things:
Say we have a server at 192.168.1.1:80 and two clients, 10.0.0.1 and 10.0.0.2.
10.0.0.1 opens a connection on local port 1234 and connects to the server. Now the server has one socket identified as follows:
10.0.0.1:1234 - 192.168.1.1:80
Now 10.0.0.2 opens a connection on local port 5678 and connects to the server. Now the server has two sockets identified as follows:
10.0.0.1:1234 - 192.168.1.1:80
10.0.0.2:5678 - 192.168.1.1:80
Just to add to the answer given by user "17 of 26"
The socket actually consists of 5 tuple - (source ip, source port, destination ip, destination port, protocol). Here the protocol could TCP or UDP or any transport layer protocol. This protocol is identified in the packet from the 'protocol' field in the IP datagram.
Thus it is possible to have to different applications on the server communicating to to the same client on exactly the same 4-tuples but different in protocol field. For example
Apache at server side talking on (server1.com:880-client1:1234 on TCP)
and
World of Warcraft talking on (server1.com:880-client1:1234 on UDP)
Both the client and server will handle this as protocol field in the IP packet in both cases is different even if all the other 4 fields are same.
What confused me when I was learning this, was that the terms socket and port suggest that they are something physical, when in fact they're just data structures the kernel uses to abstract the details of networking.
As such, the data structures are implemented to be able to distinguish connections from different clients. As to how they're implemented, the answer is either a.) it doesn't matter, the purpose of the sockets API is precisely that the implementation shouldn't matter or b.) just have a look. Apart from the highly recommended Stevens books providing a detailed description of one implementation, check out the source in Linux or Solaris or one of the BSD's.
As the other guy said, a socket is uniquely identified by a 4-tuple (Client IP, Client Port, Server IP, Server Port).
The server process running on the Server IP maintains a database (meaning I don't care what kind of table/list/tree/array/magic data structure it uses) of active sockets and listens on the Server Port. When it receives a message (via the server's TCP/IP stack), it checks the Client IP and Port against the database. If the Client IP and Client Port are found in a database entry, the message is handed off to an existing handler, else a new database entry is created and a new handler spawned to handle that socket.
In the early days of the ARPAnet, certain protocols (FTP for one) would listen to a specified port for connection requests, and reply with a handoff port. Further communications for that connection would go over the handoff port. This was done to improve per-packet performance: computers were several orders of magnitude slower in those days.

Resources