How Network Layer finds route for a packet - networking

An HTTP application request for www.stackoverflow.com.
This message is passed to Transport layer. Transport layer adds its header and sends the packet to Internet Layer.
The Internet Layer cannot see www.stackoverflow.com as it can only access the header which was appended by Transport Layer. Then how can Internet Layer decide route for this request packet.
How is the destination address field in IP header is filled, as only Application Layar and Transport Layer know about that field. (Application layer has no interaction with Internet Layer and Transport Layer mention port number in its Header.)

The application layer would have already retrieved the IP address of the host from the URL via DNS. The IP address as well as other data from the Application layer are sent down to the Transport layer which packetizes the data and then send it down to the Internet layer and then it goes.

The application, in this case the browser, did something that ended up calling the getaddrinfo library function or something equivalent, which made the system's resolver look up the name in the DNS and return a set of IP addresses.
The application somehow chose one of those (there's standard ways to do this, but the lovely thing is how many standard ways) and used the connect system call to make the connection, which started the transport layer in the kernel working on getting a connection to that IP address.
That ends up creating IP packets with that destination address and the local address as the source, next protocol set to TCP and the SYN bit on in the TCP header. Each router on the path consults its tables and forwards the packet.
TCP magic happens, a SYN+ACK comes back, then there's a connection, over which HTTP magic happens, and the page loads.

rfc791 IP - Addressing
A distinction is made between names, addresses, and routes [4]. A name indicates what we seek. An address indicates where it is. A route indicates how to get there. The internet protocol deals primarily with addresses. It is the task of higher level (i.e., host-to-host or application) protocols to make the mapping from names to addresses. The internet module maps internet addresses to local net addresses. It is the task of lower level (i.e., local net or gateways) procedures to make the mapping from local net addresses to routes. Addresses are fixed length of four octets (32 bits).
Read more: http://www.faqs.org/rfcs/rfc791.html#ixzz0buBJkVEI
It is the task of higher level (i.e., host-to-host or application) protocols to make the mapping from names to addresses ???

If you want to know how the actual IP header gets the address. It occurs in the Kernel, when a socket is created. In this case a TCP socket, Check out
man 7 ip
The data is not inherited from the TCP packet, though the data is included in the checksum of the TCP header.

Related

How does a server know which domain name was used?

As far as i know what we get from a dns query is a ip address. So in the end of the day if thats true we are still using ip addresses to connect the server and domains are pretty names for them.
So how does a server know which domain i used to query that ip address?
How does vhosts work an understand that if the domain data is lost during dns query?
The Internet works in layers. Each layer uses different kind of parameters to do its work.
Layer 3 is typically IP aka Internet Protocol. To work it uses IP addresses, each computer has at least one to be able to discuss with another one. And there are two families in fact: version 4 and version 6.
Since multiple services can be on any given computer at some point, you need a layer on top of that, layer 4, that deals with transport. The "predominant" one is TCP aka Transport Control Protocol, but there is also UDP. TCP and UDP uses ports: a 2 bytes integer that encodes for a specific protocol.
For example, HTTP was given port number 80 (completely arbitrary), and HTTPS port 443.
The DNS, which itself uses UDP and TCP (on port 53), allows, among other things, to map a given hostname to a given IP address or multiple IP addresses. This is the typical A and AAAA records. There is also a CNAME record that maps one domain name to another. There also exists a SRV record that maps a service (which is a protocol name + a transport) to a given hostname and port number.
When one computer connects to another, its first step for all the above is to find out which IP address to use to connect to. It can use the DNS for that. Typically it will get only the IP address, but, depending on the protocol (layer above 4), may also get a port (if using SRV records).
The HTTP world does not use SRV records. So a browser just uses the hardcoded 80 or 443 ports, or the port number appearing in the URL.
Then we are at the transport level, let us say TCP.
The connection is done (since now the remote IP address and port are known) and the protocol above TCP, like HTTP, is free to convey any kind of extra data, such as the hostname that the client initially used (as taken from the URL) to find out the IP address.
This is done through the HTTP host header, see RFC 2616
Note that if you do things through TLS (which conceptually sits between TCP and HTTP) there is even something else happening: SNI or Server Name Indication.
When doing the TLS handshake, so before any kind of HTTP headers or content, the client will send the final hostname desired in some specific TLS message. Why? So that the server can find which specific certificate it should answer which as otherwhise it would not be able to know which hostname is requested as this sits in some HTTP header which do not exist until the TLS handshake is finished.
A webserver will be able to see both the SNI content to find out which certificate to send back and then the host header to find out which VirtualHost (in Apache) section is relevant to the query being processed.
If you are not in HTTP world, then it all depends on the protocol used. Older protocols, like FTP, did not plan for "multihoming" at the beginning, a given IP address meant only one hostname and service for example.

How does tcp/udp connection works?

I would like to ask a general newbie question. I understand that for a computer in location A to connect to a server in location B, packets of data have to be sent to multiple data centers through multiple gateways and through multiple verification channels to ensure the connection request finds the right destination.
However after the connection is established, when the computer and the server send/receive data, do these data still need to go through [multiple data centers through multiple gateways and through multiple verification channels]?
Every TCP / UDP packet can have a different network path between source to destination. However the connection establishment of a TCP connection being stateful is all about what packet size, compression method etc.
At network layer- Connection is stateless. Please read about OSI model in detail also you can refer to this https://www.ccnahub.com/wp-content/uploads/2013/09/watermarked-pc1-comm.jpg It has good explanation of how OSI works.
A TCP packet being sent from computer A to computer B will be addressed to a particular IP address. If that TCP address is not on the local LAN, it will go first through the local LAN to whatever is designated as the local gateway. That gateway then sends it on over the connection to an external network. At that point, it will be delivered to some router in your ISP. That router will look at the destination IP address and consult a routing table to find where it should next send the packet. That will typically be another router elsewhere in the network. This continues and (assuming good routing tables in each router) the packet will get closer to its end desination on each hop. Eventually, the packet will get to a router that has a routing table that knows about either the actual IP address or the home gateway for that IP address and the packet will be sent to that gateway. That home gateway can then deliver the packet to that actual IP address. In some cases, there may be a private network at either end where private IP addresses/port combinations are converted to public IP addresses and vice versa.
If computer A sends multiple packets to computer B, they do not have to all go the exact same path, though typically they will (assuming no problems or congestion in the network between the two endpoints).
In this scenario where A and B are on different private networks, there is no direct connection between computer A and computer B so each packet has to follow the path from one router to the next until it arrive at the final gateway and then destination address.
However after the connection is established, when the computer and the server send/receive data, do these data still need to go through [multiple data centers through multiple gateways and through multiple verification channels]?
If the routers are doing their job appropriately, the very first packet takes the most efficient path from A to B that the network knows. There is no "better" way to send subsequent packets. Subsequent packets will follow the same process (to a router, router looks up in routing table where to send for next hop and so on). If the two endpoints are a long ways apart (in terms of network topology), then the packet may go through many routers. Routers are highly optimized pieces of equipment capable of passing off millions of packets a second as this is how data moves on any TCP/IP network like the internet.
There is no difference in how the first packet that initiates the TCP connection flows versus subsequent packets. At the network level, they are just packets traveling from a source IP address to a destination IP address. Once the connection is established, a reliability layer will be started to track packets that might get lost, initiate retransmissions, etc... but this doesn't have anything to do with how a given packet gets from A to B.

Where is the source and destination address fields in TCP header?

From what I've read, TCP sits on the layer between the application and IP, and handles setting up the packets, checking for errors, ordering etc so the application itself doesn't have to do it.
However, when I looked at the TCP header I became confused. From the way I understand it, some data is handed to TCP from the application, and is given a destination address to which to send the data. The TCP layer packages it up, and sends it on to the IP layer, who in turn hands it off, all the way on down to the physical layer.
But looking at the TCP header on Wikipedia, there is no mention of a destination address! There is only a destination port number which I am pretty sure is not an address.
So my question is, how does TCP get the addresses? And/or, how does IP get the address if TCP isn't passing them to it?
It's the Application that's running on top of Transport Layer that chooses everything.
If the Application is designed with reliability in mind, it chooses the connection oriented protocol like TCP.
The same applications tells TCP what the Source and Destination port should be, TCP alone cannot decide this.
Example: If you're accessing a website, your Application would be the browser, since accessing websites normally happens over HTTP/HTTPS and HTTP/HTTPS is designed to be reliable, it chooses TCP. Port 80(HTTP) or 443(HTTPS) are the standard ports used for accessing websites, so either of these ports are used in the Destination Port field while the Source Port can be any random higher number port.
This combination is used to identify something called Transport Layer VC(Virtual Circuit).
Coming to IP, the same application tells what the Destination IP address is, while the Source IP is the machine from where you are running the browser.
IP in Network Layer and TCP in Transport Layer cannot choose anything, it's the Application that tells them what to choose, considering they are the chosen ones.

How does TCP/Application layer identifies the destination port number?

When the application layer sends the data to the Transport layer to deliver to the server, how does it know which port number to communicate to?
Precisely, the TCP segment contains as a header the destination port no., how does it determine it?
The application has to be told. Either the port is a standard port listed in etc/services, in which case the getaddrinfo() API tells you, or else it is provided via the application's configuration, or it's hard-wired into the source code.
The application establishes the port number when it creates a socket connection to the server. The socket knows which local IP/Port it is bound to and which remote IP/Port it is connected to. Those values are used whenever data is sent using that socket. The transport layer knows which values to put in the IP and TCP headers.

How are different TCP connections in HTTP requests identified?

From what I understand, each HTTP request uses its own TCP connection (please correct me if i'm wrong). So, let's say that there are two current connections to the same server. For example, client side javascript code triggering a couple of AJAX POST requests using the XMLHttpRequest object, one right after the other, before getting the response to the first one. So we're talking about two connections to the same server, each waiting for a response in order to route it to each separate callback function.
Now here's the thing that I don't understand: The TCP packet includes source and destination ip and port, but won't both of these connections have the same src and dest ip addresses, and port 80? How can the packets be differentiated and routed to appropriately? Does it have anything to do with the packet sequence number which is different for each connection?
When your browser creates a new connection to the HTTP server, it uses a different source port.
For example, say your browser creates two connections to a server and that your IP address is 60.12.34.56. The first connection might originate from source port 60123 and the second from 60127. This is embedded in the TCP header of each packet sent to the server. When the server replies to each connection, it uses the appropriate port (e.g. 60123 or 60127) so that the packet makes it back to the right spot.
One of the best ways to learn about this is to download Wireshark and just observe traffic on your own network. It will show you this and much more.
Additionally, this gives insight into how Network Address Translation (NAT) works on a router. You can have many computers share the same IP address and the router will rewrite the request to use a different port so that two computers can simultaneously connect to places like AOL Instant Messenger.
They're differentiated by the source port.
The main reason for each HTTP request to not generate a separate TCP connection is called keepalives, incidentally.
A socket, in packet network communications, is considered to be the combination of 4 elements: server IP, server port, client IP, client port. The second one is usually fixed in a protocol, e.g. http usually listen in port 80, but the client port is a random number usually in the range 1024-65535. This is because the operating system could use those ports for known server protocols (e.g. 21 for FTP, 22 for SSH, etc.). The same network device can not use the same client port to open two different connections even to different servers and if two different clients use the same port, the server can tell them apart by their IP addresses. If a port is being used in a system either to listen for connection or to establish a connection, it can not be used for anything else. That's how the operating system can dispatch packets to the correct process once received by the network card.

Resources