Fundamental principle about data transfer on the internet - networking

Let's assume I open a browser on a PC in my home LAN and type http://foo.com. The browser, or another running service, finds foo's IP address and sends the request through my ISP to foo's server.
Now the server knows I sent a request and generates an HTML page or something else in response. How does its response get back to a browser inside a LAN?

IP packets contain the IP address of the source (i.e. the sender). So the server knows who initiated the request, and can then send its response to that IP address (no DNS lookup involved).
One common complication is a LAN behind a router using NAT (network address translation); this is the case in most residential settings. Although all the clients in the LAN have different local IP addresses, the router modifies all outgoing IP packets so that they carry the same source IP address (the router's external address). Therefore all the response traffic gets sent back to that single IP address. The router is able to distinguish the packets and send them back to the correct local client based on the TCP/UDP port number.
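The port-based bookkeeping described above can be sketched roughly as follows. This is a simplified model, not how any particular router is implemented: the class name NatRouter, the starting port 40000, and the example addresses are all assumptions made for illustration, and real NAT devices also track the protocol, the remote endpoint, and timeouts.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class NatRouter:
    """Toy NAT: maps each outgoing flow to a unique public port (hypothetical model)."""
    public_ip: str
    next_port: int = 40000  # assumed starting point for public ports
    table: Dict[int, Tuple[str, int]] = field(default_factory=dict)

    def outbound(self, src_ip: str, src_port: int) -> Tuple[str, int]:
        """Rewrite the source of an outgoing packet and remember the mapping."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (src_ip, src_port)
        return (self.public_ip, public_port)

    def inbound(self, dst_port: int) -> Optional[Tuple[str, int]]:
        """Find the LAN client for a reply; None means no mapping, so drop it."""
        return self.table.get(dst_port)


router = NatRouter(public_ip="203.0.113.1")
# Two LAN clients happen to use the same local source port.
a = router.outbound("192.168.0.2", 50000)
b = router.outbound("192.168.0.3", 50000)
```

Replies addressed to a[1] and b[1] can then be told apart by port alone, while a packet arriving on a port with no table entry has nowhere to go.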

Browser opens connection to the server and sends the request; server responds through the same connection.
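A minimal sketch of that idea on the loopback interface, assuming a toy server that answers a single request: the reply is written to the very same connection object the request arrived on. The function names serve_once and round_trip are made up for this example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and answer on that same connection."""
    conn, _peer = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"response to: " + request)


def round_trip(message: bytes) -> bytes:
    """Start a throwaway local server, send one request, return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply
```

Calling round_trip(b"GET /") sends the bytes and reads the answer from the same socket; the server never has to look up where the client is.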

It's nicely explained at:
http://technet.microsoft.com/en-us/library/cc780783(WS.10).aspx

Related

How is data sent between public IPs?

I just started learning about networking and IP/MAC addresses. I understand that if you want to send data to another device within the same network, you send out an ARP broadcast to the local network and the target device responds with its MAC address so the two can start a connection.
My question is: do public IPs work the same way? For example, if my home router wants to connect to the google.com IP, does it send out an ARP broadcast to the entire internet and wait for Google to respond with its MAC address?
No: ARP never leaves the local network segment. Usually your gateway router will ARP for the next-hop router along the path to the destination; this path is often the default route (Cisco calls this "the gateway of last resort"; the "default gateway" in Cisco-ese is used only if routing is disabled in the router, which typically only happens when no image is loaded and it is bootstrapping). But sometimes your gateway will only have an outgoing interface, not a next-hop IP address. This is fine for serial links, since there can only be one next hop. But for Ethernet, this causes your gateway and the next-hop router to use proxy ARP. There are other ways to do this as well.
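The decision of whom to ARP for can be sketched as below, assuming a host with a single interface and a single default gateway. The function name arp_target and the addresses are illustrative only; a real routing table can hold many routes.

```python
import ipaddress


def arp_target(interface: str, destination: str, gateway: str) -> str:
    """Return the IP whose MAC address must be resolved to send to `destination`.

    `interface` is the sender's address with prefix length, e.g. "192.168.0.2/24".
    """
    local_network = ipaddress.ip_interface(interface).network
    if ipaddress.ip_address(destination) in local_network:
        return destination  # same segment: ARP for the destination itself
    return gateway          # elsewhere: ARP for the next hop only
```

For example, arp_target("192.168.0.2/24", "8.8.8.8", "192.168.0.1") gives the gateway's address, so the broadcast stays inside the LAN.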

What actually happens on the client side when running VPN client software

I'm struggling to find a good explanation about what actually happens on the client side on a computer when a VPN client is running and connecting to a VPN server on the internet.
When we turn on and enable the VPN application to connect to a VPN somewhere in the world, and then use Chrome or Firefox etc. to access a website, how does the browser software know to connect to the VPN IP address instead?
My understanding is that normally an IP packet from layer 3 which has a source and destination IP address, gets wrapped in an Ethernet frame at layer 2.
When we use a VPN, does the IP packet for the destination address get wrapped in another packet for the VPN server first? Where does the TLS encryption come into this then?
If you have a real VPN (where N stands for network, i.e. not a web proxy), then a virtual network interface is created on the computer and routes are set up so that all non-local traffic is sent through this virtual network interface. The traffic is encrypted there and then sent through the "real" network interface to the other VPN endpoint, i.e. the original IP packet is encrypted and then wrapped into another IP packet for transport.
At the other VPN endpoint there is the same kind of setup: the encrypted network traffic comes in through the real network interface, gets passed to the virtual network interface, gets unwrapped and decrypted there, and emerges decrypted on the VPN endpoint, where it (the decrypted data) then gets forwarded to the final target.
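The wrap-and-unwrap step can be illustrated with a toy model. Everything here is an assumption for the sake of the example: the IpPacket class, the "|" serialization, and especially toy_cipher, which is a single-byte XOR standing in for the VPN's real cipher and offers no security at all.

```python
from dataclasses import dataclass


@dataclass
class IpPacket:
    src: str
    dst: str
    payload: bytes


def toy_cipher(data: bytes, key: int) -> bytes:
    """NOT real encryption: XOR placeholder for whatever cipher the VPN uses."""
    return bytes(b ^ key for b in data)


def vpn_wrap(inner: IpPacket, client_ip: str, vpn_server_ip: str, key: int) -> IpPacket:
    """Encrypt the whole inner packet and carry it inside an outer packet."""
    serialized = f"{inner.src}|{inner.dst}|".encode() + inner.payload
    return IpPacket(client_ip, vpn_server_ip, toy_cipher(serialized, key))


def vpn_unwrap(outer: IpPacket, key: int) -> IpPacket:
    """Reverse of vpn_wrap, as done at the other VPN endpoint."""
    src, dst, payload = toy_cipher(outer.payload, key).split(b"|", 2)
    return IpPacket(src.decode(), dst.decode(), payload)


inner = IpPacket("10.8.0.2", "198.51.100.20", b"GET / HTTP/1.1")
outer = vpn_wrap(inner, "192.168.0.2", "203.0.113.9", key=0x5A)
```

On the wire only the outer packet is visible, addressed to the VPN server; the browser's original destination is hidden inside the encrypted payload until vpn_unwrap recovers it.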

What network port do clients use to send outbound packets?

I read a post on superuser.com (https://superuser.com/questions/284051/what-is-port-forwarding-and-what-is-it-used-for) that answered everything except for the port that is used. When sending out data from behind a NAT router, what port does the sending device use to send to the router, and what port is used by the router once the data is sent out over the internet? I know that when a server receives this packet, it uses the port the packet was sent from by the sending device (client) to know where to send the reply. But this still doesn't answer where the NAT router came up with these two (private and public) ports originally. Do NAT routers just pick random ports and play a game of peek-a-boo with their sending ports to make it nearly impossible for hackers to use port scanners to find an opening on random nodes on the internet? Please, someone put me out of my misery.
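The destination port stays constant along the entire path unless otherwise specified. For example,
Client sends HTTP to port 80, router forwards it still addressed to port 80. The router at the far end receives it on public port 80 and forwards it to port 80 on the private server.
The source port is a different matter: it is not 80 but a temporary port that the client's operating system picks for the connection, and the NAT router may keep it or swap it for one of its own, for instance when two LAN clients pick the same number. The other thing that changes at the router (behind NAT) is the source IP address. If I send a packet from my computer to port 80 of an internet site, the router changes the packet's source IP to its own IP and then sends it across the globe.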
Now, let's say we're on a home network. Here's how things work.
192.168.0.2 sends a request to the router, headed to 8.8.8.8 (Google) on port 80. The packet gets to the router. The router changes the source IP from 192.168.0.2 to its public IP (64.5.5.5). It stores a record of this translation, keyed on information such as the original source IP and port. The packet arrives at 8.8.8.8, which then changes the destination IP from 8.8.8.8 to 172.0.0.5 (some internal web server at Google) and sends the request to that server. When the server sends a response, the same process happens in reverse.
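As for where the client's own port number comes from: the operating system normally hands out an unused one. A small sketch, assuming only the standard socket module; binding to port 0 is the explicit way to ask for this, and a connecting client gets the same treatment implicitly. The function name os_chosen_port is made up for the example.

```python
import socket


def os_chosen_port() -> int:
    """Ask the OS for an unused local port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

The exact number varies from call to call and from one operating system to another, which is why it is not shown here.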

What happens when my browser does a search? (ARP,DNS,TCP specifics)

I'm trying to learn the basics of ARP/TCP/HTTP (in sort of a scatter-shot way).
As an example, what happens when I go to google.com and do a search?
My understanding so far:
For my machine to communicate with others (the gateway in this case),
it may need to do an ARP Broadcast (if it doesn't already have the
MAC address in the ARP cache)
It then needs to resolve google.com's IP address. It does this by
contacting the DNS server. (I'm not completely sure how it knows
where the DNS server is? Or is it the gateway that knows?)
This involves communication through the TCP protocol since HTTP is
built on it (TCP handshake: SYN, SYN/ACK, ACK, then requests for
content, then RST, RST/ACK, ACK)
To actually load a webpage, the browser gets the index.html, parses
it, then sends more requests based on what it needs? (images,etc)
And finally, to do the actual google search, I don't understand how
the browser knows to communicate "I typed something in the search box
and hit Enter".
Does this seem about right? / Did I get anything wrong or leave out anything crucial?
Firstly try to understand that your home router is two devices: a switch and a router.
Focus on these facts:
The switch connects all the devices in your LAN together (including the router).
The router merely connects your switch (LAN) with the ISP (WAN).
Your LAN is essentially an Ethernet network which works with MAC addresses.
For my machine to communicate with others (the gateway in this case),
it may need to do an ARP Broadcast (if it doesn't already have the MAC
address in the ARP cache)
Correct.
When you want to send a file from your desktop to your laptop, you do not want to go through the router. You want to go through the switch, as that is faster (lower layer). However, you only know the IP of the laptop in your network. For that reason you need to get its MAC address. That's where ARP kicks in.
In this case you would broadcast the ARP request in the LAN until someone responds to you. This could be the router or any other device connected to the switch.
It then needs to resolve google.com's IP address. It does this by
contacting the DNS server. (I'm not completely sure how it knows where
the DNS server is? Or is it the gateway that knows?)
If you use DHCP, then that has already provided you with the IP of the DNS server. If not, then it means that you manually provided the IP of the DNS server. Either way, the IP of the DNS server is stored locally on your computer.
Making a DNS request is just a matter of putting the DNS server's IP in the packet as the destination, with the query as the payload, and sending the packet out to the network.
Sidenote: DHCP also provides the IP address of the router.
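To make the "request" part concrete, here is a rough sketch of the bytes of a DNS query for an A record, built by hand with the standard struct module. It only constructs the message and does not send it; the function name build_dns_query and the default query ID 0x1234 are arbitrary choices for this example.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message asking for the A record of `hostname`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # Question tail: QTYPE 1 (A record), QCLASS 1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


query = build_dns_query("google.com")
```

Such a message would typically be sent in a UDP datagram to port 53 of the DNS server whose address the computer already has stored.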
This involves communication through the TCP protocol since HTTP is
built on it (TCP handshake: SYN, SYN/ACK, ACK, then requests for
content, then RST, RST/ACK, ACK)
Yes, except that a connection is normally closed with FIN and ACK segments; RST is an abrupt reset. To clarify things, when your computer sends the request
FRAME[IP[TCP[GET www.google.com]]]
The frame is sent to your LAN's switch, which forwards it to the MAC address of the router. Your router opens the frame to check the destination IP and routes the packet accordingly (in this case to the WAN). Finally, when the packet arrives at the server, the server opens the TCP segment and reads the payload, which is the HTTP message. The SYN, ACK etc. segments are processed only by your computer and the server, not by any router or switch.
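The FRAME[IP[TCP[...]]] nesting can be modelled with a few small classes. This is a sketch under simplifying assumptions: the class and function names and the MAC strings are invented, and details such as TTL decrement, checksums, and NAT are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpSegment:
    src_port: int
    dst_port: int
    payload: bytes


@dataclass(frozen=True)
class IpPacket:
    src_ip: str
    dst_ip: str
    segment: TcpSegment


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    packet: IpPacket


def route_hop(frame: Frame, out_mac: str, next_hop_mac: str) -> Frame:
    """A router discards the incoming frame and re-wraps the same IP packet."""
    return Frame(src_mac=out_mac, dst_mac=next_hop_mac, packet=frame.packet)


segment = TcpSegment(50000, 80, b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n")
packet = IpPacket("192.168.0.2", "198.51.100.20", segment)
lan_frame = Frame("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:fe", packet)
wan_frame = route_hop(lan_frame, "bb:bb:bb:bb:bb:01", "bb:bb:bb:bb:bb:fe")
```

The MAC addresses change at the hop while the packet and the TCP segment inside it are carried along untouched.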
To actually load a webpage, the browser gets the index.html, parses
it, then sends more requests based on what it needs? (images,etc)
Yes. An HTML file is essentially a tree structure which can have embedded resources like images, JavaScript files, CSS etc. For each such resource a new request has to be sent.
Once your browser gets all these resources, it will render the webpage.
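A rough idea of how the list of extra requests is derived from the HTML, using Python's standard html.parser. It looks only at img, script, and stylesheet link tags, which is a deliberate simplification of what a real browser does; the names ResourceCollector and find_resources are made up for the example.

```python
from html.parser import HTMLParser
from typing import List


class ResourceCollector(HTMLParser):
    """Collect URLs of embedded resources that would need their own request."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.resources.append(attributes["href"])


def find_resources(html: str) -> List[str]:
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


page = (
    '<html><head><link rel="stylesheet" href="style.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="logo.png"></body></html>'
)
print(find_resources(page))  # ['style.css', 'app.js', 'logo.png']
```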
And finally, to do the actual google search, I don't understand how
the browser knows to communicate "I typed something in the search box
and hit Enter".
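As you type each character, script code on the page sends what you have typed so far to the server, and the server responds with its suggestions. When you hit Enter, the browser sends one more HTTP request that carries your search terms. Easy as that.
References (good reads):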
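One common way for a browser to tell a server what was typed is to put the text into the query string of the requested URL. A small sketch with the standard urllib.parse module; the host, the /search path, and the parameter name q are assumptions used for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(terms: str) -> str:
    """Encode the typed text into a URL query string (path and key are assumed)."""
    return "https://www.google.com/search?" + urlencode({"q": terms})


url = search_url("how does nat work")
print(url)  # https://www.google.com/search?q=how+does+nat+work

# The receiving side can recover the original text from the URL.
recovered = parse_qs(urlsplit(url).query)["q"][0]
```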
http://www.tcpipguide.com/free/t_TheNeedForAddressResolution.htm
http://www.howtogeek.com/99001/htg-explains-routers-and-switches/
http://www.eventhelix.com/realtimemantra/networking/ip_routing.htm#.UsrYAvim3yO
http://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol

How does client-machine/browser handle unrequested HTTP response?

Imagine the following:
User goes to script (http://sample.org/test.php),
Script sends an HTTP request to some other page (http://google.com/). For this example, we'll say using curl.
The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
I know already that the requesting script will not receive the response, as the remote-host will send any responses to the IP address given in the request.
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine? If it did, would there be any way to ensure that it was handled by the user's browser? And if so, how would the browser handle this, typically? Would it open a new window with Google in it?
I definitely have a follow up to this question, but I am very curious what goes on at this level, before I experiment further.
The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
Usually, this won't work. Your ISP knows which IP address you are supposed to have and will not forward traffic coming from "fake" IP addresses.
In particular, you can only communicate one-way with a fake IP (the answer won't reach you), so you would not be able to establish a working TCP connection, because TCP requires a three-way handshake. Thus, you wouldn't be able to submit your web request.
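The reasoning can be traced step by step in a toy simulation. This models only who receives which segment, under the assumption that nothing else on the network interferes; the function name handshake and the addresses are hypothetical.

```python
from typing import List, Tuple


def handshake(real_ip: str, claimed_ip: str) -> Tuple[bool, List[str]]:
    """Trace a TCP three-way handshake where the client claims to be `claimed_ip`."""
    log = [f"{real_ip} -> server: SYN (source field says {claimed_ip})"]
    # The server can only reply to the address written in the packet.
    log.append(f"server -> {claimed_ip}: SYN/ACK")
    if claimed_ip != real_ip:
        log.append(f"{real_ip} never sees the SYN/ACK, so it cannot send the final ACK")
        return False, log
    log.append(f"{real_ip} -> server: ACK")
    return True, log


honest_ok, _ = handshake("203.0.113.5", "203.0.113.5")
spoofed_ok, spoofed_log = handshake("203.0.113.5", "198.51.100.7")
```

With a matching address the third step completes; with a spoofed one the handshake stalls before any HTTP request could be sent.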
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine?
If the user's PC has an internal IP address and uses NAT, the router will not know which LAN machine to forward the packet to (since it did not see any outgoing request to which it could match that response). Therefore, the answer would be dropped.
Even if you could get the response to reach the client:
If it did, would there be any way to ensure that it was handled by the user's browser?
No. As stated above, a TCP connection starts with a three-way handshake. This handshake has not been completed, so the operating system would just drop the packet.
CURLOPT_INTERFACE is for use on computers that have multiple IP addresses assigned to them, to specify which of those addresses should be used as the source IP for the connection. You can't use it to spoof some other computer's IP address. Most likely you'll either get an error, or the option will be ignored and the OS will choose a source interface automatically (the default behavior).
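The same restriction can be observed with plain sockets, where choosing a source address means binding to it. A minimal sketch, assuming the standard socket module; the helper name can_bind_source is made up, and this shows the socket-level analogue rather than curl itself.

```python
import socket


def can_bind_source(ip: str) -> bool:
    """Return True if the OS lets a socket use `ip` as its local source address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))  # port 0: any free port on that address
            return True
        except OSError:
            return False
```

An address that belongs to the machine, such as 127.0.0.1, is accepted, while an address assigned to some other computer is normally refused with an error.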
The response will be returned on the same TCP connection as the request.
