I thought I understood the whole NAT thing, but now I have run into a problem.
First, what I assumed:
Because there are not enough IPv4 addresses available, we need another system.
Today's home devices for connecting to the internet are a combination of:
1) A modem at the physical level, to convert the type of signals on the wire.
2) A switch at the link level, so you can connect multiple computers to the device.
3) A router, to connect all the computers to the internet and reach beyond your home subnet.
4) A NAT, to allow all the internal computers to connect to the outside.
5) A portforwarder, to let connections from the outside into the internal network.
What I call a NAT:
When a request goes to the outside, the NAT part of the device changes the source-ip and the source-port of the request coming from an internal computer. The new source-ip will be your public-ip. The NAT part keeps a record in a table with this mapping: "original-ip, original-port, new-port".
When a response comes back, the NAT checks the destination-port and compares it with the new-ports in its table. If it finds a match, the NAT replaces the destination-ip with the original-ip and the destination-port (the new-port) with the original-port. As a consequence, the response is forwarded to the internal computer that made the request.
So the NAT part is for connections initiated from the inside. When such a request traverses the NAT, 2 things are changed: the source-ip and the source-port.
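The NAT part described above can be sketched as a small translation table. This is a minimal model, assuming a single public IP (203.0.113.10, a documentation address) and a simple counter for picking new ports; the class `NatTable` and its methods are hypothetical, and real routers also key entries on the protocol and remote address and expire idle ones.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (ip, port)

@dataclass
class NatTable:
    public_ip: str
    next_port: int = 40000  # hypothetical first "new-port" handed out
    by_new_port: Dict[int, Addr] = field(default_factory=dict)
    by_original: Dict[Addr, int] = field(default_factory=dict)

    def outbound(self, src: Addr, dst: Addr) -> Tuple[Addr, Addr]:
        """Packet leaving the LAN: replace source-ip and source-port."""
        if src not in self.by_original:
            # record "original-ip, original-port, new-port"
            self.by_original[src] = self.next_port
            self.by_new_port[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.by_original[src]), dst

    def inbound(self, src: Addr, dst: Addr) -> Optional[Tuple[Addr, Addr]]:
        """Packet arriving from outside: restore the original destination, or None (drop)."""
        if dst[0] != self.public_ip or dst[1] not in self.by_new_port:
            return None
        return src, self.by_new_port[dst[1]]

nat = NatTable(public_ip="203.0.113.10")
new_src, dst = nat.outbound(("192.168.1.7", 51000), ("198.51.100.20", 80))
print(new_src)                    # ('203.0.113.10', 40000)
print(nat.inbound(dst, new_src))  # (('198.51.100.20', 80), ('192.168.1.7', 51000))
```

A reply to a port that has no entry returns `None`, which is the model's way of saying the router drops it.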
Then the portforwarder:
This part of the device accepts connections initiated in the outside world to your network. It looks at the destination-port of the incoming request and, based on a rule made for that port number, may change the destination-port and the destination-ip of the request to those of an internal computer. With these rules, a request from the outside can connect to a computer on your internal network; thus the portforwarder changes 2 things: the destination-ip and the destination-port.
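The portforwarder can be sketched the same way. In this hypothetical model the rules live in a dictionary from public port to internal address; the addresses, `FORWARD_RULES` and the `forward` helper are illustrative assumptions, not any router's actual configuration format.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (ip, port)

# Hypothetical rules: public port -> (internal ip, internal port)
FORWARD_RULES: Dict[int, Addr] = {
    80: ("192.168.1.7", 4567),
    2222: ("192.168.1.20", 22),
}

def forward(src: Addr, dst: Addr, public_ip: str) -> Optional[Tuple[Addr, Addr]]:
    """Rewrite destination-ip and destination-port of an incoming packet; the source is untouched."""
    if dst[0] != public_ip:
        return None
    target = FORWARD_RULES.get(dst[1])
    if target is None:
        return None  # no rule for this port: the router drops (or rejects) the packet
    return src, target

print(forward(("20.20.20.20", 53000), ("9.9.9.9", 80), "9.9.9.9"))
# (('20.20.20.20', 53000), ('192.168.1.7', 4567))
```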
A: Before I ask my question, is this explanation correct?
Now my problem is with the response to a request that came from the outside through the portforwarder. Assume the right rules are in place and a request reached an internal computer through port forwarding. In the portforwarder, the destination-ip was changed to the internal-ip of the computer and the destination-port to the port the service is running on. If this internal computer is a webserver, it will generate a response: its destination-ip will be the request's source-ip, its destination-port the request's source-port, its source-ip the internal-ip of the computer, and its source-port the port of the service.
Now that response has to go to the outside. So I assume it goes through the NAT?
After passing the NAT, the source-ip would be the public-ip and the source-port would be random. I tested this with Wireshark: I contacted a webserver behind a NAT and saw the response coming from port 80! How is this possible? Does this indicate that the response to the forwarded request did not pass through the NAT?
I rethought the concept, and my new hypothesis is this: when a connection is initiated from the outside, it passes the portforwarder and reaches the right computer, which creates a response. When this response reaches our all-in-one device, the device recognizes that it forwarded the corresponding request and does not change the source-port.
B: Is this indeed the case or is it done in another way?
Wikipedia says about portforwarding: "The source address and port are, in this case, left unchanged. When used on machines that are not the default gateway of the network, the source address must be changed to be the address of the translating machine, or packets will bypass the translator and the connection will fail." (http://en.wikipedia.org/wiki/Port_forwarding)
This confirms that the response to a forwarded request MUST go back through the portforwarder, not through the NAT, so the source-port won't be changed. The portforwarder will change the source-ip to the public-ip.
Can someone verify this or give me another explanation than mine?
Now I tested this with Wireshark. I contacted a webserver behind a NAT and I saw the response was coming from port 80?! How is this possible? This indicates that the response of the forwarded request did not pass the NAT?
The webserver inside the NAT does not have to be running on port 80. The NAT is certainly set up to forward port 80 and to respond as if the server were on port 80, but that says little about the port the webserver is actually running on.
Here is some ASCII "art" that may help.
| **Internal Network** | **NAT Router** | **External Computer** |
|---|---|---|
| Web server running at IP 192.168.1.7, port 4567 | IP 9.9.9.9, port 80 | IP 20.20.20.20 |
| | | Requests web page at 9.9.9.9:80 |
| | Forwards port 80 traffic to 192.168.1.7:4567 | |
| Replies with the web page | | |
| | Puts 9.9.9.9:80 in the source field and sends the page on | |
| | | Gets the page from "9.9.9.9:80", even though it actually came from 192.168.1.7:4567 |
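To connect the diagram with question B, here is a sketch, under simplifying assumptions, of how a router can remember each forwarded connection so that the reply is translated back, which is why the client sees source port 80. Linux netfilter's connection tracking works along these lines, but the `Conntrack` class is hypothetical and ignores protocols and timeouts.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]      # (ip, port)
Packet = Tuple[Addr, Addr]  # (source, destination)

class Conntrack:
    """Hypothetical tracker combining port forwarding with reply translation."""

    def __init__(self, public_ip: str, rules: Dict[int, Addr]):
        self.public_ip = public_ip
        self.rules = rules
        # (client, internal server) -> public address the client originally used
        self.entries: Dict[Tuple[Addr, Addr], Addr] = {}

    def from_outside(self, pkt: Packet) -> Optional[Packet]:
        src, dst = pkt
        target = self.rules.get(dst[1]) if dst[0] == self.public_ip else None
        if target is None:
            return None
        self.entries[(src, target)] = dst  # remember the original destination
        return src, target

    def from_inside(self, pkt: Packet) -> Optional[Packet]:
        src, dst = pkt
        original = self.entries.get((dst, src))
        if original is None:
            return None  # not a reply to a forwarded connection (ordinary NAT would apply)
        return original, dst  # restore 9.9.9.9:80 as the source

ct = Conntrack("9.9.9.9", {80: ("192.168.1.7", 4567)})
request = ct.from_outside((("20.20.20.20", 53000), ("9.9.9.9", 80)))
reply = ct.from_inside((("192.168.1.7", 4567), ("20.20.20.20", 53000)))
print(request)  # (('20.20.20.20', 53000), ('192.168.1.7', 4567))
print(reply)    # (('9.9.9.9', 80), ('20.20.20.20', 53000))
```

In this model there is no separate "portforwarder path" for the reply: once the first packet of a connection has been translated, the remembered entry is enough to translate later packets in both directions.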
As far as I know, what we get from a DNS query is an IP address. So, if that's true, at the end of the day we still use IP addresses to connect to the server, and domains are just pretty names for them.
So how does a server know which domain I used to look up that IP address?
How do vhosts work and know which domain was requested if the domain information is lost during the DNS query?
The Internet works in layers. Each layer uses different kind of parameters to do its work.
Layer 3 is typically IP, aka Internet Protocol. It works with IP addresses: each computer has at least one, so that it can communicate with others. There are in fact two families: version 4 and version 6.
Since any given computer can run multiple services, you need a layer on top of that, layer 4, which deals with transport. The predominant protocol there is TCP, aka Transmission Control Protocol, but there is also UDP. TCP and UDP use ports: a 2-byte (16-bit) integer that identifies a specific service.
For example, HTTP was given port number 80 (completely arbitrary), and HTTPS port 443.
The DNS, which itself uses UDP and TCP (on port 53), allows, among other things, mapping a given hostname to one or more IP addresses. These are the typical A and AAAA records. There is also the CNAME record, which maps one domain name to another, and the SRV record, which maps a service (a protocol name plus a transport) to a given hostname and port number.
When one computer connects to another, its first step is to find out which IP address to connect to. It can use the DNS for that. Typically it gets only the IP address but, depending on the protocol (above layer 4), it may also get a port (if using SRV records).
The HTTP world does not use SRV records. So a browser just uses the hardcoded 80 or 443 ports, or the port number appearing in the URL.
Then we are at the transport level, let us say TCP.
The connection is done (since now the remote IP address and port are known) and the protocol above TCP, like HTTP, is free to convey any kind of extra data, such as the hostname that the client initially used (as taken from the URL) to find out the IP address.
This is done through the HTTP Host header; see RFC 2616 (since superseded by RFC 7230, and later by RFC 9110).
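A minimal client-side sketch of where the hostname ends up: the TCP connection only knows an IP address and a port, so the name from the URL travels inside the request itself. `build_request` is a hypothetical helper; it assumes plain HTTP/1.1 and ignores IPv6 literal addresses.

```python
from urllib.parse import urlsplit

def build_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request whose Host header carries the name from the URL."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    host = parts.hostname
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"  # a non-default port is part of the Host value
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")  # ends with the blank line closing the headers

print(build_request("http://www.example.com/index.html").decode())
```

The server receives exactly these bytes over a connection to an IP address and reads the `Host:` line to choose the virtual host.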
Note that if you do things through TLS (which conceptually sits between TCP and HTTP) there is even something else happening: SNI or Server Name Indication.
During the TLS handshake, so before any HTTP headers or content, the client sends the desired hostname in a specific TLS message (the ClientHello). Why? So that the server can find which certificate it should answer with; otherwise it would not know which hostname is requested, because that information sits in an HTTP header that does not exist until the TLS handshake is finished.
A webserver will be able to see both the SNI content to find out which certificate to send back and then the host header to find out which VirtualHost (in Apache) section is relevant to the query being processed.
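The two lookups a web server does can be sketched as two dictionaries, one keyed by the SNI name and one by the Host header. The hostnames, file paths and helper names below are hypothetical; with Python's `ssl` module, the first step would be hooked in through `SSLContext.sni_callback`.

```python
from typing import Dict, Optional

# Hypothetical server configuration: one certificate and one document root per name.
CERTS: Dict[str, str] = {
    "www.example.com": "/etc/ssl/example.pem",
    "shop.example.org": "/etc/ssl/shop.pem",
}
VHOSTS: Dict[str, str] = {
    "www.example.com": "/var/www/example",
    "shop.example.org": "/var/www/shop",
}
DEFAULT = "www.example.com"

def pick_certificate(sni_name: Optional[str]) -> str:
    """Step 1, during the TLS handshake: only the SNI name is known."""
    return CERTS.get((sni_name or DEFAULT).lower(), CERTS[DEFAULT])

def pick_vhost(raw_request: bytes) -> str:
    """Step 2, after the handshake: read the Host header of the decrypted request."""
    for line in raw_request.split(b"\r\n")[1:]:  # skip the request line
        if not line:
            break  # blank line: end of headers
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host = value.strip().decode("ascii").lower().split(":")[0]  # drop any port
            return VHOSTS.get(host, VHOSTS[DEFAULT])
    return VHOSTS[DEFAULT]

print(pick_certificate("shop.example.org"))  # /etc/ssl/shop.pem
print(pick_vhost(b"GET / HTTP/1.1\r\nHost: shop.example.org\r\n\r\n"))  # /var/www/shop
```

Both lookups fall back to a default site, similar to how a server uses its first or default virtual host when the name is missing or unknown.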
If you are not in the HTTP world, then it all depends on the protocol used. Older protocols, like FTP, did not plan for "multihoming" from the beginning: a given IP address meant only one hostname and service, for example.
This might not be the right place, as it's not about pure programming; nevertheless, as a simple web developer I find myself quite ignorant on the subject of networking (Wikipedia usually mixes different subjects on the matter), and I feel it is a "must" to know.
I sort of have an image of what happens when you type google.com into your browser, but I don't know the whole process (I have a modem, a router and a few computers connected to it; let's use my case as an example):
You type characters into Chrome ->
there is some character encoding done to translate the address (ASCII or something else) ->
DNS does something, not sure what ->
your router receives a digital request from a computer's network cable/Wi-Fi; it saves the internal IPv4 address of the sender in order to know which computer to respond to, and sends the digital data to the modem ->
your modem receives the digital data and translates it from digital to analog ->
now your network provider does some work ->
the Google server receives a request from an IP address ->
not sure how the Google server handles the data, but it sends back data ->
service provider -> the router gets the translated digital data from the modem, remembers who sent the request, and sends it to the right person.
In order to optimize a web server, or maybe to write better code that involves networking, perhaps each beginner (such as myself) needs to understand this first? Thank you for your time.
EDIT: I did read Wikipedia's article on the OSI model, though it's not quite as helpful as I thought it would be.
I will try to explain the idea, although it may be much more complicated; it depends on how deep you want to go...
you type "www.stackoverflow.com"
your OS will try to resolve www.stackoverflow.com to an IP address
since your OS probably can't do that by itself, it needs to ask a DNS server
assuming you use an external DNS server (say IP=5.5.5.5, and your IP=10.10.10.10, which is on a different network), your OS will check whether it knows how to reach 5.5.5.5
a default route 0.0.0.0/0 exists on your PC (also known as the 'default gateway'; it covers the whole internet) and points to your local router
an IP packet will be sent to the router's MAC address, with the DNS server's IP address as the destination
your router will probably replace your private IP address with its own public IP address and send the packet to the ISP
the ISP will route it across the internet until it reaches 5.5.5.5, which is the DNS server
the DNS server will reply, resolving stackoverflow.com to an IP address
your PC now knows where to send packets for stackoverflow.com
packets will be sent to Stack Overflow's IP address (104.16.36.249), port 80 (HTTP)
the Stack Overflow web server listens for requests on port 80
once a packet arrives, it will generate a response packet
it will send it back to you in exactly the same way
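The routing decision at the PC can be sketched in a few lines with Python's `ipaddress` module, assuming a single interface with a /24 prefix and a default gateway at 10.10.10.1 (both hypothetical, extending the example addresses above). The returned address is the one whose MAC address the PC looks up with ARP.

```python
import ipaddress

def next_hop(local_if: str, gateway: str, destination: str) -> str:
    """Return the IP whose MAC address the PC must ARP for to reach `destination`."""
    network = ipaddress.ip_interface(local_if).network
    dst = ipaddress.ip_address(destination)
    if dst in network:
        return str(dst)  # on-link: deliver directly through the switch
    return gateway       # otherwise the default route 0.0.0.0/0 points at the router

print(next_hop("10.10.10.10/24", "10.10.10.1", "5.5.5.5"))      # 10.10.10.1
print(next_hop("10.10.10.10/24", "10.10.10.1", "10.10.10.42"))  # 10.10.10.42
```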
All that traffic can be seen with a network capture utility like Wireshark; you can use these commands (Windows) to verify:
ping stackoverflow.com
netstat -rn
ipconfig
nslookup stackoverflow.com
tracert -d stackoverflow.com
I have a static local IP address, 10.8.4.*, and the public IP address of my machine is 72.43.135.*. When the server (sitting on a different network from my workstation) gets a request from my machine, it reads my IP address from
Context.Request.UserHostAddress
and gets 10.20.102.*.
Why is the server not getting the IP as 72.43.135.*?
If you look closely at what "public" and "local" mean, you will see that these terms can refer to the same network under some conditions, for example in a demilitarized zone (DMZ).
The IP the destination server sees depends on the interface you send the packets through and the routers they cross.
Is there masquerading (NAT)? That is the main question. You can be on totally different networks, but the routers might still forward your local IP; this also depends on the routing tables. Can a packet find its way back to your host? Is there a return route from the server to your machine?
The destination host probably has 2 interfaces, one with a 72.43.* address and one with a 10.8.* address. Maybe it receives through the 72.43 interface but sends back through the 10.8 one because it has a different route back. Networking can be real voodoo! Trace your packets and ask your sysadmins.
(not talking about proxies here, they deliver different custom headers with different IPs)
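Which interface a reply leaves through is decided by the sender's routing table, using the most specific (longest-prefix) matching route for the destination, not by the interface the request arrived on. A sketch with a hypothetical two-interface server; the prefixes, interface names and gateway addresses are assumptions for illustration.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str, Optional[str]]  # (prefix, interface, gateway; None = on-link)

# Hypothetical routing table of a server with a public and an internal interface.
ROUTES: List[Route] = [
    ("0.0.0.0/0", "eth0", "203.0.113.1"),  # default route via the public side
    ("10.0.0.0/8", "eth1", "10.20.0.1"),   # internal networks via the 10.x side
]

def lookup(destination: str) -> Route:
    """Return the matching route with the longest prefix, as IP routing does."""
    dst = ipaddress.ip_address(destination)
    matches = [r for r in ROUTES if dst in ipaddress.ip_network(r[0])]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)

print(lookup("198.51.100.7"))  # ('0.0.0.0/0', 'eth0', '203.0.113.1')
print(lookup("10.1.2.3"))      # ('10.0.0.0/8', 'eth1', '10.20.0.1')
```

A reply to any 10.x source leaves through `eth1` in this model, even if the request came in through `eth0`.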
I'm trying to learn the basics of ARP/TCP/HTTP (in sort of a scatter-shot way).
As an example, what happens when I go to google.com and do a search?
My understanding so far:
For my machine to communicate with others (the gateway in this case), it may need to do an ARP broadcast (if it doesn't already have the MAC address in the ARP cache).
It then needs to resolve google.com's IP address. It does this by contacting the DNS server. (I'm not completely sure how it knows where the DNS server is? Or is it the gateway that knows?)
This involves communication through the TCP protocol since HTTP is built on it (TCP handshake: SYN, SYN/ACK, ACK, then requests for content, then RST, RST/ACK, ACK).
To actually load a webpage, the browser gets the index.html, parses it, then sends more requests based on what it needs (images, etc.)?
And finally, to do the actual Google search, I don't understand how the browser knows to communicate "I typed something in the search box and hit Enter".
Does this seem about right? / Did I get anything wrong or leave out anything crucial?
First, understand that your home router is really two devices: a switch and a router.
Focus on these facts:
The switch connects all the devices in your LAN together (including the router).
The router merely connects your switch (LAN) with the ISP (WAN).
Your LAN is essentially an Ethernet network which works with MAC addresses.
For my machine to communicate with others (the gateway in this case), it may need to do an ARP Broadcast (if it doesn't already have the MAC address in the ARP cache)
Correct.
When you want to send a file from your desktop to your laptop, you do not want to go through the router. You want to go through the switch, as that is faster (a lower layer). However, you only know the IP of the laptop in your network, so you need to find out its MAC address. That's where ARP kicks in.
In this case you broadcast the ARP request on the LAN, and the device that owns the requested IP address responds with its MAC address. This could be the router or any other device connected to the switch.
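A sketch of what that ARP broadcast looks like on the wire, built with `struct` following the standard Ethernet/ARP layout (RFC 826). Only the bytes are built here; actually sending them needs a raw socket and elevated privileges. The MAC and IP addresses in the example are hypothetical.

```python
import socket
import struct

def arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet broadcast frame carrying an ARP 'who has target_ip?' request."""
    broadcast = b"\xff" * 6
    ethernet = broadcast + sender_mac + struct.pack("!H", 0x0806)  # EtherType 0x0806 = ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,           # hardware type: Ethernet
        0x0800,      # protocol type: IPv4
        6, 4,        # MAC and IPv4 address lengths
        1,           # opcode 1 = request (2 = reply)
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6,  # target MAC unknown: that is what is being asked
        socket.inet_aton(target_ip),
    )
    return ethernet + arp

frame = arp_request(bytes.fromhex("020000000001"), "192.168.1.10", "192.168.1.20")
print(len(frame), frame[:6].hex())  # 42 ffffffffffff
```

Because the destination MAC is all ones, the switch floods the frame to every port; only the owner of 192.168.1.20 answers.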
It then needs to resolve google.com's IP address. It does this by contacting the DNS server. (I'm not completely sure how it knows where the DNS server is? Or is it the gateway that knows?)
If you use DHCP, it has already provided you with the IP of the DNS server. If not, you provided the IP of the DNS server manually. Either way, the IP of the DNS server is stored locally on your computer.
Making a DNS request is just a matter of putting the DNS server's IP as the destination of the packet carrying the query and sending it out (via the router, if the DNS server is outside your LAN).
Sidenote: DHCP also provides the IP address of the router.
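A DNS query is small enough to build by hand. This sketch follows the standard message layout (RFC 1035): a 12-byte header, the name as length-prefixed labels, then the query type and class. The query ID `0x1234` is arbitrary, and `dns_server_ip` in the comment stands for whatever address DHCP (or you) configured.

```python
import struct

def dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query asking for the A (IPv4) record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

query = dns_query("www.google.com")
print(len(query))  # 32
# Sending it (not done here) would be a UDP datagram to port 53, e.g.
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(query, (dns_server_ip, 53))
```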
This involves communication through the TCP protocol since HTTP is built on it (TCP handshake: SYN, SYN/ACK, ACK, then requests for content, then RST, RST/ACK, ACK)
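Yes, as far as HTTP goes. (Two details: the DNS query itself normally travels over UDP, not TCP, and a TCP connection is normally closed with FIN segments, e.g. FIN, ACK, FIN, ACK; RST is used to abort a connection.) To clarify things: when your computer sends the request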
FRAME[IP[TCP[GET www.google.com]]]
the frame is sent to your LAN's switch, which forwards it to the router's MAC address. Your router opens the frame, checks the destination IP and routes the packet accordingly (in this case to the WAN). When the packet finally arrives at the server, the server opens the TCP segment and reads the payload, which is the HTTP message. The SYN/ACK etc. messages are processed only by your computer and the server; routers and switches just pass them along (a NAT router does watch them to keep its translation table up to date).
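To see that the SYN, SYN/ACK, ACK exchange is done by the operating system rather than by the application, here is a small loopback sketch: `create_connection()` returns only after the kernel has completed the handshake, and the application just writes the HTTP message as the TCP payload. It uses 127.0.0.1 and a hypothetical canned response, so no real web server is involved.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Accept one connection and answer with a canned response echoing the request line."""
    conn, _ = listener.accept()  # the kernel has already completed the handshake
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the end of the headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        line = request.split(b"\r\n", 1)[0]
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(line) + line)

def fetch(port: int) -> bytes:
    # create_connection() returns once SYN, SYN/ACK, ACK have been exchanged
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n")  # the TCP payload
        data = b""
        while chunk := sock.recv(1024):  # until the server closes the connection
            data += chunk
        return data

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
server = threading.Thread(target=serve_once, args=(listener,))
server.start()
response = fetch(listener.getsockname()[1])
server.join()
listener.close()
print(response.decode())
```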
To actually load a webpage, the browser gets the index.html, parses it, then sends more requests based on what it needs? (images,etc)
Yes. An HTML file is essentially a tree structure that can reference embedded resources like images, JavaScript files, CSS, etc. For each such resource a new request has to be sent.
Once your browser has all these resources, it renders the webpage.
And finally, to do the actual google search, I don't understand how the browser knows to communicate "I typed something in the search box and hit Enter".
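As you type, the page's JavaScript sends what you have typed so far to the server, and the server responds with its suggestions. When you hit Enter, the browser submits the search form, which is just another HTTP request carrying your query in the URL. Easy as that.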
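Submitting a search is just another HTTP request. A minimal sketch of the URL a browser builds when a GET form with a text field named `q` is submitted; Google's search form has used `q` for the query, but treat the exact URL and the `search_url` helper as assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def search_url(query: str) -> str:
    """URL requested when a GET search form with a field named 'q' is submitted (assumed)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

url = search_url("how does NAT work")
print(url)                            # https://www.google.com/search?q=how+does+NAT+work
print(parse_qs(urlsplit(url).query))  # {'q': ['how does NAT work']}
```

The server side does the reverse: it parses the query string and reads the `q` value back out.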
References (good reads):
http://www.tcpipguide.com/free/t_TheNeedForAddressResolution.htm
http://www.howtogeek.com/99001/htg-explains-routers-and-switches/
http://www.eventhelix.com/realtimemantra/networking/ip_routing.htm#.UsrYAvim3yO
http://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol
Imagine the following:
A user goes to a script (http://sample.org/test.php).
The script sends an HTTP request to some other page (http://google.com/), for this example using curl.
The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
I already know that the requesting script will not receive the response, as the remote host will send any responses to the IP address given in the request.
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine? If it did, would there be any way to ensure that it was handled by the user's browser? And if so, how would the browser handle this, typically? Would it open a new window with Google in it?
I definitely have a follow-up to this question, but I am very curious what goes on at this level before I experiment further.
The script sets the IP address of the request to the user's IP, via CURLOPT_INTERFACE.
Usually, this won't work. Your ISP knows which IP address you are supposed to have and will not forward traffic coming from "fake" IP addresses.
In particular, since you can only communicate one-way with a fake IP (since the answer won't reach you), you would not be able to establish a working TCP connection, since TCP requires a three-way handshake. Thus, you wouldn't be able to submit your web request.
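A toy model (no real packets) of why the handshake defeats a spoofed source address: the server picks a random initial sequence number (ISN) and only accepts a third packet that acknowledges ISN + 1. That number is in the SYN/ACK, which goes to the spoofed IP, so the script never sees it. The function names are hypothetical.

```python
import random
from typing import Tuple

MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around

def syn_ack(client_isn: int, rng: random.Random) -> Tuple[int, int]:
    """Server side: choose a random ISN and acknowledge the client's ISN + 1."""
    return rng.getrandbits(32), (client_isn + 1) % MOD

def handshake_completes(final_ack: int, server_isn: int) -> bool:
    """The third packet must acknowledge the server's ISN + 1."""
    return final_ack == (server_isn + 1) % MOD

rng = random.Random(2024)           # seeded only so the demo is repeatable
server_isn, _ = syn_ack(1000, rng)  # this SYN/ACK goes to the spoofed IP

real_client_ack = (server_isn + 1) % MOD       # a real client read the SYN/ACK
blind_guess = random.Random(7).getrandbits(32)  # the script never saw it and must guess
print(handshake_completes(real_client_ack, server_isn))  # True
print(handshake_completes(blind_guess, server_isn))      # almost certainly False
```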
What I am wondering is what happens to this response? Assuming the client is on a LAN that has one external address and that all traffic sent to that IP is handled by a router acting as a DHCP server, will the response even get back to the user's machine?
If the user's PC has an internal IP address and uses NAT, the router will not know which LAN machine to forward the packet to (since it did not see any outgoing request to which it could match that response). Therefore, the answer would be dropped.
Even if you could get the response to reach the client:
If it did, would there be any way to ensure that it was handled by the user's browser?
No. As stated above, a TCP connection starts with a three-way handshake. Since this handshake has not been completed, the operating system would not hand the packet to any application; it would discard it (typically answering with a RST).
CURLOPT_INTERFACE is for use on computers that have multiple IP addresses assigned to them, to specify which of those addresses should be used as the source IP for the connection. You can't use it to spoof some other computer's IP address. Most likely you'll either get an error, or the option will be ignored and the OS will choose a source interface automatically (the default behavior).
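The idea behind CURLOPT_INTERFACE can be sketched with plain sockets: bind to a local address before calling `connect()`. This is an illustration, not curl's code; on a typical system, binding an address that is not assigned to the machine fails with `OSError` (EADDRNOTAVAIL on Linux) instead of spoofing anything.

```python
import socket
from typing import Tuple

def connect_from(local_ip: str, remote: Tuple[str, int]) -> socket.socket:
    """Pick the source address explicitly (port 0 = any free port), then connect."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((local_ip, 0))  # OSError if local_ip is not assigned to this machine
        sock.connect(remote)
    except OSError:
        sock.close()
        raise
    return sock

# Usage on the loopback interface, which every machine has.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = connect_from("127.0.0.1", listener.getsockname())
source_ip = client.getsockname()[0]
print(source_ip)  # 127.0.0.1
client.close()
listener.close()
```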
The response will be returned on the same TCP connection as the request.