How does a server know which domain name was used? - networking

As far as i know what we get from a dns query is a ip address. So in the end of the day if thats true we are still using ip addresses to connect the server and domains are pretty names for them.
So how does a server know which domain i used to query that ip address?
How does vhosts work an understand that if the domain data is lost during dns query?

The Internet works in layers. Each layer uses different kind of parameters to do its work.
Layer 3 is typically IP aka Internet Protocol. To work it uses IP addresses, each computer has at least one to be able to discuss with another one. And there are two families in fact: version 4 and version 6.
Since multiple services can be on any given computer at some point, you need a layer on top of that, layer 4, that deals with transport. The "predominant" one is TCP aka Transport Control Protocol, but there is also UDP. TCP and UDP uses ports: a 2 bytes integer that encodes for a specific protocol.
For example, HTTP was given port number 80 (completely arbitrary), and HTTPS port 443.
The DNS, which itself uses UDP and TCP (on port 53), allows, among other things, to map a given hostname to a given IP address or multiple IP addresses. This is the typical A and AAAA records. There is also a CNAME record that maps one domain name to another. There also exists a SRV record that maps a service (which is a protocol name + a transport) to a given hostname and port number.
When one computer connects to another, its first step for all the above is to find out which IP address to use to connect to. It can use the DNS for that. Typically it will get only the IP address, but, depending on the protocol (layer above 4), may also get a port (if using SRV records).
The HTTP world does not use SRV records. So a browser just uses the hardcoded 80 or 443 ports, or the port number appearing in the URL.
Then we are at the transport level, let us say TCP.
The connection is done (since now the remote IP address and port are known) and the protocol above TCP, like HTTP, is free to convey any kind of extra data, such as the hostname that the client initially used (as taken from the URL) to find out the IP address.
This is done through the HTTP host header, see RFC 2616
Note that if you do things through TLS (which conceptually sits between TCP and HTTP) there is even something else happening: SNI or Server Name Indication.
When doing the TLS handshake, so before any kind of HTTP headers or content, the client will send the final hostname desired in some specific TLS message. Why? So that the server can find which specific certificate it should answer which as otherwhise it would not be able to know which hostname is requested as this sits in some HTTP header which do not exist until the TLS handshake is finished.
A webserver will be able to see both the SNI content to find out which certificate to send back and then the host header to find out which VirtualHost (in Apache) section is relevant to the query being processed.
If you are not in HTTP world, then it all depends on the protocol used. Older protocols, like FTP, did not plan for "multihoming" at the beginning, a given IP address meant only one hostname and service for example.

Related

Why do we need port number in HOST header of HTTP when we have port number in TCP packet?

I'm aware of how HOST header can help us having multiple websites on a single IP address. In HOST header, we can optionally specify "port number". (80 by default for HTTP)
In OSI model, layer-4 is responsible for dealing with "ports" and after reassembling the packets, it can hand them to the correct application/process.
On the other hand, HTTP works in layer-7 of OSI. So on that point, I think the application already received the correct packet and knows the port number.
Then why the HOST header have this "port number" part and how can this "port" of HOST header help us?
Also I want to know that if they are different or can be different ?
The port in the URL is the same port that gets used for the TCP connection and it is the same port that's in the host header.
The protocol is kind of Layer 5/6 but definitely not Layer 7. You might be able to argue it is Layer 6, but probably not if it's encrypted, in which case TLS would be l5 and http l6.
Adding the port allows the session layer to instruct the OS what port to use.
For some L5 protocols the application knows the default port, eg http(80) https(443) ftp(21).
But when you want to run one of those L5 sessions over a different L4 connection the user needs a way to instruct the TCP stack to do this. So the designers of http decided to allow an optional TCP port at the end of the URL.
The port in the host header tells you which endpoint your clients connected to. Eg abc.com:80 and abc.com:81 are different Endpoints, but they could be connected to the same server instance.
While it's true that a server can work out which port a user is connected to by looking at the socket, the server implementation might not support this or it might be necessary to retain it in the future.
If your server needs the port on the host header becomes a question of implementation and needs.

Understanding the SOCKS5 protocol RFC

I am reading the SOCKS5 RFC, it has:
CONNECT
In the reply to a CONNECT, BND.PORT contains the port number that the
server assigned to connect to the target host, while BND.ADDR
contains the associated IP address. The supplied BND.ADDR is often
different from the IP address that the client uses to reach the SOCKS
server, since such servers are often multi-homed. It is expected
that the SOCKS server will use DST.ADDR and DST.PORT, and the
client-side source address and port in evaluating the CONNECT
request.
For the last part of this paragraph, I have two questions:
The doc states that SOCKS servers are often multi-homed, and will reply to the client different bound address and port than the ones the client originally connects to. Does this mean the SOCKS server the client connects to redirects the connection to another SOCKS server? If so, what is point of letting the client sense the presence of the redirected SOCKS server? What will a client normally do with the bound address and port the SOCKS server replies?
The doc states It is expected that the SOCKS server will use DST.ADDR and DST.PORT, and the client-side source address and port in evaluating the CONNECT request, what exactly does it mean by evaluating the CONNECT request? What am I supposed to do in this evaluating process if I am implementing a SOCKS server?
SOCKS proxies are multi-homed because they are often installed at network boundaries like firewalls. The client connects to the internal interface of the firewall but the outgoing address is the external face. Since some protocols like FTP need to include the external visible IP address and port in-band (see FTP data transfer, i.e. PORT and PASV) they need to know this externally visible IP and port.
A normal socks proxy will connect where the clients wants to, i.e. DST. But when an upstream proxy is configured or when firewall ACL say different the proxy might behave differently.
No. It means the server has 2 (or more) network cards/connections -- you communicate with the server on cardA, but when that server connects to the device downstream, it uses cardB.
That's up to you really...perhaps you want to blacklist/whitelist certain clients/servers/ports (ex. only allow clients from your country, or only allow connections to a specific country). Good example is not letting a client connect back to itself (?). Just a guess. Usually RFCs are good about saying "MUST, MIGHT, MUST NOT, etc" ..if it says "expected", to me that sounds like 'might' which basically means 'can, but doesn't have to.'

Ensure http response goes to $_SERVER['REMOTE_ADDR']

I want to ensure people who use my site are who they say they are. Not who they say, they say, they are.
How do I ensure my data is going back to the IP given by $_SERVER['REMOTE_ADDR']?
Or is it automatic that the http response is sent there?
How do I ensure my data is going back to the IP given by
$_SERVER['REMOTE_ADDR']?
The client IP is handled by the TCP protocol. The REMOTE_ADDR property is populated by the client address of the TCP connection. It's not part of the HTTP protocol. So it is guaranteed that your application is talking to this IP.
This doesn't mean that the IP that you are seeing is the actual IP of the end-user (as attributed for example by his internet provider). There could be proxies or intermediate devices between him and your application. So basically what you will be seeing is the closest IP to your web server in this chain.

server is getting wreird IP address from client

I have a static local IP Address: 10.8.4., and the public IP Address of my machine is: 72.43.135.. when the server(sitting on different network from my workstation) gets a request from my machine, it sees my IP address from
Context.Request.UserHostAddress
and got 10.20.102.*.
why it the server not getting the IP as: 72.43.135.*?
If you define public and local, you will get to know that these terms might refere to the same network under some conditions. This could be a demilitarized zone (DMZ) for example.
What IP the destination server sees, depends on the interface you send the packets through and the routers it crosses.
Is there masquerading (NAT) ? - Is the main question. You can be on totally different networks but the routers might still forward your local IP, now this also depends on the routing table. Can a packet find its way back to your host? Is there a reversed route from the host to your machine?
The destination host is propably having 2 interfaces, 1 with IP 72.43.. one with a 10.8.. maybe it recieves through the 72 but sends back through the 10.8 because it has a different route back. Networking can be real voodoo! Trace your packets, ask your sysadmins..
(not talking about proxies here, they deliver different custom headers with different IPs)

How are different TCP connections in HTTP requests identified?

From what I understand, each HTTP request uses its own TCP connection (please correct me if i'm wrong). So, let's say that there are two current connections to the same server. For example, client side javascript code triggering a couple of AJAX POST requests using the XMLHttpRequest object, one right after the other, before getting the response to the first one. So we're talking about two connections to the same server, each waiting for a response in order to route it to each separate callback function.
Now here's the thing that I don't understand: The TCP packet includes source and destination ip and port, but won't both of these connections have the same src and dest ip addresses, and port 80? How can the packets be differentiated and routed to appropriately? Does it have anything to do with the packet sequence number which is different for each connection?
When your browser creates a new connection to the HTTP server, it uses a different source port.
For example, say your browser creates two connections to a server and that your IP address is 60.12.34.56. The first connection might originate from source port 60123 and the second from 60127. This is embedded in the TCP header of each packet sent to the server. When the server replies to each connection, it uses the appropriate port (e.g. 60123 or 60127) so that the packet makes it back to the right spot.
One of the best ways to learn about this is to download Wireshark and just observe traffic on your own network. It will show you this and much more.
Additionally, this gives insight into how Network Address Translation (NAT) works on a router. You can have many computers share the same IP address and the router will rewrite the request to use a different port so that two computers can simultaneously connect to places like AOL Instant Messenger.
They're differentiated by the source port.
The main reason for each HTTP request to not generate a separate TCP connection is called keepalives, incidentally.
A socket, in packet network communications, is considered to be the combination of 4 elements: server IP, server port, client IP, client port. The second one is usually fixed in a protocol, e.g. http usually listen in port 80, but the client port is a random number usually in the range 1024-65535. This is because the operating system could use those ports for known server protocols (e.g. 21 for FTP, 22 for SSH, etc.). The same network device can not use the same client port to open two different connections even to different servers and if two different clients use the same port, the server can tell them apart by their IP addresses. If a port is being used in a system either to listen for connection or to establish a connection, it can not be used for anything else. That's how the operating system can dispatch packets to the correct process once received by the network card.

Resources