How are URLs mapped to their respective IP addresses in DNS? - networking

What would be an explanation of how a site gets mapped to its IP address in DNS? I know the inverse tree / resolver and name server are part of the process, but what are the actual steps?

They are not. The DNS does not deal with URLs, which are a concept at level 4 of the Internet stack, that is the application layer, where protocols like HTTP live.
In the DNS you find domain names, host names, and IP addresses (both v4 and v6).
The browser extracts the hostname from the URL, resolves it to some IP address, connects to it, and, if using HTTPS, sends the hostname in the SNI extension during the TLS handshake. It then sends the URL inside its first HTTP message, typically in part via the Host header.
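As a minimal sketch of those first two steps in Python (the URL here is just an example):

import socket
from urllib.parse import urlsplit

# The URL itself never reaches the DNS; only the hostname does.
url = "https://www.example.com/some/path?q=1"
hostname = urlsplit(url).hostname      # 'www.example.com'

# Resolve the hostname via the OS resolver (which consults the DNS,
# /etc/hosts, caches, ...); both IPv4 and IPv6 results may come back.
for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, 443):
    print(family.name, sockaddr[0])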
There is a URI record type in the DNS (RFC 7553), but it is rarely used. In theory, SRV records could also be used by browsers to find the proper server to connect to based on the hostname in the URL, but in practice browsers do not use them, for various technical and non-technical reasons.

Related

Does internal communication between private servers use DNS and HTTPS?

I would like to know how internal communication links between private internal servers and a reverse proxy look.
When I make a request from my client (browser) to, say, https://facebook.com, I hit Facebook's reverse proxy. I have two questions. When that reverse proxy gets a request and needs to forward it to the server that should handle it, does the server it forwards the request to have a domain name (e.g. user.facebook.com or useroffacebook.com) or just an IP address (e.g. 34.23.66.25, a made-up address, do not go there!)? Also, does that connection use HTTP or HTTPS?
Like Kshitij Joshi already mentioned, it could be both.
A more detailed perspective for implementation:
- The reverse proxy should use IP addresses for routing, so it keeps working even if DNS fails or is unavailable to the proxy for some reason.
- Internal traffic should also be encrypted (HTTPS). Using plain text, even on internal networks, must be considered dangerous and is not recommended.
In my view, you can replace the 'should' with a 'must'.
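A minimal Python sketch of what "route by IP, still encrypt" can look like; the IP address, internal hostname, and CA file are all made-up placeholders:

import socket
import ssl

# Hypothetical internal call: connect by IP (no DNS dependency), keep TLS.
# 10.0.0.5, user.internal.example and internal-ca.pem are illustrative.
ctx = ssl.create_default_context()
ctx.load_verify_locations("internal-ca.pem")   # trust your internal CA

sock = socket.create_connection(("10.0.0.5", 443))
tls = ctx.wrap_socket(sock, server_hostname="user.internal.example")
tls.sendall(b"GET /health HTTP/1.1\r\n"
            b"Host: user.internal.example\r\n"
            b"Connection: close\r\n\r\n")
print(tls.recv(4096).decode(errors="replace"))
tls.close()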

How does Cloudflare (or another CDN) detect direct IP address requests?

So I did a DNS lookup of a website hosted through Cloudflare.
I pasted the IP address in my address bar and got a page saying:
Error 1003 Ray ID: 729ca4f4aff82e38 • 2022-07-12 20:49:14 UTC
Direct IP access not allowed
If my browser is doing the same thing, i.e. resolving the IP from the URL and then sending an HTTPS request to that same IP, how can Cloudflare detect that it's a direct IP access attempt when I do it manually?
how can Cloudflare detect that it's a direct IP access attempt?
Just based on what you (your browser) are sending!
A URL is of the form http://hostname/path, where hostname can also be an IP address.
When you put that in your browser, the browser splits it into its parts and performs an HTTP request.
The HTTP protocol defines an HTTP message to be headers plus an optional body. Among the headers, one is called Host, and its value is exactly the hostname that was in the URL.
Said differently, between http://www.example.com/ and http://192.0.2.42/ (if www.example.com resolves to that IP address):
- At the TCP/IP level nothing changes: in both cases the browser, through the OS, connects to the IP address 192.0.2.42 (because www.example.com from the first URL is resolved to its IP address).
- When the HTTP exchange starts, the message sent by the client will have as a header either Host: www.example.com in the first case or Host: 192.0.2.42 in the second case.
- The web server obviously sees all the headers sent by the client, including this Host one, and can do whatever it wants with it: most importantly, it can select which website was requested when multiple websites resolve to the same IP address. (If you followed the text above, you now see why the Host header is necessary.)
If the URLs are https:// and not just http://, there is a subtlety: there is another layer between the TCP/IP connection and the HTTP application protocol, namely TLS, and the equivalent of the Host header is also sent at the TLS level, through what is called the SNI extension, so that the server can decide which server certificate to send back to the client before even the first byte of the HTTP exchange happens.
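A small Python sketch of the two cases above: same IP, different Host header (192.0.2.42 is a documentation address, so no real server answers there):

import http.client

for host in ("www.example.com", "192.0.2.42"):
    conn = http.client.HTTPConnection("192.0.2.42", 80, timeout=5)
    # Passing Host explicitly overrides the one http.client would generate,
    # so the server sees exactly the hostname we claim.
    conn.request("GET", "/", headers={"Host": host})
    resp = conn.getresponse()
    print(host, "->", resp.status, resp.reason)
    conn.close()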

Map DNS entry to specific port

Let's say I have this DNS entry: mysite.sample. I am developing, and have a copy of my website running locally at http://localhost:8080. I want this website to be reachable using the (fake) DNS name http://mysite.sample, without being forced to remember which port the site is running on. I can set up /etc/hosts and nginx to do the proxying for that, but... Is there an easier way?
Can I somehow set up a simple DNS entry using /etc/hosts and/or dnsmasq that also specifies a non-standard port (something other than :80/:443)? Without needing extra configuration for nginx?
Or phrased more simply: is it possible to provide port mappings for DNS entries in /etc/hosts or dnsmasq?
DNS has nothing to do with the TCP port. DNS is there to resolve names (e.g. mysite.sample) into IP addresses, kind of like a phone book.
So it's a clear "no". However, there's another solution, and I'll try to explain it.
When you enter http://mysite.sample:8080 in your browser's URL bar, your client (e.g. the browser) will first try to resolve mysite.sample to an IP address (via OS calls). This is where DNS kicks in, since DNS is your name resolver. Once that has happened, the job of DNS is finished and the browser continues.
This is where the "magic" of HTTP happens. The browser connects to the resolved IP address on the desired port (by default 80 for http and 443 for https), waits for the connection to be accepted, and then sends the following headers:
GET <resource> HTTP/1.1
Host: mysite.sample:8080
Now the server reads those headers and acts accordingly. Most modern web servers have something called "virtual hosts" (e.g. Apache) or "sites" (e.g. nginx). You can configure multiple vhosts/sites, one for each domain. The web server then serves the site matching the requested host (which the browser retrieved from the URL bar and passed to the server via the Host HTTP header). This is pure HTTP and has nothing to do with TCP.
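To make the split concrete, this small Python snippet shows which part of the URL goes to DNS and which part only the client uses:

from urllib.parse import urlsplit

parts = urlsplit("http://mysite.sample:8080/index.html")
print(parts.hostname)  # 'mysite.sample' -> the only part DNS ever sees
print(parts.port)      # 8080 -> used directly for the TCP connection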
If you can't change the port of your origin service (in your case 8080), you might want to set up a new web server in front of it. This is also called a reverse proxy. I recommend reading the NGINX Reverse Proxy docs, but you can also use Apache or any other modern web server.
For nginx, just set up a new site and proxy it to your service:
server {
    listen 80;
    server_name mysite.sample;
    location / { proxy_pass http://127.0.0.1:8080; }
}
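For this to work locally, mysite.sample must of course also resolve to your machine; the question already mentions /etc/hosts, where a line like this does the trick:

127.0.0.1 mysite.sample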
There is a mechanism in DNS for discovering the port that a service uses: it is called the Service Record (SRV), which has the form
_service._proto.name. TTL class SRV priority weight port target.
However, to make use of this record, you would need an application that looks up the record before making the call. As Dominique has said, this is not how HTTP works.
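For completeness, here is a sketch of the lookup such an SRV-aware client would have to perform, using the third-party dnspython package; the record name is hypothetical, and browsers do not do this:

import dns.resolver  # third-party package: pip install dnspython

# _http._tcp.mysite.sample is a made-up name for illustration.
answers = dns.resolver.resolve("_http._tcp.mysite.sample", "SRV")
# Simplified ordering: lowest priority first, heaviest weight first
# (real clients weight-randomize within a priority level).
for rr in sorted(answers, key=lambda r: (r.priority, -r.weight)):
    print(f"would connect to {rr.target} on port {rr.port}")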
I have written a previous answer that explains some of the background to this, and why SRV lookup isn't part of the HTTP standard (the article discusses WebSockets, but the underlying discussion suggested adding this to the HTTP protocol directly).
Edited to add -
There was actually a draft IETF document exploring an official way to do this, but it never made it past draft stage.
This document specifies a new URI scheme called http+srv which uses a DNS SRV lookup to locate a HTTP server.
There is a specific SO answer here which points to an interesting post here.

Faking an HTTP request header

I have a general networking question, but it relates to a security aspect.
Here is my case: I have a host which is infected by malware. The malware creates an HTTP packet to communicate with its command and control server. While constructing the packet, the IP layer contains the correct IP address of the command and control server, and the TCP layer contains the correct port number, 80.
Before sending the packet out, the malware modifies the HTTP header, replacing the Host header value with "google.com" instead of its server's address. It then attaches the stolen data to the packet and sends it out.
My understanding is that the packet will get delivered to the correct server because the routing will happen based on the IP.
But could I host a web server on this IP that would receive all packets with the header Host: google.com and parse them correctly?
Based on my reading on the internet, it is possible, but if it is that easy, why have malware authors not adopted this technique to spoof HTTP headers and bypass traditional domain-whitelisting engines?
When you make a request to, let's say, an Apache2 server, what Apache actually does is match your Host header against the VirtualHosts in the server's configuration. Only if no match is found (or the header is invalid) will Apache route the request to the default virtual host, if one is defined. Basically, nothing stops you from changing these headers.
You can easily test this by editing your hosts file and pointing google.com to some other IP: you will be able to handle the google.com domain on your server, but only you will be able to use it this way, no one else.
Anything sent inside HTTP headers shouldn't be trusted: it is just a guide for your server on how to handle the traffic.
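To see for yourself that the Host header is just client-supplied text, here is a minimal Python sketch of a server that happily answers any claimed Host value (port 8080 chosen to avoid needing root):

from http.server import BaseHTTPRequestHandler, HTTPServer

# This server is not Google, yet it serves requests claiming Host: google.com.
# The Host header is client-supplied and completely unauthenticated.
class HostEchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = f"You claimed Host: {self.headers.get('Host', '')}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8080), HostEchoHandler).serve_forever()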
The fake host header is just there to trick some deep-inspection firewalls ("it's for Google? you may pass..."). The server on that IP either doesn't care about the host header (default vhost) or is explicitly configured to accept it.
Passing the loot on by using fake headers or just as plain data behind the headers is another trick to fool data loss prevention.
These methods can mislead shallow application-layer inspection but won't pass a decent firewall.

Should x-forwarded-for contain a proxy in https traffic?

I have a web server cluster behind a proxy/load balancer. That proxy holds my SSL certs and hands the web servers the decrypted traffic, adding along the way an "x-forwarded-for" header to the HTTP request the web application receives. This application has seen millions of IP addresses over the past decade, but something weird happened today.
For the first time, I saw an x-forwarded-for header that contained a second address reach the application [addresses altered]:
x-forwarded-for: 62.211.19.218, 177.168.159.85
This indicates that the traffic came through a proxy, and I understand this is normal for x-f-f. But I would have thought this was impossible (or at least unlikely) with HTTPS as the protocol.
Can someone explain how this is legit?
This de facto HTTP header (RFC 7239 standardizes its successor, the Forwarded header) conventionally has the form
X-Forwarded-For: client, proxy1, proxy2, ...
where client is the IP of the original client, and each proxy appends the IP it received the request from to the end of the list. In the example above, your web server would see the connection coming from a further proxy3 (whose IP never appears in the header), and proxy2 is the IP which connected to proxy3.
As anyone can put anything inside this header, you should accept it only from known sources like your own reverse proxy or a whitelist of known legitimate proxies. For example, Apache has mod_rpaf, which transparently changes the client IP address to the one provided in this header, but only if the request is received from the IP of a known proxy server.
On corporate networks you can easily do transparent proxying of HTTPS traffic without normal users noticing. Just create your own certification authority, use, for example, Windows Group Policy to install and trust this CA on all corporate workstations, then redirect all HTTPS connections to your proxy, which generates certificates for all visited domains on the fly. This does happen in practice, and you can even buy enterprise hardware proxies that use this method.
So, to summarize the reasons why you could see multiple IPs in the X-Forwarded-For header:
- A transparent HTTPS proxy, as mentioned above
- The header was added by the requestor itself (browser, wget, script) for whatever reason, for example to hide its own IP
- Some CDN like Cloudflare could add that header if used
- Multiple reverse proxies, defined either intentionally or by mistake
Conclusion: You should only trust this header if it originates from your own proxy (in case of multiple IPs, trust only the last one).
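A hedged sketch of that rule in Python; the proxy IP 10.0.0.2 is a made-up stand-in for your own load balancer:

def client_ip(xff_header, peer_ip, trusted_proxies):
    # Walk the chain from the nearest hop backwards; the first hop that is
    # not one of our own proxies is the best guess for the real client.
    # Everything before it is client-supplied and cannot be trusted.
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    for ip in reversed(hops):
        if ip not in trusted_proxies:
            return ip
    return peer_ip

# Example with the (altered) addresses from the question, assuming the
# request arrived from our own load balancer at 10.0.0.2:
print(client_ip("62.211.19.218, 177.168.159.85", "10.0.0.2", {"10.0.0.2"}))
# -> '177.168.159.85': the nearest hop we do not control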
MAYBE it's using the PROXY protocol for HTTPS. Granted, you may not be using HAProxy, but this seems to be a decent description:
http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt
I'm not sure about the SSL cert, but there's no guarantee someone isn't doing something pathological (maybe unintentionally), like running all their HTTPS traffic through a proxy and then accepting all the invalid certificates. But I suspect the PROXY protocol might make this work; it does expose the HTTP headers to the proxy in some sense.
