How does Cloudflare (or other CDN) detect direct IP address requests? - http

So I did a DNS lookup of a website hosted through cloudflare.
I pasted the IP address in my address bar and got a page saying:
Error 1003 Ray ID: 729ca4f4aff82e38 • 2022-07-12 20:49:14 UTC
Direct IP access not allowed
My browser does the same thing, i.e. it resolves the hostname in the URL to an IP address and then sends an HTTPS request to that IP. Yet when I do it manually I get this error. How can Cloudflare detect that it's a direct IP access attempt?

How can Cloudflare detect that it's a direct IP access attempt?
Just based on what you (your browser) are sending!
A URL is of the form http://hostname/path, where hostname can also be an IP address.
When you put that in your browser, the browser splits it into its parts and sends an HTTP request.
The HTTP protocol defines an HTTP message as headers plus an optional body. Among the headers, one is called Host, and its value is exactly the host part of the URL.
Put differently, compare http://www.example.com/ and http://192.0.2.42/ (assuming www.example.com resolves to that IP address):
At the TCP/IP level nothing changes: in both cases the browser, through the OS, connects to IP address 192.0.2.42 (because the www.example.com from the first URL is resolved to its IP address).
When the HTTP exchange starts, the message sent by the client carries the header Host: www.example.com in the first case and Host: 192.0.2.42 in the second.
The web server obviously sees all headers sent by the client, including Host, and can do whatever it wants with it. Most importantly, it can select which website was requested when multiple websites resolve to the same IP address (which is why the Host header is necessary in the first place).
If the URLs are https:// rather than http://, there is a subtlety: another layer, TLS, sits between the TCP/IP connection and the HTTP application protocol. The equivalent of the Host header is also sent at the TLS level, through what is called the SNI extension, so that the server can decide which certificate to send back to the client before the first byte of the HTTP exchange. With a bare IP address in the URL, neither the SNI extension nor the Host header carries a hostname that the CDN can map to a customer site, hence the error.
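To make the check concrete, here is a minimal sketch of how a server could flag a direct IP request from the Host header alone. It is not Cloudflare's actual implementation, which this answer does not describe; the function names host_without_port and is_direct_ip_access are made up for illustration.

```python
import ipaddress


def host_without_port(host_header: str) -> str:
    """Strip an optional :port suffix and IPv6 brackets from a Host value."""
    host = host_header.strip()
    if host.startswith("["):
        # Bracketed IPv6 literal such as [2001:db8::1]:443
        return host[1:host.index("]")]
    if host.count(":") == 1:
        # name:port or IPv4:port
        host = host.rsplit(":", 1)[0]
    return host


def is_direct_ip_access(host_header: str) -> bool:
    """Return True when the Host header holds an IP literal instead of a name."""
    try:
        ipaddress.ip_address(host_without_port(host_header))
        return True
    except ValueError:
        # Not parseable as an IP address, so it is treated as a hostname.
        return False
```

A server using such a check could answer Host: 192.0.2.42 with an error page and serve Host: www.example.com normally, even though both requests arrive on the same IP address.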

Related

Usage of 'Host' Header in Web Requests

I am looking at the http-requests in BurpSuite. I see a field named as 'Host'. What is the importance of this field?
What happens if I change this field and then send the request? If I change the host header field to some other IP then would the server respond back to this new modified IP?
A single web server can host multiple websites with different domains and subdomains.
The Host header allows it to distinguish between them.
Given the limited availability of IPv4 addresses, this is important as there are more websites than available IP addresses.
What happens if I change this field and then send the request?
If the server pays attention to it and recognises the hostname, it will respond with that website (otherwise it may fall back to its default website or throw an error).
For an example, see Name-based Virtual Host Support in the Apache HTTPD manual.
If I change the host header field to some other IP then would the server respond back to this new modified IP?
No. The Host header is the host the client is asking for. It has nothing to do with where the response should be sent.
To quote from https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Host :
The Host request header specifies the host and port number of the server to which the request is being sent.
If no port is included, the default port for the service requested (e.g., 443 for an HTTPS URL, and 80 for an HTTP URL) is implied.
A Host header field must be sent in all HTTP/1.1 request messages. A 400 (Bad Request) status code may be sent to any HTTP/1.1 request message that lacks a Host header field or that contains more than one.
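The name-based selection described in this answer can be sketched in a few lines. This is only an illustration, not how Apache implements it; the table VHOSTS, the document roots, and the function select_site are hypothetical.

```python
# Hypothetical virtual host table: names and document roots are made up.
VHOSTS = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"


def select_site(request_headers: dict) -> str:
    """Pick a document root from the Host header, falling back to a default."""
    host = ""
    for name, value in request_headers.items():
        # Header field names are case-insensitive.
        if name.lower() == "host":
            host = value.strip().lower()
    # Drop an optional :port (this sketch ignores bracketed IPv6 literals).
    hostname = host.split(":", 1)[0]
    return VHOSTS.get(hostname, DEFAULT_ROOT)
```

Changing the Host value only changes which entry of the table is chosen. The response still travels back over the TCP connection the request came in on.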

How are URLs mapped to their respective IP addresses in DNS?

What would be an explanation to a site being mapped to its IP address in DNS? I know inverse tree / resolver and name server are part of the process, but what are the actual steps?
They are not. The DNS does not deal with URLs. A URL is a concept of the application layer (level 4 of the Internet stack), that is, of application protocols like HTTP here.
In the DNS you find domain names, host names, and IP addresses (both v4 and v6).
The browser extracts the hostname from the URL, resolves it to an IP address, connects to it, sends the hostname in the SNI extension during the TLS handshake if the URL is HTTPS, and then sends the URL inside its first HTTP message: the path in the request line and the hostname in the Host header.
There is a URI record type in the DNS (RFC 7553), but it is rarely used. In theory SRV records could also be used by browsers to find the proper server to connect to based on the hostname in the URL, but in practice browsers do not use them, for various technical and non-technical reasons.
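The browser steps listed above can be sketched as a function that splits a URL into the pieces used at each layer. The function name plan_request and the dictionary keys are made up for illustration, and the sketch assumes the URL contains a hostname rather than an IP literal.

```python
from urllib.parse import urlsplit


def plan_request(url: str) -> dict:
    """Show which part of a URL is used for DNS, TCP, TLS and HTTP."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    # The port appears in Host only when it is not the scheme's default.
    if port == default_port:
        host_header = parts.hostname
    else:
        host_header = f"{parts.hostname}:{port}"
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return {
        "dns_query": parts.hostname,  # the only part the DNS ever sees
        "connect_port": port,         # TCP destination port
        "sni": parts.hostname if parts.scheme == "https" else None,
        "request_line": f"GET {path} HTTP/1.1",
        "host_header": host_header,
    }
```

For https://www.example.com/a/b?x=1 the DNS is asked only about www.example.com. The path and query never leave the HTTP layer.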

Faking an HTTP request header

I have a general networking question but it's related with security aspect.
Here is my case: I have a host which is infected by malware. The malware creates an HTTP packet to communicate with its command and control server. While constructing the packet, the IP layer contains the correct IP address of the command and control server. The TCP layer contains the correct port number, 80.
Before sending the packet out, the malware modifies the HTTP header to replace the Host header with "google.com" instead of its server address. It then attaches the stolen data to the packet and sends it out.
My understanding is that the packet will get delivered to the correct server because the routing will happen based on the IP.
But can I host a webserver on this IP that would receive all packets with header host google.com and parse it correctly?
Based on my reading on the internet, it is possible. But if it is that easy, why have malware authors not adopted this technique to spoof the HTTP headers and bypass traditional domain whitelisting engines?
When you make a request to, say, an Apache2 server, Apache matches your Host header against the VirtualHost entries in the server's configuration. Only if no match is found (or the header is invalid) does Apache route the request to the default virtual host, if one is defined. Basically, nothing stops you from changing these headers.
You can simply test it by editing your hosts file and pointing google.com to any other IP. You will be able to handle the google.com domain on your server, but only you will be able to use it this way, no one else.
Anything sent inside HTTP headers shouldn't be trusted; it is just a guide for your server on how to handle the traffic.
The fake host header is just there to trick some deep-inspection firewalls ("it's for Google? you may pass..."). The server on that IP either doesn't care about the host header (default vhost) or is explicitly configured to accept it.
Passing the loot on by using fake headers or just as plain data behind the headers is another trick to fool data loss prevention.
These methods can mislead shallow application-layer inspection but won't pass a decent firewall.
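The independence of the destination IP and the Host header can be tried out locally. The following is a small sketch using only the Python standard library: a throwaway server on 127.0.0.1 echoes whatever Host value it receives, and a client connects to that IP while claiming to talk to google.com. The names EchoHost, fetch_with_fake_host and demo are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHost(BaseHTTPRequestHandler):
    """Answer every GET with the Host header the client sent."""

    def do_GET(self):
        body = ("Host header was: " + self.headers.get("Host", "")).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_with_fake_host(ip: str, port: int, fake_host: str) -> str:
    """Connect to ip:port but send an unrelated name in the Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    try:
        conn.request("GET", "/", headers={"Host": fake_host})
        return conn.getresponse().read().decode()
    finally:
        conn.close()


def demo() -> str:
    # Port 0 lets the OS pick a free port for the throwaway server.
    server = HTTPServer(("127.0.0.1", 0), EchoHost)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_with_fake_host("127.0.0.1", server.server_port, "google.com")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns "Host header was: google.com": the packet is delivered by IP address, and the server is free to accept or ignore the name it is handed.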

Why is the host address included in an HTTP 1.1 GET command?

GET /calcuapp/calculator.jsp HTTP/1.1
Host: 192.168.1.66:8080
I'm using PuTTY and the host destination is already set up in the settings. Why do I need to type the host destination again, as you can see above?
The short answer is Virtual Hosts.
For many years now, it has been quite common to host multiple sites/domains from a single server. HTTP 1.1 supports this by requiring the host header. If you use HTTP 1.0 you may leave this out.
The Host HTTP header is mandatory since HTTP/1.1 and it's used for virtual hosting.
It must include the domain name of the server, and the TCP port number on which the server is listening. The port number may be omitted if the port is the standard port for the service requested (80 for HTTP and 443 for HTTPS).
An HTTP/1.1 request that lacks the Host header should be answered with a 400 (Bad Request) status code.
RFC 7230, the reference for message syntax and routing in HTTP/1.1 at the time of writing (since obsoleted by RFC 9110 and RFC 9112), tells the whole story about this header:
5.4. Host
The Host header field in a request provides the host and port
information from the target URI, enabling the origin server to
distinguish among resources while servicing requests for multiple
host names on a single IP address.
Host = uri-host [ ":" port ]
A client MUST send a Host header field in all HTTP/1.1 request
messages. If the target URI includes an authority component, then a
client MUST send a field-value for Host that is identical to that
authority component, excluding any userinfo subcomponent and its "@"
delimiter. If the authority component is missing or
undefined for the target URI, then a client MUST send a Host header
field with an empty field-value.
Since the Host field-value is critical information for handling a
request, a user agent SHOULD generate Host as the first header field
following the request-line.
For example, a GET request to the origin server for
http://www.example.org/pub/WWW/ would begin with:
GET /pub/WWW/ HTTP/1.1
Host: www.example.org
A client MUST send a Host header field in an HTTP/1.1 request even if
the request-target is in the absolute-form, since this allows the
Host information to be forwarded through ancient HTTP/1.0 proxies
that might not have implemented Host.
When a proxy receives a request with an absolute-form of
request-target, the proxy MUST ignore the received Host header field
(if any) and instead replace it with the host information of the
request-target. A proxy that forwards such a request MUST generate a
new Host field-value based on the received request-target rather than
forward the received Host field-value.
Since the Host header field acts as an application-level routing
mechanism, it is a frequent target for malware seeking to poison a
shared cache or redirect a request to an unintended server. An
interception proxy is particularly vulnerable if it relies on the
Host field-value for redirecting requests to internal servers, or for
use as a cache key in a shared cache, without first verifying that
the intercepted connection is targeting a valid IP address for that
host.
A server MUST respond with a 400 (Bad Request) status code to any
HTTP/1.1 request message that lacks a Host header field and to any
request message that contains more than one Host header field or a
Host header field with an invalid field-value.
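The last rule of the quoted section can be sketched as a small check on a raw request. This is a simplified illustration, not a full HTTP parser: it only counts Host fields and does not validate the field-value. The function name status_for_host_check is made up, and returning 200 simply stands for "the request passes this check".

```python
def status_for_host_check(raw_request: bytes) -> int:
    """Return 400 for an HTTP/1.1 request with zero or several Host fields."""
    # Keep only the header section (everything before the blank line).
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    request_line, header_lines = lines[0], lines[1:]
    if not request_line.endswith("HTTP/1.1"):
        return 200  # the quoted rule is stated for HTTP/1.1 requests
    hosts = [
        line for line in header_lines
        if line.split(":", 1)[0].strip().lower() == "host"
    ]
    return 200 if len(hosts) == 1 else 400
```

The request typed into PuTTY in the question passes this check. Dropping its Host line, or sending the line twice, yields 400.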
Your local resolver (DNS etc.) converts the host name on the command line to an IP address before connecting. The remote server has no way to know which host name you gave on the command line if multiple host names resolve to the same IP address. This is what's called "virtual hosting". With HTTP 1.0, you needed a separate IP address for each distinct HTTP host, which is extremely wasteful, but saves you from needing to transmit the Host: header.

nginx DNS redirect: how does it work

I'm trying to do some DNS redirect: if a user accesses http://subdomain.mydomain.com, they will be redirected to http://www.mydomain.com/some/url.
I think it can be done with a URL record in the DNS server. But like mentioned [here] it can be done with HTTP server configuration as well. And ... that confused me.
AFAIK, a request starts with a DNS lookup, which gives us the IP address of the server. From there on, HTTP traffic is IP-based. So how does nginx/apache know the server name?
There is no DNS URL record. If you are referring to the DNSimple product, it's actually a combination of a CNAME (or A) record and a simple HTTP server.
HTTP clients (browsers) send the server's name in a header as part of the HTTP request.
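What such a "simple HTTP server" does can be sketched as follows. This is an illustration in Python rather than an nginx configuration; the hostnames come from the question, while the table REDIRECTS and the function respond are made up for this example.

```python
# Redirect table keyed by the name the client sends in the Host header.
REDIRECTS = {
    "subdomain.mydomain.com": "http://www.mydomain.com/some/url",
}


def respond(host_header: str) -> tuple:
    """Return (status, headers): a 301 for redirected names, else a plain 200."""
    hostname = host_header.split(":", 1)[0].strip().lower()
    target = REDIRECTS.get(hostname)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}
```

DNS only gets the client to the server's IP address. The decision to redirect is made by the HTTP server once it reads the Host header.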

Resources