Difference between the Origin and X-Forwarded-Host HTTP headers

At work we use several websites with several virtual hosts.
I understand what a Virtual Host is, but I don't understand the difference between the Origin and X-forwarded-Host headers. (We use both of these headers at work.)
Examples from MDN:
X-Forwarded-Host syntax: X-Forwarded-Host: <host> ==> example: X-Forwarded-Host: id42.example-cdn.com
Origin syntax: Origin: <scheme> "://" <hostname> [ ":" <port> ] ==> example: Origin: https://developer.mozilla.org
What I deduce from the above examples is that X-Forwarded-Host contains just the host, while Origin contains the host plus the scheme (protocol) and maybe the port.
Can someone tell me if I'm wrong?

When a request is made to a website, the user agent (browser) adds a Host header to the request. The value of this header is the domain name that the request is made to. The server can use this header to distinguish which website you are trying to visit, for instance when the same server hosts multiple websites.
If you are using a reverse proxy, the Host header that reaches the backend may contain the host name the proxy used to reach it rather than the one the client asked for. To recover the original host, the X-Forwarded-Host header can be used. It contains what the proxy initially received as Host.
Example: If you make a GET request to https://stackoverflow.com, the Host (or after the proxy X-forwarded-Host) header will always have the value stackoverflow.com.
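As a rough illustration of the lookup described above, here is a minimal Python sketch. The function name requested_host is made up for this example, headers are assumed to arrive as a plain dict, and taking the first value of a comma-separated X-Forwarded-Host is an assumption about how a chain of proxies might append values. The header should only be trusted when it is set by a proxy you control.

```python
def requested_host(headers: dict[str, str]) -> str:
    """Return the host the client originally asked for (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-host")
    if forwarded:
        # Assumption: if several proxies appended values, the first one
        # is the host the client originally requested.
        return forwarded.split(",")[0].strip()
    # No proxy header present: fall back to the plain Host header.
    return lowered.get("host", "")
```

Behind a proxy, requested_host({"Host": "backend.internal", "X-Forwarded-Host": "stackoverflow.com"}) gives stackoverflow.com; without a proxy it simply returns the Host value. The name backend.internal is a placeholder.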
The Origin header is not related to this. This header tells you from which site the request is made.
For instance, if you have a <form> on example.com that makes a POST request to stackoverflow.com, the Host will be stackoverflow.com (the domain the request is sent to), while the Origin will have the value https://example.com.
For a demonstration you can visit this link: https://jsfiddle.net/parcqhn4/
If you open the "Network" tab in your developer tools and hit the "Submit" button, you can see that the Host header contains the value example.com (which is in the action of the <form>) while the Origin contains the value https://fiddle.jshell.net (which is the container that the Fiddle runs in).
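To make the contrast concrete, the following Python sketch compares the two headers on the server side. The helper names are invented for this example, and it deliberately ignores the scheme and only compares the host part, so it is a simplification rather than a full same-origin check.

```python
from urllib.parse import urlsplit


def origin_host(origin: str) -> str:
    """Extract the host[:port] part of an Origin value such as https://example.com."""
    return urlsplit(origin).netloc


def is_cross_site(headers: dict[str, str]) -> bool:
    """True when the page that sent the request lives on a different host
    than the one the request is addressed to (hypothetical helper)."""
    origin = headers.get("Origin")
    if origin is None or origin == "null":
        # Without a usable Origin header there is nothing to compare.
        return False
    return origin_host(origin) != headers.get("Host")
```

For the <form> example, Host is stackoverflow.com and Origin is https://example.com, so is_cross_site returns True.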

Related

How to use https in the host header of a post request

https://my.example.com works in the browser while my.example.com:443 does not. I need to use the former in the Host header of a POST request, but the Host header can't contain slashes, so I'm not sure what to do.

Using Delphi THTTPClient with TLS/SSL and own DNS host name resolving

Is it possible to use the Delphi System.Net.HttpClient.THTTPClient class with TLS/SSL and my own DNS host name resolution? So far I haven't got it to work. Setting the HTTP "Host" header isn't enough; it seems to be an SNI issue, so I have to set the host name separately for the TLS negotiation.
Url example:
https://api.ipgeolocationapi.com
-->
https://172.64.111.34
For the second one I have to >>set<< the host name "api.ipgeolocationapi.com" separately to avoid certificate problems. The question is how to do that.
Any suggestions?
The host name is part of the HTTP request only in the "Host" header line. So you can simply add that header line to the request and use the IP in the URL. If the Indy component adds a Host header line itself, you must change its value (which in your case would be the IP) to the host name.
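This is not a THTTPClient solution, but the two places where the host name appears can be illustrated in Python with only the standard library. The sketch below connects to an IP address, passes the real host name to the TLS layer (which sets SNI and is the name the certificate is checked against), and sends the same name in the Host header. The function names are invented for this example.

```python
import socket
import ssl


def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the real host name."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_via_ip(ip: str, hostname: str, path: str = "/", timeout: float = 10.0) -> bytes:
    """Connect to `ip`, but present `hostname` for both SNI and the Host header."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, 443), timeout=timeout) as raw:
        # server_hostname sets the SNI extension and is also the name
        # used for certificate verification.
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            tls.sendall(build_request(hostname, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

With the addresses from the question, the call would be fetch_via_ip("172.64.111.34", "api.ipgeolocationapi.com"). Whether that IP still serves the site is not something this sketch can guarantee.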

Real life usage of the X-Forwarded-Host header?

I've found some interesting reading on the X-Forwarded-* headers, including the Reverse Proxy Request Headers section in the Apache documentation, as well as the Wikipedia article on X-Forwarded-For.
I understand that:
X-Forwarded-For gives the address of the client which connected to the proxy
X-Forwarded-Port gives the port the client connected to on the proxy (e.g. 80 or 443)
X-Forwarded-Proto gives the protocol the client used to connect to the proxy (http or https)
X-Forwarded-Host gives the content of the Host header the client sent to the proxy.
These all make sense.
However, I still can't figure out a real life use case of X-Forwarded-Host. I understand the need to repeat the connection on a different port or using a different scheme, but why would a proxy server ever change the Host header when repeating the request to the target server?
If you use a service like Apigee as the front end to your APIs, you will need something like X-Forwarded-Host to know which hostname was used to connect to the API. Apigee is configured with whatever your backend DNS name is, so nginx and your app stack only see the backend DNS name in the Host header, not the hostname that was called in the first place.
This is the scenario I worked on today:
Users access a certain application server using the "https://neaturl.company.com" URL, which points to a reverse proxy. The proxy then terminates SSL and forwards users' requests to the actual application server, which has the URL "http://192.168.1.1:5555". The problem: when the application server needed to redirect a user to another page on the same server using an absolute URL, it used the latter URL, which users cannot access. Using X-Forwarded-Host (plus X-Forwarded-Proto and X-Forwarded-Port) allowed our proxy to tell the application server which URL the user originally used, and the server then started to generate correct absolute URLs in its responses.
In this case there was no option to stop the application server from generating absolute URLs, nor to configure a "public URL" for it manually.
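What the application server has to do with those three headers can be sketched in Python as follows. The function name public_base_url is hypothetical, the headers are assumed to arrive as a dict, and IPv6 literals in the host are not handled.

```python
def public_base_url(headers: dict[str, str], default_scheme: str = "http") -> str:
    """Rebuild the scheme://host[:port] the user originally typed (sketch)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    scheme = lowered.get("x-forwarded-proto", default_scheme)
    host = lowered.get("x-forwarded-host") or lowered.get("host", "")
    port = lowered.get("x-forwarded-port")
    if port is None:
        # No port information from the proxy: keep the host value as it is.
        return f"{scheme}://{host}"
    # Drop any port already embedded in the host value (no IPv6 support here).
    hostname = host.split(":")[0]
    default_port = {"http": "80", "https": "443"}.get(scheme)
    if port == default_port:
        # Default ports are normally omitted from URLs.
        return f"{scheme}://{hostname}"
    return f"{scheme}://{hostname}:{port}"
```

With Host: 192.168.1.1:5555, X-Forwarded-Host: neaturl.company.com, X-Forwarded-Proto: https and X-Forwarded-Port: 443, the result is https://neaturl.company.com, which is the prefix a redirect should use.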
I can give you a real-life example: an issue I had with an IBM portal.
In my case the problem was that the IBM portal has a REST service which returns a URL for a resource, something like:
{"url":"http://internal.host.name/path"}
What happened?
Simple: when you come in from the intranet everything works fine, because the internal host name exists. But when the user comes in from the internet, the internal host name cannot be resolved and the portal crashes.
The fix for the IBM portal was to read the X-FORWARDED-HOST header and then change the response to something like:
{"url":"http://internet.host.name/path"}
Note that the second response uses internet rather than internal.
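The actual IBM portal fix is not shown here, but the idea of swapping the host in a generated URL can be sketched in Python. The function name with_public_host is invented for this example.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def with_public_host(url: str, forwarded_host: Optional[str]) -> str:
    """Replace the host part of `url` with the X-Forwarded-Host value, if any."""
    if not forwarded_host:
        # Request did not come through the proxy: keep the internal URL.
        return url
    parts = urlsplit(url)
    # Only the network location changes; scheme, path and query are kept.
    return urlunsplit(parts._replace(netloc=forwarded_host))
```

with_public_host("http://internal.host.name/path", "internet.host.name") yields http://internet.host.name/path, matching the second response above.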
As for the need for X-Forwarded-Host, I can think of a virtual hosting scenario with several internal hosts (on an internal network) and a reverse proxy sitting between those hosts and the internet. If the requested host is part of the internal network, its name resolves to the reverse proxy's IP and the web browser sends the request to the reverse proxy. The reverse proxy finds the appropriate internal host and forwards the client's request to it. In doing so, the reverse proxy changes the Host field to match the internal host and sets X-Forwarded-Host to the host actually requested by the client. More details on reverse proxies can be found on this Wikipedia page: http://en.wikipedia.org/wiki/Reverse_proxy.
Check this post for details on x-forwarded-for header and a simple demo python script that shows how a web-server can detect the use of a proxy server: x-forwarded-for explained
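The header rewriting described in this answer could look roughly like the following Python sketch. The function name and the upstream name app1.internal:8080 are placeholders. Discarding any X-Forwarded-Host sent by the client is a design choice made here so that the value cannot be spoofed from outside.

```python
def headers_for_upstream(client_headers: dict[str, str], upstream_host: str) -> dict[str, str]:
    """Return the headers a reverse proxy would send to the internal host (sketch)."""
    # Copy everything except Host and any client-supplied X-Forwarded-Host.
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in ("host", "x-forwarded-host")
    }
    # Remember the host the client asked for before overwriting it.
    original_host = next(
        (value for name, value in client_headers.items() if name.lower() == "host"),
        "",
    )
    forwarded["Host"] = upstream_host
    forwarded["X-Forwarded-Host"] = original_host
    return forwarded
```

A request with Host: www.example.com forwarded to app1.internal:8080 leaves the proxy with Host: app1.internal:8080 and X-Forwarded-Host: www.example.com.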
One example could be a proxy that blocks certain hosts and redirects them to an external block page. In fact, I’m almost certain my school filter does this…
(And the reason they might not just pass on the original Host as Host is because some servers [Nginx?] reject any traffic to the wrong Host.)
X-Forwarded-Host just saved my life. CDNs (or reverse proxy if you'd like to go down to "trees") determine which origin to use by Host header a user comes to them with. Thus, a CDN can't use the same Host header to contact the origin - otherwise, the CDN would go to itself in a loop rather than going to the origin. Thus, the CDN uses either IP address or some dummy FQDN as the Host header fetching content from the origin. Now, the origin may wish to know what was the Host header (aka website name) the content is asked for. In my case, one origin served 2 websites.
Another scenario: your app is licensed to a specific host URL, and you then want to load balance across n > 1 servers.

How does web application restore original URL after HTTP proxy or load balancer?

You deploy a web application (in my case Java EE + Spring MVC, but I don't think it matters which web stack is used) and hide it behind an HTTP proxy or load balancer.
The proxy/balancer software can fix HTTP headers; that is not the question.
But application itself put links into generated HTML:
...
...
The proxy/balancer can use a different $HOST:$PORT or $CONTEXT part. In the case of Java EE with JSP, this piece of code fixes the issue:
<c:url value="$PATH">
${pageContext.request.contextPath}/$PATH
I want to know how a web framework learns the user-requested $HOST:$PORT/$CONTEXT so that it can be rendered in HTML.
Is this info extracted from:
http://en.wikipedia.org/wiki/X-Forwarded-For
non-standard, de facto header? It looks like:
X-Forwarded-For: client, proxy1, proxy2, ..., proxyN
so the web framework extracts the second argument (proxy1 in my example, or the host IP if N == 0) to give you $HOST:$PORT/$CONTEXT?
This is going to be dependent on your particular proxy or load balancer. X-Forwarded-For is very common, but it usually only tells you about the IP address of the original request.
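To show what X-Forwarded-For does and does not contain, here is a small Python sketch that splits the header into its address chain. The function names are made up for this example. The leftmost entry is whatever the first proxy recorded (or whatever the client claimed), so it should not be trusted blindly, and nothing in the header carries a host name, port, or context path.

```python
from typing import Optional


def forwarded_chain(value: str) -> list[str]:
    """Split an X-Forwarded-For value into its individual addresses."""
    return [part.strip() for part in value.split(",") if part.strip()]


def client_address(value: str) -> Optional[str]:
    """Return the leftmost address, i.e. the reported original client."""
    chain = forwarded_chain(value)
    return chain[0] if chain else None
```

For "203.0.113.7, 10.0.0.1, 10.0.0.2" (documentation and private addresses chosen for illustration), the chain has three entries and the reported client is 203.0.113.7.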
In AWS you can use three headers to construct more of the original URL:
X-Forwarded-For
X-Forwarded-Proto
X-Forwarded-Port
Apache uses these and you can configure other custom headers with additional data:
X-Forwarded-For - The IP address of the client.
X-Forwarded-Host - The original host requested by the client in the Host HTTP request header.
X-Forwarded-Server - The hostname of the proxy server.
In Azure, these headers are available:
x-original-host
x-original-path
Bottom line is that there is no standard way of re-constructing the original URL. You will have to use the documentation of whatever proxy you are using. Some data may be missing. In some cases you may be able to configure the proxy to send missing data in custom headers.
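Because the header names differ between proxies, an application that must run behind several of them might probe a list of candidates. The Python sketch below uses only header names mentioned in this answer. The function name, the tuple name, and the order of the candidates are assumptions, and you should only list headers that your own proxy is known to set.

```python
from typing import Optional

# Candidate headers for the original host, most specific first (assumed order).
HOST_HEADER_CANDIDATES = ("x-forwarded-host", "x-original-host", "host")


def first_present(headers: dict[str, str], names: tuple[str, ...]) -> Optional[str]:
    """Return the value of the first header in `names` that is set and non-empty."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in names:
        value = lowered.get(name)
        if value:
            return value
    # None of the candidates is available: the data is simply missing.
    return None
```

The same helper can be reused with a different tuple, for example ("x-forwarded-proto",) for the scheme, and a None result tells the caller that the proxy did not supply that piece of the original URL.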

How does a webserver know what website you want to access?

Apache has something called VirtualHosts.
You can configure it so that going to example.com gives you a different site than example2.com, even if both use the same IP address.
An HTTP request looks something like this:
GET /index.html HTTP/1.0
[some more]
How does the server know you are trying to access www.example.com or www.example2.com?
In addition to the GET line, the browser sends a number of headers. One of these headers is the Host header, which specifies which host the request is targeted at.
A simple example request could be:
GET /index.html HTTP/1.0
Host: example.com
This indicates that the browser wants whatever is at http://example.com/index.html, and not what is at http://example2.com/index.html.
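A toy version of this lookup, written in Python, might look like the sketch below. The SITES table and its directory paths are invented for illustration; a real server such as Apache does this through its VirtualHost configuration rather than code like this.

```python
from typing import Optional

# Hypothetical mapping from host name to document root.
SITES = {
    "example.com": "/var/www/example",
    "example2.com": "/var/www/example2",
}


def parse_host(raw_request: str) -> Optional[str]:
    """Find the Host header value in a raw HTTP request, if present."""
    lines = raw_request.split("\r\n")
    for line in lines[1:]:  # skip the request line (GET ... HTTP/1.0)
        if line == "":
            break  # an empty line ends the header section
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None


def document_root(raw_request: str, default: str = "/var/www/default") -> str:
    """Pick the site to serve based on the Host header."""
    host = parse_host(raw_request)
    if host is None:
        return default
    hostname = host.split(":")[0].lower()  # ignore an optional :port
    return SITES.get(hostname, default)
```

Feeding it the example request above selects the example.com entry; a request without a Host header falls back to the default site.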
Further information:
The Host header in the HTTP specification
IIS also has this, and I believe it refers to it as host header redirection.
The HTTP request headers contain the destination hostname, which the server uses to determine which website to serve. Some more reading: http://www.it-notebook.org/iis/article/understanding_host_headers.htm
