I am struggling to find a solution with a reverse proxy.
The goal is to dynamically reroute incoming requests based on the URI path, e.g.:
https://a.b.c/23432/IP.IP.IP.IP/Path should be proxied to https://IP.IP.IP.IP:23432/Path
While it is working at first sight with
location ~ ^/(?<targetport>([0-9]+)?)/(?<targethost>[^/]+) {
proxy_pass http://$targethost:$targetport;
[...]
in the end, only the first element (index.html) is served correctly. The requests made by this page (say, js/my.js) lose the /port/host prefix: the browser requests https://a.b.c/js/my.js, which fails to be served.
I tried using http_referer (even reverse-proxying the request to it), but it doesn't help, as I am unable to parse it correctly.
What am I missing here ?
Thanks for your help
Problem solved: the proxied site was prefixing all its resource URLs with /, which discarded the initial path.
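For anyone hitting the same thing, here is a minimal Python sketch of how a browser resolves links against the page URL, which is roughly why the prefix vanished. The host 10.0.0.1 is a placeholder I made up, not the real target.

```python
from urllib.parse import urljoin

# Hypothetical page URL as seen by the browser (10.0.0.1 is a placeholder host).
page = "https://a.b.c/23432/10.0.0.1/index.html"

# A root-relative link ("/js/my.js") replaces the whole path,
# so the /port/host prefix is lost.
root_relative = urljoin(page, "/js/my.js")

# A document-relative link ("js/my.js") is resolved against the page's
# directory, so the prefix survives and the proxy can still route it.
doc_relative = urljoin(page, "js/my.js")

print(root_relative)
print(doc_relative)
```

So either the proxied site has to emit relative links, or the proxy has to rewrite them in the response.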
I came into a situation today. Please share your expertise 🙏
I have a project (my-app.com) and one of the features is to generate a status page consisting of different endpoints.
Current Workflow
User logs into the system
User creates a status page for one of his sites (e.g. google) and adds different endpoints and components to be included on that page.
System generates a link for a given status page.
For Example. my-app.com/status-page/google
But the user may want to see this page in his custom domain.
For Example. status.google.com
Since this is a custom domain, we need on-demand TLS functionality. For this feature, I used Caddy and it is working fine. Caddy is running on our subdomain status.myserver.com, and the user's custom domain status.google.com has a CNAME to our subdomain status.myserver.com.
Besides on-demand TLS, I am also required to do a reverse proxy as shown below.
For Example. status.google.com ->(CNAME)-> status.myserver.com ->(REVERSE_PROXY)-> my-app.com/status-page/google
But Caddy only supports the protocol, host, and port format for a reverse-proxy upstream (like my-app.com), while my requirement is to reverse-proxy to a specific page, my-app.com/status-page/google. How can I achieve this? Is there a better alternative to Caddy or a workaround with Caddy?
You're right: since you can't use a path in a reverse-proxy upstream URL, you'd have to rewrite the request to include the path first, before initiating the reverse proxy.
Additionally, upstream addresses cannot contain paths or query strings, as that would imply simultaneous rewriting the request while proxying, which behavior is not defined or supported. You may use the rewrite directive should you need this.
So you should be able to use an internal caddy rewrite to add the /status-page/google path to every request. Then you can simply use my-app.com as your Caddy reverse-proxy upstream. This could look like this:
https:// {
rewrite * /status-page/google{path}?{query}
reverse_proxy http://my-app.com
}
You can find out more about all possible Caddy reverse_proxy upstream addresses you can use here: https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#upstream-addresses
However, since you probably can't hard-code the name of the status page (/status-page/google) in your Caddyfile, you could set up a script (e.g. at /status-page) which takes a look at the requested URL, looks up the domain (e.g. status.google.com) in your database, and automatically outputs the correct status-page.
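As a rough idea of what such a lookup could do, here is a minimal Python sketch. The STATUS_PAGES table and the rewritten_path function are hypothetical names for illustration; in practice the mapping would come from your database.

```python
from typing import Optional

# Hypothetical lookup table: custom domain -> status page name.
# In a real setup this would be a database query.
STATUS_PAGES = {"status.google.com": "google"}


def rewritten_path(host: str, path: str) -> Optional[str]:
    """Return the /status-page/... path for a custom domain, or None if unknown."""
    # Normalise the Host header: lower-case it and drop an optional port.
    hostname = host.lower().split(":")[0]
    slug = STATUS_PAGES.get(hostname)
    if slug is None:
        return None
    return f"/status-page/{slug}{path}"
```

For example, rewritten_path("status.google.com", "/") gives "/status-page/google/", and an unknown domain gives None, which you would probably turn into a 404.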
I have a rewrite rule that looks like the following
rewrite ^/([a-zA-Z0-9_]+)$ /mysite/#!/$1/login;
The idea is that a shortcode like
/foo
gets redirected to
/mysite/#!/foo/login
However nginx is redirecting to:
/mysite/%23!/foo/login
How do I prevent the URL encoding from happening in the rewrite?
I can reproduce this issue by using a reverse proxy.
Nginx is actually doing the right thing, as # is a reserved character for URIs and identifies the start of the fragment identifier.
The fragment identifier is for the browser's use only and is not usually received by the server in the requested URL. I am not sure how your Tomcat server is receiving requests containing a naked # in the first place.
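To show what "reserved" means here, a small Python sketch using the standard URL parser. It only illustrates the URI rules, not nginx's actual code path.

```python
from urllib.parse import quote, urlsplit

target = "/mysite/#!/foo/login"

# A URI parser treats everything after the first '#' as the fragment,
# which a browser keeps to itself and does not send to the server.
parts = urlsplit(target)
print(parts.path)      # the part a server would see
print(parts.fragment)  # the part only the browser uses

# If the '#' is meant to be literal path data, it has to be percent-encoded,
# which is what the rewritten URL in the question looks like.
encoded = quote(target, safe="/!")
print(encoded)
```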
I am currently using the following nginx configuration to test my app:
location / {
    # see if the 'id' cookie is set, if yes, pass to that server.
    if ($cookie_id) {
        proxy_pass http://${cookie_id}/$request_uri;
        break;
    }
    # if the cookie isn't set, then send him to somewhere else
    proxy_pass http://localhost:99/index.php/setUserCookie;
}
But they say "If is Evil". Can anyone show me how to do the same job without using "if"?
Also, is my usage of "if" buggy?
There are two reasons why 'if is evil' as far as nginx is concerned. One is that many howtos found on the internet will directly translate htaccess rewrite rules into a series of ifs, when separate servers or locations would be a better choice. Secondly, nginx's if statement doesn't behave the way most people expect it to. It acts more like a nested location, and some settings don't inherit as you would expect. Its behavior is explained here.
That said, checking things like cookies must be done with ifs. Just be sure you read and understand how ifs work (especially regarding directive inheritance) and you should be ok.
You may want to rethink blindly proxying to whatever host is set in the cookie. Perhaps combine the cookie with a map to limit the backends.
EDIT: If you use names instead of IP addresses in the id cookie, you'll also need a resolver defined so nginx can look up the address of the backend. Also, your default proxy_pass will append the request URI to the end of /index.php/setUserCookie. If you want to proxy to exactly that URL, replace the default proxy_pass with:
rewrite ^ /index.php/setUserCookie break;
proxy_pass http://localhost:99;
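On the point about limiting the backends: the logic a map would express is just an allowlist lookup with a default. A minimal Python sketch of that logic, where the cookie values and backend addresses are made up for illustration:

```python
from typing import Optional

# Hypothetical allowlist: cookie value -> backend address.
BACKENDS = {
    "app1": "10.0.0.11:8080",
    "app2": "10.0.0.12:8080",
}

# Fallback used when the cookie is missing or not on the allowlist
# (localhost:99 is the default backend from the question).
DEFAULT_BACKEND = "localhost:99"


def pick_backend(cookie_id: Optional[str]) -> str:
    """Map the 'id' cookie to a known backend; never proxy to an arbitrary host."""
    return BACKENDS.get(cookie_id or "", DEFAULT_BACKEND)
```

The important part is that a client-supplied value such as "evil.example.com" falls through to the default instead of being used as the upstream.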
How does a web server implement the URL rewrite mechanism and change the address bar of browsers?
I'm not asking for specific information on configuring Apache, nginx, lighttpd or others!
I would like to know what kind of information is sent to clients when a server wants to rewrite a URL.
There are two types of behaviour.
One is rewrite, the other is redirect.
Rewrite
The server performs the substitution for itself, making a URL like http://example.org/my/beautiful/page be understood as http://example.org/index.php?page=my-beautiful-page
With rewrite, the client does not see anything and redirection is internal only. No URL changes in the browser, just the server understands it differently.
Redirect
The server detects that the address is not wanted by the server. http://example.org/page1 has moved to http://example.org/page2, so it tells the browser with an HTTP 3xx code what the new page is. The client then asks for this page instead. Therefore the address in the browser changes!
Process
The process remains the same and is well described by this diagram:
Remark: every rewrite/redirect triggers a new pass through the rewrite rules (with exceptions, IIRC), so
RewriteCond %{REDIRECT_URL} !^$
RewriteRule .* - [L]
can become useful to stop loops. (Since it makes no rewrite when it has happened once already).
Are you talking about server-side rewrites (like Apache mod_rewrite)? For those, the address bar does not generally change (unless a redirection is performed).
Or are you talking about redirections? These are done by having the server respond with an HTTP code (301, 302 or 307) and the location in the HTTP header.
There are two forms of "URL rewrite": those done purely within the server and those that are redirections.
If it's purely within the server, it's an internal matter and only matters with respect to the dispatch mechanism implemented in the server. In Apache HTTPD, mod_rewrite can do this, for example.
If it's a redirection, a status code implying a redirection is sent in the response, along with a Location header indicating to which URL the browser should be redirected (this should be an absolute URL). mod_rewrite can also do this, with the [R] flag.
The status code is usually 302 (found), but it could be configured for other codes (e.g. 301 or 307).
Another quite common use (often unnoticed because it's usually on by default in Apache HTTPD) is the redirection to the URL with a trailing slash on a directory. This is implemented by mod_dir:
A "trailing slash" redirect is issued when the server receives a request for a URL http://servername/foo/dirname where dirname is a directory. Directories require a trailing slash, so mod_dir issues a redirect to http://servername/foo/dirname/.
Jeff Atwood had a great post about this: http://www.codinghorror.com/blog/2007/02/url-rewriting-to-prevent-duplicate-urls.html
How does a web server implement the URL rewrite mechanism and change the address bar of browsers?
URL rewriting and forwarding are two completely different things. A server has no control over your browser so it can't change the URL of your browser, but it can ask your browser to go to a different URL. When your browser gets a response from a server it's entirely up to your browser to determine what to do with that response: it can follow the redirect, ignore it or be really mean and spam the server until the server gives up. There is no "mechanism" that the server uses to change the address, it's simply a protocol (HTTP 1.1) that the server abides by when a particular resource has been moved to a different location, thus the 3xx responses.
URL rewriting can transform URLs purely on the server-side. This allows web application developers the ability to make web resources accessible from multiple URLs.
For example, the user might request http://www.example.com/product/123 but, thanks to rewriting, is actually served a resource from http://www.example.com/product?id=123. Note that there is no need for the address displayed in the browser to change.
The address can be changed if so desired. For this, a similar mapping as above happens on the server, but rather than render the resource back to the client, the server sends a redirect (301 or 302 HTTP code) back to the client for the rewritten URL.
For the example above this might look like:
Client request
GET /product/123 HTTP/1.1
Host: www.example.com
Server response
HTTP/1.1 302 Found
Location: http://www.example.com/product?id=123
At this point, the browser will issue a new GET request for the URL in the Location header.
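The two behaviours can be contrasted in a few lines of Python. This is only a sketch of the idea, not how any particular server implements it; the function names are invented for the example.

```python
import re
from typing import Dict, Optional, Tuple

# Pattern for the pretty URL used in the example above.
PRODUCT = re.compile(r"^/product/(\d+)$")


def internal_rewrite(path: str) -> str:
    """Rewrite: the server maps the path internally; the client never sees it."""
    m = PRODUCT.match(path)
    return f"/product?id={m.group(1)}" if m else path


def redirect_response(host: str, path: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Redirect: the server answers with a 3xx status and a Location header."""
    m = PRODUCT.match(path)
    if m is None:
        return None
    return 302, {"Location": f"http://{host}/product?id={m.group(1)}"}
```

With the rewrite, the browser still shows /product/123. With the redirect, the browser receives the 302 and decides to request the Location URL, which is why the address bar changes.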
I'm trying to use nginx as a temporary http cache in order to minimize requests to content. My content is on multiple servers so I can't use a static proxy_pass parameter to the direct location but instead of that I use a rewrite to a php script:
rewrite /([^/]+\.jpg) /index.php?file=$1 break;
proxy_pass http://www.phpserver.com;
The PHP script (that would be http://www.phpserver.com/index.php) then returns a redirect with HTTP code 301 to the actual file location (like http://www.contentserver1.com/filepath/file.jpg).
The problem is that nginx returns the redirect headers instead of retrieving, caching and returning the actual content.
So how do I make it to get the content from the actual server instead of just caching the headers?
Nginx works only as a proxy here. It knows nothing about the logic of your application (site); it just proxies requests and can cache the responses.
To make this scheme work, you must remove the rewrite section from nginx and move this logic to phpserver.com: phpserver must download the file itself and output it to nginx. Even if that is an expensive operation, nginx will cache the response, and the next time the request is received, nginx will serve it directly from its cache.
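The logic that moves to the backend is basically "follow the redirect yourself and return the body". A minimal sketch of that idea in Python (the question uses PHP, so treat this as pseudocode for the flow). The fetch callable is a hypothetical stand-in for whatever HTTP client the backend uses, and Location is assumed to be an absolute URL.

```python
from typing import Callable, Dict, Tuple

# fetch(url) is assumed to return (status, headers, body).
Fetch = Callable[[str], Tuple[int, Dict[str, str], bytes]]


def fetch_following_redirects(url: str, fetch: Fetch, max_hops: int = 5) -> Tuple[int, bytes]:
    """Follow 301/302/307 responses server-side and return the final status and body."""
    for _ in range(max_hops):
        status, headers, body = fetch(url)
        if status in (301, 302, 307) and "Location" in headers:
            # Assumes an absolute Location URL, as in the question.
            url = headers["Location"]
            continue
        return status, body
    raise RuntimeError("too many redirects")
```

The backend then outputs the returned body with a 200 status, so nginx has real content to cache instead of a redirect.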