Restrict access to a certain URL prefix - nginx

I want to block access to all URLs in my server except URLs starting with /myapp/path, excluding /myapp/path/nope, which should be blocked as well.
I tried:
nginx.org/server-snippets: |
location = /myapp/path/nope { return 404; }
location ^~ /myapp/path {}
location / { return 404; }
But I got 404 responses on URLs starting with /myapp/path as well. To be frank, even after reading the documentation and trying all sorts of things, it seems I haven't figured out how nginx determines which location to serve. What is wrong with my snippet? Thanks!

Eventually I resolved this issue using a negative regex.
In my question I used location and the regex incorrectly: the empty location ^~ /myapp/path {} block never told nginx what to do with the requests it matched.
So, in order to restrict access to anything that doesn't start with /myapp/path, use:
location ~ ^/(?!myapp/path) { return 404; } # Or deny all;
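The negative lookahead on its own still lets /myapp/path/nope through. A minimal sketch of one way to cover both requirements, assuming the paths from the question and that some other location (for example one generated by the ingress controller) actually serves /myapp/path:

```nginx
# A prefix location marked with ^~ is chosen before any regex location
# is tried, so the blocked subtree is handled here first.
# Note: as a prefix match this also covers /myapp/path/nope/... and
# /myapp/path/nopeXYZ.
location ^~ /myapp/path/nope {
    return 404;
}

# Every URI that does not start with /myapp/path is rejected.
location ~ ^/(?!myapp/path) {
    return 404;
}
```

Requests under /myapp/path that are not caught by either block fall through to whichever location would normally handle them.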

Related

How to use regex like /page/.*/page/ in Nginx location

Url example:
http://test.com/test/page/4?-test?o?o_html/page/100/page/2/page/3/page/4/page/4/page/2/page/106/page/107/page/2/page/3/page/4/page/108/page/3/page/2/page/3/page/4&-test
I want to use an nginx location to forbid it.
But I failed. I have tried different rules in http://nginx.viraptor.info/:
location ~ /page/.*/page/ {
return 403;
}
location ~* \/page/.*/page/ {
location ~* /page/\.*/page/ {
None of them worked.
I found that only /page/ on its own works:
location ~* /page/ {
But when I add .*/page/, like:
location ~* /page/.*/page/ {
it doesn't work.
For now I use PHP to check the URL:
if (preg_match ("/\/page\/.*\/page\//i", $_SERVER["REQUEST_URI"]))
Please tell me how to use the regex .* in an nginx conf location. I want to do this in nginx.
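Everything after the first ? is the query string (the URI arguments). nginx matches location blocks against the URI path only, so a location regex never sees that part. In your example the path is just /test/page/4, which contains /page/ once; that is why /page/ matches and /page/.*/page/ does not. If there is a solution that allows you to do this in nginx, it will be a kludge; you should perform this level of checking in your script, as you stated you are already doing.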
See: https://serverfault.com/questions/811912/can-nginx-location-blocks-match-a-url-query-string
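If the check has to live in nginx anyway, one option is to test $request_uri, which holds the original request URI including the query string. The following is only a sketch: the variable name $repeated_page is made up for illustration, the map block is assumed to sit in the http context, and the server block is reduced to the relevant lines.

```nginx
# http context: set $repeated_page to 1 when /page/ occurs at least twice
# anywhere in the raw request URI (path or query string).
map $request_uri $repeated_page {
    default               0;
    "~*/page/.*/page/"    1;
}

server {
    listen 80;
    server_name test.com;

    # "if" with a plain "return" is one of the safe uses of "if".
    if ($repeated_page) {
        return 403;
    }
}
```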

Reverse image proxy without specifying host

I have the following in my config as a reverse proxy for images:
location ~ ^/image/(.+) {
proxy_pass http://example.com/$1;
}
The problem is that not all images will be example.com images and so we need to pass in the full url. If I try:
location ~ ^/image/(.+) {
proxy_pass $1;
}
I get an error:
invalid URL prefix in "https:/somethingelse.com/someimage.png"
The question is quite vague, but, based on the error message, what you're trying to do is perform a proxy_pass entirely based on the user input, by using the complete URL specified after the /image/ prefix of the URI.
Basically, this is a very bad idea, as you're opening yourself to become an open proxy. However, the reason it doesn't work as in the conf you supplied is due to URL normalisation, which, in your case, compacts http://example into http:/example (double slash becomes single), which is different in the context of proxy_pass.
If you don't care about security, you can just change merge_slashes from the default of on to off:
merge_slashes off;
location …
Another possibility, somewhat related to the question "nginx proxy_pass and URL decoding", is to rewrite from $request_uri, which is not normalised:
location ~ ^/image/.+ {
rewrite ^ $request_uri;
rewrite ^/image/(.*) $1 break;
return 400;
proxy_pass $uri; # will result in an open-proxy, don't try at home
}
The proper solution would be to implement a whitelist, possibly with the help of map or even prefix-based location directives:
location ~ ^/image/(http):/(upload\.example\.org)/(.*) {
proxy_pass $1://$2/$3;
}
Do note that, as per the explanation at the beginning, the location above is subject to the merge_slashes setting, so it will never see the double // by default; hence the need to add the double // manually at the proxy_pass stage.
I would use a map in this case
map $request_uri $proxied_url {
# if you don't care about domain and file extension
~*/image/(https?)://?(.*) $1://$2;
# if you want to limit file extension
~*/image/(https?)://?(.*\.(png|jpg|jpeg|ico))$ $1://$2;
# if you want to limit file extension and domain
~*/image/(https?)://?(abc\.xyz\.com/)(.*\.(png|jpg|jpeg|ico))$ $1://$2$3;
default "/404";
}
Then in your proxy pass part you would use something like below
location /image/ {
proxy_pass $proxied_url;
}
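I have given three different examples, depending on how you want to handle it. Keep only the one you need: map uses the first regular expression that matches, in order of appearance, so the first, broadest pattern would otherwise always win.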
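Putting the pieces together, a hedged sketch of the whitelist variant might look like the following. It assumes nginx 1.11.0 or later (so that the map value can combine captures), keeps the placeholder domain abc.xyz.com from the example above, and uses a made-up resolver address; because proxy_pass receives a variable containing a host name, nginx needs a resolver to look that name up at request time.

```nginx
# http context
map $request_uri $proxied_url {
    # An empty value means "not allowed".
    default "";
    # Only images on the whitelisted host; "//?" tolerates a merged slash.
    "~*^/image/(https?)://?(abc\.xyz\.com/.*\.(?:png|jpe?g|ico))$" "$1://$2";
}

server {
    listen 80;

    # Assumption: a DNS resolver reachable at this address.
    resolver 127.0.0.53;

    location /image/ {
        # Reject anything the map did not whitelist instead of proxying it.
        if ($proxied_url = "") {
            return 404;
        }
        proxy_pass $proxied_url;
    }
}
```

Returning 404 for the empty default avoids handing proxy_pass a value such as "/404", which is not a valid URL prefix.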

NGINX location block always redirect to different location

I have NGINX hosting many drop-in apps that will usually all use the same API. My nginx has a location block for that API, so something like
location /default-api/ {
proxy_pass https://some/location.com;
}
Usually each GUI will want to use the same API, but occasionally someone may wish to change the API a specific app uses. I wanted each GUI to be configured to hit a different URL, so that it's easier to redirect that URL later if someone wants to change their API. But rather than hard-coding https://some/location.com in each location block, I wanted to redirect to the default-api.
So effectively I want something like, if it would work
location /foo-api/ {
redirect /default-api/;
}
location /bar-api/ {
redirect /default-api/;
}
location /baz-api/ {
redirect /default-api/;
}
I thought when I first played with nginx that I saw a very simple directive for doing this, but I can't find it now. I know a number of directives could do this, but none of the ones I know of feel clean enough to be worth doing.
rewrite requires an overly complex regex, and redirect requires the client to make a new request after getting the redirect. proxy_pass does some unneeded proxying logic. All three seem to require me to hardcode the server name into the redirect path. The cleanest approach I could figure out was possibly using try_files in a manner it wasn't made for.
Is there some simpler directive to do an internal redirect like this?
Two suggestions.
1) Comment out the location /foo-api block unless it is needed:
location / {
rewrite ... ... break; # if required to normalize the /prefix/...
proxy_pass ...;
}
# location /foo-api/ { } # disabled - use `location /`
2) Use a named location:
location /default-api/ {
try_files /nonexistent @api;
}
location /foo-api/ {
try_files /nonexistent @api;
}
location @api {
rewrite ... ... break; # if required to normalize the /prefix/...
proxy_pass https://some/location.com;
}
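To make the named-location suggestion concrete, here is a minimal sketch. The upstream https://api.example.com is a hypothetical placeholder, and the rewrite assumes every prefix has the form /<name>-api/. Inside a named location proxy_pass may not carry a URI part, so the upstream is given as scheme and host only.

```nginx
server {
    listen 80;

    # Each app gets its own prefix; all of them hand off to @api.
    # /nonexistent is a dummy first argument that is never found,
    # so try_files always falls back to the named location.
    location /default-api/ { try_files /nonexistent @api; }
    location /foo-api/     { try_files /nonexistent @api; }
    location /bar-api/     { try_files /nonexistent @api; }

    location @api {
        # Strip the "/<name>-api" prefix before proxying.
        rewrite ^/[^/]+-api/(.*)$ /$1 break;
        # Hypothetical upstream; no URI part is allowed here.
        proxy_pass https://api.example.com;
    }
}
```

To point one app at a different API later, replace its try_files line with its own proxy_pass and leave the other locations alone.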

Nginx: Prevent direct access to static files

I've been searching for a while now but didn't manage to find anything that fits my needs. I don't need hotlinking protection, as much as I'd like to prevent people from directly accessing my files. Let's say:
My website.com requests website.com/assets/custom.js, and that should work, but I'd like visitors who open this file directly to get a 403 status code or something. I really have no idea if it's possible, and I don't have any logical steps in mind.
Regards !
You can use nginx referer module: http://nginx.org/en/docs/http/ngx_http_referer_module.html.
Something like this:
server {
listen 80;
server_name website.com;
root /var/www/website.com/html ;
location /assets/ {
valid_referers website.com/ website.com/index.html website.com/some_other_good_page.html ;
if ($invalid_referer) {
return 403; # deny is not allowed inside if
}
}
}
This config guards the assets directory. Remember, though, that it is not a guarantee and only works against browsers: anybody can forge a valid request with curl or telnet. For real protection you need dynamically generated pages with dynamically generated links.
You do not need to create the variable $invalid_referer as this is set by the nginx module.
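A shorter variant, as a sketch only: valid_referers accepts the special value server_names, which treats any Referer whose host matches one of the server_name entries as valid, so individual pages need not be listed. Since none is not included, requests without a Referer header (for example a URL typed into the address bar) are rejected. The rejection uses return 403 because the access module's deny directive is not permitted inside if.

```nginx
server {
    listen 80;
    server_name website.com;
    root /var/www/website.com/html;

    location /assets/ {
        # Valid only when the Referer host matches server_name.
        valid_referers server_names;

        # $invalid_referer is set by the referer module itself.
        if ($invalid_referer) {
            return 403;
        }
    }
}
```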
If your nginx-powered development instances are showing up in Google search results, there is a quick and easy way to ask search engines not to index your site. Add the following line to the location block of your virtual host configuration file for the part that you want to keep out of the index.
add_header X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
You can simply deny access to any folder or file by adding these lines with your folders' names:
location ~ /(no_access_folder|folder_2)
{
deny all;
return 403;
}

Nginx disable logging for certain user agents

Basically, I'm trying to remove search engine crawlers such as Google, Bing, and so on from my access logs. They really build up over time, eventually adding hundreds of thousands of useless entries to the logs, which is especially painful if you ever have to search through them. The trouble I'm having is that I define the access log in my server blocks, so nginx just looks at that and ignores the second one that I define in the location / block. If I comment out the access log for my site (not the one in the crawler block), then it works fine. Here is the configuration:
server {
listen 80;
server_name example.com;
access_log /home/domains/example.com/logs/access;
error_log /home/domains/example.com/logs/error;
root /home/domains/example.com/forums;
location / {
index index.html index.htm;
if ($http_user_agent ~* ("googlebot") ) {
access_log off;
}
}
}
I removed everything else before posting (PHP include and so on), and I've checked that nothing is interfering by commenting out everything except what is above. To sum up: I have a log defined in each server block to log all of the traffic (I define it per block to keep things neat). I'm trying to disable logging for certain user agents, but unless I disable the main log for the site, it continues logging the user agents I told it not to.
I've been at this for a few hours now, any help will be greatly appreciated.
You should not use if statements in nginx - if is evil
Use conditional logging:
http {
map $http_user_agent $excluded_ua {
~Googlebot 0;
default 1;
}
.......
}
server {
access_log /home/domains/example.com/logs/access combined if=$excluded_ua;
}
However be careful about excluding googlebot as some abusive bots disguise themselves.
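The same idea with a more descriptive variable name, as a sketch: $loggable is a made-up name, the bot list is only an example, and the if= parameter of access_log requires nginx 1.7.0 or later. A request is logged when the variable is neither empty nor "0".

```nginx
http {
    # 0 = do not log, 1 = log. Matching is case-insensitive (~*).
    map $http_user_agent $loggable {
        default                   1;
        "~*(googlebot|bingbot)"   0;
    }

    server {
        listen 80;
        server_name example.com;

        # Entries are written only when $loggable evaluates to 1.
        access_log /home/domains/example.com/logs/access combined if=$loggable;
        error_log  /home/domains/example.com/logs/error;
    }
}
```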
Well, actually the regex ("googlebot") will match a user agent with double quotes, clearly not what you want. Drop the parentheses, and the quotes if you want, and you should be fine.
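Another suggestion is to add return 200; after access_log off;. Be aware that this ends the request with an empty 200 response, so matching user agents no longer receive the page.
It looks like this: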
location / {
if ($http_user_agent ~* "(googlebot)" ) {
access_log off;
return 200;
}
}

Resources