Nginx: Prevent direct access to static files - nginx

I've been searching for a while now but didn't manage to find anything that fits my needs. It's not hotlinking protection I need so much as a way to prevent people from directly accessing my files. Let's say:
When website.com requests website.com/assets/custom.js, that should work, but visitors who visit that file directly should get a 403 status code or something. I really have no idea if it's possible, and I don't have any logical steps in mind.
Regards!

You can use the nginx referer module: http://nginx.org/en/docs/http/ngx_http_referer_module.html.
Something like this:
server {
    listen 80;
    server_name website.com;
    root /var/www/website.com/html;

    location /assets/ {
        valid_referers website.com/ website.com/index.html website.com/some_other_good_page.html;
        if ($invalid_referer) {
            deny all;
        }
    }
}
This config guards the assets directory. But remember, this is not guaranteed and only works for browsers - anybody can emulate a valid request with curl or telnet. For true safety you need to use dynamically generated pages with dynamically generated links.
You do not need to create the variable $invalid_referer as this is set by the nginx module.
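As a hedged sketch of the "dynamically generated links" idea, nginx's ngx_http_secure_link_module can require a signed, expiring token on each asset URL; the secret string and the /assets/ path below are assumptions, and the application has to generate matching links:

location /assets/ {
    # Links must carry ?md5=...&expires=... parameters generated by the application
    secure_link $arg_md5,$arg_expires;
    secure_link_md5 "$secure_link_expires$uri some_secret";   # "some_secret" is a placeholder

    if ($secure_link = "") {
        return 403;   # missing or invalid signature
    }
    if ($secure_link = "0") {
        return 410;   # link has expired
    }
}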

If your nginx-powered development instances are showing up in Google search results, there is a quick and easy way to prevent search engines from crawling your site. Add the following line to the location block of your virtual host configuration file, in the block you want to keep out of search results.
add_header X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
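For context, a minimal sketch of where that line might sit; the hostname and document root are hypothetical:

server {
    listen 80;
    server_name dev.example.com;   # hypothetical development hostname
    root /var/www/dev;

    location / {
        # Ask well-behaved crawlers not to index, follow, snippet or archive this instance
        add_header X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
        try_files $uri $uri/ =404;
    }
}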

You can simply deny access to any folder or file by adding these lines with your own folder names:
location ~ /(no_access_folder|folder_2) {
    deny all;
    return 403;
}

Related

Serving static website in nginx, wrong path for static files

I'm trying to use nginx to serve a static website that was given to me. Its folder structure is like this:
static_website/
    index.html
    www.example.com/
    resources.example.com/
    uploads.example.com/
The index.html file in the root is the one generated by httrack and it simply contains a redirect to www.example.com/index.html.
Inside the folder www.example.com are all the HTML files; the other two folders hold the CSS, JavaScript and image files.
Here is the nginx configuration:
server {
    index index.php index.html index.htm;
    server_name example.com;

    location / {
        root /var/www/static_website/www.example.com;
        try_files $uri $uri/ =404;
        index index.html;
    }
}
I can navigate through the pages, but the css, javascript and image files are not loaded.
The path to one of the css files inside the html is like this:
href="../resources.example.com/style.css"
The only way I managed to get this working was to have the URL like this:
example.com/www.example.com/
This way, all the paths are correct. I'd like to avoid this and have simply example.com.
Is there a way to do this?
It looks like the site was originally intended to operate with ugly URLs like //example.com/www.example.com/.
But the path-relative URIs for the resources should work just fine relative to /, you just need to provide a location block which matches /resources.example.com/.
For example:
location / {
    root /var/www/static_website/www.example.com;
    try_files $uri $uri/ =404;
    index index.html;
}

location /resources.example.com/ {
    root /var/www/static_website;
}
I originally commented that you should try this:
location ~ \.(css|js|jpg|png|svg)$ {
    root /var/www/static_website;
}
That achieves a similar goal, but nginx processes prefix locations more efficiently than regular expression locations.
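Presumably the uploads folder from the question needs the same treatment; a sketch along the same lines (not part of the original answer):

location /uploads.example.com/ {
    root /var/www/static_website;
}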
I want to share my experience with this problem for others encountering similar issues, as the solution was not so obvious to me.
My setup and problem in particular had to do with Cloudflare settings, which I was using to handle TLS instead of terminating it on the origin server for one of my two sites. If you are serving your site from a CDN that supports encryption and you use nginx on your origin, consider the following setup:
# static1.conf
server {
    server_name static1.com;
    root /var/www/static1/public;
    listen 80;
    listen 443 ssl;   # certificate directives from Let's Encrypt omitted here
}

# static2.conf - no TLS setup in nginx, figured I'd let Cloudflare handle it
server {
    server_name static2.com;
    root /var/www/static2/public;
    listen 80;
}
static1 was set up at the origin with Let's Encrypt handling TLS connections.
static2 was set up at the origin without any TLS configuration.
The Cloudflare TLS modes that allowed me to access the correct files through nginx were Full for static1 and Flexible for static2.
The distinction between Full and Flexible is that Full mode expects the origin to handle the certificate.
Initially I had the static2 site misconfigured as Full; since its server block lacked a listen directive for 443, nginx served static1 instead.
I realize the original question has nothing to do with CDNs or Cloudflare, but this scheme/protocol mismatch cost me a few hours and I am hoping to save someone else from similar grief.
Honestly I am surprised nginx doesn't stick to matching on server_name and that it implicitly matches on scheme as a fallback (or at least appears to), even without a default_server specified - and without any meaningful messages in the logs to boot! Debugging nginx is a nightmare sometimes.
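One way to make that behaviour explicit, as a hedged sketch (the certificate paths below are placeholders, not from the original post): declare a catch-all default_server that drops requests whose Host header doesn't match any configured site, so a scheme mismatch fails loudly instead of silently serving another vhost.

server {
    listen 80 default_server;
    listen 443 ssl default_server;
    server_name _;
    ssl_certificate     /etc/ssl/certs/fallback.crt;     # placeholder self-signed certificate
    ssl_certificate_key /etc/ssl/private/fallback.key;   # placeholder key
    return 444;   # close the connection without a response
}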

NGINX multiple server_name, but have robots.txt file for each server_name?

I have to create a server_name as a listener for origin pulls by my CDN.
The CDN wants to pull from origin.mydomain.com
I already have hundreds of lines of configuration under www.mydomain.com covering all the rewrites, rules and such, and I need to use all of that code again.
My easy solution would be to have
server_name www.mydomain.com origin.mydomain.com
To easily have NGINX listen for the requests to the "origin" subdomain.
My fear is that Google discovers the subdomain and starts crawling it. I'd like to block Google from the "origin" subdomain somehow. Since I'm declaring multiple server_name values, I'm not sure I can just place a robots.txt file somewhere, because it's using the same root folder as the live site.
Is there an easy way to do this?
All feedback appreciated.
Cheers
Ryan
Use two server blocks and use the include directive to pull in the common code. For example:
server {
    server_name www.mydomain.com;
    include /path/to/common/config;

    location = /robots.txt {
        root /path/to/friendly/dir;
    }
}

server {
    server_name origin.mydomain.com;
    include /path/to/common/config;

    location = /robots.txt {
        root /path/to/unfriendly/dir;
    }
}
So you have two robots.txt files in different directories - or use rewrite ... last to map the URI to different local files, as sketched below.
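A hedged sketch of that rewrite alternative; the filename robots-disallow.txt is hypothetical and would live under whatever root the common config defines:

server {
    server_name origin.mydomain.com;
    include /path/to/common/config;

    location = /robots.txt {
        # Serve a different file from the shared document root for this vhost
        rewrite ^ /robots-disallow.txt break;
    }
}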

How to hide direct access of your video/images file in Nginx?

I am using the Nginx server. I have some images and videos in the /video directory. I use these videos and images on my site, but I want to show a 403 error if someone tries to access them directly, as this example shows:
http://xysz.com/video/abc.png
I know it's possible in Apache by changing the .htaccess config, but I'm not sure how to do the same in Nginx. How can I achieve this?
You can use this:
server {
    listen 80;
    server_name xysz.com;
    root /var/www/xysz.com/html;

    location /video/ {
        valid_referers xysz.com/ xysz.com/video/index.html;
        if ($invalid_referer) {
            deny all;
        }
    }
}
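A hedged alternative sketch: the server_names shorthand accepts any page on the hosts named in server_name as a valid referer, which avoids listing pages one by one (the /video/ path is taken from the question):

location /video/ {
    valid_referers server_names;
    if ($invalid_referer) {
        return 403;
    }
}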

Nginx disable logging for certain user agents

Basically, I'm trying to remove search engine crawlers such as Google, Bing, and what not from my access logs. They really build up over time, eventually adding hundreds of thousands of useless entries to the logs, which is especially a pain if you ever have to search through them.

The trouble I'm having is that I define the access log in my server block, so Nginx just looks at that and ignores the second one that I define in the location / block. If I comment out the access log for my site (not the crawler block) then it works fine. Here is the configuration:
server {
    listen 80;
    server_name example.com;
    access_log /home/domains/example.com/logs/access;
    error_log /home/domains/example.com/logs/error;
    root /home/domains/example.com/forums;

    location / {
        index index.html index.htm;
        if ($http_user_agent ~* ("googlebot") ) {
            access_log off;
        }
    }
}
I've removed everything except that for posting (PHP includes and what not), and I've checked that nothing is interfering by commenting out everything except what is above. So to sum it up: I have a log defined in my virtual host block to log all of the traffic (I have one defined for every block, to keep things neat). I'm trying to disable logging for certain user agents, but unless I disable the main log for the site, it keeps logging what I've told it not to for those user agents.
I've been at this for a few hours now, any help will be greatly appreciated.
You should not use if statements in nginx - if is evil
Use conditional logging:
http {
    map $http_user_agent $excluded_ua {
        ~Googlebot  0;
        default     1;
    }
    .......
}

server {
    access_log /home/domains/example.com/logs/access combined if=$excluded_ua;
}
However be careful about excluding googlebot as some abusive bots disguise themselves.
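The question also mentions Bing and other crawlers; a hedged sketch extending the same map to several bot names (the list is an assumption, adjust to taste):

map $http_user_agent $excluded_ua {
    ~*(googlebot|bingbot|yandexbot|duckduckbot)  0;
    default                                      1;
}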
Well, actually the regex ("googlebot") will match a user agent with double quotes, clearly not what you want. Drop the parentheses, and the quotes if you want, and you should be fine.
You need to add return 200; after access_log off;
so it looks like this:
location / {
    if ($http_user_agent ~* "(googlebot)") {
        access_log off;
        return 200;
    }
}

Drop unwanted connections

I want to block unwanted Bots from accessing sites on the server.
Can nginx drop / kill the connection right away when a certain Bot is detected?
if ($http_user_agent ~ (agent1|agent2)) {
    **KILL CONNECTION**;
}
Something like example above.
Use return 444; - this non-standard status code causes nginx to simply close the connection without sending a response.
if ($http_user_agent ~ (agent1|agent2)) {
    return 444;
}
Reference documentation
More detailed documentation
Yes, it can. See the question below - this redirects based on an agent string, but you can really do whatever you want (error page or whatever).
Nginx proxy or rewrite depending on user agent
However, please note that a decent bot will fake its user-agent string to look just like a normal browser, so this is by no means a robust way to deter bots from sweeping your site.
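As a hedged sketch combining the two ideas above (the agent names are placeholders from the question): classify the user agent once with a map, then drop matching connections in each server block that checks the variable.

map $http_user_agent $block_bot {
    ~*(agent1|agent2)  1;
    default            0;
}

server {
    listen 80;
    server_name example.com;

    if ($block_bot) {
        return 444;   # close the connection without a response
    }
}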
server {
    listen 8443 default ssl;
    error_page 404 403 = 444;   # disconnect if 404 or 403
    root /srv/empty;            # empty folder
    ...
    ...

    location /summary_report {
        root /srv/www;
        index index.html index.htm;
    }
}
https://127.0.0.1/ disconnects; https://127.0.0.1/summary_report does not.
