Basically, I'm trying to keep search engine crawlers such as Google and Bing out of my access logs. They really build up over time, eventually adding hundreds of thousands of useless entries, which is especially a pain if you ever have to search through the logs. The trouble I'm having is that because I define the access log in my server block, Nginx seems to look only at that and ignore the second one I define in the location / block. If I comment out the access log for my site (not the crawler block), then it works fine. Here is the configuration:
server {
    listen 80;
    server_name example.com;
    access_log /home/domains/example.com/logs/access;
    error_log /home/domains/example.com/logs/error;
    root /home/domains/example.com/forums;

    location / {
        index index.html index.htm;
        if ($http_user_agent ~* ("googlebot") ) {
            access_log off;
        }
    }
}
I've removed everything else before posting (PHP includes and what not), and I've checked that nothing is interfering by commenting out everything except what is shown above. So to sum it up: I have a log defined in my virtual host block to capture all of the traffic (I define it there for every site, to keep things neat), and I'm trying to disable logging for certain user agents; but unless I disable the main log for the site, it keeps logging the user agents I told it not to.
I've been at this for a few hours now; any help will be greatly appreciated.
You should not use if statements in nginx ("if is evil"). Use conditional logging instead:
http {
    map $http_user_agent $excluded_ua {
        ~Googlebot 0;
        default    1;
    }
    .......
}
server {
    access_log /home/domains/example.com/logs/access combined if=$excluded_ua;
}
However, be careful about excluding Googlebot, as some abusive bots disguise themselves as it.
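If you need to exclude several crawlers at once, the same map can take a combined, case-insensitive regex. A minimal sketch, assuming the extra bot names (bingbot, duckduckbot) are ones you actually want to drop:

map $http_user_agent $excluded_ua {
    ~*(googlebot|bingbot|duckduckbot) 0;
    default                           1;
}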
Well, actually the regex ("googlebot") will only match a user agent containing "googlebot" with the literal double quotes, clearly not what you want. Drop the parentheses, and the quotes if you want, and you should be fine.
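In other words, the original location block only needs its regex fixed; a minimal sketch of the corrected block:

location / {
    index index.html index.htm;
    if ($http_user_agent ~* googlebot) {
        access_log off;
    }
}

(access_log is one of the few directives explicitly allowed in the "if in location" context.)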
You need to add return 200; after access_log off;
so it looks like this:
location / {
    if ($http_user_agent ~* "(googlebot)") {
        access_log off;
        return 200;
    }
}
Related
I want to block access to all URLs in my server except URLs starting with /myapp/path, excluding /myapp/path/nope, which should be blocked as well.
I tried:
nginx.org/server-snippets: |
  location = /myapp/path/nope { return 404; }
  location ^~ /myapp/path {}
  location / { return 404; }
But I got 404 responses on URLs starting with /myapp/path as well. To be frank, even after reading the documentation and trying all sorts of things, it seems I haven't figured out how nginx determines which location to serve. What is wrong with my snippet? Thanks!
Eventually I resolved this issue using a negative regex.
In my question I used the locations and the regex incorrectly, as I never told nginx what to do with the path I wanted to allow.
So in order to restrict access to anything that doesn't start with /myapp/path, use:
location ~ ^/(?!myapp/path) { return 404; } # Or deny all;
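For reference, the original three-location layout can also work once the allowed prefix block is actually given something to do. A sketch, assuming the app sits behind a proxy (the upstream address is a placeholder):

location = /myapp/path/nope { return 404; }
location ^~ /myapp/path {
    # hypothetical upstream; the empty block in the question left nginx
    # with nothing to serve for the allowed path
    proxy_pass http://127.0.0.1:8080;
}
location / { return 404; }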
This question asks almost the opposite of that one. nginx only allows if, and within the if clause only return, rewrite, proxy_pass and the like; try_files may not be used there, and there is no else or NOT operator.
So the issue: I want to model the following (in pseudo nginx configuration):
location / {
    if (NOT isset(cookie("cookie-consent"))) {
        return 302 "https://example.com/cookieconsent";
    }
    else {
        # I am running the site on a backend server
        proxy_pass https://example.com:8443;
    }
}

# The HTML here has a button to agree to the use of said cookies
location /cookieconsent {
    root /path/to/www-root;
    try_files $uri/index.html $uri.html;
}

location /setconsent {
    add_header Set-Cookie 'consent=true;Domain=$host;Path=/;Max-Age=7776000;SameSite=strict;HTTPOnly;Secure';
    return 302 https://$host;
}
The background is that I am using Nextcloud, and since Nextcloud uses functional cookies only, it is sufficient that users are informed of the use of cookies through a popup. However, Nextcloud has no GDPR cookie consent plugin, nor am I inclined to develop one. So the easiest alternative is to check whether a visitor has already been informed about the use of cookies, and otherwise display the cookie consent page before the actual site.
As powerful as nginx is, there is of course a working solution to the above problem. Actually, the reason I added the question was to publish this answer, since I believe it might help others, and me too in case I forget how I achieved it as time passes and google for a quick solution. Without further ado, here is my solution in the nginx configuration file:
location / {
    proxy_set_header Host example.com;
    proxy_set_header X-Forwarded-For $remote_addr;
    if ($http_cookie ~* "consent") {
        proxy_pass https://example.com:8443;
    }
    # Setting `return 302` directly here doesn't work well;
    # the first argument is just a filler, as entering '@noconsent' alone is disallowed
    try_files $uri/$arg_key @noconsent;
}

location @noconsent {
    return 302 "https://example.com/cookieconsent";
}
location /cookieconsent {
    root /path/to/www-root;
    expires max;
    try_files $uri/index.html $uri.html;
}

location /setconsent {
    add_header Set-Cookie 'consent=true;Domain=$host;Path=/;Max-Age=7776000;SameSite=strict;HTTPOnly;Secure';
    return 302 https://$host;
}
What this configuration does is 302 redirect to 'https://example.com/cookieconsent' if the consent cookie is not set. There, nginx looks for '/cookieconsent/index.html' and '/cookieconsent.html'. Either of those files should be present and readable by the server, and should include a button or link that triggers '/setconsent'. Thereafter, the consent cookie is set, so the condition is true and proxy_pass is triggered (Nextcloud is loaded in this case).
The nice feature of this is that one can do it purely in HTML, without setting any cookies until explicit consent is given by the visitor, as required by the GDPR. On the other hand, a visitor cannot visit the actual website without first agreeing to the use of cookies.
I have NGINX hosting many drop-in apps that will usually all use the same API. My nginx has a location block for that API, something like:
location /default-api/ {
    proxy_pass https://some/location.com;
}
Usually each GUI will want to use the same API, but occasionally someone may wish to change the API a specific app uses. I wanted each GUI to be configured to hit a different URL, so that it's easier to redirect that URL later if someone wants to change their API; but rather than hard-coding each URL to https://some/location.com in each location block, I wanted to redirect to the default-api.
So effectively I want something like this, if it would work:
location /foo-api/ {
    redirect /default-api/;
}
location /bar-api/ {
    redirect /default-api/;
}
location /baz-api/ {
    redirect /default-api/;
}
I thought when I first played with nginx that I saw a very simple directive for doing this, but I can't find it now. I know a number of directives could do this, but none of the ones I know of feel clean enough to be worth doing.
rewrite requires an overly complex regex; redirect requires the client to make a new request after receiving the redirect; and proxy_pass does some unneeded proxying logic. All three seem to require me to hardcode the server name into the redirect path. The cleanest I could figure out was possibly using try_files in a manner it wasn't made for.
Is there some simpler directive to do an internal redirect like this?
Two suggestions.
1) Comment out the location /foo-api block unless it is needed:
location / {
    rewrite ... ... break; # if required to normalize the /prefix/...
    proxy_pass ...;
}
# location /foo-api/ { } # disabled - use `location /`
2) Use a named location:
location /default-api/ {
    try_files /nonexistent @api;
}
location /foo-api/ {
    try_files /nonexistent @api;
}
location @api {
    rewrite ... ... break; # if required to normalize the /prefix/...
    proxy_pass https://some/location.com;
}
I've been searching for a while now but didn't manage to find anything that fits my needs. It's not hotlinking protection I need so much as I'd like to prevent people from directly accessing my files. Let's say:
My website.com requests website.com/assets/custom.js; that would work, but I'd like visitors who visit this file directly to get a 403 status code or something. I really have no idea if it's possible, and I don't have any logical steps in mind.
Regards!
You can use the nginx referer module: http://nginx.org/en/docs/http/ngx_http_referer_module.html.
Something like this:
server {
    listen 80;
    server_name website.com;
    root /var/www/website.com/html;

    location /assets/ {
        valid_referers website.com/ website.com/index.html website.com/some_other_good_page.html;
        if ($invalid_referer) {
            deny all;
        }
    }
}
This config guards the assets directory. But remember that this is not guaranteed and only works against browsers: anybody can emulate a valid request with curl or telnet. For true safety you need to use dynamically generated pages with dynamically generated links.
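If deny all inside if ever gives you trouble, returning the status directly is a common equivalent, since return is squarely within what if supports:

location /assets/ {
    valid_referers website.com/ website.com/index.html;
    if ($invalid_referer) {
        return 403;
    }
}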
You do not need to create the variable $invalid_referer as this is set by the nginx module.
If your nginx-powered development instances are showing up in Google search results, there is a quick and easy way to prevent search engines from crawling your site. Add the following line to the location block of the virtual host configuration file for the block that you want to prevent from being crawled:
add_header X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
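For example, placed in a server block (the host name and root here are placeholders):

server {
    listen 80;
    server_name dev.example.com;  # hypothetical development host
    root /var/www/dev;

    location / {
        # sent with every response served from this block
        add_header X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
    }
}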
You can simply deny access to any folder or file just by putting these lines with your folder names:
location ~ /(no_access_folder|folder_2) {
    deny all;
    return 403;
}
I want to block unwanted Bots from accessing sites on the server.
Can nginx drop / kill the connection right away when a certain Bot is detected?
if ($http_user_agent ~ (agent1|agent2)) {
    **KILL CONNECTION**;
}
Something like the example above.
Use return 444. This non-standard status code of 444 causes nginx to simply close the connection without responding to it.
if ($http_user_agent ~ (agent1|agent2)) {
    return 444;
}
Reference documentation
More elaborate documentation
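If the list of agents grows, the same check can be moved into a map; a sketch, where agent1 and agent2 are the placeholders from the question:

# http context (map must live at the http level)
map $http_user_agent $blocked_agent {
    default           0;
    ~*(agent1|agent2) 1;
}

server {
    listen 80;
    server_name example.com;

    if ($blocked_agent) {
        return 444;
    }
}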
Yes, it can. See the question below - this redirects based on an agent string, but you can really do whatever you want (error page or whatever).
Nginx proxy or rewrite depending on user agent
However, please note that a decent bot will fake its user-agent string to look just like a normal browser, so this is by no means a robust way to deter bots from sweeping your site.
server {
    listen 8443 default ssl;
    error_page 404 403 = 444; # disconnect on 404 or 403
    root /srv/empty; # empty folder
    ...
    ...

    location /summary_report {
        root /srv/www;
        index index.html index.htm;
    }
}
With this, https://127.0.0.1/ disconnects, while https://127.0.0.1/summary_report does not.