Considering that Pingdom is a highly reputable site, I subscribed to their uptime monitoring service. However, even though I set a 5-minute check interval, their bot Pingdom.com_bot_version_1.4 keeps accessing my site not once every second, but tens of times every second, resulting in thousands of accesses per minute!
I then tried to cancel the service completely but still get bombarded by their bot. I tried to block it in robots.txt, but apparently the bot just chokes on it. Next, I tried to block it in nginx.conf with this:
if ($http_user_agent ~* Pingdom.com_bot) {
    return 403;
}
It works, but I see a lot of 403 errors in access.log. How can I stop logging this bot? It is really, really annoying. I regret ever subscribing to their service.
Here is a post about blocking w00tw00t which you could easily adapt.
The easiest option to adapt would probably be the fail2ban one, using a fail regex that triggers on your 403 errors.
So something like
[Definition]
failregex = ^<HOST> .* "(GET|POST|HEAD).*HTTP.*" 403 [0-9]{1,} ".+" ".+"$
ignoreregex=
in /etc/fail2ban/filter.d/nginx-pingdotban.conf
and
[pingdotban]
enabled = true
port = http,https
filter = nginx-pingdotban
logpath = /path/to/nginx/access.log
maxretry = 5
bantime = 360000
in /etc/fail2ban/jail.conf
You can test the regex with
fail2ban-regex /path/to/nginx/access.log /etc/fail2ban/filter.d/nginx-pingdotban.conf
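Once the regex matches as expected, reload fail2ban and check the jail; a small sketch, assuming the jail name from the snippet above:
sudo fail2ban-client reload
sudo fail2ban-client status pingdotban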
Also, the iptables variant could be adapted with something like
iptables -A INPUT -p tcp --dport 80 -m string --algo bm --string "the useragent" -j DROP
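If you go the iptables route, the same rule specification with -D instead of -A removes the rule again later, for example:
iptables -D INPUT -p tcp --dport 80 -m string --algo bm --string "the useragent" -j DROP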
Background
We run a Kubernetes cluster that handles several php/lumen microservices. We started seeing the app php-fpm/nginx reporting a 499 status code in its logs, and it seems to correspond with the client getting a blank response (curl returns curl: (52) Empty reply from server) while the applications log 499.
10.10.x.x - - [09/Mar/2020:18:26:46 +0000] "POST /some/path/ HTTP/1.1" 499 0 "-" "curl/7.65.3"
My understanding is nginx will return the 499 code when the client socket is no longer open/available to return the content to. In this situation that appears to mean something before the nginx/application layer is terminating this connection. Our configuration currently is:
ELB -> k8s nginx ingress -> application
So my thoughts are that it's either the ELB or the ingress, since the application is the one with no socket left to return to. So I started digging into the ingress logs...
Potential core problem?
While looking through the ingress logs I'm seeing quite a few of these:
2020/03/06 17:40:01 [crit] 11006#11006: ngx_slab_alloc() failed: no memory in vhost_traffic_status_zone "vhost_traffic_status"
Potential Solution
I imagine that if I gave vhost_traffic_status_zone some more memory, at least that error would go away and I could move on to finding the next error, but I can't seem to find any configmap value or annotation that would allow me to control this. I've checked the docs:
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
Thanks in advance for any insight / suggestions / documentation I might be missing!
Here is the standard way to look up how to modify the nginx.conf in the ingress controller. After that, I'll link in some info with suggestions on how much memory you should give the zone.
First, get the ingress controller version by checking the image on the deployment:
kubectl -n <namespace> get deployment <deployment-name> -o yaml | grep 'image:'
From there, you can retrieve the code for your version from the corresponding release URL. In this example, I will be using version 0.10.2.
https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.10.2
The nginx.conf template can be found at rootfs/etc/nginx/template/nginx.tmpl in the code, or at /etc/nginx/template/nginx.tmpl on a pod. This can be grepped for the line of interest. In the example case, we find the following line in nginx.tmpl:
vhost_traffic_status_zone shared:vhost_traffic_status:{{ $cfg.VtsStatusZoneSize }};
This gives us the config variable to look up in the code. Our next grep, for VtsStatusZoneSize, leads us to these lines in internal/ingress/controller/config/config.go:
// Description: Sets parameters for a shared memory zone that will keep states for various keys. The cache is shared between all worker processes
// https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone
// Default value is 10m
VtsStatusZoneSize string `json:"vts-status-zone-size,omitempty"`
This gives us the key "vts-status-zone-size" to be added to the configmap "ingress-nginx-ingress-controller". The current value can be found in the rendered nginx.conf template on a pod at /etc/nginx/nginx.conf.
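For example, a rough sketch of setting that key with kubectl patch; the namespace is a placeholder and 32m is just a hypothetical target size:
kubectl -n <namespace> patch configmap ingress-nginx-ingress-controller --type merge -p '{"data":{"vts-status-zone-size":"32m"}}'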
When it comes to what size you should set the zone to, the docs here suggest setting it to more than 2 * usedSize:
If the message("ngx_slab_alloc() failed: no memory in vhost_traffic_status_zone") printed in error_log, increase to more than (usedSize * 2).
https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone
"usedSize" can be found by hitting the stats page for nginx or through the JSON endpoint. Here is the request to get the JSON version of the stats and if you have jq the path to the value: curl http://localhost:18080/nginx_status/format/json 2> /dev/null | jq .sharedZones.usedSize
Hope this helps.
I'm attempting to re-run a state from another state. I'm not using watch or watch_in etc. because I want it to run each time. I configure all my nginx virtual hosts, and then at the end another state called nginx-certs runs; the relevant portion is here:
nginx-frontend:
  module.run:
    - name: state.sls
    - mods:
      - nginx-frontend
During the highstate I see the state ID executed, but it has no comment, nor does it show that it re-runs that state; it just completes with Result: True. I can then jump to the salt master and run
sudo salt webserver state.sls nginx-certs
and when it hits nginx-frontend, it does reload all of the virtual hosts, putting the new cert in the config.
I'm curious why this does not run in the highstate.
I have attempted all sorts of different variations of the simple block outlined above. This one works, but not in the highstate, which is what I am trying to fix.
If you wonder why I do it this way: all certificates for production and staging terminate at HAProxy, and nginx only serves 80/http1 and 81/h2, but when building out dev servers I want to assign the cert directly to the server as it will be public facing. I need to build out the virtual hosts first to get port 80 open, which is used for Let's Encrypt. Then, when the cert is available, I update the nginx vhosts' listen directive and cert paths.
From what I understand: you have one server which you want temporarily configured with Nginx on port 80, then generate its certificate with letsencrypt, then change Nginx configuration to be on port 443.
What you can do is:
have one state which installs and configures Nginx to listen on port 80
have another state which installs/configures/runs letsencrypt
a third state which configures Nginx as you want it to be at the end [1]
you just include them in salt so they are run in that specific order, like
# custom_nginx.sls
include:
  - temp_nginx_on_port_80
  - letsencrypt_cert
  - nginx
[1] For this I think it's better to use a formula like the one from the community, https://github.com/saltstack-formulas/nginx-formula/, and configure it with pillar data. Obviously, if you use it for step 3, you won't be able to use it for step 1 (or at least I don't see right now how).
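Applying the combined state from the master might then look like this; a sketch, assuming the minion ID webserver from the question and that the file above is saved as custom_nginx.sls:
sudo salt 'webserver' state.apply custom_nginx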
I am currently setting up an nginx reverse proxy load-balancing a wide variety of domain names.
The nginx configuration files are generated programmatically and might change very often (i.e. adding or deleting http/https servers).
I am using:
nginx -s reload
To tell nginx to re-read the configuration.
The main nginx.conf file contains an include of all the generated configuration files, like so:
http {
    include /volumes/config/*/domain.conf;
}
An included configuration file might look like this:
server {
    listen 80;
    listen [::]:80;
    server_name mydomain.com;
    location / {
        try_files $uri /404.html /404.htm =404;
        root /volumes/sites/mydomain;
    }
}
My question:
Is it healthy or considered harmful to run:
nginx -s reload
multiple times per minute to notify nginx to take configuration modifications into account?
What kind of performance hit would that imply?
EDIT: I'd like to reformulate the question: how can we make it possible to dynamically change the configuration of nginx very often without a big performance hit?
I would use inotifywatch with a timeout on the directory containing the generated conf files and reload nginx only if something was modified/created/deleted in said directory during that time:
-t, --timeout
    Listen only for the specified amount of seconds. If not specified, inotifywatch will gather statistics until receiving an interrupt signal by (for example) pressing CONTROL-C at the console.
while true; do
    if [[ "$(inotifywatch -e modify,create,delete -t 30 /volumes/config/ 2>&1)" =~ filename ]]; then
        service nginx reload;
    fi;
done
This way you set up a minimum timer after which the reloads will take place, and you don't lose any watches between calls to inotifywatch.
If you
Use a script similar to what's provided in this answer, let's call it check_nginx_confs.sh
Change your ExecStart directive in nginx.service so that /etc/nginx/ is /dev/shm/nginx/
Add a script to /etc/init.d/ to copy conf files to your temp dir:
mkdir /dev/shm/nginx && cp -r /etc/nginx/* /dev/shm/nginx
Use rsync (or another sync tool) to sync /dev/shm/nginx back to /etc/nginx, so you don't lose config files created in /dev/shm/nginx on reboot. Or simply manage both locations in-app, for atomic checks as desired
Set a cronjob to run check_nginx_confs.sh as often as files 'turn old' in check_nginx_confs.sh, so you know if a change happened within the last time window, but only check once
Only systemctl reload nginx if check_nginx_confs.sh finds a new file, once per time period defined by $OLDTIME
Rest
Now nginx will load those configs much, much faster, from RAM. It will only reload once every $OLDTIME seconds, and only if it needs to. Beyond routing requests to a dynamic handler of your own, this is probably the fastest you can get nginx to reload frequently.
It's a good idea to reserve a certain disk quota for the temp directory you use, to ensure you don't run out of memory. There are various ways of accomplishing that. You can also add a symlink to an empty, on-disk directory in case you have to spill over, but that'd be a lot of confs.
Script from other answer:
#!/bin/sh
# Directory to check
TESTDIR=/dev/shm/nginx
# How many seconds before the dir is deemed "old"
# (adds a little grace period on top of the cron interval, optional)
OLDTIME=75
# Get the current time and the directory's modification time
CURTIME=$(date +%s)
FILETIME=$(date -r $TESTDIR +%s)
TIMEDIFF=$(expr $CURTIME - $FILETIME)
# Reload only if the dir was updated within the last $OLDTIME seconds
if [ $OLDTIME -gt $TIMEDIFF ]; then
    systemctl reload nginx
fi
# Run me every 1 minute with cron
Optionally, if you're feeling up to it, you can put the copy and sync commands in nginx.service's ExecStart with some && magic so they always happen together. You can also && a sort of 'destructor function' which does a final sync and frees /dev/shm/nginx on ExecStop. This would replace steps (3) and (4).
As an alternative to cron, you can have a script running a loop in the background with a wait duration, as sketched below. If you do this, you can pass LastUpdateTime back and forth between the two scripts for greater accuracy, as LastUpdateTime+GracePeriod is more reliable. With this, I would still use cron to periodically make sure the loop is still running.
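A minimal sketch of that loop; the path to check_nginx_confs.sh is an assumption:
while true; do
    /usr/local/bin/check_nginx_confs.sh
    sleep 60
done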
For reference, on my CentOS 7 images, nginx.service is at
/usr/lib/systemd/system/nginx.service
Rather than reloading nginx several times a minute, I would suggest watching the config file and executing the reload only when the changes are saved; you can use inotifywait (available through the inotify-tools package) with the following command:
while inotifywait -e close_write /etc/nginx/sites-enabled/default; do service nginx reload; done
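Adapted to the directory layout from the question, a recursive watch on the generated config directory might look like this (a sketch, not tested against that exact setup):
while inotifywait -r -e close_write,create,delete /volumes/config/; do nginx -s reload; done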
I'm trying to create a fail2ban filter that is going to ban the host when it sends over 100 POST requests over a 30-second interval.
jail.local:
[nginx-postflood]
enabled = false
filter = nginx-postflood
action = myaction
logpath = /var/log/nginx/access.log
findtime = 30
bantime = 100
maxretry = 100
nginx-postflood.conf
[Definition]
failregex = ^<HOST>.*"POST.*
ignoreregex =
Using grep I was able to test the regular expression, and indeed it matches the host and POST requests.
The problem is that it bans any host that performs at least one POST request. This likely means that it's not taking the findtime or maxretry options into consideration. In my opinion it's a timestamp issue.
Sample line of nginx log:
5.5.5.5 - user [05/Aug/2014:00:00:09 +0200] "POST /auth HTTP/1.1" 200 6714 "http://referer.com" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
Any help?
I guess it may be too late for an answer, but anyway...
The excerpt you have posted has the jail disabled:
enabled = false
There is no mention of the Fail2Ban version, and the syslog/fail2ban logs for this jail are missing.
I tested your filter on fail2ban 0.9.3-1 and it works fine, although I had to enable it and drop the line action = myaction, as you have not provided what you expect fail2ban to do.
Therefore this filter should work fine, provided that it's enabled and the action is correct as well.
What is happening in the provided example is that your filter is disabled, and fail2ban is using another filter which checks the same log file and matches your requests, but has more restrictive rules, i.e. it bans after one request.
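To confirm which jails are active and which one actually issued the ban, something like the following should help; the jail name is the one from the question:
fail2ban-client status
fail2ban-client status nginx-postflood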
I'm working on an HTTP client and I would like to test it on requests that take some time to finish. I could certainly come up with a Python script to suit my needs, something like:
def slow_server(environ, start_response):
    with getSomeFile(environ) as file_to_serve:
        block = file_to_serve.read(1024)
        while block:
            yield block
            time.sleep(1.0)
            block = file_to_serve.read(1024)
but this feels like a problem others have already encountered. Is there an easy way to serve static files with an absurdly low bandwidth cap, short of a full-scale server like Apache or nginx?
I'm working on Linux, and the way I've been testing so far is with python -m SimpleHTTPServer 8000 in a directory full of files to serve. I'm equally interested in another simple command-line server, or a way to do bandwidth limiting with one or a few iptables commands on TCP port 8000 (or whatever would work).
The solution I'm going with for now uses a "real" web server, but one that is much easier to configure: lighttpd. I've added the following file to my path (it's in ~/bin):
#! /usr/sbin/lighttpd -Df
server.document-root = "/dev/null"
server.modules = ("mod_proxy")
server.kbytes-per-second = env.LIGHTTPD_THROTTLE
server.port = env.LIGHTTPD_PORT
proxy.server = ( "" => (( "host" => "127.0.0.1", "port" => env.LIGHTTPD_PROXY )))
This is a lighttpd config file that acts as a reverse proxy to localhost; the source and destination ports, as well as the server's total maximum bandwidth, are given as environment variables, so it can be invoked like:
$ cd /path/to/some/files
$ python -m SimpleHTTPServer 8000 &
$ LIGHTTPD_THROTTLE=60 LIGHTTPD_PORT=8001 LIGHTTPD_PROXY=8000 throttle.lighttpd
to proxy the Python file server on port 8000 at a low 60 KB per second on port 8001. Obviously, lighttpd could be used to serve the files itself, but this little script can be used to make any HTTP server slow.
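To check that the throttle is actually in effect, curl can report the average download speed; somefile below is a placeholder for any file in the served directory:
curl -o /dev/null -w 'average speed: %{speed_download} bytes/s\n' http://localhost:8001/somefile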
On Windows you can use Fiddler, which is an HTTP proxy debugging tool, to simulate very slow speeds. Maybe a similar tool exists on whatever OS you are using.
I remember I once had the same question, and my search turned up an Apache2 module that goes by the name of mod_bw (mod_bandwidth, that is). It served me well for my testing.