I'm using nginx as a reverse proxy in a lab environment for a POC. While looking through the nginx access.log file, I see this entry every 10 seconds:
127.0.0.1 - - [06/Aug/2019:02:16:50 +0000] "GET /server-status HTTP/1.1" 404 152 "-" "Go-http-client/1.1"
I'm trying to figure out the source of this and the reason behind it. It's really just curiosity and me trying to understand more about nginx as a reverse proxy. Where else should I be looking to further understand what's going on?
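One way to identify the local client behind requests like this (a sketch, assuming nginx is listening on 127.0.0.1:80; adjust the port to match your setup):

sudo tcpdump -i lo -nn -c 5 'dst port 80'          # confirm the traffic on loopback
sudo ss -tnp state established '( dport = :80 )'   # show the process that owns the client socket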
Thanks!
I'm about a year late to answering, but for me this was a request made by Netdata and disabling its NGINX statistics collector removed these requests.
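If you want to keep Netdata running but stop the probes, you can disable just that collector. A minimal sketch, assuming a default Netdata install with the Go collectors (paths and the service manager may differ on your system):

cd /etc/netdata
sudo ./edit-config go.d.conf
# in the opened file, under "modules:", turn the collector off:
#   nginx: no
sudo systemctl restart netdata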
Background
We run a Kubernetes cluster that handles several PHP/Lumen microservices. We started seeing the app's php-fpm/nginx report a 499 status code in its logs, and it seems to correspond with the client getting a blank response (curl returns curl: (52) Empty reply from server) while the applications log 499.
10.10.x.x - - [09/Mar/2020:18:26:46 +0000] "POST /some/path/ HTTP/1.1" 499 0 "-" "curl/7.65.3"
My understanding is that nginx returns the 499 code when the client socket is no longer open/available to return the content to. In this situation, that appears to mean something in front of the nginx/application layer is terminating the connection. Our configuration currently is:
ELB -> k8s nginx ingress -> application
So my thoughts are it's either the ELB or the ingress, since the application is the one with no socket left to return to. So I started digging into the ingress logs...
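Something like the following can pull those logs (a sketch; the namespace and label selector depend on how the controller was installed):

kubectl -n ingress-nginx logs -l app.kubernetes.io/name=ingress-nginx --tail=500 | grep -i '\[crit\]'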
Potential core problem?
While looking through the ingress logs, I'm seeing quite a few of these:
2020/03/06 17:40:01 [crit] 11006#11006: ngx_slab_alloc() failed: no memory in vhost_traffic_status_zone "vhost_traffic_status"
Potential Solution
I imagine that if I gave vhost_traffic_status_zone some more memory, at least that error would go away and I could move on to finding the next one, but I can't seem to find any configmap value or annotation that would let me control this. I've checked the docs:
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
Thanks in advance for any insight / suggestions / documentation I might be missing!
Here is the standard way to look up how to modify the nginx.conf in the ingress controller. After that, I'll link some info with suggestions on how much memory you should give the zone.
First, get the ingress controller version by checking the image version on the deployment:
kubectl -n <namespace> get deployment <deployment-name> -o yaml | grep 'image:'
From there, you can retrieve the code for your version from the corresponding release URL. In this example, I will be using version 0.10.2:
https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.10.2
The nginx.conf template can be found at rootfs/etc/nginx/template/nginx.tmpl in the code, or at /etc/nginx/template/nginx.tmpl on a pod. This can be grepped for the line of interest. In the example case, we find the following line in nginx.tmpl:
vhost_traffic_status_zone shared:vhost_traffic_status:{{ $cfg.VtsStatusZoneSize }};
This gives us the config variable to look up in the code. Grepping for VtsStatusZoneSize leads us to these lines in internal/ingress/controller/config/config.go:
// Description: Sets parameters for a shared memory zone that will keep states for various keys. The cache is shared between all worker processes
// https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone
// Default value is 10m
VtsStatusZoneSize string `json:"vts-status-zone-size,omitempty"`
This gives us the key "vts-status-zone-size" to be added to the configmap "ingress-nginx-ingress-controller". The current value can be found in the rendered nginx.conf template on a pod at /etc/nginx/nginx.conf.
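For example, a hedged sketch of the configmap change (the configmap name and namespace depend on your install, and 32m is just an illustrative size):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-ingress-controller
  namespace: ingress-nginx
data:
  vts-status-zone-size: "32m"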
As for what size to give the zone, the docs suggest setting it to more than usedSize * 2:
If the message("ngx_slab_alloc() failed: no memory in vhost_traffic_status_zone") printed in error_log, increase to more than (usedSize * 2).
https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone
"usedSize" can be found by hitting the stats page for nginx or through the JSON endpoint. Here is the request to get the JSON version of the stats and if you have jq the path to the value: curl http://localhost:18080/nginx_status/format/json 2> /dev/null | jq .sharedZones.usedSize
Hope this helps.
I've got a Python flask app behind gunicorn behind nginx-ingress and I'm honestly just running out of ideas. What happens after a long-running computation is that there's a RemoteDisconnect error at 60 seconds, with nothing untoward in the logs. Gunicorn is set to have a massive timeout, so it's not that. Nginx is quite happy to terminate at 60 sec without any error:
xxx.xx.xx.xx - [xxx.xx.xx.xx] - - [03/Dec/2019:19:32:08 +0000] "POST /my/url" 499 0 "-" "python-requests/2.22.0" 1516 59.087 [my-k8s-service] [] xxx.xx.xx.xx:port 0 59.088 - c676c3df9a40c1692b1789e677a27268
No error, warning, nothing. Since 60s was so suspect, I figured it was proxy-read-timeout or upstream-keepalive-timeout... nothing; I've set those both in a configmap and in the .yaml files using annotations, and exec'ing into the pod for a cat /etc/nginx/nginx.conf shows the requisite server has the test values in place:
proxy_connect_timeout 72s;
proxy_send_timeout 78s;
proxy_read_timeout 75s;
...funny values, set to better identify the result. And yet... it still disconnects at 60 sec.
The "right" answer, which we're doing, is to rewrite the thing to have asynchronous calls, but it's really bothering me that I don't know how to fix this. Am I setting the wrong thing? In the background, the Flask app continues running and completes after several minutes, but Nginx just says the POST dies after a minute. I'm totally baffled. I've got an Nginx 499 error which means a client disconnect.
We're on AWS, so I even tried adding the annotation
Annotations: service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 93
...to the service just in case (from https://kubernetes.io/docs/concepts/services-networking/service/). No dice: still dies at 60s.
What am I missing?
This doesn't look like an issue on the AWS load balancer side; it's an NGINX-to-Gunicorn connection issue. You need to update the proxy timeout values. Try these annotations in the ingress rules to fix it:
nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
If you are using Gunicorn, also set --timeout 300.
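Putting that together, a hedged sketch of where the annotations live (the ingress name, host, service, and API version are placeholders to adapt to your cluster):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-flask-app
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-flask-app
                port:
                  number: 8000

And the matching Gunicorn invocation might look like:

gunicorn --timeout 300 --bind 0.0.0.0:8000 app:app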
I started learning Python with connexion and Flask and got stuck on an issue. Even after taking a long break, I cannot see where I am wrong and need some advice.
I created a swagger API definition which maps /api/blacklist/{zipcode} to a function. However when I try to access /api/blacklist/12345 I receive a 404. The YAML is located here:
https://github.com/TheHasgarion/pythonflaskrest/blob/master/api.yml
The mapping for /api/blacklist (without the trailing slash) works just fine. BTW, even accessing /api/blacklist/ gives me a 404.
The server log says:
127.0.0.1 - - [23/Jun/2019 19:52:09] "GET /api/blacklist/12345 HTTP/1.1" 404 -
127.0.0.1 - - [23/Jun/2019 19:57:16] "GET /api/blacklist HTTP/1.1" 200 -
127.0.0.1 - - [23/Jun/2019 19:57:18] "GET /api/blacklist/ HTTP/1.1" 404 -
Thank you very much in advance for guidance.
Apparently you need to use integer as the type, not number. For whatever reason, 'number' means 'float'. :-)
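For reference, a minimal sketch of what the fixed parameter definition could look like in swagger 2.0 syntax (the operationId and response text are assumptions; only the path and parameter name come from the question):

paths:
  /api/blacklist/{zipcode}:
    get:
      operationId: api.get_blacklist_entry   # hypothetical handler name
      parameters:
        - name: zipcode
          in: path
          required: true
          type: integer   # per the answer above: 'number' maps to float and breaks matching a path segment like 12345
      responses:
        200:
          description: Blacklist entry for the given zipcode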
I've been facing this error for a few days, maybe weeks. I've tried all kinds of solutions from all over the internet, but with no success, so:
I have a dedicated server with Plesk 12 and Ubuntu 14.04 LTS.
This error appears 1-2 times per day, and I have to restart the whole server; then all websites work properly until the error appears again.
It first appeared when I added two more domains to be hosted. I suspected high server load, but no: RAM never gets anywhere near full and CPU load is 20% at most.
PS: The error appears only on the websites, so I can still access the Plesk Panel and reboot the server.
NGINX error log (/var/log/nginx/error.log):
[error] 1406#0: *119083 connect() failed (111: Connection refused) while connecting to upstream, client: IP, server: , request: "GET http://www.example.com/ HTTP/1.1", upstream: "http://SERVER_IP:7080/", host: "www.example.com"
This is the first time I'm using Plesk, and I don't really know what to do or check...
PS: Could this be caused by a cron job?
Cron log which is worrying me:
254 postfix/master[1664]: warning: master_wakeup_timer_event: service pickup(public/pickup): Connection refused
"(111: Connection refused) while connecting to upstream"
This error means that something is going wrong with Apache or PHP-FPM (if you are using it). Check the Apache logs of the site where it happens: /var/www/vhosts/example.com/logs/error_log
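A hedged sketch of what to check the next time the error occurs (7080 is the upstream port from the error log above; exact commands may differ on Ubuntu 14.04):

sudo ss -tlnp | grep 7080                                   # is the Apache upstream still listening?
sudo tail -n 50 /var/www/vhosts/example.com/logs/error_log  # site-level Apache errors
sudo service apache2 status                                 # check whether Apache is running at all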
I'm using a named pipe as the log file for the access_log of my nginx, and I want to know what happens internally in nginx when I delete and recreate the pipe. What I notice is that nginx keeps working but stops logging.
Even if I don't recreate the pipe, nginx doesn't try to create a regular file for logging.
I don't want to lose my logs, but apparently the only option is to restart nginx. Can I force nginx to check again for the log file?
The error log only says this, whether the pipe doesn't exist or has been recreated:
2012/02/27 22:45:13 [alert] 24537#0: *1097 write() to "/tmp/access.log.fifo" failed (32: Broken pipe) while logging request, client: 127.0.0.1, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8000/", host: "localhost:8002"
Thanks.
AFAIK, you need to send nginx a USR1 signal to instruct it to reopen the log files. Basically, nginx keeps trying to write to the file descriptor for the old file (that's why you are seeing the Broken pipe error). More info here:
http://wiki.nginx.org/LogRotation (also click through the other links at the bottom of that page).
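A minimal sketch of the reopen sequence, assuming the pipe path from the question and the default pid file location:

mkfifo /tmp/access.log.fifo                                 # recreate the pipe first
cat /tmp/access.log.fifo >> /var/log/nginx/access.copy &    # a reader must be attached, or nginx can block on reopen
kill -USR1 "$(cat /var/run/nginx.pid)"                      # tell nginx to reopen its log files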
hth