I have NGINX working as a cache engine and can confirm that pages are being cached as well as being served from the cache. But the error logs are getting filled with this error:
2018/01/19 15:47:19 [crit] 107040#107040: *26 chmod()
"/etc/nginx/cache/nginx3/c0/1d/61/ddd044c02503927401358a6d72611dc0.0000000007"
failed (1: Operation not permitted) while reading upstream, client:
xx.xx.xx.xx, server: *.---.com, request: "GET /support/applications/
HTTP/1.1", upstream: "http://xx.xx.xx.xx:80/support/applications/",
host: "---.com"
I'm not really sure what the source of this error could be since NGINX is working. Are these errors that can be safely ignored?
It looks like you are using nginx proxy caching, but nginx does not have the ability to manipulate files in its cache directory. You will need to get the ownership/permissions on the cache directory correct.
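For a normal local cache directory, something like this is usually enough (a sketch only; "nginx" is an assumed worker user, so check the user directive in nginx.conf and the proxy_cache_path location in your config):
chown -R nginx:nginx /etc/nginx/cache    # give the worker user ownership of the cache tree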
Not explained in the original question is that the mounted storage is an Azure file share. So in /etc/fstab I had to include the uid= and gid= mount options for the desired owner, which removed the need for chown and made chmod unnecessary as well. This removed the chmod() error but introduced another.
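The mount entry was roughly of this shape (a sketch only; the storage account, share name, credentials file and uid/gid values are all placeholders):
//<storage-account>.file.core.windows.net/<share>  /etc/nginx/cache  cifs  vers=3.0,credentials=/etc/smbcredentials/<storage-account>.cred,uid=<nginx-uid>,gid=<nginx-gid>,dir_mode=0700,file_mode=0600,serverino,nofail  0  0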
Then I was getting permission errors on rename(). At this point I scrapped what I was doing, moved to a different type of Azure storage (specifically a Disk attached to the VM), and all these problems went away.
So I'm offering this as an answer but realistically, the problem was not solved.
We noticed the same problem. Following the guide from Microsoft at https://learn.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv#create-a-storage-class seems to have fixed it.
In our case the nginx process was using a different user for the worker threads, so we needed to find that user's uid and gid and use that in the StorageClass definition.
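A trimmed sketch of the kind of StorageClass from that guide (the uid/gid values are placeholders for the worker user's ids, and the provisioner name depends on the AKS version):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-nginx
provisioner: kubernetes.io/azure-file   # newer clusters use file.csi.azure.com
mountOptions:
  - dir_mode=0750
  - file_mode=0640
  - uid=101    # placeholder: the nginx worker's uid
  - gid=101    # placeholder: the nginx worker's gid
  - mfsymlinks
  - nobrl
parameters:
  skuName: Standard_LRS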
Linux installation of Phusion Passenger(R) 6.0.14 on nginx/1.14.1. I'm not able to load a site (using a simplified nginx.conf with the conf.d/passenger.conf and conf.d/site1.conf includes). The error:
2022/08/03 19:24:34 [alert] 55601#0: *3 Error opening '/home/user3/sites/Passengerfile.json' for reading: Permission denied (errno=13);
This error means that the Nginx worker process (PID 55601, running as UID 992) does not have permission to access this file.
Please read this page to learn how to fix this problem: https://www.phusionpassenger.com/library/admin/nginx/troubleshooting/?a=upon-accessing-the-web-app-nginx-reports-a-permission-denied-error;
Extra info, client: 192.168.1.4, server: domain1.com, request: "GET / HTTP/1.1", host: "server_f.local"
I don't even know what to ask, other than how can I get this to work? That file doesn't exist. I've restarted nginx many times, and this is the only feedback that I get. I've checked over that page, which tells me to look at the error log it's reported in already. The host and server are correct, and I am on my LAN.
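Based on the PID and UID in the error, the checks that seem relevant are roughly these (a sketch; the PID and path come from the log line above, and <worker-user> stands for whatever the first command reports):
ps -o user= -p 55601                                # which user the reported worker runs as
namei -l /home/user3/sites/Passengerfile.json       # list the permissions of every path component
sudo -u <worker-user> ls /home/user3/sites/         # test whether that user can actually read the directory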
Background
We run a kubernetes cluster that handles several php/lumen microservices. We started seeing the app's php-fpm/nginx report a 499 status code in its logs, which seems to correspond with the client getting a blank response (curl returns curl: (52) Empty reply from server) while the application logs 499.
10.10.x.x - - [09/Mar/2020:18:26:46 +0000] "POST /some/path/ HTTP/1.1" 499 0 "-" "curl/7.65.3"
My understanding is nginx will return the 499 code when the client socket is no longer open/available to return the content to. In this situation that appears to mean something before the nginx/application layer is terminating this connection. Our configuration currently is:
ELB -> k8s nginx ingress -> application
So my thoughts are that it's either the ELB or the ingress, since the application is the one with no socket left to return to. So I started hitting the ingress logs...
Potential core problem?
While looking through the ingress logs I'm seeing quite a few of these:
2020/03/06 17:40:01 [crit] 11006#11006: ngx_slab_alloc() failed: no memory in vhost_traffic_status_zone "vhost_traffic_status"
Potential Solution
I imagine that if I gave vhost_traffic_status_zone some more memory, at least that error would go away and I could move on to finding the next one, but I can't seem to find any configmap value or annotation that would allow me to control this. I've checked the docs:
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
Thanks in advance for any insight / suggestions / documentation I might be missing!
Here is the standard way to look up how to modify the nginx.conf in the ingress controller. After that, I'll link some suggestions on how much memory you should give the zone.
First, get the ingress controller version by checking the image version on the deployment:
kubectl -n <namespace> get deployment <deployment-name> -o yaml | grep 'image:'
From there, you can retrieve the code for your version from the following URL. In the following, I will be using version 0.10.2.
https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.10.2
The nginx.conf template can be found at rootfs/etc/nginx/template/nginx.tmpl in the code or /etc/nginx/template/nginx.tmpl on a pod. This can be grepped for the line of interest. In the example case, we find the following line in the nginx.tmpl:
vhost_traffic_status_zone shared:vhost_traffic_status:{{ $cfg.VtsStatusZoneSize }};
This gives us the config variable to look up in the code. Our next grep, for VtsStatusZoneSize, leads us to these lines in internal/ingress/controller/config/config.go:
// Description: Sets parameters for a shared memory zone that will keep states for various keys. The cache is shared between all worker processes
// https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone
// Default value is 10m
VtsStatusZoneSize string `json:"vts-status-zone-size,omitempty"`
This gives us the key "vts-status-zone-size" to be added to the configmap "ingress-nginx-ingress-controller". The current value can be found in the rendered nginx.conf template on a pod at /etc/nginx/nginx.conf.
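For example, setting it could look roughly like this (the namespace and the 20m value are assumptions to adjust for your install):
kubectl -n ingress-nginx patch configmap ingress-nginx-ingress-controller \
  --type merge -p '{"data":{"vts-status-zone-size":"20m"}}'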
When it comes to what size to set the zone to, the docs here suggest setting it to more than 2 * usedSize:
If the message("ngx_slab_alloc() failed: no memory in vhost_traffic_status_zone") printed in error_log, increase to more than (usedSize * 2).
https://github.com/vozlt/nginx-module-vts#vhost_traffic_status_zone
"usedSize" can be found by hitting the stats page for nginx or through the JSON endpoint. Here is the request to get the JSON version of the stats and if you have jq the path to the value: curl http://localhost:18080/nginx_status/format/json 2> /dev/null | jq .sharedZones.usedSize
Hope this helps.
We are using openresty and the lua-resty-auto-ssl package to generate certificates from Let's Encrypt, but lately the server keeps falling over. I'm guessing it's triggered when a certificate tries to auto-renew, since generating a certificate for the first time works fine. The error we are seeing is:
2019/05/12 08:25:24 [error] 2623#2623: *1024227 lua entry thread aborted: runtime error: ...sty/luajit/share/lua/5.1/resty/auto-ssl/servers/hook.lua:40: assertion failed!
stack traceback:
coroutine 0:
[C]: in function 'assert'
...sty/luajit/share/lua/5.1/resty/auto-ssl/servers/hook.lua:40: in function 'server'
.../local/openresty/luajit/share/lua/5.1/resty/auto-ssl.lua:99: in function 'hook_server'
content_by_lua(nginx.conf:194):2: in function <content_by_lua(nginx.conf:194):1>, client: 127.0.0.1, server: , request: "POST /deploy-cert HTTP/1.1", host: "127.0.0.1:8999"
From what I can see in the error, it is failing to assert something when trying to deploy the cert, which could be any of these four assertions:
assert(params["domain"])
assert(params["fullchain"])
assert(params["privkey"])
assert(params["expiry"])
I'm a bit stuck as to what I can do; it's no good having the server dropping out on users. That's the last error reported before the server goes offline, so I'm guessing that's the cause, but I'm not 100% sure.
Is there anywhere I can look to find out more about what causes the crash? I'm new to nginx/openresty so I'm fumbling my way around a bit. Has anyone come across a similar issue?
Wrap it all in a function and call it with pcall or xpcall and add some logic to deal with the error.
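A minimal sketch of that idea inside the content_by_lua handler, assuming the call being wrapped is the stock auto_ssl:hook_server() from lua-resty-auto-ssl's suggested nginx.conf (the logging and status code here are just one choice):
local ok, err = pcall(function()
  auto_ssl:hook_server()          -- the call that currently hits the failed assertion
end)
if not ok then
  ngx.log(ngx.ERR, "deploy-cert hook failed: ", err)  -- keep the worker alive and record why
  return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
end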
I'm having a problem with Memcached pools. I will try to add all the context of the error to see if you guys can help me with this.
Context:
PHP 5.5.17 (cgi-fcgi) (built: Sep 24 2014 20:38:04)
php-pecl-memcache-3.0.8-2.fc17.remi.5.5.x86_64
nginx version:
nginx/1.0.15
My problem:
I am creating a connection with memcached and saving several keys, on just one server first, something like this:
$_memcache = new Memcache;
$_memcache->addServer("127.0.0.1", "11211", true, 50, 3600, 45); // host, port, persistent, weight, timeout, retry_interval
So, let's suppose that I add several keys to that server. I can get those without problems; when I visit my site and my code asks for the keys, it gets them.
Now the problem: let's say that with those keys already saved and working without problems, I added another memcached server to the pool, this way:
$_memcache = new Memcache;
$_memcache->addServer("10.0.0.2", "11211", true, 50, 3600, 45);
$_memcache->addServer("10.0.0.3", "11211", true, 50, 3600, 45);
But before I refreshed the site to run my code and get the keys that I had stored on the first server, I stopped memcached on server number 1 (10.0.0.2). After that I refreshed my site and received a 502 error (Bad Gateway).
The error that I am seeing in the log of NGINX is:
[error] 9364#0: *329504 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XX.XX.XXX.XXX, server: _, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php-fastcgi/sock:", host: "www.myhost.com"
So, why am I getting that error? The only theory I have is that, because the connection is persistent, it is not closed properly when I stop the memcached server, but that only happens when I have a pool of more than one server. If I use just that one server and stop it, I don't see the error.
Is there any chance that PHP 5.5 has a bug with the FastCGI socket that I don't know about?
NOTE: This problem wasn't happening on the previous PHP version, 5.3; it started happening after changing the version.
NOTE: When I don't use a persistent connection it seems to work, but this site has huge traffic and won't handle that many open connections; I tested this in a dev environment.
Any help or any suggestions are more than welcome.
Thanks in advance!
I'm using a named pipe as the log file for the access_log of my nginx, and I want to know what happens internally in nginx when I delete and recreate the pipe. What I notice is that nginx keeps working but stops logging.
Even if I don't recreate the pipe, nginx doesn't try to create a regular file for logging.
I don't want to lose my logs, but apparently the only option is to restart nginx. Can I force nginx to check again for the log file?
The error log only says this, whether the pipe doesn't exist or has been recreated:
2012/02/27 22:45:13 [alert] 24537#0: *1097 write() to "/tmp/access.log.fifo" failed (32: Broken pipe) while logging request, client: 127.0.0.1, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8000/", host: "localhost:8002"
Thanks.
AFAIK, you need to send nginx a USR1 signal to instruct it to reopen the log files. Basically nginx will keep trying to write to the file descriptor for the old file (that's why you are seeing the Broken Pipe error). More info here:
http://wiki.nginx.org/LogRotation (also click through the other links at the bottom of that page).
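For example, roughly (the FIFO path is the one from your error log; the pid file location may differ on your system):
mkfifo /tmp/access.log.fifo                 # recreate the pipe first
kill -USR1 "$(cat /var/run/nginx.pid)"      # ask the master process to reopen its log files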
hth