Understanding Docker container resource usage - nginx

I have server running Ubuntu 16.04 with Docker 17.03.0-ce running an Nginx container. That server also has ConfigServer Security & Firewall installed. Shortly after starting the Nginx container I start receiving emails about "Excessive resource usage" with the following details:
Time: Fri Mar 24 00:06:02 2017 -0400
Account: systemd-timesync
Resource: Process Time
Exceeded: 1820 > 1800 (seconds)
Executable: /usr/sbin/nginx
Command Line: nginx: worker process
PID: 2302 (Parent PID:2077)
Killed: No
I fully understand that I can add exe:/usr/sbin/nginx to csf.pignore to stop these email alerts but I would like to understand a few things first.
Why is the "systemd-timesync" account being reported? That does not seem to have anything to do with Docker.
Why does the host machine seem to be reporting the excessive resource usage (the extended process time) when that is something running in the container?
Why are other docker containers not running Nginx not resulting in excessive resource usage emails?
I'm sure there are other questions but basically, why is this being reported the way it is being reported?

I can at least answer the first two questions:
Unlike real VMs, Docker containers are simply a collection of processes run under the host system kernel. They just have a different view on certain system resources, including their own file hierarchy, their own PID namespace and their own /etc/passwd file. As a result, they will still show up if you ps aux on the host machine.
The nginx container's /etc/passwd includes a user 'nginx' with UID 104 that runs the nginx worker process. However, in the host's /etc/passwd, UID 104 might belong to a completely different user, such as systemd-timesync.
As a result, if you run ps aux | grep nginx in the container, you might see
nginx 7 0.0 0.0 32152 2816 ? S 11:20 0:00 nginx: worker process
while on the host, you see
systemd-timesync 22004 0.0 0.0 32152 2816 ? S 13:20 0:00 nginx: worker process
even though both are the are the same process (also note the different PID namespaces; in containers, PIDs are counted from 1 again).
As a result, container processes will still be subject to ConfigServer's resource monitoring, but they might show up with random, or even non-existent user accounts.
As to why nginx triggers the emails and other containers don't, I can only assume that nginx is the only one of your containers that crosses ConfigServer's resource thresholds.

Related

Linux Apline: restart nginx

I have Alpine Linux, 3.15.0 version on the server.
The installed nginx version is 1.21.6. I have performed apk update
nginx -t command successfully responds with
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
When I type nginx -s reload server responds with
2023/02/03 10:58:00 [notice] 54#54: signal process started
but nothing actually happens. It's like the process started and that's all.
What am I missing?
According to Nginx documentation, command nginx -s reload actually sends signal to nginx master process and
once the master process receives the signal to reload configuration,
it checks the syntax validity of the new configuration file and tries
to apply the configuration provided in it. If this is a success, the
master process starts new worker processes and sends messages to old
worker processes, requesting them to shut down.
Thus, we can consider that nginx is restarted (If we disregard the fact that the master process itself continued to work).
At the same time, if you want to totally restart nginx, you can stop it with nginx -s quit command and then start again. Or that's much better use your system service manager. If I'm not mistaken, there is an open-rc in Alpine, thus command will be rc-service nginx restart.

Ubuntu (Oracle VM) - Mounted Samba shares hang indefinitely

I have a VM instance on Oracle Cloud (Ubuntu 22.04) set up with ZeroTier to act as a web server for some services that should work with my local Synology NAS.
For some of those services I also need to mount three SMB shares from my NAS with the ZeroTier tunnel, but I can't make it work.
I used mount and mount.cifs plenty of times with automounting too, this time it acts very strange:
running the mount command seems to succeed from the console, but /var/log/syslog reads
CIFS: VFS: \\XXX.XXX.XXX.XXX has not responded in 180 seconds.
Reconnecting...
if trying to access one of the shares (ls or lsof or cd or any other command), it succeeds for only one of the shares (always the same one), but only for the first time any command is given:
$ ls /temp
folder1 folder2 folder3
any other following command just "hangs" as if they system is working on something, but it stays like that indefinitely most of the times:
$ ls /temp
█
Just a few times it spits out this error
lsof: WARNING: can't stat() cifs file system /temp
Output information may be incomplete.
ls 1475 ubuntu 3r DIR 0,44 0 123207681 /temp
findmnt reads:
└─/temp //XXX.XXX.XXX.XXX/Downloads cifs rw,relatime,vers=2.0,cache=strict, username=[redacted],uid=1005,noforceuid,gid=0,noforcegid,addr=XXX.XXX.XXX.XXX,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=65536,wsize=65536,bsize=1048576,echo_interval=60,actimeo=1
for the remaining two "mounted" shares, none of them seems to respond to any command, not even the very first command, and they just hang like the one share that, at least, lets me browse for one time;
umount and umount -l take at least 2-3 minutes to successfully unmount the shares.
Same behavior when using smbclient and also with NFS shares from the same NAS.
What I have already tried:
update kernel and all packages;
remove, purge and reinstall cifs-utils, smbclient and so on...
tried mounting the same shares in another client / node within the ZeroTier network and it works just fine; also browsing from Windows and Android file manager apps with and without ZeroTier works flawlessly;
tried all SMB versions including SMBv3 and SMBv1 (CIFS);
tried different browsing or mounting methods / commands including mount, mount.cifs, autofs, smbclient;
tried to debug what happens behind the console, but didn't found anything that seems related to this in logs, htop or anything else. During the "hanging" sessions there is no spike in CPU, RAM or Network usage in either the Oracle VM or Synology NAS;
checked, reset and reconfigured all permissions on my NAS for shares, folders and files recursively and reconfigured users groups permissions.
What I haven't tried yet (I'll try as soon as possible):
reproduce this on another Oracle VM configured the same as the faulty one and another with a different base image (maybe Oracle Linux?);
It seems to me that the mount.cifs process doesn't really succeeds in mounting the share correctly, as it doesn't show as such anywhere. It also seems an issue not related to folder/file permissions, but rather something related to networking?
A note on something that may or may not be related to this: ZeroTier on my Synology NAS does not seems to work with IPv4 only - it remains OFFLINE. The node goes ONLINE only when IPv6 is enabled, but I must say that this is the only node in my ZT network that shows a IPv6 as public IP in the ZT web GUI - the other nodes show IPv4 public addresses.
If anyone has any clue on this, I'll be happy to support and reproduce any advice. Thank you!
I'm using YailScale, but I presume it will work the same.
You need to add the port 445 to /etc/iptables/rules.v4 just under the SSH setup like below:
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 445 -j ACCEPT (like this)
Then you need to edit the interfaces in /etc/samba/smb.conf to:
interfaces = lo tailscale0 100.0.0.0/24
Obviously, my interface is tailscale0, but yours will be different. Use ip link show to find yours. You may also need to change your IP range to suit ZeroTeirs, such as 100.0.0.0/24, which is what tailscale uses.
Then reboot!
I couldn't get it working without doing this.

Why there are 3 processes for nginx?

My OS is Ubuntu, I use ps -aux |grep nginx, and I've found 3 nginx's processes; so my question is why there are 3 processes for nginx? it seems one process is by root, another two from www-data:
root 7833 0.0 0.0 126092 1476 ? Ss 12:32 0:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data 7834 0.0 0.0 126504 3124 ? S 12:32 0:00 nginx: worker process
www-data 7835 0.0 0.1 126504 5068 ? S 12:32 0:00 nginx: worker process
The process that is being run as root is the master NGINX process.
The two others are worker processes.
During the launch of NGINX service, the master process is the first one to launch.
It spans off the worker processes that actually handle the connections.
The master process runs as root in order to be able to do things like binding to privileged network ports, reading TLS certificates/keys during configuration load.
The worker processes have dropped privileges, as they only require to be able to read website files.
The number of worker processes can be controlled with worker_processes configuration directive. The default value is 1. Which means on a system with default config you will see a total of 2 processes (1 master and 1 worker).
The more worker processes you have, the more connections your web server can handle on a multi-core system.
E.g. you have 4 core CPU. By setting worker_processes 4; you make sure that all cores are being used to handle connections, so it is going to improve performance on a busy website.
Moreover you can just set worker_processes auto;. That will have NGINX determine the number of logical CPU units and set the number of workers corresponding to that.
The root process is necessary for nginx to access the network and files on your system.
The other two processes are set in your config file. Look there and you will see a setting for that which is dependent on the number of cores in the processor on your server. More available processes means more compute power as access to your server increases with visitors.
It's possible (I do not recall) that two processes is a default setting.

Ensure nginx master process stays running

I am currently trying to setup a docker container using ubuntu:14.04 as my base image, with nginx and gunicorn/django/celery running inside. I am using supervisor to start all of the processes, and have tested to make sure gunicorn is relaunched when it goes down. However, I can't figure out how to do it with nginx.
My supervisord.conf for nginx is as follows:
[program:nginx]
command=nginx
autorestart=false
I have autorestart set to false because, from what I can tell, the nginx command simply starts the master process and worker processes, and then exits with status code 0. If I have autorestart set to true, it simply keeps trying to restart that nginx command, which will fail for subsequent retries because the master/worker processes are already running and bound to the port.
On the surface, this seems okay, because if I try and kill a worker process, the master will start another worker to take it's place. But how do I ensure the master process stays running as well?
You need to append daemon off; to your nginx.conf configuration instructing nginx to run in the foreground.
Then modify your supervisor stanza to be:
[program:nginx]
command=nginx
autorestart=true
It will still spawn master/worker processes/subprocesses and can be used this way in production setups just fine. In this case it's supervisor that runs the process in the background and controls and supervises it.
See this FAQ entry

Phusion Passenger Standalone seems to be on but nothing appears in browser

I ssh to the dev box where I am suppose to setup Redmine. Or rather, downgrade Redmine. In January I was asked to upgrade Redmine from 1.2 to 2.2. But the plugins we wanted did not work with 2.2. So now I'm being asked to setup Redmine 1.3.3. We figure we can upgrade from 1.2 to 1.3.3.
In January I had trouble getting Passenger to work with Nginx. This was on a CentOS box. I tried several installs of Nginx. I'm left with different error logs:
This:
whereis nginx.conf
gives me:
nginx: /etc/nginx
but I don't think that is in use.
This:
find / -name error.log
gives me:
/opt/nginx/logs/error.log
/var/log/nginx/error.log
When I tried to start Passenger again I was told something was already running on port 80. But if I did "passenger stop" I was told that passenger was not running.
So I did:
passenger start -p 81
If I run netstat I see something is listening on port 81:
netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:81 localhost:42967 ESTABLISHED
tcp 0 0 10.0.1.253:ssh 10.0.1.91:51874 ESTABLISHED
tcp 0 0 10.0.1.253:ssh 10.0.1.91:62993 ESTABLISHED
tcp 0 0 10.0.1.253:ssh 10.0.1.91:62905 ESTABLISHED
tcp 0 0 10.0.1.253:ssh 10.0.1.91:50886 ESTABLISHED
tcp 0 0 localhost:81 localhost:42966 TIME_WAIT
tcp 0 0 10.0.1.253:ssh 10.0.1.91:62992 ESTABLISHED
tcp 0 0 localhost:42967 localhost:81 ESTABLISHED
but if I point my browser here:
http: // 10.0.1.253:81 /
(StackOverFlow does not want me to publish the IP address, so I have to malform it. There is no harm here as it is an internal IP that no one outside my company could reach.)
In Google all I get is "Oops! Google Chrome could not connect to 10.0.1.253:81".
I started Phusion Passenger at the command line, and the output is verbose, and I expect to see any error messages in the terminal. But I'm not seeing anything. It's as if my browser request is not being heard, even though netstat seems to indicate the app is listening on port 81.
A lot of other things could be wrong with this app (I still need to reverse migrate the database schema) but I'm not seeing any of the error messages that I expect to see. Actually, I'm not seeing any error messages, which is very odd.
UPDATE:
If I do this:
ps aux | grep nginx
I get:
root 20643 0.0 0.0 103244 832 pts/8 S+ 17:17 0:00 grep nginx
root 23968 0.0 0.0 29920 740 ? Ss Feb13 0:00 nginx: master process /var/lib/passenger-standalone/3.0.19-x86_64-ruby1.9.3-linux-gcc4.4.6-1002/nginx-1.2.6/sbin/nginx -c /tmp/passenger-standalone.23917/config -p /tmp/passenger-standalone.23917/
nobody 23969 0.0 0.0 30588 2276 ? S Feb13 0:34 nginx: worker process
I tried to cat the file /tmp/passenger-standalone.23917/config but it does not seem to exist.
I also killed every session of "screen" and every terminal window where Phusion Passenger might be running, but clearly, looking at ps aux, it looks like something is running.
Could the Nginx be running even if the Passenger is killed?
This:
ps aux | grep phusion
brings back nothing
and this:
ps aux | grep passenger
Only brings back the line with nginx.
If I do this:
service nginx stop
I get:
nginx: unrecognized service
and:
service nginx start
gives me:
nginx: unrecognized service
This is a CentOS machine, so if I had Nginx installed normally, this would work.
The answer is here - Issue Uploading Files from Rails app hosted on Elastic Beanstalk
You probably have /etc/cron.daily/tmpwatch removing the /tmp/passenger-standalone* files every day, and causing you all this grief.

Resources