Unicorn is killed automatically - nginx

I'm using unicorn in a staging environment (Ubuntu). When a build process is started, unicorn is killed automatically with the following logs:
I, [2014-09-23T06:59:58.912673 #16717] INFO -- : reaped #<Process::Status: pid 16720 exit 0> worker=0
I, [2014-09-23T06:59:58.913144 #16717] INFO -- : reaped #<Process::Status: pid 16722 exit 0> worker=1
I, [2014-09-23T06:59:58.913464 #16717] INFO -- : master complete
I'm unable to work out why this is happening.

It seems your unicorn server is being shut down gracefully by a SIGQUIT sent to the master process. In this case, the master process reaps all its worker processes after they have finished their current request and then shuts itself down. Unicorn supports several more signals to trigger certain behaviour (e.g. adding or removing workers, reloading itself, ...). You can learn more about that in the SIGNALS documentation of unicorn.
The SIGQUIT is probably sent by your deployment process, which likely tries to reload or restart unicorn but does something unexpected. Generally, you should check your unicorn init script or your deployment process to see which signals are sent (e.g. by using the kill command).
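To make the log lines above easier to read, here is a minimal Python sketch of the pattern described: a master forks workers, waits for SIGQUIT, forwards it, and reaps each worker with exit code 0. This is not unicorn's implementation; the function name run_master and the polling loop are assumptions made purely for illustration, and the sketch only runs on Unix-like systems.

```python
import os
import signal
import time


def run_master(num_workers=2):
    """Fork workers, block until SIGQUIT arrives, then reap every worker.

    Returns a list of (pid, exit_code) tuples, one per worker.
    Illustrative only: this is not how unicorn is implemented.
    """
    state = {"quit": False}

    def on_quit(signum, frame):
        # Only record the request; the shutdown work happens in the loops below.
        state["quit"] = True

    signal.signal(signal.SIGQUIT, on_quit)

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker: keep "serving" until asked to quit, then exit cleanly.
            # The handler installed above is inherited across fork().
            while not state["quit"]:
                time.sleep(0.01)
            os._exit(0)
        pids.append(pid)

    # Master: idle until SIGQUIT is delivered.
    while not state["quit"]:
        time.sleep(0.01)

    reaped = []
    for pid in pids:
        os.kill(pid, signal.SIGQUIT)    # forward the graceful-stop request
        _, status = os.waitpid(pid, 0)  # reap the worker
        reaped.append((pid, os.WEXITSTATUS(status)))
    return reaped
```

Calling run_master() and then running kill -QUIT against its PID from another shell should return one (pid, 0) pair per worker, which corresponds to the "reaped ... exit 0" lines followed by "master complete" in the question.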

Related

salt-stack master v3005.1 with minion v3000.5 cp module problem

I have opened ports 4505 and 4506 on the master node.
When I use 'cp.get_dir' to transfer a directory to a v3000.5 minion (CentOS 6), it takes a long time and then returns:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20221208081502397105
and the minion log shows:
2022-12-08 16:16:25,987 [salt.utils.parsers :1082][WARNING ][21455] Minion received a SIGTERM. Exiting.
2022-12-08 16:16:28,141 [salt.minion :2003][WARNING ][21819] The minion failed to return the job information for job 20221208081547958420. This is often due to the master being shut down or overloaded. If the master is running, consider increasing the worker_threads value.
When I use the same command to transfer the same directory to a v3005.1 minion (CentOS 7), it works.
I tried several CentOS 6 minion nodes and got the same error on each.
The name of the first file and directory is transferred to the minion, but the file is empty. It seems the communication between the master and the minion breaks when the first file starts to be transferred.
The cmd.run module, however, works.

Data unpack would read past end of buffer in file util/show_help.c at line 501

I submitted a job via slurm. The job ran for 12 hours and was working as expected. Then I got "Data unpack would read past end of buffer in file util/show_help.c at line 501". I often get errors like "ORTE has lost communication with a remote daemon", but usually at the beginning of the job. That is annoying, but it costs far less time than an error after 12 hours. Is there a quick fix for this? The Open MPI version is 4.0.1.
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: barbun40
Local adapter: mlx5_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: barbun40
Local device: mlx5_0
--------------------------------------------------------------------------
[barbun21.yonetim:48390] [[15284,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in
file util/show_help.c at line 501
[barbun21.yonetim:48390] 127 more processes have sent help message help-mpi-btl-openib.txt / ib port
not selected
[barbun21.yonetim:48390] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error
messages
[barbun21.yonetim:48390] 126 more processes have sent help message help-mpi-btl-openib.txt / error in
device init
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI communication peer process has unexpectedly disconnected. This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).
Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate. For
example, there may be a core file that you can examine. More
generally: such peer hangups are frequently caused by application bugs
or other external events.
Local host: barbun64
Local PID: 252415
Peer host: barbun39
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[15284,1],35]
Exit code: 9
--------------------------------------------------------------------------

Ensure nginx master process stays running

I am currently trying to set up a docker container using ubuntu:14.04 as my base image, with nginx and gunicorn/django/celery running inside. I am using supervisor to start all of the processes, and have tested to make sure gunicorn is relaunched when it goes down. However, I can't figure out how to do the same with nginx.
My supervisord.conf for nginx is as follows:
[program:nginx]
command=nginx
autorestart=false
I have autorestart set to false because, from what I can tell, the nginx command simply starts the master process and worker processes, and then exits with status code 0. If I have autorestart set to true, it simply keeps trying to restart that nginx command, which will fail for subsequent retries because the master/worker processes are already running and bound to the port.
On the surface, this seems okay, because if I try to kill a worker process, the master will start another worker to take its place. But how do I ensure the master process stays running as well?
You need to add daemon off; to your nginx.conf, which instructs nginx to run in the foreground.
Then modify your supervisor stanza to be:
[program:nginx]
command=nginx
autorestart=true
It will still spawn master/worker processes/subprocesses and can be used this way in production setups just fine. In this case it's supervisor that runs the process in the background and controls and supervises it.
See this FAQ entry
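The reason the foreground setting matters can be sketched in a few lines of Python. This is not supervisor's code; the function supervise and its max_restarts cap are hypothetical names chosen for illustration. The sketch shows that a supervising process can only watch the process it launched itself, so a command that daemonizes and exits with status 0 looks finished right away.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd in the foreground, restarting it each time it exits.

    Returns the exit code of every run. A real supervisor loops forever;
    max_restarts is a hypothetical cap that keeps this sketch finite.
    """
    exit_codes = []
    for _ in range(max_restarts):
        proc = subprocess.Popen(cmd)
        # wait() returns only when the launched process itself exits, so a
        # command that daemonizes and returns 0 immediately looks "dead" here
        # and would be relaunched while the real daemon still holds the port.
        exit_codes.append(proc.wait())
    return exit_codes
```

For example, supervise([sys.executable, "-c", "raise SystemExit(3)"], max_restarts=2) runs the short-lived child twice and collects its exit code each time. With nginx in the foreground, wait() would instead block for as long as the master process is alive.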

Autosys jobs hung

We have jobs getting stuck in the AutoSys R11 screen because the app server is down.
Is there any way to monitor whether AutoSys itself is up and running?
Note: the jobs that got stuck show as completed in the database, but the dependent jobs cannot start, and in the front end the jobs are still in running status.
Please help.
chk_auto_up command will check if application server, event server,
scheduler and agent are working fine.
chase command checks if agent is running fine.
autoping command checks if agent is able to communicate with the
application server.
Check the log files of the components with the commands below:
autosyslog -e (scheduler)
autosyslog -s (server)
autosyslog -d j (job)
Check the status of each component manually with the commands below:
unisrvcntr status waae_server.$AUTOSERV
unisrvcntr status waae_agent-$AGENT_NAME
unisrvcntr status waae_webserver.$AUTOSERV
unisrvcntr status waae_sched.$AUTOSERV

mpi_comm_spawn on remote nodes

How does one use MPI_Comm_spawn to start worker processes on remote nodes?
Using OpenMPI 1.4.3, I've tried this code:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");
MPI_Comm intercom;
MPI_Comm_spawn("worker",
               MPI_ARGV_NULL,
               nprocs,
               info,
               0,
               MPI_COMM_SELF,
               &intercom,
               MPI_ERRCODES_IGNORE);
But that fails with this error message:
--------------------------------------------------------------------------
There are no allocated resources for the application
worker
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.
I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?
"I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?"
I'm not sure why you don't want to start it with mpirun? You're implicitly starting up the whole MPI machinery anyway as soon as you hit MPI_Init(), this way you just get to pass it options rather than relying on the default.
The issue here is simply that when the MPI library starts up (at MPI_Init()) it doesn't see any other hosts available, because you haven't given it any with the --host or --hostfile options to mpirun. It won't just launch processes elsewhere on your say-so (indeed, spawn doesn't require Info host, so in general it wouldn't even know where to go otherwise), so it fails.
So you'll need to do
mpirun --host myhost,host2 -np 1 ./parentjob
or, more generally, provide a hostfile, preferably with a number of slots available
myhost slots=1
host2 slots=8
host3 slots=8
and launch the jobs this way:
mpirun --hostfile mpihosts.txt -np 1 ./parentjob
This is a feature, not a bug; now it's MPI's job to figure out where the workers go, and if you don't specify a host explicitly in the info, it'll try to put them in the most underutilized place. It also means you don't have to recompile to change the hosts you'll spawn to.
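If the set of hosts changes often, the hostfile and the launch command can be generated rather than edited by hand. The following is a small Python sketch under that assumption; the helper names write_hostfile and mpirun_command are hypothetical, and the host names and slot counts are the ones from the example above. It only builds the file and the argument list and does not start mpirun itself.

```python
import os
import tempfile


def write_hostfile(slots_by_host, path):
    """Write one 'host slots=N' line per host, in insertion order.

    Returns the lines that were written (without trailing newlines).
    """
    lines = [f"{host} slots={slots}" for host, slots in slots_by_host.items()]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines


def mpirun_command(hostfile, program, nprocs=1):
    """Build the argv list for launching the parent process under mpirun."""
    return ["mpirun", "--hostfile", hostfile, "-np", str(nprocs), program]


# Example usage (paths are illustrative):
#   path = os.path.join(tempfile.gettempdir(), "mpihosts.txt")
#   write_hostfile({"myhost": 1, "host2": 8, "host3": 8}, path)
#   argv = mpirun_command(path, "./parentjob")
```

The resulting list can be handed to subprocess.run on a machine where Open MPI is installed. Keeping the hosts in a file rather than in the source is the same design point the answer makes: the spawn targets can change without recompiling.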

Resources