How to get a daemon's status programmatically using the Ruby gem "daemons"

I have a script (myscript.rb) like below:
require 'daemons'
Daemons.run_proc 'myproc', dir_mode: :normal, dir: '/path/to/pids' do
# Daemon code here...
end
So I can check the daemon's status on the console by running ruby myscript.rb status.
But I need to show the daemon's status in a web page (Rails), like:
<p>Daemon status: <%= "Daemon status here..." %></p>
How can this be done?

The default way to do this would be to call ruby myscript.rb status from your Rails application and parse its output to obtain the desired information.
Alternatively, you could create the Application object yourself and call #running? on it.
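If you would rather not shell out or go through the gem's API, another option is to read the pidfile that daemons writes and probe the process directly. This is only a rough sketch in plain Ruby, not the gem's own method; the pidfile path and name are assumptions based on the dir option in the question (for a single instance it is typically myproc.pid):
# Hypothetical helper: read the pidfile and check whether that process exists.
def daemon_running?(pidfile = '/path/to/pids/myproc.pid')
  return false unless File.exist?(pidfile)

  pid = File.read(pidfile).to_i
  Process.kill(0, pid) # signal 0 only checks for existence, it sends nothing
  true
rescue Errno::ESRCH    # no process with that pid
  false
rescue Errno::EPERM    # process exists but is owned by another user
  true
end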

In my tests it seems the "status" command exits with a non-zero value when the daemon is not running:
$ ruby myscript.rb status; echo $?
myproc: no instances running
3
$ ruby myscript.rb start
myproc: process with pid 21052 started.
$ ruby myscript.rb status; echo $?
myproc: running [pid 21052]
0
$ ruby myscript.rb stop
myproc: trying to stop process with pid 21052...
myproc: process with pid 21052 successfully stopped.
$ ruby myscript.rb status; echo $?
myproc: no instances running
3
So it is possible to check the daemon's status programmatically, as below:
require 'open3'

# Run the status command; wait_thr.value is the child's Process::Status.
stdin, stdout, stderr, wait_thr = Open3.popen3('ruby', 'myscript.rb', 'status')
[stdin, stdout, stderr].each(&:close)
if wait_thr.value.success?
  puts "Running"
else
  puts "Not running"
end
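Either check can then back the ERB snippet from the question, for example with a daemon_running? helper like the sketch above:
<p>Daemon status: <%= daemon_running? ? "Running" : "Not running" %></p>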

Related

How to get supervisord to restart hung workers?

I have a number of Python workers managed by supervisord that should continuously print to stdout (after each completed task) if they are working properly. However, they tend to hang, and we've had difficulty finding the bug. Ideally supervisord would notice that they haven't printed in X minutes and restart them; the tasks are idempotent, so non-graceful restarts are fine. Is there any supervisord feature or addon that can do this? Or another supervisor-like program that has this out of the box?
We are already using http://superlance.readthedocs.io/en/latest/memmon.html to kill if memory usage skyrockets, which mitigates some of the hangs, but a hang that doesn't cause a memory leak can still cause the workers to reach a standstill.
One possible solution would be to wrap your python script in a bash script that'd monitor it and exit if there isn't output to stdout for a period of time.
For example:
kill-if-hung.sh
#!/usr/bin/env bash
set -e

TIMEOUT=60
LAST_CHANGED="$(date +%s)"

# Exit non-zero if the wrapped command has printed nothing for $TIMEOUT seconds.
check_output() {
    CURRENT="$(date +%s)"
    if [[ $((CURRENT - LAST_CHANGED)) -ge $TIMEOUT ]]; then
        echo "Process STDOUT hasn't printed in $TIMEOUT seconds"
        echo "Considering process hung and exiting"
        exit 1
    fi
}
trap check_output USR1

# Background timer: poke this script with USR1 once a second so the
# check above runs regularly.
{
    set -e
    while true; do
        sleep 1
        kill -USR1 $$
    done
} &

STDOUT_PIPE=$(mktemp -u)
mkfifo $STDOUT_PIPE

cleanup() {
    kill -- -$$ # Send TERM to child processes
    [[ -p $STDOUT_PIPE ]] && rm -f $STDOUT_PIPE
}
trap cleanup EXIT

# Run the wrapped command ("$@") with its stdout attached to the pipe.
"$@" >$STDOUT_PIPE || exit 2 &

# Relay the command's output and note the time of the last line.
while true; do
    if read tmp; then
        echo "$tmp"
        LAST_CHANGED="$(date +%s)"
    fi
done <$STDOUT_PIPE
Then you would run a python script in supervisord like: kill-if-hung.sh python -u some-script.py (-u to disable output buffering, or set PYTHONUNBUFFERED).
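For reference, a supervisord program entry for the wrapper might look roughly like the following (the program name and paths are placeholders, not taken from the original setup):
[program:someworker]
; the wrapper exits non-zero when the worker hangs, so autorestart brings it back
command=/path/to/kill-if-hung.sh python -u /path/to/some-script.py
environment=PYTHONUNBUFFERED="1"
autorestart=true
stopasgroup=true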
I'm sure you could imagine a python script that'd do something similar.
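As a rough illustration of that idea, here is a minimal, untested sketch of a Python wrapper with the same behaviour: it relays the child's stdout and kills it if nothing arrives for TIMEOUT seconds (the timeout value and command are placeholders):
#!/usr/bin/env python3
# Hypothetical watchdog: run the command given on the command line, echo its
# stdout, and kill it if it prints nothing for TIMEOUT seconds.
import subprocess
import sys
import threading

TIMEOUT = 60  # seconds of silence before the child is considered hung

def main():
    proc = subprocess.Popen(sys.argv[1:], stdout=subprocess.PIPE, text=True)
    timer = None

    def on_timeout():
        print("No output for %ss, killing child" % TIMEOUT, file=sys.stderr)
        proc.kill()

    def reset_timer():
        nonlocal timer
        if timer:
            timer.cancel()
        timer = threading.Timer(TIMEOUT, on_timeout)
        timer.daemon = True
        timer.start()

    reset_timer()
    for line in proc.stdout:       # blocks until the child prints or exits
        sys.stdout.write(line)
        sys.stdout.flush()
        reset_timer()              # any output resets the watchdog

    if timer:
        timer.cancel()
    sys.exit(proc.wait())

if __name__ == '__main__':
    main()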

Run Facebook Flow on a Continuous Integration Meteor application

I have a Meteor application with Circle CI as continuous integration service.
Facebook Flow is running locally with the following .flowconfig:
[ignore]
.*/node_modules/.*
[options]
module.name_mapper='^\/\(.*\)$' -> '<PROJECT_ROOT>/\1'
module.name_mapper='^meteor\/\(.*\):\(.*\)$' -> '<PROJECT_ROOT>/.meteor/local/build/programs/server/packages/\1_\2'
module.name_mapper='^meteor\/\(.*\)$' -> '<PROJECT_ROOT>/.meteor/local/build/programs/server/packages/\1'
In CI I get errors like:
client/main.jsx:4
4: import { Meteor } from 'meteor/meteor';
^^^^^^^^^^^^^^^ meteor/meteor. Required module not found
Flow does not seem to find my modules; the rewrite rules do not apply. With SSH access to Circle CI I found out that the <PROJECT_ROOT>/.meteor/local directory is not present.
Once I run meteor on the CI machine the directory will appear.
Problem: If I run meteor the Meteor server will start up and my test will time out.
As far as I see I need to either
Adapt my .flowconfig or
Find a way to get Meteor to create the directory without running meteor or
Find a way to kill the meteor process once the server is running.
I went with the third option:
bbaja42 shared a script that saves the output of a program and terminates the program once a keyword is reached.
Adapted to my case I have two files:
ci-tests.sh
#!/bin/sh
# Check if the output directory exists. Flow needs the modules there.
if [ ! -d ".meteor/local/build/programs/server/packages" ]; then
    echo "Meteor build directory does not exist. Starting Meteor."
    # Run Meteor so the output directory is built.
    ./build-and-kill-meteor.sh
else
    echo "Meteor build directory exists"
fi

./node_modules/.bin/flow --json
if [ $? -ne 0 ] ; then
    exit 1
fi
build-and-kill-meteor.sh
#!/bin/bash
OUTPUT=/tmp/meteor-launch.log
PROGRAM=meteor
$PROGRAM > $OUTPUT &
PID=$!
echo Program is running under pid: $PID
# Every 10 seconds, check whether Meteor has finished starting up
while true; do
    tail -1 $OUTPUT
    grep "App running at: http://localhost" $OUTPUT
    if [ $? -eq 0 ] ; then
        break
    fi
    sleep 10
done
kill $PID || echo "Killing process with pid $PID failed, try manual kill with -9 argument"
I ran into the same issue and came up with my own derivation based on the OP's answer. I run this script on every CI build to ensure that Meteor will always install any new atmosphere packages that I'm shipping with.
#!/bin/bash
# Install meteor
if [ -d ~/.meteor ]; then sudo ln -s ~/.meteor/meteor /usr/local/bin/meteor; fi
if [ ! -e $HOME/.meteor/meteor ]; then curl https://install.meteor.com | sh; fi
OUTPUT=/tmp/meteor-launch.log
PROGRAM=meteor
$PROGRAM > $OUTPUT &
PID=$!
echo Program is running under pid: $PID
# Start meteor to install atmosphere packages
while true; do
    tail -1 $OUTPUT
    grep "Your application is crashing." $OUTPUT
    # Cancel the program once meteor has started
    if [ $? -eq 0 ] ; then
        break
    fi
    sleep 10
done
kill $PID || echo "Killing process with pid $PID failed, try manual kill with -9 argument."

Unable to capture failure of rsh

I have the rsh code below as part of a script. This code runs in a loop within the main script. In case the rsh fails, I wish to capture the exit code in a log, which is what the if block below was created for. But it does not seem to be working, as $? is always 0 even when the remote server refuses connections.
I cannot use ssh as it is not configured.
rsh ${machine} -l ${osusernm} nohup ${ScrDir}/${LoadJobNm}.scr ${osusernm} ${machine} ${SIDFile} ${logon_id} ${calling_machine} &
if [ $? -ne 0 ]
then
    echo "ERROR : Failed to execute ${LoadJobNm}.scr in ${machine} for file ${SIDFile}" >> ${LogDir}/${JobNm}.log
    break
fi
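Note that because the rsh line ends with &, $? only reports whether the shell managed to start the background job, never the result of rsh itself. One possible fix, sketched along the lines of the wait-based nohup answer below (and keeping in mind that rsh generally reports its own status, such as a refused connection, rather than the remote command's exit code):
rsh ${machine} -l ${osusernm} nohup ${ScrDir}/${LoadJobNm}.scr ${osusernm} ${machine} ${SIDFile} ${logon_id} ${calling_machine} &
rsh_pid=$!
# ... do other work here, then collect the background rsh's real exit status ...
wait ${rsh_pid}
if [ $? -ne 0 ]
then
    echo "ERROR : Failed to execute ${LoadJobNm}.scr in ${machine} for file ${SIDFile}" >> ${LogDir}/${JobNm}.log
    break
fi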

How to get the proper exit code from nohup

The nohup documentation in info coreutils 'nohup invocation' states:
Exit status:
125 if `nohup' itself fails, and `POSIXLY_CORRECT' is not set
126 if COMMAND is found but cannot be invoked
127 if COMMAND cannot be found
the exit status of COMMAND otherwise
However, the only exit codes I've ever gotten from nohup have been 1 and 0. I have a nohup command that's failing from within a script, and I need to handle that failure appropriately... and based on this documentation I would assume that the nohup exit code should be 126. Instead, it is 0.
The command I'm running is: nohup perl myscript.pl &
Is this because perl is exiting successfully?
If your shell script runs the process with:
nohup perl myscript.pl &
you more or less forego the chance to collect the exit status from nohup. The command as a whole succeeds with 0 if the shell forked and fails with 1 if the shell fails to fork. In bash, you can wait for the background process to die and collect its status via wait:
nohup perl myscript.pl &
oldpid=$!
...do something else or this whole rigmarole is pointless...
wait $oldpid
echo $?
The echoed $? is usually the exit status of the specified PID (unless the specified PID had already died and been waited for).
If you run the process synchronously, you can detect the different exit statuses:
(
nohup perl myscript.pl
echo "PID $! exited with status $?" >&2
) &
And now you should be able to spot the different exit statuses from nohup (eg try different misspellings: nohup pearl myscript.pl, etc).
Note that the sub-shell as a whole is run in the background, but the nohup is run synchronously within the sub-shell.
As I understand it, the question was how to get the command's status when it is run under nohup. In my experience there is very little chance of getting the COMMAND's exit status, even when it fails right away. Most of the time you just get the exit status of 'nohup COMMAND &' unless you wait or synchronize as Jonathan mentioned. To check the COMMAND status right after nohup, I use:
# Note: the ps|awk pipeline is fragile; the pattern can match this very
# pipeline itself, so something like pgrep -f COMMAND may be more reliable.
pid=`ps -eo pid,cmd | awk '/COMMAND/ {print $1}'`
if [ -z "$pid" ]; then
    echo "the COMMAND failed"
else
    echo "the COMMAND is running in nohup"
fi

How to do parallel processing in Unix Shell script?

I have a shell script that transfers a build.xml file to a remote unix machine (devrsp02) and executes the ANT task wldeploy on that machine (devrsp02). Now, this wldeploy task takes around 15 minutes to complete and while this is running, the last line at the unix console is -
"task {some digit} initialized".
Once this task is complete, we get a "task Completed" msg and the next task in the script is executed only after that.
But sometimes, there might be a problem with the weblogic domain and the deployment might be failing internally, with no effect on the status of the wldeploy task. The unix console will still be stuck at "task {some digit} initialized". The error of the deployment will be getting logged in a file called output.a
So, what I want now is -
Start a time counter before running wldeploy. If the wldeploy runs for more than 15 minutes, the following command should be run -
tail -f output.a ## without terminating the wldeploy
or
cat output.a ## after terminating the wldeploy forcefully
Point to be noted here is - I can't run the wldeploy task in background, as in that case the user won't get to know when the task is complete, which is crucial for this script.
Could you please suggest anything to achieve this?
Create this script (deploy.sh for example):
#!/bin/sh
sleep 900 && pkill -n wldeploy && cat output.a &
wldeploy
Then from the console
chmod +x deploy.sh
Then run
./deploy.sh
This script will start a counter (15 minutes) that will forcibly kill the wldeploy process if it's running, and if the process was running you'll see the contents of output.a.
If wldeploy has already terminated then pkill will not return true and output.a will not be shown.
I would call this task monitoring rather than "parallel processing" :)
This will only kill the wldeploy process it started, tell you whether wldeploy returned success or failure, and run no more than 30 seconds after wldeploy finishes.
It should be sh-compatible, but the /bin/sh I've got access to now seems to have a broken wait command.
#!/bin/ksh
wldeploy &
while [ ${slept:-0} -le 900 ]; do
    sleep 30 && slept=`expr ${slept:-0} + 30`
    if [ $$ = "`ps -o ppid= -p $!`" ]; then
        echo wldeploy still running
    else
        wait $! && echo "wldeploy succeeded" || echo "wldeploy failed"
        break
    fi
done
if [ $$ = "`ps -o ppid= -p $!`" ]; then
    echo "wldeploy did not finish in $slept seconds, killing it"
    kill $!
    cat output.a
fi
For the part without terminating wldeploy it is easy; just execute this before starting it:
{ sleep 900; tail -f output.a; } &
For the part where you kill it, it is more complex, as you have to determine the PID of the wldeploy process. The answer from pra does exactly that, so I would just refer to that.
