Kill all R processes that hang for longer than a minute - r

I use a cron task to run an Rscript regularly. Unfortunately, I need to do this on a small AWS instance, and the process may hang, so that more and more processes pile up until the whole system is lagging.
I would like to write a cron task that kills all R processes that have been running for longer than one minute. I adapted another answer on Stack Overflow that I think should solve the problem. I came up with:
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1m "/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";fi
I copied the command line directly from htop, but it does not work as I expect. I get a "No such file or directory" error, even though I've checked the path several times.
I need to kill all R processes that have lasted longer than a minute. How can I do this?

You may want to avoid killing processes from another user and try SIGKILL (kill -9) after SIGTERM (kill -15). Here is a script you could execute every minute with a CRON job:
#!/bin/bash
PROCESS="R"
# Reference timestamp: today at 00:01:00, which stands for a one-minute limit
MAXTIME=$(date -d '00:01:00' +'%s')
function killpids()
{
    PIDS=$(pgrep -u "${USER}" -x "${PROCESS}")
    # Loop over all matching PIDs
    for pid in ${PIDS}; do
        # Retrieve the cumulative CPU time of the process ([dd-]hh:mm:ss).
        # Note: this is CPU time, not wall-clock time since the process started.
        TIME=$(ps -o time:1= -p "${pid}" |
            grep -E -o "[0-9]{0,2}:?[0-9]{0,2}:[0-9]{2}$")
        # Convert TIME to a timestamp (today at TIME)
        TTIME=$(date -d "${TIME}" +'%s')
        # Check if the process should be killed
        if [ "${TTIME}" -gt "${MAXTIME}" ]; then
            kill ${1} "${pid}"
        fi
    done
}
# Leave a chance to kill processes properly (SIGTERM)
killpids "-15"
sleep 5
# Now kill remaining processes (SIGKILL)
killpids "-9"
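The ps "time" column reports CPU time, so a process that hangs while idle may never cross the limit. A minimal sketch of a variant based on elapsed wall-clock time follows. It assumes a Linux system whose ps (procps-ng) supports the etimes column (elapsed seconds); kill_older_than is a hypothetical helper name, not part of any tool.

```shell
#!/bin/bash
# Sketch: signal processes of the current user whose elapsed (wall-clock)
# running time exceeds a limit. Assumes procps-ng ps with the "etimes" column.
# kill_older_than is a hypothetical helper name.
kill_older_than() {
    local name="$1" max_seconds="$2" signal="${3:-TERM}"
    local pid age
    for pid in $(pgrep -u "$(id -u)" -x "$name"); do
        # etimes = seconds since the process started; empty if it already exited
        age=$(ps -o etimes= -p "$pid" | tr -d ' ')
        if [ -n "$age" ] && [ "$age" -gt "$max_seconds" ]; then
            kill -s "$signal" "$pid"
        fi
    done
}
```

It could be called in the same two-step way as the script above: kill_older_than R 60 TERM, then sleep 5, then kill_older_than R 60 KILL.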

Why involve an additional process every minute with cron?
It would be easier to start R with timeout from coreutils; the process is then killed automatically after the duration you choose.
timeout [option] duration command [arg]…
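As a rough illustration of that synopsis, the sketch below wraps a command in GNU timeout. The 60-second limit mirrors the question and the 5-second grace period before SIGKILL is an assumption; run_with_limit is a hypothetical helper name, and the crontab line in the comment reuses the script path from the question.

```shell
#!/bin/bash
# Sketch: run a command under GNU coreutils timeout.
# run_with_limit is a hypothetical helper name.
run_with_limit() {
    local limit="$1"
    shift
    # Send SIGTERM after $limit; if the command is still alive 5 seconds
    # later, send SIGKILL as well.
    timeout --kill-after=5 "$limit" "$@"
}
# Possible crontab entry (runs every minute, gives up after 60 seconds):
# * * * * * timeout --kill-after=5 60 Rscript /home/ubuntu/script.R
```

When the limit is reached and the command exits on SIGTERM, timeout itself returns exit status 124, which can be used to log runs that were cut short.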

I think the best option is to do this with R itself. I am no expert, but it seems the future package will allow executing a function in a separate thread. You could run the actual task in a separate thread, and in the main thread sleep for 60 seconds and then stop().
Previous Update
user1747036's answer which recommends timeout is a better alternative.
My original answer
This question is more appropriate for superuser, but here are a few things wrong with
if [[ "$(uname)" = "Linux" ]];then
killall --older-than 1m \
"/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";
fi
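The name argument is either the name of the process image or the path to it. You have included its parameters as well. Because the name contains a slash, killall treats the whole string, parameters included, as a file path, which is why it reports "No such file or directory".
If -s signal is not specified, killall sends SIGTERM, which your process may ignore. Are you able to kill a long-running script with this on the command line? You may need SIGKILL / -9.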
More at http://linux.die.net/man/1/killall

Related

Use of sleep in bash within for loop launching sbatch

I want to submit an R script myjob.R that takes two arguments for which I have several scenarios (here only a few as an example).
I want to pass these arguments by looping through scens and sets.
In order to avoid overloading the squeue on the cluster, I don't want to submit the whole loop at once.
Instead I want to wait 1h between each individual job submission.
Therefore, I included the sleep 1h command, after each iteration.
I used to launch the bash script via bash mybash.sh, but this requires keeping the terminal open until all jobs have been submitted.
My solution was to launch mybash.sh via sbatch mybash.sh instead. This effectively nests two sbatch commands, and it seems to work very well.
My only question is whether there is any reason not to submit nested sbatch commands.
Thanks!
Here is mybash.sh script:
#!/bin/bash
scens=('AAA' 'BBB')
sets=('set1' 'set2')
wd=/projects/workdir
for sc in "${!scens[@]}"; do
    for se in "${!sets[@]}"; do
        echo "SCENARIO: ${scens[sc]} --- SET: ${sets[se]}"
        sbatch -t 00:05:00 -J myjob --workdir=${wd} -e myjob.err -o myjob.out R --file=myjob.R --args "${scens[sc]}" "${sets[se]}"
        # My solution is to include the following line & run this bash script via sbatch
        sleep 1h
    done
done

Can Snakemake work if a rule's shell command is a cluster job?

In below example, if shell script shell_script.sh sends a job to cluster, is it possible to have snakemake aware of that cluster job's completion? That is, first, file a should be created by shell_script.sh which sends its own job to the cluster, and then once this cluster job is completed, file b should be created.
For simplicity, let's assume that snakemake is run locally meaning that the only cluster job originating is from shell_script.sh and not by snakemake .
localrules: that_job
rule all:
    input:
        "output_from_shell_script.txt",
        "file_after_cluster_job.txt"
rule that_job:
    output:
        a = "output_from_shell_script.txt",
        b = "file_after_cluster_job.txt"
    shell:
        """
        shell_script.sh {output.a}
        touch {output.b}
        """
PS - At the moment, I am using sleep command to give it a waiting time before the job is "completed". But this is an awful workaround as this could give rise to several problems.
Snakemake can manage this for you with the --cluster argument on the command line.
You can supply a template for the jobs to be executed on the cluster.
As an example, here is how I use snakemake on a SGE managed cluster:
A template that encapsulates the jobs, which I called sge.sh:
#$ -S /bin/bash
#$ -cwd
#$ -V
{exec_job}
then I use directly on the login node:
snakemake -rp --cluster "qsub -e ./logs/ -o ./logs/" -j 20 --jobscript sge.sh --latency-wait 30
--cluster is the command used to submit jobs to your queuing system
--jobscript is the template in which jobs will be encapsulated
--latency-wait is important if the file system takes a bit of time to write the files. Your job might end and return before the outputs of the rules are actually visible on the filesystem, which will cause an error
Note that you can specify rules that should not be executed on the nodes with the localrules: keyword in the Snakefile.
Otherwise, depending on your queuing system, some options exist to wait for a job sent to the cluster to finish:
SGE:
Wait for set of qsub jobs to complete
SLURM:
How to hold up a script until a slurm job (start with srun) is completely finished?
LSF:
https://superuser.com/questions/46312/wait-for-one-or-all-lsf-jobs-to-complete

Write to one output file from a few parallel LSF bsub jobs, avoiding writing at the same time

I have developed code that consists of two files:
An 'envelope bash file', which does a few things, writes to a log-file, and at some point enters a for loop in which it submits one job per iteration using bsub.
An 'internal bash file', which receives the name of the log-file as input (in addition to the other input values needed for its execution) and executes process X using the input values it received from the 'envelope file'.
Once process X is finished, the 'internal script' writes to the log-file that process X (with its specific serial number) has been completed.
Since the for loop of the envelope file runs 10 times, at least 10 processes run in parallel, all submitted with bsub and given the SAME log-file name. The idea is that they all report to the same log-file once they have completed process X.
The general procedure works well: process X is executed in each case, and the log-file accumulates all the completion notifications as required. However, in some instances the writing to the log-file gets disturbed, and output lines of two parallel runs run into each other.
I would like to lock the log-file so that it receives text from only one parallel run at a time. The idea is to avoid cases where the text becomes mixed because two processes happen to write to the log-file at exactly the same time.
Here is the part of my envelope file that calls bsub (I reduced the content to the minimum necessary):
for ((i=1; i<=$batchesnumber; i++));
do
    bsub -J $SerialName -q normal "bash FetchFasta.bash $genome_fa ${SerialFileName}.bed $logfile"
done
Here is the part of my internal file that echo to the log-file:
(
echo "~~~~~~~~~~~~~~~~~~"
echo "^^^^^^^^^^^^^^^^^^"
echo -n "Completed running "; bedtools -version
echo "bedtools getfasta -s -fi $genome_fasta -bed $mySerialFile -fo ${mySerialFile%.*}".fa" "
echo "Run's completion time is: $timedate"
echo -e "~~~~~~~~~~~~~~~~~~\n"
) >> $logfile
I would appreciate any useful solution!
There are a couple of ways I can think of to go about this:
1. Have each job write its output to a different file (use $LSB_JOBID inside each job to name the file). Then use another "cleanup" job to concatenate all of the output into a single file. You can use job dependencies (bsub -w) to make sure the cleanup job runs after all the other jobs are done.
2. Implement a lock inside your "internal" job to make sure only one of them writes to the file at a time. This is a lot simpler than it might sound. One way to do it is to have each job try to create the same directory with mkdir before writing to the file and then delete the directory after it's done. If a job fails to create the directory, it's because another job got to it first and is currently writing to the file.
Here's a snippet illustrating #2 in bash:
# Try to get the lock every second
while ! mkdir lock &> /dev/null ; do
    sleep 1
done
# Got the lock, write to the logfile
echo blahblahblah >> "$logfile"
# Release the lock
rmdir lock
I should mention an important caveat here though: if one of your jobs dies while it's "holding the lock" (say someone sends it a kill signal at the wrong time) then it'll never remove the directory and all the other jobs won't be able to create it, so they'll just keep sleeping forever.
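One way to narrow that window is to release the lock from a trap, so the directory is removed when the writer exits or receives a catchable signal. The following is only a sketch: append_locked and the "<logfile>.lock" directory name are hypothetical, SIGKILL cannot be trapped and would still leave the lock behind, and there is a brief gap between acquiring the lock and installing the trap.

```shell
#!/bin/bash
# Sketch: mkdir-based lock that is released from a trap.
# append_locked and the "<logfile>.lock" naming are hypothetical choices.
append_locked() {
    local logfile="$1" lockdir="${1}.lock"
    local message
    # Read the whole message from stdin first, so the lock is held
    # only while the text is actually being written
    message=$(cat)
    (
        # Try to get the lock every second
        until mkdir "$lockdir" 2>/dev/null; do
            sleep 1
        done
        # Release the lock when this subshell exits, for any reason
        trap 'rmdir "$lockdir"' EXIT
        # Turn catchable termination signals into a normal exit,
        # which in turn runs the EXIT trap
        trap 'exit 1' HUP INT TERM
        printf '%s\n' "$message" >> "$logfile"
    )
}
```

The echo lines of the internal script could then be piped into it, for example { echo "Completed running"; echo "Run's completion time is: $timedate"; } | append_locked "$logfile".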

write a background process to check process is still active

In UNIX, I have a utility, say 'Test_Ex', which is a binary file. How can I write a job or a shell script (as a cron job) that always runs in the background and checks every 5 seconds whether 'Test_Ex' is still running (and, if possible, hides this job)? If it is running, do nothing. If not, delete a directory at the specified path.
Try this script:
pgrep Test_Ex > /dev/null || rm -r dir
If you don't have pgrep, use
ps -e -ocomm | grep Test_Ex || ...
instead.
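The one-liner performs a single check. To repeat it every 5 seconds, as the question asks, it can be wrapped in a loop. This is a minimal sketch, assuming pgrep is available; watch_and_cleanup is a hypothetical function name, and -x is added so that only an exact process-name match counts.

```shell
#!/bin/bash
# Sketch: poll until no process with the given name is running,
# then remove a directory. watch_and_cleanup is a hypothetical name.
watch_and_cleanup() {
    local name="$1" dir="$2" interval="${3:-5}"
    # pgrep -x succeeds only while a process with exactly this name exists
    while pgrep -x "$name" > /dev/null; do
        sleep "$interval"
    done
    rm -r -- "$dir"
}
```

It could be started in the background with something like watch_and_cleanup Test_Ex dir 5 &, where dir stands for the directory to delete.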
Utilities like upstart, originally part of the Ubuntu linux distribution I believe, are good for monitoring running tasks.
The best way to do this is to not do it. If you want to know if Test_Ex is still running, then start it from a script that looks something like:
#!/bin/sh
Test_Ex
logger "Test_Ex died"
rm -r /p/a/t/h
or
#!/bin/sh
while ! Test_Ex
do
    logger "Test_Ex terminated unsuccessfully, restarting in 5 seconds"
    sleep 5
done
Querying ps regularly is a bad idea, and trying to monitor it from cron is a horrible, horrible idea. There seems to be some comfort in the idea that crond will always be running, but you can no more rely on that than you can rely on the wrapper script staying alive; either one can be killed at any time. Waking up every 10 seconds to query ps is just a waste of resources.

What is difference between a job and a process in Unix?

What is the difference between a job and a process in Unix ? Can you please give an example ?
Jobs are processes which are started by a shell. The shell keeps track of these in a job table. The jobs command shows a list of active background processes. They get a jobspec number which is not the pid of the process. Commands like fg use the jobspec id.
In the spirit of Jürgen Hötzel's example:
$ find $HOME | sort &
[1] 15317
$ jobs
[1]+ Running find $HOME | sort &
$ fg
find $HOME | sort
C-c C-z
[1]+ Stopped find $HOME | sort
$ bg 1
[1]+ find $HOME | sort &
Try the examples yourself and look at the man pages.
A Process Group can be considered as a Job. For example you create a background process group in shell:
$ find $HOME|sort &
[1] 2668
And you can see two processes as members of the new process group:
$ ps -p 2668 -o cmd,pgrp
CMD PGRP
sort 2667
$ ps -p "$(pgrep -d , -g 2667)" -o cmd,pgrp
CMD PGRP
find /home/juergen 2667
sort 2667
You can also kill the whole process group/job:
$ pkill -g 2667
http://en.wikipedia.org/wiki/Job_control_%28Unix%29:
Processes under the influence of a job control facility are referred to as jobs.
http://en.wikipedia.org/wiki/Job_control_%28Unix%29
Jobs are one or more processes that are grouped together as a 'job', where job is a UNIX shell concept.
A job consists of one or more processes running in series or in parallel, while a process is a program under execution.
"Job" is the term to use when you want to know about processes started from the current shell.
"Process" is the term to use when you want to know about a process running from any shell or computer.
I think a job is a scheduled process or set of processes, a job always has the notion of schedule, otherwise we could call it a process.

Resources