What is the difference between a job and a process in Unix? Can you please give an example?
Jobs are processes that are started by a shell. The shell keeps track of them in a job table. The jobs command shows the shell's active jobs (running or stopped in the background). Each job gets a jobspec number, which is not the PID of the process; commands like fg use the jobspec.
In the spirit of Jürgen Hötzel's example:
$ find $HOME | sort &
[1] 15317
$ jobs
[1]+ Running find $HOME | sort &
$ fg
find $HOME | sort
^Z  (press Ctrl-Z to stop the foreground job)
[1]+ Stopped find $HOME | sort
$ bg 1
[1]+ find $HOME | sort &
Try the examples yourself and look at the man pages.
A process group can be considered a job. For example, you create a background process group in the shell:
$ find $HOME|sort &
[1] 2668
And you can see two processes as members of the new process group (note that 2668, the number reported by the shell, is the PID of the last command in the pipeline, sort, while the process group ID 2667 is the PID of the group leader, find):
$ ps -p 2668 -o cmd,pgrp
CMD PGRP
sort 2667
$ ps -p "$(pgrep -d , -g 2667)" -o cmd,pgrp
CMD PGRP
find /home/juergen 2667
sort 2667
You can also kill the whole process group/job:
$ pkill -g 2667
http://en.wikipedia.org/wiki/Job_control_%28Unix%29:
Processes under the influence of a job control facility are referred to as jobs.
Jobs are one or more processes that are grouped together as a 'job', where a job is a UNIX shell concept. A job consists of multiple processes running in series or in parallel, while a process is a program under execution.
You look at jobs when you want to know about processes started from the current shell; you look at processes when you want to know about a process running from any shell or computer.
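As a small illustration of the distinction (using the background pipeline from the earlier example, job 1 with PID 15317):
$ kill %1        # signal the job by its shell-level jobspec
$ kill 15317     # signal the process by its kernel-level PID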
I think of a job as a scheduled process or set of processes: a job always carries the notion of a schedule; otherwise we would simply call it a process.
In the example below, if the shell script shell_script.sh sends a job to the cluster, is it possible to make snakemake aware of that cluster job's completion? That is, first, file a should be created by shell_script.sh, which sends its own job to the cluster, and then, once this cluster job is completed, file b should be created.
For simplicity, let's assume that snakemake is run locally, meaning that the only cluster job originates from shell_script.sh and not from snakemake.
localrules: that_job

rule all:
    input:
        "output_from_shell_script.txt",
        "file_after_cluster_job.txt"

rule that_job:
    output:
        a = "output_from_shell_script.txt",
        b = "file_after_cluster_job.txt"
    shell:
        """
        shell_script.sh {output.a}
        touch {output.b}
        """
PS: at the moment I am using a sleep command to give the cluster job time to finish before the rule is considered "completed", but this is an awful workaround that can give rise to several problems.
Snakemake can manage this for you with the --cluster argument on the command line.
You can supply a template for the jobs to be executed on the cluster.
As an example, here is how I use snakemake on an SGE-managed cluster:
The template that will encapsulate the jobs, which I called sge.sh:
#$ -S /bin/bash
#$ -cwd
#$ -V
{exec_job}
Then, directly on the login node, I use:
snakemake -rp --cluster "qsub -e ./logs/ -o ./logs/" -j 20 --jobscript sge.sh --latency-wait 30
--cluster tells Snakemake the submission command for the queuing system to use
--jobscript is the template in which jobs will be encapsulated
--latency-wait is important if the file system takes a bit of time to write the files. Your job might end and return before the outputs of the rules are actually visible to the filesystem, which will cause an error.
Note that in the Snakefile you can mark rules that should not be submitted to the nodes with the keyword localrules:
Otherwise, depending on your queuing system, some options exist to wait for jobs sent to the cluster to finish:
SGE:
Wait for set of qsub jobs to complete
SLURM:
How to hold up a script until a slurm job (start with srun) is completely finished?
LSF:
https://superuser.com/questions/46312/wait-for-one-or-all-lsf-jobs-to-complete
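As one concrete possibility (a sketch only, assuming an SGE cluster where qsub supports -sync y, and with my_cluster_job.sh standing in for the real job script), shell_script.sh could submit its job in blocking mode, so that touch {output.b} in the rule only runs after the cluster job has finished:
#!/bin/bash
# Hypothetical sketch of shell_script.sh: create the first output, then submit
# the cluster job with a blocking qsub. With -sync y, qsub only returns once the
# job has finished, so the rule's subsequent `touch {output.b}` runs afterwards.
touch "$1"                       # "$1" is {output.a}, passed in by the rule
qsub -sync y my_cluster_job.sh   # placeholder name for the real job script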
I use a cron task to regularly run an Rscript. Unfortunately, I need to do this on a small AWS instance, and the process may hang, piling up more and more processes until the whole system is lagging.
I would like to write a cron task to kill all R processes that have been running longer than one minute. I found another answer on Stack Overflow that I've adapted and that I think would solve the problem. I came up with:
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1m "/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";fi
I copied the command line directly from htop, but it does not work as I expect: I get the No such file or directory error, even though I've checked it a few times.
I need to kill all R processes that have lasted longer than a minute. How can I do this?
You may want to avoid killing processes belonging to other users, and to try SIGKILL (kill -9) only after SIGTERM (kill -15). Here is a script you could execute every minute with a cron job:
#!/bin/bash
PROCESS="R"
MAXTIME=`date -d '00:01:00' +'%s'`
function killpids()
{
    PIDS=`pgrep -u "${USER}" -x "${PROCESS}"`

    # Loop over all matching PIDs
    for pid in ${PIDS}; do
        # Retrieve duration of the process
        TIME=`ps -o time:1= -p "${pid}" |
            egrep -o "[0-9]{0,2}:?[0-9]{0,2}:[0-9]{2}$"`
        # Convert TIME to timestamp
        TTIME=`date -d "${TIME}" +'%s'`
        # Check if the process should be killed
        if [ "${TTIME}" -gt "${MAXTIME}" ]; then
            kill ${1} "${pid}"
        fi
    done
}
# Leave a chance to kill processes properly (SIGTERM)
killpids "-15"
sleep 5
# Now kill remaining processes (SIGKILL)
killpids "-9"
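Assuming you save the script above as, say, /home/ubuntu/kill_r.sh (a hypothetical path) and make it executable, a crontab entry to run it every minute could look like:
# m h dom mon dow  command
* * * * * /home/ubuntu/kill_r.sh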
Why spawn an additional process every minute with cron?
Would it not be easier to start R with timeout from coreutils? The process will then be killed automatically after the duration you choose.
timeout [option] duration command [arg]…
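For example, the existing cron entry could wrap the Rscript call in timeout so the process is sent SIGTERM if it is still alive after 60 seconds (the script path is taken from the question; the exact flags are only illustrative):
# Kill the script with SIGTERM if it runs for more than 60 seconds
timeout 60 Rscript /home/ubuntu/script.R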
I think the best option is to do this with R itself. I am no expert, but it seems the future package will allow executing a function in a separate thread. You could run the actual task in a separate thread, and in the main thread sleep for 60 seconds and then stop().
Previous Update
user1747036's answer, which recommends timeout, is a better alternative.
My original answer
This question is more appropriate for superuser, but here are a few things wrong with
if [[ "$(uname)" = "Linux" ]];then
killall --older-than 1m \
"/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";
fi
The name argument to killall is either the name of the executable image or the path to it; you have included the command's parameters as well.
If -s signal is not specified, killall sends SIGTERM, which your process may ignore. Are you able to kill the long-running script with this signal on the command line? You may need SIGKILL (-9).
More at http://linux.die.net/man/1/killall
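As a sketch of a corrected invocation (assuming GNU killall from psmisc, which provides --older-than and -u, and that the process name is simply R):
# Kill R processes owned by the current user that have been running for more
# than one minute; killall matches the process name, not the full command line.
if [[ "$(uname)" = "Linux" ]]; then
    killall --older-than 1m -u "$USER" R
fi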
As part of a migration process, we need to identify the command/box jobs which are kept ON HOLD or ON ICE. We are doing this activity because we need to delete those unused jobs.
Please let me know the AutoSys command to list these unused jobs.
Thanks
If you're doing this manually, then a simple grep/find should work for you, for example:
in Unix:
autorep -wj ALL | grep OI
autorep -wj ALL | grep OH
or in a Windows agent CMD:
autorep -wj ALL | find "OI"     (use "OH" to find on-hold jobs)
However, if you plan to automate this, you can write a simple program that uses the above commands as a base and then parses the output, as in the sketch below.
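For example, a rough sketch in the Unix shell (purely illustrative: it assumes the job name is the first column of each autorep line and that the status appears as its own whitespace-separated field):
# Print only the names of jobs whose autorep line contains an OI (on ice)
# or OH (on hold) status field.
autorep -wj ALL | awk '{ for (i = 2; i <= NF; i++) if ($i == "OI" || $i == "OH") { print $1; break } }'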
In UNIX, I have a utility, say 'Test_Ex', a binary file. How can I write a job or a shell script (as a cron job), always running in the background, which checks every 5 seconds whether 'Test_Ex' is still running (and probably hides this job)? If it is running, do nothing; if not, delete a directory at the specified path.
Try this script:
pgrep Test_Ex > /dev/null || rm -r dir
If you don't have pgrep, use
ps -e -ocomm | grep Test_Ex || ...
instead.
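If you do want to drive this from cron (which can run at most once per minute, not every 5 seconds), a crontab entry along these lines would work, with /path/to/dir standing in for the directory to delete:
* * * * * pgrep Test_Ex > /dev/null || rm -r /path/to/dir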
Utilities like Upstart (originally part of the Ubuntu Linux distribution, I believe) are good for monitoring running tasks.
The best way to do this is to not do it. If you want to know if Test_Ex is still running, then start it from a script that looks something like:
#!/bin/sh
Test_Ex
logger "Test_Ex died"
rm /p/a/t/h
or
#!/bin/sh
while ! Test_Ex
do
    logger "Test_Ex terminated unsuccessfully, restarting in 5 seconds"
    sleep 5
done
Querying ps regularly is a bad idea, and trying to monitor it from cron is a horrible, horrible idea. There seems to be some comfort in the idea that crond will always be running, but you can no more rely on that than you can rely on the wrapper script staying alive; either one can be killed at any time. Waking up every 10 seconds to query ps is just a waste of resources.
I have created a job with the at command on Solaris 10.
It's running now, but I want to kill it, and I don't know how to find the job number or how to kill that job or process.
You should be able to find your command with a ps variant like:
ps -ef
ps -fubob # if your job's user ID is bob.
Then, once located, it should be a simple matter to use kill to kill the process (permissions permitting).
If you're talking about getting rid of jobs in the at queue (that aren't running yet), you can use atq to list them and atrm to get rid of them.
To delete a job which has not yet run, you need the atrm command. You can use atq command to get its number in the at list.
To kill a job which has already started to run, you'll need to grep for it using:
ps -eaf | grep <command name>
and then use kill to stop it.
A quicker way to do this on most systems is:
pkill <command name>
at -l lists the jobs, giving output like this:
age2%> at -l
11 2014-10-21 10:11 a hoppent
10 2014-10-19 13:28 a hoppent
atrm 10 kills job 10
Or so my sysadmin told me, and it worked.
First
ps -ef
to list all processes. Note the process number of the one you want to kill. Then
kill 1234
where you replace 1234 with the process number of the process that you want to kill.
Alternatively, if you are absolutely certain that there is only one process with a particular name, or you want to kill multiple processes which share the same name
killall processname