On my system (debian wheezy), lsof (make by checkrestart) take a very long time, 15 minutes.
After search, i suppose that is occur because we have a lot off child in a process (3000).
Can i exclude a process in lsof ?
Thanks,
Related
I use crontask to regularly run Rscript. Unfortunately, I need to do this on a small instance of aws and the process may hang, building more and more processes on top of each other until the whole system is lagging.
I would like to write a crontask to kill all R processes lasting longer than one minute. I found another answer on Stack Overflow that I've adapted that I think would solve the problem. I came up with;
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1m "/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";fi
I copied the task directly from htop, but it does not work as I expect. I get the No such file or directory error but I've checked it a few times.
I need to kill all R processes that have lasted longer than a minute. How can I do this?
You may want to avoid killing processes from another user and try SIGKILL (kill -9) after SIGTERM (kill -15). Here is a script you could execute every minute with a CRON job:
#!/bin/bash
PROCESS="R"
MAXTIME=`date -d '00:01:00' +'%s'`
function killpids()
{
PIDS=`pgrep -u "${USER}" -x "${PROCESS}"`
# Loop over all matching PIDs
for pid in ${PIDS}; do
# Retrieve duration of the process
TIME=`ps -o time:1= -p "${pid}" |
egrep -o "[0-9]{0,2}:?[0-9]{0,2}:[0-9]{2}$"`
# Convert TIME to timestamp
TTIME=`date -d "${TIME}" +'%s'`
# Check if the process should be killed
if [ "${TTIME}" -gt "${MAXTIME}" ]; then
kill ${1} "${pid}"
fi
done
}
# Leave a chance to kill processes properly (SIGTERM)
killpids "-15"
sleep 5
# Now kill remaining processes (SIGKILL)
killpids "-9"
Why imply an additional process every minute with cron?
Would it not be easier to start R with timeout from coreutils, the processes will then be killed automatically after the time you chose.
timeout [option] duration command [arg]…
I think the best option is to do this with R itself. I am no expert, but it seems the future package will allow executing a function in a separate thread. You could run the actual task in a separate thread, and in the main thread sleep for 60 seconds and then stop().
Previous Update
user1747036's answer which recommends timeout is a better alternative.
My original answer
This question is more appropriate for superuser, but here are a few things wrong with
if [[ "$(uname)" = "Linux" ]];then
killall --older-than 1m \
"/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";
fi
The name argument is either the name of image or path to it. You have included parameters to it as well
If -s signal is not specified killall sends SIGTERM which your process may ignore. Are you able to kill a long running script with this on the command line? You may need SIGKILL / -9
More at http://linux.die.net/man/1/killall
I stuck on a small problem.
I'm launching many bsub commands at the same time each one on a specified host:
bsub -sp 20 -W 0:5 -m $myhostname -q "myQueue" -J "mkdir_script" -o $log_file "script_to_launch param1 param2 param3"
all this inside a for, for each hostName.
The problem is that everything is OK for all hosts except one (always the same one). The job is always in PENDING state, and is not moving to RUN state.
The script to execute is a script that will check for a folder and creating it if is not there (so a very small task to do).
Is there a way to see what happens on that host and why my job is not going to RUN state ?
PS: I just found the bjobs -p command and I have the following message:
Not specified in job submission: 81 hosts;
Closed by LSF administrator: 3 hosts;
What does this message mean?
The -m option limits you to a particular host, which excludes 81 hosts. The other three have been closed by your system administrator. You would have to contact them to find out why.
In UNIX, I have a utility, say 'Test_Ex', a binary file. How can I write a job or a shell script(as a cron job) running always in the background which keeps checking if 'Test_Ex' is still running every 5 seconds(and probably hide this job). If it is running, do nothing. If not, delete a directory at the specified path.
Try this script:
pgrep Test_Ex > /dev/null || rm -r dir
If you don't have pgrep, use
ps -e -ocomm | grep Test_Ex || ...
instead.
Utilities like upstart, originally part of the Ubuntu linux distribution I believe, are good for monitoring running tasks.
The best way to do this is to not do it. If you want to know if Test_Ex is still running, then start it from a script that looks something like:
#!/bin/sh
Test_Ex
logger "Test_Ex died"
rm /p/a/t/h
or
#!/bin/sh
while ! Test_ex
do
logger "Test_Ex terminated unsuccesfully, restarting in 5 seconds"
sleep 5
done
Querying ps regularly is a bad idea, and trying to monitor it from cron is a horrible, horrible idea. There seems to be some comfort in the idea that crond will always be running, but you can no more rely on that than you can rely on the wrapper script staying alive; either one can be killed at any time. Waking up every 10 seconds to query ps is just a waste of resources.
What is the difference between a job and a process in Unix ? Can you please give an example ?
Jobs are processes which are started by a shell. The shell keeps track of these in a job table. The jobs command shows a list of active background processes. They get a jobspec number which is not the pid of the process. Commands like fg use the jobspec id.
In the spirit of Jürgen Hötzel's example:
find $HOME | sort &
[1] 15317
$ jobs
[1]+ Running find $HOME | sort &
$ fg
find $HOME | sort
C-c C-z
[1]+ Stopped find $HOME | sort
$ bg 1
[1]+ find $HOME | sort &
Try the examples yourself and look at the man pages.
A Process Group can be considered as a Job. For example you create a background process group in shell:
$ find $HOME|sort &
[1] 2668
And you can see two processes as members of the new process group:
$ ps -p 2668 -o cmd,pgrp
CMD PGRP
sort 2667
$ ps -p "$(pgrep -d , -g 2667)" -o cmd,pgrp
CMD PGRP
find /home/juergen 2667
sort 2667
You can can also kill the whole process group/job:
$ pkill -g 2667
http://en.wikipedia.org/wiki/Job_control_%28Unix%29:
Processes under the influence of a job control facility are referred to as jobs.
http://en.wikipedia.org/wiki/Job_control_%28Unix%29
Jobs are one or more processes that are grouped together as a 'job', where job is a UNIX shell concept.
Jobs are one or more processes that are grouped together as a 'job', where job is a UNIX shell concept. A job consists of multiple processes running in series or parallel. while
A process is a program under execution. job is when you want to know about processes started from the current shell.
A job consists of multiple processes running in series or parallel. A process is a program under execution.
job is when you want to know about processes started from the current shell.
process is when you want to know about a process running from any shell or computer.
I think a job is a scheduled process or set of processes, a job always has the notion of schedule, otherwise we could call it a process.
I have created a job with the at command on Solaris 10.
It's working now but I want to kill it but I don't know how I can find the job number and how to kill that job or process.
You should be able to find your command with a ps variant like:
ps -ef
ps -fubob # if your job's user ID is bob.
Then, once located, it should be a simple matter to use kill to kill the process (permissions permitting).
If you're talking about getting rid of jobs in the at queue (that aren't running yet), you can use atq to list them and atrm to get rid of them.
To delete a job which has not yet run, you need the atrm command. You can use atq command to get its number in the at list.
To kill a job which has already started to run, you'll need to grep for it using:
ps -eaf | grep <command name>
and then use kill to stop it.
A quicker way to do this on most systems is:
pkill <command name>
at -l to list jobs, which gives return like this:
age2%> at -l
11 2014-10-21 10:11 a hoppent
10 2014-10-19 13:28 a hoppent
atrm 10 kills job 10
Or so my sysadmin told me, and it
First
ps -ef
to list all processes. Note the the process number of the one you want to kill. Then
kill 1234
were you replace 1234 with the process number that you want.
Alternatively, if you are absolutely certain that there is only one process with a particular name, or you want to kill multiple processes which share the same name
killall processname