Calling Rscript from a Linux shell script

Can anyone suggest how I might get this working....
I have an R script that takes several minutes to run and writes a few hundred lines of output. I want to write a shell script wrapper around this R script which will launch the R script in the background, redirect its output to a file, and start following the bottom of that file. If the user then enters CTRL-C, I want that to kill the shell script and the tail command, but not the R script. Sounds simple, right?
I've produced a simplified example below, but I don't understand why this doesn't work. Whenever I kill the shell script, the R script is also killed despite apparently running in the background. I've tried nohup, disown, etc. with no success.
example.R
for (i in 1:1000) {
  Sys.sleep(1)
  print(i)
}
wrapper.sh
#!/bin/bash
Rscript example.R > logfile &
tail -f logfile
Thanks in advance!

The following seems to work on my Ubuntu machine:
#!/bin/bash
setsid Rscript example.R > logfile.txt &
tail -f logfile.txt
Here are some of the relevant processes before sending SIGINT to wrapper.sh:
5361 pts/10 00:00:00 bash
6994 ? 00:00:02 update-notifier
8519 pts/4 00:00:00 wrapper.sh
8520 ? 00:00:00 R
8521 pts/4 00:00:00 tail
and after Ctrl+C, you can see that R is still running, but wrapper.sh and tail have been killed:
5361 pts/10 00:00:00 bash
6994 ? 00:00:02 update-notifier
8520 ? 00:00:00 R
Although appending & to your Rscript [...] command sends it to the background, it is still part of the same process group, and therefore receives SIGINT as well.
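You can check this yourself: a background job started from a script stays in the script's process group, which a quick sketch like the following (not from the original post) makes visible:
#!/bin/bash
sleep 100 &
# The script ($$) and the background job ($!) report the same PGID,
# which is why Ctrl+C, sent to the foreground process group, reaches both.
ps -o pid,pgid,comm -p $$ -p "$!"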
I'm not sure if it was your intention, but since you are calling tail -f, if not interrupted with Ctrl+C, the shell that is running wrapper.sh will continue to hang even after the R process completes. If you want to avoid this, the following should work:
#!/bin/bash
setsid Rscript example.R > logfile.txt &
tail --pid="$!" -f logfile.txt
where "$!" is the process id of the last background process executed (the Rscript [...] call).
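Note that --pid is a GNU tail option. A tiny illustration of the mechanism (a hypothetical snippet, just to show $! and --pid working together):
sleep 10 &
# Blocks until the sleep above exits, then tail returns on its own:
tail --pid="$!" -f /dev/null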

Related

Use of sleep in bash within for loop launching sbatch

I want to submit an R script myjob.R that takes two arguments for which I have several scenarios (here only a few as an example).
I want to pass these arguments by looping through scens and sets.
In order to avoid overloading the squeue on the cluster, I don't want to submit the whole loop at once.
Instead I want to wait 1h between each individual job submission.
Therefore, I included the sleep 1h command, after each iteration.
I used to launch the bash script via bash mybash.sh; however, this command requires keeping the terminal open until all jobs have been submitted.
My solution was then to launch mybash.sh via sbatch mybash.sh. This is, in effect, nesting two sbatch commands. It seems to work very well.
My question is only if there is any reason against submitting nested sbatch commands.
Thanks!
Here is the mybash.sh script:
#!/bin/bash
scens=('AAA' 'BBB')
sets=('set1' 'set2')
wd=/projects/workdir
for sc in "${!scens[@]}"; do
  for se in "${!sets[@]}"; do
    echo "SCENARIO: ${scens[sc]} --- SET: ${sets[se]}"
    sbatch -t 00:05:00 -J myjob --workdir=${wd} -e myjob.err -o myjob.out R --file=myjob.R --args "${scens[sc]}" "${sets[se]}"
    # My solution is to include the following line & run this bash script via sbatch
    sleep 1h
  done
done
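For reference, the nesting described above amounts to something like the following (a sketch; the time limit is an assumption and must cover all of the sleep intervals, here 2 scenarios x 2 sets at one hour each):
# The outer job only submits and sleeps, but it still needs enough
# walltime to live through every iteration of the loop.
sbatch -t 05:00:00 -J submitter -o submitter.out mybash.sh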

Shell script when executed appears twice in the process list

I am running the shell script below in the background with ./something.sh &
#!/bin/bash
tail -n0 -f -F service.log | while read LOGLINE
do
    : # loop body elided in the original post; process "$LOGLINE" here
done
When I check ps -ef | grep something, I see two processes:
20273 1 0 16:13 ? 00:00:00 /bin/bash /something.sh
20280 20273 0 16:13 ? 00:00:00 /bin/bash /something.sh
This is because your script is piping the output of a program to a shell command. When you run this, there will be three processes:
The something.sh that you explicitly started
The tail that your script starts
A copy of something.sh that is executing the while loop.
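If the duplicate entry bothers you, one way to keep the loop in the main shell is process substitution instead of a pipe (a sketch, not the poster's script; the loop body is a placeholder):
#!/bin/bash
# The while loop now runs in the main shell; only something.sh and tail
# show up in ps, since no subshell copy of the script is forked.
while read -r LOGLINE; do
    printf '%s\n' "$LOGLINE"    # placeholder for real processing
done < <(tail -n0 -F service.log)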

Kill all R processes that hang for longer than a minute

I use a cron task to run Rscript regularly. Unfortunately, I need to do this on a small AWS instance and the process may hang, building more and more processes on top of each other until the whole system lags.
I would like to write a cron task to kill all R processes lasting longer than one minute. I found another answer on Stack Overflow that I've adapted, and I think it would solve the problem. I came up with:
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1m "/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";fi
I copied the task directly from htop, but it does not work as I expect. I get the "No such file or directory" error, but I've checked it a few times.
I need to kill all R processes that have lasted longer than a minute. How can I do this?
You may want to avoid killing processes from another user and try SIGKILL (kill -9) after SIGTERM (kill -15). Here is a script you could execute every minute with a CRON job:
#!/bin/bash
PROCESS="R"
MAXTIME=`date -d '00:01:00' +'%s'`

function killpids()
{
    PIDS=`pgrep -u "${USER}" -x "${PROCESS}"`
    # Loop over all matching PIDs
    for pid in ${PIDS}; do
        # Retrieve duration of the process
        TIME=`ps -o time:1= -p "${pid}" |
              egrep -o "[0-9]{0,2}:?[0-9]{0,2}:[0-9]{2}$"`
        # Convert TIME to timestamp
        TTIME=`date -d "${TIME}" +'%s'`
        # Check if the process should be killed
        if [ "${TTIME}" -gt "${MAXTIME}" ]; then
            kill ${1} "${pid}"
        fi
    done
}

# Leave a chance to kill processes properly (SIGTERM)
killpids "-15"
sleep 5
# Now kill remaining processes (SIGKILL)
killpids "-9"
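One caveat with the script above (my note, not the answerer's): ps -o time reports cumulative CPU time, so a hung-but-idle process may show 00:00:00 indefinitely and never be killed. If you want the age of the process in wall-clock terms, the etime/etimes column is the one to compare against:
# Elapsed wall-clock time of a PID, in seconds (procps etimes column):
ps -o etimes= -p "${pid}"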
Why add an extra process every minute with cron? Would it not be easier to start R with timeout from coreutils? The process will then be killed automatically after the duration you choose.
timeout [option] duration command [arg]…
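For example, the cron job could wrap the R call directly (a sketch; the script path is taken from the question, and the kill grace period is an assumption):
# Run under a 1-minute wall-clock limit; timeout sends SIGTERM at 60s
# and SIGKILL 10 seconds later if R ignores it.
timeout --kill-after=10 1m Rscript /home/ubuntu/script.R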
I think the best option is to do this with R itself. I am no expert, but it seems the future package will allow executing a function in a separate process. You could run the actual task in a separate process, and in the main process sleep for 60 seconds and then stop().
Update: user1747036's answer, which recommends timeout, is a better alternative.
My original answer follows.
This question is more appropriate for superuser, but here are a few things wrong with
if [[ "$(uname)" = "Linux" ]];then
killall --older-than 1m \
"/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";
fi
The name argument is either the name of the image or the path to it. You have included its parameters as well.
If -s signal is not specified, killall sends SIGTERM, which your process may ignore. Are you able to kill a long-running script with this on the command line? You may need SIGKILL / -9.
More at http://linux.die.net/man/1/killall
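Putting those two points together, a corrected form of the original attempt might look like this (a sketch: killall matches the executable name or path only, never its arguments; the path is taken from the question, and --older-than requires a fairly recent psmisc):
if [[ "$(uname)" = "Linux" ]]; then
    # Match by full path to the executable; -9 as a last resort if the
    # default SIGTERM is being ignored.
    killall -9 --older-than 1m /usr/lib/R/bin/exec/R
fi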

for loop of subjobs using qsub R

I am trying to run subjobs (one for each chromosome) using R --vanilla. Since each chromosome is independent, I want them to run in parallel on the system. I have written the following script:
#!/bin/bash
for K in {20..21}; do
    qsub -V -cwd -b y -q short.q R --vanilla --args arg1$K arg2$K arg3$K < RareMETALS.R > loggroup$K.txt
done
But somehow R opens interactively rather than running the script from the command line as expected. When I try the command by itself:
R --vanilla --args arg1 arg2 arg3 < RareMETALS.R > loggroup.txt
it runs perfectly.
Can somebody guide me, or point out what might be the problem?
My take on this would be to use echo instead of the --args option to pass parameters to the script. I find separating the script and the Grid Engine code to be more straightforward:
for K in {20..21}; do
    echo "Rscript RareMETALS.R arg1$K arg2$K arg3$K > loggroup$K.txt" | qsub -V -cwd -q short.q
done
As others have commented, use Rscript.
The code seems cleaner to me, but there may be some limitations to using echo as opposed to --args that I am unaware of.
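If you would rather pass the parameters on the qsub command line itself instead of echoing a command string, a variant of the original loop with Rscript in binary mode should also work (a sketch; with -b y the output redirection has to move into qsub's -o flag, since a shell > on the submit host would otherwise capture qsub's own output):
for K in {20..21}; do
    qsub -V -cwd -b y -q short.q -o "loggroup$K.txt" -j y \
        Rscript RareMETALS.R "arg1$K" "arg2$K" "arg3$K"
done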

SharpSsh - script runs twice in csh and ksh

I'm running a script from ASP.NET/C# using SharpSsh. I realize that when the script runs and I do a ps -ef | grep from Unix, I see the same script running twice: one in csh -c, and the other with ksh. The script has a ksh shebang, so I'm not sure why a copy of csh is also running. Also, if I run the same script directly from Unix, only one copy runs, with ksh. There's no other shell running from within the script.
Most Unix/Linux systems now have a command or option that will show process trees as an indented list; look for the -t or -T options to ps, or the ptree command, or similar:
USER PID PPID START TT TIME CMD
daemon 1 1 11-03-06 ? 0 init
myusr 221568 1 11-03-07 tty10 1.00s \_ -ksh
myusr 350976 221568 07:52:11 tty10 0 | \_ ps -efT
I bet you'll see that the csh is the user login shell that includes your script as an argument (you may have to use different options to ps to see the full command line of the csh process), and as a subprocess you'll see ksh executing your script, with further subprocesses under ksh for any external commands the script calls.
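On Linux with procps, for example, either of these gives that indented view (my note, added for reference):
# Forest view of every process, as an ASCII tree:
ps -ef --forest
# Or the dedicated tree tool, showing PIDs and full arguments:
pstree -ap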
I hope this helps.
