Submitting R scripts with command-line arguments to PBS HPC clusters

Could you please advise me on the following: I've written an R script that reads 3 arguments from the command line, i.e.:
args <- commandArgs(TRUE)
TUMOR <- args[1]
GERMLINE <- args[2]
CHR <- args[3]
When I submit the R script to a PBS HPC scheduler as shown below, I get an error message.
(I am not posting the error message, because the R script works fine when it is run from a regular terminal.)
How do you usually submit R scripts with command-line arguments to PBS HPC schedulers?
qsub -d $PWD -l nodes=1:ppn=4 -l vmem=10gb -m bea -M tanasa@gmail.com \
-v TUMOR="tumor.bam",GERMLINE="germline.bam",CHR="chr22" \
-e script.efile.chr22 \
-o script.ofile.chr22 \
script.R
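One pattern that is often used here, sketched under assumptions rather than offered as the definitive fix: submit a small shell wrapper instead of script.R itself. `qsub -v` places TUMOR, GERMLINE and CHR in the job's environment, not on R's command line, so the wrapper forwards them to Rscript as positional arguments where `commandArgs(TRUE)` can see them. The wrapper name run_script.sh is hypothetical; the qsub options are copied from the question.

```shell
#!/bin/bash
# Write a hypothetical wrapper, run_script.sh, that PBS executes as the job.
cat > run_script.sh <<'EOF'
#!/bin/bash
# TUMOR, GERMLINE and CHR arrive as environment variables via `qsub -v`.
# ${VAR:?} aborts with an error if a variable is unset or empty.
cd "${PBS_O_WORKDIR:-$PWD}"
Rscript script.R "${TUMOR:?}" "${GERMLINE:?}" "${CHR:?}"
EOF
chmod +x run_script.sh

# Submit the wrapper (not script.R) when qsub is available on this machine.
if command -v qsub >/dev/null 2>&1; then
  qsub -d "$PWD" -l nodes=1:ppn=4 -l vmem=10gb \
    -v TUMOR="tumor.bam",GERMLINE="germline.bam",CHR="chr22" \
    -e script.efile.chr22 -o script.ofile.chr22 \
    run_script.sh
fi
```

With this layout the R script stays unchanged: args[1], args[2] and args[3] receive the three values in the order the wrapper passes them.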

Related

Batch and Bash codes while submitting jobs

I used to submit my R jobs sequentially under PBS/Torque in the following way.
The following is my R code, named simsim.R:
#########
set<-1
#########
# Read i
#########
# the next two assignments pick up the value passed in by the bash script
arg <- commandArgs()
arg
itration<- as.numeric(arg)[3]
itration
setwd("/home/habijabi")
save(arg,itration,
file = paste0('simsim_RESULT_',set,itration,'.RData'))
I then submit it with the following script:
#!/bin/bash
Chains=10
for cha in `seq 1 $Chains`
do
echo "Chains: " $cha
sleep 1
qsub -q long -l nodes=1:ppn=12,walltime=24:00:00 -v c=$cha ./diffv1.sh
done
In 'diffv1.sh' I load the module and pass on the variable 'c':
#!/bin/bash
## input values
c=$c
#configure software
module load R/4.1.2
#changed
cd /home/habijabi
R --no-save < simsim.R $c
In this way I passed the value of '$c' to my R code, and it produced 10 .RData files with the corresponding names.
But then I had to switch to Slurm. The following is the batch script that I am using:
#!/bin/bash
#SBATCH --job-name=R-test
#IO files
#SBATCH --error=R-test.%J.err
#SBATCH --output=R-test.%J.out
#!/bin/bash
module load R/4.1.2
set -e -x
mkdir -p jobs
cd /home/habijabi
for cha in {1..10}
do
sbatch --time=24:00:00 \
--ntasks-per-node=12 \
--nodes=1 \
-p compute \
-o jobs/${cha}_srun.txt \
--wrap="R --no-save < /home/habijabi/simsim.R ${cha}"
done
But with this code, only one or two of the jobs actually run, and I do not understand why not all of them run after submitting 150 jobs. The run file shows the following:
+ mkdir -p jobs
+ cd /home/habijabi
+ for cha in '{1..10}'
+ sbatch --time=24:00:00 --ntasks-per-node=12 --nodes=1 -p compute -o jobs/1_srun.txt '--wrap=R --no-save < /home/habijabi/simsim.R 1'
+ for cha in '{1..10}'
+ sbatch --time=24:00:00 --ntasks-per-node=12 --nodes=1 -p compute -o jobs/2_srun.txt '--wrap=R --no-save < /home/habijabi/simsim.R 2'
+ for cha in '{1..10}'
+ sbatch --time=24:00:00 --ntasks-per-node=12 --nodes=1 -p compute -o jobs/3_srun.txt '--wrap=R --no-save < /home/habijabi/simsim.R 3'
...so on...
and the .out file shows the following
Submitted batch job 146299
Submitted batch job 146300
Submitted batch job 146301
Submitted batch job 146302
Submitted batch job 146303
......
......
Both look fine. But only a few of the jobs run, and the majority fail with the following error:
/opt/ohpc/pub/libs/gnu8/R/4.1.2/lib64/R/bin/exec/R: error while loading shared libraries: libpcre2-8.so.0: cannot open shared object file: No such file or directory
I do not understand what I have done wrong; the failing jobs produce no output. I am new to this type of coding, and any help is appreciated.

Running Docker with infile bash syntax

I need to run R in a Docker container and want to feed it a script from a mounted volume using the standard infile notation. However, the file seems to be redirected to Docker, not to R.
I'm using the following command:
docker run -v /root/share:/share r-base:latest R --vanilla --quiet < /share/test.r
How can I use the infile notation and run my R in Docker? (I need the direct output from R, so Rscript will not do.)
As @mazel-tov mentioned, try:
docker run -v /root/share:/share r-base:latest /bin/bash -c 'R --vanilla --quiet < /share/test.r'
That way the redirection of stdin is done by the bash inside the container instead of by the shell that starts Docker.
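The same effect can be reproduced without Docker. In the sketch below, a temporary directory stands in for the container's /share volume and the file name only_inside.r is hypothetical; it only illustrates which shell opens the file, and is not a replacement for the docker command above.

```shell
#!/bin/bash
# A temporary directory stands in for the container's /share volume.
inner=$(mktemp -d)
printf '%s\n' 'print("hello")' > "$inner/only_inside.r"

# 1. Redirection written outside the quotes: the CALLING shell tries to open
#    only_inside.r in its own working directory, where it does not exist.
if { sh -c "cd '$inner' && cat" < only_inside.r; } 2>/dev/null; then
  outer="opened"
else
  outer="failed"
fi

# 2. Redirection inside the quoted command: the INNER shell opens the file
#    after it has changed into the directory that contains it.
inner_out=$(sh -c "cd '$inner' && cat < only_inside.r")

echo "outer redirection: $outer"
echo "inner redirection read: $inner_out"
```

In the Docker case the calling shell plays the role of case 1 and `/bin/bash -c '...'` plays the role of case 2.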

Where is my log output going?

I'm executing an R-script from within ruby using the following command:
country = "de"
system("cd ~/myrep/production && Rscript --vanilla main.r --country #{country} --environment production &> /home/myuser/dir/shared/log/log_#{country}.log &")
The log file is created, but I find no output in it, whereas if I execute the command directly in the shell like so:
cd ~/myrep/production && Rscript --vanilla main.r --country de --environment production
then I see lots of output in the terminal.
Update:
As suggested in the comments, I tried changing the redirect, but it didn't work. According to ps aux, the command actually being executed is:
/usr/lib/R/bin/exec/R --slave --no-restore --vanilla --file=main.r --args --country de --environment production
So without the output redirection to file...?
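Two general points may be relevant here, offered as a hedged sketch and not as a confirmed diagnosis. First, redirections are handled by the shell before the program starts, so they never appear in a ps listing. Second, `&>` is a bash extension; Ruby's system hands a command string to /bin/sh, and a POSIX sh such as dash reads `cmd &> file` as "run cmd in the background, then truncate file". The portable spelling is shown below with a stand-in command in place of Rscript.

```shell
#!/bin/bash
# Temporary log file for the demonstration.
log=$(mktemp)

# Portable form: send stdout to the log, then point stderr at the same place.
# The `sh -c '...'` command is a stand-in that writes to both streams.
sh -c 'echo "to stdout"; echo "to stderr" >&2' > "$log" 2>&1

cat "$log"
```

Applied to the command in the question, this would mean ending the string with `> /home/myuser/dir/shared/log/log_#{country}.log 2>&1 &`.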

Passing SLURM batch command line arguments to R

I'm trying to run a SLURM sbatch command with various parameters that I can read in an R script. With PBS, I used to write qsub -v param1=x,param2=y (plus other system parameters such as memory requirements, and the name of the script to be read by PBS) and then read the value in the R script with x = Sys.getenv('param1').
Now I tried:
sbatch run.sh --export=basePath='a'
With run.sh:
#!/bin/bash
cd $SLURM_SUBMIT_DIR
echo $PWD
module load R/common/3.3.3
R CMD BATCH --quiet --no-restore --no-save runDo.R output.txt
And runDo.R:
base.path = Sys.getenv('basePath')
print(base.path)
The script runs, but the argument value is not assigned to the base.path variable (it prints an empty string).
The --export option has to be passed to sbatch, not to the run.sh script; anything placed after the script name is handed to the script as its own arguments.
It should be like this:
sbatch --export=basePath='a' run.sh
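To see why the position matters, the sketch below imitates locally what each form hands to the job script. The helper show_inputs.sh is a hypothetical stand-in for run.sh, and no Slurm installation is needed to run it.

```shell
#!/bin/bash
# Hypothetical stand-in for run.sh: report what the job script receives.
cat > show_inputs.sh <<'EOF'
#!/bin/bash
echo "first argument: '${1:-}'"
echo "basePath env:   '${basePath:-}'"
EOF

# Equivalent of `sbatch run.sh --export=basePath=a`: the option lands in $1
# of the script and is never interpreted by the scheduler.
as_argument=$(bash show_inputs.sh --export=basePath=a)

# Equivalent of `sbatch --export=basePath=a run.sh`: the variable is in the
# job's environment, where Sys.getenv('basePath') can read it.
as_environment=$(basePath=a bash show_inputs.sh)

printf '%s\n\n%s\n' "$as_argument" "$as_environment"
```

One further point to check against your site's Slurm documentation: when `--export` lists variables without `ALL`, the rest of the submitting environment is not propagated, so `sbatch --export=ALL,basePath=a run.sh` may be the safer form if the job relies on other environment variables.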

for loop of subjobs using qsub R

I am trying to run subjobs (one for each chromosome) using R --vanilla. Since each chromosome is independent, I want them to run in parallel on the system. I have written the following script:
#!/bin/bash
for K in {20..21};
do
qsub -V -cwd -b y -q short.q R --vanilla --args arg1$K arg2$K arg3$K < RareMETALS.R > loggroup$K.txt; done
But somehow R opens interactively instead of running the script as expected. When I run the command itself:
R --vanilla --args arg1 arg2 arg3 < RareMETALS.R > loggroup.txt; done
it runs perfectly and calls the script.
Can somebody guide me or point out what the problem might be?
My take on this would be to pass the parameters by echoing the full command line to qsub instead of using the --args option. I find separating the script and the Grid Engine code to be more straightforward:
for K in {20..21};
do
echo "Rscript RareMETALS.R arg1$K arg2$K arg3$K > loggroup$K.txt" | qsub -V -cwd -q short.q
done
As others have commented, use Rscript.
The code seems cleaner to me, but there may be limitations to using echo as opposed to --args that I am unaware of.
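A likely reason the original loop misbehaved is that `< RareMETALS.R > loggroup$K.txt` was applied to qsub by the submitting shell and never reached R on the compute node. The sketch below is a dry-run-friendly variant of the echo approach; the function name job_command is hypothetical, and arg1/arg2/arg3 are the placeholder names from the question.

```shell
#!/bin/bash
# Build the job command for one chromosome.
job_command() {
  local k=$1
  echo "Rscript RareMETALS.R arg1$k arg2$k arg3$k > loggroup$k.txt"
}

for K in {20..21}; do
  cmd=$(job_command "$K")
  if command -v qsub >/dev/null 2>&1; then
    # qsub reads the job script from stdin, so the redirection inside $cmd
    # is interpreted on the compute node, not by this loop.
    echo "$cmd" | qsub -V -cwd -q short.q
  else
    # No scheduler on this machine: show what would be submitted.
    echo "[dry run] $cmd"
  fi
done
```

Printing the command first makes it easy to check the generated argument names and log file names before anything is submitted.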
