I used to submit my jobs, to be run in R sequentially, on 'PBS/Torque' in the following way.
The following is my R script, named simsim.R:
#########
set <- 1
#########
# Read in the iteration number
#########
# the next two lines pick up the argument passed in from the bash script
arg <- commandArgs()
arg
iteration <- as.numeric(arg[3])
iteration
setwd("/home/habijabi")
save(arg, iteration,
     file = paste0('simsim_RESULT_', set, iteration, '.RData'))
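For reference, commandArgs() called without arguments returns the whole invocation, which is why the script indexes the third element. A quick probe of that behaviour (a sketch; the exact first element depends on your R installation):
echo 'print(commandArgs())' > probe.R
R --no-save < probe.R 7
# R warns that the trailing argument is ignored, but commandArgs() still
# sees it, e.g.:
#   [1] "/usr/lib64/R/bin/exec/R" "--no-save" "7"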
Then I submitted the jobs with the following script:
#!/bin/bash
Chains=10
for cha in $(seq 1 $Chains)
do
    echo "Chains: " $cha
    sleep 1
    qsub -q long -l nodes=1:ppn=12,walltime=24:00:00 -v c=$cha ./diffv1.sh
done
In this 'diffv1.sh' I load the module and pass on the variable 'c':
#!/bin/bash
## input values: 'c' arrives as an environment variable via qsub -v
c=$c
# configure software
module load R/4.1.2
cd /home/habijabi
R --no-save < simsim.R $c
In this way I passed the '$c' value to my R script, and it produced 10 .RData files with the corresponding names.
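As a side note, -v delivers 'c' to the job as an environment variable, which is why diffv1.sh can read it as $c before forwarding it to R. A minimal check (hypothetical; Torque's qsub also accepts a job script on stdin):
echo 'echo "c arrived as: $c"' | qsub -q long -v c=3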
But then I had to switch to 'SLURM'. The following is the batch script I have been using:
#!/bin/bash
#SBATCH --job-name=R-test
# IO files
#SBATCH --error=R-test.%J.err
#SBATCH --output=R-test.%J.out
module load R/4.1.2
set -e -x
mkdir -p jobs
cd /home/habijabi
for cha in {1..10}
do
    sbatch --time=24:00:00 \
        --ntasks-per-node=12 \
        --nodes=1 \
        -p compute \
        -o jobs/${cha}_srun.txt \
        --wrap="R --no-save < /home/habijabi/simsim.R ${cha}"
done
But with this script only one or two of the jobs actually run, and I do not understand why, after submitting 150 jobs, it does not run all of them. The trace of the run shows the following:
+ mkdir -p jobs
+ cd /home/habijabi
+ for cha in '{1..10}'
+ sbatch --time=24:00:00 --ntasks-per-node=12 --nodes=1 -p compute -o jobs/1_srun.txt '--wrap=R --no-save < /home/habijabi/simsim.R 1'
+ for cha in '{1..10}'
+ sbatch --time=24:00:00 --ntasks-per-node=12 --nodes=1 -p compute -o jobs/2_srun.txt '--wrap=R --no-save < /home/habijabi/simsim.R 2'
+ for cha in '{1..10}'
+ sbatch --time=24:00:00 --ntasks-per-node=12 --nodes=1 -p compute -o jobs/3_srun.txt '--wrap=R --no-save < /home/habijabi/simsim.R 3'
...so on...
and the .out file shows the following
Submitted batch job 146299
Submitted batch job 146300
Submitted batch job 146301
Submitted batch job 146302
Submitted batch job 146303
......
......
Both of those look fine. But only a few of the jobs run; the majority fail with the following error:
/opt/ohpc/pub/libs/gnu8/R/4.1.2/lib64/R/bin/exec/R: error while loading shared libraries: libpcre2-8.so.0: cannot open shared object file: No such file or directory
I do not understand what I have done wrong; the failed jobs produce nothing. I am new to this type of coding, so any help is appreciated.
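Independent of that error (the libpcre2-8.so.0 message suggests the R module's shared libraries are not available on every compute node, which the cluster admins would need to confirm), the fan-out itself can be written as a single SLURM job array instead of a submit loop. A minimal sketch reusing the paths and module from the question; submit it once with sbatch, after making sure the jobs/ directory exists:
#!/bin/bash
#SBATCH --job-name=simsim
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH -p compute
#SBATCH --array=1-10
#SBATCH -o jobs/%a_srun.txt
module load R/4.1.2
cd /home/habijabi
# SLURM_ARRAY_TASK_ID takes the role of ${cha} from the loop
R --no-save < simsim.R ${SLURM_ARRAY_TASK_ID}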
Related
I have 2 R scripts, called script1.R and script2.R, that I want to run sequentially as a pipeline. To do so, I am writing a bash script that runs the 2 R scripts, passing inputs and arguments to each and collecting the output of each. Since there are 2 R scripts, there are 2 steps, as follows:
1- this is the command for the 1st R script:
Rscript script1.R /path/to/input /path/to/OutputDir argument1 argument2
2- this is the command for the 2nd R script:
Rscript script2.R /path/to/output_of_1st_script /path/to/OutputDir argument3 default_yes
To run these 2 R scripts I wrote the following bash script, but it does not return anything! Do you know how to fix it?
#!/bin/bash
set -e;
set -u;
OUTDIR1="./output1/";
OUTDIR2="./output2/";
INDIR="./input/";
argument1=$1;
argument2=$2;
argument3=$3;
mkdir ${OUTDIR1} || true;
#to run the 1st script
ls -1 ${INDIR}/*/inputfile.txt | sort -V | while read infile; do
b=$(basename $(dirname "$infile"));
./script1.R \
-v \
run \
-o "${of}" \
-f 0 \
argument1 \
argument2 \
"${b},${infile}" \
;
done
#to run the 2nd script; its input is the output from the previous script
ls -1 ${OUTDIR1}/*/output.txt | sort -V | while read outfile1; do
b=$(basename $(dirname "$outfile1"));
./script1.R \
-v \
run \
-o "${of}" \
-f 0 \
argument3 \
"${b},${outfile1}" \
;
done
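For what it's worth, the script above never defines ${of}, invokes script1.R in both loops, and does not use the Rscript interface stated at the top, which would explain the lack of output. A hedged rewrite against that interface (the per-directory layout is assumed from the ls patterns):
#!/bin/bash
set -eu
INDIR="./input"
OUTDIR1="./output1"
OUTDIR2="./output2"
argument1=$1
argument2=$2
argument3=$3
mkdir -p "${OUTDIR1}" "${OUTDIR2}"
# step 1: run script1.R on every input file
for infile in "${INDIR}"/*/inputfile.txt; do
    b=$(basename "$(dirname "$infile")")
    mkdir -p "${OUTDIR1}/${b}"
    Rscript script1.R "$infile" "${OUTDIR1}/${b}" "$argument1" "$argument2"
done
# step 2: feed script1.R's outputs to script2.R
for outfile1 in "${OUTDIR1}"/*/output.txt; do
    b=$(basename "$(dirname "$outfile1")")
    mkdir -p "${OUTDIR2}/${b}"
    Rscript script2.R "$outfile1" "${OUTDIR2}/${b}" "$argument3" default_yes
done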
I've got the following .sh file which can be run on a cluster computer using sbatch:
Shell.sh
#!/bin/bash
#
#SBATCH -p smp # partition (queue)
#SBATCH -N 2 # number of nodes
#SBATCH -n 2 # number of cores
#SBATCH --mem 2000 # memory pool for all cores
#SBATCH -t 5-0:00 # time (D-HH:MM)
#SBATCH -o out.out # STDOUT
#SBATCH -e err.err # STDERR
module load R
srun -N1 -n1 R CMD BATCH ./MyFile.R &
srun -N1 -n1 R CMD BATCH ./MyFile2.R &
wait
My problem is that MyFile.R and MyFile2.R look almost identical:
MyFile.R
source("Experiment.R")
Experiment(args1) # some arguments
MyFile2.R
source("Experiment.R")
Experiment(args2) # some arguments
In fact, I need to do this for about 100 files. Since they all source the same R file and then run the experiment with different arguments, I was wondering whether I could do this without creating a new file for each run. I want to run all processes in parallel, so I don't think I can simply put everything into one single R file.
My question is: is there some way to run the process directly from the shell, without having an R file for each run? Can I do something like
srun -N1 -n1 R CMD BATCH 'source("Experiment.R"); Experiment(args1)' &
srun -N1 -n1 R CMD BATCH 'source("Experiment.R"); Experiment(args2)' &
wait
instead of the last three lines in Shell.sh?
Your batch script should still include 2 lines that start 2 different R processes, but you can pass the arguments on the command line and reuse the same file name:
module load R
srun -N1 -n1 Rscript ./MyFile.R args1_1 args1_2 &
srun -N1 -n1 Rscript ./MyFile.R args2_1 args2_2 &
wait
Then within your R file:
source("Experiment.R")
#Get aruments from the command line
argv <- commandArgs(TRUE)
# Check if the command line is not empty and convert values if needed
if (length(argv) > 0){
nSim <- as.numeric( argv[1] )
meanVal <- as.numeric( argv[2] )
} else {
nSim=100 # some default values
meanVal =5
}
Experiment(nSim, meanVal) # some arguments
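Since the question mentions roughly 100 runs, the srun pattern above also extends to a loop over argument sets (a sketch; the argument list here is made up, and --exclusive keeps the parallel job steps on separate cores):
module load R
# hypothetical argument sets; in practice ~100 entries, perhaps read from a file
args_list=("100 5" "200 7" "300 9")
for a in "${args_list[@]}"; do
    # $a is deliberately left unquoted so its two values split into two arguments
    srun -N1 -n1 --exclusive Rscript ./MyFile.R $a &
done
wait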
If you prefer to use R command instead of Rscript, then your batch script should look like:
module load R
srun -N1 -n1 R -q --slave --vanilla --args args1_1 args1_2 < myFile.R &
srun -N1 -n1 R -q --slave --vanilla --args args2_1 args2_2 < myFile.R &
wait
Depending on your shell, you might (or might not) need quotes around the R -q --slave ... < myFile.R part.
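Another option, if you would rather not redirect a script file at all, is Rscript -e, which evaluates an expression directly (args1 and args2 stand for whatever Experiment() expects, as in the question):
module load R
srun -N1 -n1 Rscript -e 'source("Experiment.R"); Experiment(args1)' &
srun -N1 -n1 Rscript -e 'source("Experiment.R"); Experiment(args2)' &
wait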
I'm trying to run a SLURM sbatch command with various parameters that I can read in an R script. Under the PBS system, I used to write qsub -v param1=x,param2=y (plus other system parameters such as the memory requirements, and the script name to be read by PBS) and then read them in the R script with x = Sys.getenv('param1').
Now I tried
sbatch run.sh --export=basePath='a'
With run.sh:
#!/bin/bash
cd $SLURM_SUBMIT_DIR
echo $PWD
module load R/common/3.3.3
R CMD BATCH --quiet --no-restore --no-save runDo.R output.txt
And runDo.R:
base.path = Sys.getenv('basePath')
print(base.path)
The script runs but the argument value is not assigned to the base.path variable (it prints an empty string).
The --export parameter has to be passed to sbatch, not placed after the run.sh script.
It should be like this:
sbatch --export=basePath='a' run.sh
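This matters because sbatch stops parsing its own options at the script name; anything after run.sh is handed to the script instead. Also, naming a single variable in --export can stop the rest of the caller's environment from propagating, so if that matters, prepend ALL:
# ALL keeps the caller's full environment alongside basePath
sbatch --export=ALL,basePath='a' run.sh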
Could you please advise me on the following: I've written an R script that reads 3 arguments from the command line, i.e.:
args <- commandArgs(TRUE)
TUMOR <- args[1]
GERMLINE <- args[2]
CHR <- args[3]
When I submit the R script to a PBS HPC scheduler, I do the following (below), but I get an error message.
(I am not posting the error message, because the R script I wrote works fine when it is run from a regular terminal.)
May I ask how you usually submit R scripts with command-line arguments to PBS HPC schedulers?
qsub -d $PWD -l nodes=1:ppn=4 -l vmem=10gb -m bea -M tanasa#gmail.com \
-v TUMOR="tumor.bam",GERMLINE="germline.bam",CHR="chr22" \
-e script.efile.chr22 \
-o script.ofile.chr22 \
script.R
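One likely mismatch (a guess, since the error message is not shown): qsub executes the submitted file as a shell script, and -v variables arrive as environment variables rather than as commandArgs(). A hypothetical wrapper.sh bridging the two, submitted with the same qsub line but with script.R replaced by wrapper.sh:
#!/bin/bash
# wrapper.sh (hypothetical): turn the -v environment variables into
# positional arguments for the R script
Rscript script.R "$TUMOR" "$GERMLINE" "$CHR"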
I am copying some files, and the result can go either way, e.g.:
>cp -R bin/*.ksh ../backup/
>cp bin/file.sh ../backup/bin/
When I execute the above commands, the files get copied. There is no response from the system if the copy is successful; if not, the error is printed right in the terminal, e.g. cp: file.sh: No such file or directory.
Now I want to log the error message, or, if the copy is successful, log my own custom message to a file. How can I do this?
Any help is appreciated.
Thanks
Try writing this in a shell script:
# these three lines check whether the script is already running
# (adapted from a snippet found online)
ME=`basename "$0"`;
LCK="./${ME}.LCK";
exec 8>$LCK;
LOGFILE=~/mycp.log
if flock -n -x 8; then
    # 2>&1 redirects any error or other output to $LOGFILE
    cp -R bin/*.ksh ../backup/ >> $LOGFILE 2>&1
    # $? is a shell variable holding the exit status of the last command;
    # cp returns 0 if there was no error
    if [ $? -eq 0 ]; then
        echo 'copied successfully' >> $LOGFILE
    fi
fi
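A slightly simpler variant tests cp's exit status directly and also logs a message on failure, which the question asked for:
LOGFILE=~/mycp.log
if cp -R bin/*.ksh ../backup/ >> "$LOGFILE" 2>&1; then
    echo "copied successfully" >> "$LOGFILE"
else
    echo "copy failed, see the cp message above" >> "$LOGFILE"
fi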