For loop using sprintf in a system() command evaluation in R

I have a number of files that I would like to run using a batch file that will be executed from R using a for loop. As an example, let's assume that I have 2 runs that I would like to execute:
runs <- c(102, 103)
The syntax for the system command requires that the batch file be specified first, followed by the input data file for the run (102.txt and 103.txt) and the name of the output results file after the batch file has been executed (102.res and 103.res). I am attempting to run this using a for loop:
for (r in runs) {
  cmd <- sprintf('C:/example1/test.bat %d.txt %d.res', runs, runs)[1]
  print(eval(cmd))
  command: system(cmd)
}
[1] "C:/example1/test.bat 102.txt 102.res"
Unfortunately, this only executes the first run (102) and does not advance to the next run (103). The R console displays the following error:
Error in command:system(cmd) : NA/NaN argument
Thinking that this error is what is preventing R from advancing to the next run, I have attempted to use options(warn = -1) in the for loop:
for (r in runs) {
  options(warn = -1)
  cmd <- sprintf('C:/example1/test.bat %d.ctl %d.res', runs, runs)[1]
  print(eval(cmd))
  command: system(cmd)
  options(warn = 0)
}
Unfortunately, this continues to throw the same error. For what it's worth, the output from my batch file (102.res) is exactly what I want; I simply want to bypass this error and continue with the rest of my runs. Any thoughts on how best to do that?
Thanks in advance.

Here's what you had:
runs <- c(102, 103)
for (r in runs) {
  cmd <- sprintf('C:/example1/test.bat %d.txt %d.res', runs, runs)[1]
  print(eval(cmd))
  # command: system(cmd)
}
which outputs
[1] "C:/example1/test.bat 102.txt 102.res"
[1] "C:/example1/test.bat 102.txt 102.res"
Try using the loop variable, r, instead of the whole vector, runs, in the cmd <- ... line:
for (r in runs) {
  cmd <- sprintf('C:/example1/test.bat %d.txt %d.res', r, r)[1] # <- change runs to r
  print(eval(cmd))
  # command: system(cmd)
}
output is
[1] "C:/example1/test.bat 102.txt 102.res"
[1] "C:/example1/test.bat 103.txt 103.res"
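For completeness, here is a sketch of the corrected loop with the system() call restored. Two notes: eval(cmd) is a no-op on a character string, so print(cmd) is enough; and the `command:` label is not valid R syntax - R parses `command: system(cmd)` as the sequence operator `:`, which is what produced the "NA/NaN argument" error. The file.exists() guard is my addition, so the sketch also runs on machines that don't have the batch file.

```r
runs <- c(102, 103)
cmds <- character(0)
for (r in runs) {
  cmd <- sprintf('C:/example1/test.bat %d.txt %d.res', r, r)
  print(cmd)                                 # eval() not needed: cmd is already a string
  cmds <- c(cmds, cmd)
  if (file.exists('C:/example1/test.bat')) {
    system(cmd)                              # plain system(cmd), no "command:" label
  }
}
```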

Related

How do I run a bash script with `for` and `R` code on a server so that I can exit the terminal without killing the process?

Context: I am running a simulation via R in which each repetition takes a long time and consumes a lot of memory. Therefore, I need each repetition to start in a new R session, so that I do not run into memory issues.
My problem: after executing my bash script I exit the terminal; the current repetition finishes successfully, but the next one does not start (I am running this on a server via ssh).
What I have done:
compoisson.sh bash script:
#!/bin/bash
for rho in $(seq 1 3); do
  for rep in $(seq 1 200); do
    Rscript Teste.R $rho $rep
  done
done
In the terminal (after logging in via ssh user@domain...):
chmod +x compoisson.sh
sh compoisson.sh &
exit
My Teste.R script (the content is not important; it could be an empty file):
rm(list=ls())
library(TMB)
# library(TMB, lib.loc = "/home/est/bonat/nobackup/github")
model <- "04_compoisson_bi" #1st try
compile(paste0(model, ".cpp"), flags = "-O0 -ggdb")
dyn.load(dynlib(model))
## Data simulation -------------------------------------------------------------
nresp <- 2; beta1 <- log(7); beta2 <- log(1.5); nu1 <- .7
nu2 <- .7; n <- 50; s2_1 <- 0.3; s2_2 <- 0.15; true_rho <- 0
sample_size <- 1000
openmp(4)
args <- commandArgs(TRUE)
rhos <- c(-.5,0,.5)
true_rho <- rhos[abs(as.numeric(args[1]))]
j <- abs(as.numeric(args[2]))
seed <- 2109+j
res_neg <- simulacao(nresp, beta1, beta2, true_rho, s2_1, s2_2, seed, sample_size = sample_size, model, nu1=nu1, nu2=nu2, j = j) # 1 by time
saveRDS(res_neg, file = paste0(getwd(), "/Output/output_cmp_rho", true_rho, "n", sample_size, "j", j, ".rds"))
An important detail is that I need to run it on a external server via ssh.
I did a small test with an empty .R file on my PC, and I was able to see the different processes being created via htop. On the server, this did not happen.
I also tried nohup to run my compoisson.sh file (question1, question2), but without success. My test:
nohup compoisson.sh &
nohup: ignoring input and appending output to 'nohup.out'
nohup: failed to run command 'compoisson.sh': No such file or directory
What am I doing wrong?
Solved with nohup ./compoisson.sh & instead of sh compoisson.sh & (nohup looks the command up on the PATH, so the script has to be given as an explicit path and be executable, hence the earlier chmod +x).

How to save everything from Console to a text file while executing an R script using "remoteScript" of Microsoft R?

I'm curious to know if there is a way to save everything from the R console to a text file when an R script is submitted and executed "remotely" from Microsoft R client to Microsoft R server using the command remoteScript() or remoteExecute().
For example, consider a simple R script 'test.R' as follows:
set.seed(123)
x <- rnorm(100)
mean(x)
I know, to save all from the console, it can be executed in a Local R session as follows:
Local R session:
# Create a text file to save all from console
logfile <- file("C:/.../MyLog1.txt")
sink(logfile, append = TRUE)
sink(logfile, append = TRUE, type = "message")
# Execute in the local R session
source("C:/.../test.R", echo = TRUE, max.deparse.length = 1000000)
sink()
sink(type = "message")
The log file 'MyLog1.txt', as expected, has everything from the console:
> set.seed(123)
> x <- rnorm(100)
> mean(x)
[1] 0.09040591
Similarly, the same script can be executed with remoteScript() after connecting to the R server.
Remote R session:
# Connect to the server and create a remote session
remoteLogin(...)
# Go back to the local R session
pause()
# Create a text file to save all from console
logfile <- file("C:/.../MyLog2.txt")
sink(logfile, append = TRUE)
sink(logfile, append = TRUE, type = "message")
# Execute in the remote R session
remoteScript("C:/.../test.R")
sink()
sink(type = "message")
But the log file 'MyLog2.txt' looks different, as shown below:
[1] 0.09040591
$success
[1] TRUE
$errorMessage
[1] ""
$outputParameters
list()
$consoleOutput
[1] "[1] 0.09040591\r\n"
$changedFiles
list()
$backgroundUpdate
[1] 0
It has only the output and some additional information. The lines of code with the '>' prompt were not printed as in 'MyLog1.txt'. That may be because there is no option like echo=TRUE for remoteScript().
Can anybody help me out with an alternative?
Thanks.
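One workaround (a sketch, not tested against a real Microsoft R Server) follows from the output shown above: remoteScript() evidently returns a list whose $consoleOutput field holds the remote console text, so you can capture that return value and write the field to the log yourself instead of relying on sink(). Here res is a stub with the same shape as the list in the question, standing in for res <- remoteScript("C:/.../test.R"):

```r
# Stub standing in for: res <- remoteScript("C:/.../test.R")
res <- list(consoleOutput = "[1] 0.09040591\r\n")

logfile <- tempfile(fileext = ".txt")   # in real use: "C:/.../MyLog2.txt"
writeLines(res$consoleOutput, logfile)
readLines(logfile, warn = FALSE)[1]
```

This recovers the remote output, though not the echoed '>' lines; those would still require an echo-style option on the server side.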

Rscript: How to inject options for an R script [duplicate]

I've got a R script for which I'd like to be able to supply several command-line parameters (rather than hardcode parameter values in the code itself). The script runs on Windows.
I can't find info on how to read parameters supplied on the command-line into my R script. I'd be surprised if it can't be done, so maybe I'm just not using the best keywords in my Google search...
Any pointers or recommendations?
Dirk's answer here is everything you need. Here's a minimal reproducible example.
I made two files: exmpl.bat and exmpl.R.
exmpl.bat:
set R_Script="C:\Program Files\R-3.0.2\bin\RScript.exe"
%R_Script% exmpl.R 2010-01-28 example 100 > exmpl.batch 2>&1
Alternatively, using Rterm.exe:
set R_TERM="C:\Program Files\R-3.0.2\bin\i386\Rterm.exe"
%R_TERM% --no-restore --no-save --args 2010-01-28 example 100 < exmpl.R > exmpl.batch 2>&1
exmpl.R:
options(echo=TRUE) # if you want to see commands in the output file
args <- commandArgs(trailingOnly = TRUE)
print(args)
# trailingOnly=TRUE means that only your arguments are returned, check:
# print(commandArgs(trailingOnly=FALSE))
start_date <- as.Date(args[1])
name <- args[2]
n <- as.integer(args[3])
rm(args)
# Some computations:
x <- rnorm(n)
png(paste(name,".png",sep=""))
plot(start_date+(1L:n), x)
dev.off()
summary(x)
Save both files in the same directory and start exmpl.bat. As a result you'll get:
example.png with some plot
exmpl.batch with all that was done
You could also add an environment variable %R_Script%:
"C:\Program Files\R-3.0.2\bin\RScript.exe"
and use it in your batch scripts as %R_Script% <filename.r> <arguments>
Differences between RScript and Rterm:
Rscript has simpler syntax
Rscript automatically chooses architecture on x64 (see R Installation and Administration, 2.6 Sub-architectures for details)
Rscript needs options(echo=TRUE) in the .R file if you want to write the commands to the output file
A few points:
Command-line parameters are accessible via commandArgs(), so see help(commandArgs) for an overview.
You can use Rscript.exe on all platforms, including Windows. It will support commandArgs(). littler could be ported to Windows but lives right now only on OS X and Linux.
There are two add-on packages on CRAN -- getopt and optparse -- which were both written for command-line parsing.
Edit in Nov 2015: New alternatives have appeared and I wholeheartedly recommend docopt.
Add this to the top of your script:
args<-commandArgs(TRUE)
Then you can refer to the arguments passed as args[1], args[2] etc.
Then run
Rscript myscript.R arg1 arg2 arg3
If your args are strings with spaces in them, enclose within double quotes.
Try library(getopt) ... if you want things to be nicer. For example:
spec <- matrix(c(
  'in'  , 'i', 1, "character", "file from fastq-stats -x (required)",
  'gc'  , 'g', 1, "character", "input gc content file (optional)",
  'out' , 'o', 1, "character", "output filename (optional)",
  'help', 'h', 0, "logical",   "this help"
), ncol = 5, byrow = TRUE)
opt <- getopt(spec)
if (!is.null(opt$help) || is.null(opt$`in`)) {
  cat(getopt(spec, usage = TRUE), "\n")
  q()
}
(Note that in is a reserved word in R, so the 'in' option has to be read back as opt$`in`, with backticks.)
Since optparse has been mentioned a couple of times in the answers, and it provides a comprehensive kit for command line processing, here's a short simplified example of how you can use it, assuming the input file exists:
script.R:
library(optparse)
option_list <- list(
  make_option(c("-n", "--count_lines"), action = "store_true", default = FALSE,
              help = "Count the line numbers [default]"),
  make_option(c("-f", "--factor"), type = "integer", default = 3,
              help = "Multiply output by this number [default %default]")
)
parser <- OptionParser(usage = "%prog [options] file", option_list = option_list)
args <- parse_args(parser, positional_arguments = 1)
opt <- args$options
file <- args$args
if (opt$count_lines) {
  print(paste(length(readLines(file)) * opt$factor))
}
Given an arbitrary file blah.txt with 23 lines.
On the command line:
Rscript script.R -h outputs
Usage: script.R [options] file
Options:
-n, --count_lines
Count the line numbers [default]
-f FACTOR, --factor=FACTOR
Multiply output by this number [default 3]
-h, --help
Show this help message and exit
Rscript script.R -n blah.txt outputs [1] "69"
Rscript script.R -n -f 5 blah.txt outputs [1] "115"
you need littler (pronounced 'little r')
Dirk will be by in about 15 minutes to elaborate ;)
In bash, you can construct a command line like the following:
$ z=10
$ echo $z
10
$ Rscript -e "args<-commandArgs(TRUE);x=args[1]:args[2];x;mean(x);sd(x)" 1 $z
[1] 1 2 3 4 5 6 7 8 9 10
[1] 5.5
[1] 3.027650
$
You can see that the variable $z is substituted by the bash shell with "10", that this value is picked up by commandArgs and fed into args[2], and that the range command x=1:10 is then executed by R successfully, and so on.
FYI: there is a function args(), which retrieves the arguments of R functions, not to be confused with a vector of arguments named args
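The coercion step can also be made explicit inside R. This sketch hard-codes the strings that commandArgs(TRUE) would return for the call above, in place of a real command line:

```r
args <- c("1", "10")   # what commandArgs(TRUE) returns for: Rscript ... 1 10
x <- as.numeric(args[1]):as.numeric(args[2])
mean(x)   # 5.5
sd(x)     # 3.02765
```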
If you need to specify options with flags, (like -h, --help, --number=42, etc) you can use the R package optparse (inspired from Python):
http://cran.r-project.org/web/packages/optparse/vignettes/optparse.pdf.
At least this how I understand your question, because I found this post when looking for an equivalent of the bash getopt, or perl Getopt, or python argparse and optparse.
I just put together a nice data structure and chain of processing to generate this switching behaviour; no libraries needed. I'm sure it has been implemented numerous times over, and I came across this thread looking for examples - thought I'd chip in.
I didn't even particularly need flags (the only flag here is a debug mode, which creates a variable that I check for as a condition for starting a downstream function: if (!exists("debug.mode")) {...} else {print(variables)}). The flag-checking lapply statements below produce the same as:
if ("--debug" %in% args) debug.mode <- T
if ("-h" %in% args || "--help" %in% args)
where args is the variable read in from the command-line arguments (a character vector, equivalent to c('--debug','--help') when you supply these on the command line, for instance).
It's reusable for any other flag and you avoid all the repetition, and no libraries so no dependencies:
args <- commandArgs(TRUE)

flag.details <- list(
  "debug" = list(
    def = "Print variables rather than executing function XYZ...",
    flag = "--debug",
    output = "debug.mode <- T"),
  "help" = list(
    def = "Display flag definitions",
    flag = c("-h", "--help"),
    output = "cat(help.prompt)"))

flag.conditions <- lapply(flag.details, function(x) {
  paste0(paste0('"', x$flag, '"'), sep = " %in% args", collapse = " || ")
})
flag.truth.table <- unlist(lapply(flag.conditions, function(x) {
  if (eval(parse(text = x))) {
    return(T)
  } else return(F)
}))
help.prompts <- lapply(names(flag.truth.table), function(x) {
  # joins the space-separated flags with a tab to the flag description
  paste0(c(paste0(flag.details[x][[1]][['flag']], collapse = " "),
           flag.details[x][[1]][['def']]), collapse = "\t")
})
help.prompt <- paste(c(unlist(help.prompts), ''), collapse = "\n\n")
# The following lines handle the flags, running the corresponding
# 'output' entry in flag.details for any flag supplied
flag.output <- unlist(lapply(names(flag.truth.table), function(x) {
  if (flag.truth.table[x]) return(flag.details[x][[1]][['output']])
}))
eval(parse(text = flag.output))
Note that in flag.details here the commands are stored as strings, then evaluated with eval(parse(text = '...')). optparse is obviously desirable for any serious script, but minimal-functionality code is good too sometimes.
Sample output:
$ Rscript check_mail.Rscript --help
--debug Print variables rather than executing function XYZ...
-h --help Display flag definitions

Set Niceness of PSOCK cluster in R

I would like to increase the niceness of my cluster jobs. The following code is successful:
> cl <- makePSOCKcluster(rep('localhost', 2))
> clusterEvalQ(cl = cl, rnorm(3))
[[1]]
[1] -0.6452848 -0.9899609 0.3083131
[[2]]
[1] 1.1687733 -0.1930413 1.1576510
This, however, is not.
cl <- makePSOCKcluster(names = rep('localhost', 8), renice = 15)
nice: +15: No such file or directory
I am able to set the niceness after cluster creation using the following code:
clusterEvalQ(cl = cl, tools::psnice(value = 15))
After reading the documentation for makePSOCKcluster, I'm not sure what I'm doing wrong in the cluster creation step, and have been unable to track down the issue. How can I create a cluster and set the niceness of the worker threads all at the same time?
I consider this to be a bug in the "parallel" package. When you use the makePSOCKcluster "renice" option, it uses a form of the "nice" command that doesn't work with bash, but I believe works with csh/tcsh. You can see the generated command by using the "manual=TRUE" option:
> library(parallel)
> cl <- makePSOCKcluster(2, renice=15, manual=TRUE)
Manually start worker on localhost with
nice +15 '/home/sw/R/sources/R-3.3.0/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11379 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE
If you try to execute this from bash, you will get the same error message that you reported. The syntax for bash should be:
$ nice -n 15 '/home/sw/R/sources/R-3.3.0/bin/Rscript' ...
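Until that is fixed, one workaround (a sketch; the helper name make_nice_cluster is mine) is to skip the renice argument entirely and renice the workers immediately after the cluster comes up, as the question already hints:

```r
library(parallel)

# Create a PSOCK cluster and renice its workers right away, avoiding
# makePSOCKcluster()'s broken renice option.
make_nice_cluster <- function(nworkers, niceness = 15) {
  cl <- makePSOCKcluster(rep("localhost", nworkers))
  # clusterCall() passes `niceness` to each worker explicitly;
  # clusterEvalQ() would not see this local variable
  clusterCall(cl, function(n) tools::psnice(value = n), niceness)
  cl
}

## usage (spawns two local workers):
## cl <- make_nice_cluster(2)
## clusterEvalQ(cl, tools::psnice())   # query each worker's niceness
## stopCluster(cl)
```

Keep in mind that an unprivileged process can only raise its niceness, not lower it, so values below the current niceness will be refused by the OS.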

How to get the queue number from CONDOR into your R job

I think I have a simple problem, but I have searched the internet up and down and couldn't find anyone else asking this question:
My university has a Condor set-up. I want to run several repetitions of the same code (e.g. 100 times). My R code has a routine to store the results in a file, i.e.:
write.csv(res, file=paste(paste(paste(format(Sys.time(), '%y%m%d'),'res', queue, sep="_"), sep='/'),'.csv',sep='',collapse=''))
res are my results (a data.frame), I indicate that this file contains the results with 'res' and finally I want to add the queue number of this calculation (otherwise files would be replaced, wouldn't they?). It should look like: 140109_res_1.csv, 140109_res_2.csv, ...
My submit file to condor looks like this:
universe = vanilla
executable = /usr/bin/R
arguments = --vanilla
log = testR.log
error = testR.err
input = run_condor.r
output = testR$(Process).txt
requirements = (opsys == "LINUX") && (arch == "X86_64") && (HAS_R_2_13 =?= True)
request_memory = 1000
should_transfer_files = YES
transfer_executable = FALSE
when_to_transfer_output = ON_EXIT
queue 3
I wonder how I get the 'queue' number into my R code. I tried a simple example with
print(queue)
print(Queue)
But there is no object found called queue or Queue. Any suggestions?
Best wishes,
Marco
Okay, I solved the problem. This is how it goes:
I had to change my submit file. I changed the arguments line to:
arguments = --vanilla --args $(Process)
Now the process number is forwarded to the R code. There you retrieve it with the following line. The value is stored as a character, so you should convert it to a numeric value (and also check whether a number like 10 is passed on as '1' and '0', in which case you should collapse the values first).
run <- commandArgs(TRUE)
Here is an example of the code I let run.
> run <- commandArgs(TRUE)
> run
[1] "0"
> class(run)
[1] "character"
> try(as.numeric(run))
[1] 0
> try(run <- as.numeric(paste(run, collapse='')) )
> try(print(run))
[1] 0
> try(write(run, paste(run,'csv', sep='.')))
You can also find information how to pass on variables/arguments to your code here: http://research.cs.wisc.edu/htcondor/manual/v7.6/condor_submit.html
I hope this helps someone.
Cheers and thanks for all other commenters!
Marco
