Rscript: How to inject options for an R script [duplicate] - r

I've got a R script for which I'd like to be able to supply several command-line parameters (rather than hardcode parameter values in the code itself). The script runs on Windows.
I can't find info on how to read parameters supplied on the command-line into my R script. I'd be surprised if it can't be done, so maybe I'm just not using the best keywords in my Google search...
Any pointers or recommendations?

Dirk's answer here is everything you need. Here's a minimal reproducible example.
I made two files: exmpl.bat and exmpl.R.
exmpl.bat:
set R_Script="C:\Program Files\R-3.0.2\bin\RScript.exe"
%R_Script% exmpl.R 2010-01-28 example 100 > exmpl.batch 2>&1
Alternatively, using Rterm.exe:
set R_TERM="C:\Program Files\R-3.0.2\bin\i386\Rterm.exe"
%R_TERM% --no-restore --no-save --args 2010-01-28 example 100 < exmpl.R > exmpl.batch 2>&1
exmpl.R:
options(echo=TRUE) # if you want see commands in output file
args <- commandArgs(trailingOnly = TRUE)
print(args)
# trailingOnly=TRUE means that only your arguments are returned, check:
# print(commandArgs(trailingOnly=FALSE))
start_date <- as.Date(args[1])
name <- args[2]
n <- as.integer(args[3])
rm(args)
# Some computations:
x <- rnorm(n)
png(paste(name,".png",sep=""))
plot(start_date+(1L:n), x)
dev.off()
summary(x)
Save both files in the same directory and start exmpl.bat. In the result you'll get:
example.png with some plot
exmpl.batch with all that was done
You could also add an environment variable %R_Script%:
"C:\Program Files\R-3.0.2\bin\RScript.exe"
and use it in your batch scripts as %R_Script% <filename.r> <arguments>
Differences between RScript and Rterm:
Rscript has simpler syntax
Rscript automatically chooses architecture on x64 (see R Installation and Administration, 2.6 Sub-architectures for details)
Rscript needs options(echo=TRUE) in the .R file if you want to write the commands to the output file

A few points:
Command-line parameters are
accessible via commandArgs(), so
see help(commandArgs) for an
overview.
You can use Rscript.exe on all platforms, including Windows. It will support commandArgs(). littler could be ported to Windows but lives right now only on OS X and Linux.
There are two add-on packages on CRAN -- getopt and optparse -- which were both written for command-line parsing.
Edit in Nov 2015: New alternatives have appeared and I wholeheartedly recommend docopt.

Add this to the top of your script:
args<-commandArgs(TRUE)
Then you can refer to the arguments passed as args[1], args[2] etc.
Then run
Rscript myscript.R arg1 arg2 arg3
If your args are strings with spaces in them, enclose within double quotes.

Try library(getopt) ... if you want things to be nicer. For example:
spec <- matrix(c(
'in' , 'i', 1, "character", "file from fastq-stats -x (required)",
'gc' , 'g', 1, "character", "input gc content file (optional)",
'out' , 'o', 1, "character", "output filename (optional)",
'help' , 'h', 0, "logical", "this help"
),ncol=5,byrow=T)
opt = getopt(spec);
if (!is.null(opt$help) || is.null(opt$in)) {
cat(paste(getopt(spec, usage=T),"\n"));
q();
}

Since optparse has been mentioned a couple of times in the answers, and it provides a comprehensive kit for command line processing, here's a short simplified example of how you can use it, assuming the input file exists:
script.R:
library(optparse)
option_list <- list(
make_option(c("-n", "--count_lines"), action="store_true", default=FALSE,
help="Count the line numbers [default]"),
make_option(c("-f", "--factor"), type="integer", default=3,
help="Multiply output by this number [default %default]")
)
parser <- OptionParser(usage="%prog [options] file", option_list=option_list)
args <- parse_args(parser, positional_arguments = 1)
opt <- args$options
file <- args$args
if(opt$count_lines) {
print(paste(length(readLines(file)) * opt$factor))
}
Given an arbitrary file blah.txt with 23 lines.
On the command line:
Rscript script.R -h outputs
Usage: script.R [options] file
Options:
-n, --count_lines
Count the line numbers [default]
-f FACTOR, --factor=FACTOR
Multiply output by this number [default 3]
-h, --help
Show this help message and exit
Rscript script.R -n blah.txt outputs [1] "69"
Rscript script.R -n -f 5 blah.txt outputs [1] "115"

you need littler (pronounced 'little r')
Dirk will be by in about 15 minutes to elaborate ;)

In bash, you can construct a command line like the following:
$ z=10
$ echo $z
10
$ Rscript -e "args<-commandArgs(TRUE);x=args[1]:args[2];x;mean(x);sd(x)" 1 $z
[1] 1 2 3 4 5 6 7 8 9 10
[1] 5.5
[1] 3.027650
$
You can see that the variable $z is substituted by bash shell with "10" and this value is picked up by commandArgs and fed into args[2], and the range command x=1:10 executed by R successfully, etc etc.

FYI: there is a function args(), which retrieves the arguments of R functions, not to be confused with a vector of arguments named args

If you need to specify options with flags, (like -h, --help, --number=42, etc) you can use the R package optparse (inspired from Python):
http://cran.r-project.org/web/packages/optparse/vignettes/optparse.pdf.
At least this how I understand your question, because I found this post when looking for an equivalent of the bash getopt, or perl Getopt, or python argparse and optparse.

I just put together a nice data structure and chain of processing to generate this switching behaviour, no libraries needed. I'm sure it will have been implemented numerous times over, and came across this thread looking for examples - thought I'd chip in.
I didn't even particularly need flags (the only flag here is a debug mode, creating a variable which I check for as a condition of starting a downstream function if (!exists(debug.mode)) {...} else {print(variables)}). The flag checking lapply statements below produce the same as:
if ("--debug" %in% args) debug.mode <- T
if ("-h" %in% args || "--help" %in% args)
where args is the variable read in from command line arguments (a character vector, equivalent to c('--debug','--help') when you supply these on for instance)
It's reusable for any other flag and you avoid all the repetition, and no libraries so no dependencies:
args <- commandArgs(TRUE)
flag.details <- list(
"debug" = list(
def = "Print variables rather than executing function XYZ...",
flag = "--debug",
output = "debug.mode <- T"),
"help" = list(
def = "Display flag definitions",
flag = c("-h","--help"),
output = "cat(help.prompt)") )
flag.conditions <- lapply(flag.details, function(x) {
paste0(paste0('"',x$flag,'"'), sep = " %in% args", collapse = " || ")
})
flag.truth.table <- unlist(lapply(flag.conditions, function(x) {
if (eval(parse(text = x))) {
return(T)
} else return(F)
}))
help.prompts <- lapply(names(flag.truth.table), function(x){
# joins 2-space-separatated flags with a tab-space to the flag description
paste0(c(paste0(flag.details[x][[1]][['flag']], collapse=" "),
flag.details[x][[1]][['def']]), collapse="\t")
} )
help.prompt <- paste(c(unlist(help.prompts),''),collapse="\n\n")
# The following lines handle the flags, running the corresponding 'output' entry in flag.details for any supplied
flag.output <- unlist(lapply(names(flag.truth.table), function(x){
if (flag.truth.table[x]) return(flag.details[x][[1]][['output']])
}))
eval(parse(text = flag.output))
Note that in flag.details here the commands are stored as strings, then evaluated with eval(parse(text = '...')). Optparse is obviously desirable for any serious script, but minimal-functionality code is good too sometimes.
Sample output:
$ Rscript check_mail.Rscript --help
--debug Print variables rather than executing function XYZ...
-h --help Display flag definitions

Related

How do I run a bash script in a server with `for` and `R` code that I can exit the terminal and does not kill the process?

Context: I am running a simulation via R that each repetition takes too long and it is memory consuming. Therefore, I need that for every repetition another session in R starts, so that I will not run into memory issues.
My problem: After executing my bash script, I exit from the terminal, the current repetition finishes successfully, but the next one does not start (I am running it on a server via ssh).
What I have done:
compoisson.sh bash script:
#!/bin/bash
for rho in $(seq 1 3); do
for rep in $(seq 1 200); do
Rscript Teste.R $rho $rep
done
done
In the terminal (after entering via ssh user#domain...):
chmod +x compoisson.sh
sh compoisson.sh &
exit
My Teste.R script (the content is not important, it could be an empty file):
rm(list=ls())
library(TMB)
# library(TMB, lib.loc = "/home/est/bonat/nobackup/github")
model <- "04_compoisson_bi" #1st try
compile(paste0(model, ".cpp"), flags = "-O0 -ggdb")
dyn.load(dynlib(model))
## Data simulation -------------------------------------------------------------
nresp <- 2; beta1 <- log(7);beta2 <- log(1.5);nu1 <- .7
nu2 <- .7;n <- 50;s2_1 <- 0.3;s2_2 <- 0.15;true_rho <- 0
sample_size <- 1000
openmp(4)
args <- commandArgs(TRUE)
rhos <- c(-.5,0,.5)
true_rho <- rhos[abs(as.numeric(args[1]))]
j <- abs(as.numeric(args[2]))
seed <- 2109+j
res_neg <- simulacao(nresp, beta1, beta2, true_rho, s2_1, s2_2, seed, sample_size = sample_size, model, nu1=nu1, nu2=nu2, j = j) # 1 by time
saveRDS(res_neg, file = paste0(getwd(), "/Output/output_cmp_rho", true_rho, "n", sample_size, "j", j, ".rds"))
An important detail is that I need to run it on a external server via ssh.
I did a small test with an empty .R file on my PC, and I was able too see different processes being created via htop. On server, it did not happened.
I also tried nohup to run my compoisson.sh file (question1, question2), but I did not have any success. My test:
nohup compoisson.sh &
ignoring the entrance and attaching the output to 'nohup.out'
nohup: failt to execute the command 'compoisson.sh': File or directory does not exists.
What am I doing wrong?
Solved with nohup ./compoisson.sh & instead of sh compoisson.sh &

Can I import variables into R from a global file?

I am integrating an R script to produce some graphics into a larger project that is pulled together with a Makefile. In this larger project, I have a file called globals.mk that contains global variables used by many other scripts in the project. For example, the number of simulations I want to run is a global that I want to use in this R script. Can I "import" this as a variable, or is it necessary to manually define every variable within the R script?
Edit: here is a sample of the globals that I would need to read in.
num = 100
path = ./here/is/a/path
file = $(path)/file.csv
And I would like the R script to set the variables num as 100 (or "100"), path as "./here/is/a/path" and file as "./here/is/a/path/file.csv".
If it is ok to replace the parentheses with brace brackets then readRenviron will read in such files and perform the substitutions returning the contents as environmental variables.
# write out test file globals2.mk which uses brace brackets
Lines <- "num = 100
path = ./here/is/a/path
file = ${path}/file.csv"
cat(Lines, file = "globals2.mk")
readRenviron("globals2.mk")
Sys.getenv("num")
## [1] "100"
Sys.getenv("path")
## [1] "./here/is/a/path"
Sys.getenv("file")
## [1] "./here/is/a/path/file.csv"
If it is important to use parentheses rather than brace brackets, read in globals.mk, replace the parentheses with brace brackets and then write the file out again.
# write out test file - this one uses parentheses as in question
Lines <- "num = 100
path = ./here/is/a/path
file = $(path)/file.csv"
cat(Lines, file = "globals.mk")
# read globals.mk, perform () to {} substitutions, write out and then re-read
tmp <- tempfile()
L <- readLines("globals.mk")
cat(paste(chartr("()", "{}", L), collapse = "\n"), file = tmp)
readRenviron(tmp)
If the .mk file has anything other than direct variable expansion (such as more complex make-rules/tricks/functions), it might be better to trust make to do the expansion for you, and then read it in. There's a post here that I found that dumps all variable contents (after processing).
TL;DR
expand_mkvars <- function(path, aslist = FALSE) {
stopifnot(file.exists(mk <- Sys.which("make")))
tf <- tempfile(fileext = ".mk")
# needed on my windows system
tf <- normalizePath(tf, winslash = "/", mustWork = FALSE) # tempfile should suffice
on.exit(suppressWarnings(file.remove(tf)), add = TRUE)
writeLines(c(".PHONY: printvars",
"printvars:",
"\t#$(foreach V,$(sort $(.VARIABLES)), \\",
"\t $(if $(filter-out environment% default automatic, \\",
"\t $(origin $V)),$(warning $V=$($V))))"), con = tf)
out <- system2(mk, c("-f", shQuote(path), "-f", shQuote(tf), "-n", "printvars"),
stdout = TRUE, stderr = TRUE)
out <- out[grepl(paste0("^", tf), out)]
out <- gsub(paste0("^", tf, ":[0-9]+:\\s*"), "", out)
known_noneed <- c(".DEFAULT_GOAL", "CURDIR", "GNUMAKEFLAGS", "MAKEFILE_LIST", "MAKEFLAGS")
out <- out[!grepl(paste0("^(", paste(known_noneed, collapse = "|"), ")="), out)]
if (aslist) {
spl <- strsplit(out, "=")
nms <- sapply(spl, `[[`, 1)
rest <- lapply(spl, function(a) paste(a[-1], collapse = "="))
setNames(rest, nms)
} else out
}
In action:
expand_mkvars("~/StackOverflow/karthikt.mk")
# [1] "file=./here/is/a/path/file.csv" "num=100"
# [3] "path=./here/is/a/path"
expand_mkvars("~/StackOverflow/karthikt.mk", aslist = TRUE)
# $file
# [1] "./here/is/a/path/file.csv"
# $num
# [1] "100"
# $path
# [1] "./here/is/a/path"
I have not tested on other systems, so you might need to adjust known_noneed to add extra variables that popup. Depending on your needs, you might be able to filter more-intelligently (e.g., none of your variables lead with a capital letter), but for this example I kept it to the known-not-wanted variables that make is giving us.
The blog post suggests using a phony target of
.PHONY: printvars
printvars:
#$(foreach V,$(sort $(.VARIABLES)), \
$(if $(filter-out environment% default automatic, \
$(origin $V)),$(warning $V=$($V))))
(some are tabs, not all spaces, very important for make)
Unfortunately, it produces more output than you technically need:
$ /c/Rtools/bin/make.exe -f ~/StackOverflow/karthikt.mk printvars
C:/Users/r2/StackOverflow/karthikt.mk:10: .DEFAULT_GOAL=all
C:/Users/r2/StackOverflow/karthikt.mk:10: CURDIR=/Users/r2/Projects/Ford/shiny/shinyobjects/inst
C:/Users/r2/StackOverflow/karthikt.mk:10: GNUMAKEFLAGS=
C:/Users/r2/StackOverflow/karthikt.mk:10: MAKEFILE_LIST= C:/Users/r2/StackOverflow/karthikt.mk
C:/Users/r2/StackOverflow/karthikt.mk:10: MAKEFLAGS=
C:/Users/r2/StackOverflow/karthikt.mk:10: SHELL=sh
C:/Users/r2/StackOverflow/karthikt.mk:10: file=./here/is/a/path/file.csv
C:/Users/r2/StackOverflow/karthikt.mk:10: num=100
C:/Users/r2/StackOverflow/karthikt.mk:10: path=./here/is/a/path
make: Nothing to be done for 'printvars'.
so we need a little filtering, ergo the majority of code in the function.
Edit: it the readRenviron-to-envvar is the best way for you, it would not be difficult to redirect the output of this make call to another file, parse out the relevant lines, and then do readRenviron on that new file. It seems more indirect due to the use of two temp files, but they're cleaned up so that should be nothing to worry about.

R -- For loop using sprintf function in command system evaluation

I have a number of files that I would like to run using a batch file that will be executed from R using a for loop. As an example, let's assume that I have 2 runs that I would like to execute:
runs <- c(102, 103)
The syntax for the system command requires that the batch file be specified first, followed by the input data file for the run (102.txt and 103.txt) and the name of the output results file after the batch file has been executed (102.res and 103.res). I am attempting to run this using a for loop:
for (r in runs) {
cmd <- sprintf('C:/example1/test.bat %d.txt %d.res', runs, runs)[1]
print(eval(cmd))
command: system(cmd)
}
[1] "C:/example1/test.bat 102.txt 102.res"
Unfortunately, this only executes the first run (102) and does not advance to the next run (103). The R console displays the following warning:
Error in command:system(cmd) : NA/NaN argument
Thinking that this error is what is preventing R from advancing to the next run, I have attempted to use options(warn = -1) in the for loop:
for (r in runs) {
options(warn = -1)
cmd <- sprintf('C:/example1/test.bat %d.ctl %d.res', runs, runs)[1]
print(eval(cmd))
command: system(cmd)
options(warn = 0)
}
Unfortunately, this continues to throw the same error. For what it's worth, the output from my batch file (102.res) is exactly how I want it to be, I simply want to be able to bypass this error and continue on with the rest of my runs. Any thoughts on how best to do that?
Thanks in advance.
Here's what you had
runs <- c(102, 103)
for (r in runs) {
cmd <- sprintf('C:/example1/test.bat %d.txt %d.res', runs, runs)[1]
print(eval(cmd))
# command: system(cmd)
}
which outputs
[1] "C:/example1/test.bat 102.txt 102.res"
[1] "C:/example1/test.bat 102.txt 102.res"
try using the loop variable, r, instead of the array, runs, in the cmd <-... line
for (r in runs) {
cmd <- sprintf('C:/example1/test.bat %d.txt %d.res', r, r)[1] # <- change runs to r
print(eval(cmd))
# command: system(cmd)
}
output is
[1] "C:/example1/test.bat 102.txt 102.res"
[1] "C:/example1/test.bat 103.txt 103.res"

How to get true file creation date in R, on Unix systems?

I'm trying to get file creation dates from R and I understand that this information might not be possible to retrieve at all on some operating systems that just don't store it anywhere. However, I'm unsure how to retrieve it generically when it is (at least, theoretically) retrievable.
On Windows, this is straight forward because ctime from file.info provides this information, for reference, this is the relevant excerpt from ?file.info
What is meant by the three file times depends on the OS and file system. On Windows native file systems ctime is the file creation time (something which is not recorded on most Unix-alike file systems).
However, although most unix systems don't record this information (as pointed out in the help), some unix-based systems such as OS X do in fact store this. On OS X, for example, the system command metadata ls mdls will print file metadata and list kMDItemContentCreationDate (the actual creation date of the file) as one of the file attributes.
My question is, what advice do people have for getting at file creation dates (if they are available at all) from file metadata? (e.g. specifically in the case of OS X where there's a system command but no direct R call)
UPDATE:
Thanks to info from the comments + details on SO and SE here and here, I've come up with a way to solve this in R on OS X type unix platforms that track creation date and have the BSD style stat command. However, I still couldn't figure out how to do this in R on other linux systems that track creation date but don't have this version of stat. In this answer on unix SE, it is suggested that this info could be retrieved with debugfs + stat even when stat itself does not report it (provided the file system records birthdate), but that solution I couldn't get to work (only linux I could test on didn't have debugfs). Anyways, here's how far I got:
get_birthdate <- function(filepath) {
switch(Sys.info()[['sysname']],
Windows = {
# Windows
file.info(filepath)$ctime
},
Darwin = {
# OS X
cmd <- paste('stat -f "%DB"', filepath) # use BSD stat command
ctime_sec <- as.integer(system(cmd, intern=T)) # retrieve birth date in seconds from start of epoch (%DB)
as.POSIXct(ctime_sec, origin = "1970-01-01", tz = "") # convert to POSIXct
},
Linux = {
# Linux
stop("not sure how to do this")
})
}
Following other's pointers, this should work quite reasonably.
Unfortunately it needs root privileges (dued to debugfs) and it's not very efficient yet (especially a bit quick'n dirty on regular expressions, but it's 01:00 o clock in the morning here :) ).
BTW, we set up the pager to be cat (making debugfs to print on standard output), find in which device the file is stored in order to use debugfs properly and finally get the stats and elaborate it a bit.
In general, in UNIX, once you have a bash-command to read its output in R you have to use pipe in read mode(that is default) and readLines.
Test done in a Debian Gnu Linux.
np350v5c:/home/l# R
> my.file <- "/etc/network/interfaces"
>
> setup_pager <- function() {system("export PAGER=cat")}
>
> where_is <- function(file) {
con <- pipe(sprintf("df %s", file))
res <- strsplit(readLines(con)[2], " ")[[1]][1]
close(con)
res
}
>
> where_is(my.file) # could be /dev/sda1 as well, depending on /etc/fstab
[1] "/dev/disk/by-uuid/9ce40c2b-60d8-40b1-890f-1e5da4199c88"
>
> my.command <- sprintf("debugfs -R 'stat %s' %s",
my.file,
where_is(my.file))
>
> ## root privileges especially here ..
> setup_pager()
> con <- pipe(my.command)
> debugfs <- readLines(con)
debugfs 1.42.9 (4-Feb-2014)
> close(con)
>
> my.date <- gsub("^crtime:.+-- ", "", grep("^crtime", debugfs, value = TRUE))
> my.date
[1] "Tue Feb 19 00:07:21 2013"
> strptime(tolower(substr(my.date, 5, nchar(my.date))),
format = "%b %d %H:%M:%S %Y")
[1] "2013-02-19 00:07:21 CET"
HTH, Luca
I know I am a little late to the game here, but here is a pretty easy solution for unix/Mac OS:
file.name <- "~/dir/file.extension"
df$file_created_dt <- system(paste0("stat -f %SB ", file.name), intern = T)
And then you can format it however you like:
df$file_created_dt <- as.POSIXct(df$file_created_dt, format = "%b %d %H:%M:%S %Y", origin = "1970-01-01 00:00:00", tz = "your/timezone")

How to get the queue number from CONDOR into your R job

I think I have a simple problem because I was looking up and down the internet and couldn't find someone else asking this question:
My university has a Condor set-up. I want to run several repetitions of the same code (e.g. 100 times). My R code has a routine to store the results in a file, i.e.:
write.csv(res, file=paste(paste(paste(format(Sys.time(), '%y%m%d'),'res', queue, sep="_"), sep='/'),'.csv',sep='',collapse=''))
res are my results (a data.frame), I indicate that this file contains the results with 'res' and finally I want to add the queue number of this calculation (otherwise files would be replaced, wouldn't they?). It should look like: 140109_res_1.csv, 140109_res_2.csv, ...
My submit file to condor looks like this:
universe = vanilla
executable = /usr/bin/R
arguments = --vanilla
log = testR.log
error = testR.err
input = run_condor.r
output = testR$(Process).txt
requirements = (opsys == "LINUX") && (arch == "X86_64") && (HAS_R_2_13 =?= True)
request_memory = 1000
should_transfer_files = YES
transfer_executable = FALSE
when_to_transfer_output = ON_EXIT
queue 3
I wonder how do I get the 'queue' number into my R code? I tried a simple example with
print(queue)
print(Queue)
But there is no object found called queue or Queue. Any suggestions?
Best wishes,
Marco
Okay, I solved the problem. This is how it goes:
I had to change my submit file. I changed the slot arguments to:
arguments = --vanilla --args $(Process)
Now the process number is forwarded to the R code. There you retrieve it with the following line. The value will be stored as a character. Therefore, you should convert it to a numeric value (also check whether a number like 10 is passed on as '1' and '0' in which case you should also collapse the values).
run <- commandArgs(TRUE)
Here is an example of the code I let run.
> run <- commandArgs(TRUE)
> run
[1] "0"
> class(run)
[1] "character"
> try(as.numeric(run))
[1] 0
> try(run <- as.numeric(paste(run, collapse='')) )
> try(print(run))
[1] 0
> try(write(run, paste(run,'csv', sep='.')))
You can also find information how to pass on variables/arguments to your code here: http://research.cs.wisc.edu/htcondor/manual/v7.6/condor_submit.html
I hope this helps anyone.
Cheers and thanks for all other commenters!
Marco

Resources