Passing command line arguments to R CMD BATCH - r

I have been using R CMD BATCH my_script.R from a terminal to execute an R script. I am now at the point where I would like to pass an argument to the command, but am having some issues getting it working. If I do R CMD BATCH my_script.R blabla then blabla becomes the output file, rather than being interpreted as an argument available to the R script being executed.
I have tried Rscript my_script.R blabla which seems to pass on blabla correctly as an argument, but then I don't get the my_script.Rout output file that I get with R CMD BATCH (I want the .Rout file). While I could redirect the output of a call to Rscript to a file name of my choosing, I would not be getting the R input commands included in the file in the way R CMD BATCH does in the .Rout file.
So, ideally, I'm after a way to pass arguments to an R script being executed via the R CMD BATCH method, though would be happy with an approach using Rscript if there is a way to make it produce a comparable .Rout file.

My impression is that R CMD BATCH is a bit of a relic. In any case, the more recent Rscript executable (available on all platforms), together with commandArgs(), makes processing command line arguments pretty easy.
As an example, here is a little script -- call it "myScript.R":
## myScript.R
args <- commandArgs(trailingOnly = TRUE)
rnorm(n=as.numeric(args[1]), mean=as.numeric(args[2]))
And here is what invoking it from the command line looks like:
> Rscript myScript.R 5 100
[1] 98.46435 100.04626 99.44937 98.52910 100.78853
Edit:
Not that I'd recommend it, but ... using a combination of source() and sink(), you could get Rscript to produce an .Rout file like that produced by R CMD BATCH. One way would be to create a little R script -- call it RscriptEcho.R -- which you call directly with Rscript. It might look like this:
## RscriptEcho.R
args <- commandArgs(TRUE)
srcFile <- args[1]
outFile <- paste0(make.names(date()), ".Rout")
args <- args[-1]
sink(outFile, split = TRUE)
source(srcFile, echo = TRUE)
To execute your actual script, you would then do:
Rscript RscriptEcho.R myScript.R 5 100
[1] 98.46435 100.04626 99.44937 98.52910 100.78853
which will execute myScript.R with the supplied arguments and sink interleaved input, output, and messages to a uniquely named .Rout.
Edit2:
You can run Rscript verbosely and place the verbose output in a file.
Rscript --verbose myScript.R 5 100 > myScript.Rout

After trying the options described here, I found this post from Forester on r-bloggers. I think it is a clean option to consider.
I reproduce his code here:
From command line
$ R CMD BATCH --no-save --no-restore '--args a=1 b=c(2,5,6)' test.R test.out &
test.R
## First read in the arguments listed at the command line
args <- commandArgs(TRUE)
## args is now a character vector of the expressions given after --args
## First check to see if arguments were passed.
## Then cycle through each element and evaluate the expressions.
if (length(args) == 0) {
  print("No arguments supplied.")
  ## supply default values
  a <- 1
  b <- c(1, 1, 1)
} else {
  for (i in 1:length(args)) {
    eval(parse(text = args[[i]]))
  }
}
print(a * 2)
print(b * 3)
In test.out
> print(a*2)
[1] 2
> print(b*3)
[1] 6 15 18
Thanks to Forester!
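The core trick in Forester's script -- evaluating each name=value argument as R code -- can be checked without any command line at all, by standing in for commandArgs(TRUE) with a plain character vector (a sketch; the argument strings mirror the '--args a=1 b=c(2,5,6)' example above):

```r
## Simulate the arguments that '--args a=1 b=c(2,5,6)' would produce.
## This vector stands in for commandArgs(TRUE).
args <- c("a=1", "b=c(2,5,6)")
for (i in seq_along(args)) {
  eval(parse(text = args[[i]]))  # each string becomes an assignment
}
print(a * 2)  # [1] 2
print(b * 3)  # [1]  6 15 18
```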

You need to put arguments before my_script.R and prefix them with -, e.g.
R CMD BATCH -blabla my_script.R
commandArgs() will receive -blabla as a character string in this case. See the help for details:
$ R CMD BATCH --help
Usage: R CMD BATCH [options] infile [outfile]
Run R non-interactively with input from infile and place output (stdout
and stderr) to another file. If not given, the name of the output file
is the one of the input file, with a possible '.R' extension stripped,
and '.Rout' appended.
Options:
-h, --help print short help message and exit
-v, --version print version info and exit
--no-timing do not report the timings
-- end processing of options
Further arguments starting with a '-' are considered as options as long
as '--' was not encountered, and are passed on to the R process, which
by default is started with '--restore --save --no-readline'.
See also help('BATCH') inside R.

In your R script, called test.R:
args <- commandArgs(trailingOnly = F)
myargument <- args[length(args)]
myargument <- sub("-","",myargument)
print(myargument)
q(save="no")
From the command line run:
R CMD BATCH -4 test.R
Your output file, test.Rout, will show that the argument 4 has been successfully passed to R:
cat test.Rout
> args <- commandArgs(trailingOnly = F)
> myargument <- args[length(args)]
> myargument <- sub("-","",myargument)
> print(myargument)
[1] "4"
> q(save="no")
> proc.time()
user system elapsed
0.222 0.022 0.236

I'm adding an answer because I think a one-line solution is always good!
At the top of your myRscript.R file, add the following line:
eval(parse(text=paste(commandArgs(trailingOnly = TRUE), collapse=";")))
Then submit your script with something like:
R CMD BATCH [options] '--args arguments you want to supply' myRscript.R &
For example:
R CMD BATCH --vanilla '--args N=1 l=list(a=2, b="test") name="aname"' myscript.R &
Then:
> ls()
[1] "N" "l" "name"
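The same collapse-and-eval one-liner can be exercised without R CMD BATCH by substituting a character vector for commandArgs(trailingOnly = TRUE) (a sketch; fakeArgs mirrors the '--args N=1 l=list(a=2, b="test") name="aname"' string above):

```r
## Stand-in for what commandArgs(trailingOnly = TRUE) would return.
fakeArgs <- c("N=1", 'l=list(a=2, b="test")', 'name="aname"')
## Join with ";" so all assignments parse as one sequence, then evaluate.
eval(parse(text = paste(fakeArgs, collapse = ";")))
print(N)     # [1] 1
print(l$b)   # [1] "test"
print(name)  # [1] "aname"
```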

Here's another way to process command line args, using R CMD BATCH. My approach, which builds on an earlier answer here, lets you specify arguments at the command line and, in your R script, give some or all of them default values.
Here's an R file, which I name test.R:
defaults <- list(a=1, b=c(1,1,1)) ## default values of any arguments we might pass
## parse each command arg, loading it into the global environment
for (arg in commandArgs(TRUE))
  eval(parse(text=arg))
## if any variable named in defaults doesn't exist, then create it
## with the value from defaults
for (nm in names(defaults))
  assign(nm, mget(nm, ifnotfound=list(defaults[[nm]]))[[1]])
print(a)
print(b)
At the command line, if I type
R CMD BATCH --no-save --no-restore '--args a=2 b=c(2,5,6)' test.R
then within R we'll have a = 2 and b = c(2,5,6). But I could, say, omit b, and add in another argument c:
R CMD BATCH --no-save --no-restore '--args a=2 c="hello"' test.R
Then in R we'll have a = 2, b = c(1,1,1) (the default), and c = "hello".
Finally, for convenience we can wrap the R code in a function, as long as we're careful about the environment:
## defaults should be either NULL or a named list
parseCommandArgs <- function(defaults=NULL, envir=globalenv()) {
  for (arg in commandArgs(TRUE))
    eval(parse(text=arg), envir=envir)
  for (nm in names(defaults))
    assign(nm, mget(nm, ifnotfound=list(defaults[[nm]]), envir=envir)[[1]], pos=envir)
}
## example usage:
parseCommandArgs(list(a=1, b=c(1,1,1)))
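The defaults mechanism rests on mget() with ifnotfound, which looks each name up and falls back to the supplied value when the variable does not exist. A self-contained sketch (the a <- 2 line stands in for an argument parsed from '--args a=2'):

```r
a <- 2  ## pretend this was parsed from '--args a=2'
defaults <- list(a = 1, b = c(1, 1, 1))
## for each default: keep the existing variable if present,
## otherwise create it with the default value
for (nm in names(defaults))
  assign(nm, mget(nm, ifnotfound = list(defaults[[nm]]))[[1]])
print(a)  # [1] 2 -- kept from the "command line"
print(b)  # [1] 1 1 1 -- filled in from defaults
```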

Related

How to output the result of a PowerShell cmdlet run in R(system2) as a dataframe

I'm trying to run some PS cmdlets in R with system2 but I cannot find a way of outputting the result into an R data frame.
Example:
PSCommand_ConnectToMSOL <- "Connect-MsolService"
PSCommand_GetAllLicensingPlans <- "Get-MsolAccountSku | ft SkuPartNumber, ActiveUnits, ConsumedUnits"
PS_output <- system2("powershell", args = PSCommand_ConnectToMSOL, PSCommand_GetAllLicensingPlans)
PS_output
Running the same commands directly in PowerShell prints the expected table, but in RStudio I don't see a result.
How can I output the results to a data frame?
You have a typo/incorrect syntax in your code. Per the system2 documentation, the args argument takes in a character vector (i.e. character()):
Change:
PS_output <- system2("powershell", args = PSCommand_ConnectToMSOL, PSCommand_GetAllLicensingPlans)
to
PS_output <- system2("powershell", args = c(PSCommand_ConnectToMSOL, PSCommand_GetAllLicensingPlans))
With the original code you posted, the PSCommand_GetAllLicensingPlans variable would be passed on to the stdout argument of system2, which doesn't make sense.
Maybe try examining your code just a smidge more before looking for outside help next time!
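As for getting the result into a data frame: system2() can capture standard output as a character vector via stdout = TRUE, and read.table() can then parse the tabular text. A portable sketch (an Rscript -e call stands in here for the PowerShell pipeline, and the SkuPartNumber/ActiveUnits values are made up for illustration):

```r
## Capture a command's stdout as a character vector instead of printing it.
## The Rscript call just emits two lines of space-separated tabular text.
out <- system2("Rscript",
               args = c("-e", shQuote('cat("SkuPartNumber ActiveUnits\\nE3 25\\n")')),
               stdout = TRUE)
## Parse the captured lines into a data frame.
df <- read.table(text = out, header = TRUE)
print(df)
```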

bash argument containing R column info, from character to numeric

Trying to pass column coordinates from bash to an R script. For example:
Rscript script.R Input.table "29:37,40:48" "11:19" Output.file
I then have the script
#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
a <- read.table(args[1], header=T, row.names=1)
locg1 <- c(args[2])
locg2 <- c(args[3])
meangroup1 <- mean(a[,locg1])
meangroup2 <- mean(a[,locg2])
However when I run the script I get execution halted with "undefined columns selected" as an error.
I believe it's because the bash arguments are all interpreted as character and I am not sure how to convert a character like "29:37,40:48" into an actual numerical list.
I'm no expert in using Rscript from the command line to call R scripts, but given this simplified version:
Rscript script.R "29:37,40:48"
we can use strsplit to separate the two ranges:
ranges <- strsplit(args[1], ",")[[1]]
locg1 <- ranges[1]
locg2 <- ranges[2]
Note that these are still character strings ("29:37" and "40:48"), not numeric indices, so they need converting before being used in a[, ...].
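Splitting only gets you character strings; to turn pieces like "29:37" into actual numeric column indices, one option is to wrap the whole argument in c(...) and evaluate it (a sketch; argToCols is a made-up helper name, and eval(parse()) should only be used on trusted input since it runs the string as R code):

```r
## Turn a string like "29:37,40:48" into the index vector c(29:37, 40:48).
## Wrapping in c(...) handles the comma; eval(parse()) expands the ranges.
## Only safe for trusted input -- the string is executed as R code.
argToCols <- function(s) eval(parse(text = paste0("c(", s, ")")))
print(argToCols("11:19"))
print(argToCols("29:37,40:48"))
```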

R - Create a separate environment where to source() an R script, such that the latter does not affect the "caller" environment

Scenario: Let's say I have a master pipeline.R script as follows:
WORKINGDIR <- "my/master/dir"
setwd(WORKINGDIR)
# Step 1
tA = Sys.time()
source("step1.R")
difftime(Sys.time(), tA)
# Add as many steps as desired, ...
And suppose that, within step1.R happens a:
rm(list=ls())
Question:
How can I separate the pipeline.R (caller) environment from the step1.R environment?
More specifically, I would like to run step1.R in separate environment such that any code within it, like the rm, does not affect the caller environment.
There are a few ways to call an R script and run it. One of them is source().
source() evaluates the R script, and can do so in a specified environment.
Say we have a Test.R script:
#Test.R
a <- 1
rm(list = ls())
b <- 2
c <- 3
and global variables:
a <- 'a'
b <- 'b'
c <- 'c'
Now you would like to run this script in an environment separate from the global environment you are calling it from. You can do this by creating a new environment and then calling source():
step1 <- new.env(parent = baseenv())
#Working directory set correctly.
source("Test.R", local = step1)
These are the results after the run, as you can see, the symbols in the global environment are not deleted.
a
#"a"
b
#"b"
step1$a
#NULL
#rm(list = ls()) actually ran in Test.R
step1$b
#2
Note:
You can also run an R script using system(). This will, however, run in a different R process, and you will not be able to retrieve anything from where you called the script.
system("Rscript Test.R")
Alternatively, create a new environment with new.env():
e1 <- new.env()
and use sys.source() to source the R script, specifying envir = e1:
sys.source("step1.R", envir=e1)
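The isolation is easy to verify end to end: write a throwaway script containing the rm(list = ls()) to a temp file, source it into a fresh environment, and check that the caller's variables survive (a sketch; the temp-file script mirrors Test.R above):

```r
## A throwaway script that creates a, wipes its own environment, then creates b.
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1", "rm(list = ls())", "b <- 2"), script)

a <- "caller's a"  # caller-side variable that should survive
e1 <- new.env()
sys.source(script, envir = e1)

print(a)     # "caller's a" -- untouched by the script's rm()
print(e1$b)  # 2 -- created after the rm() inside the script
print(e1$a)  # NULL -- removed by the rm() inside the script
```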

Low latency R submits

I have created an R script which accepts a CSV file and produces output; I call it with:
Rscript code.R input.csv
Here code.R is the code to be executed and input.csv is the file it uses as input.
Problem:
The script takes 5 seconds or more to produce results, because R is started fresh from the shell each time and the libraries need time to load.
Question:
Is it possible to run R in background or as a service with all libraries loaded that I can just do submit my job and it just takes time to compute?
Full disclosure:
The script is a ML model which loads an .RDA object and the scripts calls predict function
Open your R console and load all the libraries that are required.
Use source() to run your R scripts.
Test.R file has the following code
#This file has no library declarations
c <- ggplot(mtcars, aes(factor(cyl)))
c <- c + geom_bar()
print(c)
Now I run from my console like this
> library(ggplot2)
> source("<Path>/test.R")
Output: the ggplot bar chart is displayed.
Edit: To pass Params along with source() command
You can do this by overriding commandArgs()
New test.R file code:
c <- ggplot(mtcars, aes(factor(cyl)))
c <- c + geom_bar()
print(c)
print(commandArgs())
Now from console:
> commandArgs <- function() c('a','b')
> source("<Path>/test.R")
[1] "a" "b"
(Along with the graph)
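The override trick can be packaged self-contained: mask commandArgs() with your own function before source()ing, so the sourced script sees whatever arguments you choose (a sketch; the temp-file script stands in for the test.R above):

```r
## The sourced script just prints its "command line" arguments.
script <- tempfile(fileext = ".R")
writeLines("print(commandArgs())", script)

## Mask the real commandArgs() for the duration of the source() call.
commandArgs <- function(trailingOnly = FALSE) c("a", "b")
source(script)  # prints [1] "a" "b"

rm(commandArgs)  # unmask to restore normal behaviour
```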

Input parameters when calling a script in R

I have an R script that looks like this (it's different, but this is for reproduction purposes :)).
#createOutputFunction.R
createOutput <- function(parameter1, parameter2){
x <- parameter1 + parameter2
print(x)
}
This works. But I would like to pass the parameters when executing the function. So basically I want to be able to do:
source("createOutputFunction.R") and input the parameters directly,
so that each call to the sourced function can produce a different output, depending on the parameters I enter.
Any thoughts on how I can get this working?
Try this:
# main.R file
source("createOutputFunction.R")
args <- commandArgs(trailingOnly = TRUE)
## command line arguments arrive as character strings, so convert them
createOutput(as.numeric(args[1]), as.numeric(args[2]))
# end of main.R file
Now run Rscript with input arguments that are passed on to main.R:
Rscript main.R 1 2
This should print out 3.
