Source R script that requires command line arguments in Rstudio? [duplicate] - r

In one R file, I plan to source another R file that supports reading two command-line arguments. This sounds like a trivial task but I couldn't find a solution online. Any help is appreciated.

I assume the sourced script accesses the command line arguments with commandArgs? If so, you can override commandArgs in the parent script to return what you want when it is called in the script you're sourcing. To see how this would work:
file_to_source.R
print(commandArgs())
main_script.R
commandArgs <- function(...) 1:3
source('file_to_source.R')
outputs [1] 1 2 3
If your main script doesn't take any command line arguments itself, you could also just supply the arguments to this script instead.

The simplest solution is to replace source() with system() and paste. Try
arg1 <- 1
arg2 <- 2
system(paste("Rscript file_to_source.R", arg1, arg2))

If you have one script that sources another script, you can define variables in the first one that can be used by the sourced script.
> tmpfile <- tempfile()
> cat("print(a)", file=tmpfile)
> a <- 5
> source(tmpfile)
[1] 5

An extended version of #Matthew Plourde's answer. What I usually do is to have an if statement to check if the command line arguments have been defined, and read them otherwise.
In addition I try to use the argparse library to read command line arguments, as it provides a tidier syntax and better documentation.
file to be sourced
if (!exists("args")) {
suppressPackageStartupMessages(library("argparse"))
parser <- ArgumentParser()
parser$add_argument("-a", "--arg1", type="character", defalt="a",
help="First parameter [default %(defult)s]")
parser$add_argument("-b", "--arg2", type="character", defalt="b",
help="Second parameter [default %(defult)s]")
args <- parser$parse_args()
}
print (args)
file calling source()
args$arg1 = "c"
args$arg2 = "d"
source ("file_to_be_sourced.R")
c, d

This works:
# source another script with arguments
source_with_args <- function(file, ...){
commandArgs <<- function(trailingOnly){
list(...)
}
source(file)
}
source_with_args("sourcefile.R", "first_argument", "second_argument")
Note that the built in commandArgs function has to be redefined with <<- instead of <-.
As I understand it, this makes its scope extend beyond the function source_with_args() where it is defined.

I've tested some of the alternatives here and I end up with this:
File calling the source:
source_with_args <- function(file, ...){
system(paste("Rscript", file, ...))
}
source_with_args(
script_file,
paste0("--data=", data_processed_file),
paste0("--output=", output_file)
)
File to be sourced:
library(optparse)
option_list = list(
make_option(
c("--data"),
type="character",
default=NULL,
help="input file",
metavar="filename"
),
make_option(
c("--output"),
type="character",
default=NULL,
help="output file [default= %default]",
metavar="filename"
)
)
opt_parser = OptionParser(option_list=option_list)
data_processed_file <- opt$data
oputput_file <- opt$output
if(is.null(data_processed_file) || is.null(oputput_file)){
stop("Required information is missing")
}

Related

Make the function accept as an argument a file path or object from the environment [R]

Let's say this is my function:
library(dplyr)
library(vroom)
#input table <- /path/to/file
myfunction <- function(input_table){
#load data
my_table <- vroom::vroom(input_table,col_select=c(readname, transcript_start, qc_tag),show_col_types = FALSE)
#do something
output <- dplyr::filter(my_table, qc_tag=="PASS")
return(output)
}
Currently, it accepts path to the input file. I would like it to accept an argument passed either as a path to file or as an object which is already present in environment.
For instance, a table which would be prefiltered according to some criteria before launching the function of interest.
I know how to do it in not so elegant way by creating an alternative argument and using if else statement. Is there another option?
You can use generic functions to have a function behave differently depending on the type of input it receives:
fn = function(input) {UseMethod('fn')}
fn.character = function(input) {'this is a string'}
fn.data.frame = function(input) {'this is a dataframe'}
fn('string')
[1] "this is a string"
fn(data.frame('a' = 1))
[1] "this is a dataframe"
You can read more details about this technique here:

Is there a way to debug an RScript call in RStudio?

Imagine I run an R script from the command line, like this:
Rscript prog.R x y z
and I want to examine the code at a certain line.
Presently, I can't just debug it interactively within RStudio because I don't know how to pass in the arguments.
As it is designed to be run from the command line, how can I debug the script via the command line / outside of RStudio?
If your only problem is passing command line arguments to a script in Rstudio, you can use the following trick.
Your script probably accesses the command line arguments with something like:
args <- commandArgs(trailingOnly = TRUE)
After which args contains a character vector with all the trailing command arguments, which in your example would be c("x", "y", "z").
The trick to make your code run in Rstudio without any changes is simply to define a local function commandArgs that simulates the arguments you want. Here I also take care of the case where trailingOnly = TRUE although it's rarely used in practice.
commandArgs <- function(trailingOnly = FALSE) {
if (trailingOnly) {
c("x", "y", "z")
} else {
c("Rscript", "--file=prog.R", "--args", "x", "y", "z")
}
}
Once commandArgs is defined in the global environment, you can run your script in Rstudio and it will see "x", "y", "z" as its arguments.
You can even go one step further and streamline the process a bit:
#' Source a file with simulated command line arguments
#' #param file Path to R file
#' #param args Character vector of arguments
source_with_args <- function(file, args = character(0)) {
commandArgs <- function(trailingOnly = FALSE) {
if (trailingOnly) {
args
} else if (length(args)) {
c("Rscript", paste0("--file=", file), "--args", args)
} else {
c("Rscript", paste0("--file=", file))
}
}
# Note `local = TRUE` to make sure the script sees
# the local `commandArgs` function defined just above.
source(file, local = TRUE)
}
Let's imagine you want to run the following script:
# This is the content of test.R
# It simply prints the arguments it received.
args <- commandArgs(trailingOnly = TRUE)
cat("Received ", length(args), "arguments:\n")
for (a in args) {
cat("- ", a, "\n")
}
This is the result with Rscript:
$ Rscript --vanilla path/to/test.R x y z
Received 3 arguments:
- x
- y
- z
And you can run the following in R or Rstudio:
source_with_args("path/to/test.R")
#> Received 0 arguments:
source_with_args("path/to/test.R", "x")
#> Received 1 arguments:
#> - x
source_with_args("path/to/test.R", c("x", "y", "z"))
#> Received 3 arguments:
#> - x
#> - y
#> - z
This solution may solve your problem:
Is there a way to debug an R script run from the command line with Rscript.exe - Answer
ipdb.r <- function(){
input <- ""
while(!input %in% c("c","cont","continue"))
{
cat("ipdb.r>")
# stdin connection to work outside interactive session
input <- readLines("stdin",n=1)
if(!input %in% c("c","cont","continue","exit"))
{
# parent.frame() runs command outside function environment
print(eval(parse(text=input),parent.frame()))
}else if(input=="exit")
{
stop("Exiting from ipdb.r...")
}
}
}
For hacking commandArgs(), it's also possible to just overwrite the base::commandArgs() with your own version instead of masking it, since the base environment comes unlocked.
For the part of your question on debugging, you can perform traceback() as usual with any source()d scripts. Also just want to refer to this function dump.frames(). Through this function you can dump the stracktrace to a file at error and look at it later.

How to pass in function parameters for an R file when run from terminal?

I have a file on my computer that I want to run from the command line. This file would have some stuff happening within it + a function.
For example I have a global variable, start_value=10 and then I call a function further down in the Rscript. I want to run this script while passing in parameters
I tried to find out online how to do this, but I have had no luck.
I receive this error:
Error in help_function(x, y) : object 'x' not found
When running like this:
Rscript helpme.R 100 10
-
##?? saw this someplaces when searching, but couldn't get it to work
args = commandArgs(trailingOnly=TRUE)
starting_value=10
help_function = function(x,y){
division =x/y
answer=starting_value + division
return(answer)
}
help_function(x,y)
commandArgs function returns a character vector with the arguments passed to the command line (trailingOnly = TRUE removes the "RScript helpme.R" part).
In your case :
args <- commandArgs(trailingOnly = TRUE)
# parse your command line arguments
x <- as.numeric(args[1]) # args[1] contains "100"
y <- as.numeric(args[2]) # args[2] contains "10"
# ...continue with your script here

How to pass command-line arguments when calling source() on an R file within another R file

In one R file, I plan to source another R file that supports reading two command-line arguments. This sounds like a trivial task but I couldn't find a solution online. Any help is appreciated.
I assume the sourced script accesses the command line arguments with commandArgs? If so, you can override commandArgs in the parent script to return what you want when it is called in the script you're sourcing. To see how this would work:
file_to_source.R
print(commandArgs())
main_script.R
commandArgs <- function(...) 1:3
source('file_to_source.R')
outputs [1] 1 2 3
If your main script doesn't take any command line arguments itself, you could also just supply the arguments to this script instead.
The simplest solution is to replace source() with system() and paste. Try
arg1 <- 1
arg2 <- 2
system(paste("Rscript file_to_source.R", arg1, arg2))
If you have one script that sources another script, you can define variables in the first one that can be used by the sourced script.
> tmpfile <- tempfile()
> cat("print(a)", file=tmpfile)
> a <- 5
> source(tmpfile)
[1] 5
An extended version of #Matthew Plourde's answer. What I usually do is to have an if statement to check if the command line arguments have been defined, and read them otherwise.
In addition I try to use the argparse library to read command line arguments, as it provides a tidier syntax and better documentation.
file to be sourced
if (!exists("args")) {
suppressPackageStartupMessages(library("argparse"))
parser <- ArgumentParser()
parser$add_argument("-a", "--arg1", type="character", defalt="a",
help="First parameter [default %(defult)s]")
parser$add_argument("-b", "--arg2", type="character", defalt="b",
help="Second parameter [default %(defult)s]")
args <- parser$parse_args()
}
print (args)
file calling source()
args$arg1 = "c"
args$arg2 = "d"
source ("file_to_be_sourced.R")
c, d
This works:
# source another script with arguments
source_with_args <- function(file, ...){
commandArgs <<- function(trailingOnly){
list(...)
}
source(file)
}
source_with_args("sourcefile.R", "first_argument", "second_argument")
Note that the built in commandArgs function has to be redefined with <<- instead of <-.
As I understand it, this makes its scope extend beyond the function source_with_args() where it is defined.
I've tested some of the alternatives here and I end up with this:
File calling the source:
source_with_args <- function(file, ...){
system(paste("Rscript", file, ...))
}
source_with_args(
script_file,
paste0("--data=", data_processed_file),
paste0("--output=", output_file)
)
File to be sourced:
library(optparse)
option_list = list(
make_option(
c("--data"),
type="character",
default=NULL,
help="input file",
metavar="filename"
),
make_option(
c("--output"),
type="character",
default=NULL,
help="output file [default= %default]",
metavar="filename"
)
)
opt_parser = OptionParser(option_list=option_list)
data_processed_file <- opt$data
oputput_file <- opt$output
if(is.null(data_processed_file) || is.null(oputput_file)){
stop("Required information is missing")
}

Passing multiple arguments via command line in R

I am trying to pass multiple file path arguments via command line to an Rscript which can then be processed using an arguments parser. Ultimately I would want something like this
Rscript test.R --inputfiles fileA.txt fileB.txt fileC.txt --printvar yes --size 10 --anotheroption helloworld -- etc...
passed through the command line and have the result as an array in R when parsed
args$inputfiles = "fileA.txt", "fileB.txt", "fileC.txt"
I have tried several parsers including optparse and getopt but neither of them seem to support this functionality. I know argparse does but it is currently not available for R version 2.15.2
Any ideas?
Thanks
Although it wasn't released on CRAN when this question was asked a beta version of the argparse module is up there now which can do this. It is basically a wrapper around the popular python module of the same name so you need to have a recent version of python installed to use it. See install notes for more info. The basic example included sums an arbitrarily long list of numbers which should not be hard to modify so you can grab an arbitrarily long list of input files.
> install.packages("argparse")
> library("argparse")
> example("ArgumentParser")
The way you describe command line options is different from the way that most people would expect them to be used. Normally, a command line option would take a single parameter, and parameters without a preceding option are passed as arguments. If an argument would take multiple items (like a list of files), I would suggest parsing the string using strsplit().
Here's an example using optparse:
library (optparse)
option_list <- list ( make_option (c("-f","--filelist"),default="blah.txt",
help="comma separated list of files (default %default)")
)
parser <-OptionParser(option_list=option_list)
arguments <- parse_args (parser, positional_arguments=TRUE)
opt <- arguments$options
args <- arguments$args
myfilelist <- strsplit(opt$filelist, ",")
print (myfilelist)
print (args)
Here are several example runs:
$ Rscript blah.r -h
Usage: blah.r [options]
Options:
-f FILELIST, --filelist=FILELIST
comma separated list of files (default blah.txt)
-h, --help
Show this help message and exit
$ Rscript blah.r -f hello.txt
[[1]]
[1] "hello.txt"
character(0)
$ Rscript blah.r -f hello.txt world.txt
[[1]]
[1] "hello.txt"
[1] "world.txt"
$ Rscript blah.r -f hello.txt,world.txt another_argument and_another
[[1]]
[1] "hello.txt" "world.txt"
[1] "another_argument" "and_another"
$ Rscript blah.r an_argument -f hello.txt,world.txt,blah another_argument and_another
[[1]]
[1] "hello.txt" "world.txt" "blah"
[1] "an_argument" "another_argument" "and_another"
Note that for the strsplit, you can use a regular expression to determine the delimiter. I would suggest something like the following, which would let you use commas or colons to separate your list:
myfilelist <- strsplit (opt$filelist,"[,:]")
In the front of your script test.R, you put this :
args <- commandArgs(trailingOnly = TRUE)
hh <- paste(unlist(args),collapse=' ')
listoptions <- unlist(strsplit(hh,'--'))[-1]
options.args <- sapply(listoptions,function(x){
unlist(strsplit(x, ' '))[-1]
})
options.names <- sapply(listoptions,function(x){
option <- unlist(strsplit(x, ' '))[1]
})
names(options.args) <- unlist(options.names)
print(options.args)
to get :
$inputfiles
[1] "fileA.txt" "fileB.txt" "fileC.txt"
$printvar
[1] "yes"
$size
[1] "10"
$anotheroption
[1] "helloworld"
After searching around, and avoiding to write a new package from the bottom up, I figured the best way to input multiple arguments using the package optparse is to separate input files by a character which is most likely illegal to be included in a file name (for example, a colon)
Rscript test.R --inputfiles fileA.txt:fileB.txt:fileC.txt etc...
File names can also have spaces in them as long as the spaces are escaped (optparse will take care of this)
Rscript test.R --inputfiles file\ A.txt:file\ B.txt:fileC.txt etc...
Ultimatley, it would be nice to have a package (possibly a modified version of optparse) that would support multiple arguments like mentioned in the question and below
Rscript test.R --inputfiles fileA.txt fileB.txt fileC.txt
One would think such trivial features would be implemented into a widely used package such as optparse
Cheers
#agstudy's solution does not work properly if input arguments are lists of the same length. By default, sapply will collapse inputs of the same length into a matrix rather than a list. The fix is simple enough, just explicitly set simplify to false in the sapply parsing the arguments.
args <- commandArgs(trailingOnly = TRUE)
hh <- paste(unlist(args),collapse=' ')
listoptions <- unlist(strsplit(hh,'--'))[-1]
options.args <- sapply(listoptions,function(x){
unlist(strsplit(x, ' '))[-1]
}, simplify=FALSE)
options.names <- sapply(listoptions,function(x){
option <- unlist(strsplit(x, ' '))[1]
})
names(options.args) <- unlist(options.names)
print(options.args)
I had this same issue, and the workaround that I developed is to adjust the input command line arguments before they are fed to the optparse parser, by concatenating whitespace-delimited input file names together using an alternative delimiter such as a "pipe" character, which is unlikely to be used as part of a file name.
The adjustment is then reversed at the end again, by removing the delimiter using str_split().
Here is some example code:
#!/usr/bin/env Rscript
library(optparse)
library(stringr)
# ---- Part 1: Helper Functions ----
# Function to collapse multiple input arguments into a single string
# delimited by the "pipe" character
insert_delimiter <- function(rawarg) {
# Identify index locations of arguments with "-" as the very first
# character. These are presumed to be flags. Prepend with a "dummy"
# index of 0, which we'll use in the index step calculation below.
flagloc <- c(0, which(str_detect(rawarg, '^-')))
# Additionally, append a second dummy index at the end of the real ones.
n <- length(flagloc)
flagloc[n+1] <- length(rawarg) + 1
concatarg <- c()
# Counter over the output command line arguments, with multiple input
# command line arguments concatenated together into a single string as
# necessary
ii <- 1
# Counter over the flag index locations
for(ij in seq(1,length(flagloc)-1)) {
# Calculate the index step size between consecutive pairs of flags
step <- flagloc[ij+1]-flagloc[ij]
# Case 1: empty flag with no arguments
if (step == 1) {
# Ignore dummy index at beginning
if (ij != 1) {
concatarg[ii] <- rawarg[flagloc[ij]]
ii <- ii + 1
}
}
# Case 2: standard flag with one argument
else if (step == 2) {
concatarg[ii] <- rawarg[flagloc[ij]]
concatarg[ii+1] <- rawarg[flagloc[ij]+1]
ii <- ii + 2
}
# Case 3: flag with multiple whitespace delimited arguments (not
# currently handled correctly by optparse)
else if (step > 2) {
concatarg[ii] <- rawarg[flagloc[ij]]
# Concatenate multiple arguments using the "pipe" character as a delimiter
concatarg[ii+1] <- paste0(rawarg[(flagloc[ij]+1):(flagloc[ij+1]-1)],
collapse='|')
ii <- ii + 2
}
}
return(concatarg)
}
# Function to remove "pipe" character and re-expand parsed options into an
# output list again
remove_delimiter <- function(rawopt) {
outopt <- list()
for(nm in names(rawopt)) {
if (typeof(rawopt[[nm]]) == "character") {
outopt[[nm]] <- unlist(str_split(rawopt[[nm]], '\\|'))
} else {
outopt[[nm]] <- rawopt[[nm]]
}
}
return(outopt)
}
# ---- Part 2: Example Usage ----
# Prepare list of allowed options for parser, in standard fashion
option_list <- list(
make_option(c('-i', '--inputfiles'), type='character', dest='fnames',
help='Space separated list of file names', metavar='INPUTFILES'),
make_option(c('-p', '--printvar'), type='character', dest='pvar',
help='Valid options are "yes" or "no"',
metavar='PRINTVAR'),
make_option(c('-s', '--size'), type='integer', dest='sz',
help='Integer size value',
metavar='SIZE')
)
# This is the customary pattern that optparse would use to parse command line
# arguments, however it chokes when there are multiple whitespace-delimited
# options included after the "-i" or "--inputfiles" flag.
#opt <- parse_args(OptionParser(option_list=option_list),
# args=commandArgs(trailingOnly = TRUE))
# This works correctly
opt <- remove_delimiter(parse_args(OptionParser(option_list=option_list),
args=insert_delimiter(commandArgs(trailingOnly = TRUE))))
print(opt)
Assuming the above file were named fix_optparse.R, here is the output result:
> chmod +x fix_optparse.R
> ./fix_optparse.R --help
Usage: ./fix_optparse.R [options]
Options:
-i INPUTFILES, --inputfiles=INPUTFILES
Space separated list of file names
-p PRINTVAR, --printvar=PRINTVAR
Valid options are "yes" or "no"
-s SIZE, --size=SIZE
Integer size value
-h, --help
Show this help message and exit
> ./fix_optparse.R --inputfiles fileA.txt fileB.txt fileC.txt --printvar yes --size 10
$fnames
[1] "fileA.txt" "fileB.txt" "fileC.txt"
$pvar
[1] "yes"
$sz
[1] 10
$help
[1] FALSE
>
A minor limitation with this approach is that if any of the other arguments have the potential to accept a "pipe" character as a valid input, then those arguments will not be treated correctly. However I think you could probably develop a slightly more sophisticated version of this solution to handle that case correctly as well. This simple version works most of the time, and illustrates the general idea.

Resources