Imagine I run an R script from the command line, like this:
Rscript prog.R x y z
and I want to examine the code at a certain line.
Presently, I can't just debug it interactively within RStudio because I don't know how to pass in the arguments.
As it is designed to be run from the command line, how can I debug the script via the command line / outside of RStudio?
If your only problem is passing command-line arguments to a script in RStudio, you can use the following trick.
Your script probably accesses the command line arguments with something like:
args <- commandArgs(trailingOnly = TRUE)
After this call, args contains a character vector of all the trailing command-line arguments, which in your example would be c("x", "y", "z").
The trick to make your code run in RStudio without any changes is simply to define a local function commandArgs that simulates the arguments you want. Here I also handle the case trailingOnly = FALSE, although it's rarely used in practice.
commandArgs <- function(trailingOnly = FALSE) {
  if (trailingOnly) {
    c("x", "y", "z")
  } else {
    c("Rscript", "--file=prog.R", "--args", "x", "y", "z")
  }
}
Once commandArgs is defined in the global environment, you can run your script in RStudio and it will see "x", "y", "z" as its arguments.
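With the arguments simulated this way, ordinary interactive debugging works too. A minimal sketch, assuming you add the browser() call yourself at the line you want to examine:
commandArgs <- function(trailingOnly = FALSE) c("x", "y", "z")
# add a browser() call at the line of interest inside prog.R,
# then source it; execution pauses there with the simulated arguments in place
source("prog.R")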
You can even go one step further and streamline the process a bit:
#' Source a file with simulated command line arguments
#' @param file Path to R file
#' @param args Character vector of arguments
source_with_args <- function(file, args = character(0)) {
  commandArgs <- function(trailingOnly = FALSE) {
    if (trailingOnly) {
      args
    } else if (length(args)) {
      c("Rscript", paste0("--file=", file), "--args", args)
    } else {
      c("Rscript", paste0("--file=", file))
    }
  }
  # Note `local = TRUE` to make sure the script sees
  # the local `commandArgs` function defined just above.
  source(file, local = TRUE)
}
Let's imagine you want to run the following script:
# This is the content of test.R
# It simply prints the arguments it received.
args <- commandArgs(trailingOnly = TRUE)
cat("Received ", length(args), "arguments:\n")
for (a in args) {
cat("- ", a, "\n")
}
This is the result with Rscript:
$ Rscript --vanilla path/to/test.R x y z
Received 3 arguments:
- x
- y
- z
And you can run the following in R or RStudio:
source_with_args("path/to/test.R")
#> Received 0 arguments:
source_with_args("path/to/test.R", "x")
#> Received 1 arguments:
#> - x
source_with_args("path/to/test.R", c("x", "y", "z"))
#> Received 3 arguments:
#> - x
#> - y
#> - z
This solution may solve your problem: see the answer to "Is there a way to debug an R script run from the command line with Rscript.exe".
ipdb.r <- function() {
  input <- ""
  while (!input %in% c("c", "cont", "continue")) {
    cat("ipdb.r> ")
    # stdin connection so this works outside an interactive session
    input <- readLines("stdin", n = 1)
    if (!input %in% c("c", "cont", "continue", "exit")) {
      # evaluate in parent.frame() so the command runs outside this function's environment
      print(eval(parse(text = input), parent.frame()))
    } else if (input == "exit") {
      stop("Exiting from ipdb.r...")
    }
  }
}
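To use it, you could call the function at the line you want to inspect and then run the script as usual; whatever you type at the prompt is evaluated in the calling frame (the call below is just illustrative):
# somewhere in prog.R, at the line you want to examine:
ipdb.r()
# then run normally and type expressions at the ipdb.r> prompt:
# Rscript prog.R x y z
Typing c (or cont/continue) resumes the script; exit stops it.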
For hacking commandArgs(), it's also possible to just overwrite base::commandArgs() with your own version instead of masking it, since the base environment comes unlocked.
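A minimal sketch of that approach, assuming your R version allows top-level assignment into baseenv():
assign("commandArgs",
       function(trailingOnly = FALSE) c("x", "y", "z"),
       envir = baseenv())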
For the debugging part of your question, you can call traceback() as usual with source()d scripts. I also want to point to dump.frames(): with this function you can dump the stack trace to a file on error and inspect it later.
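For non-interactive runs, the idiom from ?dump.frames looks roughly like this:
# in the script executed via Rscript:
options(error = quote({dump.frames(to.file = TRUE); q(status = 1)}))
# later, in an interactive session:
load("last.dump.rda")
debugger(last.dump)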
Related
In one R file, I plan to source another R file that supports reading two command-line arguments. This sounds like a trivial task but I couldn't find a solution online. Any help is appreciated.
I assume the sourced script accesses the command line arguments with commandArgs? If so, you can override commandArgs in the parent script to return what you want when it is called in the script you're sourcing. To see how this would work:
file_to_source.R
print(commandArgs())
main_script.R
commandArgs <- function(...) 1:3
source('file_to_source.R')
outputs [1] 1 2 3
If your main script doesn't take any command line arguments itself, you could also just supply the arguments to this script instead.
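Note that a real script would usually call commandArgs(trailingOnly = TRUE), so the override should accept that argument; a small variant (the values are just placeholders):
commandArgs <- function(trailingOnly = FALSE) c("x", "y")
source('file_to_source.R')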
The simplest solution is to replace source() with system() and paste. Try
arg1 <- 1
arg2 <- 2
system(paste("Rscript file_to_source.R", arg1, arg2))
If you have one script that sources another script, you can define variables in the first one that can be used by the sourced script.
> tmpfile <- tempfile()
> cat("print(a)", file=tmpfile)
> a <- 5
> source(tmpfile)
[1] 5
An extended version of @Matthew Plourde's answer. What I usually do is have an if statement that checks whether the command line arguments have already been defined, and read them otherwise.
In addition I try to use the argparse library to read command line arguments, as it provides a tidier syntax and better documentation.
file to be sourced
if (!exists("args")) {
suppressPackageStartupMessages(library("argparse"))
parser <- ArgumentParser()
parser$add_argument("-a", "--arg1", type="character", defalt="a",
help="First parameter [default %(defult)s]")
parser$add_argument("-b", "--arg2", type="character", defalt="b",
help="Second parameter [default %(defult)s]")
args <- parser$parse_args()
}
print (args)
file calling source()
args$arg1 = "c"
args$arg2 = "d"
source ("file_to_be_sourced.R")
c, d
This works:
# source another script with arguments
source_with_args <- function(file, ...) {
  # commandArgs is expected to return a character vector
  commandArgs <<- function(trailingOnly = FALSE) {
    c(...)
  }
  source(file)
}
source_with_args("sourcefile.R", "first_argument", "second_argument")
Note that the built-in commandArgs function has to be overridden with <<- instead of <-.
The <<- operator assigns in the global environment, so the masking function is visible beyond source_with_args() itself; in particular, it is seen by the sourced script, which source() evaluates in the global environment by default.
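If the lingering global override bothers you, a variant could remove it again once the script finishes (a sketch, not part of the original answer):
source_with_args <- function(file, ...) {
  args <- c(...)
  commandArgs <<- function(trailingOnly = FALSE) args
  # drop the global mask again when we exit, restoring base::commandArgs
  on.exit(rm(commandArgs, envir = globalenv()))
  source(file)
}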
I've tested some of the alternatives here and ended up with this:
File calling the source:
source_with_args <- function(file, ...) {
  system(paste("Rscript", file, ...))
}

source_with_args(
  script_file,
  paste0("--data=", data_processed_file),
  paste0("--output=", output_file)
)
File to be sourced:
library(optparse)
option_list <- list(
  make_option(
    c("--data"),
    type = "character",
    default = NULL,
    help = "input file",
    metavar = "filename"
  ),
  make_option(
    c("--output"),
    type = "character",
    default = NULL,
    help = "output file [default= %default]",
    metavar = "filename"
  )
)

opt_parser <- OptionParser(option_list = option_list)
opt <- parse_args(opt_parser)

data_processed_file <- opt$data
output_file <- opt$output

if (is.null(data_processed_file) || is.null(output_file)) {
  stop("Required information is missing")
}
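With both files in place, the equivalent direct shell invocation of the sourced script would look something like this (file names are illustrative):
$ Rscript script.R --data=data_processed.csv --output=results.csv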
I am trying to determine all the objects created in a script (specifically to get all the data frames, but I'll settle for all assigned objects, i.e. vectors, lists, etc.).
Is there a way of doing this? Should I make the script run in its own session and then somehow get the objects from that session, rather than relying on the global environment?
Use the second argument to source() when you execute the script. For example, here's a script:
x <- y + 1
z <- 2
which I can put in script.R. Then I will execute it in its own environment using the following code:
x <- 1 # This value will *not* change
y <- 2 # This value will be visible to the script
env <- new.env()
source("script.R", local = env)
Now I can print the values, and see that the comments are correct
x # the original one
# [1] 1
ls(env) # what was created?
# [1] "x" "z"
env$x # this is the one from the script
# [1] 3
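Since the original goal was the data frames specifically, you can filter the script's environment afterwards; a small sketch:
# keep only the data frames the script created in `env`
dfs <- Filter(is.data.frame, as.list(env))
names(dfs)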
I had a similar question and found an answer. I am copying the answer from my other post here.
I wrote the following function, get.objects(), that returns all the objects created in a script:
get.objects <- function(path2file = NULL, exception = NULL, source = FALSE, message = TRUE) {
  library("utils")
  library("tools")
  # Step 0-1: Possibility to leave path2file = NULL if using RStudio.
  # We are using rstudioapi to get the path to the current file
  if (is.null(path2file)) path2file <- rstudioapi::getSourceEditorContext()$path
  # Check that file exists
  if (!file.exists(path2file)) {
    stop("couldn't find file ", path2file)
  }
  # Step 0-2: If .Rmd file, need to extract the code in R chunks first
  # Use code in https://felixfan.github.io/extract-r-code/
  if (file_ext(path2file) == "Rmd") {
    require("knitr")
    tmp <- purl(path2file)
    path2file <- paste(getwd(), tmp, sep = "/")
    source <- TRUE # Must be changed to TRUE here
  }
  # Step 0-3: Start by running the script if you are calling an external script.
  if (source) source(path2file)
  # Step 1: screen the script
  summ_script <- getParseData(parse(path2file, keep.source = TRUE))
  # Step 2: extract the objects
  list_objects <- summ_script$text[which(summ_script$token == "SYMBOL")]
  # List unique
  list_objects <- unique(list_objects)
  # Step 3: find where the objects are.
  src <- paste(as.vector(sapply(list_objects, find)))
  src <- tapply(list_objects, factor(src), c)
  # List of the objects in the Global Environment
  # They can be in both the Global Environment and some packages.
  src_names <- names(src)
  list_objects <- NULL
  for (i in grep("GlobalEnv", src_names)) {
    list_objects <- c(list_objects, src[[i]])
  }
  # Step 3bis: if any exception, remove from the list
  if (!is.null(exception)) {
    list_objects <- list_objects[!list_objects %in% exception]
  }
  # Step 4: done!
  # If message, print message:
  if (message) {
    cat(paste0(" ", length(list_objects), " objects were created in the script \n ", path2file, "\n"))
  }
  return(list_objects)
}
To run it, you need a saved script. Here is an example of a script:
# This must be saved as a script, e.g., "test.R".
# Create a bunch of objects
library(ggplot2)
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()

# List the objects. If you want to list all the objects except some,
# you can use the argument exception. Here, "p1" is listed as an exception.
get.objects()
get.objects(exception = "p1", message = FALSE)
Note that the function also works for external scripts and R Markdown files.
If you run it on an external script, the script has to be executed first; to do so, set the argument source to TRUE.
I have a file on my computer that I want to run from the command line. This file would have some stuff happening within it + a function.
For example, I have a global variable, starting_value = 10, and then I call a function further down in the Rscript. I want to run this script while passing in parameters.
I tried to find out online how to do this, but I have had no luck.
I receive this error:
Error in help_function(x, y) : object 'x' not found
When running like this:
Rscript helpme.R 100 10
##?? saw this in some places when searching, but couldn't get it to work
args = commandArgs(trailingOnly=TRUE)

starting_value = 10
help_function = function(x, y){
  division = x/y
  answer = starting_value + division
  return(answer)
}
help_function(x, y)
The commandArgs() function returns a character vector with the arguments passed on the command line (trailingOnly = TRUE removes the leading "Rscript helpme.R" part).
In your case:
args <- commandArgs(trailingOnly = TRUE)
# parse your command line arguments
x <- as.numeric(args[1]) # args[1] contains "100"
y <- as.numeric(args[2]) # args[2] contains "10"
# ...continue with your script here
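Putting it together, a minimal corrected version of helpme.R could look like this:
args <- commandArgs(trailingOnly = TRUE)
starting_value <- 10
help_function <- function(x, y) {
  division <- x / y
  starting_value + division
}
x <- as.numeric(args[1]) # "100"
y <- as.numeric(args[2]) # "10"
print(help_function(x, y))
# $ Rscript helpme.R 100 10
# [1] 20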
I am trying to get my head around the Snowfall library and its usage.
Having written a simulation that makes use of environments, I encountered the following issue: if I source a file to load functions within parallel mode, the function seems to use a different environment than when I declare the function within parallel mode directly.
To make things a little clearer, let's consider the following two scripts:
q_func.R declares the function
foo.bar <- function(x, envname) assign("val", x, envir = get(envname))
# assigns the value x to the variable "val" in the environment envname
q_snowfall.R main function that uses snowfall
library(snowfall)
SnowFunc <- function(envname) {
  # load the functions
  # Option 1 not working
  source("q_func.R")
  # Option 2 working...
  # foo.bar <- function(x, envname) assign("val", x, envir = get(envname))

  # create the new environment
  assign(envname, new.env())
  # use the function as declared in q_func.R
  # to assign random numbers to the new env
  foo.bar(x = rnorm(1), envname = envname)
  # return the environment including the random values
  return(get("val", envir = get(envname)))
}
sfInit(parallel = TRUE, cpus = 2)
# create environment 'a' and 'b' that each will get a new variable
# called 'val' that gets assigned a random value
envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()
If I execute the script "q_snowfall.R" I get the error
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'a' not found
However, if I use the second option (declaring the function within SnowFunc directly), the error disappears.
Do you know how snowfall handles the different environments? Or do you even have a solution for the issue? (Note that q_func.R actually runs to some 100 lines of code, so I would prefer to keep it in a separate file; "just keep option 2" is not a solution!)
Thank you very much!
Edit
If I change all get(envname) calls to get(envname, envir = globalenv()), it seems to work. But this feels more like a workaround than a proper snowfall solution.
I think the issue is not with snowfall but with the fact that you're passing the environment by name (as a character string). You don't need to change all occurrences of get(), and having it look in the global environment may indeed be unsafe.
It is sufficient to change the get call in foo.bar to look in parent.frame() instead (i.e., the environment from which foo.bar was called). The following worked on my machine.
new q_func.R
foo.bar <- function(x, envname) {
  assign("val", x, envir = get(envname, pos = parent.frame()))
}
(not so) new q_snowfall.R
library(snowfall)
SnowFunc <- function(envname) {
  assign(envname, new.env())
  foo.bar(x = rnorm(1), envname = envname)
  return(get("val", envir = get(envname)))
}
source("q_func.R")
sfInit(parallel = TRUE, cpus = 2)
sfExport("foo.bar")
envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()
Note also that I source()d the file before starting the cluster and used sfExport() to export foo.bar to each node.
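As an aside, snowfall also provides sfSource(), which sources a file on every node; it might serve as an alternative to the source() + sfExport() pair (untested sketch):
sfInit(parallel = TRUE, cpus = 2)
sfSource("q_func.R")  # runs q_func.R on each node, defining foo.bar there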