For my work, I run scripts on virtual machines on a computer cluster. These jobs are typically large and produce a lot of output. What I'd like to do is run a script via the terminal and, at the end, have the script create a duplicate of itself containing every line that was part of the script (minus the last one, if necessary). This is vital for replicability and debugging in my work, because I often can't tell which parameters or variables a particular job used: I submit the same script repeatedly with slightly different parameters, and the folders can't be version controlled.
Imagine this file test.R:
a <- rnorm(100)
#test
# Saving history for reproducibility to folder
savehistory(file = 'test2.R')
Now, I run this via the console on my virtual node and get the following error:
[XX home]$ Rscript test.R
Error in .External2(C_savehistory, file) : no history available to save
Calls: savehistory
Execution halted
Is there any command like savehistory() that works inside a script that is simply executed?
The desired outcome is that a file called test2.R is saved, containing this:
a <- rnorm(100)
#test
# Saving history for reproducibility to folder
You could copy the file instead. Since you're using Rscript, the script name is included in commandArgs() in the form --file=test.R. A simple function like this will return the path of the executing script:
get_filename <- function() {
  c_args <- commandArgs()
  r_file <- c_args[grepl("\\.R$", c_args, ignore.case = TRUE)]
  r_file <- gsub("--file=", "", r_file)
  r_file <- normalizePath(r_file)
  return(r_file)
}
Then you can copy the file as you see fit. For example, appending ".backup":
script_name <- get_filename()
backup_name <- paste0(script_name, ".backup")
file.copy(script_name, backup_name)
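If you want the copy to be created by test.R itself, as in the original example, a minimal sketch (assuming the script is always launched with Rscript, so that --file= appears in commandArgs()) would be:
# Last line of test.R: write a copy of the running script as test2.R
file.copy(get_filename(), "test2.R", overwrite = TRUE)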
I'm using "studio (preview)" from Microsoft Azure Machine Learning to create a pipeline that applies machine learning to a dataset in a blob storage that is connected to our data warehouse.
In the "Designer", an "Exectue R Script" action can be added to the pipeline. I'm using this functionality to execute some of my own machine learning algorithms.
I've got a 'hello world' version of this script working (including using the "script bundle" to load the functions in my own R files). It applies a very simple manipulation (compute the days difference with the date in the date column and 'today'), and stores the output as a new file. Given that the exported file has the correct information, I know that the R script works well.
The script looks like this:
# R version: 3.5.1
# The script MUST contain a function named azureml_main
# which is the entry point for this module.
# The entry point function can contain up to two input arguments:
# Param<medals>: a R DataFrame
# Param<matches>: a R DataFrame
azureml_main <- function(dataframe1, dataframe2){
  message("STARTING R script run.")
  # If a zip file is connected to the third input port, it is
  # unzipped under "./Script Bundle". This directory is added
  # to sys.path.
  message('Adding functions as source...')
  if (FALSE) {
    # This works...
    source("./Script Bundle/first_function_for_script_bundle.R")
  } else {
    # And this works as well!
    message('Sourcing all available functions...')
    functions_folder = './Script Bundle'
    list.files(path = functions_folder)
    list_of_R_functions <- list.files(path = functions_folder, pattern = "^.*[Rr]$", include.dirs = FALSE, full.names = TRUE)
    for (fun in list_of_R_functions) {
      message(sprintf('Sourcing <%s>...', fun))
      source(fun)
    }
  }
  message('Executing R pipeline...')
  dataframe1 = calculate_days_difference(dataframe = dataframe1)
  # Return datasets as a Named List
  return(list(dataset1=dataframe1, dataset2=dataframe2))
}
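The calculate_days_difference() function itself lives in the script bundle and isn't shown in the question; purely as an illustration, such a function might look like the sketch below (the column name date_column is an assumption, not the real schema):
# Hypothetical version of the bundled helper: adds a column with the number
# of days between an assumed date_column and today's date
calculate_days_difference <- function(dataframe) {
  dataframe$days_difference <- as.numeric(Sys.Date() - as.Date(dataframe$date_column))
  dataframe
}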
Although I do print some messages in the R script, I haven't been able to find the "stdoutlogs" or "stderrlogs" that should contain these printed messages.
I need the printed messages for 1) information on how the analysis went and, most importantly, 2) debugging in case the code fails.
Now, I have found (in multiple locations) the files "stdoutlogs.txt" and "stderrlogs.txt". These can be found under "Logs" when I click on "Execute R Script" in the "Designer".
I can also find "stdoutlogs.txt" and "stderrlogs.txt" files under "Experiments" when I click on a finished "Run" and then both under the tab "Outputs" and under the tab "Logs".
However... all of these files are empty.
Can anyone tell me how I can print messages from my R Script and help me locate where I can find the printed information?
Can you please click on the "Execute R Script" module and download the 70_driver.log? I tried message("STARTING R script run.") in an R sample and could find the output there.
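In general, it also helps to know which stream each R function writes to, since that determines whether a line would show up in a stdout log or a stderr log: message() and warning() write to stderr, while print() and cat() write to stdout. A quick illustration:
# Written to stdout (so expected in a stdout log)
cat("hello from cat\n")
print("hello from print")

# Written to stderr (so expected in a stderr log)
message("hello from message")
warning("hello from warning")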
I would like to set the working directory to the path of the current script programmatically, but first I need to get the path of the current script.
So I would like to be able to do:
current_path = ...retrieve the path of current script ...
setwd(current_path)
Just like the RStudio menu option Session > Set Working Directory > To Source File Location does.
So far I tried:
initial.options <- commandArgs(trailingOnly = FALSE)
file.arg.name <- "--file="
script.name <- sub(file.arg.name, "", initial.options[grep(file.arg.name, initial.options)])
script.basename <- dirname(script.name)
script.name returns NULL
source("script.R", chdir = TRUE)
Returns:
Error in file(filename, "r", encoding = encoding) : cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
  cannot open file '/script.R': No such file or directory
dirname(parent.frame(2)$ofile)
Returns: Error in dirname(parent.frame(2)$ofile) : a character vector argument expected
...because parent.frame is null
frame_files <- lapply(sys.frames(), function(x) x$ofile)
frame_files <- Filter(Negate(is.null), frame_files)
PATH <- dirname(frame_files[[length(frame_files)]])
Returns: NULL, because frame_files is a list of length 0
thisFile <- function() {
  cmdArgs <- commandArgs(trailingOnly = FALSE)
  needle <- "--file="
  match <- grep(needle, cmdArgs)
  if (length(match) > 0) {
    # Rscript
    return(normalizePath(sub(needle, "", cmdArgs[match])))
  } else {
    # 'source'd via R console
    return(normalizePath(sys.frames()[[1]]$ofile))
  }
}
Returns: Error in path.expand(path) : invalid 'path' argument
Also I saw all answers from here, here, here and here.
No joy.
Working with RStudio 1.1.383
EDIT: It would be great if there was no need for an external library to achieve this.
In RStudio, you can get the path to the file currently shown in the source pane using
rstudioapi::getSourceEditorContext()$path
If you only want the directory, use
dirname(rstudioapi::getSourceEditorContext()$path)
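To address the original goal of setting the working directory, you could then pass that directory to setwd() (this assumes the code is being run inside RStudio with the script open in the source pane):
# Set the working directory to the folder of the file shown in the source pane
setwd(dirname(rstudioapi::getSourceEditorContext()$path))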
If you want the name of the file that's been run by source(filename), that's a little harder. You need to look for the variable srcfile somewhere back in the stack. How far back depends on how you write things, but it's around 4 steps back: for example,
fi <- tempfile()
writeLines("f()", fi)
f <- function() print(sys.frame(-4)$srcfile)
source(fi)
fi
should print the same thing on the last two lines.
Update March 2019
Based on Alexis Lucattini's and user2554330's answers, this version works both on the command line and in RStudio. It also resolves the "as_tibble" deprecation message.
library(tidyverse)
getCurrentFileLocation <- function() {
  this_file <- commandArgs() %>%
    tibble::enframe(name = NULL) %>%
    tidyr::separate(col = value, into = c("key", "value"), sep = "=", fill = "right") %>%
    dplyr::filter(key == "--file") %>%
    dplyr::pull(value)
  if (length(this_file) == 0) {
    this_file <- rstudioapi::getSourceEditorContext()$path
  }
  return(dirname(this_file))
}
TLDR: The here package (available on CRAN) helps you build a path from a project's root directory. R projects configured with here() can be shared with colleagues working on different laptops or servers and paths built relative to the project's root directory will still work. The development version is at github.com/r-lib/here.
With git
You certainly store your R code in a directory. This directory is probably part of a git repository and/or an RStudio project. I would recommend building all paths relative to that project's root directory. For example, let's say that you have an R script that creates reusable plotting functions and an R Markdown notebook that loads that script and plots graphs in a nice (so nice) document. The project tree would look something like this:
├── notebooks
│   ├── analysis.Rmd
├── R
│   ├── prepare_data.R
│   ├── prepare_figures.R
From the analysis.Rmd notebook, you would import the plotting functions with here() like so:
source(file.path(here::here("R"), "prepare_figures.R"))
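Since here() accepts multiple path components, the same call can also be written without file.path(); this is just an equivalent form:
source(here::here("R", "prepare_figures.R"))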
Why?
Hadley Wickham in a Stack Overflow comment:
"You should never use setwd() in R code - it basically defeats the idea of
using a working directory because you can no longer easily move your code
between computers. – hadley Nov 20 '10 at 23:44 "
From the Ode to the here package:
Do you:
Have setwd() in your scripts? PLEASE STOP DOING THAT.
This makes your script very fragile, hard-wired to exactly one time and place. As soon as you rename or move directories, it breaks. Or maybe you get a new computer? Or maybe someone else needs to run your code?
[...]
Classic problem presentation: Awkwardness around building paths and/or setting working directory in projects with subdirectories. Especially if you use R Markdown and knitr, which trips up a lot of people with its default behavior of “working directory = directory where this file lives”. [...]
Install the here package:
install.packages("here")
library(here)
here()
here("construct","a","path")
Documentation of the here() function:
Starting with the current working directory during package load time,
here will walk the directory hierarchy upwards until it finds
a directory that satisfies at least one of the following conditions:
contains a file matching [.]Rproj$ with contents matching ^Version: in
the first line
[... other options ...]
contains a directory .git
Once established, the root directory doesn't change during the active
R session. here() then appends the arguments to the root directory.
The development version of the here package is available on GitHub.
What about files outside the project directory?
If you are loading or sourcing files outside the project directory, the recommended way is to use an environment variable at the Operating System level. Other users of your R code on different laptops or servers would need to set the same environment variable. The advantage is that it is portable.
data_path <- Sys.getenv("PROJECT_DATA")
df <- read.csv(file.path(data_path, "file_name.csv"))
Note: There is a long list of environment variables which can affect an R session.
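For a quick experiment you can also set such a variable from within R itself; for real portability it belongs outside R (shell profile, .Renviron, or the scheduler's configuration). The variable name PROJECT_DATA comes from the example above and the path below is purely illustrative:
# Only for interactive experimentation; normally the variable is defined
# outside R so that every user or machine can point it somewhere appropriate
Sys.setenv(PROJECT_DATA = "/path/to/shared/data")
Sys.getenv("PROJECT_DATA")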
What about many projects sourcing each other?
It's time to create an R package.
If you're running an Rscript through the command line, etc.:
Rscript /path/to/script.R
The function below will assign this_file to /path/to/script.R:
library(tidyverse)
get_this_file <- function() {
  commandArgs() %>%
    tibble::enframe(name = NULL) %>%
    tidyr::separate(
      col = value, into = c("key", "value"), sep = "=", fill = "right"
    ) %>%
    dplyr::filter(key == "--file") %>%
    dplyr::pull(value)
}
this_file <- get_this_file()
print(this_file)
Here is a custom function to obtain the path of a file in R, RStudio, or from an Rscript:
stub <- function() {}

thisPath <- function() {
  cmdArgs <- commandArgs(trailingOnly = FALSE)
  if (length(grep("^-f$", cmdArgs)) > 0) {
    # R console option
    normalizePath(dirname(cmdArgs[grep("^-f", cmdArgs) + 1]))[1]
  } else if (length(grep("^--file=", cmdArgs)) > 0) {
    # Rscript/R console option
    scriptPath <- normalizePath(dirname(sub("^--file=", "", cmdArgs[grep("^--file=", cmdArgs)])))[1]
  } else if (Sys.getenv("RSTUDIO") == "1") {
    # RStudio
    dirname(rstudioapi::getSourceEditorContext()$path)
  } else if (is.null(attr(stub, "srcref")) == FALSE) {
    # 'source'd via R console
    dirname(normalizePath(attr(attr(stub, "srcref"), "srcfile")$filename))
  } else {
    stop("Cannot find file path")
  }
}
https://gist.github.com/jasonsychau/ff6bc78a33bf3fd1c6bd4fa78bbf42e7
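A typical use of such a function is to build paths relative to the script instead of relative to the working directory; the file name helpers.R below is only a placeholder:
# Source a file that sits next to the current script, regardless of where
# the script is launched from ("helpers.R" is a hypothetical sibling file)
source(file.path(thisPath(), "helpers.R"))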
Another option to get the current script path is funr::get_script_path(), and you don't need to run your script in RStudio.
I had trouble with all of these because they rely on libraries that I couldn't use (because of packrat) until after setting the working directory (which was why I needed to get the path to begin with).
So, here's an approach that just uses base R. (EDITED to handle Windows \ characters in addition to / in paths)
args <- commandArgs()
scriptName <- args[substr(args, 1, 7) == '--file=']

if (length(scriptName) == 0) {
  scriptName <- rstudioapi::getSourceEditorContext()$path
} else {
  scriptName <- substr(scriptName, 8, nchar(scriptName))
}

pathName <- substr(
  scriptName,
  1,
  nchar(scriptName) - nchar(strsplit(scriptName, '.*[/|\\]')[[1]][2])
)
If you don't want to use (or have to remember) any code, simply hover over the script's tab in RStudio and the full path will appear.
The following solves the problem for three cases: the RStudio Source button, the RStudio R console (source(...), if the file is still open in the Source pane), or the OS console via Rscript:
this_file <- gsub("--file=", "", commandArgs()[grepl("--file", commandArgs())])
if (length(this_file) > 0) {
  wd <- paste(head(strsplit(this_file, '[/|\\]')[[1]], -1), collapse = .Platform$file.sep)
} else {
  wd <- dirname(rstudioapi::getSourceEditorContext()$path)
}
print(wd)
The following code gives the directory of the running script if you are running it either from RStudio or from the command line using the Rscript command:
if (rstudioapi::isAvailable()) {
  if (require('rstudioapi') != TRUE) {
    install.packages('rstudioapi')
  } else {
    library(rstudioapi)  # load it
  }
  wdir <- dirname(getActiveDocumentContext()$path)
} else {
  wdir <- getwd()
}
setwd(wdir)
I teach a lab and I have my students write their answers in .Rmd files. For grading, I download and render them as PDFs in a batch. I use the following script to render everything and save the output.
library(rmarkdown)
# Handy functions for concatenating strings because I want to do it like a Python
# programmer damnit!
`%s%` <- function(x,y) {paste(x,y)}
`%s0%` <- function(x,y) {paste0(x,y)}
# You should set the working directory to the one where the assignments are
# located. Also, make sure ONLY .rmd files are there; anything else may cause
# a problem.
subs <- list.files(getwd()) # Get list of files in working directory
errorfiles <- c() # A vector for names of files that produced errors
for (f in subs) {
  print(f)
  tryCatch({
    # Try to turn the document into a PDF file and save it in a pdfs subdirectory
    # (you don't need to make the subdirectory; it will be created automatically
    # if it does not exist).
    render(f, pdf_document(), output_dir = getwd() %s0% "/pdfs")
  },
  # If an error happens, complain with the file name, then save it in errorfiles
  error = function(c) {
    warning("File" %s% f %s% "did not render!")
    warning(c)
    errorfiles <<- c(errorfiles, f)
  })
}
For this last assignment I forgot to set error=TRUE in the chunks, so documents will fail to compile if errors are found, and I will have to hunt those errors down and fix them. I tried to modify this code so that error=TRUE is set as the default from outside the documents. Unfortunately, I've been working at this for hours and have found no way to do so.
How can I change this code so I can change this parameter outside the documents? (Bear in mind that I don't own the computer so I cannot install anything, but the packages knitr and rmarkdown are installed.)
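One avenue to explore (a minimal sketch, assuming the documents do not override the option in their own setup chunks, and using only the already-installed knitr package) is to change knitr's default chunk options in the rendering session before the loop:
library(knitr)

# Make error = TRUE the session-wide default for all chunks, so a failing
# chunk shows the error in the rendered PDF instead of aborting the render.
# A document that explicitly sets error = FALSE in its own setup chunk
# would still take precedence over this default.
knitr::opts_chunk$set(error = TRUE)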
Sourcing files using a relative path is useful when dealing with large codebases. Other programming languages have well-defined mechanisms for sourcing files using a path relative to the directory of the file being sourced into. An example is Ruby's require_relative. What is a good way to implement relative path sourcing in R?
Below is what I pieced together a while back using various recipes and R forum posts. It's worked well for me for straight development but is not robust. For example, it breaks when the files are loaded via the testthat library, specifically auto_test(). rscript_stack() returns character(0).
# Returns the stack of RScript files
rscript_stack <- function() {
  Filter(Negate(is.null), lapply(sys.frames(), function(x) x$ofile))
}

# Returns the current RScript file path
rscript_current <- function() {
  stack <- rscript_stack()
  r <- as.character(stack[length(stack)])
  first_char <- substring(r, 1, 1)
  if (first_char != '~' && first_char != .Platform$file.sep) {
    r <- file.path(getwd(), r)
  }
  r
}

# Sources relative to the current script
source_relative <- function(relative_path, ...) {
  source(file.path(dirname(rscript_current()), relative_path), ...)
}
Do you know of a better source_relative implementation?
After a discussion with @hadley on GitHub, I realized that my question goes against the common development patterns in R.
It seems that in R, files that are sourced often assume that the working directory (getwd()) is set to the directory they are in. To make this work, source has a chdir argument whose default value is FALSE. When set to TRUE, it will change the working directory to the directory of the file being sourced.
In summary:
Assume that source is always relative because the working directory of the file being sourced is set to the directory where the file is.
To make this work, always set chdir=T when you source files from another directory, e.g., source('lib/stats/big_stats.R', chdir=T).
For convenient sourcing of entire directories in a predictable way I wrote sourceDir, which sources files in a directory in alphabetical order.
sourceDir <- function(path, pattern = "\\.[rR]$", env = NULL, chdir = TRUE) {
  files <- sort(dir(path, pattern, full.names = TRUE))
  lapply(files, source, chdir = chdir)
}
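Usage is then just a matter of pointing it at a directory, for example the lib/stats folder from above:
# Source every .R/.r file under lib/stats, each with the working directory
# temporarily set to that folder while it is being sourced
sourceDir("lib/stats")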