testthat error on check() but not on test() because of ~/.Rprofile?

EDIT:
Is it possible that ~/.Rprofile is not loaded within check()? It looks like my whole process fails because ~/.Rprofile is not loaded.
DONE EDIT
I have a strange problem on automated testing with testthat. Actually, when I test my package with test() everything works fine. But when I test with check() I get an error message.
The error message says:
1. Failure (at test_DML_create_folder_start_MQ_script.R#43): DML create folder start MQ Script works with "../DML_IC_MQ_DATA/dummy_data" data
capture.output(messages <- source(basename(script_file))) threw an error
Error in sprintf("%s folder got created for each raw file.", subfolder_prefix) :
object 'subfolder_prefix' not found
Before this error I source a script which defines the subfolder_prefix variable and I guess this is why it works in the test() case. But I expected to get this running in the check() function as well.
I will post the complete test script here; I hope it is not too complicated:
library(testthat)
context("testing DML create folder and start MQ script")
test_dir <- 'dml_ic_mq_test'
start_dir <- getwd()
# list of test file folders
data_folders <- list.dirs('../DML_IC_MQ_DATA', recursive=FALSE)
for(folder in data_folders) { # for each folder with test files
  dir.create(test_dir)
  setwd(test_dir)
  script_file <- a.DML_prepare_IC.script(dbg_level=Inf) # returns filename I will source
  test_that(sprintf('we could copy all files from "%s".',
                    folder), {
    expect_that(
      all(file.copy(list.files(file.path('..',folder), full.names=TRUE),
                    '.',
                    recursive=TRUE)),
      is_true())
  })
  test_that(sprintf('DML create folder start MQ Script works with "%s" data', folder), {
    expect_that(capture.output(messages <- source(basename(script_file))),
                not(throws_error()))
  })
  # escape the dot so that only files ending in ".raw" are counted
  count_rawfiles <- length(list.files(pattern='\\.raw$'))
  created_folders <- list.dirs(recursive=FALSE)
  test_that(sprintf('%s folder got created for each raw file.',
                    subfolder_prefix), {
    expect_equal(length(grep(subfolder_prefix, created_folders)),
                 count_rawfiles)
  })
  setwd(start_dir)
  unlink(test_dir, recursive=TRUE)
}
In my script I define the variable subfolder_prefix <- 'IC_', and within the test I check that one folder is created for each raw file. This is what my script should do.
So as I said, I am not sure how to debug this problem here since test() works but check() fails during the testthat run.

Now that I know to look in devtools, we can find the answer. Per the docs, check "automatically builds and checks a source package, using all known best practices". That includes ignoring .Rprofile. It looks like check calls build, and all of that work is done in a separate (clean) R session. In contrast, test appears to use your currently running session (in a new environment).
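To see what a "clean" session looks like without running a full check(), one option is to start a child R process with --vanilla and ask it whether the variable exists. This is only a rough sketch that mimics the "no .Rprofile" aspect described above; it is not what devtools does internally, and subfolder_prefix is just the variable name from the question.

```r
# Path to the Rscript binary that belongs to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Expression to evaluate in the child session.
expr <- 'cat(exists("subfolder_prefix"))'

# --vanilla skips ~/.Rprofile, the site profile, .Renviron and saved workspaces,
# so anything that only exists because of those files will be missing here.
clean_result <- system2(rscript, c("--vanilla", "-e", shQuote(expr)), stdout = TRUE)
print(clean_result)
```

If the child session reports that the variable does not exist while your interactive session has it, the tests are probably relying on state that a clean session never sees.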

Related

using rstudioapi in devtools tests

I'm making a package which contains a function that calls rstudioapi::jobRunScript(), and I would like to be able to write tests for this function that can be run normally by devtools::test(). The package is only intended for use during interactive RStudio sessions.
Here's a minimal reprex:
After calling usethis::create_package() to initialize my package, and then usethis::use_r("rstudio") to create R/rstudio.R, I put:
foo_rstudio <- function(...) {
script.file <- tempfile()
write("print('hello')", file = script.file)
rstudioapi::jobRunScript(
path = script.file,
name = "foo",
importEnv = FALSE,
exportEnv = "R_GlobalEnv"
)
}
I then call use_test() to make an accompanying test file, in which I put:
test_that("foo works", {
foo_rstudio()
})
I then run devtools::test() and get:
I think I understand the basic problem here: devtools runs a separate R session for the tests, and that session doesn't have access to RStudio. I see here that rstudioapi can work inside child R sessions, but seemingly only those "normally launched by RStudio."
I'd really like to use devtools to test my function as I develop it. I suppose I could modify my function to accept an argument passed from the test code which will simply run the job in the R session itself or in some other kind of child R process, instead of an RStudio job, but then I'm not actually testing the normal intended functionality, and if there's an issue which is specific to the rstudioapi::jobRunScript() call and which could occur during normal use, then my tests wouldn't be able to pick it up.
Is there a way to initialize an RStudio process from within a devtools::test() session, or some other solution here?

How to call a parallelized script from command prompt?

I'm running into this issue and I for the life of me can't figure out how to solve it.
Quick summary before example:
I have several hundred data sets from which I want to create reports every day. In order to do this efficiently, I parallelized the process with doParallel. From within RStudio, the process works fine, but when I try to automate it via Task Scheduler on Windows, I can't seem to get it to work.
The process within RStudio is:
I call a script that sources all of my other scripts. Each individual script has a header section that performs the appropriate package imports. The sourcing script looks like this:
get_files <- function(){
  path <- get_files.create_path()
  # source every regular file (not subdirectories) found in the directory;
  # path is assumed to end with a path separator
  for(file in list.files(path)){
    if(!(file.info(paste0(path, file))[['isdir']])){
      source(paste0(path, file))
    }
  }
}
get_files.create_path <- function(){
  return(<path to directory>)
}
# self call
get_files()
With "Source on Save" enabled, this brings everything I need into the .GlobalEnv.
From there, I can simply type parallel_report(), which sources another script that contains the parallelized report generation. A while back there was an issue with calling the parallelization directly (I wonder if this is related?), so I had to put the doParallel code in a script that is not wrapped in a function. That script cannot be brought in by the get_files script, because sourcing it would start the report generation every time I loaded everything. So I saved it elsewhere, in its own script, to be called when necessary. The parallel_report() function is simply:
parallel_report <- function(){
source(<path to script>)
}
Then the script that is sourced is the real parallelization script, and would look something like:
doParallel::registerDoParallel(cl = (parallel::detectCores() - 1))
foreach(name = report.list$names,
.packages = c('tidyverse', 'knitr', 'lubridate', 'stringr', 'rmarkdown'),
.export = c('generate_report'),
.errorhandling = 'remove') %dopar% {
tryCatch(expr = {
generate_report(name)
}, error = function(e){
error_handler(error = e, caller = paste0("generate report for ", name, " from parallel"), line = 28)
})
}
doParallel::stopImplicitCluster()
The generate_report function is simply an .Rmd and render() caller:
generate_report <- function(<arguments>){
#stuff
generate_report.render(<arguments>)
#stuff
}
generate_report.render <- function(<arguments>){
rmarkdown::render(
paste0(data.information@location, 'report_generator.Rmd'),
params = list(
name = name,
date = date,
thoughts = thoughts,
auto = auto),
output_file = paste0(str_to_upper(stock), '_report_', str_remove_all(date, '-'))
)
}
So to recap, in RStudio I would simply perform the following:
1 - Source the script on save to bring everything in
2 - Type parallel_report()
2.a - this directly runs the doParallel parallelization of generate_report
2.b - generate_report calls an .Rmd file that contains the function calls needed to produce the reports
And the process starts and successfully completes without a hitch.
In order to make the situation automatic via the Task Scheduler, I made a script that the Task Scheduler can call, named automatic_caller:
source(<path to the get_files script>) # this brings in all the scripts and data into the global, just
# as if it were being done manually
tryCatch(
expr = {
parallel_report()
}, error = function(e){
error_handler(error = e, caller = "parallel_report from automatic_callng", line = 39)
})
The error_handler function is just an in-house script used to log errors throughout.
So in the Task Scheduler task I have Rscript.exe called with the automatic_caller script as its argument. Everything within the automatic_caller script works except for the report generation.
The process completes almost immediately, and the only output I get is an error:
"pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available)."
But rmarkdown is listed in the .packages argument of the foreach call, it is loaded in the scripts that use it explicitly, and in the actual generate_report it is called directly via rmarkdown::render().
So - I am at a complete loss.
Thoughts and suggestions would be completely appreciated.
So pandoc is apparently an executable that converts files from one format to another. RStudio comes with its own pandoc executable, so when running the scripts from RStudio, it knew where to look when pandoc was required.
From the command prompt, the system did not know to look inside RStudio, so simply installing pandoc as a standalone executable gives the system the proper pointer.
Downloaded pandoc and everything works fine.
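A scheduled script can also check for pandoc up front instead of failing inside render(). The sketch below is only one possible approach: ensure_pandoc is a hypothetical helper name, and pandoc_dir is a machine-specific directory you would have to supply yourself. It relies on rmarkdown::pandoc_available() and on rmarkdown consulting the RSTUDIO_PANDOC environment variable when it searches for pandoc.

```r
# Returns TRUE if rmarkdown can find a recent enough pandoc, optionally after
# pointing it at `pandoc_dir` (hypothetical: the folder that contains the
# pandoc executable on your machine, e.g. the one bundled with RStudio).
ensure_pandoc <- function(pandoc_dir = NULL, min_version = "1.12.3") {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    return(FALSE)
  }
  if (rmarkdown::pandoc_available(min_version)) {
    return(TRUE)
  }
  if (!is.null(pandoc_dir)) {
    # rmarkdown looks at this environment variable when locating pandoc
    Sys.setenv(RSTUDIO_PANDOC = pandoc_dir)
  }
  rmarkdown::pandoc_available(min_version)
}

pandoc_ok <- ensure_pandoc()
if (!pandoc_ok) message("pandoc not found; install it or pass pandoc_dir")
```

Calling this at the top of the script run by Task Scheduler makes the missing dependency visible before any parallel workers start.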

How to test for folder existence using testthat in R

I have a function that will set up some folders for the rest of my workflow
library(testthat)
analysisFolderCreation<-function(projectTitle=NULL,Dated=FALSE,destPath=getwd(),SETWD=FALSE){
  stopifnot(length(projectTitle)>0,is.character(projectTitle),is.logical(Dated),is.logical(SETWD))
  # scrub any characters that might cause trouble
  projectTitle<-gsub("[[:space:]]|[[:punct:]]","",projectTitle)
  rootFolder<-file.path(destPath,projectTitle)
  # file.path() returns character(0) if any argument is NULL, so branch explicitly
  executionFolder<-if (Dated) file.path(rootFolder,format(Sys.Date(),"%Y%m%d")) else rootFolder
  subfolders<-c("rawdata","intermediates","reuse","log","scripts","results")
  dir.create(path=executionFolder, recursive=TRUE)
  sapply(file.path(executionFolder,subfolders),dir.create)
  # argument names are case-sensitive: the parameter is SETWD, not Setwd
  if(SETWD) setwd(executionFolder)
}
I am trying to unit test it and my error tests work fine:
test_that("analysisFolderCreation: Given incorrect inputs, error is thrown",
{
# scenario: No arguments provided
expect_that(analysisFolderCreation(),throws_error())
})
But my tests for success, do not...
test_that("analysisFolderCreation: Given correct inputs the function performs correctly",
{
# scenario: One argument provided - new project name
analysisFolderCreation(projectTitle="unittest")
expect_that(file.exists(file.path(getwd(),"unittest","log")),
is_true())
})
Errors with
Error: analysisFolderCreation: Given correct inputs the function performs correctly ---------------------------------------------------------------
could not find function "analysisFolderCreation"
As I am checking for a folder's existence, I'm unsure how to go about testing this in an expectation format that includes the function analysisFolderCreation actually inside it.
I am running in dev_mode() and executing the test file explicitly with test_file()
Is anyone able to provide a way of rewriting the test to work, or provide an existence checking expectation?
The problem appeared to be the use of test_file(). Unlike test_file(), running test() over the whole suite of unit tests does not require the function to already exist in the workspace.
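For the second part of the question, an existence check can be written directly as an expectation. The following is a minimal sketch, assuming a simplified stand-in function (create_project_dirs is hypothetical, not the original analysisFolderCreation) that is defined in the same file as the test, so it is visible no matter how the file is run. It writes into a temporary directory rather than getwd() so the test leaves nothing behind.

```r
library(testthat)

# Hypothetical, simplified stand-in for the function under test.
create_project_dirs <- function(project_title, dest_path) {
  root <- file.path(dest_path, project_title)
  subfolders <- c("rawdata", "log")
  for (d in file.path(root, subfolders)) {
    dir.create(d, recursive = TRUE)
  }
  invisible(root)
}

test_that("project folders are created", {
  dest <- tempfile("unittest_")
  root <- create_project_dirs("unittest", dest_path = dest)
  # dir.exists() is TRUE only for directories, unlike file.exists()
  expect_true(dir.exists(file.path(root, "log")))
  expect_true(dir.exists(file.path(root, "rawdata")))
  unlink(dest, recursive = TRUE)
})
```

When the function lives in a package instead, loading the package first (for example with devtools::load_all()) before calling test_file() should make it visible in the same way.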

How to make "resident folder" to be the working directory? [duplicate]

Is there a way to programmatically find the path of an R script inside the script itself?
I am asking this because I have several scripts that use RGtk2 and load a GUI from a .glade file.
In these scripts I am obliged to put a setwd("path/to/the/script") instruction at the beginning, otherwise the .glade file (which is in the same directory) will not be found.
This is fine, but if I move the script in a different directory or to another computer I have to change the path. I know, it's not a big deal, but it would be nice to have something like:
setwd(getScriptPath())
So, does a similar function exist?
This works for me:
getSrcDirectory(function(x) {x})
This defines an anonymous function (that does nothing) inside the script, and then determines the source directory of that function, which is the directory where the script is.
For RStudio only:
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
This works when running or sourcing your file.
Use source("yourfile.R", chdir = TRUE)
Exploit the implicit "--file" argument of Rscript
When calling the script using "Rscript" (Rscript doc) the full path of the script is given as a system parameter. The following function exploits this to extract the script directory:
getScriptPath <- function(){
cmd.args <- commandArgs()
m <- regexpr("(?<=^--file=).+", cmd.args, perl=TRUE)
script.dir <- dirname(regmatches(cmd.args, m))
if(length(script.dir) == 0) stop("can't determine script dir: please call the script with Rscript")
if(length(script.dir) > 1) stop("can't determine script dir: more than one '--file' argument detected")
return(script.dir)
}
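A variant of the same idea that is easier to try out takes the argument vector as a parameter, so it can be fed a made-up command line. This is a sketch only: script_dir_from_args is a hypothetical name, and the example arguments are invented for illustration.

```r
# Extract the script directory from a command-line argument vector.
# Returns NULL when no --file= argument is present (e.g. interactive use).
script_dir_from_args <- function(args = commandArgs(trailingOnly = FALSE)) {
  # Ignore everything after --args: those are user arguments, not R's own.
  idx <- match("--args", args)
  if (!is.na(idx)) {
    args <- args[seq_len(idx - 1)]
  }
  file_args <- grep("^--file=", args, value = TRUE)
  if (length(file_args) == 0) {
    return(NULL)
  }
  # Use the last --file= argument and strip the prefix before taking dirname.
  dirname(sub("^--file=", "", file_args[length(file_args)]))
}

# Invented command line, roughly what Rscript passes on to R:
example_args <- c("R", "--no-echo", "--file=/tmp/project/run.R", "--args", "x")
example_dir <- script_dir_from_args(example_args)
print(example_dir)
```

Returning NULL instead of calling stop() lets the caller decide on a fallback, such as the working directory.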
If you wrap your code in a package, you can always query parts of the package directory.
Here is an example from the RGtk2 package:
> system.file("ui", "demo.ui", package="RGtk2")
[1] "C:/opt/R/library/RGtk2/ui/demo.ui"
>
You can do the same with a directory inst/glade/ in your sources which will become a directory glade/ in the installed package -- and system.file() will compute the path for you when installed, irrespective of the OS.
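To get a feel for how system.file() behaves without building a package first, it can be tried against a package that ships with R. This is a small sketch using the stats package; the file names are only for illustration.

```r
# A file that ships with every R installation: the DESCRIPTION of 'stats'.
found <- system.file("DESCRIPTION", package = "stats")

# A path that does not exist inside the package.
missing_file <- system.file("no", "such", "file.glade", package = "stats")

print(file.exists(found))   # the returned path points at a real file
print(nzchar(missing_file)) # an empty string signals "not found"
```

Because a missing file yields "" rather than an error, it is worth checking the result (or passing mustWork = TRUE) before handing the path to a GUI loader.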
This answer works fine for me:
script.dir <- dirname(sys.frame(1)$ofile)
Note: the script must be sourced for this to return the correct path.
I found it at: https://support.rstudio.com/hc/communities/public/questions/200895567-can-user-obtain-the-path-of-current-Project-s-directory-
But I still don't understand what sys.frame(1)$ofile is. I couldn't find anything about it in the R documentation. Can someone explain it?
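On the question of what sys.frame(1)$ofile is: sys.frame(1) is the environment of the first function call on the call stack, which is the source() call when the script is sourced from the top level. ofile is a local variable inside base::source() that holds the file argument as it was passed in. It is an implementation detail rather than a documented interface, which is why the help pages do not mention it. The sketch below, using a throwaway temporary script, looks up the frame that belongs to source() instead of assuming it is frame 1, so it also behaves when source() is nested.

```r
# A throwaway script that records what it finds in the frame of source().
script <- tempfile(fileext = ".R")
writeLines(c(
  "n <- sys.nframe()",
  "for (i in seq_len(n)) {",
  "  if (identical(sys.function(i), base::source)) {",
  "    recorded_ofile <- sys.frame(i)$ofile",
  "  }",
  "}"
), script)

# source() evaluates the script in the global environment by default,
# so recorded_ofile is available there afterwards.
source(script)
print(identical(recorded_ofile, script))

# 'ofile' really is just a variable used inside source() itself.
print("ofile" %in% all.vars(body(base::source)))
```

Since this depends on the internals of source(), it could stop working if those internals change.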
#' current script dir
#' @param
#' @return
#' @examples
#' works with source() or in RStudio Run selection
#' @export
z.csd <- function() {
# http://stackoverflow.com/questions/1815606/rscript-determine-path-of-the-executing-script
# must work with source()
if (!is.null(res <- .thisfile_source())) res
else if (!is.null(res <- .thisfile_rscript())) dirname(res)
# http://stackoverflow.com/a/35842176/2292993
# RStudio only, can work without source()
else dirname(rstudioapi::getActiveDocumentContext()$path)
}
# Helper functions
.thisfile_source <- function() {
for (i in -(1:sys.nframe())) {
if (identical(sys.function(i), base::source))
return (normalizePath(sys.frame(i)$ofile))
}
NULL
}
.thisfile_rscript <- function() {
cmdArgs <- commandArgs(trailingOnly = FALSE)
cmdArgsTrailing <- commandArgs(trailingOnly = TRUE)
cmdArgs <- cmdArgs[seq.int(from=1, length.out=length(cmdArgs) - length(cmdArgsTrailing))]
res <- gsub("^(?:--file=(.*)|.*)$", "\\1", cmdArgs)
# If multiple --file arguments are given, R uses the last one
res <- tail(res[res != ""], 1)
if (length(res) > 0)
return (res)
NULL
}
A lot of these solutions are several years old. While some may still work, there are good reasons against utilizing each of them (see linked source below). I have the best solution (also from source): use the here library.
Original example code:
library(ggplot2)
setwd("/Users/jenny/cuddly_broccoli/verbose_funicular/foofy/data")
df <- read.delim("raw_foofy_data.csv")
Revised code
library(ggplot2)
library(here)
df <- read.delim(here("data", "raw_foofy_data.csv"))
This solution is the most dynamic and robust because it works regardless of whether you are using the command line, RStudio, calling from an R script, etc. It is also extremely simple to use and is succinct.
Source: https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
I have found something that works for me.
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
How about using system and shell commands? On Windows, I think that when you open the script in RStudio, the current shell directory is set to the directory of the script. You might have to add cd C:\ (or whatever drive you want to search), e.g. shell('dir C:\\*file_name /s', intern = TRUE), where \\ escapes the escape character. This only works for uniquely named files unless you further specify subdirectories (for Linux I started searching from /). In any case, if you know how to find something in the shell, this provides a layout for finding it within R and returning the directory. It should work whether you are sourcing or running the script, but I haven't fully explored the potential bugs.
#Get operating system
OS<-Sys.info()
win<-length(grep("Windows",OS))
lin<-length(grep("Linux",OS))
#Find path of data directory
#Linux Bash Commands
if(lin==1){
file_path<-system("find / -name 'file_name'", intern = TRUE)
data_directory<-gsub('/file_name',"",file_path)
}
#Windows Command Prompt Commands
if(win==1){
file_path<-shell('dir file_name /s', intern = TRUE)
file_path<-file_path[4]
file_path<-gsub(" Directory of ","",file_path)
file_path<-gsub("\\\\","/",file_path)
data_directory<-file_path
}
#Change working directory to location of data and sources
setwd(data_directory)
Thank you for the function, though I had to adjust it a little, as follows, to make it work for me (Windows 10, German locale):
#Windows Command Prompt Commands
if(win==1){
file_path<-shell('dir file_name', intern = TRUE)
file_path<-file_path[4]
file_path<-gsub(" Verzeichnis von ","",file_path)
file_path<-chartr("\\","/",file_path)
data_directory<-file_path
}
In my case, I needed a way to copy the executing file in order to back up the original script together with its outputs. This is relatively important in research. What worked for me when running my script on the command line was a mixture of other solutions presented here, which looks like this:
library(scriptName)
file_dir <- gsub("\\", "/", fileSnapshot()$path, fixed=TRUE)
file.copy(from = file.path(file_dir, scriptName::current_filename()) ,
to = file.path(new_dir, scriptName::current_filename()))
Alternatively, one can add the date and hour to the file name to help distinguish that file from the source, like this:
file.copy(from = file.path(current_dir, current_filename()) ,
to = file.path(new_dir, subDir, paste0(current_filename(),"_", Sys.time(), ".R")))
None of the solutions given so far work in all circumstances. Worse, many solutions use setwd, and thus break code that expects the working directory to be, well, the working directory, i.e. the directory that the user of the code chose (I realise that the question asks about setwd(), but this doesn't change the fact that this is generally a bad idea).
R simply has no built-in way to determine the path of the currently running piece of code.
A clean solution requires a systematic way of managing non-package code. That’s what ‘box’ does. With ‘box’, the directory relative to the currently executing code can be found trivially:
box::file()
However, that isn’t the purpose of ‘box’; it’s just a side-effect of what it actually does: it implements a proper, modern module system for R. This includes organising code in (nested) modules, and hence the ability to load code from modules relative to the currently running code.
To load code with ‘box’ you wouldn’t use e.g. source(file.path(box::file(), 'foo.r')). Instead, you’d use
box::use(./foo)
However, box::file() is still useful for locating data (i.e. OP's use-case). So, for instance, to locate a file mygui.glade from the current module's path, you would write:
glade_path = box::file('mygui.glade')
And (as long as you’re using ‘box’ modules) this always works, doesn’t require any hacks, and doesn’t use setwd.

Resources