I have written a series of test_that tests. There is one test_that test which has a side-effect of creating a sqlite3 table. The rest of the tests rely on this sqlite3 table. Is there a way to force this one test to run before any of the other tests do?
If you are using test_dir or test_package (otherwise you can just put the tests in the same file, after the sqlite test), you can put the test that generates the table in its own file and use naming conventions to control execution order. For example, inside tests/run.R you could have:
test_file("tests/testthat/myspecialfile.R")
test_dir("tests/testthat/") # will run every file with a name starting with `test`
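If the table-creating code doesn't itself need to be a test, testthat's helper-file convention is another option: files in tests/testthat/ whose names start with helper are sourced before the test files run. A minimal sketch, assuming the DBI and RSQLite packages and hypothetical table and file names:

```r
# tests/testthat/helper-db.R
# helper-*.R files are sourced by testthat before any test file runs,
# so the table already exists when the other tests need it.
con <- DBI::dbConnect(RSQLite::SQLite(), "test-db.sqlite")
DBI::dbWriteTable(con, "mytable", data.frame(x = 1:3), overwrite = TRUE)
DBI::dbDisconnect(con)
```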
I recently started looking into Makefiles to keep track of the scripts inside my research project. To really understand what is going on, I would like to understand the contents of .Rout files produced by R CMD BATCH a little better.
Christopher Gandrud is using a Makefile for his book Reproducible research with R and RStudio. The sample project (https://github.com/christophergandrud/rep-res-book-v3-examples/tree/master/data) has only three .R files: two of them download and clean data, the third one merges both datasets. They are invoked by the following lines of the Makefile:
# Key variables to define
RDIR = .
# Run the RSOURCE files
$(RDIR)/%.Rout: $(RDIR)/%.R
	R CMD BATCH $<
Neither of the first two files explicitly outputs data, nor does the merge script explicitly import any - it just uses the objects created in the first two scripts. So how is the data preserved between the scripts?
To me it seems like the batch execution happens within the same R environment, preserving both objects and loaded packages. Is this really the case? And is it the .Rout file that transfers the objects from one script to the other or is it a property of the batch execution itself?
If the working environment is really preserved between the scripts, I see a lot of potential for issues if there are objects with the same names or functions with the same names from different packages. Another issue of this setup seems to be that the Makefile cannot propagate changes in the first two files downstream because there is no explicit input/prerequisite for the merge script.
I would appreciate learning whether my intuition is right and whether there are better ways to execute R files from a Makefile.
By default, R CMD BATCH saves your workspace to a hidden .RData file after running, unless you pass --no-save. That's why it's not really the recommended way to run R scripts. The recommended way is Rscript, which does not save the workspace by default; you must write explicit code to save it if that's what you want. This is separate from the .Rout file, which only contains the output of the commands run in the script.
In this case, execution doesn't happen in the exact same environment. R is still called three times, but that environment is serialized and reloaded between each run.
You are correct that saving and re-loading workspaces by default can cause a lot of problems, which is why most people recommend against it. In this case, though, the author just figured it made their workflow easier, so they used it. In general, it would be better to be more explicit about input and output files.
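A more explicit alternative is a sketch like the following (file names are hypothetical): each upstream script saves its result with saveRDS(), and the merge script reads its inputs with readRDS(), so the handoff no longer depends on the saved workspace:

```r
# gather.R -- writes its result to an explicit file
clean <- data.frame(id = 1:3, value = c(10, 20, 30))
saveRDS(clean, "clean.rds")

# merge.R -- reads its input explicitly instead of relying on .RData
clean <- readRDS("clean.rds")
merged <- transform(clean, doubled = value * 2)
```

With explicit files, the Makefile can also list clean.rds as a prerequisite of the merge rule, so changes in the upstream scripts propagate downstream correctly.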
How can I run testthat in 'auto' mode such that when I'm editing files in my R folder, only specific tests are re-run?
I have a lot of tests and some are slower than others. I need to be able to run specific tests or else I'll be waiting for up to 15 minutes for my test suite to complete. (I'd like to make the test suite faster, but that's not a realistic option right now.)
Ideally, I'd like to specify a grep expression to select the tests I want. In the JavaScript world, MochaJs and Jest both support grepping to select tests by name or by file.
Alternatively, I'd be OK with being able to specify a file directly - as long as I can do it with "auto test" support.
Here's what I've found so far with testthat:
testthat::auto_test_package runs everything at first, but only re-runs a specific test file if you edit that test file. However, if you edit any code in the R folder, it re-runs all tests.
testthat::auto_test accepts a path to a directory of test-files to test. However, testthat doesn't seem to support putting tests into different subdirectories if you want to use devtools::test or testthat::auto_test_package. Am I missing something?
testthat::test_file can run the tests from one file, but it doesn't support "auto" re-running the tests with changes.
testthat::test_dir has a filter argument, but it only filters files, not tests; it also doesn't support "auto" re-running tests.
Versions:
R: 3.6.2 (2019-12-12)
testthat: 2.3.1
Addendum
I created a simple repo to demo the problem: https://github.com/generalui/select_testthat_tests
If you open that, run:
renv::restore()
testthat::auto_test_package()
It takes forever because one of the tests is slow. If I'm working on other tests, I want to skip the slow tests and only run the tests I've selected. Grepping for tests is a standard feature of test tools, so I'm sure R must have a way. testthat::test_dir has a filter option to filter files, but how do you filter on test-names, and how do you filter with auto_test_package? I just can't find it.
How do you do something like this in R:
testthat::auto_test_package(filter = 'double_it')
And have it run:
"double_it(2) == 4"
"double_it(3) == 6"
BUT NOT
"work_hard returns 'wow'"
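For what it's worth, the closest built-in approximation I know of is the file-level filter the question already mentions - a sketch, assuming the slow and fast tests live in separate files:

```r
# Runs only the test files whose names match the pattern
# (e.g. test-double_it.R); it cannot select individual
# test_that() blocks, and it does not re-run automatically.
testthat::test_dir("tests/testthat", filter = "double_it")
```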
Thanks!
What is the proper way to skip all tests in the test directory of an R package when using testthat/devtools infrastructure? For example, if there is no connection to a database and all the tests rely on that connection, do I need to write a skip in all the files individually or can I write a single skip somewhere?
I have a standard package setup that looks like
mypackage/
... # other package stuff
tests/
testthat.R
testthat/
test-thing1.R
test-thing2.R
At first I thought I could put a test in the testthat.R file like
## in testthat.R
library(testthat)
library(mypackage)
fail_test <- function() FALSE
if (fail_test()) test_check("mypackage")
but, that didn't work and it looks like calling devtools::test() just ignores that file. I guess an alternative would be to store all the tests in another directory, but is there a better solution?
The Skipping a test section in the R Packages book covers this use case. Essentially, you write a custom function that checks whatever condition you need to check -- whether or not you can connect to your database -- and then call that function from all tests that require that condition to be satisfied.
Example, parroted from the book:
skip_if_no_db <- function() {
  if (!db_conn()) {
    skip("Database not available")
  }
}
test_that("foo api returns bar when given baz", {
  skip_if_no_db()
  ...
})
I've found this approach more useful than a single switch that toggles off all tests, since I tend to have a mix of tests that do and don't rely on whatever condition I'm checking, and I want to always run as many tests as possible.
Maybe you can organize the tests in subdirectories, putting conditional inclusion of each subdirectory in a test in the parent folder:
Consider the 'tests' directory of the testthat package itself.
In particular, this one looks interesting:
test-test_dir.r, which includes the 'test_dir' subdirectory.
I do not see anything in test_dir that recurses into subdirectories when scanning for tests.
We're building an R codebase and hope to unit-test any functions we write. So far, we have found two testing libraries for R: RUnit and testthat.
After doing a bit of sandboxing, we have developed a solid method for testing code every time it runs. E.g.:
# sample.R
library(methods) #required for testthat
library(testthat)
print("Running fun()...")
fun <- function(num1, num2) {
  num3 <- num1 + num2
  return(num3)
}
expect_that(fun(1,2), equals(3))
Simple enough. However, we would also like the ability to test the function (with a unit-test flag in a Makefile, for example) without running the script it is defined in. To do this, we would write unit tests in test.R:
# test.R
source("sample.R")
expect_that(fun(2,3), equals(5))
and run it without running the rest of sample.R. But when the code above is run, not only the function definitions but all the other code in sample.R is executed as well - in this example, printing "Running fun()...". Is there any way to source() only the user-defined functions from a file?
If not, would you recommend putting functions in a separate file (say, functions.R) which can be unittested by being sourced into test.R and run when sourced in sample.R? The drawback there, it would seem, is the boilerplate needed: a file for the process, a file for the functions, and a file to run the tests.
In each script, set a name variable that is defined only if it is not already defined (see exists()). I'm fond of __name__.
Create a main function in each script that runs only if the name matches. This function contains everything that should run only when the script is the top-level script.
This is similar to the python structure of
if __name__ == "__main__": main()
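In R, the same guard might look like this (a sketch; the `__name__` variable is just a convention borrowed from Python and needs backticks in R):

```r
# sample.R
# Define the guard only if a caller (e.g. test.R) hasn't set it already.
if (!exists("__name__")) `__name__` <- "sample"

fun <- function(num1, num2) num1 + num2

main <- function() {
  print("Running fun()...")
  print(fun(1, 2))
}

# Runs only when sample.R is the top-level script; test.R would set
# `__name__` <- "test" before calling source("sample.R").
if (`__name__` == "sample") main()
```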
I didn't understand how to implement Will Beason's answer, but his mention of the pythonic way led me to find this solution:
if (sys.nframe() == 0) {
# ... do main stuff
}
sys.nframe() is equal to 0 when the script is run from the interactive terminal or via Rscript, in which case the main code runs. When the code is sourced instead, sys.nframe() is greater than 0 (4 in my case; I'm not sure exactly how it works), which prevents the main code from running.
I am new to R and have been trying to use JRI. Through JRI, I have used the eval() function to get certain results. To execute an R script, I have used source(). However, I am now in a situation where I need to execute a script on continuously incoming data. While I could still use source(), I don't think that would be optimal from a performance perspective.
What I did was read the entire R script into memory and then try to use eval(), passing it the script - but this does not seem to work. I have verified that the script is loaded into memory correctly: if I write the in-memory script back out to a file and source that newly created file, it does produce the expected results.
Is there a way for me to not keep sourcing the same file over and over again and execute it from memory? Each of my data units are independent and have to be processed independently and as soon as they become available. I cannot wait to collect a bunch of data units and then pass them on to the R script.
I have searched a lot and not found anything related to this. Any pointers which could help me in this direction would be really helpful.
The way I handled this is as follows:
I enclosed the entire script in a function.
I source the script file (which now contains the function) once, at the start of my program's execution.
Where I previously sourced the file, I now just call the function that contains the script itself, i.e.:
REXP result = rengine.eval("retVal<-" + getFunctionName() + "()");
Here, getFunctionName() gives me the name of the function which contains the script.
Since the function is loaded into memory and available, I do not have to source the script file every time I want to execute it. Any arguments are passed to the script as environment variables.
This seems to be a workaround, but solves my problem. Any better options are welcome.
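Concretely, the wrapped script might look like this (a sketch; the function and variable names are hypothetical):

```r
# myscript.R -- the entire former script body, wrapped in one function.
# Sourced once at startup; each incoming data unit then triggers a
# single function call instead of a re-source.
processDataUnit <- function() {
  # arguments arrive as environment variables, as described above
  x <- as.numeric(Sys.getenv("INPUT_VALUE"))
  x * 2
}
```

On the Java side, source("myscript.R") is evaluated once at startup, and each data unit is then handled with something like rengine.eval("retVal <- processDataUnit()").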