We're building an R codebase and are hoping to unit test any functions that we write. So far, we have found two testing libraries for R: RUnit and testthat.
After doing a bit of sandboxing, we have developed a solid method for testing code every time it runs. E.g.:
# sample.R
library(methods)  # required for testthat
library(testthat)

print("Running fun()...")

fun <- function(num1, num2) {
  num3 <- num1 + num2
  return(num3)
}

expect_that(fun(1, 2), equals(3))
Simple enough. However, we would also like the ability to test the function (with a unit-test flag in a makefile, for example) without running the script it is defined in. To do this, we would write unit tests in test.R:
# test.R
source("sample.R")
expect_that(fun(2,3), equals(5))
and run it without running the rest of sample.R. But when test.R is run as above, not only the function definitions but all the other code in sample.R executes as well, in this example printing "Running fun()...". Is there any way to source() only the user-defined functions from a file?
If not, would you recommend putting the functions in a separate file (say, functions.R) that can be unit tested by sourcing it into test.R and run by sourcing it into sample.R? The drawback there, it would seem, is the boilerplate: a file for the process, a file for the functions, and a file to run the tests.
In each script, set a name variable that is defined only if it is not already defined; see exists(). I'm fond of __name__.
Create a main function in each script that runs only if the name is correct. This function contains everything you want to run only when this is the top-level script (sketched below).
This is similar to the Python structure of
if __name__ == "__main__": main()
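A minimal sketch of that pattern applied to the question's sample.R (the value "__main__" is just a convention carried over from Python; in R the variable name needs backticks):

# sample.R
library(testthat)

# take the name "__main__" only if a caller has not already set one
if (!exists("__name__")) `__name__` <- "__main__"

fun <- function(num1, num2) {
  num1 + num2
}

main <- function() {
  print("Running fun()...")
  expect_that(fun(1, 2), equals(3))
}

# runs only when sample.R is the top-level script
if (`__name__` == "__main__") main()

test.R can then set the name before sourcing, so the definitions load but main() stays quiet:

# test.R
library(testthat)
`__name__` <- "test"  # anything other than "__main__"
source("sample.R")
expect_that(fun(2, 3), equals(5))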
I didn't understand how to implement Will Beason's answer, but his mention of the Pythonic way led me to find this solution:
if (sys.nframe() == 0) {
  # ... do main stuff
}
sys.nframe() is equal to 0 when the code is run from the interactive terminal or via Rscript.exe, in which case the main code will run. When the file is source()d instead, sys.nframe() is nonzero (4 in my case; I'm not sure exactly how the value is determined), which prevents the main code from running.
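Applied to the question's sample.R, a minimal sketch of this approach:

# sample.R
library(testthat)

fun <- function(num1, num2) {
  num1 + num2
}

# runs under Rscript or at the top level of a terminal session,
# but not when this file is source()d
if (sys.nframe() == 0) {
  print("Running fun()...")
  expect_that(fun(1, 2), equals(3))
}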
I can run a Julia script with arguments from PowerShell as > julia test.jl 'a' 'b'. I can run a script from the REPL with include("test.jl"), but include accepts just one argument: the path to the script.
From playing around with include, it seems that it runs a script as a code block, with all the variables referencing the current(?) scope. So if I explicitly redefine the ARGS variable in the REPL, the script picks it up and displays the corresponding results:
> ARGS = "c", "d"
> include("test.jl")  # prints its arguments in a loop
c
d
This, however, gives a warning about redefining ARGS and doesn't seem to be the intended way of doing it. Is there another way to run a script from the REPL (or from another script) while stating its arguments explicitly?
You probably don't want to run a self-contained script by include-ing it. There are two options:
If the script isn't in your control and calling it from the command line is the canonical interface, just call it in a separate Julia process: run(`$JULIA_HOME/julia path/to/script.jl arg1 arg2`). See running external commands for more details.
If you have control over the script, it'd probably make more sense to split it up into two parts: a library-like file that just defines Julia functions (but doesn't run any analyses) and a command-line file that parses the arguments and calls the functions defined by the library. Both the command-line interface and the second script you're writing now can include the library; or, better yet, make the library-like file a full-fledged package.
This solution is not clean, nor is it the Julia way of doing things. But if you insist:
To avoid the warning when messing with ARGS, use the original ARGS but mutate its contents, like the following:
empty!(ARGS)
push!(ARGS,"argument1")
push!(ARGS,"argument2")
include("file.jl")
This question is also a duplicate of, or related to, juliapassing-argument-to-the-includefile-jl, as @AlexanderMorley pointed out.
Not sure if it helps, but it took me a while to figure this out:
Under the path C:\Users\<user>\.julia\config\ there may be a .jl file called startup.jl.
The trick is that the Julia setup does not always create this, so if neither the directory nor the .jl file exists, create them.
Julia treats this .jl file as a list of commands to execute every time the REPL starts. It is very handy for setting the directory of your projects (e.g. cd("C:\\MyJuliaProject")) and for loading frequently used libraries (like using Pkg, using LinearAlgebra, etc.).
I wanted to share this because I didn't find anyone saying explicitly that this directory might not exist in your Julia installation. It took me longer than it should have to figure this out.
What is the proper way to skip all tests in the test directory of an R package when using testthat/devtools infrastructure? For example, if there is no connection to a database and all the tests rely on that connection, do I need to write a skip in all the files individually or can I write a single skip somewhere?
I have a standard package setup that looks like
mypackage/
    ...            # other package stuff
    tests/
        testthat.R
        testthat/
            test-thing1.R
            test-thing2.R
At first I thought I could put a test in the testthat.R file like
## in testthat.R
library(testthat)
library(mypackage)

fail_test <- function() FALSE
if (fail_test()) test_check("mypackage")
but that didn't work; it looks like calling devtools::test() just ignores that file. I guess an alternative would be to store all the tests in another directory, but is there a better solution?
The Skipping a test section in the R Packages book covers this use case. Essentially, you write a custom function that checks whatever condition you need to check -- whether or not you can connect to your database -- and then call that function from all tests that require that condition to be satisfied.
Example, parroted from the book:
skip_if_no_db <- function() {
  if (!db_conn()) {
    skip("Database not available")
  }
}

test_that("foo api returns bar when given baz", {
  skip_if_no_db()
  ...
})
I've found this approach more useful than a single switch to toggle off all tests, since I tend to have a mix of tests that do and don't rely on whatever condition I'm checking, and I want to always run as many tests as possible.
Alternatively, you could organize the tests in subdirectories and put the conditional inclusion of each directory in a parent test file. Consider the tests/ directory of the testthat package itself: test-test_dir.r looks particularly interesting, since it pulls in the test_dir subdirectory. Note that I see nothing in test_dir() itself that recurses into subdirectories when scanning for tests.
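A sketch of that idea, assuming a hypothetical helper db_available() that reports whether the database can be reached (testthat's test_dir() runs every test file in the given directory):

# parent test file: pull in the directory of database-dependent
# tests only when a connection can be established
db_available <- function() {
  # hypothetical check; replace with a real connection attempt
  nzchar(Sys.getenv("MYDB_URL"))
}

if (db_available()) {
  test_dir("testthat/db")
}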
I am developing a package that exposes an R interface (a bunch of functions to be used interactively) and a command-line interface via Rscript. The second one works via a small launcher; for instance, at the command line:
Rscript mylauncher.R arg1 arg2 arg3
would call a function of my package.
I would like to test a couple of command lines from R. Nothing fancy, just make sure that everything runs without errors.
If I test these calls by putting, in an R source file:
system("Rscript mylauncher.R arg1 arg2 arg3")
How can I be sure that I called the right Rscript, in case there are multiple R installations (which is actually the case in my setting)?
Another approach would be to write, in the R source file:
source("mylauncher.R")
But I don't see how to specify the command line (and I would avoid the trick of overwriting the commandArgs function, because I also want to test that the command line is tokenized correctly). Does anybody have an idea?
Thanks!
Regarding
How can I be sure that I called the right Rscript? In case there are
multiple R installations?
you would run R RHOME on the command line and Sys.getenv("R_HOME") from within R.
You then append bin/Rscript and should have the Rscript corresponding to your current session. That said, I still design my libraries in such a way that I can call them from R ...
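In code, that amounts to something like the following sketch (mylauncher.R and the arguments are the question's own placeholders):

# build the path to the Rscript binary matching the running session
# (on Windows the binary is Rscript.exe)
rscript <- file.path(Sys.getenv("R_HOME"), "bin", "Rscript")

# invoke the launcher and check that it exits cleanly
status <- system2(rscript, args = c("mylauncher.R", "arg1", "arg2", "arg3"))
stopifnot(status == 0)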
I have code in a single R file that I want to be able to source (i.e., to define my functions etc.) within RStudio during development, and also run using the #! /usr/bin/env Rscript syntax via the command line (actually, using Hadoop). For the latter, I need the last thing that Rscript does to be to kick off the analysis (i.e., using a call to a main() function). For the former, I don't want my main() function called. I'd like to be able to test if the code is running within Rscript (or, alternatively, within RStudio), so that I can either execute main() or not. Is this possible, please?
One solution would be to break my code into multiple files, but I'd rather avoid this if possible (to make the Hadoop stuff slightly easier).
Thanks in advance.
You could use interactive() to test whether R is running in interactive mode: it returns FALSE under Rscript and TRUE under (most?) GUIs.
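A minimal sketch of that check, with main() standing in for the analysis the question describes:

main <- function() {
  # ... the analysis ...
}

# interactive() is FALSE under Rscript, so main() runs there;
# in RStudio it is TRUE and only the definitions are loaded
if (!interactive()) {
  main()
}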
I have an R script that takes command-line arguments, where the top line is:
#!/usr/bin/Rscript --slave
I wanted to interrupt execution in a function (so I can interactively use the data variables that have been loaded by that point to work out the next bit of code I need to write). I added this inside the function in question:
browser()
but it gets ignored. A bit of searching suggests this is because the program is running in non-interactive mode. Even more searching has not turned up how to switch the script out of non-interactive mode so that browser() will work. Something like a browser_yes_I_really_mean_it() function.
P.S. I want to avoid altering the rest of the script if at all possible. My current approach is to copy and paste the code chunks needed to prepare the data into an interactive session; but as the script gets more complex, this is getting more and more unreasonable.
UPDATE: for anyone else with the same question, it appears the answer to the actual question is that it is impossible. Once you start R in a non-interactive mode the die is cast. The given answers are therefore workarounds: either you hack your code (remembering to unhack it afterwards), or you refactor to make debugging easier. (This comment is not intended as a criticism of the answers; the suggested refactoring makes the code cleaner anyway.)
Can you just fire up R and source the file instead?
R
> source("script.R")
Following mdsumner's answer, I edited my script like this:
if (!exists("argv")) {
  argv <- commandArgs(TRUE)
  if (length(argv) != 4) usage_and_exit()
} else {
  if (length(argv) != 4) {
    stop("Must set argv as a 4 element vector. E.g. argv=c(...)")
  }
}
Then no other change was needed, and I was able to do:
R
> argv=c('a','b','c','d')
> source("script.R")
In addition to the previous answer, I'd create a top-level function (e.g. doStuff) which performs the analysis you want to run in batch. The function takes the command-line options as input. In the batch script you source the file that contains this function and then call it. This way you can easily run the function in interactive mode and use e.g. browser().
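A sketch of that layout (doStuff comes from the answer; the file names and arguments are illustrative):

# analysis.R -- library-like file, safe to source() interactively;
# drop a browser() call inside doStuff() while developing
doStuff <- function(infile, outfile) {
  # ... the analysis ...
}

# batch.R -- entry point run via Rscript
source("analysis.R")
argv <- commandArgs(trailingOnly = TRUE)
doStuff(argv[1], argv[2])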
In some cases the suggested solution (workaround) may not work; for example, when the R code needs to run as part of an existing bash script. For those cases I suggest writing your R code into the bash script using a here document:
#!/bin/bash
R --interactive << EOT
# R code starts here
argv=c('a','b','c','d')
print(interactive())
# Rest of script contents
quit("no")
# R code ends here
EOT
This way, print(interactive()) above will yield TRUE.
Sidenote: make sure to avoid the $ character in the embedded R code, since bash expands $ inside an unquoted here document; for example, retrieve a column from a data.frame by using df[["X1"]] instead of df$X1. (Quoting the delimiter, as in R --interactive << 'EOT', disables this expansion.)