testthat does not see external CSV data (R)

I'm writing tests for my package and I want to use macroeconomic data rather than artificially randomized data. The problem is that when I use read.csv('my_file.csv') and then run test_that, all tests using my data are ignored. For example:
library('tseries')
library('testthat')

data <- read.csv('my_file.csv')

test_that('ADF test', {
  vec <- data[, 2]
  expect_is(adf.test(vec), 'htest')
})
After running 'testpackage' I get no information about whether my test failed or passed. Where is the problem?

testthat only reports an error in the console if a test didn't succeed:
library(testthat)

data <- iris
test_that('test1', {
  expect_is(data$Petal.Length, 'numeric')
})
test_that('test2', {
  expect_is(data$Species, 'numeric')
})
#> Error: Test failed: 'test2'
#> * <text>:8: data$Species inherits from `factor` not `numeric`.
Created on 2020-09-21 by the reprex package (v0.3.0)
You can use test_file or test_dir to get the results:
res <- test_file('mytest.R',
                 reporter = "list",
                 env = test_env(),
                 start_end_reporter = TRUE,
                 load_helpers = TRUE,
                 wrap = TRUE)
√ | OK F W S | Context
x | 1 1 | mytest
--------------------------------------------------------------------------------
mytest.R:8: failure: test2
data$Species inherits from `factor` not `numeric`.
--------------------------------------------------------------------------------
== Results =====================================================================
OK: 1
Failed: 1
Warnings: 0
Skipped: 0
Warning message:
`encoding` is deprecated; all files now assumed to be UTF-8
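If all the package's tests live under tests/testthat/, test_dir() produces the same kind of report for the whole directory. A minimal sketch (the path assumes the working directory is the package root):
library(testthat)

# Run every test file in tests/testthat/ and print a pass/fail summary
res <- test_dir("tests/testthat", reporter = "summary")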

Related

testthat different interactive vs terminal

I've hit a weird quirk of R and testthat and am wondering whether anyone can explain it.
I have found that testthat seems to give different results depending on whether it is run interactively or from the terminal.
As an example, create a file example_funcs.R:
function_main <- function(df, avg_gear = NA) {
  function_main2(df, avg_gear)
}

function_main2 <- function(data_df, avg_gear) {
  data_df$car_name = row.names(data_df)
  get_avg_gear(data_df)
}

# This function is missing an argument
get_avg_gear <- function(data_df) {
  data_df %>%
    filter(gear == avg_gear)
}
Then create a second file called example.R:
library(dplyr)
source("example_funcs.R")

data_df = mtcars
avg_gear = median(data_df$gear)

test_that("example", {
  output = function_main(data_df, avg_gear = avg_gear)
  expect_true(
    all(output$gear == 4)
  )
})
If you run
> testthat::test_file("example.R")
══ Testing example.R ═══════════════════════════════════════════════════════════════════════════════════════
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1 ] Done!
You see it passes. However, in a terminal with the identical command (using the Git Bash terminal in RStudio):
$ Rscript -e ' testthat::test_file("example.R")'
══ Testing example.R ═══════════════════════════════════════════════════════════════════════════════════════════════════════════[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]
── Error (example.R:11:3): example ─────────────────────────────────────────────
Error in `filter(., gear == avg_gear)`: Problem while computing `..1 = gear == avg_gear`.
Caused by error:
! object 'avg_gear' not found
Backtrace:
1. global function_main(data_df, avg_gear = avg_gear)
at example.R:11:2
6. dplyr:::filter.data.frame(., gear == avg_gear)
7. dplyr:::filter_rows(.data, ..., caller_env = caller_env())
8. dplyr:::filter_eval(dots, mask = mask, error_call = error_call)
10. mask$eval_all_filter(dots, env_filter)
[ FAIL 1 | WARN 0 | SKIP 0 | PASS 0 ]
It now fails. I can't think why that would be. The error is caused by a missing function argument, but I think R's lexical scoping rules should cover it:
"First, R looks inside the current function. Then, it looks where that function was defined (and so on, all the way up to the global environment). Finally, it looks in other loaded packages." (Advanced R)
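Restating that rule with a tiny sketch (hypothetical names, separate from the example above): a free variable inside a function is resolved in the environment where the function was defined, so whether the lookup succeeds depends entirely on what exists in that environment at call time.
multiplier <- 10
scale_by <- function(x) x * multiplier  # `multiplier` is a free variable here
scale_by(2)                             # found in the defining (global) environment
#> [1] 20
rm(multiplier)
scale_by(2)                             # now errors: object 'multiplier' not found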

My testthat tests pass the devtools::test() check but fail the devtools::check() step when developing an R package

I have 3 .rds files that I use in my testthat tests. The files are in /tests/testthat/testdata/. When I run the tests with devtools::test(), the files can be found. When I use devtools::check(), the files can't be found.
I have seen this issue discussed here and here and in many other places, but none of the suggested solutions work for me.
I tried adding the Sys.setenv(R_TESTS="") line at the top of my "testthat.R" file; I tried using here::here() from the "here" package instead of the system.file() function, etc.
What can I do?
Here is my test:
test_that("the test_arguments() returns errors as it should", {
x <- readRDS(system.file("tests", "testthat", "testdata", "objectA.rds", package = "package"))
y <- readRDS(system.file("tests", "testthat", "testdata", "objectB.rds", package = "package"))
z <- readRDS(here::here("tests", "testthat", "testdata", "objectC.rds", package = "package"))
expect_error(test_arguments(z, y), "X has to be an object of class SpatialPolygonsDataFrame")
expect_error(test_arguments(x, z), "y has to be an object of class RasterLayer")
})
And here is the result of the devtools::check():
-- Error (test_user_input.R:3:3): the arguments are of the right class ---------------------
Error: cannot open the connection
Backtrace:
x
1. \-base::readRDS(here::here("tests", "testthat", "testdata", "objectA.rds")) test_user_input.R:3:2
2. \-base::gzfile(file, "rb")
[ FAIL 1 | WARN 1 | SKIP 0 | PASS 0 ]
Error: Test failures
Execution halted
1 error x | 1 warning x | 1 note x
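As an aside that is not taken from this thread: testthat also ships a test_path() helper that resolves paths relative to tests/testthat/ both when tests are run interactively and during R CMD check, so a read like the ones above could be sketched as follows (assuming the files really are in tests/testthat/testdata/):
# Sketch only: resolves relative to tests/testthat/ in both test() and check()
x <- readRDS(testthat::test_path("testdata", "objectA.rds"))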

Throw warnings rather than errors in testthat

I'm writing unit tests for a package, and for some tests I don't want a failure to throw an error; instead, it should just give a warning.
This isn't my real code, but let's say I want to test something like:
add_x_y <- function(x, y) x + y
expect_equal( add_x_y(2, 2), 3 )
The output is an error:
Error: add_x_y(2, 2) not equal to 3.
1/1 mismatches
[1] 4 - 3 == 1
Is there a variant or alternative function that would throw a warning rather than an error for this check?
In the absence of an approach specific to testthat, you could use general error handling to output a warning in place of an error:
expect_equal_or_warn <- function(...) tryCatch(expect_equal(...),
                                               error = function(e) warning(e))

expect_equal_or_warn(add_x_y(2, 2), 3)
Warning message:
add_x_y(2, 2) not equal to 3.
1/1 mismatches
[1] 4 - 3 == 1
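The same wrapping applies to any expectation, so one option (a sketch; make_warning_expectation() is a made-up helper, not part of testthat) is a small factory that turns any expect_*() function into a warning variant:
# Build "warn instead of error" versions of arbitrary testthat expectations
make_warning_expectation <- function(expect_fun) {
  function(...) tryCatch(expect_fun(...), error = function(e) warning(e))
}

expect_true_or_warn  <- make_warning_expectation(testthat::expect_true)
expect_equal_or_warn <- make_warning_expectation(testthat::expect_equal)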

Report extra information from a test_that block when failing

I want to cat() some information to the console in case a test fails (I'm getting confident this won't happen, but I can't prove it won't) so I can investigate the issue.
Now I have code that is approximately like this:
testthat::test_that('Maybe fails', {
  seed <- as.integer(Sys.time())
  set.seed(seed)
  testthat::expect_true(maybe_fails(runif(100L)))
  testthat::expect_equal(long_vector(runif(100L)), target, tol = 1e-8)
  if (failed()) {
    cat('seed: ', seed, '\n')
  }
})
Unfortunately, failed() doesn't exist.
The return values of expect_*() don't seem useful; they just return the actual argument.
I'm considering just checking again using all.equal(), but that is pretty ugly duplication.
Instead of using cat(), you could use the info argument, which testthat and its reporters handle for all expect_*() functions (the argument is kept for compatibility reasons):
library(testthat)

testthat::test_that("Some tests", {
  testthat::expect_equal(1, 2, info = paste('Test 1 failed at', Sys.time()))
  testthat::expect_equal(1, 1, info = paste('Test 2 failed at', Sys.time()))
})
#> -- Failure (<text>:5:3): Some tests --------------------------------------------
#> 1 not equal to 2.
#> 1/1 mismatches
#> [1] 1 - 2 == -1
#> Test 1 failed at 2021-03-03 17:25:37
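Applied to the original example, the seed can be attached to each expectation so it is printed only when that expectation fails (a sketch: maybe_fails(), long_vector(), and target are the placeholders from the question):
testthat::test_that('Maybe fails', {
  seed <- as.integer(Sys.time())
  set.seed(seed)
  # The info string is only shown if the expectation fails
  testthat::expect_true(maybe_fails(runif(100L)),
                        info = paste('seed:', seed))
  testthat::expect_equal(long_vector(runif(100L)), target, tolerance = 1e-8,
                         info = paste('seed:', seed))
})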

How to reuse sparklyr context with mclapply?

I have R code that does some distributed data preprocessing in sparklyr, then collects the data into a local R data frame and finally saves the result to a CSV. Everything works as expected, and now I plan to reuse the Spark context across the processing of multiple input files.
My code looks similar to this reproducible example:
library(dplyr)
library(sparklyr)

sc <- spark_connect(master = "local")

# Generate random input
matrix(rbinom(1000, 1, .5), ncol = 1) %>% write.csv('/tmp/input/df0.csv')
matrix(rbinom(1000, 1, .5), ncol = 1) %>% write.csv('/tmp/input/df1.csv')

# Multi-job input
input = list(
  list(name = "df0", path = "/tmp/input/df0.csv"),
  list(name = "df1", path = "/tmp/input/df1.csv")
)

global_parallelism = 2
results_dir = "/tmp/results2"

# Function executed on each file
f <- function(job) {
  spark_df <- spark_read_csv(sc, "df_tbl", job$path)
  local_df <- spark_df %>%
    group_by(V1) %>%
    summarise(n = n()) %>%
    sdf_collect

  output_path <- paste(results_dir, "/", job$name, ".csv", sep = "")
  local_df %>% write.csv(output_path)
  return(output_path)
}
If I execute the function over the job inputs sequentially with lapply, everything works as expected:
> lapply(input, f)
[[1]]
[1] "/tmp/results2/df0.csv"
[[2]]
[1] "/tmp/results2/df1.csv"
However, when I run it in parallel to maximize the usage of the Spark context (so that while local R is still working on df0, df1 can already be processed by Spark):
> library(parallel)
> library(MASS)
> mclapply(input, f, mc.cores = global_parallelism)
*** caught segfault ***
address 0x560b2c134003, cause 'memory not mapped'
[[1]]
[1] "Error in as.vector(x, \"list\") : \n cannot coerce type 'environment' to vector of type 'list'\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in as.vector(x, "list"): cannot coerce type 'environment' to vector of type 'list'>
[[2]]
NULL
Warning messages:
1: In mclapply(input, f, mc.cores = global_parallelism) :
scheduled core 2 did not deliver a result, all values of the job will be affected
2: In mclapply(input, f, mc.cores = global_parallelism) :
scheduled core 1 encountered error in user code, all values of the job will be affected
When I do something similar with Python and ThreadPoolExecutor, the Spark context is shared across threads; the same goes for Scala and Java.
Is it possible to reuse the sparklyr context in parallel execution in R?
Yeah, unfortunately, the sc object, which is of class spark_connection, cannot be exported to another R process (even when forked processing is used). If you use the future.apply package, part of the future ecosystem, you can see this with:
library(future.apply)
plan(multicore)

## Look for non-exportable objects and give an error if found
options(future.globals.onReference = "error")

y <- future_lapply(input, f)
That will throw:
Error: Detected a non-exportable reference (‘externalptr’) in one of the
globals (‘sc’ of class ‘spark_connection’) used in the future expression

Resources