I'm developing unit tests for a model. I use a wrapper function that sources all necessary functions, loads inputs (to calculate the actual output) and loads the expected output. I would like to pass that data along with the test functions in the test directory with test_dir() (the model is not a package, hence not all functionalities in devtools work). However, I'm not able to pass them along. I found a workaround by storing the data in the global environment but this is not best practice. Is there a way to pass them along with the test_dir functionality? Somehow the ellipsis argument does not do the trick for me.
Wrapper code (only works because of the <<- to store in global env):
unittest_4576 <- function(){
# Load packages
library(data.table)
library(giuseppe)
library(plyr)
library(testthat)
library(readxl)
#Set timezone
old_val <- Sys.getenv('TZ')
Sys.setenv(TZ = 'UTC')
# Load all functions in memory
source('~/Repos/tests/unittest_subfunctions/source_all_functions.R')
source_all_functions()
# Load all inputs and expected outputs in global environment
inputs_and_expected_outputs <- load_data()
list_inputs <<- inputs_and_expected_outputs$list_inputs
list_expected_outputs <<- inputs_and_expected_outputs$list_expected_outputs
# Perform unit tests
test_dir(
'~/Repos/tests/testthat/',
filter = NULL,
reporter = NULL,
env = NULL,
inputs = list_inputs,
expected_outputs = list_expected_outputs
)
}
The following code gives is the actual test:
test_create_interest_rate_curves = function(inputs,
expected_outputs,
USE.NAMES = FALSE){
# TEST
# Match outputs
expect_equal(actual_output,expected_output)
}
testthat::test_that("check_result", {test_create_interest_rate_curves(inputs = list_inputs, expected_outputs = list_expected_outputs)})
So to summarize; The inputs and expected outputs that are passed along in the function call are not available in the function. How do I pass them along so that each test is still run in a different environment and the data is not stored in the global environment?
I ship a text file with all exported functions listed. To make sure, that all functions are listed, I would like to create a unit test via testthat and compare all exported function with the one in the text file. My current approach reads in the file and compares it with ls("package:myPackage"). But this call returns a long list of all functions of all imported packages. Any ideas how to solve this?
A complete different approach would be to generate this file automatically. But I think the first approach is easier to realise. Hopefully.
Thanks to #Emmanuel-Lin here is my solution:
require(data.table)
test_that("Function List", {
# read namespace and extract exported functions
funnamespace = data.table(read.table(system.file("NAMESPACE", package = "PackageName"), stringsAsFactors = FALSE))
funnamespace[, c("status", "fun") := tstrsplit(V1, "\\(")]
funnamespace[, fun := tstrsplit(fun, "\\)")]
# read function list
funlist = read.csv2(system.file("subdirectory", "functionList.txt", package = "PackageName"), stringsAsFactors = FALSE)
# test
expect_equal(funnamespace[status == "export", fun], funlist[, 1])
})
Obviously, I was to lazy to work out the correct regular expression to replace the two tstrsplit by one row.
If I were to use a configuration file in a normal R script, I would do this:
config.R
a <- 1
b <- 2
c <- 3
RScript
source('config.R')
d = a+b+c
Do stuff
How would I do this inside an R package? Can I keep a config file and source it inside an R function? Or should I include a,b,c in every function? What's the best practice?
If the configs shall be contained in the R package itself:
Store the config file(s) in the inst/configs folder.
After the package installation the configs are contained in the configs folder of the package location (libPaths())
Source the config file using the package installation directory from within a package function:
myPackage::load_config <- function(config_file_name = "default_config.R",
config_file_path = system.file("configs", package = getPackageName(), mustWork = TRUE))
{
env <- new.env() # all values are then contained in an separate environment
# env <- globalenv() # to make the variables visible in the client's environment
config_file_FQN <- file.path(config_file_path, config_file_name)
source(config_file_FQN, local = env, keep.source = TRUE)
return(env)
}
The client can then trigger the configuration and use it (eg. pass around)
# client call
myConf <- myPackage::load_config()
print(myConf$YourVariableName))
Or store the environment with the configured variables within the package
as a package-global variable, see this example code (sorry, too much to explain here):
https://github.com/aryoda/tryCatchLog/blob/master/R/zzz.R#L47
1: One option would be to have these as default values in your functions. As in
my_fun <- function(..., a = 1, b = 2) so on.
2: Given that what you have in a package is functions, you can easily have them declared in your main functions. So, the other functions being called by these have access to them.
3: Another option would be to keep them as functions.
a <- function()
a <- 1
Now you can call a() when ever you want, as in a() + 2.
4: Another option would be to use environments. I haven't use those much. I think you'll find this useful, in particular the section on Package state.
I am in the process of putting together a package I've been working on for almost a year now. I have what I call a hash table that a syllable look up function requires. The hash table is really just an environment (I think I'm not computer whiz) that's a look up table. You can see the function I create it with below. I have a data set DICTIONARY(about 20,000 words) that will load when the package is loaded. I also what this DICTIONARY to be passed to the hash function to create a new environment when the package is loaded; something like env <- hash(DICTIONARY) as htis is how I load the environment now. How do I make a function run on start up when the package is loaded so that this new environment is created for those using my package?
hash <- function(x, type = "character") {
e <- new.env(hash = TRUE, size = nrow(x), parent = emptyenv())
char <- function(col) assign(col[1], as.character(col[2]), envir = e)
num <- function(col) assign(col[1], as.numeric(col[2]), envir = e)
FUN <- if(type=="character") char else num
apply(x, 1, FUN)
return(e)
}
#currently how I load the environment with the DICTIONARY lookup table
env <- hash(DICTIONARY)
Here's the head of DICTIONARY if it's helpful:
word syllables
1 hm 1
2 hmm 1
3 hmmm 1
4 hmph 1
5 mmhmm 2
6 mmhm 2
7 mm 1
8 mmm 1
9 mmmm 1
10 pff 1
Many of you may be thinking "This is up to the user to determine if they want the environment loaded". Valid point but the intended audience of this package is people in the literacy field. Not many in that field are R users and so I have to make this thing as easy as possible to use. Just wanted to get out the philosophy of why I want to do this, out there so that it doesn't become a point of contention.
Thank you in advance. (PS I've looked at this manual (LINK) but can't seem to locate any info about this topic)
EDIT:
Per Andrei's suggestion i think it will be something like this? But I'm not sure. Does this load after all the other functions and data sets in the package load? This stuff is a little confusing to me.
.onLoad <- function(){
env <- hash(DICTIONARY)
}
If the hash is going to change infrequently (this seems like the case, from your problem description), then save the hash into your package source tree as
save(env, file="<my_pkg>/R/sysdata.rda")
After installing the package, env will be available inside the name space, my_pkg:::env. See section 1.1.3 of "Writing R Extensions". You might have a script, say in "/inst/scripts/make_env.R" that creates env, and that you as the developer use on those rare occasions when env needs to be updated.
Another possibility is that the hash changes, but only on package installation. Then the solution is to write code that is evaluated at package installation. So in a file /R/env.R write something along the lines of
env <- local({
localenv <- new.env(parent=emptyenv())
## fill up localenv, then return it
localenv[["foo"]] = "bar"
localenv
})
The possibility solved by .onLoad is that the data changes each time the package is loaded, e.g., because it is retrieving an update from some on-line source.
env <- new.env(parent=emptyenv())
.onLoad <- function(libname, pkgname)
{
## fill up env
env[["foo"]] = "bar"
}
I know I can use ls() and rm() to see and remove objects that exist in my environment.
However, when dealing with "old" .RData file, one needs to sometimes pick an environment a part to find what to keep and what to leave out.
What I would like to do, is to have a GUI like interface to allow me to see the objects, sort them (for example, by there size), and remove the ones I don't need (for example, by a check-box interface). Since I imagine such a system is not currently implemented in R, what ways do exist? What do you use for cleaning old .RData files?
Thanks,
Tal
I never create .RData files. If you are practicing reproducible research (and you should be!) you should be able to source in R files to go from input data files to all outputs.
When you have operations that take a long time it makes sense to cache them. If often use a construct like:
if (file.exists("cache.rdata")) {
load("cache.rdata")
} else {
# do stuff ...
save(..., file = "cache.rdata")
}
This allows you to work quickly from cached files, and when you need to recalculate from scratch you can just delete all the rdata files in your working directory.
Basic solution is to load your data, remove what you don't want and save as new, clean data.
Another way to handle this situation is to control loaded RData by loading it to own environment
sandbox <- new.env()
load("some_old.RData", sandbox)
Now you can see what is inside
ls(sandbox)
sapply(ls(sandbox), function(x) object.size(get(x,sandbox)))
Then you have several posibilities:
write what you want to new RData: save(A, B, file="clean.RData", envir=sandbox)
remove what you don't want from environment rm(x, z, u, envir=sandbox)
make copy of variables you want in global workspace and remove sandbox
I usually do something similar to third option. Load my data, do some checks, transformation, copy final data to global workspace and remove environments.
You could always implement what you want. So
Load the data
vars <- load("some_old.RData")
Get sizes
vars_size <- sapply(vars, function(x) object.size(get(x)))
Order them
vars <- vars[order(vars_size, decreasing=TRUE)]
vars_size <- vars_size [order(vars_size, decreasing=TRUE)]
Make dialog box (depends on OS, here is Windows)
vars_with_size <- paste(vars,vars_size)
vars_to_save <- select.list(vars_with_size, multiple=TRUE)
Remove what you don't want
rm(vars[!vars_with_size%in%vars_to_save])
To nice form of object size I use solution based on getAnywhere(print.object_size)
pretty_size <- function(x) {
ifelse(x >= 1024^3, paste(round(x/1024^3, 1L), "Gb"),
ifelse(x >= 1024^2, paste(round(x/1024^2, 1L), "Mb"),
ifelse(x >= 1024 , paste(round(x/1024, 1L), "Kb"),
paste(x, "bytes")
)))
}
Then in 4. one can use paste(vars, pretty_size(vars_size))
You may want to check out the RGtk2 package.
You can very easily create an interface with Glade Interface Designer and then attach whatever R commands you want to it.
If you want a good starting point where to "steal" ideas on how to use RGtk2, install the rattle package and run rattle();. Then look at the source code and start making your own interface :)
I may have a go at it and see if I can come out with something simple.
EDIT: this is a quick and dirty piece of code that you can play with. The big problem with it is that for whatever reason the rm instruction does not get executed, but I'm not sure why... I know that it is the central instruction, but at least the interface works! :D
TODO:
Make rm work
I put all the variables in the remObjEnv environment. It should not be listed in the current variable and it should be removed when the window is closed
The list will only show objects in the global environment, anything inside other environment won't be shown, but that's easy enough to implement
probably there's some other bug I haven't thought of :D
Enjoy
# Our environment
remObjEnv <<- new.env()
# Various required libraries
require("RGtk2")
remObjEnv$createModel <- function()
{
# create the array of data and fill it in
remObjEnv$objList <- NULL
objs <- objects(globalenv())
for (o in objs)
remObjEnv$objList[[length(remObjEnv$objList)+1]] <- list(object = o,
type = typeof(get(o)),
size = object.size(get(o)))
# create list store
model <- gtkListStoreNew("gchararray", "gchararray", "gint")
# add items
for (i in 1:length(remObjEnv$objList))
{
iter <- model$append()$iter
model$set(iter,
0, remObjEnv$objList[[i]]$object,
1, remObjEnv$objList[[i]]$type,
2, remObjEnv$objList[[i]]$size)
}
return(model)
}
remObjEnv$addColumns <- function(treeview)
{
colNames <- c("Name", "Type", "Size (bytes)")
model <- treeview$getModel()
for (n in 1:length(colNames))
{
renderer <- gtkCellRendererTextNew()
renderer$setData("column", n-1)
treeview$insertColumnWithAttributes(-1, colNames[n], renderer, text=n-1)
}
}
# Builds the list.
# I seem to have some problems in correctly build treeviews from glade files
# so we'll just do it by hand :)
remObjEnv$buildTreeView <- function()
{
# create model
model <- remObjEnv$createModel()
# create tree view
remObjEnv$treeview <- gtkTreeViewNewWithModel(model)
remObjEnv$treeview$setRulesHint(TRUE)
remObjEnv$treeview$getSelection()$setMode("single")
remObjEnv$addColumns(remObjEnv$treeview)
remObjEnv$vbox$packStart(remObjEnv$treeview, TRUE, TRUE, 0)
}
remObjEnv$delObj <- function(widget, treeview)
{
model <- treeview$getModel()
selection <- treeview$getSelection()
selected <- selection$getSelected()
if (selected[[1]])
{
iter <- selected$iter
path <- model$getPath(iter)
i <- path$getIndices()[[1]]
model$remove(iter)
}
obj <- as.character(remObjEnv$objList[[i+1]]$object)
rm(obj)
}
# The list of the current objects
remObjEnv$objList <- NULL
# Create the GUI.
remObjEnv$window <- gtkWindowNew("toplevel", show = FALSE)
gtkWindowSetTitle(remObjEnv$window, "R Object Remover")
gtkWindowSetDefaultSize(remObjEnv$window, 500, 300)
remObjEnv$vbox <- gtkVBoxNew(FALSE, 5)
remObjEnv$window$add(remObjEnv$vbox)
# Build the treeview
remObjEnv$buildTreeView()
remObjEnv$button <- gtkButtonNewWithLabel("Delete selected object")
gSignalConnect(remObjEnv$button, "clicked", remObjEnv$delObj, remObjEnv$treeview)
remObjEnv$vbox$packStart(remObjEnv$button, TRUE, TRUE, 0)
remObjEnv$window$showAll()
Once you've figured out what you want to keep, you can use the function -keep- from package gdata does what its name suggests.
a <- 1
b <- 2
library(gdata)
keep(a, all = TRUE, sure = TRUE)
See help(keep) for details on the -all- and -sure- options.
all: whether hidden objects (beginning with a .) should be removed, unless explicitly kept.
sure: whether to perform the removal, otherwise return names of objects that would have been removed.
This function is so useful that I'm surprised it isn't part of R itself.
The OS X gui does have such a thing, it's called the Workspace Browser. Quite handy.
I've also wished for an interface that shows the session dependency between objects, i.e. if I start from a plot() and work backwards to find all the objects that were used to create it. This would require parsing the history.
It doesn't have checkboxes to delete with, rather you select the file(s) then click delete. However, the solution below is pretty easy to implement:
library(gWidgets)
options(guiToolkit="RGtk2")
## make data frame with files
out <- lapply((x <- list.files()), file.info)
out <- do.call("rbind", out)
out <- data.frame(name=x, size=as.integer(out$size), ## more attributes?
stringsAsFactors=FALSE)
## set up GUI
w <- gwindow("Browse directory")
g <- ggroup(cont=w, horizontal=FALSE)
tbl <- gtable(out, cont=g, multiple=TRUE)
size(tbl) <- c(400,400)
deleteThem <- gbutton("delete", cont=g)
enabled(deleteThem) <- FALSE
## add handlers
addHandlerClicked(tbl, handler=function(h,...) {
enabled(deleteThem) <- (length(svalue(h$obj, index=TRUE)) > 0)
})
addHandlerClicked(deleteThem, handler=function(h,...) {
inds <- svalue(tbl, index=TRUE)
files <- tbl[inds,1]
print(files) # replace with rm?
})
The poor guy answer could be :
ls()
# spot the rank of the variables you want to remove, for example 10 to 25
rm(list= ls()[[10:25]])
# repeat until satisfied
To clean the complete environment you can try:
rm(list(ls())