I've written an import function that gets a single file from an AWS S3 bucket.
That function itself is a wrapper around aws.s3::s3read_using(), which takes a reading function as its first argument.
Why do I wrap around aws.s3::s3read_using()? Because I need to do some special error handling and want the wrapping function to Recall() itself up to a limit... but that's a different story.
Now that I've successfully built and tested my wrapping function, I want to add another wrapper around it:
I want to iterate n times over my wrapper to bind the downloaded files together. The difficulty is handing the 'reading_function' through to the FUN argument of aws.s3::s3read_using().
I could do that by simply using ... - BUT!
I want to make it clear to the user of my wrapping wrapper that they need to specify that argument.
So I've decided to use rlang::enexpr() to capture the argument and hand it over to my first wrapper via !!. That wrapper in turn captures the argument again with rlang::enexpr() and finally hands it over to aws.s3::s3read_using() via rlang::expr(aws.s3::s3read_using(FUN = !!reading_fn, object = s3_object)).
That works perfectly fine and smoothly. My problem is with testing that function construct using testthat and mockery.
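For context, a minimal sketch of that wrapper pattern looks roughly like this (read_s3_file() and its arguments are illustrative placeholders, not my real code):
read_s3_file <- function(reading_fn, s3_object) {
  reading_fn <- rlang::enexpr(reading_fn)
  # the special error handling and Recall() retry logic would live here
  eval(rlang::expr(aws.s3::s3read_using(FUN = !!reading_fn, object = s3_object)))
}
# e.g. read_s3_file(readr::read_csv, "s3://my-bucket/some/key.csv")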
Here is some broadly simplified code:
my_workhorse_function <- function(fn_to_work_with, value_to_work_on) {
  fn <- rlang::enexpr(fn_to_work_with)
  # Some other magic happens here - error handling, condition-checking, etc...
  out <- eval(rlang::expr((!!fn)(value_to_work_on)))
}
my_iterating_function <- function(fn_to_iter_with, iterate_over) {
  fn <- rlang::enexpr(fn_to_iter_with)
  out <- list()
  for (i in seq_along(iterate_over)) {
    out[[i]] <- my_workhorse_function(!!fn, iterate_over[i])
  }
  return(out)
}
# Works just fine
my_iterating_function(sqrt, c(9:16))
Now, to the test:
# Throws an ERROR: 'Error in `!fn`: invalid argument type'
test_that("my_iterating_function iterates length(iterate_over) times over my_workhorse_function", {
mock_1 <- mockery::mock(1, cycle = TRUE)
stub(my_iterating_function, "my_workhorse_function", mock_1)
expect_equal(my_iterating_function(sqrt, c(9:16)), list(1,1,1,1,1,1,1,1))
expect_called(mock_1, 8)
})
I've used a workaround, but it just doesn't feel right, even though it works:
# Test passed
test_that("my_iterating_function iterates length(iterate_over) times over my_workhorse_function", {
mock_1 <- mockery::mock(1, cycle = TRUE)
stub(my_iterating_function, "my_workhorse_function",
function(fn_to_work_with, value_to_work_on) {
fn <- rlang::enexpr(fn_to_work_with)
out <- mock_1(fn, value_to_work_on)
out})
expect_equal(my_iterating_function(sqrt, c(9:16)), list(1,1,1,1,1,1,1,1))
expect_called(mock_1, 8)
})
I'm using R version 4.1.1, with testthat 3.1.1, mockery 0.4.2, and rlang 0.4.12.
I think you're complicating things here, although maybe I'm not fully understanding your end goal. You can directly pass functions through arguments without any issue. Your example code above can be easily simplified to (keeping the loop just to match your test_that() call):
library(testthat)
library(mockery)
my_workhorse_function <- function(fn_to_work_with, value_to_work_on) {
  fn_to_work_with(value_to_work_on)
}
my_iterating_function <- function(fn_to_iter_with, iterate_over) {
  out <- list()
  for (i in seq_along(iterate_over)) {
    out[[i]] <- my_workhorse_function(fn_to_iter_with, iterate_over[i])
  }
  return(out)
}
# Works just fine
my_iterating_function(sqrt, c(9:16))
#> [[1]]
#> [1] 3
#>
#> ...
test_that("my_iterating_function iterates length(iterate_over) times over my_workhorse_function", {
mock_1 <- mockery::mock(1, cycle = TRUE)
stub(my_iterating_function, "my_workhorse_function", mock_1)
expect_equal(my_iterating_function(sqrt, c(9:16)), list(1,1,1,1,1,1,1,1))
expect_called(mock_1, 8)
})
#> Test passed 🥇
You can just pass the function directly through all of your nested functions. A function passed as an argument isn't evaluated until you explicitly call it, so there is nothing you need to capture with enexpr(). You usually use enexpr() when users are supplying expressions, not just functions.
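To illustrate the distinction (a small sketch, not taken from your code):
# Functions can simply be passed and called; nothing needs to be captured.
apply_fn <- function(f, x) f(x)
apply_fn(sqrt, 9)     # 3
# enexpr() is for capturing an expression the caller writes, to evaluate later.
capture_it <- function(e) rlang::enexpr(e)
capture_it(sqrt(9))   # the unevaluated call sqrt(9)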
We're trying to allow users to write a function that will execute depending on variables that will be available at runtime (this is a simulation context).
The user "declares" an inner function using the "function declarer". Later, when the outer function is called, the required piece of information (my_var) is available. But we just don't know how to make it available to the declared function.
If possible, we'd like to solve this using rlang / tidyverse style tools. Thanks in advance.
library(rlang)
function_declarer <- function(expr) {
  function() {
    eval(expr)
  }
}
inner_function <- function_declarer(mean(my_var))
outer_function <- function(step) {
  my_var <- c(5, 6, 7)
  # environment(step) <- env_clone(environment(step), parent = current_env()) # know this does not work
  step()
}
outer_function(inner_function)
It seems you can just grab the unevaluated expression and then evaluate it later with eval() in the local environment.
library(rlang)
function_declarer <- function(expr) {
  enexpr(expr)
}
inner_function <- function_declarer(mean(my_var))
outer_function <- function(step) {
  my_var <- c(5, 6, 7)
  eval(step)
}
outer_function(inner_function)
So in this scenario we're working with expressions rather than functions. Functions would add a complication because variables are lexically scoped, so free variables are looked up in the environment where the function is defined, not where it is called. Using expressions avoids this problem. So you could simplify this to just
inner_function <- quote(mean(my_var)) # base
inner_function <- rlang::expr(mean(my_var)) # rlang
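A quick sketch of the scoping issue mentioned above (the helper names here are just for illustration):
f <- function() mean(my_var)                   # free variable my_var
g <- function(step) { my_var <- c(5, 6, 7); step() }
# g(f) fails unless my_var happens to exist where f was defined (the global env)
e <- quote(mean(my_var))                       # an expression instead
h <- function(step) { my_var <- c(5, 6, 7); eval(step) }
h(e)                                           # 6: evaluated in h's environment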
I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
library(dplyr)
library(lubridate)  # for hours()

max_conc_fx <- function(df) {
  highest_conc <- 0
  for (i in 1:nrow(df)) {
    curr_time <- df$event_time[i]
    within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
    num_buyers <- length(unique(within1hr$userid))
    curr_conc <- nrow(within1hr) / num_buyers
    if (curr_conc > highest_conc) {
      highest_conc <- curr_conc
    }
  }
  mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I rectified this by adding the argument "..." to my max_conc_fx() function, which leads to this code which works:
max_conc_fx <- function(df, ...) {  # df holds the rows of data for one shop
  highest_conc <- 0
  for (i in 1:nrow(df)) {
    curr_time <- df$event_time[i]
    within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
    num_buyers <- length(unique(within1hr$userid))
    curr_conc <- nrow(within1hr) / num_buyers
    if (curr_conc > highest_conc) {
      highest_conc <- curr_conc
    }
  }
  mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing additional arguments on to other functions, but I don't see either of those happening here. Do let me know if I am missing something or if you have a better solution for my code.
The dots don't do much in this case, but there is a condition in group_modify() that requires them for a function like yours to work. The function you pass gets converted using a helper, as_group_map_function(). This helper checks whether the function accepts at least two arguments, and if not, it must have ... :
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
  .f <- rlang::as_function(.f)
  if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)) {
    stop("The function must accept at least two arguments. You can use ... to absorb unused components")
  }
  .f
}
I'm not 100% sure why it is done this way, but from a quick peek at the source code it looks like group_modify() eventually calls the 'converted' version of your function with two arguments plus ... (technically no conversion happens here; the conversion only takes place if you pass a formula instead of a function). So my best guess is that this is the reason: the function needs some way of dealing with at least two arguments, and if it doesn't use them it needs ... to 'absorb' them, otherwise the call would fail.
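For what it's worth, group_modify() ultimately calls your function as .f(.x, .y, ...), where .x holds the group's rows and .y is a one-row tibble of the grouping values, so an alternative to adding ... is to accept both explicitly. A small sketch with built-in data:
library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  group_modify(function(.x, .y) {
    # .y$cyl identifies the current group; .x holds that group's rows
    summarise(.x, mean_mpg = mean(mpg))
  })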
For complicated reasons I'm trying to spy on often-used I/O methods in R.
I've got working code, but there's something going on with R's evaluation that I don't understand, and it leads to the lengthy comment below.
.spy.on.methods <- function() {
  attached.packages <- .packages()
  pkg.watch <- list(
    "data.table" = "fread",
    "maptools"   = "readShapePoly",
    "utils"      = "read.csv"
  )
  for (pkg in names(pkg.watch)) {
    methods <- pkg.watch[[pkg]]
    if (pkg %in% attached.packages) {
      # replace now
      .replace.methods.with.spy(methods)
    } else {
      # replace them right after they're loaded
      setHook(
        packageEvent(pkg, "attach"),
        # parameters included for reader comprehension - we don't need or use them
        function(pkg.name, pkg.path) {
          # WARNING: you cannot use `methods` here. The reason is the value is
          # mutated within this loop and R's lazy evaluation means that ALL
          # hook functions will end up having a `methods` value from the final
          # iteration of the loop
          .replace.methods.with.spy(pkg.watch[[pkg.name]])
        },
        action = "append"
      )
    }
  }
}
It seems to me that I should be able to replace
.replace.methods.with.spy(pkg.watch[[pkg.name]])
with
.replace.methods.with.spy(methods)
but it doesn't work. Diagnostic print statements indicate that in all cases methods has the value associated with pkg.watch[["utils"]].
I feel as if some combination of force, eval, and substitute is the solution here, but I have not yet found it. Can anyone enlighten me?
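To make the behaviour concrete, here is a stripped-down sketch (unrelated to the real spy code) that shows the same effect:
make_hooks <- function() {
  hooks <- list()
  for (m in c("fread", "readShapePoly", "read.csv")) {
    hooks[[m]] <- function() m   # every closure shares the same `m`
  }
  hooks
}
sapply(make_hooks(), function(h) h())
# all three hooks return "read.csv" - the value from the final iteration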
I have a simple recursive function in R that modifies its input arguments. I use eval.parent(substitute()) in order to apply the changes to the arguments. However, I get this error:
"invalid (do_set) left-hand side to assignment "
Does anyone know how to fix this error?
Here is my function:
decompose_clade = function(ind, Small_clades_ind, Remove_tips, Clades) {
  Ch = Children(tree, ind)
  if (Ch[1] < length(tree$tip.label)) {
    l = Remove_tips
    l = c(l, Ch[1])
    eval.parent(substitute(Remove_tips=l))
    print("tip added=")
    print(Ch[1])
  } else if (length(extract.clade(tree, Ch[1])$tip.label) < 4) {
    s = Small_clades_ind
    s = c(s, Ch[1])
    eval.parent(substitute(Small_clades_ind=s))
    print("small clade added=")
    print(Ch[1])
  } else if (length(extract.clade(tree, Ch[1])$tip.label) > 80) {
    decompose_clade(Ch[1], Small_clades_ind, Remove_tips, Clades)
    print("function calls again")
  } else {
    m = Clades
    m = c(m, Ch[1])
    eval.parent(substitute(Clades <- m))
    print("a clade added=")
    print(Ch[1])
  }
}
There is no easy way to get pass-by-reference behaviour in R, with only one exception: environments. I'm not sure if an environment suits your needs, but you can give it a try:
modify_input <- function(x) {
  x$z <- 1
}

x <- new.env(parent = emptyenv())
modify_input(x)
x$z
As to usage, an environment supports e$z, e[["z"]], and length(e) just like a list, but it doesn't support e[[1]] and the like. You can think of an environment as a dictionary whose elements have no order. If you want to list all the elements in an environment, you can use ls(). And there are ways to transform an environment into a list (as.list) and vice versa (list2env). Hope it helps.
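A few of those operations, for illustration:
e <- new.env(parent = emptyenv())
e$z <- 1
e[["w"]] <- 2
length(e)                             # 2
ls(e)                                 # "w" "z"
as.list(e)                            # environment -> named list
e2 <- list2env(list(a = 1, b = 2))    # list -> environment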
Maybe you can follow the following approach:
x <- 1
fn <- function() {
  x <<- 2
}
fn()
Now x is 2.
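Applied to the question's recursion, the same idea would mean keeping the growing vectors in an enclosing environment and updating them from a helper with <<- (a rough sketch with made-up names, not the original tree code):
collect_tips <- function(inds) {
  removed <- integer(0)
  add_tip <- function(i) removed <<- c(removed, i)   # updates `removed` one level up
  for (i in inds) add_tip(i)
  removed
}
collect_tips(1:3)   # 1 2 3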
I am currently writing a package using reference classes. I have come across
an issue which from reading various sources:
Method initialisation in R reference classes
Can't reliably use RefClass methods in Snowfall
I gather is caused because reference methods are not all copied to every object of the class; rather, they are copied when first accessed.
https://stat.ethz.ch/pipermail/r-devel/2011-June/061261.html
As an example define:
test <- setRefClass("TEST",
fields = list( a = "numeric"),
methods = list(
addone = function(){
a <<- a+1
},
initialize = function(){
a <<- 1
}
)
)
example <- test$new()
So example is a new object of class TEST. Typing example$ and tabbing in the
console gives
> example$
# example$.->a example$.refClassDef example$.self
# example$a example$initialize
so the method addone is not presented as an option. It is, however, available to call:
example$addone()
Now tabbing again reveals
# >
# > example
# Reference class object of class "TEST"
# Field "a":
# [1] 2
# > example$
# example$.->a example$.refClassDef example$.self
# example$a example$addone example$field
# example$initialize example$show
so now addone and field and show are presented as options.
In one of the above links, Martin Morgan advises forcing definition of the methods. This works well:
test <- setRefClass("TEST",
fields = list( a = "numeric"),
methods = list(
addone = function(){
a <<- a+1
},
initialize = function(){
a <<- 1
.self$addone #force definition
}
)
)
example <- test$new()
so now tabbing gives:
# > example$
# example$.->a example$.refClassDef example$.self
# example$a example$addone example$initialize
Some of my classes have over 30 methods, so I would like to do this as succinctly as possible.
I have defined:
test <- setRefClass("TEST",
fields = list( a = "numeric"),
methods = list(
addone = function(){
a <<- a+1
},
initialize = function(){
a <<- 1
eval(parse(text=paste0('.self$',ls(test$def#refMethods))))
}
)
)
example <- test$new()
tabbing now gives:
# > example$
# example$.->a example$.refClassDef example$.self
# example$a example$addone example$callSuper
# example$copy example$export example$field
# example$getClass example$getRefClass example$import
# example$initFields example$initialize example$show
# example$trace example$untrace
Whilst this works, it feels a bit clumsy. Also, test$def@refMethods is used rather than getRefClass("TEST")$def@refMethods, so that feels a bit wrong. Has anyone dealt with this issue before?
Is there a better way to approach a solution? Thanks for any advice and apologies if the question is overly drawn out.
I wonder what your objective is? Function names showing up with tab completion? Then it's worth a post to the R-devel mailing list with a feature request. The original scenario is more elegantly handled with usingMethods as documented on ?setRefClass. A continued hack might be
initialize = function(...) {
  methods <- getRefClass(class(.self))$methods()
  eval(parse(text = paste0(".self$", methods)))
  callSuper(...)
}
Tab completions can be customized via .DollarNames in the utils package, so
.DollarNames.TEST <- function(x, pattern)
  grep(pattern, getRefClass(class(x))$methods(), value = TRUE)
Maybe an S3 method could be written at the base of your class hierarchy for this?
I know this is an old question but it is still the top entry when searching for refClass tab completion on google, so I'll just add an update:
Instead of using grep in the .DollarNames function as suggested by Martin, use findMatches from the utils package, as it plays better with the various R GUIs around (grep will delete your partially typed name upon hitting tab):
.DollarNames.TEST <- function(x, pattern) {
  utils:::findMatches(pattern, getRefClass(class(x))$methods())
}
This is also how tab completion is handled internally for lists and data.frames
@Martin Morgan noted that this is termed tab completion. The package rcompletion, and later rcompgen, were tasked with achieving this; they have now been moved into utils.
rcompletion update
I looked through the code in completion.R and, from what I could determine, utils:::.DollarNames.environment was handling tab completion for reference classes.
completion.R
Redefining the function seemed to achieve tab completion:
assignInNamespace(
  x = ".DollarNames.environment",
  value = function(x, pattern = "") {
    y <- NULL
    if (isS4(x) && !is.null(x[['.refClassDef']])) {
      if (.hasSlot(x$.refClassDef, 'refMethods')) {
        y <- x$.refClassDef@refMethods
        y <- ls(y, all.names = TRUE, pattern = pattern)
      }
    }
    x <- ls(x, all.names = TRUE, pattern = pattern)
    unique(c(x, y))
  },
  ns = "utils"
)
Some things to note:
I would only use this for my own use. Currently I am debugging and documenting a package; I had some longish method names and couldn't remember exactly what they were, so tab completion helps greatly.
Usage of assignInNamespace in a package is frowned upon (if not banned); see ?assignInNamespace.
Forced definition of methods is more advisable.