Make list content available in a function environment - r

In order to avoid creating R functions with many arguments defining settings for a single object, I'm gathering them in a list,
list_my_obj <- list("var1" = ..., "var2" = ..., ..., "varN" = ...)
class(list_my_obj) <- "my_obj"
I then define functions that accept such a list as argument and inject the elements of the list in the function scope:
my_fun <- function(list_my_obj) {
stopifnot(class(list_my_obj) == "my_obj")
list2env(list_my_obj, envir=environment())
rm(list_my_obj)
var_sum <- var1 + var2
(...)
}
Injecting the elements of the list in the function scope allows me to avoid calling them with list_my_obj$var1, list_my_obj$var2, etc, later in the function, which would reduce the readability of the code.
This solution works perfectly fine, however it produces a note when running R CMD check, saying "no visible binding for global variable" for var1, var2, ... varN.
To avoid such notes, one could just create new variables at the beginning of the function body "by hand" for each element of the list:
var1 <- list_my_obj$var1
(...)
varN <- list_my_obj$varN
but I would like to avoid this because N can be large.
Any better solution or idea on how to suppress the R CMD check notes in this case?
Thank you!

The function list2env is made for this, for example:
list2env(list_my_obj, envir = environment())

Try with (or within):
f <- function(x) {
stopifnot(inherits(x, "my_obj"))
with(x, {
# ...
var_sum <- var1 + var2
# ...
var_sum
})
}
my_obj <- structure(list(var1 = 1, var2 = 2), class = "my_obj")
f(my_obj)
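For comparison, here is a minimal sketch that puts the list2env() approach from the question next to the with() approach. The constructor new_my_obj and the function names sum_list2env and sum_with are made up for illustration; only structure(), inherits(), list2env(), with() and exists() are base R. It is meant to show that both give the same value, and that a variable created inside with() does not end up in the function's own frame.

```r
# Hypothetical constructor for the settings object (not from the original post)
new_my_obj <- function(...) structure(list(...), class = "my_obj")

# Variant 1: inject the list elements into the function's own environment
sum_list2env <- function(x) {
  stopifnot(inherits(x, "my_obj"))
  list2env(x, envir = environment())
  var1 + var2
}

# Variant 2: evaluate the expression with the list as the lookup scope
sum_with <- function(x) {
  stopifnot(inherits(x, "my_obj"))
  value <- with(x, {
    tmp <- var1 + var2  # tmp is created inside the with() evaluation only
    tmp
  })
  # Report whether tmp leaked into this function's frame
  list(value = value,
       leaked = exists("tmp", envir = environment(), inherits = FALSE))
}

obj <- new_my_obj(var1 = 1, var2 = 2)
res_list2env <- sum_list2env(obj)
res_with <- sum_with(obj)
```

If intermediate results are needed later in the function, they have to be returned from the with() block, as value is here.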

Related

Why is print() going to change the output of my function?

I am working on a function that returns the top observations of a column. The example below is only part of the whole function, and my final goal is to run it in a loop. I noticed something odd: why does print(df_col_indicator) change the result when I define "df_col_indicator" externally rather than inside the function call? With print(df_col_indicator) included, the function does exactly what I want.
library(dplyr)
library(tidyverse)
remove(list = ls())
dataframe_test <- data.frame(
county_name = c("a", "b","c", "d","e", "f", "g", "h"),
column_test1 = c(100,100,100,100,100,100,50,50),
column_test2 = c(40,90,50,40,40,100,13,14),
column_test3 = c(100,90,50,40,30,40,100,50),
month = c("2020-09-01", "2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-08-01","2020-08-01"))
choose_top_5 <- function(df, df_col_indicator, df_col_month, char_month, numb_top, df_col_county) {
### this here changes output of my function
#print(df_col_indicator) # changes output of my function depending on included or excluded
### enquo / ensym / deparse
df_col_indicator_ensym <- ensym(df_col_indicator)
df_col_month_ensym <- ensym(df_col_month)
### filter month and top 5 observations
df_top <- df %>%
filter(!!df_col_month_ensym == char_month) %>%
slice_max(!!df_col_indicator_ensym, n = numb_top) %>%
select(!!df_col_county, !!df_col_month_ensym, !!df_col_indicator_ensym)
return(df_top)
}
### define "df_col_indicator" within the function
a = choose_top_5(df = dataframe_test, df_col_indicator = "column_test3",
df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
df_col_county = "county_name")
a
### define "df_col_indicator" externally
external = "column_test3"
b = choose_top_5(df = dataframe_test, df_col_indicator = external,
df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
df_col_county = "county_name")
b
### goal is to run function over loop
external <- c("column_test1","column_test2","column_test3")
my_list <- list()
for (i in external) {
my_list[[i]] <- choose_top_5(df = dataframe_test, df_col_indicator = i,
df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
df_col_county = "county_name")
}
my_list
Your example is quite lengthy. Let's boil it down to a minimal reproducible example with two very similar functions. These both take a single argument and simply print the passed variable to the console, and return the result of calling ensym on the same variable.
The only difference between the two is the order in which the calls to print and ensym are made.
library(rlang)
test_ensym1 <- function(x)
{
result <- ensym(x)
print(x)
return(result)
}
test_ensym2 <- function(x)
{
print(x)
result <- ensym(x)
return(result)
}
Now we might expect these two functions to do exactly the same thing, and indeed when we pass a string directly to them, they both give the same result:
test_ensym1("hello")
#> [1] "hello"
#> hello
test_ensym2("hello")
#> [1] "hello"
#> hello
But look what happens when we use an external variable to pass in our string:
y <- "hello"
test_ensym1(y)
#> [1] "hello"
#> y
test_ensym2(y)
#> [1] "hello"
#> hello
The functions both still print "hello", as expected, but they return a different result. When we called ensym first, the function returned the symbol y, and when we called print first it returned the symbol hello.
The reason for this is that when you call a function in R, the symbols you pass as parameters are not evaluated immediately. Instead, they are interpreted as promise objects and evaluated as required in the body of the function. It is this lazy evaluation that allows for some of the tidyverse trickery.
The difference between the two functions above is that calling print(x) forces the evaluation of x. Before that point, x is an unevaluated symbol. Afterwards, it behaves just like any other variable you would use interactively in the console, so when you call ensym, you are calling it on this evaluated variable, not as an unevaluated promise.
ensym, on the other hand, does not evaluate x, so if ensym is called first, it will return the unevaluated symbol that was passed to the function.
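The lazy evaluation itself can be seen without rlang. The following is a small base R sketch (the function names ignores_arg and forces_arg are hypothetical): an argument that is never touched is never evaluated, while force() evaluates the promise on the spot.

```r
# The argument is never used, so its promise is never forced
ignores_arg <- function(x) {
  "finished without touching x"
}

# force() evaluates the promise immediately
forces_arg <- function(x) {
  force(x)
  "finished after evaluating x"
}

# stop() is passed as the argument but never runs
lazy_result <- ignores_arg(stop("this error is never raised"))

# Here the promise is forced, so stop() runs and the error is caught
eager_result <- tryCatch(
  forces_arg(stop("raised as soon as x is forced")),
  error = function(e) conditionMessage(e)
)
```

print(x) in the functions above plays the same role as force(x) here: it is simply the first place where the value of x is needed.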
So actually, the easiest way to fix your problem is to move print to after the ensym call.
You also have to change ensym to as.symbol.
Consider a function like this
f <- function(x) ensym(x)
myvar <- "some string"
You will find that
> f("some string")
`some string`
> f(myvar)
myvar
This is because ensym only looks at the expression that was supplied for the argument, one step up the call chain. It converts whatever it finds there into a symbol and returns it (if what it finds is neither a string nor a symbol, you get an error). As such, in your first example ensym returns column_test3; in your second one it returns external.
As far as I can tell, what you want is to get the value that df_col_indicator represents and then convert that value into a symbol. This means you have to evaluate df_col_indicator first and then convert. as.symbol does what you need.
g <- function(x) as.symbol(x)
myvar <- "some string"
Some tests
> g("some string")
`some string`
> g(myvar)
`some string`
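Since the stated goal is a loop over column names, here is a small base R sketch of how as.symbol behaves in that setting. The data frame scores and the function column_max are hypothetical, and the sketch deliberately avoids dplyr: it only shows that the loop variable is evaluated to its string value before being turned into a symbol.

```r
scores <- data.frame(a = c(3, 1, 2), b = c(10, 30, 20))

column_max <- function(data, col_name) {
  col_sym <- as.symbol(col_name)  # col_name is evaluated first, then converted
  max(eval(col_sym, data))        # look the symbol up among the columns of data
}

results <- list()
for (i in c("a", "b")) {
  # i is a variable holding the column name, not the column name itself
  results[[i]] <- column_max(scores, i)
}
```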

Why might quasiquotation fail to pass a variable to `with()` inside a function?

Issue
I am trying to implement quasiquotation within a function to split a data frame into parts according to the levels of a factor. When run, the function fails with Error in levels(variable) : object 'Component' not found, where Component is a factor in the provided data frame.
Function
split_by_factor <- function(x, variable){
v <- quote(variable)
el <- expr(with(x, levels(!!v)))
l <- eval(el)
result <- list()
i <- 1
for(level in l){
e <- expr(with(x, x[!!v == level,]))
result[i] <- eval(e)
i <- i+1
}
return(result)
}
The function accepts a data frame as x and an unquoted factor variable within that data frame as variable. It is then supposed to return a list of data frames that have been separated by the levels in the provided factor.
Failed resolution
When I define x and variable independently and run the body of the function outside of a function call, everything works as intended. This led me to believe it was an issue with operating within the function environment. I have tried switching quote() to enquo() and expr() to enexpr(), as I read that expr() can misbehave within a function, but I get a similar error: Error: arg must be a symbol. Moreover, since with() sets the environment in which R should look for objects, I am confused why this behaves differently inside the function at all.
Edit: Reproducible example
sample_data <- data.frame(v1 = sample(1:5, 10, replace = TRUE),
v2 = factor(x = sample(1:5, 10, replace = TRUE),
levels = c(1, 2, 3, 4, 5),
labels = c("a", "b", "c", "d", "e")))
split_by_factor(sample_data, v2)
Error message received when running this example:
Error in levels(variable) : object 'v2' not found
The line
v <- quote(variable)
will only store the symbol variable. It will not capture the symbol passed to your function. Which means when you call
el <- expr(with(x, levels(!!v)))
it will be running
el <- expr(with(x, levels(variable)))
At that point it will try to evaluate the argument you passed in, and it will discover that there is no variable named v2 in the global environment.
If you are just trying to pass in a column name to the rlang world, instead you should use ensym (though when I tested it, enexpr also worked just fine)
library(rlang)
split_by_factor <- function(x, variable){
v <- ensym(variable)
el <- expr(with(x, levels(!!v)))
l <- eval(el)
result <- list()
i <- 1
for(level in l){
e <- expr(with(x, x[!!v == level,]))
result[[i]] <- eval(e)
i <- i+1
}
return(result)
}
This will now make sure you are running
el <- expr(with(x, levels(v2)))
Also, there is an unrelated error in your code: when you assign a single value into a list element, you want to use [[ ]] rather than [ ]. So you should have
result[[i]] <- eval(e)
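The difference between quoting the parameter name and capturing what the caller wrote can also be shown in base R, with substitute() standing in for ensym(). This is only a sketch of the idea, not the rlang approach above; capture_quote, capture_subst and split_by_factor_base are hypothetical names, and the last one leans on base split() instead of the explicit loop.

```r
# quote() returns the literal symbol written in the function body
capture_quote <- function(variable) quote(variable)

# substitute() returns the expression the caller supplied for the argument
capture_subst <- function(variable) substitute(variable)

quoted <- capture_quote(v2)   # the symbol `variable`
captured <- capture_subst(v2) # the symbol `v2`

# Base R variant of the splitting function: evaluate the captured
# expression inside the data frame, then let split() do the grouping
split_by_factor_base <- function(x, variable) {
  f <- eval(substitute(variable), x, parent.frame())
  split(x, f)
}

d <- data.frame(v1 = 1:4, v2 = factor(c("a", "b", "a", "b")))
parts <- split_by_factor_base(d, v2)
```

parts is a list of data frames named after the factor levels, so parts$a holds the rows where v2 is "a".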

R: passing argument name in dots (...) through a third string variable

Imagine you have a simple function that specifies which statistical tests to run for each variable. Its syntax, simplified for the purposes of this question is as follows:
test <- function(...) {
x <- list(...)
return(x)
}
which takes argument pairs such as Gender = 'Tukey', and intends to pass its result to other functions down the line. The output of test() is as follows:
test(Gender = 'Tukey')
# $Gender
# [1] "Tukey"
What is desired is the ability to replace the literal Gender by a dynamically assigned variable varname (e.g., for looping purposes). Currently what happens is:
varname <- 'Gender'
test(varname = 'Tukey')
# $varname
# [1] "Tukey"
but what is desired is this:
varname <- 'Gender'
test(varname = 'Tukey')
# $Gender
# [1] "Tukey"
I tried tinkering with functions such as eval() and parse(), but to no avail. In practice, I resolved the issue by simply renaming the resulting list, but it is an ugly solution and I am sure there is an elegant R way to achieve it. Thanks in advance for the educational value of your answer.
NB: This question occurred to me while trying to program a custom function which uses mcp() from the multcomp package in its internals. The said mcp() function is the real world counterpart of test().
EDIT1: Perhaps it needs to be clarified that (for educational purposes) changing test() is not an option. The question is about how to pass the tricky argument to test(). If you take a look at NB, it becomes clear why: the real world counterpart of test(), namely mcp(), comes with a package. And while it is possible to create a modified copy of it, I am really curious whether there exists a simple solution in somehow 'converting' the dynamically assigned variable to a literal in the context of dot-arguments.
This works:
test <- function(...) {
x = list(...)
names(x) <- sapply(names(x),
function(p) eval(as.symbol(p)))
return(x)
}
apple = "orange"
test(apple = 5)
We can use
test <- function(...) {
x <- list(...)
if(exists(names(x))) names(x) <- get(names(x))
x
}
test(Gender = 'Tukey')
#$Gender
#[1] "Tukey"
test(varname = 'Tukey')
#$Gender
#[1] "Tukey"
What about this:
varname <- "Gender"
args <- list()
args[[varname]] <- "Tukey"
do.call(test, args)
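The do.call idea extends naturally to several dynamically named arguments, without touching test(). A minimal sketch, assuming the names and values live in two parallel character vectors (var_names and tests are made-up examples, including the "Age"/"Dunnett" pair):

```r
# test() exactly as defined in the question
test <- function(...) {
  x <- list(...)
  return(x)
}

var_names <- c("Gender", "Age")    # hypothetical argument names
tests <- c("Tukey", "Dunnett")     # hypothetical values, one per name

# Build a named list, then let do.call() turn the names into argument names
args <- setNames(as.list(tests), var_names)
result <- do.call(test, args)
```

result is the same list that test(Gender = "Tukey", Age = "Dunnett") would return.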

Allowing R functions to directly alter the parent environment

I'm trying to figure out how to allow a function to directly alter or create variables in its parent environment, whether the parent environment is the global environment or another function.
For example if I have a function
my_fun <- function(){
a <- 1
}
I would like a call to my_fun() to produce the same results as doing a <- 1.
I know that one way to do this is by using parent.frame as per below but I would prefer a method that doesn't involve rewriting every variable assignment.
my_fun <- function(){
env = parent.frame()
env$a <- 1
}
Try with:
g <- function(env = parent.frame()) with(env, { b <- 1 })
g()
b
## [1] 1
Note that it is normally preferable to pass the variables back as return values rather than create them directly in the parent frame. If you have many variables to return, you can always return them in a list, e.g. h <- function() list(a = 1, b = 2); result <- h(). Now result$a and result$b hold the values of a and b.
Also see Function returning more than one value.
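If the variables really are wanted as plain names in the caller, one possible middle ground is to keep the function returning a list and let the caller unpack it with list2env(). This is only a sketch under that assumption; caller is a hypothetical function standing in for whatever code calls h().

```r
# The function still communicates through its return value
h <- function() list(a = 1, b = 2)

caller <- function() {
  # Unpack the returned list into this function's own frame
  list2env(h(), envir = environment())
  a + b
}

total <- caller()
```

The unpacking is then visible at the call site, instead of being a hidden side effect of h().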

Selecting Which Argument to Pass Dynamically in R

I'm trying to pass a specific argument dynamically to a function, where the function has default values for most or all arguments.
Here's a toy example:
library(data.table)
mydat <- data.table(evildeeds=rep(c("All","Lots","Some","None"),4),
capitalsins=rep(c("All", "Kinda","Not_really", "Virginal"),
each = 4),
hellprobability=seq(1, 0, length.out = 16))
hellraiser <- function(arg1 = "All", arg2 = "All"){
mydat[(evildeeds %in% arg1) & (capitalsins %in% arg2), hellprobability]}
hellraiser()
hellraiser(arg1 = "Some")
whicharg = "arg1"
whichval = "Some"
#Could not get this to work:
hellraiser(eval(paste0(whicharg, '=', whichval)))
I would love a way to specify dynamically which argument I'm calling: In other words, get the same result as hellraiser(arg1="Some") but while picking whether to send arg1 OR arg2 dynamically. The goal is to be able to call the function with only one parameter specified, and specify it dynamically.
You could use some form of do.call like
do.call("hellraiser", setNames(list(whichval), whicharg))
but really this just seems like a bad way to handle arguments for your functions. It might be better to treat your parameters like a list that you can more easily manipulate. Here's a version that allows you to choose values where the argument names are treated like column names
hellraiser2 <- function(..., .dots=list()) {
dots <- c(.dots, list(...))
expr <- lapply(names(dots), function(x) bquote(.(as.name(x)) %in% .(dots[[x]])))
expr <- Reduce(function(a,b) bquote(.(a) & .(b)), expr)
eval(bquote(mydat[.(expr), hellprobability]))
}
hellraiser2(evildeeds="Some", capitalsins=c("Kinda","Not_really"))
hellraiser2(.dots=list(evildeeds="Some", capitalsins=c("Kinda","Not_really")))
This use of ... and .dots= syntax is borrowed from the dplyr standard evaluation functions.
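To see what the bquote()/Reduce() steps build, here is a reduced sketch on a plain data frame, so it runs without data.table. The helper build_filter and the data frame toy are hypothetical; the expression construction mirrors the two lapply/Reduce lines of hellraiser2.

```r
# Turn a named list of allowed values into one combined filter expression
build_filter <- function(conds) {
  # One `column %in% values` call per list element
  parts <- lapply(names(conds), function(nm) {
    bquote(.(as.name(nm)) %in% .(conds[[nm]]))
  })
  # Chain the individual calls together with `&`
  Reduce(function(a, b) bquote(.(a) & .(b)), parts)
}

toy <- data.frame(x = c("a", "b", "a"),
                  y = c("u", "u", "v"),
                  stringsAsFactors = FALSE)

flt <- build_filter(list(x = "a", y = c("u", "v")))

# Evaluate the unevaluated call with the data frame's columns in scope
keep <- eval(flt, toy)
```

flt is an ordinary call object, so it can be inspected with print(flt) before it is evaluated.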
I managed to get the result with
hellraiser(eval(parse(text=paste(whicharg, ' = \"', whichval, '\"', sep=''))))
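It may be worth checking which parameter actually receives the value with the parse-based call. As far as I can tell, eval() runs the parsed text as an ordinary assignment and hands its value to the function positionally, so the first parameter gets it regardless of whicharg, whereas do.call() with a named list matches by name. A small sketch with a hypothetical two-argument function pick, using arg2 to make the difference visible:

```r
# Hypothetical stand-in for hellraiser(): reports what each parameter received
pick <- function(arg1 = "All", arg2 = "All") paste(arg1, arg2, sep = "/")

whicharg <- "arg2"
whichval <- "Some"

# Named list: the value is matched to the parameter called arg2
by_name <- do.call(pick, setNames(list(whichval), whicharg))

# Parsed text: evaluates the assignment arg2 = "Some" in the calling
# environment, then passes the resulting value as the first argument
by_position <- pick(eval(parse(text = paste0(whicharg, ' = "', whichval, '"'))))
```

With whicharg set to "arg1" the two calls agree, which is probably why the parse-based version appears to work in the original example.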

Resources