I am working on a function that returns the top values of a column. The example below is just part of my whole function; my final goal is to run the function in a loop. I have noticed something weird: why does print(df_col_indicator) change the result when I pass "df_col_indicator" as an external variable rather than as a literal string in the function call? With print(df_col_indicator) included, my function does exactly what I want.
library(dplyr)
library(tidyverse)
remove(list = ls())
dataframe_test <- data.frame(
county_name = c("a", "b","c", "d","e", "f", "g", "h"),
column_test1 = c(100,100,100,100,100,100,50,50),
column_test2 = c(40,90,50,40,40,100,13,14),
column_test3 = c(100,90,50,40,30,40,100,50),
month = c("2020-09-01", "2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-08-01","2020-08-01"))
choose_top_5 <- function(df, df_col_indicator, df_col_month, char_month, numb_top, df_col_county) {
  ### this here changes output of my function
  #print(df_col_indicator) # changes output of my function depending on included or excluded
  ### enquo / ensym / deparse
  df_col_indicator_ensym <- ensym(df_col_indicator)
  df_col_month_ensym <- ensym(df_col_month)
  ### filter month and top 5 observations
  df_top <- df %>%
    filter(!!df_col_month_ensym == char_month) %>%
    slice_max(!!df_col_indicator_ensym, n = numb_top) %>%
    select(!!df_col_county, !!df_col_month_ensym, !!df_col_indicator_ensym)
  return(df_top)
}
### define "df_col_indicator" within the function
a = choose_top_5(df = dataframe_test, df_col_indicator = "column_test3",
df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
df_col_county = "county_name")
a
### define "df_col_indicator" externally
external = "column_test3"
b = choose_top_5(df = dataframe_test, df_col_indicator = external,
df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
df_col_county = "county_name")
b
### goal is to run function over loop
external <- c("column_test1","column_test2","column_test3")
my_list <- list()
for (i in external) {
my_list[[i]] <- choose_top_5(df = dataframe_test, df_col_indicator = i,
df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
df_col_county = "county_name")
}
my_list
Your example is quite lengthy. Let's boil it down to a minimal reproducible example with two very similar functions. These both take a single argument and simply print the passed variable to the console, and return the result of calling ensym on the same variable.
The only difference between the two is the order in which the calls to print and ensym are made.
library(rlang)
test_ensym1 <- function(x)
{
result <- ensym(x)
print(x)
return(result)
}
test_ensym2 <- function(x)
{
print(x)
result <- ensym(x)
return(result)
}
Now we might expect these two functions to do exactly the same thing, and indeed when we pass a string directly to them, they both give the same result:
test_ensym1("hello")
#> [1] "hello"
#> hello
test_ensym2("hello")
#> [1] "hello"
#> hello
But look what happens when we use an external variable to pass in our string:
y <- "hello"
test_ensym1(y)
#> [1] "hello"
#> y
test_ensym2(y)
#> [1] "hello"
#> hello
The functions both still print "hello", as expected, but they return a different result. When we called ensym first, the function returned the symbol y, and when we called print first it returned the symbol hello.
The reason for this is that when you call a function in R, the arguments you pass are not evaluated immediately. Instead, they are wrapped in promise objects and evaluated only when the body of the function needs them. It is this lazy evaluation that allows for some of the tidyverse trickery.
The difference between the two functions above is that calling print(x) forces the evaluation of x. Before that point, x is an unevaluated promise. Afterwards, it behaves just like any other variable you would use interactively in the console, so when you call ensym, rlang finds an already-evaluated argument and works from its value ("hello") rather than from the expression that was passed (y).
ensym, on the other hand, does not evaluate x, so if ensym is called first, it will return the unevaluated symbol that was passed to the function.
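To check that it is the evaluation, not the printing, that matters, here is a small sketch that swaps print() for base R's force(), which does nothing except evaluate its argument. For contrast it also uses base R's substitute(), which keeps returning the caller's expression even after the argument has been forced. The function names are made up for illustration, and the way ensym() treats an already-forced argument is an rlang implementation detail, so treat this as a diagnostic rather than something to build on.

```r
library(rlang)

capture_first <- function(x) {
  s <- ensym(x)  # x is still an unevaluated promise: captures the caller's symbol
  force(x)       # evaluate the promise afterwards
  s
}

force_first <- function(x) {
  force(x)       # evaluate the promise first, exactly like print(x) does
  ensym(x)       # rlang now works from the forced value ("hello")
}

base_after_force <- function(x) {
  force(x)
  substitute(x)  # base R still returns the expression the caller wrote
}

y <- "hello"
capture_first(y)     # y
force_first(y)       # hello
base_after_force(y)  # y
```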
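So the easiest way to fix your problem is to make sure df_col_indicator has been evaluated before ensym is called. Your print call happens to do exactly that; force(df_col_indicator) does the same thing explicitly. Moving print after the ensym call instead gives the result you see without it.
Alternatively, change ensym to as.symbol.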
Consider a function like this
f <- function(x) ensym(x)
myvar <- "some string"
You will find that
> f("some string")
`some string`
> f(myvar)
myvar
This is because ensym does not evaluate its argument: it looks only at the expression the caller wrote, which must be a string or a bare variable name, and converts that into a symbol (anything else gives an error). As such, in your first example, ensym returns column_test3; in your second one, it returns external.
As far as I can tell, what you want is to get the value that df_col_indicator holds and then convert that value into a symbol. This means you have to evaluate df_col_indicator first and then convert it, which is exactly what as.symbol does.
g <- function(x) as.symbol(x)
myvar <- "some string"
Some tests
> g("some string")
`some string`
> g(myvar)
`some string`
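As a sketch of the fix applied to the original loop, assuming dplyr >= 1.0 (for slice_max() and all_of()), here is choose_top_5() rewritten to treat its column arguments as plain strings. It uses the .data pronoun rather than as.symbol(); !!as.symbol(df_col_indicator) inside slice_max() should behave the same way. Note that slice_max() keeps ties by default, so column_test1 (six rows tied at 100) returns more than five rows.

```r
library(dplyr)

dataframe_test <- data.frame(
  county_name = c("a", "b", "c", "d", "e", "f", "g", "h"),
  column_test1 = c(100, 100, 100, 100, 100, 100, 50, 50),
  column_test2 = c(40, 90, 50, 40, 40, 100, 13, 14),
  column_test3 = c(100, 90, 50, 40, 30, 40, 100, 50),
  month = c(rep("2020-09-01", 6), "2020-08-01", "2020-08-01")
)

# Column names arrive as strings, so look them up with .data[[ ]]
# instead of capturing the caller's expression with ensym().
choose_top_5 <- function(df, df_col_indicator, df_col_month, char_month,
                         numb_top, df_col_county) {
  df %>%
    filter(.data[[df_col_month]] == char_month) %>%
    slice_max(.data[[df_col_indicator]], n = numb_top) %>%
    select(all_of(c(df_col_county, df_col_month, df_col_indicator)))
}

indicators <- c("column_test1", "column_test2", "column_test3")
my_list <- list()
for (i in indicators) {
  my_list[[i]] <- choose_top_5(dataframe_test, df_col_indicator = i,
                               df_col_month = "month", char_month = "2020-09-01",
                               numb_top = 5, df_col_county = "county_name")
}
my_list$column_test3
```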
Related
I want to make many plots using multiple pairs of variables in a dataframe, all with the same x. I store the plots in a named list. For simplicity, below is an example with only 1 variable in each plot.
Key to this function is a select() call that is clearly not necessary here but is with my actual data.
The body of the function works fine on each variable, but when I loop through a list of variables, the last one in the list always produces
Error in get(ll): object 'd' not found.
(or whatever the last variable is, if not 'd'). Replacing data <- df %>% select(x, ll) with data <- df avoids the error.
library(dplyr)
library(ggplot2)
## make data
df2 <- data.frame(x = 1:10,
a = 1:10,
b = 2:11,
c = 101:110,
d = 10*(1:10))
## make function
testfun <- function(df = df2, vars = letters[1:4]){
  ## initialize list to store plots
  plotlist <- list()
  for (ll in vars){
    ## subset data
    data <- df %>% select(x, ll) ## comment out select() to get working function
    # print(data) ## uncomment to check that dataframe subset works correctly
    ## plot variable vs. x
    p <- ggplot(data,
                aes(x = x, y = get(ll))) +
      geom_point() +
      ylab(ll)
    ## add plot to named list
    plotlist[[ll]] <- p
    # print(p) ## uncomment to see that each plot is being made
  }
  return(plotlist) ## unnecessary, being explicit for troubleshooting
}
## use function
pl <- testfun(df2)
## error ?
pl
I have a work-around that avoids select() by renaming variables in my actual dataframe, but I am curious why this does not work? Any ideas?
get() could work, but not with ll directly. Try y = get(!!ll) or y = !!sym(ll) (sym() comes from rlang and is re-exported by dplyr).
aes() captures this code instead of running it, and ggplot2 only evaluates it when the plot is built (for example, when it is printed), as the error in the provided code demonstrates. By the time each plot evaluates get(ll), the for loop has already finished, so ll evaluates to the last value of the loop variable, "d", for all four plots. ll being "d" in the error makes it seem like the final plot is the one that fails, but it's actually printing the first one, whose data only contains x and a, that causes this error. Without select(), every plot's data still contains a column d, so there is no error, but all four plots silently show d.
In the body of the loop we'd like a way to evaluate the ll variable and stick that resulting string ("a", "b", "c", or "d") into this code, the rest of which won't run until later. Changing y = get(ll) to y = get(!!ll) is one way to do this: !! performs "surgery" on the unevaluated expression (called a "blueprint for code" in Tidyverse docs) so that the expression passed into ggplot contains a literal string like "a" instead of the variable reference ll.
testfun <- function(df = df2, vars = letters[1:4]){
  plotlist <- list()
  for (ll in vars){
    data <- df %>% select(x, ll)
    p <- ggplot(data,
                aes(x = x, y = get(!!ll))) +
      geom_point() +
      ylab(ll)
    plotlist[[ll]] <- p
  }
  return(plotlist)
}
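To see the "surgery" that !! performs without ggplot2 in the way, here is a small sketch using rlang's expr() and eval_tidy(). The toy list mask stands in for a plot's data and is made up for illustration; get() is used exactly as in the aes() call.

```r
library(rlang)

ll <- "a"
late  <- expr(get(ll))    # keeps the symbol: get(ll)
early <- expr(get(!!ll))  # !! splices in the current value: get("a")

ll <- "d"                 # the loop variable moves on
mask <- list(a = 1, d = 4)

eval_tidy(late,  data = mask)  # 4: ll is looked up now, when it is "d"
eval_tidy(early, data = mask)  # 1: "a" was fixed when the expression was built
```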
Read on for explanation and an alternate solution.
The loop problem: late binding
In a given function or in the global scope in R, there's just one variable of any given name. A for (x in xs) loop repeatedly rebinds that variable to a new value. That means that after a for loop has finished, that variable still exists and retains the last value it was assigned. Here's a way this can trip you up:
vars <- c("a", "b", "c", "d")
results <- list()
for (ll in vars){
message("in for loop, ll: ", ll)
func <- function () { ll }
results[[ll]] <- c(ll, func)
}
message("after for loop, ll: ", ll)
# after for loop, now ll is "d"
for (vec in results) {
message(vec[[1]], " ", vec[[2]]())
}
This outputs
in for loop, ll: a
in for loop, ll: b
in for loop, ll: c
in for loop, ll: d
after for loop, ll: d
a d
b d
c d
d d
Each of the four functions constructed here uses the same outer-scope variable ll, which, by the time the functions are actually called after the for loop, is "d". That is what late binding means: the variable is looked up when the function is called (late), not when the function is defined (early).
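A common base R way around late binding is to give each function its own environment and evaluate the value when the function is created. The sketch below uses a hypothetical helper, make_getter(); without its force() call, the argument would stay a lazy promise pointing at ll, and every getter would again return "d".

```r
vars <- c("a", "b", "c", "d")

# Hypothetical helper: each call creates a new environment, and force()
# evaluates `value` immediately instead of leaving it as a promise.
make_getter <- function(value) {
  force(value)
  function() value
}

getters <- list()
for (ll in vars) {
  getters[[ll]] <- make_getter(ll)
}

vapply(getters, function(g) g(), character(1))  # a, b, c, d rather than d, d, d, d
```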
The NSE problem
The OP isn't creating functions in a loop, though; they're calling ggplot. ggplot does something similar to creating a function: it takes some code as an argument that it doesn't evaluate until later. aes() "captures" the code in its arguments instead of running it, so in OP's case get(ll) isn't evaluated until the plot is built.
When this code is evaluated, it's in a new context with a "data mask" that allows the columns of a data frame to be referenced directly by name. This part is great, it's what we want: this is what makes get("a") work at all. But the fact that the evaluation happens later is a problem for the OP: ll in get(ll) evaluates to "d", as if the code were get("d"), because the code is evaluated after the for loop has finished rather than in the iteration where ll had the expected value.
Ignoring the data mask part, here's a function called run.later that, like ggplot, doesn't run one of its arguments. When we run that code later, we again find that ll evaluates to "d" for all four of the saved expressions.
vars <- c("a", "b", "c", "d")
unevaluated.exprs <- list();
run.later <- function(name, something) {
expr <- substitute(something)
unevaluated.exprs[[name]] <<- c(name, expr)
}
for (ll in vars){
run.later(ll, ll)
}
for (vec in unevaluated.exprs) {
message(c(vec[[1]], " ", eval(vec[[2]])))
}
prints
a d
b d
c d
d d
That's the ll part of the problem. The rule of thumb from languages like Python of "Don't define functions in a loop (if they reference loop variables)" could be generalized for R to "don't define functions or otherwise write code that won't be immediately evaluated in a loop (if that code references loop variables)."
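Base R has its own counterpart of !!: inside bquote(), anything wrapped in .() is evaluated when the expression is built, so the loop value is baked in. The sketch below contrasts that with plain quote(), using toupper() only to make the result visible.

```r
vars <- c("a", "b", "c", "d")
late_exprs <- list()
early_exprs <- list()
for (ll in vars) {
  late_exprs[[ll]]  <- quote(toupper(ll))      # stores the symbol ll itself
  early_exprs[[ll]] <- bquote(toupper(.(ll)))  # .() evaluates ll right now
}

vapply(late_exprs,  function(e) eval(e), character(1))  # D for every element
vapply(early_exprs, function(e) eval(e), character(1))  # A, B, C, D
```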
Fixing the scope problem instead of metaprogramming
The !! solution provided at the top uses metaprogramming to evaluate the ll variable in the loop instead of evaluating it later.
Theoretically, one could instead dynamically create variables in each iteration of a loop, then carefully reference that dynamically created variable name with metaprogramming. But a more elegant way would be to use the same variable name but in different scopes. This is what Nithin's answer does with a function: every function creates a new scope and tada, you can use the same variable name in each. Here's another version of that, closer to OP's code:
testfun <- function(df = df2, vars = letters[1:4]){
  plotlist <- list()
  plot.fn <- function(var) {
    data <- df %>% select(x, var)
    p <- ggplot(data,
                aes(x = x, y = get(var))) +
      geom_point() +
      ylab(var)
    plotlist[[var]] <<- p
  }
  for (ll in vars){
    plot.fn(ll)
  }
  return(plotlist)
}
pl <- testfun(df2)
pl
There are 4 distinct variables called var in this code, and each iteration of the loop references a different one.
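Base R's local() is another way to get a fresh scope on every iteration without naming a helper function. The sketch below is a variant of the function above under that assumption; testfun_local is a hypothetical name, and it subsets with df[, c("x", var)] instead of select() only to avoid the dplyr dependency.

```r
library(ggplot2)

df2 <- data.frame(x = 1:10, a = 1:10, b = 2:11, c = 101:110, d = 10 * (1:10))

testfun_local <- function(df = df2, vars = letters[1:4]) {
  plotlist <- list()
  for (ll in vars) {
    local({
      var <- ll                  # new binding in a fresh environment per iteration
      data <- df[, c("x", var)]  # base R subset instead of select()
      p <- ggplot(data, aes(x = x, y = get(var))) +
        geom_point() +
        ylab(var)
      plotlist[[var]] <<- p      # <<- updates plotlist in testfun_local's frame
    })
  }
  plotlist
}

pl <- testfun_local(df2)
pl$a  # plots column a, not d
```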
Prettier metaprogramming
get(!!ll) is not equivalent to {{ll}} here. {{ }} is designed for forwarding function arguments; applied to an ordinary variable that holds a string, it injects the string itself, so y would become the constant "a" rather than the column a. When the column name is stored as a string, the usual options are y = !!sym(ll), which injects the symbol a, or y = .data[[!!ll]]. The !! is still needed inside the loop, because .data[[ll]] on its own looks up ll lazily, just like get(ll).
Write a custom function like this:
plot_fn <- function(df, y){
  df %>% ggplot(aes(x = x, y = get(y))) +
    geom_point() +
    ylab(y)
}
Iterate over plots with purrr::map:
map(letters[1:4],~plot_fn(df=df2,y=.x))
The issue is that get() is not the recommended way to access dplyr/tidyverse data in a "programming" paradigm. Instead, we should use tidy evaluation, here the .data pronoun, to access the data. I offer a simplified function below (originally I thought it was a function-masking issue, as I only skimmed the question).
testfun <- function(df = df2, vars = letters[1:4]){
lapply(vars, function(y) {
ggplot(df,
aes(x = x, y = .data[[y]] )) +
geom_point() +
ylab(y)
})
}
Calling
plots <- testfun(df2)
plots[[1]]
EDIT
Since OP would like to know what the issue is, I have used a traditional loop as requested
testfun2 <- function(df = df2, vars = letters[1:4]){
  ## initialize list to store plots
  plotlist <- list()
  for (ll in vars){
    ## subset data
    d_t <- df %>% select(x, ll)
    # print(d_t) ## uncomment to check that dataframe subset works correctly
    ## plot variable vs. x; !! injects the current value of ll so the stored
    ## plot does not look up ll after the loop has finished
    p <- ggplot(d_t,
                aes(x = x, y = .data[[!!ll]])) +
      geom_point() +
      ylab(ll)
    ## add plot to named list
    plotlist[[ll]] <- p
    # print(p) ## uncomment to see that each plot is being made
  }
  plotlist
}
pl <- testfun2(df2)
pl[[1]]
The reason get(ll) fails is timing: the plots are only built after the loop has finished. The .data pronoun is the recommended replacement for get(), as the docs state, but it is also looked up lazily, so ll has to be injected with !! as in the code above; with a plain .data[[ll]], every stored plot would look up the final value of ll. Related questions on using get may be useful.
First plot
I'm trying to program over a function inside a package, but I'm stuck with the function internally using match.call() to parse one of its arguments.
A super-simplified example of the function with the usual utilization could look like this:
f1 = function(x, y=0, z=0, a=0, b=0){ #lots of arguments not needed for the example
mc = match.call()
return(mc$x)
#Returning for testing purpose.
#Normally, the function later uses calls as character:
r1 = as.character(mc$x[1])
r2 = as.character(mc$x[2])
#...
}
x1 = f1(x = foo(bar))
x1
# foo(bar)
class(x1)
# [1] "call"
In my case, I need to get the value of x from a variable (value in the following code). The expected use of f1 is as follows:
value = "foo(bar)" #this line could also be anything else
f1(x=some_magic_function(value))
# Expected result = foo(bar)
# Unwanted result = some_magic_function(value)
Unfortunately, match.call() always returns the argument exactly as it was written in the call. I'm quite out of my depth here, so I have only tried a few functions.
Is there any way I could trick match.call() into accepting an external variable?
Failed attempts so far:
#I tried to create the exact same call using rlang::sym()
#This may not be the best way...
value = call("foo", rlang::sym("bar"))
value
# foo(bar)
class(value)
# [1] "call"
x1==value
# [1] TRUE
f1(x=value)
# value
f1(x=eval(value))
# eval(value)
f1(x=substitute(value))
# substitute(value)
There's nothing you can include as a parameter to f1 to make this work. Instead, you would dynamically need to build your call to f1. With base R you might do this with do.call.
do.call("f1", list(parse(text=value)[[1]]))
or with rlang
eval_tidy(quo(f1(!!parse_expr(value))))
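A third option stays entirely in base R: parse the string into a call with str2lang() (available since R 3.6.0), splice it into f1(x = ...) with bquote(), and only then evaluate. The sketch keeps f1 as in the question but returns mc$x directly; a do.call() variant with a named argument is included for comparison.

```r
f1 <- function(x, y = 0, z = 0, a = 0, b = 0) {
  mc <- match.call()
  mc$x  # returning the captured expression for testing, as in the question
}

value <- "foo(bar)"
parsed <- str2lang(value)  # the call foo(bar)

# bquote() splices the parsed call into f1(x = ...) before f1 runs,
# so match.call() inside f1 sees foo(bar) rather than a variable name.
res_bquote <- eval(bquote(f1(x = .(parsed))))
res_docall <- do.call(f1, list(x = parsed))
res_bquote  # foo(bar)
```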
Imagine you have a simple function that specifies which statistical tests to run for each variable. Its syntax, simplified for the purposes of this question is as follows:
test <- function(...) {
x <- list(...)
return(x)
}
which takes argument pairs such as Gender = 'Tukey', and intends to pass its result to other functions down the line. The output of test() is as follows:
test(Gender = 'Tukey')
# $Gender
# [1] "Tukey"
What is desired is the ability to replace the literal Gender by a dynamically assigned variable varname (e.g., for looping purposes). Currently what happens is:
varname <- 'Gender'
test(varname = 'Tukey')
# $varname
# [1] "Tukey"
but what is desired is this:
varname <- 'Gender'
test(varname = 'Tukey')
# $Gender
# [1] "Tukey"
I tried tinkering with functions such as eval() and parse(), but to no avail. In practice, I resolved the issue by simply renaming the resulting list, but it is an ugly solution and I am sure there is an elegant R way to achieve it. Thanks in advance for the educational value of your answer.
NB: This question occurred to me while trying to program a custom function which uses mcp() from the multcomp package in its internals. The said mcp() function is the real world counterpart of test().
EDIT1: Perhaps it needs to be clarified that (for educational purposes) changing test() is not an option. The question is about how to pass the tricky argument to test(). If you take a look at NB, it becomes clear why: the real world counterpart of test(), namely mcp(), comes with a package. And while it is possible to create a modified copy of it, I am really curious whether there exists a simple solution in somehow 'converting' the dynamically assigned variable to a literal in the context of dot-arguments.
This works:
test <- function(...) {
x = list(...)
names(x) <- sapply(names(x),
function(p) eval(as.symbol(p)))
return(x)
}
apple = "orange"
test(apple = 5)
We can use
test <- function(...) {
x <- list(...)
if(exists(names(x))) names(x) <- get(names(x))
x
}
test(Gender = 'Tukey')
#$Gender
#[1] "Tukey"
test(varname = 'Tukey')
#$Gender
#[1] "Tukey"
What about this:
varname <- "Gender"
args <- list()
args[[varname]] <- "Tukey"
do.call(test, args)
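The do.call() idea also fits the looping use case without touching test(). The sketch below shows it inline with setNames(), plus rlang's exec(), whose dots accept `!!name := value`. The variable names in vars are hypothetical; for the real mcp(), the same do.call() pattern should apply, but that has not been checked here.

```r
library(rlang)

test <- function(...) {
  x <- list(...)
  return(x)
}

varname <- "Gender"

# Base R: build the named argument list, then call test() with it.
res_base <- do.call(test, setNames(list("Tukey"), varname))

# rlang: exec() takes dynamic dots, so the argument name can be injected.
res_rlang <- exec(test, !!varname := "Tukey")

# The same idea in a loop over several (hypothetical) variable names.
vars <- c("Gender", "Age")
all_tests <- lapply(vars, function(v) do.call(test, setNames(list("Tukey"), v)))
```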
I'm trying to pass a specific argument dynamically to a function, where the function has default values for most or all arguments.
Here's a toy example:
library(data.table)
mydat <- data.table(evildeeds=rep(c("All","Lots","Some","None"),4),
capitalsins=rep(c("All", "Kinda","Not_really", "Virginal"),
each = 4),
hellprobability=seq(1, 0, length.out = 16))
hellraiser <- function(arg1 = "All", arg2 = "All"){
mydat[(evildeeds %in% arg1) & (capitalsins %in% arg2), hellprobability]}
hellraiser()
hellraiser(arg1 = "Some")
whicharg = "arg1"
whichval = "Some"
#Could not get this to work:
hellraiser(eval(paste0(whicharg, '=', whichval)))
I would love a way to specify dynamically which argument I'm calling: In other words, get the same result as hellraiser(arg1="Some") but while picking whether to send arg1 OR arg2 dynamically. The goal is to be able to call the function with only one parameter specified, and specify it dynamically.
You could use some form of do.call like
do.call("hellraiser", setNames(list(whichval), whicharg))
but really this just seems like a bad way to handle arguments for your functions. It might be better to treat your parameters like a list that you can more easily manipulate. Here's a version that allows you to choose values where the argument names are treated like column names
hellraiser2 <- function(..., .dots=list()) {
dots <- c(.dots, list(...))
expr <- lapply(names(dots), function(x) bquote(.(as.name(x)) %in% .(dots[[x]])))
expr <- Reduce(function(a,b) bquote(.(a) & .(b)), expr)
eval(bquote(mydat[.(expr), hellprobability]))
}
hellraiser2(evildeeds="Some", capitalsins=c("Kinda","Not_really"))
hellraiser2(.dots=list(evildeeds="Some", capitalsins=c("Kinda","Not_really")))
This use of ... and .dots= syntax is borrowed from the dplyr standard evaluation functions.
I managed to get the result with
hellraiser(eval(parse(text=paste(whicharg, ' = \"', whichval, '\"', sep=''))))
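The parse()/eval() route above only appears to work because arg1 happens to be the first argument: the parsed text is evaluated as an assignment in the calling environment, and its value is then passed positionally. The sketch below uses pick(), a hypothetical stand-in for hellraiser() that only reports its arguments, to show the difference when whicharg is "arg2".

```r
# pick() is a hypothetical stand-in for hellraiser(): it only reports its arguments.
pick <- function(arg1 = "All", arg2 = "All") c(arg1 = arg1, arg2 = arg2)

whicharg <- "arg2"
whichval <- "Some"

# do.call() passes whichval under the argument name stored in whicharg.
via_do_call <- do.call(pick, setNames(list(whichval), whicharg))
via_do_call  # arg1 = "All", arg2 = "Some"

# The parse()/eval() route evaluates `arg2 = "Some"` as an assignment in the
# calling environment, then passes the resulting value positionally to arg1.
via_parse <- pick(eval(parse(text = paste0(whicharg, ' = "', whichval, '"'))))
via_parse    # arg1 = "Some", arg2 = "All"
```

As a side effect, the parse() version also leaves a variable called arg2 in the calling environment.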
I'm trying to create a function that can evaluate multiple independent expressions. My goal is to input many expressions at once like myfunction(x = 2, y = c(5,10,11) , z = 10, ...), and use each expression's name and value to feed other functions inside of it. The transform() function works kind of like that: transform(someData, x = x*2, y = y + 1).
I know I can get the name and the value of an expression using:
> names(expression(x=2))
[1] "x"
> eval(expression(x=2))
[1] 2
However, I don't know how to pass those expressions through a function. Here is some of my work so far.
With unquoted expression (x=2) I could not pass it using the dots (...).
> myfunction <- function(...) { names(expression(...)) }
> myfunction(x=2)
expression(...)
Now, using quotes: it gets the value but not the name. The structure returned by parse() differs from a traditional expression: expression(x=2) holds the value 2 in an element named x, whereas parse(text="x=2") holds a single unnamed call, x = 2. Compare str(expression(x=2)) and str(parse(text="x=2")).
> myfunction <- function(...) {
assign("temp",...)
results <- parse(text=temp)
cat(names(results))
cat(eval(results))
}
> myfunction("x=2")
> 2
So, any ideas?
It's unclear exactly what you want the return of your function to be. You can get the names and expressions passed to a function using
myfunction <- function(...) {
x<-substitute(...())
#names(x)
x
}
myfunction(x = 2, y = c(5,10,11) , z = 10)
Here you get a named list and each of the items is an unevaluated expression or language object that you can evaluate later if you like.
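For the "evaluate later" step, a sketch along the same lines uses base R's match.call(expand.dots = FALSE)$..., which also returns the named, unevaluated arguments, and then evaluates each one in the caller's environment. The returned list structure and the variable k are illustrative choices, not part of the question.

```r
myfunction <- function(...) {
  # Unevaluated argument expressions, named by the caller's argument names.
  exprs <- match.call(expand.dots = FALSE)$...
  # Evaluate each expression in the caller's environment.
  caller <- parent.frame()
  values <- lapply(exprs, eval, envir = caller)
  list(names = names(exprs), exprs = exprs, values = values)
}

k <- 3
out <- myfunction(x = 2, y = c(5, 10, 11), z = k * 10)
out$names     # "x" "y" "z"
out$exprs$z   # k * 10, still unevaluated
out$values$z  # 30
```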