Assigning to a quosure in R / dplyr / rlang - r

Within a function, I want to update-in-place the values of columns specified by the user, where the user specified column names are captured via enquo(). So, here's a simplified example:
f1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
df %>%
mutate((!! x) := (!! x)^2,
(!! y) := (!! y)+1)
}
dat <- data.frame(a=1:10, b=10:1)
f1(dat, x=a, y=b)
This fails with an error: "The LHS of := must be a string or a symbol".
I've also tried replacing, for example, (!! x) with quo_get_expr(x) and f_text(x), but get the same error. For example:
f1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
df %>%
mutate(quo_get_expr(x) := (!! x)^2,
quo_get_expr(y) := (!! y)+1)
}
Can anyone point out what I'm doing wrong?
I'm using R 4.1, dplyr 0.7.4, and rlang 0.2.0
Thanks in advance.

You need to use quo_name. This works:
f1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
df %>%
mutate(
!!quo_name(x) := (!!x)^2,
!!quo_name(y) := (!!y)+1)
}
dat <- data.frame(a=1:10, b=10:1)
f1(dat, x=a, y=b)
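The answer above works because := needs a string or a symbol on its left-hand side, and quo_name() turns the quosure into a string. As a rough sketch for newer setups (assuming dplyr 1.0 or later, which is not what the asker was running), the embrace operator together with a glue-style name can express the same idea. The name f1_embrace is made up for this illustration:

```r
library(dplyr)

# Hypothetical variant of f1 for recent dplyr/rlang versions.
f1_embrace <- function(df, x, y) {
  df %>%
    mutate(
      "{{ x }}" := {{ x }}^2,   # the string on the left is filled in with the column name
      "{{ y }}" := {{ y }} + 1
    )
}

dat <- data.frame(a = 1:10, b = 10:1)
out <- f1_embrace(dat, x = a, y = b)
print(out)
```

On the older versions mentioned in the question, the quo_name() form shown in the answer remains the one to use.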

Related

Function table(x, y) compatible with R base language and native pipe

I am looking to develop a tab(x, y) or tab(d, x, y) function compatible with base R (table(d$x, d$y)) and the native pipe (d |> table(x, y)).
Native pipe compatible function:
tab1 <- function(d, x, y){
eval(substitute(table(d$x, d$y)))
}
mtcars |> tab1(cyl, vs)
# or
tab1(mtcars, cyl, vs)
R base:
tab2 <- function (x, y) {
result <- table(x, y)
result
}
tab2(mtcars$cyl, mtcars$vs)
What should I add to the tab1 function so that the command tab1(mtcars, mtcars$cyl, mtcars$vs) does not return an error message (sometimes, using a pipe compatible function, the name of the dataframe is repeated incorrectly)?
I would like to get a function (tab(d, x, y) or tab(x, y)) that works with these commands:
tab(mtcars, cyl, vs) # it's tab1 function
mtcars |> tab(cyl, vs) # it's tab1 function
tab(mtcars, mtcars$cyl, mtcars$vs) # doesn't work with tab1 function
and ideally also with:
tab(mtcars$cyl, mtcars$vs) # it's tab2 function
With this definition the examples below all work as in table. Note that table(mtcars$vs, mtcars$cyl) does not show names so tab3 does not either in those cases. (If table in tab3 were replaced with qtab from collapse then it would show names.)
tab3 <- function(d = parent.frame(), x, y, ...) {
if (missing(y)) eval(substitute(table(d, x)))
else eval(substitute(table(x, y)), d)
}
tab3(x = mtcars$vs, y = mtcars$cyl)
tab3(mtcars, vs, cyl)
tab3(mtcars$vs, mtcars$cyl)
tab3(mtcars, mtcars$vs, mtcars$cyl)
The code could be simplified if d were last:
tab4 <- function(x, y, d = parent.frame(), ...) {
eval(substitute(table(x, y)), d)
}
tab4(x = mtcars$vs, y = mtcars$cyl)
tab4(vs, cyl, mtcars)
tab4(mtcars$vs, mtcars$cyl)
tab4(mtcars$vs, mtcars$cyl, mtcars)
Perhaps you could just use xtabs. All of these work and do show names.
xtabs(~ vs + cyl, mtcars)
xtabs(~ mtcars$vs + mtcars$cyl)
xtabs(~ mtcars$vs + mtcars$cyl, mtcars)
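As a quick sanity check of the tab3 definition above, the sketch below compares the counts from each calling style against plain table(). The helper same_counts is a name introduced here for illustration only:

```r
tab3 <- function(d = parent.frame(), x, y, ...) {
  if (missing(y)) eval(substitute(table(d, x)))   # two vectors passed positionally
  else eval(substitute(table(x, y)), d)           # x and y are looked up in d first
}

reference <- table(mtcars$vs, mtcars$cyl)

# Hypothetical helper: compare the cell counts only, ignoring dimnames.
same_counts <- function(a, b) identical(as.vector(a), as.vector(b))

checks <- c(
  data_first = same_counts(tab3(mtcars, vs, cyl), reference),
  vectors    = same_counts(tab3(mtcars$vs, mtcars$cyl), reference),
  both       = same_counts(tab3(mtcars, mtcars$vs, mtcars$cyl), reference),
  named      = same_counts(tab3(x = mtcars$vs, y = mtcars$cyl), reference)
)
print(checks)
```

The second style works because, with only two positional arguments, the first vector lands in d and the second in x, so y is missing and the first branch is taken.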

Passing data-variables to R formulas

Let's say I'd like to write anscombe %>% lm_tidy("x1", "y1") (actually, I'd like to write anscombe %>% lm_tidy(x1, y1), where x1 and y1 are columns of the data frame). Since the following function seems to work:
plot_gg <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
ggplot(df, aes(x = !!x, y = !!y)) + geom_point() +
geom_smooth(formula = y ~ x, method="lm", se = FALSE)
}
I started writing the following function:
lm_tidy_1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
fm <- y ~ x ##### I tried many stuff here!
lm(fm, data=df)
}
## Error in model.frame.default(formula = fm, data = df, drop.unused.levels = TRUE) :
## object is not a matrix
A comment on a question about passing in a column name as an argument states that the embrace operator {{...}} is shorthand for the quote-unquote pattern. Unfortunately, the error messages were different in the two situations:
lm_tidy_2 <- function(df, x, y) {
fm <- !!enquo(y) ~ !!enquo(x) # alternative: {{y}} ~ {{x}} with different errors!!
lm(fm, data=df)
}
## Error:
## ! Quosures can only be unquoted within a quasiquotation context.
This seems to work (based on @jubas's answer, but we're stuck with string handling and paste):
lm_tidy_str <- function(df, x, y) {
fm <- formula(paste({{y}}, "~", {{x}}))
lm(fm, data=df)
}
Yet again, {{y}} != !!enquo(y). But it's worse: the following function breaks down with the same Quosure error as earlier:
lm_tidy_str_1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
fm <- formula(paste(!!y, "~", !!x))
lm(fm, data=df)
}
Is {{y}} != !!enquo(y)?
How to pass data-variables to lm?
EDIT: Sorry, there were left-overs from my many trials. I want to directly pass the data-variables (say x1 and y1) to the function that is going to use them as formula components (such as lm) and not their string versions ("x1" and "y1"): I try to avoid strings as long as possible and it's more streamlined from the user perspective.
Consider:
lm_tidy_1 <- function(df, x, y) {
fm <- reformulate(as.character(substitute(x)), substitute(y))
lm(fm, data=df)
}
lm_tidy_1(iris, Species, Sepal.Length)
lm_tidy_1(iris, 'Species', Sepal.Length)
lm_tidy_1(iris, Species, 'Sepal.Length')
lm_tidy_1(iris, 'Species', 'Sepal.Length')
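To see why all four calls above behave the same, it may help to look at the formula on its own. This is a minimal sketch; fm_from_args is a name made up here, and it only repeats the reformulate() line from the answer:

```r
# Hypothetical helper that isolates the formula-building step.
fm_from_args <- function(x, y) {
  # as.character() gives "Species" for both the symbol Species and the string "Species"
  reformulate(as.character(substitute(x)), substitute(y))
}

fm_sym <- fm_from_args(Species, Sepal.Length)
fm_str <- fm_from_args("Species", "Sepal.Length")

print(fm_sym)
print(fm_str)
```

Both calls should print the formula Sepal.Length ~ Species, since reformulate() accepts the response either as a symbol or as a string.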
Edit:
If you need the formula to appear, change the call object:
lm_tidy_1 <- function(df, x, y) {
fm <- reformulate(as.character(substitute(x)), substitute(y))
res<-lm(fm, data=df)
res$call[[2]]<- fm
res
}
lm_tidy_1(iris, Species, Sepal.Length)
Call:
lm(formula = Sepal.Length ~ Species, data = df)
Coefficients:
(Intercept) Speciesversicolor Speciesvirginica
5.006 0.930 1.582
@BiranSzydek's answer is pretty good.
However, it has 3 downsides:
1) The printed call does not show the formula or the data which were actually used:
Call:
lm(formula = fm, data = .)
2) One has to input the symbols as strings.
3) It depends on rlang, though it is a great package.
You can indeed solve this problem with pure base R!
The solution in pure base R
R is, under the hood, a Lisp, which makes it suitable for such meta-programming tasks. The only downside of R is its horrible syntax.
Especially for meta-programming, it is not as beautiful or as elegant as the Lisp languages. The syntax can be really confusing, as you experienced yourself when trying to solve this problem.
The solution is to use substitute() by which you can substitute code pieces in a quoted manner:
lm_tidy <- function(df, x, y) {
# capture the arguments as unevaluated code pieces instead of evaluating them:
.x <- substitute(x)
.y <- substitute(y)
.df <- substitute(df)
# take the code piece `y ~ x` and substitute into it, using a list as the lookup table
.fm <- substitute(y ~ x, list(y=.y, x=.x))
# take the code `lm(fm, data=df)` and replace `fm` and `df`
# with the code pieces stored in `.fm` and `.df`,
# and finally evaluate the substituted code in the parent environment (the environment where the function was called)
eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}
The trick is to use eval.parent(substitute( <your expression>, <a list which determines the evaluation lookup-table for the variables in your expression>)).
Beware of scoping! As long as <your expression> is constructed only from variables which are defined inside the function or inside the lookup list given to substitute(), there won't be any scoping problems. Avoid referring to any other variables within <your expression>; this is the only rule you have to obey to use eval()/eval.parent() safely in this context.
Even if you do, eval.parent() ensures that the substituted code is executed within the environment where this function was called.
Now, you can do:
lm_tidy(mtcars, cyl, mpg)
the output is now as desired:
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Coefficients:
(Intercept) cyl
37.885 -2.876
And we did this with pure base R!
The trick for safe use of eval() is really that every variable in the substitute() expression is defined/given inside the lookup tables for substitute() or the function's argument. In other words: None of the replaced variables refers to any dangling variables outside the function definition.
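The two substitute() steps can also be tried outside of a function, which makes it easier to see what each one produces. This is only a sketch, with quote() standing in for the code pieces that substitute(x), substitute(y) and substitute(df) would capture inside lm_tidy:

```r
# Build the formula as an unevaluated call from two symbols.
.fm <- substitute(y ~ x, list(y = quote(mpg), x = quote(cyl)))

# Splice the formula and the data name into the lm() call, still unevaluated.
fit_call <- substitute(lm(fm, data = df), list(fm = .fm, df = quote(mtcars)))
print(fit_call)

# Only now is the assembled call evaluated.
fit <- eval(fit_call)
print(coef(fit))
```

Because the call is assembled before it is evaluated, the stored call of the fitted model shows mpg ~ cyl and mtcars rather than placeholder names.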
plot_gg function
So following these rules, your plot_gg function would be defined as:
plot_gg <- function(df, x, y) {
.x <- substitute(x)
.y <- substitute(y)
.df <- substitute(df)
.fm <- substitute( y ~ x, list(x=.x, y=.y))
eval.parent(substitute(
ggplot(df, aes(x=x, y=y)) + geom_point() +
geom_smooth(formula = fm, method="lm", se=FALSE),
list(fm=.fm, x=.x, y=.y, df=.df)
))
}
When you want to enter x and y as strings
lm_tidy_str <- function(df, x, y) {
.x <- as.name(x)
.y <- as.name(y)
.df <- substitute(df)
.fm <- substitute(y ~ x, list(y=.y, x=.x))
eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}
plot_gg_str <- function(df, x, y) {
.x <- as.name(x)
.y <- as.name(y)
.df <- substitute(df)
.fm <- substitute( y ~ x, list(x=.x, y=.y))
eval.parent(substitute(
ggplot(df, aes(x=x, y=y)) + geom_point() +
geom_smooth(formula = fm, method="lm", se=FALSE),
list(fm=.fm, x=.x, y=.y, df=.df)
))
}
lm_tidy_str(mtcars, "cyl", "mpg")
# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
#
require(ggplot2)
plot_gg_str(mtcars, "cyl", "mpg")
Wrap the formula in "expr," then evaluate it.
library(dplyr)
lm_tidy <- function(df, x, y) {
x <- sym(x)
y <- sym(y)
fm <- expr(!!y ~ !!x)
lm(fm, data = df)
}
This function is equivalent:
lm_tidy <- function(df, x, y) {
fm <- expr(!!sym(y) ~ !!sym(x))
lm(fm, data = df)
}
Then
lm_tidy(mtcars, "cyl", "mpg")
gives
Call:
lm(formula = fm, data = .)
Coefficients:
(Intercept) cyl
37.885 -2.876
EDIT per comment below:
library(rlang)
lm_tidy_quo <- function(df, x, y){
y <- enquo(y)
x <- enquo(x)
fm <- paste(quo_text(y), "~", quo_text(x))
lm(fm, data = df)
}
You can then pass symbols as arguments
lm_tidy_quo(mtcars, cyl, mpg)
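On the question of whether {{y}} is the same as !!enquo(y): the embrace operator only has its special meaning inside functions that support tidy evaluation, such as dplyr verbs or aes(). Anywhere else, R parses {{ y }} as two nested pairs of ordinary braces, which simply return the value of y. That would explain why lm_tidy_str works when it is given strings. A small base R sketch of this, using the anscombe columns from the question:

```r
y <- "y1"
x <- "x1"

# Outside a tidy-eval function, double braces are just braces.
braces_are_plain <- identical({{ y }}, y)
print(braces_are_plain)

# So this is the same as paste(y, "~", x).
fm <- as.formula(paste({{ y }}, "~", {{ x }}))
print(fm)

fit <- lm(fm, data = anscombe)
```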

Function within a function: cramer function "Error in loglin"

I tried to embed the cramer function from sjstats package. Although the function works perfectly outside the custom function, it doesn't work within it.
Thank you very much in advance.
library (sjstats)
cramer2 <- function(dta, x, y){
effsize <- cramer(x ~ y, data = dta)
return(effsize)
}
cramer2(x=gender, y=age, dta=df)
Error in loglin(data, margins, start = start, fit = fitted, param = param, :
incorrect specification of 'table' or 'start'
This happens because x and y are not automatically substituted in a formula for the variables you have passed. Look:
f <- function(x, y) {
return(x ~ y)
}
f(a, b)
#> x ~ y
If you want the variables substituted, you can do something like
f2 <- function(x, y) {
call("~", substitute(x), substitute(y))
}
f2(a, b)
#> a ~ b
So in your case you can do:
library (sjstats)
cramer2 <- function(dta, x, y) {
f <- as.formula(call("~", substitute(x), substitute(y)))
effsize <- cramer(f, data = dta)
return(effsize)
}
Obviously we don't have your data, but using the built-in data set efc we can demonstrate this works as expected:
data(efc)
cramer2(efc, e16sex, c161sex)
#> [1] 0.05258249
Created on 2022-02-27 by the reprex package (v2.0.1)
The solution provided by Allan works perfectly fine if your function takes the variables without quotation marks, i.e. customfunction(dta=mydata, x=gender, y=age, weight=dataweight).
If, however, you must for some reason pass the variables with quotation marks, e.g. customfunction(dta=mydata, x="gender", y="age", weight="dataweight"), then replace substitute with sym:
library(sjstats)
library(rlang) # provides sym()
cramer2 <- function(dta, x, y, weight) {
# x, y and weight are column names given as strings
f <- as.formula(call("~", sym(x), sym(y)))
effsize <- cramer(f, weights=dta[[weight]], data = dta)
return(effsize)
}
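If you would rather not load rlang just for sym(), base R's as.name() should do the same job of turning a string into a symbol. A minimal sketch of only the formula-building step, where make_formula is a hypothetical helper name:

```r
# Hypothetical helper: build a two-sided formula from two column names given as strings.
make_formula <- function(x, y) {
  as.formula(call("~", as.name(x), as.name(y)))
}

f <- make_formula("gender", "age")
print(f)
```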

How to make the code more readable (for-loop or apply)

In my R code I'm using variables mydata1, mydata2, mydata3, …, mydataN that store the results of function1():
mydata1 <- function1()
mydata2 <- function1()
mydata3 <- function1()
#…
mydataN <- function1()
The variables mydata1, mydata2, mydata3, …, mydataN can be vectors, matrices or data.frames.
Later I pass the variables mydata1, mydata2, mydata3, …, mydataN as input values to function2() and save the results in the new variables newmydata1, newmydata2, …, newmydataN:
newmydata1<- function2(mydata1)
newmydata2<- function2(mydata2)
newmydata3<- function2(mydata3)
#…
newmydataN<- function2(mydataN)
I know the number N before the calculation.
Question. How to make the code more functional and readable?
Should I use the for-loop or a function from the apply family?
To make this concrete and runnable define the initial list, L0 and functions used as in the Note at the end.
In the following solutions L3, LL, L, out4, out4a and out5 are all identical.
1) Repeated lapply
L1 <- lapply(L0, function1)
L2 <- lapply(L1, function2)
L3 <- lapply(L2, function3)
2) Reduce
FL <- list(function1, function2, function3)
LL <- Reduce(lapply, FL, init = L0)
3) Loop. FL is as in (2).
L <- L0
for(f in FL) L <- Map(f, L) # or for(f in FL) L <- lapply(L, f)
4) magrittr
library(magrittr)
L0 %>% lapply(function1) %>% lapply(function2) %>% lapply(function3) -> out4
4a) magrittr variation
library(magrittr)
L0 %>% lapply(. %>% function1 %>% function2 %>% function3) -> out4a
5) functional::Compose
library(functional)
out5 <- Map(Compose(function1, function2, function3), L0)
Note: Input used:
L0 <- as.list(1:4)
function1 <- function(x) x+1
function2 <- function(x) 2*x
function3 <- function(x) x^2
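Applied to the layout in the question, the same idea means keeping mydata1, …, mydataN as elements of one list rather than as N separate variables. The sketch below assumes stand-in versions of function1() and function2(), since the real ones are not shown; the element names are only there to mirror the original variable names:

```r
# Hypothetical stand-ins for the asker's functions.
function1 <- function() c(1, 2, 3)
function2 <- function(x) x * 2

N <- 4

# One list instead of mydata1, ..., mydataN.
mydata <- replicate(N, function1(), simplify = FALSE)
names(mydata) <- paste0("mydata", seq_len(N))

# One lapply() instead of N assignments.
newmydata <- lapply(mydata, function2)

print(newmydata$mydata2)
```

lapply() keeps the names of its input, so each result can still be looked up by the name of the data it came from.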

How can I use dplyr/magrittr's pipe inside functions in R?

I'm trying to write a function which takes as argument a dataframe and the name of the function. When I try to write the function with the standard R syntax, I can get the good result using eval and substitute as recommended by @hadley in http://adv-r.had.co.nz/Computing-on-the-language.html
> df <- data.frame(y = 1:10)
> f <- function(data, x) {
+ out <- mean(eval(expr = substitute(x), envir = data))
+ return(out)
+ }
> f(data = df, x = y)
[1] 5.5
Now, when I try to write the same function using the %>% operator, it doesn't work :
> df <- data.frame(y = 1:10)
> f <- function(data, x) {
+ data %>%
+ eval(expr = substitute(x), envir = .) %>%
+ mean()
+ }
> f(data = df, x = y)
Error in eval(expr, envir, enclos) : object 'y' not found
How can I combine the piping operator with the use of eval and substitute? It seems really tricky to me.
A workaround would be
f <- function(data, x) {
v <- substitute(x)
data %>%
eval(expr = v, envir = .) %>%
mean()
}
The problem is that the pipe functions (%>%) are creating another level of closure which interferes with the evaluation of substitute(x). You can see the difference with this example
df <- data.frame(y = 1:10)
f1 <- function(data, x) {
print(environment())
eval(expr = environment(), envir = data)
}
f2 <- function(data, x) {
print(environment())
data %>%
eval(expr = environment(), envir = .)
}
f1(data = df, x = y)
# <environment: 0x0000000006388638>
# <environment: 0x0000000006388638>
f2(data = df, x = y)
# <environment: 0x000000000638a4a8>
# <environment: 0x0000000005f91ae0>
Notice how the environments differ in the magrittr version. You want to take care of substitute stuff as soon as possible when mucking about with non-standard evaluation.
I hope your use case is a bit more complex than your example, because it seems like
mean(df$y)
would be a much easier bit of code to read.
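The effect of calling substitute() too late can be reproduced without any pipe. The sketch below is only a base R analogy, not magrittr's actual internals: a nested function plays the role of the extra closure, and the names f_late and f_early are made up for the comparison.

```r
df <- data.frame(y = 1:10)

# substitute() is called inside the nested function, where x is not an argument,
# so it returns the symbol x itself instead of the expression the user typed.
f_late <- function(data, x) {
  inner <- function(d) eval(substitute(x), d)
  mean(inner(data))
}

# substitute() is called first, in the frame where x is an argument.
f_early <- function(data, x) {
  v <- substitute(x)
  inner <- function(d) eval(v, d)
  mean(inner(data))
}

# The late version is expected to fail when no object called y exists in the calling environment.
late_result <- tryCatch(f_late(df, y), error = function(e) conditionMessage(e))
early_result <- f_early(df, y)

print(late_result)
print(early_result)
```

This is the same fix as in the workaround above: capture the expression on the first line of the function, then pass the captured value along.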
I've been trying to understand my problem.
First, I've written what I want with the summarise() function :
> library(dplyr)
> df <- data.frame(y = 1:10)
> summarise_(.data = df, mean = ~mean(y))
mean
1 5.5
Then I try to program my own function. I've found a solution which seems to work with the lazyeval package in this post. I use the lazy() and the interp() functions to write what I want.
The first possibility is here :
> library(lazyeval)
> f <- function(data, col) {
+ col <- lazy(col)
+ inter <- interp(~mean(x), x = col)
+ summarise_(.data = data, mean = inter)
+ }
> f(data = df, col = y)
mean
1 5.5
I can also use pipes :
> f <- function(data, col) {
+ col <- lazy(col)
+ inter <- interp(~mean(x), x = col)
+ data %>%
+ summarise_(.data = ., mean = inter)
+ }
>
> f(data = df, col = y)
mean
1 5.5
I would not use eval and substitute.
What follows is a simplified version of this great post suited to your question.
library(dplyr)
df <- data.frame(y = 1:10)
f <- function(data, x) {
x <- enquo(x)
data %>% summarise(mean = mean(!!x))
}
f(data = df, x = y)
There are two things happening here:
Transforming the column name with enquo()
Prefixing the column with !!
Please refer to the link for a more complicated example.
