I tried to embed the cramer function from the sjstats package in a custom function. Although cramer works perfectly outside the custom function, it doesn't work within it.
Thank you very much in advance.
library (sjstats)
cramer2 <- function(dta, x, y){
effsize <- cramer(x ~ y, data = dta)
return(effsize)
}
cramer2(x=gender, y=age, dta=df)
Error in loglin(data, margins, start = start, fit = fitted, param = param, :
incorrect specifications for 'table' or 'start'
This happens because x and y are not automatically substituted in a formula for the variables you have passed. Look:
f <- function(x, y) {
return(x ~ y)
}
f(a, b)
#> x ~ y
If you want the variables substituted, you can do something like
f2 <- function(x, y) {
call("~", substitute(x), substitute(y))
}
f2(a, b)
#> a ~ b
So in your case you can do:
library (sjstats)
cramer2 <- function(dta, x, y) {
f <- as.formula(call("~", substitute(x), substitute(y)))
effsize <- cramer(f, data = dta)
return(effsize)
}
Obviously we don't have your data, but using the built-in data set efc we can demonstrate this works as expected:
data(efc)
cramer2(efc, e16sex, c161sex)
#> [1] 0.05258249
Created on 2022-02-27 by the reprex package (v2.0.1)
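The same pattern can be tried without sjstats. Here is a minimal base-R sketch, assuming a hypothetical helper xtab2() (not part of any package) that builds a one-sided formula from two bare column names and hands it to xtabs():

```r
# Hypothetical helper: turn two bare column names into the formula ~ x + y
# and cross-tabulate them with base R's xtabs().
xtab2 <- function(dta, x, y) {
  rhs <- call("+", substitute(x), substitute(y))  # e.g. cyl + gear
  f <- as.formula(call("~", rhs))                 # e.g. ~cyl + gear
  xtabs(f, data = dta)
}

tab <- xtab2(mtcars, cyl, gear)
print(tab)
```

The column names are captured with substitute() before anything evaluates them, which is the same step that makes cramer2 work.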
The solution provided by Allan works perfectly well if you pass the variables to your function unquoted, i.e. customfunction(dta=mydata, x=gender, y=age, weight=dataweight).
If, however, you must for some reason pass the variable names as quoted strings, e.g. customfunction(dta=mydata, x="gender", y="age", weight="dataweight"), then replace substitute with sym:
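library(sjstats)
library(rlang)  # provides sym()
# weight is the name (a string) of the weighting column in dta
cramer2 <- function(dta, x, y, weight) {
f <- as.formula(call("~", sym(x), sym(y)))
effsize <- cramer(f, weights=dta[[weight]], data = dta)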
return(effsize)
}
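If you would rather avoid the extra dependency, base R can build the same formula from strings. This is only a sketch; formula_from_strings() is a hypothetical name, and it relies on stats::reformulate():

```r
# Hypothetical helper: build the formula x ~ y from two column names
# given as character strings, using base R only.
formula_from_strings <- function(x, y) {
  reformulate(termlabels = y, response = x)
}

f <- formula_from_strings("gender", "age")
print(f)
```

The resulting formula can then be passed on in place of the one built with sym().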
I have a global function of roughly the form:
demo_fcn <- function(f, x1,x2){
r = x1 - x2
return(f(r))
}
I want to create this function in a general way so that users can add their own f with their own custom inputs, so long as there is an input slot for r. Say we take f to be the following function
f <- function(input, factor){
out = input^factor
return(out)
}
In this case, input = r, so that the user is able to call
demo_fcn(f(factor=2),x1=2,x2=3)
I get the error
Error in f(factor = 2) : argument "input" is missing, with no default
The desired outcome here should be the following code running
r = 2-3
f(input=r, factor=2)
The end goal is to implement this in a more complicated function, with multiple arguments for both demo_fcn and f
demo_fcn <- function(f, x1,x2){
r1 = x1 - x2
r2 = x1+x2
return(f(r1,r2))
}
f <- function(input1, input2, factor1,factor2){
out = input1^factor1 + input2^factor2
return(out)
}
One way is to pass a function (not a function call), and use ... in the top function to pass additional arguments.
demo_fcn <- function(f, x1, x2, ...) {
r = x1 - x2
f(r, ...)
}
f <- function(input, factor){
out = input^factor
out
}
demo_fcn(f, x1=2, x2=5, factor=2)
# [1] 9
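The same idea carries over to the two-input version from the question. A small sketch, where demo_fcn2 and f2 are illustrative names only:

```r
# The top-level function computes r1 and r2 and forwards everything
# else to f through `...`.
demo_fcn2 <- function(f, x1, x2, ...) {
  r1 <- x1 - x2
  r2 <- x1 + x2
  f(r1, r2, ...)
}

f2 <- function(input1, input2, factor1, factor2) {
  input1^factor1 + input2^factor2
}

# Extra arguments are passed by name through `...`
res_dots <- demo_fcn2(f2, x1 = 2, x2 = 5, factor1 = 2, factor2 = 1)

# Alternatively, fix the extra arguments up front with an anonymous function
res_closure <- demo_fcn2(
  function(r1, r2) f2(r1, r2, factor1 = 2, factor2 = 1),
  x1 = 2, x2 = 5
)
```

Both calls compute (-3)^2 + 7^1. The anonymous-function form keeps the signature of demo_fcn2 free of `...` if you prefer that.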
If you want to have multiple such functions, then you can do:
demo_fcn <- function(f1, f2, x1, x2, f1opts = NULL) {
r = x1 - x2
do.call(f1, c(list(r), f1opts))  # call the f1 argument; f2 (with its own options list) would be handled the same way
}
demo_fcn(f, x1=2, x2=5, f1opts=list(factor=2))
Yet another alternative, borrowed from curve, may match more closely what you're hoping for.
demo_fcn <- function(expr, x1, x2, xname = "x") {
r = x1 - x2
sexpr <- substitute(expr)
if (is.name(sexpr)) {
expr <- call(as.character(sexpr), as.name(xname))
} else {
if (!((is.call(sexpr) || is.expression(sexpr)) && xname %in%
all.vars(sexpr)))
stop(gettextf("'expr' must be a function, or a call or an expression containing '%s'",
xname), domain = NA)
expr <- sexpr
}
ll <- list(x = r)
names(ll) <- xname
eval(expr, envir = ll, enclos = parent.frame())
}
demo_fcn(f(x, factor=2), x1=2, x2=5)
# [1] 9
See ?curve for more explanation of xname=. In short: use x in your call to f(.), even though it does not refer to any object named x in the local or any other environment; it is just a placeholder. If you prefer, you can change to xname="input" and call demo_fcn(f(input, factor=2), ...) for the same effect, but realize that in that call input is still a placeholder, not a reference to an object.
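To make the placeholder idea concrete, here is a stripped-down sketch without the input checks. eval_with_placeholder() is a hypothetical name, not something taken from curve:

```r
# Capture the expression unevaluated, bind the placeholder name to a value,
# and evaluate the expression with that binding.
eval_with_placeholder <- function(expr, value, xname = "x") {
  sexpr <- substitute(expr)
  binding <- setNames(list(value), xname)
  eval(sexpr, envir = binding, enclos = parent.frame())
}

f <- function(input, factor) input^factor

# `input` is only a placeholder here; no object called `input` needs to exist
res <- eval_with_placeholder(f(input, factor = 2), value = -3, xname = "input")
```

Everything other than the placeholder (here f) is looked up in the calling environment via enclos.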
Let's say I'd like to write anscombe %>% lm_tidy("x1", "y1") (actually, I'd like to write anscombe %>% lm_tidy(x1, y1), where x1 and y1 are columns of the data frame). The following function seems to work:
plot_gg <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
ggplot(df, aes(x = !!x, y = !!y)) + geom_point() +
geom_smooth(formula = y ~ x, method="lm", se = FALSE)
}
I started writing the following function:
lm_tidy_1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
fm <- y ~ x ##### I tried many stuff here!
lm(fm, data=df)
}
## Error in model.frame.default(formula = fm, data = df, drop.unused.levels = TRUE) :
## object is not a matrix
One comment in "passing in column name as argument" states that the embrace operator {{...}} is shorthand for the quote-unquote pattern. Unfortunately, the error messages differ between the two forms:
lm_tidy_2 <- function(df, x, y) {
fm <- !!enquo(y) ~ !!enquo(x) # alternative: {{y}} ~ {{x}} with different errors!!
lm(fm, data=df)
}
## Error:
## ! Quosures can only be unquoted within a quasiquotation context.
This seems to work (based on @jubas's answer, but we're stuck with string handling and paste):
lm_tidy_str <- function(df, x, y) {
fm <- formula(paste({{y}}, "~", {{x}}))
lm(fm, data=df)
}
Yet again, {{y}} != !!enquo(y). But it's worse: the following function breaks down with the same Quosure error as earlier:
lm_tidy_str_1 <- function(df, x, y) {
x <- enquo(x)
y <- enquo(y)
fm <- formula(paste(!!y, "~", !!x))
lm(fm, data=df)
}
Is {{y}} != !!enquo(y)?
How to pass data-variables to lm?
EDIT: Sorry, there were left-overs from my many trials. I want to directly pass the data-variables (say x1 and y1) to the function that is going to use them as formula components (such as lm) and not their string versions ("x1" and "y1"): I try to avoid strings as long as possible and it's more streamlined from the user perspective.
Consider:
lm_tidy_1 <- function(df, x, y) {
fm <- reformulate(as.character(substitute(x)), substitute(y))
lm(fm, data=df)
}
lm_tidy_1(iris, Species, Sepal.Length)
lm_tidy_1(iris, 'Species', Sepal.Length)
lm_tidy_1(iris, Species, 'Sepal.Length')
lm_tidy_1(iris, 'Species', 'Sepal.Length')
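All four calls work because as.character(substitute(x)) gives the same string for a bare name and for a quoted one. A quick sketch of just that step, with name_of() as a made-up helper:

```r
# Returns the argument as a string, whether it was passed bare or quoted
name_of <- function(x) as.character(substitute(x))

bare   <- name_of(Species)
quoted <- name_of("Species")

# reformulate() then assembles response ~ term from the pieces
fm <- reformulate(name_of(Species), response = quote(Sepal.Length))
print(fm)
```

Note that this only holds for plain column names. For a call such as log(x), as.character() returns several strings ("log" and "x"), so the approach would need deparse() instead.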
Edit:
If you need the formula to appear, change the call object:
lm_tidy_1 <- function(df, x, y) {
fm <- reformulate(as.character(substitute(x)), substitute(y))
res<-lm(fm, data=df)
res$call[[2]]<- fm
res
}
lm_tidy_1(iris, Species, Sepal.Length)
Call:
lm(formula = Sepal.Length ~ Species, data = df)
Coefficients:
(Intercept) Speciesversicolor Speciesvirginica
5.006 0.930 1.582
@BiranSzydek's answer is pretty good.
However, it has three downsides:
The printed call hides the formula and the data that were actually used:
Call:
lm(formula = fm, data = .)
One has to input the symbols as strings.
It adds a dependency on rlang, though that is a great package.
You can indeed solve this problem with pure base R!
The solution in pure base R
Under the hood, R is actually a Lisp, so it is well suited to such meta-programming tasks. The only downside of R is its horrible syntax.
Especially for meta-programming, it is not as elegant as the Lisp languages. The syntax can be really confusing, as you experienced yourself when trying to solve this problem.
The solution is to use substitute(), which lets you substitute pieces of code in quoted form:
lm_tidy <- function(df, x, y) {
# capture the arguments as unevaluated code pieces instead of evaluating them:
.x <- substitute(x)
.y <- substitute(y)
.df <- substitute(df)
# take the code piece `y ~ x` and substitute into it, using a list as the lookup table
.fm <- substitute(y ~ x, list(y=.y, x=.x))
# take the code `lm(fm, data=df)` and replace `fm` and `df`
# with the code pieces stored in `.fm` and `.df`,
# and finally evaluate the substituted code in the parent environment (the environment where the function was called!)
eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}
The trick is to use eval.parent(substitute(<your expression>, <a list that serves as the lookup table for the variables in your expression>)).
Beware of scoping! As long as <your expression> is built only from variables that are defined inside the function or in the lookup list given to substitute(), there won't be any scoping problems. Avoid referring to any other variables within <your expression>. This is the only rule you have to obey to use eval()/eval.parent() safely in this context.
Even so, eval.parent() ensures that the substituted code is executed in the environment from which the function was called.
Now, you can do:
lm_tidy(mtcars, cyl, mpg)
the output is now as desired:
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Coefficients:
(Intercept) cyl
37.885 -2.876
And we did this with pure base R!
The trick for safe use of eval() is that every variable in the substitute() expression is defined in the lookup table passed to substitute() or is one of the function's arguments. In other words, none of the replaced variables refers to a dangling variable outside the function definition.
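It may help to see the two substitution steps on their own, outside any function. In this sketch the symbols are written out with quote() instead of being captured from arguments:

```r
# Step 1: fill the template `y ~ x` from a lookup list
.fm <- substitute(y ~ x, list(y = quote(mpg), x = quote(cyl)))

# Step 2: fill the template `lm(fm, data = df)` the same way
cl <- substitute(lm(fm, data = df), list(fm = .fm, df = quote(mtcars)))

# The finished call is ordinary code that can be inspected before running it
call_text <- deparse(cl)
print(call_text)

# Step 3: evaluate the finished call
fit <- eval(cl)
```

Because the call is fully assembled before evaluation, the printed model shows the real formula and data name rather than fm and df.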
plot_gg function
So following these rules, your plot_gg function would be defined as:
plot_gg <- function(df, x, y) {
.x <- substitute(x)
.y <- substitute(y)
.df <- substitute(df)
# geom_smooth() expects its formula in terms of the aesthetics x and y,
# so only the data and the aes() mappings are substituted
eval.parent(substitute(
ggplot(df, aes(x=xvar, y=yvar)) + geom_point() +
geom_smooth(formula = y ~ x, method="lm", se=FALSE),
list(xvar=.x, yvar=.y, df=.df)
))
}
When you want to enter x and y as strings
lm_tidy_str <- function(df, x, y) {
.x <- as.name(x)
.y <- as.name(y)
.df <- substitute(df)
.fm <- substitute(y ~ x, list(y=.y, x=.x))
eval.parent(substitute(lm(fm, data=df), list(fm=.fm, df=.df)))
}
plot_gg_str <- function(df, x, y) {
.x <- as.name(x)
.y <- as.name(y)
.df <- substitute(df)
# geom_smooth() expects its formula in terms of the aesthetics x and y,
# so only the data and the aes() mappings are substituted
eval.parent(substitute(
ggplot(df, aes(x=xvar, y=yvar)) + geom_point() +
geom_smooth(formula = y ~ x, method="lm", se=FALSE),
list(xvar=.x, yvar=.y, df=.df)
))
}
lm_tidy_str(mtcars, "cyl", "mpg")
# Call:
# lm(formula = mpg ~ cyl, data = mtcars)
#
# Coefficients:
# (Intercept) cyl
# 37.885 -2.876
#
require(ggplot2)
plot_gg_str(mtcars, "cyl", "mpg")
Wrap the formula in expr(), then pass it to lm().
library(dplyr)  # re-exports sym() and expr() from rlang
lm_tidy <- function(df, x, y) {
x <- sym(x)
y <- sym(y)
fm <- expr(!!y ~ !!x)
lm(fm, data = df)
}
This function is equivalent:
lm_tidy <- function(df, x, y) {
fm <- expr(!!sym(y) ~ !!sym(x))
lm(fm, data = df)
}
Then
lm_tidy(mtcars, "cyl", "mpg")
gives
Call:
lm(formula = fm, data = .)
Coefficients:
(Intercept) cyl
37.885 -2.876
EDIT per comment below:
library(rlang)
lm_tidy_quo <- function(df, x, y){
y <- enquo(y)
x <- enquo(x)
fm <- paste(quo_text(y), "~", quo_text(x))
lm(fm, data = df)
}
You can then pass symbols as arguments
lm_tidy_quo(mtcars, cyl, mpg)
I'm trying to write a function which takes as arguments a data frame and the name of a variable. When I write the function with standard R syntax, I get the right result using eval and substitute, as recommended by @hadley in http://adv-r.had.co.nz/Computing-on-the-language.html
> df <- data.frame(y = 1:10)
> f <- function(data, x) {
+ out <- mean(eval(expr = substitute(x), envir = data))
+ return(out)
+ }
> f(data = df, x = y)
[1] 5.5
Now, when I try to write the same function using the %>% operator, it doesn't work :
> df <- data.frame(y = 1:10)
> f <- function(data, x) {
+ data %>%
+ eval(expr = substitute(x), envir = .) %>%
+ mean()
+ }
> f(data = df, x = y)
Error in eval(expr, envir, enclos) : object 'y' not found
>
How can I combine the pipe operator with the use of eval and substitute? It seems really tricky to me.
A workaround would be
f <- function(data, x) {
v <- substitute(x)
data %>%
eval(expr = v, envir = .) %>%
mean()
}
The problem is that the pipe operator (%>%) creates another level of closure, which interferes with the evaluation of substitute(x). You can see the difference with this example:
df <- data.frame(y = 1:10)
f1 <- function(data, x) {
print(environment())
eval(expr = environment(), envir = data)
}
f2 <- function(data, x) {
print(environment())
data %>%
eval(expr = environment(), envir = .)
}
f1(data = df, x = y)
# <environment: 0x0000000006388638>
# <environment: 0x0000000006388638>
f2(data = df, x = y)
# <environment: 0x000000000638a4a8>
# <environment: 0x0000000005f91ae0>
Notice how the environments differ in the magrittr version. You want to take care of the substitute step as early as possible when mucking about with non-standard evaluation.
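The effect can be reproduced without any pipe at all. The sketch below is only an analogy, assuming (as described above) that the pipe evaluates its right-hand side in a separate function frame; outer_bad and outer_good are made-up names:

```r
# substitute() only looks in the frame it is called from. Inside `inner`
# there is no argument called x, so the symbol x comes back unchanged.
outer_bad <- function(x) {
  inner <- function() substitute(x)
  inner()
}

# Capturing the expression first, in the frame that owns the argument,
# keeps it available to any nested function.
outer_good <- function(x) {
  v <- substitute(x)
  inner <- function() v
  inner()
}

bad  <- outer_bad(a + b)   # the symbol x
good <- outer_good(a + b)  # the expression a + b
```

This is why assigning v <- substitute(x) before the pipe fixes the original function.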
I hope your use case is a bit more complex than your example, because it seems like
mean(df$y)
would be a much easier bit of code to read.
I've been trying to understand my problem.
First, I've written what I want with the summarise() function :
> library(dplyr)
> df <- data.frame(y = 1:10)
> summarise_(.data = df, mean = ~mean(y))
mean
1 5.5
Then I try to program my own function. I've found a solution which seems to work with the lazyeval package in this post. I use the lazy() and the interp() functions to write what I want.
The first possibility is here :
> library(lazyeval)
> f <- function(data, col) {
+ col <- lazy(col)
+ inter <- interp(~mean(x), x = col)
+ summarise_(.data = data, mean = inter)
+ }
> f(data = df, col = y)
mean
1 5.5
I can also use pipes :
> f <- function(data, col) {
+ col <- lazy(col)
+ inter <- interp(~mean(x), x = col)
+ data %>%
+ summarise_(.data = ., mean = inter)
+ }
>
> f(data = df, col = y)
mean
1 5.5
I would not use eval and substitute.
What follows is a simplified version of this great post suited to your question.
library(dplyr)
df <- data.frame(y = 1:10)
f <- function(data, x) {
x <- enquo(x)
data %>% summarise(mean = mean(!!x))  # use the data argument rather than the global df
}
f(data = df, x = y)
There are two things happening here:
Transforming the column name with enquo()
Prefixing the column with !!
Please refer to the link for a more complicated example.
I'm trying to adjust the names of an argument inside a function. I want to create a procedure that takes the body of a function, looks for x, changes every x into x0, and then restores the function to what it was before. To provide an example:
f = function(x, y) -x^2 + x + -y^2 + y
# Take old names
form_old = names(formals(f))
# Make new names
form_new = paste0(form_old, 0)
# Give f new formals
formals(f) = setNames(vector("list", length(form_new)), form_new)
# Copy function body
bod = as.list(body(f))
for (i in 1:length(form_new)) {
bod = gsub(form_old[i], form_new[i], bod)
}
# return from list to call ?
body(f) = as.call(list(bod))
f(1, 1) # produces an error
So far, this code will change all variable names from x to x0 and from y to y0. However, the final output of bod is a character vector and not a call. How can I now change this back to a call?
Thanks in advance!
Surely there is a better way to do what you are trying to do that doesn't require modifying functions. That being said, you definitely don't want to replace variables with regular expressions; that could cause all sorts of problems. Generally, manipulating code as strings leads to trouble. For example, a function like tricky <- function(x, y) { tst <- "x + y"; -xx*x + yy*y }, which contains strings and overlapping variable names, will give the wrong results.
Here is a function that takes a recursive approach (Recall) to traverse the expression tree (recursion could be avoided using a 'stack' type structure, but it seems more difficult to me).
## Function to replace variables in function body
## expr is `body(f)`, keyvals is a lookup table for replacements
rep_vars <- function(expr, keyvals) {
if (!length(expr)) return()
for (i in seq_along(expr)) {
if (is.call(expr[[i]])) expr[[i]][-1L] <- Recall(expr[[i]][-1L], keyvals)
if (is.name(expr[[i]]) && deparse(expr[[i]]) %in% names(keyvals))
expr[[i]] <- as.name(keyvals[[deparse(expr[[i]])]])
}
return( expr )
}
## Test it
f <- function(x, y) -x^2 + x + -y^2 + y
newvals <- c('x'='x0', 'y'='y0') # named lookup vector
newbod <- rep_vars(body(f), newvals)
newbod
# -x0^2 + x0 + -y0^2 + y0
## Rename the formals, and update the body
formals(f) <- pairlist(x0=bquote(), y0=bquote())
body(f) <- newbod
## The new 'f'
f
# function (x0, y0)
# -x0^2 + x0 + -y0^2 + y0
f(2, 2)
# [1] -4
With a more difficult function, where you want to avoid modifying strings or the other variables named yy and xx for example,
tricky <- function(x, y) { tst <- "x + y"; -xx*x + yy*y }
formals(tricky) <- pairlist(x0=bquote(), y0=bquote())
body(tricky) <- rep_vars(body(tricky), newvals)
tricky
# function (x0, y0)
# {
# tst <- "x + y"
# -xx * x0 + yy * y0
# }
#
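For plain renaming like this, base R's substitute() can also do the tree walk for you. The following is an alternative sketch rather than a replacement for rep_vars; rename_args() is a hypothetical helper:

```r
# Rename all formal arguments of f and the matching symbols in its body.
# substitute() only touches symbols, so strings and longer names are safe.
rename_args <- function(f, new_names) {
  old_names <- names(formals(f))
  lookup <- setNames(lapply(new_names, as.name), old_names)
  # do.call() hands the body to substitute() as an expression to rewrite
  body(f) <- do.call(substitute, list(body(f), lookup))
  names(formals(f)) <- new_names
  f
}

g <- rename_args(function(x, y) -x^2 + x + -y^2 + y, c("x0", "y0"))
g_val <- g(2, 2)

# Strings and the distinct names xx / yy are left alone
tricky2 <- rename_args(
  function(x, y) { tst <- "x + y"; xx <- 1; yy <- 1; c(tst, -xx * x + yy * y) },
  c("x0", "y0")
)
t_val <- tricky2(1, 3)
```

Like rep_vars, this leaves the string "x + y" and the variables xx and yy untouched.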
There are a few ways to go here. Following your code, I would go with something like this:
f = function(x, y) -x^2 + x + -y^2 + y
# Take old names
form_old = names(formals(f))
# Make new names
form_new = paste0(form_old, 0)
deparse(body(f)) -> bod
for (i in 1:length(form_new)) {
bod = gsub(form_old[i], form_new[i], bod, fixed = TRUE)
}
formals(f) = setNames(vector("list", length(form_new)), form_new)
body(f) <- parse(text = bod)
f(1, 1)
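One caveat with the string approach: a fixed-string gsub replaces every occurrence, including those inside longer names. A short illustration with a made-up body text:

```r
body_txt <- "-xx * x + yy * y"

# Fixed replacement also rewrites the x characters inside `xx`
naive <- gsub("x", "x0", body_txt, fixed = TRUE)

# Word boundaries restrict the match to the standalone name `x`
bounded <- gsub("\\bx\\b", "x0", body_txt, perl = TRUE)

print(naive)
print(bounded)
```

Word boundaries help with overlapping names, but string literals inside the body would still be rewritten, so this is only safe for simple functions like the one in the question.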
I want to write a function that can take a variable number of inputs and regress the first input on the rest of the inputs.
Hypothetically, if the function were supplied with 2, 3, or 4 variables, I would define it as:
egen_neut<-function(x,y) residuals(lm(x~y,na.action=na.exclude))
egen_neut<-function(x,y,z) residuals(lm(x~y+z,na.action=na.exclude))
egen_neut<-function(x,y,z,w) residuals(lm(x~y+z+w,na.action=na.exclude))
How can I convert the dot-dot-dot, i.e. "...", so that it is interpreted as a formula with a "+" between the variables? That is, what should go in place of the ????? below?
egen_neut<-function(x,...) {
residuals(lm(x ~ ?????,na.action=na.exclude))
}
Here's one way to do it:
ff <- function(x, ...) {
mc <- as.list(match.call())[-1]
ll <- as.character(mc[[1]])
rr <- paste(sapply(mc[-(1)], as.character), collapse="+")
fm <- as.formula(paste(ll, "~", rr))
# now you can execute `lm` as:
lm(fm, na.action = na.exclude)
}
# now you can do
ff(x, y, z, w)
where all these input variables reside in the global environment. Hope this helps.
Since you're dealing with data.frame, this is how I'd do it:
ff <- function(df, ...) {
mc <- as.list(match.call())[-(1:2)]
ll <- as.character(mc[[1]])
rr <- paste(sapply(mc[-(1)], as.character), collapse="+")
fm <- as.formula(paste(ll, "~", rr))
# now you can execute `lm` as:
lm(fm, data = df, na.action = na.exclude)
}
Suppose your data.frame is DF with columns x, y, z and you want to do x ~ y then:
ff(DF, x, y)
Or x ~ y + z, then:
ff(DF, x, y, z)
You get the idea.
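A variant of the same idea that skips the string pasting and returns the residuals directly, as in the question. This is a sketch under the assumption that all arguments are plain column names; the signature with an explicit response argument is my own choice:

```r
# Regress `response` on every column named in `...` and return the residuals
egen_neut <- function(df, response, ...) {
  resp <- deparse(substitute(response))
  # substitute(list(...)) captures the dots unevaluated; drop the leading `list`
  preds <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  fm <- reformulate(preds, response = resp)
  residuals(lm(fm, data = df, na.action = na.exclude))
}

res <- egen_neut(mtcars, mpg, cyl, wt)
```

reformulate() inserts the "+" between the predictors, so any number of columns can be passed.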
You don't have to do that. Just write your function to take a formula argument:
egen_neut <- function(fm)
resid(lm(fm, na.action=na.exclude))
egen_neut(x ~ y)
egen_neut(x ~ y + z)
egen_neut(x ~ y + z + w)
If you put all the variables into a data frame, then you can use a loop to extract a subset of the columns and a . in the formula object to fit an additive model to all of the variables in the data frame not already in the formula. Suppose your data frame, d, has columns x, y, z, etc. Then something like
sapply(seq(2, length(d)), function(ix, d) residuals(lm(x ~ ., d[, seq(ix)])), d = d)
should do the trick, but this is untried, so it might need some tweaking.
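For what it's worth, here is the same idea tried on a built-in data set. The choice of mtcars columns is only for illustration, with mpg playing the role of x:

```r
# Response in the first column, candidate predictors after it
d <- mtcars[, c("mpg", "cyl", "wt", "hp")]

# For ix = 2, 3, 4 fit mpg on the first ix - 1 predictors;
# `.` stands for every column of the subset except the response
res <- sapply(seq(2, length(d)), function(ix) {
  residuals(lm(mpg ~ ., data = d[, seq_len(ix)]))
})

dim(res)
```

Each column of the result holds the residuals of one model, from the smallest to the largest.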