I am trying to run a t-test within a custom function, and am running into a quosure misapplication (I believe). Any help would be greatly appreciated.
library(tidyverse)
tp_pull <- function(mydata, dv, iv){
dv <- enquo(dv)
iv <- enquo(iv)
t.test(!!dv ~ !!iv, mydata)
}
tp_pull(mydata = mtcars, dv = mpg, iv = vs)
My error message reads:
numerical expression has 2 elements: only the first used
NAs introduced by coercion
Error in quo_name(dv):~!(!iv) : NA/NaN argument
For context this t-test will be part of a larger custom function.
Quosures are unique to tidyeval and are not supported by the base R language. Right now they only work with functions that implement tidy evaluation, such as those in dplyr. It is very unlikely that they will ever work with base functions such as t.test.
If you want to do this with base R, you can use G. Grothendieck's suggestion of
tp_pull <- function(mydata, dv, iv){
t.test(formula(substitute(dv ~ iv)), mydata)
}
tp_pull(mydata = mtcars, dv = mpg, iv = vs)
The substitute() call captures the unevaluated symbol names from the promises passed to the function and lets you reassemble them into a new expression. The formula() call then coerces the unevaluated expression returned by substitute() into a proper R formula object.
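As a rough illustration of what substitute() and formula() each contribute, the sketch below splits the two steps apart. The helper name make_formula is made up for this example and is not part of the original suggestion.

```r
# Hypothetical helper: build a formula from two unquoted column names.
make_formula <- function(dv, iv) {
  expr <- substitute(dv ~ iv)  # unevaluated call, e.g. mpg ~ vs
  formula(expr)                # coerce the call into a formula object
}

f <- make_formula(mpg, vs)
class(f)      # "formula"
all.vars(f)   # "mpg" "vs"

# The formula can then be handed to t.test() together with the data
res <- t.test(f, data = mtcars)
```

Nothing here depends on tidyverse packages; the column names are only looked up in mtcars once t.test() builds its model frame.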
Related
This question is highly related to R - how to pass formula to a with(df, glm(y ~ x)) construction inside a function but asks a broader question.
Why do these expressions work?
text_obj <- "mpg ~ cyl"
form_obj <- as.formula(text_obj)
with(mtcars, lm(mpg ~ cyl))
with(mtcars, lm(as.formula(text_obj)))
lm(form_obj, data = mtcars)
But not this one?
with(mtcars, lm(form_obj))
Error in eval(predvars, data, env) : object 'mpg' not found
I would usually use the data argument but this is not possible in the mice package.
Ie.
library(mice)
mtcars[5, 5] <- NA # introduce a missing value to be imputed
mtcars.imp = mice(mtcars, m = 5)
These don't work
lm(form_obj, data = mtcars.imp)
with(mtcars.imp, lm(form_obj))
but this does
with(mtcars.imp, lm(as.formula(text_obj)))
Thus, is it better to always call as.formula inside the function call, rather than constructing the formula first and then passing it in?
An important "hidden" aspect of formulas is their associated environment.
When form_obj is created, its environment is set to where form_obj was created:
environment(form_obj)
# <environment: R_GlobalEnv>
In every other version, the formula is created inside with(), so its environment is set to the temporary environment that with() evaluates in. It's easiest to see this with the as.formula approach by splitting it into a few steps:
with(mtcars, {
f = as.formula(text_obj)
print(environment(f))
lm(f)
})
# <environment: 0x7fbb68b08588>
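The same rule applies inside an ordinary function, which may be easier to experiment with than with(). In this sketch (make_inside is a made-up name), as.formula() is called without an env argument, so the formula picks up the frame of the function that called it:

```r
text_obj <- "mpg ~ cyl"

# Hypothetical helper: create the formula inside a function body.
make_inside <- function() {
  f <- as.formula(text_obj)  # env defaults to parent.frame(), i.e. this call's frame
  list(formula_env = environment(f), frame = environment())
}

res <- make_inside()
identical(res$formula_env, res$frame)   # TRUE
```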
We can make the form_obj approach work by editing its environment before calling lm:
with(mtcars, {
# set form_obj's environment to the current one
environment(form_obj) = environment()
lm(form_obj)
})
The help page for ?formula is a bit long, but there's a section on environments:
Environments
A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.
Formulas created with the ~ operator use the environment in which they were created. Formulas created with as.formula will use the env argument for their environment.
The upshot is that making a formula with ~ sweeps the environment part "under the rug". In more general settings, it's safer to use as.formula, which gives you fuller control over the environment to which the formula applies.
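To make the "fuller control" point concrete, here is a small sketch using the env argument of as.formula(). The environment e is created purely for the example. Note that as.formula() returns an existing formula object unchanged, so env only takes effect when converting from something else, such as a string.

```r
e <- new.env()

# A character string is converted and attached to the environment we pass in
f <- as.formula("mpg ~ cyl", env = e)
identical(environment(f), e)                 # TRUE

# An object that is already a formula is returned as-is, so env is ignored
g <- mpg ~ cyl
h <- as.formula(g, env = e)
identical(environment(h), environment(g))    # TRUE

# For an existing formula, reassign the environment explicitly
environment(g) <- e
identical(environment(g), e)                 # TRUE
```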
You might also check Hadley's chapter on environments:
http://adv-r.had.co.nz/Environments.html
I want to loop glm/lm over multiple outcomes and predictors while stratifying by group. The nest() and map() functions from the tidyr and purrr packages seem to provide an elegant solution for stratified analysis. However, when I use a custom function that takes multiple inputs, map() doesn't seem to work.
In almost all the tutorials on map() from purrr that I have seen, the regression model examples are static: the dependent and independent variables are explicitly defined in the function. Because I want to loop over dozens of outcomes and predictors, I am trying to write a model-fitting function that can iterate over different combinations.
library(dplyr)
library(broom)
library(tidyr)
library(purrr)
# example data set
set.seed(20)
df <- data.frame(
out = rep(c(0,1),5),
pre = sample(c(1:4),10,replace = TRUE),
var1 = sample(c(1:2),10,replace = TRUE),
var2 = sample(c(1:50),10,replace = TRUE),
group = sample(c(1:2),10,replace = TRUE)
)
explicit_fun<-function(data){
glm(out ~ pre + var1 + var2, data=data, family = binomial())
}
input_fun<-function(data, outcome, predictor, covariate){
glm(as.formula(paste(outcome,"~",predictor,"+",paste(covariate,collapse = "+"))),data=data,family = binomial())
}
# nesting the data set
df_by_group<-df%>%
group_by(group)%>%
nest()
It works fine with the explicit function:
models <- df_by_group%>%
mutate(mod=purrr::map(data,explicit_fun))
models <- models%>%
mutate(
glance_glm=purrr::map(mod,broom::glance),
tidy_glm=purrr::map(mod,broom::tidy),
augment_glm=purrr::map(mod,broom::augment)
)
unnest(models,data)
unnest(models,glance_glm,.drop = TRUE)%>% View()
unnest(models,tidy_glm) %>% View()
It stops working when the function takes multiple inputs:
models<-df_by_group%>%
mutate(mod=purrr::map(data,input_fun(data=.,outcome="out",predictor="pre",covariate=c("var1","var2"))))
I expected input_fun to work the same as explicit_fun, but I received the following error message:
Error in mutate_impl(.data, dots) :
Evaluation error: Can't convert a `glm/lm` object to function
Call `rlang::last_error()` to see a backtrace.
You need to pass a function to map(). Right now, you are calling a function in the second parameter, not passing a function. The quickest way to fix this is to use the formula syntax to create a function. Try
models <- df_by_group%>%
mutate(mod=purrr::map(data, ~input_fun(data=.,outcome="out",predictor="pre",covariate=c("var1","var2"))))
This delays the evaluation of input_fun till the map actually happens and properly fills in the . value.
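For comparison, here is a minimal sketch of three equivalent ways to hand map() a function rather than a function call. It assumes purrr is installed; fit_fun and the split of mtcars by am are made up for the illustration, and reformulate() from stats stands in for the paste/as.formula construction used in the question.

```r
library(purrr)

# Hypothetical model-fitting function with extra arguments.
fit_fun <- function(data, outcome, predictors) {
  lm(reformulate(predictors, response = outcome), data = data)
}

groups <- split(mtcars, mtcars$am)   # list of two data frames, named "0" and "1"

# 1. Formula shorthand: .x is the current list element
m1 <- map(groups, ~ fit_fun(.x, outcome = "mpg", predictors = c("wt", "hp")))

# 2. Explicit anonymous function
m2 <- map(groups, function(d) fit_fun(d, outcome = "mpg", predictors = c("wt", "hp")))

# 3. Pass the function itself; extra arguments are forwarded through ...
m3 <- map(groups, fit_fun, outcome = "mpg", predictors = c("wt", "hp"))

all.equal(coef(m1[["0"]]), coef(m3[["0"]]))   # TRUE
```

In all three cases the second argument is a function (or something map() can turn into one), so nothing is evaluated until map() reaches each element.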
I am working on a problem where I need to fit many additive models of the form y ~ s(x), where the response y is constant whereas the predictor x varies between models. I am using mgcv::smoothCon() to set up the bases, and lm() to fit the models. The reason I do this, rather than calling gam() directly, is that I need the unpenalized fits. My problem is that smoothCon() requires its object argument to be unquoted, e.g., s(x), and I wonder how I can generate such unquoted arguments from a character vector of variable names.
A minimal example can be illustrated using the mtcars dataset. The following snippet shows what I am able to do at the moment:
library(mgcv)
# Variables for which I want to create a smooth term s(x)
responses <- c("mpg", "disp")
# At the moment, this is the only solution which I am able to make work
bs <- list(
smoothCon(s(mpg), data = mtcars),
smoothCon(s(disp), data = mtcars)
)
It would be nicer to be able to generate bs using some functional programming approach. I imagine something like this, where foo() is my missing link:
lapply(paste0("s(", responses, ")"), function(x) smoothCon(foo(x),
data = mtcars))
I have tried noquote() and as.symbol(), but both fail.
responses <- c("mpg", "disp")
lapply(paste0("s(", responses, ")"),
function(x) smoothCon(noquote(x), data = mtcars))
#> Error: $ operator is invalid for atomic vectors
lapply(paste0("s(", responses, ")"),
function(x) smoothCon(as.symbol(x), data = mtcars))
#> Error: object of type 'symbol' is not subsettable
We can do this by parsing each string into a language object, evaluating it, and then applying smoothCon:
library(tidyverse)
out <- paste0("s(", responses, ")") %>%
map(~ rlang::parse_expr(.x) %>%
eval %>%
smoothCon(., data = mtcars))
identical(out, bs)
#[1] TRUE
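A base R variant of the same idea, offered as a sketch rather than a drop-in replacement: str2lang() (available in R 3.6.0 and later) parses one string into a call, and eval() runs it so that s() sees the bare variable name. The name bs2 is made up for this example.

```r
library(mgcv)

responses <- c("mpg", "disp")

bs2 <- lapply(responses, function(v) {
  # Equivalent to typing s(mpg) or s(disp) by hand
  spec <- eval(str2lang(paste0("s(", v, ")")))
  smoothCon(spec, data = mtcars)
})

length(bs2)           # 2
bs2[[1]][[1]]$term    # "mpg"
```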
Why don't you try it like this?
smoothCon(s(get("disp")), data = mtcars)
Instead of disp, give the name of whichever variable you prefer. You can even put this inside a loop or any other construct.
I need to use the mixed-model function lme many times in my code, but I do not know how to use it within a function. Called directly, lme works just fine, but when used within a function, it throws errors:
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = x ~1 , random = ~ 1|y/z,
data=cc,
method="REML")
}
on calling this function:
myfunc (dbcon2, birthweight, sire, dam)
I get the error :
Error in model.frame.default(formula = ~x + y + z, data = list(animal
= c("29601/9C1", : invalid type (list) for variable 'x'
I think there is a different procedure for this that I am unaware of. Any help would be greatly appreciated.
Thanks in advance
Not sure if this is what you are looking for, but as @akrun correctly pointed out, you can use paste. I am using paste0 here, which is a special case of paste that concatenates strings without a separator.
The idea is to concatenate the variable names into the formula. Since paste returns a character string, which cannot be used directly as a formula to build the model, you need to convert that string with as.formula wrapped around the paste0 call.
To see this, try writing a formula with paste0 as below:
formula <-paste0("mpg~", paste0("hp","+", "am"))
print(formula)
[1] "mpg~hp+am"
class(formula)
[1] "character" ##This should ideally be a formula rather than character
formula <- as.formula(formula) ##conversion of character string to formula
class(formula)
[1] "formula"
To work inside a model, you need a formula object. Also, please try to learn about the collapse and sep options of paste; they are very handy.
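As a small sketch of the collapse option mentioned above, the following builds a formula from a vector of predictor names. The reformulate() function from the stats package, which the answer does not use, is shown alongside as an alternative that avoids manual string handling.

```r
predictors <- c("hp", "am")

# paste() with collapse joins the predictors into one right-hand side
rhs <- paste(predictors, collapse = " + ")
rhs                                      # "hp + am"
f1 <- as.formula(paste("mpg ~", rhs))

# reformulate() builds the same formula directly from the character vector
f2 <- reformulate(predictors, response = "mpg")

class(f1)                                # "formula"
identical(deparse(f1), deparse(f2))      # TRUE
```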
I don't have your data , hence I have used mtcars data to represent the same.
library("nlme")
myfunc<- function(cc, x, y, z)
{
model <- lme(fixed = as.formula(paste0(x," ~1")) , random = as.formula(paste0("~", "1|",y,"/",z)),
data=cc,
method="REML")
}
models <- myfunc(cc=mtcars, x="hp", y="mpg", z="am")
summary(models)
You can read more about paste by typing ?paste in your console.
loess.smooth <- function(dat) {
dat <- dat[complete.cases(dat),]
## response
vars <- colnames(dat)
## covariate
id <- 1:nrow(dat)
## define a loess filter function (fitting loess regression line)
loess.filter <- function (x, span) loess(formula = paste(x, "id", sep = "~"),
data = dat,
degree = 1,
span = span)$fitted
## apply filter column-by-column
new.dat <- as.data.frame(lapply(vars, loess.filter, span = 0.75),
col.names = colnames(dat))
}
When I try to apply loess.smooth to a dataframe, I get the error:
Error in model.frame.default(formula = paste(x, "id", sep = "~"), data = dat) :
invalid type (closure) for variable 'id'
I don't understand why this is a problem since id is not a function, which is implied by the error.
When I run through these lines of code outside of the function, it works perfectly fine and does exactly what I want it to do.
It is a scoping issue caused by passing a character string to loess instead of a formula. A string carries no environment (environment() returns NULL for it), so loess doesn't know where to look for id. If you wrap the string in as.formula, it works: by default the resulting formula is assigned the local environment of the function call, where id is defined.
As for the cryptic error: it arises when a variable shares its name with a function from a loaded package. If the variable isn't found in the local environment, R continues searching through the loaded packages and finds the function instead. In my case, the id function was loaded by the dplyr library.
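A minimal sketch of the as.formula fix described above, assuming the rest of the function is kept as in the question. The function is renamed smooth_columns here (a made-up name) because stats already exports a function called loess.smooth, and the final data frame is returned visibly.

```r
smooth_columns <- function(dat, span = 0.75) {
  dat <- dat[complete.cases(dat), , drop = FALSE]
  vars <- colnames(dat)       # responses
  id <- seq_len(nrow(dat))    # covariate, local to this function

  fit_one <- function(v) {
    # as.formula() attaches the calling frame to the formula; id is then
    # reachable through that frame's enclosing environment, while dat
    # supplies the response column
    f <- as.formula(paste(v, "id", sep = "~"))
    loess(f, data = dat, degree = 1, span = span)$fitted
  }

  out <- lapply(vars, fit_one)
  names(out) <- vars
  as.data.frame(out)
}

sm <- smooth_columns(mtcars[, c("mpg", "hp")])
dim(sm)   # 32 2
```

Because id is now found through the formula's environment, a function named id in a loaded package no longer gets picked up.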