I use speedglm to fit a GLM to data. When I call the function directly, the code works as expected, but when I create a function to fit the model, I get an error that an argument is not found.
The variable (w in the example below) clearly exists in the scope of the function but it seems that the variable is evaluated only later within the speedglm function where w is no longer available or so I think. This is where I start questioning my current understanding of R.
Did I make an error while creating the function, does speedglm use some weird trick to scope the variable (source code here) that breaks the normal (?) logic or do I have a wrong understanding of how R functions work?
I am trying to understand this behavior and also fix my train_glm function to make it work with speedglm and weights.
MWE
library(speedglm)
# works as expected
m1 <- speedglm(wt ~ cyl, data = mtcars, weights = mtcars$wt)
# define a small helper function that just forwards its arguments
train_glm <- function(f, d, w) {
speedglm(formula = f, data = d, weights = w)
}
# does not work
m <- train_glm(wt ~ cyl, d = mtcars, w = mtcars$wt)
#> Error in eval(extras, data, env) : object 'w' not found
Even weirder, when I vary the code I find the following:
# removing the weights as a base case -> WORKS
train_glm3 <- function(f, d) {
speedglm(formula = f, data = d)
}
m3 <- train_glm3(wt ~ cyl, d = mtcars)
# works
# hardcoding the weights inside the function -> BREAKS
train_glm4 <- function(f, d) {
speedglm(formula = f, data = d, weights = d$wt)
}
m4 <- train_glm4(wt ~ cyl, d = mtcars)
# Error in eval(extras, data, env) : object 'd' not found
# creating a new dataset and hardcoding the weights inside the function
# but using the name of the dataset at the highest environment -> WORKS
train_glm5 <- function(f, d) {
speedglm(formula = f, data = d, weights = mtcars2$wt)
}
mtcars2 <- mtcars
m5 <- train_glm5(wt ~ cyl, d = mtcars2)
# works
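The pattern in these experiments matches how model.frame() resolves the weights argument: it is looked up first in data and then in the environment where the formula was created, not in the wrapper function. Below is a minimal sketch of that behavior, using stats::lm() as a stand-in on the assumption that speedglm builds its model frame the same way (the eval(extras, data, env) in the error message points in that direction). The names d_small and wrapped_fit are made up for illustration.

```r
# Hypothetical toy data; lm() stands in for speedglm() here (assumption).
d_small <- data.frame(y = c(1, 2, 4, 7, 11), x = 1:5)

wrapped_fit <- function(f, d, w) {
  # `f` and `d` are evaluated in this function's frame, but `w` is looked up
  # in `d` and then in environment(f), i.e. where the formula was written.
  lm(formula = f, data = d, weights = w)
}

# The formula is created at the call site, where no object `w` exists,
# so the lookup fails even though `w` is an argument of wrapped_fit().
outcome <- tryCatch({
  wrapped_fit(y ~ x, d = d_small, w = rep(1, 5))
  "no error"
}, error = function(e) conditionMessage(e))

outcome
```

This would also explain why train_glm5 works: mtcars2 exists in the environment where the formula was created.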
The solution (thanks to @Mike for the hint) is to evaluate the arguments before the call is built, either with the approach given in this answer or with do.call, like so:
library(speedglm)
train_glm_docall <- function(f, d, w) {
do.call(
speedglm,
list(
formula = f,
data = d,
weights = w
)
)
}
m2 <- train_glm_docall(f = wt ~ cyl, d = mtcars, w = mtcars$wt)
class(m2)
#> [1] "speedglm" "speedlm"
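As an alternative to do.call, the formula's environment can be re-pointed at the wrapper's own frame so that w is found there. The following is only a sketch under the assumption that speedglm resolves weights the way stats::lm() does; lm() is used so the example needs no extra packages, and train_lm_env is a hypothetical name.

```r
# lm() is a stand-in for speedglm() (assumption); the idea is the same.
train_lm_env <- function(f, d, w) {
  # Make the formula look for extra variables in this function's frame,
  # where `w` is defined.
  environment(f) <- environment()
  lm(formula = f, data = d, weights = w)
}

m_env    <- train_lm_env(wt ~ cyl, d = mtcars, w = mtcars$wt)
m_direct <- lm(wt ~ cyl, data = mtcars, weights = mtcars$wt)

# The wrapped and the direct fit should agree.
all.equal(coef(m_env), coef(m_direct))
```

One trade-off of this approach is that any other variable the formula relied on from its original environment must now be reachable from the wrapper's frame.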
Related
I am trying to use lme function from nlme package inside a lapply loop. This works for lmer function from lme4 package, but produces an error message for lme. How can I loop lme functions similarly to the lmer function in the example below?
library("nlme")
library("lme4")
set.seed(1)
dt <- data.frame(Resp1 = rnorm(100, 50, 23), Resp2 = rnorm(100, 80, 15), Pred = rnorm(100,10,2), group = factor(rep(LETTERS[1:10], each = 10)))
## Syntax:
lmer(Resp1 ~ Pred + (1 |group), data = dt)
lme(Resp1 ~ Pred, random = ~1 | group, data = dt)
## Works for lme4
lapply(c("Resp1", "Resp2"), function(k) {
lmer(substitute(j ~ Pred + (1 | group), list(j = as.name(k))), data = dt)})
## Does not work for nlme
lapply(c("Resp1", "Resp2"), function(k) {
lme(substitute(j ~ Pred, list(j = as.name(k))), random = ~1 | group, data = dt)})
# Error in UseMethod("lme") :
# no applicable method for 'lme' applied to an object of class "call"
PS. I am aware that this solution exists, but I would like to use a method substituting response variable directly in the model function instead of subsetting data using an additional function.
Instead of fiddling around with substitute and eval you also could do the following:
lapply(c("Resp1", "Resp2"), function(r) {
f <- formula(paste(r, "Pred", sep = "~"))
m <- lme(fixed = f, random = ~ 1 | group, data = dt)
m$call$fixed <- f
m})
You could use the same trick if you want to provide different data sets to a modelling function:
makeModel <- function(dat) {
l <- lme(Resp1 ~ Pred, random = ~ 1 | group, data = dat)
l$call$data <- as.symbol(deparse(substitute(dat)))
l
}
I use this snippet quite a bit, when I want to generate a model from within a function and want to update it afterwards.
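To illustrate why patching the stored call helps with later updates, here is a small sketch that uses stats::lm() on mtcars instead of lme (an assumption made to keep the example free of extra packages; the call slot is named formula for lm rather than fixed).

```r
fits <- lapply(c("mpg", "hp"), function(r) {
  f <- reformulate("wt", response = r)   # e.g. mpg ~ wt
  m <- lm(f, data = mtcars)
  # The stored call is lm(formula = f, data = mtcars); replace the symbol `f`
  # with the formula itself so the call can be re-evaluated elsewhere.
  m$call$formula <- f
  m
})

# update() re-evaluates the stored call outside the anonymous function,
# where the symbol `f` would no longer exist.
refit <- update(fits[[1]], subset = cyl == 4)
nobs(refit)
```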
As @CarlWitthoft suggested, adding eval into the function will solve the issue:
lapply(c("Resp1", "Resp2"), function(k) {
lme(eval(substitute(j ~ Pred, list(j = as.name(k)))), random = ~1 | group, data = dt)})
Also see @thothal's alternative.
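The reason eval helps can be seen without fitting any model: substitute() returns an unevaluated call, and lme dispatches on the class of its first argument, as the error message above shows. A short base R sketch (reformulate() is added here as one more option, not something taken from the answers):

```r
k <- "Resp1"

# substitute() only builds the expression; it is still a plain call.
unevaluated <- substitute(j ~ Pred, list(j = as.name(k)))
class(unevaluated)

# Evaluating the call runs `~`, which produces a formula object.
evaluated <- eval(unevaluated)
class(evaluated)

# reformulate() builds the formula from strings directly.
built <- reformulate("Pred", response = k)
class(built)
```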
I want to pass weights to glm() via a function without having to use the eval(substitute()) or do.call() methods, but using rlang.
This describes a more complicated underlying function.
# Toy data
mydata = dplyr::tibble(outcome = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1),
group = c(0,1,0,1,0,1,0,1,0,1,0,1,0,1),
wgts = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1)
)
# This works
glm(outcome ~ group, data = mydata)
# This works
glm(outcome ~ group, data = mydata, weights = wgts)
library(rlang)
# Function not passing weights
myglm <- function(.data, y, x){
glm(expr(!! enexpr(y) ~ !! enexpr(x)), data = .data)
}
# This works
myglm(mydata, outcome, group)
# Function passing weights
myglm2 <- function(.data, y, x, weights){
glm(expr(!! enexpr(y) ~ !! enexpr(x)), `weights = !! enexpr(weights)`, data = .data)
}
# This doesn't work
myglm2(mydata, outcome, group, wgts)
(Ticks are to highlight).
I know the weights argument here is wrong, I have tried many different ways of doing this all unsuccessfully. The actual function will be passed to a version of purrr:map() or purrr:invoke(), which is why I want to avoid a simple do.call(). Thoughts greatly appreciated.
The issue is that glm() can recognize an expression being provided to its weights argument, but doesn't support quasiquotation, because it uses the base quote() / substitute() / eval() mechanisms instead of rlang. This causes problems for nested expression arithmetic.
One way to get around it is to compose the entire glm expression, then evaluate it. You can use ... to supply optional arguments.
myglm2 <- function( .data, y, x, weights, ... ) {
myglm <- expr( glm(!!enexpr(y) ~ !!enexpr(x), data=.data,
weights = !!enexpr(weights), ...) )
eval(myglm)
}
myglm2(mydata, outcome, group)
# Call: glm(formula = outcome ~ group, data = .data)
myglm2(mydata, outcome, group, wgts)
# Call: glm(formula = outcome ~ group, data = .data, weights = wgts)
myglm2(mydata, outcome, group, wgts, subset=7:10)
# Call: glm(formula = outcome ~ group, data = .data, weights = wgts,
# subset = ..1)
# While masked as ..1, the 7:10 is nevertheless correctly passed to glm()
To follow @lionel's suggestion, you can encapsulate the expression composition / evaluation into a standalone function:
value <- function( e ) {eval(enexpr(e), caller_env())}
myglm2 <- function( .data, y, x, weights, ... ) {
value( glm(!!enexpr(y) ~ !!enexpr(x), data=.data,
weights = !!enexpr(weights), ...) )
}
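For comparison only, the same compose-then-evaluate idea can be sketched in base R with substitute() and bquote(). This is not the rlang approach asked for, just an illustration that the mechanism is building the full glm call first and evaluating it afterwards. The names myglm_base and toy are hypothetical, and the toy weights are made unequal so that they visibly matter.

```r
toy <- data.frame(
  outcome = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  group   = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  wgts    = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2)
)

myglm_base <- function(.data, y, x, weights) {
  # Capture the unevaluated argument expressions.
  y_expr <- substitute(y)
  x_expr <- substitute(x)
  w_expr <- substitute(weights)
  # Splice them into a complete glm() call, then evaluate that call here,
  # so `.data` is found in this frame and `wgts` is found inside `.data`.
  cl <- bquote(glm(.(y_expr) ~ .(x_expr), data = .data, weights = .(w_expr)))
  eval(cl)
}

wrapped <- myglm_base(toy, outcome, group, wgts)
direct  <- glm(outcome ~ group, data = toy, weights = wgts)
all.equal(coef(wrapped), coef(direct))
```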
I'm using the rpart package to fit some models, like this:
fitmodel = function(formula, data, w) {
fit = rpart(formula, data, weights = w)
}
Call the custom function
fit = fitmodel(y ~ x1 + x2, data, w)
This causes the error:
Error in eval(expr, envir, enclos) : object 'w' not found
Then I decided to use
fitmodel = function(formula, data, w) {
data$w = w
fit = rpart(formula, data, weights = w)
}
This works, but there's another problem:
This will work
fit = fitmodel(y ~ x1 + x2, data, w)
This does not work
fit = fitmodel(y ~ ., data, w)
Error in eval(expr, envir, enclos) : object 'w' not found
What's the correct way to apply weights inside a custom function? Thanks!
Hopefully someone else gives a more complete answer. The reason why rpart can't find w is that rpart searches the environment that the formula is defined in for data, weights, etc. The formula is created in some environment, most likely the GlobalEnv, while w is created within some other function. Changing the environment of the formula to the environment where w is created, using parent.frame, fixes that. rpart can still find the data since the search path will always continue to the GlobalEnv. I'm not sure why sys.frame(sys.nframe()) works, since the environments aren't the same, but apparently w is still somewhere on the search path.
edit: sys.frame(sys.nframe()) seems to be the same as setting the environment of the formula to the environment of the function rpart is called in (foo3 in this example). In that case, rpart looks for w, data, etc. in foo3, then bar3, then the GlobalEnv.
library(rpart)
data(iris)
bar <- function(formula, data) {
w <- rpois(nrow(iris), 1)
print(environment())
foo(formula, data, w)
}
foo <- function(formula, data, w) {
print(environment(formula))
fit <- rpart(formula, data, weights = w)
return(fit)
}
bar(I(Species == "versicolor") ~ ., data = iris)
## <environment: 0x1045b1a78>
## <environment: R_GlobalEnv>
## Error in eval(expr, envir, enclos) (from #2) : object 'w' not found
bar2 <- function(formula, data) {
w <- rpois(nrow(iris), 1)
print(environment())
foo2(formula, data, w)
}
foo2 <- function(formula, data, w) {
print(environment(formula))
environment(formula) <- parent.frame()
print(environment(formula))
fit <- rpart(formula, data, weights = w)
return(fit)
}
bar2(I(Species == "versicolor") ~ ., data = iris)
## <environment: 0x100bf5910>
## <environment: R_GlobalEnv>
## <environment: 0x100bf5910>
bar3 <- function(formula, data) {
w <- rpois(nrow(iris), 1)
print(environment())
foo3(formula, data, w)
}
foo3 <- function(formula, data, w) {
print(environment(formula))
environment(formula) <- environment() ## seems to be the same as sys.frame(sys.nframe())
print(environment(formula))
print(environment())
fit <- rpart(formula, data, weights = w)
return(fit)
}
bar3(I(Species == "versicolor") ~ ., data = iris)
## <environment: 0x104e11bb8>
## <environment: R_GlobalEnv>
## <environment: 0x104b4ff78>
## <environment: 0x104b4ff78>
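The environments printed above can also be compared programmatically. The sketch below uses no modelling at all; it only shows that a formula keeps the environment it was created in as it is passed down, and that assigning environment(f) <- environment() re-points it at the current function's frame. The function names outer_fun and inner_fun are hypothetical.

```r
outer_fun <- function(f) inner_fun(f)

inner_fun <- function(f) {
  before <- environment(f)          # where the formula was created
  environment(f) <- environment()   # re-point it at this frame
  list(before = before, after = environment(f), here = environment())
}

res <- outer_fun(y ~ x)

# Before the change the formula does not point at inner_fun's frame ...
identical(res$before, res$here)
# ... afterwards it does, so variables local to inner_fun become visible.
identical(res$after, res$here)
```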
According to the rpart documentation (March 12, 2017, page 23, section 6.1), "Weights are not yet supported, and will be ignored if present."
https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf
I've managed to solve this using the code below, but I'm sure there's a better way:
The weak learner
fitmodel = function(formula, data, w) {
# just paste the weights into the data frame
data$w = w
rpart(formula, data, weights = w, control = rpart.control(maxdepth = 1))
}
The algorithm
ada.boost = function(formula, data, wl.FUN = fitmodel, test.data = NULL, M = 100) {
# Just rewrites the formula and gets rid of any '.'
dep.var = all.vars(formula)[1]
vars = attr(terms(formula, data = data), "term.labels")
formula = as.formula(paste(dep.var, "~", paste(vars, collapse = "+")))
# ...more code
}
Now everything works!
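The formula-rewriting step can be tried in isolation. This sketch wraps the same three lines in a hypothetical helper called expand_dot and applies it to iris, to show that the '.' is expanded into explicit column names before any weight column is pasted into the data frame.

```r
expand_dot <- function(formula, data) {
  dep.var <- all.vars(formula)[1]
  # terms() with `data` expands '.' into every column except the response.
  vars <- attr(terms(formula, data = data), "term.labels")
  as.formula(paste(dep.var, "~", paste(vars, collapse = "+")))
}

f_expanded <- expand_dot(Species ~ ., iris)
all.vars(f_expanded)
```

Because the expansion happens on the original data, a column w added afterwards inside fitmodel is not picked up as a predictor.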
I have written a function to run phylogenetic generalized least squares, and everything looks like it should work fine, but for some reason, a specific variable which is defined in the script (W) keeps coming up as undefined. I have stared at this code for hours and cannot figure out where the problem is.
Any ideas?
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(0, 4), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(nlme)
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)
returns "Error in eval(expr, envir, enclos) : object 'W' not found"
even though W is specifically defined!
Thanks!
The error occurs in the gls calls in myou and corMartins2: you have to pass in W as a column in dat because gls is looking for it there (when you put weights = ~W as a formula like that, it looks for dat$W and can't find it).
Just change data=dat to data=cbind(dat,W=W) in both functions.
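The fix can be checked without the phylogenetic packages. The sketch below uses simulated data, drops the correlation structure, and draws arbitrary positive weights, so it only illustrates the weights = ~ W lookup and is not the original analysis. The helper fit_gls is a hypothetical name.

```r
library(nlme)

set.seed(1)
dat <- data.frame(Trait1 = rnorm(20), Trait2 = rnorm(20))

fit_gls <- function(dat) {
  W <- runif(nrow(dat), 1, 2)   # arbitrary positive weights (assumption)
  # Put W into the data so that the one-sided formula ~ W can be resolved.
  dat <- cbind(dat, W = W)
  gls(Trait1 ~ Trait2, data = dat, weights = ~ W, method = "REML")
}

fm <- fit_gls(dat)
as.numeric(logLik(fm))
```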
The example is not reproducible for me, as lowerB and upperB are not defined, however, perhaps the following will work for you, cbinding dat with W:
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(lowerB, upperB), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(nlme)
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)