Passing weights to glm() using rlang - r

I want to pass weights to glm() from inside a function using rlang, without resorting to eval(substitute()) or do.call().
The example below is a simplified stand-in for a more complicated underlying function.
# Toy data
mydata = dplyr::tibble(outcome = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1),
group = c(0,1,0,1,0,1,0,1,0,1,0,1,0,1),
wgts = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1)
)
# This works
glm(outcome ~ group, data = mydata)
# This works
glm(outcome ~ group, data = mydata, weights = wgts)
library(rlang)
# Function not passing weights
myglm <- function(.data, y, x){
glm(expr(!! enexpr(y) ~ !! enexpr(x)), data = .data)
}
# This works
myglm(mydata, outcome, group)
# Function passing weights
myglm2 <- function(.data, y, x, weights){
glm(expr(!! enexpr(y) ~ !! enexpr(x)), `weights = !! enexpr(weights)`, data = .data)
}
# This doesn't work
myglm2(mydata, outcome, group, wgts)
(Ticks are to highlight).
I know the weights argument here is wrong; I have tried many different ways of doing this, all unsuccessfully. The actual function will be passed to a version of purrr::map() or purrr::invoke(), which is why I want to avoid a simple do.call(). Thoughts greatly appreciated.

The issue is that glm() can recognize an expression being provided to its weights argument, but doesn't support quasiquotation, because it uses the base quote() / substitute() / eval() mechanisms instead of rlang. This causes problems for nested expression arithmetic.
One way to get around it is to compose the entire glm expression, then evaluate it. You can use ... to supply optional arguments.
myglm2 <- function( .data, y, x, weights, ... ) {
myglm <- expr( glm(!!enexpr(y) ~ !!enexpr(x), data=.data,
weights = !!enexpr(weights), ...) )
eval(myglm)
}
myglm2(mydata, outcome, group)
# Call: glm(formula = outcome ~ group, data = .data)
myglm2(mydata, outcome, group, wgts)
# Call: glm(formula = outcome ~ group, data = .data, weights = wgts)
myglm2(mydata, outcome, group, wgts, subset=7:10)
# Call: glm(formula = outcome ~ group, data = .data, weights = wgts,
# subset = ..1)
# While masked as ..1, the 7:10 is nevertheless correctly passed to glm()
To follow @lionel's suggestion, you can encapsulate the expression composition / evaluation into a standalone function:
value <- function( e ) {eval(enexpr(e), caller_env())}
myglm2 <- function( .data, y, x, weights, ... ) {
value( glm(!!enexpr(y) ~ !!enexpr(x), data=.data,
weights = !!enexpr(weights), ...) )
}
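For comparison, the same compose-then-evaluate idea can be sketched in base R. This is only an illustration under stated assumptions: bquote() stands in for rlang::expr(), substitute() for enexpr(), and the names toy and myglm_base are made up for the example. Unlike the rlang version, this sketch requires weights to be supplied.

```r
# Toy data mirroring the question (hypothetical name: toy).
toy <- data.frame(outcome = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1),
                  group   = rep(c(0, 1), times = 7),
                  wgts    = rep(1, 14))

# Hypothetical helper: base R analogue of the rlang answer.
myglm_base <- function(.data, y, x, weights) {
  # Capture the argument expressions without evaluating them.
  y_expr <- substitute(y)
  x_expr <- substitute(x)
  w_expr <- substitute(weights)
  # Splice the captured symbols into a complete glm() call with .() ...
  glm_call <- bquote(glm(.(y_expr) ~ .(x_expr), data = .data, weights = .(w_expr)))
  # ... and evaluate it in this function's frame, where .data is defined.
  eval(glm_call)
}

fit <- myglm_base(toy, outcome, group, wgts)
fit$call
```

Because glm() receives a finished call in which weights is already the bare symbol wgts, its own substitute()/eval() machinery can look that symbol up in the data as usual.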

Related

Cannot find object in function when object is defined with speedglm

I use speedglm to fit a GLM to data. When I call the function directly, the code works as expected, but when I create a function to fit the model, I get an error that an argument is not found.
The variable (w in the example below) clearly exists in the scope of the function but it seems that the variable is evaluated only later within the speedglm function where w is no longer available or so I think. This is where I start questioning my current understanding of R.
Did I make an error while creating the function, does speedglm use some weird trick to scope the variable (source code here) that breaks the normal (?) logic or do I have a wrong understanding of how R functions work?
I am trying to understand this behavior and also fix my train_glm function to make it work with speedglm and weights.
MWE
library(speedglm)
# works as expected
m1 <- speedglm(wt ~ cyl, data = mtcars, weights = mtcars$wt)
# define a small helper function that just forwards its arguments
train_glm <- function(f, d, w) {
speedglm(formula = f, data = d, weights = w)
}
# does not work
m <- train_glm(wt ~ cyl, d = mtcars, w = mtcars$wt)
#> Error in eval(extras, data, env) : object 'w' not found
Even weirder, when I change the code I find the following:
# removing the weights as a base case -> WORKS
train_glm3 <- function(f, d) {
speedglm(formula = f, data = d)
}
m3 <- train_glm3(wt ~ cyl, d = mtcars)
# works
# hardcoding the weights inside the function -> BREAKS
train_glm4 <- function(f, d) {
speedglm(formula = f, data = d, weights = d$wt)
}
m4 <- train_glm4(wt ~ cyl, d = mtcars)
# Error in eval(extras, data, env) : object 'd' not found
# creating a new dataset and hardcoding the weights inside the function
# but using the name of the dataset at the highest environment -> WORKS
train_glm5 <- function(f, d) {
speedglm(formula = f, data = d, weights = mtcars2$wt)
}
mtcars2 <- mtcars
m5 <- train_glm5(wt ~ cyl, d = mtcars2)
# works
The solution (thanks to @Mike for the hint) is to evaluate the code either by using the solution given by this answer or by using do.call like so:
library(speedglm)
train_glm_docall <- function(f, d, w) {
do.call(
speedglm,
list(
formula = f,
data = d,
weights = w
)
)
}
m2 <- train_glm_docall(f = wt ~ cyl, d = mtcars, w = mtcars$wt)
class(m2)
#> [1] "speedglm" "speedlm"
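One plausible reading of the "object 'w' not found" error, sketched here with stats::lm() rather than speedglm: extra arguments such as weights are looked up first in data and then in the environment of the formula, not in the function that happens to make the call. This is an assumption about speedglm, based only on the error coming from the same eval(extras, data, env) step; speedglm itself is not exercised below, and the helper names are hypothetical.

```r
# Sketch with stats::lm(); speedglm is assumed, not verified, to behave alike.
train_lm_broken <- function(f, d, case_wts) {
  # case_wts exists only in this function's frame. The formula was created by
  # the caller, so model.frame() never searches this frame for case_wts.
  lm(formula = f, data = d, weights = case_wts)
}

train_lm_fixed <- function(f, d, case_wts) {
  # Re-home the formula so that lookups fall back to this function's frame.
  environment(f) <- environment()
  lm(formula = f, data = d, weights = case_wts)
}

# The broken helper raises an error; capture its message instead of stopping.
broken <- tryCatch(train_lm_broken(wt ~ cyl, mtcars, mtcars$wt),
                   error = function(e) conditionMessage(e))
fixed <- train_lm_fixed(wt ~ cyl, mtcars, mtcars$wt)
```

This would also explain why train_glm5 in the question works: mtcars2 lives in the global environment, which is where the formula was created. The do.call() approach sidesteps the lookup altogether, because the weights are evaluated before the call is built.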

How should I call model.frame in R?

I am trying to write my own modeling function in R, one which takes a formula, some data, and maybe some extra context, like weights; after calling model.frame to extract the necessary numeric data, it will perform a fit. My first pass looked like:
my_modfunc <- function(formula,data,weights=NULL) {
mf <- model.frame(formula,data=data,weights=weights)
wt <- model.weights(mf)
# do some fitting here...
}
# make fake data to test it
set.seed(1234)
data <- data.frame(x1=rnorm(50),x2=rnorm(50),y=rnorm(50),w=runif(50))
# call it:
my_modfunc(y ~ x1 + x2,data=data,weights=w)
This fails, I get the error:
Error in model.frame.default(formula, data = data, weights = weights) :
invalid type (closure) for variable '(weights)'
Similarly, if I call
my_modfunc(y ~ x1 + x2,data=data,weights='w')
I get the same error. I suspect there is some problem with environment, quoting, and so on.
Cutting and pasting the source for lm, I could rewrite my function as
# based on lm
weird_modfunc <- function(formula,data,weights=NULL ) {
cl <- match.call() # what?
mf <- match.call(expand.dots = FALSE) # what??
m <- match(c("formula", "data", "weights"), names(mf), 0L)
mf <- mf[c(1L, m)] # ??
mf$drop.unused.levels <- TRUE # ??
mf[[1L]] <- quote(stats::model.frame) ## ???
mf <- eval(mf, parent.frame())
wt <- as.vector(model.weights(mf))
# do some fitting here...
}
# this runs without error:
weird_modfunc(y ~ x1 + x2,data=data,weights=w)
# this fails with an error about variable lengths:
weird_modfunc(y ~ x1 + x2,data=data,weights='w')
The problem is that this contains multiple somewhat mystical incantations that I do not know how to interpret, modify or maintain.
What is the right way to call model.frame? Bonus points for making my function accept both weights=w and weights='w'
Welcome to the joys of non-standard evaluation. I suggest you base your function on the lm approach. It constructs a call to model.frame and evaluates it. That's necessary, because model.frame does non-standard evaluation, i.e., it accepts/expects a symbol for the weights parameter. Furthermore, it also ensures correct scoping regarding the formula's environment.
weird_modfunc <- function(formula,data,weights=NULL ) {
#cl not needed, lm only adds this call to the return object
mf <- match.call(expand.dots = FALSE)
message("Call with ellipses not expanded: ")
#note that there are no ellipses in the function arguments for now,
#but you might want to change that later
print(mf)
#turn weights into symbol if character is passed
if (is.character(mf$weights)) mf$weights <- as.symbol(mf$weights)
m <- match(c("formula", "data", "weights"), names(mf), 0L)
message("Position of formula, data and weights in the call:")
print(m)
mf <- mf[c(1L, m)]
message("New call that only contains what is needed:")
print(mf)
mf$drop.unused.levels <- TRUE
message("Call with argument added:")
print(mf)
mf[[1L]] <- quote(stats::model.frame)
message("Change call to a call to model.frame:")
print(mf)
mf <- eval(mf, parent.frame()) #evaluate call
wt <- as.vector(model.weights(mf))
# do some fitting here...
message("Return value:")
wt
}
# this runs without error:
weird_modfunc(y ~ x1 + x2,data=data,weights=w)
#Call with ellipses not expanded:
#weird_modfunc(formula = y ~ x1 + x2, data = data, weights = w)
#Position of formula, data and weights in the call
#[1] 2 3 4
#New call that only contains what is needed:
#weird_modfunc(formula = y ~ x1 + x2, data = data, weights = w)
#Call with argument added:
#weird_modfunc(formula = y ~ x1 + x2, data = data, weights = w,
# drop.unused.levels = TRUE)
#Change call to a call to model.frame:
#stats::model.frame(formula = y ~ x1 + x2, data = data, weights = w,
# drop.unused.levels = TRUE)
#Return value:
# [1] 0.35299850 0.98095832 0.53888276 0.44403386 0.94936678 0.45248337 0.19062580 0.99160915 0.54845545 0.76881577 0.91342167 0.68211200 0.40725142
#[14] 0.40759230 0.14608279 0.19666771 0.19220934 0.40841440 0.34822131 0.83454285 0.19840001 0.86180531 0.39718531 0.15325377 0.33928338 0.36718044
#[27] 0.42737908 0.18633690 0.65801660 0.92041138 0.73389406 0.88231927 0.95334653 0.19490154 0.47261674 0.38605066 0.37416586 0.02785566 0.92935521
#[40] 0.41052928 0.95584022 0.27215284 0.51724649 0.97830984 0.36969649 0.31043044 0.03420963 0.66756585 0.92091638 0.04498960
#this runs without error too:
weird_modfunc(y ~ x1 + x2,data=data,weights='w')
Here is a simpler version but there might be problems (well, more than usual with non-standard evaluation):
my_modfunc <- function(formula,data,weights=NULL) {
weights <- substitute(weights)
if (!is.symbol(weights)) weights <- as.symbol(weights)
#substitute the symbol into the call:
mf <- eval(substitute(model.frame(formula,data=data,weights=weights)))
wt <- model.weights(mf)
# do some fitting here...
wt
}
my_modfunc(y ~ x1 + x2,data=data,weights=w)
#works
my_modfunc(y ~ x1 + x2,data=data,weights="w")
#works
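The simpler version above calls as.symbol() on whatever is not already a symbol, so it has no path for a call made without weights. A minimal sketch of one way to cover that case follows, assuming bquote() is an acceptable way to build the model.frame() call; frame_weights and demo_df are hypothetical names, not part of the original answer.

```r
set.seed(42)
demo_df <- data.frame(x1 = rnorm(20), y = rnorm(20), w = runif(20))

# Hypothetical helper: accepts weights = w, weights = "w", or no weights.
frame_weights <- function(formula, data, weights = NULL) {
  # substitute() yields the symbol w, the string "w", or the default NULL.
  w_expr <- substitute(weights)
  if (is.character(w_expr)) w_expr <- as.symbol(w_expr)
  # Build the model.frame() call with the weights expression spliced in,
  # then evaluate it here so that `data` refers to this function's argument.
  mf_call <- bquote(stats::model.frame(.(formula), data = data, weights = .(w_expr)))
  mf <- eval(mf_call)
  as.vector(model.weights(mf))
}

frame_weights(y ~ x1, demo_df, weights = w)    # weights taken from demo_df$w
frame_weights(y ~ x1, demo_df, weights = "w")  # same weights, given by name
frame_weights(y ~ x1, demo_df)                 # no weights supplied
```

The string form is only recognised when a literal string is typed in the call; a variable that holds a column name would need separate handling.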

update() does not work for models created via lapply()

I would like to use lapply() to compute several models in R, but it seems that the update() function can't handle models generated through lapply().
A minimal example:
d1 <- data.frame(y = log(1:9), x = 1:9, trt = rep(1:3, each = 3))
f <- list(y ~ 1, y ~ x, y ~ trt)
modsa <- lapply(f, function(formula) glm(formula, data = d1))
modsb <- lapply(f, glm, data = d1)
update(modsa[[1]], data = d1[1:7, ])
#> Error: object of type 'closure' is not subsettable
update(modsb[[1]], data = d1[1:7, ])
#> Error in FUN(formula = X[[i]], data = d1[1:7, ]): could not find function "FUN"
Is there a way that allows update() to deal with models generated through lapply()?
The error occurs because the call element of each glm object stores the argument name used inside the anonymous function rather than the formula itself:
modsa <- lapply(f, function(x) glm(x, data = d1))
modsa[[1]]$call
glm(formula = x, data = d1)
#compare with a single instance of the model
moda1<-glm(y ~ 1, data=d1)
moda1$call
glm(formula = y ~ 1, data = d1)
If you add back in the formula, it will correctly recreate the call:
update(modsa[[1]], data = d1[1:7, ], formula=f[[1]])
This doesn't work for the second instance, but you can see that if you manually update the call element the update functionality is rescued.
modsb[[1]]$call<-getCall(moda1)
update(modsb[[1]], data = d1[1:7, ])
Esther is correct, the problem is with the call element of glm. From ?update:
‘update’ will update and (by default) re-fit a model. It does this by
extracting the call stored in the object, updating the call and (by
default) evaluating that call.
As already mentioned, one can update including the formula as well:
update(modsa[[1]], data = d1[1:7, ], formula=f[[1]])
If for some reason this is not convenient, here is how to run your lapply() and have it assign directly the correct formula to the call element:
modsa <- lapply(f, function(formula) eval(substitute(glm(F, data = d1), list(F=formula))))
This substitutes the respective formula into the glm call, and then evaluates it. With this long one-liner you can run update(modsa[[1]], data = d1[1:7, ]) with no problem.
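The mechanism quoted from ?update can be reproduced by hand, which shows why the contents of the stored call matter. This is a simplified sketch of the idea, not the actual source of update(); the object names are illustrative.

```r
d1 <- data.frame(y = log(1:9), x = 1:9, trt = rep(1:3, each = 3))
fit <- glm(y ~ x, data = d1)

# Step 1: extract the call stored in the fitted object.
stored_call <- getCall(fit)            # glm(formula = y ~ x, data = d1)
# Step 2: modify the call, here by swapping in a different data argument.
stored_call$data <- quote(d1[1:7, ])
# Step 3: evaluate the modified call to re-fit the model.
refit_manual <- eval(stored_call)

# update() should arrive at the same fit.
refit_update <- update(fit, data = d1[1:7, ])
```

When the stored call is glm(formula = x, data = d1), as with the lapply() models, step 3 has to look up x again at update time, and whatever it finds is no longer the original formula.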

R: Clustered robust standard errors using miceadds lm.cluster - error with subset and weights

I am trying to use the lm.cluster function in the package miceadds to get robust clustered standard errors for a multiply imputed dataset.
I am able to get the standard version of it to run but I get the following error when I try to add a subset or weights:
Error in eval(substitute(subset), data, env) :
..1 used in an incorrect context, no ... to look in
Example that works without subset or weights:
require("mice")
require("miceadds")
data(data.ma01)
# imputation of the dataset: use six imputations
dat <- data.ma01[ , - c(1:2) ]
imp <- mice::mice( dat , maxit=3 , m=6 )
datlist <- miceadds::mids2datlist( imp )
# linear regression with cluster robust standard errors
mod <- lapply(datlist, FUN = function(data){miceadds::lm.cluster( data=data ,
formula=read ~ paredu+ female , cluster = data.ma01$idschool )} )
# extract parameters and covariance matrix
betas <- lapply( mod , FUN = function(rr){ coef(rr) } )
vars <- lapply( mod , FUN = function(rr){ vcov(rr) } )
# conduct statistical inference
summary(pool_mi( qhat = betas, u = vars ))
Example that breaks with subset:
mod <- lapply(datlist, FUN = function(data){miceadds::lm.cluster( data=data ,
formula=read ~ paredu+ female , cluster = data.ma01$idschool, subset=
(data.ma01$urban==1))} )
Error during wrapup: ..1 used in an incorrect context, no ... to look in
Example that breaks with weights:
mod <- lapply(datlist, FUN = function(data){miceadds::lm.cluster( data=data ,
formula=read ~ paredu+ female , cluster = data.ma01$idschool,
weights=data.ma01$studwgt)} )
Error during wrapup: ..1 used in an incorrect context, no ... to look in
From searching, I think I am encountering similar issues as others when passing these commands through an lm or glm wrapper (such as: Passing Argument to lm in R within Function or R : Pass argument to glm inside an R function or Passing the weights argument to a regression function inside an R function)
However, I am not sure how to address the issue with the imputed datasets & existing lm.cluster command.
Thanks
This works fine with the estimatr package which is on CRAN and the estimatr::lm_robust() function. Two notes: (1) you can change the type of standard errors using se_type = and (2) I keep idschool in the data because we like the clusters to be in the same data.frame as we fit the model on.
library(mice)
library(miceadds)
library(estimatr)
# imputation of the dataset: use six imputations
data(data.ma01)
dat <- data.ma01[, -c(1)] # note I keep idschool in data
imp <- mice::mice( dat , maxit = 3, m = 6)
datlist <- miceadds::mids2datlist(imp)
# linear regression with cluster robust standard errors
mod <- lapply(
datlist,
function (dat) {
estimatr::lm_robust(read ~ paredu + female, dat, clusters = idschool)
}
)
# subset
mod <- lapply(
datlist,
function (dat) {
estimatr::lm_robust(read ~ paredu + female, dat, clusters = idschool, subset = urban == 1)
}
)
# weights
mod <- lapply(
datlist,
function (dat) {
estimatr::lm_robust(read ~ paredu + female, dat, clusters = idschool, weights = studwgt)
}
)
# note that you can use the `se_type` argument of lm_robust()
# to change the vcov estimation
# extract parameters and covariance matrix
betas <- lapply(mod, coef)
vars <- lapply(mod, vcov)
# conduct statistical inference
summary(pool_mi( qhat = betas, u = vars ))
I'm no expert, but there is an issue with the passing of the weights to lm(). I know this is not an ideal situation, but I managed to get it to work by modifying the lm.cluster() function to hard code the weights pass and then just used my own.
lm.cluster <- function (data, formula, cluster, wgts=NULL, ...)
{
TAM::require_namespace_msg("multiwayvcov")
if(is.null(wgts)) {
mod <- stats::lm(data = data, formula = formula)
} else {
data$.weights <- wgts
mod <- stats::lm(data = data, formula = formula, weights=data$.weights)
}
if (length(cluster) > 1) {
v1 <- cluster
}
else {
v1 <- data[, cluster]
}
dfr <- data.frame(cluster = v1)
vcov2 <- multiwayvcov::cluster.vcov(model = mod, cluster = dfr)
res <- list(lm_res = mod, vcov = vcov2)
class(res) <- "lm.cluster"
return(res)
}

How to use substitute() to loop lme functions from nlme package?

I am trying to use the lme function from the nlme package inside an lapply loop. This works for the lmer function from the lme4 package, but produces an error message for lme. How can I loop over lme calls in the same way as the lmer call in the example below?
library("nlme")
library("lme4")
set.seed(1)
dt <- data.frame(Resp1 = rnorm(100, 50, 23), Resp2 = rnorm(100, 80, 15), Pred = rnorm(100,10,2), group = factor(rep(LETTERS[1:10], each = 10)))
## Syntax:
lmer(Resp1 ~ Pred + (1 |group), data = dt)
lme(Resp1 ~ Pred, random = ~1 | group, data = dt)
## Works for lme4
lapply(c("Resp1", "Resp2"), function(k) {
lmer(substitute(j ~ Pred + (1 | group), list(j = as.name(k))), data = dt)})
## Does not work for nlme
lapply(c("Resp1", "Resp2"), function(k) {
lme(substitute(j ~ Pred, list(j = as.name(k))), random = ~1 | group, data = dt)})
# Error in UseMethod("lme") :
# no applicable method for 'lme' applied to an object of class "call"
PS. I am aware that this solution exists, but I would like to use a method substituting response variable directly in the model function instead of subsetting data using an additional function.
Instead of fiddling around with substitute and eval you also could do the following:
lapply(c("Resp1", "Resp2"), function(r) {
f <- formula(paste(r, "Pred", sep = "~"))
m <- lme(fixed = f, random = ~ 1 | group, data = dt)
m$call$fixed <- f
m})
You could use the same trick if you want to provide different data sets to a modelling function:
makeModel <- function(dat) {
l <- lme(Resp1 ~ Pred, random = ~ 1 | group, data = dat)
l$call$data <- as.symbol(deparse(substitute(dat)))
l
}
I use this snippet quite a bit, when I want to generate a model from within a function and want to update it afterwards.
As @CarlWitthoft suggested, adding eval into the function will solve the issue:
lapply(c("Resp1", "Resp2"), function(k) {
lme(eval(substitute(j ~ Pred, list(j = as.name(k)))), random = ~1 | group, data = dt)})
Also see @thothal's alternative.
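The error message itself hints at why eval() helps: substitute() returns an unevaluated call, and only evaluating that call produces an object of class "formula". A short base R sketch of the difference follows; stats::reformulate() is shown as one more way to build the formula from strings and is not taken from the answers above.

```r
k <- "Resp1"

# substitute() only builds the expression; it is still a plain call.
unevaluated <- substitute(j ~ Pred, list(j = as.name(k)))
class(unevaluated)        # "call"

# Evaluating the call runs `~`, which creates the formula object.
class(eval(unevaluated))  # "formula"

# Alternative: assemble the formula directly from character strings.
f <- reformulate("Pred", response = k)
class(f)                  # "formula"
```

Any of the formula objects can then be passed as the fixed argument of lme().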
