How do I use safely with coxph and subset or weights?

I'm trying to use purrr::safely with coxph so that I can capture error messages. I've made a safe version of coxph as follows:
library(survival)
library(purrr)
coxph_safe <- safely(coxph)
This works perfectly when my only inputs are the formula and data, however, if I add another input such as subset or weights, I get the following error message:
simpleError in eval(substitute(subset), data, env): ..3 used in an incorrect context, no ... to look in
Does anyone know how to apply safely to coxph when additional inputs are required? I also get the same error using quietly instead of safely, and also if I make a safe version of lm and specify a subset. I'm using R 3.6.1 and purrr 0.3.2. For now, I've programmed a workaround, where I subset the data before applying coxph_safe, but it would be good to know if there was a better solution.
Here's a simple example:
test1 <- list(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x=c(0,2,1,1,1,0,0),
sex=c(0,0,0,0,1,1,1))
# Without subset
coxph(Surv(time, status) ~ x, test1) # Works as expected
coxph_safe(Surv(time, status) ~ x, test1) # Works as expected
# With subset
coxph(Surv(time, status) ~ x, test1, subset = !sex) # Works as expected
coxph_safe(Surv(time, status) ~ x, test1, subset = !sex) # Error!
Edit
On a related note, I also get a similar error when applying anova to a coxph object generated via coxph_safe.
cox_1 <- coxph(Surv(time, status) ~ x, test1) # Works as expected
anova(cox_1) # Works as expected
cox_1s <- coxph_safe(Surv(time, status) ~ x, test1) # Works as expected
anova(cox_1s$result) # Error in is.data.frame(data) : ..2 used in an incorrect context, no ... to look in
As far as I can tell, this has something to do with how the call is stored. I can fix it by overwriting the call.
cox_1$call # coxph(formula = Surv(time, status) ~ x, data = test1)
cox_1s$result$call # .f(formula = ..1, data = ..2)
cox_1s$result$call <- cox_1$call
anova(cox_1s$result) # Now works as expected
Is there a better way around this?

This actually has nothing to do with purrr::safely. The issue is function nesting. Consider:
f <- function(...) {coxph(...)}
f(Surv(time, status) ~ x, test1) # Works
f(Surv(time, status) ~ x, test1, subset=!sex) # Error
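The real reason it fails is how the call is captured inside nested functions. coxph() records its own call with match.call() and later re-evaluates pieces of it through model.frame(), which applies substitute() to arguments such as subset. When coxph() is called through a wrapper that forwards ..., as safely() does, the captured call contains the placeholders ..1, ..2, ..3 instead of the expressions you typed (compare .f(formula = ..1, data = ..2) in the question). Those placeholders can only be resolved in a frame that still has the original ..., so evaluating them anywhere else fails with "..3 used in an incorrect context, no ... to look in".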
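A minimal base-R sketch of the capture problem (no survival or purrr needed). show_call() and forward() are hypothetical helpers: show_call() only records its call with match.call(), roughly as coxph() does, and forward() forwards its dots the way safely() does. The arguments are never evaluated, so mydata and sex do not need to exist.

```r
# Hypothetical stand-in for a modelling function: it only records its own call,
# the way coxph() does with match.call(), and never evaluates its arguments.
show_call <- function(formula, data, subset) match.call()

# Hypothetical wrapper that forwards its dots, as purrr::safely() does.
forward <- function(...) show_call(...)

direct  <- show_call(y ~ x, mydata, subset = !sex)
wrapped <- forward(y ~ x, mydata, subset = !sex)

print(direct)   # the expressions exactly as typed, including subset = !sex
print(wrapped)  # placeholders such as ..1, ..2, ..3 instead of the expressions
```

The direct call keeps subset = !sex, while the forwarded call records a placeholder, and that placeholder is what later reaches model.frame() or anova().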
To address this issue, we need to wrap coxph() into a function that properly handles non-standard evaluation (NSE):
coxph_nse <- function(...) {eval(rlang::expr(coxph( !!!rlang::enexprs(...) )))}
The new function no longer suffers the same nesting issues and can be safely passed to safely():
coxph_safe <- safely(coxph_nse)
coxph_safe(Surv(time, status) ~ x, test1) # works
cx1 <- coxph_safe(Surv(time, status) ~ x, test1, subset=!sex) # now also works!
anova(cx1$result) # works as well!
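For comparison, a sketch of the same idea without rlang, assuming only base R and the survival package. coxph_nse_base() and safely_base() are hypothetical names: substitute(coxph(...)) splices the caller's unevaluated arguments into a fresh coxph() call, and safely_base() is a stripped-down stand-in for purrr::safely() so that the example runs without purrr. With purrr installed, safely(coxph_nse_base) should behave the same way.

```r
library(survival)

# Hypothetical base-R counterpart of coxph_nse: substitute() replaces the dots
# with the caller's unevaluated expressions, and the rebuilt coxph() call is
# evaluated in the calling frame, so coxph() never sees ..1, ..2, ... placeholders.
coxph_nse_base <- function(...) {
  cl <- substitute(coxph(...))
  eval(cl, parent.frame())
}

# Hypothetical minimal stand-in for purrr::safely(): returns list(result, error).
safely_base <- function(.f) {
  function(...) {
    tryCatch(list(result = .f(...), error = NULL),
             error = function(e) list(result = NULL, error = e))
  }
}

test1 <- list(time = c(4, 3, 1, 1, 2, 2, 3),
              status = c(1, 1, 1, 0, 1, 1, 0),
              x = c(0, 2, 1, 1, 1, 0, 0),
              sex = c(0, 0, 0, 0, 1, 1, 1))

coxph_safe2 <- safely_base(coxph_nse_base)
cx2 <- coxph_safe2(Surv(time, status) ~ x, test1, subset = !sex)
cx2$result$call    # contains the typed expressions rather than ..1, ..2
anova(cx2$result)  # the stored call can be re-evaluated
```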

Related

MuMIn dredge gam error using default na.omit

I have a global model I'm trying to dredge, but I keep getting the error "Error in dredge(myglobalmod, evaluate = TRUE, trace = 2) :
'global.model' uses 'na.action' = "na.omit"
I tried running the global model with na.action="na.omit" within the gam() call and leaving it out (since it's the default).
myglobalmod <- gam(response~ s(x1) + s(x2) + s(x3) + offset(x4), data=mydata, family="tw", na.action="na.omit")
options(na.action=na.omit)
mydredge <- dredge(myglobalmod, evaluate=TRUE, trace=2)
When I didn't include na.action="na.omit" within the gam, I got a similar error.
I then tried with a subset of the data that has all the NA rows removed, but same error.
I've gotten dredge to work before so I'm not sure why it doesn't like the na.omit now, I'm using the same code.
MuMIn insists that you use na.action = na.fail, in order to ensure that the same data set is used for every model (if NA values were left in the data set, different subsets could be used for different models depending on which variables were used). You can use na.omit(mydata) or mydata[complete.cases(mydata), ] to get rid of NA values before you start (assuming that the NA values in your data set occur only in variables you will be using for the full model).
> library(MuMIn)
> m1 <- lm(mpg ~ ., data = mtcars)
> d0 <- dredge(m1)
Error in dredge(m1) :
'global.model''s 'na.action' argument is not set and options('na.action') is "na.omit"
> m1 <- lm(mpg ~ ., data = mtcars, na.action = na.fail)
> d1 <- dredge(m1)
Fixed term is "(Intercept)"
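A small base-R illustration of why dredge() insists on na.fail, using a hypothetical copy of mtcars with three values of hp set to NA. Under the default na.omit, nested models are fitted to different numbers of rows, whereas removing incomplete rows first (here with complete.cases()) lets every candidate model use the same data. MuMIn itself is not used here.

```r
# Hypothetical data: copy mtcars and blank out three values of hp.
dat <- mtcars
dat$hp[1:3] <- NA

# With the default na.omit, each model drops only the rows that are incomplete
# for its own variables, so nested models end up using different data.
m_big   <- lm(mpg ~ wt + hp, data = dat)
m_small <- lm(mpg ~ wt, data = dat)
c(nobs(m_big), nobs(m_small))   # 29 32

# na.fail refuses to fit while NAs remain in the model variables.
res <- tryCatch(lm(mpg ~ wt + hp, data = dat, na.action = na.fail),
                error = function(e) conditionMessage(e))

# Dropping incomplete rows first gives every candidate model the same rows.
dat_cc <- dat[complete.cases(dat), ]
m_big_cc   <- lm(mpg ~ wt + hp, data = dat_cc, na.action = na.fail)
m_small_cc <- lm(mpg ~ wt,      data = dat_cc, na.action = na.fail)
c(nobs(m_big_cc), nobs(m_small_cc))   # 29 29
```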

Use the formula("string") with felm() from the lfe package while also using fixed effects

I'm trying to run a large regression formula that is created somewhere else as a long string. I also want to use "fixed effects" (individual specific intercepts).
Without fixed effects this works both in the lm() and in felm() functions:
library("lfe")
MyData <- data.frame(country = c("US","US","DE","DE"),
y = rnorm(4),
x = rnorm(4))
testformula <- "y ~ x"
lm(formula(testformula),
data = MyData)
felm(formula(testformula),
data = MyData)
There is also no problem with this kind of regression in felm() if I use country fixed effects:
felm(y ~ x | country,
data = MyData)
However, when I try to combine both the formula() function and the fixed effects argument, I get an error:
felm(formula(testformula) | country ,
data = MyData)
"Error in terms(formula(as.Formula(formula), rhs = 1), specials = "G") :
Object 'country' not found"
I find this strange, separately, both of these arguments work. How can I use the formula() function in felm() and still work with the convenient fixed effects syntax of that function? I don't want to write the fixed effects into the formula because I want to rely on the within transformations of the lfe package.
p.s.: This works in plm() by the way so I'm guessing there is something odd in the felm() function or I input it badly.
library("plm")
plm(formula(testformula),
data = MyData,
index = c("country"),
model = "within",
effect = "individual")
Since the fixed effects are part of the formula*, we can include them in the formula string.
fit1 <- felm(y ~ x | country, data=MyData)
testformula <- "y ~ x | country"
fit2 <- felm(formula(testformula), data=MyData)
fit2
# x
# 0.3382
all.equal(fit1$coefficients, fit2$coefficients)
# [1] TRUE
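*You can see this from the fact that | country is not separated from y ~ x by a comma: in felm(y ~ x | country, data = MyData) it is part of the first argument (the formula), not a separate argument.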
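A base-R sketch of why the two pieces cannot be combined with |, without loading lfe. Outside a formula, formula(testformula) | country is an ordinary expression, so R tries to evaluate country as a variable; as far as we can tell, this is what happens when felm() processes its formula argument, giving the "object 'country' not found" error. Pasting the fixed-effects part into the string before calling as.formula() avoids this; fe is a hypothetical vector of fixed-effect names.

```r
testformula <- "y ~ x"

# `|` evaluates both sides, and `country` only exists as a column of the data.
err <- tryCatch(formula(testformula) | country,
                error = function(e) conditionMessage(e))

# Hypothetical: name(s) of the fixed-effect factor(s), kept as strings.
fe <- c("country")
fo <- as.formula(paste(testformula, "|", paste(fe, collapse = " + ")))
fo             # y ~ x | country
all.vars(fo)   # "y" "x" "country"
```

The resulting fo can then be passed to felm(fo, data = MyData) in the same way as formula(testformula) in fit2.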

How do I create a "macro" for regressors in R?

For long and repeating models I want to create a "macro" (so called in Stata and there accomplished with global var1 var2 ...) which contains the regressors of the model formula.
For example from
library(car)
lm(income ~ education + prestige, data = Duncan)
I want something like:
regressors <- c("education", "prestige")
lm(income ~ #regressors, data = Duncan)
The only thing I could find is this approach, but applying it to my regressors doesn't work:
reg = lm(income ~ bquote(y ~ .(regressors)), data = Duncan)
as it throws:
Error in model.frame.default(formula = y ~ bquote(.y ~ (regressors)), data =
Duncan, : invalid type (language) for variable 'bquote(.y ~ (regressors))'
Even the accepted answer to the same question:
lm(formula(paste('var ~ ', regressors)), data = Duncan)
fails with:
Error in model.frame.default(formula = formula(paste("var ~ ", regressors)),
: object is not a matrix.
And of course I tried as.matrix(regressors) :)
So, what else can I do?
Here are some alternatives. No packages are used in the first 3.
1) reformulate
fo <- reformulate(regressors, response = "income")
lm(fo, Duncan)
or you may wish to write the last line like this, so that the formula shown in the output looks nicer:
do.call("lm", list(fo, quote(Duncan)))
in which case the Call: line of the output appears as expected, namely:
Call:
lm(formula = income ~ education + prestige, data = Duncan)
2) lm(dataframe)
lm( Duncan[c("income", regressors)] )
The Call: line of the output looks like this:
Call:
lm(formula = Duncan[c("income", regressors)])
but we can make it look exactly as in the do.call solution in (1) with this code:
fo <- formula(model.frame(income ~., Duncan[c("income", regressors)]))
do.call("lm", list(fo, quote(Duncan)))
3) dot
An alternative similar to that suggested by @jenesaisquoi in the comments is:
lm(income ~., Duncan[c("income", regressors)])
The approach discussed in (2) for tidying the Call: output also works here.
4) fn$ Prefacing a function with fn$ enables string interpolation in its arguments. This solution is nearly identical to the syntax desired in the question, using $ in place of # to perform the substitution, and the flexible substitution could readily extend to more complex scenarios. The quote(Duncan) in the code could be written as just Duncan and it would still run, but the Call: shown in the lm output looks better if you use quote(Duncan).
library(gsubfn)
rhs <- paste(regressors, collapse = "+")
fn$lm("income ~ $rhs", quote(Duncan))
The Call: line looks almost identical to the do.call solutions above -- only spacing and quotes differ:
Call:
lm(formula = "income ~ education+prestige", data = Duncan)
If you wanted it absolutely the same then:
fo <- fn$formula("income ~ $rhs")
do.call("lm", list(fo, quote(Duncan)))
For the scenario you described, where regressors is in the global environment, you could use:
lm(as.formula(paste("income~", paste(regressors, collapse="+"))), data =
Duncan)
Alternatively, you could use a function:
modincome <- function(regressors){
lm(as.formula(paste("income~", paste(regressors, collapse="+"))), data =
Duncan)
}
modincome(c("education", "prestige"))
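One detail that matters for the paste() approaches: without collapse, paste() is vectorised over regressors and returns one string per regressor. The sketch below (base R only, no data needed) shows this and checks that the collapsed string gives the same formula as reformulate().

```r
regressors <- c("education", "prestige")

# Without collapse: one string per regressor.
paste("income ~", regressors)
# [1] "income ~ education" "income ~ prestige"

# With collapse = " + ", the regressors become a single right-hand side.
fo_paste  <- as.formula(paste("income ~", paste(regressors, collapse = " + ")))
fo_reform <- reformulate(regressors, response = "income")

deparse(fo_paste)                                  # "income ~ education + prestige"
identical(deparse(fo_paste), deparse(fo_reform))   # TRUE
```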

R - model.frame() and non-standard evaluation

I am puzzled at a behaviour of a function that I am trying to write. My example comes from the survival package but I think that the question is more general than that. Basically, the following code
library(survival)
data(bladder) ## this will load "bladder", "bladder1" and "bladder2"
mod_init <- coxph(Surv(start, stop, event) ~ rx + number, data = bladder2, method = "breslow")
survfit(mod_init)
Will yield an object that I am interested in. However, when I write it in a function,
my_function <- function(formula, data) {
mod_init <- coxph(formula = formula, data = data, method = "breslow")
survfit(mod_init)
}
my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)
the function will return an error at the last line:
Error in eval(predvars, data, env) :
invalid 'envir' argument of type 'closure'
10 eval(predvars, data, env)
9 model.frame.default(formula = Surv(start, stop, event) ~ rx +
number, data = data)
8 stats::model.frame(formula = Surv(start, stop, event) ~ rx +
number, data = data)
7 eval(expr, envir, enclos)
6 eval(temp, environment(formula$terms), parent.frame())
5 model.frame.coxph(object)
4 stats::model.frame(object)
3 survfit.coxph(mod_init)
2 survfit(mod_init)
1 my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)
I am curious whether there is something obvious that I am missing or whether such behaviour is normal. I find it strange, since in the environment of my_function I would have the same objects as in the global environment when running the first portion of the code.
Edit: I also received useful input from Terry Therneau, the author of the survival package. This is his answer:
This is a problem that stems from the non-standard evaluation done by model.frame. The only way out of it that I have found is to add model.frame=TRUE [the coxph() argument is actually named model, i.e. model = TRUE] to the original coxph call. I consider it a serious design flaw in R. Non-standard evaluation is like the dark side -- a tempting and easy path that always ends badly.
Terry T.
Diagnose
From the error message:
2 survfit(mod_init)
1 my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)
the problem is clearly not with coxph during model fitting, but with survfit.
And from this message:
10 eval(predvars, data, env)
9 model.frame.default(formula = Surv(start, stop, event) ~ rx +
number, data = data)
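I can tell that the problem is that, during an early stage of survfit, model.frame.default() cannot find the data it needs to rebuild the model frame for the formula Surv(start, stop, event) ~ rx + number. Because the call is re-evaluated in the formula's environment (here the global environment) rather than inside my_function, data = data no longer refers to your data frame but to the utils::data() function, which is why the error complains about an 'envir' argument of type 'closure'.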
What is a model frame?
A model frame is formed from the data argument passed to a fitting routine such as lm(), glm() or mgcv::gam(). It is a data frame with the same number of rows as data (less any rows removed by na.action or subset), but:
dropping all variables not referenced by the formula
adding attributes, the most important of which is "terms", which records the environment of the formula
Most model fitting routines, like lm(), glm() and mgcv::gam(), keep the model frame in their fitted object by default. This has the advantage that if we later call predict() without newdata, the data can be taken from this model frame. A clear disadvantage is that it substantially increases the size of the fitted object.
However, survival::coxph() is an exception: by default it does not retain the model frame in its fitted object. This makes the fitted object much smaller, but exposes you to the problem you have encountered. To ask survival::coxph() to keep the model frame, set model = TRUE.
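To see what a model frame looks like, here is a base-R sketch with lm() and hypothetical toy data d (built like the toy data used for lm() in this answer). The frame is stored as fit$model and carries a "terms" attribute, and that attribute records the environment of the formula. With model = FALSE the component is simply absent.

```r
set.seed(1)  # hypothetical toy data
d <- data.frame(x = seq(0, 1, length = 20),
                y = seq(0, 1, length = 20) + rnorm(20, 0, 0.15))
fo <- y ~ x

fit_keep <- lm(fo, data = d)                 # model = TRUE is the lm() default
fit_drop <- lm(fo, data = d, model = FALSE)  # discard the model frame

mf <- fit_keep$model
dim(mf)                    # 20 rows, 2 columns (y and x)
names(attributes(mf))      # includes "terms"

# The terms attribute remembers where the formula was created; this is where
# variables are looked up again if the frame has to be rebuilt later.
identical(environment(attr(mf, "terms")), environment(fo))   # TRUE
is.null(fit_drop$model)                                      # TRUE
```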
Test with survival::coxph()
library(survival); data(bladder)
my_function <- function(myformula, mydata, keep.mf = TRUE) {
fit <- coxph(myformula, mydata, method = "breslow", model = keep.mf)
survfit(fit)
}
Now, this function call will fail, as you have seen:
my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = FALSE)
but this function call will succeed:
my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = TRUE)
Same behaviour for lm()
We can actually demonstrate the same behaviour in lm():
## generate some toy data
foo <- data.frame(x = seq(0, 1, length = 20), y = seq(0, 1, length = 20) + rnorm(20, 0, 0.15))
## a wrapper function
bar <- function(myformula, mydata, keep.mf = TRUE) {
fit <- lm(myformula, mydata, model = keep.mf)
predict.lm(fit)
}
Now this will succeed, by keeping model frame:
bar(y ~ x - 1, foo, keep.mf = TRUE)
while this will fail, by discarding model frame:
bar(y ~ x - 1, foo, keep.mf = FALSE)
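Not from the original answer: a hedged sketch of another workaround for the lm() case, reusing the toy data foo. bar_env() is a hypothetical variant of bar() that sets the formula's environment to the wrapper's own frame, so that when predict.lm() has to rebuild the discarded model frame it can find myformula and mydata again. Whether the same trick carries over to coxph() and survfit() depends on how survival rebuilds its model frame, which this sketch does not test.

```r
set.seed(1)
foo <- data.frame(x = seq(0, 1, length = 20),
                  y = seq(0, 1, length = 20) + rnorm(20, 0, 0.15))

# Hypothetical variant of bar(): point the formula at this function's frame,
# where `myformula`, `mydata` and `keep.mf` live.
bar_env <- function(myformula, mydata, keep.mf = TRUE) {
  environment(myformula) <- environment()
  fit <- lm(myformula, mydata, model = keep.mf)
  predict.lm(fit)
}

p_keep <- bar_env(y ~ x - 1, foo, keep.mf = TRUE)
p_drop <- bar_env(y ~ x - 1, foo, keep.mf = FALSE)   # no longer fails
all.equal(p_keep, p_drop)                            # TRUE
```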
Using the newdata argument?
Note that my example for lm() is slightly artificial, because we can actually use newdata argument in predict.lm() to get through this problem:
bar1 <- function(myformula, mydata, keep.mf = TRUE) {
fit <- lm(myformula, mydata, model = keep.mf)
predict.lm(fit, newdata = lapply(mydata, mean))
}
Now, whether or not we keep the model frame, both calls succeed:
bar1(y ~ x - 1, foo, keep.mf = TRUE)
bar1(y ~ x - 1, foo, keep.mf = FALSE)
Then you may wonder: can we do the same for survfit()?
survfit() is a generic function; in your code you are really calling survfit.coxph(). This method does have a newdata argument. The documentation reads:
newdata:
a data frame with the same variable names as those that appear in the
‘coxph’ formula. ... ... Default is the mean of the covariates used in the
‘coxph’ fit.
So, let's try:
my_function1 <- function(myformula, mydata) {
fit <- coxph(myformula, mydata, method = "breslow")
survival:::survfit.coxph(fit, newdata = lapply(mydata, mean))
}
and we hope this works:
my_function1(Surv(start, stop, event) ~ rx + number, bladder2)
But:
Error in is.data.frame(data) (from #5) : object 'mydata' not found
1: my_function1(Surv(start, stop, event) ~ rx + number, bladder2)
2: #5: survival:::survfit.coxph(fit, lapply(mydata, mean))
3: stats::model.frame(object)
4: model.frame.coxph(object)
5: eval(temp, environment(formula$terms), parent.frame())
6: eval(expr, envir, enclos)
7: stats::model.frame(formula = Surv(start, stop, event) ~ rx + number, data =
8: model.frame.default(formula = Surv(start, stop, event) ~ rx + number, data
9: is.data.frame(data)
Note that although we pass in newdata, it is not used in the construction of the model frame:
3: stats::model.frame(object)
Only object, a copy of fitted model, is passed to model.frame.default().
This is very different from what happens in predict.lm(), predict.glm() and mgcv::predict.gam(). In these routines, newdata is passed to model.frame.default(). For example, predict.lm() contains:
m <- model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
I don't use the survival package, so I am not sure how newdata works there; this really needs an expert to explain.
I think it might be that if your formula
Surv(start, stop, event) ~ rx + number
is passed in as a parameter, it does not get created properly. Try putting
is.Surv(formula)
as the first line of your function. I suspect it won't work, in which case I would suggest using the apply family of functions.

Passing Argument to lm in R within Function

I would like to able to call lm within a function and specify the weights variable as an argument passed to the outside function that is then passed to lm. Below is a reproducible example where the call works if it is made to lm outside of a function, but produces the error message Error in eval(expr, envir, enclos) : object 'weightvar' not found when called from within a wrapper function.
olswrapper <- function(form, weightvar, df){
ols <- lm(formula(form), weights = weightvar, data = df)
}
df <- mtcars
ols <- lm(mpg ~ cyl + qsec, weights = gear, data = df)
summary(ols)
ols2 <- olswrapper(mpg ~ cyl + qsec, weightvar = gear, df = df)
#Produces error: "Error in eval(expr, envir, enclos) : object 'weightvar' not found"
Building on the comments, gear isn't defined globally. It works inside the stand-alone lm call as you specify the data you are using, so lm knows to take gear from df.
However, gear itself doesn't exist outside that stand-alone lm call. You can see this by evaluating gear at the top level:
> gear
Error: object 'gear' not found
You can pass the gear column into the function as df$gear:
weightvar <- df$gear
ols <- olswrapper(mpg ~ cyl + qsec, weightvar , df = df)
I know I'm late on this, but I believe the previous explanation is incomplete. Declaring weightvar <- df$gear and then passing it in to the function only works because you use weightvar as the name for your weight argument. This is just using weightvar as a global variable. That's why df$gear doesn't work directly. It also doesn't work if you use any name except weightvar.
The reason it doesn't work is that lm looks for variables in two places: the data argument (if specified) and the environment of your formula. In this case, your formula's environment is R_GlobalEnv (you can check this by running environment(form) or str(form) from inside olswrapper). Thus, lm will only look in df and in the global environment, not in the function's environment.
edit: In the lm documentation the description of the data argument says:
"an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called."
A quick workaround is to say environment(form) <- environment() to change your formula's environment. This won't cause any problems because the data in the formula is in the data frame you specify.
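A sketch of that workaround in base R. olswrapper_env() is a hypothetical variant of olswrapper(): after environment(form) <- environment(), lm() looks up weightvar in the function's frame, so the weights can be passed as a vector under any name (here w) and the fit matches the direct lm() call.

```r
df <- mtcars

# Hypothetical variant of olswrapper(): re-point the formula at the
# function's frame so that lm() can find `weightvar` there.
olswrapper_env <- function(form, weightvar, df) {
  environment(form) <- environment()
  lm(form, weights = weightvar, data = df)
}

# The weights can now be passed as a vector under any name...
w <- df$gear
ols_env <- olswrapper_env(mpg ~ cyl + qsec, weightvar = w, df = df)

# ...and the coefficients match the direct call.
ols_ref <- lm(mpg ~ cyl + qsec, weights = gear, data = df)
all.equal(coef(ols_env), coef(ols_ref))   # TRUE
```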
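Wrapping the call in eval(substitute(...)) inside the body of the function lets us use non-standard evaluation: substitute() replaces form, weightvar and df with the expressions the caller actually typed (for example gear), and eval() then runs the resulting lm() call in the function's frame, so lm() sees weights = gear just as in the direct call.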
df <- mtcars
olswrapper <- function(form, weightvar, df) {
  eval(substitute(ols <- lm(formula(form), weights = weightvar, data = df)))
  summary(ols)
}
olswrapper(mpg ~ cyl + qsec, weightvar = gear, df = df)
More here:
http://adv-r.had.co.nz/Computing-on-the-language.html
