Logistic regression & bootstrap - R

I'm first trying to run a logistic regression using lrm from the rms package.
My model works fine with glm but not with lrm.
model1 <- lrm(Outcome30Days ~ ISS1 + ISS2 + as.factor(GCSgr) +
              as.factor(Gender) * as.factor(agegr),
              data = sub2, x = TRUE, y = TRUE, se.fit = TRUE)
If ISS1 and ISS2 are removed the model runs, but with these 2 variables it won't.
The error message is:
Unable to fit model using "lrm.fit"
I need to run it with lrm because the validate function (bootstrap validation in rms) apparently works only with lrm fits.
Any help would be appreciated.

lrm is less tolerant of correlation (near-collinearity) among independent variables than glm. If your model runs with glm, and runs with lrm once you remove some variables, this is probably the issue. Luckily, you can adjust the tolerance with the tol argument; by default tol=1e-7. Try changing it to tol=1e-9. The code would look like this:
model1 <- lrm(Outcome30Days ~ ISS1 + ISS2 + as.factor(GCSgr) +
              as.factor(Gender) * as.factor(agegr),
              data = sub2, x = TRUE, y = TRUE, se.fit = TRUE, tol = 1e-9)
This is better than messing with the penalty because changing the penalty will change your log-likelihood and could affect your results.
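Once the model fits, the bootstrap validation the question is ultimately after would look something like this (a minimal sketch; B = 200 is an arbitrary choice):

# bootstrap validation of the model's indexes (requires x=TRUE, y=TRUE in the lrm call)
validate(model1, method = "boot", B = 200)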

Related

How to test significant improvement of LRM model

Using Frank Harrell's rms package, I constructed a predictive model with the lrm function.
I want to test whether this model has significantly better predictive value for a binomial event than another lrm model.
I used different functions, like anova(model1, model2) or the pR2 function of the pscl library to compare the pseudo R^2, but none of them works with lrm-based models.
What is the best way to see whether my new model is significantly better than the earlier one?
Update: Here is an example (where I want to predict the chance of bone metastasis) to check whether size or stage (in addition to other variables) gives the best model:
library(rms)
getHdata(prostate)
ddd <- datadist(prostate)
options(datadist = "ddd")
mod1 <- lrm(as.factor(bm) ~ age + sz + rx, data = prostate, x = TRUE, y = TRUE)
mod2 <- lrm(as.factor(bm) ~ age + stage + rx, data = prostate, x = TRUE, y = TRUE)
Fundamentally, this question is about comparing two non-nested models.
If you fit your models using the glm function, you can use the vuong function in the pscl package.
To test the fit of two nested models, you can use the lrtest function from the rms package:
lrtest(mod1, mod2)
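For the non-nested comparison here, a minimal sketch (my addition): refit the question's two models with glm so that pscl::vuong accepts them.

library(pscl)
g1 <- glm(bm ~ age + sz + rx, data = prostate, family = binomial)
g2 <- glm(bm ~ age + stage + rx, data = prostate, family = binomial)
vuong(g1, g2)  # Vuong test for comparing the two non-nested models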

Cannot get adjusted means for glmer using lsmeans

I have a glm that I would like to get adjusted means for using lsmeans. The following code makes the model (and seems to be doing it correctly):
library(lmerTest)
data$group <- as.factor(data$grp)
data$site <- as.factor(data$site)
data$stimulus <- as.factor(data$stimulus)
data.acc1 = glmer(accuracy ~ site + grp*stimulus + (1|ID), data=data, family=binomial)
However, when I try to use any of the code below to get adjusted means for the model, I get the error
Error in lsmeansLT(model, test.effs = test.effs, ddf = ddf) :
The model is not linear mixed effects model.
lsmeans(data.acc1, "stimulus")
or
data.lsm <- lsmeans(data.acc1, accuracy ~ stimulus ~ grp)
pairs(data.lsm)
Any suggestions?
The problem is that you have created a generalised linear mixed model using glmer() (in this case a mixed logistic regression model), not a linear mixed model using lmer(). The lsmeans() function does not accept objects created by glmer() because they are not linear mixed models.
Answers in this post might help: I can't get lsmeans output in glmer
And this post might be useful if you want to understand/compute marginal effects for mixed GLMs: Is there a way of getting "marginal effects" from a `glmer` object
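One concrete option (my addition, not from the linked posts): the emmeans package, the successor to lsmeans, does handle glmer fits, so something along these lines should give adjusted means on the probability scale:

library(emmeans)
# estimated marginal (adjusted) means for stimulus, back-transformed from the logit scale
emmeans(data.acc1, ~ stimulus, type = "response")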

train() function and rate model (Poisson regression with offset) with caret

I fitted a rate model using glm() (Poisson link with offset), like
y ~ offset(log(x1)) + x2 + x3
(the response is y/x1 in this case).
Then I wanted to do cross-validation using the caret package, so I used the train() function with k-fold CV control. It turns out the two models I get are very different. It seems that train() can't handle the offset: when I change the variable within the offset to offset(log(log(x1))) or offset(log(sqrt(x1))), the models remain the same.
Has anyone had this kind of experience before, and how did you deal with it?
Thanks!
By the way, I want to save the predictions on each validation set; so far I only know that caret can do that, which is why I didn't choose cv.glm.
I cannot claim prior experience with this exact process, and I have not done any testing in the absence of a reproducible example. But I do have experience with moving offsets to the LHS of a glm Poisson regression call, so why not change the formula (and family) to:
glm( I(y/x1) ~ x2 + x3, family=quasipoisson, data= , ...)
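Spelled out a bit more (my addition; mydata and the column names are placeholders): weighting by the exposure x1 makes the rate formulation give the same point estimates as the offset formulation.

# response is the rate y/x1; the exposure x1 enters as a prior weight
fit_rate <- glm(I(y / x1) ~ x2 + x3,
                family = quasipoisson,
                weights = x1,
                data = mydata)
summary(fit_rate)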

Logistic regression with robust clustered standard errors in R

A newbie question: does anyone know how to run a logistic regression with clustered standard errors in R? In Stata it's just logit Y X1 X2 X3, vce(cluster Z), but unfortunately I haven't figured out how to do the same analysis in R. Thanks in advance!
You might want to look at the rms (regression modeling strategies) package. lrm is its logistic regression model function, and if fit is the name of your output, you'd have something like this:
fit <- lrm(disease ~ age + study + rcs(bmi, 3), x = TRUE, y = TRUE, data = dataf)
fit
robcov(fit, cluster = dataf$id)
bootcov(fit, cluster = dataf$id)
You have to specify x=TRUE, y=TRUE in the model statement. rcs indicates restricted cubic splines with 3 knots.
Another alternative would be to use the sandwich and lmtest packages as follows. Suppose that z is a column with the cluster indicators in your dataset dat. Then
# load libraries
library("sandwich")
library("lmtest")
# fit the logistic regression
fit = glm(y ~ x, data = dat, family = binomial)
# get results with clustered standard errors (of type HC0)
coeftest(fit, vcov. = vcovCL(fit, cluster = dat$z, type = "HC0"))
will do the job.
I have been banging my head against this problem for the past two days; then I magically found what appears to be a new package that seems destined for great things. For example, I am also running some cluster-robust Tobit models in my analysis, and this package has that functionality built in as well. Not to mention the syntax is much cleaner than in all the other solutions I've seen (we're talking near-Stata levels of clean).
So for your toy example, I'd run:
library(Zelig)
logit <- zelig(Y ~ X1 + X2 + X3, data = data, model = "logit", robust = TRUE, cluster = "Z")
Et voilà!
There is a command glm.cluster in the R package miceadds which seems to give the same results for logistic regression as Stata does with the option vce(cluster). See the documentation here.
In one of the examples on this page, the commands
mod2 <- miceadds::glm.cluster(data = dat, formula = highmath ~ hisei + female,
                              cluster = "idschool", family = "binomial")
summary(mod2)
give the same robust standard errors as the Stata command
logit highmath hisei female, vce(cluster idschool)
e.g. a standard error of 0.004038 for the variable hisei.

How to get coefficients and their confidence intervals in mixed effects models?

In lm and glm models, I use functions coef and confint to achieve the goal:
m = lm(resp ~ 0 + var1 + var1:var2) # var1 categorical, var2 continuous
coef(m)
confint(m)
Now I added a random effect to the model, i.e. I fit a mixed-effects model using the lmer function from the lme4 package. But then the coef and confint functions no longer work for me!
> mix1 = lmer(resp ~ 0 + var1 + var1:var2 + (1|var3))
# var1, var3 categorical, var2 continuous
> coef(mix1)
Error in coef(mix1) : unable to align random and fixed effects
> confint(mix1)
Error: $ operator not defined for this S4 class
I tried to google and use docs but with no result. Please point me in the right direction.
EDIT: I was also thinking whether this question fits more to https://stats.stackexchange.com/ but I consider it more technical than statistical, so I concluded it fits best here (SO)... what do you think?
Not sure when it was added, but confint() is now implemented in lme4.
For example, the following works:
library(lme4)
m = lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
confint(m)
There are two newer packages, lmerTest and lsmeans, that can calculate 95% confidence limits for lmer and glmer output; maybe you can look into those. coefplot2 can do it too, though (as Ben points out below) in a less sophisticated way: from the standard errors of the Wald statistics, as opposed to the Kenward-Roger and/or Satterthwaite df approximations used in lmerTest and lsmeans. It's just a shame that there are still no built-in plotting facilities in the lsmeans package (as there are in the effects package, which by the way also returns 95% confidence limits for lmer and glmer objects, but does so by refitting the model without any of the random factors, which is evidently not correct).
I suggest that you use good old lme (in the nlme package). It has intervals() for confidence intervals, and if you need confidence intervals for contrasts, there is a series of choices (estimable in gmodels, contrast in contrast, glht in multcomp).
Why p-values and confint are absent in lmer: see http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76742.html .
Assuming a normal approximation for the fixed effects (which confint would also use), we can obtain 95% confidence intervals as
estimate ± 1.96 * standard error.
The following does not apply to the variance components/random effects.
library("lme4")
mylm <- lmer(Reaction ~ Days + (Days|Subject), data =sleepstudy)
# standard error of coefficient
days_se <- sqrt(diag(vcov(mylm)))[2]
# estimated coefficient
days_coef <- fixef(mylm)[2]
upperCI <- days_coef + 1.96*days_se
lowerCI <- days_coef - 1.96*days_se
I'm going to add a bit here. If m is a fitted (g)lmer model (most of these work for lme too):
fixef(m) is the canonical way to extract coefficients from mixed models (this convention began with nlme and has carried over to lme4)
you can get the full coefficient table with coef(summary(m)); if you have loaded lmerTest before fitting the model, or convert the model after fitting (and then loading lmerTest) via coef(summary(as(m,"merModLmerTest"))), then the coefficient table will include p-values. (The coefficient table is a matrix; you can extract the columns via e.g. ctab[,"Estimate"], ctab[,"Pr(>|t|)"], or convert the matrix to a data frame and use $-indexing.)
As stated above you can get likelihood profile confidence intervals via confint(m); these may be computationally intensive. If you use confint(m, method="Wald") you'll get the standard +/- 1.96SE confidence intervals. (lme uses intervals(m) instead of confint().)
If you prefer to use broom.mixed:
tidy(m,effects="fixed") gives you a table with estimates, standard errors, etc.
tidy(as(m,"merModLmerTest"), effects="fixed") (or fitting with lmerTest in the first place) includes p-values
adding conf.int=TRUE gives (Wald) CIs
adding conf.method="profile" (along with conf.int=TRUE) gives likelihood profile CIs
You can also get confidence intervals by parametric bootstrap (method="boot"), which is considerably slower but more accurate in some circumstances; a consolidated sketch of these calls follows.
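Putting those pieces together (my own assembly of the options listed above; nsim = 200 is an arbitrary choice):

library(lme4)
library(broom.mixed)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
tidy(m, effects = "fixed", conf.int = TRUE)                           # Wald CIs
tidy(m, effects = "fixed", conf.int = TRUE, conf.method = "profile")  # profile CIs
confint(m, method = "boot", nsim = 200)                               # parametric bootstrap CIs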
To find the coefficients, you can simply use the summary function:
m <- lmer(resp ~ 0 + var1 + var1:var2 + (1 | var3))  # the mixed model from the question
m_summary <- summary(m)
To get all coefficients:
m_summary$coefficients
If you want the confidence interval, take the estimate plus or minus 1.96 times the standard error:
halfwidth <- 1.96 * m_summary$coefficients[, "Std. Error"]
CI <- cbind(lower = m_summary$coefficients[, "Estimate"] - halfwidth,
            upper = m_summary$coefficients[, "Estimate"] + halfwidth)
print(CI)
I'd suggest the tab_model() function from the sjPlot package as an alternative. Clean and readable output, ready for markdown. Reference here and examples here.
For those more visually inclined, plot_model() from the same package might come in handy too.
An alternative solution is the model_parameters() function from the parameters package.
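A minimal sketch of those two alternatives (my addition; m is a fitted lmer model as in the answers above):

library(sjPlot)
tab_model(m)         # HTML table of estimates with 95% CIs
plot_model(m)        # forest plot of the fixed-effect estimates

library(parameters)
model_parameters(m)  # coefficient table with CIs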
