How to test for significant improvement of an lrm model in R

Using Frank Harrell's rms package, I constructed a predictive model with the lrm function.
I want to test whether this model has significantly better predictive value for a binomial event than another (lrm) model.
I tried several functions, such as anova(model1, model2) and the pR2 function from the pscl library (to compare pseudo-R^2), but none of them works with lrm-based models.
What is the best way to see whether my new model is significantly better than the earlier one?
Update: Here is an example (where I want to predict the chance of bone metastasis) to check whether size or stage (in addition to the other variables) gives the better model:
library(rms)
getHdata(prostate)
ddd <- datadist(prostate)
options(datadist = "ddd")
mod1 <- lrm(as.factor(bm) ~ age + sz + rx, data = prostate, x = TRUE, y = TRUE)
mod2 <- lrm(as.factor(bm) ~ age + stage + rx, data = prostate, x = TRUE, y = TRUE)

Fundamentally, this question is about comparing two non-nested models.
If you fit your models using the glm function, you can use the vuong function in the pscl package.
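For instance, a minimal sketch using the example models above, refit with glm() so that vuong() will accept them (vuong() does not handle lrm objects; the complete-case subset is my own addition so that both models see the same rows, which the test requires):
library(pscl)
# Complete cases for all variables used in either model
pr <- na.omit(prostate[, c("bm", "age", "sz", "stage", "rx")])
glm1 <- glm(as.factor(bm) ~ age + sz + rx, data = pr, family = binomial)
glm2 <- glm(as.factor(bm) ~ age + stage + rx, data = pr, family = binomial)
vuong(glm1, glm2)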

To test the fit of two nested models, you can use the lrtest function from the rms package (note that mod1 and mod2 above are not nested in each other; lrtest applies when one model's terms are a subset of the other's):
lrtest(mod1, mod2)

Related

How to obtain R^2 for a robust mixed-effects model (rlmer command; robustlmm)?

I estimated a robust mixed-effects model with the rlmer command from the robustlmm package. Is there a way to obtain the marginal and conditional R^2 values?
I am just going to answer that myself. I could not find an R package, or rather a function, equivalent to e.g. r.squaredGLMM for lmerMod objects, but I found a quick workaround that works with rlmerMod objects. Basically, you just have to extract the variance components for the fixed effects, random effects, and residuals, and then manually calculate the marginal and conditional R^2 using the formulas provided by Nakagawa & Schielzeth (2013).
library(robustlmm)
library(insight)
library(lme4)
data(Dyestuff, package = "lme4")
# Random-intercept model, fit robustly
robust.model <- rlmer(Yield ~ 1 + (1 | Batch), data = Dyestuff)
# Extract the variance components
var.fix <- get_variance_fixed(robust.model)     # fixed effects
var.ran <- get_variance_random(robust.model)    # random effects
var.res <- get_variance_residual(robust.model)  # residual
# Marginal R^2: variance explained by the fixed effects alone
R2m <- var.fix / (var.fix + var.ran + var.res)
# Conditional R^2: variance explained by fixed and random effects together
R2c <- (var.fix + var.ran) / (var.fix + var.ran + var.res)
Literature:
Nakagawa, S. and Schielzeth, H. (2013). A general and simple method for obtaining R^2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4: 133-142. doi:10.1111/j.2041-210x.2012.00261.x

Compare the fit of Cox models with a cluster() element in R

I would like to compare the fits of Cox proportional-hazards models with a likelihood-ratio test (or equivalent). This usually works by comparing the fits with the anova() function. However, when the data are organized in clusters and therefore have to be specified that way, this does not work. See the example below (the lung data set is included in the survival package):
library(survival)
data(lung)
lung_cox <- coxph(Surv(time, event = status) ~ age + cluster(sex), data = lung)
lung_cox1 <- coxph(Surv(time, event = status) ~ 1 + cluster(sex), data = lung)
anova(lung_cox, lung_cox1)
The error I get from the anova() call is:
Error in anova.coxphlist(c(list(object), dotargs), test = test) :
Can't do anova tables with robust variances
Is there a way to work around this? I'm quite new to survival analysis in R, so maybe I'm just not aware of a package that provides this.
Many thanks!
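
One possible workaround, sketched here as a suggestion rather than a definitive answer: cluster() changes only the variance estimate (coxph switches to a robust sandwich variance), not the coefficients or the partial likelihood, so the likelihood-ratio statistic can be obtained from fits without the cluster() term. Keep in mind that the usual chi-squared reference for that statistic may be unreliable under within-cluster correlation, which is presumably why anova() refuses.
library(survival)
# Same coefficients and log partial likelihood as the clustered fits;
# only the variance estimate differs, so the LRT statistic is unchanged.
lung_nocl <- coxph(Surv(time, event = status) ~ age, data = lung)
lung_nocl0 <- coxph(Surv(time, event = status) ~ 1, data = lung)
anova(lung_nocl, lung_nocl0)
# Alternatively, a robust Wald test uses the cluster-robust variance directly:
summary(lung_cox)  # see the "robust se" column and its z test for age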

Cannot get adjusted means for glmer using lsmeans

I have a glmer model that I would like to get adjusted means for using lsmeans. The following code fits the model (and seems to be doing so correctly):
library(lmerTest)
data$group <- as.factor(data$grp)
data$site <- as.factor(data$site)
data$stimulus <- as.factor(data$stimulus)
data.acc1 = glmer(accuracy ~ site + grp*stimulus + (1|ID), data=data, family=binomial)
However, when I try to use any of the code below to get adjusted means for the model, I get the error
Error in lsmeansLT(model, test.effs = test.effs, ddf = ddf) :
The model is not linear mixed effects model.
lsmeans(data.acc1, "stimulus")
or
data.lsm <- lsmeans(data.acc1, accuracy ~ stimulus ~ grp)
pairs(data.lsm)
Any suggestions?
The problem is that you have created a generalised linear mixed model using glmer() (in this case a mixed-effects logistic regression model), not a linear mixed model using lmer(). The lsmeans() function does not accept objects created by glmer() because they are not linear mixed models.
Answers in this post might help: I can't get lsmeans output in glmer
And this post might be useful if you want to understand/compute marginal effects for mixed GLMs: Is there a way of getting "marginal effects" from a `glmer` object
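If switching packages is an option, here is a minimal sketch using emmeans, the successor to lsmeans, which does accept glmerMod objects (model and factor names are taken from the question):
library(emmeans)
# Adjusted means for stimulus, back-transformed to the probability scale
emm <- emmeans(data.acc1, ~ stimulus, type = "response")
pairs(emm)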

R Prediction on a Linear Regression Model

I'm sure this is something that can be done, just not sure how!
I have a dataset of around 500 rows (CSV) showing footballers' match stats (e.g. passes, shots on target, etc.). I have some of their salaries (around 10) and I'm trying to predict the remaining salaries using a linear regression equation.
In the below, if y is salary, is there a way in R to essentially autopopulate what the rest of the salaries might be, based on the ten salaries I do have?
lm(y ~ x1 + x2 + x3)
Any help would be much appreciated.
This is what the predict function does.
Note that you don't need to call predict.lm explicitly. Because the result of a call to lm is an object with class "lm", R "knows" to use predict.lm when you call predict on it.
E.g.:
lm1 <- lm(y ~ x1 + x2 + x3)
y.fitted <- predict(lm1)
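Note that predict(lm1) only returns fitted values for the rows used in the fit. To fill in the missing salaries, pass the unlabelled rows as newdata. A sketch, assuming a data frame players (a hypothetical name) whose salary column is NA for the players to be predicted:
# Fit on the ~10 rows with known salaries, then predict the rest
lm1 <- lm(salary ~ x1 + x2 + x3, data = players[!is.na(players$salary), ])
players$salary_hat <- predict(lm1, newdata = players)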
You should also be able to test the predictive accuracy of your model using cross-validation with the cv.lm function in the DAAG library. With this function the data are split into training and test sets, so a model fit on the training data can be evaluated on held-out data.
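For completeness, a sketch of that cross-validation call (argument names as in DAAG; obs is a hypothetical data frame holding only the rows with known salaries):
library(DAAG)
cv.lm(data = obs, form.lm = salary ~ x1 + x2 + x3, m = 3)  # 3-fold CV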

Do I need to set refit=FALSE when testing for random effects in lmer() models with anova()?

I am currently testing whether I should include certain random effects in my lmer model or not. I use the anova() function for that. My procedure so far is to fit the model with a call to lmer() with REML=TRUE (the default option), and then call anova() on the two models, where one of them includes the random effect to be tested and the other does not. However, it is well known that anova() refits the models with ML; in newer versions of anova() you can prevent this by setting refit=FALSE. To test for random effects, should I set refit=FALSE in my call to anova() or not? (If I do set refit=FALSE, the p-values tend to be lower. Are the p-values anti-conservative when I set refit=FALSE?)
Method 1:
mod0_reml <- lmer(x ~ y + z + (1 | w), data=dat)
mod1_reml <- lmer(x ~ y + z + (y | w), data=dat)
anova(mod0_reml, mod1_reml)
This will result in anova() refitting the models with ML instead of REML. (Newer versions of anova() also print a message about this.)
Method 2:
mod0_reml <- lmer(x ~ y + z + (1 | w), data=dat)
mod1_reml <- lmer(x ~ y + z + (y | w), data=dat)
anova(mod0_reml, mod1_reml, refit=FALSE)
This will result in anova() performing its calculations on the original models, i.e. with REML=TRUE.
Which of the two methods is correct in order to test whether I should include a random effect or not?
Thanks for any help
In general I would say that it would be appropriate to use refit=FALSE in this case, but let's go ahead and try a simulation experiment.
First fit a model without a random slope to the sleepstudy data set, then simulate data from this model:
library(lme4)
mod0 <- lmer(Reaction ~ Days + (1|Subject), data=sleepstudy)
## also fit the full model for later use
mod1 <- lmer(Reaction ~ Days + (Days|Subject), data=sleepstudy)
set.seed(101)
simdat <- simulate(mod0, 1000)
Now refit the null data with the full and the reduced model, and save the distribution of p-values generated by anova() with and without refit=FALSE. This is essentially a parametric bootstrap test of the null hypothesis; we want to see if it has the appropriate characteristics (i.e., uniform distribution of p-values).
sumfun <- function(x) {
    m0 <- refit(mod0, x)
    m1 <- refit(mod1, x)
    a_refit <- suppressMessages(anova(m0, m1)["m1", "Pr(>Chisq)"])
    a_no_refit <- anova(m0, m1, refit = FALSE)["m1", "Pr(>Chisq)"]
    c(refit = a_refit, no_refit = a_no_refit)
}
I like plyr::laply for its convenience, although you could just as easily use a for loop or one of the other *apply approaches.
library(plyr)
pdist <- laply(simdat, sumfun, .progress = "text")
library(ggplot2); theme_set(theme_bw())
library(reshape2)
ggplot(melt(pdist), aes(x = value, fill = Var2)) +
    geom_histogram(aes(y = ..density..),
                   alpha = 0.5, position = "identity", binwidth = 0.02) +
    geom_hline(yintercept = 1, lty = 2)
ggsave("nullhist.png",height=4,width=5)
Type I error rate for alpha=0.05:
colMeans(pdist<0.05)
## refit no_refit
## 0.021 0.026
You can see that in this case the two procedures give practically the same answer, and both are strongly conservative, for well-known reasons having to do with the fact that the null value of the hypothesis test lies on the boundary of its feasible space. For the specific case of testing a single simple random effect, halving the p-value gives an appropriate answer (see Pinheiro and Bates 2000, among others). That actually appears to give reasonable answers here too, although it is not really justified, because here we are dropping two random-effects parameters (the random effect of slope and the correlation between the slope and intercept random effects):
colMeans(pdist/2<0.05)
## refit no_refit
## 0.051 0.055
Other points:
You might be able to do a similar exercise with the PBmodcomp function from the pbkrtest package.
The RLRsim package is designed precisely for fast randomization (parametric bootstrap) tests of null hypotheses about random-effects terms, but it doesn't appear to work in this slightly more complex situation.
See the relevant GLMM FAQ section for similar information, including arguments for why you might not want to test the significance of random effects at all ...
For extra credit, you could redo the parametric bootstrap runs using the deviance (-2 log-likelihood) differences rather than the p-values as output, and check whether the results conform to a mixture of a chi^2_0 (point mass at 0) and a chi^2_n distribution (where n is probably 2, though I wouldn't be sure for this geometry); a rough sketch follows below.
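Here is a rough sketch of that extra-credit check (my own addition, reusing mod0, mod1, and simdat from above): collect ML deviance differences instead of p-values, estimate the weight of the point mass at zero, and compare the nonzero part against chi^2_2 quantiles; the PBmodcomp route mentioned above is included at the end.
devfun <- function(x) {
    ## refit both models to the simulated response, then switch to ML
    ## so the log-likelihoods are comparable
    m0 <- refitML(refit(mod0, x))
    m1 <- refitML(refit(mod1, x))
    2 * as.numeric(logLik(m1) - logLik(m0))
}
devdist <- laply(simdat, devfun, .progress = "text")
mean(devdist < 1e-6)  ## estimated weight of the chi^2_0 (point-mass) component
## Q-Q plot of the nonzero part against chi^2_2
qqplot(qchisq(ppoints(sum(devdist >= 1e-6)), df = 2),
       devdist[devdist >= 1e-6])
abline(0, 1, lty = 2)
## the PBmodcomp route:
library(pbkrtest)
PBmodcomp(mod1, mod0, nsim = 500)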
