glmer - predict with binomial data (cbind count data) - r

I am trying to predict values over time (Days in x axis) for a glmer model that was run on my binomial data. Total Alive and Total Dead are count data. This is my model, and the corresponding steps below.
full.model.dredge<-glmer(cbind(Total.Alive,Total.Dead)~(CO2.Treatment+Lime.Treatment+Day)^3+(Day|Container)+(1|index),
data=Survival.data,family="binomial")
We have accounted for overdispersion, as you can see in the code: (1|index).
We then use the dredge command to determine the best fitted models with the main effects (CO2.Treatment, Lime.Treatment, Day) and their corresponding interactions.
dredge.models<-dredge(full.model.dredge,trace=FALSE,rank="AICc")
Then made a workspace variable for them
my.dredge.models<-get.models(dredge.models)
We then conducted a model average to average the coefficients for the best fit models
silly<-model.avg(my.dredge.models,subset=delta<10)
But now I want to create a graph with Total Alive on the Y axis, Days on the X axis, and a fitted line based on the output of the model. I understand this is tricky because the model combines Total.Alive and Total.Dead into a two-column response (see cbind(Total.Alive,Total.Dead) in the model).
When I try to run a predict command I get the error
# 9: In UseMethod("predict") :
# no applicable method for 'predict' applied to an object of class "mer"

Most of your problem is that you're using a pre-1.0 version of lme4, which doesn't have the predict method implemented. (Updating would be easiest, but I believe that if you can't for some reason, there's a recipe at http://glmm.wikidot.com/faq for doing the predictions by hand by extracting the fixed-effect design matrix and the coefficients ...) There's actually not a problem with the predictions, which predict the log-odds (by default) or the probability (if type="response"); if you wanted to predict numbers, you'd have to multiply by N appropriately.
You didn't give one, but here's a reproducible (albeit somewhat trivial) example using the built-in cbpp data set (I do get some warning messages -- no non-missing arguments to max; returning -Inf -- but I think this may be due to the fact that there's only one non-trivial fixed-effect parameter in the model?)
library(lme4)
packageVersion("lme4") ## 1.1.4, but this should work as long as >1.0.0
library(MuMIn)
It's convenient for later use (with ggplot) to add a variable for the proportion:
cbpp <- transform(cbpp,prop=incidence/size)
Fit the model (you could also use glmer(prop~..., weights=size, ...))
gm0 <- glmer(cbind(incidence, size - incidence) ~ period+(1|herd),
family = binomial, data = cbpp)
dredge.models<-dredge(gm0,trace=FALSE,rank="AICc")
my.dredge.models<-get.models(dredge.models)
silly<-model.avg(my.dredge.models,subset=delta<10)
Prediction does work:
predict(silly,type="response")
Creating a plot:
library(ggplot2)
theme_set(theme_bw()) ## cosmetic
g0 <- ggplot(cbpp,aes(period,prop))+
geom_point(alpha=0.5,aes(size=size))
Set up a prediction frame:
predframe <- data.frame(period=levels(cbpp$period))
Predict at the population level (ReForm=NA -- this may have to be REForm=NA in lme4 1.0.5; current lme4 releases spell the argument re.form=NA and treat the older spellings as deprecated):
predframe$prop <- predict(gm0,newdata=predframe,type="response",ReForm=NA)
Add it to the graph:
g0 + geom_point(data=predframe,colour="red")+
geom_line(data=predframe,colour="red",aes(group=1))
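The "by hand" recipe and the "multiply by N" remark from the start of this answer can be sketched as follows. This is only an illustration on the same cbpp model, not the FAQ's exact code: N = 20 is a made-up herd size, and the last line uses re.form=NA, the argument name in current lme4, to check the manual calculation against predict().

```r
library(lme4)

# Same cbpp model as above
gm0 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             family = binomial, data = cbpp)

# One row per period, with the same factor levels as the fitted data
predframe <- data.frame(period = factor(levels(cbpp$period),
                                        levels = levels(cbpp$period)))

# By hand: fixed-effect design matrix times fixed-effect coefficients
X   <- model.matrix(~ period, data = predframe)
eta <- drop(X %*% fixef(gm0))    # population-level log-odds
predframe$prop <- plogis(eta)    # back-transform to a probability

# Expected number of cases in a group of N animals (N is hypothetical)
N <- 20
predframe$expected_cases <- N * predframe$prop

# Cross-check against predict() at the population level
check <- predict(gm0, newdata = predframe, type = "response", re.form = NA)
```

For the original survival model the same idea applies: the predicted probability of being alive times the number of individuals at risk gives the expected Total.Alive.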

Related

Obtaining predictions from a pooled imputation model

I want to implement a "combine then predict" approach for a logistic regression model in R. These are the steps that I have developed so far, using an illustrative example based on the pima data from the faraway package. Step 4 is where my issue occurs.
#-----------activate packages and download data-------------##
library(faraway)
library(mice)
library(margins)
data(pima)
Apply multiple imputation by chained equations using the mice package. For the sake of the example, I first randomly assigned missing values to the pima dataset using the ampute function from the same package. Twenty imputed datasets were generated by setting the "m" argument to 20.
#-------------------assign missing values to data-----------------#
result<-ampute(pima)
result<-result$amp
#-------------------multiple imputation by chained equation--------#
#generate 20 imputed datasets
newresult<-mice(result,m=20)
Run a logistic regression on each of the 20 imputed datasets. Inspecting convergence and comparing the distributions of the original and imputed data are skipped for the sake of the example. The "test" variable is the binary dependent variable.
#run a logistic regression on each of the 20 imputed datasets
model<-with(newresult,glm(test~pregnant+glucose+diastolic+triceps+age+bmi,family = binomial(link="logit")))
Combine the regression estimations from the 20 imputation models to create a single pooled imputation model.
#pooled regressions
summary(pool(model))
Generate predictions from the pooled imputation model using the prediction function from the margins package. This function generates predicted values with a variable fixed at a specific level (for factors) or value (for continuous variables). In this example, I could choose to generate new predicted probabilities, i.e. P(Y=1), while setting the pregnant variable (# of pregnancies) to 3. In other words, it would give me the distribution of the outcome in the counterfactual situation where all the observations are set to 3 for this variable. Normally, I would just pass my model to the x argument of the prediction function (as below), but in the case of a pooled imputation model with mice, the object class is mipo and not glm.
#-------------------marginal standardization--------#
prediction(model,at=list(pregnant=3))
This throws the following error:
Error in check_at_names(names(data), at) :
Unrecognized variable name in 'at': (1) <empty>p<empty>r<empty>e<empty>g<empty>n<empty>a<empty>n<empty>t<empty
I thought of two solutions:
a) changing the class object to make it fit prediction()'s requirements
b) extracting pooled imputation regression parameters and reconstruct it in a list that would fit prediction()'s requirements
However, I'm not sure how to achieve this and would enjoy any advice that could help me getting closer to obtaining predictions from a pooled imputation model in R.
You might be interested in knowing that the pima data set is a bit problematic (the Native Americans from whom the data was collected don't want it used for research any more ...)
In addition to @Vincent's comment about marginaleffects, I found this GitHub issue discussing mice support for the emmeans package:
library(emmeans)
emmeans(model, ~pregnant, at=list(pregnant=3))
marginaleffects works in a different way. (Warning, I haven't really looked at the results to make sure they make sense ...)
library(marginaleffects)
fit_reg <- function(dat) {
mod <- glm(test~pregnant+glucose+diastolic+
triceps+age+bmi,
data = dat, family = binomial)
out <- predictions(mod, newdata = datagrid(pregnant=3))
return(out)
}
dat_mice <- mice(pima, m = 20, printFlag = FALSE, seed = 1024) ## mice() takes its RNG seed via 'seed'
dat_mice <- complete(dat_mice, "all")
mod_imputation <- lapply(dat_mice, fit_reg)
mod_imputation <- pool(mod_imputation)

Error when model diagnostics on stratified variable with coxph (survival package) in R

I have a dataset where 3 groups have received exposure to different media. Each group is exposed to 1 of the 3 media. Therefore, my coxph model is stratified:
# My treatment variable is loaded as a factor.
fullModel <- coxph(Surv(time, status) ~ strata(treatment), data = d)
When I try to do model diagnostics I get this error:
test.assump <- cox.zph(fullModel)
Error in cox.zph(fullModel) :
there are no score residuals for a Null model
But, if I remove the strata() argument, I get to run diagnostics on the model:
          chisq df    p
treatment  1.29  2 0.52
GLOBAL     1.29  2 0.52
I've made this example to reproduce my error:
library(survival)   # coxph, Surv, strata, cox.zph
library(survminer)  # ggcoxzph, ggcoxdiagnostics
data <- list(time=c(4,3,1,1,2,2,3,2,4,1,3,4),
status=c(1,1,1,0,1,1,0,1,1,0,0,1),
treatment=c(0,0,0,0,1,1,1,1,2,2,2,2))
cox.test <- coxph(Surv(time, status) ~ strata(treatment), data = data)
test.coxas <- cox.zph(cox.test)
ggcoxzph(test.coxas)
ggcoxdiagnostics(test.coxas, type = "schoenfeld",
linear.predictions = F)
Should I do diagnostics without the strata argument? And then use the strata argument after so I can plot the different exposures in a ggsurvplot?
Where am I going wrong here?
Thank you in advance for helping me resolve this trouble.
I'm bracketing whether using strata() is the best modeling choice, given what I understand of your application, and focusing on the actual question you asked.
Schoenfeld residuals are used to diagnose proportional hazard violations in a Cox model's covariates. Your model specification has no covariates. Ergo, you have no PH violations to diagnose and potentially correct, which is why cox.zph is throwing a "null model" error, as in "this model only estimates the (Cox model's version of an) intercept term".
Put differently: Schoenfeld residuals are covariate-specific quantities, so if there are no covariates in the Cox model, there are no Schoenfelds to calculate. cox.zph's calculations involve the Schoenfeld residuals, hence the error.
Instead, you have a strata() term. Stratifying permits different groups to have a different baseline hazard rate (= the Cox's version of an intercept term, heuristically speaking). There are many reasons you might stratify, but one of them is to correct for possible PH violations—the very issue that's leading you to run cox.zph in the first place. If you keep stratifying on treatment, there are no PH-related model diagnostics for you to run.
(As an aside: for ggcoxdiagnostics in your MWE, you need to pass in the coxph object, not the cox.zph object.)
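To make the difference concrete, here is a minimal sketch. It uses the veteran data shipped with the survival package as a stand-in for your data, so the variable names (celltype, trt, karno) are not yours: a strata-only model estimates no coefficients, while a model with covariates gives cox.zph something to test.

```r
library(survival)

# Strata-only model: separate baseline hazards, but no coefficients at all
fit_strata <- coxph(Surv(time, status) ~ strata(celltype), data = veteran)
n_coef_strata <- length(coef(fit_strata))   # zero estimated coefficients

# Model with covariates: Schoenfeld residuals exist, so cox.zph() can run
fit_cov <- coxph(Surv(time, status) ~ trt + karno, data = veteran)
zph <- cox.zph(fit_cov)
print(zph)   # one row per covariate plus a GLOBAL row

# For survminer::ggcoxdiagnostics(), pass fit_cov (the coxph object),
# not zph (the cox.zph object).
```

If you stratify on treatment and later add other covariates, cox.zph will test those covariates within strata; treatment itself will still not appear in the output.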

R predict warning

Doing:
predictions <- predict(lm.sqrtFlatprices, interval='prediction', level = 0.68) ^ 2
I get:
predictions on current data refer to _future_ responses
Why does this warning exist, and how can I suppress it?
From ?predict.lm
The prediction intervals are for a single observation at each case in newdata (or by default, the data used for the fit) with error variance(s) pred.var. This can be a multiple of res.var, the estimated value of σ^2: the default is to assume that future observations have the same error variance as those used for fitting. If weights is supplied, the inverse of this is used as a scale factor. For a weighted fit, if the prediction is for the original data frame, weights defaults to the weights used for the model fit, with a warning since it might not be the intended result. If the fit was weighted and newdata is given, the default is to assume constant prediction variance, with a warning.
Essentially, R is making some assumptions in order to calculate the prediction limits (as opposed to the confidence limits of the fitted values) and wants you to be aware of the assumptions it is making. The warning presumes that the user has read the documentation at ?predict.lm.
If you are unconcerned with the assumptions and wish to suppress the warning, you may use
suppressWarnings(predict(lm.sqrtFlatprices, interval='prediction', level = 0.68) ^ 2)
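An alternative that avoids blanket suppression: as far as I can tell from the predict.lm source, the warning is raised only when newdata is missing, so passing the data explicitly states your intent and keeps other warnings visible. A small sketch, with the built-in cars data standing in for your model:

```r
# Stand-in model: square-root response, as in the question
fit <- lm(sqrt(dist) ~ speed, data = cars)

# Without newdata, predict() warns; capture the message to inspect it
msg <- tryCatch(
  predict(fit, interval = "prediction", level = 0.68),
  warning = function(w) conditionMessage(w)
)

# With newdata given explicitly, the same intervals come back silently
predictions <- predict(fit, newdata = cars,
                       interval = "prediction", level = 0.68)^2
head(predictions)
```

The interval values are the same either way; only the warning differs.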

Is there a way to extrapolate predicted data from lmer

I am using lmer to fit a multilevel polynomial regression model with several fixed effects (including subject-specific variables like age, short-term memory span, etc.) and two sets of random effects (Subject and Subject:Condition). Now I would like to predict data for a hypothetical subject with particular properties (age, short-term memory span, etc.). I fit the model (m) and created a new data frame (pred) that contains my hypothetical subject, but when I tried predict(m, pred) I got an error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "mer"
I know I could use the brute-force method of extracting fixed effects from my model and multiplying it all out, but is there a more elegant solution?
You can do this type of extrapolated prediction easily with the merTools package for R: http://www.github.com/jknowles/merTools
merTools includes a function called predictInterval which provides robust prediction capabilities for lmer and glmer fits. Specifically, you can use this function to predict extrapolated data, and to obtain prediction intervals that account for the variance in both the fixed and random effects, as well as the residual error of the model.
Here's a quick code example:
library(merTools)
m1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
predOut <- predictInterval(m1, newdata = sleepstudy, n.sims = 100)
# extrapolated data
extrapData <- sleepstudy[1:10,]
extrapData$Days <- 20
extrapPred <- predictInterval(m1, newdata = extrapData)
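If you only need point predictions, lme4 1.0 or later also has a built-in predict method that handles a hypothetical subject. A minimal sketch on the same sleepstudy model; the subject label "hypothetical" is made up, and re.form and allow.new.levels are the argument names in current lme4:

```r
library(lme4)

m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# A subject that was not in the fitted data, extrapolated out to day 20
new_subject <- data.frame(Days = c(0, 10, 20),
                          Subject = factor("hypothetical"))

# Population-level prediction: random effects are ignored entirely
pred_pop <- predict(m1, newdata = new_subject, re.form = NA)

# Conditional prediction: the unseen level gets a random effect of zero,
# so the result matches the population-level prediction
pred_new <- predict(m1, newdata = new_subject, allow.new.levels = TRUE)
```

Unlike predictInterval, this returns no uncertainty estimates.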

How do I plot predictions from new data fit with gee, lme, glmer, and gamm4 in R?

I have fit my discrete count data using a variety of functions for comparison. I fit a GEE model using geepack, a linear mixed effect model on the log(count) using lme (nlme), a GLMM using glmer (lme4), and a GAMM using gamm4 (gamm4) in R.
I am interested in comparing these models and would like to plot the expected (predicted) values for a new set of data (predictor variables). My goal is to compare the predicted effects for each model under particular conditions (x variables). Of particular interest is the comparison between marginal (GEE) and conditional estimates.
I think my main problem might be getting the new data in the correct form with the correct labels and attributes and such. I am still very much an R novice and struggle with this stuff (no course on this at my university unfortunately).
I currently have fitted models
gee1 lme1 lmer1 gamm1
and can extract their fixed effect coefficients and standard errors without a problem. I also don't have a problem converting them from the log scale or estimating confidence intervals accounting for the random effects.
I also have my new dataframe newdat which has 365 observations of 23 variables (average environmental data for each day of the year).
I am stuck on how to predict new count estimates from this. I played around with the model.matrix function but couldn't get it to work. For example, I tried:
mm = model.matrix(terms(glmm1), newdat) # Error in model.frame.default(object,
# data, xlev = xlev) : object is not a matrix
newdat$pcount = mm %*% fixef(glmm1)
Any suggestions or good references would be greatly appreciated. Can anyone help with the error above?
Getting predictions for lme() and lmer() is documented on http://glmm.wikidot.com/faq
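For the glmer fit specifically, the model.matrix approach can be made to work by dropping the response from the terms, so that newdat does not need a count column. The sketch below uses simulated data (g, x, and count are invented for illustration, not your variables) and follows the general idea of the FAQ recipe rather than its exact code; the intervals reflect fixed-effect uncertainty only.

```r
library(lme4)

# Simulated counts: 10 groups of 20 observations each (made-up data)
set.seed(1)
dat <- data.frame(g = factor(rep(1:10, each = 20)), x = runif(200))
group_effect <- rnorm(10, sd = 0.3)
dat$count <- rpois(200, exp(0.5 + 1.0 * dat$x + group_effect[dat$g]))

glmm1 <- glmer(count ~ x + (1 | g), family = poisson, data = dat)

# New predictor values; no response column needed
newdat <- data.frame(x = seq(0, 1, length.out = 5))

# Fixed-effect design matrix built from the response-free terms
mm  <- model.matrix(delete.response(terms(glmm1)), newdat)
eta <- drop(mm %*% fixef(glmm1))                       # log scale
se  <- sqrt(diag(mm %*% as.matrix(vcov(glmm1)) %*% t(mm)))

# Back-transform to the count scale with approximate 95% limits
newdat$pcount <- exp(eta)
newdat$lwr    <- exp(eta - 1.96 * se)
newdat$upr    <- exp(eta + 1.96 * se)
```

These are conditional predictions for a typical group (random effect of zero), so they are not directly comparable to marginal GEE estimates without further averaging over the random-effect distribution.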
