How do I plot predictions from new data fit with gee, lme, glmer, and gamm4 in R?

I have fit my discrete count data using a variety of functions for comparison. I fit a GEE model using geepack, a linear mixed effect model on the log(count) using lme (nlme), a GLMM using glmer (lme4), and a GAMM using gamm4 (gamm4) in R.
I am interested in comparing these models and would like to plot the expected (predicted) values for a new set of data (predictor variables). My goal is to compare the predicted effects for each model under particular conditions (x variables). Of particular interest is the comparison between marginal (GEE) and conditional estimates.
I think my main problem might be getting the new data in the correct form with the correct labels and attributes and such. I am still very much an R novice and struggle with this stuff (no course on this at my university unfortunately).
I currently have fitted models gee1, lme1, lmer1, and gamm1
and can extract their fixed effect coefficients and standard errors without a problem. I also don't have a problem converting them from the log scale or estimating confidence intervals accounting for the random effects.
I also have my new dataframe newdat which has 365 observations of 23 variables (average environmental data for each day of the year).
I am stuck on how to predict new count estimates from this. I played around with the model.matrix function but couldn't get it to work. For example, I tried:
mm <- model.matrix(terms(glmm1), newdat)
# Error in model.frame.default(object, data, xlev = xlev) :
#   object is not a matrix
newdat$pcount <- mm %*% fixef(glmm1)
Any suggestions or good references would be greatly appreciated. Can anyone help with the error above?

Getting predictions for lme() and lmer() is documented on http://glmm.wikidot.com/faq
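For the error itself: model.matrix() evaluates every variable named in the model formula in newdat, including the response, so a missing or misnamed column (or a missing placeholder response column) is the usual culprit behind "object is not a matrix". As a minimal sketch of the more direct route, assuming glmm1 is the glmer fit and lme1 the lme fit from the question, and that newdat holds all the predictors:

library(lme4)

# Population-level predictions from the GLMM: re.form = NA drops the
# random effects; type = "response" back-transforms from the link scale.
newdat$pcount_glmm <- predict(glmm1, newdata = newdat, re.form = NA,
                              type = "response")

# Population-level predictions from the lme fit on log(count): level = 0
# gives fixed-effects-only predictions, exponentiated back to counts.
newdat$pcount_lme <- exp(predict(lme1, newdata = newdat, level = 0))

For confidence intervals that account for fixed-effect uncertainty, the model.matrix recipe from the FAQ can then be layered on top of these predictions.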

Related

Survival Curves For Cox PH Models. Checking My Understanding About Plotting Them

I'm using the book Applied Survival Analysis Using R by Moore to try to model some time-to-event data. The issue I'm running into is plotting the estimated survival curves from the Cox model, and because of this I'm wondering whether my understanding of the model is wrong. My data are simple: a time column t, an event indicator column i (1 for event, 0 for censored), and a predictor column p with 6 factor levels.
I believe I can plot estimated survival curves for a Cox model as shown below, but I don't understand how to use survfit() with base plotting, nor functions from survminer, to achieve the same end. Here is some generic code clarifying my question; I'll use the pharmacoSmoking data set to demonstrate the issue.
library(survival)
library(asaur)
t <- pharmacoSmoking$longestNoSmoke
i <- pharmacoSmoking$relapse
p <- pharmacoSmoking$levelSmoking
data <- data.frame(t, i, p)  # data.frame() keeps p a factor; cbind() would coerce it
model <- coxph(Surv(t, i) ~ p, data = data)
As I understand it, with the following code snippets, modeled after book examples, a baseline (cumulative) hazard at my reference factor level for p may be given from
base <- basehaz(model, centered = FALSE)
An estimate of the survival curve is given by
s <- exp(-base$hazard)
t <- base$time
plot(s ~ t, type = "l")
The survival curve associated with a different factor level may then be given by
beta_n <- model$coefficients  # only one coefficient in this case
s_n <- s^exp(beta_n)
lines(s_n ~ t)
where beta_n is the coefficient for the nth factor level from the Cox model. The code above gives what I think are estimated survival curves for heavy vs. light smokers in the pharmacoSmoking dataset.
Since that's a fair amount of code, I was looking to packages for a one-liner solution. I had a hard time with the documentation for survival (there weren't many examples in the docs) and also tried survminer. For the latter I've tried:
library(survminer)
ggadjustedcurves(model, variable = "p", data = data)
This gives me something different from my prior code, although it is similar. Is the method I used earlier incorrect, or is there a different methodology that accounts for the difference? The survminer code also fails on my real data (I get a 'cannot allocate vector of size ...' error, and my data is ~1M rows), which seems odd considering the base-plot approach works without a problem. This is the primary reason I am wondering whether I understand how to plot survival curves for my model.
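A hedged suggestion for the one-liner: survfit() from survival itself accepts a coxph fit plus newdata, and may be closer to your manual curves than ggadjustedcurves(), which (as I understand it) averages predicted curves over the observed data rather than evaluating them at fixed covariate values; that may explain the difference you see. A sketch, assuming model and data as defined above:

library(survival)

nd <- data.frame(p = levels(data$p))    # one row per factor level of p
fit <- survfit(model, newdata = nd)     # predicted survival curve per level
plot(fit, col = seq_len(nrow(nd)), xlab = "t", ylab = "S(t)")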

I get an error with functions of the nlme package in R

I am trying to fit a linear growth model (LGM) in R. As I understand it, the primary steps are to fit a null model with time as a predictor of my outcome variable Y, allowing for random effects, and a null model without random effects, then compare the two to see whether the random effect is strong enough to justify the model with a random intercept.
I managed to fit the model with a random intercept using the lmer function of the lme4 package, but I can't find a function in that package that lets me fit a model without a random intercept.
I have tried to fit models both with a random intercept (the lme function) and without (the gls function) in the nlme package, but neither has been working for me.
My original code was:
library(nlme)
LMModel <- lme(Y ~ Time, random = ~ Time | ID, data = dataset,
               method = "ML")
Running that, I got an error saying "missing values in object" (apparently referring to my Time variable). I therefore converted my dataset to a matrix with matr <- as.matrix(dataset) and added a missing-data argument to the call, which ended up being:
LMModel <- lme(Y ~ Time, random = ~ Time | ID, data = dataset,
               method = "ML", na.action = na.exclude(matr))
Running this, I get the error: could not find function "1"
I further tried to fit a model with no random effect with the gls function of nlme and got the exact same error.
I feel quite lost, as I can't seem to figure out what that function "1" means. Any ideas what might be happening here?
Thanks a lot in advance for the help!
Federico
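A hedged guess at the cause: na.action expects a function such as na.exclude or na.omit, passed unquoted and not called on the data, and handing it the evaluated object may be what produces the cryptic error. A sketch of the corrected calls, including the gls() null model for the likelihood-ratio comparison, assuming dataset, Y, Time, and ID as in the question:

library(nlme)

# na.action is passed as a function; lme() applies it to the data itself.
LMModel <- lme(Y ~ Time, random = ~ Time | ID, data = dataset,
               method = "ML", na.action = na.exclude)

# The same fixed effects with no random effects, for the comparison:
NullModel <- gls(Y ~ Time, data = dataset,
                 method = "ML", na.action = na.exclude)

# Likelihood-ratio test of whether the random effects are warranted:
anova(NullModel, LMModel)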

Moving from SPSS to R: Defining a two-intercept model in gls with random effect and repeated measure

I am new to R and am trying to reproduce results from my SPSS analyses, but seem to be missing something.
I am trying to run a linear mixed effects model using gls in the nlme package.
The SPSS syntax I am trying to reproduce is:
MIXED Satisfaction_A BY Role
/FIXED=Role | NOINT SSTYPE(3)
/METHOD=ML
/PRINT=SOLUTION TESTCOV
/RANDOM=Role | SUBJECT(focalid) COVTYPE(UNR)
/REPEATED=Role | SUBJECT(dyadid*focalid) COVTYPE(UNR).
Essentially, it is a two-intercept model with nested data where Focal ID is the level-2/nesting variable which contains 2 responses for Satisfaction, distinguished by Role.
The R code I have so far is:
gls(Satisfaction_A ~ Role - 1,  # two-intercept approach
    data = chlpairwise,
    correlation = corSymm(form = ~ 1 | focalid/dyadid),
    weights = varIdent(form = ~ 1 | Role),
    method = "ML",
    na.action = na.omit)
The regression coefficients are very similar to those from SPSS. But what am I missing in the code so that I can view the covariance parameters as in the SPSS output? [screenshot: SPSS Covariance Parameters]
Many thanks! I hope to keep learning so that I can eventually give back to this community for all of the help I have received. :)
You probably need the function getVarCov(), which returns the marginal covariance matrix from a fitted marginal model. It will also work if you fit a linear mixed-effects model using the function lme().
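For example, a sketch assuming the gls() call above was assigned to fit: getVarCov() returns the fitted marginal covariance matrix for one subject, and intervals() lists the estimated variance-structure parameters with confidence intervals, which is roughly what SPSS prints under Covariance Parameters.

fit <- gls(Satisfaction_A ~ Role - 1,
           data = chlpairwise,
           correlation = corSymm(form = ~ 1 | focalid/dyadid),
           weights = varIdent(form = ~ 1 | Role),
           method = "ML",
           na.action = na.omit)

getVarCov(fit)                      # marginal covariance matrix for one subject
intervals(fit, which = "var-cov")   # variance/correlation parameters with CIs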

Using stepAIC to make out of sample predictions

Just a quick question on using stepAIC to make predictions. I'm a beginner in R, so please pardon me if the solution is obvious. I tried searching around but couldn't really find what I was looking for.
I'm trying to predict the response variable after running stepwise AIC on a main model (the main model has all the explanatory variables). stepAIC returns a new model with a reduced number of variables. My question is how to do an out-of-sample prediction using the new, reduced model. In other words, how do I reduce the dataset so that, when I feed it into predict.lm, it only has the variables that were selected in the reduced model?
Here's my code below:
# Specify start and end rows of the first 5-year window
start_row <- 1
end_row <- 60

# Declare the matrix that will contain the predicted returns
predicted <- matrix(0, 179, 7)

y_var <- as.matrix(orig_data[start_row:end_row, 2:7])
x_var <- as.matrix(orig_data[start_row:end_row, 8:27])

# Regress on all factors, then select factors by stepwise AIC
initial_model <- lm(y_var[, 1] ~ x_var[, 1] + x_var[, 2] + x_var[, 3] +
                      x_var[, 4] + x_var[, 5] + x_var[, 6] + x_var[, 7] +
                      x_var[, 8] + x_var[, 9] + x_var[, 10] + x_var[, 11] +
                      x_var[, 12] + x_var[, 13] + x_var[, 14] + x_var[, 15] +
                      x_var[, 16] + x_var[, 17] + x_var[, 18] + x_var[, 19] +
                      x_var[, 20])
reduced_model <- stepAIC(initial_model, direction = "both")
reduced_coefs <- t(as.matrix(coef(reduced_model)))
x_input <- as.matrix(x_var[60, ])
Basically, how do I multiply the coefficients from the reduced model with only the corresponding explanatory variables in x_var (which has all the explanatory variables)?
Thanks a lot for your help!
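One way around this, sketched below with column positions taken from the question and all names hypothetical: fit the model from a data frame with named columns instead of matrix slices, so that predict() can match the selected variables in new data automatically and no manual coefficient multiplication is needed.

library(MASS)

# Column ranges follow the question; the response name y1 is an assumption.
train <- as.data.frame(orig_data)[start_row:end_row, c(2, 8:27)]
names(train)[1] <- "y1"

initial_model <- lm(y1 ~ ., data = train)
reduced_model <- stepAIC(initial_model, direction = "both")

# predict() keeps only the variables that survived the selection, so the
# full set of candidate columns can be passed as new data:
newdata <- as.data.frame(orig_data)[end_row + 1, 8:27, drop = FALSE]
predicted_y <- predict(reduced_model, newdata = newdata)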

GLM with autoregressive term to correct for serial correlation

I have a stationary time series to which I want to fit a linear model with an autoregressive term to correct for serial correlation, i.e. using the formula
A_t = c1*B_t + c2*C_t + u_t, where u_t = r*u_{t-1} + e_t
(u_t is an AR(1) term to correct for serial correlation in the error terms.)
Does anyone know what to use in R to model this?
Thanks
Karl
The GLMMarp package will fit these models. If you just want a linear model with Gaussian errors, you can do it with the arima() function where the covariates are specified via the xreg argument.
There are several ways to do this in R. Here are two examples using the "Seatbelts" time series dataset in the datasets package that comes with R.
The arima() function comes in the stats package that is included with R. The function takes an argument of the form order = c(p, d, q), where you can specify the orders of the autoregressive, integrated, and moving-average components. In your question, you suggest that you want an AR(1) model to correct for first-order autocorrelation in the errors, and that's it. We can do that with the following command:
arima(Seatbelts[, "drivers"], order = c(1, 0, 0),
      xreg = Seatbelts[, c("kms", "PetrolPrice", "law")])
The value for order specifies that we want an AR(1) model. The xreg component should be the series of other Xs we want to add as part of the regression. The output looks a little like the output of summary.lm() turned on its side.
An alternative that might be more familiar, given the way you've fit regression models before, is gls() in the nlme package. The following code turns the Seatbelts time series object into a data frame and then adds a new column (t) that is just a counter along the sorted series:
Seatbelts.df <- data.frame(Seatbelts)
Seatbelts.df$t <- 1:nrow(Seatbelts.df)
The two lines above only get the data into shape; since arima() is designed for time series, it can read time series objects directly. To fit the model with nlme you would then run:
library(nlme)
m <- gls(drivers ~ kms + PetrolPrice + law,
         data = Seatbelts.df,
         correlation = corARMA(p = 1, q = 0, form = ~ t))
summary(m)
The correlation argument is how you pass the ARMA correlation structure to gls(). The results won't be exactly the same because arima() estimates by maximum likelihood while gls() uses restricted maximum likelihood by default. If you add method = "ML" to the call to gls(), you will get estimates identical to those from arima() above.
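For instance, a small variation on the call above (a sketch) should reproduce the arima() coefficients:

m_ml <- gls(drivers ~ kms + PetrolPrice + law,
            data = Seatbelts.df,
            correlation = corARMA(p = 1, q = 0, form = ~ t),
            method = "ML")
summary(m_ml)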
What is your link function?
The way you describe it, this sounds like basic linear regression with autocorrelated errors. In that case, one option is to use lm() to get a consistent estimate of your coefficients and then use Newey-West HAC standard errors, as sketched below.
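A sketch of that route, reusing the Seatbelts.df data frame from the answer above (using the sandwich and lmtest packages, the standard tools for HAC standard errors):

library(sandwich)
library(lmtest)

fit <- lm(drivers ~ kms + PetrolPrice + law, data = Seatbelts.df)

# OLS point estimates, but with Newey-West HAC standard errors:
coeftest(fit, vcov = NeweyWest(fit))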
I'm not sure of the best answer for GLMs more generally.
