How do you fit a linear mixed model with an AR(1) random effects correlation structure in R?

I am trying to use R to rerun someone else's project, so we need to reproduce some SAS macros in R.
Here is a very basic question:
m1.nlme <- lme(log.bp.dia ~ M25.9to9.ma5iqr + temp.c.9to9.ma4iqr + o3.ma5iqr +
                 sea_spring + sea_summer + sea_fall + BMI + male + age_ini,
               data = barbara.1.clean, random = ~ 1 | study_id)
The original model in SAS uses an AR(1) [first-order autoregressive] covariance structure for the within-person errors, and I am not sure how to specify this in R.
Also, where can I find an index of the different covariance structures, like unstructured?
Thanks

I don't know what you mean by "index" for different models, but to specify an AR(1) covariance structure for the residuals, you can add correlation = corAR1() to your lme call.
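For example, with the model from the question, a minimal sketch might look like the following (this assumes the rows within each study_id are already in chronological order, since corAR1(form = ~ 1 | study_id) uses the row order within each group):
library(nlme)
m1.nlme <- lme(log.bp.dia ~ M25.9to9.ma5iqr + temp.c.9to9.ma4iqr + o3.ma5iqr +
                 sea_spring + sea_summer + sea_fall + BMI + male + age_ini,
               data = barbara.1.clean,
               random = ~ 1 | study_id,                      # G-side: random intercept per person
               correlation = corAR1(form = ~ 1 | study_id))  # R-side: AR(1) residuals within person
If the data contain an explicit time index, corAR1(form = ~ time | study_id) would use it instead of the row order (time here is a hypothetical variable name).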

Say the correlation at lag $1$ is $r$, where $-1 < r < 1$ for a stationary $AR(1)$ model. The correlation at lag $k \geq 1$ is then $r^k$. Multiplying the resulting correlation matrix by the variance of $X_t$ gives the autocovariance matrix.
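As a small illustration (the numbers are made up), the AR(1) correlation and autocovariance matrices for $n$ equally spaced observations can be built directly in R:
r <- 0.6                          # assumed lag-1 correlation
sigma2 <- 2                       # assumed Var(X_t)
n <- 5                            # number of equally spaced observations
R <- r^abs(outer(1:n, 1:n, "-"))  # entry (i, j) is r^|i-j|
Sigma <- sigma2 * R               # autocovariance matrix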

Related

Restricted Cubic Spline output in R rms package after cph

I am developing a Cox regression model in R.
The model I am currently using is as follows:
fh <- cph(S ~ rcs(MPV,4) + rcs(age,3) + BMI + smoking + hyperten + gender +
            rcs(FVCPP,3) + TLcoPP, x=TRUE, y=TRUE, surv=TRUE, time.inc=2*52)
If I then want to look at this with
print(fh, latex = TRUE)
I get 3 coefficients/SEs/Wald statistics etc. for MPV (MPV, MPV' and MPV'') and 2 for age (age, age').
Could someone please explain what these outputs are? I believe they relate to the restricted cubic splines I have added.
When you write rcs(MPV,4), you define the number of knots to use in the spline; in this case 4. Similarly, rcs(age,3) defines a spline with 3 knots. Due to identifiability constraints, 1 knot from each spline is subtracted out. You can think of this as defining an intercept for each spline. So rcs(age,3) is a linear combination of 2 basis functions and an intercept, while rcs(MPV,4) is a linear combination of 3 basis functions and an intercept, i.e.,
$$\mathrm{rcs}(\mathrm{age}, 3) = \alpha_{\mathrm{age}} + \beta_{\mathrm{age}}\,\mathrm{age} + \beta_{\mathrm{age}'}\,\mathrm{age}'$$
and
$$\mathrm{rcs}(\mathrm{MPV}, 4) = \alpha_{\mathrm{MPV}} + \beta_{\mathrm{MPV}}\,\mathrm{MPV} + \beta_{\mathrm{MPV}'}\,\mathrm{MPV}' + \beta_{\mathrm{MPV}''}\,\mathrm{MPV}''.$$
In the notation above, what you get out from the print statement are the regression coefficients $\beta$, with corresponding standard errors, p-values etc. The intercepts $\alpha_{\mathrm{age}}$ and $\alpha_{\mathrm{MPV}}$ are typically set to zero, but they are important, because without them the model-fitting routine would have no idea where on the y-axis to anchor the splines.
As a final note, you might actually be more interested in the output of summary(fh).
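If it helps, here is a hedged sketch of how to inspect those spline terms (it assumes the rms datadist has been set up, and mydata is a stand-in name for your data frame):
library(rms)
dd <- datadist(mydata); options(datadist = "dd")  # rms needs this for summary()
summary(fh)  # effect (hazard-ratio) estimates over default inter-quartile-range contrasts
anova(fh)    # Wald tests per predictor, including separate tests of the nonlinear spline terms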

R equivalent to random residual by subject in SAS

I can code this problem in SAS with residual as a random effect (I believe this is an R-side random intercept by fish):
proc glimmix data=one method=mmpl;
class fish;
model increment = age growth_year age*growth_year;
random residual / subject=fish;
run;
Here is the same analysis with an AR(1) covariance structure:
proc glimmix data=one method=mmpl;
class fish;
model increment = age growth_year age*growth_year;
random residual / subject=fish type=ar(1);
run;
Here is my attempt in R to reproduce the first model, which doesn't work:
model = lmer(increment ~ age + growth_year + age*growth_year
             + (resid()|fish), data = SR_data)
Please help. I'm open to lmer, glmer (with a gamma instead of a normal distribution), lme, or any other package that I am unaware of.
The lme4 package doesn't allow R-side models, but nlme does. If you want correlation within fish without random effects of fish (i.e. R-side effects only, without any G-side effects), then I think you want to use gls: here's an example using the Orthodont data from the nlme package:
library("nlme")
gls(distance~age*Sex, correlation=corAR1(form=~1|Subject), data=Orthodont)
If you want to allow variation in the baseline value/intercept by group (both G- and R-side), then you'd use:
lme(distance~age*Sex, random = ~1|Subject,
correlation=corAR1(form=~1|Subject), data=Orthodont)
If you want variation in the baseline but not correlated residuals within subject (G-side only):
lme(distance~age*Sex, random=~1|Subject, data=Orthodont)
or
library(lme4)
lmer(distance~age*Sex + (1|Subject), data=Orthodont)
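Translating back to the fish data from the question (a sketch only; it assumes SR_data has columns increment, age, growth_year and fish, with rows in time order within each fish):
library(nlme)
## R-side only, AR(1) within fish (analogue of random residual / subject=fish type=ar(1)):
gls(increment ~ age * growth_year,
    correlation = corAR1(form = ~ 1 | fish), data = SR_data)
## G-side random intercept for fish plus AR(1) residual correlation within fish:
lme(increment ~ age * growth_year, random = ~ 1 | fish,
    correlation = corAR1(form = ~ 1 | fish), data = SR_data)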

How to get confidence interval for hypothesis test of non-linear multiple parameters

I am trying to do something that seems very simple, yet I cannot find any good advice out there. I would like to get the confidence interval for a non-linear combination of two coefficients in a regression model. I can use linearHypothesis() to conduct an F-test and get the p-value for a linear combination. The code I ran for that part is:
reg4 <- lm(bpsys ~ current_tobac + male + wtlb + age, data=NAMCS2010)
linearHypothesis(reg4, "current_tobac + male = 0")
I can use glht() from the multcomp package to get the confidence interval for a linear combination of parameters:
confcm <- summary(glht(reg4, linfct = c("current_tobac + male = 0")))
confint(confcm)
But I'm not sure what to do for a non-linear combination like (summary(reg4)$coefficients[2]) / (summary(reg4)$coefficients[4]).
Any advice?
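One possible approach (a sketch, not a definitive answer) is the delta method via deltaMethod() in the car package, the same package linearHypothesis() comes from. Assuming coefficients[2] and coefficients[4] above correspond to current_tobac and wtlb in the model, it would look like:
library(car)
deltaMethod(reg4, "current_tobac / wtlb")  # estimate, approximate (delta-method) SE and CI for the ratio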

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality,
random = ~1|participant/looks/personality)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I look at the coefficients of these models, I notice that they only produce random intercepts for each participant. I was trying to then create a model that produces both random intercepts and slopes. I can't seem to get the syntax correct for either function to do this. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+ (1|participant/looks/personality)
Using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level.
It's not clear which continuous variable you want to use to define your slopes: if you have a continuous variable x and groups g, then (x|g), or equivalently (1+x|g), will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y~x+(x|g) ...)
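As a generic illustration (using the sleepstudy data shipped with lme4 rather than the speed-dating data), a random-intercept-and-slope model looks like:
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)          # lme4 syntax
library(nlme)
fm2 <- lme(Reaction ~ Days, random = ~ Days | Subject, data = sleepstudy)   # equivalent nlme syntax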
update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting will be confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+
(1|participant/looks),
data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.

Jackknife in logistic regression

I'm interested in applying a jackknife analysis in order to quantify the uncertainty of the coefficients estimated by my logistic regression. I'm using glm(family='binomial') because my outcome variable is coded 0-1.
My dataset has 76000 obs, and I'm using 7 independent variables plus an offset. The idea is to split the data into, say, 5 random subsets and then obtain the 7 estimated parameters by dropping one subset at a time from the dataset. Then I can estimate the uncertainty of the parameters.
I understand the procedure but I'm unable to do it in R.
This is the model that I'm fitting:
glm(f_ocur ~ altitud + UTM_X + UTM_Y + j_sin + j_cos + temp_res + pp +
offset(log(1/off)), data = mydata, family = 'binomial')
Does anyone have an idea of how can I make this possible?
Jackknifing a logistic regression model is computationally inefficient, but an easy (if time-intensive) approach would be like this:
Formula <- f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp+offset(log(1/off))
coefs <- sapply(1:nrow(mydata), function(i)
coef(glm(Formula, data=mydata[-i, ], family='binomial'))
)
This is your matrix of leave-one-out coefficient estimates; its covariance (with the usual jackknife scaling) estimates the covariance matrix of the parameter estimates.
A significant speed improvement could be had by using glm's workhorse function, glm.fit. You can go even further by linearizing the model: use one-step estimation, limiting the Newton-Raphson algorithm to a single iteration (e.g. with glm.control(maxit = 1)); jackknife SEs for one-step estimators are still robust, unbiased, the whole bit...
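For the grouped (delete-a-group) jackknife the question actually describes, a minimal sketch might look like this (the 5-way split and the variable names are taken from the question; the random group assignment itself is made up here):
set.seed(1)
k <- 5
grp <- sample(rep(1:k, length.out = nrow(mydata)))   # random assignment to 5 subsets
coefs <- sapply(1:k, function(g)
  coef(glm(Formula, data = mydata[grp != g, ], family = "binomial"))
)
## delete-a-group jackknife variance and SE for each coefficient
jk_var <- (k - 1) / k * rowSums((coefs - rowMeans(coefs))^2)
jk_se <- sqrt(jk_var)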
