Converting Repeated Measures mixed model formula from SAS to R - r

There are several questions and posts about mixed models for more complex experimental designs, so I thought this more simple model would help other beginners in this process as well as I.
So, my question is I would like to formulate a repeated measures ancova in R from sas proc mixed procedure:
proc mixed data=df1;
FitStatistics=akaike
class GROUP person day;
model Y = GROUP X1 / solution alpha=.1 cl;
repeated / type=cs subject=person group=GROUP;
lsmeans GROUP;
run;
Here is the SAS output using the data created in R (below):
. Effect panel Estimate Error DF t Value Pr > |t| Alpha Lower Upper
Intercept -9.8693 251.04 7 -0.04 0.9697 0.1 -485.49 465.75
panel 1 -247.17 112.86 7 -2.19 0.0647 0.1 -460.99 -33.3510
panel 2 0 . . . . . . .
X1 20.4125 10.0228 7 2.04 0.0811 0.1 1.4235 39.4016
Below is how I formulated the model in R using 'nlme' package, but am not getting similar coefficient estimates:
## create reproducible example fake panel data set:
set.seed(94); subject.id = abs(round(rnorm(10)*10000,0))
set.seed(99); sds = rnorm(10,15,5);means = 1:10*runif(10,7,13);trends = runif(10,0.5,2.5)
this = NULL; set.seed(98)
for(i in 1:10) { this = c(this,rnorm(6, mean = means[i], sd = sds[i])*trends[i]*1:6)}
set.seed(97)
that = sort(rep(rnorm(10,mean = 20, sd = 3),6))
df1 = data.frame(day = rep(1:6,10), GROUP = c(rep('TEST',30),rep('CONTROL',30)),
Y = this,
X1 = that,
person = sort(rep(subject.id,6)))
## use package nlme
require(nlme)
## run repeated measures mixed model using compound symmetry covariance structure:
summary(lme(Y ~ GROUP + X1, random = ~ +1 | person,
correlation=corCompSymm(form=~day|person), na.action = na.exclude,
data = df1,method='REML'))
Now, the output from R, which I now realize is similar to the output from lm():
Value Std.Error DF t-value p-value
(Intercept) -626.1622 527.9890 50 -1.1859379 0.2413
GROUPTEST -101.3647 156.2940 7 -0.6485518 0.5373
X1 47.0919 22.6698 7 2.0772934 0.0764
I believe I'm close as to the specification, but not sure what piece I'm missing to make the results match (within reason..). Any help would be appreciated!
UPDATE: Using the code in the answer below, the R output becomes:
> summary(model2)
Scroll to bottom for the parameter estimates -- look! identical to SAS.
Linear mixed-effects model fit by REML
Data: df1
AIC BIC logLik
776.942 793.2864 -380.471
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUPCONTROL GROUPTEST Residual
StdDev: 184.692 14.56864 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
TEST CONTROL
1.000000 3.068837
Fixed effects: Y ~ GROUP + X1
Value Std.Error DF t-value p-value
(Intercept) -9.8706 251.04678 50 -0.0393178 0.9688
GROUPTEST -247.1712 112.85945 7 -2.1900795 0.0647
X1 20.4126 10.02292 7 2.0365914 0.0811

Please try below:
model1 <- lme(
Y ~ GROUP + X1,
random = ~ GROUP | person,
correlation = corCompSymm(form = ~ day | person),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model1)
I think random = ~ groupvar | subjvar option with R lme provides similar result of repeated / subject = subjvar group = groupvar option with SAS/MIXED in this case.
Edit:
SAS/MIXED
R (a revised model2)
model2 <- lme(
Y ~ GROUP + X1,
random = list(person = pdDiag(form = ~ GROUP - 1)),
correlation = corCompSymm(form = ~ day | person),
weights = varIdent(form = ~ 1 | GROUP),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model2)
So, I think these covariance structures are very similar (σg1 = τg2 + σ1).
Edit 2:
Covariate estimates (SAS/MIXED):
Variance person GROUP TEST 8789.23
CS person GROUP TEST 125.79
Variance person GROUP CONTROL 82775
CS person GROUP CONTROL 33297
So
TEST group diagonal element
= 125.79 + 8789.23
= 8915.02
CONTROL group diagonal element
= 33297 + 82775
= 116072
where diagonal element = σk1 + σk2.
Covariate estimates (R lme):
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUP1TEST GROUP2CONTROL Residual
StdDev: 14.56864 184.692 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
1TEST 2CONTROL
1.000000 3.068837
So
TEST group diagonal element
= 14.56864^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + 93.28885^2
= 8913.432
CONTROL group diagonal element
= 184.692^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + (3.068837 * 93.28885)^2
= 116070.5
where diagonal element = τg2 + σ1 + σg2.

Oooh, this is going to be a tricky one, and if it's even possible using standard nlme functions, is going to take some serious study of Pinheiro/Bates.
Before you spend the time doing that though, you should make absolutely sure that this is exact model you need. Perhaps there's something else that might fit the story of your data better. Or maybe there's something R can do more easily that is just as good, but not quite the same.
First, here's my take on what you're doing in SAS with this line:
repeated / type=cs subject=person group=GROUP;
This type=cs subject=person is inducing correlation between all the measurements on the same person, and that correlation is the same for all pairs of days. The group=GROUP is allowing the correlation for each group to be different.
In contrast, here's my take on what your R code is doing:
random = ~ +1 | person,
correlation=corCompSymm(form=~day|person)
This code is actually adding almost the same effect in two different ways; the random line is adding a random effect for each person, and the correlation line is inducing correlation between all the measurements on the same person. However, these two things are almost identical; if the correlation is positive, you get the exact same result by including either of them. I'm not sure what happens when you include both, but I do know that only one is necessary. Regardless, this code has the same correlation for all individuals, it's not allowing each group to have their own correlation.
To let each group have their own correlation, I think you have to build a more complicated correlation structure up out of two different pieces; I've never done this but I'm pretty sure I remember Pinheiro/Bates doing it.
You might consider instead adding a random effect for person and then letting the variance be different for the different groups with weights=varIdent(form=~1|group) (from memory, check my syntax, please). This won't quite be the same but tells a similar story. The story in SAS is that the measurements on some individuals are more correlated than the measurements on other individuals. Thinking about what that means, the measurements for individuals with higher correlation will be closer together than the measurements for individuals with lower correlation. In contrast, the story in R is that the variability of measurements within individuals varies; thinking about that, measurements with higher variability with have lower correlation. So they do tell similar stories, but come at it from opposite sides.
It is even possible (but I would be surprised) that these two models end up being different parameterizations of the same thing. My intuition is that the overall measurement variability will be different in some way. But even if they aren't the same thing, it would be worth writing out the parameterizations just to be sure you understand them and to make sure that they are appropriately describing the story of your data.

Related

lme specifying correlation structure for three-level model

I'm new to R and to multilevel modeling. I have a data set where I have a dependent variable y and predictor x, both of which are measured one time per day over a number of days within subjects. In addition, each subject is part of a twin pair. So in terms of my nesting structure, I have my measurements, nested within subject, nested within family (FamID). I expect my measurement values to be correlated over days within subjects, so I would like to specify an autocorrelation structure of order 1. Below is how I am specifying my model:
m1 <- lme(y ~ x,
random = list(~1 + Subject | FamID, ~1 + x | Subject),
data = dataset, method="ML",
correlation=corAR1(,form=~1|Subject),
na.action="na.omit")
However, I receive the error message,
incompatible formulas for groups in 'random' and 'correlation'
Might anyone be able to help me appropriately specify this model?
As #Roland says, the correlation and random effects formulas must match. See below ...
Simulate data:
dd <- expand.grid(
FamID = 1:3,
Subject = 1:2,
time = 1:10)
set.seed(101)
dd$x <- rnorm(nrow(dd))
dd$y <- rnorm(nrow(dd))
library(nlme)
m1 <- lme(y~x,
random = ~1|FamID/Subject,
data = dd,
method = "ML",
correlation = corAR1(form = ~1|FamID/Subject))
Notes:
1|FamID/Subject specifies "Subject nested within FamID", which sounds like what you described. Your current random effect specification list(~1 + Subject | FamID, ~1 + x | Subject) makes little sense to me: this would indicate
random effects of subject within family (i.e., separate variances for each subject, and an arbitrary correlation between subjects)
random slopes (effects of intercept and slope, arbitrarily correlated) within family
(the simpler 1|FamID/Subject specification does imply correlation between subjects, through the shared family effect; however, this correlation must be ≥ 0, unlike the 1 + Subject | FamID specification. The 1 + Subject | FamID` specification is also a little bit weird because it implies that the twins in a family are non-exchangeable, i.e. 'twin 1' and 'twin 2' would be specified in some way ...)
This is most likely overparameterized/unidentifiable (if you do want a random-slopes model allowing for the variation in the effects of x across subjects and/or families you can use random = ~1+x|FamID/Subject to estimate slopes at both levels — I checked, and this does still work with the correlation argument. I don't know if it's possible to specify random slopes at only one level (e.g. across subjects but not families) in lme ...
corAR1(form = ~1|FamID/Subject) seems as though it might specify two autocorrelation parameters (at the levels of both family and subject-within-family), but according to the output (below) only one is estimated.
(Note that the random effects estimated are vanishingly small because I used made-up data with no structure.)
Linear mixed-effects model fit by maximum likelihood
Data: dd
Log-likelihood: -81.77192
Fixed: y ~ x
(Intercept) x
0.08731064 -0.09266083
Random effects:
Formula: ~1 | FamID
(Intercept)
StdDev: 2.303609e-05
Formula: ~1 | Subject %in% FamID
(Intercept) Residual
StdDev: 2.414397e-06 0.9598456
Correlation Structure: AR(1)
Formula: ~1 | FamID/Subject
Parameter estimate(s):
Phi
-0.181599
Number of Observations: 60
Number of Groups:
FamID Subject %in% FamID
3 6

lme4: How to specify random slopes while constraining all correlations to 0?

Due to an interesting turn of events, I'm trying use the lme4 package in R to fit a model in which the random slopes are not allowed to correlate with each other or the random intercept. Effectively, I want to estimate the variance parameter for each random slope, but none of the correlations/covariances. From the reading I've done so far, I think what I want is effectively a diagonal variance/covariance structure for the random effects.
An answer to a similar question here provides a workaround to specify a model where slopes are correlated with intercepts, but not with each other. I also know the || syntax in lme4 makes slopes that are correlated with each other, but not with the intercepts. Neither of these seems to fully accomplish what I'm looking to do.
Borrowing the example from the earlier post, if my model is:
m1 <- lmer (Y ~ A + B + (1+A+B|Subject), data=mydata)
is there a way to specify the model such that I estimate variance parameters for A and B while constraining all three correlations to 0? I would like to achieve a result that looks something like this:
VarCorr(m1)
## Groups Name Std.Dev. Corr
## Subject (Intercept) 1.41450
## A 1.49374 0.000
## B 2.47895 0.000 0.000
## Residual 0.96617
I'd prefer a solution that could achieve this for an arbitrary number of random slopes. For example, if I were to add a random effect for a third variable C, there would be 6 correlation parameters to fix at 0 rather than 3. However, anything that could get me started in the right direction would be extremely helpful.
Edit:
On asking this question, I misunderstood what the || syntax does in lme4. Struck through the incorrect statement above to avoid misleading anyone in the future.
This is exactly what the double-bar notation does. However, note that the || in lme4 does not work as one might expect for factor variables. It does work 'properly' in glmmTMB, and the afex::mixed() function is a wrapper for [g]lmer which does implement a fully functional version of ||. (I have meant to import this into lme4 for years but just haven't gotten around to it yet ...)
simulated example
library(lme4)
set.seed(101)
dd <- data.frame(A = runif(500), B = runif(500),
Subject = factor(rep(1:25, 20)))
dd$Y <- simulate(~ A + B + (1 + A + B|Subject),
newdata = dd,
family = gaussian,
newparams = list(beta = rep(1,3), theta = rep(1,6), sigma = 1))[[1]]
solution
summary(m <- lmer (Y ~ A + B + (1+A+B||Subject), data=dd))
The correlations aren't listed because they are structurally absent (internally, the random effects term is expanded to (1|Subject) + (0 + A|Subject) + (0+B|Subject), which is also why the groups are listed as Subject, Subject.1, Subject.2).
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.8744 0.9351
Subject.1 A 2.0016 1.4148
Subject.2 B 2.8718 1.6946
Residual 0.9456 0.9724
Number of obs: 500, groups: Subject, 25

Calculate indirect effect of 1-1-1 (within-person, multilevel) mediation analyses

I have data from an Experience Sampling Study, which consists of 8140 observations nested in 106 participants. I want to test if there is a mediation, in which I also want to compare the predictors (X1= socialInteraction_tech, X2= socialInteraction_ftf, M = MPEE_int, Y= wellbeing). X1, X2, and M are person-mean centred in order to obtain the within-person effects. To account for the autocorrelation I have fit a model with an ARMA(2,1) structure. We control for time with the variable "obs".
This is the final model including all variables of interest:
fit_mainH1xmy <- lme(fixed = wellbeing ~ 1 + obs # Controls
+ MPEE_int_centred + socialInteraction_tech_centred + socialInteraction_ftf_centred,
random = ~ 1 + obs | ID, correlation = corARMA(form = ~ obs | ID, p = 2, q = 1),
data = file, method = "ML", na.action=na.exclude)
summary(fit_mainH1xmy)
The mediation is partial, as my predictor X still significantly predicts Y after adding M.
However, I can't find a way to calculate c'(cprime), the indirect effect.
I have found the mlma package, but it looks weird and requires me to do transformations to my data.
I have tried melting the data in a long format and using lmer() to fit the model (following https://quantdev.ssri.psu.edu/sites/qdev/files/ILD_Ch07_2017_Within-PersonMedationWithMLM.html), but lmer() does not let me take into account the moving average (MA-part of the ARMA(2,1) structure).
Does anyone know how I could now obtain the indirect effect?

Syntax for diagonal variance-covariance matrix for non-linear mixed effects model in nlme

I am analysing routinely collected substance use data during the first 12 months' of treatment in a large sample of outpatients attending drug and alcohol treatment services. I am interested in whether differing levels of methamphetamine use (no use, low use, and high use) at the outset of treatment predicts different levels after a year in treatment, but the data is very irregular, with different clients measured at different times and different numbers of times during their year of treatment.
The data for the high and low use group seem to suggest that drug use at outset reduces during the first 3 months of treatment and then asymptotes. Hence I thought I would try a non-linear exponential decay model.
I started with the following nonlinear generalised least squares model using the gnls() function in the nlme package:
fitExp <- gnls(outcome ~ C*exp(-k*yearsFromStart),
params = list(C ~ atsBase_fac, k ~ atsBase_fac),
data = dfNL,
start = list(C = c(nsC[1], lsC[1], hsC[1]),
k = c(nsC[2], lsC[2], hsC[2])),
weights = varExp(-0.8, form = ~ yearsFromStart),
control = gnlsControl(nlsTol = 0.1))
where outcome is number of days of drug use in the 28 days previous to measurement, atsBase_fac is a three-level categorical predictor indicating level of amphetamine use at baseline (noUse, lowUse, and highUse), yearsFromStart is a continuous predictor indicating time from start of treatment in years (baseline = 0, max - 1), C is a parameter indicating initial level of drug use, and k is the rate of decay in drug use. The starting values of C and k are taken from nls models estimating these parameters for each group. These are the results of that model
Generalized nonlinear least squares fit
Model: outcome ~ C * exp(-k * yearsFromStart)
Data: dfNL
AIC BIC logLik
27672.17 27725.29 -13828.08
Variance function:
Structure: Exponential of variance covariate
Formula: ~yearsFromStart
Parameter estimates:
expon
0.7927517
Coefficients:
Value Std.Error t-value p-value
C.(Intercept) 0.130410 0.0411728 3.16738 0.0015
C.atsBase_faclow 3.409828 0.1249553 27.28839 0.0000
C.atsBase_fachigh 20.574833 0.3122500 65.89218 0.0000
k.(Intercept) -1.667870 0.5841222 -2.85534 0.0043
k.atsBase_faclow 2.481850 0.6110666 4.06150 0.0000
k.atsBase_fachigh 9.485155 0.7175471 13.21886 0.0000
So it looks as if there are differences between groups in initial rate of drug use and in rate of reduction in drug use. I would like to go a step further and fit a nonlinear mixed effects model.I tried consulting Pinhiero and Bates' book accompanying the nlme package but the only models I could find that used irregular, sparse data like mine used a self-starting function, and my model does not do that.
I tried to adapt the gnls() model to nlme like so:
fitNLME <- nlme(model = outcome ~ C*exp(-k*yearsFromStart),
data = dfNL,
fixed = list(C ~ atsBase_fac, k ~ atsBase_fac),
random = pdDiag(yearsFromStart ~ id),
groups = ~ id,
start = list(fixed = c(nsC[1], lsC[1], hsC[1], nsC[2], lsC[2], hsC[2])),
weights = varExp(-0.8, form = ~ yearsFromStart),
control = nlmeControl(optim = "optimizer"))
bit I keep getting error message, I presume through errors in the syntax specifying the random effects.
Can anyone give me some tips on how the syntax for the random effects works in nlme?
The only dataset in Pinhiero and Bates that resembled mine used a diagonal variance-covariance matrix. Can anyone filled me in on the syntax of this nlme function, or suggest a better one?
p.s. I wish I could provide a reproducible example but coming up with synthetic data that re-creates the same errors is way beyond my skills.

Analyze longitudinal data with a mixed effects model in R

I try to analyze some simulated longitudinal data in R using a mixed-effects model (lme4 package).
Simulated data: 25 subjects have to perform 2 tasks at 5 consecutive time points.
#Simulate longitudinal data
N <- 25
t <- 5
x <- rep(1:t,N)
#task1
beta1 <- 4
e1 <- rnorm(N*t, mean = 0, sd = 1.5)
y1 <- 1 + x * beta1 + e1
#task2
beta2 <- 1.5
e2 <- rnorm(N*t, mean = 0, sd = 1)
y2 <- 1 + x * beta2 + e2
data1 <- data.frame(id=factor(rep(1:N, each=t)), day = x, y = y1, task=rep(c("task1"),length(y1)))
data2 <- data.frame(id=factor(rep(1:N, each=t)), day = x, y = y2, task=rep(c("task2"),length(y2)))
data <- rbind(data1, data2)
Question1: How to analyze how a subject learns each task?
library(lme4)
m1 <- lmer(y ~ day + (1 | id), data=data1)
summary(m1)
...
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.2757 0.3561 123.0000 3.582 0.000489 ***
day 3.9299 0.1074 123.0000 36.603 < 2e-16 ***
With ranef(m1) I get the random intercept for each subject, which I think reflects the baseline value for each subject at day = 1. But I don't understand how I can tell how an individual learns a task, or whether subjects differ in the way how they learn the task.
Question2: How can I analyze whether the way subjects learn differ between task1 and task2.
I expanded on your example to answer your questions briefly, but I can recommend reading chapter 15 of Snijders & Bosker (2012) or the book by Singer & Willet (2003) for a better explanation. Day is treated as a continuous variable in your model, seeing as you have panel data (i.e. everyone is measured at the same day) and day has no meaning apart from indicating the different measurement occasions, it may be better to treat day as a factor (i.e. use dummy variables).
However, for now I will continue with your example
Your first model (I think you want data instread of data1) gives a fixed linear slope (i.e. average slope, no difference in the tasks, no difference between individuals). The fixed intercept is the performance when day is 0, which has no meaning so you may want to consider centering the effect of day for a better interpretation (or indeed use dummies). The random effect gives the individual deviance from this intercept which has an estimated variance of 0.00 in your example so individuals hardly differ from each other in their starting position.
m1 <- lmer(y ~ day + (1 | id), data=data)
summary(m1)
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.00 0.000
Residual 18.54 4.306
Number of obs: 250, groups: id, 25
We can extend this model by adding an interaction with task. Meaning that the fixed slope is different for task1 and task2 which answers question 2 I believe (you can also use update() to update your model)
m2 <- lmer(y ~ day*task + (1|id), data = data)
summary(m2)
The effect of day in this model is the fixed slope of your reference category (task1) and the interaction is the difference between the slope of task1 and task2. The fixed effect of task is the difference in intercept.
model fit can be assessed with a deviance test, read Snijders & Boskers (2012) for an explanation of ML and REML estimates.
anova(m1,m2)
To add a random effect for the growth of individuals we can update the model again, which answers question 1
m3 <- lmer(y ~ day*task + (day|id), data = data)
summary(m3)
ranef(m3)
The random effects indicate the individual deviations in slope and intercept. A summary of the distribution of you random effects is included in the model summary (same as for m1).
Finally I think you could add a random effect on the day-task interaction to assess whether individuals differ in their performance growth on task1 and task2. But this depends very much on your data and the performance of the previous models.
m4 <- lmer(y ~ day*task + (day*task|id), data = data)
summary(m4)
ranef(m4)
Hope this helps. The books I recommended certainly should. Both provide excellent examples and explanation of theory (no R examples unfortunately). If you decide on a fixed occasion model (effect of day expressed by dummies) the nlme package provides excellent options to control the covariance structure of random effects. Good documentation of the package is provided by Pinheiro & Bates (2000).

Resources