I am interested in estimating a mixed effect model with two random components (I am sorry for the somewhat unprecise notation. I am somewhat new to these kind of models). Finally, I also want also the standard errors of the variances of the random components. That is why I am somewhat boudn to using the package lme. The reason is that I found this description on how to calculate those standard errors and also interesting, the standard error for function of these variances link.
I believe I know how to use the package lmer. I am finally interested in model2. For the model1, both command yield the same estimates. But model2 with lme yields different results than model2 with lmer from the lme4 package. Could you help me to get around how to set up the random components for lme? This would be much appreciated. Thanks. Please find attached my MWE.
Best
Daniel
#### load all packages #####
loadpackage <- function(x){
for( i in x ){
# require returns TRUE invisibly if it was able to load package
if( ! require( i , character.only = TRUE ) ){
# If package was not able to be loaded then re-install
install.packages( i , dependencies = TRUE )
}
# Load package (after installing)
library( i , character.only = TRUE )
}
}
# Then try/install packages...
loadpackage( c("nlme", "msm", "lmeInfo", "lme4"))
alcohol1 <- read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/alcohol1_pp.txt", header=T, sep=",")
attach(alcohol1)
id <- as.factor(id)
age <- as.factor(age)
model1.lmer <-lmer(alcuse ~ 1 + peer + (1|id))
summary(model1.lmer)
model2.lmer <-lmer(alcuse ~ 1 + peer + (1|id) + (1|age))
summary(model2.lmer)
model1.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id, method ="REML")
summary(model1.lme)
model2.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id + 1|age, method ="REML")
Edit (15/09/2021):
Estimating the model as follows end then returning the estimates via nlme::VarCorr gives me different results. While the estimates seem to be in the ball park, it is as they are switched across components.
model2a.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id/age, method ="REML")
summary(model2a.lme)
nlme::VarCorr(model2a.lme)
Variance StdDev
id = pdLogChol(1)
(Intercept) 0.38390274 0.6195989
age = pdLogChol(1)
(Intercept) 0.47892113 0.6920413
Residual 0.08282585 0.2877948
EDIT (16/09/2021):
Since Bob pushed me to think more about my model, I want to give some additional information. Please know that the data I use in the MWE do not match my true data. I just used it for illustrative purposes since I can not upload myy true data. I have a household panel with income, demographic informations and parent indicators.
I am interested in intergenerational mobility. Sibling correlations of permanent income are one industry standard. At the very least, contemporanous observations are very bad proxies of permanent income. Due to transitory shocks, i.e., classical measurement error, those estimates are most certainly attenuated. For this reason, we exploit the longitudinal dimension of our data.
For sibling correlations, this amounts to hypothesising that the income process is as follows:
$$Y_{ijt} = \beta X_{ijt} + \epsilon_{ijt}.$$
With Y being income from individual i from family j in year t. X comprises age and survey year indicators to account for life-cycle effects and macroeconmic conditions in survey years. Epsilon is a compund term comprising a random individual and family component as well as a transitory component (measurement error and short lived shocks). It looks as follows:
$$\epsilon_{ijt} = \alpha_i + \gamma_j + \eta_{ijt}.$$
The variance of income is then:
$$\sigma^2_\epsilon = \sigma^2_\alpha + \sigma^2\gamma + \sigma^2\eta.$$
The quantitiy we are interested in is
$$\rho = \frac(\sigma^2\gamma}{\sigma^2_\alpha + \sigma^2\gamma},$$
which reflects the share of shared family (and other characteristics) among siblings of the variation in permanent income.
B.t.w.: The struggle is simply because I want to have a standard errors for all estimates and for \rho.
This is an example of crossed vs nested random effects. (Note that the example you refer to is fitting a different kind of model, a random-slopes model rather than a model with two different grouping variables ...)
If you try with(alcohol1, table(age,id)) you can see that every id is associated with every possible age (14, 15, 16). Or subset(alcohol1, id==1) for example:
id age coa male age_14 alcuse peer cpeer ccoa
1 1 14 1 0 0 1.732051 1.264911 0.2469111 0.549
2 1 15 1 0 1 2.000000 1.264911 0.2469111 0.549
3 1 16 1 0 2 2.000000 1.264911 0.2469111 0.549
There are three possible models you could fit for a model with random effects of age(indexed by i) and id (indexed by j)
crossed ((1|age) + (1|id)): Y_{ij} = beta0 + beta1*peer + eps1_i + eps2_j +epsr_{ij}; alcohol use varies among individuals and, independently, across ages (this model won't work very well because there are only three distinct ages in the data set, more levels are usually needed)
id nested within age ((1|age/id) = (1|age) + (1|age:id)): Y_{ij} = beta0 + beta1*peer + eps1_i + eps2_{ij} + epsr_{ij}; alcohol use varies across ages, and varies across individuals within ages (see note above about number of levels).
age nested within id ((1|id/age) = (1|id) + (1|age:id)): Y_{ij} = beta0 + beta1*peer + eps1_j + eps2_{ij} + epsr_{ij}; alcohol use varies across individuals, and varies across ages within individuals
Here eps1_i, eps2_{ij}, and epsr_{ij} are normal deviates; epsr is the residual error term.
The latter two models actually don't make sense in this case; because there is only one observation per age/id combination, the nested variance (eps2) is completely confounded with the residual variance (epsr). lme doesn't notice this; if you try to fit one of the nested models in lmer it will give an error that
number of levels of each grouping factor must be < number of observations (problems: id:age)
(Although if you try to compute confidence intervals based on model1.lme you'll get an error "cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance", which is a hint that something is wrong.)
You could restate this problem as saying that the residual variation, and the variation among ages within individuals, are jointly unidentifiable (can't be separated from each other, statistically).
The updated answer here shows how to get the standard errors of the variance components from an lmer model, so you shouldn't be stuck with lme (but you should think carefully about which model you're really trying to fit ...)
The GLMM FAQ might also be useful.
More generally, the standard error of
rho = (V_gamma)/(V_alpha + V_gamma)
will be hard to compute accurately, because this is a nonlinear function of the model parameters. You can apply the delta method, but the most reliable approach would be to use parametric bootstrapping: if you have a fitted model m, then something like this should work:
var_ratio <- function(m) {
v <- as.data.frame(sapply(VarCorr(m), as.numeric))
return(v$family/(v$family + v$id))
}
confint(m, method="boot", FUN =var_ratio)
You should specify random effects in lme by using / not +
By lmer
model2.lmer <-lmer(alcuse ~ 1 + peer + (1|id) + (1|age), data = alcohol1)
summary(model2.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: alcuse ~ 1 + peer + (1 | id) + (1 | age)
Data: alcohol1
REML criterion at convergence: 651.3
Scaled residuals:
Min 1Q Median 3Q Max
-2.0228 -0.5310 -0.1329 0.5854 3.1545
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.08078 0.2842
age (Intercept) 0.30313 0.5506
Residual 0.56175 0.7495
Number of obs: 246, groups: id, 82; age, 82
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.3039 0.1438 2.113
peer 0.6074 0.1151 5.276
Correlation of Fixed Effects:
(Intr)
peer -0.814
By lme
model2.lme <- lme(alcuse ~ 1+ peer, data = alcohol1, random = ~ 1 |id/age, method ="REML")
summary(model2.lme)
Linear mixed-effects model fit by REML
Data: alcohol1
AIC BIC logLik
661.3109 678.7967 -325.6554
Random effects:
Formula: ~1 | id
(Intercept)
StdDev: 0.4381206
Formula: ~1 | age %in% id
(Intercept) Residual
StdDev: 0.4381203 0.7494988
Fixed effects: alcuse ~ 1 + peer
Value Std.Error DF t-value p-value
(Intercept) 0.3038946 0.1438333 164 2.112825 0.0361
peer 0.6073948 0.1151228 80 5.276060 0.0000
Correlation:
(Intr)
peer -0.814
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.0227793 -0.5309669 -0.1329302 0.5853768 3.1544873
Number of Observations: 246
Number of Groups:
id age %in% id
82 82
Okay, finally. Just to sketch my confidential data: I have a panel of individuals. The data includes siblings, identified via mnr. income is earnings, wavey survey year, age age factors. female a factor for gender, pid is the factor identifying the individual.
m1 <- lmer(income ~ age + wavey + female + (1|pid) + (1 | mnr),
data = panel)
vv <- vcov(m1, full = TRUE)
covvar <- vv[58:60, 58:60]
covvar
3 x 3 Matrix of class "dgeMatrix"
cov_pid.(Intercept) cov_mnr.(Intercept) residual
[1,] 2.6528679 -1.4624588 -0.4077576
[2,] -1.4624588 3.1015001 -0.0597926
[3,] -0.4077576 -0.0597926 1.1634680
mean <- as.data.frame(VarCorr(m1))$vcov
mean
[1] 17.92341 16.86084 56.77185
deltamethod(~ x2/(x1+x2), mean, covvar, ses =TRUE)
[1] 0.04242089
The last scalar should be what I interprete as the shared background of the siblings of permanent income.
Thanks to #Ben Bolker who pointed me into this direction.
Related
I'm new to R and to multilevel modeling. I have a data set where I have a dependent variable y and predictor x, both of which are measured one time per day over a number of days within subjects. In addition, each subject is part of a twin pair. So in terms of my nesting structure, I have my measurements, nested within subject, nested within family (FamID). I expect my measurement values to be correlated over days within subjects, so I would like to specify an autocorrelation structure of order 1. Below is how I am specifying my model:
m1 <- lme(y ~ x,
random = list(~1 + Subject | FamID, ~1 + x | Subject),
data = dataset, method="ML",
correlation=corAR1(,form=~1|Subject),
na.action="na.omit")
However, I receive the error message,
incompatible formulas for groups in 'random' and 'correlation'
Might anyone be able to help me appropriately specify this model?
As #Roland says, the correlation and random effects formulas must match. See below ...
Simulate data:
dd <- expand.grid(
FamID = 1:3,
Subject = 1:2,
time = 1:10)
set.seed(101)
dd$x <- rnorm(nrow(dd))
dd$y <- rnorm(nrow(dd))
library(nlme)
m1 <- lme(y~x,
random = ~1|FamID/Subject,
data = dd,
method = "ML",
correlation = corAR1(form = ~1|FamID/Subject))
Notes:
1|FamID/Subject specifies "Subject nested within FamID", which sounds like what you described. Your current random effect specification list(~1 + Subject | FamID, ~1 + x | Subject) makes little sense to me: this would indicate
random effects of subject within family (i.e., separate variances for each subject, and an arbitrary correlation between subjects)
random slopes (effects of intercept and slope, arbitrarily correlated) within family
(the simpler 1|FamID/Subject specification does imply correlation between subjects, through the shared family effect; however, this correlation must be ≥ 0, unlike the 1 + Subject | FamID specification. The 1 + Subject | FamID` specification is also a little bit weird because it implies that the twins in a family are non-exchangeable, i.e. 'twin 1' and 'twin 2' would be specified in some way ...)
This is most likely overparameterized/unidentifiable (if you do want a random-slopes model allowing for the variation in the effects of x across subjects and/or families you can use random = ~1+x|FamID/Subject to estimate slopes at both levels — I checked, and this does still work with the correlation argument. I don't know if it's possible to specify random slopes at only one level (e.g. across subjects but not families) in lme ...
corAR1(form = ~1|FamID/Subject) seems as though it might specify two autocorrelation parameters (at the levels of both family and subject-within-family), but according to the output (below) only one is estimated.
(Note that the random effects estimated are vanishingly small because I used made-up data with no structure.)
Linear mixed-effects model fit by maximum likelihood
Data: dd
Log-likelihood: -81.77192
Fixed: y ~ x
(Intercept) x
0.08731064 -0.09266083
Random effects:
Formula: ~1 | FamID
(Intercept)
StdDev: 2.303609e-05
Formula: ~1 | Subject %in% FamID
(Intercept) Residual
StdDev: 2.414397e-06 0.9598456
Correlation Structure: AR(1)
Formula: ~1 | FamID/Subject
Parameter estimate(s):
Phi
-0.181599
Number of Observations: 60
Number of Groups:
FamID Subject %in% FamID
3 6
Due to an interesting turn of events, I'm trying use the lme4 package in R to fit a model in which the random slopes are not allowed to correlate with each other or the random intercept. Effectively, I want to estimate the variance parameter for each random slope, but none of the correlations/covariances. From the reading I've done so far, I think what I want is effectively a diagonal variance/covariance structure for the random effects.
An answer to a similar question here provides a workaround to specify a model where slopes are correlated with intercepts, but not with each other. I also know the || syntax in lme4 makes slopes that are correlated with each other, but not with the intercepts. Neither of these seems to fully accomplish what I'm looking to do.
Borrowing the example from the earlier post, if my model is:
m1 <- lmer (Y ~ A + B + (1+A+B|Subject), data=mydata)
is there a way to specify the model such that I estimate variance parameters for A and B while constraining all three correlations to 0? I would like to achieve a result that looks something like this:
VarCorr(m1)
## Groups Name Std.Dev. Corr
## Subject (Intercept) 1.41450
## A 1.49374 0.000
## B 2.47895 0.000 0.000
## Residual 0.96617
I'd prefer a solution that could achieve this for an arbitrary number of random slopes. For example, if I were to add a random effect for a third variable C, there would be 6 correlation parameters to fix at 0 rather than 3. However, anything that could get me started in the right direction would be extremely helpful.
Edit:
On asking this question, I misunderstood what the || syntax does in lme4. Struck through the incorrect statement above to avoid misleading anyone in the future.
This is exactly what the double-bar notation does. However, note that the || in lme4 does not work as one might expect for factor variables. It does work 'properly' in glmmTMB, and the afex::mixed() function is a wrapper for [g]lmer which does implement a fully functional version of ||. (I have meant to import this into lme4 for years but just haven't gotten around to it yet ...)
simulated example
library(lme4)
set.seed(101)
dd <- data.frame(A = runif(500), B = runif(500),
Subject = factor(rep(1:25, 20)))
dd$Y <- simulate(~ A + B + (1 + A + B|Subject),
newdata = dd,
family = gaussian,
newparams = list(beta = rep(1,3), theta = rep(1,6), sigma = 1))[[1]]
solution
summary(m <- lmer (Y ~ A + B + (1+A+B||Subject), data=dd))
The correlations aren't listed because they are structurally absent (internally, the random effects term is expanded to (1|Subject) + (0 + A|Subject) + (0+B|Subject), which is also why the groups are listed as Subject, Subject.1, Subject.2).
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.8744 0.9351
Subject.1 A 2.0016 1.4148
Subject.2 B 2.8718 1.6946
Residual 0.9456 0.9724
Number of obs: 500, groups: Subject, 25
I try to analyze some simulated longitudinal data in R using a mixed-effects model (lme4 package).
Simulated data: 25 subjects have to perform 2 tasks at 5 consecutive time points.
#Simulate longitudinal data
N <- 25
t <- 5
x <- rep(1:t,N)
#task1
beta1 <- 4
e1 <- rnorm(N*t, mean = 0, sd = 1.5)
y1 <- 1 + x * beta1 + e1
#task2
beta2 <- 1.5
e2 <- rnorm(N*t, mean = 0, sd = 1)
y2 <- 1 + x * beta2 + e2
data1 <- data.frame(id=factor(rep(1:N, each=t)), day = x, y = y1, task=rep(c("task1"),length(y1)))
data2 <- data.frame(id=factor(rep(1:N, each=t)), day = x, y = y2, task=rep(c("task2"),length(y2)))
data <- rbind(data1, data2)
Question1: How to analyze how a subject learns each task?
library(lme4)
m1 <- lmer(y ~ day + (1 | id), data=data1)
summary(m1)
...
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.2757 0.3561 123.0000 3.582 0.000489 ***
day 3.9299 0.1074 123.0000 36.603 < 2e-16 ***
With ranef(m1) I get the random intercept for each subject, which I think reflects the baseline value for each subject at day = 1. But I don't understand how I can tell how an individual learns a task, or whether subjects differ in the way how they learn the task.
Question2: How can I analyze whether the way subjects learn differ between task1 and task2.
I expanded on your example to answer your questions briefly, but I can recommend reading chapter 15 of Snijders & Bosker (2012) or the book by Singer & Willet (2003) for a better explanation. Day is treated as a continuous variable in your model, seeing as you have panel data (i.e. everyone is measured at the same day) and day has no meaning apart from indicating the different measurement occasions, it may be better to treat day as a factor (i.e. use dummy variables).
However, for now I will continue with your example
Your first model (I think you want data instread of data1) gives a fixed linear slope (i.e. average slope, no difference in the tasks, no difference between individuals). The fixed intercept is the performance when day is 0, which has no meaning so you may want to consider centering the effect of day for a better interpretation (or indeed use dummies). The random effect gives the individual deviance from this intercept which has an estimated variance of 0.00 in your example so individuals hardly differ from each other in their starting position.
m1 <- lmer(y ~ day + (1 | id), data=data)
summary(m1)
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.00 0.000
Residual 18.54 4.306
Number of obs: 250, groups: id, 25
We can extend this model by adding an interaction with task. Meaning that the fixed slope is different for task1 and task2 which answers question 2 I believe (you can also use update() to update your model)
m2 <- lmer(y ~ day*task + (1|id), data = data)
summary(m2)
The effect of day in this model is the fixed slope of your reference category (task1) and the interaction is the difference between the slope of task1 and task2. The fixed effect of task is the difference in intercept.
model fit can be assessed with a deviance test, read Snijders & Boskers (2012) for an explanation of ML and REML estimates.
anova(m1,m2)
To add a random effect for the growth of individuals we can update the model again, which answers question 1
m3 <- lmer(y ~ day*task + (day|id), data = data)
summary(m3)
ranef(m3)
The random effects indicate the individual deviations in slope and intercept. A summary of the distribution of you random effects is included in the model summary (same as for m1).
Finally I think you could add a random effect on the day-task interaction to assess whether individuals differ in their performance growth on task1 and task2. But this depends very much on your data and the performance of the previous models.
m4 <- lmer(y ~ day*task + (day*task|id), data = data)
summary(m4)
ranef(m4)
Hope this helps. The books I recommended certainly should. Both provide excellent examples and explanation of theory (no R examples unfortunately). If you decide on a fixed occasion model (effect of day expressed by dummies) the nlme package provides excellent options to control the covariance structure of random effects. Good documentation of the package is provided by Pinheiro & Bates (2000).
There are several questions and posts about mixed models for more complex experimental designs, so I thought this more simple model would help other beginners in this process as well as I.
So, my question is I would like to formulate a repeated measures ancova in R from sas proc mixed procedure:
proc mixed data=df1;
FitStatistics=akaike
class GROUP person day;
model Y = GROUP X1 / solution alpha=.1 cl;
repeated / type=cs subject=person group=GROUP;
lsmeans GROUP;
run;
Here is the SAS output using the data created in R (below):
. Effect panel Estimate Error DF t Value Pr > |t| Alpha Lower Upper
Intercept -9.8693 251.04 7 -0.04 0.9697 0.1 -485.49 465.75
panel 1 -247.17 112.86 7 -2.19 0.0647 0.1 -460.99 -33.3510
panel 2 0 . . . . . . .
X1 20.4125 10.0228 7 2.04 0.0811 0.1 1.4235 39.4016
Below is how I formulated the model in R using 'nlme' package, but am not getting similar coefficient estimates:
## create reproducible example fake panel data set:
set.seed(94); subject.id = abs(round(rnorm(10)*10000,0))
set.seed(99); sds = rnorm(10,15,5);means = 1:10*runif(10,7,13);trends = runif(10,0.5,2.5)
this = NULL; set.seed(98)
for(i in 1:10) { this = c(this,rnorm(6, mean = means[i], sd = sds[i])*trends[i]*1:6)}
set.seed(97)
that = sort(rep(rnorm(10,mean = 20, sd = 3),6))
df1 = data.frame(day = rep(1:6,10), GROUP = c(rep('TEST',30),rep('CONTROL',30)),
Y = this,
X1 = that,
person = sort(rep(subject.id,6)))
## use package nlme
require(nlme)
## run repeated measures mixed model using compound symmetry covariance structure:
summary(lme(Y ~ GROUP + X1, random = ~ +1 | person,
correlation=corCompSymm(form=~day|person), na.action = na.exclude,
data = df1,method='REML'))
Now, the output from R, which I now realize is similar to the output from lm():
Value Std.Error DF t-value p-value
(Intercept) -626.1622 527.9890 50 -1.1859379 0.2413
GROUPTEST -101.3647 156.2940 7 -0.6485518 0.5373
X1 47.0919 22.6698 7 2.0772934 0.0764
I believe I'm close as to the specification, but not sure what piece I'm missing to make the results match (within reason..). Any help would be appreciated!
UPDATE: Using the code in the answer below, the R output becomes:
> summary(model2)
Scroll to bottom for the parameter estimates -- look! identical to SAS.
Linear mixed-effects model fit by REML
Data: df1
AIC BIC logLik
776.942 793.2864 -380.471
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUPCONTROL GROUPTEST Residual
StdDev: 184.692 14.56864 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
TEST CONTROL
1.000000 3.068837
Fixed effects: Y ~ GROUP + X1
Value Std.Error DF t-value p-value
(Intercept) -9.8706 251.04678 50 -0.0393178 0.9688
GROUPTEST -247.1712 112.85945 7 -2.1900795 0.0647
X1 20.4126 10.02292 7 2.0365914 0.0811
Please try below:
model1 <- lme(
Y ~ GROUP + X1,
random = ~ GROUP | person,
correlation = corCompSymm(form = ~ day | person),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model1)
I think random = ~ groupvar | subjvar option with R lme provides similar result of repeated / subject = subjvar group = groupvar option with SAS/MIXED in this case.
Edit:
SAS/MIXED
R (a revised model2)
model2 <- lme(
Y ~ GROUP + X1,
random = list(person = pdDiag(form = ~ GROUP - 1)),
correlation = corCompSymm(form = ~ day | person),
weights = varIdent(form = ~ 1 | GROUP),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model2)
So, I think these covariance structures are very similar (σg1 = τg2 + σ1).
Edit 2:
Covariate estimates (SAS/MIXED):
Variance person GROUP TEST 8789.23
CS person GROUP TEST 125.79
Variance person GROUP CONTROL 82775
CS person GROUP CONTROL 33297
So
TEST group diagonal element
= 125.79 + 8789.23
= 8915.02
CONTROL group diagonal element
= 33297 + 82775
= 116072
where diagonal element = σk1 + σk2.
Covariate estimates (R lme):
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUP1TEST GROUP2CONTROL Residual
StdDev: 14.56864 184.692 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
1TEST 2CONTROL
1.000000 3.068837
So
TEST group diagonal element
= 14.56864^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + 93.28885^2
= 8913.432
CONTROL group diagonal element
= 184.692^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + (3.068837 * 93.28885)^2
= 116070.5
where diagonal element = τg2 + σ1 + σg2.
Oooh, this is going to be a tricky one, and if it's even possible using standard nlme functions, is going to take some serious study of Pinheiro/Bates.
Before you spend the time doing that though, you should make absolutely sure that this is exact model you need. Perhaps there's something else that might fit the story of your data better. Or maybe there's something R can do more easily that is just as good, but not quite the same.
First, here's my take on what you're doing in SAS with this line:
repeated / type=cs subject=person group=GROUP;
This type=cs subject=person is inducing correlation between all the measurements on the same person, and that correlation is the same for all pairs of days. The group=GROUP is allowing the correlation for each group to be different.
In contrast, here's my take on what your R code is doing:
random = ~ +1 | person,
correlation=corCompSymm(form=~day|person)
This code is actually adding almost the same effect in two different ways; the random line is adding a random effect for each person, and the correlation line is inducing correlation between all the measurements on the same person. However, these two things are almost identical; if the correlation is positive, you get the exact same result by including either of them. I'm not sure what happens when you include both, but I do know that only one is necessary. Regardless, this code has the same correlation for all individuals, it's not allowing each group to have their own correlation.
To let each group have their own correlation, I think you have to build a more complicated correlation structure up out of two different pieces; I've never done this but I'm pretty sure I remember Pinheiro/Bates doing it.
You might consider instead adding a random effect for person and then letting the variance be different for the different groups with weights=varIdent(form=~1|group) (from memory, check my syntax, please). This won't quite be the same but tells a similar story. The story in SAS is that the measurements on some individuals are more correlated than the measurements on other individuals. Thinking about what that means, the measurements for individuals with higher correlation will be closer together than the measurements for individuals with lower correlation. In contrast, the story in R is that the variability of measurements within individuals varies; thinking about that, measurements with higher variability with have lower correlation. So they do tell similar stories, but come at it from opposite sides.
It is even possible (but I would be surprised) that these two models end up being different parameterizations of the same thing. My intuition is that the overall measurement variability will be different in some way. But even if they aren't the same thing, it would be worth writing out the parameterizations just to be sure you understand them and to make sure that they are appropriately describing the story of your data.
For IGF data from nlme library, I'm getting this error message:
lme(conc ~ 1, data=IGF, random=~age|Lot)
Error in lme.formula(conc ~ 1, data = IGF, random = ~age | Lot) :
nlminb problem, convergence error code = 1
message = iteration limit reached without convergence (10)
But everything is fine with this code
lme(conc ~ age, data=IGF)
Linear mixed-effects model fit by REML
Data: IGF
Log-restricted-likelihood: -297.1831
Fixed: conc ~ age
(Intercept) age
5.374974367 -0.002535021
Random effects:
Formula: ~age | Lot
Structure: General positive-definite
StdDev Corr
(Intercept) 0.082512196 (Intr)
age 0.008092173 -1
Residual 0.820627711
Number of Observations: 237
Number of Groups: 10
As IGF is groupedData, so both codes are identical. I'm confused why the first code produces error. Thanks for your time and help.
I find the other, older answer here unsatisfactory. I distinguish between cases where, statistically, age has no impact and conversely we encounter a computational error. Personally, I have made career mistakes by conflating these two cases. R has signaled the latter and I would like to dive into why that is.
The model that OP has specified is a growth model, with random slopes and intercepts. A grand intercept is included but not a grand age slope. One unsavory constraint that is imposed by fitting a random slope without addition of its "grand" term is that you are forcing the random slope to have 0 mean, which is very difficult to optimize. Marginal models indicate age does not have a statistically significant different value from 0 in the model. Furthermore adding age as a fixed effect does not remedy the problem.
> lme(conc~ age, random=~age|Lot, data=IGF)
Error in lme.formula(conc ~ age, random = ~age | Lot, data = IGF) :
nlminb problem, convergence error code = 1
message = iteration limit reached without convergence (10)
Here the error is obvious. It might be tempting to set the number of iterations up. lmeControl has many iterative estimands. But even that doesn't work:
> fit <- lme(conc~ 1, random=~age|Lot, data=IGF,
control = lmeControl(maxIter = 1e8, msMaxIter = 1e8))
Error in lme.formula(conc ~ 1, random = ~age | Lot,
data = IGF, control = lmeControl(maxIter = 1e+08, :
nlminb problem, convergence error code = 1
message = singular convergence (7)
So it's not a precision thing, the optimizer is running out-of-bounds.
There must be key differences between the two models you have proposed fitting, and a way to diagnose the error that you have found. One simple approach is specifying a "verbose" fit for the problematic model:
> lme(conc~ 1, random=~age|Lot, data=IGF, control = lmeControl(msVerbose = TRUE))
0: 602.96050: 2.63471 4.78706 141.598
1: 602.85855: 3.09182 4.81754 141.597
2: 602.85312: 3.12199 4.97587 141.598
3: 602.83803: 3.23502 4.93514 141.598
(truncated)
48: 602.76219: 6.22172 4.81029 4211.89
49: 602.76217: 6.26814 4.81000 4425.23
50: 602.76216: 6.31630 4.80997 4638.57
50: 602.76216: 6.31630 4.80997 4638.57
The first term is the REML (I think). The second through fourth terms are the parameters to an object called lmeSt of class lmeStructInt, lmeStruct, and modelStruct. If you use Rstudio's debugger to inspect attributes of this object (the lynchpin of the problem), you'll see it is the random effects component that explodes here. coef(lmeSt) after 50 iterations produces
reStruct.Lot1 reStruct.Lot2 reStruct.Lot3
6.316295 4.809975 4638.570586
as seen above and produces
> coef(lmeSt, unconstrained = FALSE)
reStruct.Lot.var((Intercept)) reStruct.Lot.cov(age,(Intercept))
306382.7 2567534.6
reStruct.Lot.var(age)
21531399.4
which is the same as the
Browse[1]> lmeSt$reStruct$Lot
Positive definite matrix structure of class pdLogChol representing
(Intercept) age
(Intercept) 306382.7 2567535
age 2567534.6 21531399
So it's clear the covariance of the random effects is something that's exploding here for this particular optimizer. PORT routines in nlminb have been criticized for their uninformative errors. The text from David Gay (Bell Labs) is here http://ms.mcmaster.ca/~bolker/misc/port.pdf The PORT documentation suggests our error 7 from using a 1 billion iter max "x may have too many free components. See §5.". Rather than fix the algorithm, it behooves us to ask if there are approximate results which should generate similar outcomes. It is, for instance, easy to fit an lmList object to come up with the random intercept and random slope variance:
> fit <- lmList(conc ~ age | Lot, data=IGF)
> cov(coef(fit))
(Intercept) age
(Intercept) 0.13763699 -0.018609973
age -0.01860997 0.003435819
although ideally these would be weighted by their respective precision weights:
To use the nlme package I note that unconstrained optimization using BFGS does not produce such an error and gives similar results:
> lme(conc ~ 1, data=IGF, random=~age|Lot, control = lmeControl(opt = 'optim'))
Linear mixed-effects model fit by REML
Data: IGF
Log-restricted-likelihood: -292.9675
Fixed: conc ~ 1
(Intercept)
5.333577
Random effects:
Formula: ~age | Lot
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.032109976 (Intr)
age 0.005647296 -0.698
Residual 0.820819785
Number of Observations: 237
Number of Groups: 10
An alternative syntactical declaration of such a model can be done with the MUCH easier lme4 package:
library(lme4)
lmer(conc ~ 1 + (age | Lot), data=IGF)
which yields:
> lmer(conc ~ 1 + (age | Lot), data=IGF)
Linear mixed model fit by REML ['lmerMod']
Formula: conc ~ 1 + (age | Lot)
Data: IGF
REML criterion at convergence: 585.7987
Random effects:
Groups Name Std.Dev. Corr
Lot (Intercept) 0.056254
age 0.006687 -1.00
Residual 0.820609
Number of obs: 237, groups: Lot, 10
Fixed Effects:
(Intercept)
5.331
An attribute of lmer and its optimizer is that random effects correlations which are very close to 1, 0, or -1 are simply set to those values since it simplifies the optimization (and statistical efficiency of the estimation) substantially.
Taken together, this does not suggest that age does not have an effect, as was said earlier, and this argument can be supported by the numeric results.
If you plot the data, you can see that there is no effect of age, so it seems strange to be trying to fit a random effect of age in spite of this. No wonder it is not converging.
library(nlme)
library(ggplot2)
dev.new(width=6, height=3)
qplot(age, conc, data=IGF) + facet_wrap(~Lot, nrow=2) + geom_smooth(method='lm')
I think what you want to do is model a random effect of Lot on the intercept. We can try including age as a fixed effect, but we'll see that it is not significant and can be thrown out:
> summary(lme(conc ~ 1 + age, data=IGF, random=~1|Lot))
Linear mixed-effects model fit by REML
Data: IGF
AIC BIC logLik
604.8711 618.7094 -298.4355
Random effects:
Formula: ~1 | Lot
(Intercept) Residual
StdDev: 0.07153912 0.829998
Fixed effects: conc ~ 1 + age
Value Std.Error DF t-value p-value
(Intercept) 5.354435 0.10619982 226 50.41849 0.0000
age -0.000817 0.00396984 226 -0.20587 0.8371
Correlation:
(Intr)
age -0.828
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-5.46774548 -0.43073893 -0.01519143 0.30336310 5.28952876
Number of Observations: 237
Number of Groups: 10