Inflated DF in lsmeans results for an lmer model (R)

I used lmer from the lme4 package to fit a linear mixed-effects model. I have 3 years of temperature data for untreated (5) and treated (10) plots. The model:
modela<-lmer(ave~yr*tr+(1|pl), REML=FALSE, data=mydata)
I checked the residuals for normality with a qqnorm plot.
My data:
'data.frame': 6966 obs. of 7 variables:
$ yr : Factor w/ 3 levels "yr1","yr2","yr3": 1 1 1 1 1 1 1 1 1 1 ...
$ pl : Factor w/ 15 levels "C02","C03","C05",..: 1 1 1 1 1 1 1 1 1 1 ...
$ tr : Factor w/ 2 levels "Cont","OTC": 1 1 1 1 1 1 1 1 1 1 ...
$ ave: num 14.8 16.1 11.6 10.3 11.6 ...
The interaction is significant, so I used lsmeans:
lsmeans(modela, pairwise~yr*tr, adjust="tukey")
In the contrasts, I get (excerpts only)
contrast estimate SE df t.ratio p.value
yr1,Cont - yr2,Cont -0.727102895 0.2731808 6947.24 -2.662 0.0832
yr1,OTC - yr2,OTC -0.990574030 0.2015650 6449.10 -4.914 <.0001
yr1,Cont - yr1,OTC -0.005312771 0.3889335 31.89 -0.014 1.0000
yr2,Cont - yr2,OTC -0.268783907 0.3929332 32.97 -0.684 0.9825
My question regards the high dfs for some of the contrasts, and associated, but meaningless low p-values.
Can this be due to:
- presence of NAs in my data set (some improvement when they are removed)
- unequal sample sizes (e.g. 5 plots of one treatment, 10 of the other; however, those comparisons (yr1,Cont - yr1,OTC) don't seem to be a problem)
Other issues?
I have searched Stack Overflow and Cross Validated questions.
Thanks for any answers, ideas, comments.

In this example, treatments are assigned experimentally to plots. Having small numbers of plots assigned to treatments severely limits the information available to statistically compare the treatments. (If you had only one plot per treatment, it would not even be possible to compare treatments, because you wouldn't be able to separate the effect of the treatments from the effect of the plots.) You have 10 plots assigned to one treatment and 5 to the other, so for the main effect of treatment you have (10-1)+(5-1) = 13 d.f., and if you do
lsmeans(modela, pairwise ~ tr)
you will see around 13 d.f. (maybe fewer due to imbalance and missingness) for those statistics. When you compare combinations of years and treatments, you get roughly 3 times the d.f. because there are 3 years. However, in some of those comparisons the treatment is the same in both combinations being compared (e.g. yr1,Cont - yr2,Cont). In those comparisons the plot-to-plot variation mostly cancels out (it is a within-plot comparison), so the d.f. come mainly from the residual error of the model, which has thousands of d.f. Because the data are unbalanced, these comparisons are slightly contaminated by between-plot variation, which makes their d.f. somewhat smaller than the residual d.f.
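The d.f. heuristic described above can be written out as plain arithmetic. This is a minimal sketch that assumes only the counts stated in the question (5 untreated and 10 treated plots, 3 years); it reproduces the rough reasoning, not the Kenward-Roger (or Satterthwaite) approximation that lsmeans typically computes for lmer models.

```r
# Plot counts from the question: 5 untreated (Cont) and 10 treated (OTC) plots
n_plots <- c(Cont = 5, OTC = 10)
n_years <- 3

# Between-plot error d.f. for the treatment main effect: (5 - 1) + (10 - 1)
df_tr <- sum(n_plots - 1)

# Rough d.f. for same-year, between-treatment comparisons (3 years' worth)
df_same_year <- df_tr * n_years

c(df_tr = df_tr, df_same_year = df_same_year)
```

The excerpted d.f. of 31.89 and 32.97 for the same-year treatment contrasts sit a little below 39, which fits the imbalance and missingness mentioned above, while the same-treatment contrasts (6947.24, 6449.10) are close to the residual d.f.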
It appears you are not particularly interested in cross-comparisons such as treat1, year1 vs. treat2, year3. I suggest using "by" variables to cut down on the number of comparisons tested, because when you test them all, the multiplicity correction is unnecessarily conservative. It would go something like this:
modela.lsm = lsmeans(modela, ~ tr * yr)
pairs(modela.lsm, by = "yr") # compare tr for each yr
pairs(modela.lsm, by = "tr") # compare yr for each tr
These calls will apply the Tukey correction separately to each "by" group. If you want a multiplicity correction for each whole family, do this:
rbind(pairs(modela.lsm, by = "yr"))
rbind(pairs(modela.lsm, by = "tr"))
By default, a multivariate t correction is used (Tukey is not the right method here). You can even do
rbind(pairs(modela.lsm, by = "yr"), pairs(modela.lsm, by = "tr"))
to group all of the comparisons into one family and apply a multivariate t adjustment.
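To see why testing every cell against every other cell makes the correction more conservative than needed, it helps to count the comparisons in each family. A base-R sketch, using only the factor sizes from the question (2 levels of tr, 3 levels of yr):

```r
n_tr <- 2  # levels of tr: Cont, OTC
n_yr <- 3  # levels of yr: yr1, yr2, yr3

# pairwise ~ yr*tr compares every yr:tr cell with every other cell
all_pairs <- choose(n_tr * n_yr, 2)

# by = "yr": compare the 2 treatments within each of the 3 years
by_yr <- n_yr * choose(n_tr, 2)

# by = "tr": compare the 3 years within each of the 2 treatments
by_tr <- n_tr * choose(n_yr, 2)

c(all_pairs = all_pairs, by_yr = by_yr, by_tr = by_tr, combined = by_yr + by_tr)
```

Even combining both "by" families gives 9 comparisons instead of 15, and the 6 cross-comparisons that are dropped (different year and different treatment) are the ones that are usually not of interest.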

Related

Two-way repeated-measures ANOVA does not show the sphericity test

There were 4 levels of treatment over 11 sampling time points. Each treatment had 3 identical systems, and samples were analyzed for several parameters. So I used a two-way repeated-measures ANOVA, for example:
ezANOVA(data = WP4, dv = parameter1, wid = System, within = Time, between = Treatment, type=3)
However, for some parameters it produced an ANOVA table without the sphericity test, like below:
Warning: NaNs produced
Warning: NaNs produced
$ANOVA
Effect DFn DFd F p p<.05 ges
2 Treatment 3 8 2.255871 1.590810e-01 0.2934692
3 Time 10 80 35.989273 1.524902e-25 * 0.6960297
4 Treatment:Time 30 80 7.574502 2.275032e-13 * 0.5911306
Please let me know if you need more information to crack my problem. Thanks.
I would like to have the complete anova table with $Mauchly's Test for Sphericity and $Sphericity Corrections.

How to get value of group = 0 in linear mixed model

I have a very simple stat question probably.
So, I am fitting linear mixed models like this:
lme(dependent ~ Group + Sex + Age + npgs, data=boookclub, random = ~ 1| subject)
Group is a factor variable with levels 0, 1, 2, 3.
The dependent variables are continuous and standardized (mean 0). The others are covariates: Sex is a factor with Male/Female levels, Age is numeric, and npgs is numeric, continuous, and standardized as well.
When I get the table with beta, standard error, t and p values, I get this:
Value Std.Error DF t-value p-value
(Intercept) -0.04550502 0.02933385 187 -1.551280 0.0025
Group1 0.04219801 0.03536929 181 1.193069 0.2344
Group2 0.03350827 0.03705896 181 0.904188 0.3671
Group3 0.00192119 0.03012654 181 0.063771 0.9492
SexMale 0.03866387 0.05012901 181 0.771287 0.4415
Age -0.00011675 0.00148684 181 -0.078520 0.9375
npgs 0.15308844 0.01637163 181 9.350835 0.0000
SexMale:Age 0.00492966 0.00276117 181 1.785352 0.0759
My problem is: how do I get the beta for Group0? In this case the intercept corresponds to Group0, but also to the average npgs, since npgs is standardized. And how can I check whether Group0 is significantly associated with the dependent variable? I'd like to see the effect of all Group levels.
Thanks
The easiest way to do what you want may be with the emmeans package, but you may also have some conceptual issues. Technical details first, then conceptual:
Technical
Fitting an example (this isn't necessarily statistically sensible, but I wanted an example with a categorical fixed effect)
library(nlme)
m1 <- lme(Yield~Variety, random = ~1|Block, data=Alfalfa)
As with your example, the effects are "intercept" (= mean of the baseline group, which is the "Cossack" variety in this case [by default, the alphabetically first group]), "Ladak" (difference between Ladak and Cossack means) and "Ranger" (similarly). (As @Ben hints in the comments above, R automatically generates dummies for [most of] the levels of the categorical variables [factors] in your model.)
coef(summary(m1))
## Value Std.Error DF t-value p-value
## (Intercept) 1.57166667 0.11665326 64 13.4729767 2.373343e-20
## VarietyLadak 0.09458333 0.07900687 64 1.1971532 2.356624e-01
## VarietyRanger -0.01916667 0.07900687 64 -0.2425949 8.090950e-01
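To see the dummy coding R builds behind the scenes, base R's model.matrix() can be applied to a small hypothetical factor with the same levels (0 to 3) as the question's Group. This uses made-up data purely for illustration; it is not the questioner's model.

```r
# Hypothetical factor with the same levels as the question's Group variable
Group <- factor(c(0, 1, 2, 3, 0, 1, 2, 3))

# Default treatment contrasts: an intercept plus one dummy per non-baseline level
X_ref <- model.matrix(~ Group)
colnames(X_ref)

# Dropping the intercept gives one indicator column per level instead
X_cell <- model.matrix(~ Group - 1)
colnames(X_cell)
```

Under the default coding Group0 has no column of its own: its rows are the ones with zeros in every dummy column, so its expected value is the intercept (with the other predictors at their baseline or zero values).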
The emmeans package is a convenient way to see predicted values for each group without recoding.
library(emmeans)
emmeans(m1, spec = ~Variety)
## Variety emmean SE df lower.CL upper.CL
## Cossack 1.57 0.117 5 1.27 1.87
## Ladak 1.67 0.117 5 1.37 1.97
## Ranger 1.55 0.117 5 1.25 1.85
Conceptual
You can't "check if Group0 is significantly associated with the dependent [response] variable". You can only check whether the response variable differs significantly between two groups, or whether it differs significantly among all groups (e.g. the results of anova()). You have to pick a baseline. (If you insist, you can test all pairwise comparisons among groups; emmeans can help with this too.) If you "remove the intercept" (by fitting Yield ~ Variety - 1, or by looking at the results that emmeans produces), then the difference you are quantifying is the difference between the mean of a particular group and zero. This is usually not a meaningful question; in the example here, for instance, it would test whether an alfalfa variety gave a yield significantly greater than zero, which is probably not very interesting.
On the other hand, if you are just interested in estimating the expected value in each group (conditioning on the baseline values of the other variables in the model), along with the standard errors/CIs, then the answers you get from emmeans are perfectly sensible.
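The difference between the two codings is easy to see with a plain lm() on the built-in PlantGrowth data, swapped in here as a stand-in because it needs no extra packages; the same coding logic carries over to lme.

```r
# PlantGrowth ships with base R: plant weight for groups ctrl, trt1, trt2
fit_ref  <- lm(weight ~ group, data = PlantGrowth)      # intercept = ctrl mean
fit_cell <- lm(weight ~ group - 1, data = PlantGrowth)  # one coefficient per group

group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)

# Cell-means coding reproduces the raw group means; its t-tests ask whether
# each mean differs from zero.
coef(summary(fit_cell))

# Reference coding: the non-intercept rows test differences from ctrl.
coef(summary(fit_ref))
```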
There's a related question here that explains why you get an NA value if you manually create dummies for every level of your factor ...

Show family class in TukeyHSD

I am used to conducting Tukey post-hoc tests in Minitab. When I do, I usually get a family grouping of the dependent/predictor variables.
In R, using TukeyHSD(), the family grouping is not displayed (or calculated?). It only displays the relationship between each pair of the dependent/predictor variables. Is it possible to display the family groupings like in Minitab?
Using the diamonds data set:
library(ggplot2)  # provides the diamonds data set
av <- aov(price ~ cut, data = diamonds)
tk <- TukeyHSD(av, ordered = TRUE, which = "cut")
plot(tk)
Output:
Fit: aov(formula = price ~ cut, data = diamonds)
$cut
diff lwr upr p adj
Good-Ideal 471.32248 300.28228 642.3627 0.0000000
Very Good-Ideal 524.21792 401.33117 647.1047 0.0000000
Fair-Ideal 901.21579 621.86019 1180.5714 0.0000000
Premium-Ideal 1126.71573 1008.80880 1244.6227 0.0000000
Very Good-Good 52.89544 -130.15186 235.9427 0.9341158
Fair-Good 429.89331 119.33783 740.4488 0.0014980
Premium-Good 655.39325 475.65120 835.1353 0.0000000
Fair-Very Good 376.99787 90.13360 663.8622 0.0031094
Premium-Very Good 602.49781 467.76249 737.2331 0.0000000
Premium-Fair 225.49994 -59.26664 510.2665 0.1950425
Picture added to help clarify my response to Maurits's comment: [image not available]
Here is a step-by-step example on how to reproduce minitab's table for the ggplot2::diamonds dataset. I've included details/explanation as much as possible.
Please note that as far as I can tell, results shown in minitab's table are not dependent/related to results from Tukey's post-hoc test; they are based on results from the analysis of variance. Tukey's honest significant difference (HSD) test is a post-hoc test that establishes which comparisons (of all the possible pairwise comparisons of group means) are (honestly) statistically significant, given the ANOVA results.
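If all you need is the list of pairs that TukeyHSD() flags as significant (the raw material that letter-based groupings are built from), it can be pulled out of the returned object. A minimal sketch on the built-in PlantGrowth data, used here as a stand-in for diamonds so that it runs without ggplot2:

```r
tk <- TukeyHSD(aov(weight ~ group, data = PlantGrowth), which = "group")

# One row per pair of groups; columns are diff, lwr, upr, p adj
comparisons <- tk$group

# Pairs whose Tukey-adjusted p-value is below 0.05
significant <- rownames(comparisons)[comparisons[, "p adj"] < 0.05]
significant
```

Turning such a list into compact letters is what multcomp::cld() does, as in the cld() example further down.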
In order to reproduce Minitab's "mean-grouping" summary table (see the first table of "Interpret the results: Step 3" in the Minitab Express Support pages), I recommend (re-)running a linear model to extract means and confidence intervals. This is the same model that aov fits internally, since aov is a wrapper around lm.
Fit a linear model
We drop the intercept (the 0 in the formula) to get absolute estimates for every group, rather than estimates of the differences relative to a baseline group.
fit <- lm(price ~ 0 + cut, data = diamonds)
coef <- summary(fit)$coef
coef
# Estimate Std. Error t value Pr(>|t|)
#cutFair 4358.758 98.78795 44.12236 0
#cutGood 3928.864 56.59175 69.42468 0
#cutVery Good 3981.760 36.06181 110.41487 0
#cutPremium 4584.258 33.75352 135.81570 0
#cutIdeal 3457.542 27.00121 128.05137 0
Determine family groupings
In order to obtain something similar to minitab's "family groupings", we adopt the following approach:
Calculate confidence intervals for all parameters
Perform a hierarchical clustering analysis on the confidence interval data for all parameters
Cut the resulting tree at a height corresponding to the standard deviation of the CIs. This will give us a grouping of parameter estimates based on their confidence intervals. This is a somewhat empirical approach, but justifiable, as the tree measures pairwise distances between the confidence intervals, and the standard deviation can be interpreted as a Euclidean distance.
We start by calculating the confidence intervals and clustering the resulting distance matrix with complete-linkage hierarchical clustering (the default method in hclust).
CI <- confint(fit)
hc <- hclust(dist(CI))
We inspect the cluster dendrogram
plot(hc)
We now cut the tree at a height corresponding to the standard deviation of all CIs across all parameter estimates to get the "family groupings"
grps <- cutree(hc, h = sd(CI))
Summarise results
Finally, we collate all quantities and store results in a table similar to minitab's "mean-grouping" table.
library(tidyverse)
bind_cols(
cut = rownames(coef),
N = as.integer(table(fit$model$cut)),
Mean = coef[, 1],
Groupings = grps) %>%
as.data.frame()
# cut N Mean Groupings
#1 cutFair 1610 4358.758 1
#2 cutGood 4906 3928.864 2
#3 cutVery Good 12082 3981.760 2
#4 cutPremium 13791 4584.258 1
#5 cutIdeal 21551 3457.542 3
Note the near-perfect agreement of our results with those from the Minitab "mean-grouping" table: cut = Ideal is by itself in group 3 (group C in Minitab's table), while Fair+Premium share group 1 (Minitab: group A), and Good+Very Good share group 2 (Minitab: group B).
See the cld function in the multcomp package, as explained here (copy-pasted below).
Example data set:
> data(ToothGrowth)
> ToothGrowth$treat <- with(ToothGrowth, interaction(supp,dose))
> str(ToothGrowth)
'data.frame': 60 obs. of 3 variables:
$ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
$ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
$ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ treat: Factor w/ 6 levels "OJ.0.5","VC.0.5",..: 2 2 2 2 2 2 2 2 2 2 ...
Model fit:
> fit <- lm(len ~ treat, data=ToothGrowth)
All pairwise comparisons with Tukey test:
> apctt <- multcomp::glht(fit, linfct = multcomp::mcp(treat = "Tukey"))
Letter-based representation of all-pairwise comparisons (algorithm from Piepho 2004):
> lbrapc <- multcomp::cld(apctt)
> lbrapc
OJ.0.5 VC.0.5 OJ.1 VC.1 OJ.2 VC.2
"b" "a" "c" "b" "c" "c"

R post hoc comparisons of ezANOVA

I performed the following ezANOVA:
RMANOVAGHB1 <- ezANOVA(GHB1, dv=DIF.SCORE.STARTLE, wid=RAT.ID, within=TRIAL.TYPE, between=GROUP, detailed = TRUE, return_aov = TRUE)
My dataset looks like this:
RAT.ID DIF.SCORE.STARTLE GROUP TRIAL.TYPE
1 1 170.73 SAL TONO
2 1 80.07 SAL NOAL
3 2 456.40 PROP TONO
4 2 290.40 PROP NOAL
5 3 507.20 SAL TONO
6 3 261.60 SAL NOAL
7 4 208.67 PROP TONO
8 4 137.60 PROP NOAL
9 5 500.50 SAL TONO
10 5 445.73 SAL NOAL
and so on, up to RAT.ID 16.
My supervisors don't work with R, so they can't help me. I need code that will give me all post hoc contrasts, but looking it up only confuses me more and more.
I already tried to do TukeyHSD on the aov output of ezANOVA and tried pairwise.t.test next (as I found out bonferroni is a more appropriate correction in this case), but none seem to work. I've also found things about using a linear model and then multcomp, but I don't know if that would be a good solution in this case. I feel like the problem with everything I tried is either that I have between and within variables or that my dataset is not set up right. Another complicating factor is that I'm just a beginner with R and my statistical knowledge is still pretty basic as this is one of my first practical experiences with doing analyses.
If it's important, this is the output of the anova:
$ANOVA
Effect DFn DFd SSn SSd F p p<.05 ges
1 (Intercept) 1 14 1233568.9 1076460.9 16.043280 0.001302172 * 0.508451750
2 GROUP 1 14 212967.9 1076460.9 2.769771 0.118272657 0.151521743
3 TRIAL.TYPE 1 14 137480.6 116097.9 16.578499 0.001143728 * 0.103365833
4 GROUP:TRIAL.TYPE 1 14 11007.2 116097.9 1.327335 0.268574391 0.009145489
$aov
Call:
aov(formula = formula(aov_formula), data = data)
Grand Mean: 196.3391
Stratum 1: RAT.ID
Terms:
GROUP Residuals
Sum of Squares 212967.9 1076460.9
Deg. of Freedom 1 14
Residual standard error: 277.2906
1 out of 2 effects not estimable
Estimated effects are balanced
Stratum 2: RAT.ID:TRIAL.TYPE
Terms:
TRIAL.TYPE GROUP:TRIAL.TYPE Residuals
Sum of Squares 137480.6 11007.2 116097.9
Deg. of Freedom 1 1 14
Residual standard error: 91.0643
Estimated effects may be unbalanced
My solution, considering your dataset - first 5 rats:
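The code below assumes a data frame dat holding the first 5 rats, with the dots in the column names replaced by underscores. One way to rebuild it from the rows printed in the question (a sketch; the real data set has 16 rats):

```r
# First 5 rats from the question; column names use underscores instead of dots
dat <- data.frame(
  RAT_ID = factor(rep(1:5, each = 2)),
  DIF_SCORE_STARTLE = c(170.73, 80.07, 456.40, 290.40, 507.20,
                        261.60, 208.67, 137.60, 500.50, 445.73),
  GROUP = factor(rep(c("SAL", "PROP", "SAL", "PROP", "SAL"), each = 2)),
  TRIAL_TYPE = factor(rep(c("TONO", "NOAL"), times = 5))
)

# Cell means (rows: GROUP, columns: TRIAL_TYPE)
with(dat, tapply(DIF_SCORE_STARTLE, list(GROUP, TRIAL_TYPE), mean))
```

These cell means match the $means table from HSD.test further down (e.g. 392.81 for SAL:TONO and 214 for PROP:NOAL).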
1. Let's build the linear model:
model.lm = lm(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
2. Let's check the homogeneity of variance (leveneTest from the car package) and the distribution of the model residuals (Shapiro-Wilk). We want the residuals to be approximately normal and the variances to be homogeneous. Two tests for this:
>shapiro.test(resid(model.lm))
Shapiro-Wilk normality test
data: resid(model.lm)
W = 0.91783, p-value = 0.3392
> leveneTest(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3   0.066  0.976
       6
Both p-values are above 0.05, so we have no evidence that the variance differs between groups, and no evidence that the residuals deviate from normality. In summary, parametric tests such as ANOVA or pairwise t-tests are reasonable here.
3. You can also run:
hist(resid(model.lm))
to see what the distribution of the residuals looks like, and check the model with:
plot(model.lm)
Here: https://stats.stackexchange.com/questions/58141/interpreting-plot-lm/65864 you'll find an interpretation of the plots produced by this function. As far as I can see, the data look fine.
4. Now we can finally run the ANOVA and the post hoc HSD test (HSD.test from the agricolae package):
> anova(model.lm)
Analysis of Variance Table
Response: DIF_SCORE_STARTLE
Df Sum Sq Mean Sq F value Pr(>F)
GROUP 1 7095 7095 0.2323 0.6469
TRIAL_TYPE 1 39451 39451 1.2920 0.2990
GROUP:TRIAL_TYPE 1 84 84 0.0027 0.9600
Residuals 6 183215 30536
> (result.hsd = HSD.test(model.lm, list('GROUP', 'TRIAL_TYPE')))
$statistics
Mean CV MSerror HSD r.harmonic
305.89 57.12684 30535.91 552.2118 2.4
$parameters
Df ntr StudentizedRange alpha test name.t
6 4 4.895599 0.05 Tukey GROUP:TRIAL_TYPE
$means
DIF_SCORE_STARTLE std r Min Max
PROP:NOAL 214.0000 108.0459 2 137.60 290.40
PROP:TONO 332.5350 175.1716 2 208.67 456.40
SAL:NOAL 262.4667 182.8315 3 80.07 445.73
SAL:TONO 392.8100 192.3561 3 170.73 507.20
$comparison
NULL
$groups
trt means M
1 SAL:TONO 392.8100 a
2 PROP:TONO 332.5350 a
3 SAL:NOAL 262.4667 a
4 PROP:NOAL 214.0000 a
As you can see, all four GROUP:TRIAL_TYPE combinations have been placed in one group, a, which means there are no significant differences between them. However, within both SAL and PROP the TONO mean is higher than the NOAL mean.

nlme error "Invalid formula for groups" although random effect specified

I have done some searching for this, but the mailing-list posts I have found involve people not specifying a random effect in nlme, whereas I have done this. I also own the book Mixed-Effects Models in S and S-PLUS by Pinheiro and Bates, but can't work out my problem from the book.
I'm still working on my nutrient data analysis, and have now shifted onto real data. The data come from a population survey, and feature a repeated measures design as each respondent has two 24-hour intake recalls for the nutrient.
I have successfully fit an lme4 model to my data, and now I am trying to find out what happens if I use a nonlinear method instead. A snapshot of my data is below:
head(Male.Data)
RespondentID Age SampleWeight IntakeDay IntakeAmt AgeFactor BoxCoxXY
2 100020 12 0.4952835 Day1Intake 12145.852 9to13 15.61196
7 100419 14 0.3632839 Day1Intake 9591.953 14to18 15.01444
8 100459 11 0.4952835 Day1Intake 7838.713 9to13 14.51458
12 101138 15 1.3258785 Day1Intake 11113.266 14to18 15.38541
14 101214 6 2.1198688 Day1Intake 7150.133 4to8 14.29022
18 101389 5 2.1198688 Day1Intake 5091.528 4to8 13.47928
And the summary information about the data is:
str(Male.Data)
'data.frame': 4498 obs. of 7 variables:
$ RespondentID: Factor w/ 4487 levels "100013","100020",..: 2 7 8 12 14 18 19 20 21 22 ...
$ Age : int 12 14 11 15 6 5 10 2 2 9 ...
$ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt : num 12146 9592 7839 11113 7150 ...
$ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ...
Using the lme4 package, I have successfully fit a linear mixed effects model using (the random effect is from the subjects and IntakeDay is the repeated measure factor associated with BoxCoxXY, which is a transform of IntakeAmt):
Male.lme1 <- lmer(BoxCoxXY ~ AgeFactor + IntakeDay + (1|RespondentID),
data = Male.Data,
weights = SampleWeight)
I have been trying to use the nlme package to look at fitting a nonlinear model to compare the two, but I cannot get my syntax to work. My initial problem was that there does not seem to be a relevant SelfStart model for my data, so I used geeglm to generate starting values (coefficients saved to a data frame called Male.nlme.start). But now I just get the error:
Error in getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]), :
Invalid formula for groups
I can't work out what I am doing wrong, the nlme syntax I am using is:
Male.nlme1 <- nlme(BoxCoxXY ~ AgeFactor + IntakeDay + RespondentID , data = Male.Data,
fixed = AgeFactor + IntakeDay ~ 1,
random = RespondentID ~ 1,
start=c(Male.nlme.start[,"Estimate"]))
I have tried the analysis both with and without RespondentID being included in the overall model specification, and this seems to have no impact.
The reason I am trying to persevere with the nonlinear method is that the original analysis in SAS used a nonlinear approach. While my residuals etc look acceptably good from the lme analysis, I am curious to see what impact a nonlinear approach would have.
In case it is helpful, the traceback() results from the last analysis attempt, which includes RespondentID is:
5: stop("Invalid formula for groups")
4: getGroups.data.frame(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),
sep = "|"))))
3: getGroups(dataMix, eval(parse(text = paste("~1", deparse(groups[[2]]),
sep = "|"))))
2: nlme.formula(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data,
fixed = AgeFactor + IntakeDay ~ 1, random = RespondentID ~
1, start = c(Male.nlme.start[, "Estimate"]))
1: nlme(BoxCoxXY ~ AgeFactor + IntakeDay, data = Male.Data, fixed = AgeFactor +
IntakeDay ~ 1, random = RespondentID ~ 1, start = c(Male.nlme.start[,
"Estimate"]))
Can anyone suggest where I have gone wrong? I'm starting to wonder if either (1) there are too many factor levels for RespondentID to work in nlme or (2) the method will only work if I supply a start parameter for RespondentID, which seems nonsensical with the data I have as this is my subject identifier.
Update: to answer Ben, the SAS nlmixed model is a general log likelihood function for the fixed effects:
ll1 <- log(1/sqrt(2*pi*Scale))
ll2 <- as.data.frame(-(BoxCoxXY - Intercept + AgeFactor + IntakeDay + u2)^2)/(2*Scale)+(Lambda.Value-1)*log(IntakeAmt)
ll <- ll1 + ll2
model IntakeAmt ~ general(ll)
where:
Scale = dispersion value from geeglm and
Lambda.Value = lambda value associated with the maximum log likelihood output from an earlier boxcox() which was used to transform IntakeAmt to BoxCoxXY through the formula Male.Data$BoxCoxXY <- (Male.Data$IntakeAmt^Lambda.Value-1)/Lambda.Value
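For reference, the Box-Cox transform and the log-likelihood pieces ll1 and ll2 above can be written as small R functions. This is a sketch under assumptions, not the SAS code or an nlme model: mu stands for the whole linear predictor (Intercept + AgeFactor + IntakeDay + u2, with the random effect already added), scale is the residual variance, and the (Lambda.Value - 1)*log(IntakeAmt) term is the log-Jacobian of the transform, which puts the likelihood back on the IntakeAmt scale.

```r
# Box-Cox transform as used to create BoxCoxXY from IntakeAmt
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Per-observation log-likelihood on the original (IntakeAmt) scale:
# normal log-density of the transformed value (ll1 + first part of ll2)
# plus the log-Jacobian (lambda - 1) * log(y).
# 'mu' (linear predictor) and 'scale' (residual variance) are assumptions.
boxcox_loglik <- function(y, lambda, mu, scale) {
  z <- boxcox_transform(y, lambda)
  log(1 / sqrt(2 * pi * scale)) - (z - mu)^2 / (2 * scale) + (lambda - 1) * log(y)
}
```

The sketch subtracts the whole predictor, i.e. BoxCoxXY - (Intercept + AgeFactor + IntakeDay + u2), inside the squared term.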
The random statement in the SAS code is:
random u1 u2 ~ normal([0,0][&vu1,COV_U1U2,&vu2]) subject=RespondentID
so there are two error terms in the model and they are both being fit as random effects. The second square bracket represents the lower triangle of the random-effects variance matrix listed in row order, and is specified using SAS macro variables in the SAS syntax.
The summary of the model that I have been given is the normal one-line overview that shows matrix of covariates (BX) plus an error component, so it's not a lot of help here.
Second update: I realised that I had not removed the RespondentID levels associated with the female subjects as I factorised RespondentID over the entire data frame before I did the split into separate data frames, by gender, for analysis. I have repeated the nlme analysis after removing unused factor levels for RespondentID and I get the same error. The lmer results are the same - which is good to know. :)
