Multiple comparisions using glht with repeated measure anova - r

I'm using the following code to try to get at post-hoc comparisons for my cell means:
result.lme3<-lme(Response~Pressure*Treatment*Gender*Group, mydata, ~1|Subject/Pressure/Treatment)
aov.result<-aov(result.lme3, mydata)
TukeyHSD(aov.result, "Pressure:Treatment:Gender:Group")
This gives me a result, but most of the adjusted p-values are incredibly small - so I'm not convinced the result is correct.
Alternatively I'm trying this:
summary(glht(result.lme3,linfct=mcp(????="Tukey")
I don't know how to get the Pressure:Treatment:Gender:Group in the glht code.
Help is appreciated - even if it is just a link to a question I didn't find previously.
I have 504 observations, Pressure has 4 levels and is repeated in each subject, Treatment has 2 levels and is repeated in each subject, Group has 3 levels, and Gender is obvious.
Thanks

I solved a similar problem creating a interaction dummy variable using interaction() function which contains all combinations of the leves of your 4 variables.
I made many tests, the estimates shown for the various levels of this variable show the joint effect of the active levels plus the interaction effect.
For example if:
temperature ~ interaction(infection(y/n), acetaminophen(y/n))
(i put the possible leves in the parenthesis for clarity) the interaction var will have a level like "infection.y:acetaminophen.y" which show the effect on temperature of both infection, acetaminophen and the interaction of the two in comparison with the intercept (where both variables are n).
Instead if the model was:
temperature ~ infection(y/n) * acetaminophen(y/n)
to have the same coefficient for the case when both vars are y, you would have had to add the two simple effect plus the interaction effect. The result is the same but i prefer using interaction since is more clean and elegant.
The in glht you use:
summary(glht(model, linfct= mcp(interaction_var = 'Tukey'))
to achieve your post-hoc, where interaction_var <- interaction(infection, acetaminophen).
TO BE NOTED: i never tested this methodology with nested and mixed models so beware!

Related

Negative Binomial model offset seems to be creating a 2 level factor

I am trying to fit some data to a negative binomial model and run a pairwise comparison using emmeans. The data has two different sample sizes, 15 and 20 (num_sample in the example below).
I have set up two data frames: good.data which produces the expected result of offset() using random sample sizes between 15 and 20, and bad.data using a sample size of either 15 or 20, which seems to produce a factor of either 15 or 20. The bad.data pairwise comparison produces way too many comparisons compared to the good.data, even though they should produce the same number?
set.seed(1)
library(dplyr)
library(emmeans)
library(MASS)
# make data that works
data.frame(site=c(rep("A",24),
rep("B",24),
rep("C",24),
rep("D",24),
rep("E",24)),
trt_time=rep(rep(c(10,20,30),8),5),
pre_trt=rep(rep(c(rep("N",3),rep("Y",3)),4),5),
storage_time=rep(c(rep(0,6),rep(30,6),rep(60,6),rep(90,6)),5),
num_sample=sample(c(15,17,20),24*5,T),# more than 2 sample sizes...
bad=sample(c(1:7),24*5,T,c(0.6,0.1,0.1,0.05,0.05,0.05,0.05)))->good.data
# make data that doesn't work
data.frame(site=c(rep("A",24),
rep("B",24),
rep("C",24),
rep("D",24),
rep("E",24)),
trt_time=rep(rep(c(10,20,30),8),5),
pre_trt=rep(rep(c(rep("N",3),rep("Y",3)),4),5),
storage_time=rep(c(rep(0,6),rep(30,6),rep(60,6),rep(90,6)),5),
num_sample=sample(c(15,20),24*5,T),# only 2 sample sizes...
bad=sample(c(1:7),24*5,T,c(0.6,0.1,0.1,0.05,0.05,0.05,0.05)))->bad.data
# fit models
good.data%>%
mutate(trt_time=factor(trt_time),
pre_trt=factor(pre_trt),
storage_time=factor(storage_time))%>%
MASS::glm.nb(bad~trt_time:pre_trt:storage_time+offset(log(num_sample)),
data=.)->mod.good
bad.data%>%
mutate(trt_time=factor(trt_time),
pre_trt=factor(pre_trt),
storage_time=factor(storage_time))%>%
MASS::glm.nb(bad~trt_time:pre_trt:storage_time+offset(log(num_sample)),
data=.)->mod.bad
# pairwise comparison
emmeans::emmeans(mod.good,pairwise~trt_time:pre_trt:storage_time+offset(log(num_sample)))$contrasts%>%as.data.frame()
emmeans::emmeans(mod.bad,pairwise~trt_time:pre_trt:storage_time+offset(log(num_sample)))$contrasts%>%as.data.frame()
First , I think you should look up how to use emmeans.The intent is not to give a duplicate of the model formula, but rather to specify which factors you want the marginal means of.
However, that is not the issue here. What emmeans does first is to setup a reference grid that consists of all combinations of
the levels of each factor
the average of each numeric predictor; except if a
numeric predictor has just two different values, then
both its values are included.
It is that exception you have run against. Since num_samples has just 2 values of 15 and 20, both levels are kept separate rather than averaged. If you want them averaged, add cov.keep = 1 to the emmeans call. It has nothing to do with offsets you specify in emmeans-related functions; it has to do with the fact that num_samples is a predictor in your model.
The reason for the exception is that a lot of people specify models with indicator variables (e.g., female having values of 1 if true and 0 if false) in place of factors. We generally want those treated like factors rather than numeric predictors.
To be honest I'm not exactly sure what's going on with the expansion (276, the 'correct' number of contrasts, is choose(24,2), the 'incorrect' number of contrasts is 1128 = choose(48,2)), but I would say that you should probably be following the guidance in the "offsets" section of one of the emmeans vignettes where it says
If a model is fitted and its formula includes an offset() term, then by default, the offset is computed and included in the reference grid. ...
However, many users would like to ignore the offset for this kind of model, because then the estimates we obtain are rates per unit value of the (logged) offset. This may be accomplished by specifying an offset parameter in the call ...
The most natural choice for setting the offset is to 0 (i.e. make predictions etc. for a sample size of 1), but in this case I don't think it matters.
get_contr <- function(x) as_tibble(x$contrasts)
cfun <- function(m) {
emmeans::emmeans(m,
pairwise~trt_time:pre_trt:storage_time, offset=0) |>
get_contr()
}
nrow(cfun(mod.good)) ## 276
nrow(cfun(mod.bad)) ## 276
From a statistical point of view I question the wisdom of looking at 276 pairwise comparisons, but that's a different issue ...

Split plot design with nested random effects and interactions between random effects

I am trying to analyse a split plot design for a plant growth experiment with these variables:
Biomass (dependent variable)
Transect (sub plot factor with three levels)
Treatment (main plot factor with two levels)
Block (2 blocks in total, serving as replicates of the treatment)
Location (multiple locations within each transect point)
I know what the random effect structure should look like. However, I can’t work out how to write this in R script. Could someone please help me? It’s probably very easy, but I have been looking for hours and hours and can’t find it.
Random effects should be:
Block
Interaction Block and Treatment
Location nested within Transect
(Location nested within Transect), interaction with Treatment
So perhaps something like:
(1|block) + (1|block*treatment) + (1|location:transect) +
(1|(location:transect)*treatment)
OK, I'll take a shot at this.
First: in 'modern' mixed model approaches it is not practical to treat a two-level categorical variable as random. In 'classical' method-of-moment/SSQ ratio approaches it works, although the power is terrible; in modern methods you will end up with 'singular models' (do a web search for "GLMM FAQ" or search here and on CrossValidated for more info). (The exception to this statement is if you go full-Bayesian and put regularizing priors on the random-effects parameters ...) Therefore, I'm going to take block as a fixed effect.
This would be (I think) your maximal model:
~ treatment*block + (treatment|transect/location)
treatment*block (expands to 1 + block + treatment + block:treatment: the baseline biomass (intercept) could differ between blocks, the treatments could differ, the treatment effect could differ between blocks
(treatment|transect/location) (expands to (1+treatment|transect) + (1+treatment|transect:location)); the intercept and treatment effect vary among transects and among locations within transects. (This assumes that transects are uniquely coded between blocks, i.e. you don't have a transect 001 in both blocks, rather they are labeled something like A001 and B001. If not, you need something like (1+treatment|block:(transect/location)) ...
This also assumes you have multiple observations per transect/location/treatment combination. If not (if each treatment is observed only once per location), then the full interaction will be confounded with the residual variation and you instead need something like (1+treatment|transect) + (1|transect:location).

Nested model in R

I'm having a huge problem with a nested model I am trying to fit in R.
I have response time experiment with 2 conditions with 46 people each and 32 measures each. I would like measures to be nested within people and people nested within conditions, but I can't get it to work.
The code I thought should make sense was:
nestedmodel <- lmer(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
However, all I get is an error:
Error in checkNlevels(reTrms$flist, n = n, control) :
number of levels of each grouping factor must be < number of observations
Unfortunately, I do not even know where to start looking what the problem is here.
Any ideas? Please, please, please? =)
Cheers!
This might be more appropriate on CrossValidated, but: lme4 is trying to tell you that one or more of your random effects is confounded with the residual variance. As you've described your data, I don't quite see why: you should have 2*46*32=2944 total observations, 2*46=92 combinations of condition and person, and 46*32=1472 combinations of measure and person.
If you do
lf <- lFormula(responsetime ~ 1 + condition +
(1|condition:person) + (1|person:measure), data=dat)
and then
lapply(lf$reTrms$Ztlist,dim)
to look at the transposed random-effect design matrices for each term, what do you get? You should (based on your description of your data) see that these matrices are 1472 by 2944 and 92 by 2944, respectively.
As #MrFlick says, a reproducible example would be nice. Other things you could show us are:
fit the model anyway, using lmerControl(check.nobs.vs.nRE="ignore") to ignore the test, and show us the results (especially the random effects variances and the statement of the numbers of groups)
show us the results of with(dat,table(table(interaction(condition,person))) to give information on the number of replicates per combination (and similarly for measure)

Repeated-measures ANOVA in R: +Error(subject) or + Error(subject/ VI1 * VI2)?

I have spent a lot of time on multiple posts and tutorials, but I still do not understand wich "rule" I have to apply to my current data, and why.
My experiment follows a within-subject design, as every subjects (n=17) performed a task in 2 conditions, accross 5 blocks of trials. VD is the mean RTs, fixed effects are condition and block, random effect is subject.
I would like to analyse the interaction between condition and block.
Using aov
I first aggregated my data:
ag<-aggregate(RT~condition+block+subject,data=d, FUN=mean)
But then I don't know if I have to include my within-subject factors into my Error term:
(1)aov<-aov(RT~ condition * block + Error(subject/(condition * block)), data=ag)
OR
(2)aov<-aov(RT~ condition * block + Error(subject), data=ag)
I have seen on several posts that the within factors have to be included in the error term, as in (1), but I do not understand how the dfs are calculated.
Using lmer
Additionally, I would like to attempt using lmer instead of aov.
I suspect that the equivalent of (2) would be:
lmer(RT ~ 1+(1|sujet)+condition*block, ag)
But if it is the (1) which is the correct one, I can not figure out how does it would have to be specified using lmer.

Regression coefficients by group in dataframe R

I have data of various companies' financial information organized by company ticker. I'd like to regress one of the columns' values against the others while keeping the company constant. Is there an easy way to write this out in lm() notation?
I've tried using:
reg <- lmList(lead2.dDA ~ paudit1 + abs.d.GINDEX + logcapx + logmkvalt +
logmkvalt2|pp, data=reg.df)
where pp is a vector of company names, but this returns coefficients as though I regressed all the data at once (and did not separate by company name).
A convenient and apparently little-known syntax for estimating separate regression coefficients by group in lm() involves using the nesting operator, /. In this case it would look like:
reg <- lm(lead2.dDA ~ 0 + pp/(paudit1 + abs.d.GINDEX + logcapx +
logmkvalt + logmkvalt2), data=reg.df)
Make sure that pp is a factor and not a numeric. Also notice that the overall intercept must be suppressed for this to work; in the new formulation, we have a different "intercept" for each group.
A couple comments:
Although the regression coefficients obtained this way will match those given by lmList(), it should be noted that with lm() we estimate only a single residual variance across all the groups, whereas lmList() would estimate separate residual variances for each group.
Like I mentioned in my earlier comment, the lmList() syntax that you gave looks like it should have worked. Since you say it didn't, this leads me to expect that really the problem is something else (although it's hard to tell what without a reproducible example), and so it seems likely that the solution I posted will fail for you as well, for the same unknown reasons. If you want more detailed guidance, please provide more information; help us help you.

Resources