Simple effect tests on glmm significant interactions in R

I have significant interactions in my glmm output, and I know what my reference category (another interaction) is, but I need to know which variable is held fixed when contrasting the significant interaction with the reference category.
I am looking for a post-hoc test called "simple effects test" (not Tukey). The same test is called "test slice" in JMP for any of you who use both R and JMP.
I have looked everywhere but cannot find the code for the simple effects test. Does anyone know how to use this test in R?
Here is an example of my glmm (using neg. binomial distribution) output:
Call:
glm.nb(formula = N ~ FoodCategory * Season + FoodCategory + Season +
(1 | Group/Animal), data = SPwg, init.theta = 0.8744631431,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4796 -0.9720 -0.3713 -0.0350 4.7595
Coefficients: (2 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.2763 0.2940 0.939 0.34748
FoodCategoryFruit 0.8849 0.3316 2.669 0.00762 **
FoodCategoryInvertebrate -0.1962 0.5086 -0.386 0.69966
FoodCategoryPlantMatter 0.4169 1.3153 0.317 0.75128
SeasonHFLC -0.2250 0.4435 -0.507 0.61195
SeasonLFLC -0.2763 0.4610 -0.599 0.54904
1 | Group/AnimalTRUE NA NA NA NA
FoodCategoryFruit:SeasonHFLC 1.1511 0.4811 2.393 0.01673 *
FoodCategoryInvertebrate:SeasonHFLC 1.6265 0.6784 2.398 0.01651 *
FoodCategoryPlantMatter:SeasonHFLC NA NA NA NA
FoodCategoryFruit:SeasonLFLC 1.5565 0.4997 3.115 0.00184 **
FoodCategoryInvertebrate:SeasonLFLC 0.3016 0.7822 0.386 0.69984
FoodCategoryPlantMatter:SeasonLFLC 0.8640 1.4630 0.591 0.55479
---
My reference category is "FoodCategoryOther:SeasonHFHL". I know from this output, for example, that "FoodCategoryFruit:SeasonLFLC" is significantly more positive than my reference category.
However, I do not know if this is because "FoodCategoryFruit" is significantly more positive than "FoodCategoryPlantMatter" during the "SeasonLFLC" (for example) or if "FoodCategoryFruit" is significantly more positive during the "SeasonLFLC" than "FoodCategoryFruit" is during the "SeasonHFHL".
A simple effects test will fix one of the variables while testing for the effects of the other. This is what I need to work out the problem, unless someone can inform me of a similar/better/more appropriate test. However, please don't tell me Tukey, because this post-hoc test does not fix one variable while testing for the effects of the other.

This... is not a GLMM (generalized linear mixed model). You're fitting a regular old GLM with fixed effects only, albeit with the wrinkle of a negative binomial error distribution. Because glm.nb doesn't understand random-effects notation, your (1 | Group/Animal) term has been interpreted as an arithmetic/logical expression, i.e. 1 ORed with the result of Group divided by Animal. 1 ORed with anything is identically TRUE, hence the NA coefficient for this term.
For an actual GLMM, you'll need to use something like glmer in the lme4 package, or the arm package (and possibly others I don't know about).
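Once the model is refitted as a genuine GLMM, the simple-effects ("slice"-style) contrasts asked about can be obtained with the emmeans package, which holds one factor fixed while comparing levels of the other. A minimal sketch, assuming the variable names from the question (glmer.nb in recent lme4 versions fits a negative binomial GLMM):
library(lme4)
library(emmeans)
## negative binomial GLMM with the intended random effect
m <- glmer.nb(N ~ FoodCategory * Season + (1 | Group/Animal), data = SPwg)
## simple effects of FoodCategory within each Season ...
pairs(emmeans(m, ~ FoodCategory | Season))
## ... and of Season within each FoodCategory
pairs(emmeans(m, ~ Season | FoodCategory))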

How to get value of group = 0 in linear mixed model

This is probably a very simple stats question.
So, I am fitting linear mixed models like this:
lme(dependent ~ Group + Sex + Age + npgs, data=boookclub, random = ~ 1| subject)
Group is a factor with levels 0, 1, 2, 3.
The dependent variables are continuous and standardized (mean 0). The other terms are covariates: Sex is a factor with Male/Female levels, Age is numeric, and npgs is numeric, continuous, and standardized as well.
When I get the table with beta, standard error, t and p values, I get this:
Value Std.Error DF t-value p-value
(Intercept) -0.04550502 0.02933385 187 -1.551280 0.0025
Group1 0.04219801 0.03536929 181 1.193069 0.2344
Group2 0.03350827 0.03705896 181 0.904188 0.3671
Group3 0.00192119 0.03012654 181 0.063771 0.9492
SexMale 0.03866387 0.05012901 181 0.771287 0.4415
Age -0.00011675 0.00148684 181 -0.078520 0.9375
npgs 0.15308844 0.01637163 181 9.350835 0.0000
SexMale:Age 0.00492966 0.00276117 181 1.785352 0.0759
My problem is: how do I get the beta for Group0? In this case the intercept corresponds to Group0, but also to the average of npgs, since npgs is standardized. How can I check whether Group0 is significantly associated with the dependent variable? I'd like to see the effect of all Group levels.
Thanks
The easiest way to do what you want may be with the emmeans package, but you may also have some conceptual issues. Technical details first, then conceptual:
Technical
Fitting an example (this isn't necessarily statistically sensible, but I wanted an example with a categorical fixed effect)
library(nlme)
m1 <- lme(Yield~Variety, random = ~1|Block, data=Alfalfa)
As with your example, the effects are "intercept" (= mean of the baseline group, which is the "Cossack" variety in this case [by default, the alphabetically first group]), "Ladak" (difference between Ladak and Cossack means) and "Ranger" (similarly). (As @Ben hints in the comments above, R automatically generates dummies for [most of] the levels of the categorical variables [factors] in your model.)
coef(summary(m1))
## Value Std.Error DF t-value p-value
## (Intercept) 1.57166667 0.11665326 64 13.4729767 2.373343e-20
## VarietyLadak 0.09458333 0.07900687 64 1.1971532 2.356624e-01
## VarietyRanger -0.01916667 0.07900687 64 -0.2425949 8.090950e-01
The emmeans package is a convenient way to see predicted values for each group without recoding.
library(emmeans)
emmeans(m1, spec = ~Variety)
## Variety emmean SE df lower.CL upper.CL
## Cossack 1.57 0.117 5 1.27 1.87
## Ladak 1.67 0.117 5 1.37 1.97
## Ranger 1.55 0.117 5 1.25 1.85
Conceptual
You can't "check if Group0 is significantly associated with the dependent [response] variable". You can only check whether the response variables differs significantly between two groups, or whether it differs significantly among all groups (e.g. the results of anova()). You have to pick a baseline. (If you insist, you can test all pairwise comparisons among groups; emmeans can help with this too.) If you "remove the intercept" (by fitting Variety ~Yield-1, or by looking at the results that emmeans produces) then the difference you are quantifying is the difference between the mean of a particular group and zero. This is usually not a meaningful question; in the example here, for instance, this would be testing whether a wheat variety gave a yield that was significantly greater than zero — probably not very interesting.
On the other hand, if you are just interested in estimating the expected value in each group (conditioning on the baseline values of the other variables in the model), along with the standard errors/CIs, then the answers you get from emmeans are perfectly sensible.
There's a related question here that explains why you get an NA value if you manually create dummies for every level of your factor ...

GLM in R: Difference between estimates from glmer-summary and from effects-package

I set up a GLMM in R using glmer() from lme4 package. I used effects package for calculating estimates and CIs for fixed effects.
I'm new to GLMMs, so my question is: how are the estimates provided by the effects package calculated, and in what way do they differ from the log-means given in the glmer summary?
For example if I run
Model = glmer(response ~ fixed1 + fixed2 + (1 | random), data=df, family = poisson)
summary(Model) returns the following:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.1459 0.4863 -0.300 0.764
fixed1_level2 0.3044 0.4479 0.680 0.497
fixed2_level2 0.2298 0.3212 0.716 0.474
fixed2_level3 0.3576 0.3368 1.062 0.288
Whereas summary(allEffects(Model)) returns this:
fixed1 effect
fixed1_level1 fixed1_level2
1.125860 1.526514
fixed2 effect
fixed2_level1 fixed2_level2 fixed2_level3
1.115492 1.403738 1.594999
For what it's worth, this question is not specific to mixed models - it would be applicable to any generalized linear model.
The standard parameterization for models with categorical predictors in R (treatment contrasts) is that the intercept term gives the expected value for the first level of the factor on the "linear predictor" or "link" scale (the log scale in this case), while the second and subsequent terms give the differences (again on the log scale) between the expected values of the second, third, ... levels and the first level. Thus (if the parameters are b0, b1, b2) the predicted value of the first level is exp(b0), of the second is exp(b0+b1), of the third is exp(b0+b2). Testing for your example:
> exp(-2.4858)
[1] 0.08325892
> exp(-2.4858+1.6187)
[1] 0.4201683
> exp(-2.4858+0.8966)
[1] 0.2040888
These match up to round-off error.
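(The values that allEffects reports differ from these simple back-transformed sums because the effects package computes predicted values on the response scale while averaging over the levels of the other predictors.) A minimal sketch of both routes, assuming the Model fit from the question (coefficient positions are taken from the summary above and may differ in your session):
## fixed-effect coefficients are on the log (link) scale
b <- fixef(Model)
exp(b[1])         # predicted response at the baseline level of both factors
exp(b[1] + b[2])  # baseline shifted to the second level of fixed1
## emmeans back-transforms for you; averaging over fixed2 gives
## effects-style estimates for fixed1
library(emmeans)
emmeans(Model, ~ fixed1, type = "response")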

Svyglm in package survey in R not returning Std Errors

I'd really appreciate some assistance with this. I'd like to estimate coefficients and 95% CI for a glm that is applied to a household survey with 2 levels (defined by dd and hh.num1). I've only recently come across the package survey.
I've been following the examples within the vignette for (1) setting up a dataset to account for the sampling design, using svydesign, and (2) setting up a glm using the command svyglm. For the example dataset:
library(survey)
data(api)
head(apiclus1)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1)
logitmodel <- svyglm(I(sch.wide=="Yes")~awards+comp.imp+enroll+target+hsg+pct.resp+mobility+ell+meals, design=dclus1, family=quasibinomial())
summary(logitmodel)
Adding lots of variables seems OK so I'm confident that the package is working with a good dataset.
When I do the same to my dataset, the std errors return with "Inf" if 3 or 4 variables are added in and I can't figure out why. It seems as though it's more common with factors. I'm sorry that I haven't been able to replicate the error with the other examples, but the dataset could be downloaded here.
So using this dataset:
load("balo2_7March17.Rdat")
dclus1 <- svydesign(id=~dd+hh.num1, weights=~chweight, data = balo2)
glm1 <- svyglm(out.penta ~ factor(MN18c) + windex5 + age.y,
design=dclus1, family=quasibinomial())
summary(glm1)
If MN18c is numeric then the standard errors are produced; if it's a factor (and it should be) the standard errors are Inf. Short of knowing what else to do I'll need to try the analysis in Stata. I saw some commentary that errors may occur if applied to a "bad" dataset, but what constitutes "bad"?
The problem is that you have zero residual degrees of freedom in your model. The residual df is the design df (the number of PSUs minus the number of strata) minus the number of predictors, which can easily get negative when you have two large clusters per stratum. This definition of residual df is probably conservative, but it's not a straightforward question.
> degf(dclus1)
[1] 5
> glm1$df.resid
[1] 0
You can extract the standard errors with
> SE(glm1)
(Intercept) factor(MN18c)2 factor(MN18c)3 factor(MN18c)4 windex5
0.5461374 0.4655331 0.2805168 0.3718879 0.1376936
age.y
0.1638210
and if you are willing to use a different residual degrees of freedom, you can specify that to summary and get $p$-values. In particular, if none of your covariates are at the cluster level, there is a reasonable argument that the regression doesn't use up degrees of freedom and so for one parameter at a time you can do
> summary(glm1, df=degf(dclus1))
Call:
svyglm(formula = out.penta ~ factor(MN18c) + windex5 + age.y,
design = dclus1, family = quasibinomial())
Survey design:
svydesign(id = ~dd + hh.num1, weights = ~chweight, data = balo2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.0848 0.5461 -5.648 0.00241 **
factor(MN18c)2 -0.1183 0.4655 -0.254 0.80957
factor(MN18c)3 -0.4908 0.2805 -1.750 0.14059
factor(MN18c)4 -0.6137 0.3719 -1.650 0.15981
windex5 0.2556 0.1377 1.856 0.12256
age.y 0.9934 0.1638 6.064 0.00176 **
Combining parameters (e.g. to test the three coefficients making up MN18c) is more problematic, and I think you at least need df=degf(dclus1)-3+1.
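A sketch of that joint test, using the survey package's regTermTest with the (debatable) residual df just described:
## joint Wald test of the three MN18c coefficients
regTermTest(glm1, ~factor(MN18c), df = degf(dclus1) - 3 + 1)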
In the forthcoming version 4.1 the package will report standard errors in this situation (but not $p$-values unless a different df= is specified).

Post-hoc test for glmer

I'm analysing my binomial dataset with R using a generalized linear mixed model (glmer, lme4-package). I wanted to make the pairwise comparisons of a certain fixed effect ("Sound") using a Tukey's post-hoc test (glht, multcomp-package).
Most of it is working fine, but one of my fixed effect variables ("SoundC") has no variance at all (96 times a "1" and zero times a "0") and it seems that the Tukey's test cannot handle that. All pairwise comparisons with this "SoundC" give a p-value of 1.000 whereas some are clearly significant.
As a validation I changed one of the 96 "1"'s to a "0" and after that I got normal p-values again and significant differences where I expected them, whereas the difference had actually become smaller after my manual change.
Does anybody have a solution? If not, is it fine to use the results of my modified dataset and report my manual change?
Reproducible example:
Response <- c(1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,1,0,
0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,
1,1,0,1,1,0,1,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,0,1)
Data <- data.frame(Sound=rep(paste0('Sound',c('A','B','C')),22),
Response,
Individual=rep(rep(c('A','B'),2),rep(c(18,15),2)))
# Visual
boxplot(Response ~ Sound,Data)
# Mixed model
library (lme4)
model10 <- glmer(Response~Sound + (1|Individual), Data, family=binomial)
# Post-hoc analysis
library (multcomp)
summary(glht(model10, mcp(Sound="Tukey")))
This is verging on a CrossValidated question; you are definitely seeing complete separation, where there is a perfect division of your response into 0 vs 1 results. This leads to (1) infinite values of the parameters (they're only listed as non-infinite due to computational imperfections) and (2) crazy/useless values of the Wald standard errors and corresponding $p$ values (which is what you're seeing here). Discussion and solutions are given here, here, and here, but I'll illustrate a little more below.
To be a statistical grouch for a moment: you really shouldn't be trying to fit a random effect with so few levels anyway (only two in your reproducible example; see e.g. http://glmm.wikidot.com/faq) ...
Firth-corrected logistic regression:
library(logistf)
L1 <- logistf(Response~Sound*Individual,data=Data,
contrasts.arg=list(Sound="contr.treatment",
Individual="contr.sum"))
coef se(coef) p
(Intercept) 3.218876e+00 1.501111 2.051613e-04
SoundSoundB -4.653960e+00 1.670282 1.736123e-05
SoundSoundC -1.753527e-15 2.122891 1.000000e+00
IndividualB -1.995100e+00 1.680103 1.516838e-01
SoundSoundB:IndividualB 3.856625e-01 2.379919 8.657348e-01
SoundSoundC:IndividualB 1.820747e+00 2.716770 4.824847e-01
Standard errors and p-values are now reasonable (p-value for the A vs C comparison is 1 because there is literally no difference ...)
Mixed Bayesian model with weak priors:
library(blme)
model20 <- bglmer(Response~Sound + (1|Individual), Data, family=binomial,
fixef.prior = normal(cov = diag(9,3)))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.711485 2.233667 0.7662221 4.435441e-01
## SoundSoundB -5.088002 1.248969 -4.0737620 4.625976e-05
## SoundSoundC 2.453988 1.701674 1.4421024 1.492735e-01
The specification diag(9,3) of the fixed-effect variance-covariance matrix produces
$$
\left(
\begin{array}{ccc}
9 & 0 & 0 \\
0 & 9 & 0 \\
0 & 0 & 9
\end{array}
\right)
$$
In other words, the 3 specifies the dimension of the matrix (equal to the number of fixed-effect parameters), and the 9 specifies the variance -- this corresponds to a standard deviation of 3 or a 95% range of about $\pm 6$, which is quite large/weak/uninformative for logit-scaled responses.
These estimates are roughly consistent with the Firth-corrected results above (even though the model is quite different):
library(multcomp)
summary(glht(model20, mcp(Sound="Tukey")))
## Estimate Std. Error z value Pr(>|z|)
## SoundB - SoundA == 0 -5.088 1.249 -4.074 0.000124 ***
## SoundC - SoundA == 0 2.454 1.702 1.442 0.309216
## SoundC - SoundB == 0 7.542 1.997 3.776 0.000397 ***
As I said above, I would not recommend a mixed model in this case anyway ...

Getting Generalized Least Squares Means for fixed effects in nlme or lme4

Least Squares Means with their standard errors for aov object can be obtained with model.tables function:
npk.aov <- aov(yield ~ block + N*P*K, npk)
model.tables(npk.aov, "means", se = TRUE)
I wonder how to get the generalized least squares means with their standard errors from nlme or lme4 objects:
library(nlme)
data(Machines)
fm1Machine <- lme(score ~ Machine, data = Machines, random = ~ 1 | Worker )
Any comment and hint will be highly appreciated. Thanks
lme and nlme fit through maximum likelihood or restricted maximum likelihood (the latter is the default), so your results will be based on one of those two methods.
summary(fm1Machine) will provide you with the output that includes the means and standard errors:
....irrelevant output deleted
Fixed effects: score ~ Machine
Value Std.Error DF t-value p-value
(Intercept) 52.35556 2.229312 46 23.48507 0
MachineB 7.96667 1.053883 46 7.55935 0
MachineC 13.91667 1.053883 46 13.20514 0
Correlation:
....irrelevant output deleted
Because you have fitted the fixed effects with an intercept, you get an intercept term in the fixed effects result instead of a result for MachineA. The results for MachineB and MachineC are contrasts with the intercept, so to get the means for MachineB and MachineC, add the value of each to the intercept mean. But the standard errors are not the ones you would like.
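For example, with the output above (agreeing with the no-intercept fit below, up to rounding):
## means implied by the intercept parameterization
52.35556 + 7.96667   # MachineB: 60.32223
52.35556 + 13.91667  # MachineC: 66.27223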
To get the information you are after, fit the model so it doesn't have an intercept term in the fixed effects (note the -1 at the end of the fixed-effects formula):
fm1Machine <- lme(score ~ Machine-1, data = Machines, random = ~ 1 | Worker )
This will then give you the means and standard error output you want:
....irrelevant output deleted
Fixed effects: score ~ Machine - 1
Value Std.Error DF t-value p-value
MachineA 52.35556 2.229312 46 23.48507 0
MachineB 60.32222 2.229312 46 27.05867 0
MachineC 66.27222 2.229312 46 29.72765 0
....irrelevant output deleted
To quote Douglas Bates from
http://markmail.org/message/dqpk6ftztpbzgekm
"I have a strong suspicion that, for most users, the definition of lsmeans is "the numbers that I get from SAS when I use an lsmeans statement". My suggestion for obtaining such numbers is to buy a SAS license and use SAS to fit your models."
