Interaction contrast with glmer in R

I am running a model with a structure similar to this:
model <- glmer(protest ~ factora*factorb*numeric + factora + factorb + numeric + 1 + (1 + factor1|level1) + (1|level2), data = data, family = binomial(link = "logit"))
where factora and factorb are factor variables, numeric is a numerical variable.
I want to test the statistical significance of the interaction contrast between pairs of levels of factorb (1 through 5), holding factora constant at level 3, across the range of the numeric variable.
I have tried the following options with no luck:
library(psycho)
contrasts <- get_contrasts(model, formula = "factora:factorb:numeric", adjust = "tukey")
View(contrasts$contrasts)
This works, but unfortunately the results hold numeric constant and only vary factora and factorb, so it does not answer my question.
I have also tried:
library(multcomp)
test <- glht(model, linfct = mcp("factora:factorb:numeric" = "Tukey"))
This yields the error:
Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘factora:factorb:numeric’ have been specified in ‘linfct’ but cannot be found in ‘model’!
regardless of how I specify the interaction, and despite other functions like get_contrasts finding the interaction when it is specified the same way.
I have also tried:
library(emmeans)
contrast(m.3[[2]], interaction = c("factora", "factorb", "numeric"))
This, however, does not appear to support glmer.
Any ideas?

There are a couple of issues here that are tripping you up.
One is that we don't really apply contrasts to numeric predictors. Numeric predictors have slopes, not contrasts; and if you have a model where a numeric predictor interacts with a factor, that means that the slope for the numeric predictor is different for each level of the factor. The function emtrends() in the emmeans package can help you estimate those different slopes.
The second is that the interaction argument in emmeans::contrast() needs a specification for the type of contrasts to use, e.g., "pairwise". The factors to apply them to are those in the emmGrid object in the first argument.
So... I think maybe you want to try something like this:
emt <- emtrends(model, ~ factora*factorb, var = "numeric")
emt # list the estimated slopes
contrast(emt, interaction = "consec")
# obtain interaction contrasts comparing consecutive levels of the two factors
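If the goal is specifically the contrast described in the question (comparing the slope for numeric between levels of factorb while holding factora fixed at 3), you can restrict the reference grid. A minimal sketch, assuming "3" is a valid level of factora:
emt3 <- emtrends(model, ~ factorb | factora, var = "numeric",
                 at = list(factora = "3"))
pairs(emt3)  # pairwise slope comparisons across factorb, Tukey-adjusted by default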

Related

Calculating VIF for ordinal logistic regression & multicollinearity in R

I am running an ordinal regression model. I have 8 explanatory variables: 4 of them categorical ('0' or '1'), 4 of them continuous. Beforehand, I want to be sure there is no multicollinearity, so I use the variance inflation factor (the vif function from the car package):
mod1 <- polr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, Hess = TRUE, data = df)
vif(mod1)
but I get a VIF value of 125 for one of the variables, as well as the following warning:
Warning message: In vif.default(mod1) : No intercept: vifs may not be sensible.
However, when I convert my dependent variable to numeric (instead of a factor) and do the same thing with a linear model:
mod2 <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df)
vif(mod2)
This time all the VIF values are below 3, suggesting that there's no multicollinearity.
I am confused about the vif function. How can it return VIFs > 100 for one model and low VIFs for another? Should I stick with the second result and still do an ordinal model anyway?
The vif() function uses determinants of the correlation matrix of the parameters (and subsets thereof) to calculate the VIF. In the linear model, this includes just the regression coefficients (excluding the intercept).

The vif() function wasn't intended to be used with ordered logit models. So, when it finds the variance-covariance matrix of the parameters, it includes the threshold parameters (i.e., intercepts), which would normally be excluded by the function in a linear model. This is why you get the warning you get: it doesn't know to look for threshold parameters and remove them.

Since the VIF is really a function of inter-correlations in the design matrix (which doesn't depend on the dependent variable or the non-linear mapping from the linear predictor into the space of the response variable, i.e., the link function in a glm), you should get the right answer with your second solution above, using lm() with a numeric version of your dependent variable.
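A quick way to convince yourself that the dependent variable plays no role: the VIF for a given predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. A minimal sketch for one of the continuous predictors (variable names follow the question):
# Y never appears in this calculation
r2 <- summary(lm(X1 ~ X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df))$r.squared
1 / (1 - r2)  # should match vif(mod2)["X1"]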

Specify varying-coefficient and factor-level replicate in mgcv

I am using the mgcv package in R to fit smooths. I am interested in fitting a varying-coefficient model, where the varying-coefficient smooth also varies based on a factor variable.
According to the mgcv documentation, when specifying a smooth with s(), the by argument can take either a numeric variable, in which case a varying-coefficient model is fit, OR a factor variable, in which case a replicate of the smooth is produced for each factor level. However, the documentation does not say how to specify a model with a varying-coefficient effect AND have that effect differ across multiple factor levels. I don't see any reason why this wouldn't be possible, so it is somewhat odd that these two different effects are specified by the same argument.
I reached out to the creator of the mgcv package with this question, and here is the response I got:
This should be possible, but there may be an identifiability problem.
Suppose f is a factor with levels "a", "b", "c", x is your smoothing
variable, and z is your covariate of interest. You want a model something like
y = z s_f(x) + noise
Then
da <- as.numeric(f == "a") * z
db <- as.numeric(f == "b") * z
dc <- as.numeric(f == "c") * z
gam(y ~ s(x, by = da) + s(x, by = db) + s(x, by = dc) - 1)
would fit the model. The only problem with this is the -1. We need it
because otherwise the intercepts of the smooths are confounded with the
overall intercept. -1 deals with the identifiability in this simple
case, but won't in some other situations - for example if you include a
factor variable parametrically in the model, or have a second set of
smooths conditional on factor levels in this way.
A possibility in these cases would be to use the select=TRUE argument
to gam, which will formally remove the identifiability problem.
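For concreteness, here is a minimal runnable version of the approach in that reply, using simulated data (the names x, z, f, and y follow the reply; the rest is invented for illustration):
library(mgcv)
set.seed(1)
n <- 300
d <- data.frame(x = runif(n), z = rnorm(n),
                f = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
# dummy-times-covariate variables, one per factor level
d$da <- as.numeric(d$f == "a") * d$z
d$db <- as.numeric(d$f == "b") * d$z
d$dc <- as.numeric(d$f == "c") * d$z
# simulate a different coefficient function for each level of f
d$y <- with(d, da * sin(2 * pi * x) + db * cos(2 * pi * x) + dc * x^2) +
  rnorm(n, sd = 0.3)
fit <- gam(y ~ s(x, by = da) + s(x, by = db) + s(x, by = dc) - 1, data = d)
summary(fit)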

Is it necessary to include a factor (fitted as a smooth-factor interaction) as a parametric term within a gam?

I am interested in investigating a non-linear temporal trend in a data set and so I would like to use the R package mgcv to fit the following GAM:
model1 <- gam(Variable ~ s(Date, by = Site.Factor), data = data)
where Variable is the continuous variable of interest, Site.Factor is a factor with two levels and Date is a continuous variable.
I have read that, because of the inclusion of the by factor within the smoothing function, differences in the means of the two factor levels are not accounted for. I should therefore include Site.Factor as a parametric term, like so:
model2 <- gam(Variable ~ Site.Factor + s(Date, by = Site.Factor), data = data)
However, whilst I might expect the influence of Site.Factor on the smooth to be significant, I do not expect the means of each level of the factor to differ significantly. Do I still need to include the factor separately within the model, as in model2, or would model1 be okay?
Unless you know that the populations from which your data are drawn have exactly the same mean, then yes, you should include Site.Factor as a parametric (fixed-effect) term, whether or not that difference is significant in your sample.
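One way to see why: mgcv centers each by-level smooth (a sum-to-zero identifiability constraint), so model1 forces both sites through a single overall mean. A minimal sketch with simulated data (all names invented for illustration):
library(mgcv)
set.seed(42)
n <- 200
d <- data.frame(Site.Factor = factor(rep(c("A", "B"), each = n / 2)),
                Date = runif(n))
# the two sites share a trend shape but differ in mean level
d$Variable <- sin(2 * pi * d$Date) + ifelse(d$Site.Factor == "B", 2, 0) +
  rnorm(n, sd = 0.2)
m1 <- gam(Variable ~ s(Date, by = Site.Factor), data = d)                # model1
m2 <- gam(Variable ~ Site.Factor + s(Date, by = Site.Factor), data = d)  # model2
AIC(m1, m2)  # m2 should fit much better whenever the group means differ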

Tukey HSD for multiple variables and a single variable returns different results

I have tried to run Tukey HSD for a multi-variable dataset. However, when I run the same test on a single variable, the results are completely opposite.
While running it with multiple variables, I observed the following messages in the ANOVA output:
8 out of 87 effects not estimable
Estimated effects may be unbalanced
While running it with a single variable, I observed only:
Estimated effects may be unbalanced
Is this in any way related to the completely opposite Tukey HSD output which I received? Also, how do I go on solving this problem?
I used aov() and have close to 500,000 data points in my dataset.
To be more specific, the following two pieces of code gave me different results:
code1:
lm_test1 <- lm(y ~ x1 + x2, data = data)
glht(lm_test1, linfct = mcp(x1 = "Tukey"))
code2:
lm_test1 <- lm(y ~ x1, data = data)
glht(lm_test1, linfct = mcp(x1 = "Tukey"))
Please tell me how this is possible...
After some more research, I found the answer, so I thought I should post it. ANOVA in R is Type I (sequential) by default. That means the effects of the first variable entered are assessed without controlling for any other factors, while for the variables entered afterwards, the results are shown after controlling for the effects of the variables before them. Since I was entering my variable second, the results shown were after controlling for the first variable, and they happened to point in the completely opposite direction from the direct effect.
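A quick way to see the order dependence, and one remedy, using the hypothetical y, x1, and x2 from the code above:
anova(lm(y ~ x1 + x2, data = data))  # x2 is tested after adjusting for x1
anova(lm(y ~ x2 + x1, data = data))  # x1 is tested after adjusting for x2
library(car)
Anova(lm(y ~ x1 + x2, data = data), type = 2)  # Type II tests don't depend on entry order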

Post hoc test in Generalised linear mixed models: how to do?

I am working with a mixed model (glmmadmb) in R for count data. I have one random factor (Locality) and one fixed factor (Habitat). The fixed factor has two levels, and the random factor has seven levels. I want to compare the two levels of the fixed factor within each of the seven levels of the random factor, but I don't know how to do this in R. I am very new to R. Can anyone help me? Many thanks.
This is my glmm formula for overdispersed data:
model <- glmmadmb(Species.abundance ~ Habitat + (1|Locality:Habitat),
                  data = data, family = "nbinom1")
I tried it with just "Habitat", but it is clearly not taking Locality into account:
summary(glht(model, linfct = mcp(Habitat = "Tukey")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: glmmadmb(formula = Species.abundance ~ Habitat + (1 | Locality:Habitat),
data = data, family = "nbinom1")
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
Fynbos - Forest == 0 -0.2614 0.2010 -1.301 0.193
(Adjusted p values reported -- single-step method)
I would probably just do separate tests within each Locality, and do multiple-comparison corrections if you like. Functions from plyr are convenient, but not necessary, to do this, something like
library(plyr)
library(glmmADMB)
model.list <- dlply(data, "Locality", glmmadmb,
                    formula = Species.abundance ~ Habitat,
                    family = "nbinom1")
p.vals <- laply(model.list, function(x) coef(summary(x))[2, "Pr(>|z|)"])
p.adjust(p.vals)
(I can't guarantee that this actually works since you haven't given a reproducible example and I can't be bothered to invent one ...)
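If you would rather avoid plyr, the same idea works in base R (a sketch under the same assumptions about your data and variable names):
library(glmmADMB)
model.list <- lapply(split(data, data$Locality), function(d)
  glmmadmb(Species.abundance ~ Habitat, data = d, family = "nbinom1"))
p.vals <- sapply(model.list, function(x) coef(summary(x))[2, "Pr(>|z|)"])
p.adjust(p.vals)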
