Failed to contrast intercepts through emmeans in R - r

I would like to test the simetry in the response of an observer to a contrast stimuli with different polarity, positive (white) and negative (black). I took the reaction time (RT) as dependent variable, along four different contrasts. It is known that the response time follows a Pieron curve whose asymptotas are placed (1) at observer threshold (Inf) and (2) at a base RT placed somewere between 250 and 450 msec.
The knowledge allows us to linearize the relationship transforming the independent variable (effective contrast EC) as 1/EC^2 (tEC), so the equation linking RT to EC becomes:
RT = m * tEC + RT0
To test the symmetry I established the criteria: same slope and same intercept in the two polarities implies symmetry.
To obtain the coefficients I made a linear model with interaction (coding trough a dummy variable for Polarity: Positive or Negative). The output of lm is clear to me, but some colegues prefer somthing more similar to an ANOVA output. So I decided to use emmeans to make the contrasts. With the slope is all right, but when computing the interceps starts the problem. The intercepts computed by lm are very different from the output of emmeans, and the conclusions are also different. In what follows I reproduce the example.
The question is two fold: It is possible to use emmeans to solve my problem? If not, it is possible to make the contrasts through other packages (which one)?
Data
RT1000
EC
tEC
Polarity
596.3564
-25
0.001600
Negative
648.2471
-20
0.002500
Negative
770.7602
-17
0.003460
Negative
831.2971
-15
0.004444
Negative
1311.3331
15
0.004444
Positive
1173.8942
17
0.003460
Positive
1113.7240
20
0.002500
Positive
869.3635
25
0.001600
Positive
Code
# Model
model <- lm(RT1000 ~ tEC * Polarity, data = Data)
# emmeans
library(emmeans)
# Slopes
m.slopes <- lstrends(model, "Polarity", var="tEC")
# Intercepts
m.intercept <- lsmeans(model, "Polarity")
# Contrasts
pairs(m.slopes)
pairs(m.intercept)
Outputs
Modelo
term
estimate
std.error
statistic
p.value
(Intercept)
449.948
66.829
6.733
0.003
tEC
87205.179
20992.976
4.154
0.014
PolarityPositive
230.946
94.511
2.444
0.071
tEC:PolarityPositive
58133.172
29688.551
1.958
0.122
Slopes (it is all right)
Polarity
tEC.trend
SE
df
lower.CL
upper.CL
Negative
87205.18
20992.98
4
28919.33
145491.0
Positive
145338.35
20992.98
4
87052.51
203624.2
contrast
estimate
SE
df
t.ratio
p.value
Negative - Positive
-58133.17
29688.55
4
-1.958101
0.12182
Intercepts (problem)
Polarity
lsmean
SE
df
lower.CL
upper.CL
Negative
711.6652
22.2867
4
649.7874
773.543
Positive
1117.0787
22.2867
4
1055.2009
1178.957
contrast
estimate
SE
df
t.ratio
p.value
Negative - Positive
-405.4135
31.51816
4
-12.86285
0.000211
Computed intercepts through emmeans differs from the ones computed by lm. I think the problem is that the model is not defined for EC = 0. But I'm not sure.

What you are calling the intercepts are not; they are the model predictions at the mean value of tEC. If you want the intercepts, use instead:
m.intercept <- lsmeans(model, "Polarity", at = list(tEC = 0))
You can tell what reference levels are being used via
ref_grid(model) # or str(m.intercept)
Please note that the model fitted here consists of two lines with different slopes; hence the difference between the predictions changes depending on the value of tEC. Thus, I would strongly recommend against testing the comparison of the intercepts; those are predictions at a tEC value that, as you say, can't even occur. Instead, try to be less of a mathematician and do the comparisons at a few representative values of tEC, e.g.,
LSMs <- lsmeans(model, "Polarity", at = list(tEC = c(0.001, 0.003, 0.005)))
pairs(LSMs, by = tEC)
You can also easily visualize the fitted lines:
emmip(model, Polarity ~ tEC, cov.reduce = range)

Related

emmeans: regrid() for binomial GLMM with user-defined link function

I have fitted a binomial GLMM in R with a modified link function with a fixed guessing probability as suggested in this thread - except that the guessing probability is 1/2 and not 1/3. Therefore the sigmoidal activation in my case becomes:
P(correct) = 0.5 + 0.5*(exp(term)/(1 + exp(term))).
My model looks like this:
library(lme4)
m = 2
mod = glmer(correct ~ group*stim_strength + (stim_strength|subject) ,
family=binomial(link=mafc.logit(m)), data=obs_data)
where: guessing probability is 1/m; correct is a categorical variable indicating correct/incorrect response; group is a factor with two levels; stim_strength is numerical with values in [0,1]; mafc.logit is the function suggested in the thread.
I'm essentially fitting separate psychometric curves of the stimulus strength (stim_strength) for the two groups, while taking into account the inter-subject fluctuations in slope and intercept (random effect structure (stim_strength|subject))
This is what I get:
plot_model(mod, type = 'emm', terms = c('stim_strength', 'group'))
---> plot
The model describes the data nicely, and I now want to perform some post-hoc analyses on it. Specifically, I want to run for example:
mod.emm = emmeans(mod, ~group|stim_strength, at=list(stim_strength=c(.25,.75)))
confint(regrid(mod.emm))
contrast(regrid(mod.emm), 'pairwise', simple = 'group', combine = TRUE, adjust = 'holm')
i.e. compute confidence intervals for the %correct of the two groups at some specified values of stim_strength, and compare the %correct of the two groups at these values.
Note that I'm using regrid(), because I want the analyses to be done on the back-transformed values, not on the linear part of the model!
However, regrid() won't work with a user-defined link function. In fact, the regrid is just ignored here, as you can see e.g. from the output of the confint() call above (estimates are labelled as prob but they're clearly not transformed to [.5,1]):
stim_strength = 0.25:
group prob SE df asymp.LCL asymp.UCL
1 -1.329 0.173 Inf -1.716 -0.942
2 -0.553 0.161 Inf -0.913 -0.192
stim_strength = 0.75:
group prob SE df asymp.LCL asymp.UCL
1 1.853 0.372 Inf 1.018 2.687
2 3.375 0.395 Inf 2.489 4.261
Similarly, when adding type='response' in emmeans, I get the message:
Unknown transformation "mafc.logit(2)": no transformation done
Any workaround?
Thanks!
Looking at the linked suggestion, it appears that mafc.logit() is a function that returns a list with all the information needed to implement the transform. All you need to do is update the emmGrid object with that information:
mod.emm <- update(mod.emm, tran = mafc.logit(2))
confint(regrid(mod.emm), adjust = 'holm')
# etc...
See, for example, this vignette section and possibly other parts of that vignette.

How to get value of group = 0 in linear mixed model

I have a very simple stat question probably.
So, I am fitting linear mixed models like this:
lme(dependent ~ Group + Sex + Age + npgs, data=boookclub, random = ~ 1| subject)
Group is a factor variable with levels = 0, 1 , 2 , 3
The dependent are continuous variables standardized (mean 0) and the others are covariates with sex being factor, with Male/Female levels, Age being numerical, and npgs being numerical continuous standardized as well.
When I get the table with beta, standard error, t and p values, I get this:
Value Std.Error DF t-value p-value
(Intercept) -0.04550502 0.02933385 187 -1.551280 0.0025
Group1 0.04219801 0.03536929 181 1.193069 0.2344
Group2 0.03350827 0.03705896 181 0.904188 0.3671
Group3 0.00192119 0.03012654 181 0.063771 0.9492
SexMale 0.03866387 0.05012901 181 0.771287 0.4415
Age -0.00011675 0.00148684 181 -0.078520 0.9375
npgs 0.15308844 0.01637163 181 9.350835 0.0000
SexMale:Age 0.00492966 0.00276117 181 1.785352 0.0759
My problem is: how do I get the beta of Group0? In this case the intercept is Group0 but also the average of npgs, being npgs standardized. How do I get the Beta of Group0? And how can I check if Group0 is significantly associated to the dependent? I'd like to see the effect of all Group levels.
Thanks
The easiest way to do what you want may be with the emmeans package, but you may also have some conceptual issues. Technical details first, then conceptual:
Technical
Fitting an example (this isn't necessarily statistically sensible, but I wanted an example with a categorical fixed effect)
library(nlme)
m1 <- lme(Yield~Variety, random = ~1|Block, data=Alfalfa)
As with your example, the effects are "intercept" (= mean of the baseline group, which is the "Cossack" variety in this case [by default, the alphabetically-first group]), "Ladak" (difference between Ladak and Cossack means) and "Ranger" (similarly). (As #Ben hints in the comments above, R automatically generates dummies for [most of] the levels of the categorical variables [factors] in your model.)
coef(summary(m1))
## Value Std.Error DF t-value p-value
## (Intercept) 1.57166667 0.11665326 64 13.4729767 2.373343e-20
## VarietyLadak 0.09458333 0.07900687 64 1.1971532 2.356624e-01
## VarietyRanger -0.01916667 0.07900687 64 -0.2425949 8.090950e-01
The emmeans package is a convenient way to see predicted values for each group without recoding.
library(emmeans)
emmeans(m1, spec = ~Variety)
## Variety emmean SE df lower.CL upper.CL
## Cossack 1.57 0.117 5 1.27 1.87
## Ladak 1.67 0.117 5 1.37 1.97
## Ranger 1.55 0.117 5 1.25 1.85
Conceptual
You can't "check if Group0 is significantly associated with the dependent [response] variable". You can only check whether the response variables differs significantly between two groups, or whether it differs significantly among all groups (e.g. the results of anova()). You have to pick a baseline. (If you insist, you can test all pairwise comparisons among groups; emmeans can help with this too.) If you "remove the intercept" (by fitting Variety ~Yield-1, or by looking at the results that emmeans produces) then the difference you are quantifying is the difference between the mean of a particular group and zero. This is usually not a meaningful question; in the example here, for instance, this would be testing whether a wheat variety gave a yield that was significantly greater than zero — probably not very interesting.
On the other hand, if you are just interested in estimating the expected value in each group (conditioning on the baseline values of the other variables in the model), along with the standard errors/CIs, then the answers you get from emmeans are perfectly sensible.
There's a related question here that explains why you get an NA value if you manually create dummies for every level of your factor ...

Syntax for diagonal variance-covariance matrix for non-linear mixed effects model in nlme

I am analysing routinely collected substance use data during the first 12 months' of treatment in a large sample of outpatients attending drug and alcohol treatment services. I am interested in whether differing levels of methamphetamine use (no use, low use, and high use) at the outset of treatment predicts different levels after a year in treatment, but the data is very irregular, with different clients measured at different times and different numbers of times during their year of treatment.
The data for the high and low use group seem to suggest that drug use at outset reduces during the first 3 months of treatment and then asymptotes. Hence I thought I would try a non-linear exponential decay model.
I started with the following nonlinear generalised least squares model using the gnls() function in the nlme package:
fitExp <- gnls(outcome ~ C*exp(-k*yearsFromStart),
params = list(C ~ atsBase_fac, k ~ atsBase_fac),
data = dfNL,
start = list(C = c(nsC[1], lsC[1], hsC[1]),
k = c(nsC[2], lsC[2], hsC[2])),
weights = varExp(-0.8, form = ~ yearsFromStart),
control = gnlsControl(nlsTol = 0.1))
where outcome is number of days of drug use in the 28 days previous to measurement, atsBase_fac is a three-level categorical predictor indicating level of amphetamine use at baseline (noUse, lowUse, and highUse), yearsFromStart is a continuous predictor indicating time from start of treatment in years (baseline = 0, max - 1), C is a parameter indicating initial level of drug use, and k is the rate of decay in drug use. The starting values of C and k are taken from nls models estimating these parameters for each group. These are the results of that model
Generalized nonlinear least squares fit
Model: outcome ~ C * exp(-k * yearsFromStart)
Data: dfNL
AIC BIC logLik
27672.17 27725.29 -13828.08
Variance function:
Structure: Exponential of variance covariate
Formula: ~yearsFromStart
Parameter estimates:
expon
0.7927517
Coefficients:
Value Std.Error t-value p-value
C.(Intercept) 0.130410 0.0411728 3.16738 0.0015
C.atsBase_faclow 3.409828 0.1249553 27.28839 0.0000
C.atsBase_fachigh 20.574833 0.3122500 65.89218 0.0000
k.(Intercept) -1.667870 0.5841222 -2.85534 0.0043
k.atsBase_faclow 2.481850 0.6110666 4.06150 0.0000
k.atsBase_fachigh 9.485155 0.7175471 13.21886 0.0000
So it looks as if there are differences between groups in initial rate of drug use and in rate of reduction in drug use. I would like to go a step further and fit a nonlinear mixed effects model.I tried consulting Pinhiero and Bates' book accompanying the nlme package but the only models I could find that used irregular, sparse data like mine used a self-starting function, and my model does not do that.
I tried to adapt the gnls() model to nlme like so:
fitNLME <- nlme(model = outcome ~ C*exp(-k*yearsFromStart),
data = dfNL,
fixed = list(C ~ atsBase_fac, k ~ atsBase_fac),
random = pdDiag(yearsFromStart ~ id),
groups = ~ id,
start = list(fixed = c(nsC[1], lsC[1], hsC[1], nsC[2], lsC[2], hsC[2])),
weights = varExp(-0.8, form = ~ yearsFromStart),
control = nlmeControl(optim = "optimizer"))
bit I keep getting error message, I presume through errors in the syntax specifying the random effects.
Can anyone give me some tips on how the syntax for the random effects works in nlme?
The only dataset in Pinhiero and Bates that resembled mine used a diagonal variance-covariance matrix. Can anyone filled me in on the syntax of this nlme function, or suggest a better one?
p.s. I wish I could provide a reproducible example but coming up with synthetic data that re-creates the same errors is way beyond my skills.

Post-hoc test for glmer

I'm analysing my binomial dataset with R using a generalized linear mixed model (glmer, lme4-package). I wanted to make the pairwise comparisons of a certain fixed effect ("Sound") using a Tukey's post-hoc test (glht, multcomp-package).
Most of it is working fine, but one of my fixed effect variables ("SoundC") has no variance at all (96 times a "1" and zero times a "0") and it seems that the Tukey's test cannot handle that. All pairwise comparisons with this "SoundC" give a p-value of 1.000 whereas some are clearly significant.
As a validation I changed one of the 96 "1"'s to a "0" and after that I got normal p-values again and significant differences where I expected them, whereas the difference had actually become smaller after my manual change.
Does anybody have a solution? If not, is it fine to use the results of my modified dataset and report my manual change?
Reproducible example:
Response <- c(1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,1,1,0,
0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,
1,1,0,1,1,0,1,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,0,1)
Data <- data.frame(Sound=rep(paste0('Sound',c('A','B','C')),22),
Response,
Individual=rep(rep(c('A','B'),2),rep(c(18,15),2)))
# Visual
boxplot(Response ~ Sound,Data)
# Mixed model
library (lme4)
model10 <- glmer(Response~Sound + (1|Individual), Data, family=binomial)
# Post-hoc analysis
library (multcomp)
summary(glht(model10, mcp(Sound="Tukey")))
This is verging on a CrossValidated question; you are definitely seeing complete separation, where there is a perfect division of your response into 0 vs 1 results. This leads to (1) infinite values of the parameters (they're only listed as non-infinite due to computational imperfections) and (2) crazy/useless values of the Wald standard errors and corresponding $p$ values (which is what you're seeing here). Discussion and solutions are given here, here, and here, but I'll illustrate a little more below.
To be a statistical grouch for a moment: you really shouldn't be trying to fit a random effect with only 3 levels anyway (see e.g. http://glmm.wikidot.com/faq) ...
Firth-corrected logistic regression:
library(logistf)
L1 <- logistf(Response~Sound*Individual,data=Data,
contrasts.arg=list(Sound="contr.treatment",
Individual="contr.sum"))
coef se(coef) p
(Intercept) 3.218876e+00 1.501111 2.051613e-04
SoundSoundB -4.653960e+00 1.670282 1.736123e-05
SoundSoundC -1.753527e-15 2.122891 1.000000e+00
IndividualB -1.995100e+00 1.680103 1.516838e-01
SoundSoundB:IndividualB 3.856625e-01 2.379919 8.657348e-01
SoundSoundC:IndividualB 1.820747e+00 2.716770 4.824847e-01
Standard errors and p-values are now reasonable (p-value for the A vs C comparison is 1 because there is literally no difference ...)
Mixed Bayesian model with weak priors:
library(blme)
model20 <- bglmer(Response~Sound + (1|Individual), Data, family=binomial,
fixef.prior = normal(cov = diag(9,3)))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.711485 2.233667 0.7662221 4.435441e-01
## SoundSoundB -5.088002 1.248969 -4.0737620 4.625976e-05
## SoundSoundC 2.453988 1.701674 1.4421024 1.492735e-01
The specification diag(9,3) of the fixed-effect variance-covariance matrix produces
$$
\left(
\begin{array}{ccc}
9 & 0 & 0 \
0 & 9 & 0 \
0 & 0 & 9
\end{array}
\right)
$$
In other words, the 3 specifies the dimension of the matrix (equal to the number of fixed-effect parameters), and the 9 specifies the variance -- this corresponds to a standard devation of 3 or a 95% range of about $\pm 6$, which is quite large/weak/uninformative for logit-scaled responses.
These are roughly consistent (the model is very different)
library(multcomp)
summary(glht(model20, mcp(Sound="Tukey")))
## Estimate Std. Error z value Pr(>|z|)
## SoundB - SoundA == 0 -5.088 1.249 -4.074 0.000124 ***
## SoundC - SoundA == 0 2.454 1.702 1.442 0.309216
## SoundC - SoundB == 0 7.542 1.997 3.776 0.000397 ***
As I said above, I would not recommend a mixed model in this case anyway ...

Testing differences in coefficients including interactions from piecewise linear model

I'm running a piecewise linear random coefficient model testing the influence of a covariate on the second piece. Thereby, I want to test whether the coefficient of the second piece under the influence of the covariate (piece2 + piece2:covariate) differs from the coefficient of the first piece (piece1), hence whether the growth rate differs.
I set up some exemplary data:
set.seed(100)
# set up dependent variable
temp <- rep(seq(0,23),50)
y <- c(rep(seq(0,23),50)+rnorm(24*50), ifelse(temp <= 11, temp + runif(1200), temp + rnorm(1200) + (temp/sqrt(temp))))
# set up ID variable, variables indicating pieces and the covariate
id <- sort(rep(seq(1,100),24))
piece1 <- rep(c(seq(0,11), rep(11,12)),100)
piece2 <- rep(c(rep(0,12), seq(1,12)),100)
covariate <- c(rep(0,24*50), rep(c(rep(0,12), rep(1,12)), 50))
# data frame
example.data <- data.frame(id, y, piece1, piece2, covariate)
# run piecewise linear random effects model and show results
library(lme4)
lmer.results <- lmer(y ~ piece1 + piece2*covariate + (1|id) , example.data)
summary(lmer.results)
I came across the linearHypothesis() command from the car package to test differences in coefficients. However, I could not find an example on how to use it when including interactions.
Can I even use linearHypothesis() to test this or am I aiming for the wrong test?
I appreciate your help.
Many thanks in advance!
Mac
Assuming your output looks like this
Estimate Std. Error t value
(Intercept) 0.26293 0.04997 5.3
piece1 0.99582 0.00677 147.2
piece2 0.98083 0.00716 137.0
covariate 2.98265 0.09042 33.0
piece2:covariate 0.15287 0.01286 11.9
if I understand correctly what you want, you are looking for the contrast:
piece1-(piece2+piece2:covariate)
or
c(0,1,-1,0,-1)
My preferred tool for this is function estimable in gmodels; you could also do it by hand or with one of the functions in Frank Harrel's packages.
library(gmodels)
estimable(lmer.results,c(0,1,-1,0,-1),conf.int=TRUE)
giving
Estimate Std. Error p value Lower.CI Upper.CI
(0 1 -1 0 -1) -0.138 0.0127 0 -0.182 -0.0928

Resources