Coefficients output in R without the alphabetical stacked-up default

In lm(), as in glm() or lmer(), the default output formats the coefficients as an (Intercept) corresponding to the reference level of each factor (by default the first level in alphabetical order), followed by the remaining coefficients expressed as differences from that baseline. To find the actual value of any coefficient, it is necessary to add to (or subtract from) the (Intercept) baseline and, depending on the model, several intermediate coefficients.
Although this is not a problem with just a couple of regressors, it is cumbersome and error-prone in more complex models. For example:
For the data:
head(trees)
site tree treatment organ sample tissue length
1 L LT01 T root 1 phloem 90.9924
2 L LT01 T root 1 xylem 123.4933
3 L LT01 T root 2 phloem 101.2444
4 L LT01 T root 2 xylem 106.0529
5 L LT01 T root 3 phloem 108.8453
6 L LT01 T root 3 xylem 126.5165
The call:
fit <- lmer(length ~ treatment + organ + tissue + (1|tree/organ/sample), data = trees)
summary(fit)
Random effects:
Groups Name Variance Std.Dev.
sample:(organ:tree) (Intercept) 1.035e-12 1.017e-06
organ:tree (Intercept) 0.000e+00 0.000e+00
tree (Intercept) 8.796e+00 2.966e+00
Residual 8.873e+01 9.420e+00
Number of obs: 360, groups: sample:(organ:tree), 180; organ:tree, 60; tree, 30
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 93.7903 1.2539 58.4000 74.798 < 2e-16 ***
treatmentT 13.5571 1.4692 28.0000 9.227 5.51e-10 ***
organstem 8.1326 0.9929 328.0000 8.191 5.77e-15 ***
tissuexylem 13.9814 0.9929 328.0000 14.081 < 2e-16 ***
yields coefficients expressed relative to the baseline and to each other, which makes the quick calculation of any one of them tedious.
Is there a way of getting the actual coefficients, at least for the fixed effects, in a more straightforward format?
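One convenient option (a sketch, assuming the emmeans package, which the question does not mention) is to ask for estimated marginal means instead of baseline contrasts. For example, the fitted mean for a treated stem xylem sample above is 93.79 + 13.56 + 8.13 + 13.98 ≈ 129.46; emmeans does this bookkeeping for you:
library(emmeans)
# estimated marginal means for each treatment level, averaged over the other factors
emmeans(fit, ~ treatment)
# or the full grid of cell means
emmeans(fit, ~ treatment * organ * tissue)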

Related

power analysis in simr - object is not a matrix

I have the following model:
ModelPower <- lmer(DV ~ GroupAbstract * Condition_Cat_Abs + (1|Participant) + (1 + GroupAbstract|Stimulus), data = Dataset)
This model gives the following output:
Random effects:
Groups Name Variance Std.Dev. Corr
Participant (Intercept) 377.401 19.427
Stimulus (Intercept) 91.902 9.587
GroupAbstractOutgroup 2.003 1.415 -0.40
Residual 338.927 18.410
Number of obs: 16512, groups: Participant, 344; Stimulus, 32
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 65.8962 2.0239 59.6906 32.559 < 0.0000000000000002 ***
GroupAbstractOutgroup -0.9287 0.5561 129.9242 -1.670 0.0973 .
Condition_Cat_AbsSecondOrderIn -2.2584 0.4963 16103.9277 -4.550 0.00000539 ***
Condition_Cat_AbsSecondOrderOut -7.0821 0.4963 16103.9277 -14.270 < 0.0000000000000002 ***
GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn -3.0229 0.7019 16103.9277 -4.307 0.00001665 ***
GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderOut 7.8765 0.7019 16103.9277 11.222 < 0.0000000000000002 ***
I am interested in the interaction "GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn" and I am trying to estimate the sample size needed to detect an effect size of at least -2 using the R package simr. The original slope is -3.02, so I specify the new one:
ModelPower@beta[names(fixef(ModelPower)) %in% "GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn"] <- -2
However, regardless of how I specify the powerSim function, both for the main effects and for the interactions (see some examples below), I get a power of 0% and, when running lastResult()$errors, the error 'object is not a matrix'. I know what the error should mean, but even after converting the original data frame and the table of fixed effects to a matrix the error is still there, and I am not sure what it is referring to or how to get the actual output. Any help would be much appreciated!
Examples of the powerSim function:
powerSim(ModelPower, test=fixed("GroupAbstract", "anova"), nsim=10, seed=1)
powerSim(ModelPower, test=fixed("GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn", "anova"), nsim=10, seed=1)

Likelihood Ratio Test for LMM gives P-Value of 1?

I did an experiment in which people had to give answers to moral dilemmas that were either personal or impersonal. I now want to see if there is an interaction between the type of dilemma and the answer participants gave (yes or no) that influences their reaction time.
For this, I computed a Linear Mixed Model using the lmer()-function of the lme4-package.
My Data looks like this:
subject condition gender.b age logRT answer dilemma pers_force
1 105 a_MJ1 1 27 5.572154 1 1 1
2 107 b_MJ3 1 35 5.023881 1 1 1
3 111 a_MJ1 1 21 5.710427 1 1 1
4 113 c_COA 0 31 4.990433 1 1 1
5 115 b_MJ3 1 23 5.926926 1 1 1
6 119 b_MJ3 1 28 5.278115 1 1 1
My function looks like this:
lmm <- lmer(logRT ~ pers_force * answer + (1|subject) + (1|dilemma),
data = dfb.3, REML = FALSE, control = lmerControl(optimizer="Nelder_Mead"))
with subjects and dilemmas as random factors. This is the output:
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: logRT ~ pers_force * answer + (1 | subject) + (1 | dilemma)
Data: dfb.3
Control: lmerControl(optimizer = "Nelder_Mead")
AIC BIC logLik deviance df.resid
-13637.3 -13606.7 6825.6 -13651.3 578
Scaled residuals:
Min 1Q Median 3Q Max
-3.921e-07 -2.091e-07 2.614e-08 2.352e-07 6.273e-07
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 3.804e-02 1.950e-01
dilemma (Intercept) 0.000e+00 0.000e+00
Residual 1.155e-15 3.398e-08
Number of obs: 585, groups: subject, 148; contrasts, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.469e+00 1.440e-02 379.9
pers_force1 -1.124e-14 5.117e-09 0.0
answer -1.095e-15 4.678e-09 0.0
pers_force1:answer -3.931e-15 6.540e-09 0.0
Correlation of Fixed Effects:
(Intr) prs_f1 answer
pers_force1 0.000
answer 0.000 0.447
prs_frc1:aw 0.000 -0.833 -0.595
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
I then did a Likelihood Ratio Test using a reduced model to obtain p-Values:
lmm_null <- lmer(logRT ~ pers_force + answer + (1|subject) + (1|dilemma),
data = dfb.3, REML = FALSE,
control = lmerControl(optimizer="Nelder_Mead"))
anova(lmm,lmm_null)
For both models, I get the warning "boundary (singular) fit: see ?isSingular", but if I drop one random effect to make the structure simpler, then I get the warning that the models failed to converge (which is a bit strange), so I ignored it.
But then, the LRT output looks like this:
Data: dfb.3
Models:
lmm_null: logRT ~ pers_force + answer + (1 | subject) + (1 | dilemma)
lmm: logRT ~ pers_force * answer + (1 | subject) + (1 | dilemma)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
lmm_null 6 -13639 -13613 6825.6 -13651
lmm 7 -13637 -13607 6825.6 -13651 0 1 1
As you can see, the Chi-Square value is 0 and the p-Value is exactly 1, which seems very strange. I guess something must have gone wrong here, but I can't figure out what.
You say
logRT is the average logarithmized reaction time across all those 4 dilemmas.
If I'm interpreting this correctly, i.e. each subject has the same response for all of the times they are observed, then this is the proximal cause of your problem. (I know I've seen this exact problem before, but I don't know where: here? r-sig-mixed-models@r-project.org?)
simulate data
library(lme4)
set.seed(101)
dd1 <- expand.grid(subject=factor(100:150), contrasts=factor(1:4))
dd1$answer <- rbinom(nrow(dd1),size=1,prob=0.5)
dd1$logRT <- simulate(~answer + contrasts + (1|subject),
family=gaussian,
newparams=list(beta=c(0,1,1,-1,2),theta=1,sigma=1),
newdata=dd1)[[1]]
regular fit
This is fine and gives answers close to the true parameters:
m1 <- lmer(logRT ~ answer + contrasts + (1|subject), data=dd1)
## Linear mixed model fit by REML ['lmerMod']
## Random effects:
## Groups Name Std.Dev.
## subject (Intercept) 1.0502
## Residual 0.9839
## Number of obs: 204, groups: subject, 51
## Fixed Effects:
## (Intercept) answer contrasts2 contrasts3 contrasts4
## -0.04452 0.85333 1.16785 -1.07847 1.99243
now average the responses by subject
We get a raft of warning messages, and the same pathologies you are seeing (residual variance and all parameter estimates other than the intercept are effectively zero). This is because lmer is trying to estimate residual variance from the within-subject variation, and we have gotten rid of it!
I don't know why you are doing the averaging. If this is unavoidable, and your design is the randomized-block type shown here (each subject sees all four dilemmas/contrasts), then you can't estimate the dilemma effects.
dd2 <- transform(dd1, logRT=ave(logRT,subject))
m2 <- update(m1, data=dd2)
## Random effects:
## Groups Name Std.Dev.
## subject (Intercept) 6.077e-01
## Residual 1.624e-05
## Number of obs: 204, groups: subject, 51
## Fixed Effects:
## (Intercept) answer contrasts2 contrasts3 contrasts4
## 9.235e-01 1.031e-10 -1.213e-11 -1.672e-15 -1.011e-11
Treating the dilemmas as a random effect won't do what you want (allow for individual-to-individual variability in how they were presented). That among-subject variability in the dilemmas is going to get lumped into the residual variability, where it belongs; I would recommend treating it as a fixed effect.
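In terms of the original model, that recommendation would look something like this (a sketch, assuming the unaveraged per-dilemma reaction times are available in dfb.3):
lmm2 <- lmer(logRT ~ pers_force * answer + dilemma + (1 | subject),
             data = dfb.3, REML = FALSE)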

Direction of the estimated coefficient in an ordinal logistic regression

I'm analysing an ordinal logistic regression and I'm wondering how to tell which direction the estimated coefficient refers to. My variables are just 0, 1 for women and men, and 0, 1, 2, 4 for different postures. So my question is: how do I know whether, for gender, the estimate describes the change from 0 to 1 or the change from 1 to 0?
The output appended a 1 to PicSex; is that a sign that this one has a 1->0 direction? See the output below.
Thank you for any help
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: Int ~ PicSex + Posture + (1 | PicID)
data: x
Random effects:
Groups Name Variance Std.Dev.
PicID (Intercept) 0.0541 0.2326
Number of groups: PicID 16
Coefficients:
Estimate Std. Error z value Pr(>|z|)
PicSex1 0.3743 0.1833 2.042 0.0411 *
Posture -1.1232 0.1866 -6.018 1.77e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
P.S.
Thank you here are my head results (I relabeled PicSex to Sex)
> head(Posture)
[1] 1 1 1 1 1 1
Levels: 0 1
> head(Sex)
[1] 0 0 0 0 0 0
Levels: 0 1
So the level order is the same, but for Sex it still appended a 1 while for Posture it did not. Still very confused about the directions.
Your Sex variable has two levels, 0 and 1. So PicSex1 means the effect of PicSex being 1 compared to PicSex being 0. I show an example below using the wine dataset:
library(ordinal)
DATA = wine
> head(DATA$temp)
[1] cold cold cold cold warm warm
Levels: cold warm
Here cold comes first in Levels, so it is set as the reference in any linear models. First we verify the effect of cold vs warm:
do.call(cbind,tapply(DATA$rating,DATA$temp,table))
#warm has a higher average rating
Fit the model
# fit the model; temp is a fixed effect
summary(clmm(rating ~ temp + contact+(1|judge), data = DATA))
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: rating ~ temp + contact + (1 | judge)
data: DATA
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -81.57 177.13 332(999) 1.03e-05 2.8e+01
Random effects:
Groups Name Variance Std.Dev.
judge (Intercept) 1.279 1.131
Number of groups: judge 9
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 3.0630 0.5954 5.145 2.68e-07 ***
contactyes 1.8349 0.5125 3.580 0.000344 ***
Here we see warm attached to "temp", and, as we know, it has a positive coefficient because the rating is better in warm compared to cold (the reference).
So if you set another group as the reference, you will see another name attached, and the coefficient is reversed (-3.06 compared to +3.06 in the previous example):
# we set warm as reference now
DATA$temp = relevel(DATA$temp,ref="warm")
summary(clmm(rating ~ temp + contact+(1|judge), data = DATA))
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: rating ~ temp + contact + (1 | judge)
data: DATA
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -81.57 177.13 269(810) 1.14e-04 1.8e+01
Random effects:
Groups Name Variance Std.Dev.
judge (Intercept) 1.28 1.131
Number of groups: judge 9
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempcold -3.0630 0.5954 -5.145 2.68e-07 ***
contactyes 1.8349 0.5125 3.580 0.000344 ***
So always check which level is the reference before you fit the model.
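Applied to the question above (a sketch, assuming the data frame x and the PicSex factor from the question), PicSex1 is the effect of level 1 relative to the reference level 0; to flip the direction, relevel before fitting:
x$PicSex <- relevel(x$PicSex, ref = "1")
# the output will now show PicSex0, with the sign of the coefficient reversed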

Anova table for pscl::zeroinfl

We're trying to model a count variable with excessive zeros using a zero-inflated Poisson regression (as implemented in the pscl package). Here is a (simplified) output showing both categorical and continuous explanatory variables:
library(pscl)
> m1 <- zeroinfl(y ~ treatment + some_covar, data = d, dist = "poisson")
> summary(m1)
Count model coefficients (poisson with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.189253 0.102256 31.189 < 2e-16 ***
treatmentB -0.282478 0.107965 -2.616 0.00889 **
treatmentC 0.227633 0.103605 2.197 0.02801 *
some_covar 0.002190 0.002329 0.940 0.34706
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.67251 0.74961 0.897 0.3696
treatmentB -1.72728 0.89931 -1.921 0.0548 .
treatmentC -0.31761 0.77668 -0.409 0.6826
some_covar -0.03736 0.02684 -1.392 0.1640
summary gave us some good answers, but we are looking for an ANOVA-like table. So the question is: is it OK to use car::Anova to obtain such a table?
> Anova(m1)
Analysis of Deviance Table (Type II tests)
Response: y
Df Chisq Pr(>Chisq)
treatment 2 30.7830 2.068e-07 ***
some_covar 1 0.8842 0.3471
It seems to work fine, but I'm not really sure whether it is a valid approach, since the documentation doesn't cover it (it seems to consider only the 'count model' part?). Do you recommend following this approach, or is there a better way?
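One alternative that sidesteps that worry (a sketch, reusing the m1 call above; lmtest::lrtest works with any model that has a logLik method, which zeroinfl objects do) is a likelihood-ratio test against a model with the term dropped from both the count and the zero-inflation parts:
library(pscl)
library(lmtest)
# refit without treatment in either part of the model (count | zero-inflation)
m0 <- zeroinfl(y ~ some_covar | some_covar, data = d, dist = "poisson")
lrtest(m1, m0)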

How do I overcome this error in execution of lmer function: Error in if (REML) p else 0L : argument is not interpretable as logical?

I have checked out the defensive methods discussed in this post in order to prevent this error, but it still doesn't go away.
model<-lmer(Proportion~Plot+Treatment+(1|Plot/Treatment),binomial,data=data)
Error in if (REML) p else 0L : argument is not interpretable as logical
tl;dr you should use glmer instead. Because you haven't named your arguments, R is interpreting them by position (order). lmer's third argument is REML, so R thinks you're specifying REML=binomial, which isn't a legitimate value. family is the third argument to glmer, so this would work (sort of: see below) if you used glmer, but it's usually safer to name the arguments explicitly if there's any possibility of getting confused.
A reproducible example would be nice, but:
model <- glmer(Proportion~Plot+Treatment+(1|Plot/Treatment),
family=binomial,data=data)
is a starting point. I foresee a few more problems though:
if your data are not Bernoulli (0/1) (which I'm guessing they're not, since your response is called Proportion), then you need to include the total number sampled in each trial, e.g. by specifying a weights argument
you have Plot and Treatment both as fixed effects and as random-effect grouping variables in your model; that won't work. I see that Crawley really does suggest this in the R book (google books link).
Do not do it the way he suggests; it doesn't make any sense. Replicating:
library(RCurl)
url <- "https://raw.githubusercontent.com/jejoenje/Crawley/master/Data/insects.txt"
dd <- read.delim(text=getURL(url),header=TRUE)
## fix typo because I'm obsessive:
levels(dd$treatment) <- c("control","sprayed")
library(lme4)
model <- glmer(cbind(dead,alive)~block+treatment+(1|block/treatment),
data=dd,family=binomial)
If we look at the among-group standard deviations, we see that both are (effectively) zero. It is exactly zero for block, because block is already included in the fixed effects. The treatment:block term need not be zero (we have treatment, but not the block-by-treatment interaction, in the fixed effects), but it is, because there is little among-treatment-within-block variation:
VarCorr(model)
## Groups Name Std.Dev.
## treatment:block (Intercept) 2.8736e-09
## block (Intercept) 0.0000e+00
Conceptually, it makes more sense to treat block as a random effect:
dd <- transform(dd,prop=dead/(alive+dead),ntot=alive+dead)
model1 <- glmer(prop~treatment+(1|block/treatment),
weights=ntot,
data=dd,family=binomial)
summary(model1)
## ...
## Formula: prop ~ treatment + (1 | block/treatment)
## Random effects:
## Groups Name Variance Std.Dev.
## treatment:block (Intercept) 0.02421 0.1556
## block (Intercept) 0.18769 0.4332
## Number of obs: 48, groups: treatment:block, 12; block, 6
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1640 0.2042 -5.701 1.19e-08 ***
## treatmentsprayed 3.2434 0.1528 21.230 < 2e-16 ***
Sometimes you might want to treat it as a fixed effect:
model2 <- update(model1,.~treatment+block+(1|block:treatment))
summary(model2)
## Random effects:
## Groups Name Variance Std.Dev.
## block:treatment (Intercept) 5.216e-18 2.284e-09
## Number of obs: 48, groups: block:treatment, 12
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.5076 0.0739 -6.868 6.50e-12 ***
## treatmentsprayed 3.2676 0.1182 27.642 < 2e-16 ***
Now the block-by-treatment interaction variance is effectively zero (because block soaks up more variability if treated as a fixed effect). However, the estimated effect of spraying is very nearly identical.
If you're worried about overdispersion, you can add an individual-level (observation-level) random effect (or use MASS::glmmPQL; lme4 no longer fits quasi-likelihood models):
dd <- transform(dd, obs=factor(seq_len(nrow(dd))))
model3 <- update(model1,.~.+(1|obs))
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 4.647e-01 6.817e-01
## treatment:block (Intercept) 1.138e-09 3.373e-05
## block (Intercept) 1.813e-01 4.258e-01
## Number of obs: 48, groups: obs, 48; treatment:block, 12; block, 6
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1807 0.2411 -4.897 9.74e-07 ***
## treatmentsprayed 3.3481 0.2457 13.626 < 2e-16 ***
The observation-level effect has effectively replaced the treatment-by-block interaction (which is now close to zero). Again, the estimated spraying effect has hardly changed (but its standard error is twice as large ...)
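For completeness, a rough way to check whether overdispersion is a problem in the first place (a sketch following the usual Pearson-residual recipe from the GLMM FAQ; not part of the original answer):
overdisp_fun <- function(model) {
    rdf <- df.residual(model)  # residual degrees of freedom
    Pearson.chisq <- sum(residuals(model, type = "pearson")^2)
    ratio <- Pearson.chisq/rdf  # values much greater than 1 suggest overdispersion
    pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
    c(chisq = Pearson.chisq, ratio = ratio, rdf = rdf, p = pval)
}
overdisp_fun(model1)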
