power analysis in simr - object is not a matrix - r

I have the following model:
ModelPower <- lmer(DV ~ GroupAbstract * Condition_Cat_Abs + (1|Participant) + (1 + GroupAbstract|Stimulus), data = Dataset)
This model gives the following output:
Random effects:
Groups Name Variance Std.Dev. Corr
Participant (Intercept) 377.401 19.427
Stimulus (Intercept) 91.902 9.587
GroupAbstractOutgroup 2.003 1.415 -0.40
Residual 338.927 18.410
Number of obs: 16512, groups: Participant, 344; Stimulus, 32
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 65.8962 2.0239 59.6906 32.559 < 0.0000000000000002 ***
GroupAbstractOutgroup -0.9287 0.5561 129.9242 -1.670 0.0973 .
Condition_Cat_AbsSecondOrderIn -2.2584 0.4963 16103.9277 -4.550 0.00000539 ***
Condition_Cat_AbsSecondOrderOut -7.0821 0.4963 16103.9277 -14.270 < 0.0000000000000002 ***
GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn -3.0229 0.7019 16103.9277 -4.307 0.00001665 ***
GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderOut 7.8765 0.7019 16103.9277 11.222 < 0.0000000000000002 ***
I am interested in the interaction "GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn" and I am trying to estimate the sample size to detect an effect size of at least -2 using the R package simr. the original slope is -3.02 so I specify the new one:
ModelPower#beta[names(fixef(ModelPower)) %in% "GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn"] <- -2
However, regardless of how I specify the powerSim function both for the main effects and interactions (see some examples below), I get power of 0% and the following error when running lastResult()$errors 'object is not a matrix'. I know what the error should mean but even after converting the original data frame and the table of fixed effects to a matrix, the error is still there and I am not sure what it is referring to and how to get the actual output. Any help would be much appreciated!
Examples of the powerSim function:
powerSim(ModelPower, test=fixed("GroupAbstract", "anova"), nsim=10, seed=1)
powerSim(ModelPower, test=fixed("GroupAbstractOutgroup:Condition_Cat_AbsSecondOrderIn", "anova"), nsim=10, seed=1)

Related

Anova table for pscl:zeroinfl

We're trying to model a count variable with excessive zeros using a zero-inflated poisson (as implemented in pscl package). Here is a (simplified) output showing both categorical and continuous explanatory variables:
library(pscl)
> m1 <- zeroinfl(y ~ treatment + some_covar, data = d, dist =
"poisson")
> summary(m1)
Count model coefficients (poisson with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.189253 0.102256 31.189 < 2e-16 ***
treatmentB -0.282478 0.107965 -2.616 0.00889 **
treatmentC 0.227633 0.103605 2.197 0.02801 *
some_covar 0.002190 0.002329 0.940 0.34706
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.67251 0.74961 0.897 0.3696
treatmentB -1.72728 0.89931 -1.921 0.0548 .
treatmentC -0.31761 0.77668 -0.409 0.6826
some_covar -0.03736 0.02684 -1.392 0.1640
summary gave us some good answers but we are looking for a ANOVA-like table. So, the question is: is it ok to use car::Anova to obtain such table?
> Anova(m1)
Analysis of Deviance Table (Type II tests)
Response: y
Df Chisq Pr(>Chisq)
treatment 2 30.7830 2.068e-07 ***
some_covar 1 0.8842 0.3471
It seems to work fine but i'm not really sure whether is a valid approach since documentation is missing (seems like is only considering the 'count model' part?). Do you recommend to follow this approach or there is a better way?

lmPerm P-Values Different depending on Order of Coefficients

I am getting different results from lmPerm based on the order in which I enter the variables in the function call.
For example, placing NCF.pf before TotalProperties yields the following:
pfit <- lmp(NetCashOps ~ NCF.pf + TotalProperties, data = sub.pm, subset = Presence == 1)
summary(pfit)
...
Coefficients:
Estimate Iter Pr(Prob)
NCF.pf 4.581e-01 51 1
TotalProperties 5.246e+04 5000 <2e-16 ***
but, when I switch the order of the coefficients in the formula and place TotalProperties before NCF.pf, the p-value on NCF.pf becomes significant
pfit2 <- lmp(NetCashOps ~ TotalProperties + NCF.pf, data = sub.pm, subset = Presence == 1)
summary(pfit2)
...
Coefficients:
Estimate Iter Pr(Prob)
TotalProperties 5.246e+04 5000 <2e-16 ***
NCF.pf 4.581e-01 5000 <2e-16 ***
Am I missing something? Why would the p-values be different just because I switched the order of the variables in the function call?
Update - Data Source and lm Output (11/11/2016)
The data can be found on GitHub at this link.
When calling the standard lm function twice (reversing the order of the variables on the second call), the p-values are identical (see below). Hence, unlike when using the lmPerm function, the order of the variables doesn't matter with lm.
fit1 <- lm(NetCashOps ~ NCF.pf + TotalProperties, data = sub.pm, subset = Presence == 1)
summary(fit1)
...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.088e+05 2.258e+05 3.138 0.0019 **
NCF.pf 4.581e-01 1.112e-01 4.121 5.11e-05 ***
TotalProperties 5.246e+04 9.519e+03 5.511 8.76e-08 ***
fit2 <- lm(NetCashOps ~ TotalProperties + NCF.pf, data = sub.pm, subset = Presence == 1)
summary(fit2)
...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.088e+05 2.258e+05 3.138 0.0019 **
TotalProperties 5.246e+04 9.519e+03 5.511 8.76e-08 ***
NCF.pf 4.581e-01 1.112e-01 4.121 5.11e-05 ***
Thanks!
I already saw 2 close votes to migrate this to Cross Validated, but in my humble opinion this should stay on Stack Overflow. It is true, that t-statistic and p-value are not invariant to the specification order of terms, under the non-pivoted QR factorization strategy used by lm and lmp, but as shown in the new edit, for OP's data, these statistic should be invariant. So there must be something sensitive at programming level.
My quick diagnose suggests, that if we set seqs = TRUE, rather than using the default FALSE, we would get consistent result:
## I have subsetted data with `Presence == 1` into a new dataset `dat`
## I have also renamed variable name for simplicity
coef(summary(lmp(y ~ x1 + x2, dat, seqs = TRUE)))
# Estimate Iter Pr(Prob)
#(Intercept) 2.019959e+06 5000 0
#x1 4.580840e-01 5000 0
#x2 5.245619e+04 5000 0
coef(summary(lmp(y ~ x2 + x1, dat, seqs = TRUE)))
# Estimate Iter Pr(Prob)
#(Intercept) 2.019959e+06 5000 0
#x2 5.245619e+04 5000 0
#x1 4.580840e-01 5000 0
Note, the Pr(Prob) should be "< 2e-16" when printed by summary, but when using coef to obtain a matrix, those tiny values are 0.
The documentation of ?lmp mentions a little on this part:
The SS will be calculated _sequentially_, just as ‘lm()’ does; or
they may be calculated _uniquely_, which means that the SS for
each source is calculated conditionally on all other sources.
I am at the moment not sure what SS is (as I am not a user of lmPerm), but this sounds like that for consistent result, we should set seqs = TRUE.

How do I overcome this error in execution of lmer function: Error in if (REML) p else 0L : argument is not interpretable as logical?

I have checked out the defensive methods as was discussed in this post in order to prevent this error but it still doesn't go away.
model<-lmer(Proportion~Plot+Treatment+(1|Plot/Treatment),binomial,data=data)
Error in if (REML) p else 0L : argument is not interpretable as logical
tl;dr you should use glmer instead. Because you haven't named your arguments, R is interpreting them by position (order). lmer's third argument is REML, so R thinks you're specifying REML=binomial, which isn't a legitimate value. family is the third argument to glmer, so this would work (sort of: see below) if you used glmer, but it's usually safer to name the arguments explicitly if there's any possibility of getting confused.
A reproducible example would be nice, but:
model <- glmer(Proportion~Plot+Treatment+(1|Plot/Treatment),
family=binomial,data=data)
is a starting point. I foresee a few more problems though:
if your data are not Bernoulli (0/1) (which I'm guessing not since your response is called Proportion), then you need to include the total number sampled in each trial, e.g. by specifying a weights argument
you have Plot and Treatment as both fixed and as random-effect grouping variables in your model; that won't work. I see that Crawley really does suggest this in the R book (google books link).
Do not do it the way he suggests, it doesn't make any sense. Replicating:
library(RCurl)
url <- "https://raw.githubusercontent.com/jejoenje/Crawley/master/Data/insects.txt"
dd <- read.delim(text=getURL(url),header=TRUE)
## fix typo because I'm obsessive:
levels(dd$treatment) <- c("control","sprayed")
library(lme4)
model <- glmer(cbind(dead,alive)~block+treatment+(1|block/treatment),
data=dd,family=binomial)
If we look at the among-group standard deviation, we see that it's zero for both groups; it's exactly zero for block because block is already included in the fixed effects. It need not be for the treatment:block interaction (we have treatment, but not the interaction between block and treatment, in the fixed effects), but is because there's little among-treatment-within-block variation:
VarCorr(model)
## Groups Name Std.Dev.
## treatment:block (Intercept) 2.8736e-09
## block (Intercept) 0.0000e+00
Conceptually, it makes more sense to treat block as a random effect:
dd <- transform(dd,prop=dead/(alive+dead),ntot=alive+dead)
model1 <- glmer(prop~treatment+(1|block/treatment),
weights=ntot,
data=dd,family=binomial)
summary(model)
## ...
## Formula: prop ~ treatment + (1 | block/treatment)
## Random effects:
## Groups Name Variance Std.Dev.
## treatment:block (Intercept) 0.02421 0.1556
## block (Intercept) 0.18769 0.4332
## Number of obs: 48, groups: treatment:block, 12; block, 6
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1640 0.2042 -5.701 1.19e-08 ***
## treatmentsprayed 3.2434 0.1528 21.230 < 2e-16 ***
Sometimes you might want to treat it as a fixed effect:
model2 <- update(model1,.~treatment+block+(1|block:treatment))
summary(model2)
## Random effects:
## Groups Name Variance Std.Dev.
## block:treatment (Intercept) 5.216e-18 2.284e-09
## Number of obs: 48, groups: block:treatment, 12
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.5076 0.0739 -6.868 6.50e-12 ***
## treatmentsprayed 3.2676 0.1182 27.642 < 2e-16 ***
Now the block-by-treatment interaction variance is effectively zero (because block soaks up more variability if treated as a fixed effect). However, the estimated effect of spraying is very nearly identical.
If you're worried about overdispersion you can add an individual-level random effect (or use MASS::glmmPQL; lme4 no longer fits quasi-likelihood models)
dd <- transform(dd,obs=factor(seq(1:nrow(dd))))
model3 <- update(model1,.~.+(1|obs))
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 4.647e-01 6.817e-01
## treatment:block (Intercept) 1.138e-09 3.373e-05
## block (Intercept) 1.813e-01 4.258e-01
## Number of obs: 48, groups: obs, 48; treatment:block, 12; block, 6
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1807 0.2411 -4.897 9.74e-07 ***
## treatmentsprayed 3.3481 0.2457 13.626 < 2e-16 ***
The observation-level effect has effectively replaced the treatment-by-block interaction (which is now close to zero). Again, the estimated spraying effect has hardly changed (but its standard error is twice as large ...)

Coefficients output in R without alphabetical stacked-up default

Both in lm() as in glm or in lmer the default output for coefficients is to format them as as an (Intercept) corresponding to the variable with the highest alphabet order, followed by the rest of the coefficients. To find out the actual value of any coefficient, it is necessary to add (or subtract) from the (Intercept) baseline and, depending on the model, multiple intermediate coefficients.
Although this is not a problem when using just a couple of regressors, it is cumbersome and prone to error in more complex models. For example:
For the data:
head(trees)
site tree treatment organ sample tissue length
1 L LT01 T root 1 phloem 90.9924
2 L LT01 T root 1 xylem 123.4933
3 L LT01 T root 2 phloem 101.2444
4 L LT01 T root 2 xylem 106.0529
5 L LT01 T root 3 phloem 108.8453
6 L LT01 T root 3 xylem 126.5165
The call:
fit <- lmer(length ~ treatment + organ + tissue + (1|tree/organ/sample), data = trees)
summary(fit)
Random effects:
Groups Name Variance Std.Dev.
sample:(organ:tree) (Intercept) 1.035e-12 1.017e-06
organ:tree (Intercept) 0.000e+00 0.000e+00
tree (Intercept) 8.796e+00 2.966e+00
Residual 8.873e+01 9.420e+00
Number of obs: 360, groups: sample:(organ:tree), 180; organ:tree, 60; tree, 30
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 93.7903 1.2539 58.4000 74.798 < 2e-16 ***
treatmentT 13.5571 1.4692 28.0000 9.227 5.51e-10 ***
organstem 8.1326 0.9929 328.0000 8.191 5.77e-15 ***
tissuexylem 13.9814 0.9929 328.0000 14.081 < 2e-16 ***
yields coefficients expressed in a highly dependent way, which makes the quick calculation of any one of them tedious.
Is there a way of getting the actual coefficients, at least for the fixed effects, in a more straightforward format?

Get variables from summary?

I want to grab the Standard Error column when I do summary on a linear regression model. The output is below:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.436954 0.616937 -13.676 < 2e-16 ***
x1 -0.138902 0.024247 -5.729 1.01e-08 ***
x2 0.005978 0.009142 0.654 0.51316 `
...
I just want the Std. Error column values stored into a vector. How would I go about doing so? I tried model$coefficients[,2] but that keeps giving me extra values. If anyone could help that would be great.
Say fit is the linear model, then summary(fit)$coefficients[,2] has the standard errors. Type ?summary.lm.
fit <- lm(y~x, myData)
summary(fit)$coefficients[,1] # the coefficients
summary(fit)$coefficients[,2] # the std. error in the coefficients
summary(fit)$coefficients[,3] # the t-values
summary(fit)$coefficients[,4] # the p-values

Resources