lmerTest results in console but not displayed in knitted PDF - r

I have a question regarding lmerTest for approximating the degrees of freedom and p-values for a linear mixed model.
I just took a stats class in R to help me with my behavior experiments in the lab, so I am very new to this. In RStudio, I can run anova() after loading the lmerTest library and see the results in my console, shown below:
lmsocial <- lmer(social~ grps + stim + grps*stim + (1|cohort) +(1|subj))
library(lmerTest)
anova(lmsocial)
Analysis of Variance Table of type 3 with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
grps 22471 22471 1 21 5.5922 0.027747 *
stim 54289 54289 1 22 13.5107 0.001326 **
grps:stim 40423 40423 1 22 10.0599 0.004416 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However, below is the readout I get in my PDF when I knit in RStudio (same for knitting to HTML). There are no errors when knitting, just some info missing:
lmsocial <- lmer(social~ grps + stim + grps*stim + (1|cohort) +(1|subj))
library(lmerTest)
anova(lmsocial)
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## grps 1 22471 22471 5.5922
## stim 1 54289 54289 13.5107
## grps:stim 1 40423 40423 10.0599
Am I missing something? When trying to generate a report, it would be nice to display the resulting ANOVA table from lmerTest.
How can I get it to knit properly?

lmerTest is doing something a bit sneaky, I haven't figured out what yet. In the meantime, it seems to work as long as you add an explicit print() statement:
library("lmerTest")
fm1 <- lmer(Reaction~Days + (Days|Subject),sleepstudy)
print(a1 <- anova(fm1))
(or print(anova(fm1)), or a1 <- anova(fm1); print(a1))

The lmerTest package needs to be attached before the lmer model is fitted, so that lmerTest's version of lmer is the one that gets called. The following should work:
library(lmerTest)
lmsocial <- lmer(social~ grps + stim + grps*stim + (1|cohort) +(1|subj))
anova(lmsocial)
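To see why the order matters, here is a minimal sketch that fits the same model twice, once with lme4's lmer and once with lmerTest's lmer. It assumes lme4 and lmerTest are installed and uses the sleepstudy data that ships with lme4 as a stand-in for the asker's data; the object names are only for illustration.

```r
# Assumption: lme4 and lmerTest are installed; sleepstudy stands in for the real data.
library(lmerTest)  # also attaches lme4 and masks lme4::lmer

# Same model, fitted by each package's lmer (namespaces spelled out on purpose)
fm_plain <- lme4::lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)
fm_test  <- lmerTest::lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)

class(fm_plain)  # a plain lme4 fit
class(fm_test)   # an lmerTest fit, which carries what anova() needs for p-values

# Only the lmerTest fit yields a p-value column in the ANOVA table
has_p_plain <- "Pr(>F)" %in% colnames(anova(fm_plain))
has_p_test  <- "Pr(>F)" %in% colnames(anova(fm_test))
c(plain = has_p_plain, lmerTest = has_p_test)
```

In other words, a model fitted before library(lmerTest) is a plain lme4 object, and loading lmerTest afterwards does not change it; refitting after attaching the package (or calling lmerTest::lmer directly) should.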

Related

R: step function not writing out complete model in result report

I am running the "step" function in RStudio on this model:
inputData.entry = lmer(height ~ ENTRY_NO + REP + (1|SUB.BLOCK), data=inputData); # our model
this is what I am running with "step" :
help.search("step",package="lmerTest");
st <- step(inputData.entry, reduce.fixed=FALSE);
print(st);
Here is the output:
Backward reduced random-effect table:
Eliminated npar logLik AIC LRT Df Pr(>Chisq)
<none> 142 -397.15 1078.3
(1 | SUB.BLOCK) 1 141 -397.47 1076.9 0.63157 1 0.4268
Backward reduced fixed-effect table:
Eliminated Df Sum of Sq RSS AIC F value Pr(>F)
ENTRY_NO 0 138 4238.1 6210.4 844.18 1.9775 5.749e-05 ***
REP 0 1 30.6 2002.9 816.03 1.9720 0.1627
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model found:
height ~ ENTRY_NO + REP
My issue is the statement---- Model Found:
Why won't the results list "Sub.Block" in the model when inputData.entry shows it in "lmer"?
Is there something I am doing wrong?
Thanks for the advice!
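lmerTest's step function performs backward elimination rather than AIC-based selection (the latter is what stats::step does): random-effect terms are tested with likelihood ratio tests and fixed-effect terms with F tests. "Model found" reports the model that remains after elimination. In this instance the random-effect table marks (1 | SUB.BLOCK) as eliminated (LRT p = 0.4268), so the algorithm is telling us that SUB.BLOCK doesn't add enough information to the analysis to justify the added complication, and thus it's not in the suggested model.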
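A minimal sketch of how the elimination tables and the resulting model can be inspected, assuming a recent lmerTest (3.x, where get_model() is available) and using lme4's sleepstudy data instead of the asker's inputData, which is not available here:

```r
# Assumption: lmerTest >= 3.0 and lme4 are installed; sleepstudy is a stand-in dataset.
library(lmerTest)

fm <- lmerTest::lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)

# Backward elimination of random effects only; fixed effects are kept as they are
st <- lmerTest::step(fm, reduce.fixed = FALSE)
print(st)                 # shows the elimination tables and "Model found"

final <- get_model(st)    # the model object that "Model found" refers to
formula(final)
```

If every random-effect term is eliminated, as in the question, the remaining model has only fixed effects, which is why the formula under "Model found" no longer contains the (1 | ...) term.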

R One-Way ANOVA (getting only 1 DF and expecting 2 DFs)

I'm working through the examples of One-Way ANOVA on the UCLA website http://www.ats.ucla.edu/stat/r/faq/posthoc.htm.
When I run the command a1 <- aov(write ~ ses), my output differs from the example output. I'm particularly bothered by the fact that when I run the command summary(a1), my DF on ses is 1, but there are three ses categories (1, 2, 3), so I'm expecting 2 DFs, which is what the example on the website shows. I've checked the data for the 'write' column and 'ses' column and the counts and averages seem to match the example, but the result from aov(write ~ ses) doesn't. Has something changed? Why am I getting only 1 DF?
hsb2 <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.csv", sep=",", header=TRUE)
a1 <- aov(write ~ ses, data = hsb2)
summary(a1)
# Df Sum Sq Mean Sq F value Pr(>F)
# ses 1 770 769.8 8.908 0.0032 **
# Residuals 198 17109 86.4
The page you are learning from has an error, in that it doesn't tell you how to enter the data correctly. The ses variable is supposed to be a factor, but as we can see from the data they give us, it is read in as numeric:
str(hsb2$ses)
If we convert it to a factor, we get the same answer as the example:
hsb2$ses <- as.factor(hsb2$ses)
a1 <- aov(write ~ ses, data=hsb2)
summary(a1)
Df Sum Sq Mean Sq F value Pr(>F)
ses 2 859 429.4 4.97 0.00784 **
Residuals 197 17020 86.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In addition, using attach is highly discouraged by most R users.
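The effect of the conversion can be reproduced without the downloaded file. A minimal sketch on simulated data (the toy data frame and its column names are made up for illustration): a grouping variable coded 1, 2, 3 is treated as a single numeric slope unless it is wrapped in factor().

```r
# Hypothetical toy data: three groups coded as the integers 1, 2, 3
set.seed(42)
toy <- data.frame(group = rep(1:3, each = 20))
toy$score <- 50 + c(0, 5, 1)[toy$group] + rnorm(60, sd = 8)

# Numeric predictor: one slope, so 1 degree of freedom
df_numeric <- summary(aov(score ~ group, data = toy))[[1]]$Df

# Factor predictor: three levels, so 3 - 1 = 2 degrees of freedom
df_factor <- summary(aov(score ~ factor(group), data = toy))[[1]]$Df

df_numeric  # term Df, then residual Df
df_factor
```

The residual degrees of freedom shift by one as well, matching the 198 versus 197 seen in the two outputs above.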

fit exponential decay in increase form in R

I want to fit a function in the increase form of exponential decay (or asymptotic curve), such that:
Richness = C*(1-exp(k*Abundance)) # k < 0
I've read on this page about expn() function, but simply can't find it (or a nls package). All I found was a nlstools package, but it has no expn(). I tried with the usual nls and exp function, but I only get increasing exponentials...
I want to fit the graph like below (drawn in Paint), and I don't know where the curve should stabilize (Richness = C). Thanks in advance.
This should get you started. Read the documentation on nls(...) (type ?nls at the command prompt). Also look up ?summary and ?predict.
set.seed(1) # so the example is reproducible
df <- data.frame(Abundance=sort(sample(1:70,30)))
df$Richness <- with(df, 20*(1-exp(-0.03*Abundance))+rnorm(30))
fit <- nls(Richness ~ C*(1-exp(k*Abundance)),data=df,
algorithm="port",
start=c(C=10,k=-1),lower=c(C=0,k=-Inf), upper=c(C=Inf,k=0))
summary(fit)
# Formula: Richness ~ C * (1 - exp(k * Abundance))
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# C 20.004173 0.726344 27.54 < 2e-16 ***
# k -0.030183 0.002334 -12.93 2.5e-13 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.7942 on 28 degrees of freedom
#
# Algorithm "port", convergence message: relative convergence (4)
df$pred <- predict(fit)
plot(df$Abundance,df$Richness)
lines(df$Abundance,df$pred, col="blue",lty=2)
Thanks, jlhoward. I've got to something similar after reading the link sent by shujaa.
R <- function(a, b, abT) a*(1 - exp(-b*abT))
form <- Richness ~ R(a,b,Abundance)
fit <- nls(form, data=d, start=list(a=20,b=0.01))
plot(d$Abundance,d$Richness, xlab="Abundance", ylab="Richness")
lines(d$Abundance, predict(fit, list(Abundance=d$Abundance)))
I've found the initial values by trial and error, though. So your solution looks better :)
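Starting values for this model do not have to come from trial and error. A minimal sketch, assuming simulated data like that in the answer above: take C0 a little above the largest observed Richness, then linearise the model as log(1 - Richness/C0) = k*Abundance and estimate k0 with a no-intercept lm. The names C0 and k0 and the 1.1 inflation factor are choices made for this illustration.

```r
# Simulated data in the same shape as the example above
set.seed(1)
d <- data.frame(Abundance = sort(sample(1:70, 30)))
d$Richness <- 20 * (1 - exp(-0.03 * d$Abundance)) + rnorm(30)

# Rough asymptote: slightly above the observed maximum so the log stays defined
C0 <- 1.1 * max(d$Richness)

# Linearised model gives a rough rate: log(1 - R/C0) = k * Abundance
k0 <- unname(coef(lm(log(1 - Richness / C0) ~ 0 + Abundance, data = d)))

# Use the rough values as starting points for the nonlinear fit
fit <- nls(Richness ~ C * (1 - exp(k * Abundance)), data = d,
           start = list(C = C0, k = k0))
coef(fit)
```

The linearised estimates are biased, so they are only meant to get nls into the right neighbourhood.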

"weighted" regression in R

I have created a script like the one below to do something I call "weighted" regression:
library(plyr)
set.seed(100)
temp.df <- data.frame(uid=1:200,
bp=sample(x=c(100:200),size=200,replace=TRUE),
age=sample(x=c(30:65),size=200,replace=TRUE),
weight=sample(c(1:10),size=200,replace=TRUE),
stringsAsFactors=FALSE)
temp.df.expand <- ddply(temp.df,
c("uid"),
function(df) {
data.frame(bp=rep(df[,"bp"],df[,"weight"]),
age=rep(df[,"age"],df[,"weight"]),
stringsAsFactors=FALSE)})
temp.df.lm <- lm(bp~age,data=temp.df,weights=weight)
temp.df.expand.lm <- lm(bp~age,data=temp.df.expand)
You can see that in temp.df each row has a weight. What I mean is that there are 1178 samples in total, but rows with the same bp and age are merged into one row, with the count recorded in the weight column.
I used the weights parameter in the lm function, then cross-checked the result against another data frame in which temp.df is "expanded". But I found that the lm outputs differ for the two data frames.
Did I misinterpret the weights parameter in the lm function, and can anyone tell me how to run the regression properly (i.e. without expanding the data frame manually) for a dataset presented like temp.df? Thanks.
The problem here is that the degrees of freedom are not being properly added up to get the right Df and mean-sum-squares statistics. This will correct the problem:
temp.df.lm.aov <- anova(temp.df.lm)
temp.df.lm.aov$Df[length(temp.df.lm.aov$Df)] <-
sum(temp.df.lm$weights)-
sum(temp.df.lm.aov$Df[-length(temp.df.lm.aov$Df)] ) -1
temp.df.lm.aov$`Mean Sq` <- temp.df.lm.aov$`Sum Sq`/temp.df.lm.aov$Df
temp.df.lm.aov$`F value`[1] <- temp.df.lm.aov$`Mean Sq`[1]/
temp.df.lm.aov$`Mean Sq`[2]
temp.df.lm.aov$`Pr(>F)`[1] <- pf(temp.df.lm.aov$`F value`[1], 1,
temp.df.lm.aov$Df, lower.tail=FALSE)[2]
temp.df.lm.aov
Analysis of Variance Table
Response: bp
Df Sum Sq Mean Sq F value Pr(>F)
age 1 8741 8740.5 10.628 0.001146 **
Residuals 1176 967146 822.4
Compare with:
> anova(temp.df.expand.lm)
Analysis of Variance Table
Response: bp
Df Sum Sq Mean Sq F value Pr(>F)
age 1 8741 8740.5 10.628 0.001146 **
Residuals 1176 967146 822.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I am a bit surprised this has not come up more often on R-help. Either that or my search strategy development powers are weakening with old age.
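The same point can be checked on the coefficient table: lm treats weights as precision weights rather than frequency counts, so the coefficients agree with the expanded fit and only the residual degrees of freedom (and hence the standard errors) differ. A minimal sketch in base R, with data simulated along the lines of the question (the names agg and expanded are made up here):

```r
# Simulated aggregated data: one row per (bp, age) pattern, weight = row count
set.seed(100)
agg <- data.frame(bp     = sample(100:200, 200, replace = TRUE),
                  age    = sample(30:65, 200, replace = TRUE),
                  weight = sample(1:10, 200, replace = TRUE))

# Expand by repeating each row 'weight' times
expanded <- agg[rep(seq_len(nrow(agg)), agg$weight), c("bp", "age")]

fit_w <- lm(bp ~ age, data = agg, weights = weight)
fit_e <- lm(bp ~ age, data = expanded)

# Coefficients agree; residual degrees of freedom do not
coef_match <- all.equal(coef(fit_w), coef(fit_e))
c(df.residual(fit_w), df.residual(fit_e))

# Rescaling the weighted standard errors by the df ratio recovers the expanded ones
se_w <- coef(summary(fit_w))[, "Std. Error"]
se_rescaled <- se_w * sqrt(df.residual(fit_w) / df.residual(fit_e))
se_match <- all.equal(se_rescaled, coef(summary(fit_e))[, "Std. Error"])
```

The rescaling works because both fits share the same weighted residual sum of squares and differ only in the denominator used for the residual variance.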

How can I classify post-hoc test results in R?

I am trying to understand how to work with ANOVAs and post-hoc tests in R.
So far, I have used aov() and TukeyHSD() to analyse my data. Example:
uni2.anova <- aov(Sum_Uni ~ Micro, data= uni2)
uni2.anova
Call:
aov(formula = Sum_Uni ~ Micro, data = uni2)
Terms:
Micro Residuals
Sum of Squares 0.04917262 0.00602925
Deg. of Freedom 15 48
Residual standard error: 0.01120756
Estimated effects may be unbalanced
My problem is, now I have a huge list of pairwise comparisons but cannot do anything with it:
TukeyHSD(uni2.anova)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Sum_Uni ~ Micro, data = uni2)
$Micro
diff lwr upr p adj
Act_Glu2-Act_Ala2 -0.0180017863 -0.046632157 0.0106285840 0.6448524
Ana_Ala2-Act_Ala2 -0.0250134285 -0.053643799 0.0036169417 0.1493629
NegI_Ala2-Act_Ala2 0.0702274527 0.041597082 0.0988578230 0.0000000
This dataset has 40 rows...
Ideally, I would like to get a dataset that looks something like this:
Act_Glu2 : a
Act_Ala2 : a
NegI_Ala2: b...
I hope you get the point. So far, I have found nothing comparable online... I also tried to select only significant pairs in the file resulting from TukeyHSD, but the file does not "acknowledge" that it is made up of rows & columns, making selecting impossible...
Maybe there is something fundamentally wrong with my approach?
I think the OP wants the letters to get a view of the comparisons.
library(multcompView)
multcompLetters(extract_p(TukeyHSD(uni2.anova)))
That will get you the letters.
You can also use the multcomp package
library(multcomp)
cld(glht(uni2.anova, linfct = mcp(Micro = "Tukey")))
I hope this is what you need.
The results from the TukeyHSD are a list. Use str to look at the structure. In your case you'll see that it's a list of one item and that item is basically a matrix. So, to extract the first column you'll want to save the TukeyHSD result
hsd <- TukeyHSD(uni2.anova)
If you look at str(hsd) you can see that you can then get at bits...
hsd$Micro[,1]
That will give you the column of your differences. You should be able to extract what you want now.
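Since the list element is a matrix with named columns, selecting only the significant pairs is ordinary matrix indexing. A minimal sketch using the built-in warpbreaks data in place of uni2 (which is not available here), with 0.05 as an assumed cutoff:

```r
# Built-in data as a stand-in: breaks by tension level (L, M, H)
fit <- aov(breaks ~ tension, data = warpbreaks)
hsd <- TukeyHSD(fit)

# The element named after the factor is a matrix: diff, lwr, upr, "p adj"
pairs <- hsd$tension
colnames(pairs)

# Keep only the rows whose adjusted p-value is below the cutoff
sig <- pairs[pairs[, "p adj"] < 0.05, , drop = FALSE]
sig

# Row names identify the pair being compared
rownames(sig)
```

drop = FALSE keeps the result a matrix even when only one comparison passes the cutoff.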
Hard to tell without example data, but assuming Micro is just a factor with 4 levels and uni2 looks something like
n = 40
Micro = c('Act_Glu2', 'Act_Ala2', 'Ana_Ala2', 'NegI_Ala2')[sample(4, 40, rep=T)]
Sum_Uni = rnorm(n, 5, 0.5)
Sum_Uni[Micro=='Act_Glu2'] = Sum_Uni[Micro=='Act_Glu2'] + 0.5
uni2 = data.frame(Sum_Uni, Micro)
> uni2
Sum_Uni Micro
1 4.964061 Ana_Ala2
2 4.807680 Ana_Ala2
3 4.643279 NegI_Ala2
4 4.793383 Act_Ala2
5 5.307951 NegI_Ala2
6 5.171687 Act_Glu2
...
then I think what you're actually trying to get at is the basic multiple regression output:
fit = lm(Sum_Uni ~ Micro, data = uni2)
summary(fit)
anova(fit)
> summary(fit)
Call:
lm(formula = Sum_Uni ~ Micro, data = uni2)
Residuals:
Min 1Q Median 3Q Max
-1.26301 -0.35337 -0.04991 0.29544 1.07887
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8364 0.1659 29.157 < 2e-16 ***
MicroAct_Glu2 0.9542 0.2623 3.638 0.000854 ***
MicroAna_Ala2 0.1844 0.2194 0.841 0.406143
MicroNegI_Ala2 0.1937 0.2158 0.898 0.375239
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4976 on 36 degrees of freedom
Multiple R-squared: 0.2891, Adjusted R-squared: 0.2299
F-statistic: 4.88 on 3 and 36 DF, p-value: 0.005996
> anova(fit)
Analysis of Variance Table
Response: Sum_Uni
Df Sum Sq Mean Sq F value Pr(>F)
Micro 3 3.6254 1.20847 4.8801 0.005996 **
Residuals 36 8.9148 0.24763
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You can access the numbers in any of these tables like, for example,
> summary(fit)$coef[2,4]
[1] 0.0008536287
To see the list of what is stored in each object, use names():
> names(summary(fit))
[1] "call" "terms" "residuals" "coefficients"
[5] "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
In addition to the TukeyHSD() function you found, there are many other options for looking at the pairwise tests further, and correcting the p-values if desired. These include pairwise.table(), estimable() in gmodels, the resampling and boot packages, and others...
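One base R option along those lines is pairwise.t.test() from the stats package, which runs all pairwise t tests and adjusts the p-values in one call. A minimal sketch on the built-in warpbreaks data (a stand-in for uni2), with Holm's method picked as the adjustment for illustration:

```r
# Pairwise t tests between tension levels, Holm-adjusted
pw <- pairwise.t.test(warpbreaks$breaks, warpbreaks$tension,
                      p.adjust.method = "holm")
pw$p.value  # lower-triangular matrix of adjusted p-values

# Same tests without adjustment, for comparison
raw <- pairwise.t.test(warpbreaks$breaks, warpbreaks$tension,
                       p.adjust.method = "none")

# Holm-adjusted p-values are never smaller than the raw ones
all(pw$p.value >= raw$p.value, na.rm = TRUE)
```

The $p.value component is a plain matrix, so it can be indexed the same way as the TukeyHSD result.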
