If loop condition with result from summary() after manova() in R

I am running a manova and my code is something like this:
fit <- manova(y ~ grp + cov)
summary(fit)
So the output of summary is:
Df Pillai approx F num Df den Df Pr(>F)
grp 2 0.185330 5.6511 6 332 1.322e-05 ***
age 1 0.110497 6.8323 3 165 0.0002284 ***
fd 1 0.153049 9.9388 3 165 4.646e-06 ***
scan 1 0.037374 2.1354 3 165 0.0977272 .
Residuals 167
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I am having trouble finding a way to extract the p value for grp (the Pr(>F) entry of the first row, 1.322e-05 in this case) so that I can write a conditional to run univariate and post hoc tests on significant models. I am hoping to have something like:
p <- [method of obtaining the p value (1.322e-05 in this case)]
if (p < 0.017) {
  for (i in 1:length(dependentvars)) {
    # univariate tests with aov()
    p.uni <- [p value from the univariate test, obtained the same way]
    if (p.uni < [threshold]) {
      summary(glht(....
Any advice or direction is greatly appreciated.
Thank you
-J
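For what it's worth, here is a sketch of one way to get at that value, using the built-in iris data since the asker's data aren't shown: summary.manova() returns a list whose $stats component is a plain numeric matrix, so the p value can be indexed by row and column name.

```r
# Sketch using built-in iris data, since the asker's data aren't shown;
# summary.manova()'s $stats component is a plain numeric matrix.
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
s <- summary(fit)                  # default test statistic is Pillai
s$stats                            # rows = terms, cols = Df, Pillai, ..., Pr(>F)
p <- s$stats["Species", "Pr(>F)"]  # index by row and column name
if (p < 0.017) {
  # run univariate and post hoc tests here
}
```

The same `s$stats[1, "Pr(>F)"]` indexing works positionally if the term name is not known in advance.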

How to set up a 2 way repeated measures ANCOVA using R?

I have not seen an example of this particular test in any of the many forums I have been scouring for R help. I have seen an example of a 1-way ANCOVA here and of a 2-way repeated-measures ANOVA here. I present my version below, after jumping through several hoops, and I am not sure if it is correct.
I am trying to set up a statistical test for results from a clinical trial. A simplified version of the study is as follows. I have divided my subjects into two groups (Conditions A and B) with n = 3 subjects per group. The dependent variable (DV) was measured in these subjects over 4 time points. I aim to conduct a two-way repeated-measures ANCOVA to ask: is the response variable (DV) affected by the Condition-timepoint interaction, with bodywt as a covariate?
Note: I have to stick to the "traditional" aov function rather than lme or lmer, since my reports will be compared to SPSS-generated output -- see the discussion here on why aov and lme are not the same.
My example code:
SubjectID <- rep(1:6, each = 4)
Condition <- c(rep('A', 12), rep('B', 12))
bodywt <- rep(16:21, each = 4)  # 24 values, one weight per subject (rep(16:20, each = 4) gives only 20 and would not recycle)
timepoint <- rep(1:4, 6)
DV <- c(42,51,63,57,46,60,63,61,41,55,62,57,73,56,53,58,50,60,56,54,52,57,54,54)
data <- data.frame(SubjectID = SubjectID,
                   Condition = Condition,
                   bodywt = bodywt,
                   timepoint = timepoint,
                   DV = DV)
#Setting up my 2 way ANCOVA repeated measures
stat_result <- aov(DV~Condition*timepoint + bodywt:Condition + Error(SubjectID), data)
summary(stat_result)
#Error: SubjectID
# Df Sum Sq Mean Sq
#Condition 1 0.8036 0.8036
#
#Error: Within
# Df Sum Sq Mean Sq F value Pr(>F)
#Condition 1 41.8 41.8 1.341 0.26282
#timepoint 1 126.1 126.1 4.045 0.06042 .
#Condition:timepoint 1 323.4 323.4 10.377 0.00501 **
#Condition:bodywt 2 81.7 40.9 1.311 0.29540
#Residuals 17 529.8 31.2
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I know that I am on the right track because when I change how the covariate bodywt enters the model (as a main effect rather than an interaction with Condition), I notice that the F value and Pr(>F) for Condition:timepoint are predictably affected.
stat_result <- aov(DV~Condition*timepoint + bodywt + Error(SubjectID), data)
summary(stat_result)
#Error: SubjectID
# Df Sum Sq Mean Sq
#Condition 1 0.8036 0.8036
#
#Error: Within
# Df Sum Sq Mean Sq F value Pr(>F)
#Condition 1 41.8 41.8 1.231 0.28175
#timepoint 1 126.1 126.1 3.714 0.06989 .
#bodywt 1 0.5 0.5 0.015 0.90435
#Condition:timepoint 1 323.4 323.4 9.527 0.00636 **
#Residuals 18 611.0 33.9
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My questions are:
Is my code the correct way to set up a two-way repeated-measures ANCOVA?
(bonus question) What would be the best post hoc test to compare means between the two groups? Again, I would prefer to stay away from lme and lmer.
Thanks in advance
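Not an authoritative answer, but for comparison, the textbook aov setup for a two-way repeated-measures design (the one that mirrors SPSS's GLM repeated-measures procedure) usually converts the grouping variables to factors and declares the within-subject factor inside the Error() term. A sketch on the example data above, with bodywt extended to 24 values (one weight per subject, since rep(16:20, each = 4) yields only 20); whether this is the "correct" model for the trial is a judgment call, not something this sketch settles:

```r
# Hypothetical alternative setup: timepoint as a within-subject FACTOR,
# with Error(SubjectID/timepoint) splitting the error strata.
df <- data.frame(
  SubjectID = factor(rep(1:6, each = 4)),
  Condition = factor(rep(c("A", "B"), each = 12)),
  bodywt    = rep(16:21, each = 4),   # one (assumed) body weight per subject
  timepoint = factor(rep(1:4, 6)),
  DV = c(42,51,63,57,46,60,63,61,41,55,62,57,
         73,56,53,58,50,60,56,54,52,57,54,54)
)
fit <- aov(DV ~ Condition * timepoint + bodywt +
             Error(SubjectID/timepoint), data = df)
summary(fit)
```

With timepoint as a factor, its effect uses 3 Df (rather than 1 Df for a linear trend), and the covariate bodywt is tested in the between-subjects stratum.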

Proportion of variance of outcome explained by each variable in a linear regression

In the example data set found below I want to calculate the proportion of variance in science explained by each independent variable using linear regression model. How could I achieve that in R?
hsb2 <- read.table('http://www.ats.ucla.edu/stat/r/modules/hsb2.csv', header=T, sep=",")
m1 <- lm(science ~ math + female + socst + read, data = hsb2)
One way is to use the anova() function from the stats package.
It gives you the sequential sum of squares attributed to each variable plus the residual sum of squares; together these add up to the total sum of squares (i.e. the total variance).
anova(m1)
Analysis of Variance Table
Response: science
Df Sum Sq Mean Sq F value Pr(>F)
math 1 7760.6 7760.6 151.8810 < 2.2e-16 ***
female 1 233.0 233.0 4.5599 0.033977 *
socst 1 465.6 465.6 9.1128 0.002878 **
read 1 1084.5 1084.5 21.2254 7.363e-06 ***
Residuals 195 9963.8 51.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
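To turn that table into proportions, divide each sum of squares by the total. A sketch using the built-in mtcars data (rather than hsb2, since the URL above may no longer resolve); note that anova() gives sequential (Type I) sums of squares, so the proportions depend on the order of terms in the formula:

```r
# Proportion of total variance attributed to each term (and residuals)
m1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # stand-in for the hsb2 model
a  <- anova(m1)
prop <- a[["Sum Sq"]] / sum(a[["Sum Sq"]])       # each SS over the total SS
names(prop) <- rownames(a)
round(100 * prop, 1)                             # percentages, incl. Residuals
```

Reordering the predictors in the formula will change the individual proportions (but not their sum), which is the main caveat of this sequential decomposition.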

Extracting Multivariate Tests from the output of Anova or Manova function from car package

I wonder how to extract the Multivariate Tests: Site portion from the output of fm1 in the following MWE.
library(car)
fm1 <- summary(Anova(lm(cbind(Al, Fe, Mg, Ca, Na) ~ Site, data=Pottery)))
fm1
Type II MANOVA Tests:
Sum of squares and products for error:
Al Fe Mg Ca Na
Al 48.2881429 7.08007143 0.60801429 0.10647143 0.58895714
Fe 7.0800714 10.95084571 0.52705714 -0.15519429 0.06675857
Mg 0.6080143 0.52705714 15.42961143 0.43537714 0.02761571
Ca 0.1064714 -0.15519429 0.43537714 0.05148571 0.01007857
Na 0.5889571 0.06675857 0.02761571 0.01007857 0.19929286
------------------------------------------
Term: Site
Sum of squares and products for the hypothesis:
Al Fe Mg Ca Na
Al 175.610319 -149.295533 -130.809707 -5.8891637 -5.3722648
Fe -149.295533 134.221616 117.745035 4.8217866 5.3259491
Mg -130.809707 117.745035 103.350527 4.2091613 4.7105458
Ca -5.889164 4.821787 4.209161 0.2047027 0.1547830
Na -5.372265 5.325949 4.710546 0.1547830 0.2582456
Multivariate Tests: Site
Df test stat approx F num Df den Df Pr(>F)
Pillai 3 1.55394 4.29839 15 60.00000 2.4129e-05 ***
Wilks 3 0.01230 13.08854 15 50.09147 1.8404e-12 ***
Hotelling-Lawley 3 35.43875 39.37639 15 50.00000 < 2.22e-16 ***
Roy 3 34.16111 136.64446 5 20.00000 9.4435e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I also couldn't find a way to extract the table of tests, but as a workaround you can calculate the results by running the Anova command over all test types.
However, the print method, print.Anova.mlm, does not return the results, so it needs to be tweaked a little.
library(car)
# create new print function
outtests <- car:::print.Anova.mlm
# allow the function to return the results and disable print
body(outtests)[[16]] <- quote(invisible(tests))
body(outtests)[[15]] <- NULL
# Now run the regression
mod <- lm(cbind(Al, Fe, Mg, Ca, Na) ~ Site, data=Pottery)
# Run the Anova over all tests
tab <- lapply(c("Pillai", "Wilks", "Hotelling-Lawley", "Roy"),
              function(i) outtests(Anova(mod, test.statistic = i)))
tab <- do.call(rbind, tab)
row.names(tab) <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")
tab
# Type II MANOVA Tests: Pillai test statistic
# Df test stat approx F num Df den Df Pr(>F)
#Pillai 3 1.554 4.298 15 60.000 2.413e-05 ***
#Wilks 3 0.012 13.089 15 50.091 1.840e-12 ***
#Hotelling-Lawley 3 35.439 39.376 15 50.000 < 2.2e-16 ***
#Roy 3 34.161 136.644 5 20.000 9.444e-15 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As the output table is of class anova and data.frame, you can use xtable on it.
xtable::xtable(tab)
fm1$multivariate.tests gets you to the Site portion of the fm1 output.
Then you could use a combination of cat and capture.output for nice printing, or just capture.output for a character vector.
> cat(capture.output(fm1$multivariate.tests)[18:26], sep = "\n")
#
# Multivariate Tests: Site
# Df test stat approx F num Df den Df Pr(>F)
# Pillai 3 1.55394 4.29839 15 60.00000 2.4129e-05 ***
# Wilks 3 0.01230 13.08854 15 50.09147 1.8404e-12 ***
# Hotelling-Lawley 3 35.43875 39.37639 15 50.00000 < 2.22e-16 ***
# Roy 3 34.16111 136.64446 5 20.00000 9.4435e-15 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Update: From the result of
unlist(fm1$multivariate.tests, recursive = FALSE)
it doesn't look like the results are easily accessible as numeric values. So, as you requested, here is what it took to manipulate the results into a matrix. Having done this and then seen user20650's answer, I recommend you follow that suggestion and get the values via an ANOVA table.
# grab the printed lines holding the header and the four test rows
co <- capture.output(fm1$multivariate.tests)[20:24]
# strip trailing significance stars and "<", then split each row on whitespace
s <- strsplit(gsub("([*]+$)|[<]", "", co[-1]), "\\s+")
# numeric fields -> matrix; the first field of each row is the test name
dc <- do.call(rbind, lapply(s, function(x) as.numeric(x[-1])))
row.names(dc) <- sapply(s, "[", 1)
# rebuild the column names from the header line
s2 <- strsplit(co[1], " ")[[1]]
s2 <- s2[nzchar(s2)]
s3 <- s2[-c(1, length(s2))]
colnames(dc) <- c(s2[1], paste(s3[c(TRUE, FALSE)], s3[c(FALSE, TRUE)]), s2[10])
dc
# Df test stat approx F num Df den Df Pr(>F)
# Pillai 3 1.55394 4.29839 15 60.00000 2.4129e-05
# Wilks 3 0.01230 13.08854 15 50.09147 1.8404e-12
# Hotelling-Lawley 3 35.43875 39.37639 15 50.00000 2.2200e-16
# Roy 3 34.16111 136.64446 5 20.00000 9.4435e-15
If anyone feels like improving my second code chunk, feel free.

Anova difference in SPSS and R

I'm quite new to R, but I recently tried to run a two-way repeated-measures ANOVA to replicate results my supervisor obtained in SPSS.
I've struggled for days and read dozens of articles to understand what was going on in R, but I still don't get the same results.
> mod <- lm(Y~A*B)
> Anova(mod, type="III")
Anova Table (Type III tests)
Response: Y
Sum Sq Df F value Pr(>F)
(Intercept) 0.000 1 0.0000 1.00000
A 2.403 5 8.6516 4.991e-08 ***
B 0.403 2 3.6251 0.02702 *
A:B 1.220 10 2.1962 0.01615 *
Residuals 51.987 936
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My data are from a balanced design, and I used Type III SS since that is what SPSS uses as well. The sums of squares, the Df, and the linear model are the same as in SPSS; the only things that differ are the F and p values. Thus it should not be a sum-of-squares mistake.
Results in SPSS are:
F Sig.
A 7.831 .000
B 2.681 .073
A:B 2.247 .014
I'm a little bit lost. Could it be a problem related to the contrasts?
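Not a full answer, but since contrasts were raised: one standard thing to check is that Type III tests in R are only meaningful with sum-to-zero contrasts, which SPSS uses by default, whereas R defaults to contr.treatment. A minimal sketch with built-in mtcars data (the asker's data aren't shown), using base R's drop1() for marginal F tests; with the car package loaded, Anova(mod, type = "III") would be the usual route:

```r
# SPSS-style effect coding; without this, Type III SS in R can differ
old <- options(contrasts = c("contr.sum", "contr.poly"))
mod <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)  # toy two-way model
# Marginal (Type III-style) F tests in base R; scope = ~ . forces every
# term, including main effects, to be tested
dd <- drop1(mod, scope = ~ ., test = "F")
dd
options(old)  # restore the default treatment contrasts
```

That said, if the sums of squares already match SPSS and only F and p differ, the discrepancy more likely lies in which error term each effect is tested against, which is worth checking separately for a repeated-measures design.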
Lucas

conducting GAM-GEE in gamm4 R package?

I am trying to analyze some visual transect data of organisms to generate a habitat distribution model. Once organisms are sighted, they are followed and point data are collected at a given time interval. Because of the autocorrelation among these "follows," I wish to use a GAM-GEE approach similar to that of Pirotta et al. 2011, using the packages 'yags' and 'splines' (http://www.int-res.com/abstracts/meps/v436/p257-272/). Their R scripts are here (http://www.int-res.com/articles/suppl/m436p257_supp/m436p257_supp1-code.r). I have used this code with limited success, running into multiple issues of models failing to converge.
Below is the structure of my data:
> str(dat2)
'data.frame': 10792 obs. of 4 variables:
$ dist_slag : num 26475 26340 25886 25400 24934 ...
$ Depth : num -10.1 -10.5 -16.6 -22.2 -29.7 ...
$ dolphin_presence: int 0 0 0 0 0 0 0 0 0 0 ...
$ block : int 1 1 1 1 1 1 1 1 1 1 ...
> head(dat2)
dist_slag Depth dolphin_presence block
1 26475.47 -10.0934 0 1
2 26340.47 -10.4870 0 1
3 25886.33 -16.5752 0 1
4 25399.88 -22.2474 0 1
5 24934.29 -29.6797 0 1
6 24519.90 -26.2370 0 1
Here is the summary of my block variable (indicating the number of groups within which autocorrelation exists):
> summary(dat2$block)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 39.00 76.00 73.52 111.00 148.00
However, I would like to use the package 'gamm4', as I am more familiar with Professor Simon Wood's packages and functions, and it appears gamm4 might be the most appropriate. It is important to note that the models have a binary response (organism presence or absence along a transect), which is why I think gamm4 is more appropriate than gamm. The gamm help provides the following example for autocorrelation within factors:
## more complicated autocorrelation example - AR errors
## only within groups defined by `fac'
e <- rnorm(n,0,sig)
for (i in 2:n) e[i] <- 0.6*e[i-1]*(fac[i-1]==fac[i]) + e[i]
y <- f + e
b <- gamm(y~s(x,k=20),correlation=corAR1(form=~1|fac))
Following this example, the following is the code I used for my dataset
b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),random=(form=~1|block), family=binomial(),data=dat)
However, examining the output (summary(b$gam) and especially summary(b$mer)), I am either unsure how to interpret the results or do not believe that the autocorrelation within groups is being taken into consideration.
> summary(b$gam)
Family: binomial
Link function: logit
Formula:
dolphin_presence ~ s(dist_slag) + s(Depth)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -13.968 5.145 -2.715 0.00663 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(dist_slag) 4.943 4.943 70.67 6.85e-14 ***
s(Depth) 6.869 6.869 115.59 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.317   glmer.ML score = 10504   Scale est. = 1   n = 10792
>
> summary(b$mer)
Generalized linear mixed model fit by the Laplace approximation
AIC BIC logLik deviance
10514 10551 -5252 10504
Random effects:
Groups Name Variance Std.Dev.
Xr s(dist_slag) 1611344 1269.39
Xr.0 s(Depth) 98622 314.04
Number of obs: 10792, groups: Xr, 8; Xr.0, 8
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
X(Intercept) -13.968 5.145 -2.715 0.00663 **
Xs(dist_slag)Fx1 -35.871 33.944 -1.057 0.29063
Xs(Depth)Fx1 3.971 3.740 1.062 0.28823
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
X(Int) X(_)F1
Xs(dst_s)F1 0.654
Xs(Dpth)Fx1 -0.030 0.000
>
How do I ensure that autocorrelation is indeed being accounted for within each unique value of the "block" variable? What is the simplest way to interpret the output for "summary(b$mer)"?
The results do differ from a normal gam (package mgcv) using the same variables and parameters without the "correlation=..." term, indicating that something different is occurring.
However, when I use a different variable for the correlation term (season), I get the SAME output:
> dat2 <- data.frame(dist_slag = dat$dist_slag, Depth = dat$Depth, dolphin_presence = dat$dolphin_presence,
+ block = dat$block, season=dat$season)
> head(dat2)
dist_slag Depth dolphin_presence block season
1 26475.47 -10.0934 0 1 F
2 26340.47 -10.4870 0 1 F
3 25886.33 -16.5752 0 1 F
4 25399.88 -22.2474 0 1 F
5 24934.29 -29.6797 0 1 F
6 24519.90 -26.2370 0 1 F
> summary(dat2$season)
F S
3224 7568
> b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),correlation=corAR1(1, form=~1 | season), family=binomial(),data=dat2)
> summary(b$gam)
Family: binomial
Link function: logit
Formula:
dolphin_presence ~ s(dist_slag) + s(Depth)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -13.968 5.145 -2.715 0.00663 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(dist_slag) 4.943 4.943 70.67 6.85e-14 ***
s(Depth) 6.869 6.869 115.59 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.317   glmer.ML score = 10504   Scale est. = 1   n = 10792
> summary(b$mer)
Generalized linear mixed model fit by the Laplace approximation
AIC BIC logLik deviance
10514 10551 -5252 10504
Random effects:
Groups Name Variance Std.Dev.
Xr s(dist_slag) 1611344 1269.39
Xr.0 s(Depth) 98622 314.04
Number of obs: 10792, groups: Xr, 8; Xr.0, 8
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
X(Intercept) -13.968 5.145 -2.715 0.00663 **
Xs(dist_slag)Fx1 -35.871 33.944 -1.057 0.29063
Xs(Depth)Fx1 3.971 3.740 1.062 0.28823
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
X(Int) X(_)F1
Xs(dst_s)F1 0.654
Xs(Dpth)Fx1 -0.030 0.000
>
I just want to make sure it is correctly allowing for correlation within each value for the "block" variable. How do I formulate the model to say that autocorrelation can exist within each single value for block, but assume independence among blocks?
On another note, I am also receiving the following warning message after model completion for larger models (with many more variables than 2):
Warning message:
In mer_finalize(ans) : false convergence (8)
gamm4 is built on top of lme4, which does not allow for a correlation parameter (in contrast to the nlme package, which underlies mgcv::gamm). mgcv::gamm does handle binary data, although it uses PQL, which is generally less accurate than the Laplace/GHQ approximations in gamm4/lme4. It is unfortunate (!!) that you're not getting a warning telling you that the correlation argument is being ignored (when I try a simple example with a correlation argument in lme4, I do get a warning, but it's possible that the extra argument is getting swallowed somewhere inside gamm4).
Your desired autocorrelation structure ("autocorrelation can exist within each single value for block, but assume independence among blocks") is exactly the way correlation structures are coded in nlme (and hence in mgcv::gamm).
I would use mgcv::gamm, and would suggest that, if at all possible, you try it out on some simulated data with known structure (or use the data set provided in the supplementary material above and see if you can reproduce their qualitative conclusions with your alternative approach).
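A sketch of what that mgcv::gamm call looks like, on simulated data of the same shape as the question's (binary response, AR(1) within block, independence between blocks); block must be a factor, and the PQL caveats above apply:

```r
library(mgcv)   # ships with R; uses nlme for the correlation structure
set.seed(1)
n <- 400
sim <- data.frame(x = runif(n),
                  block = factor(rep(1:20, each = 20)))  # 20 "follows"
sim$y <- rbinom(n, 1, plogis(sin(2 * pi * sim$x)))       # binary response
# corAR1(form = ~ 1 | block): AR(1) within each block, independence
# between blocks -- exactly the structure the question asks for
b <- gamm(y ~ s(x),
          correlation = corAR1(form = ~ 1 | block),
          family = binomial, data = sim)
summary(b$gam)   # smooth terms
summary(b$lme)   # the AR(1) parameter appears in the lme correlation output
```

On the real data this would be `gamm(dolphin_presence ~ s(dist_slag) + s(Depth), correlation = corAR1(form = ~ 1 | block), family = binomial, data = dat2)`, with dat2$block converted to a factor first.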
StackOverflow is nice, but there is probably more mixed-model expertise on the r-sig-mixed-models@r-project.org mailing list.
