I perform the following ezANOVA:
RMANOVAGHB1 <- ezANOVA(GHB1, dv=DIF.SCORE.STARTLE, wid=RAT.ID, within=TRIAL.TYPE, between=GROUP, detailed = TRUE, return_aov = TRUE)
My dataset looks like this:
RAT.ID DIF.SCORE.STARTLE GROUP TRIAL.TYPE
1 1 170.73 SAL TONO
2 1 80.07 SAL NOAL
3 2 456.40 PROP TONO
4 2 290.40 PROP NOAL
5 3 507.20 SAL TONO
6 3 261.60 SAL NOAL
7 4 208.67 PROP TONO
8 4 137.60 PROP NOAL
9 5 500.50 SAL TONO
10 5 445.73 SAL NOAL
and so on, up to RAT.ID 16.
My supervisors don't work with R, so they can't help me. I need code that will give me all post hoc contrasts, but looking it up only confuses me more and more.
I already tried TukeyHSD on the aov output of ezANOVA, and then pairwise.t.test (as I found out Bonferroni is a more appropriate correction in this case), but neither seems to work. I've also found suggestions to fit a linear model and then use multcomp, but I don't know whether that would be a good solution here. I suspect the problem with everything I tried is either that I have both between- and within-subject variables or that my dataset is not set up correctly. Another complicating factor is that I'm a beginner with R and my statistical knowledge is still pretty basic, as this is one of my first practical experiences with doing analyses.
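For what it's worth, one route I keep seeing suggested for mixed between/within designs is the emmeans package; here is a minimal sketch (emmeans is an assumption on my part, not something I have verified on my data; it is supposed to accept the multi-stratum aov object that ezANOVA returns when return_aov = TRUE):
library(emmeans)
# estimated marginal means for every GROUP x TRIAL.TYPE cell
em <- emmeans(RMANOVAGHB1$aov, ~ GROUP * TRIAL.TYPE)
# all pairwise contrasts, Bonferroni-adjusted
pairs(em, adjust = "bonferroni")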
If it's important, this is the output of the anova:
$ANOVA
Effect DFn DFd SSn SSd F p p<.05 ges
1 (Intercept) 1 14 1233568.9 1076460.9 16.043280 0.001302172 * 0.508451750
2 GROUP 1 14 212967.9 1076460.9 2.769771 0.118272657 0.151521743
3 TRIAL.TYPE 1 14 137480.6 116097.9 16.578499 0.001143728 * 0.103365833
4 GROUP:TRIAL.TYPE 1 14 11007.2 116097.9 1.327335 0.268574391 0.009145489
$aov
Call:
aov(formula = formula(aov_formula), data = data)
Grand Mean: 196.3391
Stratum 1: RAT.ID
Terms:
GROUP Residuals
Sum of Squares 212967.9 1076460.9
Deg. of Freedom 1 14
Residual standard error: 277.2906
1 out of 2 effects not estimable
Estimated effects are balanced
Stratum 2: RAT.ID:TRIAL.TYPE
Terms:
TRIAL.TYPE GROUP:TRIAL.TYPE Residuals
Sum of Squares 137480.6 11007.2 116097.9
Deg. of Freedom 1 1 14
Residual standard error: 91.0643
Estimated effects may be unbalanced
My solution, using your dataset (first 5 rats). Note that the code below uses underscores in the column names (e.g. DIF_SCORE_STARTLE) and calls the data frame dat.
1. Let's build the linear model:
model.lm = lm(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
2. Let's check the homogeneity of variance (leveneTest, from the car package) and the distribution of the model residuals (Shapiro-Wilk). We are looking for a normal distribution, and the variance should be homogeneous. Two tests for this:
>shapiro.test(resid(model.lm))
Shapiro-Wilk normality test
data: resid(model.lm)
W = 0.91783, p-value = 0.3392
> leveneTest(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  3   0.066  0.976
       6
Our p-values are higher than 0.05 in both cases, so we have no evidence that the variance differs between groups. The normality test likewise gives no evidence that the residuals deviate from normality. In summary, we can use parametric tests such as ANOVA or pairwise t-tests.
3. You can also run:
hist(resid(model.lm))
to check what the distribution of the residuals looks like, and inspect the model with:
plot(model.lm)
Here: https://stats.stackexchange.com/questions/58141/interpreting-plot-lm/65864 you'll find an interpretation of the plots produced by this function. As far as I can see, the data look fine.
4. Now we can finally run the ANOVA and the post hoc HSD test (HSD.test is in the agricolae package, so load it with library(agricolae) first):
> anova(model.lm)
Analysis of Variance Table
Response: DIF_SCORE_STARTLE
Df Sum Sq Mean Sq F value Pr(>F)
GROUP 1 7095 7095 0.2323 0.6469
TRIAL_TYPE 1 39451 39451 1.2920 0.2990
GROUP:TRIAL_TYPE 1 84 84 0.0027 0.9600
Residuals 6 183215 30536
> (result.hsd = HSD.test(model.lm, list('GROUP', 'TRIAL_TYPE')))
$statistics
Mean CV MSerror HSD r.harmonic
305.89 57.12684 30535.91 552.2118 2.4
$parameters
Df ntr StudentizedRange alpha test name.t
6 4 4.895599 0.05 Tukey GROUP:TRIAL_TYPE
$means
DIF_SCORE_STARTLE std r Min Max
PROP:NOAL 214.0000 108.0459 2 137.60 290.40
PROP:TONO 332.5350 175.1716 2 208.67 456.40
SAL:NOAL 262.4667 182.8315 3 80.07 445.73
SAL:TONO 392.8100 192.3561 3 170.73 507.20
$comparison
NULL
$groups
trt means M
1 SAL:TONO 392.8100 a
2 PROP:TONO 332.5350 a
3 SAL:NOAL 262.4667 a
4 PROP:NOAL 214.0000 a
As you can see, our 'pairs' have all been placed in one big group a, which means there is no significant difference between them. However, there is some difference between NOAL and TONO, regardless of SAL vs. PROP.
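If you need the explicit pairwise p-values rather than the letter display, HSD.test has a group argument; a sketch (assuming the installed agricolae version supports group = FALSE, and using c() rather than list() for the treatment names):
library(agricolae)
# group = FALSE fills $comparison with differences, p-values and CIs
result.pairs = HSD.test(model.lm, trt = c('GROUP', 'TRIAL_TYPE'), group = FALSE)
result.pairs$comparison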
There were 4 levels of treatment over 11 sampling time points. Each treatment had 3 identical systems, and samples were analyzed for different parameters. So I used a two-way repeated measures ANOVA, for example:
ezANOVA(data = WP4, dv = parameter1, wid = System, within = Time, between = Treatment, type=3)
However, for some parameters it produced an ANOVA table without the sphericity test, like below:
Warning: NaNs produced
Warning: NaNs produced
$ANOVA
Effect DFn DFd F p p<.05 ges
2 Treatment 3 8 2.255871 1.590810e-01 0.2934692
3 Time 10 80 35.989273 1.524902e-25 * 0.6960297
4 Treatment:Time 30 80 7.574502 2.275032e-13 * 0.5911306
Please let me know if you need more information to crack my problem. Thanks.
I would like to have the complete ANOVA table, with $Mauchly's Test for Sphericity and $Sphericity Corrections.
I am used to conducting Tukey post hoc tests in Minitab. When I do, I usually get family groupings of the dependent/predictor variables.
In R, using TukeyHSD(), the family grouping is not displayed (or calculated?). It only displays the relationship between each pair of the dependent/predictor variables. Is it possible to display the family groupings like in Minitab?
Using the diamonds data set (it ships with ggplot2):
library(ggplot2)
av <- aov(price ~ cut, data = diamonds)
tk <- TukeyHSD(av, ordered = T, which = "cut")
plot(tk)
Output:
Fit: aov(formula = price ~ cut, data = diamonds)
$cut
diff lwr upr p adj
Good-Ideal 471.32248 300.28228 642.3627 0.0000000
Very Good-Ideal 524.21792 401.33117 647.1047 0.0000000
Fair-Ideal 901.21579 621.86019 1180.5714 0.0000000
Premium-Ideal 1126.71573 1008.80880 1244.6227 0.0000000
Very Good-Good 52.89544 -130.15186 235.9427 0.9341158
Fair-Good 429.89331 119.33783 740.4488 0.0014980
Premium-Good 655.39325 475.65120 835.1353 0.0000000
Fair-Very Good 376.99787 90.13360 663.8622 0.0031094
Premium-Very Good 602.49781 467.76249 737.2331 0.0000000
Premium-Fair 225.49994 -59.26664 510.2665 0.1950425
Here is a step-by-step example of how to reproduce Minitab's table for the ggplot2::diamonds dataset. I've included as much detail/explanation as possible.
Please note that, as far as I can tell, the results shown in Minitab's table do not depend on the results of Tukey's post hoc test; they are based on the results of the analysis of variance. Tukey's honest significant difference (HSD) test is a post hoc test that establishes which comparisons (of all the possible pairwise comparisons of group means) are (honestly) statistically significant, given the ANOVA results.
In order to reproduce Minitab's "mean-grouping" summary table (see the first table of "Interpret the results: Step 3" of the Minitab Express Support), I recommend (re-)running a linear model to extract means and confidence intervals. Note that this is exactly how aov fits the analysis-of-variance model for each group.
Fit a linear model
We specify a 0 in the formula to remove the intercept, giving absolute estimates for every group (rather than estimates of the changes relative to a reference level).
fit <- lm(price ~ 0 + cut, data = diamonds)
coef <- summary(fit)$coef;
coef;
# Estimate Std. Error t value Pr(>|t|)
#cutFair 4358.758 98.78795 44.12236 0
#cutGood 3928.864 56.59175 69.42468 0
#cutVery Good 3981.760 36.06181 110.41487 0
#cutPremium 4584.258 33.75352 135.81570 0
#cutIdeal 3457.542 27.00121 128.05137 0
Determine family groupings
In order to obtain something similar to Minitab's "family groupings", we adopt the following approach:
1. Calculate confidence intervals for all parameters.
2. Perform a hierarchical clustering analysis on the confidence interval data for all parameters.
3. Cut the resulting tree at a height corresponding to the standard deviation of the CIs. This gives us a grouping of parameter estimates based on their confidence intervals. It is a somewhat empirical approach, but justifiable, as the tree measures pairwise distances between the confidence intervals, and the standard deviation can be interpreted as a Euclidean distance.
We start by calculating the confidence intervals and clustering the resulting distance matrix using hierarchical clustering with complete linkage.
CI <- confint(fit);
hc <- hclust(dist(CI));
We inspect the cluster dendrogram
plot(hc);
We now cut the tree at a height corresponding to the standard deviation of all CIs across all parameter estimates to get the "family groupings"
grps <- cutree(hc, h = sd(CI))
Summarise results
Finally, we collate all quantities and store the results in a table similar to Minitab's "mean-grouping" table.
library(tidyverse)
bind_cols(
cut = rownames(coef),
N = as.integer(table(fit$model$cut)),
Mean = coef[, 1],
Groupings = grps) %>%
as.data.frame()
# cut N Mean Groupings
#1 cutFair 1610 4358.758 1
#2 cutGood 4906 3928.864 2
#3 cutVery Good 12082 3981.760 2
#4 cutPremium 13791 4584.258 1
#5 cutIdeal 21551 3457.542 3
Note the near-perfect agreement of our results with those from the Minitab "mean-grouping" table: cut = Ideal is by itself in group 3 (group C in Minitab's table), while Fair+Premium share group 1 (Minitab: group A), and Good+Very Good share group 2 (Minitab: group B).
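Since the cut height is somewhat empirical, it may be worth visualizing where it falls on the dendrogram; a small sketch reusing the hc and CI objects from above:
plot(hc)
abline(h = sd(CI), col = "red", lty = 2)  # the height passed to cutree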
See the cld function in the multcomp package, as explained here (copy-pasted below).
Example data set:
> data(ToothGrowth)
> ToothGrowth$treat <- with(ToothGrowth, interaction(supp,dose))
> str(ToothGrowth)
'data.frame': 60 obs. of 3 variables:
$ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
$ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
$ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ treat: Factor w/ 6 levels "OJ.0.5","VC.0.5",..: 2 2 2 2 2 2 2 2 2 2 ...
Model fit:
> fit <- lm(len ~ treat, data=ToothGrowth)
All pairwise comparisons with Tukey test:
> apctt <- multcomp::glht(fit, linfct = multcomp::mcp(treat = "Tukey"))
Letter-based representation of all-pairwise comparisons (algorithm from Piepho 2004):
> lbrapc <- multcomp::cld(apctt)
> lbrapc
OJ.0.5 VC.0.5 OJ.1 VC.1 OJ.2 VC.2
"b" "a" "c" "b" "c" "c"
I used lmer from the lme4 package to run a linear mixed effects model. I have 3 years of temperature data for untreated (5) and treated (10) plots. The model:
modela<-lmer(ave~yr*tr+(1|pl), REML=FALSE, data=mydata)
I checked the model for normality of residuals with a qqnorm plot.
My data:
'data.frame': 6966 obs. of 7 variables:
$ yr : Factor w/ 3 levels "yr1","yr2","yr3": 1 1 1 1 1 1 1 1 1 1 ...
$ pl : Factor w/ 15 levels "C02","C03","C05",..: 1 1 1 1 1 1 1 1 1 1 ...
$ tr : Factor w/ 2 levels "Cont","OTC": 1 1 1 1 1 1 1 1 1 1 ...
$ ave: num 14.8 16.1 11.6 10.3 11.6 ...
The interaction is significant, so I used lsmeans:
lsmeans(modela, pairwise~yr*tr, adjust="tukey")
In the contrasts, I get (excerpts only)
contrast estimate SE df t.ratio p.value
yr1,Cont - yr2,Cont -0.727102895 0.2731808 6947.24 -2.662 0.0832
yr1,OTC - yr2,OTC -0.990574030 0.2015650 6449.10 -4.914 <.0001
yr1,Cont - yr1,OTC -0.005312771 0.3889335 31.89 -0.014 1.0000
yr2,Cont - yr2,OTC -0.268783907 0.3929332 32.97 -0.684 0.9825
My question concerns the high d.f. for some of the contrasts and the associated, seemingly meaningless, low p-values.
Can this be due to:
- the presence of NAs in my data set (some improvement when removed)?
- unequal sample sizes (e.g. 5 of one treatment, 10 of the other)? However, those contrasts (yr1,Cont - yr1,OTC) don't seem to be a problem.
- other issues?
I have searched Stack Overflow questions, and Cross Validated.
Thanks for any answers, ideas, comments.
In this example, treatments are assigned experimentally to plots. Having small numbers of plots assigned to treatments severely limits the information available to statistically compare the treatments. (If you had only one plot per treatment, it would not even be possible to compare treatments, because you wouldn't be able to sort out the effect of the treatments from the effect of the plots.) You have 10 plots assigned to one treatment and 5 to the other, so for the main effect of treatment you have (10-1)+(5-1) = 13 d.f., and if you do
lsmeans(modela, pairwise ~ tr)
you will see around 13 d.f. (maybe fewer, due to imbalance and missingness) for those statistics. When you compare combinations of years and treatments, you get roughly 3 times the d.f. because there are 3 years. However, in some of those comparisons, the year is the same in each combination being compared; in those comparisons, the variation between plots mostly cancels out (it is a within-plot comparison), and the d.f. basically come from the residual error of the model, which has thousands of d.f. Due to imbalances in the data, these comparisons are slightly polluted by the between-plot variation, making the d.f. somewhat smaller than the residual d.f.
It appears you are not particularly interested in cross-comparisons such as treat1, year1 vs. treat2, year3. I suggest using "by" variables to cut down on the number of comparisons tested, because when you test them all, the multiplicity correction is unnecessarily conservative. It would go something like this:
modela.lsm = lsmeans(modela, ~ tr * yr)
pairs(modela.lsm, by = "yr") # compare tr for each yr
pairs(modela.lsm, by = "tr") # compare yr for each tr
These calls will apply the Tukey correction separately to each "by" group. If you want a multiplicity correction for each whole family, do this:
rbind(pairs(modela.lsm, by = "yr"))
rbind(pairs(modela.lsm, by = "tr"))
By default, a multivariate t correction is used (Tukey is not the right method here). You can even do
rbind(pairs(modela.lsm, by = "yr"), pairs(modela.lsm, by = "tr"))
to group all of the comparisons into one family and apply a multivariate t adjustment.
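If you prefer a different correction for that combined family, the adjustment can be overridden in summary(); a sketch (adjust = "bonferroni" is just one example):
summary(rbind(pairs(modela.lsm, by = "yr"),
              pairs(modela.lsm, by = "tr")), adjust = "bonferroni")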
I implemented logistic regression on the following data frame and got reasonable results (the same as using STATA). But the Pearson chi-square and degrees of freedom I got from R are very different from STATA's, which in turn gave me a very small p-value. And I cannot get the area under the ROC curve. Could anyone help me find out why residuals() does not work on glm() with prior weights, and how to deal with the area under the ROC curve?
Following is my code and output.
1. Data
Here is my data frame test_data, y is outcome, x1 and x2 are covariates:
y x1 x2 freq
0 No 0 268
0 No 1 14
0 Yes 0 109
0 Yes 1 1
1 No 0 31
1 No 1 6
1 Yes 0 45
1 Yes 1 6
I generated this data frame from the original data by counting the occurrences of each covariate pattern and storing the count in a new variable, freq.
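For reference, a sketch of how such a frequency table can be built with dplyr (raw_data is a hypothetical one-row-per-observation data frame, not part of my actual code):
library(dplyr)
# count occurrences of each (y, x1, x2) covariate pattern
test_data <- raw_data %>% count(y, x1, x2, name = "freq")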
2. GLM Model
Then I did the logistic regression as:
logit=glm(y~x1+x2, data=test_data, family=binomial, weights=freq)
Output shows:
Deviance Residuals:
1 2 3 4 5 6 7 8
-7.501 -3.536 -8.818 -1.521 11.957 3.501 10.409 2.129
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2010 0.1892 -11.632 < 2e-16 ***
x1 1.3538 0.2516 5.381 7.39e-08 ***
x2 1.6261 0.4313 3.770 0.000163 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 457.35 on 7 degrees of freedom
Residual deviance: 416.96 on 5 degrees of freedom
AIC: 422.96
Number of Fisher Scoring iterations: 5
The coefficients are the same as STATA's.
3. Chi Square Statistics
When I tried to calculate the Pearson chi-square:
pearson_chisq = sum(residuals(logit, type = "pearson", weights=test_data$freq)^2)
I got 488 instead of the 1.3 given by STATA. Also, the DOF produced by R is chisq_dof = df.residual(logit) = 5, instead of 1, so I got an extremely small p-value (~e^-100).
4. Discrimination
Then I calculated the area under ROC curve as:
library(verification)
logit_mf = model.frame(logit)
roc.area(logit_mf$y, fitted(logit))$A
The output is:
[1] 0.5
Warning message:
In wilcox.test.default(pred[obs == 1], pred[obs == 0], alternative = "great") :
cannot compute exact p-value with ties
Thanks!
I eventually figured out how to solve this problem. The data set used above should be summarised into covariate patterns; then the Pearson chi-square can be calculated from its definition. The R code follows:
# extract covariate patterns
library(dplyr)
temp=test_data %>% group_by(x1, x2) %>% summarise(m=sum(freq),y=sum(freq*y))
temp$pred=fitted(logit)[1:4]
# calculate Pearson chi square
temp=mutate(temp, pearson=(y-m*pred)/sqrt(m*pred*(1-pred)))
pearson_chi2 = with(temp, sum(pearson^2))
temp_dof = 4-(2+1) #dof=J-(p+1)
# calculate p-value
pchisq(pearson_chi2, temp_dof, lower.tail=F)
The resulting p-value is 0.241941, the same as STATA's.
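An equivalent route, as a sketch: fit the grouped (events/trials) binomial form directly, so the Pearson statistic and its J - (p + 1) degrees of freedom come straight from the model. This reuses test_data from above; the object names grouped and fit_g are mine:
library(dplyr)
# one row per covariate pattern, with event counts and trial totals
grouped = test_data %>%
  group_by(x1, x2) %>%
  summarise(events = sum(freq * y), trials = sum(freq), .groups = "drop")
fit_g = glm(cbind(events, trials - events) ~ x1 + x2,
            family = binomial, data = grouped)
pearson_chi2_g = sum(residuals(fit_g, type = "pearson")^2)
pchisq(pearson_chi2_g, df.residual(fit_g), lower.tail = FALSE)  # df = 4 - 3 = 1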
In order to calculate the AUC, we should first expand the covariate patterns back to the "original" data, then use the "expanded" data to get the AUC. Note that we have 392 "0"s and 88 "1"s in the frequency table. My code follows:
# expand observation
y_expand=c(rep(0,392),rep(1,88))
logit_mf = model.frame(logit)
logit_mf$pred = fitted(logit)
logit_mf$freq=test_data$freq
# expand prediction
yhat_expand=with(logit_mf, rep(pred, freq))
library(verification)
roc.area(y_expand, yhat_expand)$A
AUC = 0.6760, the same as STATA's.
I have been asked to see if there is a linear trend in 3 groups of data (5 points each) by using ANOVA and linear contrasts. The 3 groups represent data collected in 2010, 2011 and 2012. I want to use R for this procedure and I have tried both of the following:
contrasts(data$groups, how.many=1) <- contr.poly(3)
contrasts(data$groups) <- contr.poly(3)
Both ways seem to work fine but give slightly different answers in terms of their p-values. I have no idea which is correct, and it is really tricky to find help for this on the web. I would like help figuring out the reasoning behind the different answers. I'm not sure if it has something to do with partitioning sums of squares or not.
Both approaches differ with respect to whether a quadratic polynomial is used.
For illustration purposes, have a look at this example, both x and y are a factor with three levels.
x <- y <- gl(3, 2)
# [1] 1 1 2 2 3 3
# Levels: 1 2 3
The approach without how.many creates a contrast matrix for a full quadratic polynomial, i.e., with both a linear (.L) and a quadratic (.Q) trend. The 3 means: create polynomials up to degree 3 - 1 = 2.
contrasts(x) <- contr.poly(3)
# [1] 1 1 2 2 3 3
# attr(,"contrasts")
# .L .Q
# 1 -7.071068e-01 0.4082483
# 2 -7.850462e-17 -0.8164966
# 3 7.071068e-01 0.4082483
# Levels: 1 2 3
In contrast, the call with how.many = 1 results in a polynomial of first order (i.e., a linear trend only). This is due to the how.many = 1 argument: only 1 contrast is created.
contrasts(y, how.many = 1) <- contr.poly(3)
# [1] 1 1 2 2 3 3
# attr(,"contrasts")
# .L
# 1 -7.071068e-01
# 2 -7.850462e-17
# 3 7.071068e-01
# Levels: 1 2 3
If you're interested in the linear trend only, the how.many = 1 option seems more appropriate for you.
Changing the contrasts you ask for changes the degrees of freedom of the model. If one model requests linear and quadratic contrasts, and a second specifies only, say, the linear contrast, then the second model has an extra residual degree of freedom: this increases the power to test the linear hypothesis (at the cost of preventing the model from fitting the quadratic trend).
Using the full ("nlevels - 1") set of contrasts creates an orthogonal set of contrasts which explores the full set of (independent) response configurations. Cutting back to just one prevents the model from fitting one configuration (in this case the quadratic component, which our data in fact possess).
To see how this works, use the built-in dataset mtcars and test the (confounded) relationship of gears to gas mileage (mpg). We'll hypothesize that the more gears the better (at least up to some point).
df = mtcars # copy the dataset
df$gear = as.ordered(df$gear) # make an ordered factor
Ordered factors default to polynomial contrasts, but we'll set them here to be explicit:
contrasts(df$gear) <- contr.poly(nlevels(df$gear))
Then we can model the relationship.
m1 = lm(mpg ~ gear, data = df);
summary.lm(m1)
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 20.6733 0.9284 22.267 < 2e-16 ***
# gear.L 3.7288 1.7191 2.169 0.03842 *
# gear.Q -4.7275 1.4888 -3.175 0.00353 **
#
# Multiple R-squared: 0.4292, Adjusted R-squared: 0.3898
# F-statistic: 10.9 on 2 and 29 DF, p-value: 0.0002948
Note we have F(2,29) = 10.9 for the overall model and p=.038 for our linear effect with an estimated extra 3.7 mpg/gear.
Now let's only request the linear contrast, and run the "same" analysis.
contrasts(df$gear, how.many = 1) <- contr.poly(nlevels(df$gear))
m1 = lm(mpg ~ gear, data = df)
summary.lm(m1)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 21.317 1.034 20.612 <2e-16 ***
# gear.L 5.548 1.850 2.999 0.0054 **
# Multiple R-squared: 0.2307, Adjusted R-squared: 0.205
# F-statistic: 8.995 on 1 and 30 DF, p-value: 0.005401
The linear effect of gear is now bigger (5.5 mpg) and p << .05 - a win? Except that the overall model fit is now significantly worse: the variance accounted for is now just 23% (it was 43%)! Why becomes clear if we plot the relationship:
plot(mpg ~ gear, data = df) # view the relationship
So, if you're interested in the linear trend, but also expect (or are unsure about) additional levels of complexity, you should also test these higher polynomials: the quadratic (or, in general, trends up to nlevels - 1).
Note too that in this example the physical mechanism is confounded: we've forgotten that the number of gears is confounded with automatic vs. manual transmission, and also with weight, and sedan vs. sports car.
If someone wants to test the hypothesis that 4 gears is better than 3, they could answer this question :-)
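One sketch of how that test might go, using the df from above (gear_u and m2 are names introduced here for illustration): drop the ordering so gear gets treatment contrasts against the 3-gear baseline, then read off the 4-vs-3 coefficient.
df$gear_u = factor(df$gear, ordered = FALSE)   # unordered: treatment contrasts, baseline = 3 gears
m2 = lm(mpg ~ gear_u, data = df)
summary(m2)$coefficients["gear_u4", ]          # estimate and p-value for 4 vs 3 gears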