How to construct a table from a t-test in R

I was provided with three t-tests:
Two Sample t-test
data: cammol by gender
t = -3.8406, df = 175, p-value = 0.0001714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.11460843 -0.03680225
sample estimates:
mean in group 1 mean in group 2
2.318132 2.393837
Welch Two Sample t-test
data: alkphos by gender
t = -2.9613, df = 145.68, p-value = 0.003578
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-22.351819 -4.458589
sample estimates:
mean in group 1 mean in group 2
85.81319 99.21839
Two Sample t-test
data: phosmol by gender
t = -3.4522, df = 175, p-value = 0.0006971
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.14029556 -0.03823242
sample estimates:
mean in group 1 mean in group 2
1.059341 1.148605
I want to construct a table of these t-test results in R Markdown, like this:
[image: desired table format]
I've read some instructions for the knitr package and its kable() function, but honestly, I do not know how to apply the t-test results to them.
What could I do?

Suppose your three t-tests are saved as t1, t2, and t3.
t1 <- t.test(rnorm(100), rnorm(100))
t2 <- t.test(rnorm(100), rnorm(100, 1))
t3 <- t.test(rnorm(100), rnorm(100, 2))
You could turn them into one data frame (that can then be printed as a table) with the broom and purrr packages:
library(broom)
library(purrr)
tab <- map_df(list(t1, t2, t3), tidy)
On the above data, this would become:
estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
1 0.07889713 -0.008136139 -0.08703327 0.535986 5.925840e-01 193.4152 -0.2114261 0.3692204
2 -0.84980010 0.132836627 0.98263673 -6.169076 3.913068e-09 194.2561 -1.1214809 -0.5781193
3 -1.95876967 -0.039048940 1.91972073 -13.270232 3.618929e-29 197.9963 -2.2498519 -1.6676875
method alternative
1 Welch Two Sample t-test two.sided
2 Welch Two Sample t-test two.sided
3 Welch Two Sample t-test two.sided
Some of the columns probably don't matter to you, so you could do something like this to get just the columns you want:
tab[c("estimate", "statistic", "p.value", "conf.low", "conf.high")]
As noted in the comments, you first have to run install.packages("broom") and install.packages("purrr").
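Since the question asks specifically about kable(), here is a minimal sketch of the last step, assuming the knitr package is installed. It builds the same kind of data frame in base R (so broom and purrr are optional here) and passes it to knitr::kable(); the seed, column names and number of digits are illustrative choices, and the same kable() call should work on the broom `tab` shown above.

```r
set.seed(42)
# Three example t-tests on simulated data, as in the answer above
t1 <- t.test(rnorm(100), rnorm(100))
t2 <- t.test(rnorm(100), rnorm(100, 1))
t3 <- t.test(rnorm(100), rnorm(100, 2))
tests <- list(t1, t2, t3)

# One row per test; unname() drops labels such as "t" or "mean of x"
tab <- data.frame(
  mean1     = sapply(tests, function(tt) unname(tt$estimate[1])),
  mean2     = sapply(tests, function(tt) unname(tt$estimate[2])),
  statistic = sapply(tests, function(tt) unname(tt$statistic)),
  df        = sapply(tests, function(tt) unname(tt$parameter)),
  p.value   = sapply(tests, function(tt) tt$p.value),
  conf.low  = sapply(tests, function(tt) tt$conf.int[1]),
  conf.high = sapply(tests, function(tt) tt$conf.int[2])
)

# Inside an R Markdown chunk, kable() output is rendered as a formatted table
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(tab, digits = 3))
}
```

In a knitted document you would normally call knitr::kable(tab, digits = 3) as the last line of the chunk instead of wrapping it in print().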

Related

How to extract specific values from the lists created by a statistical model (t-test)?

How can I extract statistics from these models? To conduct several t-tests I used this:
A<-lapply(merged_DF_final[2:6], function(x) t.test(x ~ merged_DF_final$Group))
How can I extract information about the p-value, t statistics, confidence interval, and group means for each specific subtest and output on a single table?
This is what is saved in A:
$HC_HC_L_amygdala_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.039543, df = 47.412, p-value = 0.9686
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4694404 0.4882694
sample estimates:
mean in group CONN mean in group HC
0.2954200 0.2860055
$HC_HC_L_culmen_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.81387, df = 53.695, p-value = 0.4193
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2970321 0.7028955
sample estimates:
mean in group CONN mean in group HC
0.4020883 0.1991566
$HC_HC_L_fusiform_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.024945, df = 53.851, p-value = 0.9802
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5768786 0.5914136
sample estimates:
mean in group CONN mean in group HC
0.5552184 0.5479509
$HC_HC_L_insula_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.79659, df = 52.141, p-value = 0.4293
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3000513 0.6951466
sample estimates:
mean in group CONN mean in group HC
0.12436946 -0.07317818
$HC_HC_L_lingual_gyrus_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = -0.11033, df = 53.756, p-value = 0.9126
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5172863 0.4633268
sample estimates:
mean in group CONN mean in group HC
0.4395066 0.4664864
Look at names(A[[1]]) or str(A[[1]]) to see what the components are, then use $ or [[ to extract them, e.g.
names(t.test(extra ~ group, data = sleep))
[1] "statistic" "parameter" "p.value" "conf.int" "estimate"
[6] "null.value" "stderr" "alternative" "method" "data.name"
You can then sapply(A, "[[", "statistic") or (being more careful) vapply(A, "[[", "statistic", FUN.VALUE = numeric(1))
If you like the tidyverse, you can use purrr::map_dbl(A, "statistic") for components with a single value; you'll need purrr::map(A, ~ .$estimate[1]) for the mean of the first group, and so on. (For multi-valued components such as estimate or conf.int, sapply() will automatically collapse the results into a matrix.)
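To get everything the question asks for (t statistic, p-value, confidence interval and group means) into a single table, one option is to turn each test into a named numeric row and rbind() the rows. A minimal sketch, using three mtcars columns split by transmission (am) as a hypothetical stand-in for merged_DF_final and Group; the helper name tidy_row is also just illustrative:

```r
# Hypothetical stand-in for lapply(merged_DF_final[2:6], ...)
A <- lapply(mtcars[c("mpg", "hp", "wt")], function(x) t.test(x ~ mtcars$am))

# One row per test; unname() strips labels such as "t" and
# "mean in group 0" so only the names given here are kept
tidy_row <- function(tt) {
  c(statistic = unname(tt$statistic),
    df        = unname(tt$parameter),
    p.value   = tt$p.value,
    conf.low  = tt$conf.int[1],
    conf.high = tt$conf.int[2],
    mean1     = unname(tt$estimate[1]),
    mean2     = unname(tt$estimate[2]))
}

# rbind() uses the list names (the tested variables) as row names
res <- as.data.frame(do.call(rbind, lapply(A, tidy_row)))
res
```

Because A keeps the column names from lapply(), each row of res is labelled with the subtest it came from.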

t.test on a small sample from a large data set

We compare the mean of one group against a reference mean. This reference can be any theoretical value, or it can be the population mean.
I am trying to compare the mean of a small group of 300 observations against the mean of all 1500 observations using a one-sided t.test. Is this approach correct? If not, is there an alternative?
head(data$BMI)
attach(data)
tester<-mean(BMI)
table(BMI)
set.seed(123)
sampler<-sample(min(BMI):max(BMI),300,replace = TRUE)
mean(sampler)
t.test(sampler,tester)
The last line of the code yields:
Error in t.test.default(sampler, tester) : not enough 'y' observations
To test your sample with t.test, you can do:
d <- rnorm(1500,mean = 3, sd = 1)
s <- sample(d,300)
Then, test for the normality of d and s:
> shapiro.test(d)
Shapiro-Wilk normality test
data: d
W = 0.9993, p-value = 0.8734
> shapiro.test(s)
Shapiro-Wilk normality test
data: s
W = 0.99202, p-value = 0.1065
Here both p-values are above 0.05, so the Shapiro-Wilk test gives no evidence against normality for either d or s. So you can go ahead with t.test:
> t.test(d,s)
Welch Two Sample t-test
data: d and s
t = 0.32389, df = 444.25, p-value = 0.7462
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.09790144 0.13653776
sample estimates:
mean of x mean of y
2.969257 2.949939
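The original error comes from passing tester, a single number, as the second sample y; t.test() needs at least two observations in y. To compare a sample mean against a fixed reference value, the one-sample form passes that value through mu instead, and alternative selects a one-sided test. A minimal sketch, where the simulated bmi vector is a hypothetical stand-in for data$BMI:

```r
set.seed(123)
# Hypothetical stand-in for data$BMI: 1500 simulated values
bmi <- rnorm(1500, mean = 25, sd = 4)
tester <- mean(bmi)

# Draw 300 of the actual observations (sample(min(BMI):max(BMI), ...)
# would instead draw integers uniformly from the range, not from the data)
sampler <- sample(bmi, 300)

# One-sample t-test: does the sample mean differ from the reference value?
res_two <- t.test(sampler, mu = tester)

# One-sided version: is the sample mean greater than the reference value?
res_one <- t.test(sampler, mu = tester, alternative = "greater")

c(two.sided = res_two$p.value, greater = res_one$p.value)
```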

Confidence Interval for a Chi-Square in R

How do I calculate a confidence interval for a chi-square test in R? Is there a function like chisq.test() for this?
There is no confidence interval for a chi-square test (you're just checking whether the two categorical variables are independent), but you can compute a confidence interval for the difference in proportions, like this.
Say you have some data where 70% of the first group report success, while only 30% of the second group do:
row1 <- c(70,30)
row2 <- c(30,70)
my.table <- rbind(row1,row2)
Now you have the data in a contingency table:
> my.table
[,1] [,2]
row1 70 30
row2 30 70
You can run chisq.test on it. The two proportions are clearly significantly different, so the test rejects the hypothesis that the two categorical variables are independent:
> chisq.test(my.table)
Pearson's Chi-squared test with Yates' continuity correction
data: my.table
X-squared = 30.42, df = 1, p-value = 3.479e-08
If you run prop.test, the 95% confidence interval for the difference between the proportions runs from about 26.30% to 53.70%, which makes sense, because the observed difference between the two proportions is 70% - 30% = 40%:
> prop.test(x=c(70,30),n=c(100,100))
2-sample test for equality of proportions with continuity correction
data: c(70, 30) out of c(100, 100)
X-squared = 30.42, df = 1, p-value = 3.479e-08
alternative hypothesis: two.sided
95 percent confidence interval:
0.2629798 0.5370202
sample estimates:
prop 1 prop 2
0.7 0.3
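As far as I can tell from prop.test's source, for two groups it reports a Wald interval for the difference in proportions, widened by a continuity correction of 0.5 * (1/n1 + 1/n2) (capped so it never exceeds the observed difference) and clipped to [-1, 1]. The sketch below recomputes that interval by hand and compares it with the prop.test output above:

```r
# Counts from the example: 70/100 successes in group 1, 30/100 in group 2
x <- c(70, 30)
n <- c(100, 100)
p <- x / n
delta <- p[1] - p[2]

# Continuity correction, capped at |delta| / sum(1/n)
yates <- min(0.5, abs(delta) / sum(1 / n))

# Half-width: normal quantile times the unpooled standard error, plus correction
width <- qnorm(0.975) * sqrt(sum(p * (1 - p) / n)) + yates * sum(1 / n)
ci <- c(max(delta - width, -1), min(delta + width, 1))
ci

# Compare with prop.test's interval (as.numeric drops the conf.level attribute)
pt <- prop.test(x = x, n = n)
all.equal(ci, as.numeric(pt$conf.int))
```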
An addition to @mysteRious' nice answer: if you have a 2x2 contingency table, you could use fisher.test instead of prop.test to test the odds ratio rather than the difference in proportions. In Fisher's exact test the null hypothesis corresponds to an odds ratio (OR) of 1.
Using @mysteRious' sample data:
ft <- fisher.test(my.table)
ft
#
# Fisher's Exact Test for Count Data
#
#data: my.table
#p-value = 2.31e-08
#alternative hypothesis: true odds ratio is not equal to 1
#95 percent confidence interval:
# 2.851947 10.440153
#sample estimates:
#odds ratio
# 5.392849
Confidence intervals for the OR are then given in ft$conf.int:
ft$conf.int
#[1] 2.851947 10.440153
#attr(,"conf.level")
#[1] 0.95
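For comparison, we calculate the sample OR manually. It differs slightly from the estimate above because fisher.test reports the conditional maximum likelihood estimate rather than the sample odds ratio: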
OR <- Reduce("/", my.table[, 1] / my.table[, 2])
OR
#[1] 5.444444
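If you want an interval around the sample OR itself, a common large-sample alternative is the Woolf (Wald) interval on the log scale: log(OR) plus or minus z times sqrt(1/a + 1/b + 1/c + 1/d), then exponentiated. This is a different method from the exact conditional interval that fisher.test reports, so the bounds will not match ft$conf.int; the sketch just illustrates the formula on the same table:

```r
# Same 2x2 table as above
my.table <- rbind(row1 = c(70, 30), row2 = c(30, 70))

# Sample odds ratio: (a * d) / (b * c)
OR <- (my.table[1, 1] * my.table[2, 2]) / (my.table[1, 2] * my.table[2, 1])

# Standard error of log(OR): square root of the summed reciprocal cell counts
se_log_or <- sqrt(sum(1 / my.table))

# Symmetric interval on the log scale, transformed back to the OR scale
z <- qnorm(0.975)
ci_wald <- exp(log(OR) + c(-1, 1) * z * se_log_or)
ci_wald
```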

How to use the right data name in the t-test results?

I get the result below when performing a one-sample t-test:
One Sample t-test
data: x()
t = 1.9628, df = 6, p-value = 0.09731
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-2.642339 24.070910
sample estimates:
mean of x
10.71429
Here the data line does not show the name of the column I used for the test; instead, it shows the variable name from my code. How can I change the data name to the column name when displaying the test result?
You can just overwrite the data.name element of your result:
a <- t.test(c(12,3,4,5,2), c(2,34,2,4,3))
a$data.name <- 'bla bla'
a
Welch Two Sample t-test
data: bla bla
t = -0.58399, df = 4.6367, p-value = 0.5865
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-20.92852 13.32852
sample estimates:
mean of x mean of y
5.2 9.0
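When the column is chosen by name at run time, the same trick can reuse that name directly. A minimal sketch, where the column "mpg" from mtcars and mu = 20 are illustrative assumptions standing in for your data:

```r
# Hypothetical setup: the column to test is picked by name (e.g. from user input)
col <- "mpg"
res <- t.test(mtcars[[col]], mu = 20)

# Replace the deparsed expression label with the column name
res$data.name <- col

# Show just the "data:" line of the printed result
out <- capture.output(print(res))
out[grepl("^data:", out)]
```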

Pairwise t test using plyr

I would like to use the R package plyr to run a pairwise t test on a really large data frame, but I'm not sure how to do it. I recently learned how to do correlations using plyr, and I really like how you can specify which groups you want to compare and then plyr breaks down the data for you. For example, you could have plyr calculate the correlation between sepal length and sepal width for each species of iris in the iris dataset like this:
Correlations <- ddply(iris, "Species", function(x) cor(x$Sepal.Length, x$Sepal.Width))
I could break the data frame down myself by specifying that the data for the setosa species of iris are in rows 1:50 and so on, but plyr would be less likely than me to mess up and accidentally say rows 1:51, for example.
So how do I do something similar with a paired t test? How can I specify which observations are the pairs? Here's some example data that are similar to what I'm working with, and I'd like the pairs to be the Subject and I'd like to break the data down by Pesticide:
Exposure <- data.frame("Subject" = rep(1:4, 6),
"Season" = rep(c(rep("summer", 4), rep("winter", 4)),3),
"Pesticide" = rep(c("atrazine", "metolachlor", "chlorpyrifos"), each=8),
"Exposure" = sample(1:100, size=24))
Exposure$Subject <- as.factor(Exposure$Subject)
In other words, the question I'd like to evaluate is whether there is a difference in pesticide exposure for each person during the winter versus during the summer, and I'd like to answer that question separately for each of the three pesticides.
Much thanks in advance!
An edit: To clarify, this is how to do an unpaired t test in plyr:
TTests <- dlply(Exposure, "Pesticide", function(x) t.test(x$Exposure ~ x$Season))
And if I add "paired=T" in there, plyr will do a paired t test, but it assumes that I always have the pairs in the same order. While I do have them all in the same order in the example data frame above, I don't in my real data because I sometimes have missing data.
Do you want this?
library(data.table)
# convert to data.table in place
setDT(Exposure)
# make sure data is sorted correctly
setkey(Exposure, Pesticide, Season, Subject)
Exposure[, list(res = list(t.test(Exposure[Season == "summer"],
Exposure[Season == "winter"],
paired = T)))
, by = Pesticide]$res
#[[1]]
#
# Paired t-test
#
#data: Exposure[Season == "summer"] and Exposure[Season == "winter"]
#t = -4.1295, df = 3, p-value = 0.02576
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -31.871962 -4.128038
#sample estimates:
#mean of the differences
# -18
#
#
#[[2]]
#
# Paired t-test
#
#data: Exposure[Season == "summer"] and Exposure[Season == "winter"]
#t = -6.458, df = 3, p-value = 0.007532
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -73.89299 -25.10701
#sample estimates:
#mean of the differences
# -49.5
#
#
#[[3]]
#
# Paired t-test
#
#data: Exposure[Season == "summer"] and Exposure[Season == "winter"]
#t = -2.5162, df = 3, p-value = 0.08646
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -30.008282 3.508282
#sample estimates:
#mean of the differences
# -13.25
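The keyed data.table approach above still relies on sorting to line up the pairs, so a subject missing one season would leave the summer and winter vectors misaligned or unequal in length. A base-R sketch that matches the pairs explicitly by Subject with merge() before calling t.test(..., paired = TRUE); the seed, the dropped row and the helper name paired_by_subject are illustrative assumptions:

```r
set.seed(1)
# Example data from the question (simulated exposures)
Exposure <- data.frame(
  Subject   = factor(rep(1:4, 6)),
  Season    = rep(c(rep("summer", 4), rep("winter", 4)), 3),
  Pesticide = rep(c("atrazine", "metolachlor", "chlorpyrifos"), each = 8),
  Exposure  = sample(1:100, size = 24)
)
# Simulate missing data: drop atrazine / summer / Subject 3
Exposure <- Exposure[-3, ]

# Hypothetical helper: pair summer and winter values by Subject, then test
paired_by_subject <- function(d) {
  summer <- d[d$Season == "summer", c("Subject", "Exposure")]
  winter <- d[d$Season == "winter", c("Subject", "Exposure")]
  # merge() matches rows on Subject, so row order does not matter;
  # subjects without both seasons are dropped
  m <- merge(summer, winter, by = "Subject", suffixes = c(".summer", ".winter"))
  t.test(m$Exposure.summer, m$Exposure.winter, paired = TRUE)
}

# One paired test per pesticide
res <- lapply(split(Exposure, Exposure$Pesticide), paired_by_subject)
sapply(res, function(tt) tt$p.value)
```

Subject 3 drops out of the atrazine test only, so that test uses three pairs while the other two use all four.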
I don't know ddply, but here's how I would do it using base functions:
by(data = Exposure, INDICES = Exposure$Pesticide, FUN = function(x) {
t.test(Exposure ~ Season, data = x)
})
Exposure$Pesticide: atrazine
Welch Two Sample t-test
data: Exposure by Season
t = -0.1468, df = 5.494, p-value = 0.8885
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-49.63477 44.13477
sample estimates:
mean in group summer mean in group winter
60.50 63.25
----------------------------------------------------------------------------------------------
Exposure$Pesticide: chlorpyrifos
Welch Two Sample t-test
data: Exposure by Season
t = -0.8932, df = 4.704, p-value = 0.4151
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-83.58274 41.08274
sample estimates:
mean in group summer mean in group winter
52.25 73.50
----------------------------------------------------------------------------------------------
Exposure$Pesticide: metolachlor
Welch Two Sample t-test
data: Exposure by Season
t = 0.8602, df = 5.561, p-value = 0.4252
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-39.8993 81.8993
sample estimates:
mean in group summer mean in group winter
62.5 41.5
