How to use the right data name in the t-test results? - r

I am getting the below result while performing the one sample t-test
One Sample t-test
data: x()
t = 1.9628, df = 6, p-value = 0.09731
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-2.642339 24.070910
sample estimates:
mean of x
10.71429
Here the data line does not show the name of the data column I used for this test; instead it shows the variable name I used in the code. How can I make the test result display the column name as the data name?

You can just overwrite the data.name element of your result.
a <- t.test(c(12,3,4,5,2), c(2,34,2,4,3))
a$data.name <- 'bla bla'
a
Welch Two Sample t-test
data: bla bla
t = -0.58399, df = 4.6367, p-value = 0.5865
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-20.92852 13.32852
sample estimates:
mean of x mean of y
5.2 9.0
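The same idea applies to the one-sample case from the question. Below is a minimal sketch, assuming the column name is available as a string; the data frame df and the column name "weight" are made up for illustration and are not from the original post.

```r
# Hypothetical data frame and column name, for illustration only
df  <- data.frame(weight = c(12, 3, 4, 5, 2, 30, 19))
col <- "weight"

# Run the one-sample t-test on the chosen column
res <- t.test(df[[col]])

# Overwrite the label that print() shows after "data:"
res$data.name <- col

print(res)
```

Because data.name is only a label, changing it does not affect the statistic, the p-value, or the confidence interval.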

Related

How to extract specific values from the lists created by a statistical model (t-test)?

How can I extract statistics from this model? To conduct several t-tests I used this:
A<-lapply(merged_DF_final[2:6], function(x) t.test(x ~ merged_DF_final$Group))
How can I extract information about the p-value, t statistics, confidence interval, and group means for each specific subtest and output on a single table?
This is what is stored in A:
$HC_HC_L_amygdala_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.039543, df = 47.412, p-value = 0.9686
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.4694404 0.4882694
sample estimates:
mean in group CONN mean in group HC
0.2954200 0.2860055
$HC_HC_L_culmen_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.81387, df = 53.695, p-value = 0.4193
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2970321 0.7028955
sample estimates:
mean in group CONN mean in group HC
0.4020883 0.1991566
$HC_HC_L_fusiform_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.024945, df = 53.851, p-value = 0.9802
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5768786 0.5914136
sample estimates:
mean in group CONN mean in group HC
0.5552184 0.5479509
$HC_HC_L_insula_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = 0.79659, df = 52.141, p-value = 0.4293
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3000513 0.6951466
sample estimates:
mean in group CONN mean in group HC
0.12436946 -0.07317818
$HC_HC_L_lingual_gyrus_baseline
Welch Two Sample t-test
data: x by merged_DF_final$Group
t = -0.11033, df = 53.756, p-value = 0.9126
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5172863 0.4633268
sample estimates:
mean in group CONN mean in group HC
0.4395066 0.4664864
Look at names(A[[1]]) or str(A[[1]]) to see what the components are, then use $ or [[ to extract them, e.g.
names(t.test(extra ~ group, data = sleep))
[1] "statistic" "parameter" "p.value" "conf.int" "estimate"
[6] "null.value" "stderr" "alternative" "method" "data.name"
You can then use sapply(A, "[[", "statistic") or, to be more careful, vapply(A, "[[", "statistic", FUN.VALUE = numeric(1)).
If you like the tidyverse, you can use purrr::map_dbl(A, "statistic") for components that hold a single value; for the mean of the first group you'll need something like purrr::map(A, ~.$estimate[1]). (For components with several values, sapply() will automatically collapse the results to a matrix.)
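To collect everything into a single table, one option is to pull the pieces out of each test and let sapply() bind them. This is only a sketch: the built-in mtcars data and its am column stand in for merged_DF_final and Group, which are not available here.

```r
# Stand-in for the question's list A: one Welch t-test per column,
# grouped by the two-level variable mtcars$am (assumed substitute data)
tests <- lapply(mtcars[c("mpg", "hp", "wt")],
                function(x) t.test(x ~ mtcars$am))

# One row per test: t statistic, df, p-value, CI bounds and both group means
summary_tab <- t(sapply(tests, function(tt) {
  c(t         = unname(tt$statistic),
    df        = unname(tt$parameter),
    p.value   = tt$p.value,
    conf.low  = tt$conf.int[1],
    conf.high = tt$conf.int[2],
    tt$estimate)                    # keeps the "mean in group ..." names
}))

print(summary_tab)
```

The row names come from the names of the list, so each row is labelled with the variable that was tested. Wrapping the result in as.data.frame() gives a data frame if that is more convenient.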

T.test on a small set of data from a large set

We compare the average (or mean) of one group against the set average (or mean). This set average can be any theoretical value (or it can be the population mean).
I am trying to compare the mean of a small group of 300 observations against the mean of 1500 observations using a one-sided t.test. Is this approach correct? If not, is there an alternative?
head(data$BMI)
attach(data)
tester<-mean(BMI)
table(BMI)
set.seed(123)
sampler<-sample(min(BMI):max(BMI),300,replace = TRUE)
mean(sampler)
t.test(sampler,tester)
The last line of the code yields:
Error in t.test.default(sampler, tester) : not enough 'y' observations
To test your sample with t.test, you can do something like this:
d <- rnorm(1500,mean = 3, sd = 1)
s <- sample(d,300)
Then, test for the normality of d and s:
> shapiro.test(d)
Shapiro-Wilk normality test
data: d
W = 0.9993, p-value = 0.8734
> shapiro.test(s)
Shapiro-Wilk normality test
data: s
W = 0.99202, p-value = 0.1065
Here both p-values are greater than 0.05, so you cannot reject normality and can treat both d and s as normally distributed. You can then run the t-test:
> t.test(d,s)
Welch Two Sample t-test
data: d and s
t = 0.32389, df = 444.25, p-value = 0.7462
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.09790144 0.13653776
sample estimates:
mean of x mean of y
2.969257 2.949939
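The "not enough 'y' observations" error in the question arises because the second argument, tester, is a single number and so is treated as a sample of size one. If the goal is to compare the sample mean with a fixed reference value, a one-sample test with the mu argument is a possible alternative. The sketch below uses simulated values as a stand-in for data$BMI, and alternative = "greater" is only an assumed direction for the one-sided test.

```r
set.seed(123)

# Simulated stand-in for data$BMI (assumption: the real data are not available)
BMI <- rnorm(1500, mean = 25, sd = 4)

# Reference value: the mean of all 1500 observations
tester <- mean(BMI)

# Draw 300 of the actual observations rather than integers between min and max
sampler <- sample(BMI, 300)

# One-sample, one-sided t-test of the sample mean against the reference value
res <- t.test(sampler, mu = tester, alternative = "greater")
print(res)
```

Note that sample(BMI, 300) draws from the observed values, whereas sample(min(BMI):max(BMI), ...) in the question draws from a sequence of numbers between the minimum and maximum, which is a different distribution.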

How do I compute point-by-point t tests between two 50 data point vectors?

I have a data frame with 3 variables (ID, pre and post) and 50 instances, somewhat like this:
ID<- c("1","2","3","4","5","6","7","8","9","10")
pre<- c("2.56802","2.6686","1.0145","0.2568","2.369","1.2365","0.6989","0.98745","1.09878","2.454658")
post<-c("3.3323","2.66989","1.565656","2.58989","5.96987","3.12145","1.23565","2.74741","2.54101","0.23568")
dfw1<-data.frame(ID,pre,post)
The pre and post columns are means from another population. I want to run a two-tailed t-test between corresponding elements of pre and post (pre against post), looping over all 50 rows. I have tried writing a loop as shown below:
t<-0
for (i in 1:nrow(dfw$ID)) {
t[i]<-t.test(dfw$pre,dfw$post,alternative = c("two.sided"), conf.level = 0.95)
print(t)
}
It returned an error.
I want to extract statistics from the above, such as the df, p-value and t-value, for each row and so on. How do I write this code in R?
This code shows that you cannot reject the null hypothesis of 0 difference at the conventional 5% significance level:
ID<- c("1","2","3","4","5","6","7","8","9","10")
pre<- as.numeric(c("2.56802","2.6686","1.0145","0.2568","2.369","1.2365","0.6989","0.98745","1.09878","2.454658"))
post<-as.numeric(c("3.3323","2.66989","1.565656","2.58989","5.96987","3.12145","1.23565","2.74741","2.54101","0.23568"))
dfw1<-data.frame(ID,pre,post)
t.test(dfw1$pre,dfw1$post,alternative = c("two.sided"), conf.level = 0.95, paired=TRUE)
Output (giving you the df, t-stat and p-value):
Paired t-test
data: dfw1$pre and dfw1$post
t = -2.1608, df = 9, p-value = 0.05899
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.18109315 0.04997355
sample estimates:
mean of the differences
-1.06556
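A t-test needs more than one observation per group, so a separate test on each single pre/post pair is not defined; the paired test above works on the row-by-row differences instead. As a sketch of how the individual statistics could be pulled out of the result (the object names res, stats and diffs are chosen here for illustration):

```r
pre  <- c(2.56802, 2.6686, 1.0145, 0.2568, 2.369,
          1.2365, 0.6989, 0.98745, 1.09878, 2.454658)
post <- c(3.3323, 2.66989, 1.565656, 2.58989, 5.96987,
          3.12145, 1.23565, 2.74741, 2.54101, 0.23568)

# Paired two-sided t-test, stored instead of printed
res <- t.test(pre, post, alternative = "two.sided",
              conf.level = 0.95, paired = TRUE)

# Individual components of the htest object
stats <- c(t       = unname(res$statistic),
           df      = unname(res$parameter),
           p.value = res$p.value)
print(stats)

# The per-row quantity the paired test is based on
diffs <- pre - post
print(mean(diffs))   # same value as res$estimate
```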

How to construct a table from a t-test in R

I was provided with three t-tests:
Two Sample t-test
data: cammol by gender
t = -3.8406, df = 175, p-value = 0.0001714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.11460843 -0.03680225
sample estimates:
mean in group 1 mean in group 2
2.318132 2.393837
Welch Two Sample t-test
data: alkphos by gender
t = -2.9613, df = 145.68, p-value = 0.003578
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-22.351819 -4.458589
sample estimates:
mean in group 1 mean in group 2
85.81319 99.21839
Two Sample t-test
data: phosmol by gender
t = -3.4522, df = 175, p-value = 0.0006971
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.14029556 -0.03823242
sample estimates:
mean in group 1 mean in group 2
1.059341 1.148605
And I want to construct a table with these t-test results in R markdown like:
wanted_table_format
I've tried reading some instructions for using "knitr" and "kable" functions, but honestly, I do not know how to apply the t-test results to those functions.
What could I do?
Suppose your three t-tests are saved as t1, t2, and t3.
t1 <- t.test(rnorm(100), rnorm(100))
t2 <- t.test(rnorm(100), rnorm(100, 1))
t3 <- t.test(rnorm(100), rnorm(100, 2))
You could turn them into one data frame (that can then be printed as a table) with the broom and purrr packages:
library(broom)
library(purrr)
tab <- map_df(list(t1, t2, t3), tidy)
On the above data, this would become:
estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
1 0.07889713 -0.008136139 -0.08703327 0.535986 5.925840e-01 193.4152 -0.2114261 0.3692204
2 -0.84980010 0.132836627 0.98263673 -6.169076 3.913068e-09 194.2561 -1.1214809 -0.5781193
3 -1.95876967 -0.039048940 1.91972073 -13.270232 3.618929e-29 197.9963 -2.2498519 -1.6676875
method alternative
1 Welch Two Sample t-test two.sided
2 Welch Two Sample t-test two.sided
3 Welch Two Sample t-test two.sided
Some of the columns probably don't matter to you, so you could do something like this to get just the columns you want:
tab[c("estimate", "statistic", "p.value", "conf.low", "conf.high")]
As noted in the comments, you'd have to first do install.packages("broom") and install.packages("purrr").
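If installing extra packages is not an option, a similar table can be assembled in base R. This is a rough sketch rather than a drop-in replacement for tidy(): the row labels test1 to test3 are placeholders, and the rounding choices are arbitrary.

```r
set.seed(1)
t1 <- t.test(rnorm(100), rnorm(100))
t2 <- t.test(rnorm(100), rnorm(100, 1))
t3 <- t.test(rnorm(100), rnorm(100, 2))

# Placeholder labels; replace with the real variable names
tests <- list(test1 = t1, test2 = t2, test3 = t3)

# One row per test, with values rounded for display
tab <- data.frame(
  test    = names(tests),
  t       = round(vapply(tests, function(x) unname(x$statistic), numeric(1)), 2),
  df      = round(vapply(tests, function(x) unname(x$parameter), numeric(1)), 1),
  p.value = format.pval(vapply(tests, function(x) x$p.value, numeric(1)),
                        digits = 3),
  row.names = NULL
)
print(tab)
```

In an R Markdown chunk, passing a data frame like this to knitr::kable(tab) should render it as a formatted table.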

t test in r giving wrong estimate of mean vs aggregate function

totaldata$Age2 <- ifelse(totaldata$Age<=50, 0, 1)
t.test(totaldata$concernsubscorehiv, totaldata$Age2,alternative='two.sided',na.rm=TRUE, conf.level=.95, paired=FALSE)
This code yields this result:
Welch Two Sample t-test
data: totaldata$concernsubscorehiv and totaldata$Age2
t = 33.19, df = 127.42, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.370758 3.798164
sample estimates:
mean of x mean of y
4.336842 0.752381
As you can see, the mean of group y is 0.752381.
Then I estimate the mean of each group using this:
aggregate(totaldata$concernsubscorehiv~totaldata$Age2,data=totaldata,mean)
This yields
totaldata$Age2 totaldata$concernsubscorehiv
1 0 4.354286
2 1 4.330612
As you can see, the mean of group 0 is 4.354286, not 0.752381 as estimated by the t-test. What is the problem?
You are not using t.test correctly. 0.752381 is the fraction of people for whom Age2 is 1. You are supplying a vector of your outcome data and a vector of zeros and ones, when instead you want to split the first vector based on the grouping in the second.
Consider the following:
out <- rnorm(10)*5+100
bin <- rbinom(n=10, size=1, prob=0.5)
mean(out)
[1] 101.9462
mean(bin)
[1] 0.4
From the ?t.test helpfile, we know that x and y are:
x a (non-empty) numeric vector of data values.
y an optional (non-empty) numeric vector of data values.
So, by supplying both out and bin, I compare the two vectors to each other, which probably does not make much sense in this example. See:
t.test(out, bin)
Welch Two Sample t-test
data: out and bin
t = 86.665, df = 9.3564, p-value = 6.521e-15
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
98.91092 104.18149
sample estimates:
mean of x mean of y
101.9462 0.4000
Here, you see that t.test correctly estimated the means for my two supplied vectors, as I have shown above. What you want to do is to split up the first vector based on whether or not the second is 0 or 1.
In my toy example, I can do this easily by writing:
t.test(out[which(bin==1)], out[which(bin==0)])
Welch Two Sample t-test
data: out[which(bin == 1)] and out[which(bin == 0)]
t = 0.34943, df = 5.1963, p-value = 0.7405
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5.828182 7.686176
sample estimates:
mean of x mean of y
102.5036 101.5746
Here, these two means correspond exactly to
tapply(out, bin, mean)
0 1
101.5746 102.5036
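The same split can also be written with the formula interface of t.test, which takes the outcome on the left and the grouping variable on the right. A small sketch with simulated data follows; the grouping vector is fixed here (rather than drawn with rbinom) so that both groups are guaranteed to have enough observations.

```r
set.seed(42)

# Simulated outcome and a fixed two-level grouping (illustrative data only)
out <- rnorm(10) * 5 + 100
bin <- rep(c(0, 1), each = 5)

# Formula interface: split `out` by the levels of `bin`
res <- t.test(out ~ bin)
print(res)

# The group means reported by t.test match those from tapply()
means <- tapply(out, bin, mean)
print(means)
```

For the data in the question, the equivalent call would presumably be t.test(concernsubscorehiv ~ Age2, data = totaldata).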
