I am running prop.test in R. Here are the contingency table and the prop.test output. 'a' is the name of the data frame; 'cluster' and 'AD_2' are two dichotomous variables.
table(a$cluster,a$AD_2)
no yes
neg 1227 375
pos 546 292
prop.test(table(a$cluster,a$AD_2))
2-sample test for equality of proportions with continuity
correction
data: table(a$cluster, a$AD_2)
X-squared = 35.656, df = 1, p-value = 2.355e-09
alternative hypothesis: two.sided
95 percent confidence interval:
0.07510846 0.15362412
sample estimates:
prop 1 prop 2
0.7659176 0.6515513
The sample estimates are the proportions of AD_2 == 'no' within each cluster, as can be seen from the contingency table, i.e. 0.7659176 = 1227/(1227+375) and 0.6515513 = 546/(546+292). Since a$cluster == 'pos' is the positive event and AD_2 == 'yes' is the risk factor, I would like to reverse the proportions so that they are reported for AD_2 equal to 'yes'.
R tables are essentially matrices. The prop.test function can handle matrices, so use the same data with the columns switched:
> prop.test( matrix(c( 375,292,1227,546), 2))
2-sample test for equality of proportions with continuity correction
data: matrix(c(375, 292, 1227, 546), 2)
X-squared = 35.656, df = 1, p-value = 2.355e-09
alternative hypothesis: two.sided
95 percent confidence interval:
-0.15362412 -0.07510846
sample estimates:
prop 1 prop 2
0.2340824 0.3484487
I think another method might have been to swap the columns with:
table(a$cluster,a$AD_2)[ , 2:1]
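To check that the column swap gives the same proportions as the hand-built matrix, here is a minimal sketch. The data frame 'a' is not available here, so the table is rebuilt from the counts shown in the question; the names tab, swapped and res are only illustrative.

```r
# Rebuild the contingency table from the counts in the question
# (assumption: this stands in for table(a$cluster, a$AD_2)).
tab <- matrix(c(1227, 546, 375, 292), nrow = 2,
              dimnames = list(cluster = c("neg", "pos"),
                              AD_2 = c("no", "yes")))

# Put the 'yes' column first so prop.test treats 'yes' as the success
swapped <- tab[, 2:1]
res <- prop.test(swapped)

# Proportion of AD_2 == 'yes' within each cluster
res$estimate
```

Selecting the columns by name, tab[, c("yes", "no")], should do the same thing and is less fragile if the column order ever changes.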
In a one-sample t-test, we compare the average (or mean) of one group against a set average (or mean). This set average can be any theoretical value (or it can be the population mean).
I am trying to compare the mean of a small group of 300 observations against the mean of 1500 observations using a one-sided t.test. Is this approach correct? If not, is there an alternative?
head(data$BMI)
attach(data)
tester<-mean(BMI)
table(BMI)
set.seed(123)
sampler<-sample(min(BMI):max(BMI),300,replace = TRUE)
mean(sampler)
t.test(sampler,tester)
The last line of the code yields:
Error in t.test.default(sampler, tester) : not enough 'y' observations
To test your sample with t.test, you can set up some example data like this:
d <- rnorm(1500,mean = 3, sd = 1)
s <- sample(d,300)
Then, test for the normality of d and s:
> shapiro.test(d)
Shapiro-Wilk normality test
data: d
W = 0.9993, p-value = 0.8734
> shapiro.test(s)
Shapiro-Wilk normality test
data: s
W = 0.99202, p-value = 0.1065
Here both p-values are greater than 0.05, so normality is not rejected and you can treat both d and s as approximately normally distributed. So, you can run the t.test:
> t.test(d,s)
Welch Two Sample t-test
data: d and s
t = 0.32389, df = 444.25, p-value = 0.7462
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.09790144 0.13653776
sample estimates:
mean of x mean of y
2.969257 2.949939
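The error in the question comes from passing a single number as the second sample: t.test(sampler, tester) treats tester as 'y', and one value is not enough observations. If the goal is to compare the sample mean against a fixed reference value, a one-sample test with the mu argument is an alternative. A minimal sketch, assuming simulated data in place of data$BMI (the names population and res are illustrative):

```r
set.seed(123)

# Stand-in for the 1500 BMI observations (assumption: simulated data)
population <- rnorm(1500, mean = 3, sd = 1)

# The smaller group of 300 observations
s <- sample(population, 300)

# One-sample, one-sided t-test: is the mean of s greater than
# the reference mean? The reference value goes in 'mu', not in 'y'.
res <- t.test(s, mu = mean(population), alternative = "greater")
res
```

Use alternative = "less" for the other direction, or leave it out for a two-sided test.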
How to calculate Confidence Interval for a Chi-Square in R. Is there a function like chisq.test(),
There is no confidence interval for a chi-square test (you're just checking to see if the first categorical and the second categorical variable are independent), but you can do a confidence interval for the difference in proportions, like this.
Say you have some data where 70% of the first group report success, while 30% of a second group report success:
row1 <- c(70,30)
row2 <- c(30,70)
my.table <- rbind(row1,row2)
Now you have data in contingency table:
> my.table
[,1] [,2]
row1 70 30
row2 30 70
You can run chisq.test on it, and clearly those two proportions are significantly different, so the categorical variables are not independent:
> chisq.test(my.table)
Pearson's Chi-squared test with Yates' continuity correction
data: my.table
X-squared = 30.42, df = 1, p-value = 3.479e-08
If you do prop.test you find that you are 95% confident the difference between the proportions is somewhere between 26.29% and 53.70%, which makes sense, because the actual difference between the two observed proportions is 70%-30%=40%:
> prop.test(x=c(70,30),n=c(100,100))
2-sample test for equality of proportions with continuity correction
data: c(70, 30) out of c(100, 100)
X-squared = 30.42, df = 1, p-value = 3.479e-08
alternative hypothesis: two.sided
95 percent confidence interval:
0.2629798 0.5370202
sample estimates:
prop 1 prop 2
0.7 0.3
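For anyone wondering where that interval comes from, it can be reproduced by hand. This is a sketch of the normal-approximation interval with the continuity correction, as I understand prop.test to compute it for two groups; the variable names are only illustrative.

```r
x <- c(70, 30)    # successes in each group
n <- c(100, 100)  # trials in each group

p <- x / n
delta <- p[1] - p[2]              # observed difference in proportions
z <- qnorm(0.975)                 # critical value for a 95% interval
se <- sqrt(sum(p * (1 - p) / n))  # unpooled standard error

# Continuity correction, capped so it never exceeds the difference itself
yates <- min(0.5, abs(delta) / sum(1 / n))
width <- z * se + yates * sum(1 / n)

ci <- c(max(delta - width, -1), min(delta + width, 1))
ci

# Compare with what prop.test reports
reference <- as.numeric(prop.test(x, n)$conf.int)
reference
```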
An addition to @mysteRious' nice answer: If you have a 2x2 contingency matrix, you could use fisher.test instead of prop.test to test the ratio of the odds instead of the difference of proportions. In Fisher's exact test the null hypothesis corresponds to an odds ratio (OR) = 1.
Using @mysteRious' sample data
ft <- fisher.test(my.table)
ft
#
# Fisher's Exact Test for Count Data
#
#data: my.table
#p-value = 2.31e-08
#alternative hypothesis: true odds ratio is not equal to 1
#95 percent confidence interval:
# 2.851947 10.440153
#sample estimates:
#odds ratio
# 5.392849
Confidence intervals for the OR are then given in ft$conf.int
ft$conf.int
#[1] 2.851947 10.440153
#attr(,"conf.level")
#[1] 0.95
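To confirm, we manually calculate the sample OR. It differs slightly from the fisher.test estimate because fisher.test reports the conditional maximum likelihood estimate rather than the sample odds ratio.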
OR <- Reduce("/", my.table[, 1] / my.table[, 2])
OR
#[1] 5.444444
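If you want an interval that goes with the sample OR, one common choice is a Wald interval on the log scale. Note this is a different method from the exact conditional interval that fisher.test returns, so the limits will not match ft$conf.int. A minimal sketch (names are illustrative):

```r
my.table <- rbind(row1 = c(70, 30), row2 = c(30, 70))

n11 <- my.table[1, 1]; n12 <- my.table[1, 2]
n21 <- my.table[2, 1]; n22 <- my.table[2, 2]

# Sample (cross-product) odds ratio
or_hat <- (n11 * n22) / (n12 * n21)

# Standard error of log(OR) and the 95% Wald interval, back-transformed
se_log <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
wald_ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log)

or_hat
wald_ci
```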
I was provided with three t-tests:
Two Sample t-test
data: cammol by gender
t = -3.8406, df = 175, p-value = 0.0001714
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.11460843 -0.03680225
sample estimates:
mean in group 1 mean in group 2
2.318132 2.393837
Welch Two Sample t-test
data: alkphos by gender
t = -2.9613, df = 145.68, p-value = 0.003578
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-22.351819 -4.458589
sample estimates:
mean in group 1 mean in group 2
85.81319 99.21839
Two Sample t-test
data: phosmol by gender
t = -3.4522, df = 175, p-value = 0.0006971
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.14029556 -0.03823242
sample estimates:
mean in group 1 mean in group 2
1.059341 1.148605
And I want to construct a table with these t-test results in R markdown like:
wanted_table_format
I've tried reading some instructions for using "knitr" and "kable" functions, but honestly, I do not know how to apply the t-test results to those functions.
What could I do?
Suppose your three t-tests are saved as t1, t2, and t3.
t1 <- t.test(rnorm(100), rnorm(100))
t2 <- t.test(rnorm(100), rnorm(100, 1))
t3 <- t.test(rnorm(100), rnorm(100, 2))
You could turn them into one data frame (that can then be printed as a table) with the broom and purrr packages:
library(broom)
library(purrr)
tab <- map_df(list(t1, t2, t3), tidy)
On the above data, this would become:
estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
1 0.07889713 -0.008136139 -0.08703327 0.535986 5.925840e-01 193.4152 -0.2114261 0.3692204
2 -0.84980010 0.132836627 0.98263673 -6.169076 3.913068e-09 194.2561 -1.1214809 -0.5781193
3 -1.95876967 -0.039048940 1.91972073 -13.270232 3.618929e-29 197.9963 -2.2498519 -1.6676875
method alternative
1 Welch Two Sample t-test two.sided
2 Welch Two Sample t-test two.sided
3 Welch Two Sample t-test two.sided
Some of the columns probably don't matter to you, so you could do something like this to get just the columns you want:
tab[c("estimate", "statistic", "p.value", "conf.low", "conf.high")]
As noted in the comments, you'd have to first do install.packages("broom") and install.packages("purrr").
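If you would rather not install extra packages, the same table can be put together in base R by pulling the pieces out of each test object. This is only a sketch on simulated tests; the column names are chosen to mirror the broom output, and the kable call is guarded because knitr may not be installed.

```r
set.seed(1)
t1 <- t.test(rnorm(100), rnorm(100))
t2 <- t.test(rnorm(100), rnorm(100, 1))
t3 <- t.test(rnorm(100), rnorm(100, 2))
tests <- list(t1, t2, t3)

# One row per test, one column per quantity of interest
tab <- data.frame(
  estimate  = sapply(tests, function(t) unname(t$estimate[1] - t$estimate[2])),
  statistic = sapply(tests, function(t) unname(t$statistic)),
  p.value   = sapply(tests, function(t) t$p.value),
  conf.low  = sapply(tests, function(t) t$conf.int[1]),
  conf.high = sapply(tests, function(t) t$conf.int[2])
)

# In an R Markdown chunk, kable turns the data frame into a formatted table
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(tab, digits = 3))
} else {
  print(tab)
}
```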
Why would R's prop.test function (documentation here) return different results based on whether I pass it a matrix or vectors?
Here I pass it vectors:
> prop.test(x = c(135, 47), n = c(1781, 1443))
2-sample test for equality of proportions with
continuity correction
data: c(135, 47) out of c(1781, 1443)
X-squared = 27.161, df = 1, p-value = 1.872e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02727260 0.05918556
sample estimates:
prop 1 prop 2
0.07580011 0.03257103
Here I create a matrix and pass it in instead:
> table <- matrix(c(135, 47, 1781, 1443), ncol=2)
> prop.test(table)
2-sample test for equality of proportions with
continuity correction
data: table
X-squared = 24.333, df = 1, p-value = 8.105e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02382527 0.05400606
sample estimates:
prop 1 prop 2
0.07045929 0.03154362
Why do I get different results? I expect the same results for both scenarios to be returned.
When x and n are entered as separate vectors, they are treated, respectively, as the number of successes and the total number of trials. But when you enter a matrix, the first column is treated as the number of successes and the second as the number of failures. From the help for prop.test:
x a vector of counts of successes, a one-dimensional table with two
entries, or a two-dimensional table (or matrix) with 2 columns, giving
the counts of successes and failures, respectively.
So, to get the same result with a matrix, you need to convert the second column of the matrix to the number of failures (assuming that in your example x is the number of successes and n is the number of trials).
x = c(135, 47)
n = c(1781, 1443)
prop.test(x, n) # x = successes; n = total trials
2-sample test for equality of proportions with continuity correction
data: x out of n
X-squared = 27.161, df = 1, p-value = 1.872e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02727260 0.05918556
sample estimates:
prop 1 prop 2
0.07580011 0.03257103
prop.test(cbind(x, n - x)) # x = successes; convert n to number of failures
2-sample test for equality of proportions with continuity correction
data: cbind(x, n - x)
X-squared = 27.161, df = 1, p-value = 1.872e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02727260 0.05918556
sample estimates:
prop 1 prop 2
0.07580011 0.03257103
I need some clarification about the use of the prop.test command in R.
Please see the below example:
pill <- matrix(c(122,478,99,301), nrow=2, byrow=TRUE)
dimnames(pill) <- list(c("Pill", "Placebo"), c("Positive", "Negative"))
pill
Positive Negative
Pill 122 478
Placebo 99 301
prop.test(pill, correct=F)
The last line of code in the above example returns a p-value of 0.09914.
However, when we enter the above values directly, we get a completely different p-value:
prop.test(x=c(122/600,99/400), n=c(600,400), correct=F)
The above line of code returns a p-value of 0.8382.
Why does that happen?
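Don't divide the counts by the group sizes. x must hold the number of successes, not the proportions; passing proportions makes prop.test treat 0.2033 and 0.2475 as the success counts out of 600 and 400, which severely affects the p-value: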
prop.test(x=c(122,99), n=c(600,400), correct=F)
2-sample test for equality of proportions without continuity
correction
data: c(122, 99) out of c(600, 400)
X-squared = 2.7194, df = 1, p-value = 0.09914
alternative hypothesis: two.sided
95 percent confidence interval:
-0.097324375 0.008991042
sample estimates:
prop 1 prop 2
0.2033333 0.2475000
You should have noticed the strange results for the estimated proportions with your call:
prop 1 prop 2
0.0003388889 0.0006187500
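A quick way to convince yourself is to derive x and n from the matrix and compare the two calls. A minimal sketch using the pill data from the question (the names from_matrix, from_counts and wrong_props are illustrative):

```r
pill <- matrix(c(122, 478, 99, 301), nrow = 2, byrow = TRUE,
               dimnames = list(c("Pill", "Placebo"),
                               c("Positive", "Negative")))

# Matrix input: columns are successes and failures
from_matrix <- prop.test(pill, correct = FALSE)

# Vector input: x = successes, n = total trials (successes + failures)
x <- pill[, "Positive"]
n <- rowSums(pill)
from_counts <- prop.test(x = x, n = n, correct = FALSE)

# Both calls describe the same data, so the p-values agree
c(from_matrix$p.value, from_counts$p.value)

# Passing x / n as 'x' makes the estimates (x / n) / n instead,
# which is where the tiny proportions above come from
wrong_props <- (x / n) / n
wrong_props
```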