I need some clarification about the use of the prop.test command in R.
Please see the below example:
pill <- matrix(c(122,478,99,301), nrow=2, byrow=TRUE)
dimnames(pill) <- list(c("Pill", "Placebo"), c("Positive", "Negative"))
pill
Positive Negative
Pill 122 478
Placebo 99 301
prop.test(pill, correct=F)
The last line of code in the above example returns a p-value of 0.09914.
However, when we enter the above values directly, we get a completely different p-value:
prop.test(x=c(122/600,99/400), n=c(600,400), correct=F)
The above line of code returns a p-value of 0.8382.
Why does that happen?
Don't divide by the group sizes. prop.test expects x to be the counts of successes, not proportions; passing 122/600 and 99/400 as if they were counts makes the estimated proportions nearly zero, which severely affects the p-value:
prop.test(x=c(122,99), n=c(600,400), correct=F)
2-sample test for equality of proportions without continuity
correction
data: c(122, 99) out of c(600, 400)
X-squared = 2.7194, df = 1, p-value = 0.09914
alternative hypothesis: two.sided
95 percent confidence interval:
-0.097324375 0.008991042
sample estimates:
prop 1 prop 2
0.2033333 0.2475000
You should have noticed the strange results for the estimated proportions with your call:
prop 1 prop 2
0.0003388889 0.0006187500
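Those estimates are just your fractions divided by n a second time, which confirms that the fractions were being treated as success counts:
(122/600) / 600
#> [1] 0.0003388889
(99/400) / 400
#> [1] 0.00061875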
The p-value from cor.test() is different from the one I calculate by hand. I can't figure out what in the world I'm missing. Any help would be greatly appreciated!
Pearson's product-moment correlation
data: GSSE_new$MusicPerceptionScores and GSSE_new$MusicAptitudeScores
t = 27.152, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8811990 0.9359591
sample estimates:
cor
0.9125834
#######
2*pt(q=MPMA_cortest$statistic, df=MPMA_cortest$parameter, lower.tail=FALSE)
[1] 2.360846e-59
Since you have not supplied a minimal reproducible example with actual data, I cannot check against your own data, but here is a procedure showing that the manual calculation equals the cor.test() p-value:
MPMA_cortest <- cor.test(mtcars$hp, mtcars$mpg)
p_manual <- pt(
q = abs(MPMA_cortest$statistic),
df = MPMA_cortest$parameter,
lower.tail = FALSE) * 2
p_manual == MPMA_cortest$p.value
#> t
#> TRUE
Edit: Also note that the cor.test printout only says p-value < 2.2e-16. The two values may well be exactly equal (yours is smaller, thus meeting the inequality condition).
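If you want to see the exact stored p-value rather than the truncated "< 2.2e-16" printout, you can pull it straight from the returned htest object (using the MPMA_cortest object from the sketch above):
MPMA_cortest$p.value            # full-precision p-value stored in the object
format.pval(MPMA_cortest$p.value, digits = 4)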
I am wondering if there is a way to change the output settings of the prop.test function in R so that it displays the confidence interval already in percentage terms instead of a decimal? For example, I am trying to find the 95% confidence interval for the proportion of immigrants in the West with diabetes. Here is my code and the output:
For reference, sum(Immigrant_West$DIABETES) is 8 and nrow(Immigrant_West) is 144.
prop.test(x=sum(Immigrant_West$DIABETES),n=nrow(Immigrant_West),conf.level = .95,correct=TRUE)
1-sample proportions test with continuity correction
data: sum(Immigrant_West$DIABETES) out of nrow(Immigrant_West), null probability 0.5
X-squared = 112, df = 1, p-value <2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.02606 0.11017
sample estimates:
p
0.05556
So is there a way to change the confidence interval output to show [2.606%, 11.017%] instead of as decimals? Thank you!
This will probably be simpler:
prp.out <- prop.test(x=8, n=144, conf.level=.95, correct=TRUE)
prp.out$conf.int <- prp.out$conf.int * 100
prp.out
#
# 1-sample proportions test with continuity correction
#
# data: 8 out of 144, null probability 0.5
# X-squared = 112.01, df = 1, p-value < 2.2e-16
# alternative hypothesis: true p is not equal to 0.5
# 95 percent confidence interval:
# 2.606172 11.016593
# sample estimates:
# p
# 0.05555556
Not easily. The print format is controlled by the print.htest() function, which is documented in ?print.htest: it doesn't seem to offer any options other than the number of digits and the prefix for the "method" component.
If you want, you can hack the function yourself. (More details to follow, maybe: dump stats:::print.htest to a file, then edit it, then source() it.)
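A minimal sketch of that workflow (the file name here is just an example, and the edits themselves are up to you):
# Write the unexported stats method to a file you can edit
dump("print.htest", file = "my_print_htest.R", envir = asNamespace("stats"))
# ... edit my_print_htest.R to taste ...
# Sourcing it defines print.htest in the global environment, which masks
# the stats version when htest objects are printed at the top level
source("my_print_htest.R")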
As suggested by @CarlWitthoft:
p <- prop.test(x=8,n=144)
str(p) ## see what's there
p$estimate <- p$estimate*100
p$conf.int <- p$conf.int*100
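If you only need the interval displayed as percentages, you can also leave the htest object alone and format the stored interval directly (a small sketch using the same 8-out-of-144 data):
ci <- prop.test(x = 8, n = 144, conf.level = 0.95, correct = TRUE)$conf.int
sprintf("[%.3f%%, %.3f%%]", 100 * ci[1], 100 * ci[2])
#> [1] "[2.606%, 11.017%]"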
I am running prop.test on R. Here are the contingency table and the output of the prop.test. 'a' is the name of the data frame. 'cluster' and 'AD_2' are two dichotomic variables.
table(a$cluster,a$AD_2)
no yes
neg 1227 375
pos 546 292
prop.test(table(a$cluster,a$AD_2))
2-sample test for equality of proportions with continuity
correction
data: table(a$cluster, a$AD_2)
X-squared = 35.656, df = 1, p-value = 2.355e-09
alternative hypothesis: two.sided
95 percent confidence interval:
0.07510846 0.15362412
sample estimates:
prop 1 prop 2
0.7659176 0.6515513
Sample estimates are conditioned on AD_2 being 'no', as can be seen from the contingency table, i.e. 0.7659176 = 1227/(1227+375) and 0.6515513 = 546/(546+292). Since a$cluster == 'pos' is the positive event and AD_2 == 'yes' is the risk factor, I would like to reverse the proportions so that they condition on AD_2 equal to 'yes'.
R tables are essentially matrices. The prop.test function can handle matrices, so use the same data with the columns switched:
> prop.test( matrix(c( 375,292,1227,546), 2))
2-sample test for equality of proportions with continuity correction
data: matrix(c(375, 292, 1227, 546), 2)
X-squared = 35.656, df = 1, p-value = 2.355e-09
alternative hypothesis: two.sided
95 percent confidence interval:
-0.15362412 -0.07510846
sample estimates:
prop 1 prop 2
0.2340824 0.3484487
I think another method might have been to swap the columns with:
table(a$cluster,a$AD_2)[ , 2:1]
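To confirm that these are the proportions conditioned on AD_2 == 'yes', a quick hand check against the original table:
375 / (375 + 1227)   # cluster == "neg": yes / (yes + no)
#> [1] 0.2340824
292 / (292 + 546)    # cluster == "pos": yes / (yes + no)
#> [1] 0.3484487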
How do I calculate a confidence interval for a chi-square test in R? Is there a function for this like chisq.test()?
There is no confidence interval for a chi-square test (you're just checking whether the two categorical variables are independent), but you can compute a confidence interval for the difference in proportions, like this.
Say you have some data where 30% of the first group report success, while 70% of a second group report success:
row1 <- c(70,30)
row2 <- c(30,70)
my.table <- rbind(row1,row2)
Now you have data in contingency table:
> my.table
[,1] [,2]
row1 70 30
row2 30 70
Which you can run chisq.test on, and clearly those two proportions are significantly different, so the categorical variables are not independent:
> chisq.test(my.table)
Pearson's Chi-squared test with Yates' continuity correction
data: my.table
X-squared = 30.42, df = 1, p-value = 3.479e-08
If you do prop.test you find that you are 95% confident the difference between the proportions is somewhere between 26.29% and 53.70%, which makes sense, because the actual difference between the two observed proportions is 70%-30%=40%:
> prop.test(x=c(70,30),n=c(100,100))
2-sample test for equality of proportions with continuity correction
data: c(70, 30) out of c(100, 100)
X-squared = 30.42, df = 1, p-value = 3.479e-08
alternative hypothesis: two.sided
95 percent confidence interval:
0.2629798 0.5370202
sample estimates:
prop 1 prop 2
0.7 0.3
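If you just want the interval itself rather than the full printout, it is stored in the returned object (same data as above):
pt_out <- prop.test(x = c(70, 30), n = c(100, 100))
pt_out$conf.int
#> [1] 0.2629798 0.5370202
#> attr(,"conf.level")
#> [1] 0.95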
An addition to @mysteRious' nice answer: if you have a 2x2 contingency matrix, you can use fisher.test instead of prop.test to test the odds ratio rather than the difference in proportions. In Fisher's exact test the null hypothesis corresponds to an odds ratio (OR) of 1.
Using @mysteRious' sample data:
ft <- fisher.test(my.table)
ft
#
# Fisher's Exact Test for Count Data
#
#data: my.table
#p-value = 2.31e-08
#alternative hypothesis: true odds ratio is not equal to 1
#95 percent confidence interval:
# 2.851947 10.440153
#sample estimates:
#odds ratio
# 5.392849
Confidence intervals for the OR are then given in ft$conf.int:
ft$conf.int
#[1] 2.851947 10.440153
#attr(,"conf.level")
#[1] 0.95
To confirm, we manually calculate the OR
OR <- Reduce("/", my.table[, 1] / my.table[, 2])
OR
#[1] 5.444444
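Note that this sample OR (5.44) differs slightly from the estimate fisher.test prints (5.39): fisher.test reports the conditional maximum-likelihood estimate of the odds ratio rather than the simple cross-product ratio. The cross-product form gives the same 5.44:
(my.table[1, 1] * my.table[2, 2]) / (my.table[1, 2] * my.table[2, 1])
#> [1] 5.444444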
I have a simple question. I've seen this behaviour in R for both t-tests and correlations.
I do a simple paired t-test (in this case, two vectors of length 100). So the df of the paired t-test should be 99. However this is not what appears in the t-test result output.
dataforTtest.x <- rnorm(100,3,1)
dataforTtest.y <- rnorm(100,1,1)
t.test(dataforTtest.x, dataforTtest.y,paired=TRUE)
the output of this is:
Paired t-test
data: dataforTtest.x and dataforTtest.y
t = 10, df = 100, p-value <2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.6 2.1
sample estimates:
mean of the differences
1.8
BUT, if I actually look into the resulting object, the df are correct.
> t.test(dataforTtest.x, dataforTtest.y,paired=TRUE)[["parameter"]]
df
99
Am I missing something very stupid?
I'm running R version 3.3.0 (2016-05-03)
This problem can happen if the global setting for rounding numbers has been changed in R, for example with options(digits=2).
Note the results of a t-test before changing this setting:
Paired t-test
data: dataforTtest.x and dataforTtest.y
t = 13.916, df = 99, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.700244 2.265718
sample estimates:
mean of the differences
1.982981
And after setting options(digits=2):
Paired t-test
data: dataforTtest.x and dataforTtest.y
t = 13.916, df = 100, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.700244 2.265718
sample estimates:
mean of the differences
2
In R, it can be dangerous to change this global setting for exactly this reason: it can silently change how the results of statistical analyses are printed, without the user's knowledge. Instead, we can either use the round() function directly on a number, or, for test results like these, use it in combination with the broom package.
round(2.949,2)
[1] 2.95
#and
require(broom)
glance(t.test(dataforTtest.x, dataforTtest.y,paired=TRUE))
estimate statistic p.value parameter conf.low conf.high method alternative
1.831433 11.31853 1.494257e-19 99 1.51037 2.152496 Paired t-test two.sided
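If the option has already been changed in your session, resetting it restores the usual printout (the factory default for digits is 7):
options(digits = 7)   # restore the default print precision
t.test(dataforTtest.x, dataforTtest.y, paired = TRUE)[["parameter"]]
#> df
#> 99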