Confidence Interval for a Chi-Square in R

How do I calculate a confidence interval for a chi-square test in R? Is there a function for this, similar to chisq.test()?

There is no confidence interval for a chi-square test itself (the test only checks whether the two categorical variables are independent), but you can compute a confidence interval for the difference in proportions, like this.
Say you have some data where 30% of the first group report success, while 70% of a second group report success:
row1 <- c(70,30)
row2 <- c(30,70)
my.table <- rbind(row1,row2)
Now you have data in contingency table:
> my.table
     [,1] [,2]
row1   70   30
row2   30   70
You can run chisq.test on this, and since the two proportions are clearly significantly different, the categorical variables are not independent:
> chisq.test(my.table)
Pearson's Chi-squared test with Yates' continuity correction
data: my.table
X-squared = 30.42, df = 1, p-value = 3.479e-08
If you run prop.test, you find that you are 95% confident the difference between the proportions is somewhere between 26.3% and 53.7%, which makes sense, because the actual difference between the two observed proportions is 70% - 30% = 40%:
> prop.test(x=c(70,30),n=c(100,100))
2-sample test for equality of proportions with continuity correction
data: c(70, 30) out of c(100, 100)
X-squared = 30.42, df = 1, p-value = 3.479e-08
alternative hypothesis: two.sided
95 percent confidence interval:
0.2629798 0.5370202
sample estimates:
prop 1 prop 2
0.7 0.3
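If you need the interval programmatically rather than reading it off the printout, note that prop.test returns an "htest" object whose components can be extracted directly; a minimal sketch:

```r
## prop.test returns an "htest" list; the CI and the two sample
## proportions are stored as components of that object.
res <- prop.test(x = c(70, 30), n = c(100, 100))
res$conf.int   # c(0.2629798, 0.5370202), with attr conf.level = 0.95
res$estimate   # prop 1 = 0.7, prop 2 = 0.3
```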

An addition to @mysteRious' nice answer: if you have a 2x2 contingency matrix, you can use fisher.test instead of prop.test to test the ratio of the proportions (the odds ratio) rather than their difference. In Fisher's exact test the null hypothesis corresponds to an odds ratio (OR) = 1.
Using @mysteRious' sample data:
ft <- fisher.test(my.table)
ft
#
# Fisher's Exact Test for Count Data
#
#data: my.table
#p-value = 2.31e-08
#alternative hypothesis: true odds ratio is not equal to 1
#95 percent confidence interval:
# 2.851947 10.440153
#sample estimates:
#odds ratio
# 5.392849
Confidence intervals for the OR are then given in ft$conf.int:
ft$conf.int
#[1] 2.851947 10.440153
#attr(,"conf.level")
#[1] 0.95
To confirm, we manually calculate the sample OR:
OR <- Reduce("/", my.table[, 1] / my.table[, 2])
OR
#[1] 5.444444
This differs slightly from the 5.392849 reported by fisher.test, because fisher.test returns the conditional maximum likelihood estimate of the OR rather than the sample (cross-product) odds ratio.
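Equivalently, the sample OR for a 2x2 table with cells a, b / c, d is the cross-product ratio (a*d)/(b*c); a quick sketch:

```r
## Sample (cross-product) odds ratio for a 2x2 table:
## rows are groups, columns are success/failure counts.
my.table <- rbind(row1 = c(70, 30), row2 = c(30, 70))
OR_sample <- (my.table[1, 1] * my.table[2, 2]) /
             (my.table[1, 2] * my.table[2, 1])
OR_sample  # 5.444444, matching the Reduce("/", ...) calculation
```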

Related

Finding confidence interval for the second level of a categorical variable in R commander

I want to determine a confidence interval for a proportion based on the "CES11" dataset, which is included in the carData package for R. The variable I am using is abortion, with values "No" and "Yes". I want the confidence interval for the proportion of "Yes", but R runs the hypothesis test and reports a confidence interval for the first level, which is "No". How can I change this so that the results correspond to the "Yes" level?
Frequency counts (test is for first level):
abortion
No Yes
1818 413
1-sample proportions test without continuity correction
data: rbind(.Table), null probability 0.5
X-squared = 884.82, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.7982282 0.8304517
sample estimates:
p
0.8148812
A quick way in base R is to reverse the tabulation vector order:
cts_abortion <- table(carData::CES11$abortion)
## p is for rate of "No"
prop.test(cts_abortion)
## 1-sample proportions test with continuity correction
##
## data: cts_abortion, null probability 0.5
## X-squared = 883.56, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.7979970 0.8306679
## sample estimates:
## p
## 0.8148812
## p is for rate of "Yes"
prop.test(rev(cts_abortion))
## 1-sample proportions test with continuity correction
##
## data: rev(cts_abortion), null probability 0.5
## X-squared = 883.56, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.1693321 0.2020030
## sample estimates:
## p
## 0.1851188
Otherwise, you can likely just relevel the factor:
df_ces11 <- carData::CES11
df_ces11$abortion <- factor(df_ces11$abortion, levels=c("Yes", "No"))
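To sketch how the releveled factor feeds through (rebuilt here from the counts in the question, so it runs without the carData package): once "Yes" is the first level, table() counts it first and prop.test() reports p as the "Yes" rate.

```r
## Rebuild the factor from the question's counts with "Yes" as level 1.
abortion <- factor(c(rep("No", 1818), rep("Yes", 413)),
                   levels = c("Yes", "No"))
table(abortion)            # Yes: 413, No: 1818
prop.test(table(abortion)) # estimate p = 413/2231, about 0.1851
```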

Can prop.test display the confidence interval in percentage terms instead of decimals in R?

I am wondering if there is a way to change the output settings of the prop.test function in R so that it displays the confidence interval already in percentage terms instead of a decimal? For example, I am trying to find the 95% confidence interval for the proportion of immigrants in the West with diabetes. Here is my code and the output:
Here sum(Immigrant_West$DIABETES) is 8 and nrow(Immigrant_West) is 144.
prop.test(x=sum(Immigrant_West$DIABETES),n=nrow(Immigrant_West),conf.level = .95,correct=TRUE)
1-sample proportions test with continuity correction
data: sum(Immigrant_West$DIABETES) out of nrow(Immigrant_West), null probability 0.5
X-squared = 112, df = 1, p-value <2e-16
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.02606 0.11017
sample estimates:
p
0.05556
So is there a way to change the confidence interval output to show [2.606%, 11.017%] instead of as decimals? Thank you!
This will probably be simpler:
prp.out <- prop.test(x=8, n=144, conf.level=.95, correct=TRUE)
prp.out$conf.int <- prp.out$conf.int * 100
prp.out
#
# 1-sample proportions test with continuity correction
#
# data: 8 out of 144, null probability 0.5
# X-squared = 112.01, df = 1, p-value < 2.2e-16
# alternative hypothesis: true p is not equal to 0.5
# 95 percent confidence interval:
# 2.606172 11.016593
# sample estimates:
# p
# 0.05555556
Not easily. The print format is controlled by the print.htest() function, which is documented in ?print.htest: it doesn't offer any options other than the number of digits and the prefix for the "method" component.
If you want, you can hack the function yourself (dump stats:::print.htest to a file, edit it, then source() it).
As suggested by @CarlWitthoft:
p <- prop.test(x=8,n=144)
str(p) ## see what's there
p$estimate <- p$estimate*100
p$conf.int <- p$conf.int*100
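If you only need the interval displayed as percentages, a small sketch that formats the numbers directly, without modifying the htest object at all:

```r
## Format the CI bounds as percentage strings with sprintf().
p <- prop.test(x = 8, n = 144)
sprintf("[%.3f%%, %.3f%%]", 100 * p$conf.int[1], 100 * p$conf.int[2])
## "[2.606%, 11.017%]"
```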

T.test on a small set of data from a large set

We compare the mean of one group against a set mean. This set mean can be any theoretical value (or it can be the population mean).
I am trying to compare the mean of a small group of 300 observations against 1500 observations using a one-sided t.test. Is this approach correct? If not, is there an alternative?
head(data$BMI)
attach(data)
tester<-mean(BMI)
table(BMI)
set.seed(123)
sampler<-sample(min(BMI):max(BMI),300,replace = TRUE)
mean(sampler)
t.test(sampler,tester)
The last line of the code yields:
Error in t.test.default(sampler, tester) : not enough 'y' observations
To test your sample with t.test, you can do:
d <- rnorm(1500,mean = 3, sd = 1)
s <- sample(d,300)
Then, test for the normality of d and s:
> shapiro.test(d)
Shapiro-Wilk normality test
data: d
W = 0.9993, p-value = 0.8734
> shapiro.test(s)
Shapiro-Wilk normality test
data: s
W = 0.99202, p-value = 0.1065
Here both p-values are above 0.05, so you can treat d and s as approximately normally distributed and proceed with the t-test:
> t.test(d,s)
Welch Two Sample t-test
data: d and s
t = 0.32389, df = 444.25, p-value = 0.7462
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.09790144 0.13653776
sample estimates:
mean of x mean of y
2.969257 2.949939
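Note that the original error arose because tester is a single number passed as a second sample; if the goal is to compare a sample mean against a fixed reference value, a one-sample t-test (passing the reference via mu=) avoids it entirely. A sketch with simulated data standing in for the real BMI values:

```r
## One-sample t-test: compare the subsample mean to the full-set mean.
set.seed(123)
d <- rnorm(1500, mean = 3, sd = 1)  # stand-in for the full data
s <- sample(d, 300)                 # subsample of 300
t.test(s, mu = mean(d))             # H0: mean of s equals mean(d)
```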

prop.test, conditioning on the other class

I am running prop.test on R. Here are the contingency table and the output of the prop.test. 'a' is the name of the data frame. 'cluster' and 'AD_2' are two dichotomic variables.
table(a$cluster,a$AD_2)
        no  yes
  neg 1227  375
  pos  546  292
prop.test(table(a$cluster,a$AD_2))
2-sample test for equality of proportions with continuity
correction
data: table(a$cluster, a$AD_2)
X-squared = 35.656, df = 1, p-value = 2.355e-09
alternative hypothesis: two.sided
95 percent confidence interval:
0.07510846 0.15362412
sample estimates:
prop 1 prop 2
0.7659176 0.6515513
Sample estimates are conditioned on AD_2 being 'no', as can be seen from the contingency table: 0.7659176 = 1227/(1227+375) and 0.6515513 = 546/(546+292). Since a$cluster == 'pos' is the positive event and AD_2 == 'yes' is the risk factor, I would like to reverse the proportions, conditioning on AD_2 equal to 'yes'.
R tables are essentially matrices. The prop.test function can handle matrices, so use the same data with the columns switched:
> prop.test( matrix(c( 375,292,1227,546), 2))
2-sample test for equality of proportions with continuity correction
data: matrix(c(375, 292, 1227, 546), 2)
X-squared = 35.656, df = 1, p-value = 2.355e-09
alternative hypothesis: two.sided
95 percent confidence interval:
-0.15362412 -0.07510846
sample estimates:
prop 1 prop 2
0.2340824 0.3484487
I think another method might have been to swap the columns with:
table(a$cluster,a$AD_2)[ , 2:1]
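As a check, rebuilding the table from the counts in the question shows that the two approaches agree; a sketch:

```r
## Reconstruct the contingency table, then compare prop.test on the
## original column order ("no" first) and the reversed order ("yes" first).
tab <- matrix(c(1227, 546, 375, 292), nrow = 2,
              dimnames = list(c("neg", "pos"), c("no", "yes")))
prop.test(tab)$estimate          # 0.7659176, 0.6515513 (conditioned on "no")
prop.test(tab[, 2:1])$estimate   # 0.2340824, 0.3484487 (conditioned on "yes")
```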

R: prop.test returns different values based on whether matrix or vectors are passed to it

Why would r's prop.test function (documentation here) return different results based on whether I pass it a matrix or vectors?
Here I pass it vectors:
> prop.test(x = c(135, 47), n = c(1781, 1443))
2-sample test for equality of proportions with
continuity correction
data: c(135, 47) out of c(1781, 1443)
X-squared = 27.161, df = 1, p-value = 1.872e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02727260 0.05918556
sample estimates:
prop 1 prop 2
0.07580011 0.03257103
Here I create a matrix and pass it in instead:
> table <- matrix(c(135, 47, 1781, 1443), ncol=2)
> prop.test(table)
2-sample test for equality of proportions with
continuity correction
data: table
X-squared = 24.333, df = 1, p-value = 8.105e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02382527 0.05400606
sample estimates:
prop 1 prop 2
0.07045929 0.03154362
Why do I get different results? I expect the same results for both scenarios to be returned.
When x and n are entered as separate vectors, they are treated, respectively, as the number of successes and the total number of trials. But when you enter a matrix, the first column is treated as the number of successes and the second as the number of failures. From the help for prop.test:
x a vector of counts of successes, a one-dimensional table with two
entries, or a two-dimensional table (or matrix) with 2 columns, giving
the counts of successes and failures, respectively.
So, to get the same result with a matrix, you need to convert the second column of the matrix to the number of failures (assuming that in your example x is the number of successes and n is the number of trials).
x = c(135, 47)
n = c(1781, 1443)
prop.test(x, n) # x = successes; n = total trials
2-sample test for equality of proportions with continuity correction
data: x out of n
X-squared = 27.161, df = 1, p-value = 1.872e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02727260 0.05918556
sample estimates:
prop 1 prop 2
0.07580011 0.03257103
prop.test(cbind(x, n - x)) # x = successes; convert n to number of failures
2-sample test for equality of proportions with continuity correction
data: cbind(x, n - x)
X-squared = 27.161, df = 1, p-value = 1.872e-07
alternative hypothesis: two.sided
95 percent confidence interval:
0.02727260 0.05918556
sample estimates:
prop 1 prop 2
0.07580011 0.03257103
