I am about to calculate the confidence interval (CI) for a proportion.
The data look like this:
End    Count
death     57
pat      319
where pat is the total sample size.
I used the following formulas:
#lower CI
57/319 - 1.96*sqrt(57/319*(1-57/319)/319)
#upper CI
57/319 + 1.96*sqrt(57/319*(1-57/319)/319)
The formulas above gave the result [0.1366, 0.2207].
However, when I used prop.test(),
prop.test(57, 319, correct = FALSE)
The result was [0.1405442, 0.2244692].
Could you please explain why this happens?
Thank you in advance.
Your confidence interval is the normal-approximation (Wald) interval. For count data, and for proportions in particular, this approximation can be quite inaccurate, especially if the proportion is not approximately 0.5 or the sample size is small. prop.test() instead estimates an asymmetric interval, the Wilson score interval (notice that 57/319 does not lie in the middle of the interval). The method is documented in the article cited on the manual page (?prop.test). The manual page also notes that the estimate used in binom.test() can be more accurate:
binom.test(57, 319)
Exact binomial test
data: 57 and 319
number of successes = 57, number of trials = 319, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1382331 0.2252183
sample estimates:
probability of success
0.1786834
The difference among the three is small here, but binom.test() should be your choice if it makes a difference in whether or not you reject the null hypothesis.
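For reference, the interval that prop.test(57, 319, correct = FALSE) reports is the Wilson score interval, which you can reproduce by hand; a minimal sketch:
# Wilson score interval, the method behind prop.test(correct = FALSE)
x <- 57; n <- 319; z <- qnorm(0.975)   # z is approximately 1.96
p_hat <- x / n
center <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
c(center - half, center + half)        # 0.1405442 0.2244692, matching prop.test()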
I don't understand how summarySE() in the Rmisc package calculates the confidence intervals (ci) of my data. The values do not seem to be correct.
For example, after running summarySE(data = df, measurevar = "numbers", groupvars = "conditions", conf.interval = 0.95), the output shows:
conditions N numbers sd se ci
1 constructionA 10 6.025 0.3987829 0.1261062 0.2852721
2 constructionB 10 1.925 0.3545341 0.1121135 0.2536184
However, the confidence interval of constructionA should be 6.025 ± 1.96 × (0.3987829)/√10, which is 6.025 ± 0.2471681. I don't understand where the value of 0.2852721 comes from after applying summarySE... Shouldn't it be 0.2471681?
Could anyone tell me what's wrong here?
Thank you!
A common construction of a confidence interval is
(statistic) +/- c*(standard error of statistic)
where c is the critical value. c = 1.96 is (approximately) the critical value for a normally distributed z-statistic and a 95% confidence interval, but that is not part of the definition of a CI; it is just the interval you get if you assume your statistic is normally distributed.
However, most confidence interval calculations, summarySE() included, use the t-distribution rather than the normal distribution for the critical value, because it gives more accurate results when sample sizes are small (and nearly identical results when they are large).
Here, your sample size is only N=10, so the differences between the normal-distribution 1.96 and the critical value from the t-statistic are noticeable. The 2.5th percentile of the t distribution with 10-1 = 9 degrees of freedom is qt(.025, 9) = -2.262157. So c = 2.262157 for a two-sided 95% confidence interval.
0.1261062*2.262157 = 0.285272, and this is where the confidence interval column comes from.
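You can check this directly in R:
se <- 0.1261062                  # standard error reported by summarySE()
tcrit <- qt(0.975, df = 10 - 1)  # 2.262157
se * tcrit                       # 0.2852721, the ci column for constructionA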
I am performing a McNemar test in R on the following data, obtaining the following result:
I understand the results; nevertheless, could someone explain to me how the confidence interval is computed?
You can read more about it in the vignette and also check out the code. For illustration, use the standard 2×2 table of paired outcomes, where b and c are the discordant counts:
          Test 2 +   Test 2 -
Test 1 +      a          b
Test 1 -      c          d
The odds ratio is b / c, which in your case works out to 150/86 = 1.744186. You can construct a binomial confidence interval around the proportion of successes, treating b as the number of successes and b + c as the number of trials.
In the code, this is calculated as follows:
library(exactci)
CI = binom.exact(150,86+150,tsmethod = "central")
CI
data: 150 and 86 + 150
number of successes = 150, number of trials = 236, p-value = 3.716e-05
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5706732 0.6970596
sample estimates:
probability of success
0.6355932
You have the lower and upper bounds for the proportion p = b/(b + c); the odds ratio is then p / (1 - p):
CI$conf.int/(1-CI$conf.int)
[1] 1.329228 2.300979
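The point estimate transforms the same way:
p_hat <- 150 / (150 + 86)   # proportion of successes among the discordant pairs
p_hat / (1 - p_hat)         # 1.744186, the odds ratio b / c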
The vignette for binom.exact states:
The 'central' method gives the Clopper-Pearson intervals, and the
'minlike' method gives confidence intervals proposed by Stern
(1954) (see Blaker, 2000).
So this is one of the many methods of estimating a binomial confidence interval.
Professionals and students,
I have significance levels 10%, 5%, and 1%, and I have computed the relative rejection frequency thanks to an answer to my previous question.
replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
table(replicate_sw10["p.value",]<0.10)/1000
FALSE  TRUE
0.909 0.091
I have done this for various sample sizes (T = 10, 30, 50, 100, 500) and stored the results manually in Excel; maybe there is an even easier way to compute this in a function/list.
However, how do I test whether the rejection frequency is significantly different from the significance level?
(The hint is the following: the rejection of a test can be modelled as a Bernoulli random variable)
Best regards
The easiest way to do this: if you perform 1000 tests, you would expect approximately 0.1 of them to have a p-value < 0.1. A rejection is a Bernoulli trial, like you said, and you can use a binomial test to find the probability of something as extreme as your result:
set.seed(100)
replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
obs_significant = sum(replicate_sw10["p.value",]<0.1)
binom.test(obs_significant,n=1000,p=0.1)
Exact binomial test
data: obs_significant and 1000
number of successes = 118, number of trials = 1000, p-value = 0.06479
alternative hypothesis: true probability of success is not equal to 0.1
95 percent confidence interval:
0.09865252 0.13962772
sample estimates:
probability of success
0.118
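As for avoiding the manual Excel step, one possible sketch that repeats the simulation over your list of sample sizes with sapply():
# Relative rejection frequency at the 10% level for several sample sizes
sizes <- c(10, 30, 50, 100, 500)
rejection_rates <- sapply(sizes, function(n) {
  pvals <- replicate(1000, shapiro.test(rnorm(n))$p.value)
  mean(pvals < 0.1)
})
names(rejection_rates) <- sizes
rejection_rates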
What is meant by the sample estimates in this binomial test? They don't seem to change with a change in the probability of success. I have tried to find their meaning in the documentation and on Google but can't see it. I also tried to compute the value by hand, to check whether it meant anything, but I still cannot see what it really means. Any idea?
binom.test(60, 300, 0.3)
Exact binomial test
data: 60 and 300
number of successes = 60, number of trials = 300, p-value = 0.0001137
alternative hypothesis: true probability of success is not equal to 0.3
95 percent confidence interval:
0.1562313 0.2498044
sample estimates:
probability of success
0.2
binom.test(60, 300, 1/6)
Exact binomial test
data: 60 and 300
number of successes = 60, number of trials = 300, p-value = 0.1216
alternative hypothesis: true probability of success is not equal to 0.1666667
95 percent confidence interval:
0.1562313 0.2498044
sample estimates:
probability of success
0.2
binom.test(60, 300, 0.5)
Exact binomial test
data: 60 and 300
number of successes = 60, number of trials = 300, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1562313 0.2498044
sample estimates:
probability of success
0.2
I will use your second chunk of code to explain (it is the same for all).
Imagine rolling a die. If the die is fair, the probability of rolling a 6 is 1/6; this is the third argument of the binom.test function. Under that hypothesized probability of success, the number of successes you would expect in your example is 300 / 6 = 50.
However, you observed 60 successes. These 60 observed successes are used to calculate the sample estimate, i.e. the value that you see at the bottom: 60 / 300 = 0.2.
The binomial test is then used to test whether the proportion of 6s you observed differs significantly from what would be expected by chance (i.e. 50 successes if the die is fair).
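You can confirm that the sample estimate is just x / n and ignores the hypothesized probability entirely:
binom.test(60, 300, 0.3)$estimate   # 0.2
binom.test(60, 300, 1/6)$estimate   # 0.2, unchanged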
I am trying to learn the F test, and on running the built-in var.test() in R, I obtained the following result:
var.test(gardenB, gardenC)
F test to compare two variances
data: gardenB and gardenC
F = 0.09375, num df = 9, denom df = 9, p-value = 0.001624
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.02328617 0.37743695
sample estimates:
ratio of variances
0.09375
I understand that, based on the p-value, I should reject the null hypothesis.
However, I am unable to understand the meaning conveyed by the 95 percent confidence interval.
I tried reading through the explanation provided for this query:
https://stats.stackexchange.com/questions/31454/how-to-interpret-the-confidence-interval-of-a-variance-f-test-using-r
but I am still unable to understand the meaning conveyed by the confidence interval. Any help would be really appreciated.
Sorry, I know this is an old post, but it showed up as the second result on Google, so I will still try to answer the question.
The confidence interval is for the RATIO of the two variances.
For example, if the variances are equal, i.e. var1 = var2, the ratio var1/var2 is 1.
var.test() is usually used to test whether the variances are equal. If 1 is not in the 95% confidence interval, it is safe to assume that the variances are not equal and thus reject the null hypothesis.
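To see where the interval itself comes from: var.test() divides the observed F ratio by quantiles of the F distribution, which reproduces the output above:
F_ratio <- 0.09375                  # var(gardenB) / var(gardenC)
df1 <- 9; df2 <- 9
c(F_ratio / qf(0.975, df1, df2),    # lower bound: 0.02328617
  F_ratio / qf(0.025, df1, df2))    # upper bound: 0.37743695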