Interpreting var.test results in R

I am trying to learn the F test, and on running the built-in var.test() in R, I obtained the following result:
var.test(gardenB, gardenC)
F test to compare two variances
data: gardenB and gardenC
F = 0.09375, num df = 9, denom df = 9, p-value = 0.001624
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.02328617 0.37743695
sample estimates:
ratio of variances
0.09375
I understand that, based on the p-value, I should reject the null hypothesis.
However, I am unable to understand the meaning conveyed by the 95 percent confidence interval.
I tried reading through the explanation provided for this related question:
https://stats.stackexchange.com/questions/31454/how-to-interpret-the-confidence-interval-of-a-variance-f-test-using-r
but am still unable to understand what the confidence interval conveys. Any help would be really appreciated.

Sorry, I know this is an old post, but it showed up as the second result on Google, so I will still try to answer the question.
The confidence interval is for the RATIO of the two variances.
For example, if the variances are equal, i.e. var1 = var2, the ratio var1/var2 would be 1.
var.test() is usually used to test whether two variances are equal. If 1 is not in the 95% confidence interval, it is safe to conclude that the variances are not equal, and thus to reject the null hypothesis.
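As a concrete illustration, the reported interval can be reproduced from the printed output alone: var.test() divides the observed variance ratio by quantiles of the F distribution. A minimal sketch, with the numbers taken from the output above:
ratio <- 0.09375               # var(gardenB) / var(gardenC)
df1 <- 9; df2 <- 9             # numerator and denominator degrees of freedom
c(ratio / qf(0.975, df1, df2), # lower bound: 0.02328617
  ratio / qf(0.025, df1, df2)) # upper bound: 0.37743695
The interval (0.023, 0.377) is the range of plausible values for the true ratio of variances; since it excludes 1, the data are inconsistent with equal variances.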

Why does this difference in confidence interval calculation occur?

I want to calculate the confidence interval (CI) for a proportion.
The data is like this:
End Count
death 57
pat 319
where pat is the total sample size.
I used following formula:
#lower CI
57/319 - 1.96*sqrt(57/319*(1-57/319)/319)
#upper CI
57/319 + 1.96*sqrt(57/319*(1-57/319)/319)
The formulas above give the result [0.1366, 0.2207].
However, when I used prop.test(),
prop.test(57, 319, correct = FALSE)
The result was [0.1405442, 0.2244692].
Could you please explain why this happens?
Thank you in advance.
Your confidence interval is an approximation that assumes a normal distribution. For count data, and especially for proportions, this assumption can be very inaccurate, particularly if the proportion is not close to 0.5 or the sample size is small. The prop.test() function estimates an asymmetric interval (notice that 57/319 does not lie in the middle of its interval). The method used is documented in the article cited on the manual page (?prop.test). The manual page also notes that the estimate used in binom.test() can be more accurate:
binom.test(57, 319)
Exact binomial test
data: 57 and 319
number of successes = 57, number of trials = 319, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1382331 0.2252183
sample estimates:
probability of success
0.1786834
The difference between all three is small, but binom.test() should be your choice if it makes a difference in whether or not to reject the null hypothesis.
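For reference, here is a minimal sketch that puts the three intervals side by side (the hand-computed normal approximation, the asymmetric Wilson score interval from prop.test(), and the exact interval from binom.test()):
p_hat <- 57 / 319
se <- sqrt(p_hat * (1 - p_hat) / 319)
p_hat + c(-1, 1) * 1.96 * se                 # normal (Wald) approximation
prop.test(57, 319, correct = FALSE)$conf.int # Wilson score, asymmetric
binom.test(57, 319)$conf.int                 # exact (Clopper-Pearson)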

McNemar exact test

I am performing a McNemar test in R on a 2x2 paired table (shown as an image in the original post, not reproduced here), obtaining a result that includes a confidence interval.
I understand the results; nevertheless, could someone explain to me how the confidence interval is computed?
You can read more in the package vignette and also check out the code. Using the standard 2x2 notation for paired data (cells a, b, c, d, with b and c the discordant pairs) for illustration:
The odds ratio is b / c, which in your case works out to be 150/86 = 1.744186. You can construct a binomial confidence interval around the proportion of successes, treating b as success and number of trials to be b + c.
In the package code, the following is used for the calculation:
library(exactci)
CI = binom.exact(150,86+150,tsmethod = "central")
CI
data: 150 and 86 + 150
number of successes = 150, number of trials = 236, p-value = 3.716e-05
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5706732 0.6970596
sample estimates:
probability of success
0.6355932
This gives the lower and upper bounds for the proportion p = b/(b + c); the odds ratio is then p/(1 - p):
CI$conf.int/(1-CI$conf.int)
[1] 1.329228 2.300979
The vignette for binom.exact states:
The 'central' method gives the Clopper-Pearson intervals, and the
'minlike' method gives confidence intervals proposed by Stern
(1954) (see Blaker, 2000).
So this is one of the many methods of estimating a binomial confidence interval.
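Note that the 'central' (Clopper-Pearson) method is also what base R's binom.test() uses, so the same odds-ratio interval can be reproduced without the exactci package; a minimal sketch:
ci <- binom.test(150, 86 + 150)$conf.int # Clopper-Pearson interval for p = b/(b + c)
ci / (1 - ci)                            # transform to an interval for the odds ratio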

Having computed the relative rejection frequency, how do I measure whether it differs significantly from the significance level? (Normality tests in R)

Dear professionals and students,
I have significance levels of 10%, 5% and 1%, and I have computed the relative rejection frequency thanks to an answer to my previous question.
replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
table(replicate_sw10["p.value",]<0.10)/1000
FALSE  TRUE
0.909 0.091
I have done this for various sample sizes (T = 10, 30, 50, 100, 500) and stored the results manually in Excel; maybe there is an even easier way to compute this with a function or list.
However, how do I measure whether it is significantly different from the significance level?
(The hint is the following: the rejection of a test can be modelled as a Bernoulli random variable)
Best regards
If you perform 1000 tests on data that truly satisfy the null hypothesis, you would expect roughly 10% of the p-values to fall below 0.1. Each rejection is a Bernoulli trial, as your hint says, so you can use a binomial test to assess the probability of a result as extreme as yours:
set.seed(100)
replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
obs_significant = sum(replicate_sw10["p.value",]<0.1)
binom.test(obs_significant,n=1000,p=0.1)
Exact binomial test
data: obs_significant and 1000
number of successes = 118, number of trials = 1000, p-value = 0.06479
alternative hypothesis: true probability of success is not equal to 0.1
95 percent confidence interval:
0.09865252 0.13962772
sample estimates:
probability of success
0.118
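As for an easier way than storing the results in Excel: a minimal sketch (my own illustration, not from the original answer) that loops over the sample sizes and collects the rejection frequencies in one named vector:
set.seed(100)
sizes <- c(10, 30, 50, 100, 500)
rejection_freq <- sapply(sizes, function(n) {
  pvals <- replicate(1000, shapiro.test(rnorm(n))$p.value)
  mean(pvals < 0.1) # relative rejection frequency at the 10% level
})
names(rejection_freq) <- sizes
rejection_freq
Multiplying each entry by 1000 recovers the success count to pass to binom.test(x, n = 1000, p = 0.1) as shown above.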

How to get the tabulated interval of the Wilcoxon-Mann-Whitney rank sum test

I was reading this topic on Rbloggers about the use of the Wilcoxon rank sum test: https://www.r-bloggers.com/wilcoxon-mann-whitney-rank-sum-test-or-test-u/
Especially this part, which I quote:
"We can finally compare the intervals tabulated on the tables of Wilcoxon for independent samples. The tabulated interval for two groups of 6 samples each is (26, 52)".
How can I get these "tabulated" values?
I understand they used a table where the values are reported according to the size of each sample, but I was wondering if there was a way to get them in R.
It is important because, as I understand the post, once you have a p-value > 0.05 and so cannot reject the null hypothesis H0, you can actually confirm H0 by comparing the "computed" and "tabulated" intervals.
So what I would need is the tabulated intervals, using R.
tl;dr
You can get confidence intervals for a Mann-Whitney-Wilcoxon test by specifying conf.int=TRUE.
Don't believe everything you read on the internet ...
If by "confirm" you mean "make sure that the computation is true", you don't need to double-check by consulting the original tables; the p-value should be enough to decide whether you can reject H0 or not. You can trust R for standard, widely used statistical methods. (I also show below how to repeat the computation with a different implementation from the coin package, which is a nearly independent check.)
If by "confirm" you mean "accept the null hypothesis", please don't do this; this is a fundamental violation of frequentist statistical theory, which says that you can reject a null hypothesis, but that you can never accept the null. Wide confidence intervals and p-values greater than a given threshold are evidence that the conclusion is uncertain (we can't be sure whether the null or the alternative is true), not that the null is true. The concluding text of the blog post referred to ("we conclude by accepting the hypothesis H0 of equality of means") is statistically incorrect.
A better way to interpret the uncertainty is to look at the confidence intervals. You can compute these for the Wilcoxon test: from ?wilcox.test:
... (if argument ‘conf.int’ is true [and a two-sample test is being performed]), a nonparametric
confidence interval and an estimator for ... the difference of the location parameters
‘x-y’ is computed.
> a = c(6, 8, 2, 4, 4, 5)
> b = c(7, 10, 4, 3, 5, 6)
> wilcox.test(b,a, conf.int=TRUE, correct=FALSE)
data: b and a
W = 22, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-1.999975 4.000016
sample estimates:
difference in location
0.9999395
The high p-value (0.5174) says that we really can't tell whether the values in a or b differ significantly in rank. The "difference in location" is the Hodges-Lehmann estimate of the location shift (the median of all pairwise differences between the two samples), and the confidence interval is the confidence interval on this shift. In this case, for two groups of 6 observations each, the estimated shift is 1 (group b tends to have slightly higher values than group a), and the confidence interval is (-2, 4) (the data are consistent with group b being slightly lower or much higher than group a). It is admittedly rather difficult to interpret the substantive meaning of these values; that's one of the disadvantages of rank-based nonparametric tests ...
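To make the location estimate concrete, it can be reproduced directly as the median of all pairwise differences (a quick check of my own, not part of the original answer):
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)
median(outer(b, a, "-")) # Hodges-Lehmann estimate: 1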
You can assume that the p-value computed by wilcox.test() is a reasonable summary of the evidence against the null hypothesis; there's no need to look up ranges in the tables. If you're worried about wilcox.test() in base R, you can try wilcox_test() from the coin package:
library(coin)
dd <- data.frame(f = rep(c("a", "b"), each = 6), x = c(a, b))
wilcox_test(x ~ f, data = dd, conf.int = TRUE) ## asymptotic test
which gives nearly identical results to wilcox.test(), and
wilcox_test(x~f,data=dd,conf.int=TRUE, distribution="exact")
which gives a slightly different p-value, but essentially the same confidence intervals.
of historical interest only
As for the tables: I found them on Google Books, by doing a Google Scholar search with author:katti author:wilcox. There you can read the description of how they were computed; this wouldn't be impossible to replicate, but it seems unnecessary since p-values and confidence intervals are available via other methods. Digging through the table (shown as a screenshot in the original answer, not reproduced here), the entry 0.0206 indicates that the interval (26, 52) corresponds to a one-tailed p-value of 0.0206 (two-tailed: 0.0412); that's the closest you can get with a discrete range. The next closest range is given in the line below [(27, 51), one-tailed p = 0.0325, two-tailed = 0.065]. In the 21st century you should never have to do this procedure.

Permutation Distribution in R

So I need to create a permutation distribution of the difference in proportions for a data set, but I'm not sure of the best way to go about it.
This is the table I need it for; I have to assess whether the difference between 2010 and 2011 is significant for "Yes".
mytable1 <- matrix(c(3648,25843,3407,26134), byrow=T, ncol=2)
dimnames(mytable1) <- list(c("2010","2011"),c("Yes","No"))
names(dimnames(mytable1)) <- c("Year","Response")
How do I code this in a for-loop?
Thank you so much!
Why use a permutation-based test if you can calculate exact probabilities? Is this a homework exercise?
fisher.test(mytable1)
Fisher's Exact Test for Count Data
data: mytable1
p-value = 0.001799
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.029882 1.138384
sample estimates:
odds ratio
1.082775
gives you the exact probability (the p-value) of seeing a ratio of "Yes" to "No" in 2010 relative to 2011 (i.e. the odds ratio) as extreme as or more extreme than what was observed. Note that the null hypothesis corresponds to an odds ratio of 1.
I assume this is what you mean by the "difference between 2010 and 2011 is significant [for Yes]". If not, please clarify and be more precise in specifying your test statistic (and null hypothesis). If it needs to be a permutation-based test, can you show us how far you have gotten?
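That said, since the question explicitly asks how to build a permutation distribution, here is a minimal sketch of one way to do it (my own illustration, using the counts from mytable1 above): reshuffle the year labels and recompute the difference in "Yes" proportions each time.
set.seed(1)
# Rebuild individual-level responses from the table counts
yes  <- c(rep(1, 3648), rep(0, 25843),  # 2010: Yes, No
          rep(1, 3407), rep(0, 26134))  # 2011: Yes, No
year <- rep(c("2010", "2011"), times = c(3648 + 25843, 3407 + 26134))
obs  <- mean(yes[year == "2010"]) - mean(yes[year == "2011"]) # observed difference
perm <- replicate(2000, {
  shuffled <- sample(year) # permute year labels under the null
  mean(yes[shuffled == "2010"]) - mean(yes[shuffled == "2011"])
})
mean(abs(perm) >= abs(obs)) # two-sided permutation p-value
The resulting p-value should land in the same neighborhood as the Fisher test's 0.001799.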
