Calculating test characteristics in R

I have a 2x2 contingency table from a larger dataset:
> ct
     disease
test    0   1
  no  118  12
  yes  24  46
and I would like to quickly retrieve the different (medical) diagnostic test characteristics such as
Sensitivity
Specificity
Likelihood Ratio +
Likelihood Ratio -
False positive rate
False negative rate
Prob of disease
Pred value positive
Pred value negative
p(neg test wrong)
p(test positive)
p(test negative)
Overall accuracy
with their respective 95% CIs. Is there a package/function that does that? Many thanks.

Possibly you could write a custom function for each of these test characteristics. It would ensure the correct format for your particular problem and is probably faster than all the Googling you're already doing. Each one should be pretty quick. For example, Sensitivity:
sens <- function(ct) { ct[2,2] / sum(ct[,2]) }  # true positives / all with disease
And Specificity:
spec <- function(ct) { ct[1,1] / sum(ct[,1]) }  # true negatives / all without disease
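In the same spirit, a sketch of a few more of the quantities on your list, with exact 95% CIs via binom.test(); the function names are my own, and it assumes the same layout as your table (rows = test, columns = disease):
ppv <- function(ct) { ct[2,2] / sum(ct[2,]) }   # predictive value positive
npv <- function(ct) { ct[1,1] / sum(ct[1,]) }   # predictive value negative
acc <- function(ct) { sum(diag(ct)) / sum(ct) } # overall accuracy
# exact 95% CI for any of these simple proportions, e.g. sensitivity:
binom.test(ct[2,2], sum(ct[,2]))$conf.int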

OK, epi.tests(t, verbose=TRUE) from the epiR package it is.
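For anyone landing here: epi.tests() lives in the epiR package. A minimal usage sketch, assuming the table has to be reordered so that the test-positive/disease-positive cell sits top-left (the convention epi.tests() expects):
library(epiR)
# positives first in both rows and columns:
tab <- as.table(ct[c("yes", "no"), c("1", "0")])
epi.tests(tab, conf.level = 0.95)  # sensitivity, specificity, PVs, LRs with 95% CIs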

Related

If I have computed the relative rejection frequency, how do I measure whether it is significantly different from the significance level? (Normality tests in R)

Dear professionals and students,
I have significance levels 10%, 5% & 1%, and I have computed the relative rejection frequency thanks to an answer to my previous question.
replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
table(replicate_sw10["p.value",]<0.10)/1000
FALSE  TRUE
0.909 0.091
I have done this for various sample sizes (T = 10, 30, 50, 100, 500) and stored the results manually in Excel; maybe there is an even easier way to compute this in a function/list.
However, how do I measure whether it is significantly different from the significance levels?
(The hint is the following: the rejection of a test can be modelled as a Bernoulli random variable)
Best regards
So, the easiest way to do this: if you perform 1000 tests, you would expect approximately 10% of your tests to have a p-value < 0.10. A rejection is a Bernoulli trial, like you said, and you can use a binomial test to see the probability of a result as extreme as yours:
set.seed(100)
replicate_sw10 = replicate(1000,shapiro.test(rnorm(10)))
obs_significant = sum(replicate_sw10["p.value",]<0.1)  # number of rejections at the 10% level
binom.test(obs_significant, n=1000, p=0.1)             # H0: rejection probability is 0.1
Exact binomial test
data: obs_significant and 1000
number of successes = 118, number of trials = 1000, p-value = 0.06479
alternative hypothesis: true probability of success is not equal to 0.1
95 percent confidence interval:
0.09865252 0.13962772
sample estimates:
probability of success
0.118
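As for running this over several sample sizes without Excel, a minimal sketch (same Shapiro-Wilk setup, looping with sapply()):
set.seed(100)
sizes <- c(10, 30, 50, 100, 500)
rej <- sapply(sizes, function(n) {
  p <- replicate(1000, shapiro.test(rnorm(n))$p.value)
  mean(p < 0.10)  # relative rejection frequency at the 10% level
})
names(rej) <- sizes
rej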

How to get tabulated interval of Wilcoxon-Mann-Whitney rank sum test

I was reading this topic on Rbloggers about the use of the Wilcoxon rank sum test: https://www.r-bloggers.com/wilcoxon-mann-whitney-rank-sum-test-or-test-u/
Especially this part, here I quote:
"We can finally compare the intervals tabulated on the tables of Wilcoxon for independent samples. The tabulated interval for two groups of 6 samples each is (26, 52)".
How can I get these "tabulated" values?
I understand they used a table where the values are reported according to the size of each sample, but I was wondering if there was a way to get them in R.
It is important because, as I understand the post, once you have a p-value > 0.05 and so cannot reject the null hypothesis H0, you can actually confirm H0 by comparing "computed" and "tabulated" intervals.
So what I would need is the tabulated intervals, using R.
tl;dr
You can get confidence intervals for a Mann-Whitney-Wilcoxon test by specifying conf.int=TRUE.
Don't believe everything you read on the internet ...
If by "confirm" you mean "make sure that the computation is true", you don't need to double-check by consulting the original tables; the p-value should be enough to decide whether you can reject H0 or not. You can trust R for standard, widely used statistical methods. (I also show below how to repeat the computation with a different implementation from the coin package, which is a nearly independent check.)
If by "confirm" you mean "accept the null hypothesis", please don't do this; this is a fundamental violation of frequentist statistical theory, which says that you can reject a null hypothesis, but that you can never accept the null. Wide confidence intervals and p-values greater than a given threshold are evidence that the conclusion is uncertain (we can't be sure whether the null or the alternative is true), not that the null is true. The concluding text of the blog post referred to ("we conclude by accepting the hypothesis H0 of equality of means") is statistically incorrect.
A better way to interpret the uncertainty is to look at the confidence intervals. You can compute these for the Wilcoxon test: from ?wilcox.test:
... (if argument ‘conf.int’ is true [and a two-sample test is being performed]), a nonparametric
confidence interval and an estimator for ... the difference of the location parameters
‘x-y’ is computed.
> a = c(6, 8, 2, 4, 4, 5)
> b = c(7, 10, 4, 3, 5, 6)
> wilcox.test(b,a, conf.int=TRUE, correct=FALSE)
data: b and a
W = 22, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-1.999975 4.000016
sample estimates:
difference in location
0.9999395
The high p-value (0.5174) says that we really can't tell whether the values in a or b have significantly different ranks. The difference in location gives us the estimated difference between the median ranks, and the confidence interval gives the confidence interval on this difference. In this case, for a sample size of 12, the estimated difference in ranks is 1 (group b has slightly higher ranks than group a), and the confidence interval is (-2, 4) (the data are consistent with group b having slightly lower or much higher ranks than group a). It is admittedly rather difficult to interpret the substantive meaning of these values - that's one of the disadvantages of rank-based nonparametric tests ...
You can assume that the p-value computed by wilcox.test() is a reasonable summary of the evidence against the null hypothesis; there's no need to look up ranges in the tables. If you're worried about wilcox.test() in base R, you can try wilcox_test() from the coin package:
library(coin)
dd <- data.frame(f=rep(c("a","b"),each=6),x=c(a,b))
wilcox_test(x~f,data=dd,conf.int=TRUE) ## asymptotic test
which gives nearly identical results to wilcox.test(), and
wilcox_test(x~f,data=dd,conf.int=TRUE, distribution="exact")
which gives a slightly different p-value, but essentially the same confidence intervals.
of historical interest only
As for the tables: I found them on Google Books, by doing a Google Scholar search with author:katti author:wilcox. There you can read the description of how they were computed; this wouldn't be impossible to replicate, but it seems unnecessary since p-values and confidence intervals are available via other methods. Digging through the table, you find that the interval (26, 52) corresponds to a one-tailed p-value of 0.0206 (two-tailed = 0.0412); that's the closest you can get with a discrete range. The next closest range is given in the line below [(27, 51), one-tailed p = 0.0325, two-tailed = 0.065]. In the 21st century you should never have to do this procedure.
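That said, if you really want to reproduce the tabulated values, base R can do it; a sketch for two groups of 6 (note that R's pwilcox() works with the Mann-Whitney statistic U = W - m(m+1)/2, where W is the rank sum of the first group):
m <- 6; n <- 6
pwilcox(26 - m*(m+1)/2, m, n)  # 0.0206: one-tailed p for rank sum <= 26, i.e. the interval (26, 52)
pwilcox(27 - m*(m+1)/2, m, n)  # 0.0325: the next line of the table, (27, 51)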

Fisher's Exact Test

In this post https://stats.stackexchange.com/questions/94909/course-of-action-for-2x2-tables-with-0s-in-cell-and-low-cell-counts, the OP said that s/he got a p-value of 0.5152 when conducting a Fisher's exact test on the following data:
  Control Cases
A       8     0
B      14     0
But I am getting p-value = 1 and odds ratio = 0 for these data. My R code is:
a <- matrix(c(8,14,0,0),2,2)
(res <- fisher.test(a))
Where am I making a mistake?
Good afternoon :)
https://en.wikipedia.org/wiki/Fisher%27s_exact_test
Haven't used these in a while, but I'm assuming it's your column of two 0's:
p = choose(14, 14) * choose(8, 8) / choose(22, 22)
which is 1.0. For the odds ratio, read here: https://en.wikipedia.org/wiki/Odds_ratio
The 0's end up in either the numerators or the denominators. I think this makes sense, as a column of 0's effectively means you have a group with no observations in it.
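You can check this against R directly; a quick sketch using the hypergeometric distribution that underlies Fisher's test:
a <- matrix(c(8, 14, 0, 0), 2, 2)
fisher.test(a)$p.value  # 1: the margins admit only one possible table
# equivalently, P(0 of the 8 A's fall in the empty Cases column):
dhyper(0, 8, 14, 0)     # = 1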
You get the strange p-value = 1 and OR = 0 because one or more of your counts is 0. The table should not be analysed with the chi-square statistic, which is computed cell by cell as (observed - expected)^2 / expected: here the multiplication in the expected-count formula (row total x column total / grand total) yields expected counts of 0 for the cells in the empty column.
Instead, you should use Fisher's exact test (fisher.test()), which to some extent can correct for the very low cell counts (a common rule of thumb is to use Fisher's whenever at least 20% of cells have a count of <5). Source: https://www.ncbi.nlm.nih.gov/pubmed/23894860 Using a chi-square analysis would also require Yates' continuity correction (e.g. chisq.test(matrix, correct = TRUE)).

Is there any way to calculate effect size between a pre-test and a post-test when scores on pre-test is 0 (or almost 0)

I would like to calculate an effect size between scores from pre-test and post-test of my studies.
However, due to the nature of my research, pre-test scores are usually 0 or almost 0 (before the treatment, participants usually do not have any knowledge in question).
I cannot just use Cohen's d to calculate effect sizes since the pre-test scores do not follow a normal distribution.
Is there any way I can calculate effect sizes in this case?
Any suggestions would be greatly appreciated.
You are looking for Cohen's d to see if the difference between the two time points (pre- and post-treatment) is large or small. Cohen's d can be calculated as follows:
d = (mean_post - mean_pre) / sqrt((variance_post + variance_pre) / 2)
where variance_post and variance_pre are the sample variances. Nowhere does this require that the pre- and post-treatment scores are normally distributed.
There are multiple packages available in R that provide a function for Cohen's d: effsize, pwr and lsr. In lsr your R-code would look like this:
library(lsr)
cohensD(pre_test_vector, post_test_vector)
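If you'd rather see the formula in action, a small hand computation (hypothetical vectors; with equal group sizes, lsr's pooled-SD formula agrees with the averaged-variance formula above):
pre  <- c(0, 0, 1, 0, 2, 0)   # near-zero pre-test scores
post <- c(5, 7, 6, 4, 8, 5)
(mean(post) - mean(pre)) / sqrt((var(post) + var(pre)) / 2)
cohensD(pre, post)            # same value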
Sidenote: by the Central Limit Theorem, average scores tend to a normal distribution as the sample size tends to infinity, so as long as your sample size is large enough, the average scores are approximately normally distributed.

Permutation Distribution in r

So I need to create a permutation distribution of the difference in proportions for a data set; however, I'm not sure of the best way to go about doing so.
This is the table that I need it for. I have to assess whether the difference between 2010 and 2011 is significant for "Yes".
mytable1 <- matrix(c(3648,25843,3407,26134), byrow=T, ncol=2)
dimnames(mytable1) <- list(c("2010","2011"),c("Yes","No"))
names(dimnames(mytable1)) <- c("Year","Response")
How do I code this in a for-loop?
Thank you so much!
Why use a permutation-based test if you can calculate exact probabilities? Is this a homework exercise?
fisher.test(mytable1)
Fisher's Exact Test for Count Data
data: mytable1
p-value = 0.001799
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.029882 1.138384
sample estimates:
odds ratio
1.082775
gives you the exact probability (the p-value) of seeing a ratio of "Yes" to "No" in 2010 relative to 2011 (i.e. the odds ratio) as extreme as or more extreme than what was observed. Note that the null hypothesis corresponds to an odds ratio of 1.
I assume this is what you mean by the "difference between 2010 and 2011 is significant [for Yes]". If not, please clarify and be more precise in specifying your test statistic (and null hypothesis). If it needs to be a permutation-based test, can you show us how far you have gotten?
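If it does have to be permutation-based, here is a minimal sketch of the for-loop version (shuffling the year labels and using the difference in "Yes" proportions as the test statistic; the variable names are mine):
set.seed(1)
# expand the table to one row per respondent
year <- rep(c("2010", "2011"), times = c(3648 + 25843, 3407 + 26134))
yes  <- c(rep(c(1, 0), times = c(3648, 25843)),
          rep(c(1, 0), times = c(3407, 26134)))
obs  <- mean(yes[year == "2010"]) - mean(yes[year == "2011"])
nperm <- 2000
perm <- numeric(nperm)
for (i in seq_len(nperm)) {
  shuffled <- sample(year)  # permute the year labels under H0
  perm[i] <- mean(yes[shuffled == "2010"]) - mean(yes[shuffled == "2011"])
}
mean(abs(perm) >= abs(obs))  # two-sided permutation p-value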
