I'll be running a one-tailed t-test to determine if one mean is significantly lower than another. The problem is that, when I use R's pwr package to determine what power I can expect with n=30, I get an extremely low power even for large effects. So, for example:
> pwr.t.test(d=0.8,sig.level=.05,n=30,alternative="less")
Two-sample t test power calculation
n = 30
d = 0.8
sig.level = 0.05
power = 1.251823e-06
alternative = less
NOTE: n is number in *each* group
What's even stranger is that, when I increase n, the power goes down. So, for example, upping n to 300 gives me this:
> pwr.t.test(d=0.8,sig.level=.05,n=300,alternative="less")
Two-sample t test power calculation
n = 300
d = 0.8
sig.level = 0.05
power = 0
alternative = less
NOTE: n is number in *each* group
What am I missing?
I guess it's because d and alternative = 'less' point in different 'directions'.
Try this, and you will see what I mean:
pwr.t.test(d= - 0.8,sig.level=.05,n=300,alternative="less")
Two-sample t test power calculation
n = 300
d = -0.8
sig.level = 0.05
power = 1
alternative = less
NOTE: n is number in *each* group
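As a quick check on the directionality, note that d = 0.8 with alternative = "greater" (or, equivalently, d = -0.8 with alternative = "less") gives a sensible answer; for example:
library(pwr)
# Effect size and one-sided alternative pointing in the same direction:
pwr.t.test(d = 0.8, sig.level = 0.05, n = 30, alternative = "greater")
# equivalently, expressing the effect as "group 1 lower than group 2":
pwr.t.test(d = -0.8, sig.level = 0.05, n = 30, alternative = "less")
# both report a power of roughly 0.9 for n = 30 per group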
The question:
A screening test for a disease that affects 0.05% of the male population is able to identify the disease in 90% of cases where an individual actually has the disease. The test, however, generates 1% false positives (it gives a positive reading when the individual does not have the disease). Find the probability that a man has the disease given that he has tested positive. Then, find the probability that a man has the disease given that he has a negative test.
My wrong attempt:
I first started by letting:
• T be the event that a man has a positive test
• Tc be the event that a man has a negative test
• D be the event that a man actually has the disease
• Dc be the event that a man does not have the disease
Therefore we need to find P(D|T) and P(D|Tc)
Then I wrote this code:
set.seed(110)
sims = 1000
D = rep(0, sims)
Dc = rep(0, sims)
T = rep(0, sims)
Tc = rep(0, sims)
# run the loop
for(i in 1:sims){
  # flip to see if we have the disease
  flip = runif(1)
  # if we got the disease, mark it
  if(flip <= .0005){
    D[i] = 1
  }
  # if we have the disease, we need to flip for T and Tc
  if(D[i] == 1){
    # flip for S1
    flip1 = runif(1)
    # see if we got S1
    if(flip1 < 1/9){
      T[i] = 1
    }
    # flip for S2
    flip2 = runif(1)
    # see if we got S2
    if(flip2 < 1/10){
      Tc[i] = 1
    }
  }
}
# P(D|T)
mean(D[T == 1])
# P(D|Tc)
mean(D[Tc == 1])
I'm really struggling so any help would be appreciated!
Perhaps the best way to think through a conditional probability question like this is with a concrete example.
Say we tested one million individuals in the population. Then 500 individuals (0.05% of one million) would be expected to have the disease, of whom 450 would be expected to test positive and 50 to test negative (since the false negative rate is 10%).
Conversely, 999,500 would be expected to not have the disease (one million minus the 500 who do have the disease), but since 1% of them would test positive, then we would expect 9,995 people (1% of 999,500) with false positive results.
So, given a positive test result taken at random, it either belongs to one of the 450 people with the disease who tested positive, or one of the 9,995 people without the disease who tested positive - we don't know which.
This is the situation in the first question, since we have a positive test result but don't know whether it is a true positive or a false positive. The probability of our subject having the disease given their positive test is the probability that they are one of the 450 true positives out of the 10,445 people with positive results (9995 false positives + 450 true positives). This boils down to the simple calculation 450/10,445 or 0.043, which is 4.3%.
Similarly, a negative test taken at random either belongs to one of the 989,505 (999,500 - 9,995) people without the disease who tested negative, or to one of the 50 people with the disease who tested negative, so the probability of having the disease given a negative test is 50/989,555 (50 + 989,505), or about 0.005%.
I think this question is meant to demonstrate that disease prevalence needs to be taken into account when interpreting test results, and it has very little to do with programming or R. It requires only a calculator (at most).
If you really wanted to run a simulation in R, you could do:
set.seed(1) # This makes the sample reproducible
sample_size <- 1000000 # This can be changed to get a larger or smaller sample
# Create a large sample of 1 million "people", using a 1 to denote disease and
# a 0 to denote no disease, with probabilities of 0.0005 (which is 0.05%) and
# 0.9995 (which is 99.95%) respectively.
disease <- sample(x = c(0, 1),
size = sample_size,
replace = TRUE,
prob = c(0.9995, 0.0005))
# Create an empty vector to hold the test results for each person
test <- numeric(sample_size)
# Simulate the test results of people with the disease, using a 1 to denote
# a positive test and 0 to denote a negative test. This uses a probability of
# 0.9 (which is 90%) of having a positive test and 0.1 (which is 10%) of having
# a negative test. We draw as many samples as we have people with the disease
# and put them into the "test" vector at the locations corresponding to the
# people with the disease.
test[disease == 1] <- sample(x = c(0, 1),
size = sum(disease),
replace = TRUE,
prob = c(0.1, 0.9))
# Now we do the same for people without the disease, simulating their test
# results, with a 1% probability of a positive test.
test[disease == 0] <- sample(x = c(0, 1),
size = sample_size - sum(disease),
replace = TRUE,
prob = c(0.99, 0.01))
Now that we have run our simulation, we can count the true positives, false positives, true negatives and false negatives by creating a contingency table:
contingency_table <- table(disease, test)
contingency_table
#> test
#> disease 0 1
#> 0 989566 9976
#> 1 38 420
and get the approximate probability of having the disease given a positive test like this:
contingency_table[2, 2] / sum(contingency_table[,2])
#> [1] 0.04040015
and the probability of having the disease given a negative test like this:
contingency_table[2, 1] / sum(contingency_table[,1])
#> [1] 3.83992e-05
You'll notice that the probability estimates from sampling are not that accurate because of how small some of the sampling probabilities are. You could simulate a larger sample, but it might take a while for your computer to run it.
Created on 2021-08-19 by the reprex package (v2.0.0)
To expand on Allan's answer, but relating it back to Bayes' theorem, if you prefer:
From the question, you know (converting percentages to probabilities):
P(D) = 0.0005, P(Dc) = 0.9995, P(T|D) = 0.9, P(Tc|D) = 0.1, P(T|Dc) = 0.01
Bayes' theorem gives
P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|Dc) P(Dc)]
Plugging in:
P(D|T) = (0.9)(0.0005) / [(0.9)(0.0005) + (0.01)(0.9995)] = 0.00045 / 0.010445 ≈ 0.043
and similarly
P(D|Tc) = (0.1)(0.0005) / [(0.1)(0.0005) + (0.99)(0.9995)] = 0.00005 / 0.989555 ≈ 0.00005,
matching the counting argument above.
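The same arithmetic can be checked directly in R, for example:
p_D <- 0.0005            # prevalence, 0.05%
p_T_given_D <- 0.90      # P(positive test | disease)
p_T_given_Dc <- 0.01     # P(positive test | no disease), the false positive rate
# P(D | T): probability of disease given a positive test
(p_T_given_D * p_D) / (p_T_given_D * p_D + p_T_given_Dc * (1 - p_D))
# ~0.043, i.e. about 4.3%
# P(D | Tc): probability of disease given a negative test
((1 - p_T_given_D) * p_D) / ((1 - p_T_given_D) * p_D + (1 - p_T_given_Dc) * (1 - p_D))
# ~0.00005, i.e. about 0.005%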
I am flipping each coin in a bag of 50 coins 100 times, and I want to use the method of maximum statistics to determine the family-wise error rate (FWER). However, I keep getting an FWER of 1, which feels wrong.
library(purrr)  # for map_df
library(broom)  # for tidy
coins <- rbinom(50, 100, 0.5)
So I start by defining a new function whose inputs are how many randomizations to do, the coins themselves, and how many times we flip them.
simulate_max <- function(n_number_of_randomizations, input_coins, N_number_of_tosses, alpha = 0.05) {
maxList <- NULL
Then we run a for loop for the number of randomizations we specified.
for (iteration in 1:n_number_of_randomizations){
Now we shuffle the list of coins
CoinIteration <- sample(input_coins)
Now we apply the binomial test to every coin in the bag.
testresults <- map_df(CoinIteration, function(x) tidy(binom.test(x,N_number_of_tosses,p=alpha)) )
Now we want to add the maximum result from every test to the max list.
thisRandMax <- max(testresults$statistic)
maxList <- c(maxList, thisRandMax)
}
Finally, we iterate through every member of the maximum list to subtract the expected number of heads (i.e. 50, for a 50% chance × 100 tosses).
for (iterator2 in 1:length(maxList)){
maxList[iterator2]<-maxList[iterator2]-(0.5*N_number_of_tosses)
}
Return the output from the function
return(data.frame(maxList))
}
Now we apply this simulation for each of the requested iterations.
repsmax = map_df(1:Nreps, ~simulate_max(Nrandomizations,coins,Ntosses))
Now we calculate the FWER by dividing the number of maxima that exceed the expected value by the total number of cells.
fwer = sum(repsmax>0) / (Nreps*Nrandomizations)
There are some issues that I think would be good to clarify.
A FWER of ~1 seems about right to me given the parameters of your experiment. FWER relates to Type I error: for a single test at alpha = 0.05, FWER = 1 - P(no Type I error) = 1 - 0.95 = 0.05. For two independent tests at alpha = 0.05, FWER = 1 - 0.95^2 = 0.0975. You have 50 coins (50 tests), so your FWER at alpha = 0.05 is 1 - 0.95^50 = 0.923. If your code treats the 100 tosses as 100 tests, your FWER will be 1 - 0.95^100 ≈ 0.994 (~1).
You can control for Type I error (account for multiple testing) by using e.g. the Bonferroni correction (alpha / n). If you change your alpha to 0.05 / 50 = 0.001, you will reduce your FWER to about 0.05 (1 - 0.999^50 ≈ 0.049). I suspect this is the answer you are looking for: if alpha = 0.001 then FWER ≈ 0.05 and you have an acceptably small chance of incorrectly rejecting the null hypothesis.
I don't know what the "maximum estimate of the effect size" is, or how to calculate it, but given that the two distributions are approximately identical, the effect size will be ~ 0. It then makes sense that controlling FWER to 0.05 (by adjusting alpha to 0.001) is the 'answer' to the question and if you can get your code to reflect that logic, I think you'll have your solution.
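To make the numbers concrete, here is a minimal sketch (not the original poster's code) that compares the analytic FWER with a simulation under the null hypothesis that all 50 coins are fair:
n_coins  <- 50     # number of coins = number of tests
n_tosses <- 100
alpha    <- 0.05
# Analytic FWER for independent tests
1 - (1 - alpha)^n_coins            # ~0.92 without any correction
1 - (1 - alpha / n_coins)^n_coins  # ~0.05 with a Bonferroni-adjusted alpha
# Simulation: how often does at least one fair coin give a "significant"
# binomial test in a single experiment of 50 coins?
set.seed(1)
n_sims <- 2000
fwer_hat <- mean(replicate(n_sims, {
  heads <- rbinom(n_coins, n_tosses, 0.5)
  pvals <- sapply(heads, function(x) binom.test(x, n_tosses, p = 0.5)$p.value)
  any(pvals < alpha)
}))
fwer_hat  # somewhat below the analytic 0.92, since the exact binomial test is conservative
Replacing alpha with alpha / n_coins in the any() line brings the simulated FWER down to about 0.05 or below.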
How can I write R code so that the threshold of a predictive model is automatically set to a value such that the sensitivity is a particular proportion/value for all runs of the model?
For example, given the following scenarios:
At threshold of 0.2; True positive = 20, False negative = 60 i.e. sensitivity of 0.25
At threshold of 0.35; True positive = 60, False negative = 20 i.e. sensitivity of 0.8
How do I write R code that automatically always picks the threshold for sensitivity 0.8, i.e. scenario 2 above? For context, I'm using the caret modelling framework.
These links on threshold optimization did not help much:
http://topepo.github.io/caret/using-your-own-model-in-train.html#Illustration5
Obtaining threshold values from a ROC curve
(1)
Say you have data with values and true labels. Here, 5 false and 5 true:
df <- data.frame(value = c(1,2,3,5,8,4,6,7,9,10),
truth = c(rep(0,5), rep(1,5)))
At threshold 9, 9 and 10 were detected as true positive, sensitivity = 40%
At threshold 6 (or anything between 5 and 6), (6,7,9,10) were detected, sensitivity = 80%
To see the ROC curve, you can use the pROC package
library(pROC)
roc.demo <- roc(truth ~ value, data = df)
par(pty = "s") # make it square
plot(roc.demo) # plot ROC curve
If you want percentages, do the following
roc.demo <- roc(truth ~ value, data = df, percent = T)
and replace 0.8 with 80 below.
You can get the thresholds from the roc object
roc.demo$thresholds[roc.demo$sensitivities == 0.8]
You should see that it returns 4.5 and 5.5.
If exact matching fails (e.g. because of floating point), you may also use
roc.demo$sensitivities > 0.79 & roc.demo$sensitivities < 0.81
(2)
Alternatively, if you just want a threshold and don't care about the specificity, you may try the quantile function
quantile(df$value[df$truth == 1],
probs = c(0.00, 0.10, 0.20, 0.30), type = 1) # percentile giving the closest number
probs=0.20 corresponds to 80% sensitivity
0% 10% 20% 30%
4 4 4 6
Any threshold between 4 and 6 is what you are looking for. You may change probs as you need.
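If you want to automate this choice for an arbitrary target sensitivity, a minimal base-R sketch (my own illustration, not specific to caret) using the toy df above could look like this: it scans the observed values as candidate cut-offs and returns the largest one that still reaches the target sensitivity.
pick_threshold <- function(value, truth, target_sens = 0.8) {
  # candidate cut-offs: the observed values, from largest to smallest
  candidates <- sort(unique(value), decreasing = TRUE)
  for (thr in candidates) {
    # classify as positive when value >= thr
    sens <- mean(value[truth == 1] >= thr)
    if (sens >= target_sens) return(thr)
  }
  min(value)  # fall back: classify everything as positive
}
pick_threshold(df$value, df$truth, target_sens = 0.8)
# returns 6 for the toy data: values >= 6 capture 4 of the 5 true positives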
Hopefully this helps.
tmpSD = 10
tmpSD2 = 20
power.t.test(n=10,delta = 1*tmpSD,sd = tmpSD,sig.level=0.05,power=NULL,type="two.sample", alternative = c("two.sided"))
power.t.test(n=10,delta = 1*tmpSD2,sd = tmpSD2,sig.level=0.05,power=NULL,type="two.sample", alternative = c("two.sided"))
I have the above code in my R program. Both the first and second power.t.test result in the same power of 0.5619846. Does this mean that if the ratio of delta to standard deviation stays the same, so will my power?
EDIT:
In my test, I am running a power analysis to determine the minimum n needed to have 80% power to detect a statistically significant difference in Contribution and in Time. The standard deviations of the two are different, but when running the following for loop to determine the power of each at different n's, I obtain exactly the same power at each n. I am confused as to why the power of each is identical at the same n: why should my power to detect a statistically significant difference in decision time at a particular n be equal to my power to detect a statistically significant difference in contribution amount at the same n?
sdContribution #15.39155
sdTime #22.95667
for (i in seq(20, 200, by = 10)){
  print(power.t.test(n = i, delta = 0.05*sdContribution, sd = sdContribution, sig.level = 0.05, power = NULL, type = "two.sample", alternative = c("two.sided")))
  print(power.t.test(n = i, delta = 0.05*sdTime, sd = sdTime, sig.level = 0.05, power = NULL, type = "two.sample", alternative = c("two.sided")))
}
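As a check on the intuition that only the ratio delta/sd matters (the noncentrality parameter of the two-sample test is (delta/sd) * sqrt(n/2)), scaling delta and sd by the same factor leaves the power unchanged; both loops above use delta = 0.05 * sd, so the ratio, and hence the power, is the same at every n. A minimal sketch:
# Scaling delta and sd by the same factor does not change the power:
power.t.test(n = 50, delta = 0.05 * 1,  sd = 1,  sig.level = 0.05, type = "two.sample")$power
power.t.test(n = 50, delta = 0.05 * 10, sd = 10, sig.level = 0.05, type = "two.sample")$power
# both calls return the same (small) power, because delta/sd = 0.05 in each case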
I am calculating the sample size for a proportion test. I would like to have significance level = 0.05 and power = 0.90, and the effect size I want to detect is a difference greater than 5%.
That is, I would like to get a statistically significant result if the difference in proportions is more than 5%.
But when I use pwr.2p.test function from pwr package to calculate sample size
pwr.2p.test(sig.level = 0.05, power =0.9, h=0.2, alternative="greater")
I have to specify the effect size as Cohen's d. But its range is said to be (-3, 3), and the interpretation of this is:
The meaning of effect size varies by context, but the standard interpretation offered by Cohen (1988) is (cited from here):
.8 = large (8/10 of a standard deviation unit)
.5 = moderate (1/2 of a standard deviation)
.2 = small (1/5 of a standard deviation)
My question is, how do I formulate that I'd like to detect more than a 5% difference in proportions between the 2 groups as a Cohen's d statistic?
Thanks for any help!
I used the function ES.h from the pwr package. This function calculates the effect size between two proportions. For p1 = 100% and p2 = 95%, we have:
h = ES.h(1, 0.95) = 0.4510268
I understand that this effect size expresses the distance between the hypotheses that we need to be able to detect.
I'm not very confident in my interpretation, but I used this value to determine the sample size:
pwr.p.test(h=h, sig.level = 0.05, power = 0.8)
Determining the sample size to detect up to 5 points difference in the proportions:
n = 38.58352
To detect a difference of 10 points, the required sample size decreases, because less precision is needed to detect a larger difference. With h = ES.h(1, 0.90) = 0.6435011, we get n = 18.95432.
That is my interpretation. What do you think? Am I right?
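One caveat worth noting (the 0.50 baseline below is only an assumed example): Cohen's h for a fixed 5-percentage-point difference depends on where the baseline proportion sits, so "a difference of more than 5%" does not correspond to a single value of h. For a two-group comparison you would compute h from the two proportions you actually expect and pass it to pwr.2p.test, for example:
library(pwr)
# Assumed example: 55% vs 50%, a 5-percentage-point difference near the middle
h_mid <- ES.h(0.55, 0.50)
h_mid  # about 0.10, far smaller than ES.h(1, 0.95)
pwr.2p.test(h = h_mid, sig.level = 0.05, power = 0.90, alternative = "greater")
# The same 5-point difference near the boundary, 100% vs 95%, gives a much
# larger h, and therefore a much smaller required sample size per group
ES.h(1.00, 0.95)  # about 0.45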