How can I check via simulation in R that the 95% confidence interval obtained from a binomial test with 5 successes in 15 trials has a 95% "coverage probability" in the long run when the true p = .5?
Here is the 95% CI for such a test using R (how can I show that the following CI has 95% coverage in the long run if the true p = .5?):
as.numeric(binom.test(x = 5, n = 15, p = .5)[[4]])
#> [1] 0.1182411 0.6161963
# In the long run, 95% of the time, .5 should be contained within these
# two numbers. How do I show this in R?
Something like this?
fun <- function(n = 15, p = 0.5){
  x <- rbinom(1, size = n, prob = p)   # simulate one sample of successes
  res <- binom.test(x, n, p)[[4]]      # extract the confidence interval
  c(Lower = res[1], Upper = res[2])
}
set.seed(3183)
R <- 10000
sim <- t(replicate(R, fun()))
Note that binom.test, when called with 5 successes, 15 trials, and p = 0.5, will always return the same interval, hence the call to rbinom: the number of successes will vary from replication to replication. We can then compute the proportion of cases in which p lies between Lower and Upper.
cov <- mean(sim[,1] <= .5 & .5 <= sim[,2])
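If you want a cross-check on the simulated coverage, the long-run coverage for this setting can also be computed exactly, without simulation, by enumerating every possible number of successes and weighting each Clopper-Pearson interval by its binomial probability. A minimal sketch of that idea (exact_cov is just an illustrative name):
exact_cov <- sum(sapply(0:15, function(x) {
  ci <- binom.test(x, 15, 0.5)$conf.int
  (ci[1] <= 0.5 && 0.5 <= ci[2]) * dbinom(x, 15, 0.5)
}))
exact_cov  # exact coverage probability of the 95% CI when the true p is 0.5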
I'm finding differences when trying to calculate the CI in R:
x=c(25,30,15,45,22,54)
#IC 1
install.packages("Rmisc")
library(Rmisc)
CI(x,0.95) # [16.30429 ; 47.36238]
#IC2
lclm=mean(x)-(1.96*sd(x)/sqrt(length(x))) #19.99285
Uclm=mean(x)+(1.96*sd(x)/sqrt(length(x))) #43.67382
I want to know why I don't get same intervals with the two ways.
Thank you!
Your 1.96 is an approximation of the desired quantile from the standard normal distribution, which the Student's t-distribution approaches asymptotically as the sample size tends to infinity. With your sample size of N = 6, there is a considerable difference between the standard normal and the Student's t-distribution.
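You can see the size of that difference directly by comparing the two quantiles for N = 6 (i.e. 5 degrees of freedom); the numbers below are simply the standard R output for these calls:
qnorm(0.975)       # 1.959964, the standard normal quantile behind the 1.96
qt(0.975, df = 5)  # 2.570582, the Student's t quantile that CI() actually uses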
Here is the calculation of the desired quantile as per Stéphane's comment:
library(Rmisc)
x <- c(25, 30, 15, 45, 22, 54)
#IC 1
CI(x, 0.95)
#> upper mean lower
#> 47.36238 31.83333 16.30429
#IC2
m <- mean(x)
s <- sd(x)
n <- length(x)
q <- qt(1 - 0.05 / 2, n - 1)
c(
  "upper" = m + q * s / sqrt(n),
  "mean" = m,
  "lower" = m - q * s / sqrt(n)
)
#> upper mean lower
#> 47.36238 31.83333 16.30429
Created on 2021-04-09 by the reprex package (v1.0.0)
In addition to the answers from the-mad-statter and Stéphane:
This is the function the Rmisc package uses to calculate the CI:
function (x, ci = 0.95)
{
    a <- mean(x)
    s <- sd(x)
    n <- length(x)
    error <- qt(ci + (1 - ci)/2, df = n - 1) * s/sqrt(n)
    return(c(upper = a + error, mean = a, lower = a - error))
}
You can find more detailed information here:
https://stats.stackexchange.com/questions/467015/what-is-the-confidence-interval-formula-that-rmisc-package-in-r-uses
I need to solve the following question.
Consider the following binomial probability mass function (pmf):
f(x; m, p) = (m choose x) * p^x * (1 - p)^(m - x), for x = 0, 1, 2, ..., m,
and 0 otherwise. Let X_1, X_2, ..., X_n be independent and identically distributed random samples from f(x; m = 20, p = 0.45).
1) Assume n = 15 and calculate the 95% confidence interval on p using p-hat = (Σ_{i=1}^n X_i) / (mn) (an estimator of p). Simulate these confidence intervals 10000 times and
count how often the parameter value p lies within these 10000 confidence intervals.
m <- 20
p <- 0.45
n <- 15
x <- m
nsim <- 10000
counter <- 0
for (i in 1:nsim) {
  bpmf <- rbinom(x,m,p)
  esti_p <- bpmf/(m*n)
  var_bpmf <- var(bpmf)
  CI_lower <- esti_p - qnorm(0.975)*sqrt(var_bpmf/n)
  CI_upper <- esti_p + qnorm(0.975)*sqrt(var_bpmf/n)
  if ((CI_lower<p) & (CI_upper>p)) counter <- counter + 1
}
It doesn't work properly and I don't see what I'm doing wrong. Is there anyone who can help me with this?
When I run my code, I believe the answer is now right, but it gives the following message: "There were 50 or more warnings (use warnings() to see the first 50)". Running warnings() then shows:
"1: In if ((CI_lower < p) & (CI_upper > p)) counter <- counter + ... :
the condition has length > 1 and only the first element will be used".
Also, I don't know for sure whether
CI_lower <- esti_p - qnorm(0.975)*sqrt(var_bpmf/n)
CI_upper <- esti_p + qnorm(0.975)*sqrt(var_bpmf/n)
is the right formula to calculate the confidence interval.
m <- 20
p <- 0.45
nsim <- 10000
bpmf <- rbinom(size=m,prob=p,n=nsim)           # one binomial draw per simulated interval
esti_p <- bpmf/m                               # estimate of p from each draw
var_bpmf <- esti_p*(1-esti_p)/m                # estimated variance of esti_p
CI_lower <- esti_p - qnorm(0.975)*sqrt(var_bpmf)
CI_upper <- esti_p + qnorm(0.975)*sqrt(var_bpmf)
counter <- ((CI_lower<p) & (CI_upper>p))       # TRUE when the interval covers p
table(counter)
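If you would rather report the empirical coverage as a single proportion instead of a table of counts, counter is a logical vector, so for example:
mean(counter)  # proportion of the nsim intervals that contain the true p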
I'm looking for a built-in R function that calculates the power of a one sample hypothesis test for proportions.
The built in function power.prop.test only does TWO SAMPLE hypothesis tests for proportions.
The original question is: "How many times do you have to toss a coin to determine that it is biased?"
p.null <- 0.5 # null hypothesis.
We say that a coin is "biased" if the probability of tossing heads is either
greater than 0.51 or less than 0.49. Otherwise we say that it is "good enough"
delta <- 0.01
Here is a function to toss a biased coin N times and return the proportion of heads:
biased.coin <- function(delta, N) {
  probs <- runif(N, 0, 1)
  heads <- probs[probs < 0.5 + delta]  # each toss lands heads with probability 0.5 + delta
  return(length(heads)/N)
}
We fix alpha and beta throughout at the standard values. Our goal is to calculate N.
alpha = 0.05 # 95% confidence interval
beta = 0.8 # Correctly reject the null hypothesis 80% of the time.
The first step is to use a simulation.
A single experiment is to toss the coin N times and reject the null hypothesis if the number of heads deviates "too far" from the expected value of N/2
We then repeat the experiment M times and count how many times the null hypothesis is (correctly) rejected.
M <- 1000
simulate.power <- function(delta, N, p.null, M, alpha) {
  print(paste("Calculating power for N =", N))
  reject <- c()
  se <- sqrt(p.null*(1-p.null))/sqrt(N)
  for (i in (1:M)) {
    heads <- biased.coin(delta, N)   # perform an experiment
    z <- (heads - p.null)/se         # z-score
    p.value <- pnorm(-abs(z))        # one-sided p-value
    reject[i] <- p.value < alpha/2   # Do we reject the null?
  }
  return(sum(reject)/M)              # proportion of times the null was rejected
}
Next we plot a graph (slow, about 5 minutes):
ns <- seq(1000, 50000, by=1000)
my.pwr <- c()
for (i in (1:length(ns))) {
  my.pwr[i] <- simulate.power(delta, ns[i], p.null, M, alpha)
}
plot(ns, my.pwr)
From the graph it looks like the N you need for a power of beta = 0.8 is about 20000.
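If you prefer not to read the sample size off the plot by eye, something along these lines (reusing the ns, my.pwr, and beta objects above) picks out the first N whose simulated power reaches the target; the power estimates are noisy near the threshold, so treat it as approximate:
abline(h = beta, lty = 2)       # target power line on the existing plot
ns[min(which(my.pwr >= beta))]  # first N with simulated power of at least 0.8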
The simulation is very slow so it would be nice to have a built in function.
A little fiddling around gave me this:
magic <- function(p.null, delta, alpha, N) {
  magic <- power.prop.test(p1 = p.null,
                           p2 = p.null + delta,
                           sig.level = alpha,
                           ###################################
                           n = 2*N, # mysterious 2
                           ###################################
                           alternative = "two.sided",
                           strict = FALSE)
  return(magic[["power"]])
}
Let's plot it against our simulated data.
pwr.magic <- c()
for (i in (1:length(ns))) {
  pwr.magic[i] <- magic(p.null, delta, alpha, ns[i])
}
points(ns, pwr.magic, pch=20)
The fit is good, but I have no idea why I would need to multiply N by two in order to get a one-sample power out of a two-sample proportion test.
It would be nice if there were a built in function that let you do one sample directly.
Thanks!
You could try
library(pwr)
h <- ES.h(0.51, 0.5) # Compute effect size h for two proportions
pwr.p.test(h = h, n = NULL, sig.level = 0.05, power = 0.8, alternative = "two.sided")
# proportion power calculation for binomial distribution (arcsine transformation)
# h = 0.02000133
# n = 19619.53
# sig.level = 0.05
# power = 0.8
# alternative = two.sided
As an aside, one way to speed up your simulation significantly would be to use rbinom instead of runif:
biased.coin2 <- function(delta, N) {
  rbinom(1, N, 0.5 + delta) / N
}
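Since biased.coin2 takes the same arguments and returns the same kind of value as biased.coin, it can be dropped into simulate.power unchanged. If you want to go further, the whole experiment loop can be vectorised as well; this is only a sketch under the same test logic as above (simulate.power2 is a made-up name, not part of pwr):
simulate.power2 <- function(delta, N, p.null, M, alpha) {
  heads <- rbinom(M, N, p.null + delta) / N                # M experiments at once
  z <- (heads - p.null) / sqrt(p.null * (1 - p.null) / N)  # z-scores
  mean(pnorm(-abs(z)) < alpha / 2)                         # proportion of rejections
}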
Please help me out.
I am doing Metropolis-Hastings within Gibbs to generate a Markov chain whose stationary distribution is the joint conditional distribution of (beta, phi) given the observed y, where the model for y is simple linear regression and phi is 1/sigma^2. The full conditional distribution for phi is gamma(shape = shape_0 + n/2, rate = rate_0 + 0.5*sum((y$y - b[1] - b[1]*y$x)^2)), where shape_0 and rate_0 are the parameters of the prior distribution of phi (which follows a gamma).
Here is my code:
y <- read.table("...",header = T)
n <- 50
shape_0 <- 10
rate_0 <- 25
shape <- shape_0+n/2
mcmc <- function (n = 10){
  X <- matrix(0, n, 3)
  b <- c(5, 2)
  phi <- 0.2
  X[1,] <- c(b, phi)
  count1 <- 0
  count2 <- 0
  for (i in 2:n){
    phi_new <- rnorm(1, phi, 1)  # generate new phi candidate
    rate <- rate_0 + 0.5*sum((y$y - b[1] - b[1]*y$x)^2)
    prob1 <- min(dgamma(phi_new, shape = shape, rate = rate) /
                 dgamma(phi, shape = shape, rate = rate), 1)
    ## here is where I run into trouble: dgamma(phi_new, shape = shape, rate = rate)
    ## and dgamma(phi, shape = shape, rate = rate) both give 0
    u <- runif(1)
    if (prob1 > u) {
      X[i,3] <- phi_new; count1 <- count1 + 1
    } else {
      X[i,3] <- phi
    }
    phi <- X[i,3]
    ....}
I know I should use a log transformation on the precision parameter, but I'm not exactly sure how to do it. log(dgamma(phi_new, shape = shape, rate = rate)) would return -Inf.
Thank you so much for your help.
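One way the acceptance step could be done on the log scale is with dgamma(..., log = TRUE), which returns the log-density directly and so never underflows to 0. This is only a sketch meant to slot into the loop in place of the prob1/u block above, reusing the question's own variables:
log_ratio <- dgamma(phi_new, shape = shape, rate = rate, log = TRUE) -
  dgamma(phi, shape = shape, rate = rate, log = TRUE)  # log of the acceptance ratio
if (log(runif(1)) < min(log_ratio, 0)) {
  X[i,3] <- phi_new; count1 <- count1 + 1              # accept the candidate
} else {
  X[i,3] <- phi                                        # keep the current value
}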
I have the following code to count how many times a normally distributed random number (mu =245, sd = 24.5, n = 9) goes below 200.
# This is a simulation to count the Binomial from B samples of
# size n from a Normal population with mu and sigma that fall below
# a cutoff x_0
B = 100000; mu = 245; sigma = 24.5; n = 9; x_0 = 200
# for one sample
y_count = numeric(n)
y_average = numeric(n)
x = numeric(n)
for (i in 1:n){
  x[i] = rnorm(1,mu,sigma)
  if (x[i] < x_0) y_count[i] = y_count[i] + 1
}
y_count
sum(y_count)
# for B samples and computing the estimated probability
y_count = matrix(0,B,n)
x = matrix(0,B,n)
for (j in 1:B){
  for (i in 1:n){
    x[j,i] = rnorm(1,mu,sigma)
    if (x[j,i] < x_0) y_count[j,i] = y_count[j,i] + 1
  }
}
y_count
y_count_rows = apply(y_count,1,sum)
y_count_rows
prob_est = sum(y_count_rows)/B
prob_est
I would like to be able to compute how often the average of the 9 replicates goes below 200. How can I do that by modifying this program in R?
Could this work for you? It uses the pnorm() function, which is the cumulative distribution function (CDF) of the normal distribution. It returns the probability of falling below the given value x_0 for a given mean and sigma.
mu=245
sigma = 24.5
x_0 = 200
pnorm(q=x_0, mean=mu, sd=sigma)
[1] 0.03312454
That is, about 3.31% of draws from a normal distribution with that mu and sigma will fall below that threshold.
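For the follow-up question about the average of the 9 replicates, a minimal sketch (reusing the B, n, mu, sigma, and x_0 values defined above; the analytic line relies on the fact that the mean of n independent normal draws is normal with standard deviation sigma/sqrt(n)):
x <- matrix(rnorm(B * n, mu, sigma), B, n)  # B samples of n = 9 draws each
mean(rowMeans(x) < x_0)                     # simulated proportion of sample means below x_0
pnorm(x_0, mean = mu, sd = sigma/sqrt(n))   # analytic probability for comparison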