Simulation in R using a counter using mean

I have the following code to count how many times a normally distributed random number (mu = 245, sd = 24.5, n = 9) goes below 200.
# This is a simulation to count the Binomial from B samples of
# size n from a Normal population with mu and sigma that fall below
# a cutoff x_0
B = 100000; mu = 245; sigma = 24.5; n = 9; x_0 = 200
# for one sample
y_count = numeric(n)
y_average = numeric(n)
x = numeric(n)
for (i in 1:n){
  x[i] = rnorm(1, mu, sigma)
  if (x[i] < x_0) y_count[i] = y_count[i] + 1
}
y_count
sum(y_count)
# for B samples and computing the estimated probability
y_count = matrix(0,B,n)
x = matrix(0,B,n)
for (j in 1:B){
  for (i in 1:n){
    x[j,i] = rnorm(1, mu, sigma)
    if (x[j,i] < x_0) y_count[j,i] = y_count[j,i] + 1
  }
}
y_count
y_count_rows = apply(y_count,1,sum)
y_count_rows
prob_est = sum(y_count_rows)/B
prob_est
I would like to be able to compute how often the average of the 9 replicates goes below 200. How can I do that by modifying this program in R?

Could this work for you? It uses the pnorm() function, which is the cumulative distribution function (CDF) of the normal distribution. It returns the area under the density below the given value x_0 for a given mean and sigma.
mu=245
sigma = 24.5
x_0 = 200
pnorm(q=x_0, mean=mu, sd=sigma)
[1] 0.03312454
That is, about 3.31% of draws from a normal distribution with that mu and sigma will fall below the threshold.
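To address the original question about the average of the 9 replicates, a minimal sketch (assuming B, mu, sigma, n and x_0 are defined as in the question) is to draw B samples of size n, take each sample mean, and count how often it falls below the cutoff; since the mean of n normals has standard deviation sigma/sqrt(n), the estimate can be cross-checked against pnorm.
# Estimate P(mean of n = 9 replicates < x_0) by simulation
set.seed(1)
sample_means <- replicate(B, mean(rnorm(n, mu, sigma)))
mean(sample_means < x_0)                    # estimated probability
# Exact value for comparison: the sample mean is Normal(mu, sigma/sqrt(n))
pnorm(x_0, mean = mu, sd = sigma/sqrt(n))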

Related

Estimating Bias in R

Write a simulation experiment to estimate the bias of the estimator λ̂ = 1/X̄ by sampling using x = rexp(n, rate = 5) and recording the values of 1/mean(x). You should find that the bias is λ/n−1. Here we've used λ = 5 but the result will hold for any λ.
Here is my solution (I don't get λ/n−1). Am I doing something wrong here?
set.seed(1)
lambda <- 5
x <- rexp(n= 1e5, rate = lambda )
samp.mean <- mean(x)
lam.est <- 1/samp.mean
lam.est ##4.986549
bias <- abs(lambda - lam.est)
bias ##0.01345146
To start with, there is a mistake in your formula: the bias of the lambda estimator is not lambda/n-1 but lambda/(n-1).
Then note that, in order to carry out this experiment correctly, it is not enough to compute the estimate once.
Repeat the experiment "n" times, each time drawing a sample of size "nx".
lambda = 3
nx = 150
n = 1e5
set.seed(1)
out = vector("numeric", n)
for(i in 1:n){
out[i] = 1/mean(rexp(n= nx, rate = lambda))
}
lambda/(nx-1)
mean(out)
bias = abs((mean(out)-lambda))
As you can see, for lambda = 3 and nx = 150 the expression lambda/(nx-1) is 0.02013423, and your estimated lambda is 3.019485.
lambda = 5
nx = 200
n = 1e5
set.seed(1)
out = vector("numeric", n)
for(i in 1:n){
out[i] = 1/mean(rexp(n= nx, rate = lambda))
}
lambda/(nx-1)
mean(out)
bias = abs((mean(out)-lambda))
Similarly, for lambda = 5 and nx = 200, the expression lambda/(nx-1) is 0.02512563, and your estimated lambda is 5.024315.
Perform this experiment for other values of lambda and nx and you will find that the bias of this estimator is lambda/(nx-1).
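For reference, this matches the exact calculation: the sum of nx independent Exponential(λ) variables follows a Gamma(nx, λ) distribution, and for Y ~ Gamma(a, λ) with a > 1 we have E[1/Y] = λ/(a−1). Hence E[1/X̄] = E[nx/Y] = nx·λ/(nx−1), so the bias is E[1/X̄] − λ = λ/(nx−1).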

Running rJAGS when the likelihood is a custom density

I am trying to figure out how to sample from a custom density in rJAGS but am running into issues. Having searched the site, I saw that there is a zeros (or ones) trick that can be employed based on BUGS code, but I am having a hard time with its implementation in rJAGS. I think I am doing it correctly but keep getting the following error:
Error in jags.model(model1.spec, data = list(x = x, N = N), n.chains = 4, :
Error in node dpois(lambda)
Length mismatch in Node::setValue
Here is my rJAGS code for reproducibility:
library(rjags)
set.seed(4)
N = 100
x = rexp(N, 3)
L = quantile(x, prob = 1) # Censoring point
censor = ifelse(x <= L, 1, 0) # Censoring indicator
x[censor == 1] <- L
model1.string <-"
model {
  for (i in 1:N){
    x[i] ~ dpois(lambda)
    lambda <- -N*log(1-exp(-(1/mu)))
  }
  mu ~ dlnorm(mup, taup)
  mup <- log(.0001)
  taup <- 1/49
  R <- 1 - exp(-(1/mu) * .0001)
}
"
model1.spec<-textConnection(model1.string)
jags <- jags.model(model1.spec,
data = list('x' = x,
'N' = N),
n.chains=4,
n.adapt=100)
Here, my negative log likelihood of the density I am interested in is -N*log(1-exp(-(1/mu))). Is there an obvious mistake in the code?
Using the zeros trick, the variable on the left-hand side of the dpois() relationship has to be an N-length vector of zeros. The variable x should show up in the likelihood somewhere. Here is an example using the normal distribution.
set.seed(519)
N <- 100
x <- rnorm(100, mean=3)
z <- rep(0, N)
C <- 10
pi <- pi
model1.string <-"
model {
  for (i in 1:N){
    lambda[i] <- pow(2*pi*sig2, -0.5) * exp(-.5*pow(x[i]-mu, 2)/sig2)
    loglam[i] <- log(lambda[i]) + C
    z[i] ~ dpois(loglam[i])
  }
  mu ~ dnorm(0,.1)
  tau ~ dgamma(1,.1)
  sig2 <- pow(tau, -1)
  sumLL <- sum(log(lambda[]))
}
"
model1.spec<-textConnection(model1.string)
set.seed(519)
jags <- jags.model(model1.spec,
data = list('x' = x,
'z' = z,
'N' = N,
'C' = C,
'pi' = pi),
inits = function()list(tau = 1, mu = 3),
n.chains=4,
n.adapt=100)
samps1 <- coda.samples(jags, c("mu", "sig2"), n.iter=1000)
summary(samps1)
Iterations = 101:1100
Thinning interval = 1
Number of chains = 4
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

       Mean     SD Naive SE Time-series SE
mu    4.493 2.1566 0.034100         0.1821
sig2  1.490 0.5635 0.008909         0.1144

2. Quantiles for each variable:

       2.5%   25%   50%   75% 97.5%
mu   0.6709 3.541 5.218 5.993 7.197
sig2 0.7909 0.999 1.357 1.850 2.779
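As a side note, the zeros trick is usually written with the Poisson mean set to minus the log-likelihood plus a large constant, so that the Poisson probability of a zero, exp(-mean), is proportional to the likelihood. A sketch of a model block in that form (same data list x, z, N, C, pi and the same priors as above; model2.string is just an illustrative name, not run here) might look like this:
model2.string <- "
model {
  for (i in 1:N){
    # log-likelihood of x[i] under Normal(mu, sig2)
    loglik[i] <- -0.5*log(2*pi*sig2) - 0.5*pow(x[i]-mu, 2)/sig2
    phi[i] <- -loglik[i] + C   # C keeps the Poisson mean positive
    z[i] ~ dpois(phi[i])
  }
  mu ~ dnorm(0, .1)
  tau ~ dgamma(1, .1)
  sig2 <- pow(tau, -1)
}
"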

Fit Variance gamma to data

I am using RStudio to estimate parameters for data under the Variance Gamma distribution. I want to fit the Variance Gamma distribution to this data and find estimates of its parameters. The code I have is
x<-c(1291,849,238,140,118,108,87,70,63,58,50,47,21,21,19)
library(VarianceGamma)
init<-c(0,0.5,0,0.5)
vgFit(x, freq = NULL, breaks = NULL, paramStart = init, startMethod = "Nelder-Mead", startValues = "SL", method = "Nelder-Mead", hessian = FALSE, plots = TRUE)
The error I got was:
Error in optim(paramStart, llsklp, NULL, method = startMethodSL, hessian = FALSE, :
function cannot be evaluated at initial parameters
I am not sure what the issue is.
The error might suggest divergence. Based on your previous questions, I'm guessing that x contains raw stock values, so a log transformation may be necessary before modelling the change per time unit (e.g. daily returns).
x <- c(1291,849,238,140,118,108,87,70,63,58,50,47,21,21,19)
dx <- log(x)[2:length(x)] - log(x)[1:(length(x)-1)]
vgFit(dx)
#Parameter estimates:
# vgC sigma theta nu
# 0.16887 0.03128 -0.47164 0.27558
We may want to compare with simulated data. I implemented two methods, and they seem equivalent for a large number of observations nt.
Method 2 builds each increment directly from the gamma time-change representation of the VG process: a gamma-distributed time increment, a theta drift over that gamma time, plus a Brownian term.
#Simulating VG as a time-fixed Brownian Motion
set.seed(1)
nt = 15 #number of observations
T = nt - 1 #total time
dt = rep(T/(nt-1), nt-1) #fixed time increments
r = 1 + 0.16887 #interest rate
vgC = (r-1)
sigma = 0.03128
theta = -0.47164
nu = 0.27558
V_ = rep(NA,nt) #Simulations for log stock value
V_[1] = 7.163172 #log(x[1])
V2_ = V_ #alternative simulation method
for(i in 2:nt){
  # method 1: by VarianceGamma package
  V_[i] <- V_[i-1] + rvg(1, vgC=vgC*dt[i-1], sigma=sigma, theta=theta, nu=nu)
  # method 2: by R built-in packages
  gamma_i <- rgamma(1, shape=dt[i-1]/nu, scale=nu)
  normal <- rnorm(1, mean=0, sd=sigma*sqrt(gamma_i))
  V2_[i] <- V2_[i-1] + vgC*dt[i-1] + theta*gamma_i + normal
}
# Visual comparison
x11(width=4,height=4)
plot(x, xlab='Time',ylab='Stock value',type='l')
lines(exp(V_), col='red')
lines(exp(V2_), col='blue')
legend('topright',legend=c('Observed','Method1','Method2'),fill=c('black','red','blue'))
The resulting parameters suggest unstable estimations due to small sample size nt:
#The real parameters:
c(vgC*dt[1], sigma, theta, nu)
# vgC sigma theta nu
# 0.16887 0.03128 -0.47164 0.27558
#Parameter estimates for 1st data set:
dV = V_[2:nt] - V_[1:(nt-1)]
vgFit(dV)
# vgC sigma theta nu
#-0.9851 0.3480 1.2382 2.0000
#Parameter estimates for 2nd data set:
dV2 = V2_[2:nt] - V2_[1:(nt-1)]
vgFit(dV2)
# vgC sigma theta nu
#-0.78033 0.07641 0.52414 0.11840
In addition, the rvg function assumes fixed time increments. We can relax that assumption by following @Louis Marascio's answer and using a log-likelihood approach.
#Simulating VG as a time-changed Brownian Motion
set.seed(1)
nt = 100 #Increase the number of observations!
T = nt-1
dt = runif(nt-1) #random time increments
dt = dt/sum(dt)*T
r = 1 + 0.16887
vgC = (r-1)
sigma = 0.03128
theta = -0.47164
nu = 0.27558
V_ = rep(NA,nt) #simulations for log stock value
V_[1] = 7.163172
for(i in 2:nt){
  V_[i] <- V_[i-1] + rvg(1, vgC=vgC*dt[i-1], sigma=sigma, theta=theta, nu=nu)
}
dV = V_[2:nt] - V_[1:(nt-1)]
# -log-likelihood function with different time increments
ll = function(par){
  if(par[2] > 0 & par[4] > 0){
    tem = 0
    for (i in 1:length(dV)){
      tem = tem - log(dvg(dV[i], vgC = par[1]*dt[i], sigma = par[2], theta = par[3], nu = par[4]))
    }
    return(tem)
  } else {
    return(Inf)
  }
}
Indeed, the results show better estimation by relaxing the fixed time assumption:
#The real parameters:
c(vgC, sigma, theta, nu)
# vgC sigma theta nu
# 0.16887 0.03128 -0.47164 0.27558
#Assuming fixed time increments
vgFit(dV)$param*c(1/mean(dt),1,1,1)
# vgC sigma theta nu
#-0.2445969 0.3299023 -0.0696895 1.5623556
#Assuming different time increments
optim(vgFit(dV)$param*c(1/mean(dt),1,1,1),ll,
method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent")[5])
# vgC sigma theta nu
# 0.16503125 0.03241617 -0.50193694 0.28221985
Finally, the confidence intervals for the estimated parameters may be obtained by multiple simulations:
set.seed(1)
out = NULL
for (j in 1:100){  # 100 simulations
  V_ = rep(NA, nt)
  V_[1] = 7.163172
  for(i in 2:nt){
    V_[i] <- V_[i-1] + rvg(1, vgC=vgC*dt[i-1], sigma=sigma, theta=theta, nu=nu)
  }
  dV = V_[2:nt] - V_[1:(nt-1)]
  # to skip divergence
  tem <- try(vgFit(dV)$param)
  if (inherits(tem, "try-error")) next
  out = rbind(out, tem)
}
apply(out,2,mean)
# vgC sigma theta nu
#-0.8735168 0.1652970 0.4737270 0.9821458
apply(out,2,sd)
# vgC sigma theta nu
#2.8935938 0.3092993 2.6833866 1.3161695
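To turn these simulations into interval estimates rather than just a mean and standard deviation, one option (a sketch assuming the out matrix built in the loop above) is to take percentile intervals across the simulated fits:
# 2.5% and 97.5% percentiles of each fitted parameter across the 100 simulations
apply(out, 2, quantile, probs = c(0.025, 0.975))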

Confidence Interval (CI) simulation in R: How?

I was wondering how I could check via simulation in R that the 95% confidence interval obtained from a binomial test with 5 successes in 15 trials, when the TRUE p = .5, has a 95% "coverage probability" in the long run.
Here is the 95% CI for such a test using R (how can I show that the following CI has 95% coverage in the long run if the TRUE p = .5?):
as.numeric(binom.test(x = 5, n = 15, p = .5)[[4]])
# > [1] 0.1182411 0.6161963 (in the long-run 95% of the time, ".5" is contained within these
# two numbers, how to show this in R?)
Something like this?
fun <- function(n = 15, p = 0.5){
  x <- rbinom(1, size = n, prob = p)
  res <- binom.test(x, n, p)[[4]]
  c(Lower = res[1], Upper = res[2])
}
set.seed(3183)
R <- 10000
sim <- t(replicate(R, fun()))
Note that binom.test, when called with 5 successes, 15 trials and p = 0.5, will always return the same interval, hence the call to rbinom: the number of successes varies from replication to replication. We can then compute the proportion of cases in which p lies between Lower and Upper.
cov <- mean(sim[,1] <= .5 & .5 <= sim[,2])
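The simulated coverage can also be cross-checked against the exact coverage, obtained by summing the binomial probabilities of the outcomes whose interval contains 0.5. A small sketch (binom.test returns the conservative Clopper-Pearson interval, so the coverage should come out at or above 0.95):
# Exact coverage: weight each possible outcome k = 0..15 by its binomial
# probability and keep those whose Clopper-Pearson interval contains p = 0.5
k <- 0:15
contains <- sapply(k, function(s){
  ci <- binom.test(s, n = 15, p = 0.5)$conf.int
  ci[1] <= 0.5 & 0.5 <= ci[2]
})
sum(dbinom(k, size = 15, prob = 0.5)[contains])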

Bayesian simple linear regression Gibbs Sampling with gamma prior

Please help me out.
I am doing Metropolis-Hastings within Gibbs to generate a Markov chain whose stationary distribution is the joint conditional distribution of (beta, phi) given the observed y, where the model for y is simple linear regression and phi is 1/sigma^2. The full conditional distribution for phi is gamma(shape = shape_0 + n/2, rate = rate_0 + 0.5*sum((y$y - b[1] - b[1]*y$x)^2)), where shape_0 and rate_0 are the parameters of the gamma prior on phi.
Here is my code:
y <- read.table("...",header = T)
n <- 50
shape_0 <- 10
rate_0 <- 25
shape <- shape_0+n/2
mcmc <- function (n = 10){
  X <- matrix(0, n, 3)
  b <- c(5, 2)
  phi <- 0.2
  X[1,] <- c(b, phi)
  count1 <- 0
  count2 <- 0
  for (i in 2:n){
    phi_new <- rnorm(1, phi, 1)  # generate new phi candidate
    rate <- rate_0 + 0.5*sum((y$y - b[1] - b[1]*y$x)^2)
    prob1 <- min(dgamma(phi_new, shape = shape, rate = rate)/
                 dgamma(phi, shape = shape, rate = rate), 1)
    ## here is where I run into trouble: dgamma(phi_new, shape = shape, rate = rate)
    ## and dgamma(phi, shape = shape, rate = rate) both give 0
    u <- runif(1)
    if (prob1 > u){
      X[i,3] <- phi_new; count1 = count1 + 1
    } else {
      X[i,3] <- phi
    }
    phi <- X[i,3]
    ....}
I know I should use a log transformation for the precision parameter, but I'm not exactly sure how to do it; log(dgamma(phi_new, shape = shape, rate = rate)) would return -Inf.
Thank you so much for your help.
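A minimal sketch of how the Metropolis step could be done on the log scale (assuming phi, shape and rate as in the code above): dgamma() has a log = TRUE argument that returns the log density directly, which avoids the underflow to 0, and the acceptance test then compares log densities.
# Hypothetical log-scale acceptance step (assumes phi, shape, rate defined as above)
phi_new <- rnorm(1, phi, 1)                      # propose a new phi
if (phi_new > 0) {                               # the gamma density is 0 for phi_new <= 0
  log_ratio <- dgamma(phi_new, shape = shape, rate = rate, log = TRUE) -
               dgamma(phi,     shape = shape, rate = rate, log = TRUE)
  if (log(runif(1)) < log_ratio) phi <- phi_new  # accept with probability min(1, ratio)
}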

Resources