Estimating Bias in R

Write a simulation experiment to estimate the bias of the estimator λ̂ = 1/X̄ by sampling using x = rexp(n, rate = 5) and recording the values of 1/mean(x). You should find that the bias is λ/n−1. Here we've used λ = 5, but the result will hold for any λ.
Here is my solution (I don't get λ/n−1). Am I doing something wrong here?
set.seed(1)
lambda <- 5
x <- rexp(n = 1e5, rate = lambda)
samp.mean <- mean(x)
lam.est <- 1/samp.mean
lam.est ## 4.986549
bias <- abs(lambda - lam.est)
bias ## 0.01345146

To start with, there is a mistake in your formula: the bias of the lambda estimator is not lambda/n − 1 but lambda/(n − 1).
Also note that to carry out this experiment correctly, it is not enough to compute the estimate once. Repeat the experiment n times, each time on a fresh sample of size nx.
lambda <- 3
nx <- 150  # sample size per replicate
n <- 1e5   # number of replicates
set.seed(1)
out <- vector("numeric", n)
for (i in 1:n) {
  out[i] <- 1/mean(rexp(n = nx, rate = lambda))
}
lambda/(nx - 1)  # theoretical bias
mean(out)        # Monte Carlo mean of the estimator
bias <- abs(mean(out) - lambda)
As you can see, for lambda = 3 and nx = 150 the expression lambda/(nx - 1) is 0.02013423, and the estimated lambda is 3.019485.
lambda <- 5
nx <- 200
n <- 1e5
set.seed(1)
out <- vector("numeric", n)
for (i in 1:n) {
  out[i] <- 1/mean(rexp(n = nx, rate = lambda))
}
lambda/(nx - 1)  # theoretical bias
mean(out)        # Monte Carlo mean of the estimator
bias <- abs(mean(out) - lambda)
Similarly, for lambda = 5 and nx = 200 the expression lambda/(nx - 1) is 0.02512563, and the estimated lambda is 5.024315.
Perform this experiment for other values of lambda and nx and you will find that the bias of this estimator is lambda/(nx − 1), i.e. λ/(n − 1) for a sample of size n.
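For completeness, here is why this holds exactly: if X_1, ..., X_n are iid Exp(λ), then S = Σ X_i ~ Gamma(n, λ), and for a Gamma(α, β) variable E[1/S] = β/(α − 1). Hence E[λ̂] = E[n/S] = nλ/(n − 1), so Bias(λ̂) = nλ/(n − 1) − λ = λ/(n − 1).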

Related

Better optimizer for constrained multinomial likelihood

Using R, I wish to estimate a vector of parameters a_i (of arbitrary length, i.e. i = 1, ..., s) with a multinomial likelihood, using a corresponding vector of observations n_i totaling a sample size of N = sum_i n_i. The probabilities p_i of the multinomial are determined by the a parameters and measurements of a variable x, such that p_i = (a_i * x_i) / sum_i (a_i * x_i). I further wish to impose the constraint that sum_i a_i = 1.
I've managed to get optim() to do the job as follows, implementing the two tricks I've seen: estimating the first parameter as a_1 = 1 − sum_{i=2} a_i, and renormalizing all estimates to sum to 1. But the accuracy and dependability of convergence remain rather variable (in addition to being sensitive to the vector of starting estimates I provide), even when N is very large.
I would appreciate guidance on more robust alternatives and/or improvements.
s <- 10    # vector length
N <- 1000  # total sample size
# variable
x_i <- round(rlnorm(s, 2.5, 1.5))
# true parameter values
a_i <- rbeta(s, 2, 2)
a_i <- a_i / sum(a_i)
# generate observations
n_i <- rmultinom(1, N, (a_i * x_i) / sum(a_i * x_i))
# negative log-likelihood for parameters `par`
nll <- function(par) {
  if (any(par < 0 | par > 1)) {
    return(NA)
  } else {
    par <- c(1 - sum(par), par)          # estimate first as remainder
    par <- par / sum(par)                # normalize
    p_i <- (par * x_i) / sum(par * x_i)  # model for probabilities
    -dmultinom(x = n_i, size = N, prob = p_i, log = TRUE)
  }
}
# starting values (dropping first)
start <- rep(1/s, s - 1)
fit <- optim(par = start, fn = nll, control = list(maxit = 10000))
ests <- c(1 - sum(fit$par), fit$par)
cbind(a_i, ests)
par(pty = "s")
plot(a_i, ests)
abline(0, 1)
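One more robust alternative (my sketch, not from the original post) is to remove the simplex constraint entirely with a softmax reparameterization: optimize unconstrained values z_2, ..., z_s with z_1 fixed at 0, and map them to a_i = exp(z_i) / sum_j exp(z_j). The constraints a_i > 0 and sum_i a_i = 1 then hold by construction, so the bounds check and the renormalization trick are no longer needed.
# Minimal sketch, assuming x_i, n_i, N, and s from the snippet above.
nll_softmax <- function(z) {
  a <- exp(c(0, z))  # fix z_1 = 0 for identifiability
  a <- a / sum(a)    # softmax: a_i > 0 and sum(a_i) = 1 by construction
  p_i <- (a * x_i) / sum(a * x_i)
  -dmultinom(x = n_i, size = N, prob = p_i, log = TRUE)
}
fit2 <- optim(par = rep(0, s - 1), fn = nll_softmax,
              method = "BFGS", control = list(maxit = 10000))
a_hat <- exp(c(0, fit2$par))
a_hat <- a_hat / sum(a_hat)
cbind(a_i, a_hat)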

Maximum Likelihood Estimation - Choosing between nlm and nloptr

I am attempting to estimate three parameters by minimizing a negative log-likelihood function in R. I have tried two different commands, nlm and nloptr, and I get different results from each. Is it normal for them to differ, and if so, which command should I use for this specific problem? Another issue is that I need to estimate the standard errors of the parameters and have no idea how to do this.
Here is my code. The wage, unemployment_duration and employed vectors are all vectors of pre-existing data:
####################### Guessing initial values ###################
mu_guess <- 10
sigma_w_guess <- 3.61
lambda_guess <- 5
# Vector of inputs
initial_guess <- c(mu_guess, sigma_w_guess, lambda_guess)
# Lower and upper bounds
lower_bound <- c(0, 0, 0)
upper_bound <- c(Inf, Inf, Inf)
####################### Specification 1 ###################################
negative_log_likelihood <- function(x) {
  # Parameters to estimate
  mu <- x[1]
  sigma_w <- x[2]
  lambda <- x[3]
  # Rate of leaving unemployment
  hu <- lambda * (1 - plnorm(reservation_wage, meanlog = mu,
                             sdlog = sigma_w, lower.tail = TRUE,
                             log.p = FALSE))
  # Predefining vectors
  fu <- matrix(0, 1458, 1)
  fu_hat <- matrix(0, 1458, 1)
  fa <- matrix(0, 1458, 1)
  L <- matrix(0, 1458, 1)
  for (i in 1:1458) {
    # Probability of a completed spell (vector)
    fu[i, 1] <- hu * exp(-hu * unemployment_duration[i, 1])
    # Probability of an ongoing spell (vector)
    fu_hat[i, 1] <- exp(-hu * unemployment_duration[i, 1])
    # Distribution of wage accounting for truncation at reservation wage
    fa[i, 1] <- dlnorm(wage[i, 1], meanlog = mu, sdlog = sigma_w,
                       log = FALSE) / (1 - plnorm(reservation_wage,
                                                  meanlog = mu, sdlog = sigma_w,
                                                  lower.tail = TRUE, log.p = FALSE))
    # Log-likelihood of an observation
    if (employed[i, 1] == 1) {
      L[i, 1] <- log(fu[i, 1] * fa[i, 1])
    } else {
      L[i, 1] <- log(fu_hat[i, 1])
    }
  }
  # Negative log-likelihood
  neg_log_lik <- -sum(L)
  return(neg_log_lik)
}
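On the standard errors: one standard approach (a sketch, assuming negative_log_likelihood and initial_guess as defined above) is to request the Hessian at the optimum. Because the objective is a negative log-likelihood, its Hessian at the minimum is the observed information matrix, and the square roots of the diagonal of its inverse give the asymptotic standard errors.
fit <- nlm(negative_log_likelihood, p = initial_guess, hessian = TRUE)
estimates <- fit$estimate
vcov_hat <- solve(fit$hessian)      # asymptotic covariance of the MLEs
std_errors <- sqrt(diag(vcov_hat))
cbind(estimates, std_errors)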

How to calculate the joint probability of more than two random variables in R

I have written the following code, which reads the data (with four variables) from a file and calculates the mean vector and covariance matrix of a multivariate normal distribution. The parameters mu and sigma are computed by maximum likelihood (MLE).
library(mvtnorm)
df <- read.csv("../data/dataset.train", header = FALSE)
mu <- colMeans(df)
sigma <- matrix(0, nrow = ncol(df), ncol = ncol(df))
for (row in 1:nrow(df)) {
  temp <- df[row, ] - mu
  sigma <- sigma + as.matrix(t(temp)) %*% as.matrix(temp)
}
sigma <- sigma / nrow(df)  # MLE divides by n rather than n - 1
Now I want to find the probability $P(a_1 < X_1 < a_2, X_2 = b_1, X_3 = c_1, d_1 < X_4 < d_2)$. How can I compute this probability?
My effort: I have tried using
pmvnorm(lower = -Inf, upper = Inf, mean = rep(0, length(lower)), corr = NULL, sigma = NULL, algorithm = GenzBretz(), ...)
by setting lower = c(a1, b1, c1, d1) and upper = c(a2, b1, c1, d2), but it gives the error "lower == upper". I think this is because the second and third values in lower and upper are the same.
I don't know how to compute this in R. Please help me. Thanks in advance.
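For what it's worth: a point constraint such as X_2 = b_1 has probability zero under a continuous distribution, which is exactly why pmvnorm() rejects lower == upper. A meaningful quantity instead is the mixed density-probability f(b_1, c_1) * P(a_1 < X_1 < a_2, d_1 < X_4 < d_2 | X_2 = b_1, X_3 = c_1), computed from the conditional normal distribution of (X_1, X_4) given (X_2, X_3). A minimal sketch, assuming mu and sigma from the code above and hypothetical bounds a1, a2, b1, c1, d1, d2:
# Condition on (X2, X3); integrate over (X1, X4).
cond_idx <- c(2, 3); free_idx <- c(1, 4)
mu_f <- mu[free_idx]; mu_c <- mu[cond_idx]
S_ff <- sigma[free_idx, free_idx]
S_fc <- sigma[free_idx, cond_idx]
S_cc <- sigma[cond_idx, cond_idx]
b <- c(b1, c1)  # hypothetical conditioning values
# Conditional mean and covariance of (X1, X4) given X2 = b1, X3 = c1
mu_cond <- as.vector(mu_f + S_fc %*% solve(S_cc, b - mu_c))
S_cond <- S_ff - S_fc %*% solve(S_cc, t(S_fc))
dens <- dmvnorm(b, mean = mu_c, sigma = S_cc)  # density at the point constraint
prob <- pmvnorm(lower = c(a1, d1), upper = c(a2, d2),
                mean = mu_cond, sigma = S_cond)
dens * prob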

Bayesian simple linear regression Gibbs Sampling with gamma prior

Please help me out.
I am doing Metropolis-Hastings within Gibbs to generate a Markov chain whose stationary distribution equals the joint conditional distribution of (beta, phi) given the observed y, where the model for y is simple linear regression and phi = 1/sigma^2. The full conditional distribution of phi is gamma(shape = shape_0 + n/2, rate = rate_0 + 0.5 * sum((y$y - b[1] - b[2] * y$x)^2)), where shape_0 and rate_0 come from the gamma prior on phi.
Here is my code:
y <- read.table("...", header = TRUE)
n <- 50
shape_0 <- 10
rate_0 <- 25
shape <- shape_0 + n/2
mcmc <- function(n = 10) {
  X <- matrix(0, n, 3)
  b <- c(5, 2)
  phi <- 0.2
  X[1, ] <- c(b, phi)
  count1 <- 0
  count2 <- 0
  for (i in 2:n) {
    phi_new <- rnorm(1, phi, 1)  # generate new phi candidate
    rate <- rate_0 + 0.5 * sum((y$y - b[1] - b[2] * y$x)^2)
    prob1 <- min(dgamma(phi_new, shape = shape, rate = rate) /
                   dgamma(phi, shape = shape, rate = rate), 1)
    ## Here is where I run into trouble: dgamma(phi_new, shape = shape,
    ## rate = rate) and dgamma(phi, shape = shape, rate = rate) both give 0.
    u <- runif(1)
    if (prob1 > u) {
      X[i, 3] <- phi_new
      count1 <- count1 + 1
    } else {
      X[i, 3] <- phi
    }
    phi <- X[i, 3]
    ....}
I know I should work on the log scale for the precision parameter, but I'm not exactly sure how to do it: log(dgamma(phi_new, shape = shape, rate = rate)) just returns -Inf.
Thank you so much for help.
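A minimal sketch of the log-scale acceptance step (my suggestion, reusing shape, rate, phi, and phi_new from the loop above): pass log = TRUE to dgamma() so the log-density is computed directly instead of taking log() of an underflowed value, and compare the log-ratio against log(runif(1)).
# Drop-in replacement for the acceptance step inside the loop.
log_ratio <- dgamma(phi_new, shape = shape, rate = rate, log = TRUE) -
  dgamma(phi, shape = shape, rate = rate, log = TRUE)
# Guard against negative proposals, which have zero density under the gamma.
if (phi_new > 0 && log(runif(1)) < log_ratio) {
  X[i, 3] <- phi_new
  count1 <- count1 + 1
} else {
  X[i, 3] <- phi
}
phi <- X[i, 3]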

Simulation in R using a counter using mean

I have the following code to count how many times a normally distributed random number (mu = 245, sd = 24.5, n = 9) goes below 200.
# This is a simulation to count the Binomial from B samples of
# size n from a Normal population with mu and sigma that fall below
# a cutoff x_0
B <- 100000; mu <- 245; sigma <- 24.5; n <- 9; x_0 <- 200
# for one sample
y_count <- numeric(n)
y_average <- numeric(n)
x <- numeric(n)
for (i in 1:n) {
  x[i] <- rnorm(1, mu, sigma)
  if (x[i] < x_0) y_count[i] <- y_count[i] + 1
}
y_count
sum(y_count)
# for B samples and computing the estimated probability
y_count <- matrix(0, B, n)
x <- matrix(0, B, n)
for (j in 1:B) {
  for (i in 1:n) {
    x[j, i] <- rnorm(1, mu, sigma)
    if (x[j, i] < x_0) y_count[j, i] <- y_count[j, i] + 1
  }
}
y_count
y_count_rows <- apply(y_count, 1, sum)
y_count_rows
prob_est <- sum(y_count_rows) / B
prob_est
I would like to be able to compute how often the average of the 9 replicates goes below 200. How can I do that by modifying this program in R?
Could this work for you? It uses the pnorm() function, which is the cumulative distribution function (CDF) of the normal distribution. It returns the probability below a given value x_0 for a given mean and sigma.
mu <- 245
sigma <- 24.5
x_0 <- 200
pnorm(q = x_0, mean = mu, sd = sigma)
## [1] 0.03312454
Namely, it states that about 3.31% of draws from a normal distribution with that mu and sigma will fall below that threshold.
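For the follow-up about the average of the 9 replicates, here is a minimal sketch (assuming the B, n, mu, sigma, and x_0 values defined above). The mean of n normal draws is itself normal with standard deviation sigma/sqrt(n), so the closed form is the same pnorm() call with a rescaled sd; the simulation version compares each row's mean to the cutoff.
# Closed form: P(X-bar < x_0) with X-bar ~ Normal(mu, sigma/sqrt(n))
pnorm(q = x_0, mean = mu, sd = sigma / sqrt(n))
# Simulation: B rows of n draws each; how often is the row mean below x_0?
x <- matrix(rnorm(B * n, mu, sigma), nrow = B, ncol = n)
mean(rowMeans(x) < x_0)  # Monte Carlo estimate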
