How to simulate a probability event function in R?

I have an event that follows the code below (see previous question), which outputs the total number of successes from n binomial trials.
event <- function(n, size = 1, prob = 1/100){
  trials <- rbinom(n = n, size = size, prob = prob)
  sum(trials)
}
event(1000)
where event(n) returns how many times the event happened.
Now I want to simulate this function (with n = 1000) 300000 times and record how many successes each run produces (so not a single call with n = 300000, but the values the function above returns when it is repeated 300000 times).

Original function:
successes <- function(n, size = 1, prob = 0.01){
  trials <- rbinom(n = n, size = size, prob = prob)
  sum(trials)
}
Use the replicate function:
results <- replicate(n = 300000, successes(1000, prob = .1), simplify = TRUE)
which returns a vector of 300000 values, one total per run of the function.
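For example, a minimal sketch of summarizing those simulated counts (assuming the results vector created above):
summary(results)   # average and spread of successes per run
range(results)     # smallest and largest counts seen across the 300000 runs
hist(results)      # rough picture of the sampling distribution of the count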

Creating a matrix with random entries with given probabilities in R

I want to create a 100x100 matrix A with entry a_ij being randomly selected from the set {0,1} with P(a_ij=1)=0.2 and P(a_ij=0)=0.8.
This is what I’ve tried so far:
n <- 100
matrix <- matrix(0, 100, 100)
mynumbers <- c(1, 0)
myprobs <- c(0.2, 0.8)
for(i in 1:100){
  for(j in 1:100){
    matrix[i, j] <- sample(mynumbers, 1, replace = TRUE, prob = myprobs)
  }
}
matrix
I’m not sure about the sample size being 1, but this way only seems to work if I choose size=1... Is this the correct way to do it? Thank you in advance!
As @akrun noted, there are much easier ways. A 100 x 100 matrix means 10,000 entries. prob = .2 is the success probability, i.e. P(a_ij = 1) = 0.2, and size = 1 means one Bernoulli trial per entry. The matrix() parameters should be pretty self-evident.
set.seed(2020)
trials <- rbinom(n = 10000, size = 1, prob = .2)
my.matrix <- matrix(trials, nrow = 100, ncol = 100)
Or, to more closely resemble your code:
n <- 10000
mynumbers <- c(1, 0)
myprobs <- c(0.2, 0.8)
trials2 <- sample(x = mynumbers,
                  size = n,
                  replace = TRUE,
                  prob = myprobs)
my.matrix2 <- matrix(trials2, nrow = 100, ncol = 100)
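Either way, a quick sanity check (assuming the objects created above) is that the proportion of 1s should be close to 0.2:
mean(my.matrix)    # proportion of 1s from the rbinom() version
mean(my.matrix2)   # proportion of 1s from the sample() version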

Function that will generate iter samples of size n from a gamma distribution with shape parameter alpha and rate parameter beta

The function needs to return the mean and standard deviation of each sample.
This is what I have:
sample_gamma <- function(alpha, beta, n, iter) {
  mean = alpha/beta
  var = alpha/(beta)^2
  sd = sqrt(var)
  gamma = rgamma(n, shape = alpha, scale = 1/beta)
  sample_gamma = data.frame(mean = replicate(n = iter, expr = mean))
}
I'm very lost for this. I also need to create a data frame for this function.
Thank you for your time.
Edit:
sample_gamma <- function(alpha, beta, n, iter) {
  output <- rgamma(iter, alpha, 1/beta)
  output_1 <- matrix(output, ncol = iter)
  means <- apply(output_1, 2, mean)
  sds <- apply(output_1, 2, sd)
  mystats <- data.frame(means, sds)
  return(mystats)
}
This works except for the sds. It's returning NAs.
It's not really clear to me what you want. (Note that your edit returns NA for the sds because rgamma(iter, ...) draws only iter values in total, so each column of your matrix holds a single number, and sd() of a single value is NA.) But say you want to create 10 samples of size 1000, with alpha = 1 and beta = 2. Then you can create a single stream of rgamma realizations, shape them into a matrix, get your stats with apply, and finally create a data frame from those vectors:
output <- rgamma(10*1000, 1, 1/2)
output <- matrix(output, ncol = 10)
means <- apply(output, 2, mean)
sds <- apply(output, 2, sd)
mystats <- data.frame(means, sds)
You could wrap your function around that code, replacing the hard values with parameters.
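For instance, a minimal sketch of such a wrapper (parameter names follow the question; beta is passed as the rate argument here, so adjust if you intended the scale):
sample_gamma <- function(alpha, beta, n, iter) {
  draws <- rgamma(iter * n, shape = alpha, rate = beta)  # one long stream of realizations
  draws <- matrix(draws, ncol = iter)                    # one column per sample of size n
  data.frame(means = apply(draws, 2, mean),
             sds   = apply(draws, 2, sd))
}
sample_gamma(alpha = 1, beta = 2, n = 1000, iter = 10)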

R - Coding Function for Bootstrap CI Coverage Property

I need to write a function that performs a simulation to evaluate the coverage of a bootstrap confidence interval for the variance of n samples from a normal distribution. Below is what I've attempted, but it keeps returning a mean of 0 or 0.002 for the proportion of intervals that cover the true variance...
Var_CI_Coverage <- function(true_mean, true_var, nsim, nboot, alpha, nsamples){
  cover = NULL
  for(k in 1:nsim){
    Var = as.numeric()
    y <- rnorm(1, mean = true_mean, sd = sqrt(true_var))
    for(i in 1:nboot){
      resample_y <- sample(y, size = nsamples, replace = TRUE)
      Var[i] <- var(resample_y)
    }
    LB <- quantile(Var, probs = c(alpha/2))
    UB <- quantile(Var, probs = c(1 - (alpha/2)))
    cover[k] <- ifelse(LB <= true_var & UB >= true_var, 1, 0)
  }
  return(mean(cover))
}
Var_CI_Coverage(true_mean = 0, true_var = 4, nsim = 500, nboot = 1000, alpha = 0.05, nsamples = 10)
The main problem is you generate y using
y <- rnorm(1, mean = true_mean, sd = sqrt(true_var))
which means y is a single value, and all your bootstrap samples are just that single y value repeated nsamples times. You need
y <- rnorm(nsamples, mean = true_mean, sd = sqrt(true_var))
Then you get samples with actual variance, and you get a coverage estimate that looks more in the right ballpark (no comment on whether it's correct, I haven't tried to check).
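For completeness, a lightly tidied sketch of the function with that one line corrected (otherwise it follows the code in the question):
Var_CI_Coverage <- function(true_mean, true_var, nsim, nboot, alpha, nsamples){
  cover <- numeric(nsim)
  for(k in 1:nsim){
    Var <- numeric(nboot)
    y <- rnorm(nsamples, mean = true_mean, sd = sqrt(true_var))  # a sample of size nsamples
    for(i in 1:nboot){
      resample_y <- sample(y, size = nsamples, replace = TRUE)
      Var[i] <- var(resample_y)
    }
    LB <- quantile(Var, probs = alpha/2)
    UB <- quantile(Var, probs = 1 - alpha/2)
    cover[k] <- ifelse(LB <= true_var & UB >= true_var, 1, 0)
  }
  mean(cover)
}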

multinomial MLE error in R

I am new to R and trying to do MLE using mle2 in the bbmle package.
R Code:
rm(list = ls())
library(bbmle)
N <- 100
testmat=rmultinom(N, size=3, prob = c(0.1,0.2,0.8))
LL<- function(s, p){-sum(dmultinom(x=testmat, size = s, prob=p, log = TRUE))}
values.start <- list(3, c(0.1,0.2,0.7))
names(values.start) <- parnames(LL) <- paste0("b",0:1)
mle2(LL, start =values.start)
I keep getting this error
"Error in mle2(LL, start = values.start) :
some named arguments in 'start' are not arguments to the specified log-likelihood function"
I am using mle2; I thought it was not needed here. At first I was using mle:
N <- 100
testmat <- t(rmultinom(3, size = 3, prob = c(0.1, 0.2, 0.8)))
LL <- function(s, p1, p2, p3){
  prob <- unlist(as.list(environment()))[2:4]
  -sum(dmultinom(x = testmat, size = s, prob = prob, log = TRUE))
}
values.start <- list(s = 3, p1 = 0.1, p2 = 0.2, p3 = 7)
mle(LL, start = values.start)
which gave this error:
"Error in dmultinom(x = testmat, size = s, prob = prob, log = TRUE) :
x[] and prob[] must be equal length vectors."
I even edited it as follows:
N <- 100
testmat <- t(rmultinom(3, size = 3, prob = c(0.1, 0.2, 0.8)))
LL <- function(s = 3, p1 = 0.1, p2 = 0.2, p3 = 0.7){
  prob <- unlist(as.list(environment()))[2:4]
  s <- unlist(as.list(environment()))[1]
  -sum(dmultinom(x = testmat, size = s, prob = prob, log = TRUE))
}
mle(LL)
The error still persists. Finally I was able to decode the errors, thanks a lot:
library(bbmle)
N <- 1000
X <- rmultinom(N, size = 3, prob = rep(1/3, 3))
LL <- function(p_1 = 0.1, p_2 = 0.1, p_3 = 0.8) {
  p <- unlist(as.list(environment()))
  -sum(apply(X, MAR = 2, dmultinom, size = NULL, prob = c(p_1, p_2, p_3), log = TRUE))
}
mle(LL, method = "L-BFGS-B", lower = c(-Inf, 0), upper = c(Inf, Inf))
In my current problem I have 5k features, so I would need to write something like
function(p_1 = 0.1, p_2 = 0.1, p_3 = 0.8, ...., p_5000 = ..)
which is not possible. Is there any way around it?
I was able to do it with mle2, this way:
rm(list = ls())
library(bbmle)
N <- 1000
s <- 100
X <- rmultinom(N, size = s, prob = rep(1/s, s))
LL <- function(params){
  p <- unlist(as.list(environment()))
  minusll <- -sum(apply(X, MAR = 2, dmultinom, size = NULL, prob = p, log = TRUE))
  return(minusll)
}
values.start <- vector(mode = "list", length = s)
values.start <- c(0.02, 0.01*rep(98/99, 99))
names(values.start) <- parnames(LL) <- paste0("b", 1:s)
mle2(LL, start = values.start, vecpar = TRUE, method = "L-BFGS-B",
     lower = c(rep(0, s)), upper = c(rep(1, s)))
Above I was doing multinomial MLE parameter estimation for a dimension of 100 with 1000 samples. I was able to solve the problem of vector parameters. Now I am getting this error:
Error in optim(par = c(0.02, 0.0098989898989899, 0.0098989898989899, 0.0098989898989899, :
L-BFGS-B needs finite values of 'fn'
I found out that this error is due to fn = Inf, probably because one of the probabilities becomes zero, so that fn = -log(0) = Inf. Is there any way to solve this problem?
Thanks for the help.
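One common workaround, sketched here against the LL above (the eps value and the renormalization are my own additions, not from the thread), is to keep the probabilities strictly positive before calling dmultinom so the negative log-likelihood stays finite:
eps <- 1e-10
LL <- function(params){
  p <- unlist(as.list(environment()))
  p <- pmax(p, eps)   # assumption: clamp probabilities away from exactly zero
  p <- p / sum(p)     # assumption: renormalize so they still sum to 1
  -sum(apply(X, MAR = 2, dmultinom, size = NULL, prob = p, log = TRUE))
}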

How to get the time of sampling in rjags?

I have implemented the LDA model with rjags. And I successfully got the final samples with:
jags <- jags.model('../lda_jags.bug',
                   data = data,
                   n.chains = 1,
                   n.adapt = 100)
update(jags, 2000)
samples <- jags.samples(jags,
                        c('theta', 'phi', 'z'),
                        1000)
Now I can use samples$theta or samples$phi to get the results for theta and phi. But how can I know how long it took to sample? Thanks!
As @eipi10 states, you can use system.time() around the update() call to time the process within R (a minimal sketch is given after the runjags code below). Or, you can use the runjags package, which prints the (total) time taken in updating the model, including all previous calls to extend.jags:
library('runjags')
results <- run.jags('../lda_jags.bug', monitor = c('theta', 'phi', 'z'),
                    data = data, n.chains = 1, adapt = 100, burnin = 2000, sample = 1000)
results
# or:
jags <- jags.model('../lda_jags.bug',
                   data = data,
                   n.chains = 1,
                   n.adapt = 0)
runjags <- as.runjags(jags, monitor = c('theta', 'phi', 'z'))
results <- extend.jags(runjags, adapt = 100, burnin = 2000, sample = 1000)
results
results <- extend.jags(runjags, sample = 1000)
results
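And for the system.time() route mentioned at the start of this answer, a minimal sketch (assuming the jags object and monitored nodes from the question; the variable names are just for illustration):
burn_time   <- system.time(update(jags, 2000))
sample_time <- system.time(
  samples <- jags.samples(jags, c('theta', 'phi', 'z'), 1000)
)
burn_time["elapsed"]    # wall-clock seconds spent on burn-in
sample_time["elapsed"]  # wall-clock seconds spent drawing the samples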
