Bayesian R Programming

binomial <- function(nmax = 100,
                     thr = 0.95,
                     alpha = 1,
                     beta = 1,
                     p_true = 0.5,
                     p_0 = 0.5){
  for(j in seq.int(nmax, 0)){
    if(pbeta(q = p_0, shape1 = alpha + j, shape2 = beta + nmax - j, lower.tail = FALSE) < thr){
      targetatnmax <- j + 1
    } else {
      print(
      break
    }
  }
  result <- list(Success = Success, targeratnmax = targetatnmax)
  return(result)
}
res = binomial(nmax,thr,alpha,beta,p_true,p_0)
res
In my program I am trying to find the number of successes needed to exceed the threshold thr = 0.95. I am trying to use a for loop with if/else statements, but when I run it I don't get the value I need. I know the value should be 59, but I cannot seem to get it. I know the code looks really messy, but that's only because I have been playing around with it for hours. Please, any help is appreciated.

Here is your code after clean-up:
binomial <- function(nmax = 100,
                     thr = 0.95,
                     alpha = 1,
                     beta = 1,
                     p_true = 0.5,
                     p_0 = 0.5){
  targetatnmax <- 0
  for(j in seq.int(0, nmax)){
    if(pbeta(q = p_0, shape1 = alpha + j, shape2 = beta + nmax - j, lower.tail = FALSE) < thr){
      targetatnmax <- j + 1
    } else {
      break
    }
  }
  result <- list(targeratnmax = targetatnmax)
  return(result)
}
res = binomial()
res
#$targeratnmax
#[1] 59
The main problem (other than the syntax errors and nonexistent objects) was that your loop ran from nmax down to 0 instead of the other way around.
There is probably potential for optimization, but my understanding of the statistics is not good enough to really tackle that.
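For what it's worth, one possible optimization (a sketch, not part of the original answer): pbeta() is vectorized over its shape parameters, so the loop can be replaced by a single vectorized search with the same defaults as above.
# Hypothetical vectorized version of the search above: evaluate the
# posterior tail probability for every possible success count at once,
# then take the smallest count that clears the threshold.
binomial_vec <- function(nmax = 100, thr = 0.95, alpha = 1, beta = 1, p_0 = 0.5) {
  j <- 0:nmax
  post <- pbeta(p_0, shape1 = alpha + j, shape2 = beta + nmax - j, lower.tail = FALSE)
  min(j[post >= thr])
}
binomial_vec()  # should return 59, matching the loop above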

Related

MCMC for estimating negative binomial distribution

I want to estimate the parameters of a negative binomial distribution using the MCMC Metropolis-Hastings algorithm. In other words, I have a sample:
set.seed(42)
y <- rnbinom(20, size = 3, prob = 0.2)
and I want to write an algorithm that estimates the size and prob parameters.
My work so far
I defined the prior distribution of size as Poisson:
prior_r <- function(r) {
  return(dpois(r, lambda = 2, log = TRUE))
}
And the prior distribution of prob as uniform on [0, 1]:
prior_prob <- function(prob) {
  return(dunif(prob, min = 0, max = 1, log = TRUE))
}
Moreover, for simplicity, I defined log-likelihood and joint probability functions:
loglikelihood <- function(data, r, prob) {
  loglikelihoodValue <- sum(dnorm(data, mean = r, sd = prob, log = TRUE))
  return(loglikelihoodValue)
}
joint <- function(r, prob) {
  data <- y
  return(loglikelihood(data, r, prob) + prior_r(r) + prior_prob(prob))
}
Finally, the whole algorithm:
run_mcmc <- function(startvalue, iterations) {
  chain <- array(dim = c(iterations + 1, 2))
  chain[1, ] <- startvalue
  for (i in 1:iterations) {
    proposal_r <- rpois(1, lambda = chain[i, 1])
    proposal_prob <- chain[i, 2] + runif(1, min = -0.2, max = 0.2)
    quotient <- joint(proposal_r, proposal_prob) - joint(chain[i, 1], chain[i, 2])
    if (runif(1, 0, 1) < min(1, exp(quotient))) {
      chain[i + 1, ] <- c(proposal_r, proposal_prob)
    } else {
      chain[i + 1, ] <- chain[i, ]
    }
  }
  return(chain)
}
The problem
The problem I'm having is that even when I run it with starting values very close to the correct ones:
iterations <- 2000
startvalue <- c(4, 0.25)
res <- run_mcmc(startvalue, iterations)
I obtain a posterior distribution which is obviously wrong. For example:
> colMeans(res)
[1] 11.963018 0.994533
As you can see, size ends up very close to 12, and prob ends up at 1. Do you know the cause of this behavior?
Change dnorm in loglikelihood to dnbinom and fix the proposal for prob so it doesn't go outside (0,1):
set.seed(42)
y <- rnbinom(20, size = 3, prob = 0.2)

prior_r <- function(r) {
  return(dpois(r, lambda = 2, log = TRUE))
}
prior_prob <- function(prob) {
  return(dunif(prob, min = 0, max = 1, log = TRUE))
}
loglikelihood <- function(data, r, prob) {
  loglikelihoodValue <- sum(dnbinom(data, size = r, prob = prob, log = TRUE))
  return(loglikelihoodValue)
}
joint <- function(r, prob) {
  return(loglikelihood(y, r, prob) + prior_r(r) + prior_prob(prob))
}
run_mcmc <- function(startvalue, iterations) {
  chain <- array(dim = c(iterations + 1, 2))
  chain[1, ] <- startvalue
  for (i in 1:iterations) {
    proposal_r <- rpois(1, lambda = chain[i, 1])
    proposal_prob <- chain[i, 2] + runif(1, min = max(-0.2, -chain[i, 2]), max = min(0.2, 1 - chain[i, 2]))
    quotient <- joint(proposal_r, proposal_prob) - joint(chain[i, 1], chain[i, 2])
    if (runif(1, 0, 1) < min(1, exp(quotient))) {
      chain[i + 1, ] <- c(proposal_r, proposal_prob)
    } else {
      chain[i + 1, ] <- chain[i, ]
    }
  }
  return(chain)
}

iterations <- 2000
startvalue <- c(4, 0.25)
res <- run_mcmc(startvalue, iterations)
colMeans(res)
#> [1] 3.1009495 0.1988177
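As a quick sanity check (a sketch, not part of the original answer), it is worth discarding a burn-in and looking at the trace plots before trusting the posterior means; burn = 500 here is an arbitrary choice.
burn <- 500                                  # arbitrary warm-up length
post <- res[-(1:burn), ]                     # keep post-burn-in draws
colMeans(post)                               # posterior means after burn-in
par(mfrow = c(1, 2))
plot(post[, 1], type = "l", ylab = "size")   # trace plot for size
plot(post[, 2], type = "l", ylab = "prob")   # trace plot for prob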

Why is my Monte Carlo Integration wrong by a factor of 2?

I am trying to integrate the following function using Monte Carlo integration. The domain I want to integrate over is x <- seq(0, 1, by = 0.01) and y <- seq(0, 1, by = 0.01).
my.f <- function(x, y){
  result = x^2 + sin(x) + exp(cos(y))
  return(result)
}
I calculated the integral using the cubature package.
library(cubature)
library(plotly)

# Rewriting the function so it can be integrated
cub.function <- function(x){
  result = x[1]^2 + sin(x[1]) + exp(cos(x[2]))
  return(result)
}
cub.integral <- adaptIntegrate(f = cub.function, lowerLimit = c(0, 0), upperLimit = c(1, 1))
The result is 3.134606. But when I use my Monte Carlo integration code (see below), my result is about 1.396652. My code is off by more than a factor of 2!
What I did:
Since I need a bounding volume to conduct the Monte Carlo integration, I calculated the function values over the grid above. This gives me an estimate of the maximum and minimum of the function.
# My data range
x <- seq(0, 1, by = 0.01)
y <- seq(0, 1, by = 0.01)
# The matrix where I save the results
my.f.values <- matrix(0, nrow = length(x), ncol = length(y))
# Calculation of the function values
for(i in 1:length(x)){
  for(j in 1:length(y)){
    my.f.values[i,j] <- my.f(x = x[i], y = y[j])
  }
}
# The maximum and minimum of the function values
max(my.f.values)
min(my.f.values)
# Plotting the surface (not strictly necessary)
plot_ly(y = x, x = y, z = my.f.values) %>% add_surface()
So the volume we need is determined by the maximum of the function values, since 1 * 1 * 4.559753 is simply 4.559753.
# Now, the Monte Carlo integration.
# I found the code online and modified it a bit.
monte = function(x){
  tests = rep(0, x)
  hits = 0
  for(i in 1:x){
    y = c(runif(2, min = 0, max = 1),                  # y[1] is x; y[2] is y
          runif(1, min = 0, max = max(my.f.values)))   # y[3] is z
    if(y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))){
      hits = hits + 1
    }
    prop = hits / i
    est = prop * max(my.f.values)
    tests[i] = est
  }
  return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
So, the result is completely wrong. But if I change the function a bit, it suddenly works.
monte = function(x){
  tests = rep(0, x)
  hits = 0
  for(i in 1:x){
    x = runif(1)
    y = runif(1)
    z = runif(1, min = 0, max = max(my.f.values))
    if(z < my.f(x = x, y = y)){
      hits = hits + 1
    }
    prop = hits / i
    est = prop * max(my.f.values)
    tests[i] = est
  }
  return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
Can somebody explain why the result suddenly changes? To me, both functions seem to do the exact same thing.
In your (first) code for monte, this line is in error:
y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))
Given your definition of my.f, it should surely be
y[3] < y[1]**2 + sin(y[1]) + exp(cos(y[2]))
Or..., given that you shouldn't be repeating yourself unnecessarily:
y[3] < my.f(y[1], y[2])
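As an aside (not part of the original answer), hit-or-miss sampling is not the only option here: since the domain [0, 1] x [0, 1] has unit area, the integral equals the mean of my.f at uniformly random points, so no bounding box is needed at all. A minimal sketch, assuming my.f from above:
# Mean-value Monte Carlo estimator: on a unit-area domain the
# integral is just the expected value of f at uniform points.
set.seed(1)
n <- 1e5
xs <- runif(n)
ys <- runif(n)
mean(my.f(xs, ys))  # should be close to 3.134606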

My variogram code result differs from the variog() result

I am writing code to produce a variogram. To validate my result, I checked it against geoR::variog(), but the two variograms are different.
I tried to understand the code of variog() to see what happens under the hood, but there is so much going on that I can't follow it. My code uses the following parameters: X coordinate, Y coordinate, data value, number of lags, minimum lag value, lag interval, azimuth (angle in degrees; 90 corresponds to the vertical direction), angle tolerance (in degrees), and maximum bandwidth.
variogram = function(xcor, ycor, data, nlag, minlag, laginv, azm, atol, maxbandw){
  dl <- length(data)
  lowangle <- azm - atol
  upangle <- azm + atol
  gamlag <- integer(nlag)
  n <- integer(nlag)
  dist <- pairdist(xcor, ycor)
  maxd <- max(dist)
  llag <- seq(minlag, minlag + (nlag - 1) * laginv, by = laginv)
  hlag <- llag + laginv
  for(i in 1:dl){
    for(j in i:dl){
      if(i != j){
        if(xcor[j] - xcor[i] == 0)
          theta <- 90
        else
          theta <- 180/pi * atan((ycor[j] - ycor[i])/(xcor[j] - xcor[i]))
        for(k in 1:nlag){
          d <- dist[j, i]
          b <- abs(d * sin(theta - azm))
          if((llag[k] <= d & d < hlag[k]) & (lowangle <= theta & theta < upangle) & (b <= maxbandw)){
            gamlag[k] <- gamlag[k] + (data[i] - data[j])^2
            n[k] <- n[k] + 1
          }
        }
      }
    }
  }
  gamlag <- ifelse(n == 0, NA, gamlag/(2*n))
  tmp <- data.frame("lag" = llag, "gamma" = gamlag)
  return(tmp)
}
Function call for the above code:
ideal_variogram_2 <- variogram(data3[,1], data3[,2], data3[,3], 18, 0, 0.025, 90, 45, 1000000)
ideal_variogram_2 <- na.omit(ideal_variogram_2)
plot(ideal_variogram_2$lag, ideal_variogram_2$gamma, main = "Using my code")
Function call for variog():
geodata1 <- as.geodata(data3, coords.col = 1:2, data.col = 3)
ideal_variogram_1 <- variog(geodata1, coords = geodata1$coords, data = geodata1$data, option = "bin", uvec = seq(0, 0.45, by = 0.025), direction = pi/2, tolerance = pi/4)
df <- data.frame(u = ideal_variogram_1$u, v = ideal_variogram_1$v)
plot(df$u, df$v, main = "Using variog()")
The two variograms I got are at the following link: Variogram
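One thing worth checking in the code above (an observation, not a confirmed fix): theta and azm are in degrees, but R's sin() expects radians, so b <- abs(d * sin(theta - azm)) mixes units. A minimal sketch of the conversion, written as a hypothetical helper:
# R's trig functions work in radians, so convert the degree
# difference before taking the sine.
bandwidth <- function(d, theta_deg, azm_deg) {
  abs(d * sin((theta_deg - azm_deg) * pi / 180))
}
bandwidth(d = 1, theta_deg = 135, azm_deg = 90)  # 0.7071068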

R: Changing values of variables after a certain time

The question I am trying to ask is: how do I change one of the values of my variables (stored in LO$M in my list) after a certain amount of time has passed?
What I am trying to achieve is that after 20,000 seconds have passed, the value of Lac becomes the value of Lac at time 20,000 plus 10,000.
So at t = 20,000, Lac = Lac + 10,000.
The issue with my code is that my condition is if(tt >= 20000), which causes every value of Lac after 20,000 to be increased by 10,000, when what I want is for only the FIRST value after 20,000 to be increased by 10,000.
Basically, after 20,000 seconds of the experiment have passed, I am trying to inject 10,000 more Lac into the experiment.
My code is given below:
LO = list()
LO$M = c(i = 1, ri = 0, I = 50, Lac = 20, ILac = 0, o = 1, Io = 0,
         RNAP = 100, RNAPo = 0, r = 0, z = 0)
LO$Pre = matrix(c(1,0,0,0,0,0,0,0,0,0,0,
                  0,1,0,0,0,0,0,0,0,0,0,
                  0,0,1,1,0,0,0,0,0,0,0,
                  0,0,0,0,1,0,0,0,0,0,0,
                  0,0,1,0,0,1,0,0,0,0,0,
                  0,0,0,0,0,0,1,0,0,0,0,
                  0,0,0,0,0,1,0,1,0,0,0,
                  0,0,0,0,0,0,0,0,1,0,0,
                  0,0,0,0,0,0,0,0,1,0,0,
                  0,0,0,0,0,0,0,0,0,1,0,
                  0,0,0,1,0,0,0,0,0,0,1,
                  0,1,0,0,0,0,0,0,0,0,0,
                  0,0,1,0,0,0,0,0,0,0,0,
                  0,0,0,0,1,0,0,0,0,0,0,
                  0,0,0,0,0,0,0,0,0,1,0,
                  0,0,0,0,0,0,0,0,0,0,1), ncol = 11, byrow = TRUE)
LO$Post = matrix(c(1,1,0,0,0,0,0,0,0,0,0,
                   0,1,1,0,0,0,0,0,0,0,0,
                   0,0,0,0,1,0,0,0,0,0,0,
                   0,0,1,1,0,0,0,0,0,0,0,
                   0,0,0,0,0,0,1,0,0,0,0,
                   0,0,1,0,0,1,0,0,0,0,0,
                   0,0,0,0,0,0,0,0,1,0,0,
                   0,0,0,0,0,1,0,1,0,0,0,
                   0,0,0,0,0,1,0,1,0,1,0,
                   0,0,0,0,0,0,0,0,0,1,1,
                   0,0,0,0,0,0,0,0,0,0,1,
                   0,0,0,0,0,0,0,0,0,0,0,
                   0,0,0,0,0,0,0,0,0,0,0,
                   0,0,0,1,0,0,0,0,0,0,0,
                   0,0,0,0,0,0,0,0,0,0,0,
                   0,0,0,0,0,0,0,0,0,0,0), ncol = 11, byrow = TRUE)
LO$h = function(x, t, th = c(0.02, 0.1, 0.005, 0.1, 1, 0.01, 0.1, 0.01, 0.03,
                             0.1, 1e-05, 0.01, 0.002, 0.01, 0.001))
{
  with(as.list(c(x, th)), {
    return(c(th[1]*i, th[2]*ri, th[3]*I*Lac, th[4]*ILac, th[5]*I*o, th[6]*Io,
             th[7]*o*RNAP, th[8]*RNAPo, th[9]*RNAPo, th[10]*r, th[11]*Lac*z,
             th[12]*ri, th[13]*I, th[13]*ILac, th[14]*r, th[15]*z))
  })
}
gillespie1 = function (N, n, ...)
{
  tt = 0
  x = N$M
  S = t(N$Post - N$Pre)
  u = nrow(S)
  v = ncol(S)
  tvec = vector("numeric", n)
  xmat = matrix(ncol = u, nrow = n + 1)
  xmat[1, ] = x
  for (i in 1:n) {
    h = N$h(x, tt, ...)
    tt = tt + rexp(1, sum(h))
    j = sample(v, 1, prob = h)
    x = x + S[, j]
    tvec[i] = tt
    xmat[i + 1, ] = x
    if (tt >= 20000) {
      x[4] = x[4] + 10000
    }
  }
  return(list(t = tvec, x = xmat))
}
newout = gillespie1(LO,200000)
matplot(newout$x[,4], type="l", lwd=0.25, col="grey")
I don't have a high enough reputation to attach images, but it should look something like this:
https://gyazo.com/0ffd940a22df23b2ccfdf4a17e85dca8
Sorry if this isn't clear. Thanks
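One way to make the injection fire only once is a one-shot flag. Here is a sketch (untested against the full model) of gillespie1 with a hypothetical injected flag added:
gillespie1 = function (N, n, ...)
{
  tt = 0
  x = N$M
  S = t(N$Post - N$Pre)
  v = ncol(S)
  tvec = vector("numeric", n)
  xmat = matrix(ncol = nrow(S), nrow = n + 1)
  xmat[1, ] = x
  injected = FALSE                   # hypothetical one-shot flag
  for (i in 1:n) {
    h = N$h(x, tt, ...)
    tt = tt + rexp(1, sum(h))
    j = sample(v, 1, prob = h)
    x = x + S[, j]
    tvec[i] = tt
    xmat[i + 1, ] = x
    if (!injected && tt >= 20000) {  # fires on the first crossing only
      x[4] = x[4] + 10000            # Lac is the 4th entry of LO$M
      injected = TRUE
    }
  }
  return(list(t = tvec, x = xmat))
}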
Alternatively, if you want to trigger an action after real (wall-clock) time rather than simulation time, you can schedule it with tcltk2. In this example, you have the function myTask(). When you call execMyTask(), you execute myTask() once, and after that it executes at random intervals of between 1 and max_wait milliseconds. When you get tired, you can kill the task with tclTaskDelete().
library(tcltk2)

myTask <- function() cat("some task!\n")
id = "execMyTask"

execMyTask <- function(max_wait = 3000) {
  id <- toString(match.call()[[1]])
  myTask()
  wait = sample(1:max_wait, 1)
  cat("Waiting", wait, "milliseconds\n")  # replace with your function
  if (is.null(tclTaskGet(id))) {
    tclTaskSchedule(wait = wait, execMyTask(), id = id, redo = TRUE)
  } else {
    tclTaskChange(wait = wait, execMyTask(), id = id, redo = TRUE)
  }
}
execMyTask()
tclTaskDelete(id)
So far, there is one small problem with this approach: we cannot supply arguments to the function fun in tclTaskChange().
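A possible workaround (a sketch, not from the original answer): bake the arguments into a zero-argument wrapper with a closure, so the scheduled call itself never needs arguments.
# make_task() captures `msg` in a closure; the returned function
# takes no arguments and can therefore be scheduled as-is.
make_task <- function(msg) {
  force(msg)                  # fix the argument at creation time
  function() cat(msg, "\n")
}
myTask <- make_task("some parameterized task!")
myTask()                      # runs with the captured argument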

arima method in mtsdi

I have a large data set (more than 2000 rows and 2000 variables) with lots of missing values. I am using the mnimput function of the mtsdi package in R to impute all the missing values. This is my code:
formula = data
imput_out <- mnimput(formula, data, by = NULL, log = FALSE, log.offset = 1,
                     eps = 1e-3, maxit = 1e2, ts = TRUE, method = "arima",
                     ar.control = list(order = c(1, 1, 1), period = 4, f.eps = 1e-6,
                                       f.maxit = 1e3, ga.bf.eps = 1e-6, verbose = TRUE,
                                       digits = getOption("digits")))
But I am getting an error:
Error in o[1:3, j] : incorrect number of dimensions
Please help me out.
You have to get deep into the package source to uncover what's going on here.
The ar.control list is placed into a variable o that is iterated over by j, the number of columns in your formula. So if your formula looks like ~c31+c32+c33, your ar term needs to be 3 columns of (p, d, q) values.
I assigned it outside of the ar.control parameter for ease of editing:
# one (p, d, q) column per variable in the formula
arcontrol <- list(order = cbind(c(1, 0, 0), c(0, 0, 1), c(1, 0, 0)), period = NULL)
mnimput(formula, data, eps = 1e-3, ts = TRUE, method = "arima", ar.control = arcontrol)
Here is the package source, if you are interested:
function (xn, o, s, eps, maxit)
{
  rows <- dim(xn)[1]
  cols <- dim(xn)[2]
  models <- as.list(rep(NA, cols))
  ar.pred <- matrix(NA, nrow = rows, ncol = cols)
  for (j in 1:cols) {
    if (is.null(s)) {
      order <- o[1:3, j]
      seasonal <- list(order = c(0, 0, 0), period = NA)
    }
    else {
      order <- o[1:3, j]
      seasonal <- list(order = o[4:6, j], period = s)
    }
    models[[j]] <- arima(xn[, j], order = order, seasonal = seasonal,
                         xreg = NULL, optim.control = list(maxit = maxit,
                                                           reltol = eps))
    ar.pred[, j] <- xn[, j] - residuals(models[[j]])
  }
  retval <- list(ar.pred = ar.pred, models = models)
  return(retval)
}
