Plotting the posterior distribution of N given likelihood and prior - R

Given that the likelihood is Y|n~Binomial(n, theta) and the prior is n~Poisson(5), I tried to calculate the posterior distribution of sample size n with Y=0 and theta=0.2. My code is as follows:
Y <- 0
theta <- 0.2
n_grid <- seq(0,1,length=1000)
like <- dbinom(Y,n_grid,theta)
prior <- dpois(n_grid,5)
fy <- sum(like*prior)
post <- like*prior/fy
plot(n_grid,post,type="l")
I keep getting NaN results when computing the likelihood and the prior. Any help would be appreciated!

So I realize it might be unconventional to answer my own question, but I figured out the solution and thought I would post it to help someone else out. The problem was that n has to be an integer: dbinom() returns NaN when the number of trials is not a whole number (and dpois() warns for non-integer n), and my grid seq(0,1,length=1000) was both fractional and restricted to [0, 1]. Evaluating everything on the integer support 0:100 fixes it.
Y <- 0
theta <- 0.2
N <- 0:100   # n must be a non-negative integer
like <- dbinom(Y,N,theta)
prior <- dpois(N,5)
fy <- sum(like*prior)
post <- like*prior/fy
plot(N,post,type="l")
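As a sanity check (not part of the original answer), for Y = 0 the posterior has a closed form: p(n | Y = 0) is proportional to (1 - theta)^n * 5^n / n!, the kernel of a Poisson(5 * (1 - theta)) = Poisson(4) distribution, so the grid posterior should match dpois(N, 4) up to the truncation at N = 100:
post_exact <- dpois(N, 5 * (1 - theta))
max(abs(post - post_exact))   # essentially zero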

Related

Bayesian relative premium calculation: problem with integration in R

I ran into a problem when I was trying to write a program that computes the relative premium for policy-holders in a bonus-malus system. The formula for the relative premium at level l is
r_l = ∫ theta * pi_l(theta) dF(theta) / ∫ pi_l(theta) dF(theta)
where theta is the random accident proneness, which follows a gamma distribution with parameters (alpha, alpha) as can be seen below in the code, so dF is the density function of this distribution. pi_l(theta) is the stationary distribution at level l, computed as
pi(theta) = e^T (I - P(theta) + E)^(-1)
where e is a vector of 1s, I is the identity matrix, E is a matrix of 1s, and P(theta) is the transition probability matrix.
The problem is that in the formula for the relative premium the stationary distribution depends on theta, which is unknown.
Here is what I tried: two functions, one for the numerator and one for the denominator. Each builds the transition probability matrix and the stationary distribution, and their outputs are then integrated with the integrate function. However, the problem comes when computing the stationary distribution: R says that the arrays are not conformable, I think because of the unknown theta. This is what the code looks like:
E <- rbind(c(1,1,1,1,1),c(1,1,1,1,1),c(1,1,1,1,1),c(1,1,1,1,1),c(1,1,1,1,1))
e <- c(1,1,1,1,1)
I <- diag(5)
lambda <- 0.1474
alpha <- 0.889
level <- 5
rl1 <- function(teta){
for(l in 1:level){
M <- rbind(c(exp(-lambda*teta),0,0,0,1-exp(-lambda*teta)),c(exp(-lambda*teta),0,0,0,1-exp(-lambda*teta)),
c(0,exp(-lambda*teta),0,0,1-exp(-lambda*teta)),c(0,0,exp(-lambda*teta),0,1-exp(-lambda*teta)),
c(0,0,0,exp(-lambda*teta),1-exp(-lambda*teta)))
stac <- t(e)%*%solve(I-M+E)
integral1[l] <- teta*stac[l]*alpha^alpha*exp(-alpha*teta)*teta^(alpha-1)/gamma(alpha)
}
return(integral1)
}
x <- integrate(rl1,lower=0,upper=Inf)
rl2 <- function(teta){
for(l in 1:level){
M <- rbind(c(exp(-lambda*teta),0,0,0,1-exp(-lambda*teta)),c(exp(-lambda*teta),0,0,0,1-exp(-lambda*teta)),
c(0,exp(-lambda*teta),0,0,1-exp(-lambda*teta)),c(0,0,exp(-lambda*teta),0,1-exp(-lambda*teta)),
c(0,0,0,exp(-lambda*teta),1-exp(-lambda*teta)))
stac <- t(e)%*%solve(I-M+E)
integral2[l] <- stac[l]*alpha^alpha*exp(-alpha*teta)*teta^(alpha-1)/gamma(alpha)
}
return(integral2)
}
y <- integrate(rl2,lower=0,upper=Inf)
rl <- x/y
Can you please help me solve this problem? Thanks!
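One possible direction (a sketch under the assumptions above, not a tested solution): integrate() passes a whole vector of theta values to its integrand and expects a numeric vector of the same length back, so the stationary distribution should be computed for one scalar theta at a time and each integrand vectorized, for example with Vectorize(). Reusing the e, I, E, lambda, alpha and level objects defined above:
# Stationary distribution for a single scalar theta: pi(theta) = e^T (I - P(theta) + E)^(-1)
stationary <- function(teta) {
  M <- rbind(c(exp(-lambda*teta), 0, 0, 0, 1 - exp(-lambda*teta)),
             c(exp(-lambda*teta), 0, 0, 0, 1 - exp(-lambda*teta)),
             c(0, exp(-lambda*teta), 0, 0, 1 - exp(-lambda*teta)),
             c(0, 0, exp(-lambda*teta), 0, 1 - exp(-lambda*teta)),
             c(0, 0, 0, exp(-lambda*teta), 1 - exp(-lambda*teta)))
  as.vector(t(e) %*% solve(I - M + E))
}
# Gamma(alpha, alpha) mixing density
dF <- function(teta) dgamma(teta, shape = alpha, rate = alpha)
# Relative premium per level: ratio of the two one-dimensional integrals
rl <- numeric(level)
for (l in 1:level) {
  num <- integrate(Vectorize(function(teta) teta * stationary(teta)[l] * dF(teta)),
                   lower = 0, upper = Inf)$value
  den <- integrate(Vectorize(function(teta) stationary(teta)[l] * dF(teta)),
                   lower = 0, upper = Inf)$value
  rl[l] <- num / den
}
rl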

R: probability / numerical integral of bivariate (or multivariate) kernel density

I am using the package ks for kernel density estimation. Here's an easy example:
n <- 70
x <- rnorm(n)
library(ks)
f_kde <- kde(x)
I am actually interested in the respective exceedance probabilities of my input data, which ks can easily return given f_kde:
p_kde <- pkde(x, f_kde)
This is done in ks with numerical integration using Simpson's rule. Unfortunately, this is only implemented for the 1-d case. For the bivariate case, ks provides no method for returning these probabilities:
y <- rnorm(n)
f_kde <- kde(data.frame(x,y))
# does not work, but it's what I am looking for:
p_kde <- pkde(data.frame(x,y), f_kde)
I couldn't find any package, or any help searching Stack Overflow, to solve this issue in R (some suggestions for Python exist, but I would like to keep it in R). Any line of code or package recommendation is appreciated. Even though I am mostly interested in the bivariate case, ideas for the multivariate case are welcome as well.
kde supports multidimensional kernel estimates, so we can use kde itself to compute the equivalent of pkde.
To do this, we evaluate kde on sufficiently small dx and dy steps using the eval.points parameter: this gives the local density estimate on each dx*dy square.
We verify that the sum of the estimates multiplied by the area of the squares is almost equal to 1:
library(ks)
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
xy <- cbind(x,y)
xmin <- -10
xmax <- 10
dx <- .1
ymin <- -10
ymax <- 10
dy <- .1
pts.x <- seq(xmin, xmax, dx)
pts.y <- seq(ymin, ymax, dy)
pts <- as.data.frame(expand.grid(x = pts.x, y = pts.y))
f_kde <- kde(xy,eval.points=pts)
pts$est <- f_kde$estimate
sum(pts$est)*dx*dy
[1] 0.9998778
You can now query the pts dataframe for the cumulative probability on the area of your choice:
library(data.table)
setDT(pts)
# cumulative density
pts[x < 1 & y < 2 , .(pkde=sum(est)*dx*dy)]
pkde
1: 0.7951228
# average density around a point
tolerance <- .1
pts[abs(x-1) < tolerance & abs(y-2) < tolerance, .(kde = mean(est))]
kde
1: 0.01465478
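For readers not using data.table, the first query has a direct base-R equivalent (the same sum, so the same number):
sum(pts$est[pts$x < 1 & pts$y < 2]) * dx * dy
# [1] 0.7951228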

Manually calculate the two-sample Kolmogorov-Smirnov statistic using the ECDF

I am trying to manually compute the KS statistic for two random samples. As far as I understand, the KS statistic D is the maximum vertical deviation between the two ECDFs. However, manually calculating the difference between the two ECDFs and running ks.test from base R yield different results. I wonder where the mistake is.
set.seed(123)
a <- rnorm(10000)
b <- rnorm(10000)
### Manual calculation
# function for calculating manually the ecdf
decdf <- function(x, baseline, treatment) ecdf(baseline)(x) - ecdf(treatment)(x)
#Difference between the two CDFs
d <- curve(decdf(x,a,b), from=min(a,b), to=max(a,b))
# getting D
ks <- max(abs(d$y))
#### R-Base calculation
ks.test(a,b)
The base-R D = 0.0109, while the manual calculation gives 0.0088. Any help explaining the difference is appreciated.
I attach the relevant part of the ks.test source code (a bit cleaned up):
n <- length(a)
n.x <- as.double(n)
n.y <- length(b)
n <- n.x * n.y/(n.x + n.y)
w <- c(a, b)  # pooled sample
# walking through the pooled sample in sorted order, z accumulates ECDF(a) - ECDF(b);
# its largest absolute value is the exact D
z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))
STATISTIC <- max(abs(z))
By default, curve evaluates the function on a subdivision of 100 points between from and to. By restricting the search to these 100 points, you can miss the location where the maximum difference is attained.
Instead, evaluate the difference at all points where the ECDFs jump, and you are sure to catch the value at which the maximum difference is attained.
set.seed(123)
a <- rnorm(10000)
b <- rnorm(10000)
Fa <- ecdf(a)
Fb <- ecdf(b)
x <- c(a,b) # the points where Fa or Fb jump
max(abs(Fa(x) - Fb(x)))
# [1] 0.0109
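As a quick cross-check (not part of the original answer), the same value can be pulled straight from ks.test:
unname(ks.test(a, b)$statistic)
# [1] 0.0109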

Remove linear trend from raster stack R

I am trying to remove the linear trend (detrend) from a monthly precipitation raster stack for the US covering 1979-2015 (https://www.northwestknowledge.net/metdata/data/monthly/pr_gridMET.nc). Those data are large enough that using them as an example would be a bit unruly here, so for the sake of efficiency I am going to use the example data from the raster package. My current working approach is to use `raster::calc` on a linear model and pull the residuals. My understanding is that those residuals are the detrended series, but I am not 100% sure that is correct. The code I am using is as follows:
library(raster)
fn <- raster(system.file("external/test.grd", package="raster"))
fn2 <- fn+1000
fn3 <- fn +500
fn4 <- fn +750
fn5 <- fn+100
fns <- stack(fn, fn2, fn3, fn4, fn5)
time <- 1:nlayers(fns)
# Get residuals to detrend the raw data
get_residuals <- function(x) {
  if (is.na(x[1])) {
    rep(NA, length(x))
  } else {
    m <- lm(x ~ time)
    q <- residuals(m)
    return(q)
  }
}
detrended_fns <- calc(fns, get_residuals) # Create our residual (detrended) time series stack
I feel like I'm missing something here. Can anyone confirm that I'm on the right track? If I'm not, any suggestions on how to properly detrend these data would be helpful. Thanks!
The residuals remove both the slope and the intercept, so you get anomalies. Perhaps you only want to remove the slope? In that case you could add the intercept to the residuals in get_residuals:
q <- residuals(m) + coefficients(m)[1]
Or better:
q <- residuals(m) + predict(m)[1]
That way you use year 1 (and not year 0) as the base, and it also works if time is, say, 2000:2004.
You could also take the last year, the middle year, or the average as the base.
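Putting the two pieces together, a sketch of the modified helper (the function name is mine; it assumes you want to anchor each cell's series at the fitted value for the first time step, so only the slope is removed):
get_residuals_keep_level <- function(x) {
  if (is.na(x[1])) {
    rep(NA, length(x))
  } else {
    m <- lm(x ~ time)
    residuals(m) + predict(m)[1]  # detrended series, anchored at time step 1
  }
}
detrended_fns <- calc(fns, get_residuals_keep_level)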

Defining a threshold for results in lsoda, R language

I have a problem with lsoda in the deSolve package in R (it might apply to the ode function too). I am modeling the dynamics of a food web using a set of ODEs calculating the abundances of 5 species in two identical food webs which are connected through dispersal.
The abundances are calculated over 2000 time steps, and they are not supposed to be negative or less than 1e-6; in that case the result should be set to 0. I could not find any lsoda parameter that turns such results into zero, so I tried the following trick in my ODE function:
solve.model <- function(t,y, parms){
y <- ifelse(y<1e-6, 0, y)
#ODE functions here
#...
#...
return(list(dy))
}
but it does not seem to work. Below is a sample of the abundances of species in one web.
I will be very grateful for your help, and hope the sample code can give enough information about my problem.
Babak
P.S. I am solving the following ODE set for the abundances of the species (the first two equations) and the resource change (the third equation).
The corresponding code for the function is below:
solve.model <- function(t, y, parms){
y <- ifelse(y<1e-6, 0, y)
with(parms,{
# return from vector form into matrix form for calculations
(R <- as.matrix(y[(max(no.species)*length(no.species)+1):length(y)]))
(N <- matrix(y[1:(max(no.species)*length(no.species))], ncol=length(no.species)))
dy1 <- matrix(nrow=max(no.species), ncol=length(no.species))
dy2 <- matrix(nrow=length(no.species), ncol=1)
for (i in 1:no.webs){
species <- no.species[i]
(abundance <- N[1:species,i])
adj <- as.matrix(webs[[i]])
a.temp <- a[1:species, 1:species]*adj
b.temp <- b[1:species, 1:species]*adj
h.temp <- h[1:species, 1:species]*adj
#Calculating sigmas in denominator of Holing type II functional response
(sum.over.preys <- abundance%*%(a.temp*h.temp))
(sum.over.predators <- (a.temp*h.temp)%*%abundance)
#Calculating growth of basal
(basal.growth <- basals[,i]*N[,i]*(mu*R[i]/(K+R[i])-m))
# Calculating growth for non-basal species
no.basal <- rep(1,len=species)-basals[1:species]
predator.growth<- rep(0, max(no.species))
(predator.growth[1:species] <- ((abundance%*%(a.temp*b.temp))/(1+sum.over.preys)-m*no.basal)*abundance)
predation <- rep(0, max(no.species))
(predation[1:species] <- (((a.temp*b.temp)%*%abundance)/t(1+sum.over.preys))*abundance)
(pop <- basal.growth + predator.growth - predation)
dy1[,i] <- pop
dy2[i] <- 0.0005 #Change in the resource
}
#Calculating dispersals .they can be easily replaced
# by adjacency maps of connections between food webs arbitrarily!
# added to solve the problem of negative abundances
deltas <- append(c(dy1), dy2)
return(list(deltas))
})
}
This function is passed to lsoda with the following call:
temp.abund[[j]] <- lsoda(y=initials, func=solve.model, times=0:max.time, parms=parms)
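Since reassigning y inside the derivative function does not change the state that lsoda actually integrates, a common workaround, sketched below on a toy one-state model (not the food-web model above), is to zero the derivative once a state drops below the threshold and to clean up the output afterwards. deSolve also has an events/roots mechanism that can modify the state during the run, which may be worth a look in the package documentation.
library(deSolve)
threshold <- 1e-6
# Toy model: exponential decay, with the derivative frozen at 0 once the
# state falls below the threshold
decay <- function(t, y, parms) {
  dy <- -parms$r * y
  dy[y < threshold] <- 0
  list(dy)
}
out <- as.data.frame(lsoda(y = c(N = 1), times = 0:2000, func = decay,
                           parms = list(r = 0.01)))
# Post-process: force any remaining tiny values to exactly zero
out$N[out$N < threshold] <- 0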
