Recursive sum using a Poisson distribution in R

I am trying to build a recursive function in R:
H(x,t) = \sum\limits_{d=0}^{x} \Pr(D=d)\,\bigl(h\,(x-d) + H(x-d,\,t-1)\bigr)
+ \sum\limits_{d=x+1}^{\infty} \Pr(D=d)\,\bigl(p\,(d-x) + H(0,\,t-1)\bigr)
where h and p are constants, D ~ Po(l), and H(x,0) = 0. The code I have so far gives an obvious error, but I can't see the fix. The code:
p <- 1000 # Unit penalty cost for lost sales
h <- 10   # Unit inventory holding cost pr. time unit
l <- 5    # Mean of D
H <- function(x, t){
  if(t == 0) return(0)
  fp <- 0
  sp <- 0
  for(d in 0:x){
    fp <- fp + dpois(x = d, l)*(h*(x-d) + H(x-d, t-1))
  }
  for(d in x+1:Inf){
    sp <- sp + dpois(x = d, l)*(p*(d-x) + H(0, t-1))
  }
  return(fp + sp)
}
When I run this, the error is:
Error in 1:Inf : result would be too long a vector
which seems obvious. So the question is: can anyone point me in the right direction to redefine the problem so that R can compute a solution?
Thanks in advance.

Going from x+1:Inf won't work. Since you're using the Poisson pmf, you can just use a finite upper bound (why? think about the shape of the distribution and how small the probabilities are in the right tail):
for(d in x+1:100)
which, when run for H(20,2), gives
[1] 252.806
when you increase it to
for(d in x+1:500)
then H(20,2) also gives
[1] 252.806
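For reference, here is a runnable sketch of the complete function with the finite truncation from the answer above (100 terms past x; by then the Poisson probabilities are effectively zero, so the truncated tail is negligible):

p <- 1000  # unit penalty cost for lost sales
h <- 10    # unit inventory holding cost per time unit
l <- 5     # mean of D

H <- function(x, t) {
  if (t == 0) return(0)
  fp <- 0
  sp <- 0
  for (d in 0:x) {                 # first sum: demand covered by stock
    fp <- fp + dpois(d, l) * (h * (x - d) + H(x - d, t - 1))
  }
  for (d in (x + 1):(x + 100)) {   # second sum, truncated instead of running to Inf
    sp <- sp + dpois(d, l) * (p * (d - x) + H(0, t - 1))
  }
  fp + sp
}

H(20, 2)
# [1] 252.806   (the value quoted above)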

Related

Simulating a process n times in R

I've written an R script (sourced from here) that simulates the path of a geometric Brownian motion for a stock price, and I need the simulation to run 1000 times so that I generate 1000 paths of the process Ut = St*e^(-mu*t), by discretizing the law of motion derived from Ut (the bottom line of the solution to the question posted here).
The process has n = 252 steps and a discretization step of 1/252, with volatility sigma = 0.4 and instantaneous drift mu, which I've treated as zero, although I'm not sure about this. I can generate a single path, but I'm struggling to simulate 1000 of them; I'm unsure which variables I need to change, or whether there's an issue in my for loop that stops me from generating all 1000 paths. Could it also be that the script is simulating each individual point for 252 realizations instead of simulating the full process? If so, would that stop me from generating all 1000 paths? Is it also possible that the array U hasn't been generated correctly? U[0] must equal 1, and so must the first realization U(1) = 1. The code is below; I'm pretty stuck trying to figure this out, so any help is appreciated.
#Simulating Geometric Brownian motion (GMB)
tau <- 1 #time to expiry
N <- 253 #number of sub intervals
dt <- tau/N #length of each time sub interval
time <- seq(from=0, to=N, by=dt) #time moments in which we simulate the process
length(time) #it should be N+1
mu <- 0 #GBM parameter 1
sigma <- 0.4 #GBM parameter 2
s0 <- 1 #GBM parameter 3
#simulate Geometric Brownian motion path
dwt <- rnorm(N, mean = 0, sd = 1) #standard normal sample of N elements
dW <- dwt*sqrt(dt) #Brownian motion increments
W <- c(0, cumsum(dW)) #Brownian motion at each time instant N+1 elements
#Define U Array and set initial values of U
U <- array(0, c(N,1)) #array of U
U[0] = 1
U[1] <- s0 #first element of U is s0. with the for loop we find the other N elements
for(i in 2:length(U)){
  U[i] <- (U[1]*exp(mu - 0.5*sigma^2*i*dt + sigma*W[i-1]))*exp(-mu*i)
}
#Plot
plot(ts(U), main = expression(paste("Simulation of Ut")))
This question is quite difficult to answer since there are a lot of unclear things, at least to me.
To begin with, length(time) is equal to 64010, not N + 1, which would be 254.
If I understand correctly, the Brownian motion function returns the position in one dimension at a given time. Hence, to calculate this position for each time, the following may be enough:
s0*exp((mu - 0.5*sigma^2)*time + sigma*rnorm(length(time),0,time))
However, this calculates 64010 points, not 253. If you replicate it 1000 times, it gives 64010000 points, which is quite a lot.
> B <- 1000
> res <- replicate(B, {
+ s0*exp((mu - 0.5*sigma^2)*time + sigma*rnorm(length(time),0,time))
+ })
> length(res)
[1] 64010000
> dim(res)
[1] 64010 1000
I know I'm missing the second part, the one explained here, but I actually don't fully understand what you need there. If you can write out the formula, maybe I can help you.
In general, avoid programming in R with for loops that iterate over vectors. R is a vectorized language, and there is usually no need for them. If you want to run the same code B times, the replicate(B, { your code }) function is your friend.
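To make that concrete for this question, here is a sketch that generates 1000 full paths at once with replicate, using the parameters stated in the question (N = 252 steps, dt = 1/252, sigma = 0.4, mu treated as 0, paths starting at 1); the variable names here are my own:

# Simulate B paths of U_t = S_t * exp(-mu*t) on a grid of N steps
N     <- 252          # number of steps
dt    <- 1 / N        # discretization step
mu    <- 0            # drift, treated as zero as in the question
sigma <- 0.4          # volatility
s0    <- 1            # starting value, so the first realization is 1
B     <- 1000         # number of paths

t_grid <- (0:N) * dt  # time grid, N + 1 points

set.seed(1)
paths <- replicate(B, {
  dW <- rnorm(N, mean = 0, sd = sqrt(dt))                    # Brownian increments
  W  <- c(0, cumsum(dW))                                     # Brownian motion, N + 1 points
  S  <- s0 * exp((mu - 0.5 * sigma^2) * t_grid + sigma * W)  # geometric Brownian motion
  S * exp(-mu * t_grid)                                      # U_t = S_t * exp(-mu*t)
})

dim(paths)  # (N + 1) x B, i.e. 253 x 1000: one column per path
matplot(t_grid, paths[, 1:20], type = "l", lty = 1,
        xlab = "t", ylab = "U", main = "20 of the 1000 simulated paths")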

R function logLik() returning -Inf?

I am simulating an SIR model in R and have a data set I am trying to fit the model to. I am currently using the particle filter function, and would then like to use the corresponding logLik method on the result. When I do this, I get "[1] -Inf" as a result. I can't find in the documentation why this happens or how I can avoid it. Are my parameters for the model not accurate enough? Is there something else wrong?
My function looks like this:
SIRsim %>%
pfilter(Np=5000) -> pf
logLik(pf)
The R script below is from an online course lesson entitled "Likelihood for POMPs" (https://kingaa.github.io/sbied/pfilter/). However, the code works there; I'm not sure how to reproduce my specific problem with it, and unfortunately I cannot share the dataset or code I am using because it is for academic research.
library(tidyverse)
library(pomp)
options(stringsAsFactors=FALSE)
stopifnot(packageVersion("pomp")>="3.0")
set.seed(1350254336)
library(tidyverse)
library(pomp)
sir_step <- Csnippet("
double dN_SI = rbinom(S,1-exp(-Beta*I/N*dt));
double dN_IR = rbinom(I,1-exp(-mu_IR*dt));
S -= dN_SI;
I += dN_SI - dN_IR;
R += dN_IR;
H += dN_IR;
")
sir_init <- Csnippet("
S = nearbyint(eta*N);
I = 1;
R = nearbyint((1-eta)*N);
H = 0;
")
dmeas <- Csnippet("
lik = dbinom(reports,H,rho,give_log);
")
rmeas <- Csnippet("
reports = rbinom(H,rho);
")
read_csv("https://kingaa.github.io/sbied/pfilter/Measles_Consett_1948.csv")
%>%
select(week,reports=cases) %>%
filter(week<=42) %>%
pomp(
times="week",t0=0,
rprocess=euler(sir_step,delta.t=1/7),
rinit=sir_init,
rmeasure=rmeas,
dmeasure=dmeas,
accumvars="H",
statenames=c("S","I","R","H"),
paramnames=c("Beta","mu_IR","eta","rho","N"),
params=c(Beta=15,mu_IR=0.5,rho=0.5,eta=0.06,N=38000)
) -> measSIR
measSIR %>%
pfilter(Np=5000) -> pf
logLik(pf)
library(doParallel)
library(doRNG)
registerDoParallel()
registerDoRNG(652643293)
foreach (i=1:10, .combine=c) %dopar% {
measSIR %>% pfilter(Np=5000)
} -> pf
logLik(pf) -> ll
logmeanexp(ll,se=TRUE)
If I set Beta=100 in the code above I can get a negative-infinite log-likelihood.
Replacing the measurement-error snippet with this:
dmeas <- Csnippet("
double ll = dbinom(reports,H,rho,give_log);
lik = (!isfinite(ll) ? -1000 : ll );
")
appears to 'solve' the problem, although you should be a little bit careful; papering over numerical cracks like this is sometimes OK, but could conceivably come back to bite you in some way later on. If you just need to avoid non-finite values long enough to get into a reasonable parameter range this might be OK ...
Some guesses as to why this is happening:
You are somehow getting an "impossible" situation, such as a positive number of reported cases when the underlying true number of infections is zero.
Sometimes non-finite log-likelihoods occur when a very small positive probability underflows to zero. The equivalent here is likely that the probability of infection 1-exp(-Beta*I/N*dt) goes to 1.0; then any observed outcome where less than 100% of the population is infected is impossible.
You can try to diagnose the situation by seeing what the filtered trajectory actually looks like and comparing it with the data, or by adding debugging statements to the code. If there's a way to run just the deterministic simulation with your parameter values that might tell you pretty quickly what's going wrong.
An easier/more direct way to debug would be to replace the Csnippet you're using for dmeas with an R function: this will be slower but easier to work with (especially if you're not familiar with C coding). If you uncomment the browser() statement below, the code will drop into debug mode when you encounter the bad situation ...
dmeas <- function(reports, H, rho, log, ...) {
  lik <- dbinom(reports, size = H, prob = rho, log = log)
  if (!is.finite(lik)) {
    lik <- -1000
    ## browser()
  }
  return(lik)
}
For example, the debugger stops with the following arguments in scope:
(t = 3, reports = 2, S = 2280, I = 0, R = 35721, H = 0, Beta = 100, mu_IR = 0.5, rho = 0.5, eta = 0.06, N = 38000, log = TRUE)
Browse[1]> debug at /tmp/SO65554258.R!ZlSILG#7: return(lik)
Browse[2]> reports
[1] 2
Browse[2]> H
[1] 0
Browse[2]> rho
[1] 0.5
This shows that the problem is indeed that you have a positive number of reported cases when there have been zero infections ... R is trying to compute the binomial probability of observing reports cases when there are H infections that are potentially reportable, each reported with probability rho. When the number of trials N in a binomial distribution Binom(N, p) is zero, the only possible outcome is zero 'successes' (reported cases), with probability 1. All other outcomes have probability 0 (and log-probability -Inf).
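You can see this directly in base R, independent of pomp (a minimal check of the situation caught in the browser session above):

dbinom(2, size = 0, prob = 0.5, log = TRUE)  # reports = 2 with H = 0: impossible
# [1] -Inf
dbinom(0, size = 0, prob = 0.5, log = TRUE)  # zero reports is the only possible outcome
# [1] 0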

Optimization - Limits and simple constraint

I have a rather simple optimization question, and while I'm fairly decent with R, optimization is something I haven't done much of.
my.function <- function(parameters){
  x <- parameters[1]
  y <- parameters[2]
  z <- parameters[3]
  ((10*x^2) - ((y/2) * (z/4)))^2
}
result <- optim(c(7,10,18),fn = my.function, method = 'L-BFGS-B',
lower = c(2,7,7),
upper = c(15,20,20))
result$par
#[1] 2.205169 19.546621 19.902243
This is a made-up version of the problem I'm working on, so please forgive it if its purpose makes no sense. I have box limits in place using the 'L-BFGS-B' method, but I need to add a constraint and I'm unsure how to do it. The rules I'm trying to implement are as follows:
x must be between 2 and 15
y must be between 7 and 20
z must be between 7 and 20
z <= y
It's the last one I don't know how to implement. Any help would be appreciated. Thank you.
Add a large number to the objective function if the constraint is violated, i.e. change the last line of my.function to:
((10*x^2) - ((y/2) * (z/4)))^2 + ifelse(y > z, 10^5, 0)
The result in this case is the following, which does satisfy the constraint. Also, since the objective is non-negative, its value cannot be less than 0, so we have achieved the minimum to within numeric tolerance.
result$par
## [1] 2.223537 19.776462 20.000000
result$value
## [1] 1.256682e-11
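For reference, here is the complete penalized version as one runnable block (a restatement of the answer above; my.function.pen is just a renamed copy of my.function with the penalty term added):

my.function.pen <- function(parameters){
  x <- parameters[1]
  y <- parameters[2]
  z <- parameters[3]
  # large penalty added whenever y > z; 1e5 is arbitrary, it only needs to dwarf the objective
  ((10*x^2) - ((y/2) * (z/4)))^2 + ifelse(y > z, 1e5, 0)
}

result <- optim(c(7, 10, 18), fn = my.function.pen, method = 'L-BFGS-B',
                lower = c(2, 7, 7),
                upper = c(15, 20, 20))
result$par
result$value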

optim function in R: L-BFGS-B needs finite values of 'fn'

I am trying to use the optim function in R for an MLE of three variables, but I keep getting the error: Error in optim(fn = logL_geotest5_test, par = c(0.2, 1.5, 0.3), I = I, :
L-BFGS-B needs finite values of 'fn'
I am trying to understand the reasons behind this error, and it seems to be related to the maximal value of the log-likelihood being beyond .Machine$double.xmax.
This code is part of the geometric VaR backtesting method of Pelletier & Wei, and I can provide you with the log-likelihood. However, the optimization worked (and occasionally didn't) before, so I assume that this is not the problem. If you wish, I can provide you with the formula for the LL, but it is long code (and I wanted to keep this post as short as possible).
I am thankful for any suggestions and ideas.
V is a vector of 250 values.
N <- 100
iTest <- mat.or.vec(250, N)
iTest <- replicate(n = N, rbinom(n = 250, size = 1, prob = 0.01))
LL_H0 <- mat.or.vec(1, N)
for(i in 1:N){
  I <- iTest[,i]
  logL_gtest <- function(Omega, I, VaR){
    a = Omega[1]; b = Omega[2]; z = Omega[3]
    logL(I, a, b, z, VaR)
  }
  lower_boundary <- c(1e-8, 0, 0); upper_boundary <- rep(1, 2, 2)
  LL_H0help <- optim(fn = logL_gtest, par = c(0.2, 1.5, 0.3), I = I, VaR = VaR,
                     lower = lower_boundary, upper = upper_boundary, method = "L-BFGS-B")
  LL_H0[,i] <- LL_H0help$value
}
Edit 1:
Thank you for your advice so far. I am still looking for the right place to insert the browser() call. Meanwhile, here is the LL function:
logL <- function(I, a, b, z, VaR){
  m <- sum(I)
  v <- which(I == 1)
  v <- c(0, v, 250)
  d <- c(diff(v))
  if(a < 0 | a >= 1 | b < 0 | b > 1 | z < 0 | (m-1) < 3){
    logL <- NA
  }else{
    s <- rep(0, length(d))
    f <- rep(0, length(d))
    for(i in 1:length(d)){
      lambda <- mat.or.vec(length(d), 1)
      lambda <- function(a, b, z, d, VaR){
        lambda <- a*ds^(b-1)*(exp(-z*(VaR1)))
        return(lambda)
      }
      VaR1 <- VaR[(v[i]+1):v[i+1]]
      ds <- seq(1:d[i])
      lhelp <- lambda(a, b, z, ds, VaR1)
      lhelp <- na.omit(lhelp)
      lf <- c(1-lhelp[1:(length(lhelp)-1)], lhelp[length(lhelp)])
      f[i] <- prod(lf)
      ls <- c(1-lhelp[1:(length(lhelp)-1)])
      s[i] <- prod(ls)
    }
    part1 <- ifelse(d[1] > 0, log(s[1]), log(f[1]))
    part2 <- sum(log(f[2:(length(d)-1)]))
    part3 <- ifelse(d[length(d)] < 250, log(s[length(d)]), log(f[length(d)]))
    logL <- part1 + part2 + part3
    return(-logL)
  }
}
Edit 2: I forgot to mention that V is a vector of Value-at-Risk computations, so its values are small, around -0.02.
Edit 3: Thank you for your suggestions so far. I replaced every V with VaR and every c with z. VaR is a vector of computed Value-at-Risk values of length 250; all values are roughly between -0.018 and -0.024.
I doubt anybody can guess what the issue is but I can tell you how to debug it yourself:
Use something like:
browser(expr=yourVariable==Inf)
in your likelihood code so that you can explore the variables' values and understand where the non-finite value comes from. Check this function's help page, which is, as usual in R, very helpful. Feel free to edit this answer if there is a typo; I cannot check it in R right now.
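A minimal sketch of how that could be wired into the code above (logL_debug is a hypothetical wrapper name; the condition uses is.finite so that NA and NaN are caught as well as Inf):

logL_debug <- function(I, a, b, z, VaR) {
  val <- logL(I, a, b, z, VaR)       # the original likelihood defined above
  browser(expr = !is.finite(val))    # pauses only when val is Inf, -Inf, NA or NaN
  val
}

You would then call logL_debug in place of logL inside logL_gtest and inspect a, b, z, lhelp, f and s interactively when the debugger opens.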

possible bug in `rbinom()` for large numbers of trials

I've been writing some code that iteratively performs binomial draws (using rbinom), and for some callee arguments I can end up with the size being large, which causes R (3.1.1, both the official and Homebrew builds tested, so it is unlikely to be compiler related) to return an unexpected NA. For example:
rbinom(1,2^32,0.95)
is what I'd expect to work, but gives NA back. However, running with size=2^31 or prob≤0.5 works.
The fine manual mentions that inversion is used when size < .Machine$integer.max is false; could this be the issue?
Looking at the source, rbinom does the equivalent (in C code) of the following for such large sizes:
qbinom(runif(n), size, prob, FALSE)
And indeed:
set.seed(42)
rbinom(1,2^31,0.95)
#[1] 2040095619
set.seed(42)
qbinom(runif(1), 2^31, 0.95, F)
#[1] 2040095619
However:
set.seed(42)
rbinom(1,2^32,0.95)
#[1] NA
set.seed(42)
qbinom(runif(1), 2^32, 0.95, F)
#[1] 4080199349
As @BenBolker points out, rbinom returns an integer, and if the return value is larger than .Machine$integer.max (e.g., larger than 2147483647 on my machine), NA gets returned. In contrast, qbinom returns a double. I don't know why, and it doesn't seem to be documented.
So, it seems like there is at least undocumented behavior and you should probably report it.
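The integer ceiling itself is easy to see in plain R (this just illustrates the coercion, not the rbinom internals):

.Machine$integer.max
# [1] 2147483647
as.integer(4080199349)  # the value returned by qbinom above exceeds the integer range
# [1] NA
# Warning message:
# NAs introduced by coercion to integer range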
I agree (in the absence of documentation saying this is expected behaviour) that this is a bug. A reasonable workaround would be to use the Normal approximation, which should be very good indeed (and faster) for such large values. (I originally meant this to be short and simple, but it ended up getting a little out of hand.)
rbinom_safe <- function(n, size, prob, max.size = 2^31) {
  maxlen <- max(length(size), length(prob), n)
  prob <- rep(prob, length.out = maxlen)
  size <- rep(size, length.out = maxlen)
  res <- numeric(n)
  bigvals <- size > max.size
  if (nbig <- sum(bigvals > 0)) {
    m <- (size*prob)[bigvals]
    sd <- sqrt(size*prob*(1-prob))[bigvals]
    res[bigvals] <- round(rnorm(nbig, mean = m, sd = sd))
  }
  if (nbig < n) {
    res[!bigvals] <- rbinom(n - nbig, size[!bigvals], prob[!bigvals])
  }
  return(res)
}
set.seed(101)
size <- c(1,5,10,2^31,2^32)
rbinom_safe(5,size,prob=0.95)
rbinom_safe(5,3,prob=0.95)
rbinom_safe(5,2^32,prob=0.95)
The Normal approximation should work reasonably well whenever the mean is many standard deviations away from 0 or 1 (whichever is closer). For large N this should be OK unless p is very extreme. For example:
n <- 2^31
p <- 0.95
m <- n*p
sd <- sqrt(n*p*(1-p))
set.seed(101)
rr <- rbinom_safe(10000,n,prob=p)
hist(rr,freq=FALSE,col="gray",breaks=50)
curve(dnorm(x,mean=m,sd=sd),col=2,add=TRUE)
dd <- round(seq(m-5*sd,m+5*sd,length.out=101))
midpts <- (dd[-1]+dd[-length(dd)])/2
lines(midpts,c(diff(sapply(dd,pbinom,size=n,prob=p))/diff(dd)[1]),
col="blue",lty=2)
This is the intended behaviour, but there are two issues:
1) The NA induced by coercion should raise a warning
2) The fact that discrete random variables have storage mode integer should be documented.
I have fixed 1) and will modify the documentation to fix 2) when I have a little more time.
