Efficient simulation of Brownian motion with drift in R

I want to efficiently simulate a Brownian motion with drift d>0, where the direction of the drift changes whenever one of the barriers b or -b is exceeded (no reflection, just a change of drift direction).
A for loop is the simple way to do this:
step<-0.1 #step size
sig<-1 #sign of drift
T<-10^4 #length of process
b<-300; d<-0.5#barrier and drift
W<-rep(NA,(T/step))
W[1]<-0
for (i in 2:(T/step))
{
if (W[i-1]>b) {sig<- -1} #change drift to -1
if (W[i-1]< -b) {sig<-1} #change drift to +1
W[i]<-W[i-1]+rnorm(1,d*sig*step,sqrt(step))
}
Of course this loop takes a long time in R, especially for a small step size.
Therefore, I am interested in a more efficient solution, maybe using vectorized operations or the apply() family. (For a simple Brownian motion I can use cumsum(rnorm()); is a similar solution possible here?)
Thanks a lot!

You have a recursive calculation for W[i] and sig, with a bit of logic at each step. There's probably not much you can do in pure R to reduce the execution time by a large factor, but a couple of changes will cut it by almost 50%. In particular, rather than calling rnorm at every step, vectorize the random draws: call rnorm once with mean=0 to generate num_steps values and store the result. In each step of the loop, the stored value is then added to the drift term for that step. The logic determining the value of sig can also be simplified a bit. The posted approach and the new code, both with timings, are:
step<-0.1 #step size
T<-10^4 #length of process
b<-300; d<-0.5 #barrier and drift
print(system.time({
sig <- 1 #sign of drift
set.seed(123) # set seed
W<-rep(NA,(T/step))
W[1]<-0
for (i in 2:(T/step))
{
if (W[i-1]>b) {sig<- -1} #change drift to -1
if (W[i-1]< -b) {sig<-1} #change drift to +1
W[i]<-W[i-1]+rnorm(1,d*sig*step,sqrt(step))
}
}))
print(system.time({
sig <- 1 # reset value of sig
set.seed(123) # reset seed
num_steps <- trunc(T/step)
W1 <- numeric(num_steps)
ep <- rnorm(num_steps, 0, sqrt(step))
for (i in 2:num_steps) {
if(abs(W1[i-1]) > b) sig <- ifelse( W1[i-1] >b, -1, 1)
W1[i] <- W1[i-1]+d*sig*step +ep[i-1]
}
}))
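The results of the two calculations, W and W1, should be identical up to floating-point rounding, because both consume the same random-number stream; all.equal(W, W1) is a quick way to check.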
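One way to get closer to the cumsum(rnorm()) idea from the question is to apply cumsum in chunks: the drift sign only changes when a barrier is crossed, so everything between two crossings can be computed in one vectorized call. The following is a sketch of that idea and is not part of the timings above. The function name simulate_chunked and the smaller parameter values (n, b) are assumptions, chosen so that the example runs quickly and crosses the barriers several times.

```r
# Sketch: Brownian motion with drift whose sign flips at the barriers +b / -b,
# computed with one cumsum() per barrier crossing instead of one step at a time.
# simulate_chunked is a hypothetical helper name; parameter values are illustrative.
simulate_chunked <- function(ep, d, step, b) {
  n <- length(ep) + 1          # path length, including the starting value 0
  W <- numeric(n)
  sig <- 1                     # current sign of the drift
  pos <- 1                     # index of the last value already filled in
  while (pos < n) {
    idx <- pos:(n - 1)         # increments not yet applied
    path <- W[pos] + cumsum(d * sig * step + ep[idx])  # candidates for W[pos+1..n]
    # first candidate that crosses the barrier relevant for the current drift
    hit <- if (sig > 0) which(path > b)[1] else which(path < -b)[1]
    if (is.na(hit)) {          # no further crossing: accept the rest and stop
      W[(pos + 1):n] <- path
      break
    }
    W[(pos + 1):(pos + hit)] <- path[seq_len(hit)]     # accept up to the crossing
    pos <- pos + hit
    sig <- -sig                # flip the drift and continue from the crossing point
  }
  W
}

# Reference: the step-by-step loop, fed the same pre-drawn increments.
set.seed(123)
n <- 10000; step <- 0.1; d <- 0.5; b <- 20
ep <- rnorm(n - 1, 0, sqrt(step))
W_loop <- numeric(n)
sig <- 1
for (i in 2:n) {
  if (abs(W_loop[i - 1]) > b) sig <- if (W_loop[i - 1] > b) -1 else 1
  W_loop[i] <- W_loop[i - 1] + d * sig * step + ep[i - 1]
}

W_chunk <- simulate_chunked(ep, d, step, b)
print(all.equal(W_loop, W_chunk))  # compares the two paths up to rounding
```

Each crossing triggers a cumsum over all remaining steps, so the gain is largest when crossings are rare (a large b relative to drift and noise). With very frequent crossings the plain loop may be the better choice.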

Related

Simulating a process n times in R

I've written an R script (sourced from here) simulating the path of a geometric Brownian motion of a stock price, and I need the simulation to run 1000 times so that I generate 1000 paths of the process Ut = St*e^(-mu*t), by discretizing the law of motion derived from Ut, which is the bottom line of the solution to the question posted here.
The process has n = 252 steps and a discretization step of 1/252, a risk parameter sigma = 0.4, and an instantaneous drift mu, which I've treated as zero, although I'm not sure about this. I can generate one single path but am struggling to simulate 1000 paths of the process. I'm unsure which variables I need to change, or whether there's an issue in my for loop that's preventing me from generating all 1000 paths. Could it also be that the script is simulating each individual point for 252 realizations instead of simulating the full process? If so, would this prevent me from generating all 1000 paths? Is it also possible that the array I've defined as U hasn't been generated correctly? U[0] must equal 1, and so must the first realization U(1) = 1. The code is below. I'm pretty stuck trying to figure this out, so any help is appreciated.
#Simulating Geometric Brownian motion (GBM)
tau <- 1 #time to expiry
N <- 253 #number of sub intervals
dt <- tau/N #length of each time sub interval
time <- seq(from=0, to=N, by=dt) #time moments in which we simulate the process
length(time) #it should be N+1
mu <- 0 #GBM parameter 1
sigma <- 0.4 #GBM parameter 2
s0 <- 1 #GBM parameter 3
#simulate Geometric Brownian motion path
dwt <- rnorm(N, mean = 0, sd = 1) #standard normal sample of N elements
dW <- dwt*sqrt(dt) #Brownian motion increments
W <- c(0, cumsum(dW)) #Brownian motion at each time instant N+1 elements
#Define U Array and set initial values of U
U <- array(0, c(N,1)) #array of U
U[0] = 1
U[1] <- s0 #first element of U is s0. with the for loop we find the other N elements
for(i in 2:length(U)){
U[i] <- (U[1]*exp(mu - 0.5*sigma^2*i*dt + sigma*W[i-1]))*exp(-mu*i)
}
#Plot
plot(ts(U), main = expression(paste("Simulation of Ut")))
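This question is quite difficult to answer since a lot of things are unclear, at least to me.
To begin with, length(time) is equal to 64010, not N + 1, which would be 254. This is because seq(from=0, to=N, by=dt) runs all the way to N = 253 in steps of 1/253, rather than stopping at tau.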
If I understand correctly, the geometric Brownian motion formula gives the position in one dimension at a given time. Since the Brownian term W(t) is normal with mean 0 and standard deviation sqrt(t), the following can be enough to sample this position at each time (each time point is drawn independently here, so this reproduces the distribution at each t rather than one connected path):
s0*exp((mu - 0.5*sigma^2)*time + sigma*rnorm(length(time),0,sqrt(time)))
However, this calculates 64010 points, not 253. If you replicate it 1000 times, it gives 64010000 points, which is quite a lot.
> B <- 1000
> res <- replicate(B, {
+ s0*exp((mu - 0.5*sigma^2)*time + sigma*rnorm(length(time),0,sqrt(time)))
+ })
> length(res)
[1] 64010000
> dim(res)
[1] 64010 1000
I know I'm missing the second part, the one explained here, but I don't fully understand what you need there. If you can write out the formula, maybe I can help you.
In general, avoid using for loops to iterate over vectors in R. R is a vectorized language, so they are often unnecessary. If you want to run the same code B times, the replicate(B, { your code }) function is your friend.
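To make the vectorization advice concrete for the question as asked, here is a minimal sketch that builds all 1000 paths at once from cumulative sums of Brownian increments. It assumes 252 steps of size 1/252, mu = 0, sigma = 0.4 and s0 = 1 as described in the question, and that Ut = St*e^(-mu*t) with St the usual geometric Brownian motion solution. The variable names n_steps, n_paths and t_grid are mine, not the asker's.

```r
# Sketch: 1000 GBM paths in one go; one column per path, one row per time point.
# Parameter values follow the question; names are illustrative assumptions.
set.seed(42)
n_steps <- 252                     # steps per path
n_paths <- 1000                    # number of simulated paths
dt <- 1 / n_steps                  # discretization step
mu <- 0; sigma <- 0.4; s0 <- 1
t_grid <- (0:n_steps) * dt         # time points 0, dt, ..., 1

# Brownian increments: each column holds the n_steps increments of one path
dW <- matrix(rnorm(n_steps * n_paths, mean = 0, sd = sqrt(dt)), nrow = n_steps)

# Brownian motion: cumulative sums down each column, with W = 0 at time 0
W <- rbind(0, apply(dW, 2, cumsum))

# t_grid has one entry per row, so it recycles correctly down every column
S <- s0 * exp((mu - 0.5 * sigma^2) * t_grid + sigma * W)
U <- S * exp(-mu * t_grid)

dim(U)                             # rows = time points, columns = paths
```

A call such as matplot(t_grid, U[, 1:20], type = "l", lty = 1) would plot the first 20 paths. Because W is built with cumsum, each column is a connected path that starts at s0.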

How to find the probability of extinction = 1 using Galton-Watson process in R?

I am simulating a basic Galton-Watson process (GWP) using a geometric distribution. I'm using this to find the probability of extinction for each generation. My question is, how do I find the generation at which the probability of extinction is equal to 1?
For example, I can create a function for the GWP like so:
# Galton-Watson Process for geometric distribution
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- sum(rgeom(Sn[i - 1], p))
}
return(Sn)
}
where n is the number of generations.
Then, if I set the geometric distribution parameter p = 0.25, I can calculate the probability of extinction for, say, generation 10 like this:
N <- 10 # Number of replicate simulations (each starts from one individual)
GWn <- replicate(N, GWP(10, 0.25)[10])
probExtinction <- sum(GWn==0)/N
probExtinction
This gives me the probability of extinction for generation 10. To find the probability of extinction for each generation, I have to change the index value (to the corresponding generation number) when creating GWn. But what I'm trying to do is find the generation at which the probability of extinction equals 1.
Any suggestions as to how I might go about solving this problem?
I can tell you how you would do this problem in principle, but I suspect you may run into some difficulties (if you already know everything I'm about to say, just take it as advice to the next reader):
Theoretically, the extinction probability of a Galton-Watson process never reaches exactly 1 in any finite generation (unless prob==1); it can at best approach 1 in the infinite-time limit.
Of course, for any given set of replicates and random-number seed you can compute the first time point (if any) at which all of your lineages have gone extinct. This will be highly variable across runs, depending on the random-number seed.
The distribution of extinction times is extremely skewed; lineages that don't go extinct immediately will last a loooong time.
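The claim that the extinction probability never reaches exactly 1 can be checked without simulation. For a process started from one individual, the probability of being extinct by generation k is obtained by iterating the offspring probability generating function starting from 0. The sketch below assumes the parametrization used by rgeom(), P(X = k) = p(1-p)^k, whose generating function is p / (1 - (1-p)s); the function names pgf_geom and extinct_by_gen are mine.

```r
# Sketch: exact extinction probability by generation for geometric offspring.
# Assumes the rgeom() parametrization P(X = k) = p * (1 - p)^k, k = 0, 1, ...
pgf_geom <- function(s, p) p / (1 - (1 - p) * s)

extinct_by_gen <- function(n, p) {
  q <- numeric(n + 1)              # q[k + 1] = P(extinct by generation k); q[1] = 0
  for (k in seq_len(n)) {
    q[k + 1] <- pgf_geom(q[k], p)  # q_{k+1} = f(q_k)
  }
  q
}

q_super <- extinct_by_gen(100, 0.25)  # mean offspring 3: levels off below 1
q_crit  <- extinct_by_gen(100, 0.5)   # mean offspring 1: creeps towards 1

tail(q_super, 1)
tail(q_crit, 1)
```

With p = 0.25 the sequence settles at the smaller fixed point of s = f(s), which is p/(1-p) = 1/3. With p = 0.5 the iteration gives k/(k+1) in generation k, which gets arbitrarily close to 1 without ever equalling it.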
I modified your GWP function in two ways to make it more efficient: (1) stop the simulation when the lineage goes extinct; (2) replace the sum of geometric deviates with a single negative binomial deviate (see here)
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- rnbinom(1, size=Sn[i - 1], prob=p)
if (Sn[i]==0) break ## extinct, bail out
}
return(Sn)
}
The basic strategy now is: (1) run the simulations for a while, keep the entire trajectory; (2) compute extinction probability in every generation; (3) find the first generation such that p==1.
set.seed(101)
N <- 10 # Number of replicate simulations (each starts from one individual)
maxgen <- 100
GWn <- replicate(N, GWP(maxgen, 0.5), simplify="array")
probExtinction <- rowSums(GWn==0)/N
which(probExtinction==1)[1]
(Subtract 1 from the last result if you want to start indexing from generation 0.) In this case the answer is NA, because 1 of the 10 lineages manages to stay alive (and indeed gets very large, so it will probably persist almost forever).
plot(0:maxgen, probExtinction, type="s") ## plot extinction probability
matplot(1+GWn,type="l",lty=1,col=1,log="y") ## plot lineage sizes (log(1+x) scale)
## demonstration that (sum(rgeom(n,...)) is equiv to rnbinom(1,size=n,...)
nmax <- 70
plot(prop.table(table(replicate(10000, sum(rgeom(10, prob=0.3))))),
xlim=c(0,nmax))
points(0:nmax,dnbinom(0:nmax, size=10, prob=0.3), col=2,pch=16)

How does the ODE function in R do the calculation

I am using the ode function in R (from the deSolve package) to solve this equation:
library(deSolve)
FluidH <- function(t,state,parameters) {
with(as.list(c(state,parameters)), {
dh <- Qin/A - ((5073.3*h^2+6430.1*h)/(60*A))
list(c(dh))
})
}
parameters <- c(Qin =10, A=6200)
state<- c(h=0.35)
time <- seq(0,2000,by=1)
out <- ode(y= state, func = FluidH, parms = parameters, times = time)
I might be missing something with the math, but when I calculate h myself starting from the initial state, I don't get the same numbers as the output of the function!
For example, to calculate h at time 1 with h0 = 0.35: h = h0 + dh*dt -> h = 0.35 + 10/6200 - ((5073.3*h0^2+6430.1*h0)/(60*6200)) = 0.3438924348,
while the output of ode gives 0.343973044412394.
Can anyone tell me what I am missing?
You computed the Euler step with step size dt=1. The solver uses a higher-order method with a (usually) smaller step size that is adapted to meet the default error tolerances of 1e-6 for relative and absolute error. The step size of 1 that you give only determines where the numerical solution is sampled for the output; internally the solver may take many more, or sometimes even fewer, steps (interpolating the output values).
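The gap between the two numbers can be reproduced without deSolve. The sketch below compares one Euler step of size 1 with the same interval integrated in 100 small steps. A fixed-step classical Runge-Kutta (RK4) scheme stands in for the solver purely for illustration; it is not the adaptive method that ode() uses by default, and the helper names dh and rk4 are mine.

```r
# Sketch: one Euler step versus a finely resolved integration over [0, 1].
# RK4 with fixed steps is an illustrative stand-in, not deSolve's default method.
Qin <- 10; A <- 6200; h0 <- 0.35

# Right-hand side of the ODE from the question
dh <- function(h) Qin / A - (5073.3 * h^2 + 6430.1 * h) / (60 * A)

# A single Euler step with dt = 1 (the hand calculation in the question)
h_euler <- h0 + dh(h0) * 1

# Classical fourth-order Runge-Kutta with n_sub steps of size dt
rk4 <- function(h, dt, n_sub) {
  for (k in seq_len(n_sub)) {
    k1 <- dh(h)
    k2 <- dh(h + 0.5 * dt * k1)
    k3 <- dh(h + 0.5 * dt * k2)
    k4 <- dh(h + dt * k3)
    h <- h + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  h
}
h_fine <- rk4(h0, dt = 1 / 100, n_sub = 100)

print(c(euler = h_euler, fine = h_fine), digits = 10)
```

The finely resolved value lands close to the ode() output quoted in the question, while the single Euler step reproduces the hand-calculated number. The difference is the truncation error of taking one large first-order step.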

R: Autocorrelation in a matrix

I have to do a Monte Carlo study of an AR(1) time series. I have to generate 10,000 time series of length 100, and afterwards I have to get the lag-1 autocorrelation rho_1 for every time series. My problem is that I only get NA values for the autocorrelation, and the calculation takes way too much time. I have no problem computing the AR(1) time series.
Thank you for your help :)
gen_ar <- function(a,b,length,start)
{
z<-rep(0,length)
e<-rnorm(n=length,sd=1)
z[1]<-start
for (i in 2:length)
{
z[i]<-a+b*z[i-1]+e[i]
}
z
}
mc <- matrix(c(rep(0,10000000)),nrow=10000)
for (i in 1:10000)
{
mc[i,] <- gen_ar(0.99,1,100,0)
}
ac <- matrix(c(rep(0,10000)),nrow=1)
for (i in 1:10000){
for (j in 1:99){
ac[i] <- cor(mc[i,j],mc[i,j+1])
}
}
Statistics aside, I think this achieves your goals, and I don't get NAs. I changed the way it was done because you said it was running slowly.
mc <- matrix(NA_real_, nrow=100, ncol=1E4) # one column per series: 10,000 series of length 100
for(i in seq_len(ncol(mc))){
mc[,i] <- arima.sim(model=list(ar=0.99), n=100, sd=1) + 1
}
myAR <- function(x){
cor(x[-1], x[-length(x)])
}
answer <- apply(mc, 2, myAR)
I skipped the last set of nested for loops and replaced them with apply(). It seems easier to read, and is likely faster. Also, to use apply(), I created a function called myAR, which carries out the same calculation that cor() did in your for() loops.
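The NA values in the question come from calling cor() on two single numbers, for which a correlation is undefined. Below is a small sketch of the difference, together with a column-wise version of the same lag-1 correlation that avoids apply(). The AR coefficient 0.5 and the 50 columns are arbitrary choices to keep the example quick, and the names lagged, lead and rho1 are mine.

```r
# Sketch: why cor() on scalars gives NA, and a column-wise lag-1 correlation.
# AR coefficient and matrix size are illustrative assumptions.
set.seed(1)
mc <- replicate(50, as.numeric(arima.sim(model = list(ar = 0.5), n = 100)))

# Two single values carry no variance, so their correlation is undefined
scalar_cor <- suppressWarnings(cor(mc[1, 1], mc[2, 1]))
is.na(scalar_cor)

# Lag-1 correlation per column: pair each series with itself shifted by one step
lead   <- mc[-1, , drop = FALSE]          # x_2, ..., x_n
lagged <- mc[-nrow(mc), , drop = FALSE]   # x_1, ..., x_(n-1)
lead   <- sweep(lead, 2, colMeans(lead))       # centre each column
lagged <- sweep(lagged, 2, colMeans(lagged))
rho1 <- colSums(lead * lagged) / sqrt(colSums(lead^2) * colSums(lagged^2))

# Same numbers as the apply() approach, computed without a per-column call
rho1_apply <- apply(mc, 2, function(x) cor(x[-1], x[-length(x)]))
all.equal(rho1, rho1_apply)
```

For 10,000 short series either version should be fast enough; the column-wise form mainly shows that the whole calculation reduces to a few matrix operations.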
Now, there are a couple of statistical adjustments that I made. Primarily, these were in the simulation step.
First, your simulated AR(1) process has a coefficient that is equal to 1 (the call gen_ar(0.99,1,100,0) sets a=0.99 and b=1), which seems odd to me (this would not be stationary, and arima.sim() won't even let you simulate this type of process).
Moreover, your "a" parameter adds 0.99 to the time series at each time step. In other words, your time series trends upward from 0 to roughly 100 because the coefficient is equal to 1. This too would make your time series nonstationary, and with such a strong positive slope the cor() function would likely return a value close to 1 as the estimated correlation, regardless of the value of the simulated AR coefficient. I assume that you wanted an AR coefficient of 0.99 and a long-term mean that hovers near 1, so the 1 is simply added to the entire time series after it is simulated, not iteratively at each time step.
Assuming that you did want to generate a nonstationary time series by adding some constant (a) at each time step, you could do the following:
myInnov <- function(N=100, a=1, SD=1) {a + rnorm(n=N, sd=SD)}
mc2 <- matrix(NA_real_, nrow=100, ncol=1E4) # 10,000 series of length 100, as in the question
for(i in seq_len(ncol(mc2))){
mc2[,i] <- arima.sim(model=list(ar=0.99), n=100, innov=myInnov(a=1, N=100, SD=1)) + 1
}
I hope that this helps.

Fast loan rate calculation for a big number of loans

I have a big data set (around 200k rows) where each row is a loan. I have the loan amount, the number of payments, and the loan payment.
I'm trying to get the loan rate.
R doesn't have a function for calculating this (at least base R doesn't have it, and I couldn't find it).
It isn't that hard to write both an NPV and an IRR function:
Npv <- function(i, cf, t=seq(from=0,by=1,along.with=cf)) sum(cf/(1+i)^t)
Irr <- function(cf) { uniroot(Npv, c(0,100000), cf=cf)$root }
And you can just do
rate = Irr(c(-amt,rep(pmt,times=n))) # the loan amount enters with the opposite sign to the payments
The problem is when you try to calculate the rate for a lot of payments. Because uniroot is not vectorized, and because rep takes a surprising amount of time, you end up with a slow calculation. You can make it faster if you do some math and figure out that you are looking for the roots of the following equation
zerome <- function(r) amt/pmt-(1-1/(1+r)^n)/r
and then use that as input for uniroot. This, on my PC, takes around 20 seconds to run for my 200k-row data set.
The problem is that I'm trying to do some optimization, and this is a step of the optimization, so I'm trying to speed it up even more.
I've tried vectorization, but because uniroot is not vectorized, I can't go further that way. Is there any root finding method that is vectorized?
Thanks
Instead of using a root finder, you could use a linear interpolator. You will have to create one interpolator for each value of n (the number of remaining payments). Each interpolator will map (1-1/(1+r)^n)/r to r. Of course you will have to build a grid fine enough so it will return r to an acceptable precision level. The nice thing with this approach is that linear interpolators are fast and vectorized: you can find the rates for all loans with the same number of remaining payments (n) in a single call to the corresponding interpolator.
Now some code that proves it is a viable solution:
First, we create interpolators, one for each possible value of n:
n.max <- 360L # 30 years
one.interpolator <- function(n) {
r <- seq(from = 0.0001, to = 0.1500, by = 0.0001)
y <- (1-1/(1+r)^n)/r
approxfun(y, r)
}
interpolators <- lapply(seq_len(n.max), one.interpolator)
Note that I used a precision of 1/100 of a percent (1bp).
Then we create some fake data:
n.loans <- 200000L
n <- sample(n.max, n.loans, replace = TRUE)
amt <- 1000 * sample(100:500, n.loans, replace = TRUE)
pmt <- amt / (n * (1 - runif(n.loans)))
loans <- data.frame(n, amt, pmt)
Finally, we solve for r:
library(plyr)
system.time(ddply(loans, "n", transform, r = interpolators[[n[1]]](amt / pmt)))
# user system elapsed
# 2.684 0.423 3.084
It's fast. Note that some of the output rates are NA, but that is because my random inputs made no sense and would have produced rates outside the [0, 15%] grid I selected. Your real data shouldn't have that problem, as long as the true rates fall inside the grid.
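To see how accurate the interpolation is, one can generate loans from known rates and compare. The sketch below does this in base R, grouping the loans by n with split() instead of plyr. The fake loans, the 1000-row sample size and the names true.r, ratio and r.hat are assumptions for illustration; the interpolators are built exactly as above.

```r
# Sketch: recover known rates with the per-n interpolators, using base R only.
# The loans are synthetic, with rates drawn inside the interpolation grid.
n.max <- 360L
one.interpolator <- function(n) {
  r <- seq(from = 0.0001, to = 0.1500, by = 0.0001)
  y <- (1 - 1 / (1 + r)^n) / r    # annuity factor amt/pmt as a function of r
  approxfun(y, r)                 # inverse map: annuity factor -> rate
}
interpolators <- lapply(seq_len(n.max), one.interpolator)

set.seed(1)
n.loans <- 1000L
n <- sample(n.max, n.loans, replace = TRUE)
true.r <- runif(n.loans, 0.001, 0.149)             # known per-period rates
amt <- 1000 * sample(100:500, n.loans, replace = TRUE)
pmt <- amt * true.r / (1 - 1 / (1 + true.r)^n)     # payment implied by the rate

# One vectorized interpolator call per distinct value of n
ratio <- amt / pmt
r.hat <- numeric(n.loans)
groups <- split(seq_len(n.loans), n)
for (k in names(groups)) {
  idx <- groups[[k]]
  r.hat[idx] <- interpolators[[as.integer(k)]](ratio[idx])
}

max(abs(r.hat - true.r))          # worst-case error against the known rates
```

Because the annuity factor is monotone in r, the interpolated rate always lies between the two neighbouring grid points, so the error cannot exceed the grid spacing of 1bp. A finer grid would tighten it at the cost of larger interpolators.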
