The following problem asks us to generate a Poisson process step by step from the inter-arrival times ρ and the arrival times τ.
One of the theoretical results presented in the lectures gives the following direct method for simulating a Poisson process:
• Let τ_0 = 0.
• Generate i.i.d. exponential random variables ρ_1, ρ_2, ...
• Let τ_n = ρ_1 + ... + ρ_n for n = 1, 2, ...
• For each k = 0, 1, ..., let N_t = k for τ_k ≤ t < τ_{k+1}.
Using this method, generate a realization of a Poisson process (N_t)_t with λ = 0.5 on the interval [0, 20].
Generate 10000 realizations of a Poisson process (N_t)_t with λ = 0.5 and use your results to estimate E(N_t) and Var(N_t). Compare the estimates with the theoretical values.
My attempted solution:
First, I generated the values of ρ using the rexp() function in R.
rhos <-function(lambda, max1)
{
vec <- vector()
for (i in 1:max1)
{
vec[i] <- rexp(0.5)
}
return (vec)
}
Then I created the τs by cumulatively summing the ρs.
taos <- function(lambda, max)
{
rho_vec <- rhos(lambda, max)
#print(rho_vec)
vec <- vector()
vec[1] <- 0
sum <- 0
for(i in 2:max)
{
sum <- sum + rho_vec[i]
vec[i] <- sum
}
return (vec)
}
The following function is meant to find the value of Nt = k when k is given (say, k = 7).
Ntk <- function(lambda, max, k)
{
tao_vec <- taos(lambda, max)
val <- max(tao_vec[tao_vec < k])
}
y <- taos(0.5, 20)
x <- seq(0, 20-1, by=1)
plot(x,y, type="s")
Output:
As you can see, the plot of the Poisson process is blank rather than a staircase.
If I change rexp to exp, I get a staircase function, but all the steps are equal.
Why is my source code not producing the expected output?
It looks like you're using max1 to indicate how many times to sample the exponential distribution in your rhos function. I would recommend something like this:
rhosGen <- function(lambda, maxTime){
rhos <- NULL
i <- 1
while(sum(rhos) < maxTime){
samp <- rexp(n = 1, rate = lambda)
rhos[i] <- samp
i <- i+1
}
return(head(rhos, -1))
}
This will continue to sample from the exponential distribution until the sum of these holding times exceeds the length of the given interval. head() then removes the last sample, so that all of the events we keep track of definitely occur in our time interval of interest.
From here you have to generate the taos by summing the previous holding times (rhos):
taosGen <- function(lambda, maxTime){
  rhos <- rhosGen(lambda, maxTime)
  # cumulative sums of the holding times give the arrival times;
  # cumsum() also copes with the case where no event falls in the interval
  taos <- cumsum(rhos)
  return(taos)
}
Now that you have the taos we know at what time each event in the time interval (0,maxTime) occurs. This leads us to generating the associated Poisson Process by finding the value of the Nt for each t in the time interval:
ppGen <- function(lambda, maxTime){
taos <- taosGen(lambda, maxTime)
pp <- NULL
for(i in 1:maxTime){
pp[i] <- sum(taos <= i)
}
return(pp)
}
This generates the value of the Poisson process at each integer time in the interval. I suspect that part of your issue was putting the tao values on the y-axis instead of the count of events that had already occurred. The following code worked for me to produce a random-looking staircase, similar to your example.
y <- ppGen(0.5, 20)
# ppGen returns the counts at times 1, 2, ..., 20, so use the same times on the x-axis
x <- seq(1, 20, by=1)
plot(x,y, type="s")
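The second part of the problem (estimating E(Nt) and Var(Nt) from 10000 realizations) could be sketched along the following lines. This is only a minimal sketch: count_arrivals is a hypothetical helper, not one of the functions above, and it looks at the process at the single time t = 20. For a Poisson process both theoretical values equal λt, which is 10 here.

```r
set.seed(123)                      # arbitrary seed, for reproducibility only
lambda <- 0.5
t_end  <- 20

# Hypothetical helper: number of arrivals in [0, t_end] for one realization
count_arrivals <- function(lambda, t_end) {
  total <- 0                       # running arrival time (tau)
  n     <- 0L                      # number of arrivals so far
  repeat {
    total <- total + rexp(1, rate = lambda)   # add one inter-arrival time (rho)
    if (total > t_end) break                  # next arrival is past the interval
    n <- n + 1L
  }
  n
}

counts <- replicate(10000, count_arrivals(lambda, t_end))
estimates <- c(mean = mean(counts), var = var(counts), theory = lambda * t_end)
print(estimates)
```

The sample mean and the sample variance should both land close to the theoretical value, up to Monte Carlo noise.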
Here's another possible implementation. The idea is to generate a vector of wait times (tau), and plot that against the list of events we're waiting for (max1)
poi.process <- function(lambda,n){
# initialize vector of total wait time for the arrival of each event:
s<-numeric(n+1)
# set S_0 = 0
s[1] <-0
# generate vector of iid Exp random variables:
x <-replicate(n,rexp(1,lambda))
# assign wait time to vector s in for loop:
for (k in 1:n){
s[k+1] <-sum(x[1:k])
}
# return vector of wait time
return(s)
}
Plotting it using stepfun will get us something like this:
n<-20
lambda <-3
# simulate list of wait time:
s_list <-poi.process(lambda,n)
# plot the counting process: it jumps from k-1 to k at the k-th arrival time
plot(stepfun(s_list[-1], 0:n),
do.points = TRUE,
pch = 16,
col.points = "red",
verticals = FALSE,
main = 'Realization of a Poisson process with lambda = 3',
xlab = 'Time of arrival',
ylab = 'Number of arrivals')
Sample Poisson process:
I would like to write a function that calculates the number of arrivals until time t in n different trials. I know that the arguments should include the exponential parameter lambda, the time t, and the number of counts n to be sampled. It should return a vector with n elements, corresponding to the counts.
Progress: I have created a function that counts the number of events until time t, and I know I will need to use the rexp() function.
But how do I write this Poisson function?
The following simulates a Poisson process. The function Nt takes two arguments, the exponential rate and the time limit.
Nt <- function(lambda = 1, t){
S <- 0 # Total time, sum of X's
n <- 0L # Number of events
repeat{
X <- rexp(1, lambda) # New time between events
if(S + X > t) break # Above the limit time t?
S <- S + X # No, update total time S
n <- n + 1L # and the nr. of events counter
}
n
}
set.seed(2021)
Rate <- 2
Time <- 10
N <- replicate(1e4, Nt(lambda = Rate, t = Time))
tbl <- table(N)
plot(tbl/sum(tbl), lwd = 10, col = "grey")
lines(0:40, dpois(0:40, lambda = Time*Rate), type = "h", col = "red")
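To match the interface described in the question (lambda, t and the number of trials n as arguments, a vector of n counts as the result), one option is a small wrapper. The sketch below is self-contained and draws the exponential times in batches with cumsum() rather than one at a time; the name arrival_counts and the batch size are assumptions made for illustration.

```r
# Hypothetical function: counts of arrivals up to time t in n independent trials
arrival_counts <- function(lambda, t, n) {
  batch_size <- ceiling(lambda * t) + 10   # assumed batch size; any positive value works
  one_trial <- function(i) {
    arrivals <- numeric(0)
    last <- 0
    # keep drawing batches of inter-arrival times until we have passed t
    while (last <= t) {
      new_arrivals <- last + cumsum(rexp(batch_size, rate = lambda))
      arrivals <- c(arrivals, new_arrivals)
      last <- new_arrivals[batch_size]
    }
    sum(arrivals <= t)                     # arrivals that fall inside [0, t]
  }
  vapply(seq_len(n), one_trial, integer(1))
}

set.seed(2021)
counts <- arrival_counts(lambda = 2, t = 10, n = 1000)
```

The mean of counts should be close to lambda * t, just as with the replicate() call above.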
I am trying to write code to solve the following problem (as stated in HW5 of the Caltech course Learning from Data):
In this problem you will create your own target function f
(probability in this case) and data set D to see how Logistic
Regression works. For simplicity, we will take f to be a 0/1
probability so y is a deterministic function of x. Take d = 2 so you
can visualize the problem, and let X = [-1, 1] × [-1, 1] with uniform
probability of picking each x ∈ X. Choose a line in the plane as the
boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y
has to be -1) by taking two random, uniformly distributed points from
X and taking the line passing through them as the boundary between y =
±1. Pick N = 100 training points at random from X, and evaluate the
outputs y_n for each of these points x_n. Run Logistic Regression with
Stochastic Gradient Descent to find g, and estimate E_out (the cross
entropy error) by generating a sufficiently large, separate set of
points to evaluate the error. Repeat the experiment for 100 runs with
different targets and take the average. Initialize the weight vector
of Logistic Regression to all zeros in each run. Stop the algorithm
when |w(t-1) - w(t)| < 0.01, where w(t) denotes the weight vector at
the end of epoch t. An epoch is a full pass through the N data points
(use a random permutation of 1, 2, ..., N to present the data
points to the algorithm within each epoch, and use different
permutations for different epochs). Use a learning rate of 0.01.
I am required to calculate the nearest value to Eout for N=100, and the average number of epochs for the required criterion.
I wrote and ran the code but I'm not getting the right answers (according to the solutions, Eout is near 0.1 and the number of epochs is near 350). With a threshold of 0.01 on the change in w, the required number of epochs comes out far too small (around 10), leaving the error too big (around 2). I then tried replacing the criterion with |w(t-1) - w(t)| < 0.001 (rather than 0.01). With that, the average required number of epochs was about 250 and the out-of-sample error was about 0.35.
Is there something wrong with my code/solution, or is it possible that the answers provided are faulty? I've added comments to indicate what I intend to do at each step. Thanks in advance.
library(pracma)
h<- 0 # h will later be updated to number of required epochs
p<- 0 # p will later be updated to Eout
C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error
d <- matrix(ncol=10000, nrow=1)
for(i in 1:10000){
C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
d[1, i] <- sign(C[2, i] - f(C[1, i]))
}
for(g in 1:100){ # 100 runs of the experiment
x <- runif(2, min = -1, max = 1)
y <- runif(2, min = -1, max = 1)
fit = (lm(y~x))
t <- summary(fit)$coefficients[,1]
f <- function(x){ # Target function
t[2]*x + t[1]
}
A <- matrix(ncol=100, nrow=2) # Sample data
b <- matrix(ncol=100, nrow=1)
norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator
w <- c(0,0) # weights initialized to zero
for(i in 1:100){
A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
b[1, i] <- sign(A[2, i] - f(A[1, i]))
}
q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch
l= 1
while(l < 1001){
E <- function(z){ # cross entropy error function
x = z[1]
y = z[2]
v = z[3]
return(log(1 + exp(-v*t(w)%*%c(x, y))))
}
err <- function(xn1, xn2, yn){ #gradient of error function
return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
}
e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point
e[,1:100] = 0
perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices
for(j in 1:100){ # One complete Epoch
r = A[,perm[j]] # pick the perm[j]th entry in A
s = b[perm[j]] # pick the perm[j]th entry in b
e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error
w = w - 0.01*e[,perm[j]] # update the weight vector according to the formula involving step size, gradient
}
q[,l] = w # the lth entry is the weight vector at the end of the lth epoch
if(l > 1 & norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm
break
}
l = l+1 # move to the next epoch
}
for(n in 1:10000){
p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
}
h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment
}
mean(h) # Mean number of epochs needed
mean(p) # average Eout, over 100 experiments
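For comparison, here is a minimal sketch of a single run, written under one reading of the problem statement rather than as a verified solution. It differs from the code above in a few ways that may matter: the weight vector has three components (a bias input of 1 plus the two coordinates), the gradient uses the dot product of w and x, the target line is built directly from two random points, and Eout is averaged over all test points at once. All names (label, make_data, e_out and so on) are hypothetical, and the 5000-epoch cap is an added safety assumption that is not in the problem.

```r
set.seed(1)                                   # arbitrary seed

# Target: the line through two random points p1, p2 in [-1, 1]^2
p1 <- runif(2, -1, 1)
p2 <- runif(2, -1, 1)
label <- function(X) {                        # X has columns (1, x1, x2)
  sign((p2[1] - p1[1]) * (X[, 3] - p1[2]) - (p2[2] - p1[2]) * (X[, 2] - p1[1]))
}
make_data <- function(n) {
  X <- cbind(1, matrix(runif(2 * n, -1, 1), ncol = 2))   # leading 1 is the bias input
  list(X = X, y = label(X))
}

train  <- make_data(100)
w      <- c(0, 0, 0)                          # bias + two weights, initialised to zero
eta    <- 0.01                                # learning rate
epochs <- 0
repeat {
  w_old <- w
  for (i in sample(100)) {                    # fresh random permutation each epoch
    xi <- train$X[i, ]
    yi <- train$y[i]
    grad <- -yi * xi / (1 + exp(yi * sum(w * xi)))   # gradient of log(1 + exp(-y * w.x))
    w <- w - eta * grad
  }
  epochs <- epochs + 1
  # stop when the weights moved less than 0.01 over the epoch (or at the safety cap)
  if (sqrt(sum((w - w_old)^2)) < 0.01 || epochs >= 5000) break
}

test  <- make_data(10000)                     # separate set for the out-of-sample error
e_out <- mean(log(1 + exp(-test$y * as.vector(test$X %*% w))))
cat("epochs:", epochs, " E_out:", e_out, "\n")
```

Wrapping this in a loop over 100 runs (with a new target each time) and averaging epochs and e_out would give the quantities the homework asks about; the printed values depend on the seed.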
I'm currently working on the valuation of option prices for my university thesis.
We need to program some things in R. It's the first time I've worked with programming software like R. I've been doing this for the last 2 weeks and this is how far I have got:
s <- 120
#Value of the stock today
sd <- 0.1
#standard deviation
d <- 0.003
#Drift
N <- 365
T <-1
dt <-T/N
t <- seq(0,T, length=N+1)
W <- c(0, cumsum(sqrt(dt)*rnorm(N)))
#plot( t, W, type="l", main="Wiener process", ylim=c(-1,1))
S <- s*exp(d+sd*W)
S
This is a simple generalized Wiener process which I want to turn into a Monte Carlo simulation.
For S there are now 366 (N+1) values of the stock path. What I need is a "for loop" which takes the last value of S and stores it in a vector, so that I can run the loop, for example, 10000 times, collect every last value of S and get the average of the vector.
I have no idea how I can program such a for loop.
I would really appreciate if you could help me or give me some good hints.
Greetings from Germany
Christian
I never studied Wiener Processes, but I think this would be a simple outline of the code you're trying to achieve:
stock_prices <- s #Initialise vector of stock prices
numIter <- 10^4 #Set number of iterations in the for loop
for(i in 1:numIter) {
s <- stock_prices[i] #This is the current stock price (for ith iteration / time step)
#Calculate the next stock price here, call it next_price
#Add price of next iteration / time step to your vector:
stock_prices <- c(stock_prices, next_price)
}
stock_prices will be a vector containing the initial price followed by the 10,000 stock prices you simulated.
I don't know how you calculate the next stock price from S, but if you draw from the values of S randomly, then it might be useful to check out the function sample (type ?sample for help on it).
Hope that helps
If you just want to run code repeatedly, putting it in a function is nice (but not absolutely necessary). I will refer to all the code in your question as <your code>.
To make a function that runs your code,
my_function = function() {
<your code>
}
The function will, by default, return the value of its last expression, in this case S. You only want the last element of S, tail(S, 1). So we can modify the function to return only that:
my_function = function() {
<your code>
return(tail(S, 1))
}
We can then call it in a for loop n times and assign the result. It is best to pre-allocate the vector for the results so that an appropriately sized block of memory can be set aside for it up front:
n = 10000
results = rep(NA, n)
for (i in 1:n) {
results[i] <- my_function()
}
This is equivalent to
n = 10000
results = rep(NA, n)
for (i in 1:n) {
<your code>
results[i] <- tail(S, 1)
}
And, for that matter, it is also equivalent to
results = replicate(n, my_function())
which is a handy shortcut.
If you want to be fancy, you could parameterize your function:
my_nice_function = function(s = 120, sd = 0.1, d = 0.003, N = 365) {
T <- 1
dt <- T / N
t <- seq(0, T, length = N + 1)
W <- c(0, cumsum(sqrt(dt) * rnorm(N)))
S <- s * exp(d + sd * W)
return(tail(S, 1))
}
Now my_nice_function has default values as in your code, but you can easily adjust them, e.g., to run 50 simulations with sd = 0.2 you can do this:
replicate(50, my_nice_function(sd = 0.2))
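To finish the calculation asked for in the question (the average of the simulated final values), something like the following should do. terminal_price is a hypothetical stand-in for my_nice_function, repeated here only so that the snippet runs on its own. Because W at time 1 is standard normal, the Monte Carlo mean can be sanity-checked against s * exp(d + sd^2 / 2) for this particular formula.

```r
set.seed(1)                                  # arbitrary seed

# Hypothetical stand-in for my_nice_function, so the snippet is self-contained
terminal_price <- function(s = 120, sd = 0.1, d = 0.003, N = 365) {
  dt <- 1 / N
  W  <- c(0, cumsum(sqrt(dt) * rnorm(N)))    # Wiener path on [0, 1]
  S  <- s * exp(d + sd * W)                  # same formula as in the question
  tail(S, 1)                                 # value at the end of the path
}

results   <- replicate(10000, terminal_price())
mc_mean   <- mean(results)                   # Monte Carlo average of the final values
reference <- 120 * exp(0.003 + 0.1^2 / 2)    # closed-form mean under this formula
print(c(monte_carlo = mc_mean, closed_form = reference))
```

The two numbers should agree up to simulation noise, which shrinks as the number of replications grows.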
I have a time series problem that I could easily work out manually, only it would take kind of a long time since I have 4 different AR(2) processes and want to calculate at least 20 lags for each.
What I want to do is use the Yule Walker equation for rho as follows:
I have an auto regressive process of second order, AR(2). Phi(1) is 0.6 and Phi(2) is 0.4.
I want to calculate the correlation coefficients rho(k) for all lags up to k = 20.
So rho(0) would naturally be 1 and rho(-1) = rho(1). Therefore
rho(1) = phi(1) + phi(2)*rho(1)
rho(k) = phi(1)*rho(k-1) + phi(2)*rho(k-2)
Now I want to solve this in R, but I have no idea how to start, can anyone help me out here?
You can try my program in R.
In the R script:
AR2 <- function(Zt,tetha0,phi1,phi2,nlag)
{
n <- length(Zt)
Zbar <- mean(Zt)
Zt1 <- rep(Zbar,n)
for(i in 2:n){Zt1[i] <- Zt[i-1]}
Zt2 <- rep(Zbar,n)
for(i in 3:n){Zt2[i] <- Zt[i-2]}
Zhat <- tetha0+phi1*Zt1+phi2*Zt2
error <- Zt-Zhat
ACF(error,nlag)
}
ACF <- function(error,nlag)
{
n <- length(error)
rho <- rep(0,nlag)
for(k in 1:nlag)
{
a <- 0
b <- 0
for(t in 1:(n-k)){a <- a+(error[t]*error[t+k])}
for(t in 1:n){b <- b+(error[t]^2)}
rho[k] <- a/b
}
return(rho)
}
In R console:
Suppose you have a series Zt, with tetha0 = 0, phi1 = 0.6, phi2 = 0.4 and nlag = 20:
AR2(Zt,0,0.6,0.4,20)
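The theoretical recursion written out in the question can also be coded directly, without any data. The following is a minimal sketch; yule_walker_acf is a hypothetical name, and the function simply solves the first equation for rho(1) = phi(1) / (1 - phi(2)) and then iterates the second one.

```r
# Hypothetical helper: theoretical autocorrelations rho(0), ..., rho(max_lag) of an AR(2)
yule_walker_acf <- function(phi1, phi2, max_lag) {
  rho <- numeric(max_lag + 1)          # rho[k + 1] stores rho(k), since R indexes from 1
  rho[1] <- 1                          # rho(0) = 1
  if (max_lag >= 1) rho[2] <- phi1 / (1 - phi2)   # from rho(1) = phi1 + phi2 * rho(1)
  if (max_lag >= 2) {
    for (k in 2:max_lag) {
      # rho(k) = phi1 * rho(k-1) + phi2 * rho(k-2)
      rho[k + 1] <- phi1 * rho[k] + phi2 * rho[k - 1]
    }
  }
  setNames(rho, 0:max_lag)             # label the entries by lag
}

rho <- yule_walker_acf(0.6, 0.4, 20)
print(rho)
```

Note that with phi(1) = 0.6 and phi(2) = 0.4 the coefficients sum to 1, so rho(1) = 0.6 / 0.6 = 1 and every later rho(k) is 1 as well; these values sit on the boundary of the stationarity region. Other coefficient pairs give a decaying sequence.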
I have the following code to create a sample function and to generate simulated data
mean_detects<- function(obs,cens) {
detects <- obs[cens==0]
nondetects <- obs[cens==1]
res <- mean(detects)
return(res)
}
mu <-log(1); sigma<- log(3); n_samples=10; n_iterations = 5; p=0.10
dset2 <- function (mu, sigma, n_samples, n_iterations, p) {
X_after <- matrix(NA_real_, nrow = n_iterations, ncol = n_samples)
delta <- matrix(NA_real_, nrow = n_iterations, ncol = n_samples)
lod <- quantile(rlnorm(100000, mu, sigma), p = p)
pct_cens <- numeric(n_iterations)
count <- 1
while(count <= n_iterations) {
X_before <- rlnorm(n_samples, mu, sigma)
X_after[count, ] <- pmax(X_before, lod)
delta [count, ] <- X_before <= lod
pct_cens[count] <- mean(delta[count,])
if (pct_cens [count] > 0 & pct_cens [count] < 1 ) count <- count + 1 }
ave_detects <- mean_detects(X_after,delta) ## how can I use apply or other functions here?
return(ave_detects)
}
If I specify n_iterations = 1, I get a 1x10 X_after matrix and a 1x10 delta matrix. The mean_detects function then works fine with this command:
ave_detects <- mean_detects(X_after,delta)
However, when I increase n_iterations to, say, 5, I get two 5x10 matrices, X_after and delta, and the mean_detects function no longer works. It only gives me output for 1 iteration instead of 5. My real simulation has thousands of iterations, so speed and memory must also be taken into account.
Edits: I edited my code based on your comments. The mean_detects function that I created was meant to show an example of using the X_after and delta matrices simultaneously. The real function is very long. That's why I did not post it here.
Your actual question isn't really clear. So,
"My function only takes in 1 dataframe". Actually, your function takes in two vectors.
"Write code that can use both X_after and delta". This doesn't really mean anything - sorry.
"speed and memory must be taken into account". This is vague. Will you run out of memory? As a suggestion, you could think about a rolling mean. For example,
x = runif(5)
total = 0
for(i in seq_along(x)) {
total = (i-1)*total/i + x[i]/i
cat(i, ": mean ", total, "\n")
}
1 : mean 0.4409
2 : mean 0.5139
3 : mean 0.5596
4 : mean 0.6212
5 : mean 0.6606
Aside
Your dset2 function requires the variable n (which you haven't defined).
Your dset2 function doesn't return an obvious value.
Your mean_detects function can be simplified to:
mean(obs[cens==0])
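If the goal is one value per iteration (that is, per row of X_after and delta), one possible approach is to loop over the row indices with vapply() and pass matching rows to the function. This is only a sketch: the two small matrices are made-up stand-ins for the simulated ones, and mean_detects is the simplified version from above.

```r
mean_detects <- function(obs, cens) mean(obs[cens == 0])

# Small made-up matrices standing in for X_after and delta (one row per iteration)
X_after <- matrix(c(1, 2, 3,
                    4, 5, 6), nrow = 2, byrow = TRUE)
delta   <- matrix(c(0, 1, 0,
                    1, 0, 0), nrow = 2, byrow = TRUE)

# Apply the two-argument function to matching rows of both matrices
ave_detects <- vapply(
  seq_len(nrow(X_after)),
  function(i) mean_detects(X_after[i, ], delta[i, ]),
  numeric(1)
)
print(ave_detects)   # 2.0 5.5
```

Because vapply() preallocates a numeric vector of the right length, this should scale to thousands of iterations, and a longer real function can be dropped in as long as it returns a single number per row.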