R Poisson simulation function

I would like to write a function that calculates the number of arrivals until time t in n different trials. I know that the arguments should include the exponential rate parameter lambda, the time t, and the number of counts n to be sampled. It should return a vector with n elements, one count per trial.
Progress: I have created a function that counts the number of events until time t, and I know I will need to use the rexp() function.
But how do I write this Poisson function?

The following simulates a Poisson process. The function Nt takes two arguments, the exponential rate and the time limit.
Nt <- function(lambda = 1, t){
  S <- 0 # Total time, sum of X's
  n <- 0L # Number of events
  repeat{
    X <- rexp(1, lambda) # New time between events
    if(S + X > t) break # Above the limit time t?
    S <- S + X # No, update total time S
    n <- n + 1L # and the nr. of events counter
  }
  n
}
set.seed(2021)
Rate <- 2
Time <- 10
N <- replicate(1e4, Nt(lambda = Rate, t = Time))
tbl <- table(N)
plot(tbl/sum(tbl), lwd = 10, col = "grey")
lines(0:40, dpois(0:40, lambda = Time*Rate), type = "h", col = "red")
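The question asks for a single function that returns a vector of n counts. A minimal sketch of that wrapper follows; the name count_arrivals and its argument order are assumptions for illustration, not part of the original answer. It repeats the same exponential-gap loop once per trial.

```r
# Hypothetical helper: number of arrivals in [0, t] for each of n independent trials
count_arrivals <- function(n, lambda, t) {
  vapply(seq_len(n), function(i) {
    total <- 0   # running sum of inter-arrival times
    k <- 0L      # arrivals seen so far in this trial
    repeat {
      total <- total + rexp(1, rate = lambda)
      if (total > t) break   # the next arrival falls after time t
      k <- k + 1L
    }
    k
  }, integer(1))
}

set.seed(1)
counts <- count_arrivals(n = 5000, lambda = 2, t = 10)
# Both values should be close to lambda * t = 20 for a Poisson count
c(mean = mean(counts), var = var(counts))
```

The result is an integer vector of length n, so it can be passed directly to table() or compared against dpois() as in the plot above.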

Related

Looping through parameters to get equilibrium with deSolve

I struggle with loops intuitively. I have a simple consumer-resource model, and I want to loop through values of resource growth rate g to get final state values to then plot equilibrium as a function of the parameter values. This is what I have so far:
param.values = seq(from = 1, to = 10, by = 1)
variable = rep(0,length(param.values))
for (i in 1:length(param.values)){
state <- c(r = 1, n = 1)
parameters = c(g = variable[i],# resource growth rate
d = 0.5, # n mortality rate
k = 5, # r carrying capacity
c = 1, # consumption rate of n on r
e = 1, # conversion efficiency for n on r
h = 1 # handling time n on r
)
function1 <- function(times, state, parameters) {
with(as.list(c(state, parameters)),{
# rate of change
dr = variable[i]*r*(1 - (r/k)) - (c*n*r/(1+(h*c*r)))
dn = (e*c*n*r/(1+(h*c*r)))- n*d
# return the rate of change
list(c(dr, dn))
})
}
times <- seq(0, 100, by = 1)
out <- ode(y = state, times = times, func = function1, parms = parameters)
sol <- out[101, 2:3] # trying to get last equilibrium value to plot against param values...
print(sol[i])
}
plot(sol[,1] ~ param.values)
plot(sol[,2] ~ param.values)
I think I have things right up until the ode function. Where should I be indexing i after this? I hope this makes sense.
Your approach had several issues, so I tried to re-organize it so that it runs through. But, as your model shows a stable cycle, it does not reach an equilibrium.
Here are a few hints:
The loop should only contain things that change during the simulation. Fixed code segments should come before the loop. This is easier to maintain and faster.
First, run the model without the loop, to see whether it works.
Then define a data structure (matrix or data frame) to store the results.
Here is one way it can be implemented:
library("deSolve")
## define as much as possible outside the loop
function1 <- function(times, state, parameters) {
with(as.list(c(state, parameters)),{
# rate of change
dr = g*r*(1 - (r/k)) - (c*n*r/(1+(h*c*r)))
dn = (e*c*n*r/(1+(h*c*r)))- n*d
# return the rate of change
list(c(dr, dn))
})
}
state <- c(r = 1, n = 1)
parameters = c(g = 1, # resource growth rate
d = 0.5, # n mortality rate
k = 5, # r carrying capacity
c = 1, # consumption rate of n on r
e = 1, # conversion efficiency for n on r
h = 1 # handling time n on r
)
times <- seq(0, 100, by = 1)
## first test single run of model
out <- ode(y = state, times = times, func = function1, parms = parameters)
plot(out)
## It runs and we see a cycling model. I suspect it has no equilibrium!
param.values = seq(from = 1, to = 10, by = 1)
## define a matrix where results can be stored
sol <- matrix(0, nrow=length(param.values), ncol=2)
for (i in 1:length(param.values)){
## replace single parameter g with new value
parameters["g"] <- param.values[i]
out <- ode(y = state, times = times, func = function1, parms = parameters)
## store result of last value in row of matrix.
## Note that it may not be an equilibrium
sol[i, ] <- out[101, 2:3] # trying to get last equilibrium value to plot against param values...
print(sol[i, ])
}
plot(sol[,1] ~ param.values, type="l")
plot(sol[,2] ~ param.values, type="l")
## We see that the model has no equilibrium.
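Since the last row of the output is not necessarily an equilibrium, it may help to check whether the trajectory has actually settled before storing it. The sketch below is an assumption-laden illustration: is_settled, window and tol are hypothetical names, and the two trajectories are synthetic stand-ins rather than deSolve output.

```r
# Hypothetical helper: TRUE if every state column varies by less than `tol`
# over the last `window` rows. Column 1 is assumed to be time.
is_settled <- function(out, window = 20, tol = 1e-3) {
  tail_rows <- tail(out[, -1, drop = FALSE], window)
  all(apply(tail_rows, 2, function(v) diff(range(v))) < tol)
}

times <- seq(0, 100, by = 1)
decaying <- cbind(time = times, r = 2 + exp(-times))  # converges to 2
cycling  <- cbind(time = times, r = 2 + sin(times))   # keeps oscillating

is_settled(decaying)
is_settled(cycling)
```

The same check could be tried on the matrix returned by ode(); for a cycling model like the one above it would be expected to report that no steady state was reached.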
There are several other ways and, as said, the model has no equilibrium. Here is another model example, a so-called chemostat with equilibrium.

Keep getting 'unused argument' error when trying to produce a qqplot in R

library(GoFKernel)
library(ggplot2)
rejection_fx_sqz <- function(n){
x <- vector() # the output vector in which simulated values should be stored
acpt <- 0 # count the accepted values
tol <- 0 # count the total number of values (accepted or not accepted)
len_x = 0
while(len_x < n){
n_to_gen = max((n-len_x)/0.69,20) # determine number to generate - not less than 20
tol = tol + n_to_gen # count the total number of values simulated
u1 = runif(n_to_gen) # simulate u1
u2 = runif(n_to_gen) # simulate u2
y = inv_G(u2)
g <- g_x(y)
d <- g*y*(5-y)
condU <- (M*u1) >= 1/d
condL <- !condU
condL[condL] <- (M*u1[condL]) <= lower(y[condL])/d[condL]
other <- !(condU | condL) # condition of below the W_U and above W_L
# modify condL because some samples can still be accepted given condL is false
condL[other] <- u1[other] <= fstar(y[other])/(M*g[other])
cond <- condL
acpt = acpt + sum(cond) # count the number of accepted values
x <- c(x, y[cond]) # add accepted values to the output vector
len_x <- length(x)
}
p = acpt / tol
return(list(x=x[1:n], p=p))
}
n=100000
x=rejection_fx_sqz(n) # a function that simulates from f(x) by generating n samples
x_fx <- data.frame(x=x$x)
x=x_fx$x
x_plot = cbind(x_fx, fy = 1/(I*x*(5-x))*exp(-1/8*(-1+log(x/(5-x)))^2))
f_cdf <- function(x) {
integrate(fstar, 0, x)$value/I
}
# quantile function, inverse cdf
f_q <- inverse(f_cdf, lower=0.000000000000000001, upper=4.999999999999999999)
ggplot(x_plot, aes(sample=x))+
labs(title="Empirical against theoretical quantiles")+
stat_qq(distribution=f_q) +
stat_qq_line(distribution=f_q)
What I'm trying to do is produce a quantile-quantile diagnostic plot for my algorithm for simulating from f(x). The problem is that I keep getting two error messages, which say:
Computation failed in stat_qq(): unused argument (p = quantiles)
Computation failed in stat_qq_line(): unused argument (p = quantiles)
I am a beginner in R and this is driving me crazy. Any help is appreciated.
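The message "unused argument (p = quantiles)" suggests that the distribution function is being called with an argument named p, which f_q does not seem to accept. The base-R sketch below reproduces that kind of failure and shows one possible wrapper. f_q_demo and q_wrapped are hypothetical stand-ins, and the exponential quantile is used only so the example runs without the rest of the script.

```r
# Stand-in for f_q: a scalar quantile function whose argument is not called p
f_q_demo <- function(y) qexp(y, rate = 2)

# Calling it with an argument named p, as the error message suggests, fails
msg <- tryCatch(do.call(f_q_demo, list(p = 0.5)),
                error = function(e) conditionMessage(e))
msg

# Wrapper with an argument named p, applied one probability at a time
q_wrapped <- function(p) vapply(p, f_q_demo, numeric(1))
q_wrapped(c(0.25, 0.5, 0.75))
```

Under that assumption, passing a wrapper such as function(p) sapply(p, f_q) as the distribution argument of stat_qq() and stat_qq_line() may resolve the error. The element-wise call also matters because f_cdf is built on integrate(), which handles one upper limit at a time.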

Manually simulating Poisson Process in R

The following problem tells us to generate a Poisson process step by step from ρ (inter-arrival time), and τ (arrival time).
One of the theoretical results presented in the lectures gives the
following direct method for simulating Poisson process:
• Let τ_0 = 0.
• Generate i.i.d. exponential random variables ρ_1, ρ_2, ....
• Let τ_n = ρ_1 + ... + ρ_n for n = 1, 2, ....
• For each k = 0, 1, ..., let
N_t = k for τ_k ≤ t < τ_{k+1}.
Using this method, generate a realization of a Poisson process (N_t)_t with λ = 0.5 on the interval [0, 20].
Generate 10000 realizations of a Poisson process (N_t)_t with λ = 0.5 and use your results to estimate E(N_t) and Var(N_t). Compare the estimates
with the theoretical values.
My attempted solution:
First, I have generated the values of ρ using rexp() function in R.
rhos <-function(lambda, max1)
{
vec <- vector()
for (i in 1:max1)
{
vec[i] <- rexp(0.5)
}
return (vec)
}
Then I created the τs by cumulatively summing the ρs.
taos <- function(lambda, max)
{
rho_vec <- rhos(lambda, max)
#print(rho_vec)
vec <- vector()
vec[1] <- 0
sum <- 0
for(i in 2:max)
{
sum <- sum + rho_vec[i]
vec[i] <- sum
}
return (vec)
}
The following function is for finding the value of Nt=k when the value of k is given. Say, it is 7, etc.
Ntk <- function(lambda, max, k)
{
tao_vec <- taos(lambda, max)
val <- max(tao_vec[tao_vec < k])
}
y <- taos(0.5, 20)
x <- seq(0, 20-1, by=1)
plot(x,y, type="s")
Output:
As you can see, the plot of the Poisson process is blank rather than a staircase.
If I change rexp to exp, I get the following output:
.. which is a staircase function but all steps are equal.
Why is my source code not producing the expected output?
It looks like you're using max1 to indicate how many times to sample the exponential distribution in your rhos function. I would recommend something like this:
rhosGen <- function(lambda, maxTime){
rhos <- NULL
i <- 1
while(sum(rhos) < maxTime){
samp <- rexp(n = 1, rate = lambda)
rhos[i] <- samp
i <- i+1
}
return(head(rhos, -1))
}
This will continue to sample from the exponential until the sum of these holding times is larger than the length of the given interval. head then removes the last sample so that all of the events that we keep track of definitely occur in our time interval of interest.
From here you have to generate the taos by summing the previous holding times (rhos):
taosGen <- function(lambda, maxTime){
rhos <- rhosGen(lambda, maxTime)
taos <- NULL
cumSum <- 0
for(i in 1:length(rhos)){
taos[i] <- sum(rhos[1:i])
}
return(taos)
}
Now that you have the taos we know at what time each event in the time interval (0,maxTime) occurs. This leads us to generating the associated Poisson Process by finding the value of the Nt for each t in the time interval:
ppGen <- function(lambda, maxTime){
taos <- taosGen(lambda, maxTime)
pp <- NULL
for(i in 1:maxTime){
pp[i] <- sum(taos <= i)
}
return(pp)
}
This generates the value of the Poisson process at each integer time in the interval. I suspect that part of your issue was trying to put the tao values on the y-axis instead of the count of events that had already occurred. The following code worked for me to produce a random-looking staircase, similar to your example.
y <- ppGen(0.5, 20)
x <- seq(1, 20, by=1) # ppGen returns the counts at times 1, ..., 20
plot(x,y, type="s")
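The second part of the exercise, estimating E(Nt) and Var(Nt) from 10000 realizations, can be sketched as follows. This is an illustration under assumptions: arrival_times is a hypothetical helper, and the estimates are taken at the end of the interval, t = 20, where the theoretical mean and variance are both λt = 10.

```r
# Hypothetical helper: arrival times tau_1, tau_2, ... that fall in [0, max_time]
arrival_times <- function(lambda, max_time) {
  taus <- numeric(0)
  total <- rexp(1, rate = lambda)
  while (total <= max_time) {
    taus <- c(taus, total)                   # keep arrivals inside the interval
    total <- total + rexp(1, rate = lambda)  # add the next inter-arrival time
  }
  taus
}

set.seed(42)
lambda <- 0.5
max_time <- 20
# N_t at t = max_time is the number of arrivals in the interval
n_final <- replicate(10000, length(arrival_times(lambda, max_time)))

estimates <- c(mean = mean(n_final), var = var(n_final))
theory <- c(mean = lambda * max_time, var = lambda * max_time)
rbind(estimates, theory)
```

A single call to arrival_times() also gives the jump times of one realization, which is what a staircase plot of the process needs.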
Here's another possible implementation. The idea is to generate a vector of wait times (tau), and plot that against the list of events we're waiting for (max1)
poi.process <- function(lambda,n){
# initialize vector of total wait time for the arrival of each event:
s<-numeric(n+1)
# set S_0 = 0
s[1] <-0
# generate vector of iid Exp random variables:
x <-replicate(n,rexp(1,lambda))
# assign wait time to vector s in for loop:
for (k in 1:n){
s[k+1] <-sum(x[1:k])
}
# return vector of wait time
return(s)
}
Plotting it using stepfun will get us something like this:
n<-20
lambda <-3
# simulate list of wait time:
s_list <-poi.process(lambda,n)
# plot function:
plot(stepfun(s_list[-1], 0:n), # jumps at the arrival times, heights are the counts 0..n
do.points = TRUE,
pch = 16,
col.points = "red",
verticals = FALSE,
main = 'Realization of a Poisson process with lambda = 3',
xlab = 'Time of arrival',
ylab = 'Number of arrivals')
Sample Poisson process:

Machine Learning: Stochastic gradient descent for logistic regression in R: Calculating Eout and average number of epochs

I am trying to write a code to solve the following problem (As stated in HW5 in the CalTech course Learning from Data):
In this problem you will create your own target function f
(probability in this case) and data set D to see how Logistic
Regression works. For simplicity, we will take f to be a 0/1
probability so y is a deterministic function of x. Take d = 2 so you
can visualize the problem, and let X = [-1, 1] × [-1, 1] with uniform
probability of picking each x ∈ X. Choose a line in the plane as the
boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y
has to be -1) by taking two random, uniformly distributed points from
X and taking the line passing through them as the boundary between y =
±1. Pick N = 100 training points at random from X, and evaluate the
outputs yn for each of these points xn. Run Logistic Regression with
Stochastic Gradient Descent to find g, and estimate Eout (the cross
entropy error) by generating a sufficiently large, separate set of
points to evaluate the error. Repeat the experiment for 100 runs with
different targets and take the average. Initialize the weight vector
of Logistic Regression to all zeros in each run. Stop the algorithm
when |w(t-1) - w(t)| < 0.01, where w(t) denotes the weight vector at
the end of epoch t. An epoch is a full pass through the N data points
(use a random permutation of 1, 2, ..., N to present the data
points to the algorithm within each epoch, and use different
permutations for different epochs). Use a learning rate of 0.01.
I am required to calculate the nearest value to Eout for N=100, and the average number of epochs for the required criterion.
I wrote and ran the code but I'm not getting the right answers (as stated in the solutions, Eout is near 0.1 and the number of epochs is near 350). The number of epochs required for a delta w of 0.01 comes out far too small (around 10), leaving the error too big (around 2). I then tried to replace the criterion with |w(t-1) - w(t)| < 0.001 (rather than 0.01). Then, the average required number of epochs was about 250 and the out-of-sample error was about 0.35.
Is there something wrong with my code/solution, or is it possible that the answers provided are faulty? I've added comments to indicate what I intend to do at each step. Thanks in advance.
library(pracma)
h<- 0 # h will later be updated to number of required epochs
p<- 0 # p will later be updated to Eout
C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error
d <- matrix(ncol=10000, nrow=1)
for(i in 1:10000){
C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
d[1, i] <- sign(C[2, i] - f(C[1, i]))
}
for(g in 1:100){ # 100 runs of the experiment
x <- runif(2, min = -1, max = 1)
y <- runif(2, min = -1, max = 1)
fit = (lm(y~x))
t <- summary(fit)$coefficients[,1]
f <- function(x){ # Target function
t[2]*x + t[1]
}
A <- matrix(ncol=100, nrow=2) # Sample data
b <- matrix(ncol=100, nrow=1)
norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator
w <- c(0,0) # weights initialized to zero
for(i in 1:100){
A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
b[1, i] <- sign(A[2, i] - f(A[1, i]))
}
q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch
l= 1
while(l < 1001){
E <- function(z){ # cross entropy error function
x = z[1]
y = z[2]
v = z[3]
return(log(1 + exp(-v*t(w)%*%c(x, y))))
}
err <- function(xn1, xn2, yn){ #gradient of error function
return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
}
e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point
e[,1:100] = 0
perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices
for(j in 1:100){ # One complete Epoch
r = A[,perm[j]] # pick the perm[j]th entry in A
s = b[perm[j]] # pick the perm[j]th entry in b
e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error
w = w - 0.01*e[,perm[j]] # update the weight vector accorng to the formula involving step size, gradient
}
q[,l] = w # the lth entry is the weight vector at the end of the lth epoch
if(l > 1 & norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm
break
}
l = l+1 # move to the next epoch
}
for(n in 1:10000){
p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
}
h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment
}
mean(h) # Mean number of epochs needed
mean(p) # average Eout, over 100 experiments
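For comparison, here is a minimal sketch of a single run of the experiment, not a corrected version of the code above. It assumes a bias coordinate x0 = 1 is intended (so the weight vector has three entries), uses an inner product inside the gradient, and averages the cross-entropy over the whole test set in one step. The names make_target, sample_points and run_sgd are hypothetical.

```r
set.seed(7)

# Random target: the line through two uniform points; returns labels in {-1, +1}
make_target <- function() {
  p <- matrix(runif(4, -1, 1), nrow = 2)  # one point per row
  function(X) sign((p[2, 1] - p[1, 1]) * (X[, 3] - p[1, 2]) -
                   (p[2, 2] - p[1, 2]) * (X[, 2] - p[1, 1]))
}

# n points in [-1, 1]^2 with a leading bias column of ones
sample_points <- function(n) cbind(1, matrix(runif(2 * n, -1, 1), ncol = 2))

run_sgd <- function(X, y, eta = 0.01, tol = 0.01, max_epochs = 5000) {
  w <- numeric(ncol(X))                  # weights start at zero
  for (epoch in seq_len(max_epochs)) {
    w_prev <- w
    for (i in sample(nrow(X))) {         # fresh permutation every epoch
      margin <- y[i] * sum(w * X[i, ])   # y_n * w^T x_n
      grad <- -y[i] * X[i, ] / (1 + exp(margin))
      w <- w - eta * grad
    }
    if (sqrt(sum((w - w_prev)^2)) < tol) break
  }
  list(w = w, epochs = epoch)
}

f <- make_target()
X_train <- sample_points(100)
fit <- run_sgd(X_train, f(X_train))

X_test <- sample_points(10000)
y_test <- f(X_test)
e_out <- mean(log(1 + exp(-y_test * as.vector(X_test %*% fit$w))))
c(epochs = fit$epochs, e_out = e_out)
```

Wrapping the last block in a loop over 100 targets and averaging epochs and e_out would give the quantities the problem asks for.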

for loop and function in R/RStudio - I keep getting errors & graph not displaying correctly

-------------------------------Given Code---------------------------------
# Function to generate coin flips n times and determine the longest run of
# heads or tails.
longest.run<-function(n)
{
# generate n Bernoulli trials with success probability 1/2
x<-rbinom(n,1,.5)
#
# determine the longest run
# compute successive differences
#
diffs<-diff(x)
#
# determine where difference is 1 or -1
#
change.positions<-seq(1:n-1)[abs(diff(x))==1]
#
# insert change positions at the ends (0 and n)
#
change.positions<-c(0,change.positions,n)
#
# return the maximum difference between change positions
#
x
max(diff(change.positions))
}
# Create a vector with value 100 repeated 10,000 times
x<-rep(100,10000)
# Apply the longest run function to each element of x
longest.runs<-sapply(x,longest.run)
# Estimate the expected value using the average of the longest.runs vector
#
mean(longest.runs)
------------------------------Given Code---------------------------------
So my task is
(1) to get an approximation to the expected value of the length of the longest run in n flips of a fair coin
for n = 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, ..., 250;
(2) then I have to plot the expected value vs. n and fit a curve of the form y = c log(n) for some c to the data;
(3) lastly, I have to use that fit to predict the expected value when n = 500,
then approximate that value using
simulation and compare.
What I have right now is
longest.run <- function(n){
+ x <- rbinom(n ,1, .5)
+ diffs <- diff(x)
+ change.positions <- seq(1:n-1)[abs(diff(x))==1]
+ change.positions <- c(0, change.positions, n)
+ max(diff(change.positions))
}
expected.values <- c()
variance.values <- c()
n <- seq(10, 250, 10)
for (i in 1:25){
+ x <- rep(n[i], 10000)
+ longest.runs <- sapply(x, longest.run)
+ expected.values[i] <- mean(longest.runs)
+ variance.values[i] <- var(longest.runs)
}
par(mfrow = c(1, 2))
plot(n, expected.values, xlab = "expected.values")
plot(n, variance.values, xlab = "variance.values")
curve(c*log(x),add = TRUE)
I am getting an error:
Error in +x <- rep(n[i], 10000) :
could not find function "+<-"
I think this is because of the for loop...
Also, I'm getting exactly the same graph for both plots (which should be different)...
Any idea?
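The leading + characters in the pasted code look like console continuation prompts, which R then tries to parse as part of the expression. A sketch of the three tasks without them follows. It makes several assumptions: rle() replaces the diff-based run detection, c is estimated by a least-squares fit through the origin on log(n), and 2000 repetitions per n are used instead of 10000 to keep the run short. The names longest_run, c_hat, predicted_500 and simulated_500 are hypothetical.

```r
# Longest run of identical outcomes in n fair coin flips, via run-length encoding
longest_run <- function(n) max(rle(rbinom(n, 1, 0.5))$lengths)

set.seed(123)
n_values <- seq(10, 250, by = 10)
expected_values <- sapply(n_values,
                          function(n) mean(replicate(2000, longest_run(n))))

# Fit y = c * log(n) with no intercept; c_hat is the estimated constant
fit <- lm(expected_values ~ 0 + log(n_values))
c_hat <- unname(coef(fit))

# Draw the fitted curve on the same panel as the expected values
plot(n_values, expected_values, xlab = "n", ylab = "mean longest run")
curve(c_hat * log(x), add = TRUE)

# Prediction for n = 500 compared with a direct simulation
predicted_500 <- c_hat * log(500)
simulated_500 <- mean(replicate(2000, longest_run(500)))
c(predicted = predicted_500, simulated = simulated_500)
```

In the original attempt the curve() call came after the second plot() and used an undefined c, so estimating the constant first and adding the curve directly after the plot it belongs to avoids both problems.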
