I'm writing an SI Model using the deSolve package for R.The model describes the transmission of an infection within a community and then allows the introduction of external events - which represent the mass treatment of the whole community. The final graph should show the three curves of "Susceptible", "Infected", and "Total". But the output chart is wrong. I don't know where the problem is. the debug could run without any error pops up. there is no Infected and total line in the graph, and susceptible is wrong too. the first image is How the code runs out. but what I need should be like the second image.
and
rm(list=ls())
require(deSolve)
odeequations=function(t,y,pars) {
S=y[1]; In=y[2];
beta=pars[1];
dS= -beta*S*In;
dIn=beta*S*In;
return(list(c(dS,dIn)));
}
S0=1;
I0=100000;
Y0=c(S0, I0);
tmax=70;
dt=1;
timevec=seq(0,tmax,dt);
beta=1.5;
pars=c(beta);
odeoutput=lsoda(Y0,timevec,odeequations,parms=pars, atol=1e-7,rtol=1e-7);
## windows(width=10,height=10)
plot(odeoutput[,1],odeoutput[,2],type="l",xlab="time (years)",ylab="",col="green",lwd=2,log="",xlim=c(0,tmax),ylim=c(1,max(odeoutput[,2])),main="Outbreak Time Series")
lines(odeoutput[,1],odeoutput[,3],type="l",col="red",lwd=2)
lines(odeoutput[,1],odeoutput[,2]+odeoutput[,3], type="l",col="blue",lwd=2)
legend("right", c("Susceptible","Infected","Total"),col = c("green","red","blue"),lwd=2)
A general SIR model with deSolve is found at different places, for example in the following paper: https://doi.org/10.18637/jss.v033.i03
A SEIR model (i.e. with an additional state variable) can be formulated as follows:
library("deSolve")
SEIR <- function(t, y, parms) {
with(as.list(c(parms, y)), {
dS <- -rho * beta * I * S
dE <- rho * beta * S * I - alpha * E
dI <- alpha * E - gamma * I
dR <- gamma * I
list(c(dS, dE, dI, dR))
})
}
# state variables: fractions of total population
y0 <- c(S=1 - 5e-4, # susceptible
E=4e-4, # exposed
I=1e-4, # infected
R=0) # recovered or deceased
parms <- c(alpha = 0.2, # inverse of incubation period (5 days)
beta = 1.75, # average contact rate
gamma = 0.5, # inverse of mean infectious period (2 days)
rho = 1) # social distancing factor (0 ... 1)
# time in days
times <- seq(0, 150, 1)
# numerical integration
out <- ode(y0, times, SEIR, parms, method="bdf", atol=1e-8, rtol=1e-8)
matplot.0D(out)
This example and several links to other versions can be found at https://github.com/tpetzoldt/covid
There are two errors in your code.
you have S0 and I0 switched (i.e., try setting S0 <- 1e5; I0 <- 1 rather than vice versa)
your beta value is way too high; try beta <- 1.5/1e5 (i.e., scale by population size)
Related
Disclaimer: Cross-post on Stack Computational Science
Aim: I am trying to numerically solve a Lotka-Volterra ODE in R, using de sde.sim() function in the sde package. I would like to use the sde.sim() function in order to eventually transform this system into an SDE. So initially, I started with an simple ODE system (Lotka Volterra model) without a noise term.
The Lotka-Volterra ODE system:
with initial values for x = 10 and y = 10.
The parameter values for alpha, beta, delta and gamma are 1.1, 0.4, 0.1 and 0.4 respectively (mimicking this example).
Attempt to solve problem:
library(sde)
d <- expression((1.1 * x[0] - 0.4 * x[0] * x[1]), (0.1 * x[0] * x[1] - 0.4 * x[1]))
s <- expression(0, 0)
X <- sde.sim(X0=c(10,10), T = 10, drift=d, sigma=s)
plot(X)
However, this does not seem to generate a nice cyclic behavior of the predator and prey population.
Expected Output
I used the deSolve package in R to generate the expected output.
library(deSolve)
alpha <-1.1
beta <- 0.4
gamma <- 0.1
delta <- 0.4
yini <- c(X = 10, Y = 10)
Lot_Vol <- function (t, y, parms) {
with(as.list(y), {
dX <- alpha * X - beta * X * Y
dY <- 0.1 * X * Y - 0.4 * Y
list(c(dX, dY))
}) }
times <- seq(from = 0, to = 100, by = 0.01)
out <- ode(y = yini, times = times, func = Lot_Vol, parms = NULL)
plot(y=out[, "X"], x = out[, "time"], type = 'l', col = "blue", xlab = "Time", ylab = "Animals (#)")
lines(y=out[, "Y"], x = out[, "time"], type = 'l', col = "red")
Question
I think something might be wrong the the drift function, however, I am not sure what. What is going wrong in the attempt to solve this system of ODEs in sde.sim()?
Assuming that not specifying a method takes the first in the list, and that all other non-specified parameters take default values, you are performing the Euler method with step size h=0.1.
As is known on a function that has convex concentric trajectories, the Euler method will produce an outward spiral. As a first order method, the error should grow to size about T*h=10*0.1=1. Or if one wants to take the more pessimistic estimate, the error has size (exp(LT)-1)*h/L, with L=3 in some adapted norm this gives a scale of 3.5e11.
Exploring the actual error e(t)=c(t)*h of the Euler method, one gets the following plots. Left are the errors of the components and right the trajectories for various step sizes in the Euler method. The error coefficient the function c(t) in the left plots is scaled down by the factor (exp(L*t)-1)/L to get comparable values over large time intervals, the value L=0.06 gave best balance.
One can see that the actual error
abs(e(t))<30*h*(exp(L*t)-1)/L
is in-between the linear and exponential error models, but closer to the linear one.
To reduce the error, you have to decrease the step size. In the call of SDE.sim, this is achieved by setting the parameter N=5000 or larger to get a step size h=10/5000=0.002 so that you can hope to be correct in the first two digits with an error bound of 30*h*T=0.6. In the SDE case you accumulate Gaussian noise of size sqrt(h) in every step, so that the truncation error of O(h^2) is a rather small perturbation of the random number.
I would like to pull 1000 samples from a custom distribution in R
I have the following custom distribution
library(gamlss)
mu <- 1
sigma <- 2
tau <- 3
kappa <- 3
rate <- 1
Rmax <- 20
x <- seq(1, 2e1, 0.01)
points <- Rmax * dexGAUS(x, mu = mu, sigma = sigma, nu = tau) * pgamma(x, shape = kappa, rate = rate)
plot(points ~ x)
How can I randomly sample via Monte Carlo simulation from this distribution?
My first attempt was the following code which produced a histogram shape I did not expect.
hist(sample(points, 1000), breaks = 51)
This is not what I was looking for as it does not follow the same distribution as the pdf.
If you want a Monte Carlo simulation, you'll need to sample from the distribution a large number of times, not take a large sample one time.
Your object, points, has values that increases as the index increases to a threshold around 400, levels off, and then decreases. That's what plot(points ~ x) shows. It may describe a distribution, but the actual distribution of values in points is different. That shows how often values are within a certain range. You'll notice your x axis for the histogram is similar to the y axis for the plot(points ~ x) plot. The actual distribution of values in the points object is easy enough to see, and it is similar to what you're seeing when sampling 1000 values at random, without replacement from an object with 1900 values in it. Here's the distribution of values in points (no simulation required):
hist(points, 100)
I used 100 breaks on purpose so you could see some of the fine details.
Notice the little bump in the tail at the top, that you may not be expecting if you want the histogram to look like the plot of the values vs. the index (or some increasing x). That means that there are more values in points that are around 2 then there are around 1. See if you can look at how the curve of plot(points ~ x) flattens when the value is around 2, and how it's very steep between 0.5 and 1.5. Notice also the large hump at the low end of the histogram, and look at the plot(points ~ x) curve again. Do you see how most of the values (whether they're at the low end or the high end of that curve) are close to 0, or at least less than 0.25. If you look at those details, you may be able to convince yourself that the histogram is, in fact, exactly what you should expect :)
If you want a Monte Carlo simulation of a sample from this object, you might try something like:
samples <- replicate(1000, sample(points, 100, replace = TRUE))
If you want to generate data using points as a probability density function, that question has been asked and answered here
Let's define your (not normalized) probability density function as a function:
library(gamlss)
fun <- function(x, mu = 1, sigma = 2, tau = 3, kappa = 3, rate = 1, Rmax = 20)
Rmax * dexGAUS(x, mu = mu, sigma = sigma, nu = tau) *
pgamma(x, shape = kappa, rate = rate)
Now one approach is to use some MCMC (Markov chain Monte Carlo) method. For instance,
simMCMC <- function(N, init, fun, ...) {
out <- numeric(N)
out[1] <- init
for(i in 2:N) {
pr <- out[i - 1] + rnorm(1, ...)
r <- fun(pr) / fun(out[i - 1])
out[i] <- ifelse(runif(1) < r, pr, out[i - 1])
}
out
}
It starts from point init and gives N draws. The approach can be improved in many ways, but I'm simply only going to start form init = 5, include a burnin period of 20000 and to select every second draw to reduce the number of repetitions:
d <- tail(simMCMC(20000 + 2000, init = 5, fun = fun), 2000)[c(TRUE, FALSE)]
plot(density(d))
You invert the ECDF of the distribution:
ecd.points <- ecdf(points)
invecdfpts <- with( environment(ecd.points), approxfun(y,x) )
samp.inv.ecd <- function(n=100) invecdfpts( runif(n) )
plot(density (samp.inv.ecd(100) ) )
plot(density(points) )
png(); layout(matrix(1:2,1)); plot(density (samp.inv.ecd(100) ),main="The Sample" )
plot(density(points) , main="The Original"); dev.off()
Here's another way to do it that draws from R: Generate data from a probability density distribution and How to create a distribution function in R?:
x <- seq(1, 2e1, 0.01)
points <- 20*dexGAUS(x,mu=1,sigma=2,nu=3)*pgamma(x,shape=3,rate=1)
f <- function (x) (20*dexGAUS(x,mu=1,sigma=2,nu=3)*pgamma(x,shape=3,rate=1))
C <- integrate(f,-Inf,Inf)
> C$value
[1] 11.50361
# normalize by C$value
f <- function (x)
(20*dexGAUS(x,mu=1,sigma=2,nu=3)*pgamma(x,shape=3,rate=1)/11.50361)
random.points <- approx(cumsum(pdf$y)/sum(pdf$y),pdf$x,runif(10000))$y
hist(random.points,1000)
hist((random.points*40),1000) will get the scaling like your original function.
I am studying the property of a simulated time series z_t. In brief, z_t is generated as follows: y_t is an AR(1) process with innovations e_t; z_t is y_t * e_t. I expect z_t to show lag-1 correlation.
I performed the Ljung-Box test for z_t in R and found that my result depends a lot on the seed value (for certain seeds I get p-value < 0.001, for others I get p-value near 1; I tested 10 seeds). I also tried to compute the p-value explicitly based on a possible definition of the LB test and found that it's always small. The same analysis done in Python provides low p-values. I suspect that I may be using the Box.test() function incorrectly.
Here is some code to reproduce the problem; the last 3 lines are for my hands-on LB test:
n <- 1000; phi = 0.9; set.seed(9)
errors <- rnorm(n); Y <- rep(0, times <- n)
for (k in 2:1000){
Y[k] <- phi * Y[k-1] + errors[k]
}
y <- ts(Y)
z <- y * errors
Box.test(z, lag=1, type="Ljung-Box", fitdf = 0)
# note: result doesn't depend very much on number of lags
# sometimes result > 0.1.
r1 <- sum(z*lag(z,-1)) / sum(z**2) #1st order autocorrelation
LB <- n * ((n+2)/(n-1))*r1**2 #LB statistic for lag = 1.
1 - pchisq(LB, 1) # p-value of the LB test; always << 1.
I do a MC simulation of a portfolio with 4 assets (Bond, equity, equity, cash market)
I use monthly steps and my simulation horizon is 10 years i.e. 120 steps. My final goal is to compute the yearly expected shortfall, i.e. taking the worst 5% of the Portfolio Returns.
The simulation seems to be ok - at the first glance. However, I have the impression the drift dominates the process over time, so my expected shortfall is even positive for the long end. Also the expected shortfall eventually decreases when I increase the weights for equity. This is also true when I set the expected return for each asset to zero and hence the increased risk should drag the expected shortfall down.
I expect a bug in my code but can't see it. Any advice highly appreciated!
#maturity in years
maturity <- 10
#Using monthly steps
nsteps <- maturity*12
dt <- maturity / nsteps
#number of assets
nAssets = 4
#number of simulations
nTrails = 10000
#expected return p.a. for each asset, stored in vector BM.mu
BM.mu <- rep(NA,nAssets)
BM.mu[1] <- 0.0072
BM.mu[2] <- 0.0365
BM.mu[3] <- 0.04702
BM.mu[4] <- 0.0005
#defining variable size
simulated.Returns <- array(NA, dim = c(nsteps+1, nTrails, nAssets))
cumulative.PortReturns <- matrix(rep(NA,nsteps*nTrails), nrow = nsteps, ncol = nTrails)
ES <- rep(NA, maturity)
#defining my monthly correlation and covariance matrix
corr_matrix <- matrix(c(1.000000000, -0.05081574, -0.07697585, 0.0051,
-0.050815743, 1.00000000, 0.80997805, -0.3540,
-0.076975850, 0.80997805, 1.00000000, -0.3130,
0.005095699, -0.35365332, -0.31278506, 1.0000), nrow = 4, ncol = 4)
cov_matrix <- matrix(c(1.44e-04, -2.20e-05, -3.86e-05, 8.44e-08,
-2.20e-05, 1.30e-03, 1.22e-03, -1.76e-05,
-3.86e-05, 1.22e-03, 1.75e-03, -1.81e-05,
8.44e-08, -1.76e-05, -1.81e-05, 1.90e-06), nrow = 4, ncol = 4)
#defining my portfolio weights
port.weights <- c(0.72, 0.07, 0.07, 0.14)
#performing cholesky decomposition
R <- (chol(corr_matrix))
#generating standard-normal, random variables
x <- array(rnorm(nsteps*nTrails*nAssets), c(nsteps*nTrails,nAssets))
#generating correlated standard-normal, random variables
ep <- x %*% R
#defining the drift
drift <- BM.mu - 0.5 * diag(cov_matrix)
#generating asset paths
temp = array(exp(as.vector(drift %*% t(dt)) + t(ep *sqrt(diag(cov_matrix)))), c(nAssets,nsteps,nTrails))
for(i in 2:nsteps) temp[,i,] = temp[,i,] * temp[,(i-1),]
#changing dimension of the array temp from dim(nAssets, nsteps, nTrails) to dim(nsteps, nAssets, nTrails)
simulated.Returns <- aperm(temp,c(2,1,3))
#computing portfolio returns for each simulation (nTrails). To do this, each step is weighted with "port.weights"
#Since I generate continuous returns, I first transform them into discrete, multiply with weights and then transform back into continuous.
for (z in 1:nTrails) {
for (i in 1:nsteps) cumulative.PortReturns[i,z] = log(1+((exp(simulated.Returns[i,,z]-1)-1) %*% port.weights))
}
#Finally I compute the monthly expected shortfall (5%-level) by taking the average of the 5% worst portfolio yields
#I do steps of 12 as I calculate the ES at the end of each year
z = 0
for (i in seq(12, nsteps, by = 12 )) {
z = z + 1
ES[z] <- mean(sort(cumulative.PortReturns[i,]) [1:(0.05*nTrails)])
}
#plotting a sample of simulated portfolio returns
#library(QRM)
plot(as.timeSeries(cumulative.PortReturns[,1:100]), plot.type = 'single')
From your comments, you have defined BM.mu to be the annual expected return for each asset. However, you are simulating each sample path using monthly rather than annual steps. This then needs to be incorporated in your drift variable by scaling BM.mu to the expected monthly return accordingly:
#defining the drift
drift <- BM.mu/12 - 0.5 * diag(cov_matrix)
Without this, you are computing the drift value using an annual expected return value and a monthly covariance matrix. This is resulting in a larger drift than you expect, which would impact on the results you are seeing.
I need to simulate a stock's daily returns. I am given r=(P(t+1)-P(t))/P(t) (normal distribution) mean of µ=1% and sd of σ =5%. P(t) is the stock price at end of day t. Simulate 100,000 instances of such daily returns.
Since I am a new R user, how do I setup t for this example. I am assuming P should be setup as:
P <- rnorm(100000, .01, .05)
r=(P(t+1)-P(t))/P(t)
You are getting it wrong: from what you wrote, the mean and the sd applies on the return and not on the price. I furthermore make the assumption that the mean is set for an annual basis (1% rate of return from one day to another is just ...huge!) and t moves along a day range of 252 days per year.
With these hypothesis, you can get a series of daily return in R with:
r = rnorm(100000, .01/252, .005)
Assuming the model you mentioned, you can get the serie of the prices P (containing 100001 elements, I will take P[1]=100 - change it with your own value if needed):
factor = 1 + r
temp = 100
P = c(100, sapply(1:100000, function(u){
p = factor[u]*temp
temp<<-p
p
}))
Your configuration for the return price you mention (mean=0.01 and sd=0.05) will however lead to exploding stock price (unrealistic model and parameters). Be carefull to check that prod(rate) will not return Inf .
Here is the result for the first 1000 values of P, representing 4 years:
plot(1:1000, P[1:1000])
One of the classical model (which does not mean this model is realistic) assumes the observed log return are following a normal distribution.
Hope this helps.
I see you already have an answer and ColonelBeauvel might have more domain knowledge than I (assuming this is business or finance homework.) I approached it a bit differently and am going to post a commented transcript. His method uses the <<- operator which is considered as a somewhat suspect strategy in R, although I must admit it seems quite elegant in this application. I suspect my method will probably be a lot faster if you ever get into doing large scale simulations.
Starting with your code:
P <- rnorm(100000, .01, .05)
# r=(P(t+1)-P(t))/P(t) definition, not R code
# inference: P_t+1 = r_t*P_t + P_t = P_t*(1+r_t)
# So, all future P's will be determined by P_1 and r_t
Since P_2 will be P_1*(1+r_1)r_1 then P_3 will be P_1*(1+r_1)*(1+r_2), .i.e a continued product of the vector (1+r) for which there is a vectorized function.
P <- P_1*cumprod(1+r)
#Error: object 'P_1' not found
P_1 <- 100
P <- P_1*cumprod(1+r)
#Error: object 'r' not found
# So the random simulation should have been for `r`, not P
r <- rnorm(100000, .01, .05)
P <- P_1*cumprod(1+r)
plot(P)
#Error in plot.window(...) : infinite axis extents [GEPretty(-inf,inf,5)]
str(P)
This occurred because the cumulative product went above the limits of numerical capacity and got assigned to Inf (infinity). Let's be a little more careful:
r <- rnorm(300, .01, .05)
P <- P_1*cumprod(1+r)
plot(P)
This strategy below iteratively updates the price at time t as 'temp' and multiplies it it by a single value. It's likely to be a lot slower.
r = rnorm(100000, .01/252, .005)
factor = 1 + r
temp = 100
P = c(100, sapply(1:300, function(u){
p = factor[u]*temp
temp<<-p
p
}))
> system.time( {r <- rnorm(10000, .01/250, .05)
+ P <- P_1*cumprod(1+r)
+ })
user system elapsed
0.001 0.000 0.002
> system.time({r = rnorm(10000, .01/252, .05)
+ factor = 1 + r
+ temp = 100
+ P = c(100, sapply(1:300, function(u){
+ p = factor[u]*temp
+ temp<<-p
+ p
+ }))})
user system elapsed
0.079 0.004 0.101
To simulate a log return of the daily stock, use the following method:
Consider working with 256 days of daily stock return data.
Load the original data into R
Create another data.frame for simulating Log return.
Code:
logr <- data.frame(Date=gati$Date[1:255], Shareprice=gati$Adj.Close[1:255], LogReturn=log(gati$Adj.Close[1:251]/gati$Adj.Close[2:256]))
gati is the dataset
Date and Adj.close are the variables
notice the [] values.
P <- rnorm(100000, .01, .05)
r=(P(t+1)-P(t))/P(t)
second line translates directly into :
r <- (P[-1] - P[length(P)]) / P[length(P)] # (1:5)[-1] gives 2:5
Stock returns are not normally distributed for Simple Returns ("R"), given their -1 lower bound per compounded period. However, Log Returns ("r") generally are. The below is adapted from #42's post above. There don't seem to be any solutions to simulating from Log Mean ("Expected Return") and Log Stdev ("Risk") in #Rstats, so I've included them here for those looking for "Monte Carlo Simulation using Log Expected Return and Log Standard Deviation"), which are normally distributed, and have no lower bound at -1. Note: from this single example, it would require looping over thousands of times to simulate a portfolio--i.e., stacking 100k plots like the below and averaging a single slice to calculate a portfolio's average expected return at a chosen forward month. The below should give a good basis for doing so.
startPrice = 100
forwardPeriods = 12*10 # 10 years * 12 months with Month-over-Month E[r]
factor = exp(rnorm(forwardPeriods, .04, .10)) # Monthly Expected Ln Return = .04 and Expected Monthly Risk = .1
temp = startPrice
P = c(startPrice, sapply(1:forwardPeriods, function(u){p = factor[u]*temp; temp <<- p; p}))
plot(P, type = "b", xlab = "Forward End of Month Prices", ylab = "Expected Price from Log E[r]", ylim = c(0,max(P)))
n <- length(P)
logRet <- log(P[-1]/P[-n])
# Notice, with many samples this nearly matches our initial log E[r] and stdev(r)
mean(logRet)
# [1] 0.04540838
sqrt(var(logRet))
# [1] 0.1055676
If tested with a negative log expected return, the price should not fall below zero. The other examples, will return negative prices with negative expected returns. The code I've shared here can be tested to confirm that negative prices do not exist in the simulation.
min(P)
# [1] 100
max(P)
# [1] 23252.67
Horizontal axis is number of days, and vertical axis is price.
n_prices <- 1000
volatility <- 0.2
amplitude <- 10
chng <- amplitude * rnorm(n_prices, 0, volatility)
prices <- cumsum(chng)
plot(prices, type='l')