R: Autocorrelation in a matrix - r

i have do to a monte carlo approach for AR(1) time series. I have to generate 10,000 time series of length 100 and afterwards i have to get the first step autocorrelation rho_1 for every time series. My problem is that i just get NA values for the autocorrelation and the calculation takes way to much time. I have no problem with computing the AR(1) time series.
Thank you for your help :)
gen_ar <- function(a,b,length,start)
{
z<-rep(0,length)
e<-rnorm(n=length,sd=1)
z[1]<-start
for (i in 2:length)
{
z[i]<-a+b*z[i-1]+e[i]
}
z
}
mc <- matrix(c(rep(0,10000000)),nrow=10000)
for (i in 1:10000)
{
mc[i,] <- gen_ar(0.99,1,100,0)
}
ac <- matrix(c(rep(0,10000)),nrow=1)
for (i in 1:10000){
for (j in 1:99){
ac[i] <- cor(mc[i,j],mc[i,j+1])
}
}

Statistics aside, I think this achieves your goals, and I don't get NA's. I changed the way it was done b/c you said it was going slow.
mc <- matrix(rep(NA,1E5), nrow=100)
for(i in seq_len(100)){
mc[,i] <- arima.sim(model=list(ar=0.99), n=100, sd=1) + 1
}
myAR <- function(x){
cor(x[-1], x[-length(x)])
}
answer <- apply(mc, 2, myAR)
I skipped the last set of nested for loops and replaced them with apply(). It seems easier to read, and is likely faster. Also, to use apply(), I created a function called myAR, which carries out the same calculation that cor() did in your for() loops.
Now, there are a couple of statistical adjustments that I made. Primarily, these were in the simulation step.
First, your simulated AR(1) process has a coefficient that is equal to 1, which seems odd to me (this would not be stationary, and arima.sim() won't even let you simulate this type of process).
Moreover, your "a" parameter adds 1 to the time series at each time step. In other words, your time series is monotonically increasing from 1 to 100 because the coefficient is equal to 1. This too would make your time series nonstationary, and with such a strong positive slope the cor() function would likely return 1 as the estimated correlation, regardless of the value of the simulated AR coefficient. I assume that you wanted the long-term mean to hover near 1, so the 1 is simply added to the entire time series after it is simulated, not iteratively at each time step.
Assuming that you did want to generate a nonstationary time series by adding some constant (a) at each time step, you could do the following:
myInnov <- function(N=100, a=1, SD=1) {a + rnorm(n=N, sd=SD)}
mc2 <- matrix(rep(NA,1E7), nrow=100)
for(i in seq_len(1E5)){
mc2[,i] <- arima.sim(model=list(ar=0.99), n=100, innov=myInnov(a=1, N=100, SD=1)) + 1
}
I hope that this helps.

Related

Calculate Errors using loop function in R

I have two data matrices both having the same dimensions. I want to extract the same series of columns vectors. Then take both series as vectors, then calculate different errors for example mean absolute error (mae), mean percentage error (mape) and root means square error
(rmse). My data matrix is quite large dimensional so I try to explain with an example and calculate these errors manually as:
mat1<- matrix(6:75,ncol=10,byrow=T)
mat2<- matrix(30:99,ncol=10,byrow=T)
mat1_seri1 <- as.vector(mat1[,c(1+(0:4)*2)])
mat1_seri2<- as.vector(mat1[,c(2+(0:4)*2)])
mat2_seri1 <- as.vector(mat1[,c(1+(0:4)*2)])
mat2_seri2<- as.vector(mat1[,c(2+(0:4)*2)])
mae1<-mean(abs(mat1_seri1-mat2_seri1))
mae2<-mean(abs(mat1_seri2-mat2_seri2))
For mape
mape1<- mean(abs(mat1_seri1-mat2_seri1)/mat1_seri1)*100
mape2<- mean(abs(mat1_seri2-mat2_seri2)/mat1_seri2)*100
similarly, I calculate rmse from their formula, as I have large data matrices so manually it is quite time-consuming. Is it's possible to do this using looping which gives an output of the errors (mae,mape,rmse) term for each series separately.
I'm not sure if this is what you are looking for, but here is a function that could automate the process, maybe there is also a better way:
fn <- function(m1, m2) {
stopifnot(dim(m1) == dim(m2))
mat1_seri1 <- as.vector(m1[, (1:ncol(m1))[(1:ncol(m1))%%2 != 0]])
mat1_seri2 <- as.vector(m1[, (1:ncol(m1))[!(1:ncol(m1))%%2]])
mat2_seri1 <- as.vector(m2[, (1:ncol(m2))[(1:ncol(m2))%%2 != 0]])
mat2_seri2 <- as.vector(m2[, (1:ncol(m2))[!(1:ncol(m2))%%2]])
mae1 <- mean(abs(mat1_seri1-mat2_seri1))
mae2 <- mean(abs(mat1_seri2-mat2_seri2))
mape1 <- mean(abs(mat1_seri1-mat2_seri1)/mat1_seri1)*100
mape2 <- mean(abs(mat1_seri2-mat2_seri2)/mat1_seri2)*100
setNames(as.data.frame(matrix(c(mae1, mae2, mape1, mape2), ncol = 4)),
c("mae1", "mae2", "mape1", "mape2"))
}
fn(mat1, mat2)
mae1 mae2 mape1 mape2
1 24 24 92.62581 86.89572

How to find the probability of extinction = 1 using Galton-Watson process in R?

I am simulating a basic Galton-Watson process (GWP) using a geometric distribution. I'm using this to find the probability of extinction for each generation. My question is, how do I find the generation at which the probability of extinction is equal to 1?
For example, I can create a function for the GWP like so:
# Galton-Watson Process for geometric distribution
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- sum(rgeom(Sn[i - 1], p))
}
return(Sn)
}
where, n is the number of generations.
Then, if I set the geometric distribution parameter p = 0.25... then to calculate the probability of extinction for, say, generation 10, I just do this:
N <- 10 # Number of elements in the initial population.
GWn <- replicate(N, GWP(10, 0.25)[10])
probExtinction <- sum(GWn==0)/N
probExtinction
This will give me the probability of extinction for generation 10... to find the probability of extinction for each generation I have to change the index value (to the corresponding generation number) when creating GWn... But what I'm trying to do is find at which generation will the probability of extinction = 1.
Any suggestions as to how I might go about solving this problem?
I can tell you how you would do this problem in principle, but I'm going to suggest that you may run into some difficulties (if you already know everything I'm about to say, just take it as advice to the next reader ...)
theoretically, the Galton-Watson process extinction probability never goes exactly to 1 (unless prob==1, or in the infinite-time limit)
of course, for any given replicate and random-number seed you can compute the first time point (if any) at which all of your lineages have gone extinct. This will be highly variable across runs, depending on the random-number seed ...
the distribution of extinction times is extremely skewed; lineages that don't go extinct immediately will last a loooong time ...
I modified your GWP function in two ways to make it more efficient: (1) stop the simulation when the lineage goes extinct; (2) replace the sum of geometric deviates with a single negative binomial deviate (see here)
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- rnbinom(1, size=Sn[i - 1], prob=p)
if (Sn[i]==0) break ## extinct, bail out
}
return(Sn)
}
The basic strategy now is: (1) run the simulations for a while, keep the entire trajectory; (2) compute extinction probability in every generation; (3) find the first generation such that p==1.
set.seed(101)
N <- 10 # Number of elements in the initial population.
maxgen <- 100
GWn <- replicate(N, GWP(maxgen, 0.5), simplify="array")
probExtinction <- rowSums(GWn==0)/N
which(probExtinction==1)[1]
(Subtract 1 from the last result if you want to start indexing from generation 0.) In this case the answer is NA, because there's 1/10 lineages that manages to stay alive (and indeed gets very large, so it will probably persist almost forever)
plot(0:maxgen, probExtinction, type="s") ## plot extinction probability
matplot(1+GWn,type="l",lty=1,col=1,log="y") ## plot lineage sizes (log(1+x) scale)
## demonstration that (sum(rgeom(n,...)) is equiv to rnbinom(1,size=n,...)
nmax <- 70
plot(prop.table(table(replicate(10000, sum(rgeom(10, prob=0.3))))),
xlim=c(0,nmax))
points(0:nmax,dnbinom(0:nmax, size=10, prob=0.3), col=2,pch=16)

Monte Carlo Simulation for DCF Model in R

I am trying to create a function where Monte Carlo Simulation is applied to two of the variables in a DCF Model in R Studio. It supposed to take a first value FCF_0 and applied to it a specific growth FCF_ 0*(1 + growth), which is the first input variable until period 6, each period takes the last FCF to keep growing. After that I would like to discount it as well to get the present value which would be FCFn*(1/((1+WACC)^n)). Where WACC is the second variable to simulate.
So far I have the function to calculate the FCF but with a vector of specifics values of growth, which is the following:
What I am trying so far to create this function is this, but I think is bad.
Could you please help me to understand how to create both simulations and if it is neccesary for me to create two functions or in one function I can do everything? I would expect from the function to give the sum of all present values and each sum would be an element in a vector of 10.000 simulations. I am new at this and even though I have read almost for two weeks, I don't get how to create these simulations.
Thank you very much!
revfunc <- function(hist, growth){
rval <- c()
help <- c(hist)
for(i in growth){
help <- help*(1+i)
rval <- c(rval, help)
}
return(rval)
}
Monte Carlo Simulations
pvffcf_function <- function(fcf0, growth, wacc){
rval1 <- c()
help <- c(fcf0)
pvs <- rval1*(1/((1+wacc)^n))
random_growth <- rnorm(n=10000, mean(fcfgrowth), sd(fcfgrowth))
wacc <- rnorm(n=10000, 0.03804, 0.007711)
pvffcf <- sum(freecashflows)
for(i in growth){
help <- help*(1+i)
rval1 <- c(rval1, help)
}
return(freecashflows)
}

Efficient simulation of brownian motion with drift in R

I want to efficiently simulate a brownian motion with drift d>0, where the direction of the drift changes, if some barriers b or -b are exceeded (no reflection, just change of drift direction!).
A for-loop is the simple way doing this
step<-0.1 #step size
sig<-1 #sign of drift
T<-10^4 #length of process
b<-300; d<-0.5#barrier and drift
W<-rep(NA,(T/step))
W[1]<-0
for (i in 2:(T/step))
{
if (W[i-1]>b) {sig<- -1} #change drift to -1
if (W[i-1]< -b) {sig<-1} #change drift to +1
W[i]<-W[i-1]+rnorm(1,d*sig*step,sqrt(step))
}
Of course this loop takes much time in R especially for small step size.
Therefore, I am interested in a more efficient solution maybe using vectorial operations or the apply()-command. (in case of a simple brownian motion I can use cumsum(rnorm()), is a similar solution possible here?)
Thanks a lot!!
You have a recursive calculation for W[i] and sig which also does a bit of logic at each step. There's probably not much you can do in R to reduce the execution times by a large factor but there are a couple of things that will reduce times by almost 50%. In particular rather than calling rnorm on each step, vectorize this calculation by calling rnorm once with mean=0 to compute num_step values and store the result. In each step of the loop, the value from this vector is added to the mean for that step. The logic determining the value of sig can also be simplified a bit. The code with timings for the posted approach and the new code are:
step<-0.1 #step size
T<-10^4 #length of process
b<-300; d<-0.5 #barrier and drift
print(system.time({
sig <- 1 #sign of drift
set.seed(123) # set seed
W<-rep(NA,(T/step))
W[1]<-0
for (i in 2:(T/step))
{
if (W[i-1]>b) {sig<- -1} #change drift to -1
if (W[i-1]< -b) {sig<-1} #change drift to +1
W[i]<-W[i-1]+rnorm(1,d*sig*step,sqrt(step))
}
}))
print(system.time({
sig <- 1 # reset value of sig
set.seed(123) # reset seed
num_steps <- trunc(T/step)
W1 <- numeric(num_steps)
ep <- rnorm(num_steps, 0, sqrt(step))
for (i in 2:num_steps) {
if(abs(W1[i-1]) > b) sig <- ifelse( W1[i-1] >b, -1, 1)
W1[i] <- W1[i-1]+d*sig*step +ep[i-1]
}
}))
The results of the two calculations, W and W1, should be identical.

Monte Carlo simulation in R

I am trying to simulate data (Y) from an AR(1) model with rho=0.7. Then I will use this data to run a regression of Y on an intercept ( by so doing the parameter estimate becomes the mean of Y), then test the null hypothesis of the coefficient being less than or equal to zero ( alternative is greater than 0) using robust standard errors.
I want to run a Monte Carlo simulation of this hypothesis using 2000 replications for different lag values. the purpose is to show the finite sample performance of the Newey West estimator as the lag changes. so this is how I began
A<-array(0, dim=c(2000,1))
for(i in 1:2000){
y_new<-arima.sim(model=list(ar=0.7), n=50, mean=0,sd=1)
reg<-lm(y_new~1)
ad<-coeftest(reg, alternative="greater", vcov=NeweyWest(reg, lag=1, prewhite=FALSE))
A[i]<-ad[,3]
}
My question: is the code above the right way of doing this kind of simulation? And if it is, how can I get a code to repeat this process for different lag values in the HAC test. I want to run the test each time increasing the lag by 1, thus I will be doing this 50 times for lags 1,2,3,4......,50, each time storing the 2000 simulated test statistics in a vector with different names. calculate rejection probabilities for the test statistic (sig. level =0,05, using the critical value of 1.645) for each case and plot them(rejection probabilities) against the various lag values.
Please help
Because you didn't mention the possible purpose of the simulation, it is hard to tell whether it is the right way.
You save a lot of time by computing 50 test statistics for each simulated sample, instead of repeating the simulation 2000 times for each lag (that is, the number of simulation is 2000*50).
Much better format of doing simulation is
library(AER)
library(dplyr)
lags <- 1:50
nreps <- 2000
sim <- function (){
ynew <- arima.sim(model = list(ar=0.7), n=50, mean=0, sd=1)
reg <- lm(ynew ~ 1 )
s <- rep(NA, 50)
for(i in lags){
ad <- coeftest(reg, alternative="greater", vcov=NeweyWest(reg, lag = i, prewhite=FALSE))
s[i] <- ad[ ,4]
}
s
}
Following code stores simulation results in a data.frame
result <- lapply(1:nreps, function(i)data.frame(simulation = i, lag = lags, pvalues = sim())) %>%
rbind_all
From your vague description, I extrapolate what you want looks something like
library(ggplot2)
result %>%
group_by(lag) %>%
summarize(rejectfreq = mean(pvalues > 0.05)) %>%
ggplot(., aes(lag, rejectfreq)) + geom_line()+
coord_cartesian(ylim = c(0,1)) +
scale_y_continuous(breaks=seq(0, 1, by=0.1))
Although the figure was created using only 100 simulations, it is evident that the choice of the lags in Newey-West wouldn't matter much when the disturbance terms are i.i.d.

Resources