AR(2) model simulation with initial points - r

I have an AR(2) model, r_t = 0.03 + 0.2 r_{t-2} + a_t with var(a_t) = 0.1. I want to simulate 1000 observations from it with starting values r_0 = -0.02 and r_{-1} = 0.01. This is what I wrote, but I'm pretty sure it's not correct, because the first two values it produces are not the ones I gave it. How do I do this?
mod1 <- arima.sim(list(ar=c(0,0.2), order=c(2,0,0)), n=1000, n.start=2, start.innov= c(0.01,-0.02), sd=sqrt(0.1))+0.03
> head(mod1)
Time Series:
Start = 1
End = 6
Frequency = 1
[1] 0.2629583 -0.6263199 -0.2020755 -0.2865246 -0.3953399 -0.7285421
Thanks in advance!

I figured out that this function's output starts at the point after the values you give it. For example, if you know r_5 = 0.01 and r_6 = -0.02 and want to forecast 1-step and 2-step ahead by simulation, set n = 2 and start.innov = c(0.01, -0.02). If you also set innov = c(0, 0), the result will be the same as the analytical solution.
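If you want a single simulated path of length 1000 that genuinely starts from r_{-1} = 0.01 and r_0 = -0.02, a plain loop is the most transparent route. A minimal sketch (the seed and object names are my own choices, not from the question):
set.seed(123)                     # arbitrary seed, for reproducibility only
n <- 1000
r <- numeric(n + 2)
r[1] <- 0.01                      # r_{-1}
r[2] <- -0.02                     # r_0
a <- rnorm(n, mean = 0, sd = sqrt(0.1))
for (t in 3:(n + 2)) {
  r[t] <- 0.03 + 0.2 * r[t - 2] + a[t - 2]   # r_t = 0.03 + 0.2 r_{t-2} + a_t
}
r <- r[-(1:2)]                    # drop the two fixed starting values, keeping r_1..r_1000
head(r)
If you instead want many replications of this path, wrap the loop in a function and call replicate() to collect the paths in a matrix.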

Related

R Studio - How to predict row-by-row and use previous prediction in next one - linear model

Sorry if this is unclear, had trouble titling this.
Basically I have a linear model that predicts sales and one of the factors is the previous 10 days of sales. So, when predicting for the next month, I need an estimated number for what the "previous 10 days of sales" is for each day in the month.
I want to use the model to generate these numbers - so, for the first day I'm trying to predict, I have the actual number for the last 10 days of sales. For the day after that, I have 9 days of real data, plus the one predicted number generated. For the day after that, 8 days of real data and two generated, etc.
Not quite sure how to implement this and would appreciate any help. Thanks so much.
The first thing that came to mind would be a moving average using the predicted data. This gets hard to defend, though, once you're averaging only predicted data, but it's a place to start.
moving.average <- numeric(30)
test.dat <- rnorm(100, 10, 2)
for (i in 1:30) {
  moving.average[i] <- mean(test.dat[i:(i + 10)])  # note i:(i + 10); i:i+10 would index a single element
}
Hope this is helpful
Kathy, get your first 10 data points from... wherever. Seed your prediction with them.
initialization <- c(9.463, 9.704, 10.475, 8.076, 8.221, 8.509,
                    10.083, 9.572, 8.447, 10.081)
prediction = initialization
Here's a silly prediction function that uses the last 10 values:
predFn <- function(vec10) {
  stopifnot(length(vec10) == 10)
  round(mean(vec10) + 1, 3)
}
Although I usually like to use the map family, this one seems like it wants to be a loop:
for (i in 11:20) {
  lo <- i - 10
  hi <- i - 1
  prediction[i] <- predFn(prediction[lo:hi])
}
What did we get?
prediction
# [1] 9.463 9.704 10.475 8.076 8.221 8.509 10.083 9.572 8.447 10.081 10.263 10.343 10.407 10.400 10.633 10.874 11.110 11.213
# [19] 11.377 11.670
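For completeness, here is a sketch of the same feed-the-prediction-back idea with an actual lm fit instead of the toy predFn above. The data, the lag-mean feature and all names here are made up purely for illustration:
set.seed(42)                                   # hypothetical example data
sales <- rnorm(40, mean = 100, sd = 10)        # 40 days of observed sales
prev10 <- sapply(11:40, function(i) mean(sales[(i - 10):(i - 1)]))
train <- data.frame(sales = sales[11:40], prev10 = prev10)
fit <- lm(sales ~ prev10, data = train)        # sales modelled on the mean of the previous 10 days

extended <- sales                              # start from the real series
for (i in 41:70) {                             # predict 30 new days, one at a time
  feat <- data.frame(prev10 = mean(extended[(i - 10):(i - 1)]))
  extended[i] <- predict(fit, newdata = feat)  # the prediction feeds the next day's feature
}
tail(extended, 5)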

Calculating the mean and variance of a periodic (circular) variable in R

I have several variables in my dataset that represent daily timing of events across a week.
For example, two rows might look like:
t1 = c(NA,12.6,10.7,11.5,12.5,9.5,14.1)
t2 = c(23.7,1.2,NA,22.9,23.2,0.5,0.1)
I want to calculate the variance of these rows. To do this, I need the mean, and because these are periodic variables I've adapted the code from this page:
# This can all be wrapped in a function like this
circ.mean <- function(m, int, na.rm = TRUE) {
  if (na.rm) m <- m[!is.na(m)]
  rad.m <- m * (360 / int) * (pi / 180)
  mean.cos <- mean(cos(rad.m))
  mean.sin <- mean(sin(rad.m))
  x.deg <- atan(mean.sin / mean.cos) * (180 / pi)
  return(x.deg / (360 / int))
}
This works as expected for t2:
> circ.mean(t2,24)
[1] -0.06803088
although ideally the answer would be 23.93197. But for t1, it gives an incorrect answer:
> circ.mean(t1,24)
[1] -0.1810074
whereas using the normal mean function gives the right answer:
> mean(t1,na.rm=T)
[1] 11.81667
My questions are:
1) Is this "circular mean" code correct and if so, am I using it correctly?
2) I've had a stab at my own circ.var function (see below) to calculate the variance of a periodic variable - will it produce the correct variance for all possible input timing vectors?
circ.var <- function(m, int = NULL, na.rm = TRUE) {
  if (is.null(int)) stop("Period parameter missing")
  if (na.rm) m <- m[!is.na(m)]
  if (sum(!is.na(m)) == 0) return(NA)
  n <- length(m)
  mean.m <- circ.mean(m, int)
  var.m <- 1 / (n - 1) * sum((((m - mean.m + (int / 2)) %% int) - (int / 2))^2)
  return(var.m)
}
Any help would be hugely appreciated! Thanks for taking the time to read this!
I deleted my old answer, as I believe there was a mistake in the solution I provided.
I've written a series of R scripts, available on my GitHub page, which should calculate the mean, variance and other stats.
Thanks to @Gregor for his help.
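For reference, the usual culprit in code like circ.mean above is atan, which cannot tell which quadrant the mean direction lies in; atan2 can. A minimal sketch of that fix (my own adaptation, not necessarily identical to the linked scripts):
circ.mean2 <- function(m, int, na.rm = TRUE) {
  if (na.rm) m <- m[!is.na(m)]
  rad.m <- m * 2 * pi / int                            # map values onto the circle
  x.rad <- atan2(mean(sin(rad.m)), mean(cos(rad.m)))   # quadrant-aware mean direction
  (x.rad * int / (2 * pi)) %% int                      # map back to [0, int)
}
circ.mean2(t1, 24)  # ~11.82, close to the ordinary mean here
circ.mean2(t2, 24)  # ~23.93 rather than -0.068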

How to force rpart to do exactly 1 Split

Having a problem similar to this, I am trying to force rpart to do exactly one split. Here is a toy example that reproduces my problem:
require(rpart)
y <- factor(c(1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
x1 <- c(12,18,15,10,10,10,20,6,7,34,7,11,10,22,4,19,10,8,13,6,7,47,6,15,7,7,21,7,8,10,15)
x2 <- c(318,356,341,189,308,236,290,635,550,287,261,472,282,262,1153,435,402,182,415,544,251,281,378,498,142,566,152,560,284,213,326)
data <- data.frame(y=y,x1=x1,x2=x2)
tree <- rpart(y ~ .,
              data = data,
              control = rpart.control(maxdepth = 1,  # at most 1 split
                                      cp = 0,        # any positive improvement will do
                                      minsplit = 1,
                                      minbucket = 1, # even leaves with 1 point are accepted
                                      xval = 0))     # I don't need crossvalidation
length(tree$frame$var) # == 1, so there are no splits
Isolating a single point should be possible (minbucket=1) and even the most marginal improvement (isolating one point always decreases the misclassification rate) should lead to the split being kept (cp=0).
Why does the result not include any splits? How do I have to alter the code to always get exactly one split? Could it be that splits are not kept if both leaves classify to the same factor level?
Change cp = 0 to cp = -1.
Apparently the cp for the first split is 0.0000000 (you can see this by fitting with a larger maxdepth, e.g. maxdepth = 3), so allowing a negative cp lets that split show up with maxdepth = 1.
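In other words, keeping everything else in the question's call the same and only changing cp should give exactly one split; a quick sketch:
tree1 <- rpart(y ~ ., data = data,
               control = rpart.control(maxdepth = 1, cp = -1,
                                       minsplit = 1, minbucket = 1, xval = 0))
length(tree1$frame$var)  # 3: the root plus two leaves, i.e. exactly one split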

how to use previous observations to forecast the next period using for loops in r?

I have simulated 1000 observations from the AR(2) model x_t = γ_1 x_{t-1} + γ_2 x_{t-2} + ε_t.
What I would like to do is to use the first 900 observations to estimate the model, and use the remaining 100 observations to predict one-step ahead.
This is what I have done so far:
data2=arima.sim(n=1000, list(ar=c(0.5, -0.7))) #1000 observations simulated, (AR (2))
arima(data2, order = c(2,0,0), method= "ML") #estimated parameters of the model with ML
fit2<-arima(data2[1:900], c(2,0,0), method="ML") #first 900 observations used to estimate the model
predict(fit2, 100)
But the problem with my code right now is that it uses n.ahead = 100, whereas I would like to use n.ahead = 1 and make 100 one-step predictions in total.
I think I need to use a for loop for this, but since I am a very new user of RStudio I haven't been able to figure out how to use a loop to make the predictions. Can anyone help me with this?
If I've understood you correctly, you want one-step predictions on the test set. This should do what you want without loops:
library(forecast)
data2 <- arima.sim(n=1000, list(ar=c(0.5, -0.7)))
fit2 <- Arima(data2[1:900], c(2,0,0), method="ML")
fit2a <- Arima(data2[901:1000], model=fit2)
fc <- fitted(fit2a)
The Arima command allows a model to be applied to a new data set without the parameters being re-estimated. Then fitted gives one-step in-sample forecasts.
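If you then want to see how those one-step forecasts do on the held-out data, something along these lines should work (a small add-on of mine, not part of the original answer):
accuracy(fc, data2[901:1000])       # error measures of the one-step forecasts on the test set
plot(data2[901:1000], type = "l")   # actual test observations
lines(as.numeric(fc), col = "red")  # overlay the one-step forecasts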
If you want multi-step forecasts on the test data, you will need to use a loop. Here is an example for two-step ahead forecasts:
fcloop <- numeric(100)
h <- 2
for(i in 1:100)
{
fit2a <- Arima(data2[1:(899+i)], model=fit2)
fcloop[i] <- forecast(fit2a, h=h)$mean[h]
}
If you set h <- 1 above you will get almost the same results as using fitted in the previous block of code. The first two values will be different because the approach using fitted does not take account of the data at the end of the training set, while the approach using the loop uses the end of the training set when making the forecasts.

Doing linear prediction with R: How to access the predicted parameter(s)?

I am new to R and I am trying to do linear prediction. Here is some simple data:
test.frame<-data.frame(year=8:11, value= c(12050,15292,23907,33991))
Say I want to predict the value for year = 12. This is what I am doing (experimenting with different commands):
lma=lm(test.frame$value~test.frame$year) # let's get a linear fit
summary(lma) # let's see some parameters
attributes(lma) # let's see what parameters we can call
lma$coefficients # I get the intercept and gradient
predict(lm(test.frame$value~test.frame$year))
newyear <- 12 # new value for year
predict.lm(lma, newyear) # predicted value for the new year
Some queries:
If I issue the command lma$coefficients, for instance, a vector of two values is returned. How do I pick out only the intercept value?
I get lots of output with predict.lm(lma, newyear) but cannot understand where the predicted value is. Can someone please clarify?
Thanks a lot...
intercept:
lma$coefficients[1]
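Equivalently, the intercept can be picked out by name (using the standard name lm gives it):
coef(lma)["(Intercept)"]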
For the prediction, try this:
test.frame <- data.frame(year=12, value=0)
predict.lm(lma, test.frame)
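One caveat worth adding (my note, not part of the answer above): because lma was fitted with the formula test.frame$value ~ test.frame$year, predict.lm looks up those whole expressions rather than a year column in newdata, which is why the answer overwrites test.frame itself. A cleaner route is to fit with column names and a data argument, after which newdata works directly; a sketch:
test.frame <- data.frame(year = 8:11, value = c(12050, 15292, 23907, 33991))
lma <- lm(value ~ year, data = test.frame)     # formula refers to column names
predict(lma, newdata = data.frame(year = 12))  # single predicted value for year 12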
