acf() function "lag.max" parameter explanation - r

I am using the acf function for time series analysis in R and am having trouble understanding its lag.max argument.
The help page gives the following explanation for lag.max:
lag.max: maximum lag at which to calculate the acf. Default is
10*log10(N/m) where N is the number of observations and m the
number of series. Will be automatically limited to one less
than the number of observations in the series.
What is m, the "number of series"?
Say I have a time series of monthly data for the past 34 months, and I need to make a prediction for the next (35th) month.
In this case N will be 34, but what should m be so that I can calculate the lag.max parameter?
Thanks!

m is the dimension of your data. So it just matters if you have a multidimensional time series. In your case, as I understand from your question, m=1.
N <- 200
a <- 1:N
b <- 1:N
acf(a)
# m = 1
# lag.max = 10*log10(N/1) = 23
df <- data.frame(a, b)
acf(df)
# m = 2
# lag.max = 10*log10(N/2) = 20
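For the asker's case this gives (a quick check, assuming a univariate series so that m = 1; note that acf() floors the result):
N <- 34
floor(10 * log10(N / 1))  # 15, the default lag.max acf() would use here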

Related

How are the intermediate values between observations in time series calculated in ts() in R?

I need help understanding how frequency affects my time series. I fit daily time series data with frequency = 7. When I view the time series, I get intermediate values between days. I have data for 60 days and created a time series from it:
ts.v1<- ts(V1, start = as.Date("2017-08-01"), end = as.Date("2017-09-30"), frequency = 7)
which gives me 421 values. I gathered that this has to do with the frequency, since the count is roughly the product of 7 and 60. What I need to know is: how are these intermediate values calculated, and why? Isn't frequency only used to tell your time series whether the data is daily/weekly/annual etc.? (I referred to this)
Similarly, in my ACF and PACF plots the lag values are < 1, meaning it takes seven values to make one 'lag'. In that scenario, when I estimate arima(p,d,q) from these plots, would the values be taken as lag × frequency?
Normally one does not use the Date class with ts. With ts, the frequency is the number of points in a unit interval. Just use:
ts(V1, frequency = 7)
The times will be 1, 1 + 1/7, 1 + 2/7, ... You can later match them to the proper dates if need be.
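For instance, a minimal sketch of matching them back to dates (v1 below is placeholder data standing in for V1):
v1 <- rnorm(60)                          # placeholder for the 60 daily values in V1
ts.v1 <- ts(v1, frequency = 7)           # weekly cycle; times are 1, 1 + 1/7, ...
dates <- seq(as.Date("2017-08-01"), by = "day", length.out = length(v1))
head(data.frame(date = dates, time = as.numeric(time(ts.v1)), value = v1))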

Why is EMA not equal when comparing samples of different length?

I am struggling to understand why EMA (Exponential Moving Average) is different in these 2 cases.
options(scipen = 10)
library(quantmod)
getSymbols("AAPL", src = "google")
data1 <- EMA(AAPL[, "AAPL.Close"])
data2 <- EMA(tail(AAPL[, "AAPL.Close"], n = 10))
result <- data.frame(tail(data1, n = 1), tail(data2, n = 1))
In the first EMA call I supply the whole AAPL sample as the parameter. In the second EMA call I supply only the minimum amount of data needed to calculate the EMA for the last date. If I compare the calculated values for the last day, they are different.
Concretely, result[1,1] is 152.907 and result[1,2] is 152.623. Why is this happening? I would expect both numbers to be the same, since EMA is not cumulative.
That is because the exponential decay has 'infinite' history: it decays, but never completely goes away.
So the set of weights differs between your two use cases, and hence so does the result. There is a decent recent series on MAs for smoothing price series:
part one
part two
It should make it clear that what you were expecting only holds for the SMA, whose weights do in fact drop to zero.
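A minimal sketch of the same effect on synthetic prices (EMA comes from TTR, which quantmod loads; the data here are made up, not AAPL):
library(TTR)
set.seed(1)
prices <- 100 + cumsum(rnorm(100))      # synthetic price path
full <- EMA(prices, n = 10)             # EMA seeded with the full history
short <- EMA(tail(prices, 10), n = 10)  # EMA seeded with only 10 points
tail(full, 1)   # differs from ...
tail(short, 1)  # ... this, because earlier observations still carry weight in 'full'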

Wind Speed time series simulation in R

Following up on an R blog post which is interesting and quite useful for simulating the time series of an unknown area from its Weibull parameters.
Although this method gives a reasonably good estimate of the time series as a whole, it suffers a great deal when we look for seasonal changes.
Let's see an example:
For a particular set of Weibull parameters, this method would give monthly mean wind speeds such as:
Jan 7.492608
Feb 7.059587
Mar 7.261821
Apr 7.192106
May 7.399982
Jun 7.195889
Jul 7.290898
Aug 7.210269
Sep 7.219063
Oct 7.307073
Nov 7.135451
Dec 7.315633
It can be seen that the variation in wind speed is not that large, whereas in reality the variation changes from one month to another. If I wanted to prioritise certain months, say June and July over November and December, while keeping the overall Weibull distribution unchanged, how would I do it?
Any lead or advice on making these changes to the code listed in the link above would be a great help.
On request, here is the sample code.
MeanSpeed <- 7.29 ## Mean yearly wind speed at the site.
Shape <- 2        ## Input shape parameter.
Scale <- 8        ## Calculated scale parameter.
MaxSpeed <- 17
nStates <- 16
These are the inputs from the blog: MeanSpeed is the average annual wind speed at a location with the given Shape and Scale parameters, and MaxSpeed is the maximum speed possible over the year.
I would like to have a MaxSpeed for each month, say Maxspeed_Jan, Maxspeed_Feb ... through Maxspeed_Dec, all with different values. This should reflect the seasonality of the wind speed variations across the year.
Then the following should be calculated in a way that reflects this variation in the output time series.
nRows<-nStates;
nColumns<-nStates;
LCateg<-MaxSpeed/nStates;
WindSpeed <- seq(LCateg/2, MaxSpeed - LCateg/2, by = LCateg) ## Define the velocity vector, centred on the average value of each category.
##Determine Weibull Probability Distribution.
wpdWind <- dweibull(WindSpeed, shape = Shape, scale = Scale) ## Weibull density at each category centre.
plot(wpdWind, type = "b", ylab = "frequency", xlab = "Wind Speed") ## Plot the Weibull probability distribution.
norm_wpdWind <- wpdWind/sum(wpdWind) ## Normalise the Weibull density so the probabilities sum to 1.
## Correlation between states (Matrix G)
g<-function(x){2^(-abs(x))} ## decreasing correlation function between states.
G<-matrix(nrow=nRows,ncol=nColumns)
G <- row(G)-col(G)
G <- g(G)
##--------------------------------------------------------
## iterative process to calculate the matrix P (initial probability)
P0<-diag(norm_wpdWind); ## Initial value of the MATRIX P.
P1<-norm_wpdWind; ## Initial value of the VECTOR p.
## This iterative calculation should run until the error drops below some tolerance;
## as something tentative, I fix the number of iterations instead.
steps=1000;
P=P0;
p=P1;
for (i in 1:steps){
r<-P%*%G%*%p;
r<-as.vector(r/sum(r)); ## The above result is in matrix form. I change it to vector
p=p+0.5*(P1-r)
P=diag(p)}
## $$ ----Markov Transition Matrix --- $$ ##
N=diag(1/as.vector(p%*%G));## normalization matrix
MTM=N%*%G%*%P ## Markov Transition Matrix
MTMcum<-t(apply(MTM,1,cumsum));## From the MTM generated the accumulated
##-------------------------------------------
## Calculating the series from the MTMcum
##Insert number of data sets.
LSerie <- 52560 ## Wind speed every 10 minutes for a year.
RandNum1<-runif(LSerie);## Random number to choose between states
State<-InitialState<-1;## assumes that the initial state is 1 (this must be changed when concatenating days)
StatesSeries=InitialState;
## Initialise ----
## The next state is chosen as the first one whose cumulated probability (from the MTMcum row) exceeds the random number.
for (i in 2:LSerie) {
## i has to start on 2 !!
State=min(which(RandNum1[i]<=MTMcum[State,]));
## if (is.infinite (State)) {State = 1}; ## when the above condition is not met max -Inf
StatesSeries=c(StatesSeries,State)}
RandNum2<-runif(LSerie); ## Random number to choose between speeds within a state
SpeedSeries <- WindSpeed[StatesSeries] - LCateg/2 + RandNum2*LCateg
## The LCateg/2 correction is needed since the WindSpeed vector is centred on the mean value of each category.
print(fitdistr(SpeedSeries, 'weibull')) ## MLE fit of SpeedSeries (fitdistr is in the MASS package).
The obtained result should resemble the input Scale and Shape parameters, and instead of a uniform wind speed in each month, the variation should reflect the input maximum wind speed of each month.
Thank you.

Adding seasonal variations to wind speed time series

Following up on an R blog post which is interesting and quite useful for simulating the time series of an unknown area from its Weibull parameters.
Although this method gives a reasonably good estimate of the time series as a whole, it suffers a great deal when we look for seasonal changes. To account for them, I want to employ seasonal maximum wind speeds and carry out the time series synthesis so that the yearly distribution, i.e. the (annual) shape and scale parameters, remains constant.
I want to apply seasonal maximum wind speeds to the code below by using 12 different maximum wind speeds, one for each month. This will allow greater wind speeds in certain months and lower ones in others, and should even out the resulting time series.
The code is as follows:
MeanSpeed <- 7.29 ## Mean yearly wind speed at the site.
Shape <- 2        ## Input shape parameter (yearly).
Scale <- 8        ## Calculated scale parameter (yearly).
MaxSpeed <- 17    ## Yearly.
## $$$ 12 values of this maximum wind speed, one for each month, are to be used. The resultant time series should still satisfy the shape and scale parameters. $$$ ###
nStates <- 16
nRows<-nStates;
nColumns<-nStates;
LCateg<-MaxSpeed/nStates;
WindSpeed <- seq(LCateg/2, MaxSpeed - LCateg/2, by = LCateg) ## Define the velocity vector, centred on the average value of each category.
##Determine Weibull Probability Distribution.
wpdWind <- dweibull(WindSpeed, shape = Shape, scale = Scale) ## Weibull density at each category centre.
plot(wpdWind, type = "b", ylab = "frequency", xlab = "Wind Speed") ## Plot the Weibull probability distribution.
norm_wpdWind <- wpdWind/sum(wpdWind) ## Normalise the Weibull density so the probabilities sum to 1.
## Correlation between states (Matrix G)
g<-function(x){2^(-abs(x))} ## decreasing correlation function between states.
G<-matrix(nrow=nRows,ncol=nColumns)
G <- row(G)-col(G)
G <- g(G)
##--------------------------------------------------------
## iterative process to calculate the matrix P (initial probability)
P0<-diag(norm_wpdWind); ## Initial value of the MATRIX P.
P1<-norm_wpdWind; ## Initial value of the VECTOR p.
## This iterative calculation should run until the error drops below some tolerance;
## as something tentative, I fix the number of iterations instead.
steps=1000;
P=P0;
p=P1;
for (i in 1:steps){
r<-P%*%G%*%p;
r<-as.vector(r/sum(r)); ## The above result is in matrix form. I change it to vector
p=p+0.5*(P1-r)
P=diag(p)}
## $$ ----Markov Transition Matrix --- $$ ##
N=diag(1/as.vector(p%*%G));## normalization matrix
MTM=N%*%G%*%P ## Markov Transition Matrix
MTMcum<-t(apply(MTM,1,cumsum));## From the MTM generated the accumulated
##-------------------------------------------
## Calculating the series from the MTMcum
##Insert number of data sets.
LSerie <- 52560 ## Wind speed every 10 minutes for a year.
RandNum1<-runif(LSerie);## Random number to choose between states
State<-InitialState<-1;## assumes that the initial state is 1 (this must be changed when concatenating days)
StatesSeries=InitialState;
## Initialise ----
## The next state is chosen as the first one whose cumulated probability (from the MTMcum row) exceeds the random number.
for (i in 2:LSerie) {
## i has to start on 2 !!
State=min(which(RandNum1[i]<=MTMcum[State,]));
## if (is.infinite (State)) {State = 1}; ## when the above condition is not met max -Inf
StatesSeries=c(StatesSeries,State)}
RandNum2<-runif(LSerie); ## Random number to choose between speeds within a state
SpeedSeries <- WindSpeed[StatesSeries] - LCateg/2 + RandNum2*LCateg
## The LCateg/2 correction is needed since the WindSpeed vector is centred on the mean value of each category.
print(fitdistr(SpeedSeries, 'weibull')) ## MLE fit of SpeedSeries (fitdistr is in the MASS package).
Can anyone suggest where and what changes I need to make to the code?
I don't know much about generating wind speed time series, but maybe these guidelines can help you improve your code's readability and reusability:
#1 You probably want a function that generates a wind speed time series given a number of observations and a seasonal maximum wind speed. So first try to define your code inside a block like this one:
wind_time_serie <- function(nobs, max_speed){
#some code here
}
#2 In the same spirit, if some parts of your code are reusable but aren't specific to wind speed time series, put them into their own functions (e.g. the part where you compute norm_wpdWind, the part where you compute MTMcum, ...).
#3 Then the part at the beginning of your code where you define global variables should disappear and become default arguments of those functions.
#4 Avoid end-of-line comments when the line is already long, and drop the trailing semicolons.
#This
State<-InitialState<-1;## assumes that the initial state is 1 (this must be changed when concatenating days)
#Would become this:
#Assumes that the initial state is 1 (this must be changed when concatenating days)
State<-InitialState<-1
Then your code should be more reusable/readable by other people. Below is an example of these guidelines applied to the norm_wpdWind part:
norm_distrib <- function(maxSpeed, states = 16, shape = 2, scale = 8){
  #Define the velocity vector, centred on the average value of each category.
  LCateg <- maxSpeed/states
  WindSpeed <- seq(LCateg/2, maxSpeed - LCateg/2, by = LCateg)
  #Determine the Weibull probability distribution.
  wpdWind <- dweibull(WindSpeed, shape = shape, scale = scale)
  #Normalise the Weibull density so it sums to 1.
  return(wpdWind/sum(wpdWind))
}
#Plot the normalised distribution with the max speed you want (e.g. 17)
plot(norm_distrib(17), type = "b", ylab = "frequency", xlab = "Wind Speed")
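Building on that, one hypothetical way to attack the seasonal part of the question could look like the sketch below (the monthly maxima are invented values; whether the resulting annual series still matches the yearly shape and scale parameters would still need to be checked):
monthly_max <- c(Jan = 18, Feb = 17, Mar = 16, Apr = 15, May = 14, Jun = 13,
                 Jul = 13, Aug = 14, Sep = 15, Oct = 16, Nov = 17, Dec = 18)
distribs <- lapply(monthly_max, norm_distrib)  # one normalised distribution per month
plot(distribs$Jul, type = "b", ylab = "frequency", xlab = "Wind Speed")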

Convert double differenced forecast into actual value diff() in R

I have already read
Time Series Forecast: Convert differenced forecast back to before difference level
and
How to "undifference" a time series variable
Unfortunately, neither gives a clear answer on how to convert a forecast from an ARIMA model fitted to a differenced series (via diff()) back to the original level.
A code sample:
## read data and start from 1 jan 2014
dat<-read.csv("rev forecast 2014-23 dec 2015.csv")
val.ts <- ts(dat$Actual,start=c(2014,1,1),freq=365)
##Check how we can get stationary series
plot((diff(val.ts)))
plot(diff(diff(val.ts)))
plot(log(val.ts))
plot(log(diff(val.ts)))
plot(sqrt(val.ts))
plot(sqrt(diff(val.ts)))
##I found that double differencing, i.e. diff(diff(val.ts)), gives a stationary series.
#I ran the code below to get the three ARIMA order parameters from auto.arima:
ARIMAfit <- auto.arima(diff(diff(val.ts)), approximation=FALSE,trace=FALSE, xreg=diff(diff(xreg)))
#Finally ran ARIMA
fit <- Arima(diff(diff(val.ts)),order=c(5,0,2),xreg = diff(diff(xreg)))
#plot original to see fit
plot(diff(diff(val.ts)),col="orange")
#plot fitted
lines(fitted(fit),col="blue")
This gives me a perfectly fitting time series. However, how do I convert the fitted values back to their original metric from the double-differenced form they are in now? For a log transform I know we can back-transform with exp(fitted(fit)), and for a square root there is a similar inverse, but what do we do for differencing, let alone double differencing?
Any help on this in R, please? After days of rigorous exercise, I am stuck at this point.
I ran a test to check whether differencing has any impact on the model fit from auto.arima and found that it does, so auto.arima does not handle a non-stationary series by itself and it requires some effort on the analyst's part to make the series stationary.
Firstly, auto.arima without any differencing. Orange is the actual value, blue is fitted:
ARIMAfit <- auto.arima(val.ts, approximation=FALSE,trace=FALSE, xreg=xreg)
plot(val.ts,col="orange")
lines(fitted(ARIMAfit),col="blue")
Secondly, I tried differencing once:
ARIMAfit <- auto.arima(diff(val.ts), approximation=FALSE,trace=FALSE, xreg=diff(xreg))
plot(diff(val.ts),col="orange")
lines(fitted(ARIMAfit),col="blue")
Thirdly, I differenced twice:
ARIMAfit <- auto.arima(diff(diff(val.ts)), approximation=FALSE,trace=FALSE,
xreg=diff(diff(xreg)))
plot(diff(diff(val.ts)),col="orange")
lines(fitted(ARIMAfit),col="blue")
Visual inspection suggests that the third fit is the most accurate of the three; that much I am aware of. The challenge is how to convert the fitted values, which are on the double-differenced scale, back to the actual metric!
The inverse of diff is essentially cumsum, but you need to know the starting value at each level of differencing.
e.g.:
set.seed(1234)
x <- runif(100)
z <- cumsum(c(x[1], cumsum(c(diff(x)[1], diff(diff(x))))))
all.equal(z, x)
[1] TRUE
Share some of your data to make a reproducible example to better help answer the question.
If you expect that differencing will be necessary to obtain stationarity, then why not simply let the model handle the differencing? The "I" in ARIMA is the order of differencing applied prior to fitting an ARMA model, such that if
y = diff(diff(x)) and y is an ARMA(p,q) process,
then
x follows an ARIMA(p,2,q) process.
In auto.arima() you specify the order of differencing with the d argument (or D if it involves seasonal differencing). Since you double-differenced, that corresponds to:
fit <- auto.arima(val.ts, d = 2, ...)
From this, you can verify that the fitted values do indeed map onto the original data:
plot(val.ts)
lines(fitted(fit), col = "blue")
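As a self-contained check on simulated data (a sketch only; the asker's csv isn't available, so the series below is twice-integrated noise):
library(forecast)
set.seed(42)
x <- ts(cumsum(cumsum(arima.sim(list(ar = 0.5), n = 200))))  # integrated twice
fit <- auto.arima(x, d = 2)       # the model differences internally
plot(x)
lines(fitted(fit), col = "blue")  # fitted values come back on the original scale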
In the example below containing dummy data, I have double-differenced: first I removed seasonality (lag = 12), and then I removed trend from the differenced data (lag = 1).
library(magrittr) # needed for the %>% pipe
set.seed(1234)
x <- rep(NA, 24)
x <- x %>%
  rnorm(mean = 10, sd = 5) %>% # rnorm takes the required length (24) from x
  round(., 0) %>%
  abs()
yy <- diff(x, lag = 12)
z <- diff(yy, lag = 1)
Using the script that @jeremycg included above (and I include below), how would I remove the double difference? Would I need to add lag specifiers to the two nested diff() commands? If so, which diff() would get the lag = 12 specifier and which would get lag = 1?
zz <- cumsum(c(x[1], cumsum(c(diff(x)[1], diff(diff(x))))))
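For what it's worth, base R's diffinv() generalises the cumsum trick to lagged differences; under the dummy-data setup above, a sketch would be (xi supplies the starting values, and the inversions run in reverse order of the differencing):
yy2 <- diffinv(z, lag = 1, xi = yy[1])       # undo the lag-1 (trend) difference
x2 <- diffinv(yy2, lag = 12, xi = x[1:12])   # undo the lag-12 (seasonal) difference
all.equal(as.numeric(x2), as.numeric(x))     # TRUE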
