NA values in the trend of my time series in R

Thanks in advance for your help.
I'm working with a weekly seasonal time series, but when I use the decompose() function to get the trend, the seasonal, and the random components, I get some NAs. Here is the code:
myts <- c(5,40,43,65,95,111,104,124,133,263,388,1488,796,1209,707,52,0,76,306,1219,671,318,125,192,128,33,5,17,54,55,74,133,111,336,321,34,74,210,280,342,708,232,479,822,188,104,50,24,3,1,0,0,8,55,83,75,104,163,169,259,420,1570,243,378,1036,834,856,17,8,88,359,590,768,1461,443,128,89,192,37,21,51,62,78,125,123,259,600,60,59,180,253,379,766,375,828,502,165,114,76,10,2,1,0,0,46,71,95,102,132,212,268,330,428,1635,302,461,993,1497,1137,29,2,219,436,817,979,1226,317,134,121,211,35,47,87,83,97,177,153,345,635,48,84,234,258,358,780,470,700,701,331,67,0,0,0,0,0,0)
myts <- ts(myts, start=c(2015,17), frequency = 52)
modelo1 <- decompose(myts, "additive")
plot(modelo1)
As you can see in the resulting plot, there are some NAs at the beginning and the end of my trend and random components. I would like to know why this happens and how I can solve it in order to extract the trend from the data.
Thanks again for your help.

From the documentation of the decompose() function itself, the trend component is estimated using a moving average with a symmetric window with equal weights.
Since your frequency is 52 (an even number), the window spans 26 points on each side of the centre point, with the two outermost points given half weight. The first point on which this window can be centred is therefore observation 27.
When the filter is applied, no values exist before the start of the series, so you get exactly 26 NAs at the beginning (and, symmetrically, 26 at the end) of the trend component of your time series.
The calculation of your random component is essentially:
Random = Observed - Trend - Seasonal
So because there are NA values in your trend component, you also get NA values in the same positions in the random component, where that arithmetic is carried out.
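You can check that identity directly on the decomposed object (NAs in matching positions compare as equal here):
# for the additive model, random is observed minus trend minus seasonal;
# the NAs in the trend propagate into the random component
all.equal(modelo1$random, myts - modelo1$trend - modelo1$seasonal)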
Additional Proof:
These are the weights that should be applied in your moving average since you specified frequency=52. This moving average results in what you know as the trend component:
c(0.5, rep_len(1, 51), 0.5)/52
[1] 0.009615385 0.019230769 ... 0.019230769 0.009615385
So, applying those weights to produce the first non-NA trend value (the one centred on observation 27), you would do something like this:
sum(
  as.vector(myts[1]) * 0.009615385,
  as.vector(myts[2:52]) * 0.019230769,
  as.vector(myts[53]) * 0.009615385
)
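You can compare this with the first non-NA entry of the decomposed trend, which sits at position 27 (the centre of that 53-point window):
# first non-NA value of the trend component
modelo1$trend[27]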
Alternatively, you can also use the filter function, which applies, by default, a two-sided moving average:
coef1 <- c(0.5, rep_len(1, 51), 0.5)/52
stats::filter(myts, coef1)
In any case, you will see exactly the same result as the one from your decomposed time series, modelo1$trend. And because the window cannot be centred on the first (or last) 26 observations, those entries end up as NA.
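If the goal is simply a trend estimate that covers the whole sample, one option (an alternative estimator, not a way to make decompose() fill in those values) is stl(), whose loess-based trend has no NA padding at the ends; the names fit.stl and trend below are just local:
# stl() fits the trend by loess, so it is defined at both ends as well;
# s.window = "periodic" forces a fixed seasonal pattern, as in decompose()
fit.stl <- stl(myts, s.window = "periodic")
trend <- fit.stl$time.series[, "trend"]
plot(trend)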
For a frequency=12 decomposed time series, this is what I see, for example (output truncated; the Dec column is cut off):
        Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov
1946       NA       NA       NA       NA       NA       NA 23.98433 23.66213 23.42333 23.16112 22.86425
1947 22.35350 22.30871 22.30258 22.29479 22.29354 22.30562 22.33483 22.31167 22.26279 22.25796 22.27767
1948 22.43038 22.43667 22.38721 22.35242 22.32458 22.27458 22.23754 22.21988 22.16983 22.07721 22.01396
1949 22.06375 22.08033 22.13317 22.16604 22.17542 22.21342 22.27625 22.35750 22.48862 22.70992 22.98563

Related

Problems when adjusting time series for seasonality with seas function in R

I have an issue when I try to adjust my quarterly time series dataset for seasonality in R. I have loaded the dataset 'ASPUS' into R and specified its dates using the following code:
ASPUS <- read.csv(file = "ASPUS.csv", header=TRUE, sep=",")
ASPUS$DATE <- as.Date(ASPUS$DATE, "%Y-%m-%d")
The head of the dataset looks like this:
DATE ASPUS
1 1963-01-01 100.00000
2 1963-04-01 100.51813
3 1963-07-01 99.48187
4 1963-10-01 101.55440
5 1964-01-01 101.55440
The purpose of the dataset is to analyze it as a Time Series. Therefore, I use the ts function to create a time series object:
ASPUSts <- ts(ASPUS, frequency = 4, start = 1963)
However, this function returns negative numbers within the date column like this:
DATE ASPUS
1963 Q1 -2557 100.00000
1963 Q2 -2467 100.51813
1963 Q3 -2376 99.48187
1963 Q4 -2284 101.55440
1964 Q1 -2192 101.55440
1964 Q2 -2101 104.66321
My problem then occurs in the next step, where I try to adjust for seasonality with the seas function:
ASPUS1 <- seas(ASPUSts)
This fails with the following error:
Error: X-13 run failed
Errors:
- Seasonal MA polynomial with initial parameters is
noninvertible with root(s) inside the unit circle. RESPECIFY
model with different initial parameters.
Warnings:
- Automatic transformation selection cannot be done on a series
with zero or negative values.
- The covariance matrix of the ARMA parameters is singular, so
the standard errors and the correlation matrix of the ARMA
parameters will not be printed out.
- The covariance matrix of the ARMA parameters is singular, so
the standard errors and the correlation matrix of the ARMA
parameters will not be printed out.
Does anyone have any suggestions on how I can deal with these negative values or otherwise solve the problem so that I can seasonally adjust my dataset?
According to ?ts:
Description:
The function ‘ts’ is used to create time-series objects.
‘as.ts’ and ‘is.ts’ coerce an object to a time-series and test
whether an object is a time series.
Usage:
ts(data = NA, start = 1, end = numeric(), frequency = 1,
deltat = 1, ts.eps = getOption("ts.eps"), class = , names = )
as.ts(x, ...)
is.ts(x)
Arguments:
data: a vector or matrix of the observed time-series values. A data
frame will be coerced to a numeric matrix via ‘data.matrix’.
(See also ‘Details’.)
Here, the key thing to note is: "data: a vector or matrix of the observed time-series values". That is, you are not expected to pass the date vector as an input to ts.
This is the correct syntax for ts:
ASPUSts <- ts(ASPUS$ASPUS, frequency = 4, start = 1963)
In your example, seas is actually trying to seasonally adjust the time series of dates, which get converted to -2557, -2467, ... (side note: internally R stores dates as the number of days since 1970-01-01; that's why dates in the '60s appear as negative numbers).
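You can verify that conversion directly:
# R stores Date objects as days since 1970-01-01, so dates before
# 1970 become negative when coerced to numeric
as.numeric(as.Date("1963-01-01"))
[1] -2557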

Write R code for an AR(2) model for time series data from an `rsav` file

I need to write R code to model time series data from an rsav file. Here is detailed information about the question:
The file "file.rsav" (which can be loaded into R using load("file.rsav")) contains a time series ("xx"). The series is a "demeaned" monthly revenue stream (in millions of dollars) for a company. There are n = 96 observations.
The series has been "demeaned"; usually that would mean we subtract $\bar{X}$ from every data point, but pretend for now that we know the mean $\mu$ exactly, so we have subtracted $\mu$ from every data point and the new series is exactly (theoretically) mean 0. (But thus its sample mean is not precisely 0.)
We will consider possible ARMA models for the series $X_t$. We assume that the corresponding white noise is Gaussian (so $X_t$ is Gaussian). We will consider first an AR(2) model. We assume we know the true model exactly: $X_t = 1.34X_{t-1} - 0.48X_{t-2} + W_t$, $W_t \sim \mathrm{iid}\ N(0, \sigma^2)$.
I was asked to compute forecasts and backcasts using the model, up to 25 time steps into the future and into the past.
Write code to do the prediction by hand (i.e., not using the predict() function). Plot the data, the forecast, and 95% prediction intervals (assuming Gaussianity), all on one plot. (Note: you do not need to do a multiplicity correction for the prediction intervals.)
Code:
load('./file.rsav')
str(xx)
xx
Out:
Time-Series [1:96] from 1 to 8.92: 2.45 2.18 0.389 -1.44 -1.47 ...
Jan Feb Mar Apr May Jun Jul
1 2.45017780 2.17955829 0.38874020 -1.43979552 -1.47049807 -2.25233354 -0.82580703
2 1.92378321 1.87944208 1.07382472 1.01933130 1.88660307 -0.31109156 -0.25732342
3 0.60732330 1.53185399 1.58614371 0.63922270 0.82728407 0.28910411 -1.18154941
4 0.41375543 1.96633332 1.97402973 4.16058136 5.15474250 5.71865844 3.93136013
5 -1.51228022 -3.03396294 -3.65446772 -4.69589618 -3.51276584 -2.97682246 -3.08655352
6 3.43027017 4.68909032 6.55598795 4.95816124 4.87626503 3.17103291 0.79093946
7 -0.62481997 -0.94469455 -2.13648402 -3.64364158 -2.07214317 -3.26793808 -3.38573375
8 0.67823828 1.09908274 0.93832242 0.08791237 1.77322327 2.01201710 3.70197246
Aug Sep Oct Nov Dec
1 0.53048061 1.31994246 0.69306401 1.25916404 1.53363966
2 -0.47154459 0.52849630 0.90548093 0.71783457 0.86908457
3 -0.52525201 -0.40335058 0.73415310 0.58501633 0.29875228
4 2.50242432 1.69408297 0.96230124 0.53164036 -0.64480235
5 -1.60735865 -0.20500888 -0.44508903 -0.01443040 1.71087427
6 -0.09975821 -0.85972650 -0.41557374 -0.99876068 0.52620555
7 -2.25968715 -0.91700127 -0.49302872 -1.44275203 -0.66221559
8 4.62724761 4.17549847 3.43992950 3.15302462 4.17300576
I don't know much about the rsav file format; could someone help me solve this or give me some tips? Thanks in advance.
I think "backcast" here means the in-sample fit for the last 25 observations. To forecast one step ahead from an AR(2) model, you simply need the last 2 observations.
The model is: x_t = ar1 * x_{t-1} + ar2 * x_{t-2} + error
Now we just insert the given AR parameters (ar1 = 1.34, ar2 = -0.48) and the observations for x_{t-1} and x_{t-2}. For the step after that we need the first forecast and the last observation:
x_{t+1} = ar1 * x_t + ar2 * x_{t-1} + error
This is what we repeat 25 times. The error term is assumed to be normally distributed with mean zero, so it enters the point forecast as zero.
We do the same thing for the "backcast", the in-sample fit, except that there we only need observations from the time series itself.
# 25-step forecast and "backcast" (in-sample fit), by hand
forecast <- numeric(25)
backcast <- numeric(25)
# the first two steps use the last observed values
forecast[1] <- 1.34*xx[length(xx)] - 0.48*xx[length(xx)-1]
forecast[2] <- 1.34*forecast[1] - 0.48*xx[length(xx)]
# later steps recurse on earlier forecasts
for (i in 3:25) {
  forecast[i] <- 1.34*forecast[i-1] - 0.48*forecast[i-2]
}
# backcast[i] is the fitted value for observation length(xx)-i+1,
# built only from observed values
for (i in 1:25) {
  backcast[i] <- 1.34*xx[length(xx)-i] - 0.48*xx[length(xx)-i-1]
}
ts.plot(xx)
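The question also asks for 95% prediction intervals. Here is a minimal sketch under the stated Gaussian assumption, using the psi-weight (MA(infinity)) representation of the AR(2); sigma2 below is a placeholder, since the innovation variance is not given and would have to be estimated (e.g. from the one-step residuals):
# psi-weights of the AR(2): psi_0 = 1, psi_1 = ar1,
# psi_j = 1.34*psi_{j-1} - 0.48*psi_{j-2}
psi <- numeric(25)
psi[1] <- 1
psi[2] <- 1.34 * psi[1]
for (j in 3:25) psi[j] <- 1.34 * psi[j-1] - 0.48 * psi[j-2]
sigma2 <- 1  # placeholder: replace with an estimate of Var(W_t)
# h-step forecast standard error: sqrt(sigma2 * sum of psi_0^2 .. psi_{h-1}^2)
se <- sqrt(sigma2 * cumsum(psi^2))
upper <- forecast + 1.96 * se
lower <- forecast - 1.96 * se
# plot data, forecast, and interval bounds on one plot
ts.plot(ts(c(xx, forecast), start = 1),
        ts(upper, start = length(xx) + 1),
        ts(lower, start = length(xx) + 1),
        col = c("black", "red", "red"))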

How does the forecast function work for auto.arima with exogenous regressors?

I am trying to fit a dynamic regression model using auto.arima. I have monthly data on gas (heating) use per customer (which I want to predict) and a set of regressors (e.g. heating degree days, gas price, structural dummies for particular years, and seasonal dummies). I have forecast values for the exogenous regressors. Use-per-customer data is available from Jan 2005 to Mar 2018; all other data runs from Jan 2005 to Dec 2021. I am trying to forecast use per customer for all months of 2020.
I am unsure how to divide my data between xreg in the auto.arima function and in the forecast function. The forecast values I am currently getting do not line up with the month of use; for example, the April 2018 forecast for use per customer is almost equal to the January 2005 actual use. This should not be the case.
I am trying to forecast gas usage for residential customers using a dynamic regression model in the forecast package. I have referred to the online textbook by Prof. Rob J. Hyndman:
https://otexts.com/fpp2/forecasting.html
# I generated the time series for the period for which the data is
# available (Jan 2005 - Mar 2018)
Med_ros_upc.ts.test <- ts(Med_ros_upc.ts[,"ORMEDSCH410upc.r"],
                          frequency = 12, start = c(2005,1), end = c(2018,3))
# This is the set of external regressors, including seasonal dummies (sd.ts)
xreg_Med <- cbind(Hdd = Med_ros_upc.ts[, "MEDHDD"],
                  Hdd2 = Med_ros_upc.ts[, "MEDHDD2"],
                  RPA = Med_ros_upc.ts[, "ORSCH410RPAt1.r"], sd,
                  Jan2009, intdummf)
# I convert the xreg matrix into a time series. I use this in auto.arima
xreg_Med.ts <- ts(xreg_Med, frequency = 12, start = c(2005,1),
                  end = c(2018,3))
# I generate a different xreg for forecast
xreg_Med.ts1 <- ts(xreg_Med, frequency = 12, start = c(2018,4),
                   end = c(2021,12))
fitdyn <- auto.arima(Med_ros_upc.ts.test, xreg = xreg_Med.ts)
fcast <- forecast(fitdyn, xreg = xreg_Med.ts1)
Expected result:
Point forecast
Jan 2005 111.19
Feb 2005 89.22 ...
April 2005 53.86
Actual result:
Point forecast
April 2018 111.19
May 2018 89.22 ...
Jun 2018 53.86
Your training data starts in January 2005 and finishes in March 2018, so your forecasts start in April 2018. Forecasts come after the training data by definition.
Thank you!
This is what I did:
fitdyn <- auto.arima(Med_ros_upc.ts[,"ORMEDSCH410upc.r"], xreg = xreg_Med, stationary = TRUE)
fit <- fitted(fitdyn)
fcast_fit <- forecast(fit, h = 36, xreg = xreg_Med)
Now I have two separate tables, one with the fitted values and one with the forecast, and the values look alright.
However, I am getting this error message:
Error in etsmodel(y, errortype[i], trendtype[j], seasontype[k], damped[l], :
unused argument (xreg = xreg_Med)
In addition: Warning message:
In ets(object, lambda = lambda, biasadj = biasadj, allow.multiplicative.trend = allow.multiplicative.trend, :
Missing values encountered. Using longest contiguous portion of time series
Does this mean that the forecast I am obtaining does not account for the external regressors?
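The error suggests it does: forecast() was called on fitted(fitdyn), which is a plain ts rather than the regression model, so it fell back to ets(), and ets() has no xreg argument (hence "unused argument"). A minimal sketch of the intended call, reusing fitdyn and the future-regressor matrix xreg_Med.ts1 from the earlier code:
# forecast from the ARIMA-with-regressors model itself, supplying
# the future values of the regressors
fcast <- forecast(fitdyn, xreg = xreg_Med.ts1)
plot(fcast)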

How to get actual dates plotted on the x-axis when plotting an auto.arima forecast in R?

I have a gold price data set with "DATE" and "GOLD PRICE" variables. After doing all the pre-processing steps in R, I convert the data frame to a time series object with the ts or xts function and check for stationarity with an ADF test.
Now, after loading the forecast library, I run the auto.arima function and forecast the next ten values.
# x is a data frame with a DATE and a GOLD PRICE column, e.g.
#         DATE GOLD PRICE
# 1 01-01-2006       1326
x.xts <- xts(x$`GOLD PRICE`, order.by = as.Date(x$DATE, "%d-%m-%Y"))
fit <- auto.arima(x.xts)
forecast <- forecast(fit, h = 10)
Now when I plot the forecast, I get plain numbers on the x-axis instead of actual dates. I am able to get the dates from x.xts through index(x.xts), but I want to extract them from the forecast so they are plotted in the graph for better understanding.
Could someone please help me with the R code for this?
You need to explicitly supply the dates when creating the ts (or xts) object. Using a reproducible example:
library("forecast")
data("gas")
# gas is already a TS object.
# We remove it and recreate it to show the appropriate method
gas2 <- as.vector(gas); rm(gas)
gas <- ts(gas2, start = c(1956,1), frequency = 12)
fit <- auto.arima(gas)
forecast(fit, h= 10)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Sep 1995 57178.66 54885.61 59471.71 53671.74 60685.58
Oct 1995 53080.77 50466.09 55695.46 49081.96 57079.59
Nov 1995 50940.76 48086.64 53794.87 46575.77 55305.75
Dec 1995 40923.84 37931.85 43915.84 36347.99 45499.70
Jan 1996 43739.23 40654.37 46824.09 39021.35 48457.12
Feb 1996 43706.56 40557.77 46855.34 38890.91 48522.20
Mar 1996 47849.24 44653.96 51044.52 42962.48 52736.00
Apr 1996 50204.88 46974.32 53435.44 45264.16 55145.60
May 1996 56691.41 53432.91 59949.91 51707.96 61674.86
Jun 1996 61053.42 57771.93 64334.92 56034.81 66072.04
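Because the ts object now carries start and frequency, plotting the forecast labels the x-axis in calendar time:
# the plot method for forecast objects uses the series' time attributes,
# so the x-axis shows years rather than observation indices
plot(forecast(fit, h = 10))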

Correlations between response (NDVI) and explanatory (certain precipitation months) variables in a loop

I am trying to correlate the ndvi of certain months (jul, aug, sep) with the coefficient of variation (cv) of the precipitation in the previous 6 months.
My original dataset "d" includes monthly ndvi and precipitation data for 13 years at 26 stations:
  row.names timestamp station year month   ndvi    landcover altitude precipitation
1         1         1       A 2000   jan 0.4138 Mixed forest     2143          16.0
2      1769         2       A 2000   feb 0.4396 Mixed forest     2143           4.0
So far I have the following script, which lets me correlate and plot just the precipitation in months jan-jun against the ndvi in jul:
for (m in c("jan","feb","mar","apr","may","jun")) {
  ndvi <- d$ndvi[d$month == "jul"]
  precip <- d$precipitation[d$month == m]
  r2 <- cor(ndvi, precip)^2
  cat("month =", m, "P =", cor.test(ndvi, precip)$p.value, "\n")
  plot(ndvi ~ precip, main = m, sub = sprintf("r2=%.2f", r2))
}
Does anyone have any ideas as to how I can incorporate the cv of jan-jun into the for loop?
d$cv <- sd(d$precipitation, na.rm = TRUE)/mean(d$precipitation, na.rm = TRUE)
Thanks for any help!
EDIT:
@Osssan I tried to apply your changes like this:
for (m in c("jan","feb","mar","apr","may","jun")) {
  ndvi <- d$ndvi[d$month == "aug"]
  precip_6m <- d$precipitation[d$month %in% c("jan","feb","mar","apr","may","jun")]
  cv <- sd(precip_6m, na.rm = TRUE)/mean(precip_6m, na.rm = TRUE)
  r2 <- cor(ndvi, cv, use = "complete.obs")^2
  cat("month =", m, "P =", cor.test(ndvi, cv, na.action = na.omit)$p.value, "\n")
  plot(ndvi ~ cv, main = m, sub = sprintf("r2=%.2f", r2))
}
Unfortunately I get the following error:
Error in cor(ndvi, cv, use = "complete.obs") : incompatible dimensions
Basically, for every year 2000-2013, I need the R squared, the plot, and the p-value of the coefficient of variation (jan-jun: ONE value) vs. NDVI (separately for jul/aug/sep).
Thanks
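For what it's worth, the "incompatible dimensions" error occurs because cv is a single number while ndvi is a vector over all years and stations. A sketch of one way to line the two up, computing one jan-jun CV per year and pairing it with that year's NDVI; averaging over stations here is an assumption (you may want one series per station instead):
# one CV of jan-jun precipitation per year, one mean NDVI per year
# for each target month, then correlate across the 13 years
first_half <- c("jan","feb","mar","apr","may","jun")
years <- sort(unique(d$year))
cv_by_year <- sapply(years, function(y) {
  p <- d$precipitation[d$year == y & d$month %in% first_half]
  sd(p, na.rm = TRUE) / mean(p, na.rm = TRUE)
})
for (m in c("jul","aug","sep")) {
  ndvi_by_year <- sapply(years, function(y)
    mean(d$ndvi[d$year == y & d$month == m], na.rm = TRUE))
  r2 <- cor(ndvi_by_year, cv_by_year, use = "complete.obs")^2
  cat("month =", m, "P =", cor.test(ndvi_by_year, cv_by_year)$p.value, "\n")
  plot(ndvi_by_year ~ cv_by_year, main = m, sub = sprintf("r2=%.2f", r2))
}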
