I am working through the "Forecasting Using R" DataCamp course. I have completed the entire thing except for the last part of one particular exercise (link here, if you have an account), where I'm totally lost. The error help it's giving me isn't helping either. I'll put the various parts of the task down with the code I'm using to solve them:
Produce time plots of only the daily demand and maximum temperatures with facetting.
autoplot(elec[, c("Demand", "Temperature")], facets = TRUE)
Index elec accordingly to set up the matrix of regressors to include MaxTemp for the maximum temperatures, MaxTempSq which represents the squared value of the maximum temperature, and Workday, in that order.
xreg <- cbind(MaxTemp = elec[, "Temperature"],
MaxTempSq = elec[, "Temperature"] ^2,
Workday = elec[,"Workday"])
Fit a dynamic regression model of the demand column with ARIMA errors and call this fit.
fit <- auto.arima(elec[,"Demand"], xreg = xreg)
If the next day is a working day (indicator is 1) with maximum temperature forecast to be 20°C, what is the forecast demand? Fill out the appropriate values in cbind() for the xreg argument in forecast().
This is where I'm stuck. The sample code they supply looks like this:
forecast(___, xreg = cbind(___, ___, ___))
I have managed to work out that the first blank is fit, so I'm trying code that looks like this:
forecast(fit, xreg = cbind(elec[,"Workday"]==1, elec[, "Temperature"]==20, elec[,"Demand"]))
But that is giving me the error hint "Make sure to forecast the next day using the inputs given in the instructions." Which... doesn't tell me anything useful. Any ideas what I should be doing instead?
When you are forecasting ahead of time, you use new data that was not included in elec (which is the data set you used to fit your model). The new data was given to you in the question (maximum temperature of 20°C and a Workday indicator of 1). Therefore, you do not need elec in your forecast() call. Just supply the new values, in the same column order as the xreg used for fitting (MaxTemp, MaxTempSq, Workday):
forecast(fit, xreg = cbind(20, 20^2, 1))
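The same idea can be sketched without the course data. The example below is only an illustration under stated assumptions: the data are simulated (the coefficients and noise are made up), and it uses base R's arima() and predict() with newxreg as a stand-in for auto.arima() and forecast(), so that it runs without extra packages. The point it shows is that the regressors for the forecast are one row of new values, with the columns in the same order as at fitting time.

```r
# Simulated stand-in for elec; all numbers here are made up for illustration.
set.seed(42)
n <- 200
temp <- runif(n, min = 10, max = 35)                    # daily maximum temperature
workday <- rep(c(1, 1, 1, 1, 1, 0, 0), length.out = n)  # 1 = working day
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 5))
demand <- 300 - 8 * temp + 0.25 * temp^2 + 20 * workday + noise

# Regressors used for fitting: one row per observed day.
xreg <- cbind(MaxTemp = temp, MaxTempSq = temp^2, Workday = workday)
fit <- arima(demand, order = c(1, 0, 0), xreg = xreg)

# Regressors for the forecast: one row per future day, same column order.
new_xreg <- cbind(MaxTemp = 20, MaxTempSq = 20^2, Workday = 1)
fc <- predict(fit, n.ahead = 1, newxreg = new_xreg)
print(fc$pred)
```

Naming the columns in new_xreg is not required by predict() here, but it makes it harder to put the values in the wrong order.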
My problem is the following: I have a Landsat NDVI time series that is non-periodic (it does not have a homogeneous frequency). However, the error I receive is
Error in stl(Yt, "periodic") : series is not periodic or has less than two periods
after trying to convert my data into a time series without explicitly setting a frequency:
test_timeseries <- ts(test$nd, start = c(1984,4), end = c(2011,10))
When I try to calculate the frequency or deltat with the functions frequency() or deltat(), both return 1, which I don't understand, as I have non-periodic data for nearly every month and not only once a year.
So my question is: how do I set the frequency in this case, and how do I deal with this non-periodicity? It seems that, without setting a frequency, I cannot use the function bfast().
Sorry if the answer is obvious, I'm very new to time series analysis.
Please read the help file. It helps. In this case, it describes the following argument.
season : the seasonal model used to fit the seasonal component and detect seasonal breaks (i.e. significant phenological change). There are three options: "dummy", "harmonic", or "none" where "dummy" is the model proposed in the first Remote Sensing of Environment paper and "harmonic" is the model used in the second Remote Sensing of Environment paper (See paper for more details) and where "none" indicates that no seasonal model will be fitted (i.e. St = 0 ). If there is no seasonal cycle (e.g. frequency of the time series is 1) "none" can be selected to avoid fitting a seasonal model.
So set season = "none" in bfast().
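As a side note on why frequency() returned 1: ts() uses frequency = 1 unless told otherwise, and stl() refuses such a series with exactly the error quoted in the question. The sketch below reproduces this in base R with made-up values. It assumes regularly spaced monthly observations, which is not the case for the irregular Landsat series in the question, so it only illustrates the mechanism and is not a fix for that data.

```r
set.seed(1)
x <- rnorm(60)  # 60 made-up values standing in for monthly observations

# Without an explicit frequency, ts() assumes one observation per time unit.
no_freq <- ts(x, start = c(1984, 4))
print(frequency(no_freq))

# stl() needs a seasonal period, so it fails on a frequency-1 series.
msg <- tryCatch(
  {
    stl(no_freq, "periodic")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With frequency = 12 the same values are treated as monthly data.
monthly <- ts(x, start = c(1984, 4), frequency = 12)
print(frequency(monthly))
dec <- stl(monthly, "periodic")
```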
I'm stuck on a question that I can't solve.
When I take the AirPassengers data and model it with ets() and auto.arima(), the fitted values seem reasonably close to the observed values:
library(forecast)
a <- ts(AirPassengers, start = 1949, frequency = 12)
a <- window(a, start = 1949, end = c(1954,12), frequency = 12)
fit_a_ets <- ets(a)
fit_a_arima <- auto.arima(a)
plot(a)
lines(fit_a_ets$fitted, col = "blue")
lines(fit_a_arima$fitted, col = "red")
Plot from AirPassengers and fitted models
When I tried the same code on my data, the fitted values appear shifted by one period:
b <- c(1237,1982,1191,1163,1418,1687,2331,2181,1943,1782,177,1871,391,1397,734,712,1006,508,368,767,675,701,989,725,1292,983,1094,1105,928,1246,1604,1163,1390,959,1630,789,1173,910,875,718,655,606,968,716,476,476,655,499,544,1250,359,386,458,947,542,953,1450,1195,1317,957,778,1030,1399,1119,3142,1024,1537,1321,2062,1897,2094,2546,1796,2089,1194,896,727,599,785,674,828,311,375,315,365,314,126,315,372,666,596,589,001,613,498,635,644,1018,873,900,502,121,293,259,311,169,378,153,24,115,250,565,349,201,393,83,327,325,185,307,501,194)
b <- ts(b, start = 1949, frequency = 12)
b <- window(b, start = 1949, end = c(1954,12), frequency = 12)
fit_b_ets <- ets(b)
fit_b_arima <- auto.arima(b)
plot(b)
lines(fit_b_ets$fitted, col = "blue")
lines(fit_b_arima$fitted, col = "red")
Plot from my data and fitted models
Does anyone know why?
I looked at https://otexts.com/fpp2/index.html and still didn't understand why this happens.
I thought it might be because the models don't fit my data well, but the same thing occurs with other data sets. For example, figure 7.1 from https://otexts.com/fpp2/ses.html.
This is typical.
In the context of forecasting, the "fitted" value is the one-step-ahead forecast. For many different types of series, the best that we can do is something that's close to the latest observation, plus a small adjustment. This makes it look like the "fitted" value lags by 1 period because it is then usually quite close to the previous observed value.
Asking why the fitted series lags is like asking "why can't we know the future before it happens?". It's simply not that easy, and it doesn't indicate that the model is necessarily inadequate (it may not be possible to do better).
Plots comparing the time series of observations and fitted values are rarely of any use for forecasting; they always essentially look like this. It also makes it difficult to judge the vertical distance between the lines, which is what you actually care about (the forecasting error). Better to plot the forecasting error directly.
The AirPassengers series is unusual because it is extremely easy to forecast based on its seasonality. Most series you will encounter in the wild are not quite like this.
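To make the "one-step-ahead forecast" point concrete, here is a small base R sketch. It uses a simulated random walk and stats::arima() rather than ets() or auto.arima(); both choices are assumptions made so that the example runs without extra packages. It compares how far the fitted values are from the current observation versus the previous one, and then plots the forecast errors directly.

```r
set.seed(123)
n <- 120
y <- ts(cumsum(rnorm(n)), start = 1949, frequency = 12)  # simulated random walk

fit <- arima(y, order = c(0, 1, 1))
errors <- residuals(fit)     # one-step-ahead forecast errors
fitted_vals <- y - errors    # one-step-ahead forecasts (the "fitted" values)

# Average distance from each fitted value to the observation at the same time
# and to the observation one period earlier (first point dropped).
f <- as.numeric(fitted_vals)
obs <- as.numeric(y)
gap_to_current <- mean(abs(f[-1] - obs[-1]))
gap_to_previous <- mean(abs(f[-1] - obs[-n]))
print(c(current = gap_to_current, previous = gap_to_previous))

# Plot the forecast errors themselves instead of overlaying two series.
plot(errors, type = "h", ylab = "One-step forecast error")
abline(h = 0, lty = 2)
```

For a series like this the fitted values sit much closer to the previous observation than to the current one, which is what produces the apparent one-period lag in an overlay plot.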
I generated my own fictional sales data in order to run a time series analysis.
It is supposed to represent a growing company, so I built in a trend. However, I read through some tutorials and often came across the statement that non-stationary time series should not be forecast with the auto.arima function.
But I get results that make sense, and if I difference the data (which I did as well), the output doesn't make much sense.
So here is my question: can I use the auto.arima function with my data, which obviously has a trend?
Best regards and thanks in advance,
Francisco
eps <- rnorm(100, 30, 20)
trend <- seq(1, 100, 1)
trend <- 3 * trend
Sales <- trend + eps
timeframe<-seq(as.Date("2008/9/1"),by="month",length.out=100)
Data<-data.frame(Sales,timeframe)
plot(Data$timeframe,Data$Sales)
ts=ts(t(Data[,1]))
plot(ts[1,],type='o',col="black")
md=rwf(ts[1,],h=12,drift=T,level=c(80,95))
auto.arima(ts[1,])
Using the forecast function allows us to plot the expected sales for the next year:
plot(forecast(auto.arima(ts[1,]),h=12))
Using the forecast function with our automated ARIMA can help us plan for the next quarter:
forecast(auto.arima(ts[1,]),h=4)
plot(forecast(auto.arima(ts[1,])))
Another way would be to use the autoplot function:
fc<-forecast(ts[1,])
autoplot(fc)
The next step is to analyze our time series. I run the ADF test, whose null hypothesis is that the data is non-stationary.
So at the 5% threshold, a p-value greater than 0.05 means we cannot reject the null hypothesis of non-stationarity.
library(tseries)
adf=adf.test(ts[1,])
adf
The output suggests that the data is non-stationary:
# ACF
acf=Acf(ts[1,])
Acf(ts[1,])
The autocorrelation decreases almost steadily, which also points to non-stationary data. Running kpss.test should confirm that our data is non-stationary, since its null hypothesis is the opposite of that of the ADF test.
We therefore expect a p-value smaller than 0.05.
kpss=kpss.test(ts[1,])
kpss
We receive a p-value of 0.01, which further supports the conclusion that the data has a trend.
ndiffs(ts[1,])
diff.data=diff(ts[1,])
auto.arima(diff.data)
plot(forecast(diff.data))
To answer your question - yes, you can use the auto.arima() function in the forecast package on non-stationary data.
If you look at the help file for auto.arima() (by typing ?auto.arima) you will see that it explains that you can choose to specify the "d" parameter - this is the order of differencing - first order means you difference the data once, second order means you difference the data twice etc. You can also choose not to specify this parameter and in this case, the auto.arima() function will determine the appropriate order of differencing using the "kpss" test. There are other unit root tests such as the Augmented Dickey-Fuller which you can choose to use in the auto.arima function by setting test="adf". It really depends on your preference.
You can refer to page 11 and subsequent pages for more information on the auto.arima function here:
https://cran.r-project.org/web/packages/forecast/forecast.pdf
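What "order of differencing" means can be sketched in base R on data similar to the question's (a linear trend plus noise). This is only an illustration: it fixes d = 1 by hand with stats::arima() instead of letting auto.arima() choose it, and it represents the drift by passing a time index as xreg, which is an assumption of this sketch rather than something taken from the answer above.

```r
set.seed(7)
n <- 100
sales <- 3 * seq_len(n) + rnorm(n, mean = 30, sd = 20)  # trend plus noise

# First-order differencing removes the linear trend; what remains
# fluctuates around the slope of the trend (3 in this simulation).
d1 <- diff(sales)
print(mean(d1))

# Second-order differencing is differencing applied twice.
d2 <- diff(sales, differences = 2)

# ARIMA(0,1,1) with drift: d = 1 does the differencing inside the model,
# and the time index as regressor supplies the drift term.
time_index <- seq_len(n)
fit <- arima(sales, order = c(0, 1, 1), xreg = time_index)
fc <- predict(fit, n.ahead = 12, newxreg = n + 1:12)
print(fc$pred)
```

Because the differencing happens inside the model, the forecasts come back on the original scale and continue the upward trend, so there is no need to difference the series by hand before fitting.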
I have a series of algorithms I am running on financial data. For the purposes of this question I have financial market data for a stock with 1226 rows of data.
I run the following code to fit the model and predict from it:
strat.fit <- glm(DirNDay ~ l_UUP.Close + l_FXE.Close + MA50 + MA10 + RSI06 + BIAS10 + BBands05, data=STCK.df, family="binomial")
strat.probs <- predict(strat.fit, STCK.df,type="response")
I get probability predictions up to row 1226. I am interested in making a prediction for a new day, which would be row 1227. When I try to get a prediction for day 1227, I get the following:
strat.probs[1227]
NA
Any help/suggestions would be appreciated
The predict function is going to predict the value of DirNDay based on the value of the other variables for that day. If you want it to predict DirNDay for a new day, then you need to provide it with all the other relevant variables for that new day.
It sounds like that's not what you're trying to do, and you need to create a totally different model which uses time (or day) to predict the values. Then you can provide predict with a new time and it can use that to predict a new DirNDay.
There's a free online textbook about forecasting using R by Rob Hyndman if you don't know where to start: https://www.otexts.org/fpp
(But if I totally misunderstood that glm model then nevermind those last two paragraphs.)
In order to make a prediction for the 1227th day, you'll need to know what the values of your explanatory variables (MA50, MA10, etc.) will be for that day. Store those as a new data frame (say STCK.df.new) and pass that to your predict function:
STCK.df.new <- data.frame(l_UUP.Close = .4, l_FXE.Close = 2, ... )
strat.probs <- predict(strat.fit ,STCK.df.new ,type="response")
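Here is a self-contained sketch of the same pattern on simulated data. The variable names (x1, x2, up) and all values are hypothetical stand-ins for the columns of STCK.df. It shows both why indexing past the last row gives NA and how newdata produces a prediction for an unseen row.

```r
set.seed(99)
n <- 300
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$up <- rbinom(n, size = 1, prob = plogis(0.8 * df$x1 - 0.5 * df$x2))

fit <- glm(up ~ x1 + x2, data = df, family = "binomial")

# Predicting on the training data gives one probability per existing row,
# so there is nothing stored at position n + 1.
probs <- predict(fit, df, type = "response")
print(length(probs))
print(probs[n + 1])

# A prediction for a new day needs that day's explanatory variables.
new_day <- data.frame(x1 = 0.4, x2 = -1.2)  # hypothetical values
p_new <- predict(fit, newdata = new_day, type = "response")
print(p_new)
```

The new data frame must contain every explanatory variable that appears in the model formula, under the same column names.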
I'm new to R, so I'm having trouble with this time series data.
For example (the real data is way larger)
data <- c(7,5,3,2,5,2,4,11,5,4,7,22,5,14,18,20,14,22,23,20,23,16,21,23,42,64,39,34,39,43,49,59,30,15,10,12,4,2,4,6,7)
ts <- ts(data,frequency = 12, start = c(2010,1))
So if I try to decompose the data to adjust it
ts.decompose <- decompose(ts)
ts.adjust <- ts - ts.decompose$seasonal
ts.hw <- HoltWinters(ts.adjust)
ts.forecast <- forecast(ts.hw, h = 10)
plot(ts.forecast)
But for the first values I get negative values. Why is this happening?
Well, you are forecasting the seasonally adjusted time series, and of course the deseasonalized series ts.adjust can already contain negative values by itself, and in fact, it actually does.
In addition, even if the original series contained only positive values, Holt-Winters can yield negative forecasts. It is not constrained.
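The first point can be checked directly on the sample data from the question, using only base R. This sketch stops at the seasonal adjustment step and does not refit the Holt-Winters model.

```r
data <- c(7,5,3,2,5,2,4,11,5,4,7,22,5,14,18,20,14,22,23,20,23,16,21,23,
          42,64,39,34,39,43,49,59,30,15,10,12,4,2,4,6,7)
y <- ts(data, frequency = 12, start = c(2010, 1))

dec <- decompose(y)          # additive decomposition by default
y_adjust <- y - dec$seasonal

# The additive seasonal indices are centred, so they sum to (about) zero:
# some months have a positive seasonal effect subtracted from them.
print(sum(dec$figure))

# Smallest seasonally adjusted value, and which observations fall below zero.
print(min(y_adjust))
print(which(y_adjust < 0))
```

Any month whose observed value is smaller than its positive seasonal effect ends up below zero after the adjustment, even though every original value is positive.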
I would suggest trying to model your original (not seasonally adjusted) time series directly using ets() in the forecast package. It usually does a good job in detecting seasonality. (And it can also yield negative forecasts or prediction intervals.)
I very much recommend this free online forecasting textbook. Given your specific question, this may also be helpful.