I am using R to read a time series from Excel (using XLConnect), run some forecast models on it, and write the results back to Excel. It's a long story, but the company I'm doing a Masters by Research for want to keep using Excel!
Anyhow, I can extract the time series I want from Excel, and I use ts() to make it a time series. I then run forecasts on that series using (in this order) ets(), auto.arima(), tbats(), mapaest(), theta(), and finally stlf(). To check that the forecasts are actually running, I get R to print the results. It runs through all the forecast models fine until it reaches stlf(), at which point I get this error:
Error in stl(x, s.window = s.window, t.window = t.window, robust =
robust) : only univariate series are allowed
My question is: why does the time series (it is univariate) work fine in the other forecast functions but not in stlf()?
I have figured it out now after doing some more digging around. It turns out that I was passing a time series that was also a matrix to the stlf() function. The other forecasting functions don't seem to mind being given a time series that is also a matrix, but for some reason stlf() doesn't like it.
I converted my raw data into a vector using as.vector(), turned that vector into a ts() object, and passed that to stlf(). I'm not massively sure why this works, given that I had passed a ts() object to stlf() before. I'm fairly new to R, but I will look further into this.
Here is the code I used:
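library(forecast)  # provides stlf()
# Flatten the raw data into a plain vector so the resulting ts is not a matrix
altVector <- as.vector(t(rawdata))
tsy2 <- ts(altVector, start=1, end=c(3,9), frequency=12)
f.stlf <- stlf(tsy2, method="arima", h=3)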
It's all working fine now.
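A likely explanation, sketched with base R only: ts() keeps the dim attribute of a one-column matrix, and stl() (which stlf() calls, as the error message shows) rejects any input for which is.matrix() is TRUE. The data below are simulated, and the names raw, x_mat and x_vec are made up for illustration; they stand in for the rawdata object from the post, which is not reproduced here.

```r
# Simulated monthly data stored as a one-column matrix (an assumption,
# standing in for data read from a spreadsheet).
set.seed(1)
raw <- matrix(rnorm(48), ncol = 1)

# ts() keeps the matrix dimensions, so this "univariate" series is still a matrix.
x_mat <- ts(raw, frequency = 12)
is.matrix(x_mat)

# stl() refuses matrix input, which is the error reported above.
res <- try(stl(x_mat, s.window = "periodic"), silent = TRUE)
inherits(res, "try-error")

# Dropping the dimensions first gives a plain univariate ts that stl() accepts.
x_vec <- ts(as.vector(raw), frequency = 12)
fit <- stl(x_vec, s.window = "periodic")
```

Selecting the column explicitly, as in x_mat[, 1], should also drop the matrix dimensions while keeping the time attributes.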
Related
I have a question about R's ccf() function. I have two time series that represent snow water equivalent on the surface and groundwater head below the ground. I want to find the "propagation" time from the surface to the groundwater, so I think that cross-correlating the two time series can help me detect the "lag" time between them.
The ccf() function seems to be an appropriate way to determine the lag between two time series. But according to the mathematics of cross correlation, the input data should be stationary, and both of my time series are seasonal, because intuitively we know that snow occurs in winter. Data with seasonality are considered non-stationary, so I think I might need to decompose the data to make them stationary. I tried both stl() and decompose() to check for a seasonal pattern, but both of them gave me this error message:
Error in decompose(swefoothill):
time series has no or less than 2 periods
which is pretty self-explanatory: neither time series has a clear seasonality. But that doesn't mean that my data are not seasonal. So I want to ask: under these circumstances, is it okay for me to run ccf() directly on both time series? I did a sample analysis and the correlation factor figure looks like this:
And I'm observing a cyclic pattern here. Am I doing it wrong? Thanks a lot for your help!
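One thing worth checking, as a sketch on simulated data: decompose() raises this error whenever the series has frequency 1 or covers fewer than two full periods, regardless of what the values look like. The series and the monthly frequency of 12 below are assumptions for illustration, not the data from the question.

```r
set.seed(42)
# Three years of made-up monthly values with an annual cycle.
vals <- sin(2 * pi * (1:36) / 12) + rnorm(36, sd = 0.1)

# Without a frequency, ts() defaults to frequency = 1 and decompose() fails.
no_freq <- ts(vals)
err <- try(decompose(no_freq), silent = TRUE)
inherits(err, "try-error")

# Declaring the period lets decompose() estimate a seasonal component.
monthly <- ts(vals, frequency = 12)
dec <- decompose(monthly)

# The remainder (trend and seasonal parts removed) is one candidate input for ccf().
remainder <- na.omit(dec$random)
```

If both series are deseasonalised this way, ccf() could be applied to the two remainders rather than to the raw series; whether that is the right treatment for snow and groundwater data is a modelling decision this sketch does not settle.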
I am trying to run the analyze.wavelet function from the WaveletComp package on a time series with 15 years of data. There are some small gaps in the data, resulting in the error "Some values in your time series seem to be missing." Is there a way to analyze the time series as a single block? I would prefer that for my work over splitting the data into several blocks. Thanks for any help.
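One common workaround, sketched here with base R only, is to fill short gaps by linear interpolation before the analysis so that the series can be treated as one block. Whether interpolation is acceptable depends on the data and on how long the gaps are; the values and the column names x and x_filled below are hypothetical.

```r
# Hypothetical series with two short gaps (NA values).
dat <- data.frame(x = c(1.0, 2.0, NA, 4.0, 5.0, NA, NA, 8.0))

# approx() interpolates linearly at the missing positions,
# using only the observed points as support.
idx <- seq_along(dat$x)
ok <- !is.na(dat$x)
dat$x_filled <- approx(x = idx[ok], y = dat$x[ok], xout = idx)$y

dat$x_filled
```

The gap-free column could then be handed to analyze.wavelet() through its my.series argument.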
I have an R script for time series forecasting. Initially, when I created a time series from my univariate data and applied the tsclean() function, it worked. But today, when I re-run the script, it fails at the point where I call tsclean():
Error in tsclean(sales_ts) : could not find function "tsclean"
I load the forecast package beforehand, and R reports that the package was built under R 3.6.1.
It's weird that the function is not working today when it worked yesterday.
#creating time series data
sales_ts = ts(sdld_10[, c('Sales')])
#cleaning time series data
library(forecast)
sdld_10$clean_sales = tsclean(sales_ts)
sdld_10$test<- c(sdld_10$clean_sales - sdld_10$Sales)
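A small diagnostic sketch for this kind of error: "could not find function" usually means the package is not attached in the current session, for example because library() failed or was not run after a restart. The checks below use only base R calls; the sample series demo_ts is made up.

```r
# Is the forecast package installed at all?
forecast_installed <- requireNamespace("forecast", quietly = TRUE)

# Is it attached to the search path in this session?
forecast_attached <- "package:forecast" %in% search()

if (forecast_installed) {
  # Calling through the namespace works even if library(forecast) was skipped.
  demo_ts <- ts(c(10, 12, 11, 500, 13, 12, 11, 10, 12, 11, 13, 12))
  cleaned <- forecast::tsclean(demo_ts)
} else {
  message("forecast is not installed; install.packages('forecast') would be needed")
}
```

If forecast_attached is FALSE right before the failing line, the library(forecast) call did not succeed, and its console output is the place to look.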
I'm relatively new to R programming, but I've been reading your blogs and posts in order to get up-to-date with the forecast package. However, I have been struggling with the effect of seasonality.
Take for example the simplest signal possible:
train <- ts(sin((2*pi)*seq(from=0, to=10, by=0.01)))
If I just try to forecast this signal with brute force, I get irrelevant results:
plot(forecast(train,h=20))
However, if I manually detect the seasonality as 100, and do the following:
train <- ts(sin((2*pi)*seq(from=0, to=10, by=0.01)),frequency=100)
plot(forecast(train))
I get excellent forecasting results.
I'm honestly very puzzled by these results, which obviously happen for more complex signals.
If I remember correctly, when you create the time series object you have to specify its frequency. That way, the forecast method is able to pick up the seasonal pattern. There are other ways to detect seasonality, such as running auto.arima() and checking whether it selects a seasonal model. Apart from visual exploration, of course.
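To make that concrete, here is a sketch using base R only. It shows that the frequency attribute defaults to 1, and it estimates the dominant period from the raw periodogram as one rough alternative to spotting it by eye. The periodogram step is an illustrative choice, not something forecast() does for you, and the names sig, pg and period are made up.

```r
# Same signal as in the question: ten cycles of a sine, 100 samples per cycle.
sig <- sin(2 * pi * seq(from = 0, to = 10, by = 0.01))

# Without a frequency argument the series is treated as non-seasonal.
frequency(ts(sig))

# Rough estimate of the dominant period from the raw periodogram.
pg <- spec.pgram(ts(sig), taper = 0, fast = FALSE, plot = FALSE)
period <- round(1 / pg$freq[which.max(pg$spec)])

# Declaring that period is what makes seasonal models available.
train <- ts(sig, frequency = period)
frequency(train)
```

For noisy real data the periodogram peak can be ambiguous, so any estimated period deserves a sanity check against what is known about the process.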
I'm using the fourier() and fourierf() functions in Rob Hyndman's excellent forecast package in R. To verify whether the same terms are selected and used in fourier() and fourierf(), I plotted a few of the output terms.
Below is the original data using ts.plot(data). There's a frequency of 364 in the time series, FYI.
Below is the plot of the terms using fourier(data,3). Basically, it looks like mirror images of the existing data.
Looking at just the sin1 term of the output, again, we get some variation that shows similar 364-day seasonality in line with the data above.
However, when I plot the results of the Fourier forecast using fourierf(data,3, 410), I see the data below. It appears far smoother than the terms provided by the original fourier function.
So, I wonder how the results of fourier() and fourierf() are related. Is it possible to just see one consolidated Fourier result, so that you can see the sin or cosine result moving through existing data and then through the forecasting period? If not, how can I confirm that the terms created by fourierf() fit the in-sample data?
I want to use it in an auto.arima or glm function with other external regressors like this:
trainFourier<-fourier(data,3)
trainFourier<-as.data.frame(trainFourier)
trainFourier$exogenous<-exogenousData
arima.object<-auto.arima(data, xreg=trainFourier)
futureFourier<-fourierf(data,3, 410)
fourierForecast<-forecast(arima.object, xreg=futureFourier, h=410)
and I want to be completely sure that auto.arima is fitted (using the terms from fourier()) consistently with what I'll pass as xreg to forecast (which has terms from a different function, i.e. fourierf()).
Figured out the problem. I was using both the fda and forecast packages. fda, which is for functional data analysis and regression, has its own fourier() function. If I detach fda, my S1 term from fourier(data,3) looks like this:
which lines up nicely with the Fourier forecast if I use ts.plot(c(trainFourier$S1,futureFourier$S1))
Moral of the story -- watch what your packages suppress, folks!
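For anyone hitting the same thing, here is a minimal base-R sketch of how name masking can be spotted and bypassed. It does not load fda or forecast; instead, a throwaway global function named sd stands in for the second package, which is purely an assumption for illustration.

```r
# Hypothetical stand-in for a second package: a global function that
# happens to share its name with stats::sd().
sd <- function(x) "masked version"

# find() lists every place on the search path that defines the name,
# in the order R will look them up.
where <- utils::find("sd")

# The unqualified call gets the first match; pkg::name is unambiguous.
unqualified <- sd(c(1, 2, 3))
qualified <- stats::sd(c(1, 2, 3))
```

In the situation above, writing forecast::fourier(...) instead of fourier(...) would sidestep the clash without detaching fda, and conflicts(detail = TRUE) lists all masked names in a session.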