R function 'stl': how is the seasonal component made?

I want to know how the 'seasonal' component is made in the stl function.
I guess 'trend' is a differenced time series,
but how is the seasonal component made?
In R, the 'stl' source code contains:
which.cycle <- cycle(x)
z$seasonal <- tapply(z$seasonal, which.cycle, mean)[which.cycle]
and the R documentation lists:
fdrfourier: Calculation of the false discovery rates (FDR) for periodic expression
backgroundData: Generation of background expression set
ar1analysis: Performs AR1 fitting
fourierscore: Calculation of the Fourier score
Is it based on an AR model and a Fourier transform?

A time series cannot be made seasonal, strictly speaking. I suspect you mean extracting the seasonality pattern using the stl() function in R.
Let's consider an example of a time series measuring the maximum recorded air temperature in Dublin, Ireland from 1941 to 2019.
Here is a graph of the original time series:
# mydata: data frame with a monthly maximum-temperature column 'maxtp' (not included here)
weatherarima <- ts(mydata$maxtp, start = c(1941, 11), frequency = 12)
plot(weatherarima, type = "l", ylab = "Temperature in Celsius")
title("Maximum Air Temperature - Dublin")
The seasonal, trend, and random components can be extracted with stl() as follows:
stl_weather <- stl(weatherarima, s.window = "periodic")
# time.series has the columns "seasonal", "trend" and "remainder" (the random part)
seasonal_stl_weather <- stl_weather$time.series[, "seasonal"]
trend_stl_weather <- stl_weather$time.series[, "trend"]
random_stl_weather <- stl_weather$time.series[, "remainder"]
plot(seasonal_stl_weather)
title("Seasonal")
plot(trend_stl_weather)
title("Trend")
plot(random_stl_weather)
title("Random")
(Plots of the Seasonal, Trend and Random components.)
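As can be observed, there is clear seasonality in the weather data, reflecting the change in temperature across the seasons, but stl() merely extracted this pattern from the series; it did not create it. Internally, stl() (Seasonal-Trend decomposition using Loess) estimates the seasonal component by loess smoothing of each cycle-subseries, i.e. all the Januaries, all the Februaries, and so on.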
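A quick check of the periodic case, sketched with the built-in nottem series (monthly air temperatures at Nottingham, 1920 to 1939) as an assumed stand-in for the Dublin data, which is not included here. With s.window = "periodic", the tapply() line quoted in the question replaces the seasonal values by their mean at each position in the cycle, so every January carries the same seasonal value, every February the same, and so on:

```r
# 'nottem' (datasets package) is an assumed stand-in for the Dublin series.
fit <- stl(nottem, s.window = "periodic")
seas <- fit$time.series[, "seasonal"]  # columns: seasonal, trend, remainder

# Spread of the seasonal values within each calendar month:
# zero everywhere, because "periodic" averages them per cycle position.
spread_by_month <- tapply(seas, cycle(seas), function(v) diff(range(v)))
print(spread_by_month)

# Hence the whole seasonal component boils down to 12 numbers, one per month.
seasonal_profile <- tapply(seas, cycle(seas), mean)
print(round(seasonal_profile, 2))

# The three components add back up to the original series.
print(max(abs(rowSums(fit$time.series) - nottem)))
```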
You might find the following to be informative: Extracting Seasonality and Trend from Data: Decomposition Using R

Related

How to handle non-periodic time series in bfast (R)

My problem is the following: I have a Landsat NDVI time series that is non-periodic, i.e. it doesn't have a homogeneous frequency. The error I receive is
Error in stl(Yt, "periodic") : series is not periodic or has less than two periods
after having converted my data into a time series without explicitly setting a frequency:
test_timeseries <- ts(test$nd, start = c(1984, 4), end = c(2011, 10))
When I check the frequency or deltat with the functions frequency() or deltat(), both return 1, which I don't understand, as I have (irregularly spaced) data for nearly every month, not only once a year.
So my question is: how do I set the frequency in this case, and how do I deal with this non-periodicity? It seems that without setting a frequency I cannot use the function bfast().
Sorry if the answer is obvious; I'm very new to time series analysis.
The help file for bfast() covers this case; it describes the following argument:
season : the seasonal model used to fit the seasonal component and detect seasonal breaks (i.e. significant phenological change). There are three options: "dummy", "harmonic", or "none" where "dummy" is the model proposed in the first Remote Sensing of Environment paper and "harmonic" is the model used in the second Remote Sensing of Environment paper (See paper for more details) and where "none" indicates that no seasonal model will be fitted (i.e. St = 0 ). If there is no seasonal cycle (e.g. frequency of the time series is 1) "none" can be selected to avoid fitting a seasonal model.
So set season = "none" in bfast().
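As for why frequency() returns 1: ts() uses frequency = 1 by default, and with frequency 1 the start value c(1984, 4) is read as 1984 + (4 - 1)/1 = 1987, i.e. one observation per year. A minimal sketch, assuming (hypothetically) that the NDVI values can be placed on a regular monthly grid with one value per month (gaps filled beforehand); random numbers stand in for test$nd. If the data cannot be regularised like this, season = "none" remains the way to go.

```r
set.seed(1)
# Hypothetical stand-in for test$nd: one value per month, April 1984 to October 2011
n_months <- (2011 - 1984) * 12 + (10 - 4) + 1   # 331 months
nd <- runif(n_months, min = 0.2, max = 0.8)

# No frequency given: ts() falls back to frequency = 1 (yearly),
# so c(1984, 4) is interpreted as the year 1987
ts_default <- ts(nd, start = c(1984, 4), end = c(2011, 10))
print(frequency(ts_default))
print(tsp(ts_default))

# Monthly data: state frequency = 12 explicitly
ts_monthly <- ts(nd, start = c(1984, 4), frequency = 12)
print(frequency(ts_monthly))
print(end(ts_monthly))

# With at least two full periods (and no NAs), stl() now runs
fit <- stl(ts_monthly, s.window = "periodic")
print(dim(fit$time.series))
```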

Time Series Forecasting using Support Vector Machine (SVM) in R

I've tried searching but couldn't find a specific answer to this question. So far I've learned that time series forecasting is possible using SVMs. I've gone through a few papers/articles that did this, but they didn't include any code; instead they explained the algorithm (which I didn't quite understand). Some have done it using Python.
My problem is this: I have (univariate) company sales data from 2010 to 2017, and I need to forecast the sales for 2018 using an SVM in R.
Would you be kind enough to present and explain the R code for this using a small example?
I really do appreciate your inputs and efforts!
Thanks!
Let's assume you have monthly data, for example derived from the AirPassengers data set. You don't need a ts object, just a data frame containing time steps and values; let's name them x and y. Next, you fit an SVM model and specify the time steps you want to forecast. Use the predict function to compute the forecast for those time steps. That's it. However, support vector machines are not commonly regarded as the best method for time series forecasting, especially for long series. An SVM can perform well for a few observations ahead, but I wouldn't expect good results when forecasting, e.g., daily data for the whole next year (though this depends on the data). Simple R code for an SVM-based forecast:
library(e1071)  # provides svm()
# prepare sample data: a data frame with time steps (x) and values (y)
data(AirPassengers)
DF <- data.frame(x = seq_along(AirPassengers), y = as.numeric(AirPassengers))
# train an SVM regression model; consider tuning cost and gamma for a lower MSE
svmodel <- svm(y ~ x, data = DF, type = "eps-regression", kernel = "radial", cost = 10000, gamma = 10)
# time steps for the forecast: the whole series (144 months) plus 12 months ahead
nd <- 1:156
# compute predictions for all 156 months
prognoza <- predict(svmodel, newdata = data.frame(x = nd))
# plot the observed series (blue) and the fitted values plus forecast (red)
ylim <- range(DF$y, prognoza)
plot(DF$x, DF$y, col = "blue", type = "l", xlim = range(nd), ylim = ylim,
     xlab = "Month", ylab = "Passengers")
lines(nd, prognoza, col = "red")
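One way to check the caveat about forecasting further ahead is a holdout test: refit the same model on the first 132 months and compare its forecast of the last 12 months with the observed values. A sketch, assuming the e1071 package is installed; the seasonal naive benchmark (repeat the value from 12 months earlier) is an addition for comparison, not part of the answer.

```r
library(e1071)  # provides svm(); must be installed

# Hold out the last 12 months of AirPassengers as a test set
y <- as.numeric(AirPassengers)
train <- data.frame(x = 1:132, y = y[1:132])
test  <- data.frame(x = 133:144, y = y[133:144])

# Same model settings as above, fitted on the training part only
fit <- svm(y ~ x, data = train, type = "eps-regression",
           kernel = "radial", cost = 10000, gamma = 10)
pred <- predict(fit, newdata = test)

# Out-of-sample root mean squared error of the SVM forecast
rmse_svm <- sqrt(mean((test$y - pred)^2))

# Benchmark (assumption, not from the answer): seasonal naive forecast
naive <- y[121:132]
rmse_naive <- sqrt(mean((test$y - naive)^2))

print(c(svm = rmse_svm, seasonal_naive = rmse_naive))
```

If the SVM does not beat such a simple benchmark on held-out data, further tuning of cost and gamma (or a different method) is worth considering.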

Time series forecasting, dealing with known big orders

I have many data sets with known outliers (big orders)
data <- matrix(c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3","14Q4","15Q1", 155782698, 159463653.4, 172741125.6, 204547180, 126049319.8, 138648461.5, 135678842.1, 242568446.1, 177019289.3, 200397120.6, 182516217.1, 306143365.6, 222890269.2, 239062450.2, 229124263.2, 370575384.7, 257757410.5, 256125841.6, 231879306.6, 419580274, 268211059, 276378232.1, 261739468.7, 429127062.8, 254776725.6, 329429882.8, 264012891.6, 496745973.9, 284484362.55),ncol=2,byrow=FALSE)
The top 11 outliers of this specific series are:
outliers <- matrix(c("14Q4","14Q2","12Q1","13Q1","14Q2","11Q1","11Q4","14Q2","13Q4","14Q4","13Q1",20193525.68, 18319234.7, 12896323.62, 12718744.01, 12353002.09, 11936190.13, 11356476.28, 11351192.31, 10101527.85, 9723641.25, 9643214.018),ncol=2,byrow=FALSE)
What methods can I use to forecast the time series while taking these outliers into account?
I have already tried replacing the outliers one at a time (running the data set 10 times, replacing one more of the biggest outliers each time, until the 10th data set has all of them replaced).
I have also tried simply removing the outliers (again running the data set 10 times, removing one more outlier each time, until all 10 are removed in the 10th data set).
Note that removing these big orders does not delete the data point completely, as other deals happen in that quarter.
My code tests the data with multiple forecasting models (ARIMA weighted on the out-of-sample period, ARIMA weighted on the in-sample period, ARIMA weighted, ARIMA, additive Holt-Winters weighted and multiplicative Holt-Winters weighted), so the approach needs to be something that can be adapted to all of these models.
Here are a couple more data sets that I used; I do not have the outliers for these series yet:
data <- matrix(c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3", 26393.99306, 13820.5037, 23115.82432, 25894.41036, 14926.12574, 15855.8857, 21565.19002, 49373.89675, 27629.10141, 43248.9778, 34231.73851, 83379.26027, 54883.33752, 62863.47728, 47215.92508, 107819.9903, 53239.10602, 71853.5, 59912.7624, 168416.2995, 64565.6211, 94698.38748, 80229.9716, 169205.0023, 70485.55409, 133196.032, 78106.02227), ncol=2,byrow=FALSE)
data <- matrix(c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3",3311.5124, 3459.15634, 2721.486863, 3286.51708, 3087.234059, 2873.810071, 2803.969394, 4336.4792, 4722.894582, 4382.349583, 3668.105825, 4410.45429, 4249.507839, 3861.148928, 3842.57616, 5223.671347, 5969.066896, 4814.551389, 3907.677816, 4944.283864, 4750.734617, 4440.221993, 3580.866991, 3942.253996, 3409.597269, 3615.729974, 3174.395507),ncol=2,byrow=FALSE)
If this is too complicated, an explanation would also help: once outliers have been detected in R with certain commands, how is the data then dealt with for forecasting (e.g. smoothing), and how could I approach that by writing the code myself (without using the commands that detect outliers)?
Your outliers appear to be seasonal variations, with the largest orders appearing in the 4th quarter. Many of the forecasting models you mentioned can include seasonal adjustments. As an example, the simplest model could have a linear dependence on year with corrections for each quarter. The code would look like:
df <- data.frame(period= c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3",
"10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2",
"13Q3","13Q4","14Q1","14Q2","14Q3","14Q4","15Q1"),
order= c(155782698, 159463653.4, 172741125.6, 204547180, 126049319.8, 138648461.5,
135678842.1, 242568446.1, 177019289.3, 200397120.6, 182516217.1, 306143365.6,
222890269.2, 239062450.2, 229124263.2, 370575384.7, 257757410.5, 256125841.6,
231879306.6, 419580274, 268211059, 276378232.1, 261739468.7, 429127062.8, 254776725.6,
329429882.8, 264012891.6, 496745973.9, 42748656.73))
seasonal <- data.frame(year=as.numeric(substr(df$period, 1,2)), qtr=substr(df$period, 3,4), data=df$order)
ord_model <- lm(data ~ year + qtr, data=seasonal)
seasonal <- cbind(seasonal, fitted = fitted(ord_model))
library(reshape2)
library(ggplot2)
plot_fit <- melt(seasonal,id.vars=c("year", "qtr"), variable.name = "Source", value.name="Order" )
ggplot(plot_fit, aes(x=year, y = Order, colour = qtr, shape=Source)) + geom_point(size=3)
which gives the results shown in the chart below:
Models with a seasonal adjustment but nonlinear dependence upon year may give better fits.
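The fitted model can also produce forecasts with predict(). A sketch that uses the 28 quarters from 08Q1 to 14Q4 (the values on which the question and the code above agree) and forecasts the four quarters of 2015 with the same year + quarter model; the prediction intervals rest on the usual linear-model error assumptions.

```r
# Quarterly orders 08Q1 .. 14Q4 from the question
orders <- c(155782698, 159463653.4, 172741125.6, 204547180,
            126049319.8, 138648461.5, 135678842.1, 242568446.1,
            177019289.3, 200397120.6, 182516217.1, 306143365.6,
            222890269.2, 239062450.2, 229124263.2, 370575384.7,
            257757410.5, 256125841.6, 231879306.6, 419580274,
            268211059, 276378232.1, 261739468.7, 429127062.8,
            254776725.6, 329429882.8, 264012891.6, 496745973.9)
seasonal <- data.frame(year = rep(8:14, each = 4),
                       qtr  = factor(rep(paste0("Q", 1:4), times = 7)),
                       data = orders)
ord_model <- lm(data ~ year + qtr, data = seasonal)

# Forecast the four quarters of 2015 (year = 15 in the two-digit coding)
new_quarters <- data.frame(year = 15,
                           qtr  = factor(paste0("Q", 1:4), levels = levels(seasonal$qtr)))
fc <- predict(ord_model, newdata = new_quarters, interval = "prediction")
rownames(fc) <- paste0("15", new_quarters$qtr)
print(round(fc / 1e6, 1))  # in millions: point forecast and 95% interval
```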
You already said you tried different ARIMA models, but as WaltS mentioned, your series don't seem to contain big outliers; they contain a seasonal component, which is captured nicely by auto.arima() in the forecast package:
library(forecast)
myTs <- ts(as.numeric(data[,2]), start=c(2008, 1), frequency=4)
myArima <- auto.arima(myTs, lambda=0)
myForecast <- forecast(myArima)
plot(myForecast)
Here the lambda=0 argument to auto.arima() applies a Box-Cox transformation with lambda = 0, which is a log transform (you could also take logs yourself), to account for the growing amplitude of the seasonal component.
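To see why the log (Box-Cox, lambda = 0) scale helps here, compare the size of the 4th-quarter peak on the raw and on the log scale. A base-R sketch using the 28 quarters 08Q1 to 14Q4 from the question; boxcox_transform() is a hypothetical helper written out for illustration, not a function of the forecast package. The Q4 excess over the average of Q1 to Q3 grows with the level of the series on the raw scale, while on the log scale (a ratio) it varies much less, which suits an additive seasonal model:

```r
# Quarterly orders 08Q1 .. 14Q4 from the question (7 complete years)
orders <- c(155782698, 159463653.4, 172741125.6, 204547180,
            126049319.8, 138648461.5, 135678842.1, 242568446.1,
            177019289.3, 200397120.6, 182516217.1, 306143365.6,
            222890269.2, 239062450.2, 229124263.2, 370575384.7,
            257757410.5, 256125841.6, 231879306.6, 419580274,
            268211059, 276378232.1, 261739468.7, 429127062.8,
            254776725.6, 329429882.8, 264012891.6, 496745973.9)
m <- matrix(orders, nrow = 4)  # rows: Q1..Q4, columns: 2008..2014

# Hypothetical helper: Box-Cox transform; lambda = 0 is defined as log(y)
boxcox_transform <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Q4 excess over the mean of Q1..Q3, on the raw scale (millions) and the log scale
raw_excess <- (m[4, ] - colMeans(m[1:3, ])) / 1e6
log_m <- boxcox_transform(m, 0)
log_excess <- log_m[4, ] - colMeans(log_m[1:3, ])
print(round(raw_excess, 1))
print(round(log_excess, 3))

# Coefficient of variation: relative year-to-year spread of the Q4 effect
cv <- function(v) sd(v) / mean(v)
print(c(raw = cv(raw_excess), log = cv(log_excess)))
```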
The approach you are trying to use to cleanse your data of outliers is not robust enough to identify them. I should add that there is a free outlier package in R called tsoutliers, but it won't do the things I am about to show you.
You have an interesting time series here. The trend changes over time, with the upward trend weakening a bit. If you bring in two time trend variables, the first beginning at period 1 and the second beginning at period 14, you will capture this change. As for seasonality, you can capture the high 4th quarter with a dummy variable. The model is parsimonious, as the other 3 quarters are not different from the average, and there is no need for an AR12 term, seasonal differencing or 3 seasonal dummies. You can also capture the impact of the last two observations being outliers with two dummy variables. Ignore the 49 above the word trend, as that is just the name of the series being modeled.
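The model described above can be approximated as an ordinary least-squares regression in base R. This is a sketch under stated assumptions: the answer's model was built with other software and may include ARIMA error terms, which lm() does not; the regressor names are hypothetical; and the 29 values are copied from the question's data matrix (08Q1 to 15Q1).

```r
# Quarterly orders 08Q1 .. 15Q1 from the question's data matrix
orders <- c(155782698, 159463653.4, 172741125.6, 204547180,
            126049319.8, 138648461.5, 135678842.1, 242568446.1,
            177019289.3, 200397120.6, 182516217.1, 306143365.6,
            222890269.2, 239062450.2, 229124263.2, 370575384.7,
            257757410.5, 256125841.6, 231879306.6, 419580274,
            268211059, 276378232.1, 261739468.7, 429127062.8,
            254776725.6, 329429882.8, 264012891.6, 496745973.9,
            284484362.55)
n  <- length(orders)               # 29 quarters
t1 <- seq_len(n)                   # trend starting at period 1
t2 <- pmax(0, t1 - 13)             # second trend: 0 up to period 13, then 1, 2, ...
q4 <- as.numeric(t1 %% 4 == 0)     # 4th-quarter dummy (series starts in Q1)
out_28 <- as.numeric(t1 == n - 1)  # pulse dummy for observation 28 (14Q4)
out_29 <- as.numeric(t1 == n)      # pulse dummy for observation 29 (15Q1)

fit <- lm(orders ~ t1 + t2 + q4 + out_28 + out_29)
print(round(coef(fit) / 1e6, 2))   # coefficients in millions
```

A negative t2 coefficient would correspond to the weakening upward trend described above; summary(fit) shows whether each term is significant.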

Negative values in timeseries when removing seasonal values with HoltWinters (R)

I'm new to R, so I'm having trouble with this time series data.
For example (the real data is way larger)
data <- c(7,5,3,2,5,2,4,11,5,4,7,22,5,14,18,20,14,22,23,20,23,16,21,23,42,64,39,34,39,43,49,59,30,15,10,12,4,2,4,6,7)
ts <- ts(data,frequency = 12, start = c(2010,1))
So I decompose the data to seasonally adjust it, and then forecast with Holt-Winters:
library(forecast)
ts.decompose <- decompose(ts)
ts.adjust <- ts - ts.decompose$seasonal
ts.hw <- HoltWinters(ts.adjust)
ts.forecast <- forecast(ts.hw, h = 10)
plot(ts.forecast)
But the first values I get are negative. Why is this happening?
Well, you are forecasting the seasonally adjusted time series, and of course the deseasonalized series ts.adjust can already contain negative values by itself, and in fact, it actually does.
In addition, even if the original series contained only positive values, Holt-Winters can yield negative forecasts. It is not constrained.
I would suggest modelling your original (not seasonally adjusted) time series directly using ets() in the forecast package. It usually does a good job of detecting seasonality. (It can also yield negative forecasts or prediction intervals, though.)
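A base-R sketch of both points, using the example data from the question. It first confirms that the seasonally adjusted series already contains negative values, then shows one common workaround that the answer does not mention: fitting Holt-Winters to log(x) and exponentiating the forecasts, which keeps them positive. The log approach assumes strictly positive data.

```r
x <- ts(c(7,5,3,2,5,2,4,11,5,4,7,22,5,14,18,20,14,22,23,20,23,16,21,23,
          42,64,39,34,39,43,49,59,30,15,10,12,4,2,4,6,7),
        frequency = 12, start = c(2010, 1))

# 1) The seasonally adjusted series itself already dips below zero
adjusted <- x - decompose(x)$seasonal
print(min(adjusted))

# 2) Workaround (assumption, not from the answer):
#    fit Holt-Winters on the log scale and exponentiate the forecasts
hw_log <- HoltWinters(log(x))
fc_log <- predict(hw_log, n.ahead = 10)
fc <- exp(fc_log)  # back on the original scale, so always > 0
print(round(fc, 1))
```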
I very much recommend this free online forecasting textbook. Given your specific question, this may also be helpful.

R - Trend estimation for short time series

I have very short time series of data from a climatic experiment that I did back in 2012. The data consists of daily water solution flux and daily CO2 flux data. The CO2 flux data comprises 52 days and the water solution flux data is only 7 days long. I have several measurements a day for the CO2 flux data but I calculated daily averages.
Now, I would like to know if there is a trend in these time series. I have figured out that I can use the Kendall trend test or a Theil-Sen trend estimator. I used the Kendall test before for a time series spanning several years. I don't know how to use the Theil-Sen trend estimator.
I put my data into a ts object in R, but when I tried a decomposition (using the function decompose), I got an error saying that the time series spans less than 2 periods. I would like to extract the trend and then run a Mann-Kendall test on it.
Here is the code that I got so far:
myexample <- structure(c(624.27, 682.06, 672.77,
765.96, 759.52, 760.38, 742.81
), .Names = c("Day1", "Day2", "Day3", "Day4", "Day5", "Day6",
"Day7"))
ts.obj <- ts(myexample, frequency = 365, start = 1)
decomp.ts.obj <- decompose(ts.obj, type = "mult", filter = NULL)
# Error in decompose(ts.obj, type = "mult", filter = NULL) :
#   time series has no or less than 2 periods
Can anyone help me with a trend analysis of my very short time series? My Google-fu was to no avail. Also, can someone tell me what the size of Kendall's tau means? It ranges from -1 to 1. Is tau = 0.5 a strong or a weak trend?
Thanks,
Stefan
I would be tempted to do something simple like
d <- data.frame(val=myexample,ind=seq(myexample))
summary(lm(val~ind,d))
or
library(lmPerm)
summary(lmp(val~ind,d))
or
cor.test(~val+ind,data=d,method="kendall")
Whether tau=0.5 is strong or weak depends a lot on context. In this case the p-value is 0.24, which says at least that this signal (based on rank alone) is not distinguishable from an appropriate null signal.
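The Theil-Sen estimator mentioned in the question is simple enough to write in base R: the slope is the median of all pairwise slopes, and Kendall's correlation between the values and the time index is the Mann-Kendall trend test. A sketch; theil_sen() is a hypothetical helper written for illustration:

```r
myexample <- c(Day1 = 624.27, Day2 = 682.06, Day3 = 672.77, Day4 = 765.96,
               Day5 = 759.52, Day6 = 760.38, Day7 = 742.81)

# Hypothetical helper: Theil-Sen line. Slope = median of the slopes
# (y_j - y_i) / (x_j - x_i) over all pairs i < j; intercept = median residual level.
theil_sen <- function(x, y) {
  pairs <- combn(length(x), 2)  # each column is one (i, j) pair with i < j
  i <- pairs[1, ]
  j <- pairs[2, ]
  slope <- median((y[j] - y[i]) / (x[j] - x[i]))
  intercept <- median(y - slope * x)
  c(intercept = intercept, slope = slope)
}

day <- seq_along(myexample)
ts_fit <- theil_sen(day, unname(myexample))
print(ts_fit)  # slope in flux units per day

# Mann-Kendall test: Kendall's tau between the values and time
mk <- cor.test(day, unname(myexample), method = "kendall")
print(c(tau = unname(mk$estimate), p.value = mk$p.value))
```

With only seven days, the slope estimate is fairly uncertain, which is consistent with the non-significant p-value above.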
