My timeseries comes from the datasets package. It is called "USAccDeaths".
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938 9161 8927
1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129 8710 8680
1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8466 8160 8034
1976 7717 7461 7767 7925 8623 8945 10078 9179 8037 8488 7874 8647
1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850 8265 8796
1978 7836 6892 7791 8192 9115 9434 10484 9827 9110 9070 8633 9240
When I make an out-of-sample forecast in GRETL, I get the following:
However, when I make the same forecast in R, my results differ substantially.
This is my R code:
library(forecast)
fit <- arima(USAccDeaths[1:60], order = c(1,1,0), method = "ML")
preds <- as.vector(forecast(fit, h = 12)$mean)
RMSE <- sqrt(mean((preds - as.vector(USAccDeaths[61:72])) ^ 2))
RMSE
I get an RMSE of 946.024. These are my predictions in R:
[1] 8803.104 8803.199 8803.201 8803.201 8803.201 8803.201 8803.201 8803.201 8803.201
[10] 8803.201 8803.201 8803.201
What is the problem? How can I get the same results in both programs?
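One thing worth checking before comparing the two programs, offered as a sketch rather than a diagnosis: indexing with USAccDeaths[1:60] returns a plain numeric vector, so R no longer knows the data are monthly, and a non-seasonal ARIMA(1,1,0) without drift produces forecasts that level off quickly, as in the output above. window() keeps the time-series attributes when splitting. The names train and test are illustrative, base R's predict() is used here in place of forecast() to avoid the extra dependency, and whether any of this accounts for the GRETL difference depends on the GRETL settings, which are not shown.

```r
# Split USAccDeaths into 60 training months and 12 held-out months.
train <- window(USAccDeaths, end = c(1977, 12))
test  <- window(USAccDeaths, start = c(1978, 1))

# Plain indexing drops the ts attributes; window() keeps them.
is.ts(USAccDeaths[1:60])
is.ts(train)
frequency(train)

# Same model order as in the question, fitted with base R's arima().
fit <- arima(train, order = c(1, 1, 0), method = "ML")
preds <- predict(fit, n.ahead = 12)$pred

# Out-of-sample RMSE against the held-out year.
rmse <- sqrt(mean((as.vector(preds) - as.vector(test))^2))
rmse
```

If GRETL was given a seasonal specification or a constant, the R call would need the matching seasonal = and include.mean/xreg settings before the two outputs can be compared.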
I'm having a lot of trouble plotting my time series data in RStudio. My data is laid out as follows:
tsf
Time Series:
Start = 1995
End = 2021
Frequency = 1
Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
1995 10817 8916 9697 10314 9775 7125 9007 6000 4155 3692 2236 996
1996 12773 12562 13479 14280 13839 9168 10959 6582 5162 4815 3768 1946
1997 14691 12982 13545 14131 14162 10415 11420 7870 6340 6869 6777 6637
1998 17192 15480 14703 16903 15921 13381 13779 9127 6676 6511 5419 3447
1999 13578 19470 23411 18190 18979 17296 16588 12561 10405 8537 7304 4003
2000 20100 29419 30125 27147 27832 23874 19728 15847 11477 9301 6933 3486
2001 16528 22258 22146 19027 19436 15688 14558 10609 6799 6563 4816 2480
2002 14724 19424 21391 17215 18775 13017 14385 10044 7649 6598 4497 2766
2003 17051 20182 18564 18484 15365 12180 13313 8859 6830 6371 3781 2012
2004 16875 20084 21150 19057 16153 13619 14144 9599 7390 5830 3763 2033
2005 20002 24153 23160 20864 18331 14950 14149 11086 7475 6290 3779 2134
2006 24605 26384 24858 20634 18951 15048 14905 10749 7259 5479 3074 1509
2007 29281 26495 25974 21427 20232 15465 15738 10006 6674 5301 2857 1304
2008 32961 24290 20190 17587 12172 7369 16175 6822 4364 2699 1174 667
2009 10996 8793 7345 5558 4840 4833 4355 2422 2272 1596 948 474
2010 10469 11707 12379 9599 8893 8314 7018 5310 4683 3742 2146 647
2011 13624 13470 12390 11171 9359 9240 6953 3653 2861 2216 1398 597
2012 14507 10993 10581 9388 7986 5481 6164 3736 2783 2442 1421 774
2013 10735 9671 10596 8113 7095 3293 9306 4504 3257 2832 1307 639
2014 15975 11906 11485 11757 7767 3390 14037 6201 4376 3082 1465 920
2015 20105 15384 17054 13166 9027 3924 21290 8572 5924 3943 1874 847
2016 27106 21173 20096 14847 10125 4143 22462 9781 5842 3831 1846 679
2017 26668 16905 17180 13427 9581 3585 21316 8105 4828 3255 1594 601
2018 25813 16501 16088 11557 9362 3716 20743 7681 4397 2874 1647 778
2019 22279 14178 14404 13794 9126 3858 18741 7202 4104 3214 1676 729
2020 20665 13263 10239 1338 1490 2189 15329 7360 5747 4189 1468 1032
2021 16948 11672 10672 8214 7337 4980 20232 8563 6354 3882 2167 832
When I attempt rudimentary code to plot the data, I get the following:
plot(tsf)
'Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"'
My data is monthly, so its 12 months exceed this apparent limit of 10 series. I've been able to produce a plot by excluding two months, but this is not practical for me.
I've looked at lots of answers on this, many of which recommend ggplot() from {ggplot2}.
The link below had data most closely resembling my data but I still wasn't able to apply it.
issues plotting multivariate time series in R
Any help greatly appreciated.
I think the problem is with the shape of your data. It reports Frequency = 1, which means R treats the twelve monthly columns as separate yearly time series rather than as one continuous monthly series. To plot the whole time span, you can reshape the data into a single series with monthly frequency (shown here on a simulated dataset of values):
tsf_switched <- ts(as.vector(t(tsf)), start = 1995, frequency = 12)
plot(tsf_switched)
Created on 2022-05-07 by the reprex package (v2.0.1)
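The reshaping step can be tried end to end on made-up data. The sketch below assumes a stand-in object called wide with the same layout as tsf (27 yearly rows, 12 monthly columns). The values are random and the names wide and monthly are hypothetical.

```r
# Simulated stand-in for tsf: 27 yearly rows (1995-2021), one column per month.
set.seed(1)
wide <- ts(matrix(round(runif(27 * 12, 500, 30000)), nrow = 27, ncol = 12,
                  dimnames = list(NULL, month.abb)),
           start = 1995, frequency = 1)

# t() puts months in rows, so as.vector() reads the values in
# chronological order: Jan 1995, Feb 1995, ..., Dec 2021.
monthly <- ts(as.vector(t(wide)), start = c(1995, 1), frequency = 12)

frequency(monthly)
plot(monthly, xlab = "Year", ylab = "Value")
```

Transposing first matters: as.vector() on the untransposed matrix would read column by column, stringing all the Januaries together before the Februaries.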
One solution with {ggplot2} and two convenience libraries:
library(dplyr)
library(ggplot2)
library(tsbox) ## for convenient ts -> dataframe conversion
library(lubridate) ## time and date manipulation
## example monthly data for twelve years:
example_ts <- ts(runif(144), start = c(2010, 1), end = c(2021, 12), frequency = 12)
ts_data.frame(example_ts) %>% ## {tsbox}
mutate(year = year(time),
day_month = paste0(day(time),'/', month(time))
) %>%
ggplot() +
geom_line(aes(day_month,
value,
group = year
)
)
Ways to convert time series to data frames (required as ggplot input): Transforming a time-series into a data frame and back
My data is a time series:
library(forecast)
library(Mcomp)
ts.data<- subset(M3, 12)[[551]]
print(ts.data)
Output:
Series: M551
Type of series: INDUSTRY
Period of series: MONTHLY
Series description: Lumber, softwoods, western pine, production
HISTORICAL data
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1983 5960 6320 7050 7080 7170 7620 7000 7790 7830 7580 6620 6860
1984 7240 6550 8730 8690 7650 7950 7210 7880 7040 7940 7290 5750
1985 6410 6230 7260 8220 7770 7830 7590 9210 8340 8660 7330 7000
1986 7500 7430 8350 9010 8470 8900 8720 9710 10070 9610 8410 8640
1987 8500 8920 10470 8960 9390 10630 9380 10050 9030 10580 9350 8810
1988 9300 10430 10570 10440 10120 8780 7460 8230 9830 10260 9270 9260
1989 9260 8150 9930 8840 9150 10230 9340 10170 9150 10420 8690 8960
1990 9960 9050 10430 9650 9350 8790 8550 8790 7590 8730 7520 6110
1991 7450 7230 7680 8270 8060 8210 8260 8910 8540 8180 7430 6880
1992 7360 7560 8800 7550 7900 8320 7430
FUTURE data
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1992 7650 7460 8580 7300 6530
1993 7070 6940 7060 7470 6190 6310 7090 7310 6310 7270 7240 6410
1994 7470
How can I build an ARIMA model and then find the Mean Absolute Error (MAE) of the model?
Any thoughts to get started would be helpful.
Here's one way you can do it (note: to do this properly, you should also investigate the ACF/PACF before modelling and perform cross-validation after producing the initial models):
# Install packages if they are not already installed: necessary_packages => vector
necessary_packages <- c("forecast", "Mcomp")
# Create a vector containing the names of any packages needing installation:
# new_packages => vector
new_packages <- necessary_packages[!(necessary_packages %in%
installed.packages()[, "Package"])]
# If the vector has more than 0 values, install the new packages
# (and their associated dependencies):
if(length(new_packages) > 0){install.packages(new_packages, dependencies = TRUE)}
# Initialise the packages in the session: list of boolean => stdout (console)
lapply(necessary_packages, require, character.only = TRUE)
# Subset the M3 data to contain the relevant series:
ts.data <- subset(M3, 12)[[551]]
# Extract the historical data:
historical_production_ts <- ts.data[["x"]]
# Set the seed for reproducibility:
set.seed(2020)
# It is clear from this series we have trend and seasonality:
decomposed_ts <- decompose(historical_production_ts)
# Chart the decomposed series:
plot(decomposed_ts)
# Note: we have to make series stationary in order to compute
# an ARIMA model, to do this we need to account for trend and
# seasonality:
# Use a unit root test to determine the number of differences required
# to make the trend of the series stationary:
trend_required_diffs <- ndiffs(decomposed_ts$trend)
# Use a unit root test to determine the number of differences required
# to make the seasonality of the series stationary:
seasonality_required_diffs <- ndiffs(decomposed_ts$seasonal)
# Create an auto-arima model and store the result in the fit variable:
fit <- auto.arima(historical_production_ts,
# Account for trend:
d = trend_required_diffs,
# Account for seasonality:
D = seasonality_required_diffs,
# Try out a lot of different models:
stepwise = FALSE,
# Don't approximate the AIC:
approximation = FALSE)
# Check the Mean Absolute Error (MAE) of the model:
data.frame(accuracy(fit))[,"MAE"]
# Forecast out the number of future observations required:
aa_fcast <- forecast(fit, length(ts.data$xx))
# Chart the forecast:
autoplot(aa_fcast) +
autolayer(ts.data$xx, series = "Actual Production (Future)") +
autolayer(aa_fcast$mean, series = "Forecasts")
# A function to calculate the MAE:
mae <- function(actual, pred){
mean(abs(pred-actual))
}
# Calculate the accuracy of the forecast against the future data:
mean_abs_error <- mae(ts.data$xx, data.frame(aa_fcast)[,"Point.Forecast", drop = TRUE])
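The MAE used above is the mean of the absolute differences between actual and predicted values. A minimal base-R check with made-up numbers follows. The vectors actual and pred are hypothetical and not taken from the M3 series.

```r
# Mean Absolute Error: average size of the forecast errors, ignoring sign.
mae <- function(actual, pred) {
  mean(abs(pred - actual))
}

actual <- c(100, 110, 120, 130)
pred   <- c(98, 113, 120, 126)

# Absolute errors are 2, 3, 0 and 4, so the MAE is 9 / 4 = 2.25.
mae(actual, pred)
```

Note that accuracy(fit) reports the in-sample (training) MAE, while the mae() call against ts.data$xx measures out-of-sample error. The two are not directly comparable.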
I am plotting time series data in R. On the x-axis I should get the years as 2014, 2015, 2016, whereas I'm getting 2014.0, 2014.5, 2015.0, 2015.5, 2016.0 and 2016.5, which is very annoying. How can I get rid of this?
Below is the code I have used.
inflow<-ts(inflow,start = c(2014,1),frequency = 12)
plot(inflow, xlab="Year", ylab="Inflow Count")
Can anyone please help me get rid of the decimal part of the year labels on the x-axis? I am attaching the image (R Plot) with my resulting output as well.
It depends on the input data and the way you want to plot it. Give the following lines a try:
data(USAccDeaths)
USAccDeaths
plot(USAccDeaths, type="l", pch=16, cex=1, col="#425a10",bty="l", xlab="Year",ylab="Accident Deaths", main="Accident Deaths USA 1973 - 1978")
The input data:
YEAR Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938 9161 8927
1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129 8710 8680
1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8466 8160 8034
1976 7717 7461 7767 7925 8623 8945 10078 9179 8037 8488 7874 8647
1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850 8265 8796
1978 7836 6892 7791 8192 9115 9434 10484 9827 9110 9070 8633 9240
and the resulting plot:
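If the default axis still shows fractional years, one common approach is to suppress the x-axis and draw it manually at whole years. This is a sketch using USAccDeaths and base graphics. The name year_ticks is illustrative, and the same pattern should carry over to the inflow series from the question.

```r
# Whole-year tick positions covering the span of the series.
x <- USAccDeaths
year_ticks <- seq(start(x)[1], end(x)[1] + 1)

# xaxt = "n" suppresses the default x-axis; axis() then draws
# ticks and labels only at the integer years.
plot(x, xaxt = "n", xlab = "Year", ylab = "Accident Deaths")
axis(side = 1, at = year_ticks, labels = year_ticks)
```

Because the labels are supplied explicitly, they are printed exactly as given, without a decimal part.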
I've used R to get some forecast results.
library(forecast)
fc<-forecast(fit.ets)
fc
I got a result like this:
Points Forecast Lo 80 Hi 80 Lo 95 Hi 95
19.5, 1.8895 xxx xxx xxx xxx
20.0 xxxx xxx xxx xxx xxx
...
I want to get the Points column to plot my data. How can I get this column?
You haven't copied and pasted correctly. The output is like this:
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
19.5 1.8895 xxx xxx xxx xxx
20.0 xxxx xxx xxx xxx xxx
...
The left hand column gives the time periods. The second column gives the "Point Forecast" -- that is, the estimated mean or median of each future observation.
If you just want the future times, you can get them using the time() function:
time(fc$mean)
Contrary to what you say in the comments, ets() does not change the x-axis scale, it simply fits a model. When you pass that model to forecast.ets(), the resulting times are a continuation of the sequence of time periods from the data that you provide.
For example:
> USAccDeaths
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938 9161 8927
1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129 8710 8680
1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8466 8160 8034
1976 7717 7461 7767 7925 8623 8945 10078 9179 8037 8488 7874 8647
1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850 8265 8796
1978 7836 6892 7791 8192 9115 9434 10484 9827 9110 9070 8633 9240
> library(forecast)
> ets.fit <- ets(USAccDeaths)
> fc <- forecast(ets.fit)
> fc
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 1979 8252.1 7945.0 8559.2 7782.4 8721.8
Feb 1979 7473.7 7162.8 7784.6 6998.3 7949.2
Mar 1979 8281.0 7903.8 8658.3 7704.0 8858.0
Apr 1979 8527.1 8107.5 8946.6 7885.4 9168.7
....
Nov 1980 8794.6 7960.6 9628.6 7519.1 10070.1
Dec 1980 9072.3 8195.5 9949.0 7731.4 10413.2
> time(fc$mean)
Jan Feb Mar Apr May Jun Jul Aug Sep
1979 1979.0 1979.1 1979.2 1979.3 1979.3 1979.4 1979.5 1979.6 1979.7
1980 1980.0 1980.1 1980.2 1980.3 1980.3 1980.4 1980.5 1980.6 1980.7
Oct Nov Dec
1979 1979.8 1979.8 1979.9
1980 1980.8 1980.8 1980.9
I am currently using ets() to forecast future values based on historic time series data in R. I used the forecast() function to predict the next 24 data points. However, the output gives the same numbers for the first 12 and the last 12 data points. For example, the forecasted value for May 2012 is replicated in May 2013.
The following data was passed:
2005.04.30 87.6
2005.05.31 95.4
2005.06.30 97.7
2005.07.31 101.3
2005.08.31 100.6
2005.09.30 97
2005.10.31 91.1
2005.11.30 92.1
2005.12.31 112
2006.01.31 113.9
2006.02.28 103.9
2006.03.31 115.1
2006.04.30 100
2006.05.31 107.5
2006.06.30 110
2006.07.31 114.2
2006.08.31 109.4
2006.09.30 108.9
2006.10.31 114.6
2006.11.30 113
2006.12.31 116.5
2007.01.31 120.2
2007.02.28 112.6
2007.03.31 124.1
2007.04.30 113.4
2007.05.31 121
2007.06.30 117.9
2007.07.31 118.4
2007.08.31 119.5
2007.09.30 113.5
2007.10.31 117.8
2007.11.30 118.2
2007.12.31 120.6
2008.01.31 126.1
2008.02.29 121.2
2008.03.31 127.4
2008.04.30 119.5
2008.05.31 121.5
2008.06.30 125.7
2008.07.31 131.4
2008.08.31 123.5
2008.09.30 122.8
2008.10.31 125.3
2008.11.30 119.4
2008.12.31 121.2
2009.01.31 123.7
2009.02.28 118.1
2009.03.31 128.7
2009.04.30 112.2
2009.05.31 115.4
2009.06.30 119.8
2009.07.31 117.4
2009.08.31 127.8
2009.09.30 124.4
2009.10.31 131
2009.11.30 118.9
2009.12.31 124
2010.01.31 127.4
2010.02.28 116.3
2010.03.31 126.4
2010.04.30 115.7
2010.05.31 117.7
2010.06.30 122.4
2010.07.31 121.9
2010.08.31 116.7
2010.09.30 110.9
2010.10.31 120.7
2010.11.30 116.7
2010.12.31 131.2
2011.01.31 137.1
2011.02.28 118.7
2011.03.31 128.5
2011.04.30 123.5
2011.05.31 126.1
2011.06.30 127.7
2011.07.31 125.3
2011.08.31 126.7
2011.09.30 114
2011.10.31 116.5
2011.11.30 128
2011.12.31 130.6
Code:
ETSfit <- ets(data.ts)
data.ets <- forecast(ETSfit, level=70, h=24)
Output:
Point Forecast Lo 70 Hi 70
Jan 2012 133.6314 129.3483 137.9145
Feb 2012 123.5998 118.7221 128.4775
Mar 2012 133.1607 127.7534 138.5681
Apr 2012 121.0877 115.1982 126.9773
May 2012 125.4991 119.1639 131.8342
Jun 2012 127.5913 120.8399 134.3427
Jul 2012 128.4923 121.3489 135.6358
Aug 2012 127.2225 119.7074 134.7376
Sep 2012 122.1938 114.3247 130.0630
Oct 2012 125.5382 117.3302 133.7462
Nov 2012 123.3347 114.8012 131.8682
Dec 2012 129.9972 121.1503 138.8441
Jan 2013 133.6314 124.4818 142.7810
Feb 2013 123.5998 114.1572 133.0424
Mar 2013 133.1607 123.4340 142.8875
Apr 2013 121.0877 111.0849 131.0906
May 2013 125.4991 115.2275 135.7706
Jun 2013 127.5913 117.0579 138.1246
Jul 2013 128.4923 117.7035 139.2812
Aug 2013 127.2225 116.1841 138.2609
Sep 2013 122.1938 110.9114 133.4763
Oct 2013 125.5382 114.0169 137.0595
Nov 2013 123.3347 111.5793 135.0901
Dec 2013 129.9972 118.0123 141.9821
Kindly help.
Look at the fitted model:
ETS(A,N,A)
Call:
ets(y = x)
Smoothing parameters:
alpha = 0.5449
gamma = 1e-04
Initial states:
l = 95.8994
s=6.3817 -3.1792 6.8525 3.218 -3.4445 -1.2408
-4.5852 0.4434 1.7133 0.8123 -1.28 -5.6914
sigma: 4.1325
AIC AICc BIC
613.8103 620.1740 647.3326
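The selected model is ETS(A,N,A): additive errors, no trend, additive seasonality. With no trend selected, the point forecasts consist of a constant level plus a seasonal pattern that repeats every 12 months, which is exactly what you've got.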
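The repetition can be reproduced without the original data. The sketch below uses base R's HoltWinters() with beta = FALSE on USAccDeaths as a stand-in for an additive-seasonal, no-trend model. It is not the same estimation procedure as ets(), but its point forecasts have the same level-plus-seasonal form. The object names are illustrative.

```r
# Additive seasonal smoothing with the trend switched off (beta = FALSE).
hw_fit <- HoltWinters(USAccDeaths, beta = FALSE, seasonal = "additive")

# Forecast two years ahead.
hw_fc <- predict(hw_fit, n.ahead = 24)

# With no trend term, the second year of forecasts repeats the first.
year1 <- as.numeric(hw_fc[1:12])
year2 <- as.numeric(hw_fc[13:24])
all.equal(year1, year2)
```

Only the prediction intervals change from one year to the next, widening with the horizon, as in the Lo 70 / Hi 70 columns above.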