How to compare two forecast graphs for two different time series in R?

I want to compare the forecast graphs for two different time series. I have five years of monthly rainfall data for two different cities. I have plotted each city's five-year series and, using the forecast package, a two-year-ahead forecast for both. Now I want to compare these two graphs and their two-year predictions (perhaps in terms of error).
Can anyone help me out with this?

You could start with something like this:
library(forecast)

f1 <- forecast(series1, h = 24)  # 24 months ahead for city 1
f2 <- forecast(series2, h = 24)  # 24 months ahead for city 2
accuracy(f1)
accuracy(f2)
That will give you a lot of error measures on the historical data. Unless you have the actual data for the future periods, you can't do much more than that.
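If you later obtain the actual observations for the forecast horizon, accuracy() also accepts a test set as its second argument; a minimal sketch, assuming actual1 and actual2 are ts objects holding the realized 24 future months for each city (hypothetical names):
accuracy(f1, actual1)  # the "Test set" rows give out-of-sample errors for city 1
accuracy(f2, actual2)  # likewise for city 2
The RMSE or MAE in the "Test set" rows can then be compared directly between the two cities.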

Related

Time Series Analysis: Observed Values Do Not Correspond with Input Data

I have generated a decomposition of an additive time series for METAR wind data at a Norwegian airport. I have noticed that the monthly average wind values do not correspond with the observed values shown in the decomposition chart. During January 2014, average winds were measured at 5.74 kts; however, the chart shows a dip to a value below 3 kts. I noticed that when I separated each variable into its own dataset and ran the decomposition separately, the issue was resolved. Has this got something to do with the way imported data is read? ... Sorry if it seems a silly question. Screenshots and code below. Thanks!
To define the ts data:
RtestENGM_ts <- ts(test$Sknt, start=c(2012, 1), frequency=12)
To decompose the ts data:
decomposed_test <- decompose(RtestENGM_ts, type="additive")
To plot the decomposed data:
plot(decomposed_test)
To plot the ts data:
plot(RtestENGM_ts)
[Screenshot: input dataset]
[Screenshot: decomposition of the additive time series, 2012-22]
I tried importing each variable individually as part of its own dataset, and this allowed the correct observed values to be plotted. I still do not understand why R needs the imported variables to be separate. Do I really need to split my data across dozens of spreadsheets? Does R struggle to isolate a single column during decomposition?

Time series with multiple observations per unit of time

I have a dataset of the daily spreads of 500 stocks. My eventual goal is to build a model using extreme value theory. However, as one of the first steps, I want to check my data for volatility clustering and leptokurtosis. So I first want R to treat my data as a time series, and I want to plot it. However, I can only find examples of time series with a single observation per unit of time. Is there a way for R to treat this kind of dataset as a time series? And what is the best way to plot it?
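For reference, base R can store several parallel series in one multivariate ts object; a minimal sketch with simulated data standing in for the stock spreads (all names hypothetical):
# 100 trading days of spreads for 3 hypothetical stocks (columns)
spreads <- matrix(rnorm(300), ncol = 3,
                  dimnames = list(NULL, c("stockA", "stockB", "stockC")))
spread_ts <- ts(spreads, frequency = 1)  # one observation per stock per day
plot(spread_ts)                          # draws one panel per series
With 500 stocks you would plot a subset of columns, or overlay them with matplot(time(spread_ts), spread_ts, type = "l").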

How to create and analyze a time series with variable test frequency in R

Here is a short description of the problem I am trying to solve: I have test data for multiple variables (weight, thickness, absorption, etc.) taken at varying intervals over time, with no set schedule: sometimes a test a day, sometimes days go by between tests. I want to detect trends in each of these and alert stakeholders when any parameter is trending up or down by more than a certain amount. I first fit a linear model between each variable's raw data and test time (converted to days or weeks since a fixed date) and created a table of slopes for each variable, so that stakeholders can view one table for all variables and quickly see if any of them raises concern. The issue was that the data for most variables is very noisy. Someone suggested using time series functions, separating noise and seasonality from the trend, and studying the trend component for a cleaner analysis. I started to look into this and already see a couple of concerns/questions:
1. Time series analysis seems to require specifying a frequency - how do you handle this if your test data is not taken at regular intervals?
2. If one gets past the issue in #1 above, decomposes the data, and separates out the trend (i.e., removes the random variation/noise in particular), how would you then get a slope metric from it? Namely, if I wanted to fit a linear model to the trend component of the raw data (after decomposing), what would be the x (independent) variable? Is there a way to connect the trend component of the ts-decompose function with the original data's x-axis values (in this case the actual test dates/times, say converted to weeks or days from a fixed date)?
Finally, is there a better way of accomplishing what I explained above? I am only looking for general trends over time, say over 3 months of data, not day-to-day trends.
Time series models are generally used to see whether previous observations of a variable have influence on future observations. You model under the assumption that the previous observations are able to predict the future observations. That is the reason that most (not all) time series models require evenly spaced training data. If your data is not only very noisy but also not collected on a regular basis, then you should seriously consider whether a time series model is the appropriate choice.
Time series analysis seems to require specifying a frequency - how do you handle this if your test data is not taken at regular intervals.
What you can do is aggregate by increasing the time bucket (shifting from daily data to a weekly average, for instance) so that every unit of time has an instance of training data. Following your final comment, you could instead average the observations over the last 3 months.
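A minimal sketch of such bucketing in base R, assuming a data frame tests with columns test_date (Date) and value (both names hypothetical):
# Bucket the irregular test dates into calendar weeks, then average per bucket
tests$week <- cut(tests$test_date, breaks = "week")
weekly <- aggregate(value ~ week, data = tests, FUN = mean)
Weeks containing no tests simply will not appear, so you may still need to fill or interpolate those gaps before handing the result to ts().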
If one gets over the issue in #1 above, decomposes the data, and gets the trend separated out (ie. take out particularly the random variation/noise), how would you then get a slope metric from that? Namely, if I wanted to then fit a linear model to the trend component of the raw data (after decomposing), what would be the x (independent) variable?
In the simplest case of a linear model, the independent variable is the unit of time corresponding to the prediction you are trying to make. However, this is not always regarded as a time series model.
In the case of an autoregressive model, it would be the previous observation of the quantity you are trying to predict, something like y(t) = a * y(t-1), where a acts as a smoothing factor. I encourage you to read Forecasting: Principles and Practice, which is an excellent book on the matter.
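For illustration, a first-order autoregressive fit in base R (simulated data; any regularly spaced numeric series works):
set.seed(1)
y <- cumsum(rnorm(52))               # 52 weeks of simulated measurements
fit <- arima(y, order = c(1, 0, 0))  # AR(1): y(t) modelled on y(t-1)
coef(fit)["ar1"]                     # estimated autoregressive coefficient
predict(fit, n.ahead = 4)$pred       # forecast the next 4 weeks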
Is there a way to connect the trend component of the ts-decompose function with the original data's x-axis data (in this case the actual test date/times, say converted to weeks or days from a fixed date)?
The decompose() function returns a list that includes trend, a vector of the estimated trend component at each corresponding time value.
Let's create an example time series with a linear trend:
df <- data.frame(
  date = seq(from = as.Date("2021-01-01"), to = as.Date("2021-01-10"), by = 1)
)
df$value <- jitter(seq(from = 1, to = nrow(df), by = 1))  # linear trend plus small noise
time_series <- ts(df$value, frequency = 5)  # treat every 5 observations as one period
df$trend <- decompose(time_series)$trend    # moving-average trend estimate
> df
date value trend
1 2021-01-01 0.9170296 NA
2 2021-01-02 1.8899565 NA
3 2021-01-03 3.0816892 2.992256
4 2021-01-04 4.0075589 4.042486
5 2021-01-05 5.0650478 5.046874
6 2021-01-06 6.1681775 6.051641
7 2021-01-07 6.9118942 7.074260
8 2021-01-08 8.1055282 8.041628
9 2021-01-09 9.1206522 NA
10 2021-01-10 9.9018900 NA
As you can see, the trend component is already an estimate of the dependent variable at the corresponding time. In decompose, the trend estimate is based on a moving average, which is why the values at the edges are NA: the averaging window is incomplete there.
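To get the slope metric asked about above, one option is to regress the trend component on the original dates; a minimal sketch continuing the example (lm drops the NA rows automatically):
slope_fit <- lm(trend ~ date, data = df)  # date is the original x-axis variable
coef(slope_fit)["date"]                   # estimated trend slope per day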

Determining ARIMA frequency of non-stationary time series

I am trying to use ARIMA to forecast chemical concentrations in water tanks. I have a large dataset of around a million observations, two minutes apart. When I use auto.arima() in R, I get a forecast looking like this:
[Screenshot: forecast]
As you can see, it flattens out, which makes longer forecasts quite useless.
As far as I can tell, the frequency of the time series is what I need to address in the model. I just cannot find anywhere that explains this. Frequency in this case does not mean that there are two minutes between observations; it is something along the lines of "twelve observations per year" for monthly data, where the seasons have an effect on the data.
Here is a plot of the data, if it helps:
[Screenshot: plot of the full series]
and on a smaller scale:
[Screenshot: plot at a smaller scale]
Have a look at this question and answer on Stats Stack Exchange; it is pretty much the same question, and the answer there essentially covers it.
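In short, frequency should be the number of observations per seasonal cycle, not the sampling interval. With two-minute data a daily cycle contains 720 observations; a minimal sketch, assuming a numeric vector conc of concentrations (hypothetical name):
library(forecast)

conc_ts <- ts(conc, frequency = 720)  # 720 two-minute readings per day
# For several cycles at once (e.g. daily and weekly), msts() from forecast:
conc_msts <- msts(conc, seasonal.periods = c(720, 720 * 7))
Note that ARIMA handles at most one seasonal period, and very high frequencies make it slow; for long multi-seasonal series, models such as tbats() are often suggested instead.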

Suggestions for clustering methods

I have two time series of meteorological measurements (i.e., X and Y). Both the X and Y time series were constructed from daily measurements over a period of one year. Plotting the X series against the Y series as a scatterplot and connecting the points by date in ascending order yields a closed loop representing the annual cycle. I have measurements at N locations, and thus I have N loops (i.e., annual cycles), which I want to cluster to find those with similar shapes.
With so many clustering methods, I am not sure which one would be most appropriate for this analysis (initially I was thinking of using self-organizing maps).
Thank you very much for any suggestions.
Unless you have too many time series, I suggest starting with hierarchical clustering. It is easy to interpret because of the dendrogram.
For similarity, a cyclic version of DTW may be a good choice, assuming that there is some delay between different locations.
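A minimal sketch of that pipeline with the dtw package (plain DTW here; a cyclic variant would additionally search over rotations of one series), assuming loops is a list of N two-column matrices of daily X and Y values (hypothetical name):
library(dtw)

n <- length(loops)
D <- matrix(0, n, n)                # pairwise DTW distance matrix
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d <- dtw(loops[[i]], loops[[j]])$distance  # alignment cost of loop i vs j
    D[i, j] <- D[j, i] <- d
  }
}
hc <- hclust(as.dist(D), method = "average")   # hierarchical clustering
plot(hc)                                       # inspect the dendrogram
clusters <- cutree(hc, k = 4)                  # e.g. cut into 4 clusters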
