I have very short time series of data from a climatic experiment that I did back in 2012. The data consists of daily water solution flux and daily CO2 flux data. The CO2 flux data comprises 52 days and the water solution flux data is only 7 days long. I have several measurements a day for the CO2 flux data but I calculated daily averages.
Now, I would like to know if there is a trend in these time series. I have figured out that I can use the Kendall trend test or a Theil-Sen trend estimator. I used the Kendall test before for a time series spanning several years. I don't know how to use the Theil-Sen trend estimator.
I put my data into a ts object in R, but when I tried doing a decomposition (using the function decompose) I got the error that the time series spans less than 2 periods. I would like to extract the trend component and then run a Mann-Kendall test on it.
Here is the code that I got so far:
myexample <- structure(c(624.27, 682.06, 672.77,
765.96, 759.52, 760.38, 742.81
), .Names = c("Day1", "Day2", "Day3", "Day4", "Day5", "Day6",
"Day7"))
ts.obj <- ts(myexample, frequency = 365, start = 1)
decomp.ts.obj <- decompose(ts.obj, type = "mult", filter = NULL)
# Error in decompose(ts.obj, type = "mult", filter = NULL) :
#   time series has no or less than 2 periods
Can anyone help me with how to do a trend analysis on a very short time series? My google-fu was to no avail. And can someone tell me how to interpret the size of the Kendall tau? It spans values from -1 to 1. Is tau = 0.5 a strong or a weak trend?
Thanks,
Stefan
I would be tempted to do something simple like
d <- data.frame(val=myexample,ind=seq(myexample))
summary(lm(val~ind,d))
or
library(lmPerm)
summary(lmp(val~ind,d))
or
cor.test(~val+ind,data=d,method="kendall")
Whether tau=0.5 is strong or weak depends a lot on context. In this case the p-value is 0.24, which says at least that this signal (based on rank alone) is not distinguishable from an appropriate null signal.
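The question also asked about the Theil-Sen estimator. A minimal sketch, under the textbook definition (the estimator is simply the median of the slopes over all pairs of points; the `mblm` package offers a packaged version):

```r
# Theil-Sen slope: the median of the pairwise slopes (y_j - y_i) / (x_j - x_i)
theil_sen <- function(x, y) {
  pairs <- combn(length(x), 2)                # all index pairs i < j
  slopes <- (y[pairs[2, ]] - y[pairs[1, ]]) /
            (x[pairs[2, ]] - x[pairs[1, ]])
  median(slopes)
}

myexample <- c(624.27, 682.06, 672.77, 765.96, 759.52, 760.38, 742.81)
theil_sen(seq_along(myexample), myexample)    # robust slope per day
```

Like the Kendall test, this works on seven points, but with so little data the estimate will be very imprecise.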
Here is a short description of the problem I am trying to solve: I have test data for multiple variables (weight, thickness, absorption, etc.) that are taken at varying intervals over time - no set schedule; sometimes there is a test a day, sometimes days go by between tests. I want to detect trends in each of these and alert stakeholders when any parameter is trending up or down more than a certain amount.
I first fit a linear model between each variable's raw data and test time (converted to days or weeks since a fixed date) and created a table of slopes for each variable, so the stakeholders can view one table for all variables and quickly see if any of them are raising concern. The issue was that the data for most variables is very noisy. Someone suggested using time series functions, separating noise and seasonality from the trends, and studying the trend component for a cleaner analysis. I started looking into this and already see a couple of concerns/questions:
Time series analysis seems to require specifying a frequency - how do you handle this if your test data is not taken at regular intervals?
If one gets past the issue in #1 above, decomposes the data, and gets the trend separated out (i.e. removes in particular the random variation/noise), how would you then get a slope metric from it? Namely, if I wanted to fit a linear model to the trend component of the raw data (after decomposing), what would be the x (independent) variable? Is there a way to connect the trend component of the ts-decompose function with the original data's x-axis data (in this case the actual test date/times, say converted to weeks or days from a fixed date)?
Finally, is there a better way of accomplishing what I explained above? I am only looking for general trends over time - say over 3 months of data, not day-to-day trends.
Time series models are generally used to see whether previous observations of a variable influence future observations. You model under the assumption that past observations can predict future ones. That is why most (not all) time series models require evenly spaced training data. If your data is not only very noisy but also not collected on a regular basis, you should seriously consider whether a time series model is the appropriate choice.
Time series analysis seems to require specifying a frequency - how do you handle this if your test data is not taken at regular intervals.
What you can do is aggregate into larger time buckets (shift from daily data to a weekly average, for instance) so that every unit of time has an instance of training data. Following your final comment, you could even average the observations over 3-month windows.
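For instance, bucketing irregularly timed tests into weekly averages could look like this (the column names `test_date` and `weight` are made up for illustration):

```r
set.seed(1)
# Hypothetical irregularly spaced test dates over roughly 3 months
tests <- data.frame(
  test_date = as.Date("2021-01-01") + sort(sample(0:90, 40)),
  weight    = rnorm(40, mean = 100, sd = 5)
)

# Assign each test to the week it falls in, then average per week
tests$week <- as.Date(cut(tests$test_date, breaks = "week"))
weekly <- aggregate(weight ~ week, data = tests, FUN = mean)
head(weekly)
```

The resulting `weekly` frame is evenly spaced (one row per observed week) and can be fed to `ts()` or a linear model directly.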
If one gets over the issue in #1 above, decomposes the data, and gets the trend separated out (ie. take out particularly the random variation/noise), how would you then get a slope metric from that? Namely, if I wanted to then fit a linear model to the trend component of the raw data (after decomposing), what would be the x (independent) variable?
In the simplest case of a linear model, the independent variable is the unit of time corresponding to the prediction you are trying to make. However, this is not always regarded as a time series model.
In the case of an autoregressive model, the independent variable would be the previous observation of what you are trying to predict - something like y(t) = a * y(t-1), where a is a smoothing factor. I encourage you to read Forecasting: Principles and Practice, which is an excellent book on the matter.
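A toy autoregressive example, as a sketch (the coefficient 0.7 and the simulated data are made up for illustration):

```r
# Simulate an AR(1) process y(t) = 0.7 * y(t-1) + noise,
# then recover the coefficient by fitting an AR(1) model
set.seed(3)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 500))
fit <- arima(y, order = c(1, 0, 0))
coef(fit)["ar1"]  # close to 0.7
```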
Is there a way to connect the trend component of the ts-decompose function with the original data's x-axis data (in this case the actual test date/times, say converted to weeks or days from a fixed date)?
The function decompose returns a list which includes trend: a vector of the estimated trend component, each value corresponding to its respective time point.
Let's create an example time series with linear trend
df <- data.frame(
date = seq(from = as.Date("2021-01-01"), to = as.Date("2021-01-10"), by=1)
)
df$value <- jitter(seq(from = 1, to = nrow(df), by=1))
time_series <- ts(df$value, frequency = 5)
df$trend <- decompose(time_series)$trend
> df
date value trend
1 2021-01-01 0.9170296 NA
2 2021-01-02 1.8899565 NA
3 2021-01-03 3.0816892 2.992256
4 2021-01-04 4.0075589 4.042486
5 2021-01-05 5.0650478 5.046874
6 2021-01-06 6.1681775 6.051641
7 2021-01-07 6.9118942 7.074260
8 2021-01-08 8.1055282 8.041628
9 2021-01-09 9.1206522 NA
10 2021-01-10 9.9018900 NA
As you see, the trend component already is an estimate of the dependent variable at the corresponding time. In decompose the estimate of trend is based on a moving average.
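Continuing this example, one way to get the slope metric asked about in #2 is to regress the trend component on the original time axis; lm() drops the NA rows on its own:

```r
# Rebuild the example above, then fit a line to the trend component
df <- data.frame(
  date = seq(from = as.Date("2021-01-01"), to = as.Date("2021-01-10"), by = 1)
)
set.seed(42)
df$value <- jitter(seq(from = 1, to = nrow(df), by = 1))
df$trend <- as.numeric(decompose(ts(df$value, frequency = 5))$trend)

# Slope of the trend in units per day; NA rows at the edges are dropped
fit <- lm(trend ~ as.numeric(date), data = df)
unname(coef(fit)[2])  # close to 1, the slope built into the example
```

Because `df$date` is the x variable, the slope is expressed directly in the original time units (per day here), which answers the "what is the independent variable" part of the question.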
I want to know how the 'seasonal' component in the stl function is made. I guess the 'trend' is a differenced time series, but how is the seasonal component made in R?
The 'stl' description contains:
which.cycle <- cycle(x)
z$seasonal <- tapply(z$seasonal, which.cycle, mean)[which.cycle]
And the R documentation mentions:
fdrfourier - Calculation of the false discovery rates (FDR) for periodic expression
backgroundData - Generation of background expression set
ar1analysis - Performs AR1 fitting
fourierscore - Calculation of the Fourier score
Are AR models and the Fourier transform involved?
A time series cannot be made seasonal, strictly speaking. I suspect you mean extracting the seasonality pattern using the stl() function in R.
Let's consider an example of a time series measuring the maximum recorded air temperature in Dublin, Ireland from 1941 to 2019.
Here is a graph of the original time series:
weatherarima <- ts(mydata$maxtp, start = c(1941,11), frequency = 12)
plot(weatherarima,type="l",ylab="Temperature in Celsius")
title("Maximum Air Temperature - Dublin")
The seasonal, trend, and random components can be extracted with stl() as follows:
stl_weather = stl(weatherarima, "periodic")
seasonal_stl_weather <- stl_weather$time.series[,1]
trend_stl_weather <- stl_weather$time.series[,2]
random_stl_weather <- stl_weather$time.series[,3]
plot(as.ts(seasonal_stl_weather))
title("Seasonal")
plot(trend_stl_weather)
title("Trend")
plot(random_stl_weather)
title("Random")
[Plots of the seasonal, trend, and random components]
As can be observed, there is clear seasonality in the weather data - given the change in temperature across seasons - but this was merely extracted from the series by using stl() - not created as such.
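Since the Dublin dataset (`mydata`) isn't reproducible here, a synthetic check illustrates the same point: stl() only separates what is already in the series, and the three components add back up to the original exactly:

```r
set.seed(1)
# Synthetic monthly series: yearly cycle + linear trend + noise
x <- ts(sin(2 * pi * (1:120) / 12) + (1:120) / 50 + rnorm(120, sd = 0.1),
        frequency = 12)
fit <- stl(x, s.window = "periodic")

# seasonal + trend + remainder reconstructs the series
recon <- rowSums(fit$time.series)
max(abs(recon - as.numeric(x)))  # essentially zero
```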
You might find the following to be informative: Extracting Seasonality and Trend from Data: Decomposition Using R
I have a time series data of daily transactions, starting from 2017-06-28 till 2018-11-26.
The data looks like this:
I am interested to use decompose() or stl() function in R. But I am getting
error:
decompose(y) : time series has no or less than 2 periods
when I am trying to use decompose()
and
Error in stl(y, "periodic") :
series is not periodic or has less than two periods
when I am trying to use stl().
I understand that I have to specify the period, but I cannot work out what the period should be in my case. I tried the following toy example:
dat <- cumsum(rnorm(51.7*10))
y <- ts(dat, frequency = 517)
plot.ts(y)
stl(y, "periodic")
But I couldn't succeed. Any help will be highly appreciated.
The frequency parameter reflects the number of observations before the seasonal pattern repeats. As your data is daily, you may want to set frequency equal to 7 or 365.25 (depending on your business seasonality).
Of course, the larger the business seasonality, the more data you need (i.e. more than 2 periods) in order to decompose your time series. In your case, you set the frequency to 517, but have data available for less than two periods. Thus, the seasonal decomposition cannot happen.
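As a quick check, with weekly seasonality (frequency = 7) two full weeks of daily data is already enough for decompose() to run; the data below is made up for illustration:

```r
set.seed(7)
# Three weeks of daily data with a repeating weekly pattern plus noise
dat <- rep(c(5, 3, 1, 2, 4, 8, 9), times = 3) + rnorm(21, sd = 0.2)
y <- ts(dat, frequency = 7)

dec <- decompose(y)          # no "less than 2 periods" error now
length(dec$seasonal)         # 21, one seasonal value per observation
```

With frequency = 517 the same call would need at least 1034 observations, which is why the original attempt failed.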
For more info, please see: Rob Hyndman's Forecasting Principles and Practice book
I have a daily data with a weekly seasonal component ranging from 2017-03-01 to 2017-05-29. I want to do a seasonal decomposition in R. My code was as follows.
ser = ts(series[,2], frequency=7, start=c(2017,1,1 ))
plot(decompose(ser))
I got a graph as follows.
But the X axis is wrong in the graph. How can I correct it..?
It isn't correct because you haven't specified the frequency argument correctly.
Reading the help of the function ts() you can see that:
frequency: the number of observations per unit of time.
So you can try use this code:
ser = ts(series[,2], frequency=365, start=c(2017,1))
plot(decompose(ser))
Because with daily data, every year you have 365 observations.
Verify that this is the correct solution.
I think your frequency is wrong. Also, if your data starts on the third day of 2017, you put the wrong start. Try this:
ser = ts(series[,2], frequency = 365.25, start = c(2017,3)) #Third day of 2017
frequency = 7 isn't really interpretable here. For instance, frequency = 12 means that you've got data for each month. In this case you've got daily data, so frequency = 365.25.
The default ts object in R is quite limited. If you want to create a time series with weekly seasonality, I'd recommend the msts object from the forecast library. Because it allows multiple periods, you can define the week as well as the year as a seasonal influence:
library(forecast)
daily_onboardings.msts <- msts(daily_onboardings$count, seasonal.periods = c(7, 365.25),start = decimal_date(min(members$onboarded_at)))
I am currently fitting a SARIMAX model to big data sets. Information is retrieved every 10 minutes, so I have a vector of size 52560 for a year of data. Since it represents the electricity load of a device throughout the year, we can observe a daily pattern, a weekly one, and a yearly one.
There is also a trend, so in principle I would need to difference my series 4 times. Since the set covers one year, I can set aside the yearly seasonality and focus on the other three differences. Let's say I get something like this:
dauch = diff(diff(diff(auch2), 144), 1008)
With 144 being the daily seasonality (6×24 10-minute points per day) and 1008 the weekly one.
I would like to fit a SARIMAX model on which I worked with my predecessor. He found that SARIMAX(2,1,5)(1,2,8)144 was the best one for this series. However, I get an error whenever I try this:
themodel = arima(auch[1000:4024,1], order = c(2,1,5), seasonal = list(order = c(1,2,8),
period = 144), xreg=tmpf[988:4012])
tmpf being the temperature used as an exogenous variable. The error is the following:
Error in makeARIMA(trarma[[1L]], trarma[[2L]], Delta, kappa, SSinit) :
maximum supported lag is 350
I don't really understand what it means in my case, because the period I chose is 144, which is less than 350. I feel like I need to keep D = 2 in the model because of the dual differencing for the daily and weekly patterns, but I don't know what to do to solve this. Thanks.
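A likely explanation (my reading of the error, not a verified diagnosis): the 350 cap applies not to the period itself but to the maximum lag of the expanded model, and a seasonal MA term of order Q at period s reaches lag Q * s. Here that is far over the limit:

```r
# Maximum lags implied by the seasonal part of SARIMAX(2,1,5)(1,2,8)[144]
period <- 144
seasonal_ar_lag <- 1 * period   # P = 1 -> lag 144, under the 350 cap
seasonal_ma_lag <- 8 * period   # Q = 8 -> lag 1152, over the 350 cap
c(seasonal_ar_lag, seasonal_ma_lag)
```

So it is the seasonal order Q = 8, combined with period 144, that trips the limit; a model with smaller seasonal orders (so that the largest of P * s and Q * s stays below 350) would fit within arima()'s constraint.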