Interpolate a high-frequency time series in R

I have a physical time series covering a range of 2 years of sample data at a 30-minute frequency, but there are multiple wide intervals of missing data, as you can see here:
I tried the na.interp function from the forecast package, with a bad result (shown above):
sapply(dataframeTS[2:10], na.interp)
I'm looking for a more useful method.
UPDATE:
Here is more info about the pattern I want to capture, specifically the raw data. This subsample belongs to May.

You might want to try the **imputeTS** package. It's an R package dedicated to time series missing value imputation.
The na_seadec(), na_seasplit() and na_kalman() methods might be interesting here.
There are many more algorithm options - you can find a list in the paper about the package.
In this specific case I would try:
na_seasplit(yourData)
or
na_kalman(yourData)
or
na_seadec(yourData)
Be aware that you may need to supply the seasonality information correctly with the time series (you have to create a ts object and set the frequency parameter).
It still might not work out at all; you will have to try.
(if you can provide the data I'll also give it a try)
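Here is a minimal sketch of that workflow, assuming the 30-minute data sits in the dataframeTS from the question and that a daily cycle (48 observations per day) is the dominant seasonality; adjust the frequency if a weekly or yearly cycle matters more:
library(imputeTS)
# build a ts object so the algorithms know the seasonal period (48 half-hours per day)
series <- ts(dataframeTS[[2]], frequency = 48)
imp_seasplit <- na_seasplit(series)   # split by season, then impute within each season
imp_kalman   <- na_kalman(series)     # state-space model plus Kalman smoothing
imp_seadec   <- na_seadec(series)     # remove seasonality, impute, add seasonality back
# visual check of the imputed stretches (imputeTS >= 3.0)
ggplot_na_imputations(series, imp_seasplit)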

Related

Time series benchmarking/reconciliation and revisions - are there methods that minimise revisions?

I am using the tempdisagg R package for benchmarking quarterly time series to annual time series from different (more trusted) sources (by temporally disaggregating the annual data using the quarterly data as indicator series).
The time series are sub-series and sum series, and these identities should hold after benchmarking, too. I.e. if
S = A + B - C,
then
predict(td(S,...)) = predict(td(A, ...)) + predict(td(B, ...)) - predict(td(C,...)).
I have tried the Denton-Cholette and the Chow-Lin-maxlog methods.
This is to be carried out regularly, so ideally I would like a disaggregation method that minimises revisions. I have tried removing up to ten years' worth of data from various time series to see if any method outperforms the others in terms of minimising revisions, but it seems to depend on a combination of time series volatility and method, and I can't reach a conclusion.
It would be possible to use a combination of different methods on the sub series, I guess.
Is there any comprehensive knowledge on benchmarking and revisions?
I have attached some graphs in an attempt to illustrate the problem. Ideally, we would like to see one line that just changes colour according to the various years of data, as in the first two graphs until about 2015. The black lines in the graphs are the raw data.
Your question seems to consist of two independent parts.
You mention that the identity S = A + B - C can be achieved with our tempdisagg R-library by predict(td(S,...)) = predict(td(A, ...)) + predict(td(B, ...)) - predict(td(C,...)).
This is usually not the case. You will have to apply td() to three of the four series and compute the fourth series implicitly (e.g. S = predict(td(A, ...)) + predict(td(B, ...)) - predict(td(C,...))).
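A hedged sketch of that implicit approach, with hypothetical annual series A.a, B.a, C.a and a hypothetical quarterly indicator ind.q (your series, formulas and method will differ):
library(tempdisagg)
# disaggregate the three components explicitly ...
A.q <- predict(td(A.a ~ ind.q, method = "chow-lin-maxlog"))
B.q <- predict(td(B.a ~ ind.q, method = "chow-lin-maxlog"))
C.q <- predict(td(C.a ~ ind.q, method = "chow-lin-maxlog"))
# ... and derive the sum series implicitly, so the identity holds by construction
S.q <- A.q + B.q - C.q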
To answer your question about the revisions, a reproducible example would be handy. You could create such an example with the example time series in our tempdisagg library, which are accessible by data(tempdisagg).
Since the Chow-Lin method is based on a regression (in your case on the involved annual time series), the regression parameters will change with every new or revised annual value. As a consequence, all values of the resulting quarterly series will be revised. When applying the Denton method, no parameters have to be estimated, so only the most recent years of the resulting quarterly series are prone to revision. If your focus is on the whole resulting quarterly time series, far fewer quarters are prone to revisions with the Denton method than with the Chow-Lin method. If your focus is on the revisions of the most recent quarters/years, it's a different story and I doubt that there is a clear-cut answer.
I hope this helps. Our paper Temporal Disaggregation of Time Series contains an overview of the different temporal disaggregation methods implemented in the tempdisagg library.
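As a starting point for such a reproducible example, here is a rough sketch of a revision experiment with hypothetical series ann (annual) and ind.q (quarterly indicator): drop the last annual value, re-estimate, and compare the overlapping quarters.
library(tempdisagg)
full <- predict(td(ann ~ ind.q, method = "chow-lin-maxlog"))
# remove the last annual observation and re-estimate; since the quarterly
# indicator still covers the full range, both runs extrapolate to its end
short <- predict(td(window(ann, end = end(ann)[1] - 1) ~ ind.q,
                    method = "chow-lin-maxlog"))
revisions <- full - short       # differences caused by the shorter annual sample
summary(abs(revisions))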

How to frequency filter a time series in R

I'd like to know how to apply frequency filters to a time series in R and get the result back in the time domain (e.g. Fourier transform -> remove certain frequencies -> inverse Fourier transform), or a similar approach (there does not seem to be a suitable inverse function to fft). From what I have seen, people recommend the signal package, but I haven't been able to get sensible results with it. To give an example, let's say I have a simple monthly time series:
plot(log(AirPassengers))
Let's say I want to extract the seasonal fluctuations here using a band-pass filter (i.e. get rid of the trend and the small higher-frequency movements). If someone could produce R code to do this with Fourier methods, preferably using a package function, and explain it, that would be of great help to get me started.
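Not a packaged solution, but a bare-bones sketch of the fft -> zero out bins -> inverse fft route on exactly this example; the band limits (periods between 6 and 24 months, i.e. around the 12-month seasonal cycle) are an arbitrary choice here:
x <- as.numeric(log(AirPassengers))
n <- length(x)
X <- fft(x - mean(x))                 # forward transform, mean removed first
freq <- (0:(n - 1)) / n               # frequency of each bin in cycles per month
freq <- pmin(freq, 1 - freq)          # fold the two-sided spectrum
keep <- freq > 1/24 & freq < 1/6      # keep periods between 6 and 24 months
X[!keep] <- 0
seasonal <- Re(fft(X, inverse = TRUE)) / n    # back to the time domain
plot(ts(seasonal, start = start(AirPassengers), frequency = 12))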

Band filtering based on PSD to filter out frequency domains in R, probably using "buttord" from signal

I'm still a novice in R and I have read quite a few posts and discussions on how to filter out frequency domains in a time series, but none of them quite matched my problem.
I would like to ask for your suggestions about the following:
I calculated the wavelet coherence for two annually measured time series. Looking at the wavelet coherence PSD graph:
The purple line (i.e. the 8-year period) marks the border below which I would like to filter out the frequency domain - not in the PSD, but in the original input data.
I thought about using the butter function from the signal package, but it was overcomplicated for my purposes.
So I approached the problem with the bwfilter function of the mFilter package to pass through the data above the 8-year period, which corresponds to 2.37E-7 Hz.
name="dta OAK.resid Tair "
adat=read.table(file=paste(name,".csv", sep=""), sep=";", header=T)
dta=adat$ya
highpass <- bwfilter(dta, freq=8,drift=FALSE)
plot(highpass)
However, the results do not seem to be correct: it appears to filter out too much of the data, and the trend is aligned too closely to the original time series.
Do you have any idea what may have gone wrong? The measurement unit maybe?
Any help is appreciated and if any additional details are needed I am happy to provide them!
Thank you!
The data can be found here
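In case it helps, a hedged sketch of how I read the mFilter output for this kind of cut: as far as I understand, freq in bwfilter is the cut-off period in observation units (years here, since the series is annual), and the fitted object splits the series into a low-frequency trend and a high-frequency cycle.
library(mFilter)
bw <- bwfilter(dta, freq = 8, drift = FALSE)
low_pass  <- bw$trend   # variation at periods longer than 8 years
high_pass <- bw$cycle   # variation at periods shorter than 8 years
plot(dta, type = "l", col = "grey")
lines(as.numeric(low_pass), col = "blue")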

Fast Fourier Transform and Clustering of Time Series

I'm working on a project about identifying the dynamics of sales. This is what a piece of my database looks like: http://imagizer.imageshack.us/a/img854/1958/zlco.jpg. There are three columns:
Product - the group the product belongs to
Week - time since the launch of the product (in weeks), first 26 weeks
Sales_gain - how the sales of the product change week by week
In the database there are 3302 observations = 127 time series.
My aim is to cluster the time series into groups that show different sales dynamics. Before clustering, I want to use the Fast Fourier Transform to turn the time series into vectors (taking amplitude etc. into account) and then use a distance algorithm to group the products.
This is my first time dealing with FFT and clustering, so I would be grateful if anybody could point out the steps I have to take before/after using FFT to group the sales dynamics. I want to do all steps in R, so it would be wonderful if somebody could say which procedures I should use.
This is how my time series look now: http://imageshack.com/a/img703/6726/sru7.jpg
Please note that I am relatively new to time series analysis (that's why I cannot put my code here), so any clarity you could provide in R, or any package you could recommend that would accomplish this task efficiently, would be appreciated.
P.S. Instead of FFT I found code for DWT here -> www.rdatamining.com/examples/time-series-clustering-classification but I cannot use it on my database and time series (suggest R to analyze new time series after 26 weeks). Can somebody explain it to me?
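For what it's worth, a hedged sketch of the FFT-feature idea described above, assuming the data sits in a data frame sales with the columns Product, Week and Sales_gain (names as in the question); each series is reduced to the amplitudes of its first few Fourier coefficients:
# one row per product, 26 weekly values per row
wide <- reshape(sales, idvar = "Product", timevar = "Week", direction = "wide")
mat  <- as.matrix(wide[, -1])
# amplitudes of the first few non-constant Fourier coefficients of each series
fft_features <- function(x, n_coef = 6) Mod(fft(x - mean(x)))[2:(n_coef + 1)]
feat <- t(apply(mat, 1, fft_features))
# feat can then be fed into dist() and a clustering routine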
You may have too little data for FFT/DWT to make sense. DTW may be better, but I also don't think it makes sense for sales data - why would there be an x-week temporal offset from one location to another? It's not as if the data were captured at unknown starting weeks.
FFT and DWT are good when your data has interesting repetitive patterns and you have A) a good temporal resolution (for audio data, e.g. 16000 Hz - I am talking about thousands of data points!) and B) no idea of what frequencies to expect. If you know, for example, that you will have weekly patterns (e.g. no sales on Sundays), then you should filter them with other algorithms instead.
DTW (dynamic time warping) is good when you don't know when the events start and how they align. Say you are capturing heart measurements. You cannot expect the hearts of two subjects to beat in synchronization. DTW will try to align this data, and may (or may not) succeed in matching e.g. an anomaly in the heart beat of two subjects. In theory...
Maybe you don't need specialized time series methods here at all:
A) your data has too low a temporal resolution
B) your data is already perfectly aligned
Maybe all you need is to spend more time preprocessing your data, in particular normalizing it, to be able to capture similarity.
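A minimal sketch of that last suggestion, assuming the 127 series are arranged as the rows of a numeric matrix mat with one column per week (as in the sketch above):
# z-normalize each series so the clustering compares shape rather than scale
mat_z <- t(scale(t(mat)))
# plain Euclidean distance plus hierarchical clustering
hc <- hclust(dist(mat_z), method = "ward.D2")
plot(hc)                     # inspect the dendrogram to choose a number of groups
groups <- cutree(hc, k = 4)  # k = 4 is only a placeholder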

correctly lagging with irregularly spaced data

I have some irregularly spaced data, say table A. The frequency is every 2-5 days. I have another data set, table B, which has entries for every weekday. I want to run the following regression:
A_{t} = alpha + beta1 * B_{t-2 months} + error
where, when I lag B, if there isn't an observation exactly 60 days ago (e.g. if 60 days ago was a Sunday), I just pick the next Monday. I can of course construct this with a for loop, but what is the R way? Currently the data are stored in MySQL tables and I am using RMySQL to access them.
Thanks for the help.
You want the zoo package and its documentation --- which has numerous examples about how to aggregate, align, transform, ... data along the time dimension.
It is a hard problem. You'll have to think about how you do it --- but at least appropriate and powerful tools exist. There are also plenty of usage examples here and on the R lists.
At a minimum, you could use na.locf() to carry your last irregular observation forward to the next regular one (after having merged the data based on daily dates). You can then use lag() operators on the regular data. Also, the packages dynlm and dyn facilitate modeling with lm() on data held in zoo objects by adding lags etc. to the formula interface.
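A minimal sketch of that merge-then-lag idea, assuming A and B are zoo series indexed by Date (A irregular, B on weekdays) that have already been pulled out of MySQL; note that na.locf() gives you the most recent prior value (a Friday) rather than the following Monday, which is a slight deviation from the original spec:
library(zoo)
# put B on a full daily calendar and carry the last weekday value forward
days    <- seq(start(B), end(B), by = "day")
B_daily <- na.locf(merge(B, zoo(, days)))
# value of B 60 days before each observation of A
# (assumes B starts at least 60 days before the first A observation)
B_lag <- coredata(B_daily[index(A) - 60])
fit <- lm(coredata(A) ~ B_lag)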
