Fast Fourier Transform in R

I have a dataset with the number of hourly visits an animal made during a period of 12 months. I want to use the Fast Fourier Transform to examine cyclical patterns and periodicity. In the past, I have used Statistica for this; however, I would like to use R to get a plot of the spectral density vs. period. Is there an easy way to do this in R? I would like to identify 12 and 24 hr peaks in activity if possible.

You may consider the following functions.
periodogram from the TSA package immediately plots a periodogram.
periodogram from GeneCycle returns a list of frequencies and estimated power spectral densities. It is a wrapper around stats::spectrum with some special options set.
spectrum from stats allows you to choose the method used to estimate the spectral density: either the periodogram or an autoregressive fit.
cpgram from stats plots a cumulative periodogram along with a confidence interval.
See, e.g., ?cpgram or ?spectrum for all the details, and keep in mind that you need the qualified forms, e.g., TSA::periodogram and GeneCycle::periodogram, when the function names coincide.
There are also plenty of examples and tutorials online on how to use those functions. See here for the usage of fft and here for an even more extensive tutorial.
Also, as you probably already know, the time series should be detrended first; hence, use, e.g., diff(x) instead of x. Finally, to be able to identify the 12 and 24 hour frequencies, the length of your series should be divisible by 12; this can be achieved with, e.g., x[-(1:(length(x) %% 12))], where x is the detrended time series. A short worked example is sketched below.
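A minimal sketch tying these pieces together (the visits series is simulated here with known 24 h and 12 h cycles purely to illustrate the workflow; with hourly data, the frequency axis of spectrum is in cycles per hour, so period = 1/frequency):
set.seed(1)
hrs <- 1:(24 * 365)
visits <- 5 + 2 * sin(2 * pi * hrs / 24) + sin(2 * pi * hrs / 12) + rnorm(length(hrs))

x <- diff(visits)                    # simple detrending by differencing
x <- x[-(1:(length(x) %% 12))]       # trim so the length is divisible by 12

sp <- spectrum(x, method = "pgram", plot = FALSE)   # raw periodogram via stats::spectrum
period <- 1 / sp$freq                               # convert frequency (cycles/hour) to period (hours)
plot(period, sp$spec, type = "h", xlim = c(0, 48),
     xlab = "Period (hours)", ylab = "Spectral density")
abline(v = c(12, 24), lty = 2)                      # the 12 h and 24 h periods of interest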

Use spectrum() to do a spectral density analysis; fft() provides the base fast Fourier transform.

Related

Suggestions for clustering methods

I have two time series of meteorological measurements (i.e., X and Y). Both the X and Y time series were constructed from daily measurements over a period of one year. By plotting the X time series versus the Y time series as a scatterplot and connecting all the points by date in ascending order, a closed loop is obtained representing the annual cycle. I have measurements at N locations and thus I have N loops (i.e., annual cycles) which I want to cluster to find those that have similar shapes.
With so many clustering methods, I am not sure which one will be the most appropriate for this analysis (initially I was thinking of using self-organizing maps).
Thank you very much for any suggestions.
Unless you have too many time series, I suggest starting with hierarchical clustering. It's easy to interpret because of the dendrogram.
For similarity, a cyclic version of DTW may be good, assuming that there is some delay between different locations.
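A minimal sketch of that approach (here loops is a hypothetical list with one two-column matrix of daily X/Y values per location; plain DTW from the dtw package stands in for a cyclic variant, which would additionally try rotations of one loop and keep the smallest distance):
library(dtw)

n <- length(loops)
D <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    # multivariate DTW distance between the annual cycles of locations i and j
    D[i, j] <- D[j, i] <- dtw(loops[[i]], loops[[j]], distance.only = TRUE)$distance
  }
}

hc <- hclust(as.dist(D), method = "average")   # hierarchical clustering on the DTW distances
plot(hc)                                       # dendrogram; use cutree(hc, k) to extract groups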

Cross-correlation of 5 time series (distance) and interpretation

I would appreciate some input in this a lot!
I have data for 5 time series (an example of one step in the series is in the plot below), where each step in the series is a vertical profile of species sightings in the ocean; the profiles were taken 6 h apart. Within each profile the observations are spaced vertically by 0.1 m (and the 5 profiles by 6 h in time).
What I want to do is calculate the multivariate cross-correlation between all series in order to find out at which lag the profiles are most correlated and stable over time.
Profile example:
I find the R documentation on this not so great, so what I have done so far is use the MTS package with the ccm function to create cross-correlation matrices. However, the interpretation of the figures is rather difficult given the sparse documentation, and I would appreciate some help with that.
Data example:
http://pastebin.com/embed_iframe.php?i=8gdAeGP4
Save in file cross_correlation_stack.csv or change as you wish.
library(dplyr)
library(MTS)
library(data.table)

d1 <- file.path("cross_correlation_stack.csv")
d2 <- read.csv(d1)

# Using package MTS
mod1 <- ccm(d2, lag = 1000, level = TRUE)

# Using base R
acf(d2, lag.max = 1000)

# MQ plot, also from the MTS package
mq(d2, lag = 1000)
The ccm command produces several cross-correlation figures, and the acf command produces its own plot (figures not reproduced here).
My question now is whether I am going in the right direction, or whether there are better-suited packages and commands.
Since the default figures don't get any titles, what exactly am I looking at, specifically in the ccm figures?
The acf command was proposed somewhere, but can I use it here? Its documentation says it "calculates autocovariance or autocorrelation", which I assume is not what I want. But then again it's the only command that seems to work on multivariate data. I am confused.
The plot with the significance values shows that the p-values increase after a lag of 150 (15 meters). How would you interpret that for my data, with species sightings at 0.1 m intervals and many lags up to 100-150 significant? Would that mean that peaks in sightings are stable over the 5 time steps on a scale of 150 lags, i.e. 15 meters?
Either way, it would be nice if somebody who has worked with this before could explain what I am looking at! Any input is highly appreciated!
You can use the base R function ccf(), which will estimate the cross-correlation function between any two variables x and y. However, it only works on vectors, so you'll have to loop over the columns in d2. Something like:
cc <- vector("list", choose(dim(d2)[2], 2))
par(mfrow = c(ceiling(choose(dim(d2)[2], 2) / 2), 2))
cnt <- 1
for (i in 1:(dim(d2)[2] - 1)) {
  for (j in (i + 1):dim(d2)[2]) {
    cc[[cnt]] <- ccf(d2[, i], d2[, j],
                     main = paste0("Cross-correlation of ", colnames(d2)[i],
                                   " with ", colnames(d2)[j]))
    cnt <- cnt + 1
  }
}
This will plot each of the estimated CCFs and store the estimates in the list cc. It is important to remember that the lag-k value returned by ccf(x,y) is an estimate of the correlation between x[t+k] and y[t].
All of that said, however, the CCF is really only appropriate for data that are more-or-less normally distributed, and your data are clearly overdispersed with all of those zeroes. Therefore, lacking some adequate transformation, you should really look into other measures of "association", such as the mutual information as estimated from entropy. I suggest checking out the R packages entropy and infotheo.
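A minimal sketch of that alternative with infotheo (assuming d2 is the data frame of profiles used above; mutual information requires discrete data, hence the binning step):
library(infotheo)

dd <- discretize(d2)        # equal-frequency binning of each column
mi <- mutinformation(dd)    # pairwise mutual information matrix, in nats
round(mi, 3)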

Wavelet reconstruction of time series

I'm trying to reconstruct the original time series from a Morlet's wavelet transform. I'm working in R, package Rwave, function cwt. The result of this function is a matrix of n*m (n=period, m=time) containing complex values.
To reconstruct the signal I used formula (11) in Torrence & Compo's classic text, but the result has nothing to do with the original signal. I'm especially concerned about the division of the real part of the wavelet transform by the scale; this step completely distorts the result. On the other hand, if I just sum the real parts over all the scales, the result is quite similar to the original time series, but with slightly wider values (the original series ranges ~ [-0.2, 0.5], the reconstructed series ranges ~ [-0.4, 0.7]).
I'm wondering if someone could point me to some practical procedure, formula or algorithm to reconstruct the original time series. I've already read the papers by Torrence and Compo (1998), Farge (1992) and other books, all with different formulas, but none of them really helped me.
I have been working on this topic recently, using the same paper. Below is code for an example dataset, detailing how I implemented the wavelet decomposition and reconstruction procedure.
# Let's first write a function for the wavelet decomposition, as in formula (1):
mo <- function(t, trans = 0, omega = 6, j = 0) {
  dial <- 2 * 2^(j * .125)
  sqrt(1 / dial) * pi^(-1/4) * exp(1i * omega * ((t - trans) / dial)) *
    exp(-((t - trans) / dial)^2 / 2)
}
# An example time series data:
y<-as.numeric(LakeHuron)
From my experience, for correct reconstruction you should do two things: first, subtract the mean to get a zero-mean dataset; second, increase the maximal scale. I mostly use 110 (although the formula in Torrence and Compo suggests 71).
# Subtract the mean from the data:
y.m <- mean(y)
y.madj <- y - y.m
# Increase the scale:
J <- 110
wt <- matrix(rep(NA, length(y.madj) * (J + 1)), ncol = J + 1)
# Wavelet decomposition:
for (j in 0:J) {
  for (k in 1:length(y.madj)) {
    wt[k, j + 1] <- mo(t = 1:length(y.madj), j = j, trans = k) %*% y.madj
  }
}
# Extract the real part for the reconstruction:
wt.r <- Re(wt)
# Reconstruct as in formula (11):
dial <- 2 * 2^(0:J * .125)
rec <- rep(NA, length(y.madj))
for (l in 1:length(y.madj)) {
  rec[l] <- 0.2144548 * sum(wt.r[l, ] / sqrt(dial))
}
rec <- rec + y.m
plot(y, type = "l")
lines(rec, col = 2)
As you can see in the resulting plot (not reproduced here), it looks like a perfect reconstruction.

Using R's fft function

I'm currently trying to use the fft function in R to transform measured soil temperature at a certain depth, so as to model soil temperatures and heat fluxes at different depths.
I wanted to clarify some points regarding the fft function in R, as I'm currently experiencing problems implementing this procedure.
So I have a df containing the date and time and soil temperatures at 5cm (T5) depth for a period of several months. According to the literature, it is possible to simulate temperatures and heat fluxes at different depths based on a fast Fourier transform of the measured data.
So my first step was naturally DF$FFT <- fft(DF$T5)
from which I receive a series of complex numbers (Cn), i.e. the respective real (an) and imaginary (bn) parts.
According to the literature, I can then recreate the T5 data with a formula based on outputs from the aforementioned fft.
T_{0,t} = \bar{T} + \sum_{n=1}^{M} A_n \sin(n \omega t + \varphi_n)
where T_{0,t} is the temperature at a given time point, \bar{T} is the mean temperature over the period, t is the time, M is the highest harmonic, and
A_n = (2 / \sqrt{N}) |C_n|
|C_n| = the modulus of the complex number of the nth harmonic, i.e. Mod(DF$FFT)
\varphi_n = \arctan(a_n / b_n), i.e. atan(Re(DF$FFT) / Im(DF$FFT))
\omega = 2 \pi / N
Unfortunately, based on the output of fft in R, I cannot recreate the temperature values using the above formula. I realise I can recreate the data using
fft(fft(DF$T5), inverse = TRUE) / length(DF$T5)
However, I need to be able to do it with the above equation so that I can use its terms to model temperatures at other depths. Could anyone point out where I may be going wrong with the procedure described above? For example, the procedure was implemented in a paper where the fft function from Mathcad was used. I am not looking for a quick-fix solution to my problem, and I understand that more data and information would be needed if that were the case. What I am looking for is a bit of guidance on, e.g., any peculiarities of R's fft that I should be aware of.
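For reference, a minimal sketch of how R's fft() output maps onto a harmonic sum of this kind (the built-in nottem series stands in for T5; note that R's coefficient vector starts at index 1 with the zero-frequency term, amplitudes come from Mod(), phases from Arg(), and the natural form is a cosine rather than a sine):
x <- as.numeric(nottem)   # example temperature series shipped with R
N <- length(x)
C <- fft(x)               # complex coefficients; C[1] is the zero-frequency (mean) term
t <- seq_len(N)

recon <- Re(C[1]) / N     # the mean of the series
for (n in 1:floor((N - 1) / 2)) {
  amp   <- 2 * Mod(C[n + 1]) / N   # amplitude of the nth harmonic
  phase <- Arg(C[n + 1])           # phase of the nth harmonic
  recon <- recon + amp * cos(2 * pi * n * (t - 1) / N + phase)
}
if (N %% 2 == 0) {        # Nyquist term when the series length is even
  recon <- recon + Re(C[N / 2 + 1]) * cos(pi * (t - 1)) / N
}

max(abs(recon - x))       # should be numerically ~0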
If anyone could help in any way it would be most appreciated. Also, if anyone needs more information regarding my problem, please do ask.
thanks a lot
Brad

Is it possible to rearrange a time series so that a specific autocorrelation is created?

I have a file containing 2,500 random numbers. Is it possible to rearrange these saved numbers so that a specific autocorrelation is created? Let's say an autocorrelation at lag 1 of 0.2, an autocorrelation at lag 2 of 0.4, and so on.
Any help is greatly appreciated!
To be more specific:
The time series of an asset's daily return in percent has the following characteristics that I am trying to recreate:
Leptokurtic, symmetric distribution, let's say centered at a daily return of zero
No significant autocorrelations (because the sign of a daily return is not predictable)
Significant autocorrelations if the time series is squared
The aim is to produce a random time series that satisfies all three characteristics. The only two inputs should be the leptokurtic distribution (which I have already created) and the specific autocorrelation of the squared resulting time series (e.g. the final squared series should have an autocorrelation at lag 1 of 0.2).
I only know how to produce random numbers from my own mixed distribution. Naturally, if I squared the resulting time series, there would be no autocorrelation. I would like to find a way that takes this into account.
Generally, the most straightforward way to create autocorrelated data is to generate the data so that it's autocorrelated. For example, you could create an autocorrelated path by always using the value at p-1 as the mean for the random draw at time period p.
Rearranging is not only hard, but sort of odd conceptually. What are you really trying to do in the end? Giving some context might allow better answers.
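A minimal sketch of that idea (using exactly the previous value as the mean gives a random walk; the 0.9 coefficient below is an addition to keep the path stationary, i.e. an AR(1)):
set.seed(42)
n <- 2500
x <- numeric(n)
x[1] <- rnorm(1)
for (p in 2:n) {
  x[p] <- rnorm(1, mean = 0.9 * x[p - 1], sd = 1)   # previous value (scaled) as the mean
}
acf(x, lag.max = 10)   # lag-1 autocorrelation will be close to 0.9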
There are functions for simulating autocorrelated data: arima.sim() from the stats package and simulate.Arima() from the forecast package.
simulate.Arima() has the advantages that (1) it can simulate seasonal ARIMA models (sometimes called "SARIMA") and (2) it can simulate a continuation of an existing time series to which you have already fitted an ARIMA model. To use simulate.Arima(), you do need to have an Arima object already.
UPDATE:
Type ?arima.sim and scroll down to "Examples".
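For instance, a minimal arima.sim() sketch (an AR(1) with coefficient 0.7, chosen just for illustration; matching a particular set of target autocorrelations would mean choosing the ARMA coefficients accordingly):
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 2500)
acf(x, lag.max = 5)   # lag-1 autocorrelation around 0.7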
Alternatively:
install.packages("forecast")
library(forecast)
fit <- auto.arima(USAccDeaths)
plot(USAccDeaths, xlim = c(1973, 1982))
lines(simulate(fit, 36), col = "red")
