Wavelet reconstruction of time series - r

I'm trying to reconstruct the original time series from a Morlet's wavelet transform. I'm working in R, package Rwave, function cwt. The result of this function is a matrix of n*m (n=period, m=time) containing complex values.
To reconstruct the signal I used the formula (11) in Torrence & Compo classic text, but the result has nothing to do with the original signal. I'm specially concerned with the division between the real part of the wavelet transform and the scale, this step distorts completely the result. On the other hand, if I just sum the real parts over all the scales, the result is quite similar to the original time series, but with slightly wider values (the original series ranges~ [-0.2, 0.5], the reconstructed series ranges ~ [-0.4,0.7]).
I'm wondering if someone could tell of some practical procedure, formula or algorithm to reconstruct the original time series. I've already read the papers of Torrence and Compo (1998), Farge (1992) and other books, all with different formulas, but no one really help me.

I have been working on this topic currently, using the same paper. I show you code using an example dataset, detailing how I implemented the procedure of wavelet decomposition and reconstruction.
# Lets first write a function for Wavelet decomposition as in formula (1):
mo<-function(t,trans=0,omega=6,j=0){
dial<-2*2^(j*.125)
sqrt((1/dial))*pi^(-1/4)*exp(1i*omega*((t-trans)/dial))*exp(-((t-trans)/dial)^2/2)
}
# An example time series data:
y<-as.numeric(LakeHuron)
From my experience, for correct reconstruction you should do two things: first subject the mean to get a zero-mean dataset. I then increase the maximal scale. I mostly use 110 (although the formula in the Torrence and Compo suggests 71)
# subtract mean from data:
y.m<-mean(y)
y.madj<-y-y.m
# increase the scale:
J<-110
wt<-matrix(rep(NA,(length(y.madj))*(J+1)),ncol=(J+1))
# Wavelet decomposition:
for(j in 0:J){
for(k in 1:length(y.madj)){
wt[k,j+1]<-mo(t=1:(length(y.madj)),j=j,trans=k)%*%y.madj
}
}
#Extract the real part for the reconstruction:
wt.r<-Re(wt)
# Reconstruct as in formula (11):
dial<-2*2^(0:J*.125)
rec<-rep(NA,(length(y.madj)))
for(l in 1:(length(y.madj))){
rec[l]<-0.2144548*sum(wt.r[l,]/sqrt(dial))
}
rec<-rec+y.m
plot(y,type="l")
lines(rec,col=2)
As you can see in the plot, it looks like a perfect reconstruction:

Related

Low-pass fltering of a matrix

I'm trying to write a low-pass filter in R, to clean a "dirty" data matrix.
I did a google search, came up with a dazzling range of packages. Some apply to 1D signals (time series mostly, e.g. How do I run a high pass or low pass filter on data points in R? ); some apply to images. However I'm trying to filter a plain R data matrix. The image filters are the closest equivalent, but I'm a bit reluctant to go this way as they typically involve (i) installation of more or less complex/heavy solutions (imageMagick...), and/or (ii) conversion from matrix to image.
Here is sample data:
r<-seq(0:360)/360*(2*pi)
x<-cos(r)
y<-sin(r)
z<-outer(x,y,"*")
noise<-0.3*matrix(runif(length(x)*length(y)),nrow=length(x))
zz<-z+noise
image(zz)
What I'm looking for is a filter that will return a "cleaned" matrix (i.e. something close to z, in this case).
I'm aware this is a rather open-ended question, and I'm also happy with pointers ("have you looked at package so-and-so"), although of course I'd value sample code from users with experience on signal processing !
Thanks.
One option may be using a non-linear prediction method and getting the fitted values from the model.
For example by using a polynomial regression, we can predict the original data as the purple one,
By following the same logic, you can do the same thing to all columns of the zz matrix as,
predictions <- matrix(, nrow = 361, ncol = 0)
for(i in 1:ncol(zz)) {
pred <- as.matrix(fitted(lm(zz[,i]~poly(1:nrow(zz),2,raw=TRUE))))
predictions <- cbind(predictions,pred)
}
Then you can plot the predictions,
par(mfrow=c(1,3))
image(z,main="Original")
image(zz,main="Noisy")
image(predictions,main="Predicted")
Note that, I used a polynomial regression with degree 2, you can change the degree for a better fitting across the columns. Or maybe, you can use some other powerful non-linear prediction methods (maybe SVM, ANN etc.) to get a more accurate model.

Discrepancy in Cubic Spline Interpolation, R & matlab

I am trying to replicate the spline() function in matlab using the spline() function in R's splinefun {stat}s package, without having full access to matlab (I don't have a licence for it). I am able to input all of the necessary data into R that would be present in matlab, but my spline output is different than matlab's by an average of .0036 (maxdif is .0342, mindif is -.0056, stdev is .0094). My main question is, how does matlab's formula compare to R's, and is that where my calculation discrepancy might come from?
The first part of my code is feeding the excel spreadsheet into R, then calculating the necessary variables to get tau and quick delta. After this, I run the spline calculation and then rotate the output for the purposes of exporting back into excel. Below is the essential script, plus some data to try out to see if there is something flawed in my calculation. I use spline(natural), as it returns the closest values to matlab's model.
#establishing what tau is for quick Delta calculation
today<-Sys.Date()
month<-as.Date(5/1/2016)
difday<-difftime(month,today,units=c("days"))
Tau<-as.numeric((month-today)/365)
Pu<-as.numeric(1.94)
Vol<-as.numeric(.4261)
#Pf is the representation of my fixed strike prices, the points used for interpolation
Pf<-c(Pu-.3,Pu-.25,Pu-.2,Pu-.1,Pu,Pu+.1,Pu+.2,Pu+.25,Pu+.3)
qDtable<-data.frame(matrix(ncol=length(Pf),nrow=length(month)))
colnames(qDtable)<-c(Pf)
rownames(qDtable)<-format.Date(month)
#my quick Delta calculation & table as a result
qD<-data.frame(pnorm(log(Pf/Pu)/(Vol*sqrt(Tau))))
Qd<-t(qD[1:24,1])
qDtable[1,]=c(Qd)
#setting up for spline interpolation
qDpoint<-as.numeric(qDtable[1,1:24])
ncsibyPf<-data.frame(matrix(ncol=length(Pf),nrow=length(month)))
colnames(ncsibyPf)<-Pf
rownames(ncsibyPf)<-format.Date(month)
qDvol<-data.frame(matrix(ncol=14,nrow=2)
colnames(qDvol)<-c("",0,.05,.1,.2,.3,.4,.5,.6,.7,.8,.9,.95,1)
rownames(qDvol)<-format.Date(month)
qDvol[2,2:14]<-c(.59612,.51112,.46112,.45612,.44612,.42612,.42612,.42612,.42612,.42612,.42612,.42612,.42612)
#x is the quick Vol point
x<-as.numeric(qDvol[1,2:14])
#y is the vol at the quick Vol point
y<-as.numeric(qDvol[2,2:14])
ncsivol<-data.frame(spline(x,y,xout=qDpoint,method="natural"))
nroutput<-t(ncsivol[1:24,2])
ncsibyPf[1,]=c(nroutput)
The essential data points for this spline run are all included (I think), and everything should line up correctly. Thank you for your help ahead of time!

Cross-correlation of 5 time series (distance) and interpretation

I would appreciate some input in this a lot!
I have data for 5 time series (an example of 1 step in the series is in the plot below), where each step in the series is a vertical profile of species sightings in the ocean which were investigated 6h apart. All 5 steps are spaced vertically by 0.1m (and the 6h in time).
What I want to do is calculate the multivariate cross-correlation between all series in order to find out at which lag the profiles are most correlated and stable over time.
Profile example:
I find the documentation in R on that not so great, so what I did so far is use the package MTS with the ccm function to create cross correlation matrices. However, the interpretation of the figures is rather difficult with sparse documentation. I would appreciate some help with that a lot.
Data example:
http://pastebin.com/embed_iframe.php?i=8gdAeGP4
Save in file cross_correlation_stack.csv or change as you wish.
library(dplyr)
library(MTS)
library(data.table)
d1 <- file.path('cross_correlation_stack.csv')
d2 = read.csv(d1)
# USING package MTS
mod1<-ccm(d2,lag=1000,level=T)
#USING base R
acf(d2,lag.max=1000)
# MQ plot also from MTS package
mq(d2,lag=1000)
Which produces this (the ccm command):
This:
and this:
In parallel, the acf command from above produces this:
My question now is if somebody can give some input in whether I am going in the right direction or are there better suited packages and commands?
Since the default figures don't get any titles etc. What am I looking at, specifically in the ccm figures?
The ACF command was proposed somewhere, but can I use it here? In it's documentation it says ... calculates autocovariance or autocorrelation... I assume this is not what I want. But then again it's the only command that seems to work multivariate. I am confused.
The plot with the significance values shows that after a lag of 150 (15 meters) the p values increase. How would you interpret that regarding my data? 0.1 intervals of species sightings and many lags up to 100-150 are significant? Would that mean something like that peaks in sightings are stable over the 5 time-steps on a scale of 150 lags aka 15 meters?
In either way it would be nice if somebody who worked with this before can explain what I am looking at! Any input is highly appreciated!
You can use the base R function ccf(), which will estimate the cross-correlation function between any two variables x and y. However, it only works on vectors, so you'll have to loop over the columns in d1. Something like:
cc <- vector("list",choose(dim(d1)[2],2))
par(mfrow=c(ceiling(choose(dim(d1)[2],2)/2),2))
cnt <- 1
for(i in 1:(dim(d1)[2]-1)) {
for(j in (i+1):dim(d1)[2]) {
cc[[cnt]] <- ccf(d1[,i],d1[,j],main=paste0("Cross-correlation of ",colnames(d1)[i]," with ",colnames(d1)[j]))
cnt <- cnt + 1
}
}
This will plot each of the estimated CCF's and store the estimates in the list cc. It is important to remember that the lag-k value returned by ccf(x,y) is an estimate of the correlation between x[t+k] and y[t].
All of that said, however, the ccf is only defined for data that are more-or-less normally distributed, but your data are clearly overdispersed with all of those zeroes. Therefore, lacking some adequate transformation, you should really look into other metrics of "association" such as the mutual information as estimated from entropy. I suggest checking out the R packages entropy and infotheo.

Trying to do a simulation in R

I'm pretty new to R, so I hope you can help me!
I'm trying to do a simulation for my Bachelor's thesis, where I want to simulate how a stock evolves.
I've done the simulation in Excel, but the problem is that I can't make that large of a simulation, as the program crashes! Therefore I'm trying in R.
The stock evolves as follows (everything except $\epsilon$ consists of constants which are known):
$$W_{t+\Delta t} = W_t exp^{r \Delta t}(1+\pi(exp((\sigma \lambda -0.5\sigma^2) \Delta t+\sigma \epsilon_{t+\Delta t} \sqrt{\Delta t}-1))$$
The only thing here which is stochastic is $\epsilon$, which is represented by a Brownian motion with N(0,1).
What I've done in Excel:
Made 100 samples with a size of 40. All these samples are standard normal distributed: N(0,1).
Then these outcomes are used to calculate how the stock is affected from these (the normal distribution represent the shocks from the economy).
My problem in R:
I've used the sample function:
x <- sample(norm(0,1), 1000, T)
So I have 1000 samples, which are normally distributed. Now I don't know how to put these results into the formula I have for the evolution of my stock. Can anyone help?
Using R for (discrete) simulation
There are two aspects to your question: conceptual and coding.
Let's deal with the conceptual first, starting with the meaning of your equation:
1. Conceptual issues
The first thing to note is that your evolution equation is continuous in time, so running your simulation as described above means accepting a discretisation of the problem. Whether or not that is appropriate depends on your model and how you have obtained the evolution equation.
If you do run a discrete simulation, then the key decision you have to make is what stepsize $\Delta t$ you will use. You can explore different step-sizes to observe the effect of step-size, or you can proceed analytically and attempt to derive an appropriate step-size.
Once you have your step-size, your simulation consists of pulling new shocks (samples of your standard normal distribution), and evolving the equation iteratively until the desired time has elapsed. The final state $W_t$ is then available for you to analyse however you wish. (If you retain all of the $W_t$, you have a distribution of the trajectory of the system as well, which you can analyse.)
So:
your $x$ are a sampled distribution of your shocks, i.e. they are $\epsilon_t=0$.
To simulate the evolution of the $W_t$, you will need some initial condition $W_0$. What this is depends on what you're modelling. If you're modelling the likely values of a single stock starting at an initial price $W_0$, then your initial state is a 1000 element vector with constant value.
Now evaluate your equation, plugging in all your constants, $W_0$, and your initial shocks $\epsilon_0 = x$ to get the distribution of prices $W_1$.
Repeat: sample $x$ again -- this is now $\epsilon_1$. Plugging this in, gives you $W_2$ etc.
2. Coding the simulation (simple example)
One of the useful features of R is that most operators work element-wise over vectors.
So you can pretty much type in your equation more or less as it is.
I've made a few assumptions about the parameters in your equation, and I've ignored the $\pi$ function -- you can add that in later.
So you end up with code that looks something like this:
dt <- 0.5 # step-size
r <- 1 # parameters
lambda <- 1
sigma <- 1 # std deviation
w0 <- rep(1,1000) # presumed initial condition -- prices start at 1
# Show an example iteration -- incorporate into one line for production code...
x <- rnorm(1000,mean=0,sd=1) # random shock
w1 <- w0*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*x*sqrt(dt) -1)) # evolution
When you're ready to let the simulation run, then merge the last two lines, i.e. include the sampling statement in the evolution statement. You then get one line of code which you can run manually or embed into a loop, along with any other analysis you want to run.
# General simulation step
w <- w*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*rnorm(1000,mean=0,sd=1)*sqrt(dt) -1))
You can also easily visualise the changes and obtain summary statistics (5-number summary):
hist(w)
summary(w)
Of course, you'll still need to work through the details of what you actually want to model and how you want to go about analysing it --- and you've got the $\pi$ function to deal with --- but this should get you started toward using R for discrete simulation.

Using Rs fft function

I'm currently trying to use the fft function in R to transform measured soil temperature at a certain depths so as to model soil temperatures and heat fluxes at different depths.
I wanted to clarify some points regarding the fft function in R as i'm currently experiencing problems implementing this procedure.
So I have a df containing the date and time and soil temperatures at 5cm (T5) depth for a period of several months. According to the literature, it is possible to simulate temperatures and heat fluxes at different depths based on a fast Fourier transform of the measured data.
So my first step was naturally DF$FFT = fft (DF$T5)
From which I receive a series of complex numbers (Cn) i.e. the respective real (an) and imaginary (bn) numbers.
According to the literature, I can then recreate the T5 data with a formula based on outputs from the aforementioned fft.
*T_(0,t )= meanT + ∑ (An sin⁡〖nωt+φ〗) ̅
NB the summed term is summed between n=1 and M, the highest harmonic
where T o,t is the temperature at given time point, mean Temperature over the period, t is the time and...
An = (2/sqrt(N))*|Cn|
|Cn| = modulus of the complex number of the nth harmonic Mod (DF$FFT)
phi = arctan (an/bn) i.e. arctan (Re(DF$FFT)/Im(DF$FFT)
omega = (2*pi/N)
Unfortunately based on the output of the fft in R i cannot recreate the temperature values using the above formula. I realise i can recreate the data using
fft (fft(DF$T5), inverse = T)/length (DF$T5)
However i need to be able to do it with the above equation so as to use the terms from this equation to model temperatures at other depths. Could anyone lend a hand in where i may be going wrong with the procedure i have described above. For example the above procedure was implemented in paper where the fft function from Mathcad was used! I am not looking here for a quick fix solution to my problem, so i understand that more data and info would be handy if that were the case. What i am looking for though is a bit of guidance with e.g. any peculiarities of the R fft that i should be aware of.
If anyone could help in any way possible it would be most appreciated. Also if anyone needs more info regarding my problem please do ask
thanks a lot
Brad

Resources