Discrepancy in Cubic Spline Interpolation, R & MATLAB

I am trying to replicate MATLAB's spline() function using the spline() function in R's stats package (documented under splinefun {stats}), without having full access to MATLAB (I don't have a licence for it). I am able to input all of the necessary data into R that would be present in MATLAB, but my spline output differs from MATLAB's by an average of .0036 (max difference .0342, min difference -.0056, standard deviation .0094). My main question is: how does MATLAB's formula compare to R's, and is that where my discrepancy might come from?
The first part of my code feeds the Excel spreadsheet into R, then calculates the variables needed to get tau and quick delta. After this, I run the spline calculation and then transpose the output for export back into Excel. Below is the essential script, plus some data to try out to see if there is something flawed in my calculation. I use spline() with method="natural", as it returns the closest values to MATLAB's model.
#establishing what tau is for quick Delta calculation
today<-Sys.Date()
#expiry date as an ISO string; it must lie after today for Tau to be positive
month<-as.Date("2016-05-01")
difday<-difftime(month,today,units=c("days"))
Tau<-as.numeric((month-today)/365)
Pu<-as.numeric(1.94)
Vol<-as.numeric(.4261)
#Pf is the representation of my fixed strike prices, the points used for interpolation
Pf<-c(Pu-.3,Pu-.25,Pu-.2,Pu-.1,Pu,Pu+.1,Pu+.2,Pu+.25,Pu+.3)
qDtable<-data.frame(matrix(ncol=length(Pf),nrow=length(month)))
colnames(qDtable)<-c(Pf)
rownames(qDtable)<-format.Date(month)
#my quick Delta calculation & table as a result
qD<-data.frame(pnorm(log(Pf/Pu)/(Vol*sqrt(Tau))))
#Pf holds 9 strikes here, so index by its length rather than a fixed 1:24
Qd<-t(qD[seq_along(Pf),1])
qDtable[1,]=c(Qd)
#setting up for spline interpolation
qDpoint<-as.numeric(qDtable[1,seq_along(Pf)])
ncsibyPf<-data.frame(matrix(ncol=length(Pf),nrow=length(month)))
colnames(ncsibyPf)<-Pf
rownames(ncsibyPf)<-format.Date(month)
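qDvol<-data.frame(matrix(ncol=14,nrow=2))
colnames(qDvol)<-c("",0,.05,.1,.2,.3,.4,.5,.6,.7,.8,.9,.95,1)
rownames(qDvol)<-c("qD",format.Date(month))
#row 1 is assumed to hold the quick delta grid (same values as the headers), row 2 the vol at each grid point
qDvol[1,2:14]<-c(0,.05,.1,.2,.3,.4,.5,.6,.7,.8,.9,.95,1)
qDvol[2,2:14]<-c(.59612,.51112,.46112,.45612,.44612,.42612,.42612,.42612,.42612,.42612,.42612,.42612,.42612)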
#x is the quick Vol point
x<-as.numeric(qDvol[1,2:14])
#y is the vol at the quick Vol point
y<-as.numeric(qDvol[2,2:14])
ncsivol<-data.frame(spline(x,y,xout=qDpoint,method="natural"))
nroutput<-t(ncsivol[seq_along(Pf),2])
ncsibyPf[1,]=c(nroutput)
The essential data points for this spline run are all included (I think), and everything should line up correctly. Thank you for your help ahead of time!
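One likely source of the gap is the end conditions rather than the cubic formula itself. MATLAB's spline() uses not-a-knot end conditions by default, whereas method="natural" forces the second derivative to zero at both ends and method="fmm" fits an exact cubic through the four points at each end. The following is a minimal sketch, using the knots and vols from the question and a hypothetical evaluation grid xout, that shows how far two of R's own end conditions drift apart on this data:

```r
# Knots and values taken from the question's qDvol table
x <- c(0, .05, .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, 1)
y <- c(.59612, .51112, .46112, .45612, .44612, rep(.42612, 8))

# Hypothetical evaluation grid (not from the question)
xout <- seq(0, 1, by = 0.025)

nat <- spline(x, y, xout = xout, method = "natural")$y
fmm <- spline(x, y, xout = xout, method = "fmm")$y

# Largest disagreement between the two end conditions
max_gap <- max(abs(nat - fmm))

# Both variants interpolate, so they reproduce y at the knots
knot_gap <- max(abs(spline(x, y, xout = x, method = "natural")$y - y))

print(c(max_gap = max_gap, knot_gap = knot_gap))
```

If max_gap is of the same order as the differences seen against MATLAB, the end conditions are a plausible explanation; none of R's built-in methods is documented as an exact not-a-knot match.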

Related

Converting R script to SAS

I want to add noise to a dataset. This is a fairly straightforward procedure in R. I sample from a Laplace distribution and then add/multiply/whatever that vector to the vector I want to add noise to.
The issue is, my colleague is asking for the code in SAS. I have not used SAS since graduate school and my project has been put on hold until I can get my colleague up to speed in SAS.
My code is pretty simple :
library ("rmutil")
vector <- c (1,2,3,1,2,3,1,2,3)
vector_prop <- vector/sum(vector)
noise <- rlaplace(9, m=1, s=.1)
new_vector <- vector_prop * noise
I am turning my vector I want to add noise to into a proportion, then drawing from a laplace distribution. Finally I multiply those draws with my proportion vector.
Any idea would be helpful as the SAS documentation was difficult to follow. I imagine they feel the same way with R documentation.
Assuming your data is in a data set called have with a variable called vector_prop, the following code should work. Because of the nature of random number streams you can't replicate the R draws exactly, though; don't you end up with a different data set each time anyway?
data want;
set have;
call streaminit(24); *fixes random number stream for reproducibility;
new_var = vector_prop * rand('laplace', 1, 0.1);
run;
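For comparison on the R side, here is a minimal sketch of the same procedure with a fixed seed and no dependency on rmutil. The helper rlaplace_base is a hypothetical name; it draws Laplace variates by inverting the CDF, with m as location and s as scale.

```r
# Hypothetical helper: Laplace draws via the inverse CDF
rlaplace_base <- function(n, m = 0, s = 1) {
  u <- runif(n) - 0.5                     # uniform on (-0.5, 0.5)
  m - s * sign(u) * log(1 - 2 * abs(u))   # inverse of the Laplace CDF
}

set.seed(24)  # fixes the R random stream (unrelated to the SAS stream)
vector <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
vector_prop <- vector / sum(vector)
noise <- rlaplace_base(length(vector_prop), m = 1, s = 0.1)
new_vector <- vector_prop * noise
```

Seeding makes each program reproducible on its own, but the R and SAS generators differ, so the two outputs will agree only in distribution.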

Trying to do a simulation in R

I'm pretty new to R, so I hope you can help me!
I'm trying to do a simulation for my Bachelor's thesis, where I want to simulate how a stock evolves.
I've done the simulation in Excel, but the problem is that I can't make that large of a simulation, as the program crashes! Therefore I'm trying in R.
The stock evolves as follows (everything except $\epsilon$ consists of constants which are known):
$$W_{t+\Delta t} = W_t \exp(r \Delta t)\left(1+\pi\left(\exp\left((\sigma \lambda -0.5\sigma^2) \Delta t+\sigma \epsilon_{t+\Delta t} \sqrt{\Delta t}\right)-1\right)\right)$$
The only thing here which is stochastic is $\epsilon$, which is represented by a Brownian motion with N(0,1).
What I've done in Excel:
Made 100 samples with a size of 40. All these samples are standard normal distributed: N(0,1).
Then these outcomes are used to calculate how the stock is affected from these (the normal distribution represent the shocks from the economy).
My problem in R:
I've used the sample function:
x <- sample(norm(0,1), 1000, T)
So I have 1000 samples, which are normally distributed. Now I don't know how to put these results into the formula I have for the evolution of my stock. Can anyone help?
Using R for (discrete) simulation
There are two aspects to your question: conceptual and coding.
Let's deal with the conceptual first, starting with the meaning of your equation:
1. Conceptual issues
The first thing to note is that your evolution equation is continuous in time, so running your simulation as described above means accepting a discretisation of the problem. Whether or not that is appropriate depends on your model and how you have obtained the evolution equation.
If you do run a discrete simulation, then the key decision you have to make is what stepsize $\Delta t$ you will use. You can explore different step-sizes to observe the effect of step-size, or you can proceed analytically and attempt to derive an appropriate step-size.
Once you have your step-size, your simulation consists of pulling new shocks (samples of your standard normal distribution), and evolving the equation iteratively until the desired time has elapsed. The final state $W_t$ is then available for you to analyse however you wish. (If you retain all of the $W_t$, you have a distribution of the trajectory of the system as well, which you can analyse.)
So:
Your $x$ are a sample from the distribution of your shocks, i.e. they are $\epsilon_{t=0}$.
To simulate the evolution of the $W_t$, you will need some initial condition $W_0$. What this is depends on what you're modelling. If you're modelling the likely values of a single stock starting at an initial price $W_0$, then your initial state is a 1000 element vector with constant value.
Now evaluate your equation, plugging in all your constants, $W_0$, and your initial shocks $\epsilon_0 = x$ to get the distribution of prices $W_1$.
Repeat: sample $x$ again -- this is now $\epsilon_1$. Plugging this in, gives you $W_2$ etc.
2. Coding the simulation (simple example)
One of the useful features of R is that most operators work element-wise over vectors.
So you can pretty much type in your equation more or less as it is.
I've made a few assumptions about the parameters in your equation, and I've ignored the $\pi$ function -- you can add that in later.
So you end up with code that looks something like this:
dt <- 0.5 # step-size
r <- 1 # parameters
lambda <- 1
sigma <- 1 # std deviation
w0 <- rep(1,1000) # presumed initial condition -- prices start at 1
# Show an example iteration -- incorporate into one line for production code...
x <- rnorm(1000,mean=0,sd=1) # random shock
w1 <- w0*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*x*sqrt(dt) -1)) # evolution
When you're ready to let the simulation run, then merge the last two lines, i.e. include the sampling statement in the evolution statement. You then get one line of code which you can run manually or embed into a loop, along with any other analysis you want to run.
# General simulation step
w <- w*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*rnorm(1000,mean=0,sd=1)*sqrt(dt) -1))
You can also easily visualise the changes and obtain summary statistics (5-number summary):
hist(w)
summary(w)
Of course, you'll still need to work through the details of what you actually want to model and how you want to go about analysing it --- and you've got the $\pi$ function to deal with --- but this should get you started toward using R for discrete simulation.
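Putting the steps together, the loop below is a minimal sketch that keeps every intermediate $W_t$. It reads $\pi$ as a constant weight with the $-1$ outside the exponential, which is an assumption about the question's formula; the snippet above instead places the $-1$ inside the exponential and omits $\pi$. All parameter values and the name pi_w are hypothetical.

```r
set.seed(1)
n_paths <- 1000   # number of simulated trajectories
n_steps <- 40     # number of time steps per trajectory
dt      <- 0.5    # step-size (hypothetical)
r       <- 0.02   # hypothetical parameters
lambda  <- 0.3
sigma   <- 0.2
pi_w    <- 0.5    # assumed constant weight standing in for pi

# One row per time point, one column per path; prices start at 1
w <- matrix(NA_real_, nrow = n_steps + 1, ncol = n_paths)
w[1, ] <- 1

for (i in seq_len(n_steps)) {
  eps <- rnorm(n_paths)  # fresh N(0,1) shocks for this step
  growth <- exp((sigma * lambda - 0.5 * sigma^2) * dt + sigma * eps * sqrt(dt))
  w[i + 1, ] <- w[i, ] * exp(r * dt) * (1 + pi_w * (growth - 1))
}

final <- w[n_steps + 1, ]
summary(final)
```

Storing the whole matrix costs little at this size and gives the distribution at every time point, not just the final one.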

Wavelet reconstruction of time series

I'm trying to reconstruct the original time series from a Morlet's wavelet transform. I'm working in R, package Rwave, function cwt. The result of this function is a matrix of n*m (n=period, m=time) containing complex values.
To reconstruct the signal I used formula (11) in Torrence & Compo's classic text, but the result has nothing to do with the original signal. I'm especially concerned about the division of the real part of the wavelet transform by the scale; this step completely distorts the result. On the other hand, if I just sum the real parts over all the scales, the result is quite similar to the original time series, but with a slightly wider range (the original series ranges over ~[-0.2, 0.5], the reconstructed series over ~[-0.4, 0.7]).
I'm wondering if someone could point me to a practical procedure, formula or algorithm to reconstruct the original time series. I've already read the papers of Torrence and Compo (1998), Farge (1992) and other books, all with different formulas, but none really helped me.
I have been working on this topic recently, using the same paper. Below is code on an example dataset, detailing how I implemented the wavelet decomposition and reconstruction.
# Let's first write a function for wavelet decomposition as in formula (1):
mo<-function(t,trans=0,omega=6,j=0){
dial<-2*2^(j*.125)
sqrt((1/dial))*pi^(-1/4)*exp(1i*omega*((t-trans)/dial))*exp(-((t-trans)/dial)^2/2)
}
# An example time series data:
y<-as.numeric(LakeHuron)
From my experience, for correct reconstruction you should do two things: first, subtract the mean to get a zero-mean dataset; second, increase the maximal scale. I mostly use 110 (although the formula in Torrence and Compo suggests 71).
# subtract mean from data:
y.m<-mean(y)
y.madj<-y-y.m
# increase the scale:
J<-110
wt<-matrix(rep(NA,(length(y.madj))*(J+1)),ncol=(J+1))
# Wavelet decomposition:
for(j in 0:J){
for(k in 1:length(y.madj)){
wt[k,j+1]<-mo(t=1:(length(y.madj)),j=j,trans=k)%*%y.madj
}
}
#Extract the real part for the reconstruction:
wt.r<-Re(wt)
# Reconstruct as in formula (11):
dial<-2*2^(0:J*.125)
rec<-rep(NA,(length(y.madj)))
for(l in 1:(length(y.madj))){
rec[l]<-0.2144548*sum(wt.r[l,]/sqrt(dial))
}
rec<-rec+y.m
plot(y,type="l")
lines(rec,col=2)
As you can see in the plot, it looks like a perfect reconstruction:
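The constant 0.2144548 in the reconstruction step appears to be the prefactor of Torrence & Compo's formula (11), $\delta j \, \delta t^{1/2} / (C_\delta \, \psi_0(0))$. The sketch below recomputes it under these assumptions: $\delta j = 0.125$ (the scale spacing used in the code above), $\delta t = 1$ (annual data), $C_\delta = 0.776$ (the paper's tabulated value for the Morlet wavelet with $\omega_0 = 6$) and $\psi_0(0) = \pi^{-1/4}$.

```r
dj      <- 0.125      # scale spacing, as in 2^(j*.125)
dt      <- 1          # sampling interval (one year for LakeHuron)
C_delta <- 0.776      # reconstruction factor for Morlet, omega0 = 6
psi0    <- pi^(-1/4)  # Morlet wavelet evaluated at zero

prefactor <- dj * sqrt(dt) / (C_delta * psi0)
print(prefactor)
```

With a different sampling interval or scale spacing the prefactor changes accordingly, so the hard-coded value should not be reused for other setups.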

analytical derivative of splinefun()

I'm trying to fit a natural cubic spline to probabilistic data (probabilities that a random variable is smaller than certain values) to obtain a cumulative distribution function, which works well enough using splinefun():
cutoffs <- c(-90,-60,-30,0,30,60,90,120)
probs <- c(0,0,0.05,0.25,0.5,0.75,0.9,1)
CDF.spline <- splinefun(cutoffs,probs, method="natural")
plot(cutoffs,probs)
curve(CDF.spline(x), add=TRUE, col=2, n=1001)
I would then, however, like to use the density function, i.e. the derivative of the spline, to perform various calculations (e.g. to obtain the expected value of the random variable).
Is there any way of obtaining this derivative as a function rather than just evaluated at a discrete number of points via CDF.spline(x, deriv=1)?
This is pretty close to what I'm looking for, but alas the example doesn't seem to work in R version 2.15.0.
Barring an analytical solution, what's the cleanest numerical way of going about this?
If you change the environment assignment line for g in the code that Berwin Turlach provided on R-help to this:
environment(g) <- environment(f)
... you succeed in R 2.15.1.
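As a simpler numerical route, the function returned by splinefun() already accepts a deriv argument, so wrapping it gives the density as an ordinary function that integrate() can use. A minimal sketch with the data from the question (PDF.spline is a hypothetical name):

```r
cutoffs <- c(-90, -60, -30, 0, 30, 60, 90, 120)
probs   <- c(0, 0, 0.05, 0.25, 0.5, 0.75, 0.9, 1)
CDF.spline <- splinefun(cutoffs, probs, method = "natural")

# Density as a function: first derivative of the fitted spline
PDF.spline <- function(x) CDF.spline(x, deriv = 1)

# Total probability mass over the support (should be close to 1)
total <- integrate(PDF.spline, -90, 120)$value

# Expected value of the random variable
ev <- integrate(function(x) x * PDF.spline(x), -90, 120)$value

print(c(total = total, expected_value = ev))
```

A natural spline is not guaranteed to be monotone, so this density can dip below zero in places; splinefun's method="monoH.FC" or method="hyman" may be worth trying if that matters.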

approximation methods

I attached image:
(source: piccy.info)
The image shows the graph of a function that is defined at given points, for example at x=1..N.
The semitransparent curve drawn over it is what I want to obtain from the original graph, i.e. I want to approximate the original function so that it becomes smooth.
Are there any methods for doing that?
I heard about the least squares method, which can be used to approximate a function by a straight line or a parabola. But I do not need a parabolic approximation; I probably need to approximate it by a trigonometric function.
So are there any methods for doing that?
And is it possible to use the least squares method for this problem if we can derive it for trigonometric functions?
One more question!
If I use the discrete Fourier transform and think of the function as a sum of waves, then maybe the noise has special features by which we can identify it, so that we can set the corresponding frequencies to zero and then perform the inverse Fourier transform.
If you think that is possible, what can you suggest for identifying the frequencies of the noise?
Unfortunately, many of the solutions presented here don't solve the problem and/or are plain wrong.
There are many approaches, and each is built for specific conditions and requirements that you must be aware of!
a) Approximation theory: If you have a sharply defined function without errors (given by either definition or data) and you want to trace it as exactly as possible, you use polynomial or rational approximation by Chebyshev or Legendre polynomials, meaning that you approximate the function by a polynomial or, if it is periodic, by a Fourier series.
b) Interpolation: If you have a function where some points (but not the whole curve!) are given and you need a function that passes through these points, you can use several methods: Newton-Gregory, Newton with divided differences, Lagrange, Hermite, spline.
c) Curve fitting: You have a function with given points and you want to draw a curve of a given (!) functional form which approximates the points as closely as possible. There are linear and nonlinear algorithms for this case.
Your drawing implies:
It is not remotely like a mathematical function.
It is not sharply defined by data or function.
You need to fit the curve, not some points.
What you want and need is
d) Smoothing: Given a curve or data points with noise or rapidly changing elements, you only want to see the slow changes over time.
You can do that with LOESS as Jacob suggested (but I find that overkill, especially because choosing a reasonable span needs some experience). For your problem, I simply recommend the running average as suggested by Jim C.
http://en.wikipedia.org/wiki/Running_average
Sorry, cdonner and Orendorff, your proposals are well-meant, but completely wrong because you are using the right tools for the wrong problem.
These guys used a sixth-degree polynomial to fit climate data and embarrassed themselves completely.
http://scienceblogs.com/deltoid/2009/01/the_australians_war_on_science_32.php
http://network.nationalpost.com/np/blogs/fullcomment/archive/2008/10/20/lorne-gunter-thirty-years-of-warmer-temperatures-go-poof.aspx
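The running average recommended above can be sketched in a few lines of R with stats::filter(). The data here is a made-up noisy sine curve, and the window width k is a hypothetical choice that would need tuning for real data.

```r
set.seed(42)
x <- 1:120
y <- sin(2 * pi * x / 120) + runif(length(x), -1, 1)  # made-up noisy data

k <- 11  # hypothetical window width; odd, so the window is centred

# Centred moving average: each output is the mean of k neighbouring points
y.smooth <- stats::filter(y, rep(1 / k, k), sides = 2)

# The first and last (k - 1) / 2 values are NA because the window is incomplete
print(sum(is.na(y.smooth)))
```

Calling plot(x, y) followed by lines(x, y.smooth, col = 2) overlays the smoothed curve on the raw points; a wider window gives a smoother but more lagged and flattened curve.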
Use loess in R (free).
E.g. here the loess function approximates a noisy sine curve.
(source: stowers-institute.org)
As you can see you can tweak the smoothness of your curve with span
Here's some sample R code from here:
Step-by-Step Procedure
Let's take a sine curve, add some "noise" to it, and then see how the loess "span" parameter affects the look of the smoothed curve.
Create a sine curve and add some noise:
period <- 120
x <- 1:120
y <- sin(2*pi*x/period) + runif(length(x),-1,1)
Plot the points on this noisy sine curve:
plot(x,y, main="Sine Curve + 'Uniform' Noise")
mtext("showing loess smoothing (local regression smoothing)")
Apply loess smoothing using the default span value of 0.75:
y.loess <- loess(y ~ x, span=0.75, data.frame(x=x, y=y))
Compute loess smoothed values for all points along the curve:
y.predict <- predict(y.loess, data.frame(x=x))
Plot the loess smoothed curve along with the points that were already plotted:
lines(x,y.predict)
You could use a digital filter like an FIR filter. The simplest FIR filter is just a running average. For a more sophisticated treatment, look at something like an FFT.
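The FFT route, which the question also asks about, amounts to a crude low-pass filter: transform, zero the high-frequency coefficients, and transform back. The sketch below uses a made-up noisy sine curve, and the cutoff keep is a hypothetical choice; in practice one would inspect Mod(fft(y)) to see where the signal ends and the noise floor begins.

```r
set.seed(7)
n <- 120
x <- 1:n
y <- sin(2 * pi * x / n) + runif(n, -1, 1)  # made-up noisy data

Y <- fft(y)

keep <- 3  # hypothetical cutoff: keep the mean and the 3 lowest frequencies

# Index 1 is the mean; frequency k sits at index k + 1 and its
# conjugate partner at index n - k + 1, so both must be retained
idx <- c(1:(keep + 1), (n - keep + 1):n)
Y[-idx] <- 0

# R's inverse FFT is unnormalised, hence the division by n
y.low <- Re(fft(Y, inverse = TRUE)) / n
```

Keeping conjugate pairs together is what makes the inverse transform real-valued; zeroing only one half of a pair leaves a spurious imaginary part.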
This is called curve fitting. The best way to do this is to find a numeric library that can do it for you. Here is a page showing how to do this using scipy. The picture on that page shows what the code does:
(source: scipy.org)
Now it's only 4 lines of code, but the author doesn't explain it at all. I'll try to explain briefly here.
First you have to decide what form you want the answer to be. In this example the author wants a curve of the form
f(x) = p0 cos (2π/p1 x + p2) + p3 x
You might instead want the sum of several curves. That's OK; the formula is an input to the solver.
The goal of the example, then, is to find the constants p0 through p3 to complete the formula. scipy can find this array of four constants. All you need is an error function that scipy can use to see how close its guesses are to the actual sampled data points.
fitfunc = lambda p, x: p[0]*cos(2*pi/p[1]*x+p[2]) + p[3]*x # Target function
errfunc = lambda p: fitfunc(p, Tx) - tX # Distance to the target function
errfunc takes just one parameter: an array of length 4. It plugs those constants into the formula and calculates an array of values on the candidate curve, then subtracts the array of sampled data points tX. The result is an array of error values; presumably scipy will take the sum of the squares of these values.
Then just put some initial guesses in and scipy.optimize.leastsq crunches the numbers, trying to find a set of parameters p where the error is minimized.
p0 = [-15., 0.8, 0., -1.] # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:])
The result p1 is an array containing the four constants. success is 1, 2, 3, or 4 if the solver actually found a solution. (If the errfunc is sufficiently crazy, the solver can fail.)
This looks like a polynomial approximation. You can play with polynomials in Excel ("Add Trendline" to a chart, select Polynomial, then increase the order to the level of approximation that you need). It shouldn't be too hard to find an algorithm/code for that.
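When the period is known in advance, the trigonometric least-squares fit asked about in the question becomes linear in its coefficients, so no nonlinear solver is needed. A minimal R sketch on made-up data, assuming a known period of 120:

```r
set.seed(3)
period <- 120  # assumed known; otherwise a nonlinear fit is required
x <- 1:120
y <- sin(2 * pi * x / period) + runif(length(x), -1, 1)  # made-up noisy data

# Ordinary least squares on a sine and a cosine term of the same period;
# together they represent any amplitude and phase at that period
fit <- lm(y ~ sin(2 * pi * x / period) + cos(2 * pi * x / period))

y.fit <- fitted(fit)  # smooth fitted curve at the sample points
print(coef(fit))
```

Adding further sine and cosine pairs at multiples of the base frequency extends this to a truncated Fourier series, still fitted by ordinary least squares.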
Excel can show the equation that it came up with for the approximation, too.
