Exponential moving average function for irregularly timed samples - math

I have a problem where I need to compute a continuous exponential moving average of a value in a discrete data stream. It's impossible to predict when I will receive the next sample, but EMA formulas assume that the time between samples is constant.
I found this article with a demonstration of how to work around this:
double exponentialMovingAverageIrregular( double alpha,
                                          double sample,
                                          double prevSample,
                                          double deltaTime,
                                          double emaPrev )
{
    double a = deltaTime / ( 1 - alpha );
    double u = exp( a * -1 ); // e^(-a)
    double v = ( 1 - u ) / a;

    double emaNext = ( emaPrev    * u )
                   + ( prevSample * ( v - u ) )
                   + ( sample     * ( 1 - v ) );

    return emaNext;
}
I compute alpha by using the following formula: 2 / (period + 1) where period is the number of milliseconds I want my EMA to pay attention to.
When I use this, the EMA moves way too quickly. With a 30 minute window, it can take only two or three samples for the EMA to equal the input.
Here are some things I could be doing wrong:
I use milliseconds for computing alpha because that's the resolution of the timestamps on my input
I use milliseconds for deltaTime because that's what everything else is using
Per the suggestion of commenters on the article, I use a = deltaTime / (1 - alpha) instead of a = deltaTime / alpha. Neither fixes the problem, but the latter causes more problems.
Here is a contrived example in which all the samples are exactly one minute apart. When computing alpha, I used 11 * 60 * 1000, or 11 minutes, leaving me with alpha = 0.0000030302984389417593. Notice how each ema has followed the sample almost exactly. This is not supposed to happen with an 11 minute window.
sample 10766.26, ema 10766.260001166664, time 1518991800000
sample 10750.75, ema 10750.750258499216, time 1518991860000
sample 10750.76, ema 10750.759999833333, time 1518991920000
sample 10750.75, ema 10750.750000166665, time 1518991980000
sample 10750.76, ema 10750.759999833333, time 1518992040000
sample 10750.76, ema 10750.759999999998, time 1518992100000
sample 10750.76, ema 10750.759999999998, time 1518992160000
sample 10750, ema 10750.000012666627, time 1518992220000
sample 10719.99, ema 10719.990500165151, time 1518992280000
sample 10720, ema 10719.999999833333, time 1518992340000
sample 10719.99, ema 10719.990000166667, time 1518992400000
sample 10719.99, ema 10719.99, time 1518992460000
sample 10709.27, ema 10709.270178666126, time 1518992520000
sample 10690.26, ema 10690.260316832373, time 1518992580000
sample 10690.27, ema 10690.269999833334, time 1518992640000
sample 10690.27, ema 10690.27, time 1518992700000
sample 10695, ema 10694.999921166906, time 1518992760000
sample 10699.98, ema 10699.979917000252, time 1518992820000
sample 10702.05, ema 10702.049965500104, time 1518992880000
sample 10744.99, ema 10744.989284335501, time 1518992940000
sample 10744.12, ema 10744.120014499955, time 1518993000000
The way the function was derived was not explained, and I didn't pay attention in math class. Any pointers would be greatly appreciated.

You Get Exactly What You've Defined:
Given the way you defined alpha, the rest is a causal chain:
>>> a = 60000 / 0.999997
>>> u = exp( -a )
>>> v = ( 1 - u ) / a
>>> u, ( v - u ), ( 1 - v )
( 0.0, 1.6666616666666667e-05, 0.99998333338333334 )
thus the return expression becomes:
return ( ( emaPrev    * u )          // ->       0. * emaPrev
       + ( prevSample * ( v - u ) )  // -> 0.000016 * prevSample
       + ( sample     * ( 1 - v ) )  // -> 0.999983 * sample
       );                            // ~= sample
This returns nothing much different from the current sample: all of the smoothing effect has been efficiently short-circuited out of the would-be smoothing filter.
Different fields have different motivations for signal filtering / smoothing. Strategies that work fine for noisy sensor readouts in mass-bound physical models need not meet your expectations in other domains, such as quantitative modelling in trading, where the processes carry no mass or inertia and can be genuinely discontinuous.
It is unquestionably worth spending some time on both the math and the quantitative side of the subject; both will help you a lot in future work.
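For what it is worth, the usual derivation of this interpolation formula does not divide by ( 1 - alpha ) at all: it takes a = deltaTime / tau, where tau is the smoothing window expressed as a time constant in the same units as deltaTime. A minimal sketch of that version in R (the function and argument names are mine):

ema_irregular <- function(tau, sample, prev_sample, delta_time, ema_prev) {
    a <- delta_time / tau   # fraction of the smoothing window that has elapsed
    u <- exp(-a)            # e^(-a)
    v <- (1 - u) / a
    ema_prev * u + prev_sample * (v - u) + sample * (1 - v)
}

# With an 11 minute window and one-minute spacing, a = 1/11 and u ~ 0.913,
# so the previous EMA keeps most of the weight, as a smoother should:
ema_irregular(tau = 11 * 60 * 1000, sample = 10750.75,
              prev_sample = 10766.26, delta_time = 60 * 1000,
              ema_prev = 10766.26)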

Related

Question about delayed sampled sinusoid math expression

I have been studying digital audio processing using the book "Designing Audio Effect Plugins in C++".
For an analog sinusoid:
Complex Sinusoid = e^(jωt)
Delayed Sinusoid = e^(jω(t−n)) = e^(jωt) * e^(−jωn), a delay of n seconds
For the digital sampled version:
Sampled Complex Sinusoid = e^(jωnT), where T is the interval between samples and n is the sample index
I understand all of the above, but I am confused about the delayed sampled sinusoid, which is described as e^(jω( nT − M )), with M = samples of delay.
But I think it should be described as e^(jωT( n − M )), since T is a constant for a fixed sample rate, and n and M have the same units.
Can anyone explain this for me?
You are right about e^(jωT( n − M )) when M represents the delay as a sample count.
The formula e^(jω( nT − M )) is valid when M represents the delay as a time in seconds; the two forms agree when that time equals M * T.
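A quick numeric check of that equivalence in R (all of the values below are assumptions chosen for illustration):

w <- 2 * pi * 440    # angular frequency in rad/s (an assumed 440 Hz tone)
T <- 1 / 48000       # sampling interval for an assumed 48 kHz sample rate
n <- 100             # sample index
M <- 7               # delay, measured in samples

exp(1i * w * T * (n - M))       # delay expressed as a sample count
exp(1i * w * (n*T - M*T))       # delay expressed as a time of M*T seconds: identical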

Wave prediction after an FFT with a certain phase and frequency

I am using a sliding window to extract information from my EEG data with an FFT. Now I want to predict the signal from my current window into the next one, so I extract the phase from a 0.25 second time window to predict the following 0.25 second window.
I am new to signal processing and prediction, so my knowledge here is shaky.
I am not able to generate a sine wave from my extracted phase and frequency, and I just can't find a solution. I might only need a push in the right direction, who knows.
Is there a function in R to help me generate a suitable sine wave?
So I have my maximum frequency and the extracted phase, and need to generate a wave from this information.
Here is code to synthesize a sine curve of a chosen frequency (written in Go, but the logic transfers directly). It currently assumes an initial phase shift of zero, so just alter the initial theta value if you need a different phase shift:
func pop_audio_buffer(number_of_samples float64, given_freq float64,
    samples_per_second float64) ([]float64, error) {
    // output sinusoidal curve is assured to both start and stop at the zero
    // crossover threshold, independent of supplied input parms which control
    // samples per cycle and buffer size. This avoids the "pop" which otherwise
    // happens when rendering audio curves which begin at, say, 0.5 of a
    // possible range -1 to 0 to +1
    int_number_of_samples := int(number_of_samples)
    if int_number_of_samples == 0 {
        panic("ERROR - seeing 0 number_of_samples in pop_audio_buffer ... float number_of_samples " +
            FloatToString(number_of_samples) + " is your desired_num_seconds too small ? " +
            " or maybe too low value of sample rate")
    }
    source_buffer := make([]float64, int_number_of_samples)
    incr_theta := (2.0 * math.Pi * given_freq) / samples_per_second
    theta := 0.0
    for curr_sample := 0; curr_sample < int_number_of_samples; curr_sample++ {
        source_buffer[curr_sample] = math.Sin(theta)
        theta += incr_theta
    }
    return source_buffer, nil
} // pop_audio_buffer
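Since the question asks about R specifically, here is a minimal sketch of the same idea in R; the function name, argument names, and example values are all mine:

make_sine <- function(n_samples, freq_hz, sample_rate, phase = 0) {
    n <- 0:(n_samples - 1)   # sample indices
    sin(phase + 2 * pi * freq_hz * n / sample_rate)
}

# e.g. a 0.25 second window at an assumed 1 kHz sample rate, with a 10 Hz
# component whose extracted phase is pi/4:
wave <- make_sine(n_samples = 250, freq_hz = 10,
                  sample_rate = 1000, phase = pi / 4)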

Simple American Option Pricing via Monte Carlo Simulation in R - Results are too high

I am more of a novice in R and have been trying to build a function to price American-style options (call or put) using a simple Monte Carlo simulation (no regressions etc.). While the code works well for European-style options, it appears to overvalue American-style options (in comparison to binomial/trinomial trees and other pricing models).
I would greatly appreciate your input!
The steps I take are outlined below.
1.) Simulate n stock price paths with m+1 steps (Geometric Brownian Motion):
n = 10000; m = 100; T = 5; S = 100; X = 100; r = 0.1; v = 0.1; d = 0
pat = matrix(NA, n, m + 1)
pat[, 1] = S
dt = T / m
for (i in 1:n)
{
    for (j in seq(2, m + 1))
    {
        pat[i, j] = pat[i, j-1] + pat[i, j-1] * ((r - d) * dt + v * sqrt(dt) * rnorm(1))
    }
}
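As a side note, separate from the overpricing question below: this inner loop is an Euler approximation of geometric Brownian motion. An exact-step, vectorised alternative that reuses the same variables would be:

for (j in 2:(m + 1)) {
    z <- rnorm(n)   # one standard normal draw per path, a whole column at once
    pat[, j] <- pat[, j-1] * exp((r - d - 0.5 * v^2) * dt + v * sqrt(dt) * z)
}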
2.) I calculate the payoff matrix for call options and put options and discount both via backwards induction:
# Put option
payP = matrix(NA,n,m+1)
payP[,m+1] = pmax(X-pat[,m+1],0)
for (j in seq(m,1)){
payP[,j] = pmax(X-pat[,j],payP[,j+1]*exp(-r*dt))
}
# Call option
payC = matrix(NA,n,m+1)
payC[,m+1] = pmax(pat[,m+1]-X,0)
for (j in seq(m,1)){
payC[,j] = pmax(pat[,j]-X,payC[,j+1]*exp(-r*dt))
}
3.) I calculate the Option Price as the average (mean) payoff at time 0:
mean(payC[,1])
mean(payP[,1])
In the example above, a call price of approximately 44.83 and a put price of approximately 3.49 are found. However, following a trinomial tree approach (n = 250 steps), prices should be closer to 39.42 (call) and 1.75 (put).
The Black-Scholes call price (which the American call should match, since there is no dividend yield) is 39.42.
As I said, any input is highly appreciated. Thank you very much in advance!
All the best!
I think your problem is a conceptual one rather than an actual coding problem.
What your code currently does is pick, with hindsight, the best point in time to exercise the American option over each whole simulated stock price path. It does not take into account that once the intrinsic value of an American option is higher than its calculated option price, you exercise it, which means you forego the chance to exercise it in the future, where the difference between intrinsic value and option price might be even larger (depending on the realized stock price movements).
Hence, you overestimate the option prices.
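You can see this in the numbers with a small check (it reuses pat, X, r, dt and m from the question): the put recursion above reduces to the discounted maximum intrinsic value along each path, i.e. a perfect-foresight exercise rule.

# discount the intrinsic value in column j by exp(-r*dt*(j-1)), then take
# the best exercise time with hindsight on each path
disc_intrinsic <- sweep(pmax(X - pat, 0), 2, exp(-r * dt * (0:m)), "*")
mean(apply(disc_intrinsic, 1, max))   # matches mean(payP[,1])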

A simple simulation example in R from a textbook

I found this piece of code in the textbook "Statistics and Data Analysis for Financial Engineering", but I am confused about certain lines in it.
The code tries to answer the question: what is the probability that the value of the stock will be below $950,000 at the close of at least one of the next 45 trading days? They provide the mean and SD too.
Code:
niter = 1e5             # number of iterations
below = rep(0, niter)   # set up storage
set.seed(2009)
for (i in 1:niter)
{
    r = rnorm(45, mean = .05/253,
              sd = .23/sqrt(253))  # generate random daily log returns
    logPrice = log(1e6) + cumsum(r)
    minlogP = min(logPrice)        # minimum log price over next 45 days
    below[i] = as.numeric(minlogP < log(950000))
}
mean(below)
A few questions:
I don't understand logPrice = log(1e6) + cumsum(r): why do we use log(1e6), and why do we have cumsum(r)?
What is the purpose of this: below[i] = as.numeric(minlogP < log(950000))?
Why do we use log(950000)? Why do we need the log?
I'm guessing that the current price is $1,000,000, hence log(1e6). The return has to be accumulated over the period of 45 days, and therefore cumsum(r).
Well, you are checking whether the price falls below $950,000 on at least one of the days; below[i] records a 1 for each iteration where that happens.
In quant finance the stock's log return is modelled as normally distributed, so the stock price (always positive) is log-normal, which is why the comparison is done on the log scale.
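Equivalently, the check could be done on the price scale (a sketch reusing the return vector r from one iteration of the loop); working in logs simply avoids exponentiating:

price <- 1e6 * exp(cumsum(r))                 # price path implied by the log returns
below_i <- as.numeric(min(price) < 950000)    # same as as.numeric(minlogP < log(950000))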

Constrained Optimisation Problems in R

I am trying to set up an optimisation script that will look at a set of models, fit curves to them, and then optimise across them, subject to a few parameters.
Essentially, I have revenue as a diminishing function of cost, and I have this for multiple portfolios, say 4 or 5. As input, I have cost and revenue figures at set increments. What I want to do is fit a curve of the form Revenue = A*cost^B to each portfolio, and then optimise across the portfolios to find the optimal cost split between them for a set budget.
The code below (I apologise for its inelegance; I'm sure there are MANY improvements to be made!) reads in my data (in this case a simulation), creates the necessary data frames (this is likely where the inelegance comes in), calculates the curve parameters for each simulation, and produces graphics to check the fitted curve against the data.
My problem is that now I have 5 curves of the form:
revenue = A * Cost ^ B (different A, B and cost for each function)
And I want to know, given the 5 variables, how should I split my cost between them, so I want to optimise the sum of the 5 curves subject to
Cost <= Budget
I know that I need to use constrOptim, but I have spent literally hours banging my head against my desk (literally hours, not literally banging my head...) and I still can't figure out how to set up the function so that it maximises revenue, subject to the cost constraint...
Any help here would be greatly appreciated, this has been bugging me for weeks.
Thanks!
Rich
## clear all previous data
rm(list = ls())

## read in data
sim <- read.table("input19072011.txt", header = TRUE)
sim2 <- data.frame(sim$Wrevenue, sim$Cost)

## identify how many simulations there are - you can change the 20 to the
## number of steps, but all simulations must have the same number of steps
portfolios <- length(sim2$sim.Cost) / 20

## create a data frame to hold the fitted curve parameters
## (named params rather than matrix, to avoid masking base::matrix)
a <- rep(1, portfolios)
b <- rep(2, portfolios)
params <- data.frame(a, b)

## fit Revenue = A * Cost^B to each portfolio's block of 20 rows
k <- 1
j <- 20
for (i in 1:portfolios) {
    test <- sim2[k:j, ]
    rev9 <- test[, 1]
    cost9 <- test[, 2]
    ds <- data.frame(rev9, cost9)
    rhs <- function(cost, b0, b1) {
        b0 * cost^b1
    }
    m <- nls(rev9 ~ rhs(cost9, intercept, power), data = ds,
             start = list(intercept = 5, power = 1))
    params[i, 1] <- summary(m)$coefficients[1]
    params[i, 2] <- summary(m)$coefficients[2]
    k <- k + 20
    j <- j + 20
}

## now there exists a table of all of the variables for the curves to optimise
params
multiples <- params[, 1]
powers <- params[, 2]
coststarts <- rep(0, portfolios)

## check accuracy of curves
k <- 1
j <- 20
for (i in 1:portfolios) {
    dev.new()
    plot(sim$Wrevenue[k:j])
    lines(multiples[i] * (sim$Cost[k:j]^powers[i]))
    k <- k + 20
    j <- j + 20
}
If you want to find the values cost[1],...,cost[5] that maximize revenue[1]+...+revenue[5] subject to the constraints cost[1]+...+cost[5] <= budget (and 0 <= cost[i] <= budget), you can parametrize the set of feasible solutions as follows:
cost[1] = s(x[1]) * budget
cost[2] = s(x[2]) * ( budget - cost[1] )
cost[3] = s(x[3]) * ( budget - cost[1] - cost[2])
cost[4] = s(x[4]) * ( budget - cost[1] - cost[2] - cost[3] )
cost[5] = budget - cost[1] - cost[2] - cost[3] - cost[4]
where x[1],...,x[4] are the parameters to find (with no constraints on them) and s is any bijection between the real line R and the interval (0,1).
# Sample data
a <- rlnorm(5)
b <- rlnorm(5)
budget <- rlnorm(1)

# Reparametrization: s maps the real line onto (0,1)
s <- function(x) exp(x) / (1 + exp(x))
cost <- function(x) {
    cost <- rep(NA, 5)
    cost[1] <- s(x[1]) * budget
    cost[2] <- s(x[2]) * (budget - cost[1])
    cost[3] <- s(x[3]) * (budget - cost[1] - cost[2])
    cost[4] <- s(x[4]) * (budget - cost[1] - cost[2] - cost[3])
    cost[5] <- budget - cost[1] - cost[2] - cost[3] - cost[4]
    cost
}

# Function to maximize
f <- function(x) {
    result <- sum(a * cost(x)^b)
    cat(result, "\n")
    result
}

# Optimization (fnscale = -1 turns optim's minimization into maximization)
r <- optim(c(0, 0, 0, 0), f, control = list(fnscale = -1))
cost(r$par)
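To connect this back to the question's fitted curves, one would drop the random sample data and plug in the fitted coefficients instead; the budget figure below is an assumed placeholder:

a <- multiples    # fitted A for each of the 5 portfolios
b <- powers       # fitted B for each of the 5 portfolios
budget <- 1000    # assumed placeholder; use the real budget
r <- optim(c(0, 0, 0, 0), f, control = list(fnscale = -1))
cost(r$par)       # optimal cost split across the portfolios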
