How do exchanges calculate EMA in real-time?

I calculated the EMA of some closed candles, but I don't know how to calculate the EMA for an unclosed candle.
I came up with this idea, but it doesn't match the value shown on the exchange:
n = EMA_length
EMA_FOR_UNCLOSED_CANDLE = Close * (2/(n+1)) + last_EMA_n * (1 - 2/(n+1))
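A minimal sketch of that recurrence in R (the function name and values are illustrative, not from any exchange API): on every tick, recompute the EMA from the EMA of the last closed candle, using the unclosed candle's current close.

update_ema <- function(last_ema, close, n) {
  alpha <- 2 / (n + 1)                     # standard EMA smoothing factor
  alpha * close + (1 - alpha) * last_ema
}
# For the unclosed candle, reapply the update on each tick,
# always starting from the last *closed* candle's EMA:
live_ema <- update_ema(last_ema = 100.4, close = 101.2, n = 9)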

Related

Find period with lowest variability in time series (R)

I have a time series and would like to find the period that has the lowest contiguous variability, i.e. the period in which the rolling SD hovers around the minimum for the longest consecutive time steps.
library(zoo)
test <- c(10,12,14,16,13,13,14,15,15,14,16,16,16,16,16,16,16,15,14,15,12,11,10)
rol <- rollapply(test, width = 4, FUN = sd)
rol
I can easily see from the data or the graph that the longest period with low variability starts at t = 11. Is there a function that can help me find this period of sustained low variability, perhaps by automatically trying different sizes for the rolling window? I am not interested in finding the single time step with the lowest SD, but a period where this low SD persists longer than elsewhere.
All I can think of for now is looking at the differences rol[i] - rol[i+1], looping through the vector, and using a counter to find runs of consecutively low SD values (see the sketch after this paragraph). I was also thinking of using cluster analysis, something like kmeans(rol, 5), but I can have long, complex time series and would have to pick the number of clusters manually.
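A sketch of that counter idea in R, flagging rolling-SD values within some tolerance of the minimum and taking the longest run (the width and tolerance here are illustrative, not tuned):

library(zoo)
longest_low_sd_run <- function(x, width = 4, tol = 0.1) {
  rol  <- rollapply(x, width = width, FUN = sd)
  low  <- rol <= min(rol) + tol          # flag "low variability" windows
  runs <- rle(low)                       # run lengths of TRUE/FALSE stretches
  i    <- which.max(ifelse(runs$values, runs$lengths, 0))
  ends <- cumsum(runs$lengths)
  c(start = ends[i] - runs$lengths[i] + 1, length = runs$lengths[i])
}
longest_low_sd_run(test)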

Why is EMA not equal when comparing samples of different length?

I am struggling to understand why EMA (Exponential Moving Average) is different in these 2 cases.
options(scipen = 10)
library(quantmod)
getSymbols("AAPL", src = "google")
data1 <- EMA(AAPL[, "AAPL.Close"])
data2 <- EMA(tail(AAPL[, "AAPL.Close"], n = 10))
result <- data.frame(tail(data1, n = 1), tail(data2, n = 1))
In the first EMA call I supply the whole AAPL sample as the parameter. In the second EMA call I supply only the minimum amount of data needed to calculate the EMA for the last date. If I compare the calculated values for the last day, they are different.
Concretely, result[1,1] is 152.907 and result[1,2] is 152.623. Why is this happening? I would expect both numbers to be the same, since EMA is not cumulative.
That is because the exponential decay has 'infinite' history: it decays, but never completely goes away.
So the set of weights is different between your two use cases, and hence so is the result. There is a decent recent series on MAs for smoothing price series:
part one
part two
It should make clear that what you were expecting only holds for the SMA, whose weights do in fact drop to exactly zero outside the window.
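A quick way to see this numerically: with the default n = 10, EMA weights the i-th most recent close by alpha * (1 - alpha)^i with alpha = 2/(n + 1), so the last 10 closes carry only part of the total weight (a sketch of the geometric weights, ignoring how TTR::EMA seeds its first value):

n     <- 10
alpha <- 2 / (n + 1)
w     <- alpha * (1 - alpha)^(0:(n - 1))   # weights on the 10 most recent closes
sum(w)   # ~0.866: roughly 13% of the total weight lies further back in history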

Information contained in FFT bins

I am using the FFTW library for an analysis. Any advice is much appreciated.
Let us assume that we have a function f[k] = A_k*sin(kwT) + B_k*cos(kwT) and I want to FFT this with N = 4 samples. Here, w is the sidereal frequency of the Earth and T is the sidereal time of an event that I am interested in. If I plot the output of the FFT, each bin (4 total) will contain some information. How do I know which bin contains information about a specific A_k or B_k? Is it true that the first bin of the output, say, corresponds to information about A_1, or B_1, or both?
The k-th frequency bin corresponds to a centre-frequency of k * fs / N, where fs is your sampling frequency, and N is your FFT length.
Bear in mind that the "first" bin is at k == 0...
Assuming that you sample at four instants spaced a quarter of the waveform's period apart (the waveform having unknown amplitude and phase), and the values observed are a1...a4, then
A = (a1-a3)/2;
B = (a2-a4)/2;
give you the cosine and sine amplitudes respectively (if the first sample is at t = 0).
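A small numerical check of those formulas with R's built-in fft(), using arbitrary amplitudes:

A <- 3; B <- 1                       # cosine and sine amplitudes
theta <- 2 * pi * (0:3) / 4          # four samples, a quarter-period apart
x <- A * cos(theta) + B * sin(theta) # a1..a4 = 3, 1, -3, -1
(x[1] - x[3]) / 2                    # 3 -> recovers A
(x[2] - x[4]) / 2                    # 1 -> recovers B
X <- fft(x)                          # bin k = 1 is X[2] (R indexes from 1)
Re(X[2]) / 2                         # 3 -> A again
-Im(X[2]) / 2                        # 1 -> B again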

R: How to generate a series of exponential deviates that sum to some number

I am trying to generate a series of wait times for a Markov chain, where the wait times are exponentially distributed with rate equal to one. However, I don't know the number of transitions of the process, only the total time spent in the process.
So, for example:
t <- rexp(100,1)
tt <- cumsum(c(0,t))
t is a vector of the successive, independent waiting times and tt is a vector of the actual transition times, starting from 0.
Again, the problem is that I don't know the length of t (i.e. the number of transitions), only how much total waiting time will elapse (i.e. the floor of the last entry in tt).
What is an efficient way to generate this in R?
The Wikipedia entry for Poisson process has everything you need. The number of arrivals in the interval has a Poisson distribution, and once you know how many arrivals there are, the arrival times are uniformly distributed within the interval. Say, for instance, your interval is of length 15.
N <- rpois(1, lambda = 15)
arrives <- sort(runif(N, max = 15))
waits <- c(arrives[1], diff(arrives))
Here, arrives corresponds to your tt and waits corresponds to your t (by the way, it's not a good idea to name a vector t, since t is the transpose function in R). Of course, the last entry of waits has been truncated, but you mentioned only knowing the floor of the last entry of tt anyway. If you really need it, you could replace it with an independent exponential (bigger than waits[N]), if you like.
If I got this right: you want to know how many transitions it'll take to fill your time interval. Since the transitions are random and unknown, there's no way to predict this in advance. Here's how to find the answer for one realization:
tfoo<-rexp(100,1)
max(which(cumsum(tfoo)<=10))
[1] 10
tfoo<-rexp(100,1) # do another trial
max(which(cumsum(tfoo)<=10))
[1] 14
Now, if you expect to need a huge sample, e.g. rexp(1e10, 1), then maybe you should draw in 'chunks': draw 1e9 samples and see if sum(tfoo) exceeds your time threshold. If so, dig through the cumsum; if not, draw another 1e9 samples, and so on. A sketch of this follows.
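A sketch of that chunked approach (the function name and chunk size are illustrative):

transitions_until <- function(total_time, chunk = 1e6) {
  count <- 0; elapsed <- 0
  repeat {
    cs <- elapsed + cumsum(rexp(chunk, rate = 1))
    if (cs[chunk] > total_time)            # threshold crossed inside this chunk
      return(count + sum(cs <= total_time))
    count   <- count + chunk               # whole chunk fits; keep drawing
    elapsed <- cs[chunk]
  }
}
transitions_until(10)   # number of transitions completed within 10 time units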

How does one calculate the rate of change (derivative) of streaming data?

I have a stream of data that trends over time. How do I determine the rate of change using C#?
It's been a long time since calculus class, but now is the first time I actually need it (in 15 years). Now when I search for the term 'derivatives' I get financial stuff, and other math things I don't think I really need.
Mind pointing me in the right direction?
If you want something more sophisticated that smooths the data, you should look into a digital filter algorithm. It's not hard to implement if you can cut through the engineering jargon. The classic method is Savitzky-Golay.
If you have the last N samples stored in an array y, and each sample is equally spaced in time, then you can calculate the derivative using something like this:
# 5-point Savitzky-Golay first-derivative filter, written as Python-style
# pseudocode; y holds the last N equally spaced samples, oldest first.
coefficient = (1, -8, 0, 8, -1)
N = 5   # points
h = 1   # seconds between samples
deriv = 0
for i in range(N):
    deriv += y[i] * coefficient[i]
deriv /= 12 * h
This example happens to be the N = 5 "3/4" (cubic/quartic fit) filter. The bigger N is, the more points it averages and the smoother the result will be, but the latency is also higher: you have to wait N/2 points past "now" to get the derivative at "now".
For more coefficients, look at the Appendix of the Wikipedia article:
https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter
You need both the data value V and the corresponding time T, at least for the latest data point and the one before it. The rate of change can then be approximated with Euler's backward formula, which translates into
dvdt = (V_now - V_a_moment_ago) / (T_now - T_a_moment_ago);
in C#.
Rate of change is calculated as follows:
Calculate a delta, such as "price now minus price 20 days ago".
Calculate the rate of change, such as "delta / price 20 days ago".
Total rate of change, i.e. (new_value - original_value)/time?
