I have a stream of data that trends over time. How do I determine the rate of change using C#?
It's been about 15 years since calculus class, and this is the first time I've actually needed it. When I search for the term 'derivatives' I get financial results and other math topics I don't think I really need.
Mind pointing me in the right direction?
If you want something more sophisticated that smooths the data, you should look into a digital filter algorithm. It's not hard to implement if you can cut through the engineering jargon. The classic method is the Savitzky-Golay filter.
If you have the last n samples stored in an array y and each sample is equally spaced in time, then you can calculate the derivative using something like this:
deriv = 0.0
coefficients = [1, -8, 0, 8, -1]
N = 5   # points
h = 1   # seconds between samples
for i in range(N):
    deriv += y[i] * coefficients[i]
deriv /= (12 * h)
This example happens to be an N=5, cubic/quartic ("3/4") filter. The bigger N is, the more points it averages and the smoother the result will be, but the latency is also higher: you'll have to wait N/2 points to get the derivative at time "now".
For more coefficients, look at the Appendix here:
https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter
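Wrapped up as a runnable function (Python here, since the snippet above is Python-style; porting to C# is mechanical, and the names are just illustrative):

```python
def sg_derivative(y, h=1.0):
    """5-point Savitzky-Golay first derivative at the center sample.

    y holds 5 equally spaced samples; h is the spacing in seconds.
    Returns dy/dt at the middle sample (index 2), i.e. N/2 points ago.
    """
    if len(y) != 5:
        raise ValueError("need exactly 5 samples")
    coefficients = [1, -8, 0, 8, -1]
    return sum(c * v for c, v in zip(coefficients, y)) / (12.0 * h)

# The filter is exact for polynomials up to degree 4:
# y = t^2 sampled at t = 0..4 gives derivative 2t = 4 at the center t = 2.
samples = [t ** 2 for t in range(5)]
print(sg_derivative(samples))   # 4.0
```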
You need both the data value V and the corresponding time T, at least for the latest data point and the one before it. The rate of change can then be approximated with Euler's backward formula, which translates into
dvdt = (V_now - V_a_moment_ago) / (T_now - T_a_moment_ago);
in C#.
Rate of change is calculated as follows
Calculate a delta, such as "price minus price 20 days ago"
Calculate rate of change such as "delta / price 99 days ago"
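Those two steps are a one-liner each; here is a sketch in Python, with the 20- and 99-day lookbacks from the example as illustrative parameters:

```python
def rate_of_change(prices, delta_lag=20, base_lag=99):
    """delta = price now minus price delta_lag days ago,
    normalised by the price base_lag days ago."""
    delta = prices[-1] - prices[-1 - delta_lag]
    return delta / prices[-1 - base_lag]
```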
Total rate of change, i.e. (new_value - original_value)/time?
After doing some research, I understand how to implement it with time-dependent functions. However, I'm not sure whether I can apply it to time-independent scenarios.
Suppose we have a simple function y=a*x^2, where both y and x are measured at a constant interval (say 1 min/sample) and a is a constant. However, both the y and x measurements have white noise.
More specifically, x and y are two independently measured variables. For example, x is the air flow rate in a duct and y is the pressure drop across the duct. Because the air flow varies with the fan speed, the pressure drop across the duct varies as well. The relation between the pressure drop y and flow rate x is y=a*x^2, but both measurements contain white noise. Is it possible to use a Kalman filter to estimate a more accurate y? Both x and y are recorded at a constant time interval.
Here are my questions:
Is it feasible to implement a Kalman filter to reduce the noise in the y readings? In other words, can I get a better estimate of y?
If this is feasible, how would I code it in R or C?
P.S.
I tried applying a Kalman filter to a single variable and it works well; the result is below. I'll try Ben's suggestion next and see whether I can make it work.
I think you can apply some Kalman Filter like ideas here.
Make your state a, with variance P_a. Your update is just F=[1], and your measurement is just H=[1] with observation y/x^2. In other words, you measure x and y and estimate a by solving for a in your original equation. Update your scalar KF as usual. Approximating R will be important. If x and y both have zero mean Gaussian noise, then y/x^2 certainly doesn't, but you can come up with an approximation.
Now that you have a running estimate of a (which is a random constant, so Q=0 ideally, but maybe Q=[tiny] to avoid numerical issues) you can use it to get a better y.
You have y_meas and y_est=a*x_meas^2. Combine those using your variances as (R_y * a * x^2 + (P_a + R_x2) * y_meas) / (R_y + P_a + R_x2). Over time as P_a goes to zero (you become certain of your estimate of a) you can see you end up combining information from your x and y measurements proportional to your trust in them individually. Early on, when P_a is high you are mostly trusting the direct measurement of y_meas because you don't know the relationship.
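A minimal sketch of that scalar filter in Python; the numbers (R, Q, the demo signal) are made-up placeholders, not from the question, and in practice R should come from the noise approximation discussed above:

```python
import math
import random

def estimate_a(x_meas, y_meas, R=0.05, Q=1e-9, a0=0.0, P0=10.0):
    """Scalar Kalman filter for the constant a in y = a*x^2.

    State: a.  F = H = 1, pseudo-measurement z = y / x^2.
    R approximates the variance of z (see the caveat above: the noise on
    y/x^2 is not exactly Gaussian); Q is tiny because a is constant.
    """
    a, P = a0, P0
    for x, y in zip(x_meas, y_meas):
        P += Q                    # predict: a is constant, F = 1
        z = y / (x * x)           # observation of a
        K = P / (P + R)           # Kalman gain
        a += K * (z - a)          # state update
        P *= (1.0 - K)            # covariance update
    return a, P

# Hypothetical demo data: true a = 2, varying flow, white noise on x and y.
random.seed(0)
true_a = 2.0
x_true = [1.0 + 0.5 * math.sin(i / 20.0) for i in range(500)]
x_meas = [x + random.gauss(0, 0.01) for x in x_true]
y_meas = [true_a * x * x + random.gauss(0, 0.05) for x in x_true]

a_hat, P_a = estimate_a(x_meas, y_meas)
print(a_hat)   # converges near 2.0
```

With a_hat and P_a in hand, the variance-weighted combination of y_meas and a_hat*x_meas^2 described above gives the improved y estimate.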
How can I round an excessively precise fraction to a less precise format that is more humanly readable?
I'm working with JPEG EXIF exposure time data extracted by MS' Windows Imaging Component. WIC returns exposure times in fractional form with separate ints for numerator and denominator.
WIC usually works as expected, but with some JPEGs, WIC returns exposure times in millionths of a second, meaning that instead of reporting e.g. a 1/135 second exposure time, it reports an exposure time of 7391/1000000 seconds. The difference between 1/135 and 7391/1000000 is quite small but the latter is not intuitive to most users. As such, I'd like to round overly precise exposure times to the nearest standard exposure times used in photography.
Is there a better way to do this other than using a lookup table of known-reasonable exposure times and finding the nearest match?
You can compute the continued fraction expansion of the large fraction. Then take one of the first convergents as your approximate fraction.
In your case, you get
7391/1000000 = [ 0; 135, 3, 2, ...]
so the first convergent is 1/135=0.0074074..., the next
1/(135+1/3) = 3/406 = 0.00738916256...
and the third
1/(135+1/(3+1/2)) = 1/(135+2/7) = 7/947 = 0.00739176346...
To compute the (first) coefficients of a continued fraction expansion, you start with xk=x0. Then iteratively apply this procedure:
Separate xk=n+r into integer part n and fractional part r.
The integer is the next coefficient ak; with the inverse of the fractional part you start the procedure anew, xk = 1/r.
Applied to the given number, this produces exactly the start of the sequence above. Then reconstruct the rational expressions, and continue until the inverse of the square of the denominator is smaller than a given tolerance.
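That procedure, together with the standard recurrence for the convergents, is only a few lines. A sketch in Python, working on the integer numerator and denominator so no floating-point error creeps in:

```python
def convergents(num, den, count=4):
    """First `count` continued-fraction convergents of num/den as (p, q) pairs."""
    p2, p1 = 0, 1     # p_{k-2}, p_{k-1} recurrence seeds
    q2, q1 = 1, 0     # q_{k-2}, q_{k-1} recurrence seeds
    results = []
    for _ in range(count):
        if den == 0:
            break
        a, r = divmod(num, den)      # next coefficient and remainder
        p2, p1 = p1, a * p1 + p2     # numerator recurrence
        q2, q1 = q1, a * q1 + q2     # denominator recurrence
        num, den = den, r            # restart with the inverted fractional part
        results.append((p1, q1))
    return results

print(convergents(7391, 1000000))   # [(0, 1), (1, 135), (3, 406), (7, 947)]
```

The output reproduces the convergents worked out above: 1/135, 3/406, and 7/947.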
Try this:
human_readable_denominator = int(0.5 + 1 / precise_exposure_time)
With the example you gave:
human_readable_denominator = int(0.5 + 1 / (7391/1000000))
= 135
This works well for exposure times less than 1/2 second. For longer exposure times, converting to a 1/X format doesn't make sense.
Phil
Take a look at approxRational in Haskell's Data.Ratio. You give it a number and an epsilon value, and it gives the nicest rational number within epsilon of that number. I imagine other languages have similar library functions, or you can translate the Haskell source for approxRational.
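In Python, for example, the analogous standard-library function is fractions.Fraction.limit_denominator, which returns the closest fraction whose denominator does not exceed a given bound:

```python
from fractions import Fraction

precise = Fraction(7391, 1000000)
print(precise.limit_denominator(200))    # 1/135
print(precise.limit_denominator(1000))   # 7/947, closer but less readable
```

Choosing the denominator bound plays the same role as approxRational's epsilon: a smaller bound gives a nicer but less accurate fraction.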
I am testing a temperature sensor for a project. I found that there is a discrepancy between the expected and measured values. As the difference is non-linear over the temperature range, I can't simply add an offset. Is there a way I can apply some kind of correction to the acquired data?
UPDATE
I have a commercial heater element which heats up to a set temperature (I call this temperature "expected"). On the other side I have a temperature sensor (my project) which measures the temperature of the heater (I call this "measured").
I noticed a difference between the measured and expected values which I would like to compensate for, so that the measured value ends up close to the expected value.
Example
If my sensor measures 73.3, it should be processed by some means (mathematically or otherwise) so that the displayed value is close to 70.28.
Hope this clears thing a little.
Measured Expected
30.5 30.15
41.4 40.29
52.2 50.31
62.8 60.79
73.3 70.28
83 79.7
94 90.39
104.3 99.97
114.8 109.81
Thank you for your time.
You are interested in describing the deviation of one variable from the other. What you are looking for is the function
g( x) = f( x) - x
which returns an approximation (a prediction) of what number to add to x to get y, based on the real x input. First you need the prediction of y from the observed x values, i.e. f(x). This is what you get from doing a regression:
x = MeasuredExpected ( what you have estimated, and I assume
you will know this value)
y = MeasuredReal ( what have been actually observed instead of x)
f( x) = MeasuredReal( estimated) = alfa*x + beta + e
In the simplest case of just one variable you don't even need special tools for this. The coefficients of the equation are:
alfa = covariance( MeasuredExpected, MeasuredReal) / variance( MeasuredExpected)
beta = average( MeasuredReal) - alfa * average( MeasuredExpected)
so for each expected measured x you can now state that the most probable value of real measured is:
f( x) = MeasuredReal( expected) = alfa*x + beta (under assumption that error
is normally distributed, iid)
So you have to add
g( x) = f( x) - x = ( alfa -1)*x + beta
to account for the difference that you have observed between your usual Expected and Measured.
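Using the numbers from the question's table, the covariance/variance formulas above are only a few lines in Python. Here the regression is set up to map a raw Measured reading to an Expected value, which is the direction the asker ultimately wants to display:

```python
measured = [30.5, 41.4, 52.2, 62.8, 73.3, 83.0, 94.0, 104.3, 114.8]
expected = [30.15, 40.29, 50.31, 60.79, 70.28, 79.7, 90.39, 99.97, 109.81]

n = len(measured)
mean_m = sum(measured) / n
mean_e = sum(expected) / n

# Least-squares slope and intercept for: expected ~ alfa*measured + beta
cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, expected))
var = sum((m - mean_m) ** 2 for m in measured)
alfa = cov / var
beta = mean_e - alfa * mean_m

def correct(reading):
    """Map a raw sensor reading to a calibrated temperature."""
    return alfa * reading + beta

print(round(correct(73.3), 2))   # close to the tabulated 70.28
```

The fit gives alfa slightly below 1, so the correction g(x) = (alfa - 1)*x + beta shrinks high readings more than low ones, matching the pattern in the table.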
Maybe you could use a data sample in order to do a regression analysis on the variation and use the regression function as an offset function.
http://en.wikipedia.org/wiki/Regression_analysis
You can create a calibration lookup table (LUT).
The error in the sensor reading is not linear over the entire range of the sensor, but you can divide the range up into a number of sub-ranges for which the error within the sub-range is nearly linear. Then you calibrate the sensor by taking a reading in each sub-range and calculating the offset error for each sub-range. Store the offset for each sub-range in an array to create a calibration lookup table.
Once the calibration table is known, you can correct a measurement by performing a table lookup for the proper offset. Use the actual measured value to determine the index into the array from which to get the proper offset.
The sub-ranges don't need to be same-sized although that should make it easy to calculate the proper table index for any measurement. (If the sub-ranges are not same-sized then you could use a multidimensional array (matrix) and store not only the offset but also the beginning or end point of each sub-range. Then you would scan through the begin-points to determine the proper table index for any measurement.)
You can make the correction more accurate by dividing into smaller sub-ranges and creating a larger calibration lookup table. Or you may be able to interpolate between two table entries to get a more accurate offset.
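A sketch of such a lookup table in Python, using the calibration pairs posted in the question as break-points and linearly interpolating between adjacent entries (the bisect search plays the role of the index calculation described above):

```python
import bisect

# Calibration points: raw sensor reading -> reference (expected) temperature.
raw = [30.5, 41.4, 52.2, 62.8, 73.3, 83.0, 94.0, 104.3, 114.8]
ref = [30.15, 40.29, 50.31, 60.79, 70.28, 79.7, 90.39, 99.97, 109.81]

def calibrate(reading):
    """Correct a reading by interpolating in the calibration LUT."""
    if reading <= raw[0]:
        return ref[0]
    if reading >= raw[-1]:
        return ref[-1]
    i = bisect.bisect_right(raw, reading)       # sub-range index
    frac = (reading - raw[i - 1]) / (raw[i] - raw[i - 1])
    return ref[i - 1] + frac * (ref[i] - ref[i - 1])

print(calibrate(73.3))   # a calibration point itself -> 70.28
```

Readings outside the calibrated range are clamped to the end points; more break-points tighten the correction exactly as described above.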
I am trying to generate a series of wait times for a Markov chain where the wait times are exponentially distributed numbers with rate equal to one. However, I don't know the number of transitions of the process, rather the total time spent in the process.
So, for example:
t <- rexp(100,1)
tt <- cumsum(c(0,t))
t is a vector of the successive and independent waiting times and tt is a vector of the actual transition time starting from 0.
Again, the problem is I don't know the length of t (i.e. the number of transitions), rather how much total waiting time will elapse (i.e. the floor of last entry in tt).
What is an efficient way to generate this in R?
The Wikipedia entry for Poisson process has everything you need. The number of arrivals in the interval has a Poisson distribution, and once you know how many arrivals there are, the arrival times are uniformly distributed within the interval. Say, for instance, your interval is of length 15.
N <- rpois(1, lambda = 15)
arrives <- sort(runif(N, max = 15))
waits <- c(arrives[1], diff(arrives))
Here, arrives corresponds to your tt and waits corresponds to your t (by the way, it's not a good idea to name a vector t, since t is reserved for the transpose function in R). Of course, the last entry of waits has been truncated, but you mentioned only knowing the floor of the last entry of tt anyway. If that last entry is really needed, you could replace it with an independent exponential (bigger than waits[N]), if you like.
If I got this right: you want to know how many transitions it will take to fill your time interval. Since the transitions are random, there's no way to predict the count in advance for a given sample. Here's how to find the answer by simulation:
tfoo<-rexp(100,1)
max(which(cumsum(tfoo)<=10))
[1] 10
tfoo<-rexp(100,1) # do another trial
max(which(cumsum(tfoo)<=10))
[1] 14
Now, if you expect to need a huge sample, e.g. rexp(1e10, 1), then maybe you should draw in 'chunks': draw 1e9 samples and see if sum(tfoo) exceeds your time threshold. If so, dig through the cumsum; if not, draw another 1e9 samples, and so on.
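The chunked idea can be sketched like this (Python with the standard library rather than R; total_time and chunk are illustrative parameters):

```python
import random

def wait_times_until(total_time, rate=1.0, chunk=1024):
    """Draw exponential wait times until their cumulative sum exceeds total_time."""
    waits, elapsed = [], 0.0
    while elapsed <= total_time:
        for _ in range(chunk):               # draw in chunks
            w = random.expovariate(rate)
            elapsed += w
            waits.append(w)
            if elapsed > total_time:
                break
    # Drop the final wait that crossed the boundary, mirroring the
    # max(which(cumsum(tfoo) <= 10)) trick above.
    return waits[:-1]

random.seed(1)
t = wait_times_until(10.0)
print(len(t), sum(t))   # number of transitions, total elapsed time <= 10
```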
I am trying to determine the volatility of a rank.
More specifically, the rank can be from 1 to 16 over X data points (the number of data points varies with a maximum of 30).
I'd like to be able to measure this volatility and then map it to a percentage somehow.
I'm not a math geek so please don't spit out complex formulas at me :)
I just want to code this in the simplest manner possible.
I think the easiest first pass would be Standard Deviation over X data points.
I think that Standard Deviation is what you're looking for. There are some formulas to deal with, but it's not hard to calculate.
Given that you have a small sample set (you say a maximum of 30 data points) and that the standard deviation is easily affected by outliers, I would suggest using the interquartile range as a measure of volatility. It is a trivial calculation and would give a meaningful representation of the data spread over your small sample set.
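Both suggestions are one call each with Python's statistics module; the rank history below is made-up example data, and the quartile method shown is one of several common conventions:

```python
import statistics

ranks = [3, 5, 2, 8, 7, 1, 4, 6, 5, 3]   # example rank history (values 1-16)

# Standard deviation of the ranks.
sd = statistics.stdev(ranks)

# Interquartile range Q3 - Q1, robust to the odd outlier rank.
q1, _, q3 = statistics.quantiles(ranks, n=4)
iqr = q3 - q1

print(round(sd, 2), iqr)
```

To map either measure to a percentage, you could divide by its maximum possible value for ranks 1-16 and multiply by 100.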
If you want something really simple, you could take the average of the absolute differences between successive ranks as the volatility. This has the added bonus of being recursive. Use this for initialisation:
double sum = 0;
for (int i = 1; i < N; i++)
{
    sum += abs(ranks[i] - ranks[i-1]);
}
double volatility = sum / (N - 1);   // average of the N-1 successive differences
Then, to update the volatility when a new rank at time N+1 becomes available, introduce a parameter K that determines how quickly your volatility measurement adapts to changes. Higher K means slower adaptation, so K can be thought of as a "decay time" or some such:
double K = 14;   // higher = slower change in volatility over time
double newvolatility;
newvolatility = (oldvolatility * (K-1) + abs(rank[N+1] - rank[N]))/K;
This is also known as a moving average (of the absolute differences in ranks in this case).
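The same idea in Python, combining the initialisation loop and the recursive update (K = 14 is just the illustrative value from above):

```python
def initial_volatility(ranks):
    """Average absolute difference between successive ranks."""
    diffs = [abs(b - a) for a, b in zip(ranks, ranks[1:])]
    return sum(diffs) / len(diffs)

def update_volatility(old_volatility, prev_rank, new_rank, K=14.0):
    """Exponential moving average of absolute rank differences."""
    return (old_volatility * (K - 1) + abs(new_rank - prev_rank)) / K

ranks = [3, 5, 2, 8, 7]                    # example rank history
vol = initial_volatility(ranks)            # (2 + 3 + 6 + 1) / 4 = 3.0
vol = update_volatility(vol, ranks[-1], 4) # new rank 4 arrives
print(vol)
```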