Im trying to apply the fourier phase shift theorem to a complex signal in R. However, only the magnitude of my signal shifts as I expect it. I think it should be possible to apply this theorem to complex signals, so probably I make an error somewhere. My guess is that there is an error in the frequency axis I calculate.
How do I correctly apply the fourier shift theorem to a complex signal (using R)?
i = complex(0,0,1)
t.in = (1+i)*matrix(c(1,0,0,0,0,0,0,0,0,0))
n.shift = 5
#the output of fft() has the mean / 0 frequency at the first element
#it then increases to the highest frequency, flips to negative frequencies
#and then increases again to the negative frequency closest to 0
N = length(t.in)
if (N%%2){#odd
kmin = -(N-1)/2
kmax = (N-1)/2
} else {#even
kmin = -N/2
kmax = N/2-1
#center frequency negative, is that correct?
}
#create frequency axis for fft() output, no sampling frequency or sample duration needed
k = (kmin:kmax)
kflip = floor(N/2)
k = k[c((kflip+1):N,1:kflip)]
f = 2*pi*k/N
shiftterm = exp( -i*n.shift*f )
T.in = fft(t.in)
T.out = T.in*shiftterm
t.out = fft(T.out, inverse=T)/N
par(mfrow=c(2,2))
plot(Mod(t.in),col="green");
plot(Mod(t.out), col="red");
plot(Arg(t.in),col="green");
plot(Arg(t.out),col="red");
As you can see the magnitude of the signal is nicely shifted, but the phase is scrambled. I think the negative frequencies are where my error is, but I cant see it.
What am I doing wrong?
The questions about fourier phase shift theorem I could find:
real 2d signal in python
real 2d signal in matlab
real 1d signal in python
math question about what fourier shift does
But these were not about complex signals.
Answer
As Steve suggested in the comments, I checked the phase on the 6th element.
> Arg(t.out)[6]
[1] 0.7853982
> Arg(t.in)[1]
[1] 0.7853982
So the only element that has a magnitude (at least one order of magnitude higher than the EPS) does have the phase that I expected.
TL;DR The result from the original approach in the question was already correct, we see the Gibbs Phenomenon sliding by.
Just discard low magnitude elements?
If ever the phase of elements that should be zero will be a problem I can run t.out[Mod(t.out)<epsfactor*.Machine$double.eps] = 0 where in this case epsfactor has to be 10 to get rid of the '0' magnitude elements.
Adding that line before plotting gives the following result, which is what I expected to get beforehand. However, the 'scrambled' phase might actually be accurate in most cases as I'll explain below.
The original result really was correct
Just setting low magnitude elements to 0 does not make the phase of the shifted signal more intuitive however. This is a plot where I apply a 4.5 sample shift, the phase is still 'scrambled'.
Applying fourier shift equivalent to downsmapling shifted fourier interpolation
It occurred to me that applying a non-integer number of elements phase shift is equivalent to fourier interpolating the signal and then downsample the interpolated signal at points between the original elements. Since the vector I used as input is an impulse function, the fourier interpolated signal is just not well behaved. Then the signal after applying the fourier phase shift theorem can be expected to have exactly the phase that the fourier interpolated signal has, as seen below.
Gibbs Ringing
Its just at the discontinuities where phase is not well behaved and where small rounding errors might cause large errors in the reconstructed phase. So not really related to low magnitude but to not well defined fourier transform of the input vector. This is called Gibbs Ringing, I could use low-pass filtering with a gaussian filter to decrease it.
Questions related to fourier interpolation and phase shift
symbolic approach in R to estimate fourier transform error
non integer signal shift by use of linear interpolation
downsampling complex signal
fourier interpolation application
estimating sub-sample shift between two signals using fourier transforms
estimating sub-sample shift between two signals without interpolation
Related
I am trying to convert an heightmap into a matrix of normals using central differencing which will later correspond to the steepness of a giving point.
I found several links with correct results but without explaining the math behind.
T
L O R
B
From this link I realised I can just do:
Vec3 normal = Vec3(2*(R-L), 2*(B-T), -4).Normalize();
The thing is that I don't know where the 2* and -4 comes from.
In this explanation of central differencing I see that we should divide that value by 2, but I still don't know how to connect all of this.
What I really want to know is the linear algebra definition behind this.
I have an heightmap, I want to measure the central differences and I want to obtain the normal vector to use later to measure the steepness.
PS: the Z-axis is the height.
From vector calculus, the normal of a surface is given by the gradient operator:
A height map h(x, y) is a special form of the function f:
For a discretized height map, assuming that the grid size is 1, the first-order approximations to the two derivative terms above are given by:
Since the x step from L to R is 2, and same for y. The above is exactly the formula you had, divided through by 4. When this vector is normalized, the factor of 4 is canceled.
(No linear algebra was harmed in the writing of this answer)
I have the following data: a vector B and a vector R. The vector B is the "independent" variable. For this pair, I have two data sets: One is an experimental measurement of Bex, Rex and the other is a simulation produced by me Bsim, Rsim. The simulation does not have any "scale" for the x-axis (the B vector). Therefore when I am trying to fit my curve to the experiment, I have to find out a scaling parameter B0 "by eye", and with this number B0 I multiply the entire Bsim vector and simply plot(Bsim, Rsim, Bex, Rex).
I wanted to use the package LsqFit to make the procedure automatic and more accurate. However I am having trouble in understanding how I could use it to find the scaling on the independent variable.
My first thought was to just "invert" the roles of B and R. However, there are two issues that I think make matters worse: 1) the R curve/data is not monotonous, 2) the experimental data are much more "dense" (they have more data-points: my simulation has 120 points in total, the experiments have some thousands).
Below I give an example if what I am trying to accomplish (of course, the answer need not use LsqFit). I also attach two figures that demonstrate everything very clearly.
#= stuff happened before this point =#
Bsim, Rsim = load(simulation)
Bex, Rex = load(experiment)
#this is what I want to do:
some_model(x, p) = ???
fit = curve_fit(some_model, Bex, Rex, [3.5])
B0 = fit.param[1]
#this is what I currently do by trail and error:
B0 = 3.85 #this is what I currently do by trial and error
plot(B0*Bsim, Rsim, Bex, Rex)
P.S.: The R curves (dependent variables) are both normalized by their maximum value because their scaling is not important.
A simple approach iff you can always expect both your experiment and simulation to feature one high peak, and you're sure that there's only a scaling factor rather than also an offset, is to simply multiply your Bsim vector by mode_rex / mode_rsim (e.g. in your example, mode_rsim = 1, and mode_rex = 4, so multiply Bsim by 4. But I'm sure you've thought of this already.
For a more general approach, one way is as follows:
add and load Interpolations package
Create a grid to interpolate over, e.g. Grid = 0:0.01:Bex[end]
interpolate Rex over that grid, e.g.
RexInterp = interpolate( (Bex,), Rex, Gridded(Linear()));
RexGridVec = RexInterp[Grid];
interpolate Rsim over the same grid, but introduce your multiplier on the Bsim "knots", e.g.
Multiplier = 0.1;
RsimInterp = interpolate( (Multiplier * Bsim,), Rsim, Gridded(Linear()));
RsimGridVec = RsimInterp[Grid]
Now you can calculate a square error value between RsimGridVec and RexGridVec, e.g.
SqErr = sum((RsimGridVec - RexGridVec).^2)
If you follow this technique, then if you create a loop for a multiplier range (say 0:0.01:10), and get the square error associated with each multiplier, you can find out the multiplier for which the square error is the minimum.
In theory if you wanted to find the optimal for a particular offset too, you can make it the outer loop for a range of offsets. Mind you this is a brute force approach, but it be reasonably efficient judging by the vectors in your graph.
There is a question I am stuck on using the following formula for the unipolar transfer function:
f(net)= 1
__________
-net
1 + e
The example has the following:
out = 1
____________ = 0.977
-3.75
1 + e
How do we arrive at 0.977?
What is e?
e = 2.71828... is the base of natural logarithms. It's a mathematical constant that comes up in many different equations, similar to π. You will see it all the time when doing exponents and logarithms.
Plug it into your equation and you get 0.977.
While factually correct the other responses merely provide the value of e and confirm the underlying computation. This type of sigmoid functions is so ubiquitous to neural networks that some additional insight may be welcome.
Essentially the exponential function (e to the x power), has a very characteristic curve:
Mostly flat at zero (very slightly above zero, actually), from - infinity to about -2
incrementally sharp turn towards the vertical, between about -2 and +4
quasi "vertical", with values in excess of 150 and increasingly huge, from +5 to infinity
As a result exponential curves are very useful for producing "S-shaped" functions; BTW, "S" is Sigma in Greek which supplied the etymology for "sigmoid". Such functions are often patterned on the formula shown in the question:
1/(1 + e^-x)
where x is the variable. Typically such functions also include constants aimed at stretching the range (the input zone where changes in x are significant) and/or at modifying the curve in this middle zone.
The result of such functions is that up to a particular value of the input, the function is quasi constant, then, for a particular range of inputs, the function provides a increasing output, and finally past the upper value of the range, the function is quasi constant. Also looking in more details, such Sigmoids have a point of inflection which correspond to a reversing of the rate of change of the ouptut and which also marks an area of the curve, on either side, where the changes are the slowest, relatively.
In turn, such S-shaped curves (1) are very useful to normalize the output of neural network neurons, or more generally, to normalize various numeric values during processes of various nature. Intuitively these correspond to a "sweet spot" or a "sweet range" of the underlying neuron or device.
(1) Or also, possibly, "step-down" shaped curves, i.e. curves with a mostly constant high value, a decreasing value within the mid-range, and a low mostly constant value thereafter.
e is Euler's number == 2.718281828....
If you raise e to the -3.75 power, add one to it, and take the inverse, you'll get precisely 0.977022630....
'e' is the base for the natural logarithm function, the value of which is equivalent to the sum of the infinite series 1/n! for n from 0 to infinity. It is available in the C standard library or the java Math package as the exp() function.
If you evaluate 1/(1+exp(-3.75)) you will get 0.977
Gaffer on Games has a great article about using RK4 integration for better game physics. The implementation is straightforward, but the math behind it confuses me. I understand derivatives and integrals on a conceptual level, but haven't manipulated equations in a long while.
Here's the brunt of Gaffer's implementation:
void integrate(State &state, float t, float dt)
{
Derivative a = evaluate(state, t, 0.0f, Derivative());
Derivative b = evaluate(state, t+dt*0.5f, dt*0.5f, a);
Derivative c = evaluate(state, t+dt*0.5f, dt*0.5f, b);
Derivative d = evaluate(state, t+dt, dt, c);
const float dxdt = 1.0f/6.0f * (a.dx + 2.0f*(b.dx + c.dx) + d.dx);
const float dvdt = 1.0f/6.0f * (a.dv + 2.0f*(b.dv + c.dv) + d.dv)
state.x = state.x + dxdt * dt;
state.v = state.v + dvdt * dt;
}
Can anybody explain in simple terms how RK4 works? Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f? How is averaging derivatives up to the 4th order different from doing a simple euler integration with a smaller timestep?
After reading the accepted answer below, and several other articles, I have a grasp on how RK4 works. To answer my own questions:
Can anybody explain in simple terms how RK4 works?
RK4 takes advantage of the fact that
we can get a much better approximation
of a function if we use its
higher-order derivatives rather than
just the first or second derivative.
That's why the Taylor series
converges much faster than Euler
approximations. (take a look at the
animation on the right side of that
page)
Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f?
The Runge-Kutta method is an
approximation of a function that
samples derivatives of several points
within a timestep, unlike the Taylor
series which only samples derivatives
of a single point. After sampling
these derivatives we need to know how
to weigh each sample to get the
closest approximation possible. An
easy way to do this is to pick
constants that coincide with the
Taylor series, which is how the
constants of a Runge-Kutta equation
are determined.
This article made it clearer for
me. Notice how (15) is the Taylor
series expansion while (17) is the
Runge-Kutta derivation.
How is averaging derivatives up to the 4th order different from doing a simple euler integration with a smaller timestep?
Mathematically, it converges much
faster than doing many Euler
approximations. Of course, with enough
Euler approximations we can gain equal
accuracy to RK4, but the computational
power needed doesn't justify using
Euler.
This may be a bit oversimplified so far as actual math, but meant as an intuitive guide to Runge Kutta integration.
Given some quantity at some time t1, we want to know the quantity at another time t2. With a first-order differential equation, we can know the rate of change of that quantity at t1. There is nothing else we can know for sure; the rest is guessing.
Euler integration is the simplest way to guess: linearly extrapolate from t1 to t2, using the precisely known rate of change at t1. This usually gives a bad answer. If t2 is far from t1, this linear extrapolation will fail to match any curvature in the ideal answer. If we take many small steps from t1 to t2, we'll have the problem of subtraction of similar values. Roundoff errors will ruin the result.
So we refine our guess. One way is to go ahead and do this linear extrapolation anyway, then hoping it's not too far off from truth, use the differential equation to compute an estimate of the rate of change at t2. This, averaged with the (accurate) rate of change at t1, better represents the typical slope of the true answer between t1 and t2. We use this to make a fresh linear extrapolation from to t1 to t2. It's not obvious if we should take the simple average, or give more weight to the rate at t1, without doing the math to estimate errors, but there is a choice here. In any case, it's a better answer than Euler gives.
Perhaps better, make our initial linear extrapolation to a point in time midway between t1 and t2, and use the differential equation to compute the rate of change there. This gives roughly as good an answer as the average just described. Then use this for a linear extrapolation from t1 to t2, since our purpose it to find the quantity at t2. This is the midpoint algorithm.
You can imagine using the mid-point estimate of the rate of change to make another linear extrapolation of the quantity from t1 to the midpoint. With the differential equation we get an better estimate of the slope there. Using this, we end by extrapolating from t1 all the way to t2 where we want an answer. This is the Runge Kutta algorithm.
Could we do a third extrapolation to the midpoint? Sure, it's not illegal, but detailed analysis shows diminishing improvement, such that other sources of error dominate the final result.
Runge Kutta applies the differential equation to the intial point t1, twice to the midpoint, and once at the final point t2. The in-between points are a matter of choice. It is possible to use other points between t1 and t2 for making those improved estimates of the slope. For example, we could use t1, a point one third the way toward t2, another 2/3 the way toward t2, and at t2. The weights for the average of the four derivatives will be different. In practice this doesn't really help, but might have a place in testing since it ought to give the same answer but will provide a different set of round off errors.
As to your question why: I recall once writing a cloth simulator where the cloth was a series of springs interconnected at nodes. In the simulator, the force exerted by the spring is proportional to how far the spring is stretched. The force causes acceleration at the node, which causes velocity which moves the node which stretches the spring. There are two integrals (integrating acceleration to get velocity, and integrating velocity to get position) and if they are inaccurate, the errors snowball: Too much acceleration causes too much velocity which causes too much stretch which causes even more acceleration, making the whole system unstable.
It is difficult to explain without graphics, but I'll try: Say you have f(t), where f(0) = 10, f(1) = 20, and f(2) = 30.
A proper integration of f(t) over the interval 0 < t < 1 would give you the surface under the graph of f(t) over that interval.
The rectangle rule integration approximates that surface with a rectangle where the breadth is the delta in time and the length is the new value of f(t), so in the interval 0 < t < 1 , it will yield 20 * 1 = 20, and in the next interval 1
Now if you were to plot these points and draw a line through them you'll see that it is actually triangular, with a surface of 30 (units), and therefore the Euler integration is inadequate.
To get a more accurate estimation of the surface (integral) you can take smaller intervals of t, evaluating at for example f(0), f(0.5), f(1), f(1.5) and f(2).
If you're still following me, the RK4 method is then simply a way of estimating values of f(t) for t0 < t < t0+dt invented by people smarter than myself for getting accurate estimates of the integral.
(but as others have said, read the Wikipedia article for a more detailed explanation. RK4 is in the category of numerical integration)
RK4 in the simplest sense is making a approximation function that that is based on 4 derivatives and point for each time step: Your initial condition at starting point A, a first approximated slope B based on data point A at your time step/2 and the slope from A, a third approximation C , which has an correction value for the slope at B to reflect the shape changes of your function, and finally a final slope based on the corrected slope at point C.
So basically this method lets you calculate using a starting point, an averaged midpoint which has corrections built into both parts to adjust for the shape, and a doubly corrected endpoint. This makes the effective contribution from each data point 1/6 1/3 1/3 and 1/6, so most of your answer is based on your corrections for the shape of your function.
It turns out that the order of an RK approximation (Euler is considered an RK1) corresponds to how its accuracy scales with smaller time steps.
The relationship between RK1 approximations is linear, so for 10 times the precision you get roughly 10 times better convergence.
For RK4, 10 times the precision yields you about 10^4 times better convergence. So while your calcuation time increases linearly in RK4, it increases your accuracy polynomially.
My question has to do with the physical meaning of the results of doing a spectral analysis of a signal, or of throwing the signal into an FFT and interpreting what comes out using a suitable numerical package,
Specifically:
take a signal, say a time-varying voltage v(t)
throw it into an FFT (you get back a sequence of complex numbers)
now take the modulus (abs) and square the result, i.e. |fft(v)|^2.
So you now have real numbers on the y axis -- shall I call these spectral coefficients?
using the sampling resolution, you follow a cookbook recipe and associate the spectral coefficients to frequencies.
AT THIS POINT, you have a frequency spectrum g(w) with frequency on the x axis, but WHAT PHYSICAL UNITS on the y axis?
My understanding is that this frequency spectrum shows how much of the various frequencies are present in the voltage signal -- they are spectral coefficients in the sense that they are the coefficients of the sines and cosines of the various frequencies required to reconstitute the original signal.
So the first question is, what are the UNITS of these spectral coefficients?
The reason this matters is that spectral coefficients can be tiny and enormous, so I want to use a dB scale to represent them.
But to do that, I have to make a choice:
Either I use the 20log10 dB conversion, corresponding to a field measurement, like voltage.
Or I use the 10log10 dB conversion, corresponding to an energy measurement, like power.
Which scaling I use depends on what the units are.
Any light shed on this would be greatly appreciated!
take a signal, a time-varying voltage v(t)
units are V, values are real.
throw it into an FFT -- ok, you get back a sequence of complex numbers
units are still V, values are complex ( not V/Hz - the FFT a DC signal becomes a point at the DC level, not an dirac delta function zooming off to infinity )
now take the modulus (abs)
units are still V, values are real - magnitude of signal components
and square the result, i.e. |fft(v)|^2
units are now V2, values are real - square of magnitudes of signal components
shall I call these spectral coefficients?
It's closer to an power density rather than usual use of spectral coefficient. If your sink is a perfect resistor, it will be power, but if your sink is frequency dependent it's "the square of the magnitude of the FFT of the input voltage".
AT THIS POINT, you have a frequency spectrum g(w): frequency on the x axis, and... WHAT PHYSICAL UNITS on the y axis?
Units are V2
The other reason the units matter is that the spectral coefficients can be tiny and enormous, so I want to use a dB scale to represent them. But to do that, I have to make a choice: do I use the 20log10 dB conversion (corresponding to a field measurement, like voltage)? Or do I use the 10log10 dB conversion (corresponding to an energy measurement, like power)?
You've already squared the voltage values, giving equivalent power into a perfect 1 Ohm resistor, so use 10log10.
log(x2) is 2 log(x), so 20log10 |fft(v)| = 10log10 ( |fft(v)|2), so alternatively if you did not square the values you could use 20log10.
The y axis is complex (as opposed to real). The magnitude is the amplitude of the original signal in whatever units your original samples were in. The angle is the phase of that frequency component.
Here's what I've been able to come up with so far:
The y-axis seems likely to be in units of [Energy / Hz] !?
Here's how I'm deriving this (feedback welcomed!):
the signal v(t) is in volts
so after taking the Fourier integral: integral e^iwt v(t) dt , we should have units of [volts*seconds], or [volts/Hz] (e^iwt is unitless)
taking the magnitude squared should then give units of [volts^2 * s^2], or [v^2 * s/Hz]
we know Power is proportional to volts ^2, so this gets us to [power * s / Hz]
but Power is the time-rate of change in energy, i.e. power = energy/s, so we can also write Energy = power * s
this leaves us with the candidate conclusion [Energy/Hz]. (Joules/Hz ?!)
... which suggests the meaning "Energy content per Hz", and suggests as a use integrating frequency bands and seeing the energy content... which would be very nice if it were true...
Continuing... assuming the above is correct, then we are dealing with an Energy measurement, so this would suggest using 10log10 conversion to get into dB scale, instead of 20log10...
...
The power into a resistor is v^2/R watts. The power of a signal x(t) is an abstraction of the power into a 1 Ohm resistor. Therefore, the power of a signal x(t) is x^2 (also called instantaneous power), regardless of the physical units of x(t).
For example, if x(t) is temperature, and the units of x(t) are degrees C, then the units for the power x^2 of x(t) are C^2, certainly not watts.
If you take the Fourier transform of x(t) to get X(jw), then the units of X(jw) are C*sec or C/Hz (according to the Fourier transform integral). If you use (abs(X(jw)))^2, then the units are C^2*sec^2=C^2*sec/Hz. Since power units are C^2, and energy units are C^2*sec, then abs(X(jw)))^2 gives the energy spectral density, say E/Hz. This is consistent with Parseval's theorem, where the energy of x(t) is given by (1/2*pi) times the integral of abs(X(jw)))^2 with respect to w, i.e., (1/2*pi)*int(abs(X(jw)))^2*dw) > (1/2*pi)*(C^2*sec^2)*2*pi*Hz > (1/2*pi)*(C^2*sec/Hz)*2*pi*Hz > E.
Conversion to a dB (log scale) scale does not change the units.
If you take the FFT of samples of x(t), written as x(n), to get X(k), then the result X(k) is an estimate of the Fourier series coefficients of a periodic function, where one period over T0 seconds is the segment of x(t) that was sampled. If the units of x(t) are degrees C, then the units of X(k) are also degrees C. The units of abs(X(k))^2 are C^2, which are the units of power. Thus, a plot of abs(X(k))^2 versus frequency shows the power spectrum (not power spectral density) of x(n), which is an estimate the power of a set of frequency components of x(t) at the frequencies k/T0 Hz.
Well, late answer I know. But I just had cause to do something like this, in a different context. My raw data was latency values for transactions against a storage unit - I resampled it to a 1ms time interval. So original data y was "latency, in microseconds." I had 2^18 = 262144 original data points, on 1ms time steps.
After I did the FFT, I got a 0th component (DC) such that the following held:
FFT[0] = 262144*(average of all input data).
So it looks to me like FFT[0] is N*(average of input data). That sort of makes sense - every single data point possesses that DC average as part of what it is, so you add 'em all up.
If you look at the definition of the FFT that makes sense too. All of the other components would involve sine and cosine terms too, but really the FFT is just a summation. The average is just the only one that happens to be present in all points equally, because you have cos(0) = 1.