FFT frequency domain values depend on sequence length?

I am extracting Heart Rate Variability (HRV) frequency domain features, e.g. LF and HF, using the FFT. I have found that the LF and HF values computed from longer sequences, e.g. 3 minutes, are larger than those from shorter sequences, e.g. 30 seconds. Is this a common observation, or is there a bug in my code? Thanks in advance.

Yes, the frequency in each bin depends on N, the sequence length.
See this related answer: https://stackoverflow.com/a/4371627/119527

An FFT by itself is a dimensionless basis transform. But if you know the sample rate (Fs) of the input data and the length (N) of the FFT, then the center frequency represented by each FFT result element or result bin is bin_index * (Fs/N).
Normally (with baseband sampling) the resulting range is from 0 (DC) up to Fs/2 (for strictly real input the rest of the FFT results are just a complex conjugate mirroring of the first half).
Added: Also, many forward FFT implementations (but not all) are energy preserving. Since a longer input signal of the same amplitude contains more total energy, the FFT result energy will also be greater in the same proportion, showing up either as larger bin magnitudes for sufficiently narrow-band components, or as energy distributed into more bins, or both.

What you are observing is to be expected, at least with most common FFT implementations. Typically there is a scale factor of N in the forward direction and 1 in the reverse direction, so you need to scale the output of the FFT bins by a factor of 1/N if you are interested in calculating spectral energy (or power).
Note however that these scale factors are just a convention; e.g. some implementations have a sqrt(N) scale factor in both the forward and reverse directions, so you need to check the documentation for your FFT library to be absolutely certain.
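As a concrete illustration of both points (a minimal MATLAB/Octave sketch, not taken from the original answers; the stand-in RR series, the sample rate Fs and the LF/HF band limits are all assumptions), you can build the frequency axis from Fs and N and apply the 1/N scaling so that band powers are comparable across segment lengths:

rr = randn(1, 720);             % stand-in for an evenly resampled RR-interval series (assumed)
Fs = 4;                         % assumed resampling rate, in Hz
x  = detrend(rr);
N  = numel(x);
X  = fft(x) / N;                % scale the bins by 1/N, as suggested above
P  = abs(X).^2;                 % per-bin power; roughly independent of N for a stationary signal
f  = (0:N-1) * (Fs / N);        % centre frequency of each bin
lf = f >= 0.04 & f < 0.15;      % conventional LF band (assumed limits)
hf = f >= 0.15 & f < 0.40;      % conventional HF band (assumed limits)
LF = 2 * sum(P(lf));            % factor 2 folds in the mirrored negative-frequency bins
HF = 2 * sum(P(hf));

With this normalization, a 3-minute segment and a 30-second segment of a stationary signal should give band powers of the same order of magnitude.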

Related

Mathematical representation of a set of points in N dimensional space?

Given some x data points in an N-dimensional space, I am trying to find a fixed-length representation that could describe any subset s of those x points. For example, the mean of the subset s describes that subset, but it is not unique to that subset: other points in the space could yield the same mean, so the mean is not a unique identifier. Could anyone tell me of a unique measure that could describe the points without depending on the number of points?
In short, it is impossible (you would achieve infinite noiseless compression). You either have to accept a variable-length representation (or a fixed length proportional to the maximum number of points), or deal with "collisions" (since your mapping will not be injective). In the first scenario you can simply store the coordinates of each point. In the second you approximate your point clouds with more and more complex descriptors to balance collisions against memory usage; some possibilities (see the sketch after this list) are:
storing the mean and covariance (essentially performing maximum likelihood estimation over the Gaussian family)
performing some fixed-complexity density estimation, such as a Gaussian Mixture Model, or training a generative neural network
using a set of simple geometric/algebraic properties such as:
number of points
mean, max, min, median distance between each pair of points
etc.
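A minimal sketch of the first option (the s-by-N matrix P holding one point per row is an assumed layout, not something from the original answer):

P    = randn(40, 3);            % example: s = 40 points in N = 3 dimensions (assumed)
mu   = mean(P, 1);              % 1-by-N mean vector
C    = cov(P);                  % N-by-N covariance matrix (needs at least two points)
desc = [mu, C(:)'];             % fixed length N + N^2, independent of s

As the answer notes, such a descriptor is not injective: different subsets can share the same mean and covariance.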
Any subset can be identified by a bit mask of length x, where bit i is 1 if the i-th element belongs to the subset. There is no fixed-length representation that is not a function of x.
EDIT
I was wrong. PCA is a good way to perform dimensionality reduction for this problem, but it won't work for some sets.
However, you can almost do it, where "almost" is formally defined by the Johnson-Lindenstrauss Lemma. It states that for a given large dimension N there exists a much lower dimension n and a linear transformation that maps each point from N dimensions to n dimensions while keeping the Euclidean distance between every pair of points of the set within some error ε of the original. Such a linear transformation is called the JL Transform.
In other words, your problem is only solvable for sets of points where each pair of points is separated by at least ε. For this case, the JL Transform gives you one possible solution. Moreover, there exists a relationship between N, n and ε (see the lemma) such that, for example, if N=100, the JL Transform can map each point to a point in 5D (n=5) and uniquely identify each subset if and only if the minimum distance between any pair of points in the original set is at least ~2.8 (i.e. the points are sufficiently different).
Note that n depends only on N and the minimum distance between any pair of points in the original set. It does not depend on the number of points x, so it is a solution to your problem, albeit with some constraints.
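A minimal sketch of a random projection in the spirit of the JL Transform (illustrative only, not the construction from the lemma's proof; the point set, the dimensions and the Gaussian projection matrix are all assumptions):

x = 50;  N = 100;  n = 5;       % number of points and dimensions (assumed)
P = randn(x, N);                % example point set, one point per row
R = randn(N, n) / sqrt(n);      % random Gaussian projection matrix
Q = P * R;                      % projected points, x-by-n
% With high probability, pairwise Euclidean distances in Q approximate those
% in P to within a relative error that shrinks as n grows.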

Simple 2 or 3 parameters float PRNG formula that changes faster than the float resolution and produces white noise?

I'm looking for a 2 or 3 parameters math formula with the following characteristics:
Simple (the fewest amount of operations the better)
Random output (non-periodic)
Normalized (Meaning the output will never be outside a given range; doesn't matter the range since once I know the range I can just divide and add/subtract to get it into the 0 to 1 range I'm looking for)
White noise (the more samples you get the more evenly distributed the outputs get across the range of possible output values, with no gaps or hotspots, to the extent permitted by the floating-point standard)
Random all the way down (no gradual changes between output values even if the inputs are changed by the smallest amount the float standard will allow. I understand that given the nature of randomness, it is possible two output values might be close together once in a while, but that must only happen by coincidence, and not because of smoothness or periodicity)
Uses only the operations listed below (but of course, any operations that can be built by combining the ones listed below are also allowed)
I need this because I need a good source of controllable randomness for some experiments I'm doing with Cycles material nodes in Blender. And since that is where the formula will be implemented, the only operations I have available are:
Addition
Subtraction
Multiplication
Division
Power (X to the power of Y)
Logarithm (I think it's X Log Y; I'm not very familiar with the logarithm operation, so I'm not 100% sure if that is enough to specify which type of logarithm it is; let me know if you need more information about it)
Sine
Cosine
Tangent
Arcsine
Arccosine
Arctangent (not Atan2, but that can be created by combining operations if necessary)
Minimum (Returns the lowest of 2 numbers)
Maximum (Returns the highest of 2 numbers)
Round (Returns the closest round number to the input)
Less-than (Returns 1 if X is less than Y, zero otherwise)
Greater-than (Returns 1 if X is more than Y, zero otherwise)
Modulo (Produces a sawtooth pattern of period Y; for positive X values it's in the 0 to Y range, and for negative values of X it's in the -Y to zero range)
Absolute (strips the sign of the input value, makes it positive if it was negative, doesn't do anything if it's already positive)
There is no iteration or looping functionality available (and of course, branching can only be done by evaluating all the branches, multiplying the results of the branches not meant to be taken by zero, and adding the results together, as sketched below).
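A minimal sketch of that branchless selection (written here in MATLAB-style code purely for illustration, since the real implementation would be Cycles math nodes; the inputs x, y, a and b are assumed example values):

x = 0.3;  y = 0.7;  a = 10;  b = 20;     % example inputs (assumed)
sel    = double(x < y);                  % the Less-than operation: 1 if x < y, else 0
result = sel * a + (1 - sel) * b;        % both branches are evaluated, one is kept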

how to compute the global variance (square standard deviation) in a parallel application?

I have a parallel application in which each node computes the variance of its partition of the data points based on that partition's mean, but how can I compute the global variance over all the data?
I thought it would simply be the sum of the variances divided by the number of nodes, but that is not giving me a result close to the correct one...
The total variation (the sum of squared deviations) is itself just a sum. You can compute the partial sums in parallel and then add them together:
sum(x1...x100) = sum(x1...x50) + sum(x51...x100)
In the same way, you can compute the global mean: compute the global sum and the total object count, then divide (don't divide by the number of nodes, but by the total number of objects).
mean = sum/count
Once you have the mean, you can compute the sum of squared deviations using the distributed sum formula above (applied to (xi-mean)^2), then divide by count-1 to get the variance.
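A minimal sketch of this two-pass approach (the cell-array layout and the example data are assumptions, not from the original answer):

% parts holds one numeric vector per node/partition (assumed layout).
parts  = {randn(1, 1000), randn(1, 2500) + 0.1, randn(1, 500)};
% Pass 1: global mean from per-partition sums and counts.
sums   = cellfun(@sum,   parts);
counts = cellfun(@numel, parts);
mu     = sum(sums) / sum(counts);
% Pass 2: global sum of squared deviations from the global mean.
ssd      = cellfun(@(p) sum((p - mu).^2), parts);
variance = sum(ssd) / (sum(counts) - 1);    % sample variance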
Do not use E[X^2] - (E[X])^2
While this formula "mean of square minus square of mean" is highly popular, it is numerically unstable when you are using floating point math. It's known as catastrophic cancellation.
Because the two values can be very close, you lose a lot of digits in precision when computing the difference. I've seen people get a negative variance this way...
With "big data", numerical problems gets worse...
Two ways to avoid these problems:
Use two passes. Computing the mean is stable, and gets you rid of the subtraction of the squares.
Use an online algorithm such as the one by Knuth and Welford, then use weighted sums to combine the per-partition means and variances. Details on Wikipedia In my experience, this often is slower; but it may be beneficial on Hadoop due to startup and IO costs.
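A minimal sketch of the pairwise merge used with such online accumulators (this is the standard parallel-combination formula attributed to Chan et al.; the function name and variable names are mine, not from the original answer):

% Each partition contributes a count n_i, a mean m_i, and
% M2_i = sum((x - m_i).^2) over its own data.
function [n, m, M2] = mergeStats(n1, m1, M2_1, n2, m2, M2_2)
    n     = n1 + n2;
    delta = m2 - m1;
    m     = m1 + delta * n2 / n;                  % combined mean
    M2    = M2_1 + M2_2 + delta^2 * n1 * n2 / n;  % combined sum of squared deviations
end
% The merged variance is then M2 / (n - 1).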
You need to add the sums and sums of squares of each partition to get the global sum and sum of squares and then use them to calculate the global mean and variance.
UPDATE: E[X^2] - (E[X])^2 and cancellation...
To figure out how important cancellation error is when calculating the standard deviation with
σ = √(E[X^2] - (E[X])^2)
let us assume that we have both E[X^2] and (E[X])^2 accurate to 12 significant decimal figures. This implies that σ^2 has an error of order 10^-12 × E[X^2] or, if there has been significant cancellation, equivalently 10^-12 × (E[X])^2, in which case σ will have an error of approximate order 10^-6 × E[X]: one millionth of the mean.
For many, if not most, statistical analyses this is negligible, in the sense that it falls within other sources of error (like measurement error), and so you can in good conscience simply set negative variances to zero before you take the square root.
If you really do care about deviations of this magnitude (and can show that it's a feature of the thing you are measuring and not, for example, an artifact of the method of measurement) then you can start worrying about cancellation. That said, the most likely explanation is that you have used an inappropriate scale for your data, such as measuring daily temperatures in Kelvin rather than Celsius!

Finding a reasonable (noise-free) maximum element in a vector

Consider a vector V riddled with noisy elements. What would be the fastest (or any) way to find a reasonable maximum element?
For e.g.,
V = [1 2 3 4 100 1000]
rmax = 4;
I was thinking of sorting the elements and finding the second differential {i.e. diff(diff(unique(V)))}.
EDIT: Sorry about the delay.
I can't post any representative data since it contains 6.15e5 elements. But here's a plot of the sorted elements.
By just looking at the plot, a piecewise linear function may work.
Anyway, regarding my previous conjecture about using differentials, here's a plot of diff(sort(V));
I hope it's clearer now.
EDIT: Just to be clear, the desired "maximum" value would be the value right before the step in the plot of the sorted elements.
NEW ANSWER:
Based on your plot of the sorted amplitudes, your diff(sort(V)) algorithm would probably work well. You would simply have to pick a threshold for what constitutes "too large" a difference between the sorted values. The first point in your diff(sort(V)) vector that exceeds that threshold is then used to get the threshold to use for V. For example:
diffThreshold = 2e5;
sortedVector = sort(V);
index = find(diff(sortedVector) > diffThreshold,1,'first');
signalThreshold = sortedVector(index);
Another alternative, if you're interested in toying with it, is to bin your data using HISTC. You would end up with groups of highly-populated bins at both low and high amplitudes, with sparsely-populated bins in between. It would then be a matter of deciding which bins you count as part of the low-amplitude group (such as the first group of bins that contain at least X counts). For example:
binEdges = min(V):1e7:max(V); % Create vector of bin edges
n = histc(V,binEdges); % Bin amplitude data
binThreshold = 100; % Pick threshold for number of elements in bin
index = find(n < binThreshold,1,'first'); % Find first bin whose count is low
signalThreshold = binEdges(index);
OLD ANSWER (for posterity):
Finding a "reasonable maximum element" is wholly dependent upon your definition of reasonable. There are many ways you could define a point as an outlier, such as simply picking a set of thresholds and ignoring everything outside of what you define as "reasonable". Assuming your data has a normal-ish distribution, you could probably use a simple data-driven thresholding approach for removing outliers from a vector V using the functions MEAN and STD:
nDevs = 2; % The number of standard deviations to use as a threshold
index = abs(V-mean(V)) <= nDevs*std(V); % Index of "reasonable" values
maxValue = max(V(index)); % Maximum of "reasonable" values
I would not sort then difference. If you have some reason to expect continuity or bounded change (the vector is of consecutive sensor readings), then sorting will destroy the time information (or whatever the vector index represents). Filtering by detecting large spikes isn't a bad idea, but you would want to compare the spike to a larger neighborhood (2nd difference effectively has you looking within a window of +-2).
You need to describe formally the expected information in the vector, and the type of noise.
You need to know the frequency and distribution of errors and non-errors. In the simplest model, the elements in your vector are independent and identically distributed, and errors are all or none (you randomly choose to store the true value, or an error). You should be able to figure out for each element the chance that it's accurate, vs. the chance that it's noise. This could be very easy (error data values are always in a certain range which doesn't overlap with non-error values), or very hard.
To simplify: don't make any assumptions about what kind of data an error produces (the worst case is: you can't rule out any of the error data points as ridiculous, but they're all at or above the maximum among the non-error measurements). Then, if the probability of error is p and your vector has n elements, the chance that the kth highest element in the vector is less than or equal to the true maximum is given by the cumulative binomial distribution - http://en.wikipedia.org/wiki/Binomial_distribution
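A minimal sketch of that calculation under the simple model above, where each element is independently an error with probability p and every error lies at or above the true maximum (the numbers are illustrative assumptions, and binocdf requires the Statistics Toolbox):

p = 0.01;                      % assumed per-element error probability
n = 1000;                      % number of elements in the vector
k = 20;                        % take the k-th highest element as the "reasonable maximum"
% The k-th highest element is <= the true maximum whenever there are at most
% k-1 errors among the n elements:
probOK = binocdf(k - 1, n, p); % about 0.997 for these numbers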
First, pick your favorite method for identifying outliers...
If you expect the numbers to come from a normal distribution, you can use, say, 2 standard deviations above the mean to determine your max.
Do you have access to bounds on your noise-free elements? For example, do you know that your noise-free elements are between -10 and 10?
In that case, you could remove the noise and then find the max:
max( v( find(v<=10 & v>=-10) ) )

Units of a Fourier Transform (FFT) when doing Spectral Analysis of a Signal

My question has to do with the physical meaning of the results of doing a spectral analysis of a signal, or of throwing the signal into an FFT and interpreting what comes out using a suitable numerical package.
Specifically:
take a signal, say a time-varying voltage v(t)
throw it into an FFT (you get back a sequence of complex numbers)
now take the modulus (abs) and square the result, i.e. |fft(v)|^2.
So you now have real numbers on the y axis -- shall I call these spectral coefficients?
using the sampling resolution, you follow a cookbook recipe and associate the spectral coefficients to frequencies.
AT THIS POINT, you have a frequency spectrum g(w) with frequency on the x axis, but WHAT PHYSICAL UNITS on the y axis?
My understanding is that this frequency spectrum shows how much of the various frequencies are present in the voltage signal -- they are spectral coefficients in the sense that they are the coefficients of the sines and cosines of the various frequencies required to reconstitute the original signal.
So the first question is, what are the UNITS of these spectral coefficients?
The reason this matters is that spectral coefficients can be tiny and enormous, so I want to use a dB scale to represent them.
But to do that, I have to make a choice:
Either I use the 20log10 dB conversion, corresponding to a field measurement, like voltage.
Or I use the 10log10 dB conversion, corresponding to an energy measurement, like power.
Which scaling I use depends on what the units are.
Any light shed on this would be greatly appreciated!
take a signal, a time-varying voltage v(t)
units are V, values are real.
throw it into an FFT -- ok, you get back a sequence of complex numbers
units are still V, values are complex (not V/Hz: the FFT of a DC signal becomes a point at the DC level, not a Dirac delta function zooming off to infinity)
now take the modulus (abs)
units are still V, values are real - magnitude of signal components
and square the result, i.e. |fft(v)|^2
units are now V^2, values are real - square of magnitudes of signal components
shall I call these spectral coefficients?
It's closer to a power density than to the usual use of "spectral coefficient". If your sink is a perfect resistor it will be a power, but if your sink is frequency dependent it's just "the square of the magnitude of the FFT of the input voltage".
AT THIS POINT, you have a frequency spectrum g(w): frequency on the x axis, and... WHAT PHYSICAL UNITS on the y axis?
Units are V^2.
The other reason the units matter is that the spectral coefficients can be tiny and enormous, so I want to use a dB scale to represent them. But to do that, I have to make a choice: do I use the 20log10 dB conversion (corresponding to a field measurement, like voltage)? Or do I use the 10log10 dB conversion (corresponding to an energy measurement, like power)?
You've already squared the voltage values, giving equivalent power into a perfect 1 Ohm resistor, so use 10log10.
log(x^2) is 2 log(x), so 20 log10|fft(v)| = 10 log10(|fft(v)|^2); alternatively, if you did not square the values, you could use 20log10.
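A quick numeric check of that equivalence (illustrative MATLAB; the random test signal is an assumption):

v = randn(1, 1024);                          % assumed test signal, in volts
V = fft(v);
dB_from_power     = 10 * log10(abs(V).^2);   % treating |fft(v)|^2 as a power-like quantity
dB_from_magnitude = 20 * log10(abs(V));      % treating |fft(v)| as a field-like quantity
max(abs(dB_from_power - dB_from_magnitude))  % essentially zero (rounding error only)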
The y axis of the raw FFT is complex (as opposed to real). The magnitude is the amplitude of the original signal in whatever units your original samples were in; the angle is the phase of that frequency component.
Here's what I've been able to come up with so far:
The y-axis seems likely to be in units of [Energy / Hz] !?
Here's how I'm deriving this (feedback welcomed!):
the signal v(t) is in volts
so after taking the Fourier integral: integral e^iwt v(t) dt , we should have units of [volts*seconds], or [volts/Hz] (e^iwt is unitless)
taking the magnitude squared should then give units of [volts^2 * s^2], or [v^2 * s/Hz]
we know Power is proportional to volts ^2, so this gets us to [power * s / Hz]
but Power is the time-rate of change in energy, i.e. power = energy/s, so we can also write Energy = power * s
this leaves us with the candidate conclusion [Energy/Hz]. (Joules/Hz ?!)
... which suggests the meaning "Energy content per Hz", and suggests as a use integrating frequency bands and seeing the energy content... which would be very nice if it were true...
Continuing... assuming the above is correct, then we are dealing with an Energy measurement, so this would suggest using 10log10 conversion to get into dB scale, instead of 20log10...
...
The power into a resistor is v^2/R watts. The power of a signal x(t) is an abstraction of the power into a 1 Ohm resistor. Therefore, the power of a signal x(t) is x^2 (also called instantaneous power), regardless of the physical units of x(t).
For example, if x(t) is temperature, and the units of x(t) are degrees C, then the units for the power x^2 of x(t) are C^2, certainly not watts.
If you take the Fourier transform of x(t) to get X(jw), then the units of X(jw) are C*sec, or C/Hz (according to the Fourier transform integral). If you use (abs(X(jw)))^2, then the units are C^2*sec^2 = C^2*sec/Hz. Since power units are C^2 and energy units are C^2*sec, (abs(X(jw)))^2 gives the energy spectral density, say E/Hz. This is consistent with Parseval's theorem, where the energy of x(t) is given by (1/(2*pi)) times the integral of (abs(X(jw)))^2 with respect to w, i.e., (1/(2*pi)) * int((abs(X(jw)))^2 dw) -> (1/(2*pi)) * (C^2*sec^2) * (2*pi*Hz) -> (1/(2*pi)) * (C^2*sec/Hz) * (2*pi*Hz) -> E.
Conversion to a dB (log scale) scale does not change the units.
If you take the FFT of samples of x(t), written as x(n), to get X(k), then the result X(k) is an estimate of the Fourier series coefficients of a periodic function, where one period over T0 seconds is the segment of x(t) that was sampled. If the units of x(t) are degrees C, then the units of X(k) are also degrees C. The units of abs(X(k))^2 are C^2, which are the units of power. Thus, a plot of abs(X(k))^2 versus frequency shows the power spectrum (not power spectral density) of x(n), which is an estimate the power of a set of frequency components of x(t) at the frequencies k/T0 Hz.
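A minimal numerical check of the discrete analogue of Parseval's theorem (the random signal and its length are assumptions; MATLAB's fft is unnormalized, hence the 1/N on the frequency-domain side):

N = 4096;
x = randn(1, N);                   % assumed sample signal
X = fft(x);
energy_time = sum(abs(x).^2);      % "energy" of the samples
energy_freq = sum(abs(X).^2) / N;  % the same quantity computed from the FFT bins
% energy_time and energy_freq agree to within rounding error.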
Well, late answer I know. But I just had cause to do something like this, in a different context. My raw data was latency values for transactions against a storage unit - I resampled it to a 1ms time interval. So original data y was "latency, in microseconds." I had 2^18 = 262144 original data points, on 1ms time steps.
After I did the FFT, I got a 0th component (DC) such that the following held:
FFT[0] = 262144*(average of all input data).
So it looks to me like FFT[0] is N*(average of input data). That sort of makes sense - every single data point possesses that DC average as part of what it is, so you add 'em all up.
If you look at the definition of the FFT, that makes sense too. All of the other components involve sine and cosine terms as well, but the FFT is really just a summation; the average is simply the one component whose weight is the same for every point, because cos(0) = 1.
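A quick way to confirm that relationship (illustrative MATLAB; the random stand-in data replaces the latency trace from the answer):

N = 262144;
y = 1000 + 50 * randn(1, N);   % stand-in for the latency samples (assumed)
Y = fft(y);
Y(1)                           % the DC bin (FFT[0] in zero-based notation)
N * mean(y)                    % matches Y(1) up to rounding error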
