How to compute the derivative of a function through the Fourier domain?

I am currently working on a project where I have to reconstruct an image from its gradient by solving a Poisson equation, which we solve in the Fourier domain.
The solution involves the product of the image's FT and that of the discrete derivative filter. In the Fourier domain, this product is taken coordinate-wise.
I understand how the FT of an image is computed, but I have trouble understanding how I should compute that of a filter, such as [0 -1 1] for horizontal differences. Should I use the same formula as for images? This seems strange to me, as I would keep only 2 components of my FT after multiplying it with the image's FT.

To compute the convolution through the Fourier domain, one first pads the kernel with zeros to the same size as the image, then computes the FFT of both the image and the padded kernel, and then multiplies the two frequency spectra. It is important that the origin of the kernel be put in the right place when padding. See this answer for details on how to do the padding right.
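For reference, here is a minimal NumPy sketch of that pad-and-multiply procedure (the function name, the use of np.roll to move the kernel origin to index (0, 0), and the periodic boundary assumption are my own illustrative choices, not taken from the linked answer):

import numpy as np

def convolve_via_fft(image, kernel):
    # Circular convolution of 'image' with a small 'kernel' through the Fourier domain.
    # Note: this is a true convolution, i.e. the kernel is flipped relative to correlation.
    padded = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Move the kernel's centre element to index (0, 0) so the result is not shifted;
    # np.roll wraps the cut-off part around to the other side.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

# e.g. the horizontal difference filter as a 1x3 kernel:
# dx = convolve_via_fft(img, np.array([[0.0, -1.0, 1.0]]))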
However, to compute derivatives you do not want to do it this way. Instead, use the Fourier property that the derivative in the spatial domain is a multiplication with jω.
The [1,0,-1] filter (or [0,1,-1] or whichever you want to use) is a discrete approximation to the derivative. If you go through the Fourier domain, you might as well compute the exact derivative.
For example, in MATLAB you would do:
a = double(imread('cameraman.tif')); % convert to double before taking the FFT
A = fft2(a);
N = size(A,2); % we're computing the x-derivative here, that is dimension 2 in MATLAB
w = ifftshift((0:N-1)-floor(N/2)) * (2*pi/N); % frequencies in radians per sample
B = A .* (1i * w); % for MATLAB R2016a and older, use bsxfun here
b = real(ifft2(B));

Related

The reason for using biases in networks?

It may be easy to see, but I still don't understand why we use a bias in a neural network. The weights' values will get changed during training, which is what makes the algorithm learn. So why use a bias in all of this?
Because of linear equations.
Bias is another learned parameter. A single neuron will compute w*x + b where w is your weight parameter and b your bias.
Perhaps this helps you: let's assume you are dealing with a 2D Euclidean space that you'd like to classify with two labels. You can do that by computing a linear function and then classifying everything below it with one label and everything above it with the other. If you did not use the bias, you could only change the slope of your function, and your function would always pass through (0, 0). The bias gives you the possibility to define where that linear function intersects the y-axis for x = 0, i.e. (0, y). For example, without a bias you could not separate data that can only be separated by a line that does not pass through the origin.
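A toy sketch of that picture (the step activation and the specific numbers are only for illustration):

import numpy as np

def neuron(x, w, b=0.0):
    # A single neuron with a step activation: the decision boundary is w . x + b = 0.
    # With b = 0 the boundary is forced to pass through the origin.
    return float(np.dot(w, x) + b > 0)

w = np.array([0.0, 1.0])                         # boundary is the horizontal line y = -b
print(neuron(np.array([3.0, 0.5]), w, b=0.0))    # 1.0: the point lies above y = 0
print(neuron(np.array([3.0, 0.5]), w, b=-1.0))   # 0.0: the boundary y = 1 is only reachable with a bias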

Perlin noise different implementations

I have read some articles on Perlin noise, but each seems to have its own way of implementing it:
In this article, the gradient function returns a single double value.
In this article, the gradient is generated as a 3D vector.
In this article a static 256 array of random gradient vectors is generated and a random one is picked using the permutation table and then more complex details of spherical gradients are discussed.
And these are just a few of the articles I saw. With all these variations of the same algorithm, which one do I use, or which one is suitable for what purpose?
I have generated terrains and height maps with each of these techniques and their respective outputs differ widely in their own ways, and I can't tell if I am doing it right because I don't know what to look for in the output (because it's just random values in the end).
I am just looking for some context on when to use what, so any insight would be very useful.
There are multiple ways to implement the same algorithm; some are faster or slower than others, and some are easier or harder to understand. The original implementation by Ken Perlin is difficult to understand by just looking at it. So some of the articles you linked (including #2, which I wrote, yay!) try to simplify the implementation to make it easier to understand.
But in the end, it's exactly the same algorithm:
Take the input, calculate the coordinates of the 4 corners of the square (for 2D Perlin noise, or cube if using the 3D version) containing the input point
Calculate a random value for all 4 of them (by first assigning a random gradient vector to each one (there are 4 possibilities in 2D: (+1, +1), (-1, +1), (-1, -1) and (+1, -1)), then calculating the dot product between this random gradient vector and the vector from the corner of the square to the input point)
Finally, smoothly interpolate between those 4 random values to get a final value
In article #1, the grad function returns the dot product directly, whereas in article #2, vector objects are created and a dot product function is called to make it explicit what is being done (this will probably be a bit slower than the other implementations since a lot of vector objects are created and used briefly each time you want to run the algorithm).
Whether 2 implementations will produce the same terrain / height maps depends on whether they generate the same random values for each corner of the square/cube (the results of the dot products). If 2 algorithms generate the same random values for every single integer point on the grid (all the corners of all the possible squares/cubes), then they will produce the same results. Ken Perlin's original implementation and the 3 articles all use an array of integers to generate a random gradient vector for each corner (out of 4 possible choices) to calculate the dot product. So in theory, if the arrays are identical, then they should produce the same results. (Unless maybe some implementation uses another method to generate the random vectors.)
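For reference, here is a compact Python sketch of the steps described above for the 2D case (the permutation table and the restriction to the 4 diagonal gradients are illustrative choices; the linked articles differ mainly in how the grad step is done):

import math, random

random.seed(0)
perm = list(range(256))
random.shuffle(perm)
perm += perm                                  # doubled so nested lookups never run off the end
GRADS = [(1, 1), (-1, 1), (-1, -1), (1, -1)]  # the 4 possible gradient vectors in 2D

def fade(t):
    # Perlin's smoothstep: 6t^5 - 15t^4 + 10t^3
    return t * t * t * (t * (t * 6 - 15) + 10)

def corner_value(ix, iy, dx, dy):
    # Pick a pseudo-random gradient for the corner, then dot it with the offset to the point.
    gx, gy = GRADS[perm[perm[ix & 255] + (iy & 255)] & 3]
    return gx * dx + gy * dy

def perlin2d(x, y):
    x0, y0 = math.floor(x), math.floor(y)     # corner of the containing square
    fx, fy = x - x0, y - y0                   # position of the point inside the square
    n00 = corner_value(x0,     y0,     fx,     fy)
    n10 = corner_value(x0 + 1, y0,     fx - 1, fy)
    n01 = corner_value(x0,     y0 + 1, fx,     fy - 1)
    n11 = corner_value(x0 + 1, y0 + 1, fx - 1, fy - 1)
    u, v = fade(fx), fade(fy)
    nx0 = n00 + u * (n10 - n00)               # interpolate along x on the bottom edge ...
    nx1 = n01 + u * (n11 - n01)               # ... and on the top edge
    return nx0 + v * (nx1 - nx0)              # then interpolate along y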
I'm not really sure if that answers your questions, so don't hesitate to ask something else :)
Edit:
Generally, you would not use Perlin noise alone. So for every final value you want (for example a single pixel in a height map texture), you would call the noise function multiple times (octaves). For example:
float finalValue = 0.0f;
float amplitude = 1.0f;
float frequency = 1.0f;
int octaveCount = 8;
for (int octave = 0; octave < octaveCount; ++octave) {
    finalValue += amplitude * noise(x * frequency, y * frequency, z * frequency);
    amplitude *= 0.5f;
    frequency *= 2.0f;
}
// Do something fun with 'finalValue'
Frequency, amplitude and the number of octaves are the most common parameters you can play with to produce different values.
If, say, you are generating a terrain, you would want many octaves. The first one will produce the rough shape of the mountains, so you would want a high amplitude (1.0 in the example code) and a low frequency (also 1.0 in the above code). But just this octave would result in really smooth terrain with no details. For those small details, you would want more octaves, but with higher frequencies (so for the same range of inputs (x, y, z), you would have a lot more ups and downs of the Perlin noise value) and lower amplitudes. You want small details, because if you kept the same amplitude as the first octave (1.0 in the example code), there would be a lot of ups and downs really close together and really high, and this would result in really rough mountains (imagine 100-meter drops and 80-degree slopes every few meters you walk).
You can play with those parameters to get different results. There is also something called "domain warping" or "warped noise" that you can look up. Basically, you call a noise function as the input of a noise function. Like instead of calling:
float result = noise(x, y, z);
You would call something like:
// The numbers used are arbitrary values, you can just play around until you get something cool
float result = noise(noise(x * 1.7), 0.5 * noise(y * 4.1), noise(z * 2.3));
This can produce really interesting results

Rotate model around x,y,z axes, without gimbal lock, with input data always as x,y,z axes angle rotations

I have an input device that gives me 3 angles -- rotation around x,y,z axes.
Now I need to use these angles to rotate the 3D space, without gimbal lock. I thought I could convert to Quaternions, but apparently since I'm getting the data as 3 angles this won't help?
If that's the case, just how can I correctly rotate the space, keeping in mind that my input data simply is x,y,z axes rotation angles, so I can't just "avoid" that. Similarly, moving around the order of axes rotations won't help -- all axes will be used anyway, so shuffling the order around won't accomplish anything. But surely there must be a way to do this?
If it helps, the problem can pretty much be reduced to implementing this function:
void generateVectorsFromAngles(double &lastXRotation,
                               double &lastYRotation,
                               double &lastZRotation,
                               JD::Vector &up,
                               JD::Vector &viewing) {
    JD::Vector yaxis = JD::Vector(0,0,1);
    JD::Vector zaxis = JD::Vector(0,1,0);
    JD::Vector xaxis = JD::Vector(1,0,0);
    up.rotate(xaxis, lastXRotation);
    up.rotate(yaxis, lastYRotation);
    up.rotate(zaxis, lastZRotation);
    viewing.rotate(xaxis, lastXRotation);
    viewing.rotate(yaxis, lastYRotation);
    viewing.rotate(zaxis, lastZRotation);
}
in a way that avoids gimbal lock.
If your device is giving you absolute X/Y/Z angles (which implies something like actual gimbals), it will have some specific sequence to describe what order the rotations occur in.
Since you say that "the order doesn't matter", this suggests your device is something like (almost certainly?) a 3-axis rate gyro, and you're getting differential angles. In this case, you want to combine your 3 differential angles into a rotation vector, and use this to update an orientation quaternion, as follows:
given differential angles (in radians):
dXrot, dYrot, dZrot
and current orientation quaternion Q such that:
{r=0, ijk=rot(v)} = Q {r=0, ijk=v} Q*
construct an update quaternion:
dQ = {r=1, i=dXrot/2, j=dYrot/2, k=dZrot/2}
and update your orientation:
Q' = normalize( quaternion_multiply(dQ, Q) )
Note that dQ is only a crude approximation of a unit quaternion (which makes the normalize() operation more important than usual). However, if your differential angles are not large, it is actually quite a good approximation. Even if your differential angles are large, this simple approximation makes less nonsense than many other things you could do. If you have problems with large differential angles, you might try adding a quadratic correction to improve your accuracy (as described in the third section).
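As a minimal sketch (in Python/NumPy, with quaternions stored as (r, i, j, k); the function names are mine, not from any particular library), the approximate update could look like this:

import numpy as np

def quat_mul(p, q):
    # Hamilton product of two quaternions stored as (r, i, j, k).
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return np.array([
        pr*qr - pi*qi - pj*qj - pk*qk,
        pr*qi + pi*qr + pj*qk - pk*qj,
        pr*qj - pi*qk + pj*qr + pk*qi,
        pr*qk + pi*qj - pj*qi + pk*qr,
    ])

def update_orientation(Q, dXrot, dYrot, dZrot):
    # Apply differential angles (radians) using the crude update quaternion dQ = (1, dV/2).
    dQ = np.array([1.0, dXrot / 2, dYrot / 2, dZrot / 2])
    Qn = quat_mul(dQ, Q)
    return Qn / np.linalg.norm(Qn)            # re-normalise on every step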
However, a more likely problem is that any kind of repeated update like this tends to drift, simply from accumulated arithmetic error if nothing else. Also, your physical sensors will have bias -- e.g., your rate gyros will have offsets which, if not corrected for, will cause your orientation estimate Q to precess slowly. If this kind of drift matters to your application, you will need some way to detect/correct it if you want to maintain a stable system.
If you do have a problem with large differential angles, there is a trigonometric formula for computing an exact update quaternion dQ. The assumption is that the total rotation angle should be linearly proportional to the magnitude of the input vector; given this, you can compute an exact update quaternion as follows:
given differential half-angle vector (in radians):
dV = (dXrot, dYrot, dZrot)/2
compute the magnitude of the half-angle vector:
theta = |dV| = 0.5 * sqrt(dXrot^2 + dYrot^2 + dZrot^2)
then the update quaternion, as used above, is:
dQ = {r=cos(theta), ijk=dV*sin(theta)/theta}
= {r=cos(theta), ijk=normalize(dV)*sin(theta)}
Note that directly computing either sin(theta)/theta or normalize(dV) is singular near zero, but the limit value of vector ijk near zero is simply ijk = dV = (dXrot, dYrot, dZrot), as in the approximation from the first section. If you do compute your update quaternion this way, the straightforward method is to check for this, and use the approximation for small theta (for which it is an extremely good approximation!).
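A sketch of that exact update, with the small-theta fallback just mentioned (same (r, i, j, k) convention; feed the result into the same quaternion_multiply/normalize step as before):

import numpy as np

def exact_update_quaternion(dXrot, dYrot, dZrot, eps=1e-6):
    dV = np.array([dXrot, dYrot, dZrot]) / 2.0   # half-angle vector
    theta = np.linalg.norm(dV)
    if theta < eps:
        # limit value near zero: ijk -> dV, as in the approximation from the first section
        return np.array([1.0, dV[0], dV[1], dV[2]])
    s = np.sin(theta) / theta
    return np.array([np.cos(theta), dV[0] * s, dV[1] * s, dV[2] * s])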
Finally, another approach is to use a Taylor expansion for cos(theta) and sin(theta)/theta. This is an intermediate approach -- an improved approximation that increases the range of accuracy:
cos(x) ~ 1 - x^2/2 + x^4/24 - x^6/720 ...
sin(x)/x ~ 1 - x^2/6 + x^4/120 - x^6/5040 ...
So, the "quadratic correction" mentioned in the first section is:
dQ = {r=1-theta*theta*(1.0/2), ijk=dV*(1-theta*theta*(1.0/6))}
Q' = normalize( quaternion_multiply(dQ, Q) )
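A sketch of that quadratic correction (same conventions as the sketches above):

import numpy as np

def quadratic_update_quaternion(dXrot, dYrot, dZrot):
    dV = np.array([dXrot, dYrot, dZrot]) / 2.0
    t2 = float(dV @ dV)                           # theta^2
    return np.array([1.0 - t2 / 2.0,
                     dV[0] * (1.0 - t2 / 6.0),
                     dV[1] * (1.0 - t2 / 6.0),
                     dV[2] * (1.0 - t2 / 6.0)])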
Additional terms will extend the accurate range of the approximation, but if you need more than +/-90 degrees per update, you should probably use the exact trig functions described in the second section. You could also use a Taylor expansion in combination with the exact trigonometric solution -- it may be helpful by allowing you to switch seamlessly between the approximation and the exact formula.
I think that the 'gimbal lock' is not a problem of computations/mathematics but rather a problem of some physical devices.
Given that you can represent any orientation with XYZ rotations, even at the 'gimbal lock point' there is an XYZ representation for any imaginable orientation change. Your physical gimbal may not be able to rotate this way, but the mathematics still works :).
The only problem here is your input device - if it's gimbal then it can lock, but you didn't give any details on that.
EDIT: OK, so after you added a function I think I see what you need. The function is perfectly correct. But sadly, you just can't get a nice, easy, continuous way of editing an orientation using XYZ axis rotations. I haven't seen such a solution even in professional 3D packages.
The only thing that comes to my mind is to treat your input like steering an aeroplane - you just have some initial orientation and you can rotate it around the X, Y or Z axis by some amount. Then you store the new orientation and clear your inputs. Rotations in 3DMax/Maya/Blender are done the same way.
If you give us more info about real-world usage you want to achieve we may get some better ideas.

Find the optimized rotation

I have an application where I must find a rotation from a set of 15 ordered and indexed 3D points (X1, X2, ..., X15) to another set of 15 points with the same indices (1 initial point corresponding to 1 final point).
I've read many things about finding the rotation with Euler angles (evil for some people), quaternions, or by projecting the vectors onto the basis axes. But I have an additional constraint: a few points of my final set can be wrong (i.e. have wrong coordinates), so I want to discriminate the points that imply a rotation very far from the median rotation.
My issue is: for every set of 3 points (non-aligned ones) and their images I can compute a quaternion (given that the transformation matrix won't be a pure rotation I have some additional calculations, but it can be done). So I get a set of quaternions (455 at most) and I want to remove the wrong ones.
Is there a way to find which points give rotations far from the mean rotation? Do the "mean" and the "standard deviation" mean something for quaternions, or must I compute Euler angles? And once I have the set of "good" quaternions, how can I compute the "mean" quaternion/rotation?
Cheers,
Ricola3D
In computer vision, there's a technique called RANSAC for doing something like what you propose. Instead of finding all of the possible quaternions, you would use a minimal set of point correspondences to find a single quaternion/transformation matrix. You'd then evaluate all of the points for quality of fit, discarding those that don't fit well enough. If you don't have enough good matches, perhaps you got a bad match in your original set. So you'll throw away that attempt and try again. If you do get enough good matches, you'll do a least-squares regression fit of all the inlier points to get a new transformation matrix and then iterate until you're happy with the results.
Alternatively, you could take all of your normalized quaternions and find the dot-product between them all. The dot-product should always be positive; if it's not for any given calculation, you should negate all of the components of one of the two quaternions and re-compute. You then have a distance measure between the quaternions and you can cluster or look for gaps.
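For instance, a small sketch of that sign-corrected dot-product measure (converting it to a rotation angle is one convenient choice, not the only one; quaternions are assumed to be unit-norm NumPy arrays):

import numpy as np

def quat_distance(q1, q2):
    # q and -q represent the same rotation, so take the absolute dot product,
    # then convert it to the angle of the relative rotation between the two.
    d = min(abs(float(np.dot(q1, q2))), 1.0)   # clamp against rounding slightly above 1
    return 2.0 * np.arccos(d)                  # 0 for identical rotations, up to pi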
There are 2 problems here:
how do you compute a "best fit" for an arbitrary number of points?
how do you decide which points to accept, and which points to reject?
The general answer to the first is, "do a least squares fit". Quaternions would probably be better than Euler angles for this; try the following:
foreach point pair (a -> b), ideal rotation by unit quaternion q is:
b = q a q* -> q a - b q = 0
So, look for a least-squares fit for q:
minimize sum[over i] of |q a_i - b_i q|^2
under the constraint: |q|^2 = 1
As presented above, the least-squares problem is linear except for the constraint, which should make it easier to solve than an Euler angle formulation.
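One way to solve it (a sketch, not the only option): write q a_i - b_i q as a 4x4 matrix acting on q, stack those matrices, and take the eigenvector belonging to the smallest eigenvalue of the resulting symmetric system. This assumes quaternions stored as (r, i, j, k) and the points embedded as pure quaternions (0, x, y, z).

import numpy as np

def right_mul(a):
    # Matrix R(a) such that q * a == R(a) @ q.
    w, x, y, z = a
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def left_mul(b):
    # Matrix L(b) such that b * q == L(b) @ q.
    w, x, y, z = b
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def best_fit_quaternion(points_a, points_b):
    # minimize sum_i |q a_i - b_i q|^2 = q^T M q  subject to |q| = 1
    M = np.zeros((4, 4))
    for a, b in zip(points_a, points_b):
        Mi = right_mul(np.r_[0.0, a]) - left_mul(np.r_[0.0, b])
        M += Mi.T @ Mi
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 0]        # unit eigenvector for the smallest eigenvalue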
For the second problem, I can see two approaches:
if your points aren't too far off, you could try running the least-squares solver with all points, then go back, throw out the "outliers" (those point pairs whose squared-error is greatest), and try again.
if wildly inconsistent points are throwing off the above procedure, you could try selecting random, small subsets of 3 or 4 pairs, and find a least-squares fit for each. If a large group of these results have similar rotations with low total error, you can use this to identify "good" pairs (and thereby eliminate bad pairs); then go back and find a least-squares fit for all good pairs.

Finding equivalent gaussian filter mask in frequency domain for given mask in space domain

So far I've implemented a gaussian blur filter entirely in the space domain, making use of the separability of the gaussian, that is, applying a 1D gaussian kernel along the rows and then along the columns of an image. That worked fine.
Now, given only the size N of the NxN convolution matrix from the space domain, I want to achieve the exact same blurred image via the frequency domain. That means I'll load the image into a matrix (numpy, I'm using Python), apply the FFT to it (which gives me G(u,v)), and then I need a filter H(u,v) in the frequency domain that also resembles the shape of a 2D Gaussian, with its center value being 1.0 and values falling off towards 0 the further away from the center I get. I then do the multiplication in the frequency domain (after considering a center-shift of H) and apply the iFFT.
The trouble I have is finding the exact formula (i.e. finding sigma, the std. deviation) that will result in the corresponding H(u,v). In the space domain, if I have been given a mask size N, I know that the std. dev. sigma can be approximated as sigma = (maskSize-1)/2/2.575, e.g. for a mask size N=15 I get std. dev. = 2.71845 for e^(-x²/(2 sigma²)), just considering 1D cases for now.
But how do I get sigma for the frequency domain?
Funny thing is btw that in theory I know how to get sigma, using Mathematica, but the result is pure bogus, as I can demonstrate here:
gauss1d[x_, sigma_] := Exp[-(x^2)/(2 sigma^2)]
Simplify[FourierTransform[gauss1d[x, sigma], x, omega], sigma > 0]
The result is E^(-(1/2) omega^2 sigma^2) * sigma
This is bogus because it turns, in the exponent of the E function, the 1/sigma² into a sigma². Consequently, if you draw this, you will see that the standard deviation has become a lot smaller, since the H(u,v)-gaussian is a lot "thinner". However, it should actually be a lot wider in the frequency domain than in the space domain!! It doesn't make any sense...
The Fourier transform of a Gaussian is a Gaussian, as you can see from
http://en.wikipedia.org/wiki/Fourier_transform
But note that the std. dev. DOES invert!!!!
You say that is bogus BUT it is not. The frequency domain is, in some sense, the inversion of the time domain.
freq = 1/time
The standard deviation you are given is in time units; when you transform, it is still in time units (the constant does not get transformed).
Suppose you found the time version of the Gaussian using some s in terms of time. You transform the data into freq space. You can use that s and it will behave exactly the way it is supposed to, e.g., if you have a small s then it will cause the std. dev. in the frequency version to be large.
Again, this is because frequency is the inversion of time(again, in a sense).
Suppose your Gaussian has a very small std. dev. Then it approximates a Dirac delta function. We know this because it gets transformed into a sinusoid in the freq domain, i.e., something that spans the whole frequency domain (i.e., it would have infinite std. dev. if it were a Gaussian).
Think of it like this: you are wanting to smooth in the frequency domain. Smooth what? High-frequency components, right? By convolving with a Gaussian you are smoothing nearby data. If the std. dev. is small you are keeping higher frequencies. In the frequency domain this means you are KEEPING more frequencies. But the convolution is a multiplication in the frequency domain. If we multiplied by a thin Gaussian in the frequency domain we would be left with a small group of frequencies.
G(t) * f(t) (convolution in the time domain)
G[w] . f[w] (multiplication in the frequency domain)
The first is a convolution. For a smooth filter we want G(t) to be "large" (std. dev. large). This means we want less of the high-frequency components (a sort of low-pass filter). In the freq. domain we are multiplying by G[w], so G[w] must be thin (and centered around the origin) so that we block out the highs.
I think basically you are not realizing that in the time domain we have a convolution, while in the frequency domain it is a multiplication. G cannot be the same in both. If G is thin in the time domain and thin in the frequency domain, they will not result in the same effect. A thin G in the convolution gives almost no effect, but a thin G in the freq. domain almost completely removes all the frequencies (just the very low ones are kept).
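If a numerical check helps: the following NumPy sketch (assuming unit sample spacing and the omega = 2*pi*k/N DFT convention; the sigma value is the one from the question) builds the spatial Gaussian, takes its FFT, and compares it against a Gaussian whose std. dev. in omega is 1/sigma:

import numpy as np

N = 256
sigma = 2.71845                                  # spatial std. dev. from the mask-size rule above

n = np.arange(N)
x = np.minimum(n, N - n)                         # circular distance from sample 0 (the DFT origin)
g = np.exp(-x**2 / (2 * sigma**2))               # spatial-domain Gaussian, peak at the origin

G = np.real(np.fft.fft(g))                       # imaginary part is ~0 because g is symmetric

omega = 2 * np.pi * np.fft.fftfreq(N)            # omega_k = 2*pi*k/N in radians per sample
G_analytic = sigma * np.sqrt(2 * np.pi) * np.exp(-(omega**2) * sigma**2 / 2)

print(np.max(np.abs(G - G_analytic)))            # small: the spectrum is a Gaussian with std. dev. 1/sigma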
