Having problems interpreting and implementing radial distortion correction - math

I am trying to implement the radial distortion correction from this technical report: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr98-71.pdf
I am having quite a few problems interpreting/understanding the equations in the report. The equations of interest are number 11 and 12 with the matrix formulation provided just afterwards.
The problem I am having is that it's really unclear what the descriptions of some of the variables actually mean.
If we look at the descriptions of [u,v] and [x,y], the technical report states that [u,v] are the pixel image coordinates and that [x,y] are the normalised image coordinates. My intuitive understanding of this is that [x,y] = [u,v] – [principal point], but then the presence of the [u-u_0] term would be redundant if that were correct.
So far I have been able to determine the intrinsic parameters of the camera, and the only thing preventing me from doing the distortion correction is my understanding of these equations.

The steps are clearly laid out in the documentation of the OpenCV routines implementing Zhang's method. Please see the "Detailed Description" section here.
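To make the relationship concrete, here is a minimal sketch of equations 11 and 12 in Python. The key point is that [x,y] is not just [u,v] minus the principal point: in the report, [u,v,1] = A[x,y,1], so the normalized coordinates come from inverting the intrinsic matrix (with zero skew, x = (u - u0)/fx and y = (v - v0)/fy). The radial factor is computed from the normalized radius, while the shift is applied in pixel units, which is why the (u - u0) term is not redundant. All intrinsic and distortion values below are placeholders.

import numpy as np

# Assumed placeholder intrinsics: fx, fy = focal lengths in pixels,
# (u0, v0) = principal point, zero skew; k1, k2 = radial distortion coefficients.
fx, fy, u0, v0 = 800.0, 800.0, 320.0, 240.0
k1, k2 = -0.25, 0.05

def distort(u, v):
    # normalized (ideal) image coordinates: invert the intrinsic mapping
    x = (u - u0) / fx
    y = (v - v0) / fy
    r2 = x * x + y * y                      # squared normalized radius
    factor = k1 * r2 + k2 * r2 * r2
    # eqs. 11-12: radial factor from normalized coordinates,
    # shift applied in pixel units via (u - u0) and (v - v0)
    u_d = u + (u - u0) * factor
    v_d = v + (v - v0) * factor
    return u_d, v_d

print(distort(500.0, 400.0))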

Related

stereo vision 3d point calculation with known intrinsic and extrinsic matrix

I have successfully calculated Rotation, Translation with the intrinsic camera matrix of two cameras.
I also got rectified images from the left and right cameras. Now I wonder how to calculate the 3D coordinate of a point, just one point in an image (please see the green points). I have looked at the equation, but it requires the baseline, which I don't know how to calculate. Could you show me the process of calculating the 3D coordinate of the green point with the given information (R, T, and the intrinsic matrix)?
FYI
1. I also have a Fundamental matrix and Essential matrix, just in case we need them.
2. Original image size is 960 x 720. Rectified ones are 925 x 669
3. The green point from the left image: (562, 185), from the right image: (542, 185)
The term "baseline" usually just means translation. Since you already have your rotation, translation and intrinsics matrices (let's not them R, T and K). you can triangulate and don't need either the Fundamental or Essential matrices (they could be used to extract R, T etc but you already have them). You don't really need your images to be rectified either, since it doesn't change the triangulation process that much. There are many ways to triangulate, each with their pros and cons, and many libraries that implement them. So, all I can do here is give you and overview of the problem and potential solutions, as well as pointers to resources that you can either use as their are or as a source of inspiration to write your own code.
Formalization and solution outlines. Let's formalize what we are after here. You have a 3d point X, with two observations x_1 and x_2 respectively in the left and right images. If you backproject them, you obtain two rays:
ray_1 = K^{-1} x_1
ray_2 = R * K^{-1} x_2 + T //I'm assuming that [R|T] is the pose of the second camera expressed in the referential of the first camera
Ideally, you'd want those two rays to meet at point X. Since in practice we always have some noise (discretization noise, rounding errors and so on), the two rays won't meet at X, so the best answer would be a point Q such that
Q=argmin_X {d(X,ray_1)^2+d(X,ray_2)^2}
where d(.) denotes the Euclidean distance between a line and a point. You can solve this problem as a regular least squares problem, or you can just take the geometric approach (called midpoint) of considering the line segment l that is perpendicular to both ray_1 and ray_2, and take its middle as your solution. Another quick and dirty way is to use the DLT. Basically, you re-write the constraints (i.e. X should be as close as possible to both rays) as a linear system AX=0 and solve it with SVD.
Usually, the geometric (midpoint) method is the least precise. The DLT-based one, while not the most stable numerically, usually produces acceptable results.
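To give you an idea, here is a minimal sketch of the DLT approach described above, assuming both cameras share the same K and that [R|T] is the pose of the second camera in the first camera's frame (the function name and layout are just for illustration):

import numpy as np

def triangulate_dlt(K, R, T, x1, x2):
    # x1, x2 are (u, v) pixel observations in the left/right images
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # 3x4 projection of camera 1: K[I|0]
    P2 = K @ np.hstack([R, T.reshape(3, 1)])            # 3x4 projection of camera 2: K[R|T]
    # stack the "X should project onto x1 and x2" constraints as AX = 0
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                                           # null-space vector
    return X[:3] / X[3]                                  # de-homogenize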
Resources that present an in-depth formalization
Hartley-Zisserman's book of course! Chapter 12. A simple DLT-based method, which is the one used in opencv (both in the calibration and sfm modules), is explained on page 312. It is very easy to implement; it shouldn't take more than 10 minutes in any language.
Szeliski's book. It has an interesting discussion on triangulation in the chapter on SFM, but it is not as straightforward or in-depth as Hartley-Zisserman's.
Code. You can use the triangulation methods from opencv, either from the calib3d module, or from the contribs/sfm module. Both use the DLT, but the code from the SFM module is more easily understandable (the calib3d code has a lot of old-school C code which is not very pleasant to read). There is also another lib, called openGV, which has a few interesting methods for triangulation.
cv::triangulatePoints
cv::sfm::triangulatePoints
OpenGV
The openGV git repo doesn't seem very active, and I'm not a big fan of the design of the library, but if I remember correctly (feel free to tell me otherwise) it offers methods other than the DLT for triangulation.
Naturally, those are all written in C++, but if you use other languages, finding wrappers or similar libraries won't be difficult (with python you still have opencv wrappers, and MATLAB has a bundle module, etc.).
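For reference, a short usage sketch of cv::triangulatePoints from Python, using the green point from your question; the K, R and T values here are placeholders for your own calibration results (if you work with the rectified images, use the rectified projection matrices instead):

import numpy as np
import cv2

# placeholder calibration values -- substitute your own K, R, T
K = np.array([[700., 0., 480.], [0., 700., 360.], [0., 0., 1.]])
R = np.eye(3)
T = np.array([[0.1], [0.0], [0.0]])

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # left camera: K[I|0]
P2 = K @ np.hstack([R, T])                          # right camera: K[R|T]

# the green point from the question, left and right image (2xN arrays)
pts_left = np.array([[562.], [185.]], dtype=np.float64)
pts_right = np.array([[542.], [185.]], dtype=np.float64)

X_h = cv2.triangulatePoints(P1, P2, pts_left, pts_right)   # 4x1 homogeneous point
X = (X_h[:3] / X_h[3]).ravel()                             # 3D point in the left camera frame
print(X)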

Estimation of camera displacement

I am currently working on an experiment in which I took multiple photos
of a scene on different days with a fixed camera position.
The problem is that in the real world it is hard to keep the camera
perfectly fixed.
What I need is to correct the small variance I got, automatically. The research
I did returned methods with more complex assumptions, like camera
pose estimation, homography estimation, etc.
For me it's enough to recover just the movement in the image plane, returning an
x and y.
A perfect solution would be a function such as:
function [movx movy] = detectMotion(im1,im2).
The solution I already have is to compute some image features, like Harris or
Hessian interest points, match them, then manually select the best ones and use the difference
of their positions as the camera displacement estimate. I don't know if this is good
enough, but it would be better if it were done automatically.
You can do the feature matching automatically by extracting feature descriptors around the interest points. Take a look at this OpenCV tutorial on how to perform feature matching using SURF and FLANN. Once you have the feature matches, run RANSAC or least squares to find the best fit for an x- and y-offset. This will give you a decent estimate of the camera motion.
Another option is to compute sparse optical flow on the detected interest points between the two frames, followed by the RANSAC or least squares procedure as above to compute the best x- and y-offset. Dense optical flow could possibly be more accurate, but at the same time could prove to be overkill.
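Here is a rough sketch of what such a detectMotion function could look like with OpenCV in Python. It uses ORB instead of SURF (SURF lives in the non-free module) and a median instead of a full RANSAC loop, so treat it as a starting point rather than the tutorial's exact recipe:

import numpy as np
import cv2

def detect_motion(im1, im2):
    # detect and describe interest points in both grayscale images
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(im1, None)
    k2, d2 = orb.detectAndCompute(im2, None)
    # brute-force matching with cross-checking to discard weak matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    # displacement of each matched keypoint pair
    shifts = np.array([np.array(k2[m.trainIdx].pt) - np.array(k1[m.queryIdx].pt)
                       for m in matches])
    # a robust central estimate (median) rejects most bad matches
    movx, movy = np.median(shifts, axis=0)
    return movx, movy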

Fitting an ellipsoid to 3D data points

I have a large set of 3D data points to which I want to fit an ellipsoid.
My maths is pretty poor, so I'm having trouble implementing the least squares method without any math libraries.
Does anyone know of or have a piece of code that can fit an ellipsoid to data which I can plug straight into my project? In C would be best, but it should be no problem for me to convert from C++, Java, C#, python etc.
EDIT: Just being able to find the centre would be a huge help too. Note that the points aren't evenly spaced so taking the mean won't result in the centre.
here you go:
This paper describes fitting an ellipsoid to multiple dimensions AS WELL AS finding the center of the ellipsoid. Hope this helps,
http://www.physics.smu.edu/~scalise/SMUpreprints/SMU-HEP-10-14.pdf
(btw, I'm assuming this answer is a bit late, but I figured I would add this solution for anyone who stumbles across your question in search for the same thing :)
If you want the minimum-volume enclosing ellipsoid, check out this SO answer for a bounding ellipsoid.
If you want the best fitting ellipse in a least-squares sense, check out this MATLAB code for error ellipsoids where you find the covariance matrix of your mean-shifted 3D points and use that to construct the ellipsoid.
Least Squares data fitting is probably a good methodology given the nature of the data you describe. The GNU Scientific Library contains linear and non-linear least squares data fitting routines. In your case, you may be able to transform your data into a linear space and use linear least-squares, but that would depend on your actual use case. Otherwise, you'll need to use non-linear methods.
I could not find a good Java-based algorithm for fitting an ellipsoid, so I ended up writing it myself. There were some good algorithms for an ellipse with 2D points, but not for an ellipsoid with 3D points. I experimented with a few different MATLAB scripts and eventually settled on Yury Petrov's Ellipsoid Fit. It fits an ellipsoid to the polynomial Ax^2 + By^2 + Cz^2 + 2Dxy + 2Exz + 2Fyz + 2Gx + 2Hy + 2Iz = 1. It doesn't use any constraints to force an ellipsoid, so you have to have a fairly large number of points to prevent a random quadric from being fit instead of an ellipsoid. Other than that, it works really well. I wrote a small Java library using Apache Commons Math that implements Yury Petrov's script. The Git repository can be found at https://github.com/BokiSoft/EllipsoidFit.
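For illustration, here is a minimal unconstrained least-squares sketch of that polynomial fit (numpy only, no ellipsoid constraint enforced, so the same caveat about needing many well-spread points applies); the center can then be read off the quadric coefficients:

import numpy as np

def fit_quadric(points):
    # least-squares fit of Ax^2+By^2+Cz^2+2Dxy+2Exz+2Fyz+2Gx+2Hy+2Iz = 1
    # to an (N, 3) array of points
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    D = np.column_stack([x*x, y*y, z*z, 2*x*y, 2*x*z, 2*y*z, 2*x, 2*y, 2*z])
    coeffs, *_ = np.linalg.lstsq(D, np.ones(len(points)), rcond=None)
    return coeffs                       # A, B, C, D, E, F, G, H, I

def quadric_center(coeffs):
    # center of the fitted quadric: solve Q*c = -g, with Q the symmetric
    # quadratic-form matrix and g the linear part
    A, B, C, D, E, F, G, H, I = coeffs
    Q = np.array([[A, D, E], [D, B, F], [E, F, C]])
    g = np.array([G, H, I])
    return np.linalg.solve(Q, -g)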
We developed a set of Matlab and Java codes to fit ellipsoids here:
https://github.com/pierre-weiss
You can also check our open-source Icy plugin. The following tutorial can be helpful:
https://www.youtube.com/watch?v=nXnPOG_YCxw
Note: most of the existing codes fit a generic quadric and do not impose an ellipsoidal shape. To get more robustness, you need to go to convex programming rather than just linear algebra. This is what is done in the indicated sources.
Cheers,
Pierre
Here is a non-strict solution with a fast and simple random-search approach*. Its best side: no heavy linear algebra library required**. It seems to have worked fine for mesh collision detection.
It assumes that the ellipsoid center matches the cloud center and then uses a sort of mirrored average to search for the main axis.
The full working code is slightly bigger and placed on git; the idea of the main axis search is here:
import numpy as np

def proj_length(vec, axis):
    # stand-in for the helper from the full code: returns the length of the
    # projection of vec onto axis and the length of the perpendicular rejection
    axis_norm = np.linalg.norm(axis)
    if axis_norm < 1e-12:               # axis estimate not established yet
        return 0.0, np.linalg.norm(vec)
    proj = np.dot(vec, axis) / axis_norm
    rej = np.linalg.norm(vec - proj * axis / axis_norm)
    return proj, rej

np.random.shuffle(pts)                  # pts: (N, 3) array of cloud points
pts_len = len(pts)
pt_average = np.sum(pts, axis=0) / pts_len
vec_major = pt_average * 0              # running estimate of the major axis
minor_max, major_max = 0, 0
# may be improved with an overlapped pass
for pt_cur in pts:
    vec_cur = pt_cur - pt_average
    proj_len, rej_len = proj_length(vec_cur, vec_major)
    if proj_len < 0:                    # mirror so all points contribute consistently
        vec_cur = -vec_cur
    vec_major += (vec_cur - vec_major) / pts_len
    major_max = max(major_max, abs(proj_len))
    minor_max = max(minor_max, rej_len)
It can be improved/optimized even more at some points. Examples of what it will produce:
And full experiment code with plots
*i.e. adjusting code lines randomly until they work
**which was actually the reason to figure out this solution
I have an idea. It is an approximate solution, not the best, but it will keep the points inside. In the XY plane, find the radius R1 that contains all points. Do the same for the XZ plane (R2) and the YZ plane (R3). Then use the maximum on each axis: A=max(R1,R2), B=max(R1,R3) and C=max(R2,R3).
But first of all, find the average (center) of all points and align it to the origin.
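A quick sketch of that idea in Python (names are just illustrative):

import numpy as np

def approx_ellipsoid(points):
    # center the cloud, then use the maximum radii in the XY, XZ and YZ planes
    # to pick axis-aligned semi-axes, as described above
    center = points.mean(axis=0)
    p = points - center
    r1 = np.max(np.hypot(p[:, 0], p[:, 1]))   # XY plane
    r2 = np.max(np.hypot(p[:, 0], p[:, 2]))   # XZ plane
    r3 = np.max(np.hypot(p[:, 1], p[:, 2]))   # YZ plane
    a, b, c = max(r1, r2), max(r1, r3), max(r2, r3)
    return center, (a, b, c)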
I have just gone through the same process.
Here is a python module which is based on work by Nima Moshtagh. Referenced in many places but also in this question about a Bounding ellipse
This module also handles plotting of the final ellipsoid. Enjoy!
https://github.com/minillinim/ellipsoid/blob/master/ellipsoid.py
I ported Yury Petrov's least-squares Matlab fitter to Java some time ago, it only needs JAMA: https://github.com/mdoube/BoneJ/blob/master/src/org/doube/geometry/FitEllipsoid.java

Method for finding normals to a voxel surface

I was working on a method to approximate the normal to a surface of a 3d voxel image.
The method suggested in this article (the only algorithm I found via Google) seems to work. The suggested method from the paper is to find the direction in which the surface varies the most, choose 2 points on the tangent plane using some procedure, and then take the cross product. Some Pascal code by the article's author, commented in Portuguese, implements this method.
However, using the gradient of f (use each partial derivative as a component of the vector) as the normal seems to work pretty well; I tested this along several circles on a voxelated sphere and I got results that look correct in most spots (there are a few outliers that are off by about 30 degrees). This is very different from the method used in the paper, but it still works. What I don't understand is why the gradient of f = 1/dist calculated along the surface of an object should produce the normal.
Why does this procedure work? Is it just the fact that the sphere test was too much of a special case? Could you suggest a simpler method, or explain any of these methods?
Using the gradient of the volume as a normal for lighting is a standard technique in volume rendering.
If you interpret the value of a voxel as the opacity, the gradient will give you the direction of the greatest change in the opacity, which is similar to a surface normal.
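As a concrete illustration, here is a small numpy sketch that estimates normals from the volume gradient via central differences (the array names and the occupancy interpretation are assumptions, not part of the original question):

import numpy as np

def voxel_normals(volume, surface_idx):
    # volume: 3D array (e.g. occupancy or density)
    # surface_idx: (N, 3) integer array of surface voxel indices
    gx, gy, gz = np.gradient(volume.astype(float))     # central differences
    g = np.stack([gx[tuple(surface_idx.T)],
                  gy[tuple(surface_idx.T)],
                  gz[tuple(surface_idx.T)]], axis=1)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                            # avoid division by zero on flat regions
    return g / norms                                   # unit normals (sign depends on convention)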

Lens correction projection

What is the simplest way to un-warp a photo made using a fisheye or wide-angle lens? I'm looking for a pixel projection formula that has few parameters. The camera and lens parameters will not be known, so the user has to change the parameters visually. Thanks
There is a good paper here that provides some decent-looking mathematical models for lens distortion. It's a starting point, at least. SDX2000 was kind of on the right track with the grid, I think. I think the most common way to approach the problem is to map the image to a grid and then allow warping parameters to be applied to produce pincushion and barrel distortion. See the lens distortion filters in Lightroom or Photoshop as an example.
There is an excellent discussion from ImageMagick. They give the equation that they use.
Note that this does not correct distortion in the same way as Photoshop CS6 (i.e. you cannot take coefficients from the Adobe lens profiles and simply chuck them in).
The paper that Kamil points to seems like an excellent in-depth look.
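As a rough illustration of that kind of model, here is a Python/OpenCV sketch that applies a simple radial polynomial, r_src = r_dst * (a*r_dst^3 + b*r_dst^2 + c*r_dst + d), similar in spirit to the equation discussed in the ImageMagick documentation, with a, b, c as the parameters the user tweaks visually (d is chosen so the scale at the center stays fixed):

import numpy as np
import cv2

def unwarp(img, a, b, c, d=None):
    # radii are normalized by half the smaller image dimension
    if d is None:
        d = 1.0 - a - b - c
    h, w = img.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    norm = min(cx, cy)
    # destination pixel grid in normalized radial coordinates
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    dx, dy = (xs - cx) / norm, (ys - cy) / norm
    r = np.sqrt(dx * dx + dy * dy)
    scale = a * r**3 + b * r**2 + c * r + d
    # reverse map: for each destination pixel, sample from the scaled source radius
    map_x = (dx * scale * norm + cx).astype(np.float32)
    map_y = (dy * scale * norm + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)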
I would assume you could use the lens equation to do it.
1/f = 1/object_distance + 1/image_distance
Where f is the focal length (the user input). The ratio of image distance and object distance could be used to resize the image appropriately, using the magnification equation. To get what you really want, then, you need to restructure the equation:
1/object_distance = 1/f - 1/image_distance
And then use the magnification equation to use the object height to resize:
-image_distance/object_distance = image_height/object_height
The catch, as you may have noticed, is that you need to know the distance each pixel is away from the camera. Otherwise, it simply doesn't work. You could ask the user for that information, but that seems unlikely, and painful. I don't know of any other way to do it-- lens distortion is a 3D effect, and you're given 2D information. At best you can attempt to correct it two-dimensionally, but this will be difficult, and won't work properly.
If it's possible, you should ask the user to take a photograph of a reference image (a chessboard, for example) using the same camera, and then use this information to analyze the lens characteristics. This information can then be used to un-warp the other photographs taken by the same camera.
For implementation you could use neural networks/genetic algorithms.