Project Tango strange rotation visualisation - 3d-reconstruction

I am working on 3D reconstruction with Tango. Our system is quite similar to KinectFusion, which uses a voxel representation, but we use Tango as the tracker. The left image (in the video linked below) is rendered by raycasting at the current pose (given by Tango) in real time. The raw pose is converted by GetOC2OWMat() as in the code examples; in addition, the signs of tx and rx are flipped to match our system. Everything works fine except rotation about the Z axis, which changes the angle in the rendered image. I suspect the coordinate system conversion is not done properly, but depth integration works as long as no Z rotation is involved. I have also checked that det(R) is always 1.
Video

It sounds like you are not factoring in all of the frame transforms - have you accounted for the camera and device IMU frames? You need these to fully re-establish the original viewpoint, i.e. both the camera and device IMU frame matrices need to be multiplied into your stack.
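A minimal sketch of that composition, using glm; the frame names (world_T_device, device_T_imu, imu_T_camera) are illustrative, not Tango API calls, and the actual extrinsics must come from the Tango service:

```cpp
#include <glm/glm.hpp>

// Compose the device pose with the fixed IMU and camera extrinsics so the
// rendered viewpoint matches the physical camera, not the device body frame.
glm::mat4 cameraPoseInWorld(const glm::mat4& world_T_device,
                            const glm::mat4& device_T_imu,
                            const glm::mat4& imu_T_camera) {
    return world_T_device * device_T_imu * imu_T_camera;
}
```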

Sorry - I just found the place where things go wrong. When the image is displayed with OpenGL, the rendered GL surface does not have the same aspect ratio as the raycast image.

Do you program with Java/C/Unity? I'm curious because my device has problems with the camera data and you seem to capture it without problems. I am quite sure it's a bug but I would like to make sure it really is one.

Related

Why doesn't Vulkan use the "standard cartesian" coordinate system?

Vulkan uses a coordinate system where (-1, -1) is in the top-left quadrant, instead of the bottom-left quadrant as in the standard Cartesian coordinate system one typically learns about in school. So (-1, 1) is in the bottom-left quadrant in Vulkan's coordinate system.
(image from: http://vulkano.rs/guide/vertex-input)
What are the advantages of using Vulkan's coordinate system? One plain advantage I can see is pedagogical: it forces people to realize that coordinate systems are arbitrary, and one can easily map between them. However, I doubt that's the design reason.
So what are the design reasons for this choice?
Many coordinate systems in computer graphics put the origin at the top-left and point the y axis down.
This is because in early televisions and monitors, the electron beam that draws the picture starts at the top-left of the screen and progresses downward.
The pixels on the screen were generally made by reading memory in sequential addresses as the beam moved down the screen, and modulating that electron beam in accordance with each byte read in sequence. So the y axis corresponds to time, which corresponds to memory address.
Even today, virtually all representations of a bitmap in memory, or in a bitmapped file, start at the top-left.
It is natural when drawing bitmaps in such a medium to use a coordinate system that starts at the top-left too.
Things become a little more complicated when you use a bottom-left origin because finding the byte that corresponds to a pixel requires a little more math and needs to account for the height of the bitmap. There is usually just no reason to introduce the extra complexity.
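As a concrete illustration (a minimal sketch; the tightly packed 8-bit format and the `stride` parameter are assumptions), addressing a pixel from a bottom-left origin needs the bitmap height, while a top-left origin does not:

```cpp
#include <cstddef>

// Byte offset of pixel (x, y) in an assumed tightly packed 8-bit bitmap
// with `stride` bytes per row.
std::size_t offsetTopLeft(int x, int y, std::size_t stride) {
    return y * stride + x;                  // y grows with the memory address
}

std::size_t offsetBottomLeft(int x, int y, std::size_t stride, int height) {
    return (height - 1 - y) * stride + x;   // must also know the bitmap height
}
```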
When you start to introduce matrix transformations however, it becomes much more convenient to work with an upward-pointing y axis, because that lets you use all the vector algebra you learned in school without having to reverse the y axis and all the rotations in your thinking.
So what you'll usually find is that when you are working in a system that lets you do matrix operations, translations, rotations, etc., then you will have an upward-pointing y axis. At some point deep inside, however, the calculations will transform the coordinates into a downward-pointing y axis for the low-level operations.
One of the common sources of confusion and bugs in OpenGL was that NDC and window coordinates had y increasing upwards, which is opposite of the convention used in nearly all window systems and many (but not all) image formats, where y is [0..1] increasing downwards. Developers ended up having to insert a y-flip in their transformation pipeline in many cases, and it wasn't always clear when they did and didn't.
So Vulkan decided to make it so you could load an image from a y-downwards image format directly into memory and draw it to the screen without any explicit y flips, to avoid this source of errors.
Other coordinate systems were then chosen to be consistent with that, in the sense that the y direction never flips direction in the standard Vulkan transformation pipeline. That meant that clip space vertex coordinates also had y increasing downwards.
This ended up meaning that Vulkan clip coordinates have a different orientation than D3D clip coordinates, which was an annoyance for developers supporting both APIs. So the VK_KHR_maintenance1 extension adds the ability to specify a negative viewport height, which essentially introduces a y-flip to the clip-space to framebuffer coordinate transform. (D3D has essentially always had an implicit y-flip here.)
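For reference, a minimal sketch of that negative-viewport-height trick (the `width`, `height`, and `commandBuffer` variables are assumed to exist; requires VK_KHR_maintenance1, which is core since Vulkan 1.1):

```cpp
// Flip y in the clip-space-to-framebuffer transform via a negative
// viewport height.
VkViewport viewport{};
viewport.x        = 0.0f;
viewport.y        = static_cast<float>(height);  // origin moved to the bottom
viewport.width    = static_cast<float>(width);
viewport.height   = -static_cast<float>(height); // negative height flips y
viewport.minDepth = 0.0f;
viewport.maxDepth = 1.0f;
vkCmdSetViewport(commandBuffer, 0, 1, &viewport);
```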
This is how I remember the reasoning in the Vulkan Working Group, anyway. I don't think there's an authoritative public source anywhere.

How to avoid strange structure artifacts in scaled images?

I create a big image stitched out of many single microscope images.
Suddenly (after several months of working properly), the stitched overview images became blurry and contain strange structural artifacts like askew lines (not the rectangles - those are due to imperfect stitching).
If I open any individual tile at full size, it is not blurry and the artifacts are hardly observable. (Note that the image below is already scaled 4x.)
The overview image is created manually by scaling each tile using QImage::scaled and copying all of them to the corresponding region of the big image. I'm not using OpenCV's stitching.
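That step looks roughly like this (a minimal sketch; positions and sizes are illustrative). Note that QImage::scaled defaults to Qt::FastTransformation, i.e. unfiltered nearest-neighbour sampling:

```cpp
#include <QImage>
#include <QPainter>

// Scale one tile and paint it into the overview at (x, y).
void placeTile(QImage& overview, const QImage& tile,
               int x, int y, int w, int h) {
    // Qt::SmoothTransformation filters before resampling; the default
    // Qt::FastTransformation does no filtering at all.
    QImage scaled = tile.scaled(w, h, Qt::IgnoreAspectRatio,
                                Qt::SmoothTransformation);
    QPainter painter(&overview);
    painter.drawImage(x, y, scaled);
}
```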
I assume this happens because of the image content, because most of the overview images are OK.
The question is: how can I prevent such hardly observable artifacts from becoming clearly visible after scaling? Is there some means in OpenCV or QImage?
Are there any algorithms to determine whether the image content could lead to such effects for a given scale factor?
Many thanks in advance!
Are you sure the camera is calibrated properly? That the lighting is uniform? Is the lens clear? Do you have electrical components that interfere with the camera connection?
If you add image frames of photos on a uniform material (or non-uniform material, moved randomly for significant time), the resultant integrated image should be completely uniform.
If your produced image is not uniform, especially if you get systematic noise (like the apparent sinusoidal noise in the provided pictures), write a calibration function that transforms image -> calibrated image.
Filtering in Fourier space is another way to filter out the noise, but considering that the image is rotated you will lose precision, and you'll be cutting off components of the real signal too. The following empirical method will reduce the noise in your particular case significantly:
1. ground_output: composite image with the per-pixel sum of >10 frames (more is better) over a uniform material (e.g. an excited slab of phosphorus)
2. ground_input: the average (or sqrt(sum of px^2)) of ground_output
3. calib_image: ground_input /(per px) ground_output. Saved for the session, or persisted to a file (important: ensure no lossy compression! (jpeg))
4. work_input: the images to work on
5. work_output = work_input *(per px) calib_image: images calibrated for systematic noise
If you can't create a perfect ground_input target (e.g. no uniform material on hand), do not worry too much. If you move any material uniformly (or randomly) for enough time, it will act as a uniform material in this case (think of a blurred photo).
This method has the added advantage of calibrating out the solitary faulty pixels that CCD cameras have (e.g. NormalPixel.value(signal)).
If you want to have more fun you can always fit the calibration function to something more complex than a zero-intercept line (steps 3. and 5.).
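A minimal sketch of steps 1-5 with OpenCV (frame loading and persistence omitted; function names are illustrative and single-channel images are assumed):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Steps 1-3: build the calibration image from frames over a uniform target.
cv::Mat buildCalibImage(const std::vector<cv::Mat>& groundFrames) {
    // 1. ground_output: per-pixel sum of many frames
    cv::Mat groundOutput = cv::Mat::zeros(groundFrames[0].size(), CV_64F);
    for (const cv::Mat& f : groundFrames) {
        cv::Mat f64;
        f.convertTo(f64, CV_64F);
        groundOutput += f64;
    }
    // 2. ground_input: the average intensity of ground_output
    double groundInput = cv::mean(groundOutput)[0];
    // 3. calib_image = ground_input / ground_output, per pixel
    cv::Mat calibImage;
    cv::divide(groundInput, groundOutput, calibImage);
    return calibImage;   // persist losslessly (PNG/TIFF), never JPEG
}

// Steps 4-5: work_output = work_input * calib_image, per pixel.
cv::Mat calibrate(const cv::Mat& workInput, const cv::Mat& calibImage) {
    cv::Mat in64, out;
    workInput.convertTo(in64, CV_64F);
    cv::multiply(in64, calibImage, out);
    out.convertTo(out, workInput.type());
    return out;
}
```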
I suggest scaling the image with some other software to verify if the artifacts are in fact caused by Qt or are inherent in the image you've captured.
The askew lines look a lot like analog TV interference, or CCTV noise induced by 50 or 60 Hz power lines running alongside the signal cable, or some other electrical interference on the signal.
If the image distortion is caused by signal interference then you can try to mitigate it by moving the signal lines away from whatever could be the source of the problem, or fit something to try to filter the noise (baluns for example).

How would you continuously improve the mandelbrot fractal?

I've seen many Mandelbrot image generators drawing a low-resolution fractal of the Mandelbrot set and then continuously improving it. Is this a tiling algorithm? Here is an example: http://neave.com/fractal/
Update: I've found this page about recursively subdividing and calculating the Mandelbrot set: http://www.metabit.org/~rfigura/figura-fractal/math.html. Maybe it's possible to use a kd-tree to subdivide the image?
Update 2: http://randomascii.wordpress.com/2011/08/13/faster-fractals-through-algebra/
Update 3: http://www.fractalforums.com/programming/mandelbrot-exterior-optimization/15/
Author of Fractal eXtreme and the randomascii blog post linked in the question here.
Fractal eXtreme does a few things to give a gradually improving fractal image:
Start from the middle, not from the top. This is a trivial change that many early fractal programs ignored. The center should be the area the user cares the most about. This can either be starting with a center line, or spiraling out. Spiraling out has more overhead so I only use it on computationally intense images.
Do an initial low-res pass with 8x8 blocks (calculating one pixel out of 64). This gives a coarse initial view that is gradually refined at 4x4, 2x2, then 1x1 resolutions. Note that each pass does three times as many pixels as all previous passes -- don't recalculate the original points. Subsequent passes also start at the center, because that is more important.
A multi-pass method lends itself well to guessing. If four pixels in two rows have the same value then the pixels in-between probably have the same value, so don't calculate them. This works extremely well on some images. A cleanup pass at the end to look for pixels that were miscalculated is necessary and usually finds a few errors, but I've never seen visible errors after the cleanup pass, and this can give a 10x+ speedup. This feature can be disabled. The success of this feature (guess percentage) can be viewed in the status window.
When zooming in (double-click to double the magnification) the previously calculated pixels can be used as a starting point so that only three quarters of the pixels need calculating. This doesn't work when the required precision increases but these discontinuities are rare.
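A minimal sketch of that multi-pass scheme (the view rectangle and iteration count are illustrative, the center-out ordering is omitted for brevity, and the escape-time function uses plain doubles rather than FX's high-precision routines):

```cpp
#include <complex>
#include <vector>

// Plain escape-time iteration; the view rectangle is illustrative.
int escapeTime(int px, int py, int width, int height, int maxIter = 256) {
    double x0 = -2.5 + 3.5 * px / width;    // map pixel to the complex plane
    double y0 = -1.25 + 2.5 * py / height;
    std::complex<double> z(0.0, 0.0), c(x0, y0);
    int i = 0;
    while (i < maxIter && std::norm(z) <= 4.0) { z = z * z + c; ++i; }
    return i;
}

// Passes at 8x8, 4x4, 2x2, 1x1. Each pass skips points a coarser pass
// already computed, so it does three times the work of all previous
// passes combined. iters must hold width * height entries.
void renderProgressive(int width, int height, std::vector<int>& iters) {
    for (int step = 8; step >= 1; step /= 2) {
        for (int y = 0; y < height; y += step) {
            for (int x = 0; x < width; x += step) {
                if (step < 8 && x % (step * 2) == 0 && y % (step * 2) == 0)
                    continue;               // already done at a coarser level
                iters[y * width + x] = escapeTime(x, y, width, height);
            }
        }
        // Display pass: draw each computed sample as a step x step block.
    }
}
```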
More sophisticated algorithms are definitely possible. Curve following, for instance.
Having fast math also helps. The high-precision routines in FX are fully unwound assembly language (generated by C# code) that uses 64-bit multiplies.
FX also has a couple of checks for points within the two biggest bulbs, to avoid calculating them at all. It also watches for cycles in calculations -- if the exact same point shows up again then the calculation will repeat forever without escaping.
To see this in action visit http://www.cygnus-software.com/
I think that site is not as clever as you give it credit for. I think what happens on a zoom is this:
Take the previous image, scale it up using a standard interpolation method. This gives you the 'blurry' zoomed in image. Click the zoom in button several times to see this best
Then, in concentric circles starting from the central point, recalculate squares of the image in full resolution for the new zoom level. This 'sharpens' the image progressively from the centre outwards. Because you're probably looking at the centre, you see the improvement straight away.
You can more clearly see what it's doing by zooming far in, then dragging the image in a diagonal direction, so that almost all the screen is undrawn. When you release the drag, you will see the image rendered progressively in squares, in concentric circles from the new centre.
I haven't checked, but I don't think it's doing anything clever to treat in-set points differently - it's just that because an entirely-in-set square will be black both before and after rerendering, you can't see a difference.
The old-school Mandelbrot rendering algorithm begins calculating pixels at the top-left position, goes right until it reaches the end of the screen, then moves to the beginning of the next line, like an ordinary typewriter (visually).
The linked algorithm just calculates pixels in a different order, and when it calculates one, it quickly makes assumptions about certain neighboring pixels and later goes back to properly redraw them. That's where you see the improvement; think of it as displaying a progressive JPEG. If you zoom into the set, certain pixel values will remain the same (they don't need to be recalculated); the interim pixels will be guessed, quickly drawn and later recalculated.
A continuously improving Mandelbrot is just for your eyes; it will never finish earlier than a per-pixel algorithm that calculates properly and can detect "islands".

Augmented Reality Demo

I'm trying to build an Augmented Reality Demonstration, like this iPhone App:
http://www.acrossair.com/acrossair_app_augmented_reality_nearesttube_london_for_iPhone_3GS.htm
However my geometry/math is a bit rusty nowadays.
This is what I know:
If I have my Android phone in landscape mode (with the home button on the left), my z axis points in the direction I'm looking.
From the sensors of my phone I know the angle my z axis makes with the north axis; let's call this angle theta.
If I have a vector from my current position to the point I want to show on my screen, I can calculate the angle this vector makes with my z axis. Let's call this angle alpha.
So, based on the alpha angle, I have a perception of where the point is, and I'm able to show it on the screen (like the Nearest Tube app).
This is the basic theory of a simple demonstration (of course it's nothing like the App, but it's the first step).
Can someone shed some light on this matter?
[Update]
I've found this very interesting example; however, I need to have movement on both the x and y axes. Any hints?
The basics are easy. You need the angle between your location and your destination (arctangent), and the heading (from the digital compass in your phone). See this answer: Augmented Reality movement. There is some Objective-C code down there that you can read if you come from Java.
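A minimal sketch of that calculation (flat-earth approximation, valid only for nearby points; all names are illustrative, and theta is the compass heading in radians):

```cpp
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Bearing from (lat, lon) to the target; inputs in degrees, result in
// radians clockwise from north (the arctangent mentioned above).
double bearingTo(double lat, double lon, double targetLat, double targetLon) {
    double dLat = targetLat - lat;
    double dLon = (targetLon - lon) * std::cos(lat * kPi / 180.0);
    return std::atan2(dLon, dLat);
}

// alpha: signed angle between the phone's view axis (theta) and the
// target bearing, wrapped to [-pi, pi].
double alpha(double thetaRad, double bearingRad) {
    double a = bearingRad - thetaRad;
    while (a > kPi)  a -= 2 * kPi;
    while (a < -kPi) a += 2 * kPi;
    return a;
}
```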
What you want is a 3D space-filling curve, for example a Hilbert curve. That is a spatial index over 3 coordinates. It is comparable to an octree. You want to store the objects in that octree and do a depth-first search on the coordinate you have recorded with your iPhone as the fixed coordinate, probably the center of the screen. An octree subdivides the space continuously in eight directions, and a 3D space-filling curve is a Hamiltonian path through the space, which is like a fractal but clearly distinguishable from the regions of the octree. I use a 2D Hilbert curve to speed up searches in geospatial databases. Maybe you want to start with that first?

Moving sprites between tiles in an Isometric world

I'm looking for information on how to move (and animate) 2D sprites across an isometric game world, but have their movement animated smoothly as they travel from tile to tile, as opposed to having them jump from the confines of one tile to the confines of the next.
An example of this would be in the Transport Tycoon Game, where the trains and carriages are often half in one tile and half in the other.
Drawing the sprites in the right place isn't too difficult. The projection formulas are:
screen_x = sprite_x - sprite_y
screen_y = (sprite_x + sprite_y) / 2 + sprite_z
sprite_x and sprite_y are fixed-point values (or floating point if you want). Usually, the precision of the fixed point is the number of pixels in a tile - so if your tile graphic is 32x16 (a projected 32x32 square) you would have 5 bits of precision, i.e. 1/32nd of a tile.
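As a sketch, those formulas in code (plain floats instead of fixed point, for clarity):

```cpp
struct ScreenPos { float x, y; };

// The projection above: x skews by the tile diagonal, y halves the sum
// and adds height.
ScreenPos project(float spriteX, float spriteY, float spriteZ) {
    return { spriteX - spriteY, (spriteX + spriteY) / 2.0f + spriteZ };
}
```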
The really hard part is sorting the sprites into an order that renders correctly. If you use OpenGL for drawing, you can use a z-buffer to make this really easy. Using GDI, DirectX, etc., it is really hard. Transport Tycoon doesn't correctly render the sprites in all instances. The original Transport Tycoon had the most horrendous rendering engine you've ever seen. It implemented the three zoom levels as three instantiations of a massive MASM macro. TT was written entirely in assembler. I know, because I ported it to the Mac many years ago (and did a cool version for the PS1 dev kit as well; it needed 6 MB though).
P.S. One of the small bungalow graphics in the game was based on the house Chris Sawyer was living in at the time. We were tempted to add a Ferrari parked in the driveway for the Mac version as that was the car he bought with the royalties.
Look up how to do linear interpolation (it's a pretty simple formula). You can then use this to parameterise the transition on a single [0, 1] range. You then simply have state in your sprites storing:
That they are moving
Start and end points
Start and end times (or start time and duration)
and then each frame you can draw it in the correct position using an interpolation from the start point to the end point. Once you have exceeded the duration, the sprite gets updated to be not-moving and positioned at the end point/tile.
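A minimal sketch of that state and update (names and time units are illustrative):

```cpp
// Sprite movement state: linear interpolation from start to end over
// [startTime, startTime + duration].
struct MovingSprite {
    bool  moving = false;
    float startX = 0, startY = 0;    // start tile position (world units)
    float endX = 0, endY = 0;        // destination tile position
    float startTime = 0, duration = 1;
    float x = 0, y = 0;              // current drawing position

    void update(float now) {
        if (!moving) return;
        float t = (now - startTime) / duration;   // parameter on [0, 1]
        if (t >= 1.0f) {                          // arrived: snap and stop
            t = 1.0f;
            moving = false;
        }
        x = startX + (endX - startX) * t;         // linear interpolation
        y = startY + (endY - startY) * t;
    }
};
```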
Why do you think it'll jump from tile to tile? You can position your sprite at any x,y coordinate.
First create your background screen buffer and then place your sprites on top of it.
