Distance between hyperplanes - math

I'm trying to teach myself some machine learning, and have been using the MNIST database (http://yann.lecun.com/exdb/mnist/) do so. The author of that site wrote a paper in '98 on all different kinds of handwriting recognition techniques, available at http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf.
The 10th method mentioned is a "Tangent Distance Classifier". The idea being that if you place each image in a (NxM)-dimensional vector space, you can compute the distance between two images as the distance between the hyperplanes formed by each where the hyperplane is given by taking the point, and rotating the image, rescaling the image, translating the image, etc.
I can't figure out enough to fill in the missing details. I understand that most of these are indeed linear operators, so how does one use that fact to then create the hyperplane? And once we have a hyperplane, how do we take its distance with other hyperplanes?

I will give you some hints. You need some background knowledge in image processing. Please refer to 2,3 for details.
2 is a c implementation of tangent distance
3 is a paper that describes tangent distance in more details
Image Convolution
According to 3, the first step you need to do is to smooth the picture. Below we show the result of 3 different smooth operations (check section 4 of 3) (The left column shows the result images, the right column shows the original images and the convolution operators). This step is to map the discrete vector to continuous one so that it is differentiable. The author suggests to use a Gaussian function. If you need more background about image convolution, here is an example.
After this step is done, you have calculated the horizontal and vertical shift:
Calculating Scaling Tangent
Here I show you one of the tangent calculations implemented in 2 - the scaling tangent. From 3, we know the transformation is as below:
/* scaling */
for(k=0;k<height;k++)
for(j=0;j<width;j++) {
currentTangent[ind] = ((j+offsetW)*x1[ind] + (k+offsetH)*x2[ind])*factor;
ind++;
}
In the beginning of td.c in 2's implementation, we know the below definition:
factorW=((double)width*0.5);
offsetW=0.5-factorW;
factorW=1.0/factorW;
factorH=((double)height*0.5);
offsetH=0.5-factorH;
factorH=1.0/factorH;
factor=(factorH<factorW)?factorH:factorW; //min
The author is using images with size 16x16. So we know
factor=factorW=factorH=1/8,
and
offsetH=offsetW = 0.5-8 = -7.5
Also note we already computed
x1[ind] = ,
x2[ind] =
So that, we plug in those constants:
currentTangent[ind] = ((j-7.5)*x1[ind] + (k-7.5)*x2[ind])/8
= x1 * (j-7.5)/8 + x2 * (k-7.5)/8.
Since j(also k) is an integer between 0 and 15 inclusive (the width and the height of the image are 16 pixels), (j-7.5)/8 is just a fraction number between -0.9375 to 0.9375.
So I guess (j+offsetW)*factor is the displacement for each pixel, which is proportional to the horizontal distance from the pixel to the center of the image. Similarly you know the vertical displacement (k+offsetH)*factor.
Calculating Rotation Tangent
Rotation tangent is defined as below in 3:
/* rotation */
for(k=0;k<height;k++)
for(j=0;j<width;j++) {
currentTangent[ind] = ((k+offsetH)*x1[ind] - (j+offsetW)*x2[ind])*factor;
ind++;
}
Using the conclusion from previous, we know (k+offsetH)*factor corresponds to y. Similarly - (j+offsetW)*factor corresponds to -x. So you know that is exactly the formula used in 3.
You can find all other tangents described in 3 implemented at 2. I like the below image from 3, which clearly shows the displacements effect of different transformation tangents.
Calculating the tangent distance between images
Just follow the implementation in tangentDistance function:
// determine the tangents of the first image
calculateTangents(imageOne, tangents, numTangents, height, width, choice, background);
// find the orthonormal tangent subspace
numTangentsRemaining = normalizeTangents(tangents, numTangents, height, width);
// determine the distance to the closest point in the subspace
dist=calculateDistance(imageOne, imageTwo, (const double **) tangents, numTangentsRemaining, height, width);
I think the above should be enough to get you started and if anything is missing, please read 3 carefully and see corresponding implementations in 2. Good luck!

Related

Area of n overlapping circle [duplicate]

I recently came across a problem where I had four circles (midpoints and radius) and had to calculate the area of the union of these circles.
Example image:
For two circles it's quite easy,
I can just calculate the fraction of the each circles area that is not within the triangles and then calculate the area of the triangles.
But is there a clever algorithm I can use when there is more than two circles?
Find all circle intersections on the outer perimeter (e.g. B,D,F,H on the following diagram). Connect them together with the centres of the corresponding circles to form a polygon. The area of the union of the circles is the area of the polygon + the area of the circle slices defined by consecutive intersection points and the circle center in between them. You'll need to also account for any holes.
I'm sure there is a clever algorithm, but here's a dumb one to save having to look for it;
put a bounding box around the circles;
generate random points within the bounding box;
figure out whether the random point is inside one of the circles;
compute the area by some simple addition and division (proportion_of_points_inside*area_of_bounding_box).
Sure it's dumb, but:
you can get as accurate an answer as you want, just generate more points;
it will work for any shapes for which you can calculate the inside/outside distinction;
it will parallelise beautifully so you can use all your cores.
Ants Aasma's answer gave the basic idea, but I wanted to make it a little more concrete. Take a look at the five circles below and the way they've been decomposed.
The blue dots are circle centers.
The red dots are circle boundary intersections.
The red dots with white interior are circle boundary intersections that are not contained in any other circles.
Identifying these 3 types of dots is easy. Now construct a graph data structure where the nodes are the blue dots and the red dots with white interior. For every circle, put an edge between the circle middle (blue dot) and each of its intersections (red dots with white interior) on its boundary.
This decomposes the circle union into a set of polygons (shaded blue) and circular pie pieces (shaded green) that are pairwise disjoint and cover the original union (that is, a partition). Since each piece here is something that's easy to compute the area of, you can compute the area of the union by summing the pieces' areas.
For a different solution from the previous one you could produce an estimation with an arbitrary precision using a quadtree.
This also works for any shape union if you can tell if a square is inside or outside or intersects the shape.
Each cell has one of the states : empty , full , partial
The algorithm consists in "drawing" the circles in the quadtree starting with a low resolution ( 4 cells for instance marked as empty). Each cell is either :
inside at least one circle, then mark the cell as full,
outside all circles, mark the cell as empty,
else mark the cell as partial.
When it's done, you can compute an estimation of the area : the full cells give the lower bound, the empty cells give the higher bound, the partial cells give the max area error.
If the error is too big for you, you refine the partial cells until you get the right precision.
I think this will be easier to implement than the geometric method which may require to handle a lot of special cases.
I love the approach to the case of 2 intersecting circles -- here's how i'd use a slight variation of the same approach for the more complex example.
It might give better insight into generalising the algorithm for larger numbers of semi-overlapping circles.
The difference here is that i start by linking the centres (so there's a vertice between the centre of the circles, rather than between the places where the circles intersect) I think this lets it generalise better.
(in practice, maybe the monte-carlo method is worthwhile)
(source: secretGeek.net)
If you want a discrete (as opposed to a continuous) answer, you could do something similar to a pixel painting algorithm.
Draw the circles on a grid, and then color each cell of the grid if it's mostly contained within a cirle (i.e., at least 50% of its area is inside one of the circles). Do this for the entire grid (where the grid spans all of the area covered by the circles), then count the number of colored cells in the grid.
Hmm, very interesting problem. My approach would probably be something along the lines of the following:
Work out a way of working out what the areas of intersection between an arbitrary number of circles is, i.e. if I have 3 circles, I need to be able to work out what the intersection between those circles is. The "Monte-Carlo" method would be a good way of approximating this (http://local.wasp.uwa.edu.au/~pbourke/geometry/circlearea/).
Eliminate any circles that are contained entirely in another larger circle (look at radius and the modulus of the distance between the centre of the two circles) I dont think is mandatory.
Choose 2 circles (call them A and B) and work out the total area using this formula:
(this is true for any shape, be it circle or otherwise)
area(A∪B) = area(A) + area(B) - area(A∩B)
Where A ∪ B means A union B and A ∩ B means A intersect B (you can work this out from the first step.
Now keep on adding circles and keep on working out the area added as a sum / subtraction of areas of circles and areas of intersections between circles. For example for 3 circles (call the extra circle C) we work out the area using this formula:
(This is the same as above where A has been replaced with A∪B)
area((A∪B)∪C) = area(A∪B) + area(C) - area((A∪B)∩C)
Where area(A∪B) we just worked out, and area((A∪B)∩C) can be found:
area((A∪B)nC) = area((A∩C)∪(B∩C)) = area(A∩C) + area(A∩B) - area((A∩C)∩(B∩C)) = area(A∩C) + area(A∩B) - area(A∩B∩C)
Where again you can find area(A∩B∩C) from above.
The tricky bit is the last step - the more circles get added the more complex it becomes. I believe there is an expansion for working out the area of an intersection with a finite union, or alternatively you may be able to recursively work it out.
Also with regard to using Monte-Carlo to approximate the area of itersection, I believe its possible to reduce the intersection of an arbitrary number of circles to the intersection of 4 of those circles, which can be calculated exactly (no idea how to do this however).
There is probably a better way of doing this btw - the complexity increases significantly (possibly exponentially, but I'm not sure) for each extra circle added.
There are efficient solutions to this problem using what are known as power diagrams. This is really heavy math though and not something that I would want to tackle offhand. For an "easy" solution, look up line-sweep algorithms. The basic principle here is that that you divide the figure up into strips, where calculating the area in each strip is relatively easy.
So, on the figure containing all of the circles with nothing rubbed out, draw a horizontal line at each position which is either the top of a circle, the bottom of a circle or the intersection of 2 circles. Notice that inside these strips, all of the areas you need to calculate look the same: a "trapezium" with two sides replaced by circular segments. So if you can work out how to calculate such a shape, you just do it for all the individual shapes and add them together. The complexity of this naive approach is O(N^3), where N is the number of circles in the figure. With some clever data structure use, you could improve this line-sweep method to O(N^2 * log(N)), but unless you really need to, it's probably not worth the trouble.
The pixel-painting approach (as suggested by #Loadmaster) is superior to the mathematical solution in a variety of ways:
Implementation is much simpler. The above problem can be solved in less than 100 lines of code, as this JSFiddle solution demonstrates (mostly because it’s conceptually much simpler, and has no edge cases or exceptions to deal with).
It adapts easily to more general problems. It works with any shape, regardless of morphology, as long as it’s renderable with 2D drawing libraries (i.e., “all of them!”) — circles, ellipses, splines, polygons, you name it. Heck, even bitmap images.
The complexity of the pixel-painting solution is ~O[n], as compared to ~O[n*n] for the mathematical solution. This means it will perform better as the number of shapes increases.
And speaking of performance, you’ll often get hardware acceleration for free, as most modern 2D libraries (like HTML5’s canvas, I believe) will offload rendering work to graphics accelerators.
The one downside to pixel-painting is the finite accuracy of the solution. But that is tunable by simply rendering to larger or smaller canvases as the situation demands. Note, too, that anti-aliasing in the 2D rendering code (often turned on by default) will yield better-than-pixel-level accuracy. So, for example, rendering a 100x100 figure into a canvas of the same dimensions should, I think, yield accuracy on the order of 1 / (100 x 100 x 255) = .000039% ... which is probably “good enough” for all but the most demanding problems.
<p>Area computation of arbitrary figures as done thru pixel-painting, in which a complex shape is drawn into an HTML5 canvas and the area determined by comparing the number of white pixels found in the resulting bitmap. See javascript source for details.</p>
<canvas id="canvas" width="80" height="100"></canvas>
<p>Area = <span id="result"></span></p>
// Get HTML canvas element (and context) to draw into
var canvas = document.getElementById('canvas');
var ctx = canvas.getContext('2d');
// Lil' circle drawing utility
function circle(x,y,r) {
ctx.beginPath();
ctx.arc(x, y, r, 0, Math.PI*2);
ctx.fill();
}
// Clear canvas (to black)
ctx.fillStyle = 'black';
ctx.fillRect(0, 0, canvas.width, canvas.height);
// Fill shape (in white)
ctx.fillStyle = 'white';
circle(40, 50, 40);
circle(40, 10, 10);
circle(25, 15, 12);
circle(35, 90, 10);
// Get bitmap data
var id = ctx.getImageData(0, 0, canvas.width, canvas.height);
var pixels = id.data; // Flat array of RGBA bytes
// Determine area by counting the white pixels
for (var i = 0, area = 0; i < pixels.length; i += 4) {
area += pixels[i]; // Red channel (same as green and blue channels)
}
// Normalize by the max white value of 255
area /= 255;
// Output result
document.getElementById('result').innerHTML = area.toFixed(2);
I have been working on a problem of simulating overlapping star fields, attempting to estimate the true star counts from the actual disk areas in dense fields, where the larger bright stars can mask fainter ones. I too had hoped to be able to do this by rigorous formal analysis, but was unable to find an algorithm for the task. I solved it by generating the star fields on a blue background as green disks, whose diameter was determined by a probability algorithm. A simple routine can pair them to see if there's an overlap (turning the star pair yellow); then a pixel count of the colours generates the observed area to compare to the theoretical area. This then generates a probability curve for the true counts. Brute force maybe, but it seems to work OK.
(source: 2from.com)
Here's an algorithm that should be easy to implement in practice, and could be adjusted to produce arbitrarily small error:
Approximate each circle by a regular polygon centered at the same point
Calculate the polygon which is the union of the approximated circles
Calculate the area of the merged polygon
Steps 2 and 3 can be carried out using standard, easy-to-find algorithms from computational geometry.
Obviously, the more sides you use for each approximating polygon, the closer to exact your answer would be. You could approximate using inscribed and circumscribed polygons to get bounds on the exact answer.
I found this link which may be useful. There does not seem to be a definitive answer though.
Google answers. Another reference for three circles is Haruki's theorem. There is a paper there as well.
Depending on what problem you are trying to solve it could be sufficient to get an upper and lower bound. An upper bound is easy, just the sum of all the circles. For a lower bound you can pick a single radius such that none of the circles overlap. To better that find the largest radius (up to the actual radius) for each circle so that it doesn't overlap. It should also be pretty trivial to remove any completely overlapped circles (All such circles satisfy |P_a - P_b| <= r_a) where P_a is the center of circle A, P_b is the center of circle B, and r_a is the radius of A) and this betters both the upper and lower bound. You could also get a better Upper bound if you use your pair formula on arbitrary pairs instead of just the sum of all the circles. There might be a good way to pick the "best" pairs (the pairs that result in the minimal total area.
Given an upper and lower bound you might be able to better tune a Monte-carlo approach, but nothing specific comes to mind. Another option (again depending on your application) is to rasterize the circles and count pixels. It is basically the Monte-carlo approach with a fixed distribution.
I've got a way to get an approximate answer if you know that all your circles are going to be within a particular region, i.e. each point in circle is inside a box whose dimensions you know. This assumption would be valid, for example, if all the circles are in an image of known size. If you can make this assumption, divide the region which contains your image into 'pixels'. For each pixel, compute whether it is inside at least one of the circles. If it is, increment a running total by one. Once you are done, you know how many pixels are inside at least one circle, and you also know the area of each pixel, so you can calculate the total area of all the overlapping circles.
By increasing the 'resolution' of your region (the number of pixels), you can improve your approximation.
Additionally, if the size of the region containing your circles is bounded, and you keep the resolution (number of pixels) constant, the algorithm runs in O(n) time (n is the number of circles). This is because for each pixel, you have to check whether it is inside each one of your n circles, and the total number of pixels is bounded.
This can be solved using Green's Theorem, with a complexity of n^2log(n).
If you're not familiar with the Green's Theorem and want to know more, here is the video and notes from Khan Academy. But for the sake of our problem, I think my description will be enough.
If I put L and M such that
then the RHS is simply the area of the Region R and can be obtained by solving the closed integral or LHS and this is exactly what we're going to do.
So Integrating along the path in the anticlockwise gives us the Area of the region and integrating along the clockwise gives us negative of the Area. So
AreaOfUnion = (Integration along red arcs in anticlockwise direction + Integration along blue arcs in clockwise direction)
But the cool trick is if for each circle if we integrate the arcs which are not inside any other circle we get our required area i.e. we get integration in an anticlockwise direction along all red arcs and integration along all blue arcs along the clockwise direction. JOB DONE!!!
Even the cases when a circle doesn't intersect with any other is taken
care of.
Here is the GitHub link to my C++ Code

Percieved width of a decal depending on the rotation angle of the wall

I am creating a raycasting game from scratch using JavaScript canvas.
Part of the challenge (for me) is to decorate walls with random images (pictures). I already implemented drawing of walls, floor an ceiling and sprites.
While drawing walls, I store for each x (depicting screen coordinate) the distance to the wall (Z-BUFFER), the height of the wall (H-BUFFER) and actual coordinates of the pixel in the underlying 2D grid (GRID_BUFFER).
My approach for painting the decals (pictures) on the wall is then the following (after identifying a list of decals that could theoretically be visible):
distance to the decal's position is calculated (position is defined as being in the middle of the grid vertice facing the observer)
screen coordinate decalScreenX is calculated based on the transformation matrix from grid coordinates to screen coordinates. This works correctly:
let decalScreenX = Math.floor((RAYCAST.SCREEN_WIDTH / 2) * (1 + CAMERA.transformX /CAMERA.transformDepth));
Then I retrieve image data for the decal in question and get it's width and height
And based on the distance and the observed angle, I calculate the percieved width of the decal. This is where the real issue lies, as I see that I don't calculate this width completely accurate.
with all this information, it is then easy to calculate left and right screen coordinates - where to begin and and where to end drawing the decal, use H-BUFFER to calculate height factor and use GRID_BUFFER to draw only on grid belonging to this decal.
I saw the width calculation in terms that decal is rotated from the player direction vector by an angle, if the player direction is not opposite of the direction with which decal faces the space (example):
or if player direction is directly opposite to the direction of decal, this angle is 0° (example):
My first approach was to use dot product of the reversed player direction and decal facing direction, thus getting cosine of the angle between vectors and use this as a factor to reduce perceived width:
let CosA = PLAYER.dir.mirror().dot(decal.facingDir);
let widthScale = CosA * (CAMERA.transformDepth / decal.distance);
The problem with this solution is, that when perpendicular , the factor is 0 and the decal is not drawn but as the walls are drawn with perspective, this should not be the case. So I began improvising. I defined CAMERA.minPerspective factor as seen below. Field of vision (FOV) is 70°.
CAMERA.minPerspective = Math.cos(Math.radians((90 + this.FOV) / 2));
My intuition was (as I lack the knowledge of perspective and geometry, alas) that for small angles, the factor should remain 1. And for angles close to 90° there should be some minimal factor, so that decal remains visible. So I came with this "improved" code:
let CosA = PLAYER.dir.mirror().dot(decal.facingDir);
let FACTOR = Math.min(1, CosA + CAMERA.minPerspective);
FACTOR = Math.max(FACTOR, CAMERA.minPerspective);
let widthScale = FACTOR * (CAMERA.transformDepth / decal.distance);
This works considerably better, but it has some flaws. Visually, for angles 0-50° the factor of reduction is too great. This can be observed if I use decals of such width, that they should cover complete grid surface. (see image below; left of the stairs the wall underneath is visible, decal should cover complete grid, but it doesn't, bacause the FACTOR is to small).
I have searched Stack Overflow and the rest of the Web for better solution, by it seems that my knowledge of geometry also prevents me to recognize proper solutions if they are out of this context.
So, please. There are probably deterministic solutions for calculating percieved width, without using raycasting phase again or by using the information I am able to store in raycasting phase. While JavaScript is used in code example, I consider this question not to be specific to any programming language.
I have found solution that retains (or even improves) simplicity and time complexity of the approach in the question.
I have added two points to the decal definition - leftDrawStart and
rightStartDraw. Those are easy to calculate at the point of decal
instantialization, based on real sprite (decal) width and the definition
of the grid (block) size. While doing this calculation, I consider leftDrawStart from the camera perspective (not grid coordinates).
when rendering decal, I calculate using transformation matrix (as in question, code example below) screen coordinates for leftDrawStart and rightStartDraw from their grid coordinates:
transform(spritePos) {
let invDet = 1.0 / (CAMERA.dir.x * PLAYER.dir.y - PLAYER.dir.x * CAMERA.dir.y);
CAMERA.transformX = invDet * (PLAYER.dir.y * spritePos.x - PLAYER.dir.x * spritePos.y);
CAMERA.transformDepth = invDet * (-CAMERA.dir.y * spritePos.x + CAMERA.dir.x * spritePos.y);
}
I distinguish the calculated absolute drawStartX and drawEndX, and their adjustment so that they fit the screen boundaries or return from function if they are completely offscreen
finally, percieved width of the decal is not even required since the texture position can be calculated by using ratio of differences between curent drawing stripe - absolute drawing start and difference of absolute drawing end - absolute drawing start:
let texX = (((stripe - drawStartX_abs) / (drawEndX_abs - drawStartX_abs)) * imageData.width) | 0;
The approach is completelly accurate and considerably faster in comparison to approach where decal casting would be incorporated in the raycasting step.

Moving an object diagonally inside a square

I am stuck on a particular problem. I am learning on how to create a very basic game, where a ball will travel diagonally from either top left corner of a square or a rectangular down to the bottom right corner in a straight line (As shown in Fig 1 & 2). Now I know that the ball x and y position will both need to be changed frame by frame but I am unsure on how to go about this.
enter image description here
Math is not my strong point and I am unsure how do I calculate the exact route, especially since both the square and rectangle will have a different angles. Are there any math formulas I can use to calculate the diagonal line and by how much each of the x and y coordinates of the ball will need to be adjusted frame by frame.
From the research that I have done I think that I will most likely need to calculate the angle using the sin or cos functions but I am not sure how everything fits together. Have been using https://www.mathsisfun.com/sine-cosine-tangent.html to try and learn more.
I am planning on starting to code this but would really appreciate answers to these basic questions. I am trying to learn both the programming and the mathematical aspect at the same time and I feel that this approach would be the best fit.
Many Thanks for any suggestions/help, I would really appreciate it.
Since it's rectangular, just calculate the slope: rise (Y) / run (X). That will give you how much to increase the object's location in each direction per frame. Depending on how fast or slow you want the object to move, you'll need to apply some modifier to that (e.g., if you want the object to move twice as fast, you'll need to multiple 2 by the change in a particular direction before you actually change the object's location.
For square :
If you are using Frame or JFrame, you have coordinate with you.
You can move ball from left top to right down as follow ->
Suppose ur top left corner is at (0,0), add 1in both coordinate until you reach right bottom corner.
U can do this using for loop
You don't technically need the angle for this mapping. You know that the formula for a line is "y = m * x + b." I presume that you can calculate m,b. If not, let me know.
Given that - you can simply increment x based on anything you like (timer, event, etc. ). You can place your incremented x into the equation above to get your respective y.
Now, that won't be quite enough as you are dealing with pixels instead of actual numbers. For example, lest assume that in your game x/y are in feet. You will need to know how many pixels represent a foot. Then when you draw to the screen you adjust your coordinates by dividing by pixels per foot.
So...
1. Calculate your m and b for your path.
2. Use a timer. At each tick, adjust your x value
3. Use your x value to calculate your y value
4. Divide x and y by a scaling number
5. Use the new scaled x and y to plot your object
Now...There are all kinds of tricks you can play with the math, but that should get you started.
Let's left bottom corner of rectangle (or square) has coordinates (x0, y0), and right top corner (x1, y1). Then diagonal has equation
X(t) = x0 + t * (x1 - x0)
Y(t) = y0 + t * (y1 - y0)
where t is parameter in range 0..1. Point at t=0 corresponds to (x0, y0), at t=1 to (x1, y1), and at t=0.5 - to the center of rectangle. So all you need is vary parameter t and calculate position.
When your object will move with constant absolute speed in arbitrary direction, use horizontal and vertical components of velocity vx and vy. At every moment get coordinates as x_new = x_old + delta_time * vx. Note that reflection from vertical edge just changes horizontal component of velocity 'vx = - vx' and so on.

Smooth local celing function

I have a gray-scale image and I want to make a function that
closely follows the image
is always grater than it the image
smooth at some given scale.
In other words I want a smooth function that approximates the maximum of another function in the local region while over estimating the that function at all points.
Any ideas?
My first pass at this amounted to picking the "high spots" (by comparing the image to a least-squares fit of a high order 2-D polynomial) and matching a 2-D polynomial to them and their slopes. As the first fit required more working space than I had address space, I think it's not going to work and I'm going to have to come up with something else...
What I did
My end target was to do a smooth adjustment on an image so that each local region uses the full range of values. The key realization was that an "almost perfect" function would do just fine for me.
The following procedure (that never has the max function explicitly) is what I ended up with:
Find the local mean and standard deviation at each point using a "blur" like function.
offset the image to get a zero mean. (image -= mean;)
divide each pixel by its stdev. (image /= stdev;)
the most image should now be in [-1,1] (oddly enough most of my test images have better than 99% in that range rather than the 67% that would be expected)
find the standard deviation of the whole image.
map some span +/- n*sigma to your output range.
With a little manipulation, that can be converted to find the Max function I was asking about.
Here's something that's easy; I don't know how good it is.
To get smooth, use your favorite blurring algorithm. E.g., average points within radius 5. Space cost is order the size of the image and time is the product of the image size with the square of the blurring radius.
Take the difference of each individual pixel with the original image, find the maximum value of (original[i][j] - blurred[i][j]), and add that value to every pixel in the blurred image. The sum is guaranteed to overapproximate the original image. Time cost is proportional to the size of the image, with constant additional space (if you overwrite the blurred image after computing the max.
To do better (e.g., to minimize the square error under some set of constraints), you'll have to pick some class of smooth curves and do some substantial calculations. You could try quadratic or cubic splines, but in two dimensions splines are not much fun.
My quick and dirty answer would be to start with the original image, and repeat the following process for each pixel until no changes are made:
If an overlarge delta in value between this pixel and its neighbours can be resolved by increasing the value of the pixel, do so.
If an overlarge slope change around this pixel and its neighbours can be resolved by increasing the value of the pixel, do so.
The 2D version would look something like this:
for all x:
d = img[x-1] - img[x]
if d > DMAX:
img[x] += d - DMAX
d = img[x+1] - img[x]
if d > DMAX:
img[x] += d - DMAX
dleft = img[x-1] - img[x]
dright = img[x] - img[x+1]
d = dright - dleft
if d > SLOPEMAX:
img[x] += d - SLOPEMAX
Maximum filter the image with an RxR filter, then use an order R-1 B-spline smoothing on the maximum-filtered image. The convex hull properties of the B-spline guarantee that it will be above the original image.
Can you clarify what you mean by your desire that it be "smooth" at some scale? Also, over how large of a "local region" do you want it to approximate the maximum?
Quick and dirty answer: weighted average of the source image and a windowed maximum.

Normal Vector of Three Points

Hey math geeks, I've got a problem that's been stumping me for a while now. It's for a personal project.
I've got three dots: red, green, and blue. They're positioned on a cardboard slip such that the red dot is in the lower left (0,0), the blue dot is in the lower right (1,0), and the green dot is in the upper left. Imagine stepping back and taking a picture of the card from an angle. If you were to find the center of each dot in the picture (let's say the units are pixels), how would you find the normal vector of the card's face in the picture (relative to the camera)?
Now a few things I've picked up about this problem:
The dots (in "real life") are always at a right angle. In the picture, they're only at a right angle if the camera has been rotated around the red dot along an "axis" (axis being the line created by the red and blue or red and green dots).
There are dots on only one side of the card. Thus, you know you'll never be looking at the back of it.
The distance of the card to the camera is irrelevant. If I knew the depth of each point, this would be a whole lot easier (just a simple cross product, no?).
The rotation of the card is irrelevant to what I'm looking for. In the tinkering that I've been doing to try to figure this one out, the rotation can be found with the help of the normal vector in the end. Whether or not the rotation is a part of (or product of) finding the normal vector is unknown to me.
Hope there's someone out there that's either done this or is a math genius. I've got two of my friends here helping me on it and we've--so far--been unsuccessful.
i worked it out in my old version of MathCAD:
Edit: Wording wrong in screenshot of MathCAD: "Known: g and b are perpendicular to each other"
In MathCAD i forgot the final step of doing the cross-product, which i'll copy-paste here from my earlier answer:
Now we've solved for the X-Y-Z of the
translated g and b points, your
original question wanted the normal of
the plane.
If cross g x b, we'll get the
vector normal to both:
| u1 u2 u3 |
g x b = | g1 g2 g3 |
| b1 b2 b3 |
= (g2b3 - b2g3)u1 + (b1g3 - b3g1)u2 + (g1b2 - b1g2)u3
All the values are known, plug them in
(i won't write out the version with g3
and b3 substituted in, since it's just
too long and ugly to be helpful.
But in practical terms, i think you'll have to solve it numerically, adjusting gz and bz so as to best fit the conditions:
g · b = 0
and
|g| = |b|
Since the pixels are not algebraically perfect.
Example
Using a picture of the Apollo 13 astronauts rigging one of the command module's square Lithium Hydroxide cannister to work in the LEM, i located the corners:
Using them as my basis for an X-Y plane:
i recorded the pixel locations using Photoshop, with positive X to the right, and positive Y down (to keep the right-hand rule of Z going "into" the picture):
g = (79.5, -48.5, gz)
b = (-110.8, -62.8, bz)
Punching the two starting formulas into Excel, and using the analysis toolpack to "minimize" the error by adjusting gz and bz, it came up with two Z values:
g = (79.5, -48.5, 102.5)
b = (-110.8, -62.8, 56.2)
Which then lets me calcuate other interesting values.
The length of g and b in pixels:
|g| = 138.5
|b| = 139.2
The normal vector:
g x b = (3710, -15827, -10366)
The unit normal (length 1):
uN = (0.1925, -0.8209, -0.5377)
Scaling normal to same length (in pixels) as g and b (138.9):
Normal = (26.7, -114.0, -74.7)
Now that i have the normal that is the same length as g and b, i plotted them on the same picture:
i think you're going to have a new problem: distortion introduced by the camera lens. The three dots are not perfectly projected onto the 2-dimensional photographic plane. There's a spherical distortion that makes straight lines no longer straight, makes equal lengths no longer equal, and makes the normals slightly off of normal.
Microsoft research has an algorithm to figure out how to correct for the camera's distortion:
A Flexible New Technique for Camera Calibration
But it's beyond me:
We propose a flexible new technique to
easily calibrate a camera. It is well
suited for use without specialized
knowledge of 3D geometry or computer
vision. The technique only requires
the camera to observe a planar pattern
shown at a few (at least two)
different orientations. Either the
camera or the planar pattern can be
freely moved. The motion need not be
known. Radial lens distortion is
modeled. The proposed procedure
consists of a closed-form solution,
followed by a nonlinear refinement
based on the maximum likelihood
criterion. Both computer simulation
and real data have been used to test
the proposed technique, and very good
results have been obtained. Compared
with classical techniques which use
expensive equipments such as two or
three orthogonal planes, the proposed
technique is easy to use and flexible.
It advances 3D computer vision one
step from laboratory environments to
real world use.
They have a sample image, where you can see the distortion:
(source: microsoft.com)
Note
you don't know if you're seeing the "top" of the cardboard, or the "bottom", so the normal could be mirrored vertically (i.e. z = -z)
Update
Guy found an error in the derived algebraic formulas. Fixing it leads to formulas that i, don't think, have a simple closed form. This isn't too bad, since it can't be solved exactly anyway; but numerically.
Here's a screenshot from Excel where i start with the two knowns rules:
g · b = 0
and
|g| = |b|
Writing the 2nd one as a difference (an "error" amount), you can then add both up and use that value as a number to have excel's solver minimize:
This means you'll have to write your own numeric iterative solver. i'm staring over at my Numerical Methods for Engineers textbook from university; i know it contains algorithms to solve recursive equations with no simple closed form.
From the sounds of it, you have three points p1, p2, and p3 defining a plane, and you want to find the normal vector to the plane.
Representing the points as vectors from the origin, an equation for a normal vector would be
n = (p2 - p1)x(p3 - p1)
(where x is the cross-product of the two vectors)
If you want the vector to point outwards from the front of the card, then ala the right-hand rule, set
p1 = red (lower-left) dot
p2 = blue (lower-right) dot
p3 = green (upper-left) dot
# Ian Boyd...I liked your explanation, only I got stuck on step 2, when you said to solve for bz. You still had bz in your answer, and I don't think you should have bz in your answer...
bz should be +/- square root of gx2 + gy2 + gz2 - bx2 - by2
After I did this myself, I found it very difficult to substitute bz into the first equation when you solved for gz, because when substituting bz, you would now get:
gz = -(gxbx + gyby) / sqrt( gx2 + gy2 + gz2 - bx2 - by2 )
The part that makes this difficult is that there is gz in the square root, so you have to separate it and combine the gz together, and solve for gz Which I did, only I don't think the way I solved it was correct, because when I wrote my program to calculate gz for me, I used your gx, and gy values to see if my answer matched up with yours, and it did not.
So I was wondering if you could help me out, because I really need to get this to work for one of my projects. Thanks!
Just thinking on my feet here.
Your effective inputs are the apparent ratio RB/RG [+], the apparent angle BRG, and the angle that (say) RB makes with your screen coordinate y-axis (did I miss anything). You need out the components of the normalized normal (heh!) vector, which I believe is only two independent values (though you are left with a front-back ambiguity if the card is see through).[++]
So I'm guessing that this is possible...
From here on I work on the assumption that the apparent angle of RB is always 0, and we can rotate the final solution around the z-axis later.
Start with the card positioned parallel to the viewing plane and oriented in the "natural" way (i.e. you upper vs. lower and left vs. right assignments are respected). We can reach all the interesting positions of the card by rotating by \theta around the initial x-axis (for -\pi/2 < \theta < \pi/2), then rotating by \phi around initial y-axis (for -\pi/2 < \phi < \pi/2). Note that we have preserved the apparent direction of the RB vector.
Next step compute the apparent ratio and apparent angle after in terms of \theta and \phi and invert the result.[+++]
The normal will be R_y(\phi)R_x(\theta)(0, 0, 1) for R_i the primitive rotation matrix around axis i.
[+] The absolute lengths don't count, because that just tells you the distance to card.
[++] One more assumption: that the distance from the card to view plane is much large than the size of the card.
[+++] Here the projection you use from three-d space to the viewing plane matters. This is the hard part, but not something we can do for you unless you say what projection you are using. If you are using a real camera, then this is a perspective projection and is covered in essentially any book on 3D graphics.
right, the normal vector does not change by distance, but the projection of the cardboard on a picture does change by distance (Simple: If you have a small cardboard, nothing changes.
If you have a cardboard 1 mile wide and 1 mile high and you rotate it so that one side is nearer and the other side more far away, the near side is magnified and the far side shortened on the picture. You can see that immediately that an rectangle does not remain a rectangle, but a trapeze)
The mostly accurate way for small angles and the camera centered on the middle is to measure the ratio of the width/height between "normal" image and angle image on the middle lines (because they are not warped).
We define x as left to right, y as down to up, z as from far to near.
Then
x = arcsin(measuredWidth/normWidth) red-blue
y = arcsin(measuredHeight/normHeight) red-green
z = sqrt(1.0-x^2-y^2)
I will calculate tomorrow a more exact solution, but I'm too tired now...
You could use u,v,n co-oridnates. Set your viewpoint to the position of the "eye" or "camera", then translate your x,y,z co-ordinates to u,v,n. From there you can determine the normals, as well as perspective and visible surfaces if you want (u',v',n'). Also, bear in mind that 2D = 3D with z=0. Finally, make sure you use homogenious co-ordinates.

Resources