What is the math behind -webkit-perspective? - css

"Simple" question that I can't find the answer to -- What does -webkit-perspective actually do mathematically? (I know the effect it has, it basically acts like a focal-length control) e.g. what does -webkit-perspective: 500 mean?!?
I need to find the on-screen location of something that's been moved using, among other things, -webkit-perspective

The CSS 3D Transforms Module working draft gives the following explanation:
perspective(<number>)
specifies a perspective projection matrix. This matrix maps a viewing cube onto a pyramid whose base is infinitely far away from the
viewer and whose peak represents the viewer's position. The viewable
area is the region bounded by the four edges of the viewport (the
portion of the browser window used for rendering the webpage between
the viewer's position and a point at a distance of infinity from the
viewer). The depth, given as the parameter to the function, represents
the distance of the z=0 plane from the viewer. Lower values give a
more flattened pyramid and therefore a more pronounced perspective
effect. The value is given in pixels, so a value of 1000 gives a
moderate amount of foreshortening and a value of 200 gives an extreme
amount. The matrix is computed by starting with an identity matrix and
replacing the value at row 3, column 4 with the value -1/depth. The
value for depth must be greater than zero, otherwise the function is
invalid.
This is something of a start, if not entirely clear. The first sentence leads me to believe the perspective projection matrix article on Wikipedia might be of some help, although in the comments on this post it is revealed there might be some slight differences between the CSS Working Group's conventions and those found in Wikipedia, so please check those out to save yourself a headache.

Check out http://en.wikipedia.org/wiki/Perspective_projection#Diagram
After reading the previous comments and doing some research and testing I'm pretty sure this is correct.
Notice that this is the same for the Y coordinate too.
Transformed X = Original X * ( Perspective / ( Perspective - Z translation ) )
e.g.
Div is 500px wide
Perspective is 10000px
Transform is -5000px in Z direction
Transformed Width = 500 * ( 10000 / ( 10000 - ( -5000 ) ) )
Transformed Width = 500 * ( 10000 / 15000 ) = 500 * ( 2/3 ) = 333px
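A quick Python sketch of that formula (my own illustration, not from the spec; x and y are measured from the perspective origin and z is the translateZ value):
def perspective_scale(perspective, z):
    # Scale factor that CSS perspective applies to a point translated by z.
    return perspective / (perspective - z)

def project_xy(x, y, z, perspective):
    s = perspective_scale(perspective, z)
    return x * s, y * s

# The worked example above: perspective is 10000px, translateZ is -5000px.
# A point 250px from the origin lands at ~166.7px, so a 500px-wide div
# renders roughly 333px wide.
print(project_xy(250.0, 0.0, -5000.0, 10000.0))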

@Domenic Oddly enough, the description "The matrix is computed by starting with an identity matrix and replacing the value at row 3, column 4 with the value -1/depth." has since been removed from the CSS 3D Transforms Module working draft. Perhaps there were some inaccuracies in that description.
Well, as to the question of what the number in perspective(<number>) means, I think it can be seen as the distance between the position of the imagined camera and your computer screen.

Related

DirectX negative W

I have really been trying to find an answer to this seemingly basic question.
For simplicity, the depth test is disabled in the discussion below (it does not change much here).
For example, we have a triangle (after transformation) with the following float4 coordinates:
top CenterPoint: (0.0f, +0.6f, 0.6f, 1f)
base point1: (+0.4f, -0.4f, 0.4f, 1f),
base point2: (-0.4f, -0.4f, 0.4f, 1f),
I'm sending float4 values as input and using a pass-through vertex shader (no transforms), so I'm sure about the input, and the rendered result is reasonable.
But what do we get if we start to move CenterPoint towards the camera position? In our case we don't have a camera, so we will move this point towards minus infinity.
I get quite reasonable results as long as w (along with z) stays positive.
For example, (0.0f, +0.006f, 0.006f, .01f) looks the same.
But what if I use the coordinates (0.0f, -0.6f, -1f, -1f)?
(Note: we have to swap the points or change the rasterizer state to prevent back-face culling.)
According to a huge number of resources, there is a test like -w < z < w, so the GPU should clip that point. And yes, in principle, I don't see the point itself. But the triangle is still visible! OK, according to many other resources (and my own understanding), there is a division like (x/w, y/w, z/w), so the result should be (0, 0.6, 1). But that is not what I'm getting.
And even if that result makes some sense (one point is somewhere far away behind us), how does DirectX (or rather, the GPU) really handle such cases (points at infinity and negative w)?
It seems I'm missing something very basic, but nobody seems to know it either.
[Added]: I want to note that a point with w < 0 is not a real input.
In real life such points are the result of matrix transformations and, according to the math (the math used in the standard DirectX SDK and elsewhere), they correspond to points that lie behind the camera position.
And yes, that point is clipped, but the question is rather about the strange triangle that contains such a point.
[Brief answer]: Clipping is essentially not just z/w checking and division (see details below).
Theoretically, NDC depth is divided into two distinct areas. The following diagram shows these areas for znear = 1, zfar = 3. The horizontal axis shows view-space z and the vertical axis shows the resulting NDC depth for a standard projective transform:
We can see that the part between view-space z of 1 and 3 (znear, zfar) gets mapped to NDC depth 0 to 1. This is the part that we are actually interested in.
However, the part where view-space z is negative also produces positive NDC depth; those are the parts that result from fold-overs. I.e., if you take a corner of your triangle and slowly decrease z (along with w), starting in the area between znear and zfar, you would observe the following:
we start between znear and zfar, everything is good
as soon as we pass znear, the point gets clipped because NDC depth < 0.
when we are at view-space z = 0, the point also has w = 0 and no valid projection.
as we decrease view-space z further, the point gets a valid projection again (starting at infinity) and comes back in with positive NDC depth.
However, this last part is the area behind the camera. So homogeneous clipping is performed such that this part is also clipped away by the znear clipping.
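A minimal Python sketch of the per-vertex part of that test (an illustration only; real hardware clips triangle edges against the frustum planes rather than merely discarding vertices, which is why a triangle with one such vertex can still produce visible fragments):
def clip_test_and_divide(x, y, z, w):
    # D3D-style clip test on homogeneous clip-space coordinates, performed
    # BEFORE the perspective divide: -w <= x <= w, -w <= y <= w, 0 <= z <= w.
    inside = (-w <= x <= w) and (-w <= y <= w) and (0.0 <= z <= w)
    if not inside:
        return None                      # vertex lies outside the frustum
    return (x / w, y / w, z / w)         # NDC, computed only after clipping

# The problematic vertex from the question: w < 0 means it sits behind the
# viewer, so it fails the clip test even though x/w, y/w, z/w would look "valid".
print(clip_test_and_divide(0.0, -0.6, -1.0, -1.0))   # prints None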
Check the old D3D9 documentation for the formulas and some more illustrative explanations here.

Handle "Division by Zero" in Image Processing (or PRNU estimation)

I have the following equation, which I am trying to implement. The upcoming question is not necessarily about this equation but, more generally, about how to deal with division by zero in image processing:
Here, I is an image, W is the difference between the image and its denoised version (so W expresses the noise in the image), and K is an estimated fingerprint, gained from d images of the same camera. All calculations are done pixel-wise, so the equation does not involve matrix multiplication. For more on the idea of estimating digital fingerprints, consult the corresponding literature, such as the general Wikipedia article or scientific papers.
However, my problem arises when an image has a pixel with value zero, e.g. perfect black (let's say we only have one image, k = 1, so the zero does not happen to get outweighed by the pixel value of the next image if that value is non-zero). Then I have a division by zero, which is obviously not defined.
How can I overcome this problem? One option I came up with was adding +1 to all pixels right before I even start the calculations. However, this shifts the range of pixel values from [0, 255] to [1, 256], which makes it impossible to keep working with the data type uint8.
Other authors of papers I have read on this topic often do not consider values close to the range borders. For example, they only evaluate the equation for pixel values in [5, 250]. They justify this not with the numerical problem, but by arguing that if an image is totally saturated or totally black, the fingerprint cannot be estimated properly in that area anyway.
But again, my main concern is not how this particular algorithm performs best, but rather the general question: how do you deal with division by zero in image processing?
One solution is to use subtraction instead of division; however, subtraction is not scale-invariant, it is translation-invariant.
[E.g. the ratio will always be a normalized value between 0 and 1, and if it exceeds 1 you can invert it; you can get the same normalization with subtraction, but then you need to know the maximum values attained by the variables.]
Eventually you will have to deal with division. Dividing a black image by itself is a legitimate case: you can translate the values to some other range and then transform them back.
However, 5/8 is not the same as 55/58, so you can only interpret this in a relative way. If you want the exact ratios, you are better off sticking with the original interval and handling the problematic pixels as special cases, e.g. if denom == 0, do something with it; if num == 0 and denom == 0, the 0/0 means we have an identity, exactly as if we had 1/1.
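A small NumPy sketch of that special-casing (my own illustration; the sentinel used for x/0 with x != 0 is an assumption you would choose to suit your application):
import numpy as np

num = np.random.rand(4, 4)                            # example numerator image
den = np.random.randint(0, 3, (4, 4)).astype(float)   # example denominator with zeros

ratio = np.empty_like(num)
both_zero = (num == 0) & (den == 0)
den_zero = (den == 0) & ~both_zero
ok = ~(both_zero | den_zero)

ratio[ok] = num[ok] / den[ok]
ratio[both_zero] = 1.0    # 0/0 treated as the identity case, i.e. 1/1
ratio[den_zero] = 0.0     # x/0 with x != 0: assumed sentinel, pick what fits your data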
In PRNU and fingerprint estimation, if you check the MATLAB implementation on Jessica Fridrich's webpage, they basically create a mask to get rid of saturated and low-intensity pixels, as you mentioned. Then they convert the image matrix with single(I), which makes it 32-bit floating point, add 1 to the image, and divide.
As for your general question: in image processing, I like to create a mask and add one only to the zero-valued pixels.
img = imread('my gray img');           % grayscale image, uint8
a_mat = rand(size(img));               % example numerator
mask = img == 0;                       % logical mask of the zero-valued pixels
div = a_mat ./ (double(img) + mask);   % element-wise division, zeros bumped to 1
This prevents the division-by-zero error. (Not tested, but it should work.)

Distance between hyperplanes

I'm trying to teach myself some machine learning, and have been using the MNIST database (http://yann.lecun.com/exdb/mnist/) to do so. The author of that site wrote a paper in '98 on all different kinds of handwriting recognition techniques, available at http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf.
The 10th method mentioned is a "Tangent Distance Classifier". The idea is that if you place each image in an (NxM)-dimensional vector space, you can compute the distance between two images as the distance between the hyperplanes formed by each, where a hyperplane is generated by taking the point and rotating the image, rescaling the image, translating the image, and so on.
I can't figure out enough to fill in the missing details. I understand that most of these are indeed linear operators, so how does one use that fact to create the hyperplane? And once we have a hyperplane, how do we compute its distance to other hyperplanes?
I will give you some hints. You need some background knowledge in image processing. Please refer to [2] and [3] for details.
[2] is a C implementation of tangent distance.
[3] is a paper that describes tangent distance in more detail.
Image Convolution
According to [3], the first step you need to do is to smooth the picture; see section 4 of [3], which compares three different smoothing operations (showing the result images alongside the originals and the convolution kernels). This step maps the discrete vector to a continuous one so that it is differentiable. The author suggests using a Gaussian function. If you need more background about image convolution, here is an example.
After this step is done, you have calculated the horizontal and vertical shift tangents, x1 and x2 (the horizontal and vertical derivatives of the smoothed image):
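A minimal SciPy sketch of this smoothing-plus-differentiation step (the random 16x16 image and the sigma value are placeholders of mine, not taken from [2] or [3]):
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.random.rand(16, 16)                     # placeholder for a 16x16 digit image
sigma = 0.9                                      # assumed smoothing width, tune to taste

x1 = gaussian_filter(img, sigma, order=(0, 1))   # horizontal derivative (along j)
x2 = gaussian_filter(img, sigma, order=(1, 0))   # vertical derivative (along k)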
Calculating Scaling Tangent
Here I show you one of the tangent calculations implemented in [2], the scaling tangent. From [3], we know the transformation is as below:
/* scaling */
for (k = 0; k < height; k++)
    for (j = 0; j < width; j++) {
        currentTangent[ind] = ((j + offsetW) * x1[ind] + (k + offsetH) * x2[ind]) * factor;
        ind++;
    }
At the beginning of td.c in [2]'s implementation, we have the following definitions:
factorW=((double)width*0.5);
offsetW=0.5-factorW;
factorW=1.0/factorW;
factorH=((double)height*0.5);
offsetH=0.5-factorH;
factorH=1.0/factorH;
factor=(factorH<factorW)?factorH:factorW; //min
The author is using images with size 16x16. So we know
factor=factorW=factorH=1/8,
and
offsetH=offsetW = 0.5-8 = -7.5
Also note that we already computed x1[ind] (the horizontal derivative of the smoothed image at that pixel) and x2[ind] (the vertical derivative) in the convolution step above.
So, plugging in those constants:
currentTangent[ind] = ((j-7.5)*x1[ind] + (k-7.5)*x2[ind])/8
= x1 * (j-7.5)/8 + x2 * (k-7.5)/8.
Since j (and likewise k) is an integer between 0 and 15 inclusive (the width and height of the image are 16 pixels), (j-7.5)/8 is just a fraction between -0.9375 and 0.9375.
So I guess (j+offsetW)*factor is the displacement for each pixel, which is proportional to the horizontal distance from the pixel to the center of the image. Similarly you know the vertical displacement (k+offsetH)*factor.
Calculating Rotation Tangent
The rotation tangent is defined as below in [3]:
/* rotation */
for (k = 0; k < height; k++)
    for (j = 0; j < width; j++) {
        currentTangent[ind] = ((k + offsetH) * x1[ind] - (j + offsetW) * x2[ind]) * factor;
        ind++;
    }
Using the conclusion from the previous section, we know (k+offsetH)*factor corresponds to y. Similarly, -(j+offsetW)*factor corresponds to -x. So this is exactly the rotation formula used in [3].
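The same two loops, vectorized in Python as a sanity check (a sketch of mine; the placeholder derivative images stand in for the x1 and x2 computed earlier, and the constants mirror the td.c definitions above):
import numpy as np

x1 = np.random.rand(16, 16)                      # placeholder horizontal-derivative image
x2 = np.random.rand(16, 16)                      # placeholder vertical-derivative image

height, width = x1.shape
factor = 1.0 / (max(width, height) * 0.5)        # min(factorH, factorW) from td.c
offset_w = 0.5 - width * 0.5
offset_h = 0.5 - height * 0.5
k, j = np.mgrid[0:height, 0:width]               # k = row index, j = column index

scaling_tangent = ((j + offset_w) * x1 + (k + offset_h) * x2) * factor
rotation_tangent = ((k + offset_h) * x1 - (j + offset_w) * x2) * factor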
You can find all the other tangents described in [3] implemented in [2]. The figure in [3] clearly shows the displacement effect of the different transformation tangents.
Calculating the tangent distance between images
Just follow the implementation in the tangentDistance function:
// determine the tangents of the first image
calculateTangents(imageOne, tangents, numTangents, height, width, choice, background);
// find the orthonormal tangent subspace
numTangentsRemaining = normalizeTangents(tangents, numTangents, height, width);
// determine the distance to the closest point in the subspace
dist=calculateDistance(imageOne, imageTwo, (const double **) tangents, numTangentsRemaining, height, width);
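For the distance step itself, here is a rough NumPy rendering of the orthonormalize-and-project idea (my interpretation, not the exact code of [2]; images are flattened float vectors and tangents holds one tangent vector per row):
import numpy as np

def one_sided_tangent_distance(image_one, image_two, tangents):
    # Orthonormal basis of image_one's tangent subspace (the normalizeTangents step).
    q, _ = np.linalg.qr(tangents.T)              # columns of q span the subspace
    diff = image_two - image_one
    # Distance to the closest point of the tangent plane through image_one:
    # strip the component of the difference that lies inside the subspace.
    residual = diff - q @ (q.T @ diff)
    return np.linalg.norm(residual)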
I think the above should be enough to get you started; if anything is missing, please read [3] carefully and see the corresponding implementations in [2]. Good luck!

Gaussian Falloff Format for Mesh Manipulation

The return value below is described as a Gaussian falloff. I am not seeing e or a power of 2, so I am not sure how this is related to a Gaussian falloff, or whether it is the wrong kind of falloff for getting a nice smooth deformation on my mesh:
Mathf.Clamp01 (Mathf.Pow (360.0, -Mathf.Pow (distance / inRadius, 2.5) - 0.01))
where Mathf.Clamp01 returns a value between 0 and 1.
inRadius is the size of the distortion and distance is determined by:
sqrMagnitude = (vertices[i] - position).sqrMagnitude;
// Early out if too far away
if (sqrMagnitude > sqrRadius)
continue;
distance = Mathf.Sqrt(sqrMagnitude);
vertices is a list of mesh vertices, and position is the point of mesh manipulation/deformation.
My question is two parts:
1) Is the above actually a Gaussian falloff? It is exponential, but there does not seem to be the crucial e or power of 2... (Updated: I see how the graph seems to decrease smoothly in a Gaussian-like way. Perhaps this function is not the cause of problem 2 below.)
2) My mesh is not deforming smoothly enough - given the above parameters, would you recommend a different Gaussian falloff?
I don't know about meshes etc., but let's look at the math:
f = 360^( -0.01 - (d/r)^2.5 ) looks similar enough to a Gaussian function to act as a "falloff".
I'll take the exponent apart to make a point:
f = 360^( -(d/r)^2.5 ) * 360^(-0.01) ≈ (0.9428) * 360^( -(d/r)^2.5 )
if d --> +inf then f --> 0
if d --> 0+ then f --> 0.9428
The exponent of 360 is always negative (assuming 'distance' and 'inRadius' are always positive) and grows in magnitude almost cubically (power of 2.5) with distance, so the function is "falling off", and doing so pretty fast.
Conclusion: the function is not Gaussian, because (among other things) it behaves badly for negative input, but it does exhibit the "falloff" behavior you are looking for.
Changing r will change the speed of the falloff. When d == r, f = (1/360) * 0.9428, roughly 0.0026.
The function never goes above 0.9428 or below zero, so the Clamp01 in the code is effectively a no-op.
I don't see any specific reason for the constant 360; changing it changes the slope a bit.
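If you do want a textbook Gaussian falloff, something along these lines (sketched in Python for brevity; sigma is an assumed tuning knob, and Unity's Mathf.Exp gives you the same thing) yields a weight of 1 at the centre that decays smoothly towards 0:
import math

def gaussian_falloff(distance, in_radius, sigma=0.4):
    t = distance / in_radius                     # 0 at the centre, 1 at the radius
    return math.exp(-(t * t) / (2.0 * sigma * sigma))
With sigma around 0.3 to 0.5 the weight is already close to zero at distance == inRadius, which tends to give a smooth edge to the deformation.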
cheers!

How to randomly but evenly distribute nodes on a plane

I need to place 1 to 100 nodes (actually 25px dots) on an HTML5 canvas. I need to make them look randomly distributed, so using some kind of grid is out. I also need to ensure these dots do not touch or overlap. I would also like to avoid big blank areas. Can someone tell me what this kind of algorithm is called? A reference to an open source project that does this would also be appreciated.
Thanks all
Guido
What you are looking for is called a Poisson-disc distribution. It occurs in nature in the distribution of photoreceptor cells on your retina. There is a great article about this by Mike Bostock (StackOverflow profile) called Visualizing Algorithms. It has JavaScript demos and a lot of code to look at.
In the interest of doing more than dropping a link into the answer, I will try to give a brief summary of the article:
Mitchell's best-candidate algorithm
A simple approximation is known as Mitchell's best-candidate algorithm. It is easy to implement, but it both crowds some spaces and leaves gaps in others. The algorithm adds new points one at a time. For each new sample, the best-candidate algorithm generates a fixed number of candidates, say 10. The candidate furthest from any existing point is added to the set, and the process is repeated until the desired density is achieved.
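A short Python sketch of the idea (parameter names and the candidate count are my own choices):
import math
import random

def best_candidate_points(n, width, height, num_candidates=10):
    points = []
    for _ in range(n):
        best, best_dist = None, -1.0
        for _ in range(num_candidates):
            c = (random.uniform(0, width), random.uniform(0, height))
            # distance from this candidate to its nearest already-placed point
            d = min((math.dist(c, p) for p in points), default=float("inf"))
            if d > best_dist:
                best, best_dist = c, d
        points.append(best)
    return points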
Bridson's Algorithm
Bridson's algorithm for Poisson-disc sampling (original paper pdf) scales linearly and is easy to implement as well. This algorithm grows from an initial point and (IMHO) is quite fun to watch (again, see Mike Bostock's article). All points in the set are either active or inactive; new points are added as active. One point is chosen from the active set, and some number of candidate points are generated in the annulus (a.k.a. ring) around that sample, with the inner circle having radius r and the outer circle radius 2r. Candidate samples less than distance r away from any point in the final set are rejected. Once a candidate is found that is not rejected, it is added to the final set. If all the candidates are rejected, the original point is marked as inactive, on the assumption that it has so many neighboring points that no more can be added around it. When all samples are inactive, the algorithm terminates.
A grid with cells of size r/√2 can be used to greatly speed up the candidate checks: at most one point can ever occupy a grid square, and only a limited number of adjacent squares need to be checked.
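A compact Python sketch of Bridson's algorithm with that grid acceleration (my own rendition of the steps described above; r, k and the domain size are the obvious parameters):
import math
import random

def poisson_disc(width, height, r, k=30):
    cell = r / math.sqrt(2)                       # at most one sample per grid cell
    cols = int(math.ceil(width / cell))
    rows = int(math.ceil(height / cell))
    grid = [[None] * cols for _ in range(rows)]   # grid[row][col] -> sample or None

    def cell_of(p):
        return (min(int(p[1] // cell), rows - 1), min(int(p[0] // cell), cols - 1))

    def far_enough(p):
        gr, gc = cell_of(p)
        for rr in range(max(gr - 2, 0), min(gr + 3, rows)):
            for cc in range(max(gc - 2, 0), min(gc + 3, cols)):
                q = grid[rr][cc]
                if q is not None and (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 < r * r:
                    return False
        return True

    first = (random.uniform(0, width), random.uniform(0, height))
    samples, active = [first], [first]
    gr, gc = cell_of(first)
    grid[gr][gc] = first

    while active:
        i = random.randrange(len(active))
        px, py = active[i]
        for _ in range(k):
            # candidate in the annulus between r and 2r around the chosen sample
            theta = random.uniform(0, 2 * math.pi)
            rad = random.uniform(r, 2 * r)
            cand = (px + rad * math.cos(theta), py + rad * math.sin(theta))
            if 0 <= cand[0] < width and 0 <= cand[1] < height and far_enough(cand):
                samples.append(cand)
                active.append(cand)
                cgr, cgc = cell_of(cand)
                grid[cgr][cgc] = cand
                break
        else:
            active.pop(i)                         # no candidate fit: the point goes inactive
    return samples
For 25px dots, an r of 25 plus whatever extra breathing room you want guarantees the dots never touch or overlap.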
The easiest way would be to just generate random (x, y) coordinates for each one, repeating if they are touching or overlapping.
Pseudocode:
do N times
{
start:
    x = rand(0, width)
    y = rand(0, height)
    for each other point, p
        if distance(p.x, p.y, x, y) < radius * 2
            goto start
    add_point(x, y);
}
This is O(n^2), but if n is only going to be 100 then that's fine.
I don't know if this is a named algorithm, but it sounds like you could assign each node a position on a “grid”, then pick a random offset. That would give the appearance of some chaos while still guaranteeing that there are no big empty spaces.
For example:
// assuming "width" is the number of grid columns and cellSize is the grid spacing
node.x = (node.number % width) * cellSize + (Math.random() - 0.5) * SOME_SCALE;
node.y = Math.floor(node.number / width) * cellSize + (Math.random() - 0.5) * SOME_SCALE;
Maybe you could use a grid of circles and place one 25px dot in every circle? It wouldn't really be random, but it would look good.
Or you could place the dots randomly, then make empty areas attract dots and give the dots a limited-range repulsion, but that is maybe too complicated and takes too much CPU time for this simple task.
