Ray Tracer Camera - Orthographic to Perspective Projection

I am implementing a ray tracer and it currently has an orthographic projection. I want to change it to a perspective projection. I know that in orthographic projection you send out a ray from every pixel and check for intersections. In perspective projection, all rays share one constant starting position rather than each starting from its own pixel.
So I assume that in perspective projection the ray's starting position should be the camera's position. The problem is that I don't think I ever explicitly placed a camera, so I do not know what to change my ray's starting position to.
How can I determine where my camera is placed? I tried (0,0,0), but that just leaves me with a blank image so I don't think it is right.

In orthographic projection, the rays through all pixels have the same direction, as if they originated from a camera placed infinitely far behind the screen.
For perspective projection, the camera has to be placed at a finite distance behind the screen. Each ray originates from the camera and passes through one pixel of the screen. The distance between the screen and the camera depends on the viewing angle.

You can work out the distance from the camera to your object by first picking an angle for the perspective projection. A simple example: pick an angle of 60° for the vertical field of view (FOV), assume your object's center is at (0,0,0), and place the camera looking down the Z axis towards the center of your object. This forms a right triangle, so you can compute the distance with the trigonometric formula distance = (objectHeight / 2) / tan(FOV / 2) = (objectHeight / 2) / tan(30°). You then place the camera at (0, 0, distance). You can use the same concept for your actual object location.
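Here is a minimal sketch of both pieces in Python (the 60° FOV, the camera-at-origin looking down the negative Z axis, and the function names are assumptions for illustration, not details from the question): every ray now shares the camera position as its origin, and its direction points through the pixel's location on the image plane.

```python
import math

def camera_distance(object_height, fov_deg=60.0):
    """Distance at which an object of the given height exactly fills the vertical FOV."""
    return (object_height / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def perspective_ray(px, py, width, height, fov_deg=60.0):
    """Ray (origin, direction) through pixel (px, py) for a camera at the origin looking down -Z."""
    aspect = width / height
    half_h = math.tan(math.radians(fov_deg) / 2.0)   # half-height of the image plane at z = -1
    half_w = half_h * aspect
    # Map the pixel centre to [-1, 1]; flip y so +y is up.
    ndc_x = (px + 0.5) / width * 2.0 - 1.0
    ndc_y = 1.0 - (py + 0.5) / height * 2.0
    origin = (0.0, 0.0, 0.0)                         # every ray starts at the camera position
    direction = (ndc_x * half_w, ndc_y * half_h, -1.0)
    length = math.sqrt(sum(c * c for c in direction))
    return origin, tuple(c / length for c in direction)
```

In the orthographic version the origin varies per pixel and the direction is a constant (0, 0, -1); in the perspective version it is exactly the other way around, which is why an origin of (0,0,0) only produces an image if the scene actually lies in front of that point, within the field of view.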

Related

Getting pixel position from relative position to camera

I am looking for a way to find the (x, y) pixel position of a point in an image taken by a camera. I know the physical position of the object (distance: width, height and depth), the resolution of the image, and probably the focal distance (maybe I could also get some other camera parameters, but I want to use as little information as possible).
In case I am not clear: I want a formula/algorithm/procedure to map from (width, height, depth) to (x_pixel_position_in_image, y_pixel_position_in_image), i.e. to connect the physical coordinates with the pixel ones.
Thank you very much.
If you check the diagram linked below, the perspective projection of a 3D point with a camera depends on two main sources of information.
Diagram
Camera parameters (intrinsics) and where the camera is located in a fixed world coordinate system (extrinsics). Since you want to project points in the camera coordinate system, you can assume the world coordinate system coincides with the camera's. Hence the extrinsic matrix [R|t] can be expressed as
R = eye(3) and t = [0; 0; 0].
Therefore, all you need to know are the camera parameters (focal length and optical center location). You can read more about this here.
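As a minimal sketch of that mapping (the focal lengths fx, fy in pixels and the optical centre (cx, cy) are the intrinsics; the point is assumed to already be in camera coordinates, and all names here are illustrative):

```python
def project_point(X, Y, Z, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (X, Y, Z) with Z > 0 in front of the camera."""
    u = fx * X / Z + cx   # horizontal pixel coordinate
    v = fy * Y / Z + cy   # vertical pixel coordinate
    return u, v
```

Whether v grows upward or downward depends on your image convention; flip it if needed.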

Must polygon prism have interior angles >144 degree to have sides visible after projection transform?

Most view frustums are 35 to 45 degrees, the angle of each of the four sides as they slope from the near plane to the far plane. These are exterior angles. A 36 degree frustum has interior angles of 144 degrees. The projection transform generates a box, rather than a frustum. The sides with 144 degree interior angles swivel in to 90 degrees.
Now consider that a 10-sided prism, a decagon prism, has the same angles as the frustum. If the viewer sees one of its faces orthogonally, as a flat 2D surface, its neighboring faces will virtually disappear after the projection transform, reduced to 90 degree angles.
Am I correct or wrong?
Am I correct or wrong?
Wrong (see below for explanation why).
These are exterior angles.
Field of view is usually defined as an internal angle. However, this isn't really important to the general point of the question, so let's just ignore it; it just means that the FoV angle is specified one way instead of another.
Now consider that a 10-sided prism, a decagon prism, has the same angles as the frustum. If the viewer sees one of its faces orthogonally, as a flat 2D surface, its neighboring faces will virtually disappear after the projection transform, reduced to 90 degree angles.
The hidden assumption here is that what determines the visibility of a face is:
the angle of the face
the angle that the viewer is facing, and
the viewer's field of view.
This seems reasonable on the surface, but it's actually wrong. All of these things influence whether a face is visible to a viewer, but they don't actually determine it on their own. What's missing from the equation is the viewer's position.
What actually determines whether a face is visible to a viewer is whether the viewer's eye position lies on the correct side of the plane that the face lies on, extended in all directions to infinity. Secondarily, the face must also be in the viewer's field of view.
To convince yourself of this, stand in front of a door that opens towards you, and open it about 60 degrees. Then take a couple of steps back. You should be able to see the side of the door that faces the room you are standing in. Now walk forward through the door, facing forward the whole time. At a certain point, the side of the door you could originally see will become invisible, and the other side of the door will become visible. Obviously, the angle of the door hasn't changed, and neither has the direction you are facing, so what has caused the change in visibility is not the angles but the fact that you have passed from one side of the plane that the door lies on to the other.
Here's a diagram to illustrate this in the case of a decagon:
The diagram shows a top down view. The red line represents the plane of one of the faces of the decagon. The blue dots represent different viewer positions and the blue lines represent the field of view angles. When the viewer's eye position is on the "Can't see Face A here" side of the plane, Face A will not be visible, and vice versa for the other side.
For a viewer who "sees one of its faces orthogonally, as a flat 2D surface" (the front face), they will be able to see Face A at position 3 (indicated by the green line), but not at position 2 (because of the narrow FoV), and not at position 1 (because of being on the wrong side of the plane that Face A lies on).
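A minimal sketch of that plane test (the vertex, normal, and eye names are assumptions; any convention where the face normal points out of the solid works):

```python
def eye_on_visible_side(eye, face_vertex, face_normal):
    """True if the eye lies on the outward side of the infinite plane containing the face."""
    to_eye = tuple(e - v for e, v in zip(eye, face_vertex))
    return sum(a * b for a, b in zip(to_eye, face_normal)) > 0.0

# The face must additionally fall inside the field of view to actually be seen,
# but the plane test alone is what changes in the door example above.
```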
First, thank you for your response. It's impressive, and completely explanatory.
The view position changes how we see prism angles.
I'm interested in what the projection matrix does to the prism. It's often said that projection "crushes" one dimension on the plane.
But that's not really how it works. Instead a projection transform rotates points into a new configuration.
Stand in front of a door, so that you can see both the door and the neighboring walls clearly. The door is a plane with a surface that's at a 60 degree angle.
A projection matrix squeezes the z-dimension. The view frustum rotates from, say, 40 degrees to 90 degrees. The door rotates from 60 to around 75 degrees. This isn't what the viewer sees, it's only the math.
As you noted, the viewer sets the scene. The viewer's sight lines intersect points in 3D or 2D alike. If the same sight lines intersect the door in 3D reality and when looking at a 2D monitor, then 2D looks like 3D.
Mathematically, however, it's mapped from a different location. A point on the conical-frustum door, D1, at (x1, y1, z1), is transformed to the square, orthogonal frustum position (x2, y2, z2). That maps to a screen position (x3, y3).
In order to achieve this effect, every point in 3D has to rotate to 90 degrees. In the conical frustum each point, as a vector, has a unique angle to the near plane. Each vector rotates to 90 degrees from where it started, at 80, 60, 45 degrees, etc. The projection matrix performs this rotation.
If the frustum edge is angled at 40 degrees, it rotates 50 more to fit the orthogonal frustum. A surface at 60 degrees, to the left of the center line, will rotate 30 degrees.
Once rotated, the vectors map to the right sight lines as they intersect the screen.
I was confused at first, because it's a roundabout way to generate a 2D scene.
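For reference, here is a minimal sketch of what a standard OpenGL-style perspective projection does to a single point (the FOV, aspect, near/far values and the sample point are arbitrary examples): the matrix multiplication followed by the divide by w is the step that maps the conical frustum into the orthogonal box discussed above.

```python
import numpy as np

def perspective_matrix(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix (right-handed, camera looking down -Z)."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0,                         0.0],
        [0.0,        f,   0.0,                         0.0],
        [0.0,        0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                        0.0],
    ])

P = perspective_matrix(fov_y_deg=40.0, aspect=1.0, near=0.1, far=100.0)
p_view = np.array([1.0, 0.5, -5.0, 1.0])   # a view-space point, e.g. on the door
p_clip = P @ p_view
p_ndc = p_clip[:3] / p_clip[3]             # perspective divide: frustum -> unit box
print(p_ndc)                               # x and y in [-1, 1] map directly to the screen
```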

how to translate 3d mesh, given a view direction and a change in cursor position

My question is similar to 3D Scene Panning in perspective projection (OpenGL) except I don't know how to compute the direction in which to move the mesh.
I have a program in which various meshes can be selected. Once a mesh is selected I want it to translate when click-dragging the cursor. When the cursor moves up, I want the mesh to move up, and so on for the appropriate direction. In other words, I want the mesh to translate in directions along the plane that is perpendicular to the viewing direction.
I have the Vector2 for the Delta (x, y) in cursor position, and I have the Vector3 viewDirection of the camera and the center of the mesh. How can I figure out which way to translate the mesh in 3D space with the Delta and viewDirection? Will I need other information in order to do this calculation (such as the up vector, or the eye position)?
It doesn't matter if the scale of the translation is off; I'm just trying to figure out the direction right now.
EDIT: for some reason I was confused about getting the up direction. Clearly it can be calculated by applying the camera rotation to the specified perspective up vector.
You'll need an additional vector, upDirection, which is the unit vector pointing "up" from your camera. You can now cross-product viewDirection and upDirection to get rightDirection, the vector pointing "right" from your camera.
You want to map y deltas to motion along upDirection (or -upDirection) and x deltas to motion in rightDirection. These vectors are in world-space.
You may want to scale the translation speed to match the mouse speed. If you are using perspective projection you'll want to scale the translation speed with your model's depth with respect to your camera (The further the object is from your camera, the faster you will need to move it if you want it to match the mouse.)
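A minimal sketch of that mapping, using numpy (viewDirection, upDirection and the cursor delta are the quantities from the question; the sign on delta_y depends on whether your screen y axis grows downward):

```python
import numpy as np

def drag_translation(view_direction, up_direction, delta_x, delta_y, speed=1.0):
    """World-space translation for a screen-space cursor drag of (delta_x, delta_y)."""
    view = np.asarray(view_direction, dtype=float)
    view /= np.linalg.norm(view)
    up = np.asarray(up_direction, dtype=float)
    up /= np.linalg.norm(up)
    right = np.cross(view, up)            # unit vector pointing "right" from the camera
    right /= np.linalg.norm(right)
    up = np.cross(right, view)            # re-orthogonalize up against view
    # Screen y usually grows downward, hence the minus sign on delta_y.
    return speed * (delta_x * right - delta_y * up)
```

To keep the mesh under the cursor with a perspective projection, scale `speed` by the mesh's distance from the camera along viewDirection, as noted above.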

Relationship between distance in 3D space and its z depth

I have a flat plane of 2D graphics with a camera pointing at it. I want the effect where, when a user pinches and zooms, it looks like they anchored their fingers on the plane and can pinch-zoom realistically. To do this, I need to convert the distance between their fingers into a distance in 3D space (which I already can do), but then I need to map that 3D distance to a z value.
For example, if a 100-unit-wide square shrank to 50 units (50%), how much further back would the camera need to move to make that 100-unit square shrink by half?
So to put it simply: if I have the distance in 3D space, how do I calculate the camera distance needed to shrink that 3D space by a certain amount?
EDIT:
So, I tried it myself and came up with this formula: new distance = current distance / zoom factor.
For example, say you are 1 unit away from the object.
When you want to shrink it to 50% (zoom factor = 0.5), the new distance is 1 / 0.5 = 2 units. The camera must be twice as far away.
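This follows from the pinhole model, where the on-screen size of an object is roughly proportional to 1 / distance; a minimal sketch (the names are illustrative):

```python
def distance_for_zoom(current_distance, zoom_factor):
    """Camera distance at which the object appears zoom_factor times its current size.

    On-screen size is proportional to 1 / distance under perspective projection,
    so halving the apparent size (zoom_factor = 0.5) means doubling the distance.
    """
    return current_distance / zoom_factor

print(distance_for_zoom(1.0, 0.5))   # 2.0: twice as far away for 50% apparent size
```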
Moving the camera closer to the plane for zooming only works with a perspective projection. The absolute distance depends on the angle of view. Usually you zoom by reducing the angle of view and not moving the camera at all.
If you are using an orthographic projection, you can simply adjust the size of the view volume / scale the projection matrix.

Traveling along the surface of a sphere using quaternions

I'm programming a 3D game where the user controls a first-person camera, and movement is constrained to the inside surface of a sphere. I've managed to constrain the movement, but I'm having trouble figuring out how to manage the camera orientation using quaternions. Ideally the camera up vector should point along the normal of the sphere towards its center, and the user should be able to free-look around, as if he was always at the bottom of the sphere, no matter where he moves.
Presumably you have two vectors describing the camera's orientation. One will be your V'up, describing which way is up relative to the camera orientation, and the other will be your V'norm, which is the direction the camera is aimed. You will also have a position p', where your camera is located at some time. You define a canonical orientation and position given by, say:
Vup = <0, 1, 0>
Vnorm = <0, 0, 1>
p = <0, -1, 0>
Given a quaternion rotation q you then apply your rotation to those vectors to get:
V'up = q Vup q⁻¹
V'norm = q Vnorm q⁻¹
p' = q p q⁻¹
In your particular situation, you define q to incrementally accumulate the various rotations that result in the final rotation you apply to the camera. The effect will be that it looks like what you're describing. That is, you move the camera inside a statically oriented and positioned sphere rather than moving the sphere around a statically oriented and positioned camera.
Each increment is computed by a rotation of some angle θ about the vector V = V'up x V'norm.
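A minimal, pure-Python sketch of applying such a quaternion to the camera vectors (quaternions are (w, x, y, z) tuples; the helper names and the 10° example increment are illustrative):

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def quat_from_axis_angle(axis, theta):
    """Unit quaternion rotating by theta radians about a unit axis."""
    s = math.sin(theta / 2.0)
    return (math.cos(theta / 2.0), axis[0]*s, axis[1]*s, axis[2]*s)

def rotate(v, q):
    """Rotate vector v by unit quaternion q: v' = q v q^-1."""
    qv = (0.0, v[0], v[1], v[2])
    q_conj = (q[0], -q[1], -q[2], -q[3])   # inverse of a unit quaternion
    _, x, y, z = quat_mul(quat_mul(q, qv), q_conj)
    return (x, y, z)

# Canonical camera basis from above, plus one example rotation increment.
v_up, v_norm, p = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, -1.0, 0.0)
q = quat_from_axis_angle((1.0, 0.0, 0.0), math.radians(10.0))
print(rotate(v_up, q), rotate(v_norm, q), rotate(p, q))
```

Accumulate each new increment into q (q = dq * q, renormalizing occasionally) and re-derive the camera basis from the canonical vectors every frame.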
Quaternions are normally used to avoid gimbal lock in free space motion (flight sims, etc.). In your case, you actually want the gimbal effect, since a camera that is forced to stay upright will inevitably behave strangely when it has to point almost straight up or down.
You should be able to represent the camera's orientation as just a latitude/longitude pair indicating the direction the camera is pointing.
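If you go the latitude/longitude route, a minimal sketch of recovering the view direction (the convention here is an assumption: latitude 0 at the equator, longitude 0 along +Z):

```python
import math

def direction_from_lat_long(lat, lon):
    """Unit view direction for a latitude/longitude pair given in radians."""
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))
```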
