Getting pixel position from relative position to camera - math

I am looking for a way to find the (x, y) pixel position of a point in an image taken by a camera. I know the physical position of the object relative to the camera (width, height and depth), the resolution of the image and probably the focal distance (maybe I could also get some other camera parameters, but I want to rely on as little information as possible).
In case I am not clear: I want a formula/algorithm/procedure to map from (width, height, depth) to (x_pixel_position_in_image, y_pixel_position_in_image), that is, to connect the physical coordinates with the pixel ones.
Thank you very much.

If you check the diagram linked below, the perspective projection of a 3d point with a camera depends on two main sources of information.
Diagram
Camera Parameters (Intrinsics) and where the camera is in a fixed world coordinate system (Extrinsics). Since you want to project points that are already given in the camera coordinate system, you can assume the world coordinate system coincides with the camera's. Hence the extrinsic matrix [R|t] can be expressed as,
R = eye(3); and t = [0; 0; 0].
Therefore, all you need to know is the camera parameters (focal length and optical center location). You can read more about this here.
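With identity extrinsics, the projection reduces to the standard pinhole formula. The following is a minimal sketch, not a full camera model: it assumes camera axes with x to the right, y down and z pointing forward, no lens distortion, and a focal length already expressed in pixels. The names `project_point`, `focal_length_in_pixels`, `fx`, `fy`, `cx` and `cy` are illustrative and do not come from the answer.

```python
def focal_length_in_pixels(focal_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in mm to pixels (assumes the sensor width is known)."""
    return focal_mm * image_width_px / sensor_width_mm


def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point given in camera coordinates to pixel coordinates.

    point_cam: (X, Y, Z) with Z the depth along the optical axis.
    fx, fy:    focal length in pixels along the image x and y axes.
    cx, cy:    optical center in pixels (often close to the image center).
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return u, v


# Example: a point 0.5 m right, 0.25 m down and 2 m in front of the camera,
# for a 640x480 image with an assumed focal length of 800 px.
u, v = project_point((0.5, 0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

If only the image resolution and the focal length are known, taking the optical center as the image center is a common approximation.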

Related

Converting XYZ to XY (world coords to screen coords)

Is there a way to convert that data:
Object position which is a 3D point (X, Y, Z),
Camera position which is a 3D point (X, Y, Z),
Camera yaw, pitch, roll (-180:180, -90:90, 0)
Field of view (-45°:45°)
Screen width & height
into the 2D point on the screen (X, Y)?
I'm looking for proper math calculations according to this exact set of data.
It's difficult, but it's possible to do it for yourself.
There are lots of libraries that do this for you, but it is more satisfying if you do it yourself:
This problem is possible and I have written my own 3D engine to do this for objects in javascript using the HTML5 Canvas. You can see my code here and solve a 3D maze game I wrote here to try and understand what I will talk about below...
The basic idea is to work in steps. To start, forget about the camera angle (yaw, pitch and roll), as these come later, and just imagine you are looking down the y axis. Then calculate, using trig, the yaw and pitch angles from the camera to your object coordinate. By this I mean: imagine you are looking through a letterbox. The yaw angle is the angle left or right of the center line to your coordinate (so it can be positive or negative), and the pitch angle is the angle up or down from it. Taking these angles, you can map them to the x and y 2D coordinate system.
The calculations for the angles are:
yaw = atan((coord.x - cam.x) / (coord.y - cam.y))
pitch = atan((coord.z - cam.z) / (coord.y - cam.y))
with coord.x, coord.y and coord.z being the coordinates of the object and the same for the cam (cam.x, cam.y and cam.z). These calculations also assume that you are using a Cartesian coordinate system with the axes being: z up, y forward and x right.
From here, the next step is to map these angles in the 3D world to a coordinate which you can use in a 2D graphical representation.
To map these angles onto your screen, you need to scale them up as distances from the mid line. This means multiplying them by your screen width / fov, with fov in the same units as the angles (radians if they come straight from atan). Finally, these distances will be positive or negative (each is an angle from the mid line), so to actually draw the point on a canvas you need to add them to half of the screen width or height.
So this would mean your canvas coordinate would be:
x = width / 2 + yaw * (width / fov)
y = height / 2 + pitch * (height / fov)
where width and height are the dimensions of your screen, fov is the camera's fov and yaw and pitch are the respective angles of the object from the camera.
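The angle-based mapping above can be sketched as follows. This is only an illustration under the same assumptions as the text (camera looking down the y axis, z up, x right, no camera rotation). The function name `angles_to_screen` is hypothetical, and `fov` is assumed to be in radians so that it matches the output of `atan`.

```python
import math


def angles_to_screen(coord, cam, width, height, fov):
    """Map a 3D point to 2D screen coordinates via its angles from the view axis.

    coord, cam: (x, y, z) tuples with x right, y forward, z up.
    fov:        field of view in radians (same unit as math.atan returns).
    """
    dx = coord[0] - cam[0]
    forward = coord[1] - cam[1]
    dz = coord[2] - cam[2]
    if forward <= 0:
        raise ValueError("point must be in front of the camera")
    angle_h = math.atan(dx / forward)  # angle left/right of the center line
    angle_v = math.atan(dz / forward)  # angle up/down from the center line
    x = width / 2 + angle_h * (width / fov)
    # Canvas y usually grows downward; negate angle_v if the image appears flipped.
    y = height / 2 + angle_v * (height / fov)
    return x, y


# A point straight ahead lands at the screen center.
center = angles_to_screen((0.0, 10.0, 0.0), (0.0, 0.0, 0.0), 800, 600, math.pi / 2)
# A point 45 degrees to the right lands on the right edge for a 90 degree fov.
edge = angles_to_screen((10.0, 10.0, 0.0), (0.0, 0.0, 0.0), 800, 600, math.pi / 2)
```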
You have now achieved the first big step, which is mapping a 3D coordinate down to 2D. If you have managed to get this all working, I would suggest trying multiple points and connecting them to form shapes. Also try moving your camera's position to see how the perspective changes; you will soon see how realistic it already looks.
In addition, if this worked fine for you, you can move on to having the camera be able to not only change its position in the 3D world but also change its perspective as in yaw, pitch and roll angles. I will not go into this entirely now, but the basic idea is to use 3D world transformation matrices. You can read up about them here but they do get quite complicated, however I can give you the calculations if you get this far.
It might help to read (old style) OpenGL specs:
https://www.khronos.org/registry/OpenGL/specs/gl/glspec14.pdf
See section 2.10
Also:
https://www.khronos.org/opengl/wiki/Vertex_Transformation
Might help with more concrete examples.
Also, for "proper math" look up 4x4 matrices, projections, and homogeneous coordinates.
https://en.wikipedia.org/wiki/Homogeneous_coordinates
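As a rough illustration of the 4x4 matrix and homogeneous-coordinate approach, here is a minimal sketch in plain Python. It assumes the point is already in eye space with the camera looking down the negative z axis (the classic OpenGL convention), and builds the perspective matrix in the same form as `gluPerspective`. The helper names `perspective_matrix`, `mat_vec` and `eye_to_screen` are made up for this example.

```python
import math


def perspective_matrix(fovy_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix (row-major nested lists)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def eye_to_screen(point_eye, proj, width, height):
    """Project an eye-space point to pixel coordinates (origin at top-left)."""
    clip = mat_vec(proj, [point_eye[0], point_eye[1], point_eye[2], 1.0])
    w = clip[3]
    if w <= 0:
        raise ValueError("point is behind the camera")
    ndc_x, ndc_y = clip[0] / w, clip[1] / w  # perspective divide
    sx = (ndc_x + 1.0) / 2.0 * width
    sy = (1.0 - ndc_y) / 2.0 * height  # flip y: screen y grows downward
    return sx, sy


proj = perspective_matrix(90.0, 1.0, 0.1, 100.0)
screen_center = eye_to_screen((0.0, 0.0, -5.0), proj, 800, 600)
```

Camera position and yaw/pitch/roll would enter as an additional view matrix applied before `proj`; that step is left out here.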

DICOM and the Image Position Patient

I am trying to figure out if DICOM Image Position (0020,0032) is an absolute coordinate or just the coordinates for whatever slice orientation I have?
For example, I have two planes, a sagittal and a coronal plane interleaved with respective Image Positions in mm in the form of (x,y,z) from the DICOM header. My question, is the (x,y,z) coordinate for the sagittal plane in the same 3D space as the (x,y,z) coordinate for the coronal plane or are the Image Position values specific for that plane only.
So, is the Image Position referenced off some absolute origin point or is changed for each specific image orientation?
Many thanks!
Yes, the image position (0020,0032) coordinates are absolute coordinates. They are relative to an origin point called the "frame of reference". It doesn't matter where the frame of reference is, but for CT/MRI scanners you can think of it as a fixed point for that particular scanner, relative to the scanner table (the table moves the patient through the scanner, so the frame of reference has to move too, otherwise the z-coordinates wouldn't change!).
What's important when comparing two images is not where the frame of reference is, but whether the same frame of reference is being used. If they are from the same scanner then they probably will be, but the way to check is whether the Frame of Reference UID (0020,0052) is the same.
A few things to note: if you have a stack of 2D slices then the Image Position tag contains the coordinates of the CENTRE of the first voxel of the 2D SLICE (not the whole stack of slices). So it will be different for each slice.
Even if two orthogonal planes line up at an edge, the Image Position coordinates won't necessarily be the same because the voxel dimensions could be different, so the centre of the voxel on one plane isn't necessarily the same as the centre of the voxel on another plane.
Also, it's worth emphasising that the coordinates are relative in some way to the scanner, not to the patient. When your planes are all reconstructed from the same data then everything is consistent. But if two scans were taken at different times then the coordinates of patient features will not necessarily match up as the patient may have moved.
Image Position (Patient) (0020,0032) specifies the origin of the image with respect to the patient-based coordinate system, and the patient-based coordinate system is a right-handed system. All three orthogonal planes should share the same Frame of Reference UID (0020,0052) to be spatially related to each other.
Yes, Image position is the absolute values of x, y, and z in the real-world coordinate system.
In MRI we have three different coordinate systems.
1. Real-world coordinate system
2. Logical coordinate system
3. Anatomical coordinate system
Sometimes they are referred to by other names. There are heaps of names on the internet, but conceptually there are three of them.
To uniquely represent the status of the slice in the real-world coordinate system we need to pinpoint its position and orientation.
The absolute x, y, and z of the first voxel that is transmitted (the one at the upper left corner of the slice) are considered the image position. That's straightforward, but it is not enough: what if the slice is rotated?
So we have to determine the orientation as well.
To do that, we take the direction of the first row and the direction of the first column of the image and compute the cosines of their angles with respect to the main axes of the coordinate system (their direction cosines). These six values are the image orientation.
Knowing these conventions, by looking at the image position (0020, 0032) and image orientation (0020, 0037) we can precisely pinpoint the slice in the real-world coordinate system.
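To make this concrete, here is a small sketch of how a pixel index can be mapped to a 3D position from those two tags. It additionally uses Pixel Spacing (0028,0030), which is not discussed above, and it takes plain Python sequences rather than reading a DICOM file. The function name `pixel_to_patient` is illustrative only.

```python
def pixel_to_patient(image_position, image_orientation, pixel_spacing, row, col):
    """Return the (x, y, z) position in mm of the center of pixel (row, col).

    image_position:    Image Position (Patient) (0020,0032), 3 values.
    image_orientation: Image Orientation (Patient) (0020,0037), 6 direction cosines.
    pixel_spacing:     Pixel Spacing (0028,0030): [spacing between rows,
                       spacing between columns], in mm.
    """
    row_dir = image_orientation[:3]  # direction in which the column index increases
    col_dir = image_orientation[3:]  # direction in which the row index increases
    row_spacing, col_spacing = pixel_spacing
    return tuple(
        image_position[k]
        + row_dir[k] * col_spacing * col
        + col_dir[k] * row_spacing * row
        for k in range(3)
    )


# Example: an axial slice whose first pixel is at (-100, -100, 50) mm,
# with 2.0 mm between rows and 0.5 mm between columns.
pos = pixel_to_patient(
    (-100.0, -100.0, 50.0),
    (1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
    (2.0, 0.5),
    row=10,
    col=20,
)
```

Two slices can only be compared this way when they share the same Frame of Reference UID (0020,0052).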

how to translate 3d mesh, given a view direction and a change in cursor position

My question is similar to 3D Scene Panning in perspective projection (OpenGL) except I don't know how to compute the direction in which to move the mesh.
I have a program in which various meshes can be selected. Once a mesh is selected I want it to translate when click-dragging the cursor. When the cursor moves up, I want the mesh to move up, and so on for the appropriate direction. In other words, I want the mesh to translate in directions along the plane that is perpendicular to the viewing direction.
I have the Vector2 for the Delta (x,y) in cursor position, and I have the Vector3 viewDirection of the camera and the center of the mesh. How can I figure out which way to translate the mesh in 3D space with the Delta and viewDirection? Will I need other information in order to do this calculation (such as the up, or eye)?
It doesn't matter if the scale of the translation is off, I'm just trying to figure out the direction right now.
EDIT: for some reason I had a confusion about getting the up direction. Clearly it can be calculated by applying the camera rotation to the specified perspective up vector.
You'll need an additional vector, upDirection, which is the unit vector pointing "up" from your camera. You can now cross-product viewDirection and upDirection to get rightDirection, the vector pointing "right" from your camera.
You want to map y deltas to motion along upDirection (or -upDirection) and x deltas to motion in rightDirection. These vectors are in world-space.
You may want to scale the translation speed to match the mouse speed. If you are using perspective projection you'll want to scale the translation speed with your model's depth with respect to your camera (The further the object is from your camera, the faster you will need to move it if you want it to match the mouse.)
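A minimal sketch of that calculation in plain Python, assuming a right-handed coordinate system and tuples for vectors. `drag_translation` and its `scale` parameter are hypothetical names, and the sign of the y delta may need flipping if your cursor coordinates grow downward.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / length for c in v)


def drag_translation(view_dir, up_dir, delta_x, delta_y, scale=1.0):
    """World-space translation for a cursor delta, perpendicular to view_dir."""
    right = normalize(cross(view_dir, up_dir))
    # Recompute up so it is exactly perpendicular to both view_dir and right.
    up = normalize(cross(right, view_dir))
    return tuple(scale * (delta_x * right[i] + delta_y * up[i]) for i in range(3))


# Camera looking down -z with +y up: dragging right/up moves along +x/+y.
move = drag_translation((0.0, 0.0, -1.0), (0.0, 1.0, 0.0), 2.0, 3.0)
```

For a perspective camera, `scale` could be made proportional to the mesh's depth from the camera, as described above.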

Traveling along the surface of a sphere using quaternions

I'm programming a 3D game where the user controls a first-person camera, and movement is constrained to the inside surface of a sphere. I've managed to constrain the movement, but I'm having trouble figuring out how to manage the camera orientation using quaternions. Ideally the camera up vector should point along the normal of the sphere towards its center, and the user should be able to free look around, as if he were always on the bottom of the sphere, no matter where he moves.
Presumably you have two vectors describing the camera's orientation. One will be your V'up, describing which way is up relative to the camera orientation, and the other will be your V'norm, which is the direction the camera is aimed. You will also have a position p', where your camera is located at some time. You define a canonical orientation and position given by, say:
Vup = <0, 1, 0>
Vnorm = <0, 0, 1>
p = <0, -1, 0>
Given a quaternion rotation q you then apply your rotation to those vectors to get:
V'up = q Vup q^-1
V'norm = q Vnorm q^-1
p' = q p q^-1
In your particular situation, you define q to incrementally accumulate the various rotations that result in the final rotation you apply to the camera. The effect will be that it looks like what you're describing. That is, you move the camera inside a statically oriented and positioned sphere rather than moving the sphere around a statically oriented and positioned camera.
Each increment is computed by a rotation of some angle θ about the vector V = V'up x V'norm.
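The rotation step can be sketched in plain Python as below. Quaternions are stored as (w, x, y, z) tuples, and since q is a unit quaternion its inverse is its conjugate. All function names here are illustrative, and the 90 degree step is chosen only to make the result easy to check by hand.

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def q_from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    length = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2.0) / length
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def rotate(q, v):
    """Compute q v q^-1 for a unit quaternion q and a 3D vector v."""
    q_inv = (q[0], -q[1], -q[2], -q[3])  # conjugate equals inverse for unit q
    return q_mul(q_mul(q, (0.0, v[0], v[1], v[2])), q_inv)[1:]


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


v_up, v_norm, p = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, -1.0, 0.0)
axis = cross(v_up, v_norm)  # V = Vup x Vnorm
q = q_from_axis_angle(axis, math.radians(90.0))
up2, norm2, p2 = rotate(q, v_up), rotate(q, v_norm), rotate(q, p)
# To accumulate a further increment q_step, one would use q = q_mul(q_step, q).
```

After the step, `up2` still points from the new position `p2` towards the center of the sphere, which is the property the question asks for.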
Quaternions are normally used to avoid gimbal lock in free space motion (flight sims, etc.). In your case, you actually want the gimbal effect, since a camera that is forced to stay upright will inevitably behave strangely when it has to point almost straight up or down.
You should be able to represent the camera's orientation as just a latitude/longitude pair indicating the direction the camera is pointing.

Show lat/lon points on screen, in 3d

It's been a while since my math in university, and now I've come to need it like I never thought I would.
So, this is what I want to achieve:
Having a set of 3D points (geographical points, latitude and longitude, altitude doesn't matter), I want to display them on a screen, considering the direction I want to take into account.
This is going to be used along with a camera and a compass, so when I point the camera to the North, I want to display on my computer the points that the camera should "see". It's a kind of Augmented Reality.
Basically what (I think) I need is a way of transforming the 3D points viewed from above (like viewing the points on Google Maps) into a set of 3D points viewed from the side.
The conversion of Latitude and longitude to 3-D cartesian (x,y,z) coordinates can be accomplished with the following (Java) code snippet. Hopefully it's easily converted to your language of choice. lat and lng are initially the latitude and longitude in degrees:
// Convert degrees to radians.
lat *= Math.PI / 180.0;
lng *= Math.PI / 180.0;
// Cartesian coordinates on the unit sphere.
z = Math.sin(-lat);
x = Math.cos(lat) * Math.sin(-lng);
y = Math.cos(lat) * Math.cos(-lng);
The vector (x,y,z) will always lie on a sphere of radius 1 (i.e. the Earth's radius has been scaled to 1).
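For readers not using Java, the same conversion can be sketched in Python. It keeps the sign conventions of the snippet above unchanged; the function name `latlng_to_unit_vector` is made up for this example.

```python
import math


def latlng_to_unit_vector(lat_deg, lng_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere.

    Uses the same axis and sign conventions as the Java snippet above.
    """
    lat = math.radians(lat_deg)
    lng = math.radians(lng_deg)
    z = math.sin(-lat)
    x = math.cos(lat) * math.sin(-lng)
    y = math.cos(lat) * math.cos(-lng)
    return x, y, z


point = latlng_to_unit_vector(37.0, -122.0)
length = math.sqrt(sum(c * c for c in point))  # stays 1 up to rounding
```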
From there, a 3D perspective projection is required to convert the (x,y,z) into (X,Y) screen coordinates, given a camera position and angle. See, for example, http://en.wikipedia.org/wiki/3D_projection
It really depends on the degree of precision you require. If you're working on a high-precision, close-in view of points anywhere on the globe you will need to take the ellipsoidal shape of the earth into account. This is usually done using an algorithm similar to the one described here, on page 38 under 'Conversion between Geographical and Cartesian Coordinates':
http://www.icsm.gov.au/gda/gdatm/gdav2.3.pdf
If you don't need high precision the techniques mentioned above work just fine.
Could anyone explain to me exactly what these params mean?
I've tried, and the results were very weird, so I guess I am misunderstanding some of the params for the perspective projection:
* a_{x,y,z} - the point in 3D space that is to be projected.
* c_{x,y,z} - the location of the camera.
* θ_{x,y,z} - the rotation of the camera. When c_{x,y,z} = <0,0,0> and θ_{x,y,z} = <0,0,0>, the 3D vector <1,2,0> is projected to the 2D vector <1,2>.
* e_{x,y,z} - the viewer's position relative to the display surface. [1]
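One way to see how these four parameters interact is the sketch below. It follows my reading of the formulas on the Wikipedia 3D projection page: the point is first moved into camera space as d, then divided by its depth. The rotation order (z, then y, then x) and the sign conventions are assumptions, so treat this as an illustration rather than a reference implementation. The function names are hypothetical.

```python
import math


def camera_transform(a, c, theta):
    """Express point a in camera space, given camera position c and rotation theta."""
    x, y, z = a[0] - c[0], a[1] - c[1], a[2] - c[2]
    cos_x, cos_y, cos_z = (math.cos(t) for t in theta)
    sin_x, sin_y, sin_z = (math.sin(t) for t in theta)
    # Undo the camera rotation about z, then y, then x (assumed order).
    x, y = cos_z * x + sin_z * y, -sin_z * x + cos_z * y
    x, z = cos_y * x - sin_y * z, sin_y * x + cos_y * z
    y, z = cos_x * y + sin_x * z, -sin_x * y + cos_x * z
    return x, y, z


def project(a, c, theta, e):
    """Perspective-project a onto the display surface; returns (b_x, b_y)."""
    d = camera_transform(a, c, theta)
    if d[2] == 0:
        raise ValueError("point lies in the camera plane (d_z = 0)")
    b_x = e[2] / d[2] * d[0] + e[0]
    b_y = e[2] / d[2] * d[1] + e[1]
    return b_x, b_y


# Camera at the origin, no rotation, display surface one unit from the viewer.
b = project((1.0, 2.0, 5.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Weird results often come from passing angles in degrees instead of radians, or from points that end up with a zero or negative depth d_z.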
Well, you'll want some 3D vector arithmetic to move your origin, and probably some quaternion-based rotation functions to rotate the vectors to match your direction. There are any number of good tutorials on using quaternions to rotate 3D vectors (since they're used a lot for rendering and such), and the 3D vector stuff is pretty simple if you can remember how vectors are represented.
Well, just a piece of advice: you can plot these points in a 3D space (you can do this easily using OpenGL).
You have to transform the lat/long into another coordinate system, for example polar or Cartesian.
So, starting from lat/long, you put the origin of your space at the center of the Earth, then transform your data into Cartesian coordinates:
z = R * sin(lat)
x = R * cos(lat) * sin(long)
y = R * cos(lat) * cos(long)
R is the radius of the world; you can set it to 1 if you only need to get the direction between your point of view and the points you need "to see".
Then put the virtual camera at a point in the space you've created, and link the data from your real camera (simply a vector) to that of the virtual one.
The next step towards what you want is to overlay the images from your camera on your "virtual space". Ultimately you should have a real camera that acts as a control to move the virtual one in the virtual space.

Resources