Translate or Scale Coordinates from Image to Coordinate System - math

I'm creating a system where a user clicks on an image on a webpage that is generated from a CAD program. The image is of a CAD drawing, and I want to translate the coordinates of the image click into the real system coordinates.
The image changes size depending on the CAD window size. I can get the bounding coordinates of the CAD system, and I can extract the coordinate that is clicked on the image as well as its size.
How do I correlate the image coordinate to the real coordinate?
For example, my bounding box of the CAD system is ll(2029 3350) ur(2373 3489). My image is 1024 x 415. My clicked image coordinate is (442, 332). How do I translate that to the CAD coordinate system? I feel like this should be simple, but I'm really struggling.
I tried:
xFactor = (urx - llx)/w
yFactor = (ury - lly)/h
destX = xFactor * x
destY = yFactor * y
But it's not even close to correct. Maybe I need to calculate a shift as well?
Thanks!

I was just missing the shift. Adding the minx and miny of the window bounding box gets me the correct coordinate.
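Putting the scale factors and the shift together, a minimal Python sketch (the function name is illustrative; it assumes the image y axis runs the same way as the CAD y axis, so if your click y is measured from the top of the image, pass h - y instead of y):

```python
def image_to_cad(x, y, w, h, llx, lly, urx, ury):
    """Map an image click (x, y) on a w-by-h image to CAD coordinates,
    given the CAD bounding box ll(llx, lly) ur(urx, ury)."""
    x_factor = (urx - llx) / w
    y_factor = (ury - lly) / h
    # scale, then shift by the lower-left corner of the bounding box
    return llx + x_factor * x, lly + y_factor * y

# The example from the question:
print(image_to_cad(442, 332, 1024, 415, 2029, 3350, 2373, 3489))
```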


How to fix zoom towards mouse routine?

I'm trying to learn how to zoom towards mouse using Orthographic projection and so far I've got this:
def dolly(self, wheel, direction, x, y, acceleration_enabled):
    v = vec4(*[float(v) for v in glGetIntegerv(GL_VIEWPORT)])
    w, h = v[2], v[3]
    f = self.update_zoom(direction, acceleration_enabled)  # [0.1, 4]
    aspect = w / h
    x, y = x - w / 2, y - h / 2
    K1 = f * 10
    K0 = K1 * aspect
    self.left = K0 * (-2 * x / w - 1)
    self.right = K0 * (-2 * x / w + 1)
    self.bottom = K1 * (2 * y / h - 1)
    self.top = K1 * (2 * y / h + 1)
x/y: mouse screen coordinates
w/h: window width/height
f: factor which goes from 0.1 to 4 when scrolling down/up
left/right/bottom/top: values used to compute the new orthographic projection
The results I'm getting are really strange, but I don't know which part of the formulas I've messed up.
Could you please spot which part of my maths is wrong, or just post clear pseudocode I can try? Just for the record, I've read and tested quite a lot of versions out there on the internet but haven't yet found any place where this subject is explained properly.
Ps. You don't need to post any SO link related to this subject as I've read all of them already :)
I'm going to answer this in a general way, based on the following set of assumptions:
1. You use a matrix P for the (ortho) projection, describing the actual mapping of your eye space view volume onto the standard view volume [-1,1]^3 OpenGL will clip against (see also assumption 2), and a matrix V for the view transformation, that is, the position and orientation of the "camera" (if there is such a thing, especially in ortho projections), basically establishing an eye space where your view volume will be defined relative to.
2. I will ignore the homogeneous clip space, as you work with completely affine ortho projections only; that means NDC coordinates and clip space will be identical, and no tricks on any w coordinate are applied.
3. I assume default GL conventions for eye space and projection matrices; notably, the eye space origin is the camera location and the camera look-at direction is -z.
4. The viewport fills the window completely.
5. Window space follows the default OpenGL convention, where the origin is at the bottom left.
6. Mouse coordinates are in some window-specific coordinate frame where the origin is at the top left, and the mouse is at integer pixel coordinates.
7. I assume that the view volume defined by P is symmetrical: right = -left and top = -bottom, and it is also supposed to stay symmetrical after the zoom operation; therefore, to compensate for any movement, the view matrix V must be adjusted too.
What you want to get is a zoom such that the object point under the mouse cursor does not move, i.e. becomes the center of the scale operation. The mouse cursor itself is only 2D, and a whole straight line in 3D space will be mapped to the same pixel location. However, in an ortho projection, that line will be orthogonal to the image plane, so we don't need to bother much with the third dimension.
So what we want is to scale the current situation, given by P_old (defined by the ortho parameters l_old, r_old, b_old, t_old, n_old and f_old) and V_old (defined by the "camera" position c_old and orientation o_old), by a zoom factor s at mouse position (x, y) (in the space from assumption 6).
We can see a few things directly:
the near and far plane of the projection should be unaffected by the operation, so n_new = n_old and f_new = f_old.
the actual camera orientation (or lookat direction) should also be unaffected: o_new = o_old
If we zoom in by a factor of s, the actual view volume must be scaled by 1/s, since when we zoom in, a smaller part of the complete world is mapped onto the screen than before (and appears bigger). So we can simply scale the frustum parameters we had:
l_new = l_old / s, r_new = r_old / s, b_new = b_old / s, t_new = t_old / s
If we only replace P_old by P_new, we get the zoom, but the world point under the mouse cursor will move (unless the mouse is exactly at the center of the view). So we have to compensate for that by modifying the camera position.
Let's first put the mouse coords (x,y) into OpenGL window space (assumptions 5 and 6):
x_win = x + 0.5
y_win = height - 0.5 - y
Note that besides mirroring y, I also shift the coordinates by half a pixel. That's because in OpenGL window space, pixel centers are at half-integer coordinates, while I assume that your integer mouse coordinates represent the center of the pixel you click on (it will not make a big difference visually, but still).
Now let's further put the coords into Normalized Device Space (relying on assumption 4 here):
x_ndc = 2.0 * x_win / width - 1
y_ndc = 2.0 * y_win / height - 1
By assumption 2, clip and NDC coordinates will be identical, and we can call the vector v our NDC-space mouse coordinates: v = (x_ndc, y_ndc, 0, 1)^T
We can now state our "point under mouse must not move" condition:
inverse(V_old) * inverse(P_old) * v = inverse(V_new) * inverse(P_new) * v
But let's just go into eye space and let's look at what happened:
Let a = inverse(P_old) * v be the eye space location of the point under the mouse cursor before we scaled.
Let b = inverse(P_new) * v be the eye space location of the point under the mouse cursor after we scaled.
Since we assumed a symmetrical view volume, we already know that for the x and y coordinates, b = (1/s) * a holds (assumption 7; if that assumption does not hold, you need to do the actual calculation for b too, which isn't hard either).
So, we can set up an 2D eye space offset vector d which describes how our point of interest was moved by the scale:
d = b - a = (1/s) * a - a = (1/s - 1) * a
To compensate for that movement, we have to move our camera inversely, so by -d.
If you keep the camera position separate, as I did in assumption 1, you simply need to update the camera position c accordingly. You just have to take care of the fact that c is a world space position, while d is an eye space offset:
c_new = c_old - inverse(V_old) * (d_x, d_y, 0, 0)^T
Note that if you do not keep the camera position as a separate variable but keep the view matrix directly, you can simply pre-multiply the translation: V_new = translate(-d_x, -d_y, 0) * V_old
Update
What I wrote so far is correct, but I took a shortcut which is numerically a very bad idea when working with finite-precision data types. The error in the camera position accumulates very fast if one zooms out a lot. The issue became clearly visible after @BPL implemented this.
The main issue seems to be that I directly calculated the offset vector d in eye space, which does not take the current view matrix V_old (and its small errors into account). So a more stable approach is to calculate all of this directly in world space:
a = inverse(P_old * V_old) * v
b = inverse(P_new * V_old) * v
d = b - a
c_new = c_old - d
(doing so makes assumption 7 not needed anymore as a by product, so it directly works in the general case of arbitrary ortho matrices).
Using this approach, the zoom operation worked as expected.
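The stable world-space recipe above can be sketched with NumPy (a sketch under the answer's assumptions; the helper names are illustrative, and the camera is assumed to have identity orientation so that V = translate(-c)):

```python
import numpy as np

def ortho(l, r, b, t, n, f):
    """Standard glOrtho-style projection matrix."""
    return np.array([
        [2.0/(r-l), 0, 0, -(r+l)/(r-l)],
        [0, 2.0/(t-b), 0, -(t+b)/(t-b)],
        [0, 0, -2.0/(f-n), -(f+n)/(f-n)],
        [0, 0, 0, 1.0],
    ])

def translate(tx, ty, tz):
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def zoom_at_mouse(params, c_old, V_old, s, x, y, width, height):
    """Zoom by factor s while keeping the world point under the mouse fixed.

    params: (l, r, b, t, n, f) of the old ortho projection
    c_old:  camera world position (3-vector), V_old: 4x4 view matrix
    (x, y): mouse pixel coordinates, origin at top left
    """
    l, r, b, t, n, f = params
    # mouse -> OpenGL window space (assumptions 5 and 6)
    x_win = x + 0.5
    y_win = height - 0.5 - y
    # window space -> NDC (assumption 4)
    v = np.array([2.0*x_win/width - 1.0, 2.0*y_win/height - 1.0, 0.0, 1.0])
    # near/far stay; the other frustum parameters shrink by 1/s
    new_params = (l/s, r/s, b/s, t/s, n, f)
    # world-space positions of the point under the mouse, before and after
    p_before = np.linalg.inv(ortho(*params) @ V_old) @ v
    p_after = np.linalg.inv(ortho(*new_params) @ V_old) @ v
    d = (p_after - p_before)[:3]
    return new_params, c_old - d
```

Feeding the returned parameters into a new projection matrix and rebuilding V from the new camera position keeps the clicked world point at the same pixel.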

Crop image in scilab

I want to crop an image using mouse selection at a particular region of interest in Scilab. Here is my code:
I=imread('G:\SCI\FRAME\mixer2.jpg');
I1G = rgb2gray(I);
figure();ShowImage(I1G,'mixer');
IN1G = gca();
rect1 = rubberbox();
ROI1=imcrop(I1G,rect1);disp(ROI1);
But it gives the following error: The rectangle is out of the image range.
I also tried the xclick and xgetmouse functions for cropping with mouse selection, and they give the same error.
Please give me suggestions for correcting the code.
Thanks and regards
The problem arises from the difference between the image coordinate system (used by imcrop and all the other functions of the SIVP toolbox) and the "regular" coordinate system (used by rubberbox, xclick and all the built-in functions). Images have the first pixel at the top left; rubberbox, on the contrary, has its origin at the bottom left.
To correct this, you have to reverse the y (vertical) axis coordinate before applying imcrop():
imagefile="d:\Attila\PROJECTS\Scilab\Stackoverflow\mixer_crop.jpg";
I=imread(imagefile);
I1G=rgb2gray(I);
scf(0); clf(0);
ShowImage(I1G,'mixer');
rect1=rubberbox();
imheight=size(I1G,"r"); //image height
rect1(2)=imheight-rect1(2); //reverse y axes coordinates (0 is at top)
ROI1=imcrop(I1G,rect1);
scf(1); clf(1);
ShowImage(ROI1,'ROI1');

formula for game xy to image (pixel) xy

It's probably quite simple, but I can't find what I need on search engines... (it's like they used to know better what I was looking for)
I need to convert in-game coordinates to coordinates on an image, so I can add... say, a pixel on the image to represent the location of the in-game coordinates.
The image is a map, the size is 2384x2044 (width x height).
The in-game 0,0 = the middle of the in-game map, this would also be the middle of the image.
So it's easy to find the xy to print a pixel at the middle of the image:
2384 / 2 = 1192 and 2044 / 2 = 1022, so the xy for 0,0 in-game on the image is (1192, 1022).
Now, for example, if I move up and slightly to the left in-game, the coordinates become -141.56, 1108.11. How can I calculate the correct xy for the image?
image: http://i.imgur.com/yfiwfO7.png?1
To recap, you want to scale game coordinates of -3000 to +3000 in both axes and offset them to centre them on your image; in that case the computations you want are
pixel_x = 1192 + (game_x * 1192 / 3000.0)
pixel_y = 1022 - (game_y * 1022 / 3000.0)
Note the minus on the y line to invert the direction of the offset. Your game coordinates are floating point so I've made the 3000s floating point by adding a .0 - you didn't say what language you were using so this may or may not be the correct syntax.
However, you probably ought to avoid putting constants into this in case you ever want to change the size of the playfield or the image. It would be better to:
- set up constants in your program for the playfield dimensions
- set up constants or global variables for the size of the image: you can read this from the image as you load it
- pre-compute the values 1192 / 3000.0 and 1022 / 3000.0 (but using your image constants) to save one floating point operation for each scale; probably not worth it nowadays as a speed optimisation, though, and you might sacrifice a tiny bit of floating point accuracy at the end of the mantissa, but that won't matter here.
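A sketch of the computation in Python (the ±3000 game-coordinate range comes from the answer above; substitute your real playfield range and image dimensions):

```python
def game_to_pixel(game_x, game_y, img_w=2384, img_h=2044, half_range=3000.0):
    """Map game coordinates (origin at map centre, y up) to image pixels
    (origin at top left, y down)."""
    pixel_x = img_w / 2 + game_x * (img_w / 2) / half_range
    pixel_y = img_h / 2 - game_y * (img_h / 2) / half_range
    return pixel_x, pixel_y

# The centre of the map lands on the centre of the image:
print(game_to_pixel(0, 0))  # → (1192.0, 1022.0)
```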

Finding a pixel location in a field of view

So I have a camera with a defined field of view, and a point I would like to label in the image. I have both the lat and lon of the points, and I know the angle between them; however, my equation for finding the pixel location is off. Attached is an image to help my explanation:
I can solve for each vector from the camera to the center of view and to the point, the full angle of the field of view, and the angle between the center of view and the point.
Here is what I'm currently using: [the angle between the vectors (the blue angle) / the angle of the field of view (the green angle)] * 1024 (screen width)
With numbers: (14.182353 / 65) * 1024 = 223.426620, but on the image the pixel value should be 328...
Another way I tried it was using a bearing equation: [[the bearing of the point from the camera - the bearing of the left side of the field of view] / the field of view] * 1024
With numbers: ((97.014993 - 83.500000) / 65) * 1024 = 212.913132, and the answer should be 328...
Can anyone think of a more accurate solution?
Try 512(1-tan(blue)/tan(green/2)), where blue is positive to the left.
If blue is to the right, you can treat it as a negative number, to get 512(1+tan(blue)/tan(green/2)).
Explanation:
Let C be the camera, d be the dot labeled 328, E be the center of the field of view, and L be the left end point of the field of view, so that you want to find dL. Then (for blue going left):
dL + dE = EL = 512
tan(green/2) = EL / CE
tan(blue) = dE / CE
Then tan(blue)/tan(green/2) = dE/EL = (512-dL)/512, and you can solve for dL.
Going right would be similar (or you can work with negative distances, and everything works out fine).
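A quick sketch of the suggested formula in Python (angle names follow the answer; the 1024-pixel screen width comes from the question):

```python
import math

def pixel_from_angles(blue_deg, green_deg, screen_w=1024):
    """Pixel x of a point whose ray is `blue_deg` left of the view centre,
    for a horizontal field of view of `green_deg` (pinhole model).
    Treat blue as negative when the point is to the right."""
    half = screen_w / 2
    return half * (1 - math.tan(math.radians(blue_deg))
                     / math.tan(math.radians(green_deg / 2)))

# A point exactly at the centre of view lands at mid-screen:
print(pixel_from_angles(0, 65))  # → 512.0
```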

Converting x/y values in camera view to pan/tilt values

If I have a camera which gives out 360 degree pan (x) and tilt (y) values, and I want to get the pan and tilt values of where I have my cursor in the camera's view, how would I convert that?
More info:
It's a Flash/AS3 project.
The pan and tilt values are from the center of the camera view.
Camera view size is 960x540.
You gave the "view" size in pixels. What you need to know is the Field of View (FOV), which is measured in degrees. From that you can tell the number of degrees from center to the image edges.
You might be able to find the FOV in your camera's technical specifications. (It's determined by the detector array size and the focal length). Alternatively, you could try measuring it. Here's a webpage that explains how:
http://www.panohelp.com/lensfov.html
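Once you know the FOV, the conversion under a pinhole camera model might look like this (a Python sketch, though the project is Flash/AS3, the same math works with Math.atan; the 60° horizontal FOV is a placeholder you'd replace with the measured value):

```python
import math

def cursor_to_pan_tilt(cx, cy, view_w=960, view_h=540, hfov_deg=60.0):
    """Convert a cursor position in the camera view to pan/tilt angles
    relative to the view centre (pinhole camera model).

    hfov_deg is an assumed horizontal FOV -- replace it with the value
    from your camera's spec sheet or your own measurement."""
    # focal length expressed in pixels, derived from the horizontal FOV
    f = (view_w / 2) / math.tan(math.radians(hfov_deg / 2))
    pan = math.degrees(math.atan((cx - view_w / 2) / f))
    tilt = math.degrees(math.atan((view_h / 2 - cy) / f))
    return pan, tilt

# The centre of the 960x540 view is straight ahead:
print(cursor_to_pan_tilt(480, 270))  # → (0.0, 0.0)
```

The atan is what distinguishes this from a simple linear scale: pixels near the edge of the view cover slightly fewer degrees than a linear mapping would suggest.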
