Drawing Isometric game worlds

Drawing Isometric game worlds - 2d

What is the correct way to draw isometric tiles in a 2D game?
I've read references (such as this one) that suggest the tiles be rendered in a way that will zig-zag each column in the 2D array representation of the map. I imagine that they should be drawn more in a diamond fashion, where what gets drawn to the screen relates more closely to what the 2D array would look like, just rotated a little.
Are there advantages or disadvantages to either method?

Update: Corrected map rendering algorithm, added more illustrations, changed formatting.
Perhaps the main advantage of the "zig-zag" technique for mapping tiles to the screen is that the tiles' x and y coordinates lie along the horizontal and vertical axes of the screen.
"Drawing in a diamond" approach:
I take "drawing in a diamond" to mean rendering the map with a nested for-loop over the two-dimensional array, as in this example:
tile_map[][] = [[...],...]
for (cellY = 0; cellY < tile_map.size; cellY++):
    for (cellX = 0; cellX < tile_map[cellY].size; cellX++):
        draw(
            tile_map[cellY][cellX],
            screenX = (cellX * tile_width / 2) + (cellY * tile_width / 2),
            screenY = (cellY * tile_height / 2) - (cellX * tile_height / 2)
        )
Advantage:
The advantage of this approach is that it is a simple nested for-loop with fairly straightforward logic that works consistently for all tiles.
Disadvantage:
One downside to that approach is that the x and y coordinates of the tiles on the map will increase in diagonal lines, which might make it more difficult to visually map the location on the screen to the map represented as an array:
However, there is going to be a pitfall to implementing the above example code -- the rendering order will cause tiles that are supposed to be behind certain tiles to be drawn on top of the tiles in front:
In order to amend this problem, the inner for-loop's order must be reversed -- starting from the highest value, and rendering toward the lower value:
tile_map[][] = [[...],...]
for (i = 0; i < tile_map.size; i++):
    for (j = tile_map[i].size - 1; j >= 0; j--): // Changed loop condition here.
        draw(
            tile_map[i][j],
            x = (j * tile_width / 2) + (i * tile_width / 2),
            y = (i * tile_height / 2) - (j * tile_height / 2)
        )
With the above fix, the rendering of the map should be corrected:
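As a minimal sketch of the reversed inner loop, assuming Python and integer tile sizes, the generator below yields each tile with its screen position, rows in ascending order and columns in descending order. The name `diamond_draw_order` is hypothetical and not part of the original answer; actual drawing is left to the caller.

```python
def diamond_draw_order(tile_map, tile_width, tile_height):
    """Yield (tile, screen_x, screen_y) in back-to-front order for a diamond layout."""
    for i, row in enumerate(tile_map):
        # Walk each row from its last column down to 0, so tiles further
        # up the screen are produced before the tiles that overlap them.
        for j in range(len(row) - 1, -1, -1):
            x = (j * tile_width // 2) + (i * tile_width // 2)
            y = (i * tile_height // 2) - (j * tile_height // 2)
            yield row[j], x, y


if __name__ == "__main__":
    tiles = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 0, 1, 1], [2, 2, 3, 3]]
    for tile, x, y in diamond_draw_order(tiles, 64, 32):
        print(tile, x, y)
```

A caller would blit each tile image at the yielded position; the order guarantees that the up-left, up-right and directly-above neighbours of a tile have already been drawn.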
"Zig-zag" approach:
Advantage:
Perhaps the advantage of the "zig-zag" approach is that the rendered map may appear to be a little more vertically compact than the "diamond" approach:
Disadvantage:
Having tried to implement the zig-zag technique, I found the disadvantage to be that the rendering code is a little harder to write, because it cannot be written as simply as a nested for-loop over each element in an array:
tile_map[][] = [[...],...]
for (i = 0; i < tile_map.size; i++):
    if i is odd:
        offset_x = tile_width / 2
    else:
        offset_x = 0
    for (j = 0; j < tile_map[i].size; j++):
        draw(
            tile_map[i][j],
            x = (j * tile_width) + offset_x,
            y = i * tile_height / 2
        )
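The same staggered layout can be sketched in Python, assuming integer tile sizes. The helper name `zigzag_positions` is hypothetical; it only computes positions and leaves drawing to the caller.

```python
def zigzag_positions(tile_map, tile_width, tile_height):
    """Yield (tile, screen_x, screen_y) for a staggered ("zig-zag") layout."""
    for i, row in enumerate(tile_map):
        # Odd rows are shifted right by half a tile.
        offset_x = tile_width // 2 if i % 2 == 1 else 0
        for j, tile in enumerate(row):
            yield tile, j * tile_width + offset_x, i * tile_height // 2


if __name__ == "__main__":
    for entry in zigzag_positions([[0, 1], [2, 3]], 64, 32):
        print(entry)
```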
Also, it may be a little bit difficult to try to figure out the coordinate of a tile due to the staggered nature of the rendering order:
Note: The illustrations included in this answer were created with a Java implementation of the tile rendering code presented, with the following int array as the map:
tileMap = new int[][] {
{0, 1, 2, 3},
{3, 2, 1, 0},
{0, 0, 1, 1},
{2, 2, 3, 3}
};
The tile images are:
tileImage[0] -> A box with a box inside.
tileImage[1] -> A black box.
tileImage[2] -> A white box.
tileImage[3] -> A box with a tall gray object in it.
A Note on Tile Widths and Heights
The variables tile_width and tile_height which are used in the above code examples refer to the width and height of the ground tile in the image representing the tile:
Using the dimensions of the image will work, as long as the image dimensions and the tile dimensions match. Otherwise, the tile map could be rendered with gaps between the tiles.

Either way gets the job done. I assume that by zigzag you mean something like this: (numbers are order of rendering)
..  ..  01  ..  ..
  ..  06  02  ..
..  11  07  03  ..
  16  12  08  04
21  17  13  09  05
  22  18  14  10
..  23  19  15  ..
  ..  24  20  ..
..  ..  25  ..  ..
And by diamond you mean:
..  ..  ..  ..  ..
  01  02  03  04
..  05  06  07  ..
  08  09  10  11
..  12  13  14  ..
  15  16  17  18
..  19  20  21  ..
  22  23  24  25
..  ..  ..  ..  ..
The first method needs more tiles rendered so that the full screen is drawn, but you can easily make a boundary check and skip any tiles fully off-screen. Both methods will require some number crunching to find out what is the location of tile 01. In the end, both methods are roughly equal in terms of math required for a certain level of efficiency.

If you have some tiles that exceed the bounds of your diamond, I recommend drawing in depth order:
...1...
..234..
.56789.
..abc..
...d...

Coobird's answer is the correct, complete one. However, I combined his hints with those from another site to create code that works in my app (iOS/Objective-C), which I wanted to share with anyone who comes here looking for such a thing. Please, if you like/up-vote this answer, do the same for the originals; all I did was "stand on the shoulders of giants."
As for sort order, my technique is a modified painter's algorithm: each object has (a) an altitude of the base (which I call "level") and (b) an X/Y for the "base" or "foot" of the image (examples: an avatar's base is at its feet; a tree's base is at its roots; an airplane's base is center-image, etc.). Then I just sort lowest to highest level, then lowest (highest on-screen) to highest base-Y, then lowest (left-most) to highest base-X. This renders the tiles the way one would expect.
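The sort described above can be sketched in a few lines of Python. The `Sprite` class and its field names are hypothetical stand-ins for whatever object type the app uses; only the three-part sort key comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    name: str
    level: int      # altitude of the base
    base_x: float   # screen X of the "foot" of the image
    base_y: float   # screen Y of the "foot" of the image


def draw_order(sprites):
    """Sort by level, then base-Y (top of screen first), then base-X (left first)."""
    return sorted(sprites, key=lambda s: (s.level, s.base_y, s.base_x))


if __name__ == "__main__":
    scene = [
        Sprite("plane", level=1, base_x=0, base_y=0),
        Sprite("tree", level=0, base_x=10, base_y=50),
        Sprite("avatar", level=0, base_x=5, base_y=20),
    ]
    print([s.name for s in draw_order(scene)])
```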
Code to convert screen (point) to tile (cell) and back:
typedef struct ASIntCell { // like CGPoint, but with int-s vice float-s
int x;
int y;
} ASIntCell;
// Cell-math helper here:
// http://gamedevelopment.tutsplus.com/tutorials/creating-isometric-worlds-a-primer-for-game-developers--gamedev-6511
// Although we had to rotate the coordinates because...
// X increases NE (not SE)
// Y increases SE (not SW)
+ (ASIntCell) cellForPoint: (CGPoint) point
{
const float halfHeight = rfcRowHeight / 2.;
ASIntCell cell;
cell.x = ((point.x / rfcColWidth) - ((point.y - halfHeight) / rfcRowHeight));
cell.y = ((point.x / rfcColWidth) + ((point.y + halfHeight) / rfcRowHeight));
return cell;
}
// Cell-math helper here:
// http://stackoverflow.com/questions/892811/drawing-isometric-game-worlds/893063
// X increases NE,
// Y increases SE
+ (CGPoint) centerForCell: (ASIntCell) cell
{
CGPoint result;
result.x = (cell.x * rfcColWidth / 2) + (cell.y * rfcColWidth / 2);
result.y = (cell.y * rfcRowHeight / 2) - (cell.x * rfcRowHeight / 2);
return result;
}
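For readers who want to check the math, here is a hedged Python sketch of the same mapping and its plain algebraic inverse. It deliberately omits the half-height offsets used in cellForPoint above, so it is not a line-for-line port; the function names and the `col_width`/`row_height` parameters are illustrative only.

```python
def center_for_cell(cell_x, cell_y, col_width, row_height):
    """Cell -> screen point, with X increasing NE and Y increasing SE."""
    x = (cell_x + cell_y) * col_width / 2
    y = (cell_y - cell_x) * row_height / 2
    return x, y


def cell_for_point(x, y, col_width, row_height):
    """Screen point -> cell, obtained by solving the two equations above."""
    # x / col_width = (cx + cy) / 2 and y / row_height = (cy - cx) / 2,
    # so their difference is cx and their sum is cy.
    cell_x = round(x / col_width - y / row_height)
    cell_y = round(x / col_width + y / row_height)
    return cell_x, cell_y


if __name__ == "__main__":
    point = center_for_cell(2, 3, 64, 32)
    print(point, cell_for_point(*point, 64, 32))
```

Rounding to the nearest integer selects the diamond that contains the point, since a point lies inside a tile exactly when both fractional cell coordinates are within half a unit of the tile's center.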

You could use Euclidean distance from the point highest and nearest the viewer, except that is not quite right: it results in a spherical sort order. You can straighten that out by looking from further away, where the curvature becomes flattened out. So just add, say, 1000 to each of the x, y and z components to give x', y' and z'. Then sort on x'*x'+y'*y'+z'*z'.

The real problem arises when you need to draw tiles or sprites that intersect or span two or more other tiles.
After 2 (hard) months of personal analysis of the problem, I finally found and implemented a "correct render drawing" for my new cocos2d-js game.
The solution consists of mapping, for each (susceptible) tile, which sprites are "front, back, top and behind".
Once that is done, you can draw them following a "recursive logic".

Related

How can I seamlessly wrap map tiles around cylindrically?

I'm creating a game that takes place on a map, and the player should be able to scroll around the map. I'm using real-world data from NASA as a 5700 by 2700 pixel image split into 4 smaller ones, each corresponding to a hemisphere:
How I split up the image:
The player will be viewing the world through a movable camera, currently in a 4:3 aspect ratio. Its width and height can be described as two variables x and y, currently at 480 and 360 respectively.
Model of the camera:
In practice, the camera is "fixed" and instead the tiles move. The camera's center is described as two variables: xcam and ycam.
Currently, the 4 tiles move and hide flawlessly. The problem arises when the camera passes over the "edge" at 180 degrees longitude. What should happen is that the tiles on one side should show and move as if the world was a cylinder without any noticeable gaps. I update xcam by applying this equation to it:
xcam = ((xcam + (2700 - x)) mod (5400 - x)) - (2700 - x)
And the tiles' centers update according to these equations (I will focus only on tiles 1 and 2 for simplicity):
tile1_x = xcam - 1350
tile1_y = ycam + 650
tile2_x = xcam + 1350
tile2_y = ycam + 650
Using this, whenever the camera moves past the leftmost edge of tile 1, it "skips" and instead of tile 1 still being visible with tile 2 in view, it moves enough so that tile 2's rightmost edge is in the camera's rightmost edge.
Here's what happens in reality: ,
and here's what I want to happen: .
So, is there any way to update the equations I'm using (or even completely redo everything) so that I can get smooth wrapping?

I think you unnecessarily hard-code the number of tiles and their sizes, and thus bind your code to that data. In my opinion it would be better to store them in variables, so that they can be easily modified in one place if the data ever changes. This also allows us to write more flexible code.
So, let's assume we have variables:
// logical size of the whole Earth's map,
// currently 2 and 2
int ncols, nrows;
// single tile's size, currently 2700 and 1350
int wtile, htile;
// the whole Earth map's size
// always ncols*wtile and nrows*htile
int wmap, hmap;
Tile tiles[nrows][ncols];
// viewport's center in map coordinates
int xcam, ycam;
// viewport's size in map coordinates, currently 480 and 360
int wcam, hcam;
Whenever we update the player's position, we need to make sure the position falls within an allowed range. But we need to establish the coordinate system first in order to define the allowed range. For example, if x values span from 0 to wmap-1, increasing rightwards (towards East), and y values span from 0 to hmap-1, increasing downwards (towards South), then:
// player's displacement
int dx, dy;
xcam = (xcam + dx) mod wmap
ycam = (ycam + dy) mod hmap
ensures the camera position is always within the map. (This assumes the mod operator always returns a non-negative value. If it works like the C language % operator, which returns a negative result for a negative dividend, add the divisor first to make sure the first argument is non-negative: xcam = (xcam + dx + wmap) mod wmap, etc.)
If you'd rather like to have xcam,ycam = 0,0 at the center of a map (that is, at the Greenwich meridian and the equator), then the allowed range would be -wmap/2 through wmap/2-1 for x and -hmap/2 through hmap/2 - 1 for y. Then:
xcam = (xcam + dx + wmap/2) mod wmap - wmap/2
ycam = (ycam + dy + hmap/2) mod hmap - hmap/2
More generally, let x0, y0 denote the 'zero' position of camera relative to the upper-left corner of the map. Then we can update the camera position by transforming it to the map's coordinates, then shifting and wrapping, and finally transforming back to camera's coordinates:
xmap = xcam + x0
ymap = ycam + y0
xmap = (xmap + dx) mod wmap
ymap = (ymap + dy) mod hmap
xcam = xmap - x0
ycam = ymap - y0
or, more compactly:
xcam = (xcam + dx + x0) mod wmap - x0
ycam = (ycam + dy + y0) mod hmap - y0
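The compact form translates directly to Python, whose % operator already returns a non-negative result for a positive divisor, so no extra "+ wmap" is needed there. This is a minimal sketch; the function name `wrap_camera` is an assumption, while the parameter names follow the variables above.

```python
def wrap_camera(xcam, ycam, dx, dy, wmap, hmap, x0=0, y0=0):
    """Move the camera by (dx, dy) and wrap it into the map's allowed range.

    (x0, y0) is the camera's 'zero' position relative to the map's
    upper-left corner; (0, 0) puts the origin in that corner.
    """
    xcam = (xcam + dx + x0) % wmap - x0
    ycam = (ycam + dy + y0) % hmap - y0
    return xcam, ycam


if __name__ == "__main__":
    # Origin in the upper-left corner: stepping past the right edge wraps to the left.
    print(wrap_camera(5390, 0, 20, 0, 5400, 2700))
    # Origin at the map's center: the range is -2700..2699 for x.
    print(wrap_camera(2690, 0, 20, 0, 5400, 2700, x0=2700, y0=1350))
```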
Now, when we know the position of the viewport (camera) relative to the map, we need to fill it with the map tiles. And a new decision must be made here.
When we travel from Anchorage, Alaska (western hemisphere) to the North, we eventually reach the North Pole and then find ourselves in the eastern hemisphere, heading South. If we proceed in the same direction, we'll get to Kuusamo, Finland, then Saint Petersburg, Russia, then Kiev, Ukraine... But that would be travel to the South! We usually do not describe it as the next part of the initial northward route. Consequently, we do not show the part 'past the pole' as an upside-down extension of the map. Hence the map should never show tiles above row number 0 or below row nrows-1.
On the other hand, when we travel along circles of latitude, we smoothly cross the 0 and 180 meridians and switch between the eastern and western hemisphere. So if the camera view covers area on both sides of the left or right edge of the map, we need to continue filling the view with tiles from the other end of the tiles array. If we use a map scaled down, so that it is smaller than the viewport, we may even need to iterate that more than once!
The left edge of a camera view corresponds to the 'longitude' of xleft = xcam - wcam/2 and the right one to xrght = xcam + wcam/2. So we can step across the viewport by the tile's width to find out appropriate columns and show them:
x = xleft
repeat
show a column at x
x = x + wtile
until x >= xrght
The 'show a column at x' part requires finding appropriate column, then iterating across the column to show corresponding tiles. Let's find out which tiles fit the camera view:
ytop = ycam - hcam/2
ybot = ycam + hcam/2
y=ytop
repeat
show a tile at x,y
y = y + htile
until y >= ybot
To show the tile we need to locate appropriate tile and then send it to appropriate position in the camera view.
However, we treat column number differently from the row number: columns wrap while rows do not:
row = y/htile
if (0 <= row) and (row < nrows) then
col = (x/wtile) mod ncols
xtile = x - (x mod wtile)
ytile = y - (y mod htile)
display tile[row][col] at xtile,ytile
endif
Of course xtile and ytile are our map-scale longitude and latitude, so the 'display tile at' routine must transform them to the camera view coordinates by subtracting the camera position from them:
xinview = xtile - xcam
yinview = ytile - ycam
and then apply the resulting values relative to the camera view's center at the displaying device (screen).
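Putting the column and row loops together, a possible Python sketch looks like this. It assumes the coordinate system with the origin in the map's upper-left corner, and, unlike the stepping loops above, it starts from the tile boundary at or before the view's left and top edges so that a partially visible tile at the far edge is not skipped. The function name `visible_tiles` and the tuple layout are assumptions.

```python
def visible_tiles(xcam, ycam, wcam, hcam, wtile, htile, ncols, nrows):
    """Return (row, col, x_in_view, y_in_view) for each tile touching the view.

    x_in_view / y_in_view are the tile's upper-left corner relative to the
    camera center. Columns wrap around; rows outside the map are skipped.
    """
    xleft, xright = xcam - wcam // 2, xcam + wcam // 2
    ytop, ybot = ycam - hcam // 2, ycam + hcam // 2
    result = []
    col_index = xleft // wtile          # floor division, may be negative
    while col_index * wtile < xright:
        row_index = ytop // htile
        while row_index * htile < ybot:
            if 0 <= row_index < nrows:  # rows do not wrap
                col = col_index % ncols  # columns do
                xtile = col_index * wtile
                ytile = row_index * htile
                result.append((row_index, col, xtile - xcam, ytile - ycam))
            row_index += 1
        col_index += 1
    return result


if __name__ == "__main__":
    # Camera straddling the left edge of a 2x2 map of 2700x1350 tiles.
    print(visible_tiles(0, 675, 480, 360, 2700, 1350, 2, 2))
```

With the camera centered on the map's left edge, the result contains column 1 drawn to the left of column 0, which is the seamless wrap the question asks for.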
Another level of complication will appear if you want to implement zooming in and out the view, that is dynamic scaling of the map, but I'm sure you'll find out yourself which calculations will need applying the zoom factor for correct results. :)

Calculate the coordinate of the small center hexagon of a group of hexagon in a grid

I am stuck on a problem that seems easy to solve but I can't seem to pinpoint the right formula.
I have a list of hexagon groups in a cube coordinate system. I know the cube coordinates of the groups but I need to calculate the "global" coordinate of a small hexagon in a given group.
For example, in the image below, I know the coordinates for GroupA (x=0, y=0, z=0) and GroupB (x=-1, y=1, z=0). How can I calculate the coordinates of the center tile of GroupB given that each group has the same radius (in this case the radius is 1) and they don't overlap each other (let's see it as a tiling of groups starting from 0,0,0 that creates a hex grid)?
In this simple example, I know as a human being that the center tile of GroupB is (x=-1, y=3, z=-2) but I need to code that logic in a way that a computer can calculate it for any given group on the map. I don't particularly need help on the code itself but the overall logic.
In this article, the author does the opposite (going from small hexagon and trying to find its group):
https://observablehq.com/#sanderevers/hexagon-tiling-of-an-hexagonal-grid
Any help would be greatly appreciated!
Thanks!

It looks like I have found something that seems to work.
Please feel free to correct me if I'm mistaken.
Based on the article I linked in my original question, I came up with an algorithm that calculates the small hexagon central coordinates based on its higher group coordinates (in this case, I've used a group with a radius of 10). I took the original algorithm and removed the area division the author did. The code is in javascript. The i, j and k variables are the cube coordinates of the group. The function returns the cube coordinates of the central small hex :
getGroupCentralTileCoordinates(i, j, k)
{
let r = 10;
let shift = 3 * r + 2;
let xh = shift * i + j;
let yh = shift * j + k;
let zh = shift * k + i;
return {
'x': (1 + xh - yh) / 3,
'y': (1 + yh - zh) / 3,
'z': (1 + zh - xh) / 3
};
}

Rendering 2d function plot

My task is to produce the plot of a 2-dimensional function in real time using nothing but linear algebra and color (imagine having to compute an image buffer in plain C++ from a function definition, for example f(x,y) = x^2 + y^2). The output should be something like this 3d plot.
So far I have tried 3 approaches:
1: Ray tracing:
Divide the (x,y) plane into triangles, find the z-values of each vertex, thus divide the plot into triangles. Intersect each ray with the triangles.
2: Sphere tracing:
a method for rendering implicit surfaces described here.
3: Rasterization:
The inverse of (1). Split the plot into triangles, project them onto the camera plane, loop over the pixels of the canvas and for each one choose the "closest" projected pixel.
All of these are way too slow. Part of my assignment is moving the camera around, so the plot has to be re-rendered in each frame. Please point me towards another source of information, another algorithm, or any kind of help. Thank you.
EDIT
As pointed out, here is the pseudocode for my very basic rasterizer. I am aware that this code might not be flawless, but it should resemble the general idea. However, when splitting my plot into 200 triangles (which I do not expect to be enough) it already runs very slowly, even without rendering anything. I am not even using a depth buffer for visibility. I just wanted to test the speed by setting up a frame buffer as follows:
NOTE: In the JavaScript framework I am using, _ denotes array indexing and a..b composes a list from a to b.
/*
* Raster setup.
* The raster is a pxH x pxW array.
* Raster coordinates might be negative or larger than the array dimensions.
* When rendering (i.e. filling the array) positions outside the visible raster will not be filled (i.e. colored).
*/
pxW := Width of the screen in pixels.
pxH := Height of the screen in pixels.
T := Transformation matrix of homogeneous world points to raster space.
// Buffer setup.
colBuffer = apply(1..pxW, apply(1..pxH, 0)); // pxH x pxW array of black pixels.
// Positive/0 if the point is on the right side of the line (V1,V2)/exactly on the line.
// p2D := point to test.
// V1, V2 := two vertices of the triangle.
edgeFunction(p2D, V1, V2) := (
det([p2D-V1, V2-V1]);
);
fillBuffer(V0, V1, V2) := (
// Dehomogenize.
hV0 = V0/(V0_3);
hV1 = V1/(V1_3);
hV2 = V2/(V2_3);
// Find boundaries of the triangle in raster space.
xMin = min(hV0.x, hV1.x, hV2.x);
xMax = max(hV0.x, hV1.x, hV2.x);
yMin = min(hV0.y, hV1.y, hV2.y);
yMax = max(hV0.y, hV1.y, hV2.y);
xMin = floor(if(xMin >= 0, xMin, 0));
xMax = ceil(if(xMax < pxW, xMax, pxW));
yMin = floor(if(yMin >= 0, yMin, 0));
yMax = ceil(if(yMax < pxH, yMax, pxH));
// Check for all points "close to" the triangle in raster space whether they lie inside it.
forall(xMin..xMax, x, forall(yMin..yMax, y, (
p2D = (x,y);
i = edgeFunction(p2D, hV0.xy, hV1.xy) * edgeFunction(p2D, hV1.xy, hV2.xy) * edgeFunction(p2D, hV2.xy, hV0.xy);
if (i > 0, colBuffer_y_x = 1); // Fill all points inside the triangle with some placeholder.
)));
);
mapTrianglesToScreen() := (
tvRaster = homogVerts * T; // Triangle vertices in raster space.
forall(1..(length(tvRaster)/3), i, (
actualI = i / 3 + 1;
fillBuffer(tvRaster_actualI, tvRaster_(actualI + 1), tvRaster_(actualI + 2));
));
);
// After all this, render the colBuffer.
What is wrong about this approach? Why is it so slow?
Thank you.

I would go with #3. It is really not that complex, so you should get more than 20 fps on a standard machine with a pure software rasterizer (without any libraries) if it is coded properly. My bet is that you are using a slow API such as PutPixel or SetPixel, or doing something else expensive. Without seeing the code or a better description of how you do it, it is hard to elaborate. All the info you need to do this is in here:
Algorithm to fill triangle
HSV histogram
Understanding 4x4 homogenous transform matrices
Do look also in the sub-links in each ...

explanation of animating a sine wave example

I am looking at this example and one part of this does not make sense.
The code creates an array of values like this:
for (i = 0; i < 84; i++) {
data.push(i * 10 / 84);
}
It then uses this array to get both the x and y values for the graph where d is an element of the array:
sine
.x(function (d, i) { return xScale(d); })
.y(function (d, i) { return yScale(Math.sin(d - time)); });
Is 84 just an arbitrary number for the available width remaining for the graph or is there any particular reason of where this came from?

I think it is a number of points per circle... compromise between accuracy and speed. I usually use 36 for small circles and 90 for big. ... and few thousand for huge ones ... so the idea is to use as low count as possible while the circle still looks like circle (in max zoom) and not like polygon.
You can also compute this algebraically ...
da=2.0*M_PI/n
e=r-(r*cos(0.5*da))
where n is the number of line segments per circumference and e is the max distance from desired circle shape. if you set it to desired error in pixels (and radius r is in pixels) then:
n=M_PI/acos((r-e)/r)
Hopefully I did not make any mistake while deriving the equations directly in SO editor. So if you want really precise circle set e=0.4 [pixels] and you should be fine
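The two formulas can be checked numerically with a short Python sketch. The function names are hypothetical; `segments_for_error` rounds n up to the next whole segment, which the formula above leaves implicit.

```python
import math


def max_error(r, n):
    """Largest gap between a circle of radius r and its n-segment polygon."""
    da = 2.0 * math.pi / n
    return r - r * math.cos(0.5 * da)


def segments_for_error(r, e):
    """Smallest whole number of segments whose error does not exceed e."""
    if not 0 < e < r:
        raise ValueError("e must be between 0 and r")
    return math.ceil(math.pi / math.acos((r - e) / r))


if __name__ == "__main__":
    n = segments_for_error(100.0, 0.4)
    print(n, max_error(100.0, n))
```

For a radius of 100 pixels and an allowed error of 0.4 pixels this yields a segment count in the same range as the 36 to 90 mentioned above.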
[edit1] sin wave
The for loop creates a list with these properties:
d(i) = < 0.0 , 10.0 )
i = { 0,1,2,...83 }
Then the sinwave is rendered:
x(i) = xscale * d(i)
y(i) = yscale * sin(d(i)-time)
Which gives you:
x(i) = < 0.0 , xscale )
y(i) = < -yscale , +yscale )
So the sine wave renders 10/(2*PI) = ~1.59 periods. The half overlap is cut off by the view. So in theory you could use 6.28/84 -> 7/84 instead of 10/84, but it is maybe just a safety value to handle different aspect ratio settings of the rendering (I do not code on that platform, so this is just speculation on my side). But as I said in the comments, the sine wave is scaled so that the PI period x size is equal to PI*circle_radius, so the 84 most likely comes from the circle (my original answer).

This is possibly just a magic number, that is, completely arbitrary. In fact, as you said, the first thing I thought was that it is related to the width of the graph.
Here is a fiddle: https://jsfiddle.net/1nboube9/1/
You can tweak the number and see what happens. It seems to me that any number above 44 does the trick.
for (i = 0; i < 44; i++) {
data.push(i * 10 / 84);
}
But, of course, the path is not the same if you change the denominator as well:
for (i = 0; i < 44; i++) {
data.push(i * 10 / 44);
}
This creates a very different path. And, so, I tried this:
for (i = 0; i < someNumber; i++) {
data.push(i);
}
And it creates a very unpleasant path. So, I believe that this is what happened: the designer first created data.push(i * 10 / 84); to make the path more circular, and then changed the loop accordingly. Maybe I'm completely wrong, but that's my bet.

Implementing Ray Picking

I have a renderer using directx and openGL, and a 3d scene. The viewport and the window are of the same dimensions.
How do I implement picking given mouse coordinates x and y in a platform independent way?

If you can, do the picking on the CPU by calculating a ray from the eye through the mouse pointer and intersect it with your models.
If this isn't an option I would go with some type of ID rendering. Assign each object you want to pick a unique color, render the objects with these colors and finally read out the color from the framebuffer under the mouse pointer.
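The ID-rendering idea depends on a reversible mapping between object IDs and colors. A minimal Python sketch of that mapping follows, assuming an 8-bit-per-channel RGB framebuffer and a render pass with lighting, blending and anti-aliasing disabled so that colors come back unchanged; the function names are hypothetical.

```python
def id_to_rgb(object_id):
    """Pack an object ID (0 .. 2**24 - 1) into an (r, g, b) triple of bytes."""
    if not 0 <= object_id < (1 << 24):
        raise ValueError("object_id does not fit into 24 bits")
    return (object_id >> 16) & 0xFF, (object_id >> 8) & 0xFF, object_id & 0xFF


def rgb_to_id(r, g, b):
    """Recover the object ID from the color read back under the mouse pointer."""
    return (r << 16) | (g << 8) | b


if __name__ == "__main__":
    color = id_to_rgb(70000)
    print(color, rgb_to_id(*color))
```

Reserving one value (for example 0) for the background makes it possible to tell "nothing picked" apart from a real object.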
EDIT: If the question is how to construct the ray from the mouse coordinates you need the following: a projection matrix P and the camera transform C. If the coordinates of the mouse pointer is (x, y) and the size of the viewport is (width, height) one position in clip space along the ray is:
mouse_clip = [
float(x) * 2 / float(width) - 1,
1 - float(y) * 2 / float(height),
0,
1]
(Notice that I flipped the y-axis since often the origin of the mouse coordinates are in the upper left corner)
The following is also true:
mouse_clip = P * C * mouse_worldspace
Which gives:
mouse_worldspace = inverse(C) * inverse(P) * mouse_clip
We now have:
p = C.position(); //origin of camera in worldspace
n = normalize(mouse_worldspace - p); //unit vector from p through mouse pos in worldspace

Here's the viewing frustum:
First you need to determine where on the nearplane the mouse click happened:
rescale the window coordinates (0..640,0..480) to [-1,1], with (-1,-1) at the bottom-left corner and (1,1) at the top-right.
'undo' the projection by multiplying the scaled coordinates by what I call the 'unview' matrix: unview = (P * M).inverse() = M.inverse() * P.inverse(), where M is the ModelView matrix and P is the projection matrix.
Then determine where the camera is in worldspace, and draw a ray starting at the camera and passing through the point you found on the nearplane.
The camera is at M.inverse().col(4), i.e. the final column of the inverse ModelView matrix.
Final pseudocode:
normalised_x = 2 * mouse_x / win_width - 1
normalised_y = 1 - 2 * mouse_y / win_height
// note the y pos is inverted, so +y is at the top of the screen
unviewMat = (projectionMat * modelViewMat).inverse()
near_point = unviewMat * Vec(normalised_x, normalised_y, 0, 1)
camera_pos = ray_origin = modelViewMat.inverse().col(4)
ray_dir = near_point - camera_pos

Well, pretty simple, the theory behind this is always the same
1) Unproject two times your 2D coordinate onto the 3D space. (each API has its own function, but you can implement your own if you want). One at Min Z, one at Max Z.
2) With these two values calculate the vector that goes from Min Z and point to Max Z.
3) With the vector and a point calculate the ray that goes from Min Z to MaxZ
4) Now you have a ray, with this you can do a ray-triangle/ray-plane/ray-something intersection and get your result...
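Steps 1 to 3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not any API's own unproject function: it assumes OpenGL-style conventions (column vectors, normalized device z from -1 at the near plane to +1 at the far plane, camera looking down -z), whereas Direct3D uses a 0 to 1 depth range. All function names are hypothetical. Note the division by w after multiplying by the inverse matrix.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def invert(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(4)]
         for i, row in enumerate(m)]
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(4):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[4:] for row in a]


def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0, 0, -1, 0]]


def unproject(ndc_x, ndc_y, ndc_z, inv_pv):
    x, y, z, w = mat_vec(inv_pv, [ndc_x, ndc_y, ndc_z, 1.0])
    return (x / w, y / w, z / w)  # perspective divide


def pick_ray(mouse_x, mouse_y, width, height, proj, view):
    """Return (origin, unit direction) of the ray under the mouse, in world space."""
    ndc_x = 2.0 * mouse_x / width - 1.0
    ndc_y = 1.0 - 2.0 * mouse_y / height  # window y grows downwards
    inv_pv = invert(mat_mul(proj, view))
    near_pt = unproject(ndc_x, ndc_y, -1.0, inv_pv)  # Min Z
    far_pt = unproject(ndc_x, ndc_y, 1.0, inv_pv)    # Max Z
    d = [f - n for f, n in zip(far_pt, near_pt)]
    length = math.sqrt(sum(c * c for c in d))
    return near_pt, tuple(c / length for c in d)


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(pick_ray(320, 240, 640, 480, perspective(90, 1.0, 1.0, 100.0), identity))
```

The returned origin and direction can then be fed to whatever ray-triangle or ray-plane intersection routine step 4 calls for.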

I have little DirectX experience, but I'm sure it's similar to OpenGL. What you want is the gluUnProject call.
Assuming you have a valid Z buffer you can query the contents of the Z buffer at a mouse position with:
// obtain the viewport, modelview matrix and projection matrix
// you may keep the viewport and projection matrices throughout the program if you don't change them
GLint viewport[4];
GLdouble modelview[16];
GLdouble projection[16];
glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
glGetDoublev(GL_PROJECTION_MATRIX, projection);
// obtain the Z position (not world coordinates but in range 0 - 1)
GLfloat z_cursor;
glReadPixels(x_cursor, y_cursor, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &z_cursor);
// obtain the world coordinates
GLdouble x, y, z;
gluUnProject(x_cursor, y_cursor, z_cursor, modelview, projection, viewport, &x, &y, &z);
If you don't want to use GLU, you can also implement gluUnProject yourself; its functionality is relatively simple and is described at opengl.org.

Ok, this topic is old, but it was the best I found on the subject, and it helped me a bit, so I'll post here for those who are following ;-)
This is the way I got it to work without having to compute the inverse of Projection matrix:
void Application::leftButtonPress(u32 x, u32 y){
GL::Viewport vp = GL::getViewport(); // just a call to glGet GL_VIEWPORT
vec3f p = vec3f::from(
((float)(vp.width - x) / (float)vp.width),
((float)y / (float)vp.height),
1.);
// alternatively vec3f p = vec3f::from(
// ((float)x / (float)vp.width),
// ((float)(vp.height - y) / (float)vp.height),
// 1.);
p *= vec3f::from(APP_FRUSTUM_WIDTH, APP_FRUSTUM_HEIGHT, 1.);
p += vec3f::from(APP_FRUSTUM_LEFT, APP_FRUSTUM_BOTTOM, 0.);
// now p elements are in (-1, 1)
vec3f near = p * vec3f::from(APP_FRUSTUM_NEAR);
vec3f far = p * vec3f::from(APP_FRUSTUM_FAR);
// ray in world coordinates
Ray ray = { _camera->getPos(), -(_camera->getBasis() * (far - near).normalize()) };
_ray->set(ray.origin, ray.dir, 10000.); // this is a debugging vertex array to see the Ray on screen
Node* node = _scene->collide(ray, Transform());
cout << "node is : " << node << endl;
}
This assumes a perspective projection, but the question never arises for the orthographic one in the first place.

I've got the same situation with ordinary ray picking, but something is wrong. I've performed the unproject operation the proper way, but it just doesn't work. I think I've made some mistake, but can't figure out where. My matrix multiplication, inverse and vector-by-matrix multiplications all seem to work fine; I've tested them.
In my code I'm reacting to WM_LBUTTONDOWN, so lParam returns the [Y][X] coordinates as 2 words in a dword. I extract them, then convert to normalized space; I've checked that this part also works fine. When I click the lower left corner, I get values close to -1 -1, and good values for all 3 other corners. I'm then using the line_points.vtx array for debugging, and it's not even close to reality.
unsigned int x_coord=lParam&0x0000ffff; //X RAW COORD
unsigned int y_coord=client_area.bottom-(lParam>>16); //Y RAW COORD
double xn=((double)x_coord/client_area.right)*2-1; //X [-1 +1]
double yn=1-((double)y_coord/client_area.bottom)*2;//Y [-1 +1]
_declspec(align(16))gl_vec4 pt_eye(xn,yn,0.0,1.0);
gl_mat4 view_matrix_inversed;
gl_mat4 projection_matrix_inversed;
cam.matrixProjection.inverse(&projection_matrix_inversed);
cam.matrixView.inverse(&view_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&projection_matrix_inversed);
gl_mat4::vec4_multiply_by_matrix4(&pt_eye,&view_matrix_inversed);
line_points.vtx[line_points.count*4]=pt_eye.x-cam.pos.x;
line_points.vtx[line_points.count*4+1]=pt_eye.y-cam.pos.y;
line_points.vtx[line_points.count*4+2]=pt_eye.z-cam.pos.z;
line_points.vtx[line_points.count*4+3]=1.0;
