How to build a perspective projection matrix (no API) - math

I'm developing a simple 3D engine (without using any API). I have successfully transformed my scene into world and view space, but I have trouble projecting the scene from view space with an OpenGL-style perspective projection matrix. I'm not sure about the fov, near and far values, and the scene I get is distorted.
I hope someone can show me how to build and use the perspective projection matrix properly, with example code. Thanks in advance for any help.
Building the matrix:
double f = 1 / Math.Tan(fovy / 2); // fovy must be in radians (Math.Tan expects radians)
return new double[,] {
{ f / Aspect, 0, 0, 0 },
{ 0, f, 0, 0 },
{ 0, 0, (Far + Near) / (Near - Far), (2 * Far * Near) / (Near - Far) },
{ 0, 0, -1, 0 }
};
Using the matrix:
foreach (Point P in T.Points)
{
.
. // Transforming the point to a homogeneous point matrix, to world space, and to view space (works fine)
.
// projecting the point with GetProjectionMatrix() from the previous snippet:
double[,] matrix = MatrixMultiply( GetProjectionMatrix(Fovy, Width/Height, Near, Far) , viewSpacePointMatrix );
// converting from homogeneous to Cartesian coordinates (perspective divide):
matrix [0, 0] /= matrix [3, 0];
matrix [1, 0] /= matrix [3, 0];
matrix [2, 0] /= matrix [3, 0];
matrix [3, 0] = 1;
P = MatrixToPoint(matrix);
// adjusting to the screen Y axis:
P.y = this.Height - P.y;
// Printing...
}
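Two details in the usage code above could explain this kind of distortion. If Width and Height are integers, Width/Height is an integer division (640/480 gives 1), so the aspect ratio is wrong. And after the perspective divide the point is in normalized device coordinates, where the visible range is [-1, 1] on each axis, so it still needs a viewport transform to pixels; flipping y with Height - y alone is not enough. Below is a minimal C++ sketch of the whole path, assuming column vectors (as in MatrixMultiply(projection, point)), a camera looking down -z, and fovy in radians. Mat4, perspectiveGL, projectToScreen and ScreenPoint are hypothetical names, not part of the question's code.

```cpp
#include <array>
#include <cmath>

// Column-vector convention: clip = P * v, with P stored as P[row][col].
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

// OpenGL-style perspective matrix, same layout as the question's GetProjectionMatrix.
Mat4 perspectiveGL(double fovyRadians, double aspect, double zNear, double zFar)
{
    const double f = 1.0 / std::tan(fovyRadians / 2.0);
    Mat4 m{};  // all zeros
    m[0][0] = f / aspect;
    m[1][1] = f;
    m[2][2] = (zFar + zNear) / (zNear - zFar);
    m[2][3] = (2.0 * zFar * zNear) / (zNear - zFar);
    m[3][2] = -1.0;  // w' = -z
    return m;
}

Vec4 multiply(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row][col] * v[col];
    return r;
}

struct ScreenPoint { double x, y, depth; bool visible; };

// View space -> clip space -> NDC -> pixels (y = 0 is the top row).
ScreenPoint projectToScreen(const Vec4& viewPos, double fovyRadians,
                            int width, int height, double zNear, double zFar)
{
    // Floating-point aspect ratio: width / height on two ints would truncate.
    const double aspect = static_cast<double>(width) / height;
    const Vec4 clip = multiply(perspectiveGL(fovyRadians, aspect, zNear, zFar), viewPos);
    if (clip[3] <= 0.0)  // at or behind the camera plane: do not divide
        return {0.0, 0.0, 0.0, false};
    const double ndcX = clip[0] / clip[3];  // perspective divide
    const double ndcY = clip[1] / clip[3];
    const double ndcZ = clip[2] / clip[3];
    // Viewport transform: [-1, 1] -> [0, width] and [0, height], with y flipped.
    const double sx = (ndcX + 1.0) * 0.5 * width;
    const double sy = (1.0 - ndcY) * 0.5 * height;
    const bool visible = std::fabs(ndcX) <= 1.0 && std::fabs(ndcY) <= 1.0 && std::fabs(ndcZ) <= 1.0;
    return {sx, sy, ndcZ, visible};
}
```

A point straight ahead of the camera lands in the center of the screen, the near plane maps to depth -1 and the far plane to +1.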

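The following is a typical implementation of a perspective projection matrix. Note that it uses Direct3D-style conventions (row vectors, camera looking down +z, depth mapped to [0, 1]), whereas the matrix in the question follows OpenGL conventions (column vectors, camera looking down -z, depth mapped to [-1, 1]); don't mix the two.
The article "OpenGL Projection Matrix" explains everything in detail.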
void ComputeFOVProjection( Matrix& result, float fov, float aspect, float nearDist, float farDist, bool leftHanded /* = true */ )
{
//
// General form of the Projection Matrix (fov = vertical field of view, in radians)
//
// uh = Cot( fov/2 ) == 1/Tan(fov/2)
// uw / uh = 1/aspect
//
// uw 0 0 0
// 0 uh 0 0
// 0 0 f/(f-n) 1
// 0 0 -fn/(f-n) 0
//
// Make result the identity first, so that entries not written below are
// well defined and an identity matrix is returned on bad parameters.
for ( int row = 0; row < 4; ++row )
for ( int col = 0; col < 4; ++col )
result[row][col] = ( row == col ) ? 1.0f : 0.0f;
// Check for bad parameters to avoid dividing by zero:
// if found, assert and return the identity matrix.
if ( fov <= 0 || aspect == 0 || farDist == nearDist )
{
Assert( fov > 0 && aspect != 0 && farDist != nearDist );
return;
}
float frustumDepth = farDist - nearDist;
float oneOverDepth = 1 / frustumDepth;
result[1][1] = 1 / tan(0.5f * fov);
result[0][0] = (leftHanded ? 1 : -1 ) * result[1][1] / aspect;
result[2][2] = farDist * oneOverDepth;
result[3][2] = (-farDist * nearDist) * oneOverDepth;
result[2][3] = 1;
result[3][3] = 0;
}
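To see what the two depth entries do, one can push a view-space depth through them by hand. The helper below is a hypothetical C++ check, not part of the answer's code: it multiplies (x, y, z, 1) as a row vector, so only result[2][2], result[3][2] and result[2][3] = 1 matter for depth, and it shows the near plane landing at depth 0 and the far plane at depth 1 (the Direct3D range).

```cpp
#include <cmath>

// Depth z'/w' produced by the row-vector matrix above for a view-space depth z
// (camera looking down +z). Only the depth-related entries are used.
float depthAfterProjection(float z, float nearDist, float farDist)
{
    const float oneOverDepth = 1.0f / (farDist - nearDist);
    const float m22 = farDist * oneOverDepth;              // result[2][2]
    const float m32 = -farDist * nearDist * oneOverDepth;  // result[3][2]
    // Row vector (x, y, z, 1) times the matrix:
    //   z' = z * m22 + 1 * m32,  w' = z * result[2][3] = z
    const float zClip = z * m22 + m32;
    const float wClip = z;
    return zClip / wClip;
}
```

Depth grows non-linearly between the planes, so most of the [0, 1] range is spent close to the near plane.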

Another function that may be useful.
This one is based on left/right/top/bottom/near/far parameters (as used by OpenGL's glFrustum):
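// Forward declaration, since frustum() is defined after test().
static void frustum(float *m, int offset, float left, float right, float bottom, float top, float near, float far);
// width and height of the viewport to display on (screen dimensions in case of fullscreen rendering)
static void test(int width, int height){
float projectionMatrix[16];
float ratio = (float)width / height;
float left = -ratio;
float right = ratio;
float bottom = -1.0f;
float top = 1.0f;
float near = 1.0f; // must be > 0 for a perspective projection
float far = 100.0f;
frustum(projectionMatrix, 0, left, right, bottom, top, near, far);
}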
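// OpenGL-style (glFrustum) perspective matrix for column vectors.
// The 16 floats are stored row-major: m[offset + row * 4 + col].
// Transpose them before passing them to an API that expects column-major order.
static void frustum(float *m, int offset,
float left, float right, float bottom, float top,
float near, float far) {
float r_width = 1.0f / (right - left);
float r_height = 1.0f / (top - bottom);
float r_depth = 1.0f / (far - near);
float x = 2.0f * near * r_width;        // 2n / (r - l)
float y = 2.0f * near * r_height;       // 2n / (t - b)
float A = (right + left) * r_width;     // (r + l) / (r - l)
float B = (top + bottom) * r_height;    // (t + b) / (t - b)
float C = -(far + near) * r_depth;      // -(f + n) / (f - n)
float D = -2.0f * far * near * r_depth; // -2fn / (f - n)
m[offset + 0] = x;     m[offset + 1] = 0.0f;  m[offset + 2] = A;      m[offset + 3] = 0.0f;
m[offset + 4] = 0.0f;  m[offset + 5] = y;     m[offset + 6] = B;      m[offset + 7] = 0.0f;
m[offset + 8] = 0.0f;  m[offset + 9] = 0.0f;  m[offset + 10] = C;     m[offset + 11] = D;
// Last row copies -z into w, so the perspective divide happens later.
m[offset + 12] = 0.0f; m[offset + 13] = 0.0f; m[offset + 14] = -1.0f; m[offset + 15] = 0.0f;
}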
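To drive a frustum-style function from a field of view instead of explicit bounds (as the question's GetProjectionMatrix does), the symmetric bounds at the near plane follow from top = near * tan(fovy / 2) and right = top * aspect. With those bounds, the standard glFrustum entries 2n/(r-l) and 2n/(t-b) reduce to f/aspect and f, i.e. the same matrix as in the question. A small C++ sketch; FrustumBounds and boundsFromFov are hypothetical names:

```cpp
#include <cmath>

// Symmetric frustum bounds at the near plane for a vertical field of view
// (in radians) and an aspect ratio (width / height).
struct FrustumBounds { float left, right, bottom, top; };

FrustumBounds boundsFromFov(float fovyRadians, float aspect, float zNear)
{
    const float top = zNear * std::tan(fovyRadians / 2.0f);  // half-height at the near plane
    const float right = top * aspect;                        // half-width at the near plane
    return {-right, right, -top, top};
}
```

The near distance cancels out of 2n/(r-l) and 2n/(t-b), which is why the fovy-based matrix does not need the bounds at all.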

Related

My raycaster renders walls in a really weird way, depending on the map size (I guess)

I've been writing a raycaster in C++, and I use GDI/GDI+ to render it. I know that using GDI for graphics is not the best idea in the world and that I should probably use OpenGL, SFML, etc., but this raycaster doesn't involve any demanding real-time graphics, so GDI does the job here. Besides, I will probably be showing this at my school, and installing OpenGL there would be a huge pain.
Okay, so the actual problem I wanted to talk about: whenever I change the map grid from 8x8 to e.g. 8x16, some walls are rendered in a pretty bizarre way.
If someone can explain why this issue occurs, I would be very happy to find out what's wrong with my code.
main.cpp
/*
* Pseudo-code of the void renderer():
* Horizontal gridline check:
* Set horizontal distance to a pretty high value, horizontal coordinates to camera coordinates
* Calculate negative inverse of tangent
* Set DOF variable to 0
* If ray angle is bigger than PI, calculate ray Y-coordinate to be as close as possible to the gridline position and subtract 0.0001 for precision, then calculate ray X-coordinate and offset coordinates for the ray movement over the gridline
* If ray angle is smaller than PI, do the same as if ray angle > PI, but add the map size (map_s) to ray Y-coordinate instead of subtracting 0.0001
* If ray angle is straight up or down set ray coordinates to camera coordinates and DOF to map size
* Loop only if DOF is smaller than map size:
* Calculate actual gridline coordinates
* If the grid cell at [X, Y] is a wall break out from the loop, save the current ray coordinates, calculate the distance between the camera and the wall
* Else update ray coordinates with the earlier calculated offsets
*
* Vertical gridline check:
* Set vertical distance to a pretty high value, vertical coordinates to camera coordinates
* Calculate inverse of tangent
* Set DOF variable to 0
* If ray angle is bigger than PI / 2 and smaller than 3 * PI / 2, calculate ray X-coordinate to be as close as possible to the gridline position and subtract 0.0001 for precision, then calculate ray Y-coordinate and offset coordinates for the ray movement over the gridline
* If ray angle is smaller than PI / 2 or bigger than 3 * PI / 2, do the same as if ray angle > PI / 2 && < 3 * PI / 2, but add the map size (map_s) to ray X-coordinate instead of subtracting 0.0001
* If ray angle is straight left or right set ray coordinates to camera coordinates and DOF to map size
* Loop only if DOF is smaller than map size:
* Calculate actual gridline coordinates
* If the grid cell at [X, Y] is a wall break out from the loop, save the current ray coordinates, calculate the distance between the camera and the wall
* Else update ray coordinates with the earlier calculated offsets
*
* If the vertical distance is smaller than the horizontal one, update ray coordinates to the vertical ones and set final distance to the vertical one
* Else update ray coordinates to the horizontal ones and set final distance to the horizontal one
* Fix fisheye effect
* Add one degree (RAD radians) to the ray angle
* Calculate line height by multiplying constant integer 400 by the map size and dividing that by the final distance
* Calculate line offset (to make it more centered) by subtracting half of the line height from constant integer 200
* Draw an 8-pixel-wide column from [ray index * 8, camera Z-offset + line offset] to [ray index * 8, camera Z-offset + line offset + line height] (the color doesn't matter, I think)
*/
#include "../../LIB/wsgl.hpp"
#include "res/maths.hpp"
#include <memory>
using namespace std;
const int window_x = 640, window_y = 640;
float camera_x = 256, camera_y = 256, camera_z = 75;
float camera_a = 0.001;
int camera_fov = 80;
int map_x;
int map_y;
int map_s;
shared_ptr<int[]> map_w;
void controls()
{
if(wsgl::is_key_down(wsgl::key::w))
{
int mx = (camera_x + 30 * cos(camera_a)) / map_s;
int my = (camera_y + 30 * sin(camera_a)) / map_s;
int mp = my * map_x + mx;
if(mp >= 0 && mp < map_s && !map_w[mp])
{camera_x += 15 * cos(camera_a); camera_y += 15 * sin(camera_a);}
}
if(wsgl::is_key_down(wsgl::key::s))
{
int mx = (camera_x - 30 * cos(camera_a)) / map_s;
int my = (camera_y - 30 * sin(camera_a)) / map_s;
int mp = my * map_x + mx;
if(mp >= 0 && mp < map_s && !map_w[mp])
{camera_x -= 5 * cos(camera_a); camera_y -= 5 * sin(camera_a);}
}
if(wsgl::is_key_down(wsgl::key::a_left))
{camera_a = reset_ang(camera_a - 5 * RAD);}
if(wsgl::is_key_down(wsgl::key::a_right))
{camera_a = reset_ang(camera_a + 5 * RAD);}
if(wsgl::is_key_down(wsgl::key::a_up))
{camera_z += 15;}
if(wsgl::is_key_down(wsgl::key::a_down))
{camera_z -= 15;}
}
void renderer()
{
int map_x_pos, map_y_pos, map_cell, dof;
float ray_x, ray_y, ray_a = reset_ang(camera_a - deg_to_rad(camera_fov / 2));
float x_offset, y_offset, tangent, distance_h, distance_v, h_x, h_y, v_x, v_y;
float final_distance, line_height, line_offset;
wsgl::clear_window();
for(int i = 0; i < camera_fov; i++)
{
distance_h = 1000000, h_x = camera_x, h_y = camera_y;
tangent = -1 / tan(ray_a);
dof = 0;
if(ray_a > PI)
{ray_y = (((int)camera_y / map_s) * map_s) - 0.0001; ray_x = (camera_y - ray_y) * tangent + camera_x; y_offset = -map_s; x_offset = -y_offset * tangent;}
if(ray_a < PI)
{ray_y = (((int)camera_y / map_s) * map_s) + map_s; ray_x = (camera_y - ray_y) * tangent + camera_x; y_offset = map_s; x_offset = -y_offset * tangent;}
if(ray_a == 0 || ray_a == PI)
{ray_x = camera_x; ray_y = camera_y; dof = map_s;}
for(dof; dof < map_s; dof++)
{
map_x_pos = (int)(ray_x) / map_s;
map_y_pos = (int)(ray_y) / map_s;
map_cell = map_y_pos * map_x + map_x_pos;
if(map_cell >= 0 && map_cell < map_s && map_w[map_cell])
{dof = map_s; h_x = ray_x; h_y = ray_y; distance_h = distance(camera_x, camera_y, h_x, h_y);}
else
{ray_x += x_offset; ray_y += y_offset;}
}
distance_v = 1000000, v_x = camera_x, v_y = camera_y;
tangent = -tan(ray_a);
dof = 0;
if(ray_a > PI2 && ray_a < PI3)
{ray_x = (((int)camera_x / map_s) * map_s) - 0.0001; ray_y = (camera_x - ray_x) * tangent + camera_y; x_offset = -map_s; y_offset = -x_offset * tangent;}
if(ray_a < PI2 || ray_a > PI3)
{ray_x = (((int)camera_x / map_s) * map_s) + map_s; ray_y = (camera_x - ray_x) * tangent + camera_y; x_offset = map_s; y_offset = -x_offset * tangent;}
if(ray_a == PI2 || ray_a == PI3)
{ray_x = camera_x; ray_y = camera_y; dof = map_s;}
for(dof; dof < map_s; dof++)
{
map_x_pos = (int)(ray_x) / map_s;
map_y_pos = (int)(ray_y) / map_s;
map_cell = map_y_pos * map_x + map_x_pos;
if(map_cell >= 0 && map_cell < map_s && map_w[map_cell])
{dof = map_s; v_x = ray_x; v_y = ray_y; distance_v = distance(camera_x, camera_y, v_x, v_y);}
else
{ray_x += x_offset; ray_y += y_offset;}
}
if(distance_v < distance_h)
{ray_x = v_x; ray_y = v_y; final_distance = distance_v;}
else
{ray_x = h_x; ray_y = h_y; final_distance = distance_h;}
final_distance *= cos(reset_ang(camera_a - ray_a));
ray_a = reset_ang(ray_a + RAD);
line_height = (map_s * 400) / final_distance;
line_offset = 200 - line_height / 2;
wsgl::draw_line({i * 8, camera_z + line_offset}, {i * 8, camera_z + line_offset + line_height}, {0, 255 / (final_distance / 250 + 1), 0}, 8);
if(i == camera_fov / 2)
{wsgl::draw_text({0, 0}, {255, 255, 255}, L"Final distance: " + to_wstring(final_distance) + L" Line height: " + to_wstring(line_height) + L" X: " + to_wstring(camera_x) + L" Y: " + to_wstring(camera_y));}
}
wsgl::render_frame();
}
void load_map(wsgl::wide_str wstr, int cell_size = 1)
{
shared_ptr<wsgl::bmp> map = shared_ptr<wsgl::bmp>(wsgl::bmp::FromFile(wstr.c_str(), true));
map_x = map->GetWidth();
map_y = map->GetHeight();
map_s = map_x * map_y;
map_w = shared_ptr<int[]>(new int[map_s]);
wsgl::color color;
for(int y = 0; y < map_y; y += cell_size)
{
for(int x = 0; x < map_x; x += cell_size)
{
map->GetPixel(x, y, &color);
if(color.GetR() == 255 && color.GetG() == 255 && color.GetB() == 255)
{*(map_w.get() + ((y / cell_size) * map_x + (x / cell_size))) = 0;}
else
{*(map_w.get() + ((y / cell_size) * map_x + (x / cell_size))) = 1;}
}
}
}
int main()
{
wsgl::session sess = wsgl::startup(L"raycaster", {window_x, window_y});
load_map(L"res/map.png");
while(true)
{controls(); renderer();}
}
maths.hpp
#include <cmath>
const float PI = 3.14159265359;
const float PI2 = PI / 2;
const float PI3 = 3 * PI2;
const float RAD = PI / 180;
float deg_to_rad(float deg)
{return deg * RAD;}
float distance(float ax, float ay, float bx, float by)
{
float dx = bx - ax;
float dy = by - ay;
return sqrt(dx * dx + dy * dy);
}
float reset_ang(float ang)
{
if(ang < 0)
{ang += 2 * PI;}
if(ang > 2 * PI)
{ang -= 2 * PI;}
return ang;
}
If anyone asks what wsgl.hpp is: it's just my wrapper library around some GDI routines, etc.
I think the problem lies here:
map_x_pos = (int)(ray_x) / map_s;
map_y_pos = (int)(ray_y) / map_s;
map_cell = map_y_pos * map_x + map_x_pos;
You need to change the order of operations:
map_x_pos = (int)(ray_x / map_s);
map_y_pos = (int)(ray_y / map_s);
map_cell = map_y_pos * map_x + map_x_pos;
With your current implementation, you first truncate ray_x and ray_y to integers and then divide by map_s using integer division. As long as map_s is a positive integer this happens to give the same cell as dividing first, but it relies on map_s being an integer (it should probably be a floating-point value): as soon as it isn't, truncating before dividing can pick the wrong cell. Dividing first and truncating once says what you mean.
Additionally, map_s seems incorrect. You set map_s to the total area of your map (map_x * map_y), but in the code above you use it as if it were a side length.
To be correct, you would need something like
#include <cmath>
map_x_pos = (int)(ray_x / sqrtf(map_s));
map_y_pos = (int)(ray_y / sqrtf(map_s));
map_cell = map_y_pos * map_x + map_x_pos;
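One caveat about this: sqrtf(map_s) equals a side length only for a square map, and the 8x16 map from the question gives sqrtf(128), about 11.3, which is neither dimension. In the lines quoted above, map_s is really playing the role of the size of one cell in world units, which need not depend on the map dimensions at all. A minimal C++ sketch that keeps a separate cell size, assuming a hypothetical CELL_SIZE of 64 world units (what map_s happened to be for the 8x8 map, which is presumably why that map looked right). cell_index and first_horizontal_hit are hypothetical names; the latter follows the question's horizontal gridline step.

```cpp
#include <cmath>

// Hypothetical: size of one map cell in world units, independent of map_x/map_y.
const float CELL_SIZE = 64.0f;

// Index into the map array for a world-space point, or -1 if the point lies
// outside a map that is map_x cells wide and map_y cells tall.
int cell_index(float world_x, float world_y, int map_x, int map_y)
{
    // Divide first, then floor once (floor also handles negative coordinates).
    const int cx = static_cast<int>(std::floor(world_x / CELL_SIZE));
    const int cy = static_cast<int>(std::floor(world_y / CELL_SIZE));
    if (cx < 0 || cy < 0 || cx >= map_x || cy >= map_y)
        return -1;
    return cy * map_x + cx;
}

// First horizontal gridline hit by the ray and the step to the next one,
// using the same formulas as the question but stepping by CELL_SIZE.
// ray_a is in radians; callers must skip exactly horizontal rays (0 and PI).
void first_horizontal_hit(float cam_x, float cam_y, float ray_a,
                          float& ray_x, float& ray_y, float& step_x, float& step_y)
{
    const float PI = 3.14159265359f;
    const float neg_inv_tan = -1.0f / std::tan(ray_a);
    const float row_start = std::floor(cam_y / CELL_SIZE) * CELL_SIZE;
    if (ray_a > PI) {  // ray heads towards smaller y
        ray_y = row_start - 0.0001f;
        step_y = -CELL_SIZE;
    } else {           // ray heads towards larger y
        ray_y = row_start + CELL_SIZE;
        step_y = CELL_SIZE;
    }
    ray_x = (cam_y - ray_y) * neg_inv_tan + cam_x;
    step_x = -step_y * neg_inv_tan;
}
```

Presumably the same constant would also replace map_s in the line-height formula (map_s * 400) / final_distance, while the map area can still serve as the bound for array indices. cell_index also checks the x and y cell coordinates separately, so a ray leaving the map on the right does not wrap into the next row, which a single map_cell < map_s check allows.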

Different results on GPU and CPU with 8 or more work items per group

I'm new to OpenCL. As my first project, I tried to write code that checks for intersections between many polylines and a single polygon.
I'm running the code on both the CPU and the GPU, and I get different results.
At first I passed NULL as the local work size when calling clEnqueueNDRangeKernel:
clEnqueueNDRangeKernel(command_queue, kIntersect, 1, NULL, &global, NULL, 2, &evtCalcBounds, &evtKernel);
After trying many things, I saw that if I pass 1 as the local work size, it works well and returns the same results on the CPU and the GPU:
size_t local = 1;
clEnqueueNDRangeKernel(command_queue, kIntersect, 1, NULL, &global, &local, 2, &evtCalcBounds, &evtKernel);
After playing a bit more, I found that the CPU returns wrong results when I run the kernel with a local work size of 8 or more (for some reason).
I'm not using any local memory, just global and private memory.
I didn't add the code because I think it is irrelevant to the problem (note that it works fine with a single work item per group), and it is long. If it is needed, I will try to simplify it.
The code flow goes like this:
I have the polyline coordinates stored in one big buffer and the single polygon in another. In addition, I provide another buffer with a single int that holds the current result count. All buffers are __global arguments.
In the kernel, I simply check for intersections between all the segments of polyline[get_global_id(0)] and the edges of the polygon. If there is one,
I use atomic_inc to increment the result count. No buffer is both read from and written to, there are no barriers or memory fences, and atomic_inc is the only thread-safe mechanism I'm using.
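One thing worth ruling out before suspecting the kernel: with atomic_inc, every work item gets a unique slot, but the order in which indices land in output depends on scheduling, which differs between devices and between local sizes. Comparing the CPU and GPU output arrays element by element can therefore report differences even when both found the same polylines. The C++ sketch below uses std::thread as a stand-in for work items to illustrate the pattern (append_matches is a hypothetical name), and compares results as a set by sorting them first.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// CPU analogue of: int oldVal = atomic_inc(resCount); output[oldVal] = i;
// fetch_add hands every match a unique slot, so nothing is lost, but the
// order of the indices depends on how the threads happen to run.
std::vector<int> append_matches(int count, int num_threads, bool (*is_match)(int))
{
    std::vector<int> output(count);
    std::atomic<int> res_count{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = t; i < count; i += num_threads)  // strided "work items"
                if (is_match(i))
                    output[res_count.fetch_add(1)] = i;    // atomic_inc + store
        });
    }
    for (std::thread& w : workers)
        w.join();
    output.resize(res_count.load());
    // Only the set of indices is deterministic, so sort before comparing runs.
    std::sort(output.begin(), output.end());
    return output;
}
```

Separately, output is declared as __global float*, so the indices are stored as floats; integers are exact in a 32-bit float only up to 2^24 (16,777,216).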
-- UPDATE --
Added my code:
I know I could probably make better use of OpenCL built-in functions for some of the vector math, but for now I'm simply converting code from my old single-threaded CPU program to OpenCL, so that's not my concern right now.
bool isPointInPolygon(float x, float y, __global float* polygon) {
bool blnInside = false;
uint length = convert_uint(polygon[4]);
int s = 5;
uint j = length - 1;
for (uint i = 0; i < length; j = i++) {
uint realIdx = s + i * 2;
uint realInvIdx = s + j * 2;
if (((polygon[realIdx + 1] > y) != (polygon[realInvIdx + 1] > y)) &&
(x < (polygon[realInvIdx] - polygon[realIdx]) * (y - polygon[realIdx + 1]) / (polygon[realInvIdx + 1] - polygon[realIdx + 1]) + polygon[realIdx]))
blnInside = !blnInside;
}
return blnInside;
}
bool isRectanglesIntersected(float p_dblMinX1, float p_dblMinY1,
float p_dblMaxX1, float p_dblMaxY1,
float p_dblMinX2, float p_dblMinY2,
float p_dblMaxX2, float p_dblMaxY2) {
bool blnResult = true;
if (p_dblMinX1 > p_dblMaxX2 ||
p_dblMaxX1 < p_dblMinX2 ||
p_dblMinY1 > p_dblMaxY2 ||
p_dblMaxY1 < p_dblMinY2) {
blnResult = false;
}
return blnResult;
}
bool isLinesIntersects(
double Ax, double Ay,
double Bx, double By,
double Cx, double Cy,
double Dx, double Dy) {
double distAB, theCos, theSin, newX, ABpos;
// Fail if either line is undefined.
if (Ax == Bx && Ay == By || Cx == Dx && Cy == Dy)
return false;
// (1) Translate the system so that point A is on the origin.
Bx -= Ax; By -= Ay;
Cx -= Ax; Cy -= Ay;
Dx -= Ax; Dy -= Ay;
// Discover the length of segment A-B.
distAB = sqrt(Bx*Bx + By*By);
// (2) Rotate the system so that point B is on the positive X axis.
theCos = Bx / distAB;
theSin = By / distAB;
newX = Cx*theCos + Cy*theSin;
Cy = Cy*theCos - Cx*theSin; Cx = newX;
newX = Dx*theCos + Dy*theSin;
Dy = Dy*theCos - Dx*theSin; Dx = newX;
// Fail if the lines are parallel.
return (Cy != Dy);
}
bool isPolygonInersectsPolyline(__global float* polygon, __global float* polylines, uint startIdx) {
uint polylineLength = convert_uint(polylines[startIdx]);
uint start = startIdx + 1;
float x1 = polylines[start];
float y1 = polylines[start + 1];
float x2;
float y2;
int polygonLength = convert_uint(polygon[4]);
int polygonLength2 = polygonLength * 2;
int startPolygonIdx = 5;
for (int currPolyineIdx = 0; currPolyineIdx < polylineLength - 1; currPolyineIdx++)
{
x2 = polylines[start + (currPolyineIdx*2) + 2];
y2 = polylines[start + (currPolyineIdx*2) + 3];
float polyX1 = polygon[0];
float polyY1 = polygon[1];
for (int currPolygonIdx = 0; currPolygonIdx < polygonLength; ++currPolygonIdx)
{
float polyX2 = polygon[startPolygonIdx + (currPolygonIdx * 2 + 2) % polygonLength2];
float polyY2 = polygon[startPolygonIdx + (currPolygonIdx * 2 + 3) % polygonLength2];
if (isLinesIntersects(x1, y1, x2, y2, polyX1, polyY1, polyX2, polyY2)) {
return true;
}
polyX1 = polyX2;
polyY1 = polyY2;
}
x1 = x2;
y1 = y2;
}
// No intersection found till now so we check containing
return isPointInPolygon(x1, y1, polygon);
}
__kernel void calcIntersections(__global float* polylines, // My flat points array - [pntCount, x,y,x,y,...., pntCount, x,y,... ]
__global float* pBounds, // The rectangle bounds of each polyline - set of 4 values [top, left, bottom, right....]
__global uint* pStarts, // The start index of each polyline in the polylines array
__global float* polygon, // The polygon i want to intersect with - first 4 items are the rectangle bounds [top, left, bottom, right, pntCount, x,y,x,y,x,y....]
__global float* output, // Result array for saving the intersections polylines indices
__global uint* resCount) // The result count
{
int i = get_global_id(0);
uint start = convert_uint(pStarts[i]);
if (isRectanglesIntersected(pBounds[i * 4], pBounds[i * 4 + 1], pBounds[i * 4 + 2], pBounds[i * 4 + 3],
polygon[0], polygon[1], polygon[2], polygon[3])) {
if (isPolygonInersectsPolyline(polygon, polylines, start)){
int oldVal = atomic_inc(resCount);
output[oldVal] = i;
}
}
}
Can anyone explain this?
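Independently of the CPU/GPU difference, two things in the posted code look wrong on any device. First, isLinesIntersects stops after the rotation step: it returns Cy != Dy, so it only rejects parallel lines and reports every other pair as intersecting (ABpos is declared but never used). Second, isPolygonInersectsPolyline starts the polygon edges at polygon[0] and polygon[1], which according to the kernel's comment are bounds (top, left), not the first vertex at polygon[5] and polygon[6]. Also note that double in OpenCL C requires device support (the cl_khr_fp64 extension), which is worth checking on both devices. A C++ sketch of the completed segment test, following the same translate-and-rotate approach; segments_intersect is a hypothetical name:

```cpp
#include <cmath>

// Segment A-B vs segment C-D. After translating A to the origin and rotating
// B onto the positive x axis, C-D must cross the x axis, and the crossing
// must lie between x = 0 and x = |AB|.
bool segments_intersect(double Ax, double Ay, double Bx, double By,
                        double Cx, double Cy, double Dx, double Dy)
{
    if ((Ax == Bx && Ay == By) || (Cx == Dx && Cy == Dy))
        return false;  // degenerate segment
    // (1) Translate the system so that point A is at the origin.
    Bx -= Ax; By -= Ay;
    Cx -= Ax; Cy -= Ay;
    Dx -= Ax; Dy -= Ay;
    const double distAB = std::sqrt(Bx * Bx + By * By);
    // (2) Rotate the system so that point B is on the positive x axis.
    const double theCos = Bx / distAB, theSin = By / distAB;
    double newX = Cx * theCos + Cy * theSin;
    Cy = Cy * theCos - Cx * theSin; Cx = newX;
    newX = Dx * theCos + Dy * theSin;
    Dy = Dy * theCos - Dx * theSin; Dx = newX;
    // C-D must have its endpoints on opposite sides of the x axis
    // (this also rejects parallel and collinear segments).
    if ((Cy < 0.0 && Dy < 0.0) || (Cy >= 0.0 && Dy >= 0.0))
        return false;
    // (3) Position of the crossing along A-B; Dy != Cy is guaranteed here.
    const double ABpos = Dx + (Cx - Dx) * Dy / (Dy - Cy);
    return ABpos >= 0.0 && ABpos <= distAB;
}
```

Collinear overlapping segments are reported as not intersecting, as in the original approach.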

Deconstructing the Google Maps Smarty Pins animation

Updates
I updated the fiddle to simplify what is going on:
added four buttons to move the stick; each button moves it by 30 in its direction
plotted the x and y axes
red line is the stick, with bottom end coordinates at (ax, ay) and top end coordinates at (bx, by)
green line is (presumably) the previous position of the stick, with bottom end coordinates at (ax, ay) and top end coordinates at (bx0, by0)
So, after my ninja moments, I'm still nowhere near understanding the sorcery behind unknownFunctionA and unknownFunctionB.
For the sake of everyone (all two of you), here is what I've sort of learned so far:
function unknownFunctionB(e) {
var t = e.b.x - e.a.x
, n = e.b.y - e.a.y
, a = t * t + n * n;
if (a > 0) {
if (a == e.lengthSq)
return;
var o = Math.sqrt(a)
, i = (o - e.length) / o
, s = .5;
e.b.x -= t * i * .5 * s,
e.b.y -= n * i * .5 * s
}
}
In unknownFunctionB above, variable o is the length of the red stick.
What I still don't understand:
What is variable i, and how is (bx, by) calculated? Essentially:
bx = bx - (bx - ax) * i * 0.5 * 0.5
by = by - (by - ay) * i * 0.5 * 0.5
In unknownFunctionA, what are the magic numbers 1.825 and 0.825?
Below is irrelevant
I'm trying to deconstruct the marker drag animation used on Smarty Pins.
I've managed to get the relevant code for the marker move animation, but I'm struggling to understand how it all works, especially two functions (which I've named unknownFunctionA and unknownFunctionB).
Here's the StickModel class used on the Smarty Pins website, unminified to the best of my knowledge:
function unknownFunctionA(e) {
var t = 1.825
, n = .825
, a = t * e.x - n * e.x0
, o = t * e.y - n * e.y0 - 5;
e.x0 = e.x,
e.y0 = e.y,
e.x = a,
e.y = o;
}
function unknownFunctionB(e) {
var t = e.b.x - e.a.x
, n = e.b.y - e.a.y
, a = t * t + n * n;
if (a > 0) {
if (a == e.lengthSq)
return;
var o = Math.sqrt(a)
, i = (o - e.length) / o
, s = .5;
e.b.x -= t * i * .5 * s,
e.b.y -= n * i * .5 * s
}
}
function StickModel() {
this._props = function(e) {
return {
length: e,
lengthSq: e * e,
a: {
x: 0,
y: 0
},
b: {
x: 0,
y: 0 - e,
x0: 0,
y0: 0 - e
},
angle: 0
}
}
(60)
}
var radianToDegrees = 180 / Math.PI;
StickModel.prototype = {
pos: {
x: 0,
y: 0
},
angle: function() {
return this._props.angle
},
reset: function(e, t) {
var n = e - this._props.a.x
, a = t - this._props.a.y;
this._props.a.x += n,
this._props.a.y += a,
this._props.b.x += n,
this._props.b.y += a,
this._props.b.x0 += n,
this._props.b.y0 += a
},
move: function(e, t) {
this._props.a.x = e,
this._props.a.y = t
},
update: function() {
unknownFunctionA(this._props.b),
unknownFunctionB(this._props),
this.pos.x = this._props.a.x,
this.pos.y = this._props.a.y;
var e = this._props.b.x - this._props.a.x
, t = this._props.b.y - this._props.a.y
, o = Math.atan2(t, e);
this._props.angle = o * radianToDegrees;
}
}
StickModel.prototype.constructor = StickModel;
Fiddle link with sample implementation on canvas: http://jsfiddle.net/vff1w82w/3/
Again, everything works as expected; I'm just really curious to learn the following:
What would be good names for unknownFunctionA and unknownFunctionB, and what exactly do they do?
What are the magic numbers in unknownFunctionA (1.825 and .825) and the .5 in unknownFunctionB?
Variable o in unknownFunctionB appears to be the hypotenuse. If that's the case, what exactly is i = (o - e.length) / o, in other words i = (hypotenuse - stickLength) / hypotenuse?
The first thing I'd recommend is renaming all those variables and methods until they start making sense. I also removed unused code.
oscillator
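adds wobble to the Stick model by computing new position values for the end of the stick (the seeker) that follows the mouse
Exaggerates its movement by multiplying its current position by 1.825 and subtracting its previous position (its "echo") multiplied by 0.825. Since 1.825 = 1 + 0.825, this continues the last movement at 82.5% of its speed rather than looking for a middle point. Helium (-5 added to y every step; canvas y grows downward) pulls the seeker up, which makes the stick sit upright.
overshooter minus undershooter must equal 1, or you will have orientation problems with your stick: for a seeker at rest (x == echo_x), the update returns (overshooter - undershooter) * x, which only stays put when the difference is 1. overshooter values above 2.1 tend to make it never settle.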
seekerUpdate
updates the seeker according to mouse positions.
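The distance_to_cover variable is the current distance between the seeker and the mouse, i.e. the current length of the stick. You were right: it's the hypotenuse (variable o).
The ratio variable, (distance_to_cover - stick.length) / distance_to_cover, is the fraction of that distance by which the stick is too long (or, if negative, too short). Moving the seeker back by dX * ratio and dY * ratio would restore exactly stick.length, so the ratio limits the adjustment of the seeker in both directions (x and y) to what is needed, which prevents overshooting the target.
easing (0.25, i.e. the original .5 * .5) applies only a quarter of that correction per frame, which slows the correction down.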
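A C++ transcription of seekerUpdate makes the effect of ratio and easing easy to check numerically: with easing = 1 one update leaves the seeker exactly stick.length from the mouse, and with easing = 0.25 it moves a quarter of the way. The names are hypothetical, and the zero-distance guard mirrors the if (a > 0) check in the original minified code.

```cpp
#include <cmath>

struct Vec2 { double x, y; };

// seekerUpdate for a stick anchored at `mouse` with rest length `length`.
void seeker_update(Vec2& seeker, Vec2 mouse, double length, double easing)
{
    const double dX = seeker.x - mouse.x;
    const double dY = seeker.y - mouse.y;
    const double distance_to_cover = std::sqrt(dX * dX + dY * dY);
    if (distance_to_cover == 0.0)
        return;  // direction undefined; avoid dividing by zero
    const double ratio = (distance_to_cover - length) / distance_to_cover;
    seeker.x -= dX * ratio * easing;
    seeker.y -= dY * ratio * easing;
}

double dist(Vec2 a, Vec2 b) { return std::hypot(a.x - b.x, a.y - b.y); }
```

For a seeker 100 units above the mouse and a 60-unit stick, ratio is 0.4: full easing moves it 40 units down, easing 0.25 moves it 10.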
The book The Nature of Code has a lot of interesting material on vectors.
function oscillator(seeker) {
var overshooter = 1.825;
var undershooter = .825;
var helium = -5;
var new_seeker_x = overshooter * seeker.x - undershooter * seeker.echo_x;
var new_seeker_y = overshooter * seeker.y - undershooter * seeker.echo_y + helium;
seeker.echo_x = seeker.x;
seeker.echo_y = seeker.y;
seeker.x = new_seeker_x;
seeker.y = new_seeker_y;
}
function seekerUpdate(stick) {
var dX = stick.seeker.x - stick.mouse_pos.x;
var dY = stick.seeker.y - stick.mouse_pos.y;
var distance_to_cover = Math.sqrt(dX * dX + dY * dY);
if (distance_to_cover === 0) return; // seeker on the mouse: direction undefined, avoid dividing by zero
var ratio = (distance_to_cover - stick.length) / distance_to_cover;
var easing = .25;
stick.seeker.x -= dX * ratio * easing;
stick.seeker.y -= dY * ratio * easing;
}
function StickModel() {
this._props = function(length) {
return {
length: length,
lengthSq: length * length,
mouse_pos: {
x: 0,
y: 0
},
seeker: {
x: 0,
y: 0 - length,
echo_x: 0,
echo_y: 0 - length
}
}
}(60)
}
StickModel.prototype = {
move: function(x, y) {
this._props.mouse_pos.x = x;
this._props.mouse_pos.y = y;
},
update: function() {
oscillator(this._props.seeker);
seekerUpdate(this._props);
}
};
StickModel.prototype.constructor = StickModel;
// Canvas to draw stick model coordinates
var canvas = document.getElementById('myCanvas');
var context = canvas.getContext('2d');
canvas.width = window.outerWidth;
canvas.height = window.outerHeight;
var canvasCenterX = Math.floor(canvas.width / 2);
var canvasCenterY = Math.floor(canvas.height / 2);
context.translate(canvasCenterX, canvasCenterY);
var stickModel = new StickModel();
draw();
setInterval(function() {
stickModel.update();
draw();
}, 16);
$(window).mousemove(function(e) {
var mouseX = (e.pageX - canvasCenterX);
var mouseY = (e.pageY - canvasCenterY);
stickModel.move(mouseX, mouseY);
stickModel.update();
draw();
});
function draw() {
context.clearRect(-canvas.width, -canvas.height, canvas.width * 2, canvas.height * 2);
// red line from mouse_pos to seeker (the stick)
context.beginPath();
context.strokeStyle = "#ff0000";
context.moveTo(stickModel._props.mouse_pos.x, stickModel._props.mouse_pos.y);
context.lineTo(stickModel._props.seeker.x, stickModel._props.seeker.y);
context.fillText('mouse_pos x:' + stickModel._props.mouse_pos.x + ' y: ' + stickModel._props.mouse_pos.y, stickModel._props.mouse_pos.x, stickModel._props.mouse_pos.y);
context.fillText('seeker x:' + stickModel._props.seeker.x + ' y: ' + stickModel._props.seeker.y, stickModel._props.seeker.x - 30, stickModel._props.seeker.y);
context.lineWidth = 1;
context.stroke();
context.closePath();
// green line from mouse_pos to the seeker's echo (previous position)
context.beginPath();
context.strokeStyle = "#00ff00";
context.moveTo(stickModel._props.mouse_pos.x, stickModel._props.mouse_pos.y);
context.lineTo(stickModel._props.seeker.echo_x, stickModel._props.seeker.echo_y);
context.fillText('echo x:' + stickModel._props.seeker.echo_x + ' y: ' + stickModel._props.seeker.echo_y, stickModel._props.seeker.echo_x, stickModel._props.seeker.echo_y - 20);
context.lineWidth = 1;
context.stroke();
context.closePath();
// blue line from the seeker's echo to the seeker
context.beginPath();
context.strokeStyle = "#0000ff";
context.moveTo(stickModel._props.seeker.echo_x, stickModel._props.seeker.echo_y);
context.lineTo(stickModel._props.seeker.x, stickModel._props.seeker.y);
context.stroke();
context.closePath();
}
body {
margin: 0px;
padding: 0px;
}
canvas {
display: block;
}
p {
position: absolute;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>
<p>Move your mouse to see the stick (colored red) follow</p>
<canvas id="myCanvas"></canvas>

OpenCL traversal kernel - further optimization

Currently, I have the OpenCL BVH traversal kernel below. I'd be glad if someone had some pointers on optimizing this fairly large kernel.
The thing is, I'm running this code with a SAH BVH, and I'd like to get performance similar to Timo Aila's traversal kernels from his paper (Understanding the Efficiency of Ray Traversal on GPUs). Of course, his code uses a SplitBVH, which I might consider using instead of a SAH BVH, but in my opinion its build times are really slow. Anyway, I'm asking about traversal, not about the BVH (also, so far I've only worked with scenes where a SplitBVH doesn't give much advantage over a SAH BVH).
First of all, here is what I have so far (standard while-while traversal kernel).
__constant sampler_t sampler = CLK_FILTER_NEAREST;
// Inline definition of horizontal max
inline float max4(float a, float b, float c, float d)
{
return max(max(max(a, b), c), d);
}
// Inline definition of horizontal min
inline float min4(float a, float b, float c, float d)
{
return min(min(min(a, b), c), d);
}
// Traversal kernel
__kernel void traverse( __read_only image2d_t nodes,
__global const float4* triangles,
__global const float4* rays,
__global float4* result,
const int num,
const int w,
const int h)
{
// Ray index
int idx = get_global_id(0);
if(idx < num)
{
// Stack
int todo[32];
int todoOffset = 0;
// Current node
int nodeNum = 0;
float tmin = 0.0f;
float depth = 2e30f;
// Fetch ray origin, direction and compute invdirection
float4 origin = rays[2 * idx + 0];
float4 direction = rays[2 * idx + 1];
float4 invdir = native_recip(direction);
float4 temp = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
// Traversal loop
while(true)
{
// Fetch node information
int2 nodeCoord = (int2)((nodeNum << 2) % w, (nodeNum << 2) / w);
int4 specs = read_imagei(nodes, sampler, nodeCoord + (int2)(3, 0));
// While node isn't leaf
while(specs.z == 0)
{
// Fetch child bounding boxes
float4 n0xy = read_imagef(nodes, sampler, nodeCoord);
float4 n1xy = read_imagef(nodes, sampler, nodeCoord + (int2)(1, 0));
float4 nz = read_imagef(nodes, sampler, nodeCoord + (int2)(2, 0));
// Test ray against child bounding boxes
float oodx = origin.x * invdir.x;
float oody = origin.y * invdir.y;
float oodz = origin.z * invdir.z;
float c0lox = n0xy.x * invdir.x - oodx;
float c0hix = n0xy.y * invdir.x - oodx;
float c0loy = n0xy.z * invdir.y - oody;
float c0hiy = n0xy.w * invdir.y - oody;
float c0loz = nz.x * invdir.z - oodz;
float c0hiz = nz.y * invdir.z - oodz;
float c1loz = nz.z * invdir.z - oodz;
float c1hiz = nz.w * invdir.z - oodz;
float c0min = max4(min(c0lox, c0hix), min(c0loy, c0hiy), min(c0loz, c0hiz), tmin);
float c0max = min4(max(c0lox, c0hix), max(c0loy, c0hiy), max(c0loz, c0hiz), depth);
float c1lox = n1xy.x * invdir.x - oodx;
float c1hix = n1xy.y * invdir.x - oodx;
float c1loy = n1xy.z * invdir.y - oody;
float c1hiy = n1xy.w * invdir.y - oody;
float c1min = max4(min(c1lox, c1hix), min(c1loy, c1hiy), min(c1loz, c1hiz), tmin);
float c1max = min4(max(c1lox, c1hix), max(c1loy, c1hiy), max(c1loz, c1hiz), depth);
bool traverseChild0 = (c0max >= c0min);
bool traverseChild1 = (c1max >= c1min);
nodeNum = specs.x;
int nodeAbove = specs.y;
// We hit just one out of 2 childs
if(traverseChild0 != traverseChild1)
{
if(traverseChild1)
{
nodeNum = nodeAbove;
}
}
// We hit either both or none
else
{
// If we hit none, pop node from stack (or exit traversal, if stack is empty)
if (!traverseChild0)
{
if(todoOffset == 0)
{
break;
}
nodeNum = todo[--todoOffset];
}
// If we hit both
else
{
// Sort them (so nearest goes 1st, further 2nd)
if(c1min < c0min)
{
unsigned int tmp = nodeNum;
nodeNum = nodeAbove;
nodeAbove = tmp;
}
// Push further on stack
todo[todoOffset++] = nodeAbove;
}
}
// Fetch next node information
nodeCoord = (int2)((nodeNum << 2) % w, (nodeNum << 2) / w);
specs = read_imagei(nodes, sampler, nodeCoord + (int2)(3, 0));
}
// If node is leaf & has some primitives
if(specs.z > 0)
{
// Loop through primitives & perform intersection with them (Woop triangles)
for(int i = specs.x; i < specs.y; i++)
{
// Fetch first point from global memory
float4 v0 = triangles[i * 4 + 0];
float o_z = v0.w - origin.x * v0.x - origin.y * v0.y - origin.z * v0.z;
float i_z = 1.0f / (direction.x * v0.x + direction.y * v0.y + direction.z * v0.z);
float t = o_z * i_z;
if(t > 0.0f && t < depth)
{
// Fetch second point from global memory
float4 v1 = triangles[i * 4 + 1];
float o_x = v1.w + origin.x * v1.x + origin.y * v1.y + origin.z * v1.z;
float d_x = direction.x * v1.x + direction.y * v1.y + direction.z * v1.z;
float u = o_x + t * d_x;
if(u >= 0.0f && u <= 1.0f)
{
// Fetch third point from global memory
float4 v2 = triangles[i * 4 + 2];
float o_y = v2.w + origin.x * v2.x + origin.y * v2.y + origin.z * v2.z;
float d_y = direction.x * v2.x + direction.y * v2.y + direction.z * v2.z;
float v = o_y + t * d_y;
if(v >= 0.0f && u + v <= 1.0f)
{
// We got successful hit, store the information
depth = t;
temp.x = u;
temp.y = v;
temp.z = t;
temp.w = as_float(i);
}
}
}
}
}
// Pop node from stack (if empty, finish traversal)
if(todoOffset == 0)
{
break;
}
nodeNum = todo[--todoOffset];
}
// Store the ray traversal result in global memory
result[idx] = temp;
}
}
My first question: how could one write the persistent while-while and speculative while-while kernels in OpenCL?
Regarding persistent while-while: do I understand correctly that I just launch the kernel with a global work size equal to the local work size, and that both should equal the warp/wavefront size of the GPU?
I understand that in CUDA the persistent-thread implementation looks like this:
do
{
volatile int& jobIndexBase = nextJobArray[threadIdx.y]; // one shared-memory slot per warp
if(threadIdx.x == 0)
{
jobIndexBase = atomicAdd(&warpCounter, WARP_SIZE); // first lane claims WARP_SIZE jobs for its warp
}
int index = jobIndexBase + threadIdx.x;
if(index >= totalJobs)
return;
/* Perform work for task numbered 'index' */
}
while(true);
What would the equivalent look like in OpenCL? I know I'll need some barriers in there, and that one should come right after the point where I atomically add WARP_SIZE to warpCounter.
Regarding speculative traversal: I have little idea how this should be implemented in OpenCL, so any hints are welcome. I also don't know where to put the barriers (putting them around my simulated __any crashes the driver).
If you made it this far, thanks for reading; any hints or answers are welcome!
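The job-fetching part of that loop boils down to a shared counter that one thread bumps by a whole batch at a time. Below is a minimal sketch of just that pattern in plain C11 with `<stdatomic.h>`, not OpenCL, so treat it as an illustration of the logic only; `BATCH_SIZE` (standing in for `WARP_SIZE`), `claim_batch` and `run_persistent` are hypothetical names. Mapping it to OpenCL would presumably mean `atomic_add` on a `__global` counter from the work-item with `get_local_id(0) == 0`, publishing the base through `__local` memory, with a `barrier(CLK_LOCAL_MEM_FENCE)` after that write and another one before the slot is overwritten on the next iteration. Keep in mind that OpenCL requires every work-item of a work-group to reach each `barrier`, so individual work-items cannot simply `return` when their index runs past `totalJobs`; the loop has to exit for the whole group at once.

```c
#include <stdatomic.h>

/* Hypothetical batch size standing in for WARP_SIZE. */
#define BATCH_SIZE 32

/* Claim up to BATCH_SIZE consecutive job indices from the shared counter.
   Writes the first claimed index to *first and returns how many jobs were
   claimed; returns 0 once the pool is exhausted. Safe to call from several
   threads because the counter is bumped atomically. */
static int claim_batch(atomic_int *next_job, int total_jobs, int *first)
{
    int base = atomic_fetch_add(next_job, BATCH_SIZE);
    if (base >= total_jobs)
        return 0;
    *first = base;
    int left = total_jobs - base;
    return left < BATCH_SIZE ? left : BATCH_SIZE;
}

/* A "persistent" worker: instead of one thread per job, it loops and keeps
   claiming batches until none are left. The "work" here is just summing the
   job indices so the result is easy to check. */
static long run_persistent(atomic_int *next_job, int total_jobs)
{
    long sum = 0;
    int first = 0, count;
    while ((count = claim_batch(next_job, total_jobs, &first)) > 0)
        for (int i = first; i < first + count; ++i)
            sum += i; /* perform work for task number i */
    return sum;
}
```

`run_persistent` is called from a single thread here only to keep the example deterministic; because the counter is atomic, several threads could share it and each job index would still be claimed exactly once.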
One optimization is to use vector types and the fused multiply-add function (fma) to speed up your setup math. As for the rest of the kernel, it is slow because it is branchy. If you can make assumptions about the input data, you might be able to reduce execution time by removing some of the branches. I have not checked the float4 swizzles (the .xxyy and the .x/.y/.z/.w after the float4 variables), so double-check those.
float4 n0xy = read_imagef(nodes, sampler, nodeCoord);
float4 n1xy = read_imagef(nodes, sampler, nodeCoord + (int2)(1, 0));
float4 nz = read_imagef(nodes, sampler, nodeCoord + (int2)(2, 0));
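// ood = -origin * invdir, so fma(bound, invdir, ood) == bound * invdir - origin * invdir
float4 oodf4 = -origin * invdir;
// Child 0 x/y slabs (lox, hix, loy, hiy); the addend must use the matching axis, hence .xxyy
float4 c0xyf4 = fma(n0xy, invdir.xxyy, oodf4.xxyy);
// z slabs of both children: (c0loz, c0hiz, c1loz, c1hiz)
float4 c0zc1z = fma(nz, (float4)(invdir.z), (float4)(oodf4.z));
float c0min = max4(min(c0xyf4.x, c0xyf4.y), min(c0xyf4.z, c0xyf4.w), min(c0zc1z.x, c0zc1z.y), tmin);
float c0max = min4(max(c0xyf4.x, c0xyf4.y), max(c0xyf4.z, c0xyf4.w), max(c0zc1z.x, c0zc1z.y), depth);
// Child 1 x/y slabs, same layout as child 0
float4 c1xy = fma(n1xy, invdir.xxyy, oodf4.xxyy);
float c1min = max4(min(c1xy.x, c1xy.y), min(c1xy.z, c1xy.w), min(c0zc1z.z, c0zc1z.w), tmin);
float c1max = min4(max(c1xy.x, c1xy.y), max(c1xy.z, c1xy.w), max(c0zc1z.z, c0zc1z.w), depth);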
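To check the swizzles, it may help to compare against the scalar slab test that each child's lines are meant to compute: per axis, the two plane distances are `lo * invdir - origin * invdir` and `hi * invdir - origin * invdir`, and the box is hit when the largest entry distance does not exceed the smallest exit distance. The C sketch below does this for a single box; `ray_box_interval` and its array-based parameters are hypothetical, not part of the kernel above. In the vectorized version, each component of the `fma` addend must be `-origin * invdir` for the same axis as the `invdir` component it is paired with.

```c
#include <stdbool.h>

static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Scalar ray/AABB slab test (a sketch, not the original kernel).
   lo/hi are the box corners, origin/invdir the ray origin and 1/direction,
   [tmin, depth] the valid ray interval. On return, *t_enter and *t_exit hold
   the clipped entry and exit distances. */
static bool ray_box_interval(const float lo[3], const float hi[3],
                             const float origin[3], const float invdir[3],
                             float tmin, float depth,
                             float *t_enter, float *t_exit)
{
    float enter = tmin, exit_ = depth;
    for (int axis = 0; axis < 3; ++axis) {
        /* (bound - origin) * invdir, written as bound * invdir - origin * invdir,
           which is what fma(bound, invdir, -origin * invdir) computes. */
        float ood = origin[axis] * invdir[axis];
        float t0 = lo[axis] * invdir[axis] - ood;
        float t1 = hi[axis] * invdir[axis] - ood;
        enter = max_f(enter, min_f(t0, t1));
        exit_ = min_f(exit_, max_f(t0, t1));
    }
    *t_enter = enter;
    *t_exit = exit_;
    return exit_ >= enter; /* same test as c0max >= c0min */
}
```

One caveat of the premultiplied form: if a direction component is exactly zero, `invdir` is infinite and the subtraction can become inf - inf = NaN, which is why implementations commonly clamp tiny direction components to a small epsilon before taking the reciprocal.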

Correct GLSL affine texture mapping

I'm trying to implement correct 2D affine texture mapping in GLSL.
Explanation:
...None of these images is correct for my purposes. The right one (labeled Correct) has perspective correction, which I do not want. So the "Getting to know the Q texture coordinate" solution (without further improvements) is not what I'm looking for.
I'd like to simply "stretch" the texture inside the quadrilateral, something like this:
but composed of two triangles. Any advice (GLSL), please?
This works well as long as you have a trapezoid, and its parallel edges are aligned with one of the local axes. I recommend playing around with my Unity package.
GLSL:
varying vec2 shiftedPosition, width_height;
#ifdef VERTEX
void main() {
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
shiftedPosition = gl_MultiTexCoord0.xy; // left and bottom edges zeroed.
width_height = gl_MultiTexCoord1.xy;
}
#endif
#ifdef FRAGMENT
uniform sampler2D _MainTex;
void main() {
gl_FragColor = texture2D(_MainTex, shiftedPosition / width_height);
}
#endif
C#:
// Zero out the left and bottom edges,
// leaving a right trapezoid with two sides on the axes and a vertex at the origin.
var shiftedPositions = new Vector2[] {
Vector2.zero,
new Vector2(0, vertices[1].y - vertices[0].y),
new Vector2(vertices[2].x - vertices[1].x, vertices[2].y - vertices[3].y),
new Vector2(vertices[3].x - vertices[0].x, 0)
};
mesh.uv = shiftedPositions;
var widths_heights = new Vector2[4];
widths_heights[0].x = widths_heights[3].x = shiftedPositions[3].x;
widths_heights[1].x = widths_heights[2].x = shiftedPositions[2].x;
widths_heights[0].y = widths_heights[1].y = shiftedPositions[1].y;
widths_heights[2].y = widths_heights[3].y = shiftedPositions[2].y;
mesh.uv2 = widths_heights;
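Why this works: the vertex shader outputs the edge-relative position and the local width/height separately, both are interpolated across each triangle, and the fragment shader divides them only afterwards. Assuming the quad is drawn flat (no perspective), the interpolated width at height y is the linear blend of the bottom and top widths, because both bottom vertices carry one width and both top vertices the other, so u = x / width(y) reaches exactly 1 on the slanted edge. A small C sketch of that per-fragment arithmetic, assuming a right trapezoid with its bottom-left corner at the origin (the function name and parameters are hypothetical):

```c
/* Per-fragment arithmetic of the trapezoid shader, done on the CPU.
   Assumes a right trapezoid in "shifted" space: left edge on x = 0,
   bottom edge on y = 0 with width bottom_w, top edge at y = h with width top_w. */
static void trapezoid_uv(float x, float y, float bottom_w, float top_w, float h,
                         float *u, float *v)
{
    /* What the rasterizer produces for width_height.x: linear in y only. */
    float width = bottom_w + (top_w - bottom_w) * (y / h);
    *u = x / width; /* shiftedPosition.x / width_height.x */
    *v = y / h;     /* shiftedPosition.y / width_height.y */
}
```

For example, with a bottom width of 4, a top width of 2 and a height of 2, the point (3, 1) lies on the slanted edge and maps to u = 1, v = 0.5.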
I recently came up with a general solution to this problem for any quadrilateral. The calculations and GLSL may be of help. There's a working demo in Java (it runs on Android), but it is compact and readable and should be easy to port to Unity or iOS: http://www.bitlush.com/posts/arbitrary-quadrilaterals-in-opengl-es-2-0
In case anyone's still interested, here's a C# implementation that takes a quad defined by the clockwise screen vertices (x0,y0) (x1,y1) ... (x3,y3) and an arbitrary pixel at (x,y), and calculates the u and v of that pixel. It was originally written to CPU-render an arbitrary quad to a texture, but it's easy enough to split the algorithm across the CPU, vertex and pixel shaders; I've commented the code accordingly.
float Ax, Bx, Cx, Dx, Ay, By, Cy, Dy, A, B, C;
//These are all uniforms for a given quad. Calculate on CPU.
Ax = (x3 - x0) - (x2 - x1);
Bx = (x0 - x1);
Cx = (x2 - x1);
Dx = x1;
Ay = (y3 - y0) - (y2 - y1);
By = (y0 - y1);
Cy = (y2 - y1);
Dy = y1;
float ByCx_plus_AyDx_minus_BxCy_minus_AxDy = (By * Cx) + (Ay * Dx) - (Bx * Cy) - (Ax * Dy);
float ByDx_minus_BxDy = (By * Dx) - (Bx * Dy);
A = (Ay*Cx)-(Ax*Cy);
//These must be calculated per-vertex, and passed through as interpolated values to the pixel-shader
B = (Ax * y) + ByCx_plus_AyDx_minus_BxCy_minus_AxDy - (Ay * x);
C = (Bx * y) + ByDx_minus_BxDy - (By * x);
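//These must be calculated per-pixel using the interpolated B, C, x and y from the vertex shader along with some of the other uniforms.
float u, v;
// u is a root of A*u^2 + B*u + C = 0
if (Mathf.Abs(A) < 1e-6f)
{
// A == 0 (e.g. a parallelogram, or a trapezoid whose parallel edges run along u): the equation is linear
u = -C / B;
}
else
{
// Which root lies inside the quad depends on its shape and winding, so keep the one in [0, 1]
float root = Mathf.Sqrt((B * B) - (4.0f * A * C));
u = ((-B) - root) / (A * 2.0f);
if (u < 0.0f || u > 1.0f)
u = ((-B) + root) / (A * 2.0f);
}
// Solve for v using whichever coordinate gives the larger (safer) denominator
float denX = (u * Ax) + Bx;
float denY = (u * Ay) + By;
v = (Mathf.Abs(denX) > Mathf.Abs(denY)) ? (x - (u * Cx) - Dx) / denX : (y - (u * Cy) - Dy) / denY;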
Tessellation helps with this problem: subdividing the quad adds vertices, which give the interpolation more points to follow.
Check out this link.
https://www.youtube.com/watch?v=8TleepxIORU&feature=youtu.be
I had a similar question (https://gamedev.stackexchange.com/questions/174857/mapping-a-texture-to-a-2d-quadrilateral/174871), and on GameDev they suggested using an imaginary Z coordinate, which I calculate with the following C code. It appears to work in the general case (not just for trapezoids):
//usual euclidean distance
float distance(int ax, int ay, int bx, int by) {
int x = ax-bx;
int y = ay-by;
return sqrtf((float)(x*x + y*y));
}
void gfx_quad(gfx_t *dst //destination texture, we are rendering into
,gfx_t *src //source texture
,int *quad // quadrilateral vertices
)
{
int *v = quad; // quad vertices as (x, y) pairs: top-left, top-right, bottom-left, bottom-right
float top = distance(v[0],v[1],v[2],v[3]); //top
float bot = distance(v[4],v[5],v[6],v[7]); //bottom
float lft = distance(v[0],v[1],v[4],v[5]); //left
float rgt = distance(v[2],v[3],v[6],v[7]); //right
// By default all vertices lie on the screen plane
float az = 1.0;
float bz = 1.0;
float cz = 1.0;
float dz = 1.0;
// Scale the z of the vertices on the shorter edge by the edge-length ratio.
if (top<bot) {
az *= top/bot;
bz *= top/bot;
} else {
cz *= bot/top;
dz *= bot/top;
}
if (lft<rgt) {
az *= lft/rgt;
cz *= lft/rgt;
} else {
bz *= rgt/lft;
dz *= rgt/lft;
}
// draw our quad as two textured triangles
gfx_textured(dst, src
, v[0],v[1],az, v[2],v[3],bz, v[4],v[5],cz
, 0.0,0.0, 1.0,0.0, 0.0,1.0);
gfx_textured(dst, src
, v[2],v[3],bz, v[4],v[5],cz, v[6],v[7],dz
, 1.0,0.0, 0.0,1.0, 1.0,1.0);
}
I'm doing it in software to scale and rotate 2D sprites; for an OpenGL 3D app you will need to do it in the pixel/fragment shader, unless you can map these imaginary az, bz, cz, dz into your actual 3D space and use the usual pipeline. DMGregory gave exact code for OpenGL shaders: https://gamedev.stackexchange.com/questions/148082/how-can-i-fix-zig-zagging-uv-mapping-artifacts-on-a-generated-mesh-that-tapers
I ran into this issue while trying to implement homography warping in OpenGL. Some of the solutions I found relied on a notion of depth, which was not feasible in my case since I am working with 2D coordinates.
I based my solution on this article, and it seems to work for all the cases I could try. I am leaving it here in case it is useful to someone else, as I could not find anything similar. The solution makes the following assumptions:
The vertex coordinates are the 4 points of a quad in Lower Right, Upper Right, Upper Left, Lower Left order.
The coordinates are given in OpenGL's reference system (range [-1, 1], with origin at bottom left corner).
std::vector<cv::Point2f> points;
// Convert points to homogeneous coordinates to simplify the problem.
Eigen::Vector3f p0(points[0].x, points[0].y, 1);
Eigen::Vector3f p1(points[1].x, points[1].y, 1);
Eigen::Vector3f p2(points[2].x, points[2].y, 1);
Eigen::Vector3f p3(points[3].x, points[3].y, 1);
// Compute the intersection point between the lines described by opposite vertices using cross products. Normalization is only required at the end.
// See https://leimao.github.io/blog/2D-Line-Mathematics-Homogeneous-Coordinates/ for a quick summary of this approach.
auto line1 = p2.cross(p0);
auto line2 = p3.cross(p1);
auto intersection = line1.cross(line2);
intersection = intersection / intersection(2);
// Compute the distance from each vertex to the intersection of the diagonals.
std::vector<float> distances;
for (const auto &pt : points) {
auto distance = std::sqrt(std::pow(pt.x - intersection(0), 2) +
std::pow(pt.y - intersection(1), 2));
distances.push_back(distance);
}
// Assumes same order as above.
std::vector<cv::Point2f> texture_coords_unnormalized = {
{1.0f, 1.0f},
{1.0f, 0.0f},
{0.0f, 0.0f},
{0.0f, 1.0f}
};
std::vector<float> texture_coords;
for (size_t i = 0; i < texture_coords_unnormalized.size(); ++i) {
float u_i = texture_coords_unnormalized[i].x;
float v_i = texture_coords_unnormalized[i].y;
float d_i = distances.at(i);
float d_i_2 = distances.at((i + 2) % 4);
float scale = (d_i + d_i_2) / d_i_2;
texture_coords.push_back(u_i*scale);
texture_coords.push_back(v_i*scale);
texture_coords.push_back(scale);
}
Pass the texture coordinates to your shader (use vec3). Then:
gl_FragColor = vec4(texture2D(textureSampler, textureCoords.xy/textureCoords.z).rgb, 1.0);
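Why the division by `textureCoords.z` works: along each diagonal, the rasterizer interpolates (u·q, v·q, q) linearly, and with q chosen as (d_i + d_{i+2}) / d_{i+2} the interpolated q is exactly 2 at the diagonals' intersection, so the divided texture coordinate there is the midpoint of the two opposite corners' coordinates. The C sketch below checks that numerically for one diagonal; `diagonal_texcoord` is a hypothetical helper, and it only models linear screen-space interpolation along that diagonal, not a full rasterizer.

```c
/* Projective texture coordinate along one diagonal of the quad.
   d_a: distance from corner a to the diagonals' intersection,
   d_b: distance from the opposite corner b to the intersection.
   Returns the divided texture coordinate at screen-space parameter t
   (t = 0 at a, t = 1 at b) for a scalar texture coordinate tex_a -> tex_b. */
static float diagonal_texcoord(float tex_a, float tex_b, float d_a, float d_b, float t)
{
    float q_a = (d_a + d_b) / d_b; /* same formula as "scale" above */
    float q_b = (d_b + d_a) / d_a;
    /* Linear interpolation of the premultiplied values... */
    float tq = (1.0f - t) * tex_a * q_a + t * tex_b * q_b;
    float q = (1.0f - t) * q_a + t * q_b;
    /* ...followed by the per-fragment divide done in the shader. */
    return tq / q;
}
```

With d_a = 1 and d_b = 3 the intersection sits at t = 0.25 in screen space, and the function returns 0.5 there, whereas plain affine interpolation would give 0.25.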
Thanks for the answers, but after experimenting I found a solution.
The two triangles on the left have UVs (strq) according to this, and the two triangles on the right are a modified version of this perspective correction.
Numbers and shader:
tri1 = [Vec2(-0.5, -1), Vec2(0.5, -1), Vec2(1, 1)]
tri2 = [Vec2(-0.5, -1), Vec2(1, 1), Vec2(-1, 1)]
d1 = length of top edge = 2
d2 = length of bottom edge = 1
tri1_uv = [Vec4(0, 0, 0, d2 / d1), Vec4(d2 / d1, 0, 0, d2 / d1), Vec4(1, 1, 0, 1)]
tri2_uv = [Vec4(0, 0, 0, d2 / d1), Vec4(1, 1, 0, 1), Vec4(0, 1, 0, 1)]
Only the right triangles are rendered using this GLSL shader (the left ones use the fixed pipeline):
uniform sampler2D colormap;
void main()
{
// Divide only s by q; t stays linear
gl_FragColor = texture2D(colormap, vec2(gl_TexCoord[0].x / gl_TexCoord[0].w, gl_TexCoord[0].y));
}
So only U gets the perspective (divide-by-q) treatment, and V stays linear.

Resources