OpenGL (Qt): binding shaders properly + two rotations at the same time - qt

I'm trying to get some experience with OpenGL, but now I'm facing "1.5" problems ;).
The first problem / question is: how can I get a rotation in two directions "simultaneously"?
I want to draw a coordinate system which can be rotated around the x- and y-axis. But I'm only able to rotate around the x-axis or the y-axis. I can't figure out how to do both at the same time.
My other, half problem is not really a problem, but as you can see I'm loading and binding my shaders anew every time I move my mouse. Is there a better way to do this?
void GLWidget::mouseMoveEvent(QMouseEvent *event)
{
    differencePostition.setX(event->x() - lastPosition.x());
    differencePostition.setY(event->y() - lastPosition.y());
    shaderProgram.addShaderFromSourceFile(QGLShader::Vertex, "../Vector/yRotation.vert");
    shaderProgram.addShaderFromSourceFile(QGLShader::Fragment, "../Vector/CoordinateSystemLines.frag");
    shaderProgram.setAttributeValue("angle", differencePostition.x());
    //shaderProgram.addShaderFromSourceFile(QGLShader::Vertex, "../Vector/xRotation.vert");
    //shaderProgram.addShaderFromSourceFile(QGLShader::Fragment, "../Vector/CoordinateSystemLines.frag");
    //shaderProgram.setAttributeValue("angle", differencePostition.y());
}
void GLWidget::mousePressEvent(QMouseEvent *event)
{
    lastPosition = event->posF();
}
// Rotation about the x-axis
#version 330
in float angle;
const float PI = 3.14159265358979323846264;
void main(void)
{
    float rad_angle = angle * PI / 180.0;
    vec4 oldPosition = gl_Vertex;
    vec4 newPosition = oldPosition;
    newPosition.y = oldPosition.y * cos(rad_angle) - oldPosition.z * sin(rad_angle);
    newPosition.z = oldPosition.y * sin(rad_angle) + oldPosition.z * cos(rad_angle);
    gl_Position = gl_ModelViewProjectionMatrix * newPosition;
}
// Rotation about the y-axis
#version 330
in float angle;
const float PI = 3.14159265358979323846264;
void main(void)
{
    float rad_angle = angle * PI / 180.0;
    vec4 oldPosition = gl_Vertex;
    vec4 newPosition = oldPosition;
    newPosition.x = oldPosition.x * cos(rad_angle) + oldPosition.z * sin(rad_angle);
    newPosition.z = oldPosition.z * cos(rad_angle) - oldPosition.x * sin(rad_angle);
    gl_Position = gl_ModelViewProjectionMatrix * newPosition;
}

Rotating about more than one axis at the same time requires combining the individual rotation matrices into a single matrix (commonly called a general rotation matrix).
Several sites show how this matrix is derived, if you are interested in the details.
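As a rough illustration of what "combining" means here, the sketch below builds the two single-axis matrices on the CPU (using the same sign conventions as the two shaders in the question) and multiplies them into one. All names (`rotationX`, `rotationY`, `combinedRotation`, and so on) are made up for this example and are not part of Qt or OpenGL.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

const double kPi = 3.14159265358979323846;

// Rotation about the x-axis by `deg` degrees:
// y' = y*cos - z*sin, z' = y*sin + z*cos
Mat3 rotationX(double deg) {
    const double c = std::cos(deg * kPi / 180.0);
    const double s = std::sin(deg * kPi / 180.0);
    Mat3 m = {{{1.0, 0.0, 0.0}, {0.0, c, -s}, {0.0, s, c}}};
    return m;
}

// Rotation about the y-axis by `deg` degrees:
// x' = x*cos + z*sin, z' = z*cos - x*sin
Mat3 rotationY(double deg) {
    const double c = std::cos(deg * kPi / 180.0);
    const double s = std::sin(deg * kPi / 180.0);
    Mat3 m = {{{c, 0.0, s}, {0.0, 1.0, 0.0}, {-s, 0.0, c}}};
    return m;
}

// Plain 3x3 matrix product a * b.
Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Matrix-vector product m * v.
Vec3 apply(const Mat3& m, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// One matrix that rotates about x first and then about y.
Mat3 combinedRotation(double degX, double degY) {
    return multiply(rotationY(degY), rotationX(degX));
}
```

Applying `combinedRotation(ax, ay)` to a vertex gives the same result as applying the x-rotation and then the y-rotation one after the other. The order of the product matters: swapping the two factors rotates about y first.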
As to your second problem, shaders are usually loaded and linked once, in the initialization section.

You only need to call shaderProgram.bind(); each time before you draw an object with your shader. Loading and linking are usually done only once, during the initialization of your program. In your mouseMoveEvent method, only call shaderProgram.setAttributeValue.
A quick way to solve your rotation problem is to write a shader that does both rotations one after the other. Add a second in variable and set both using the setAttributeValue method.
#version 330
in float angleX;
in float angleY;
const float PI = 3.14159265358979323846264;
void main(void)
{
    // First rotate about the x-axis
    float rad_angle_x = angleX * PI / 180.0;
    vec4 oldPosition = gl_Vertex;
    vec4 newPositionX = oldPosition;
    newPositionX.y = oldPosition.y * cos(rad_angle_x) - oldPosition.z * sin(rad_angle_x);
    newPositionX.z = oldPosition.y * sin(rad_angle_x) + oldPosition.z * cos(rad_angle_x);
    // Then rotate the result about the y-axis
    float rad_angle_y = angleY * PI / 180.0;
    vec4 newPositionXY = newPositionX;
    newPositionXY.x = newPositionX.x * cos(rad_angle_y) + newPositionX.z * sin(rad_angle_y);
    newPositionXY.z = newPositionX.z * cos(rad_angle_y) - newPositionX.x * sin(rad_angle_y);
    gl_Position = gl_ModelViewProjectionMatrix * newPositionXY;
}
This way you don't need to know matrix multiplications.


Waterwave normals calculation slightly off

I implemented "The Sum of Sines Approximation" in this article:
My geometry is working fine, but the normals are somewhat wrong. Their x- and z-values are flipped, and I wonder what I did wrong in the implementation.
The following equations are used:
The generation of the normals looks like this in the shader:
vec3 generateWaveSineSumNormal(sineParams _params[sineCount])
{
    vec2 pos = vec2(aPos.x, aPos.z);
    vec3 normal = vec3(0.0f, 1.0f, 0.0f);
    for(int i=0; i<sineCount; i++)
    {
        sineParams curParams = _params[i];
        // Partial derivative of the height with respect to x
        normal.x += sineExponent * curParams.direction.x * curParams.frequency * curParams.amplitude *
            pow((sin(dot(curParams.direction, pos) * curParams.frequency + curTime * curParams.speed)+1)/2, sineExponent-1)
            * cos(dot(curParams.direction, pos) * curParams.frequency + curTime * curParams.speed);
        // Partial derivative of the height with respect to z
        normal.z += sineExponent * curParams.direction.y * curParams.frequency * curParams.amplitude *
            pow((sin(dot(curParams.direction, pos) * curParams.frequency + curTime * curParams.speed)+1)/2, sineExponent-1)
            * cos(dot(curParams.direction, pos) * curParams.frequency + curTime * curParams.speed);
    }
    return vec3(-normal.x, normal.y, -normal.z);
}
When the x- and z-values are flipped like this, the normals are fine. I'm just wondering what I did wrong implementing it, since I just can't find it.
Your normal.xz correctly computes the gradient of the heightmap. The gradient points uphill, i.e. in the direction in which the height values increase. The normal, when projected onto the horizontal plane, points the opposite way (downhill), which is why the negation is needed.
This can be derived mathematically by taking the cross product of two tangents, which is what that article does in the "Normals and Tangents" section. In particular, the relevant result is the formula for the normal in "Equation 6b", which includes the negative signs.
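For reference, here is a sketch of that cross-product derivation in the shader's convention (y up, surface height $H(x,z,t)$). The symbols $\mathbf{T}_x$ and $\mathbf{T}_z$ are shorthand introduced here, and the article may order its components differently:

```latex
\begin{aligned}
\mathbf{T}_x &= \left(1,\ \frac{\partial H}{\partial x},\ 0\right), \qquad
\mathbf{T}_z = \left(0,\ \frac{\partial H}{\partial z},\ 1\right), \\
\mathbf{N} &= \mathbf{T}_z \times \mathbf{T}_x
  = \left(-\frac{\partial H}{\partial x},\ 1,\ -\frac{\partial H}{\partial z}\right).
\end{aligned}
```

The sums accumulated in normal.x and normal.z are the two partial derivatives, so negating them on return gives exactly this vector (before normalization).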

Lat/Lng to Web Mercator Projection Issues

I've got some simple code in a Qt application which converts from a latitude/longitude coordinate to a web mercator x/y position:
// Half the equatorial circumference of the Earth in metres (pi * 6378137), not the radius
const double EARTH_RADIUS = 20037508.34;
// Convert from a LL to a QPointF
QPointF GeoPoint::toMercator() {
    double x = this->longitude() * EARTH_RADIUS / 180.0;
    double y = log(tan((90.0 + this->latitude()) * PI / 360.0)) / (PI / 180.0);
    y = y * EARTH_RADIUS / 180.0;
    return QPointF(x, y);
}
// Convert from a QPointF to a LL
GeoPoint GeoPoint::fromMercator(const QPointF &pt) {
    double lon = (pt.x() / EARTH_RADIUS) * 180.0;
    double lat = (pt.y() / EARTH_RADIUS) * 180.0;
    lat = 180.0 / PI * (2 * atan(exp(lat * PI / 180.0)) - PI / 2.0);
    return GeoPoint(lat, lon);
}
I want to get the geographic position of a number of objects which are a given distance in metres away from a geographic origin; however, either my understanding or my source code is not correct.
Consider the following:
GeoPoint pt1(54.253230, -3.006460);
QPointF m1 = pt1.toMercator();
qDebug() << m1;
// QPointF(-334678,7.21826e+06)
// Now I want to add a distance onto the mercator coordinates, i.e. 50 metres
m1.rx() += 50.0;
qDebug() << m1;
// QPointF(-334628,7.21826e+06)
// Take this back to a LL
GeoPoint pt1a = GeoPoint::fromMercator(m1);
qDebug() << pt1a.toString();
// "54.25323°, -3.00601°"
If I plot the two LL coordinates into Google Earth, they are not 50m apart as expected, they are about 29.3m apart.
I'm perplexed!
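The numbers above look consistent with the latitude-dependent scale of the Web Mercator projection: projected distances are stretched by a factor of 1/cos(latitude), so an offset of d projected "metres" corresponds to roughly d * cos(latitude) metres on the ground. A minimal sketch of that arithmetic follows. The helper names `mercatorToGround` and `groundToMercator` are hypothetical and not part of the code above, and the approximation only holds for short offsets.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Approximate ground distance (metres) covered by a short offset of
// `mercatorMetres` in Web Mercator units at latitude `latitudeDeg`.
double mercatorToGround(double mercatorMetres, double latitudeDeg) {
    return mercatorMetres * std::cos(latitudeDeg * kPi / 180.0);
}

// Inverse: the Web Mercator offset needed to move `groundMetres`
// on the ground at latitude `latitudeDeg`.
double groundToMercator(double groundMetres, double latitudeDeg) {
    return groundMetres / std::cos(latitudeDeg * kPi / 180.0);
}
```

At latitude 54.25323, `mercatorToGround(50.0, 54.25323)` comes out at about 29.2, close to the distance measured in Google Earth. To move 50 m on the ground, the x offset would be `groundToMercator(50.0, 54.25323)`, about 85.6.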

OpenCL traversal kernel - further optimization

Currently, I have an OpenCL kernel for traversal, shown below. I'd be glad if someone had pointers on optimizing this quite large kernel.
The thing is, I'm running this code with a SAH BVH and I'd like to get performance similar to what Timo Aila gets with the traversals in his paper (Understanding the Efficiency of Ray Traversal on GPUs). Of course, his code uses SplitBVH (which I might consider using in place of the SAH BVH, but in my opinion it has really slow build times). But I'm asking about traversal, not the BVH (also, I've so far worked only with scenes where SplitBVH won't give much advantage over a SAH BVH).
First of all, here is what I have so far (standard while-while traversal kernel).
__constant sampler_t sampler = CLK_FILTER_NEAREST;
// Inline definition of horizontal max
inline float max4(float a, float b, float c, float d)
{
    return max(max(max(a, b), c), d);
}
// Inline definition of horizontal min
inline float min4(float a, float b, float c, float d)
{
    return min(min(min(a, b), c), d);
}
// Traversal kernel
__kernel void traverse( __read_only image2d_t nodes,
                        __global const float4* triangles,
                        __global const float4* rays,
                        __global float4* result,
                        const int num,
                        const int w,
                        const int h)
{
    // Ray index
    int idx = get_global_id(0);
    if(idx < num)
    {
        // Stack
        int todo[32];
        int todoOffset = 0;
        // Current node
        int nodeNum = 0;
        float tmin = 0.0f;
        float depth = 2e30f;
        // Fetch ray origin, direction and compute invdirection
        float4 origin = rays[2 * idx + 0];
        float4 direction = rays[2 * idx + 1];
        float4 invdir = native_recip(direction);
        float4 temp = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
        // Traversal loop
        while(true)
        {
            // Fetch node information
            int2 nodeCoord = (int2)((nodeNum << 2) % w, (nodeNum << 2) / w);
            int4 specs = read_imagei(nodes, sampler, nodeCoord + (int2)(3, 0));
            // While node isn't leaf
            while(specs.z == 0)
            {
                // Fetch child bounding boxes
                float4 n0xy = read_imagef(nodes, sampler, nodeCoord);
                float4 n1xy = read_imagef(nodes, sampler, nodeCoord + (int2)(1, 0));
                float4 nz = read_imagef(nodes, sampler, nodeCoord + (int2)(2, 0));
                // Test ray against child bounding boxes
                float oodx = origin.x * invdir.x;
                float oody = origin.y * invdir.y;
                float oodz = origin.z * invdir.z;
                float c0lox = n0xy.x * invdir.x - oodx;
                float c0hix = n0xy.y * invdir.x - oodx;
                float c0loy = n0xy.z * invdir.y - oody;
                float c0hiy = n0xy.w * invdir.y - oody;
                float c0loz = nz.x * invdir.z - oodz;
                float c0hiz = nz.y * invdir.z - oodz;
                float c1loz = nz.z * invdir.z - oodz;
                float c1hiz = nz.w * invdir.z - oodz;
                float c0min = max4(min(c0lox, c0hix), min(c0loy, c0hiy), min(c0loz, c0hiz), tmin);
                float c0max = min4(max(c0lox, c0hix), max(c0loy, c0hiy), max(c0loz, c0hiz), depth);
                float c1lox = n1xy.x * invdir.x - oodx;
                float c1hix = n1xy.y * invdir.x - oodx;
                float c1loy = n1xy.z * invdir.y - oody;
                float c1hiy = n1xy.w * invdir.y - oody;
                float c1min = max4(min(c1lox, c1hix), min(c1loy, c1hiy), min(c1loz, c1hiz), tmin);
                float c1max = min4(max(c1lox, c1hix), max(c1loy, c1hiy), max(c1loz, c1hiz), depth);
                bool traverseChild0 = (c0max >= c0min);
                bool traverseChild1 = (c1max >= c1min);
                nodeNum = specs.x;
                int nodeAbove = specs.y;
                // We hit just one out of 2 childs
                if(traverseChild0 != traverseChild1)
                {
                    // Only child 1 was hit, so continue with it; otherwise stay on child 0
                    if(traverseChild1)
                    {
                        nodeNum = nodeAbove;
                    }
                }
                // We hit either both or none
                else
                {
                    // If we hit none, pop node from stack (or exit traversal, if stack is empty)
                    if (!traverseChild0)
                    {
                        if(todoOffset == 0)
                        {
                            break;
                        }
                        nodeNum = todo[--todoOffset];
                    }
                    // If we hit both
                    else
                    {
                        // Sort them (so nearest goes 1st, further 2nd)
                        if(c1min < c0min)
                        {
                            int tmp = nodeNum;
                            nodeNum = nodeAbove;
                            nodeAbove = tmp;
                        }
                        // Push further on stack
                        todo[todoOffset++] = nodeAbove;
                    }
                }
                // Fetch next node information
                nodeCoord = (int2)((nodeNum << 2) % w, (nodeNum << 2) / w);
                specs = read_imagei(nodes, sampler, nodeCoord + (int2)(3, 0));
            }
            // If node is leaf & has some primitives
            if(specs.z > 0)
            {
                // Loop through primitives & perform intersection with them (Woop triangles)
                for(int i = specs.x; i < specs.y; i++)
                {
                    // Fetch first point from global memory
                    float4 v0 = triangles[i * 4 + 0];
                    float o_z = v0.w - origin.x * v0.x - origin.y * v0.y - origin.z * v0.z;
                    float i_z = 1.0f / (direction.x * v0.x + direction.y * v0.y + direction.z * v0.z);
                    float t = o_z * i_z;
                    if(t > 0.0f && t < depth)
                    {
                        // Fetch second point from global memory
                        float4 v1 = triangles[i * 4 + 1];
                        float o_x = v1.w + origin.x * v1.x + origin.y * v1.y + origin.z * v1.z;
                        float d_x = direction.x * v1.x + direction.y * v1.y + direction.z * v1.z;
                        float u = o_x + t * d_x;
                        if(u >= 0.0f && u <= 1.0f)
                        {
                            // Fetch third point from global memory
                            float4 v2 = triangles[i * 4 + 2];
                            float o_y = v2.w + origin.x * v2.x + origin.y * v2.y + origin.z * v2.z;
                            float d_y = direction.x * v2.x + direction.y * v2.y + direction.z * v2.z;
                            float v = o_y + t * d_y;
                            if(v >= 0.0f && u + v <= 1.0f)
                            {
                                // We got successful hit, store the information
                                depth = t;
                                temp.x = u;
                                temp.y = v;
                                temp.z = t;
                                temp.w = as_float(i);
                            }
                        }
                    }
                }
            }
            // Pop node from stack (if empty, finish traversal)
            if(todoOffset == 0)
            {
                break;
            }
            nodeNum = todo[--todoOffset];
        }
        // Store the ray traversal result in global memory
        result[idx] = temp;
    }
}
The first question of the day is: how could one write his Persistent while-while and Speculative while-while kernels in OpenCL?
Regarding Persistent while-while: do I understand correctly that I just launch the kernel with a global work size equal to the local work size, and that both numbers should be equal to the warp/wavefront size of the GPU?
As I understand it, in CUDA the persistent thread implementation looks like this:
do
{
    volatile int& jobIndexBase = nextJobArray[threadIndex.y];
    if(threadIndex.x == 0)
    {
        jobIndexBase = atomicAdd(&warpCounter, WARP_SIZE);
    }
    index = jobIndexBase + threadIndex.x;
    if(index >= totalJobs)
    {
        return;
    }
    /* Perform work for task numbered 'index' */
}
while(true);
What could the equivalent look like in OpenCL? I know I'll have to use some barriers in there, and I also know that one of them should come right after the block where I atomically add WARP_SIZE to warpCounter.
Regarding Speculative traversal: I don't really have any idea how this should be implemented in OpenCL, so any hints are welcome. I also have no idea where to put the barriers (putting them around a simulated __any results in a driver crash).
If you made it here, thanks for reading & any hints, answers, etc. are welcome!
An optimization you can do is to use vector variables and the fused multiply-add function to speed up your setup math. As for the rest of the kernel, it is slow because it is branchy. If you can make assumptions about the input data, you might be able to reduce the execution time by reducing the number of code branches. I have not checked the float4 swizzles (the .xxyy and the .x, .y, .z, .w after the float4 variables), so please check them.
float4 n0xy = read_imagef(nodes, sampler, nodeCoord);
float4 n1xy = read_imagef(nodes, sampler, nodeCoord + (int2)(1, 0));
float4 nz = read_imagef(nodes, sampler, nodeCoord + (int2)(2, 0));
// -origin * invdir per axis; each swizzle below must pick the axis that matches the bounds
float4 oodf4 = -origin * invdir;
float4 c0xyf4 = fma(n0xy, invdir.xxyy, oodf4.xxyy);
// nz holds (child0 lo z, child0 hi z, child1 lo z, child1 hi z)
float4 c0zc1z = fma(nz, invdir.zzzz, oodf4.zzzz);
float c0min = max4(min(c0xyf4.x, c0xyf4.y), min(c0xyf4.z, c0xyf4.w), min(c0zc1z.x, c0zc1z.y), tmin);
float c0max = min4(max(c0xyf4.x, c0xyf4.y), max(c0xyf4.z, c0xyf4.w), max(c0zc1z.x, c0zc1z.y), depth);
float4 c1xy = fma(n1xy, invdir.xxyy, oodf4.xxyy);
float c1min = max4(min(c1xy.x, c1xy.y), min(c1xy.z, c1xy.w), min(c0zc1z.z, c0zc1z.w), tmin);
float c1max = min4(max(c1xy.x, c1xy.y), max(c1xy.z, c1xy.w), max(c0zc1z.z, c0zc1z.w), depth);
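When checking swizzles like these, a scalar reference for a single child box can be handy to compare against. The following is a minimal host-side sketch in C++, not OpenCL. The `Box` and `Ray` structs and the function name `slabTest` are made up for this example, and it assumes no direction component is zero.

```cpp
#include <algorithm>
#include <cmath>

struct Box { float lox, hix, loy, hiy, loz, hiz; };
struct Ray { float ox, oy, oz, dx, dy, dz; };

// Scalar slab test: true when the ray overlaps the box somewhere in [tmin, depth].
// Each slab distance is computed as fma(bound, invdir, -origin * invdir),
// mirroring the vectorized form above one axis at a time.
bool slabTest(const Box& b, const Ray& r, float tmin, float depth) {
    const float ix = 1.0f / r.dx, iy = 1.0f / r.dy, iz = 1.0f / r.dz;
    const float oodx = -r.ox * ix, oody = -r.oy * iy, oodz = -r.oz * iz;
    const float lox = std::fma(b.lox, ix, oodx), hix = std::fma(b.hix, ix, oodx);
    const float loy = std::fma(b.loy, iy, oody), hiy = std::fma(b.hiy, iy, oody);
    const float loz = std::fma(b.loz, iz, oodz), hiz = std::fma(b.hiz, iz, oodz);
    // Latest entry and earliest exit over the three slabs
    const float tNear = std::max({std::min(lox, hix), std::min(loy, hiy), std::min(loz, hiz), tmin});
    const float tFar = std::min({std::max(lox, hix), std::max(loy, hiy), std::max(loz, hiz), depth});
    return tFar >= tNear;
}
```

Feeding the same boxes and rays through the kernel and through this function should give the same traverseChild0 / traverseChild1 decisions, up to floating-point rounding at grazing hits.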

Quaternion - Conversion between YawPitchRoll and EulerAngles produces incorrect result only with pitch of Pi

I have spent some time implementing a couple of algorithms for converting between EulerAngles and Quaternions.
I am testing that the quaternion values are the same with this code
Quaternion orientation0 = Prototype1.Mathematics.ToolBox.QuaternionFromYawPitchRoll(0, 0, 0);
Vector3 rotation = orientation0.ToEulerAngles();
Quaternion orientation1 = Prototype1.Mathematics.ToolBox.QuaternionFromYawPitchRoll(rotation.Y, rotation.X, rotation.Z);
I have used a previous method discussed here and have since implemented another method described here
public static Quaternion QuaternionFromYawPitchRoll(float yaw, float pitch, float roll)
{
    float rollOver2 = roll * 0.5f;
    float sinRollOver2 = (float)Math.Sin((double)rollOver2);
    float cosRollOver2 = (float)Math.Cos((double)rollOver2);
    float pitchOver2 = pitch * 0.5f;
    float sinPitchOver2 = (float)Math.Sin((double)pitchOver2);
    float cosPitchOver2 = (float)Math.Cos((double)pitchOver2);
    float yawOver2 = yaw * 0.5f;
    float sinYawOver2 = (float)Math.Sin((double)yawOver2);
    float cosYawOver2 = (float)Math.Cos((double)yawOver2);
    // X = PI is giving incorrect result (pitch)
    // Heading = Yaw
    // Attitude = Pitch
    // Bank = Roll
    Quaternion result;
    //result.X = cosYawOver2 * cosPitchOver2 * cosRollOver2 + sinYawOver2 * sinPitchOver2 * sinRollOver2;
    //result.Y = cosYawOver2 * cosPitchOver2 * sinRollOver2 - sinYawOver2 * sinPitchOver2 * cosRollOver2;
    //result.Z = cosYawOver2 * sinPitchOver2 * cosRollOver2 + sinYawOver2 * cosPitchOver2 * sinRollOver2;
    //result.W = sinYawOver2 * cosPitchOver2 * cosRollOver2 - cosYawOver2 * sinPitchOver2 * sinRollOver2;
    result.W = cosYawOver2 * cosPitchOver2 * cosRollOver2 - sinYawOver2 * sinPitchOver2 * sinRollOver2;
    result.X = sinYawOver2 * sinPitchOver2 * cosRollOver2 + cosYawOver2 * cosPitchOver2 * sinRollOver2;
    result.Y = sinYawOver2 * cosPitchOver2 * cosRollOver2 + cosYawOver2 * sinPitchOver2 * sinRollOver2;
    result.Z = cosYawOver2 * sinPitchOver2 * cosRollOver2 - sinYawOver2 * cosPitchOver2 * sinRollOver2;
    return result;
}
public static Vector3 ToEulerAngles(this Quaternion q)
{
    // Store the Euler angles in radians
    Vector3 pitchYawRoll = new Vector3();
    double sqx = q.X * q.X;
    double sqy = q.Y * q.Y;
    double sqz = q.Z * q.Z;
    double sqw = q.W * q.W;
    // If quaternion is normalised the unit is one, otherwise it is the correction factor
    double unit = sqx + sqy + sqz + sqw;
    double test = q.X * q.Y + q.Z * q.W;
    //double test = q.X * q.Z - q.W * q.Y;
    if (test > 0.4999f * unit) // 0.4999f OR 0.5f - EPSILON
    {
        // Singularity at north pole
        pitchYawRoll.Y = 2f * (float)Math.Atan2(q.X, q.W); // Yaw
        pitchYawRoll.X = PIOVER2; // Pitch
        pitchYawRoll.Z = 0f; // Roll
        return pitchYawRoll;
    }
    else if (test < -0.4999f * unit) // -0.4999f OR -0.5f + EPSILON
    {
        // Singularity at south pole
        pitchYawRoll.Y = -2f * (float)Math.Atan2(q.X, q.W); // Yaw
        pitchYawRoll.X = -PIOVER2; // Pitch
        pitchYawRoll.Z = 0f; // Roll
        return pitchYawRoll;
    }
    pitchYawRoll.Y = (float)Math.Atan2(2f * q.Y * q.W - 2f * q.X * q.Z, sqx - sqy - sqz + sqw); // Yaw
    pitchYawRoll.X = (float)Math.Asin(2f * test / unit); // Pitch
    pitchYawRoll.Z = (float)Math.Atan2(2f * q.X * q.W - 2f * q.Y * q.Z, -sqx + sqy - sqz + sqw); // Roll
    //pitchYawRoll.Y = (float)Math.Atan2(2f * q.X * q.W + 2f * q.Y * q.Z, 1 - 2f * (sqz + sqw)); // Yaw
    //pitchYawRoll.X = (float)Math.Asin(2f * (q.X * q.Z - q.W * q.Y)); // Pitch
    //pitchYawRoll.Z = (float)Math.Atan2(2f * q.X * q.Y + 2f * q.Z * q.W, 1 - 2f * (sqy + sqz)); // Roll
    return pitchYawRoll;
}
All my implementations work except for when the pitch value is ±PI.
Quaternion orientation0 = Prototype1.Mathematics.ToolBox.QuaternionFromYawPitchRoll(0, PI, 0);
Vector3 rotation = orientation0.ToEulerAngles();
Quaternion orientation1 = Prototype1.Mathematics.ToolBox.QuaternionFromYawPitchRoll(rotation.Y, rotation.X, rotation.Z);
Console.WriteLine(orientation1); // Not the same quaternion values
Why will this not work for that particular value? If it is a singularity then it is not being determined as one in the algorithm and the 'test' value will instead be very close to 0.
Rotation space wraps onto itself. Obviously if you rotate by 2PI around any axis, you end up back where you started. Likewise, if you rotate by PI around an axis, it's the same thing as rotating by -PI around the same axis. Or if you rotate by any angle around an axis, it's the same as rotating by the negation of that angle around the negation of that axis.
All of this means that your quaternion conversion algorithms have to decide what to do when dealing with redundancy. The two orientations that you provide in the comments are the same orientation: (0,0,0,1) and (0,0,0,-1) [I prefer having 'w' in alphabetical order].
You should make sure that you always normalize your quaternions or else you'll eventually get some strange drifting. Other than that, what seems to be happening is that when you rotate by PI around the 'z' axis, floating point round-off or a 'less-than' vs. 'less-than-or-equal-to' discrepancy is pushing the representation around the circle to the point that your algorithm decides to represent the angle as a rotation by -PI around the z-axis. That's the same thing.
In a similar manner, if you rotate by 2PI around any axis, your quaternion might be (0,0,0,-1). But if you rotate by zero, it will be (0,0,0,1). The Euler angle representation coming back from either of those quaternions, however, should be (0,0,0).
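One practical consequence is that a round-trip test should compare orientations rather than raw component values. The sketch below is written in C++ rather than the C# of the question, and the names `Quat`, `Vec3`, `rotate` and `sameOrientation` are made up for this example. It shows that a unit quaternion q and its negation -q rotate a vector identically, along with a comparison that treats them as equal.

```cpp
#include <cmath>

struct Quat { double x, y, z, w; };
struct Vec3 { double x, y, z; };

// Rotate v by the unit quaternion q, using
// v' = v + w * t + u x t, with u = (q.x, q.y, q.z) and t = 2 * (u x v).
Vec3 rotate(const Quat& q, const Vec3& v) {
    const Vec3 t = {2.0 * (q.y * v.z - q.z * v.y),
                    2.0 * (q.z * v.x - q.x * v.z),
                    2.0 * (q.x * v.y - q.y * v.x)};
    return {v.x + q.w * t.x + (q.y * t.z - q.z * t.y),
            v.y + q.w * t.y + (q.z * t.x - q.x * t.z),
            v.z + q.w * t.z + (q.x * t.y - q.y * t.x)};
}

// Two unit quaternions describe the same orientation when a == b or a == -b,
// i.e. when the absolute value of their 4D dot product is 1.
bool sameOrientation(const Quat& a, const Quat& b, double eps = 1e-6) {
    const double dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    return std::fabs(std::fabs(dot) - 1.0) < eps;
}
```

Comparing orientation0 and orientation1 with a check like `sameOrientation`, instead of component by component, avoids flagging the q versus -q case as a failure.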

GLSL: Move transformation from tessellation-evaluation shader to vertex shader

I am working in GLSL with tessellation shaders and I am trying to do displacement mapping. It's working, but I want to move the matrix transformation code from the tessellation evaluation shader to the vertex shader. The reason I want this in the vertex shader is that I do not want to do this calculation
for every vertex of every sub-triangle, and I want the vertices to be in screen space in the vertex shader so I can decide in the tessellation control shader how much each triangle should be subdivided.
The version that does not work is "almost" working; there are some issues when the triangles are rendered.
I would really appreciate even the smallest hint of what may be wrong.
This (bad) version works (position and normal are transformed in tessellation evaluation shader)
// vertex shader
void main_(void)
{
    gl_Position = VertexPosition;
    VertexTexCoord1 = VertexTexCoord;
    VertexNormal1 = VertexNormal;
}
// tessellation evaluation shader
void main_()
{
    VertexTexCoord3 = interpolate(VertexTexCoord2);
    vec3 normal = interpolate(VertexNormal2);
    vec4 pos = interpolate(gl_in[0].gl_Position, gl_in[1].gl_Position, gl_in[2].gl_Position);
    vec4 movement = vec4(normal * (texture2D(heigthMap,VertexTexCoord3).r), 0.0);
    gl_Position = mvpMatrix * (pos + movement);
}
This version does not work (position and normal are transformed in vertex shader)
// vertex shader
void main(void)
{
    gl_Position = mvpMatrix * VertexPosition;
    VertexTexCoord1 = VertexTexCoord;
    VertexNormal1 = mat3(mvpMatrix) * VertexNormal;
}
// tessellation evaluation shader
void main()
{
    VertexTexCoord3 = interpolate(VertexTexCoord2);
    vec3 normal = interpolate(VertexNormal2);
    vec4 pos = interpolate(gl_in[0].gl_Position, gl_in[1].gl_Position, gl_in[2].gl_Position);
    vec4 movement = vec4(normal * (texture2D(heigthMap,VertexTexCoord3).r), 0.0);
    gl_Position = (pos + movement);
}
In the "non-working" version, the last line of the tessellation evaluation shader seems to be incorrect. You're forgetting that in the original variant 'movement' was also multiplied by the mvpMatrix.
I would try this:
// tessellation evaluation shader
void main()
{
    VertexTexCoord3 = interpolate(VertexTexCoord2);
    vec3 normal = interpolate(VertexNormal2);
    vec4 pos = interpolate(gl_in[0].gl_Position, gl_in[1].gl_Position, gl_in[2].gl_Position);
    vec4 movement = vec4(normal * (texture2D(heigthMap,VertexTexCoord3).r), 0.0);
    /// This multiplication by mvpMatrix is inevitable
    gl_Position = (pos + mvpMatrix * movement);
}
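Sorry if I mixed up the order of the stages, but the two versions of the code above are definitely not equivalent. Note that this only matches the working version if the vertex shader passes the normal through untransformed (VertexNormal1 = VertexNormal); otherwise the normal ends up being transformed twice.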
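The difference can be written out explicitly. Here $M$ stands for mvpMatrix, $p$ for the interpolated object-space position, $n$ for the interpolated object-space normal and $h$ for the sampled height; these symbols are shorthand introduced for this note. The first line is what the working version computes, the second what the non-working version computes:

```latex
\begin{aligned}
\text{working:}\quad
  & M\left(p + \begin{pmatrix} h\,n \\ 0 \end{pmatrix}\right)
    = M p + M \begin{pmatrix} h\,n \\ 0 \end{pmatrix}, \\
\text{non-working:}\quad
  & M p + \begin{pmatrix} h\,M_{3\times 3}\,n \\ 0 \end{pmatrix}.
\end{aligned}
```

The x, y and z components agree, but the w component of $M\,(h\,n,\,0)^{T}$ is the dot product of the first three entries of the bottom row of $M$ with $h\,n$. The non-working version drops that term, so the results differ whenever the bottom row of $M$ is not $(0,0,0,1)$, as is the case with a perspective projection.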
