Why does creating a 2x8 R8 texture from a 16 byte buffer fail in webgl2? - webgl2

When I try to create a 2x8 R8 texture in webgl2, I get an error. This doesn't happen for a 4x8 texture. If I double the size of the input buffer compared to what I expect, the 2x8 succeeds.
Does webgl2 have a 'column alignment' of 4 when creating/reading textures?
Here is some code that reproduces the issue. I tested it on Windows in both Chrome and Firefox:
function test_read(w) {
  let gl = document.createElement('canvas').getContext('webgl2');
  let h = 8;
  let data = new Uint8Array(w*h);
  data[5] = 5;
  let texture = gl.createTexture();
  let frameBuffer = gl.createFramebuffer();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.bindFramebuffer(gl.FRAMEBUFFER, frameBuffer);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.R8, w, h, 0, gl.RED, gl.UNSIGNED_BYTE, data);
  if (gl.getError() !== gl.NO_ERROR) {
    return 'bad w=' + w;
  }
  return 'good w=' + w;
}
console.log(test_read(4)); // good w=4
console.log(test_read(2)); // bad w=2
The error code coming out is 0x502 (INVALID_OPERATION). A similar issue happens when reading textures that were created by expanding the buffer: it seems to expect a 'column alignment' of 4.

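You need to set gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1).
The default UNPACK_ALIGNMENT is 4, which means WebGL expects every row of pixels to start on a 4-byte boundary, so each row is padded up to a multiple of 4 bytes. Since you're using R8 (1 byte per pixel) and a width of 2, your rows are only 2 bytes long, and WebGL expects more data than the 16 bytes you supply. With a width of 4 each row is exactly 4 bytes, so no padding is needed and the upload works.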
function test_read(w) {
  let gl = document.createElement('canvas').getContext('webgl2');
  let h = 8;
  let data = new Uint8Array(w*h);
  data[5] = 5;
  // ---=== ADDED ===---
  // Rows are tightly packed, so don't pad them to 4-byte boundaries.
  gl.pixelStorei(gl.UNPACK_ALIGNMENT, 1);
  let texture = gl.createTexture();
  let frameBuffer = gl.createFramebuffer();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.bindFramebuffer(gl.FRAMEBUFFER, frameBuffer);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.R8, w, h, 0, gl.RED, gl.UNSIGNED_BYTE, data);
  if (gl.getError() !== gl.NO_ERROR) {
    return 'bad w=' + w;
  }
  return 'good w=' + w;
}
console.log(test_read(4)); // good w=4
console.log(test_read(2)); // good w=2
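To see where the extra bytes come from, the arithmetic can be sketched as below. This is only a rough sketch and not part of the WebGL API: `requiredUploadBytes` is a hypothetical helper name, and it assumes the usual OpenGL ES rule that every row except the last is padded up to the unpack alignment.

```javascript
// Hypothetical helper (not a WebGL function): estimates how many bytes
// texImage2D reads from the source buffer, assuming every row except the
// last one is padded up to a multiple of UNPACK_ALIGNMENT.
function requiredUploadBytes(width, height, bytesPerPixel, alignment) {
  const rowBytes = width * bytesPerPixel;
  const rowStride = Math.ceil(rowBytes / alignment) * alignment;
  return rowStride * (height - 1) + rowBytes;
}

console.log(requiredUploadBytes(2, 8, 1, 4)); // 30: more than the 16 bytes supplied
console.log(requiredUploadBytes(4, 8, 1, 4)); // 32: exactly w*h, so it works
console.log(requiredUploadBytes(2, 8, 1, 1)); // 16: matches once alignment is 1
```

Under that assumption, the 2x8 upload with the default alignment needs 30 bytes, which is why doubling the buffer to 32 bytes also made the error go away.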


WebGL2 not writing second output of `out int[2]` result

When I read output from the fragment shader:
#version 300 es
precision highp float;
precision highp int;
out int outColor[2];
void main() {
    outColor[0] = 5;
    outColor[1] = 2;
}
rendered into a 32 bit integer RG texture, I find that only the 5s have been written but not the 2s. Presumably I've got some format specifier wrong somewhere. Or I might be attaching the framebuffer to the wrong thing (gl.COLOR_ATTACHMENT0). I've tried varying various arguments but most changes that I make result in nothing coming out due to formats not lining up. It might be that I need to change 3 constants in tandem.
Here's my self-contained source. The output I want is an array alternating between 5 and 2. Instead, I get an array alternating between 5 and semi-random large constants and 0.
let canvas /** @type {HTMLCanvasElement} */ = document.createElement('canvas');
let gl = canvas.getContext("webgl2");

let vertexShader = gl.createShader(gl.VERTEX_SHADER);
gl.shaderSource(vertexShader, `#version 300 es
    in vec4 a_position;
    void main() {
        gl_Position = a_position;
    }`);
gl.compileShader(vertexShader);
console.assert(gl.getShaderParameter(vertexShader, gl.COMPILE_STATUS), "Vertex shader compile failed.");

let fragmentShader = gl.createShader(gl.FRAGMENT_SHADER);
gl.shaderSource(fragmentShader, `#version 300 es
    precision highp float;
    precision highp int;
    out int outColor[2];
    void main() {
        outColor[0] = 5;
        outColor[1] = 2;
    }`);
gl.compileShader(fragmentShader);
console.assert(gl.getShaderParameter(fragmentShader, gl.COMPILE_STATUS), "Fragment shader compile failed.");

let program = gl.createProgram();
gl.attachShader(program, vertexShader);
gl.attachShader(program, fragmentShader);
gl.linkProgram(program);

// One oversized triangle that covers the whole viewport.
let positionBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([-3, -1, 1, 3, 1, -1]), gl.STATIC_DRAW);
let positionAttributeLocation = gl.getAttribLocation(program, "a_position");
let vao = gl.createVertexArray();
gl.bindVertexArray(vao);
gl.enableVertexAttribArray(positionAttributeLocation);
gl.vertexAttribPointer(positionAttributeLocation, 2, gl.FLOAT, false, 0, 0);

// Render target: a 4x4 two-channel 32-bit integer texture.
let w = 4;
let h = 4;
let texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RG32I, w, h, 0, gl.RG_INTEGER, gl.INT, null);
let frameBuffer = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, frameBuffer);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);

gl.useProgram(program);
gl.viewport(0, 0, w, h);
gl.drawArrays(gl.TRIANGLES, 0, 3);

// Read the result back: two ints (R, G) per pixel.
let outputBuffer = new Int32Array(w*h*2);
gl.readPixels(0, 0, w, h, gl.RG_INTEGER, gl.INT, outputBuffer);
Arrayed outputs like out int outColor[2]; are used for outputting to multiple render targets. In your case, two render targets with one channel each, because you've used a scalar type.
To express a single render target with two channels, try out ivec2 outColor;.
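As a rough illustration of that suggestion, the fragment shader could look like the string below. The names `fragmentShaderSource` and `splitRG` are made up for this sketch, and the sample array only stands in for what a real readback would contain if every pixel were written as (5, 2).

```javascript
// Fragment shader with ONE output that has two integer channels,
// instead of two single-channel outputs (render targets).
const fragmentShaderSource = `#version 300 es
precision highp float;
precision highp int;
out ivec2 outColor;
void main() {
    outColor = ivec2(5, 2);
}`;

// Hypothetical helper: split an interleaved RG readback into [r, g] pairs.
function splitRG(interleaved) {
  const pairs = [];
  for (let i = 0; i + 1 < interleaved.length; i += 2) {
    pairs.push([interleaved[i], interleaved[i + 1]]);
  }
  return pairs;
}

// Stand-in data for a 2x1 readback in which every pixel is (5, 2).
console.log(splitRG(new Int32Array([5, 2, 5, 2])));
```

The rest of the setup (RG32I texture on gl.COLOR_ATTACHMENT0) can stay as in the question, since it already describes one render target with two channels.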

Zoom graphics processing 2.2.1

Help, please. On an Arduino Uno I receive a signal from a sensor and plot it as a graph using Processing 2.2.1, but I need to scale the graph up without losing its proportions. My attempts failed: the proportions broke down (I tried multiplying the values). Code:
import processing.serial.*;

Serial myPort;
int xPos = 1;
int yPos = 100;
float yOld = 0;
float yNew = 0;
float inByte = 0;
int lastS = 0;
PFont f;

void setup () {
  size(1200, 500);
  myPort = new Serial(this, Serial.list()[0], 9600);
}

void draw () {
  // Draw a tick mark and a label once per second.
  int s = second();
  PFont f = createFont("Arial",9,false);
  if (s != lastS){
    stroke(0xcc, 0xcc, 0xcc);
    line(xPos, yPos+10, xPos, yPos+30);
    text(s + " Sec.", xPos+5, yPos+30);
    lastS = s;
  }
}

void mousePressed(){
  save(lastS + "-heart.jpg");
}

void serialEvent (Serial myPort) {
  String inString = myPort.readStringUntil('\n');
  if (inString != null) {
    inString = trim(inString);
    if (inString.equals("!")) {
      stroke(0, 0, 0xff); // blue
      inByte = 1023;
    } else {
      stroke(0xff, 0, 0); //Set stroke to red ( R, G, B)
      inByte = float(inString);
    }
    // Map the 10-bit sensor reading to the window height.
    inByte = map(inByte, 0, 1023, 0, height);
    yNew = inByte;
    line(xPos-1, yPos-yOld, xPos, yPos-yNew);
    yOld = yNew;
    if (xPos >= width) {
      xPos = 1;
      if (yPos > height-200){
        xPos = 1;
      }
    } else {
      xPos++;
    }
  }
}
There are multiple ways to scale graphics.
A simple method to try is to scale() the rendering (the drawing coordinate system).
Bear in mind that currently the buffer is only cleared when xPos reaches the right-hand side of the screen.
The value from Arduino is mapped to Processing here:
inByte = map(inByte, 0, 1023, 0, height);
yNew = inByte;
Try mapping to a value other than height, as you see fit.
This, however, scales only the y value. The x value (xPos) is incremented separately; you might want to change that increment to a value that keeps the proportion you are trying to maintain between x and y.
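The arithmetic behind keeping the proportion can be sketched as follows. This is plain JavaScript rather than Processing, and `plotPoint`, `baseHeight` and `zoom` are hypothetical names chosen for the illustration; the point is only that the x step and the y range are multiplied by the same factor.

```javascript
// Hypothetical helper: where sample number `index` with raw reading `value`
// (0..1023) lands when the graph is zoomed by the factor `zoom`.
function plotPoint(index, value, baseHeight, zoom) {
  const x = index * zoom;                        // x step grows with the zoom
  const y = (value / 1023) * baseHeight * zoom;  // like map(value, 0, 1023, 0, baseHeight * zoom)
  return { x, y };
}

const normal = plotPoint(10, 511.5, 500, 1);
const zoomed = plotPoint(10, 511.5, 500, 2);
console.log(normal, zoomed);
console.log(normal.y / normal.x === zoomed.y / zoomed.x); // same proportion
```

In the sketch itself this would correspond to mapping to height times the zoom factor and advancing xPos by the zoom factor instead of by 1.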

Receiving denormalized output texture coordinates in Frag shader

See rationale at the end of my question below
Using WebGL2 I can access a texel by its denormalized coordinates (sorry, I don't know the right lingo for this). That means I don't have to scale them down to 0-1 like I do in texture2D().
However the input to the fragment shader is still the vec2/3 in normalized values.
Is there a way to declare in/out variables in the Vertex and Frag shaders so that I don't have to scale the coordinates?
somewhere in vertex shader:
out vec2 TextureCoordinates;
somewhere in frag shader:
in vec2 TextureCoordinates;
I would like for TextureCoordinates to be ivec2 and already scaled.
This question and all my other questions on WebGL are related to general-purpose computing using WebGL. We are trying to do tensor (multi-dimensional matrix) operations using WebGL.
We map our data in a few ways to a Texture. The simplest approach we follow is -- assuming we can access our data as a flat array -- to lay it out along the texture's width and go up the texture's height until we're done.
Since our thinking, logic, and calculations are all based on tensor/matrix indices -- inside the fragment shader -- we'd have to map back to/from the X-Y texture coordinates to indices. The intermediate step here is to calculate an offset for a given position of a texel. Then from that offset we can calculate the matrix indices from its strides.
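The mapping described above (texel position to flat offset, then offset to tensor indices via strides) can be sketched on the CPU like this. It is only an illustration in JavaScript, assuming a row-major layout; `stridesFor` and `texelToIndices` are hypothetical names, not part of our shader code.

```javascript
// Row-major strides for a tensor shape, e.g. [2, 3, 4] -> [12, 4, 1].
function stridesFor(shape) {
  const strides = new Array(shape.length);
  let stride = 1;
  for (let i = shape.length - 1; i >= 0; --i) {
    strides[i] = stride;
    stride *= shape[i];
  }
  return strides;
}

// Texel (x, y) -> flat offset -> tensor indices.
function texelToIndices(x, y, textureWidth, shape) {
  let offset = y * textureWidth + x;
  const indices = [];
  for (const stride of stridesFor(shape)) {
    indices.push(Math.floor(offset / stride));
    offset %= stride;
  }
  return indices;
}

// A 2x3x4 tensor (24 values) laid out in an 8x3 texture.
console.log(texelToIndices(3, 2, 8, [2, 3, 4])); // [1, 1, 3]
```

Here texel (3, 2) has offset 2*8 + 3 = 19, and 19 = 1*12 + 1*4 + 3*1 gives the indices [1, 1, 3].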
Calculating an offset in webgl 1 for very large textures seems to be taking much longer than webgl2 using the integer coordinates. See below:
WebGL 1 offset calculation
int coordsToOffset(vec2 coords, int width, int height) {
    float s = coords.s * float(width);
    float t = coords.t * float(height);
    int offset = int(t) * width + int(s);
    return offset;
}
vec2 offsetToCoords(int offset, int width, int height) {
    int t = offset / width;
    int s = offset - t*width;
    vec2 coords = (vec2(s,t) + vec2(0.5,0.5)) / vec2(width, height);
    return coords;
}
WebGL 2 offset calculation in the presence of int coords
int coordsToOffset(ivec2 coords, int width) {
    return coords.t * width + coords.s;
}
ivec2 offsetToCoords(int offset, int width) {
    int t = offset / width;
    int s = offset - t*width;
    return ivec2(s,t);
}
It should be clear that for a series of large texture operations we're saving hundreds of thousands of operations just on the offset/coords calculation.
It's not clear why you want to do what you're trying to do. It would be better to ask something like "I'm trying to draw an image/implement post processing glow/do ray tracing/... and to do that I want to use un-normalized texture coordinates because [reason]" and then we can tell you if your solution is going to work and how to solve it.
In any case, passing int or unsigned int or ivec2/3/4 or uvec2/3/4 as a varying is supported, but interpolating them is not: you have to declare them as flat.
Still, you can pass un-normalized values as float or vec2/3/4 and then convert them to int or ivec2/3/4 in the fragment shader.
The other issue is that you get no filtering with texelFetch, the function that takes texel coordinates instead of normalized texture coordinates. It just returns the exact value of a single texel; it does not support filtering like the normal texture function.
function main() {
  const gl = document.querySelector('canvas').getContext('webgl2');
  if (!gl) {
    return alert("need webgl2");
  }
  const vs = `
#version 300 es
in vec4 position;
in ivec2 texelcoord;
out vec2 v_texcoord;
void main() {
  // pass the integer texel coordinate on as a float so it gets interpolated
  v_texcoord = vec2(texelcoord);
  gl_Position = position;
}
`;
  const fs = `
#version 300 es
precision mediump float;
in vec2 v_texcoord;
out vec4 outColor;
uniform sampler2D tex;
void main() {
  // convert back to integer texel coordinates in the fragment shader
  outColor = texelFetch(tex, ivec2(v_texcoord), 0);
}
`;
  // compile shaders, link program, look up locations
  const programInfo = twgl.createProgramInfo(gl, [vs, fs]);
  // create buffers (via gl.createBuffer, gl.bindBuffer, gl.bufferData)
  const bufferInfo = twgl.createBufferInfoFromArrays(gl, {
    position: {
      numComponents: 2,
      data: [
        -.5, -.5,
         .5, -.5,
          0,  .5,
      ],
    },
    texelcoord: {
      numComponents: 2,
      data: new Int32Array([
         0,  0,
        15,  0,
         8, 15,
      ]),
    },
  });
  // make a 16x16 texture
  const ctx = document.createElement('canvas').getContext('2d');
  ctx.canvas.width = 16;
  ctx.canvas.height = 16;
  for (let i = 23; i > 0; --i) {
    ctx.fillStyle = `hsl(${i / 23 * 360 | 0}, 100%, ${i % 2 ? 25 : 75}%)`;
    ctx.beginPath();
    ctx.arc(8, 15, i, 0, Math.PI * 2, false);
    ctx.fill();
  }
  const tex = twgl.createTexture(gl, { src: ctx.canvas });
  gl.useProgram(programInfo.program);
  twgl.setBuffersAndAttributes(gl, programInfo, bufferInfo);
  // no need to set uniforms since they default to 0
  // and only one texture which is already on texture unit 0
  gl.drawArrays(gl.TRIANGLES, 0, 3);
}
main();
<canvas></canvas>
<script src="https://twgljs.org/dist/4.x/twgl-full.min.js"></script>
So in response to your updated question it's still not clear what you want to do. Why do you want to pass varyings to the fragment shader? Can't you just do whatever math you want in the fragment shader itself?
uniform sampler2D tex;
out float result;
void main() {
  // sum all the values in the texture
  vec4 sum4 = vec4(0);
  ivec2 texDim = textureSize(tex, 0);
  for (int y = 0; y < texDim.y; ++y) {
    for (int x = 0; x < texDim.x; ++x) {
      sum4 += texelFetch(tex, ivec2(x, y), 0);
    }
  }
  result = sum4.x + sum4.y + sum4.z + sum4.w;
}

uniform isampler2D indices;
uniform sampler2D data;
out float result;
void main() {
  // sum only the values in data pointed to by indices
  vec4 sum4 = vec4(0);
  ivec2 texDim = textureSize(indices, 0);
  for (int y = 0; y < texDim.y; ++y) {
    for (int x = 0; x < texDim.x; ++x) {
      ivec2 index = texelFetch(indices, ivec2(x, y), 0).xy;
      sum4 += texelFetch(data, index, 0);
    }
  }
  result = sum4.x + sum4.y + sum4.z + sum4.w;
}
Note that I'm also not an expert in GPGPU, but I have a hunch the code above is not the fastest way, because I believe parallelization happens based on output. The code above has only 1 output, so there is probably no parallelization. It would be easy to change it so that it takes a block ID, tile ID, or area ID as input and computes just the sum for that area. Then you'd write out a larger texture with the sum of each block and finally sum the block sums.
Also, dependent and non-uniform texture reads are a known perf issue. The first example reads the texture in order, which is cache friendly. The second example reads the texture in a random order (specified by indices), which is not cache friendly.
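The block-sum idea mentioned above (sum each block separately, then sum the block sums) can be sketched on the CPU like this. It is only an illustration of the data flow in plain JavaScript, not GPU code; `blockSums` and `reduceSum` are hypothetical names, and on a GPU each entry returned by `blockSums` would be one output pixel computed in parallel.

```javascript
// Pass 1: one partial sum per block of `blockSize` values.
function blockSums(values, blockSize) {
  const sums = [];
  for (let start = 0; start < values.length; start += blockSize) {
    const end = Math.min(start + blockSize, values.length);
    let sum = 0;
    for (let i = start; i < end; ++i) {
      sum += values[i];
    }
    sums.push(sum);
  }
  return sums;
}

// Repeat the pass on the partial sums until a single value is left.
function reduceSum(values, blockSize) {
  if (blockSize < 2) {
    throw new Error('blockSize must be at least 2');
  }
  let current = Array.from(values);
  while (current.length > 1) {
    current = blockSums(current, blockSize);
  }
  return current.length ? current[0] : 0;
}

const input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
console.log(blockSums(input, 4)); // [10, 26, 19]
console.log(reduceSum(input, 4)); // 55
```

Each pass shrinks the data by the block size, so a large texture needs only a handful of passes, and every pass has many independent outputs.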

QGLWidget - distortion occurred

I would like to display sample6 of the OptixSDK in a QGLWidget.
My application has only 3 QSlider for the rotation around the X,Y,Z axis and the QGLWidget.
As I understand it, paintGL() gets called whenever updateGL() is called by my QSlider or by mouse events. Then I initialize a rotation matrix and apply this matrix to the PinholeCamera in order to trace the scene with the new, transformed camera coordinates, right?
When tracing is finished, I get the output buffer and use it to draw the pixels with glDrawPixels(), just like in GLUTdisplay.cpp given in the OptiX framework.
But my issue is that the image is skewed/distorted. For example, I wanted to display a ball, but the ball is extremely flattened, although the rotation works fine.
When I zoom out, it seems that the image scales much more slowly horizontally than vertically.
I am almost sure (or at least hope) that it has something to do with the gl...() functions not being used properly. What am I missing? Can someone help me out?
For completeness, I'm posting my initializeGL() and paintGL() code.
void MyGLWidget::initializeGL()
{
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    m_scene = new MeshViewer();
    m_scene->setMesh( (std::string( sutilSamplesDir() ) + "/ball.obj").c_str());
    int buffer_width, buffer_height;
    // Set up scene
    SampleScene::InitialCameraData initial_camera_data;
    m_scene->setUseVBOBuffer( false );
    m_scene->initScene( initial_camera_data );
    int m_initial_window_width = 400;
    int m_initial_window_height = 400;
    if( m_initial_window_width > 0 && m_initial_window_height > 0)
        m_scene->resize( m_initial_window_width, m_initial_window_height );
    // Initialize camera according to scene params
    m_camera = new PinholeCamera( initial_camera_data.eye,
                                  -1.0f, // hfov is ignored when using keep vertical
                                  PinholeCamera::KeepVertical );
    Buffer buffer = m_scene->getOutputBuffer();
    RTsize buffer_width_rts, buffer_height_rts;
    buffer->getSize( buffer_width_rts, buffer_height_rts );
    buffer_width = static_cast<int>(buffer_width_rts);
    buffer_height = static_cast<int>(buffer_height_rts);
    float3 eye, U, V, W;
    m_camera->getEyeUVW( eye, U, V, W );
    SampleScene::RayGenCameraData camera_data( eye, U, V, W );
    // Initial compilation
    // Accel build
    m_scene->trace( camera_data );
    m_scene->getContext()->launch( 0, 0 );
    // Initialize state
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, 1, 0, 1, -1, 1 );
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glViewport(0, 0, buffer_width, buffer_height);
}
And here is paintGL()
void MyGLWidget::paintGL()
{
    float3 eye, U, V, W;
    m_camera->getEyeUVW( eye, U, V, W );
    SampleScene::RayGenCameraData camera_data( eye, U, V, W );
    {
        nvtx::ScopedRange r( "trace" );
        m_scene->trace( camera_data );
    }
    // Draw the resulting image
    Buffer buffer = m_scene->getOutputBuffer();
    RTsize buffer_width_rts, buffer_height_rts;
    buffer->getSize( buffer_width_rts, buffer_height_rts );
    int buffer_width = static_cast<int>(buffer_width_rts);
    int buffer_height = static_cast<int>(buffer_height_rts);
    RTformat buffer_format = buffer.get()->getFormat();
    GLvoid* imageData = buffer->map();
    assert( imageData );
    switch (buffer_format) {
        /*... set gl_data_type and gl_format ...*/
    }
    // Pick the largest unpack alignment that divides the element size
    RTsize elementSize = buffer->getElementSize();
    int align = 1;
    if ((elementSize % 8) == 0) align = 8;
    else if ((elementSize % 4) == 0) align = 4;
    else if ((elementSize % 2) == 0) align = 2;
    glPixelStorei(GL_UNPACK_ALIGNMENT, align);
    gldata = QGLWidget::convertToGLFormat(image_data);
    glDrawPixels( static_cast<GLsizei>( buffer_width ), static_cast<GLsizei>( buffer_height ), gl_format, gl_data_type, imageData);
    // glDraw
}
After hours of debugging, I found out that I had forgotten to set the camera parameters correctly; it had nothing to do with the OpenGL stuff.
My U-coordinate, the horizontal axis of view plane was messed up, but the V,W and eye coordinates were right.
After I added these lines in initializeGL()
PinholeCamera::KeepVertical );
everything was right.

Mandelbrot set using OpenCL

I'm trying to use (roughly) the same code that I used when running with TBB (Threading Building Blocks).
I don't have a great deal of experience with OpenCL, but I think most of the main code is correct. I believe the errors are in the .cl file, where it does the math.
Here is my mandelbrot code in TBB:
Mandelbrot TBB
Here is my code in OpenCL
Mandelbrot OpenCL
Any help would be greatly appreciated.
I changed the code in the kernel, and it ran fine. My new kernel code is the following:
// voronoi kernels
// local memory version
kernel void voronoiL(write_only image2d_t outputImage)
{
    // get id of element in array
    int x = get_global_id(0);
    int y = get_global_id(1);
    int w = get_global_size(0);
    int h = get_global_size(1);
    float4 result = (float4)(0.0f,0.0f,0.0f,1.0f);
    float MinRe = -2.0f;
    float MaxRe = 1.0f;
    float MinIm = -1.5f;
    float MaxIm = MinIm+(MaxRe-MinRe)*h/w;
    float Re_factor = (MaxRe-MinRe)/(w-1);
    float Im_factor = (MaxIm-MinIm)/(h-1);
    float MaxIterations = 50;
    //C imaginary
    float c_im = MaxIm - y*Im_factor;
    //C real
    float c_re = MinRe + x*Re_factor;
    //Z real
    float Z_re = c_re, Z_im = c_im;
    bool isInside = true;
    bool col2 = false;
    bool col3 = false;
    int iteration = 0;
    for(int n=0; n<MaxIterations; n++)
    {
        // Z - real and imaginary
        float Z_re2 = Z_re*Z_re, Z_im2 = Z_im*Z_im;
        // if |Z|^2 = Z_re^2 + Z_im^2 exceeds 4 (escape radius 2), the point is outside the set
        if(Z_re2 + Z_im2 > 4)
        {
            if(n >= 0 && n <= (MaxIterations/2-1))
            {
                col2 = true;
                isInside = false;
            }
            else if(n >= MaxIterations/2 && n <= MaxIterations-1)
            {
                col3 = true;
                isInside = false;
            }
            // record the escape iteration and stop iterating
            iteration = n;
            break;
        }
        Z_im = 2*Z_re*Z_im + c_im;
        Z_re = Z_re2 - Z_im2 + c_re;
    }
    if(col2)
    {
        result = (float4)(iteration*0.05f,0.0f, 0.0f, 1.0f);
    }
    else if(col3)
    {
        result = (float4)(255, iteration*0.05f, iteration*0.05f, 1.0f);
    }
    else if(isInside)
    {
        result = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
    }
    write_imagef(outputImage, (int2)(x, y), result);
}
You can also find it here:
See this link. It was developed by @eric-bainville. The CPU code, both native and with OpenCL, is not optimal (it does not use SSE/AVX), but I think the GPU code may be good. For the CPU you can speed up the code quite a bit by using AVX and operating on eight pixels at once.
