How to clear a gl.RGBA8UI texture - WebGL2

I have an RGBA8UI internal format texture attached to a framebuffer, and I need to reset it to a default value. Unfortunately, a simple gl.clear(gl.COLOR_BUFFER_BIT) does not seem to work (I suspect the issue is the texture's internal format); it gives this error:
[.WebGL-000020EE00AD4700] GL_INVALID_OPERATION: No defined conversion between clear value and attachment format.
The framebuffer status is FRAMEBUFFER_COMPLETE. Here is a minimal example for jsfiddle:
HTML
<canvas id="canvas"></canvas>
JS
const gl = document.getElementById('canvas').getContext('webgl2');
let targetTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, targetTexture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8UI, 1, 1, 0, gl.RGBA_INTEGER, gl.UNSIGNED_BYTE, null);
let fb = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, targetTexture, 0);
let fstatus = gl.checkFramebufferStatus(gl.FRAMEBUFFER);
console.log(fstatus, gl.FRAMEBUFFER_COMPLETE);
gl.clearColor(0, 0, 0, 0);
gl.clear(gl.COLOR_BUFFER_BIT);
QUESTION
How can I clear/reset the texture?

Well, according to the WebGL 2 spec (section 3.7.9), you cannot clear integer color buffers with clear():
If an integer color buffer is among the buffers that would be cleared, an INVALID_OPERATION error is generated and nothing is cleared.
So you either have to call texImage2D again, as you already do:
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8UI, 1, 1, 0, gl.RGBA_INTEGER, gl.UNSIGNED_BYTE, null);
or (maybe more efficiently) draw a screen-space quad with a fragment shader that writes a constant value of your liking.

I just ran into this myself. clear does not work for integer formats. Instead you can use:
gl.clearBufferuiv(gl.COLOR, 0, [0, 0, 0, 0]);
There's also clearBufferiv for signed integer formats, clearBufferfv for float formats and depth, and clearBufferfi specifically for DEPTH_STENCIL formats.
Docs are here: https://registry.khronos.org/OpenGL-Refpages/es3.0/html/glClearBuffer.xhtml

Related

Is there any reason why clEnqueueNDRangeKernel may block?

I am developing an application that uses OpenCL, targeting version 1.2. I use DX11 interoperability to display the kernel results. I have tried my code on Intel (iGPU) and Nvidia platforms, and I observe the same behaviour on both.
My call to clEnqueueNDRangeKernel is blocking the CPU thread. I have checked the documentation and cannot find a statement describing the situations in which a kernel launch may block. I have read in some forums that this happens sometimes with some OpenCL implementations. The code works properly and outputs valid results, and the API does not return an error at any point; everything seems smooth.
I cannot paste the full source, but here is the in-loop part:
size_t local = 64;
size_t global = ctx->dec_in_host->horizontal_blocks * ctx->dec_in_host->vertical_blocks * local;

print_if_error(clEnqueueWriteBuffer(ctx->queue, ctx->blocks_gpu, CL_TRUE, 0,
               sizeof(block_input) * TOTAL_BLOCKS, ctx->blocks_host, 0, NULL,
               &ctx->blocks_copy_status), "copying data");
print_if_error(clEnqueueWriteBuffer(ctx->queue, ctx->dec_in_gpu, CL_TRUE, 0,
               sizeof(decoder_input), ctx->dec_in_host, 0, NULL,
               &ctx->frame_copy_status), "copying data");

if (ctx->mode == nv_d3d11_sharing)
    print_if_error(ctx->fp_clEnqueueAcquireD3D11ObjectsNV(ctx->queue, 1,
                   &(ctx->image_gpu), 0, NULL, NULL), "Acquiring texture");
else if (ctx->mode == khr_d3d11_sharing)
    print_if_error(ctx->fp_clEnqueueAcquireD3D11ObjectsKHR(ctx->queue, 1,
                   &(ctx->image_gpu), 0, NULL, NULL), "Acquiring texture");

t1 = clock();
print_if_error(clEnqueueNDRangeKernel(ctx->queue, ctx->kernel, 1, NULL,
               &global, &local, 0, NULL, &ctx->kernel_status), "kernel launch");
t2 = clock();

if (ctx->mode == nv_d3d11_sharing)
    print_if_error(ctx->fp_clEnqueueReleaseD3D11ObjectsNV(ctx->queue, 1,
                   &(ctx->image_gpu), 0, NULL, NULL), "Releasing texture");
else if (ctx->mode == khr_d3d11_sharing)
    print_if_error(ctx->fp_clEnqueueReleaseD3D11ObjectsKHR(ctx->queue, 1,
                   &(ctx->image_gpu), 0, NULL, NULL), "Releasing texture");

printf("Elapsed time %lf ms\n", (double)(t2 - t1) * 1000 / CLOCKS_PER_SEC);
So my questions are:
Do you know any reason why clEnqueueNDRangeKernel would block?
Do you know if the DX11 interop might cause this?
Do you know if some OpenCL configuration can create a synchronous kernel launch?
Thank you :)
EDIT 1:
Thanks to doqtor's comment, I realized that commenting out parts of the kernel makes the launch asynchronous. The result is not OK, but I now have a hint to work out the answer.
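For reference, a minimal way to separate the cost of the enqueue call itself from the cost of the kernel is to time a following clFinish as well. This is only a sketch against the snippet above; t3 is an extra clock_t that is not in the original code:
clock_t t1, t2, t3;

t1 = clock();
// If this returns quickly, the enqueue itself is asynchronous.
print_if_error(clEnqueueNDRangeKernel(ctx->queue, ctx->kernel, 1, NULL,
               &global, &local, 0, NULL, &ctx->kernel_status), "kernel launch");
t2 = clock();

// clFinish blocks until all queued commands have completed,
// so t3 - t2 approximates the actual kernel execution time.
clFinish(ctx->queue);
t3 = clock();

printf("enqueue: %lf ms, finish: %lf ms\n",
       (double)(t2 - t1) * 1000 / CLOCKS_PER_SEC,
       (double)(t3 - t2) * 1000 / CLOCKS_PER_SEC);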

Issues with clEnqueueMapBuffer in OpenCL

I'm developing a program that implements recursive ray tracing in OpenCL.
To run the kernel I have two devices to choose from: the Intel one integrated with the system, and an Nvidia GeForce graphics card.
When I run the project on the first device there's no problem; it runs correctly and shows the result of the algorithm just fine.
But when I try to run it on the Nvidia device, it crashes in the callback function that performs the synchronous buffer map.
The part of the code where it crashes is the following:
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);

// 7. Look at the results via synchronous buffer map.
cl_float4 *ptr = (cl_float4 *) clEnqueueMapBuffer(queue, buffer, CL_TRUE, CL_MAP_READ, 0,
                     kWidth * kHeight * sizeof(cl_float4), 0, NULL, NULL, NULL);
cl_float *viewTransformPtr = (cl_float *) clEnqueueMapBuffer(queue, viewTransform, CL_TRUE,
                     CL_MAP_WRITE, 0, 16 * sizeof(cl_float), 0, NULL, NULL, NULL);
cl_float *worldTransformsPtr = (cl_float *) clEnqueueMapBuffer(queue, worldTransforms, CL_TRUE,
                     CL_MAP_WRITE, 0, 16 * sizeof(cl_float), 0, NULL, NULL, NULL);

memcpy(viewTransformPtr, viewMatrix, sizeof(float) * 16);
memcpy(worldTransformsPtr, sphereTransforms, sizeof(float) * 16);
clEnqueueUnmapMemObject(queue, viewTransform, viewTransformPtr, 0, 0, 0);
clEnqueueUnmapMemObject(queue, worldTransforms, worldTransformsPtr, 0, 0, 0);

unsigned char *pixels = new unsigned char[kWidth * kHeight * 4];
for (int i = 0; i < kWidth * kHeight; i++) {
    pixels[i * 4]     = ptr[i].s[0] * 255;
    pixels[i * 4 + 1] = ptr[i].s[1] * 255;
    pixels[i * 4 + 2] = ptr[i].s[2] * 255;
    pixels[i * 4 + 3] = 1;
}

glBindTexture(GL_TEXTURE_2D, 1);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, 4, kWidth, kHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
delete [] pixels;
The last two calls to clEnqueueMapBuffer return error -5, which matches CL_OUT_OF_RESOURCES, but I believe the buffer sizes are correct.

According to the CL spec, making blocking CL calls from a callback is undefined. Your code is likely correct, but you can't run it from a callback. On the Intel platform with integrated memory, the maps are no-ops and therefore do not fail.
The CL spec says, under clSetEventCallback:
The behavior of calling expensive system routines, OpenCL API calls to create contexts or command-queues, or blocking OpenCL operations from the following list, in a callback is undefined:

clFinish
clWaitForEvents
blocking calls to clEnqueueReadBuffer, clEnqueueReadBufferRect, clEnqueueWriteBuffer, and clEnqueueWriteBufferRect
blocking calls to clEnqueueReadImage and clEnqueueWriteImage
blocking calls to clEnqueueMapBuffer and clEnqueueMapImage
blocking calls to clBuildProgram

If an application needs to wait for completion of a routine from the above list in a callback, please use the non-blocking form of the function, and assign a completion callback to it to do the remainder of your work. Note that when a callback (or other code) enqueues commands to a command-queue, the commands are not required to begin execution until the queue is flushed. In standard usage, blocking enqueue calls serve this role by implicitly flushing the queue. Since blocking calls are not permitted in callbacks, those callbacks that enqueue commands on a command queue should either call clFlush on the queue before returning or arrange for clFlush to be called later on another thread.
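Applied to the crashing code above, the non-blocking pattern might look roughly like this. This is only a sketch: map_done and the user_data wiring are illustrative, not from the original code.
void CL_CALLBACK map_done(cl_event ev, cl_int status, void *user_data)
{
    cl_float4 *ptr = (cl_float4 *)user_data;
    // The mapped pointer is safe to read here; do the pixel copy in this
    // callback and avoid any blocking CL calls.
}

...
cl_event map_event;
cl_int err;
// blocking_map = CL_FALSE: the call returns immediately; the pointer only
// becomes valid once map_event completes.
cl_float4 *ptr = (cl_float4 *)clEnqueueMapBuffer(queue, buffer, CL_FALSE,
        CL_MAP_READ, 0, kWidth * kHeight * sizeof(cl_float4),
        0, NULL, &map_event, &err);
clSetEventCallback(map_event, CL_COMPLETE, map_done, ptr);
clFlush(queue); // callbacks only fire after the commands are submitted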

Draw in QGLFrameBufferObject

With Qt and OpenGL, I would like to draw into a QGLFramebufferObject.
I tried this:
QGLFramebufferObject *fbo = new QGLFramebufferObject(200, 200, QGLFramebufferObject::NoAttachment, GL_TEXTURE_2D, GL_RGBA32F);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0.0f, fbo->size().width(), fbo->size().height(), 0.0f, 0.0f, 1.0f);
fbo->bind();
glClearColor(1.0f, 0.0f, 0.0f, 1.0f);
fbo->release();
fbo->toImage().save("test.jpg");
But I don't get a red image.
OTHER QUESTION
And if I want to draw with:
glBegin(GL_QUADS);
    glColor3d(0.2, 0.5, 1.0);
    glVertex2i(10, 20);
    glColor3d(0.7, 0.5, 1.0);
    glVertex2i(15, 20);
    glColor3d(0.8, 0.4, 1.0);
    glVertex2i(15, 25);
    glColor3d(0.1, 0.9, 1.0);
    glVertex2i(10, 25);
glEnd();
Do I also need glClear()?
You never actually clear the framebuffer. glClearColor() only sets the color used for clearing, but does not clear anything. You will need to add the second line here:
glClearColor(1.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
The glClear() call will clear the color buffer with the clear color you specified in the first call. The framebuffer content is initially undefined, so you should always clear the framebuffer with a glClear() call before you start drawing. The only exception is if you're certain that the primitives you render will cover every pixel of the drawing surface. Even then, on some architectures it can actually be better for performance to still call glClear() first.
It shouldn't matter yet as long as you only clear the buffer. But once you want to start drawing, you will also need to set the viewport:
glViewport(0, 0, 200, 200);
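Putting both fixes together, a minimal sketch of the corrected flow (assuming a current GL context, as in the question):
QGLFramebufferObject *fbo = new QGLFramebufferObject(200, 200,
        QGLFramebufferObject::NoAttachment, GL_TEXTURE_2D, GL_RGBA32F);

fbo->bind();                           // render into the FBO, not the window
glViewport(0, 0, 200, 200);            // match the FBO size
glClearColor(1.0f, 0.0f, 0.0f, 1.0f);  // set the clear color to red
glClear(GL_COLOR_BUFFER_BIT);          // actually clear the color buffer
// ... draw here ...
fbo->release();

fbo->toImage().save("test.jpg");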

Creating texture from RGB

I am receiving a screenshot as a palette over the network. I then convert the palette indices to their corresponding RGB values using my matrix of colors.
The problem is that, although I receive this RGB image successfully, I cannot create a texture out of it.
void GlWidget::createTextureFromBitmap(QByteArray bytes)
{
    tex.buf = new unsigned char[bytes.size()];
    memcpy(tex.buf, bytes.constData(), bytes.size());

    glGenTextures(1, &tex.id);
    glBindTexture(GL_TEXTURE_2D, tex.id);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 0, 0, 0, GL_RGB, GL_3_BYTES, tex.buf);
    gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, 800, 600, GL_RGB, GL_3_BYTES, tex.buf);
    delete [] tex.buf;
    tex.buf = NULL;
    updateGL();
}
void GlWidget::paintGL()
{
    shaderProgram.bind();
    shaderProgram.setUniformValue("mvpMatrix", pMatrix * vMatrix * mMatrix);
    shaderProgram.setUniformValue("texture", 0);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, tex.id);
    //glActiveTexture(0);
    shaderProgram.setAttributeArray("vertex", vertices.constData());
    shaderProgram.enableAttributeArray("vertex");
    shaderProgram.setAttributeArray("textureCoordinate", textureCoordinates.constData());
    shaderProgram.enableAttributeArray("textureCoordinate");
    glDrawArrays(GL_TRIANGLES, 0, vertices.size());
    shaderProgram.disableAttributeArray("vertex");
    shaderProgram.disableAttributeArray("textureCoordinate");
    shaderProgram.release();
}
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGB, 0, 0, 0, GL_RGB, GL_3_BYTES, tex.buf);
gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, 800, 600, GL_RGB, GL_3_BYTES, tex.buf);
GL_3_BYTES is not a valid data type here; you probably mean GL_UNSIGNED_BYTE. Apart from that, there are some redundant calls here. Your glTexImage2D call specifies the data for mipmap level 0, while gluBuild2DMipmaps specifies all the mipmap levels, including 0. Since you use mipmapping, you only need the latter of those calls. But both calls will fail with GL_INVALID_ENUM due to the wrong data type, so you don't get any texture at all. You can replace those two lines with:
gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, 800, 600, GL_RGB, GL_UNSIGNED_BYTE, tex.buf);
Another thing:
glActiveTexture(0);
This is also invalid. Only the GL_TEXTUREn values are valid arguments for that call. You also do not need to "disable" the active texture selector, as you seem to try here; it always points to some valid texture unit.
In addition to the fixes mentioned above, you can improve performance by avoiding an extra memcpy (one in your code, one in the GL driver): send the palette itself as a texture, then do the same indexing you currently do in the CPU code inside the shader instead. Also, since you fill the whole bytes[] array anyway, there is no need to clear it first.
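For illustration only, the palette lookup inside the fragment shader could look roughly like this. indexTexture, paletteTexture, and the 256x1 palette layout are assumptions, not taken from the question, and the index texture should use GL_NEAREST filtering:
const char *paletteFragmentShader = R"(
    uniform sampler2D indexTexture;   // 8-bit palette indices, one per pixel
    uniform sampler2D paletteTexture; // 256x1 RGB palette
    varying vec2 vTexCoord;
    void main() {
        // .r holds index / 255.0 for an 8-bit single-channel texture
        float index = texture2D(indexTexture, vTexCoord).r;
        // sample the center of palette texel i in a 256-texel-wide row
        gl_FragColor = texture2D(paletteTexture,
                                 vec2((index * 255.0 + 0.5) / 256.0, 0.5));
    }
)";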

Opencl Buffer and kernel execution

Taken from OpenCL in Action
The following code achieves the target shown in the figure.
It creates two buffer objects and copies the content of Buffer 1 to Buffer 2 with clEnqueueCopyBuffer.
Then clEnqueueMapBuffer maps the content of Buffer 2 to host memory and memcpy transfers the mapped memory to an array.
My question is: will my code still work if I do not include the following lines?
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer_one);
err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &buffer_two);
queue = clCreateCommandQueue(context, device, 0, &err);
err = clEnqueueTask(queue, kernel, 0, NULL, NULL);
The kernel is blank; it does nothing. What is the need for setting the kernel arguments and enqueueing the task?
...
float data_one[100], data_two[100], result_array[100];
cl_mem buffer_one, buffer_two;
void *mapped_memory;
...
buffer_one = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                            sizeof(data_one), data_one, &err);
buffer_two = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                            sizeof(data_two), data_two, &err);

err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer_one);
err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &buffer_two);

queue = clCreateCommandQueue(context, device, 0, &err);
err = clEnqueueTask(queue, kernel, 0, NULL, NULL);

err = clEnqueueCopyBuffer(queue, buffer_one, buffer_two, 0, 0,
                          sizeof(data_one), 0, NULL, NULL);
mapped_memory = clEnqueueMapBuffer(queue, buffer_two, CL_TRUE, CL_MAP_READ, 0,
                                   sizeof(data_two), 0, NULL, NULL, &err);
memcpy(result_array, mapped_memory, sizeof(data_two));
err = clEnqueueUnmapMemObject(queue, buffer_two, mapped_memory, 0, NULL, NULL);
...
I believe the point of calling clEnqueueTask would be to ensure that the data is actually resident on the device. It is possible that, when using the CL_MEM_COPY_HOST_PTR flag, the memory is still kept on the host side until it is needed by a kernel. Enqueueing the task therefore ensures that the memory is brought to the device. This may also happen on some devices but not on others.
You could test this theory by instrumenting your code and measuring the time taken to run the task, both when the task has arguments, and when it does not. If the task takes significantly longer with arguments, then this is likely what is going on.
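One way to run that measurement is with OpenCL event profiling. A sketch against the excerpt above; the profiling flag on the queue and the event are the only additions:
// Create the queue with profiling enabled.
queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);

cl_event task_event;
err = clEnqueueTask(queue, kernel, 0, NULL, &task_event);
clWaitForEvents(1, &task_event);

// Device timestamps are reported in nanoseconds.
cl_ulong start = 0, end = 0;
clGetEventProfilingInfo(task_event, CL_PROFILING_COMMAND_START,
                        sizeof(start), &start, NULL);
clGetEventProfilingInfo(task_event, CL_PROFILING_COMMAND_END,
                        sizeof(end), &end, NULL);
printf("task took %f ms\n", (double)(end - start) * 1e-6);
clReleaseEvent(task_event);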
