I am developing a library for Qt that extends its OpenGL functionality to support modern OpenGL (3+) features such as texture buffers, image load-store textures, shader storage buffers, atomic counter buffers, etc. In order to fully support features like transform feedback, I allow users to bind a buffer object to different targets at different times (regardless of which target the buffer's data was allocated with). Consider the following scenario, where I use transform feedback to advance vertex data ONCE, and then bind that buffer to a separate program for rendering (used for the rest of the application's run time):
// I attach a (previously allocated) buffer to the transform feedback target so that I
// can capture data advanced in a shader program.
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, someBufferID);
// Then I execute the shader program...
// and release the buffer from the transform feedback target.
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, 0);
// Then, I bind the same buffer containing data advanced via transform feedback
// to the array buffer target for use in a separate shader program.
glBindBuffer(GL_ARRAY_BUFFER, someBufferID);
// Then, I render something using this buffer as the data source for a vertex attrib.
// After, I release this buffer from the array buffer target.
glBindBuffer(GL_ARRAY_BUFFER, 0);
In this scenario, being able to bind a buffer object to multiple targets is useful. However, I am uncertain whether there are situations in which this capability would cause problems under the OpenGL specification. Essentially, should I allow a single buffer object to be bound to multiple targets, or should I force a single target at instantiation (as the standard Qt buffer wrappers do)?
Edit:
I have found that there is a problem mixing creation/binding targets with TEXTURE objects, as the OpenGL documentation for glBindTexture states:
GL_INVALID_OPERATION is generated if texture was previously created
with a target that doesn't match that of target.
However, the documentation for glBindBuffer mentions no such restriction.
Well, there can always be faulty drivers (and if you don't have "green" hardware nothing can be guaranteed, anyway). But rest assured that it is perfectly legal to bind buffers to any target you want, disregarding the target they were created with.
There might be certain subtleties, such as certain targets needing to be bound in an indexed way using glBindBufferRange/glBindBufferBase (uniform buffers and transform feedback buffers, for example). But enforcing a buffer to be used with a single target only, as Qt does, is too rigid to be useful in modern OpenGL, as your very common example already shows. A good compromise might be a default target, set either per class (as Qt does) or per buffer object. This default will be sufficient in most simple situations, where the buffer is always used with a single target, and it can be used for class-internal binding operations if necessary, while you still provide the option to bind the buffer to something else.
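A minimal sketch of that compromise. The GLBuffer class and its method names are purely illustrative (not Qt API), and it assumes an OpenGL 3+ context with a function loader (GLEW, glad, ...) already in place:

class GLBuffer
{
public:
    explicit GLBuffer(GLenum defaultTarget = GL_ARRAY_BUFFER)
        : m_defaultTarget(defaultTarget)
    { glGenBuffers(1, &m_id); }

    ~GLBuffer() { glDeleteBuffers(1, &m_id); }

    // Bind to the per-object default target...
    void bind() const { glBindBuffer(m_defaultTarget, m_id); }

    // ...or to any other target on demand.
    void bind(GLenum target) const { glBindBuffer(target, m_id); }

    // Indexed targets (uniform, transform feedback, atomic counter, SSBO)
    // additionally need glBindBufferBase/glBindBufferRange.
    void bindBase(GLenum target, GLuint index) const
    { glBindBufferBase(target, index, m_id); }

private:
    GLuint m_id = 0;
    GLenum m_defaultTarget;
};

In your transform feedback example, the same object could then call buffer.bindBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0) for the capture pass and buffer.bind(GL_ARRAY_BUFFER) for rendering.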
I'm trying to work with 2D in Vulkan alongside 3D, so right now I'm testing out updating a texture every frame with whatever 2D content is going on. I've gotten something of a texture updater working; the problem is that it's very slow and probably not the way it's supposed to be done. Is there a better way of getting this done? The code is based on the https://vulkan-tutorial.com/ code.
https://vulkan-tutorial.com/code/26_depth_buffering.cpp
void UpdateTexture()
{
    vkDeviceWaitIdle(device);
    vkFreeMemory(device, textureImageMemory, nullptr);

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(imageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, imageSize, 0, &data);
    memcpy(data, pixel2.data(), static_cast<size_t>(imageSize));
    vkUnmapMemory(device, stagingBufferMemory);

    createImage(texWidth, texHeight, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_TILING_OPTIMAL, VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, textureImage, textureImageMemory);
    transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
    copyBufferToImage(stagingBuffer, textureImage, static_cast<uint32_t>(texWidth), static_cast<uint32_t>(texHeight));
    transitionImageLayout(textureImage, VK_FORMAT_R8G8B8A8_SRGB, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);

    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);

    createTextureImageView();
    createDescriptorPool();
    createDescriptorSets();
    createCommandBuffers();
}
This code looks like a direct translation of some OpenGL code, and not particularly good/modern OpenGL code at that.
There's a lot wrong in this code, but most of it boils down to over-synchronization.
First, you should always view any call to vkDeviceWaitIdle as the wrong thing to do. The only exception would be when you are preparing to destroy the VkDevice itself. There is no other reason to do a full CPU/GPU sync like that.
Presumably, this synchronization exists so that you can be sure the GPU is finished using the image before modifying it. This is the wrong thing to do. You should instead employ multiple-buffering. That is, you should have two images that you use. One is currently being used in a rendering process, while the other is being transferred into.
Instead of doing a full device sync, synchronize with the batch you sent two frames ago. That is, if you want to transfer data for use by frame 10, you must first do a fence-sync operation with the batch you sent in frame 8. Frame 9 is still being processed, but frame 8 is probably done by now, so the synchronization shouldn't hurt too much.
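A rough sketch of that frames-in-flight scheme; the FrameResources struct, FRAME_COUNT, and frameIndex are assumptions for illustration, not part of the tutorial code:

#include <vulkan/vulkan.h>

constexpr uint32_t FRAME_COUNT = 2;

struct FrameResources
{
    VkFence inFlightFence;        // signaled when this frame's batch finishes
    VkImage textureImage;         // per-frame image to transfer into
    VkDeviceMemory textureMemory;
};
FrameResources frames[FRAME_COUNT];

void BeginFrame(VkDevice device, uint64_t frameIndex)
{
    FrameResources& frame = frames[frameIndex % FRAME_COUNT];

    // Wait only on the batch submitted FRAME_COUNT frames ago, which last
    // used these resources -- not on the whole device.
    vkWaitForFences(device, 1, &frame.inFlightFence, VK_TRUE, UINT64_MAX);
    vkResetFences(device, 1, &frame.inFlightFence);

    // It is now safe to record transfers into frame.textureImage; the
    // queue submission for this frame must signal frame.inFlightFence again.
}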
Second, never allocate memory in the middle of an operation like this. Memory gets allocated early in your application, and you leave it allocated until it's time to destroy your application. If you need a staging buffer, then keep it around and reuse it in subsequent frames. Make sure to allocate sufficient storage up-front.
Whatever your createBuffer call is doing, it seems very much like a bad idea. Vulkan is not OpenGL; Vulkan separated memory from buffers/textures that use it for a reason. Creating APIs that hide this separation basically throws all of that away.
Similarly, never unmap memory unless you're about to destroy that memory object. There's no problem in Vulkan (or OpenGL) with leaving a piece of memory mapped indefinitely. Just map the entire memory range and leave it mapped. Indeed, you could pass the mapped pointer directly to your image loader, depending on how the memory gets written by the image loading code (if it tries to read data back through this pointer, there could be trouble).
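Putting the last two points together, a sketch of a staging buffer created once and mapped persistently. createBuffer stands in for the question's helper, and note that staging memory must also be HOST_VISIBLE (the snippet above requested only HOST_COHERENT); that flag combination is my assumption about what the helper should pass:

#include <cstring>            // memcpy
#include <vulkan/vulkan.h>

VkBuffer stagingBuffer;
VkDeviceMemory stagingBufferMemory;
void* stagingPtr = nullptr;   // stays valid for the application's lifetime

void InitStaging(VkDevice device, VkDeviceSize maxImageSize)
{
    createBuffer(maxImageSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
                 VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
                 stagingBuffer, stagingBufferMemory);

    // Map the whole range once; never unmap until shutdown.
    vkMapMemory(device, stagingBufferMemory, 0, VK_WHOLE_SIZE, 0, &stagingPtr);
}

void UploadPixels(const void* pixels, VkDeviceSize size)
{
    // No per-frame map/unmap; HOST_COHERENT also removes the need to flush.
    memcpy(stagingPtr, pixels, static_cast<size_t>(size));
    // ...then record vkCmdCopyBufferToImage into this frame's command buffer.
}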
Lastly, the commands doing the transfer need to be synchronized with the commands that consume the image. How this happens depends on which queues are being used to do the transfer.
And of course, if you want optimal performance, you may want to check to see if your implementation can read from linear images in your shader. If it can, then you may not need staging at all; you can just write the data directly to the memory in Vulkan's image format, and use it directly.
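Checking for that capability is a straightforward query; a sketch:

// Ask whether the format can be sampled from linearly tiled images.
VkFormatProperties props;
vkGetPhysicalDeviceFormatProperties(physicalDevice, VK_FORMAT_R8G8B8A8_SRGB, &props);
if (props.linearTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)
{
    // A VK_IMAGE_TILING_LINEAR image in HOST_VISIBLE memory could then be
    // written directly by the CPU and sampled without any staging copy.
}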
Employing all of the above is going to add a lot of complexity to your application. But that's how it's supposed to work.
A naive approach is to use the CPU to compute the update as a function of time or data, and then upload the updated data for the shader, such as an MVP transformation matrix. But this is inefficient, with lots of syncing and low refresh rates, and it also overloads the CPU in a loop.
So people recommend using many buffers, sometimes mentioning old drivers. If someone can clarify that, it would be nice. I have a naive and probably wrong guess: if they knew the exact frame rate, they could calculate the time for each frame and dispatch several frames in advance. But this confuses me, because the frame rate is dynamic, especially for newer screens with FreeSync, which have dynamic refresh rates.
I have thought of a third possibility: using the clock directly in the shader. GL_EXT_shader_realtime_clock provides clockRealtimeEXT. It has no defined unit and will wrap when it exceeds its maximum value, but it is said to be "globally coherent by all invocations on the GPU". During initialization, you can measure its rate using a uniform buffer, then assume the rate will stay constant, and also manage the wrapping.
Then, if you can write your shaders as a function of time, for example for a translation, that would be efficient: you only need the initial data. Keep in mind that divergent if conditions in shaders can be costly.
VkImageCreateInfo has the following member:
VkFormat format;
And VkImageViewCreateInfo has the same member.
What I don't understand is why you would ever have a different format in the VkImageView than in the VkImage used to create it.
I understand that some formats are compatible with one another, but I don't know why you would use one of the alternate formats.
The canonical use case and primary original motivation (in D3D10, where this idea originated) is using a single image as either R8G8B8A8_UNORM or R8G8B8A8_SRGB -- either because it holds different content at different times, or because sometimes you want to operate in sRGB space without linearization.
More generally, it's useful sometimes to have different "types" of content in an image object at different times -- this gives engines a limited form of memory aliasing, and was introduced to graphics APIs several years before full-featured memory aliasing was a thing.
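A sketch of that UNORM/sRGB case in Vulkan; note the image has to opt in to reinterpretation with VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT (fields not shown are assumed filled in as usual):

VkImageCreateInfo imageInfo{};
imageInfo.sType     = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
imageInfo.flags     = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT; // allow differing view formats
imageInfo.imageType = VK_IMAGE_TYPE_2D;
imageInfo.format    = VK_FORMAT_R8G8B8A8_UNORM;           // the "raw" storage format
// ...extent, mipLevels, arrayLayers, usage, etc....

VkImageViewCreateInfo viewInfo{};
viewInfo.sType    = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
viewInfo.image    = image;                                 // created from imageInfo
viewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
viewInfo.format   = VK_FORMAT_R8G8B8A8_SRGB;               // reinterpret as sRGB
// ...components and subresourceRange as usual...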
Like a lot of Vulkan, the API is designed to expose what the hardware can do. Memory layout (image) and the interpretation of that memory as data (image view) are different concepts in the hardware, and so the API exposes that. The API exposes it simply because that's how the hardware works and Vulkan is designed to be a thin abstraction; just because the API can do it doesn't mean you need to use it ;)
As you say, in most cases it's not really that useful ...
I think there are some cases where it could be more efficient. For example, having a compute shader generate integer data for some types of image processing can be more energy efficient than either float computation or manually normalizing integer data to create unorm data. With aliasing, the compute shader can directly write e.g. uint8 integers, and a fragment shader can read the same data as unorm8 data.
I have a QAbstractListModel with custom objects as items. Each object has a QImage that is loaded from a database. I use a ListView in QML to visualize the model, but I do not see any means of displaying a QImage in the delegate; the Image primitive seems to accept only URLs.
Is the only way to show my QImages to create a QQuickImageProvider with some custom system of one URL per element (which looks like total overkill)?
I think QQuickImageProvider is the proper way.
Also, I think you can only use the word 'overkill' if you know exactly how the Qt internals work; otherwise it's just guessing.
AFAIK there is a complex caching system for images (and other data) underneath, so once an image pixmap is loaded (and doesn't change), data retrieval is immediate. So it is no overkill at all, since in any case you need to load those QImages at some point, but just once.
I believe a QQuickImageProvider provides pointers to the cached data, not the whole rasterized data every time. Moreover, blitting operations are nowadays performed with hardware acceleration, so it's a single operation taking a fraction of a millisecond.
In other words, you end up with:
1. QML asks for the image with URL "image://xyz"
2. Qt looks it up in the cache and returns the data pointer, or performs a full load of the image if it is not found
3. the QML renderer passes the data array to OpenGL
4. one single blit operation (microseconds) and you have it on screen
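For completeness, a minimal sketch of such a provider; imageForId is a hypothetical lookup into your model's data, not Qt API:

#include <QQuickImageProvider>

class ModelImageProvider : public QQuickImageProvider
{
public:
    ModelImageProvider() : QQuickImageProvider(QQuickImageProvider::Image) {}

    QImage requestImage(const QString& id, QSize* size, const QSize& requestedSize) override
    {
        QImage img = imageForId(id); // look the QImage up in your model
        if (size)
            *size = img.size();
        if (requestedSize.isValid())
            img = img.scaled(requestedSize, Qt::KeepAspectRatio);
        return img;
    }
};

Register it once with engine->addImageProvider("xyz", new ModelImageProvider), and each delegate can then use source: "image://xyz/" + someId.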
A QML ShaderEffect will bind a QImage to a GLSL sampler2D. See the list of "how properties are mapped to GLSL uniform variables" in the ShaderEffect docs. It needs a few lines of GLSL in the ShaderEffect to pass the pixels through (I will update this post with an example when I have access to my code), but the result is a fast, QSG-friendly, all-QML element that can render QImages.
I want to implement an algorithm in OpenCL which needs to apply a certain transformation to a 3D grayscale image several times. I have an input and an output image for my kernel. Now I would like to simply swap the input and output images and apply the kernel again. However, one image was created with read_only and the other one with write_only. Does this mean I have to use conventional buffers, or is there some trick to flip the two images without first copying them from the device back to the host and then back to the device again?
You say: "However, one image was created with read_only and the other one with write_only". The obvious answer is: don't do that, and you'll be fine.
The less obvious subtext: there's a difference between creating an image with write-only/read-only flags (which is done on the host side via clCreateImage(..., CL_MEM_WRITE_ONLY/CL_MEM_READ_ONLY)) and the access type inside a particular kernel (which is specified with the __read_only/__write_only qualifiers in the kernel's argument definitions).
Unless I'm totally mistaken, you can safely create your image with no restrictions (i.e. CL_MEM_READ_WRITE), then use it as a kernel's input parameter, and for the next kernel run, use it as the output parameter. You just can't mix read/write accesses during a single kernel run.
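A sketch of that ping-pong pattern; format, desc, kernel, queue, globalSize, and numPasses are assumed to be set up elsewhere:

cl_int err;
// Both images are created without read/write restrictions.
cl_mem imgA = clCreateImage(context, CL_MEM_READ_WRITE, &format, &desc, nullptr, &err);
cl_mem imgB = clCreateImage(context, CL_MEM_READ_WRITE, &format, &desc, nullptr, &err);

for (int pass = 0; pass < numPasses; ++pass)
{
    // Even passes read A and write B; odd passes swap the roles.
    cl_mem src = (pass % 2 == 0) ? imgA : imgB;
    cl_mem dst = (pass % 2 == 0) ? imgB : imgA;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &src);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &dst);
    clEnqueueNDRangeKernel(queue, kernel, 3, nullptr, globalSize, nullptr, 0, nullptr, nullptr);
}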
I use two QGLShaderPrograms for processing a texture.
ShaderProgram1->bind(); // QGLShaderProgram
ShaderProgram2->bind();
glBegin(GL_TRIANGLE_STRIP);
...
glEnd();
ShaderProgram1->release();
ShaderProgram2->release();
The texture should be processed with ShaderProgram1 and then with ShaderProgram2. But calling ShaderProgram2->bind() implicitly triggers ShaderProgram1->release(), so only one shader works. How do I bind both shaders?
You don't.
Unless these are separate shaders (and even those don't work that way), each rendering operation will apply a single set of shaders to the rendered primitive. That means a single Vertex Shader, followed by any Tessellation Shaders, optionally followed by a single Geometry Shader, followed by a single Fragment Shader.
If you want to daisy-chain shaders, you have to do that within the shaders themselves.
I know this is a very old question, but I just wanted to add my two cents here for anyone who might stumble upon this.
If you want to run multiple shaders that use the same texture, you should set the active texture at the start of the update loop and then run the shader programs one at a time: one pass has to complete before the next may begin. So instead, it would look like this:
ShaderProgram1->bind();
... // first pass: draw with ShaderProgram1
ShaderProgram1->release();
ShaderProgram2->bind();
... // second pass: draw with ShaderProgram2
ShaderProgram2->release();
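In practice, chaining two fragment passes over a texture usually means rendering the intermediate result into a framebuffer object and feeding its texture to the second pass. A sketch with the QGLShaderProgram-era Qt classes; drawFullscreenQuad is a hypothetical helper that issues the geometry:

QGLFramebufferObject fbo(width, height);

// Pass 1: render the source texture into the FBO with the first shader.
fbo.bind();
ShaderProgram1->bind();
glBindTexture(GL_TEXTURE_2D, sourceTexture);
drawFullscreenQuad();
ShaderProgram1->release();
fbo.release();

// Pass 2: use the FBO's texture as input to the second shader.
ShaderProgram2->bind();
glBindTexture(GL_TEXTURE_2D, fbo.texture());
drawFullscreenQuad();
ShaderProgram2->release();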