I'm trying to create a single pipeline whose layout requires two bindings: a dynamic UBO and an image/sampler binding. I want each binding to come from a separate descriptor set, so I'd bind two descriptor sets per draw call. One descriptor set holds the per-object texture, the other holds the dynamic UBO (shared between objects). I want to be able to do something like this in the rendering portion:
commandBuffer.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline);
for (int ii = 0; ii < mActiveQuads; ii++)
{
uint32_t dynamicOffset = ii * static_cast<uint32_t>(dynamicAlignment);
// bind texture for this quad
commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, sharedPipelineLayout, 0, 1,
&swapResources[current_buffer].textureDescriptors[ii], 1, &dynamicOffset);
// bind the dynamic UBO with the offset for this quad
commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, sharedPipelineLayout, 0, 1,
&swapResources[current_buffer].quadDescriptor, 1, &dynamicOffset);
commandBuffer.draw(2 * 3, 1, 0, 0);
}
But this doesn't seem to work. First of all, I'm not sure I've understood descriptor sets and pipeline layouts well enough to know whether what I'm doing is allowed. Does this even make sense: can I create a pipeline with a two-binding layout, create descriptor sets that each fill just one of those bindings, and then bind two descriptor sets per draw call for that pipeline?
If it is allowed, this is how I'm creating the pipeline and descriptors:
vk::DescriptorSetLayoutBinding const layout_bindings[2] = { vk::DescriptorSetLayoutBinding()
.setBinding(0)
.setDescriptorType(vk::DescriptorType::eUniformBufferDynamic)
.setDescriptorCount(1)
.setStageFlags(vk::ShaderStageFlagBits::eVertex)
.setPImmutableSamplers(nullptr),
vk::DescriptorSetLayoutBinding()
.setBinding(1)
.setDescriptorType(vk::DescriptorType::eCombinedImageSampler)
.setDescriptorCount(1) // texture_count
.setStageFlags(vk::ShaderStageFlagBits::eFragment)
.setPImmutableSamplers(nullptr) };
// note binding count is 1 here
auto const descriptor_layout = vk::DescriptorSetLayoutCreateInfo().setBindingCount(1).setPBindings(&layout_bindings[0]); // using the first part of the above layout
device.createDescriptorSetLayout(&descriptor_layout, nullptr, &quadDescriptorLayout);
// note binding count is 1 here
auto const descriptor_layout2 = vk::DescriptorSetLayoutCreateInfo().setBindingCount(1).setPBindings(&layout_bindings[1]); // using the second part of the above layout
device.createDescriptorSetLayout(&descriptor_layout2, nullptr, &textureDescriptorLayout);
// Now create the pipeline, note we use both the bindings above with
// layout count = 2
auto const pPipelineLayoutCreateInfo = vk::PipelineLayoutCreateInfo().setSetLayoutCount(2).setPSetLayouts(desc_layout);
device.createPipelineLayout(&pPipelineLayoutCreateInfo, nullptr, &sharedPipelineLayout);
and the descriptors themselves:
// alloc quad descriptor
alloc_info =
vk::DescriptorSetAllocateInfo()
.setDescriptorPool(desc_pool)
.setDescriptorSetCount(1)
.setPSetLayouts(&quadDescriptorLayout);
// texture descriptors (multiple descriptors, one per quad object)
alloc_info =
vk::DescriptorSetAllocateInfo()
.setDescriptorPool(desc_pool)
.setDescriptorSetCount(1)
.setPSetLayouts(&textureDescriptorLayout);
Previously, with the texture and UBO in a single descriptor set, it worked fine: I could see multiple quads, all sharing a single texture. When I split the textures off into a different descriptor set, the app hangs and I get a 'device lost' error when submitting to the graphics queue.
Any insight as to whether this is possible, or what I'm doing wrong in my setup, would be very much appreciated. Thank you very much!
Adding the shader code below. Vertex shader:
#version 450
#extension GL_ARB_separate_shader_objects : enable
layout(binding = 0) uniform UniformBufferObject {
mat4 mvp;
vec4 position[6];
vec4 attr[6];
} ubo;
layout(location = 0) out vec2 fragTexCoord;
void main() {
gl_Position = ubo.mvp *ubo.position[gl_VertexIndex];
fragTexCoord = vec2(ubo.attr[gl_VertexIndex].x, ubo.attr[gl_VertexIndex].y);
}
Fragment shader:
#version 450
#extension GL_ARB_separate_shader_objects : enable
layout(set=0, binding = 1) uniform sampler2D texSampler;
layout(location = 0) in vec2 fragTexCoord;
layout(location = 0) out vec4 outColor;
void main() {
outColor = texture(texSampler, fragTexCoord);
}
Yes, you can do this. Your pipeline layout has two descriptor sets. Each of the two descriptor set layouts has one descriptor: a dynamic UBO and a texture. At draw time, you bind one descriptor set of each descriptor set layout to the appropriate set number.
It looks like the firstSet parameter when binding the texture descriptor set is wrong: that's the second set in the pipeline layout, so it has index 1, but you're passing 0. The validation layers should have warned you that you're binding a descriptor set with a set layout that doesn't match what the pipeline layout expects for that set.
Also, when going from a single descriptor set to two descriptor sets, the shaders need their set indices updated. Your fragment shader still declares the sampler as layout(set = 0, binding = 1); with the split layout the sampler lives in the second set, so it should be declared with set = 1 (keeping binding = 1, since that is the binding number used in that set's layout).
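A minimal sketch of how the binding calls could look with the corrected set index, assuming set 0 is the dynamic-UBO layout and set 1 is the texture layout, as in the pipeline layout above (variable names follow the question's code):

commandBuffer.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline);
for (int ii = 0; ii < mActiveQuads; ii++)
{
    uint32_t dynamicOffset = ii * static_cast<uint32_t>(dynamicAlignment);
    // Set 0: the shared dynamic UBO, with this quad's dynamic offset.
    commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, sharedPipelineLayout,
        0 /*firstSet*/, 1, &swapResources[current_buffer].quadDescriptor,
        1, &dynamicOffset);
    // Set 1: this quad's texture. No dynamic offsets here, because this set
    // layout contains no dynamic descriptors.
    commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, sharedPipelineLayout,
        1 /*firstSet*/, 1, &swapResources[current_buffer].textureDescriptors[ii],
        0, nullptr);
    commandBuffer.draw(2 * 3, 1, 0, 0);
}

The fragment shader's sampler declaration has to match as well, e.g. layout(set = 1, binding = 1) uniform sampler2D texSampler;.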
Related
I'm trying to optimize a working compute shader. Its purpose is to create an image: find the right color (using a small palette) and call imageStore(image, ivec2, vec4).
The colors are indexed in an array of uint in a uniform buffer.
One color in this UBO is packed inside one uint, as {0-255, 0-255, 0-255, 0-255}.
Here is the code:
struct Entry
{
*some other data*
uint rgb;
};
layout(binding = 0) uniform SConfiguration
{
Entry materials[MATERIAL_COUNT];
} configuration;
void main()
{
Entry material = configuration.materials[currentMaterialId];
float r = (material.rgb >> 16) / 255.;
float g = ((material.rgb & G_MASK) >> 8) / 255.;
float b = (material.rgb & B_MASK) / 255.;
imageStore(outImage, ivec2(gl_GlobalInvocationID.xy), vec4(r, g, b, 0.0));
}
I would like to clean/optimize a bit, because this color conversion looks bad/useless in the shader (and should be precomputed). My question is:
Is it possible to directly pack a vec4(r, g, b, 0.0) inside the UBO, using 4 bytes (like a R8G8B8A8) ?
Is it possible to do it directly? No.
But GLSL does have a number of functions for packing/unpacking normalized values. In your case, you can pass the value as a single uint uniform, then use unpackUnorm4x8 to convert it to a vec4. So your code becomes:
vec4 color = unpackUnorm4x8(material.rgb);
This is, of course, a memory-vs-performance tradeoff. So if memory isn't an issue, you should probably just pass a vec4 (never use vec3) directly.
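On the CPU side, the per-color value can then be precomputed once when filling the UBO. A small sketch (packRgba8 is a made-up helper name; unpackUnorm4x8 treats the least significant byte as the first vector component):

#include <cstdint>

// Pack four 0-255 channels into one uint32_t so that unpackUnorm4x8() in GLSL
// yields vec4(r, g, b, a) / 255.0. The red channel goes into the low byte.
uint32_t packRgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return  static_cast<uint32_t>(r)
          | (static_cast<uint32_t>(g) << 8)
          | (static_cast<uint32_t>(b) << 16)
          | (static_cast<uint32_t>(a) << 24);
}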
Is it possible to directly pack a vec4(r, g, b, 0.0) inside the UBO, using 4 bytes (like a R8G8B8A8) ?
There is no way to express this directly as 4 single-byte values; there is no appropriate data type in the shader that would allow you to declare this as a byte type.
However, why do you think you need to? Just upload it as 4 floats - it's a uniform so it's not like you are replicating it thousands of times, so the additional size is unlikely to be a problem in practice.
I have a QOpenGLFramebufferObject that I've been writing to and reading from using texture() in my application. I've added a second color attachment to include some additional data, but it seems like no data is being written to it.
// creating the FBO (this has been working)
_drawFbo = new QOpenGLFramebufferObject(PAINT_FBO_WIDTH, PAINT_FBO_WIDTH, QOpenGLFramebufferObject::Depth);
// now I'm adding another color attachment
_drawFbo->addColorAttachment(PAINT_FBO_WIDTH, PAINT_FBO_WIDTH);
And then in my shader I write to both attachments when the shader is bound:
layout(location=0) out vec4 meshWithPaintColor;
layout(location=1) out vec4 primitiveId;
void main() {
...
meshWithPaintColor = vec4(finalColor, 0);
primitiveId = vec4(1,1,1,1);
}
When I try reading from this second attachment, using the textures()[1] value bound to a shader sampler, the values always seem to be zero.
Do I need to do anything with the QOpenGLFramebufferObject to allow drawing to the second color attachment?
I did in fact have to call glDrawBuffers myself. I assumed this was handled by the FBO binding, but apparently not.
QOpenGLExtraFunctions* f = QOpenGLContext::currentContext()->extraFunctions();
GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
f->glDrawBuffers(2, bufs);
It seems strange to me that the FBO abstraction supports additional color attachments but requires the extra-functions API to actually draw into them.
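For completeness, a sketch of the full sequence: the draw-buffer list is part of the bound framebuffer's state, so glDrawBuffers has to be called while the FBO is bound (names follow the question; the shader program and draw calls are assumed to exist elsewhere):

_drawFbo->bind();   // make the FBO (with both color attachments) current
QOpenGLExtraFunctions* f = QOpenGLContext::currentContext()->extraFunctions();
const GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
f->glDrawBuffers(2, bufs);   // route out location 0 and location 1

// ... bind the MRT shader and issue the draw calls here ...

_drawFbo->release();

// For reading, the second attachment's texture id is available as:
GLuint primitiveIdTexture = _drawFbo->textures().at(1);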
I want to create an array of UBO objects on the CPU, update it, and then upload it to the GPU in one call, like this (for the example, let's say I have only two objects):
std::vector<UniformBufferObject> ubo(m_allObject.size());
int index = 0;
for (RenderableObject* rendObj : m_allObject)
{
ubo[index].proj = m_camera->getProjection();
ubo[index].view = m_camera->getView();
ubo[index].model = rendObj->getTransform().getModel();
ubo[index].proj[1][1] *= -1;
index++;
}
int size = sizeof(UniformBufferObject) *m_allObject.size();
void* data;
m_instance->getLogicalDevice().mapMemory(ykEngine::Buffer::m_uniformBuffer.m_bufferMemory, 0, size , vk::MemoryMapFlags(), &data);
memcpy(data, ubo.data(), size);
m_instance->getLogicalDevice().unmapMemory(ykEngine::Buffer::m_uniformBuffer.m_bufferMemory);
I created one buffer with the size of two UBOs. (The buffer creation does work, because it works with a single UBO.)
vk::DeviceSize bufferSize = sizeof(UniformBufferObject) * 2;
createBuffer(logicalDevice, bufferSize, vk::BufferUsageFlagBits::eUniformBuffer, vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent, m_uniformBuffer.m_buffer, m_uniformBuffer.m_bufferMemory);
and then I put an offset in the descriptor set creation:
vk::DescriptorBufferInfo bufferInfo;
bufferInfo.buffer = uniformBuffer;
bufferInfo.offset = offsetForUBO;
bufferInfo.range = sizeof(UniformBufferObject);
The offset is sizeof(UniformBufferObject) * the index of the object.
Every object has its own descriptor set layout, but they all share the same pipeline.
When I try to update the descriptor set, I get a validation error about the uniform buffer offset alignment.
I couldn't find any alignment enum that specifies that information.
If anyone knows how to do this, it would help a lot.
Thanks.
I couldn't find any alignment enum that specifies that information.
Vulkan is not OpenGL; you don't use enums to query limits. Limits are defined by the VkPhysicalDeviceLimits struct, queried via vkGetPhysicalDeviceProperties (or vkGetPhysicalDeviceProperties2KHR).
The error tells you exactly which limit you violated: minUniformBufferOffsetAlignment. Your implementation sets it to 0x100 (256 bytes), but the offset you provided is not a multiple of that.
Also, you should not map buffers in the middle of a frame. All mapping in Vulkan is "persistent"; map it once and leave it that way until you're ready to delete the memory.
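A sketch of how the required stride can be derived from that limit and then used for both the buffer size and the per-object offsets (it assumes a vk::PhysicalDevice handle and the UniformBufferObject struct from the question; note that the CPU-side copy then also has to place each element at its strided offset instead of tightly packed):

#include <vulkan/vulkan.hpp>

// Round the per-object UBO size up to a multiple of minUniformBufferOffsetAlignment.
vk::DeviceSize alignedUboStride(vk::PhysicalDevice physicalDevice, vk::DeviceSize elemSize)
{
    vk::PhysicalDeviceProperties props = physicalDevice.getProperties();
    vk::DeviceSize minAlign = props.limits.minUniformBufferOffsetAlignment; // e.g. 0x100
    // minAlign is guaranteed to be a power of two, so this rounds up correctly.
    return (elemSize + minAlign - 1) & ~(minAlign - 1);
}

// Usage (illustrative):
//   vk::DeviceSize stride  = alignedUboStride(physicalDevice, sizeof(UniformBufferObject));
//   vk::DeviceSize bufSize = stride * objectCount;   // instead of sizeof(UBO) * count
//   bufferInfo.offset      = stride * objectIndex;   // instead of sizeof(UBO) * index
//   bufferInfo.range       = sizeof(UniformBufferObject);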
I have a solution which includes three projects. One creates a static library, i.e. a .lib file. It contains one header file, main.h, and one source file, main.cpp; the .cpp file contains the definitions of the functions declared in the header.
The second project is an .exe project which includes the header file main.h and calls a function declared in it.
The third project is also an .exe project which includes the header file and uses the variable flag declared in it.
Now both .exe projects create their own instance of the variable, but I want both to share the same instance of the variable dynamically, because I have to pass the value generated by one project to the other project at the same instant.
Please help me as I am nearing my project deadline.
Thanks for the help.
Here are some part of the code.
main.cpp and main.h are files of .lib project
main.h
extern int flag;
extern int detect1(void);
main.cpp
#include <stdio.h>
#include "main.h"
#include <Windows.h>
#include <ShellAPI.h>
// OpenCV headers, needed for CascadeClassifier, VideoCapture, Mat, etc.
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;
int flag=0;
int detect1(void)
{
int Cx=0,Cy=0,Kx=20,Ky=20,Sx=0,Sy=0,j=0;
//create the cascade classifier object used for the face detection
CascadeClassifier face_cascade;
//use the haarcascade_frontalface_alt.xml library
face_cascade.load("E:\\haarcascade_frontalface_alt.xml");
//System::DateTime now = System::DateTime::Now;
//cout << now.Hour;
//WinExec("E:\\FallingBlock\\FallingBlock\\FallingBlock\\bin\\x86\\Debug\\FallingBlock.exe",SW_SHOW);
//setup video capture device and link it to the first capture device
VideoCapture captureDevice;
captureDevice.open(0);
//setup image files used in the capture process
Mat captureFrame;
Mat grayscaleFrame;
//create a window to present the results
namedWindow("capture", 1);
//create a loop to capture and find faces
while(true)
{
//capture a new image frame
captureDevice>>captureFrame;
//convert captured image to gray scale and equalize
cvtColor(captureFrame, grayscaleFrame, CV_BGR2GRAY);
equalizeHist(grayscaleFrame, grayscaleFrame);
//create a vector array to store the face found
std::vector<Rect> faces;
//find faces and store them in the vector array
face_cascade.detectMultiScale(grayscaleFrame, faces, 1.1, 3, CV_HAAR_FIND_BIGGEST_OBJECT|CV_HAAR_SCALE_IMAGE, Size(30,30));
//draw a rectangle for all found faces in the vector array on the original image
for(unsigned int i = 0; i < faces.size(); i++)
{
Point pt1(faces[i].x + faces[i].width, faces[i].y + faces[i].height);
Point pt2(faces[i].x, faces[i].y);
rectangle(captureFrame, pt1, pt2, cvScalar(0, 255, 0, 0), 1, 8, 0);
if(faces.size()>=1)
j++;
Cx = faces[i].x + (faces[i].width / 2);
Cy = faces[i].y + (faces[i].height / 2);
if(j==1)
{
Sx=Cx;
Sy=Cy;
flag=0;
}
}
if(Cx-Sx > Kx)
{
flag = 1;
printf("%d",flag);
}
else
{
if(Cx-Sx < -Kx)
{
flag = 2;
printf("%d",flag);
//update(2);
}
else
{
if(Cy-Sy > Ky)
{
flag = 3;
printf("%d",flag);
//update(3);
}
else
{
if(Cy-Sy < -Ky)
{
flag = 4;
printf("%d",flag);
//update(4);
}
else
if(abs(Cx-Sx) < Kx && abs(Cy-Sy)<Ky)
{
flag = 0;
printf("%d",flag);
//update(0);
}
}
}
}
} // end while(true)
return 0;
} // end detect1()
2nd project's code
face.cpp
#include"main.h"
#include<stdio.h>
int main()
{
detect1();
}
3rd project's code
tetris.cpp
#include"main.h"
int key;
key = flag;
if(key==0)
{
MessageBox(hwnd,"Space2","TetRiX",0);
}
if(key==4)
{
tetris.handleInput(1);
tetris.drawScreen(2);
//MessageBox(hwnd,"Space2","TetRiX",0);
}
You need to look up how to do inter-process communication in the operating system under which your applications will run. (At this point I assume the processes are running on the same computer.)
It looks like you're using Windows (based on the call to MessageBox), so the simplest means would be for both processes to use RegisterWindowMessage to create a commonly-understood message value, and then send the data via LPARAM using either PostMessage or SendMessage. (You'll need each of them to get the window handle of the other, which is reasonably easy.)
You'll also want some sort of exclusion mechanism (mutex or critical section) in both processes to ensure that the shared value can't be read and written at the same time. If both processes can do the "change and exchange" then you'll have an interesting problem to solve if both try to do that at the same time, because you'll have to deal with the possibility of deadlocks over that shared value.
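A minimal sketch of that window-message approach, assuming the sender can locate the receiver's top-level window (the message name and window title used here are placeholders, not names from your projects):

#include <windows.h>

// Both processes register the same message name and therefore get the same message id.
static const UINT WM_FLAG_CHANGED = RegisterWindowMessageA("MyApp.FlagChanged");

// Sender (the face-detection process): post the new flag value to the receiver.
void publishFlag(int flag)
{
    HWND receiver = FindWindowA(nullptr, "TetRiX window title"); // placeholder title
    if (receiver != nullptr)
        PostMessageA(receiver, WM_FLAG_CHANGED, 0, static_cast<LPARAM>(flag));
}

// Receiver (the game): pick the value up in its window procedure.
LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_FLAG_CHANGED)
    {
        int flag = static_cast<int>(lParam); // value sent by the other process
        // ... react to the new flag value here ...
        return 0;
    }
    return DefWindowProcA(hwnd, msg, wParam, lParam);
}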
You can also use shared memory, but that's a bit more involved.
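For the shared-memory route, a sketch using a named file mapping plus a named mutex (the object names are illustrative and error handling is minimal):

#include <windows.h>

// A single int shared between processes; both open the same named objects.
struct SharedFlag
{
    HANDLE mapping = nullptr;
    HANDLE mutex   = nullptr;
    volatile int* flag = nullptr;
};

bool openSharedFlag(SharedFlag& s)
{
    s.mapping = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr, PAGE_READWRITE,
                                   0, sizeof(int), "MyApp.SharedFlag");
    s.mutex   = CreateMutexA(nullptr, FALSE, "MyApp.SharedFlagMutex");
    if (s.mapping == nullptr || s.mutex == nullptr) return false;
    s.flag = static_cast<volatile int*>(
        MapViewOfFile(s.mapping, FILE_MAP_ALL_ACCESS, 0, 0, sizeof(int)));
    return s.flag != nullptr;
}

void writeFlag(SharedFlag& s, int value)
{
    WaitForSingleObject(s.mutex, INFINITE);  // serialize access with the other process
    *s.flag = value;
    ReleaseMutex(s.mutex);
}

int readFlag(SharedFlag& s)
{
    WaitForSingleObject(s.mutex, INFINITE);
    int value = *s.flag;
    ReleaseMutex(s.mutex);
    return value;
}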
If the processes are on different computers you'll need to do it via TCP/IP or a protocol on top of TCP/IP. You could use a pub-sub arrangement--or any number of things. Without an understanding of exactly what you're trying to accomplish, it's difficult to know what to recommend.
(For the record, there is almost no way in a multi-process/multi-threaded O/S to share something "at the same instant." You can get arbitrarily close, but computers don't work like that.)
Given the level of difficulty involved, is there some other design that might make this cleaner? Why do these processes have to exchange information this way? Must it be done using separate processes?
I need to extract frames from a video in my Qt based application. Using ffmpeg libraries I am able to fetch frames as AVFrames which I need to convert to QImage to use in other parts of my application. This conversion needs to be efficient. So far it seems sws_scale() is the right function to use but I am not sure what source and destination pixel formats are to be specified.
I came up with the following two-step process: first convert the decoded AVFrame to another AVFrame in an RGB colorspace, then convert that to a QImage. It works and is reasonably fast.
src_frame = get_decoded_frame();
AVFrame *pFrameRGB = avcodec_alloc_frame(); // intermediate pframe
if(pFrameRGB==NULL) {
;// Handle error
}
int numBytes= avpicture_get_size(PIX_FMT_RGB24,
is->video_st->codec->width, is->video_st->codec->height);
uint8_t *buffer = (uint8_t*)malloc(numBytes);
avpicture_fill((AVPicture*)pFrameRGB, buffer, PIX_FMT_RGB24,
is->video_st->codec->width, is->video_st->codec->height);
int dst_fmt = PIX_FMT_RGB24;
int dst_w = is->video_st->codec->width;
int dst_h = is->video_st->codec->height;
// TODO: cache following conversion context for speedup,
// and recalculate only on dimension changes
SwsContext *img_convert_ctx_temp;
img_convert_ctx_temp = sws_getContext(
is->video_st->codec->width, is->video_st->codec->height,
is->video_st->codec->pix_fmt,
dst_w, dst_h, (PixelFormat)dst_fmt,
SWS_BICUBIC, NULL, NULL, NULL);
QImage *myImage = new QImage(dst_w, dst_h, QImage::Format_RGB32);
sws_scale(img_convert_ctx_temp,
src_frame->data, src_frame->linesize, 0, is->video_st->codec->height,
pFrameRGB->data,
pFrameRGB->linesize);
uint8_t *src = (uint8_t *)(pFrameRGB->data[0]);
for (int y = 0; y < dst_h; y++)
{
QRgb *scanLine = (QRgb *) myImage->scanLine(y);
for (int x = 0; x < dst_w; x=x+1)
{
scanLine[x] = qRgb(src[3*x], src[3*x+1], src[3*x+2]);
}
src += pFrameRGB->linesize[0];
}
If you find a more efficient approach, let me know in the comments.
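Regarding the TODO in the code above: one low-effort way to cache the conversion context is sws_getCachedContext, which reuses the previous context as long as the parameters have not changed. A sketch, using the same (old-API) names as the code above:

// Reuse the conversion context between frames; it is only re-created when the
// source/destination dimensions or pixel formats actually change.
static SwsContext *img_convert_ctx = NULL;   // cached across calls
img_convert_ctx = sws_getCachedContext(img_convert_ctx,
    is->video_st->codec->width, is->video_st->codec->height,
    is->video_st->codec->pix_fmt,
    dst_w, dst_h, (PixelFormat)dst_fmt,
    SWS_BICUBIC, NULL, NULL, NULL);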
I know it's too late, but maybe someone will find it useful. From here I got the idea for doing the same conversion, which looks a bit shorter.
So I created a QImage which is reused for every decoded frame:
QImage img( width, height, QImage::Format_RGB888 );
Created frameRGB:
frameRGB = av_frame_alloc();
//Allocate memory for the pixels of a picture and setup the AVPicture fields for it.
avpicture_alloc( ( AVPicture *) frameRGB, AV_PIX_FMT_RGB24, width, height);
After the first frame is decoded, I create the SwsContext conversion context this way (it will be used for all subsequent frames):
mImgConvertCtx = sws_getContext( codecContext->width, codecContext->height, codecContext->pix_fmt, width, height, AV_PIX_FMT_RGB24, SWS_BICUBIC, NULL, NULL, NULL);
And finally for every decoded frame conversion is performed:
if( 1 == framesFinished && nullptr != mImgConvertCtx )
{
//conversion frame to frameRGB
sws_scale(mImgConvertCtx, frame->data, frame->linesize, 0, codecContext->height, frameRGB->data, frameRGB->linesize);
//setting QImage from frameRGB
for( int y = 0; y < height; ++y )
memcpy( img.scanLine(y), frameRGB->data[0] + y * frameRGB->linesize[0], width * 3 );
}
See the link for the specifics.
A simpler approach, I think:
void takeSnapshot(AVCodecContext* dec_ctx, AVFrame* frame)
{
SwsContext* img_convert_ctx;
img_convert_ctx = sws_getContext(dec_ctx->width,
dec_ctx->height,
dec_ctx->pix_fmt,
dec_ctx->width,
dec_ctx->height,
AV_PIX_FMT_RGB24,
SWS_BICUBIC, NULL, NULL, NULL);
AVFrame* frameRGB = av_frame_alloc();
avpicture_alloc((AVPicture*)frameRGB,
AV_PIX_FMT_RGB24,
dec_ctx->width,
dec_ctx->height);
sws_scale(img_convert_ctx,
frame->data,
frame->linesize, 0,
dec_ctx->height,
frameRGB->data,
frameRGB->linesize);
QImage image(frameRGB->data[0],
dec_ctx->width,
dec_ctx->height,
frameRGB->linesize[0],
QImage::Format_RGB888);
image.save("capture.png");
}
Today I tested passing image->bits() directly to sws_scale and it finally works, so there is no need to copy into an intermediate buffer. For example:
/* 1. Get frame and QImage to show */
struct my_frame *frame = get_frame(source);
QImage *myImage = new QImage(dst_w, dst_h, QImage::Format_RGBA8888);
/* 2. Convert and write into image buffer */
uint8_t *dst[] = {myImage->bits()};
int linesizes[4];
av_image_fill_linesizes(linesizes, AV_PIX_FMT_RGBA, frame->width);
sws_scale(myswscontext, frame->data, (const int*)frame->linesize,
0, frame->height, dst, linesizes);
I just discovered that scanLine is just seeking through the buffer. All you need to do is use AV_PIX_FMT_RGB32 for the AVFrame and QImage::Format_RGB32 for the QImage.
Then, after decoding, just do a memcpy:
memcpy(img.scanLine(0), pFrameRGB->data[0], pFrameRGB->linesize[0] * pFrameRGB->height);
I had problems with the other proposed solutions:
They did not mention freeing either the AVFrame, the SwsContext, or the allocated buffers, which caused massive memory leaks (I had thousands of frames to handle). These problems couldn't all be solved easily, because QImage relies on the underlying data and does not copy it. If the buffer is freed directly, the QImage points to freed data and breaks. This could be solved with QImage's cleanupFunction, freeing the buffer once the image is no longer needed, but together with the other problems it wasn't a good fit anyway.
In some cases one of the suggestions, passing QImage::bits directly to sws_scale, would not work, because QImage scan lines are at least 32-bit aligned. For certain image widths the stride therefore does not match what sws_scale expects, and each output line ends up shifted a little.
A third problem is that they used deprecated AVPicture elements.
I described the problem in another question, Converting an AVFrame to QImage with conversion of pixel format, and in the end found a solution using a temporary buffer that could be copied into the QImage and then safely freed.
See my answer there for a fully working, efficient implementation with no deprecated function calls: https://stackoverflow.com/a/68212609/7360943
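For reference, a compact sketch in that spirit (not the linked answer verbatim): convert into a temporary, tightly packed RGB24 buffer, copy it row by row into a QImage that owns its own pixels, then free the ffmpeg-side allocations:

extern "C" {
#include <libavutil/frame.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>
}
#include <QImage>
#include <cstring>

// Convert a decoded AVFrame to a QImage with no lingering references to ffmpeg memory.
QImage avFrameToQImage(const AVFrame* frame)
{
    SwsContext* ctx = sws_getContext(frame->width, frame->height,
                                     static_cast<AVPixelFormat>(frame->format),
                                     frame->width, frame->height, AV_PIX_FMT_RGB24,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);

    uint8_t* dstData[4];
    int dstLinesize[4];
    av_image_alloc(dstData, dstLinesize, frame->width, frame->height, AV_PIX_FMT_RGB24, 1);

    sws_scale(ctx, frame->data, frame->linesize, 0, frame->height, dstData, dstLinesize);

    // Copy row by row: QImage rows are 32-bit aligned, the temporary buffer is packed.
    QImage img(frame->width, frame->height, QImage::Format_RGB888);
    for (int y = 0; y < frame->height; ++y)
        std::memcpy(img.scanLine(y), dstData[0] + y * dstLinesize[0], frame->width * 3);

    av_freep(&dstData[0]);
    sws_freeContext(ctx);
    return img;
}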