OpenCL passing multiple arrays of different sizes to kernel

I want to pass two arrays of different sizes to the kernel, how can I set the cl_ndrange?
With only one array of size NUM_VALUES, the range can be easily set as
cl_ndrange range = {
    1,                  // The number of dimensions to use.
    {0, 0, 0},          // The offset in each dimension. To specify
                        // that all the data is processed, this is 0
                        // in the test case.
    {NUM_VALUES, 0, 0}, // The global range: this is how many items
                        // IN TOTAL in each dimension you want to
                        // process.
    {NULL, 0, 0}        // The local size of each workgroup. This
                        // determines the number of work items per
                        // workgroup. It indirectly affects the
                        // number of workgroups, since the global
                        // size / local size yields the number of
                        // workgroups. In this test case, there are
                        // NUM_VALUES / wgs workgroups.
};
But I don't know what to do for two arrays with different sizes. In my case, one array is actually used as input for the calculation, while the other is used for lookups.
Thanks in advance.

Related

Vulkan: one pipeline and multiple descriptor sets?

I'm trying to create a single pipeline with a layout that requires two bindings, a dynamic UBO and an image/sampler binding. I want each binding to come from a separate descriptor set, so I'd bind two descriptor sets per draw call. One descriptor set is for the per-object texture, the other is for the dynamic UBO (shared between objects). I want to be able to do something like this in the rendering portion:
commandBuffer.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline);
for (int ii = 0; ii < mActiveQuads; ii++)
{
uint32_t dynamicOffset = ii * static_cast<uint32_t>(dynamicAlignment);
// bind texture for this quad
commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, sharedPipelineLayout, 0, 1,
&swapResources[current_buffer].textureDescriptors[ii], 1, &dynamicOffset);
// draw the dynamic UBO with offset for this quad
commandBuffer.bindDescriptorSets(vk::PipelineBindPoint::eGraphics, sharedPipelineLayout, 0, 1,
&swapResources[current_buffer].quadDescriptor, 1, &dynamicOffset);
commandBuffer.draw(2 * 3, 1, 0, 0);
}
But this doesn't seem to work. First of all, I'm not sure I understand descriptor sets and pipeline layouts well enough to know whether what I'm doing is allowed. Does this even make sense? Can I create a pipeline with a two-binding layout, create each descriptor set to fill just one of those bindings, and then bind two descriptor sets per draw call for that pipeline?
If it is allowed, this is how I'm creating the pipeline and descriptors:
vk::DescriptorSetLayoutBinding const layout_bindings[2] = { vk::DescriptorSetLayoutBinding()
.setBinding(0)
.setDescriptorType(vk::DescriptorType::eUniformBufferDynamic)
.setDescriptorCount(1)
.setStageFlags(vk::ShaderStageFlagBits::eVertex)
.setPImmutableSamplers(nullptr),
vk::DescriptorSetLayoutBinding()
.setBinding(1)
.setDescriptorType(vk::DescriptorType::eCombinedImageSampler)
.setDescriptorCount(1)//texture_count)
.setStageFlags(vk::ShaderStageFlagBits::eFragment)
.setPImmutableSamplers(nullptr) };
// note binding count is 1 here
auto const descriptor_layout = vk::DescriptorSetLayoutCreateInfo().setBindingCount(1).setPBindings(&layout_bindings[0]); // using the first part of the above layout
device.createDescriptorSetLayout(&descriptor_layout, nullptr, &quadDescriptorLayout);
// note binding count is 1 here
auto const descriptor_layout2 = vk::DescriptorSetLayoutCreateInfo().setBindingCount(1).setPBindings(&layout_bindings[1]); // using the second part of the above layout
device.createDescriptorSetLayout(&descriptor_layout2, nullptr, &textureDescriptorLayout);
// Now create the pipeline, note we use both the bindings above with
// layout count = 2
auto const pPipelineLayoutCreateInfo = vk::PipelineLayoutCreateInfo().setSetLayoutCount(2).setPSetLayouts(desc_layout);
device.createPipelineLayout(&pPipelineLayoutCreateInfo, nullptr, &sharedPipelineLayout);
and the descriptors themselves:
// alloc quad descriptor
alloc_info =
vk::DescriptorSetAllocateInfo()
.setDescriptorPool(desc_pool)
.setDescriptorSetCount(1)
.setPSetLayouts(&quadDescriptorLayout);
// texture descriptors(multiple descriptors, one per quad object)
alloc_info =
vk::DescriptorSetAllocateInfo()
.setDescriptorPool(desc_pool)
.setDescriptorSetCount(1)
.setPSetLayouts(&textureDescriptorLayout);
Previously, with the texture and UBO in a single descriptor set it worked fine, I could see multiple quads but all sharing a single texture. When I split off the textures into a different descriptor set, that's when I get a hanging app. I get a 'device lost' error on trying to submit the graphics queue.
Any insight as to if this is possible to do or if I'm doing something wrong in my setup would be very appreciated. Thank you very much!
Below adding Shader code:
#version 450
#extension GL_ARB_separate_shader_objects : enable
layout(binding = 0) uniform UniformBufferObject {
mat4 mvp;
vec4 position[6];
vec4 attr[6];
} ubo;
layout(location = 0) out vec2 fragTexCoord;
void main() {
gl_Position = ubo.mvp *ubo.position[gl_VertexIndex];
fragTexCoord = vec2(ubo.attr[gl_VertexIndex].x, ubo.attr[gl_VertexIndex].y);
}
Pixel shader:
#version 450
#extension GL_ARB_separate_shader_objects : enable
layout(set=0, binding = 1) uniform sampler2D texSampler;
layout(location = 0) in vec2 fragTexCoord;
layout(location = 0) out vec4 outColor;
void main() {
outColor = texture(texSampler, fragTexCoord);
}
Yes, you can do this. Your pipeline layout has two descriptor sets. Each of the two descriptor set layouts has one descriptor: a dynamic UBO and a texture. At draw time, you bind one descriptor set of each descriptor set layout to the appropriate set number.
It looks like the firstSet parameter when binding the texture descriptor set is wrong: that's the second set in the pipeline layout, so it has index 1, but you're passing 0. The validation layers should have warned you that you're binding a descriptor set with a set layout that doesn't match what the pipeline layout expects for that set.
You don't show the shader code that accesses these, so you may have done this. But when going from a single descriptor set to two descriptor sets, you need to update the set index in the sampler binding.

Create an array of UBOs and present one at a time to its shader

I want to create an array of UBO objects on the CPU, update it, and then upload it to the GPU in one call, like this (for the example, let's say I have only two objects):
std::vector<UniformBufferObject> ubo(m_allObject.size());
int index = 0;
for (RenderableObject* rendObj : m_allObject)
{
ubo[index].proj = m_camera->getProjection();
ubo[index].view = m_camera->getView();
ubo[index].model = rendObj->getTransform().getModel();
ubo[index].proj[1][1] *= -1;
index++;
}
int size = sizeof(UniformBufferObject) *m_allObject.size();
void* data;
m_instance->getLogicalDevice().mapMemory(ykEngine::Buffer::m_uniformBuffer.m_bufferMemory, 0, size , vk::MemoryMapFlags(), &data);
memcpy(data, ubo.data(), size);
m_instance->getLogicalDevice().unmapMemory(ykEngine::Buffer::m_uniformBuffer.m_bufferMemory);
I created one buffer with the size of two UBOs (the creation works, because it works with a buffer sized for one UBO):
vk::DeviceSize bufferSize = sizeof(UniformBufferObject) * 2;
createBuffer(logicalDevice, bufferSize, vk::BufferUsageFlagBits::eUniformBuffer, vk::MemoryPropertyFlagBits::eHostVisible | vk::MemoryPropertyFlagBits::eHostCoherent, m_uniformBuffer.m_buffer, m_uniformBuffer.m_bufferMemory);
and then I put an offset in the descriptor set creation:
vk::DescriptorBufferInfo bufferInfo;
bufferInfo.buffer = uniformBuffer;
bufferInfo.offset = offsetForUBO;
bufferInfo.range = sizeof(UniformBufferObject);
The offset is the size of UniformBufferObject times the index of the object.
Every object has its own descriptor set layout but the same pipeline.
When I try to update the descriptor set I get the error:
I couldn't find any alignment enum that specifies that information.
If anyone knows how to do that, it would help a lot.
Thanks.
I couldn't find any alignment enum that specifies that information.
Vulkan is not OpenGL; you don't use enums to query limits. Limits are defined by the VkPhysicalDeviceLimits struct, queried via vkGetPhysicalDeviceProperties/2KHR.
The error tells you exactly which limit you violated: minUniformBufferOffsetAlignment. Your implementation sets this to 0x100 (256 bytes), so every uniform buffer offset you provide must be a multiple of 256; the offset you provided was not.
Also, you should not map buffers in the middle of a frame. All mapping in Vulkan is "persistent"; map it once and leave it that way until you're ready to delete the memory.
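To make the alignment point concrete, here is a minimal sketch of rounding each object's offset up to the device limit. The helper names alignUp and uboOffset are hypothetical, not part of Vulkan. The 192-byte UBO size used in the note below assumes three 4x4 float matrices, and 256 is the limit quoted above; in a real program the limit would come from VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment.

```cpp
#include <cassert>
#include <cstdint>

// Round `value` up to the next multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two, which the Vulkan
// specification requires of minUniformBufferOffsetAlignment.
inline std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Byte offset of the index-th object's UBO inside one shared buffer.
// Each object occupies one aligned "stride" rather than sizeof(UBO) bytes.
inline std::uint64_t uboOffset(std::uint64_t index,
                               std::uint64_t uboSize,
                               std::uint64_t minAlignment) {
    return index * alignUp(uboSize, minAlignment);
}
```

With a 192-byte UBO and a 256-byte limit, object 1 would start at offset 256 instead of 192. The buffer itself then needs to be at least objectCount * alignUp(uboSize, minAlignment) bytes, and the CPU-side copy has to place each object at its aligned offset rather than packing the array tightly.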

Computing the memory footprint (or byte length) of a map

I want to limit a map to be maximum X bytes. It seems there is no straightforward way of computing the byte length of a map though.
"encoding/binary" package has a nice Size function, but it only works for slices or "fixed values", not for maps.
I could try to get all key/value pairs from the map, infer their type (if it's a map[string]interface{}) and compute the length - but that would be both cumbersome and probably incorrect (because that would exclude the "internal" Go cost of the map itself - managing pointers to elements etc).
Any suggested way of doing this? Preferably a code example.
This is the definition for a map header:
// A header for a Go map.
type hmap struct {
// Note: the format of the Hmap is encoded in ../../cmd/gc/reflect.c and
// ../reflect/type.go. Don't change this structure without also changing that code!
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint32
hash0 uint32 // hash seed
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
}
Calculating its size is pretty straightforward (unsafe.Sizeof).
This is the definition for each individual bucket the map points to:
// A bucket for a Go map.
type bmap struct {
tophash [bucketCnt]uint8
// Followed by bucketCnt keys and then bucketCnt values.
// NOTE: packing all the keys together and then all the values together makes the
// code a bit more complicated than alternating key/value/key/value/... but it allows
// us to eliminate padding which would be needed for, e.g., map[int64]int8.
// Followed by an overflow pointer.
}
bucketCnt is a constant defined as:
bucketCnt = 1 << bucketCntBits // equals decimal 8
bucketCntBits = 3
The final calculation would be:
unsafe.Sizeof(hmap) + (len(theMap) * 8) + (len(theMap) * 8 * unsafe.Sizeof(x)) + (len(theMap) * 8 * unsafe.Sizeof(y))
Where theMap is your map value, x is a value of the map's key type and y a value of the map's value type.
You'll have to share the hmap structure with your package via assembly, analogously to thunk.s in the runtime.

Safety of set_len operation on Vec, with predefined capacity

Is it safe to call set_len on Vec that has declared capacity? Like this:
let vec = unsafe {
    let mut temp = Vec::with_capacity(N);
    temp.set_len(N);
    temp
};
I need my Vector to be of size N before any elements are to be added.
Looking at docs:
https://doc.rust-lang.org/collections/vec/struct.Vec.html#capacity-and-reallocation
https://doc.rust-lang.org/collections/vec/struct.Vec.html#method.with_capacity
https://doc.rust-lang.org/collections/vec/struct.Vec.html#method.set_len
I'm a bit confused. The docs say that with_capacity doesn't change the length, and set_len says that the caller must ensure the vector has the proper length. So is this safe?
The reason I need this is because I was looking for a way to declare a mutable buffer (&mut [T]) of size N and Vec seems to fit the bill the best. I just wanted to avoid having my types implement Clone that vec![0;n] would bring.
The docs are just a little ambiguously stated. The wording could be better. Your code example is as "safe" as the following stack-equivalent:
let mut arr: [T; N] = mem::uninitialized();
Which means that as long as you write to an element of the array before reading it you are fine. If you read before writing, you open the door to nasal demons and memory unsafety.
I just wanted to avoid clone that vec![0;n] would bring.
LLVM will optimize this to a single memset.
If by "I need my Vector to be of size N" you mean you need memory to be allocated for 10 elements, with_capacity is already doing that.
If you mean you want to have a vector with length 10 (not sure why you would, though...) you need to initialize it with an initial value.
i.e.:
let mut v: Vec<i32> = Vec::with_capacity(10); // allocate room in memory for
// 10 elements. The vector has
// initial capacity 10, length will be the
// number of elements you push into it
// (initially 0)
v.push(1); // now length is 1, capacity still 10
vs
let mut v: Vec<i32> = vec![0; 10]; // create a vector with 10 elements
// initialized to 0. You can mutate
// those in place later.
// At this point, length = capacity = 10
v[0] = 1; // mutating first element to 1.
// length and capacity are both still 10

When should we use reserve() of vector?

I always use resize() because I cannot use reserve as it gives error: vector subscript out of range. As I've read info about the differences of resize() and reserve(), I saw things like reserve() sets max. number of elements could be allocated but resize() is currently what we have. In my code I know max. number of elements but reserve() doesn't give me anything useful. So, how can I make use of reserve()?
A vector has a capacity (as returned by capacity()) and a size (as returned by size()). The first states how many elements the vector can hold without reallocating, the second how many it currently holds.
resize changes the size, reserve only changes the capacity.
See also the resize and reserve documentation.
As for the use cases:
Let's say you know beforehand how many elements you want to put into your vector, but you don't want to initialize them - that's the use case for reserve. Let's say your vector was empty before; then, directly after reserve(), before doing any insert or push_back, you can, of course, not directly access as many elements as you reserved space for - that would trigger the mentioned error (subscript out of range) - since the elements you are trying to access are not yet initialized; the size is still 0. So the vector is still empty; but if you choose the reserved capacity in such a way that it's higher or equal to the maximum size your vector will get, you are avoiding expensive reallocations; and at the same time you will also avoid the (in some cases expensive) initialization of each vector element that resize would do.
With resize, on the other hand, you say: Make the vector hold as many elements as I gave as an argument; initialize those whose indices are exceeding the old size, or remove the ones exceeding the given new size.
Note that reserve will never affect the elements currently in the vector (except their storage location if reallocation is needed - but not their values or their number)! Meaning that if the size of a vector is currently greater than what you pass to a call to the reserve function on that same vector, reserve will just do nothing.
See also the answer to this question: Choice between vector::resize() and vector::reserve()
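The difference described above can be checked with a small sketch. The helper names afterReserve and afterResize are made up for illustration; only std::vector's own members are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SizeAndCapacity {
    std::size_t size;
    std::size_t capacity;
};

// reserve(n): capacity grows to at least n, but the vector stays empty.
inline SizeAndCapacity afterReserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    return {v.size(), v.capacity()};
}

// resize(n): the vector now holds n elements; new ints are
// value-initialized, so they are all 0 and safe to index.
inline std::vector<int> afterResize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    assert(v.size() == n);
    return v;
}
```

After afterReserve(10) the size is 0 and the capacity is at least 10, so v[0] would still be out of range. After afterResize(10) the size is 10 and v[0] through v[9] can be read and written.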
reserve() is a performance optimization for using std::vector.
A typical std::vector implementation reserves some memory on the first push_back(), for example room for 4 elements. When the 5th element gets pushed, the vector has to reallocate: new memory has to be allocated (usually the capacity is doubled), the contents of the vector have to be copied to the new location, and the old memory has to be freed.
This becomes an expensive operation when the vector holds a lot of elements. For example, when you push_back() the (2^24+1)th element, about 16 million elements get copied just to add one element.
If you know the number of elements in advance you can reserve() the number of elements you are planning to push_back(). In this case expensive copy operations are not necessary because the memory is already reserved for the amount needed.
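A rough way to observe this is to count how often the capacity changes while pushing elements. This is only a sketch: countReallocations is a hypothetical helper, and the exact count without reserve() depends on the implementation's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Push `count` ints and return how many times the capacity changed
// during the pushes, i.e. how many reallocations push_back triggered.
inline std::size_t countReallocations(std::size_t count, bool reserveFirst) {
    std::vector<int> v;
    if (reserveFirst) {
        v.reserve(count);  // one allocation up front, not counted below
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    assert(v.size() == count);
    return reallocations;
}
```

With reserveFirst set to true the function returns 0, because push_back never needs to grow the storage. Without it, the result is at least 1 and typically grows logarithmically with count.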
resize() in contrast changes the number of elements in the vector.
If the vector is empty and you call resize(20), 20 elements become accessible. The amount of memory allocated also increases to an implementation-dependent value.
If the vector holds 50 elements and you call resize(20), the last 30 elements are removed from the vector and are no longer accessible. This doesn't necessarily change the amount of memory allocated, but that may also be implementation-dependent.
resize(n) allocates the memory for n objects and value-initializes any new ones (an int becomes 0, for example).
reserve() allocates the memory but does not initialize. Hence, reserve won't change the value returned by size(), but it will change the result of capacity().
How these functions are implemented in VS2015
VS2015 CTP6
This error dialog exists only in debug mode, when _ITERATOR_DEBUG_LEVEL == 2. In release mode the check is compiled out, so no dialog appears (although indexing past size() is still undefined behavior). The element is returned by return (*(this->_Myfirst() + _Pos)); so the size value isn't needed:
reference operator[](size_type _Pos)
{ // subscript mutable sequence
#if _ITERATOR_DEBUG_LEVEL == 2
if (size() <= _Pos)
{ // report error
_DEBUG_ERROR("vector subscript out of range");
_SCL_SECURE_OUT_OF_RANGE;
}
#elif _ITERATOR_DEBUG_LEVEL == 1
_SCL_SECURE_VALIDATE_RANGE(_Pos < size());
#endif /* _ITERATOR_DEBUG_LEVEL */
return (*(this->_Myfirst() + _Pos));
}
Looking at the vector's source code, the main difference between resize and reserve is that resize() also constructs the new elements and changes the value of this->_Mylast().
reserve() calls _Reallocate.
resize() calls _Reserve, which calls _Reallocate; resize() then also changes the value of this->_Mylast(): this->_Mylast() += _Newsize - size(); which is used in the size calculation (see the last function).
void resize(size_type _Newsize)
{ // determine new length, padding as needed
if (_Newsize < size())
_Pop_back_n(size() - _Newsize);
else if (size() < _Newsize)
{ // pad as needed
_Reserve(_Newsize - size());
_TRY_BEGIN
_Uninitialized_default_fill_n(this->_Mylast(), _Newsize - size(),
this->_Getal());
_CATCH_ALL
_Tidy();
_RERAISE;
_CATCH_END
this->_Mylast() += _Newsize - size();
}
}
void reserve(size_type _Count)
{ // determine new minimum length of allocated storage
if (capacity() < _Count)
{ // something to do, check and reallocate
if (max_size() < _Count)
_Xlen();
_Reallocate(_Count);
}
}
void _Reallocate(size_type _Count)
{ // move to array of exactly _Count elements
pointer _Ptr = this->_Getal().allocate(_Count);
_TRY_BEGIN
_Umove(this->_Myfirst(), this->_Mylast(), _Ptr);
_CATCH_ALL
this->_Getal().deallocate(_Ptr, _Count);
_RERAISE;
_CATCH_END
size_type _Size = size();
if (this->_Myfirst() != pointer())
{ // destroy and deallocate old array
_Destroy(this->_Myfirst(), this->_Mylast());
this->_Getal().deallocate(this->_Myfirst(),
this->_Myend() - this->_Myfirst());
}
this->_Orphan_all();
this->_Myend() = _Ptr + _Count;
this->_Mylast() = _Ptr + _Size;
this->_Myfirst() = _Ptr;
}
void _Reserve(size_type _Count)
{ // ensure room for _Count new elements, grow exponentially
if (_Unused_capacity() < _Count)
{ // need more room, try to get it
if (max_size() - size() < _Count)
_Xlen();
_Reallocate(_Grow_to(size() + _Count));
}
}
size_type size() const _NOEXCEPT
{ // return length of sequence
return (this->_Mylast() - this->_Myfirst());
}
Problems
But some problems exist with reserve:
end() will be equal to begin()
23.2.1 General container requirements
5:
end() returns an iterator which is the past-the-end value for the container.
iterator end() _NOEXCEPT
{ // return iterator for end of mutable sequence
return (iterator(this->_Mylast(), &this->_Get_data()));
}
i.e. _Mylast() will be equal to _Myfirst()
at() will generate an out_of_range exception.
23.2.3 Sequence containers
17:
The member function at() provides bounds-checked access to container elements. at() throws out_of_range if n >= a.size().
In the Visual Studio debugger we can see vector values only when size isn't 0.
(Screenshots omitted: the debugger view with resize, and the view with reserve and a manually set #define _ITERATOR_DEBUG_LEVEL 0.)
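The first two problems listed above can be reproduced with standard C++ alone, independent of the Visual Studio internals. The following is a minimal sketch; atThrowsAfterReserve is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if at(index) throws std::out_of_range on a vector
// that has only had reserve() called on it.
inline bool atThrowsAfterReserve(std::size_t reserved, std::size_t index) {
    std::vector<int> v;
    v.reserve(reserved);
    // The vector is still empty, so begin() and end() coincide.
    assert(v.begin() == v.end());
    try {
        (void)v.at(index);  // bounds-checked access against size(), not capacity()
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Because at() checks the index against size(), which reserve() leaves at 0, every index throws, including indices smaller than the reserved capacity.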
