I have an OpenGL VBO containing cl_float4 vertices and I'm trying to update it via OpenCL (I'm rendering the VBO contents as GL_POINTS). I pass the cl_mem object representing the VBO as a kernel argument (the buffer is created as CL_MEM_READ_WRITE).
Unfortunately, I cannot update a vertex's whole float4 in one operation.
The following snippet doesn't work (i.e. the rendered points don't move):
__kernel void update(__global float4* particle_positions)
{
int gid = get_global_id(0);
particle_positions[gid] += float4(0.1, 0.1, 0.1, 0.0);
}
The following snippet does work (i.e. the rendered points move):
__kernel void update(__global float4* particle_positions)
{
int gid = get_global_id(0);
particle_positions[gid].x += 0.1;
particle_positions[gid].y += 0.1;
particle_positions[gid].z += 0.1;
}
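A vector literal has to be written with the type in parentheses, like
(float4)(1,1,1,1)
to get a float4 value; float4(...) without the parentheses around the type is not valid OpenCL C. You can also build it from smaller vectors
(float4)((float2)(1,1),(float2)(1,1))
or from a mix of vectors and scalars
(float4)((float2)(1,1),1,1)
so it acts like an overloaded function.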
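To make the fix concrete, here is a minimal plain-C sketch of what the corrected kernel computes for each work-item. The `float4_t` struct and the `update_all` loop are stand-ins assumed for illustration (plain C has no built-in float4); in OpenCL C the whole loop body is the single line shown in the comment.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4; not part of any real API. */
typedef struct { float x, y, z, w; } float4_t;

/* Emulates the kernel over n work-items. In OpenCL C the loop body is:
 *   particle_positions[gid] += (float4)(0.1f, 0.1f, 0.1f, 0.0f);
 */
void update_all(float4_t *particle_positions, size_t n)
{
    const float4_t delta = { 0.1f, 0.1f, 0.1f, 0.0f };
    for (size_t gid = 0; gid < n; ++gid) {
        particle_positions[gid].x += delta.x;
        particle_positions[gid].y += delta.y;
        particle_positions[gid].z += delta.z;
        particle_positions[gid].w += delta.w; /* w is left unchanged (delta.w is 0) */
    }
}
```

The result should match the component-wise version from the question that already works.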
I'm writing a renderer from scratch using OpenCL and I have a compilation problem with my kernel. The error is:
CL_BUILD_PROGRAM : error: program scope variable must reside in constant address space static float* objects;
The problem is that this program compiles on my desktop (with nvidia drivers) but not on my laptop (also with nvidia drivers). I also have the exact same kernel file in another project that works fine on both computers...
Does anyone have an idea what I could be doing wrong?
As a clarification, I'm coding a raymarcher whose kernel takes a list of objects "encoded" in a float array. The array is needed in many places in the program, which is why I want it accessible to the whole kernel.
Here is the simplified kernel code:
float* objects;
float4 getDistCol(float3 position) {
int arr_length = objects[0];
float4 distCol = {INFINITY, 0, 0, 0};
int index = 1;
while (index < arr_length) {
float objType = objects[index];
if (compare(objType, SPHERE)) {
// Treats the part of the buffer as a sphere
index += SPHERE_ATR_LENGTH;
} else if (compare(objType, PLANE)) {
//Treats the part of the buffer as a plane
index += PLANE_ATR_LENGTH;
} else {
float4 errCol = {500, 1, 0, 0};
return errCol;
}
}
}
__kernel void mkernel(__global int *image, __constant int *dimension,
__constant float *position, __constant float *aimDir, __global float *objs) {
objects = objs;
// Gets ray direction and stuff
// ...
// ...
float4 distCol = RayMarch(ro, rd);
float3 impact = rd*distCol.x + ro;
col = distCol.yzw * GetLight(impact);
image[dimension[0]*dimension[1] - idx*dimension[1]+idy] = toInt(col);
Where getDistCol(float3 position) gets called a lot by a lot of functions and I would like to avoid having to pass my float buffer to every function that needs to call getDistCol()...
There are no "static" variables in OpenCL C that you can declare outside of kernels and use across kernels. Some compilers might still tolerate this, others might not. Nvidia has recently changed their OpenCL compiler from LLVM 3.4 to NVVM 7 in a driver update, so you may have the two different compilers on your desktop and laptop GPUs.
In your case, the solution is to hand the global kernel parameter pointer over to the function:
float4 getDistCol(float3 position, __global float *objects) {
int arr_length = objects[0]; // access objects normally, as you would in the kernel
// ...
}
kernel void mkernel(__global int *image, __constant int *dimension, __constant float *position, __constant float *aimDir, __global float *objs) {
// ...
getDistCol(position, objs); // hand global objs pointer over to function
// ...
}
Variables declared at program scope (outside any function) are only allowed in the constant address space, which is useful for large lookup tables. They are cached in L2$, so read-only access is potentially faster. Example:
constant float objects[1234] = {
1.0f, 2.0f, ...
};
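As a rough illustration of the "hand the pointer over" approach, here is a plain-C sketch that walks an encoded object buffer received as a parameter, and a second helper that simply forwards the same pointer. The tag values, the attribute lengths, and the names `count_objects` and `scene_is_valid` are assumptions made up for this example; the layout (total length in objects[0], then one type tag per object) follows the loop in the question.

```c
#include <stddef.h>

/* Hypothetical encoding, assumed for illustration only:
 * objects[0] holds the total number of floats in the buffer, then each
 * object starts with a type tag followed by its attributes. */
enum { SPHERE = 1, PLANE = 2 };
enum { SPHERE_ATR_LENGTH = 5, PLANE_ATR_LENGTH = 7 };

/* The buffer is an explicit parameter, mirroring
 *   getDistCol(float3 position, __global float *objects)
 * Returns the number of objects found, or -1 on an unknown tag. */
int count_objects(const float *objects)
{
    int arr_length = (int)objects[0];
    int index = 1;
    int count = 0;
    while (index < arr_length) {
        int obj_type = (int)objects[index];
        if (obj_type == SPHERE) {
            index += SPHERE_ATR_LENGTH;
        } else if (obj_type == PLANE) {
            index += PLANE_ATR_LENGTH;
        } else {
            return -1; /* unknown tag: stop instead of looping forever */
        }
        ++count;
    }
    return count;
}

/* A caller further up the chain just forwards the pointer it was given. */
int scene_is_valid(const float *objects)
{
    return count_objects(objects) >= 0;
}
```

Every function between the kernel and getDistCol needs the extra parameter, which is more typing, but it avoids depending on compiler-specific behaviour.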
I currently have something like this in my kernel code:
func(__global float2 *array, __global float *buffer) {
float *vector[2];
vector[0] = array.s0;
vector[1] = array.s1;
So I can do something like this later in the code:
vector[vec_off][index] = buffer[i];
Basically, I want to be able to access the elements of a float2 in my code based on a calculated index. The point is to be able to easily expand it to a float4/float16 vector later on.
Currently I get a (-11) error (CL_BUILD_PROGRAM_FAILURE) when I try to do vector[0] = array.x; which I guess means I'm not allowed to write it like that in OpenCL.
If it's not just a syntax error, I should be able to do this by accessing each element of array using an offset, so I would have:
array.s0 = array
array.s1 = array + offset
...
array.sf = array + 15 * offset
However, I do not know how a floatn is stored in memory. Is the .s1 part stored right after the .s0? If that is the case, then offset would just be the size of array.s0, right?
Thank you.
To access float2 elements with a calculated index, you can use a union or cast directly to float*:
1. Using union
Define the following union:
union float_type
{
float2 data2;
float data[2];
};
and then cast the float2 array on the fly and access elements using a calculated index:
void func(__global float2 *array, __global float *buffer) {
float foo = ((__global union float_type*)array)[1].data[1];
}
2. Cast to float*
void func(__global float2 *array, __global float *buffer) {
float foo = ((__global float*)&array[1])[1];
}
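On the layout question: the components of a floatn are stored consecutively, so .s1 sits directly after .s0, which is why both approaches above work. The following is a plain-C sketch of the union variant; `float2_t` is a hypothetical stand-in for OpenCL's float2, and `get_component` / `set_component` are names invented for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float2: two consecutive floats. */
typedef struct { float s0, s1; } float2_t;

/* Same idea as the union in the answer: view the vector as a float array. */
typedef union {
    float2_t data2;
    float    data[2];
} float_type;

/* Read component `comp` (0 or 1) of element `elem` using a computed index. */
float get_component(const float_type *array, size_t elem, size_t comp)
{
    return array[elem].data[comp];
}

/* Write component `comp` of element `elem` using a computed index. */
void set_component(float_type *array, size_t elem, size_t comp, float value)
{
    array[elem].data[comp] = value;
}
```

Widening this to a float4 or float16 later should only mean changing the vector type and the array length inside the union.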
I've been trying for a while to get support for softbodies in my project,
I have already added all primitives, including static triangle meshes as you can see below:
I've now been trying to implement the softbodies.
I do have triangle shapes as I mentioned, and I thought I could re-use the triangulation code to
create softbody objects with the function:
btSoftBody* psb = btSoftBodyHelpers::CreateFromTriMesh(.....);
I successfully did this with the hardcoded bunny mesh, but now I want to pass any triangulated mesh into this function.
But I'm a bit lost figuring out exactly what parameters to pass in (how to get the right parameters from my triangulated mesh).
Does anyone have an example of this? (Not a hardcoded one, but one that starts from a
btTriangleMesh *mTriMesh = new btTriangleMesh();
type object.)
It does work with the predefined type shapes that bullet has, so my update loop and all that works fine.
This is for version 2.81 (assuming vertices are stored as PHY_FLOAT and indices as PHY_INTEGER):
btTriangleMesh *mTriMesh = new btTriangleMesh();
// ...
const btVector3 meshScaling = mTriMesh->getScaling();
btAlignedObjectArray<btScalar> vertices;
btAlignedObjectArray<int> triangles;
for (int part=0;part< mTriMesh->getNumSubParts(); part++)
{
const unsigned char * vertexbase;
const unsigned char * indexbase;
int indexstride;
int stride,numverts,numtriangles;
PHY_ScalarType type, gfxindextype;
mTriMesh->getLockedReadOnlyVertexIndexBase(&vertexbase,numverts,type,stride,&indexbase,indexstride,numtriangles,gfxindextype,part);
for (int gfxindex=0; gfxindex < numverts; gfxindex++)
{
float* graphicsbase = (float*)(vertexbase+gfxindex*stride);
vertices.push_back(graphicsbase[0]*meshScaling.getX());
vertices.push_back(graphicsbase[1]*meshScaling.getY());
vertices.push_back(graphicsbase[2]*meshScaling.getZ());
}
for (int gfxindex=0;gfxindex < numtriangles; gfxindex++)
{
unsigned int* tri_indices= (unsigned int*)(indexbase+gfxindex*indexstride);
triangles.push_back(tri_indices[0]);
triangles.push_back(tri_indices[1]);
triangles.push_back(tri_indices[2]);
}
mTriMesh->unLockReadOnlyVertexBase(part); // release the lock taken above
}
btSoftBodyWorldInfo worldInfo;
// Setup worldInfo...
// ....
btSoftBody* psb = btSoftBodyHelpers::CreateFromTriMesh(worldInfo, &vertices[0], &triangles[0], triangles.size()/3 /*, randomizeConstraints = true*/);
A slower, more general approach is to iterate the mesh using mTriMesh->InternalProcessAllTriangles(), but that will turn your mesh into a triangle soup (every triangle gets its own three vertices, so vertices are no longer shared).
I need to pass a complex data type to OpenCL as a buffer and I want (if possible) to avoid the buffer alignment.
In OpenCL I need two structures, and I cast the buffer to one or the other depending on the data it contains:
typedef struct
{
char a;
float2 position;
} s1;
typedef struct
{
char a;
float2 position;
char b;
} s2;
I define the kernel in this way:
__kernel void
Foo(
__global const void* bufferData,
const int amountElements // in the buffer
)
{
// Now I cast to one of the structs depending on an extra value
__global s1* x = (__global s1*)bufferData;
}
It works well only when I align the data passed in the buffer.
The question is: is there a way to use __attribute__ ((packed)) or __attribute__((aligned(1))) to avoid the alignment of the data passed in the buffer?
If padding the smaller structure is not an option, I suggest passing another parameter to let your kernel function know what the type is - maybe just the size of the elements.
Since you have data types that are 9 and 10 bytes, it may be worth a try padding them both out to 12 bytes depending on how many of them you read within your kernel.
Something else you may be interested in is the extension: cl_khr_byte_addressable_store
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_byte_addressable_store.html
update:
I didn't realize you were passing a mixed array; I thought it was uniform in type. If you want to track the type on a per-element basis, you should pass a list of the types (or codes). Using float2 on its own in bufferData would probably be faster as well.
__kernel void
Foo(
__global const float2* bufferData,
__global const char* bufferTypes,
const int amountElements // in the buffer
)
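On the host side, filling those two buffers could look roughly like the sketch below. It is only an illustration under assumptions: `vec2f` stands in for cl_float2 from the OpenCL headers, and the `element` struct and `split_elements` function are hypothetical names for whatever mixed records you currently build.

```c
#include <stddef.h>

/* Stand-in for cl_float2; real host code would use the OpenCL header type. */
typedef struct { float x, y; } vec2f;

/* Hypothetical host-side record: a type code plus a position. */
typedef struct { char type; vec2f position; } element;

/* Split the mixed records into two tightly packed arrays, one per kernel
 * argument (bufferData and bufferTypes). Both outputs must hold n entries. */
void split_elements(const element *in, size_t n, vec2f *positions, char *types)
{
    for (size_t i = 0; i < n; ++i) {
        positions[i] = in[i].position;
        types[i] = in[i].type;
    }
}
```

Because each output array holds a single type, neither one needs padding between elements, so the struct alignment question goes away.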
I am following along with a tutorial located here: http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201
The kernel they have listed is this, which computes the sum of two numbers and stores it in the output variable:
__kernel void vector_add_gpu (__global const float* src_a,
__global const float* src_b,
__global float* res,
const int num)
{
/* get_global_id(0) returns the ID of the thread in execution.
As many threads are launched at the same time, executing the same kernel,
each one will receive a different ID, and consequently perform a different computation.*/
const int idx = get_global_id(0);
/* Now each work-item asks itself: "is my ID inside the vector's range?"
If the answer is YES, the work-item performs the corresponding computation*/
if (idx < num)
res[idx] = src_a[idx] + src_b[idx];
}
1) Say for example that the operation performed was much more complex than a summation - something that warrants its own function. Let's call it ComplexOp(in1, in2, out). How would I go about implementing this function such that vector_add_gpu() can call and use it? Can you give example code?
2) Now let's take the example to the extreme, and I now want to call a generic function that operates on the two numbers. How would I set it up so that the kernel can be passed a pointer to this function and call it as necessary?
Yes, it is possible. Just remember that OpenCL C is based on C99 with some caveats. You can define other functions either in the same kernel file or in a separate file that you include at the top. Auxiliary functions do not need to be declared inline; however, keep in mind that OpenCL compilers will typically inline them at the call site. Function pointers are not available in OpenCL C, so a function cannot be passed as an argument.
Example
float4 hit(float4 ray_p0, float4 ray_p1, float4 tri_v1, float4 tri_v2, float4 tri_v3)
{
//logic to detect if the ray intersects a triangle
}
__kernel void detection(__global float4* trilist, float4 ray_p0, float4 ray_p1)
{
int gid = get_global_id(0);
float4 hitlocation = hit(ray_p0, ray_p1, trilist[3*gid], trilist[3*gid+1], trilist[3*gid+2]);
}
You can have auxiliary functions for use in the kernel; see "OpenCL user defined inline functions". You cannot pass function pointers into the kernel.
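For the second part of the question, one common workaround (my suggestion, not something from the tutorial) is to pass an integer operation code and branch on it inside an auxiliary function. Below is a plain-C sketch of that idea; the `op_code` values and the names `complex_op` and `vector_op` are made up for illustration, and the loop variable plays the role of get_global_id(0).

```c
#include <stddef.h>

/* Hypothetical operation codes, for illustration only. */
enum op_code { OP_ADD = 0, OP_MUL = 1 };

/* Auxiliary function: an ordinary function defined before the kernel.
 * The operation is selected by an integer code rather than a function pointer. */
float complex_op(float in1, float in2, int op)
{
    switch (op) {
    case OP_ADD: return in1 + in2;
    case OP_MUL: return in1 * in2;
    default:     return 0.0f; /* unknown code */
    }
}

/* Plain-C emulation of the kernel: each iteration is one work-item. */
void vector_op(const float *src_a, const float *src_b, float *res, int num, int op)
{
    for (int idx = 0; idx < num; ++idx) {
        res[idx] = complex_op(src_a[idx], src_b[idx], op);
    }
}
```

In a real kernel, op would be an extra int kernel argument, and complex_op would be defined above the kernel in the same source.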