I would like to have a Variable with Read-Access to all kernels/functions inside a CL Program. For this i have created a variable at the top of the File and prefixed it with __global.
typedef struct{
/* whatever */
} GlobalParameters;
__global GlobalParameters params;
how can i set the Values inside that Struct from the Host code now? Is that even Possible, or how can i edit it else? Or do i have to pass it as Parameter to the kernel every time i need it?
Program scope variables are meant to be constants and need to be initialized.
So, this works like:
typedef struct{
float whatever;
} GlobalParameters;
__constant GlobalParameters params=(GlobalParameters){3.14f};
then you can use it anywhere. But if opencl-compile-time is ok for it, you can alter it with string replacement after preaparing the host-side constant buffer:
typedef struct{
float whatever;
} GlobalParameters;
__constant GlobalParameters params=(GlobalParameters){##replace_0##};
if this is used for minutes per change, you can re-compile it using new string replacement before device-kernel-compiling. If there are non-changing sets, you can compile N times for different kernel programs and switch between them using different contexts.
Related
I have the following example code:
int compute_stuff(int *array)
{
/* do stuff with array */
...
return x;
}
__kernel void my_kernel()
{
__local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
int result;
/* do stuff with local memory block */
result = compute_stuff(local_mem_block + (LENGTH*get_local_id(0)));
...
}
The above example compiles and executes fine on my NVIDIA card (RTX 2080).
But when I try to compile on a Macbook with AMD card, I get the following error:
error: passing '__local int *' to parameter of type '__private int *' changes address space of pointer
OK, so then I change the "compute_stuff" function to the following:
int compute_stuff(__local int *array)
Now both NVIDIA and AMD compile it fine, no problem...
But then I have one more test, to compile it on the same Macbook using WINE (rather than boot to Windows in bootcamp), and it gives the following error:
error: parameter may not be qualified with an address space
So it seems as though one is not supposed to qualify a function parameter with an address space. Fair enough. But if I do not do that, then the AMD on native Windows thinks that I am trying to change the address space of the pointer to private (I guess because it assumes that all function arguments will be private?).
What is a good way to handle this so that all three environments are happy to compile it? As a last resort, I am thinking of simply having the program check to see if the build failed without qualifier, and if so, substitute in the "__local" qualifier and build a second time... Seems like a hack, but it could work.
I agree with ProjectPhysX that it appears to be a bug with the WINE implementation. I also found the following appears to satisfy all three environments:
int compute_stuff(__local int * __private array)
{
...
}
__kernel void my_kernel()
{
__local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
__local int * __private samples;
samples = local_mem_block + (LENGTH*get_local_id(0));
result = compute_stuff(samples);
}
The above is explicitly stating that the pointer itself is private while the memory it is pointing to is kept in local address space. So this removes any ambiguity.
The int* in int compute_stuff(int *array) is __generic address space. The call result = compute_stuff(local_mem_block+...); implicitly converts it to __local, which is allowed according to the OpenCL 2.0 Khronos specification.
It could be that AMD defaults to OpenCL 1.2. Maybe explicitely set –cl-std=CL2.0 in clBuildProgram() or clCompileProgram().
To keep the code compatible with OpenCL 1.2, you can explicitly set the pointer in the function to __local: int compute_stuff(__local int *array). OpenCL allows to set function parameters to the address spaces __global and __local. WINE seems to have a bug here. Maybe inlining the function can solve it: int __attribute__((always_inline)) compute_stuff(__local int *array).
As a last resort, you can do your proposed method. You can detect if it runs on WINE system like this. With that, you could switch between the two code variants without compiling twice and detecting the error.
I need to copy some data from __global to __local in openCL using async_work_group_copy. The issue is, I'm not using a built-in data type.
The code snip of what I have tried is as follows:
typedef struct Y
{
...
} Y;
typedef struct X
{
Y y[MAXSIZE];
} X;
kernel void krnl(global X* restrict x){
global const Y* l = x[a].y;
local Y* l2;
size_t sol2 = sizeof(l);
async_work_group_copy(l2, l, sol2, 0);
}
where 'a' is just a vector of int. This code does not work, specifically because the gen_type is not a built-in one. The specs (1.2) says:
We use the generic type name gentype to indicate the built-in data
types ... as the type for the arguments unless otherwise stated.
so how do I otherwisely state this data type?
OpenCL async_work_group_copy() is designed to copy N elements of a basic data type. However it does not really know what is being copied. So you can tell it to copy N bytes, containing any type inside (including structs). Similar to memcpy().
You can do:
kernel void krnl(global X* restrict x){
global const Y* l = x[a].y;
local Y l2;
size_t sol2 = sizeof(Y);
async_work_group_copy((local char *)&l2, (global char *)l, sol2, 0);
}
However, remember that you need to declare local memory explicitly in side the kernel, or dinamically from the API side and pass a pointer to local memory.
You can't just create a local pointer without any initialization and copy there. (see my code)
I need to pass a complex data type to OpenCL as a buffer and I want (if possible) to avoid the buffer alignment.
In OpenCL I need to use two structures to differentiate the data passed in the buffer casting to them:
typedef struct
{
char a;
float2 position;
} s1;
typedef struct
{
char a;
float2 position;
char b;
} s2;
I define the kernel in this way:
__kernel void
Foo(
__global const void* bufferData,
const int amountElements // in the buffer
)
{
// Now I cast to one of the structs depending on an extra value
__global s1* x = (__global s1*)bufferData;
}
And it works well only when I align the data passed in the buffer.
The question is: Is there a way to use _attribute_ ((packed)) or _attribute_((aligned(1))) to avoid the alignment in data passed in the buffer?
If padding the smaller structure is not an option, I suggest passing another parameter to let your kernel function know what the type is - maybe just the size of the elements.
Since you have data types that are 9 and 10 bytes, it may be worth a try padding them both out to 12 bytes depending on how many of them you read within your kernel.
Something else you may be interested in is the extension: cl_khr_byte_addressable_store
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_byte_addressable_store.html
update:
I didn't realize you were passing a mixed array, I thought It was uniform in type. If you want to track the type on a per-element basis, you should pass a list of the types (or codes). Using float2 on its own in bufferData would probably be faster as well.
__kernel void
Foo(
__global const float2* bufferData,
__global const char* bufferTypes,
const int amountElements // in the buffer
)
I have some general parameters declared as a global (__constant) struct, like so:
typedef struct
{
int a;
int b;
float c;
/// blah blah
} SomeParams;
__constant SomeParams Parameters;
in the kernel, I need to use it like so:
__kernel void Foo()
{
int a = Parameters.a;
/// do something useful...
}
I'm not sure how I can initialize the value of Parameters from the host before I execute the kernel.
I have no problem creating buffers, etc for kernel arguments, but since this isn't a kernel argument, what do I need to do?
I'm using the Cloo C#/OpenCL bindings, but even a raw CL API would be helpful.
As far as I know (but I wouldn't swear by this), you can't initialize variables from the host code that are declared in that way (with one exception, see below). You could declare a variable and initialize it like this:
__constant float pi = 3.14f;
You could also do something like this:
Kernel: __constant float width = WIDTH
Host: Build the kernel with a -D build parameter defining the value of WIDTH.
What I have done in the past is have the constant variable as a kernel parameter.
__kernel void Foo(__constant SomeParams Parameters)
{
int a = Parameters.a;
/// do something useful...
}
Then you can allocate and set the value just like any other kernel argument.
I am following along with a tutorial located here: http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201
The kernel they have listed is this, which computes the sum of two numbers and stores it in the output variable:
__kernel void vector_add_gpu (__global const float* src_a,
__global const float* src_b,
__global float* res,
const int num)
{
/* get_global_id(0) returns the ID of the thread in execution.
As many threads are launched at the same time, executing the same kernel,
each one will receive a different ID, and consequently perform a different computation.*/
const int idx = get_global_id(0);
/* Now each work-item asks itself: "is my ID inside the vector's range?"
If the answer is YES, the work-item performs the corresponding computation*/
if (idx < num)
res[idx] = src_a[idx] + src_b[idx];
}
1) Say for example that the operation performed was much more complex than a summation - something that warrants its own function. Let's call it ComplexOp(in1, in2, out). How would I go about implementing this function such that vector_add_gpu() can call and use it? Can you give example code?
2) Now let's take the example to the extreme, and I now want to call a generic function that operates on the two numbers. How would I set it up so that the kernel can be passed a pointer to this function and call it as necessary?
Yes it is possible. You just have to remember that OpenCL is based on C99 with some caveats. You can create other functions either inside of the same kernel file or in a seperate file and just include it in the beginning. Auxiliary functions do not need to be declared as inline however, keep in mind that OpenCL will inline the functions when called. Pointers are also not available to use when calling auxiliary functions.
Example
float4 hit(float4 ray_p0, float4 ray_p1, float4 tri_v1, float4 tri_v2, float4 tri_v3)
{
//logic to detect if the ray intersects a triangle
}
__kernel void detection(__global float4* trilist, float4 ray_p0, float4 ray_p1)
{
int gid = get_global_id(0);
float4 hitlocation = hit(ray_p0, ray_p1, trilist[3*gid], trilist[3*gid+1], trilist[3*gid+2]);
}
You can have auxiliary functions for use in the kernel, see OpenCL user defined inline functions . You can not pass function pointers into the kernel.