When writing code for kernels, is it possible to specify a generic data type so that copying the kernel for each used data type is not necessary? Currently I'm using preprocessor macros to define the whole function with various data types:
#define REDUCTION(type) __kernel void reduce_##type##_f(__global __read_only type* a) \
{ \
    /* do something */ \
}
REDUCTION(float)
REDUCTION(float2)
This however is not very comfortable. Is there some type specifier like gentype available?
You should be able to do that starting with OpenCL 2.1, which lets you use C++ and templates in kernel code (see Khronos's OpenCL page).
With that, you can simply write:
template <class T>
void reduce_f(__global __read_only T* a) {
// do something
}
However, I am not 100% sure templates would be available in the definition of __kernel functions. If that is not the case, you would still need to wrap the kernel declaration in preprocessor macros, like so:
#define REDUCTION(type) __kernel void reduce_##type##_f(__global __read_only type* a) \
{ \
    reduce_f(a); \
}
REDUCTION(float)
I have the following example code:
int compute_stuff(int *array)
{
/* do stuff with array */
...
return x;
}
__kernel void my_kernel()
{
__local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
int result;
/* do stuff with local memory block */
result = compute_stuff(local_mem_block + (LENGTH*get_local_id(0)));
...
}
The above example compiles and executes fine on my NVIDIA card (RTX 2080).
But when I try to compile on a Macbook with AMD card, I get the following error:
error: passing '__local int *' to parameter of type '__private int *' changes address space of pointer
OK, so then I change the "compute_stuff" function to the following:
int compute_stuff(__local int *array)
Now both NVIDIA and AMD compile it fine, no problem...
But then I have one more test, to compile it on the same Macbook using WINE (rather than boot to Windows in bootcamp), and it gives the following error:
error: parameter may not be qualified with an address space
So it seems as though one is not supposed to qualify a function parameter with an address space. Fair enough. But if I do not do that, then the AMD on native Windows thinks that I am trying to change the address space of the pointer to private (I guess because it assumes that all function arguments will be private?).
What is a good way to handle this so that all three environments are happy to compile it? As a last resort, I am thinking of simply having the program check to see if the build failed without qualifier, and if so, substitute in the "__local" qualifier and build a second time... Seems like a hack, but it could work.
I agree with ProjectPhysX that it appears to be a bug in the WINE implementation. I also found that the following appears to satisfy all three environments:
int compute_stuff(__local int * __private array)
{
...
}
__kernel void my_kernel()
{
__local int local_mem_block[LENGTH*MY_LOCAL_WORK_SIZE];
__local int * __private samples;
samples = local_mem_block + (LENGTH*get_local_id(0));
int result = compute_stuff(samples);
}
The above explicitly states that the pointer itself is private while the memory it points to is kept in the local address space, so this removes any ambiguity.
The int* in int compute_stuff(int *array) is __generic address space. The call result = compute_stuff(local_mem_block+...); implicitly converts it to __local, which is allowed according to the OpenCL 2.0 Khronos specification.
It could be that AMD defaults to OpenCL 1.2. Maybe explicitly set -cl-std=CL2.0 in the build options passed to clBuildProgram() or clCompileProgram().
To keep the code compatible with OpenCL 1.2, you can explicitly set the pointer in the function to __local: int compute_stuff(__local int *array). OpenCL allows function parameters to be qualified with the __global and __local address spaces. WINE seems to have a bug here. Maybe inlining the function can solve it: int __attribute__((always_inline)) compute_stuff(__local int *array).
As a last resort, you can do your proposed method. You can also detect at runtime whether the program is running under WINE. With that, you could switch between the two code variants without compiling twice and detecting the error.
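For illustration, a minimal sketch of such a switch (ARRAY_QUALIFIER is a hypothetical macro name, not part of any API; the host would add the define to the build options only where qualified parameters are accepted):
/* Default to no qualifier (e.g. for the WINE build); pass
   "-D ARRAY_QUALIFIER=__local" in the build options everywhere else. */
#ifndef ARRAY_QUALIFIER
#define ARRAY_QUALIFIER
#endif

int compute_stuff(ARRAY_QUALIFIER int *array)
{
    /* do stuff with array (placeholder body) */
    return array[0];
}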
I am writing a function:
void callFunctionAt(uint32_t address){
//There is a void at address, how do I run it?
}
This is in Atmel Studio's C++. If previous questions are to be believed, the simple answer is to write the line "address();". This cannot be correct. Without changing the header of this function, how would one call the function located at the address given?
The answer should be system-agnostic for all micro controllers which support standard c++ compilation.
The common way to do this is to give the argument the correct type. Then you can call it right away:
void callFunctionAt(void (*address)()) {
address();
}
However, since you wrote "Without changing the header of this function [...]", you need to cast the unsigned integer to a function pointer:
void callFunctionAt(uint32_t address) {
void (*f)() = reinterpret_cast<void (*)()>(address);
f();
}
But this is not safe and not portable because it assumes that a uint32_t can be cast into a function pointer, and this need not be true for code that is "[...] system-agnostic for all micro controllers [...]". Function pointers can have widths other than 32 bits, and pointers in general might consist of more than the pure address, for example including a selector for memory spaces, depending on the system's architecture.
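If you do go the casting route anyway, a small compile-time check at least makes the size assumption explicit (C++11 static_assert; fitting in size is a necessary, not sufficient, condition for the cast to be meaningful):
#include <stdint.h>

// Fail the build if a function pointer does not fit into a uint32_t on this target.
static_assert(sizeof(void (*)()) <= sizeof(uint32_t),
              "uint32_t cannot hold a function pointer on this target");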
If you got the address from a linker script, you might have declared it like this:
extern const uint32_t ext_func;
And like to use it so:
callFunctionAt(ext_func);
But you can change the declaration into:
extern void ext_func();
And call it directly or indirectly:
ext_func();
callFunctionAt(&ext_func);
The definition in the linker can stay as it is, because the linker knows nothing about types.
There is no generic way. It depends on which compiler you are using. In the following I'll assume avr-g++ because it's common and freely available.
Spoiler: On AVR, it's more complicated than on most other machines.
Suppose you actually have a uint32_t address which would be a byte address. Function pointers in avr-g++ are word addresses actually, where a word has 16 bits. Hence, you'll have to divide the byte address by 2 first to get a word address; then cast it to a function pointer and call it:
#include <stdint.h>
typedef void (*func_t)(void);
void callFunctionAt (uint32_t byte_address)
{
func_t func = (func_t) (byte_address >> 1);
func();
}
If you started with a word address, then you can call it without further ado:
void callFunctionAt (uint32_t word_address)
{
((func_t) word_address)();
}
This will only work for devices with up to 128KiB of flash memory!
The reason is that addresses in avr-g++ are 16 bits long, cf. the layout of void* as per avr-gcc ABI. This means using scalar addresses on devices with flash > 128KiB will not work in general, for example when you issue callFunctionAt (0x30000) on an ATmega2560.
On such devices, the 16-bit address in the Z register used by the EICALL instruction is extended by the value held in the EIND special function register, and you must not change EIND after entering main. The avr-g++ documentation is clear about that.
The crucial point here is how you are getting the address. First, in order to call and pass it around properly, use a function pointer:
typedef void (*func_t)(void);
void callFunctionAt (func_t address)
{
address();
}
void func (void);
void call_func()
{
func_t addr = func;
callFunctionAt (addr);
}
I am using a void argument in the declaration because this is how you'd do it in C.
Or, if you don't like the typedef:
void callFunctionAt (void (*address)(void))
{
address();
}
void func (void);
void call_func ()
{
void (*addr)(void) = func;
callFunctionAt (addr);
}
If you want to call a function at a specific word address, for example 0x0 to "reset"1 the µC, you could
void call_0x0()
{
callFunctionAt ((func_t) 0x0);
}
but whether this works depends on where your vector table is located, or more specifically, how EIND was initialized by the startup code. What will always work is using a symbol and defining it with -Wl,--defsym,func=0 when linking, with the following code:
extern "C" void func();
void call_func ()
{
void (*addr)(void) = func;
callFunctionAt (addr);
}
The big difference compared to using 0x0 directly is that the compiler will wrap symbol func in the symbol modifier gs, which it will not do when using 0x0 directly:
_Z9call_funcv:
ldi r24,lo8(gs(func))
ldi r25,hi8(gs(func))
jmp _Z14callFunctionAtPFvvE
This is needed to advise the linker to generate a stub if the address is out of the range of EIJMP.
1 This will not reset the hardware. The best approach to force a reset is by letting the watchdog timer (WDT) issue a reset for you.
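For reference, a minimal sketch of such a watchdog-triggered reset with avr-libc (assuming <avr/wdt.h> and the usual wdt_enable() API are available in your toolchain):
#include <avr/wdt.h>

// Arm the watchdog with the shortest timeout and spin until it fires,
// which performs a real hardware reset.
void reset_via_wdt()
{
    wdt_enable(WDTO_15MS);
    for (;;) { }
}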
Methods
Yet another situation is when you want the address of a non-static method of a class because you also need a this pointer in that case:
class A
{
int a = 1;
public:
int method1 () { return a += 1; }
int method2 () { return a += 2; }
};
void callFunctionAt (A *b, int (A::*f)())
{
A a;
(a.*f)();
(b->*f)();
}
void call_method ()
{
A a;
callFunctionAt (&a, &A::method1);
callFunctionAt (&a, &A::method2);
}
The 2nd argument of callFunctionAt specifies which method (of a given prototype) you want, but you also need an object (or a pointer to one) to apply it to. avr-g++ will use gs when taking the method's address (provided the following call(s) cannot be inlined), thus it will also work for all AVR devices.
Based on the comments, I think you are asking about how a microcontroller calls a function.
Could you compile your program and look at the generated assembly files?
I would recommend reading one of them.
After compilation, every function is translated into instructions that the CPU can execute (loading into registers, adding to registers, etc.).
So your void foo(int x) {statements;} compiles to simple CPU instructions, and whenever you call foo(x) in your program, you jump to the instructions that belong to foo - you are calling a subroutine.
As far as I remember there is a CALL instruction in AVR to invoke subroutines, and the name of the subroutine is the label to which the executing program jumps, continuing with the next instruction at that address.
I think you can clarify your doubts by reading some AVR assembly tutorials.
It is fun (at least for me) to see what exactly the CPU does when it calls a function that I wrote, but it requires knowing what the instructions do. You are developing for AVR, so there is a set of instructions that you can read about in this PDF and compare with your assembly files.
I would like to have a variable with read access from all kernels/functions inside a CL program. For this I have created a variable at the top of the file and prefixed it with __global.
typedef struct{
/* whatever */
} GlobalParameters;
__global GlobalParameters params;
How can I set the values inside that struct from the host code now? Is that even possible, or how else can I modify it? Or do I have to pass it as a parameter to the kernel every time I need it?
Program-scope variables are meant to be constants (in the __constant address space) and need to be initialized.
So, this works like:
typedef struct{
float whatever;
} GlobalParameters;
__constant GlobalParameters params=(GlobalParameters){3.14f};
Then you can use it anywhere. But if the OpenCL compile time is acceptable for it, you can alter it with string replacement after preparing the host-side constant values:
typedef struct{
float whatever;
} GlobalParameters;
__constant GlobalParameters params=(GlobalParameters){##replace_0##};
If each change stays in use for minutes at a time, you can re-compile with a new string replacement before device-kernel compilation. If there are non-changing parameter sets, you can compile N times into different kernel programs and switch between them using different contexts.
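For illustration, a rough host-side sketch of the string-replacement idea in the raw C API (build_with_constant and kernel_template are made-up names, the fixed buffer size and missing error handling are simplifications, and the template is assumed to contain a %f where ##replace_0## was):
#include <stdio.h>
#include <CL/cl.h>

/* Substitute the constant into the source text, then build the program. */
cl_program build_with_constant(cl_context context, cl_device_id device,
                               const char *kernel_template, float whatever)
{
    char source[4096];
    snprintf(source, sizeof source, kernel_template, whatever);

    const char *src = source;
    cl_int err;
    cl_program program = clCreateProgramWithSource(context, 1, &src, NULL, &err);
    clBuildProgram(program, 1, &device, "", NULL, NULL);
    return program;
}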
I have seen in one post here that we can call a function from an OpenCL kernel. But in my situation, I need that complex function to be parallelized (run by all available threads) as well. Do I have to make that function a kernel too and call it straight away like a function from the main kernel, or what is a possible solution for this situation? Thanks in advance.
You can call helper functions from your kernel, and they will be parallelized in the same manner as the kernel; imagine them as inlined inside your kernel code. So, each work item will invoke the helper function for the working set it handles.
float helper_function(float4 input)
{
    return input.x + input.y + input.z + input.w;
}
__kernel void kernel_function(__global const float4* arr, __global float* out)
{
    size_t id = get_global_id(0);
    out[id] = helper_function(arr[id]);
}
The OpenCL 2.0 spec added a new feature for dynamic parallelism.
6.13.17 Enqueuing Kernels
OpenCL 2.0 allows a kernel to independently enqueue to the same device, without host
interaction. ...
In the example below my_func_B enqueues my_func_A on the device:
kernel void
my_func_A(global int *a, global int *b, global int *c)
{
...
}
kernel void
my_func_B(global int *a, global int *b, global int *c)
{
ndrange_t ndrange;
// build ndrange information
...
// example – enqueue a kernel as a block
enqueue_kernel(get_default_queue(), ndrange, ^{my_func_A(a, b, c);});
...
}
If I understand your question correctly, you want to do a separate full pass over a buffer from inside the kernel. I don't think that is possible from within the kernel, so you'd have to create the code for the "inner" pass as a separate kernel and also call that kernel separately from your host code. The output from that kernel doesn't have to be read back to the host memory, but can stay in device memory between your kernel calls.
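A rough host-side sketch of that pattern (queue, kernels, the intermediate buffer, and the work size are assumed to already exist; error handling is omitted):
#include <CL/cl.h>

/* Run the "inner" pass as its own kernel, then the main kernel; the
   intermediate buffer stays in device memory the whole time. */
void run_two_passes(cl_command_queue queue, cl_kernel inner_pass, cl_kernel main_pass,
                    cl_mem intermediate, size_t global_size)
{
    clSetKernelArg(inner_pass, 0, sizeof intermediate, &intermediate);
    clEnqueueNDRangeKernel(queue, inner_pass, 1, NULL, &global_size, NULL, 0, NULL, NULL);

    clSetKernelArg(main_pass, 0, sizeof intermediate, &intermediate);
    clEnqueueNDRangeKernel(queue, main_pass, 1, NULL, &global_size, NULL, 0, NULL, NULL);

    clFinish(queue); /* an in-order queue keeps the two passes ordered */
}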
I have some general parameters declared as a global (__constant) struct, like so:
typedef struct
{
int a;
int b;
float c;
/// blah blah
} SomeParams;
__constant SomeParams Parameters;
in the kernel, I need to use it like so:
__kernel void Foo()
{
int a = Parameters.a;
/// do something useful...
}
I'm not sure how I can initialize the value of Parameters from the host before I execute the kernel.
I have no problem creating buffers, etc for kernel arguments, but since this isn't a kernel argument, what do I need to do?
I'm using the Cloo C#/OpenCL bindings, but even a raw CL API would be helpful.
As far as I know (but I wouldn't swear by this), you can't initialize variables from the host code that are declared in that way (with one exception, see below). You could declare a variable and initialize it like this:
__constant float pi = 3.14f;
You could also do something like this:
Kernel: __constant float width = WIDTH;
Host: Build the kernel with a -D build parameter defining the value of WIDTH.
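For example, a hedged host-side sketch in the raw C API (build_with_width is a made-up helper; the Cloo bindings also accept a build-options string in the same way):
#include <stdio.h>
#include <CL/cl.h>

/* Pass WIDTH as a preprocessor define so that the kernel's
   "__constant float width = WIDTH;" picks up the host-chosen value. */
cl_int build_with_width(cl_program program, cl_device_id device, float width)
{
    char options[64];
    snprintf(options, sizeof options, "-D WIDTH=%ff", width);
    return clBuildProgram(program, 1, &device, options, NULL, NULL);
}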
What I have done in the past is have the constant variable as a kernel parameter.
__kernel void Foo(__constant SomeParams* Parameters)
{
int a = Parameters->a;
/// do something useful...
}
Then you can allocate and set the value just like any other kernel argument.
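A rough sketch of that host side in the raw C API (set_parameters is a made-up helper; the caller still has to release the buffer, and the host struct layout must match the device-side one):
#include <CL/cl.h>

typedef struct
{
    int a;
    int b;
    float c;
    /// blah blah
} SomeParams;

/* Copy the struct into a small read-only buffer and bind it to
   argument 0 of the Foo kernel. */
cl_int set_parameters(cl_context context, cl_kernel kernel, const SomeParams *p)
{
    cl_int err;
    cl_mem buf = clCreateBuffer(context,
                                CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                sizeof *p, (void *)p, &err);
    if (err != CL_SUCCESS)
        return err;
    return clSetKernelArg(kernel, 0, sizeof buf, &buf);
}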