Is there a way to use C style function pointers in OpenCL?
In other words, I'd like to fill out an OpenCL struct with several values, as well as pointers to an OpenCL function. I'm not talking about going from a CPU function to a GPU function; I'm talking about going from a GPU function to a GPU function.
Is this possible?
--- EDIT ---
If not, is there a way around this? In CUDA we have object inheritance, and in 4.0 we even have virtual functions. About the only way I can find to implement a runtime dispatch like this is to resort to if statements, and that will get ugly really fast.
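For concreteness, the kind of branch-based dispatch I mean would look something like this (the function and tag names are made up):

// Hypothetical tag-based dispatch: every "virtual call" becomes a branch
// on a type tag, and the switch grows with every new "subclass".
float shade_lambert(float x) { return 0.5f * x; }
float shade_phong(float x)   { return x * x; }

__kernel void shade(__global const int *type, __global float *out)
{
    size_t i = get_global_id(0);
    switch (type[i]) {
        case 0: out[i] = shade_lambert(out[i]); break;
        case 1: out[i] = shade_phong(out[i]);   break;
    }
}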
From the OpenCL 1.1 specification:
Section 6.8 (Restrictions) (a):
The use of pointers is somewhat restricted. The following rules apply:
- Arguments to kernel functions declared in a program that are pointers must be declared with the __global, __constant or __local qualifier.
- A pointer declared with the __constant, __local or __global qualifier can only be assigned to a pointer declared with the __constant, __local or __global qualifier respectively.
- Pointers to functions are not allowed.
The usual workaround I use for this is macros. Evil, but currently inescapable. So I typically end up with something like:
#define FEATURENAME_START impl1_start
#define FEATURENAME_END impl1_end
I then either inject this into the kernel source at compilation time or pass it as an argument to the OpenCL compiler. It's not quite runtime in the usual sense, but it can still be runtime from the host's perspective, even if not from the device's.
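For example, choosing the implementation on the host and passing it through the build options might look like this (a sketch; the impl1_/impl2_ names and the already-created program and device are assumptions):

#include <CL/cl.h>

/* Choose an implementation at run time on the host, then bake it into
   the kernel via -D build options. */
cl_int build_with_feature(cl_program program, cl_device_id device, int use_impl2)
{
    const char *opts = use_impl2
        ? "-DFEATURENAME_START=impl2_start -DFEATURENAME_END=impl2_end"
        : "-DFEATURENAME_START=impl1_start -DFEATURENAME_END=impl1_end";
    return clBuildProgram(program, 1, &device, opts, NULL, NULL);
}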
AMD plans hardware support for this, so there may eventually be an extension exposing it.
Related
I'm relatively new to OpenCL and was wondering about this. I'd heard that it was possible to JIT on some AMD GPUs via OpenCL. Now, if this were to work syntactically as it does in C++, I would just write something like:
uint jitCode[MaxProgramSize];
ulong arguments[ArgumentsSize];
//fill jitCode with GCN bytecode, load up arguments
...
//Run the bytecode
void (*executeProgram)(ulong*);
executeProgram = (void (*)(ulong*))jitCode;
executeProgram(arguments);
Of course, something like that gives me error -11 (CL_BUILD_PROGRAM_FAILURE).
Can it be done, and if so, what would be the proper way to do it?
...second follow-up, if it can be done: what are the calling conventions like in OpenCL?
Is it possible to translate OpenCL-style SPIR-V to Vulkan-style SPIR-V?
I know that it is possible to use clspv to compile OpenCL C to Vulkan-style SPIR-V, but I haven't seen any indication that it also supports ingesting OpenCL-style SPIR-V.
Thank you for any suggestions if you know how to achieve this :)
I know that it is possible to use clspv to compile OpenCL C to Vulkan-style SPIR-V, but I haven't seen any indication that it also supports ingesting OpenCL-style SPIR-V.
clspv compiles OpenCL C to Vulkan-style SPIR-V; it does not ingest OpenCL-style SPIR-V, which follows the OpenCL execution model and the OpenCL memory model. The answer to your question is no (in general). The problem is that GLSL, for example, uses a logical memory model, in which pointers are abstract, so you can't have pointers to pointers; OpenCL allows this because it uses a physical memory model. There are other things in OpenCL that cannot be expressed in GLSL as well. You could try to write such a translator, and it might work for some very simple code, but that's about it.
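For example, this kind of double indirection is representable under OpenCL's physical memory model but has no counterpart in a logical model (a sketch; pointer-to-pointer kernel arguments require OpenCL 2.0, and storing device pointers in buffers is only well-defined with shared virtual memory):

// Pointers are concrete addresses in OpenCL C, so a buffer of pointers
// can itself be dereferenced. GLSL's logical memory model cannot
// express this double indirection.
__kernel void deref(__global int * __global *table, __global int *out)
{
    size_t i = get_global_id(0);
    out[i] = *table[i];
}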
I know that there is no way to use std classes such as string, vector, map or set in a CUDA kernel. However, it's very inconvenient without them. I have to write a lot of code inside CUDA kernels, so I would like to use at least strings and vectors. I'm not talking about something like thrust. I want to be able to write something like this:
__global__ void kernel()
{
    cuda_vector<int> a;
    for (int i = 0; i < 10; i++)
        a.push_back(i);
}

int main()
{
    kernel<<<1, 512>>>();
    return 0;
}
This should create 512 threads, and in each thread I want to create a cuda_vector object and use it like std::vector. I didn't find any solution on the internet, so I started to write my own class. Each member function of this class is defined as both a __host__ and a __device__ function, so that I can use it on both the CPU and the GPU.
Theoretically, it can be implemented, but only on the Fermi architecture, because we need to allocate memory dynamically in device code. I have a GTX 580 and have started to write my own vector. But it's tedious and takes a lot of time. Isn't there any implementation I can use? I can't believe there isn't one. Do so many software developers write CUDA code without it? Has no one tried to write their own version?
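To show what I mean, here is a rough sketch of the kind of class I started writing, using device-side malloc (which is exactly why it needs Fermi, i.e. compute capability 2.0+); it is illustrative only:

// Minimal per-thread growable vector using device-side malloc/free.
// No error handling beyond a NULL check, and the device heap is small.
template <typename T>
struct cuda_vector {
    T *data;
    int size, capacity;

    __device__ cuda_vector() : data(NULL), size(0), capacity(0) {}
    __device__ ~cuda_vector() { if (data) free(data); }

    __device__ void push_back(const T &v) {
        if (size == capacity) {                  // grow geometrically
            int newcap = capacity ? 2 * capacity : 4;
            T *p = (T *)malloc(newcap * sizeof(T));
            if (p == NULL) return;               // allocation failed: drop value
            for (int i = 0; i < size; ++i) p[i] = data[i];
            if (data) free(data);
            data = p;
            capacity = newcap;
        }
        data[size++] = v;
    }
};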
The reason you don't find something like std::vector for CUDA is performance. A traditional vector object doesn't fit the CUDA model well. If you plan on using only 512 threads, and each one manages a std::vector-like object, your performance is going to be worse than running the same code on the CPU.
GPU threads are not like CPU threads; they should be as light as possible. Use thread blocks and shared memory to have the threads cooperate. If you are manipulating a string, each thread should work on one character; if you are using vectors on the CPU, pass an array of their contents to the GPU and have each thread work on one element. Basically, think about how to solve the problem with the CUDA programming model, as opposed to solving it with a CPU approach and then translating it to CUDA.
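To make that concrete, here is a minimal sketch of the data-parallel style described above: the host flattens the data into a plain array, and each thread handles exactly one element (the kernel name and sizes are illustrative):

#include <cstdio>

// Each thread processes one element of a flat array, instead of each
// thread managing its own container.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main()
{
    const int n = 512;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    increment<<<(n + 255) / 256, 256>>>(dev, n);

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[511] = %d\n", host[511]);
    return 0;
}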
I've not used it, but the CuPP framework may be of interest to you, especially the vector<T> implementation. Looks like it could do what you need it to do.
The OpenCL language, which extends C99, does not provide the memcpy function. What should be used instead?
As far as I know, there is nothing like that defined in OpenCL. OpenCL has no concept of dynamic memory, so such functionality is not needed.
You can simply loop over your array with for and copy the data element by element. Note that the target array is of fixed size, since array lengths must be specified at compile time.
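A minimal sketch of such an element-wise copy in OpenCL C (the kernel name, sizes, and buffer names are illustrative):

// OpenCL C 1.x has no memcpy, so data is copied element by element
// into a private array whose size is fixed at compile time.
__kernel void copy_example(__global const float *src, __global float *dst, int n)
{
    float tmp[16];                      // length must be a compile-time constant
    int count = n < 16 ? n : 16;

    for (int i = 0; i < count; ++i)     // element-by-element copy in
        tmp[i] = src[i];

    for (int i = 0; i < count; ++i)     // and back out
        dst[i] = tmp[i];
}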
On the other hand, OpenCL (and OpenGL, as a kind of ancestor) was defined in a more static way. The data is provided to the GPU and the result size is defined up front; the GPU computes from the input into the pre-defined output location. The model is not meant to spawn further work on the GPU, nor to allocate memory dynamically, so that the device does not interfere with the host, which manages allocation.
I'm fairly new to C so be gentle.
I want to use the library interception method on Linux to replace calls to the OpenCL library with my own library. I understand that this can be done using LD_PRELOAD. So I can just re-implement the OpenCL functions declared in the OpenCL header file within my own library, which is then preloaded in place of the real one.
The problem is that this OpenCL header also contains some extern struct definitions, e.g.
typedef struct _cl_mem * cl_mem;
which are not defined within the OpenCL header. Is it possible these structs are defined within the OpenCL shared lib? If not, where might they be defined?
Cheers
Chris
That typedef declares cl_mem as a pointer to a struct whose contents are never declared. This means code using it can't do things like checking the struct's size, copying it, or inspecting its contents - it simply has no idea how big it is.
This is a traditional technique in C to create an opaque, or private, type. You can declare the struct inside your OpenCL library, and the official header puts no restrictions on what that struct contains. It could even be empty, if all you need is an ID you can store in the pointer itself, though this is rarely done.
An example of the same technique used in the standard C library is the FILE type. It might be as simple as an integer file descriptor, or as complex as a struct containing the entire filesystem state; standard C code won't know. The particulars are known to the library only.
In short, you can define that struct however you like, as long as you implement every function that handles it. The program that links against your library never handles the struct itself, only pointers to it.
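A minimal sketch of the pattern for the interception case (the struct layout and the make_mem helper are entirely hypothetical; only your library ever sees the definition):

/* Public header, as in cl.h: opaque handle, layout unknown to clients. */
typedef struct _cl_mem *cl_mem;

/* Inside your replacement library only. */
#include <stdlib.h>

struct _cl_mem {            /* any layout you like */
    size_t size;
    void  *host_copy;
    int    refcount;
};

/* Entry points in your library create and hand out opaque handles;
   callers only ever hold the pointer. */
cl_mem make_mem(size_t size)
{
    struct _cl_mem *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->size = size;
    m->host_copy = malloc(size);
    m->refcount = 1;
    return m;
}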