I want to run an FFT on the GPU, and I am using the ArrayFire library for that. Whatever we write inside an OpenCL kernel executes on the GPU (the specified device). Can we call the FFT function from inside an OpenCL kernel?
ArrayFire is a high level library that allows users to compute on GPUs without having to write kernels. ArrayFire provides a high level API for this purpose.
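It is NOT possible to call an ArrayFire function from inside a kernel. ArrayFire functions are host-side calls that launch device work themselves, and kernel code can only call functions that are compiled into the device program.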
Related
CUDA MPS allows you to run multiple processes in parallel on the GPU, so the GPU stays fully utilized even when individual operations don't take full advantage of it on their own. Is there an equivalent for OpenCL? Or is there a different approach in OpenCL?
If you use multiple OpenCL command queues that don't have event interdependencies, an OpenCL runtime could keep the GPU cores busy with varied work from each queue. It's really up to the implementation as to whether this actually happens. You'd need to check each vendor's OpenCL guide to see if they support concurrent GPU kernels.
I've been through several resources: the OpenCL Khronos book, GATech tutorial, NYU tutorial, and I could go through more. But I still don't understand fully. What is the difference between a kernel and a program object?
So far the best explanation I have found is the following, but it is not enough for me to fully understand:
PROGRAM OBJECT: A program object encapsulates some source code (with potentially several kernel functions) and its last successful build.
KERNEL: A kernel object encapsulates the values of the kernel’s arguments used when the kernel is executed.
Maybe a program object is the code? And the kernel is the compiled executable? Is that it? Because I could understand something like that.
Thanks in advance!
A program is a collection of one or more kernels plus optionally supporting functions. A program could be created from source or from several types of binaries (e.g. SPIR, SPIR-V, native). Some program objects (created from source or from intermediate binaries) need to be built for one or more devices (with clBuildProgram or clCompileProgram and clLinkProgram) prior to selecting kernels from them. The easiest way to think about programs is that they are like DLLs and export kernels for use by the programmer.
A kernel is an executable entity (not necessarily compiled, since there are built-in kernels that represent a piece of hardware, e.g. the Video Motion Estimation kernels on Intel hardware). You can bind its arguments and submit it to various queues for execution.
For an OpenCL context, we can create multiple Program objects. First, I will describe the uses of program objects in the OpenCL application.
To facilitate the compilation of the kernels for the devices to which the program is attached
To provide facilities for determining build errors and querying the program for information
An OpenCL application uses kernel objects to execute a function in parallel on the device. Kernel objects are created from program objects, and one program object can yield multiple kernel objects.
As we know, to execute a kernel we need to pass arguments to it. Holding those arguments is the primary purpose of kernel objects.
To make this clearer, here is an analogy given in the book "OpenCL Programming Guide" by Aaftab Munshi et al.:
An analogy that may be helpful in understanding the distinction between kernel objects and program objects is that the program object is like a dynamic library in that it holds a collection of kernel functions. The kernel object is like a handle to a function within the dynamic library. The program object is created from either source code (OpenCL C) or a compiled program binary (more on this later). The program gets built for any of the devices to which the program object is attached. The kernel object is then used to access properties of the compiled kernel function, enqueue calls to it, and set its arguments.
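The dynamic-library analogy can be sketched as a toy model in standard C++, so it runs without an OpenCL runtime. This is only an illustration of the relationship, not the OpenCL API: `ProgramModel`, `KernelModel`, `make_demo_program`, and `demo` are hypothetical names, and the comments name the real OpenCL calls each step loosely corresponds to.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and not part of OpenCL.
using KernelFn = std::function<int(const std::vector<int>&)>;

// Plays the role of a built program object: a "library" that exports named
// kernel functions (compare clCreateProgramWithSource + clBuildProgram).
struct ProgramModel {
    std::map<std::string, KernelFn> exported;
};

// Plays the role of a kernel object: a handle to one exported function
// plus the argument values currently bound to it.
class KernelModel {
public:
    // Compare clCreateKernel(program, "name", ...).
    KernelModel(const ProgramModel& program, const std::string& name) {
        auto it = program.exported.find(name);
        if (it == program.exported.end())
            throw std::invalid_argument("no such kernel: " + name);
        fn_ = it->second;
    }

    // Compare clSetKernelArg(kernel, index, ...).
    void set_arg(std::size_t index, int value) {
        if (args_.size() <= index) args_.resize(index + 1, 0);
        args_[index] = value;
    }

    // Compare clEnqueueNDRangeKernel: runs with the currently bound arguments.
    int run() const { return fn_(args_); }

private:
    KernelFn fn_;
    std::vector<int> args_;
};

// One program exporting two kernels, as a single source file might.
ProgramModel make_demo_program() {
    ProgramModel p;
    p.exported["add"] = [](const std::vector<int>& a) { return a.at(0) + a.at(1); };
    p.exported["mul"] = [](const std::vector<int>& a) { return a.at(0) * a.at(1); };
    return p;
}

// Two kernel objects created from the same program, each with its own arguments.
void demo() {
    ProgramModel program = make_demo_program();
    KernelModel add(program, "add");
    KernelModel mul(program, "mul");
    add.set_arg(0, 2); add.set_arg(1, 3);
    mul.set_arg(0, 2); mul.set_arg(1, 3);
    assert(add.run() == 5);
    assert(mul.run() == 6);
}
```

In this model the program is the thing you build once, and each kernel is a lightweight handle that carries its own argument state, which is why two kernels from one program do not interfere with each other.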
Suppose you create two threads and have both of them enter a loop in which each starts the same kernel, which uses the same OpenCL memory object (a Buffer in cl.hpp in my case). Will it work properly? Does OpenCL allow different kernels to run at the same time with the same memory object?
(I am using the OpenCL C++ wrapper cl.hpp and Beignet, Intel's open-source OpenCL library.)
If both threads are using the same in-order command queue, it will work just fine; it just becomes a race as to which thread enqueues their work first. From the OpenCL runtime point of view, it's just commands in a queue.
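OpenCL 1.1 (and newer) is thread-safe except for clSetKernelArg on a given kernel object. Since clEnqueueNDRangeKernel uses whatever arguments are set at the time of the call, you'll need to lock around the clSetKernelArg and clEnqueueNDRangeKernel pair when threads share a kernel.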
If, however, your threads are using two different command queues, then you shouldn't use the same memory object without using OpenCL event objects to synchronize, unless it is read-only; that should be fine.
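A minimal sketch of the locking pattern for a kernel shared between two host threads, modeled with standard C++ only so that it runs without an OpenCL runtime. `SharedKernelModel`, `run_two_threads`, and `count_with_arg` are hypothetical names for illustration; in real code the two calls made under the lock would be clSetKernelArg and clEnqueueNDRangeKernel on the shared kernel object.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only: a hypothetical stand-in for one kernel object shared by
// several host threads, together with the in-order queue it is enqueued on.
struct SharedKernelModel {
    int arg = 0;             // argument state written by set_arg
    std::vector<int> queue;  // argument value captured by each enqueue

    void set_arg(int value) { arg = value; }   // compare clSetKernelArg
    void enqueue() { queue.push_back(arg); }   // compare clEnqueueNDRangeKernel
};

// Two threads each launch the shared kernel `launches` times, passing their
// own id (1 or 2) as the argument. The mutex is held across set_arg + enqueue
// so no other thread can change the argument between the two calls.
// Returns the captured argument values in queue order.
std::vector<int> run_two_threads(int launches) {
    assert(launches >= 0);
    SharedKernelModel kernel;
    std::mutex kernel_mutex;

    auto worker = [&](int id) {
        for (int i = 0; i < launches; ++i) {
            std::lock_guard<std::mutex> lock(kernel_mutex);
            kernel.set_arg(id);
            kernel.enqueue();
        }
    };

    std::thread a(worker, 1);
    std::thread b(worker, 2);
    a.join();
    b.join();
    return kernel.queue;
}

// Number of launches that were enqueued with the given argument value.
long count_with_arg(const std::vector<int>& queue, int value) {
    return static_cast<long>(std::count(queue.begin(), queue.end(), value));
}
```

The order in which the two threads' launches interleave is still a race, as described above, but each launch carries the argument its own thread set. An alternative that avoids the lock entirely is to give each thread its own kernel object created from the same program.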
Concurrent kernels that only read the same OpenCL memory object won't cause any functional issues. If any of them writes to it, it certainly will.
What is the objective of running multiple kernels concurrently? Please check this answer to similar question.
I am going to parallelize the process of Encryption/Decryption by using OpenCL.
For that, I just want to use the existing OpenSSL crypto library functions instead of writing my own implementations of algorithms like AES and DES.
So I am going to call an OpenSSL crypto function from an OpenCL kernel.
Can you please clarify whether this is possible or not?
No, at the kernel level you are restricted to built-in functions and functions you define yourself. This becomes immediately clear (in the case of a GPU) if you see host and device as two separate entities which can only communicate through a command queue and its associated calls.
Little disclaimer: This is more the kind of theoretical / academic question than an actual problem I've got.
The usual way of setting up a parallel program in OpenCL is to write a C/C++ program, which sets up the devices (GPU and/or other CPUs), kernel and data buffers for executing the kernel on the device.
This program gets launched from the host, which used to be a CPU.
Would it be possible to write an OpenCL program where the host is a GPU and the devices are other GPUs and/or CPUs?
What would be the prerequisites for such a scenario?
Does one need a special GPU, or would it be possible to use any OpenCL-capable GPU?
Are you looking for a complete host or just a kernel launcher?
The upcoming CUDA release (v5.0) introduces a feature to launch a kernel from inside a kernel. Therefore, a device can be used for launching a kernel on itself. Maybe this feature will be supported by OpenCL too in the near future.