Can I use external OpenCL libraries? - opencl

I want to use some external libraries (http://trac.osgeo.org/geos/) to perform analytical tasks on Geometry objects (GIS). I want to perform these tasks using OpenCL on CUDA so that I can use the parallel power of the GPU to run them on large sets of data. So my questions are:
Can I write a kernel using these libraries?
Also, how can I pass objects of these libraries' complex data structures as arguments to the kernel? (Specifically, how can I create a buffer of these complex objects?)

An OpenCL program mostly consists of two parts:
Host code - This is regular C/C++ code that calls functions in the OpenCL runtime and works just like any other code. This code needs to interface with any third-party libraries that may provide your program with (complex) data. It will also need to translate these complex data types into a set of simple data types (scalar, vector, other) that can be processed by the kernel code (part 2).
Kernel code - This consists of a compiler that can convert a text/binary representation of a restricted kernel language (based on C99) into object code that can run on the target platform. This language and compiler have many restrictions, including the fact that you cannot include/link in external libraries (it may be possible with a native kernel that is runnable on the host CPU).
It is up to your host code to compile/set up the kernel; fetch the data from any library/source and translate it into the appropriate scalar, vector, or other data types permissible in an OpenCL kernel; run the kernel(s) that process the data; get the results back from the compute device to the host (if necessary); and then translate those simple data types back into whatever form is required for consumption by the rest of the code.
So no - you cannot directly use a regular C++ library from inside the kernel. But you can do whatever you want to in the host code.

No, you can't use external libraries in OpenCL kernels. Remember, any kernel must be compiled when the OpenCL application runs, because the application can't know beforehand what platform it will be running on.

Related

Is it possible for OpenCL to cache some data while running between kernels?

I currently have a problem scenario where I'm doing graph computation tasks, and I always need to update my vertex data on the host side, iterating through the computations to get the results. But throughout this process, the edge data is unchanged. I want to know whether, as I repeatedly write data, run the kernel, and read the data with OpenCL, some unchanged data can be kept on the device side to reduce communication costs. By the way, I am currently only able to run OpenCL at version 1.2.
Question 1:
Is it possible for OpenCL to cache some data while running between kernels?
Yes, it is possible in OpenCL programming model. Please check Buffer Objects, Image Objects and Pipes in OpenCL official documentation. Buffer objects can be manipulated by the host using OpenCL API calls.
Also the following OpenCL StackOverflow posts will further clarify your concept regarding caching in OpenCL:
OpenCL execution strategy for tree like dependency graph
OpenCL Buffer caching behaviour
Memory transfer between host and device in OpenCL?
You should also look into caching techniques like double buffering in OpenCL.
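Concretely, a cl_mem buffer stays resident on the device for the lifetime of the context, so the unchanged edge data can be uploaded once and only the vertex data rewritten each iteration. A minimal host-side sketch of this pattern (error checking omitted; the context, queue, and kernel are assumed to be already created, and the kernel's argument order is an assumption for illustration):

```c
#include <CL/cl.h>

/* Sketch: edge data is uploaded once; only vertex data crosses the bus
 * each iteration. ctx, queue, kernel are assumed to exist already. */
void iterate(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
             const float *edges, size_t edge_bytes,
             float *verts, size_t vert_bytes,
             size_t nverts, int iterations) {
    /* Uploaded once; stays on the device until released. */
    cl_mem edge_buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                     edge_bytes, (void *)edges, NULL);
    cl_mem vert_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                     vert_bytes, NULL, NULL);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &edge_buf);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &vert_buf);

    for (int i = 0; i < iterations; i++) {
        /* Only the changing vertex data is written and read back. */
        clEnqueueWriteBuffer(queue, vert_buf, CL_TRUE, 0, vert_bytes,
                             verts, 0, NULL, NULL);
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &nverts, NULL,
                               0, NULL, NULL);
        clEnqueueReadBuffer(queue, vert_buf, CL_TRUE, 0, vert_bytes,
                            verts, 0, NULL, NULL);
        /* ...host-side update of verts happens here... */
    }
    clReleaseMemObject(edge_buf);
    clReleaseMemObject(vert_buf);
}
```

All of these calls are OpenCL 1.2 API, so the pattern works within the version constraint mentioned in the question.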
Question 2:
I want to know if there is a way that I can use OpenCL to repeatedly write data, run the kernel, and read the data, some unchanged data can be saved on the device side to reduce communication costs
Yes, it is possible. You can do it through either batch processing or data tiling. Because of the overhead associated with each transfer, batching many small transfers into one larger transfer performs significantly better than making each transfer separately. There are many examples of batching and data tiling. One is this:
OpenCL Kernel implementing im2col with batch
Miscellaneous:
If it is possible, please use the latest version of OpenCL. Version 1.2 is old.
Since you have not mentioned your target hardware: the programming model can differ between hardware accelerators such as FPGAs and GPUs.

How does Open MPI implement datatype conversion?

The MPI standard states that when parallel programs run in a heterogeneous environment, they may have different representations of the same datatype (like big-endian and little-endian machines for integers), so datatype representation conversion might be needed when doing point-to-point communication. I don't know how Open MPI implements this.
For instance, current Open MPI uses the UCX library by default. I have studied some code of the UCX library and Open MPI's ucx module. However, for contiguous datatypes like MPI_INT, I didn't find any representation conversion happening. I wonder: did I miss that part, or does the implementation not satisfy the standard?
If you want to run an Open MPI app on a heterogeneous cluster, you have to configure with --enable-heterogeneous (this is disabled by default). Keep in mind this is supposed to work, but it is lightly tested, mainly because of a lack of interest/real use cases. FWIW, IBM Power is now little endian, and Fujitsu is moving from Sparc to ARM for HPC, so virtually all HPC processors are (or will soon be) little endian.
Open MPI uses convertors (see opal/datatype/opal_convertor.h) to pack the data before sending it, and unpack it once received.
The data is packed in its current endianness. Data conversion (e.g. swap bytes) is performed by the receiver if the sender has a different endianness.
There are two ways of using UCX: pml/ucx and pml/ob1+btl/ucx, and I have tested neither of them in a heterogeneous environment. If you are facing issues with pml/ucx, try mpirun --mca pml ob1 ....

What is the difference between kernel and program object?

I've been through several resources: the OpenCL Khronos book, GATech tutorial, NYU tutorial, and I could go through more. But I still don't understand fully. What is the difference between a kernel and a program object?
So far the best explanation is this for me, but this is not enough for me to fully understand:
PROGRAM OBJECT: A program object encapsulates some source code (with potentially several kernel functions) and its last successful build.
KERNEL: A kernel object encapsulates the values of the kernel’s arguments used when the kernel is executed.
Maybe a program object is the code? And the kernel is the compiled executable? Is that it? Because I could understand something like that.
Thanks in advance!
A program is a collection of one or more kernels plus optionally supporting functions. A program could be created from source or from several types of binaries (e.g. SPIR, SPIR-V, native). Some program objects (created from source or from intermediate binaries) need to be built for one or more devices (with clBuildProgram or clCompileProgram and clLinkProgram) prior to selecting kernels from them. The easiest way to think about programs is that they are like DLLs and export kernels for use by the programmer.
A kernel is an executable entity (not necessarily compiled, since you can have built-in kernels that represent a piece of hardware, e.g. Video Motion Estimation kernels on Intel hardware); you can bind its arguments and submit it to various queues for execution.
For an OpenCL context, we can create multiple program objects. First, the uses of program objects in an OpenCL application:
To facilitate the compilation of the kernels for the devices to which the program is attached
To provide facilities for determining build errors and querying the program for information
An OpenCL application uses kernel objects to execute a function in parallel on the device. Kernel objects are created from program objects, and a program object can have multiple kernel objects.
As we know, to execute a kernel we need to pass arguments to it. This is the primary purpose of kernel objects.
To make this clearer, here is an analogy given in the book "OpenCL Programming Guide" by Aaftab Munshi et al.:
An analogy that may be helpful in understanding the distinction between kernel objects and program objects is that the program object is like a dynamic library in that it holds a collection of kernel functions. The kernel object is like a handle to a function within the dynamic library. The program object is created from either source code (OpenCL C) or a compiled program binary (more on this later). The program gets built for any of the devices to which the program object is attached. The kernel object is then used to access properties of the compiled kernel function, enqueue calls to it, and set its arguments.
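The DLL/handle analogy maps directly onto the API calls. A hedged host-side sketch (error checking omitted; `ctx` and `device` are assumed to already exist, and the two kernels are illustrative):

```c
#include <CL/cl.h>

/* One program (the "DLL") can export several kernels (the "functions"). */
static const char *src =
    "__kernel void add(__global float *a) { a[get_global_id(0)] += 1.0f; }\n"
    "__kernel void mul(__global float *a) { a[get_global_id(0)] *= 2.0f; }\n";

void build_example(cl_context ctx, cl_device_id device) {
    /* Program object: holds the source and, after building, the binaries
     * for each attached device. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, "", NULL, NULL);

    /* Kernel objects: handles to individual entry points in the program,
     * each carrying its own argument bindings. */
    cl_kernel add_k = clCreateKernel(prog, "add", NULL);
    cl_kernel mul_k = clCreateKernel(prog, "mul", NULL);

    /* ...clSetKernelArg / clEnqueueNDRangeKernel would go here... */

    clReleaseKernel(add_k);
    clReleaseKernel(mul_k);
    clReleaseProgram(prog);
}
```

So: one clBuildProgram per program object (the "code"), one clCreateKernel per entry point you want a callable handle to.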

Call OpenCL CPU Kernel via function pointer

I want to use OpenCL as a simple C runtime JIT on the CPU. Because the kernels are ASCII, I can modify them at runtime and compile/execute the code. This part is straightforward enough.
However, I'd like to have function-pointer access to the resulting compiled kernel, so that it can be called conventionally from C code, rather than having to access the kernel through the OpenCL API.
Obviously this only works on the CPU where the memory is shared.
It seems this should be possible, any thoughts?
No, it can't be done. You need to use clEnqueueTask. If you were somehow able to get the address of the CPU kernel and reverse engineer the parameters passed, it would be subject to change with a driver update.
If you need runtime compilation look at linking to LLVM or similar.

Can I call C library functions from OpenCL kernel?

I am going to parallelize the process of encryption/decryption using OpenCL.
For that, I just want to use existing OpenSSL crypto library functions instead of implementing algorithms like AES and DES myself.
So I am going to call an OpenSSL crypto function from an OpenCL kernel.
Can you please clarify whether this is possible or not?
No, you are restricted to built-in functions and functions defined by yourself at the kernel level. This becomes immediately clear (in the case of a GPU) if you see host and device as two separate entities which can only communicate through a command queue and its associated calls.
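What is allowed is defining your own helper functions inside the kernel source itself. A sketch in OpenCL C, using a toy XOR round as a stand-in (this is deliberately NOT a real cipher and not an OpenSSL routine):

```c
/* OpenCL C kernel source: helper functions must be defined right here;
 * you cannot #include or link host libraries such as OpenSSL. */
uchar toy_round(uchar byte, uchar key) {  /* self-defined helper */
    return byte ^ key;                    /* toy XOR, NOT real crypto */
}

__kernel void encrypt(__global uchar *data, uchar key) {
    size_t i = get_global_id(0);
    data[i] = toy_round(data[i], key);
}
```

In practice this means porting the algorithm (e.g. the AES round functions) into OpenCL C yourself, or finding an existing OpenCL implementation, rather than calling into OpenSSL.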
