I am going to parallelize encryption/decryption using OpenCL.
For that, I want to use the existing OpenSSL crypto library functions instead of implementing algorithms such as AES and DES myself.
So I am going to call an OpenSSL crypto function from an OpenCL kernel.
Can you please clarify whether this is possible or not?
No. Inside a kernel you are restricted to built-in functions and functions you define yourself in the kernel source. This becomes immediately clear (in the case of a GPU) if you see the host and the device as two separate entities that can only communicate through a command queue and its associated calls.
I have a collection of thousands of SYCL kernels to execute. Once each of these kernels has finished, I need to execute a function on a cl::sycl::buffer written to by said kernel.
The methods I'm aware of for achieving this are:
by using RAII; the requisite global memory is copied back to the host upon destruction of the cl::sycl::buffer
by constructing a host cl::sycl::accessor (with cl::sycl::access::target::host_buffer)
Both of these methods are synchronous and blocking. Is it possible to instead attach an asynchronous callback/continuation when submitting kernels to a cl::sycl::queue that executes as soon as the kernel has finished? Or even better, can the same functionality be achieved with C++2a coroutines? If not, is such a feature planned for SYCL?
The feature to attach callbacks or execute on the host from a SYCL queue did not make the cut for SYCL 1.2.1.
There are some proposals being discussed at the moment to bring that feature into the next version of the standard, but everything is still internal to the SYCL group.
In the meantime, if you use ComputeCpp, you can use the host_handler extension, which allows you to execute a lambda on the host based on dependencies from the device.
As far as I have seen, the open-source compiler doesn't have that feature yet.
My cluster uses MVAPICH2 over InfiniBand FDR, and I am considering the use of RDMA for my simulations. I am aware of the MPI_Put and MPI_Get calls for explicitly invoking RDMA operations; however, I would like to know whether this is the only way to use RDMA within MPI.
My current implementation uses channel semantics (send/receive) for communication, along with MPI_Reduce and MPI_Gatherv. I know that MVAPICH2 has configuration parameters that can be used to enable RDMA. If a program using MPI has send/receive calls and RDMA is enabled, does MPI automatically convert from channel semantics to memory semantics (put/get), or is the explicit use of MPI_Put and MPI_Get the only way to use RDMA in MVAPICH2?
MPI_Send requires a matching MPI_Recv; whether they are blocking or non-blocking doesn't matter, as a send must be met by a receive. RDMA does not have this requirement and instead only implements either MPI_Put (write to remote memory) or MPI_Get (read from remote memory). I am trying to find out whether enabling RDMA while still using sends and receives allows MVAPICH2 to automatically convert them into the appropriate RDMA calls.
If MVAPICH2 has been built with the correct options, it will use RDMA for all MPI operations including MPI_Send and MPI_Recv on supported hardware, which includes InfiniBand. So, you do not need to use MPI_Put/Get to take advantage of RDMA-capable hardware. In fact, using MPI_Send/Recv might be faster because they are often better optimized.
MPI libraries use various designs to translate MPI_Send/Recv operations to RDMA semantics. The details can be found in the literature.
I've been through several resources: the OpenCL Khronos book, the GATech tutorial, the NYU tutorial, and I could go through more. But I still don't fully understand. What is the difference between a kernel and a program object?
So far this is the best explanation I have found, but it is not enough for me to fully understand:
PROGRAM OBJECT: A program object encapsulates some source code (with potentially several kernel functions) and its last successful build.
KERNEL: A kernel object encapsulates the values of the kernel’s arguments used when the kernel is executed.
Maybe a program object is the code? And the kernel is the compiled executable? Is that it? Because I could understand something like that.
Thanks in advance!
A program is a collection of one or more kernels plus optionally supporting functions. A program could be created from source or from several types of binaries (e.g. SPIR, SPIR-V, native). Some program objects (created from source or from intermediate binaries) need to be built for one or more devices (with clBuildProgram or clCompileProgram and clLinkProgram) prior to selecting kernels from them. The easiest way to think about programs is that they are like DLLs and export kernels for use by the programmer.
A kernel is an executable entity (not necessarily compiled, since there are built-in kernels that represent a piece of hardware, e.g. the Video Motion Estimation kernels on Intel hardware). You can bind its arguments and submit it to various queues for execution.
Within an OpenCL context we can create multiple program objects. First, here are the uses of program objects in an OpenCL application:
To facilitate the compilation of the kernels for the devices to which the program is attached
To provide facilities for determining build errors and querying the program for information
An OpenCL application uses kernel objects to execute a function in parallel on the device. Kernel objects are created from program objects, and a program object can have multiple kernel objects.
To execute a kernel we need to pass arguments to it, and holding those arguments is the primary purpose of a kernel object.
To make this clearer, here is an analogy from the book "OpenCL Programming Guide" by Aaftab Munshi et al.:
An analogy that may be helpful in understanding the distinction between kernel objects and program objects is that the program object is like a dynamic library in that it holds a collection of kernel functions. The kernel object is like a handle to a function within the dynamic library. The program object is created from either source code (OpenCL C) or a compiled program binary (more on this later). The program gets built for any of the devices to which the program object is attached. The kernel object is then used to access properties of the compiled kernel function, enqueue calls to it, and set its arguments.
I want to use OpenCL as a simple C runtime JIT on the CPU. Because kernels are plain-text source, I can modify them at runtime and compile/execute the code. This part is straightforward enough.
However, I'd like to have function-pointer access to the resulting compiled kernel, so that it can be called conventionally from C code rather than having to go through the OpenCL API.
Obviously this only works on the CPU where the memory is shared.
It seems this should be possible, any thoughts?
No, it can't be done. You need to use clEnqueueTask. Even if you were somehow able to get the address of the CPU kernel and reverse-engineer how the parameters are passed, that would be subject to change with any driver update.
If you need runtime compilation look at linking to LLVM or similar.
I want to use some external libraries (http://trac.osgeo.org/geos/) to perform analytical tasks on geometry objects (GIS). I want to perform these tasks using OpenCL on CUDA hardware, so that I can use the parallel power of the GPU to process a large set of data. So my questions are:
Can I write kernels using these libraries?
Also, how can I pass objects of the complex data structures of these libraries as arguments to the kernel? Specifically, how can I create a buffer of these complex objects?
An OpenCL program mostly consists of two parts:
1. Host code - This is regular C/C++ code that calls functions in the OpenCL runtime and works just like any other code. This code needs to interface with any third-party libraries that may provide your program with (complex) data. It will also need to translate these complex data types into a set of simple data types (scalar, vector, other) that can be processed by piece 2.
2. Kernel code - This is written in a restricted kernel language (based on C99), which a compiler converts from a text or binary representation into object code that can run on the target platform. This language and compiler have many restrictions, including the fact that you cannot include or link in external libraries (this may be possible with a native kernel that runs on the host CPU).
It is up to your host code to compile and set up the kernel; fetch the data from any library or source; translate it into the scalar, vector or other data types permitted in an OpenCL kernel; run the kernel(s) that process the data; get the results back from the compute device to the host (if necessary); and then translate those simple data types back into whatever form the rest of the code requires.
So no - you cannot directly use a regular C++ library from inside the kernel. But you can do whatever you want to in the host code.
No, you can't use external libraries in OpenCL kernels. Remember, kernels are usually compiled when the OpenCL application runs, because the application can't know beforehand which platform it will be running on.