Using static variables in OpenCL? - opencl

Can we use static variables within an OpenCL kernel?
I tried to use one but got the following error:
ptxas application ptx input, line 11; error : Module-scoped variables in .local state space are not allowed with ABI ptxas fatal : Ptx assembly aborted due to errors
EDIT
I found a page that says it's not supported. So how can we make a variable retain its old value across multiple calls?

In OpenCL 1.2, all program scope variables must be in the __constant address space (see Section 6.5, page 224 of the specification), which means that you cannot have this kind of variable that can be both read and written by multiple kernels. Instead, you need to create a buffer object that you pass in as an argument to each kernel that needs it (it will retain its contents across multiple kernel calls).
In OpenCL 2.0, program scope variables in the __global address space are allowed, so when we have hardware and implementations for OpenCL 2.0 you will be able to do this sort of thing much more easily.
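In the meantime, the buffer-object approach described above can be sketched like this (the kernel and variable names here are illustrative, not from the original question):

```c
/* Sketch (OpenCL 1.2): a buffer object passed as a kernel argument keeps
   its contents between enqueues, so it can emulate a "static" variable.
   The host creates the buffer once with clCreateBuffer and binds it with
   clSetKernelArg to every kernel that needs the persistent state. */
__kernel void count_calls(__global int *state)
{
    if (get_global_id(0) == 0)
        state[0] += 1;   /* value persists across kernel invocations */
}
```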

Related

propagation of contents of argc and argv by MPI runtime

Is it valid for a conformant MPI program to rely on the MPI runtime to start the process for each rank with the same contents of argc and argv? Or is it necessary to e.g. broadcast things from a designated master rank?
Just to be clear, it is only guaranteed that argc/argv are defined after the call to MPI_Init(), even though the processes all exist before the call. This is why MPI_Init() takes pointers to argc and argv, specifically to enable them to be initialised on all processes by the MPI_Init() call.
It is therefore essential that you use:
MPI_Init(&argc, &argv);
and not
MPI_Init(NULL, NULL);
In practice, many MPI implementations make the command-line arguments available before the Init call, but you should not rely on this.
The standard doesn't make it clear whether that is the case, as it tries hard to abstract away the actual process by which the MPI ranks come into existence.
On one side, Section 8.8 Portable MPI Process Startup recommends that a portable process launcher named mpiexec exist (if required at all by the execution environment) and advises that the launcher be viewable as a command-line version of MPI_COMM_SPAWN.
On the other side, MPI_COMM_SPAWN takes among its arguments an array of command-line arguments to be passed on to the spawned processes and those are supposed to be passed on (Section 10.3.2 Starting Processes and Establishing Communication):
Arguments are supplied to the program if this is allowed by the operating system. [...]
But the paragraph following the cited one is:
If a Fortran implementation supplies routines that allow a program to obtain its arguments, the arguments may be available through that mechanism. In C, if the operating system does not support arguments appearing in argv of main(), the MPI implementation may add the arguments to the argv that is passed to MPI_INIT. (emphasis mine)
I would therefore read this as: MPI implementations are advised to make their best to provide all ranks with the command-line arguments of the mpiexec command, but no absolute guarantee is given.
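A fully portable program therefore combines both pieces of advice above: pass the real &argc/&argv to MPI_Init(), and broadcast anything derived from the arguments from a designated rank. A minimal sketch (the use of atoi and the variable n are illustrative):

```c
/* Sketch: portable handling of command-line arguments in MPI. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);   /* not MPI_Init(NULL, NULL) */

    /* argv[1] may only be guaranteed valid on rank 0, so
       broadcast the derived value to all ranks. */
    int n = (argc > 1) ? atoi(argv[1]) : 0;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```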

What is the difference between kernel and program object?

I've been through several resources: the OpenCL Khronos book, GATech tutorial, NYU tutorial, and I could go through more. But I still don't understand fully. What is the difference between a kernel and a program object?
So far the best explanation is this for me, but this is not enough for me to fully understand:
PROGRAM OBJECT: A program object encapsulates some source code (with potentially several kernel functions) and its last successful build.
KERNEL: A kernel object encapsulates the values of the kernel's arguments used when the kernel is executed.
Maybe a program object is the code? And the kernel is the compiled executable? Is that it? Because I could understand something like that.
Thanks in advance!
A program is a collection of one or more kernels plus optionally supporting functions. A program could be created from source or from several types of binaries (e.g. SPIR, SPIR-V, native). Some program objects (created from source or from intermediate binaries) need to be built for one or more devices (with clBuildProgram or clCompileProgram and clLinkProgram) prior to selecting kernels from them. The easiest way to think about programs is that they are like DLLs and export kernels for use by the programmer.
A kernel is an executable entity (not necessarily compiled, since you can have built-in kernels that represent a piece of hardware, e.g. Video Motion Estimation kernels on Intel hardware). You can bind its arguments and submit it to various queues for execution.
For an OpenCL context, we can create multiple program objects. First, I will describe the uses of program objects in an OpenCL application:
To facilitate the compilation of the kernels for the devices to which the program is attached
To provide facilities for determining build errors and querying the program for information
An OpenCL application uses kernel objects to execute a function in parallel on the device. Kernel objects are created from program objects, and a program object can have multiple kernel objects.
As we know, to execute a kernel we need to pass arguments to it; this is the primary purpose of kernel objects.
To make this clearer, here is an analogy given in the book "OpenCL Programming Guide" by Aaftab Munshi et al.:
An analogy that may be helpful in understanding the distinction between kernel objects and program objects is that the program object is like a dynamic library in that it holds a collection of kernel functions. The kernel object is like a handle to a function within the dynamic library. The program object is created from either source code (OpenCL C) or a compiled program binary (more on this later). The program gets built for any of the devices to which the program object is attached. The kernel object is then used to access properties of the compiled kernel function, enqueue calls to it, and set its arguments.
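The program/kernel relationship described above maps onto a short host-side sequence. A minimal sketch using the C API (error checking omitted; the kernel name "vadd" and the variable names are illustrative):

```c
/* Sketch: program object = "the DLL", kernel object = "a handle to one
   function inside it". */
cl_program program = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
clBuildProgram(program, 1, &device, NULL, NULL, NULL);    /* build the "DLL"  */
cl_kernel kernel = clCreateKernel(program, "vadd", &err); /* get the "handle" */
clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);          /* bind arguments   */
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsz, NULL, 0, NULL, NULL);
```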

Call OpenCL CPU Kernel via function pointer

I want to use OpenCL as a simple C runtime JIT on the CPU. Because the kernels are ASCII, I can modify them at runtime and compile/execute the code. This part is straightforward enough.
However, I'd like to have function pointer access to the resulting compiled kernel, so that it can be called conventionally from C code, rather than having to access the kernel through the OpenCL API.
Obviously this only works on the CPU where the memory is shared.
It seems this should be possible, any thoughts?
No, it can't be done. You need to use clEnqueueTask. If you were somehow able to get the address of the CPU kernel and reverse engineer the parameters passed, it would be subject to change with a driver update.
If you need runtime compilation look at linking to LLVM or similar.
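For reference, the supported invocation path goes through the runtime rather than a function pointer. A minimal sketch (kernel/queue names are illustrative):

```c
/* Sketch: the compiled kernel is always invoked via the OpenCL API. */
clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
clEnqueueTask(queue, kernel, 0, NULL, NULL);   /* run as a single work-item */
clFinish(queue);                               /* wait for completion       */
```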

Can I use external OpenCl libraries?

I want to use some external libraries (http://trac.osgeo.org/geos/) to perform some analytical tasks on Geometry objects (GIS). I want to perform these tasks using OpenCL on CUDA so that I can use the parallel power of the GPU to run them in parallel on large sets of data. So my question is:
Can I write kernel using these libraries?
Also, how can I pass objects of these libraries' complex data structures as arguments to the kernel? (Specifically, how can I create a buffer of these complex objects?)
An OpenCL program mostly consists of two parts
Host code - This is regular C/C++ code that calls functions in the OpenCL runtime and works just like any other code. This code needs to interface with any third-party libraries that may provide your program with (complex) data. It will also need to translate these complex data types to a set of simple data types (scalar, vector, other) that can be processed by piece 2.
Kernel code - This part consists of a restricted kernel language (based on C99) that a compiler converts from a text/binary representation to object code that can run on the target platform. This language and compiler have many restrictions, including the fact that you cannot include/link external libraries (it may be possible with a native kernel that is runnable on the host CPU).
It is up to your host code to compile/set up the kernel, fetch/set up the data from any library/source, translate it into the appropriate scalar, vector, or other data types permissible in an OpenCL kernel, run the kernel(s) that process the data, get the results back from the compute device to the host (if necessary), and then translate those simple data types back into whatever form the rest of the code requires.
So no - you cannot directly use a regular C++ library from inside the kernel. But you can do whatever you want to in the host code.
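The host-side "translation" step described above can be sketched as follows. The struct layout and the extract_points helper are hypothetical, not part of GEOS or OpenCL:

```c
/* Sketch: a complex host-side object (e.g. a GEOS geometry) must be
   flattened into plain-old-data arrays before crossing into a kernel. */
typedef struct { double x, y; } point_t;       /* POD only, no pointers */

/* host: copy coordinates out of the library object into a flat array */
size_t npts;
point_t *pts = extract_points(geom, &npts);    /* hypothetical helper   */
cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                            npts * sizeof(point_t), pts, &err);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
```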
No, you can't use external libraries in OpenCL kernels. Remember, any kernel must be compiled when the OpenCL application runs, because the application can't know beforehand what platform it will run on.

Repeated calling of enqueueNDRangeKernel in OpenCL

What other OpenCL functions should be called when enqueueNDRangeKernel is called repeatedly?
I have not been able to find a tutorial that shows the use of enqueueNDRangeKernel in this fashion and my coding attempts have unfortunately resulted in an unhandled exception error. A similar question has been asked before but the responses don't seem to apply to my situation.
I currently have a loop in which I call the OpenCL functions in the following sequence:
setArg
enqueueNDRangeKernel
enqueueMapBuffer
enqueueUnmapMemObject
I am calling setArg because the input to the kernel changes before each call to enqueueNDRangeKernel. I am calling enqueueMapBuffer and enqueueUnmapMemObject since the output from the kernel is used in the host code. The kernel runs ok the first time (the output is correct) but during the second pass through the loop I get an unhandled exception error when calling enqueueMapBuffer.
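The loop described above corresponds to roughly the following host-side sketch (C API rather than the C++ bindings; error checks omitted and the buffer/size names are illustrative):

```c
/* Sketch of the per-iteration sequence: setArg, enqueueNDRangeKernel,
   enqueueMapBuffer, enqueueUnmapMemObject. */
for (int i = 0; i < iterations; ++i) {
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &inbuf);       /* new input */
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsz, NULL,
                           0, NULL, NULL);
    void *out = clEnqueueMapBuffer(queue, outbuf, CL_TRUE, CL_MAP_READ,
                                   0, bytes, 0, NULL, NULL, &err);
    /* ... use out on the host ... */
    clEnqueueUnmapMemObject(queue, outbuf, out, 0, NULL, NULL);
}
```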
I am using the following set-up:
Intel OpenCL SDK with CL_DEVICE_TYPE_CPU (on an Intel i7 CPU)
Visual Studio 2010 IDE on Windows 7
Host Code is written in C++ with the OpenCL C++ bindings.
Thanks.
Problem Solved ... It turns out that I was using the correct sequence of OpenCL function calls. There was a problem in my kernel that only showed up after the first iteration of the loop.
I am trying to make the same thing as you, but I am stuck at one point. I managed to make an OpenCL program and kernel, both working, but when I try to loop it several times it only works when I loop the whole code, from creating and assigning the device to deallocating all mem_...