Handling Multiple OpenCL Versions and Platforms - opencl

Intel recently updated its OpenCL SDK to the 2.0 specification. AMD is still on 1.2, and Nvidia on 1.1. Essentially, this means each GPU platform is now on its own version.
OpenCL does not appear to handle deprecation the way OpenGL does. As far as I know there's no way to request a compatibility version, and Intel's SDK even turns calls to deprecated functions into build errors.
If I want to support every platform using a minimum version (1.1, most likely), what is required of me?

Using only #ifdef statements unfortunately doesn't work if you have more than one platform installed and they support different OpenCL versions. For instance, POCL, which runs on the CPU, supports 2.0, so you need the 2.0 OpenCL headers, but most GPUs and open-source drivers only support OpenCL 1.1 or 1.2.
The best option seems to be to query the OpenCL platform version at runtime and decide which functions to call based on that. Unfortunately it is returned as a char[], so you have to parse it.
Here is an example of how to get the platform info string.
clGetPlatformInfo(platforms[platform_index], CL_PLATFORM_VERSION, INFO_LENGTH, platformInfo, &realSize);
Typically the version info is of the form: "OpenCL 1.2 implementation name"
Here is a little function I wrote to determine a platform's OpenCL version number.
#include <CL/cl.h>
#include <stdlib.h> /* atof */
#include <string.h> /* memcpy */

float diagnoseOpenCLnumber(cl_platform_id platform) {
#define VERSION_LENGTH 64
  char complete_version[VERSION_LENGTH];
  size_t realSize = 0;
  // The version string has the form "OpenCL <major>.<minor> <vendor-specific info>"
  clGetPlatformInfo(platform, CL_PLATFORM_VERSION, VERSION_LENGTH,
                    complete_version, &realSize);
  // Copy the "<major>.<minor>" part, which starts after "OpenCL " (7 characters)
  char version[4];
  version[3] = 0;
  memcpy(version, &complete_version[7], 3);
  float version_float = atof(version);
  return version_float;
}
You can then use it like so, for example with the command queue functions, which changed in 2.0:
float version_float = diagnoseOpenCLnumber(platform_id);
if (version_float >= 2.0f) {
  command_queue =
      clCreateCommandQueueWithProperties(context, device_id, 0, &err);
} else {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  command_queue =
      clCreateCommandQueue(context, device_id, 0, &err);
#pragma GCC diagnostic pop
}

AFAIK deprecated functions do not have to be implemented, hence code should check the OpenCL platform version number and avoid calling deprecated functions on that platform. See this earlier discussion: http://www.khronos.org/message_boards/showthread.php/8514-clCreateImage-2D-3D-vs-the-ICD-loader. At present, calling deprecated OpenCL 1.1 functions on AMD or Intel platforms (OpenCL 1.2) still works, but there are no guarantees that this will remain true in the future or on other platforms. I guess that as soon as supporting those deprecated functions becomes too much hassle for the maintainers of an implementation, they'll be removed.
Admittedly, I'm naughty: I have simply ignored the problem and continued to use the OpenCL 1.1 functions. However, if you are starting a new project (and have the time), then rather wrap the deprecated functions in some sort of generic function that has a path for each version of OpenCL; it's faster to do it now than later, in my opinion.
There is a list of frameworks and libraries at http://www.khronos.org/opencl/resources. Perhaps you will find that one of them solves this problem well enough. If not, and if you have enough time, you could build a framework that hides most of the OpenCL functions from your program. Then, as more functions get deprecated, you will hopefully only need to change your framework, not the programs that use it. At the moment, I don't know of any framework that does this for you in C++.
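For example, such a version-dispatching wrapper for command queue creation could look roughly like this (a minimal sketch only: createCommandQueueCompat is a made-up name and error handling is omitted):
#include <CL/cl.h>
#include <stdio.h>

// Creates a command queue with whichever API call the platform supports.
static cl_command_queue createCommandQueueCompat(cl_context ctx,
                                                 cl_device_id dev,
                                                 cl_platform_id platform,
                                                 cl_int *err)
{
    char ver[64] = {0};
    int major = 1;
    clGetPlatformInfo(platform, CL_PLATFORM_VERSION, sizeof(ver), ver, NULL);
    // The version string looks like "OpenCL <major>.<minor> <vendor specific>"
    sscanf(ver, "OpenCL %d", &major);
#ifdef CL_VERSION_2_0
    if (major >= 2)
        return clCreateCommandQueueWithProperties(ctx, dev, NULL, err);
#endif
    return clCreateCommandQueue(ctx, dev, 0, err); // deprecated in 2.0, still needed for 1.x platforms
}
The rest of the program then only ever calls createCommandQueueCompat, so future deprecations stay localized in one place.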

In the header cl.h you'll find a list of definitions like the following:
...
#define CL_VERSION_1_0 1
#define CL_VERSION_1_1 1
#define CL_VERSION_1_2 1
#define CL_VERSION_2_0 1
...
In my case I had annoying warnings about a deprecated function if I was using OpenCL 2.0 to build. So my quick/dirty solution was to do
#ifdef CL_VERSION_2_0
//call 2.0 Function
#else
//call deprecated Function
#endif
Although this might require several fixes in your code, it's the way to go in my opinion if you want to compile based on whichever OpenCL library is available.
Note that if you are using OpenCL 1.2 you'll also get the definitions for all the previous versions (so, as in the excerpt above, CL_VERSION_1_1 and CL_VERSION_1_0 will be defined as well).
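As a concrete illustration, the command queue creation shown earlier could be guarded at compile time like this (a sketch; make_queue is just an illustrative name):
#include <CL/cl.h>

// Pick the queue-creation call at compile time, depending on which
// OpenCL headers we are building against.
static cl_command_queue make_queue(cl_context context, cl_device_id device_id,
                                   cl_int *err)
{
#ifdef CL_VERSION_2_0
    // 2.0 (or newer) headers: the new entry point is declared.
    return clCreateCommandQueueWithProperties(context, device_id, NULL, err);
#else
    // Pre-2.0 headers: only the old call exists.
    return clCreateCommandQueue(context, device_id, 0, err);
#endif
}
Keep in mind the caveat from the question, though: this selects based on the headers you compile against, not on what the platform actually supports at runtime.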
Hope this helps

Related

Reason to use Qt standard library function wrappers

Is there any reason to use Qt standard function wrappers like qstrncpy instead of strncpy?
I could not find any hint in the documentation, and I'm curious whether there is any functional difference. It seems to make the code dependent on Qt even in places where that isn't necessary.
I found this: Qt wrapper for C libraries
But it doesn't answer my question.
These methods are part of Qt's efforts for platform-independence. Qt tries to hide platform differences and use the best each platform has to offer, replicating that functionality on platforms where it is not available. Here is what the documentation of qstrncpy has to say:
A safe strncpy() function.
Copies at most len bytes from src (stopping at len or the terminating '\0' whichever comes first) into dst and returns a pointer to dst. Guarantees that dst is '\0'-terminated. If src or dst is nullptr, returns nullptr immediately.
[…]
Note: When compiling with Visual C++ compiler version 14.00 (Visual C++ 2005) or later, internally the function strncpy_s will be used.
So qstrncpy is safer than strncpy.
The Qt wrappers for these functions are safer than the standard ones because they guarantee the destination string will always be null-terminated. strncpy() does not guarantee this.
In C11, strncpy_s() and the other _s-suffixed functions were added (in the optional Annex K) as safer string functions. However, they are not part of any C++ standard; they are C-only. The Qt wrappers fill that gap.
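A small example of the difference (a quick sketch; the buffer size is chosen just to force truncation):
#include <QByteArray> // declares qstrncpy
#include <cstdio>
#include <cstring>

int main()
{
    char a[8];
    char b[8];
    std::strncpy(a, "0123456789", sizeof(a)); // a is NOT null-terminated: src is too long
    qstrncpy(b, "0123456789", sizeof(b));     // b is "0123456" plus a terminating '\0'
    std::printf("%s\n", b);                   // safe; printing a would read past the buffer
    return 0;
}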

How do I make an extension of QObject cross-platform?

I'm making a way of getting truly global hotkeys (i.e. emitting a signal on certain inputs even when the app is out of focus).
This will require different code for Windows vs OS X vs X11. In Qt Creator, how should I go about making this suitable for cross-platform development?
Edit: I don't want to know how to do the actual code with X11, Windows, etc. I just want to know how I would do separate definitions for each platform.
I don't want to know how to do the actual code with X11, Windows, etc. I just want to know how I would do separate definitions for each platform.
It is convenient to do with ifdefs based on pre-defined compiler symbols e.g.:
http://sourceforge.net/p/predef/wiki/OperatingSystems/
#ifdef __linux__ // Linux-related; GCC defines this macro
// your code and/or definitions
#endif
#if defined(__linux__) || defined(__APPLE__)
// your code and/or definitions
#endif
You may use OR logic there as well, as many things on Mac resemble Linux and vice versa. Mind the compiler, though, and whether it actually defines that symbol. I would OR together the platform symbols of all applicable compilers.
If you're willing to use Qt, it has OS and compiler defines easily available for you:
#include <QtGlobal>
#if defined(Q_OS_LINUX)
// linux-specifc
#elif defined(Q_OS_WIN32) && defined(Q_CC_MSVC)
// win32 and msvc
#endif
Documentation.
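For the hotkey case, you can keep one class declaration and select the platform-specific definition of each method with those macros; a rough sketch (GlobalHotkey and registerHotkey are made-up names, and the platform branches are only placeholders since the question explicitly excludes the native code):
#include <QtGlobal>
#include <QObject>

class GlobalHotkey : public QObject
{
    Q_OBJECT
signals:
    void activated();
public:
    bool registerHotkey(); // the definition differs per platform
};

bool GlobalHotkey::registerHotkey()
{
#if defined(Q_OS_WIN)
    // Windows: e.g. RegisterHotKey() plus a native event filter
    return false;
#elif defined(Q_OS_MAC)
    // OS X: e.g. a Quartz event tap
    return false;
#elif defined(Q_OS_LINUX)
    // X11: e.g. XGrabKey() on the root window
    return false;
#else
    return false;
#endif
}
Another common approach is to split each platform's implementation into its own .cpp file and let qmake pick it with scopes such as win32 { SOURCES += ... }, macx { ... } and unix { ... }.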

Is there a general binary intermediate representation for OpenCL kernel programming?

As I understand it, OpenCL uses a modified C language (with added keywords like __global) for defining kernel functions. I am now writing a front-end in F#, which has a code quotation feature for metaprogramming (you can think of it as a kind of reflection). So I would like to know whether there is a general binary intermediate representation for kernels, instead of C source files.
I know that CUDA supports LLVM IR as a binary intermediate representation, so kernels can be created programmatically, and I want to do the same thing with OpenCL. But the documentation says that the binary format is not specified; each implementation can use its own. So is there any general-purpose IR that can be generated by a program and also runs on the NVIDIA, AMD and Intel implementations of OpenCL?
Thanks.
No, not yet. Khronos is working on SPIR (the spec is still provisional), which would hopefully become this. As far as I can tell, none of the major implementations support it yet. Unless you want to bet your project on its success and possibly delay your project for a year or two, you should probably start with generating code in the C dialect.
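In practice, "generating code in the C dialect" just means building the kernel source as a string at runtime and handing it to the OpenCL compiler. A minimal sketch (error checking omitted; the kernel is a trivial placeholder standing in for whatever your F# front-end emits):
#include <CL/cl.h>
#include <string.h>

// Compile a kernel whose source was produced by a code generator.
static cl_kernel build_generated_kernel(cl_context ctx, cl_device_id dev)
{
    // Imagine this string was emitted by the front-end instead of being a literal.
    const char *src =
        "__kernel void add(__global const float *a, __global const float *b,\n"
        "                  __global float *c) {\n"
        "    size_t i = get_global_id(0);\n"
        "    c[i] = a[i] + b[i];\n"
        "}\n";
    size_t len = strlen(src);
    cl_int err;
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, &len, &err);
    clBuildProgram(prog, 1, &dev, "", NULL, NULL);
    return clCreateKernel(prog, "add", &err);
}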

Calling external functions when using OpenCL for CPU Device

I am evaluating the possibility of using OpenCL for just-in-time compilation of performance-critical mathematical expressions on CPU devices. I currently use LLVM directly (or rather, I have a working proof of concept), but would find the abstraction offered by OpenCL very useful going forward.
I am now trying to figure out if there is some way to call functions with external linkage when using OpenCL for CPU devices, equivalent to the following in LLVM:
... = llvm::Function::Create(..., llvm::Function::ExternalLinkage, "...", ...);
Since my OpenCL implementation at least is built on top of LLVM, I was hoping that this would be possible somehow.
Does this function accomplish what you are after? http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNativeKernel.html
Edit: credit where credit is due: https://stackoverflow.com/a/10807728/717881
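For reference, clEnqueueNativeKernel enqueues an ordinary host function on a device that reports CL_EXEC_NATIVE_KERNEL (typically CPU devices). A minimal sketch (no error checking; the args struct layout is just an example):
#include <CL/cl.h>

// Argument block that OpenCL will copy and hand to the native function.
typedef struct { int n; float scale; } native_args;

static void CL_CALLBACK my_native_func(void *args)
{
    native_args *a = (native_args *)args;
    (void)a; // do the actual CPU work here, e.g. call an externally linked routine
}

static void enqueue_native(cl_command_queue queue)
{
    native_args args = { 1024, 2.0f };
    // OpenCL copies sizeof(args) bytes and passes the copy to my_native_func.
    clEnqueueNativeKernel(queue, my_native_func, &args, sizeof(args),
                          0, NULL, NULL,  // no cl_mem objects patched into args
                          0, NULL, NULL); // no wait list, no event returned
}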

Does CUDA support recursion?

Does CUDA support recursion?
It does on NVIDIA hardware supporting compute capability 2.0 and CUDA 3.1:
New language features added to CUDA C/C++ include: Support for function pointers and recursion make it easier to port many existing algorithms to Fermi GPUs
http://developer.nvidia.com/object/cuda_3_1_downloads.html
Function pointers:
http://developer.download.nvidia.com/compute/cuda/sdk/website/CUDA_Advanced_Topics.html#FunctionPointers
Recursion:
I can't find a code sample on NVIDIA's website, but on the forum someone posted this:
__device__ int fact(int f)
{
    if (f == 0)
        return 1;
    else
        return f * fact(f - 1);
}
Yes, see the NVIDIA CUDA Programming Guide:
device functions only support recursion in device code compiled for devices of compute capability 2.0.
You need a Fermi card to use them.
Even though it only supports recursion for specific chips, you can sometimes get away with "emulated" recursion: see how I used compile-time recursion for my CUDA raytracer.
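The compile-time trick boils down to template recursion that the compiler fully unrolls, so the generated code contains no recursive calls at runtime. A generic illustration of the idea (not the raytracer code; bounce is a made-up example):
// Recursion happens in the compiler: each depth is a separate instantiation,
// so no call actually recurses at runtime.
template <int Depth>
float bounce(float energy)
{
    if (energy < 0.01f)
        return 0.0f;
    return energy + bounce<Depth - 1>(energy * 0.5f);
}

// Base case terminates the chain of instantiations.
template <>
float bounce<0>(float energy)
{
    return energy;
}
For example, bounce<4>(1.0f) expands into four nested, inlinable calls with a hard depth limit fixed at compile time.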
As of the CUDA 4.1 release, CUDA supports recursion only for __device__ functions, not for __global__ functions.
Only on devices of compute capability 2.0 or higher.
Any recursive algorithm can be implemented with a stack and a loop. It's way more of a pain, but if you really need recursion, this can work.
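For instance, a recursive tree sum can be rewritten with an explicit stack and a loop (a plain sketch of the idea; in a real kernel the stack would be a fixed-size per-thread array and its depth would have to be bounded):
// Nodes stored in an array, index-based, as is typical for GPU-friendly data.
struct Node { int left; int right; float value; }; // -1 means "no child"

float tree_sum(const Node *nodes, int root)
{
    float sum = 0.0f;
    int stack[64]; // explicit stack replaces the call stack
    int top = 0;
    stack[top++] = root;
    while (top > 0) {
        int i = stack[--top];
        if (i < 0) continue;          // skip missing children
        sum += nodes[i].value;
        stack[top++] = nodes[i].left; // push children instead of recursing
        stack[top++] = nodes[i].right;
    }
    return sum;
}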
Sure it does, but it requires the Kepler architecture to do so.
Check out their latest example on the classic quick sort.
http://blogs.nvidia.com/2012/09/how-tesla-k20-speeds-up-quicksort-a-familiar-comp-sci-code/
As far as I know, only the latest Kepler GK110 supports dynamic parallelism, which allows this kind of recursive kernel call and the spawning of new threads within a kernel. Before Kepler GK110 it was not possible. And note that not every Kepler GPU supports this; only GK110 does.
If you need recursion, you probably need the Tesla K20.
I'm not sure whether Fermi supports it, never read of it. :\
But Kepler sure does. =)
CUDA 3.1 supports recursion
If your algorithm involves a lot of recursion, then, supported or not, it is not designed for GPUs: either redesign your algorithm or get a better CPU. Either way it will be better (I bet in many cases magnitudes better) than doing recursion on GPUs.
Yeah, it is supported in current versions. But although it is possible to execute recursive functions, you must keep in mind that the memory required by the execution stack cannot be predicted (the recursive function has to run to know the true depth of the recursion), so your stack could turn out to be too small for your purposes and you may need to increase the default stack size manually.
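Increasing the per-thread stack is done through the CUDA runtime. A minimal host-side sketch (the 4096-byte figure is just an example, not a recommendation):
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    size_t stack_size = 0;
    cudaDeviceGetLimit(&stack_size, cudaLimitStackSize);
    std::printf("default per-thread stack: %zu bytes\n", stack_size);

    // Raise the limit before launching kernels that recurse deeply.
    cudaDeviceSetLimit(cudaLimitStackSize, 4096);

    cudaDeviceGetLimit(&stack_size, cudaLimitStackSize);
    std::printf("new per-thread stack: %zu bytes\n", stack_size);
    return 0;
}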
Yes, it does support recursion. However, it is usually not a good idea to do recursion on the GPU, because every thread will be doing it.
Tried it just now on my PC with an NVIDIA GPU of compute capability 1.1. It says recursion is not yet supported, so it's not something the runtime determines but the hardware itself.
