Finding number of copy engines using OpenCL

Is there an OpenCL API for finding the number of copy engines in a GPU? In CUDA we can check this with asyncEngineCount. What is the alternative in OpenCL?

Within the OpenCL standard there is no equivalent, because this is a hardware- and vendor-specific implementation detail. NVIDIA might extend its cl_nv_device_attribute_query extension in the future, though. That extension already provides the CL_DEVICE_GPU_OVERLAP_NV device info, which returns true if asynchronous copy is possible.
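As a rough illustration of the extension query mentioned above, the sketch below reads the overlap flag from a device object. It assumes a pyopencl-style device (an `extensions` string and a `get_info` method); the helper name `gpu_overlap` is made up for this example, and 0x4004 is the value the Khronos cl_ext.h header assigns to CL_DEVICE_GPU_OVERLAP_NV. Note that the flag only says whether copy and compute can overlap, not how many copy engines exist.

```python
# Value of CL_DEVICE_GPU_OVERLAP_NV as defined in the Khronos cl_ext.h header.
CL_DEVICE_GPU_OVERLAP_NV = 0x4004
NV_ATTRIBUTE_EXTENSION = "cl_nv_device_attribute_query"


def gpu_overlap(device):
    """Return True/False if the device reports the NVIDIA overlap flag,
    or None if the extension is not advertised.

    `device` is assumed to behave like a pyopencl.Device: it has an
    `extensions` string and a `get_info(param)` method.
    """
    if NV_ATTRIBUTE_EXTENSION not in device.extensions.split():
        return None
    return bool(device.get_info(CL_DEVICE_GPU_OVERLAP_NV))


if __name__ == "__main__":
    try:
        import pyopencl as cl  # optional third-party dependency (assumption)

        for platform in cl.get_platforms():
            for device in platform.get_devices():
                print(device.name, "->", gpu_overlap(device))
    except Exception as exc:  # pyopencl missing, or no OpenCL platform installed
        print("OpenCL not available:", exc)
```

On non-NVIDIA devices the helper returns None rather than guessing, since the extension is vendor-specific.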

Related

Is OpenCL a library or a compiler?

I started to learn OpenCL.
I read these links:
https://en.wikipedia.org/wiki/OpenCL
https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/os_tooling.md
https://www.khronos.org/opencl/
but I still don't understand whether OpenCL is a library, used by including a header file in the source code, or a compiler, used through the OpenCL C compiler.
It is both a library and a compiler.
The OpenCL C/C++ bindings that you include as header files and link against are a library. They provide the functions and commands in C/C++ needed to control the device (GPU).
For the device, you write OpenCL C code. This language is not C or C++, but is based on C99, with some specific limitations (for example, data is passed to kernels as flat 1D buffers, and recursion is not allowed) as well as extensions (math and vector functionality).
The OpenCL compiler sits between the C/C++ bindings and the OpenCL C part. Using the C/C++ bindings, you invoke the compiler while the executable is running, with clBuildProgram (C bindings) or program.build(...) (C++). It then compiles the OpenCL C code, once, to device-specific assembly, which is different for every vendor. With an Nvidia GPU, for example, you can look at the PTX assembly the compiler produces for the device.
If you know OpenGL then OpenCL works on the same principle.
Disclaimer: Self-learned knowledge ahead. I wanted to learn OpenCL and found the OpenCL term as confusing as you do. So I did some painful research until I got my first OpenCL hello-world program working.
Overview
OpenCL is an open standard, i.e. just an API specification, which targets heterogeneous computing hardware in particular.
The standard comprises a set of documents available here. It is up to the manufacturers to implement the standard for their devices and make OpenCL available to users, e.g. through GPU drivers, perhaps in the form of a shared library.
It is also up to the manufacturers to provide tools for developers to build applications using OpenCL.
Here's where it gets complicated.
SDKs
Manufacturers provide SDKs - software packages that contain everything a developer needs. (See the link above). But they are specific to each vendor - e.g. the NVIDIA SDK won't work without an NVIDIA GPU.
ICD Loader
Because SDKs are tied to a single vendor, the most portable (IMHO) solution is to use what is known as Khronos' ICD loader. It is a kind of "meta-driver" that will, at run time, search for the ICDs present in the system from AMD, Intel, NVIDIA, and others, and then forward our application's calls to them. So, as developers, we can develop against this generic driver and use clGetPlatformIDs to fetch the available platforms and devices. It is available as libOpenCL.so, at least on Linux, and we should link against it.
It is the counterpart of OpenGL's libOpenGL, well almost, because the vast majority of OpenGL (1.1+) is present in the form of extensions and must be loaded separately with e.g. GLAD. In that sense, GLAD is very similar to the ICD loader.
Again, it does not contain any actual "computing" code, only stub implementations of the API which forward everything to the chosen platform's ICD.
Headers
We are still missing the headers. Thankfully, the Khronos organization releases C headers and also C++ bindings. Nothing is stopping you from writing them yourself based on the official API documents, but it would be really tedious and error-prone.
Here we can find yet another parallel with OpenGL because the headers are also just the consequence of the Standard and GLAD generates them directly from its XML version! How cool is that?!
Summary
To write a simple OpenCL application we need to:
Download an ICD from the device's manufacturer - e.g. up-to-date GPU drivers are enough.
Download the headers and place them in some folder.
Download, build, and install an ICD loader. It will likely need the headers too.
Include the headers, use API in them, and link against the ICD loader.
For Debian, maybe Ubuntu and others there is a simpler approach:
Download the drivers... look for <vendor>-opencl-icd, the drivers on Linux are usually not as monolithic as on Windows and might span many packages.
Install ocl-icd-opencl-dev which contains C, C++ headers + the loader.
Use the headers and link the library.

OpenCL scan code

I'm looking for a fast implementation of scan (prefix sum) in OpenCL. The best thing that I found is in the Nvidia SDK, but it's old (2010).
Does anyone know any other implementation of Scan in OpenCL?
There are several open-source implementations of the scan operation in OpenCL:
CLOGS, a library for higher-level operations on top of the OpenCL C++ API.
Boost.Compute, a C++ GPU Computing Library for OpenCL.
VexCL, a C++ vector expression template library for OpenCL/CUDA.
Bolt, a C++ template library optimized for GPUs.
The author of CLOGS wrote a paper comparing performance of scan (and sort) operations in these implementations.
If your device supports OpenCL 2.0, use the built-in work-group scan functions (e.g. work_group_scan_inclusive_add) for that.
https://stackoverflow.com/a/32394920/4877550
http://developer.amd.com/community/blog/2014/11/17/opencl-2-0-device-enqueue/
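When comparing any of these GPU scans, a small CPU reference is handy for checking results. The sketch below is not taken from any of the libraries above; it is a plain-Python rendering of the classic work-efficient (Blelloch) exclusive scan, whose up-sweep and down-sweep phases are the parts a GPU implementation would typically run in parallel. The function name and the zero-padding to a power of two are choices made for this example.

```python
from itertools import accumulate


def blelloch_exclusive_scan(values):
    """Exclusive prefix sum using the up-sweep / down-sweep scheme.

    Iterations of each inner loop are independent of one another at a
    given stride, which is what a GPU would execute in parallel.
    """
    n = len(values)
    if n == 0:
        return []
    size = 1
    while size < n:  # pad to the next power of two
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep (reduce): build partial sums in place.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    return data[:n]


# Check against a straightforward sequential prefix sum.
sample = [3, 1, 4, 1, 5]
reference = [0] + list(accumulate(sample))[:-1]
assert blelloch_exclusive_scan(sample) == reference
```

The same reference can be compared element by element with the buffer read back from the device after running whichever library scan is being evaluated.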

Getting OpenCL program code from GPU

I have a program which uses OpenCL to do math. How can I get the OpenCL source code that executes on my GPU when this program does its calculations?
The most straightforward approach is to look for the kernel string in the application. Sometimes you'll be able to just find its source lying in some .cl file, otherwise you can try to scan the application's binaries with something like strings. If the application is not purposefully obfuscating the kernel source, you're likely to find it using one of those methods.
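The binary-scanning idea can be sketched in a few lines. This is only an approximation of what the strings tool does, assuming the kernel is stored as plain ASCII text in the file; the function names, the 16-byte minimum run length, and the markers searched for are all choices made for this example.

```python
import re


def printable_runs(blob, min_len=16):
    """Return runs of at least `min_len` printable ASCII bytes
    (tab, newline and carriage return included) found in `blob`."""
    pattern = re.compile(rb"[\x09\x0a\x0d\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]


def find_kernel_sources(blob, min_len=16):
    """Return the printable runs that look like OpenCL C kernel source."""
    markers = ("__kernel", "kernel void")
    return [text for text in printable_runs(blob, min_len)
            if any(marker in text for marker in markers)]


# Small in-memory demonstration: binary noise around an embedded kernel.
demo = (b"\x00\x01\x02"
        + b"__kernel void add(__global float* a) { a[0] += 1.0f; }"
        + b"\x00\xff")
for source in find_kernel_sources(demo):
    print(source)
```

For a real application, the bytes would come from reading the executable or its shared libraries in binary mode. Obfuscated or compressed kernels will not show up this way.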
A more bulletproof approach would be to catch the strings provided to the OpenCL API. You can even provide your own OpenCL implementation that just prints out the kernel strings in the relevant cl function. It's actually pretty easy: start with pocl and change the implementation of clCreateProgramWithSource to print out the input strings - this is a trivial code change.
You can then install that modified version as an OpenCL implementation and make sure the application uses it. This might be tricky if the application requires certain OpenCL capabilities, but your implementation can of course lie about those.
Notice that in the future, SPIR can make this sort of thing impossible - you'll be able to get an IR of the kernel, but not its source.
clGetProgramInfo(..., CL_PROGRAM_BINARIES, ...) gets you the compiled binary, but interpreting it depends on the architecture. Various SDKs have different tools that might get you GPU assembly, though.

How to let OpenCL see Intel and NVIDIA devices?

I wonder how we can have OpenCL "seeing" my K20, Xeon, and Xeon Phi at the same time?
Especially I'm confused about the use of two libraries here (from NVidia and Intel).
How to do it, if possible at all?
The OpenCL Installable Client Driver (ICD) takes care of this for you. It is the same regardless of whose implementation you have installed, and exposes all implementations as separate OpenCL "Platforms".
When you call clGetPlatformIDs it will tell you how many platforms you have installed. There could be one for AMD, one for NVIDIA, and one for Intel, for example.
Then within each platform you call clGetDeviceIDs which will return the number of devices within that platform. On your NVIDIA platform you'll find your K20, and within your Intel platform you'll find your Xeon CPU and Xeon Phi co-processor.
If you build or download the clInfo utility you'll see a nice dump of all the installed platforms and devices and the capabilities of each.
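The two-level enumeration described above (platforms first, then devices within each platform) can be tried without compiling anything. The sketch below calls clGetPlatformIDs and clGetDeviceIDs through Python's ctypes; it assumes the ICD loader can be found under the library name "OpenCL" on this machine, and the function name `devices_per_platform` is made up for this example. The numeric constants are the values from the Khronos headers.

```python
import ctypes
import ctypes.util

CL_SUCCESS = 0
CL_DEVICE_NOT_FOUND = -1
CL_PLATFORM_NOT_FOUND_KHR = -1001
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF


def devices_per_platform():
    """Return one device count per installed platform, or None if no
    OpenCL library (ICD loader) can be located on this machine."""
    path = ctypes.util.find_library("OpenCL")
    if path is None:
        return None
    try:
        cl = ctypes.CDLL(path)
    except OSError:
        return None

    cl.clGetPlatformIDs.restype = ctypes.c_int
    cl.clGetPlatformIDs.argtypes = [
        ctypes.c_uint, ctypes.POINTER(ctypes.c_void_p),
        ctypes.POINTER(ctypes.c_uint)]
    cl.clGetDeviceIDs.restype = ctypes.c_int
    cl.clGetDeviceIDs.argtypes = [
        ctypes.c_void_p, ctypes.c_uint64, ctypes.c_uint,
        ctypes.POINTER(ctypes.c_void_p), ctypes.POINTER(ctypes.c_uint)]

    # First call: ask only for the number of platforms.
    count = ctypes.c_uint(0)
    err = cl.clGetPlatformIDs(0, None, ctypes.byref(count))
    if err == CL_PLATFORM_NOT_FOUND_KHR:
        return []
    if err != CL_SUCCESS:
        raise RuntimeError("clGetPlatformIDs failed with error %d" % err)
    if count.value == 0:
        return []

    # Second call: fetch the platform handles themselves.
    platforms = (ctypes.c_void_p * count.value)()
    err = cl.clGetPlatformIDs(count.value, platforms, None)
    if err != CL_SUCCESS:
        raise RuntimeError("clGetPlatformIDs failed with error %d" % err)

    counts = []
    for platform in platforms:
        num_devices = ctypes.c_uint(0)
        err = cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, None,
                                ctypes.byref(num_devices))
        if err == CL_DEVICE_NOT_FOUND:
            counts.append(0)
        elif err == CL_SUCCESS:
            counts.append(num_devices.value)
        else:
            raise RuntimeError("clGetDeviceIDs failed with error %d" % err)
    return counts
```

Calling `devices_per_platform()` on a machine like the one in the question would be expected to return one entry per vendor platform, for instance one for NVIDIA and one for Intel, but the exact result depends on which ICDs are installed.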
The problem is solved.
Looking at the key directory:
/etc/OpenCL/vendors/*.icd
I noticed that for Nvidia the library in use was a link which was duplicated in different places and pointed to two different releases.
I just replaced the older one with the most recent one, the one I had installed recently, and here we go.
OpenCL did not know which one to use, I guess.
It looks like the installation location changed between the two Nvidia versions.
I was supposed to have removed the old version before reinstalling, but that had not actually happened.
Thank you all for your help.
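To spot this kind of stale or duplicated link, it helps to print what each file in the vendors directory actually points to. The sketch below assumes the usual Linux layout, where every *.icd file contains a single line naming the vendor library; the function name `list_icd_entries` is made up for this example.

```python
import glob
import os


def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to (library named inside, resolved path).

    The resolved path is None when the entry is a bare library name,
    because that lookup is left to the dynamic linker.
    """
    entries = {}
    for icd_path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(icd_path, "r", encoding="utf-8", errors="replace") as handle:
            library = handle.readline().strip()
        # Follow symlinks for absolute paths so stale links become visible.
        resolved = os.path.realpath(library) if os.path.isabs(library) else None
        entries[os.path.basename(icd_path)] = (library, resolved)
    return entries


if __name__ == "__main__":
    for name, (library, resolved) in list_icd_entries().items():
        print(name, "->", library, "->", resolved)
```

Two .icd files resolving to different releases of the same vendor library, or to a path that no longer exists, would point to the situation described above.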

Dynamic parallelism is supported by OpenCL...?

I am trying to use recursion inside an OpenCL kernel. Compilation succeeds, but at run time I get a compilation error, so I want to know: now that CUDA supports Dynamic Parallelism, does OpenCL support it or not?
Recursion is not supported by OpenCL. See point i in section 6.9 of the standard v1.2.
EDIT: The new Dynamic Parallelism capability of CUDA doesn't have anything to do with recursion (which CUDA already supported a while ago; see this question). This new capability allows threads running on the device to configure and launch new grids, which previously only the host could do. See this document for an overview.
SECOND EDIT: regarding the answer of @Michael: This is only the spec; you will have to wait for an implementation to be released. Besides, at that point in the future you will also have to make sure you have the proper hardware (even in CUDA, dynamic parallelism is supported only on devices of compute capability 3.5 and higher). So when you asked your question, and still today: NO OpenCL implementation supports dynamic parallelism.
Dynamic Parallelism is now supported in OpenCL 2.0.
Khronos Group announced it at Siggraph 2013.
You can find the specifications here

Resources