OpenCL intermediate language info? - opencl

I cannot find anything just explaining the syntax of the intermediate language. Does anybody know of any good documentation?

AFAIK, nothing called "OpenCL intermediate language" exists. There are vendor-specific intermediate languages used by some OpenCL implementations (such as NVIDIA's PTX and AMD's IL).
There is also the "Standard Portable Intermediate Representation" (SPIR) specification from Khronos which aims to be a cross-platform intermediate representation for OpenCL device code.

Related

OpenCl is a Library or is a Compiler?

I started to learn OpenCl.
I read these links:
https://en.wikipedia.org/wiki/OpenCL
https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/os_tooling.md
https://www.khronos.org/opencl/
but I did not understand well that OpenCl is a library by including header file in source code or it is a compiler by using OpenCl C Compiler?!
It is both a library and a compiler.
The OpenCL C/C++ bindings that you include as header files and that you link against are a library. These provide the necessary functions and commands in C/C++ to control the device (GPU).
For the device, you write OpenCL C code. This language is not C or C++, but rather based on C99 and has some specific limitations (for example only 1D arrays) as well as extensions (math and vector functionality).
The OpenCL compiler sits in between the C/C++ bindings and the OpenCL C part. Using the C/C++ bindings, you run the compiler at runtime of the executable with the command clBuildProgram (C bindings) or program.build(...) (C++). It then at runtime once compiles the OpenCL C code to the device-specific assembly, which is different for every vendor. With an Nvidia GPU, you can for example look at the Compiler PTX assembly for the device.
If you know OpenGL then OpenCL works on the same principle.
Disclaimer: Self-learned knowledge ahead. I wanted to learn OpenCL and found the OpenCL term as confusing as you do. So I did some painful research until I got my first OpenCL hello-world program working.
Overview
OpenCL is an open standard - i.e. just API specification which targets heterogeneous computing hardware in particular.
The standard comprises a set of documents available here. It is up to the manufacturers to implement the standard for their devices and make OpenCL available e.g. through GPU drivers to users. Perhaps in the form of a shared library.
It is also up to the manufacturers to provide tools for developer to make applications using OpenCL.
Here's where it gets complicated.
SDKs
Manufacturers provide SDKs - software packages that contain everything the said developer needs. (See the link above). But they are specific for each - e.g. NVIDIA SDK won't work without their gpu.
ICD Loader
Because of SDKs being tied to a signle vendor, the most portable(IMHO) solution is to use what is known as Khronos' ICD loader. It is kind of "meta-driver" that will, during run-time, search for other ICDs present in the system by AMD, Intel, NVIDIA, and others; then forward them calls from our application. So, as a developer, we can develop against this generic driver and use clGetPlatformIDs to fetch the available platforms and devices. It is availble as libOpenCL.so, at least on Linux, and we should link against it.
Counterpart for OpenGL's libOpenGL, well almost, because the vast majority of OpenGL(1.1+) is present in the form of extensions and must be loaded separately with e.g. GLAD. In that sense, GLAD is very similar to the ICD loader.
Again, it does not contain any actual "computing" code, only stub implementations of the API which forward everything to the chosen platform's ICD.
Headers
We are still missing the headers, thankfully Khronos organization releases C headers and also C++ bindings. But nothing is stopping you from writing them yourself based on the official API documents. It would just be really tedious and error-prone.
Here we can find yet another parallel with OpenGL because the headers are also just the consequence of the Standard and GLAD generates them directly from its XML version! How cool is that?!
Summary
To write a simple OpenCL application we need to:
Download an ICD from the device's manufactures - e.g. up-to-date GPU drivers is enough.
Download the headers and place them in some folder.
Download, build, and install an ICD loader. It will likely need the headers too.
Include the headers, use API in them, and link against the ICD loader.
For Debian, maybe Ubuntu and others there is a simpler approach:
Download the drivers... look for <vendor>-opencl-icd, the drivers on Linux are usually not as monolithic as on Windows and might span many packages.
Install ocl-icd-opencl-dev which contains C, C++ headers + the loader.
Use the headers and link the library.

OpenCL scan code

I'm looking for a fast implementation of scan(prefixsum) in OpenCL. The best thing that I found is in the Nvidia SDK but it's old(2010).
Does anyone know any other implementation of Scan in OpenCL?
There are several open-source implementations of scan operation in OpenCL:
CLOGS, a library for higher-level operations on top of the OpenCL C++ API.
Boost.Compute, a C++ GPU Computing Library for OpenCL.
VexCL, a C++ vector expression template library for OpenCL/CUDA.
Bolt, a C++ template library optimized for GPUs.
The author of CLOGS wrote a paper comparing performance of scan (and sort) operations in these implementations.
if your device supports 2.0 then, use builtin operations for that.
https://stackoverflow.com/a/32394920/4877550
http://developer.amd.com/community/blog/2014/11/17/opencl-2-0-device-enqueue/

Getting OpenCL program code from GPU

I have program which use OpenCL do do math, how i can get source code of opencl, that execute on my gpu when this program do calculations?
The most straightforward approach is to look for the kernel string in the application. Sometimes you'll be able to just find its source lying in some .cl file, otherwise you can try to scan the application's binaries with something like strings. If the application is not purposefully obfuscating the kernel source, you're likely to find it using one of those methods.
A more bulletproof approach would be to catch the strings provided to the OpenCL API. You can even provide your own OpenCL implementation that just prints out the kernel strings in the relevant cl function. It's actually pretty easy: start with pocl and change the implementation of clCreateProgramWithSource to print out the input strings - this is a trivial code change.
You can then install that modified version as an OpenCL implementation and make sure the application uses it. This might be tricky if the application requires certain OpenCL capabilities, but your implementation can of course lie about those.
Notice that in the future, SPIR can make this sort of thing impossible - you'll be able to get an IR of the kernel, but not its source.
clGetProgramInfo(..., CL_PROGRAM_BINARIES, ...) gets you the compiled binary, but interpreting that is dependent upon the architecture. Various SDK's have different tools that might get you GPU assembly though.

Finding number of copy engines using OpenCL

In OpenCL is there any API for finding number of copy engines in GPU? In cuda we can check this with asyncEngineCount. What is the alternative in OpenCL?
Within the OpenCL standard, there is no alternative because this is a hardware- and vendor-specific implementation detail. However, in the future, NVIDIA might update its cl_nv_device_attribute_query extension. This extension already contains the CL_DEVICE_GPU_OVERLAP_NV device info, that returns true if asynchronous copy is possible.

Is there a fast language that supports portable continuations?

I'm looking for a fast language (ie. a language that can be compiled natively to achieve performance not more than 3 or 4 times slower than C), which supports portable continuations. By this I mean a continuation that can be serialized on one computer, and deserialized on another.
I know that SISC can do this (a Scheme implementation in Java), but its slow. Ditto for Rhino (a Javascript implementation in Java).
Did you checked OCaml ? It can be compiled and should be marginally slower than C.
Continuations and delimited control
Scala 2.8.0 will allow continuations, and they'll be portable.
While I agree that dialects of Caml might be an excellent choice I feel I have to mention Gambit Scheme. Together with Termite, an erlang-like framework it has support for serializing continuations, sending them over the wire, and much more.
It compiles to C-code.
Its possible to do serializable continuations in Java using Apache JavaFlow - if you do go that route then the Swing Continuations library at:
http://www.exploringexcellence.com/swingcontinuations/download.html
makes it the coding a lot more pleasant.

Resources