I am just curious about the workflow of OpenCL.
Does the OpenCL runtime do some kind of JIT compilation?
Here is my understanding of the workflow:
1. A SPIR file is generated from the kernel file.
2. The host program asks the OpenCL runtime to execute the SPIR file.
3. The OpenCL runtime schedules the work, translates the SPIR into a form that each device can understand, and sends it to the device.
If that is correct, then step 3 must involve some kind of JIT compilation for each device. I assume this is done for portability, but doesn't it hurt performance?
I started to learn OpenCL.
I read these links:
https://en.wikipedia.org/wiki/OpenCL
https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/os_tooling.md
https://www.khronos.org/opencl/
but I still don't understand whether OpenCL is a library that you use by including a header file in your source code, or a compiler that you use through the OpenCL C compiler.
It is both a library and a compiler.
The OpenCL C/C++ bindings that you include as header files and link against are a library. They provide the functions and commands you need in C/C++ to control the device (GPU).
For the device, you write OpenCL C code. This language is not C or C++; it is based on C99, with some specific limitations (for example only 1D arrays) as well as extensions (math and vector functionality).
The OpenCL compiler sits between the C/C++ bindings and the OpenCL C part. Using the bindings, you invoke the compiler while the executable is running, with clBuildProgram (C bindings) or program.build(...) (C++ bindings). It compiles the OpenCL C code once, at runtime, to device-specific assembly, which is different for every vendor. With an Nvidia GPU, for example, you can look at the PTX assembly the compiler generates for the device.
If you know OpenGL then OpenCL works on the same principle.
Disclaimer: Self-learned knowledge ahead. I wanted to learn OpenCL and found the OpenCL term as confusing as you do. So I did some painful research until I got my first OpenCL hello-world program working.
Overview
OpenCL is an open standard, i.e. just an API specification, aimed at heterogeneous computing hardware in particular.
The standard comprises a set of documents available here. It is up to the manufacturers to implement the standard for their devices and make OpenCL available to users, e.g. through GPU drivers, perhaps in the form of a shared library.
It is also up to the manufacturers to provide tools for developers to build applications using OpenCL.
Here's where it gets complicated.
SDKs
Manufacturers provide SDKs: software packages that contain everything a developer needs. (See the link above.) But each SDK is specific to its vendor, e.g. the NVIDIA SDK won't work without an NVIDIA GPU.
ICD Loader
Because SDKs are tied to a single vendor, the most portable (IMHO) solution is to use what is known as the Khronos ICD (Installable Client Driver) loader. It is a kind of "meta-driver" that, at run time, searches the system for ICDs installed by AMD, Intel, NVIDIA, and others, then forwards calls from our application to them. So, as developers, we can develop against this generic driver and use clGetPlatformIDs to fetch the available platforms and devices. It is available as libOpenCL.so, at least on Linux, and we should link against it.
It is the counterpart of OpenGL's libOpenGL. Well, almost, because the vast majority of OpenGL (1.1+) is present in the form of extensions and must be loaded separately with e.g. GLAD. In that sense, GLAD is very similar to the ICD loader.
Again, it does not contain any actual "computing" code, only stub implementations of the API which forward everything to the chosen platform's ICD.
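To make the loader's role a bit more concrete, here is a minimal sketch that asks the ICD loader which platforms it found. It uses Python's ctypes instead of a C program so it needs no headers; list_platform_names is a hypothetical helper name, and locating the loader with ctypes.util.find_library("OpenCL") is an assumption about how the loader is installed on your system. It only calls clGetPlatformIDs and clGetPlatformInfo from the standard API, and returns an empty list if no loader or no platform is present.

```python
import ctypes
import ctypes.util

CL_SUCCESS = 0
CL_PLATFORM_NAME = 0x0902  # value from the Khronos cl.h header


def list_platform_names():
    """Return the names of all OpenCL platforms the ICD loader reports."""
    path = ctypes.util.find_library("OpenCL")  # assumption: loader is on the library path
    if path is None:
        return []
    try:
        cl = ctypes.CDLL(path)
    except OSError:
        return []

    # Declare the C signatures so pointers and size_t are passed correctly.
    cl.clGetPlatformIDs.argtypes = [
        ctypes.c_uint, ctypes.POINTER(ctypes.c_void_p), ctypes.POINTER(ctypes.c_uint)]
    cl.clGetPlatformIDs.restype = ctypes.c_int
    cl.clGetPlatformInfo.argtypes = [
        ctypes.c_void_p, ctypes.c_uint, ctypes.c_size_t,
        ctypes.c_void_p, ctypes.POINTER(ctypes.c_size_t)]
    cl.clGetPlatformInfo.restype = ctypes.c_int

    # First call: ask only for the number of platforms.
    count = ctypes.c_uint(0)
    if cl.clGetPlatformIDs(0, None, ctypes.byref(count)) != CL_SUCCESS:
        return []
    if count.value == 0:
        return []

    # Second call: fetch the platform handles themselves.
    platforms = (ctypes.c_void_p * count.value)()
    if cl.clGetPlatformIDs(count.value, platforms, None) != CL_SUCCESS:
        return []

    names = []
    for platform in platforms:
        # Query the required buffer size, then the name itself.
        size = ctypes.c_size_t(0)
        if cl.clGetPlatformInfo(platform, CL_PLATFORM_NAME, 0, None,
                                ctypes.byref(size)) != CL_SUCCESS:
            continue
        buf = ctypes.create_string_buffer(max(size.value, 1))
        if cl.clGetPlatformInfo(platform, CL_PLATFORM_NAME, size.value, buf,
                                None) != CL_SUCCESS:
            continue
        names.append(buf.value.decode("utf-8", errors="replace"))
    return names
```

Calling print(list_platform_names()) on a machine with several vendor ICDs installed should list one entry per vendor, which is the forwarding behaviour described above. A real application would do the same thing in C or C++ against the Khronos headers.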
Headers
We are still missing the headers. Thankfully, the Khronos organization releases C headers and also C++ bindings. Nothing stops you from writing them yourself based on the official API documents, but it would be really tedious and error-prone.
Here we can find yet another parallel with OpenGL because the headers are also just the consequence of the Standard and GLAD generates them directly from its XML version! How cool is that?!
Summary
To write a simple OpenCL application we need to:
Download an ICD from the device's manufacturer - e.g. up-to-date GPU drivers are enough.
Download the headers and place them in some folder.
Download, build, and install an ICD loader. It will likely need the headers too.
Include the headers, use API in them, and link against the ICD loader.
For Debian, maybe Ubuntu and others there is a simpler approach:
Download the drivers... look for <vendor>-opencl-icd, the drivers on Linux are usually not as monolithic as on Windows and might span many packages.
Install ocl-icd-opencl-dev which contains C, C++ headers + the loader.
Use the headers and link the library.
Is there any way to debug OpenCL kernels on an Nvidia GPU, i.e. set breakpoints and inspect variables? My understanding is that Nvidia's tool does not allow OpenCL debugging, and AMD's and Intel's only allow it on their own devices.
gDEBugger might help you somewhat (never used it though), but other than that there isn't any tool that I know of that can set breakpoints or inspect variables inside a kernel. Perhaps try to save intermediate outputs from your kernel if it is a long kernel. Sorry I can't give you a magic solution, debugging OpenCL is just hard.
I have seen that AMD APP SDK samples work on a machine having only Intel CPU.
How can this happen? How does the compiler target a different machine architecture?
Do I not need Intel's set of compilers for running the code on the intel CPU?
I thought that to run an OpenCL application on specific hardware, I have to (re)compile it using the device vendor's own compiler.
Where is my understanding wrong?
Firstly, OpenCL is built to work on CPUs and GPUs. You can compile and run the same source code on either type of device. However, it's very likely that code tuned for a CPU will be sub-optimal on a GPU and vice versa.
AMD hardware is 7% - 14% of all x86/x64 CPUs, so AMD must develop compilers for both AMD and Intel chips to be relevant. AMD has a history of developing compilers for both sets of chips. Conversely, Intel has developed compilers that either don't work on AMD chips or don't work that well. That's no surprise.
With OpenCL, the AMD APP SDK is the most flexible: it works well on AMD and Intel CPUs and on AMD GPUs. Intel's OpenCL SDK doesn't even install on AMD x86 hardware.
If you compile an OpenCL program to binary, you can save and reuse it as long as it matches the OpenCL platform and device that created it. So, if you compile for one device and use the binary on another, you are very likely to get an error.
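One way to respect that rule when caching binaries is to derive the cache key from the platform and device that built them. The sketch below is only an illustration: binary_cache_key is a hypothetical helper, and folding the driver version, kernel source, and build options into the key is my own assumption on top of what is stated above, not something OpenCL requires.

```python
import hashlib


def binary_cache_key(platform_name, device_name, driver_version, source,
                     build_options=""):
    """Return a hex digest identifying a compiled program binary.

    All arguments are plain strings; the platform, device, and driver
    strings would come from clGetPlatformInfo / clGetDeviceInfo queries
    in a real host program.
    """
    digest = hashlib.sha256()
    for part in (platform_name, device_name, driver_version,
                 build_options, source):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent fields cannot run together
    return digest.hexdigest()
```

A host program could store the saved binary under this key and fall back to compiling from source whenever the key for the current device has no entry, instead of loading a binary built for another device and getting an error.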
The power of OpenCL is that it abstracts the underlying hardware and offers massive, parallel, heterogeneous computing power.
Some SDKs and platforms offer specific features to "optimize" the code. I honestly think such features are just marketing, and they introduce boilerplate code that makes the application less portable.
There are also some pseudo-new technologies that are just wrappers around OpenCL or are very similar in concept, like Intel Quick Sync.
As for Intel: at first they supported every Core i generation and even some Core 2 Duo chips, but the new SDK only supports the 3rd Core i generation. I honestly don't get their strategy. Intel is probably the last option if you want to adopt OpenCL and target the biggest possible audience, and their SDK doesn't seem to be very good either.
Stick with the standard and you will avoid both possible legal and performance issues and your code will also be more portable.
The bottom line is that the AMD SDK includes a compiler for targeting x86 CPUs for OpenCL. That means that even though you are running an Intel CPU the generated code will run on it. It's the same concept as compiling a C program to run on an x86 CPU: it works on Intel and AMD CPUs (or any that implement the x86 instruction set).
The vendor's compiler might have specific optimizations, like user827992 mentions, but in my experience the performance of AMD's CPU compiler isn't that bad when running on an Intel CPU. I haven't tried Intel's OpenCL implementation.
It is true that for some (maybe most in the future) hardware, only the vendor's compiler will support it. AMD's SDK won't build code that will run on an NVIDIA card, and vice-versa. CPUs happen to be a bit of a special case in that the basic instruction set is so widely deployed that the CPU compiler will work on most machines you're likely to come in contact with.
I've been doing some research into OpenCL and the possibility of using it on a project. My question is: is there a way to run OpenCL code in a C++ application on a CPU that is not supported by the OpenCL SDKs? I know Java has Aparapi, but I'm wondering how to run OpenCL code in a C++ application without hardware that the SDKs support. There is some code I would like to write as OpenCL kernels to take advantage of OpenCL parallelism where available, but I'm unsure whether I would be able to run it on older hardware (still x86, but not recent). Could anyone explain how this can be done, or whether running OpenCL code on older systems is even a problem at all?
Thanks,
Peter
I would say the best approach is to check whether the machine supports OpenCL with API calls such as clGetPlatformIDs. If it turns out there is no OpenCL device, run the required code as a normal C/C++ function; otherwise run it as an OpenCL kernel. The cost of this portability is that you have to write the program logic twice: once in a .cl file and once as a normal C/C++ function.
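The check-then-fall-back idea can be sketched roughly as follows. This is Python with ctypes rather than C++, purely to keep it short; opencl_platform_count, add_vectors_cpu, and add_vectors are hypothetical names, and the OpenCL implementation is passed in as a callable because actually launching a kernel is outside the scope of this sketch.

```python
import ctypes
import ctypes.util


def opencl_platform_count():
    """Return how many OpenCL platforms are visible, or 0 if none can be found."""
    path = ctypes.util.find_library("OpenCL")  # assumption: loader is on the library path
    if path is None:
        return 0
    try:
        cl = ctypes.CDLL(path)
    except OSError:
        return 0
    cl.clGetPlatformIDs.argtypes = [
        ctypes.c_uint, ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint)]
    cl.clGetPlatformIDs.restype = ctypes.c_int
    count = ctypes.c_uint(0)
    status = cl.clGetPlatformIDs(0, None, ctypes.byref(count))
    return count.value if status == 0 else 0  # 0 is CL_SUCCESS


def add_vectors_cpu(a, b):
    """Plain host-side version of the logic (the 'written twice' copy)."""
    return [x + y for x, y in zip(a, b)]


def add_vectors(a, b, opencl_impl=None):
    """Use the OpenCL implementation when a platform exists, else the CPU one."""
    if opencl_impl is not None and opencl_platform_count() > 0:
        return opencl_impl(a, b)
    return add_vectors_cpu(a, b)
```

With no opencl_impl supplied, add_vectors([1, 2], [3, 4]) simply takes the host path and returns [4, 6]. The same structure in C++ would keep the kernel in a .cl file and the fallback in an ordinary function.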
I have a system with an NVidia graphics card and I'm looking at using OpenCL to replace OpenMP for some small on-CPU tasks (thanks to VS2010 making OpenMP useless).
Since I have NVidia's OpenCL SDK installed, clGetPlatformIDs() only returns a single platform (NVidia's) and so only a single device (the GPU).
Do I need to also install Intel's OpenCL SDK to get access to the CPU platform?
Shouldn't the CPU platform always be available - I mean, how do you NOT have a cpu?
How do you manage to build against two OpenCL SDKs simultaneously?
You need an SDK that provides an interface to the CPU. NVIDIA's does not; AMD's and Intel's do. In my case the one from Intel is significantly (something like 10x) faster, which might be due to bad programming on my part, however.
You don't need the SDK for programs to run, just the runtime. On Linux, each vendor installs a file in /etc/OpenCL/vendors/*.icd, which contains the path of the runtime library to use. That directory is scanned by the OpenCL runtime you link to (libOpenCL.so), which then calls each vendor's library when querying for devices on that particular platform.
On Linux, the GPU drivers install the OpenCL runtime automatically. The Intel runtime is likely to be downloadable separately from the SDK, but it is of course part of the SDK as well.
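If you want to see which vendor runtimes are registered on a Linux box, you can read the same .icd files the loader reads. A minimal sketch follows; read_icd_registry is a hypothetical helper, and it assumes each .icd file holds the library path on its first non-empty line, as described above.

```python
from pathlib import Path


def read_icd_registry(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the runtime library path it contains."""
    registry = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return registry  # no vendor runtime registered (or not Linux)
    for icd_file in sorted(directory.glob("*.icd")):
        lines = icd_file.read_text(errors="replace").splitlines()
        # Take the first non-empty line as the library path.
        library = next((ln.strip() for ln in lines if ln.strip()), "")
        registry[icd_file.name] = library
    return registry
```

Running print(read_icd_registry()) shows one entry per installed vendor runtime. An empty result usually means no OpenCL runtime is installed, even if an SDK's headers are present.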
Today I finally got around to starting OpenCL development and wow... it is not straightforward at all.
There's an AMD SDK, there's an Intel SDK, there's an NVIDIA SDK, each with their own properties (CPU only vs GPU only vs specific video card support only, perhaps?)
There may be valid technical reasons for it having to be this way, but I really wish there was just one SDK, and that when programming you could perhaps specify GPU / CPU tasks, or that maybe it would use whatever resources made the most sense / performed best, or SOMETHING.
Time to dive in though, I guess... trying to decide whether I go CPU or GPU. I have a pretty new $4000 Alienware laptop with SLI video cards, but then also an 8-core CPU, so yeah... I guess I'll have to try a couple of SDKs and see which performs best for my needs?
Not sure what end users of my applications would do though... it doesn't seem like they can flip a switch to make it run on the CPU or GPU instead.
The OpenCL landscape really needs some help...