Background
I am somewhat familiar with NVIDIA CUDA development. In the CUDA world it is quite easy to debug code inside kernels: both Nsight for Visual Studio and Nsight for Visual Studio Code allow you to do that.
Objective
I would like to be able to run a visual debugger on individual OpenCL kernels at runtime, similar to debugging CUDA kernels in Nsight Visual Studio (with Locals, Watches, Memory views, and breakpoints set inside kernels).
What I have tried
printf debugging. This is a very limited approach that only works for simple programs. It is not reliable, and it does not make it easy to find problems when, for example, two work-items running in parallel write to the same memory address.
Searched the Khronos Group's official documentation for kernel debugging; found nothing.
Searched Intel's official OpenCL pages for kernel debugging; found nothing (their tutorials page was last updated in 2015).
Searched AMD's official documentation for kernel debugging; found nothing but a reference to the discontinued CodeXL IDE.
Searched the AMD ROCm forums for kernel debugging; found nothing.
Searched the official ROCDebugger (AMD ROCm) documentation; found no clues as to how visual debugging of OpenCL kernels could be done with it.
Question
How can I run a visual debugger on individual OpenCL kernels at runtime, similar to debugging CUDA kernels in Nsight Visual Studio (with Locals, Watches, Memory views, and breakpoints set inside kernels)?
Intel provides the Intel Distribution for GDB to debug OpenCL kernels on its GPUs. The debugger is available for both Linux and Windows and is part of the oneAPI Base Toolkit. Details on how to use it on a particular system can be found in the official documentation:
Get Started with Intel Distribution for GDB on Linux OS Host
Get Started with Intel Distribution for GDB on Windows OS Host
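For reference, a kernel breakpoint session with that debugger looks roughly like the following. This is only an illustration: `my_app`, `my_kernel.cl`, and the variable name are placeholders, and the commands shown are the standard GDB ones that gdb-oneapi inherits; consult the guides above for the exact setup steps on your system.

```
$ gdb-oneapi ./my_app
(gdb) break my_kernel.cl:42      # breakpoint on a line inside the kernel
(gdb) run
(gdb) info threads               # inspect the GPU threads at the breakpoint
(gdb) print some_local_variable  # examine kernel locals
(gdb) continue
```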
Related
For context, I am researching upgrades to our OpenCL code at work, as we are quite far behind the current spec. The team I work in develops on all three major OSes, and as far as I can tell, Apple doesn't support anything past OpenCL 1.2 on any of their machines, instead prompting you to learn and use Metal Performance Shaders.
I would like to update our code base to C++ for OpenCL, which is based on OpenCL 2.0 or 3.0, depending on which version you use. C++ for OpenCL is fine on Windows and Linux, as they have GPU drivers that support version 2.0 and beyond, but Macs might have a problem with it without official driver support. So I had the idea that if we could cross-compile the CL kernels to Metal, the Mac users would be able to run a Metal version of the code locally.
Has anyone got any solutions to this problem?
I have an application that uses MPI_COMM_WORLD. I'm building the application with HPC Pack 2008 R2 MPI, and everything works fine on my local machine and most other PCs.
Occasionally, when installing on a different PC, I run into issues with competing versions of MPI (e.g. Intel's). This is usually solved by prepending my HPC version to the PATH.
I have recently hit an issue that I can't work around.
My MPI is first in the PATH, but I'm getting an error that the "link library mkl_intel_thread.dll" cannot be found. This tells me the app is looking at the Intel version.

```
> where mpiexec
mkl_thread not found
```
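To see which copy of a binary or DLL wins, you can replicate what `where` does: walk the PATH directories in order and report every one that contains the name. A minimal sketch, using two temporary directories as stand-ins for the HPC Pack and Intel MPI install locations (the directory names are fabricated for the demo):

```python
import os
import tempfile

def which_all(name, path_dirs):
    """Return every directory on the search path containing `name`,
    in search order; the first hit is what the loader picks up."""
    return [d for d in path_dirs if os.path.isfile(os.path.join(d, name))]

# Demo: two fake install locations, both holding an mpiexec.exe.
hpc_dir = tempfile.mkdtemp()
intel_dir = tempfile.mkdtemp()
for d in (hpc_dir, intel_dir):
    open(os.path.join(d, "mpiexec.exe"), "w").close()

hits = which_all("mpiexec.exe", [hpc_dir, intel_dir])
print(hits[0] == hpc_dir)  # → True: the directory listed first wins
```

The same technique applies to DLLs such as mkl_intel_thread.dll, since Windows also searches the PATH directories when resolving DLLs that are not next to the executable.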
So my general questions are:
Is there a version of MPI that is compatible with all others?
Or is there a way to compile my application so that it always looks for the right MPI library?
Or is there a way to compile it so that the application is MPI-agnostic?
Thanks in advance
The advantage of MPI is the MPI standard. This means that as long as you stick to a certain version of the MPI standard, your program should work with any standard-compliant MPI implementation.
The missing Math Kernel Library (MKL) DLL has nothing to do with MPI incompatibility: mkl_intel_thread.dll is part of Intel's math library, so your application (or one of its dependencies) was built against MKL and needs that DLL to be findable at load time.
So, the Intel SDK works with Intel CPUs, GPUs, and Xeon Phi.
The AMD SDK works with AMD GPUs and CPUs.
I would like to develop an application that targets an Intel CPU and an AMD GPU.
Can anyone suggest a development strategy to achieve this?
Thanks.
Edit: I would like to run both cpu and gpu kernels concurrently on the same system.
When you get the list of available platforms, in the case of an Intel CPU plus an AMD GPU you should see two platforms, each with its own ID.
Usually that's it: you create devices and so on, using the appropriate platform ID in each case.
If you are using Windows, it's not hard to see in a debugger that different platforms correspond to different OpenCL libraries (just dig into the cl_platform_id structure): both DLLs are loaded.
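That platform enumeration can be sketched without any vendor SDK at all by loading the OpenCL runtime dynamically; the sketch below uses only `clGetPlatformIDs` and `clGetPlatformInfo` and degrades gracefully when no runtime is installed. The library names tried are the common defaults, not guaranteed on every system:

```python
import ctypes
import sys

def load_opencl():
    """Try the usual OpenCL runtime names; returns None if no runtime is present."""
    names = ["OpenCL.dll"] if sys.platform == "win32" else \
            ["libOpenCL.so.1", "libOpenCL.so"]
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

def platform_names():
    """Return the CL_PLATFORM_NAME of every installed OpenCL platform."""
    cl = load_opencl()
    if cl is None:
        return []
    count = ctypes.c_uint32(0)
    if cl.clGetPlatformIDs(0, None, ctypes.byref(count)) != 0 or count.value == 0:
        return []
    ids = (ctypes.c_void_p * count.value)()
    cl.clGetPlatformIDs(count.value, ids, None)
    CL_PLATFORM_NAME = 0x0902  # value from cl.h
    names = []
    buf = ctypes.create_string_buffer(256)
    for pid in ids:
        cl.clGetPlatformInfo(ctypes.c_void_p(pid), CL_PLATFORM_NAME,
                             ctypes.c_size_t(256), buf, None)
        names.append(buf.value.decode())
    return names

# On an Intel CPU / AMD GPU box this would print two entries,
# e.g. an Intel platform and an AMD platform; [] if no runtime is installed.
print(platform_names())
```

From there you pick the platform whose name (or vendor) matches the device class you want, and call clGetDeviceIDs against that platform ID.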
Put your OpenCL code (not necessarily the kernels) in a library, and build and link DLL files for the AMD, Intel (and NVIDIA) devices. Then create a program that dynamically loads the right library based on which platforms the user has installed.
Kind of a pain in the butt, but it works in LabVIEW, so it should work in other languages.
If you are using Windows, you can use LoadLibrary and put the DLL in a folder that is on your PATH (Windows environment variable) or in the same folder as the .EXE.
I have seen that the AMD APP SDK samples work on a machine that has only an Intel CPU.
How can this happen? How does the compiler target a different machine architecture?
Do I not need Intel's compilers to run the code on the Intel CPU?
I thought that to run an OpenCL application on specific hardware, I had to (re)compile it with the device vendor's compiler.
Where is my understanding wrong?
Firstly, OpenCL is built to work on CPUs and GPUs. You can compile and run the same source code on either type of device. However, it's very likely that CPU code will be sub-optimal for a GPU, and vice versa.
AMD hardware accounts for roughly 7%-14% of all x86/x64 CPUs, so AMD must develop compilers for both AMD and Intel chips to stay relevant, and AMD has a history of doing exactly that. Conversely, Intel has developed compilers that either don't work on AMD chips or don't work that well. That's no surprise.
With OpenCL, the AMD APP SDK is the most flexible: it works well on AMD and Intel CPUs as well as AMD GPUs. Intel's OpenCL SDK doesn't even install on AMD x86 hardware.
If you compile an OpenCL program to a binary, you can save and reuse it as long as it matches the OpenCL platform and device that created it. If you compile for one device and use the binary on another, you are very likely to get an error.
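That matching rule is exactly why a kernel-binary cache is usually keyed by the identity of the platform and device (and typically the driver version too, since a driver update can also invalidate binaries). A toy sketch of the idea; all the platform, device, and version strings here are made up for illustration:

```python
# In-memory stand-in for an on-disk cache of compiled kernel binaries.
cache = {}

def cache_key(platform_name, device_name, driver_version):
    """A binary is only valid for the exact platform/device/driver that built it."""
    return (platform_name, device_name, driver_version)

def store(key, binary):
    cache[key] = binary

def load(key):
    # Miss (None) means: recompile the kernel from source for this device.
    return cache.get(key)

k_amd = cache_key("AMD APP", "gfx900", "3075.10")
k_intel = cache_key("Intel OpenCL", "i7-9700K", "18.1")
store(k_amd, b"\x7fELF...")          # binary produced on the AMD device

assert load(k_intel) is None          # different device: must recompile
assert load(k_amd) is not None        # same device: safe to reuse
print("cache hit only for the matching device")
```

In a real application the key strings would come from clGetPlatformInfo and clGetDeviceInfo, and the binary from clGetProgramInfo, but the keying logic is the point here.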
The power of OpenCL is abstracting the underlying hardware and offering massive, parallel, heterogeneous computing power.
Some SDKs and platforms offer specific features to "optimize" code; I honestly think such features are just marketing, and they introduce boilerplate that makes the application less portable.
There are also some pseudo-new technologies that are just wrappers around OpenCL, or are really similar in concept, like Intel Quick Sync.
About Intel, I should say that at first they supported every Core generation and even some Core 2 Duo chips; now the new SDK only supports 3rd-generation Core processors. I honestly don't get their strategy. Intel is probably the last option if you want to adopt OpenCL and target the biggest possible audience, and their SDK doesn't seem to be very good either.
Stick with the standard and you will avoid both possible legal and performance issues, and your code will also be more portable.
The bottom line is that the AMD SDK includes a compiler for targeting x86 CPUs for OpenCL. That means that even though you are running an Intel CPU the generated code will run on it. It's the same concept as compiling a C program to run on an x86 CPU: it works on Intel and AMD CPUs (or any that implement the x86 instruction set).
The vendor's compiler might have specific optimizations, like user827992 mentions, but in my experience the performance of AMD's CPU compiler isn't that bad when running on an Intel CPU. I haven't tried Intel's OpenCL implementation.
It is true that for some (maybe most in the future) hardware, only the vendor's compiler will support it. AMD's SDK won't build code that will run on an NVIDIA card, and vice-versa. CPUs happen to be a bit of a special case in that the basic instruction set is so widely deployed that the CPU compiler will work on most machines you're likely to come in contact with.
I have a system with an NVIDIA graphics card, and I'm looking at using OpenCL to replace OpenMP for some small on-CPU tasks (thanks to VS2010 making OpenMP useless).
Since I have NVIDIA's OpenCL SDK installed, clGetPlatformIDs() only returns a single platform (NVIDIA's), and therefore only a single device (the GPU).
Do I need to also install Intel's OpenCL SDK to get access to the CPU platform?
Shouldn't the CPU platform always be available? I mean, how do you NOT have a CPU?
And how do you manage to build against two OpenCL SDKs simultaneously?
You need an SDK that provides an interface to the CPU. NVIDIA's does not; AMD's and Intel's SDKs do. In my case the one from Intel is significantly (something like 10x) faster, though that might be due to bad programming on my part.
You don't need the SDK for programs to run, just the runtime. On Linux, each vendor installs a file at /etc/OpenCL/vendors/*.icd containing the path of the runtime library to use. That directory is scanned by the OpenCL ICD loader you link against (libOpenCL.so), which then calls into each vendor's library when querying for devices on that particular platform.
On Linux, the GPU drivers install the OpenCL runtime automatically; the Intel runtime can likely be downloaded separately from the SDK, though of course it is part of the SDK as well.
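The ICD mechanism described above is simple enough to illustrate: each `*.icd` file is a one-line text file naming a vendor runtime library. A sketch that parses such a directory, using a temporary stand-in for /etc/OpenCL/vendors populated with typical (but here fabricated) vendor file names:

```python
import os
import tempfile

def read_icd_dir(vendors_dir):
    """Parse an OpenCL ICD vendors directory: each *.icd file holds the
    path (or soname) of one vendor's OpenCL runtime library."""
    libs = []
    for fname in sorted(os.listdir(vendors_dir)):
        if fname.endswith(".icd"):
            with open(os.path.join(vendors_dir, fname)) as f:
                libs.append(f.read().strip())
    return libs

# Demo against a fake vendors dir (the real one is /etc/OpenCL/vendors):
d = tempfile.mkdtemp()
with open(os.path.join(d, "intel.icd"), "w") as f:
    f.write("libintelocl.so\n")
with open(os.path.join(d, "amdocl64.icd"), "w") as f:
    f.write("libamdocl64.so\n")
print(read_icd_dir(d))  # → ['libamdocl64.so', 'libintelocl.so']
```

This is why installing a second vendor's runtime is enough to make its platform appear in clGetPlatformIDs: the ICD loader simply finds one more file in this directory.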
Today I finally got around to trying to start OpenCL development and wow... it is not straightforward at all.
There's an AMD SDK, there's an Intel SDK, there's an NVIDIA SDK, each with its own properties (CPU-only vs. GPU-only vs. support for specific video cards only, perhaps?).
There may be valid technical reasons for it having to be this way, but I really wish there were just one SDK, and that when programming you could perhaps specify GPU/CPU tasks, or that it would use whatever resources made the most sense / performed best, or SOMETHING.
Time to dive in though, I guess... trying to decide whether to go CPU or GPU. I have a pretty new $4000 Alienware laptop with SLI video cards, but also an 8-core CPU, so yeah... I guess I'll have to try a couple of SDKs and see which performs best for my needs?
Not sure what end users of my applications would do though... it doesn't seem like they can flip a switch to make it run on the CPU or GPU instead.
The OpenCL landscape really needs some help...