Disassemble an OpenCL kernel? - opencl

I'm not sure if it's possible. I want to study OpenCL in-depth, so I was wondering if there is a tool to disassemble a compiled OpenCL kernel.
For a normal x86 executable, I can use objdump to get a disassembly view. Is there a similar tool for OpenCL kernels yet?

If you're using NVIDIA's OpenCL implementation for their GPUs, you can do the following to disassemble an OpenCL kernel:
Use clGetProgramInfo() with CL_PROGRAM_BINARIES to dump the ptx code to a file, say ptxfile.ptx (see the sketch after these steps). Refer to the OpenCL specification for more details on this function.
Use nvcc to compile the ptx to a cubin file, for example: nvcc -cubin -arch=sm_20 ptxfile.ptx will compile ptxfile.ptx for a compute capability 2.0 device.
Use cuobjdump to disassemble the cubin file into GPU instructions. For example: cuobjdump -sass ptxfile.cubin
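For step 1, a minimal sketch of the dump might look like this, assuming the program has already been built for a single device (on NVIDIA's implementation the returned "binary" is the PTX text):
/* needs <CL/cl.h>, <stdio.h>, <stdlib.h>; program is a built cl_program */
size_t binary_size = 0;
clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(binary_size), &binary_size, NULL);
unsigned char *binary = (unsigned char *)malloc(binary_size);
unsigned char *binaries[] = { binary };  /* one entry per device the program was built for */
clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(binaries), binaries, NULL);
FILE *f = fopen("ptxfile.ptx", "wb");    /* write the PTX text out for nvcc/cuobjdump */
fwrite(binary, 1, binary_size, f);
fclose(f);
free(binary);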
Hope this helps.

I know that this is an old question, but in case someone comes looking here for disassembling an AMD GPU kernel, you can do the following in Linux:
export GPU_DUMP_DEVICE_KERNEL=3
This makes any kernel that is compiled on your machine dump the assembled code to a file in the same directory.
Source:
http://dis.unal.edu.co/~gjhernandezp/TOS/GPU/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf
Sections 4.2.1 and 4.2.2

The simplest solution, in my experience, is to use clang's OpenCL C compiler and emit SPIR.
It even works on Godbolt's compiler explorer:
https://godbolt.org/z/_JbXPb
Clang can also emit ptx (https://godbolt.org/z/4ARMqM) and amdhsa (https://godbolt.org/z/TduTZQ), but it may not correspond to the ptx and amdhsa assembly generated by the respective driver at runtime.
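For example, invocations along these lines work (exact flags vary by clang version, kernel.cl is just a placeholder file name, and you may need -Xclang -finclude-default-header so the OpenCL built-in functions are declared):
clang -x cl -S -emit-llvm -target spir64-unknown-unknown kernel.cl -o kernel.ll
clang -x cl -S -target nvptx64-nvidia-cuda kernel.cl -o kernel.ptx
clang -x cl -S -target amdgcn-amd-amdhsa -mcpu=gfx900 kernel.cl -o kernel.s
(-mcpu selects a specific GCN ISA; gfx900 is only an example.)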

If you work with an AMD GPU, you can use the Analyzer tool. It is free, cross-platform, and comes in two forms:
Command line tool (ships as part of the CodeXL package, search for the CodeXLAnalyzer executable after installing).
CodeXL GUI application (just switch to the Analyzer mode in CodeXL).
Here is a short summary of what you can do with the Analyzer:
Compile OpenCL kernels, OpenGL shaders and D3D shaders for any GPU that is supported by the installed driver (even without having the GPU physically installed on your system), and get the ISA. Using CodeXL Analyzer (option #2 above), you can get additional information such as an estimation for the number of clock cycles that are required to execute the instruction.
View the compiler-generated statistics (SGPRs usage, VGPRs usage, etc.)
Generate the AMD IL code for the OpenCL kernel.
Export the compiled binaries (ELF, in binary format).
You can download the CodeXL tool suite from here: https://gpuopen.com/compute-product/codexl/

As AMD CodeXLAnalyzer is not supported anymore, use the Radeon GPU Analyzer instead.

Related

What compiles .cl files when calling clBuildProgram() API?

I wonder what kind of compiler compiles .cl files when we call the clBuildProgram() API at runtime? Does that depend on the device?
When you create a program from source and call clBuildProgram(), the OpenCL runtime performs on-line compilation of the source. Each vendor's OpenCL runtime includes an OpenCL C compiler. Usually, the compiler is implemented as a shared library and supports only certain types of devices. For example, the Intel OpenCL runtime for GPUs uses the Intel Graphics Compiler library to compile the source for Intel GPU devices.
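A minimal sketch of that on-line compilation path (error handling mostly omitted; context and device are assumed to be an existing cl_context and cl_device_id, and source is a const char * holding the .cl text):
/* needs <CL/cl.h>, <stdio.h>, <stdlib.h> */
cl_int err;
cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, &err);
err = clBuildProgram(program, 1, &device, "-cl-std=CL1.2", NULL, NULL);
if (err != CL_SUCCESS) {
    /* the vendor's compiler reports errors through the build log */
    size_t log_size = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &log_size);
    char *log = (char *)malloc(log_size);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, log_size, log, NULL);
    fprintf(stderr, "%s\n", log);
    free(log);
}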

Out-of-tree galcore build causing Qt seg fault

I use Yocto (Krogoth) to build my imx6 images and toolchains; however, it's a bit heavy and slow for working on kernel drivers. As such, my dev cycle is to build the kernel on its own, just using the output of a "do_patch" run in Yocto as the source tree base and sourcing the toolchain environment.
This is normally not a problem, as mostly I'm focused on that end of the s/w stack. However, I now need to be able to run a Qt application (running under eglfs) on top of my continually updated kernel, for a bug hunt. To do this, I need the imx6 graphics driver working, so I get the galcore source from git://github.com/Freescale/kernel-module-imx-gpu-viv.git, export my kernel build directory, make it and deploy it. That module loads perfectly. However, running the working application that has already been built with Yocto causes a crash, somewhere in libQt5EglDeviceIntegration.so.5. All the libs etc. are part of the original working image, the same place I took my kernel source from.
What do I need to do to make this work? Is there some part of Qt tied to the graphics driver that's going to force me to rebuild the entire library? What's the relationship between galcore.ko and Qt? Is there now a weird dependency between my application and the linux kernel?!
EDIT: PEBCAK. I'm an idiot. I didn't check out the right SHA1 (the one listed in the recipe) for the galcore driver. Still, the answer below is instructive, so I'd like to keep this question.
What do I need to do to make this work?
No idea. Maybe your self-built galcore.ko is incompatible with the binary blob OpenGL libraries from Freescale somehow? Does the original galcore.ko work correctly? How does the backtrace look?
Is there some part of Qt tied to the graphics driver that's going to force me to rebuild the entire library?
No need to rebuild Qt. While Qt is linked against the OpenGL library, the OpenGL ABI/API is stable and therefore a Qt rebuild isn't needed. Besides that, you aren't changing the OpenGL libraries.
What's the relationship between galcore.ko and Qt?
Qt uses OpenGL for rendering when using QtQuick. The OpenGL library (libGL.so and a few variants like libGLes2.so) is provided by Freescale as a binary blob. The OpenGL library makes syscalls that end up in the galcore.ko kernel module.
libQt5EglDeviceIntegration.so.5 is the part in Qt that does the first OpenGL calls to initialize OpenGL.
Is there now a weird dependency between my application and the linux kernel?!
Well, yes, indirectly via Qt -> libGL.so -> kernel [galcore.ko]

How to interface my own LLVM backend with LLVM IR

I have my own linker and machine code converter. I am using my own assembly instructions for my machine. This machine is a software processor which executes machine code generated by an asm-to-hex converter. Instead of assembly, I want to use the C language now. My question is how to use LLVM for this purpose.
One approach could be:
Create a parser which reads the .s file (a sort of asm file) generated from LLVM IR and maps those instructions to my processor-specific asm instructions.
I do not want to create a linker and asm-to-machine-code converter again.
Is my approach OK, or what would be a better way to do that?
The *.s file you read is not just "sort of asm", it is actual assembly that has already passed through some LLVM backend, probably an x86 variant if you have not chosen a different target.
What you really want to do is to make LLVM emit assembly instructions for your own machine instead. This is what Writing an LLVM Backend and similar guides are about.
This is not exactly simple, but I expect that trying to translate some other machine's instruction set (let alone X86) to your own is probably even more difficult, as you would have to emulate each and every detail of a very complex machine.

OpenCL extensions with Apple's openclc

I'm interested in using an optional extension to OpenCL which adds certain functions to the OpenCL language (in particular, cl_khr_gl_msaa_sharing). I'm using Apple's openclc to compile my OpenCL sources at build time; however, openclc fails to compile my source (see below) because of calls to these new functions. The machine I'm running on does, indeed, support the extensions, and if I use clCreateProgramWithSource() and clBuildProgram() at runtime, everything works great.
Obviously, any build-time tool can't know which extensions will be supported at run time. However, I'd like to be able to compile my source at build time assuming the extension exists, then at run time query for the presence of the extension and degrade gracefully if it isn't present. Is there a mechanism for doing anything like this?
The top answer to OpenCL half4 type Apple OS X suggests using preprocessor defines inside the OpenCL program to detect extensions, but that won't help me as those defines are evaluated at build-time.
The particular build-time compiler error is this: error: target does not support depth and MSAA textures
I had a similar problem; I wanted the compiler to assume that the extension cl_khr_fp64 existed so I could use the boilerplate code
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
#error "Double precision floating point not supported by OpenCL implementation."
#endif
and have it compile as if double precision was available.
The solution (as of Xcode 6.3.1) was to select Target > Build Settings > OpenCL - Preprocessing, and add cl_khr_fp64 to the OPENCL_PREPROCESSOR_DEFINITIONS section.
The code then compiled without complaining within Xcode, assuming there were no syntax errors in my source code (which is what I wanted to check).
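For the run-time side of this (querying whether the extension is actually present before choosing a code path), a minimal sketch using the standard device-info query looks like the following; device is assumed to be a valid cl_device_id, and the extension name is whatever you care about (cl_khr_gl_msaa_sharing in the question, cl_khr_fp64 here):
/* needs <CL/cl.h>, <stdlib.h>, <string.h> */
size_t ext_size = 0;
clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, NULL, &ext_size);
char *extensions = (char *)malloc(ext_size);
clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ext_size, extensions, NULL);
int has_fp64 = strstr(extensions, "cl_khr_fp64") != NULL;
/* pick the kernel variant, or degrade gracefully, based on has_fp64 */
free(extensions);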

How do you access the cl namespace from NVIDIA header files?

I am sorry if this is a noob question, but I am new to C++ and part of the reason I am messing with OpenCL is to learn more C++.
I installed the CUDA SDK and it put openCL header files here:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\CL
I added the following two directories to the additional include directories in Visual C++:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\CL
But when I try to reference anything in the cl namespace, like they do in this tutorial, it does not work because cl is undefined.
This problem has already been solved so I'm only writing here to add some information.
Instead of using the Nvidia CUDA SDK you can use the Intel or AMD SDK (I prefer Intel). They both automatically include cl.hpp and support OpenCL 1.2 as well (Nvidia SDK only supports OpenCL 1.1). You may need to add #define CL_USE_DEPRECATED_OPENCL_1_1_APIS to make sure your kernel works on Nvidia devices.
The SDK has nothing to do with the device driver which compiles and runs the kernel. That is done by the vendor's video driver. In fact, you can install the Nvidia video drivers, the AMD Radeon drivers (even if you don't have an AMD video card), and the Intel OpenCL drivers. Then you can compile your host code with e.g. the Intel OpenCL SDK and run your own kernel on Nvidia GPUs and Intel/AMD CPUs.
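Regarding the CL_USE_DEPRECATED_OPENCL_1_1_APIS define mentioned above: it only needs to be in effect before the OpenCL header is included, for example:
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#include <CL/cl.h>   /* or <CL/cl.hpp> when using the C++ wrapper */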
The problem is that nVidia's OpenCL framework (bundled with CUDA) doesn't come with the C++ wrapper library. But fortunately that one is a single header-only library using the existing OpenCL C API under the hood. So all you need to do is to download the official cl.hpp from Khronos and include it in your source file (after putting it into an accessible include directory, best together with nVidia's own OpenCL headers). In fact you don't need to include any other header once you include and use cl.hpp.
But be aware that this C++ wrapper only works for OpenCL 1.1 (and is not the best C++ wrapper one could come up with either), though nVidia doesn't have OpenCL 1.2 support anyway.
