I wonder what kind of compiler compiles .cl files when we call the clBuildProgram() API at runtime. Does that depend on the device?
When you create a program from source and call clBuildProgram(), the OpenCL runtime performs online compilation of the source. Each vendor's OpenCL runtime includes an OpenCL C compiler. Usually, the compiler is implemented as a shared library and supports only certain types of devices. For example, the Intel OpenCL runtime for GPUs uses the Intel Graphics Compiler library to compile the source for Intel GPU devices.
Related
clMath is an open-source project provided by AMD. It contains a clBLAS library (source code). I checked the repo and found that all functions are written in C, not in OpenCL. Was I looking at the wrong files? Where are the OpenCL kernels? How can a C function be used in parallel computing?
Actually, there is a folder named clTemplates and all .cl files are in this folder. I suppose that when a function is called, it will generate a .cl file based on one of the files in clTemplates folder, right? Hence, there are only 39 basic OpenCL kernels.
More generally, I want to know how the AMD SDK works.
My understanding is that a SPIR binary is supposed to be LLVM bitcode and SPIR IR is a subset of LLVM IR. Additionally, SPIR is device agnostic. I've tried using the llvm-dis command on the binary I get from clGetProgramInfo with CL_PROGRAM_BINARIES as the parameter, but it tells me "Invalid bitcode signature". llvm-bcanalyzer returns "Invalid record at top-level".
I can go the opposite way by using Clang to turn my OpenCL kernel into either LLVM IR or LLVM bitcode. However, the bitcode file size is about 10x smaller so I'm pretty sure it's not the same as my SPIR binary.
Just to be complete, my GPU does have the cl_khr_spir extension.
Is my understanding of a SPIR binary as LLVM bitcode correct?
Is there a way to disassemble a SPIR binary to LLVM IR?
You are correct in that SPIR 1.2 is a subset of LLVM IR (specifically LLVM 3.2). Note that the most recent version of SPIR (known as SPIR-V) is not derived from LLVM IR, and is a standalone, from-scratch intermediate representation.
Using llvm-dis is the correct way to disassemble an LLVM-based SPIR binary. Since SPIR 1.2 is derived from LLVM 3.2, this is only really guaranteed to work for an LLVM 3.2 version of llvm-dis. In practice, I've found that this still works fine with newer versions of LLVM, but there's no guarantee that this will always be the case.
Although your device supports the cl_khr_spir extension, there is no requirement for it to actually return a SPIR binary when you query CL_PROGRAM_BINARIES from clGetProgramInfo. Many platforms will instead return a native binary (e.g. x86 or the native GPU ISA), or some other intermediate representation (this is likely why LLVM is failing to recognise your binaries as being LLVM-based). There is no standardised mechanism for retrieving a SPIR binary via the OpenCL runtime API.
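A quick way to see what CL_PROGRAM_BINARIES actually returned is to look at the first four bytes of the blob. The following is a minimal sketch, not part of any OpenCL or LLVM API: the names blob_kind and classify_blob are hypothetical, and the check only covers a few well-known signatures (raw LLVM bitcode 'B' 'C' 0xC0 0xDE, the LLVM bitcode wrapper 0x0B17C0DE, the SPIR-V magic word 0x07230203 in either byte order, and ELF). Anything else, including textual formats such as PTX, is reported as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a program binary by its leading bytes. */
typedef enum {
    BLOB_UNKNOWN,
    BLOB_LLVM_BITCODE, /* what llvm-dis expects (SPIR 1.2 / 2.0) */
    BLOB_SPIRV,        /* SPIR-V module, not LLVM bitcode */
    BLOB_ELF           /* typically a native or vendor-specific container */
} blob_kind;

static blob_kind classify_blob(const unsigned char *data, size_t size)
{
    /* Raw LLVM bitcode starts with 'B' 'C' 0xC0 0xDE. */
    static const unsigned char bc_raw[4]  = { 'B', 'C', 0xC0, 0xDE };
    /* Bitcode wrapper header: 0x0B17C0DE stored little-endian. */
    static const unsigned char bc_wrap[4] = { 0xDE, 0xC0, 0x17, 0x0B };
    /* SPIR-V magic word 0x07230203, little- and big-endian forms. */
    static const unsigned char spv_le[4]  = { 0x03, 0x02, 0x23, 0x07 };
    static const unsigned char spv_be[4]  = { 0x07, 0x23, 0x02, 0x03 };
    /* ELF identification bytes. */
    static const unsigned char elf[4]     = { 0x7F, 'E', 'L', 'F' };

    assert(data != NULL || size == 0);

    if (size < 4)
        return BLOB_UNKNOWN;
    if (memcmp(data, bc_raw, 4) == 0 || memcmp(data, bc_wrap, 4) == 0)
        return BLOB_LLVM_BITCODE;
    if (memcmp(data, spv_le, 4) == 0 || memcmp(data, spv_be, 4) == 0)
        return BLOB_SPIRV;
    if (memcmp(data, elf, 4) == 0)
        return BLOB_ELF;
    return BLOB_UNKNOWN;
}
```

If the result is anything other than BLOB_LLVM_BITCODE, an "Invalid bitcode signature" error from llvm-dis is expected, because the runtime did not hand back LLVM bitcode in the first place.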
Using clang to compile an OpenCL C kernel into LLVM IR/SPIR 1.2 is the best way to get an LLVM bitcode file, which can then be disassembled with llvm-dis. Some vendors (e.g. Intel) also ship offline compilers with their OpenCL SDKs that provide dedicated commands/tools to do this.
I'm interested in using an optional extension to OpenCL which adds certain functions to the OpenCL language (in particular, cl_khr_gl_msaa_sharing). I'm using Apple's openclc to compile my OpenCL sources at build-time, however, openclc fails to compile my source (see below) because of calls to these new functions. The machine I'm running on does, indeed, support the extensions, and if I use clCreateProgramWithSource() and clBuildProgram() at runtime, everything works great.
Obviously, no build-time tool can know which extensions are supported at run-time. However, I'd like to be able to compile my source at build-time assuming the extension exists, then at run-time query for the presence of the extension and degrade gracefully if the extension isn't present. Is there a mechanism for doing anything like this?
The top answer to OpenCL half4 type Apple OS X suggests using preprocessor defines inside the OpenCL program to detect extensions, but that won't help me as those defines are evaluated at build-time.
The particular build-time compiler error is this: error: target does not support depth and MSAA textures
I had a similar problem; I wanted the compiler to assume that the extension cl_khr_fp64 existed so I could use the boilerplate code
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
#error "Double precision floating point not supported by OpenCL implementation."
#endif
and have it compile as if double precision was available.
The solution (as of XCode 6.3.1) was to select Target > Build Settings > OpenCL - Preprocessing, and add cl_khr_fp64 to the OPENCL_PREPROCESSOR_DEFINITIONS section.
The code then compiled without complaining within XCode, assuming there were no syntax errors in my source code (which is what I wanted to check).
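For the run-time half of the question (checking for the extension and degrading gracefully), the usual approach is to query the device's extension string with clGetDeviceInfo() and CL_DEVICE_EXTENSIONS, which returns a space-separated list of names, and then look for the one you need. Below is a minimal sketch of the lookup only; has_extension is a hypothetical helper name, and the extension string is passed in as a plain C string so the example does not depend on any OpenCL headers. It matches whole names, so "cl_khr_gl_sharing" does not match inside a longer name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a complete, space-delimited token in
 * `ext_list` (the format of the CL_DEVICE_EXTENSIONS string). */
static bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and end at a space or at the end of the string. */
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len; /* keep searching after this partial match */
    }
    return false;
}
```

A host program could call has_extension(extensions, "cl_khr_gl_msaa_sharing") once per device and pick the precompiled MSAA kernel only when it returns true, falling back to a non-MSAA path otherwise.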
I am sorry if this is a noob question but I am new to C++ and part of the reason I am messing with openCL is to learn more C++.
I installed the CUDA SDK and it put openCL header files here:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\CL
I added the following two directories to the additional include directories in Visual C++:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\CL
But when I try to reference anything in the cl namespace, like they do in this tutorial it does not work because cl is undefined.
This problem has already been solved so I'm only writing here to add some information.
Instead of using the Nvidia CUDA SDK you can use the Intel or AMD SDK (I prefer Intel). They both automatically include cl.hpp and support OpenCL 1.2 as well (Nvidia SDK only supports OpenCL 1.1). You may need to add #define CL_USE_DEPRECATED_OPENCL_1_1_APIS to make sure your kernel works on Nvidia devices.
The SDK has nothing to do with the device driver, which compiles and runs the kernel. That is done by a vendor's video driver. In fact you can install the Nvidia video drivers, the AMD Radeon drivers (even if you don't have an AMD video card), and the Intel OpenCL drivers. Then you can compile your host code with e.g. the Intel OpenCL SDK and run your kernel on Nvidia GPUs and Intel/AMD CPUs.
The problem is that nVidia's OpenCL framework (bundled with CUDA) doesn't come with the C++ wrapper library. But fortunately that one is a single header-only library using the existing OpenCL C API under the hood. So all you need to do is to download the official cl.hpp from Khronos and include it in your source file (after putting it into an accessible include directory, best together with nVidia's own OpenCL headers). In fact you don't need to include any other header once you include and use cl.hpp.
But be aware that this C++ wrapper only works for OpenCL 1.1 (and is anything but the best C++ wrapper one can come up with either), but nVidia doesn't have OpenCL 1.2 support anyway.
I'm not sure if it's possible. I want to study OpenCL in depth, so I was wondering if there is a tool to disassemble a compiled OpenCL kernel.
For a normal x86 executable, I can use objdump to get a disassembly view. Is there a similar tool for OpenCL kernels yet?
If you're using NVIDIA's OpenCL implementation for their GPUs, you can do the following to disassemble an OpenCL kernel:
1. Use clGetProgramInfo() with CL_PROGRAM_BINARIES to dump the ptx code to a file, say ptxfile.ptx. Please refer to the OpenCL specification for more details on this function.
2. Use nvcc to compile the ptx to a cubin file. For example, nvcc -cubin -arch=sm_20 ptxfile.ptx will compile ptxfile.ptx for a compute capability 2.0 device.
3. Use cuobjdump to disassemble the cubin file into GPU instructions. For example: cuobjdump -sass ptxfile.cubin
Hope this helps.
I know that this is an old question, but in case someone comes looking here for how to disassemble an AMD GPU kernel, you can do the following on Linux:
export GPU_DUMP_DEVICE_KERNEL=3
This makes any kernel that is compiled on your machine dump the assembled code to a file in the same directory.
Source:
http://dis.unal.edu.co/~gjhernandezp/TOS/GPU/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf
Sections 4.2.1 and 4.2.2
The simplest solution, in my experience, is to use Clang's OpenCL C compiler and emit SPIR.
It even works on Godbolt's compiler explorer:
https://godbolt.org/z/_JbXPb
Clang can also emit ptx (https://godbolt.org/z/4ARMqM) and amdhsa (https://godbolt.org/z/TduTZQ), but it may not correspond to the ptx and amdhsa assembly generated by the respective driver at runtime.
If you work with an AMD GPU, you can use the Analyzer tool. It is free, cross-platform, and comes in two forms:
Command line tool (ships as part of the CodeXL package, search for the CodeXLAnalyzer executable after installing).
CodeXL GUI application (just switch to the Analyzer mode in CodeXL).
Here is a short summary of what you can do with the Analyzer:
Compile OpenCL kernels, OpenGL shaders and D3D shaders for any GPU that is supported by the installed driver (even without having the GPU physically installed on your system), and get the ISA. Using CodeXL Analyzer (option #2 above), you can get additional information such as an estimation for the number of clock cycles that are required to execute the instruction.
View the compiler-generated statistics (SGPRs usage, VGPRs usage, etc.)
Generate the AMD IL code for the OpenCL kernel.
Export the compiled binaries (ELF, in binary format).
You can download the CodeXL tool suite from here: https://gpuopen.com/compute-product/codexl/
As AMD CodeXLAnalyzer is no longer supported, use Radeon GPU Analyzer instead.