I am sorry if this is a noob question but I am new to C++ and part of the reason I am messing with openCL is to learn more C++.
I installed the CUDA SDK and it put openCL header files here:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\CL
I added the following two directories to the additional include directories in Visual C++:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include\CL
But when I try to reference anything in the cl namespace, as they do in this tutorial, it does not work because cl is undefined.
This problem has already been solved so I'm only writing here to add some information.
Instead of using the Nvidia CUDA SDK you can use the Intel or AMD SDK (I prefer Intel). They both automatically include cl.hpp and support OpenCL 1.2 as well (Nvidia SDK only supports OpenCL 1.1). You may need to add #define CL_USE_DEPRECATED_OPENCL_1_1_APIS to make sure your kernel works on Nvidia devices.
The SDK has nothing to do with the device driver which compiles and runs the kernel. That is done by a vendor's video driver. In fact you can install the Nvidia video drivers, the AMD Radeon drivers (even if you don't have an AMD video card), and the Intel OpenCL drivers. Then you can compile your host code with e.g. the Intel OpenCL SDK and run your kernel on Nvidia GPUs and Intel/AMD CPUs.
The problem is that nVidia's OpenCL framework (bundled with CUDA) doesn't come with the C++ wrapper library. But fortunately that one is a single header-only library using the existing OpenCL C API under the hood. So all you need to do is to download the official cl.hpp from Khronos and include it in your source file (after putting it into an accessible include directory, best together with nVidia's own OpenCL headers). In fact you don't need to include any other header once you include and use cl.hpp.
But be aware that this C++ wrapper only works for OpenCL 1.1 (and is anything but the best C++ wrapper one can come up with either), but nVidia doesn't have OpenCL 1.2 support anyway.
I wonder what kind of compiler compiles .cl files when we call the clBuildProgram() API at runtime? Does that depend on the device?
When you create a program from source and call clBuildProgram(), the OpenCL runtime performs online compilation of the source. Each vendor's OpenCL runtime includes an OpenCL C compiler. Usually, the compiler is implemented as a shared library and supports only certain types of devices. For example, the Intel OpenCL runtime for GPU uses the Intel Graphics Compiler library to compile the source for Intel GPU devices.
I have an existing 64-bit Qt Linux project (C/C++), and now I want to add additional hardware. Unfortunately the hardware vendor provides an SDK with a 32-bit binary-only C .so.
Just including the library leads to an error like this:
/usr/bin/x86_64-linux-gnu-ld: skipping incompatible /home/SDK/lib when searching for -example
/usr/bin/x86_64-linux-gnu-ld: cannot find -example
Is there any way to include this library into my existing project?
I found Mixing 32 and 64-bit Libraries in Linux (gcc), but maybe there are some changes as it's already 7 years old.
Thank you in advance!
The x86 and amd64 ABIs are completely different on Linux, so you can't call 32-bit libraries from 64-bit code directly. That said, you can achieve your objective by creating a separate 32-bit program that proxies calls into the library and exposes them via REST, WSDL, Protobuf, or your favorite way of doing IPC, and then making those calls from the 64-bit process.
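To make the proxy idea concrete, here is a minimal sketch of one round trip over a pair of POSIX pipes. It is only an illustration under assumptions: `vendor_add_one` is a hypothetical stand-in for a function from the vendor library, `call_via_proxy` is a made-up name, and the helper runs as a forked child of the same binary. In a real setup the child branch would exec() a separately built 32-bit executable that links the .so.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a function exported by the 32-bit vendor library.
static std::int32_t vendor_add_one(std::int32_t x) { return x + 1; }

// Read exactly n bytes, retrying on short reads.
static bool read_all(int fd, void* buf, std::size_t n) {
    assert(buf != nullptr);
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0) return false;
        p += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

// Write exactly n bytes, retrying on short writes.
static bool write_all(int fd, const void* buf, std::size_t n) {
    assert(buf != nullptr);
    const char* p = static_cast<const char*>(buf);
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0) return false;
        p += w;
        n -= static_cast<std::size_t>(w);
    }
    return true;
}

// Forwards one call to a helper process and reads the answer back.
// Returns false if any step of the round trip fails.
bool call_via_proxy(std::int32_t arg, std::int32_t& result) {
    int to_child[2];
    int from_child[2];
    if (pipe(to_child) != 0) return false;
    if (pipe(from_child) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return false;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return false;
    }

    if (pid == 0) {
        // Helper side. A real proxy would exec() the 32-bit executable here.
        close(to_child[1]);
        close(from_child[0]);
        std::int32_t in = 0;
        bool ok = read_all(to_child[0], &in, sizeof in);
        if (ok) {
            std::int32_t out = vendor_add_one(in);
            ok = write_all(from_child[1], &out, sizeof out);
        }
        _exit(ok ? 0 : 1);
    }

    // Caller side (the 64-bit application).
    close(to_child[0]);
    close(from_child[1]);
    bool ok = write_all(to_child[1], &arg, sizeof arg) &&
              read_all(from_child[0], &result, sizeof result);
    close(to_child[1]);
    close(from_child[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Fixed-width types such as std::int32_t are used on the wire on purpose, since long and pointer sizes differ between the 32-bit and 64-bit sides.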
I'm trying to build a simple application with CUDA and I've been trying for hours on end and I just can't make it work on Windows. nvcc absolutely refuses to compile without Visual Studio's compiler, which doesn't support things I need. I tried building using nvcc with clang but it just asks me to use Visual Studio's compiler. I've also tried using clang directly since it now supports CUDA but I receive this error:
clang++.exe: error: Unsupported CUDA gpu architecture: compute_52
This makes no sense to me because I have the CUDA toolkit version 7.5 and my graphics card is a GTX 970 (two of them). I have googled this extensively, and everywhere I come across this error, the person's CUDA toolkit is older than 7.5. I'm on the brink of tears right now trying to get something as simple as a VLA to work in this CUDA application and I just can't achieve it...
The CUDA windows toolchain requires the Visual Studio C++ compiler. You cannot use anything else on that platform. If the VS compiler doesn't support the language features you need within CUDA host code, you have no choice but to change platforms, or your expectations.
You can still potentially compile non-CUDA host code using another compiler and then link that code using NVCC and the VS toolchain.
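One way that split could look, as a rough sketch only: keep the host-only logic behind a function with C linkage, so the object file built by the other compiler exposes an unmangled symbol for the final link step to resolve. The names `host_median` and `median_of` are hypothetical, and whether the two compilers' object files and runtime libraries are actually link-compatible on your setup is an assumption you would need to verify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface, declared in a header shared by both sides.
// C linkage avoids depending on either compiler's C++ name mangling.
extern "C" double host_median(const double* values, std::size_t count);

// --- host_part.cpp: would be built with the alternative compiler ---
extern "C" double host_median(const double* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    if (count == 0) return 0.0;
    // std::vector gives run-time-sized scratch storage in standard C++
    // (VLAs are not part of standard C++).
    std::vector<double> scratch(values, values + count);
    std::sort(scratch.begin(), scratch.end());
    const std::size_t mid = count / 2;
    return (count % 2 == 1) ? scratch[mid]
                            : 0.5 * (scratch[mid - 1] + scratch[mid]);
}

// --- caller side: would live in the .cu file compiled by nvcc ---
double median_of(const std::vector<double>& data) {
    return host_median(data.data(), data.size());
}
```

Only plain pointers and sizes cross the boundary here, which keeps the interface independent of either compiler's standard library layout.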
Try to use clang-cl as the host compiler: -ccbin clang-cl.exe
It may be worth working in a Linux VM or WSL2 within Windows. As per the CUDA docs:
To compile new CUDA applications, a CUDA Toolkit for Linux x86 is
needed. CUDA Toolkit support for WSL is still in preview stage as
developer tools such as profilers are not available yet. However, CUDA
application development is fully supported in the WSL2 environment, as
a result, users should be able to compile new CUDA Linux applications
with the latest CUDA Toolkit for x86 Linux.
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#:~:text=However%2C%20CUDA%20application%20development%20is,becomes%20available%20within%20WSL%202.
I'm not sure if it's possible. I want to study OpenCL in-depth, so I was wondering if there is a tool to disassemble a compiled OpenCL kernel.
For normal x86 executable, I can use objdump to get a disassembly view. Is there a similar tool for OpenCL kernel, yet?
If you're using NVIDIA's OpenCL implementation for their GPUs, you can do the following to disassemble an OpenCL kernel:
Use clGetProgramInfo() (with CL_PROGRAM_BINARIES) to dump the PTX code to a file, say ptxfile.ptx. Please refer to the OpenCL specification for more details on this function.
Use nvcc to compile the PTX to a cubin file, for example: nvcc -cubin -arch=sm_20 ptxfile.ptx will compile ptxfile.ptx for a compute capability 2.0 device.
Use cuobjdump to disassemble the cubin file into GPU instructions. For example: cuobjdump -sass ptxfile.cubin
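The file-writing half of the first step can be sketched like this. To keep the snippet compilable without the OpenCL headers, the runtime query appears only in comments; `save_program_binary` is a hypothetical helper name, and the bytes passed in are assumed to be what the runtime returned for one device.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// The bytes are assumed to come from the OpenCL runtime, roughly:
//   clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, ...)  -> one size per device
//   clGetProgramInfo(program, CL_PROGRAM_BINARIES, ...)      -> one buffer per device
// With NVIDIA's implementation that buffer is expected to hold PTX text.
bool save_program_binary(const std::vector<unsigned char>& binary,
                         const std::string& path) {
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(binary.data()),
              static_cast<std::streamsize>(binary.size()));
    // flush() so that a failed write is reported to the caller.
    return static_cast<bool>(out.flush());
}
```

Writing in binary mode matters on Windows, where text mode would otherwise rewrite line endings in the dumped file.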
Hope this helps.
I know that this is an old question, but in case someone comes looking here for how to disassemble an AMD GPU kernel, you can do the following in Linux:
export GPU_DUMP_DEVICE_KERNEL=3
This makes any kernel that is compiled on your machine dump its assembled code to a file in the same directory.
Source:
http://dis.unal.edu.co/~gjhernandezp/TOS/GPU/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf
Sections 4.2.1 and 4.2.2
The simplest solution, in my experience, is to use Clang's OpenCL C compiler and emit SPIR.
It even works on Godbolt's compiler explorer:
https://godbolt.org/z/_JbXPb
Clang can also emit ptx (https://godbolt.org/z/4ARMqM) and amdhsa (https://godbolt.org/z/TduTZQ), but it may not correspond to the ptx and amdhsa assembly generated by the respective driver at runtime.
If you work with an AMD GPU, you can use the Analyzer tool. It is free, cross-platform, and comes in two forms:
Command line tool (ships as part of the CodeXL package, search for the CodeXLAnalyzer executable after installing).
CodeXL GUI application (just switch to the Analyzer mode in CodeXL).
Here is a short summary of what you can do with the Analyzer:
Compile OpenCL kernels, OpenGL shaders and D3D shaders for any GPU that is supported by the installed driver (even without having the GPU physically installed on your system), and get the ISA. Using CodeXL Analyzer (option #2 above), you can get additional information such as an estimation for the number of clock cycles that are required to execute the instruction.
View the compiler-generated statistics (SGPRs usage, VGPRs usage, etc.)
Generate the AMD IL code for the OpenCL kernel.
Export the compiled binaries (ELF, in binary format).
You can download the CodeXL tool suite from here: https://gpuopen.com/compute-product/codexl/
As AMD CodeXLAnalyzer is not supported anymore, use Radeon GPU Analyzer.
I'd like to get FreeRTOS running on an MSP430 processor using Code Composer Essentials v3.1. I found an example of just this at http://www.westmorelandengineering.com/toc.htm. Specifically I’m working with FreeRTOS_Demo.zip, the top one. When I try to open it with CCE I get an error that the workspace "was not created by this version of Code Composer". So I tried to import the project and I get an error "The Managed Make project could not be read because of the following error: Project type com.ti.ccstudio.managedbuild.ui.programTargetID not found. Managed Make functionality will not be available for this project."
I’m wondering what my problem is and how I can get the project to build, or should I go about this a different way?
FreeRTOS supports many, many, many chips and many, many, many compilers. Anything that is not standard C code is kept in a port layer.
The next FreeRTOS release (V7, out in the next couple of weeks and already available in the SVN repository) includes a CCS4 port and demo for the MSP430F5438 (MSP430X core).
Regards.
I was told that TI's CCS compiler suite (used in CCE/CCS) will not build the FreeRTOS sources because the FreeRTOS sources include stuff written in gnu assembler syntax (file extension .s is common between CCS asm and Gnu asm, but syntax is not the same). Until FreeRTOS is "ported" to the CCS compiler suite, your best bet is to use the full CCS with the GCC compiler instead of the CCS compiler.
reviving a zombie thread... not sure if CCE is even relevant now... you can get CCS 5.3 with code-size limited free MSP430 support.
I recently ported FreeRTOS to the CC430 using the new MSP430Ware driver library from TI and Code Composer Studio 5.3. Get it here:
http://www.freertos.org/Interactive_Frames/Open_Frames.html?http://interactive.freertos.org/entries/22894958-cc430f5137-ccs-5-3