I'm a graduate student from Korea.
Recently, I started studying OpenCL and the NVIDIA Jetson TK1.
The NVIDIA Jetson TK1 can run CUDA programs, so here is my question:
I would like to execute OpenCL kernels on the NVIDIA Jetson, but compiling a simple example gives me the error message "CL/cl.h: No such file or directory".
So, how should I proceed to compile and execute OpenCL kernels on the NVIDIA Jetson TK1?
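For reference, a minimal host program that triggers this is sketched below (the file name hello_cl.c is made up; the only API call is clGetPlatformIDs):

/* hello_cl.c - minimal OpenCL host program */
#include <stdio.h>
#include <CL/cl.h>   /* the header the compiler cannot find */

int main(void) {
    cl_uint num_platforms = 0;
    /* query only the number of available OpenCL platforms */
    clGetPlatformIDs(0, NULL, &num_platforms);
    printf("platforms: %u\n", num_platforms);
    return 0;
}

A compile line such as
$ gcc hello_cl.c -o hello_cl -lOpenCL
fails with the quoted message unless the OpenCL headers are on the include path (desktop CUDA toolkits usually ship them under /usr/local/cuda/include, so -I/usr/local/cuda/include may help; whether the TK1's CUDA package includes them is an assumption to verify).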
I have a device with the following configuration:
Chipset architecture - Intel NM10 express
Processor - Atom D2550 Dual Core
Display - DVI
Volatile Memory - 2GB DDR3
Storage - 16GB
Objective: the device should run a Yocto-built embedded OS successfully.
What I have done:
Downloaded the three required Yocto layers on the warrior branch, i.e. 1. poky 2. meta-openembedded 3. meta-intel
Modified local.conf with MACHINE ??= "intel-core2-32"
Ran source poky/oe-init-build-env
Generated a .hddimg with bitbake core-image-minimal
Flashed the .hddimg to a thumb drive with the dd command
Attached the thumb drive to the device; I could see the BOOT and INSTALL options, but selecting either of them does nothing (not even logs), i.e. a blank screen
Troubleshooting I have tried:
Tried to boot Lubuntu from the same drive, which was successful
Replaced the kernel and initrd of Lubuntu with Yocto's; booting was still successful, which indicates there is no issue with the kernel or initrd in the .hddimg generated by Yocto
Tried some experiments with syslinux as well, but they didn't work out
The .hddimg image type is quite outdated these days, and meta-intel has also switched to wic. Their README includes very good information on how to create bootable and installable images here and here.
A short summary of it:
for booting, use the .wic file (a rough sketch follows below)
for building an installer, set up the image and bootloader configuration according to the documentation, then use the .wic file
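As a sketch, the boot flow could look like this (the image name and the target device are assumptions; adjust them to your build):
$ bitbake core-image-minimal
$ sudo dd if=tmp/deploy/images/intel-core2-32/core-image-minimal-intel-core2-32.wic of=/dev/sdX bs=4M status=progress && sync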
I'm developing on CentOS 7.6 64-bit with an NVIDIA graphics card.
I've installed the NVIDIA driver and the CUDA driver.
But when I run clinfo, it shows:
Number of platforms 0
What should I check and how can I solve it?
The NVIDIA driver version that CUDA expects and your installed display driver should match. I faced this problem yesterday and solved it by installing the NVIDIA driver suggested in the CUDA .run file.
Make sure that /var/lib/dkms/nvidia/<version> is properly linked against your kernels.
See my topic for more info:
Sample deviceQuery CUDA program
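A few checks that might help narrow this down (a sketch; clinfo discovers platforms through the ICD files under /etc/OpenCL/vendors, and the paths below are the usual defaults, not taken from your system):
$ nvidia-smi                          # reported driver version should match what CUDA expects
$ cat /etc/OpenCL/vendors/nvidia.icd  # should name libnvidia-opencl.so.1
$ ls /var/lib/dkms/nvidia/            # the module must be built for your running kernel (uname -r)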
I'm reading this document about how to compile C/C++ code using the Intel C++ compiler and AVX-512 support on an Intel Knights Landing.
However, I'm a little bit confused about this part:
-xMIC-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512ER and AVX-512PF.
-xCORE-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512BW, AVX-512DQ and AVX-512VL.
For example, to generate Intel AVX-512 instructions for the Intel Xeon Phi processor x200, you should use the option -xMIC-AVX512. For example, on a Linux system:
$ icc -xMIC-AVX512 application.c
This compiler option is useful when you want to build a huge binary for the Intel Xeon Phi processor x200. Instead of building it on the coprocessor where it will take more time, build it on an Intel Xeon processor-based machine.
My Xeon Phi KNL is not a coprocessor (there is no need to ssh to micX or to compile with the -mmic flag). However, I don't understand whether it's better to use -xMIC-AVX512 or -xCORE-AVX512.
Secondly, about -ax instead of -x:
This compiler option is useful when you try to build a binary that can run on multiple platforms.
So -ax is used for cross-platform support, but is there any performance difference compared to -x?
For the first question, please use -xMIC-AVX512 if you want to compile for the Intel Xeon Phi processor x200 (aka the KNL processor). Note that the phrase in the paper you mentioned was mistyped; it should read "This compiler option is useful when you want to build a huge binary for the Intel Xeon Phi processor x200. Instead of building it on the Intel Xeon Phi processor x200 where it will take more time, build it on an Intel Xeon processor-based machine."
For the second question, there should not be a performance difference if you run the binaries on an Intel Xeon Phi processor x200. However, the size of the binary compiled with -ax should be bigger than the one compiled with the -x option.
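In concrete terms, the two choices might look like this (a sketch; application.c is a placeholder):
$ icc -xMIC-AVX512 application.c -o app.knl                # one KNL-specific code path
$ icc -axMIC-AVX512,CORE-AVX512 application.c -o app.fat   # baseline x86 path plus two optimized paths, selected at run time
The -ax binary is bigger because it carries the extra code paths, which matches the size difference described above.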
Another option from the link you provided is to build with -xCOMMON-AVX512. This is a tempting option because in my case it has all the instructions that I need, and I can use the same option for both a KNL and a Skylake-AVX512 system. Since I don't build on a KNL system, I cannot use -xHost (or -march=native with GCC).
However, -xCOMMON-AVX512 should NOT be used with KNL. The reason is that it generates the vzeroupper instruction (https://godbolt.org/z/PgFX55), which is not only unnecessary but actually very slow on a KNL system.
Agner Fog writes in the KNL section of his microarchitecture manual:
The VZEROALL or VZEROUPPER instructions are not only superfluous here, they are actually
harmful for the performance. A VZEROALL or VZEROUPPER instruction takes 36 clock cycles
in 64 bit mode...
Therefore, for a KNL system you should use -xMIC-AVX512; for other systems with AVX-512 you should use -xCORE-AVX512 (or -xSKYLAKE-AVX512). I use -qopt-zmm-usage=high as well.
I am not aware of a switch for ICC to disable vzeroupper once it is enabled (with GCC you can use -mno-vzeroupper).
Incidentally, by the same logic you should use -march=knl with GCC and not -mavx512f (-mavx512f -mno-vzeroupper may work if you are sure you don't need AVX512ER or AVX512PF).
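The GCC equivalents under the same reasoning would be something like (a sketch; app.c is a placeholder):
$ gcc -O3 -march=knl app.c -o app                  # enables AVX512F/CD/ER/PF and does not emit vzeroupper
$ gcc -O3 -mavx512f -mno-vzeroupper app.c -o app   # only if you are sure AVX512ER/PF are not needed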
The normal way to run an OpenCL program is to ship the OpenCL kernel source and compile it at runtime (online compilation).
But I've seen examples of compiling OpenCL to a binary beforehand, called offline compilation. I'm aware of the disadvantages (reduced compatibility across hardware).
There used to be an offline compiler at http://www.fixstars.com/en/ but it does not seem to be available anymore.
So is there an offline compiler for OpenCL available, in particular for NVIDIA-based cards?
Someone suggested that nvcc.exe in NVIDIA's CUDA SDK could compile .cl files with
nvcc -x cu file.cl -o file.out -lOpenCL
...but it complains about a missing cl.exe, at least on Windows (nvcc relies on the MSVC host compiler there). This might be worth checking out, however: http://clcc.sourceforge.net/
As well as:
https://github.com/HSAFoundation/CLOC (AMD-maintained offline compiler)
https://github.com/Maratyszcza/clcc (includes also links to above ones and more)
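On NVIDIA hardware specifically, one way to get an offline binary without any extra tool is to build the program online once and save what the driver hands back (with NVIDIA this is typically PTX text). The following is a sketch of that standard OpenCL API pattern; the helper name save_program_binary is made up, and the program is assumed to have been built for a single device:

#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Save the device binary of a program already built with clBuildProgram;
   it can be loaded in a later run with clCreateProgramWithBinary. */
static int save_program_binary(cl_program program, const char *path)
{
    size_t size = 0;   /* one size per device; we assume exactly one device */
    clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, NULL);

    unsigned char *bin = malloc(size);
    unsigned char *bins[] = { bin };   /* one output pointer per device */
    clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(bins), bins, NULL);

    FILE *f = fopen(path, "wb");
    if (!f) { free(bin); return -1; }
    fwrite(bin, 1, size, f);
    fclose(f);
    free(bin);
    return 0;
}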
I'm using OpenCL on OS X, and I was wondering if someone could tell me which compiler is used to generate the GPU binary from the OpenCL kernel source code. On OS X, is the OpenCL kernel compiled to LLVM IR first, then optimized, and finally compiled to GPU native code? I was also wondering whether the OpenCL kernel compiler performs optimizations such as loop-invariant code motion on the kernel.
Yes, on Mac OS X all OpenCL code is compiled to LLVM IR, which is then passed to device-specific optimizations and code generation.
You can generate LLVM bitcode files offline and use the result with clCreateProgramWithBinary. The openclc compiler is inside the OpenCL framework (/System/Library/Frameworks/OpenCL.framework/Libraries/openclc). You need these options (arch can be i386, x86_64, or gpu_32):
openclc -c -o foo.bc -arch gpu_32 -emit-llvm foo.cl
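If you need to cover more than one device type, you could run it once per architecture (kernel.cl is a placeholder name):
$ openclc -c -o kernel.x86_64.bc -arch x86_64 -emit-llvm kernel.cl
$ openclc -c -o kernel.gpu_32.bc -arch gpu_32 -emit-llvm kernel.cl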