Is there an offline OpenCL compiler (for NVIDIA graphics cards)? - opencl

The normal way to run an OpenCL program is to include the OpenCL kernel source, which is compiled at runtime (online compilation).
But I've seen examples of compiling OpenCL to a binary beforehand, called offline compilation. I'm aware of the disadvantages (reduced compatibility across hardware).
There used to be an offline compiler at http://www.fixstars.com/en/ but it does not seem to be available anymore.
So is there an offline compiler for OpenCL available, in particular for NVIDIA-based cards?

Someone suggested that nvcc.exe in NVidia's CUDA SDK could compile .cl files with
nvcc -x cu file.cl -o file.out -lOpenCL
...but it complains about a missing cl.exe, at least on Windows. This might be worth checking out, however: http://clcc.sourceforge.net/
As well as:
https://github.com/HSAFoundation/CLOC (AMD-maintained offline compiler)
https://github.com/Maratyszcza/clcc (also includes links to the ones above, and more)

Related

How to compile opencl-kernel-file(.cl) to LLVM IR

This question is related to LLVM/clang.
I already know how to compile an OpenCL kernel file (.cl) using the OpenCL API (clBuildProgram() and clGetProgramBuildInfo()).
My question is this:
How do I compile an OpenCL kernel file (.cl) to LLVM IR with OpenCL 1.2 or higher?
In other words, how do I compile an OpenCL kernel file (.cl) to LLVM IR without libclc?
I have tried various methods to get the LLVM IR of an OpenCL kernel file.
I first followed the Clang user manual (https://clang.llvm.org/docs/UsersManual.html#opencl-features), but it did not work.
Second, I found a way to use libclc.
The commands are:
clang++ -emit-llvm -c -target -nvptx64-nvidial-nvcl -Dcl_clang_storage_class_specifiers -include /usr/local/include/clc/clc.h -fpack-struct=64 -o "$#".bc "$#"
llvm-link "$#".bc /usr/local/lib/clc/nvptx64--nvidiacl.bc -o "$#".linked.bc
llc -mcpu=sm_52 -march=nvptx64 "$#".linked.bc -o "$#".nvptx.s
This method worked fine, but since libclc was built on top of the OpenCL 1.1 specification, it could not be used with OpenCL 1.2 or later code such as code using printf.
This method also relies on libclc, which implements the OpenCL built-in functions as ordinary functions. You can observe that in the assembly(ptx) of result opencl binary, it goes straight to the function call instead of converting it to an inline assembly. I am concerned that this will affect GPU behavior and performance, such as execution time.
So now I am looking for a way to replace compilation using libclc.
As a last resort, I'm considering using libclc with the NVPTX backend and AMDGPU backend of LLVM.
But if there is already another way, I want to use it.
(I expect that the OpenCL front-end I have not found yet exists in clang)
My program's scenario is:
1. There is an OpenCL kernel source file (.cl).
2. Compile the file to LLVM IR.
3. Process the IR at the IR level.
4. Compile the IR to a binary (using llc) for each GPU target (nvptx, amdgcn..).
5. Using the binary, run the host program (.c or .cpp with the OpenCL library) with clCreateProgramWithBinary().
At the moment, when I compile the kernel source file to LLVM IR, I have to include the libclc header (the -include option in the first command above) to compile the built-in functions, and I have to link the libclc libraries before compiling the IR to a binary.
My environments are below:
GTX960
- NVIDIA's Binary appears in nvptx format
- I'm using sm_52 nvptx for my gpu.
Ubuntu Linux 16.04 LTS
LLVM/Clang 5.0.0
- If there is another way, I am willing to change the LLVM version.
Thanks in advance!
Clang 9 (and up) can compile OpenCL kernels written in the OpenCL C language. You can tell Clang to emit LLVM-IR by passing the -emit-llvm flag (add -S to output the IR in text rather than in bytecode format), and specify which version of the OpenCL standard using e.g. -cl-std=CL2.0. Clang currently supports up to OpenCL 2.0.
By default, Clang will not add the standard OpenCL headers, so if your kernel uses any of the OpenCL built-in functions you may see an error like the following:
clang-9 -c -x cl -emit-llvm -S -cl-std=CL2.0 my_kernel.cl -o my_kernel.ll
my_kernel.cl:17:12: error: implicit declaration of function 'get_global_id' is invalid in OpenCL
int i = get_global_id(0);
^
1 error generated.
You can tell Clang to include the standard OpenCL headers by passing the -finclude-default-header flag to the Clang frontend, e.g.
clang-9 -c -x cl -emit-llvm -S -cl-std=CL2.0 -Xclang -finclude-default-header my_kernel.cl -o my_kernel.ll
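As a rough sketch of how this could slot into the pipeline from the question, the function below prints the two commands that take a kernel from OpenCL C to LLVM IR and then to PTX. The function name cl_to_ptx, the EXECUTE switch, and the nvptx64-unknown-unknown target triple are assumptions made for illustration; the clang-9 flags come from the command above and the llc flags (sm_52, nvptx64) from the question. By default it only prints the commands, so it runs even where clang-9 is not installed.

```shell
#!/bin/sh
# Sketch: OpenCL C -> LLVM IR (clang-9) -> PTX (llc).
# Assumptions: the nvptx64-unknown-unknown triple and sm_52 are illustrative;
# cl_to_ptx and EXECUTE are hypothetical names, not part of clang or LLVM.
cl_to_ptx() {
    kernel=$1
    base=${kernel%.cl}                      # my_kernel.cl -> my_kernel
    # Dry run by default: "echo" prints each command instead of running it.
    if [ "${EXECUTE:-0}" = 1 ]; then run=""; else run="echo"; fi

    # Step 1: emit textual LLVM IR, pulling in the default OpenCL declarations.
    $run clang-9 -c -x cl -emit-llvm -S -cl-std=CL2.0 \
        -target nvptx64-unknown-unknown \
        -Xclang -finclude-default-header "$kernel" -o "$base.ll"

    # (Any IR-level processing of "$base.ll" would go here.)

    # Step 2: lower the IR to PTX assembly for the chosen GPU.
    $run llc -mcpu=sm_52 -march=nvptx64 "$base.ll" -o "$base.nvptx.s"
}

cl_to_ptx my_kernel.cl
```

Run it with EXECUTE=1 to invoke the tools for real. As far as I can tell, the default header only declares the built-ins, so calls such as get_global_id may still need a definition from a built-in library before the result is usable.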
(I expect that the OpenCL front-end I have not found yet exists in clang)
There is an OpenCL front-end in clang - and you're using it, otherwise you couldn't compile a single line of OpenCL with clang. Frontend is Clang recognizing the OpenCL language. There is no OpenCL backend of any kind in LLVM, it's not the job of LLVM; it's the job of various OpenCL implementations to provide proper libraries. Clang+LLVM just recognizes the language and compiles it to bitcode & machine binaries, that's all it does.
in the assembly(ptx) of result opencl binary, it goes straight to the function call instead of converting it to an inline assembly.
You could try linking to a different library instead of libclc, if you find one. Perhaps NVIDIA's CUDA has some bitcode libraries somewhere, but then again there are licensing issues... BTW, are you 100% sure you need LLVM IR? Getting OpenCL binaries using the OpenCL runtime, or using SPIR-V, might get you faster binaries and would certainly be less painful to work with. Even if you manage to get a nice LLVM IR, you'll need some runtime that actually accepts it (I could be wrong, but I doubt the proprietary AMD/NVIDIA OpenCL implementations will just accept arbitrary LLVM IR as input).
Clang does not provide a standard CL declaration header file (for example, C's stdio.h), which is why you're getting "undefined type float" and whatnot.
If you get one such header, you can then mark it as implicit include using "clang -include cl.h -x cl [your filename here]"
One such declaration header can be retrieved from the reference OpenCL compiler implementation at
https://github.com/KhronosGroup/SPIR-Tools/blob/master/headers/opencl_spir.h
And by the way, consider using this compiler which generates SPIR (albeit 1.0) which can be fed into OpenCL drivers as input.

Compile Java source to LLVM IR [duplicate]

From what I've read, there is an LLVM program called class2llvm that converts Java bytecode to LLVM's intermediate form. My question is: how do I access it? What front end do I have to install in order to use it?
VMKit is their implementation of a JVM, but I am looking for how to compile Java source code with LLVM, not how to run it.
The Java frontend translates Java bytecode (.class files) into LLVM bytecode. Take a look at this link:
https://llvm.org/svn/llvm-project/java/trunk/docs/java-frontend.txt
You may take a look at dragonegg, which enables llvm to use gcc's frontends. As gcc already has a frontend for java, called gcj, perhaps llvm can use it to compile java code. But I'm not sure how well llvm interfaces with the gcc frontend, so this may not work.
I have executed a java class using vmkit ( http://vmkit.llvm.org/ ) based on LLVM. It uses LLVM for compiling and optimizing high-level languages to machine code. J3 is an implementation of a JVM with VMKit.
[NOTE: From November 2015 it is no longer open source, so this hack is mostly useless.]
RoboVM might become the solution you're looking for. It's open source and compiles JVM bytecode (.class files) to machine code.
I assume they do it using something like class2llvm.
Unfortunately, it's still in alpha. I just tested it on HelloWorld.java. It gave 5x speed up of load time running on a single core. (Most of the run time is load time.)
echo Hello World! : <1 ms : 31K (/usr/bin/echo binary)
java HelloWorld : ~70 ms : 0.4K (HelloWorld.class JVM bytecode)
./HelloWorld : ~13 ms : 9.4MB (9.3MB binary + 57K robovm-rt.jar)
Note that java calls a 32MB $JAVA_HOME/lib/rt.jar file (and maybe more). Searching in such a large file must be part of the reason java is so slow to load. If RoboVM gets smarter, perhaps it can throw out most of the 9.3MB binary for an even faster load?
The website mentions iOS, but I think that's because they're selling their add-on UI libraries. RoboVM compiled fine for me on a flavor of Ubuntu. Just make sure to do
$ sudo apt-get install g++-multilib
first (and maybe install libpthread-stubs0-dev and libpthread-workqueue0...don't know if they mattered).

OpenCL Simple "Hello World!" program compiles correctly but spits out garbage when executed

As the title suggests, I have copied verbatim the hello.cl and hello.c files from Fixstars' online OpenCL book, at http://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/first-opencl-program.html, and cannot get correct output.
I compile the program using
gcc -lOpenCL hello.c -o hello
I execute normally with
./hello
But my output reads something like
���
I run Arch Linux and have installed OpenCL, the headers, and the NVIDIA implementation. I would like to continue learning OpenCL but simply cannot continue if my programs won't run. Does anyone have any ideas on what is occurring? Additionally, if anyone has any advice on how to debug this I would be immensely happy.
EDIT: I was using Nouveau drivers instead of the Nvidia ones. Nouveau does not support OpenCL. This was the problem.
Nouveau does NOT support OpenCL yet. Replace nouveau with nvidia and check to make sure libcl, libcl-headers, and opencl-nvidia are all correctly installed.
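A quick way to check which kernel driver is actually loaded is to look for the nouveau or nvidia module in the lsmod output. The helper name gpu_driver below is made up for illustration; it only inspects module names and says nothing about whether the OpenCL packages themselves are installed correctly.

```shell
#!/bin/sh
# Sketch: report whether the nouveau or the proprietary nvidia kernel module is loaded.
# gpu_driver is a hypothetical helper; it reads lsmod-style text on stdin.
gpu_driver() {
    awk '$1 == "nouveau" { found = "nouveau" }
         $1 == "nvidia"  { found = "nvidia" }
         END { print (found != "" ? found : "none") }'
}

# Only query the running system when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | gpu_driver
fi
```

If this prints nouveau, the garbage output described in the question is the expected symptom.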

none of fink macports and homebrew useful on lion?

I have a library (flam3) that depends on a few utility libraries from Unix (xml2, jpeg, png, z), and I am trying to make an application on Lion that uses it. I am building with the latest Xcode, and when I try to link with the libraries from fink, macports, and homebrew I get the same error:
ld: warning: ignoring file /opt/local/lib/libxml2.a, file was built
for archive which is not the architecture being linked (i386)
and the libraries look different from ones that work:
bash-3.2$ file /sw/lib/libxml2.a
/sw/lib/libxml2.a: current ar archive random library
By comparison:
bash-3.2$ file ~/Documents/FLAM3/libflam3.a
/Users/spot/Documents/FLAM3/libflam3.a: Mach-O universal binary with 2
architectures
/Users/spot/Documents/FLAM3/libflam3.a (for architecture x86_64):
current ar archive random library
/Users/spot/Documents/FLAM3/libflam3.a (for architecture i386):
current ar archive random library
that's the library that I compiled with Xcode.
Is there any way to get Xcode to accept this library? Is there any way to get fink/macports/homebrew to generate a library that works with Xcode? Seems like I am "doing it wrong" as these projects would all be useless if everyone had this problem.... but I don't feel like I've done anything unusual. Help?
The problem is that your libxml2.a is not built as a "Universal binary", i.e. it doesn't contain all the necessary architectures (in your case I believe the missing one is i386). You need to ask fink, macports, or homebrew to build/download/install the library with all the necessary architectures. I know that macports has such a flag (I don't remember what it's called).
The "file" command lists all available architectures for a .a file only when the file is truly universal (contains two or more of ppc, i386, x86_64); otherwise it only shows the plain "ar archive..." message. That confirms your libxml2.a has only one architecture.
The problem is not in Xcode or Lion. Possibly the default link architecture on Lion changed.
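To make that check less error-prone, the architecture names can be pulled out of the "file" output mechanically. This is only a sketch: list_archs is a hypothetical helper, and the sample input is adapted (paths shortened) from the output quoted in the question.

```shell
#!/bin/sh
# Sketch: extract architecture names from `file` output for a static library.
# list_archs is a hypothetical helper; a thin (single-architecture) archive yields no lines.
list_archs() {
    sed -n 's/.*(for architecture \([^)]*\)).*/\1/p'
}

# Sample input adapted from the `file` output quoted in the question.
printf '%s\n' \
    'libflam3.a: Mach-O universal binary with 2 architectures' \
    'libflam3.a (for architecture x86_64): current ar archive random library' \
    'libflam3.a (for architecture i386): current ar archive random library' |
    list_archs
```

On a real system the input would come from something like `file /sw/lib/libxml2.a | list_archs`; an empty result means the archive is not universal. If I recall correctly, the MacPorts option alluded to above is the +universal variant.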

How a recent version of GCC (4.6) could be used together with Qt under Mac OS?

My problem is related to the one discussed here:
Is there a way that OpenMP can operate on Qt spanwed threads?
Upon trying to run my Qt-based program under Mac OS that has an OpenMP clause in a secondary thread, it crashed. After browsing through the web, now I understand that it is caused by a bug in the rather old version (4.2) of gcc supplied by Apple.
Then I downloaded the latest 4.6 version of gcc from http://hpc.sourceforge.net and tried to compile the project, but I got the following errors from g++ compiler:
unrecognized option ‘-arch’
unrecognized option ‘-Xarch_x86_64’
I learned that this is because these are options, which can be only interpreted by the custom-configured Apple-gcc compiler, but not by standard gcc.
Could anybody please help me overcome this issue and configure g++ 4.6 for use with Qt in order to get bug-free OpenMP support? I admit that I'm a newbie on the Mac OS platform with regard to compilers and programming, and I would like to port my code from a Visual Studio/Qt environment.
Many thanks in advance!
If you aren't afraid of messing with your Qt installation, then change the QMAKE_CFLAGS_X86_64 entry in ~/QtSDK/Desktop/Qt/4.8.1/gcc/mkspecs/common/g++-macx.conf.
Replace ‘-Xarch_x86_64’ with ‘-arch x86_64’.
You can use your non-Apple gcc v4.6 and compile a binary for each architecture you want to build (using --target=${ARCH} should be fine for i386 and x86_64). Then, once you have a binary for each of the architectures, use lipo like so:
lipo -create -arch i386 binary_32bit -arch x86_64 binary_64bit -output binary_universal
This will create a fat binary (aka universal binary) named binary_universal from binary_32bit and binary_64bit.
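The two steps above (one build per architecture, then a lipo merge) could be scripted roughly as follows. This is a sketch under assumptions: build_universal and EXECUTE are hypothetical names, and it selects the architecture with -m32/-m64 (as the wrapper scripts in another answer here do) rather than with --target. The compiler path is the MacPorts one mentioned in that answer. By default the function only prints the commands.

```shell
#!/bin/sh
# Sketch: build one binary per architecture, then merge them with lipo.
# Assumptions: build_universal and EXECUTE are hypothetical names; the compiler path
# and the -m32/-m64 flags are taken from another answer in this thread.
build_universal() {
    src=$1
    out=$2
    gcc46=/opt/local/bin/gcc-mp-4.6
    # Dry run by default: "echo" prints each command instead of running it.
    if [ "${EXECUTE:-0}" = 1 ]; then run=""; else run="echo"; fi

    $run "$gcc46" -m32 "$src" -o "${out}_32bit"    # i386 slice
    $run "$gcc46" -m64 "$src" -o "${out}_64bit"    # x86_64 slice
    $run lipo -create -arch i386 "${out}_32bit" -arch x86_64 "${out}_64bit" -output "$out"
}

build_universal main.c binary_universal
```

Setting EXECUTE=1 runs the commands instead of printing them, which only makes sense on a Mac with that gcc and lipo installed.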
Or you could use clang/llvm instead of gcc, which probably won't have the bug you described and (if supplied via Apple's developer tools) should be able to compile universal binaries directly.
You should run qmake with the corresponding -spec option. For example, to use gcc46 on FreeBSD, run qmake like this:
qmake --spec=freebsd-g++46
Lipo can indeed be used to put multiple object files together into a "fat" object file. In fact, it turns out this is just what Apple's compiler does: their GCC is actually a driver that maps each requested architecture to the appropriate compiler and then merges the resulting objects using lipo.
see: http://lists.macosforge.org/pipermail/macports-dev/2011-September/016210.html
Here is the source file for that driver:
http://opensource.apple.com/source/gcc/gcc-5666.3/driverdriver.c
All one needs to do to get a new version of GCC to honor the -arch flag is to modify this driver and get it to point to a script wrapper for your version of gcc that adds the appropriate flags for the given architecture and then passes all the rest of the arguments. Something like this:
#!/bin/sh
/opt/local/bin/gcc-mp-4.6 -m32 "$@"
and
#!/bin/sh
/opt/local/bin/gcc-mp-4.6 -m64 "$@"
Here is a link that talks about how to do it, and provides a cmake project to easily get the macports version of GCC fixed up and supporting the -arch flag for the two intel architectures:
http://thecoderslife.blogspot.com/2015/07/building-with-gcc-46-and-xcode-4.html
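When writing wrappers of this kind, the detail that matters is how the arguments are forwarded: in a POSIX shell, "$@" expands to every argument as a separate word, while $# expands only to the argument count. The sketch below illustrates the difference; fake_gcc is a made-up stand-in for the real compiler so that the example runs anywhere.

```shell
#!/bin/sh
# Sketch: how a wrapper forwards its arguments to the real compiler.
# fake_gcc is a hypothetical stand-in that prints each argument on its own line,
# so the example runs without gcc-mp-4.6 installed.
fake_gcc() { printf '%s\n' "$@"; }

# "$@" expands to every argument, each kept as a separate word.
wrapper_m64() { fake_gcc -m64 "$@"; }

# $# expands to the number of arguments only, which is not what a wrapper wants.
count_args() { echo "$#"; }

wrapper_m64 -O2 -c "my file.c"
count_args -O2 -c "my file.c"
```

Quoting "$@" also keeps file names that contain spaces intact, which an unquoted $@ or $* would split.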
