Crystal-lang: why is the LLVM "hello.bc" not the same if generated by Crystal or by clang?

This is my first Stack Overflow question :-)
My background:
- 2 years of Python experience
- 2 months of crystal-lang experience (websites running with the Amber framework)
- 1 month into C, C++, assembly
Facts:
- crystal-lang is compiling and running without any problem
- running on x86_64
Please be nice, as I don't have much low-level language knowledge yet.
From my understanding, when we compile and run a basic hello.c file using LLVM, it goes as follows:
hello.c:
#include <stdio.h>
int main() {
    printf("hello world\n");
    return 0;
}
shell:
$ clang -O3 -emit-llvm hello.c -c -o hello.bc
$ llc hello.bc -o hello.s
$ gcc hello.s -o hello.native
$ ./hello.native
(this comes from the LLVM examples)
My point is that we can produce a pretty short hello.bc file (128 lines) that can also be run, more slowly, using:
$ lli hello.bc
but when I tried to generate a similar hello.bc from a hello.cr file and run it like I did with the hello.c file:
hello.cr:
puts "hello world"
shell:
$ crystal build hello.cr --emit llvm-bc --release
$ llc hello.bc -o hello.s
What I noticed:
- This hello.bc file is much, much bigger than the one generated from the C file (43,624 lines)
- This hello.bc can't be run using "lli", as it produces:
"LLVM ERROR: Program used external function 'pcre_malloc' which could not be resolved!"
- I can't even compile from hello.s to hello.native
- Same problem if I try to generate a hello.ll file instead
As I understood it, LLVM is portable, and all front-end languages produce an intermediate *.bc that can then be compiled for any architecture.
My questions are:
Why are the hello.bc files not similar in the two cases?
Am I doing something wrong in the Crystal procedure?
Thank you!

Everything is just as it is supposed to be. Crystal has a runtime library that is always present even if you didn't include anything. This is required to run the Crystal program.
The C example contains pretty much nothing more than a library call to printf. That's why the compiled ASM is also really tiny.
Crystal's simple puts call has much more behind it. It is based on libraries for handling asynchronous IO, concurrency, signal handling, garbage collection and more. Some of these are implemented entirely in the Crystal standard library; some use other libraries that are either directly embedded into the binary (libgc) or still require dynamic libraries from the system (libpcre, libpthread).
Any Crystal program comes with this runtime library by default. Even an empty program. This usually goes completely unnoticed because larger programs will eventually need those things anyway and the compiled binary size of the runtime library is less than 500 KB (in release mode).
Such a small program as yours doesn't really need all of this just to print a string, but these libraries are required by the Crystal runtime.
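That is also where the pcre_malloc error comes from: the emitted code references the runtime's dependencies, so they must be supplied when you link hello.s by hand. A rough sketch of such a link line (the exact library set here is my assumption and varies by platform):
$ crystal build hello.cr --emit llvm-bc --release
$ llc hello.bc -o hello.s
$ cc hello.s -o hello.native -lpcre -lgc -lpthread -levent -lm   # hypothetical link flags; check what your Crystal install actually links against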
NOTE: You can compile a Crystal program without these default libraries, but this means you can't use anything from the Crystal stdlib; you essentially have to write C code with Crystal syntax (or implement your own stdlib):
require "lib_c"
require "c/stdio"
LibC.printf pointerof("hello world".@c)
This can be compiled with the --prelude=empty option, and it will generate substantially smaller ASM, roughly similar to the C example.
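For example, reusing the commands from the question (a sketch):
$ crystal build hello.cr --prelude=empty --emit llvm-bc --release
$ llc hello.bc -o hello.s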

Related

How to compile opencl-kernel-file(.cl) to LLVM IR

This question is related to LLVM/clang.
I already know how to compile an opencl-kernel-file (.cl) using the OpenCL API (clBuildProgram() and clGetProgramBuildInfo()).
My question is this:
How do I compile an opencl-kernel-file (.cl) to LLVM IR with OpenCL 1.2 or higher?
In other words, how do I compile an opencl-kernel-file (.cl) to LLVM IR without libclc?
I have tried various methods to get the LLVM IR of an OpenCL kernel file.
I first followed the clang user manual (https://clang.llvm.org/docs/UsersManual.html#opencl-features), but it did not work.
Secondly, I found a way to use libclc.
The commands are:
clang++ -emit-llvm -c -target nvptx64-nvidia-nvcl -Dcl_clang_storage_class_specifiers -include /usr/local/include/clc/clc.h -fpack-struct=64 -o "$#".bc "$#"
llvm-link "$#".bc /usr/local/lib/clc/nvptx64--nvidiacl.bc -o "$#".linked.bc
llc -mcpu=sm_52 -march=nvptx64 "$#".linked.bc -o "$#".nvptx.s
This method worked fine, but since libclc was built on top of the OpenCL 1.1 specification, it could not be used with OpenCL 1.2 or later code such as code using printf.
Also, this method uses libclc, which implements the OpenCL built-in functions as ordinary functions. You can observe in the assembly (PTX) of the resulting OpenCL binary that it goes straight to a function call instead of converting it to inline assembly. I am concerned that this will affect GPU behavior and performance, such as execution time.
So now I am looking for a way to replace compilation using libclc.
As a last resort, I'm considering using libclc with the NVPTX backend and AMDGPU backend of LLVM.
But if there is already another way, I want to use it.
(I expect that the OpenCL front-end I have not found yet exists in clang)
My program's scenario is:
1. There is an opencl kernel source file (.cl)
2. Compile the file to LLVM IR
3. Apply IR-level processing to the IR
4. Compile (using llc) the IR to a binary for each GPU target (nvptx, amdgcn, ...)
5. Using the binary, run the host (.c or .cpp with the OpenCL library) with clCreateProgramWithBinary()
Right now, when I compile the kernel source file to LLVM IR, I have to include the libclc header (the -include option in the first command above) to compile the built-in functions, and I have to link the libclc libraries before compiling the IR to a binary.
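For reference, step 5 of that scenario typically looks like the following minimal C sketch (names such as load_binary_program are illustrative, and error handling is abbreviated):

#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

/* Load a precompiled device binary into a cl_program (sketch). */
cl_program load_binary_program(cl_context ctx, cl_device_id dev,
                               const unsigned char *bin, size_t bin_len)
{
    cl_int status, err;
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &bin_len,
                                                &bin, &status, &err);
    if (err != CL_SUCCESS || status != CL_SUCCESS) {
        fprintf(stderr, "clCreateProgramWithBinary failed: %d/%d\n",
                (int)err, (int)status);
        exit(1);
    }
    /* A build step is still required, even for binaries. */
    err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clBuildProgram failed: %d\n", (int)err);
        exit(1);
    }
    return prog;
}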
My environment is:
GTX960
- NVIDIA's binary appears in nvptx format
- I'm using sm_52 nvptx for my GPU
Ubuntu Linux 16.04 LTS
LLVM/Clang 5.0.0
- If there is another way, I am willing to change the LLVM version.
Thanks in advance!
Clang 9 (and up) can compile OpenCL kernels written in the OpenCL C language. You can tell Clang to emit LLVM-IR by passing the -emit-llvm flag (add -S to output the IR in text rather than in bytecode format), and specify which version of the OpenCL standard using e.g. -cl-std=CL2.0. Clang currently supports up to OpenCL 2.0.
By default, Clang will not add the standard OpenCL headers, so if your kernel uses any of the OpenCL built-in functions you may see an error like the following:
clang-9 -c -x cl -emit-llvm -S -cl-std=CL2.0 my_kernel.cl -o my_kernel.ll
my_kernel.cl:17:12: error: implicit declaration of function 'get_global_id' is invalid in OpenCL
int i = get_global_id(0);
^
1 error generated.
You can tell Clang to include the standard OpenCL headers by passing the -finclude-default-header flag to the Clang frontend, e.g.
clang-9 -c -x cl -emit-llvm -S -cl-std=CL2.0 -Xclang -finclude-default-header my_kernel.cl -o my_kernel.ll
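For a quick test, a minimal kernel like the following (a hypothetical my_kernel.cl, consistent with the error message above) compiles cleanly with that command:

__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    /* get_global_id() comes from the default header, hence -finclude-default-header */
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}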
(I expect that the OpenCL front-end I have not found yet exists in clang)
There is an OpenCL front-end in clang, and you're using it; otherwise you couldn't compile a single line of OpenCL with clang. The front-end is the part of Clang that recognizes the OpenCL language. There is no OpenCL backend of any kind in LLVM; that's not LLVM's job. It's the job of the various OpenCL implementations to provide the proper libraries. Clang+LLVM just recognizes the language and compiles it to bitcode and machine binaries; that's all it does.
in the assembly(ptx) of result opencl binary, it goes straight to the function call instead of converting it to an inline assembly.
You could try linking to a different library instead of libclc, if you can find one. Perhaps NVIDIA's CUDA has some bitcode libraries somewhere, though then again there may be licensing issues... By the way, are you 100% sure you need LLVM IR? Getting OpenCL binaries through the OpenCL runtime, or using SPIR-V, might get you faster binaries and would certainly be less painful to work with. Even if you manage to get nice LLVM IR, you'll need a runtime that actually accepts it (I could be wrong, but I doubt the proprietary AMD/NVIDIA OpenCL implementations will just accept random LLVM IR as input).
Clang does not provide a standard CL declaration header file (the analogue of C's stdio.h), which is why you're getting "undefined type float" and the like.
If you get such a header, you can mark it as an implicit include using "clang -include cl.h -x cl [your filename here]".
One such declaration header can be retrieved from the reference OpenCL compiler implementation at
https://github.com/KhronosGroup/SPIR-Tools/blob/master/headers/opencl_spir.h
And by the way, consider using this compiler which generates SPIR (albeit 1.0) which can be fed into OpenCL drivers as input.

Any PowerPC simulator suggestions?

I'm about to start learning the PowerPC architecture, and as an example I've downloaded some reference manuals from the NXP website as well as their SDK, so I can build even bareboard applications. To be precise, I'm using a virtual host environment. I don't have any board with a PowerPC processor on it, so I would like to use a simulator for debugging.
At this step I'm a little confused. I've built a bareboard application (a 'Hello World' one), and now I'd like to run it with a simulator. I've tried to use a command like this: qemu-system-ppc -machine ppce500 -cpu e500v2 -nographic -kernel ./a.out and saw nothing; QEMU just loads the host CPU. ./a.out is the binary built with the command $CC -static ./tst.c. So now I don't even know how to deal with QEMU.
For those who would like to help: I'm using the Virtual Host environment for the Freescale P1010 processor with an e500v2 core, and the binary was built with their fsl-* utilities.
The source compiled was:
$ cat ./tst.c
#include <unistd.h>
#define STRING "This is a test.\n"
int main(void) {
    write(1, STRING, sizeof(STRING) - 1);
    return 0;
}
Compilation took place like:
$ echo $CC
powerpc-fsl-linux-gnuspe-gcc -m32 -mcpu=8548 -mabi=spe -mspe -mfloat-gprs=double --sysroot=/opt/fsl-qoriq/1.9/sysroots/ppce500v2-fsl-linux-gnuspe
$ $CC -static -o tst.bin ./tst.c
$ file ./tst.bin
./tst.bin: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=63b307e7afe9de0b2781f2f92b5f1b3a803f850d, not stripped
Rather than using a simulator, why don't you ask for a real, free virtual machine to do development/learning on? From what you say, it should work better.
You can request a VM at the following sites:
[Brazil] http://openpower.ic.unicamp.br/minicloud/
[China] https://dashboard.ptopenlab.com
You're not seeing anything as you're asking qemu-system-powerpc to run a userspace binary rather than a kernel.
If you just want to poke at userspace programming, try the qemu-ppc binary instead, as it will run 32-bit PowerPC userspace binaries by doing things like translating syscalls.
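For example, with the statically linked binary from the question (a sketch; the -cpu value for an e500v2/SPE binary is my assumption):
$ qemu-ppc -cpu e500v2 ./tst.bin
This is a test.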
Another option, if you want to program the bare metal, is to start writing your own tiny OS to a specific machine type (i.e. you're going to have to implement that write() call you're calling).

How do I make an executable that readelf would say is UNIX - System V?

I have been making programs for ages that are under 800K on Linux Fedora 19 using GCC 4.8.1. The readelf utility has reported them as OS/ABI = "UNIX - System V" (byte 8 is zero).
Now suddenly the binaries are turning out over 1MB and readelf is saying they are "UNIX - GNU" (byte 8 is 3). Not my doing! Something is having an influence and I'm not sure what.
For instance, now, using nm, I find that the functions __nss_hosts_lookup2 and openat are being linked in, which weren't there before.
How do I make an executable again that readelf would say is UNIX - System V?
I found it! All on my own. The linker was picking up the October 2007 versions of libc.a, libm.a and libstdc++.a in the library directory I supplied (producing 780K executables). After deleting those files it started picking up the March 2013 versions, which bloated the executable (1.1M). I'll have to leave it bloated, unfortunately, because I don't want to have to track down all the matching header files (since of course the header files should match the libraries). I don't blame the software writers for not putting one function per source/object file. I blame the linker for still not dragging in functions at per-function granularity and doing cyclic library search as standard.
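For reference, the OS/ABI field discussed here can be inspected straight from the ELF header (a sketch; ./myprog is a placeholder binary name):
$ readelf -h ./myprog | grep 'OS/ABI'
  OS/ABI:                            UNIX - System V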

Compile Java source to LLVM IR [duplicate]

From what I've read, there is an LLVM program called class2llvm that converts Java bytecode to LLVM's intermediate form. My question is: how do I access this? What front end do I have to install in order to use it?
VMKit is their implementation of a JVM, but I am looking for how to compile the Java source code with LLVM, not how to run it.
The Java frontend translates Java bytecode (.class files) into LLVM bytecode. Take a look at this link:
https://llvm.org/svn/llvm-project/java/trunk/docs/java-frontend.txt
You may take a look at DragonEgg, which enables LLVM to use GCC's frontends. As GCC already has a frontend for Java, called gcj, perhaps LLVM can use it to compile Java code. But I'm not sure how well LLVM interfaces with the GCC frontend, so this may not work.
I have executed a Java class using VMKit (http://vmkit.llvm.org/), which is based on LLVM. It uses LLVM for compiling and optimizing high-level languages to machine code. J3 is an implementation of a JVM built with VMKit.
[NOTE: From November 2015 it is no longer open source, so this hack is mostly useless.]
RoboVM might be the solution you're looking for. It's open source and compiles JVM bytecode (.class files) to machine code.
I assume it does this using something like class2llvm.
Unfortunately, it's still in alpha. I just tested it on HelloWorld.java, and it gave a 5x speed-up in load time running on a single core. (Most of the run time is load time.)
echo Hello World! : <1 ms : 31K (/usr/bin/echo binary)
java HelloWorld : ~70 ms : 0.4K (HelloWorld.class JVM bytecode)
./HelloWorld : ~13 ms : 9.4MB (9.3MB binary + 57K robovm-rt.jar)
Note that java loads a 32MB $JAVA_HOME/lib/rt.jar file (and maybe more). Searching in such a large file must be part of the reason java is so slow to load. If RoboVM gets smarter, perhaps it can throw out most of the 9.3MB binary for an even faster load?
The website mentions iOS, but I think that's because they're selling their add-on UI libraries. RoboVM compiled fine for me on a flavor of Ubuntu. Just make sure to do
$ sudo apt-get install g++-multilib
first (and maybe install libpthread-stubs0-dev and libpthread-workqueue0...don't know if they mattered).

Does changing the order of compiling with GCC in unix delete files?

So I just messed up real bad. I'm hoping someone can tell me I didn't just ruin everything I've done for the last 4 weeks with this simple typo.
I kept making changes to my C program and would recompile to test the changes using this in the terminal:
gcc -o server server.c
Due to programming for the past 5 hours straight, I accidentally typed this the last time I tried compiling:
gcc -o server.c server
I got some long message and realized my mistake. I tried recompiling the first way I listed, and it says "no such file server.c".
I typed "ls" and sure enough, my program isn't there.
Please tell me everything I did hasn't vanished? :((
Unfortunately, you told the compiler to read your executable, and write its output to your source file. The file is gone. If you are on a Windows system, perhaps it could be undeleted with something like Norton Utilities. If not, you're probably out of luck.
Next time, consider using a Makefile to contain the compiler commands, so you can just type "make" to build your program. Other strategies include keeping the file open in a single editor session the whole time you're working, and using a source control system like git or subversion (which would also let you go back to previous versions of the file).
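For example, a minimal Makefile for this program might look like this (a sketch, assuming the source file is server.c; note that the recipe line must start with a tab):
# Rebuild server whenever server.c changes
CC = gcc
CFLAGS = -Wall
server: server.c
	$(CC) $(CFLAGS) -o server server.c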
