I was reading the book "The Art of Unix Programming" and found a quote saying:
In the Unix world, libraries which are delivered as libraries should come with exerciser programs.
So what exactly is a library exerciser?
A library exerciser is simply a program, or a collection of programs, that tests that library (by calling some, most, or ideally all of its public functions or methods). Read also about unit testing.
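For illustration, a minimal exerciser might look like the sketch below; mylib.h and the mylib_* functions are hypothetical stand-ins for whatever the library actually exports.

```c
/* A minimal exerciser sketch for a hypothetical library exposing
   mylib_add() and mylib_negate(); the names are illustrative only. */
#include <assert.h>
#include <stdio.h>
#include "mylib.h"   /* hypothetical header of the library under test */

int main(void) {
    assert(mylib_add(2, 3) == 5);     /* exercise one public function */
    assert(mylib_negate(7) == -7);    /* ...and another */
    puts("all exerciser checks passed");
    return 0;
}
```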
BTW, this advice to write library exercisers is IMHO not specific to the Unix world (it should also hold for GNU Hurd, POSIX, VMS, and even Windows systems), but generally useful for any software library. I guess it is related to modules and names in linkers. In some exotic but interesting programming environments (think of Lisp or Smalltalk machines, or persistent academic OSes like Grasshopper), the very notion of a library does not exist, or is so far from Linux-like libraries (written in C or C++) that an exerciser might not mean the same thing.
Notice that some languages (OCaml, Go, D, ... but not C11 or C++14) have a notion of modules and a module-aware notion of libraries.
I started to learn OpenCL.
I read these links:
https://en.wikipedia.org/wiki/OpenCL
https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/os_tooling.md
https://www.khronos.org/opencl/
but I still do not understand: is OpenCL a library that you use by including a header file in your source code, or is it a compiler, given that there is an OpenCL C compiler?
It is both a library and a compiler.
The OpenCL C/C++ bindings that you include as header files and that you link against are a library. These provide the necessary functions and commands in C/C++ to control the device (GPU).
For the device, you write OpenCL C code. This language is not C or C++, but rather based on C99 and has some specific limitations (for example only 1D arrays) as well as extensions (math and vector functionality).
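To make that concrete, here is a small example of OpenCL C (device code, not host code); the address-space qualifiers and the built-in float4 vector type are part of the language:

```c
/* OpenCL C kernel: a SAXPY-style example using the built-in
   float4 vector type and the __kernel/__global qualifiers. */
__kernel void saxpy(const float a,
                    __global const float4 *x,
                    __global float4 *y) {
    size_t i = get_global_id(0);   /* which work-item am I? */
    y[i] = a * x[i] + y[i];        /* scalar * vector widens per the spec */
}
```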
The OpenCL compiler sits in between the C/C++ bindings and the OpenCL C part. Using the C/C++ bindings, you invoke the compiler at runtime of the executable with the command clBuildProgram (C bindings) or program.build(...) (C++ bindings). It then compiles the OpenCL C code, once, at runtime, into device-specific assembly, which differs for every vendor. With an Nvidia GPU, for example, you can look at the compiled PTX assembly for the device.
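A sketch of that runtime-compilation step using the plain C bindings follows; error handling is mostly trimmed, and the kernel source is deliberately trivial.

```c
/* Sketch: compiling an OpenCL C kernel at runtime with the C bindings. */
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

static const char *kernel_src =
    "__kernel void scale(__global float *v) {\n"
    "    v[get_global_id(0)] *= 2.0f;\n"
    "}\n";

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_program prog =
        clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);

    /* The runtime compilation step: the driver's OpenCL compiler turns
       the OpenCL C source into device-specific code right here. */
    err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    if (err != CL_SUCCESS) {
        char log[4096];
        clGetProgramBuildInfo(prog, device, CL_PROGRAM_BUILD_LOG,
                              sizeof log, log, NULL);
        fprintf(stderr, "build failed:\n%s\n", log);
        return 1;
    }
    puts("kernel compiled at runtime");
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```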
If you know OpenGL then OpenCL works on the same principle.
Disclaimer: Self-learned knowledge ahead. I wanted to learn OpenCL and found the OpenCL term as confusing as you do. So I did some painful research until I got my first OpenCL hello-world program working.
Overview
OpenCL is an open standard, i.e. just an API specification, which targets heterogeneous computing hardware in particular.
The standard comprises a set of documents available here. It is up to the manufacturers to implement the standard for their devices and make OpenCL available to users, e.g. through GPU drivers, perhaps in the form of a shared library.
It is also up to the manufacturers to provide tools for developers to build applications using OpenCL.
Here's where it gets complicated.
SDKs
Manufacturers provide SDKs: software packages that contain everything a developer needs (see the link above). But they are vendor-specific; e.g., the NVIDIA SDK won't work without their GPU.
ICD Loader
Because SDKs are tied to a single vendor, the most portable (IMHO) solution is to use what is known as Khronos' ICD loader. It is a kind of "meta-driver" that will, during run-time, search for other ICDs present in the system from AMD, Intel, NVIDIA, and others, and then forward calls from our application to them. So, as a developer, we can develop against this generic driver and use clGetPlatformIDs to fetch the available platforms and devices. It is available as libOpenCL.so, at least on Linux, and we should link against it.
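For example, here is a small sketch of asking the loader which platforms are installed; this uses only standard OpenCL C API calls, nothing vendor-specific.

```c
/* Sketch: enumerating every platform the ICD loader has discovered. */
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_uint n = 0;
    clGetPlatformIDs(0, NULL, &n);   /* first ask how many there are */

    cl_platform_id ids[16];
    if (n > 16) n = 16;              /* keep the sketch simple */
    clGetPlatformIDs(n, ids, NULL);

    for (cl_uint i = 0; i < n; ++i) {
        char name[256];
        clGetPlatformInfo(ids[i], CL_PLATFORM_NAME, sizeof name, name, NULL);
        printf("platform %u: %s\n", i, name);   /* e.g. "NVIDIA CUDA" */
    }
    return 0;
}
```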
It is the counterpart of OpenGL's libOpenGL; well, almost, because the vast majority of OpenGL (1.1+) is present in the form of extensions and must be loaded separately, e.g. with GLAD. In that sense, GLAD is very similar to the ICD loader.
Again, the loader does not contain any actual "computing" code, only stub implementations of the API which forward everything to the chosen platform's ICD.
Headers
We are still missing the headers. Thankfully, the Khronos organization releases C headers and also C++ bindings. But nothing is stopping you from writing them yourself based on the official API documents; it would just be really tedious and error-prone.
Here we can find yet another parallel with OpenGL, because the headers are also just a consequence of the standard, and GLAD generates them directly from its XML version! How cool is that?!
Summary
To write a simple OpenCL application we need to:
Download an ICD from the device's manufacturer; e.g., up-to-date GPU drivers are enough.
Download the headers and place them in some folder.
Download, build, and install an ICD loader. It will likely need the headers too.
Include the headers, use API in them, and link against the ICD loader.
For Debian, and maybe Ubuntu and others, there is a simpler approach:
Download the drivers; look for <vendor>-opencl-icd. Drivers on Linux are usually not as monolithic as on Windows and might span many packages.
Install ocl-icd-opencl-dev, which contains the C and C++ headers plus the loader.
Use the headers and link the library.
Which programming languages provide the best support for self-modifying code?
In particular, since the program will need to make extensive use of self-modifying code, I am especially interested in the ability to remove some parts of code from memory after they are no longer needed, thus freeing that memory. Also, it would be a plus if there were the ability to identify and index the routines (procedures, functions, etc.) with some sort of serial number, so that they could be easily managed in memory (deleted, cloned, etc.) at runtime.
Operating systems need to have some more-or-less "self-modifying code" in order to load programs and dynamic link libraries from storage into RAM and later free up that RAM for other things, do relocation fix-ups, etc.
My understanding is that C is currently by far the most popular language for writing operating systems.
The OSDev.org wiki has many tips on writing a new custom operating system, including a brief discussion of languages suitable for writing an operating system: C, assembly language, Lisp, Forth, C++, C#, PL/1, etc.
Just-in-time (JIT) compilers also need to have some more-or-less "self-modifying code" to compile source text into native instructions and run them, then later free up that memory for the next hot-spot.
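As a rough illustration of that JIT pattern, the sketch below (Linux/x86-64 assumed) copies a few bytes of machine code into executable memory, calls them, and then frees the memory again with munmap(). Note that hardened systems with W^X policies will reject a mapping that is writable and executable at once; there you would map writable first and flip to executable with mprotect().

```c
/* Sketch: the JIT pattern in C on Linux/x86-64 -- emit machine code
   into executable memory, run it, then release the memory. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* x86-64 machine code for:  mov eax, 42 ; ret */
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    void *buf = mmap(NULL, sizeof code,
                     PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;
    memcpy(buf, code, sizeof code);

    int (*fn)(void) = (int (*)(void))buf;
    printf("generated code returned %d\n", fn());   /* prints 42 */

    munmap(buf, sizeof code);   /* the "routine" is now gone from memory */
    return 0;
}
```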
Perhaps you could find some OS project or JIT project and use their code with relatively little modification.
A few people, when they say they want "self-modifying code", really want a language that supports homoiconicity, such as Scheme or some other dialect of Lisp, Prolog, Tcl, Curl, etc.
I'm looking to create Rust implementations of some small bioinformatics programs for my research. One of my main considerations is performance, and while I know that I could schedule the Rust program to run on a grid with qsub - the cluster I have access to uses Oracle's GridEngine - I'm worried that the fact that I'm not calling MPI directly will cause performance issues with the Rust program.
Will scheduling the program without using an MPI library hinder performance greatly? Should I use an MPI library in Rust, and if so, are there any known MPI libraries for Rust? I've looked for one but I haven't found anything.
I have used several supercomputing facilities (I'm an astrophysicist) and have often faced the same problem: I know C/C++ very well but prefer to work with other languages.
In general, any approach other than MPI will do, but consider that often such supercomputers have heavily optimised MPI libraries, often tailored for the specific hardware integrated in the cluster. It is difficult to tell how much the performance of your Rust programs will be affected if you do not use MPI, but the safest bet is to stay with the MPI implementation provided on the cluster.
There is no performance penalty in using a Rust wrapper around a C library such as an MPI library, as the bottleneck is the time needed to transfer data between nodes (e.g. via MPI_Send), not the negligible cost of an additional function call. (Moreover, in Rust's case there is not even an additional function call, as already stated above.)
However, despite the very good FFI provided by Rust, it is not going to be easy to create MPI bindings. The problem lies in the fact that MPI is not a library but a specification. Popular MPI libraries are OpenMPI (http://www.open-mpi.org) and MPICH (http://www.mpich.org). Each of them differs slightly in the way it implements the standard, and they usually paper over such differences with C preprocessor macros. Very few FFIs are able to deal with complex macros; I don't know how Rust scores here.
For instance, I am implementing an MPI program in Free Pascal, but I am not able to use the existing MPICH bindings (http://wiki.lazarus.freepascal.org/MPICH), as the cluster I am using provides its own MPI library and I prefer to use it for the reason stated above. I was unable to reuse the MPICH bindings because they assume that constants like MPI_BYTE are hardcoded integer constants, but in my case they are pointers to opaque structures that seem to be created when MPI_Init is called.
Julia bindings to MPI (https://github.com/lcw/MPI.jl) solve this problem by running C and Fortran programs during the installation that generate Julia code with the correct values for such constants. See e.g. https://github.com/lcw/MPI.jl/blob/master/deps/make_f_const.f
In my case I preferred to implement a middleware, i.e., a small C library which wraps MPI calls with a more "predictable" interface. (This is more or less what the Python and OCaml bindings do too; see https://forge.ocamlcore.org/projects/ocamlmpi/ and http://mpi4py.scipy.org.) Things are running smoothly; so far I haven't had any problems.
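A sketch of what such a middleware shim might look like follows; the shim_* names are made up for illustration, but the MPI calls are standard. Because the shim is compiled against the cluster's own mpi.h, implementation-specific constants like MPI_BYTE never leak into the foreign-language binding.

```c
/* Sketch: a tiny C shim that hides implementation-specific MPI types
   and constants behind plain functions, so a foreign-language binding
   only has to deal with simple C symbols. */
#include <mpi.h>

int shim_init(void)     { return MPI_Init(NULL, NULL); }
int shim_finalize(void) { return MPI_Finalize(); }

int shim_rank(void) {
    int r;
    MPI_Comm_rank(MPI_COMM_WORLD, &r);
    return r;
}

int shim_size(void) {
    int s;
    MPI_Comm_size(MPI_COMM_WORLD, &s);
    return s;
}

/* Raw byte transfers: the caller never sees MPI_BYTE or MPI_Comm. */
int shim_send_bytes(const void *buf, int len, int dest, int tag) {
    return MPI_Send(buf, len, MPI_BYTE, dest, tag, MPI_COMM_WORLD);
}

int shim_recv_bytes(void *buf, int len, int src, int tag) {
    return MPI_Recv(buf, len, MPI_BYTE, src, tag, MPI_COMM_WORLD,
                    MPI_STATUS_IGNORE);
}
```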
Will scheduling the program without using an MPI library hinder performance greatly?
There are lots of ways to carry out parallel computing. MPI is one, and as comments to your question indicate you can call MPI from Rust with a bit of gymnastics.
But there are other approaches, like the PGAS family (Chapel, OpenSHMEM, Co-array Fortran), or alternative messaging like what Charm++ uses.
MPI is "simply" providing a (very useful, highly portable, aggressively optimized) messaging abstraction, but as long as you have some way to manage the parallelism, you can run anything on a cluster.
I want to compile existing software into a representation that can later be run on different architectures (and OSes).
For that I need a (byte)code that can be easily run or emulated on another arch/OS (LLVM IR? Some RISC assembly?)
Some random ideas:
Compiling into JVM bytecode and running it with Java. Too restricting? Are C compilers available?
MS CIL. Are C compilers available?
LLVM? Can the intermediate representation be run later?
Compiling into a RISC arch such as MMIX. What about system calls?
Then there is the system-call mapping problem, but e.g. the BSDs have system call translation layers.
Are there any already working systems that compile C/C++ into something that can later be run with an interpreter on another architecture?
Edit
Could I compile existing Unix software into a not-so-low-level binary which could be "emulated" more easily than running a full x86 emulator? Something more like the JVM than Xen HVM.
There are several C to JVM compilers listed on Wikipedia's JVM page. I've never tried any of them, but they sound like an interesting exercise to build.
Because of its close association with the Java language, the JVM performs the strict runtime checks mandated by the Java specification. That requires C-to-bytecode compilers to provide their own "lax machine abstraction", for instance producing compiled code that uses a Java array to represent main memory (so pointers can be compiled to integers), and linking the C library to a centralized Java class that emulates system calls. Most or all of the compilers listed there use a similar approach.
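To make the "lax machine abstraction" idea concrete, here is a sketch in plain C of how such a compiler can model memory: one big array stands in for the whole address space, and "pointers" become integer indices into it, which a checked VM like the JVM can represent safely.

```c
/* Sketch: modelling C memory as a single array, with pointers
   compiled down to integer indices into that array. */
#include <stdio.h>
#include <stdint.h>

static uint8_t mem[1 << 16];   /* the program's entire "address space" */

/* roughly what "*(int32_t *)p = v" could compile to */
static void store32(uint32_t p, int32_t v) {
    mem[p]     = (uint8_t)v;
    mem[p + 1] = (uint8_t)(v >> 8);
    mem[p + 2] = (uint8_t)(v >> 16);
    mem[p + 3] = (uint8_t)(v >> 24);
}

static int32_t load32(uint32_t p) {
    uint32_t v = (uint32_t)mem[p]
               | (uint32_t)mem[p + 1] << 8
               | (uint32_t)mem[p + 2] << 16
               | (uint32_t)mem[p + 3] << 24;
    return (int32_t)v;
}

int main(void) {
    uint32_t ptr = 128;            /* a "pointer" is just an index */
    store32(ptr, 1234);
    printf("%d\n", load32(ptr));   /* prints 1234 */
    return 0;
}
```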
C compiled to LLVM bitcode is not platform-independent. Have a look at Google's Portable Native Client; they are trying to address that.
Adobe has Alchemy, which will let you compile C to Flash.
There are C-to-Java and even C-to-JavaScript compilers. However, due to differences in memory management, they aren't very usable.
WebAssembly is now trying to address this by creating a standard bytecode format for the web. But unlike JVM bytecode, WebAssembly is more low-level, working at the abstraction level of C/C++ rather than Java, so it's closer to what's typically called an "assembly language", which is what C/C++ code is normally compiled to.
LLVM is not a good solution for this problem. As beautiful as LLVM IR is, it is by no means machine-independent, nor was it intended to be. It is very easy, and indeed necessary in some languages, to generate target-dependent LLVM IR: sizeof(void*), for example, will be 4 or 8 or whatever when compiled into IR.
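You can see this for yourself: compiling the same source for a 32-bit and a 64-bit target produces IR containing a different constant each time.

```c
/* Compile twice with "clang -S -emit-llvm", once for a 32-bit target
   and once for a 64-bit one: sizeof is folded to a different constant
   (4 vs. 8) in the emitted IR, so the IR is target-dependent. */
#include <stddef.h>

size_t ptr_width(void) {
    return sizeof(void *);   /* already a fixed number in the IR */
}
```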
LLVM also does nothing to provide OS independence.
One interesting possibility might be QEMU. You could compile a program for a particular architecture and then use QEMU user-space emulation to run it on different architectures. Unfortunately, this might solve the target-machine problem, but it doesn't solve the OS problem: QEMU's Linux user-mode emulation only works on Linux systems.
JVM is probably your best bet for both target and OS independence if you want to distribute binaries.
As Ankur mentions, C++/CLI may be a solution. You can use Mono to run it on Linux, as long as it has no native bits. But unless you already have a code base you are trying to port at minimal cost, maybe using it would be counter productive. If it makes sense in your situation, you should go with Java or C#.
Most people who go with C++ do it for performance reasons, but unless you play with very low level stuff, you'll be done coding earlier in a higher level language. This in turn gives you the time to optimize so that by the time you would have been done in C++, you'll have an even faster version in whatever higher level language you choose to use.
The real problem is that C and C++ are not architecture-independent languages. You can write things that are reasonably portable in them, but the compiler also hardcodes aspects of the machine into your code. Think about, for example, sizeof(long). Also, as Richard mentions, there's no OS independence. So unless the libraries you use happen to have the same conventions and exist on multiple platforms, you wouldn't be able to run the application.
Your best bet would be to write your code in a more portable language, or provide binaries for the platforms you care about.
I'm toying with the idea of writing a command line interpreter and I suspect that a functional language such as Clojure is well suited to this task.
I am however 5 years out of a CS degree and my only experience with functional languages was a harrowing experience with Haskell in a third year languages course.
So, is a language such as Clojure ideal for this task? If not, what is an ideal language?
Loose requirements:
Has to run on a JVM
Provide an interactive shell where users enter commands with a CLI like syntax
User commands ultimately end up making calls to a remote service using SOAP.
Thanks!
You can do that approximately out-of-the-box with Clojure and Scala, and with Java if you add BeanShell. You might look at the REPL facilities they already have.
I imagine that's suited only for sophisticated users. But really, it's hard to imagine a language that wouldn't do a fine job on a CLI.
Deciding between platforms: the more modern the system, the more scripting-language convenience it will have.
I certainly know what I would use given your requirements: JRuby. (It has an out-of-the-box REPL, too.)
I don't think a CLI has any specific requirements language-wise; you could probably do just as well writing it in Java or Scala. Ultimately I think language choice is down to:
Which ones you are most comfortable working with.
Which ones have adequate library support for what you want to do (i.e. web services).