Common Lisp: how to set which files defsystem loads depending on a predicate? - common-lisp

I'm writing a project using OpenCL and OpenGL, and parts of the code depend on the GPU and OpenCL driver in use. It's not very straightforward, as classes have to be defined differently per platform. Instead of scattering inline #+ and #- across the different files, which would make the code quite messy, I'd prefer to separate the architecture-specific code into different directories and, for example, detect the hardware by evaluating code at the beginning of the .asd file, then use the result (some predicate, or a feature set by that routine) to determine which files to load later in the same .asd file.
I'm a bit lost and overwhelmed by the theoretical possibilities. Can somebody advise on a feasible, CL-like solution?

Solved it by running code at the beginning of the .asd file, before any defsystem form, to determine the exact platform and push a suitable keyword onto *features*. In the :components part of the defsystem form, the files that should only be loaded on a specific platform can be marked by adding :if-feature <:feature> to the respective (:file ...) forms. Even though there might be more suitable ways to accomplish this: cl rules!
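A minimal sketch of that approach (the system name, file names, feature keyword and detection test here are made up for illustration; real detection would query the OpenCL platform):

;; my-project.asd
;; Top-level code in the .asd file runs when ASDF loads it,
;; before the DEFSYSTEM form below is processed.
(when (probe-file #p"/proc/driver/nvidia/version") ; hypothetical check
  (pushnew :nvidia-gpu *features*))

(defsystem "my-project"
  :components ((:file "package")
               (:file "gpu-nvidia"  :if-feature :nvidia-gpu)
               (:file "gpu-generic" :if-feature (:not :nvidia-gpu))))

:if-feature accepts the same feature expressions as #+, so (:and ...), (:or ...) and (:not ...) all work.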

Related

Calling the agrep .Internal C function from Rcpp

In short: How can I call, from within Rcpp C++ code, the agrep C internal function that gets called when users use the regular agrep function from base R?
In long: I have found multiple questions here about how to invoke, from within Rcpp, a C or C++ function created for another package (e.g. Using C function from other package in Rcpp and Rcpp: Call C function from a package within Rcpp).
The thing that I am trying to achieve, however, is at once simpler and far less documented: directly calling, from within Rcpp, a .Internal C function that ships with base R rather than with another package, without interfacing with R (that is, without doing what is described in Call R functions in Rcpp). How could I do that for the .Internal C function that lies underneath base R's agrep wrapper?
The specific function I am trying to call here is the agrep internal C function. For context, what I am ultimately trying to achieve is to speed up a call to agrep for when millions of patterns must each be checked against each of millions of x targets.
Great question. The long and short of it is "you can't" (in many cases), unless the function is visible in one of the header files in "src/include/". At least not that easily.
Not long ago I had a similar fun challenge, where I tried to get access to the do_docall function (called by do.call), and it is not a simple task. First of all, it is not possible to simply #include <agrep.c> (or something similar). That file isn't available for inclusion, as it is not part of "src/include"; it is compiled, and the uncompiled file is removed (not to mention that one should never "include" a .c file).
If one is willing to go the extra mile, the next step is to "copy" and "alter" the source code: find the function in "src/main/agrep.c", copy it into your package, and then fix any errors you find.
Problems with this approach:
As documented in R-exts, the internal structure sexprec_info is not made public (this is the base structure for all objects in R). Many internal functions use the fields within this structure, so one has to "copy" the structure into your source code to make it visible to your code specifically.
If you ever #include <Rcpp.h> prior to this file, you will need to go through each and every call to internal functions and likely add either an R_ or Rf_ prefix.
The function may contain calls to other "internal" functions, which in turn need to be copied and altered for it to work.
You will also need a clear understanding of what CDR, CAR and similar macros do. The internal functions have a documented structure, where the first argument contains the full call passed to the function, and macros like those two are used to access parts of the call (see the sketch after this list).
I did myself a solid and rewrote do_docall, changing the input format to avoid having to consider this. But this takes time. The alternative is to create a pairlist according to the documentation, set its type to a call-sexp (the exact name is lost to me at the moment) and pass the appropriate arguments for op, args and env.
And lastly, if you go through the steps and find that it is necessary to copy the internal structures of sexprec_info (as described above), then you will need to be very careful about when you include Rinternals and Rcpp, as either of these can cause your code to crash and burn in the most beautiful and silent way if you include your header and these in the wrong order! Note that this even goes for [[Rcpp::export]], which may itself turn out to include them in the wrong order!
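For illustration, here is roughly the calling convention such internal functions use (a sketch only; my_internal and its two extracted arguments are hypothetical, but the SEXP signature and the CAR/CADR accessors come from Rinternals.h):

#include <Rinternals.h>

/* .Internal-style entry point: args is a pairlist of the evaluated
   arguments; CAR takes its first element, CDR the rest, CADR the second. */
SEXP my_internal(SEXP call, SEXP op, SEXP args, SEXP env)
{
    SEXP pattern = CAR(args);   /* first argument  */
    SEXP x       = CADR(args);  /* second argument */
    /* ... do the actual work on pattern and x ... */
    return R_NilValue;
}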
If you are willing to go this far down the drainage, I would suggest carefully reading "R's C interface" in adv-R, Chapters 2, 5 and 6 of R-ext, and maybe even the R internals manual. Once that is done, take a look at do_docall from src/main/coerce.c and compare it to the implementation in my repository cmdline.arguments/src/utils/{cmd_coerce.h, cmd_coerce.c}. In this version I have:
Added all the internal structures that are not public, so that I can access their unmodified form (unmodified by the current session).
This includes the table used to store the currently used SEXPs, which was used as a lookup. This caused a problem, as I can't access the modified version, so my code is slightly altered, with the old code disabled by the macro #if --- defined(CMDLINE_ARGUMENTS_MAYBE_IN_THE_FUTURE). Luckily the code causing the problem had a static answer, so I could work around it (but this might not always be the case).
Added quite a few Rf_ prefixes, as the macro versions are not available (since I #include <Rcpp.h> at some point).
The code has been split into smaller functions to make it more readable (for my own sake).
The function has one additional argument (name), which is not used in the internal function, along with some added errors (for my specific needs).
This implementation will be frozen "for all time to come" as I've moved on to another branch (and this one is frozen for my own future benefit, if I ever want to walk down this path again).
I spent a few days scouring the internet for information on this and found two different posts talking about how this could be achieved, and my approach basically copies them. Whether this is actually allowed in a CRAN package is a whole other question (and not one that I will be testing out).
The same approach applies if you want to use non-public code from other packages, although there it is often as simple as copy-pasting their files into your repository.
As a final side note, you mention the intent is to "speed up" your code for when you have to perform millions upon millions of calls to agrep. This seems like a case where one should consider performing the task in parallel. Even after going through the steps outlined above, creating N parallel sessions that each take care of K evaluations (say 100,000) would be the first step to reduce computing time. Of course each session should be given a batch, and not a single call to agrep.
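A minimal sketch of that batching idea, using base R's parallel package (patterns, targets and n_cores stand in for the question's data; mclapply forks, so this works on Unix-alikes only):

library(parallel)

n_cores <- 8
# Split the millions of patterns into one batch per worker ...
batches <- split(patterns, cut(seq_along(patterns), n_cores))

# ... and let each forked session run agrep over its whole batch.
hits <- mclapply(batches, function(batch) {
  lapply(batch, agrep, x = targets)
}, mc.cores = n_cores)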

Why use macros in Julia?

I was reading up on the documentation of macros and ran into the following under the "Hold up: why macros?" section. The reasoning given to use macros is as follows:
Macros are necessary because they execute when code is parsed, therefore, macros allow the programmer to generate and include fragments of customized code before the full program is run.
This leads me to wonder why someone would want to use "generate and include fragments of customized code before the full program is run". Can someone provide context as to why this would be beneficial and/or other good use cases for macros?
Let me give you my view on macros.
A macro is basically a code -> code function. It takes code (a Julia expression) as input and spits out code (a different Julia expression).
Why is this useful? It has multiple purposes:
compile-time copy-and-paste: You don't have to write the same piece of code multiple times; instead, you can define a short macro that writes it for you wherever you put it. (example)
domain-specific language (DSL): You can create special syntax that, after the macro's code -> code transform, is replaced by pure Julia constructs. This is used in many packages to define special syntax, for example here and here.
code generation: Imagine you want to write a really long piece of code which, although long, is very simple because it follows some pattern that repeats itself rather trivially. Writing that code by hand can be a pain (or even practically impossible). A macro can programmatically generate the code for you. One example is for-loop unrolling (see here and here). But even the @time macro isn't doing much more than putting a bunch of Base.time_ns() function calls around the provided Julia expression.
special string parsing: If you type the literal 3.2 in Julia, it will be parsed and interpreted as a Float64. Now imagine you want to supply a number literal that goes beyond Float64 precision but would fit into a BigFloat. Typing big(3.123124812498124812498) won't work, because the literal number is first interpreted as a Float64 and only then handed to the big function. Instead you need a way to tell Julia at parse time that this should become a BigFloat. This is handled by the @big_str macro, which (for convenience) can also be invoked as big"3.2"; the latter is just syntactic sugar.
There might be many more applications of macros, but those are the most important to me.
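To make the code -> code idea concrete, here is a tiny sketch (the macro name @logged is made up):

# A macro receives an expression and returns a new expression.
# Here we wrap the input so it prints itself before evaluating.
macro logged(ex)
    quote
        println("evaluating: ", $(string(ex)))
        $(esc(ex))   # esc() so names resolve in the caller's scope
    end
end

x = 2
@logged x + 3   # prints "evaluating: x + 3" and returns 5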
Let me end by referencing Steven G. Johnson's great talk at JuliaCon 2019:
Most of the time, don't do metaprogramming :)

What is the effect of ftype declarations on built-in functions in SBCL?

I'm building on some old Common Lisp code written by others, which includes lines such as the following at the start of a few functions:
(declare (ftype (function (&rest float) float) + - * min max))
My understanding is that the purpose of this is to tell the compiler that the five functions listed at the end of the form will only be passed floats. The compiler may use this information to create more efficient code.
Some Lisps do not complain about this declaration (ABCL, CCL, ECL, LispWorks, CLISP), but SBCL will not accept this declaration in the default configuration. SBCL can be made to accept it by placing
(unlock-package 'common-lisp)
in the .sbclrc initialization file. That's what I've been doing for the last year or so. I assume that this is needed because +, -, etc. are in that package, and the code alters these functions' declarations.
My question is: Can declaring the function type of built-in functions such as + and min have a beneficial effect on compiled code in SBCL? (If it can, then why does SBCL complain about these declarations by default?) Would I be better off removing such ftype declarations, and then getting rid of the unlock-package line in .sbclrc?
Thanks.
My understanding is that the purpose of this is to tell the compiler that the five functions listed at the end of the form will only be passed floats. The compiler may use this information to create more efficient code.
Also, they will only return floats. With certain optimization settings, a Common Lisp compiler does not generate runtime checks and may generate code for float computations only. SBCL may also show compile-time warnings in cases where it detects that code violates type declarations.
It's also a source of errors, since from now on (within the scope of the declaration) basic functions like + and - are declared not to work on other number types (integer, complex, ...).
So, what is the purpose for these declarations? Since it is portable code (and most implementations don't implement compile-time type checking), it can only be for optimization purposes. Some of that might not be necessary in SBCL, since it uses type inference.
Why does SBCL not allow altering the built-in functionality by default? To prevent you from shooting yourself in the foot: you are altering the base language, and now basic numeric operations may lead to errors.
Ways to deal with that:
use only local declarations, don't alter the language globally. You indicate that these are only locally declared - that's good.
declare the values of variables instead
write special functions for the float case and declare them inline (see the sketch after this list).
only unlock the package CL during compilation of these few functions; keep it locked afterwards.
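A sketch of the second and third alternatives (the function and variable names are illustrative):

;; Declare the types of the variables instead of redeclaring CL:+ itself.
(defun add3 (a b c)
  (declare (single-float a b c))
  (+ a b c))

;; Or write a dedicated float version and inline it.
(declaim (inline f+))
(defun f+ (a b)
  (declare (single-float a b))
  (the single-float (+ a b)))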
My question is: Can declaring the function type of built-in functions such as + and min have a beneficial effect on compiled code in SBCL?
You can check that by looking at the disassembled code and also by profiling. Make sure that you compile the function with the right optimization settings. In Common Lisp, the function DISASSEMBLE shows you the compiled code in a readable way. The SBCL compiler will also tell you when it can't optimize the compiled code.
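For example (the optimization qualities and the toy function are arbitrary choices):

(defun f (x y)
  (declare (optimize (speed 3) (safety 0))
           (single-float x y))
  (+ x y))

(disassemble #'f)   ; prints the machine code SBCL generated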

Why would Common Lisp (SBCL) use so much memory for a simple program?

Since I'm a newbie to Common Lisp, I tried to solve problems on SPOJ using Common Lisp (SBCL). The first problem is a simple task of reading numbers until the number 42 is found. Here's my solution:
(defun study-num ()
  (let ((num (parse-integer (read-line t))))
    (when (not (= num 42))
      (format t "~A~%" num)
      (study-num))))

(study-num)
The solution is accepted. But when I looked into the details of the result, I found it used 57M of MEM! That seems bloody unreasonable, but I can't figure out why. What can I do to optimize it?
You are making repeated recursive calls, without enough optimization switched on to enable tail-call elimination (SBCL does do this, but only when you have "optimize for speed" set high and "optimize for debug info" set low).
The Common Lisp standard leaves tail-call elimination as an implementation quality issue and provides other looping constructs (like LOOP or DO, both possibly suitable for this application).
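For instance, the same task with LOOP needs no recursion at all (a sketch equivalent to the posted function):

(loop for num = (parse-integer (read-line t))
      until (= num 42)
      do (format t "~A~%" num))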
In addition, a freshly started SBCL is probably going to be larger than you expect, due to needing to pull in its runtime environment and base image.
I think you are not realizing that Common Lisp is an online language environment, with the full library and compiler loaded into RAM just to give you the first prompt. After that, loading your program is a hardly even noticeable increase in size. Lisp does not compile and link an independent executable file made of only your code and whatever library routines are reachable from your code; that's what C and similar languages do. Instead, Lisp adds your code into its already sizeable online environment. As a new user it seems horrible, but if you have a modern general-purpose computer with hundreds of MB of RAM, it quickly becomes something you can forget about as you enjoy the benefits of the online environment. This is also called a "dynamic language environment."
Various Lisp implementations have different ways to create programs. One is to dump an image of the memory of a Lisp system and to write that to disk. On restart this image is loaded with a runtime and then started again. This is quite common.
This is also what SBCL does when it saves an executable. Thus this executable includes the full SBCL.
Some other implementations create smaller executables from images (CLISP), some can remove unused code from executables (Allegro CL, LispWorks), and others create very small programs via compilation to C (mocl).
SBCL has only one easy way to reduce the size of an executable: one can compress the image.
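For the record, that looks like this (it requires an SBCL built with core-compression support; the output filename is arbitrary):

(sb-ext:save-lisp-and-die "my-app"
                          :executable t
                          :compression t)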

How can I label my sub-processes for logging when using multicore and doMC in R

I have started using the doMC package for R as the parallel backend for parallelised plyr routines.
The parallelisation itself seems to be working fine (though I have yet to properly benchmark the speedup), but my problem is that the logging is now asynchronous and messages from different cores get mixed in together. I could create different logfiles for each core, but I think a neater solution is to simply add a different label for each core. I am currently using the log4r package for my logging needs.
I remember that when using MPI, each processor got a rank, which was a way of distinguishing each process from the others, so is there a way to do this with doMC? I did have the idea of extracting the PID, but this seems messy and will change with every iteration.
I am open to ideas though, so any suggestions are welcome.
EDIT (2011-04-08): Going with the suggestion of one answer, I still have the issue of correctly identifying which subprocess I am currently inside. I would either need separate closures for each log() call so that each writes to the correct file, or a single log() function with some logic inside it determining which logfile to append to. In either case, I would still need some way of labelling the current subprocess, but I am not sure how to do this.
Is there an equivalent of the mpi_rank() function in the MPI library?
I think having multiple processes write to the same file is a recipe for disaster (it's just a log, though, so maybe "disaster" is a bit strong).
Oftentimes I parallelize work over chromosomes. Here is an example of what I'd do (I've mostly been using foreach/doMC):
foreach(chr = chromosomes, ...) %dopar% {
  cat("+++", chr, "+++\n")
  ## ... some undoubtedly amazing code would then follow ...
}
And it wouldn't be unusual to get output that tramples over itself ... something like (not exactly) this:
+++chr1+++
+++chr2+++
++++chr3++chr4+++
... you get the idea ...
If I were in your shoes, I think I'd split the logs for each process and set their respective filenames to be unique with respect to something happening in that process's loop (like chr in my case above). Collate them later if you must ... i.e. map/reduce your log files :-)
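A sketch of that per-iteration log file idea, using log4r's classic create.logger interface (chromosomes and the core count are placeholders from the discussion above):

library(foreach)
library(doMC)
library(log4r)
registerDoMC(cores = 4)

foreach(chr = chromosomes) %dopar% {
  # One file per iteration, so parallel writers never collide.
  logger <- create.logger(logfile = sprintf("run-%s.log", chr),
                          level = "INFO")
  info(logger, sprintf("[%s] worker pid %d starting", chr, Sys.getpid()))
  ## ... the actual work ...
}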
