I am writing an R package that should be able to compile C++ code on the fly. In practice, users can define, at run-time, operators based on C++ code that is compiled and then used in computation (for efficiency, as with PyTorch or TensorFlow models in Python). Ideally, the code compiled at run-time should use Rcpp features to be exported to R.
Example:
In my R package, I have a function def_operator that can parse some mathematical formula defining an operator.
my_custom_op <- def_operator("x+y", args = c("x", "y"))
My Cpp API knows how to generate the Cpp code associated to this formula. This code should be compiled on the fly (just once, not at each call).
The user can use this new function to do some computations.
res <- my_custom_op(1, 3) # should give 4
Note: this is only an example. The operators defined by the user are meant to do more than add scalar numbers, and the whole point is to let users define their own operators rather than to pre-define some generic operators compiled at installation.
I know two things for the moment:
the Cpp code required to generate the operators (which is not compiled at installation) should be put in the inst package directory, it will be copied at installation and I can find where with the R function find.package.
I can use the function sourceCpp to compile code on the fly. Thus I can define some functions in Cpp that will be automatically exported to R and be callable there. It is even possible to keep the shared library to avoid multiple compilations (see Rcpp: how to keep files generated by sourceCpp?)
Here are my questions:
Do you know some alternative to sourceCpp from the Rcpp package to compile C++ code on the fly and export it to R?
Is there some way to manage compilation option for sourceCpp other than using the file ~/.R/Makevars (I need to link the code in the inst directory and I don't want to edit this file on the user system)?
Eventually, do you know some R packages implementing compilation on the fly that I could take as examples?
Do you know some alternative to sourceCpp from the Rcpp package to compile C++ code on the fly and export it to R?
Using sourceCpp() is the best approach. Alternatively, you can use its predecessor from the inline R package. Otherwise, you will need to build your own file via R CMD SHLIB, load the library, and create a wrapper yourself. (Not fun.)
Is there some way to manage compilation option for sourceCpp other than using the file ~/.R/Makevars (I need to link the code in the inst directory and I don't want to edit this file on the user system)?
Yes, there are many Makevars variables that can be set per R session via Sys.setenv("PKG_LIBS" = ...).
Now, to retrieve a file location dynamically, consider RcppMLPACK's flag function approach.
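As a rough sketch of the per-session approach described above: the helper names (include_flag, compile_operator), the package name "mypkg" and the inst/include layout are all hypothetical, and the sketch assumes PKG_CPPFLAGS is picked up by the build in the same way as PKG_LIBS. Passing cacheDir to sourceCpp() should let later calls reuse the compiled library instead of recompiling.

```r
# Hypothetical helper: build an include flag for headers shipped in the
# installed package (for example under inst/include).
include_flag <- function(dir) {
  paste0('-I"', dir, '"')
}

# Hypothetical helper: compile generated C++ code once, with a session-level
# PKG_CPPFLAGS instead of an edited ~/.R/Makevars.
compile_operator <- function(code, include_dir, cache_dir) {
  old <- Sys.getenv("PKG_CPPFLAGS", unset = NA)
  Sys.setenv(PKG_CPPFLAGS = include_flag(include_dir))
  # Restore the previous value when the function exits.
  on.exit(
    if (is.na(old)) Sys.unsetenv("PKG_CPPFLAGS") else Sys.setenv(PKG_CPPFLAGS = old),
    add = TRUE
  )
  # cacheDir keeps the generated shared library for reuse.
  Rcpp::sourceCpp(code = code, cacheDir = cache_dir)
}

# Possible usage ("mypkg" is a placeholder package name):
# compile_operator(
#   code = paste(
#     "#include <Rcpp.h>",
#     "// [[Rcpp::export]]",
#     "double my_custom_op(double x, double y) { return x + y; }",
#     sep = "\n"
#   ),
#   include_dir = system.file("include", package = "mypkg"),
#   cache_dir = tools::R_user_dir("mypkg", "cache")
# )
```

The environment variable is restored on exit so that the setting does not leak into other compilations in the same session.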
Eventually, do you know some R packages implementing compilation on the fly that I could take as examples?
There are a couple entrants in this market:
armacmp package by Dirk Schumacher that translates R code to C++ under the armadillo library.
nCompiler package by Perry de Valpine et al. for code-generating C++ and easily interfacing between R and C++.
Related
When I do a package including some C code or using Rcpp, I type the roxygen code:
#' @useDynLib TheDLL, .registration = TRUE
I did a package in which I included some DLLs created with Haskell, that I put in the inst/libs folder. I didn't type .registration = TRUE in the roxygen code and the package works fine. Should I type it nevertheless? If so, what is the role of .registration = TRUE?
I think you almost certainly shouldn't use it in a general-purpose DLL, but if the DLL was written specifically for R, maybe you should. It indicates that the DLL calls R_registerRoutines from its R_init_DLLNAME function, so entry points can be saved into variables. For example, you might have a function named "foo". You can call it using
.Call("foo", ...)
without registering it, and R will need to search symbol tables for it at run time. Or you can register it and call it as
.Call(foo, ...)
and the search is unnecessary. This is discussed mainly in section 5.4.2 of "Writing R Extensions". I believe that if you specify .registration = TRUE then R will use the registration information to find entry points; otherwise it needs to search through all the exports of the DLL, which is probably slower.
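To see what registration looks like from the R side, one can inspect a DLL that is known to register its routines. The sketch below uses the stats package purely as an illustration (that choice is an assumption, not part of the question) and relies only on base R's getDLLRegisteredRoutines().

```r
# Make sure the stats namespace (and its DLL) is loaded.
loadNamespace("stats")

# List the routines that the stats DLL registered via R_registerRoutines.
routines <- getDLLRegisteredRoutines("stats")

# The result is grouped by calling interface (.C, .Call, .Fortran, .External).
names(routines)

# A few of the registered .Call entry points.
head(names(routines$.Call))
```

A DLL that never calls R_registerRoutines would show no entries here, and its functions could only be reached through the string form of .Call.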
I have already made a simple R package (pure R) to solve a problem with brute force, then I tried to speed up the code by writing an Rcpp script. I wrote a script to compare the running time with the "bench" library. Now, how can I add this script to my package? I tried to add
#' @importFrom Rcpp cppFunction
on top of my R script and inserting the Rcpp file in the scr folder, but it didn't work. Is there a way to add it to my R package without creating the package from scratch? Sorry if it has already been asked, but I am new to all this and completely lost.
That conversion is actually (still) surprisingly difficult (in the sense of requiring more than just one file). It is easy to overlook details. Let me walk you through why.
Let us assume for a second that you started a working package using the R function package.skeleton(). That is the simplest and most general case. The package will work (yet produce warnings; see my pkgKitten package for a wrapper that cleans up, and a dozen other package helping functions and packages on CRAN). Note in particular that I have said nothing about roxygen2, which at this point is just an added complication, so let's focus on just .Rd files.
You can now contrast your simplest package with one built by and for Rcpp, namely by using Rcpp.package.skeleton(). You will see at least these differences in
DESCRIPTION for LinkingTo: and Imports
NAMESPACE for importFrom as well as the useDynLib line
a new src directory and a possible need for src/Makevars
All of which make it easier to (basically) start a new package via Rcpp.package.skeleton() and copy your existing package code into that package. We simply do not have a conversion helper. I still do the "manual conversion" you tried every now and then, and even I need a try or two and I have seen all the error messages a few times over...
So even if you don't want to "copy everything over" I think the simplest way is to
create two packages with and without Rcpp
do a recursive diff
ensure the difference is applied in your original package.
PS And remember that when you use roxygen2 and have documentation in the src/ directory to always first run Rcpp::compileAttributes() before running roxygen2::roxygenize(). RStudio and other helpers do that for you but it is still easy to forget...
How can I find the source C code of the function grDevices:::C_col2rgb?
I've been led to this function after benchmarking (using R pkg profvis) some RGL functions, namely rgl:::rgl.quads and functions called therein. The corresponding R function that wraps C_col2rgb is col2rgb from grDevices. I'm interested in looking at the source of C_col2rgb to see whether I could make a faster version.
And, in general, when you encounter a C function being used in R code, is there an expeditious way of finding its source code?
Many thanks!
Normally when you want to view the source code of an R function, you can just type its name in the console and press enter. However, when that function is written in another language, such as C, and exposed to R, you will just eventually see (something like)
.Call(C_col2rgb, col, alpha)
where R calls the compiled code. To see the source code of such functions, you actually have to look at the package source code. The function you are talking about is in the grDevices package, which is part of what is often called "base R" (not (necessarily) to be confused with the R package base) -- the package ships with all R installations.
There is an R source code mirror on GitHub at https://github.com/wch/r-source that I like to consult if I need to look at R's source code. The code for the grDevices package is there at https://github.com/wch/r-source/tree/trunk/src/library/grDevices.
As I mentioned in the comments, you can find the code for C_col2rgb() at r-source/src/library/grDevices/src/colors.c. However, there it looks like it's just called col2rgb(). Is it really the same?
Yes. If you consult Writing R Extensions, Section 1.5.4, you see that
A NAMESPACE file can contain one or more useDynLib directives which allows shared objects that need to be loaded.... Using argument .fixes allows an automatic prefix to be added to the registered symbols, which can be useful when working with an existing package. For example, package KernSmooth has
useDynLib(KernSmooth, .registration = TRUE, .fixes = "F_")
which makes the R variables corresponding to the Fortran symbols F_bkde and so on, and so avoid clashes with R code in the namespace.
We can see in the NAMESPACE file for grDevices
useDynLib(grDevices, .registration = TRUE, .fixes = "C_")
So, the C functions that are made available from this package will all be prefixed with C_ even though they aren't in the C source code. This lets you call both the R and the C functions col2rgb without causing any problems.
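The prefixing can be checked from an R session. The following is a small sketch using only base R; it looks at the R variable that .fixes creates and at the name of the C symbol it points to.

```r
# C_col2rgb is an R object created in the grDevices namespace by useDynLib
# with .fixes = "C_"; it describes a registered native routine.
sym <- grDevices:::C_col2rgb

# The object is a NativeSymbolInfo (more specifically a .Call routine).
class(sym)

# The name of the underlying C symbol carries no "C_" prefix.
sym$name

# The DLL that provides the routine.
sym$dll[["name"]]
```

So the name to search for in colors.c is the value of sym$name, not the prefixed R variable name.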
I'm building a package that uses two main functions. One of the functions model.R requires a special type of simulation sim.R and a way to set up the results in a table table.R
In a sharable package, how do I call both the sim.R and table.R files from within model.R? I've tried source("sim.R") and source("R/sim.R") but that call doesn't work from within the package. Any ideas?
Should I just copy and paste the codes from sim.R and table.R into the model.R script instead?
Edit:
I have all the scripts in the R directory, the DESCRIPTION and NAMESPACE files are all set. I just have multiple scripts in the R directory. ~R/ has premodel.R model.R sim.R and table.R. I need the model.R script to use both sim.R and table.R functions... located in the same directory in the package (e.g. ~R/).
To elaborate on joran's point, when you build a package you don't need to source functions.
For example, imagine I want to make a package named TEST. I will begin by generating a directory (i.e. folder) named TEST. Within TEST I will create another folder named R; in that folder I will include all R script(s) containing the different functions in the package.
At a minimum you need to also include a DESCRIPTION and NAMESPACE file. A man (for help files) and tests (for unit tests) are also nice to include.
Making a package is pretty easy. Here is a blog with a straightforward introduction: http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
As others have pointed out, you don't have to source R files in a package. The package loading mechanism will take care of loading the namespace and making all exported functions available. So usually you don't have to worry about any of this.
There are exceptions however. If you have multiple files with R code situations can arise where the order in which these files are processed matters. Often it doesn't matter or the default order used by R happens to be fine. If you find that there are some dependencies within your package that aren't resolved properly you may be faced with a situation where a custom processing order for the R files is required. The DESCRIPTION file offers the optional Collate field for this purpose. Simply list all your R files in the order they should be processed to satisfy the dependencies.
If all your files are in the R directory, every function will be in memory after you do a package build or load_all().
You may have issues if you have code in files that is not in a function, though.
R loads files in alphabetical order.
Usually, this is not a problem, because functions are evaluated when they are called for execution, not at loading time (i.e. a function can refer to another function that is not yet defined, even in the same file).
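A minimal illustration of that point, in plain R outside of any package. The function bodies are made up for the example, and table_results is a hypothetical stand-in for whatever table.R defines.

```r
# model() is defined first and refers to two functions that do not exist yet.
# This is fine: the names are only looked up when model() is called.
model <- function(n) {
  table_results(sim(n))
}

# Defined later, as they would be if sim.R and table.R were loaded after model.R.
sim <- function(n) {
  seq_len(n) * 2
}

table_results <- function(values) {
  data.frame(value = values)
}

# By the time model() is called, everything it needs has been defined.
res <- model(3)
```

The same call placed at the top level of model.R, before sim() exists, is the kind of code that fails at load time.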
But if you have code outside a function in model.R, this code will be executed immediately when the file is loaded, and your package build will usually fail with an error such as
ERROR: lazy loading failed for package 'yourPackageName'
If this is the case, wrap the loose code of model.R into a function so you can call it later, when the package has fully loaded, including any external libraries.
If this piece of code is there to initialize some value, consider using use_data() to have R take care of loading the data into the environment for you.
If this piece of code is just interactive code written to test and implement the package itself, you should consider putting it elsewhere or wrapping it in a function anyway.
If you really need that code to be executed at loading time, or really have a dependency to resolve, then you must add the Collate line to the DESCRIPTION file, as already stated by Peter Humburg, to force the order in which R loads the files.
Roxygen2 can help you with this: put the following before your code
#' @include sim.R table.R
then call roxygenize(), and the Collate line will be generated for you in the DESCRIPTION file.
But even then, the external libraries you depend on may not yet be loaded by the package, leading to failure again at build time.
In conclusion, you had better not leave code outside functions in a .R file if it is located inside a package.
Since you're building a package, the reason why you're having trouble accessing the other functions in your /R directory is because you need to first:
library(devtools)
document()
from within the working directory of your package. Now each function in your package should be accessible to any other function. Then, to finish up, do:
build()
install()
although it should be noted that a simple document() call will already be sufficient to solve your problem.
Make your functions global by defining them with <<- instead of <- and they will become available to any other script running in that environment.
I'm writing an R package which is beginning to grow in size, so I would really appreciate being able to use a custom structure in the folders pkg/R/ and (especially) pkg/src/.
For example, let's say I have two families of algorithms of some type A, and some functions of type B, and a main entry point. Ideally the R/ or src/ folders would be organized as follows:
typeA/
    algorithms1/
        algo11.ext
        ...
    algorithms2/
        algo21.ext
        ...
typeB/
    function1.ext
    ...
main.ext
with "ext" in {R,cpp,c,f,...}, and potentially two files having the same name.
Is it possible? If yes, how can I do that?
Thanks in advance!
[2012-12-31] EDIT: an idea would be to write a few scripts - maybe inside another R package - to (un)flatten a structured package for tests or diffusion. But there is probably a better solution, so I will wait a bit.
As the 'Writing R Extensions' manual indicates here, a Makevars file under pkg/src makes it possible to have nested subfolders for C/C++/Fortran code. (See e.g. the RSiena package.)
However, I didn't find anything concerning a custom structure in pkg/R. So I wrote a little package (usable, although needing improvements) which accomplishes the following tasks:
Load/Unload a package having (potentially) nested folders under pkg/R
Launch R and/or C unit tests on it [basic framework, to be replaced (e.g. RUnit and check)]
Export the package to be CRAN-compatible (flatten R code, generate Makevars file)
I will link it here if it reaches a publishable state. (For the moment I could send it by email).
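The "flatten R code" step could look roughly like the sketch below. This is not the author's package, only an assumed illustration in base R: the function name flatten_r_dir and the choice of joining path components with "_" are hypothetical.

```r
# Hypothetical helper: copy every .R file found under a nested `src` directory
# into a flat `dest` directory, encoding the sub-path in the file name so that
# two files with the same base name do not overwrite each other.
flatten_r_dir <- function(src, dest) {
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(src, pattern = "\\.[Rr]$", recursive = TRUE)
  flat <- gsub("/", "_", files, fixed = TRUE)
  if (anyDuplicated(flat) > 0) {
    stop("flattened file names collide")
  }
  ok <- file.copy(file.path(src, files), file.path(dest, flat), overwrite = TRUE)
  stopifnot(all(ok))
  invisible(flat)
}

# Small demonstration in a temporary directory, mirroring the layout above.
src <- file.path(tempfile("pkg"), "R")
dir.create(file.path(src, "typeA", "algorithms1"), recursive = TRUE)
dir.create(file.path(src, "typeB"))
writeLines("algo11 <- function() 1",
           file.path(src, "typeA", "algorithms1", "algo11.R"))
writeLines("function1 <- function() 2",
           file.path(src, "typeB", "function1.R"))
writeLines("main <- function() algo11() + function1()",
           file.path(src, "main.R"))

flat <- flatten_r_dir(src, file.path(tempfile("flat"), "R"))
```

Because the flattened names follow the folder names, the alphabetical loading order changes; a Collate field may be needed if any file runs code at load time.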
The official package documentation https://cran.r-project.org/doc/manuals/r-devel/R-exts.html, section 1.1.5 contains this quote:
The R and man subdirectories may contain OS-specific subdirectories named unix or windows.
I've tried creating a simple test package with subdirectories in R-3.5.1 and it did not work properly.
Neither devtools::load_all() nor R CMD build successfully exported code from subdirectories in R.