How can I find the source C code of the function grDevices:::C_col2rgb?
I've been led to this function after benchmarking (using R pkg profvis) some RGL functions, namely rgl:::rgl.quads and functions called therein. The corresponding R function that wraps C_col2rgb is col2rgb from grDevices. I'm interested in looking at the source of C_col2rgb to see whether I could make a faster version.
And, in general, when you encounter a C function being used in R code, is there an expeditious way of finding its source code?
Many thanks!
Normally when you want to view the source code of an R function, you can just type its name in the console and press enter. However, when that function is written in another language, such as C, and exposed to R, you will just eventually see (something like)
.Call(C_col2rgb, col, alpha)
where R calls the compiled code. To see the source code of such functions, you actually have to look at the package source code. The function you are talking about is in the grDevices package, which is part of what is often called "base R" (not (necessarily) to be confused with the R package base) -- the package ships with all R installations.
There is an R source code mirror on GitHub at https://github.com/wch/r-source that I like to consult if I need to look at R's source code. The code for the grDevices package is there at https://github.com/wch/r-source/tree/trunk/src/library/grDevices.
As I mentioned in the comments, you can find the code for C_col2rgb() at r-source/src/library/grDevices/src/colors.c. However, there it looks like it's just called col2rgb(). Is it really the same?
Yes. If you consult Writing R Extensions, Section 1.5.4, you see that
A NAMESPACE file can contain one or more useDynLib directives which allows shared objects that need to be loaded.... Using argument .fixes allows an automatic prefix to be added to the registered symbols, which can be useful when working with an existing package. For example, package KernSmooth has
useDynLib(KernSmooth, .registration = TRUE, .fixes = "F_")
which makes the R variables corresponding to the Fortran symbols F_bkde and so on, and so avoid clashes with R code in the namespace.
We can see in the NAMESPACE file for grDevices
useDynLib(grDevices, .registration = TRUE, .fixes = "C_")
So, the C functions that are made available from this package will all be prefixed with C_ even though they aren't in the C source code. This lets you call both the R and the C functions col2rgb without causing any problems.
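To see the prefixing from the R side, here is a minimal sketch that inspects the object behind C_col2rgb using only base R. It assumes a standard R installation; ::: is used purely for inspection here, not as a recommended way of calling the routine.

```r
# The R variable created by useDynLib(..., .fixes = "C_") is a
# registered-symbol object, not a character string.
sym <- grDevices:::C_col2rgb

inherits(sym, "NativeSymbolInfo")  # TRUE for registered native symbols
sym$name                           # name of the underlying C entry point
sym$dll[["name"]]                  # name of the DLL that provides it
```

The name field holds the unprefixed C name, which is the one to search for in colors.c.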
Related
When I do a package including some C code or using Rcpp, I type the roxygen code:
#' @useDynLib TheDLL, .registration = TRUE
I made a package that includes some DLLs created with Haskell, which I put in the inst/libs folder. I didn't add .registration = TRUE to the roxygen code and the package works fine. Should I add it nevertheless? If so, what is the role of .registration = TRUE?
I think you almost certainly shouldn't use it in a general-purpose DLL, but if the DLL was written specifically for R, maybe you should. It indicates that the DLL calls R_registerRoutines from its R_init_DLLNAME function, so entry points can be saved into variables. For example, you might have a function named "foo". You can call it using
.Call("foo", ...)
without registering it, and R will need to search symbol tables for it at run time. Or you can register it and call it as
.Call(foo, ...)
and the search is unnecessary. This is discussed mainly in section 5.4.2 of "Writing R Extensions". I believe that if you specify .registration = TRUE then R will use the registration information to find entry points; otherwise it needs to search through all the exports of the DLL, which is probably slower.
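To see what registration information looks like for a DLL that does register its routines, a small base-R sketch can list the entries of grDevices (used here only as a convenient example of a registered DLL; your own DLL name would go in its place).

```r
loadNamespace("grDevices")  # make sure the DLL is loaded

# Routines the DLL announced through R_registerRoutines, grouped by interface.
routines <- getDLLRegisteredRoutines("grDevices")
call_names <- names(routines[[".Call"]])  # entry points usable with .Call()

head(call_names)
"col2rgb" %in% call_names
```

A DLL that never calls R_registerRoutines would show empty tables here, and R would have to fall back on searching the DLL's exported symbols by name.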
I’ve written some R functions and dropped them into a script file using RStudio. These are bits of code that I use over and over, so I’m wondering how I might most easily create an R package out of them (for my own private use).
I’ve read various “how to” guides online but they’re quite complicated. Can anyone suggest an “idiot’s guide” to doing this please?
I've been involved in creating R packages recently, so I can help you with that. Before proceeding to the steps to be followed, there are some pre-requisites, which include:
RStudio
devtools package (for most of the functions involved in creation of a package)
roxygen2 package (for roxygen documentation)
In case you don't have the aforementioned packages, you can install them with these commands respectively:
install.packages("devtools")
install.packages("roxygen2")
Steps:
(1) Import devtools in RStudio by using library(devtools).
(devtools is a core package that makes creating R packages easier with its tools)
(2) Create your package by using:
create_package("~/directory/package_name") for a custom directory.
or
create_package("package_name") if you want your package to be created in current workspace directory.
(3) Soon after you execute this function, it will open a new RStudio session. In the old session you will see some auto-generated lines, which tell you that R created a new package with the required components in the specified directory.
After this, we are done with this old instance of RStudio. We will continue our work on the new RStudio session window.
At this point the package creation part is already over (yes, that simple). However, a package isn't functional just because it has been created: to include a function in it, you also need documentation (the function's title, parameters, return value, examples and so on, written with roxygen tags such as @param and @return, which will look familiar if you have seen roxygen documentation in GitHub repositories) and passing R CMD checks.
I'll get to that in the subsequent steps, but just in case you want to verify that your package is created, you can look at:
The top right corner of the new RStudio session, where you can see the package name that you created.
The console, where you will see that R created a new directory/folder in the path that we specified in create_package() function.
The files panel of RStudio session, where you'll notice a bunch of new files and directories within your directory.
(4) As you mentioned, you keep your functions in a script file, so you will need to create the script first, which can be done using:
use_r("function_name")
A new R script will pop up in your working session, ready to be used.
Now go ahead and write your function(s) in it.
(5) After you're done, you need to load the function(s) you have written for your package. This is accomplished with the devtools::load_all() function.
When you execute load_all() in the console, you'll know the functions have been loaded when you see Loading package_name displayed in the console.
You can try calling your functions after that in the console to verify that they work as a part of the package.
(6) Now that your function has been written and loaded into your package, it is time to move onto checks. It is a good practice to check the whole package as we make changes to our package. The function devtools::check() offers an easy way to do this.
Try executing check() in the console. It will go through a number of procedures, checking your package and reporting any errors, warnings and notes as messages on the screen. The R CMD check results at the end summarize which errors and warnings you got, along with their counts.
If the functions in your package are written well (with additional package dependencies taken care of), check() will give you two warnings:
The first warning will be regarding the license that your package uses, which is not specified for a new package.
The second should be the one for documentation, warning us that our code is not documented.
To resolve the first issue, the license, use the use_mit_license("license_holder_name") command with your name in place of license_holder_name (or use any other license that suits your package; for private use, as you mentioned, it doesn't really matter what you specify if only you are going to use it and it is not going to be distributed).
This will add the License field to the DESCRIPTION file (in your files panel) and create additional files containing the license information.
You'll also need to edit the DESCRIPTION file, which has self-explanatory fields to fill in or edit. Here is an example of how it can look:
Package: Your_package_name
Title: Give a brief title
Version: 1.0.0.0
Authors@R:
    person(given = "Your_first_name",
           family = "Your_surname/family_name",
           role = c("aut", "cre"),
           email = "youremailaddress@gmail.com",
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: Give a brief description considering your package functionality.
License: will be updated with whatever license you provide, the above step will take care of this line.
Encoding: UTF-8
LazyData: true
To resolve the documentation warning, you'll need to document your function using roxygen documentation. An example:
#' @param a parameter one
#' @param b parameter two
#' @return sum of a and b
#' @export
#'
#' @examples
#' yourfunction(1,2)
yourfunction <- function(a,b)
{
  sum <- a+b
  return(sum)
}
Follow the roxygen syntax and add tags as you need. Some are optional, such as @title for specifying the title, while others, such as @import, are required if you're importing from packages other than base R.
After you're done documenting your function(s) using the roxygen skeleton, tell the package about it by running devtools::document(). After you execute document(), run check() again to see if you get any warnings. If you don't, you're good to go (you won't if you follow the steps).
Lastly, you'll need to install the package for it to be accessible from any R session. Simply run devtools::install() (not install.packages(), which you used at the beginning; you don't need to pass a package name, since you are working in the session where the package is loaded), and after a few lines of installation output you'll see a statement like "Done (package_name)", which indicates that the installation is complete.
Now you can try your function by first loading your package using library("package_name") and then calling your desired function from the package. That's it, congrats, you did it!
I've tried to include the procedure in a lucid way (the way I create my R packages), but if you have any doubts feel free to ask.
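For reference, the steps above can be condensed into a single script-style sketch. The helper name build_private_package and its arguments are hypothetical, it assumes devtools and usethis are installed, and the proj_set() call is my assumption about how to point usethis at the new package when no new RStudio session is opened. Nothing runs until you call the function.

```r
# Hypothetical helper: scaffold, license, document, check and install a package.
build_private_package <- function(path, holder = "Your Name") {
  usethis::create_package(path, open = FALSE)  # DESCRIPTION, NAMESPACE, R/
  usethis::proj_set(path)                      # make the new package the active project
  usethis::use_mit_license(holder)             # fills in the License field

  # ... put your documented functions into files under file.path(path, "R") ...

  devtools::document(path)                     # rebuild NAMESPACE and man/ from roxygen tags
  devtools::check(path)                        # run R CMD check
  devtools::install(path)                      # install into your library
}
```

Calling build_private_package("~/directory/package_name") would then walk through the same sequence as the interactive steps, except that the script files are added by hand rather than through use_r().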
I am writing an R package that should be able to compile C++ code on the fly. In practice, users can define, at run-time, operators based on C++ code that is compiled and then used in computation (for efficiency purpose, like PyTorch or TensorFlow models in Python). Ideally, the code compiled at run-time should use Rcpp features to be exported to R.
Example:
In my R package, I have a function def_operator that can parse some mathematical formula defining an operator.
my_custom_op <- def_operator("x+y", args = c("x", "y"))
My Cpp API knows how to generate the Cpp code associated to this formula. This code should be compiled on the fly (just once, not at each call).
The user can use this new function to do some computations.
res <- my_custom_op(1, 3) # should give 4
Note: this is just an example. The operators defined by the user aim to do more than add scalar numbers, and the point is clearly to let users define their own operators rather than to pre-define some generic operators compiled at installation.
I know two things for the moment:
the Cpp code required to generate the operators (which is not compiled at installation) should be put in the inst package directory, it will be copied at installation and I can find where with the R function find.package.
I can use the function sourceCpp to compile code on the fly. Thus I can define some functions in Cpp that will be automatically exported to R and be callable there. It is even possible to keep the shared library to avoid multiple compilations (see Rcpp: how to keep files generated by sourceCpp?)
Here are my questions:
Do you know some alternative to sourceCpp from the Rcpp package to compile C++ code on the fly and export it to R?
Is there some way to manage compilation option for sourceCpp other than using the file ~/.R/Makevars (I need to link the code in the inst directory and I don't want to edit this file on the user system)?
Eventually, do you know some R packages implementing compilation on the fly that I could take as examples?
Do you know some alternative to sourceCpp from the Rcpp package to compile C++ code on the fly and export it to R?
Using sourceCpp() is the best approach. Alternatively, you can use its predecessor from the inline R package. Otherwise, you will need to build your own file via R CMD SHLIB, load the library, and create a wrapper yourself. (Not fun.)
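As a rough illustration of the sourceCpp() route, the sketch below generates C++ source text from a formula string and compiles it on demand. The helpers make_operator_source and compile_operator are hypothetical names, not part of Rcpp, and the generator only handles scalar double arguments; compiling requires the Rcpp package and a working C++ toolchain.

```r
# Hypothetical generator: turn a formula string into Rcpp source text.
make_operator_source <- function(expr, args, name = "custom_op") {
  params <- paste0("double ", args, collapse = ", ")
  paste0(
    "#include <Rcpp.h>\n",
    "// [[Rcpp::export]]\n",
    "double ", name, "(", params, ") { return ", expr, "; }\n"
  )
}

# Hypothetical wrapper: compile the generated source once and return the
# exported R function. Only this step needs Rcpp and a compiler.
compile_operator <- function(expr, args, name = "custom_op") {
  env <- new.env()
  Rcpp::sourceCpp(code = make_operator_source(expr, args, name), env = env)
  get(name, envir = env)
}

src <- make_operator_source("x+y", c("x", "y"))
cat(src)
```

With a toolchain available, my_custom_op <- compile_operator("x+y", c("x", "y")) would play the role of def_operator from the question, and the compiled function can then be called repeatedly without recompiling.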
Is there some way to manage compilation option for sourceCpp other than using the file ~/.R/Makevars (I need to link the code in the inst directory and I don't want to edit this file on the user system)?
Yes, there are many Makevars variables that can be set per R session via Sys.setenv("PKG_LIBS" = ...).
Now, to retrieve a file location dynamically, consider RcppMLPACK1's flag function approach.
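A minimal sketch of the per-session approach follows, using PKG_CXXFLAGS to add an include directory. The directory is a stand-in created in tempdir(); in a real package it would more likely come from system.file() on your own inst/ contents, which is an assumption on my part. The previous value is restored afterwards so the user's environment is left untouched.

```r
# Stand-in for the directory that holds the headers shipped in inst/.
inc_dir <- file.path(tempdir(), "include")
dir.create(inc_dir, showWarnings = FALSE)

# Remember the current value (NA if unset), then set the flag for this session.
old_flags <- Sys.getenv("PKG_CXXFLAGS", unset = NA)
Sys.setenv(PKG_CXXFLAGS = paste0("-I", shQuote(inc_dir)))
flags_during <- Sys.getenv("PKG_CXXFLAGS")

# ... a call such as Rcpp::sourceCpp("operator.cpp") would go here ...

# Restore whatever was there before.
if (is.na(old_flags)) {
  Sys.unsetenv("PKG_CXXFLAGS")
} else {
  Sys.setenv(PKG_CXXFLAGS = old_flags)
}
```

The same pattern applies to PKG_LIBS when the generated code has to link against a library.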
Eventually, do you know some R packages implementing compilation on the fly that I could take as examples?
There are a couple entrants in this market:
armacmp package by Dirk Schumacher that translates R code to C++ under the armadillo library.
nCompiler package by Perry de Valpine et al. for code-generating C++ and easily interfacing between R and C++.
I am using Rcpp 0.12.11 and R 3.4.0.
When I upgraded Rcpp to 0.12.11, the automatically generated R file RcppExports.R by the Rcpp::compileAttributes started to give me slightly different function calls
run_graph_match <- function(A, B, algorithm_params) {
    # Rcpp 0.12.10
    .Call('RGraphM_run_graph_match', PACKAGE = 'RGraphM', A, B, algorithm_params)
    # Rcpp 0.12.11
    .Call(RGraphM_run_graph_match, A, B, algorithm_params)
}
Is there an easy way to explain the reason behind the change?
The latter function leads to errors when checking the R package. For example, errors such as
symbol 'RGraphM_run_graph_match' not in namespace:
.Call(RGraphM_run_graph_match, A, B, algorithm_params)
Congratulations, you've run into the Section 5.4: Registering native routines requirement added in R 3.4.0. The requirement mandates a src/init.c file that registers each C++ function and its parameters. Rcpp 0.12.11 therefore generates this registration code inside RcppExports.cpp. Meanwhile, the content of RcppExports.R, which is what this question is about, depends on whether the user sets useDynLib(pkgname, .registration=TRUE) or useDynLib(pkgname); the latter is not ideal, as it does not take advantage of a new option introduced in Rcpp 0.12.11, discussed next.
As a result of this shift in CRAN policy, JJ Allaire, the creator of Attributes for Rcpp [1], was inspired to advance a suggestion made by Douglas Bates back in 2012, when attributes were first added. Specifically, the goal was to change the call from being string-based to being symbol-based. The rationale is simple: a symbol is on hand as soon as the package loads, whereas a string has to be looked up and converted into a symbol each time the function is run. Symbol lookup is therefore less expensive on repeated calls than the string-based method Rcpp used in the past.
Basically, this line:
.Call('RGraphM_run_graph_match', PACKAGE = 'RGraphM', A, B, algorithm_params)
involved R looking up the symbol on each call of the enclosing R function in order to access the C++ function.
Meanwhile, this line:
.Call(RGraphM_run_graph_match, A, B, algorithm_params)
is a direct call to the C++ function as the symbol is already in memory.
Those are the main reasons why Rcpp changed how RcppExports.R is automatically generated. One downside of this approach is the inability to globally export all functions like before. In particular, some users who had a global export pattern in their NAMESPACE file, e.g.
exportPattern("^[[:alpha:]]+")
had to remove it and opt to manually specify what functions or variables should be exported.
For more details, you may wish to see the GitHub PR that introduced this feature:
https://github.com/RcppCore/Rcpp/pull/694
1: For more on Attributes, see my history post: http://thecoatlessprofessor.com/programming/rcpp/to-rcpp-attributes-and-beyond-from-inline/
I am working on a family of R packages, all of which share substantial common code that is housed in an internal package; let's call it myPackageUtilities. So I have several packages
myPackage1, myPackage2, etc...
All of these packages depend on every method in myPackageUtilities. For a real-world example, please see statnet on CRAN. Each dependent package, of course, has
Depends: myPackageUtilities
in its DESCRIPTION file.
My question is: In the R code for myPackage1, which of the following two techniques for accessing methods from myPackageUtilities is preferable:
Use ::: to access the methods in myPackageUtilities, or
Export everything from myPackageUtilities (e.g. by including exportPattern("^[^\\.]") in the NAMESPACE)?
Option 2 clutters the end-user's search path, but the R gurus recommend against using :::.
Follow-up question: If (2) is the better choice, is there a way to export everything using roxygen2?
Suppose we have a package called randomUtils and this package has a function called sd that calculates the Slytherin Defiance Quotient.
Now I write a package called spellbound. If spellbound Depends on randomUtils, then randomUtils::sd will be found in the search path and can conflict with calculating standard deviation.
If spellbound Imports randomUtils, however, then R will install randomUtils but will not load it when spellbound is loaded. Thus, the new version of sd can't be found on the search path, but can still be accessed by randomUtils::sd
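The masking effect can be reproduced without building any packages. In this base-R sketch an attached environment stands in for a package that ends up on the search path via Depends; the name "fake:randomUtils" and the toy sd function are made up for the illustration.

```r
# Stand-in for randomUtils: an environment holding its own sd().
random_utils <- new.env()
random_utils$sd <- function(x) "Slytherin Defiance Quotient"

# Attaching puts it on the search path, as Depends would do for a package.
attach(random_utils, name = "fake:randomUtils", warn.conflicts = FALSE)
masked <- sd(c(1, 2, 3))      # finds the attached sd() before stats::sd()

# Once it is off the search path, the usual standard deviation is back.
detach("fake:randomUtils", character.only = TRUE)
restored <- sd(c(1, 2, 3))
```

With Imports, nothing is attached, so the user's call to sd() keeps resolving to stats::sd while the importing package can still reach the other function explicitly.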
With an ever growing body of contributed work on CRAN, it is becoming very important that we use Imports as much as possible so that we don't introduce unexpected behaviors by having conflicting function definitions.
An example of when I have used Depends: when writing the HydeNet package, I wanted the end user to be able to use the rjags package in concert with HydeNet. So I put rjags in Depends so that library(HydeNet) would load both packages (in other words, put rjags on the search path).
Moral of the story: if you don't intend for the user to directly access the functions, the package should go in Imports.