Build R package with C Code for Linux and Windows

I have been doing research and I can't quite figure out how to build my R package, which calls C functions, so that it works in both Windows and Linux environments. I am building the package on a Linux machine.
I have two C files, one.C and two.C. I place the two files in the src directory after using package.skeleton(...). In the NAMESPACE file I use the command: useDynLib(one,two). Is this correct? Or do I need to put the actual function names instead of the file names? Do I need to export the function names?
Do I need to put the .so files in the src directory or will these be created automatically? I am worried that it then won't work on a Windows machine, which needs a .dll file.
As you can see I'm a little confused, thanks for the help.

One of the standard R manuals is Writing R Extensions. Part of this manual is section 5, System and foreign language interfaces. This will probably answer the majority of your questions. As for the dynamically linked libraries (.dll or .so), they are built on the fly. You develop your package, including the C code. Once you install the package from source (e.g. using R CMD INSTALL spam), or create a binary distribution, the C code is compiled into the appropriate library file for that platform.
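On the .so versus .dll worry, a small sketch may help. R exposes the shared-library extension it uses on the current platform as .Platform$dynlib.ext, so the same source package yields a differently named file depending on where it is installed. The package name "mypkg" below is hypothetical. By default all C files in src are linked into one shared object named after the package, and that name (not the names of the individual C files) is what useDynLib() in NAMESPACE expects.

```r
# "mypkg" is a hypothetical package name used only for illustration.
pkg <- "mypkg"

# Extension R uses for compiled code on this platform (".so" or ".dll").
ext <- .Platform$dynlib.ext

# Name of the file that R CMD INSTALL would place somewhere under libs/
# in the installed package.
shlib <- paste0(pkg, ext)
print(shlib)
```

Nothing in the source package itself has to change between Linux and Windows; only the compiled result differs.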

Faced with similar headaches I switched to C++ in combination with Rcpp. Rcpp takes care of all the headaches for you in compiling packages:
http://dirk.eddelbuettel.com/code/rcpp.html
There is also an entire vignette on how to build a package using Rcpp:
http://dirk.eddelbuettel.com/code/rcpp/Rcpp-package.pdf

Related

What does a typical Rcpp edit-compile-test cycle look like?

I can only find information on how to install a ready-made R extension package, but it is nowhere mentioned which commands a developer of an extension package has to use during daily development. I am using Rcpp and I am on Windows.
If this were a typical C++ project, it would go like this:
edit
make # oops, typo
edit # fix typo
make # oops, forgot an #include
edit
make # good; updates header dependencies for subsequent 'make' automatically
./fooreader # test it
make install # only now I'm ready
Which commands do I need for daily development of an Rcpp package project?
I've allocated a skeleton project using these commands from the R command line:
library(Rcpp)
Rcpp.package.skeleton("FooReader", example_code=FALSE,
author="My Name", email="my.email@example.com")
This allocated 3 files:
DESCRIPTION
NAMESPACE
man/FooReader-package.Rd
Now I dropped source code into
src/readfoo.cpp
with these contents:
#include <Rcpp.h>
#error here
I know I can run this from the R command line:
Rcpp::sourceCpp("D:/Projects/FooReader/src/readfoo.cpp")
(this does run the compiler and indicates the #error).
But I want to develop a package ultimately.
There is no universal answer for everybody, I guess.
For some people, RStudio is everything, and with some reason. One can use the package creation facility to create an Rcpp package, then edit and just hit the buttons (or keyboard shortcuts) to compile and re-load and test.
I also work a lot in a shell, so I do a fair amount of editing in Emacs/ESS along with R CMD INSTALL (where, thanks to ccache, recompilation of unchanged code is immediate). For command-line use I rely on r from the littler package, which allows me to write compact expressions that load the new package and evaluate a call: r -lnewpackage -e'someFunc(somearg)' to test newpackage::someFunc() with somearg.
You can also launch the build and test from Emacs. As I said, it all depends.
Both those answers are for package, where I do real work. When I just test something in a single file, I do that in one Emacs buffer and sourceCpp() in an R session in another buffer of the same Emacs. Or sometimes I edit in Emacs and run sourceCpp() in RStudio.
There is no one answer. Find what works for you.
Also, the first part of your question describes the initial setup of a package. That is not part of the edit/compile/link/test cycle, as it is a one-off. For that, too, there are different approaches, many of which have been discussed here.
Edit: The other main misunderstanding in your question is that once you have a package you generally do not use sourceCpp() anymore.
In order to test an R package, it has to be installed into a (temporary) library such that it can be attached to a running R process. So you will typically need:
R CMD build . to build package_version.tar.gz
R CMD check <package_version.tar.gz> to test your package, including tests placed in the tests folder
R CMD INSTALL <package_version.tar.gz> to install it into a library
After that you can attach the package and test it. Quite often I try to use a more TDD approach, which means I do not have to INSTALL the package. Running the unit tests (e.g. via R CMD check) is enough.
All that is independent of Rcpp. For a package using Rcpp you need to call Rcpp::compileAttributes() before these steps, e.g. with Rscript -e 'Rcpp::compileAttributes()'.
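The steps above can be sketched as a small R helper. This is only an illustration: dev_cycle() and its arguments are hypothetical names, the tarball name is derived from the Package and Version fields of DESCRIPTION, and the package path is assumed to contain no spaces. With dry_run = TRUE it only returns the argument vectors it would pass to R.

```r
# Hypothetical helper: assemble (and optionally run) one build/check/install
# iteration for the source package in `pkg_dir`.
dev_cycle <- function(pkg_dir = ".", dry_run = TRUE) {
  # Package name and version determine the tarball name R CMD build produces.
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                   fields = c("Package", "Version"))
  tarball <- sprintf("%s_%s.tar.gz", desc[1, "Package"], desc[1, "Version"])

  steps <- list(
    build   = c("CMD", "build", pkg_dir),
    check   = c("CMD", "check", tarball),
    install = c("CMD", "INSTALL", tarball)
  )
  if (dry_run) return(steps)

  # Regenerate the RcppExports files first (only matters for Rcpp packages).
  if (requireNamespace("Rcpp", quietly = TRUE)) {
    Rcpp::compileAttributes(pkg_dir)
  }

  # Run each step with the R binary of the current session; stop on failure.
  r_bin <- file.path(R.home("bin"), "R")
  for (args in steps) {
    status <- system2(r_bin, args)
    if (status != 0) stop("step failed: R ", paste(args, collapse = " "))
  }
  invisible(steps)
}
```

Calling dev_cycle("FooReader", dry_run = FALSE) from the parent directory of the package would then run the three commands in order, writing the tarball to the current working directory.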
If you use RStudio for package development, it offers a lot of automation via the devtools package. RStudio is by no means required, though, and I still find it useful to know what has to go on under the hood.

How to include bash scripts in a package?

I need to include several bash scripts in the R package I'm writing. I'd love to distribute them together with the package, so when a user installs the package via devtools::install_github(...) he/she gets the scripts as well.
I know it is possible, but I don't know how. Including the files in the scripts subdirectory doesn't seem to suffice. I need a means to tell R (or RStudio) to include them.
I use RStudio for development, so I would appreciate a solution that integrates with the "Build package" functionality that RStudio provides.
Simply add whatever you want to the inst/xxx folder in your package.
The folder will get installed as xxx (at the top level of the installed package) when the package is built and installed into a library.
You access the files via system.file(), e.g.
system.file('scripts/peak_mem.sh', package='clustertools')
See more details in the book R Packages by Hadley Wickham.
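A minimal sketch of how the lookup might be wrapped, assuming the scripts live in inst/scripts/ of the source package. The helper name installed_file() is hypothetical; the clustertools package and peak_mem.sh are taken from the example above and are not assumed to be installed here, so that call is left commented out.

```r
# Hypothetical helper: locate a file that was shipped under inst/ in a
# package's source tree, and fail loudly if it is missing.
installed_file <- function(path, package) {
  full <- system.file(path, package = package)
  # system.file() returns "" when the file (or the package) cannot be found.
  if (!nzchar(full)) {
    stop("'", path, "' not found in package '", package, "'")
  }
  full
}

# Files in inst/scripts/ of the source package end up in scripts/ after
# installation. Once the package is installed, something like this should work:
# script <- installed_file("scripts/peak_mem.sh", "clustertools")
# system2("bash", shQuote(script))
```

Checking for the empty string matters because system.file() does not raise an error on its own when nothing is found.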
Thank you @Axeman!

What type of object is an R package?

Probably a pretty basic question but a friend and I tried to run str(package_name) and R threw us an error. Now that I'm looking at it, I'm wondering if an R package is like a .zip file in that it is a collection of objects, say pictures and songs, but not a picture or song itself.
If I tried to open a zip of pictures with an image viewer, it wouldn't know what to do until I unzipped it - just like I can't call str(forecast) but I can call str(ts) once I've loaded the forecast package into my library...
Can anyone set me straight?
R packages are generally distributed as compressed bundles of files. They can be in "binary" form, preprocessed at a repository to compile any C or Fortran source and create the proper headers, or in source form, where the required files are available for the installation process. Source form requires that users have the necessary compilers and tools installed where the R build process can find them.
If you read the documentation for a package at CRAN you see they are distributed in a set of compressed formats that vary depending on the OS targets:
Package source: Rcpp_0.11.3.tar.gz # the Linux/UNIX targets
Windows binaries: r-devel: Rcpp_0.11.3.zip, r-release: Rcpp_0.11.3.zip, r-oldrel: Rcpp_0.11.3.zip
OS X Snow Leopard binaries: r-release: Rcpp_0.11.3.tgz, r-oldrel: Rcpp_0.11.3.tgz
OS X Mavericks binaries: r-release: Rcpp_0.11.3.tgz
Old sources: Rcpp archive # not really a file but a web link
Once installed an R package will have a specified directory structure. The DESCRIPTION file is a text file with specific entries for components that determine whether the local installation meets the dependencies of the package. There are NAMESPACE, LICENSE, and INDEX files. There are directories named '/help', '/html', '/Meta', '/R', and possibly '/libs', '/demo', '/data', '/unitTests', and others.
This is the tree at the top of the ../library/Rcpp package directory:
$ ls
CITATION NAMESPACE THANKS examples libs
DESCRIPTION NEWS.Rd announce help prompt
INDEX R discovery html skeleton
Meta README doc include unitTests
So in the "life-cycle" of a package, there is initially a series of required and optional files, which get processed by the BUILD and CHECK mechanisms into an installed package, which can then be compressed for distribution and later unpacked into a specified directory tree on the user's machine. See these help pages:
?.libPaths # also describes .Library()
?package.skeleton
?install.packages
?INSTALL
And of course read Writing R Extensions, a document that ships with every installation of R.
Your question is:
What type of object is an R package?
Somehow, I’m still missing an answer to this exact question. So here goes:
As far as R is concerned, an R package is not an object. That is, it’s not an object in R’s type system. R is being a bit difficult, because it allows you to write
library(pkg_name)
Without requiring you to define pkg_name anywhere prior. In contrast, other objects which you are using in R have to be defined somewhere – either by you, or by some package that’s loaded either explicitly or implicitly.
This is unfortunate, and confuses people. Therefore, when you see library(pkg_name), think
library('pkg_name')
That is, imagine the package name in quotes. This does in fact work just as expected. The fact that the code also works without quotes is a peculiarity of the library function, known as non-standard evaluation. In this case, it’s mostly an unfortunate design decision (but there are reasons).
So, to repeat the answer: a package isn’t a type of R object [1]. For R, it’s simply a name which refers to a known location in the file system, similar to what you’ve assumed. BondedDust’s answer goes into detail to explain that structure, so I shan’t repeat it here.
[1] For super technical details, see Joshua’s and Richard’s comments below.
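A short base-R sketch of the points above, using the stats package (which ships with R) as a stand-in for any package. It shows that the quoted form works, that character.only = TRUE lets the name come from a variable, and that what R holds for an attached package are environments rather than an object named after the package.

```r
# The name may be given as a string; library() converts the unquoted
# form to this string itself.
library("stats")

# With character.only = TRUE the argument is evaluated like any other,
# so a package name stored in a variable works too.
pkg <- "stats"
library(pkg, character.only = TRUE)

# What R actually holds for an attached package are environments:
attached  <- as.environment(paste0("package:", pkg))  # entry on the search path
namespace <- asNamespace(pkg)                          # the package namespace

print(is.environment(attached))
print(is.environment(namespace))
print(paste0("package:", pkg) %in% search())
```

So str() can be applied to these environments or to the objects inside them, just not to the bare package name.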
From R's own documentation:
Packages provide a mechanism for loading optional code, data and
documentation as needed.…A package is a directory of files which
extend R, a source package (the master files of a package), or a
tarball containing the files of a source package, or an installed
package, the result of running R CMD INSTALL on a source package. On
some platforms (notably OS X and Windows) there are also binary
packages, a zip file or tarball containing the files of an installed
package which can be unpacked rather than installing from sources. A
package is not a library.
So yes, a package is not the functions within it; it is a mechanism to have R be able to use the functions or data which comprise the package. Thus, it needs to be loaded first.
I am reading Hadley's book Advanced R (Chapter 6.3, Functions, p. 79) and I think this quote covers your question:
Every operation is a function call
“To understand computations in R, two slogans are helpful:
Everything that exists is an object.
Everything that happens is a function call."
— John Chambers
According to that, library(name_of_library) is a function call that loads the package. Everything that has been loaded, i.e. functions or data sets, is an object which you can use by calling other functions. In that sense a package is not an object in any of R's environments until it is loaded. Then you can say that it is a collection of the objects it contains and which are loaded.

Is there any way to access C source code in an R package binary?

I would like to share my R package but keep the source code until after an article is published. If I compile a package using R CMD INSTALL --build, is there any way for an end user to read the C source code?
According to p 44 of R News 2006-4,
In order to access the sources of compiled code
(i.e., C, C++, or Fortran), it is not sufficient to have
the binary version of R or a contributed package
installed.
I would be satisfied with this knowledge (indeed, I would prefer to release the source), but I need to assuage the fears of my collaborators.
My primary question is to confirm: if I distribute a binary created by R CMD INSTALL --build, will the C source be inaccessible?
Update: it is not very clear to me why this question has received so many down votes (4 at this point). A downvote indicates "This question has not shown any research effort; it is unclear or not useful". I am only asking about native R functionality, not trying to promote any nefarious intent.
If the .c source files aren't in the distributed archive file (a .tar.gz for Linux, maybe a .zip for Windows) then no, you can't get the source. I just did a quick test with a skeletal package and a single foo.c file, and it's not there for me, just a compiled foo.so file.
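The same check can be scripted before handing an archive to anyone. The following is a sketch only: find_source_files() is a hypothetical helper, and the list of extensions it treats as source code is an assumption that may need extending for a given package.

```r
# Hypothetical helper: list entries in a package archive that look like
# C/C++/Fortran sources or headers.
find_source_files <- function(archive) {
  entries <- if (grepl("\\.zip$", archive, ignore.case = TRUE)) {
    utils::unzip(archive, list = TRUE)$Name          # Windows binary (.zip)
  } else {
    utils::untar(archive, list = TRUE, tar = "internal")  # .tar.gz / .tgz
  }
  # Assumed set of source extensions; compiled files such as .so or .dll
  # do not match.
  grep("\\.(c|cc|cpp|h|hpp|f|f90)$", entries,
       value = TRUE, ignore.case = TRUE)
}
```

An empty result for the archive produced by R CMD INSTALL --build would be consistent with the quick test described above, although it says nothing about code embedded as strings in R files.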
Unless you've used Rcpp and put the C code into inline R functions, of course.
If you distribute only the binary output, the source code is inaccessible. The only way to get some idea of it is disassembly. Of course, all of your header files should be compiled in as well.

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION: file is all you get for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires encoding the Depends in a portable manner across operating systems---good luck encoding even something simple such as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example which comes in two packages (which all provide their dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The new R package renv replaced packrat.
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-specific library and package dependencies. Note that, while this method uses the package DESCRIPTION to define dependencies, the project needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
You could use the following workflow:
1) create a script file which contains everything you want to set up, and store it in your project directory as e.g. projectInit.R
2) source this script from your .Rprofile (or any other file executed by R at startup) with a try statement
try(source("./projectInit.R"), silent=TRUE)
This will guarantee that even when no projectInit.R is found, R starts without error message
3) if you start R in your project directory, the projectInit.R file will be sourced if present in the directory and you are ready to go
This is from a Linux perspective, but it should work the same way under Windows and Mac as well.
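As an illustration of what such a projectInit.R might contain, here is a sketch that points R at a project-local package library. The function name use_project_library() is hypothetical, and the directory name .Rlibs is an assumption borrowed from the question's my_project/.Rlibs example.

```r
# Hypothetical contents for projectInit.R: use a project-local package library.
use_project_library <- function(project_dir = ".", lib_name = ".Rlibs") {
  lib <- file.path(project_dir, lib_name)
  # Create the library directory on first use.
  if (!dir.exists(lib)) dir.create(lib, recursive = TRUE)
  # Put it first so install.packages() and library() prefer it.
  .libPaths(c(lib, .libPaths()))
  # Return the path R now treats as its primary library.
  invisible(.libPaths()[1])
}

# In projectInit.R you would then call:
# use_project_library()
```

This only redirects where packages are installed and searched first; unlike packrat or renv it does not record or pin package versions.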
