R: prevent source re-compilation using devtools::install - r

I am in the process of developing an R package with external source code that takes a long time to compile. While compilation time isn't a big problem for a one-off installation, I have to routinely reinstall the package to test new additions. Is it possible to prevent re-compiling the source code if there haven't been any changes to it?
I don't necessarily need this to be automated, but I can't figure out a manual solution either. As my source code is in Rust, the following serves as the most representative example I have (note that it requires Rust cargo to be installed):
git clone https://github.com/r-rust/hellorust
Rscript -e "devtools::install('hellorust', quick = TRUE)"
When I run the above, I see that the hellorust.so file has been created in the src directory, but how do I make devtools::install() use this file rather than recompile everything? It doesn't seem like devtools::install(quick = TRUE) or devtools::install(build = FALSE) are meant for this...
Alternatively, is it possible to achieve the desired behavior on the Rust side of things? I don't understand why cargo would recompile everything if there haven't been any changes and the target directory is still there. That said, I'm quite new to Rust and compiled languages in general so my understanding of the broader concepts involved here is unfortunately quite limited...
I would also be interested to learn if there is a better way to test R packages during development than manually reinstalling them.

Based on the comments by r2evans, the final answer seems to be that this isn't what devtools::install is for.
As per the devtools documentation, there are three main tools for "frequent development tasks":
load_all
document
test
Of these load_all "simulates installing and reloading your package, loading R code in R/, compiled shared objects in src/ and data files in data/". By default, load_all() will not recompile source code in src/ (unless the recompile flag is set to true).
So the answer is to use load_all as opposed to install during package development and manually control when to compile the source code using something like devtools::compile_dll.

Related

Determining which R packages, and dependencies, use DLL files

I work in a corporate environment that uses Microsoft Windows Defender Application Control (WDAC) to provide security. This blocks unsigned EXE and DLL files from being installed on devices. R packages which use DLLs fail to install. The workaround to this is provide an R installation from an approved central source which also copies over a default set of packages, such as tidyverse, data.table etc. to the R library. Users can continue to install additional packages which are built with native R, but run into issues if they try to install, build from source, or update packages with DLL files in.
Is there a way to check whether a package uses DLL files in advance of installation?
Output something like:-
check_dll(foo)
result: "This package and its dependencies have no DLL files. You can install this package"
check_dll(bar)
result: "bar does not have any DLL files, but one dependency, OOF, uses DLL files.
You have already have a version of OOF installed so it should be safe to install bar"
check_dll(foobar)
result: "foobar has a DLL. Do not attempt to install foobar".
check_dll(RABOOF)
result: "RABOOF does not have any DLL files, but one of it's dependencies,
foobar, does have a DLL file. Do not attempt to install RABOOF".
tools::package_dependencies() will list the package dependencies, but nothing else.
Downloading the zip file from CRAN and inspecting it for a libs/x64 folder with contents will work, but seems a heavyweight approach. Theoretically if a package has lots of dependencies this could result in downloading a lot of files unnecessarily.
Look for the NeedsCompilation field in the DESCRIPTION file. If it is "yes", there will be a DLL. If it is "no", there probably won't be. (If it is not there, the package wasn't built properly, so all bets are off.)
The test is not perfect, because packages can put DLLs into the inst folder to get them installed without compiling them, though CRAN isn't supposed to allow that: "Source packages may not contain any form of binary executable code." But packages like pak (mentioned in the comments) may be allowed to get around this rule, e.g. by downloading binaries, so the test isn't perfect. You will also need to put together a blacklist of packages that will fail your WDAC tests even though they claim not to need compilation, containing pak and others like it.
The NeedsCompilation field is included as a column of the result of available.packages(), so it is very easy to access without trying to install the package.
I have accepted the answer from user2554330 as the best solution. It makes use of the normal set of commands used for package management; and the matrix generated by available.packages() can be passed to tools::package_dependencies(), removing the need for multiple internet queries.
For completeness I am documenting another possible solution. A script could query the unofficial CRAN Github mirror https://docs.r-hub.io/#cranatgh and look for a /src directory in each package project.

How does a typical Rcpp edit-compile-test cycle look like?

I can only find information on how to install a ready-made R extension package, but it is nowhere mentioned which commands a developer of an extension package has to use during daily development. I am using Rcpp and I am on Windows.
If this were a typical C++ project, it would go like this:
edit
make # oops, typo
edit # fix typo
make # oops, forgot an #include
edit
make # good; updates header dependencies for subsequent 'make' automatically
./fooreader # test it
make install # only now I'm ready
Which commands do I need for daily development of an Rcpp package project?
I've allocated a skeleton project using these commands from the R command line:
library(Rcpp)
Rcpp.package.skeleton("FooReader", example_code=FALSE,
author="My Name", email="my.email#example.com")
This allocated 3 files:
DESCRIPTION
NAMESPACE
man/FooReader-package.Rd
Now I dropped source code into
src/readfoo.cpp
with these contents:
#include <Rcpp.h>
#error here
I know I can run this from the R command line:
Rcpp::sourceCpp("D:/Projects/FooReader/src/readfoo.cpp")
(this does run the compiler and indicates the #error).
But I want to develop a package ultimately.
There is no universal answer for everybody, I guess.
For some people, RStudio is everything, and with some reason. One can use the package creation facility to create an Rcpp package, then edit and just hit the buttons (or keyboard shortcuts) to compile and re-load and test.
I also work a lot on a shell, so I do a fair amount of editing in Emacs/ESS along with R CMD INSTALL (where thanks to ccache recompilation of unchanged code is immediate) with command-line use via r of the littler package -- this allows me to write compact expressions loading the new package and evaluating: r -lnewpackage -esomeFunc(somearg) to test newpackage::someFunc() with somearg.
You can also launch the build and test from Emacs. As I said, it all depends.
Both those answers are for package, where I do real work. When I just test something in a single file, I do that in one Emacs buffer and sourceCpp() in an R session in another buffer of the same Emacs. Or sometimes I edit in Emacs and run sourceCpp() in RStudio.
There is no one answer. Find what works for you.
Also, the first part of your question describes the initial setup of a package. That is not part of the edit/compile/link/test cycle as it is a one off. And for that too do we have different approaches many of which have been discussed here.
Edit: The other main misunderstanding of your question is that once you have package you generally do not use sourceCpp() anymore.
In order to test an R package, it has to be installed into a (temporary) library such that it can be attached to a running R process. So you will typically need:
R CMD build . to build package_version.tar.gz
R CMD check <package_version.tar.gz> to test your package, including tests placed into the testsfolder
R CMD INSTALL <package_version.tar.gz> to install it into a library
After that you can attach the package and test it. Quite often I try to use a more TTD approach, which means I do not have to INSTALL the package. Running the unit tests (e.g. via R CMD check) is enough.
All that is independent of Rcpp. For a package using Rcpp you need to call Rcpp::compileAttributes() before these steps, e.g. with Rscript -e 'Rcpp::compileAttributes()'.
If you use RStudio for package development, it offers a lot of automation via the devtools package. I still find it useful to know what has to go on under the hood and it is by no means required.

Profiling an installed R package with source line numbers?

I'd like to profile functions in an installed R package (data.table) using Rprof() with line.profiling=TRUE. Normally, installed package are byte compiled, and line numbers are not available for byte compiled packages. The usual instructions for line profiling with Rprof() require using source() or eval(parse()) so that srcref attributes are present.
How can I load data.table so that line numbers are active? My naive attempts to first load the package with library(data.table) and then source('data.table.R') fails because some of the compiled C functions are not found when I attempt to use the package, presumably because library() is using a different namespace. Maybe there is some way to source() into the correct namespace?
Alternatively, perhaps I can build a modified version of data.table that is not byte compiled, and then load that in a way that keeps line numbers? What alterations would I have to make, and how would I then load it? I started by setting ByteCompile: FALSE and then trying R CMD INSTALL -l ~/R/lib --build data.table, but this still seems to be byte compiled.
I'm eager to make this work and will pursue any suggestions. I'm running R 3.2.1 on Linux, have full control over the machine, and can install anything else that is required.
Edit:
A more complete description of the problem I was trying to solve (and the solution for it) is here: https://github.com/Rdatatable/data.table/issues/1249
I ended up doing essentially what Joshua suggested: recompile the package with "KeepSource: TRUE" in the DESCRIPTION. For my purposes, I also found "ByteCompile: FALSE" to be helpful, although this might not apply generally. I also changed the version number so I could see that I was using my modified version.
Then I installed to a different location with "R CMD INSTALL data.table -l ~/R/lib", and loaded with "library(data.table, lib='~/R/lib')". When used with the patches given in the link, I got the line numbers of the allocations as I desired. But if anyone knows a solution that doesn't require recompilation, I'm sure that others would appreciate if you shared.
You should be able to get line numbers even if the package is byte-compiled. But, as it says in ?Rprof (emphasis added):
Individual statements will be recorded in the profile log if
line.profiling is TRUE, and if the code being executed was
parsed with source references. See parse for a discussion of
source references. By default the statement locations are not
shown in summaryRprof, but see that help page for options to
enable the display.
That means you need to set KeepSource: TRUE either in the DESCRIPTION file or via the --with-keep.source argument to R CMD INSTALL.

How can I get a Makefile from an existing R package?

I like the GBM package in R.
I can't get R's memory management to work with the combination of my machine/data set/task needed for reasons that have been covered elsewhere and should be considered off topic for the purposes of this question.
I would like to "rip" out the GBM algorithm away from R and rebuild it as standalone code.
Unfortunately there is no Makefile in the package tarball (or indeed any R package tarball I've seen). Is there a place I can look for straightforward Makefiles of R packages? Or do I really have to go way back to ground zero and write my own Makefile for the long painful journey ahead?
As Henry Spencer quipped: "Those who do not understand Unix are doomed to reinvent it, poorly."
R packages do not have a Makefile because R creates one on the fly when building the package, using both the defaults of the current R installation and the settings in the package, typically via a file Makevars.
Run the usual command R CMD INSTALL foo_1.2.3.tar.gz and you will see the effect of the generated Makefile as the build proceeds. Worst case you can always start by copying and pasting.
You could also take a look at CMake which can quite easily create makefiles for you. It took me minimal time to get it working for a project of mine.

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION: file is all you get for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires encoding the Depends in a portable manner across operating systems---good luck encoding even something simple such as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example which comes in two packages (which all provide their dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The new R package renv replaced packrat.
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-speciic library and package dependencies. Note that, while this method uses the package DESCRIPTION to define dependencies, it needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
You could use the following workflow:
1) create a script file, which contains everything you want to setup and store it in your projectd directory as e.g. projectInit.R
2) source this script from your .Rprofile (or any other file executed by R at startup) with a try statement
try(source("./projectInit.R"), silent=TRUE)
This will guarantee that even when no projectInit.R is found, R starts without error message
3) if you start R in your project directory, the projectInit.R file will be sourced if present in the directory and you are ready to go
This is from a Linux perspective, but should work in the same way under windows and Mac as well.

Resources