When using the make tool directly, one can use the -j option to build in parallel.
How can I use parallel build when installing an R package using install.packages()? make is invoked by R, not by me, so I can't pass the -j option to it. Setting export MAKE_FLAGS=-j4 before starting R did not work. I am looking to set up parallel build permanently for my R installation.
The options(Ncpus = 8) route is one way. install.packages() has the argument Ncpus = getOption("Ncpus"), and that option is documented as:
Ncpus: the number of parallel processes to use for a parallel
install of more than one source package. Values greater than
one are supported if the ‘make’ command specified by
‘Sys.getenv("MAKE", "make")’ accepts argument ‘-k -j Ncpus’.
I don't see it listed for update.packages(), but it functions the same way on my Linux machines, so builds generally happen in parallel.
In short, this gives you parallel builds of multiple packages, as opposed to make -j ... within the build of a single package. I tried that route too but found the gains less compelling.
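To make this permanent, a minimal sketch is to set the option in your ~/.Rprofile (the value 8 is only an example; parallel::detectCores() is one way to pick it):
## in ~/.Rprofile, so every session picks it up
options(Ncpus = 8)                      # or e.g. parallel::detectCores()
## a later install.packages(c("pkgA", "pkgB", "pkgC")) then builds
## the source packages in parallel (package names here are just placeholders)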
I want to run valgrind on the tests, examples and vignettes of my package. Various sources suggest that the way to do this is:
R CMD build my-pkg
R CMD check --use-valgrind my-pkg_0.0.tar.gz
R CMD check seems to run fine, but shows no evidence of valgrind output, even after setting the environment variable VALGRIND_OPTS to --memcheck:leak-check=full. I've found sources that hint that R needs to run interactively for valgrind to show output, but R -d CMD check (or R -d "CMD check") seems to be the wrong format.
R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < my-pkg.Rcheck/my-pkg-Ex.R does work, but only on the example files; I can't see a simple way to run this against my vignettes and testthat tests.
What is the best way to run all relevant scripts through valgrind? For what it's worth, the goal is to integrate this in a GitHub actions script.
Edit Mar 2022: The R CMD check case is actually simpler: running R CMD check --use-valgrind [other options you may want] will run the tests and examples under valgrind and then append the standard valgrind summary at the end of the examples output (i.e., pkg.Rcheck/pkg-Ex.Rout) and the test output (i.e., pkg.Rcheck/tinytest.Rout, as I use tinytest). What puzzles me now is that an error detected by valgrind does not seem to fail the test.
Original answer below the separator.
There is a bit more to this: it helps to ensure that the R build is instrumented for it. See Section 4.3.2 of Writing R Extensions:
On platforms where valgrind is installed you can build a version of R with extra instrumentation to help valgrind detect errors in the use of memory allocated from the R heap. The configure option is --with-valgrind-instrumentation=level, where level is 0, 1 or 2. Level 0 is the default and does not add anything. Level 1 will detect some uses of uninitialised memory and has little impact on speed (compared to level 0). Level 2 will detect many other memory-use bugs but make R much slower when running under valgrind. Using this in conjunction with gctorture can be even more effective (and even slower).
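As a rough sketch, building such an instrumented R from source would then look like this (level 2 is just one choice; any other configure options are whatever you normally use):
./configure --with-valgrind-instrumentation=2
make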
So you probably want to build yourself a Docker-ized version of R to call from your GitHub Action. I think the excellent 'sumo' container by Winston also has a valgrind build, so you could try that. It's huge at over 4 GB:
edd@rob:~$ docker images | grep wch   # some whitespace edited out
wch1/r-debug    latest    a88fabe8ec81    8 days ago    4.49GB
edd@rob:~$
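A hedged sketch of pulling that image and getting a shell in it (the image name is the one from the listing above; the mount point is illustrative):
docker pull wch1/r-debug
docker run --rm -ti -v "$PWD":/pkg -w /pkg wch1/r-debug bash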
And of course if you test dependent packages you have to get them into the valgrind session too...
Unfortunately, I found the learning curve involved in dockerizing R in GitHub actions, per @dirk-eddelbuettel's suggestion, too steep.
I came up with the hacky approach of adding a file memcheck.R to the root of the package with the contents
devtools::load_all()
devtools::run_examples()
devtools::build_vignettes()
devtools::test()
(remembering to add it to .Rbuildignore).
Running R -d "valgrind --tool=memcheck --leak-check=full" --vanilla < memcheck.R then seems to work, albeit with the reporting of some issues that appear to be false positives, or at least issues that are not identified by CRAN's valgrind implementation.
Here's an example of this in action in a GitHub actions script.
(Readers who know what they are doing are invited to suggest shortcomings of this approach in the comments!)
I am developing R packages for internal-use applications at work. Unfortunately, not everybody is on the same version of R. I want to build Windows binaries of my package to support multiple versions, for example, 3.6.x and 4.0.x R.
I can easily do this by building the package (I use devtools::build(binary = TRUE)), and then change the R version in RStudio, restart, and run again. But this gets very tedious.
Is there a way to streamline this (e.g., my own custom function to build both at once)? I imagine some CI/CD thing is probably best, but this solution would have to be limited to what I can run locally.
Don't do it in RStudio, write a script to do it. It would have commands like these:
/path/to/R3.6.3/R CMD INSTALL --build /path/to/yourpackage
mv yourpackage.*.zip /path/for/R3.6users
/path/to/R4.0.3/R CMD INSTALL --build /path/to/yourpackage
mv yourpackage.*.zip /path/for/R4.0users
You don't need a lot of builds; only the first two parts of the version number (e.g. 3.6 or 4.0) need to match the target system.
You could implement this script in R using system() calls, but it's probably simpler to do it using one of the Windows script languages (.bat or .cmd or whatever).
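If you would rather drive it from R, a rough sketch of the same script using system2() (all paths here are illustrative):
## build the same source package with two R installations (paths illustrative)
r_binaries <- c("C:/Program Files/R/R-3.6.3/bin/R.exe",
                "C:/Program Files/R/R-4.0.3/bin/R.exe")
for (r in r_binaries) {
  system2(r, c("CMD", "INSTALL", "--build", "path/to/yourpackage"))
  ## then move the resulting yourpackage_*.zip to the matching users' folder
}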
I can only find information on how to install a ready-made R extension package, but it is nowhere mentioned which commands a developer of an extension package has to use during daily development. I am using Rcpp and I am on Windows.
If this were a typical C++ project, it would go like this:
edit
make # oops, typo
edit # fix typo
make # oops, forgot an #include
edit
make # good; updates header dependencies for subsequent 'make' automatically
./fooreader # test it
make install # only now I'm ready
Which commands do I need for daily development of an Rcpp package project?
I've created a skeleton project using these commands from the R command line:
library(Rcpp)
Rcpp.package.skeleton("FooReader", example_code=FALSE,
author="My Name", email="my.email#example.com")
This created 3 files:
DESCRIPTION
NAMESPACE
man/FooReader-package.Rd
Now I dropped source code into
src/readfoo.cpp
with these contents:
#include <Rcpp.h>
#error here
I know I can run this from the R command line:
Rcpp::sourceCpp("D:/Projects/FooReader/src/readfoo.cpp")
(this does run the compiler and indicates the #error).
But I want to develop a package ultimately.
There is no universal answer for everybody, I guess.
For some people, RStudio is everything, and with some reason. One can use the package creation facility to create an Rcpp package, then edit and just hit the buttons (or keyboard shortcuts) to compile and re-load and test.
I also work a lot in a shell, so I do a fair amount of editing in Emacs/ESS along with R CMD INSTALL (where, thanks to ccache, recompilation of unchanged code is immediate), plus command-line use via r from the littler package -- this allows me to write compact expressions that load the new package and evaluate something: r -lnewpackage -e'someFunc(somearg)' to test newpackage::someFunc() with somearg.
You can also launch the build and test from Emacs. As I said, it all depends.
Both those answers are for packages, where I do real work. When I just test something in a single file, I do that in one Emacs buffer and sourceCpp() in an R session in another buffer of the same Emacs. Or sometimes I edit in Emacs and run sourceCpp() in RStudio.
There is no one answer. Find what works for you.
Also, the first part of your question describes the initial setup of a package. That is not part of the edit/compile/link/test cycle, as it is a one-off. And for that, too, we have different approaches, many of which have been discussed here.
Edit: The other main misunderstanding of your question is that once you have a package you generally do not use sourceCpp() anymore.
In order to test an R package, it has to be installed into a (temporary) library such that it can be attached to a running R process. So you will typically need:
R CMD build . to build package_version.tar.gz
R CMD check <package_version.tar.gz> to test your package, including tests placed in the tests folder
R CMD INSTALL <package_version.tar.gz> to install it into a library
After that you can attach the package and test it. Quite often I try to use a more TDD approach, which means I do not have to INSTALL the package. Running the unit tests (e.g. via R CMD check) is enough.
All that is independent of Rcpp. For a package using Rcpp you need to call Rcpp::compileAttributes() before these steps, e.g. with Rscript -e 'Rcpp::compileAttributes()'.
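Putting those steps together, a typical loop for an Rcpp package might look like this sketch (the package name is taken from the question above; the version is a placeholder):
Rscript -e 'Rcpp::compileAttributes()'   # regenerate RcppExports.* after editing C++ code
R CMD build .                            # produces FooReader_1.0.tar.gz
R CMD check FooReader_1.0.tar.gz         # runs the tests in tests/ and the examples
R CMD INSTALL FooReader_1.0.tar.gz       # install into a library once it passes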
If you use RStudio for package development, it offers a lot of automation via the devtools package. I still find it useful to know what goes on under the hood, and RStudio is by no means required.
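For reference, the roughly corresponding devtools calls from within an R session look like this (a sketch; these are standard devtools functions rather than anything specific to the answer above):
devtools::load_all()   # compile (if needed) and load the package into the session
devtools::test()       # run the unit tests
devtools::check()      # roughly equivalent to R CMD check
devtools::install()    # install into your library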
I want to run an R command from command line (actually, from within a Makefile). The command is roxygen2::roxygenise(), if it is relevant. I don't want to create a new file and run that as a script - that will just clutter my directory.
In python, this is simple - you write python -c "import antigravity".
I use the Makefile to build, install and test a (Rcpp) package I'm working on.
This is generally done with so-called 'shebang scripts'.
Historically, littler was there first, about a decade or so ago. It is still widely used, and contains a number of helper scripts, for example roxy.r, which does just what you desire: run roxygen2::roxygenize(). I use this all the time.
Next, Rscript started to ship with R. It is similar to littler but automatically available wherever R is, which is a plus. On the minus side, it starts more slowly and fails to load the methods package, which is a source of a number of bug reports and SO questions.
Much more recently, R itself added the ability to run expressions following the -e ... switch.
So you have plenty of choices. You can also study plenty of src/Makevars files many of which use Rscript.
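For the roxygen2::roxygenise() call in the question, the three routes sketched above look roughly like this:
r -e 'roxygen2::roxygenise()'           # littler (or its bundled roxy.r helper)
Rscript -e 'roxygen2::roxygenise()'     # Rscript, ships with R
R -q -e 'roxygen2::roxygenise()'        # R itself with the -e switch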
I'm working on a package "xyz" that uses Rcpp with several cpp files.
When I'm only updating the R code, I would like to run R CMD INSTALL xyz on the package directory without having to recompile all the shared libraries that haven't changed. That works fine if I specify the --no-multiarch flag: the source directory src gets populated the first time with the compiled objects, and if the sources don't change they are re-used the next time. With multiarch on, however, R decides to make two copies of src, src-i386 and src-x86_64. It seems to confuse R CMD INSTALL which always re-runs all the compilation. Is there any workaround?
(I'm aware that there are alternative ways, e.g. devtools::load_all, but I'd rather stick to R CMD INSTALL if possible).
The platform is MacOS 10.7, and I have the latest version of R.
I have a partial answer for you. One really easy speed-up is provided by ccache, which you can enable for all R compilation (e.g. via R CMD whatever, thereby also covering inline, attributes, RStudio use, ...) globally through ~/.R/Makevars:
edd@max:~$ tail -10 .R/Makevars
VER=4.6
CC=ccache gcc-$(VER)
CXX=ccache g++-$(VER)
SHLIB_CXXLD=g++-$(VER)
FC=ccache gfortran
F77=ccache gfortran
MAKE=make -j8
edd@max:~$
It takes care of all caching of compilation units.
Now, that does not "explicitly" address the --no-multiarch aspect, which I don't play much with as we are still mostly 'single arch' on Linux. This will change, eventually, but hasn't yet. Yet I suspect that by letting ccache decide the caching you too will get the net effect.
Other aspects can be controlled too, e.g. ~/.R/check.Renviron can be used to turn certain tests on or off. I tend to keep 'em all on -- better to waste a few seconds here than to get a nastygram from Vienna.
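As a hedged illustration, such a file could contain entries like these (both are documented R CMD check environment variables; which ones you set, and to what, is your call):
## ~/.R/check.Renviron (illustrative)
_R_CHECK_FORCE_SUGGESTS_=TRUE
_R_CHECK_CRAN_INCOMING_=TRUE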