I have a problem with parallel estimation of random survival forests from the randomForestSRC package. I followed this guide and tried installing it on a Mac (Sierra). However, the rfsrc() function still runs on a single thread. Could you please advise what to do to achieve parallel execution? The function takes ages to compute on a larger dataset. I followed the steps described in the tutorial exactly, with no success.
Thanks in advance!
The guide noted in your question is from 2013, and the process for OpenMP parallel execution has been significantly streamlined since then. In fact, the binaries available on CRAN for the current build (2.5.1) should run in parallel on Sierra. The source code now includes a ready-made configure file (the output of the autoconf command), so parallel execution is the default behaviour. If you haven't yet upgraded to the latest build, I would recommend doing so. If the binary build provided by CRAN still does not switch on parallel execution, I would recommend installing GCC via Homebrew (or another package manager) and then creating a suitable Makevars file, as described in the instructions on our GitHub page, so that the CRAN package installation process picks up GCC instead of the default Clang compiler:
https://kogalur.github.io/randomForestSRC/building.html
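If you do end up switching compilers, a minimal ~/.R/Makevars along these lines points R at Homebrew's GCC with OpenMP enabled (the version suffix is an assumption; use whatever Homebrew actually installed):

```
# ~/.R/Makevars -- sketch only; adjust gcc-12/g++-12 to your Homebrew version
CC  = gcc-12 -fopenmp
CXX = g++-12 -fopenmp
SHLIB_OPENMP_CFLAGS   = -fopenmp
SHLIB_OPENMP_CXXFLAGS = -fopenmp
```

After creating this file, reinstall the package from source with install.packages("randomForestSRC", type = "source") so the new compiler settings take effect.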
I wrote an R package with Rcpp and RcppArmadillo and loaded it onto a supercomputer cluster running Unix. However, it produces the above error when I try to run one of the functions in an R instance on the cluster. Does anyone know how to solve this? The module loaded on the cluster is R-3.5.0.
I don't know the particulars of your HPC system but on "vanilla" machines it is easy to use MKL ... because, as frequently stated before, BLAS and LAPACK are an interface to which MKL adheres.
See for example the blog posts I wrote here about using MKL on Ubuntu
http://dirk.eddelbuettel.com/blog/2018/06/24#019_mkl_soon_in_debian
http://dirk.eddelbuettel.com/blog/2018/04/15#018_mkl_for_debian_ubuntu
and the GitHub repo with the script they reference
https://github.com/eddelbuettel/mkl4deb
So if I had to guess, I'd say that your dynamic linker only sees part of the MKL library directory. And for what it is worth, I have had reports of this working from other users on large research systems. Good luck!
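To check that guess, you could confirm what the R binary actually resolves at run time and make sure the full MKL library directory is on the linker path. A diagnostic sketch, assuming a typical MKL location under /opt/intel (adjust the path to whatever your cluster's module system provides):

```
# which BLAS/LAPACK libraries does this R build resolve at run time?
ldd "$(R RHOME)/lib/libR.so" | grep -iE 'blas|lapack|mkl'

# ensure the dynamic linker sees the whole MKL lib directory (path assumed)
export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
```

If the grep shows unresolved ("not found") entries, the LD_LIBRARY_PATH fix, or an equivalent module load, is usually the answer.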
I am facing a particularly vexing problem with R package development. My own package, ggstatsplot (https://github.com/IndrajeetPatil/ggstatsplot), depends on userfriendlyscience, which depends on MBESS, which itself ultimately depends on gsl. There is no problem at all installing ggstatsplot on a Windows machine (as assessed by the AppVeyor continuous integration platform: https://ci.appveyor.com/project/IndrajeetPatil/ggstatsplot).
But whenever the package is to be installed on Unix machines, it throws the error that ggstatsplot can't be downloaded because userfriendlyscience and MBESS can't be downloaded because gsl can't be downloaded. The same thing is also revealed on Travis continuous integration platform with virtual Unix machines, where the package build fails (https://travis-ci.org/IndrajeetPatil/ggstatsplot).
Now one way to solve this problem for the user on the Unix machine is to configure GSL (as described here:
installing R gsl package on Mac), but I can't possibly expect every user of ggstatsplot to go through the arduous process of configuring GSL. I want them to just run install.packages("ggstatsplot") and be done with it.
So I would really appreciate if anyone can offer me any helpful advice as to how I can make my package user's life simpler by removing this problem at its source. Is there something I should include in the package itself that will take care of this on behalf of the user?
This may not have a satisfying solution via changes to your R package (I'm not sure either way). If the gsl package authors (who include a former R Core member) didn't configure it to avoid a prerequisite installation of a Linux package, there's probably a good reason not to.
But it may be some consolation that most R+Linux users understand that some R packages first require installing the underlying Linux libraries (eg, through apt or dnf/yum).
Primary Issue: making it easy for the users to install
Try to be super clear in the GitHub README and the CRAN INSTALL file. The gsl package has decent CRAN directions, which lead to the following bash code:
sudo apt-get install libgsl0-dev
The best example of clear (linux pre-req package) documentation I've seen is from the curl and sf packages. sf's CRAN page lists only the human names of the 3 libraries, but the GitHub page provides the exact bash commands for three major distribution branches. The curl package does this very well too (eg, CRAN and GitHub). For example, it provides the following explanation and bash code:
Installation from source on Linux requires libcurl. On Debian or Ubuntu use libcurl4-openssl-dev:
sudo apt-get install -y libcurl4-openssl-dev
Ideally your documentation would describe how to install the gsl Linux package on multiple distributions.
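For gsl specifically, the README could list something like the following (package names are my best guess and vary slightly across distribution releases):

```
# Debian/Ubuntu
sudo apt-get install libgsl0-dev

# Fedora/RHEL
sudo dnf install gsl-devel

# Arch
sudo pacman -S gsl
```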
Disclaimer: I've never developed a package that directly requires a Linux package, but I use them a lot. In case more examples would help, this doc includes a script I use to install stuff on new Ubuntu machines. Some commands were stated explicitly in the package documentation; some had little or no documentation, and required research.
edit 2018-04-07:
I encountered my new favorite example: the sys package uses a config file to produce the following message in the R console. While installing 100+ packages on a new computer, it was nice to see this direct message, and not have to track down the R package and the documentation about its dependencies.
On Debian/Ubuntu this package requires AppArmor.
Please run: sudo apt-get install libapparmor-dev
Another good one is pdftools, that also uses a config file (and is also developed by Jeroen Ooms).
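The effect of those configure scripts can be sketched in a few lines of shell. This is illustrative only, not the real sys/pdftools scripts; the function name and wording are mine:

```shell
# anticonf-style hint: if a system library is missing, tell the user
# exactly which distro package to install instead of failing cryptically
hint_missing_lib() {
  lib="$1"; deb_pkg="$2"
  echo "Configuration failed because $lib was not found."
  echo "Please run: sudo apt-get install $deb_pkg"
}

# example: the gsl library and its Debian/Ubuntu dev package
hint_missing_lib gsl libgsl0-dev
```

A real configure script would first test for the library (e.g. via pkg-config) and only print the hint on failure.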
Secondary Issue: installing on Travis
The userfriendlyscience Travis config file apparently installs a lot of binaries directly (including gsl), unlike the current ggstatsplot version.
Alternatively, I'm more familiar with telling travis to install the linux package, as demonstrated by curl's config file. As a bonus, this probably more closely replicates what typical users do on their own machines.
addons:
apt:
packages:
- libcurl4-openssl-dev
Follow-up 2018-03-13: Indrajeet and I tweaked the Travis file so it's working. Two sections were changed in the yaml file:
The libgsl0-dev entry was added under the packages section (similar to the libcurl4-openssl-dev entry above).
Packages were listed in the r_binary_packages section so they install as binaries. The build was timing out after 50 minutes, and now it's under 10 min. In this particular package, the r_binary_packages section was nested in the Linux part of the Travis matrix so it wouldn't interfere with his two OS X jobs on Travis.
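Put together, the relevant portion of the .travis.yml looks roughly like this (the r_binary_packages entries are illustrative placeholders, not the package's actual list):

```yaml
addons:
  apt:
    packages:
      - libgsl0-dev

r_binary_packages:
  - ggplot2
  - dplyr
```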
For the last few days I have been using RcppArmadillo a lot. I have been working on a 3D array convolution project. After the initial steep learning curve, I managed to create some useful C++ routines using Armadillo. Kudos to Dirk.
Then suddenly RcppArmadillo started doing weird things; I kept getting this message:
Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'
It is good practice to register native routines and to disable symbol
search.
This happens on Windows 10. On Linux you may receive the same message, but it is quickly fixable with RcppArmadillo::RcppArmadillo.package.skeleton() or by following the new instructions in Writing R Extensions, section 5.4 (Registering native routines). On Windows, THIS WILL NOT WORK, though. Why?
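For reference, the fix those instructions describe amounts to generating the registration boilerplate and pointing the package at it, e.g.:

```r
# run from the package root; writes the C registration boilerplate
# (R_registerRoutines / R_useDynamicSymbols calls) into src/init.c
tools::package_native_routine_registration_skeleton(".", con = "src/init.c")
```

You also need useDynLib(pkgname, .registration = TRUE) in the NAMESPACE file for the registration to be picked up.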
To isolate the problem I started with a virtual machine (VM) with a fresh R (3.4.1). When you start with the demo that RcppArmadillo builds, it builds without hiccups: no notes, no warning messages. As soon as you install devtools, the problems with RcppArmadillo begin.
To reproduce this, you can do the following: (1) start with a fresh R and install only the packages required to build your package with RcppArmadillo (a couple). (2) Build your demo package. You will get no errors. (3) Download the source of any of these packages that I tested: gckrig, GAS, abcrf, AbsFilterGSEA, Amelia, MAVE, SparseFactorAnalysis, RcppProgress, artfima, geospt. All of them use RcppArmadillo. They should build OK provided you feed them their dependencies. (4) Now install devtools. Immediately afterwards you will start receiving the message:
Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'
It is good practice to register native routines and to disable symbol
search
No matter what I did, I could not get rid of the note. I tried the same packages on Linux and saw no issue at all. This causes delays because one cannot submit a package to CRAN with that message.
(5) Now, if you uninstall devtools and its dependencies and try building any RcppArmadillo package again: no more "register native routines" message. Beautiful.
I wonder why devtools is causing this conflict. How can we get this fixed? I love devtools but have to keep it uninstalled if I have to work with RcppArmadillo. Tough choices.
EDIT
This is fully reproducible. Here are the steps to reproduce the behavior:
(1) Start with a fresh R and install the packages required to build your package with RcppArmadillo (a couple). Do not install devtools.
(2) Build your demo package. You will get no "register native routines" notes.
(3) Download the source of any of these packages that I have tested: gckrig, GAS, abcrf, AbsFilterGSEA, Amelia, MAVE, SparseFactorAnalysis, RcppProgress, artfima, geospt. All of them use RcppArmadillo. They should build OK provided you feed them their dependencies.
(4) Now install devtools. Try building any of the packages above again. Immediately you will start receiving the "register native routines" message.
(5) Now uninstall devtools and its dependencies and try building any RcppArmadillo package again; no more messages about registering native routines.
EDIT
This has been tested on three Windows 10 virtual machine instances, installing R 3.4.1 and Rtools from scratch. In all tests, building packages with RcppArmadillo finishes with the "register native routines" message. If you build the same package on Linux, it passes with no notes.
I am installing some R packages from source (like RQuantLib), and package installation is taking about ten minutes. Is it possible to use multiple cores during compilation?
The C++ code in RQuantLib is famously taxing -- a lot of templates, a lot of Boost, a lot of QuantLib headers.
To answer your question, set
$ export MAKE="make -j8"
in the shell before calling R CMD INSTALL. This is documented in the 'R Installation and Administration' manual.
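For example, to use one make job per available core (nproc is a GNU coreutils command; on a Mac you would use sysctl -n hw.ncpu instead):

```shell
# tell R's install process to invoke make with one job per core
export MAKE="make -j$(nproc)"
echo "$MAKE"
```

Then run R CMD INSTALL in that same shell session so the exported variable is inherited.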
I do something more and deploy ccache, which caches compilation results, so for unchanged files the gains can be tremendous. Use it by setting the compilers, e.g.
CC="ccache gcc"
CXX="ccache g++"
in ~/.R/Makevars.
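A minimal user-level file combining the two entries might look like this (compiler names assumed; add the C++-standard-specific variables only if your packages need them):

```
# ~/.R/Makevars -- route all package compilations through ccache
CC=ccache gcc
CXX=ccache g++
```

On the first build ccache populates its cache; on rebuilds of unchanged files the compiler is skipped entirely, which is where the large speedups come from.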
I'm trying to install the doMPI package in R.
Apparently there are no binaries available for the 3.x version?
Do I need to build it from source?
http://cran.r-project.org/web/packages/doMPI/
The goal is to run parallel processing with caret on a windows machine.
CRAN doesn't build binaries of doMPI for Mac OS X or Windows because it depends on the Rmpi package, and it doesn't build binaries for Rmpi because it depends on MPI libraries which don't come by default on those platforms. Some people have suggested that I declare Rmpi to be a suggested package to work-around this issue, but in fact, doMPI really does depend on Rmpi, so it always seemed like an odd thing to do. The way I see it, if you're able to build Rmpi from source, you'll have no trouble building doMPI from source.
So yes, you have to build it from source, but the bigger problem is to build Rmpi from source, unless you're using a Linux distribution like Debian that distributes both Rmpi and doMPI as binary deb packages.
But if you just want to run caret in parallel on a Windows machine, the normal solution is to use the doParallel package with a PSOCK cluster. People have trouble with that as well, but at least installation is easy, since binary packages for doParallel are available on CRAN.
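A minimal sketch of that setup (the worker count and the model/data are placeholders; method = "rf" additionally requires the randomForest package):

```r
library(doParallel)
library(caret)

# start a PSOCK cluster with one worker per core you want to use
cl <- makePSOCKcluster(4)
registerDoParallel(cl)

# caret automatically uses the registered parallel backend for resampling
fit <- train(Species ~ ., data = iris, method = "rf")

stopCluster(cl)
```

This works the same way on Windows, macOS, and Linux, which is another reason it is the usual recommendation over MPI for single-machine work.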