R packages: do they need compilation? [duplicate]

This question already has answers here:
R: apt-get install r-cran-foo vs. install.packages("foo")
In Debian, some compiled R packages are available in the official repositories. But one could also install an R package from source.
I would like to know why a user would prefer one installation method over the other.

It's sometimes preferable to compile the sources on your own server rather than just use an existing prebuilt binary.
This is because the package's compiled code (a shared library, not an exe file) is then built on your machine, so it may run faster. For instance, the compiler can optimize for your specific processor, although R's default compiler flags do not do this unless you set them (e.g. in ~/.R/Makevars).

I already provided a somewhat detailed answer in response to this SO question.
As an update, these days you even have lots of packages prebuilt thanks to updated cran2deb initiatives:
On Ubuntu you now have almost all CRAN packages prebuilt via Michael Rutter's 'cran2deb for ubuntu' PPA on Launchpad.
For Debian, Don Armstrong now provides a similar service (also covering BioConductor and OmegaHat) at debian-r.debian.net.

The idea of precompiled R packages for Debian/Ubuntu is borrowed from Windows and macOS. Those OSes have precompiled packages because they typically don't have the standard tools for building packages from source (C and Fortran compilers, LaTeX, Perl, etc.) in standard locations.
If there is a new release of a package on CRAN, is the precompiled package in the Debian repos automatically updated? I believe you are better off syncing with CRAN. Check out the ctv package to help you manage large collections of R packages (CRAN Task Views), both for installing and updating.
You need root privileges to install a precompiled package from the OS repos, while any regular user may install packages with install.packages() in R. (If you are the sysadmin, I recommend running sudo R to install CRAN Task Views, so that they are available system-wide instead of inflating your ~/.)
One inconvenience of source packages is that if you install many, compiling them takes extra time (how much depends on your machine). You might gain some performance from compiling, but the gain is not guaranteed to be noticeable.
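To see which library a given install would land in, here is a minimal sketch in R: .libPaths() lists the library directories in search order, and install.packages() installs into the first one unless you pass lib. The paths in the comments are typical Debian/Ubuntu locations and are assumptions that may differ on your system.

```r
# Where would install.packages() put a package in this session?
# .libPaths() lists library directories in search order; install.packages()
# uses the first one (.libPaths()[1]) unless you pass 'lib'.
libs <- .libPaths()
print(libs)

# file.access(mode = 2) returns 0 for directories you may write to, -1 otherwise.
writable <- libs[file.access(libs, mode = 2) == 0]
print(writable)

# Typical Debian/Ubuntu situation (paths can differ):
#  - under sudo R, the site library /usr/local/lib/R/site-library is writable,
#    so packages installed there are available to every user;
#  - as a regular user, usually only the personal library is writable:
print(Sys.getenv("R_LIBS_USER"))
```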

Related

R package build failing on Unix machines due to missing GSL - GNU Scientific Library

I am facing a particularly vexing problem with R package development. My own package, called ggstatsplot (https://github.com/IndrajeetPatil/ggstatsplot), depends on userfriendlyscience, which depends on another package called MBESS, which itself ultimately depends on another package called gsl. There is no problem at all installing ggstatsplot on a Windows machine (as checked by the AppVeyor continuous integration platform: https://ci.appveyor.com/project/IndrajeetPatil/ggstatsplot).
But whenever the package is installed on a Unix machine, installation fails: ggstatsplot can't be installed because userfriendlyscience and MBESS can't be installed, because gsl can't be installed. The same failure shows up on the Travis continuous integration platform, whose virtual Unix machines fail the package build (https://travis-ci.org/IndrajeetPatil/ggstatsplot).
One way to solve this problem for a user on a Unix machine is to configure GSL (as described in "installing R gsl package on Mac"), but I can't possibly expect every user of ggstatsplot to go through the arduous process of configuring GSL. I want them to just run install.packages("ggstatsplot") and be done with it.
So I would really appreciate it if anyone could offer helpful advice on how I can make my package users' lives simpler by removing this problem at its source. Is there something I should include in the package itself that will take care of this on behalf of the user?
This may not have a satisfying solution via changes to your R package (I'm not sure either way). If the gsl package authors (who include a former R Core member) didn't configure it to avoid a prerequisite installation of a Linux package, there's probably a good reason not to.
But it may be some consolation that most R + Linux users understand that some R packages first require installing the underlying Linux libraries (e.g., through apt or dnf/yum).
Primary Issue: making it easy for the users to install
Try to be super clear in the GitHub README and the CRAN INSTALL file. The gsl package has decent CRAN directions. They lead to the following bash code (on newer Debian/Ubuntu releases the package is called libgsl-dev):
sudo apt-get install libgsl0-dev
The best example of clear documentation for Linux prerequisite packages I've seen is from the curl and sf packages. sf's CRAN page lists only the human-readable names of the 3 libraries, but its GitHub page provides the exact bash commands for three major distribution branches. The curl package does this very well too (e.g., on CRAN and GitHub). For example, it provides the following explanation and bash code:
Installation from source on Linux requires libcurl. On Debian or Ubuntu use libcurl4-openssl-dev:
sudo apt-get install -y libcurl4-openssl-dev
Ideally your documentation would describe how to install the GSL Linux package on multiple distributions.
Disclaimer: I've never developed a package that directly requires a Linux package, but I use them a lot. In case more examples would help, this doc includes a script I use to install stuff on new Ubuntu machines. Some commands were stated explicitly in the package documentation; some had little or no documentation, and required research.
edit 2018-04-07:
I encountered my new favorite example: the sys package uses a config file to produce the following message in the R console. While installing 100+ packages on a new computer, it was nice to see this direct message, and not have to track down the R package and the documentation about its dependencies.
On Debian/Ubuntu this package requires AppArmor.
Please run: sudo apt-get install libapparmor-dev
Another good one is pdftools, which also uses a config file (and is also developed by Jeroen Ooms).
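The same kind of check can be sketched in R. This is a hypothetical helper, not how sys, pdftools or gsl implement it: it looks for the gsl-config script that GSL's development package installs and prints distribution-specific instructions if it is missing. The package names libgsl-dev and gsl-devel are assumptions to verify for your target systems.

```r
# Hypothetical helpers (not part of gsl, sys or pdftools): look for the
# gsl-config script installed by GSL's development package and, if it is
# missing, print distribution-specific instructions.
gsl_available <- function() {
  nzchar(Sys.which("gsl-config"))      # TRUE if gsl-config is on the PATH
}

gsl_hint <- function() {
  if (gsl_available()) {
    message("GSL found: ", Sys.which("gsl-config"))
  } else {
    # Package names are assumptions for current releases; check your distribution.
    message(
      "The gsl R package needs the GNU Scientific Library.\n",
      "On Debian/Ubuntu please run: sudo apt-get install libgsl-dev\n",
      "On Fedora please run:        sudo dnf install gsl-devel"
    )
  }
  invisible(gsl_available())
}

gsl_hint()
```

A message like this could, for example, be shown in a README so users see the exact command before the gsl installation fails.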
Secondary Issue: installing on Travis
The userfriendlyscience Travis config file apparently installs a lot of binaries directly (including gsl), unlike the current ggstatsplot version.
Alternatively, and this is what I'm more familiar with, you can tell Travis to install the Linux package, as demonstrated by curl's config file. As a bonus, this probably replicates more closely what typical users do on their own machines.
addons:
  apt:
    packages:
      - libcurl4-openssl-dev
Follow-up 2018-03-13: Indrajeet and I tweaked the Travis file so that it now works. Two sections of the YAML file were changed:
The libgsl0-dev entry was added under the packages section (similar to the libcurl4-openssl-dev entry above).
Packages were listed in the r_binary_packages section so they install as binaries. The build was timing out after 50 minutes, and now it's under 10 min. In this particular package, the r_binary_packages section was nested in the Linux part of the Travis matrix so it wouldn't interfere with his two OS X jobs on Travis.

Does installing an R package with Rcpp require Rtools (Windows)

I am making an R package with Rcpp. It works fine on my machine which has Rtools installed. But recently, I tried to install my package locally on a different machine (Windows) and got a compiling error. The reason was that on that machine there was no g++ compiler (for Windows, g++ is provided with Rtools). After installing Rtools, it worked just fine.
So the question is: if I upload it to CRAN, will users still need to install Rtools by hand? Or does the function install.packages() detect and install Rtools for them?
Also, if you guys know some packages written with Rcpp, please let me know. I'd like to take a look how it works.
So the question is: if I upload it to CRAN, will users still need to install Rtools by hand?
No.
Or does the function install.packages() detect and install Rtools for them?
No.
What happens is that CRAN builds precompiled binary packages that Windows and macOS users can install without compilers and related tools. (On Linux, install.packages() installs from source, so a compiler is needed there, but one is often already present.)
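To see which case applies on a particular machine, R records its native package type in .Platform$pkgType. A minimal sketch (the comment about "both" is hedged because the default option value depends on how R was built and configured):

```r
# .Platform$pkgType is the platform's native package type: "win.binary" on
# Windows, a "mac.binary..." variant on CRAN's macOS builds of R, and "source"
# on other Unix-alikes such as Linux, where packages are compiled locally.
native_type <- .Platform$pkgType
print(native_type)

# install.packages() uses getOption("pkgType") by default; on Windows and
# macOS this is often "both" (use a binary when one is available).
print(getOption("pkgType"))

# Packages with C/C++/Fortran code need a compiler only for source installs.
compiles_locally <- identical(native_type, "source")
print(compiles_locally)
```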
Also, if you guys know some packages written with Rcpp, please let me know. I'd like to take a look how it works.
rr <- devtools::revdep("Rcpp")
length(rr) ## 907
or see the Rcpp page on CRAN.
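Without devtools, base R's tools::package_dependencies() with reverse = TRUE gives a similar list. The sketch below uses a tiny hand-made index with made-up package names so it runs offline. Against the live CRAN index (db = available.packages()) the count may differ from devtools::revdep(), which by default also counts Suggests.

```r
# Reverse dependencies with base R only (tools is part of every R install).
# 'toy_db' mimics a few columns of available.packages(); the package names
# fastA, fastB and plainC are made up for this offline example.
toy_db <- cbind(
  Package   = c("Rcpp", "fastA", "fastB", "plainC"),
  Depends   = c(NA, "R (>= 3.0.0)", NA, NA),
  Imports   = c("methods", "Rcpp (>= 0.12.0)", "Rcpp", "stats"),
  LinkingTo = c(NA, "Rcpp", "Rcpp", NA)
)

# which = "strong" (the default) covers Depends, Imports and LinkingTo.
rev_deps <- tools::package_dependencies("Rcpp", db = toy_db, reverse = TRUE)
print(rev_deps[["Rcpp"]])

# Same query against the current CRAN index (needs network access):
# rr <- tools::package_dependencies("Rcpp", db = available.packages(),
#                                   reverse = TRUE)[["Rcpp"]]
# length(rr)
```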
Users, i.e. people who install the package via install.packages() on Windows or macOS, are actually downloading a compiled version of the package, called a binary, that is built by CRAN. They only ever need a copy of R.
On the other hand, developers, i.e. people who create the package, need development tools that are system specific. Developers on Windows must have a local installation of Rtools. Developers on macOS must have their own copies of the gfortran binaries and the Xcode command line tools.
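A rough way to check whether the usual build tools are visible from R is to look them up on the PATH. This is a heuristic sketch with a hypothetical helper name: it only inspects the PATH, so it can miss tools that R is configured to find some other way. If the pkgbuild package is installed, pkgbuild::has_build_tools() is a more thorough check.

```r
# Hypothetical helper: report which common build tools are on the PATH.
# Rough heuristic only: R's actual compiler settings live in its Makeconf,
# and tools missing from PATH may still be configured elsewhere.
find_build_tools <- function(tools = c("make", "gcc", "g++")) {
  found <- Sys.which(tools)            # "" for every tool not found
  stats::setNames(nzchar(found), tools)
}

status <- find_build_tools()
print(status)
if (!all(status)) {
  message("Not found on PATH: ", paste(names(status)[!status], collapse = ", "))
}
```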
Lastly, there are many such Rcpp packages available to look to for inspiration...
See the Rcpp CRAN page
Pick a package and then look up the source at http://github.com/cran/packagename
View an annotated list by Dirk here.

What is the difference between the two commands for installing an R package?

When I install packages in R, sometimes devtools::install_github() is used and other times install.packages().
Could I ask what is the essential difference between them?
R's official repository for packages is CRAN (Comprehensive R Archive Network). Publishing a package there involves a very strict review process, and packages on CRAN are installed via install.packages(). For the most part (on Windows and macOS), binary packages are available (as opposed to source code, which has not been compiled yet), so no additional tools need to be present for installation (see next paragraph).
GitHub is one of many web services that host code repositories, including R code. Authors can upload their packages there, and if everything is in place, users can install a package from source via devtools::install_github(). This means that, if the package contains compiled code, you need a proper toolchain installed (and possibly a LaTeX distribution, e.g. for building PDF documentation). On Windows, this means Rtools. Linux-based systems usually ship with most of the necessary tools.

R: can rpm files be used with Windows for possibly outdated R packages?

I was trying to run code that required the R packages ‘pkgDepTools’ and ‘Rgraphviz’. I received error messages saying that neither package is available for R version 2.15.0.
A Google search turned up the following pages on the RPM Pbone website that seem to have the packages:
http://rpm.pbone.net/index.php3/stat/4/idpl/17802118/dir/mandrake_other/com/R-pkgDepTools-1.20.0-1-mdv2012.0.i586.rpm.html
and
http://rpm.pbone.net/index.php3/stat/4/idpl/17802080/dir/mandrake_other/com/R-Rgraphviz-1.32.0-2-mdv2012.0.i586.rpm.html
However, the files have an *.rpm extension rather than the *.tar.gz or *.zip extensions I am used to.
I am using Windows 7 and R version 2.15.0. Can I install an R package from an *.rpm file?
From Wikipedia, *.rpm seems to be a Linux package format:
http://en.wikipedia.org/wiki/RPM_Package_Manager
Regarding other possible solutions, I have found several earlier posts here with similar questions about installing R packages that are not available for the most recent version of R:
Bivariate Poisson Regression in R?
Package ‘GeneR’ is not available
R Venn Diagram package Venerable unavailable - alternative package?
I have installed the latest version of Rtools and the package 'devtools', although I know nothing about them.
There is an archived version of 'Rgraphviz' here:
http://cran.r-project.org/src/contrib/Archive/Rgraphviz/
but I cannot locate an archived version of 'pkgDepTools'.
If I can install the packages on a Windows machine using the above *.rpm files could someone please provide instructions?
If I must use Rtools to build them I might ask more questions because the instructions at the link below are challenging for me:
http://cran.r-project.org/doc/manuals/R-admin.html#Building-from-source
To be completely transparent I am hoping someone might build them for me, if that is possible. Although I recognize the experience and knowledge gained from doing it myself would probably pay off in the long run.
Thank you for any advice.
pkgDepTools and Rgraphviz are Bioconductor R packages, not packages hosted on CRAN. Unless you configure R to download packages from the Bioconductor repositories, R will report that they are not available; it can only install from repositories it has been configured to use.
To install those Bioconductor packages, Bioconductor provided (at the time of writing) a lightweight installation script:
source("http://bioconductor.org/biocLite.R")
biocLite(c("pkgDepTools", "Rgraphviz"))
Further details are provided on the Install page of the Bioconductor website.
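biocLite() has since been retired; current Bioconductor packages are installed through the BiocManager package from CRAN. A minimal sketch, where the wrapper name install_bioc is hypothetical. Running it needs network access, and whether a given package is still part of the current Bioconductor release should be checked on the Bioconductor website.

```r
# Sketch of the current route: biocLite() was retired in favour of the
# BiocManager package on CRAN. Needs network access when called.
install_bioc <- function(pkgs) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Usage (availability of each package depends on the Bioconductor release):
# install_bioc(c("pkgDepTools", "Rgraphviz"))
```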
In general you can't use RPM packages on Windows; RPMs are binary packages for (RPM-based) Linux distributions, so any C/C++/Fortran code in them has been compiled for Linux, not Windows. If a package really isn't available for your version of R, check whether CRAN states a reason (usually Windows binaries take a few days longer to produce, or the package may require software that is not available on the CRAN Windows build machines). You can try the WinBuilder service run by Uwe Ligges to build Windows binaries of packages for you, but if the package was on CRAN and now isn't, that suggests it no longer works with current R and cannot be built.
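The file extension already tells you which kind of package file you have. A hypothetical helper that maps the common extensions (a sketch, not an R function; the file names in the examples are illustrative):

```r
# Hypothetical helper: classify a package file by its extension.
package_file_kind <- function(path) {
  name <- basename(path)
  if (grepl("\\.tar\\.gz$", name)) return("source package")          # any OS; compiled code needs build tools
  if (grepl("\\.zip$", name))      return("Windows binary package")
  if (grepl("\\.tgz$", name))      return("macOS binary package")
  if (grepl("\\.rpm$", name))      return("Linux RPM (not installable by R)")
  "unknown"
}

package_file_kind("R-Rgraphviz-1.32.0-2-mdv2012.0.i586.rpm")
package_file_kind("Rgraphviz_1.32.0.tar.gz")

# A source tarball can then be installed from a local file, e.g.:
# install.packages("Rgraphviz_1.32.0.tar.gz", repos = NULL, type = "source")
```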
In general try a wider search for packages; the first hit in my Google search results under the search string "pkgDepTools" is the Bioconductor page for the package which includes a link to the Windows binary and instructions on how to install the package from within R.
I think this merits an answer rather than a comment.
A gentleman at Bioconductor helped me get Rgraphviz installed. The primary problem was that the version of Rgraphviz I had downloaded only seems to work with the 32-bit version of R and I was running a 64-bit version of R. I was able to install Rgraphviz in the 32-bit version of R.
I had also made an error or two in the PATH statement during some of my attempts to install Rgraphviz. However, the post above in my second comment provides the instructions for installation.
It seems you simply cannot install the normal downloadable version of Rgraphviz in the 64-bit version of R.
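To check which build of R a session is running (32-bit R on Windows was dropped as of R 4.2.0), a small sketch:

```r
# A pointer is 8 bytes on 64-bit builds of R and 4 bytes on 32-bit builds.
bits <- 8L * .Machine$sizeof.pointer
print(bits)

# The architecture string, e.g. "x86_64" on 64-bit Intel/AMD or "i386" on 32-bit.
print(R.version$arch)
```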
I think many of our emails back and forth are now posted on the Bioconductor forum.
I might edit this answer with more detailed instructions in the next 24-hours.

How to install and manage many versions of R packages

I am developing a framework for reproducible computing with R. One problem I am struggling with is that some R code might run perfectly with version X.Y-Z of a package, but when you try to reproduce it 3 years later, the packages have been updated, some functions have changed, and the code doesn't run anymore. This problem also affects, for example, Sweave documents that use packages.
The only way to confidently reproduce the results is by installing the R version and version of the packages that were used by the original author. If this was a single case, one could pull stuff from the CRAN archives and install appropriate versions. But for my framework this is impractical, and I need to have the package versions preinstalled.
Assume for now that I restrict myself to a single version of R, e.g. 2.14. What would be a practical way to install many versions of R packages, so that I can load them on the fly? I suppose I can do something like creating separate library directories for every version of every package and then using custom lib.loc arguments while loading them. This is going to be messy though. Any tips or previous attempts to do something similar?
My framework runs on Ubuntu server.
You could install packages under versioned names (e.g. rename the installed directory to foo_1.0 instead of foo) and symlink the versions you want into one library to re-create a given R + packages snapshot. Obviously, the packages could actually live in a separate tree, so you could have library.projectX/foo -> library.all/foo/1.0.
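The symlink layout can be sketched in R with file.symlink(). The helper name, the store layout library.all/<package>/<version>/ and the empty placeholder directories are assumptions for illustration; in practice each version directory would hold an installed package (e.g. moved there after installing into a temporary library). Symlinks assume a Unix-like system such as the Ubuntu server mentioned in the question.

```r
# Hypothetical helper: fill a project library with symlinks into a shared
# store laid out as <store>/<package>/<version>/, where each version directory
# is assumed to hold an installed copy of that package.
link_snapshot <- function(store, project_lib, versions) {
  dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)
  for (pkg in names(versions)) {
    target <- file.path(store, pkg, versions[[pkg]])
    if (!dir.exists(target)) stop("not in store: ", pkg, " ", versions[[pkg]])
    # The link must carry the package name so that library() can find it.
    link <- file.path(project_lib, pkg)
    current <- Sys.readlink(link)        # "" = not a symlink, NA = e.g. missing
    if (!is.na(current) && nzchar(current)) {
      file.remove(link)                  # replace an existing symlink
    } else if (file.exists(link)) {
      stop("refusing to overwrite a real directory: ", link)
    }
    file.symlink(normalizePath(target), link)
  }
  invisible(project_lib)
}

# Demo with empty placeholder directories instead of real installed packages.
root  <- tempfile("rlibs-")
store <- file.path(root, "library.all")
for (v in c("1.0", "1.1")) dir.create(file.path(store, "foo", v), recursive = TRUE)
dir.create(file.path(store, "bar", "2.1"), recursive = TRUE)

proj <- file.path(root, "library.projectX")
link_snapshot(store, proj, c(foo = "1.0", bar = "2.1"))
print(Sys.readlink(file.path(proj, c("foo", "bar"))))

# With real packages in the store, a session would then use, e.g.:
# library(foo, lib.loc = proj)   or   .libPaths(c(proj, .libPaths()))
```

Switching a project to another version is then just a matter of re-pointing one link, e.g. link_snapshot(store, proj, c(foo = "1.1")).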
The operating system gives you even more handles for complete separation, and the Debian/Ubuntu stack has a ton of those available. Two I have played with are:
chroot environments: We use these to completely separate build environments from host machines. For example, all Debian uploads I produce are built in an i386 pbuilder chroot hosted on my amd64 Ubuntu server. Chroot is a very powerful Unix system call. Chroots, and particularly the pbuilder system built on top of them (for Debian package building), are meant to operate headless.
Virtual machines: This gives you full generality. My not-so-powerful box easily handles three virtual machines: Debian i386, Ubuntu i386 as well as Windoze XP. For this, I currently use KVM along with libvirt; this is Linux specific. I have also used VirtualBox and VMware in the past.
I would try to modify the DESCRIPTION file, and change the field "Package" there by adding the version number.
For example, download the package source from its CRAN page (http://cran.r-project.org/web/packages/pls/) and unpack the compressed file (pls_2.3-0.tar.gz) to a directory ("pls/"). Then change the package name in the DESCRIPTION file ("pls/DESCRIPTION") and install it with the command 'R CMD INSTALL pls/', where 'pls/' is the path to the package source with the modified DESCRIPTION file.
Playing with R library paths seems a dangerous thing to me.
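The DESCRIPTION edit from this answer can be scripted with read.dcf() and write.dcf(). A minimal sketch with a hypothetical helper: package names may contain only ASCII letters, digits and dots, must start with a letter and must not end in a dot, so the version is appended as pls.2.3.0 rather than pls_2.3-0. Packages whose code refers to their own name (for instance useDynLib() in NAMESPACE for compiled code) would need further edits.

```r
# Hypothetical helper: append the version to the Package field of an unpacked
# source package's DESCRIPTION, e.g. "pls" 2.3-0 -> "pls.2.3.0".
rename_source_package <- function(pkg_dir) {
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  desc <- read.dcf(desc_file)                       # one-row character matrix
  version <- gsub("-", ".", desc[1, "Version"], fixed = TRUE)
  new_name <- paste0(desc[1, "Package"], ".", version)
  desc[1, "Package"] <- new_name
  write.dcf(desc, desc_file)
  new_name
}

# Demo on a minimal stand-in DESCRIPTION instead of the real pls sources.
demo_dir <- file.path(tempdir(), "pls")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Package: pls", "Version: 2.3-0", "Title: Demo"),
           file.path(demo_dir, "DESCRIPTION"))
new_name <- rename_source_package(demo_dir)
print(new_name)   # [1] "pls.2.3.0"

# Then, from a shell: R CMD INSTALL pls/
# and in R: library(pls.2.3.0)
```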
