Most common configure flags for installing R from source on 64-bit Ubuntu

I'm convinced that using Dirk's package is the best way to install and maintain R on an Ubuntu system. But I want to have some fun and get used to installing R from source.
What are the most common configure flags to use when installing?
Also, if I want to install 2.14.1 and I have 2.14.0 currently installed (which was installed from source), should I first uninstall 2.14.0?

There was a recent thread somewhere about having several versions---one from the apt-get repo, one in /usr/local. Try to find that...
Otherwise, I will roll up 2.14.1 on Friday morning, Michael will do his magic and the repo will have .deb packages of 2.14.1 'real soon', sometimes within a day.
Lastly, you can see which flags are used by getting the package sources, which you do via apt-get source r-base (and that works that way for any Debian/Ubuntu package, provided you have source references in apt's sources list).
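For what it's worth, here is a sketch of a fairly typical source build; treat the flag selection as illustrative rather than canonical (the authoritative list is whatever debian/rules in the r-base source package passes):

apt-get source r-base                     # inspect debian/rules for the flags Debian itself uses
tar xzf R-2.14.1.tar.gz && cd R-2.14.1    # source tarball fetched from CRAN
./configure --prefix=/usr/local \
            --enable-R-shlib \
            --enable-memory-profiling \
            --with-blas --with-lapack \
            --with-readline --with-cairo --with-tcltk
make && sudo make install                 # with the same --prefix, this installs over the existing 2.14.0 tree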
Edit: By the way, regarding the '64-bit' aspect of your question: Nada. We don't do anything differently. It is "merely" the host OS being more generous with resources. But R finds all it needs to know on its own via its configure etc logic.

Related

R Package Need compilation [duplicate]

This question was closed as a duplicate of "R: apt-get install r-cran-foo vs. install.packages("foo")", which appears in full below.
In Debian, there are some compiled R packages in the official repositories, but one can also install an R package from source.
I am interested to know why a user would prefer one method of installation over the other.
It's sometimes preferable to compile the sources on your server rather than just using an existing executable.
This is because the compiler builds the executable specifically for your machine, so it may run faster and work better; for instance, the compiler knows which processor you have and can optimise for it.
I already provided a somewhat detailed answer in response to this SO question.
As an update, these days you even have lots of packages prebuilt thanks to updated cran2deb initiatives:
On Ubuntu you now have almost all CRAN packages prebuilt via Michael Rutter's 'cran2deb for ubuntu' ppa on Launchpad.
For Debian, Don Armstrong now provides a similar service (also covering BioConductor and OmegaHat) at debian-r.debian.net.
The idea of pre-compiled R packages for Debian/Ubuntu borrows from Windows and macOS. Those OSes get pre-compiled packages because they typically don't have the standard tools in standard locations for building packages from source (C and Fortran compilers, LaTeX, Perl, etc.).
If there is a new release of a package on CRAN, is the pre-compiled package in the Debian repos automatically updated? I believe you are better off syncing with CRAN. Check out the package ctv to help you manage large collections of R packages ("CRAN views"), both for installing and updating; see the short example below.
You need root privileges to install a pre-compiled package from the OS repos, whereas any regular user may install packages using install.packages() in R (though I recommend running sudo R, if you are the sysadmin, when installing CRAN views, so as to make them available system-wide instead of inflating your ~/).
One inconvenience of source packages is that if you fetch many of them, compiling adds considerable installation time (depending on your machine). You might gain some performance from compiling, but the gain is not guaranteed to be noticeable.
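A minimal shell sketch of that ctv workflow; the view name "Econometrics" and the CRAN mirror URL are just placeholders:

sudo Rscript -e 'options(repos = "https://cloud.r-project.org")' \
             -e 'install.packages("ctv")' \
             -e 'ctv::install.views("Econometrics")'
# later, to refresh everything in the view:
# sudo Rscript -e 'options(repos = "https://cloud.r-project.org"); ctv::update.views("Econometrics")'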

Precompile R packages for specific Linux flavor

We have established a simple local CRAN-like repository for R packages. There are many users, all of which use the same version of Linux.
Is there a way of convincing R to provide pre-compiled Linux packages instead of just source ones? The compilation step takes a considerable amount of time for anyone using our repository. It should be possible to precompile and reuse the same binaries, since we can guarantee that the Linux version is consistent across users.
How could one hack something like this together?
In the very narrow sense of "all of which use the same version of Linux" you actually have an option (one that happens to be relatively little known). Create binary packages using e.g.
R CMD INSTALL --build nameOfDirectoryWithSources
As R CMD INSTALL --help puts it:
--build build binaries of the installed package(s)
and these are not .deb- or .rpm-like packages: no dependency information or the like is added. But they do exactly what you ask for: they save on compilation time.
I am not aware of a repository structure one could build from this, though.
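A rough sketch of how that can be used across identical machines; the package name and the exact tarball suffix are assumptions (the suffix R generates encodes the platform):

# on the build machine
R CMD INSTALL --build myPkg/                            # yields e.g. myPkg_1.0_R_x86_64-pc-linux-gnu.tar.gz
# copy the resulting tarball to the other machines, then on each one:
R CMD INSTALL myPkg_1.0_R_x86_64-pc-linux-gnu.tar.gz    # unpacks the pre-compiled package, no compilation step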

R package build failing on Unix machines due to missing GSL - GNU Scientific Library

I am facing a particularly vexing problem with R package development. My own package, called ggstatsplot (https://github.com/IndrajeetPatil/ggstatsplot), depends on userfriendlyscience, which depends on another package called MBESS, which itself ultimately depends on another package called gsl. There is no problem at all for installation of ggstatsplot on a Windows machine (as assessed by AppVeyor continuous integration platform: https://ci.appveyor.com/project/IndrajeetPatil/ggstatsplot).
But whenever the package is to be installed on Unix machines, it throws the error that ggstatsplot can't be downloaded because userfriendlyscience and MBESS can't be downloaded because gsl can't be downloaded. The same thing is also revealed on Travis continuous integration platform with virtual Unix machines, where the package build fails (https://travis-ci.org/IndrajeetPatil/ggstatsplot).
Now one way to solve this problem for the user on the Unix machine is to configure GSL (as described here:
installing R gsl package on Mac), but I can't possibly expect every user of ggstatsplot to go through the arduous process of configuring GSL. I want them to just run install.packages("ggstatsplot") and be done with it.
So I would really appreciate it if anyone could offer helpful advice on how I can make my package users' lives simpler by removing this problem at its source. Is there something I should include in the package itself that will take care of this on behalf of the user?
This may not have a satisfying solution via changes to your R package (I'm not sure either way). If the gsl package authors (which include a former R Core member) didn't configure it to avoid a pre-req installation of a linux package, there's probably a good reason not to.
But it may be some consolation that most R+Linux users understand that some R packages first require installing the underlying Linux libraries (eg, through apt or dnf/yum).
Primary Issue: making it easy for the users to install
Try to be super clear on the GitHub readme and the CRAN INSTALL file. The gsl package has decent CRAN directions. This leads to the following bash code:
sudo apt-get install libgsl0-dev
The best example of clear (Linux pre-req package) documentation I've seen is from the curl and sf packages. sf's CRAN page lists only the human-readable names of the three libraries, but the GitHub page provides the exact bash commands for three major distribution branches. The curl package does this very well too (e.g., CRAN and GitHub). For example, it provides the following explanation and bash code:
Installation from source on Linux requires libcurl. On Debian or Ubuntu use libcurl4-openssl-dev:
sudo apt-get install -y libcurl-dev
Ideally your documentation would describe how to install the gsl Linux package on multiple distributions.
Disclaimer: I've never developed a package that directly requires a Linux package, but I use them a lot. In case more examples would help, this doc includes a script I use to install stuff on new Ubuntu machines. Some commands were stated explicitly in the package documentation; some had little or no documentation, and required research.
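As an aside, a sketch of the kind of system prerequisites such a script tends to install up front so that the corresponding R packages build from source (the package names are the Ubuntu ones of that era and may differ on newer releases):

sudo apt-get update
sudo apt-get install -y libcurl4-openssl-dev libssl-dev libxml2-dev libgsl0-dev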
edit 2018-04-07:
I encountered my new favorite example: the sys package uses a config file to produce the following message in the R console. While installing 100+ packages on a new computer, it was nice to see this direct message, and not have to track down the R package and the documentation about its dependencies.
On Debian/Ubuntu this package requires AppArmor.
Please run: sudo apt-get install libapparmor-dev
Another good one is pdftools, which also uses a config file (and is also developed by Jeroen Ooms).
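For illustration, a minimal configure-style check along the lines of what such packages do. This is only a sketch, not the actual sys or pdftools script, and it assumes pkg-config plus GSL's gsl.pc file are the right probe:

#!/bin/sh
# configure: fail early with an actionable hint if the system library is missing
if ! pkg-config --exists gsl; then
  echo "Configuration failed: the GSL development files were not found."
  echo "On Debian/Ubuntu, please run: sudo apt-get install libgsl0-dev"
  exit 1
fi
# pass the discovered flags on to the package's src/Makevars
echo "PKG_CFLAGS=$(pkg-config --cflags gsl)" >  src/Makevars
echo "PKG_LIBS=$(pkg-config --libs gsl)"     >> src/Makevars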
Secondary Issue: installing on Travis
The userfriendlyscience Travis config file apparently installs a lot of binaries directly (including gsl), unlike the current ggstatsplot version.
Alternatively, I'm more familiar with telling Travis to install the Linux package, as demonstrated by curl's config file. As a bonus, this probably more closely replicates what typical users do on their own machines.
addons:
  apt:
    packages:
      - libcurl4-openssl-dev
Follow-up 2018-03-13: Indrajeet and I tweaked the Travis file so it's working. Two sections were changed in the YAML file:
The libgsl0-dev entry was added under the packages section (similar to the libcurl4-openssl-dev entry above).
Packages were listed in the r_binary_packages section so they install as binaries. The build was timing out after 50 minutes, and now it's under 10 min. In this particular package, the r_binary_packages section was nested in the Linux part of the Travis matrix so it wouldn't interfere with his two OS X jobs on Travis.

How to install and manage many versions of R packages

I am developing a framework for reproducible computing with R. One problem I am struggling with is that some R code might run perfectly in version X.Y-Z of a package, but when you try to reproduce it 3 years later, the packages have been updated, some functions have changed, and the code doesn't run anymore. This problem also affects, for example, Sweave documents that use packages.
The only way to confidently reproduce the results is by installing the R version and version of the packages that were used by the original author. If this was a single case, one could pull stuff from the CRAN archives and install appropriate versions. But for my framework this is impractical, and I need to have the package versions preinstalled.
Assume for now that I restrict myself to a single version of R, e.g. 2.14. What would be a practical way to install many versions of R packages, so that I can load them on the fly? I suppose I could do something like creating separate library directories for every version of every package and then using custom lib.loc arguments when loading them. This is going to be messy, though. Any tips or previous attempts to do something similar?
My framework runs on Ubuntu server.
You could install packages under versioned names (e.g. rename the installed directory to foo_1.0 instead of foo) and then symlink the versions you want into a single library to re-create a given R + packages snapshot. Obviously, the packages could actually live in a separate tree, so you could have library.projectX/foo -> library.all/foo/1.0.
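A rough sketch of that layout with made-up paths; the final line shows how a project then loads its snapshot:

# stage each version into its own directory under a shared tree
mkdir -p /opt/Rlib/library.all/foo /opt/Rlib/library.projectX
R CMD INSTALL -l /tmp/stage foo_1.0.tar.gz
mv /tmp/stage/foo /opt/Rlib/library.all/foo/1.0

# wire up one project's snapshot as a library of symlinks
ln -s /opt/Rlib/library.all/foo/1.0 /opt/Rlib/library.projectX/foo

# load against that library only
Rscript -e 'library("foo", lib.loc = "/opt/Rlib/library.projectX")'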
The operating system gives you even more handles for complete separation, and the Debian / Ubuntu stack has a ton of those available. Two I have played with are
chroot environments: We use these to completely separate build environments from host machines. For example, all Debian uploads I produce are built in an i386 pbuilder chroot hosted on my amd64 Ubuntu server. chroot is a very powerful Unix system call. Chroots, and particularly the pbuilder system built on top of them (for Debian package building), are meant to operate headless.
Virtual machines: This gives you full generality. My not-so-powerful box easily handles three virtual machines: Debian i386, Ubuntu i386 as well as Windoze XP. For this, I currently use KVM along with libvirt; this is Linux specific. I have also used VirtualBox and VMware in the past.
I would try to modify the DESCRIPTION file and change the "Package" field there by adding the version number.
For example, download the package source from its CRAN page (http://cran.r-project.org/web/packages/pls/) and unpack the compressed file (pls_2.3-0.zip) into a directory ("pls/"). The remaining steps are to change the package name in DESCRIPTION ("pls/DESCRIPTION") and install with the R command 'R CMD INSTALL pls/', where 'pls/' is the path to the package source with the modified DESCRIPTION file.
Playing with R library paths seems a dangerous thing to me.
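For completeness, a hedged sketch of the DESCRIPTION-renaming idea above; the archive URL pattern and the chosen name "pls.2.3.0" are assumptions, and packages with compiled code or that refer to their own name may not survive the rename:

wget https://cran.r-project.org/src/contrib/Archive/pls/pls_2.3-0.tar.gz   # assumed archive URL pattern
tar xzf pls_2.3-0.tar.gz
sed -i 's/^Package: pls$/Package: pls.2.3.0/' pls/DESCRIPTION
R CMD INSTALL pls/
Rscript -e 'library("pls.2.3.0")'                                          # load the versioned copy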

R: apt-get install r-cran-foo vs. install.packages("foo")

When installing R packages (say mcmcpack in this example) under Ubuntu I have the choice between the following two methods of installation:
# Let the distribution's packaging system take care of installation/upgrades
apt-get install r-cran-mcmcpack
# Let R take care of installation/upgrades
install.packages("mcmcpack")
Questions:
Is any of the two ways of installing R packages considered "best practice"?
Assume that I first install.packages("mcmcpack") and later on apt-get install r-cran-mcmcpack - should I expect trouble?
Assume that I first apt-get install r-cran-mcmcpack and later on install.packages("mcmcpack") - should I expect trouble?
Update (some thirteen years later): It is now as easy as it seems if you use, for example, the wonderful and powerful r2u system I set up last year, which now provides over 20k binary .deb packages for each of two Ubuntu LTS releases (currently: 20.04 and 22.04) and is also accessible via install.packages() thanks to bspm. Follow the link to r2u for more.
It's not as easy as it seems.
Going the apt-get route is good if and when
packages exist -- but there are only around 150 or so r-cran-* packages out of a pool of 2100+ packages on CRAN, so coverage is rather sparse
packages are maintained, bug-free and current
you are happy enough with the twice-yearly Ubuntu releases
install.packages() and later update.packages() is good if and when
you know what it takes to have build-time dependencies (besides r-base-dev) installed
you don't mind running update.packages() by hand as well as the apt-get updates.
On my Ubuntu machine at work, I go with the second solution. But because the first one is better if you have enough coverage, we have built cran2deb which provides 2050+ binary deb packages for amd64 and i386 --- but only for Debian testing. That is what I use at home.
As for the last question of whether you 'should expect trouble': no, because R_LIBS_SITE is set in /etc/R/Renviron to be
# edd Apr 2003 Allow local install in /usr/local, also add a directory for
# Debian packaged CRAN packages, and finally the default dir
# edd Jul 2007 Now use R_LIBS_SITE, not R_LIBS
R_LIBS_SITE=${R_LIBS_SITE-'/usr/local/lib/R/site-library:\
/usr/lib/R/site-library:/usr/lib/R/library'}
which means that your packages go into /usr/local/lib/R/site-library whereas those managed by apt go into /usr/lib/R/site-library and (in the case of base packages) /usr/lib/R/library.
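A quick way to confirm the resulting search order on a given machine (the output shown is just what the above implies, not a guarantee):

Rscript -e '.libPaths()'
# e.g. "/usr/local/lib/R/site-library" "/usr/lib/R/site-library" "/usr/lib/R/library"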
Hope that clarifies matters. The r-sig-debian mailing list is a more informed place for questions like this.
I'd consider using apt-get best practice since you will get automatic updates through the standard system tools.
Having two versions installed might get you into confusing situations: depending on your R setup you could load a different package version than you expect -- your private (possibly outdated) one will in general be loaded first.
See above.

Resources