Why do different operating systems install packages from CRAN differently? - r

When I call install.packages() in R on a Windows machine, the package downloads from CRAN and installs. When I do the same on a Linux box, the package usually has to compile (at least, I assume that's what is going on with all those g++ lines that scroll past).
Why is the package installation method different on Windows?
Other questions and their answers make it clear that using different methods and repositories, particularly on Linux, enables users to get more or different packages (particularly using the cran2deb repository). My question is more theoretical in nature: why is the default choice in Windows to download precompiled (binary?) packages, whereas the default in Linux seems to be to compile packages from source?
Or to put it another way (based on Dirk's assertion in the second link above), why doesn't CRAN offer binary packages for Unix-type operating systems?

In general, Windows binaries will work on all versions of Windows.
Ditto for the key / current versions of MacOS: the provided binaries work.
Linux, sadly, is more complicated because the different distros lay things out differently. Something I build on Ubuntu or Debian (or, more specifically, a particular release version thereof) may not even work on other releases of the same distro, let alone other distros. In some cases you can get binaries. At some point I (co-)owned a build service for all of CRAN, but it died/broke. All doable with effort, but ... some effort.
So from source it is. That used to be the standard anyway, when "Unix" was a catch-all phrase covering SunOS/Solaris, AIX, *BSD, SGI and on and on. Often even with different processors. So source.
There have been attempts to provide 'universal binaries': Flatpak and Snap are two more recent examples. And then there is of course Docker.
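To see which default applies on a given machine, here is a minimal sketch using only base R. The package name zoo in the commented-out call is just an illustration, and the values printed depend on the platform and on how R was built.

```r
# Report the package type that install.packages() uses by default on this machine.
# On Linux this is typically "source"; CRAN builds for Windows and macOS
# typically default to a binary type.
default_type <- getOption("pkgType")
cat("OS type:             ", .Platform$OS.type, "\n")
cat("Default package type:", default_type, "\n")

# The default can be overridden per call (not run here; needs network access):
# install.packages("zoo", type = "source")
```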

Related

R Package Need compilation [duplicate]

This question already has answers here:
R: apt-get install r-cran-foo vs. install.packages("foo")
(2 answers)
Closed 7 years ago.
In Debian, there are some compiled R packages in the official repositories. But one could also install an R package from source.
I am interested to know why a user would prefer one method of installation to another.
It's sometimes preferable to 'compile' the sources on your server rather than just using an existing executable file.
This is because the compiler builds the executable specifically for your machine, so it may run faster and work better; for instance, the compiler knows which processor you have and can optimise for it.
I already provided a somewhat detailed answer in response to this SO question.
As an update, these days you even have lots of packages prebuilt thanks to updated cran2deb initiatives:
On Ubuntu you now have almost all CRAN packages prebuilt via Michael Rutter's 'cran2deb for ubuntu' ppa on Launchpad.
For Debian, Don Armstrong now provides a similar service (also covering BioConductor and OmegaHat) at debian-r.debian.net.
The idea of pre-compiled R packages for Debian/Ubuntu is borrowed from Windows and MacOS. Those OSes have pre-compiled packages since they typically don't have the standard tools for building packages from source (C and Fortran compilers, LaTeX, Perl, etc.) in standard locations.
If there is a new release of a package on CRAN, is the pre-compiled package in the Debian repos automatically updated? I believe you are better off syncing with CRAN. Check out the package ctv to help you manage large collections of R packages ("CRAN views"), both for installing and updating.
You need root privileges to install a pre-compiled package from the OS repos, while any regular user may install any package using install.packages() in R (but, if you are the sysadmin, I recommend running sudo R to install CRAN views, so as to make them available system-wide instead of inflating your ~/).
One inconvenience of source packages is that if you fetch many, compiling adds to the installation time (depending on your machine). You might gain in performance from compiling, but it is not guaranteed to be noticeable.
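The point about root privileges can be checked from within R. This is a small sketch using base functions only: it lists the library directories R searches and whether the current user may write to each. Which directories appear depends on the installation.

```r
# List every library directory R searches and whether the current user can write to it.
# System-wide libraries are usually writable only by root; a per-user library is not.
libs <- .libPaths()
writable <- file.access(libs, mode = 2) == 0  # file.access() returns 0 on success
print(data.frame(library = libs, writable = writable))
```

install.packages() installs into the first entry of .libPaths() by default, so a regular user typically ends up with a personal library under the home directory.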

Can homebrew R and "standard" R for MacOS from CRAN coexist?

I am running R 3.6.1 on a Mac Mini running Sierra and a MacBook Pro running El Capitan. I normally get all the R packages that I need from CRAN or GitHub and use them without issues, but I am trying to install and use an R package (NicheMapR) that requires a Fortran compiler, and this is giving me issues. Even after installing gfortran, the R package still does not work (the Fortran code seems to be compiled but the package installation fails). The package developer suggested that installing R via homebrew might solve the problem. On the contrary, my hunch is that it would lead to a world of pain, to quote Walter from The Big Lebowski. My questions are:
What is the advantage of a homebrew version of R for MacOSX over the "regular" version installed from CRAN?
Can the two versions coexist?
Is the homebrew version going to affect the regular one?
Finally: is homebrew going to help or will it simply open a whole new can of worms?
Many thanks in advance.
Yes, installing from homebrew is a recipe for pain. It's specifically recommended against by the official CRAN binary maintainer; see his remarks from March 2016 on r-sig-mac.
Regarding your questions, this can be summarized as:
What is the advantage of a homebrew version of R for MacOSX over the "regular" version installed from CRAN?
Positives: Select your own BLAS and easily work with geospatial tools.
Downsides: Always needing to compile each R package.
Can the two versions coexist?
Yes. The homebrew version installs into a different directory, but watch out for library collisions (see next question). You will also have to deal with the symbolic links that determine which version of R is accessible from the console, and you will need to look into using RSwitch to switch between R versions.
Is the homebrew version going to affect the regular one?
Yes, if the library paths overlap. There will be problems regarding package installation and loading. Make sure to set up different library paths. To do so, please look at the .libPaths() documentation.
Finally: is homebrew going to help or will it simply open a whole new can of worms?
Yes and no. Unless you know what you're doing, opt for the CRAN version of R and its assorted goodies.
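As a rough sketch of keeping library paths apart, one installation can be pointed at its own directory with .libPaths(). The directory name below is hypothetical, and a temporary directory stands in for a real, permanent location.

```r
# Hypothetical library directory reserved for one R installation;
# a temporary directory stands in for a real, permanent path.
separate_lib <- file.path(tempdir(), "homebrew-r-library")
dir.create(separate_lib, recursive = TRUE, showWarnings = FALSE)

# Put it first on the search path: install.packages() installs into
# the first entry of .libPaths() by default.
.libPaths(c(separate_lib, .libPaths()))
print(.libPaths())
```

This only lasts for the current session. The R_LIBS_USER environment variable is one way to make a per-installation library persistent.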

Determine if R package is available on Linux

I am currently running R on mac osx but am looking to purchase a linux server for more power. Is there any way that I can check for specific R packages whether they will also work on linux? (before, of course, I actually buy the server and try to install and run the given packages). Also, is there any way to determine if a given package would run on certain linux distributions but not others (e.g. Ubuntu vs. Debian)?
Assuming the package is on CRAN, go to the package's CRAN page, e.g. https://cran.r-project.org/package=zoo and then click on the link to the right of CRAN checks which in this example would be labelled zoo results. It would take you to this page: https://cran.r-project.org/web/checks/check_results_zoo.html showing the results of checking that package on various different platforms.
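If several packages need checking, the check-results address can be built from the package name. The helper below is a hypothetical convenience function that simply follows the URL pattern shown above for zoo.

```r
# Hypothetical helper: build the CRAN check-results URL for a package name,
# following the pattern of the zoo example above.
cran_check_url <- function(pkg) {
  sprintf("https://cran.r-project.org/web/checks/check_results_%s.html", pkg)
}

print(cran_check_url("zoo"))
```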
If the package is not on CRAN but is on github and the developer checks it with Travis-CI then you can view the check by clicking on the Travis-CI icon. For example, the klmr modules package is not on CRAN (there is a CRAN package of the same name but it's different); however, if you look at its github home page at https://github.com/klmr/modules and click on the icon which currently is black and green and reads build passing (but could read something else if there are changes to the package or R that breaks tests) then you will be taken to the Travis-CI tests at https://travis-ci.org/klmr/modules .
tl;dr Slightly opinion/personal experience based, but I would be surprised if there were any CRAN package that you couldn't get running on Linux.
In general Unix users tend to install packages from source: CRAN doesn't provide binaries, but source installation is usually painless. The package binaries that are available (the CRAN Linux page has links for Debian, Ubuntu, SUSE, and Red Hat) tend to focus on packages that have extra system-level dependencies (e.g. FFT libraries, or spatial data analysis libraries) where it's more of a nuisance to assemble the needed dependencies for a particular system.
From the CRAN repository policy:
Package authors should make all reasonable efforts to provide cross-platform portable code. Packages will not normally be accepted that do not run on at least two of the major R platforms [i.e. Windows, MacOS, Linux]. Cases for Windows-only packages will be considered, but CRAN may not be the most appropriate place to host them.
When a package fails to run on one of the three platforms, it's usually Windows. The only package I've ever had real trouble installing on Linux is R2OpenBUGS on 64-bit systems, because it requires installing a 32-bit toolchain.
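A package's DESCRIPTION file is another place to look for platform hints: the OS_type, NeedsCompilation and SystemRequirements fields record an OS restriction, whether compilation is needed, and system-level dependencies. A minimal sketch, assuming a made-up DESCRIPTION file (the package examplepkg and its field values are invented for illustration):

```r
# A made-up DESCRIPTION file, written to a temporary location for illustration.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "OS_type: unix",
  "NeedsCompilation: yes",
  "SystemRequirements: GDAL (>= 2.0.1)"
), desc_file)

# Read only the fields that hint at platform restrictions and system dependencies.
fields <- c("Package", "OS_type", "NeedsCompilation", "SystemRequirements")
info <- read.dcf(desc_file, fields = fields)
print(t(info))
```

For a package that is already installed, utils::packageDescription() reads the same fields.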

Compiling R packages for a memory-profiling configuration

Say I have two R installations. Same version but one built for Ubuntu Linux (locally) with memory profiling and the other without. Do I need to compile the installed packages for each separately?
Short answer is 'Nope' as packages are unaffected by this optional feature in the R engine.
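To tell the two installations apart, base R reports whether the running build has memory profiling enabled. A short sketch:

```r
# TRUE if this R build was configured with memory profiling, FALSE otherwise.
has_profmem <- capabilities("profmem")
print(has_profmem)
```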
If you have particular questions concerning R use on Debian and Ubuntu, come to the r-sig-debian list.

How to install and manage many versions of R packages

I am developing a framework for reproducible computing with R. One problem that I am struggling with is that some R code might run perfectly in version X.Y-Z of a package, but then when you try to reproduce it 3 years later, the packages have been updated, some functions have changed, and the code doesn't run anymore. This problem also affects, for example, Sweave documents that use packages.
The only way to confidently reproduce the results is by installing the R version and version of the packages that were used by the original author. If this was a single case, one could pull stuff from the CRAN archives and install appropriate versions. But for my framework this is impractical, and I need to have the package versions preinstalled.
Assume for now that I restrict myself to a single version of R, e.g. 2.14. What would be a practical way to install many versions of R packages, so that I can load them on the fly? I suppose I can do something like creating separate library directories for every version of every package and then using custom lib.loc arguments while loading them. This is going to be messy though. Any tips or previous attempts to do something similar?
My framework runs on Ubuntu server.
You could install packages with versions (e.g. rename to foo_1.0 directory instead of foo) and softlink the versions you want to re-create a given R + packages snapshot into one library. Obviously, the packages could actually live in a separate tree, so you could have library.projectX/foo -> library.all/foo/1.0.
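The layout just described can be sketched in R as follows, assuming a Unix-like system where symbolic links are available. The package foo, its versions, and all directory names are placeholders, and stub DESCRIPTION files stand in for real installed packages.

```r
# Build the directory layout under a temporary root.
root <- tempfile("libs")
all_versions <- file.path(root, "library.all", "foo")
project_lib  <- file.path(root, "library.projectX")
dir.create(project_lib, recursive = TRUE)

# Stand-ins for two installed versions of a hypothetical package "foo".
for (v in c("1.0", "2.0")) {
  dir.create(file.path(all_versions, v), recursive = TRUE)
  writeLines(c("Package: foo", paste("Version:", v)),
             file.path(all_versions, v, "DESCRIPTION"))
}

# library.projectX/foo -> library.all/foo/1.0
file.symlink(file.path(all_versions, "1.0"), file.path(project_lib, "foo"))

# Reading through the link shows which version this project library exposes.
linked_version <- read.dcf(file.path(project_lib, "foo", "DESCRIPTION"),
                           fields = "Version")[1, 1]
print(linked_version)
```

With real installed packages in place of the stubs, a project would then load from its own snapshot with something like library(foo, lib.loc = project_lib).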
The operating system gives you even more handles for complete separation, and the Debian / Ubuntu stack has a ton of those available. Two I have played with are
chroot environments: We use this to completely separate build environments from host machines. For example, all Debian uploads I produced are built in an i386 pbuilder chroot hosted on my amd64 Ubuntu server. Chroot is a very powerful Unix system call. Chroots, and particularly the pbuilder system built on top of it (for Debian package building), are meant to operate headless.
Virtual machines: This gives you full generality. My not-so-powerful box easily handles three virtual machines: Debian i386, Ubuntu i386 as well as Windoze XP. For this, I currently use KVM along with libvirt; this is Linux specific. I have also used VirtualBox and VMware in the past.
I would try to modify the DESCRIPTION file, and change the field "Package" there by adding the version number.
For example, you download the package source from its CRAN page (http://cran.r-project.org/web/packages/pls/). Unpack the compressed file (pls_2.3-0.zip) to a directory ("pls/"). The next steps are to change the package name in DESCRIPTION ("pls/DESCRIPTION") and to install it with the R command 'R CMD INSTALL pls/', where 'pls/' is the path to the package source with the modified DESCRIPTION file.
Playing with R library paths seems a dangerous thing to me.
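The DESCRIPTION edit suggested above could look roughly like this. The sketch works on a stub DESCRIPTION in a temporary directory rather than a real unpacked source tree. The naming scheme (appending the version with dots) is an assumption, chosen because package names may contain only letters, digits and dots.

```r
# Stand-in for an unpacked source package: a directory with a stub DESCRIPTION.
src_dir <- file.path(tempdir(), "pls")
dir.create(src_dir, showWarnings = FALSE)
desc_path <- file.path(src_dir, "DESCRIPTION")
writeLines(c("Package: pls",
             "Version: 2.3-0",
             "Title: Stand-in for an unpacked source package"), desc_path)

# Append the version to the Package field, with dots instead of the hyphen
# (assumed naming scheme; underscores and hyphens are not valid in package names).
desc <- read.dcf(desc_path)
version_suffix <- gsub("-", ".", desc[1, "Version"], fixed = TRUE)
desc[1, "Package"] <- paste0(desc[1, "Package"], ".", version_suffix)
write.dcf(desc, desc_path)

print(read.dcf(desc_path)[1, "Package"])
```

The source directory would then be installed with R CMD INSTALL as described. Packages with compiled code, or code that refers to the package's own name, may need further changes beyond the DESCRIPTION file.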
