renv: install R packages without help pages & Co

With R's install.packages(), it is possible to exclude certain parts of a package that are only helpful during development - such as help pages and example data - in order to get lightweight runtime environments (e.g. smaller Docker images):
install.packages('data.table',
INSTALL_opts = c('--no-docs', '--no-help', '--no-html', '--no-data'))
Now I am wondering whether such a feature also exists for renv (or whether I was just too blind to find it on the web).

From https://rstudio.github.io/renv/reference/install.html#package-configuration:
Similarly, additional flags that should be passed to R CMD INSTALL can
be set via the install.opts R option:
# installation of R packages using the Windows Subsystem for Linux
# may require the `--no-lock` flag to be set during install
options(install.opts = "--no-lock")
renv::install("xml2")
You can also use the package name in the option, e.g.
options(install.opts = list(xml2 = <...>))
if you want package-specific installation options, for example to exclude vignettes only for certain packages.
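Putting this together with the flags from the question, a minimal sketch (data.table is just an example package, and it is assumed the flags can be supplied as a single string per package):
# pass the same INSTALL_opts-style flags from the question through renv,
# here only for data.table (assumption: flags given as one space-separated string)
options(install.opts = list(
  data.table = "--no-docs --no-help --no-html --no-data"
))
renv::install("data.table")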

Related

automatically create personal library in R

When you try to install a package in R and you don't have access rights to the default library path, R will ask you:
Would you like to use a personal library instead?
Would you like to create a personal library '~/path' to install
packages into?
However, if you are running an Rscript, those messages will not show up and installation will fail. I could predefine a specific path and instruct install.packages to use it, but I don't want to create an additional library path that would be specific to this Rscript. I just want to use the default personal library. Is there a way to force creation of a personal library without requiring interaction?
You can use Sys.getenv("R_LIBS_USER") to get the local library search location.
This is what I ended up doing, which seems to be working (the hardest part was testing the solution, since the problem only occurs the first time you try to install a package):
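# ('p' below is assumed to be a character vector of package names, e.g. "data.table")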
# create local user library path (not present by default)
dir.create(path = Sys.getenv("R_LIBS_USER"), showWarnings = FALSE, recursive = TRUE)
# install to local user library path
install.packages(p, lib = Sys.getenv("R_LIBS_USER"), repos = "https://cran.rstudio.com/")
# Bioconductor version (works for both Bioconductor and CRAN packages)
BiocManager::install(p, update = FALSE, lib = Sys.getenv("R_LIBS_USER"))
As @hrbrmstr pointed out in the comments, it may not be a good idea to force-install packages, so use at your own risk.

Virtual environment in R?

I've found several posts about best practice, reproducibility and workflow in R, for example:
How to increase longer term reproducibility of research (particularly using R and Sweave)
Complete substantive examples of reproducible research using R
One of the major preoccupations is ensuring portability of code, in the sense that moving it to a new machine (possibly running a different OS) is relatively straightforward and gives the same results.
Coming from a Python background, I'm used to the concept of a virtual environment. When coupled with a simple list of required packages, this goes some way to ensuring that the installed packages and libraries are available on any machine without too much fuss. Sure, it's no guarantee - different OSes have their own foibles and peculiarities - but it gets you 95% of the way there.
Does such a thing exist within R? Even if it's not as sophisticated. For example simply maintaining a plain text list of required packages and a script that will install any that are missing?
I'm about to start using R in earnest for the first time, probably in conjunction with Sweave, and would ideally like to start in the best way possible! Thanks for your thoughts.
I'm going to use the comment posted by @cboettig in order to resolve this question.
Packrat
Packrat is a dependency management system for R. It gives you three important advantages (all of them focused on your portability needs):
Isolated : Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library.
Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
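A minimal sketch of the basic Packrat workflow (the package name is just a placeholder):
packrat::init()              # give this project its own private package library
install.packages("dplyr")    # placeholder package; installs into the project library
packrat::snapshot()          # record the exact versions in packrat/packrat.lock
packrat::restore()           # on another machine, reinstall exactly those versions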
What's next?
Walkthrough guide: http://rstudio.github.io/packrat/walkthrough.html
Most common commands: http://rstudio.github.io/packrat/commands.html
Using Packrat with RStudio: http://rstudio.github.io/packrat/rstudio.html
Limitations and caveats: http://rstudio.github.io/packrat/limitations.html
Update: Packrat has been soft-deprecated and is now superseded by renv, so you might want to check this package instead.
The Anaconda package manager conda supports creating R environments.
conda create -n r-environment r-essentials r-base
conda activate r-environment
I have had a great experience using conda to maintain different Python installations, both user-specific and several versions for the same user. I have tested R with conda and the jupyter-notebook and it works great. At least for my needs, which include RNA-sequencing analyses using the DESeq2 and related packages, as well as data.table and dplyr. There are many Bioconductor packages available in conda via bioconda, and according to the comments on this SO question, it seems like install.packages() might work as well.
It looks like there is another option from RStudio devs, renv. It's available on CRAN and supersedes Packrat.
In short, you use renv::init() to initialize your project library, and use renv::snapshot() / renv::restore() to save and load the state of your library.
I prefer this option to conda r-environments because here everything is stored in the file renv.lock, which can be committed to a Git repo and distributed to the team.
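As a rough sketch of that workflow (the package name is a placeholder):
renv::init()                  # set up a project-local library and an initial renv.lock
install.packages("ggplot2")   # placeholder package; goes into the project library
renv::snapshot()              # write the exact package versions to renv.lock
renv::restore()               # e.g. on a teammate's machine: reinstall those exact versions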
To add to this:
Notes:
1. Anaconda must already be installed.
2. It is assumed your working directory is "C:".
To create the desired environment -> "r_environment_name"
C:\>conda create -n "r_environment_name" r-essentials r-base
To see available environments
C:\>conda info --envs
.
..
...
To activate environment
C:\>conda activate "r_environment_name"
(r_environment_name) C:\>
Launch Jupyter Notebook and let the party begin
(r_environment_name) C:\> jupyter notebook
For a similar "requirements.txt", perhaps this link will help -> Is there something like requirements.txt for R?
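A hand-rolled variant of that idea, sketched here with an arbitrary file name (requirements.txt holds one package name per line):
# read a plain text list of required packages and install whatever is missing
pkgs    <- readLines("requirements.txt")                  # hypothetical file, one package name per line
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing)) install.packages(missing)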
Check out roveR, the R container management solution. For details, see https://www.slideshare.net/DavidKunFF/ownr-technical-introduction, in particular slide 12.
To install roveR, execute the following command in R:
install.packages("rover", repos = c("https://lair.functionalfinances.com/repos/shared", "https://lair.functionalfinances.com/repos/cran"))
To make full use of the power of roveR (including installing specific versions of packages for reproducibility), you will need access to a laiR. For CRAN, you can use our laiR instance at https://lair.ownr.io; for uploading your own packages and sharing them with your organization, you will need a laiR license. You can contact us at the email address in the presentation linked above.

Install an R package temporarily, only for the current session

Sometimes on Stack Overflow there's a question about a package which is not installed on my system, and which I don't plan to reuse later.
If I install the package with install.packages(), it will be put into one of my R libraries, where it will take up some storage space and be updated each time I run update.packages().
Is there a way to install a package only for the current R session?
You can install a package temporarily with the following function:
tmp.install.packages <- function(pack, dependencies=TRUE, ...) {
path <- tempdir()
## Add 'path' to .libPaths, and be sure that it is not
## at the first position, otherwise any other package during
## this session would be installed into 'path'
firstpath <- .libPaths()[1]
.libPaths(c(firstpath, path))
install.packages(pack, dependencies=dependencies, lib=path, ...)
}
You can then use it simply like this:
tmp.install.packages("pkgname")
The package is installed in a temporary directory, and its files should be deleted at the next system restart (at least on Linux systems).
Another solution for this problem is devmode from devtools. Devmode allows you to install packages to a dev repository so your other packages are untouched if you install development versions. For example:
library(devtools)
devmode()
install_github('ggplot2', 'hadley')
devmode()
You'll notice that your version has not changed.
pacman deals with package management issues like this:
library(pacman)
Now you can use:
p_load("pkgname") #installs or loads package if already installed
#at end of session:
p_delete("pkgname") #deletes package from lib
This is a quick way to install in your directory and then delete it at the end (not really a temporary install)
As an addition to Tyler's answer a p_temp function was recently added to the pacman package which does exactly what the question asks for.
library(pacman)
p_temp(pkgname) # or p_temp("pkgname") either work...
This will install the package and any dependencies temporarily.
Disclosure: Tyler and I are co-authors of the pacman package...
The following is somewhere in between juba's and sebastian-c's approaches, and is as simple as this:
.libPaths("/my/path")
Now and until the end of the current session,
you can install packages as you normally would, and they will end up in
/my/path.
Also package dependencies will go to /my/path.
If you want to have control over dependencies, you can specify them manually with:
install.packages(c("pack", "dep1", "dep2", ...), dependencies = FALSE)
This approach might be useful in two particular scenarios:
A so-to-say discovery session. You want to discover new packages and install them casually to see if something interesting pops up. Then you point .libPaths() at an OS-provided tempdir, to avoid messing up your R setup, and the OS will take care of the cleanup (a sketch follows after the next scenario).
Creating reproducible environments, which are nowadays common.
You install a base R, then add .libPaths("my/project/dir"). By looking at this directory, you get a clear picture of your project's package requirements. Further, you can copy this folder to another PC to reproduce the same environment. Much like Python's pipenv, you can have multiple isolated environments: for each session, you call .libPaths() with the corresponding project directory.
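For the first, "discovery session" scenario, a minimal sketch (the directory name is arbitrary):
tmp_lib <- file.path(tempdir(), "discovery-lib")   # throwaway library inside the OS tempdir
dir.create(tmp_lib, showWarnings = FALSE)
.libPaths(tmp_lib)                                 # new packages now go to tmp_lib
install.packages("somepackage")                    # placeholder name; gone when the OS cleans tempdir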

How can I ensure a consistent R environment among different users on the same server?

I am writing a protocol for a reproducible analysis using an in-house package "MyPKG". Each user will supply their own input files; other than the inputs, the analyses should be run under the same conditions. (e.g. so that we can infer that different results are due to different input files).
MyPKG is under development, so library(MyPKG) will load whichever was the last version that the user compiled in their local library. It will also load any dependencies found in their local libraries.
But I want everyone to use a specific version (MyPKG_3.14) for this analysis while still allowing development of more recent versions. If I understand correctly, "R --vanilla" will load the same dependencies for everyone.
Once we are done, we will save the working environment as a VM to maintain a stable reproducible environment. So a temporary (6 month) solution will suffice.
I have come up with two potential solutions, but am not sure if either is sufficient.
ask the server admin to install MyPKG_3.14 into the default R path and then provide the following code in the protocol:
R --vanilla
library(MyPKG)
....
or
compile MyPKG_3.14 in a specific library, e.g. lib.loc = "/home/share/lib/R/MyPKG_3.14", and then provide
R --vanilla
library(MyPKG)
Are both of these approaches sufficient to ensure that everyone is running the same version?
Is one preferable to the other?
Are there other unforeseen issues that may arise?
Is there a preferred option for standardising the multiple analyses?
Should I include a test of the output of sessionInfo()?
Would it be better to create a single account on the server for everyone to use?
Couple of points:
Use system-wide installations of packages, e.g. the Debian / Ubuntu binary for R (incl. the CRAN ports) will try to use /usr/local/lib/R/site-library (which users can install into as well, if they are added to the group owning the directory). That way everybody gets the same version.
Use system-wide configuration, e.g. prefer $R_HOME/etc/ over the dotfiles below ~/. For the same reason, the Debian / Ubuntu package offers softlinks in /etc/R/
Use R's facilities to query its packages (e.g. installed.packages()) to report packages and versions.
Use, where available, OS-level facilities to query OS release and version. This, however, is less standardized.
Regarding the last point my box at home says
edd@max:~$ lsb_release -a | tail -4
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.1 LTS
Release:        12.04
Codename:       precise
edd@max:~$
which is a start.
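For the third point, a small sketch of such a report (the column selection is just an example):
ip <- installed.packages()
ip[, c("Package", "Version", "Built")]             # one row per installed package
# restricted to the system-wide site library mentioned above:
installed.packages(lib.loc = "/usr/local/lib/R/site-library")[, c("Package", "Version")]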

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION file is all you get, for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires encoding the Depends in a portable manner across operating systems---good luck encoding even something simple such as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example, which comes in two packages (each providing its own dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The new R package renv replaced packrat.
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-specific library and package dependencies. Note that, while this method uses the package DESCRIPTION file to define dependencies, it needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
You could use the following workflow:
1) create a script file which contains everything you want to set up, and store it in your project directory as e.g. projectInit.R
2) source this script from your .Rprofile (or any other file executed by R at startup) with a try statement
try(source("./projectInit.R"), silent=TRUE)
This will guarantee that even when no projectInit.R is found, R starts without an error message.
3) if you start R in your project directory, the projectInit.R file will be sourced if present in the directory and you are ready to go
This is from a Linux perspective, but it should work in the same way under Windows and macOS as well.
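For illustration, a projectInit.R along those lines could look as follows (the library path and the package list are placeholders):
# projectInit.R -- sourced from .Rprofile when R is started in the project directory
proj_lib <- file.path(getwd(), ".Rlibs")        # project-specific library, as in the question above
dir.create(proj_lib, showWarnings = FALSE)
.libPaths(c(proj_lib, .libPaths()))             # search the project library first

required <- c("data.table", "ggplot2")          # placeholder list of required packages
missing  <- setdiff(required, rownames(installed.packages(lib.loc = proj_lib)))
if (length(missing)) install.packages(missing, lib = proj_lib)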
