Install R packages using conda via an environment.yml file - r

Normally I create conda environments like...
conda env create -f environment.yml
conda activate env_name
Normally I work in Python, where a simple, typical environment.yml file might look like this...
name: env_name
dependencies:
  - python=3.7
  - pip=19.3
  - pandas=0.24.2
  - pip:
    - scipy==1.2.1
What should the environment.yml file look like to install R packages? The packages are on CRAN.

A general rule of thumb is that most R packages have corresponding packages in Anaconda Cloud with the prefix r- added. While the defaults channel covers commonly-used packages, the conda-forge channel has the most thorough coverage of CRAN and has helpful scripts for adding new ones. I would generally recommend prioritizing conda-forge when creating R environments.
For bioinformaticians, all Bioconductor packages are available through the bioconda channel, with a bioconductor- prefix and lowercase. For example, SingleCellExperiment is packaged as bioconductor-singlecellexperiment.
A good place to start is simply searching Anaconda Cloud (example search).
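The naming rule of thumb above can be written down as a tiny helper. This is only a sketch of the convention (the function name conda_name and its source argument are made up for illustration); it does not check that the package actually exists on any channel, so a search on Anaconda Cloud is still the final word.

```python
def conda_name(package: str, source: str = "cran") -> str:
    """Guess the Conda package name for an R package.

    Rule of thumb only: CRAN packages get an "r-" prefix, Bioconductor
    packages get a "bioconductor-" prefix, and the name is lowercased.
    """
    prefixes = {"cran": "r-", "bioconductor": "bioconductor-"}
    if source not in prefixes:
        raise ValueError(f"unknown source: {source!r}")
    return prefixes[source] + package.lower()


if __name__ == "__main__":
    print(conda_name("tidyverse"))
    print(conda_name("SingleCellExperiment", source="bioconductor"))
```
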
Example
Let's assume you want the tidyverse umbrella package and wish to use R v4.1. A YAML for this would be
name: my_r_env
channels:
  - conda-forge
dependencies:
  - r-base=4.1
  - r-tidyverse
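If you need such a file for several projects, it can be generated. The following is a minimal sketch using only the Python standard library; render_environment is a hypothetical helper, and it assumes every listed CRAN package is available on conda-forge under the r- prefix, which should be verified before use.

```python
from typing import Iterable, Sequence


def render_environment(
    name: str,
    r_version: str,
    cran_packages: Iterable[str],
    channels: Sequence[str] = ("conda-forge",),
) -> str:
    """Return the text of an environment.yml for an R environment."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines.append(f"  - r-base={r_version}")
    # Assumption: CRAN package "foo" is packaged as "r-foo" (lowercase).
    lines += [f"  - r-{pkg.lower()}" for pkg in cran_packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the result to environment.yml and then run:
    #   conda env create -f environment.yml
    print(render_environment("my_r_env", "4.1", ["tidyverse"]), end="")
```
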
Additional Notes
Avoid using install.packages() from within any R session. It is prone to dynamic linking issues because the R instance is unaware that it is compiling inside the environment. This is not an issue for pure R packages, but in that case it should be simple to add the package to conda-forge (takes about 15 mins of work and a ~12-24hr turnaround, IME).
Avoid the RStudio packages from Conda - it is an abandoned project and the old versions are incompatible with newer R versions. This may change once RStudio switches from Qt to Electron. Still, there are better ways to load an environment into RStudio, without having to install the full IDE inside the environment.

Related

Set up conda environment for R package not on CRAN, installs to wrong location

My goal is to use this package (https://github.com/tiagodc/TreeLS) but it was deprecated from CRAN (https://cran.r-project.org/web/packages/TreeLS/index.html). It requires an older version of R yet its dependencies such as the raster package require R 3.5 or up. I considered two approaches.
1. Using RStudio and changing the global options to point to an older version of R. However, I frequently use many geospatial packages, and since this package has older dependencies, I didn't want to install older versions of packages I use all the time.
2. Creating a virtual environment in Miniconda3 dedicated to this package. I chose this option because it would be self-contained.
Here is the workflow so far.
conda search -c r r
conda create -n newR351 -c conda-forge r-base=3.5.1 -y
conda install -c r rtools -y
This successfully creates a conda environment called newR351 and installs Rtools into that environment's folder within Miniconda3.
Location of conda environment with R 3.5.1 install
C:\Users\me\Miniconda3\envs\newR351
When I try to install devtools so that I can install TreeLS remotely from GitHub, I get a warning with zero exit status. The devtools package installs, but it is installed to my AppData folder and not to my Miniconda environment.
conda install -c r devtools -y
The downloaded source packages are in
'C:\Users\me\AppData\Local\Temp\RtmpYByvp8\downloaded_packages'
How can I access devtools in my conda environment newR351? Do I need to build a CRAN skeleton? When I start R in this environment and try to load the devtools library, I get this.
(newR351) C:\Users\me>R
R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
...
>library(devtools)
Error in library(devtools) : there is no package called 'devtools'
What are best practices for creating an environment specific for an older R package? Anyone else use TreeLS?
First, devtools isn't showing up because R packages in Conda repositories are conventionally prefixed with "r-", so running conda install r-devtools should do the trick. However, I don't think Conda is the best strategy here.
Below R version 3.6, the Conda coverage of R packages is rather poor. Also, installing non-Conda packages that require compilation into a Conda R environment is a pain and, in my experience, generally doesn't work out of the box. Plus, not only does TreeLS require compilation, but it also has dependencies that are not Conda packages and that themselves require compilation. I would avoid this.
Option 1 is feasible. R allows multiple installations, and with manipulating environment variables (I think RSTUDIO_WHICH_R, R_LIBS are the pertinent ones) one can switch between them.
However, were this my situation, I'd spin up a Docker container, probably rocker/rstudio:3.5, and use that for this project. Since the underlying image is Linux, it'll take a while to compile, but you can version it at that point and then always have it available to spin up. This avoids having to muck around with any system settings, and installation should be mostly straightforward.

Update R 4.0.5 to R 4.1.1 with conda on ubuntu 18.04 [duplicate]

On Ubuntu in a Conda environment with Python 3.7.3, when I run
conda install -c conda-forge opencv
I get OpenCV 3.4.2 (checked with import cv2 and then cv2.__version__) even though https://anaconda.org/conda-forge/opencv indicates version 4.1.1. Why?
Note that I didn't have OpenCV installed previously (I ran conda uninstall opencv and it got completely removed)
tl;dr You likely have previously installed dependencies that need updating. If you require a specific version, say 4.1, then express this to Conda:
conda install -c conda-forge opencv=4.1
Explanation
How Conda Interprets Specifications
A literal translation of the command
conda install -c conda-forge opencv
would go something like
With the conda-forge channel included, ensure that some version of the package opencv is installed in the currently active environment.
The logic here implies that any version it can install would be a valid solution. It also doesn't tell it that it must come from Conda Forge, only that that channel should be included.[1]
Two-Stage Solve Strategy
Starting with v4.7, Conda uses a two-stage dependency solving strategy. The two stages are:
1. Solve with an implicit --freeze-installed|--no-update-deps flag. This attempts to find the newest version of the requested package that has no conflicts with installed packages. That is, it considers any installation of the package, no matter the version, to be a satisfactory solution. If it works, then it's done. Otherwise, move on to...
2. An unrestricted solve (what used to be default in Conda < 4.7). This frees up dependencies to be updated and will often result in the latest versions being installed unless there are previous explicit specifications on those packages.[2]
This strategy aims to provide a faster solve and install experience, by avoiding having to change anything in your environment. It also helps keep the environment stable by avoiding unnecessary version changes.
Specific Failure in Question
What happened in OP's case? One of the dependency requirements of OpenCV was likely newer in v4.1.1 than what was already installed, but the installed version of that dependency was compatible with OpenCV 3.4.2. Hence, the only thing that would change was adding opencv plus any missing dependencies. Technically, this is a valid solution, since one only asked for some version of opencv to be installed.
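To make this concrete, here is a toy model of the two-stage strategy. It is not Conda's solver: the dependency name libdep, its versions, and the minimum-version requirements are all invented for illustration, and real solving involves far more than one dependency with a lower bound.

```python
from typing import Dict, List, Optional, Tuple

Plan = Tuple[str, Dict[str, str]]


def parse(version: str) -> Tuple[int, ...]:
    """Turn '4.1.1' into (4, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def solve(
    candidates: Dict[str, Dict[str, str]],
    available: Dict[str, List[str]],
    installed: Dict[str, str],
    freeze: bool,
) -> Optional[Plan]:
    """Pick the newest candidate whose minimum dependency versions can be met.

    With freeze=True, already-installed dependencies may not change.
    """
    for version in sorted(candidates, key=parse, reverse=True):
        plan: Dict[str, str] = {}
        for dep, minimum in candidates[version].items():
            if freeze and dep in installed:
                choices = [installed[dep]]
            else:
                choices = available[dep]
            usable = [c for c in choices if parse(c) >= parse(minimum)]
            if not usable:
                break  # this candidate cannot be satisfied
            plan[dep] = max(usable, key=parse)
        else:
            return version, plan
    return None


def two_stage_install(candidates, available, installed) -> Optional[Plan]:
    """Stage 1: frozen solve. Stage 2: unrestricted solve, only if needed."""
    result = solve(candidates, available, installed, freeze=True)
    if result is None:
        result = solve(candidates, available, installed, freeze=False)
    return result


if __name__ == "__main__":
    # Hypothetical index: opencv 4.1.1 needs a newer libdep than 3.4.2 does.
    opencv = {"3.4.2": {"libdep": "1.0"}, "4.1.1": {"libdep": "2.0"}}
    available = {"libdep": ["1.0", "2.0"]}
    installed = {"libdep": "1.0"}
    print(two_stage_install(opencv, available, installed))
    # Asking for 4.1.1 only makes the frozen solve fail, so stage 2 runs.
    pinned = {"4.1.1": opencv["4.1.1"]}
    print(two_stage_install(pinned, available, installed))
```

In this toy setup the unpinned request settles on the older version because it fits the frozen environment, while the pinned request forces the second stage and updates the dependency.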
Getting the Latest Version
Option: Specifying the Version
If you know you want a specific version then you can always specify it
conda install -c conda-forge opencv=4.1.1
and since Conda can't install this without updating something in your env, the first round of solve will fail, and the full solve will get it for you.
Option: Skip the Freeze
Of course, you may not always know what the latest version number is and don't want to have to look this up on Anaconda Cloud every time. Fortunately, there is the --update-deps flag that essentially skips over the first solve stage and goes straight to the full solve. This will install the latest version for your system, as well as update any of the dependencies.
conda install --update-deps -c conda-forge opencv
Important Note: The --update-deps flag has a side-effect of converting dependencies to explicit specifications. While this is an internal environment state (managed through <env>/conda-meta/history), it does have some behavioral consequences (bugs!):
the result of the conda env export --from-history command will subsequently include all packages, instead of just the ones the user explicitly requested in the past
conda remove will not be able to prune dependencies; e.g., if scipy was installed, it would pull in numpy; if only scipy depended on numpy and scipy was removed, normally numpy would also get removed. This wouldn't work after using the --update-deps flag.
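For scripted setups, the two options differ only in the arguments handed to conda. The helper below is a sketch that builds the argument list and nothing more (conda_install_args is a made-up name); it uses only the flags and spec syntax that appear in the commands above and does not run conda itself.

```python
from typing import List, Optional


def conda_install_args(
    package: str,
    version: Optional[str] = None,
    channel: Optional[str] = None,
    update_deps: bool = False,
) -> List[str]:
    """Build the argument list for a `conda install` call."""
    args = ["conda", "install"]
    if update_deps:
        # Skips the frozen first stage; see the side effects noted above.
        args.append("--update-deps")
    if channel:
        args += ["-c", channel]
    # A pinned spec such as opencv=4.1.1 versus a bare package name.
    args.append(f"{package}={version}" if version else package)
    return args


if __name__ == "__main__":
    print(" ".join(conda_install_args("opencv", version="4.1.1", channel="conda-forge")))
    print(" ".join(conda_install_args("opencv", channel="conda-forge", update_deps=True)))
```

The resulting list can be passed to subprocess.run if you want to execute it.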
[1]: The behavior here depends on the channel_priority configuration option. With the strict setting, conda-forge would be prioritized over other channels; with the flexible setting, it is simply added to the list and the latest compatible version from any channel is selected.
[2]: One can check the explicit specifications of an environment with conda env export --from-history.

Creating an R environment using anaconda

I want to create a new R environment using Anaconda. Following this page 'https://docs.anaconda.com/anaconda/user-guide/tasks/using-r-language/' I created an environment called eeEnv_r using this command in Anaconda Prompt: conda create -n r_env r-essentials r-base. However, the environment that was created looks like a Python environment. When I type conda list, I see some R packages such as dplyr in addition to some Python packages. There is also a python.exe file in the environment folder. Any idea why this is?
I am trying to use VS Code to run R, since I am familiar with the IDE (I use it for Python work). I also want to create environments for R and then use those environments in VS Code (similar to Python).
r-base depends on glib``notebook, which depends on python, therefore installing the latest r-base package will always pull in a python interpreter as well.
I would look at this from a different angle: with conda there is no such thing as a Python environment or an R environment; environments can mix packages from various languages. For most use cases this should not be a problem.

How to force `conda` to install the latest version of `jupyter`?

This question is motivated by `jupyter notebook` gives error: `"Could not open static file ''"` on macOS
After conda update jupyter, jupyter --version gives jupyter-notebook : 6.0.0
However on https://github.com/jupyter/notebook, clicking Branch: master -> tags I see a 6.0.1 tag.
How can I upgrade to 6.0.1?
> conda install jupyter=6.0.1
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
- jupyter=6.0.1
Current channels:
- https://repo.anaconda.com/pkgs/main/osx-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/r/osx-64
- https://repo.anaconda.com/pkgs/r/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
I can't see any candidates on https://anaconda.org
Is this a dead-end?
First, note that the actual package you want to upgrade/install is notebook, not jupyter. The Anaconda channel hasn't released that version of notebook yet. Conda Forge has it, so you can get it with
conda install -c conda-forge notebook
However, just be aware that compatibility between Conda Forge and Anaconda package builds is not guaranteed. Best practice is to create a new env that prioritizes Conda Forge from the start:
conda create -n my_jupyter_env -c conda-forge jupyter
Generally it isn't a good idea to mess with base env, and if you want something other than a default Anaconda install, I recommend starting with Miniconda and leaving base alone (other than the occasional conda upgrade conda).

R package build failing on Unix machines due to missing GSL - GNU Scientific Library

I am facing a particularly vexing problem with R package development. My own package, called ggstatsplot (https://github.com/IndrajeetPatil/ggstatsplot), depends on userfriendlyscience, which depends on another package called MBESS, which itself ultimately depends on another package called gsl. There is no problem at all for installation of ggstatsplot on a Windows machine (as assessed by AppVeyor continuous integration platform: https://ci.appveyor.com/project/IndrajeetPatil/ggstatsplot).
But whenever the package is to be installed on Unix machines, it throws the error that ggstatsplot can't be downloaded because userfriendlyscience and MBESS can't be downloaded because gsl can't be downloaded. The same thing is also revealed on Travis continuous integration platform with virtual Unix machines, where the package build fails (https://travis-ci.org/IndrajeetPatil/ggstatsplot).
Now one way to solve this problem for the user on the Unix machine is to configure GSL (as described here: installing R gsl package on Mac), but I can't possibly expect every user of ggstatsplot to go through the arduous process of configuring GSL. I want them to just run install.packages("ggstatsplot") and be done with it.
So I would really appreciate if anyone can offer me any helpful advice as to how I can make my package user's life simpler by removing this problem at its source. Is there something I should include in the package itself that will take care of this on behalf of the user?
This may not have a satisfying solution via changes to your R package (I'm not sure either way). If the gsl package authors (which include a former R Core member) didn't configure it to avoid a pre-req installation of a linux package, there's probably a good reason not to.
But it may be some consolation that most R+Linux users understand that some R packages first require installing the underlying Linux libraries (eg, through apt or dnf/yum).
Primary Issue: making it easy for the users to install
Try to be super clear on the GitHub readme and the CRAN INSTALL file. The gsl package has decent CRAN directions. This leads to the following bash code:
sudo apt-get install libgsl0-dev
The best example of clear (linux pre-req package) documentation I've seen is from the curl and sf packages. sf's CRAN page lists only the human names of the 3 libraries, but the GitHub page provides the exact bash commands for three major distribution branches. The curl package does this very well too (eg, CRAN and GitHub). For example, it provides the following explanation and bash code:
Installation from source on Linux requires libcurl. On Debian or Ubuntu use libcurl4-openssl-dev:
sudo apt-get install -y libcurl4-openssl-dev
Ideally your documentation would describe how to install the gsl Linux package on multiple distributions.
Disclaimer: I've never developed a package that directly requires a Linux package, but I use them a lot. In case more examples would help, this doc includes a script I use to install stuff on new Ubuntu machines. Some commands were stated explicitly in the package documentation; some had little or no documentation, and required research.
edit 2018-04-07:
I encountered my new favorite example: the sys package uses a config file to produce the following message in the R console. While installing 100+ packages on a new computer, it was nice to see this direct message, and not have to track down the R package and the documentation about its dependencies.
On Debian/Ubuntu this package requires AppArmor.
Please run: sudo apt-get install libapparmor-dev
Another good one is pdftools, that also uses a config file (and is also developed by Jeroen Ooms).
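The idea behind those messages is simple: check for the system library before building, and print the exact install command if it is missing. An R package would normally do this in its configure script, so the Python below only illustrates the logic. The Debian/Ubuntu command is the one quoted above; the Fedora and macOS commands are my assumptions about the usual package names, and the check relies on the gsl-config helper that ships with GSL development files.

```python
import shutil
from typing import Callable, Optional

# Install commands per platform family. Only the first entry comes from the
# text above; the other two are assumptions and should be verified.
GSL_INSTALL_HINTS = {
    "Debian/Ubuntu": "sudo apt-get install libgsl0-dev",
    "Fedora": "sudo dnf install gsl-devel",  # assumption
    "macOS (Homebrew)": "brew install gsl",  # assumption
}


def missing_gsl_message(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return an install hint if gsl-config is not on PATH, else None."""
    if which("gsl-config") is not None:
        return None
    lines = ["This package requires the GNU Scientific Library (GSL)."]
    for platform, command in GSL_INSTALL_HINTS.items():
        lines.append(f"  On {platform}, please run: {command}")
    return "\n".join(lines)


if __name__ == "__main__":
    message = missing_gsl_message()
    print(message if message else "GSL found.")
```
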
Secondary Issue: installing on Travis
The userfriendly travis config file apparently installs a lot of binaries directly (including gsl), unlike the current ggstatsplot version.
Alternatively, I'm more familiar with telling travis to install the linux package, as demonstrated by curl's config file. As a bonus, this probably more closely replicates what typical users do on their own machines.
addons:
  apt:
    packages:
      - libcurl4-openssl-dev
Follow up 2018-03-13: Indrajeet and I tweaked the Travis file so it's working. Two sections were changed in the YAML file:
The libgsl0-dev entry was added under the packages section (similar to the libcurl4-openssl-dev entry above).
Packages were listed in the r_binary_packages section so they install as binaries. The build was timing out after 50 minutes, and now it's under 10 min. In this particular package, the r_binary_packages section was nested in the Linux part of the Travis matrix so it wouldn't interfere with his two OS X jobs on Travis.

Resources