I want to test how well a group of people can collectively maintain a set of R scripts. I am trying to work in an RStudio project combined with cloud software (e.g. Dropbox) and version control (e.g. Git), so that we keep a record of every update from every maintainer. I am therefore trying out the newly released R package renv.
On macOS, my newly installed packages are in the first directory listed below:
.libPaths()
## [1] "/Library/Frameworks/R.framework/Versions/library"
## [2] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library"
However, when I start renv with renv::init(), only the basic packages are available. How can I move these installed packages directly into the global cache without having to reinstall them?
You can simply call renv::install() (or renv::restore()), and renv will find packages that are already installed in the cache. This works because all projects using renv share the global package cache, so each project library is made up of symlinks into that cache.
If the renv global package cache and the project library are on different disk volumes, renv copies the package from the cache into the project library instead of using a symlink.
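The cache-plus-symlink layout can be illustrated with a few lines of base R. This is a toy sketch, not renv's code. The directory names and the helper `link_or_copy()` are hypothetical. The `same_volume` flag stands in for renv's volume check, which this sketch does not perform. renv's real cache also uses a deeper, versioned directory layout.

```r
# Hypothetical helper: expose a cached package in a project library.
# It prefers a symlink and falls back to a full copy when the caller says
# the cache is on another volume, or when creating the symlink fails.
link_or_copy <- function(cached_pkg, project_lib, same_volume = TRUE) {
  dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)
  target <- file.path(project_lib, basename(cached_pkg))
  # Only attempt a symlink when cache and project share a volume
  linked <- same_volume && suppressWarnings(file.symlink(cached_pkg, target))
  if (!linked) {
    # recursive = TRUE copies the whole package directory into project_lib
    file.copy(cached_pkg, project_lib, recursive = TRUE)
  }
  if (linked) "symlink" else "copy"
}

# A toy "global cache" holding one fake package directory
root  <- tempfile("renv-demo-")
cache <- file.path(root, "cache")
dir.create(file.path(cache, "mypkg"), recursive = TRUE)
writeLines("Package: mypkg", file.path(cache, "mypkg", "DESCRIPTION"))

# Project A shares the cached copy; project B gets its own copy
how_a <- link_or_copy(file.path(cache, "mypkg"), file.path(root, "projectA", "lib"))
how_b <- link_or_copy(file.path(cache, "mypkg"), file.path(root, "projectB", "lib"),
                      same_volume = FALSE)
print(c(projectA = how_a, projectB = how_b))
```

With a symlink, both projects point at a single directory on disk. That sharing is why a project library can look empty once it is copied somewhere the cache does not exist.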
On macOS, the default location of the renv global package cache is ~/Library/Application Support/renv.
All of this information comes from the renv vignette: https://cran.r-project.org/web/packages/renv/vignettes/renv.html.
I hope it helps you. Good luck!
Related
I am getting started collaborating with team members on R projects using renv. While I can (mostly) get it to work, I am a bit confused about whether and where to install renv itself. According to the documented workflow, I need renv installed before I start a new project with renv.
However, when I do not have renv installed and clone a repo that uses renv, renv seems to install (bootstrap?) itself, but it does so within the project's local renv environment.
I have a couple of questions regarding this:
Do you recommend having renv installed "outside" the renv virtual environment?
How do you deal with differences between the version of renv installed on my machine and the version used in a repo that I clone and want to replicate? I ran into problems with this: I could not replicate the environment of a cloned repo that used a different renv version.
On a more conceptual level: why is renv itself part of the virtual environment it creates? That is not the case for the Python virtual environment managers I know.
Do you recommend having renv installed "outside" the renv virtual environment?
We do. In fact, this is necessary to initialize an renv project in the first place, since that is done by calling renv::init(). The regular renv initialization workflow therefore expects renv to be installed in the user library.
How do you deal with differences between the version of renv installed on my machine and the version used in a repo that I clone and want to replicate? I ran into problems with this: I could not replicate the environment of a cloned repo that used a different renv version.
Since renv is just an R package, you can install or upgrade (or downgrade) the version of renv used within a project as required, without affecting other projects. For example, installing the latest version from CRAN can be done with a plain install.packages("renv").
When working within an renv project, the copy of renv installed in that project is normally the one that is used. At that point, it should no longer matter which version of renv is installed in the user library.
On a more conceptual level: why is renv itself part of the virtual environment it creates? That is not the case for the Python virtual environment managers I know.
This is done primarily to ensure existing renv projects can continue to function even if an update to renv happened to break some existing workflows. (We endeavor to make sure that will never happen, but want to make sure users have an escape hatch in case it does.)
However, when I do not have renv installed and clone a repo that uses renv, renv seems to install (bootstrap?) itself, but it does so within the project's local renv environment.
The "bootstrap" behavior here is done to help streamline the collaborative workflow. Rather than requiring users explicitly install renv before opening an renv project, renv knows enough to bootstrap itself in an existing project so that new users can get up and running quickly. (In addition, the bootstrapper script also tries to ensure that the version of renv that project was configured to use is installed.)
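The decision the bootstrapper makes can be pictured as a simple version check. This is an illustrative sketch, not renv's bootstrap script. `needs_bootstrap()` and `installed_renv_version()` are hypothetical helpers. `wanted` stands for whatever renv version the project was configured to use.

```r
# Hypothetical helper: must the project's renv be (re)installed?
# `installed` is the version found (NA if renv is absent);
# `wanted` is the version the project was configured to use.
needs_bootstrap <- function(installed, wanted) {
  if (is.na(installed)) return(TRUE)                      # renv missing entirely
  package_version(installed) != package_version(wanted)   # wrong version present
}

# Hypothetical helper: the renv version visible on the given library paths, or NA
installed_renv_version <- function(lib = .libPaths()) {
  path <- find.package("renv", lib.loc = lib, quiet = TRUE)
  if (length(path) == 0) return(NA_character_)
  as.character(packageVersion("renv", lib.loc = dirname(path[1])))
}

print(needs_bootstrap(NA, "0.15.5"))        # nothing installed
print(needs_bootstrap("0.15.5", "0.15.5"))  # already matches
print(needs_bootstrap("1.0.0", "0.15.5"))   # a different version
print(installed_renv_version())
```

Because renv is an ordinary package, the fix for a mismatch inside a project is itself a package install, for example renv::install("renv@0.15.5") run from within that project.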
I'm using the 'renv' R package in an RStudio project to control/lock the package versions used by my script. The libraries sit in the project directory under ... renv\library\R-4.1\x86_64-w64-mingw32. I'm using R 4.1.3 and renv 0.15.5. When this directory is copied to a colleague's machine (using a memory stick), the libraries in the directory mentioned above are empty. I'm assuming these libraries are just pointers to where R saves packages (e.g. "C:/Program Files/R/R-4.1.3/library") and that my colleague doesn't have these packages on their machine.
Is there a way to include the packages themselves when sharing the RStudio Project directory?
By default, packages within the renv project directory are symlinked from a global cache location. If you want to ensure packages are instead stored locally in the project library, you can use renv::isolate().
See https://rstudio.github.io/renv/reference/isolate.html for more details.
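The sketch below shows what "isolating" a library means in practice. It finds package entries that are symlinks and replaces each one with a real copy of its target. This is a base-R illustration of the idea under that assumption, not renv's implementation. `symlinked_packages()` and `materialize()` are hypothetical helpers; for a real project, use renv::isolate().

```r
# Hypothetical helper: which entries in a library directory are symlinks?
symlinked_packages <- function(lib) {
  entries <- list.files(lib, full.names = TRUE)
  entries[nzchar(Sys.readlink(entries))]   # "" means "not a symlink"
}

# Hypothetical helper: replace every symlinked package with a real copy
materialize <- function(lib) {
  for (link in symlinked_packages(lib)) {
    target <- normalizePath(link)   # resolve the link to the cached directory
    file.remove(link)               # removes only the link; the cache is untouched
    file.copy(target, lib, recursive = TRUE)
    # file.copy() names the copy after basename(target); rename if they differ
    copied <- file.path(lib, basename(target))
    if (copied != link) file.rename(copied, link)
  }
  invisible(lib)
}

# Demo: a project library whose only package is a link into a "cache"
root      <- tempfile("isolate-demo-")
cache_pkg <- file.path(root, "cache", "mypkg")
lib       <- file.path(root, "project", "renv", "library")
dir.create(cache_pkg, recursive = TRUE)
dir.create(lib, recursive = TRUE)
writeLines("Package: mypkg", file.path(cache_pkg, "DESCRIPTION"))
file.symlink(cache_pkg, file.path(lib, "mypkg"))

print(symlinked_packages(lib))   # the link into the cache
materialize(lib)
print(symlinked_packages(lib))   # character(0): now a real directory
```

After this step, copying the project directory also copies the package files, which is what the question needs for the memory-stick transfer.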
We have an Ubuntu Linux server in our office in an air-gapped environment, with no internet access to external networks.
However, I would like to install a few R packages such as ggplot2, DatabaseConnector, dplyr, tidyverse, etc. I have 10 to 15 or more packages to download.
Since I cannot run the usual command install.packages("DatabaseConnector"), I have to download the zipped files from CRAN.
I am new to R, so can you help me with the questions below?
a) Why are there no files for Linux systems? I only see Windows binaries and macOS binaries. Which one should I download?
b) Should I download binaries or the package source? Which one is easier to install?
c) When I download packages from CRAN as zipped files like this, will their dependencies be downloaded automatically as well? Or do I have to read the error messages and keep downloading them one by one?
d) Since I work in an air-gapped environment, what is the most efficient way to do this?
Under Linux, CRAN packages are always installed from source; there are no official binary packages for Linux. However, your distro might offer some of them in its official repositories. Ubuntu does, but these tend to be quite old versions and are usually limited to a handful of the most important packages. So on Linux you have to download the source packages (.tar.gz). The zip files are for Windows and will not work.
You will also need to download all of the dependencies of the packages. For something like tidyverse this will be a huge number, and tracking them by hand is a lot of work. The easiest approach is probably to use a package like miniCRAN on a machine outside your air-gapped system to build a selective copy of CRAN. You specify the packages you want, and miniCRAN downloads them together with all their dependencies. You can then copy the downloaded directories to your server, point install.packages() at them, and install as usual. For details see https://andrie.github.io/miniCRAN/articles/miniCRAN-introduction.html.
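Dependencies are tedious to track by hand because they must be resolved recursively. Base R's tools::package_dependencies() does that resolution against a package database. The first part below runs it on a tiny made-up database so it works offline; the names pkgA to pkgD are hypothetical. `mirror_for_offline()` is a hypothetical base-R alternative to miniCRAN, not miniCRAN's API. It assumes it is run on a machine with internet access, and it is defined but not called here.

```r
# A tiny, made-up package database shaped like available.packages() output
db <- cbind(
  Package   = c("pkgA", "pkgB", "pkgC", "pkgD"),
  Depends   = NA,
  Imports   = c("pkgB, pkgC", "pkgD", NA, NA),
  LinkingTo = NA
)
rownames(db) <- db[, "Package"]

# Recursive "strong" dependencies (Depends, Imports, LinkingTo) of pkgA:
# pkgD is found only because pkgB imports it
deps <- tools::package_dependencies("pkgA", db = db, recursive = TRUE)
print(sort(deps$pkgA))

# Hypothetical helper (run on a machine WITH internet access): download source
# tarballs of `pkgs` plus all recursive strong dependencies into a local
# CRAN-like repository that can be copied to the air-gapped server.
mirror_for_offline <- function(pkgs, repo_dir,
                               repos = "https://cloud.r-project.org") {
  contrib <- file.path(repo_dir, "src", "contrib")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  avail  <- utils::available.packages(repos = repos, type = "source")
  deps   <- tools::package_dependencies(pkgs, db = avail, recursive = TRUE)
  # Base packages (stats, utils, ...) ship with R and are not on CRAN
  base   <- rownames(utils::installed.packages(priority = "base"))
  to_get <- setdiff(unique(c(pkgs, unlist(deps))), base)
  utils::download.packages(to_get, destdir = contrib, available = avail,
                           repos = repos, type = "source")
  tools::write_PACKAGES(contrib, type = "source")  # index for install.packages()
  invisible(to_get)
}
```

On the server, the copied directory then acts as a repository. For example, install.packages(c("ggplot2", "dplyr"), repos = "file:///path/to/repo_dir", type = "source"), where the path is a placeholder.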
You might also run into the problem that your system lacks some of the system libraries needed to build the packages. Under Ubuntu, for example, you need to install libxml2-dev to be able to install the XML package. For that you need to use Ubuntu's package manager; how to do that on an air-gapped system is another issue.
I have observed that an R installation on Windows automatically creates two library paths.
.libPaths()
# [1] "C:/Users/User/Documents/R/win-library/3.4"
# [2] "C:/Program Files/R/R-3.4.0/library"
What is each of these used for when installing new packages, and which library is used? I frequently find that installed packages have gone missing and need to be installed again. How do you maintain these two paths and manage the libraries when using R or RStudio on Windows?
Installing into C:/Program Files/R/... makes a package available to all users of the computer.
It is the R default, but installing a package there from within R (using install.packages()) requires that R be started with administrator privileges.
Installing into C:/Users/Username/... makes the package available to the present user only, but does not require administrative rights.
R tracks these paths automatically, and looks in both directories when it is asked to load a package with require() or library(). No user input should be required.
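R searches the library paths in order. A package can therefore exist in both libraries, and a copy installed only under a version-specific path can seem to vanish after an upgrade. To check where a package actually lives, use installed.packages(), whose LibPath column records the library. `where_installed()` is a hypothetical wrapper around it.

```r
# Hypothetical helper: every library (in .libPaths() order) that contains
# `pkg`, with its installed version. The first row is the copy that
# library(pkg) loads when called with the default library paths.
where_installed <- function(pkg, lib = .libPaths()) {
  ip   <- utils::installed.packages(lib.loc = lib)
  hits <- ip[ip[, "Package"] == pkg, c("LibPath", "Version"), drop = FALSE]
  rownames(hits) <- NULL
  hits
}

print(.libPaths())               # on Windows, typically the user library first
print(where_installed("stats"))  # base packages live in the system library
print(find.package("stats"))     # the directory library(stats) actually uses

# To install into a specific library explicitly (needs internet), e.g.:
# install.packages("somepkg", lib = .libPaths()[1])
```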
When you update R, the version number changes, so R no longer looks in the folders whose paths contain the previous version number. Some R updaters (e.g. installr) offer to copy packages from the "old" paths to the "new" ones. Manually re-installing packages, however, ensures that you get the latest version of each package and that you don't waste disk space and update time on packages you no longer use.
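The manual route can be partly scripted: read the package list from the old version-specific library and reinstall whatever the current libraries lack. This is a sketch that assumes you know the old path; the one in the comment is a hypothetical example. `missing_after_upgrade()` and `reinstall_missing()` are hypothetical helpers.

```r
# Packages present in the old library but absent from every current library path
missing_after_upgrade <- function(old_lib, new_libs = .libPaths()) {
  old <- rownames(utils::installed.packages(lib.loc = old_lib))
  new <- rownames(utils::installed.packages(lib.loc = new_libs))
  setdiff(old, new)
}

# Reinstall them (current CRAN versions) into the first library on the path
reinstall_missing <- function(old_lib) {
  pkgs <- missing_after_upgrade(old_lib)
  if (length(pkgs) > 0) utils::install.packages(pkgs, lib = .libPaths()[1])
  invisible(pkgs)
}

# Example call with a hypothetical path from the previous R version:
# reinstall_missing("C:/Users/User/Documents/R/win-library/3.4")
```

This installs the latest versions rather than copying old binaries, which matches the advice above.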
I am developing a framework for reproducible computing with R. One problem I am struggling with is that some R code might run perfectly with version X.Y-Z of a package. But when you try to reproduce it 3 years later, the packages have been updated, some functions have changed, and the code no longer runs. This problem also affects, for example, Sweave documents that use packages.
The only way to confidently reproduce the results is to install the R version and the package versions that the original author used. For a single case, one could pull packages from the CRAN archives and install the appropriate versions. But for my framework this is impractical, and I need to have the package versions preinstalled.
Assume for now that I restrict myself to a single version of R, e.g. 2.14. What would be a practical way to install many versions of R packages, so that I can load them on the fly? I suppose I can do something like creating separate library directories for every version of every package and then using custom lib.loc arguments while loading them. This is going to be messy though. Any tips or previous attempts to do something similar?
My framework runs on Ubuntu server.
You could install packages with their versions in the directory name (e.g. foo_1.0 instead of foo). Then symlink the versions you want into one library to re-create a given R + packages snapshot. Of course, the packages could live in a separate tree, so you could have library.projectX/foo -> library.all/foo/1.0.
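The versioned-tree idea can be sketched with file.symlink(). The directory names library.all and library.projectX follow the answer above. The package directories are placeholders holding only a DESCRIPTION file, and `make_snapshot()` is a hypothetical helper. A real tree would hold fully installed packages under each version directory.

```r
# Hypothetical helper: build a project library in which each package name is a
# symlink to one chosen version in a shared tree laid out as
#   library.all/<package>/<version>/   (the installed package's contents)
# Assumes project_lib does not already contain links with the same names.
make_snapshot <- function(all_lib, project_lib, versions) {
  dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)
  for (pkg in names(versions)) {
    src  <- normalizePath(file.path(all_lib, pkg, versions[[pkg]]), mustWork = TRUE)
    link <- file.path(project_lib, pkg)
    if (!file.symlink(src, link)) stop("could not link ", pkg)
  }
  invisible(project_lib)
}

# Demo tree with two placeholder versions of a package "foo"
root    <- tempfile("versions-demo-")
all_lib <- file.path(root, "library.all")
for (v in c("1.0", "2.0")) {
  d <- file.path(all_lib, "foo", v)
  dir.create(d, recursive = TRUE)
  writeLines(c("Package: foo", paste("Version:", v)), file.path(d, "DESCRIPTION"))
}

proj <- file.path(root, "library.projectX")
make_snapshot(all_lib, proj, versions = list(foo = "1.0"))
print(read.dcf(file.path(proj, "foo", "DESCRIPTION"))[, "Version"])

# With real installed packages, one would then load from the snapshot, e.g.:
# library(foo, lib.loc = proj)
```

Switching a project to another version only means re-pointing one link. The version tree itself is never modified.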
The operating system gives you even more handles for complete separation, and the Debian/Ubuntu stack has a ton of those available. Two I have played with are:
chroot environments: We use these to completely separate build environments from host machines. For example, all Debian uploads I produce are built in an i386 pbuilder chroot hosted on my amd64 Ubuntu server. chroot is a very powerful Unix system call. Chroots, and particularly the pbuilder system built on top of them (for Debian package building), are meant to operate headless.
Virtual machines: This gives you full generality. My not-so-powerful box easily handles three virtual machines: Debian i386, Ubuntu i386 as well as Windoze XP. For this, I currently use KVM along with libvirt; this is Linux specific. I have also used VirtualBox and VMware in the past.
I would try to modify the DESCRIPTION file and change its "Package" field by adding the version number.
For example, download the package source from the CRAN page (http://cran.r-project.org/web/packages/pls/) and unpack the compressed file (pls_2.3-0.tar.gz) into a directory ("pls/"). Then change the package name in DESCRIPTION ("pls/DESCRIPTION") and install with the command R CMD INSTALL pls/, where pls/ is the path to the package source with the modified DESCRIPTION file.
Playing with R library paths seems dangerous to me.
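The DESCRIPTION edit in that recipe can be scripted with read.dcf() and write.dcf(). R package names may contain only letters, digits and dots, so a name like pls_2.3-0 would be rejected. The sketch therefore converts the version to dot-separated form (pls2.3.0). `version_package()` is a hypothetical helper that edits only DESCRIPTION. A package whose NAMESPACE, code or compiled code refers to its own name may need further changes.

```r
# Hypothetical helper: rename an unpacked source package in place by appending
# its version to the Package field (e.g. pls + 2.3-0 -> pls2.3.0).
version_package <- function(pkg_dir) {
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  desc      <- read.dcf(desc_file)
  # Hyphens are not allowed in package names, so turn "2.3-0" into "2.3.0"
  new_name  <- paste0(desc[1, "Package"], gsub("-", ".", desc[1, "Version"]))
  desc[1, "Package"] <- new_name
  write.dcf(desc, desc_file)
  new_name
}

# Demo on a minimal, made-up DESCRIPTION standing in for the unpacked pls/ directory
pkg_dir <- file.path(tempfile("pls-demo-"), "pls")
dir.create(pkg_dir, recursive = TRUE)
write.dcf(cbind(Package = "pls", Version = "2.3-0", Title = "Demo"),
          file.path(pkg_dir, "DESCRIPTION"))
print(version_package(pkg_dir))   # "pls2.3.0"

# Then install from a shell with:  R CMD INSTALL path/to/pls/
# and load the renamed copy with library(pls2.3.0)
```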