Package management running R on a shared server

Some background: I'm a fairly new sysadmin maintaining the server for our department. The server houses several VMs, mostly Ubuntu Server 12.04, usually with a separate VM per project.
One of the tools we use is R with RStudio, also the server edition. I've set this up so everyone can access it through their browser, but I'm still wondering how best to deal with package management. Ideally, I would have one folder/library with our "common" packages, which are used across many projects and use cases. I would administer this library, since I'm the only user with sudo. My colleagues should be able to add packages on a case-by-case basis to their "personal" R folders, which get checked as a fallback in case a certain package is not available in our main folder.
My question has a few parts:
- Is this actually a viable way to set this up?
- How would I configure this?
- Is there a way to easily automate this library for use in other VM's?
I have a similar question pertaining to Python, but maybe I should make that a separate question.

R supports multiple libraries for packages by default. Libraries are basically just folders in which installed packages are placed.
You can use
.libPaths()
in R to view which paths are used as libraries on your system. On my Ubuntu 13.10 system, there are:
- a personal library at "~/R/x86_64-pc-linux-gnu-library/3.0", where packages installed by the user are placed,
- "/usr/lib/R/library", where packages installed via apt-get are placed, and
- "/usr/lib/R/site-library", a system-wide library for packages that are shared by all users.
You can add additional libraries to R, but as I understand your question, installing packages to /usr/lib/R/site-library might be what you are looking for. This can be achieved relatively easily by running R as root and calling install.packages() and update.packages() from there as usual. However, running R as root is a security risk and not a good idea, so it may be better to create a separate user with write access to /usr/lib/R/site-library and use that one instead of root.
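As a sketch, installing into the site library from that dedicated account looks like a normal install, just with the library given explicitly (the path below is the Ubuntu default and the package name is only an example):

```r
# Run as the dedicated package-admin user, not as root.
# "/usr/lib/R/site-library" is the Ubuntu default site library; adjust if needed.
install.packages("data.table", lib = "/usr/lib/R/site-library")

# Keep the shared library up to date non-interactively.
update.packages(lib.loc = "/usr/lib/R/site-library", ask = FALSE)
```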
If you mount /usr/lib/R/site-library on multiple VMs, they should also share the packages installed there. Does this answer your question?

Having common library and personal library locations is completely feasible.
Each user should have two environment variables set. R_LIBS should point to the common library, and R_LIBS_USER should point to their personal location. See ?.Library for more information.
You can check a user's library paths using .libPaths(). You probably want users to install packages to their personal library, so some fiddling may be required to make sure that the personal library is the first element of .libPaths().
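These variables can be set machine-wide in R's startup environment file rather than per shell. A sketch, assuming a common library at /usr/local/lib/R/common-library (both paths and the R version in the personal path are assumptions to adjust for your system):

```
# /etc/R/Renviron.site on Debian/Ubuntu (or each user's ~/.Renviron)
R_LIBS=/usr/local/lib/R/common-library
R_LIBS_USER=~/R/x86_64-pc-linux-gnu-library/3.0
```

R reads these files at startup, so users need to restart their R sessions after a change.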

Related

Maintain different versions of R package for open source contribution

Packrat is often recommended as the virtual environment tool for R, but it doesn't fully meet my needs when contributing to R open source. Packrat's "virtual environment" is stored directly in the project directory, requiring me to modify the .gitignore to ignore those files when I make a pull request to the open-source upstream.
In contrast, something like conda stores the virtual environment somewhere else, leaving no trace in the project codebase itself.
So how do R open-source contributors manage dependencies during package development? Ideally the solution would work well with devtools and RStudio.
There is nothing wrong with having Packrat in .gitignore.
Alternatively, you can use the .git/info/exclude file, thus avoiding touching the .gitignore at all.
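For instance, a per-clone exclude for packrat's default layout could look like this sketch (the two patterns are the ones packrat's own documentation suggests ignoring; adjust to your setup):

```
# .git/info/exclude -- works like .gitignore, but is local to this clone
# and never gets committed or pushed upstream.
packrat/lib*/
packrat/src/
```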

Big R project with several packages and developers: Best setup for easy version control based on packages

I have to restructure a big project written in R, which will eventually consist of several packages and involve several developers. Everything is set up on a git server.
The question is: How do I manage frequent changes inside packages without having to rebuild them every time, and without developers having to update them after every pull? Is there any best practice or automation for that? I don't want to source() unbuilt R files, but would like to stick with a package-like structure as much as possible. We will work in a Windows environment.
Thanks.
So I fiddled around for a while, tried different setups, and came up with an arrangement that fits my needs.
It basically consists of two git repositories. The first (let's call it the base-repo) contains most of the scripts on which all later packages are based. The second repo we will call the "package-repo".
Most development work should be done on the base-repo. The base-repo is under CI control via a build server and unit tests.
The package-repo contains folders for each package we want to build and the base-repo as a git-submodule.
Each package can now be constructed via a very simple bash/shell script ("build script") that:
- checks out the commit/tag of the base-repo submodule on which the stable package build should be based
- copies the files which are necessary for the package into the specific package folder
- checks and builds the package
The script can also create a history file for the package, and it can be invoked either manually or by a build server.
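The steps above can be sketched as a dry-run script. The package name, tag, and file paths are all hypothetical, and the run wrapper only prints the commands it would execute rather than running them:

```shell
#!/usr/bin/env bash
set -euo pipefail

PKG_DIR="mypackage"   # hypothetical package folder inside the package-repo
BASE_TAG="v1.0"       # hypothetical stable tag of the base-repo submodule

# Dry run: print each command instead of executing it.
run() { echo "+ $*"; }

run git -C base-repo checkout "$BASE_TAG"    # pin the submodule to a known state
run cp base-repo/R/shared.R "$PKG_DIR/R/"    # copy the needed base-repo sources
run R CMD check "$PKG_DIR"                   # check the package
run R CMD build "$PKG_DIR"                   # build the source tarball
```

Dropping the run wrapper turns the sketch into a real build script; a build server can then call it on every push.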
This approach can also be combined with Packrat. Additional code which is very package-specific can now also be added to the package-repo and is under version control while staying independent of the base-repo.
The approach could be further extended to trigger package builds in the package-repo based on pushes to the base-repo. Packages whose build script points at master will always be up to date, and if they are under the control of a build server, this ensures that changes to the base-repo do not break the package. It is also possible to create several packages containing the same scripts from the base-repo.
See also: git: symlink/reference to a file in an external repository

Archive area in local CRAN

My organization has a local CRAN-like repository for internal R packages. As we release more and more packages, I want to retire old versions to an Archive area so that people can get to them if they really want them, but by default they won't be installed by install.packages() and the like.
Is the best practice to just move them into src/contrib/Archive? Will that play well with the indexing routines like write_PACKAGES()? I use subdirectories for each distinct package, so somehow it needs to know not to descend into this directory.
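It may help that write_PACKAGES() does not descend into subdirectories unless asked to, and by default indexes only the latest version of each package. A sketch, assuming a hypothetical repository root of /srv/cran:

```r
library(tools)

# subdirs = FALSE (the default) means an Archive/ subdirectory is never scanned;
# latestOnly = TRUE (also the default) keeps superseded versions out of the
# PACKAGES index even if their tarballs remain in src/contrib.
write_PACKAGES("/srv/cran/src/contrib", type = "source", latestOnly = TRUE)
```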

Where to place package development directory and files?

I'm developing a package here at work from a number of functions that I've written. Once I build the package, I'll put the binary in a local package repository I've created on a company network drive, using a process described by Dirk in this question.
Question: Should my package dev folder be located in my package repository? If so, where?
I couldn't find much about this in the Creating a Package Tutorial, nor in the R Administrator Guide.
No, it does not matter where your package development directories live.
That said, I think it's a good idea to keep them separate from binary package repositories, because they are conceptually quite different, and generally you should be able to delete a repo and re-generate it from sources.

How do I setup a shared R package directory on a server?

I have a shared R package directory on a server in order to maintain consistent package versions for all users. This becomes problematic when someone attempts to install a new version of a package that another user originally installed, or when they attempt to install it while that package is loaded elsewhere. In these instances R creates a 00LOCK-PackageName directory in the shared package directory, and the permissions are such that the installer doesn't have write access to many files within it. It then takes several people chmod-ing the directory before it can be deleted, or one of our system administrators doing the same.
This is an especially acute problem since we use R packages to maintain and deploy our reporting infrastructure. It's something that we're constantly updating and deploying to our shared server.
Are there settings or programs that facilitate shared R package management? Any general tips?
One common solution is to:
- have everybody be a member of a common group, maybe rapps
- have the directory where you share the R packages be group-owned by rapps, and set the directory's setgid bit so new files inherit that group -- chmod g+rws
- have your default umask set in /etc/profile or equivalent to make sure your default creation mode is in fact 'g+w'; I have used a file /etc/profile.d/local_umask.sh for this with a single command umask u=rwx,g=rwx,o=rx
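The group and permission setup above can be sketched as a set of one-time commands run as root (the group name rapps, the user name, and the library path are assumptions):

```
# Create the shared group and add each R user to it.
groupadd rapps
usermod -aG rapps alice

# Group-own the shared library and open group write access.
chgrp -R rapps /usr/lib/R/site-library
chmod -R g+rwX /usr/lib/R/site-library

# Set the setgid bit on every directory so newly created files
# and subdirectories inherit the rapps group automatically.
find /usr/lib/R/site-library -type d -exec chmod g+s {} +
```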
We ended up having our systems administrator create a script that:
- opened permissions on all directories, subdirectories, and files within our shared package directory
- deleted any directories starting with 00LOCK that were older than 15 minutes
- ran every minute
We haven't run into any problems since.
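A minimal sketch of such a cleanup script. Here it runs against a throwaway directory with a simulated stale lock; in production LIB_DIR would point at the real shared library (e.g. /usr/lib/R/site-library, an assumption) and the script would run from cron:

```shell
#!/usr/bin/env bash
# Demo setup: a throwaway directory standing in for the shared library.
LIB_DIR="$(mktemp -d)"

# Simulate a stale lock left behind by a failed install an hour ago.
mkdir "$LIB_DIR/00LOCK-mypkg"
touch -m -d '1 hour ago' "$LIB_DIR/00LOCK-mypkg"

# Step 1: open up group permissions on everything in the library
# (g+rwX adds execute only where it makes sense: directories and executables).
chmod -R g+rwX "$LIB_DIR"

# Step 2: delete 00LOCK directories with a modification time
# more than 15 minutes in the past.
find "$LIB_DIR" -maxdepth 1 -type d -name '00LOCK*' -mmin +15 -exec rm -rf {} +
```

A crontab entry such as `* * * * * /usr/local/sbin/clean-r-locks.sh` (name assumed) would reproduce the every-minute schedule described above.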
