Maintain different versions of R package for open source contribution

Packrat is often recommended as the virtual environment tool for R, but it doesn't fully meet my needs when contributing to R open source. Packrat's "virtual environment" is stored directly in the project directory, so I have to modify the .gitignore to exclude those files whenever I make a pull request to the open source upstream.
In contrast, something like conda stores the virtual environment elsewhere, leaving no trace in the project codebase itself.
So how do R open source contributors manage dependencies during package development? Ideally the solution would work well with devtools and RStudio.

There is nothing wrong with having the Packrat directories in .gitignore.
Alternatively, you can use the .git/info/exclude file, which keeps the exclusions local to your clone and avoids touching the project's .gitignore at all.
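For example, assuming Packrat keeps its files under packrat/ in the project root (the exact path depends on your Packrat setup), a local-only exclude could look like this:

    # .git/info/exclude -- per-clone ignore rules, never committed or pushed
    packrat/

Entries here behave like .gitignore patterns but stay on your machine, so the upstream repository never sees them.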

Related

How to share R package locally without using Github/miniCRAN/other internet sites

I created a local R-package (gmad) on one computer. I want to copy this package to a server and continue developing it.
I have the "project folder" in my current working directory where I wrote the R code etc. (includes the R functions explicitly saved in separate files).
There is also a folder with the same name in the directory containing all the installed packages - which has a sub-dolder named "R" but unlike the previous folder, it does not contain separate files for different functions. I'm guessing this is the "installed" version of the package and that I won't be able to modify the R code in it.
What is the best way to copy this to the new computer. On the original computer, .libPaths() shows two locations for installed packages. I don't really care about these on this machine however I want to follow the correct procedure for the server. What is the best practice - should there be only one location and if so, what should it be?
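A minimal sketch of one common workflow, assuming devtools is installed and that gmad refers to the source "project folder" (the paths and version number below are hypothetical): build a source tarball on the original machine, copy it to the server along with the source folder, install the tarball there, and keep developing from the copied source folder rather than the installed library.

    # On the original computer: build a source tarball from the project folder
    # (assumes the working directory contains the gmad/ source tree)
    devtools::build("gmad")          # creates e.g. gmad_0.1.0.tar.gz

    # On the server, after copying the tarball (and the source folder) across:
    install.packages("gmad_0.1.0.tar.gz", repos = NULL, type = "source")

    # Continue development from the copied source folder, not the installed copy
    devtools::load_all("~/projects/gmad")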

Should I be worried about 3rd-party packages accessing my settings.json?

I just saw a package that, in order to run properly, asks you to put something in the public section of settings.json. That made me wonder whether the rest of the information in there (sometimes sensitive, like AWS keys) is accessible as well.
So, should I be worried about this, or does Meteor hide this information from packages?
Any package you install from any package manager including NPM, Ruby Gems, and the Meteor package server can run arbitrary code on your computer as your user, including using the fs module to read and write files, accessing the network to send and receive data, etc.
In fact, you place the same trust in the developer whenever you install an application from the internet - almost any application on your computer could read your settings.json file, for example Dropbox, Chrome, etc.
Therefore, there is no way to completely secure the settings.json file from package code. The only way to be sure that packages are safe is to use only community-approved packages or read the source code of the packages you are using.

Should I be (git)ignoring my knitr cache?

In a vein similar to this question: I'm writing a package and am using knitr to write a few documents in inst/doc/. Since I'm using GitHub to host my repo (and I intend to point people to that repo to get the package), I'm wondering whether I should be version controlling the caches of my various documents.
I ask this question because it's unclear where cache falls in the guidelines provided by this other question (which addresses when certain file types should and shouldn't be in the .gitignore of a repo).
Can anyone shed some light on to how package developers that use knitr and git are handling their caches?
If R CMD check passes without the knitr cache, and I think it would, I wouldn't include the cache files. In fact, I suspect R CMD check would give a NOTE about the cache files being present in the package. I know that for LaTeX files, you only want to include the .tex file in the R package and in version control. The other required files should be automatically generated on install.
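As a concrete illustration (assuming knitr's default cache and figure directories, and that the documents live under inst/doc/; adjust the paths to your layout), the ignore entries might look like this:

    # .gitignore -- keep knitr's generated cache and figures out of the repo
    inst/doc/cache/
    inst/doc/figure/

If R CMD check still complains about stray files, the same material can also be excluded from the built package via .Rbuildignore, which uses Perl-style regular expressions rather than glob patterns.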

Package management running R on a shared server

Some background: I'm a fairly new sysadmin maintaining the server for our department. The server houses several VMs, mostly Ubuntu SE 12.04, usually with a separate VM per project.
One of the tools we use is R with RStudio, also the server edition. I've set this up so everyone can access it through their browser, but I'm still wondering what the best way would be to handle package management. Ideally, I would have one folder/library with our "common" packages, which are used across many projects and use cases. I would administer this library, since I'm the only user with sudo access. My colleagues should be able to add packages on a case-by-case basis in their "personal" R folders, which get checked as a fallback in case a certain package is not available in our main folder.
My question has a few parts:
- Is this actually a viable way to set this up?
- How would I configure this?
- Is there a way to easily automate this library for use in other VM's?
I have a similar question pertaining to Python, but maybe I should make a new question for that.
R supports multiple libraries for packages by default. Libraries are basically just folders in which installed packages are placed.
You can use
.libPaths()
in R to view which paths are used as libraries on your system. On my Ubuntu 13.10 system, there are:
- a personal library at "~/R/x86_64-pc-linux-gnu-library/3.0", where packages installed by the user are placed,
- "/usr/lib/R/library", where packages installed via apt-get are placed, and
- "/usr/lib/R/site-library", which is a system-wide library for packages shared by all users.
You can add additional libraries to R, but from how I understand your question, installing packages to /usr/lib/R/site-library might be what you are looking for. This can be achieved relatively easily by running R as root and calling install.packages() and update.packages() from there as usual. However, running R as root is a security risk and not a good idea, so it may be better to create a separate user with write access to /usr/lib/R/site-library and use that account instead of root.
If you mount /usr/lib/R/site-library on multiple VMs, they should also share the packages installed there. Does this answer your question?
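A minimal sketch of that approach (the package name is just an example; run it as the dedicated library-owner user rather than root):

    # Check which libraries R knows about; /usr/lib/R/site-library should be listed
    .libPaths()

    # Install and update packages explicitly in the shared site library
    install.packages("data.table", lib = "/usr/lib/R/site-library")
    update.packages(lib.loc = "/usr/lib/R/site-library", ask = FALSE)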
Having a common library and personal library locations is completely feasible.
Each user should have two environment variables set. R_LIBS should point to the common library, and R_LIBS_USER should point to their personal location. See ?.Library for more information.
You can check a user's library paths using .libPaths(). You probably want users to install packages to their personal library, so some fiddling may be required to make sure that the personal library is the first element of .libPaths().
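A sketch of one way to wire that up (the paths are assumptions; a site-wide Renviron.site sets the shared location for everyone, while each user's ~/.Renviron points at their personal library):

    # /etc/R/Renviron.site (or $R_HOME/etc/Renviron.site) -- shared library for all users
    R_LIBS=/usr/lib/R/site-library

    # ~/.Renviron -- per-user personal library
    R_LIBS_USER=~/R/personal-library

After restarting R, .libPaths() should list both locations; if the personal library does not come first, the order can be adjusted per user, for example in a startup file such as .Rprofile.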

How do I setup a shared R package directory on a server?

I have a shared R package directory on a server in order to maintain consistent package versions for all users. This becomes problematic when someone attempts to install a new version of a package that another user originally installed, or when they attempt to install it while that package is loaded elsewhere. In these instances R creates a 00LOCK-PackageName directory in the shared package directory, and the permissions are such that the installer doesn't have write access to many files within it. This then requires several people chmod-ing the directory so it can be deleted, or having one of our system administrators do the same.
This is an especially acute problem since we use R packages to maintain and deploy our reporting infrastructure. It's something that we're constantly updating and deploying to our shared server.
Are there settings or programs that facilitate shared R package management? Any general tips?
One common solution is to:
- have everybody be a member of a common group, maybe rapps;
- have the directory where you share the R packages be group-owned by rapps, and make that group ownership "sticky" so new files inherit the group (the setgid bit, set with chmod g+s on the directory);
- have your default umask set in /etc/profile or equivalent to make sure your default creation mode is in fact 'g+w'; I have used a file /etc/profile.d/local_umask.sh for this with the single command umask u=rwx,g=rwx,o=rx.
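Put together, the setup described above might look roughly like this (the group name and directory are the examples used above, and "alice" is a hypothetical user):

    # create the shared group and add users to it
    sudo groupadd rapps
    sudo usermod -aG rapps alice

    # group-own the shared library, open group write access, and set the
    # setgid bit so new files and directories inherit the rapps group
    sudo chgrp -R rapps /usr/lib/R/site-library
    sudo chmod -R g+w /usr/lib/R/site-library
    sudo chmod g+s /usr/lib/R/site-library

    # /etc/profile.d/local_umask.sh -- make group-writable the default creation mode
    umask u=rwx,g=rwx,o=rx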
We ended up having our systems administrator create a script that:
- opened permissions on all directories, subdirectories, and files within our shared package directory,
- deleted any directories starting with 00LOCK that were older than 15 minutes, and
- ran every minute.
We haven't run into any problems since.
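A rough sketch of such a cleanup job (the library path, script location, and exact permission changes are assumptions; adapt them to your own layout):

    #!/bin/sh
    # clean_r_locks.sh -- open up group permissions and remove stale 00LOCK dirs
    LIB=/usr/lib/R/site-library
    chmod -R g+w "$LIB"
    find "$LIB" -maxdepth 1 -type d -name '00LOCK*' -mmin +15 -exec rm -rf {} +

    # crontab entry to run it every minute:
    # * * * * * /usr/local/bin/clean_r_locks.sh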
