Archive area in local CRAN - r

My organization has a local CRAN-like repository for internal R packages. As we release more and more packages, I want to retire old versions to an Archive area so that people can get to them if they really want them, but by default they won't be installed by install.packages() and the like.
Is the best practice to just move them into src/contrib/Archive? Will that play well with the indexing routines like write_PACKAGES()? I use subdirectories for each distinct package, so somehow it needs to know not to descend into this directory.
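One thing that may be worth trying: the documentation for tools::write_PACKAGES() describes its subdirs argument as either a logical or a character vector of subdirectory names that are not recursed into. Passing every per-package directory except Archive should therefore keep archived tarballs out of the index. The sketch below builds a throwaway repository to check this. The package name pkgA, its version numbers, and the make_tarball() helper are made up for illustration and are not part of any real repository.

```r
# Hypothetical helper: build a minimal source tarball <pkg>_<version>.tar.gz
# containing only a DESCRIPTION file, which is all the indexer reads.
make_tarball <- function(pkg, version, dest_dir) {
  build_root <- tempfile("build")
  dir.create(file.path(build_root, pkg), recursive = TRUE)
  writeLines(
    c(paste("Package:", pkg), paste("Version:", version),
      "Title: Placeholder Package", "License: GPL-3"),
    file.path(build_root, pkg, "DESCRIPTION")
  )
  old_wd <- setwd(build_root)
  on.exit(setwd(old_wd), add = TRUE)
  tarball <- file.path(dest_dir, sprintf("%s_%s.tar.gz", pkg, version))
  utils::tar(tarball, files = pkg, compression = "gzip")
  tarball
}

# Throwaway repository: current versions live in src/contrib/<pkg>/,
# retired versions in src/contrib/Archive/<pkg>/.
contrib <- file.path(tempdir(), "archive-sketch", "src", "contrib")
current_dir <- file.path(contrib, "pkgA")
archive_dir <- file.path(contrib, "Archive", "pkgA")
dir.create(current_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(archive_dir, recursive = TRUE, showWarnings = FALSE)

make_tarball("pkgA", "2.0.0", current_dir)  # current release
make_tarball("pkgA", "1.0.0", archive_dir)  # retired release

# Index every top-level subdirectory except Archive.
subdirs <- setdiff(
  list.dirs(contrib, full.names = FALSE, recursive = FALSE),
  "Archive"
)
n_indexed <- tools::write_PACKAGES(contrib, type = "source", subdirs = subdirs)

# Read the index back to see which versions were picked up.
index <- read.dcf(file.path(contrib, "PACKAGES"))
print(index[, c("Package", "Version"), drop = FALSE])
```

If the index lists only the current version, install.packages() pointed at this repository should not see the archived tarball, while the file itself stays reachable under Archive.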

Related

Maintain different versions of R package for open source contribution

Packrat is often recommended as the virtual environment for R, but it doesn't fully meet my need of contributing to R open source. Packrat's "virtual environment" is stored directly in the project directory, requiring me to modify the .gitignore to ignore them when I make a pull request to the open source upstream.
In contrast, something like conda stores the virtual environment somewhere else, leaving no trace in the project codebase itself.
So how do R open source contributors manage dependencies during package development? Ideally the solution would work well with devtools and RStudio.
There is nothing wrong in having Packrat in .gitignore.
You can use the .git/info/exclude file, which avoids touching .gitignore altogether.

How to include a full R distribution in my GitHub repository

I build transport models for various government agencies. My model is managed through GitHub, and it depends on R to perform certain calculations. I currently have my entire R installation folder in the repository. This can't be the right solution, but here are some of my constraints:
My clients are usually even less sophisticated programmers than I am. When they download/clone the model, it just needs to work.
This needs to be the case 10 years from now - regardless of what the current build of R and all the package dependencies are.
Placing my entire R folder in the repo solves these two problems, but creates some new ones:
The repository is much larger than it needs to be / longer download time.
If the transport model is updated to a new version (say v2.0), I'd want to update R and its packages to the latest versions. I'm afraid this would increase the size of the repo even further.
One solution I understand is submodules. I could place the full R folder in a separate repo and bring it in as a submodule. This, at the very least, cleans up the model repository.
What about zipping the R folder? Some early testing showed that git can diff the zip file, but I don't know if it is treating it as a flat file or reading the contents. Also, is GitHub going to complain about a 100MB+ zip file? I'd like to avoid Git LFS if I can, but asking my clients to unzip that file wouldn't be a problem.
I also looked at packrat, but as far as I can tell, that only works for R projects.
Lastly, I don't entirely understand makefiles / recipes, but it would be nice if there was a script I could run that would download specific versions of R and its libraries. One complicating thing is that some of the R packages are private GitHub repos.
Anyway, I'm happy to provide more info if needed. Thank you for your help!

Package management running R on a shared server

Some background: I'm a fairly new sysadmin maintaining the server for our department. The server houses several VMs, mostly Ubuntu SE 12.04, usually with a separate VM per project.
One of the tools we use is R and RStudio, also server edition. I've set this up so everyone can access it through their browser, but I'm still wondering what the best way would be to deal with package management. Ideally, I will have one folder/library with our "common" packages, which are common in many projects and use cases. I would admin this library, since I'm the only user in sudo. My colleagues should be able to add packages on a case-by-case basis in their "personal" R folders, that get checked as a backup in case a certain package is not available in our main folder.
My question has a few parts:
- Is this actually a viable way to set this up?
- How would I configure this?
- Is there a way to easily automate this library for use in other VMs?
I have a similar question pertaining to Python, but maybe I should make a new question for that.
R supports multiple libraries for packages by default. Libraries are basically just folders in which installed packages are placed.
You can use
.libPaths()
in R to view which paths are used as libraries on your system. On my Ubuntu 13.10 system, there are
a personal library at "~/R/x86_64-pc-linux-gnu-library/3.0" where packages installed by the user are placed,
"/usr/lib/R/library" where packages installed via apt-get are placed and
"/usr/lib/R/site-library" which is a system-wide library for packages that are shared by all users.
You can add additional libraries to R, but from how I understand your question, installing packages to /usr/lib/R/site-library might be what you are looking for. This can be achieved relatively easily by running R as root and calling install.packages() and update.packages() from there as usual. However, running R as root is a security risk and not a good idea, so it may be better to create a separate user with write access to /usr/lib/R/site-library and to use that one instead of root.
If you mount /usr/lib/R/site-library on multiple VMs, they should also share the packages installed there. Does this answer your question?
Having common library and personal library locations is completely feasible.
Each user should have two environment variables set. R_LIBS should point to the common library, and R_LIBS_USER should point to their personal location. See ?.Library for more information.
You can check a user's library paths using .libPaths(). You probably want users to install packages to their personal library, so some fiddling may be required to make sure that the personal library is the first element of .libPaths().
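A minimal sketch of that ordering, using two temporary directories as stand-ins for the common and personal libraries (both paths are hypothetical). It relies on .libPaths() accepting a character vector of directories and on install.packages() defaulting to the first element of .libPaths() when no lib argument is given.

```r
# Hypothetical stand-ins for the shared and per-user library directories.
common_lib   <- file.path(tempdir(), "common-library")
personal_lib <- file.path(tempdir(), "personal-library")
dir.create(common_lib, showWarnings = FALSE)
dir.create(personal_lib, showWarnings = FALSE)

# .libPaths() drops directories that do not exist, so they are created first.
# Listing the personal library first makes it the default install target.
.libPaths(c(personal_lib, common_lib))

# The first two entries are now the personal and common libraries, in that
# order; R's own site and base libraries follow.
print(.libPaths())
```

In a real deployment the same ordering would more likely come from the environment variables mentioned above than from code run in each session.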

Where to place package development directory and files?

I'm developing a package here at work from a number of functions that I've written. Once I build the package, I'll put the binary in a local package repository I've created on a company network drive, using a process described by Dirk in this question.
Question: Should my package dev folder be located in my package repository? If so, where?
I couldn't find much about this in the Creating a Package Tutorial, nor in the R Administrator Guide.
No, it does not matter where your package development directories live.
That said, I think it's a good idea to keep them separate from binary package repositories, because they are conceptually quite different, and generally you should be able to delete a repo and re-generate it from sources.

Are there any R package repository management tools?

I'm creating a custom R package repository and would like to replicate the CRAN archive structure whereby old versions of packages are stored in the src/contrib/Archive/packageName/ directory. I'd like to use the install_version function in devtools (source here), but that function depends on having a CRAN-like archive structure instead of having all package versions in src/contrib/.
Are there any R package repository management tools that facilitate the creation of this directory structure and other related tasks (e.g. updating the Archive.rds file)?
It would also be nice if the management tools handled the package type logic on the repository side so that I can use the same install.packages() or install_version() code on a Linux server as on my local Mac (i.e. I don't have to use type="both" or type="source" when installing locally on a Mac).
Short answer:
Not really for off-the-shelf use.
Long answer:
There are a couple of tools that one can use to manage their repo, but there isn't a coherent off-the-shelf ecosystem yet.
The CRAN maintainers keep a bevy of scripts here to manage the CRAN repository, but it's unclear how they all work together or which parts are needed to update the package index, run package checks, or manage the directory structure.
The tools::write_PACKAGES function can be used to update the package index, but it needs to be re-run each time a package is added, updated, or removed from the repository.
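A rough sketch of why the re-run matters, using a throwaway repository in a temporary directory. The package names pkgA and pkgB and the make_tarball() helper are invented for the example. The index file only changes when write_PACKAGES() is called again.

```r
# Hypothetical helper: build a minimal source tarball <pkg>_<version>.tar.gz
# containing only a DESCRIPTION file.
make_tarball <- function(pkg, version, dest_dir) {
  build_root <- tempfile("build")
  dir.create(file.path(build_root, pkg), recursive = TRUE)
  writeLines(
    c(paste("Package:", pkg), paste("Version:", version),
      "Title: Placeholder Package", "License: GPL-3"),
    file.path(build_root, pkg, "DESCRIPTION")
  )
  old_wd <- setwd(build_root)
  on.exit(setwd(old_wd), add = TRUE)
  tarball <- file.path(dest_dir, sprintf("%s_%s.tar.gz", pkg, version))
  utils::tar(tarball, files = pkg, compression = "gzip")
  tarball
}

contrib <- file.path(tempdir(), "index-sketch", "src", "contrib")
dir.create(contrib, recursive = TRUE, showWarnings = FALSE)

# First package and first index.
make_tarball("pkgA", "1.0.0", contrib)
n_first <- tools::write_PACKAGES(contrib, type = "source")

# Adding a tarball does not touch the index by itself.
make_tarball("pkgB", "0.1.0", contrib)
stale <- unname(read.dcf(file.path(contrib, "PACKAGES"))[, "Package"])

# Re-running the indexer picks up the new package.
n_second <- tools::write_PACKAGES(contrib, type = "source")
fresh <- unname(read.dcf(file.path(contrib, "PACKAGES"))[, "Package"])

print(stale)
print(fresh)
```

Any automation (a cron job, a file watcher, or a release script) would need to call the indexer after each change to the tarballs.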
M.eik Michalke has created the roxyPackage package, which has the ability to automatically update a given repository, install it, etc. The developer has also recently added the ability to have the archive structure mimic that of CRAN with the archive_structure function. The downside is the package isn't on CRAN and would probably be better if integrated with devtools. It's also brand new and isn't ready for wide use yet.
Finally, I created a small Ruby script that watches a given repository and updates the package index if any files change. However, this is made to work for my specific organization and will need to be refactored for external use. I can make it more general if anyone is interested in it.

Resources