Where to place package development directory and files?

I'm developing a package here at work from a number of functions that I've written. Once I build the package, I'll put the binary in a local package repository I've created on a company network drive, using a process described by Dirk in this question.
Question: Should my package dev folder be located in my package repository? If so, where?
I couldn't find much about this in the Creating a Package Tutorial, nor in the R Administrator Guide.

No, it does not matter where your package development directories live.
That said, I think it's a good idea to keep them separate from binary package repositories: the two are conceptually quite different, and in general you should be able to delete a repository and regenerate it from sources.
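For instance, one way to populate such a repository (not necessarily Dirk's exact process, though drat is his package; the file and repo paths below are placeholders) is:

    # insert a built package into the CRAN-like repo on the network drive;
    # drat::insertPackage() copies the file and refreshes the PACKAGES index
    drat::insertPackage("mypackage_0.1.0.tar.gz",
                        repodir = "//server/share/R-repo")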

Related

How to host example data for an R package on GitHub

I'm experimenting with GitHub and I created a little package for my colleagues to use. They install it directly in R with the devtools package and its install_github() function. I also have some example data and an R Markdown file that shows the usage of all the functions in the package and can be published via GitHub Pages.
I would like to know what would be the best practice to enable others to use this example data to learn the package.
I can think of two different options:
Host the data in a separate directory that is not part of the installation, and either tell people to download it manually or use something like R's download.file() function at the beginning of the example script to download all the data, which could be packed into a .zip.
Make the data part of the package installation; however, this would require the data to be fairly small, which is difficult in my particular case (the data is 10 MB).
Ideally, the examples in the R documentation (the .Rd files in the man folder) could use the same examples as the markdown file; in this case too, option (2) seems preferable.
Could anybody give me some advice on what would be the best way to go, i.e. a sort of "industry standard", if there is one?
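For concreteness, minimal sketches of the two options (all URLs, file names, and object names below are placeholders):

    # option (1): fetch and unpack the example data at the start of the
    # example script
    url <- "https://github.com/user/pkg-data/releases/download/v1.0/example-data.zip"
    zipfile <- file.path(tempdir(), "example-data.zip")
    download.file(url, zipfile, mode = "wb")
    unzip(zipfile, exdir = file.path(tempdir(), "example-data"))

    # option (2): ship the data inside the package itself, compressed as
    # tightly as possible to keep the installed size down
    # (assumes the usethis package; run from within the package project,
    # and example_data is a placeholder object)
    usethis::use_data(example_data, compress = "xz")

With option (2), the data ends up in the package's data/ directory, so both the .Rd examples and the R Markdown file can load it with data(example_data).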

Maintain different versions of an R package for open source contribution

Packrat is often recommended as the virtual-environment tool for R, but it doesn't fully meet my needs when contributing to open-source R projects. Packrat's "virtual environment" is stored directly in the project directory, requiring me to modify the .gitignore to exclude it when I make a pull request to the open-source upstream.
In contrast, something like conda stores the virtual environment somewhere else, leaving no trace in the project codebase itself.
So how do open-source R contributors manage dependencies during package development? Ideally, the solution would work well with devtools and RStudio.
There is nothing wrong with having Packrat's directories in .gitignore.
Alternatively, you can use the .git/info/exclude file, which avoids touching the .gitignore at all.
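For example, the usual Packrat entries can go into .git/info/exclude, which uses the same pattern syntax as .gitignore but is local to your clone and never committed (exactly which packrat/ subdirectories to exclude depends on whether you want to track package sources):

    # .git/info/exclude -- local to this clone, never pushed upstream
    packrat/lib*/
    packrat/src/    # optional: only if you don't want to track sources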

How to include a full R distribution in my GitHub repository

I build transport models for various government agencies. My model is managed through GitHub, and it depends on R to perform certain calculations. I currently have my entire R installation folder in the repository. This can't be the right solution, but here are some of my constraints:
My clients are usually even less sophisticated programmers than I am. When they download/clone the model, it just needs to work.
This needs to remain the case 10 years from now, regardless of what the current builds of R and all the package dependencies are at that point.
Placing my entire R folder in the repo solves these two problems, but creates some new ones:
The repository is much larger than it needs to be, which means longer download times.
If the transport model is updated to a new version (say v2.0), I'd want to update R and its packages to the latest versions. I'm afraid this would increase the size of the repo even further.
One solution I understand is submodules. I could place the full R folder in a separate repo and bring it in as a submodule. This, at the very least, cleans up the model repository.
What about zipping the R folder? Some early testing showed that git can diff the zip file, but I don't know whether it is treating it as a flat file or reading the contents. Also, is GitHub going to complain about a 100 MB+ zip file? I'd like to avoid Git LFS if I can, but asking my clients to unzip that file wouldn't be a problem.
I also looked at packrat, but as far as I can tell, that only works for R projects.
Lastly, I don't entirely understand makefiles/recipes, but it would be nice if there were a script I could run that would download specific versions of R and its libraries. One complicating factor is that some of the R packages are private GitHub repos.
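For the package side of such a script, a hedged sketch (this assumes the remotes package; the package names, versions, and private repo below are placeholders, and remotes reads authentication for private repositories from the GITHUB_PAT environment variable):

    # pin exact CRAN package versions so the model still builds years later
    remotes::install_version("dplyr", version = "0.8.5",
                             repos = "https://cran.r-project.org")

    # install a pinned tag of a private GitHub package
    # (authentication comes from the GITHUB_PAT environment variable)
    remotes::install_github("myorg/myprivatepkg@v1.2.0")

Pinning the version of R itself would still need to be handled outside R, e.g. by scripting the download of a specific installer.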
Anyway, I'm happy to provide more info if needed. Thank you for your help!

Package management when running R on a shared server

Some background: I'm a fairly new sysadmin maintaining the server for our department. The server houses several VMs, mostly Ubuntu SE 12.04, usually with a separate VM per project.
Among the tools we use are R and RStudio, the latter also in its server edition. I've set this up so everyone can access it through their browser, but I'm still wondering what the best way would be to deal with package management. Ideally, I will have one folder/library with our "common" packages, which are used in many projects and use cases. I would administer this library, since I'm the only user with sudo. My colleagues should be able to add packages on a case-by-case basis in their "personal" R library folders, which get checked as a fallback in case a certain package is not available in our main library.
My question has a few parts:
- Is this actually a viable way to set this up?
- How would I configure this?
- Is there a way to easily automate this library for use in other VM's?
I have a similar question pertaining to Python, but maybe I should make a new question for that.
R supports multiple libraries for packages by default. Libraries are basically just folders in which installed packages are placed.
You can use
.libPaths()
in R to view which paths are used as libraries on your system. On my Ubuntu 13.10 system, there are:
- a personal library at "~/R/x86_64-pc-linux-gnu-library/3.0", where packages installed by the user are placed,
- "/usr/lib/R/library", where packages installed via apt-get are placed, and
- "/usr/lib/R/site-library", which is a system-wide library for packages shared by all users.
You can add additional libraries to R, but from how I understand your question, installing packages to /usr/lib/R/site-library might be what you are looking for. This can be achieved relatively easily by running R as root and calling install.packages() and update.packages() from there as usual. However, running R as root is a security risk and not a good idea, so it may be better to create a separate user with write access to /usr/lib/R/site-library and use that user instead of root.
If you mount /usr/lib/R/site-library on multiple VMs, they should also share the packages installed there. Does this answer your question?
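In code, installing into the shared library from that dedicated account looks something like this ("data.table" is just a placeholder package name):

    # install into / update the system-wide site library instead of the
    # calling user's personal library
    install.packages("data.table", lib = "/usr/lib/R/site-library")
    update.packages(lib.loc = "/usr/lib/R/site-library", ask = FALSE)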
Having both common and personal library locations is completely feasible.
Each user should have two environment variables set. R_LIBS should point to the common library, and R_LIBS_USER should point to their personal location. See ?.Library for more information.
You can check a user's library paths using .libPaths(). You probably want users to install packages to their personal library, so some fiddling may be required to make sure that the personal library is the first element of .libPaths().
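A minimal sketch, with placeholder paths: the two variables go in each user's ~/.Renviron (or a site-wide Renviron.site), and because R places R_LIBS entries ahead of R_LIBS_USER, a line in ~/.Rprofile can promote the personal library to the front:

    # ~/.Renviron (paths are placeholders)
    R_LIBS=/usr/lib/R/site-library
    R_LIBS_USER=~/R/personal-library

    # ~/.Rprofile -- make the personal library the first element of .libPaths()
    .libPaths(c(path.expand(Sys.getenv("R_LIBS_USER")), .libPaths()))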

Are there any R package repository management tools?

I'm creating a custom R package repository and would like to replicate the CRAN archive structure, whereby old versions of packages are stored in the src/contrib/Archive/packageName/ directory. I'd like to use the install_version function in devtools (source here), but that function depends on having a CRAN-like archive structure instead of having all package versions in src/contrib/.
Are there any R package repository management tools that facilitate the creation of this directory structure and other related tasks (e.g. updating the Archive.rds file)?
It would also be nice if the management tools handled the package type logic on the repository side so that I can use the same install.packages() or install_version() code on a Linux server as on my local Mac (i.e. I don't have to use type="both" or type="source" when installing locally on a Mac).
Short answer:
Not really for off-the-shelf use.
Long answer:
There are a couple of tools that one can use to manage their repo, but there isn't a coherent off-the-shelf ecosystem yet.
The CRAN maintainers keep a bevy of scripts here to manage the CRAN repository, but it's unclear how they all work together or which parts are needed to update the package index, run package checks, or manage the directory structure.
The tools::write_PACKAGES function can be used to update the package index, but the index needs to be regenerated each time a package is added, updated, or removed from the repository.
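For reference, regenerating the index after a change amounts to something like this (the repository path is a placeholder):

    # rebuild PACKAGES, PACKAGES.gz, and PACKAGES.rds for the source
    # packages in a CRAN-like repository
    tools::write_PACKAGES("/srv/R-repo/src/contrib", type = "source")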
M.eik Michalke has created the roxyPackage package, which has the ability to automatically update a given repository, install it, etc. The developer has also recently added the ability to have the archive structure mimic that of CRAN with the archive_structure function. The downsides are that the package isn't on CRAN, and it would probably work better if it were integrated with devtools. It's also brand new and isn't ready for wide use yet.
Finally, I created a small Ruby script that watches a given repository and updates the package index if any files change. However, this is made to work for my specific organization and will need to be refactored for external use. I can make it more general if anyone is interested in it.
