I'm creating a custom R package repository and would like to replicate the CRAN archive structure, whereby old versions of packages are stored in the src/contrib/Archive/packageName/ directory. I'd like to use the install_version function in devtools (source here), but that function depends on a CRAN-like archive structure rather than having all package versions in src/contrib/.
Are there any R package repository management tools that facilitate the creation of this directory structure and other related tasks (e.g. updating the Archive.rds file)?
It would also be nice if the management tools handled the package type logic on the repository side so that I can use the same install.packages() or install_version() code on a Linux server as on my local Mac (i.e. I don't have to use type="both" or type="source" when installing locally on a Mac).
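For context, the kind of call I'd like to keep working is something like this (package name, version, and repository URL are all illustrative):

```r
# Install a specific, archived version of a package from a CRAN-like repo.
library(devtools)
install_version("ggplot2", version = "0.9.1",
                repos = "http://my.repo.example.com")
```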
Short answer:
Not really for off-the-shelf use.
Long answer:
There are a couple of tools that one can use to manage their repo, but there isn't a coherent off-the-shelf ecosystem yet.
The CRAN maintainers keep a bevy of scripts here to manage the CRAN repository, but it's unclear how they all work together or which parts are needed to update the package index, run package checks, or manage the directory structure.
The tools::write_PACKAGES function can be used to update the package index, but this needs to be updated each time a package is added, updated, or removed from the repository.
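A minimal call looks like this, assuming a source repository rooted at /srv/R/repo (the path is illustrative):

```r
# Regenerate the PACKAGES / PACKAGES.gz index files for the repository.
# This needs to be re-run after every package add, update, or removal.
tools::write_PACKAGES(dir = "/srv/R/repo/src/contrib", type = "source")
```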
M.eik Michalke has created the roxyPackage package, which has the ability to automatically update a given repository, install it, etc. The developer has also recently added the ability to have the archive structure mimic that of CRAN with the archive_structure function. The downside is the package isn't on CRAN and would probably be better if integrated with devtools. It's also brand new and isn't ready for wide use yet.
Finally, I created a small Ruby script that watches a given repository and updates the package index if any files change. However, this is made to work for my specific organization and will need to be refactored for external use. I can make it more general if anyone is interested in it.
Related
I have to restructure a big project written in R, which will later consist of several packages as well as multiple developers. Everything is set up on a git server.
The question is: how do I manage frequent changes inside packages without having to rebuild them every time, and without developers having to update them after every new pull? Is there any best practice or automation for that? I don't want to source() unbuilt R files, but would like to stick to a package-like structure as much as possible. We will work in a Windows environment.
Thanks.
So I fiddled around a while, tried different setups and came up with an arrangement which fits my needs.
It basically consists of two git repositories. The first (let's call it base-repo) contains most of the scripts on which all later packages are based. The second repo we will call the "package-repo".
Most development work should be done on the base-repo. The base-repo is under CI control via a build server and unit tests.
The package-repo contains folders for each package we want to build and the base-repo as a git-submodule.
Each package can now be constructed via a very simple bash/shell script (“build script”):
- check out the commit/tag of the base-repo submodule on which the stable package build should be based
- copy the files that are necessary for the package into the specific package folder
- check and build the package

The script can also create a history file for the package, and it can be invoked either manually or by a build server.
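A sketch of such a build script might look like this (paths, tag, and package name are all illustrative, not part of any real setup):

```shell
#!/bin/sh
# Build one package against a pinned state of base-repo.
set -e
PKG=myPkg
BASE_TAG=v1.2.0   # stable base-repo tag this package build is based on

# 1. Pin the base-repo submodule to the stable commit/tag
git -C base-repo fetch --tags
git -C base-repo checkout "$BASE_TAG"

# 2. Copy the files the package needs into the package folder
cp base-repo/R/shared-utils.R "$PKG/R/"

# 3. Check and build the package
R CMD check "$PKG"
R CMD build "$PKG"
```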
This approach can also be combined with packrat. Additional code that is very package-specific can now also be added to the package-repo and is under version control, while remaining independent of the base-repo.
The approach could be further extended to trigger builds of packages from the package-repo based on pushes to the base-repo. Packages whose build script points at master will always be up to date, and if they are under the control of a build server, it will ensure that changes to the base-repo do not break the package. It is also possible to create several packages containing the same scripts from the base-repo.
See also: git: symlink/reference to a file in an external repository
How can I modify the source/templates of packages that I am using inside my Meteor project?
For example, I download the package from Atmosphere, begin using the package and want to make some minor tweaks to the package.
What is the best approach to making changes to an installed Atmosphere package?
You should fork the package from its GitHub source and make changes to your own version. If you think those changes might benefit others, you can open a pull request and, if the original author(s) agree, your changes can then benefit others.
If you make changes in the deployed package under your project directory then those will probably be clobbered the next time you run $ meteor update
If you're still curious as to where they live, look under /.meteor
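In practice, the forking workflow is usually to clone your fork into the app's local packages/ directory, which takes precedence over the installed Atmosphere copy of the same package (the repo URL and package name below are placeholders):

```shell
# Clone your fork into the app's local packages/ directory; the local
# copy shadows the Atmosphere-installed version of the same package.
cd my-meteor-app
git clone https://github.com/you/some-package.git packages/some-package
```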
Some background: I'm a fairly new sysadmin maintaining the server for our department. The server houses several VMs, mostly Ubuntu SE 12.04, usually with a separate VM per project.
One of the tools we use is R with RStudio, also the server edition. I've set this up so everyone can access it through their browser, but I'm still wondering what the best way would be to deal with package management. Ideally, I will have one folder/library with our "common" packages, which are used across many projects. I would administer this library, since I'm the only user with sudo. My colleagues should be able to add packages on a case-by-case basis in their "personal" R folders, which get checked as a fallback in case a certain package is not available in our main folder.
My question has a few parts:
- Is this actually a viable way to set this up?
- How would I configure this?
- Is there a way to easily automate this library for use in other VM's?
I have a similar question pertaining to Python, but maybe I should make a new question for that.
R supports multiple libraries for packages by default. Libraries are basically just folders in which installed packages are placed.
You can use
.libPaths()
in R to view which paths are used as libraries on your system. On my Ubuntu 13.10 system, there are
- a personal library at "~/R/x86_64-pc-linux-gnu-library/3.0", where packages installed by the user are placed,
- "/usr/lib/R/library", where packages installed via apt-get are placed, and
- "/usr/lib/R/site-library", which is a system-wide library for packages shared by all users.
You can add additional libraries to R, but from how I understand your question, installing packages to /usr/lib/R/site-library might be what you are looking for. This can be achieved relatively easily by running R as root and calling install.packages() and update.packages() from there as usual. However, running R as root is a security risk and not a good idea, so it may be better to create a separate user with write access to /usr/lib/R/site-library and use that account instead of root.
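Run as a user with write access to that directory, the installs themselves are just the usual calls with the library specified explicitly (the package name here is only an example):

```r
# Install into the shared site-wide library rather than the personal one.
install.packages("data.table", lib = "/usr/lib/R/site-library")

# Keep the shared library up to date, non-interactively.
update.packages(lib.loc = "/usr/lib/R/site-library", ask = FALSE)
```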
If you mount /usr/lib/R/site-library on multiple VMs, they should also share the packages installed there. Does this answer your question?
Having common library and personal library locations is completely feasible.
Each user should have two environment variables set. R_LIBS should point to the common library, and R_LIBS_USER should point to their personal location. See ?.Library for more information.
You can check a user's library paths using .libPaths(). You probably want users to install packages to their personal library, so some fiddling may be required to make sure that the personal library is the first element of .libPaths().
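For example, each user's ~/.Renviron could contain something like the following (both paths are illustrative and depend on your R version and layout):

```
R_LIBS=/usr/lib/R/site-library
R_LIBS_USER=~/R/x86_64-pc-linux-gnu-library/3.0
```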
My organization has a local CRAN-like repository for internal R packages. As we release more and more packages, I want to retire old versions to an Archive area so that people can get to them if they really want them, but by default they won't be installed by install.packages() and the like.
Is the best practice to just move them into src/contrib/Archive? Will that play well with the indexing routines like write_PACKAGES()? I use subdirectories for each distinct package, so somehow it needs to know not to descend into this directory.
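Concretely, the kind of move I have in mind is the following (the repo path and package name are illustrative, and the empty tarballs stand in for real package builds). As far as I can tell, tools::write_PACKAGES() does not descend into subdirectories by default (subdirs = FALSE), so Archive/ would be ignored when the index is rebuilt, but I'd like to confirm that:

```shell
# Create a CRAN-style Archive area and retire an old version into it.
REPO=repo-demo
mkdir -p "$REPO/src/contrib/Archive/myPkg"
touch "$REPO/src/contrib/myPkg_1.0.0.tar.gz" \
      "$REPO/src/contrib/myPkg_1.1.0.tar.gz"

# Move the retired version out of the top-level contrib directory,
# leaving only the current version to be indexed.
mv "$REPO/src/contrib/myPkg_1.0.0.tar.gz" "$REPO/src/contrib/Archive/myPkg/"
ls "$REPO/src/contrib"
```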
I'm developing a package here at work from a number of functions that I've written. Once I build the package, I'll put the binary in a local package repository I've created on a company network drive, using a process described by Dirk in this question.
Question: Should my package dev folder be located in my package repository? If so, where?
I couldn't find much about this in the Creating a Package Tutorial, nor in the R Administrator Guide.
No, it does not matter where your package development directories live.
That said, I think it's a good idea to keep them separate from binary package repositories, because they are conceptually quite different, and generally you should be able to delete a repo and re-generate it from sources.