I would like to create a local R package repository such that users in my company can install packages from it and the system admins can update the local repo periodically. Access to the CRAN mirrors is currently denied.
Is there a simple way to do this?
Yes, either a copy of CRAN or a repo with local packages is easy to set up. Presumably you want this for Windows, so do this:
Create a top-level directory on your webserver, say R/
Create the usual hierarchy in there: R/bin/windows/contrib/2.11. If you need to support other (earlier) releases, simply create directories 2.10, 2.9, ... next to the 2.11 directory.
Place the packages you need into the directory (say, 2.11), then change into that directory and run the following command to generate PACKAGES and PACKAGES.gz files for the repository:
tools::write_PACKAGES(".", type="win.binary")
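The three steps above can be sketched entirely from R; the repository root is a placeholder path, and this is only a minimal sketch of the setup:

```r
# Create the repository hierarchy for Windows binaries (R 2.11 here);
# add sibling directories such as 2.10 next to it for older releases.
repo_contrib <- "R/bin/windows/contrib/2.11"
dir.create(repo_contrib, recursive = TRUE, showWarnings = FALSE)

# After copying your package .zip files into repo_contrib, build the
# PACKAGES / PACKAGES.gz index:
tools::write_PACKAGES(repo_contrib, type = "win.binary")
```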
That is all there is to it -- now you can access the repository by pointing at its address in a command such as
update.packages(repos="http://my.local.server/R", ask=FALSE)
which I even do in R/zzz.R for local packages so that they update themselves.
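That zzz.R trick can be sketched as follows; the server URL is a placeholder, and the try() wrapper (my addition) guards against the server being unreachable:

```r
# Hypothetical R/zzz.R in a local package: refresh installed packages from
# the internal repository whenever the package is attached.
.onAttach <- function(libname, pkgname) {
  try(
    utils::update.packages(repos = "http://my.local.server/R", ask = FALSE),
    silent = TRUE
  )
}
```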
Edit some five+ years later: And the drat package now automates a lot of this, and shines particularly if you also use GitHub to serve the repository over http/https (but is useful for other or local hosting too).
Read the relevant section of the R Installation and Administration manual.
The miniCRAN package also provides great functionality for this. The key advantage is that you don't need a full mirror: you can set up a "mini" mirror of CRAN containing only the package distributions you need, including their dependencies.
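A minimal sketch of that workflow; the package name, target path, and CRAN URL below are illustrative placeholders, not from the original answer:

```r
library(miniCRAN)

# Resolve the packages you need plus their full dependency tree...
pkgs <- pkgDep("data.table", repos = "https://cloud.r-project.org")

# ...and build a CRAN-like repository containing only those packages.
dir.create("/srv/miniCRAN", recursive = TRUE, showWarnings = FALSE)
makeRepo(pkgs, path = "/srv/miniCRAN",
         repos = "https://cloud.r-project.org", type = "win.binary")
```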
I also don't have access to CRAN mirrors for package installation. As a result, these are some steps I've found to be helpful.
First, you'll want to make sure you have the following path and its directories in your system: "/R/src/contrib". If you don't have this path and these directories, you'll need to create them. All of your R packages files will be stored in the "contrib" directory.
Once you've added package files to the "contrib" directory, you can use the setRepositories function from the utils package to register the repository. I'd recommend adding the following code to your .Rprofile for a local repository:
utils::setRepositories(ind = 0, addURLs = c(WORK = "file://<your higher directories>/R"))
After editing your .Rprofile, restart R.
ind = 0 will indicate that you only want the local repository. Additional repositories can be included in the addURLs = option and are comma separated within the character vector.
Next, create the repository index with the following code:
tools::write_PACKAGES("/<your higher directories>/R/src/contrib", verbose = TRUE)
This will generate the PACKAGES files that serve as the repository index.
To see what packages are in your repository, run the following code and take a look at the resulting data frame: my_packages <- available.packages()
At this point, you can install packages from the repo without referencing the entire path to the package installation file. For example, to install the dplyr package, you could run the following:
install.packages("dplyr", dependencies = TRUE)
If you want to take it a step further and manage a changing repository, you could install and use the miniCRAN package. Otherwise, you'll need to execute the write_PACKAGES function whenever your repository changes.
After installing the miniCRAN package, you can execute the following code to create the miniCRAN repo:
my_packages <- available.packages()
miniCRAN::makeRepo(
pkgs = my_packages[, 1],
path = "/<your higher directories>/R",
repos = getOption("repos"),
type = "source",
Rversion = R.version,
download = TRUE,
writePACKAGES = TRUE,
quiet = FALSE
)
You only need to execute the code above once for each repo.
Then, check to make sure each miniCRAN repo has been created:
miniCRAN::pkgAvail(
repos = getOption("repos"),
type = "source",
Rversion = R.version,
quiet = FALSE
)
Whenever new package files are placed into the local repo you can update the local repo's index as follows:
miniCRAN::updateRepoIndex("/<your higher directories>/R/")
Finally, as an optional step to see if the new package is in the index, create a data frame of the available packages and search the data frame:
my_packages <- available.packages(repos = "file://<your higher directories>/R/")
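The search itself can be as simple as a row-name lookup; dplyr is just an example package here, and the repo path keeps the placeholder used above:

```r
# Build the index of the local repo and check whether a given package is in it.
my_packages <- available.packages(repos = "file://<your higher directories>/R/")
"dplyr" %in% rownames(my_packages)
```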
This approach has worked pretty well for me, but perhaps others have comments and suggestions that could improve upon it.
I will let an R project run on a data center and the team working there has no access to the Internet, so they will have to download the R libraries from an internal repository (on their Intranet) where all the packages are hosted.
Is it possible to change the repository from which the libraries are downloaded?
and how can we point to this repository if I will provide them with my renv.lock file?
Could this be solved by doing the following?
repos <- c(CRAN = "https://cloud.r-project.org", WORK = "https://work.example.org")
options(repos = repos)
Thanks a lot
Is it possible to change the repository from which the libraries are downloaded?
Yes, and the example code you shared is correct: the active package repositories used in an R session are controlled via the repos option.
and how can we point to this repository if I will provide them with my renv.lock file?
If you're working within an renv project with an auto-loader, then renv will automatically set the repositories from the lockfile when R is started. Otherwise, you can call renv::load("/path/to/project") to explicitly load a project at some location.
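For reference, the repositories renv restores from are recorded at the top of the lockfile; a minimal renv.lock fragment looks roughly like the following (the R version and URL are placeholders):

```json
{
  "R": {
    "Version": "4.3.1",
    "Repositories": [
      {
        "Name": "WORK",
        "URL": "https://work.example.org"
      }
    ]
  }
}
```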
I'd recommend reading https://rstudio.github.io/renv/articles/renv.html for more details.
I found myself in the situation where my private repos were set, but whenever I ran renv::init(), it wouldn't point to them. The simplest solution I could come up with from reading the renv docs:
1. Before calling renv::init(), check Sys.getenv("RENV_CONFIG_REPOS_OVERRIDE").
2. If it returns anything other than the URL of your private package repository, set it with Sys.setenv("RENV_CONFIG_REPOS_OVERRIDE" = "your_private_package_repository_url").
3. Call renv::init().
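Those steps can be sketched in one block; the repository URL is a hypothetical placeholder:

```r
# Override the repositories renv uses, but only if the environment variable
# isn't already pointing at the private repo; renv reads
# RENV_CONFIG_REPOS_OVERRIDE at startup.
repo_url <- "https://work.example.org/R"   # placeholder private repo URL
if (!identical(Sys.getenv("RENV_CONFIG_REPOS_OVERRIDE"), repo_url)) {
  Sys.setenv(RENV_CONFIG_REPOS_OVERRIDE = repo_url)
}
renv::init()
```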
I have a utilities package foo, and have developed another package bar which calls functions in foo.
foo is available on a URL (not Github or any such service) as source files. I install/update it using
install.packages("//mywebsite.com/foo",
repos = NULL,
type = "source")
I now want to share bar with others. I've read the devtools page on dependencies and understand I merely have to add a Remotes section to my DESCRIPTION file.
However, the example for a URL-based remote dependency is:
# URL
Remotes: url::https://github.com/hadley/stringr/archive/master.zip
What concerns me here is that the example uses a .zip file, but the package foo is only available as a raw source directory.
Will this work? Can I simply use
Remotes: url:://mywebsite.com/foo
Or does this only work with zipped files?
I notice the following example, for local packages, does not have an extension (Remotes: local::/pkgs/testthat), which makes me hopeful it's representing a source directory and that therefore that'll also work for URLs, but am not sure.
It looks like remotes::install_url requires a .zip, .tar, or .tar.gz, and that's likely what would be called to install the dependency if you specify Remotes: url:://mywebsite.com/foo.
If your code is in a Git repo (even if not on GitHub/GitLab) you can refer directly to the repo. Or if it's on a network drive you can refer to it using local instead of url, since remotes::install_local can handle directories.
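Under those assumptions, the DESCRIPTION entry would take one of the following forms (the Git URL and network path are placeholders; you'd use whichever applies):

```
Remotes:
    git::https://mywebsite.com/foo.git,
    local::/mnt/shared/pkgs/foo
```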
When you try to install a package in R and you don't have access rights to the default library path, R will ask you:
Would you like to use a personal library instead?
Would you like to create a personal library '~/path' to install
packages into?
However, if you are running an Rscript, those messages will not show up and installation will fail. I could predefine a specific path and instruct install.packages to use it, but I don't want to create an additional library path that would be specific to this Rscript. I just want to use the default personal library. Is there a way to force creation of a personal library without requiring interaction?
You can use Sys.getenv("R_LIBS_USER") to get the local library search location.
This is what I ended up doing, which seems to be working (the hardest part was testing the solution, since the problem only occurs the first time you try to install a package):
# create local user library path (not present by default)
dir.create(path = Sys.getenv("R_LIBS_USER"), showWarnings = FALSE, recursive = TRUE)
# install to local user library path
install.packages(p, lib = Sys.getenv("R_LIBS_USER"), repos = "https://cran.rstudio.com/")
# Bioconductor version (works for both Bioconductor and CRAN packages)
BiocManager::install(p, update = FALSE, lib = Sys.getenv("R_LIBS_USER"))
As #hrbrmstr pointed out in the comments, it may not be a good idea to force-install packages, so use at your own risk.
I have been using git for a while but just recently started using packrat. I would like my repository to be self-contained, but at the same time I do not want to include CRAN packages, as they are already available. It seems that once R is opened in a project with packrat, it will try to use packages from the project library; if they are not available, it will try to install them from src in the project library; failing that, it will look at the libraries installed on the computer. If a library is not available on the computer, would it look at CRAN next?
What files should I include in my git repo as a minimum (e.g., packrat.lock)?
You can choose to set an external CRAN-like repository with the source tarballs of the packages and their versions that you'd like available for your project. The default behaviour, however, is to look to CRAN next, as you've identified in your question. Check out the packrat.lock file, you will see that for each package you use in packrat, there is an option called source: CRAN (if you've downloaded the file from CRAN, that is).
When you have a locally stored package source file, the contents of the lockfile entry for said package change to the following:
Package: FooPackage
Source: source
Version: 0.4-4
Hash: 44foo4036fb68e9foo9027048d28
SourcePath: /Users/MyName/Documents/code/myrepo/RNetica
I'm a bit unclear on your final question: What files should I include in my git repo as a minimum (e.g., packrat.lock)? I'm going to take it as a combination of a) what files must be present for packrat to run, and b) which of those files should be committed to the git repo. To answer the first question, I'll illustrate by initialising packrat on an existing R project.
When you run packrat::init(), two important things happen (among others):
1. All the packrat scaffolding, including source tarballs etc are created under: PackageName/packrat/.
2. packrat/lib*/ is added to your .gitignore file.
So from this, we can see that anything under packrat/lib*/ doesn't need to be committed to your git repo. This leaves the following three files to be committed:
packrat/init.R
packrat/packrat.lock
packrat/packrat.opts
packrat.lock is needed for collaborating with others through a version control system; it helps keep your private libraries in sync. packrat.opts allows you to specify different project specific options for packrat. The file is automatically generated using get_opts and set_opts. Committing this file to the git-repo will ensure that any options you specify are maintained for all collaborators. A final file to be committed to the repo is .Rprofile. This file tells R to use the private package library (when R is started from the project directory).
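For reference, the .Rprofile that packrat writes essentially boils down to a single line that bootstraps the private library (the real generated file wraps this in a few guards):

```r
# Generated project .Rprofile (simplified): activate packrat so R uses the
# project-private library when started from the project directory.
source("packrat/init.R")
```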
Depending on your needs, you can choose to commit the source tar balls to the repository, or not. If you don't want them available in your git-repo, you simply add packrat/src/ to the .gitignore. But, this will mean that anyone accessing the git-repo will not have access to the package source code, and the files will be downloaded from CRAN, or from wherever the source line dictates within the packrat.lock file.
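If you do choose to keep the tarballs out of the repo, the resulting .gitignore entries would look like this (the lib* line is the one packrat adds itself):

```
packrat/lib*/
packrat/src/
```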
From your question, it sounds like committing the packrat/src/ folder contents to your repo might be what you need.
Sometimes on Stack Overflow, there's a question relative to a package which is not installed on my system, and which I don't plan to reuse later.
If I install the package with install.packages(), it will be put in one of my R install libraries, and then will take some storage space and be updated each time I run update.packages().
Is there a way to install a package only for the current R session ?
You can install a package temporarily with the following function :
tmp.install.packages <- function(pack, dependencies=TRUE, ...) {
path <- tempdir()
## Add 'path' to .libPaths, and be sure that it is not
## at the first position, otherwise any other package during
## this session would be installed into 'path'
firstpath <- .libPaths()[1]
.libPaths(c(firstpath, path))
install.packages(pack, dependencies=dependencies, lib=path, ...)
}
Which you can use simply this way :
tmp.install.packages("pkgname")
The package is installed in a temporary directory, and its files should be deleted at the next system restart (at least on Linux systems).
Another solution for this problem is devmode from devtools. Devmode allows you to install packages to a dev repository so your other packages are untouched if you install development versions. For example:
library(devtools)
devmode()
install_github('ggplot2', 'hadley')
devmode()
You'll notice that your version has not changed.
pacman deals with package management issues like this:
library(pacman)
Now you can use:
p_load("pkgname") #installs or loads package if already installed
#at end of session:
p_delete("pkgname") #deletes package from lib
This is a quick way to install in your directory and then delete it at the end (not really a temporary install)
As an addition to Tyler's answer a p_temp function was recently added to the pacman package which does exactly what the question asks for.
library(pacman)
p_temp(pkgname) # or p_temp("pkgname") either work...
This will install the package and any dependencies temporarily.
Disclosure: Tyler and I are co-authors of the pacman package...
The following is something in between juba's and sebastian-c's answers, and is as simple as:
.libPaths("/my/path")
Now and until the end of the current session,
you can install packages as you normally would, and they will end up in
/my/path.
Also package dependencies will go to /my/path.
If you want to have control over dependencies, you can specify them manually with:
install.packages(c("pack", "dep1", "dep2", ...), dependencies = FALSE)
This approach might be useful in two particular scenarios:
A discovery session, so to speak. You want to discover new packages and install them casually to see if something interesting pops up. Then you use an OS-provided tempdir in .libPaths() to avoid cluttering your R setup, and the OS will take care of the cleanup.
Creating reproducible environments, which are common nowadays.
You install a base R, then add .libPaths("my/project/dir"). By looking at this dir, you get a clear picture of your project's package requirements. Further, you can copy this folder to another PC to reproduce the same environment. Much like Python's pipenv, you can have multiple isolated environments: for each session, you call .libPaths() with the related project dir.
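The discovery-session case can be sketched as follows; the package name is a placeholder:

```r
# Throwaway library for a discovery session: putting tempdir() first means
# new installs land in a session-specific directory the OS cleans up later.
.libPaths(c(tempdir(), .libPaths()))
install.packages("somepkg")   # hypothetical package to try out
library(somepkg)
```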