Setting default `.libPaths()` for multiple versions of R

I work in an environment with multiple versions of R available. Managing my libraries can be something of a hassle, as I have to switch library locations to avoid issues with packages being built under different versions of R.
Is there a way to change my default library location in .libPaths() automatically depending on the version of R I'm using?

I found this trick useful.
Keep your locally installed R libraries in a directory named for their version, detect the version when R starts, and set .libPaths() accordingly.
Edit the .Rprofile file in your home directory to contain something like the following:
version <- paste0(R.Version()$major,".",R.Version()$minor)
if (version == "3.5.2") {
.libPaths( c("/path/to/Rlibs/3.5.2", .libPaths()) )
} else if (version == "3.4.3") {
.libPaths( c("/path/to/Rlibs/3.4.3", .libPaths()) )
}
Here is an updated version that automatically creates a new library folder when a new R version is detected. It also throws a warning when it does this, in case you accidentally loaded a new R version when you weren't intending to.
# Set Version specific local libraries
## get current R version (in semantic format)
version <- paste0(R.Version()$major,".",R.Version()$minor)
## get username
uname <- Sys.getenv("USER") # USERNAME on windows because why make it easy?
## generate R library path for parent directory
libPath <- paste0("/home/", uname, "/Rlibs/")
setLibs <- function(libPath, ver) {
  libfull <- paste0(libPath, ver) # combine parent and version for full path
  if (!dir.exists(libfull)) { # create a new directory for this R version if it does not exist
    # Warn user (the necessity of creating a new library may indicate an inadvertent choice of the wrong R version)
    warning(paste0("Library for R version '", ver, "' does not exist; it will be created at: ", libfull))
    dir.create(libfull, recursive = TRUE) # also creates the parent Rlibs directory if it is missing
  }
  .libPaths(c(libfull, .libPaths()))
}
setLibs(libPath, version)

A slightly shorter version of Richard J. Acton's solution:
version <- paste0(R.Version()$major,".",R.Version()$minor)
libPath <- path.expand(file.path("~/.R/libs", version))
if(!dir.exists(libPath)) {
warning(paste0("Library for R version '", version, "' will be created at: ", libPath ))
dir.create(libPath, recursive = TRUE)
}
.libPaths(c(libPath, .libPaths()))

I have a standard path structure with a new folder added for each version number, and a folder called pax for packages. To do this you just need to add the following to your .Rprofile.
Rver <- paste0(R.Version()$major, ".", R.Version()$minor)
.libPaths(file.path(paste0(
"C:/Users/abcd/R/", Rver, "/pax")))
This means you aren't going to grow a forest of if statements if you have multiple versions.
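A variation on the answers above, offered as a sketch rather than a drop-in: key the library on major.minor only, which is how R lays out its own default user library (for example the 3.3 directory in ~/R/x86_64-pc-linux-gnu-library/3.3). The helper name versioned_lib and the ~/Rlibs root are assumptions made for illustration.

```r
# Hypothetical helper (not from the answers above): build a library path
# keyed on the major.minor part of an R version.
versioned_lib <- function(root = "~/Rlibs", ver = getRversion()) {
  # "3.5.2" -> "3.5"; as.character() also accepts a numeric_version object
  mm <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", as.character(ver))
  path.expand(file.path(root, mm))
}

# In .Rprofile one might then write (creating the directory on first use):
# lib <- versioned_lib()
# if (!dir.exists(lib)) dir.create(lib, recursive = TRUE)
# .libPaths(c(lib, .libPaths()))
```

Whether sharing a library across patch releases is acceptable depends on your environment; keep the full x.y.z layout if you need strict separation.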

Related

Can I get the URL of what will be used by install.packages?

When running install.packages("any_package") on Windows I get the message:
trying URL
'somepath.zip'
I would like to get this path without downloading. Is that possible?
In other words, I'd like to get the CRAN link to the Windows binary of the latest release (ideally, I would call a new function with the same parameters as install.packages and get the proper URL(s) as output).
I need a way that works from the R console (no manual checking of the CRAN page etc.).
I am not sure if this is what you are looking for. This builds the URL from the repository information and a file name assembled from the list of available packages.
#get repository name
repos<- getOption("repos")
#Get url for the binary package
#contrib.url(repos, "both")
contriburl<-contrib.url(repos, "binary")
#"https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.5"
#make data.frame of available packages
df<-as.data.frame(available.packages())
#find package of interest
pkg <- "tidyr" #example
#ofinterest<-grep(pkg, df$Package)
ofinterest<-match(pkg, df$Package) #returns a single value
#assemble name, assumes it is always a zip file
name<-paste0(df[ofinterest,]$Package, "_", df[ofinterest,]$Version, ".zip")
#make final URL
finalurl<-paste0(contriburl, "/", name)
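The same idea can be sketched without calling contrib.url() separately, relying on the "Repository" column of the matrix returned by available.packages(), which holds the contrib URL each package was found under. The function name binary_url and the sample matrix are made up for illustration; on Windows the real input would be something like available.packages(type = "win.binary").

```r
# Hypothetical helper: build the download URL for one package from an
# available.packages()-style character matrix (rows named by package).
binary_url <- function(pkg, db, ext = ".zip") {
  if (!pkg %in% rownames(db)) {
    stop("Package '", pkg, "' is not available")
  }
  file_name <- paste0(pkg, "_", db[pkg, "Version"], ext)
  unname(paste0(db[pkg, "Repository"], "/", file_name))
}

# Made-up stand-in for available.packages(type = "win.binary")
db <- matrix(
  c("tidyr", "0.8.1", "https://cran.example.org/bin/windows/contrib/3.5"),
  nrow = 1,
  dimnames = list("tidyr", c("Package", "Version", "Repository"))
)
binary_url("tidyr", db)
```

Like the code above, this assumes the binary is a .zip file, which holds for Windows binaries only.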
Here are a couple of functions which, respectively:
get the latest R version from RStudio's website
get the URL of the latest released Windows binary
The first is a variation of code I found in the installr package. It seems there's no clean way of getting the latest version, so we have to scrape a web page.
The second is really just @Dave2e's code optimized and refactored into a function (with a fix for outdated R versions), so please direct upvotes to his answer.
get_package_url <- function(pkg){
version <- try(
available.packages()[pkg,"Version"],
silent = TRUE)
if(inherits(version,"try-error"))
stop("Package '",pkg,"' is not available")
contriburl <- contrib.url(getOption("repos"), "binary")
url <- file.path(
dirname(contriburl),
get_last_R_version(2),
paste0(pkg,"_",version,".zip"))
url
}
get_last_R_version <- function(n=3){
page <- readLines(
"https://cran.rstudio.com/bin/windows/base/",
warn = FALSE)
line <- grep("R-[0-9.]+.+-win\\.exe", page,value=TRUE)
long <- gsub("^.*?R-([0-9.]+.+)-win\\.exe.*$","\\1",line)
paste(strsplit(long,"\\.")[[1]][1:n], collapse=".")
}
get_package_url("data.table")
# on my system with R 3.3.1
# [1] "https://lib.ugent.be/CRAN/bin/windows/contrib/3.5/data.table_1.11.4.zip"

OS-independent way to select directory interactively in R

I would like users to be able to select a directory interactively in R. The solution needs to work on different platforms (at least on Linux, Windows and Mac machines that have a graphical desktop environment). And it needs to be robust enough to work on a variety of computers. I've run into problems with the variants I know of:
file.choose() unfortunately only works for files; it won't let you select a directory. Other than this limitation, file.choose is a good example of the type of solution I'm looking for: it works across platforms and does not have external dependencies that may not be available on a particular computer.
choose.dir() only works on Windows.
tk_choose.dir() from library(tcltk) was my preferred solution until recently. But I've had users report that this throws an error
log4cplus:ERROR No appenders could be found for logger (AdSyncNamespace).
log4cplus:ERROR Please initialize the log4cplus system properly.
which we tracked back to Autodesk360 software being installed, which for some reason interferes with tcltk. So this is not a suitable solution unless there is a fix for this. (the only solution I've found by googling is to uninstall Autodesk360, which won't be a solution for users who installed it because they actually use it).
This answer suggests the following as a possible alternative:
library(rJava)
library(rChoiceDialogs)
jchoose.dir()
But, as an example of the sort of thing that can go wrong with this, when I tried to install.packages("rJava") I got:
checking whether JNI programs can be compiled... configure: error:
Cannot compile a simple JNI program. See config.log for details.
Make sure you have Java Development Kit installed and correctly
registered in R. If in doubt, re-run "R CMD javareconf" as root.
ERROR: configuration failed for package ‘rJava’
* removing ‘/home/dominic/R/x86_64-pc-linux-gnu-library/3.3/rJava’
Warning in install.packages :
installation of package ‘rJava’ had non-zero exit status
I managed to fix this on my own machine (Linux running OpenJDK) by installing the OpenJDK compiler using the Linux package manager and then running sudo R CMD javareconf. But I can't expect random users with varying levels of computer expertise to jump through hoops just so that they can select a directory. Even if they do manage to fix it, it will look bad when every other piece of software they use manages to open a directory-selection dialogue without any problems.
So my question: Is there a robust method that can reliably be expected to "just work" (like file.choose does for files) on a variety of platforms, and that makes no expectation of the end user being computer literate enough to solve these kinds of issues (such as incompatibilities with Autodesk360 or unresolved Java dependencies)?
In the time since posting this question and an earlier version of this answer, I've managed to test the various options that have been suggested on a range of computers. This process has converged on a fairly simple solution. The only cases I have found where tcltk::tk_choose.dir() fails due to conflicts are on Windows computers running Autodesk software. But on Windows, we have utils::choose.dir available instead. So the answer I am currently running with is:
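choose_directory = function(caption = 'Select data directory') {
  # exists() needs the bare function name plus the namespace to search;
  # the string 'utils::choose.dir' is never found as an object name
  if (exists('choose.dir', envir = asNamespace('utils'), inherits = FALSE)) {
    utils::choose.dir(caption = caption)
  } else {
    tcltk::tk_choose.dir(caption = caption)
  }
}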
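A detail worth checking when testing for a namespaced function: exists() looks up its first argument as a literal object name, so a string containing :: is not resolved against the package. A small sketch (the helper name has_function is an assumption, not part of the answer) that looks inside the namespace instead:

```r
# Hypothetical helper: TRUE if package `pkg` defines a function called `name`
has_function <- function(pkg, name) {
  requireNamespace(pkg, quietly = TRUE) &&
    exists(name, envir = asNamespace(pkg), mode = "function", inherits = FALSE)
}

exists("utils::head")                # FALSE: no object has that literal name
has_function("utils", "head")        # TRUE on every platform
has_function("utils", "choose.dir")  # TRUE only where utils provides it (Windows)
```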
For completeness, I think it is useful to summarise some of the issues with other approaches and why they do not meet the criterion of being generally robust across platforms (including robustness against unresolved external dependencies that can't be fixed from within R and that may require administrator privileges and/or expertise to fix):
easycsv::choose_dir in Linux depends on zenity, which may not be available.
rstudioapi::selectDirectory requires that we are in RStudio Version greater than 1.1.287.
rChoiceDialogs::rchoose.dir requires not only that java runtime environment is installed, but also java compiler must be installed and configured correctly to work with rJava.
utils::menu does not work if the R function is run from the command line, rather than in an interactive session. Also on Linux X11 it frequently leaves an orphan window open after execution, which can't be readily closed.
gWidgets2::gfile has external dependency on either gtk2 or tcltk or Qt. Resolving these dependencies was found to be non-trivial in some cases.
Archived earlier version of this answer
Finally, an earlier version of this answer contained some longer code that tries out several possible solutions to find one that works. Although I have settled on the simple version above, I leave this version archived here in case it proves useful to someone else.
What it tries:
Check whether the function utils::choose.dir exists (will only be available on Windows). If so, use that
Check whether the user is working from within RStudio version 1.1.287 or greater. If so use the RStudio API.
Check if we can load the tcltk package and then open and close a tcltk window without throwing an error. If so, use tcltk.
Check whether we can load gWidgets2 and the RGtk2 widgets. If so, use gWidgets2. I don't try to load the tcltk widgets here, because if they worked, presumably we would already be using the tcltk package. I also do not try to load the Qt widgets, as they seem somewhat unmaintained and are not currently available on CRAN.
Check if we can load rJava and rChoiceDialogs. If so, use rChoiceDialogs.
If none of the above are successful, use a fallback position of requesting the directory name at the console.
Here's the longer version of the code:
# First a helper function to load packages, installing them first if necessary
# Returns logical value for whether successful
ensure_library = function (lib.name){
x = require(lib.name, quietly = TRUE, character.only = TRUE)
if (!x) {
install.packages(lib.name, dependencies = TRUE, quiet = TRUE)
x = require(lib.name, quietly = TRUE, character.only = TRUE)
}
x
}
select_directory_method = function() {
  # Tries out a sequence of potential methods for selecting a directory to find one that works
  # The fallback default method if nothing else works is to get user input from the console
  if (!exists('.dir.method')) { # if we already established the best method, just use that
    # otherwise let's try out some options to find the best one that works here
    if (exists('choose.dir', envir = asNamespace('utils'), inherits = FALSE)) {
      .dir.method = 'choose.dir'
    } else if (requireNamespace('rstudioapi', quietly = TRUE) &&
               rstudioapi::isAvailable() && rstudioapi::getVersion() > '1.1.287') {
      # && short-circuits, so getVersion() is only called inside RStudio
      .dir.method = 'RStudioAPI'
      ensure_library('rstudioapi')
    } else if (ensure_library('tcltk') &&
               !inherits(try({tt <- tktoplevel(); tkdestroy(tt)}, silent = TRUE), "try-error")) {
      .dir.method = 'tcltk'
    } else if (ensure_library('gWidgets2') && ensure_library('RGtk2')) {
      .dir.method = 'gWidgets2RGtk2'
    } else if (ensure_library('rJava') && ensure_library('rChoiceDialogs')) {
      .dir.method = 'rChoiceDialogs'
    } else {
      .dir.method = 'console'
    }
    assign('.dir.method', .dir.method, envir = .GlobalEnv) # remember the chosen method for later
  }
  return(.dir.method)
}
choose_directory = function(method = select_directory_method(), title = 'Select data directory') {
switch (method,
'choose.dir' = choose.dir(caption = title),
'RStudioAPI' = selectDirectory(caption = title),
'tcltk' = tk_choose.dir(caption = title),
'rChoiceDialogs' = rchoose.dir(caption = title),
'gWidgets2RGtk2' = gfile(type = 'selectdir', text = title),
readline('Please enter directory path: ')
)
}
Here is a simple directory navigation menu (using menu{utils}):
d = 1
while (d != 0) {
  a = unlist(strsplit(getwd(), "/"))
  b = list.dirs(recursive = FALSE, full.names = FALSE)
  c = paste("..", a[length(a) - 1], a[length(a)], sep = "/")
  d = menu(c("..", b), title = c, graphics = TRUE)
  if (d == 0) break # selection cancelled: stay in the current directory
  if (d == 1) {
    # go up one level
    e = paste0(paste(a[1:(length(a) - 1)], collapse = "/"), "/")
    setwd(e)
  } else {
    # descend into the chosen subdirectory
    e = paste0(paste(a, collapse = "/"), "/", b[d - 1])
    setwd(e)
  }
}
Note: I did not (yet) test it under different systems. Here is what the documentation says:
If graphics = TRUE and a windowing system is available (Windows, macOS or X11 via Tcl/Tk) a listbox widget is used, otherwise a text menu. It is an error to use menu in a non-interactive session.
One limitation: The title = can only be a single line.
You can use the choose_dir function from easycsv.
It works on Windows, Linux and OS X.
easycsv::choose_dir() # can be run without parameters to prompt a folder selection window
For some use cases, a little trick is to wrap file.choose() in dirname():
dir <- dirname(file.choose())
This returns the directory. It does, however, require at least one file to be present in the directory.
Suggestion for adaption of choose_directory() as mentioned in my comment (06.09.2018 RFelber):
choose_directory <- function(ini_dir = getwd(),
method = select_directory_method(),
title = 'Select data directory') {
switch(method,
'choose.dir' = choose.dir(default = ini_dir, caption = title),
'RStudioAPI' = selectDirectory(path = ini_dir, caption = title),
'tcltk' = tk_choose.dir(default = ini_dir, caption = title),
'rChoiceDialogs' = rchoose.dir(default = ini_dir, caption = title),
'gWidgets2RGtk2' = gfile(type = 'selectdir', text = title, initial.dir = ini_dir),
readline('Please enter directory path: ')
)
}

Extract extension of file from working directory and check condition

Possible Duplicate: Extract file extension from file path
I need to check the extensions of the files in my working directory and act on the result. I list them with list.files(), which gives me all the files in the working directory with their extensions.
I get a list like
"GSM18423_PA-D_132.cel" "GSM18424_PA-D_206.cel" "GSM18425_PA-D_216.cel"
Now I want a condition: if a file has the extension .cel, do something like the below.
if(extension==".cel")
...... else
......
I looked at the tools package, but it is not working in my version of R (3.1.3 RC, 2015-03-06 r67947). I tried install.packages("tools"), which pops up a window asking me to restart before installing, but in the end it does nothing, not even a restart. Finally I get this message:
Installing package into ‘/home/hussain/R/i686-pc-linux-gnu-library/3.1’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘tools’ is not available (for R version 3.1.3 RC)
This is the source-code of tools::file_ext
function (x)
{
pos <- regexpr("\\.([[:alnum:]]+)$", x)
ifelse(pos > -1L, substring(x, pos + 1L), "")
}
Just create your own function with this code.
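For the file names in the question this could be used as follows. The tools package ships with R itself, so tools::file_ext should normally be callable without installing anything; the copy below is only there to keep the sketch self-contained, and the file names other than the .cel ones are invented.

```r
# Local copy of the function body shown above
file_ext <- function(x) {
  pos <- regexpr("\\.([[:alnum:]]+)$", x)
  ifelse(pos > -1L, substring(x, pos + 1L), "")
}

files <- c("GSM18423_PA-D_132.cel", "GSM18424_PA-D_206.cel", "notes.txt", "README")
is_cel <- file_ext(files) == "cel"

for (f in files[is_cel]) {
  message("Processing CEL file: ", f)  # placeholder for the real work
}
```

Note that the function returns the extension without the leading dot, so the comparison is against "cel" rather than ".cel".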
With reference to the comment by @user20650, I think it would be easy to do something like
lst <- list.files()
ext <- grepl("\\.cel$", lst)[1] # TRUE if the first file ends in .cel (the dot must be escaped)
if(ext)
{ .....
code
....
}else{
....
code
.....
}

Package that downloads data from the internet during installation

Is anyone aware of a package that downloads a dataset from the internet during the installation process and then prepares and saves it so that it is available when loading the package using library(packageName)? Are there any drawbacks in this approach (besides the obvious one that package installation will fail if the data source is unavailable or the data format has changed)?
EDIT: Some background. The data is three tab-separated files in a ZIP archive, owned by federal statistics and generally freely accessible. I have R code which downloads, extracts and prepares the data, in the end three data frames are created which could be saved in .RData format.
I am thinking about creating two packages: A "data" package that provides the data, and a "code" package that operates on it.
I did this mockup before, while you were posting your edit. I presume it would work, but not tested. I've commented it so you can see what you would need to change. The idea here is to check to see if an expected object is available in the current working environment. If it is not, check to see that the file that the data can be found in is in the current working directory. If that is not found, prompt the user to download the file, then proceed from there.
myFunction <- function(this, that, dataset = NULL) {
# We're giving the user a chance to specify the dataset.
# Maybe they have already downloaded it and saved it.
if (is.null(dataset)) {
# Check to see if the object is already in the workspace.
# If it is not, check to see whether the .RData file that
# contains the object is in the current working directory.
if (!exists("OBJECTNAME", where = 1)) {
if (isTRUE(list.files(
pattern = "^DATAFILE.RData$") == "DATAFILE.RData")) {
load("DATAFILE.RData")
# If neither of those are successful, prompt the user
# to download the dataset.
} else {
ans = readline(
"DATAFILE.RData dataset not found in working directory.
OBJECTNAME object not found in workspace. \n
Download and load the dataset now? (y/n) ")
if (ans != "y")
return(invisible())
# I usually use RCurl in case the URL is https
require(RCurl)
baseURL = c("http://some/base/url/")
# Here, we actually download the data
temp = getBinaryURL(paste0(baseURL, "DATAFILE.RData"))
# Here we load the data
load(rawConnection(temp), envir=.GlobalEnv)
message("OBJECTNAME data downloaded from \n",
paste0(baseURL, "DATAFILE.RData \n"),
"and added to your workspace\n\n")
rm(temp, baseURL)
}
}
dataset <- OBJECTNAME
}
TEMP <- dataset
## Other fun stuff with TEMP, this, and that.
}
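The check-then-download flow can also be sketched as a small caching helper. Everything named here (load_or_fetch, the fetch callback, the file name) is hypothetical; the real download step would be whatever code retrieves and prepares the data, passed in as fetch.

```r
# Hypothetical helper: return the cached dataset if the file exists,
# otherwise call `fetch(path)` to create it first.
load_or_fetch <- function(path, fetch) {
  if (!file.exists(path)) {
    message("Dataset not found at ", path, "; fetching it now")
    fetch(path)  # expected to write an .rds file at `path`
  }
  readRDS(path)
}

# Stand-in for the real download-and-prepare step
fake_fetch <- function(path) {
  saveRDS(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)), path)
}

cache <- file.path(tempdir(), "DATAFILE.rds")
dataset <- load_or_fetch(cache, fake_fetch)  # creates the file if it is missing
dataset <- load_or_fetch(cache, fake_fetch)  # reads the cached copy
```

Using saveRDS()/readRDS() rather than load() returns the object directly, so nothing has to be assigned into the global environment.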
Two packages, hosted at Github
Here's another approach, building on the comments between @juba and me. The basic concept is to have, as you describe, one package for the code and one for the data. This function would be part of the package that contains your code. It will:
Check to see if the data package is installed
Check to see if the version of the data package you have installed matches the version at Github, which we are going to assume is the most up to date version.
When it fails any of the checks, it asks the user if they want to update their installation of the package. In this case, for demonstration, I've linked to one of my packages in progress at Github. This should give you an idea of what you need to substitute to get it to work with your own package once you've hosted it there.
CheckVersionFirst <- function() {
# Check to see if installed
if (!"StataDCTutils" %in% installed.packages()[, 1]) {
Checks <- "Failed"
} else {
# Compare version numbers
require(RCurl)
temp <- getURL("https://raw.github.com/mrdwab/StataDCTutils/master/DESCRIPTION")
CurrentVersion <- gsub("^\\s|\\s$", "",
gsub(".*Version:(.*)\\nDate.*", "\\1", temp))
# An installed version newer than the one on GitHub also counts as passed,
# so Checks is always defined
if (packageVersion("StataDCTutils") >= CurrentVersion) {
Checks <- "Passed"
} else {
Checks <- "Failed"
}
}
switch(
Checks,
Passed = { message("Everything looks OK! Proceeding!") },
Failed = {
ans = readline(
"StataDCTutils is either outdated or not installed. Update now? (y/n) ")
if (ans != "y")
return(invisible())
require(devtools)
install_github("mrdwab/StataDCTutils") # current devtools syntax is "user/repo"
})
# Some cool things you want to do after you are sure the data is there
}
Try it out with CheckVersionFirst().
Note: This would succeed only if you religiously remember to update your version number in your description file every time you push a new version of the data to Github!
So, to clarify/recap/expand, the basic idea would be to:
Periodically push the updated version of your data package to Github, being sure to change the version number of the data package in its DESCRIPTION file when you do so.
Integrate this CheckVersionFirst() function as an .onLoad event in your code package. (Obviously modify the function to match your account and package name).
Change the commented line that reads # Some cool things you want to do after you are sure the data is there to reflect the cool things you actually want to do, which would probably start with library(YOURDATAPACKAGE) to load the data....
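The version comparison in CheckVersionFirst() relies on a regular expression that expects a Date field right after Version. A possibly more forgiving sketch parses the DESCRIPTION text with read.dcf() instead. The function name remote_version and the sample text are assumptions for illustration; in practice the text would come from the download step.

```r
# Hypothetical helper: extract the Version field from DESCRIPTION text
remote_version <- function(description_text) {
  lines <- strsplit(description_text, "\n", fixed = TRUE)[[1]]
  con <- textConnection(lines)
  on.exit(close(con))
  fields <- read.dcf(con, fields = "Version")
  package_version(fields[1, "Version"])
}

# Made-up DESCRIPTION content standing in for the downloaded file
desc <- "Package: StataDCTutils\nVersion: 1.2.10\nTitle: Example\n"
latest <- remote_version(desc)

installed <- package_version("1.2.9")
needs_update <- installed < latest  # version-aware comparison, not string order
```

Comparing package_version objects matters here: as plain strings, "1.2.9" would sort after "1.2.10".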
This method may not be efficient, but it is a good workaround. If you are making a package that needs regularly updated data, first make a package which has that data. It does not need any functions, but I like the concept of a setter (which you might not need in this case) and a getter.
Then, when you make your package, declare the 'data' package as a dependency. This way, whenever someone installs your package, they will also get the latest data.
On your part, you'll just have to swap out the data in your 'data' package and upload it to the repo you want.
If you don't know how to build a package, check ?package.skeleton, R CMD check and R CMD build.

How do I set R_LIBS_SITE on Ubuntu so that .libPaths() is set properly for all users at startup?

I am setting up a cluster where all nodes have access to /nfs/software, so a good place to install.packages() would be under /nfs/software/R. How do I set R_LIBS_SITE so that this is automatically part of all users' R environment? I tried prepending to the path given for R_LIBS_SITE in /etc/R/Renviron but help(Startup) says "do not change ‘R_HOME/etc/Renviron’ itself", which I'm not sure is the same file since R_HOME expands to /usr/lib/R, but has no effect in any case. Making entries in the various Renviron.site and Rprofile.site files does not seem to have the desired effect. What am I missing here?
Some other questions have danced around this (here and here, maybe others), but people seem to settle for having a user-specific library in their HOME.
Make sure you have owner and/or group write permissions for the directory you want to write into.
The file /etc/R/Renviron.site is the preferred choice for local overrides to /etc/R/Renviron.
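When a setting in Renviron.site seems to have no effect, it can help to look at what R actually picked up. The sketch below only reads state; the function name site_lib_report is made up. One thing to keep in mind is that .libPaths() only lists directories that exist, so a site library that is missing (or not yet mounted) at startup will not appear.

```r
# Hypothetical diagnostic: collect the pieces that decide the site library
site_lib_report <- function() {
  list(
    renviron_site = file.path(R.home("etc"), "Renviron.site"),
    R_LIBS_SITE   = Sys.getenv("R_LIBS_SITE", unset = NA),
    library_site  = .Library.site,
    lib_paths     = .libPaths()
  )
}

report <- site_lib_report()
str(report)
# Does the site file that R would read exist on this machine?
file.exists(report$renviron_site)
```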
Another way is to simply ... impose the directory when installing packages. I tend to do that on the (bash rather than R) shell via this script derived from an example in the littler package:
$ cat bin/install.r
#!/usr/bin/env r
#
# a simple example to install one or more packages
if (is.null(argv) | length(argv)<1) {
cat("Usage: install.r pkg1 [pkg2 pkg3 ...]\n")
q()
}
## adjust as necessary, see help('download.packages')
repos <- "http://cran.us.r-project.org"
#repos <- "http://cran.r-project.org"
## this makes sense on Debian where no packages touch /usr/local
lib.loc <- "/usr/local/lib/R/site-library"
install.packages(argv, lib.loc, repos)
and you can easily customize a helper like this for your preferred location. With the script installed in ~/bin/, I often do
$ ~/bin/install.r xts plyr doRedis
and it will faithfully install these packages along with their depends. The littler package has a similar script update.r.
A follow-up on Dirk Eddelbuettel's answer (thanks Dirk!):
An adaptation of Dirk's suggestion that can be run within R:
# R function to install one or more packages
Rinstall <- function(pkg) {
  if (is.null(pkg) || length(pkg) < 1) {
    stop("No packages specified") # q() here would quit the whole R session
  }
  if (.Platform$OS.type == "windows") {
    lib.dir <- "c:/R/library"
  } else {
    lib.dir <- "~/R/library"
  }
  repos.loc <- "http://cran.us.r-project.org"
  # dependencies = TRUE installs Depends, Imports, LinkingTo and Suggests
  install.packages(pkg, lib.dir, repos.loc, dependencies = TRUE)
}
Usage:
Rinstall(c("package1", "package2"))
Naturally you want to adapt the repos.loc and lib.dir based on your system. As I work on both Windows and Linux machines I also inserted a conditional statement to check which system I'm on.
P.S. Don't hesitate to simplify the code, I'm a total newbie.
