OS-independent way to select directory interactively in R - r

I would like users to be able to select a directory interactively in R. The solution needs to work on different platforms (at least on Linux, Windows and Mac machines that have a graphical desktop environment). And it needs to be robust enough to work on a variety of computers. I've run into problems with the variants I know of:
file.choose() unfortunately only works for files; it does not allow selecting a directory. Other than this limitation, file.choose is a good example of the type of solution I'm looking for: it works across platforms and does not have external dependencies that may not be available on a particular computer.
choose.dir() only works on Windows.
tk_choose.dir() from library(tcltk) was my preferred solution until recently. But I've had users report that this throws an error:
log4cplus:ERROR No appenders could be found for logger (AdSyncNamespace).
log4cplus:ERROR Please initialize the log4cplus system properly.
which we tracked back to Autodesk360 software being installed, which for some reason interferes with tcltk. So this is not a suitable solution unless there is a fix for this. (The only solution I've found by googling is to uninstall Autodesk360, which won't be a solution for users who installed it because they actually use it.)
This answer suggests the following as a possible alternative:
library(rJava)
library(rChoiceDialogs)
jchoose.dir()
But, as an example of the sort of thing that can go wrong with this, when I tried to install.packages("rJava") I got:
checking whether JNI programs can be compiled... configure: error:
Cannot compile a simple JNI program. See config.log for details.
Make sure you have Java Development Kit installed and correctly
registered in R. If in doubt, re-run "R CMD javareconf" as root.
ERROR: configuration failed for package ‘rJava’
* removing ‘/home/dominic/R/x86_64-pc-linux-gnu-library/3.3/rJava’
Warning in install.packages : installation of package ‘rJava’ had non-zero exit status
I managed to fix this on my own machine (linux running openJDK) by installing the openjdk compiler using the linux package manager then running sudo R CMD javareconf. But I can't expect random users with varying levels of computer expertise to have to jump through hoops just so that they can select a directory. Even if they do manage to fix it, it will look bad when every other piece of software they use manages to open a directory-selection dialogue without any problems.
So my question: Is there a robust method that can reliably be expected to "just work" (like file.choose does for files) on a variety of platforms, and that does not expect the end user to be computer literate enough to solve these kinds of issues (such as incompatibilities with Autodesk360 or unresolved Java dependencies)?

In the time since posting this question and an earlier version of this answer, I've managed to test the various options that have been suggested on a range of computers. This process has converged on a fairly simple solution. The only cases I have found where tcltk::tk_choose.dir() fails due to conflicts are on Windows computers running Autodesk software. But on Windows, we have utils::choose.dir available instead. So the answer I am currently running with is:
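choose_directory = function(caption = 'Select data directory') {
  # exists('utils::choose.dir') is always FALSE, because exists() looks for an
  # object literally named 'utils::choose.dir'; check the utils namespace instead
  if (exists('choose.dir', envir = asNamespace('utils'), inherits = FALSE)) {
    utils::choose.dir(caption = caption)     # only available on Windows
  } else {
    tcltk::tk_choose.dir(caption = caption)
  }
}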
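Both dialogs can be cancelled by the user, so the caller usually needs to check the result before using it. Here is a minimal sketch of such a check. The helper name normalise_choice is hypothetical (it is not part of the answer above), and it assumes a cancelled dialog comes back as NA or an empty string:

```r
# Hypothetical helper: turn a dialog result into a usable path, or NULL.
normalise_choice <- function(path) {
  # Assumption: a cancelled dialog yields NA, "" or a zero-length value
  if (length(path) != 1 || is.na(path) || !nzchar(path)) {
    return(NULL)
  }
  # Reject anything that is not an existing directory
  if (!dir.exists(path)) {
    return(NULL)
  }
  normalizePath(path, winslash = "/")
}

# Intended use (interactive sessions only):
# data_dir <- normalise_choice(choose_directory())
# if (is.null(data_dir)) stop("No directory selected")
```

Returning NULL rather than NA keeps the calling code to a single is.null() check, but that is a design preference, not a requirement.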
For completeness, I think it is useful to summarise some of the issues with other approaches and why they do not meet the criteria of being generally robust on a variety of platforms (including robustness against potentially unresolved external dependencies that can't be fixed from within R and that may require administrator privileges and/or expertise to fix):
easycsv::choose_dir in Linux depends on zenity, which may not be available.
rstudioapi::selectDirectory requires that we are running inside RStudio, version 1.1.287 or later.
rChoiceDialogs::rchoose.dir requires not only that a Java runtime environment is installed, but also that a Java compiler is installed and configured correctly to work with rJava.
utils::menu does not work if the R function is run from the command line, rather than in an interactive session. Also on Linux X11 it frequently leaves an orphan window open after execution, which can't be readily closed.
gWidgets2::gfile has external dependency on either gtk2 or tcltk or Qt. Resolving these dependencies was found to be non-trivial in some cases.
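To see which of these options is even a candidate on a given machine, something like the following could be run first. This is only a rough sketch: the function name available_backends is made up for illustration, and it checks whether a backend is installed or present, not whether its dialog actually opens without error.

```r
# Hypothetical diagnostic: which directory-dialog backends look usable here?
# Nothing is installed; packages are only checked for presence.
available_backends <- function() {
  is_installed <- function(pkg) nzchar(system.file(package = pkg))
  c(
    choose.dir     = .Platform$OS.type == "windows",
    rstudioapi     = is_installed("rstudioapi") && rstudioapi::isAvailable(),
    tcltk          = unname(capabilities("tcltk")),
    zenity         = unname(nzchar(Sys.which("zenity"))),
    rChoiceDialogs = is_installed("rJava") && is_installed("rChoiceDialogs"),
    console        = interactive()
  )
}

print(available_backends())
```

A TRUE here is necessary but not sufficient: for example, tcltk can be available and still fail at run time in the Autodesk case described in the question.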
Archived earlier version of this answer
Finally, an earlier version of this answer contained some longer code that tries out several possible solutions to find one that works. Although I have settled on the simple version above, I leave this version archived here in case it proves useful to someone else.
What it tries:
Check whether the function utils::choose.dir exists (will only be available on Windows). If so, use that.
Check whether the user is working from within RStudio version 1.1.287 or greater. If so, use the RStudio API.
Check if we can load the tcltk package and then open and close a tcltk window without throwing an error. If so, use tcltk.
Check whether we can load gWidgets2 and the RGtk2 widgets. If so, use gWidgets2. I don't try to load the tcltk widgets here, because if they worked, presumably we would already be using the tcltk package. I also do not try to load the Qt widgets, as they seem somewhat unmaintained and are not currently available on CRAN.
Check if we can load rJava and rChoiceDialogs. If so, use rChoiceDialogs.
If none of the above are successful, use a fallback position of requesting the directory name at the console.
Here's the longer version of the code:
# First a helper function to load packages, installing them first if necessary
# Returns logical value for whether successful
ensure_library = function (lib.name){
x = require(lib.name, quietly = TRUE, character.only = TRUE)
if (!x) {
install.packages(lib.name, dependencies = TRUE, quiet = TRUE)
x = require(lib.name, quietly = TRUE, character.only = TRUE)
}
x
}
select_directory_method = function() {
  # Tries out a sequence of potential methods for selecting a directory to find one that works
  # The fallback default method if nothing else works is to get user input from the console
  if (!exists('.dir.method')){ # if we already established the best method, just use that
    # otherwise lets try out some options to find the best one that works here
    # (&& is used throughout so that later checks are skipped once an earlier one fails)
    if (exists('choose.dir', envir = asNamespace('utils'), inherits = FALSE)) {
      .dir.method = 'choose.dir'
    } else if (ensure_library('rstudioapi') &&
               rstudioapi::isAvailable() &&
               rstudioapi::getVersion() >= '1.1.287') {
      .dir.method = 'RStudioAPI'
    } else if (ensure_library('tcltk') &&
               !inherits(try({tt <- tktoplevel(); tkdestroy(tt)}, silent = TRUE), 'try-error')) {
      .dir.method = 'tcltk'
    } else if (ensure_library('gWidgets2') && ensure_library('RGtk2')) {
      .dir.method = 'gWidgets2RGtk2'
    } else if (ensure_library('rJava') && ensure_library('rChoiceDialogs')) {
      .dir.method = 'rChoiceDialogs'
    } else {
      .dir.method = 'console'
    }
    assign('.dir.method', .dir.method, envir = .GlobalEnv) # remember the chosen method for later
  }
  return(.dir.method)
}
choose_directory = function(method = select_directory_method(), title = 'Select data directory') {
switch (method,
'choose.dir' = choose.dir(caption = title),
'RStudioAPI' = selectDirectory(caption = title),
'tcltk' = tk_choose.dir(caption = title),
'rChoiceDialogs' = rchoose.dir(caption = title),
'gWidgets2RGtk2' = gfile(type = 'selectdir', text = title),
readline('Please enter directory path: ')
)
}

Here is a simple directory navigation menu (using menu{utils}):
repeat {
  wd = getwd()
  subdirs = list.dirs(recursive = FALSE, full.names = FALSE)
  # the title shows the last two path components, e.g. "../parent/current"
  title = file.path("..", basename(dirname(wd)), basename(wd))
  choice = menu(c("..", subdirs), title = title, graphics = TRUE)
  if (choice == 0) break          # 0 means the user cancelled: stop navigating
  if (choice == 1) {
    setwd(dirname(wd))            # ".." moves up one level
  } else {
    setwd(file.path(wd, subdirs[choice - 1]))
  }
}
Note: I did not (yet) test it under different systems. Here is what the documentation says:
If graphics = TRUE and a windowing system is available (Windows, macOS or X11 via Tcl/Tk) a listbox widget is used, otherwise a text menu. It is an error to use menu in a non-interactive session.
One limitation: The title = can only be a single line.

You can use the choose_dir function from easycsv.
It works on Windows, Linux and OSX.
easycsv::choose_dir() # can be run without parameters to prompt a folder selection window

For some use cases, a little trick might be to wrap file.choose() in dirname():
dir <- dirname(file.choose())
This will return the directory. It does, however, require at least one file to be present in the directory.

Suggestion for adapting choose_directory() so that it accepts an initial directory, as mentioned in my comment (06.09.2018 RFelber):
choose_directory <- function(ini_dir = getwd(),
method = select_directory_method(),
title = 'Select data directory') {
switch(method,
'choose.dir' = choose.dir(default = ini_dir, caption = title),
'RStudioAPI' = selectDirectory(path = ini_dir, caption = title),
'tcltk' = tk_choose.dir(default = ini_dir, caption = title),
'rChoiceDialogs' = rchoose.dir(default = ini_dir, caption = title),
'gWidgets2RGtk2' = gfile(type = 'selectdir', text = title, initial.dir = ini_dir),
readline('Please enter directory path: ')
)
}

Related

STRINGdb r environment; error in plot_network

I'm trying to use STRINGdb in R and I'm getting the following error when I try to plot the network:
Error in if (grepl("The document has moved", res)) { : argument is
of length zero
code:
library(STRINGdb)
#(specify organism)
string_db <- STRINGdb$new( version="10", species=9606, score_threshold=0)
filt_mapped = string_db$map(filt, "GeneID", removeUnmappedRows = TRUE)
head(filt_mapped)
(I have columns titled: GeneID, logFC, FDR, STRING_id with 156 rows)
filt_mapped_hits = filt_mapped$STRING_id
head(filt_mapped_hits)
(156 observations)
string_db$plot_network(filt_mapped_hits, add_link = FALSE)
Error in if (grepl("The document has moved", res)) { : argument is
of length zero
You are using a version of Bioconductor, and by extension of the STRINGdb package, that is several years old.
If you update to the newest one, it will work. However, the updated package supports only the latest version of STRING (currently version 11), so the underlying network may change a bit.
The more detailed reason is this:
STRING's hardware infrastructure recently underwent major changes, which forced a different server setup.
All the old calls are now forwarded to a different URL; however, the cURL call, as it was implemented, does not follow our redirects, which breaks the STRINGdb package functionality.
We cannot update the old Bioconductor package, and our server setup can't really be changed.
That said, the fix for an old version is relatively simple.
In the STRINGdb library there is a script, "rstring.r", that contains all the methods.
In it you'll find the “get_png” method. In that method, replace this line:
urlStr = paste("http://string-db.org/version_", version, "/api/image/network", sep="" )
With this line:
urlStr = paste("http://version", version, ".string-db.org/api/image/network", sep="" )
Load the library again and it should create the PNG, as before.
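To make the difference between the two lines concrete, here is what each paste() call produces for version "10" (the version used in the question). The helper name image_url is only for illustration and is not part of STRINGdb:

```r
# Illustrative helper (not part of STRINGdb): build the image endpoint URL
# in the old path-based form or the new subdomain-based form.
image_url <- function(version, legacy = FALSE) {
  if (legacy) {
    paste("http://string-db.org/version_", version, "/api/image/network", sep = "")
  } else {
    paste("http://version", version, ".string-db.org/api/image/network", sep = "")
  }
}

print(image_url("10", legacy = TRUE))
print(image_url("10"))
```

The version moves from the path into the host name, so the request goes straight to the versioned server and no redirect has to be followed.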

How to manage dependencies with renv explicitly

I would prefer to have a config file and list the packages within it which are needed for the project, rather than relying on renv::init() to scrape the project and find all which I need (it often can't).
So my question is - how do I explicitly tell renv which packages are required for a project, an example would be appreciated.
The renv package does all sorts of fancy things: installing from several different locations, setting up a project-specific library so that you can control the versions for a project, etc. If you need that stuff, I think you're out of luck. As far as I can see it has no way to pass in a list of dependencies, it needs to scan your source to find them. I suppose you could include a function like
loadPackages <- function() {
requireNamespace("foo")
requireNamespace("bar")
...
}
to make it easier for renv to find your required packages, but if it's failing in some other way (e.g. you have incomplete files that don't parse properly), this won't help.
If you don't need all that fancy stuff, you could use the function below:
needsPackages <- function(pkgs, install = TRUE, update = FALSE,
load = FALSE, attach = FALSE) {
missing <- c()
for (p in pkgs) {
if (!nchar(system.file(package = p)))
missing <- c(missing, p)
}
if (length(missing)) {
missing <- unique(missing)
if (any(install)) {
toinstall <- intersect(missing, pkgs[install])
install.packages(toinstall)
for (p in missing)
if (!nchar(system.file(package = p)))
stop("Did not install: ", p)
} else
stop("Missing packages: ", paste(missing, collapse = ", "))
}
if (any(update))
update.packages(oldPkgs = pkgs[update], ask = FALSE, checkBuilt = TRUE)
for (p in pkgs[load])
loadNamespace(p)
for (p in pkgs[attach])
library(p, character.only = TRUE)
}
which is what I've used in one project. You call it as
needsPackages(c("foo", "bar"))
and it installs the missing ones. It can also update, load, or attach them. It's just using the standard function install.packages to install from CRAN,
no fancy selection of install locations, or maintenance of particular package versions. If you do use something simple like this, you should run sessionInfo() afterwards to record package version numbers, in case you need to return to the same state later. (Though returning to that state will probably be painful!)
There are two possible ways forward here:
Configure renv to use "explicit" snapshots, as described in https://rstudio.github.io/renv/reference/snapshot.html#snapshot-type -- this workflow requires that you list your package requirements in your DESCRIPTION file;
Manually use renv::init(bare = TRUE) + renv::install(<packages>) (or your own package installation functions) to install the packages you need for your project, building the list of <packages> from some separate source that you maintain.
If you have a specific workflow that you wish renv would make possible, then you could consider filing a feature request at https://github.com/rstudio/renv/issues.
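As a rough illustration of the first option, the sketch below writes a minimal DESCRIPTION file into a temporary folder and reads the declared packages back. The package names foo and bar are the same placeholders used earlier, the folder name is made up, and the renv calls are left as comments because they assume renv is installed and an actual project is active:

```r
# Create a throwaway project folder (hypothetical name) with a DESCRIPTION file
project_dir <- file.path(tempdir(), "explicit-deps-demo")
dir.create(project_dir, showWarnings = FALSE)

writeLines(c(
  "Type: project",
  "Description: Example project with hand-maintained dependencies.",
  "Imports:",
  "    foo,",
  "    bar"
), file.path(project_dir, "DESCRIPTION"))

# Read the declared dependencies back, just to show what has been listed
desc <- read.dcf(file.path(project_dir, "DESCRIPTION"), fields = "Imports")
declared <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
print(declared)

# In a real project with renv installed, the next steps would be roughly:
# renv::settings$snapshot.type("explicit")
# renv::snapshot()
```

With this setup the DESCRIPTION file plays the role of the config file asked for in the question.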

R curl::has_internet() FALSE even though there is an internet connection

My problem arose when downloading data from EuroSTAT using the R package eurostat:
# Population data by NUTS3
pop_data <- subset(eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num"),
(age == "TOTAL") & (sex == "T") &
(nchar(trimws(geo)) == 5))[, c("time","geo","values")]
#Error in eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num") :
# You have no internet connection, please reconnect!
Searching, I have found that it is this statement (in the eurostat package code) that causes the problem:
if (!curl::has_internet()) stop("You have no internet connection, please reconnect!")
However, I do have an internet connection and can, e.g., ping www.eurostat.eu
I have tried curl::has_internet() on different computers, all with an internet connection. On some it works (returns TRUE), on others it doesn't.
I have talked with our IT department, and we tested whether it could be a firewall problem. Removing the firewall did not solve the problem.
Unfortunately, I know little about network settings. Hence, when trying to read the documentation for the curl package I am lost.
Downloading data from EuroSTAT using the command above has worked for at least the last 2 years, and for me the problem arose at the start of 2020 (January 7).
Hope someone can help with this, as downloading population data from EuroSTAT is a mandatory part of much of my/our regular work.
In the special case of curl::has_internet, you don't need to modify the function to return a specific value. It has its own enclosing environment, from which it reads a state variable indicating whether a proxy connection exists. You can modify that state variable instead.
assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))
curl::has_internet() # will always be TRUE
# [1] TRUE
It's difficult to tell without knowing your settings but there are a couple of things to try. This issue has been noted and possibly addressed in a development version which you can install with
install.packages("https://github.com/jeroen/curl/archive/master.tar.gz", repos = NULL)
You could also try updating libcurl, which is the C library for which the R package acts as an R interface. The problem you describe seems to be more common with older versions of libcurl.
If all else fails, you could overwrite the curl::has_internet function like this:
remove_has_internet <- function()
{
unlockBinding(sym = "has_internet", asNamespace("curl"))
assign("has_internet", function() return(TRUE), envir = asNamespace("curl"))
lockBinding(sym = "has_internet", asNamespace("curl"))
}
Now if you run remove_has_internet(), any call to curl::has_internet() will return TRUE for the remainder of your R session. However, this will only work if other curl functionality is working properly with your network settings. If it isn't then you will get other strange errors and should abandon this approach.
If, for any reason, you want to restore the functionality of the original curl::has_internet without restarting an R session, you can do this:
restore_has_internet <- function()
{
unlockBinding(sym = "has_internet", asNamespace("curl"))
assign("has_internet",
function() {!is.null(curl::nslookup("r-project.org", error = FALSE))}, # qualify nslookup: this closure is not created inside the curl namespace
envir = asNamespace("curl"))
lockBinding(sym = "has_internet", asNamespace("curl"))
}
I just got into this problem, so here's an additional solution, blending both previous answers. It's reversible and checks if we actually have internet to avoid bigger problems later.
# old value
op = get("has_internet_via_proxy", environment(curl::has_internet))
# check for internet
np = !is.null(curl::nslookup("r-project.org", error = FALSE))
assign("has_internet_via_proxy", np, environment(curl::has_internet))
Within a function, this line can be added to automatically revert the process:
on.exit(assign("has_internet_via_proxy", op, environment(curl::has_internet)))
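The save, override and restore steps can be seen in isolation in the sketch below. It deliberately uses a stand-in environment rather than curl's internals, so the names state, flag and with_flag are all hypothetical; only the on.exit() pattern is the point.

```r
# Stand-in for the package's private environment and its state variable
state <- new.env()
assign("flag", FALSE, envir = state)

# Temporarily set the flag, evaluate some code, then restore the old value
with_flag <- function(value, code) {
  old <- get("flag", envir = state)                        # remember old value
  on.exit(assign("flag", old, envir = state), add = TRUE)  # restore on exit
  assign("flag", value, envir = state)                     # override
  force(code)                                              # run the caller's code
}

inside <- with_flag(TRUE, get("flag", envir = state))  # value seen while overridden
after  <- get("flag", envir = state)                   # value once the call returns
print(c(inside = inside, after = after))
```

Because the restore is registered with on.exit(), it also runs if the wrapped code throws an error.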

Setting default `.libPaths()` for multiple versions of R

I work in an environment with multiple versions of R available. Managing my libraries can be something of a hassle, as I have to switch library locations to avoid issues with packages being built under different versions of R.
Is there a way to change my default library location in .libPaths() automatically depending on the version of R I'm using?
I found this trick useful.
Keep your locally installed R libraries in a directory named for their version, detect the version when R starts, and set .libPaths() accordingly.
Edit the .Rprofile file in your home directory to contain something like the following:
version <- paste0(R.Version()$major,".",R.Version()$minor)
if (version == "3.5.2") {
.libPaths( c("/path/to/Rlibs/3.5.2", .libPaths()) )
} else if (version == "3.4.3") {
.libPaths( c("/path/to/Rlibs/3.4.3", .libPaths()) )
}
Updated version that automatically creates a new library folder for a new R version if one is detected, also throws a warning when it does this in case you accidentally loaded a new R version when you weren't intending to.
# Set Version specific local libraries
## get current R version (in semantic format)
version <- paste0(R.Version()$major,".",R.Version()$minor)
## get username
uname <- Sys.getenv("USER") # USERNAME on windows because why make it easy?
## generate R library path for parent directory
libPath <- paste0("/home/", uname, "/Rlibs/")
setLibs <- function(libPath, ver) {
  libfull <- paste0(libPath, ver) # combine parent and version for full path
  if(!dir.exists(libfull)) { # create a new directory for this R version if it does not exist
    # Warn user (the necessity of creating a new library may indicate an inadvertent choice of the wrong R version)
    warning(paste0("Library for R version '", ver, "' does not exist; it will be created at: ", libfull ))
    dir.create(libfull, recursive = TRUE) # recursive, in case the parent directory is missing too
  }
  .libPaths(c(libfull, .libPaths()))
}
setLibs(libPath, version)
A slightly shorter version of Richard J. Acton's solution:
version <- paste0(R.Version()$major,".",R.Version()$minor)
libPath <- path.expand(file.path("~/.R/libs", version))
if(!dir.exists(libPath)) {
warning(paste0("Library for R version '", version, "' will be created at: ", libPath ))
dir.create(libPath, recursive = TRUE)
}
.libPaths(c(libPath, .libPaths()))
I have a standard path structure with a new folder added for each version number, and a folder called pax for packages. To do this you just need to add the following to your .Rprofile.
Rver <- paste0(R.Version()$major, ".", R.Version()$minor)
.libPaths(file.path(paste0(
"C:/Users/abcd/R/", Rver, "/pax")))
This means you aren't going to grow a forest of if statements if you have multiple versions.
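For comparison, here is a small sketch of the same idea using getRversion(), which returns the full version in one call. The function names version_lib_path and use_version_lib and the example root folder are made up for illustration:

```r
# Build the library path for a given R version under a common root folder
version_lib_path <- function(root, ver = as.character(getRversion())) {
  file.path(path.expand(root), ver)
}

# Create the folder if needed and put it first on the library search path
use_version_lib <- function(root) {
  lib <- version_lib_path(root)
  if (!dir.exists(lib)) {
    dir.create(lib, recursive = TRUE)
  }
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}

print(version_lib_path("/srv/Rlibs", "3.5.2"))

# In .Rprofile one would then call, for example:
# use_version_lib("~/Rlibs")
```

Keeping the path construction separate from the side effects makes it easy to check what folder would be used without touching .libPaths().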

Error when trying to deploy to shinyapps.io: Application depends on package "package" but it is not

My server.R contains the following code for dynamically installing packages when needed:
package <- input$chip
if (!require(package, character.only=T, quietly=T)) {
source("https://bioconductor.org/biocLite.R")
biocLite(package, ask = F, suppressUpdates = T, suppressAutoUpdate = T)
library(package, character.only=T)
}
ui.R has a select input element where the user can select one of the following bioconductor packages:
selectInput(inputId = 'chip', label='Chip', choices=c('Mouse Gene 1.0'='mogene10sttranscriptcluster.db',
'Mouse Gene 2.0'='mogene20sttranscriptcluster.db',
'Human Gene 1.0'='hugene10sttranscriptcluster.db',
'Human Genome U133A 2.0'='hgu133a2.db'))
So, based on what chip the user selects, the corresponding annotation package should get loaded, and if it is not already installed, it should install it.
This works on my local machine. But when I try to deploy my app on shinyapps.io, I get the following error:
Error:
* Application depends on package "package" but it is not installed. Please resolve before continuing.
I know that it is unable to recognize the package in biocLite(package, ask = F, suppressUpdates = T, suppressAutoUpdate = T). The deployment process thinks that package is a library name and not a variable and is unable to evaluate its value.
Is there any way to resolve this? Or do I have to explicitly load all required packages? The problem with explicitly loading the annotation packages is that these packages are so big they take up a lot of memory, which is why I wanted to load these packages only when required.
An alternative is to use an if-else chain or a switch statement to pick the package based on the selection:
package <- switch(input$chip,
  'mogene10sttranscriptcluster.db' = 'mogene10sttranscriptcluster.db',
  'mogene20sttranscriptcluster.db' = 'mogene20sttranscriptcluster.db',
  'hugene10sttranscriptcluster.db' = 'hugene10sttranscriptcluster.db',
  'hgu133a2.db' = 'hgu133a2.db')
library(package, character.only = TRUE)
But even in this case, the deployment process won't be able to evaluate the package value.
Thanks!
UPDATE:
Taking Yihui's suggestion, I modified my code to:
package <- input$genome
if(!do.call(require, list(package = package, character.only = T, quietly = T))){
do.call(biocLite, list(pkgs = package, ask = F, suppressUpdates = T, suppressAutoUpdate = T))
do.call(library, list(package = package, character.only = TRUE))
}
The application is able to deploy now, but it throws me this error:
Error: unable to install packages
Unfortunately, you have to fool the shinyapps (or rsconnect) package a bit so that it does not detect package as a literal package name. For example, you may use do.call():
do.call(library, list(package = package, character.only = TRUE))
The ShinyApps.io server does not allow you to install packages on the fly (strictly speaking, this is not true, but I don't want to show you how). You have to declare all packages you need in the app as dependencies beforehand. Again, it is a hack:
if (FALSE) {
library(mogene10sttranscriptcluster.db)
library(mogene20sttranscriptcluster.db)
library(hugene10sttranscriptcluster.db)
library(hgu133a2.db)
}
Then ShinyApps.io will detect these packages as dependencies and pre-install them for you. What you need to do in your app is simply load them, and you don't need to install them by yourself.
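Putting the two hacks together, the loading step inside the app could look roughly like this. It is a sketch only: load_allowed_package is a hypothetical helper name, and the example call uses the base package tools as a stand-in so that it runs anywhere; in the app the allowed list would be the four annotation packages from the question.

```r
# Hypothetical helper: load a package chosen at run time, but only from a
# fixed list of packages that were declared as dependencies beforehand.
load_allowed_package <- function(pkg, allowed) {
  if (!is.character(pkg) || length(pkg) != 1 || !(pkg %in% allowed)) {
    stop("Package not in the allowed list: ", paste(pkg, collapse = ", "))
  }
  # do.call() keeps the variable from being read as a literal package name
  do.call(library, list(package = pkg, character.only = TRUE))
  invisible(TRUE)
}

# Stand-in example with base packages; in the app this would be
# load_allowed_package(input$chip, allowed = <the four .db packages>)
load_allowed_package("tools", allowed = c("tools", "stats"))
```

Checking against a fixed list also guards against an unexpected input value reaching library().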
