Check if package installation is required while running code via source()? - r

I am running several scripts in RStudio and checking them for syntax errors, using source() in a loop. Some scripts contain install.packages("packagename") calls. My problem is that when the required packages are already installed on my computer, a message pops up asking me to update the library. In these cases, I would like to "ignore" the install.packages("packagename") call and let the code continue running without showing any message.
So, how can I check whether package installation is required while running code via source()?

Bit of a hack, but given the file's location, this will list all the packages referenced inside the script:
require(readr)
require(stringr)
listPackages <- function(file)
{
r <- readr::read_file(file)
r = str_replace_all(r, '\\"', "") # remove all quote marks
packages <- str_extract_all(r, regex("(install.packages|library|require|p_load)\\([:alnum:]*\\)*", multiline = TRUE))[[1]]
return(unique(gsub("\\(", "", str_extract(packages, regex("\\([:alnum:]*", multiline = F)))))
}
Example
test.R:
library("ggplot2")
library(stats)
require("cowplot")
require(MASS)
add <- function(x,y) {x+y}
install.packages("cowplot")
p_load(dplyr)
p_load("dplyr")
listPackages("test.R")
# [1] "ggplot2" "stats" "cowplot" "MASS" "dplyr"
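To address the original question more directly, one option is to mask install.packages() while the script is being sourced. The following is a minimal sketch rather than a tested solution for every script: source_skipping_installed is a hypothetical helper name, and it assumes the scripts call install.packages() unqualified (a call written as utils::install.packages() would bypass the mask).

```r
# Hypothetical helper: source a script with install.packages() masked so that
# packages which are already installed are skipped silently.
source_skipping_installed <- function(file) {
  env <- new.env(parent = globalenv())
  env$install.packages <- function(pkgs, ...) {
    # system.file() returns "" when a package is not installed
    is_installed <- vapply(pkgs,
                           function(p) nzchar(system.file(package = p)),
                           logical(1))
    missing_pkgs <- pkgs[!is_installed]
    if (length(missing_pkgs) > 0) {
      utils::install.packages(missing_pkgs, ...)
    }
    invisible(missing_pkgs)
  }
  # Evaluate the script in env, so its install.packages() calls hit the mask
  source(file, local = env)
  invisible(env)
}
```

Calling source_skipping_installed("test.R") in the loop instead of source("test.R") would then install only what is actually missing. Note that objects created by the script end up in the returned environment rather than in the global environment.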

Related

How to conveniently install uninstalled packages found in library() calls in a script in RStudio?

Suppose I'm given a script with a bunch of packages, some of which I already have installed and others I don't, is there a quick/easy way (keyboard shortcut perhaps?) to make RStudio 1. recognise the library() calls, and 2. install any packages that aren't already installed?
Note: I recall a small notification used to appear toward the top of the RStudio script pane, but it doesn't seem to happen for me -- perhaps that feature was removed or I need to do something to trigger it.
An example of a script with a lot of library() calls:
library(shiny) # not installed
library(shinydashboard) # not installed
library(dplyr) # already installed
library(tm) # etc etc
library(wordcloud)
library(memoise)
library(janeaustenr)
library(tidyverse)
library(tidytext)
library(wordcloud2)
library(tidyr)
# Truncated for brevity
Well, I found that one way is to save the script; that will trigger the notification offering to install uninstalled packages.
These approaches do not depend on RStudio or any infrastructure other than R itself. If you give the script to someone else, they will still work even if that person is not using RStudio.
1) Define your own library function at the top of the script which checks if the package named in the package argument is installed and if not installs it. Note that require, used in this script, loads the package if it is present and then returns TRUE. It returns FALSE if the package was not present.
library <- function(package, ...) {
pkg <- as.character(substitute(package))
if (!require(pkg, character.only = TRUE, quietly = TRUE, ...)) {
install.packages(pkg)
base::library(pkg, character.only = TRUE, ...)
}
}
2) If being notified that a package is not installed is good enough, you don't have to do anything: the ordinary library statement fails when the package is not installed, halting the script with a message such as the following, which identifies the uninstalled package.
Error in library(xyz) : there is no package called ‘xyz’
If you go this route make sure that either there are no package redundancies in the library statements or at least that the library statements for dependencies come after the packages that depend on them. This is only to optimize installation iterations and you don't need to do it if you are willing to risk having additional iterations if multiple dependent packages are not installed.
For example, from the dependency relationships below we see that, to minimize installation iterations, the library statement for dplyr should come after that for tidyr, and the library statement for tidyr should come after that for tidyverse. Also, the library statement for shiny should come after that for shinydashboard. (You can use the R function tsort from https://rosettacode.org/wiki/Topological_sort, which accepts deps and produces a possible ordering for those packages, but we can also just inspect deps, or skip deps entirely and rely on our knowledge of those packages.)
library(tools)
p <- c("shiny", "shinydashboard", "dplyr", "tm", "wordcloud",
"memoise", "janeaustenr", "tidyverse", "tidytext", "wordcloud2",
"tidyr")
deps <- Filter(length, Map(function(x) intersect(dependsOnPkgs(x), p), p))
deps
## $shiny
## [1] "shinydashboard"
##
## $dplyr
## [1] "tidyr" "tidyverse"
##
## $tidyr
## [1] "tidyverse"
tsort(deps)
## [1] "shinydashboard" "tidyverse" "shiny" "tidyr"
## [5] "dplyr"
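To see in one pass which packages a script needs but the machine lacks, the library() calls can be read from the parsed script rather than run. This is a minimal sketch under stated assumptions: library_calls and missing_packages are hypothetical helper names, and only simple top-level calls such as library(pkg) or require("pkg") are recognised.

```r
# Hypothetical helper: packages named in top-level library()/require() calls.
library_calls <- function(file) {
  exprs <- parse(file)
  is_lib <- vapply(exprs, function(e) {
    is.call(e) && length(e) >= 2 &&
      as.character(e[[1]])[1] %in% c("library", "require")
  }, logical(1))
  # The first argument is either a bare name or a string; both become character
  unique(vapply(exprs[is_lib], function(e) as.character(e[[2]]),
                character(1), USE.NAMES = FALSE))
}

# Hypothetical helper: packages named in the script that are not installed.
missing_packages <- function(file) {
  setdiff(library_calls(file), rownames(installed.packages()))
}
```

The result of missing_packages("script.R") can be passed to install.packages() when it is non-empty. Because the script is only parsed, nothing in it is executed.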

Can R differentiate between a manually loaded library and a dependency

I have written a function to get the name and version of all of my loaded packages:
my_lib <- function(){
tmp <- (.packages())
tmp_base <- sessionInfo()$basePkgs
tmp <- setdiff(tmp, tmp_base)
tmp <- sort(tmp)
tmp <- sapply(tmp, function(x){
x <- paste(x, utils::packageVersion(x), sep = ' v')
})
tmp <- paste(tmp, collapse=', ')
return(tmp)
}
This also returns all packages loaded as dependencies of other packages (e.g. I load car and carData is loaded as a dependency).
I am wondering if there is a way to return only the packages I loaded manually (e.g. just car)? Can R tell the difference between packages loaded manually and those loaded as a dependency?
Edit:
Added line to remove base packages using sessionInfo()
R has a subtle difference between a loaded package and an attached package.
A package is attached when you use the library function,
and it makes its exported functions "visible" to the user's global environment.
If a package is attached,
its namespace has been loaded,
but the opposite is not necessarily true.
Each package can define two main types of dependencies: Depends and Imports.
The packages in the former get attached as soon as the dependent package is attached,
but the packages in the latter only get loaded.
This means you can't completely differentiate,
because you may call library for a specific package,
but any packages it Depends on will also be attached.
Nevertheless, you can differentiate between loaded and attached packages with loadedNamespaces() and search().
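As a small illustration of that last point, the two functions can be compared directly. The variable names are only illustrative.

```r
# Namespaces that are loaded, whether or not they are attached
loaded <- loadedNamespaces()

# Packages on the search path, i.e. attached
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Loaded but not attached: typically pulled in through another package's Imports
loaded_only <- setdiff(loaded, attached)
loaded_only
```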
EDIT: It just occurred to me that if you want to track usage of library
(ignoring require),
you could write a custom tracker:
library_tracker <- with(new.env(), {
packages <- character()
function(flag) {
if (missing(flag)) {
packages <<- union(packages, as.character(substitute(package, parent.frame())))
}
packages
}
})
trace("library", library_tracker, print = FALSE)
library("dplyr")
library(data.table)
# retrieve packages loaded so far
library_tracker(TRUE)
# [1] "dplyr" "data.table"
The flag parameter is just used to distinguish between calls made by trace,
which call the function without parameters,
and those made outside of it,
in order to easily retrieve packages loaded so far.
You could also use environment(library_tracker)$packages.

How to find out if I am using a package

My R script has evolved over many months with many additions and subtractions. It is long and rambling and I would like to find out which packages I am actually using in the code so I can start deleting library() references. Is there a way of finding redundant dependencies in my R script?
I saw this question so I tried:
library(mvbutils)
library(MyPackage)
library(dplyr)
foodweb( funs=find.funs( asNamespace( 'EndoMineR')), where=
asNamespace( 'EndoMineR'), prune='filter')
But that really tells me where I am using a function from a package whereas I don't necessarily remember which functions I have used from which package.
I tried packrat but this is looking for projects whereas mine is a directory of scripts I am trying to build into a package.
To do this you first parse your file and then use getAnywhere to check for each terminal token in which namespace it is defined. Compare the result with the searchpath and you will have the answer. I compiled a function that takes a filename as argument and returns the packages which are used in the file. Note that it can only find packages that are loaded when the function is executed, so be sure to load all "candidates" first.
findUsedPackages <- function(sourcefile) {
## get parse tree
## keep.source = TRUE is needed for parse data in non-interactive sessions
parse_data <- getParseData(parse(sourcefile, keep.source = TRUE))
## extract terminal tokens
terminal_tokens <- parse_data[parse_data$terminal == TRUE, "text"]
## get loaded packages/namespaces
search_path <- search()
## helper function to find the package a token belongs to
findPackage <- function(token) {
## get info where the token is defined
token_info <- getAnywhere(token)
##return the package that comes first in the search path
token_info$where[which.min(match(token_info$where, search_path))]
}
packages <- lapply(unique(terminal_tokens), findPackage)
##remove elements that do not belong to a loaded namespace
packages <- packages[sapply(packages, length) > 0]
packages <- do.call(c, packages)
packages <- unique(packages)
##do not return base and .GlobalEnv
## logical indexing also works when neither entry is present
packages[!packages %in% c("package:base", ".GlobalEnv")]
}
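A narrower variant of the same idea looks only at function-call tokens and asks utils::find() where each name is defined on the search path. This is a sketch, not a replacement for the function above: packages_of_calls is a hypothetical name, and, as before, only packages attached at the time of the call can be detected.

```r
# Hypothetical helper: search-path entries that define the functions called in a script.
packages_of_calls <- function(sourcefile) {
  pd <- getParseData(parse(sourcefile, keep.source = TRUE))
  # Names used in call position, e.g. `median` in median(x)
  calls <- unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
  # find() returns matching search-path entries in search order; keep the first
  first_hit <- vapply(calls, function(f) {
    hits <- utils::find(f, mode = "function")
    if (length(hits) > 0) hits[1] else NA_character_
  }, character(1), USE.NAMES = FALSE)
  sort(unique(first_hit[!is.na(first_hit)]))
}
```

Functions defined inside the script itself are not on the search path until the script has been run, so they are dropped from the result.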

R: Patching a package function and reloading base libraries

Occasionally one wants to patch a function in a package, without recompiling the whole package.
For example, in Emacs ESS, the function install.packages() might get stuck if tcltk is not loaded. One might want to patch install.packages() in order to require tcltk before installation and unload it after the package setup.
A patched version of install.packages(), stored in temp, might be:
## Get original args without ending NULL
temp=rev(rev(deparse(args(install.packages)))[-1])
temp=paste(paste(temp, collapse="\n"),
## Add code to load tcltk
"{",
" wasloaded= 'package:tcltk' %in% search()",
" require(tcltk)",
## Add original body without braces
paste(rev(rev(deparse(body(install.packages))[-1])[-1]), collapse="\n"),
## Unload tcltk if it was not loaded before by user
" if(!wasloaded) detach('package:tcltk', unload=TRUE)",
"}\n",
sep="\n")
## Eval patched function
temp=eval(parse(text=temp))
# temp
Now we want to replace the original install.packages() and perhaps insert the code in Rprofile.
To this end it is worth noting that:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... install.packages() source follows (quite lengthy)
That is, the function is stored inside the package/namespace of utils. This environment is sealed and therefore install.packages() should be unlocked before being replaced:
## Override original function
unlockBinding("install.packages", as.environment("package:utils"))
assign("install.packages", temp, envir=as.environment("package:utils"))
unlockBinding("install.packages", asNamespace("utils"))
assign("install.packages", temp, envir=asNamespace("utils"))
rm(temp)
Using getAnywhere() again, we get:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... the *new* install.packages() source follows
It seems that the patched function is placed in the right place.
Unfortunately, running it gives:
Error in install.packages(xxxxx) :
could not find function "getDependencies"
getDependencies() is a function inside the same utils package, but not exported; therefore it is not accessible outside its namespace.
Despite the output of getAnywhere("install.packages"), the patched install.packages() is still misplaced.
The problem is that we need to reload the utils library to obtain the desired effect, which also requires unloading other libraries importing it.
detach("package:stats", unload=TRUE)
detach("package:graphics", unload=TRUE)
detach("package:grDevices", unload=TRUE)
detach("package:utils", unload=TRUE)
library(utils)
install.packages() works now.
Of course, we need to reload the other libraries too. Given the dependencies, using
library(stats)
should reload everything. But there is a problem when reloading the graphics library, at least on Windows:
library(graphics)
# Error in FUN(X[[i]], ...) :
# no such symbol C_contour in package path/to/library/graphics/libs/x64/graphics.dll
Which is the correct way of (re)loading the graphics library?
Patching functions in packages is a low-level operation that should be avoided, because it may break internal assumptions of the execution environment and lead to unpredictable behavior or crashes. If there is a problem with tcltk/ESS (I didn't try to reproduce it), perhaps it should be fixed or there may be a workaround. In particular, changing locked bindings is something to avoid.
If you really wanted to run some code at the start/end of say install.packages, you can use trace. It will do some of the low-level operations mentioned in the question, but the good part is you don't have to worry about fixing this whenever some new internals of R change.
trace(install.packages,
tracer=quote(cat("Starting install.packages\n")),
exit=quote(cat("Ending install packages.\n"))
)
Replace tracer and exit accordingly - maybe exit is not needed anyway, maybe you don't need to unload the package. Still, trace is a very useful tool for debugging.
I am not sure if that will solve your problem - if it would work with ESS - but in general you can also wrap install.packages in a function you define say in your workspace:
install.packages <- function(...) {
cat("Entry.\n")
on.exit(cat("Exit.\n"))
utils::install.packages(...)
}
This is the cleanest option indeed.
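A possible explanation for the getDependencies error in the question, offered as a sketch and not verified against ESS: a function built with eval(parse(text = ...)) takes the environment in which eval() ran as its enclosure, not the utils namespace. Name lookup inside a function follows its enclosure, so non-exported utils functions stay out of reach regardless of where the binding is assigned. The name patched below is illustrative only.

```r
# Build a function from text, as the question does
patched <- eval(parse(text =
  "function() exists('getDependencies', mode = 'function')"))

# Its enclosure is wherever eval() ran, not the utils namespace
environment(patched)
environment(utils::install.packages)   # <environment: namespace:utils>

# Re-parenting the closure is what makes utils internals visible to it
environment(patched) <- asNamespace("utils")
environmentName(environment(patched))
```

This only illustrates the lookup rule. The wrapper defined above sidesteps the issue because it delegates to utils::install.packages() instead of copying its body.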

Check if R package is installed then load library

Our R scripts are used by multiple users on multiple computers, and hence the set of installed packages differs from computer to computer. To ensure that each script works for all users I would like to define a function pkgLoad which will first test if the package is installed locally before loading the library with suppressed startup messages. Using Check for installed packages before running install.packages() as a guide, I tried
pkgLoad <- function(x)
{
if (!require(x,character.only = TRUE))
{
install.packages(x,dep=TRUE, repos='http://star-www.st-andrews.ac.uk/cran/')
if(!require(x,character.only = TRUE)) stop("Package not found")
}
#now load library and suppress warnings
suppressPackageStartupMessages(library(x))
library(x)
}
When I try to load ggplot2 using pkgLoad("ggplot2") I get the following error message in my terminal
Error in paste("package", package, sep = ":") :
object 'ggplot2' not found
> pkgLoad("ggplot2")
Loading required package: ggplot2
Error in library(x) : there is no package called ‘x’
> pkgLoad("ggplot2")
Error in library(x) : there is no package called ‘x’
Any idea why x changes from ggplot2 to plain old x?
I wrote this function the other day that I thought would be useful...
install_load <- function (package1, ...) {
# convert arguments to vector
packages <- c(package1, ...)
# start loop to determine if each package is installed
for(package in packages){
# if package is installed locally, load
if(package %in% rownames(installed.packages()))
do.call('library', list(package))
# if package is not installed locally, download, then load
else {
install.packages(package)
do.call("library", list(package))
}
}
}
The CRAN pacman package that I maintain can address this nicely. Use the following header (to ensure pacman is installed first); the p_load function then tries to load each package and gets it from CRAN if R can't load it.
if (!require("pacman")) install.packages("pacman"); library(pacman)
p_load(qdap, ggplot2, fakePackage, dplyr, tidyr)
Use library(x,character.only=TRUE). Also you don't need the last line as suppressPackageStartupMessages(library(x,character.only=TRUE)) already loads the package.
EDIT: @LarsKotthoff is right, you already load the package inside the if condition. There you already use the option character.only=TRUE, so everything is fine if you just remove the last two lines of your function body.
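Put together, the corrected function might look like the sketch below. It assumes a CRAN mirror is already configured through options("repos"), so the repos argument from the question is left out. Returning invisible(TRUE) is an addition for convenience, not part of the original.

```r
pkgLoad <- function(x) {
  # character.only = TRUE makes require() use the value of x, not the name "x"
  if (!suppressPackageStartupMessages(require(x, character.only = TRUE))) {
    install.packages(x, dependencies = TRUE)
    if (!suppressPackageStartupMessages(require(x, character.only = TRUE))) {
      stop("Package not found: ", x)
    }
  }
  invisible(TRUE)
}
```

With this version, pkgLoad("ggplot2") attaches ggplot2 if it is installed and otherwise tries to install it first.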
The following can be used:
check.and.install.Package<-function(package_name){
if(!package_name %in% rownames(installed.packages())){
install.packages(package_name)
}
}
check.and.install.Package("RTextTools")
check.and.install.Package("e1071")
Though @maloneypatr's function works fine, it is quite silent and does not report whether the packages were loaded successfully. I built the function below, which performs some checks on the user's input and also reports on the packages that were successfully installed.
lubripack <- function(...,silent=FALSE){
#check names and run 'require' function over if the given package is installed
requirePkg<- function(pkg){if(length(setdiff(pkg,rownames(installed.packages())))==0)
require(pkg, quietly = TRUE,character.only = TRUE)
}
packages <- as.vector(unlist(list(...)))
if(!is.character(packages))stop("No numeric allowed! Input must contain package names to install and load")
if (length(setdiff(packages,rownames(installed.packages()))) > 0 )
install.packages(setdiff(packages,rownames(installed.packages())),
repos = c("https://cran.revolutionanalytics.com/", "http://owi.usgs.gov/R/"))
res<- unlist(sapply(packages, requirePkg))
if(silent == FALSE && !is.null(res)) {cat("\nBelow Packages Successfully Installed:\n\n")
print(res)
}
}
Note 1:
If silent = TRUE (TRUE in all capitals), it installs and loads packages without reporting. If silent = FALSE, it reports successful installation of packages. The default is silent = FALSE.
How to use
lubripack("pkg1","pkg2",...,"pkg")
Example 1: When all packages are valid and mode is not silent
lubripack("shiny","ggvis")
or
lubripack("shiny","ggvis", silent = FALSE)
Example 2: When all packages are valid and mode is silent
lubripack("caret","ggvis","tm", silent = TRUE)
Example 3: When package cannot be found
lubripack("shiny","ggvis","invalidpkg", silent=FALSE)
How to Install Package:
Run the code below to download the package and install it from GitHub. No GitHub account is needed.
library(devtools)
install_github("espanta/lubripack")

Resources