Why require() command is preferred within functions in R? [duplicate] - r

What is the difference between require() and library()?

There's not much of one in everyday work.
However, according to the documentation for both functions (accessed by putting a ? before the function name and hitting enter), require is used inside functions, as it outputs a warning and continues if the package is not found, whereas library will throw an error.

Another benefit of require() is that it returns a logical value by default. TRUE if the packages is loaded, FALSE if it isn't.
> test <- library("abc")
Error in library("abc") : there is no package called 'abc'
> test
Error: object 'test' not found
> test <- require("abc")
Loading required package: abc
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'abc'
> test
[1] FALSE
So you can use require() in constructions like the one below. Which mainly handy if you want to distribute your code to our R installation were packages might not be installed.
if(require("lme4")){
print("lme4 is loaded correctly")
} else {
print("trying to install lme4")
install.packages("lme4")
if(require(lme4)){
print("lme4 installed and loaded")
} else {
stop("could not install lme4")
}
}

In addition to the good advice already given, I would add this:
It is probably best to avoid using require() unless you actually will be using the value it returns e.g in some error checking loop such as given by thierry.
In most other cases it is better to use library(), because this will give an error message at package loading time if the package is not available. require() will just fail without an error if the package is not there. This is the best time to find out if the package needs to be installed (or perhaps doesn't even exist because it it spelled wrong). Getting error feedback early and at the relevant time will avoid possible headaches with tracking down why later code fails when it attempts to use library routines

You can use require() if you want to install packages if and only if necessary, such as:
if (!require(package, character.only=T, quietly=T)) {
install.packages(package)
library(package, character.only=T)
}
For multiple packages you can use
for (package in c('<package1>', '<package2>')) {
if (!require(package, character.only=T, quietly=T)) {
install.packages(package)
library(package, character.only=T)
}
}
Pro tips:
When used inside the script, you can avoid a dialog screen by specifying the repos parameter of install.packages(), such as
install.packages(package, repos="http://cran.us.r-project.org")
You can wrap require() and library() in suppressPackageStartupMessages() to, well, suppress package startup messages, and also use the parameters require(..., quietly=T, warn.conflicts=F) if needed to keep the installs quiet.

Always use library. Never use require.
tl;dr: require breaks one of the fundamental rules of robust software systems: fail early.
In a nutshell, this is because, when using require, your code might yield different, erroneous results, without signalling an error. This is rare but not hypothetical! Consider this code, which yields different results depending on whether {dplyr} can be loaded:
require(dplyr)
x = data.frame(y = seq(100))
y = 1
filter(x, y == 1)
This can lead to subtly wrong results. Using library instead of require throws an error here, signalling clearly that something is wrong. This is good.
It also makes debugging all other failures more difficult: If you require a package at the start of your script and use its exports in line 500, you’ll get an error message “object ‘foo’ not found” in line 500, rather than an error “there is no package called ‘bla’”.
The only acceptable use case of require is when its return value is immediately checked, as some of the other answers show. This is a fairly common pattern but even in these cases it is better (and recommended, see below) to instead separate the existence check and the loading of the package. That is: use requireNamespace instead of require in these cases.
More technically, require actually calls library internally (if the package wasn’t already attached — require thus performs a redundant check, because library also checks whether the package was already loaded). Here’s a simplified implementation of require to illustrate what it does:
require = function (package) {
already_attached = paste('package:', package) %in% search()
if (already_attached) return(TRUE)
maybe_error = try(library(package, character.only = TRUE))
success = ! inherits(maybe_error, 'try-error')
if (! success) cat("Failed")
success
}
Experienced R developers agree:
Yihui Xie, author of {knitr}, {bookdown} and many other packages says:
Ladies and gentlemen, I've said this before: require() is the wrong way to load an R package; use library() instead
Hadley Wickham, author of more popular R packages than anybody else, says
Use library(x) in data analysis scripts. […]
You never need to use require() (requireNamespace() is almost always better)

?library
and you will see:
library(package) and require(package) both load the package with name
package and put it on the search list. require is designed for use
inside other functions; it returns FALSE and gives a warning (rather
than an error as library() does by default) if the package does not
exist. Both functions check and update the list of currently loaded
packages and do not reload a package which is already loaded. (If you
want to reload such a package, call detach(unload = TRUE) or
unloadNamespace first.) If you want to load a package without putting
it on the search list, use requireNamespace.

My initial theory about the difference was that library loads the packages whether it is already loaded or not, i.e. it might reload an already loaded package, while require just checks that it is loaded, or loads it if it isn't (thus the use in functions that rely on a certain package). The documentation refutes this, however, and explicitly states that neither function will reload an already loaded package.

Here seems to be the difference on an already loaded package.
While it is true that both require and library do not load the package. Library does a lot of other things before it checks and exits.
I would recommend removing "require" from the beginning of a function running 2mil times anyway, but if, for some reason I needed to keep it. require is technically a faster check.
microbenchmark(req = require(microbenchmark), lib = library(microbenchmark),times = 100000)
Unit: microseconds
expr min lq mean median uq max neval
req 3.676 5.181 6.596968 5.655 6.177 9456.006 1e+05
lib 17.192 19.887 27.302907 20.852 22.490 255665.881 1e+05

Related

How to import sf to package to run a function that depends on lwgeom?

I'm building a package that imports {sf}, and more specifically I use st_length() in one of my functions.
I initially added only {sf} to my package "Imports", but when I checked it I got a few {lwgeom} related errors:
Running examples in 'gtfstools-Ex.R' failed
The error most likely occurred in:
> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: get_trip_speed
> ### Title: Get trip speed
> ### Aliases: get_trip_speed
>
> ### ** Examples
>
> data_path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
>
> gtfs <- read_gtfs(data_path)
>
> trip_speed <- get_trip_speed(gtfs)
Error in sf::st_length(trips_geometries) :
package lwgeom required, please install it first
This error happens when the examples are running, but some similar errors happen with the tests.
Then I added {lwgeom} to Imports. The check runs fine, but in the end I get a note: NOTE: Namespaces in Imports field not imported from: 'lwgeom'
What's the best practice when dealing with cases like this? Should I just keep track of this note and send it as a comment to CRAN during the package submission process?
You can consider adding the {lwgeom} package in Suggests field of your package DESCRIPTION file. It should do the trick.
The Suggests != Depends article by Dirk Eddelbuettel refers to a relevant bit of Writing R Extensions (WRE) that might be useful to this case.
Section 1.1.3.1 (suggested packages) reads (as of 2021-03-12):
Note that someone wanting to run the examples/tests/vignettes may not have a suggested package available (and it may not even be possible to install it for that platform). The recommendation used to be to make their use conditional via if(require("pkgname")): this is OK if that conditioning is done in examples/tests/vignettes, although using if(requireNamespace("pkgname")) is preferred, if possible.
However, using require for conditioning in package code is not good practice as it alters the search path for the rest of the session and relies on functions in that package not being masked by other require or library calls. It is better practice to use code like
if (requireNamespace("rgl", quietly = TRUE)) {
rgl::plot3d(...)
} else {
## do something else not involving rgl.
}
So while just adding {lwgeom} to Suggests works, we may stumble upon the issue where someone that runs a "lean installation" (i.e. without suggested packages) of my package won't be able to use the functions that rely on {lwgeom}.
More importantly, if an author of a package that I am importing decides to run a reverse dependency check on my package while not installing suggested packages, the check would fail because I'd have a few examples, tests and vignettes bits failing due to not having {lwgeom} available.
Thus, in addition to listing it in Suggests, I added some checks on examples and vignettes like suggested by WRE:
*examples/vignette context*
# the examples below require the 'lwgeom' package to be installed
if (requireNamespace("lwgeom", quietly = TRUE)) {
... do something ...
}
In the functions that require {lwgeom} I added:
if (!requireNamespace("lwgeom", quietly = TRUE))
stop(
"The 'lwgeom' package is required to run this function. ",
"Please install it first."
)
And added this bit to the tests of such functions (using {testthat}):
if (!requireNamespace("lwgeom", quietly = TRUE)) {
expect_error(
set_trip_speed(gtfs, "CPTM L07-0", 50),
regexp = paste0(
"The \\'lwgeom\\' package is required to run this function\\. ",
"Please install it first\\."
)
)
skip("'lwgeom' package required to run set_trip_speed() tests.")
}

R: Patching a package function and reloading base libraries

Occasionally one wants to patch a function in a package, without recompiling the whole package.
For example, in Emacs ESS, the function install.packages() might get stuck if tcltk is not loaded. One might want to patch install.packages() in order to require tcltk before installation and unload it after the package setup.
A temp() patched version of install.packages() might be:
## Get original args without ending NULL
temp=rev(rev(deparse(args(install.packages)))[-1])
temp=paste(paste(temp, collapse="\n"),
## Add code to load tcltk
"{",
" wasloaded= 'package:tcltk' %in% search()",
" require(tcltk)",
## Add orginal body without braces
paste(rev(rev(deparse(body(install.packages))[-1])[-1]), collapse="\n"),
## Unload tcltk if it was not loaded before by user
" if(!wasloaded) detach('package:tcltk', unload=TRUE)",
"}\n",
sep="\n")
## Eval patched function
temp=eval(parse(text=temp))
# temp
Now we want to replace the original install.packages() and perhaps insert the code in Rprofile.
To this end it is worth nothing that:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... install.packages() source follows (quite lengthy)
That is, the function is stored inside the package/namespace of utils. This environment is sealed and therefore install.packages() should be unlocked before being replaced:
## Override original function
unlockBinding("install.packages", as.environment("package:utils"))
assign("install.packages", temp, envir=as.environment("package:utils"))
unlockBinding("install.packages", asNamespace("utils"))
assign("install.packages", temp, envir=asNamespace("utils"))
rm(temp)
Using getAnywhere() again, we get:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... the *new* install.packages() source follows
It seems that the patched function is placed in the right place.
Unfortunately, running it gives:
Error in install.packages(xxxxx) :
could not find function "getDependencies"
getDependencies() is a function inside the same utils package, but not exported; therefore it is not accessible outside its namespace.
Despite the output of getAnywhere("install.packages"), the patched install.packages() is still misplaced.
The problem is that we need to reload the utils library to obtain the desired effect, which also requires unloading other libraries importing it.
detach("package:stats", unload=TRUE)
detach("package:graphics", unload=TRUE)
detach("package:grDevices", unload=TRUE)
detach("package:utils", unload=TRUE)
library(utils)
install.packages() works now.
Of course, we need to reload the other libraries too. Given the dependencies, using
library(stats)
should reload everything. But there is a problem when reloading the graphics library, at least on Windows:
library(graphics)
# Error in FUN(X[[i]], ...) :
# no such symbol C_contour in package path/to/library/graphics/libs/x64/graphics.dll
Which is the correct way of (re)loading the graphics library?
Patching functions in packages is a low-level operation that should be avoided, because it may break internal assumptions of the execution environment and lead to unpredictable behavior/crashes. If there is a problem with tck/ESS (I didn't try to repeat that) perhaps it should be fixed or there may be a workaround. Particularly changing locked bindings is something to avoid.
If you really wanted to run some code at the start/end of say install.packages, you can use trace. It will do some of the low-level operations mentioned in the question, but the good part is you don't have to worry about fixing this whenever some new internals of R change.
trace(install.packages,
tracer=quote(cat("Starting install.packages\n")),
exit=quote(cat("Ending install packages.\n"))
)
Replace tracer and exit accordingly - maybe exit is not needed anyway, maybe you don't need to unload the package. Still, trace is a very useful tool for debugging.
I am not sure if that will solve your problem - if it would work with ESS - but in general you can also wrap install.packages in a function you define say in your workspace:
install.packages <- function(...) {
cat("Entry.\n")
on.exit(cat("Exit.\n"))
utils::install.packages(...)
}
This is the cleanest option indeed.

Dealing with conflicting namespaces in R (same function names in different packages): reset precedence of a package namespace

The name conflicts between namespaces from different packages in R can be dangerous, and the use of package::function is unfortunately not generalized in R...
Isn't there a function that can reset the precedence of a package namespace over all the others currently loaded?
Surely we can detach and then reload the package, but isn't there any other, more practical (one-command) way?
Because I often end up with many packages and name conflicts in my R sessions, I use the following function to do that:
set_precedence <- function(pckg) {
pckg <- deparse(substitute(pckg))
detach(paste("package", pckg, sep = ":"), unload=TRUE, character.only=TRUE)
library(pckg, character.only=TRUE)
}
# Example
set_precedence(dplyr)
No built-in way to achieve this in a single command?
Or a way that doesn't imply detaching and reloading the package, in case it is heavy to load, and working directly on namespaces?
Here's a way that do not reload the package and work directly on the environments/namespaces.
Just replace your library() call with attachNamespace():
set_precedence <- function(pkg) {
detach(paste0("package:", pkg), character.only = TRUE)
attachNamespace(pkg)
}
set_precedence('utils')
# now utils is in pos #2 in `search()`
I would suggest taking a look at the conflicted package.
Prefix the package name with a double colon: <package>::<function>()
For instance:
ggplot2::ggplot(data=data, ggplot2::aes(x=x)) +
ggplot2::geom_histogram()
More typing, but I feel so much less anxious using R now that I have found this.

R: Sometimes system.file not working as documented

The system.file commands in my package examples sometimes fail unpredictably, while passing at other times. I do not understand why.
I typically use:
> system.file("examples", "trees.xml", package="RNeXML", mustWork=TRUE)
which usually works, but sometimes fails (even in an interactive session):
Error in system.file("examples", "trees.xml", package = "RNeXML", mustWork = TRUE) :
no file found
when it is failing, I can get this to work:
> system.file("examples", "trees.xml", lib.loc = .libPaths()[1], package="RNeXML", mustWork=TRUE)
[1] "/home/cboettig/R/x86_64-pc-linux-gnu-library/3.0/RNeXML/examples/trees.xml"
Which doesn't make any sense to me, because the documentation of system.file says that it checks libPaths automatically if no value for lib.loc is provided.
So why does it work if I give the .libPaths()[1] explicitly?
It seems like explicitly telling my package to use the first path, .libPaths()[1], would be less stable.
Since this is a heisenbug, set options(error = recover) and when prompted for a frame number, pick the one that brings you into system.file. (For more on what I'm about to explain, see Hadley's Exceptions and Debugging tutorial.) Then step through using the debugger and determine if packagePath gets loaded correctly using find.package(package, lib.loc, quiet = TRUE). I inspected this latter function and couldn't find anything immediately wrong, so it may be something system-specific. Could you post your sessionInfo()?
If packagePath is fine, then the answer lies somewhere in the rest of system.file's body:
FILES <- file.path(packagePath, ...)
present <- file.exists(FILES)
if (any(present))
FILES[present]
else ""
This would make life very hard for us since I doubt there are problems with any of these functions. If packagePath is not what you expect, you can use the recover frame number prompt again to dive back into system.file, and this time type debug(find.package) so you can step through that function. Inspect dirs and paths after the for (lib in lib.loc), and step through the few ifs that follow.
If none of these work, and you don't spot any mischief (which is very hard with the transparency of a step-by-step debugger), you can always try to dump.frames and upload the file for us. I am not sure how useful it will be, since even if we install the same packages, there may be path issues, but it's worth a shot.
Finally, if you don't care about all of the above, a hack that works would be:
trees_path <- ""
for(lib in .libPaths()) {
trees_path <- system.file("examples", "trees.xml", lib.loc = lib, package="RNeXML", mustWork = FALSE)
if (trees_path != "") break;
}
if (trees_path == "") stop("examples/trees.xml not found using any library paths")

What is the difference between require() and library()?

What is the difference between require() and library()?
There's not much of one in everyday work.
However, according to the documentation for both functions (accessed by putting a ? before the function name and hitting enter), require is used inside functions, as it outputs a warning and continues if the package is not found, whereas library will throw an error.
Another benefit of require() is that it returns a logical value by default. TRUE if the packages is loaded, FALSE if it isn't.
> test <- library("abc")
Error in library("abc") : there is no package called 'abc'
> test
Error: object 'test' not found
> test <- require("abc")
Loading required package: abc
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called 'abc'
> test
[1] FALSE
So you can use require() in constructions like the one below. Which mainly handy if you want to distribute your code to our R installation were packages might not be installed.
if(require("lme4")){
print("lme4 is loaded correctly")
} else {
print("trying to install lme4")
install.packages("lme4")
if(require(lme4)){
print("lme4 installed and loaded")
} else {
stop("could not install lme4")
}
}
In addition to the good advice already given, I would add this:
It is probably best to avoid using require() unless you actually will be using the value it returns e.g in some error checking loop such as given by thierry.
In most other cases it is better to use library(), because this will give an error message at package loading time if the package is not available. require() will just fail without an error if the package is not there. This is the best time to find out if the package needs to be installed (or perhaps doesn't even exist because it it spelled wrong). Getting error feedback early and at the relevant time will avoid possible headaches with tracking down why later code fails when it attempts to use library routines
You can use require() if you want to install packages if and only if necessary, such as:
if (!require(package, character.only=T, quietly=T)) {
install.packages(package)
library(package, character.only=T)
}
For multiple packages you can use
for (package in c('<package1>', '<package2>')) {
if (!require(package, character.only=T, quietly=T)) {
install.packages(package)
library(package, character.only=T)
}
}
Pro tips:
When used inside the script, you can avoid a dialog screen by specifying the repos parameter of install.packages(), such as
install.packages(package, repos="http://cran.us.r-project.org")
You can wrap require() and library() in suppressPackageStartupMessages() to, well, suppress package startup messages, and also use the parameters require(..., quietly=T, warn.conflicts=F) if needed to keep the installs quiet.
Always use library. Never use require.
tl;dr: require breaks one of the fundamental rules of robust software systems: fail early.
In a nutshell, this is because, when using require, your code might yield different, erroneous results, without signalling an error. This is rare but not hypothetical! Consider this code, which yields different results depending on whether {dplyr} can be loaded:
require(dplyr)
x = data.frame(y = seq(100))
y = 1
filter(x, y == 1)
This can lead to subtly wrong results. Using library instead of require throws an error here, signalling clearly that something is wrong. This is good.
It also makes debugging all other failures more difficult: If you require a package at the start of your script and use its exports in line 500, you’ll get an error message “object ‘foo’ not found” in line 500, rather than an error “there is no package called ‘bla’”.
The only acceptable use case of require is when its return value is immediately checked, as some of the other answers show. This is a fairly common pattern but even in these cases it is better (and recommended, see below) to instead separate the existence check and the loading of the package. That is: use requireNamespace instead of require in these cases.
More technically, require actually calls library internally (if the package wasn’t already attached — require thus performs a redundant check, because library also checks whether the package was already loaded). Here’s a simplified implementation of require to illustrate what it does:
require = function (package) {
already_attached = paste('package:', package) %in% search()
if (already_attached) return(TRUE)
maybe_error = try(library(package, character.only = TRUE))
success = ! inherits(maybe_error, 'try-error')
if (! success) cat("Failed")
success
}
Experienced R developers agree:
Yihui Xie, author of {knitr}, {bookdown} and many other packages says:
Ladies and gentlemen, I've said this before: require() is the wrong way to load an R package; use library() instead
Hadley Wickham, author of more popular R packages than anybody else, says
Use library(x) in data analysis scripts. […]
You never need to use require() (requireNamespace() is almost always better)
?library
and you will see:
library(package) and require(package) both load the package with name
package and put it on the search list. require is designed for use
inside other functions; it returns FALSE and gives a warning (rather
than an error as library() does by default) if the package does not
exist. Both functions check and update the list of currently loaded
packages and do not reload a package which is already loaded. (If you
want to reload such a package, call detach(unload = TRUE) or
unloadNamespace first.) If you want to load a package without putting
it on the search list, use requireNamespace.
My initial theory about the difference was that library loads the packages whether it is already loaded or not, i.e. it might reload an already loaded package, while require just checks that it is loaded, or loads it if it isn't (thus the use in functions that rely on a certain package). The documentation refutes this, however, and explicitly states that neither function will reload an already loaded package.
Here seems to be the difference on an already loaded package.
While it is true that both require and library do not load the package. Library does a lot of other things before it checks and exits.
I would recommend removing "require" from the beginning of a function running 2mil times anyway, but if, for some reason I needed to keep it. require is technically a faster check.
microbenchmark(req = require(microbenchmark), lib = library(microbenchmark),times = 100000)
Unit: microseconds
expr min lq mean median uq max neval
req 3.676 5.181 6.596968 5.655 6.177 9456.006 1e+05
lib 17.192 19.887 27.302907 20.852 22.490 255665.881 1e+05

Resources