Which R functions are not suitable for programmatic use? - r

Some functions like browser only make sense when used interactively.
It is widely regarded that the subset function should only be used interactively.
Similarly, sapply isn't good for programmatic use since it doesn't simplify the result for zero length inputs.
I'm trying to make an exhaustive list of functions that are only not suitable for programmatic use.
The plan is to make a tool for package checking to see if any of these functions are called and give a warning.
There are other functions like file.choose and readline that require interactivity, but these are OK to include in packages, since the end use will be interactive. I don't care too much about these for this use case but feel free to add them to the list.
Which functions have I missed?

(Feel free to edit.)
The following functions should be handled with care (which does not necessarily mean they are not suitable for programming):
Functions whose outputs do not have a consistent output class depending on the inputs: sapply, mapply (by default)
Functions whose internal behavior is different depending on the input length: sample, seq
Functions that evaluate some of their arguments within environments: $, subset, with, within, transform.
Functions that go against normal environment usage: attach, detach, assign, <<-
Functions that allow partial matching: $
Functions that only make sense in interactive usage: browser, recover, debug, debugonce, edit, fix, menu, select.list
Functions that can be a threat (virus) if used with user inputs: source, eval(parse(text=...)), system.
Also, to some extent, every function that generates warnings rather than errors. I recommend using options(warn = 2) to turn all warnings into errors in a programming application. Specific cases can then be allowed via suppressWarnings or try.

This is in answer to the comment after the question by the poster. This function inputs a function and returns the bad functions found with their line number. It can generate false positives but they are only warnings anways so that does not seem too bad. Modify bad to suit.
badLines <- function(func) {
bad <- c("sapply", "subset", "attach")
regex <- paste0("\\b", bad, "\\b")
result <- sort(unlist(sapply(regex, FUN = grep, body(func), simplify = FALSE)))
setNames(result, gsub("\\b", "", names(result), fixed = TRUE))
}
badLines(badLines)
## sapply1 subset attach sapply2
## 2 2 2 4

Related

Determining which version of a function is active when many packages are loaded

If I have multiple packages loaded that define functions of the same name, is there an easy way to determine which version of the function is currently the active one? Like, lets say I have base R, the tidyverse, and a bunch of time series packages loaded. I'd like a function which_package("intersect") that would tell me the package name of the active version of the intersect function. I know you can go back and look at all the warning messages you recieved when installing packages, but I think that sort of manual search is not only tedious but also error-prone.
There is a function here that does sort of what I want, except it produces a table for all conflicts rather than the value for one function. I would actually be quite happy with that, and would also accept a similar function as an answer, but I have had problems with the implimentation of function given. As applied to my examples, it inserts vast amounts of white space and many duplicates of the package names (e.g. the %>% function shows up with 132 packages listed), making the output hard to read and hard to use. It seems like it should be easy to remove the white space and duplicates, and I have spent considerable time on various approaches that I expected to work but which had no impact on the outcome.
So, for an example of many conflicts:
install.packages(pkg = c("tidyverse", "fpp3", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink")
lapply(x = c("tidyverse", "fable", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"),
library, character.only = TRUE)
You can pull this information with your own function helper.
which_package <- function(fun) {
if(is.character(fun)) fun <- getFunction(fun)
stopifnot(is.function(fun))
x <- environmentName(environment(fun))
if (!is.null(x)) return(x)
}
This will return R_GlobalEnv for functions that you define in the global environment. There is also the packageName function if you really want to restrict it to packages only.
For example
library(MASS)
library(dplyr)
which_package(select)
# [1] "dplyr"

In R, do an operation temporarily using a setting such as working directory

I'm almost certain I've read somewhere how to do this. Instead of having to save the current option (say working directory) to a variable, change the w.d, do an operation, and then revert back to what it was, doing this inside a function akin to "with" relative to attach/detach. A solution just for working directory is what I need now, but there might be a more generic function that does that sort of things? Or ain't it?
So to illustrate... The way it is now:
curdir <- getwd()
setwd("../some/place")
# some operation
setwd(curdir)
The way it is in my wildest dreams:
with.dir("../some/place", # some operation)
I know I could write a function for this, I just have the impression there's something more readily available and generalizable to other parameters too.
Thanks
There is an idiom for this in some of R's base plotting functions
op <- par(no.readonly = TRUE)
# par(blah = stuff)
# plot(stuff)
par(op)
that is so unbelievably crude as to be fully portable to options() and setwd().
Fortunately it's also easy to implement a crude wrapper:
with_dir <- function(dir, expr) {
old_wd <- getwd()
setwd(dir)
result <- evalq(expr)
setwd(old_wd)
result
}
I'm no wizard with nonstandard evaluation so evalq could be unstable somehow. More on NSE in an old write-up by Lumley and also in Wickham's Advanced R, but it's dense stuff and I haven't wrapped my head around it all yet.
edit: as per Ben Bolker's comment, it's probably better to use on.exit for this:
with_dir <- function(dir, expr) {
old_wd <- getwd()
on.exit(setwd(old_wd))
setwd(dir)
evalq(expr)
}
From the R docs:
on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error). This is useful for resetting graphical parameters or performing other cleanup actions.
What you're describing depends upon two things: detecting when you enter and leave a particular lexical scope, and defining a behavior to do on entrance and on exit. Python has these, called "Context Managers". This was a big deal when it was released, and many parts of Python's standard library now behave like context managers, and have to define the "enter" and "exit" behavior in explicitly, or by leveraging some clever inheritance scheme.
with.default
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
<bytecode: 0x07d02ccc>
<environment: namespace:base>
R's with function works sort of like a context manager, because it can pass scopes around easily. That said, this doesn't give you the "enter" and "exit" operations for free. Especially consider that the current working directory isn't an entry in the current scope, but a state of the R interpreter, which can only be queried or changed by function calls behind the .Internal shield.
You can easily define your own object types to have methods that are context manager-like for the with generic function, as well as writing and registering methods for other types you commonly use, but it is not part of the base R language.

Changing defaults in a function inside a locked package [duplicate]

This question already has answers here:
Setting Function Defaults R on a Project Specific Basis
(2 answers)
Closed 9 years ago.
I am developing my first package and it is aimed at users who are new to R, so I am trying to minimize the amount of R skills required to use the package. As a result I want a function that changes defaults in other functions within my package. But I get the following error "cannot add bindings to a locked environment", which means the environment of the package is locked and I am not allowed to change the default values of its functions.
Here is an example that throws a similar error:
library(ggplot2)
assign(formals(geom_point)$position, "somethingelse", pos="package:ggplot2")
When I try assignInNamespace i get:
Error in bindingIsLocked(x, ns) : no binding for "identity"
assignInNamespace(formals(geom_point)$position,"somethingelse", pos = "package:ggplot2")
Here is an example of what I hope to achieve.
default <- function(x=c("A", "B", "C")){
x
}
default()
change.default <- function(x){
formals(default)$x <<- x # Notice the global assign
}
change.default(1:3)
default()
I am aware that this is far from the recommended approach, but I am willing to cut corners to improve the learning curve of the package. Is there a way to achieve this?
This question has been marked as a duplicate of Setting Function Defaults R on a Project Specific Basis. This is a different situation as this question concerns how to allow the user in a interactive session to change the defaults of a function - not how to actually do it. The old question could not have been solved with the options() function and it is therefore a different question.
I think the colloquial way to achieve what you want is via option and packages in fact do so, e.g., lattice (although they use special options) or ascii.
Furthermore, this is also done so in base R, e.g., the famous and notorious default for stringsAsFactors.
If you look at ?read.table or ?data.frame you get: stringsAsFactors = default.stringsAsFactors(). Inspecting this reveals:
> default.stringsAsFactors
function ()
{
val <- getOption("stringsAsFactors")
if (is.null(val))
val <- TRUE
if (!is.logical(val) || is.na(val) || length(val) != 1L)
stop("options(\"stringsAsFactors\") not set to TRUE or FALSE")
val
}
<bytecode: 0x000000000b068478>
<environment: namespace:base>
The relevant part here is getOption("stringsAsFactors") which produces:
> getOption("stringsAsFactors")
[1] TRUE
Changing it is achieved like this:
> options(stringsAsFactors = FALSE)
> getOption("stringsAsFactors")
[1] FALSE
To do what you want your package would need to set an option, and the function take it's values form the options. Another function could then change the options:
options(foo=c("A", "B", "C"))
default <- function(x=getOption("foo")){
x
}
default()
change.default <- function(x){
options(foo=x)
}
change.default(1:3)
default()
If you want your package to set the options when loaded, you need to create a .onAttach or .onLoad function in zzz.R. My afex package e.g., does this and changes the default contrasts. In your case it could look like the following:
.onAttach <- function(libname, pkgname) {
options(foo=c("A", "B", "C"))
}
ascii does it via .onLoad (I don't remember what is the exact difference, but Writing R Extensions will help).
Preferably, a function has the following things:
Input arguments
A function body which does something with those arguments
Output arguments
So in your situation where you want to change something about the behavior of a function, changing the input arguments in the best way to go. See for example my answer to another post.
You could also use an option to save some global settings (e.g. which font to use, which PATH the packages you use are stored), see the answer of #James in the question I linked above. But use these things sparingly as it makes the code hard to read. I would primarily use them read only, i.e. set them once (either by the package or the user) and not allow functions to change them.
The unreadability stems from the fact that the behavior of the function is not solely determined locally (i.e. by the code directly working with it), but also by settings far away. This makes it hard to determine what a function does by purely looking at the code calling it, but you have to dig through much more code to fully understand what is going on. In addition, what if other functions change those options, making it even harder to predict what a given function will do as it depends on the history of functions. And here comes my earlier recommendation for read-only options back into play, if these are read only, some of the problems about readability are lessened.

Setting Function Defaults R on a Project Specific Basis

Commonly, I use the same function settings. I'm wondering if there is a method, other than having a new object in the path that is essentially a wrapper for the function, to set default arguments. For example:
paste() has it's sep argument set to a space =" ", I'm tired of writing ,sep="" over and over. So is there a way to "temporarily" replace the function with my chosen defaults?
paste(...,sep="")
Can I accomplish this through packaging? I've sometimes noticed that, some packages force other equally named functions to be masked in the global environment.
Ideally, I'd like something that can be set on a project by project basis in (load.r or some other such workflow startpoint)
I'd personally be very hesitant to change the default behavior of any commonly used functions --- especially base R functions. For one thing, it will immediately decrease the portability of any scripts or code snippets in which you use the redefined functions. Worse, other R users reading your scripts will likely be either: (a) unaware of your private meanings for well-known language elements or (b) frustrated at having to rewire their own expectations for the functions. For me, it would also feel like an added mental burden to attach different meanings to the same symbol in different settings.
I think a much better solution is to create similarly named functions implementing your preferred defaults. A slightly modified name will effectively flag that this isn't the familiar base function, without burdening you with much or any extra typing. A good example are the paste0() and cat0() functions that are included in the gsubfn package. (Clearly you and I aren't the only two to find ourselves (frequently) annoyed by the paste()'s default sep setting!):
library(gsubfn)
paste0
# function (..., sep = "")
# paste(..., sep = sep)
# <environment: namespace:gsubfn>
cat0
# function (..., sep = "")
# cat(..., sep = sep)
# <environment: namespace:gsubfn>
You can then either collect a number of these functions in a text file, sourcing them early in your script, or (better) package them up and load them via a call to library().
The Defaults package used to do that; retired in 2014.

Protecting function names in R

Is it possible in R to protect function names (or variables in general) so that they cannot be masked.
I recently spotted that this can be a problem when creating a data frame with the name "new", which masked a function used by lmer and thus stopped it working. (Recovery is easy once you know what the problem is, here "rm(new)" did it.)
There is an easy workaround for your problem, without worrying about protecting variable names (though playing with lockBinding does look fun). If a function becomes masked, as in your example, it is still possible to call the masked version, with the help of the :: operator.
In general, the syntax is packagename::variablename.
(If the function you want has not been exported from the package, then you need three colons instead, :::. This shouldn't apply in this case however.)
Maybe use environments! This is a great way to separate namespaces. For example:
> a <- new.env()
> assign('printer', function(x) print(x), envir=a)
> get('printer', envir=a)('test!')
[1] "test!"
#hdallazuanna recommends (via Twitter)
new <- 1
lockBinding('new', globalenv())
this makes sense when the variable is user created but does not, of course, prevent overwriting a function from a package.
I had the reverse problem from the OP, and I wanted to prevent my custom functions in .Rprofile from being overridden when I defined a variable with the same name as a function, but I ended up putting my functions to ~/.R.R and I added these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))

Resources