Parsing ellipsis arguments - r

I'm trying to write a wrapper for some ggplot2 graphs and am trying to use ellipsis' to make the function flexible. I want to save the user (n = 1 me!) from having to explicitly pass the axis titles so thought it might be possible to parse the ... arguments and set the axis titles appropriately. I've read in several Stackoverflow threads (e.g. 1, e.g. 2 or even the R Documentation) that ellipsis arguments can be converted to a list using args <- list(...) so have knocked up a simplified example...
test <- function(...){
args <- list(...)
is.list(args) %>% print()
if(grepl('a', args)){
title <- 'A'
}
else if(grepl('b', args)){
title <- 'B'
}
return(title)
}
Testing the function I get what I expect when supplying a single a as an argument...
> test(a)
[1] TRUE
[1] "A"
But when I try passing other arguments including multiple one via ellipsis I don't understand what is happening. One non-a argument
> test(b)
Error in test(b) (from #2) : object 'b' not found
...then the first argument as a with secondary ones...
> test(a, c, d)
Error in test(a, c, d) (from #2) : object 'd' not found
...or non a at first but something further down the line which should match....
> test(c, b, d)
Error in test(c, b, d) (from #2) : object 'b' not found
The problem is cropping up at args <- list(...) because the logical test to see if args is a list isn't printed, but this doesn't fit with what I've read list(...) does (which is turn the ellipsis arguments into a list). I expect I may need to use something like args <- list(...) %>% unlist() in order to convert the list into a vector which can then be used as an argument in grepl() (and have actually tried it but as far as I can tell the error is occurring before getting to the if()) but I don't understand whats going on and would be grateful for any explanations.
EDIT :
In light of comments it looks like this is a problem of my own creation as I'm mixing Standard and Non-Standard Evaluation. I had been trying to write a wrapper to ggplot2 and have been using the NSE Vignette and lazyeeval vignette to learn/guide me (as well as various threads here on SO), but was faltering when trying to pick out specific variables from the ellipsis (...) to pass to the ggplot call I was making.
Downside is work want results and don't afford much time for learning/improving our coding practices so I'll switch to using Standard Evaulation and have another stab at properly understanding Non-Standard Evaluation in the future.

Related

Referring to package and function as arguments in another function

I am trying to find methods for specific functions across different packages in R. For example methods(broom::tidy) will return all methods for the function tidy in the package broom. For my current issue it would be better if I could have the methods function in another function like so:
f1 <- function(x,y){
methods(x::y)
}
(I removed other parts of the code that are not relevant to my issue.)
However when I run the function like this:
f1 <- function(x,y){ methods(x::y)}
f1(broom,tidy)
I get the error
Error in loadNamespace(name) : there is no package called ‘x’
If I try to modify it as to only change the function but keep the package the same I get a similar error :
f2 <- function(y){ methods(broom::y)}
f2(tidy)
Error: 'y' is not an exported object from 'namespace:broom'
How can I get the package and function name to evaluate properly in the function? Does this current issue have to do with when r is trying to evaluate/substitute values in the function?
Both the :: and methods() functions use non-standard evaluation in order to work. This means you need to be a bit more clever with passing values to the functions in order to get it to work. Here's one method
f1 <- function(x,y){
do.call("methods", list(substitute(x::y)))
}
f1(broom,tidy)
Here we use substitute() to expand and x and y values we pass in into the namespace lookup. That solves the :: part which you can see with
f2 <- function(x,y){
substitute(x::y)
}
f2(broom,tidy)
# broom::tidy
We need the substitute because there could very well be a package x with function y. For this reason, variables are not expanded when using ::. Note that :: is just a wrapper to getExportedValue() should you otherwise need to extract values from namespaces using character values.
But there is one other catch: methods() doesn't evaluate it's parameters, it uses the raw expression to find the methods. This means we don't actually need the value of broom::tidy, we to pass that literal expression. Since we need to evaluate the substitute to get the expression we need, we need to build the call with do.call() in order to evaluate the substitute and pass that expression on to methods()

Dispatch of `rbind` and `cbind` for a `data.frame`

Background
The dispatch mechanism of the R functions rbind() and cbind() is non-standard. I explored some possibilities of writing rbind.myclass() or cbind.myclass() functions when one of the arguments is a data.frame, but so far I do not have a satisfactory approach. This post concentrates on rbind, but the same holds for cbind.
Problem
Let us create an rbind.myclass() function that simply echoes when it has been called.
rbind.myclass <- function(...) "hello from rbind.myclass"
We create an object of class myclass, and the following calls to rbind all
properly dispatch to rbind.myclass()
a <- "abc"
class(a) <- "myclass"
rbind(a, a)
rbind(a, "d")
rbind(a, 1)
rbind(a, list())
rbind(a, matrix())
However, when one of the arguments (this need not be the first one), rbind() will call base::rbind.data.frame() instead:
rbind(a, data.frame())
This behavior is a little surprising, but it is actually documented in the
dispatch section of rbind(). The advice given there is:
If you want to combine other objects with data frames,
it may be necessary to coerce them to data frames first.
In practice, this advice may be difficult to implement. Conversion to a data frame may remove essential class information. Moreover, the user who might be unware of the advice may be stuck with an error or an unexpected result after issuing the command rbind(a, x).
Approaches
Warn the user
A first possibility is to warn the user that the call to rbind(a, x) should not be made when x is a data frame. Instead, the user of package mypackage should make an explicit call to a hidden function:
mypackage:::rbind.myclass(a, x)
This can be done, but the user has to remember to make the explicit call when needed. Calling the hidden function is something of a last resort, and should not be regular policy.
Intercept rbind
Alternatively, I tried to shield the user by intercepting dispatch. My first try was to provide a local definition of base::rbind.data.frame():
rbind.data.frame <- function(...) "hello from my rbind.data.frame"
rbind(a, data.frame())
rm(rbind.data.frame)
This fails as rbind() is not fooled in calling rbind.data.frame from the .GlobalEnv, and calls the base version as usual.
Another strategy is to override rbind() by a local function, which was suggested in S3 dispatching of `rbind` and `cbind`.
rbind <- function (...) {
if (attr(list(...)[[1]], "class") == "myclass") return(rbind.myclass(...))
else return(base::rbind(...))
}
This works perfectly for dispatching to rbind.myclass(), so the user can now type rbind(a, x) for any type of object x.
rbind(a, data.frame())
The downside is that after library(mypackage) we get the message The following objects are masked from ‘package:base’: rbind .
While technically everything works as expected, there should be better ways than a base function override.
Conclusion
None of the above alternatives is satisfactory. I have read about alternatives using S4 dispatch, but so far I have not located any implementations of the idea. Any help or pointers?
As you mention yourself, using S4 would be one good solution that works nicely. I have not investigated recently, with data frames as I am much more interested in other generalized matrices, in both of my long time CRAN packages 'Matrix' (="recommended", i.e. part of every R distribution) and in 'Rmpfr'.
Actually even two different ways:
1) Rmpfr uses the new way to define methods for the '...' in rbind()/cbind().
this is well documented in ?dotsMethods (mnemonic: '...' = dots) and implemented in Rmpfr/R/array.R line 511 ff (e.g. https://r-forge.r-project.org/scm/viewvc.php/pkg/R/array.R?view=annotate&root=rmpfr)
2) Matrix uses the older approach by defining (S4) methods for rbind2() and cbind2(): If you read ?rbind it does mention that and when rbind2/cbind2 are used. The idea there: "2" means you define S4 methods with a signature for two ("2") matrix-like objects and rbind/cbind uses them for two of its potentially many arguments recursively.
The dotsMethod approach was suggested by Martin Maechler and implemented in the Rmpfr package. We need to define a new generic, class and a method using S4.
setGeneric("rbind", signature = "...")
mychar <- setClass("myclass", slots = c(x = "character"))
b <- mychar(x = "b")
rbind.myclass <- function(...) "hello from rbind.myclass"
setMethod("rbind", "myclass",
function(..., deparse.level = 1) {
args <- list(...)
if(all(vapply(args, is.atomic, NA)))
return( base::cbind(..., deparse.level = deparse.level) )
else
return( rbind.myclass(..., deparse.level = deparse.level))
})
# these work as expected
rbind(b, "d")
rbind(b, b)
rbind(b, matrix())
# this fails in R 3.4.3
rbind(b, data.frame())
Error in rbind2(..1, r) :
no method for coercing this S4 class to a vector
I haven't been able to resolve the error. See
R: Shouldn't generic methods work internally within a package without it being attached?
for a related problem.
As this approach overrides rbind(), we get the warning The following objects are masked from 'package:base': rbind.
I don't think you're going to be able to come up with something completely satisfying. The best you can do is export rbind.myclass so that users can call it directly without doing mypackage:::rbind.myclass. You can call it something else if you want (dplyr calls its version bind_rows), but if you choose to do so, I'd use a name that evokes rbind, like rbind_myclass.
Even if you can get r-core to agree to change the dispatch behavior, so that rbind dispatches on its first argument, there are still going to be cases when users will want to rbind multiple objects together with a myclass object somewhere other than the first. How else can users dispatch to rbind.myclass(df, df, myclass)?
The data.table solution seems dangerous; I would not be surprised if the CRAN maintainers put in a check and disallow this at some point.

Are these strings or variables?

Coming from a C / Python / Java background, I have trouble understanding some R syntax, where literals look like variables, but seem to behave like strings. For example:
library(ggplot2)
library("ggplot2")
The two lines behave equivalently. However, I would expect the first line to mean "load the library whose name is stored in the ggplot2 variable" and give an error like object 'ggplot2' not found.
Speaking of ggplot2:
ggplot(data, aes(factor(arrivalRate), responseTime, fill=factor(mode))) +
geom_violin(trim=FALSE, position=dodge)
The variables arrivalRate, responseTime and mode do not exist, but somehow R knows to look them up inside the data data frame. I assume that aes actually receives strings, that are then processed using something like eval.
How does R parse code that it ends up interpreting some literals as strings?
promises
When an argument is passed to a function it is not passed as a value but is passed as a promise which consists of
the expression or code that the caller uses as the actual argument
the environment in which that expression is to be evaluated, viz. the caller's environment.
the value that the expression represents when the expression is evaluated in the promise's environment -- this slot is not filled in until the promise is actually evaluated. It will never be filled in if the function never accesses it.
The pryr package can show the info in a promise:
library(pryr)
g <- function(x) promise_info(x)
g(ggplot2)
giving:
$code
ggplot2 <-- the promise x represents the expression ggplot2
$env
<environment: R_GlobalEnv> <-- if evaluated it will be done in this environment
$evaled
[1] FALSE <-- it has not been evaluated
$value
NULL <-- not filled in because promise has not been evaluated
The only one of the above slots in the pryr output that can be accessed at the R level without writing a C function to do it (or using a package such as pryr that accesses such C code) is the code slot. That can be done using the R function substitute(x) (or other means). In terms of the pryr output substitute applied to a promise returns the code slot without evaluating the promise. That is, the value slot is not modified. Had we accessed x in an ordinary way, i.e. not via substitute, then the code would have been evaluated in the promise's environment, stored in the value slot and then passed to the expression in the function that accesses it.
Thus either of the following result in a character string representing what was passed as an expression, i.e. the character representation of the code slot, as opposed to its value.
f <- function(x) as.character(substitute(x))
f("ggplot2")
## [1] "ggplot2"
f(ggplot2)
## [1] "ggplot2"
library
In fact, library uses this idiom, i.e. as.character(substitute(x)), to handle its first argument.
aes
The aes function uses match.call to get the entire call as an expression and so in a sense is an alternative to substitute. For example:
h <- function(x) match.call()
h(pi + 3)
## h(x = pi + 3)
Note
One cannot tell without looking at the documentation or code of a function how it will treat its arguments.
An interesting quirk of the R language is the way it evaluates expressions. In most cases, R behaves the way you'd expect. Expressions in quotes are treated as strings, anything else is treated as a variable, function, or other token. But some functions allow for "non-standard evaluation", in which an unquoted expression is evaluated, more or less, as if it were a quoted variable. The most common example of this is R's way of loading libraries (which allows for unquoted or quoted library names) and its succinct formula interface. Other packages can take advantage of NSE. Hadley Wickham makes extensive use of it throughout his extremely popular tidyverse packages. Aside from saving the user a few characters of typing, NSE has a number of useful properties for dynamic programming.
As noted in the other answer, Wickham has an excellent tutorial on how it all works. RPubs user lionel also has a great working paper on the topic.
The concept is called "non-standard evaluation", and there are many different ways in which it can be used in different R functions. See this book chapter for an introduction.
This language feature can be confusing, and arguably is not needed for the library() function, but it allows incredibly powerful code when you need to specify computations on data frames, as is the case in ggplot2 or in dplyr, for example.
The lines
library(ggplot2)
library("ggplot2")
are not equivalent. In the first line, ggplot2 is a symbol, which may
or may not be bound to some value. In the second line, "ggplot2" is a
character vector of length one.
A function, however, can manipulate the arguments that it gets without
evaluating them, and can decide to treat both cases equivalently, which is what library does apparently.
Here's an example of how to manipulate an unevaluated expression:
> f <- function(x) match.call() # return unevaluated function call
> x <- f(foo)
> x
f(x = foo)
> mode(x)
[1] "call"
> x[[1]]
f
> x[[2]]
foo
> mode(x[[2]])
[1] "name"
> as.character(x[[2]])
[1] "foo"
> x <- f("foo")
> mode(x[[2]])
[1] "character"

How do you re-write the rm() function in R to clear your workspace automatically [duplicate]

I am trying to find a way to clear the workspace in R using lists.
According to the documentation, I could simply create a vector with all my workspace objects: WS=c(ls()). But nothing happens when I try element wise deletion with rm(c(ls()) or rm(WS).
I know I can use the command rm(list=ls()). I am just trying to figure how R works. Where did I err in my thinking in applying the rm() function on a vector with the list of the objects?
Specifically, I'm trying to create a function similar to the clc function in MATLAB, but I am having trouble getting it to work. Here's the function that I've written:
clc <- function() { rm(list = ls()) }
From ?rm, "Details" section:
Earlier versions of R incorrectly claimed that supplying a character vector in ... removed the objects named in the character vector, but it removed the character vector. Use the list argument to specify objects via a character vector.
Your attempt should have been:
rm(list = WS)
HOWEVER, this will still leave you with an object (a character vector) named "WS" in your workspace since that was created after you called WS <- c(ls()). To actually get rid of the "WS" object, you would have had to use rm(WS, list = WS). :-)
How does it work? If you look at the code for rm, the first few lines of the function captures any individual objects that have been specified, whether quoted or unquoted. Towards the end of the function, you will find the line list <- .Primitive("c")(list, names) which basically creates a character vector of all of the objects individually named and any objects in the character vector supplied to the "list" argument.
Update
Based on your comment, it sounds like you're trying to write a function like:
.clc <- function() {
rm(list = ls(.GlobalEnv), envir = .GlobalEnv)
}
I think it's a little bit of a dangerous function, but let's test it out:
ls()
# character(0)
for (i in 1:5) assign(letters[i], i)
ls()
# [1] "a" "b" "c" "d" "e" "i"
.clc()
ls()
# character(0)
Note: FYI, I've named the function .clc (with a dot) so that it doesn't get removed when the function is run. If you wanted to write a version of the function without the ., you would probably do better to put the function in a package and load that at startup to have the function available.

Understanding element wise clearing of R's workspace

I am trying to find a way to clear the workspace in R using lists.
According to the documentation, I could simply create a vector with all my workspace objects: WS=c(ls()). But nothing happens when I try element wise deletion with rm(c(ls()) or rm(WS).
I know I can use the command rm(list=ls()). I am just trying to figure how R works. Where did I err in my thinking in applying the rm() function on a vector with the list of the objects?
Specifically, I'm trying to create a function similar to the clc function in MATLAB, but I am having trouble getting it to work. Here's the function that I've written:
clc <- function() { rm(list = ls()) }
From ?rm, "Details" section:
Earlier versions of R incorrectly claimed that supplying a character vector in ... removed the objects named in the character vector, but it removed the character vector. Use the list argument to specify objects via a character vector.
Your attempt should have been:
rm(list = WS)
HOWEVER, this will still leave you with an object (a character vector) named "WS" in your workspace since that was created after you called WS <- c(ls()). To actually get rid of the "WS" object, you would have had to use rm(WS, list = WS). :-)
How does it work? If you look at the code for rm, the first few lines of the function captures any individual objects that have been specified, whether quoted or unquoted. Towards the end of the function, you will find the line list <- .Primitive("c")(list, names) which basically creates a character vector of all of the objects individually named and any objects in the character vector supplied to the "list" argument.
Update
Based on your comment, it sounds like you're trying to write a function like:
.clc <- function() {
rm(list = ls(.GlobalEnv), envir = .GlobalEnv)
}
I think it's a little bit of a dangerous function, but let's test it out:
ls()
# character(0)
for (i in 1:5) assign(letters[i], i)
ls()
# [1] "a" "b" "c" "d" "e" "i"
.clc()
ls()
# character(0)
Note: FYI, I've named the function .clc (with a dot) so that it doesn't get removed when the function is run. If you wanted to write a version of the function without the ., you would probably do better to put the function in a package and load that at startup to have the function available.

Resources