Are these strings or variables?

Coming from a C / Python / Java background, I have trouble understanding some R syntax, where literals look like variables, but seem to behave like strings. For example:
library(ggplot2)
library("ggplot2")
The two lines behave equivalently. However, I would expect the first line to mean "load the library whose name is stored in the ggplot2 variable" and give an error like object 'ggplot2' not found.
Speaking of ggplot2:
ggplot(data, aes(factor(arrivalRate), responseTime, fill=factor(mode))) +
geom_violin(trim=FALSE, position=dodge)
The variables arrivalRate, responseTime and mode do not exist, but somehow R knows to look them up inside the data data frame. I assume that aes actually receives strings, that are then processed using something like eval.
How does R parse code such that it ends up interpreting some bare names as strings?

promises
When an argument is passed to a function, it is not passed as a value but as a promise, which consists of:
1) the expression or code that the caller uses as the actual argument;
2) the environment in which that expression is to be evaluated, viz. the caller's environment;
3) the value that the expression represents when it is evaluated in the promise's environment. This slot is not filled in until the promise is actually evaluated, and it is never filled in if the function never accesses the argument.
The pryr package can show the info in a promise:
library(pryr)
g <- function(x) promise_info(x)
g(ggplot2)
giving:
$code
ggplot2 <-- the promise x represents the expression ggplot2
$env
<environment: R_GlobalEnv> <-- if evaluated it will be done in this environment
$evaled
[1] FALSE <-- it has not been evaluated
$value
NULL <-- not filled in because promise has not been evaluated
Of the slots shown in the pryr output, the only one that can be accessed at the R level without writing a C function (or using a package such as pryr that wraps such C code) is the code slot. That can be done using the R function substitute(x) (or by other means). In terms of the pryr output, substitute applied to a promise returns the code slot without evaluating the promise; that is, the value slot is not modified. Had we accessed x in the ordinary way, i.e. not via substitute, the code would have been evaluated in the promise's environment, stored in the value slot, and then passed to the expression in the function that accesses it.
Thus either of the following calls results in a character string representing the expression that was passed, i.e. the character representation of the code slot, as opposed to its value.
f <- function(x) as.character(substitute(x))
f("ggplot2")
## [1] "ggplot2"
f(ggplot2)
## [1] "ggplot2"
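The laziness described above can also be checked in base R without pryr. In this minimal sketch (show_code and use_value are names made up for illustration), the argument is an expression that would raise an error if it were ever evaluated:

```r
# Hypothetical helpers, not part of any package.
show_code <- function(x) substitute(x)   # reads only the promise's code slot
use_value <- function(x) x               # forces the promise

# No error here: the promise is never evaluated
e <- show_code(stop("never evaluated"))
class(e)
## [1] "call"

# Forcing the same promise runs the code, so the error is raised
res <- tryCatch(use_value(stop("never evaluated")),
                error = function(err) conditionMessage(err))
res
## [1] "never evaluated"
```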
library
In fact, library uses this idiom, i.e. as.character(substitute(x)), to handle its first argument.
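As a rough illustration of that idiom, here is a sketch of a function that only returns the package name it was given. The function pkg_name is hypothetical; its character.only switch is modelled on the argument of the same name that library() provides for the case where the name really is stored in a variable.

```r
# Hypothetical function mimicking how library() treats its first argument.
pkg_name <- function(package, character.only = FALSE) {
  if (!character.only) {
    # take the argument as written, without evaluating it
    package <- as.character(substitute(package))
  }
  package
}

pkg_name(ggplot2)
## [1] "ggplot2"
pkg_name("ggplot2")
## [1] "ggplot2"

p <- "ggplot2"
pkg_name(p)                          # the name of the variable, not its value
## [1] "p"
pkg_name(p, character.only = TRUE)   # now the variable is evaluated
## [1] "ggplot2"
```

With the real function, the last case corresponds to library(p, character.only = TRUE).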
aes
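The aes function uses match.call to get the entire call as an expression, which in a sense is an alternative to substitute. (This describes ggplot2 at the time of writing; recent versions capture the arguments as quosures via rlang instead.) For example: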
h <- function(x) match.call()
h(pi + 3)
## h(x = pi + 3)
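As for how names such as arrivalRate end up being looked up in the data frame: once a function holds an unevaluated expression, it can evaluate it with eval(), using the data frame as the first place to look for variables. The following is a minimal base R sketch of that idea, not ggplot2's actual implementation; eval_in, k and the toy data frame are made up for illustration.

```r
# Hypothetical helper: evaluate `expr` with the columns of `data` in scope.
eval_in <- function(data, expr) {
  e <- substitute(expr)            # capture the expression, unevaluated
  # look up names in `data` first, then in the caller's environment
  eval(e, data, parent.frame())
}

df <- data.frame(arrivalRate = c(1, 2, 4), responseTime = c(10, 20, 40))
k <- 2   # an ordinary variable, found in the calling environment

eval_in(df, responseTime / arrivalRate * k)
## [1] 20 20 20
```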
Note
One cannot tell without looking at the documentation or code of a function how it will treat its arguments.

An interesting quirk of the R language is the way it evaluates expressions. In most cases, R behaves the way you'd expect: expressions in quotes are treated as strings, and anything else is treated as a variable, function, or other token. But some functions use "non-standard evaluation" (NSE), in which an unquoted argument is captured as an expression instead of being evaluated, so the function can treat it, more or less, as if it had been quoted. The most common examples are R's way of loading libraries (which accepts unquoted or quoted library names) and its succinct formula interface. Other packages can take advantage of NSE too; Hadley Wickham makes extensive use of it throughout his extremely popular tidyverse packages. Aside from saving the user a few characters of typing, NSE has a number of useful properties for programming on the language (metaprogramming).
As noted in the other answer, Wickham has an excellent tutorial on how it all works. RPubs user lionel also has a great working paper on the topic.
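The formula interface mentioned above is easy to poke at in base R. In this small sketch, the variable names are borrowed from the question purely for illustration; they do not need to refer to data columns for the formula to be created, because a formula stores the symbols without evaluating them.

```r
# Creating a formula does not evaluate the names on either side of ~
f <- responseTime ~ arrivalRate + mode
class(f)
## [1] "formula"

# The names are kept as symbols and can be extracted later
all.vars(f)
## [1] "responseTime" "arrivalRate"  "mode"
```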

The concept is called "non-standard evaluation", and there are many different ways in which it can be used in different R functions. See this book chapter for an introduction.
This language feature can be confusing, and arguably is not needed for the library() function, but it allows incredibly powerful code when you need to specify computations on data frames, as is the case in ggplot2 or in dplyr, for example.

The lines
library(ggplot2)
library("ggplot2")
are not equivalent as expressions. In the first line, ggplot2 is a symbol, which may or may not be bound to some value. In the second line, "ggplot2" is a character vector of length one.
A function, however, can manipulate the arguments that it gets without evaluating them, and can decide to treat both cases equivalently, which is apparently what library does.
Here's an example of how to manipulate an unevaluated expression:
> f <- function(x) match.call() # return unevaluated function call
> x <- f(foo)
> x
f(x = foo)
> mode(x)
[1] "call"
> x[[1]]
f
> x[[2]]
foo
> mode(x[[2]])
[1] "name"
> as.character(x[[2]])
[1] "foo"
> x <- f("foo")
> mode(x[[2]])
[1] "character"

Related

How to define an S3 generic with the same name as a primitive function?

I have a class myclass in an R package for which I would like to define an as.raw method, i.e. a method with the same name as the primitive function as.raw(). If the constructor, generic and method are defined as follows...
new_obj <- function(n) structure(n, class = "myclass") # constructor
as.raw <- function(obj) UseMethod("as.raw") # generic
as.raw.myclass <- function(obj) obj + 1 # method (dummy example here)
... then R CMD check leads to:
Warning: declared S3 method 'as.raw.myclass' not found
See section ‘Generic functions and methods’ in the ‘Writing R
Extensions’ manual.
If the generic is as_raw instead of as.raw, then there's no problem, so I assume this comes from the fact that the primitive function as.raw already exists. Is it possible to 'overload' as.raw by defining it as a generic (or would one necessarily need to use a different name?)?
Update: NAMESPACE contains
export("as.raw") # export the generic
S3method("as.raw", "myclass") # export the method
This seems somewhat related, but dimnames there is a generic and so there is a solution (just don't define your own generic), whereas above it is unclear (to me) what the solution is.
The problem here appears to be that as.raw is a primitive function (is.primitive(as.raw)). From the ?setGeneric help page, it says
A number of the basic R functions are specially implemented as primitive functions, to be evaluated directly in the underlying C code rather than by evaluating an R language definition. Most have implicit generics (see implicitGeneric), and become generic as soon as methods (including group methods) are defined on them.
And according to the ?InternalMethods help page, as.raw is one of these primitive generics. So in this case, you just need to export the S3method. And you want to make sure your function signature matches the signature of the existing primitive function.
So if I have the following R code
new_obj <- function(n) structure(n, class = "myclass")
as.raw.myclass <- function(x) x + 1
and a NAMESPACE file of
S3method(as.raw,myclass)
export(new_obj)
Then this passes the package checks for me (on R 4.0.2). And I can run the code with
as.raw(new_obj(4))
# [1] 5
# attr(,"class")
# [1] "myclass"
So in this particular case, you need to leave the as.raw <- function(obj) UseMethod("as.raw") part out.
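The same dispatch can be tried in an interactive session, outside a package. This is only a sketch: unlike the dummy method above, the method here is an assumption that returns an actual raw vector, and it calls unclass() so that the inner as.raw() call falls through to the primitive's default behaviour.

```r
new_obj <- function(n) structure(n, class = "myclass")   # constructor

# No generic is defined: the primitive dispatches internally on classed objects.
# The argument is named x to match the primitive's signature.
as.raw.myclass <- function(x) as.raw(unclass(x) + 1)

is.primitive(as.raw)
## [1] TRUE
as.raw(new_obj(4))
## [1] 05
```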

Referring to package and function as arguments in another function

I am trying to find methods for specific functions across different packages in R. For example methods(broom::tidy) will return all methods for the function tidy in the package broom. For my current issue it would be better if I could have the methods function in another function like so:
f1 <- function(x,y){
methods(x::y)
}
(I removed other parts of the code that are not relevant to my issue.)
However when I run the function like this:
f1 <- function(x,y){ methods(x::y)}
f1(broom,tidy)
I get the error
Error in loadNamespace(name) : there is no package called ‘x’
If I try to modify it as to only change the function but keep the package the same I get a similar error :
f2 <- function(y){ methods(broom::y)}
f2(tidy)
Error: 'y' is not an exported object from 'namespace:broom'
How can I get the package and function name to evaluate properly in the function? Does this issue have to do with when R tries to evaluate/substitute values in the function?
Both the :: and methods() functions use non-standard evaluation in order to work. This means you need to be a bit more clever about passing values to them. Here's one method:
f1 <- function(x,y){
do.call("methods", list(substitute(x::y)))
}
f1(broom,tidy)
Here we use substitute() to expand the x and y values we pass in into the namespace lookup. That solves the :: part, which you can see with
f2 <- function(x,y){
substitute(x::y)
}
f2(broom,tidy)
# broom::tidy
We need the substitute because there could very well be a package x with function y. For this reason, variables are not expanded when using ::. Note that :: is just a wrapper to getExportedValue() should you otherwise need to extract values from namespaces using character values.
But there is one other catch: methods() doesn't evaluate its parameter; it uses the raw expression to find the methods. This means we don't actually need the value of broom::tidy; we need to pass that literal expression. Since we have to evaluate the substitute() call to get the expression we want, we build the call with do.call(), which evaluates the substitute() and passes the resulting expression on to methods().
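For the character-based route mentioned above, a minimal sketch could look like the following. The wrapper get_exported is hypothetical, and the stats package is used here only because it ships with every R installation.

```r
# Hypothetical wrapper: fetch an exported object given two character strings.
get_exported <- function(pkg, name) {
  getExportedValue(pkg, name)   # what pkg::name does, but with evaluated arguments
}

med <- get_exported("stats", "median")
identical(med, stats::median)
## [1] TRUE
```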

How to overload S4 slot selector `@` to be a generic function

I am trying to turn the @ operator in R into a generic function for the S3 system.
Based on the chapter in Writing R extensions: adding new generic I tried implementing the generic for @ like so:
`@` <- function(object, name) UseMethod("@")
`@.default` <- function(object, name) base::`@`(object, name)
However this doesn't seem to work as it breaks the @ for the S4 methods. I am using Matrix package as an example of S4 instance:
Matrix::Matrix(1:4, nrow=2, ncol=2)@Dim
Error in @.default(Matrix::Matrix(1:4, nrow = 2, ncol = 2), Dim) :
no slot of name "name" for this object of class "dgeMatrix"
How to implement a generic @ so it correctly dispatches in the case of S4 classes?
EDIT
Also interested in opinions about why it might not be a good idea?
R's documentation is somewhat confusing as to whether @ is already a generic or not: the help page for @ says it is, but it isn't listed on the internalGenerics page.
The @ operator has specific behaviour as well as (perhaps) being a generic. From the help page for @: "It is checked that object is an S4 object (see isS4), and it is an error to attempt to use @ on any other object." That would appear to rule out writing methods for S3 classes, though the documentation is unclear if this check happens before method dispatch (if there is any) or after (whence it could be skipped if you supplied a specific method for some S3 class).
You can implement what you want by completely redefining what @ is, along the line of the suggestion in comments:
`@.default` <- function(e1,e2) slot(e1,substitute(e2))
but there are two reasons not to do this:
1) As soon as someone loads your package, it supersedes the normal @ function, so if people call it with other S4 objects, they are getting your version rather than the R base version.
2) This version is considerably less efficient than the internal one, and because of (1) you have just forced your users to use it (unless they use the cumbersome construction base::"@"(e1,e2)). Efficiency may not matter to your use case, but it may matter to your users' other code that uses S4.
Practically, a reasonable compromise might be to define your own binary operator %@%, and have the default method call @. That is,
`%@%` <- function(e1,e2) slot(e1,substitute(e2))
setGeneric("%@%")
This is called in practice as follows:
> setClass("testClass",slots=c(a="character")) -> testClass
> x <- testClass(a="cheese")
> x %@% a
[1] "cheese"

Parsing ellipsis arguments

I'm trying to write a wrapper for some ggplot2 graphs and am trying to use the ellipsis (...) to make the function flexible. I want to save the user (n = 1, me!) from having to explicitly pass the axis titles, so I thought it might be possible to parse the ... arguments and set the axis titles appropriately. I've read in several Stack Overflow threads (e.g. 1, e.g. 2, or even the R documentation) that ellipsis arguments can be converted to a list using args <- list(...), so I have knocked up a simplified example...
test <- function(...){
  args <- list(...)
  is.list(args) %>% print()
  if(grepl('a', args)){
    title <- 'A'
  }
  else if(grepl('b', args)){
    title <- 'B'
  }
  return(title)
}
Testing the function I get what I expect when supplying a single a as an argument...
> test(a)
[1] TRUE
[1] "A"
But when I try passing other arguments, including multiple ones via the ellipsis, I don't understand what is happening. One non-a argument...
> test(b)
Error in test(b) (from #2) : object 'b' not found
...then the first argument as a with secondary ones...
> test(a, c, d)
Error in test(a, c, d) (from #2) : object 'd' not found
...or non a at first but something further down the line which should match....
> test(c, b, d)
Error in test(c, b, d) (from #2) : object 'b' not found
The problem seems to crop up at args <- list(...), because the result of the logical test of whether args is a list isn't printed, but this doesn't fit with what I've read list(...) does (which is to turn the ellipsis arguments into a list). I expect I may need to use something like args <- list(...) %>% unlist() in order to convert the list into a vector which can then be used as an argument to grepl() (and I have actually tried it, but as far as I can tell the error occurs before getting to the if()), but I don't understand what's going on and would be grateful for any explanations.
EDIT :
In light of the comments, it looks like this is a problem of my own creation, as I'm mixing standard and non-standard evaluation. I had been trying to write a wrapper around ggplot2 and have been using the NSE vignette and the lazyeval vignette to guide me (as well as various threads here on SO), but was faltering when trying to pick out specific variables from the ellipsis (...) to pass to the ggplot call I was making.
The downside is that work wants results and doesn't afford much time for learning or improving our coding practices, so I'll switch to using standard evaluation and have another stab at properly understanding non-standard evaluation in the future.
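For reference, the errors above arise because list(...) evaluates every argument, so any unquoted name that is not defined in the calling environment triggers "object not found" (test(a) presumably worked only because an object called a happened to exist). One possible way to get at the names as typed is to capture the ellipsis unevaluated. The following is a minimal base R sketch under that assumption, not the approach of the vignettes mentioned above; it also compares names exactly rather than using grepl().

```r
test <- function(...) {
  # substitute(list(...)) yields the call list(<arguments as written>);
  # dropping its first element (the function name) leaves the arguments.
  exprs <- as.list(substitute(list(...)))[-1]
  args <- vapply(exprs, deparse, character(1))   # turn each symbol into a string
  if ("a" %in% args) {
    "A"
  } else if ("b" %in% args) {
    "B"
  } else {
    NA_character_   # neither a nor b was supplied
  }
}

test(a)
## [1] "A"
test(c, b, d)   # none of these objects needs to exist
## [1] "B"
```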

Which R functions are not suitable for programmatic use?

Some functions like browser only make sense when used interactively.
It is widely regarded that the subset function should only be used interactively.
Similarly, sapply isn't good for programmatic use since it doesn't simplify the result for zero length inputs.
I'm trying to make an exhaustive list of functions that are not suitable for programmatic use.
The plan is to make a tool for package checking to see if any of these functions are called and give a warning.
There are other functions like file.choose and readline that require interactivity, but these are OK to include in packages, since the end use will be interactive. I don't care too much about these for this use case but feel free to add them to the list.
Which functions have I missed?
(Feel free to edit.)
The following functions should be handled with care (which does not necessarily mean they are not suitable for programming):
Functions whose output class is not consistent but depends on the inputs: sapply, mapply (by default)
Functions whose internal behavior is different depending on the input length: sample, seq
Functions that evaluate some of their arguments within environments: $, subset, with, within, transform.
Functions that go against normal environment usage: attach, detach, assign, <<-
Functions that allow partial matching: $
Functions that only make sense in interactive usage: browser, recover, debug, debugonce, edit, fix, menu, select.list
Functions that can be a security risk (e.g. arbitrary code execution) if used with user inputs: source, eval(parse(text=...)), system.
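A few of the pitfalls in this list can be reproduced in a handful of lines. The values below are arbitrary and only meant as a quick illustration in base R:

```r
# sapply() changes its return type on zero-length input; vapply() does not
s <- sapply(integer(0), function(i) i * 2)
v <- vapply(integer(0), function(i) i * 2, numeric(1))
class(s)
## [1] "list"
class(v)
## [1] "numeric"

# sample() treats a single positive number n as 1:n
x <- 10
length(sample(x))   # a permutation of 1:10, not just the value 10
## [1] 10

# `$` partially matches names on lists; `[[` is exact by default
lst <- list(value = 1)
lst$val
## [1] 1
lst[["val"]]
## NULL
```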
Also, to some extent, every function that generates warnings rather than errors. I recommend using options(warn = 2) to turn all warnings into errors in a programming application. Specific cases can then be allowed via suppressWarnings or try.
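As a small sketch of the options(warn = 2) suggestion, the snippet below promotes a coercion warning to an error and catches it. The example call and the "stopped" return value are arbitrary choices for illustration.

```r
old <- options(warn = 2)           # promote warnings to errors
res <- tryCatch(as.integer("x"),   # would normally warn and return NA
                error = function(e) "stopped")
options(old)                       # restore the previous setting
res
## [1] "stopped"
```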
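This is in answer to the poster's comment below the question. The function takes a function as input and returns the bad functions found, each with the position of the top-level expression in the function body where it occurs (position 1 is the opening brace, so the position coincides with the line number only when every line of the function holds exactly one statement). It can generate false positives, but they are only warnings anyway, so that does not seem too bad. Modify bad to suit.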
badLines <- function(func) {
  bad <- c("sapply", "subset", "attach")  # functions to flag
  regex <- paste0("\\b", bad, "\\b")  # match each name as a whole word
  result <- sort(unlist(sapply(regex, FUN = grep, body(func), simplify = FALSE)))
  setNames(result, gsub("\\b", "", names(result), fixed = TRUE))  # drop the literal \b markers
}
badLines(badLines)
## sapply1 subset attach sapply2
## 2 2 2 4

Resources