Deparse, substitute with three-dots arguments - r

Let consider a typical deparse(substitute( R call:
f1 <-function(u,x,y)
{print(deparse(substitute(x)))}
varU='vu'
varX='vx'
varY='vy'
f1(u=varU,x=varX,y=varY)
That results in
[1] "varX"
which is what we expect and we want.
Then, comes the trouble, I try to get a similar behaviour using the ... arguments i.e.
f2 <- function(...)
{ l <- list(...)
x=l$x
print(deparse(substitute(x))) ### this cannot work but I would like something like that
}
That, not surprisingly, does not work :
f2(u=varU,x=varX,y=varY)
[1] "\"vx\"" ### wrong ! I would like "varX"
I tried to get the expected behaviour using a different combination of solutions but none provides me what expected and it seems that I am still not clear enough on lazy eval to find myself the how-to in a resonable amount of time.

You can get the list of all unevaluated arguments by doing
match.call(expand.dots = FALSE)$...
Or, if you only have dot arguments, via
as.list(match.call()[-1L])
This will give you a named list, similarly to list(...), but in its unevaluated form (similarly to what substitute does on a single argument).
An alternative is using rlang::quos(...) if you’re willing to use the {rlang} package, which returns a similar result in a slightly different form.

Related

How does R evaluate these weird expressions?

I was trying to make Python 3-style assignment unpacking possible in R (e.g., a, *b, c = [1,2,3], "C"), and although I got so close (you can check out my code here), I ultimately ran into a few (weird) problems.
My code is meant to work like this:
a %,*% b %,% c <- c(1,2,3,4,5)
and will assign a = 1, b = c(2,3,4) and c = 5 (my code actually does do this, but with one small snag I will get to later).
In order for this to do anything, I have to define:
`%,%` <- function(lhs, rhs) {
...
}
and
`%,%<-` <- function(lhs, rhs, value) {
...
}
(as well as %,*% and %,*%<-, which are slight variants of the previous functions).
First issue: why R substitutes *tmp* for the lhs argument
As far as I can tell, R evaluates this code from left to right at first (i.e., going from a to c, until it reaches the last %,%, where upon, it goes back from right to left, assigning values along the way. But the first weird thing I noticed is that when I do match.call() or substitute(lhs) in something like x %infix% y <- z, it says that the input into the lhs argument in %infix% is *tmp*, instead of say, a or x.
This is bizarre to me, and I couldn't find any mention of it in the R manual or docs. I actually make use of this weird convention in my code (i.e., it doesn't show this behavior on the righthand side of the assignment, so I can use the presence of the *tmp* input to make %,% behave differently on this side of the assignment), but I don't know why it does this.
Second issue: why R checks for object existence before anything else
My second problem is what makes my code ultimately not work. I noticed that if you start with a variable name on the lefthand side of any assignment, R doesn't seem to even start evaluating the expression---it returns the error object '<variable name>' not found. I.e., if x is not defined, x %infix% y <- z won't evaluate, even if %infix% doesn't actually use or evaluate x.
Why does R behave like this, and can I change it or get around it? If I could to run the code in %,% before R checks to see if x exists, I could probably hack it so that I wouldn't be a problem, and my Python unpacking code would be useful enough to actually share. But as it is now, the first variable needs to already exist, which is just too limiting in my opinion. I know that I could probably do something by changing the <- to a custom infix operator like %<-%, but then my code would be so similar to the zeallot package, that I wouldn't consider it worth it. (It's already very close in what it does, but I like my style better.)
Edit:
Following Ben Bolker's excellent advice, I was able to find a way around the problem... by overwriting <-.
`<-` <- function(x, value) {
base::`<-`(`=`, base::`=`)
find_and_assign(match.call(), parent.frame())
do.call(base::`<-`, list(x = substitute(x), value = substitute(value)),
quote = FALSE, envir = parent.frame())
}
find_and_assign <- function(expr, envir) {
base::`<-`(`<-`, base::`<-`)
base::`<-`(`=`, base::`=`)
while (is.call(expr)) expr <- expr[[2]]
if (!rlang::is_symbol(expr)) return()
var <- rlang::as_string(expr) # A little safer than `as.character()`
if (!exists(var, envir = envir)) {
assign(var, NULL, envir = envir)
}
}
I'm pretty sure that this would be a mortal sin though, right? I can't exactly see how it would mess anything up, but the tingling of my programmer senses tells me this would not be appropriate to share in something like a package... How bad would this be?
For your first question, about *tmp* (and maybe related to your second question):
From Section 3.4.4 of the R Language definition:
Assignment to subsets of a structure is a special case of a general mechanism for complex assignment:
x[3:5] <- 13:15
The result of this command is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
Note that the index is first converted to a numeric index and then the elements are replaced sequentially along the numeric index, as if a for loop had been used. Any existing variable called *tmp* will be overwritten and deleted, and this variable name should not be used in code.
The same mechanism can be applied to functions other than [. The replacement function has the same name with <- pasted on. Its last argument, which must be called value, is the new value to be assigned.
I can imagine that your second problem has to do with the first step of the "as if" code: if R is internally trying to evaluate *tmp* <- x, it may be impossible to prevent from trying to evaluate x at this point ...
If you want to dig farther, I think the internal evaluation code used to deal with "complex assignment" (as it seems to be called in the internal comments) is around here ...

rlang: Error: Can't convert a function to a string

I created a function to convert a function name to string. Version 1 func_to_string1 works well, but version 2 func_to_string2 doesn't work.
func_to_string1 <- function(fun){
print(rlang::as_string(rlang::enexpr(fun)))
}
func_to_string2 <- function(fun){
is.function(fun)
print(rlang::as_string(rlang::enexpr(fun)))
}
func_to_string1 works:
> func_to_string1(sum)
[1] "sum"
func_to_string2 doesn't work.
> func_to_string2(sum)
Error: Can't convert a primitive function to a string
Call `rlang::last_error()` to see a backtrace
My guess is that by calling the fun before converting it to a string, it gets evaluated inside function and hence throw the error message. But why does this happen since I didn't do any assignments?
My questions are why does it happen and is there a better way to convert function name to string?
Any help is appreciated, thanks!
This isn't a complete answer, but I don't think it fits in a comment.
R has a mechanism called pass-by-promise,
whereby a function's formal arguments are lazy objects (promises) that only get evaluated when they are used.
Even if you didn't perform any assignment,
the call to is.function uses the argument,
so the promise is "replaced" by the result of evaluating it.
Nevertheless, in my opinion, this seems like an inconsistency in rlang*,
especially given cory's answer,
which implies that R can still find the promise object even after a given parameter has been used;
the mechanism to do so might not be part of R's public API though.
*EDIT: see coments.
Regardless, you could treat enexpr/enquo/ensym like base::missing,
in the sense that you should only use them with parameters you haven't used at all in the function's body.
Maybe use this instead?
func_to_string2 <- function(fun){
is.function(fun)
deparse(substitute(fun))
#print(rlang::as_string(rlang::enexpr(fun)))
}
> func_to_string2(sum)
[1] "sum"
This question brings up an interesting point on lazy evaluations.
R arguments are lazily evaluated, meaning the arguments are not evaluated until its required.
This is best understood in the Advanced R book which has the following example,
f <- function(x) {
10
}
f(stop("This is an error!"))
the result is 10, which is surprising because x is never called and hence never evaluated. We can force x to be evaluated by using force()
f <- function(x) {
force(x)
10
}
f(stop("This is an error!"))
This behaves as expected. In fact we dont even need force() (Although it is good to be explicit).
f <- function(x) {
x
10
}
f(stop("This is an error!"))
This what is happening with your call here. The function sum which is a symbol initially is being evaluated with no arguments when is.function() is being called. In fact, even this will fail.
func_to_string2 <- function(fun){
fun
print(rlang::as_string(rlang::ensym(fun)))
}
Overall, I think its best to use enexpr() at the very beginning of the function.
Source:
http://adv-r.had.co.nz/Functions.html

Confused by ...()?

In another question, sapply(substitute(...()), as.character) was used inside a function to obtain the names passed to the function. The as.character part sounds fine, but what on earth does ...() do?
It's not valid code outside of substitute:
> test <- function(...) ...()
> test(T,F)
Error in test(T, F) : could not find function "..."
Some more test cases:
> test <- function(...) substitute(...())
> test(T,F)
[[1]]
T
[[2]]
F
> test <- function(...) substitute(...)
> test(T,F)
T
Here's a sketch of why ...() works the way it does. I'll fill in with more details and references later, but this touches on the key points.
Before performing substitution on any of its components, substitute() first parses an R statement.
...() parses to a call object, whereas ... parses to a name object.
... is a special object, intended only to be used in function calls. As a consequence, the C code that implements substitution takes special measures to handle ... when it is found in a call object. Similar precautions are not taken when ... occurs as a symbol. (The relevant code is in the functions do_substitute, substitute, and substituteList (especially the latter two) in R_SRCDIR/src/main/coerce.c.)
So, the role of the () in ...() is to cause the statement to be parsed as a call (aka language) object, so that substitution will return the fully expanded value of the dots. It may seem surprising that ... gets substituted for even when it's on the outside of the (), but: (a) calls are stored internally as list-like objects and (b) the relevant C code seems to make no distinction between the first element of that list and the subsequent ones.
Just a side note: for examining behavior of substitute or the classes of various objects, I find it useful to set up a little sandbox, like this:
f <- function(...) browser()
f(a = 4, 77, B = "char")
## Then play around within the browser
class(quote(...)) ## quote() parses without substituting
class(quote(...()))
substitute({...})
substitute(...(..., X, ...))
substitute(2 <- (makes * list(no - sense))(...))

find all functions in a package that use a function

I would like to find all functions in a package that use a function. By functionB "using" functionA I mean that there exists a set of parameters such that functionA is called when functionB is given those parameters.
Also, it would be nice to be able to control the level at which the results are reported. For example, if I have the following:
outer_fn <- function(a,b,c) {
inner_fn <- function(a,b) {
my_arg <- function(a) {
a^2
}
my_arg(a)
}
inner_fn(a,b)
}
I might or might not care to have inner_fn reported. Probably in most cases not, but I think this might be difficult to do.
Can someone give me some direction on this?
Thanks
A small step to find uses of functions is to find where the function name is used. Here's a small example of how to do that:
findRefs <- function(pkg, fn) {
ns <- getNamespace(pkg)
found <- vapply(ls(ns, all.names=TRUE), function(n) {
f <- get(n, ns)
is.function(f) && fn %in% all.names(body(f))
}, logical(1))
names(found[found])
}
findRefs('stats', 'lm.fit')
#[1] "add1.lm" "aov" "drop1.lm" "lm" "promax"
...To go further you'd need to analyze the body to ensure it is a function call or the FUN argument to an apply-like function or the f argument to Map etc... - so in the general case, it is nearly impossible to find all legal references...
Then you should really also check that getting the name from that function's environment returns the same function you are looking for (it might use a different function with the same name)... This would actually handle your "inner function" case.
(Upgraded from a comment.) There is a very nice foodweb function in Mark Bravington's mvbutils package with a lot of this capability, including graphical representations of the resulting call graphs. This blog post gives a brief description.

Can an R function access its own name?

Can you write a function that prints out its own name?
(without hard-coding it in, obviously)
You sure can.
fun <- function(x, y, z) deparse(match.call()[[1]])
fun(1,2,3)
# [1] "fun"
You can, but just in case it's because you want to call the function recursively see ?Recall which is robust to name changes and avoids the need to otherwise process to get the name.
Recall package:base R Documentation
Recursive Calling
Description:
‘Recall’ is used as a placeholder for the name of the function in
which it is called. It allows the definition of recursive
functions which still work after being renamed, see example below.
As you've seen in the other great answers here, the answer seems to be "yes"...
However, the correct answer is actually "yes, but not always". What you can get is actually the name (or expression!) that was used to call the function.
First, using sys.call is probably the most direct way of finding the name, but then you need to coerce it into a string. deparse is more robust for that.
myfunc <- function(x, y=42) deparse(sys.call()[[1]])
myfunc (3) # "myfunc"
...but you can call a function in many ways:
lapply(1:2, myfunc) # "FUN"
Map(myfunc, 1:2) # (the whole function definition!)
x<-myfunc; x(3) # "x"
get("myfunc")(3) # "get(\"myfunc\")"
The basic issue is that a function doesn't have a name - it's just that you typically assign the function to a variable name. Not that you have to - you can have anonymous functions - or assign many variable names to the same function (the x case above).

Resources