ellipsis ... as function in substitute? - r

I'm having trouble understanding how/why parentheses work where they otherwise should not work®.
f = function(...) substitute(...()); f(a, b)
[[1]]
a
[[2]]
b
# but, substitute returns ..1
f2 = function(...) substitute(...); f2(a, b)
a
Normally an error is thrown, could not find function "..." or '...' used in an incorrect context, for example when calling (\(...) ...())(5).
What I've tried
I have looked at the source code of substitute to find out why this doesn't happen here. R Internals 1.1.1 and 1.5.2 says ... is of SEXPTYPE DOTSXP, a pairlist of promises. These promises are what is extracted by substitute.
# \-substitute #R
# \-do_substitute #C
# \-substituteList #C recursive
# \-substitute #C
Going line-by-line, I am stuck at substituteList, in which h is the current element of ... being processed. This happens recursively at line 2832 if (TYPEOF(h) == DOTSXP) h = substituteList(h, R_NilValue);. I haven't found exception handling of a ...() case in the source code, so I suspect something before this has happened.
In ?substitute we find substitute works on a purely lexical basis. Does it mean ...() is a parser trick?
parse(text = "(\\(...) substitute(...()))(a, b)") |> getParseData() |> subset(text == "...", select = c(7, 9))
#> token text
#> 4 SYMBOL_FORMALS ...
#> 10 SYMBOL_FUNCTION_CALL ...
The second ellipsis is recognized during lexical analysis as the name of a function call. It doesn't have its own token like |> does. The output is a pairlist ( typeof(f(a, b)) ), which in this case is the same as a regular list (?). I guess it is not a parser trick. But whatever it is, it has been around for a while!
Question:
How does ...() work?

Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).
substitute is a "special" whose arguments are not evaluated (doc).
typeof(substitute)
[1] "special"
That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of
findVarInFrame3(rho, R_DotsSymbol, TRUE)
in the body of C utility substituteList (source) is either a DOTSXP or R_MissingArg—the latter if and only if f was called without arguments (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP—the latter if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.
The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:
f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1
Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
Internals
When called inside of the R function f,
f <- function(...) substitute(...(<exprs>))
substitute evaluates a call to the C utility do_substitute—you can learn this by looking here—in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).
If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).
It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the statement CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration.
In the false branch of the conditional (source), h gets the value
pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).
Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the statement CAR(el) == R_DotsSymbol in the body of the loop is true.
In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:
R_NilValue if h was R_MissingArg in the first while iteration and m = 0,
or, otherwise,
a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 1), all unevaluated and without substitutions, because the execution environment of f is empty at the time of the substitute call.
Indeed:
f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
Misc
FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):
Rprintf("CAR(t) == R_DotsSymbol? %d\n",
CAR(t) == R_DotsSymbol);
if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
type2char(TYPEOF(CAR(t))), length(CAR(t)));
Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
CAR(CAR(t)) == R_DotsSymbol);
Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
}
if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
type2char(TYPEOF(s)), length(s));
Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
type2char(TYPEOF(CAR(s))), length(CAR(s)));
}
which helped me confirm what was going into and coming out of the substituteList call on the previous line:
f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1
Obviously, compiling R with debugging symbols and running R under a debugger helps, too.
Another puzzle
Just noticed this oddity:
g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)
Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).

Related

Modifying calls in function arguments

How can a function inspect and modify the arguments of a call that it received as argument?
Application: A user feeds a call to function a as an argument to function b, but they forget to specify one of the required arguments of a. How can function b detect the problem and fix it?
In this minimal example, function a requires two arguments:
a <- function(arg1, arg2) {
return(arg1 + arg2)
}
Function b accepts a call and an argument. The commented lines indicate what I need to do:
b <- function(CALL, arg3) {
# 1. check if `arg2` is missing from CALL
# 2. if `arg2` is missing, plug `arg3` in its place
# 3. return evaluated call
CALL
}
Expected behavior:
b(CALL = a(arg1 = 1, arg2 = 2), arg3 = 3)
> 3
b(CALL = a(arg1 = 1), arg3 = 3)
> 4
The second call currently fails because the user forgot to specify the required arg2 argument. How can function b fix this mistake automatically?
Can I exploit lazy evaluation to modify the call to a before it is evaluated? I looked into rlang::modify_call but couldn't figure it out.
Here's a method that would work
b <- function(CALL, arg3) {
scall <- substitute(CALL)
stopifnot(is.call(scall)) #check that it's a call
lcall <- as.list(scall)
if (!"arg2" %in% names(lcall)) {
lcall <- c(lcall, list(arg2 = arg3))
}
eval.parent(as.call(lcall))
}
We use substitute() to grab the unevaluated version the CALL parameter. We convert it to a list so we can modify it. Then we append to the list another list with the parameter name/value we want. Finally, we turn the list back into a call and then evaluate that call in the environment of the caller rather than in the function body itself.
If you wanted to use rlang::modify_call and other rlang functions you could use
b <- function(CALL, arg3) {
scall <- rlang::enquo(CALL)
stopifnot(rlang::quo_is_call(scall))
if (!"arg2" %in% names(rlang::quo_get_expr(scall))) {
scall <- rlang::call_modify(scall, arg2=arg3)
}
rlang::eval_tidy(scall, env = rlang::caller_env())
}
I don't see why fancy language manipulation is needed. The problem is what to do when a, which requires 2 arguments, is supplied only 1. Wrapping it with b, which has a default value for the 2nd argument, solves this.
b <- function(arg1, arg2=42)
{
a(arg1, arg2)
}
b(1)
# [1] 43
b(1, 2)
# [1] 3

How to get line number of a function call in R?

For debug purposes, I want to print a line number (and function name) of the place the current function was called from. How do I get this in R?
I've seen a solution of getting the source file name
But how to get the line number and function name?]
EDIT: I found how to get this data from traceback() in some form, traceback is able to print it out, but I am not sure how to decode the information out of it:
f <- function () {
traceback(x = 3, max.lines = 1)
}
g <- function()
{
f()
}
x <- g()
source("file.R") # file with this code
# 5: g() at file.R#20
# 4: eval(ei, envir)
# 3: eval(ei, envir)
# 2: withVisible(eval(ei, envir))
# 1: source("file.R")
str(x[[1]])
# chr "g()"
# - attr(*, "srcref")= 'srcref' int [1:8] 20 1 20 8 1 8 20 20
# ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x0000000013a31700>
Found a solution! Got it from the code of traceback():
f <- function ()
{
x <- .traceback(x = 1)
srcloc <- if (!is.null(srcref <- attr(x[[1]], "srcref"))) {
srcfile <- attr(srcref, "srcfile")
paste0("Called from ", x[[2]], ", at ", basename(srcfile$filename), "#", srcref[1])
}
cat(srcloc, "\n")
}
g <- function()
{
f()
}
g()
# Called from g(), at file.R#15
Wrote a nice wrapper function for it:
# returns a list, unless fmtstring is specified
# level: 1 - caller of the caller of this function; 2 - its parent, 3 - its grand-parent etc.
# fmtstring: return format string: %f (function), %s (source file), %l (line)
#
# example: str <- caller_info("Called from %f at %s#%l\n")
# !!! it won't work with e.g. cat(caller_info("Called from %f at %s#%l\n"))
# or cat(paste0(caller_info("Called from %f at %s#%l\n"))) !!!
caller_info <- function (fmtstring = NULL, level = 1) # https://stackoverflow.com/q/59537482/684229
{
x <- .traceback(x = level + 1)
i <- 1
repeat { # loop for subexpressions case; find the first one with source reference
srcref <- getSrcref(x[[i]])
if (is.null(srcref)) {
if (i < length(x)) {
i <- i + 1
next;
} else {
warning("caller_info(): not found\n")
return (NULL)
}
}
srcloc <- list(fun = getSrcref(x[[i+1]]), file = getSrcFilename(x[[i]]), line = getSrcLocation(x[[i]]))
break;
}
if (is.null(fmtstring))
return (srcloc)
fmtstring <- sub("%f", paste0(srcloc$fun, collapse = ""), fmtstring)
fmtstring <- sub("%s", srcloc$file, fmtstring)
fmtstring <- sub("%l", srcloc$line, fmtstring)
fmtstring
}
This is how it's used:
f <- function ()
{
str <- caller_info("Called from %f at %s#%l\n")
cat(str)
}
The only (minor) limitation is that when called in subexpressions like cat(caller_info("Called from %f at %s#%l\n")) or cat(paste0(caller_info("Called from %f at %s#%l\n"))), R confusingly counts these subexpression things as stack levels, which messes it up. So better avoid the use of this wrapper in expressions.
There aren't easy functions to give you what you're asking for, but for debugging purposes you can call browser() in a function, then run the where command to see the current call stack. For example, you might see something like this:
where 1: calls()
where 2 at ~/temp/test.R#6: print(calls())
where 3 at ~/temp/test.R#9: f()
where 4: eval(ei, envir)
where 5: eval(ei, envir)
where 6: withVisible(eval(ei, envir))
where 7: source("~/temp/test.R", echo = TRUE)
This gives locations for a couple of the calls, but not all of them.
If you really want something that prints as you go (like the __LINE__ and __FILE__ macros in C/C++), it's a little harder. This prints the current location:
cat("This is line ", getSrcLocation(function() {}, "line"),
" of ", getSrcFilename(function() {}))
Not all functions have names, and R functions don't know what name you called them under, but you can see the current call using sys.call(). So this prints everything:
cat("This is line ", getSrcLocation(function() {}, "line"),
" of ", getSrcFilename(function() {}),
" called as", deparse(sys.call()),
"\n")
which might print
This is line 3 of test.R called as f()
sys.call has an argument to move up the stack, but I don't know of a way to get the line number information.
You can get the location of the start of the function that made the current call using
cat("Called from ", getSrcFilename(sys.function(-1)), " line ", getSrcLocation(sys.function(-1), "line"),
" as ", deparse(sys.call()), "\n")
which will show you the code that made the call, but the line number is only for the function it came from. It's a good argument for keeping your functions short!

Why does "Sum()" succeed where "+" fails in recursive R function?

I am experimenting with the functional programming paradigm in R. I have defined a function that sums a sequence of integers from n to m. When I use sum() the function returns the expected result:
sumRange <- function(n, m) {
if (n <= m) {
return(sum(n, sumRange((n + 1), m)))
}
}
sumRange(1, 10)
# [1] 55
However, when I use the + operator the function returns numeric(0):
sumRange <- function(n, m) {
if (n <= m) {
return(n + sumRange((n + 1), m))
}
}
sumRange(1, 10)
# numeric(0)
Why does the operator + not work in this recursive function? Is there a way to rewrite the function so that it does?
The issue is that you never specify an else condition, hence at the end of the recursion it appears that R is returning NULL when the if condition fails. Returning 0 as the else condition fixes your problem:
sumRange <- function(n, m) return(ifelse (n <= m, (n + sumRange((n+1), m)), 0))
sumRange(1, 10)
[1] 55
Note that this is essentially defining a base case for your recursion. A base case, when hit, ends the recursion and causes the calls on the stack to be unwound.
To see the issue with the way you phrased your code, try writing out your function explicitly:
sumRange <- function(n, m) {
if (n <= m) {
return(n + sumRange((n+1), m))
}
// but what gets returned if n > m ?
// this is undefined behavior
}
I'm not an R guru, but my understanding is that R was written in C, and C might allow a recursion like this with no else condition. But the behavior is not well defined and you should not be relying on it.
Demo
If there is no return (using a explicit or implicit return statement) is executed, then R functions seems to return a NULL object.
If you add numerical value to a this object, it will simply return numeric(0).
So, what happens in the second case is that when n reaches 11, it returns a NULL object, and goes back adding values to it. But NULL + 10 + 9 .. = numeric(0).
Check this with
no_ret <- function ()
{
# just return nothing
}
obj <- no_ret()
obj
# NULL
class(obj)
# "NULL
new_obj <- obj + 10
new_obj
# numeric(0)
When the first function is executed, the what the sum statement get is
a vector with a NULL in it. For example,
vec <- c(NULL, 10, 9,...) which is actually vec <- c(10, 9, ...), so you get the expected outcome.
> c(NULL, 10:1)
[1] 10 9 8 7 6 5 4 3 2 1
> sum(NULL, 10:1)
[1] 55
> NULL + 10:1
integer(0)

Calling print(ls.str()) in function affect behavior of rep

Begin a new R session with an empty environment. Write a series of functions with a parameter that is to be used as the value of the times parameter in a call to rep().
f <- function(n) {
rep("hello", times = n)
}
f(x)
One expect this to fail, and indeed one gets:
# Error in f(x) : object 'x' not found
Modify the function a bit:
f2 <- function(n) {
ls.str()
rep("hello", times = n)
}
f2(x)
As expected, it still fails:
# Error in f2(x) : object 'x' not found
Modify a bit more (to see the environment in the console):
f3 <- function(n) {
print(ls.str())
rep("hello", times = n)
}
f3(x)
I still expect failure, but instead get:
## n : <missing>
## [1] "hello"
It is as if the call to print() makes rep work as though times were set to 1.
This is not an answer, but too long to post as a comment. A minimal reproducible example is:
f3 <- function(n) {
try(get("n", environment(), inherits=FALSE))
rep("hello", times = n)
}
f3(x)
## Error in get("n", environment(), inherits = FALSE) : object 'x' not found
## [1] "hello"
The following is speculative and based on loosely examining the source for do_rep. get starts the promise evaluation, but upon not finding the "missing" symbol appears to leave the promise partially unevaluated. rep, being a primitive, then attempts to operate on n without realizing that it is a partially evaluated promise and basically that leads implicitly to the assumption that 'n == 1'.
Also, this shows that the promise is in a weird state (have to use browser/debug to see it):
f3a <- function(n) {
try(get("n", environment(), inherits=FALSE))
browser()
rep("hello", times = n)
}
f3a(x)
## Error in get("n", environment(), inherits = FALSE) : object 'x' not found
## Called from: f3a(x)
# Browse[1]> (n)
## Error: object 'x' not found
## In addition: Warning message:
## restarting interrupted promise evaluation
## Browse[1]> c
## [1] "hello"
I received earlier today a report that the bug has been fixed in R-devel and R-patched.
The issue was that the test for missingness in the R sources did not consider the case of an interrupted promise evaluation. A fix has been committed by Luke Tierney and can be seen on GitHub.
f4 <- function(n) {
print('test')
print(ls.str())
print('end test')
rep("hello", times = n)
}
f4(x)
## [1] "test"
## n : <missing>
## [1] "end test"
## [1] "hello"
There's something within print.ls_str, from Frank's test on chat the follwing code exhibit the same problem:
f6 <- function(n) {
z = tryCatch(get("n", new.env(), mode = "any"), error = function(e) e)
rep("A", n)
}
Digging a little inside R source I found the following code
# define GET_VALUE(rval) \
/* We need to evaluate if it is a promise */ \
if (TYPEOF(rval) == PROMSXP) { \
PROTECT(rval); \
rval = eval(rval, genv); \
UNPROTECT(1); \
} \
\
if (!ISNULL(rval) && NAMED(rval) == 0) \
SET_NAMED(rval, 1)
GET_VALUE(rval);
break;
case 2: // get0(.)
if (rval == R_UnboundValue)
return CAD4R(args);// i.e. value_if_not_exists
GET_VALUE(rval);
break;
}
return rval;
}
#undef GET_VALUE
I'm quite surprised this compile properly, as far as I remember (my C is quite far behind) #define doesn't allow spaces between the # and define.
After digging for that, I'm wrong, from gcc doc:
Whitespace is also allowed before and after the `#'.
So there's probably something around this part of code, but that's above my head to pinpoint what exactly.

Guidance on missing() in R

Suppose a function, G, takes two arguments; a and b: G(a = some number, b = some number).
Now two situations (wondering what commands to use in each case?):
1- if a user puts G(b = some number), will the if(missing(a)){do this} recognize the complete absence of a argument? AND more importantly:
2- if a user puts G(a =, b = some number), still will the if(missing(a)){do this} recognize a = but lack of some number in front of it?
Defining the function as below doesn't throw an error in both the cases:
ch <- function(a=NA,b=NA){ if(is.na(a)) return(b) else( return(a+b)) }
> ch(b=2)
[1] 2
> ch(a=,b=2)
[1] 2

Resources