I'm having trouble understanding how and why parentheses work in a place where they otherwise should not.
f = function(...) substitute(...()); f(a, b)
[[1]]
a
[[2]]
b
# but, substitute returns ..1
f2 = function(...) substitute(...); f2(a, b)
a
Normally an error is thrown, could not find function "..." or '...' used in an incorrect context, for example when calling (\(...) ...())(5).
What I've tried
I have looked at the source code of substitute to find out why this doesn't happen here. R Internals 1.1.1 and 1.5.2 say that ... is of SEXPTYPE DOTSXP, a pairlist of promises. These promises are what substitute extracts.
# \-substitute #R
# \-do_substitute #C
# \-substituteList #C recursive
# \-substitute #C
Going line-by-line, I am stuck at substituteList, in which h is the current element of ... being processed. This happens recursively at line 2832 if (TYPEOF(h) == DOTSXP) h = substituteList(h, R_NilValue);. I haven't found exception handling of a ...() case in the source code, so I suspect something before this has happened.
In ?substitute we find substitute works on a purely lexical basis. Does it mean ...() is a parser trick?
parse(text = "(\\(...) substitute(...()))(a, b)") |> getParseData() |> subset(text == "...", select = c(7, 9))
#> token text
#> 4 SYMBOL_FORMALS ...
#> 10 SYMBOL_FUNCTION_CALL ...
The second ellipsis is recognized during lexical analysis as the name of a function call. It doesn't have its own token like |> does. The output is a pairlist ( typeof(f(a, b)) ), which in this case is the same as a regular list (?). I guess it is not a parser trick. But whatever it is, it has been around for a while!
Question:
How does ...() work?
Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).
substitute is a "special" whose arguments are not evaluated (doc).
typeof(substitute)
[1] "special"
That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of
findVarInFrame3(rho, R_DotsSymbol, TRUE)
in the body of C utility substituteList (source) is either a DOTSXP or R_MissingArg—the latter if and only if f was called without arguments (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP—the latter if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.
The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:
f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1
Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
Internals
When called inside of the R function f,
f <- function(...) substitute(...(<exprs>))
substitute evaluates a call to the C utility do_substitute—you can learn this by looking here—in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).
If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).
It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the statement CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration.
In the false branch of the conditional (source), h gets the value
pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).
Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the statement CAR(el) == R_DotsSymbol in the body of the loop is true.
In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:
R_NilValue if h was R_MissingArg in the first while iteration and m = 0,
or, otherwise,
a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 0), all unevaluated and without substitutions, because the execution environment of f is empty at the time of the substitute call.
Indeed:
f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
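The same checks can be run from the R side without reading any C. The following is a minimal sketch using only base R; the name capture_dots is made up for illustration and does not appear in the R sources.

```r
# capture_dots is a hypothetical name; the body is the idiom under discussion.
capture_dots <- function(...) substitute(...())

res <- capture_dots(a, b + 1)
typeof(res)            # "pairlist", as discussed above
length(res)            # one element per argument passed through ...
res[[2]]               # the unevaluated expression b + 1

# The arguments are never evaluated, so this call does not raise an error.
lazy <- capture_dots(stop("never evaluated"))

# With no arguments the result is NULL.
empty <- capture_dots()

# as.list() turns the pairlist into a regular list if one is needed.
as_regular <- as.list(res)
```

Neither a nor b needs to exist for any of this to run, which is consistent with substitute working on the unevaluated promises.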
Misc
FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):
Rprintf("CAR(t) == R_DotsSymbol? %d\n",
CAR(t) == R_DotsSymbol);
if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
type2char(TYPEOF(CAR(t))), length(CAR(t)));
Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
CAR(CAR(t)) == R_DotsSymbol);
Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
}
if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
type2char(TYPEOF(s)), length(s));
Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
type2char(TYPEOF(CAR(s))), length(CAR(s)));
}
which helped me confirm what was going into and coming out of the substituteList call on the previous line:
f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1
Obviously, compiling R with debugging symbols and running R under a debugger helps, too.
Another puzzle
Just noticed this oddity:
g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)
Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).
I implemented a function that calculates the value of the gamma function. When I try to multiply f5(a) by a number, I get the error: Error in result * f5(a) : non-numeric argument to binary operator. If I instead use result * gamma(a), with the predefined function, it works just fine. It seems I can't do any arithmetic with f5, even though it appears to return the same result as gamma.
f5 <- function(a)
{
f <- function(x)
x^(a-1)*exp(-x)
integrate(f, 0, Inf)
}
f6 <- function(a)
{
if (a < 0)
print("a is negative")
else if (a%%1 == 0)
return (factorial(a-1))
else
{
result <- 1
while (a > 1)
{
result <- result * (a - 1)
a <- a - 1
}
result <- result * f5(a)
result
}
}
gamma(0.3)
f5(0.3)
f6(0.3)
This is because of the class of object that gets returned from f5().
class(f5(0.3))
[1] "integrate"
This is a named list, and you can extract individual components from it:
names(f5(0.3))
[1] "value" "abs.error" "subdivisions" "message" "call"
You want the value component. Modifying f6() to the code below makes it work:
f6 <- function(a){
if (a < 0){
print("a is negative")
}else if (a%%1 == 0){
return (factorial(a-1))
}else{
result <- 1
while (a > 1){
result <- result * (a - 1)
a <- a - 1
}
result <- result * f5(a)$value
result
}
}
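As a quick check of the point above, here is a minimal sketch that compares the value component with the built-in gamma. The shape a = 2.5 and the variable names are chosen only for illustration.

```r
# Integrand of the gamma function for a fixed shape a (illustrative value).
a <- 2.5
integrand <- function(x) x^(a - 1) * exp(-x)

res <- integrate(integrand, lower = 0, upper = Inf)

class(res)             # "integrate"
is.numeric(res)        # FALSE: the object is a list, so arithmetic on it fails
is.numeric(res$value)  # TRUE: the numeric estimate lives in the value component

# Arithmetic works once the value component is extracted.
doubled <- 2 * res$value

# The estimate should agree with gamma() up to the integration tolerance.
agrees <- isTRUE(all.equal(res$value, gamma(a), tolerance = 1e-3))
```

The abs.error component of the same object gives an estimate of the integration error, which is worth checking before relying on the value.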
I wish I knew how to make this example smaller, but I don't understand the problem well enough to do that.
I have a package that rewrites R functions to make them tail-recursive: tailr. It does a bit of analysis of a recursive function and then translates it into a looping function. For example, it will translate this factorial function
factorial <- function(n, acc) {
if (n <= 1) acc
else factorial(n - 1, acc * n)
}
into this version
factorial <- function(n, acc) {
.tailr_n <- n
.tailr_acc <- acc
callCC(function(escape) {
repeat {
n <- .tailr_n
acc <- .tailr_acc
if (n <= 1)
escape(acc)
else {
.tailr_n <<- n - 1
.tailr_acc <<- acc * n
}
}
})
}
The generated function is not pretty, but it does work.
My problem is if I write a package that uses the transformation, one that contains only these lines of R:
#' Computes the factorial.
#' @param n A number
#' @param acc Accumulator to make the function tail-recursive
#' @return factorial of n
#' @export
factorial <- function(n, acc) {
if (n <= 1) acc
else factorial(n - 1, acc * n)
}
#' Computes the factorial.
#' @param n A number
#' @return factorial of n
#' @param acc Accumulator to make the function tail-recursive
#' @export
factorial_loop <- tailr::loop_transform(factorial)
running devtools::check() gives me this error:
Error in attr(e, "srcref")[[i]] : subscript out of bounds
Calls: <Anonymous> ... <Anonymous> -> collectUsage -> collectUsageFun -> walkCode -> h
Execution halted
If I put a dummy version of the transformation into the package, I do not get an error
dummy_transform_body <- function(expr) {
rlang::expr({
.tailr_n <- n
.tailr_acc <- acc
callCC(function(escape) {
repeat {
n <- .tailr_n
acc <- .tailr_acc
if (n <= 1)
escape(acc)
else {
.tailr_n <<- n - 1
.tailr_acc <<- acc * n
}
}
})
})
}
dummy_transform <- function(fun) {
fun_q <- rlang::enquo(fun)
new_fun_body <- dummy_transform_body(body(fun))
result <- rlang::new_function(
args = formals(fun),
body = new_fun_body,
env = rlang::get_env(fun_q)
)
result
}
#' Computes the factorial.
#' @param n A number
#' @return factorial of n
#' @param acc Accumulator to make the function tail-recursive
#' @export
factorial_loop_dummy <- dummy_transform(factorial)
I don't see any differences between the two functions, so I am puzzled why the check accepts the dummy but not the real version.
> body(factorial_loop) == body(factorial_loop_dummy)
[1] TRUE
> environment(factorial_loop)
<environment: namespace:Test>
> environment(factorial_loop_dummy)
<environment: namespace:Test>
> formals(factorial_loop)
$n
$acc
> formals(factorial_loop_dummy)
$n
$acc
> attributes(factorial_loop())
Error in factorial_loop() : argument "n" is missing, with no default
> attributes(factorial_loop)
NULL
> attributes(factorial_loop_dummy)
NULL
The error mentions the attribute srcref, but neither transformed function has any attributes. If I explicitly set the srcref attribute it doesn't help with the error though.
Any ideas, anyone?
Update 2018/03/20:
The problem seems to be with the quasi-quotation splicing in my transformation function. If I comment out the !!! statements below and manually insert the cases for the factorial, the error goes away.
dummy_transform_body <- function(fun_expr, info) {
vars <- names(formals(info$fun))
tmp_assignments <- vector("list", length = length(vars))
locals_assignments <- vector("list", length = length(vars))
for (i in seq_along(vars)) {
local_var <- as.symbol(vars[[i]])
tmp_var <- parse(text = paste(".tailr_", vars[[i]], sep = ""))[[1]]
tmp_assignments[[i]] <- rlang::expr(rlang::UQ(tmp_var) <- rlang::UQ(local_var))
locals_assignments[[i]] <- rlang::expr(rlang::UQ(local_var) <- rlang::UQ(tmp_var))
}
# this would be a nice pipeline, but it is a bit much to require
# magrittr just for this
fun_expr <- make_returns_explicit(fun_expr, FALSE, info)
fun_expr <- simplify_returns(fun_expr, info)
fun_expr <- handle_recursive_returns(fun_expr, info)
fun_expr <- returns_to_escapes(fun_expr, info)
fun_expr <- simplify_nested_blocks(fun_expr)
rlang::expr({
#!!! tmp_assignments
.tailr_n <- n
.tailr_acc <- acc
callCC(function(escape) {
repeat {
#!!! locals_assignments
n <<- .tailr_n
acc <<- .tailr_acc
!! fun_expr
next
}
})
})
}
Another Update:
...Deleted the previous update... The hack of putting the splicing inside another block doesn't work for me any longer...
Yet another update...
Ok, I still have absolutely no idea why the splicing isn't working. I made other dummy-functions where it did. So I am really interested if someone has any ideas. In any case, I managed to rewrite my tailr function to avoid !!! and that seems to work now.
repeat_body <- as.call(
c(`{`, locals_assignments, fun_expr, quote(next))
)
call_cc_stmt <- rlang::expr(
callCC(function(escape) {
repeat {
!!repeat_body
}
})
)
as.call(
c(`{`, tmp_assignments, call_cc_stmt)
)
This is just a lot less elegant and the generated code is uglier--but I hide that by setting srcref to the original code, so no one need ever know.
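For readers who want to try the as.call construction in isolation, here is a minimal base R sketch. It is not the tailr code: the names vars, block and demo are made up for illustration, and the final expression is a stand-in for the real loop. It also looks at the srcref attribute, but it does not establish the cause of the check error.

```r
# Build the temporary assignments programmatically (.tailr_n <- n, and so on).
vars <- c("n", "acc")
tmp_assignments <- lapply(vars, function(v) {
  call("<-", as.name(paste0(".tailr_", v)), as.name(v))
})

# Assemble a `{` call from a list: the brace symbol, the assignments,
# then a final expression (a stand-in for the real loop).
block <- as.call(c(
  list(as.name("{")),
  tmp_assignments,
  list(quote(.tailr_n * .tailr_acc))
))

# Use the block as a function body to confirm that it evaluates.
demo <- function(n, acc) NULL
body(demo) <- block
demo(3, 4)

# A block built with as.call carries no srcref attribute ...
attr(block, "srcref")

# ... whereas a block parsed with keep.source = TRUE does.
parsed <- parse(text = "{ a; b }", keep.source = TRUE)[[1]]
is.null(attr(parsed, "srcref"))
```

Comparing attributes(body(fun)) for the failing and the working function, rather than attributes(fun), may be a useful next diagnostic step, since the attributes shown earlier were those of the functions themselves.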
My custom iterator is a bit slow. I hope to get a speed up when I use the unlist(as.list(ic, n=2000)) construct. However, I do not know how to implement this functionality. I only found the nextElem and hasNext methods. The iterator looks like this:
library(itertools)
fibonacci <- function(count = NA) {
ab = c(0, 1)
n <- function() {
if (!is.na(count)) {
if (count > 0) count <<- count -1
else stop('StopIteration')
}
#
ab <<- c(ab[2], sum(ab))
ab[1]
}
obj <- list(nextElem = n)
class(obj) <- c('fibonacci', 'abstractiter', 'iter')
obj
}
I can use it like this:
ic <- fibonacci ()
print (nextElem (ic))
Now I would like to get the next 10 fibonacci numbers at once, via
print(unlist(as.list(ic, n=10)))
But this of course needs to be implemented. How would I do this?
The fibonacci iterator serves as an example. Actually, I work on an iterator that gives all k-combinations of an n-set, i.e. a memory-friendly version of combn.
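One way the requested call could be supported is an S3 method for as.list on the iterator's own class. The sketch below uses base R only, so it leaves out library(itertools) and calls the closure directly. The method as.list.fibonacci is hypothetical, and a package you already load may provide a suitable as.list method for iterators, which is worth checking first.

```r
# Same iterator as above, written without any package dependency.
fibonacci <- function(count = NA) {
  ab <- c(0, 1)
  n <- function() {
    if (!is.na(count)) {
      if (count > 0) count <<- count - 1
      else stop("StopIteration")
    }
    ab <<- c(ab[2], sum(ab))
    ab[1]
  }
  structure(list(nextElem = n), class = c("fibonacci", "abstractiter", "iter"))
}

# Hypothetical S3 method: collect up to n values, stopping early on StopIteration.
as.list.fibonacci <- function(x, n = 10L, ...) {
  out <- vector("list", n)
  filled <- 0L
  while (filled < n) {
    value <- tryCatch(x$nextElem(), error = function(e) {
      # Only swallow the end-of-iteration signal; re-raise anything else.
      if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
    })
    if (is.null(value)) break
    filled <- filled + 1L
    out[[filled]] <- value
  }
  out[seq_len(filled)]
}

ic <- fibonacci()
first_ten <- unlist(as.list(ic, n = 10))

# A bounded iterator yields fewer values than requested.
short <- unlist(as.list(fibonacci(3), n = 10))
```

This only removes the per-element dispatch overhead. For a combinations iterator, a larger speed-up would probably come from generating a whole batch of values inside one call.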
I wrote a bubble sort in R. Sometimes it works perfectly without any error message, but sometimes it shows "Error in if (x[i] > x[i + 1]) { : argument is of length zero". Can anyone help me find what is wrong with it? My code is below.
example <- function(x) {
n <- length(x)
repeat {
hasChanged <- FALSE
n <- n - 1
for(i in 1:n) {
if ( x[i] > x[i+1] ) {
temp <- x[i]
x[i] <- x[i+1]
x[i+1] <- temp
hasChanged <- TRUE
cat("The current Vector is", x ,"\n")
}
}
if ( !hasChanged ) break;
}
}
x <-sample(1:10,5)
cat("The original Vector is", x ,"\n")
example(x)
The error occurs because you decrease n on every pass. Depending on the original vector's order (or lack thereof), the last swap can happen in the pass where n is 1. The next pass then reduces n to 0, so 1:n becomes c(1, 0) and the loop reaches i = 0. x[0] is a zero-length vector, so the comparison x[0] > x[1] has length zero and if cannot evaluate it.
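The failing step can be reproduced on its own. This is a small sketch with made-up values, just to show what 1:n and x[0] look like when n is 0.

```r
x <- c(3, 1, 2)
n <- 0

# 1:n counts down when n is 0, so a for loop over it still runs, with i = 1 and i = 0.
down <- 1:n

# seq_len(n) is empty instead, so a loop over it would run zero times.
none <- seq_len(n)

# Indexing with 0 gives a zero-length vector, and so does the comparison.
cmp <- x[0] > x[1]

# if() needs a condition of length one, so this raises the error from the question.
msg <- tryCatch(if (cmp) "swap", error = function(e) conditionMessage(e))
msg
```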
With a minimal correction your code will work properly, without giving error messages. Try to replace the line
if ( !hasChanged ) break;
with
if ( !hasChanged | n==1 ) break
Basically you have two termination criteria: Either nothing has been changed in the previous iteration or n is equal to one. In both cases, a further iteration won't change the vector since it is already ordered.
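Putting the two termination criteria together, a corrected version could look like the sketch below. It goes slightly beyond the minimal fix: the function is renamed to bubble_sort (a made-up name), it uses seq_len and ||, it returns the sorted vector, and it drops the cat() progress output.

```r
bubble_sort <- function(x) {
  # Vectors of length 0 or 1 are already sorted.
  if (length(x) < 2) return(x)
  n <- length(x)
  repeat {
    has_changed <- FALSE
    n <- n - 1
    for (i in seq_len(n)) {
      if (x[i] > x[i + 1]) {
        # Swap neighbouring elements that are out of order.
        temp <- x[i]
        x[i] <- x[i + 1]
        x[i + 1] <- temp
        has_changed <- TRUE
      }
    }
    # Stop when a pass made no swap or when only one comparison was left.
    if (!has_changed || n <= 1) break
  }
  x
}

set.seed(1)
x <- sample(1:10, 5)
sorted <- bubble_sort(x)
```

Returning x makes the result usable by the caller. The original function only printed intermediate states and returned nothing useful.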
By the way, in R programming you don't need a semicolon at the end of a command. It is tolerated/ignored by the interpreter, but it clutters the code and is not considered good programming style.
Hope this helps.