How to get R's variable name in C (substitute in C)? - r

How can I get from C an object name used as a function argument? I
have sample code in C that gives me access to the name of the function
being called:
#include <Rinternals.h>
SEXP xname(SEXP x)
{
const char *fun_name = CHAR(PRINTNAME(CAR(x)));
x = CDR(x);
const char *arg_name = isNull(TAG(x)) ? "" : CHAR(PRINTNAME(TAG(x)));
Rprintf("fn_name: %s, arg_name: %s\n", fun_name, arg_name);
return R_NilValue;
}
and from R:
> xname <- function(...) invisible(.External("xname", ...))
> x1 = 123
> xname(x1)
fn_name: xname, var_name:
However, I am trying to find a way to access the object name. In the
documentation I found a solution for named args only:
> xname(arg = x1)
fn_name: xname, var_name: arg
And I'd like to find the C equivalent for substitute():
> substitute(x1)
x1
I tried substitute() from Rinternals.h, but it always returned a list with what above. Does anyone know how to do it? Maybe findVarInFrame()?
Best regards

Related

ellipsis ... as function in substitute?

I'm having trouble understanding how/why parentheses work where they otherwise should not work®.
f = function(...) substitute(...()); f(a, b)
[[1]]
a
[[2]]
b
# but, substitute returns ..1
f2 = function(...) substitute(...); f2(a, b)
a
Normally an error is thrown, could not find function "..." or '...' used in an incorrect context, for example when calling (\(...) ...())(5).
What I've tried
I have looked at the source code of substitute to find out why this doesn't happen here. R Internals 1.1.1 and 1.5.2 says ... is of SEXPTYPE DOTSXP, a pairlist of promises. These promises are what is extracted by substitute.
# \-substitute #R
# \-do_substitute #C
# \-substituteList #C recursive
# \-substitute #C
Going line-by-line, I am stuck at substituteList, in which h is the current element of ... being processed. This happens recursively at line 2832 if (TYPEOF(h) == DOTSXP) h = substituteList(h, R_NilValue);. I haven't found exception handling of a ...() case in the source code, so I suspect something before this has happened.
In ?substitute we find substitute works on a purely lexical basis. Does it mean ...() is a parser trick?
parse(text = "(\\(...) substitute(...()))(a, b)") |> getParseData() |> subset(text == "...", select = c(7, 9))
#> token text
#> 4 SYMBOL_FORMALS ...
#> 10 SYMBOL_FUNCTION_CALL ...
The second ellipsis is recognized during lexical analysis as the name of a function call. It doesn't have its own token like |> does. The output is a pairlist ( typeof(f(a, b)) ), which in this case is the same as a regular list (?). I guess it is not a parser trick. But whatever it is, it has been around for a while!
Question:
How does ...() work?
Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).
substitute is a "special" whose arguments are not evaluated (doc).
typeof(substitute)
[1] "special"
That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of
findVarInFrame3(rho, R_DotsSymbol, TRUE)
in the body of C utility substituteList (source) is either a DOTSXP or R_MissingArg—the latter if and only if f was called without arguments (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP—the latter if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.
The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:
f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1
Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
Internals
When called inside of the R function f,
f <- function(...) substitute(...(<exprs>))
substitute evaluates a call to the C utility do_substitute—you can learn this by looking here—in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).
If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).
It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the statement CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration.
In the false branch of the conditional (source), h gets the value
pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).
Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the statement CAR(el) == R_DotsSymbol in the body of the loop is true.
In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:
R_NilValue if h was R_MissingArg in the first while iteration and m = 0,
or, otherwise,
a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 1), all unevaluated and without substitutions, because the execution environment of f is empty at the time of the substitute call.
Indeed:
f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
Misc
FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):
Rprintf("CAR(t) == R_DotsSymbol? %d\n",
CAR(t) == R_DotsSymbol);
if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
type2char(TYPEOF(CAR(t))), length(CAR(t)));
Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
CAR(CAR(t)) == R_DotsSymbol);
Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
}
if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
type2char(TYPEOF(s)), length(s));
Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
type2char(TYPEOF(CAR(s))), length(CAR(s)));
}
which helped me confirm what was going into and coming out of the substituteList call on the previous line:
f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1
Obviously, compiling R with debugging symbols and running R under a debugger helps, too.
Another puzzle
Just noticed this oddity:
g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)
Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).

Modifying calls in function arguments

How can a function inspect and modify the arguments of a call that it received as argument?
Application: A user feeds a call to function a as an argument to function b, but they forget to specify one of the required arguments of a. How can function b detect the problem and fix it?
In this minimal example, function a requires two arguments:
a <- function(arg1, arg2) {
return(arg1 + arg2)
}
Function b accepts a call and an argument. The commented lines indicate what I need to do:
b <- function(CALL, arg3) {
# 1. check if `arg2` is missing from CALL
# 2. if `arg2` is missing, plug `arg3` in its place
# 3. return evaluated call
CALL
}
Expected behavior:
b(CALL = a(arg1 = 1, arg2 = 2), arg3 = 3)
> 3
b(CALL = a(arg1 = 1), arg3 = 3)
> 4
The second call currently fails because the user forgot to specify the required arg2 argument. How can function b fix this mistake automatically?
Can I exploit lazy evaluation to modify the call to a before it is evaluated? I looked into rlang::modify_call but couldn't figure it out.
Here's a method that would work
b <- function(CALL, arg3) {
scall <- substitute(CALL)
stopifnot(is.call(scall)) #check that it's a call
lcall <- as.list(scall)
if (!"arg2" %in% names(lcall)) {
lcall <- c(lcall, list(arg2 = arg3))
}
eval.parent(as.call(lcall))
}
We use substitute() to grab the unevaluated version the CALL parameter. We convert it to a list so we can modify it. Then we append to the list another list with the parameter name/value we want. Finally, we turn the list back into a call and then evaluate that call in the environment of the caller rather than in the function body itself.
If you wanted to use rlang::modify_call and other rlang functions you could use
b <- function(CALL, arg3) {
scall <- rlang::enquo(CALL)
stopifnot(rlang::quo_is_call(scall))
if (!"arg2" %in% names(rlang::quo_get_expr(scall))) {
scall <- rlang::call_modify(scall, arg2=arg3)
}
rlang::eval_tidy(scall, env = rlang::caller_env())
}
I don't see why fancy language manipulation is needed. The problem is what to do when a, which requires 2 arguments, is supplied only 1. Wrapping it with b, which has a default value for the 2nd argument, solves this.
b <- function(arg1, arg2=42)
{
a(arg1, arg2)
}
b(1)
# [1] 43
b(1, 2)
# [1] 3

Is there any way in which to make an Infix function using sourceCpp()

I was wondering whether it is possible to make an infix function, e.g. A %o% B with Rcpp.
I know that this is possible using the inline package, but have yet been able to find a method for doing this when using sourceCpp().
I have made the following infix implementation of %o% / outer() when arguments are sure to be vectors using RcppEigen and inline:
`%op%` <- cxxfunction(signature(v1="NumericVector",
v2="NumericVector"),
plugin = "RcppEigen",
body = c("
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
"))
This can easily be implemented in to be imported using sourceCpp(), however not as an infix function.
My current attempt is as follows:
#include <Rcpp.h>
using namespace Rcpp;
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
NumericMatrix outerProd(NumericVector v1, NumericVector v2) {
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
}
So to summarize my question.. Is it possible to make an infix function available through sourceCpp?
Is it possible to make an infix function available through sourceCpp?
Yes.
As always, one should read the Rcpp vignettes!
In particular here, if you look in Section 1.6 of the Rcpp attributes vignette, you'd see you can modify the name of a function using the name parameter for Rcpp::export.
For example, we could do:
#include <Rcpp.h>
// [[Rcpp::export(name = `%+%`)]]
Rcpp::NumericVector add(Rcpp::NumericVector x, Rcpp::NumericVector y) {
return x + y;
}
/*** R
1:3 %+% 4:6
*/
Then we'd get:
Rcpp::sourceCpp("~/infix-test.cpp")
> 1:3 %+% 4:6
[1] 5 7 9
So, you still have to name C++ functions valid C++ names in the code, but you can export it to R through the name parameter of Rcpp::export without having to do anything further on the R side.
John Chambers states three principles on page four of the (highly recommended) "Extending R" book:
Everything that exists in R is an object.
Everything that happens in R is a function call.
Interfaces to other software are part of R.
So per point two, you can of course use sourceCpp() to create your a compiled function and hang that at any odd infix operator you like.
Code Example
library(Rcpp)
cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
`%+%` <- function(a,b) cc(a,b)
cc("Hello", "World")
"hello" %+% "world"
Output
R> library(Rcpp)
R> cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
R> `%+%` <- function(a,b) cc(a,b)
R>
R> cc("Hello", "World")
[1] "HelloWorld"
R>
R> "hello" %+% "world"
[1] "helloworld"
R>
Summary
Rcpp is really just one cog in the machinery.
Edit
It also works with your initial function, with some minor simplification. For
`%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
as.numeric(1:3) %op% as.numeric(3:1)
we get
R> `%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
R> as.numeric(1:3) %op% as.numeric(3:1)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 6 4 2
[3,] 9 6 3
R>
(modulo some line noise from the compiler).

Can Rcpp::Function be NULL?

I have a cpp function which has an R function as one of its arguments like below:
void mycppfunction(SEXP x, Rcpp::Function func)
func can be a function, or it can be NULL. How can I implement this? If I do this:
void mycppfunction(SEXP x, Rcpp::Nullable<Rcpp::Function> func)
then, the line:
func(x)
gives me the error:
error: no match for call to ‘(Rcpp::Nullable<Rcpp::Function_Impl<Rcpp::PreserveStorage> >) (SEXP)’
On the other hand, if I just define mycppfunction as:
void mycppfunction(SEXP x, Rcpp::Function func)
then running the code with a NULL value for func results in a segfault: "memory not mapped".
Yes you can do this but I would advise that you are careful about verifying the compatibility of the function passed in and the argument. Here's a small example, with minimal defensive code to ensure that f is a valid function:
#include <Rcpp.h>
typedef Rcpp::Nullable<Rcpp::Function> nullable_t;
// [[Rcpp::export]]
SEXP null_fun(Rcpp::NumericVector x, nullable_t f = R_NilValue) {
if (f.isNotNull()) {
return Rcpp::as<Rcpp::Function>(f)(x);
}
return Rcpp::wrap((double)Rcpp::sum(x));
}
/*** R
null_fun(1:5)
#[1] 15
null_fun(1:5, mean)
#[1] 3
null_fun(1:5, min)
#[1] 1
null_fun(1:5, max)
#[1] 5
*/
Note that the use of SEXP as a return type and the Rcpp::wrap((double)...) was just to quiet the compiler in this specific example, and won't necessarily apply to your use case(s).

Any way to access function installed by makeActiveBinding?

The title basically says it all.
If I do this ...
makeActiveBinding("x", function() runif(2), .GlobalEnv)
x
# [1] 0.7332872 0.4707796
x
# [1] 0.5500310 0.5013099
... is there then any way for me to examine x to learn what function it is linked to (and if not, why not)?
(In this case, I'd like to be able to learn that x was defined to be function() runif(2).)
With a bit of noodling around in envir.c, I can get this to work:
#include <Rcpp.h>
using namespace Rcpp ;
#define HASHSIZE(x) LENGTH(x)
#define HASHVALUE(x) TRUELENGTH(x)
// [[Rcpp::export]]
SEXP get_binding_fun( std::string name, Environment env){
SEXP symbol = Rf_install( name.c_str() );
SEXP tab = HASHTAB(env) ;
SEXP c = PRINTNAME(symbol);
// finding the hash code for the symbol
int hashcode = HASHVALUE(c) % HASHSIZE(tab);
// get the value there from the hash table
SEXP res = CAR( VECTOR_ELT(tab, hashcode ) ) ;
return res ;
}
Save this into a .cpp file, sourceCpp it and use it with this R code:
> makeActiveBinding("x", function() runif(2), .GlobalEnv)
> get_binding_fun("x", .GlobalEnv)
# function ()
# runif(2)

Resources