Testing an expression for missing-ness - r

I am doing some metaprogramming where I do depth-first traversal over a function body to implement partial-evaluation. Here, I am running into a problem when I encounter local variables. With those, I have to process pairlist objects and I want to process these depth-first. I am annotating expressions with attributes to propagate information.
A dummy-example of what I am doing could look like this:
dummy_expr_processing <- function(expr) {
attr(expr, "foo") <- "bar"
expr
}
dummy_pairlist_processing <- function(expr) {
for (i in seq_along(expr))
expr[[i]] <- dummy_expr_processing(expr[[i]])
}
which works fine for
f <- quote(function(x = 42, y = x) x + y)
dummy_pairlist_processing(f[[2]])
It breaks, however, for
f <- quote(function(x, y = x) x + y)
dummy_pairlist_processing(f[[2]])
because here, the expr is missing when I process x in the pair-list. I get the error
Error in dummy_expr_processing(expr[[i]]) :
argument "expr" is missing, with no default
Called from: dummy_expr_processing(expr[[i]])
I can try to get explicit about the argument
dummy_pairlist_processing <- function(expr) {
for (i in seq_along(expr))
expr[[i]] <- dummy_expr_processing(expr = expr[[i]])
}
dummy_pairlist_processing(f[[2]])
but that doesn't change anything.
I can also explicitly check if expr is missing
dummy_expr_processing <- function(expr) {
if (missing(expr)) return()
attr(expr, "foo") <- "bar"
expr
}
This will work if I call dummy_expr_processing without an argument but not if I call it with a "missing" expression. Then it is not considered missing by the missing() function but it certainly is considered missing by attr.
I can get around that by exploiting that missing data is represented as an empty symbol, so I can do this:
dummy_expr_processing <- function(expr) {
if (as.character(expr) == "") return()
attr(expr, "foo") <- "bar"
expr
}
dummy_pairlist_processing <- function(expr) {
for (i in seq_along(expr))
dummy_expr_processing(expr[[i]])
}
dummy_pairlist_processing(f[[2]])
and that will work, but notice that I no longer update expr[[i]] in dummy_pairlist_processing. I cannot do that if I letdummy_expr_processingreturnNULL`.
My questions now are:
Is as.character(expr) == "" the only way I can test if a variable holds an empty expression, or are there better ways?
Is there anyway I can also return the missing expression? Especially nice if I can return it and set an attribute on the returned object.

Related

Defining a new class of functions in R

So I'm changing the class of some functions that I'm building in R in order to add a description attribute and because I want to use S3 generics to handle everything for me. Basically, I have a structure like
foo <- function(x) x + 1
addFunction <- function(f, description) {
class(f) <- c("addFunction", "function")
attr(f, "description") <- description
f
}
foo <- addFunction(foo, "Add one")
and then I do stuff like
description <- function(x) UseMethod("description")
description.default <- function(x) deparse(substitute(x))
description.addFunction <- function(x) attr(x, "description")
This works fine, but it's not that elegant. I'm wondering if it is possible to define a new class of functions such that instances of this class can be defined in a syntax similar to the function syntax. In other words, is it possible to define addFunction such that foo is generated in the following way:
foo <- addFunction(description = "Add one", x) {
x + 1
}
(or something similar, I have no strong feelings about where the attribute should be added to the function)?
Thanks for reading!
Update: I have experimented a bit more with the idea, but haven't really reached any concrete results yet - so this is just an overview of my current (updated) thoughts on the subject:
I tried the idea of just copying the function()-function, giving it a different name and then manipulating it afterwards. However, this does not work and I would love any inputs on what is happening here:
> function2 <- `function`
> identical(`function`, function2)
[1] TRUE
> function(x) x
function(x) x
> function2(x) x
Error: unexpected symbol in "function2(x) x"
> function2(x)
Error: incorrect number of arguments to "function"
As function() is a primitive function, I tried looking at the C-code defining it for more clues. I was particularly intrigued by the error message from the function2(x) call. The C-code underlying function() is
/* Declared with a variable number of args in names.c */
SEXP attribute_hidden do_function(SEXP call, SEXP op, SEXP args, SEXP rho)
{
SEXP rval, srcref;
if (TYPEOF(op) == PROMSXP) {
op = forcePromise(op);
SET_NAMED(op, 2);
}
if (length(args) < 2) WrongArgCount("function");
CheckFormals(CAR(args));
rval = mkCLOSXP(CAR(args), CADR(args), rho);
srcref = CADDR(args);
if (!isNull(srcref)) setAttrib(rval, R_SrcrefSymbol, srcref);
return rval;
}
and from this, I conclude that for some reason, at least two of the four arguments call, op, args and rho are now required. From the signature of do_function() I am guessing that the four arguments passed to do_function should be a call, a promise, a list of arguments and then maybe an environment. I tried a lot of different combinations for function2 (including setting up to two of these arguments to NULL), but I keep getting the same (new) error message:
> function2(call("sum", 2, 1), NULL, list(x=NULL), baseenv())
Error: invalid formal argument list for "function"
> function2(call("sum", 2, 1), NULL, list(x=NULL), NULL)
Error: invalid formal argument list for "function"
This error message is returned from the C-function CheckFormals(), which I also looked up:
/* used in coerce.c */
void attribute_hidden CheckFormals(SEXP ls)
{
if (isList(ls)) {
for (; ls != R_NilValue; ls = CDR(ls))
if (TYPEOF(TAG(ls)) != SYMSXP)
goto err;
return;
}
err:
error(_("invalid formal argument list for \"function\""));
}
I'm not fluent in C at all, so from here on I'm not quite sure what to do next.
So these are my updated questions:
Why do function and function2 not behave in the same way? Why
do I need to call function2 using a different syntax when they are
deemed identical in R?
What are the proper arguments of function2
such that function2([arguments]) will actually define a function?
Some keywords in R such as if and function have special syntax in the way that the underlying functions get called. It's quite easy to use if as a function if desired, e.g.
`if`(1 == 1, "True", "False")
is equivalent to
if (1 == 1) {
"True"
} else {
"False"
}
function is trickier. There's some help on this at a previous question.
For your current problem here's one solution:
# Your S3 methods
description <- function(x) UseMethod("description")
description.default <- function(x) deparse(substitute(x))
description.addFunction <- function(x) attr(x, "description")
# Creates the pairlist for arguments, handling arguments with no defaults
# properly. Also brings in the description
addFunction <- function(description, ...) {
args <- eval(substitute(alist(...)))
tmp <- names(args)
if (is.null(tmp)) tmp <- rep("", length(args))
names(args)[tmp==""] <- args[tmp==""]
args[tmp==""] <- list(alist(x=)$x)
list(args = as.pairlist(args), description = description)
}
# Actually creates the function using the structure created by addFunction and the body
`%{%` <- function(args, body) {
stopifnot(is.pairlist(args$args), class(substitute(body)) == "{")
f <- eval(call("function", args$args, substitute(body), parent.frame()))
class(f) <- c("addFunction", "function")
attr(f, "description") <- args$description
f
}
# Example. Note that the braces {} are mandatory even for one line functions
foo <- addFunction(description = "Add one", x) %{% {
x + 1
}
foo(1)
#[1] 2

Can I override `$` or `[[` to throw an error instead of NULL when asking for a missing list element?

My hunch is this is an abuse of the R language and there's a good reason this doesn't happen. But I find this to be a perpetual source of insidious errors in code that I'm trying to debug:
MWE
list.1 <- list(a=1,b=2,c=list(d=3))
list.2 <- list(b=4,c=list(d=6,e=7))
input.values <- list(list.1,list.2)
do.something.to.a.list <- function(a.list) {
a.list$b <- a.list$c$d + a.list$a
a.list
}
experiment.results <- lapply(input.values,do.something.to.a.list)
use.results.in.some.other.mission.critical.way <- function(result) {
result <- result^2
patient.would.survive.operation <- mean(c(-5,result)) >= -5
if(patient.would.survive.operation) {
print("Congrats, the patient would survive! Good job developing a safe procedure.")
} else {
print("Sorry, the patient won't make it.")
}
}
lapply(experiment.results, function(x)
use.results.in.some.other.mission.critical.way(x$b))
YES I am aware this is a stupid example and that I could just add a check for the existence of the element before trying to access it. But I'm not asking to know what I could do, if I had perfect memory and awareness at all times, to work slowly around the fact that this feature is inconvenient and causes me lots of headache. I'm trying to avoid the headache altogether, perhaps at the cost of code speed.
So: what I want to know is...
(a) Is it possible to do this. My initial attempt failed, and I got stuck trying to read the C internals for "$" to understand how to handle the arguments correctly
(b) If so, is there a good reason not to (or to) do this.
Basically, my idea is that instead of writing every single function that depends on non-null return from list access to check really carefully, I can write just one function to check carefully and trust that the rest of the functions won't get called with unmet preconditions b/c the failed list access will fail-fast.
You can override almost anything in R (except certain special values - NULL, NA, NA_integer_ NA_real_ NA_complex_, NA_character_, NaN, Inf, TRUE, FALSE as far as I'm aware).
For your specific case, you could do this:
`$` <- function(x, i) {
if (is.list(x)) {
i_ <- deparse(substitute(i))
x_ <- deparse(substitute(x))
if (i_ %in% names(x)) {
eval(substitute(base::`$`(x, i)), envir = parent.frame())
} else {
stop(sprintf("\"%s\" not found in `%s`", i_, x_))
}
} else {
eval(substitute(base::`$`(x, i)), envir = parent.frame())
}
}
`[[` <- function(x, i) {
if (is.list(x) && is.character(i)) {
x_ <- deparse(substitute(x))
if (i %in% names(x)) {
base::`[[`(x, i)
} else {
stop(sprintf("\"%s\" not found in `%s`", i, x_))
}
} else {
base::`[[`(x, i)
}
}
Example:
x <- list(a = 1, b = 2)
x$a
#[1] 1
x$c
#Error in x$c : "c" not found in `x`
col1 <- "b"
col2 <- "d"
x[[col1]]
#[1] 2
x[[col2]]
#Error in x[[col2]] : "d" not found in `x`
It will slow your code down quite a bit:
microbenchmark::microbenchmark(x$a, base::`$`(x, a), times = 1e4)
#Unit: microseconds
# expr min lq mean median uq max neval
# x$a 77.152 81.398 90.25542 82.814 85.2915 7161.956 10000
# base::`$`(x, a) 9.910 11.326 12.89522 12.033 12.3880 4042.646 10000
I've limited this to lists (which will include data.frames) and have implemented selection with [[ by numeric and character vectors, but this may not fully represent the ways in which $ and [[ can be used.
Note for [[ you could use #rawr's simpler code:
`[[` <- function(x, i) if (is.null(res <- base::`[[`(x, i))) simpleError('NULL') else res
but this will throw an error for a member of a list which is NULL rather than just not defined. e.g.
x <- list(a = NULL, b = 2)
x[["a"]]
This may of course be what is desired.

How can I tell if a function argument is given as an expression without evaluating it?

Here is a simple function that takes an expression as an argument.
f <- function(expr) {
expr <- substitute(expr)
eval(expr)
}
(In practice, I also need to manipulate the expression a little in the function and may evaluate the expression in some other environment where it is meaningful)
It works fine when we call
f(1+1)
by directly supplying an unevaluated argument.
However, I need f to also work like when I supply an explicitly defined expression outside, for example,
q <- quote(1+1)
f(q)
f(expr) needs to avoid substitute expr and return the value of the expression. However, the code above does not work because it results in the expression itself rather than 2.
So the question is: How can I tell if an argument is given as an expression without evaluating it in the current environment?
Perhaps you'd like to check if you're passed a "name" rather than a "call"
f <- function(expr) {
expr <- substitute(expr)
if(is.name(expr)) {
expr <- eval(expr, parent.frame())
}
eval(expr)
}
f(1+1)
# [1] 2
q<-quote(1+1)
f(q)
# [1] 2
There's probably better ways but here's one approach:
f <- function(expr) {
mess <- suppressWarnings(try(expr, silent = TRUE))
if (inherits(mess, "try-error")) {
expr <- substitute(expr)
}
eval(expr)
}
f(1+1)
## [1] 2
q <- quote(1+1)
f(q)
## [1] 2
This bit extends it to character vectors:
f <- function(expr) {
mess <- suppressWarnings(try(expr, silent = TRUE))
if (inherits(mess, "try-error")) {
expr <- substitute(expr)
}
if (is.character(expr)) return(eval(parse(text=expr)))
eval(expr)
}
f("1 + 1")
## [1] 2
I'd reccomend reading this bit HERE from Hadley's Advanced R

Get the attribute of a packaged function from within itself

Suppose we have this functions in a R package.
prova <- function() {
print(attr(prova, 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
'myattr<-' <- function(x, value) {
attr(x, 'myattr') <- value
x
}
myattr <- function(x) attr(x, 'myattr')
So, I install the package and then I test it. This is the result:
prova()
# NULL
# NULL
myattr(prova) <- 'ciao' # setting 'ciao' for 'myattr' attribute
prova()
# NULL
# NULL # Why NULL here ?
myattr(prova)
# [1] "ciao"
attr(prova, 'myattr')
# [1] "ciao"
The question is: how to get the attribute of the function from within itself?
Inside the function itself I cannot get its attribute, as demonstrated by the example.
I suppose that the solution will be of the serie "computing on the language" (match.call()[[1L]], substitute, environments and friends). Am I wrong?
I think that the important point here is that this function is in a package (so, it has its environment and namespace) and I need its attribute inside itself, in the package, not outside.
you can use get with the envir argument.
prova <- function() {
print(attr(get("prova", envir=envir.prova), 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
eg:
envir.prova <- environment()
prova()
# NULL
# NULL
myattr(prova) <- 'ciao'
prova()
# [1] "ciao"
# [1] "ciao"
Where envir.prova is a variable whose value you set to the environment in which prova is defined.
Alternatively you can use get(.. envir=parent.frame()), but that is less reliable as then you have to track the calls too, and ensure against another object with the same name between the target environment and the calling environment.
Update regarding question in the comments:
regarding using parent.frame() versus using an explicit environment name: parent.frame, as the name suggests, goes "up one level." Often, that is exactly where you want to go, so that works fine. And yet, even when your goal is get an object in an environment further up, R searches up the call stack until it finds the object with the matching name. So very often, parent.frame() is just fine.
HOWEVER if there are multiple calls between where you are invoking parent.frame() and where the object is located AND in one of the intermediary environments there exists another object with the same name, then R will stop at that intermediary environment and return its object, which is not the object you were looking for.
Therefore, parent.frame() has an argument n (which defaults to 1), so that you can tell R to begin it's search at n levels back.
This is the "keeping track" that I refer to, where the developer has to be mindful of the number of calls in between. The straightforward way to go about this is to have an n argument in every function that is calling the function in question, and have that value default to 1. Then for the envir argument, you use: get/assign/eval/etc (.. , envir=parent.frame(n=n) )
Then if you call Func2 from Func1, (both Func1 and Func2 have an n argument), and Func2 is calling prova, you use:
Func1 <- function(x, y, ..., n=1) {
... some stuff ...
Func2( <some, parameters, etc,> n=n+1)
}
Func2 <- function(a, b, c, ..., n=1) {
.... some stuff....
eval(quote(prova()), envir=parent.frame(n=n) )
}
As you can see, it is not complicated but it is * tedious* and sometimes what seems like a bug creeps in, which is simply forgetting to carry the n over.
Therefore, I prefer to use a fixed variable with the environment name.
The solution that I found is:
myattr <- function(x) attr(x, 'myattr')
'myattr<-' <- function(x, value) {
# check that x is a function (e.g. the prova function)
# checks on value (e.g. also value is a function with a given precise signature)
attr(x, 'myattr') <- value
x
}
prova <- function(..., env = parent.frame()) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], env)
# print(eval(as.call(c(myattr, this)), env)) # alternative
print(myattr(this))
# print(attr(this, 'myattr')
invisible(TRUE)
}
I want to thank #RicardoSaporta for the help and the clarification about keeping tracks of the calls.
This solution doesn't work when e.g. myattr(prova) <- function() TRUE is nested in func1 while prova is called in func2 (that it's called by func1). Unless you do not properly update its parameter env ...
For completeness, following the suggestion of #RicardoSaporta, I slightly modified the prova function:
prova <- function(..., pos = 1L) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], parent.frame(n = pos)
print(myattr(this))
# ...
}
This way, it works also when nested, if the the correct pos parameter is passed in.
With this modification it is easier to go to fish out the environment in which you set the attribute on the function prova.
myfun1 <- function() {
myattr(prova) <- function() print(FALSE)
myfun2(n = 2)
}
myfun2 <- function(n) {
prova(pos = n)
}
myfun1()
# function() print(FALSE)
# <environment: 0x22e8208>

How to expand an ellipsis (...) argument without evaluating it in R

I need a function that accepts an arbitrary number of arguments and stores them in a variable as an expression without evaluating them. I managed to do it with match.call but it seems a little "kludgy".
foo <- function(...) {
expr <- match.call()
expr[[1]] <- expression
expr <- eval(expr)
# do some stuff with expr
return(expr)
}
> bla
Error: object 'bla' not found
> foo(x=bla, y=2)
expression(x = bla, y = 2)
Clarification
To clarify, I'm asking how to write a function that behaves like expression(). I can't use expression() directly for reasons that are too long to explain.
The most idiomatic way is:
f <- function(x, y, ...) {
match.call(expand.dots = FALSE)$`...`
}
Using . from plyr as a prototype
foo <- function (...)
{
as.expression(as.list(match.call()[-1]))
}
The ultimate intended outcome is slightly vague (could you clarify a bit?). However, this may be helpful:
foo2 <- function(...) {
expr <- as.list(substitute(list(...)))[-1L]
class(expr) <- "expression"
expr
}
example:
foo2(x=bla, y=2)
# expression(x = bla, y = 2)

Resources