Scope in generic functions in R

If you define a variable inside a generic function, it is available to the method. For example:
g <- function(x) {
y <- 2
UseMethod("g")
}
g.default <- function() y
g()
[1] 2
But if the variable you define has the same name as the function parameter, this does not happen. It seems that R deletes that variable before calling the method:
g <- function(x) {
x <- 2
UseMethod("g")
}
g.default <- function() x
g()
Error in g.default() : object 'x' not found
Could someone explain exactly what is going on here?

The following comments from the C source file that defines do_usemethod at least hint at what's going on. See especially the second sentence of the second enumerated item.
Basically, it looks like (due to an overly literal application of the rule in that second point) the value of x does not get copied over: the C code checks whether x is among the formals, sees that it is, and so excludes it from the list of variables inserted into the method's evaluation environment. Since g() was called without arguments, nothing named x reaches the method, hence the error.
/* usemethod - calling functions need to evaluate the object
* (== 2nd argument). They also need to ensure that the
* argument list is set up in the correct manner.
*
* 1. find the context for the calling function (i.e. the generic)
* this gives us the unevaluated arguments for the original call
*
* 2. create an environment for evaluating the method and insert
* a handful of variables (.Generic, .Class and .Method) into
* that environment. Also copy any variables in the env of the
* generic that are not formal (or actual) arguments.
*
* 3. fix up the argument list; it should be the arguments to the
* generic matched to the formals of the method to be invoked */
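A minimal R sketch of what that comment implies, assuming the mechanism it describes: the method is applied to the arguments matched from the original call, so reassigning a formal inside the generic does not change what the method sees. Forwarding of the generic's other local variables may differ between R versions (check the NEWS for yours), so the sketch does not rely on it. If a generic really must hand a recomputed value to its methods, one option is to dispatch from a small helper that receives the value as an ordinary argument. The names g2, g3 and g3_dispatch are hypothetical.

```r
# Generic that reassigns its own formal argument before dispatching.
g2 <- function(x) {
  x <- 2             # only changes the generic's local binding of x
  UseMethod("g2")
}
# The method is called with the arguments matched from the ORIGINAL call,
# so x here is the caller's value, not the reassigned local.
g2.default <- function(x) x

g2(1)
# [1] 1

# Hypothetical workaround: dispatch from a helper that receives the
# recomputed value as its own argument.
g3 <- function(x) {
  x <- x * 10        # recompute the value in the "generic"
  g3_dispatch(x)     # pass it on as a real argument
}
g3_dispatch <- function(x) UseMethod("g3")  # looks up g3.<class> methods
g3.default <- function(x) x

g3(1)
# [1] 10
```

The helper trick works because UseMethod() only needs the generic's name as a string; the methods are still named g3.default and so on.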


Rcpp function to construct a function

In R a function can create and return another function, e.g.
create_ax2 <- function(a) {
ax2 <- function(x) {
y <- a * x^2
return(y)
}
return(ax2)
}
The result is:
> fun <- create_ax2(3)
> fun(1)
[1] 3
> fun(2)
[1] 12
> fun(2.5)
[1] 18.75
I have such a create function in R, a complicated one, which takes a couple of arguments, sets some of the constants used in the returned function, does some intermediary computations, etc. But the resulting function is way too slow. Hence I tried to translate the code to C++ to use it with Rcpp. However, I can't figure out a way to construct a function inside a C++ function and return it to be used in R.
This is what I have so far:
Rcpp::Function createax2Rcpp(int a) {
double ax2(double x) {
return(a * pow(x, 2));
};
return (ax2);
}
This gives the error 'function definition is not allowed here', and I am stuck on how to create the function.
EDIT: The question RcppArmadillo pass user-defined function comes close, but as far as I can tell, it only provides a way to pass a C++ function to R. It does not provide a way to initialise some values in the C++ function before it is passed to R.
OK, as far as I understand, you want a function that returns a closure, i.e. "the function defined in the closure 'remembers' the environment in which it was created."
In C++11 and later it is quite possible to define such a function, along these lines:
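#include <cmath>
#include <functional>
std::function<double(double)> createax2Rcpp(int a) {
// capture a by value, so the lambda can outlive this call
auto ax2 = [a](double x) { return double(a) * std::pow(x, 2); };
return ax2;
}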
What happens is that the compiler creates an anonymous class with an overloaded operator() and an object of that class which captures a (the closure). That object is moved out of the creator function, and on return it is wrapped in a std::function via type erasure.
But a C/C++ function callable from R has to be of a specific, narrower type. You can wrap a narrow object in a wider one such as std::function, but not vice versa.
So I don't know how to turn a std::function into a proper R function; it looks impossible.
Perhaps emulating the closure, as below, might help:
static int __a;
double ax2(double x) {
return(__a * pow(x, 2));
}
Rcpp::Function createax2Rcpp(int a) {
__a = a;
return (ax2);
}
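One caveat of the static-variable emulation, sketched here in plain R rather than C++: there is only one shared variable, so every function returned by the creator sees the value from the most recent call, whereas an R closure keeps its own copy of a. In this sketch shared_a stands in for the C++ static variable; shared_a, ax2_shared and create_ax2_shared are hypothetical names.

```r
# R closure factory: each returned function keeps its own `a`.
create_ax2 <- function(a) {
  force(a)                 # evaluate `a` now rather than lazily on first use
  function(x) a * x^2
}

# Emulation of the static-variable approach: one shared slot for all functions.
shared_a <- NULL           # plays the role of the C++ static variable
ax2_shared <- function(x) shared_a * x^2
create_ax2_shared <- function(a) {
  shared_a <<- a           # overwrite the single shared value
  ax2_shared
}

f3 <- create_ax2(3)
f5 <- create_ax2(5)
f3(2)    # [1] 12: f3 still uses a = 3

s3 <- create_ax2_shared(3)
s5 <- create_ax2_shared(5)
s3(2)    # [1] 20: s3 now sees a = 5 as well
```

If only one function is ever created at a time, the shared slot is harmless; otherwise each "created" function silently changes the others.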

Function (defined by user) as argument of a function

I would like to write a function where one of the argument is a function written by the user.
Specifically, I have something like:
My_function <- function(n,g){
x<-dnorm(n,0,1)
y<-g(x)
return(y)
}
For example, g(x)=x^2 ... but is chosen by the user. Of course, I could directly put g(dnorm(n,0,1)) as argument but I would like the user to write it in terms of x, i.e. g<-x^2 in the example.
How could I do this, since the x object is only defined within the function (and not in the arguments)?
I can't define the g function beforehand (otherwise, I reckon it's easy). It has to be defined within "My_function" so that the user defines everything he needs in one line.
Why not just declare g as a function with an argument?
g=function(x) x^2
My_function=function(n,g){
x<-dnorm(n,0,1)
y<-g(x)
return(y)
}
My_function(1,g)
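If the user should really write only the expression in terms of x (e.g. x^2) inside the single call, one option is to capture the unevaluated argument with substitute() and evaluate it with x bound to the computed values. A minimal sketch; My_function_expr is a hypothetical name, and this non-standard evaluation makes the function harder to call programmatically than passing a function.

```r
My_function_expr <- function(n, g) {
  g_expr <- substitute(g)          # the unevaluated expression, e.g. x^2
  x <- dnorm(n, 0, 1)
  # Evaluate the expression with `x` bound to our value; any other names
  # in it are looked up in the caller's environment.
  eval(g_expr, list(x = x), parent.frame())
}

My_function_expr(1, x^2)           # same as dnorm(1)^2
k <- 3
My_function_expr(0, k * x)         # `k` comes from the calling environment
```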

Converting positional arguments to named parameters in an R function based on variable name

In R there's a common function calling pattern that looks like this:
child = function(a, b, c) {
a * b - c
}
parent = function(a, b, c) {
result = child(a=a, b=b, c=c)
}
This repetition of names is helpful because it prevents potentially insidious errors if the ordering of the child's arguments were to change, or if an additional variable were to be added into the list:
childReordered = function(c, b, a) { # same args in different order
a * b - c
}
parent = function(a, b, c) {
result = childReordered(a, b, c) # probably an error
}
But this becomes cumbersome when the names of the arguments get long:
child = function(longVariableNameA, longVariableNameB, longVariableNameC) {
longVariableNameA * longVariableNameB - longVariableNameC
}
parent = function(longVariableNameA, longVariableNameB, longVariableNameC) {
child(longVariableNameA=longVariableNameA, longVariableNameB=longVariableNameB, longVariableNameC=longVariableNameC)
}
I'd like to have a way to get the safety of the named parameters without needing to actually type the names again. And I'd like to be able to do this when I can modify only the parent, and not the child. I'd envision something like this:
parent = function(a, b, c) {
result = child(params(a, b, c))
}
Where params() would be a new function that converts the unnamed arguments to named parameters based on the names of the variables. For example:
child(params(c,b,a)) == child(a=a, b=b, c=c)
There are a couple of functions in 'pryr' that come close to this, but I haven't figured out how to combine them to do quite what I want. named_dots(c,b,a) returns list(c=c, b=b, a=a), and standardise_call() has a similar operation, but I haven't figured out how to convert the results into something that can be passed to an unmodified child().
I'd like to be able to use a mixture of implicit and explicitly named parameters:
child(params(b=3, c=a, a)) == child(b=3, c=a, a=a)
It would also be nice to be able to mix in some unnamed constants (not variables), and have them treated as named arguments when passed to the child.
child(params(7, c, b=x)) == child(a=7, c=c, b=x) # desired
named_dots(7, c, b=x) == list("7"=7, c=c, b=x) # undesired
But in a non-R way, I'd prefer to raise errors rather than trying to muddle through with what are likely programmer mistakes:
child(params(c, 7, b=x)) # ERROR: unnamed parameters must be first
Are there tools that already exist to do this? Simple ways to piece together existing functions to do what I want? Better ways to accomplish the same goal of getting safety in the presence of changing parameter lists without unwieldy repetition? Improvements to my suggested syntax to make it even safer?
Pre-bounty clarification: Both the parent() and child() functions should be considered unchangeable. I'm not interested in wrapping either with a different interface. Rather, I'm looking here for a way to write the proposed params() function in a general manner that can rewrite the list of arguments on the fly so that both parent() and child() can be used directly with a safe but non-verbose syntax.
Post-bounty clarification: While inverting the parent-child relationship and using do.call() is a useful technique, it's not the one I'm looking for here. Instead, I'm looking for a way to accept a '...' argument, modify it to have named parameters, and then return it in a form that the enclosing function will accept. It's possible that as others suggest this is truly impossible. Personally, I currently think it is possible with a C level extension, and my hope is that this extension already exists. Perhaps the vadr package does what I want? https://github.com/crowding/vadr#dot-dot-dot-lists-and-missing-values
Partial credit: I feel silly just letting the bounty expire. If there are no full solutions, I'll award it to anyone who gives a proof of concept of at least one of the necessary steps. For example, modifying a '...' argument within a function and then passing it to another function without using do.call(). Or returning an unmodified '...' argument in a way that the parent can use it. Or anything that best points the way toward a direct solution, or even some useful links: http://r.789695.n4.nabble.com/internal-manipulation-of-td4682090.html But I'm reluctant to award it to an answer that starts with the (otherwise entirely reasonable) premise that "you don't want to do that", or "that's impossible so here's an alternative".
Bounty awarded: There are several really useful and practical answers, but I chose to award the bounty to @crowding. While he (probably correctly) asserts that what I want is impossible, I think his answer comes closest to the 'idealistic' approach I'm aiming for. I also think that his vadr package might be a good starting point for a solution, whether it matches my (potentially unrealistic) design goals or not. The 'accepted answer' is still up for grabs in case someone figures out a way to do the impossible. Thanks for the other answers and suggestions, and hopefully they will help someone put together the pieces for a more robust R syntax.
I think attempting to overwrite the built-in argument matching functionality of R is somewhat dangerous, so here is a solution that uses do.call.
It was unclear how much of parent is changeable.
# This ensures that only named arguments to formals get passed through
parent = function(a, b, c) {
do.call("child", mget(names(formals(child))))
}
A second option, based on the "magic" of write.csv
# this second option replaces the call to parent with child and passes the
# named arguments that have been matched within the call to parent
#
parent2 <- function(a,b,c){
Call <- match.call()
Call[[1]] <- quote(child)
eval(Call)
}
You can't change the parameters to a function from inside the function call. The next best way would be to write a simple wrapper around the call. Perhaps something like this can help
with_params <- function(f, ...) {
dots <- substitute(...())
dots <- setNames(dots, sapply(dots, deparse))
do.call(f, as.list(dots), envir=parent.frame())
}
And we can test with something like
parent1 <- function(a, b, c) {
child(a, b, c)
}
parent2 <- function(a, b, c) {
with_params(child, a, b, c)
}
child <- function(a, b, c) {
a * b - c
}
parent1(5,6,7)
# [1] 23
parent2(5,6,7)
# [1] 23
child <- function(c, a, b) {
a * b - c
}
parent1(5,6,7)
# [1] 37
parent2(5,6,7)
# [1] 23
Note that parent2 is robust to changes in the parameter order of child, while parent1 is not.
It's not easy to get the exact syntax you're proposing. R is lazily evaluated, so syntax appearing in an argument to a function is only looked at after the function call is started. By the time the interpreter encounters params() it will have already started the call to child() and bound all of child's arguments and executed some of child's code. Can't rewrite child's arguments after the horse has left the barn, so to speak.
(Most non-lazy languages wouldn't let you do this either, for various reasons)
So the syntax will need to be something that has the references to 'child' and 'params' in its arguments. vadr has a %()% operator that can work: it applies a dotlist given on the right to a function given on the left. So you would write:
child %()% params(a, b, c)
where params catches its arguments in a dotlist and manipulates them:
params <- function(...) {
d <- dots(...)
ex <- expressions(d)
nm <- names(d) %||% rep("", length(d))
for (i in 1:length(d)) {
if (is.name(ex[[i]]) && nm[[i]] == "") {
nm[[i]] = as.character(ex[[i]])
}
}
names(d) <- nm
d
}
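For readers without vadr, roughly the same idea can be sketched in base R: an infix operator that applies a list of arguments to a function, plus a params-like helper that names each unnamed bare-symbol argument after the symbol. The names params_base, %apply%, child_r and parent_r are hypothetical and not part of vadr; unlike vadr's dotlists, this version forces all arguments eagerly via list(...).

```r
params_base <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1L]   # unevaluated arguments
  nm <- names(exprs)
  if (is.null(nm)) nm <- rep("", length(exprs))
  for (i in seq_along(exprs)) {
    # An unnamed bare symbol such as `a` becomes the argument `a = a`.
    if (nm[[i]] == "" && is.name(exprs[[i]])) nm[[i]] <- as.character(exprs[[i]])
  }
  vals <- list(...)          # evaluate the arguments in the caller's frame
  names(vals) <- nm
  vals
}

# Apply a (partially) named list of arguments to a function.
`%apply%` <- function(f, args) do.call(f, args)

child_r <- function(c, b, a) a * b - c       # formals in a different order
parent_r <- function(a, b, c) child_r %apply% params_base(a, b, c)

parent_r(5, 6, 7)   # [1] 23, i.e. 5 * 6 - 7
```

Constants and other non-symbol arguments stay unnamed, so do.call() still matches them by position.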
This was meant to be a comment, but it doesn't fit the length limit. I commend you for your programming ambition and purity, but I think the goal is unattainable. Let's assume params exists and apply it to the function list. By the definition of params, identical(list(a = a, b = b, c = c), list(params(a, b, c))) is TRUE. From this it follows that identical(a, params(a, b, c)) is TRUE, by taking the first element of the first and second arguments of identical. From which it follows that params does not depend on its second and later arguments, a contradiction. Q.E.D.
But I think your idea is a lovely example of DRY in R, and I am perfectly happy with do.call(f, params(a,b,c)), which has an additional do.call but no repetition. With your permission I would like to incorporate it in my package bettR, which collects various ideas to improve the R language.
A related idea which I was toying with is creating a function that allows another function to get missing args from the calling frame. That is, instead of calling f(a = a, b = b), one could call f() and inside f there would be something like args = eval(formals(f), parent.frame()), but encapsulated into a macro-like construct args = get.args.by.name or some such. This is distinct from your idea in that it requires f to be programmed deliberately to have this feature.
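The get.args.by.name idea can be sketched in base R as a helper that the callee calls first: it works out which formals were not supplied in the call and copies same-named variables from the frame that made the call. fill_missing_from_caller is a hypothetical name; as the comment says, the callee has to opt in by calling it, and this sketch treats formals with defaults like any other unsupplied argument.

```r
fill_missing_from_caller <- function() {
  fn     <- sys.function(sys.parent())     # the function that called us
  fn_env <- parent.frame()                 # its evaluation frame
  from   <- parent.frame(2)                # the frame that called *it*
  # Which formals were actually supplied in the call?
  mc     <- match.call(fn, sys.call(sys.parent()))
  given  <- names(as.list(mc))[-1L]
  for (nm in setdiff(names(formals(fn)), given)) {
    # exists()/get() also search the caller's enclosing environments
    if (nm != "..." && exists(nm, envir = from)) {
      assign(nm, get(nm, envir = from), envir = fn_env)
    }
  }
  invisible(NULL)
}

f <- function(a, b) {
  fill_missing_from_caller()
  a - b
}
caller <- function() {
  a <- 10
  b <- 3
  c(f(), f(b = 1))   # [1] 7 9
}
caller()
```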
Here's an answer that at first would appear to work. Diagnosing why it does not may lead to enlightenment as to why it cannot (hint: see the first paragraph of @crowding's answer).
params<-function(...) {
dots<-list(...)
names(dots)<-eval(substitute(alist(...)))
child.env<-sys.frame(-1)
child.fun<-sys.function(sys.parent()+1)
args<-names(formals(child.fun))
for(arg in args) {
assign(arg,dots[[arg]],envir=child.env)
}
dots[[args[[1]]]]
}
child1<-function(a,b,c) a*b-c
parent1<-function(a,b,c) child1(params(a,b,c))
parent1(1,2,3)
#> -1
child2<-function(a,c,b) a*b-c #swap b and c in formals
parent2<-function(a,b,c) child2(params(a,b,c)) #mirrors parent1
parent2(1,2,3)
#> -1
Both produce 1*2-3 == -1 even though the order of b=2 and c=3 has been swapped in the formal argument list of child1 versus child2.
This is basically a clarification to Mnel's answer. If it happens to answer your question, please do not accept it or award the bounty; it should go to Mnel. First, we define call_as_parent, which you use to call a function inside another function as the outside function:
call_as_parent <- function(fun) {
par.call <- sys.call(sys.parent())
new.call <- match.call(
eval(par.call[[1L]], parent.frame(2L)),
call=par.call
)
new.call[[1L]] <- fun
eval(new.call, parent.frame())
}
Then we define parent and child:
child <- function(c, b, a) a - b - c
parent <- function(a, b, c) call_as_parent(child)
And finally, some examples
child(5, 10, 20)
# [1] 5
parent(5, 10, 20)
# [1] -25
Notice how clearly in the second example the 20 is getting matched to c, as it should.

R warning() wrapper - raise to parent function

I have a wrapper around the in-built warning() function in R that basically calls warning(sprintf(...)):
warningf <- function(...)
warning(sprintf(...))
This is because I use warning(sprintf(...)) so often that I decided to make a function out of it (it's in a package I have of functions I use often).
I then use warningf when I write functions. i.e., instead of writing:
f <- function() {
# ... do stuff
warning(sprintf('I have %i bananas!',2))
# ... do stuff
}
I write:
f <- function() {
# ... do stuff
warningf('I have %i bananas!',2)
# ... do stuff
}
If I call the first f(), I get:
Warning message:
In f() : I have 2 bananas!
This is good: it tells me that the warning came from f() and what went wrong.
If I call the second f(), I get:
Warning message:
In warningf("I have %i bananas!",2) : I have 2 bananas!
This is not ideal - it tells me the warning was in the warningf function (of course, because it's the warningf function that calls warning, not f), masking the fact that it actually came from the f() function.
So my question is: can I somehow "raise" the warning call so that the message says In f() instead of In warningf(...)?
One way of dealing with this is to get the calls on your call stack and then paste the call of the parent frame into your warning.
You do this with the function sys.call(), which returns a call from the call stack. You want the second-to-last one, i.e. the caller of warningf:
warningf <- function(...){
parent.call <- sys.call(sys.nframe() - 1L)
warning(paste("In", deparse(parent.call), ":", sprintf(...)), call.=FALSE)
}
Now, if I run your function:
> f()
Warning message:
In f() : I have 2 bananas!
Later edit: deparse(parent.call) converts the call to a string, so that if f() has arguments the warning shows the call as it was specified (i.e. including the arguments).
I know it's old, but sys.call(sys.nframe() - 1L), or sys.call(-1), returns a call object containing both the function name and the arguments.
If you use it inside paste() without deparse(), each element is converted separately, so you get the message twice: once for the function and once for the argument.
The answer doesn't show this because f() has no arguments.
sys.call(sys.nframe() - 1L)[1] does the trick.
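Another option, sketched here rather than taken from the answers above, is to construct the warning condition yourself: warning() also accepts a condition object, and simpleWarning() lets you attach the call to report. In this minimal sketch, sys.call(-1) is the call of whatever function invoked warningf, so the message reads as if f() had raised it, with no manual pasting of the call.

```r
warningf <- function(...) {
  # Report the warning as coming from whoever called warningf().
  warning(simpleWarning(sprintf(...), call = sys.call(-1)))
}

f <- function() {
  # ... do stuff
  warningf("I have %i bananas!", 2L)
  # ... do stuff
}

f()
# Warning message:
# In f() : I have 2 bananas!
```

Because the call is stored in the condition, handlers such as tryCatch() also see f() via conditionCall().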

Forcing specific data types as arguments to a function

I was just wondering if there was a way to force a function to only accept certain data types, without having to check for it within the function; or, is this not possible because R's type-checking is done at runtime (as opposed to those programming languages, such as Java, where type-checking is done during compilation)?
For example, in Java, you have to specify a data type:
class t2 {
public int addone (int n) {
return n+1;
}
}
In R, a similar function might be
addone <- function(n)
{
return(n+1)
}
but if a vector is supplied, a vector will (obviously) be returned. If you only want a single integer to be accepted, is the only way to do this to have a condition within the function, along the lines of
addone <- function(n)
{
if(is.vector(n) && length(n)==1)
{
return(n+1)
} else
{
return ("You must enter a single integer")
}
}
Thanks,
Chris
This is entirely possible using S3 classes. Your example is somewhat contrived in the context of R, since I can't think of a practical reason why one would want to create a class of a single value. Nonetheless, this is possible. As an added bonus, I demonstrate how the function addone can be used to add the value of one to numeric vectors (trivial) and character vectors (so A turns to B, etc.):
Start by creating an S3 generic for addone, utilising the S3 dispatch mechanism UseMethod:
addone <- function(x){
UseMethod("addone", x)
}
Next, create the contrived class single, defined as the first element of whatever is passed to it:
as.single <- function(x){
ret <- unlist(x)[1]
class(ret) <- "single"
ret
}
Now create methods to handle the various classes. The default method will be called unless a specific class is defined:
addone.default <- function(x) x + 1
addone.character <- function(x)rawToChar(as.raw(as.numeric(charToRaw(x))+1))
addone.single <- function(x)x + 1
Finally, test it with some sample data:
addone(1:5)
[1] 2 3 4 5 6
addone(as.single(1:5))
[1] 2
attr(,"class")
[1] "single"
addone("abc")
[1] "bcd"
Some additional information:
Hadley's devtools wiki is a valuable source of information on all things, including the S3 object system.
The S3 system doesn't provide strict typing; it can quite easily be abused. For stricter object orientation, have a look at S4 classes, reference classes, or the proto package for prototype-based object-oriented programming.
You could write a wrapper like the following:
check.types = function(classes, func) {
n = as.name
params = formals(func)
param.names = lapply(names(params), n)
handler = function() { }
formals(handler) = params
checks = lapply(seq_along(param.names), function(I) {
as.call(list(n('assert.class'), param.names[[I]], classes[[I]]))
})
body(handler) = as.call(c(
list(n('{')),
checks,
list(as.call(list(n('<-'), n('.func'), func))),
list(as.call(c(list(n('.func')), lapply(param.names, as.name))))
))
handler
}
assert.class = function(x, cls) {
stopifnot(cls %in% class(x))
}
And use it like
f = check.types(c('numeric', 'numeric'), function(x, y) {
x + y
})
> f(1, 2)
[1] 3
> f("1", "2")
Error: cls %in% class(x) is not TRUE
This is made somewhat inconvenient by R not having decorators. It is kind of hacky, and it suffers from some serious problems:
1. You lose lazy evaluation, because you must evaluate an argument to determine its type.
2. You still can't check the types until call time; real static type checking lets you check the types even of a call that never actually happens.
Since R uses lazy evaluation, (2) might make type checking not very useful, because the call might not actually occur until very late, or never.
The answer to (2) would be to add static type information. You could probably do this by transforming expressions, but I don't think you want to go there.
I've found stopifnot() to be highly useful for these situations as well.
x <- function(n) {
stopifnot(is.vector(n) && length(n)==1)
print(n)
}
It is so useful because it gives the user a fairly clear error message when the condition is false.
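For the original addone example, the same run-time check can be written with stop() so the message says exactly what was expected. A sketch, assuming "a single integer" means one non-missing, whole-valued number (so both 3 and 3L are accepted); the check still happens at call time, not at compile time.

```r
addone <- function(n) {
  # Accept only a single, non-missing, whole number.
  # `||` short-circuits, so is.na() is only reached for length-1 numerics.
  if (!is.numeric(n) || length(n) != 1L || is.na(n) || n != round(n)) {
    stop("`n` must be a single integer", call. = FALSE)
  }
  n + 1
}

addone(3)           # [1] 4
try(addone(1:2))    # signals the error instead of returning a string
```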
