Builtins and Closures in R - r

I don't know , if this is the correct place to ask subjective questions and I am fairly new to R, but i am really confused at this time.I was going through R-Language reference and found two objects by running typeof(is.na) and typeof(mean) which returned "builtin" and "closure" respectively on R prompt. I don't know what does it mean, I went to this site after searching, http://www.r-bloggers.com/closures-in-r-a-useful-abstraction/ , but unable to understand , Can anyone help me in understanding "closure" and "builtin" in some layman terms?

From help("closure"):
This type of function is not the only type in R: they are called
closures (a name with origins in LISP) to distinguish them from
primitive functions.
From help(".Primitive"):
a ‘primitive’ (internally implemented) function.
"internally implemented" is just a different term for "builtin".
Both (closures and primitives) are functions in R. A primitive function is implemented exclusively in C (with a special calling mechanism), a closure (mostly) in R.
Another distinction between them is, that a closure always has an environment associated with it (see the language definition). That's actually pretty much the definition of "closure".

function clousures, or simply functions, are made of three basic components: formals, body and environment.
The environment, or better the enclosing environment, of a function is the environment where the function is created and functions do remember it ...
> typeof(mean)
[1] "closure"
> environment(mean)
<environment: namespace:base>
> environment(function(x){x+1})
<environment: R_GlobalEnv>
Primitives, as opposite to closures, do not have an environment:
> typeof(sum)
[1] "builtin"
> environment(sum)
NULL
In practice, closures and Primitives, differ in the way arguments are passed to the internal function.
If we want to check if a function is a closure or primitive, we may use:
> is.primitive(sum)
[1] TRUE
> is.primitive(mean)
[1] FALSE

Related

How do I call the `function` function?

I am trying to call the function `function` to define a function in R code.
As we all know™️, `function`is a .Primitive that’s used internally by R to define functions when the user uses the conventional syntax, i.e.
mean1 = function (x, ...) base::mean(x, ...)
But there’s nothing preventing me from calling that primitive directly. Or so I thought. I can call other primitives directly (and even redefine them; for instance, in a moment of madness I overrode R’s builtin `for`). So this is in principle possible.
Yet I cannot get it to work for `function`. Here’s what I tried:
# Works
mean2 = as.function(c(formals(mean), quote(mean(x, ...))))
# Works
mean3 = eval(call('function', formals(mean), quote(mean(x, ...))))
# Error: invalid formal argument list for "function"
mean4 = `function`(formals(mean), quote(mean(x, ...)))
The fact that mean3 in particular works indicates to me that mean4 should work. But it doesn’t. Why?
I checked the definition of the `function` primitive in the R source. do_function is defined in eval.c. And I see that it calls CheckFormals, which ensures that each argument is a symbol, and this fails. But why does it check this, and what does that mean?
And most importantly: Is there a way of calling the `function` primitive directly?
Just to clarify: There are trivial workarounds (this question lists two, and there’s at least a third). But I’d like to understand how this (does not) works.
This is because function is a special primitive:
typeof(`function`)
#> [1] "special"
The arguments are not evaluated, so you have actually passed quote(formals(mean)) instead of the value of formals(mean). I don't think there's a way of calling function directly without evaluation tricks, except with an empty formals list which is just NULL.
For completeness’ sake, Lionel’s answer hints at a way of calling `function` after all. Unfortunately it’s rather restricted, since we cannot pass any argument definition except for NULL:
mean5 = `function`(NULL, mean(x, ...))
formals(mean5) = formals(mean)
(Note the lack of quoting around the body!)
This is of course utterly unpractical (and formals<- internally calls as.function anyway.)
After digging a little bit through the source code, here are a few observations:
The actual function creation is done by mkCLOSXP(). This is what gets called by function() {}, by as.function.default() and by .Primitive("function") (a.k.a. `function`)
as.function.default() gets routed to do_asfunction(), which also calls CheckFormals(). However, it directly constructs these formals a few lines above that.
As you pointed out, the other place where CheckFormals() gets called is inside do_function(). However, I don't think do_function() gets called by anything other than .Primitive("function"), so this is the only situation where CheckFormals() is called on the user's input.
CheckFormals() does actually correctly validate a pairlist object.
You can check the last point yourself by running parts of the CheckFormals() function using inline::cfunction
inline::cfunction( c(x="ANY"),
'Rprintf("is list?: %d\\nTag1 OK?: %d\\nTag2 OK?: %d\\nTag3 NULL?: %d\\n",
isList(x), TYPEOF(TAG(x)) == SYMSXP, TYPEOF(TAG(CDR(x))) == SYMSXP,
CDR(CDR(x)) == R_NilValue); return R_NilValue;' )( formals(mean) )
# is list?: 1
# Tag1 OK?: 1
# Tag2 OK?: 1
# Tag3 NULL?: 1
So, somewhere between you passing formals(means) to .Primitive("function") and it getting forwarded to CheckFormals() by do_function(), the argument loses its validity. (I don't know the R source well enough to tell you how that happens.) However, since do_function() is only called by .Primitive("function"), you don't encounter this situation with any other examples.

In R, is it possible to define a function using `function`? [duplicate]

I am trying to call the function `function` to define a function in R code.
As we all know™️, `function`is a .Primitive that’s used internally by R to define functions when the user uses the conventional syntax, i.e.
mean1 = function (x, ...) base::mean(x, ...)
But there’s nothing preventing me from calling that primitive directly. Or so I thought. I can call other primitives directly (and even redefine them; for instance, in a moment of madness I overrode R’s builtin `for`). So this is in principle possible.
Yet I cannot get it to work for `function`. Here’s what I tried:
# Works
mean2 = as.function(c(formals(mean), quote(mean(x, ...))))
# Works
mean3 = eval(call('function', formals(mean), quote(mean(x, ...))))
# Error: invalid formal argument list for "function"
mean4 = `function`(formals(mean), quote(mean(x, ...)))
The fact that mean3 in particular works indicates to me that mean4 should work. But it doesn’t. Why?
I checked the definition of the `function` primitive in the R source. do_function is defined in eval.c. And I see that it calls CheckFormals, which ensures that each argument is a symbol, and this fails. But why does it check this, and what does that mean?
And most importantly: Is there a way of calling the `function` primitive directly?
Just to clarify: There are trivial workarounds (this question lists two, and there’s at least a third). But I’d like to understand how this (does not) works.
This is because function is a special primitive:
typeof(`function`)
#> [1] "special"
The arguments are not evaluated, so you have actually passed quote(formals(mean)) instead of the value of formals(mean). I don't think there's a way of calling function directly without evaluation tricks, except with an empty formals list which is just NULL.
For completeness’ sake, Lionel’s answer hints at a way of calling `function` after all. Unfortunately it’s rather restricted, since we cannot pass any argument definition except for NULL:
mean5 = `function`(NULL, mean(x, ...))
formals(mean5) = formals(mean)
(Note the lack of quoting around the body!)
This is of course utterly unpractical (and formals<- internally calls as.function anyway.)
After digging a little bit through the source code, here are a few observations:
The actual function creation is done by mkCLOSXP(). This is what gets called by function() {}, by as.function.default() and by .Primitive("function") (a.k.a. `function`)
as.function.default() gets routed to do_asfunction(), which also calls CheckFormals(). However, it directly constructs these formals a few lines above that.
As you pointed out, the other place where CheckFormals() gets called is inside do_function(). However, I don't think do_function() gets called by anything other than .Primitive("function"), so this is the only situation where CheckFormals() is called on the user's input.
CheckFormals() does actually correctly validate a pairlist object.
You can check the last point yourself by running parts of the CheckFormals() function using inline::cfunction
inline::cfunction( c(x="ANY"),
'Rprintf("is list?: %d\\nTag1 OK?: %d\\nTag2 OK?: %d\\nTag3 NULL?: %d\\n",
isList(x), TYPEOF(TAG(x)) == SYMSXP, TYPEOF(TAG(CDR(x))) == SYMSXP,
CDR(CDR(x)) == R_NilValue); return R_NilValue;' )( formals(mean) )
# is list?: 1
# Tag1 OK?: 1
# Tag2 OK?: 1
# Tag3 NULL?: 1
So, somewhere between you passing formals(means) to .Primitive("function") and it getting forwarded to CheckFormals() by do_function(), the argument loses its validity. (I don't know the R source well enough to tell you how that happens.) However, since do_function() is only called by .Primitive("function"), you don't encounter this situation with any other examples.

Where are function constants stored if a function is created inside another function?

I am using a parent function to generate a child function by returning the function in the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function I cannot figure out in which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)
If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value — and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!1 You should therefore leave it out: this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Subsequent modifications of the variable two are reflected inside square (but only the first time! Further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
1 Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function exectuion.
The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.

Auto assignment

Recently I have been working with a set of R scripts that I inherited from a colleague. It is for me a trusted source but more than once I found in his code auto-assignments like
x <<- x
Is there any scope where such an operation could make sense?
This is a mechanism for copying a value defined within a function into the global environment (or at least, somewhere within the stack of parent of environments): from ?"<<-"
The operators ‘<<-’ and ‘->>’ are normally only used in functions,
and cause a search to be made through parent environments for an
existing definition of the variable being assigned. If such a
variable is found (and its binding is not locked) then its value
is redefined, otherwise assignment takes place in the global
environment.
I don't think it's particularly good practice (R is a mostly-functional language, and it's generally better to avoid function side effects), but it does do something. (#Roland points out in comments and #BrianO'Donnell in his answer [quoting Thomas Lumley] that using <<- is good practice if you're using it to modify a function closure, as in demo(scoping). In my experience it is more often misused to construct global variables than to work cleanly with function closures.)
Consider this example, starting in an empty/clean environment:
f <- function() {
x <- 1 ## assignment
x <<- x ## global assignment
}
Before we call f():
x
## Error: object 'x' not found
Now call f() and try again:
f()
x
## [1] 1
<<-
is a global assignment operator and I would imagine there should hardly ever be a reason to use it because it effectively causes side effects.
The scope to use it would be in any case when one wants to define a global variable or a variable one level up from current environment.
Alan gives a good answer: Use the superassignment operator <<- to write upstairs.
Hadley also gives a good answer: How do you use "<<-" (scoping assignment) in R?.
For details on the 'superassignment' operator see Scope.
Here is some critical information on the operator from the section on Assignment Operators in the R manual:
"The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment."
Thomas Lumley sums it up nicely: "The good use of superassignment is in conjuction with lexical scope, where an environment stores state for a function or set of functions that modify the state by using superassignment."
For example:
x <- NA
test <- function(x) {
x <<- x
}
> test(5)
> x
#[1] 5
That's a simple use here, <<- will do a parent environment search (case of nested functions declarations) and if not found assign in the global environment.
Usually this is a really bad ideaTM as you have no real control on where the variable will be assigned and you have chances it will overwrite a variable used for another purpose somewhere.

Why does rm inside a function not delete objects?

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see what I call rel.mem x is not deleted -- I know this is due to the incorrect environment on which rm is being attempted.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
As many commenters have pointed out that one easy fix is to pass the environment.
What I meant by a good fix [and I should have clarified it] is that the callee function (in this case rel.mem) is able to calculate/find out the environment when the caller was referring to and then remove the object from the right environment.
The type of reasoning in "2" can be done in other languages by inspecting the call stack -- for example in Java I would throw a dummy exception -- catch it and then parse the call stack. In other languages still I could use Aspect Oriented techniques. The question is can something like that be done in R?
As one commenter has suggested that there may be multiple objects with the same name and thus the "right" environment is meaningless -- as I've stated above that in other languages it is possible (sometimes with some creative trickery) to interpret the call-stack -- this may not be possible in R
As one commenter has suggested that rm(list=nm, envir = parent.frame()) will remove this from the parent environment. This is correct -- however I'm looking for something that will work for an arbitrary call depth.
The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Another two options are
envir = parent.frame() if you want to go one level up the call stack
Use inherits = TRUE to go up the call stack until you find something
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments:http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions
If you are using another function to find the global objects within your function such as ls(), you must state the environment in it explicitly too:
rel_mem <- function(nm) {
# State the environment in both functions
rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

Resources