scope of nested R function - r

I have an example where I am not sure I understand scoping in R, nor do I think it's doing the Right Thing. The example is modified from "An R and S-PLUS Companion to Applied Regression" by J. Fox
> make.power = function(p) function(x) x^p
> powers = lapply(1:3, make.power)
> lapply(powers, function(p) p(2))
What I expected in the list powers were three functions that compute the identity, square, and cube respectively, but they all cube their argument. If I don't use lapply, it works as expected.
> id = make.power(1)
> square = make.power(2)
> cube = make.power(3)
> id(2)
[1] 2
> square(2)
[1] 4
> cube(2)
[1] 8
Am I the only person to find this surprising or disturbing? Is there a deep satisfying reason why it is so? Thanks
PS: I have performed searches on Google and SO, but, probably due to the generality of the keywords related to this problem, I've come out empty handed.
PPS: The example is motivated by a real bug in the package quickcheck, not by pure curiosity. I have a workaround for the bug, thanks for your concern. This is about learning something.
After posting the question, of course, I got an idea for a different example that could clarify the issue.
> p = 1
> id = make.power(p)
> p = 2
> square = make.power(p)
> id(2)
[1] 4
p has the same role as the loop variable hidden inside lapply. p is passed to make.power by a mechanism that in this case looks like pass-by-reference; make.power doesn't evaluate it, it just keeps a pointer to it. Am I on the right track?

This fixes the problem
make.power = function(p) {force(p); function(x) x^p}
powers = lapply(1:3, make.power)
lapply(powers, function(p) p(2))
The issue is that function arguments are passed as "promises" that aren't evaluated until they are actually used. Here, because you never actually use p when calling make.power(), it remains in the newly created environment as a promise that points to the variable passed to the function. When you finally call one of the functions in powers, that promise is evaluated, and by then the most recent value of p is the one from the last iteration of the lapply. Hence all your functions are cubic.
The force() here forces the evaluation of the promise. This allows each newly created function to hold a reference to a specific value of p.
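To see the same mechanism in isolation, here is a minimal sketch using delayedAssign(), which creates a promise explicitly (my own illustration, not part of the original answer):
p <- 1
delayedAssign("q", p)  # q is now a promise whose expression is the symbol p
p <- 3
q                      # forcing the promise only now, so it sees the latest p
## [1] 3
The promise for q captured the expression p, not the value 1, just as the lazy argument p did inside make.power() before force() was added.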

Related

Why does R treat function closures differently when the inner and outer functions are broken out onto separate lines?

In R, a closure can be used to define a family of related functions, as in the following example:
> power <- function(exponent) function(x) x^exponent
> square <- power(2)
> square
function(x) x^exponent
<environment: 0x7fed9ee6b110>
> square(2)
[1] 4
> cube <- power(3)
> cube
function(x) x^exponent
<bytecode: 0x7fed9f21d5b8>
<environment: 0x7fed9e734d58>
> cube(2)
[1] 8
However, when I attempt to divide the "inner" and "outer" functions onto separate lines, something strange happens:
> finner <- function(x) x^exponent
> fouter <- function(exponent) finner
> square_alt <- fouter(2)
> square_alt
function(x) x^exponent
> square_alt(2)
Error in square_alt(2) : object 'exponent' not found
In the first example, when I defined the function square <- power(2), it seems like R took the number 2, created a special environment for it, and inside that environment, it assigned exponent <- 2.
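One way to check that intuition (my own addition, using the definitions above) is to inspect the environment attached to square:
ls(environment(square))
## [1] "exponent"
get("exponent", environment(square))
## [1] 2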
In the second example, when I defined square_alt <- fouter(2), R apparently didn't create such an environment, but nevertheless a valid function or object still seems to have resulted, as there's no error message after that line.
Question 1: What happened to the "2" in square_alt <- fouter(2)? R evidently consumed the 2, but what did R actually do with the "2", instead of assigning exponent <- 2?
Question 2: Given the way I've defined the fouter closure in the second example, (i.e., breaking out the definition over two lines) what are my options for assigning exponent <- 2 inside the environment of that object--if indeed it's even possible to make such an assignment?
Question 3: Why does the cube function also have a bytecode assigned to it, in addition to an environment? Given I created the cube function exactly the same way as I did for square, why does R give back a seemingly different result in this case?
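As a hedged illustration of why the two-line version behaves differently (not part of the original question): finner was created at top level, so its enclosing environment is the global environment, and returning it from fouter does not change that; the binding exponent <- 2 lives only in fouter's call frame, which is then discarded.
environment(finner)
## <environment: R_GlobalEnv>
environment(fouter(2))
## <environment: R_GlobalEnv>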

Any logical test to distinguish between make-up of numerical objects

I was wondering if there is a way for R to detect the existence or absence of the * sign as used in the following objects?
In other words, can R understand that a has a * sign but b doesn't?
a = 3*4
b = 12
If you keep the expressions unevaluated, R can understand their internal complexity. Under normal circumstances, though, R evaluates expressions immediately, so there is no way to tell the difference between a <- 3*4 and b <- 12 once the assignments have been made. That means that the answer to your specific question is No.
Dealing with unevaluated expressions can get a bit complex, but quote() is one simple way to keep e.g. 3*4 from being evaluated:
> length(quote(3*4))
[1] 3
> length(quote(12))
[1] 1
If you're working inside a function, you can use substitute to retrieve the unevaluated form of the function arguments:
> f <- function(a) {
+ length(substitute(a))
+ }
> f(12)
[1] 1
> f(3*4)
[1] 3
In case you're pursuing this farther, you should be aware that counting complexity might not be as easy as you think:
> f(sqrt(2*3+(7*19)^2))
[1] 2
What's going on is that R stores expressions as a tree; the top level here is made up of sqrt and <the rest of the expression>, which has length 2. If you want to measure complexity you'll need to do some kind of collapsing or counting down the branches of the tree ...
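For example, a minimal recursive counter (a hypothetical helper, not from the original answer) that walks the whole tree might look like this:
count_nodes <- function(e) {
  # a call is a list-like node: count its operator and recurse into its arguments
  if (is.call(e)) sum(vapply(as.list(e), count_nodes, numeric(1))) else 1
}
count_nodes(quote(3 * 4))
## [1] 3
count_nodes(quote(sqrt(2 * 3 + (7 * 19)^2)))
## [1] 11
Note that the parentheses around 7 * 19 count as a call to `(`, which is why the second count is higher than you might expect.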
Furthermore, if you first assign a <- 3*4 and then call f(a) you get 1, not 3, because substitute() gives you back just the symbol a, which has length 1 ... the information about the difference between "12" and "3*4" gets lost as soon as the expression is evaluated, which happens when the value is assigned to the symbol a. The bottom line is that you have to be very careful in controlling when expressions get evaluated, and it's not easy.
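Using the f() defined above, the loss of information looks like this:
a <- 3*4
f(a)   # substitute() sees only the symbol a
## [1] 1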
Hadley Wickham's chapter on expressions might be a good place to read more.

'=' vs. '<-' as a function argument in R

I am a beginner so I'd appreciate any thoughts, and I understand that this question might be too basic for some of you.
Also, this question is not about the difference between <- and =, but about the way they get evaluated when they are part of the function argument. I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
My objective is to get rid of the variables in my environment. From reading the above thread, I believe that a variable assigned with <- inside a function call would exist in the user workspace, so there shouldn't be any issue with deleting all the variables.
Here is my code and two questions:
Question 1
First off, this code doesn't work.
rm(ls()) #throws an error
I believe this happens because ls() returns a character vector, and rm() expects an object name. Am I correct? If so, I would appreciate it if someone could guide me on how to get object names from a character vector.
Question 2
I googled this topic and found that this code below deletes all variables.
rm(list = ls())
While this does help me, I am unsure why = is used instead of <-. If I run the following code, I get an error Error in rm(list <- ls()) : ... must contain names or character strings
rm(list <- ls())
Why is this? Can someone please guide me? I'd appreciate any help/guidance.
I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
No wonder, since the answers there are actually quite confusing, and some are outright wrong. Since that’s the case, let’s first establish the difference between them before diving into your actual question (which, it turns out, is mostly unrelated):
<- is an assignment operator
In R, <- is an operator that performs assignment from right to left, in the current scope. That’s it.
= is either an assignment operator or a distinct syntactic token
=, by contrast, has several meanings: its semantics change depending on the syntactic context it is used in:
If = is used inside a parameter list, immediately to the right of a parameter name, then its meaning is: “associate the value on the right with the parameter name on the left”.
Otherwise (i.e. in all other situations), = is also an operator, and by default has the same meaning as <-: i.e. it performs assignment in the current scope.
As a consequence of this, the operators <- and = can be used interchangeably¹. However, = has an additional syntactic role in an argument list of a function definition or a function call. In this context it’s not an operator and cannot be replaced by <-.
So all these statements are equivalent:
x <- 1
x = 1
x[5] <- 1
x[5] = 1
(x <- 1)
(x = 1)
f((x <- 5))
f((x = 5))
Note the extra parentheses in the last example: if we omitted these, then f(x = 5) would be interpreted as a parameter association rather than an assignment.
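To make the contrast concrete, here is a small sketch with a hypothetical one-argument function f (not part of the original answer):
f <- function(a) a * 2
f((x = 5))   # assignment: x is created in the calling scope and 5 is passed positionally
## [1] 10
x
## [1] 5
f(x = 5)     # parameter association: f has no parameter named x
## Error in f(x = 5) : unused argument (x = 5)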
With that out of the way, let’s turn to your first question:
When calling rm(ls()), you are passing ls() to rm as the ... parameter. Ronak’s answer explains this in more detail.
Your second question should be answered by my explanation above: <- and = behave differently in this context because the syntactic usage dictates that rm(list = ls()) associates ls() with the named parameter list, whereas <- is (as always) an assignment operator. The result of that assignment is then once again passed as the ... parameter.
¹ Unless somebody changed their meaning: operators, like all other functions in R, can be overwritten with new definitions.
To expand on my comment slightly, consider this example:
> foo <- function(a,b) b+1
> foo(1,b <- 2) # Works
[1] 3
> ls()
[1] "b" "foo"
> foo(b <- 3) # Doesn't work
Error in foo(b <- 3) : argument "b" is missing, with no default
The ... argument has some special stuff going on that restricts things a little further in the OP's case, but this illustrates the issue with how R is parsing the function arguments.
Specifically, when R looks for named arguments, it looks only for arg = val, with an equals sign. Otherwise, it parses the arguments positionally. So when you omit the first argument, a, and just do b <- 3, it thinks the expression b <- 3 is what you are passing for the argument a.
If you check ?rm
rm(..., list = character(), pos = -1, envir = as.environment(pos), inherits = FALSE)
where
... - the objects to be removed, as names (unquoted) or character strings (quoted).
and
list - a character vector naming objects to be removed.
So, if you do
a <- 5
and then
rm(a)
it will remove the a from the global environment.
Further, if there are multiple objects you want to remove:
a <- 5
b <- 10
rm(a, b)
This can also be written as
rm(... = a, b)
where we are specifying that the ... part in syntax takes the arguments a and b
Similarly, when we want to specify the list part of the syntax, it has to be given by
rm(list = ls())
doing list <- ls() will instead store the character vector of names returned by ls() in a variable named list:
list <- ls()
list
#[1] "a" "b" "list"
I hope this is helpful.

What's the real meaning about 'Everything that exists is an object' in R?

I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable is a pure base type, the result of is.object() is FALSE. So it should not be an object.
So what's the real meaning about 'Everything that exists is an object' in R?
The function is.object seems only to check whether the object has a "class" attribute. So it does not have the same meaning as in the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
NULL
is.object(x)
[1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
$class
[1] "my_class"
is.object(x)
[1] TRUE
Now, trying to answer your real question about the slogan, this is how I would put it: everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is best understood with functions and expressions, which are not usually thought of as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the function object itself and the objects that are supplied as the arguments. In the functional programming model, the result is defined by another object, the value of the call. Hence the traditional motto of the S language: everything is an object—the arguments, the value, and in fact the function and the call itself: all of these are defined as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take this expression, for example: mean(rnorm(100), trim = 0.9). Until it is evaluated, it is an object very much like any other. So you can change its elements just as you would with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100,2 ))
call
mean(rt(100, 2), trim = 0.9)
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
You can change its default arguments just like a simple object, like a list, too:
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves objects; in the traditional motto of the S language, everything is an object. Evaluation consists of taking the object representing an expression and returning the object that is the value of that expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call. Therefore, basic programming centers on creating and refining functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
So taking the parenthesis like an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
Which is not a good idea but illustrates the point. So I guess this is how I would sum it up: Everything that exists in R is an object because they are data which can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of this object which gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially composed of function calls, with objects as arguments and an object as the value. Understanding the central role of objects and functions in R makes use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects: For x to be an object means that it has a class, and thus class(x) returns a class for every object. Even functions have a class, as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here -- it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value and if not then class(x) will compute it from the type. From a conceptual perspective it matters not how the class is stored internally (stored or computed) -- what matters is that in both cases x is still an object and still has a class.
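A quick illustration of that distinction (my own check, not from the answer): sin has no class attribute stored, so is.object() returns FALSE, yet class(sin) above still computes "function" from its type.
attr(sin, "class")   # nothing is stored
## NULL
is.object(sin)
## [1] FALSE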
Functions: That all computation occurs through functions refers to the fact that even things you might not expect to be functions are actually functions. For example, when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3

Why are arguments to replacement functions not evaluated lazily?

Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
`g<-` <- function(x, value){print(x);print(substitute(value))}
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. The name a should have been printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is OK, as x's former value is 4. Notice the expression is passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well, the question is: bug or feature? What is going on here? I hope some guru users will shed some light on promises and lazy evaluation in R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value) x
`g<-` <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like f(x, y[] <- 1) <- 3, can cause the value temporary variable for the outer assignment to be overwritten and then removed by the inner one. This could be addressed by using multiple temporaries or using a promise for this variable as is done for the RHS. Printing of the replacement function call in error messages might then need to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value of z will be computed twice, once for a call to g(x, z) and once for the call to the replacement function g<-. It might be possible to address this by using promises. Using more temporaries would not work as it would mess up replacement functions that use substitute and/or nonstandard evaluation (and there are packages that do that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
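A small check of that claim (my own sketch; it redefines g<- so that the replacement simply stores the new value):
`g<-` <- function(x, value) value
x <- 0
g(x) <- a <- b <- 4 + 5   # the RHS (4 + 5) is evaluated first, then assigned right to left
c(x, a, b)
## [1] 9 9 9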
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
`g<-` <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
`g<-`(x, 10)
both the values of x and 10 are printed, but the value of x remains the same.
Speculation:
So the parser is distinguishing between whether you call g<- in the course of making an actual assignment, and when you simply call g<- directly.

Resources