How are apply family functions scoped? - r

Consider:
x <- 5
replicate(10, x <- x + 1)
This has output c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6). However:
x <- 5
replicate(10, x <<- x + 1)
has output c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15).
What does this imply about the environment that x <- x + 1 is evaluated in? Am I to believe that x is treated as if it is an internal variable for replicate? That appears to be what I'm seeing, but when I consulted the relevant section of the language definition, I saw the following:
It is also worth noting that the effect of foo(x <- y) if the argument is evaluated is to change the value of x in the calling environment and not in the evaluation environment of foo.
But if x really was changed in the calling environment, then why does:
x <- 5
replicate(10, x <- x + 1)
x
return 5 and not 15? What part have I misunderstood?

The sentence you quoted from the language definition is about standard evaluation, but replicate uses non-standard evaluation. Here's its source:
replicate <- function (n, expr, simplify = "array")
sapply(integer(n), eval.parent(substitute(function(...) expr)),
simplify = simplify)
The substitute(function(...) expr) call takes your expression x <- x + 1 without evaluating it, and creates a new function
function(...) x <- x + 1
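That's the function that gets passed to sapply(), which calls it once for each element of a vector of length n. Each call gets a fresh evaluation frame, so x <- x + 1 reads the outer x (5), creates a local x equal to 6 in that frame, and returns 6. The local x is discarded when the call ends. So all the assignments take place in the frames of that anonymous function, and the x you look at afterwards is untouched.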
When you use x <<- x + 1, the expression is still evaluated inside the constructed function, but <<- skips the function's own frame and assigns in its enclosing environment, which is the environment replicate() was called from (because of the eval.parent call). That's where the assignment happens, so each call sees the previous call's result. That's why you get the increasing values in the output.
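To see the mechanism without replicate() itself, here is a minimal sketch that builds the same kind of anonymous function by hand and passes it to sapply(). The names local_inc and global_inc are hypothetical; they only mimic what the substitute()/eval.parent() pair produces when replicate() is called from the top level.

```r
x <- 5

# Roughly what replicate(10, x <- x + 1) builds: a function whose body is the expression
local_inc <- function(...) x <- x + 1
res_local <- sapply(integer(10), local_inc)
print(res_local)  # ten 6s: each call has a fresh frame, reads the outer x (5), assigns a local x
print(x)          # still 5

# The same with <<-: the assignment goes to the enclosing environment, where x lives
global_inc <- function(...) x <<- x + 1
res_global <- sapply(integer(10), global_inc)
print(res_global)  # 6 7 8 ... 15
print(x)           # now 15
```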
So I think you understood the manual correctly, but it didn't make clear it was talking there about the case of standard evaluation. The following paragraph hints at what's happening here:
It is possible to access the actual (not default) expressions used as arguments inside the function. The mechanism is implemented via promises. When a function is being evaluated the actual expression used as an argument is stored in the promise together with a pointer to the environment the function was called from. When (if) the argument is evaluated the stored expression is evaluated in the environment that the function was called from. Since only a pointer to the environment is used any changes made to that environment will be in effect during this evaluation. The resulting value is then also stored in a separate spot in the promise. Subsequent evaluations retrieve this stored value (a second evaluation is not carried out). Access to the unevaluated expression is also available using substitute.
but the help page for replicate() doesn't make clear this is what it's doing.
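The promise behaviour described in that quote can be checked directly. In this sketch (use_twice and ignore_arg are hypothetical helpers), an argument expression with a side effect runs in the caller's environment, runs at most once, and does not run at all if the argument is never used.

```r
counter <- 0

# Uses its argument twice; the promise is forced once and its value is cached
use_twice <- function(arg) {
  first <- arg   # forces the promise: counter <- counter + 1 runs in the caller's frame
  second <- arg  # reuses the cached value; the expression is not evaluated again
  c(first, second)
}

# Never touches its argument, so the promise is never forced
ignore_arg <- function(arg) "argument not used"

res <- use_twice(counter <- counter + 1)
print(res)      # 1 1
print(counter)  # 1: the caller's counter changed, exactly once

ignore_arg(counter <- 100)
print(counter)  # still 1: an unforced promise has no side effects
```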
BTW, your title asks about apply family functions, but most of them other than replicate take an explicit function argument, so this issue doesn't arise there. For example, it's obvious that this doesn't affect the global x:
sapply(integer(10), function(i) x <- x + 1)

Related

Combining recursion with map - is reduce the solution?

I am trying to avoid using a for loop at all costs in this example. Consider this simple case:
I have a vector z and an initial condition b1:
z <- 1:5
b1 <- 0
Consider also this simple function, which just adds its two arguments:
f <- function(y, b){
return(y + b)
}
I'd like to write a function that generates a sequence S as follows: the first element, call it S[1], is f's output with the y argument set to z[1] and the b argument set to the initial condition b1. The second element is S[2] = f(y = z[2], b = S[1]), and so on. How can I do this?
Keep in mind that I don't want to use something like cumsum, since my actual function f is more complicated.
The desired output on this case would be the vector:
c(0 + 1,
1 + 2,
3 + 3,
6 + 4,
10 + 5)
Or c(1, 3, 6, 10, 15)
I thought about using Reduce, but I guess it only deals with the recursive argument b and not the mapping part y.
Reduce(function(prev, this) prev + this,
1:5, init=0, accumulate=TRUE)[-1]
# [1] 1 3 6 10 15
You can easily use your own function in place of my anonymous function; I defined it like that to show which value is which: the first argument is your S[n-1], the second is your z[n].
The [-1] is needed because of this: without init=, the first call to the function is effectively f(prev=z[1], this=z[2]), which is correct only when your b1 is 0. If b1 is ever anything else, then you must use init=b1. However, when using init=, Reduce effectively prepends it to the input vector (your z). This means that the first call to the function is f(prev=b1, this=z[1]), which is right, but it also means that the return value includes b1 and is therefore one element too long. Fortunately, we can drop the first element very easily.
Lastly, the normal operation of Reduce is to return only the value from the last call to the function (hence the name Reduce); with accumulate=TRUE, Reduce instead returns every intermediate result, which turns it into a cumulative operation.
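Putting the pieces together with the question's f, a minimal sketch might look like this. The wrapper name recurrence is hypothetical, and f here is the simple stand-in from the question; any two-argument function with the same (y, b) signature can be dropped in.

```r
f <- function(y, b) y + b  # stand-in for the real, more complicated f

# Builds S[n] = f(z[n], S[n-1]), starting from the initial condition b1
recurrence <- function(z, b1) {
  Reduce(function(prev, this) f(y = this, b = prev),
         z, accumulate = TRUE, init = b1)[-1]  # drop the prepended b1
}

z <- 1:5
print(recurrence(z, 0))   # 1 3 6 10 15
print(recurrence(z, 10))  # 11 13 16 20 25
```

The second call shows why init= matters: with a non-zero b1, starting from z[1] without init= would give the wrong first term.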

Understanding evaluation of input arguments of functions

I am reading Advanced R by Hadley Wickham, which provides some very good exercises. One of them asks for a description of this function:
f1 <- function(x = {y <- 1; 2}, y = 0) {
x + y
}
f1()
Can someone help me understand why it returns 3? I know there is something called lazy evaluation of the input arguments, and, e.g., another exercise asks for a description of this function
f2 <- function(x = z) {
z <- 100
x
}
f2()
and I correctly predicted it would return 100: x gets the value of z, which is evaluated inside the function, and then x is returned. I cannot figure out what happens in f1(), though.
Thanks.
See this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Evaluation:
When a function is called or invoked a new evaluation frame is
created. In this frame the formal arguments are matched with the
supplied arguments according to the rules given in Argument matching.
The statements in the body of the function are evaluated sequentially
in this environment frame.
...
R has a form of lazy evaluation of function arguments. Arguments are not evaluated until needed.
and this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Arguments:
Default values for arguments can be specified using the special form
‘name = expression’. In this case, if the user does not specify a
value for the argument when the function is invoked the expression
will be associated with the corresponding symbol. When a value is
needed the expression is evaluated in the evaluation frame of the
function.
In summary, if an argument has no user-specified value, its default expression is evaluated in the function's evaluation frame, and only when it is first needed. So y is not evaluated at first. In x + y, x is needed first, so the default of x is evaluated in the function's evaluation frame: y is assigned 1 in that frame, then x is set to 2. When y is then looked up, it is already bound to 1, so its default (0) never gets a chance to be evaluated. If you try f1(y = 1) or f1(y = 2), the result is still 3, because the assignment y <- 1 inside the default of x replaces whatever y was bound to in the function's frame.
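A quick way to confirm this explanation is to vary which argument is forced first. The variant f1_swapped below is hypothetical (not from the exercise); it only reverses the operands of +, and that alone changes the result because y's default is then forced before x's default can rebind y.

```r
f1 <- function(x = {y <- 1; 2}, y = 0) {
  x + y  # x is forced first; its default assigns y <- 1 in this frame
}

# Same defaults, but y is forced before x
f1_swapped <- function(x = {y <- 1; 2}, y = 0) {
  y + x  # y's default (0) is forced first; x's default rebinds y too late to matter
}

f1()          # 3
f1(y = 2)     # 3: y <- 1 inside x's default replaces the supplied y
f1(x = 10)    # 10: x's default never runs, so y keeps its default 0
f1_swapped()  # 2
```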

parameter passing mechanism in R

The following function is used to multiply a sequence 1:x by y
f1<-function(x,y){return (lapply(1:x, function(a,b) b*a, b=y))}
It looks like a is used to represent each element of the sequence 1:x, but I do not understand this parameter passing mechanism. Other OO languages, like Java or C++, have call by reference or call by value.
Short answer: R is call by value. Long answer: it can do both.
Call By Value, Lazy Evaluation, and Scoping
You'll want to read through the R language definition for more details.
R mostly uses call by value but this is complicated by its lazy evaluation:
So you can have a function:
f <- function(x, y) {
x * 3
}
If you pass in two big matrices as x and y, y is never evaluated at all, because it is never used. (Neither argument is copied just by being passed, either: R copies an object only when the function modifies it.)
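Both points can be checked with a small sketch. Passing stop() as y shows that an unused argument is never forced; modify_first is a hypothetical helper showing that modifying an argument inside a function leaves the caller's object unchanged (copy-on-modify), which is what makes R behave like call by value.

```r
f <- function(x, y) {
  x * 3
}

# y is never used, so its promise is never forced and stop() never runs
print(f(2, stop("y was evaluated")))  # 6

# Copy-on-modify: the function changes its own copy, not the caller's object
modify_first <- function(v) {
  v[1] <- 100  # the copy is made here, at the first modification
  v
}
v <- c(1, 2, 3)
w <- modify_first(v)
print(v)  # 1 2 3
print(w)  # 100 2 3
```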
But you can also access variables in parent environments of f:
y <- 5
f <- function(x) {
x * y
}
f(3) # 15
Or even:
y <- 5
f <- function() {
x <- 3
g <- function() {
x * y
}
g # return the closure explicitly; the assignment alone would return it invisibly
}
f() # returns function g()
f()() # returns 15
Call By Reference
There are two ways of doing call by reference in R that I know of.
One is by using Reference Classes, one of the three object oriented paradigms of R (see also: Advanced R programming: Object Oriented Field Guide)
The other is to use the bigmemory package and its big.matrix objects (see The bigmemory project). This allows you to create matrices whose underlying data is stored on the C/C++ side, outside R's own memory, with a pointer returned to the R session. This allows you to do fun things like accessing the same matrix from multiple R sessions.
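For the first option, here is a minimal Reference Class sketch using setRefClass() from the methods package. The class name Counter, its field count, and the helper bump are hypothetical; the point is that an ordinary function receiving the object modifies the caller's instance rather than a copy.

```r
library(methods)

# A Reference Class: instances are shared, not copied, when passed to functions
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1  # <<- updates the field on this object
      invisible(.self)
    }
  )
)

# An ordinary function that receives the object as an argument
bump <- function(counter) {
  counter$increment()
  invisible(NULL)
}

cnt <- Counter$new(count = 0)
bump(cnt)
bump(cnt)
print(cnt$count)  # 2: both calls modified the caller's object
```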
To multiply a vector x by a constant y, just do
x * y
The various *apply functions work very similarly to each other: you map a function over every element of your vector, list, matrix and so on:
x = 1:10
x.squared = sapply(x, function(elem)elem * elem)
print(x.squared)
[1] 1 4 9 16 25 36 49 64 81 100
It gets better with matrices and data frames because you can now apply a function over all rows or columns, and collect the output. Like this:
m = matrix(1:9, ncol = 3)
# The 1 below means apply over rows, 2 would mean apply over cols
row.sums = apply(m, 1, function(some.row) sum(some.row))
print(row.sums)
[1] 12 15 18
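Switching the second argument of apply() (MARGIN) from 1 to 2 works over columns instead. This short sketch also compares the result with the built-in colSums(), which is the simpler choice when all you need is a sum.

```r
m <- matrix(1:9, ncol = 3)

# MARGIN = 2 applies the function to each column
col.sums <- apply(m, 2, function(some.col) sum(some.col))
print(col.sums)  # 6 15 24

# colSums() returns doubles rather than integers, so compare values with ==
print(all(col.sums == colSums(m)))  # TRUE
```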
If you're looking for a simple way to multiply a sequence by a constant, definitely use @Fernando's answer or something similar. I'm assuming you're just trying to determine how parameters are being passed in this code.
lapply calls its second argument (in your case function(a, b) b*a) with each of the values of its first argument 1, 2, ..., x. Those values will be passed as the first parameter to the second argument (so, in your case, they will be argument a).
Any additional arguments to lapply after the first two, in your case b=y, are passed on to the function by name. So if you called your inner function fxn, then (taking y = 4, for example) your invocation of lapply makes calls like fxn(1, b=4), fxn(2, b=4), .... The parameters are passed by value.
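The expansion described above can be checked directly: this sketch names the question's inner function fxn (as the answer does) and compares lapply's result with the explicit calls, using y = 4 as the example value.

```r
fxn <- function(a, b) b * a  # the inner function from the question, given a name

y <- 4
res <- lapply(1:3, fxn, b = y)

# lapply makes one call per element: the element goes first, b = y is passed by name
same <- list(fxn(1, b = 4), fxn(2, b = 4), fxn(3, b = 4))
print(identical(res, same))  # TRUE
print(unlist(res))           # 4 8 12
```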
You should read the help page for lapply to understand how it works. Read this excellent answer for a good explanation of the different *apply family functions.
From the help page for lapply:
lapply(X, FUN, ...)
Here FUN is applied to each element of X, and ... refers to:
... optional arguments to FUN.
Since FUN has an extra argument b, we replace the ... with b=y.
You can see it as syntactic sugar that emphasizes that argument b is optional compared to argument a: a varies from call to call while b stays fixed. If the two arguments are symmetric, mapply is probably a better fit.

Why are arguments to replacement functions not evaluated lazily?

Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
'g<-' <- function(x, value){print(x);print(substitute(value))}
(the single quotes should really be backticks, as in `g<-`)
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. The name a should have been printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is ok, as x's former value is 4. Notice the expression passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well, the question is: bug or feature? What is going on here? I hope some guru users will shed some light on promises and lazy evaluation in R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value) x
'g<-' <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like
f(x, y[] <- 1) <- 3, can cause the value temporary
variable for the outer assignment to be overwritten and
then removed by the inner one. This could be addressed by
using multiple temporaries or using a promise for this
variable as is done for the RHS. Printing of the
replacement function call in error messages might then need
to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value
of z will be computed twice, once for a call to g(x, z)
and once for the call to the replacement function g<-. It
might be possible to address this by using promises.
Using more temporaries would not work as it would mess up
replacement functions that use substitute and/or
nonstandard evaluation (and there are packages that do
that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
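This ordering can be observed from R code. The sketch below uses a hypothetical replacement function that only records, in the flag called, whether it ran; the failing RHS aborts before it is ever called, while the chained RHS is evaluated first and then the replacement function runs.

```r
called <- FALSE

# Hypothetical replacement function that records whether it ran
`g<-` <- function(x, value) {
  called <<- TRUE
  x  # return the target unchanged
}

x <- 3

# 1 + "one" fails while the RHS is evaluated, before g<- is ever called
res <- try(g(x) <- 1 + "one", silent = TRUE)
called_after_failure <- called
print(inherits(res, "try-error"))  # TRUE
print(called_after_failure)        # FALSE
print(x)                           # 3

# Chained assignment: a <- b <- 4 + 5 is evaluated first, then g<- runs
g(x) <- a <- b <- 4 + 5
print(c(a, b))  # 9 9
print(called)   # TRUE
```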
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
'g<-' <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
'g<-'(x,10)
both the values of x and 10 are printed, but the value of x remains the same.
Speculation:
So the parser is distinguishing between whether you call g<- in the course of making an actual assignment, and when you simply call g<- directly.
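The difference being described can be reproduced with a replacement function that simply returns value. In this sketch, calling `g<-` directly is an ordinary function call that rebinds nothing, while the assignment form rebinds x to the function's result; the rebinding appears to come from how <- evaluates a call on its left-hand side.

```r
`g<-` <- function(x, value) value  # simplest replacement function: result is the new value

x <- 1
direct <- `g<-`(x, 10)  # ordinary call: returns 10, but binds nothing
print(x)                # 1

g(x) <- 5               # assignment form: x is rebound to the result
print(x)                # 5
```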

Assignment in R language

I am wondering how assignment works in the R language.
Consider the following R shell session:
> x <- c(5, 6, 7)
> x[1] <- 10
> x
[1] 10 6 7
>
which I totally understand. The vector (5, 6, 7) is created and bound to the symbol 'x'. Later, 'x' is rebound to the new vector (10, 6, 7) because vectors are immutable data structures.
But what happens here:
> c(4, 5, 6)[1] <- 10
Error in c(4, 5, 6)[1] <- 10 :
target of assignment expands to non-language object
>
or here:
> f <- function() c(4, 5, 6)
> f()[1] <- 10
Error in f()[1] <- 10 : invalid (NULL) left side of assignment
>
It seems to me that one can only assign values to named data structures (like 'x').
The reason I am asking is that I am trying to implement the R language core and I am unsure how to deal with such assignments.
Thanks in advance
It seems to me that one can only assign values to named data structures (like 'x').
That's precisely what the documentation for ?"<-" says:
Description:
Assign a value to a name.
x[1] <- 10 doesn't use the same function as x <- c(5, 6, 7). The former calls [<- while the latter calls <-.
As per @Owen's answer to this question, x[1] <- 10 is really doing two things. It is calling the [<- function, and it is assigning the result of that call to x.
So, to get the result you wanted from c(4, 5, 6)[1] <- 10, call the replacement function directly:
> `[<-`(c(4, 5, 6),1, 10)
[1] 10 5 6
You can make modifications to anonymous functions, but there is no assignment to anonymous vectors. Even R creates temporary copies with names, and you will sometimes see error messages that reflect that fact. You can read about this in the R language definition on page 21, where it deals with the evaluation of expressions for "subset assignment" and for other forms of assignment:
x[3:5] <- 13:15
# The result of this command is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
And there is a warning not to use *tmp* as an object name, because it would be overwritten during the next call to [<-.
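The expansion from the language definition can be run as-is and compared with the ordinary subset assignment. This sketch is also a useful model for an implementation: the target must be a name, because the final step rebinds that name to the result of `[<-`.

```r
x <- 1:6
y <- x

# The ordinary subset assignment
y[3:5] <- 13:15

# The manual expansion described in the language definition
`*tmp*` <- x
x <- `[<-`(`*tmp*`, 3:5, value = 13:15)
rm(`*tmp*`)

print(x)               # 1 2 13 14 15 6
print(identical(x, y)) # TRUE
```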
