How does the `[<-` function work in R? - r

I've seen a couple of people using [<- as a function with Polish notation, for example
x <- matrix(1:4, nrow = 2)
`[<-`(x, 1, 2, 7)
which returns
[,1] [,2]
[1,] 1 7
[2,] 2 4
I've tried playing around with [<- a little, and it looks like using it this way prints the result of something like x[1,2] <- 7 without actually performing the assignment. But I can't figure out for sure what this function actually does, because the documentation given for ?"[" only mentions it in passing, and I can't search google or SO for "[<-".
And yes, I know that actually using it is probably a horrible idea, I'm just curious for the sake of a better understanding of R.

This is what you would need to do to get the assignment to stick:
`<-`( `[`( x, 1, 2), 7) # or x <- `[<-`( x, 1, 2, 7)
x
[,1] [,2]
[1,] 1 7
[2,] 2 4
Essentially what is happening is that [ is creating a pointer into row-col location of x and then <- (which is really a synonym for assign that can also be used in an infix notation) is doing the actual "permanent" assignment. Do not be misled into thinking this is a call-by-reference assignment. I'm reasonably sure there will still be a temporary value of x created.
Your version did make a subassignment (as can be seen by what it returned) but that assignment was only in the local environment of the call to [<- which did not encompass the global environment.

Since `[`(x, y) slices an object, and `<-`(x, z) performs assignment, it seems like `[<-`(x,y,z) would perform the assignment x[y] <- y. #42-'s answer is a great explanation of what [<- actually does, and the top answer to `levels<-`( What sorcery is this? provides some insight into why R works this way.
To see what [<- actually does under the hood, you have to go to the C source code, which for [<- can be found at http://svn.r-project.org/R/trunk/src/main/subassign.c (the relevant parts start at around line 1470). You can see that x, the object being "assigned to" is protected so that only the local version is mutated. Instead, we're using VectorAssign, MatrixAssign, ArrayAssign, etc. to perform assignment locally and then returning the result.

Related

Any logical test to distinguish between make-up of numerical objects

I was wondering if there a way for R to detect the existence or absence of the sign * as used in the following objects?
In other words, can R understand that a has a * sign but b doesn't?
a = 3*4
b = 12
If you keep the expressions unevaluated, R can understand their internal complexity. Under normal circumstances, though, R evaluates expressions immediately, so there is no way to tell the difference between a <- 3*4 and b <- 12 once the assignments have been made. That means that the answer to your specific question is No.
Dealing with unevaluated expressions can get a bit complex, but quote() is one simple way to keep e.g. 3*4 from being evaluated:
> length(quote(3*4))
[1] 3
> length(quote(12))
[1] 1
If you're working inside a function, you can use substitute to retrieve the unevaluated form of the function arguments:
> f <- function(a) {
+ length(substitute(a))
+ }
> f(12)
[1] 1
> f(3*4)
[1] 3
In case you're pursuing this farther, you should be aware that counting complexity might not be as easy as you think:
> f(sqrt(2*3+(7*19)^2))
[1] 2
What's going on is that R stores expressions as a tree; the top level here is made up of sqrt and <the rest of the expression>, which has length 2. If you want to measure complexity you'll need to do some kind of collapsing or counting down the branches of the tree ...
Furthermore, if you first assign a <- 3*4 and then call f(a) you get 1, not 3, because substitute() gives you back just the symbol a, which has length 1 ... the information about the difference between "12" and "3*4" gets lost as soon as the expression is evaluated, which happens when the value is assigned to the symbol a. The bottom line is that you have to be very careful in controlling when expressions get evaluated, and it's not easy.
Hadley Wickham's chapter on expressions might be a good place to read more.

R - using the assign() function for subsetting an array

I have tried to get the gist of my problem in the following reproducible example :
mat <- matrix(3:6,nr=2,nc=2)
j=1
> eval(parse(text=paste0("m",c("a","b")[j],"t","[1,1]")))
[1] 3
> assign(paste0("m",c("a","b")[j],"t","[1,1]"),45)
> mat
[,1] [,2]
[1,] 3 5
[2,] 4 6
My problem is that mat[1,1] is still equal to 3 and not 45 as I would have expected.
A primer on R: "Every operation is a function call." What this means, in a practical sense for your question, is that you can't use assign() with more than just a name. mat[1,1] is not a name - it is the name mat and the function call [. So, using the expression mat[1,1] within assign will not work, because it is trying to find an R object named mat[1,1] (which I think is disastrous for a few reasons...)
This seems like a really weird use case. You might want to consider instead working in a function, which has its own environment that you can manipulate without working in the global environment.
Alternatively, you can do this:
eval(parse(text=paste0("m",c("a","b")[j],"t","[1,1] <- 45")))
eval(parse(text=paste0("m",c("a","b")[j],"t","[1,1]")))
I am struggling to think of a reason you would want to - but it is, in theory, possible. Basically, you just add the assignment to the text that you are parsing, then pass it to eval().

what is the purpose of 'NULL' in processing loops?

sqr = seq(1, 100, by=2)
sqr.squared = NULL
for (n in 1:50)
{
sqr.squared[n] = sqr[n]^2
}
I came accross the loop above, for a beginner this was simple enough. To further understand r what was the precise purpose of the second line? For my research I gather it has something to do with resetting the vector. If someone could elaborate it'd be much appreciated.
sqr.squared <- NULL
is one of many ways initialize the empty vector sqr.squared prior to running it through a loop. In general, when the length of the resulting vector is known, it is much better practice to allocate the vector's length. So here,
sqr.squared <- vector("integer", 50)
would be much better practice. And faster too. This way you are not building the new vector in the loop. But since ^ is vectorized, you could also simply do
sqr[1:50] ^ 2
and ditch the loop all together.
Another way to think about it is to remember that everything in r is a function call, and functions need input (usually).
say you calculated y and want to store that value somewhere. You can do x <- y without initializing an x object (r does this for you unlike in other languages, c for example), but say you want to store it in a specific place in x.
So note that <- (or = in your example) is a function
y <- 1
x[2] <- y
# Error in x[2] <- y : object 'x' not found
This is a different function than <-. Since you want to put y at x[2], you need the function [<-
`[<-`(x, 2, y)
# Error: object 'x' not found
But this still doesn't work because we need the object x to use this function, so initialize x to something.
(x <- numeric(5))
# [1] 0 0 0 0 0
# and now use the function
`[<-`(x, 2, y)
# [1] 0 1 0 0 0
This prefix notation is easier for computers to parse (eg, + 1 1) but harder for humans (me at least), so we prefer infix notation (eg, 1 + 1). R makes such functions easier to use x[2] <- y rather than how I did above.
The first answer is correct, when you assign a NULL value to a variable, the purpose is to initialize a vector. In many cases, when you are working checking numbers or with different types of variables, you will need to set NULL this arrays, matrix, etc.
For example, in you want to create a some type of element, in some cases you will need to put something inside them. This is the purpose of to use NULL. In addition, sometimes you will require NA instead of NULL.

Why are arguments to replacement functions not evaluated lazily?

Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
'g<-' <- function(x, value){print(x);print(substitute(value))}
(the single quotes should be fancy quotes)
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. Name a should be printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is ok, as x's former value is 4. Notice the expression passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well the question is: bug or feature? What is going on here? I hope some guru users will shed some light about promises and lazy evaluation on R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value)
'g<-' <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like
f(x, y[] <- 1) <- 3, can cause the value temporary
variable for the outer assignment to be overwritten and
then removed by the inner one. This could be addressed by
using multiple temporaries or using a promise for this
variable as is done for the RHS. Printing of the
replacement function call in error messages might then need
to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value
of z will be computed twice, once for a call to g(x, z)
and once for the call to the replacement function g<-. It
might be possible to address this by using promises.
Using more temporaries would not work as it would mess up
replacement functions that use substitute and/or
nonstandard evaluation (and there are packages that do
that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
'g<-' <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
'g<-'(x,10)
both the values of x and 10 are printed, but the value of x remains the same.
Speculation:
So the parser is distinguishing between whether you call g<- in the course of making an actual assignment, and when you simply call g<- directly.

What is the practical use of the identity function in R?

Base R defines an identity function, a trivial identity function returning its argument (quoting from ?identity).
It is defined as :
identity <- function (x){x}
Why would such a trivial function ever be useful? Why would it be included in base R?
Don't know about R, but in a functional language one often passes functions as arguments to other functions. In such cases, the constant function (which returns the same value for any argument) and the identity function play a similar role as 0 and 1 in multiplication, so to speak.
I use it from time to time with the apply function of commands.
For instance, you could write t() as:
dat <- data.frame(x=runif(10),y=runif(10))
apply(dat,1,identity)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
x 0.1048485 0.7213284 0.9033974 0.4699182 0.4416660 0.1052732 0.06000952
y 0.7225307 0.2683224 0.7292261 0.5131646 0.4514837 0.3788556 0.46668331
[,8] [,9] [,10]
x 0.2457748 0.3833299 0.86113771
y 0.9643703 0.3890342 0.01700427
One use that appears on a simple code base search is as a convenience for the most basic type of error handling function in tryCatch.
tryCatch(...,error = identity)
which is identical (ha!) to
tryCatch(...,error = function(e) e)
So this handler would catch an error message and then simply return it.
For whatever it's worth, it is located in funprog.R (the functional programming stuff) in the source of the base package, and it was added as a "convenience function" in 2008: I can imagine (but can't give an immediate example!) that there would be some contexts in the functional programming approach (i.e. using Filter, Reduce, Map etc.) where it would be convenient to have an identity function ...
r45063 | hornik | 2008-04-03 12:40:59 -0400 (Thu, 03 Apr 2008) | 2 lines
Add higher-order functions Find() and Position(), and convenience
function identity().
Stepping away from functional programming, identity is also used in another context in R, namely statistics. Here, it is used to refer to the identity link function in generalized linear models. For more details about this, see ?family or ?glm. Here is an example:
> x <- rnorm(100)
> y <- rpois(100, exp(1+x))
> glm(y ~x, family=quasi(link=identity))
Call: glm(formula = y ~ x, family = quasi(link = identity))
Coefficients:
(Intercept) x
4.835 5.842
Degrees of Freedom: 99 Total (i.e. Null); 98 Residual
Null Deviance: 6713
Residual Deviance: 2993 AIC: NA
However, in this case parsing it as a string instead of a function will achieve the same: glm(y ~x, family=quasi(link="identity"))
EDIT: As noted in the comments below, the function base::identity is not what is used by the link constructor, and it is just used for parsing the link name. (Rather than deleting this answer, I'll leave it to help clarify the difference between the two.)
Here is usage example:
Map<Integer, Long> m = Stream.of(1, 1, 2, 2, 3, 3)
.collect(Collectors.groupingBy(Function.identity(),
Collectors.counting()));
System.out.println(m);
output:
{1=2, 2=2, 3=2}
here we are grouping ints into a int/count map. Collectors.groupingBy accepts a Function. In our case we need a function which returns the argument. Note that we could use e->e lambda instead
I just used it like this:
fit_model <- function(lots, of, parameters, error_silently = TRUE) {
purrr::compose(ifelse(test = error_silently, yes = tryNA, no = identity),
fit_model_)(lots, of, parameters)
}
tryNA <- function(expr) {
suppressWarnings(tryCatch(expr = expr,
error = function(e) NA,
finally = NA))
}
As this question has already been viewed 8k times it maybe worth updating even 9 years after it has been written.
In a blog post called "Simple tricks for Debugging Pipes (within magrittr, base R or ggplot2)" the author points out how identity() can be very usefull at the end of different kinds of pipes. The blogpost with examples can be found here: https://rstats-tips.net/2021/06/06/simple-tricks-for-debugging-pipes-within-magrittr-base-r-or-ggplot2/
If pipe chains are written in a way, that each "pipe" symbol is at the end of a line, you can exclude any line from execution by commenting it out. Except for the last line. If you add identity() as the last line, there will never be a need to comment that out. So you can temporarily exclude any line that changes the data by commenting it out.

Resources