Assignment in R language

I am wondering how assignment works in the R language.
Consider the following R shell session:
> x <- c(5, 6, 7)
> x[1] <- 10
> x
[1] 10 6 7
>
which I totally understand. The vector (5, 6, 7) is created and bound to
the symbol 'x'. Later, 'x' is rebound to the new vector (10, 6, 7) because vectors
are immutable data structures.
But what happens here:
> c(4, 5, 6)[1] <- 10
Error in c(4, 5, 6)[1] <- 10 :
target of assignment expands to non-language object
>
or here:
> f <- function() c(4, 5, 6)
> f()[1] <- 10
Error in f()[1] <- 10 : invalid (NULL) left side of assignment
>
It seems to me that one can only assign values to named data structures (like 'x').
The reason I am asking is that I am trying to implement the core of the R language and I am unsure
how to deal with such assignments.
Thanks in advance

It seems to me that one can only assign values to named data structures (like 'x').
That's precisely what the documentation for ?"<-" says:
Description:
Assign a value to a name.
x[1] <- 10 doesn't use the same function as x <- c(5, 6, 7). The former calls [<- while the latter calls <-.

As per @Owen's answer to this question, x[1] <- 10 is really doing two things. It is calling the [<- function, and it is assigning the result of that call to x.
So to get the result you were after with c(4, 5, 6)[1] <- 10, call the replacement function directly:
> `[<-`(c(4, 5, 6),1, 10)
[1] 10 5 6

You can make modifications to anonymous functions, but there is no assignment to anonymous vectors. Even R creates temporary copies with names and you will sometimes see error messages that reflect that fact. You can read this in the R language definition on page 21 where it deals with the evaluation of expressions for "subset assignment" and for other forms of assignment:
x[3:5] <- 13:15
# The result of this command is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
There is also a warning not to use *tmp* as an object name, because it would be overwritten during the next call to [<-.
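A minimal sketch of that expansion, written out by hand so the two steps (call the replacement function, then rebind the name) are visible. The replacement function `first<-` and the variable names are hypothetical and only serve the illustration; the point is that any call of the form f(x) <- value needs a name on the left to rebind.

```r
# Ordinary subset assignment.
x <- c(5, 6, 7)
x[1] <- 10

# The same effect spelled out as in the language definition.
y <- c(5, 6, 7)
`*tmp*` <- y
y <- `[<-`(`*tmp*`, 1, value = 10)
rm(`*tmp*`)
same_result <- identical(x, y)

# A user-defined replacement function (hypothetical name) is treated the same way:
# first(z) <- 10 is evaluated as z <- `first<-`(z, value = 10).
`first<-` <- function(x, value) {
  x[1] <- value
  x
}
z <- c(4, 5, 6)
first(z) <- 10
```

Because the last step is always a plain assignment to a name, an implementation can reject any target whose innermost expression is not a symbol, which is what the two error messages in the question reflect.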

Related

How are apply family functions scoped?

Consider:
x <- 5
replicate(10, x <- x + 1)
This has output c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6). However:
x <- 5
replicate(10, x <<- x + 1)
has output c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15).
What does this imply about the environment that x <- x + 1 is evaluated in? Am I to believe that x is treated as if it is an internal variable for replicate? That appears to be what I'm seeing, but when I consulted the relevant section of the language definition, I saw the following:
It is also worth noting that the effect of foo(x <- y) if the argument is evaluated is to change the value of x in the calling environment and not in the evaluation environment of foo.
But if x really was changed in the calling environment, then why does:
x <- 5
replicate(10, x <- x + 1)
x
Return 5 and not 15? What part have I misunderstood?
The sentence you quoted from the language definition is about standard evaluation, but replicate uses non-standard evaluation. Here's its source:
replicate <- function (n, expr, simplify = "array")
sapply(integer(n), eval.parent(substitute(function(...) expr)),
simplify = simplify)
The substitute(function(...) expr) call takes your expression x <- x + 1 without evaluating it, and creates a new function
function(...) x <- x + 1
That's the function that gets passed to sapply(), which applies it to a vector of length n. So all the assignments take place in the frame of that anonymous function.
When you use x <<- x + 1, the evaluation still takes place in the constructed function, but its environment is the calling environment to replicate() (because of the eval.parent call), and that's where the assignment happens. That's why you get the increasing values in the output.
So I think you understood the manual correctly, but it didn't make clear it was talking there about the case of standard evaluation. The following paragraph hints at what's happening here:
It is possible to access the actual (not default) expressions used as arguments inside the function. The mechanism is implemented via promises. When a function is being evaluated the actual expression used as an argument is stored in the promise together with a pointer to the environment the function was called from. When (if) the argument is evaluated the stored expression is evaluated in the environment that the function was called from. Since only a pointer to the environment is used any changes made to that environment will be in effect during this evaluation. The resulting value is then also stored in a separate spot in the promise. Subsequent evaluations retrieve this stored value (a second evaluation is not carried out). Access to the unevaluated expression is also available using substitute.
but the help page for replicate() doesn't make clear this is what it's doing.
BTW, your title asks about apply family functions, but most of them other than replicate() ask explicitly for a function, so this issue doesn't arise there. For example, it's obvious that this doesn't affect the global x:
sapply(integer(10), function(i) x <- x + 1)
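The difference between the two assignment operators can be reproduced without replicate() by building the anonymous function by hand. This is only a sketch of the behaviour described above, run at top level; the names local_fn and super_fn are made up for the illustration.

```r
x <- 5

# Roughly what replicate(3, x <- x + 1) builds: the assignment creates a
# local x inside each call, so the outer x is never touched.
local_fn <- function(...) x <- x + 1
res_local <- sapply(integer(3), local_fn)
x_after_local <- x

# With <<- the assignment skips the function's own frame and updates the
# x found in the enclosing environment, so the values accumulate.
super_fn <- function(...) x <<- x + 1
res_super <- sapply(integer(3), super_fn)
x_after_super <- x
```

Here res_local holds three copies of 6 and x is still 5 afterwards, while res_super counts up from 6 and leaves x changed.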

Cannot modify subset of psp object

All,
I am trying to modify a subset of a psp object in the R package spatstat. Here is the code that is giving me an issue:
set.seed(10)
mat <- matrix(runif(40), ncol=4)
mx <- data.frame(v1=sample(1:4,10,TRUE),
v2=factor(sample(letters[1:4],10,TRUE),levels=letters[1:4]))
a <- as.psp(mat, window=owin(),marks=mx)
#subset to marking v1 = 2, modify one of its endpoints
a[a$marks$v1==2]$ends$x0<-rep(5,4)
this throws a warning at me:
Warning message:
In a[a$marks$v1 == 2]$ends$x0 <- rep(5, 4) :
number of items to replace is not a multiple of replacement length
What is the right way to modify some elements of a psp object? I commonly use this operation with dataframes and don't have an issue. My sense is that the subset operator ([) isn't set up for this operation with the psp class.
Thank you for reading; appreciate any help you may have.
The problem here is that you are trying to write to a subset of the psp object. Although the [ operator is defined for this class so you can extract a subset from it, the [<- operator is not defined, so you can't overwrite a subset.
However, the member that you are trying to overwrite is a data frame, which of course does have a [<- operator defined. So all you need to do is write to that without subsetting the actual psp object.
Here's a full reprex:
library(spatstat)
set.seed(10)
mat <- matrix(runif(40), ncol = 4)
mx <- data.frame(v1 = sample(1:4, 10, TRUE),
v2 = factor(sample(letters[1:4], 10, TRUE),
levels = letters[1:4]))
a <- as.psp(mat, window = owin(), marks = mx)
#subset to marking v1 = 2, modify one of its endpoints
a$ends$x0[a$marks$v1 == 2] <- rep(5, 4)
a
#> marked planar line segment pattern: 10 line segments
#> Mark variables: v1, v2
#> window: rectangle = [0, 1] x [0, 1] units
Created on 2020-08-18 by the reprex package (v0.3.0)
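The same pattern can be tried without spatstat on a toy object. The class name toy_segments and its layout below are assumptions made up for this sketch, not the real psp structure; the point is only that the subset goes on the innermost vector, where [<- is well defined, rather than on the outer object.

```r
# A toy S3 object (hypothetical class) holding endpoints and marks.
obj <- structure(
  list(
    ends  = data.frame(x0 = c(0.1, 0.2, 0.3, 0.4)),
    marks = data.frame(v1 = c(1, 2, 2, 3))
  ),
  class = "toy_segments"
)

# Select rows by mark, then assign into the plain numeric column.
sel <- obj$marks$v1 == 2
obj$ends$x0[sel] <- 5
```

The outer object keeps its class, and only the selected entries of x0 change.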
I will take that as a feature request to add a method for [<- for class psp.
Generally we advise against directly altering the components of objects in spatstat because this can destroy their internal consistency. So a method for [<- would be the best solution.

Count number of arguments passed to function

I'm interested in counting the number of arguments passed to a function. length can't be used for that purpose:
>> length(2,2,2,2,2)
Error in length(2, 2, 2, 2, 2) :
5 arguments passed to 'length' which requires 1
This is obvious as length takes 1 argument so:
length(c(2,2,2,2,2))
would produce the desired result - 5.
Solution
I want to call my function like this: myFunction(arg1, arg2, arg3). This can be done with an ellipsis:
myCount <- function(...) {length(list(...))}
myCount would produce the desired result:
>> myCount(2,2,2,2,2)
[1] 5
Problem
This is awfully inefficient. I'm calling this function on a substantial number of arguments, and creating a list just to count them is wasteful. What's a better way of returning the number of arguments passed to a function?
How about
myCount <- function(...) {length(match.call())-1}
This just inspects the passed call (and subtracts 1 for the function name itself).
nargs() returns the number of arguments supplied to the function it is called in:
myCount <- function(...) {
nargs()
}
> myCount(2,2,2,2,2)
[1] 5
Reference https://stat.ethz.ch/R-manual/R-devel/library/base/html/nargs.html
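For comparison, a short sketch of two base R ways to count arguments without building a list. It assumes R 3.5.0 or later for ...length(); the function names count_dots and count_all are made up for the example.

```r
# Counts only what was passed through the dots.
count_dots <- function(...) ...length()

# nargs() counts every supplied argument, named formals included.
count_all <- function(first, ...) nargs()

n_dots <- count_dots(2, 2, 2, 2, 2)
n_all  <- count_all(1, 2, 3)

# Neither approach evaluates the arguments, so this does not raise an error.
n_lazy <- count_dots(stop("never evaluated"))
```

The distinction matters when the function has formals besides the dots: nargs() includes them, ...length() does not.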
Here is a somewhat elegant way using length() with the purrr::lift_*() family of functions.
The problem is that you are passing multiple arguments to length(), which does not work because length() takes a single vector or list as input.
So what we need is to convert the input from a vector/list to ... (dots). purrr::lift_*() family provides a series of functions that do so.
One option can be converting from vector to dots:
> lift_vd(length)(2, 2, 2, 2, 2)
[1] 5
Another option can be converting from list to dots:
> lift_ld(length)(2, 2, 2, 2, 2)
[1] 5
Both options are working perfectly well, and what you need is using one of the purrr::lift_*() functions on length() before passing spliced arguments to it.

Is there a way to mimic R's higher order (binary) function shorthand syntax within spark or pyspark?

In R, I can write the following:
## Explicit
Reduce(function(x,y) x*y, c(1, 2, 3))
# returns 6
However, I can also do this less explicitly with the following:
## Less explicit
Reduce(`*`, c(1, 2, 3))
# also returns 6
In pyspark, I could do the following:
rdd = sc.parallelize([1, 2, 3])
rdd.reduce(lambda a, b: a * b)
Question: Can you mimic the "shorthand" (less explicit) syntax of R's Reduce('*', ...) with pyspark or some sort of anonymous function?
In R, you're supplying a binary function. The multiply operator (as with all operators) is actually a binary function. Type
`*`(2, 3)
to see what I mean.
In Python, the equivalent for multiplication is operator.mul.
So:
import operator
rdd = sc.parallelize([1, 2, 3])
rdd.reduce(operator.mul)

Indexing a list with an empty index

The technique of indexing a data frame with an empty index features several times in Hadley Wickham's Advanced R, but is only explained there in passing. I'm trying to figure out the rules governing indexing a list with an empty index. Consider the following four statements.
> (l <- list(a = 1, b = 2))
$a
[1] 1
$b
[1] 2
> (l[] <- list(c = 3))
$c
[1] 3
> l
$a
[1] 3
$b
[1] 3
> l[]
$a
[1] 3
$b
[1] 3
Questions:
1. Why is the output of the second statement different from the output of the third statement? Isn't assignment supposed to return the object being assigned to, in which case the second statement should yield the same output as the third one?
2. How did the assignment in the second statement result in the output shown after the third statement? What are the rules governing assignment to an emptily indexed list?
3. Why did the fourth statement yield the output shown? What are the rules governing indexing a list with an empty index when it is not on the left hand side of an assignment?
In short l[] will return the whole list.
(l <- list(a = 1, b = 2))
l[]
l[] <- list(c=3) is essentially reassigning what was assigned to each index to now be the result of list(c=3). For this example, it is the same as saying l[[1]] <- 3 and l[[2]] <- 3. From the ?'[' page, which mentions empty indexing a few times:
When an index expression appears on the left side of an assignment (known as subassignment) then that part of x is set to the value of the right hand side of the assignment.
and also
An empty index selects all values: this is most often used to replace all the entries but keep the attributes.
So, I roughly take this to mean that every element of l is replaced by the contents of list(c=3), recycled as needed, while l keeps its own names.
When you enter (l[] <- list(c = 3)), what is being returned is the replacement value, list(c = 3). When you then enter l or l[], you will see that the value at each index has been replaced by 3.
In addition to the previous answer, check this out. Note that the behaviour is totally the same with ordinary vectors and lists, so it cannot be labeled as "list-specific".
v <- 1:3
names(v) <- c("one", "two", "three")
r <- 4:5
names(r) <- c("four", "five")
(v[] <- r)
four five
4 5
Warning message:
In v[] <- r :
number of items to replace is not a multiple of replacement length
v
one two three
4 5 4
Assignment via subsetting keeps the initial attributes (here, names), so the names from the right side of the assignment are lost. Just as importantly, assignment via subsetting follows the recycling rules. In your example, all values are reassigned to 3; in my example there is partial recycling, with a warning because the lengths are incompatible.
To sum up:
1. Assignment with <- returns the evaluated right hand side, before recycling rules are applied.
2. Every element ends up replaced because of recycling, since the lengths of the two objects differ.
3. Without an assignment operator, l or v is essentially the same as l[] or v[].
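The "keep the attributes" point is the reason the empty index shows up with data frames. A small sketch, with variable names invented for the illustration, contrasting assignment through an empty index with plain reassignment:

```r
# Empty-index assignment replaces the columns but keeps the data frame itself.
df <- data.frame(a = 1:2, b = 3:4)
df[] <- lapply(df, function(col) col * 2)
kept_class <- class(df)

# Plain assignment rebinds the name to whatever lapply() returned: a bare list.
as_list <- data.frame(a = 1:2, b = 3:4)
as_list <- lapply(as_list, function(col) col * 2)
lost_class <- class(as_list)

# The list from the question: names a and b survive, the name c is dropped.
l <- list(a = 1, b = 2)
l[] <- list(c = 3)
```

After this, df is still a data frame with doubled columns, as_list is an ordinary list, and l has names a and b with both elements equal to 3.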
