R object identity

Is there a way to test whether two objects are identical in the R language?
For clarity: I do not mean identical in the sense of the identical function,
which compares objects based on properties such as their numerical or logical values.
I am really interested in object identity, which in Python, for example, can be tested with the is operator.

UPDATE: A more robust and faster implementation of address(x) (not using .Internal(inspect(x))) was added to data.table v1.8.9. From NEWS:
New function address() returns the address in RAM of its argument. Sometimes useful in determining whether a value has been copied or not by R, programmatically.
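Assuming data.table (v1.8.9 or later) is installed, usage is a one-liner (a sketch; the printed address is session-dependent):
library(data.table)
x = 1
address(x)
# e.g. [1] "0x55d2c0a3b8c8"  (value differs per session)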
There's probably a neater way but this seems to work.
# grab the hex address from the first line of inspect() output (64-bit: 16 hex digits after "@")
address = function(x) substring(capture.output(.Internal(inspect(x)))[1], 2, 17)
x = 1
y = 1
z = x
identical(x,y)
# [1] TRUE
identical(x,z)
# [1] TRUE
address(x)==address(y)
# [1] FALSE
address(x)==address(z)
# [1] TRUE
You could modify it to work on 32-bit builds by changing 17 to 9 (a 32-bit address is only 8 hex digits).
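A sketch of using the address() helper above to watch copy-on-modify in action (the addresses themselves will differ from session to session):
x = 1
y = x                     # y binds to the same object; nothing is copied yet
address(x) == address(y)
# [1] TRUE
y = y + 1                 # modifying y forces a copy
address(x) == address(y)
# [1] FALSE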

You can use the pryr package.
For example, return the memory location of the mtcars object:
pryr::address(mtcars)
Then, for variables a and b, you can check:
address(a) == address(b)

Related

What are the dangers of using R attributes?

Adding user-defined attributes to R objects makes it easy to carry around some additional information glued together with the object of interest. The problem is that it slightly changes how R sees the objects: e.g. a numeric vector with an additional attribute is still numeric but is no longer a vector:
x <- rnorm(100)
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] TRUE
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
attr(x, "foo") <- "this is my attribute"
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] FALSE # <-- here!
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
Can this lead to any potential problems? What I'm thinking about is adding some attributes to common R objects and then passing them to other methods. What is the risk of something breaking merely because I added additional attributes to standard R objects (e.g. vector, matrix, data.frame, etc.)?
Notice that I'm not asking about creating my own classes. For the sake of simplicity we can also assume that there won't be any conflicts in the names of the attributes (e.g. no clash with the dim attribute). Let's also assume that it is not a problem if some method at some point drops my attribute; that is an acceptable risk.
In my (somewhat limited) experience, adding new attributes to an object hasn't ever broken anything. The only likely scenario I can think of where it would break something would be if a function required that an object have a specific set of attributes and nothing else. I can't think of a time when I've encountered that, though. Most functions, especially S3 methods, will just ignore attributes they don't need.
You're more likely to see problems arise if you remove attributes.
The reason you won't see a lot of problems stemming from additional attributes is that methods are dispatched on the class of an object. As long as the class doesn't change, methods will be dispatched in much the same way. However, this doesn't mean that existing methods will know what to do with your new attributes. Take the following example: after adding a new_attr attribute to both x and y, and then adding them, the result adopts the attribute of x. What happened to the attribute of y? The default + function doesn't know what to do with conflicting attributes of the same name, so it just takes the first one (more details in the R Language Definition; thanks Brodie).
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "new_attr") <- "ki yay"
x + y
[1] 11 11 11 11 11 11 11 11 11 11
attr(,"new_attr")
[1] "yippy"
In a different example, if we give x and y attributes with different names, x + y produces an object that preserves both attributes.
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "another_attr") <- "ki yay"
x + y
[1] 11 11 11 11 11 11 11 11 11 11
attr(,"another_attr")
[1] "ki yay"
attr(,"new_attr")
[1] "yippy"
On the other hand, mean(x) doesn't even try to preserve the attributes. I don't know of a good way to predict which functions will and won't preserve attributes. There's probably some reliable mnemonic you could use in base R (aggregation vs. vectorized, perhaps?), but I think there's a separate principle that ought to be considered.
If preservation of your new attributes is important, you should define a new class that preserves the inheritance of the old class
With a new class, you can write methods that extend the generics and handle the attributes in whichever way you want. Whether or not you should define a new class and write its methods is very much dependent on how valuable any new attributes you add are to the future work you will be doing.
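As a minimal sketch of that approach (the class name tagged, the helper as_tagged, and the mean method are all made up for illustration):
as_tagged <- function(x, value) {
  attr(x, "new_attr") <- value
  class(x) <- c("tagged", class(x))
  x
}
# a mean() method that reattaches the attribute the default method drops
mean.tagged <- function(x, ...) {
  out <- NextMethod()
  attr(out, "new_attr") <- attr(x, "new_attr")
  out
}
mean(as_tagged(1:10, "yippy"))
[1] 5.5
attr(,"new_attr")
[1] "yippy"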
So in general, adding new attributes is very unlikely to break anything in R. But without adding a new class and methods to handle the new attributes, I would be very cautious about interpreting the meaning of those attributes after they've been passed through other functions.

How does R reference unassigned values?

I'm familiar with tracemem() showing the hex memory address of an assigned variable, e.g.
x <- 2
tracemem(x)
#> [1] "<0x876df68>"
but what does this involve (under the hood) when the value is literally just an unassigned value? e.g.
tracemem(4)
#> [1] "<0x9bd93b8>"
The same question applies to just evaluating an expression without assignment
4
#> [1] 4
It seems that if I evaluate this several times in the console, I get ever-increasing hex addresses
tracemem(4)
#> [1] "<0x8779968>"
tracemem(4)
#> [1] "<0x87799c8>"
tracemem(4)
#> [1] "<0x8779a28>"
but if I either explicitly loop this operation
for ( i in 1:3 ) { print(tracemem(4)) }
#> [1] "<0x28bda48>"
#> [1] "<0x28bda48>"
#> [1] "<0x28bda48>"
or with sapply via replicate
replicate(3, tracemem(4))
#> [1] "<0xba88208>" "<0xba88208>" "<0xba88208>"
I get repeats of the same address, even if I explicitly delay the printing between iterations
for ( i in 1:3 ) { print(tracemem(4)); Sys.sleep(1) }
#> [1] "<0xa3c4058>"
#> [1] "<0xa3c4058>"
#> [1] "<0xa3c4058>"
My best guess is that the call refers to a value already temporarily assigned in the parent.frame, given the eval.parent(substitute(...)) in replicate, but I don't know enough about the underlying .Primitive code of for to know whether it's doing the same there.
I have some confidence that R is creating temporary variables given that I can do
list(x = 1)
#> $x
#> [1] 1
so R must be processing the data even though it never assigns it to anything. I'm aware of the strict formality summarised in a tweet by @hadleywickham,
but I'm not sure how it works here. Is it just that the temporary name isn't preserved? Does the for loop always use that name/object? Does evaluating lots of code, regardless of whether or not it's assigned, still use up memory (up until gc() is called, whenever that is)?
tl;dr: how does R "store" unassigned values for printing?
Ok, so I will do what I can here.
First off, tracemem is a primitive. This means it is not a closure, unlike the vast majority of R-level functions you can call from R code. More specifically, it is a BUILTINSXP primitive:
> .Internal(inspect(tracemem))
#62f548 08 BUILTINSXP g0c0 [MARK,NAM(1)]
This means that when it is called, a closure is NOT applied (because it is a primitive) and its argument IS evaluated, because it is a BUILTINSXP (see the R Internals manual's discussion of BUILTINSXP vs SPECIALSXP).
Closure application is when R objects passed as arguments in function calls are assigned to the appropriate variables within the call frame. This doesn't happen for tracemem. Instead, its arguments are evaluated at the C level into a SEXP that is never bound to any symbol in any environment, but is instead passed directly to the C-level do_tracemem function (see the BUILTINSXP branch of the C-level eval function).
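To see the contrast concretely, here is a small sketch: an ordinary closure creates a call frame and binds its argument to a name there, which is exactly the step that is skipped for a primitive like tracemem:
> f <- function(x) environment()  # a closure: calling it creates a frame
> ls(f(4))                        # the argument 4 was bound to "x" in that frame
[1] "x"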
This means that when a numeric constant is passed to tracemem (a valid call though something one would generally not have any reason to do), you get the actual SEXP for the constant, not one representing an R level variable with the value 4, being passed down to do_tracemem.
As far as I can tell, within any evaluation frame (I may not be using this term precisely, but call frames and steps within a for loop qualify, as do individual top-level expressions), every evaluation of, e.g., 4L gets an entirely new SEXP (an INTSXP, specifically) with NAMED set immediately to 4. Between these frames, it appears that they can be shared, though I pretty strongly suspect that may be an artifact of memory reuse rather than actually shared SEXPs.
The output below appears to corroborate the memory-reuse theory but I don't have the cycles free to confirm it beyond that at the moment.
> for(i in 1:3) {print(tracemem(4L)); print(tracemem(4L))}
[1] "<0x1c3f3b8>"
[1] "<0x1c3f328>"
[1] "<0x1c3f3b8>"
[1] "<0x1c3f328>"
[1] "<0x1c3f3b8>"
[1] "<0x1c3f328>"
Hope that helps.

What's the real meaning of 'Everything that exists is an object' in R?

I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable is of a pure base type, the result of is.object() is FALSE, so it should not be an object.
So what's the real meaning of 'Everything that exists is an object' in R?
The function is.object seems only to check whether the object has a "class" attribute, so it does not have the same meaning as in the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
NULL
is.object(x)
[1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
$class
[1] "my_class"
is.object(x)
[1] TRUE
Now, trying to answer your real question, about the slogan, this is how I would put it. Everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is better understood with functions and expressions, which are not usually thought of as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the
function object itself and the objects that are supplied as the
arguments. In the functional programming model, the result is defined
by another object, the value of the call. Hence the traditional motto
of the S language: everything is an object—the arguments, the value,
and in fact the function and the call itself: All of these are defined
as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take this expression for example: mean(rnorm(100), trim = 0.9). Until it is evaluated, it is an object very much like any other, so you can change its elements just like you would do with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100, 2))
call
mean(rt(100, 2), trim = 0.9)
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
You can change its default arguments too, just as you would with a simple object like a list:
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves
objects; in the traditional motto of the S language, everything is an
object. Evaluation consists of taking the object representing an
expression and returning the object that is the value of that
expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call.
Therefore, basic programming centers on creating and refining
functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
So taking the parenthesis as an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
Which is not a good idea but illustrates the point. So I guess this is how I would sum it up: Everything that exists in R is an object because they are data which can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of this object which gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially
composed of function calls, with objects as arguments and an object as the
value. Understanding the central role of objects and functions in R makes
use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects. For x to be an object means that it has a class; thus class(x) returns a class for every object. Even functions have a class, as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here: it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value, and if not then class(x) will compute it from the type. From a conceptual perspective it does not matter how the class is handled internally (stored or computed); what matters is that in both cases x is still an object and still has a class.
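A small illustration of a computed (implicit) class (the matrix output assumes R >= 4.0):
class(1:5)            # no class attribute is stored; it is computed from the type
## [1] "integer"
class(matrix(1:4, 2)) # also computed, from the dim attribute
## [1] "matrix" "array"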
Functions. That all computation occurs through functions refers to the fact that even things you might not expect to be functions are actually functions. For example, when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3

Why does R's attributes() function fail when using explicit arguments?

I am working with RODBC and parallel to make multiple queries against a data system for some internal reporting. To facilitate making new connections, I am going to extract the connection string from the RODBC object. To do this, I planned to use attributes(). However, I've encountered a behavior that I do not understand. A minimal working example is below:
> example.data <- data.frame(letters = sample(x = LETTERS,size = 20,replace = T),
+ numbers = sample(x = 0:9,size = 20,replace = T))
>
> attributes(obj = example.data)
Error in attributes(obj = example.data) :
supplied argument name 'obj' does not match 'x'
> attributes(example.data)
$names
[1] "letters" "numbers"
$row.names
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$class
[1] "data.frame"
It should be noted that the obj = behavior is the one tab-suggested by RStudio. However, it causes an error. I tried to review the source code for attributes, but it is a primitive, so I would have to go digging into the C source - with which I am not nearly as familiar.
Why does attributes() fail when an explicit argument (obj =) is used, but runs fine when it is not used? (And should the behavior of RStudio with regard to suggesting obj = be changed?)
This seems like a bug in the documentation for attributes; the parameter should probably be documented as x. You can call it that way:
attributes(x = example.data)
The problem is that attributes() is a primitive function, and primitive functions behave differently from regular functions in R. They don't have formal parameters (formals(attributes) returns NULL). For these types of functions, R typically isn't going to match arguments by name and will assume they are in a certain positional order, for efficiency reasons. That's why it's better not to name them: you cannot change the order of these parameters anyway. There should be no need to name the parameter here.
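A quick sketch of the difference (f is just a throwaway closure for contrast):
f <- function(x) x   # closures match supplied names against their formals
f(x = 1)
# [1] 1
formals(attributes)  # primitives have no formals to match a name against
# NULL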
There are other functions out there that have mismatches between the parameter name in the documentation and the name the code checks. For example:
isS4(pi)
# [1] FALSE
# documented parameter name is "object"
isS4(object=pi)
# Error in isS4(object = pi) :
# supplied argument name 'object' does not match 'x'
isS4(x=pi)
# [1] FALSE
But there are also other primitives out there that use names other than x: e.g. seq_along (uses "along.with=") and quote (uses "expr=").

Sort a list of nontrivial elements in R

In R, I have a list of nontrivial objects (they aren't simple objects like scalars that R can be expected to be able to define an order for). I want to sort the list. Most languages allow the programmer to provide a function or similar that compares a pair of list elements that is passed to a sort function. How can I sort my list?
To make this as simple as I can, say your objects are lists with two elements, a name and a value. The value is a numeric; that's what we want to sort by. You can imagine having more elements and needing to do something more complex to sort.
The sort help page tells us that sort uses xtfrm; the xtfrm help page in turn tells us it will use the == and > methods for the class of x[i].
First I'll define an object that I want to sort:
xx <- lapply(c(3,5,7,2,4), function(i) list(name=LETTERS[i], value=i))
class(xx) <- "myobj"
Now, since xtfrm works on the x[i]'s, I need to define a [ function that returns the desired elements but still with the right class
`[.myobj` <- function(x, i) {
  # drop the class so default list subsetting applies, then restore it
  class(x) <- "list"
  structure(x[i], class = "myobj")
}
Now we need == and > functions for the myobj class; this could potentially be smarter by vectorizing them properly, but for sorting we know that we're only going to be passing in myobj's of length 1, so I'll just use the first element to define the relations.
`>.myobj` <- function(e1, e2) {
  e1[[1]]$value > e2[[1]]$value
}
`==.myobj` <- function(e1, e2) {
  e1[[1]]$value == e2[[1]]$value
}
Now sort just works.
sort(xx)
It might be considered more proper to write a full Ops function for your object; however, to just sort, this seems to be all you need. See pp. 89-90 in Venables/Ripley for more details about doing this in the S3 style. Also, if you can easily write an xtfrm function for your objects, that would be simpler and most likely faster.
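For completeness, a sketch of that xtfrm route for the same myobj class; it returns a numeric vector that sorts the same way as the objects, and sort() still uses the `[.myobj` method above to do the actual subsetting:
xtfrm.myobj <- function(x) vapply(unclass(x), function(el) el$value, numeric(1))
sort(xx)  # order() now ranks via xtfrm.myobj; no == or > methods needed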
The order function will allow you to determine the sort order for character or numeric arguments and break ties with subsequent arguments. You need to be more specific about what you want. Produce an example of a "non-trivial object" and specify the order you desire in some R object. Lists are probably the most non-vectorial objects:
> slist <- list(cc=list(rr=1), bb=list(ee=2, yy=7), zz="ww")
> slist[order(names(slist))] # alpha order on names()
$bb
$bb$ee
[1] 2
$bb$yy
[1] 7
$cc
$cc$rr
[1] 1
$zz
[1] "ww"
> slist[c("zz", "bb", "cc")] # an arbitrary ordering
$zz
[1] "ww"
$bb
$bb$ee
[1] 2
$bb$yy
[1] 7
$cc
$cc$rr
[1] 1
One option is to create an xtfrm method for your objects. Functions like order accept multiple vectors (breaking ties with the later ones), which works in some cases. There are also some specialized functions for specific cases, like mixedsort in the gtools package.
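If you'd rather not define methods at all, here is a sketch of extracting a numeric sort key and using order() directly on a plain list (reusing the xx construction from the accepted answer, minus the class):
xx <- lapply(c(3,5,7,2,4), function(i) list(name = LETTERS[i], value = i))
vals <- vapply(xx, function(el) el$value, numeric(1))
xx[order(vals)]  # elements reordered by their $value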
