What are the dangers of using R attributes?

Adding user-defined attributes to R objects makes it easy to carry around some additional information glued together with the object of interest. The problem is that it slightly changes how R sees the objects, e.g. a numeric vector with an additional attribute is still numeric but is not a vector anymore:
x <- rnorm(100)
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] TRUE
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
attr(x, "foo") <- "this is my attribute"
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] FALSE # <-- here!
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
Can this lead to any potential problems? What I'm thinking about is adding some attributes to common R objects and then passing them to other methods. What is the risk of something breaking just because of the fact alone that I added additional attributes to standard R objects (e.g. vector, matrix, data.frame etc.)?
Notice that I'm not asking about creating my own classes. For the sake of simplicity we can also assume that there won't be any conflicts in the names of the attributes (e.g. using the dim attribute). Let's also assume that it is not a problem if some method at some point drops my attribute; that is an acceptable risk.

In my (somewhat limited) experience, adding new attributes to an object hasn't ever broken anything. The only likely scenario I can think of where it would break something would be if a function required that an object have a specific set of attributes and nothing else. I can't think of a time when I've encountered that though. Most functions, especially in S3 methods, will just ignore attributes they don't need.
You're more likely to see problems arise if you remove attributes.
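One place where an extra attribute does change results is strict comparison. The following is only a small sketch (the attribute name foo and the values are made up for illustration) of how identical() and all.equal() react, and how as.vector() can be used to strip the attribute before handing the object to code that insists on a bare vector:

```r
x <- c(1, 2, 3)
y <- x
attr(y, "foo") <- "this is my attribute"   # illustrative attribute

# Strict comparisons take attributes into account
same_strict <- identical(x, y)                                      # FALSE
same_loose  <- isTRUE(all.equal(x, y))                              # FALSE
same_values <- isTRUE(all.equal(x, y, check.attributes = FALSE))    # TRUE

# as.vector() drops all attributes of an atomic vector
bare <- as.vector(y)
is.vector(y)      # FALSE
is.vector(bare)   # TRUE
```

So code that checks its inputs with is.vector() or identical() is the kind of code that may notice your attribute.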
The reason you won't see a lot of problems stemming from additional attributes is that methods are dispatched on the class of an object. As long as the class doesn't change, methods will be dispatched in much the same way. However, this doesn't mean that existing methods will know what to do with your new attributes. Take the following example--after adding a new_attr attribute to both x and y, and then adding them, the result adopts the attribute of x. What happened to the attribute of y? The default + function doesn't know what to do with conflicting attributes of the same name, so it just takes the first one (more details at R Language Definition, thanks Brodie).
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "new_attr") <- "ki yay"
x + y
[1] 11 11 11 11 11 11 11 11 11 11
attr(,"new_attr")
[1] "yippy"
In a different example, if we give x and y attributes with different names, x + y produces an object that preserves both attributes.
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "another_attr") <- "ki yay"
x + y
[1] 11 11 11 11 11 11 11 11 11 11
attr(,"another_attr")
[1] "ki yay"
attr(,"new_attr")
[1] "yippy"
On the other hand, mean(x) doesn't even try to preserve the attributes. I don't know of a good way to predict which functions will and won't preserve attributes. There's probably some reliable mnemonic you could use in base R (aggregation vs. vectorized, perhaps?), but I think there's a separate principle that ought to be considered.
If preservation of your new attributes is important, you should define a new class that preserves the inheritance of the old class.
With a new class, you can write methods that extend the generics and handle the attributes in whichever way you want. Whether or not you should define a new class and write its methods is very much dependent on how valuable any new attributes you add are to the future work you will be doing.
So in general, adding new attributes is very unlikely to break anything in R. But without adding a new class and methods to handle the new attributes, I would be very cautious about interpreting the meaning of those attributes after they've been passed through other functions.
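To make that concrete, here is a minimal sketch of the class-based approach. The class name tagged, the attribute name unit and the helper new_tagged() are all hypothetical, chosen only for this example. It first shows a plain attribute being dropped by subsetting and by mean(), and then a `[` method that puts the attribute back:

```r
# Plain attribute: dropped by subsetting and by mean()
plain <- c(1, 2, 3)
attr(plain, "unit") <- "cm"
attributes(plain[2:3])    # NULL
attributes(mean(plain))   # NULL

# Hypothetical constructor: adds a class in front of the existing one
new_tagged <- function(x, unit) {
  structure(x, unit = unit, class = c("tagged", class(x)))
}

# Subsetting method that restores the attribute and the class
`[.tagged` <- function(x, ...) {
  out <- NextMethod()                  # default subsetting does the real work
  attr(out, "unit") <- attr(x, "unit")
  class(out) <- class(x)
  out
}

tagged <- new_tagged(c(1, 2, 3), "cm")
sub <- tagged[2:3]
attr(sub, "unit")         # "cm"
```

You would need a similar method for every generic whose result should keep the attribute, which is why it only pays off when the attribute really matters.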

Related

Which print method is used for atomic vectors?

I can't find which print method is used for the different classes of atomic vectors.
E.g., why are characters printed with quotes, while numerics are not?
I can't find a print.numeric / print.character etc. method.
The reason for it is, apart from the desire of deeper understanding, to create a print method for a new class, and I'd like to understand how the current class is printed.
Example: Assigning a new class to the atomic x, makes print print the attributes, which I don't want. Understanding which print method is behind this would help me tweak this.
x <- 1:5
x
#> [1] 1 2 3 4 5
class(x) <- c(class(x), "new")
x
#> [1] 1 2 3 4 5
#> attr(,"class")
#> [1] "integer" "new"
It depends how deep you want to go into the explanation Tjebo. For the built-in classes, the print.default method is called, which in turn calls some internal C code.
The internal C function that is called in print.default is defined here. The C code takes the R object as a SEXP object and decides what to do with it by checking its fundamental type and using a switch statement to determine the format of printing to the console using the C print method sprintf.
It's no mystery, since you can trace the code through quite easily, but essentially the print methods for the basic types are defined in C code and you can't change them directly.
However, that doesn't stop you from overriding them by defining your own print methods for the built in types:
print.character <- function(x, ...) { cat("I print characters\n"); invisible(x) }  # print methods should accept ... and return x invisibly
print("a")
#> I print characters
And you don't need to settle for the default printing of attributes, etc, when you define a new class:
x <- 1:5
class(x) <- c(class(x), "new")
print.new <- function(x, ...) { cat("My fancy new class prints like this:", x, "\n"); invisible(x) }
x
#> My fancy new class prints like this: 1 2 3 4 5
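If all you want is the usual output without the attr(,"class") lines, one possible sketch is to strip the class inside the method and fall back to the default printing. Note that I put "new" first in the class vector here, which differs from the question's ordering:

```r
x <- structure(1:5, class = c("new", "integer"))

print.new <- function(x, ...) {
  print(unclass(x), ...)   # print the underlying integer vector as usual
  invisible(x)             # print methods conventionally return x invisibly
}

out <- capture.output(print(x))
out
```

This keeps the familiar "[1] 1 2 3 4 5" line and leaves the object itself untouched.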

What's the real meaning of 'Everything that exists is an object' in R?

I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable is a pure base type, the result of is.object() would be FALSE. So it should not be an object.
So what's the real meaning of 'Everything that exists is an object' in R?
The function is.object seems only to look if the object has a "class" attribute. So it has not the same meaning as in the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
NULL
is.object(x)
[1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
$class
[1] "my_class"
is.object(x)
[1] TRUE
Now, trying to answer your real question, about the slogan, this is how I would put it. Everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is better understood with functions and expressions, which are not usually thought of as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the
function object itself and the objects that are supplied as the
arguments. In the functional programming model, the result is defined
by another object, the value of the call. Hence the traditional motto
of the S language: everything is an object—the arguments, the value,
and in fact the function and the call itself: All of these are defined
as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take this expression, for example: mean(rnorm(100), trim = 0.9). Until it is evaluated, it is an object very much like any other. So you can change its elements just like you would with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100,2 ))
call
mean(rt(100, 2), trim = 0.9)
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
You can change its default arguments just like a simple object, like a list, too:
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves
objects; in the traditional motto of the S language, everything is an
object. Evaluation consists of taking the object representing an
expression and returning the object that is the value of that
expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call.
Therefore, basic programming centers on creating and refining
functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
So taking the parenthesis like an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
Which is not a good idea but illustrates the point. So I guess this is how I would sum it up: Everything that exists in R is an object because they are data which can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of this object which gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially
composed of function calls, with objects as arguments and an object as the
value. Understanding the central role of objects and functions in R makes
use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects: For x to be an object means that it has a class; thus class(x) returns a class for every object. Even functions have a class, as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here -- it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value and if not then class(x) will compute it from the type. From a conceptual perspective it matters not how the class is stored internally (stored or computed) -- what matters is that in both cases x is still an object and still has a class.
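The stored-versus-computed distinction can be seen with oldClass(), which returns only the class attribute that is actually stored. This is just a small illustration with throwaway objects:

```r
x <- 1
stored_x   <- oldClass(x)      # NULL: nothing is stored
computed_x <- class(x)         # "numeric": computed from the type
obj_x      <- is.object(x)     # FALSE

df <- data.frame(a = 1)
stored_df <- oldClass(df)      # "data.frame": stored as an attribute
obj_df    <- is.object(df)     # TRUE

stored_fun   <- oldClass(sin)  # NULL
computed_fun <- class(sin)     # "function"
```

In every case class() gives an answer, which is the sense in which all of these are objects.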
Functions: That all computation occurs through functions refers to the fact that even things that you might not expect to be functions are actually functions. For example, when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3
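A practical consequence is that these operators can be passed around like any other function. As a small sketch with made-up data, here `[` is called directly and then handed to sapply() to pull the second element out of each list entry:

```r
x <- c(10, 20, 30)
second <- `[`(x, 2)                          # same as x[2]

pieces <- list(1:3, 4:6)
second_of_each <- sapply(pieces, `[`, 2)     # applies `[`(piece, 2) to each piece

total <- do.call(`+`, list(1, 2))            # same as 1 + 2
```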

R: Avoid accidentally overwriting variables

Is there any way to define a variable in R in your namespace, such that it can't be overwritten (maybe à la a "Final" declaration)? Something like the following pseudocode:
> xvar <- 10
> xvar
[1] 10
xvar <- 6
> "Error, cannot overwrite this variable unless you remove its finality attribute"
Motivation: When running R scripts multiple times, it's sometimes too easy to inadvertently overwrite variables.
Check out ?lockBinding:
a <- 2
a
## [1] 2
lockBinding('a', .GlobalEnv)
a <- 3
## Error: cannot change value of locked binding for 'a'
And its complement, unlockBinding:
unlockBinding('a', .GlobalEnv)
a <- 3
a
## [1] 3
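The same mechanism works in any environment, not just .GlobalEnv, and bindingIsLocked() lets you check the state. Below is a rough sketch using a scratch environment (the names e and a are arbitrary) and tryCatch() so that the failed assignment does not stop the script:

```r
e <- new.env()
assign("a", 2, envir = e)
lockBinding("a", e)

# Try to overwrite; the error is caught and turned into a status string
status <- tryCatch({
  assign("a", 3, envir = e)
  "changed"
}, error = function(err) "blocked")

locked <- bindingIsLocked("a", e)   # TRUE
value  <- get("a", envir = e)       # still 2
```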
You can make variables constant using the pryr package.
install.packages("pryr")  # or devtools::install_github("hadley/pryr")
library(pryr)
xvar %<c-% 10
xvar
## [1] 10
xvar <- 6
## Error: cannot change value of locked binding for 'xvar'
The %<c-% operator is a convenience wrapper for assign + lockBinding.
Like Baptiste said in the comments: if you are having problems with this, it's a possible sign of poor coding style. Bundling the majority of your logic into functions will reduce variable name clashes.
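To illustrate the point about functions: an assignment inside a function creates a local variable and leaves a variable of the same name in the calling environment alone. A tiny sketch, where the function name run_step is made up:

```r
xvar <- 10

run_step <- function() {
  xvar <- 6      # local to this call; the outer xvar is not touched
  xvar * 2
}

result <- run_step()   # 12
xvar                   # still 10
```

Only <<- or an explicit assign() into another environment would reach the outer variable.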

R object identity

Is there a way to test whether two objects are identical in the R language?
For clarity: I do not mean identical in the sense of the identical function,
which compares objects based on certain properties like numerical values or logical values etc.
I am really interested in object identity, which for example could be tested using the is operator in the Python language.
UPDATE: A more robust and faster implementation of address(x) (not using .Internal(inspect(x))) was added to data.table v1.8.9. From NEWS :
New function address() returns the address in RAM of its argument. Sometimes useful in determining whether a value has been copied or not by R, programatically.
There's probably a neater way but this seems to work.
address = function(x) substring(capture.output(.Internal(inspect(x)))[1],2,17)
x = 1
y = 1
z = x
identical(x,y)
# [1] TRUE
identical(x,z)
# [1] TRUE
address(x)==address(y)
# [1] FALSE
address(x)==address(z)
# [1] TRUE
You could modify it to work on 32bit by changing 17 to 9.
You can use the pryr package.
For example, return the memory location of the mtcars object:
pryr::address(mtcars)
Then, for variables a and b, you can check:
address(a) == address(b)
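For what it's worth, environments are a case where base R already compares by reference: identical() on two environments is TRUE only if they are the same environment. A small sketch with throwaway names:

```r
e1 <- new.env()
e2 <- e1          # second name for the same environment
e3 <- new.env()   # a different, equally empty environment

same_12 <- identical(e1, e2)   # TRUE
same_13 <- identical(e1, e3)   # FALSE

# A change made through one name is visible through the other
e2$val <- 42
seen <- e1$val                 # 42
```

For ordinary vectors and lists R uses copy-on-modify, so identity in the Python sense matters much less there.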

Sort a list of nontrivial elements in R

In R, I have a list of nontrivial objects (they aren't simple objects like scalars that R can be expected to be able to define an order for). I want to sort the list. Most languages allow the programmer to provide a function or similar that compares a pair of list elements that is passed to a sort function. How can I sort my list?
To make this as simple as I can, say your objects are lists with two elements, a name and a value. The value is a numeric; that's what we want to sort by. You can imagine having more elements and needing to do something more complex to sort.
The sort help page tells us that sort uses xtfrm; xtfrm in turn tells us it will use == and > methods for the class of x[i].
First I'll define an object that I want to sort:
xx <- lapply(c(3,5,7,2,4), function(i) list(name=LETTERS[i], value=i))
class(xx) <- "myobj"
Now, since xtfrm works on the x[i]'s, I need to define a [ function that returns the desired elements but still with the right class
`[.myobj` <- function(x, i) {
  class(x) <- "list"
  structure(x[i], class = "myobj")
}
Now we need == and > functions for the myobj class; this potentially could be smarter by vectorizing these properly; but for the sort function, we know that we're only going to be passing in myobj's of length 1, so I'll just use the first element to define the relations.
`>.myobj` <- function(e1, e2) {
  e1[[1]]$value > e2[[1]]$value
}
`==.myobj` <- function(e1, e2) {
  e1[[1]]$value == e2[[1]]$value
}
Now sort just works.
sort(xx)
It might be considered more proper to write a full Ops function for your object; however, to just sort, this seems to be all you need. See p.89-90 in Venables/Ripley for more details about doing this using the S3 style. Also, if you can easily write an xtfrm function for your objects, that would be simpler and most likely faster.
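If you really want the comparator style that the question describes, base R has no such argument to sort(), but it can be sketched by hand. The function sort_with() below is hypothetical (not part of any package) and is a plain insertion sort, so it is only sensible for short, unnamed lists:

```r
# Hypothetical helper: insertion sort driven by a comparator.
# cmp(a, b) should return a positive number when a must come after b.
sort_with <- function(lst, cmp) {
  if (length(lst) < 2) return(lst)
  for (i in 2:length(lst)) {
    item <- lst[i]                       # keep as a length-1 list
    j <- i - 1
    while (j >= 1 && cmp(lst[[j]], item[[1]]) > 0) {
      lst[j + 1] <- lst[j]               # shift the larger element right
      j <- j - 1
    }
    lst[j + 1] <- item
  }
  lst
}

items <- lapply(c(3, 5, 7, 2, 4), function(i) list(name = LETTERS[i], value = i))
sorted <- sort_with(items, function(a, b) a$value - b$value)
sorted_values <- vapply(sorted, function(el) el$value, numeric(1))
```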
The order function will allow you to determine the sort order for character or numeric arguments and break ties with subsequent arguments. You need to be more specific about what you want. Produce an example of a "non-trivial object" and specify the order you desire in some R object. Lists are probably the most non-vectorial objects:
> slist <- list(cc=list(rr=1), bb=list(ee=2, yy=7), zz="ww")
> slist[order(names(slist))] # alpha order on names()
$bb
$bb$ee
[1] 2
$bb$yy
[1] 7
$cc
$cc$rr
[1] 1
$zz
[1] "ww"
slist[c("zz", "bb", "cc")] # an arbitrary ordering
$zz
[1] "ww"
$bb
$bb$ee
[1] 2
$bb$yy
[1] 7
$cc
$cc$rr
[1] 1
One option is to create an xtfrm method for your objects. Functions like order take multiple columns, which works in some cases. There are also some specialized functions for specific cases, like mixedsort in the gtools package.
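In many cases the simplest route is to extract a sort key from each element and let order() do the work. Here is a minimal sketch on a plain list, assuming each element has a numeric value field as in the example from the first answer:

```r
xx <- lapply(c(3, 5, 7, 2, 4), function(i) list(name = LETTERS[i], value = i))

# One numeric key per element, then reorder the list by that key
keys <- vapply(xx, function(el) el$value, numeric(1))
sorted <- xx[order(keys)]

sorted_names <- vapply(sorted, function(el) el$name, character(1))
```

Ties can be broken by passing further keys to order(), e.g. order(keys, other_keys).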
