I can't find which print method is used for the different classes of atomic vectors.
E.g., why are characters printed with quotes, and numerics are not?
I can't find a print.numeric, print.character, etc. method.
The reason, apart from wanting a deeper understanding, is that I want to create a print method for a new class, and I'd like to understand how the current class is printed.
Example: Assigning a new class to the atomic vector x makes print show the attributes, which I don't want. Understanding which print method is behind this would help me tweak this.
x <- 1:5
x
#> [1] 1 2 3 4 5
class(x) <- c(class(x), "new")
x
#> [1] 1 2 3 4 5
#> attr(,"class")
#> [1] "integer" "new"
It depends how deep you want to go into the explanation Tjebo. For the built-in classes, the print.default method is called, which in turn calls some internal C code.
The internal C function that is called in print.default is defined here. The C code takes the R object as a SEXP object and decides what to do with it by checking its fundamental type and using a switch statement to determine the format of printing to the console using the C print method sprintf.
It's no mystery, since you can trace the code through quite easily, but essentially the print methods for the basic types are defined in C code and you can't change them directly.
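A quick way to check this from the console, as a rough sketch that assumes a fresh session in which nobody has defined print.numeric or print.character: no S3 method is registered for the implicit classes, so dispatch ends at print.default, whose body hands over to internal code. The quotes around character vectors are controlled by the quote argument of print.default.

```r
# Assumes a fresh R session with no user-defined print methods.
# No method is registered for these implicit classes:
no_numeric   <- is.null(utils::getS3method("print", "numeric", optional = TRUE))
no_character <- is.null(utils::getS3method("print", "character", optional = TRUE))

# The body of print.default hands over to internal (C-level) code:
calls_internal <- any(grepl(".Internal", deparse(print.default), fixed = TRUE))

# Quoting is an argument of print.default, not a separate method:
print.default(c("a", "b"))
#> [1] "a" "b"
print.default(c("a", "b"), quote = FALSE)
#> [1] a b
```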
However, that doesn't stop you from overriding them by defining your own print methods for the built-in types:
print.character <- function(x, ...) cat("I print characters")
print("a")
#> I print characters
And you don't need to settle for the default printing of attributes, etc, when you define a new class:
x <- 1:5
class(x) <- c(class(x), "new")
print.new <- function(x, ...) cat("My fancy new class prints like this:", x)
x
#> My fancy new class prints like this: 1 2 3 4 5
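If the goal is just to get the usual vector output without the attr(,"class") lines, one possible sketch is a print method that strips the class and hands the bare vector back to the default printing. The class name "new" comes from the question; returning the object invisibly is a common convention for print methods, not something the question requires.

```r
x <- 1:5
class(x) <- c(class(x), "new")

# Print the underlying vector the default way, without the class attribute.
print.new <- function(x, ...) {
  print(unclass(x), ...)
  invisible(x)  # print methods conventionally return their argument invisibly
}

x
#> [1] 1 2 3 4 5
```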
I want to determine if an arbitrary function is an S3 method overriding a generic function. If so, I would want to identify the generic as next step.
The Problem
I need something like HasGeneric(AMethod) that tells me whether my argument AMethod is a method overriding some generic. Or, even better, FindGeneric(AMethod) that returns the corresponding generic for AMethod.
The first idea I had was parsing the function name because the pattern "generic.class" is recommended to be used to name methods. Example:
print is the generic.
print.aov overrides print for the S3 class aov.
But string parsing is not without ambiguities: all.equal.data.frame (in the dplyr package).
Furthermore, the pattern isn't even mandatory anymore. Look at this:
a <- 1:5
class(a) <- "SomeTestClassOfMineNotToBeConfused"
# Register some arbitrarily named method as override for print
thisismyaprinter <- function(x) cat("I am A okay")
.S3method("print", "SomeTestClassOfMineNotToBeConfused", thisismyaprinter) # register as override for `print`
methods(class = "SomeTestClassOfMineNotToBeConfused") # now `print` is in the list of methods
#> [1] coerce initialize print show slotsFromS3
#> see '?methods' for accessing help and source code
print(a) # test works, too
#> I am A okay
As we all know, the output of a as simple vector 1:5 would be:
print(a <- 1:5)
#> [1] 1 2 3 4 5
Tried solutions
All functions that I found to investigate methods either assume that I know the name of the generic or the class (which I don't). I could (of course):
1. Go over all available generics using .knownS3Generics.
2. Get their methods with methods().
3. Search them for the function name that I have.
That would not only be slow. It would not even work. When I looked up the methods for print in my example above I got this:
methods(print)
#> [201] ...
#> [202] print.SomeTestClassOfMineNotToBeConfused*
#> [203] ...
R did register the method with the "generic.class" name pattern. But it is not possible to access this:
print.SomeTestClassOfMineNotToBeConfused(a)
#> Error in print.SomeTestClassOfMineNotToBeConfused(a): could not find function "print.SomeTestClassOfMineNotToBeConfused"
So the method is registered under that name, but no function object with that name exists for R to find.
You can use getDispatchMethodS3 from R.methodsS3:
R.methodsS3::getDispatchMethodS3("print" ,
"SomeTestClassOfMineNotToBeConfused")(a)
#> I am A okay
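Without an extra package, utils::getS3method appears to do a similar lookup, since it also consults the registry of the generic. The sketch below assumes R >= 4.0.0 (for .S3method) and uses hypothetical class and function names; like the call above, it still requires knowing the generic and the class.

```r
# Hypothetical class and printer names, mirroring the setup in the question.
a <- structure(1:5, class = "demo_registered_class")
demo_printer <- function(x, ...) cat("I am A okay")
.S3method("print", "demo_registered_class", demo_printer)

# Look the method up by generic and class, even though no function
# called print.demo_registered_class exists on the search path.
found <- utils::getS3method("print", "demo_registered_class")
identical(found, demo_printer)
#> [1] TRUE
found(a)
#> I am A okay
```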
I came across an object with round brackets like: (a = 1)
In contrast to a = 1, with (a = 1) the output is presented in the console.
a = 1 # a=1 on console
(a = 1) # a=1 and [1] 1 on console
My question is: what is this construct called? An object in round brackets?
The assignment operators (= and <-, to name two of several) both do their job (assign to a variable) and invisibly return the value assigned. In some cases, this allows "chaining" assignment,
aa <- ab <- ac <- 5
but has some other uses. Some incorrectly assume that it returns the entire LHS variable, but it only (invisibly) returns the value passed through it.
vec <- 1:4
(vec[2] <- 99)
# [1] 99
It is sometimes used in answers (here on SO and elsewhere) as a short-hand for both assigning something and showing what that is. For instance, see the difference in presentation only between the following two commands:
dat <- data.frame(a=1, b=2)
### (nothing printed)
(dat <- data.frame(a=1, b=2))
# a b
# 1 1 2
To your question "how do you call this", I'm not sure there's a great answer for that. Because it is using invisible(.) in the return value, the console's default print methods are not called. By wrapping it in parens, you are subverting that intent, so it is doing the default console thing of printing a value.
A good way to describe it is "an assignment wrapped in parentheses".
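The visibility flag can be inspected directly with withVisible(). This is only an illustrative sketch; quiet() is a hypothetical helper, not something from the question.

```r
# An assignment returns its value invisibly ...
withVisible(a <- 1)$visible
#> [1] FALSE
# ... and wrapping it in parentheses makes the value visible again.
withVisible((a <- 1))$visible
#> [1] TRUE

# The same mechanism applies to any function that uses invisible().
quiet <- function(x) invisible(x * 2)  # hypothetical helper
quiet(21)     # nothing is printed
(quiet(21))
#> [1] 42
```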
Adding user-defined attributes to R objects makes it easy to carry around some additional information glued together with the object of interest. The problem is that it slightly changes how R sees the objects, e.g. a numeric vector with an additional attribute is still numeric but is no longer a vector according to is.vector():
x <- rnorm(100)
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] TRUE
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
attr(x, "foo") <- "this is my attribute"
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] FALSE # <-- here!
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
Can this lead to any potential problems? What I'm thinking about is adding some attributes to common R objects and then passing them to other methods. What is the risk of something breaking just because of the fact alone that I added additional attributes to standard R objects (e.g. vector, matrix, data.frame etc.)?
Notice that I'm not asking about creating my own classes. For the sake of simplicity we can also assume that there won't be any conflicts in the names of the attributes (e.g. using dims attribute). Let's also assume that it is not a problem if some method at some point will drop my attribute, it is an acceptable risk.
In my (somewhat limited) experience, adding new attributes to an object hasn't ever broken anything. The only likely scenario I can think of where it would break something would be if a function required that an object have a specific set of attributes and nothing else. I can't think of a time when I've encountered that though. Most functions, especially in S3 methods, will just ignore attributes they don't need.
You're more likely to see problems arise if you remove attributes.
The reason you won't see a lot of problems stemming from additional attributes is that methods are dispatched on the class of an object. As long as the class doesn't change, methods will be dispatched in much the same way. However, this doesn't mean that existing methods will know what to do with your new attributes. Take the following example--after adding a new_attr attribute to both x and y, and then adding them, the result adopts the attribute of x. What happened to the attribute of y? The default + function doesn't know what to do with conflicting attributes of the same name, so it just takes the first one (more details at R Language Definition, thanks Brodie).
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "new_attr") <- "ki yay"
x + y
[1] 11 11 11 11 11 11 11 11 11 11
attr(,"new_attr")
[1] "yippy"
In a different example, if we give x and y attributes with different names, x + y produces an object that preserves both attributes.
x <- 1:10
y <- 10:1
attr(x, "new_attr") <- "yippy"
attr(y, "another_attr") <- "ki yay"
x + y
[1] 11 11 11 11 11 11 11 11 11 11
attr(,"another_attr")
[1] "ki yay"
attr(,"new_attr")
[1] "yippy"
On the other hand, mean(x) doesn't even try to preserve the attributes. I don't know of a good way to predict which functions will and won't preserve attributes. There's probably some reliable mnemonic you could use in base R (aggregation vs. vectorized, perhaps?), but I think there's a separate principle that ought to be considered.
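A small sketch of how to check this for any function you care about: inspect attributes() on the result. The three calls below are just examples, not a complete rule.

```r
x <- 1:10
attr(x, "new_attr") <- "yippy"

# Element-wise arithmetic keeps the attribute ...
attributes(x + 1L)
#> $new_attr
#> [1] "yippy"

# ... while aggregation and subsetting drop it.
attributes(mean(x))
#> NULL
attributes(x[1:3])
#> NULL
```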
If preservation of your new attributes is important, you should define a new class that preserves the inheritance of the old class.
With a new class, you can write methods that extend the generics and handle the attributes in whichever way you want. Whether or not you should define a new class and write its methods is very much dependent on how valuable any new attributes you add are to the future work you will be doing.
So in general, adding new attributes is very unlikely to break anything in R. But without adding a new class and methods to handle the new attributes, I would be very cautious about interpreting the meaning of those attributes after they've been passed through other functions.
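As a minimal sketch of that idea, here is a class whose subsetting method carries the attribute along. The names tagged, new_tagged and the tag attribute are all hypothetical, and only `[` is handled; other generics would need their own methods.

```r
# Constructor: put the new class in front of the existing one so that
# methods for the old class are still found by inheritance.
new_tagged <- function(x, tag) {
  structure(x, tag = tag, class = c("tagged", class(x)))
}

# Subsetting method that re-attaches the attribute the default would drop.
`[.tagged` <- function(x, i) {
  new_tagged(unclass(x)[i], attr(x, "tag"))
}

v <- new_tagged(c(1.5, 2.5, 3.5), tag = "cm")
attr(v[2:3], "tag")
#> [1] "cm"
inherits(v[2:3], "tagged")
#> [1] TRUE
```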
I ran this in R:
a <- factor(c("A","A","B","A","B","B","C","A","C"))
And then I made a table
results <- table(a)
but when I look at its attributes, I get:
> attributes(results)
$dim
[1] 3
$dimnames
$dimnames$a
[1] "A" "B" "C"
$class
[1] "table"
I'm confused: why does a show up in my attributes? I've programmed in Java before, and I thought variable names weren't supposed to show up inside your functions.
R functions can see not only the data you pass to them, but also the actual call that was run to invoke them. So when you run table(a), the table() function not only sees the values of a, but it can also see that those values came from a variable named a.
So by default table() likes to name each dimension in the resulting table. If you don't pass explicit names in the call via the dnn= parameter, table() will look back to the call, and turn the variable name into a character and use that value for the dimension name.
So after table() has run, it has no direct connection to the variable a; it merely used the name of that variable as a character label for the results.
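A short sketch of that behaviour; the label "grade" and the variable name grades are made up for illustration.

```r
a <- factor(c("A", "A", "B", "A", "B", "B", "C", "A", "C"))

# By default the dimension is named after the variable in the call.
names(dimnames(table(a)))
#> [1] "a"

# An explicit dnn= overrides the name taken from the call.
names(dimnames(table(a, dnn = "grade")))
#> [1] "grade"

# Same values under a different variable name give a different label.
grades <- a
names(dimnames(table(grades)))
#> [1] "grades"
```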
Many functions in R do this. For example this is similar to how plot(height~weight, data=data.frame(height=runif(10), weight=runif(10))) knows to use the names "weight" and "height" for the axis labels on the plot.
Here's a simple example to show one way this can be accomplished.
paramnames <- function(...) {
as.character(substitute(...()))
}
paramnames(a,b,x)
# [1] "a" "b" "x"
I think the only answer is because the designers wanted it that way. It seems reasonable to label table objects with the names of variables that formed the margins:
> b <- c(1,1,1,2,2,2, 3,3,3)
> table(a, b)
b
a 1 2 3
A 2 1 1
B 1 2 0
C 0 0 2
R was intended as a clone of S, and S was intended as a tool for working statisticians. R also has a handy function for working with table objects, as.data.frame:
> as.data.frame(results)
a Freq
1 A 4
2 B 3
3 C 2
If you want to build a function that performs the same sort of labeling or that otherwise retrieves the name of the object passed to your function then there is the deparse(substitute(.))-maneuver:
> myfunc <- function(x) { nam <- deparse(substitute(x)); print(nam)}
> myfunc(z)
[1] "z"
> str(z)
Error in str(z) : object 'z' not found
So "z" doesn't even need to exist. Highly "irregular" if you ask me. If you "ask" myfunc what its argument list looks like you get the expected answer:
> formals(myfunc)
$x
But that is a list with an R-name for its single element x. R names are language elements, whereas the names function will retrieve it as a character value, "x", which is not a language element:
> names(formals(myfunc))
[1] "x"
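To make the distinction between the language object and its character form concrete, here is a small sketch; label_of is a hypothetical helper name, and height and weight do not need to exist.

```r
# substitute() captures the unevaluated argument; deparse() turns it into text.
label_of <- function(x) deparse(substitute(x))  # hypothetical helper

label_of(height)
#> [1] "height"
label_of(log(weight + 1))
#> [1] "log(weight + 1)"

# A quoted name is a language object (a symbol), not a character string.
sym <- quote(z)
is.symbol(sym)
#> [1] TRUE
is.character(sym)
#> [1] FALSE
as.character(sym)
#> [1] "z"
```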
R has some of the aspects of Lisp (interpreted, functional (usually)) although the dividing line between its language functions and the data objects seems less porous to me, but I'm not particularly proficient in Lisp.
I know the function get can help you turn character values into the variables they name, like get(test[1]). But I find it does not work when the value refers to an element of a list. Look at the example below:
> A = c("obj1$length","obj1$width","obj1$height","obj1$weight")
> obj1 <- NULL
> obj1$length=c(1:4);obj1$width=c(5:8);obj1$height=c(9:12);obj1$weight=c(13:16)
> get(A[1])
Error in get(A[1]) : object 'obj1$length' not found
In this case, how can I retrieve the variable name?
get doesn't work like that; you need to specify the variable and the environment (the list is coerced to one) separately:
get("length",obj1)
[1] 1 2 3 4
To do it with the data you have, you need to use eval and parse:
eval(parse(text=A[1]))
[1] 1 2 3 4
However, I suggest you rethink your approach to the problem as get, eval and parse are blunt tools that can bite you later.
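One way to rethink it, sketched under the assumption that you control how the names are stored: keep only the element names as strings and index the list with [[ ]]. If you are stuck with strings like "obj1$length", splitting on "$" avoids parsing arbitrary code.

```r
obj1 <- list(length = 1:4, width = 5:8, height = 9:12, weight = 13:16)

# Store element names rather than code fragments.
fields <- c("length", "width", "height", "weight")
obj1[[fields[1]]]
#> [1] 1 2 3 4

# If the strings are fixed as "object$element", split them instead of parsing.
parts <- strsplit("obj1$length", "$", fixed = TRUE)[[1]]
get(parts[1])[[parts[2]]]
#> [1] 1 2 3 4
```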
I think the eval() function, combined with parse(), will do the trick, among other uses.
eval(parse(text = A[1]))
[1] 1 2 3 4
You might also find this simple function useful (based on the commonly used combination of eval, parse and paste):
evaluate <- function(..., envir = .GlobalEnv) { eval(parse(text = paste(..., sep = "")), envir = envir) }
It concatenates several character objects and evaluates the result. If you want to use it inside another function, add this at the beginning of your function
envir <- environment()
and use it like this:
evaluate([some character objects], envir=envir)
Try, for example
myvariable1 <- "aaa"; myvariable2 <- "bbb"; aaa <- 15; bbb <- 3
evaluate(myvariable1, " * ", myvariable2)
I find it very useful when I have to evaluate similar statements with several variables, or when I want to create variables with automatically generated names.
for(i in 1:100){evaluate("variable",i,"<-2*",i)}
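For the generated-names case specifically, a named list (or assign()) gives the same result without building code from strings. This is only a sketch of an alternative, not a drop-in replacement for evaluate(); the name values is made up.

```r
# Keep the generated values together in one named list.
values <- stats::setNames(as.list(2 * (1:100)), paste0("variable", 1:100))
values$variable7
#> [1] 14
values[["variable100"]]
#> [1] 200

# If separate variables are really needed, assign() creates them by name.
for (i in 1:3) assign(paste0("variable", i), 2 * i)
variable3
#> [1] 6
```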