Difference between function objects in R - r

Suppose I create a list of functions in my R workspace, and the same set of functions is also defined in an R file. After source(), I would expect each sourced function object to be identical to the corresponding function in the list I created, but this is not the case.
Example:
The f.R file contains f <- function(x) x^2.
In R console:
lst <- list(f=function(x) x^2)
source("f.R")
> ls()
[1] "f" "lst"
> identical(f,lst$f)
[1] FALSE
> str(f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 6 1 20 6 20 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1b2fd60>
> str(lst$f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 16 1 30 16 30 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1bb4b50>
I also tried:
> identical(f,lst$f, ignore.environment=TRUE)
[1] FALSE
> all.equal(f,lst$f)
[1] TRUE
Why is that?
EDIT:
In general:
f <- function(x) x^2
g <- function(x) x^2
identical(f,g)
[1] FALSE
Why would the attributes of f and g be different? Does this suggest one should never use identical to test equality between function objects?

You've proved this yourself:
> str(f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 6 1 20 6 20 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1b2fd60>
> str(lst$f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 16 1 30 16 30 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1bb4b50>
These attributes are not identical, therefore the objects are not identical.
More info about these attributes is here:
What/Where are the attributes of a function object?
These attributes allow R to refer to the function's source code. Your two functions have been defined in different places, and so their source code references (such as which file they were defined in) are different.
The easiest difference to see is the srcfile attribute of the srcref:
> attr(attr(f,"srcref"),"srcfile")
f.R
> attr(attr(lst$f,"srcref"),"srcfile")
>
Your f function is defined in f.R, so R keeps that in the attribute. Your lst$f is defined at the command line, so has a blank srcfile attribute.
Other parts of the attribute are also different because of how the code is parsed.
Specifically, the srcfile environment attribute is different:
> f <- function(x) x^2
> g <- function(x) x^2
> identical(f,g)
[1] FALSE
> print.default(attr(attr(g,"srcref"),"srcfile"))
<environment: 0x5b9fe40>
attr(,"class")
[1] "srcfilecopy" "srcfile"
> print.default(attr(attr(f,"srcref"),"srcfile"))
<environment: 0x5ba1028>
attr(,"class")
[1] "srcfilecopy" "srcfile"
I would say testing for identicality between function objects is probably not a good thing to do. The only time I can reliably believe two functions to be identical is if one has been created by assignment from the other, e.g. f2 = f1 implies identical(f1,f2).
Any other way of creating a function is prone to differences in srcref, enclosing environment, and even writing things with or without brackets.
Note all this is avoided if you set keep.source to FALSE.
> options(keep.source=TRUE)
> f <- function(x) x^2
> g <- function(x) x^2
> identical(f,g)
[1] FALSE
> options(keep.source=FALSE)
> f <- function(x) x^2
> g <- function(x) x^2
> identical(f,g)
[1] TRUE
But I still think you need to think very carefully about what constitutes identical functions for your purposes.
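One possible definition of "identical for my purposes" is "same arguments and same body, regardless of where the code was typed". The sketch below assumes that definition. The helper name same_code is hypothetical; it relies on utils::removeSource() to strip source references before comparing. As far as I know, recent versions of R also give identical() an ignore.srcref argument that defaults to TRUE, so the plain identical(f, g) call may already return TRUE there.

```r
# Hypothetical helper: compare two functions by formals and body only,
# after dropping any srcref information with utils::removeSource().
same_code <- function(f, g) {
  f <- utils::removeSource(f)
  g <- utils::removeSource(g)
  identical(formals(f), formals(g)) && identical(body(f), body(g))
}

old <- options(keep.source = TRUE)  # force srcrefs to be kept for this demo
f <- function(x) x^2
g <- function(x) x^2
h <- function(x) x^3
options(old)

same_code(f, g)  # TRUE: same arguments, same body
same_code(f, h)  # FALSE: the bodies differ
```

Note that this deliberately ignores the enclosing environment, so two closures that capture different values would still compare as equal.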
As for the ignore.environment argument, that ignores the environment in which the function was created. I'll write a function that creates functions to illustrate this:
foo creates a "multiplier" function that multiplies its argument by x:
> foo=function(x){print(x);function(z){z*x}}
So I can make a times-1 function and a times-2 function:
> f1 = foo(1)
[1] 1
> f2 = foo(2)
[1] 2
These functions are storing the local x in their environment. They are not identical by default:
> identical(f1,f2)
[1] FALSE
But are identical if you ignore the environment in which they were created:
> identical(f1,f2,ignore.environment=TRUE)
[1] TRUE
If you print them, they look identical (and I've set options(keep.source=FALSE) already) except for the environment hex-code reference:
> f1
function (z)
{
z * x
}
<environment: 0x8e4ea54>
> f2
function (z)
{
z * x
}
<environment: 0x8e4fb5c>
That's the environment ignored by ignore.environment.
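To make the role of that environment concrete, here is a small sketch (make_multiplier is a hypothetical name, equivalent to foo above without the print call). It shows that the two closures share the same code but look up x in different environments, which is why they behave differently even though they compare as identical once the environment is ignored.

```r
# Function factory: each call creates a new environment holding its own x.
make_multiplier <- function(x) {
  force(x)  # evaluate x now so the closure captures its value
  function(z) z * x
}

m1 <- make_multiplier(1)
m2 <- make_multiplier(2)

get("x", envir = environment(m1))  # 1
get("x", envir = environment(m2))  # 2

m1(5)  # 5
m2(5)  # 10

identical(m1, m2)                             # FALSE: environments differ
identical(m1, m2, ignore.environment = TRUE)  # TRUE: same formals and body
```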

Related

For new objects in R, how to assign values through subscripting?

For new objects in R, how can I specify assignments to subscripted elements? As in object[3] <- new value. Here is a specific example of the problem I have.
# Rectangle example:
Rectangle <- function(a, b,...){
R <- list(a=a, b=b, others=list(...))
structure(R, class="Rectangle")
}#
`[.Rectangle` <- function(R,ind){
if(ind==1) return(R$a)
if(ind==2) return(R$b)
if(ind>=3) return(R$others[[ind-2]])
}#
R <- Rectangle(2,3,"other1","other2")
> R[1]; R[2]; R[3]; R[4];
[1] 2
[1] 3
[1] "other1"
[1] "other2"
> R[4] <- "new.other";
> R[1]; R[2]; R[3]; R[4];
[1] 2
[1] 3
[1] "other1"
[1] "other2"
Clearly, the assignment to the subscripted object hasn't worked. I would like to know the syntax to define such assignments properly. That is, I would need an example for the following:
`[<-.Rectangle` <- function(){ }
Thank you very much.
To override subset-assign, your function needs to accept three arguments (x, index, value) and return the modified object. It is important that the third parameter is called exactly value, since R internally calls the function using that name (rather than positionally).
Here’s an example:
`[<-.Rectangle` = function (x, index, value) {
if (index == 1L) {
x$a = value
}
else if (index == 2L) {
x$b = value
}
else {
x$others[[index - 2L]] = value
}
x
}
It probably goes without saying that this is pretty convoluted logic; I’m not convinced that real-world code should have objects with such an API.
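For completeness, here is a self-contained sketch that puts the constructor and extraction method from the question together with the replacement method above, so the effect of the assignment can be checked. The variable name rect is just for illustration.

```r
# Constructor and extraction method, as in the question.
Rectangle <- function(a, b, ...) {
  structure(list(a = a, b = b, others = list(...)), class = "Rectangle")
}
`[.Rectangle` <- function(R, ind) {
  if (ind == 1) return(R$a)
  if (ind == 2) return(R$b)
  R$others[[ind - 2]]
}

# Replacement method: the last argument must be named `value`,
# and the modified object must be returned.
`[<-.Rectangle` <- function(x, index, value) {
  if (index == 1L) {
    x$a <- value
  } else if (index == 2L) {
    x$b <- value
  } else {
    x$others[[index - 2L]] <- value
  }
  x
}

rect <- Rectangle(2, 3, "other1", "other2")
rect[4] <- "new.other"
rect[1] <- 10

rect[1]  # 10
rect[3]  # "other1"
rect[4]  # "new.other"
```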
Maybe this helps you:
> str(R)
List of 3
$ a : num 2
$ b : num 3
$ others:List of 2
..$ : chr "other1"
..$ : chr "other2"
- attr(*, "class")= chr "Rectangle"
> R[4]='hello'
> str(R)
List of 4
$ a : num 2
$ b : num 3
$ others:List of 2
..$ : chr "other1"
..$ : chr "other2"
$ : chr "hello"
- attr(*, "class")= chr "Rectangle"
> R[4]
[1] "other2"
> R[[4]]
[1] "hello"

how to iterate inside the elements of a quosure of rlang in R

So let's say that I want to know if X appears in the quosure.
library(rlang)
library(purrr)
q <- quo(mean(X))
I know I can check for equality with expr
q[[2]][[2]] == expr(X)
[1] TRUE
But how do I iterate or flatten the quo element? flatten(q) doesn't work, I couldn't use for loops, no idea how to use some map function from purrr.
Ideally I would like to capture X when it's "data" and not any function.
I use the following custom function to convert expressions to their Abstract Syntax Trees (ASTs):
getAST <- function( ee ) { as.list(ee) %>% purrr::map_if(is.call, getAST) }
Since you're working with quosures, there's an intermediate step of retrieving the associated expression:
## Define a quosure
## Side note: don't use q as a variable name; it conflicts with q()
qsr <- quo( mean(5*X+2) )
## The associated expression
xpr <- rlang::get_expr( qsr )
## ...and its AST
ast <- getAST( xpr )
# List of 2
# $ : symbol mean
# $ :List of 3
# ..$ : symbol +
# ..$ :List of 3
# .. ..$ : symbol *
# .. ..$ : num 5
# .. ..$ : symbol X
# ..$ : num 2
From here, you can use standard techniques to find X. For example, flatten the nested list and compare each element to expr(X) as in your question:
purrr::has_element( unlist(ast), expr(X) )
# [1] TRUE
purrr::map_lgl( unlist(ast), identical, expr(X) )
# [1] FALSE FALSE FALSE FALSE TRUE FALSE
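If the goal is only to pick out symbols that are used as data rather than as function names, base R's all.vars() may already be enough. The sketch below uses a plain quoted expression so that it runs without rlang; for a quosure, the assumption is that you would first extract the expression with rlang::get_expr() as shown above.

```r
# Stand-in for rlang::get_expr(qsr): the bare expression inside the quosure.
xpr <- quote(mean(5 * X + 2))

# all.vars() returns the names used as variables, skipping function names.
all.vars(xpr)
# [1] "X"

# all.names() also includes the functions and operators being called.
all.names(xpr)

# Test for the presence of X as data:
"X" %in% all.vars(xpr)
# [1] TRUE
```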

Subtracting r objects of class 'times'

I have two objects of class 'times' generated using chron that I am trying to compare. On the surface they look identical:
> str(x)
Class 'times' atomic [1:6] 0.04444 0.05417 0.05486 0.00208 0.01111 ...
..- attr(*, "format")= chr "h:m:s"
> str(y)
Class 'times' atomic [1:6] 0.04444 0.05417 0.05486 0.00208 0.01111 ...
..- attr(*, "format")= chr "h:m:s"
So I expected that x - y = 0 or x==y would return TRUE, but this is not the case:
> x-y
[1] -6.245005e-17 -2.775558e-17 -2.775558e-17 7.372575e-18 -7.112366e-17 0.000000e+00
> x==y
[1] FALSE FALSE FALSE FALSE FALSE TRUE
Any idea what is going on or how I can compare the two? I already tried changing it to POSIXct and that works, but before comparing, I have operations to do on the data frame columns this data comes from (adding and subtracting), which can't be done with POSIXct. Also, it requires extra steps and this is meant to be a quick check to see if there are any discrepancies in the data.
I guess I can use as.character(x)==as.character(y), and it works, but there has to be a more elegant way of doing this...
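The differences shown above are on the order of 1e-17, which is what floating-point rounding error typically looks like. A minimal sketch of a tolerance-based comparison follows. It uses plain numeric vectors as a stand-in, on the assumption (suggested by the str() output) that 'times' objects store fractions of a day as doubles; the tolerance value is an arbitrary choice for illustration.

```r
# Two vectors that print the same but differ by rounding error.
x <- c(0.1 + 0.2, 0.5)
y <- c(0.3, 0.5)

x == y                    # exact comparison is fooled by rounding error
# [1] FALSE  TRUE

tol <- 1e-9               # assumed tolerance, far below one second (1/86400 of a day)
abs(x - y) < tol          # element-wise comparison within the tolerance
# [1] TRUE TRUE

isTRUE(all.equal(x, y))   # single TRUE/FALSE for the whole vector
# [1] TRUE
```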

Access transition matrix from markovchainFit object

I want to first calculate a Markov transition matrix and then raise it to a power. For the first step I use the markovchainFit function from the markovchain package, but it returns a data.frame rather than a matrix, so I need to convert it to a matrix before taking the power.
My R code snippet is like
#################################
# Estimate Transition Matrix #
#################################
setwd("G:/Data_backup/GDP_per_Capita")
library("foreign")
library("Hmisc")
mydata <- stata.get("G:/Data_backup/GDP_per_Capita/states.dta")
mydata
library(markovchain)
library(expm)
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
rgdp_e_trans<-as.matrix(rgdp_e_trans)
is.matrix(rgdp_e_trans)
rgdp_e_trans %^% 1/5
rgdp_e_trans is a data frame, and I try to convert it to a numeric matrix. The conversion seems to work when I test it with is.matrix. However, the final line gives me this error:
Error in rgdp_e_trans %^% 2 :
(list) object cannot be coerced to type 'double'
After some searching on Stack Overflow, I found this question describing a similar problem and used rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans)) to coerce the object to 'double', but it does not seem to work.
Besides, the data.frame rgdp_e_trans contains no factor or characters
The output in the console is like
> rgdp_e=mydata[,2:7]
> rgdp_o=mydata[,8:13]
> createSequenceMatrix(rgdp_e)
Error: not compatible with STRSXP
> rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
> rgdp_e_trans
$estimate
1 2 3 4 5
1 0.6172840 0.18930041 0.09053498 0.074074074 0.02880658
2 0.1125828 0.59602649 0.28476821 0.006622517 0.00000000
3 0.0000000 0.03846154 0.60256410 0.358974359 0.00000000
4 0.0000000 0.01162791 0.03488372 0.691860465 0.26162791
5 0.0000000 0.00000000 0.00000000 0.044247788 0.95575221
> rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
Error: (list) object cannot be coerced to type 'double'
> rgdp_e_trans<-as.matrix(rgdp_e_trans)
> is.matrix(rgdp_e_trans)
[1] TRUE
> rgdp_e_trans %^% 1/5
Error in rgdp_e_trans %^% 1 :
(list) object cannot be coerced to type 'double'
>
Any suggestion to fix the problem, or alternative way to calculate the exponent ? Thank you.
Additional:
> str(rgdp_e_trans)
List of 1
$ estimate:Formal class 'markovchain' [package "markovchain"] with 4 slots
.. ..@ states : chr [1:5] "1" "2" "3" "4" ...
.. ..@ byrow : logi TRUE
.. ..@ transitionMatrix: num [1:5, 1:5] 0.617 0.113 0 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..@ name : chr "Bootstrap Mc"
and I comment out the as.matrix part
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans
str(rgdp_e_trans)
# rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
# rgdp_e_trans<-as.matrix(rgdp_e_trans)
# is.matrix(rgdp_e_trans)
rgdp_e_trans$estimate %^% 1/5
You can access the transition matrix directly from the object returned by markovchainFit as:
rgdp_e_trans$estimate@transitionMatrix
Here rgdp_e_trans is your return value from markovchainFit, which is actually a list containing the information from the fitting process. You access the estimate item of that list by using the $ operator. The estimate object is from a formal S4 class (see e.g. Advanced R by Hadley Wickham for a description of the object systems used in R), which is why in order to access its slots you have to use the @ operator instead of the standard $ used for lists and the more common S3 objects.
If you print out the return value of as.matrix(rgdp_e_trans) it should be immediately obvious where your initial approach went wrong. In general it's a good idea to check the structure of an object with the str function - instead of relying on its print method - when you encounter unexpected results or are working with new types of objects.
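The list-then-slot access pattern can be tried without the markovchain package. The sketch below defines a toy S4 class (ToyChain and fit are hypothetical names, not part of markovchain) whose layout mimics the str() output in the question. The last lines also check how R parses an expression such as m %^% 1/5: custom %op% operators bind more tightly than /, so it is read as (m %^% 1) / 5.

```r
library(methods)

# Toy S4 class standing in for the 'markovchain' class (assumption for illustration).
setClass("ToyChain",
         representation(transitionMatrix = "matrix", name = "character"))

# A list with an S4 object inside, like the return value described above.
fit <- list(estimate = new("ToyChain",
                           transitionMatrix = matrix(c(0.9, 0.1,
                                                       0.5, 0.5),
                                                     nrow = 2, byrow = TRUE),
                           name = "toy"))

# `$` picks the list element, `@` picks the slot of the S4 object.
P <- fit$estimate@transitionMatrix
is.matrix(P)   # TRUE: a plain numeric matrix

# slot() is the functional equivalent of `@`.
identical(slot(fit$estimate, "transitionMatrix"), P)

# Two-step transition probabilities via ordinary matrix multiplication.
P2 <- P %*% P
rowSums(P2)    # each row still sums to 1

# Operator precedence: the outermost call in `m %^% 1/5` is the division.
quote(m %^% 1/5)[[1]]
```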

What is "{" class in R?

Here is the code:
mf = function(..., expr) {
expr = substitute(expr)
print(class(expr))
print(str(expr))
expr
}
mf(a = 1, b = 2, expr = {matrix(NA, 4, 4)})
Output:
[1] "{"
length 2 { matrix(NA, 4, 4) }
- attr(*, "srcref")=List of 2
..$ :Class 'srcref' atomic [1:8] 1 25 1 25 25 25 1 1
.. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
..$ :Class 'srcref' atomic [1:8] 1 26 1 41 26 41 1 1
.. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
- attr(*, "wholeSrcref")=Class 'srcref' atomic [1:8] 1 0 1 42 0 42 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
NULL
{
matrix(NA, 4, 4)
}
Apparently the result of substitute(expr) produces something of the class "{". What is this class exactly? Why is {matrix(NA, 4, 4)} of length 2? What do these strange attrs mean?
The { is the class for a block of code. Just looking at the classes, note the difference between these
mf(a = 1, b = 2, expr = {matrix(NA, 4, 4)})
# [1] "{"
mf(a = 1, b = 2, expr = matrix(NA, 4, 4))
# [1] "call"
A class of { can hold multiple statements. The length() indicates how many statements are in the block (including the start of the block). For example
length(quote({matrix(NA, 4, 4)}))
# [1] 2
length(quote({matrix(NA, 4, 4); matrix(NA,3,3)}))
# [1] 3
length(quote({}))
# [1] 1
The attributes "srcref" and "srcfile" are how R tracks where functions are defined for trying to give informative error messages. You can see the ?srcfile help page for more information about that.
What you're seeing is a reflection of the way R exposes its internal language structure through its own data structures.
The substitute() function returns the parse tree of an R expression. The parse tree is a tree of language elements. These can include literal values, symbols (basically variable names), function calls, and braced blocks. Here's a demonstration of all the R language elements as returned by substitute(), showing their types in all of R's type classification schemes:
tmc <- function(x) c(typeof(x),mode(x),class(x));
tmc(substitute(TRUE));
## [1] "logical" "logical" "logical"
tmc(substitute(4e5L));
## [1] "integer" "numeric" "integer"
tmc(substitute(4e5));
## [1] "double" "numeric" "numeric"
tmc(substitute(4e5i));
## [1] "complex" "complex" "complex"
tmc(substitute('a'));
## [1] "character" "character" "character"
tmc(substitute(somevar));
## [1] "symbol" "name" "name"
tmc(substitute(T));
## [1] "symbol" "name" "name"
tmc(substitute(sum(somevar)));
## [1] "language" "call" "call"
tmc(substitute(somevec[1]));
## [1] "language" "call" "call"
tmc(substitute(somelist[[1]]));
## [1] "language" "call" "call"
tmc(substitute(somelist$x));
## [1] "language" "call" "call"
tmc(substitute({blah}));
## [1] "language" "call" "{"
Notes:
Note how all three type classification schemes are very similar, but subtly different. This can be a source of confusion. typeof() gives the storage type of the object, sometimes called the "internal" type (to be honest, it probably shouldn't be called "internal" because it is frequently exposed very directly to the user at the R level, but it is often described that way; I would call it the "fundamental" or "underlying" type), mode() gives a similar classification scheme that everyone should probably ignore, and class() gives the implicit (if there's no class attribute) or explicit (if there is) class of the object, which is used for S3 method lookup (and, it should be said, is sometimes examined directly by R code, independent of the S3 lookup process).
Note how TRUE is a logical literal, but T is a symbol, just like any other variable name, and just happens to be assigned to TRUE by default (and ditto for F and FALSE). This is why sometimes people recommend against using T and F in favor of using TRUE and FALSE, because T and F can be reassigned (but personally I prefer to use T and F for the concision; no one should ever reassign those!).
The astute reader will notice that in my demonstration of literals, I've omitted the raw type. This is because there's no such thing as a raw literal in R. In fact, there are very few ways to get a hold of raw vectors in R; raw(), as.raw(), charToRaw(), and rawConnectionValue() are the only ways that I'm aware of, and if I used those functions in a substitute() call, they would be returned as "call" objects, just like in the sum(somevar) example, not literal raw values. The same can be said for the list type; there's no such thing as a list literal (although there are many ways to acquire a list via function calls). Plain raw vectors return 'raw' for all three type classifications, and plain lists return 'list' for all three type classifications.
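The points in these notes can be checked directly. This is a short sketch reusing the tmc() helper defined above (redefined here so the block runs on its own).

```r
tmc <- function(x) c(typeof(x), mode(x), class(x))

# TRUE is a logical literal, while T is an ordinary symbol.
is.logical(quote(TRUE))  # TRUE
is.symbol(quote(T))      # TRUE

# There is no raw literal: inside substitute(), as.raw(1) is just a call.
tmc(substitute(as.raw(1)))
# [1] "language" "call"     "call"

# Evaluated raw vectors and plain lists report the same name in all three schemes.
tmc(as.raw(1))
# [1] "raw" "raw" "raw"
tmc(list(1))
# [1] "list" "list" "list"
```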
Now, when you have a parse tree that is more complicated than a simple literal value or symbol (meaning it must be a function call or braced expression), you can generally examine the contents of that parse tree by coercing to list. This is how R exposes its internal language structure through its own data structures.
Diving into your example:
pt <- as.list(substitute({matrix(NA,4,4)}));
pt;
## [[1]]
## `{`
##
## [[2]]
## matrix(NA, 4, 4)
This makes it clear why length() returns 2: that's the length of the list that represents the parse tree. In general, the bracing of the expression is translated into the first list component, and the remaining list components are built from the semicolon-separated statements within the braces:
as.list(substitute({}));
## [[1]]
## `{`
##
as.list(substitute({a}));
## [[1]]
## `{`
##
## [[2]]
## a
##
as.list(substitute({a;b}));
## [[1]]
## `{`
##
## [[2]]
## a
##
## [[3]]
## b
##
as.list(substitute({a;b;c}));
## [[1]]
## `{`
##
## [[2]]
## a
##
## [[3]]
## b
##
## [[4]]
## c
Note that this is identical to how function calls work, except with the difference that, for function calls, the list components are formed from the comma-separated arguments to the function call:
as.list(substitute(sum()));
## [[1]]
## sum
##
as.list(substitute(sum(1)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
as.list(substitute(sum(1,3)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
as.list(substitute(sum(1,3,5)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 5
From the above it becomes clear that the first list component is actually a symbol representing the name of a function, for both braced expressions and function calls. In other words, the open brace is a function call, one which simply returns its final argument. Just as square brackets are normal function calls with a convenient syntax built on top of them, the open brace is a normal function call with a convenient syntax built on top of it:
a <- 4:6;
a[2];
## [1] 5
`[`(a,2);
## [1] 5
{1;2};
## [1] 2
`{`(1,2);
## [1] 2
Returning to your example, we can fully explore the parse tree by traversing the list structure that represents the parse tree. I just wrote a nice little recursive function that can do this very easily:
unwrap <- function(x) if (typeof(x) == 'language') lapply(as.list(x),unwrap) else x;
unwrap(substitute(3));
## [1] 3
unwrap(substitute(a));
## a
unwrap(substitute(a+3));
## [[1]]
## `+`
##
## [[2]]
## a
##
## [[3]]
## [1] 3
##
unwrap(substitute({matrix(NA,4,4)}));
## [[1]]
## `{`
##
## [[2]]
## [[2]][[1]]
## matrix
##
## [[2]][[2]]
## [1] NA
##
## [[2]][[3]]
## [1] 4
##
## [[2]][[4]]
## [1] 4
As you can see, the braced expression turns into a normal function call of the function `{`(), taking one argument, which is the single statement you coded into it. That statement consists of a single function call to matrix(), taking three arguments, each of which being a literal value: NA, 4, and 4. And that's the entire parse tree.
So now we can understand the meaning of the "{" class on a deep level: it represents an element of a parse tree that is a function call to the `{`() function. It happens to be classed differently from other function calls ("{" instead of "call"), but as far as I can tell, that has no significance anywhere. Also observe that the typeof() and mode() are identical ("language" and "call", respectively) between all parse tree elements representing function calls, for both `{`() and others alike.
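As a quick check of this, the sketch below shows that a braced block still passes is.call() despite its "{" class, and that a few other syntactic constructs get their own implicit class in the same way while sharing typeof "language".

```r
blk <- quote({ matrix(NA, 4, 4) })

class(blk)      # "{"
is.call(blk)    # TRUE: it is still a call, to the function `{`
typeof(blk)     # "language"
blk[[1]]        # the symbol `{`

# Other constructs with a special implicit class:
class(quote(if (a) b))   # "if"
class(quote(x <- 1))     # "<-"
class(quote((a)))        # "("
class(quote(f(x)))       # "call"
```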

Resources