Here is the code:
mf = function(..., expr) {
expr = substitute(expr)
print(class(expr))
print(str(expr))
expr
}
mf(a = 1, b = 2, expr = {matrix(NA, 4, 4)})
Output:
[1] "{"
length 2 { matrix(NA, 4, 4) }
- attr(*, "srcref")=List of 2
..$ :Class 'srcref' atomic [1:8] 1 25 1 25 25 25 1 1
.. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
..$ :Class 'srcref' atomic [1:8] 1 26 1 41 26 41 1 1
.. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
- attr(*, "wholeSrcref")=Class 'srcref' atomic [1:8] 1 0 1 42 0 42 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860>
NULL
{
matrix(NA, 4, 4)
}
Apparently the result of substitute(expr) produces something of the class "{". What is this class exactly? Why is {matrix(NA, 4, 4)} of length 2? What do these strange attrs mean?
The { is the class for a block of code. Just looking at the classes, note the difference between these
mf(a = 1, b = 2, expr = {matrix(NA, 4, 4)})
# [1] "{"
mf(a = 1, b = 2, expr = matrix(NA, 4, 4))
# [1] "call"
A class of { can hold multiple statements. The length() indicates how many statements are in the block (including the start of the block). For example
length(quote({matrix(NA, 4, 4)}))
# [1] 2
length(quote({matrix(NA, 4, 4); matrix(NA,3,3)}))
# [1] 3
length(quote({}))
# [1] 1
The attributes "srcref" and "srcfile" are how R tracks where functions are defined for trying to give informative error messages. You can see the ?srcfile help page for more information about that.
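To see that these attributes are only bookkeeping, here is a small sketch that parses the same braced expression with and without source references. It assumes parse(text = ..., keep.source = ...) behaves like the console parser, and the variable names are made up for illustration:

```r
# Parse the same braced expression twice, once keeping source references.
with_src    <- parse(text = "{matrix(NA, 4, 4)}", keep.source = TRUE)[[1]]
without_src <- parse(text = "{matrix(NA, 4, 4)}", keep.source = FALSE)[[1]]

# With keep.source = TRUE the block carries the srcref bookkeeping attributes.
names(attributes(with_src))

# Without it there are no attributes at all.
attributes(without_src)

# Either way it is the same two-element call to `{`.
length(with_src)
length(without_src)
```

So the attributes do not change what the expression is or how long it is.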
What you're seeing is a reflection of the way R exposes its internal language structure through its own data structures.
The substitute() function returns the parse tree of an R expression. The parse tree is a tree of language elements. These can include literal values, symbols (basically variable names), function calls, and braced blocks. Here's a demonstration of all the R language elements as returned by substitute(), showing their types in all of R's type classification schemes:
tmc <- function(x) c(typeof(x),mode(x),class(x));
tmc(substitute(TRUE));
## [1] "logical" "logical" "logical"
tmc(substitute(4e5L));
## [1] "integer" "numeric" "integer"
tmc(substitute(4e5));
## [1] "double" "numeric" "numeric"
tmc(substitute(4e5i));
## [1] "complex" "complex" "complex"
tmc(substitute('a'));
## [1] "character" "character" "character"
tmc(substitute(somevar));
## [1] "symbol" "name" "name"
tmc(substitute(T));
## [1] "symbol" "name" "name"
tmc(substitute(sum(somevar)));
## [1] "language" "call" "call"
tmc(substitute(somevec[1]));
## [1] "language" "call" "call"
tmc(substitute(somelist[[1]]));
## [1] "language" "call" "call"
tmc(substitute(somelist$x));
## [1] "language" "call" "call"
tmc(substitute({blah}));
## [1] "language" "call" "{"
Notes:
Note how all three type classification schemes are very similar, but subtly different. This can be a source of confusion. typeof() gives the storage type of the object, sometimes called the "internal" type (to be honest, it probably shouldn't be called "internal" because it is frequently exposed very directly to the user at the R level, but it is often described that way; I would call it the "fundamental" or "underlying" type), mode() gives a similar classification scheme that everyone should probably ignore, and class() gives the implicit (if there's no class attribute) or explicit (if there is) class of the object, which is used for S3 method lookup (and, it should be said, is sometimes examined directly by R code, independent of the S3 lookup process).
Note how TRUE is a logical literal, but T is a symbol, just like any other variable name, and just happens to be assigned to TRUE by default (and ditto for F and FALSE). This is why sometimes people recommend against using T and F in favor of using TRUE and FALSE, because T and F can be reassigned (but personally I prefer to use T and F for the concision; no one should ever reassign those!).
The astute reader will notice that in my demonstration of literals, I've omitted the raw type. This is because there's no such thing as a raw literal in R. In fact, there are very few ways to get a hold of raw vectors in R; raw(), as.raw(), charToRaw(), and rawConnectionValue() are the only ways that I'm aware of, and if I used those functions in a substitute() call, they would be returned as "call" objects, just like in the sum(somevar) example, not literal raw values. The same can be said for the list type; there's no such thing as a list literal (although there are many ways to acquire a list via function calls). Plain raw vectors return 'raw' for all three type classifications, and plain lists return 'list' for all three type classifications.
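A quick check of that last note, reusing the tmc() helper from above (redefined here so the snippet runs on its own):

```r
tmc <- function(x) c(typeof(x), mode(x), class(x))

# Plain raw vectors and plain lists agree across all three schemes.
raw_types  <- tmc(as.raw(255))
list_types <- tmc(list(1, "a"))

# Inside substitute(), as.raw(255) is not a literal, just an unevaluated call.
call_types <- tmc(substitute(as.raw(255)))

raw_types
list_types
call_types
```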
Now, when you have a parse tree that is more complicated than a simple literal value or symbol (meaning it must be a function call or braced expression), you can generally examine the contents of that parse tree by coercing to list. This is how R exposes its internal language structure through its own data structures.
Diving into your example:
pt <- as.list(substitute({matrix(NA,4,4)}));
pt;
## [[1]]
## `{`
##
## [[2]]
## matrix(NA, 4, 4)
This makes it clear why length() returns 2: that's the length of the list that represents the parse tree. In general, the bracing of the expression is translated into the first list component, and the remaining list components are built from the semicolon-separated statements within the braces:
as.list(substitute({}));
## [[1]]
## `{`
##
as.list(substitute({a}));
## [[1]]
## `{`
##
## [[2]]
## a
##
as.list(substitute({a;b}));
## [[1]]
## `{`
##
## [[2]]
## a
##
## [[3]]
## b
##
as.list(substitute({a;b;c}));
## [[1]]
## `{`
##
## [[2]]
## a
##
## [[3]]
## b
##
## [[4]]
## c
Note that this is identical to how function calls work, except with the difference that, for function calls, the list components are formed from the comma-separated arguments to the function call:
as.list(substitute(sum()));
## [[1]]
## sum
##
as.list(substitute(sum(1)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
as.list(substitute(sum(1,3)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
as.list(substitute(sum(1,3,5)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 5
From the above it becomes clear that the first list component is actually a symbol representing the name of a function, for both braced expressions and function calls. In other words, the open brace is a function call, one which simply returns its final argument. Just as square brackets are normal function calls with a convenient syntax built on top of them, the open brace is a normal function call with a convenient syntax built on top of it:
a <- 4:6;
a[2];
## [1] 5
`[`(a,2);
## [1] 5
{1;2};
## [1] 2
`{`(1,2);
## [1] 2
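The coercion also works in the other direction, which may make the "it's all just function calls" point more concrete. This is only a sketch; the variable names are invented for the example:

```r
# Build a call from a plain list: function name first, then the arguments.
cl <- as.call(list(as.name("sum"), 1, 3))
cl
sum_result <- eval(cl)

# A braced block can be assembled the same way, with `{` as the function.
block <- as.call(list(as.name("{"), quote(tmp_val <- 10), quote(tmp_val * 2)))
class(block)

# Evaluate in a fresh environment so tmp_val does not leak into the workspace.
block_result <- eval(block, new.env())
block_result
```

As with a hand-written block, evaluating it returns the value of the last statement.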
Returning to your example, we can fully explore the parse tree by traversing the list structure that represents the parse tree. I just wrote a nice little recursive function that can do this very easily:
unwrap <- function(x) if (typeof(x) == 'language') lapply(as.list(x),unwrap) else x;
unwrap(substitute(3));
## [1] 3
unwrap(substitute(a));
## a
unwrap(substitute(a+3));
## [[1]]
## `+`
##
## [[2]]
## a
##
## [[3]]
## [1] 3
##
unwrap(substitute({matrix(NA,4,4)}));
## [[1]]
## `{`
##
## [[2]]
## [[2]][[1]]
## matrix
##
## [[2]][[2]]
## [1] NA
##
## [[2]][[3]]
## [1] 4
##
## [[2]][[4]]
## [1] 4
As you can see, the braced expression turns into a normal function call of the function `{`(), taking one argument, which is the single statement you coded into it. That statement consists of a single function call to matrix(), taking three arguments, each of which is a literal value: NA, 4, and 4. And that's the entire parse tree.
So now we can understand the meaning of the "{" class on a deep level: it represents an element of a parse tree that is a function call to the `{`() function. It happens to be classed differently from other function calls ("{" instead of "call"), but as far as I can tell, that has no significance anywhere. Also observe that the typeof() and mode() are identical ("language" and "call", respectively) between all parse tree elements representing function calls, for both `{`() and others alike.
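The brace is not the only call that gets its own class. As documented in ?class, a handful of syntactic constructs are classed by their function name, while typeof() stays "language" for all of them. A small sketch (the list names are just labels I picked):

```r
exprs <- list(
  brace  = quote({ x }),
  paren  = quote((x)),
  assign = quote(x <- 1),
  cond   = quote(if (x) y),
  loop   = quote(for (i in x) y),
  plain  = quote(f(x))
)

# Implicit class differs for the special syntactic forms...
classes <- sapply(exprs, class)
classes

# ...but the underlying type is the same for every one of them.
types <- sapply(exprs, typeof)
types
```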
Total R noob here.
I am having difficulty creating a list of stock tickers.
Here's the situation:
I've created a dataframe of tickers pulled in from Quandl's API.
x1<-Quandl.datatable('SHARADAR/SF1',paginate=TRUE,
qopts.columns=c('ticker'))
I then try to put this dataframe into a list.
x2<-as.list(x1)
So that I can then use the API to pull data for all the tickers in the list.
x3<-Quandl.datatable('SHARADAR/SF1',paginate=TRUE,
qopts.columns=c('ticker','dimension','datekey','revenue'),
dimension='ART', calendardate='2015-12-31',ticker=c(x2))
But, alas, this doesn't work.
Compare this, however, with when I pull specific tickers:
Quandl.datatable('SHARADAR/SF1', ticker=c('AAPL', 'TSLA'))
z = list('AAPL','TSLA')
The code behaves itself:
x3<-Quandl.datatable('SHARADAR/SF1',paginate=TRUE,
qopts.columns=c('ticker','dimension','datekey','revenue'),
dimension='ART', calendardate='2015-12-31',ticker=z)
This is because each ticker is its own component in the list(z):
[[1]]
[1] "AAPL"
[[2]]
[1] "TSLA"
Whereas for x2 all the tickers are stored as a single list component:
[1] "AAPL", "TSLA", etc.
So it would be swell if I could find a way to convert x2 into a list where each element is its own component.
Thanks a bunch (and for your patience as well!)
This should work:
x = sapply(1:5000, list)
The length is 5000:
length(x)
[1] 5000
All elements are integers:
all(sapply(x, is.integer) == TRUE)
[1] TRUE
This also works with character vectors:
sapply(c('AAPL', 'MSFT', 'AMZN'), list)
$AAPL
[1] "AAPL"
$MSFT
[1] "MSFT"
$AMZN
[1] "AMZN"
One option could be as:
x1 <- c(list(),1:5000)
str(x1)
# List of 5000
# $ : int 1
# $ : int 2
# $ : int 3
# $ : int 4
# $ : int 5
# $ : int 6
# $ : int 7
# $ : int 8
#...
#.....
x1 is a one column data frame. Because a data.frame really is a list under the hood, as.list() just gives you a list of columns, in this case list(x1$column1).
You need to run as.list on a vector to get the result you want. Either of these will work:
as.list(x1$your_column_name)
as.list(x1[["your_column_name"]])
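Here is a small self-contained illustration of the difference. The data frame and its ticker column are stand-ins I made up, since I can't run the Quandl call here:

```r
# Hypothetical one-column data frame standing in for the Quandl result.
tickers_df <- data.frame(ticker = c("AAPL", "TSLA", "MSFT"),
                         stringsAsFactors = FALSE)

# as.list() on the data frame: one component holding the whole column.
by_column <- as.list(tickers_df)
length(by_column)

# as.list() on the column itself: one component per ticker.
by_element <- as.list(tickers_df$ticker)
length(by_element)
str(by_element)
```

Since the working example in the question passes ticker = c('AAPL', 'TSLA'), the plain character vector tickers_df$ticker may already be enough, but I have not tested that against the API.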
I am trying to dive into the internals of static code analysis packages like codetools and CodeDepends, and my immediate goal is to understand how to detect function calls written as package_name::function_name() or package_name:::function_name(). I would have liked to just use findGlobals() from codetools, but this is not so simple.
Example function to analyze:
f <- function(n){
tmp <- digest::digest(n)
stats::rnorm(n)
}
Desired functionality:
analyze_function(f)
## [1] "digest::digest" "stats::rnorm"
Attempt with codetools:
library(codetools)
f = function(n) stats::rnorm(n)
findGlobals(f, merge = FALSE)
## $functions
## [1] "::"
##
## $variables
## character(0)
CodeDepends comes closer, but I am not sure I can always use the output to match functions to packages. I am looking for an automatic rule that connects rnorm() to stats and digest() to digest.
library(CodeDepends)
getInputs(body(f))
## An object of class "ScriptNodeInfo"
## Slot "files":
## character(0)
##
## Slot "strings":
## character(0)
##
## Slot "libraries":
## [1] "digest" "stats"
##
## Slot "inputs":
## [1] "n"
##
## Slot "outputs":
## [1] "tmp"
##
## Slot "updates":
## character(0)
##
## Slot "functions":
## { :: digest rnorm
## NA NA NA NA
##
## Slot "removes":
## character(0)
##
## Slot "nsevalVars":
## character(0)
##
## Slot "sideEffects":
## character(0)
##
## Slot "code":
## {
## tmp <- digest::digest(n)
## stats::rnorm(n)
## }
EDIT To be fair to CodeDepends, there is so much customizability and power for those who understand the internals. At the moment, I am just trying to wrap my head around collectors, handlers, walkers, etc. Apparently, it is possible to modify the standard :: collector to make special note of each namespaced call. For now, here is a naive attempt at something similar.
col <- inputCollector(`::` = function(e, collector, ...){
collector$call(paste0(e[[2]], "::", e[[3]]))
})
getInputs(quote(stats::rnorm(x)), collector = col)@functions
stats::rnorm rnorm
NA NA
If you want to extract namespaced functions from a function, try something like this
find_ns_functions <- function(f, found=c()) {
if( is.function(f) ) {
# function, begin search on body
return(find_ns_functions(body(f), found))
} else if (is.call(f) && deparse(f[[1]]) %in% c("::", ":::")) {
found <- c(found, deparse(f))
} else if (is.recursive(f)) {
# compound object, iterate through sub-parts
v <- lapply(as.list(f), find_ns_functions, found)
found <- unique( c(found, unlist(v) ))
}
found
}
And we can test with
f <- function(n){
tmp <- digest::digest(n)
stats::rnorm(n)
}
find_ns_functions(f)
# [1] "digest::digest" "stats::rnorm"
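If you need the package and the function as separate fields (to match rnorm() to stats, say), a variation on the same walk could return a data frame instead of deparsed strings. This is only a sketch in base R: find_ns_pairs is a made-up name, and it will likely fail on calls with empty arguments such as x[, 1]:

```r
find_ns_pairs <- function(x) {
  if (is.function(x)) x <- body(x)
  if (!is.call(x)) return(NULL)          # symbols and constants: nothing to report
  head <- x[[1]]
  if (is.symbol(head) && as.character(head) %in% c("::", ":::")) {
    # x is pkg::fun or pkg:::fun; record its pieces
    return(data.frame(pkg = as.character(x[[2]]),
                      fun = as.character(x[[3]]),
                      op  = as.character(head),
                      stringsAsFactors = FALSE))
  }
  # otherwise recurse into every part of the call and stack the results
  parts <- lapply(as.list(x), find_ns_pairs)
  unique(do.call(rbind, parts))
}

f <- function(n){
  tmp <- digest::digest(n)
  stats::rnorm(n)
}
ns_pairs <- find_ns_pairs(f)
ns_pairs
```

Nothing is evaluated, so the digest package does not need to be installed for this to run.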
Ok, so this was possible with CodeDepends previously, but a bit harder than it should have been. I've just committed version 0.5-4 to github, which now makes this really "easy". Essentially you just need to modify the default colonshandlers ("::" and/or ":::") as follows:
library(CodeDepends) # version >= 0.5-4
handler = function(e, collector, ..., iscall = FALSE) {
collector$library(asVarName(e[[2]]))
## :: or ::: name, remove if you don't want to count those as functions called
collector$call(asVarName(e[[1]]))
if(iscall)
collector$call(deparse((e))) #whole expr ie stats::rnorm
else
collector$vars(deparse((e)), input=TRUE) #whole expr ie stats::rnorm
}
getInputs(quote(stats::rnorm(x,y,z)), collector = inputCollector("::" = handler))
getInputs(quote(lapply( 1:10, stats::rnorm)), collector = inputCollector("::" = handler))
The first getInputs call above gives the result:
An object of class "ScriptNodeInfo"
Slot "files":
character(0)
Slot "strings":
character(0)
Slot "libraries":
[1] "stats"
Slot "inputs":
[1] "x" "y" "z"
Slot "outputs":
character(0)
Slot "updates":
character(0)
Slot "functions":
:: stats::rnorm
NA NA
Slot "removes":
character(0)
Slot "nsevalVars":
character(0)
Slot "sideEffects":
character(0)
Slot "code":
stats::rnorm(x, y, z)
As, I believe, desired.
One thing to note here is the iscall argument I've added to the colons handler. The default handler and applyhandlerfactory now have special logic so that when they invoke one of the colons handlers in a situation where it is a function being called, that is set to TRUE.
I haven't done extensive testing yet of what will happen when "stats::rnorm" appears in lieu of symbols, particularly in the inputs slot when calculating dependencies, but I'm hopeful that should all continue to work as well. If it doesn't let me know.
~G
I am writing my bachelor thesis and I don't have much experience with R so far.
My problem is that my dates, which I made with this command:
t<-strptime(x, "%d.%m.%Y %H.%M")
don't work anymore when I save them in a matrix with the other information on those specific dates.
I am a bit confused, because they work just fine when I don't put them in a matrix, for example t[1:10].
But the error below appears as soon as I try to save them in a matrix:
matrix1<-matrix(c(t,v2,v3,v4),nrow=length(v2))
Fehler in as.POSIXct.numeric(X[[i]], ...) : 'origin' muss angegeben werden
It's German but it means origin must be supplied.
Any ideas what I have to do to fix it? I am a bit frustrated :)
Roland is right. You can't have POSIXlt objects in a matrix. What you can do is save those dates as numeric timestamps in the matrix and convert them back to dates when you access them.
Converting to numeric timestamp:
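> date <- as.numeric(as.POSIXct("2014-02-16 02:13:46", tz = "UTC"))
> date
[1] 1392516826
Then save those timestamps in the matrix as you do now. To convert one back to a date, pass the number to as.POSIXct() together with an origin, e.g. as.POSIXct(date, origin = "1970-01-01", tz = "UTC"). The missing origin is exactly what the error message is complaining about.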
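A minimal round trip, assuming you are happy to work in UTC (the second matrix column is made-up data standing in for v2):

```r
# Date-time -> seconds since 1970-01-01 (UTC)
stamp <- as.POSIXct("2014-02-16 02:13:46", tz = "UTC")
secs  <- as.numeric(stamp)

# A numeric matrix can hold the timestamp next to other numbers.
m <- matrix(c(secs, 42.5), nrow = 1)

# Seconds -> date-time again; a bare number needs an explicit origin.
back <- as.POSIXct(m[1, 1], origin = "1970-01-01", tz = "UTC")
format(back, "%Y-%m-%d %H:%M:%S")
```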
t (terrible name by the way, easily confused with the t function) is a POSIXlt object, which internally is a list. First you should check, what c(t,v2,v3,v4) returns (I don't know how v2 etc are defined).
Then we can look into the documentation in help("matrix"):
data
an optional data vector (including a list or expression vector). Non-atomic classed R objects are coerced by as.vector and all attributes discarded.
The important bit is "all attributes discarded". This is what you get if you discard the attributes (which include the class attribute) of a POSIXlt object:
x <- strptime(c("2016-05-09 12:00:00", "2016-05-09 13:00:00"), format = "%Y-%m-%d %H:%M:%S")
attributes(x) <- NULL
print(x)
# [[1]]
# [1] 0 0
#
# [[2]]
# [1] 0 0
#
# [[3]]
# [1] 12 13
#
# [[4]]
# [1] 9 9
#
# [[5]]
# [1] 4 4
#
# [[6]]
# [1] 116 116
#
# [[7]]
# [1] 1 1
#
# [[8]]
# [1] 129 129
#
# [[9]]
# [1] 1 1
#
# [[10]]
# [1] "CEST" "CEST"
#
# [[11]]
# [1] NA NA
A matrix can't contain POSIXlt objects (or any objects, i.e., anything with an explicit class).
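If the goal is just to keep the dates next to the other measurements, a data frame is probably the more natural container, because each column keeps its own class. A sketch with made-up values in the question's date format (v2 here is invented data, and the UTC time zone is my assumption):

```r
# Parse with the question's format, then convert POSIXlt -> POSIXct.
when <- as.POSIXct(strptime(c("09.05.2016 12.00", "09.05.2016 13.00"),
                            "%d.%m.%Y %H.%M", tz = "UTC"))
v2 <- c(1.5, 2.5)

# Each column keeps its class, so the dates stay dates.
obs <- data.frame(when = when, v2 = v2)
class(obs$when)

# Date arithmetic still works on the stored column.
gap_hours <- as.numeric(difftime(obs$when[2], obs$when[1], units = "hours"))
gap_hours
```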
Suppose I create a list of functions in my R workspace, and the same functions are also defined in an R file. After source(), I would expect each sourced function to be identical to the corresponding function in the list, but this is not the case.
Example:
The f.R file contains f <- function(x) x^2.
In R console:
lst <- list(f=function(x) x^2)
source("f.R")
> ls()
[1] "f" "lst"
> identical(f,lst$f)
[1] FALSE
> str(f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 6 1 20 6 20 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1b2fd60>
> str(lst$f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 16 1 30 16 30 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1bb4b50>
I also tried:
> identical(f,lst$f, ignore.environment=TRUE)
[1] FALSE
> all.equal(f,lst$f)
[1] TRUE
Why is that?
EDIT:
In general:
f <- function(x) x^2
g <- function(x) x^2
identical(f,g)
[1] FALSE
Why would the attributes of f and g be different? Does this suggest one should never use identical to test equality between function objects ?
You've proved this yourself:
> str(f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 6 1 20 6 20 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1b2fd60>
> str(lst$f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 16 1 30 16 30 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x1bb4b50>
These attributes are not identical, therefore the objects are not identical.
More info about these attributes is here:
What/Where are the attributes of a function object?
These attributes allow R to refer to the function's source code. Your two functions have been defined in different places, and so their source code references (such as which file they were defined in) are different.
The easiest difference to see is the srcfile attribute of the srcref:
> attr(attr(f,"srcref"),"srcfile")
f.R
> attr(attr(lst$f,"srcref"),"srcfile")
>
Your f function is defined in f.R, so R keeps that in the attribute. Your lst$f is defined at the command line, so has a blank srcfile attribute.
Other parts of the attribute are also different because of how the code is parsed.
Specifically, the srcfile environment attribute is different:
> f <- function(x) x^2
> g <- function(x) x^2
> identical(f,g)
[1] FALSE
> print.default(attr(attr(g,"srcref"),"srcfile"))
<environment: 0x5b9fe40>
attr(,"class")
[1] "srcfilecopy" "srcfile"
> print.default(attr(attr(f,"srcref"),"srcfile"))
<environment: 0x5ba1028>
attr(,"class")
[1] "srcfilecopy" "srcfile"
I would say testing function objects for identity is probably not a good thing to do. The only time I can reliably believe two functions to be identical is if one has been created by assignment from the other, e.g. f2 = f1 implies identical(f1,f2).
Any other way of creating a function is prone to differences in srcref, enclosing environment, and even writing things with or without brackets.
Note all this is avoided if you set keep.source to FALSE.
> options(keep.source=TRUE)
> f <- function(x) x^2
> g <- function(x) x^2
> identical(f,g)
[1] FALSE
> options(keep.source=FALSE)
> f <- function(x) x^2
> g <- function(x) x^2
> identical(f,g)
[1] TRUE
But I still think you need to think very carefully about what constitutes identical functions for your purposes.
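If what you care about is "same code" rather than "same object", one option is to strip the source references before comparing, or to compare only the arguments and the body. This is a sketch, not a complete definition of equality for functions: same_code is a name I made up, and eval(parse(..., keep.source = TRUE)) is used only to mimic definitions that keep their source. As far as I know, newer versions of R also give identical() an ignore.srcref argument, so check ?identical on your version.

```r
# Two separately parsed definitions of the same code.
f <- eval(parse(text = "function(x) x^2", keep.source = TRUE))
g <- eval(parse(text = "function(x) x^2", keep.source = TRUE))

# Drop the srcref bookkeeping, then compare what is left.
same_without_source <- identical(utils::removeSource(f), utils::removeSource(g))
same_without_source

# Coarser check: only arguments and body, ignoring environment and attributes.
same_code <- function(a, b) {
  identical(formals(a), formals(b)) && identical(body(a), body(b))
}
same_code(f, g)
same_code(f, function(x) x^3)
```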
As for the ignore.environment argument, that ignores the environment in which the function was created. I'll write a function that creates functions to illustrate this:
foo creates a "multiplier" function that multiplies its argument by x:
> foo=function(x){print(x);function(z){z*x}}
So I can make a times-1 function and a times-2 function:
> f1 = foo(1)
[1] 1
> f2 = foo(2)
[1] 2
These functions are storing the local x in their environment. They are not identical by default:
> identical(f1,f2)
[1] FALSE
But are identical if you ignore the environment in which they were created:
> identical(f1,f2,ignore.environment=TRUE)
[1] TRUE
If you print them, they look identical (and I've set options(keep.source=FALSE) already) except for the environment hex-code reference:
> f1
function (z)
{
z * x
}
<environment: 0x8e4ea54>
> f2
function (z)
{
z * x
}
<environment: 0x8e4fb5c>
That's the environment ignored by ignore.environment.
I have a list of list subgame[[i]]$Weight of this type:
[[1]]
[1] 0.4720550 0.4858826 0.4990469 0.5115899 0.5235512 0.5349672 0.5458720
[8] 0.5562970 0.5662715 0.5758226 0.5849754 0.5937532 0.6021778 0.6102692
[15] 0.6180462 0.6255260 0.6327250 0.6396582 0.6463397 0.6527826
[[2]]
[1] 0.4639948 0.4779027 0.4911519 0.5037834 0.5158356 0.5273443 0.5383429
[8] 0.5488623 0.5589313 0.5685767 0.5778233 0.5866943 0.5952111 0.6033936
[15] 0.6112605 0.6188291 0.6261153 0.6331344 0.6399002 0.6464260
[[3]]
[1] 0.4629488 0.4768668 0.4901266 0.5027692 0.5148329 0.5263534 0.5373639
[8] 0.5478953 0.5579764 0.5676339 0.5768926 0.5857755 0.5943041 0.6024984
[15] 0.6103768 0.6179568 0.6252543 0.6322844 0.6390611 0.6455976
What I am looking for is to access all the j-th elements of every list. Example if j=1 I must get:
>0.4720550 0.4639948 0.4629488
How can I do it?
I found
sapply(1:length(subgame[[i]]$Weight),function(k) subgame[[i]]$Weight[[k]][1])
But that seems too tricky to me.
Is there a more elegant way?
If j=1, then you're interested in subgame[[i]]$Weight[[1]][1], subgame[[i]]$Weight[[2]][1], and subgame[[i]]$Weight[[3]][1]. In other words, you want to use [1] on each list element.
But what happens when you subset a vector? For example:
(x <- rnorm(5))
# [1] -1.8965529 0.4688618 0.6588774 0.2749539 0.1829046
x[3]
# [1] 0.6588774
[ is actually a function, and it gets called in this situation. You can read a bit more about it with ?"[", but the point is that you can call it like any other function. Its first argument will be the object to subset, then you can pass it the index (or indices) you're interested in (along with some other arguments that the help page discusses):
x[3]
# [1] 0.6588774
`[`(x, 3)
# [1] 0.6588774
Note the backticks surrounding the name. A bare [ will throw an error, so you need to quote it. The same goes for other functions like +.
So if you want to get the first element of each list element, you can apply [ to each element of the list, passing it 1 or whatever j is:
sapply(subgame[[i]]$Weight, `[`, 1)
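If you want the result type to be guaranteed, vapply() is a stricter alternative to sapply(). A sketch on a small stand-in list (weights and its values are made up, shortened from the question's data):

```r
# Stand-in for subgame[[i]]$Weight: a list of numeric vectors.
weights <- list(c(0.47, 0.49, 0.50),
                c(0.46, 0.48, 0.49),
                c(0.46, 0.48, 0.49))
j <- 1

# Apply `[` to each vector; numeric(1) declares one number per element.
jth <- vapply(weights, `[`, numeric(1), j)
jth
```

vapply() throws an error instead of silently returning a list if some element does not yield exactly one number.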
I would like to add a solution which returns the result you want for the Weight list of each element of your subgame list.
> subgame <- list(list(weight = list(c(1, 2), c(3, 4), c(5, 6))), list(weight = list(c(7, 8), c(9, 10), c(11, 12))))
>
> j = 1
>
> do.call(rbind, subgame[[1]]$weight)[,j]
[1] 1 3 5
>
> lapply(subgame, function(x) {do.call(rbind, x$weight)[,j]})
[[1]]
[1] 1 3 5
[[2]]
[1] 7 9 11