I come from a Python background and am trying to get up to speed with R, so please bear with me.
I have an R file, util.R, with the following lines:
util.add <- function(a,b) a + b
util.sub <- function(a,b) { a - b }
I source it as follows:
source('path/util.R')
I now have two function objects and want to write a function as follows:
getFilePath(util.add)
that would give me this result
[1] "path/util.R"
Digging into the srcref attribute of one of the loaded functions appears to work, if you go deep enough ...
source("tmp/tmpsrc.R")
str(util.add)
## function (a, b)
## - attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 13 1 31 13 31 1 1
## .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x8fffb18>
srcfile <- attr(attr(util.add,"srcref"),"srcfile")
ls(srcfile)
## [1] "Enc" "filename" "fixedNewlines" "isFile"
## [5] "lines" "parseData" "timestamp" "wd"
srcfile$filename
## [1] "tmp/tmpsrc.R"
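Wrapping that lookup in the getFilePath() helper the question asks for could look like the sketch below. It assumes the function was sourced with keep.source = TRUE (the default in interactive sessions, but not under Rscript). The temporary file and the env environment are not part of the original question; they are only there to make the example self-contained.

```r
# Sketch: return the file a function was sourced from, or NA if unknown.
getFilePath <- function(f) {
  srcref <- attr(f, "srcref")
  if (is.null(srcref)) return(NA_character_)  # no source reference kept
  attr(srcref, "srcfile")$filename
}

# Self-contained demo: write a stand-in for util.R to a temporary file.
path <- tempfile(fileext = ".R")
writeLines(c("util.add <- function(a,b) a + b",
             "util.sub <- function(a,b) { a - b }"), path)

env <- new.env()                               # keeps the demo out of the workspace
source(path, local = env, keep.source = TRUE)  # keep.source is needed for srcref

getFilePath(env$util.add)  # the path that was passed to source()
getFilePath(sum)           # NA: primitives carry no srcref
```

The result is whatever string was handed to source(), so a relative path stays relative; normalizePath() can be applied afterwards if an absolute path is needed.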
I know this was solved years ago, but I've just come across it and realised that there is a bit more to this if you use the body() function.
The raw function has only one attribute, "srcref", which contains the code of the function, along with its own attributes and a class of "srcref" (which dictates how it gets printed).
The body() of a function written with braces, such as body(util.sub), has three attributes (a body without braces, like that of util.add, carries none).
"srcref" which contains the body of the function stored as a list of expressions.
"srcfile" which contains the source file of the function (which is what you are looking for in this question)
"wholeSrcref" which points to the entire source file.
This gives you an alternative (although slightly slower) way to extract the source file name, attr(body(util.sub),"srcfile")$filename, and also lets you see (although not interact with) the sibling functions, i.e. the other functions loaded from the same source file.
Not sure if it's useful, but it could be interesting.
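For anyone who wants to poke at those attributes, here is a small sketch. It assumes the file is sourced with keep.source = TRUE and uses a braced function, since the parser attaches these attributes to the `{` call. The temporary file and the env environment are illustrative only.

```r
# Write a stand-in for util.R and source it with source references kept.
path <- tempfile(fileext = ".R")
writeLines(c("util.add <- function(a,b) a + b",
             "util.sub <- function(a,b) { a - b }"), path)
env <- new.env()
source(path, local = env, keep.source = TRUE)

b <- body(env$util.sub)       # the `{` call that forms the function body
names(attributes(b))          # lists the attributes attached to the body
attr(b, "srcfile")$filename   # the file name, reached through the body
```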
Let's also not forget about the %@% infix operator for accessing attributes from the {purrr} package. With it we could use the more succinct (although, again, slower) code:
util.add%@%srcref%@%srcfile
I am using a parent function to generate a child function by returning the function from the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function, I cannot figure out which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)
If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!¹ You should therefore leave it out: this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R-specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Subsequent modifications of the variable two are reflected inside square (but only the first time! Further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
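To see the two variants side by side, here is a quick sketch. The names power_lazy and power_forced are made up for this comparison and are not from the question.

```r
# Lazy version: y stays an unevaluated promise until the closure first uses it.
power_lazy <- function(y) {
  function(x) x ^ y
}

# Forced version: y is evaluated as soon as power_forced() is called.
power_forced <- function(y) {
  force(y)
  function(x) x ^ y
}

two <- 2
square_lazy   <- power_lazy(two)
square_forced <- power_forced(two)
two <- 10          # changed before either closure has been called

square_lazy(5)     # 9765625, because y is only now evaluated and sees two == 10
square_forced(5)   # 25, because y was fixed at 2 when power_forced() ran
```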
¹ Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function execution.
The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.
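If you want something like find() that does follow the scoping chain, a minimal sketch is to walk parent.env() by hand until the name turns up. The helper name where_is is hypothetical (it is not a base R function), and this version adds force(y) to power as a precaution.

```r
# Walk up the chain of enclosing environments, starting at `env`, and return
# the first environment that defines `name` (or NULL if none does).
where_is <- function(name, env = parent.frame()) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) return(env)
    env <- parent.env(env)
  }
  NULL
}

power <- function(y) {
  force(y)
  function(x) x ^ y
}
square <- power(2)

found <- where_is("y", environment(square))
identical(found, environment(square))  # TRUE: y lives in the closure's environment
ls(found)                              # "y"
```

Inside the browser you could call where_is("y") directly, since its default starting point is the calling frame.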
Consider having many *.Rda files in your directory. They all contain exactly one object (in this case, a model obtained from mboost:::gamboost) with the added twist, that the objects have the same name ("mod_gam").
Is it possible to load all of them into workspace at once (and even renaming them)?
temp <- list.files(pattern="*.Rda")
models <- lapply(temp, load)
only yields a list of the names of the loaded objects, not the objects themselves:
str(models)
List of 26
$ : chr "mod_gam"
$ : chr "mod_gam"
$ : chr "mod_gam"
... and so on.
My suggestion would be to add an iterative suffix to your objects as they are loaded in. Since you already know that every object loaded in will be called "mod_gam", it makes things a bit easier.
i <- 1
for(each in temp){
load(each)
eval(parse(text=paste(paste0("mod_gam_",i),"<- mod_gam")))
i <- i+1
}
This will give you the 26 different objects. Note that this isn't optimal -- I wanted to lapply instead of loop, but I was having trouble figuring out how to iterate the suffix each time I read in a new file.
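For what it's worth, one way to get the lapply version is to load each file into its own throwaway environment and return the object from there, so the identical names never collide. This is only a sketch: the helper name load_one, the file names model_1.Rda etc. and the dummy numeric "models" are invented to keep it self-contained.

```r
# Create a few example files, each holding one object called mod_gam.
dir <- tempfile("models_")
dir.create(dir)
for (i in 1:3) {
  mod_gam <- i * 10                      # stand-in for a fitted model
  save(mod_gam, file = file.path(dir, paste0("model_", i, ".Rda")))
}

# Load one file into a private environment and hand back its first object.
load_one <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)      # load() returns the object names
  env[[loaded[1]]]
}

files  <- list.files(dir, pattern = "\\.Rda$", full.names = TRUE)
models <- lapply(files, load_one)
names(models) <- tools::file_path_sans_ext(basename(files))

str(models)
```

Keeping the models in a named list also avoids the eval(parse()) step, and individual models are still reachable as models$model_1 and so on.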
I was hoping to use R's reflection capabilities to intercept the current expression under evaluation before it is evaluated.
For instance, to create some syntax sugar, given the following:
> Server <- setRefClass("Server",
> methods = list(
> handler = function(expr) submitExpressionToRemoteServer(expr)
> )
> )
> server <- Server()
> server$foo$bar$baz #... should map to... server$handler("foo$bar$baz")
I want the expression server$foo$bar$baz to be intercepted by the server$handler method and get mapped to server$handler("foo$bar$baz").
Note that I want this call to succeed even though server$foo is not defined: I am interested only in the expression itself (so I can do stuff with the expression), not that it evaluates to a valid local object.
Is this possible?
I don't think it is possible to redefine the $ behavior of Reference Class (R5) objects in R. However, this is something that you can do with S4 classes. The main problem is that an expression like
server$foo$bar$baz
would get translated to a series of calls like
$($($(server,"foo"),"bar"),"baz")
but unlike normal function nesting, each inner call appears to be fully evaluated before going to the next level of nesting. Thus it's not really possible just to split up everything after the first $, because that's not how it's parsed. However, you can have the $ function return another object and append all the values sent to the object. Here's a sample S4 class:
setClass("Server", slots=list(el="character"))
setMethod("$", signature(x="Server"),
function(x,name) {
xx <- append(slot(x,"el"),name)
new("Server", el=xx)
}
)
server <- new("Server")
server$foo$bar$baz
# An object of class "Server"
# Slot "el":
# [1] "foo" "bar" "baz"
The only problem is that I haven't found a way to know when you're at the end of the chain, if you wanted to do anything with those parameters.
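One possible workaround, offered only as a sketch, is to end the chain with an explicit call that turns the collected names back into the expression string. The helper collect_path below is hypothetical, and the real submission to a remote server is left out.

```r
library(methods)

# Same idea as above: every `$` returns a new object with the name appended.
setClass("Server", slots = list(el = "character"))
setMethod("$", signature(x = "Server"),
  function(x, name) {
    new("Server", el = c(x@el, name))
  }
)

# Hypothetical terminator: rebuild the expression text from the recorded names.
collect_path <- function(x) paste(x@el, collapse = "$")

server <- new("Server")
collect_path(server$foo$bar$baz)   # "foo$bar$baz"
```

The resulting string is what would then be passed on to something like the handler in the question.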
When I have a SpatialPolygonsDataFrame object, I know that I can get to the data two ways:
spatial_df@data$column
spatial_df$column
However, I don't understand why the second way is possible. I thought that I had to access the data slot using @? Is this something unique to the SpatialPolygonsDataFrame class, or is it something about S4 objects in general?
One possible answer is in the sp documentation, which mentions the method [ for the SpatialPolygonsDataFrame class. However, since $ is equivalent to [[, NOT to [, I'm not sure that's the answer.
The short answer is that this behavior of $ is implemented by the Spatial class in the sp package, and is not a feature of S4 objects in general.
The long answer (how I found out about this):
Use showMethods("$") to find out about all the methods of the generic $.
The result shows:
Function: $ (package base)
x="C++Class"
x="envRefClass"
x="Module"
x="Raster"
x="refObjectGenerator"
x="Spatial"
x="SpatialGDAL"
x="SpatialPoints"
x="SpatialPolygonsDataFrame"
(inherited from: x="Spatial")
So we know that SpatialPolygonsDataFrame-class inherits $ from Spatial-class. We go to the root by:
getMethod("$", "Spatial"), which shows the implementation of $ for Spatial-class as follows:
Method Definition:
function (x, name)
{
if (!("data" %in% slotNames(x)))
stop("no $ method for object without attributes")
x@data[[name]]
}
<environment: namespace:sp>
Therefore, spatial_df$col_name is a shortcut for spatial_df@data[["col_name"]]
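The same trick works for any S4 class of your own. Here is a minimal sketch without sp; the class name Holder and its contents are made up for illustration.

```r
library(methods)

# A toy S4 class with a data slot, loosely mimicking the sp layout.
setClass("Holder", slots = c(data = "data.frame"))

# Forward `$` to the data slot, as the Spatial method does.
setMethod("$", signature(x = "Holder"),
  function(x, name) x@data[[name]]
)

h <- new("Holder", data = data.frame(column = 1:3))
h$column                             # 1 2 3
identical(h$column, h@data$column)   # TRUE
```

Without the setMethod() call, h$column would fail, which is why this is a property of the class and not of S4 in general.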
I am a bit confused by R's lookup mechanism. When I have the following code
# create chain of empty environments
e1 <- new.env()
e2 <- new.env(parent=e1)
e3 <- new.env(parent=e2)
# set key/value pairs
e1[["x"]] <- 1
e2[["x"]] <- 2
then I would expect to get "2" if I look for "x" in environment e3.
This works if I do
> get(x="x", envir=e3)
[1] 2
but not if I use
> e3[["x"]]
NULL
Could somebody explain the difference? It seems that
e3[["x"]]
is not just syntactic sugar for
get(x="x", envir=e3)
Thanks in advance,
Sven
These functions are different.
get performs a search for an object in an environment, as well as the enclosing frames (by default):
From ?get:
This function looks to see if the name x has a value bound to it in the specified environment. If inherits is TRUE and a value is not found for x in the specified environment, the enclosing frames of the environment are searched until the name x is encountered. See environment and the ‘R Language Definition’ manual for details about the structure of environments and their enclosures.
In contrast, the [[ operator does not search enclosing environments.
From ?'[':
Both $ and [[ can be applied to environments. Only character indices are allowed and no partial matching is done. The semantics of these operations are those of get(i, env=x, inherits=FALSE).
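Using the environments from the question, the difference can be checked directly. This is just a sketch of the behavior described above.

```r
# Chain of environments: e3 -> e2 -> e1
e1 <- new.env()
e2 <- new.env(parent = e1)
e3 <- new.env(parent = e2)
e1[["x"]] <- 1
e2[["x"]] <- 2

get("x", envir = e3)                      # 2: found in the enclosing e2
e3[["x"]]                                 # NULL: only e3 itself is checked
exists("x", envir = e3, inherits = FALSE) # FALSE: same restriction as [[
exists("x", envir = e3)                   # TRUE: inherits = TRUE by default
```

Note that get("x", envir = e3, inherits = FALSE) would raise an error here, whereas [[ quietly returns NULL for a missing name.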