I am serializing objects with the serialize function.
For example:
serialize_object <- serialize(some_object, NULL)
Now I have an issue with closures. For example:
closure <- function(){
  member <- NULL
  list(init=function(val){member <<- val})
}
closure_serialized <- serialize(closure(), NULL)
This raw object closure_serialized is huge: some 200MB. I am quite sure that the environment in which the closure is created gets serialized as well. But I don't need that environment. I only need the closure and its contents.
Am I doing something wrong? Am I initializing or defining the closure in a wrong way? How can I make it serialize only the closure and not the rest of the environment? Serializing closures from some packages does not have this effect, and I cannot find the culprit.
This is mainly because the definition of the closure is within a function.
fn <- function(){
  # make big variables
  closure <- function(){
    member <- NULL
    list(init=function(val){member <<- val})
  }
  closure_serialized <- serialize(closure(), NULL)
}
# serialize will copy the environment within the function in closure_serialized
fn()
In that case the serialize function also copies the environment of the enclosing function, including the big variables. A "workaround" is to place the definition of the closure in the global environment.
closure <- function(){
  member <- NULL
  list(init=function(val){member <<- val})
}
fn <- function(){
  # make big variables
  closure_serialized <- serialize(closure(), NULL)
}
# serialize will not copy the global environment.
fn()
serialize doesn't copy the .GlobalEnv environment; it is only stored as a reference. See also here for a related topic.
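The size difference can be checked directly. The following is a minimal sketch, not the original poster's code: the names size_nested, size_global and closure_global are hypothetical, and the enclosure of closure_global is pinned to globalenv() explicitly so that the sketch behaves the same wherever it is run.

```r
# Closure constructor whose enclosure is the global environment.
closure_global <- function() {
  member <- NULL
  list(init = function(val) member <<- val)
}
# Assumption for this sketch: force the enclosure to be .GlobalEnv.
environment(closure_global) <- globalenv()

size_nested <- function() {
  big <- numeric(1e6)  # roughly 8 MB of doubles living in this frame
  closure_local <- function() {
    member <- NULL
    list(init = function(val) member <<- val)
  }
  # The frame holding 'big' is reachable from the closure, so it is serialized.
  length(serialize(closure_local(), NULL))
}

size_global <- function() {
  big <- numeric(1e6)  # same big variable, but not reachable from the closure
  length(serialize(closure_global(), NULL))
}

nested_bytes <- size_nested()
global_bytes <- size_global()
print(c(nested = nested_bytes, global = global_bytes))
```

The nested variant should come out at more than 8 million bytes, while the global variant stays small, because only environments on the closure's enclosure chain are written out.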
I have developed the following two functions:
save_sysdata <- function(...) {
  data <- eval(substitute(alist(...)))
  data <- purrr::map_chr(data, add_dot)
  save(list = data, file = "sysdata.rda", compress = "bzip2", version = 2)
}
add_dot <- function(object) {
  object <- object # Why is this required?
  name <- paste0(".", deparse(substitute(object)))
  # parent.frame(3) because evaluating in global (or caller function); 2 because assigning in save_sysdata.
  assign(name, eval(object, envir = parent.frame(3)), envir = parent.frame(2))
  return(name)
}
The purpose of this set of functions is to provide an object (x) and save it as a sysdata.rda file but as a hidden object. This requires adding a . to the object symbol (.x).
The set of functions as I have it works and accomplishes what I want. However, it relies on one line of code that I don't understand: I don't know why it works or what it's doing. I'm not even sure how I came up with this particular line as a solution.
If I remove the line object <- object from the add_dot function, the whole thing fails to work. It actually just generates an empty sysdata.rda file.
Can anyone explain why this line is necessary and what it is doing?
And if you have a more efficient way of accomplishing this, please let me know. It was a fun exercise to figure this out myself but I'm sure there is a better way.
For a reprex, simply copy the above functions and run:
x <- "test"
save_sysdata(x)
Then load the sysdata.rda file into your global environment and type .x. It should return [1] "test".
Here's an alternative version
save_sysdata <- function(...) {
  pnames <- sapply(match.call(expand.dots=FALSE)$..., deparse)
  snames <- paste0(".", pnames)
  senv <- setNames(list(...), snames)
  save(list = snames, envir=list2env(senv), file = "sysdata.rda", compress = "bzip2", version = 2)
}
We dump the values into a named list and grab the names of the parameters with match.call(). We add dots to the names and then turn that list into an environment that we can use with save.
The reason your version required object <- object is that function parameters are lazily evaluated. Since you never actually use the value of that object in your function without the assignment, it remains a promise and is never added to the function environment. Sometimes you'll see force(object) instead, which does the same thing.
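A minimal sketch of the promise behaviour, using hypothetical helper names (lazy_arg, show_promise, show_rebound) that are not part of the original functions. It illustrates two things: an argument that is never used is never evaluated, and re-assigning the argument replaces the promise with an ordinary local variable, which changes what substitute() sees.

```r
# An argument that is never used is never evaluated, so no error is raised.
lazy_arg <- function(object) "argument never touched"
untouched <- lazy_arg(stop("this is never evaluated"))

# substitute() on an argument returns the expression stored in the promise.
show_promise <- function(object) deparse(substitute(object))

# After re-assignment, 'object' is an ordinary local variable, so substitute()
# returns its value instead of the expression that was passed in.
show_rebound <- function(object) {
  object <- object
  deparse(substitute(object))
}

sym <- quote(x)
promise_label <- show_promise(sym)  # "sym"
rebound_label <- show_rebound(sym)  # "x"
print(c(promise_label, rebound_label))
```

In add_dot this appears to matter because deparse(substitute(object)) is meant to yield the name stored in the list element, not the expression that purrr used to pass that element in.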
I'm looking for the R equivalent of Python's __reduce__ for serialization and de-serialization of S3 classes - i.e. some method of manually specifying how to serialize and de-serialize objects which belong to a certain class.
Simple example:
Object creator:
make_obj <- function(a = 1) {
  obj <- list(a = a, b = a + 1)
  class(obj) <- "myClass"
  return(obj)
}
Serializer and de-serializer:
serializer <- function(obj) return(as.character(obj$a))
deserializer <- function(s) {
  a <- as.numeric(s)
  return(make_obj(a))
}
I see R has functions like saveRDS and readRDS which accept an argument refhook for customized serialization, and can be used with those two functions as intended:
myObj <- make_obj(10)
saveRDS(myObj, "myObj.Rds", refhook = serializer)
newObj <- readRDS("myObj.Rds", refhook = deserializer)
But I'm looking for some way of making this automatic based on the object's class, so that (a) it would work with RStudio's save and restore session when those objects are in the environment, and so that (b) someone could just load a package and then use the internal R serialization functions without extra hassle.
I thought of defining a custom saveRDS.myClass and registering it as an S3 method - e.g.:
saveRDS.myClass <- function(obj, ...) {
  s <- serializer(obj)
  saveRDS(s, ...)
}
But this wouldn't work with RStudio's save session, and when calling readRDS it will not know that it should use the custom de-serialization function once it loads this object.
Is there any way of making these serialization and de-serialization functions be attached to an S3 class, so to say?
I'm working in a call stack of variable depth that looks like
TopLevelFunction
-> <SomeOtherFunction(s), 1 or more>
-> AssignmentFunction
Now, my goal is to assign a variable created in AssignmentFunction, to the environment of TopLevelFunction. I know I can extract the stack with sys.calls, so my current approach is
# get the call stack and search for TopLevelFunction
depth <- which(stringr::str_detect(as.character(sys.calls()), "TopLevelFunction"))
# assign in TopLevelFunction's environment
assign(varName, varValue, envir = sys.frame(depth))
I'm more or less fine with that, though I am not sure if that's a good idea to convert call objects to character vectors. Is that approach error-prone? More generally, how would you search for a specific parent environment, knowing only the name of the function?
A function like this:
get_toplevel_env <- function(env) {
  if (identical(parent.env(env), globalenv())) {
    env
  } else {
    get_toplevel_env(parent.env(env))
  }
}
And use it at any level of your nested functions like this:
get_toplevel_env(as.environment(-1))
I'm not sure if I understood correctly what you want to do, but wouldn't it work to use parent.env(as.environment(-1))?
In this example it seems to work.
fn1 <- function() {
  fn1.1 <- function(){
    assign("parentvar", "PARENT",
           envir = parent.env(as.environment(-1)))
  }
  fn1.1()
  print(parentvar)
}
fn1()
Another possibility might be <<-, which assigns to the variable in the nearest enclosing environment where it already exists, and in the global environment if it is not found anywhere. But maybe that's not what you want.
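For the original question of locating a frame by function name alone, one option is to compare the called symbol of each entry in sys.calls() instead of converting whole calls to character. This is only a sketch under that assumption; find_frame is a hypothetical helper, and the three functions below merely mimic the call stack from the question.

```r
# Return the frame of the innermost active call to the function named fn_name.
find_frame <- function(fn_name) {
  calls <- sys.calls()
  # Walk from the innermost call outwards, looking only at the called symbol.
  for (i in rev(seq_along(calls))) {
    callee <- calls[[i]][[1L]]
    if (is.name(callee) && identical(as.character(callee), fn_name)) {
      return(sys.frame(i))
    }
  }
  stop("no active call to ", fn_name)
}

AssignmentFunction <- function() {
  target <- find_frame("TopLevelFunction")
  assign("varName", "varValue", envir = target)
}
SomeOtherFunction <- function() AssignmentFunction()
TopLevelFunction <- function() {
  SomeOtherFunction()
  # Look only in this frame, so a stray global variable cannot mask a failure.
  get("varName", envir = environment(), inherits = FALSE)
}

top_level_result <- TopLevelFunction()
print(top_level_result)
```

Matching on the symbol avoids false hits when "TopLevelFunction" merely appears somewhere in the arguments of another call. It still misses calls made through pkg::fn or do.call with a function object, since those do not have a plain symbol in the call position.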
Is there an R function that lists all the functions in an R script file along with their arguments?
i.e. an output of the form:
func1(var1, var2)
func2(var4, var10)
.
.
.
func10(varA, varB)
Using [sys.]source has the very undesirable side-effect of executing the source inside the file. At the worst this has security problems, but even “benign” code may simply have unintended side-effects when executed. At best it just takes unnecessary time (and potentially a lot).
It’s actually unnecessary to execute the code, though: it is enough to parse it, and then do some syntactical analysis.
The actual code is trivial:
file_parsed = parse(filename)
functions = Filter(is_function, file_parsed)
function_names = unlist(Map(function_name, functions))
And there you go, function_names contains a vector of function names. Extending this to also list the function arguments is left as an exercise to the reader. Hint: there are two approaches. One is to eval the function definition (now that we know it’s a function definition, this is safe); the other is to “cheat” and just get the list of arguments to the function call.
The implementation of the functions used above is also not particularly hard. There’s probably even something already in R core packages (‘utils’ has a lot of stuff) but since I’m not very familiar with this, I’ve just written them myself:
is_function = function (expr) {
  if (! is_assign(expr)) return(FALSE)
  value = expr[[3L]]
  is.call(value) && as.character(value[[1L]]) == 'function'
}
function_name = function (expr) {
  as.character(expr[[2L]])
}
is_assign = function (expr) {
  is.call(expr) && as.character(expr[[1L]]) %in% c('=', '<-', 'assign')
}
This correctly recognises function declarations of the forms
f = function (…) …
f <- function (…) …
assign('f', function (…) …)
It won’t work for more complex code, since assignments can be arbitrarily complex and in general are only resolvable by actually executing the code. However, the three forms above probably account for ≫ 99% of all named function definitions in practice.
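The "cheat" route mentioned in the hint can be sketched as follows: the formals of a parsed function definition are already present in the call, so their names can be read without evaluating anything. The helper names is_function_def and list_signatures are hypothetical, and the sketch deliberately handles only the = and <- forms with a plain symbol on the left-hand side.

```r
# TRUE for top-level expressions of the form  name <- function(...)  or
# name = function(...). Nothing is evaluated; only the parse tree is inspected.
is_function_def <- function(expr) {
  is.call(expr) &&
    is.name(expr[[1L]]) &&
    as.character(expr[[1L]]) %in% c("=", "<-") &&
    length(expr) == 3L &&
    is.name(expr[[2L]]) &&
    is.call(expr[[3L]]) &&
    identical(expr[[3L]][[1L]], as.name("function"))
}

# Build "name(arg1, arg2)" strings from the parsed definitions.
list_signatures <- function(exprs) {
  defs <- Filter(is_function_def, as.list(exprs))
  vapply(defs, function(def) {
    arg_names <- names(def[[3L]][[2L]])  # the formals of the definition
    sprintf("%s(%s)", as.character(def[[2L]]), paste(arg_names, collapse = ", "))
  }, character(1))
}

code <- "
func1 <- function(var1, var2) var1 + var2
func2 = function(var4, var10 = 2) NULL
not_a_function <- 42
stop('this line is parsed but never executed')
"
signatures <- list_signatures(parse(text = code))
print(signatures)
```

For a file on disk, parse(text = code) would be replaced by parse(filename) as in the answer above. Default values are dropped here; only the argument names are kept.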
UPDATE: Please refer to the answer by @Konrad Rudolph instead.
You can create a new environment, source your file in that environment and then list the functions in it using lsf.str() e.g.
test.env <- new.env()
sys.source("myfile.R", envir = test.env)
lsf.str(envir=test.env)
rm(test.env)
or if you want to wrap it as a function:
listFunctions <- function(filename) {
  temp.env <- new.env()
  sys.source(filename, envir = temp.env)
  functions <- lsf.str(envir=temp.env)
  rm(temp.env)
  return(functions)
}
In my R development I need to wrap function primitives in proto objects so that a number of arguments can be automatically passed to the functions when the $perform() method of the object is invoked. The function invocation internally happens via do.call(). All is well, except when the function attempts to access variables from the closure within which it is defined. In that case, the function cannot resolve the names.
Here is the smallest example I have found that reproduces the behavior:
library(proto)
make_command <- function(operation) {
  proto(
    func = operation,
    perform = function(., ...) {
      func <- with(., func) # unbinds proto method
      do.call(func, list(), envir=environment(operation))
    }
  )
}
test_case <- function() {
result <- 100
make_command(function() result)$perform()
}
# Will generate error:
# Error in function () : object 'result' not found
test_case()
I have a reproducible testthat test that also outputs a lot of diagnostic output. The diagnostic output has me stumped. By looking up the parent environment chain, my diagnostic code, which lives inside the function, finds and prints the very same variable the function fails to find. See this gist.
How can the environment for do.call be set up correctly?
This was the final answer after an offline discussion with the poster:
make_command <- function(operation) {
  proto(perform = function(.) operation())
}
I think the issue here is clearer and easier to explore if you:
Replace the anonymous function within make_command() with a named one.
Make that function open a browser() (instead of trying to get result). That way you can look around to see where you are and what's going on.
Try this, which should clarify the cause of your problem:
test_case <- function() {
  result <- 100
  myFun <- function() browser()
  make_command(myFun)$perform()
}
test_case()
## Then from within the browser:
#
parent.env(environment())
# <environment: 0x0d8de854>
# attr(,"class")
# [1] "proto" "environment"
get("result", parent.env(environment()))
# Error in get("result", parent.env(environment())) :
# object 'result' not found
#
parent.frame()
# <environment: 0x0d8ddfc0>
get("result", parent.frame()) ## (This works, so points towards a solution.)
# [1] 100
Here's the problem. Although you think you're evaluating myFun(), whose environment is the evaluation frame of test_case(), your call to do.call(func, ...) is really evaluating func(), whose environment is the proto environment within which it was defined. After looking for and not finding result in its own frame, the call to func() follows the rules of lexical scoping, and next looks in the proto environment. Neither it nor its parent environment contains an object named result, resulting in the error message you received.
If this doesn't immediately make sense, you can keep poking around within the browser. Here are a few further calls you might find helpful:
environment(get("myFun", parent.frame()))
ls(environment(get("myFun", parent.frame())))
environment(get("func", parent.env(environment())))
ls(environment(get("func", parent.env(environment()))))
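The scoping effect described above can also be reproduced without proto. The sketch below uses base R only and hypothetical names (make_getter, rebound); it imitates the rebinding of a function's environment by hand and is not a description of proto's internals.

```r
make_getter <- function() {
  result <- 100
  function() result
}

getter <- make_getter()
original_value <- getter()  # 'result' is found in make_getter()'s frame

# Give a copy of the function a different enclosure, so lexical lookup no
# longer passes through the frame in which 'result' was defined.
rebound <- getter
environment(rebound) <- new.env(parent = baseenv())
lookup <- tryCatch(rebound(), error = function(e) "lookup failed")

print(original_value)
print(lookup)
```

The envir argument of do.call only controls where the call itself is evaluated; the function still resolves free variables through its own environment, which is why the final answer calls operation() directly and leaves its enclosure untouched.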