Is there a way to lazily load elements of a list?
I have a list of large data.frames that each take a long time to generate and load. Typically I would not use all of the data.frames during a session, so would like them to generate and load lazily as i used them. I know I can use delayedAssign to create variables that load lazily, but this cannot be applied to list elements.
Below is a reproducible example of what does not work:
Some functions that take a while to generate data.frames:
slow_fun_1 <- function(){
cat('running slow function 1 now \n')
Sys.sleep(1)
df<-data.frame(x=1:5, y=6:10)
return(df)
}
slow_fun_2 <- function(){
cat('running slow function 2 now \n')
Sys.sleep(1)
df<-data.frame(x=11:15, y=16:20)
return(df)
}
APPROACH 1
my_list <- list()
my_list$df_1 <-slow_fun_1()
my_list$df_2 <-slow_fun_2()
# This is too slow. I might not want to use both df_1 & df_2.
APPROACH 2
my_list_2 <- list()
delayedAssign('my_list_2$df_1', slow_fun_1())
delayedAssign('my_list_2$df_2', slow_fun_2())
# Does not work. Can't assign to a list.
my_list_2 #output is still an empty list.
Here is one possible solution. It is not lazy evaluation. But it calculates the data.frame when you need (and then it caches it, so the calculation is carried out only for the first time). You can use package memoise to achieve this. For example
slow_fun_1 <- function(){
cat('running slow function 1 now \n')
Sys.sleep(1)
df<-data.frame(x=1:5, y=6:10)
return(df)
}
slow_fun_2 <- function(){
cat('running slow function 2 now \n')
Sys.sleep(1)
df<-data.frame(x=11:15, y=16:20)
return(df)
}
library(memoise)
my_list <- list()
my_list$df_1 <-memoise(slow_fun_1)
my_list$df_2 <-memoise(slow_fun_2)
and note that my_list$df_1 and so on are actually the functions that give you data.frames, so your usage should look like this:
> my_list$df_1()
running slow function 1 now
x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
> my_list$df_1()
x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
>
Note that the cached function only do the actual calculation at the first time.
Update: If you want to stick with the original usage without the function call, one way is to have a modified data structure based on the list, for example:
library(memoise)
lazy_list <- function(...){
structure(list(...), class = c("lazy_list", "list"))
}
as.list.lazy_list <- function(x){
structure(x, class = "list")
}
generator <- function(f){
structure(memoise(f), class = c("generator", "function"))
}
`$.lazy_list` <- function(lst, name){
r <- as.list(lst)[[name]]
if (inherits(r, "generator")) {
return(r())
}
return(r)
}
`[[.lazy_list` <- function(lst, name){
r <- as.list(lst)[[name]]
if (inherits(r, "generator")) {
return(r())
}
return(r)
}
lazy1 <- lazy_list(df_1 = generator(slow_fun_1),
df_2 = generator(slow_fun_2),
df_3 = data.frame(x=11:15, y=16:20))
lazy1$df_1
lazy1$df_1
lazy1$df_2
lazy1$df_2
lazy1$df_3
Here is an imperfect solution. Its imperfect because the list can't be used interactively in the Rstudio console without all of the list elements being loaded. Specifically when the $ is typed, Rstudio runs both functions. Ctrl+Enter on my_env$df_1 works as desired, so the issue is the use in the console.
my_env <- new.env()
delayedAssign('df_1',slow_fun_1(),assign.env = my_env)
delayedAssign('df_2',slow_fun_2(),assign.env = my_env)
# That was very fast!
get('df_1',envir = my_env)
my_env$df_1
# only slow_fun_1() is run once my_env$df_1 is called. So this is a partial success.
# however it does not work interactively in Rstudio
# when the following is typed into the console:
my_env$
# In Rstudio, once the dollar sign is typed, both functions are run.
# this works interactively in the Rstudio console.
# But the syntax is less convenient to type.
my_env[['d']]
Related
In a for loop I make a "string-formula" and allocate it to e.g. body1. And when I try to make a function with that body1 it fails... And I have no clue what I should try else...
This question How to create an R function programmatically? helped me a lot but sadly only quote is used to set the body...
I hope you have an idea how to work around with this issue.
And now my code:
A.m=matrix(c(3,4,2,2,1,1,1,3,2),ncol=3,byrow=TRUE)
for(i in 1:dim(A.m)[1]) {
body=character()
# here the string-formula emerges
for(l in 1:dim(A.m)[2]) {
body=paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
# only the last plus-sign is cutted off
assign(paste0("body",i),substr(body,1,nchar(body)-1))
}
args=alist(x = )
# just for your convenience the console output
body1
## [1] "A.m[1,1]*x[1]+A.m[1,2]*x[2]+A.m[1,3]*x[3]"
# in this code-line I don't know how to pass body1 in feasible way
assign("Function_1", as.function(c(args, ???body1???), env = parent.frame())
And this is my aim:
Function_1(x=c(1,1,1))
## 9 # 3*1 + 4*1 + 2*1
Since you have a string, you need to parse that string. You can do
assign("Function_1",
as.function(c(args, parse(text=body1)[[1]])),
env = parent.frame())
Though I would strongly discourage the use of assign for filling your global environment with a bunch of variables with indexes in their name. In general that makes things much tougher to program with. It would be much easier to collect all your functions in a list. For example
funs <- lapply(1:dim(A.m)[1], function(i) {
body <- ""
for(l in 1:dim(A.m)[2]) {
body <- paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
body <- substr(body,1,nchar(body)-1)
body <- parse(text=body)[[1]]
as.function(c(alist(x=), body), env=parent.frame())
})
And then you can call the different functions by extracting them with [[]]
funs[[1]](x=c(1,1,1))
# [1] 9
funs[[2]](x=c(1,1,1))
# [1] 4
Or you can ever call all the functions with an lapply
lapply(funs, function(f, ...) f(...), x=c(1,1,1))
# [[1]]
# [1] 9
# [[2]]
# [1] 4
# [[3]]
# [1] 6
Although if this is actually what your function is doing, there are easier ways to do this in R using matrix multiplication %*%. Your Function_1 is the same as A.m[1,] %*% c(1,1,1). You could make a generator funciton like
colmult <- function(mat, row) {
function(x) {
as.numeric(mat[row,] %*% x)
}
}
And then create the same list of functions with
funs <- lapply(1:3, function(i) colmult(A.m, i))
Then you don't need any string building or parsing which tends to be error prone.
So consider the following chunk of code which does not work as most people might expect it to
#cartoon example
a <- c(3,7,11)
f <- list()
#manual initialization
f[[1]]<-function(x) a[1]+x
f[[2]]<-function(x) a[2]+x
f[[3]]<-function(x) a[3]+x
#desired result for the rest of the examples
f[[1]](1)
# [1] 4
f[[3]](1)
# [1] 12
#attempted automation
for(i in 1:3) {
f[[i]] <- function(x) a[i]+x
}
f[[1]](1)
# [1] 12
f[[3]](1)
# [1] 12
Note that we get 12 both times after we attempt to "automate". The problem is, of course, that i isn't being enclosed in the function's private environment. All the functions refer to the same i in the global environment (which can only have one value) since a for loop does not seem to create different environment for each iteration.
sapply(f, environment)
# [[1]]
# <environment: R_GlobalEnv>
# [[2]]
# <environment: R_GlobalEnv>
# [[3]]
# <environment: R_GlobalEnv>
So I though I could get around with with the use of local() and force() to capture the i value
for(i in 1:3) {
f[[i]] <- local({force(i); function(x) a[i]+x})
}
f[[1]](1)
# [1] 12
f[[3]](1)
# [1] 12
but this still doesn't work. I can see they all have different environments (via sapply(f, environment)) however they appear to be empty (ls.str(envir=environment(f[[1]]))). Compare this to
for(i in 1:3) {
f[[i]] <- local({ai<-i; function(x) a[ai]+x})
}
f[[1]](1)
# [1] 4
f[[3]](1)
# [1] 12
ls.str(envir=environment(f[[1]]))
# ai : int 1
ls.str(envir=environment(f[[3]]))
# ai : int 3
So clearly the force() isn't working like I was expecting. I was assuming it would capture the current value of i into the current environment. It is useful in cases like
#bad
f <- lapply(1:3, function(i) function(x) a[i]+x)
#good
f <- lapply(1:3, function(i) {force(i); function(x) a[i]+x})
where i is passed as a parameter/promise, but this must not be what's happening in the for-loop.
So my question is: is possible to create this list of functions without local() and variable renaming? Is there a more appropriate function than force() that will capture the value of a variable from a parent frame into the local/current environment?
This isn't a complete answer, partly because I'm not sure exactly what the question is (even though I found it quite interesting!).
Instead, I'll just present two alternative for-loops that do work. They've helped clarify the issues in my mind (in particular by helping me to understand for the first time why force() does anything at all in a call to lapply()). I hope they'll help you as well.
First, here is one that's a much closer equivalent of your properly function lapply() call, and which works for the same reason that it does:
a <- c(3,7,11)
f <- list()
## `for` loop equivalent of:
## f <- lapply(1:3, function(i) {force(i); function(x) a[i]+x})
for(i in 1:3) {
f[[i]] <- {X <- function(i) {force(i); function(x) a[i]+x}; X(i)}
}
f[[1]](1)
# [1] 4
Second, here is one that does use local() but doesn't (strictly- or literally-speaking) rename i. It does, though, "rescope" it, by adding a copy of it to the local environment. In one sense, it's only trivially different from your functioning for-loop, but by focusing attention on i's scope, rather than its name, I think it helps shed light on the real issues underlying your question.
a <- c(3,7,11)
f <- list()
for(i in 1:3) {
f[[i]] <- local({i<-i; function(x) a[i]+x})
}
f[[1]](1)
# [1] 4
Will this approach work for you?
ff<-list()
for(i in 1:3) {
fillit <- parse(text=paste0('a[',i,']+x' ))
ff[[i]] <- function(x) ''
body(ff[[i]])[1]<-fillit
}
It's sort of a lower-level way to construct a function, but it does work "as you want it to."
An alternative for force() that would work in a for-loop local environment is
capture <- function(...) {
vars<-sapply(substitute(...()), deparse);
pf <- parent.frame();
Map(assign, vars, mget(vars, envir=pf, inherits = TRUE), MoreArgs=list(envir=pf))
}
Which can then be used like
for(i in 1:3) {
f[[i]] <- local({capture(i); function(x) a[i]+x})
}
(Taken from the comments in Josh's answer above)
In the following example I created the add_timing function operator. The input is a function (say mean) and it returns a function that does the same as mean, but reports on how long it took for the function to complete. See the following example:
library(pryr)
add_timing = function(input_function, specific_info) {
if (missing(specific_info)) specific_info = function(l) 'That'
function(...) {
relevant_value = specific_info(list(...))
start_time = Sys.time()
res = input_function(...)
cat(sprintf('%s took', relevant_value), difftime(Sys.time(), start_time, units = 'secs'), 'sec', '\n')
res
}
}
timed_mean = add_timing(mean)
# > timed_mean(runif(10000000))
# That took 0.4284899 sec
# [1] 0.4999762
Next I tried to use pryr::compose to create the same timed_mean function (I like the syntax):
timed_mean_composed = pryr::compose(add_timing, mean)
But this does get me the required output:
# > timed_mean_composed(runif(100))
# function(...) {
# relevant_value = specific_info(list(...))
# start_time = Sys.time()
# res = input_function(...)
# cat(sprintf('%s took', relevant_value), difftime(Sys.time(), start_time, units = 'secs'), 'sec', '\n')
# res
# }
It seems that the compose operation does not lead to the add_timing function actually being executed. Only after calling the function, the new timed_mean_compose actually shows the correct function output.
Based on the following example from Advanced R by #HadleyWickham I expected this to work as I used it (see below for an excerpt):
dot_every <- function(n, f) {
i <- 1
function(...) {
if (i %% n == 0) cat(".")
i <<- i + 1
f(...)
}
}
download <- pryr::compose(
partial(dot_every, 10),
memoise,
partial(delay_by, 1),
download_file
)
Where the dot_every function operator is used in the same way I use add_timing above.
What am I missing?
The difference is that in your first attempt, you are calling
(add_timing(mean))(runif(1e7)
and with the compose syntax you are calling something more similar to
add_timing(mean(runif(1e7))
These are not exactly equivalent. Actually, the pryr compose function is really expanding the syntax to something more like
x <- runif(1e7)
x <- mean(x)
x <- add_timing(x)
Maybe looking at this will help
a <- function(x) {print(paste("a:", x));x}
b <- function(x) {print(paste("b:", x));x}
x <- pryr::compose(a,b)(print("c"))
# [1] "c"
# [1] "b: c"
# [1] "a: c"
Notice how a isn't called until after b. This means that a would have no way to time b. compose would not be an appropriate way to create a timer wrapper.
The issue is that pryr::compose is aimed at doing something completely different from what you're trying to do in your initial example. You want to create a function factory (called add_timing), which will take a function as input and return a new function as output that does the same thing as the input function but with an additional time printing. I would write that as follows:
add_timing <- function(FUN) { function(...) { print(system.time(r <- FUN(...))); r }}
mean(1:5)
# [1] 3
add_timing(mean)(1:5)
# user system elapsed
# 0 0 0
# [1] 3
The compose function, by contrast, returns a function that represents a series of functions to be evaluated in sequence. The examples in ? compose are helpful here. Here's an example that builds on that:
add1 <- function(x) x + 1
times2 <- function(x) x * 2
# the following two are identical:
add1(1)
# [1] 2
compose(add1)(1)
# [1] 2
# the following two are identical:
times2(1)
# [1] 2
compose(times2)(1)
# [1] 2
compose becomes useful for nesting, when the order of nesting is important:
add1(times2(2))
# [1] 5
compose(add1, times2)(2)
# [1] 5
times2(add1(2))
# [1] 6
compose(times2, add1)(2)
# [1] 6
This means that the reason your example does not work is because your functions are not actually nested in the way that compose is intended to work. In your example, you're asking system.time to, for example, calculate the time to evaluate 3 (the output of mean) rather than the time to evaluate mean(1:5).
So consider the following chunk of code which does not work as most people might expect it to
#cartoon example
a <- c(3,7,11)
f <- list()
#manual initialization
f[[1]]<-function(x) a[1]+x
f[[2]]<-function(x) a[2]+x
f[[3]]<-function(x) a[3]+x
#desired result for the rest of the examples
f[[1]](1)
# [1] 4
f[[3]](1)
# [1] 12
#attempted automation
for(i in 1:3) {
f[[i]] <- function(x) a[i]+x
}
f[[1]](1)
# [1] 12
f[[3]](1)
# [1] 12
Note that we get 12 both times after we attempt to "automate". The problem is, of course, that i isn't being enclosed in the function's private environment. All the functions refer to the same i in the global environment (which can only have one value) since a for loop does not seem to create different environment for each iteration.
sapply(f, environment)
# [[1]]
# <environment: R_GlobalEnv>
# [[2]]
# <environment: R_GlobalEnv>
# [[3]]
# <environment: R_GlobalEnv>
So I though I could get around with with the use of local() and force() to capture the i value
for(i in 1:3) {
f[[i]] <- local({force(i); function(x) a[i]+x})
}
f[[1]](1)
# [1] 12
f[[3]](1)
# [1] 12
but this still doesn't work. I can see they all have different environments (via sapply(f, environment)) however they appear to be empty (ls.str(envir=environment(f[[1]]))). Compare this to
for(i in 1:3) {
f[[i]] <- local({ai<-i; function(x) a[ai]+x})
}
f[[1]](1)
# [1] 4
f[[3]](1)
# [1] 12
ls.str(envir=environment(f[[1]]))
# ai : int 1
ls.str(envir=environment(f[[3]]))
# ai : int 3
So clearly the force() isn't working like I was expecting. I was assuming it would capture the current value of i into the current environment. It is useful in cases like
#bad
f <- lapply(1:3, function(i) function(x) a[i]+x)
#good
f <- lapply(1:3, function(i) {force(i); function(x) a[i]+x})
where i is passed as a parameter/promise, but this must not be what's happening in the for-loop.
So my question is: is possible to create this list of functions without local() and variable renaming? Is there a more appropriate function than force() that will capture the value of a variable from a parent frame into the local/current environment?
This isn't a complete answer, partly because I'm not sure exactly what the question is (even though I found it quite interesting!).
Instead, I'll just present two alternative for-loops that do work. They've helped clarify the issues in my mind (in particular by helping me to understand for the first time why force() does anything at all in a call to lapply()). I hope they'll help you as well.
First, here is one that's a much closer equivalent of your properly function lapply() call, and which works for the same reason that it does:
a <- c(3,7,11)
f <- list()
## `for` loop equivalent of:
## f <- lapply(1:3, function(i) {force(i); function(x) a[i]+x})
for(i in 1:3) {
f[[i]] <- {X <- function(i) {force(i); function(x) a[i]+x}; X(i)}
}
f[[1]](1)
# [1] 4
Second, here is one that does use local() but doesn't (strictly- or literally-speaking) rename i. It does, though, "rescope" it, by adding a copy of it to the local environment. In one sense, it's only trivially different from your functioning for-loop, but by focusing attention on i's scope, rather than its name, I think it helps shed light on the real issues underlying your question.
a <- c(3,7,11)
f <- list()
for(i in 1:3) {
f[[i]] <- local({i<-i; function(x) a[i]+x})
}
f[[1]](1)
# [1] 4
Will this approach work for you?
ff<-list()
for(i in 1:3) {
fillit <- parse(text=paste0('a[',i,']+x' ))
ff[[i]] <- function(x) ''
body(ff[[i]])[1]<-fillit
}
It's sort of a lower-level way to construct a function, but it does work "as you want it to."
An alternative for force() that would work in a for-loop local environment is
capture <- function(...) {
vars<-sapply(substitute(...()), deparse);
pf <- parent.frame();
Map(assign, vars, mget(vars, envir=pf, inherits = TRUE), MoreArgs=list(envir=pf))
}
Which can then be used like
for(i in 1:3) {
f[[i]] <- local({capture(i); function(x) a[i]+x})
}
(Taken from the comments in Josh's answer above)
UPDATE: I have added a variant
of Roland's implementation to the kimisc package.
Is there a convenience function for exporting objects to the global environment, which can be called from a function to make objects available globally?
I'm looking for something like
export(obj.a, obj.b)
which would behave like
assign("obj.a", obj.a, .GlobalEnv)
assign("obj.b", obj.b, .GlobalEnv)
Rationale
I am aware of <<- and assign. I need this to refactor oldish code which is simply a concatenation of scripts:
input("script1.R")
input("script2.R")
input("script3.R")
script2.R uses results from script1.R, and script3.R potentially uses results from both 1 and 2. This creates a heavily polluted namespace, and I wanted to change each script
pollute <- the(namespace)
useful <- result
to
(function() {
pollute <- the(namespace)
useful <- result
export(useful)
})()
as a first cheap countermeasure.
Simply write a wrapper:
myexport <- function(...) {
arg.list <- list(...)
names <- all.names(match.call())[-1]
for (i in seq_along(names)) assign(names[i],arg.list[[i]],.GlobalEnv)
}
fun <- function(a) {
ttt <- a+1
ttt2 <- a+2
myexport(ttt,ttt2)
return(a)
}
print(ttt)
#object not found error
fun(2)
#[1] 2
print(ttt)
#[1] 3
print(ttt2)
#[1] 4
Not tested thoroughly and not sure how "safe" that is.
You can create an environment variable and use it within your export function. For example:
env <- .GlobalEnv ## better here to create a new one :new.env()
exportx <- function(x)
{
x <- x+1
env$y <- x
}
exportx(3)
y
[1] 4
For example , If you want to define a global options(emulate the classic R options) in your package ,
my.options <- new.env()
setOption1 <- function(value) my.options$Option1 <- value
EDIT after OP clarification:
You can use evalq which take 2 arguments :
envir the environment in which expr is to be evaluated
enclos where R looks for objects not found in envir.
Here an example:
env.script1 <- new.env()
env.script2 <- new.env()
evalq({
x <- 2
p <- 3
z <- 5
} ,envir = env.script1,enclos=.GlobalEnv)
evalq({
h <- x +2
} ,envir = env.script2,enclos=myenv.script1)`
You can see that all variable are created within the environnment ( like local)
env.script2$h
[1] 4
env.script1$p
[1] 3
> env.script1$x
[1] 2
First, given your use case, I don't see how an export function is any better than using good (?) old-fashioned <<-. You could just do
(function() {
pollute <- the(namespace)
useful <<- result
})()
which will give the same result as what's in your example.
Second, rather than anonymous functions, it seems better form to use local, which allows you to run involved computations without littering your workspace with various temporary objects.
local({
pollute <- the(namespace)
useful <<- result
})
ETA: If it's important for whatever reason to avoid modifying an existing variable called useful, put an exists check in there. The same applies to the other solutions presented.
local({
.....
useful <- result
if(!exists("useful", globalenv())) useful <<- useful
})