My question
If an object x is passed to a function f that modifies it, R creates a modified local copy of x within f's environment rather than changing the original object (copy-on-modify). However, I have a situation where x is very large and not needed once it has been passed to f, so I want to avoid keeping the original copy of x around once f has been called. Is there a clever way to achieve this?
f is an unknown function to be supplied by a possibly not very clever user.
My current solution
The best I have so far is to wrap x in a function forget that makes a new local reference to x called y, removes the original reference in the workspace, and then passes on the new reference. The problem is that I am not certain it accomplishes what I want, and it only works in globalenv(), which is a deal breaker in my current case.
forget <- function(x){
  y <- x
  # x and y now refer to the same object, which has not yet been copied
  print(tracemem(y))
  rm(list = deparse(substitute(x)), envir = globalenv())
  # The outside reference is now removed, so modifying `y`
  # should no longer result in a copy (other than the
  # intermediate copy produced in the assignment)
  y
}
f <- function(x){
  print(tracemem(x))
  x[2] <- 9000.1
  x
}
Here is an example of calling the above function.
> a <- 1:3
> tracemem(a)
[1] "<0x2ac1028>"
> b <- f(forget(a))
[1] "<0x2ac1028>"
[1] "<0x2ac1028>"
tracemem[0x2ac1028 -> 0x2ac1e78]: f
tracemem[0x2ac1e78 -> 0x308f7a0]: f
> tracemem(b)
[1] "<0x308f7a0>"
> b
[1] 1.0 9000.1 3.0
> a
Error: object 'a' not found
Bottom line
Am I doing what I hope I am doing and is there a better way to do it?
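One small variant worth testing (a sketch only, not verified; whether it actually prevents the copy should be checked with tracemem(), since the promise and y still hold references to the value) is to drop the binding from the caller's frame via parent.frame() instead of from globalenv(), so the wrapper is no longer tied to the global workspace:

forget2 <- function(x) {
  y <- x
  # remove the binding from whichever frame forget2() was called in,
  # not just from the global workspace
  rm(list = deparse(substitute(x)), envir = parent.frame())
  y
}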
(1) Environments. You can use environments for that:
e <- new.env()
e$x <- 1:3
f <- function(e) with(e, x <- x + 1)
f(e)
e$x
(2) Reference Classes. Or, since reference classes automatically use environments, use those:
E <- setRefClass("E", fields = "x",
  methods = list(
    f = function() x <<- x + 1
  )
)
e <- E$new(x = 1:3)
e$f()
e$x
(3) proto objects also use environments:
library(proto)
p <- proto(x = 1:3, f = function(.) with(., x <- x + 1))
p$f()
p$x
ADDED: proto solution
UPDATED: Changed function name to f for consistency with question.
I think the easiest approach is to load only the working copy into memory, instead of both the original (global namespace) and the working copy (function namespace). You can sidestep your whole issue by using the 'ff' package to define your 'x' and 'y' data sets as 'ffdf' data frames. As I understand it, 'ffdf' data frames reside on disk and load into memory only as parts of the data frame are needed, and those parts are purged when they are no longer necessary. This would mean, theoretically, that the data would be loaded into memory to copy into the function namespace and then purged after the copy was complete.
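A rough sketch of that idea (I am assuming the parts of the ff API I normally reach for; as.ffdf() turns an ordinary data frame into a disk-backed one):

library(ff)

# A disk-backed data frame: the columns live in temp files, not in RAM.
x <- as.ffdf(data.frame(a = 1:1e6, b = runif(1e6)))

f <- function(d) {
  # d$a is an ff vector; subsetting it realizes only those rows in memory.
  mean(d$a[1:10])
}

f(x)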
I'll admit that I rarely have to use the 'ff' package, and when I do, I usually don't have any issues at all. I'm not checking specific memory usage, though, and my goal is usually just to perform a large calculation across the data. It works, and I don't ask questions.
Related
I want to add an environment to a search path and modify the values of variables within that environment, in a limited chunk of code, without having to specify the name of the environment every time I refer to a variable: for example, given the environment
ee <- list2env(list(x=1,y=2))
Now I would like to do stuff like
ee$x <- ee$x+1
ee$y <- ee$y*2
ee$z <- 6
but without appending ee$ to everything (or using assign("x", ee$x+1, ee) ... etc.): something like
in_environment(ee, {
x <- x+1
y <- y+2
z <- 6
})
Most of the solutions I can think of are explicitly designed not to modify the environment, e.g.
?attach: "The database is not actually attached. Rather, a new environment
is created on the search path ..."
within(): takes lists or data frames (not environments) "... and makes the corresponding modifications to a copy of ‘data’"
There are two problems with <<-: (1) using it will cause NOTEs in CRAN checks (I think? I can't find direct evidence of this, but e.g. see here; maybe this only happens because of the appearance of assigning to a locally undefined symbol? I guess I could put this in a package and test it with --as-cran to confirm ...); (2) it will try to assign in the enclosing environment, which in a package context [which this is] will be locked ...
I suppose I could use a closure as described in section 10.7 of the Introduction to R by doing
clfun <- function() {
  x <- 1
  y <- 2
  function(...) {
    x <<- x + 1
    y <<- y * 2
  }
}
myfun <- clfun()
This seems convoluted (but I guess not too bad?) but:
will still incur problem #1 (CRAN check?).
I think (??) it won't work with variables that don't already exist in the environment (would need an explicit assign() for that ...)
doesn't allow a choice of which environment to operate in - it's necessarily going to work in the enclosing environment, not with arbitrary environment ee
Am I missing something obvious and idiomatic?
Thanks to @Nuclear03020704! I think with() was what I wanted all along; I was incorrectly assuming that it would also create a local copy of the environment, but it only does this if the data argument is not already an environment.
ee <- list2env(list(x=1,y=2))
with(ee, {
  x <- x+1
  y <- y+2
  z <- 6
})
does exactly what I want.
Just had another idea, which also seems to have some drawbacks: using a big eval clause. Rather than make my question a long laundry list of unsatisfactory solutions, I'll add it here.
myfun <- function() {
  eval(quote({
    x <- x + 1
    y <- y * 2
    z <- 3
  }), envir = ee)
}
This does seem to work, but also seems very weird/mysterious! I hate to think about explaining it to someone who's been using R for less than 10 years ... I suppose I could write an in_environment() based on this, but I'd have to be very careful to capture the expression properly without evaluating it ...
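For what it's worth, a minimal sketch of such a helper (the name in_environment() and this exact signature are my assumption) can be built on substitute() plus eval():

in_environment <- function(env, expr) {
  # capture the expression unevaluated, then evaluate it inside env,
  # so assignments land in env rather than in the caller's frame
  eval(substitute(expr), envir = env)
}

ee <- list2env(list(x = 1, y = 2))
in_environment(ee, {
  x <- x + 1
  y <- y * 2
  z <- 6
})
ee$z  # 6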
What about with()? From the help page for with(),
with(data, expr)
data is the data to use for constructing an environment. For the default with method this may be an environment, a list, a data frame, or an integer.
expr is the expression to evaluate.
with is a generic function that evaluates expr in a local environment constructed from data. The environment has the caller's environment as its parent. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment then this is used with its existing parent.)
Note that assignments within expr take place in the constructed environment and not in the user's workspace.
with() returns value of the evaluated expr.
ee <- list2env(list(x=1,y=2))
with(ee, {
  x <- x+1
  y <- y+2
  z <- 6
})
Is there any way to set the working environment or workspace in R to a defined environment? I'd like to call the variables in an environment without constantly referring to the environment. I understand that the attach function can be used to access variables in this manner, but any variables created don't get placed back in the attached environment. The goal is to have all the functions take place in the other environment.
For example:
original.env <- .GlobalEnv
other.env <- new.env()
other.env$A <- 12; other.env$B <- 1.5
set.env(other.env)
C <- A + B
set.env(original.env)
other.env$C
[1] 13.5
It's the step with set.env which I can't figure out if it exists, or there's some other trick to doing this.
The goal of this approach is to allow the same code to be used on different data sets with identical structures in several non-nested environments without constantly calling other environments with the Environment$ prefix, which gets really verbose in many cases.
If the results can also be assigned back to the set environment (as in the global environment, everything has an implicit .GlobalEnv$ in front of any variable) it would make it way easier to access and return multiple values from inside of a function.
Any help is appreciated. Thanks.
You are looking for eval / evalq
From help("evalq")
Evaluate an R expression in a specified environment.
Specifically, evalq noting
The evalq form is equivalent to eval(quote(expr), ...). eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this.
# Therefore you want something like this
evalq(C <- A + B, envir = other.env)
If you want to wrap more than one expression, use {}, e.g.
evalq({
  C <- A + B
  d <- 5
}, envir = other.env)
How about using with?
original.env <- .GlobalEnv
other.env <- new.env()
other.env$A <- 12; other.env$B <- 1.5
with(other.env, { C <- A + B ; FOO <- 'bar' })
other.env$C
[1] 13.5
other.env$FOO
[1] "bar"
I'm trying to write a function, which limits the scope of R variables. For example,
source("LimitScope.R")
y = 0
f = function(){
  # Raises an error as y is a global variable
  x = y
}
I thought of testing the variable environment, but wasn't really sure of how to do this.
The why
I teach R to undergrads. In their first couple of practicals a few of them always forget about variable scope, so their submitted functions don't work. For example, I always get something like:
n = 10
f = function(x){
  # Raises an error,
  # as I just source f and test it for a few test cases.
  return(x*n)
}
I was after a quick function that would 'turn off' scope. As you can imagine it doesn't have to be particularly robust, as it would just be offered for the few practicals.
I'm not sure that you want to do this in general, but the local() function should help, as should the codetools library.
In your example, try
f = local( function() { ... }, baseenv())
It does not do exactly what you want, but it should get you closer.
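A minimal sketch of how that plays out with the n example from the question (my own illustration of the mechanism: with baseenv() as the enclosure, the lookup for n never reaches the global workspace):

n <- 10
f <- local(function(x) {
  return(x * n)
}, baseenv())

f(2)
# Error in f(2) : object 'n' not found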
You can force a variable to be the local version with this function:
get_local <- function(variable)
{
  get(variable, envir = parent.frame(), inherits = FALSE)
}
Compare these cases
y <- 0
f <- function()
{
  x <- y
}
print(f()) # 0
y <- 0
f <- function()
{
  y <- get_local("y")
  x <- y
}
print(f()) # Error: 'y' not found
Depending on what you are doing, you may also want to check to see if y was an argument to f, using formalArgs or formals.
g <- function(x, y = TRUE, z = c("foo", "bar"), ...) 0
formalArgs(g)
# [1] "x" "y" "z" "..."
formals(g)
#$x
#
#
#$y
#[1] TRUE
#
#$z
#c("foo", "bar")
#
#$...
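For example, a quick membership test (my own one-liner, reusing g from above):

"y" %in% formalArgs(g)  # TRUE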
EDIT: The more general point of 'how to turn off lexical scoping without changing the contents of functions' is harder to solve. I'm fairly certain that the scoping rules are pretty ingrained into R. An alternative might be to use S-Plus, since it has different scoping rules.
You can check if y exists in the global environment using exists('y', envir = .GlobalEnv)
What occasionally happens to me is that I've got a split screen in ESS with a file buffer of R code on the left and the interpreter on the right. I may have set some values in the interpreter while I debug the code I am working on in the buffer. It's then possible that the code in the buffer accidentally references something I set in the interpreter. That's a hard problem to detect unless I evaluate my buffer in a fresh interpreter every time, which is not how ESS works by default.
If this is the kind of problem you are seeing frequently, an rm(list=ls(envir=.GlobalEnv)) in the thing you source might help, but that of course creates other issues, such as erasing anything they were using to keep state while debugging, etc.
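Written out (my phrasing, with an explicit envir= on rm() so it clears the global workspace even if the sourced code is evaluated elsewhere):

# wipe the global workspace before defining anything else
rm(list = ls(envir = .GlobalEnv), envir = .GlobalEnv)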
I have a function that needs to access a variable in its parent environment (scope from which the function is called). The variable is large in terms of memory so I would prefer not to pass it to by value to the function being called. Is there a standard way of doing this other than declaring the variable in the global scope? For example:
g <- function(a, b) {
  # do stuff
}
f <- function(x) {
  y <- 3 # but in my program y is very large
  g(x, y)
}
I would like to access y in g(). So something like this:
g <- function(a) { a + y }
f <- function(x) {
  y <- 3 # but in my program y is very large
  g(x)
}
Is this possible?
Thanks
There is no advantage to "declaring the variable in the global scope", and it may not even be possible in R, depending on what you mean by that. You certainly could use the second form. The action that causes duplicate or even triplicate copies of an object is assignment. You will need to describe in more detail what you are trying to illustrate with the code y <- 3; that would not normally be needed inside a function that merely accessed an object named "y" located in an enclosing frame.
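To make that concrete (a small illustration of my own, not the answerer's code), tracemem() reports the moment the duplication actually happens:

y <- rnorm(1e5)
tracemem(y)   # start watching this object
z <- y        # a second name for the same object; no copy yet
z[1] <- 0     # this assignment triggers the duplication, and tracemem() prints it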
Storing variables in a declared environment will sometimes improve efficiency of access, but my understanding is that the efficiency is in terms of improved speed because a hash table is used. One accesses items in an environment in the same manner as one accesses list elements:
> evn <- new.env()
> evn$a <- rnorm(100000)
> ls(evn)
[1] "a"
> length(evn$a)
[1] 100000
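Applied to the f/g pair from the question (my adaptation, not the answerer's code): because environments are never duplicated on their own, g can read y through the environment without the large value being handed around by value:

g <- function(a, env) a + env$y
f <- function(x) {
  env <- new.env()
  env$y <- 3   # imagine y being very large
  g(x, env)
}
f(2)  # 5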
The BigMemory project (http://www.bigmemory.org/) may offer facilities for this.
It and Lumley's biglm may help with the large dataset mentioned in the comments.