R language Functions within lists - r

G'Day, I am a newbie at R and I have GOOGLED and read books and had lots of play, but I can't seem to figure out if what I am doing is implemented. It compiles (no interpreter spit) and can be called (again no spit), it just doesn't seem to want to do anything.
OK. SYNOPSIS.
I read that lists in R are the OBJECTS of other languages. So just for a Saturday and Sunday play I have been trying to get this to work.
GLOBAL <- list( counter = 1,
locked = FALSE,
important_value = 42,
copy_of_important_value = 42,
lock = function() { GLOBAL$locked = TRUE },
unlock = function() { GLOBAL$locked = FALSE },
is_locked = function() { return(GLOBAL$locked )},
visit = function() { GLOBAL$counter <- GLOBAL$counter + 1 })
> GLOBAL$locked
[1] FALSE
>
This works...
> GLOBAL$locked <- TRUE
> GLOBAL$locked
[1] TRUE
>
This does not
> GLOBAL$unlock()
> GLOBAL$locked
[1] TRUE
>
Has R got a $this or $self construct? None of this generates any errors. Just doesn't seem to want to do anything! (functions that is). I suppose I could set up a function as a routing access table, but I thought the encapsulation would be nifty.
Second question. It has been mentioned to me several times that R MUST keep all data in memory, and that is a limitation. Does that include swp on *NIX systems? I mean, if you had a humungus matrix could you just add some swap to make it fit?
Sorry for dumb newbie questions

This can be done using proto objects:
library(proto) # home page at http://r-proto.googlecode.com
GLOBAL <- proto( counter = 1,
locked = FALSE,
important_value = 42,
copy_of_important_value = 42,
lock = function(.) { .$locked = TRUE },
unlock = function(.) { .$locked = FALSE },
is_locked = function(.) { return(.$locked )},
visit = function(.) { .$counter <- .$counter + 1 })
GLOBAL$locked <- TRUE
GLOBAL$unlock()
GLOBAL$locked
## FALSE

The S3 way of doing things.
GLOBAL <- list(counter=1, locked=FALSE,
important_value=42, copy_of_important_value=42)
class(GLOBAL) <- "foo"
lock <- function(x, ...) UseMethod("lock")
lock.foo <- function(x)
{
x$locked <- TRUE
x
}
unlock <- function(x, ...) UseMethod("unlock")
unlock.foo <- function(x)
{
x$locked <- FALSE
x
}
is_locked <- function(x) x$locked
visit <- function(x)
{
x$counter <- x$counter + 1
x
}
GLOBAL <- lock(GLOBAL) # locked is now TRUE
GLOBAL <- unlock(GLOBAL) # locked is now FALSE

There's also the enclosure method
getGlobal <- function() {
counter <- 1
locked <- FALSE
important_value <- 42
list(
is_locked = function() locked,
lock = function() locked<<-TRUE,
unlock = function() locked<<-FALSE,
visit = function() {counter <<- counter + 1 }
)
}
And then you would use
GLOBAL <- getGlobal()
GLOBAL$is_locked()
# [1] FALSE
GLOBAL$lock()
GLOBAL$is_locked()
# [1] TRUE
So the state is stored in the enclosure and getGlobal returns a list of functions you can use to access those variables not otherwise exposed.

Nothing happened because
R doesn't have any variables to go get because they way you have it, = does not mean assignment has occurred inside the list(). So the only object in the global environment is GLOBAL. The way you're using = right now is assigning the list names to the left-hand side, and they are subsequently accessed with the $ operator.
Your functions are not returning a value as they're written. GLOBAL$locked() will not return a value if GLOBAL$locked <- FALSE is all you have inside the body of the function that calls it. So I wrap it in parentheses, and is returns our desire values.
So we just need to assign locked to the global environment first, then <<- will reassign it.
I shortened your list a bit. Here's a look:
> GLOBAL <- list(locked = assign("locked", FALSE, parent.frame()),
lock = function() { (GLOBAL$locked <<- TRUE) },
unlock = function() { (GLOBAL$locked <<- 'HELLO') },
is_locked = function() { return(NULL) })
> GLOBAL$locked
[1] FALSE
> GLOBAL$lock()
[1] TRUE
> GLOBAL$unlock()
[1] "HELLO"
> GLOBAL$is_locked()
NULL
Yes, a list has its own environment, separate from the global environment. An example of this is
> l <- list(x = 5, y = 10)
> within(l, {
f <- function(x) 2 * x
})
$x
[1] 5
$y
[1] 10
$f
function (x)
2 * x
<environment: 0xb041278>
but we are currently in the global environment
> environment()
<environment: R_GlobalEnv>
It's funny you should ask this question because I just asked a question about the same thing yesterday. MrFlick provided a very good explanation on that question.

Related

identical() but for environments/R6 in base R?

If I can run code before and after a user runs some code, how can I detect which variables were set or changed using base R? I can do this using identical() for non-environment objects. But is there a base-R solution for environments, including R6 classes?
Here's a solution using identical() which fails for envs/R6:
# Copy of initial vars
this_frame = sys.frame()
start_vars = ls()
start_copy = lapply(start_vars, get, envir = this_frame )
names(start_copy) = start_vars
# (user code here)
# Assess what's new and what's changed
end_vars = ls()
new_vars = end_vars[end_vars %in% start_vars == FALSE]
old_vars = end_vars[end_vars %in% start_vars == TRUE]
changed_vars = old_vars[sapply(old_vars, function(x) identical(get(x, envir = this_frame), start_copy[[x]])) == FALSE]
I'm writing a package that lets users run code in a separate session. I'd like to return only objects that were changed.
This solution detects changes in an environment, sub-environments, and R6-classes.
General approach
run start_state = env_as_list() on sys.frame()which stores everything in a list and recursively converts all environments/R6 and sub-environments/R6 to list.
Let the user manipulate stuff
Run end_state = env_as_list() and use identical() to detect changes between start_state and end_state.
env_as_list = function(env) {
rapply(
object = as.list(env, all.names = TRUE),
f = function(x) {
if ("R6" %in% class(x)) {
# R6 to list without recursion
x = as.list(x, all.names = TRUE)
x$.__enclos_env__$self = NULL
x$.__enclos_env__$super = NULL
env_as_list(x)
} else if (is.environment(x)) {
env_as_list(x)
} else {
stop("Impossible to get here")
}
},
classes = c("environment", "R6"),
how = "replace"
)
}
Demonstration
Let's test it: let's fill globalenv() with a some stuff to begin with:
R6_class = R6::R6Class("Testing", list(a = 1))
my_R6 = R6_class$new()
my_env = new.env()
my_env$sub_env = new.env()
my_env$sub_env$some_value = 2
my_regular = rnorm(5)
Snapshot time!
start_state = env_as_list(sys.frame())
Let the user play:
my_R6$a = 99 # Change R6
new_regular = 3 # new var
my_env$sub_env$some_value = 99 # Change sub-environment
Snapshot again!
end_state = env_as_list(sys.frame())
end_state$start_state = NULL # don't include this
Did nothing change?
> identical(start_state, end_state))
# FALSE
Which variables changed?
> is_same = lapply(names(end_state), function(x) identical(start_state[[x]], end_state[[x]]))
> names(end_state)[is_same == FALSE]
# "my_env" "new_regular" "my_R6"
Bonus
You can also use this to compute the size of an environment, including all R6 and sub-environments. Simply:
object.size(env_as_list(globalenv()))

Exporting function inputs in parallel processing in R

I am trying to write a function which has a parallel computation option.
To make it work in both windows, and mac or Linux environments, I am using a PSOCK system, which I believe is a default configuration in makeCluster(). My question is whether I should, or it is more desirable, to pass all arguments to the clusters using a clusterExport function. If I do this, I think I need to evaluate all input arguments-- instead of the default lazy evaluation. If some variables are used only in some special cases, this does not seem desirable.
For example, in the following code, I am wondering whether I should add
clusterExport(varlist = c("a","b","c"),cl = cl,envir = environment()) in the function. The following code works fine in my computer, but a similar code failed in other's computer.
I would be very interested to hear about the best practice as well. Thank you!
library(pbapply)
foo = function(a=3, b=4, c=5, B = 8, parallel = FALSE){
if(parallel) {cl = makeCluster(4) } else{cl = NULL}
# default a,b,c values are desired to be used
if(a>5){
# c is used only in this case
v= pbsapply(1:B,FUN = function(i) {Sys.sleep(.5); a+b+c+i},cl = cl)
}else{
v= pbsapply(1:B,FUN = function(i) {Sys.sleep(.5); a+b+i},cl = cl)
}
if(parallel) stopCluster(cl)
return(v)
}
system.time(foo())
system.time(foo(parallel = T))
You could try to set defaults to NULL and do a case handling using sapply. I'm not sure, though, if this really works, because I can't reproduce your error.
foo <- function(a=NULL, b=NULL, c=NULL, B=NULL, parallel=FALSE) {
if (parallel) {
cl <- makeCluster(detectCores() - 4) ## safer to code this more dynamically
## case handling:
sapply(c("a", "b", "c", "B"), function(x) {
if (!is.null(get(x))) clusterExport(cl, x, environment())
})
} else {
cl <- NULL
}
# default a,b,c values are desired to be used
if (a > 5) {
# c is used only in this case
v <- pbsapply(1:B, FUN=function(i) {
Sys.sleep(.2)
a + b + c + i
}, cl=cl)
} else {
v <- pbsapply(1:B, FUN=function(i) {
Sys.sleep(.2)
a + b + i
}, cl=cl)
}
if (parallel) stopCluster(cl)
return(v)
}
foo(a=3, b=4, c=5, B=8, parallel=TRUE)
# |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s
# [1] 8 9 10 11 12 13 14 15

R function for obtaining a reference to a variable

In Advanced R, environments are advertised as a useful way to get pass-by-reference semantics in R: instead of passing a list, which gets copied, I can pass an environment, which is not. This is useful to know.
But it assumes that whoever is calling my function is happy to agree on an "environment"-based data type, with named slots corresponding to the variables we want to modify.
Hasn't someone made a class which allows me to just refer to a single variable by reference? For example,
v = 1:5
r <- ref(v)
(function() {
getRef(r) # same as v
setRef(r, 1:6) # same as v <<- 1:6, in this case
})()
It would seem to be pretty easy to do this, by storing the character name of v together with the environment where it is bound.
Is there a standard library which accomplishes this semantics, or can someone provide a short snippet of code? (I haven't finished reading "Advanced R"; apologies if this is covered later in the book)
As you have already mentioned in your question, you can store the variable name and its environment and access it with get and assign what will be somehow like a reference to a single variable.
v <- 1:5
r <- list(name="v", env=environment())
(function() {
get(r$name, envir = r$env)
assign(r$name, 1:6, envir = r$env)
})()
v
#[1] 1 2 3 4 5 6
Alternatively you can store the reference to an environment but then you can access everything in this referenced environment.
v <- 1:5
r <- globalenv() #reference to everything in globalenv
(function() {
r$v
r$v <- 1:6
})()
v
#[1] 1 2 3 4 5 6
You can also create an environment with only one variable and make a reference to it.
v <- new.env(parent=emptyenv())
v$v <- 1:5
r <- v
(function() {
r$v
r$v <- 1:6
})()
v$v
#[1] 1 2 3 4 5 6
Implemented as functions using find or set the environment during creation. Have also a look at How to get environment of a variable in R.
ref <- function(name, envir = NULL) {
name <- substitute(name)
if (!is.character(name)) name <- deparse(name)
if(length(envir)==0) envir <- as.environment(find(name))
list(name=name, envir=envir)
}
getRef <- function(r) {
get(r$name, envir = r$envir, inherits = FALSE)
}
setRef <- function(r, x) {
assign(r$name, x, envir = r$envir, inherits = FALSE)
}
x <- 1
r1 <- ref(x) #x from Global Environment
#x from Function Environment
r2 <- (function() {x <- 2; ref(x, environment())})()
#But simply returning x might here be better
r2b <- (function() {x <- 2; x})()
a <- new.env(parent=emptyenv())
a$x <- 3
r3 <- ref(x, a) #x from Environment a
This is based on GKi's answer, thanks to him for stepping up.
It includes pryr::where so you don't have to install the whole library
Note that we need to point "where" to parent.frame() in the definition of "ref"
Added some test cases which I used to check correctness
The code:
# copy/modified from pryr::where
where = function(name, env=parent.frame()) {
if (identical(env, emptyenv())) {
stop("Can't find ", name, call. = FALSE)
}
if (exists(name, env, inherits = FALSE)) {
env
} else {
where(name, parent.env(env))
}
}
ref <- function(v) {
arg <- deparse(substitute(v))
list(name=arg, env=where(arg, env=parent.frame()))
}
getRef <- function(r) {
get(r$name, envir = r$env, inherits = FALSE)
}
setRef <- function(r, x) {
assign(r$name, x, envir = r$env)
}
if(1) { # tests
v <- 1:5
r <- ref(v)
(function() {
stopifnot(identical(getRef(r),1:5))
setRef(r, 1:6)
})()
stopifnot(identical(v,1:6))
# this refers to v in the global environment
v=2; r=(function() {ref(v)})()
stopifnot(getRef(r)==2)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==5)
# same as above
v=2; r=(function() {v <<- 3; ref(v)})()
stopifnot(getRef(r)==3)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==5)
# this creates a local binding first, and refers to that. the
# global binding is unaffected
v=2; r=(function() {v=3; ref(v)})()
stopifnot(getRef(r)==3)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==2)
# additional tests
r=(function() {v=4; (function(v1) { ref(v1) })(v)})()
stopifnot(r$name=="v1")
stopifnot(getRef(r)==4)
setRef(r,5)
stopifnot(getRef(r)==5)
# check that outer v is not modified
v=2; r=(function() {(function(v1) { ref(v1) })(v)})()
stopifnot(getRef(r)==2)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==2)
}
I imagine there may be some garbage collection inefficiency if you're creating a reference to a small variable in a temporary environment with a different large variable, since the reference must retain the whole environment - although the same problem could arise with other uses of lexical scoping.
I will probably use this code next time I need pass-by-reference semantics.

Proper way to have two functions access a single function's environment?

Based on the answer provided in1088639, I set up a pair of functions which both access the same sub-function's environment. This example works, but I wanted to see if I'd missed some cleaner way to "connect" both top-level functions to the internal environment.
( Back story: I wanted to write a pair of complementary functions which shared a variable, e.g. "count" in this example, and meet CRAN package requirements which do not allow functions to modify the global environment. )
static.f <- function() {
count <- 0
f <- function(x) {
count <<- count + 1
return( list(mean=mean(x), count=count) )
}
return( f )
}
# make sure not to delete this command, even tho' it's not
# creating a function.
f1 <- static.f()
statfoo <- function(x){
tmp<-f1(x)
tmp<- list(tmp,plus=2)
return(tmp)
}
statbar <- function(x){
tmp<-f1(x)
tmp<- list(tmp,minus=3)
return(tmp)
}
Sample output:
> statfoo(5)
[[1]]
[[1]]$mean
[1] 5
[[1]]$count
[1] 1
$plus
[1] 2
Rgames> statfoo(5)
[[1]]
[[1]]$mean
[1] 5
[[1]]$count
[1] 2
$plus
[1] 2
> statbar(4)
[[1]]
[[1]]$mean
[1] 4
[[1]]$count
[1] 3
$minus
[1] 3
> statfoo(5)
[[1]]
[[1]]$mean
[1] 5
[[1]]$count
[1] 4
$plus
[1] 2
A cleaner method would be to use an object oriented approach. There is already an answer using reference classes.
A typical object oriented approach with classes would create a class and then create a singleton object, i.e. a single object of that class. Of course it is a bit wasteful to create a class only to create one object from it so here we provide a proto example. (Creating a function to enclose count and the function doing the real work has a similar problem -- you create an enclosing function only to run it once.) The proto model allows one to create an object directly bypassing the need to create a class only to use it once. Here foobar is the proto object with property count and methods stats, statfoo and statbar. Note that we factored out stats to avoid duplicating its code in each of statfoo and statbar. (continued further down)
library(proto)
foobar <- proto(count = 0,
stats = function(., x) {
.$count <- .$count + 1
list(mean = mean(x), count = .$count)
},
statfoo = function(., x) c(.$stats(x), plus = 2),
statbar = function(., x) c(.$stats(x), plus = 3)
)
foobar$statfoo(1:3)
foobar$statbar(2:4)
giving:
> foobar$statfoo(1:3)
$mean
[1] 2
$count
[1] 1
$plus
[1] 2
> foobar$statbar(2:4)
$mean
[1] 3
$count
[1] 2
$plus
[1] 3
A second design would be to have statfoo and statbar as independent functions and only keep count and stats in foobar (continued further down)
library(proto)
foobar <- proto(count = 0,
stats = function(., x) {
.$count <- .$count + 1
list(mean = mean(x), count = .$count)
}
)
statfoo <- function(x) c(foobar$stats(x), plus = 2)
statbar <- function(x) c(foobar$stats(x), plus = 3)
statfoo(1:3)
statbar(2:4)
giving similar output to the prior example.
Third approach Of course the second variation could easily be implemented by using local and a function getting us close to where you started. This does not use any packages but does not create a function only to throw it away:
foobar <- local({
count <- 0
function(x) {
count <<- count + 1
list(mean = mean(x), count = count)
}
})
statfoo <- function(x) c(foobar(x), plus = 2)
statbar <- function(x) c(foobar(x), plus = 3)
statfoo(1:3)
statbar(2:4)
Another simple option is tocreate an environment and assign it to both functions. Here I use simpler functions for illustrative purposes, but this can be easily extended:
f1 <- function() {count <<- count + 1; return(paste("hello", count))}
f2 <- function() {count <<- count + 1; return(paste("goodbye", count))}
environment(f1) <- environment(f2) <- list2env(list(count=0))
Then:
> f1()
[1] "hello 1"
> f2()
[1] "goodbye 2"
> f1()
[1] "hello 3"
Both functions have the same environment.
You can use reference class like this:
foobar <- setRefClass(
'foobar',
fields = list(count='numeric'),
methods = list(
initialize=function() {
.self$initFields(count = 0L)
},
statfoo = function(x) {
count <<- count + 1L
list(list(mean=mean(x), count=count), plus=2)
},
statbar = function(x){
count <<- count + 1L
list(list(mean=mean(x), count=count), minus=3)
}
)
)()
foobar$statfoo(5)
foobar$statbar(3)
It makes it relatively clear that neither statfoo nor statbar is a pure function.
You could get rid of the factory functions, and more explicitly use environments. A solution like this would work as well
.env<-(function() {
count <- 0
f <- function(x) {
count <<- count + 1
return( list(mean=mean(x), count=count))
}
return(environment())
})()
statfoo <- function(x){
list(.env$f(x),plus=2)
}
statbar <- function(x){
list(.env$f(x),minus=3)
}
The .env variable is created by immediately executing an anonymous function to get its environment. We then extract the function from the environment itself to modify its values.

Detect if environment is global environment

Sometimes it may be useful to detect if the environment is the global environment or not and act accordingly. I have come up with what I believe is a way to detect the environment and test if it's the global environment. I just don't want to be reinventing the wheel if there's a better way or if this has holes etc. Is there some sort of built in R method to do what global_test does below or a better approach?
global_test <- function() {
environmentName(parent.frame(n = 1)) == "R_GlobalEnv"
}
global_test()
lapply(1:10, function(i) {
global_test()
})
fun <- function() global_test()
fun()
I would simplify your life a little and use identical:
global_test <- function() {
identical( parent.frame(n = 1) , globalenv() )
}
And I think this should be slightly 'safer' than doing a character comparison because you can do this:
e <- new.env()
attr(e,"name") <- "R_GlobalEnv"
# And then...
environmentName(e)
#[1] "R_GlobalEnv"
And as pointed out by #eddi, using .GlobalEnv may also not be desirable because one can do:
.GlobalEnv <- 1
identical( parent.frame(n = 1) , .GlobalEnv )
#[1] FALSE
This use of identical is in fact one of the examples from the help page on ?identical:
## even for unusual R objects :
identical(.GlobalEnv, environment())
So even if we try to trick R the function still works:
e <- new.env()
attr(e,"name") <- "R_GlobalEnv"
.GlobalEnv <- 1
global_test()
#[1] TRUE
Maybe sys.nframe?
sys.nframe() == 0L
#[1] TRUE
fun <- function() {
sys.nframe() == 0L
}
fun()
#[1] FALSE

Resources