I would like to perform an analysis with the following structure:
a <- list()
replicate_letter <- function(letter) {
  return(data.frame(first_letter = rep(letter, 10),
                    second_letter = rep(letter, 10)))
}
get_letter <- function(letter) {
  if (is.null(a[[letter]])) {
    a[[letter]] <- replicate_letter(letter)
  }
  # Do further analysis, plotting,....
}
and then run it repeatedly just by calling get_letter(). However, this does not work: the function runs, but a is not altered.
I have figured out that this is due to the attempt to change the list inside the function, because the same logic without a function, like this,
a <- list()
replicate_letter <- function(letter) {
  return(data.frame(first_letter = rep(letter, 10),
                    second_letter = rep(letter, 10)))
}
for (letter in letters[1:3]) {
  if (is.null(a[[letter]])) {
    a[[letter]] <- replicate_letter(letter)
  }
}
a
runs fine. How do I need to change the function get_letter to make it work? Is it possible?
The problem is that a copy of the list is modified in the function's environment, and that copy is discarded when the function exits. R differs in this way from many other languages: a function does not (by default) modify variables in the global environment, so assigning to a[[letter]] inside the function creates a modified local copy of a instead.
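A minimal sketch of that copy behaviour, using a hypothetical helper add_element() that is not part of the question: the list grows inside the function, but the caller's a is left untouched.

```r
# Hypothetical helper: modifies the list it receives and reports the new length
add_element <- function(lst) {
  lst$x <- 1      # changes only the function's local copy
  length(lst)
}

a <- list()
n_inside <- add_element(a)  # length seen inside the function: 1
n_outside <- length(a)      # the caller's list is unchanged: 0
```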
You should have your function return the new list:
get_letter <- function(a, letter) {
  if (is.null(a[[letter]])) {
    a[[letter]] <- replicate_letter(letter)
  }
  # Do further analysis, plotting,....
  return(a)
}
Calling:
a <- get_letter(a, 'c')
I'm not quite sure I understand your situation, so perhaps take this with a grain of salt. But be aware that you are assigning the output of a[[letter]] <- replicate_letter(letter) only within the function (and that local result disappears afterwards). You could try the superassignment operator <<- instead (see ?`<<-`). Consider:
a <- list()  # `a` must already exist for get_letter() to look it up
replicate_letter <- function(letter) {
  return(data.frame(first_letter = rep(letter, 10),
                    second_letter = rep(letter, 10)))
}
get_letter <- function(letter) {
  if (is.null(a[[letter]])) {
    a[[letter]] <<- replicate_letter(letter)
  }
  # Do further analysis, plotting,....
}
get_letter("a")
a
# $a
#    first_letter second_letter
# 1             a             a
# 2             a             a
# 3             a             a
# 4             a             a
# 5             a             a
# 6             a             a
# 7             a             a
# 8             a             a
# 9             a             a
# 10            a             a
The help for <<- reads (in part):
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment. Note that their semantics differ from that in the S language, but are useful in conjunction with the scoping rules of R. See ‘The R Language Definition’ manual for further details and examples.
(The boldface is mine.)
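The search described in the help page can be checked directly. In this sketch (outer_fun(), inner_fun() and make_new() are illustrative names, not from the question), <<- stops at the first enclosing binding it finds, and only falls back to the global environment when no binding exists anywhere on the chain.

```r
x <- "global"

outer_fun <- function() {
  x <- "outer"
  inner_fun <- function() x <<- "set by inner_fun"
  inner_fun()   # <<- finds outer_fun's x first and rebinds that one
  x
}

res <- outer_fun()   # "set by inner_fun"
x                    # still "global": the search stopped at outer_fun

# With no existing binding on the chain, <<- assigns in the global environment
make_new <- function() new_var <<- 42
make_new()
exists("new_var")    # TRUE
```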
In Advanced R, environments are advertised as a useful way to get pass-by-reference semantics in R: instead of passing a list, which gets copied, I can pass an environment, which is not. This is useful to know.
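The difference the book describes can be seen with two hypothetical helpers, modify_list() and modify_env(), that each try to change their argument: only the environment version is visible to the caller, because environments are not copied when they are passed to a function.

```r
# Hypothetical helpers that both try to change their argument
modify_list <- function(l) { l$v <- 1:6; invisible(NULL) }
modify_env  <- function(e) { e$v <- 1:6; invisible(NULL) }

l <- list(v = 1:5)
e <- new.env()
e$v <- 1:5

modify_list(l)  # changes a local copy of the list only
modify_env(e)   # the environment is shared, so e itself is changed

l$v  # still 1:5
e$v  # now 1:6
```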
But it assumes that whoever is calling my function is happy to agree on an "environment"-based data type, with named slots corresponding to the variables we want to modify.
Hasn't someone made a class which allows me to just refer to a single variable by reference? For example,
v = 1:5
r <- ref(v)
(function() {
getRef(r) # same as v
setRef(r, 1:6) # same as v <<- 1:6, in this case
})()
It would seem to be pretty easy to do this, by storing the character name of v together with the environment where it is bound.
Is there a standard library which accomplishes this semantics, or can someone provide a short snippet of code? (I haven't finished reading "Advanced R"; apologies if this is covered later in the book)
As you have already mentioned in your question, you can store the variable name and its environment and access it with get() and assign(), which behaves somewhat like a reference to a single variable.
v <- 1:5
r <- list(name="v", env=environment())
(function() {
get(r$name, envir = r$env)
assign(r$name, 1:6, envir = r$env)
})()
v
#[1] 1 2 3 4 5 6
Alternatively, you can store a reference to an environment, but then you can access everything in that referenced environment.
v <- 1:5
r <- globalenv() #reference to everything in globalenv
(function() {
r$v
r$v <- 1:6
})()
v
#[1] 1 2 3 4 5 6
You can also create an environment with only one variable and make a reference to it.
v <- new.env(parent=emptyenv())
v$v <- 1:5
r <- v
(function() {
r$v
r$v <- 1:6
})()
v$v
#[1] 1 2 3 4 5 6
Implemented as functions, either finding the environment with find() or setting it explicitly at creation time. Also have a look at "How to get environment of a variable in R".
ref <- function(name, envir = NULL) {
  name <- substitute(name)
  if (!is.character(name)) name <- deparse(name)
  # use is.null(): length() of an environment counts its objects, so an
  # empty environment passed as envir would otherwise be ignored
  if (is.null(envir)) envir <- as.environment(find(name)[1])
  list(name = name, envir = envir)
}
getRef <- function(r) {
  get(r$name, envir = r$envir, inherits = FALSE)
}
setRef <- function(r, x) {
  assign(r$name, x, envir = r$envir, inherits = FALSE)
}
x <- 1
r1 <- ref(x) #x from Global Environment
#x from Function Environment
r2 <- (function() {x <- 2; ref(x, environment())})()
#But simply returning x might here be better
r2b <- (function() {x <- 2; x})()
a <- new.env(parent=emptyenv())
a$x <- 3
r3 <- ref(x, a) #x from Environment a
This is based on GKi's answer; thanks to him for stepping up.
It includes a copy of pryr::where, so you don't have to install the whole package.
Note that we need to point where() at parent.frame() in the definition of ref().
I added some test cases which I used to check correctness.
The code:
# copy/modified from pryr::where
where = function(name, env=parent.frame()) {
  if (identical(env, emptyenv())) {
    stop("Can't find ", name, call. = FALSE)
  }
  if (exists(name, env, inherits = FALSE)) {
    env
  } else {
    where(name, parent.env(env))
  }
}
ref <- function(v) {
  arg <- deparse(substitute(v))
  list(name=arg, env=where(arg, env=parent.frame()))
}
getRef <- function(r) {
  get(r$name, envir = r$env, inherits = FALSE)
}
setRef <- function(r, x) {
  assign(r$name, x, envir = r$env)
}
if(1) { # tests
  v <- 1:5
  r <- ref(v)
  (function() {
    stopifnot(identical(getRef(r),1:5))
    setRef(r, 1:6)
  })()
  stopifnot(identical(v,1:6))
  # this refers to v in the global environment
  v=2; r=(function() {ref(v)})()
  stopifnot(getRef(r)==2)
  setRef(r,5)
  stopifnot(getRef(r)==5)
  stopifnot(v==5)
  # same as above
  v=2; r=(function() {v <<- 3; ref(v)})()
  stopifnot(getRef(r)==3)
  setRef(r,5)
  stopifnot(getRef(r)==5)
  stopifnot(v==5)
  # this creates a local binding first, and refers to that. the
  # global binding is unaffected
  v=2; r=(function() {v=3; ref(v)})()
  stopifnot(getRef(r)==3)
  setRef(r,5)
  stopifnot(getRef(r)==5)
  stopifnot(v==2)
  # additional tests
  r=(function() {v=4; (function(v1) { ref(v1) })(v)})()
  stopifnot(r$name=="v1")
  stopifnot(getRef(r)==4)
  setRef(r,5)
  stopifnot(getRef(r)==5)
  # check that outer v is not modified
  v=2; r=(function() {(function(v1) { ref(v1) })(v)})()
  stopifnot(getRef(r)==2)
  setRef(r,5)
  stopifnot(getRef(r)==5)
  stopifnot(v==2)
}
I imagine there may be some garbage-collection inefficiency if you create a reference to a small variable in a temporary environment that also holds a large variable, since the reference must retain the whole environment; however, the same problem can arise with other uses of lexical scoping.
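A rough sketch of that retention effect, assuming a reference is stored as a list(name, env) pair as in the code above; make_ref_in_frame() is a hypothetical function. The reference only names small, yet the large vector big stays reachable through r$env, so it cannot be garbage-collected while r exists.

```r
make_ref_in_frame <- function() {
  big <- numeric(1e6)   # a large object we never intended to keep
  small <- 1
  list(name = "small", env = environment())  # "reference" to small only
}

r <- make_ref_in_frame()
get(r$name, envir = r$env)                      # 1
exists("big", envir = r$env, inherits = FALSE)  # TRUE: big is kept alive too
```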
I will probably use this code next time I need pass-by-reference semantics.
I'm having a little trouble understanding why, in R, the two functions below, functionGen1 and functionGen2, behave differently. Both functions attempt to return another function which simply prints the number passed as an argument to the function generator.
In the first instance the generated functions fail because a is no longer present in the global environment, but I don't understand why it needs to be. I would have thought it was passed as an argument and replaced with aNumber in the namespace of the generator function and of the printing function.
My question is: Why do the functions in the list list.of.functions1 no longer work when a is not defined in the global environment? (And why does this work for the case of list.of.functions2 and even list.of.functions1b)?
functionGen1 <- function(aNumber) {
  printNumber <- function() {
    print(aNumber)
  }
  return(printNumber)
}
functionGen2 <- function(aNumber) {
  thisNumber <- aNumber
  printNumber <- function() {
    print(thisNumber)
  }
  return(printNumber)
}
list.of.functions1 <- list.of.functions2 <- list()
for (a in 1:2) {
  list.of.functions1[[a]] <- functionGen1(a)
  list.of.functions2[[a]] <- functionGen2(a)
}
rm(a)
# Throws an error "Error in print(aNumber) : object 'a' not found"
list.of.functions1[[1]]()
# Prints 1
list.of.functions2[[1]]()
# Prints 2
list.of.functions2[[2]]()
# However this produces a list of functions which work
list.of.functions1b <- lapply(c(1:2), functionGen1)
A more minimal example:
functionGen1 <- function(aNumber) {
  printNumber <- function() {
    print(aNumber)
  }
  return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#Error in print(aNumber) : object 'a' not found
Your question is not about namespaces (that's a concept related to packages), but about variable scoping and lazy evaluation.
Lazy evaluation means that function arguments are only evaluated when they are needed. Until you call myfun, it is not necessary to evaluate aNumber = a. But since a has been removed by then, this evaluation fails.
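Lazy evaluation can be observed directly in a small sketch (ignore_arg() and gen() are illustrative names): an argument that is never used is never evaluated, and a promise captured by a closure is evaluated only when the closure first needs it.

```r
# The argument is never used, so the expression passed in is never evaluated
ignore_arg <- function(x) "x was never evaluated"
res <- ignore_arg(stop("this error is never raised"))

# The promise for n is only forced when the returned function first uses it,
# so changing `a` before that call changes the result
gen <- function(n) function() n
a <- 1
f <- gen(a)
a <- 99
f()   # 99
```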
The usual solution is to force evaluation explicitly as you do with your functionGen2 or, e.g.,
functionGen1 <- function(aNumber) {
  force(aNumber)
  printNumber <- function() {
    print(aNumber)
  }
  return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#[1] 1
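The loop from the question behaves the same way. This sketch, with hypothetical generators gen_lazy() and gen_forced(), compares the two: without force() every closure sees the loop variable's final value, while force() captures the value at creation time.

```r
gen_lazy   <- function(n) function() n
gen_forced <- function(n) {
  force(n)          # evaluate the promise now, while i has its current value
  function() n
}

lazy <- forced <- list()
for (i in 1:2) {
  lazy[[i]]   <- gen_lazy(i)
  forced[[i]] <- gen_forced(i)
}

lazy[[1]]()    # 2: the promise is forced after the loop, when i == 2
forced[[1]]()  # 1: the value was captured when the closure was created
```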
I'm trying to figure out how to allow a function to directly alter or create variables in its parent environment, whether the parent environment is the global environment or another function.
For example if I have a function
my_fun <- function(){
  a <- 1
}
I would like a call to my_fun() to produce the same results as doing a <- 1.
I know that one way to do this is by using parent.frame as per below, but I would prefer a method that doesn't involve rewriting every variable assignment.
my_fun <- function(){
  env = parent.frame()
  env$a <- 1
}
Try using with():
g <- function(env = parent.frame()) with(env, { b <- 1 })
g()
b
## [1] 1
Note that it is normally preferable to pass the variables back as return values rather than create them directly in the parent frame. If you have many variables to return, you can always return them in a list, e.g. h <- function() list(a = 1, b = 2); result <- h(). Now result$a and result$b hold the values of a and b.
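A slightly longer sketch of the return-a-list pattern. The last line uses list2env() to unpack the result into the caller's environment; that is an optional alternative, and it keeps the assignment visible at the call site rather than hidden inside h().

```r
h <- function() {
  a <- 1
  b <- 2
  list(a = a, b = b)   # return every value the caller needs
}

result <- h()
result$a   # 1
result$b   # 2

# If separate variables are really wanted, the caller unpacks them explicitly
invisible(list2env(result, envir = environment()))
```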
Also see Function returning more than one value.
How can I generate a tree with an unknown number of nodes, each of which has an unknown and varying number of children, with the condition that a list of the child nodes for a given parent node is generated by some fun(parent)? Note that I'm using library(data.tree) from CRAN to make my tree hierarchy.
The tree will always begin with a node defined by a given parent vector. There will always be a finite amount of nodes. Every node will have the same length as the root node.
I've tried to create the question in a general sense out of context, but it has just been too general to provide definitive feedback. Accordingly, here is the script that is presently not quite there:
require(data.tree)
#also requires Generating Scripts (link at bottom) to run
# Helper function to insert nodes as children of parents with unique names
i=1
assn <- function(child,parentvarname){
  child<-paste(child,collapse=" ")
  nam <- paste("v", i, sep = "")
  # assign node to variable called vi
  # and make the tree global so it can be seen outside the function
  assign(nam, parentvarname$AddChild(child),envir = .GlobalEnv)
  noquote(nam)->a
  i+1
  a #output the child variable name vi for the sake of recursion
}
cdrtree<- function(root){
  #assign root
  v0 <- Node$new(root) #assign root to the root of the tree
  node<-root #rename variable for clarity in next step
  kidparentname<-v0 #recursion starts at v0
  have.kids<-function(node){ #this is unfortunately asexual reproduction...
    for(pointer in cdrpointers(node)){ #A variable number of pointers are
      #used to determine the next node(s) if any with function cdrmove
      cdrmove(node,pointer)->newkid #make a child
      assn(newkid,kidparentname) #enter this node in the tree hierarchy
      #get the name of newkid for next iteration and write name to tree
      kidparentname<-assn(newkid,kidparentname)
      node<-newkid #rename node variable for the next iteration
      have.kids(newkid) #recurse, likely the problem is here
    }
    return(v0) #return the tree (if the code works...)
  }
}
Running the script on a possible root node gives a strange result:
> cdrtree(c(1,-2,3))
> cdrtree(c(1,-2,3))->a
> a
function(node){ #this is unfortunately asexual reproduction...
for(pointer in cdrpointers(node)){ #A variable number of pointers are
... #all code as written above ...
}
<environment: 0x00000000330ee348>
If you want a true working example, you can grab and source "Generating Scripts.R" from here and run it with any signed permutation of 1:n with n>2 as an argument similar to my example.
To be extra clear, the tree with root node c(1,-2,3) would hypothetically look something like this:
I don't think your functions are working as expected. For example, using your starting value,
lapply(cdrpointers(c(1,-2,3)), function(i) cdrmove(c(1,-2,3), i))
[[1]]
[1] 1 2 3
[[2]]
[1] 1 2 3
But, assuming those work, you could try the following and determine whether they are being used incorrectly.
## Name nodes uniquely; don't assign to the .GlobalEnv like
## you are doing in `assn`, which won't work because `i` isn't being incremented.
## You could increment `i` in the global environment, but instead
## I would encapsulate `i` in the closure's enclosing environment, avoiding possible conflicts
nodeNamer <- function() {
  i <- 0
  ## Note: `i` is incremented outside of the scope of this function using `<<-`
  function(node) sprintf("v%g", (i <<- i+1))
}
## Load your functions; I haven't looked at these too closely,
## so I'm just going to assume they work
source(file="https://raw.githubusercontent.com/zediiiii/CDS/master/Generating%20Scripts.r")
cdrtree <- function(root.value) {
  root <- Node$new('root') # assign root
  root$value <- root.value # There seems to be a separation of value from name
  name_node <- nodeNamer() # initialize the node counter to name the nodes
  ## Define your recursive helper function
  ## Note: you could do without this and have `cdrtree` take an additional
  ## parameter, say tree=NULL. But I think the separation is nice.
  have.kids <- function(node) {
    ## this function (`cdrpointers`) needs work: it should return a 0-length list, not print
    ## something and then error if there are no values
    ## (or throw an error with the message if that is what you want)
    pointers <- tryCatch({cdrpointers(node$value)}, error=function(e) return( list() ))
    if (!length(pointers)) return()
    for (pointer in pointers) {
      child_val <- cdrmove(node$value, pointer) # does this always work?
      child <- Node$new(name_node()) # give the node a name
      child$value <- child_val
      child <- node$AddChildNode(child)
      Recall(child) # recurse with child
    }
  }
  have.kids(root)
  return( root )
}
library(data.tree)
res <- cdrtree(root.value=c(1,-2,3))
After much help from #TheTime I have a solid solution to this question.
Though it's a lot of code, I would like to post it because there are a few tricky issues with duplicate values:
####################
# function: cdrtree()
# purpose: Generates a CDR tree with uniquely named nodes (uniqueness is required for igraph export)
# parameters: root.value: the value of the seed to generate the tree. Values of length>6 are not recommended.
# Author: Joshua Watson Nov 2015, help from TheTime #stackoverflow
# Dependencies: sort.listss.r ; gen.bincomb.r
require(combinat)
require(data.tree)
#Two helper functions for keeping names distinct.
nodeNamer <- function() {
  i <- 0
  function(node) sprintf("v%g", (i <<- i+1))
}
nodeNamer2 <- function() {
  j <- 0
  function(node) sprintf("%g", (j <<- j+1))
}
cdrtree <- function(root.value, make.igraph=FALSE) {
  templist<- list()
  root <- Node$new('v0')
  root$value <- root.value
  root$name <- paste(unlist(root$value),collapse=' ') #name this the same as the value collapsed in type char
  name.node <- nodeNamer() # initialize the node counters to name the nodes
  name.node2 <- nodeNamer2()
  #recursive function that produces children and names them appropriately
  have.kids <- function(node) {
    pointers <- tryCatch({cdrpointers(node$value)}, error=function(e) return( list() ))
    if (!length(pointers)) return()
    for (pointer in pointers) {
      child.val <- cdrmove(node$value, pointer) #make the cdr move on the first pointer
      child <- Node$new(name.node())
      child$value <- child.val
      #child$name <- paste(" ",unlist(child$value),collapse=' ') # Name it for text
      child$name <- paste(unlist(child$value),collapse=' ') # Name it For Graphics
      child <- node$AddChildNode(child)
      #identical ending name handling catches duplicates. Names WIN+, WIN-, and DRAW outcomes
      endname<-paste(unlist(tail(gen.cdrpile(length(root.value)), n=1)[[1]]),collapse=' ')
      startname<-paste(unlist(root$value),collapse=' ')
      if(child$name==endname){
        child$name <- paste(name.node2(),"-WIN",child$name,sep='')
      } else {
        if(child$name==startname){
          child$name <- paste(name.node2(),"+WIN",child$name,sep='')
        } else {
          #if all negative or all positive then it is terminal and could be a duplicate, rename it
          if((sum(child$value < 0) == length(root.value)) || (sum(child$value < 0 ) == 0 )){
            child$name <- paste(name.node2(),"DRAW",child$name,sep='')
          } else {
            #catch the other duplicate cases that aren't listed above
            if((child$name %in% templist == TRUE) || (child$name == root$name)){
              child$name <- paste(name.node2(),"DUP",child$name,sep='')
              #templist[[length(pointerlist)+1]] <-
            }
          }
        }
      }
      #make a list of names for the last duplicate catcher
      append(child$name,templist)->>templist
      Recall(child) # recurse with child
    }
  }
  have.kids(root)
  return( root )
}
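For readers who want to experiment without data.tree or the CDR helper scripts, the same recursive pattern (generate children with a function, recurse into each child, name nodes with a closure counter) can be sketched with plain nested lists. This is a swapped-in illustration, not the author's code: children_of() is a hypothetical stand-in for cdrpointers() and cdrmove(), and build_tree(), count_nodes() and all_names() are illustrative names.

```r
# Hypothetical child generator standing in for cdrpointers() + cdrmove():
# every value gets two children until it has length 3.
children_of <- function(value) {
  if (length(value) >= 3) return(list())
  list(c(value, 0), c(value, 1))
}

build_tree <- function(root_value, children_fun) {
  counter <- 0
  next_name <- function() {
    counter <<- counter + 1   # same closure trick as nodeNamer()
    paste0("v", counter)
  }
  grow <- function(value) {
    name <- next_name()       # name this node before its children (pre-order)
    kids <- lapply(children_fun(value), grow)
    list(name = name, value = value, children = kids)
  }
  grow(root_value)
}

# Helpers to inspect the result
count_nodes <- function(node) {
  1 + sum(vapply(node$children, count_nodes, numeric(1)))
}
all_names <- function(node) {
  c(node$name, unlist(lapply(node$children, all_names)))
}

tree <- build_tree(1, children_of)
count_nodes(tree)   # 7
all_names(tree)     # "v1" "v2" ... "v7"
```

Because names come from a counter held in build_tree()'s own environment, each call to build_tree() starts numbering afresh and nothing is written to the global environment.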
I have a question about function environments in the R language.
I know that every time a function is called in R, a new environment E is created in which the function body is executed. The parent link of E points to the environment in which the function was created.
My question: Is it possible to specify the environment E somehow, i.e., can one provide a certain environment in which function execution should happen?
A function has an environment that can be changed from outside the function, but not inside the function itself. The environment is a property of the function and can be retrieved/set with environment(). A function has at most one environment, but you can make copies of that function with different environments.
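A short sketch of these points using only base R (f(), g() and make_adder() are illustrative names): each call gets a fresh execution environment, that environment's parent is the function's enclosing environment, and the enclosing environment can be swapped from outside with environment<-.

```r
# Every call gets a fresh execution environment...
f <- function() environment()
identical(f(), f())              # FALSE

# ...whose parent is the function's own (enclosing) environment
g <- function() parent.env(environment())
identical(g(), environment(g))   # TRUE

# The enclosing environment can be replaced from outside the function
make_adder <- function(n) function(x) x + n
add2 <- make_adder(2)
add2(1)                          # 3
e <- new.env()
e$n <- 10
environment(add2) <- e
add2(1)                          # 11
```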
Let's set up some environments with values for x.
x <- 0
a <- new.env(); a$x <- 5
b <- new.env(); b$x <- 10
and a function foo that uses x from its environment:
foo <- function(a) {
  a + x
}
foo(1)
# [1] 1
Now we can write a helper function that we can use to call a function with any environment.
with_env <- function(f, e=parent.frame()) {
  stopifnot(is.function(f))
  environment(f) <- e
  f
}
This actually returns a new function with a different environment assigned (or the calling environment, if none is specified), and we can call that function by just passing the arguments. Observe:
with_env(foo, a)(1)
# [1] 6
with_env(foo, b)(1)
# [1] 11
foo(1)
# [1] 1
Here's another approach to the problem, taken directly from http://adv-r.had.co.nz/Functional-programming.html
Consider the code
new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}
Calling new_counter() creates a new environment (the one in which i <- 0 is evaluated) and returns the inner function, whose enclosing environment is that new environment; the returned function is typically saved in a variable. Calling that variable runs the inner function, and i <<- i + 1 updates i in the environment created by the outer function. (I don't want to copy Wickham's entire section on this, but I strongly recommend that anyone interested read the section entitled "Mutable state".) I suspect you could get fancier than this. For example, here's a modification with a reset option:
new_counter <- function() {
  i <- 0
  function(reset = FALSE) {
    if(reset) i <<- 0
    i <<- i + 1
    i
  }
}
counter_one <- new_counter()
counter_one()
counter_one()
counter_two <- new_counter()
counter_two()
counter_two()
counter_one(reset = TRUE)
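To see where a counter's state actually lives, the enclosing environment of each counter can be inspected with environment(). This sketch uses the first version of new_counter() from above; c1 and c2 are illustrative names.

```r
new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}

c1 <- new_counter()
c2 <- new_counter()
c1(); c1(); c2()

environment(c1)$i   # 2: c1's private state
environment(c2)$i   # 1: c2 has its own, separate environment
```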
I am not sure I completely follow the goal of the question, but one can set the environment of a function (the enclosing environment in which its free variables are looked up), modify the objects in that environment, and then reference them from the global environment. Here is an illustrative example, though again I do not know if this answers the question:
e <- new.env()
e$a <- TRUE
testFun <- function(){
  print(a)
}
testFun()
Results in: Error in print(a) : object 'a' not found
testFun2 <- function(){
  e$a <- !(a)
  print(a)
}
environment(testFun2) <- e
testFun2()
Prints: [1] FALSE
e$a
Returns: [1] FALSE
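If the goal is to run code inside a given environment rather than to change a function's enclosing environment, eval() is another option (a different technique from environment<-, shown here only as a sketch). It evaluates an expression directly in the supplied environment without creating a new frame, so assignments land in e itself. flip() is an illustrative name.

```r
e <- new.env()
e$a <- TRUE

# eval() runs an expression directly in the given environment
eval(quote(a <- !a), envir = e)
e$a   # FALSE

# The body of an existing function can be evaluated the same way
flip <- function() a <- !a
eval(body(flip), envir = e)
e$a   # TRUE
```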