Recursive manipulation of list elements in R

I have a nested list in the global environment of an R script.
anno <- list()
anno[['100']] <- list(
name = "PLACE",
color = "#a6cee3",
isDocumentAnnotation = T,
sublist = list()
)
person_sublist <- list()
person_sublist[['200']] <- list(
name = "ACTOR",
color = "#7fc97f",
isDocumentAnnotation = T,
sublist = list()
)
person_sublist[['300']] <- list(
name = "DIRECTOR",
color = "#beaed4",
isDocumentAnnotation = T,
sublist = list()
)
anno[['400']] <- list(
name = "PERSON",
color = "#1f78b4",
isDocumentAnnotation = T,
sublist = person_sublist
)
While running my process I interactively select elements via their id (100, 200, ...). In response, I want to add, delete or move elements in the list.
For this reason I thought of using a recursive function to navigate through the list:
searchListId <- function(parent_id = NULL, annotation_system = NULL)
{
for(id in names(annotation_system))
{
cat(paste(id,"\n"))
if(id == parent_id)
{
return(annotation_system[[id]]$sublist)
}
else
{
if(length(annotation_system[[id]]$sublist) > 0)
{
el <- searchListId(parent_id, annotation_system[[id]]$sublist)
if(!is.null(el))
return(el)
}
}
}
return(NULL)
}
searchListId('100', anno)
This function returns the list() found in the sublist element of the matching element in the 'anno' list. My problem is the global environment of R. If I manipulate something (delete, add or move something within the returned sublist), I need to reset the global variable with <<-. But in a recursive function I only hold the current sublist in the context where the parent_id matches. How can one reference a global nested list in R while navigating through it via a recursive function? Is that even possible in R?
The calls I want to carry out in order to delete, add, or move elements in the list 'anno' are:
deleteListId('100', anno) #Should return the list without the element 100
addListId('400', anno) #Should return the list with a new element nested in '400'
switchListId('400','200', anno) #Should return a list where the elements with the according keys are switched.
The tricky part is that I don't know how deep the recursive structure is. Normally I would use element references to manipulate the elements directly, but what could a solution for manipulating nested lists in R look like if I want to use recursion?

If possible, have the recursive function take a list, alter that, and return the new version. The reason I suggest this is because it's idiomatic R. R leans toward being a functional language, and part of that means state-based actions are discouraged. In general, functions should only modify state if that's all they do. For example, scale(x) doesn't affect the value stored in the x variable. But x <- scale(x) does, because the <- function (yes, it's a function) is meant to modify state.
Also, don't worry about memory unless you know it will be a problem based on past experience. Behind the scenes, R is pretty good at preventing needless copying, so trust it to do the right thing. This lets you work with simpler mental models.
A skeleton of how to recursively modify a list, without affecting the original:
anno <- list()
anno[['A1']] <- list(
sublist = list(
A3 = list(sublist = NULL),
A4 = list(sublist = list(A6 = list(sublist = NULL))),
A5 = list(sublist = NULL)
)
)
change_list <- function(x) {
for (i in seq_along(x)) {
value <- x[[i]]
if (is.list(value)) {
x[[i]] <- change_list(value)
} else {
if (is.null(value)) {
x[[i]] <- "something new"
}
}
}
x
}
change_list(anno)
# $A1
# $A1$sublist
# $A1$sublist$A3
# $A1$sublist$A3$sublist
# [1] "something new"
#
#
# $A1$sublist$A4
# $A1$sublist$A4$sublist
# $A1$sublist$A4$sublist$A6
# $A1$sublist$A4$sublist$A6$sublist
# [1] "something new"
#
#
#
#
# $A1$sublist$A5
# $A1$sublist$A5$sublist
# [1] "something new"
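Applied to the structure from the question, the same copy-and-return pattern could look like the sketch below. The function name delete_list_id and the trimmed-down anno list are assumptions made for illustration, not part of the original code; the sketch only shows that a deletion at any depth can be expressed by rebuilding and returning the list.

```r
# Hypothetical helper: return a copy of `annotation_system` in which the
# element named `id` has been removed, at whatever depth it occurs.
delete_list_id <- function(id, annotation_system) {
  # Drop a matching element at the current level (no-op if there is none).
  annotation_system[[id]] <- NULL
  # Recurse into the sublist of every remaining element.
  for (key in names(annotation_system)) {
    sub <- annotation_system[[key]]$sublist
    if (length(sub) > 0) {
      annotation_system[[key]]$sublist <- delete_list_id(id, sub)
    }
  }
  annotation_system
}

# Reduced version of the list from the question (assumed for the example).
anno <- list(
  "100" = list(name = "PLACE", sublist = list()),
  "400" = list(name = "PERSON", sublist = list(
    "200" = list(name = "ACTOR", sublist = list()),
    "300" = list(name = "DIRECTOR", sublist = list())
  ))
)

anno2 <- delete_list_id("200", anno)
names(anno2[["400"]]$sublist)  # "300"
names(anno[["400"]]$sublist)   # "200" "300" (the original is untouched)
```

To make the change stick, the caller reassigns the result, e.g. anno <- delete_list_id("200", anno). An add or switch operation could follow the same shape.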
If you absolutely need to modify an item in the global namespace, use environments instead of lists.
anno_env <- new.env()
anno_env[["A1"]] <- new.env()
anno_env[["A1"]][["sublist"]] <- new.env()
anno_env[["A1"]][["sublist"]][["A3"]] <- NULL
anno_env[["A1"]][["sublist"]][["A4"]] <- NULL
change_environment <- function(environ) {
for (varname in ls(envir = environ)) {
value <- environ[[varname]]
if (is.environment(value)) {
change_environment(value)
} else {
environ[[varname]] <- "something new"
}
}
}
change_environment(anno_env)
anno_env[["A1"]][["sublist"]][["A3"]]
# [1] "something new"

Related

identical() but for environments/R6 in base R?

If I can run code before and after a user runs some code, how can I detect which variables were set or changed using base R? I can do this using identical() for non-environment objects. But is there a base-R solution for environments, including R6 classes?
Here's a solution using identical() which fails for envs/R6:
# Copy of initial vars
this_frame = sys.frame()
start_vars = ls()
start_copy = lapply(start_vars, get, envir = this_frame )
names(start_copy) = start_vars
# (user code here)
# Assess what's new and what's changed
end_vars = ls()
new_vars = end_vars[end_vars %in% start_vars == FALSE]
old_vars = end_vars[end_vars %in% start_vars == TRUE]
changed_vars = old_vars[sapply(old_vars, function(x) identical(get(x, envir = this_frame), start_copy[[x]])) == FALSE]
I'm writing a package that lets users run code in a separate session. I'd like to return only objects that were changed.
This solution detects changes in an environment, sub-environments, and R6-classes.
General approach
Run start_state = env_as_list() on sys.frame(), which stores everything in a list and recursively converts all environments/R6 objects (and their sub-environments) to lists.
Let the user manipulate stuff.
Run end_state = env_as_list() and use identical() to detect changes between start_state and end_state.
env_as_list = function(env) {
rapply(
object = as.list(env, all.names = TRUE),
f = function(x) {
if ("R6" %in% class(x)) {
# R6 to list without recursion
x = as.list(x, all.names = TRUE)
x$.__enclos_env__$self = NULL
x$.__enclos_env__$super = NULL
env_as_list(x)
} else if (is.environment(x)) {
env_as_list(x)
} else {
stop("Impossible to get here")
}
},
classes = c("environment", "R6"),
how = "replace"
)
}
Demonstration
Let's test it by filling globalenv() with some stuff to begin with:
R6_class = R6::R6Class("Testing", list(a = 1))
my_R6 = R6_class$new()
my_env = new.env()
my_env$sub_env = new.env()
my_env$sub_env$some_value = 2
my_regular = rnorm(5)
Snapshot time!
start_state = env_as_list(sys.frame())
Let the user play:
my_R6$a = 99 # Change R6
new_regular = 3 # new var
my_env$sub_env$some_value = 99 # Change sub-environment
Snapshot again!
end_state = env_as_list(sys.frame())
end_state$start_state = NULL # don't include this
Did nothing change?
> identical(start_state, end_state)
# FALSE
Which variables changed?
> is_same = lapply(names(end_state), function(x) identical(start_state[[x]], end_state[[x]]))
> names(end_state)[is_same == FALSE]
# "my_env" "new_regular" "my_R6"
Bonus
You can also use this to compute the size of an environment, including all R6 and sub-environments. Simply:
object.size(env_as_list(globalenv()))

How to store different outputs inside a function? [duplicate]

I want to store different output variables that are calculated inside a function.
I coded a toy example:
f = function(number)
{
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
save_phrase = "hello"
savexx = xx
savexx10 = xx*10
save = cbind(savexx,savexx10)
}
store = f(1)
store
But with this code it is returning only the variable save = cbind(savexx,savexx10).
I would like to save all the 4 variables that are created inside this function.
Is it possible doing this without using a dataframe or a list?
It is not possible without a list (or a similar container). A list is a better fit than a data.frame because it can store different types of objects (vectors, tables, plots, etc.). Try it like this:
f = function(number)
{
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
lista <- list()
lista$save_phrase = "hello"
lista$savexx = xx
lista$savexx10 = xx*10
lista$save = cbind(lista$savexx, lista$savexx10)
lista
}
store = f(1)
# whole list:
store
# elements of a list:
store$save_phrase
store$savexx
store$savexx10
store$save
1) list We can return the desired variables in a list.
f2 = function(number) {
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
list(save_phrase = "hello",
savexx = xx,
savexx10 = xx*10,
save = cbind(savexx = xx, savexx10 = xx*10)) # built from xx: the local savexx and savexx10 are still NULL here
}
store = f2(1)
2) mget Another way to do this is to use mget if the returned variables have a pattern to their names as in this case:
f3 = function(number) {
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
save_phrase = "hello"
savexx = xx
savexx10 = xx*10
save = cbind(savexx,savexx10)
mget(ls(pattern = "save"))
}
store = f3(1)
3) gsubfn gsubfn has a facility for placing the list components into separate variables. After this is run, save_phrase, savexx, savexx10 and save will exist as separate variables.
library(gsubfn)
list[save_phrase, savexx, savexx10, save] <- f2(1)
4) attach Although this is not really recommended you can do this:
attach(f2(1), name = "f2")
This will create an entry on the search list with the variables that were returned so we can just refer to save_phrase, savexx, savexx10 and save. We can see the entry using search() and ls("f2") and we can remove the entry using detach("f2") .
5) assign Another possibility which is not really recommended but does work is to assign the components right into a specific environment. Now save_phrase, savexx, savexx10 and save will all exist in the global environment.
list2env(f2(1), .GlobalEnv)
Similarly this will inject those variables into the current environment. This is the same as the prior line if the current environment is the global environment.
list2env(f2(1), environment())
6) Again, I am not so sure this is a good idea but we could modify f to inject the outputs right into the parent frame. After this is run save_phrase, savexx, savexx10 and save will all exist in the current environment.
f4 = function(number, env = parent.frame()) {
xx = NULL
savexx = NULL
savexx10 = NULL
for (i in 1:10) {
x = number*i
xx = c(xx,x)
}
env$save_phrase = "hello"
env$savexx = xx
env$savexx10 = xx*10
env$save = cbind(savexx = xx, savexx10 = xx*10) # built from xx: the local savexx and savexx10 are still NULL here
invisible(env)
}
f4(1)
R functions only return a SINGLE object. If you want multiple objects returned they have to be combined into a list or some other type of object.
Some languages like python let us do stuff like this:
a, b = mult_return_func()
But R will only return a single object. R programmers typically use lists to return multiple objects.
If there is no return statement, then R will return the value of the last evaluated expression in the function.
This would explain why it is returning save = cbind(savexx,savexx10).
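A minimal sketch of that rule, using two made-up functions (last_value and assigned_value are illustrative names only). It also shows why the original f still hands back a value even though its last line is an assignment: an assignment is itself an expression, and its value is returned invisibly.

```r
last_value <- function(x) {
  y <- x + 1
  y * 2            # last evaluated expression, so this is the return value
}

assigned_value <- function(x) {
  result <- x * 2  # the assigned value is returned, but invisibly
}

last_value(1)      # 4
out <- assigned_value(3)
out                # 6
```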
To return multiple values you will need a list or another object because the R return function can only return a single object.
My suggestion would be to add those values to a list, return the list, and then get the variables from the list.
I hope that helps. If you'd like to read more then I suggest going to https://www.datamentor.io/r-programming/return-function/

Getting name of an object from list in Map

Given the following data:
list_A <- list(data_cars = mtcars,
data_air = AirPassengers,
data_list = list(A = 1,
B = 2))
I would like to print names of objects available across list_A.
Example:
Map(
f = function(x) {
nm <- deparse(match.call()$x)
print(nm)
# nm object is only needed to properly name flat file that may be
# produced within Map call
if (any(class(x) == "list")) {
length(x) + 1
} else {
length(x) + 1e6
saveRDS(object = x,
file = tempfile(pattern = make.names(nm), fileext = ".RDS"))
}
},
list_A
)
returns:
[1] "dots[[1L]][[1L]]"
[1] "dots[[1L]][[2L]]"
[1] "dots[[1L]][[3L]]"
$data_cars
NULL
$data_air
NULL
$data_list
[1] 3
Desired results
I would like to get:
`data_cars`
`data_air`
`data_list`
Update
Following the comments, I have modified the example to make it more reflective of my actual needs which are:
While using Map to iterate over list_A I'm performing some operations on each element of the list
Periodically I want to create a flat file with name reflecting name of object that was processed
In addition to list_A, there are also list_B, list_C and so forth. Therefore, I would like to avoid calling names(list) inside the function f of the Map as I will have to modify it n number of times. The solution I'm looking to find should lend itself for:
Map(function(l){...}, list_A)
So I can later replace list_A. It does not have to rely on Map. Any of the apply functions would do; the same applies to purrr-based solutions.
Alternative example
do_stuff <- function(x) {
nm <- deparse(match.call()$x)
print(nm)
# nm object is only needed to properly name flat file that may be
# produced within Map call
if (any(class(x) == "list")) {
length(x) + 1
} else {
length(x) + 1e6
saveRDS(object = x,
file = tempfile(pattern = make.names(nm), fileext = ".RDS"))
}
}
Map(do_stuff, list_A)
As per the notes below, I want to avoid having to modify do_stuff function as I will be looking to do:
Map(do_stuff, list_A)
Map(do_stuff, list_B)
Map(do_stuff, list_...)
We could wrap it into a function, and do it in two steps:
myFun <- function(myList){
# do stuff
res <- Map(
f = function(x) {
#do stuff
head(x)
},
myList)
# write to a file, here we might add control
# if list is empty do not output to a file
for(i in names(res)){
write.table(res[[ i ]], file = paste0(i, ".txt"))
}
}
myFun(list_A)
Would something like this work ?
list_A2 <- Map(list, x = list_A,nm = names(list_A) )
trace(do_stuff, quote({ nm <- x$nm; x<- x$x}), at=3)
Map(do_stuff, list_A2)
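Another option, sketched here under the assumption that the worker function may take the element name as a second argument, is to let Map iterate over the list and its names in parallel. The names map_named and describe are hypothetical; names() is called once in the wrapper rather than inside the worker, so the same call works unchanged for list_A, list_B and so on.

```r
# Hypothetical worker: receives the element and its name.
describe <- function(x, nm) {
  paste0(nm, ": ", length(x))
}

# Hypothetical wrapper: pairs each element with its name.
map_named <- function(f, lst) {
  Map(f, lst, names(lst))
}

res <- map_named(describe, list(a = 1:3, b = letters))
res$a  # "a: 3"
res$b  # "b: 26"
```

Inside the worker, nm could then be passed to make.names() and tempfile() to build the file name, as in the question.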

Changing a list inside a function

I would like to perform an analysis with the following structure
a <- list()
replicate_letter <- function(letter) {
return(data.frame(first_letter = rep(letter, 10),
second_letter = rep(letter, 10)))
}
get_letter <- function(letter) {
if (is.null(a[[letter]])) {
a[[letter]] <- replicate_letter(letter)
}
# Do further analysis, plotting,....
}
and perform it often just by calling get_letter. However, this does not work: the function runs, but a is not altered.
I have figured out that this is due to the attempt to change the list inside the function, because this
a <- list()
replicate_letter <- function(letter) {
return(data.frame(first_letter = rep(letter, 10),
second_letter = rep(letter, 10)))
}
for (letter in letters[1:3]) {
if (is.null(a[[letter]])) {
a[[letter]] <- replicate_letter(letter)
}
}
a
runs alright. How do I need to change function get_letter to make it work? Is it possible?
The problem is that a copy of the list is modified in the function's environment, and that copy is destroyed when the function exits. R differs in this way from many other languages, in that the global environment is not (by default) modified within functions.
You should have your function return the new list:
get_letter <- function(a, letter) {
if (is.null(a[[letter]])) {
a[[letter]] <- replicate_letter(letter)
}
# Do further analysis, plotting,....
return(a)
}
Calling:
a <- get_letter(a, 'c')
I'm not quite sure if I understand your situation, so perhaps take this with a grain of salt. But be aware that you are assigning the output of a[[letter]] <- replicate_letter(letter) only within the function (which then disappears afterwards). You could try ?<<- instead. Consider:
replicate_letter <- function(letter) {
return(data.frame(first_letter = rep(letter, 10),
second_letter = rep(letter, 10)))
}
get_letter <- function(letter) {
if (is.null(a[[letter]])) {
a[[letter]] <<- replicate_letter(letter)
}
# Do further analysis, plotting,....
}
get_letter("a")
a
# $a
# first_letter second_letter
# 1 a a
# 2 a a
# 3 a a
# 4 a a
# 5 a a
# 6 a a
# 7 a a
# 8 a a
# 9 a a
# 10 a a
The help for <<- reads (in part):
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment. Note that their semantics differ from that in the S language, but are useful in conjunction with the scoping rules of R. See ‘The R Language Definition’ manual for further details and examples.
(The boldface is mine.)
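Because <<- searches the parent environments, it does not have to target the global environment. If the goal is a cache that persists between calls, one possible sketch keeps the list in an enclosing function instead. The names make_letter_cache and store are made up for this example, and rep(letter, 3) stands in for replicate_letter.

```r
make_letter_cache <- function() {
  store <- list()  # lives in the enclosing environment, not in globalenv()
  function(letter) {
    if (is.null(store[[letter]])) {
      # <<- finds `store` in the enclosing function and updates it there
      store[[letter]] <<- rep(letter, 3)
    }
    store[[letter]]
  }
}

get_letter <- make_letter_cache()
get_letter("a")                         # "a" "a" "a"
names(environment(get_letter)$store)    # "a"
```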

Transform a dataframe into a tree structure list of lists

I have a data.frame with two columns representing a hierarchical tree, with parents and nodes.
I want to transform its structure in a way that I can use as an input for the function d3tree, from d3Network package.
Here's my data frame:
df <- data.frame(c("Canada","Canada","Quebec","Quebec","Ontario","Ontario"),c("Quebec","Ontario","Montreal","Quebec City","Toronto","Ottawa"))
names(df) <- c("parent","child")
And I want to transform it to this structure
Canada_tree <- list(name = "Canada", children = list(
list(name = "Quebec",
children = list(list(name = "Montreal"),list(name = "Quebec City"))),
list(name = "Ontario",
children = list(list(name = "Toronto"),list(name = "Ottawa")))))
I have successfully transformed this particular case using the code below:
fill_list <- function(df,node) {
node <- as.character(node)
if (is.leaf(df,node)==TRUE){
return (list(name = node))
}
else {
new_node = df[df[,1] == node,2]
return (list(name = node, children = list(fill_list(df,new_node[1]),fill_list(df,new_node[2]))))
}
}
The problem is, it only works with trees in which every parent node has exactly two children.
You can see I hard coded the two children (new_node[1] and new_node[2]) as inputs for my recursive function.
I'm trying to figure out a way to call the recursive function as many times as the parent node has children.
Example:
fill_list(df,new_node[1]),...,fill_list(df,new_node[length(new_node)])
I tried these 3 possibilities but none of it worked:
First: Creating a string with all the function calls and parameters and then evaluating it. It returns the error could not find function fill_functional(df,new_node[1]). I assume that's because my function hasn't been created yet by the time I call it.
fill_functional <- function(df,node) {
node <- as.character(node)
if (is.leaf(df,node)==TRUE){
return (list(name = node))
}
else {
new_node = df[df[,1] == node,2]
level <- length(new_node)
xxx <- paste0("(df,new_node[",seq(level),"])")
lapply(xxx,function(x) eval(call(paste("fill_functional",x,sep=""))))
}
}
Second: Using a for loop. But I only got the children of my root node.
L <- list()
fill_list <- function(df,node) {
node <- as.character(node)
if (is.leaf(df,node)==TRUE){
return (list(name = node))
}
else {
new_node = df[df[,1] == node,2]
for (i in 1:length(new_node)){
L[i] <- (fill_list(df,new_node[i]))
}
return (list(name = node, children = L))
}
}
Third: Creating a function that populates a list with elements that are functions, and just changing the arguments. But I wasn't able to accomplish anything interesting, and I'm afraid I'll have the same problem as I did on my first try described above.
Here is a recursive definition:
maketreelist <- function(df, root = df[1, 1]) {
if(is.factor(root)) root <- as.character(root)
r <- list(name = root)
children = df[df[, 1] == root, 2]
if(is.factor(children)) children <- as.character(children)
if(length(children) > 0) {
r$children <- lapply(children, maketreelist, df = df)
}
r
}
canadalist <- maketreelist(df)
That produces what you desire. This function assumes that the first column of the data.frame (or matrix) you pass in contains the parent and the second column contains the child. It also takes a root parameter which allows you to specify a starting point. It defaults to the first parent in the data.
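The same idea can be checked on a parent with an uneven number of children. The sketch below is a variant, not the function above: it assumes character columns (stringsAsFactors = FALSE) and uses split() to index the children by parent once. The names children_of and build_node, and the extra "Alberta" row, are invented for the example.

```r
df <- data.frame(
  parent = c("Canada", "Canada", "Canada", "Quebec"),
  child  = c("Quebec", "Ontario", "Alberta", "Montreal"),
  stringsAsFactors = FALSE
)

# Named list: parent -> character vector of its children.
children_of <- split(df$child, df$parent)

build_node <- function(name) {
  node <- list(name = name)
  kids <- children_of[[name]]  # NULL for leaves
  if (length(kids) > 0) {
    # One recursive call per child, however many there are.
    node$children <- lapply(kids, build_node)
  }
  node
}

tree <- build_node("Canada")
length(tree$children)                  # 3
tree$children[[1]]$children[[1]]$name  # "Montreal"
```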
But if you are really interested in playing around with trees, the igraph package might be of interest:
library(igraph)
g <- graph.data.frame(df)
plot(g)
