R - merge lists with overwrite and recursion

Suppose I have two lists with names,
a = list( a=1, b=2, c=list( d=1, e=2 ), d=list( a=1, b=2 ) )
b = list( a=2, c=list( e=1, f=2 ), d=3, e=2 )
I'd like to recursively merge those lists, overwriting entries if the second argument contains conflicting values. I.e. the expected output would be
$a
[1] 2
$b
[1] 2
$c
$c$d
[1] 1
$c$e
[1] 1
$c$f
[1] 2
$d
[1] 3
$e
[1] 2
Any hint?

I'm not sure a custom function is necessary here. utils::modifyList() performs exactly this operation. See ?modifyList for more info.
a <- list( a=1, b=2, c=list( d=1, e=2 ), d=list( a=1, b=2 ) )
b <- list( a=2, c=list( e=1, f=2 ), d=3, e=2 )
modifyList(a, b) # updates(modifies) 'a' with 'b'
Which gives the following
$a
[1] 2
$b
[1] 2
$c
$c$d
[1] 1
$c$e
[1] 1
$c$f
[1] 2
$d
[1] 3
$e
[1] 2
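One detail worth knowing when relying on modifyList (taken from its documentation, not from the question): a NULL value in the second list deletes the matching entry unless keep.null = TRUE is passed. A minimal sketch, where x is a made-up example list:

```r
# x is a hypothetical example list, not part of the original question
x <- list(a = 1, b = 2)

# By default, a NULL in the update list removes the entry from x
dropped <- utils::modifyList(x, list(b = NULL))
names(dropped)   # "a"

# With keep.null = TRUE the entry is kept and set to NULL
kept <- utils::modifyList(x, list(b = NULL), keep.null = TRUE)
names(kept)      # "a" "b"
```

If the second list may legitimately contain NULL entries, this is probably the behaviour to check first.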

I think you'll have to write your own recursive function here.
A function that takes in two lists, list1 and list2.
If:
list1[[name]] exists but not list2[[name]], use list1[[name]];
list1[[name]] exists as well as list2[[name]] and both are not lists, use list2[[name]];
otherwise, recurse with list1[[name]] and list2[[name]] as the new lists.
Something like:
myMerge <- function (list1, list2) {
allNames <- unique(c(names(list1), names(list2)))
merged <- list1 # we will copy over/replace values from list2 as necessary
for (x in allNames) {
# convenience
a <- list1[[x]]
b <- list2[[x]]
if (is.null(a)) {
# only exists in list2, copy over
merged[[x]] <- b
} else if (is.list(a) && is.list(b)) {
# recurse
merged[[x]] <- myMerge(a, b)
} else if (!is.null(b)) {
# replace the list1 value with the list2 value (if it exists)
merged[[x]] <- b
}
}
return(merged)
}
Caveats - if your lists to be merged are weird, you might get weird output. For example:
a <- list( a=list(a=1, b=2), b=3 )
b <- list( a=2 )
Then your merged list has a=2, b=3. This is because the value from b$a overrides the value from a$a, even though a$a is a list (you did not specify what would happen if this were the case). However it is simple enough to modify myMerge to handle these sorts of cases. Just remember - use is.list to test if it's a list, and is.null(myList$a) to see if entry a exists in list myList.
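As one possible way to handle the caveat above, here is a sketch of a variant that keeps the list1 value whenever one side is a list and the other is not. The name myMergeKeepLists and the "keep the list" policy are assumptions for illustration; the question does not say which behaviour is wanted.

```r
# Hypothetical variant of myMerge: on a list vs. non-list conflict,
# keep the value from list1 instead of overwriting it (assumed policy).
myMergeKeepLists <- function(list1, list2) {
  merged <- list1
  for (x in names(list2)) {
    a <- list1[[x]]
    b <- list2[[x]]
    if (is.null(a)) {
      # only exists in list2, copy over
      merged[[x]] <- b
    } else if (is.list(a) && is.list(b)) {
      # both are lists, recurse
      merged[[x]] <- myMergeKeepLists(a, b)
    } else if (!is.list(a) && !is.list(b)) {
      # both are plain values, list2 wins
      merged[[x]] <- b
    }
    # otherwise (list vs. non-list): leave the list1 value untouched
  }
  merged
}

a <- list(a = list(a = 1, b = 2), b = 3)
b <- list(a = 2)
result <- myMergeKeepLists(a, b)
str(result)   # a is still the nested list, b is 3
```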
Here is the "vectorized" version using sapply:
merge.lists <- function(a, b) {
a.names <- names(a)
b.names <- names(b)
m.names <- sort(unique(c(a.names, b.names)))
sapply(m.names, function(i) {
if (is.list(a[[i]]) && is.list(b[[i]])) merge.lists(a[[i]], b[[i]])
else if (i %in% b.names) b[[i]]
else a[[i]]
}, simplify = FALSE)
}

Related

Find empty lists in nested list of lists

Given an arbitrarily nested list, how can I find if a list contains empty lists? Consider the following example:
mylist <- list(list("foo", "bar", "baz", list(list())))
I tried rapply, but that skips through lists. While I could use lapply, I'd need to know the level of nesting beforehand. For this exercise, I don't need to know where the list is (although that would be a bonus), I just need a way to detect if there is one.

What about a function like this
has_empty_list <- function(x) {
if(is.list(x)) {
if (length(x)==0) {
return(TRUE)
} else {
return(any(vapply(x, has_empty_list, logical(1))))
}
} else {
return(FALSE)
}
}
Basically we create a recursive function to look for lists of length 0.
has_empty_list( list(list("foo", "bar", "baz", list(list()))) )
# TRUE
has_empty_list( list(list("foo", "bar", "baz", list(list(4)))) )
# FALSE
And here's a modification to find the index of the empty list
find_empty_list <- function(x, index=c()) {
if(is.list(x)) {
#list
if (length(x)==0) {
if (length(index)==0) {
return(0)
} else {
return(index)
}
} else {
m <- Map(find_empty_list, x, lapply(seq_along(x), function(i) append(index,i)))
# return the most deeply nested
return( m[[which.max(lengths(m))]] )
}
} else {
return(numeric())
}
}
This should return a vector of the index that you can use to find the empty list. For example
( i <- find_empty_list(mylist) )
# [1] 1 4 1
mylist[[i]]
# list()
If the first parameter itself is an empty list, it will return 0
find_empty_list(list())
# 0
and if there is no empty list, it should return an empty vector
find_empty_list(list(1:3, list("c", a~b)))
# numeric()

Another convenient option for working with nested lists is the data.tree package:
library(data.tree)
nodes <- as.Node(mylist)
any(nodes$Get(function(node) length(as.list(node))) == 0)
# [1] TRUE

Another approach is to use rrapply in the rrapply package (an extension of base rapply):
library(rrapply)
## check if any empty list exists
any(
rrapply(mylist,
classes = "list",
condition = function(x) length(x) < 1,
f = function(x) TRUE,
deflt = FALSE,
how = "unlist"
)
)
#> [1] TRUE
It is straightforward to update the above call to return the index vectors of any empty lists:
## return flat list with position vectors of empty list
rrapply(mylist,
classes = "list",
condition = function(x) length(x) < 1,
f = function(x, .xpos) .xpos,
how = "flatten"
)
#> [[1]]
#> [1] 1 4 1
Here, we make use of the .xpos argument which evaluates to the position of the current list element under evaluation.
Note that this automatically returns all empty list positions instead of only one:
mylist2 <- list(list("foo", list(), "baz", list(list())))
rrapply(mylist2,
classes = "list",
condition = function(x) length(x) < 1,
f = function(x, .xpos) .xpos,
how = "flatten"
)
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 1 4 1
## using MrFlick's find_empty_list function
find_empty_list(mylist2)
#> [1] 1 4 1
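If installing a package is not an option, collecting every position can also be sketched in base R. The helper name find_all_empty_lists is made up for this illustration; it returns a list of index vectors, one per empty list:

```r
# Hypothetical base-R helper: collect the index path of every empty list.
find_all_empty_lists <- function(x, index = integer()) {
  if (!is.list(x)) return(list())          # leaf value: nothing to report
  if (length(x) == 0) return(list(index))  # empty list: report its path
  out <- list()
  for (i in seq_along(x)) {
    # descend into each element, extending the path by its position
    out <- c(out, find_all_empty_lists(x[[i]], c(index, i)))
  }
  out
}

mylist2 <- list(list("foo", list(), "baz", list(list())))
positions <- find_all_empty_lists(mylist2)
positions                    # two paths: 1 2 and 1 4 1
mylist2[[positions[[2]]]]    # list()
```

Note that when the top-level argument is itself an empty list, this sketch reports an empty index vector rather than 0.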

Extract function parameters and default values from any function

Is there a way to extract the parameters and their respective default values of any given function from outside the function?
For example, given:
myfunc <- function(a, b = 1) { print(c(a, b)) }
I'm looking for some function that will return:
list(a = NULL, b = 1)
or some variation thereof.

You are looking for formals().
formals(myfunc)
# $a
#
#
# $b
# [1] 1
If you needed NULL for a, you could do some checking. a will be of the "name" class and empty.
lapply(formals(myfunc), function(x) if(is.name(x) && !nzchar(x)) NULL else x)
# $a
# NULL
#
# $b
# [1] 1
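If only the parameter names or a single default are needed, the list returned by formals() can be queried like any other named list. A small sketch reusing myfunc from the question (the variable names arg_names and default_b are just for illustration):

```r
myfunc <- function(a, b = 1) { print(c(a, b)) }

# All parameter names, in declaration order
arg_names <- names(formals(myfunc))
arg_names    # "a" "b"

# Default value of one specific parameter
default_b <- formals(myfunc)$b
default_b    # 1
```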

Creating a R function storing objects that can be recall but not displayed

Is it possible to create an R function whose printed output does not show every object inside it, but where each object can still be accessed using $ or [], like the one below?
abc <- function(a=1, b=2, c=3) {...}
#by default, abc() shows only the value of a, not b or c
abc()
[1] 1
abc()$a
[1] 1
abc()["b"]
[1] 2
abc()$c
[1] 3
Thanks.

If you want to return all values but display only certain ones by default, I think your best course of action is to give the return object a custom class. One strategy is to use a list and define a custom print method for it.
print.myvalues <- function(x) {
print(x$a)
}
abc <- function(a=1, b=2, c=3) {
ret <- list(a=a, b=b,c=c)
class(ret)<-"myvalues"
ret
}
(x<-abc(10,20,30))
# [1] 10
x
# [1] 10
x$b
# [1] 20
x[["c"]]
# [1] 30
But this might confuse users when they try to use the number in any way, because it's not really just a number:
x+5
# Error in x + 5 : non-numeric argument to binary operator
A slightly more complex strategy is to store the additional values as attributes and overload the $ and [ methods:
`$.myvalues` <- function(x, n) {
x[n]
}
`[.myvalues` <- function(x, n) {
if(n=="a") return(`attributes<-`(x,NULL))
attr(x, n)
}
print.myvalues <- function(x) {
attributes(x)<-NULL
print(x)
}
abc <- function(a=1, b=2, c=3) {
ret <- a
attr(ret, "b") <- b
attr(ret, "c") <- c
class(ret)<-"myvalues"
ret
}
and use it like
(x<-abc(10,20,30))
# [1] 10
x
# [1] 10
x$b
# [1] 20
x["c"]
# [1] 30
x+5
# [1] 15
Here we preserve addition, although nothing in particular happens to the b and c values. We could also overload the + operator if we wanted to carry the addition through the attributes. It all depends on how complex you want to make it.
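To illustrate what overloading + might look like, here is a rough sketch. It assumes the myvalues object is always the left operand and the right operand is a plain number, and it simply adds that number to a, b and c alike; other policies are equally possible.

```r
# Same constructor as in the attribute-based version
abc <- function(a = 1, b = 2, c = 3) {
  ret <- a
  attr(ret, "b") <- b
  attr(ret, "c") <- c
  class(ret) <- "myvalues"
  ret
}

# Hypothetical "+" method: assumes e1 is the myvalues object and
# e2 is a plain number; adds e2 to the value and to both attributes.
`+.myvalues` <- function(e1, e2) {
  abc(as.vector(e1) + e2, attr(e1, "b") + e2, attr(e1, "c") + e2)
}

y <- abc(10, 20, 30) + 5
as.vector(y)   # 15
attr(y, "b")   # 25
attr(y, "c")   # 35
```

A call such as 5 + abc(10, 20, 30) would need extra handling, since the method would then receive the plain number as e1.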

> abc <- function(a=1, b=2, c=3) { cat(a); invisible(list(a=a,b=b,c=c))}
> abc()
1
> abc()$c
1[1] 3
> abc()$a
1[1] 1
> abc <- function(a=1, b=2, c=3) { print(a); invisible(list(a=a,b=b,c=c))}
> abc()$a
[1] 1
[1] 1

R populate list by its values

Say I have a list:
> fs
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
[1] 61.90298 58.29699 54.90104 51.70293 48.69110
I want to "reverse fill" the rest of the list by using its values. Example:
The [[3]] should have the function value of [[4]] pairs:
c( myFunction(fs[[4]][1], fs[[4]][2]), myFunction(fs[[4]][2], fs[[4]][3]), .... )
The [[2]] should have myFunction values of [[3]] etc...
I hope that's clear. What's the right way to do it? For loops? *applys? My last attempt, which leaves 1-3 empty:
n = length(fs)
for (i in rev(1:(n-1)))
child_fs = fs[[i+1]]
res = c()
for (j in 1:(i+1))
up = v(child_fs[j])
do = v(child_fs[j+1])
this_f = myFunction(up, do)
res[j] = this_f
fs[[i]] = res

Make fs easily reproducible
fs <- list(NULL, NULL, NULL, c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110))
To be able to show an example, make a trivial myFunction
myFunction <- function(a, b) {a + b}
You can loop over all but the last position in fs (in reverse order) and compute each one. Just call myFunction on two copies of the next position's vector: one without its last element and one without its first.
for (i in rev(seq_along(fs))[-1]) {
fs[[i]] <- myFunction(head(fs[[i+1]], -1), tail(fs[[i+1]], -1))
}
That assumes myFunction is vectorized (given vectors for inputs, will give a vector for output). If it isn't, you can easily make a version which is.
myFunction <- function(a, b) {a[[1]] + b[[1]]}
for (i in rev(seq_along(fs))[-1]) {
fs[[i]] <- Vectorize(myFunction)(head(fs[[i+1]], -1), tail(fs[[i+1]], -1))
}
In either case, you get
> fs
[[1]]
[1] 453.2 426.8
[[2]]
[1] 233.398 219.802 206.998
[[3]]
[1] 120.200 113.198 106.604 100.394
[[4]]
[1] 61.90298 58.29699 54.90104 51.70293 48.69110

Really, what you have is a starting point
start <- c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110)
a function you want to apply (I made this one up which adds 1 everywhere and deletes the last element)
myFunction <- function(x) head(x + 1, -1L)
and the number of times you want to apply the function (recursively):
n <- 3L
So I would write a function to apply the function n times recursively, then reverse the output list:
apply.n.times <- function(fun, n, x)
if (n == 0L) list(x) else c(list(x), Recall(fun, n - 1L, fun(x)))
rev(apply.n.times(myFunction, n, start))
# [[1]]
# [1] 64.90298 61.29699
#
# [[2]]
# [1] 63.90298 60.29699 56.90104
#
# [[3]]
# [1] 62.90298 59.29699 55.90104 52.70293
#
# [[4]]
# [1] 61.90298 58.29699 54.90104 51.70293 48.69110

Here is a one-line solution (if myFunction can be replaced with something like sum, or in this case rowSums):
Reduce( function(x,y) rowSums( embed(y,2) ), fs, right=TRUE, accumulate=TRUE )
If myFunction needs to accept 2 values and do something with them then this can be expanded a bit to:
Reduce( function(x,y) apply( embed(y,2), 1, function(z) myFunction(z[1],z[2]) ),
fs, right=TRUE, accumulate=TRUE )
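As a quick sanity check, the sketch below compares the Reduce one-liner with an explicit loop, assuming myFunction is plain addition of neighbouring elements (that choice, and the names loop_fs and reduce_fs, are assumptions for illustration):

```r
fs <- list(NULL, NULL, NULL, c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110))

# Explicit loop: fill positions from the back, adding neighbouring elements
loop_fs <- fs
for (i in rev(seq_along(loop_fs))[-1]) {
  nxt <- loop_fs[[i + 1]]
  loop_fs[[i]] <- head(nxt, -1) + tail(nxt, -1)
}

# Reduce version: embed(y, 2) pairs each element with its predecessor
reduce_fs <- Reduce(function(x, y) rowSums(embed(y, 2)), fs,
                    right = TRUE, accumulate = TRUE)

all.equal(loop_fs, reduce_fs)   # TRUE
```

The comparison uses all.equal rather than identical because the two routes may differ in the last floating-point digit.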

Assign multiple new variables on LHS in a single line

I want to assign multiple variables in a single line in R. Is it possible to do something like this?
values # initialize some vector of values
(a, b) = values[c(2,4)] # assign a and b to values at 2 and 4 indices of 'values'
Typically I want to assign about 5-6 variables in a single line, instead of having multiple lines. Is there an alternative?

I put together an R package zeallot to tackle this very problem. zeallot includes an operator (%<-%) for unpacking, multiple, and destructuring assignment. The LHS of the assignment expression is built using calls to c(). The RHS of the assignment expression may be any expression which returns or is a vector, list, nested list, data frame, character string, date object, or custom objects (assuming there is a destructure implementation).
Here is the initial question reworked using zeallot (latest version, 0.0.5).
library(zeallot)
values <- c(1, 2, 3, 4) # initialize a vector of values
c(a, b) %<-% values[c(2, 4)] # assign `a` and `b`
a
#[1] 2
b
#[1] 4
For more examples and information one can check out the package vignette.

There is a great answer on the Struggling Through Problems Blog
This is taken from there, with very minor modifications.
USING THE FOLLOWING THREE FUNCTIONS
(Plus one for allowing for lists of different sizes)
# Generic form
'%=%' = function(l, r, ...) UseMethod('%=%')
# Binary Operator
'%=%.lbunch' = function(l, r, ...) {
Envir = as.environment(-1)
if (length(r) > length(l))
warning("RHS has more args than LHS. Only first", length(l), "used.")
if (length(l) > length(r)) {
warning("LHS has more args than RHS. RHS will be repeated.")
r <- extendToMatch(r, l)
}
for (II in 1:length(l)) {
do.call('<-', list(l[[II]], r[[II]]), envir=Envir)
}
}
# Used if LHS is larger than RHS
extendToMatch <- function(source, destin) {
s <- length(source)
d <- length(destin)
# Assume that destin is a length when it is a single number and source is not
if(d==1 && s>1 && !is.null(as.numeric(destin)))
d <- destin
dif <- d - s
if (dif > 0) {
source <- rep(source, ceiling(d/s))[1:d]
}
return (source)
}
# Grouping the left hand side
g = function(...) {
List = as.list(substitute(list(...)))[-1L]
class(List) = 'lbunch'
return(List)
}
Then to execute:
Group the left hand side using the new function g()
The right hand side should be a vector or a list
Use the newly-created binary operator %=%
# Example Call; Note the use of g() AND `%=%`
# Right-hand side can be a list or vector
g(a, b, c) %=% list("hello", 123, list("apples, oranges"))
g(d, e, f) %=% 101:103
# Results:
> a
[1] "hello"
> b
[1] 123
> c
[[1]]
[1] "apples, oranges"
> d
[1] 101
> e
[1] 102
> f
[1] 103
Example using lists of different sizes:
Longer Left Hand Side
g(x, y, z) %=% list("first", "second")
# Warning message:
# In `%=%.lbunch`(g(x, y, z), list("first", "second")) :
# LHS has more args than RHS. RHS will be repeated.
> x
[1] "first"
> y
[1] "second"
> z
[1] "first"
Longer Right Hand Side
g(j, k) %=% list("first", "second", "third")
# Warning message:
# In `%=%.lbunch`(g(j, k), list("first", "second", "third")) :
# RHS has more args than LHS. Only first2used.
> j
[1] "first"
> k
[1] "second"

Consider using functionality included in base R.
For instance, create a 1 row dataframe (say V) and initialize your variables in it. Now you can assign to multiple variables at once V[,c("a", "b")] <- values[c(2, 4)], call each one by name (V$a), or use many of them at the same time (values[c(5, 6)] <- V[,c("a", "b")]).
If you get lazy and don't want to go around calling variables from the dataframe, you could attach(V) (though I personally don't ever do it).
# Initialize values
values <- 1:100
# V for variables
V <- data.frame(a=NA, b=NA, c=NA, d=NA, e=NA)
# Assign elements from a vector
V[, c("a", "b", "e")] = values[c(2,4, 8)]
# Also other class
V[, "d"] <- "R"
# Use your variables
V$a
V$b
V$c # OOps, NA
V$d
V$e

Here is my idea. The syntax is probably quite simple:
`%tin%` <- function(x, y) {
mapply(assign, as.character(substitute(x)[-1]), y,
MoreArgs = list(envir = parent.frame()))
invisible()
}
c(a, b) %tin% c(1, 2)
which gives:
> a
Error: object 'a' not found
> b
Error: object 'b' not found
> c(a, b) %tin% c(1, 2)
> a
[1] 1
> b
[1] 2
This is not well tested, though.

A potentially dangerous (in as much as using assign is risky) option would be to Vectorize assign:
assignVec <- Vectorize("assign",c("x","value"))
#.GlobalEnv is probably not what one wants in general; see below.
assignVec(c('a','b'),c(0,4),envir = .GlobalEnv)
a b
0 4
> b
[1] 4
> a
[1] 0
Or I suppose you could vectorize it yourself manually with your own function using mapply that maybe uses a sensible default for the envir argument. For instance, Vectorize will return a function with the same environment properties of assign, which in this case is namespace:base, or you could just set envir = parent.env(environment(assignVec)).

As others explained, there doesn't seem to be anything built in. ...but you could design a vassign function as follows:
vassign <- function(..., values, envir=parent.frame()) {
vars <- as.character(substitute(...()))
values <- rep(values, length.out=length(vars))
for(i in seq_along(vars)) {
assign(vars[[i]], values[[i]], envir)
}
}
# Then test it
vals <- 11:14
vassign(aa,bb,cc,dd, values=vals)
cc # 13
One thing to consider though is how to handle the cases where you e.g. specify 3 variables and 5 values or the other way around. Here I simply repeat (or truncate) the values to be of the same length as the variables. Maybe a warning would be prudent. But it allows the following:
vassign(aa,bb,cc,dd, values=0)
cc # 0
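A possible version with the warning mentioned above might look like the sketch below. The name vassign_checked is made up, and treating a single value as intentional recycling (no warning) is an assumption:

```r
# Hypothetical variant of vassign that warns on a length mismatch.
vassign_checked <- function(..., values, envir = parent.frame()) {
  # capture the unevaluated variable names as character strings
  vars <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  if (length(values) != length(vars) && length(values) != 1L) {
    warning("Got ", length(vars), " variables but ", length(values),
            " values; values will be recycled or truncated.")
  }
  values <- rep(values, length.out = length(vars))
  for (i in seq_along(vars)) {
    assign(vars[[i]], values[[i]], envir = envir)
  }
  invisible(NULL)
}

vassign_checked(aa, bb, values = 11:12)
aa   # 11
bb   # 12
```

Calling it with, say, three variables and two values would still assign, but only after emitting the warning.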

list2env(setNames(as.list(rep(2,5)), letters[1:5]), .GlobalEnv)
Served my purpose, i.e., assigning five 2s into first five letters.
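The same idea can be applied to the original question of assigning a and b from selected positions of a vector. A minimal sketch, with made-up example values:

```r
values <- c(10, 20, 30, 40)   # made-up example values

# Name the selected values, then copy them into the current environment.
# invisible() just suppresses printing of the returned environment.
invisible(list2env(
  stats::setNames(as.list(values[c(2, 4)]), c("a", "b")),
  envir = environment()
))

a   # 20
b   # 40
```

Using environment() instead of .GlobalEnv keeps the assignment local when the call sits inside a function.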

Had a similar problem recently and here was my try using purrr::walk2
purrr::walk2(letters,1:26,assign,envir =parent.frame())

https://stat.ethz.ch/R-manual/R-devel/library/base/html/list2env.html:
list2env(
list(
a=1,
b=2:4,
c=rpois(10,10),
d=gl(3,4,labels=LETTERS[9:11])
),
envir=.GlobalEnv
)

If your only requirement is to have a single line of code, then how about:
> a<-values[2]; b<-values[4]

I'm afraid that the elegant solution you are looking for (like c(a, b) = c(2, 4)) unfortunately does not exist. But don't give up, I'm not sure! The nearest solution I can think of is this one:
attach(data.frame(a = 2, b = 4))
or, if the warnings bother you, switch them off:
attach(data.frame(a = 2, b = 4), warn.conflicts = FALSE)
But I suppose you're not satisfied with this solution, I wouldn't be either...

R> values = c(1,2,3,4)
R> a <- values[2]; b <- values[3]; c <- values[4]
R> a
[1] 2
R> b
[1] 3
R> c
[1] 4

Another version with recursion:
let <- function(..., env = parent.frame()) {
f <- function(x, ..., i = 1) {
if(is.null(substitute(...))){
if(length(x) == 1)
x <- rep(x, i - 1);
stopifnot(length(x) == i - 1)
return(x);
}
val <- f(..., i = i + 1);
assign(deparse(substitute(x)), val[[i]], env = env);
return(val)
}
f(...)
}
example:
> let(a, b, 4:10)
[1] 4 5 6 7 8 9 10
> a
[1] 4
> b
[1] 5
> let(c, d, e, f, c(4, 3, 2, 1))
[1] 4 3 2 1
> c
[1] 4
> f
[1] 1
My version:
let <- function(x, value) {
mapply(
assign,
as.character(substitute(x)[-1]),
value,
MoreArgs = list(envir = parent.frame()))
invisible()
}
example:
> let(c(x, y), 1:2 + 3)
> x
[1] 4
> y
[1] 5

Combining some of the answers given here + a little bit of salt, how about this solution:
assignVec <- Vectorize("assign", c("x", "value"))
`%<<-%` <- function(x, value) invisible(assignVec(x, value, envir = .GlobalEnv))
c("a", "b") %<<-% c(2, 4)
a
## [1] 2
b
## [1] 4
I used this to add the R section here: http://rosettacode.org/wiki/Sort_three_variables#R
Caveat: It only works for assigning global variables (like <<-). If there is a better, more general solution, please tell me in the comments.

For a named list, use
list2env(mylist, environment())
For instance:
mylist <- list(foo = 1, bar = 2)
list2env(mylist, environment())
will add foo = 1, bar = 2 to the current environment and overwrite any existing objects with those names. This is equivalent to
mylist <- list(foo = 1, bar = 2)
foo <- mylist$foo
bar <- mylist$bar
This works in a function, too:
f <- function(mylist) {
list2env(mylist, environment())
foo * bar
}
mylist <- list(foo = 1, bar = 2)
f(mylist)
However, it is good practice to name the elements you want to include in the current environment, lest you overwrite another object, and so preferably write
list2env(mylist[c("foo", "bar")], environment())
Finally, if you want different names for the new imported objects, write:
list2env(`names<-`(mylist[c("foo", "bar")], c("foo2", "bar2")), environment())
which is equivalent to
foo2 <- mylist$foo
bar2 <- mylist$bar
