Find empty lists in nested list of lists - r

Given an arbitrarily nested list, how can I find if a list contains empty lists? Consider the following example:
mylist <- list(list("foo", "bar", "baz", list(list())))
I tried rapply, but that skips through lists. While I could use lapply, I'd need to know the level of nesting beforehand. For this exercise, I don't need to know where the list is (although that would be a bonus), I just need a way to detect if there is one.

What about a function like this
has_empty_list <- function(x) {
if(is.list(x)) {
if (length(x)==0) {
return(TRUE)
} else {
return(any(vapply(x, has_empty_list, logical(1))))
}
} else {
return(FALSE)
}
}
Basically we create a recursive function to look for lists of length 0.
has_empty_list( list(list("foo", "bar", "baz", list(list()))) )
# TRUE
has_empty_list( list(list("foo", "bar", "baz", list(list(4)))) )
# FALSE
And here's a modification to find the index of the empty list
find_empty_list <- function(x, index=c()) {
if(is.list(x)) {
#list
if (length(x)==0) {
if (length(index)==0) {
return(0)
} else {
return(index)
}
} else {
m <- Map(find_empty_list, x, lapply(seq_along(x), function(i) append(index,i)))
# return the most deeply nested
return( m[[which.max(lengths(m))]] )
}
} else {
return(numeric())
}
}
This should return an index vector that you can use to locate the empty list. For example:
( i <- find_empty_list(mylist) )
# [1] 1 4 1
mylist[[i]]
# list()
If the first parameter itself is an empty list, it will return 0
find_empty_list(list())
# 0
and if there is no empty list, it should return an empty vector
find_empty_list(list(1:3, list("c", a~b)))
# numeric()
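Note that find_empty_list returns only the most deeply nested match. If every empty list is needed, one option is to collect the index vectors while recursing. The following is a minimal base R sketch; the name find_all_empty_lists is hypothetical and not part of the answer above.

```r
# Hypothetical helper: collect the index vector of every empty list.
find_all_empty_lists <- function(x, index = integer()) {
  if (!is.list(x)) return(list())          # leaves contribute nothing
  if (length(x) == 0) return(list(index))  # found an empty list
  out <- list()
  for (i in seq_along(x)) {
    # descend into element i, extending the index path
    out <- c(out, find_all_empty_lists(x[[i]], c(index, i)))
  }
  out
}

mylist2 <- list(list("foo", list(), "baz", list(list())))
res <- find_all_empty_lists(mylist2)
# res holds two index vectors: c(1, 2) and c(1, 4, 1)
```

Each returned vector can be used for recursive indexing, e.g. mylist2[[res[[2]]]]. If the top-level argument is itself an empty list, the sketch returns a single zero-length index vector.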

Another convenient option for working with nested lists is the data.tree package:
library(data.tree)
nodes <- as.Node(mylist)
any(nodes$Get(function(node) length(as.list(node))) == 0)
# [1] TRUE

Another approach is to use rrapply in the rrapply package (an extension of base rapply):
library(rrapply)
## check if any empty list exists
any(
rrapply(mylist,
classes = "list",
condition = function(x) length(x) < 1,
f = function(x) TRUE,
deflt = FALSE,
how = "unlist"
)
)
#> [1] TRUE
It is straightforward to update the above call to return the index vectors of any empty lists:
## return flat list with position vectors of empty list
rrapply(mylist,
classes = "list",
condition = function(x) length(x) < 1,
f = function(x, .xpos) .xpos,
how = "flatten"
)
#> [[1]]
#> [1] 1 4 1
Here, we make use of the .xpos argument which evaluates to the position of the current list element under evaluation.
Note that this automatically returns all empty list positions instead of only one:
mylist2 <- list(list("foo", list(), "baz", list(list())))
rrapply(mylist2,
classes = "list",
condition = function(x) length(x) < 1,
f = function(x, .xpos) .xpos,
how = "flatten"
)
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 1 4 1
## using MrFlick's find_empty_list function
find_empty_list(mylist2)
#> [1] 1 4 1

Related

R replace list elements from within a function

In R I'd like to replace some elements in a list using the $ notation:
# functions
replaceNonNull <- function(x, value) {
if(!is.null(x)){
thisx <- deparse(substitute(x))
print(paste0("replacing ", thisx, " with '",value,"'"))
#x <<- value
assign(thisx, value, envir = .GlobalEnv)
}
}
mylist = list("a"=1:3)
replaceNonNull(mylist$a,"456");mylist$a
However after running replaceNonNull, a new variable is created with name 'mylist$a'. How can I change the a value in the list instead?
Maybe you want something like this:
replaceNonNull <- function(x, el, value, env = globalenv()) {
if (!is.null(x[[el]])) {
nx <- deparse(substitute(x))
nv <- deparse(substitute(value))
cat("replacing value of", sQuote(el), "in", sQuote(nx), "with", sQuote(nv), "\n")
env[[nx]][[el]] <- value
}
}
mylist <- list(a = 1:3)
replaceNonNull(mylist, "a", 4:6)
## replacing value of ‘a’ in ‘mylist’ with ‘4:6’
mylist$a
## [1] 4 5 6
replaceNonNull(mylist, "b", 4:6)
mylist$b
## NULL
Nonstandard evaluation is a dangerous game, so you should be aware of the limitations. Here, x must be the name of a variable bound in env (hence not a call to the $ operator). Otherwise, you will continue to see unexpected behaviour:
mylist <- list(zzz = list(a = 1:3))
replaceNonNull(mylist$zzz, "a", 4:6)
## replacing value of ‘a’ in ‘mylist$zzz’ with ‘4:6’
mylist$zzz$a
## [1] 1 2 3
`mylist$zzz`
## $a
## [1] 4 5 6
You can avoid unintended assignments by adding a test:
replaceNonNull <- function(x, el, value, env = globalenv()) {
nx <- deparse(substitute(x))
if (!exists(nx, env, mode = "list")) {
stop("There is no list in ", sQuote("env"), " named ", sQuote(nx), ".")
}
if (!is.null(x[[el]])) {
nv <- deparse(substitute(value))
cat("replacing value of", sQuote(el), "in", sQuote(nx), "with", sQuote(nv), "\n")
env[[nx]][[el]] <- value
}
}
rm(`mylist$zzz`) # clean up after last example
replaceNonNull(mylist$zzz, "a", 4:6)
## Error in replaceNonNull(mylist$zzz, "a", 4:6) :
## There is no list in ‘env’ named ‘mylist$zzz’.
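If nonstandard evaluation is not a hard requirement, a simpler route is to return the modified list and let the caller reassign it. This is a minimal sketch under that assumption; the name replace_non_null is hypothetical.

```r
# Hypothetical helper: returns a modified copy instead of assigning by name.
replace_non_null <- function(x, el, value) {
  if (!is.null(x[[el]])) x[[el]] <- value  # only touch existing elements
  x
}

mylist <- list(zzz = list(a = 1:3))
# Works for nested targets too, because the caller does the assignment.
mylist$zzz <- replace_non_null(mylist$zzz, "a", 4:6)
mylist$zzz$a
# [1] 4 5 6
mylist$zzz <- replace_non_null(mylist$zzz, "b", 4:6)
names(mylist$zzz)
# [1] "a"
```

The trade-off is that the caller has to write the target twice, but no stray variables can be created.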
The problem you're having is that the first argument of assign is:
x - a variable name, given as a character string.
But even outside the function, this doesn't work.
assign(mylist$a,0)
#Error in assign(mylist$a, 0) : invalid first argument
assign("mylist$a",0)
mylist
#$a
#[1] 1 2 3
However, you can use $<-, like this:
mylist$a <- 0
mylist$a
#[1] 0
One approach, then, is to create that expression and evaluate it:
mylist = list("a"=1:3)
myexpression <- deparse(substitute(mylist$a))
myexpression
#[1] "mylist$a"
library(rlang)
expr(!!parse_expr(myexpression) <- 0)
#mylist$a <- 0
eval(expr(!!parse_expr(myexpression) <- 0))
mylist$a
#[1] 0
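To do this from inside a function, use <<- instead of <- so that the assignment modifies mylist outside the function rather than a local copy.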
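The same idea can be sketched in base R without rlang: capture the unevaluated argument with substitute(), build the assignment call, and evaluate it in the caller's environment. The function name replace_in_place and the env parameter are hypothetical, and the usual caveats about nonstandard evaluation apply.

```r
# Hypothetical helper: assigns `value` to whatever expression was passed as x.
replace_in_place <- function(x, value, env = parent.frame()) {
  target <- substitute(x)  # unevaluated expression, e.g. mylist$a
  if (!is.null(x)) {
    # build the call `target <- value` and run it where the caller lives
    eval(call("<-", target, value), env)
  }
  invisible(NULL)
}

mylist <- list(a = 1:3, zzz = list(a = 1:3))
replace_in_place(mylist$a, 0)
replace_in_place(mylist$zzz$a, 4:6)
replace_in_place(mylist$b, 99)  # mylist$b is NULL, so nothing happens
mylist$a
# [1] 0
mylist$zzz$a
# [1] 4 5 6
```

Because the assignment is evaluated in env rather than inside the function, no <<- is needed in this variant.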

Filter list in R based on criteria within list objects

This is a trivial question, but I'm stumped. How can I filter a list of dataframes based on their length? The list is nested, meaning there are lists of lists of dataframes of different lengths. I'd like to filter or subset the list to include only those objects that are length n, say 3.
Here is an example and my current approach.
library(tidyverse)
# list of list with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
list(starwars[1:8], starwars[4:6]),
starwars[1:2],
list(starwars[1:7], starwars[2:6]),
starwars[1:3])
# I want to filter the list by dataframes that are 3 variables long (i.e. length(df) == 3).
# Here is my attempt. I'm stuck at how to obtain
# the number of variables in each dataframe and then filter by it.
map(star.wars_ls, function(x){
map(x, function(x){ ## Incorrectly returns 20 for all
length(y)
})
})
We can do
map(star.wars_ls, ~ if(is.data.frame(.x)) .x[length(.x) == 3] else map(.x, ~ .x[length(.x) == 3]))
You should be able to check whether the item in the star.wars_ls is a list or a data frame. Then, check the number of columns within each item. Try using:
library(tidyverse)
# list of list with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
list(starwars[1:8], starwars[4:6]),
starwars[1:2],
list(starwars[1:7], starwars[2:6]),
starwars[1:3])
# Record the number of columns of each data frame (i.e. length(df)).
datacols <- map(star.wars_ls, function(X) {
  if (is.data.frame(X)) {
    ncol(X)
  } else {
    # X is a list of data frames: record the column count of each one
    map(X, function(Y) {
      ncol(Y)
    })
  }
})
# > datacols
# [[1]]
# [1] 5
#
# [[2]]
# [[2]][[1]]
# [1] 8
#
# [[2]][[2]]
# [1] 3
#
#
# [[3]]
# [1] 2
#
# [[4]]
# [[4]][[1]]
# [1] 7
#
# [[4]][[2]]
# [1] 5
#
#
# [[5]]
# [1] 3
This will only give you the length (number of columns) of each data frame within the list. To get the indices (I'm sure there's a more efficient way to do this -- maybe someone else can help with that):
indexlist <- c()
for (i in 1:length(datacols)) {
if (length(datacols[[i]]) == 1) {
if (datacols[[i]][1] == 3) {
index <- i
indexlist <- c(indexlist, as.character(index))
}
} else {
for (j in 1:length(datacols[[i]])) {
if (datacols[[i]][[j]][1] == 3) {
index <- str_c(i, ",", j)
indexlist <- c(indexlist, index)
}
}
}
}
# > indexlist
# [1] "2,2" "5"
You could use recursion. It doesn't matter how deeply nested the list is:
ff = function(x)map(x,~if(is.data.frame(.x)){if(length(.x)==3) .x} else ff(.x))
ff(star.wars_ls)
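The recursive call above leaves a NULL placeholder wherever a data frame does not have 3 columns. If those should be dropped instead, a base R variant could look like the sketch below. The name keep_ncol and its n parameter are hypothetical, and the built-in iris data stands in for starwars so the example needs no packages.

```r
# Hypothetical helper: keep only data frames with exactly n columns.
keep_ncol <- function(x, n = 3) {
  if (is.data.frame(x)) {
    if (ncol(x) == n) x else NULL
  } else if (is.list(x)) {
    out <- lapply(x, keep_ncol, n = n)       # recurse into sub-lists
    out[!vapply(out, is.null, logical(1))]   # drop non-matching entries
  } else {
    NULL                                     # ignore anything else
  }
}

dfs <- list(iris[1:5],
            list(iris[1:4], iris[2:4]),
            iris[1:2],
            list(iris[1:3], iris[1:2]),
            iris[3:5])
res <- keep_ncol(dfs)
length(res)
# [1] 3
```

A sub-list with no matching data frames is returned as an empty list rather than removed, which keeps the function simple; prune those afterwards if needed.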

Chaining list of vectors

If I have a list of vectors such as below
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
Is there a way to replace each list element with an element that includes all the other values that the element can be paired with?
For example, the first element (list.x[[1]]) would be replaced with 1:4, because element 2 (list.x[[2]]) shows that 2 is also paired with 3, and element 3 shows that 3 is also paired with 4.
The final result I would like to achieve would be this list
final.list <- list(1:4, 1:4, 1:4, 5:6, 5:6)
I needed a change of pace today, so I decided to try to answer the question using base R. Here it goes:
First, I created a function that unions two vectors if they intersect, and if not, simply returns the first vector:
expand.if.toucing <- function(vector1, vector2) {
i = intersect(vector1, vector2);
if (NROW(i) > 0)
union(vector1, vector2)
else
vector1;
}
Then I made a function that merges one element in the list of vectors with another:
list.reduce <- function (lst) {
for(v1 in 1:NROW(lst))
for (v2 in 1:NROW(lst)) {
if (v1 == v2)
next;
prevLength <- NROW(lst[[v1]]);
lst[[v1]] <- expand.if.toucing(lst[[v1]], lst[[v2]]);
newLength <- NROW(lst[[v1]]);
if (newLength == prevLength)
next;
return(lst[-v2]);
}
lst;
}
After this, I made a function that merges all vectors in the list that can be merged. This is sort of a proto cluster analysis, so I called it clusterize:
clusterize <- function (lst) {
reduced = TRUE;
while(reduced) {
prevLength <- NROW(lst);
lst <- list.reduce(lst);
newLength <- NROW(lst);
reduced <- prevLength != newLength;
}
lst;
}
Now it's just a matter of replacing each element in the original list with its associated cluster:
replace.with.clusters <- function(lst, clusters) {
for(l in 1:NROW(lst))
for(c in 1:NROW(clusters)) {
lst[[l]] <- expand.if.toucing(lst[[l]], clusters[[c]]);
next;
}
lst;
}
You're good to go. The two main functions are clusterize and replace.with.clusters. Use them like this:
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
clusters <- clusterize(list.x);
replace.with.clusters(list.x, clusters);
# Outputs the following:
#
# [[1]]
# [1] 1 2 3 4
#
# [[2]]
# [1] 1 2 3 4
#
# [[3]]
# [1] 3 4 1 2
#
# [[4]]
# [1] 5 6
#
# [[5]]
# [1] 5 6
The third element is in a different order than your list, but from the way you describe the problem, order is not truly relevant.
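For comparison, the same chaining can be sketched more compactly by growing each vector until it stops changing. This is a different technique from the cluster-merging functions above (a per-element fixed-point expansion), and the name chain_groups is hypothetical.

```r
# Hypothetical helper: expand every vector with all vectors it touches,
# repeating until no new values are added.
chain_groups <- function(lst) {
  lapply(lst, function(v) {
    v <- sort(unique(v))
    repeat {
      # which list elements share at least one value with v?
      touching <- vapply(lst, function(w) length(intersect(v, w)) > 0, logical(1))
      grown <- sort(unique(c(v, unlist(lst[touching]))))
      if (length(grown) == length(v)) return(grown)  # fixed point reached
      v <- grown
    }
  })
}

list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
res <- chain_groups(list.x)
# elements 1 to 3 become 1 2 3 4; elements 4 and 5 become 5 6
```

Because each result is sorted, the output order matches final.list from the question. The sketch rescans the whole list on every pass, so it favours brevity over speed.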

Loop over list of R objects and change each object

I want to make a list of objects of a particular class in R, go through the list, and change each object according to some criterion. For example:
Duck <- function(grade,cap) {
res <- structure(list(grade=grade,cap=cap),class="Duck")
return(res)
}
Kwik <- Duck(5,0)
Kwek <- Duck(7,0)
Kwak <- Duck(9,0)
# Select all Ducks from the workspace
AllDucks <- Filter( function(x) 'Duck' %in% class( get(x) ), ls() )
# Give each Duck with a grade higher than 5 a cap (i.e. cap is set to 1)
for(i in 1:length(AllDucks)) {
if(get(AllDucks[i])$grade > 5) {
get(AllDucks[i])$cap <- 1
}
}
The expression get(AllDucks[i])$cap <- 1 gives the error message
Error in get(AllDucks[i])$cap <- 1 : could not find function "get<-"
How can I pick an object from a list of objects and change some of its attributes?
Why don't your ducks swim nicely in a pond? You should give them a nice habitat to begin with, but you can also catch them from the wild:
pond <- mget(AllDucks)
pond <- lapply(pond, function(x) {
if (x$grade > 5) x$cap <- 1
x
})
pond$Kwek
# $grade
# [1] 7
#
# $cap
# [1] 1
#
# attr(,"class")
# [1] "Duck"
To reassign into the current environment, you could do
mapply(assign, AllDucks, lapply(mget(AllDucks), function(x) {if (x$grade > 5) x$cap <- 1; x}),
MoreArgs = list(envir = environment()))
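A rough sketch of the "nice habitat" idea: keep the ducks in a named list from the start, update them with lapply, and only copy them back into individual variables if that is really needed. Here list2env() is used for the copy-back step as an alternative to the mapply/assign call.

```r
Duck <- function(grade, cap) {
  structure(list(grade = grade, cap = cap), class = "Duck")
}

# Keep all ducks together in one named list (the "pond").
pond <- list(Kwik = Duck(5, 0), Kwek = Duck(7, 0), Kwak = Duck(9, 0))

# Give each duck with a grade higher than 5 a cap.
pond <- lapply(pond, function(d) {
  if (d$grade > 5) d$cap <- 1
  d
})

# Optional: write the updated ducks back as separate variables.
invisible(list2env(pond, envir = environment()))

Kwik$cap
# [1] 0
Kwek$cap
# [1] 1
```

Working on the list directly avoids get() and assign() entirely, and the class attribute survives because only list elements are modified.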

R - merge lists with overwrite and recursion

Suppose I have two lists with names,
a = list( a=1, b=2, c=list( d=1, e=2 ), d=list( a=1, b=2 ) )
b = list( a=2, c=list( e=1, f=2 ), d=3, e=2 )
I'd like to recursively merge those lists, overwriting entries if the second argument contains conflicting values. I.e. the expected output would be
$a
[1] 2
$b
[1] 2
$c
$c$d
[1] 1
$c$e
[1] 1
$c$f
[1] 2
$d
[1] 3
$e
[1] 2
Any hint?
I am not sure a custom function is necessary here. The function utils::modifyList() performs exactly this operation. See ?modifyList for more info.
a <- list( a=1, b=2, c=list( d=1, e=2 ), d=list( a=1, b=2 ) )
b <- list( a=2, c=list( e=1, f=2 ), d=3, e=2 )
modifyList(a, b) # updates(modifies) 'a' with 'b'
Which gives the following
$a
[1] 2
$b
[1] 2
$c
$c$d
[1] 1
$c$e
[1] 1
$c$f
[1] 2
$d
[1] 3
$e
[1] 2
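One detail worth knowing before relying on modifyList: by default, a NULL entry in the second list removes the corresponding entry from the first, and the keep.null argument changes that. A small sketch with made-up data:

```r
x <- list(a = 1, b = 2, c = list(d = 1, e = 2))

# Default: NULL entries in the update delete the matching elements,
# including inside nested lists.
dropped <- modifyList(x, list(b = NULL, c = list(d = NULL)))
names(dropped)
# [1] "a" "c"
names(dropped$c)
# [1] "e"

# keep.null = TRUE stores the NULL instead of deleting the element.
kept <- modifyList(x, list(b = NULL), keep.null = TRUE)
names(kept)
# [1] "a" "b" "c"
```

If NULL is a meaningful value in your lists, set keep.null = TRUE or use a custom merge.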
I think you'll have to write your own recursive function here.
A function that takes in two lists, list1 and list2.
For each name that appears in either list:
if only list1[[name]] exists, use list1[[name]];
if only list2[[name]] exists, use list2[[name]];
if both exist and both are lists, recurse with list1[[name]] and list2[[name]] as the new lists;
otherwise, use list2[[name]].
Something like:
myMerge <- function (list1, list2) {
allNames <- unique(c(names(list1), names(list2)))
merged <- list1 # we will copy over/replace values from list2 as necessary
for (x in allNames) {
# convenience
a <- list1[[x]]
b <- list2[[x]]
if (is.null(a)) {
# only exists in list2, copy over
merged[[x]] <- b
} else if (is.list(a) && is.list(b)) {
# recurse
merged[[x]] <- myMerge(a, b)
} else if (!is.null(b)) {
# replace the list1 value with the list2 value (if it exists)
merged[[x]] <- b
}
}
return(merged)
}
Caveats - if your lists to be merged are weird, you might get weird output. For example:
a <- list( a=list(a=1, b=2), b=3 )
b <- list( a=2 )
Then your merged list has a=2, b=3. This is because the value from b$a overrides the value from a$a, even though a$a is a list (you did not specify what would happen if this were the case). However it is simple enough to modify myMerge to handle these sorts of cases. Just remember - use is.list to test if it's a list, and is.null(myList$a) to see if entry a exists in list myList.
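For comparison, base R's modifyList resolves the same kind of list-versus-scalar conflict in the same direction: the second argument wins unless both sides are lists. A quick check using the caveat example:

```r
a <- list(a = list(a = 1, b = 2), b = 3)
b <- list(a = 2)

# b overrides a: the sub-list a$a is replaced by the scalar 2.
ab <- modifyList(a, b)
str(ab)

# a overrides b: the scalar b$a is replaced by the sub-list, and b = 3 is added.
ba <- modifyList(b, a)
str(ba)
```

So the order of the arguments decides which structure survives when the types disagree.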
Here is the "vectorized" version using sapply:
merge.lists <- function(a, b) {
a.names <- names(a)
b.names <- names(b)
m.names <- sort(unique(c(a.names, b.names)))
sapply(m.names, function(i) {
if (is.list(a[[i]]) && is.list(b[[i]])) merge.lists(a[[i]], b[[i]])
else if (i %in% b.names) b[[i]]
else a[[i]]
}, simplify = FALSE)
}

Resources