R: removing NULL elements from a list

mylist <- list(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
123, NULL, 456)
> mylist
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
[[7]]
NULL
[[8]]
NULL
[[9]]
NULL
[[10]]
NULL
[[11]]
[1] 123
[[12]]
NULL
[[13]]
[1] 456
My list has 13 elements, 11 of which are NULL. I would like to remove them, but preserve the indices of the elements that are nonempty.
mylist2 = mylist[-which(sapply(mylist, is.null))]
> mylist2
[[1]]
[1] 123
[[2]]
[1] 456
This removes the NULL elements just fine, but I don't want the nonempty elements to be reindexed, i.e., I want mylist2 to look something like this, with the indices of the nonempty entries preserved.
> mylist2
[[11]]
[1] 123
[[13]]
[1] 456

The closest you'll be able to get is to first name the list elements and then remove the NULLs.
x <- mylist
names(x) <- seq_along(x)
## Using some higher-order convenience functions
Filter(Negate(is.null), x)
# $`11`
# [1] 123
#
# $`13`
# [1] 456
# Or, using a slightly more standard R idiom
x[sapply(x, is.null)] <- NULL
x
# $`11`
# [1] 123
#
# $`13`
# [1] 456

Simply use mylist[lengths(mylist) != 0].
Function lengths() was introduced in R 3.2.0 (April 2015).
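One caveat worth illustrating: lengths() tests for zero length, not for NULL specifically, so it also drops empty vectors such as character(0). The sketch below uses a made-up list x (not from the question) to compare the two filters.

```r
# Hypothetical list mixing NULL with a zero-length (but non-NULL) element
x <- list(a = NULL, b = character(0), c = 1:3)

# Filter by length: drops both NULL and character(0)
by_length <- x[lengths(x) != 0]

# Filter by is.null: drops only the NULL element
by_null <- x[!vapply(x, is.null, logical(1))]

names(by_length)
names(by_null)
```

If empty vectors are meaningful in your data, the is.null filter is the safer choice.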

The purrr package, included in Tidyverse, has elegant and fast functions for working with lists:
library(tidyverse)  # attaches purrr along with the other core tidyverse packages
# this works
compact(mylist)
# or this
mylist %>% discard(is.null)
# or this
# pipe mylist into keep(); the lambda inside keep() returns TRUE for elements to retain
mylist %>% keep( ~ !is.null(.) )
All of the above options are from purrr. The output is:
[[1]]
[1] 123
[[2]]
[1] 456
Note: compact() was in plyr, but dplyr superseded plyr, and compact() stayed around but moved to purrr. In any case, all of these functions are available through the parent package tidyverse.
Here's a link to the purrr cheat sheet download:
https://rstudio.com/resources/cheatsheets/
Or to view the purrr cheat sheet directly in a browser:
https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_purrr.pdf

There's a function that automatically removes all the null entries of a list, and if the list is named, it maintains the names of the non-null entries.
This function is called compact from the package plyr.
l <- list(NULL, NULL, "foo", "bar")
names(l) <- c( "one", "two", "three", "four" )
plyr::compact(l)
If you want to preserve the indices of the non-null entries, you can name the list as done in the earlier answer and then compact it:
names(l) <- seq_along(l)
plyr::compact(l)

If you want to keep the names you can do
a <- list(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
123, NULL, 456)
non_null_names <- which(!sapply(a, is.null))
a <- a[non_null_names]
names(a) <- non_null_names
a
You can then access the elements like so
a[['11']]
num <- 11
a[[as.character(num)]]
a[[as.character(11)]]
a$`11`
You can't get them in the neat [[11]], [[13]] notation, though, because those represent numerical indices.
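To make the distinction concrete, here is a small sketch (using vapply and setNames, which are my choice rather than the code above) showing that "11" works as a name while 11 as a position fails, because the filtered list only has two elements.

```r
a <- list(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
          123, NULL, 456)

# Positions of the non-NULL elements, reused as (character) names
keep <- which(!vapply(a, is.null, logical(1)))
a <- setNames(a[keep], keep)

# Lookup by name succeeds
by_name <- a[["11"]]

# Lookup by position 11 fails: the list now has only two elements
by_position <- tryCatch(a[[11]], error = function(e) conditionMessage(e))

length(a)
by_name
by_position
```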

This solution works with nested lists as well:
rlist::list.clean(myNestedlist, recursive = TRUE)
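If installing rlist is not an option, a rough base R equivalent can be sketched with a recursive helper. The function name drop_nulls and the example list are hypothetical, and this is not how rlist implements it; note also that it treats data frames as plain lists.

```r
# Hypothetical helper: recursively remove NULL elements from a nested list
drop_nulls <- function(x) {
  if (!is.list(x)) return(x)          # leave atomic values (and NULL) untouched
  x <- lapply(x, drop_nulls)          # clean each child first
  x[!vapply(x, is.null, logical(1))]  # then drop NULLs at this level
}

nested <- list(a = 1, b = NULL, c = list(d = NULL, e = "keep"))
cleaned <- drop_nulls(nested)
str(cleaned)
```

Names are preserved at every level because both lapply and `[` keep them.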

Here it is with convenient chaining notation
library(magrittr)
mylist %>%
setNames(seq_along(.)) %>%
Filter(. %>% is.null %>% `!`, .)

Here's a very simple way to do it using only base R functions:
names(mylist) <- seq_along(mylist)
mylist2 <- mylist[which(!sapply(mylist, is.null))]

Related

Accessing variable name passed as argument inside apply

I asked almost the same question in another post, but only for the column name, and received a perfect solution for that need. What I need now is the variable's full name, so I am rephrasing the question here.
I use 'deparse(substitute(x))' from inside my function to get variable name passed as parameter. It works great... but not with 'lapply'
myfun <- function(x)
{
return(deparse(substitute(x)))
}
a <- c(1,2,3)
b <- c(4,5,5)
df<-data.frame(a,b)
myfun(df$a)
[1] "df$a"
but, with 'lapply'...
lapply(df, myfun)
$a
[1] "X[[i]]"
$b
[1] "X[[i]]"
How can I get the variable name inside 'lapply'?
Thanks
When you pass a data frame to lapply, it iterates through the columns by numerical indexing using the double square bracket, not name indexing using the $ accessor. It is equivalent to using the following loop:
X <- df
result <- list()
for(i in seq_along(X)) {
result[[i]] <- myfun(X[[i]])
}
names(result) <- names(X)
result
#> $a
#> [1] "X[[i]]"
#>
#> $b
#> [1] "X[[i]]"
So a simple deparse(substitute(x)) will not work inside lapply. You are not recovering the column name, but rather would need to reconstruct it from the call stack. This is full of caveats and gotchas, but a (relatively) simple approach would be:
myfun <- function(x) {
stack <- lapply(sys.calls(), function(x) sapply(as.list(x), deparse))
if(stack[[length(stack)]][1] == 'myfun') {
return(stack[[length(stack)]][2])
}
if(stack[[length(stack)]][1] == 'FUN') {
return(paste0(stack[[length(stack) - 1]][2], '$',
eval(quote(names(X)[i]), parent.frame())))
}
deparse(substitute(x))
}
This means your function will still work if called directly:
myfun(df$a)
#> [1] "df$a"
But will also work within lapply
lapply(df, myfun)
#> $a
#> [1] "df$a"
#>
#> $b
#> [1] "df$b"
lapply(iris, myfun)
#> $Sepal.Length
#> [1] "iris$Sepal.Length"
#>
#> $Sepal.Width
#> [1] "iris$Sepal.Width"
#>
#> $Petal.Length
#> [1] "iris$Petal.Length"
#>
#> $Petal.Width
#> [1] "iris$Petal.Width"
#>
#> $Species
#> [1] "iris$Species"
It is specifically written to cover direct use or use within lapply. If you wanted to expand its use to work within other functional calls like Map or the various purrr mapping functions, then these would have to be covered specifically by their own if clauses.
Here is another solution; it's a bit verbose, and Allen's solution is much better:
myfun <- function(x) {
pf <- parent.frame()
x_nm <- deparse(substitute(x))
frame_n <- sys.nframe()
apply <- FALSE
while(frame_n > 0) {
cl <- as.list(sys.call(frame_n))
if (grepl("apply", cl[[1]])) {
x_obj <- cl[[2]]
apply <- TRUE
break
}
frame_n <- frame_n - 1L
}
if (apply) {
idx <- parent.frame()$i[]
obj <- get(x_obj, envir = pf)
if (!is.null(names(obj)[idx])) {
nm_or_idx <- names(obj)[idx]
} else {
nm_or_idx <- idx
}
x_nm <- paste0(x_obj, '$', nm_or_idx)
}
return(x_nm)
}
myfun(df$a)
#> [1] "df$a"
lapply(df, myfun)
#> $a
#> [1] "df$a"
#>
#> $b
#> [1] "df$b"
Created on 2023-02-09 by the reprex package (v2.0.1)
We can define a character string 'col_name' that holds the name of the data frame column passed to the function. For example, if col_name is "a", df[[col_name]] extracts the "a" column from the data frame. Then we can use the paste() function to concatenate the string 'df$' and 'col_name':
myfun <- function(col_name) {
col <- df[[col_name]]
return(paste("df$", col_name, sep = ""))
}
lapply(colnames(df), myfun)
output
[[1]]
[1] "df$a"
[[2]]
[1] "df$b"
To use other data, assign it to df first and then run lapply, for example:
df <- iris
lapply(colnames(df), myfun)
output
[[1]]
[1] "df$Sepal.Length"
[[2]]
[1] "df$Sepal.Width"
[[3]]
[1] "df$Petal.Length"
[[4]]
[1] "df$Petal.Width"
[[5]]
[1] "df$Species"
I hope this helps.
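The version above hard-codes the name df. A possible variation, sketched here with a hypothetical helper name_columns, captures the data frame's name once with deparse(substitute()) before any lapply is involved, so it works for whatever object is passed in directly.

```r
# Hypothetical helper: build "<data>$<column>" labels for every column
name_columns <- function(data) {
  # Captured at the top level, so this is the caller's expression (e.g. "iris")
  data_name <- deparse(substitute(data))
  setNames(paste0(data_name, "$", names(data)), names(data))
}

labels <- name_columns(iris)
labels[["Species"]]
```

This only works when the data frame is passed by name in a direct call; inside another apply-style function it would run into the same problem described in the question.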

R: sapply / lapply Different Behaviour with Names

I reduced some problem to the following toy code:
cc<-c("1","2")
ff<-function(x) { list(myname=x)}
aa<-unlist(lapply(cc,ff))
bb<-sapply(cc,ff)
I'd expect aa and bb to be identical, but:
> aa
myname myname
"1" "2"
> bb
$`1.myname`
[1] "1"
$`2.myname`
[1] "2"
I'm aware of the USE.NAMES argument to sapply; however, it's documented as:
USE.NAMES logical; if TRUE and if X is character, use X as names for
the result unless it had names already.
and so should have no impact in this case.
Internally, it isn't even passed to simplify2array and thus neither to the final unlist.
What's going on here? Could this be an R issue?
Edit: after further investigation it turns out the root cause for the difference is that sapply is essentially equivalent not to
unlist(lapply(cc, ff))
but rather to
unlist(lapply(cc, ff), recursive = FALSE)
(This is the exact internal unlist call).
Look carefully at this:
lapply(cc, ff)
#> [[1]]
#> [[1]]$myname
#> [1] "1"
#>
#>
#> [[2]]
#> [[2]]$myname
#> [1] "2"
The output of lapply itself doesn't have names. Look:
a <- lapply(cc, ff)
names(a)
#> NULL
The output of the lapply is actually an unnamed list. Each element of a is a named list.
names(a[[1]])
#> [1] "myname"
names(a[[2]])
#> [1] "myname"
So in fact, USE.NAMES does apply: sapply assigns the contents of cc as names to the output of lapply, around which sapply is a thin wrapper, as stated in the documentation. It's quite straightforward to follow the code through:
sapply
#> function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
#> {
#> FUN <- match.fun(FUN)
#> answer <- lapply(X = X, FUN = FUN, ...)
#> if (USE.NAMES && is.character(X) && is.null(names(answer)))
#> names(answer) <- X
#> if (!isFALSE(simplify) && length(answer))
#> simplify2array(answer, higher = (simplify == "array"))
#> else answer
#> }
#> <bytecode: 0x036ae7a8>
#> <environment: namespace:base>
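The effect of those outer names can be reproduced by hand. This sketch reuses cc and ff from the question and calls unlist(recursive = FALSE) before and after naming the lapply result; the variable names res, unnamed_flat and named_flat are mine.

```r
cc <- c("1", "2")
ff <- function(x) list(myname = x)

res <- lapply(cc, ff)

# Outer list unnamed: only the inner names survive
unnamed_flat <- unlist(res, recursive = FALSE)
names(unnamed_flat)

# Outer list named (what sapply does when USE.NAMES = TRUE and X is character):
# outer and inner names are joined with "."
names(res) <- cc
named_flat <- unlist(res, recursive = FALSE)
names(named_flat)
```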

Extract list elements using names

I have a list
ls<-list(c("a"="one","b"="two"),"x"="t4",c("y"="t5","z"="t6"))
I would like to extract the list elements by names rather than indexing. Is there a way to do it?
As in
ls["a"]
> "one"
ls["y"]
> "t5"
I want only the output "one" and "t5". I will use these outputs either to combine them with some other string or to perform arithmetic (if the outputs are numbers) with other variables.
I found a similar question asked before, R: get element by name from a nested list.
But it doesn't work for this. Any thoughts?
With plyr:
plyr::llply(ls, function(x) x["a"])
or:
Filter(Negate(is.na), plyr::llply(ls, function(x) x["y"]))
[[1]]
y
"t5"
You can automate it by making it a function.
An attempt at automating the process (might be slow):
purrr::map(c("a", "y"),
function(x) lapply(ls, function(z) z[x]))
The following might be sufficient in your specific case, given that the component names are unique (otherwise there is an identifiability issue).
## data
ls <- list(c(a = "one", b = "two"), x = "t4", list(c(y = "t5", z = "t6")))
getElement <- function(ls, name) unlist(ls)[[grep(name, names(unlist(ls)))]]
getElement(ls, "a")
#> [1] "one"
getElement(ls, "b")
#> [1] "two"
getElement(ls, "x")
#> [1] "t4"
getElement(ls, "y")
#> [1] "t5"
We can just unlist the list and use the [[ operator, which returns an unnamed one-element vector:
unlist(ls)[["a"]]
# [1] "one"
unlist(ls)[["y"]]
# [1] "t5"
If we want to keep the name, use [:
unlist(ls)["a"]
# a
# "one"
unlist(ls)["y"]
# y
# "t5"
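Since grep() matches patterns rather than whole names, an exact-match variant may be safer when names overlap. The following is only a sketch: get_by_name is a hypothetical helper and lst is a copy of the question's list under a different name.

```r
lst <- list(c(a = "one", b = "two"), x = "t4", c(y = "t5", z = "t6"))

# Hypothetical helper: exact-name lookup on the flattened list
get_by_name <- function(lst, name) {
  flat <- unlist(lst)  # note: unlist() coerces mixed types to a common type
  hits <- flat[names(flat) == name]
  if (length(hits) != 1L) {
    stop("expected exactly one element named '", name, "', found ", length(hits))
  }
  unname(hits)
}

val_a <- get_by_name(lst, "a")
val_y <- get_by_name(lst, "y")
val_a
val_y
```

Because unlist() coerces everything to one type, a list mixing numbers and strings comes back as character; convert with as.numeric() before doing arithmetic.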

How to convert object variable into string inside a function

I have the following list of vectors
v1 <- c("foo","bar")
v2 <- c("qux","uip","lsi")
mylist <- list(v1,v2)
mylist
#> [[1]]
#> [1] "foo" "bar"
#>
#> [[2]]
#> [1] "qux" "uip" "lsi"
What I want to do is to apply a function so that it prints this string:
v1:foo,bar
v2:qux,uip,lsi
So it involves two steps: 1) convert the object variable to a string and
2) make the vector into a string. The latter is easy as I can do this:
make_string <- function (content_vector) {
cat(content_vector,sep=",")
}
make_string(mylist[[1]])
# foo,bar
make_string(mylist[[2]])
# qux,uip,lsi
I am aware of this solution, but I don't know how I can turn the object name into a string within a function so that
it prints like my desired output.
I need to do this inside a function, because there are many other outputs I need to process.
We can use
cat(paste(c('v1', 'v2'), sapply(mylist, toString), sep=":", collapse="\n"), '\n')
#v1:foo, bar
#v2:qux, uip, lsi
If we need to pass the original object i.e. 'v1', 'v2'
make_string <- function(vec){
obj <- deparse(substitute(vec))
paste(obj, toString(vec), sep=":")
}
make_string(v1)
#[1] "v1:foo, bar"
If you want to use a list, you can name the objects in the list to be able to use them in a function. Remove the cat if you just want a string to be returned.
v1 <- c("foo","bar")
v2 <- c("qux","uip","lsi")
# objects given names here
mylist <- list("v1" = v1, "v2" = v2)
# see names now next to the $
mylist
$v1
[1] "foo" "bar"
$v2
[1] "qux" "uip" "lsi"
make_string <- function (content_vector) {
vecname <- names(content_vector)
cat(paste0(vecname, ":", paste(sapply(content_vector, toString), sep = ",")))
}
make_string(mylist[1])
v1:foo, bar
make_string(mylist[2])
v2:qux, uip, lsi
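For the exact format asked for in the question (no space after the commas), one possible sketch works on the whole named list at once. The helper name make_lines is hypothetical; it relies on the list being named as above.

```r
mylist <- list(v1 = c("foo", "bar"), v2 = c("qux", "uip", "lsi"))

# Hypothetical helper: one "name:elem1,elem2" string per list element
make_lines <- function(named_list) {
  joined <- vapply(named_list, paste, character(1), collapse = ",")
  paste0(names(named_list), ":", joined)
}

lines <- make_lines(mylist)
cat(lines, sep = "\n")
```

Returning the strings and printing them separately keeps the function reusable when the output has to be processed further.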

Setting an object to itself

I am new to R and have a question on the function posted here: R RStudio Resetting debug / function environment. Why are the objects set to themselves (e.g. "getmean = getmean" etc.)? Couldn't it simply be written as follows: list(set, get, setmean, getmean)
The difference is that
aa <- list(set, get, setmean, getmean)
is an unnamed list and
bb <- list(set=set, get=get, setmean=setmean, getmean=getmean)
is a named list. Compare names(aa) and names(bb).
And that = is not assignment. It's really just giving a label to a list item. It's one of the reasons R programmers try to only use <- for assignment and leave = with this special meaning. You could have easily also done
cc <- list(apple=set, banana=get, orange=setmean, grape=getmean)
cc$apple()
It doesn't have to be the exact same name.
Because list(set, get, setmean, getmean) won't tag the list elements with the correct names. Here's an example of the difference between tagged and untagged lists:
> list(1, 2, 3)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
> list(foo=1, bar=2, baz=3)
$foo
[1] 1
$bar
[1] 2
$baz
[1] 3
Note that in the context of argument lists, = is used to supply named arguments; it does not do any assignment (unlike <-). Thus list(foo=1, bar=2, baz=3) is very different from list(foo<-1, bar<-2, baz<-3).
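A short sketch makes the difference visible: with =, the list gets names and nothing is created in the calling environment; with <-, variables are created as a side effect and the list stays unnamed. The variable names here are for illustration only.

```r
# '=' inside list(): tags the elements, creates no variables
named <- list(foo = 1, bar = 2)
names(named)

# '<-' inside list(): assigns foo and bar in the calling environment,
# and passes the values positionally, so the list has no names
unnamed <- list(foo <- 1, bar <- 2)
names(unnamed)
foo
bar
```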
The question has been answered, but you could also do this to achieve the same result.
> object <- c('set', 'get', 'setmean', 'getmean')
> setNames(object = as.list(object), nm = object)
# $set
# [1] "set"
#
# $get
# [1] "get"
#
# $setmean
# [1] "setmean"
#
# $getmean
# [1] "getmean"
Whether the values need quotes depends on what they actually are.
You can also set different names like this:
> setNames(as.list(object), letters[1:4])
# $a
# [1] "set"
#
# $b
# [1] "get"
#
# $c
# [1] "setmean"
#
# $d
# [1] "getmean"
setNames comes in handy when working with lapply.
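As a sketch of that last point: lapply carries over the names of its input, so naming a character vector after itself yields a result that can be indexed by those same strings. The example uses the built-in mtcars data, which is my choice and not part of the answer above.

```r
cols <- c("mpg", "hp")

# Without setNames the result would be an unnamed list;
# with it, each element is labelled by the column it came from
means <- lapply(setNames(cols, cols), function(col) mean(mtcars[[col]]))

names(means)
means$mpg
```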

Resources