Using backticks and operators in apply family functions - r

I saw in a recent answer an apply family function with assignments built-in and can't generalize it.
lst <- list(a=1, b=2:3)
lst
$a
[1] 1
$b
[1] 2 3
This can't yet be made into a data.frame because of the unequal lengths. But by padding each element to the maximum length, it works:
data.frame(lapply(lst, `length<-`, max(lengths(lst))))
   a b
1  1 2
2 NA 3
That works. But I've never used arrow assignments in apply functions. I tried to understand it by generalizing like:
lapply(lst, function(x) length(x) <- max(lengths(lst)))
$a
[1] 2
$b
[1] 2
That's not the correct output. Nor is
lapply(lst, function(x) length(x) <- max(lengths(x)))
Error in lengths(x) : 'x' must be a list
This would be a useful technique to understand well. Is there a way to express the assignment in the anonymous function form?

An anonymous function returns the value of its last expression. Here that is the assignment, which evaluates to its right-hand side (the new length), not the modified 'x'. We have to return x explicitly, with return(x) or simply x.
lapply(lst, function(x) {
  length(x) <- max(lengths(lst))
  x
})
#$a
#[1] 1 NA
#$b
#[1] 2 3
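To connect this back to the backtick form in the question: `length<-` is an ordinary function that takes the object and the new value and returns the modified object, which is why it can be handed to lapply directly. The sketch below compares the three spellings; n and the padded_* names are illustrative and not from the original post.

```r
lst <- list(a = 1, b = 2:3)
n <- max(lengths(lst))  # target length for every element

# `length<-` takes (x, value) and returns the modified x
padded_backtick <- lapply(lst, `length<-`, n)

# Anonymous function: do the assignment, then return x explicitly
padded_anon <- lapply(lst, function(x) {
  length(x) <- n
  x
})

# Anonymous function calling the replacement function directly
padded_direct <- lapply(lst, function(x) `length<-`(x, n))

# All three give the same padded list, which data.frame() accepts
data.frame(padded_backtick)
```

The same pattern applies to other replacement functions such as `names<-` or `attr<-`.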

Related

How can I remove elements by columns number from a list?

I'd like to remove elements from a list if they contain fewer than 3 values.
For this I tried:
#Create a list
my_list <- list(a = c(3,5,6), b = c(3,1,0), c = 4, d = NA)
my_list
$a
[1] 3 5 6
$b
[1] 3 1 0
$c
[1] 4
$d
[1] NA
# Then I create a function to remove the elements matching my condition:
delete.F <- function(x.list){
  x.list[unlist(lapply(x.list, function(x) ncol(x)) < 3)]}
delete.F(my_list)
And I get as output:
Error in unlist(lapply(x.list, function(x) ncol(x)) < 3) :
(list) object cannot be coerced to type 'double'
Any ideas, please?
An option is to create a logical expression with lengths and use that for subsetting the list
my_list[lengths(my_list) >=3]
#$a
#[1] 3 5 6
#$b
#[1] 3 1 0
Note that the example is a list of vectors, not a list of data.frames. ncol/nrow only apply to objects with a dim attribute, such as a matrix or a data.frame; for a plain vector, ncol returns NULL.
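A minimal sketch of that difference, reusing my_list from the question (m and keep are illustrative names, not from the original post):

```r
my_list <- list(a = c(3, 5, 6), b = c(3, 1, 0), c = 4, d = NA)

# A plain vector has no dim attribute, so ncol() gives NULL
ncol(my_list$a)
length(my_list$a)

# A matrix does have a dim attribute, so ncol() is meaningful
m <- matrix(1:6, nrow = 2)
ncol(m)

# Build the logical index from length(), one integer per element
keep <- vapply(my_list, length, integer(1)) >= 3
my_list[keep]
```

Because lapply over ncol yields a list of NULLs here, comparing that list with 3 is what triggers the coercion error in the question.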
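If we want to somehow use lapply (based on some constraints), create the logic with length. Note that unlist flattens the kept elements into a single named vector rather than returning a list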
unlist(lapply(my_list, function(x) if(length(x) >=3 ) x))
If we need to create the index with lapply, use length (but it would be slower than lengths)
my_list[unlist(lapply(my_list, length)) >= 3]
Here are a few more options. Using Filter in base R
Filter(function(x) length(x) >=3, my_list)
#$a
#[1] 3 5 6
#$b
#[1] 3 1 0
Or using purrr's keep and discard
purrr::keep(my_list, ~length(.) >= 3)
purrr::discard(my_list, ~length(.) < 3)

Fail to join the variables with its name into a list using lapply?

I have tried to use the code below but failed. I want to know why it failed and what's the correct (and elegant) way to do that?
a <- 1
b <- 2
res <- lapply(ls(), function(x, l) { l$x <- get(x)}, l=list())
I hoped to get a result like
res
# $a
# [1] 1
# $b
# [1] 2
but what I get is
res
# [[1]]
# [1] 1
# [[2]]
# [1] 2
We can use mget to obtain the value of more than one object and it returns a named list
mget(ls())
#$a
#[1] 1
#$b
#[1] 2
If we need to use get, then set the names with ls()
setNames(lapply(ls(), get), ls())
Using sapply:
sapply(ls(), get, simplify = FALSE)
# $a
# [1] 1
#
# $b
# [1] 2
sapply has simplify and USE.NAMES arguments, both TRUE by default. Because ls() returns a character vector, USE.NAMES = TRUE names the result after it, and setting simplify to FALSE keeps the result as a named list.
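As for why the original attempt failed, three things seem to be going on: l is a fresh local copy in every call, so nothing accumulates; l$x creates an element literally named "x" rather than one named after the value of x; and the function returns the value of the assignment, so lapply collects bare values without names. The sketch below reproduces this in a dedicated environment (env is a hypothetical name, used so the example does not depend on what is in your workspace).

```r
env <- new.env()
assign("a", 1, envir = env)
assign("b", 2, envir = env)

# The attempt from the question, restated against env
res <- lapply(ls(env), function(x, l) { l$x <- get(x, envir = env) }, l = list())
names(res)  # NULL: only the assigned values were collected

# Working alternatives from the answers above
fixed <- mget(ls(env), envir = env)
also <- sapply(ls(env), get, envir = env, simplify = FALSE)
```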

How to subset an environment by its variable names in r

I would like to subset an environment by its variable names.
e <- new.env(parent=emptyenv())
e$a <- 1
e$b <- 2
e$d <- 3
e[ls(e) %in% c("a","b", "c")]
### if e was a list, this would return the subset list(a=1, b=2)
I could not figure out how to subset elements of an environment by their names. Using lapply or eapply does not work either. What is the proper or easy way to subset an environment by its variable names?
Thank you.
Okay, after thinking this through a bit more, may I suggest:
mget(c("a","b"), envir=e)
#$a
#[1] 1
#
#$b
#[1] 2
My original solution was to use get() / mget() (maybe the OP saw my deleted comment earlier). Then I noticed that the OP had tried eapply(), so I thought about possible solutions with that. Here it is (with the help of @thelatemail).
# try some different data types
e <- new.env(parent=emptyenv())
e$a <- 1:3
e$b <- matrix(1:4, 2)
e$c <- data.frame(x=letters[1:2],y=LETTERS[1:2])
You can use either of the following to collect objects in environment e into a list:
elst <- eapply(e, "[") ## my idea
elst <- eapply(e, identity) ## thanks to @thelatemail
elst <- as.list.environment(e) ## thanks to @thelatemail
#$a
#[1] 1 2 3
#$b
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    4
#$c
#  x y
#1 a A
#2 b B
The as.list.environment() can be seen as the inverse operation of list2env(). It is mentioned in the "See Also" part of ?list2env.
The result elst is just an ordinary list. There are various ways to subset this list. For example:
elst[names(elst) %in% c("a","b")] ## no need to use "ls(e)" now
#$a
#[1] 1 2 3
#$b
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    4
mget(ls(e)[ls(e) %in% c('a','b','d')], e)
The [ operator usually returns the same type of object as the original, so I guess you're expecting an environment, rather than a list. The same environment but with a different set of elements, or a new environment with the specified elements? Either way I think you'll end up iterating, e.g.,
f = new.env(parent=emptyenv())
for (elt in c("a", "b"))
  f[[elt]] = e[[elt]]
Working with environments is not very idiomatic R code, which might explain why there is not a more elegant solution.
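If a new environment containing only the selected bindings is what you are after, one base R possibility is to combine mget() with list2env(). This is a sketch, assuming that names missing from e should simply be skipped; wanted, keep and f are illustrative names.

```r
e <- new.env(parent = emptyenv())
e$a <- 1
e$b <- 2
e$d <- 3

wanted <- c("a", "b", "c")
keep <- intersect(wanted, ls(e))  # drop names that are not bound in e

# Copy the selected bindings into a fresh environment
f <- list2env(mget(keep, envir = e), envir = new.env(parent = emptyenv()))
sort(ls(f))
```

This copies the bindings, so later changes to e are not reflected in f.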
You can use rlang::env_get_list() to get a list of the bindings:
rlang::env_get_list(env=e, c("a","b"))
#$a
#[1] 1
#
#$b
#[1] 2
If you're trying to get an environment, rather than a list, I'm not sure how you would do that, other than just creating a new environment using the output of rlang::env_get_list().
If you want to include elements in your list that might not exist in the environment (like "c"), you have to specify a default value - otherwise you'll get an error:
env_get_list(env = e, c("a","b","c"))
#Error in env_get_list(env = e, c("a", "b", "c")) : argument "default" is missing, with no default
env_get_list(env = e, c("a","b","c"),default=NULL)
#$a
#[1] 1
#
#$b
#[1] 2
#
#$c
#NULL
I assume you don't want c at all, so I'd do something like:
temp <- c("a","b","c")[c("a","b","c") %in% env_names(e)]
temp
[1] "a" "b"
env_get_list(env=e,temp)
#$a
#[1] 1
#
#$b
#[1] 2

Subset different vector elements within a list

Assume I have this list of vectors:
mylist <- list(a=1:3,b=4:1,c=1:5)
mylist
$a
[1] 1 2 3
$b
[1] 4 3 2 1
$c
[1] 1 2 3 4 5
I want to get the last or the max element of each vector like this for the last element:
$a
[1] 3
$b
[1] 1
$c
[1] 5
What I have tried so far:
First use lapply and the length function to get the last element index and then subset:
last <- unlist(lapply(mylist, length))
lapply(mylist,"[", last) # not working
Then I tried to use sapply with lapply. This is working, but I'm not sure whether this is generally valid. There must be a better base R solution (without loops!).
mymatrix <- sapply(last, function(x) lapply(mylist, "[",x))
diag(mymatrix)
$a
[1] 3
$b
[1] 1
$c
[1] 5
(Making this a CV, as there were many contributions here and it is worth summing them up)
If you have some function you want to apply on your list, a simple lapply should do, such as
lapply(mylist, max) # retrieving the maximum values
Or
lapply(mylist, tail, 1) # retrieving the last values (by @docendo)
If you want to operate on two vectors simultaneously, you could use mapply or Map
Map(`[`, mylist, lengths(mylist)) # A Map version of @docendo's lapply suggestion
Or per your newest request
Map(`[`, mylist, 1:3)
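The reason the lapply attempt in the question does not work is that extra arguments to lapply are passed unchanged to every call, so each vector is indexed with the whole of last. Map pairs the i-th vector with the i-th index instead. A short sketch of the difference follows; wrong, right, last_vals and max_vals are illustrative names.

```r
mylist <- list(a = 1:3, b = 4:1, c = 1:5)
last <- lengths(mylist)

# Every vector is indexed with the full index vector c(3, 4, 5)
wrong <- lapply(mylist, "[", last)

# Vector i is indexed with last[i] only
right <- Map(`[`, mylist, last)

# vapply variants that return a plain named vector instead of a list
last_vals <- vapply(mylist, function(v) v[length(v)], integer(1))
max_vals <- vapply(mylist, max, integer(1))
```

The vapply forms are only a convenience when a vector result is preferred; integer(1) assumes every element is an integer vector, as in this example.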

How to efficiently turn a particular element (which is a scalar) of each list in a list into a vector?

My current way is
coalesce <- function(x){
if (is.null(x)) NA else x
}
data[,aa:=sapply(JSON, function(x) coalesce(x$a))]
data[,bb:=sapply(JSON, function(x) x$b)]
> JSON <- list(list(a=1, b=1), list(b=2))
> JSON
[[1]]
[[1]]$a
[1] 1
[[1]]$b
[1] 1
[[2]]
[[2]]$b
[1] 2
> sapply(JSON, function(x) coalesce(x$a))
[1] 1 NA
> sapply(JSON, function(x) x$b)
[1] 1 2
JSON is a list of lists; each inner list may contain an element a, which I would like to grab. If a doesn't exist, NA is returned. Each list must contain b. Both a and b are always scalars.
Rprof tells me that the majority of the time is spent in sapply, FUN and coalesce.
I am wondering if there is any way to improve it?
Update
Sample data
x <- list(a=1, b=1)
y <- list(a=1)
JSON <- rep(list(x,y),300000)
system.time(sapply(JSON, function(x) x$a))
system.time(sapply(JSON, function(x) coalesce(x$b)))
Try coalescing after you extract the value and stick to lapply; that should speed things up (and if you post a reasonable benchmark sample, we can test it):
unlist(lapply(lapply(JSON, "[[", "a"), coalesce))
There's an error in the way you're using sapply - what you want is:
sapply(JSON, function(x) coalesce(x)$a)
But that's really not optimal, and it returns NULL for elements that lack a (probably not what you want).
Modify coalesce:
coalesce <- function(x){
if (is.null(x$a)) NA else x$a
}
And do:
data[,aa:=sapply(JSON, coalesce)]
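One further option, offered as a sketch rather than a benchmarked fix: vapply declares the result type up front, so it skips the simplification step that sapply performs and fails loudly if an element is not a numeric scalar. get_a is an illustrative helper name, and whether this is actually faster on your data is an assumption to verify with system.time.

```r
JSON <- list(list(a = 1, b = 1), list(b = 2))

# Return a typed NA when a is absent, so every result is a length-1 numeric
get_a <- function(x) if (is.null(x$a)) NA_real_ else x$a

aa <- vapply(JSON, get_a, numeric(1))
bb <- vapply(JSON, function(x) x$b, numeric(1))
```

The resulting vectors can then be assigned to the data.table columns in one step each.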
