I reduced some problem to the following toy code:
cc <- c("1", "2")
ff <- function(x) list(myname = x)
aa <- unlist(lapply(cc, ff))
bb <- sapply(cc, ff)
I'd expect aa and bb to be identical, but:
> aa
myname myname
   "1"    "2"
> bb
$`1.myname`
[1] "1"
$`2.myname`
[1] "2"
I'm aware of the USE.NAMES argument to sapply. However, it's documented as:
USE.NAMES logical; if TRUE and if X is character, use X as names for
the result unless it had names already.
and so it should have no impact in this case.
Internally, it isn't even passed to simplify2array, and thus not to the final unlist either.
What's going on here? Could this be an R issue?
Edit: after further investigation, it turns out the root cause of the difference is that sapply is essentially equivalent not to
unlist(lapply(cc, ff))
but rather to
unlist(lapply(cc, ff), recursive = FALSE)
(This is the exact internal unlist call).
Look carefully at this:
lapply(cc, ff)
#> [[1]]
#> [[1]]$myname
#> [1] "1"
#>
#>
#> [[2]]
#> [[2]]$myname
#> [1] "2"
The output of lapply itself doesn't have names. Look:
a <- lapply(cc, ff)
names(a)
#> NULL
So the output of lapply is an unnamed list, but each element of a is itself a named list.
names(a[[1]])
#> [1] "myname"
names(a[[2]])
#> [1] "myname"
So USE.NAMES does in fact apply: sapply assigns the contents of cc as names for the output of the lapply call that it wraps (sapply is, as the documentation states, a thin wrapper around lapply). It's quite straightforward to follow the code through:
sapply
#> function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
#> {
#> FUN <- match.fun(FUN)
#> answer <- lapply(X = X, FUN = FUN, ...)
#> if (USE.NAMES && is.character(X) && is.null(names(answer)))
#> names(answer) <- X
#> if (!isFALSE(simplify) && length(answer))
#> simplify2array(answer, higher = (simplify == "array"))
#> else answer
#> }
#> <bytecode: 0x036ae7a8>
#> <environment: namespace:base>
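Putting the pieces together, the whole difference can be reproduced by hand (a sketch that follows the source above):
answer <- lapply(cc, ff)           # unnamed outer list of named inner lists
names(answer) <- cc                # the USE.NAMES branch
unlist(answer, recursive = FALSE)  # the internal simplify2array/unlist step
#> $`1.myname`
#> [1] "1"
#>
#> $`2.myname`
#> [1] "2"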
Related
I asked almost the same question in another post, but only for the column name, and received a perfect solution for that need. Now what I need is the variable's full name, so I am reformulating here.
I use deparse(substitute(x)) inside my function to get the name of the variable passed as a parameter. It works great... but not with lapply:
myfun <- function(x) {
  return(deparse(substitute(x)))
}
a <- c(1,2,3)
b <- c(4,5,5)
df<-data.frame(a,b)
myfun(df$a)
[1] "df$a"
but, with 'lapply'...
lapply(df, myfun)
$a
[1] "X[[i]]"
$b
[1] "X[[i]]"
How can I get the variable name inside lapply?
Thanks
When you pass a data frame to lapply, it iterates through the columns by numerical indexing using the double square bracket, not name indexing using the $ accessor. It is equivalent to using the following loop:
X <- df
result <- list()
for (i in seq_along(X)) {
  result[[i]] <- myfun(X[[i]])
}
names(result) <- names(X)
result
#> $a
#> [1] "X[[i]]"
#>
#> $b
#> [1] "X[[i]]"
So a simple deparse(substitute(x)) will not work inside lapply. You cannot recover the column name directly; you would need to reconstruct it from the call stack. This is full of caveats and gotchas, but a (relatively) simple approach would be:
myfun <- function(x) {
  # deparse every element of every call currently on the stack
  stack <- lapply(sys.calls(), function(x) sapply(as.list(x), deparse))
  if (stack[[length(stack)]][1] == 'myfun') {
    # called directly: the second element is the argument as written
    return(stack[[length(stack)]][2])
  }
  if (stack[[length(stack)]][1] == 'FUN') {
    # called from lapply: combine the object passed to lapply with the
    # name of the column currently being processed
    return(paste0(stack[[length(stack) - 1]][2], '$',
                  eval(quote(names(X)[i]), parent.frame())))
  }
  deparse(substitute(x))
}
This means your function will still work if called directly:
myfun(df$a)
#> [1] "df$a"
But it will also work within lapply:
lapply(df, myfun)
#> $a
#> [1] "df$a"
#>
#> $b
#> [1] "df$b"
lapply(iris, myfun)
#> $Sepal.Length
#> [1] "iris$Sepal.Length"
#>
#> $Sepal.Width
#> [1] "iris$Sepal.Width"
#>
#> $Petal.Length
#> [1] "iris$Petal.Length"
#>
#> $Petal.Width
#> [1] "iris$Petal.Width"
#>
#> $Species
#> [1] "iris$Species"
It is specifically written to cover direct use or use within lapply. If you wanted to expand it to work within other functional calls like Map or the various purrr mapping functions, these would each have to be covered by their own if clauses.
Here is another solution. It's a bit verbose, and Allen's solution is much better:
myfun <- function(x) {
  pf <- parent.frame()
  x_nm <- deparse(substitute(x))
  frame_n <- sys.nframe()
  apply <- FALSE
  # walk down the call stack looking for an *apply call
  while (frame_n > 0) {
    cl <- as.list(sys.call(frame_n))
    if (grepl("apply", cl[[1]])) {
      x_obj <- cl[[2]]  # the object that was passed to *apply
      apply <- TRUE
      break
    }
    frame_n <- frame_n - 1L
  }
  if (apply) {
    idx <- parent.frame()$i[]  # the loop index inside lapply
    obj <- get(x_obj, envir = pf)
    if (!is.null(names(obj)[idx])) {
      nm_or_idx <- names(obj)[idx]
    } else {
      nm_or_idx <- idx
    }
    x_nm <- paste0(x_obj, '$', nm_or_idx)
  }
  return(x_nm)
}
myfun(df$a)
#> [1] "df$a"
lapply(df, myfun)
#> $a
#> [1] "df$a"
#>
#> $b
#> [1] "df$b"
Created on 2023-02-09 by the reprex package (v2.0.1)
We can have the function take a character string col_name naming the data frame column. For example, if col_name is "a", then df[[col_name]] extracts column "a" from the data frame. We can then use paste() to concatenate the string "df$" with col_name:
myfun <- function(col_name) {
  col <- df[[col_name]]  # the column itself; note that "df" is hard-coded here
  return(paste("df$", col_name, sep = ""))
}
lapply(colnames(df), myfun)
Output:
[[1]]
[1] "df$a"
[[2]]
[1] "df$b"
If we would like to assign any data we could do the assignment and then run lapply for example:
df <- iris
lapply(colnames(df), myfun)
Output:
[[1]]
[1] "df$Sepal.Length"
[[2]]
[1] "df$Sepal.Width"
[[3]]
[1] "df$Petal.Length"
[[4]]
[1] "df$Petal.Width"
[[5]]
[1] "df$Species"
I hope this helps.
Why don't lambda functions handle replacement functions in their natural form? For example, consider the `length<-` function. Say I want to standardize the lengths of a list of objects; I might do something like:
a <- list(c("20M1", "A1", "ACC1"), c("20M2", "A2", "ACC2"), c("20M3"))
mx <- max(lengths(a))
lapply(a, `length<-`, mx)
#> [[1]]
#> [1] "20M1" "A1" "ACC1"
#>
#> [[2]]
#> [1] "20M2" "A2" "ACC2"
#>
#> [[3]]
#> [1] "20M3" NA NA
However, if I wanted to specify the argument positions explicitly using a lambda function, I'd need to do the following (which also works):
lapply(a, function(x) `length<-`(x, mx))
But why doesn't the more intuitive notation for replacement functions (see below) work?
lapply(a, function(x) length(x) <- mx)
#> [[1]]
#> [1] 3
#>
#> [[2]]
#> [1] 3
#>
#> [[3]]
#> [1] 3
This returns an output I did not expect. What is going on here? Lambda functions handle the intuitive form of infix functions, so I was a little surprised they don't work with the intuitive form of replacement functions. Why is this, and is there a way to specify replacement functions in lambda functions using their intuitive form?
(I imagine it has something to do with the special operator <-... but I would be curious to see a solution or a more precise explanation.)
Whenever you do an assignment in R, the value returned from that expression is the right-hand side value. This is true even for "special" versions of assignment functions. For example, if you do this:
x <- 1:2; y <- (names(x) <- letters[1:2])
> y
[1] "a" "b"
You can see that y gets the value of the names, not the updated value of x.
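For reference, the R Language Definition describes how replacement calls are expanded; length(x) <- mx is evaluated roughly as the sketch below, and the value of the whole expression is the right-hand side, returned invisibly:
`*tmp*` <- x
x <- `length<-`(`*tmp*`, value = mx)
rm(`*tmp*`)  # the expression's value is mx, not the updated x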
In your case, if you want to return the updated value itself, you need to do so explicitly:
lapply(a, function(x) {length(x) <- mx; x})
As far as I am aware, aside from simplification, everything that can be done with mapply can be done with Map. After all, Map is a wrapper around mapply. However, I was surprised to see that mapply takes both a ... set of arguments (which the docs call "arguments to vectorize over (vectors or lists of strictly positive length, or all of zero length)") and a MoreArgs argument on top of the required function FUN, whereas Map does not use MoreArgs, needing only ... (which the docs just call "vectors") and f.
My question is this: Why does mapply need MoreArgs but Map doesn't? Can mapply do something that Map cannot? Or is mapply trying to make something easier that would be harder with Map? And if so, what?
I suspect that sapply may be a useful point of reference for an answer. It may be helpful to compare its X, FUN and ... arguments to mapply's ... and MoreArgs.
Let's look at the code of Map:
Map
function (f, ...)
{
f <- match.fun(f)
mapply(FUN = f, ..., SIMPLIFY = FALSE)
}
As you wrote, it's just a wrapper, and the dots are forwarded to mapply. Note that SIMPLIFY is hardcoded to FALSE.
Why does mapply need MoreArgs but Map doesn't?
It's a design choice, possibly due in part to historical reasons. I wouldn't have minded explicit MoreArgs and USE.NAMES arguments (or a SIMPLIFY = TRUE argument, for that matter), but I believe the rationale is that Map is meant to be simple; if you want to tweak parameters, you're encouraged to use mapply. Nevertheless, you can use MoreArgs and USE.NAMES with Map: they travel through the dots to the mapply call, though this is undocumented, as the doc describes the ... argument only as "vectors".
Map(sum, setNames(1:2, c("a", "b")), 1:2)
#> $a
#> [1] 2
#>
#> $b
#> [1] 4
Map(sum, setNames(1:2, c("a", "b")), 1:2, USE.NAMES = FALSE)
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 4
Map(
  replicate,
  2:3,
  c(FALSE, TRUE),
  MoreArgs = list(expr = quote(runif(1))))
#> [[1]]
#> [[1]][[1]]
#> [1] 0.7523955
#>
#> [[1]][[2]]
#> [1] 0.4922519
#>
#>
#> [[2]]
#> [1] 0.81626690 0.07415023 0.56264388
The equivalent mapply calls would be:
mapply(sum, setNames(1:2, c("a", "b")), 1:2, SIMPLIFY = FALSE)
#> $a
#> [1] 2
#>
#> $b
#> [1] 4
mapply(sum, setNames(1:2, c("a", "b")), 1:2, USE.NAMES = FALSE, SIMPLIFY = FALSE)
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 4
mapply(
  replicate,
  2:3,
  c(FALSE, TRUE),
  MoreArgs = list(expr = quote(runif(1))))
#> [[1]]
#> [[1]][[1]]
#> [1] 0.6690229
#>
#> [[1]][[2]]
#> [1] 0.7529774
#>
#>
#> [[2]]
#> [1] 0.8632736 0.7822639 0.8553680
Can mapply do something that Map cannot? Or is mapply trying to make something easier that would be harder with Map? And if so, what?
You cannot use SIMPLIFY with Map:
Map(sum, 1:3, 1:3, SIMPLIFY = TRUE)
#> Error in mapply(FUN = f, ..., SIMPLIFY = FALSE): formal argument "SIMPLIFY" matched by multiple actual arguments
A bit of history
mapply was introduced with R 1.7.0.
Map was introduced with R 2.6.0; the NEWS item reads:
New higher-order functions Reduce(), Filter() and Map().
The f argument name is shared among those functions, and they are documented on the same page. The reason for moving away from naming the function argument FUN is unknown to me, but the consistency among these three functions (and the others documented in ?Reduce) explains why mapply and Map don't name their function argument the same way (the consistency explains the upper-case M too, I guess).
In the doc we can also read:
Map is a simple wrapper to mapply which does not attempt to simplify the result, similar to Common Lisp's mapcar (with arguments being recycled, however). Future versions may allow some control of the result type.
So, in theory, as I understand the last sentence, Map could be upgraded to provide some type stability, similar to what vapply does. It seems to me that R-devel didn't go all the way because they wanted to take time to decide properly what to do, and it has remained in this state since.
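For comparison, vapply is the existing function that offers this kind of type stability for single-input mapping; each result must match a declared template:
vapply(list(1:2, 3:5), sum, numeric(1))  # errors unless every result is a single number
#> [1]  3 12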
I would like to delay the evaluation of a function argument in R. Example:
my_func <- function(FUN){print(FUN); print(FUN)}
my_func(runif(1))
#> [1] 0.2833882
#> [1] 0.2833882
Created on 2019-07-21 by the reprex package (v0.2.1)
This works as documented: runif(1) is evaluated only once, and its result is printed twice.
Instead, I don't want runif(1) to be evaluated until it is within each print() statement. This would generate two different random numbers.
In other words, I don't want FUN to "resolve" --- if that is the right word --- to runif(1) until we are within a print() statement.
You can also achieve this with substitute and eval:
my_func <- function(FUN) {
  print(eval(substitute(FUN)))
  print(eval(substitute(FUN)))
}
my_func(runif(1))
#> [1] 0.09973534
#> [1] 0.8096205
my_func(runif(1))
#> [1] 0.2231202
#> [1] 0.5386637
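To see why this works: inside the function, substitute(FUN) recovers the unevaluated argument expression, and each eval() runs it afresh. A minimal illustration:
g <- function(FUN) substitute(FUN)
g(runif(1))
#> runif(1)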
NB: for additional details, check out the Non-standard evaluation chapter of Advanced R.
Here is one trick with match.call and eval:
my_func <- function(FUN) {
  print(eval(match.call()[[2]]))
  print(eval(match.call()[[2]]))
}
my_func(runif(1))
#[1] 0.7439711
#[1] 0.5011816
my_func(runif(1))
#[1] 0.7864152
#[1] 0.730453
Provide an expression:
f <- function(EXPR) {
  print(EXPR)
  eval(EXPR)
}
EXPR <- expression(runif(1))
> f(EXPR)
expression(runif(1))
[1] 0.1761139
Provide a string:
f2 <- function(STR) {
  print(STR)
  eval(parse(text = STR))
}
STR <- "runif(1)"
> f2(STR)
[1] "runif(1)"
[1] 0.7630865
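A further option, not shown above but worth noting (my own sketch): pass an actual function and call it wherever evaluation should happen, which sidesteps non-standard evaluation entirely:
f3 <- function(FUN) {
  print(FUN())  # evaluated here...
  print(FUN())  # ...and again here, giving a different draw
}
f3(function() runif(1))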
mylist <- list(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
123, NULL, 456)
> mylist
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
[[7]]
NULL
[[8]]
NULL
[[9]]
NULL
[[10]]
NULL
[[11]]
[1] 123
[[12]]
NULL
[[13]]
[1] 456
My list has 13 elements, 11 of which are NULL. I would like to remove them, but preserve the indices of the elements that are nonempty.
mylist2 = mylist[-which(sapply(mylist, is.null))]
> mylist2
[[1]]
[1] 123
[[2]]
[1] 456
This removes the NULL elements just fine, but I don't want the nonempty elements to be reindexed; i.e., I want mylist2 to look something like this, where the indices of the nonempty entries are preserved:
> mylist2
[[11]]
[1] 123
[[13]]
[1] 456
The closest you'll be able to get is to first name the list elements and then remove the NULLs.
x <- mylist  # work on a copy of the question's list
names(x) <- seq_along(x)
## Using some higher-order convenience functions
Filter(Negate(is.null), x)
# $`11`
# [1] 123
#
# $`13`
# [1] 456
# Or, using a slightly more standard R idiom
x[sapply(x, is.null)] <- NULL
x
# $`11`
# [1] 123
#
# $`13`
# [1] 456
Simply use mylist[lengths(mylist) != 0].
Function lengths() was introduced in R 3.2.0 (April 2015).
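Note that this alone reindexes the result. To keep the original positions, you can combine it with the naming trick from the answer above:
names(mylist) <- seq_along(mylist)
mylist[lengths(mylist) != 0]
#> $`11`
#> [1] 123
#>
#> $`13`
#> [1] 456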
The purrr package, included in the tidyverse, has elegant and fast functions for working with lists:
require(tidyverse)
# this works
compact(mylist)
# or this
mylist %>% discard(is.null)
# or this: keep() retains the elements for which the lambda returns TRUE
mylist %>% keep( ~ !is.null(.) )
All of the above options are from purrr. The output is:
[[1]]
[1] 123
[[2]]
[1] 456
Note: compact() was originally in plyr; when dplyr superseded plyr, compact() stayed around but moved to purrr. In any case, all of these functions are available through the parent package tidyverse.
Here's a link to download the purrr cheat sheet:
https://rstudio.com/resources/cheatsheets/
Or view the purrr cheat sheet directly in a browser:
https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_purrr.pdf
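As with the base R approaches, these reindex an unnamed list. To preserve the original positions, name the list first; a sketch using purrr's set_names():
mylist %>%
  set_names(seq_along(.)) %>%
  compact()
#> $`11`
#> [1] 123
#>
#> $`13`
#> [1] 456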
There's a function that automatically removes all the NULL entries of a list, and if the list is named, it keeps the names of the non-NULL entries. This function is compact from the plyr package.
foo <- 123  # example values so the snippet is runnable; any objects work
bar <- 456
l <- list(NULL, NULL, foo, bar)
names(l) <- c("one", "two", "three", "four")
plyr::compact(l)
If you want to preserve the indices of the non-NULL entries, you can name the list as was done in the answer above and then compact it:
names(l) <- seq_along(l)
plyr::compact(l)
If you want to keep the names, you can do:
a <- list(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
123, NULL, 456)
non_null_names <- which(!sapply(a, is.null))
a <- a[non_null_names]
names(a) <- non_null_names
a
You can then access the elements like so:
a[['11']]
num <- 11
a[[as.character(num)]]
a[[as.character(11)]]
a$`11`
You can't get them in the neat [[11]], [[13]] notation, though, because those represent numerical indices.
This solution works with nested lists as well:
rlist::list.clean(myNestedlist, recursive = TRUE)
Here it is with convenient chaining notation:
library(magrittr)
mylist %>%
  setNames(seq_along(.)) %>%
  Filter(. %>% is.null %>% `!`, .)
Here's a very simple way to do it using only base R functions:
names(mylist) <- seq_along(mylist)
mylist2 <- mylist[which(!sapply(mylist, is.null))]
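With the list from the question, this preserves the original positions as names:
mylist2
#> $`11`
#> [1] 123
#>
#> $`13`
#> [1] 456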