Accessing variable name passed as argument inside apply - r

I asked almost the same question in another post, but only about the column name, and received a perfect solution for that need. Now what I need is the variable's full name, so I am reformulating the question here.
I use 'deparse(substitute(x))' inside my function to get the name of the variable passed as a parameter. It works great... but not with 'lapply'.
myfun <- function(x) {
  return(deparse(substitute(x)))
}
a <- c(1, 2, 3)
b <- c(4, 5, 5)
df <- data.frame(a, b)
myfun(df$a)
#> [1] "df$a"
But with 'lapply'...
lapply(df, myfun)
#> $a
#> [1] "X[[i]]"
#> $b
#> [1] "X[[i]]"
How can I get the variable name inside 'lapply'?
Thanks

When you pass a data frame to lapply, it iterates through the columns by numerical indexing using the double square bracket, not by name using the $ accessor. It behaves roughly like the following loop:
X <- df
result <- list()
for(i in seq_along(X)) {
result[[i]] <- myfun(X[[i]])
}
names(result) <- names(X)
result
#> $a
#> [1] "X[[i]]"
#>
#> $b
#> [1] "X[[i]]"
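The underlying rule is that substitute() only sees the expression written in the immediate call, so the name is lost as soon as the argument passes through any intermediate function, lapply or otherwise. A small illustration; `show_arg` and `pass_through` are hypothetical helpers:

```r
# substitute() captures the expression from the *immediate* call only
show_arg <- function(x) deparse(substitute(x))

# A wrapper that forwards its own argument, which it knows as y
pass_through <- function(y) show_arg(y)

a <- c(1, 2, 3)
b <- c(4, 5, 5)
df <- data.frame(a, b)

direct <- show_arg(df$a)          # "df$a": written directly in the call
forwarded <- pass_through(df$a)   # "y": the wrapper's expression, not the caller's
via_lapply <- lapply(df, show_arg)  # each element is "X[[i]]", lapply's expression
```

lapply plays the same role as `pass_through` here: the expression show_arg sees is the one lapply wrote, not the one you wrote.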
So a simple deparse(substitute(x)) will not work inside lapply: the expression myfun receives is literally X[[i]], so there is no column name to recover, and you would instead need to reconstruct it from the call stack. This is full of caveats and gotchas, but a (relatively) simple approach would be:
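myfun <- function(x) {
  # Deparse every call on the stack into a character vector
  # (function name first, then each argument)
  stack <- lapply(sys.calls(), function(x) sapply(as.list(x), deparse))
  # Called directly, e.g. myfun(df$a): return the argument as written
  if (stack[[length(stack)]][1] == 'myfun') {
    return(stack[[length(stack)]][2])
  }
  # Called by lapply as FUN(X[[i]], ...): combine the data name from the
  # lapply() call with the current column name, looked up in lapply's frame
  if (stack[[length(stack)]][1] == 'FUN') {
    return(paste0(stack[[length(stack) - 1]][2], '$',
                  eval(quote(names(X)[i]), parent.frame())))
  }
  # Any other context: fall back to the plain deparsed argument
  deparse(substitute(x))
}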
This means your function will still work if called directly:
myfun(df$a)
#> [1] "df$a"
but it will also work within lapply:
lapply(df, myfun)
#> $a
#> [1] "df$a"
#>
#> $b
#> [1] "df$b"
lapply(iris, myfun)
#> $Sepal.Length
#> [1] "iris$Sepal.Length"
#>
#> $Sepal.Width
#> [1] "iris$Sepal.Width"
#>
#> $Petal.Length
#> [1] "iris$Petal.Length"
#>
#> $Petal.Width
#> [1] "iris$Petal.Width"
#>
#> $Species
#> [1] "iris$Species"
It is specifically written to cover direct use or use within lapply. To extend it to other functionals, such as Map or the various purrr mapping functions, each of them would have to be handled by its own if clause.
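A different route, which sidesteps the call stack entirely, is to capture the data frame's name once in the outer call and hand each column's full name to the function as an extra argument. A minimal sketch; `lapply_named` is a hypothetical helper, not a base R function:

```r
# Hypothetical helper (not part of base R): capture the data frame's name
# once, in the outer call, then call FUN(column, "name$column") per column
lapply_named <- function(data, FUN) {
  data_name <- deparse(substitute(data))
  full_names <- paste0(data_name, "$", names(data))
  res <- Map(FUN, data, full_names)  # iterate over columns and names together
  names(res) <- names(data)
  res
}

a <- c(1, 2, 3)
b <- c(4, 5, 5)
df <- data.frame(a, b)

# FUN receives both the column values and its full name
out <- lapply_named(df, function(x, full_name) full_name)
```

Unlike the call-stack approach, this does not depend on the names lapply uses internally (X and i), at the cost of calling a helper instead of lapply directly.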

Here is another solution; it is a bit verbose, and Allen's solution is much better:
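myfun <- function(x) {
  pf <- parent.frame()
  x_nm <- deparse(substitute(x))
  # Walk up the call stack looking for a call to an *apply function
  frame_n <- sys.nframe()
  in_apply <- FALSE
  while (frame_n > 0) {
    cl <- as.list(sys.call(frame_n))
    if (grepl("apply", cl[[1]])) {
      x_obj <- cl[[2]]  # the object passed to the *apply call, e.g. df
      in_apply <- TRUE
      break
    }
    frame_n <- frame_n - 1L
  }
  if (in_apply) {
    # Current index from lapply's frame; [] takes a copy, since lapply
    # updates i in place
    idx <- parent.frame()$i[]
    obj <- get(as.character(x_obj), envir = pf)
    # Use the element name if there is one, otherwise the index
    if (!is.null(names(obj)[idx])) {
      nm_or_idx <- names(obj)[idx]
    } else {
      nm_or_idx <- idx
    }
    x_nm <- paste0(x_obj, '$', nm_or_idx)
  }
  return(x_nm)
}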
myfun(df$a)
#> [1] "df$a"
lapply(df, myfun)
#> $a
#> [1] "df$a"
#>
#> $b
#> [1] "df$b"
Created on 2023-02-09 by the reprex package (v2.0.1)

Alternatively, define the function to take a character string, 'col_name', holding the name of the data frame column. For example, if col_name is "a", df[[col_name]] extracts column "a" from the data frame. The paste() function can then concatenate the string 'df$' and 'col_name':
myfun <- function(col_name) {
  col <- df[[col_name]]  # the column itself, if you need its values
  return(paste("df$", col_name, sep = ""))
}
lapply(colnames(df), myfun)
Output:
[[1]]
[1] "df$a"
[[2]]
[1] "df$b"
Because myfun looks up df by name, it works with whatever data is assigned to df; for example:
df <- iris
lapply(colnames(df), myfun)
Output:
[[1]]
[1] "df$Sepal.Length"
[[2]]
[1] "df$Sepal.Width"
[[3]]
[1] "df$Petal.Length"
[[4]]
[1] "df$Petal.Width"
[[5]]
[1] "df$Species"
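The function above hard-codes the "df$" prefix. If you want the prefix to follow whichever data frame you are iterating over, one option is to pass its name as an extra argument through lapply's `...`. A sketch; `myfun2` and its `data_name` argument are hypothetical:

```r
# Build "<data name>$<column name>"; data_name is supplied by the caller
myfun2 <- function(col_name, data_name) {
  paste0(data_name, "$", col_name)
}

# Extra named arguments to lapply() are passed on to myfun2()
res <- lapply(colnames(iris), myfun2, data_name = "iris")
```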
I hope this helps.

Related

Accessing column name inside lapply

I use 'deparse(substitute(x))' inside my function to get the name of the data frame column passed as an argument. It works great... but not with 'lapply'.
myfun <- function(x) {
  return(deparse(substitute(x)))
}
a <- c(1, 2, 3)
b <- c(4, 5, 5)
df <- data.frame(a, b)
myfun(df$a)
#> [1] "df$a"
But with 'lapply'...
lapply(df, myfun)
#> $a
#> [1] "X[[i]]"
#> $b
#> [1] "X[[i]]"
How can I get the name inside 'lapply'?
EDIT: I need to access not the column name but the FULL NAME (dataFrameName$varName)
You can use colnames():
f <- function(d) {
  paste0(deparse(substitute(d)), "$", colnames(d))
}
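Because deparse(substitute(d)) runs once, in the outer call and before any iteration, the data frame's name is still available there. A usage sketch reusing the df from the question; wrapping the result with as.list()/setNames() to get an lapply-style named list is an addition, not part of the original answer:

```r
f <- function(d) {
  paste0(deparse(substitute(d)), "$", colnames(d))
}

a <- c(1, 2, 3)
b <- c(4, 5, 5)
df <- data.frame(a, b)

full <- f(df)  # character vector of "df$<column>" names
# Optional: a named list, mirroring the shape lapply(df, ...) would return
full_list <- setNames(as.list(full), colnames(df))
```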

Why don't lambda functions handle replacement functions in their intuitive form?

Why don't lambda functions handle replacement functions in their natural form? For example, consider the length<- function. Say I want to standardize the lengths of a list of objects, I may do something like:
a <- list(c("20M1", "A1", "ACC1"), c("20M2", "A2", "ACC2"), c("20M3"))
mx <- max(lengths(a))
lapply(a, `length<-`, mx)
#> [[1]]
#> [1] "20M1" "A1" "ACC1"
#>
#> [[2]]
#> [1] "20M2" "A2" "ACC2"
#>
#> [[3]]
#> [1] "20M3" NA NA
However, if I wanted to specify the argument positions explicitly using a lambda function, I would write the following (which also works):
lapply(a, function(x) `length<-`(x, mx))
But why doesn't the more intuitive notation for replacement functions (see below) work?
lapply(a, function(x) length(x) <- mx)
#> [[1]]
#> [1] 3
#>
#> [[2]]
#> [1] 3
#>
#> [[3]]
#> [1] 3
This returns an output I did not expect. What is going on here? Lambda functions seem to handle the intuitive form of infix functions, so I was a little surprised they don't work with the intuitive form of replacement functions. Why is this / is there a way to specify replacement functions in lambda functions using their intuitive form?
(I imagine it has something to do with the special operator <-... but would be curious for a solution or more precise explanation).
Whenever you do an assignment in R, the value of that expression is the right-hand side value (returned invisibly). This is true even for replacement forms such as names(x) <- value. For example, if you do this:
x <- 1:2; y <- (names(x) <- letters[1:2])
y
#> [1] "a" "b"
You can see that y gets the values of the names, not the updated value of x. Likewise, in your lambda, length(x) <- mx pads the local copy of x, but the function returns the value of the assignment, mx, which is why every element is 3.
In your case, if you want to return the updated value itself, you need to do so explicitly:
lapply(a, function(x) {length(x) <- mx; x})
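A quick check of both behaviours, using the list from the question; `set_len` is a hypothetical helper for illustration:

```r
a <- list(c("20M1", "A1", "ACC1"), c("20M2", "A2", "ACC2"), c("20M3"))
mx <- max(lengths(a))

# Returning x explicitly gives the padded vectors
padded <- lapply(a, function(x) { length(x) <- mx; x })

# Without the trailing x, the function's value is the assignment's value,
# i.e. its right-hand side, even though x itself was truncated locally
set_len <- function(x) { length(x) <- 3 }
rhs <- set_len(1:5)
```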

R: sapply / lapply Different Behaviour with Names

I reduced a problem to the following toy code:
cc <- c("1", "2")
ff <- function(x) { list(myname = x) }
aa <- unlist(lapply(cc, ff))
bb <- sapply(cc, ff)
I'd expect aa and bb to be identical, but:
> aa
myname myname
"1" "2"
> bb
$`1.myname`
[1] "1"
$`2.myname`
[1] "2"
I'm aware of the USE.NAMES argument to sapply; however, it is documented as:
USE.NAMES logical; if TRUE and if X is character, use X as names for the result unless it had names already.
and so should have no impact in this case. Internally, it isn't even passed to simplify2array, and thus not to the final unlist either.
What's going on here? Could this be an R issue?
Edit: after further investigation, it turns out the root cause of the difference is that sapply is essentially equivalent not to
unlist(lapply(cc, ff))
but rather to
unlist(lapply(cc, ff), recursive = FALSE)
(This is the exact internal unlist call.)
Look carefully at this:
lapply(cc, ff)
#> [[1]]
#> [[1]]$myname
#> [1] "1"
#>
#>
#> [[2]]
#> [[2]]$myname
#> [1] "2"
The output of lapply itself doesn't have names. Look:
a <- lapply(cc, ff)
names(a)
#> NULL
The output of the lapply is actually an unnamed list. Each element of a is a named list.
names(a[[1]])
#> [1] "myname"
names(a[[2]])
#> [1] "myname"
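So USE.NAMES does apply: the lapply output has no names and cc is a character vector, so sapply assigns the contents of cc as names for the output of lapply (for which sapply is a thin wrapper, as stated in the documentation). Simplification then calls unlist(answer, recursive = FALSE), which combines these outer names with the inner myname names into 1.myname and 2.myname. It's quite straightforward to follow the code through: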
sapply
#> function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
#> {
#> FUN <- match.fun(FUN)
#> answer <- lapply(X = X, FUN = FUN, ...)
#> if (USE.NAMES && is.character(X) && is.null(names(answer)))
#> names(answer) <- X
#> if (!isFALSE(simplify) && length(answer))
#> simplify2array(answer, higher = (simplify == "array"))
#> else answer
#> }
#> <bytecode: 0x036ae7a8>
#> <environment: namespace:base>
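Turning USE.NAMES off confirms this reading: the outer names come from cc, while the inner myname names survive because simplification uses unlist(..., recursive = FALSE), which also keeps the result a list rather than a character vector like aa. A short check:

```r
cc <- c("1", "2")
ff <- function(x) list(myname = x)

with_names <- sapply(cc, ff)                    # outer names taken from cc
no_names <- sapply(cc, ff, USE.NAMES = FALSE)   # no outer names

# Without outer names, the result matches the internal unlist call
same <- identical(no_names, unlist(lapply(cc, ff), recursive = FALSE))
```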

Capture and evaluate function arguments within function body

I would like to capture a function's arguments within its body to help with logging. I have found that match.call() and sys.call() work when the argument value is written explicitly in the function call, but they return the unevaluated expression, so when an object name is used they show the name instead of its value.
Here's a simplified example:
gauss_vector <- function(number) {
sys_args <- as.list(sys.call())
match_args <- as.list(match.call())
output <- rnorm(n = number)
list(sys_args,
match_args,
output)
}
When this function is called like this:
gauss_vector(number = 5)
The resulting list includes the value 5.
[[1]]
[[1]][[1]]
gauss_vector
[[1]]$number
[1] 5
[[2]]
[[2]][[1]]
gauss_vector
[[2]]$number
[1] 5
[[3]]
[1] 0.9663434 0.8051087 0.1576298 0.3189806 -2.3110680
However, when the function is called like this:
n <- 5
gauss_vector(number = n)
The resulting list only includes n.
[[1]]
[[1]][[1]]
gauss_vector
[[1]]$number
n
[[2]]
[[2]][[1]]
gauss_vector
[[2]]$number
n
[[3]]
[1] -0.6017670 -0.7631405 0.7793892 -0.7529637 1.3022802
Is there a way to capture the evaluated figure rather than the object name when the function is called in the second way?
You could evaluate all the arguments passed to the function:
gauss_vector <- function(number) {
  sys_args <- as.list(sys.call())
  # Replace each argument expression (everything after the function name)
  # with its evaluated value
  sys_args[-1] <- lapply(sys_args[-1], eval)
  match_args <- as.list(match.call())
  match_args[-1] <- lapply(match_args[-1], eval)
  output <- rnorm(n = number)
  list(sys_args, match_args, output)
}
gauss_vector(n)
#[[1]]
#[[1]][[1]]
#gauss_vector
#[[1]][[2]]
#[1] 5
#[[2]]
#[[2]][[1]]
#gauss_vector
#[[2]]$number
#[1] 5
#[[3]]
#[1] 0.6998265 0.4037748 1.8558809 -0.1343624 -1.5600925
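One caveat: inside lapply(), eval's default environment is lapply's own frame rather than the caller of gauss_vector, so the version above should find n only because n lives in the global environment. Capturing parent.frame() first and passing it as eval's envir should make it work when gauss_vector is called from inside another function. A sketch; `wrapper` and `k` are hypothetical:

```r
gauss_vector <- function(number) {
  caller_env <- parent.frame()  # environment gauss_vector was called from
  match_args <- as.list(match.call())
  # Evaluate each argument expression in the caller's environment
  match_args[-1] <- lapply(match_args[-1], eval, envir = caller_env)
  output <- rnorm(n = number)
  list(args = match_args, output = output)
}

# Hypothetical caller whose argument lives only in a local variable
wrapper <- function() {
  k <- 3
  gauss_vector(number = k)
}

res <- wrapper()
res$args$number  # the value of k rather than the symbol k
```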

How to delay the evaluation of function arguments in R?

I would like to delay the evaluation of a function argument in R. Example:
my_func <- function(FUN){print(FUN); print(FUN)}
my_func(runif(1))
#> [1] 0.2833882
#> [1] 0.2833882
Created on 2019-07-21 by the reprex package (v0.2.1)
This works as documented: runif(1) is evaluated only once, and its result is printed twice.
Instead, I don't want runif(1) to be evaluated until it is within each print() statement. This would generate two different random numbers.
In other words, I don't want FUN to "resolve" (if that is the right word) to a value of runif(1) until we are within a print() statement.
You can also achieve this with substitute and eval:
my_func <- function(FUN) {
print(eval(substitute(FUN)))
print(eval(substitute(FUN)))
}
my_func(runif(1))
#> [1] 0.09973534
#> [1] 0.8096205
my_func(runif(1))
#> [1] 0.2231202
#> [1] 0.5386637
NB: For additional details, see the "Non-standard evaluation" chapter of Advanced R.
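A related caveat: eval(substitute(FUN)) evaluates the expression in my_func's own frame, so variables that exist only inside a calling function will not be found. Evaluating in parent.frame() avoids that. A sketch; `caller` and `n_draws` are hypothetical, and the function also returns both draws invisibly so they can be inspected:

```r
my_func <- function(FUN) {
  expr <- substitute(FUN)    # the unevaluated argument, e.g. runif(n_draws)
  env <- parent.frame()      # the environment my_func was called from
  first <- eval(expr, env)   # each eval() re-runs the expression...
  second <- eval(expr, env)  # ...so the two draws differ
  print(first)
  print(second)
  invisible(c(first, second))
}

# Hypothetical caller: n_draws exists only inside this function
caller <- function() {
  n_draws <- 1
  my_func(runif(n_draws))
}

vals <- caller()
```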
Here is another trick, using match.call and eval:
my_func <- function(FUN){
print(eval(match.call()[[2]]))
print(eval(match.call()[[2]]))
}
my_func(runif(1))
#[1] 0.7439711
#[1] 0.5011816
my_func(runif(1))
#[1] 0.7864152
#[1] 0.730453
Alternatively, provide an expression:
f <- function(EXPR) {
  print(EXPR)
  eval(EXPR)
}
EXPR <- expression(runif(1))
f(EXPR)
#> expression(runif(1))
#> [1] 0.1761139
Or provide a string:
f2 <- function(STR) {
  print(STR)
  eval(parse(text = STR))
}
STR <- "runif(1)"
f2(STR)
#> [1] "runif(1)"
#> [1] 0.7630865
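Another option, not among the answers above, is to pass a function instead of an expression: each FUN() call re-runs its body, so no non-standard evaluation is needed at all. A minimal sketch (on R 4.1 and later, `\() runif(1)` is shorthand for `function() runif(1)`):

```r
my_func <- function(FUN) {
  # FUN is a function; each FUN() call re-runs its body
  draws <- c(FUN(), FUN())
  print(draws[1])
  print(draws[2])
  invisible(draws)
}

draws <- my_func(function() runif(1))
```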
