Get the names of functions inside a list - r

What I wish to achieve
I want to get the names of the functions stored in a list of functions.
Here is an example:
foo = list(foo1 = sum, foo2 = mean)
What I wish to extract from foo is:
list("sum", "mean")
And I would like it to be a function, meaning:
> foo = list(foo1 = sum, foo2 = mean)
> super_function(foo)
list("sum", "mean")
What I have checked
Applying names:
> sapply(foo , names)
$`foo1`
NULL
$foo2
NULL
Applying deparse(substitute())
> my_f <- function(x)deparse(substitute(x))
> sapply(foo, my_f)
foo1 foo2
"X[[i]]" "X[[i]]"
Neither idea works....
More background:
Here are some more details. They are not needed to understand the question above, but the community asked for them.
I'm using those functions as aggregation functions given by the user.
data(iris)
agg_function<-function(data, fun_to_apply){
res <- list()
for (col_to_transform in names(fun_to_apply)){
res[col_to_transform] <- (fun_to_apply[[col_to_transform]])(data[[col_to_transform]])
}
res
}
agg_function(iris, fun_to_apply = list("Sepal.Length" = mean, "Petal.Length" = sum))
Result is:
$`Sepal.Length`
[1] 5.843333
$Petal.Length
[1] 563.7
In this example I'm performing an aggregation on two columns of iris. I would like the name of the applied function to appear in the name of each field of the result.
NB: This is an oversimplification of what I'm doing.
Conclusion:
Do you have any ideas?

If you are starting from just the list foo = list(foo1 = sum, foo2 = mean), then it's not possible. The call to list() evaluates its arguments and stores the values that the variables sum and mean point to; it does not remember those variable names. Functions in R don't have names of their own: a function can be assigned to a variable, but it can just as well exist without ever being bound to a name.
You've basically just created a named list of functions. It might just as well look like this:
foo = list(foo1 = function(x) sum(x+1),
foo2 = function(x) mean(x+1))
Here we also have functions, but these functions don't have "names" other than the names you gave them in the list.
The only chance you have of making this work is to use something other than list() when creating foo in the first place, or to have the user explicitly write list(...) inside the call to your function so that you can capture the unevaluated expression (which isn't very practical).
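As a rough sketch of the "something other than list()" idea, a small constructor can record the expressions the user typed before they are evaluated. The names named_funs and fun_names are made up for this illustration, and super_function here is just one possible way to fill in the function asked for in the question.

```r
# Hypothetical constructor: stores the functions together with the
# expressions that were typed to refer to them.
named_funs <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
  funs <- list(...)                            # evaluated functions
  attr(funs, "fun_names") <- vapply(
    exprs,
    function(e) paste(deparse(e), collapse = " "),
    character(1)
  )
  funs
}

# Hypothetical accessor playing the role of super_function() in the question.
super_function <- function(x) as.list(unname(attr(x, "fun_names")))

foo <- named_funs(foo1 = sum, foo2 = mean)
super_function(foo)   # list("sum", "mean")
foo$foo1(1:3)         # the stored functions still work as usual
```

For an anonymous function the recorded "name" is simply the deparsed source text, so this only gives tidy labels when the user passes plain function names.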

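Although you already said that the tidyverse is not suitable for you, I will add this as another idea.
library(dplyr)
agg_function <- function(df, x, ...){
  df %>%
    # funs() is deprecated in recent dplyr versions; it is kept here to match the output below
    summarise_at(.vars = x, funs(...))
}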
agg_function(iris, c("Sepal.Length", "Petal.Length"), mean, sum)
Sepal.Length_mean Petal.Length_mean Sepal.Length_sum Petal.Length_sum
1 5.843333 3.758 876.5 563.7

You can use a list with the functions as strings
foo <- list(foo1 = "mean", foo2 = "sum")
foo
$foo1
[1] "mean"
$foo2
[1] "sum"
get(foo[[1]])(1:10)
[1] 5.5
get(foo[[2]])(1:10)
[1] 55
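Applied to the aggregation use case from the question, the string approach might look like the sketch below. The function name agg_function_named is invented here, and match.fun() is swapped in for get() because it only returns functions; otherwise the loop follows the one in the question.

```r
# Variant of agg_function() from the question; functions are given as strings
# so their names are available for labelling the result.
agg_function_named <- function(data, fun_to_apply) {
  res <- list()
  for (col in names(fun_to_apply)) {
    fun_name <- fun_to_apply[[col]]
    # match.fun() looks the function up by its name
    res[[paste(col, fun_name, sep = "_")]] <- match.fun(fun_name)(data[[col]])
  }
  res
}

out <- agg_function_named(iris, list(Sepal.Length = "mean", Petal.Length = "sum"))
names(out)   # "Sepal.Length_mean" "Petal.Length_sum"
```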
Or use the rlang package and do something like
library(rlang)
foo <- quos(foo1 = mean, foo2 = sum)
getNames <- function(x) {
  # quo_get_expr() extracts the captured expression from each quosure
  sapply(x, quo_get_expr)
}
getNames(foo)
$foo1
mean
$foo2
sum
eval_tidy(foo[[1]])(1:10)
[1] 5.5
eval_tidy(foo[[2]])(1:10)
[1] 55
This also works with anonymous functions:
foo <- quos(foo1 = function(x) sum(x + 1), foo2 = sum)
getNames(foo)
$foo1
function(x) sum(x + 1)
$foo2
sum
eval_tidy(foo[[1]])(1:10)
[1] 65

Related

How to use non standard evaluation with dollar sign in r

Context
I want to use non-standard evaluation with dollar sign in R.
I want to write a function with two parameters: data is the input data frame, and var is the name of a variable in that data frame. The return value is the column corresponding to that variable name.
I can do this with fun1.
library(dplyr)
df = data.frame(a = 1:3)
fun1 <- function(data, var){
data %>% pull({{var}})
}
> fun1(data = df, var = a)
[1] 1 2 3
Question
Is there any way to do what fun1 does using non-standard evaluation and the dollar sign ($)?
My attempt (fun2) is below, but it reports an error.
fun2 <- function(data, var){ # Can't create fun2, it will report an error.
data${{var}}
}
fun2(data = df, var = a)
We may use [[ to extract the column after converting the unquoted argument to a string:
fun2 <- function(data, var){
var <- deparse(substitute(var))
data[[var]]
}
Testing:
> fun2(data = df, var = a)
[1] 1 2 3
It is better to use [[ instead of $. If we really want to use only $, we can paste the names together and eval the result (not recommended):
fun2 <- function(data, var) {
argslist <- as.list(match.call()[-1])
eval(parse(text = sprintf('%s$%s',
deparse(argslist$data), deparse(argslist$var))))
}
> fun2(df, a)
[1] 1 2 3
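A further base R option, offered only as a sketch: instead of turning the argument into a string, evaluate the captured expression with the data frame as the lookup scope. The name fun3 is made up here. It avoids both $ and parse(), and it also accepts expressions built from column names.

```r
# Hypothetical fun3: evaluate the captured expression inside the data frame,
# falling back to the caller's environment for anything that is not a column.
fun3 <- function(data, var) {
  eval(substitute(var), data, parent.frame())
}

df <- data.frame(a = 1:3)
fun3(df, a)       # 1 2 3
fun3(df, a * 2)   # 2 4 6
```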

lapply function with arguments I want to pick from a dataframe with a loop

I'm still very new to R and haven't found any answer so far. Sorry to finally ask.
Edit with a quick example:
I want to compute a multidimensional development index based on South Africa Data.
My list is composed of individual-level information for each year, so df1 is about year 1 and df2 about year 2.
df1<-data.frame(var1=c(1, 1,1), var2=c(0,0,1), var3=c(1,1,0))
df2<-data.frame(var1=c(1, 0,1), var2=c(1,0,1), var3=c(0,1,0))
mylist <-list (df1,df2)
You can find here a very simplified working index function:
myindex <- function(x, dimX, dimY){
econ_i<- ( x[dimX]+ x[dimY] )
return ( (1/length(econ_i))*sum(econ_i) )
}
myindex(df1, "var2", "var3")
Then I have my dataframe of variables I want to use for my index
mydf <- data.frame(set1=c("var1", "var2"), set2=c("var2", "var3"))
I'm using a function to get arguments from database such as:
pick_values <-function(x){
vect <-c()
for(i in x){
vect <- c(vect, i)
}
return(vect)
}
I'd like to set up an lapply loop that applies my function to every element of my list, for all sets of arguments in my dataframe. In other words, I'd like to compute my index for both years, with every set of variables I can use. //end Edit
I've tried many unsuccessful things so far. For instance:
lapply(mylist, myindex, lapply(mydf,pick_values))
Thanks a lot for your help!
Okay, I don't like your mydf name nor that it has factors, so I renamed it args, because it holds function arguments, and set stringsAsFactors = FALSE:
args <- data.frame(set1=c("var1", "var2"), set2=c("var2", "var3"), stringsAsFactors = FALSE)
We'll also write a wrapper for myindex that accepts a vector of arguments instead of dimX and dimY:
myindex2 = function(x, d) {
myindex(x, d[1], d[2])
}
Then we can nest lapply like this:
lapply(mylist, function(m) lapply(args, myindex2, x = m))
# $df1
# $df1$set1
# [1] 4
#
# $df1$set2
# [1] 3
#
#
# $df2
# $df2$set1
# [1] 4
#
# $df2$set2
# [1] 3
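If a compact table is preferred over a nested list, the same nesting can be sketched with sapply, which simplifies the result to a matrix. The list names year1 and year2 and the name index_args are assumptions added for this illustration; everything else is copied from the question and the answer above.

```r
df1 <- data.frame(var1 = c(1, 1, 1), var2 = c(0, 0, 1), var3 = c(1, 1, 0))
df2 <- data.frame(var1 = c(1, 0, 1), var2 = c(1, 0, 1), var3 = c(0, 1, 0))
mylist <- list(year1 = df1, year2 = df2)  # names are an assumption

myindex <- function(x, dimX, dimY) {
  econ_i <- x[dimX] + x[dimY]
  (1 / length(econ_i)) * sum(econ_i)
}
myindex2 <- function(x, d) myindex(x, d[1], d[2])

# Same content as `args` in the answer; renamed to avoid confusion with base::args()
index_args <- data.frame(set1 = c("var1", "var2"), set2 = c("var2", "var3"),
                         stringsAsFactors = FALSE)

# Inner sapply: one value per set of variables; outer sapply: one column per year
res <- sapply(mylist, function(m) sapply(index_args, myindex2, x = m))
res
```

Rows of res correspond to the sets of variables and columns to the years.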

Passing an expression into `MoreArgs` of `mapply`

I'm doing some programming using dplyr, and am curious about how to pass an expression as an argument (specifically within MoreArgs) to mapply.
Consider a simple function F that subsets a data.frame based on some ids and a time_range, then outputs a summary statistic based on some other column x.
require(dplyr)
F <- function(ids, time_range, df, date_column, x) {
date_column <- enquo(date_column)
x <- enquo(x)
df %>%
filter(person_id %chin% ids) %>%
filter(time_range[1] <= (!!date_column) & (!!date_column) <= time_range[2]) %>%
summarise(newvar = sum(!!x))
}
We can make up some example data to which we can apply our function F.
person_ids <- lapply(1:2, function(i) sample(letters, size = 10))
time_ranges <- lapply(list(c("2014-01-01", "2014-12-31"),
c("2015-01-01", "2015-12-31")), as.Date)
require(data.table)
dt <- CJ(person_id = letters,
date_col = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2015-12-31'), by = '1 day'))
dt[, z := rnorm(nrow(dt))] # The variable we will later sum over, i.e. apply F to.
We can successfully apply our function to each of our inputs.
F(person_ids[[1]], time_ranges[[1]], dt, date_col, z)
F(person_ids[[2]], time_ranges[[2]], dt, date_col, z)
And so if I wanted, I could write a simple for-loop to solve my problem. But if we try to apply syntactic sugar and wrap everything within mapply, we get an error.
mapply(F, ids = person_ids, time_range = time_ranges, MoreArgs = list(df = dt, date_column = date_col, x = z))
# Error in mapply... object 'date_col' not found
In mapply, MoreArgs is provided as a list, and the call to list() evaluates its elements before mapply ever sees them, which causes the error. As suggested by @Gregor, you can quote the MoreArgs elements that should not be evaluated immediately, which prevents the error and allows the function to proceed. This can be done with base quote or dplyr quo:
mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quote(date_col), quote(z)))
mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quo(date_col), quo(z)))
Another option is to use map2 from the purrr package, which is the tidyverse equivalent of mapply with two input vectors. tidyverse functions are set up to work with non-standard evaluation, which avoids the error you're getting with mapply without the need for quoting the arguments:
library(purrr)
map2(person_ids, time_ranges, F, dt, date_col, z)
[[1]]
newvar
1 40.23419
[[2]]
newvar
1 71.42327
More generally, you could use pmap, which iterates in parallel over any number of input vectors:
pmap(list(person_ids, time_ranges), F, dt, date_col, z)
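One more way around the MoreArgs evaluation, sketched here in base R only: wrap the call in an anonymous function, so the unquoted column name is written literally inside the wrapper and never passes through list(). The function col_sum and the data frame scores are invented for this illustration and merely stand in for F and dt.

```r
# Toy stand-in for F: takes a bare column name via non-standard evaluation
col_sum <- function(ids, df, column) {
  col_name <- deparse(substitute(column))  # capture the bare column name
  sum(df[[col_name]][df$id %in% ids])
}

scores <- data.frame(id = c("a", "b", "c"), z = c(1, 2, 3))
id_sets <- list(c("a", "b"), "c")

# The wrapper mentions z literally, so nothing tries to evaluate it early
res <- mapply(function(ids) col_sum(ids, scores, z), id_sets)
res   # 3 3
```

The same wrapper pattern should carry over to the original call, e.g. iterating over ids and time_range and writing dt, date_col and z inside the anonymous function.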

Apply family of functions for functions with multiple arguments

I would like to use a function from the apply family (in R) to apply a function of two arguments to two matrices. I assume this is possible. Am I correct? Otherwise, it would seem that I have to put the two matrices into one, and redefine my function in terms of the new matrix.
Here's an example of what I'd like to do:
a <- matrix(1:6,nrow = 3,ncol = 2)
b <- matrix(7:12,nrow = 3,ncol = 2)
foo <- function(vec1,vec2){
d <- sample(vec1,1)
f <- sample(vec2,1)
result <- c(d,f)
return(result)
}
I would like to apply foo to a and b.
(Strictly answering the question, not pointing you to a better approach for your particular use here....)
mapply is the function from the *apply family of functions for applying a function while looping through multiple arguments.
So what you want to do here is turn each of your matrices into a list of vectors that hold its rows or columns (you did not specify which). There are many ways to do that; I like to use the following function:
split.array.along <- function(X, MARGIN) {
require(abind)
lapply(seq_len(dim(X)[MARGIN]), asub, x = X, dims = MARGIN)
}
Then all you have to do is run:
mapply(foo, split.array.along(a, 1),
split.array.along(b, 1))
Like sapply, mapply tries to put your output into an array if possible. If instead you prefer the output to be a list, add SIMPLIFY = FALSE to the mapply call, or equivalently, use the Map function:
Map(foo, split.array.along(a, 1),
split.array.along(b, 1))
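If installing abind is not an option, splitting by rows can also be sketched in base R with split() and row(). This is a swapped-in alternative to split.array.along, not the answerer's method, and it assumes the rows are what should be paired up.

```r
a <- matrix(1:6, nrow = 3, ncol = 2)
b <- matrix(7:12, nrow = 3, ncol = 2)
foo <- function(vec1, vec2) c(sample(vec1, 1), sample(vec2, 1))

set.seed(1)  # only to make the random draws repeatable
# split(a, row(a)) gives one vector per row of a
res <- mapply(foo, split(a, row(a)), split(b, row(b)))
dim(res)     # 2 3: one column per row of a and b
```

Using col(a) and col(b) instead of row() would pair up the columns.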
You could adjust foo to take one argument (a single matrix), and use apply in the function body.
Then you can use lapply on foo to sample from each column of each matrix.
> a <- matrix(1:6,nrow = 3,ncol = 2)
> b <- matrix(7:12,nrow = 3,ncol = 2)
> foo <- function(x){
apply(x, 2, function(z) sample(z, 1))
}
> lapply(list(a, b), foo)
## [[1]]
## [1] 1 6
## [[2]]
## [1] 8 12

Object not found error with ddply inside a function

This has really challenged my ability to debug R code.
I want to use ddply() to apply the same functions to different columns that are sequentially named; eg. a, b, c. To do this I intend to repeatedly pass the column name as a string and use the eval(parse(text=ColName)) to allow the function to reference it. I grabbed this technique from another answer.
And this works well, until I put ddply() inside another function. Here is the sample code:
# Required packages:
library(plyr)
myFunction <- function(x, y){
NewColName = "a"
z = ddply(x, y, summarize,
Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
)
return(z)
}
a = c(1,2,3,4)
b = c(0,0,1,1)
c = c(5,6,7,8)
df = data.frame(a,b,c)
sv = c("b")
#This works.
ColName = "a"
ddply(df, sv, summarize,
Ave = mean(eval(parse(text=ColName)), na.rm=TRUE)
)
#This doesn't work
#Produces error: "Error in parse(text = NewColName) : object 'NewColName' not found"
myFunction(df,sv)
#Output in both cases should be
# b Ave
#1 0 1.5
#2 1 3.5
Any ideas? NewColName is even defined inside the function!
I thought the answer to this question, loops-to-create-new-variables-in-ddply, might help me but I've done enough head banging for today and it's time to raise my hand and ask for help.
Today's solution to this question is to wrap summarize in here(), i.e. here(summarize). For example:
myFunction <- function(x, y){
NewColName = "a"
z = ddply(x, y, here(summarize),
Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
)
return(z)
}
here(f), added to plyr in Dec 2012, captures the current context.
You can do this with a combination of do.call and call to construct the call in an environment where NewColName is still visible:
myFunction <- function(x,y){
NewColName <- "a"
z <- do.call("ddply",list(x, y, summarize, Ave = call("mean",as.symbol(NewColName),na.rm=TRUE)))
return(z)
}
myFunction(d.f,sv)
b Ave
1 0 1.5
2 1 3.5
I occasionally run into problems like this when combining ddply with summarize, transform or similar. Not being smart enough to divine the ins and outs of navigating the various environments, I tend to side-step the issue by simply not using summarize and using my own anonymous function instead:
myFunction <- function(x, y){
NewColName <- "a"
z <- ddply(x, y, .fun = function(xx,col){
c(Ave = mean(xx[,col],na.rm=TRUE))},
NewColName)
return(z)
}
myFunction(df,sv)
Obviously, there is a cost to doing this stuff 'manually', but it often avoids the headache of dealing with the evaluation issues that come from combining ddply and summarize. That's not to say, of course, that Hadley won't show up with a solution...
The problem lies in the code of the plyr package itself. In the summarize function, there is a line eval(substitute(...),.data,parent.frame()). It is well known that parent.frame() can do pretty funky and unexpected stuff. The solution of @James is a very nice workaround, but if I remember right @Hadley himself said before that the plyr package was not intended to be used within functions.
Sorry, I was wrong here. It is known though that for the moment, the plyr package gives problems in these situations.
Hence, I give you a base solution for the problem :
myFunction <- function(x, y){
NewColName = "a"
z = aggregate(x[NewColName],x[y],mean,na.rm=TRUE)
return(z)
}
> myFunction(df,sv)
b a
1 0 1.5
2 1 3.5
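To make the parent.frame() issue concrete without plyr, here is a toy reconstruction. summarize_like and apply_like are simplified stand-ins written for this illustration (they are not plyr's actual code), and shift_by is an arbitrary local variable. The broken wrapper fails because the expression is evaluated with the frame of apply_like as its fallback scope, where the wrapper's local variable is not visible; the fixed wrapper inlines the value with bquote() before the call is built, in the spirit of the do.call answer above.

```r
# Stand-in for summarize: evaluates expr in .data, then in the caller's frame
summarize_like <- function(.data, expr) {
  eval(substitute(expr), .data, parent.frame())
}
# Stand-in for ddply: calls .fun from inside its own frame
apply_like <- function(.data, .fun, ...) .fun(.data, ...)

wrapper_broken <- function(data) {
  shift_by <- 10  # local variable, invisible from apply_like's frame
  apply_like(data, summarize_like, mean(a) + shift_by)
}

wrapper_fixed <- function(data) {
  shift_by <- 10
  # bquote() substitutes the value of shift_by into the expression up front
  do.call(apply_like, list(data, summarize_like, bquote(mean(a) + .(shift_by))))
}

d <- data.frame(a = c(1, 2, 3, 4))
failed <- inherits(try(wrapper_broken(d), silent = TRUE), "try-error")
failed             # TRUE, as long as no global shift_by exists
wrapper_fixed(d)   # 12.5
```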
Looks like you have an environment problem. Global assignment fixes the problem, but at the cost of one's soul:
library(plyr)
a = c(1,2,3,4)
b = c(0,0,1,1)
c = c(5,6,7,8)
d.f = data.frame(a,b,c)
sv = c("b")
ColName = "a"
ddply(d.f, sv, summarize,
Ave = mean(eval(parse(text=ColName)), na.rm=TRUE)
)
myFunction <- function(x, y){
NewColName <<- "a"
z = ddply(x, y, summarize,
Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
)
return(z)
}
myFunction(x=d.f,y=sv)
eval is looking in parent.frame(1). So if you instead define NewColName outside myFunction, it should work:
rm(NewColName)
NewColName <- "a"
myFunction <- function(x, y){
z = ddply(x, y, summarize,
Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
)
return(z)
}
myFunction(x=d.f,y=sv)
By using get to pull out my.parse from the earlier environment, we can come much closer, but still have to pass curenv as a global:
myFunction <- function(x, y){
NewColName <- "a"
my.parse <- parse(text=NewColName)
print(my.parse)
curenv <<- environment()
print(curenv)
z = ddply(x, y, summarize,
Ave = mean( eval( get("my.parse" , envir=curenv ) ), na.rm=TRUE)
)
return(z)
}
> myFunction(x=d.f,y=sv)
expression(a)
<environment: 0x0275a9b4>
b Ave
1 0 1.5
2 1 3.5
I suspect that ddply is evaluating in the .GlobalEnv already, which is why all of the parent.frame() and sys.frame() strategies I tried failed.

Resources