I am trying to come up with a variant of mapply (call it xapply for now) that combines the functionality (sort of) of expand.grid and mapply. That is, for a function FUN and a list of arguments L1, L2, L3, ... of unknown length, it should produce a list of length n1*n2*n3 (where ni is the length of list i) which is the result of applying FUN to all combinations of the elements of the list.
If expand.grid worked to generate lists of lists rather than data frames, one might be able to use it, but I have in mind that the lists may be lists of things that won't necessarily fit into a data frame nicely.
This function works OK if there are exactly three lists to expand, but I am curious about a more generic solution. (FLATTEN is unused, but I can imagine that FLATTEN=FALSE would generate nested lists rather than a single list ...)
xapply3 <- function(FUN,L1,L2,L3,FLATTEN=TRUE,MoreArgs=NULL) {
retlist <- list()
count <- 1
for (i in seq_along(L1)) {
for (j in seq_along(L2)) {
for (k in seq_along(L3)) {
retlist[[count]] <- do.call(FUN,c(list(L1[[i]],L2[[j]],L3[[k]]),MoreArgs))
count <- count+1
}
}
}
retlist
}
edit: forgot to return the result. One might be able to solve this by making a list of the indices with combn and going from there ...
I think I have a solution to my own question, but perhaps someone can do better (and I haven't implemented FLATTEN=FALSE ...)
xapply <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
inds <- do.call(expand.grid,lapply(L,seq_along)) ## Marek's suggestion
retlist <- list()
for (i in 1:nrow(inds)) {
arglist <- mapply(function(x,j) x[[j]],L,as.list(inds[i,]),SIMPLIFY=FALSE)
if (FLATTEN) {
retlist[[i]] <- do.call(FUN,c(arglist,MoreArgs))
}
}
retlist
}
edit: I tried #baptiste's suggestion, but it's not easy (or wasn't for me). The closest I got was
xapply2 <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
xx <- do.call(expand.grid,L)
f <- function(...) {
do.call(FUN,lapply(list(...),"[[",1))
}
mlply(xx,f)
}
which still doesn't work. expand.grid is indeed more flexible than I thought (although it creates a weird data frame that can't be printed), but enough magic is happening inside mlply that I can't quite make it work.
Here is a test case:
L1 <- list(data.frame(x=1:10,y=1:10),
data.frame(x=runif(10),y=runif(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
L2 <- list(y~1,y~x,y~poly(x,2))
z <- xapply(lm,L2,L1)
xapply(lm,L2,L1)
#ben-bolker, I had a similar desire and think I have a preliminary solution worked out, that I've also tested to work in parallel. The function, which I somewhat confusingly called gmcmapply (g for grid) takes an arbitrarily large named list mvars (that gets expand.grid-ed within the function) and a FUN that utilizes the list names as if they were arguments to the function itself (gmcmapply will update the formals of FUN so that by the time FUN is passed to mcmapply it's arguments reflect the variables that the user would like to iterate over (which would be layers in a nested for loop)). mcmapply then dynamically updates the values of these formals as it cycles over the expanded set of variables in mvars.
I've posted the preliminary code as a gist (reprinted with an example below) and would be curious to get your feedback on it. I'm a grad student, that is self-described as an intermediately-skilled R enthusiast, so this is pushing my R skills for sure. You or other folks in the community may have suggestions that would improve on what I have. I do think even as it stands, I'll be coming to this function quite a bit in the future.
gmcmapply <- function(mvars, FUN, SIMPLIFY = FALSE, mc.cores = 1, ...){
require(parallel)
FUN <- match.fun(FUN)
funArgs <- formals(FUN)[which(names(formals(FUN)) != "...")] # allow for default args to carry over from FUN.
expand.dots <- list(...) # allows for expanded dot args to be passed as formal args to the user specified function
# Implement non-default arg substitutions passed through dots.
if(any(names(funArgs) %in% names(expand.dots))){
dot_overwrite <- names(funArgs[which(names(funArgs) %in% names(expand.dots))])
funArgs[dot_overwrite] <- expand.dots[dot_overwrite]
#for arg naming and matching below.
expand.dots[dot_overwrite] <- NULL
}
## build grid of mvars to loop over, this ensures that each combination of various inputs is evaluated (equivalent to creating a structure of nested for loops)
grid <- expand.grid(mvars,KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
# specify formals of the function to be evaluated by merging the grid to mapply over with expanded dot args
argdefs <- rep(list(bquote()), ncol(grid) + length(expand.dots) + length(funArgs) + 1)
names(argdefs) <- c(colnames(grid), names(funArgs), names(expand.dots), "...")
argdefs[which(names(argdefs) %in% names(funArgs))] <- funArgs # replace with proper dot arg inputs.
argdefs[which(names(argdefs) %in% names(expand.dots))] <- expand.dots # replace with proper dot arg inputs.
formals(FUN) <- argdefs
if(SIMPLIFY) {
#standard mapply
do.call(mcmapply, c(FUN, c(unname(grid), mc.cores = mc.cores))) # mc.cores = 1 == mapply
} else{
#standard Map
do.call(mcmapply, c(FUN, c(unname(grid), SIMPLIFY = FALSE, mc.cores = mc.cores)))
}
}
example code below:
# Example 1:
# just make sure variables used in your function appear as the names of mvars
myfunc <- function(...){
return_me <- paste(l3, l1^2 + l2, sep = "_")
return(return_me)
}
mvars <- list(l1 = 1:10,
l2 = 1:5,
l3 = letters[1:3])
### list output (mapply)
lreturns <- gmcmapply(mvars, myfunc)
### concatenated output (Map)
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE)
## N.B. This is equivalent to running:
lreturns <- c()
for(l1 in 1:10){
for(l2 in 1:5){
for(l3 in letters[1:3]){
lreturns <- c(lreturns,myfunc(l1,l2,l3))
}
}
}
### concatenated outout run on 2 cores.
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE, mc.cores = 2)
Example 2. Pass non-default args to FUN.
## Since the apply functions dont accept full calls as inputs (calls are internal), user can pass arguments to FUN through dots, which can overwrite a default option for FUN.
# e.g. apply(x,1,FUN) works and apply(x,1,FUN(arg_to_change= not_default)) does not, the correct way to specify non-default/additional args to FUN is:
# gmcmapply(mvars, FUN, arg_to_change = not_default)
## update myfunc to have a default argument
myfunc <- function(rep_letters = 3, ...){
return_me <- paste(rep(l3, rep_letters), l1^2 + l2, sep = "_")
return(return_me)
}
lreturns <- gmcmapply(mvars, myfunc, rep_letters = 1)
A bit of additional functionality I would like to add but am still trying to work out is
cleaning up the output to be a pretty nested list with the names of mvars (normally, I'd create multiple lists within a nested for loop and tag lower-level lists onto higher level lists all the way up until all layers of the gigantic nested loop were done). I think using some abstracted variant of the solution provided here will work, but I haven't figured out how to make the solution flexible to the number of columns in the expand.grid-ed data.frame.
I would like an option to log the outputs of the child processesthat get called in mcmapply in a user-specified directory. So you could look at .txt outputs from every combination of variables generated by expand.grid (i.e. if the user prints model summaries or status messages as a part of FUN as I often do). I think a feasible solution is to use the substitute() and body() functions, described here to edit FUN to open a sink() at the beginning of FUN and close it at the end if the user specifies a directory to write to. Right now, I just program it right into FUN itself, but later it would be nice to just pass gmcmapply an argument called something like log_children = "path_to_log_dir. and then editing the body of the function to (pseudocode) sink(file = file.path(log_children, paste0(paste(names(mvars), sep = "_"), ".txt")
Let me know what you think!
-Nate
Related
I am looking for advice on the best way to code passing a list of functions as argument.
What I want to do:
I would like to pass as an argument a list a functions to apply them to a specific input. And give output name based on those.
A "reproducible" example
input = 1:5
Functions to pass are mean, min
Expected call:
foo(input, something_i_ask_help_for)
Expected output:
list(mean = 3, min = 1)
If it's not perfectly clear, please see my two solutions to have an illustration.
Solution 1: Passing functions as arguments
foo <- function(input, funs){
# Initialize output
output = list()
# Compute output
for (fun_name in names(funs)){
# For each function I calculate it and store it in output
output[fun_name] = funs[[fun_name]](input)
}
return(output)
}
foo(1:5, list(mean=mean, min=min))
What I don't like with this method is that we can't call it by doing: foo(1:5, list(mean, min)).
Solution 2: passing functions names as argument and using get
foo2 <- function(input, funs){
# Initialize output
output = list()
# Compute output
for (fun in funs){
# For each function I calculate it and store it in output
output[fun] = get(fun)(input)
}
return(output)
}
foo2(1:5, c("mean", "min"))
What i don't like with this method is that we are not really passing the function-object as argument.
My question:
Both ways works, but I not quite sure which one to choose.
Could you help me by:
Telling me which one is the best?
What are the advantages and defaults of each methods?
Is there a another (better) method
If you need any more information, don't hesitate to ask.
Thanks!
Simplifying solutions in question
The first of the solutions in the question requires that the list be named and the second requires that the functions have names which are passed as character strings. Those two user interfaces could be implemented using the following simplifications. Note that we add an envir argument to foo2 to ensure function name lookup occurs as expected. Of those the first seems cleaner but if the functions were to be used interactively and less typing were desired then the second does do away with having to specify the names.
foo1 <- function(input, funs) Map(function(f) f(input), funs)
foo1(1:5, list(min = min, max = max)) # test
foo2 <- function(input, nms, envir = parent.frame()) {
Map(function(nm) do.call(nm, list(input), envir = envir), setNames(nms, nms))
}
foo2(1:5, list("min", "max")) # test
Alternately we could build foo2 on foo1:
foo2a <- function(input, funs, envir = parent.frame()) {
foo1(input, mget(unlist(funs), envir = envir, inherit = TRUE))
}
foo2a(1:5, list("min", "max")) # test
or base the user interface on passing a formula containing the function names since formulas already incorporate the notion of environment:
foo2b <- function(input, fo) foo2(input, all.vars(fo), envir = environment(fo))
foo2b(1:5, ~ min + max) # test
Optional names while passing function itself
However, the question indicates that it is preferred that
the functions themselves be passed
names are optional
To incorporate those features the following allows the list to have names or not or a mixture. If a list element does not have a name then the expression defining the function (usually its name) is used.
We can derive the names from the list's names or when a name is missing we can use the function name itself or if the function is anonymous and so given as its definition then the name can be the expression defining the function.
The key is to use match.call and pick it apart. We ensure that funs is a list in case it is specified as a character vector. match.fun will interpret functions and character strings naming functions and look them up in the parent frame so we use a for loop instead of Map or lapply in order that we not generate a new function scope.
foo3 <- function(input, funs) {
cl <- match.call()[[3]][-1]
nms <- names(cl)
if (is.null(nms)) nms <- as.character(cl)
else nms[nms == ""] <- as.character(cl)[nms == ""]
funs <- as.list(funs)
for(i in seq_along(funs)) funs[[i]] <- match.fun(funs[[i]])(input)
setNames(funs, nms)
}
foo3(1:5, list(mean = mean, min = "min", sd, function(x) x^2))
giving:
$mean
[1] 3
$min
[1] 1
$sd
[1] 1.581139
$`function(x) x^2`
[1] 1 4 9 16 25
One thing that you are missing is replacing the for loops with lapply. Also for functional programming it is often good practice to separate functions to do one thing. I personally like the version from solution 1 where you pass the functions in directly because it avoids another call in R and therefore is more efficient. In solution 2, it is best to use match.fun instead of get. match.fun is stricter than get in searching for functions.
x <- 1:5
foo <- function(input, funs) {
lapply(funs, function(fun) fun(input))
}
foo(x, c(mean=mean, min=min))
The above code simplifies your solution 1. To add to this function, you could add some error handling such as is.numeric for x and is.function for funs.
My dataset looks like this, and I have a list of data.
Plot_ID Canopy_infection_rate DAI
1 YO01 5 7
2 YO01 8 14
3 YO01 10 21
What I want to do is to apply a function called "audpc_Canopyinfactionrate" to a list of dataframes.
However, when I run lapply, I get an error as below:
Error in FUN(X[[i]], ...) : argument "DAI" is missing, with no default
I've checked my list that my data does not shift a column.
Does anyone know what's wrong with it? Thanks
Here is part of my code:
#Read files in to list
for(i in 1:length(files)) {
lst[[i]] <- read.delim(files[i], header = TRUE, sep=" ")
}
#Apply a function to the list
densities <- list()
densities<- lapply(lst, audpc_Canopyinfactionrate)
#canopy infection rate
audpc_Canopyinfactionrate <- function(Canopy_infection_rate,DAI){
n <- length(DAI)
meanvec <- matrix(-1,(n-1))
intvec <- matrix(-1,(n-1))
for(i in 1:(n-1)){
meanvec[i] <- mean(c(Canopy_infection_rate[i],
Canopy_infection_rate[i+1]))
intvec[i] <- DAI[i+1] - DAI[i]
}
infprod <- meanvec * intvec
sum(infprod)
}
As pointed out in the comments, the problem lies in the way you are using lapply.
This function is built up like this: lapply(X, FUN, ...). FUN is the name of a function used to apply to the elements in a data.frame/list called X. So far so good.
Back to your case: You want to apply a function audpc_Canopyinfactionrate() to all data frames in lst. This function takes two arguments. And I think this is where things got mixed up in your code. Make sure you understand that in the way you are using lapply, you use lst[[1]], lst[[2]], etc. as the only argument in audpc_Canopyinfactionrate(), whereas it actually requires two arguments!
If you reformulate your function a bit, you can use lst[[1]], lst[[2]] as the only argument to your function, because you know that argument contains the columns you need - Canopy_infection_rate and DAI:
audpc_Canopyinfactionrate <- function(df){
n <- nrow(df)
meanvec <- matrix(-1, (n-1))
intvec <- matrix(-1, (n-1))
for(i in 1:(n-1)){
meanvec[i] <- mean(c(df$Canopy_infection_rate[i],
df$Canopy_infection_rate[i+1]))
intvec[i] <- df$DAI[i+1] - df$DAI[i]
}
infprod <- meanvec * intvec
return(sum(infprod))
}
Call lapply in the following way:
lapply(lst, audpc_Canopyinfactionrate)
Note: lapply can also be used with more than 1 argument, by using the ... in lapply(X, FUN, ...). In your case, however, I think this is not the best option.
I have a list which contains more lists of lists:
results <- sapply(c(paste0("cv_", seq(1:50)), "errors"), function(x) NULL)
## Locations for results to be stored
step_results <- sapply(c("myFit", "forecast", "errors"), function(x) NULL)
step_errors <- sapply(c("MAE", "MSE", "sign_accuracy"), function(x) NULL)
final_error <- sapply(c("MAE", "MSE", "sign_accuracy"), function(x) NULL)
for(i in 1:50){results[[i]] <- step_results}
for(i in 1:50){results[[i]][[3]] <- step_errors}
results$errors <- final_error
Now in this whole structure, I would like to sum up all the values in sign_accuracy and save them in results$errors$sign_accuracy
I could maybe do this with a for-loop, indexing with i:
## This is just an example - it won't actually work!
sign_acc <- matrix(nrow = 50, ncol = 2)
for (i in 1:50){
sign_acc[i, ] <- `results[[i]][[3]][[3]]`
results$errors$sign_accuracy <- sign_acc
}
If I remember correctly, in Matlab there is something like list(:), which means all elements. In Python I have seen something like list(0:-1), which also means all elements.
What is the elegent R equivalent? I don't really like loops.
I have seen methods using the apply family of functions. With something like apply(data, "[[", 2), but can't get it to work for deeper lists.
Did you try with c(..., recursive)?
Here is an option with a short example at the end:
sumList <- function(l, label) {
lc <- c(l, recursive=T)
filter <- grepl(paste0("\\.",label, "$"), names(lc)) | (names(lc) == label)
nums <- lc[filter]
return(sum(as.numeric(nums)))
}
ex <- list(a=56,b=list("5",a=34,list(c="3",a="5")))
sumList(ex,"a")
In this case, you can do what you want with
results$errors$sign_accuracy <- do.call(sum, lapply(results, function(x){x[[3]][[3]]}))
lapply loops through the first layer of results, and pulls out the third element of the third element for each. do.call(sum catches all the results and sums them.
The real problems with lists arise when the nesting is more irregular, or when you need to loop through more than one index. It can always be done in the same way, but it gets extraordinarily ugly very quickly.
When I have data.frame objects, I can simply do View(df), and then I get to see the data.frame in a nice table (even if I can't see all of the rows, I still have an idea of what variables my data contains).
But when I have a list object, the same command does not work. And when the list is large, I have no idea what the list looks like.
I've tried head(mylist) but my console simply cannot display all of the information at once. What's an efficient way to look at a large list in R?
Here's a few ways to look at a list:
Look at one element of a list:
myList[[1]]
Look at the head of one element of a list:
head(myList[[1]])
See the elements that are in a list neatly:
summary(myList)
See the structure of a list (more in depth):
str(myList)
Alternatively, as suggested above you could make a custom print method as such:
printList <- function(list) {
for (item in 1:length(list)) {
print(head(list[[item]]))
}
}
The above will print out the head of each item in the list.
I use str to see the structure of any object, especially complex list's
Rstudio shows you the structure by clicking at the blue arrow in the data-window:
You can also use a package called listviewer
library(listviewer)
jsonedit( myList )
If you have a really large list, you can look at part of it using
str(myList, max.level=1)
(If you don't feel like typing out the second argument, it can be written as max=1 since there are no other arguments that start with max.)
I do this often enough that I have an alias in my .Rprofile for it:
str1 <- function(x, ...) str(x, max.level=1, ...)
And a couple others that limit the printed output (see example(str) for an example of using list.len):
strl <- function(x, len=10L, ...) str(x, list.len=len, ...) # lowercase L in the func name
str1l <- function(x, len=10L, ...) str(x, max.level=1, list.len=len, ...)
you can check the "head" of your dataframes using lapply family:
lapply(yourList, head)
which will return the "heads" of you list.
For example:
df1 <- data.frame(x = runif(3), y = runif(3))
df2 <- data.frame(x = runif(3), y = runif(3))
dfs <- list(df1, df2)
lapply(dfs, head)
Returns:
> lapply(dfs, head)
[[1]]
x y
1 0.3149013 0.8418625
2 0.8807581 0.5048528
3 0.2490966 0.2373453
[[2]]
x y
1 0.4132597 0.5762428
2 0.0303704 0.3399696
3 0.9425158 0.5465939
Instead of "head" you can use any function related to the data.frames, i.e. names, nrow...
Seeing as you explicitly specify that you want to use View() with a list, this is probably what you are looking for:
View(myList[[x]])
Where x is the number of the list element that you wish to view.
For example:
View(myList[[1]])
will show you the first element of the list in the standard View() format that you will be used to in RStudio.
If you know the name of the list item you wish to view, you can do this:
View(myList[["itemOne"]])
There are several other ways, but these will probably serve you best.
This is a simple edit of giraffehere's excellent answer.
For some lists it is convenient to only print the head of a subset of the nested objects, to print the name of the given slot above the output of head().
Arguments:
#'#param list a list object name
#'#param n an integer - the the objects within the list that you wish to print
#'#param hn an integer - the number of rows you wish head to print
USAGE: printList(mylist, n = 5, hn = 3)
printList <- function(list, n = length(list), hn = 6) {
for (item in 1:n) {
cat("\n", names(list[item]), ":\n")
print(head(list[[item]], hn))
}
}
For numeric lists, output may be more readable if the number of digits is limited to 3, eg:
printList <- function(list, n = length(list), hn = 6) {
for (item in 1:n) {
cat("\n", names(list[item]), ":\n")
print(head(list[[item]], hn), digits = 3)
}
}
I had a similar problem and managed to solve it using as_tibble() on my list (dplyr or tibble packages), then just use View() as usual.
In recent versions of RStudio, you can just use View() (or alternatively click on the little blue arrow beside the object in the Global Environment pane).
For example, if we create a list with:
test_list <- list(
iris,
mtcars
)
Then either of the above methods will show you:
I like using as.matrix() on the list and then can use the standard View() command.
I'm doing cross validation. So I wanted to split data into 10 folds. Somebody has post following code.
f_K_fold <- function(Nobs,K=10){
rs <- runif(Nobs)
id <- seq(Nobs)[order(rs)]
k <- as.integer(Nobs * seq(1, K-1) / K)
k <- matrix(c(0, rep(k, each=2), Nobs), ncol = 2, byrow = TRUE)
k[,1] <- k[,1]+1
l <- lapply(seq.int(K), function(x, k, d)
list(train=d[!(seq(d) %in% seq(k[x, 1],k[x, 2]))],
test=d[seq(k[x,1],k[x,2])]),
k=k,d=id)
return(l)
}
however I don't really understand what the lapply doing. Could someone explain to a newbie? Appreciate it.
It's really unfortunate that the code folding in this example is horrible, since aving properly formatted code can aid in understanding the code and catching mistakes.
The last three lines can be viewed as an anonymous function passed to lapply. lapply in essence "climbs" a list and for each list element, applies that (anonymous) function. In the example below, I've disambiguated the lines into a not so anonymous function and a call to lapply.
notSoanonymousFunction <- function(x, k, d) {
list(train = d[!(seq(d) %in% seq(k[x,1],k[x,2]))],
test = d[seq(k[x,1],k[x,2])])
}
l <- lapply(seq.int(K), FUN = notSoanonymousFunction, k = k, d = id)
If you look at ?lapply, you'll notice that there are no k or d arguments. However, these arguments do belong to our notSoanonymousFunction, and lapply takes it in via the ... argument.
As a mental exercise for you, I will show you one more trick how to learn what the function is doing. If you need to see what is happening inside the function, place a browser() call inside and run it. In your case, this would look like this:
notSoanonymousFunction <- function(x, k, d) {
browser()
list(train = d[!(seq(d) %in% seq(k[x,1],k[x,2]))],
test = d[seq(k[x,1],k[x,2])])
}
Once you run this, your console should say something along the lines of
Browser[1] >
You are now effectively inside the function. You can navigate to next line by typing n, running the whole chunk by c and quitting the browser all together, by pressing Q (see ?browser()). You can view and manipulate objects ad libidum. You can try by checking your workspace with ls() to see which objects are inside the function. You can bet your family farm that there will be objects x, k and d.