Progress bar and mapply (input as list) - r

I would like to monitor the progress of an mapply call. The data consists of two lists, and the function takes two arguments.
When I do something similar with a function that takes one argument, I can use ldply instead of lapply. (I'd like to rbind.fill the output into a data.frame.)
If I try the same with mdply, it doesn't work, because mdply expects its arguments to come from the columns of a data frame or array, whereas mapply takes lists as input.
These plyr apply functions are handy, not just because I can get the output as a data.frame but also because I can use the progress bar.
I know there is the pbapply package, but it has no mapply version. There is also the txtProgressBar function, but I could not figure out how to use it with mapply.
I tried to create a reproducible example (takes around 30 s to run).
I guess it is a bad example, though: my real l1 is a list of scraped websites (rvest::read_html), which I cannot send to mdply as a data frame. The lists really need to be lists.
mdply <- plyr::mdply
l1 <- as.list(rep("a", 2*10^6+1))
l2 <- as.list(rnorm(-10^6:10^6))
my_func <- function(x, y) {
ab <- paste(x, "b", sep = "_")
ab2 <- paste0(ab, exp(y), sep = "__")
return(ab2)
}
mapply(my_func, x = l1, y = l2)
mdply doesn't work:
mdply(l1, l2, my_func, .progress='text')
Error in do.call(flat, c(args, list(...))) : 'what' must be a function or character string

Judging from ?mdply, you can't pass two data inputs. The error message means that mdply is trying to use l2 as the function, and a list cannot be coerced into a function.
The following works fine:
mdply(
data.frame(x=unlist(l1), y=unlist(l2)), # create a data.frame from l1 and l2
my_func, # your function
.progress=plyr::progress_text(style = 3) # create a textual progress bar
)[, 3] # keep the output only
I think I've understood your purpose now:
mdply(
.data=data.frame(r=1:length(l1)), # "fake data" (I will use them as item index)
.fun=function(r) return(my_func(l1[[r]], l2[[r]])), # a wrapper function of your function
.progress=plyr::progress_text(style = 3) # create a textual progress bar
)[, 2] # keep the output only
Please note that I had to wrap your function in a new one that takes a single argument (the index r) and uses it to access the corresponding elements of l1 and l2.

Answering my own question. There is now a function called pbmapply in pbapply that adds progress bars to mapply.
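If adding a package is not an option, the txtProgressBar route mentioned in the question can be sketched in base R. This is only a minimal sketch: mapply_pb is a hypothetical helper name (not part of plyr, pbapply, or base R), and the small lists stand in for the real data.

```r
# Small stand-ins for the real lists; they stay lists throughout.
l1 <- as.list(rep("a", 5))
l2 <- as.list(1:5)

my_func <- function(x, y) paste(x, y, sep = "_")

# mapply_pb is a hypothetical helper, not an existing function.
# It wraps FUN so that every call also advances a text progress bar.
mapply_pb <- function(FUN, ..., MoreArgs = NULL) {
  n <- max(lengths(list(...)))            # number of calls mapply will make
  pb <- txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))                      # always close the bar, even on error
  i <- 0
  wrapper <- function(...) {
    i <<- i + 1                           # update the counter in the enclosing call
    setTxtProgressBar(pb, i)
    FUN(...)
  }
  # SIMPLIFY = FALSE keeps the result as a list, one element per call
  mapply(wrapper, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE)
}

res <- mapply_pb(my_func, l1, l2)
```

The result is a plain list, so it can still be passed on to rbind.fill or do.call(rbind, ...) afterwards.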

Related

What does declaring function at the start of a line do?

I encountered this code:
res <- lapply(strsplit(s, "\n")[[1]],
(function (str) paste(rev(strsplit(str, "")[[1]]), collapse = "")))
The second line reverses each of the strings produced by the split on the first line.
How does it do that? Namely, what does calling 'function' at the start do?
lapply takes a list and applies some function to each of its elements. It has the form lapply(list_data, some_function). So, for instance, if I have a list of integer vectors and want to find out how many integers are in each list element, I would run:
list_data <- list(list1 = 1:5,
list2 = 6:10,
list3 = 11:30)
lapply(list_data, length)
The function here is length, which is built into R. Not every function you need is predefined, though: if I want to apply my own formula to each value in the list, I can define my own function. The function keyword lets users define a function that is not already in R or an R library. Like so:
lapply(list_data, function(x) x^2+4-x^3)
The function here computes x^2+4-x^3, which is not predefined in R.
So in your example, your data is strsplit(s, "\n")[[1]], and lapply takes that data and applies the function paste(rev(strsplit(str, "")[[1]]), collapse = "") to each element in it.
Note that in my example I wrote function(x), while your example uses function(str). The name in the parentheses doesn't matter and is chosen by the user. For example, lapply(list_data, function(str) str^2+4-str^3) will return the same thing as lapply(list_data, function(x) x^2+4-x^3).
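To make the equivalence concrete, here is a small sketch that runs the code from the question twice, once with a named function and once with the anonymous one. The input string s and the name reverse_string are made up for illustration.

```r
s <- "abc\ndef"                       # made-up input with two lines
lines <- strsplit(s, "\n")[[1]]       # character vector: "abc" "def"

# Named version: define the function first, then pass it to lapply
reverse_string <- function(str) {
  chars <- strsplit(str, "")[[1]]     # split into single characters
  paste(rev(chars), collapse = "")    # reverse and glue back together
}
res_named <- lapply(lines, reverse_string)

# Anonymous version: the same function written inline, as in the question
res_anon <- lapply(lines,
                   function(str) paste(rev(strsplit(str, "")[[1]]), collapse = ""))

identical(res_named, res_anon)
```

Both calls return a list holding "cba" and "fed"; the only difference is whether the function was given a name before being handed to lapply.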
Please note that broad "learning" style questions like this are not exactly what this site is for, and this question will likely get removed and/or receive some negative feedback. Since you are new to this site and to R, I'm providing this answer but I would not be surprised if the question is removed. Just trying to help both you and the SO community!

R: Use (m?)apply with a function that returns a list as the argument to the function to be applied over?

I have a function of two arguments, foo(a,b). As input to this function, I want to use every row of the output of combinations(10,2) from the gtools library. I've tried to get it to work with mapply, and I really had high hopes for apply(combinations(10,2),1,foo), but everything that I've attempted throws the error 'argument "b" is missing, with no default'. How can I correct this without storing combinations(10,2) in memory and dividing it up? I suspect that I'm missing a trick with Vectorize.
For a simple reproducible example, use beta(a,b) in place of foo(a,b).
What I very specifically do not want to do is anything like:
a<-combinations(10,2)
mapply(foo,a[,1],a[,2])
because I do not want to store combinations(10,2) in memory.
Here we can use do.call with mapply or Map
do.call(mapply, c(FUN = foo, asplit(combinations(10, 2), 2)))
Or with Map (returns a list)
do.call(Map, c(f = foo, asplit(combinations(10, 2), 2)))
As a reproducible example, we can use beta:
do.call(Map, c(f = beta, asplit(combinations(10, 2), 2)))
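The apply attempt in the question fails because apply hands each row to the function as a single length-2 vector, so b is never supplied. A small wrapper that unpacks the row avoids the error. The sketch below also shows a different technique from the answer above: base R's combn (from utils, swapped in here for gtools::combinations) accepts a FUN argument and applies it to each combination as it is generated, so the full matrix of pairs does not need to be stored first. It assumes foo is beta, as suggested in the question.

```r
foo <- beta   # stand-in for the real two-argument function

# combn applies FUN to each pair as it is generated; each pair arrives as a
# length-2 vector p, so the wrapper unpacks it into the two arguments.
res_combn <- combn(10, 2, FUN = function(p) foo(p[1], p[2]))

# For comparison only: materialise the pairs (one per row) and unpack each row.
pairs <- t(combn(10, 2))
res_apply <- apply(pairs, 1, function(row) foo(row[1], row[2]))
res_mapply <- mapply(foo, pairs[, 1], pairs[, 2])

# All three approaches give the same 45 values
all.equal(as.vector(res_combn), res_apply)
all.equal(res_apply, res_mapply)
```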

how to get variable names from list

I have a list of functions, which also contains one user-defined function:
> fun <- function(x) {x}
> funs <- c(median, mean, fun)
Is it possible to get the function names as strings from this list? My only workaround so far has been to create a vector that contains the function names as strings:
> fun.names <- c("median", "mean", "fun")
When I want to get a variable name, I usually use this trick (please correct me if it is wrong), but as you can see it only works for a single variable, not for a list:
> as.character(substitute(mean))
[1] "mean"
> as.character(substitute(funs))
[1] "funs"
Is there something that will also work for a list? Does it make any difference whether the list contains functions or data types?
EDIT:
I need to pass this list of functions (plus some other data) to another function. The functions from the list will then be applied to a dataset. The function names are needed because, if several functions are passed in the list, I want to be able to determine which function was applied. So far I've been using this:
window.size <- c(1,2,3)
combinations <- expand.grid(window.size, c(median, mean))
combinations <- cbind(combinations, rep(c("median","mean"), each = length(window.size)))
Generally speaking, this is not possible. Consider this definition of funs:
funs <- c(median,mean,function(x) x);
In this case, there's no name associated with the user-defined function at all. There's no rule in R that says all functions must be bound to a name at any point in time.
If you want to start making some assumptions about whether and where all such lambdas are defined, then possibilities open up.
One idea is to search the closure environment of each function for an entry that matches (identically) to the function itself, and then use that name. This will incur a performance penalty due to the comparison work, but may be tolerable if you don't have to run it repetitively:
getFunNameFromClosure <- function(fun) names(which(do.call(c,eapply(environment(fun),identical,fun)))[1L]);
Demo:
fun <- function(x) x;
funs <- c(median,mean,fun);
sapply(funs,getFunNameFromClosure);
## [1] "median" "mean" "fun"
Caveats:
1: As explained earlier, this will not work on functions that were never bound to a name. Furthermore, it will not work on functions whose closure environment does not contain a binding to the function. This could happen if the function was bound to a name in a different environment than its closure (via a return value, superassignment, or assign() call) or if its closure environment was explicitly changed.
2: It is possible to bind a function to multiple names. Thus, the name you get as a result of the eapply() search may not be the one you expect. Here's a good demonstration of this:
getFunNameFromClosure(ls); ## gets wrong name
## [1] "objects"
identical(ls,objects); ## this is why
## [1] TRUE
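If the names matter downstream, one option that sidesteps the lookup entirely is to attach them when the list is built, so that each function carries its label with it. The sketch below is an alternative under that assumption, not the method from the answer above; rng is a hypothetical user-defined function and x is made-up data.

```r
rng <- function(x) max(x) - min(x)     # hypothetical user-defined function

# Name the elements explicitly when building the list
funs <- list(median = median, mean = mean, rng = rng)
names(funs)

# The names can then be used in the grid of settings...
window.size <- c(1, 2, 3)
combinations <- expand.grid(window.size = window.size,
                            fun.name = names(funs),
                            stringsAsFactors = FALSE)

# ...and each function can be looked up by name when it is applied
x <- c(1, 5, 9, 100)                   # made-up data
results <- vapply(funs, function(f) f(x), numeric(1))
results
```

vapply keeps the list names, so the output is a named numeric vector and it is always clear which function produced which value.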
Here is a hacky approach:
funs <- list(median, mean)
fun_names = sapply(funs, function(x) {
s = as.character(deparse(eval(x)))[[2]]
gsub('UseMethod\\(|[[:punct:]]', '', s)
})
names(funs) <- fun_names
funs
$median
function (x, na.rm = FALSE)
UseMethod("median")
<bytecode: 0x103252878>
<environment: namespace:stats>
$mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x103ea11b8>
<environment: namespace:base>
combinations <- expand.grid(window.size, fun_names, c(median, mean))

Print.myclass function in R

I am trying to develop my first package in R, and I am facing some issues with the "myclass" generic functions, which I will try to describe.
Assume a data.frame X with n <- nrow(X) rows and K <- ncol(X) columns.
My main package function (too big to put in this post), let's say
fun1 <- function(X){
# do stuff...
out <- list(index= character vector, A= A, B= B,... etc)
class(out) <- "myclass" # set the class before returning, otherwise this line is never reached
return(out)
}
returns a list as its output. I then want to use that output in a print.myclass method for the generic print function. However, in my print method I want to use the data frame X that was passed to my main function, without asking the user to supply it as an argument (i.e., print(out, X)) and without storing it in my output list out (at least not visibly to the user). Is there any way to do that? Thanks in advance!
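One possible approach, sketched here under assumptions, is to store X as an attribute of the returned object rather than as a list element. The attribute name "data" and the list fields are made up for illustration, and the body of fun1 is a stand-in for the real function. Note that attributes are still reachable through attributes(out) or str(out), so this hides X from names(out) and out$..., not from a determined user.

```r
fun1 <- function(X) {
  # Stand-in for the real computation; the fields are illustrative only
  out <- list(index = names(X), n = nrow(X), K = ncol(X))
  attr(out, "data") <- X        # keep X alongside the result, outside the list
  class(out) <- "myclass"
  out
}

# The print method retrieves X from the attribute, so the user only
# has to call print(out) (or just type out at the console).
print.myclass <- function(x, ...) {
  X <- attr(x, "data")
  cat("<myclass> built from a data frame with",
      nrow(X), "rows and", ncol(X), "columns\n")
  invisible(x)
}

X <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
out <- fun1(X)
print(out)
names(out)                      # "data" is not among the list elements
```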

zipping lists in R

As a guideline, I prefer to apply functions to the elements of a list using lapply or *ply (from plyr) rather than explicitly iterating through them. However, this only works well when I have to process one list at a time. When the function takes multiple arguments, I usually write a loop.
I was wondering whether there is a cleaner construct that is still functional in nature. One possible approach would be to define a function similar to Python's zip(x, y), which takes the input lists and returns a list whose i-th element is list(x[[i]], y[[i]]), and then apply the function to this list. But my question is whether this is the cleanest approach or not. I am not worried about performance optimization, but rather about clarity and elegance.
Below is the naive example.
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2+y
OUT <- list()
for (n in 1:10) OUT[[n]] <- f(A[[n]], B[[n]])
OUT
[[1]]
[1] 0
[[2]]
[1] 2
...
And here is the zipped example (which could be extended to arbitrary arguments):
zip <- function(x, y){
stopifnot(length(x)==length(y))
z <- list()
for (i in seq_along(x)){
z[[i]] <- list(x[[i]], y[[i]])
}
z
}
E <- zip(A, B)
lapply(E, function(x) f(x[[1]], x[[2]]))
[[1]]
[1] 0
[[2]]
[1] 2
...
I think you're looking for mapply:
‘mapply’ is a multivariate version of ‘sapply’. ‘mapply’ applies
‘FUN’ to the first elements of each ... argument, the second
elements, the third elements, and so on. Arguments are recycled
if necessary.
For your example, use mapply(f, A, B)
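As a rough sketch using the A, B and f from the question: mapply simplifies the result to a vector where it can, while Map (a thin wrapper around mapply with SIMPLIFY = FALSE) always returns a list, like the original loop. Map with list as the function also reproduces the zip helper.

```r
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2 + y

out_vec <- mapply(f, A, B)     # simplified to a numeric vector
out_list <- Map(f, A, B)       # always a list, one element per pair

# The zip(A, B) helper from the question, expressed with Map
zipped <- Map(list, A, B)
out_zip <- lapply(zipped, function(p) f(p[[1]], p[[2]]))

identical(out_list, out_zip)
```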
I came across a similar problem today. After learning how to use mapply, I now know how to solve it.
mapply is so cool!!
Here is an example:
en = c("cattle", "chicken", "pig")
zh = c("牛", "鸡", "猪")
dict <- new.env(hash = TRUE)
Add <- function(key, val) dict[[key]] <- val
mapply(Add, en, zh)
## cattle chicken pig
## "牛" "鸡" "猪"
I think you could do this with what I call an 'implicit loop' (the name does not fit perfectly, but whatever), taking advantage of the fact that you can loop over a vector of indices within *apply:
OUT <- lapply(1:10, function(x) (A[[x]]^2 + B[[x]]))
or
OUT <- lapply(1:10, function(x) f(A[[x]], B[[x]]))
Note that you could then also use vapply (or sapply) to control the output type (i.e. if you don't want a list).
(By the way, I don't quite see what you want to achieve with the zip function, so I am sorry if I missed your point.)
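The vapply variant mentioned above could look like the following sketch, again with the A, B and f from the question. seq_along(A) replaces the hard-coded 1:10, and numeric(1) states that every call must return a single number, so a wrong return type fails loudly instead of silently changing the shape of the result.

```r
A <- as.list(0:9)
B <- as.list(0:9)
f <- function(x, y) x^2 + y

# Loop over the indices and return a plain numeric vector instead of a list
out <- vapply(seq_along(A), function(i) f(A[[i]], B[[i]]), numeric(1))
out
```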
