purrr::pmap with user-defined functions and named list - r

The following piece of code works as expected:
library(tidyverse)
tib <- tibble(x = c(1,2), y = c(2,4), z = c(3,6))
tib %>% pmap(c)
#[[1]]
#x y z
#1 2 3
#
#[[2]]
#x y z
#2 4 6
But if I define the function
my_c_1 <- function(u, v, w) c(u, v, w)
I get an error:
tib %>% pmap(my_c_1)
#Error in .f(x = .l[[c(1L, i)]], y = .l[[c(2L, i)]], z = .l[[c(3L, i)]], :
# unused arguments (x = .l[[c(1, i)]], y = .l[[c(2, i)]], z = .l[[c(3, i)]])
Equivalently, for a named list with the base vector function all works well:
lili_1 <- list(x = list(1,2), y = list(2,4), z = list(3,6))
pmap(lili_1, c)
#[[1]]
#x y z
#1 2 3
#
#[[2]]
#x y z
#2 4 6
And with the user-defined function I get the same error:
pmap(lili_1, my_c_1)
#Error in .f(x = .l[[c(1L, i)]], y = .l[[c(2L, i)]], z = .l[[c(3L, i)]], :
#unused arguments (x = .l[[c(1, i)]], y = .l[[c(2, i)]], z = .l[[c(3, i)]])
However, for an un-named list with the user-defined function, it works:
lili_2 <- list(list(1,2), list(2,4), list(3,6))
pmap(lili_2, my_c_1)
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 2 4 6
I don't quite understand why things break with named lists and user-defined functions. Any insight?
BTW, I found a temporary workaround by defining:
my_c_2 <- function(...) c(...)
Then all works well, even with named lists... which leaves me even more puzzled.
This is in the spirit of a minimal reproducible example. In my current working code I would like to be able to pipe tibbles to pmap with my more general defined function without using the ... workaround for my variables.

your function my_c_1 has arguments u, v, w but you pass a list with names x, y, z. If you don't want a function with no named arguments (..., such as base's c), you should make sure the names match in your call.

Related

How to apply a custom function to every value in a dataframe?

I am trying to apply a custom function to every value of a dataframe. Here is the custom function and dataframe:
#function
test_fun <- function(x, y = 1) {
output <- x + y
output
}
#dataframe
df <- data.frame(a = c(1,2,3), b = c(4,5,6))
Now lets say I want to apply test_fun, with y = 2, to every value of df. This method doesn't seem to work:
lapply(df, test_fun(y = 2))
The function is vectorized, we can directly apply over the dataset
test_fun(df, y = 2)
# a b
#1 3 6
#2 4 7
##3 5 8
Regarding the OP's error, if we are not using lambda function, specify the argument as
lapply(df, test_fun, y = 2)
-output
#$a
#[1] 3 4 5
#$b
#[1] 6 7 8
Or specify the lambda function and then use (y = 2)
lapply(df, function(vec) test_fun(vec, y = 2))

Find variables that occur only in one cluster in data.frame in R

Using BASE R, I wonder how to answer the following question:
Are there any value on X or Y (i.e., variables of interest names) that occurs only in one element in m (as a cluster) but not others? If yes, produce my desired output below.
For example:
Here we see X == 3 only occurs in element m[[3]] but not m[[1]] and m[[2]].
Here we also see Y == 99 only occur in m[[1]] but not others.
Note: the following is a toy example, a functional answer is appreciated. AND X & Y may or may not be numeric (e.g., be string).
f <- data.frame(id = c(rep("AA",4), rep("BB",2), rep("CC",2)), X = c(1,1,1,1,1,1,3,3),
Y = c(99,99,99,99,6,6,6,6))
m <- split(f, f$id) # Here is `m`
mods <- names(f)[-1] # variables of interest names
Desired output:
list(AA = c(Y = 99), CC = c(X = 3))
# $AA
# Y
# 99
# $CC
# X
# 3
This is a solution based on rapply() and table().
ux <- rapply(m, unique)
tb <- table(uxm <- ux[gsub(rx <- "^.*\\.(.*)$", "\\1", names(ux)) %in% mods])
r <- Map(setNames, n <- uxm[uxm %in% names(tb)[tb == 1]], gsub(rx, "\\1", names(n)))
setNames(r, gsub("^(.*)\\..*$", "\\1", names(r)))
# $AA
# Y
# 99
#
# $CC
# X
# 3
tmp = do.call(rbind, lapply(names(f)[-1], function(x){
d = unique(f[c("id", x)])
names(d) = c("id", "val")
transform(d, nm = x)
}))
tmp = tmp[ave(as.numeric(as.factor(tmp$val)), tmp$val, FUN = length) == 1,]
lapply(split(tmp, tmp$id), function(a){
setNames(a$val, a$nm)
})
#$AA
# Y
#99
#$BB
#named numeric(0)
#$CC
#X
#3
This utilizes #jay.sf's idea of rapply() with an idea from a previous answer:
vec <- rapply(lapply(m, '[', , mods), unique)
unique_vec <- vec[!duplicated(vec) & !duplicated(vec, fromLast = T)]
vec_names <- do.call(rbind, strsplit(names(unique_vec), '.', fixed = T))
names(unique_vec) <- vec_names[, 2]
split(unique_vec, vec_names[, 1])
$AA
Y
99
$CC
X
3

Substitute LHS of = in R

I would like to replace the LHS of "=" in a expression in R. In my personal case, I need it to make sure the following creates a variable that does not already exist in the data frame
df %>% mutate(v = mean(w))
I tried eval(substitute()) but the LHS is not substituted
eval(substitute(df %>% mutate(v = mean(w)), list(v = as.name("id"))))
#similarly in a list
eval(substitute(l <- list(v=1:10),list(v=as.name("id"))))
l
$v
[1] 1 2 3 4 5 6 7 8 9 10
Why can't v substituted throught eval/substitute? What's the best way to work around it?
1) eval/parse Create a cmd string, parse it and evaluate it:
f2 <- function(DF, x, env = parent.frame()){
cmd <- sprintf("mutate(%s, %s = mean(v1))", deparse(substitute(DF)), x)
eval(parse(text = cmd), env)
}
f2(DF, "v1_name")
giving
v1 v1_mean
1 1 2
2 2 2
3 3 2
... etc ...
2) eval/as.call Another way is to construct a list, convert it to a call and evaluate it. (This is also the approach that mutate_each_q in dplyr takes.)
f3 <- function(DF, x, env = parent.frame()) {
L <- list(quote(mutate), .data = substitute(DF), quote(mean(v1)))
names(L)[3] <- x
eval(as.call(L), env)
}
f3(DF, "v1_name")
3) do.call We form a list equal to the last two components of the list in the prior solution and then use do.call :
f3 <- function(DF, x, env = parent.frame()) {
L <- list(.data = substitute(DF), quote(mean(v1)))
names(L)[2] <- x
do.call(mutate, L)
}
f3(DF, "v1_name")
Upodate Added additional solutions.

Reproducing the result from Map() with mapply()

Take the following data frame and vector,
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)
Assume I want to multiply df columnwise by the elements of v, meaning df[1] * v[1], df[2] * v[2], and df[3] * v[3]
I can do this with Map
> Map(`*`, df, v)
$x
[1] 5 10 15
$y
[1] 40 50 60
$z
[1] 105 120 135
Now, since Map is defined as
> Map
function (f, ...)
{
f <- match.fun(f)
mapply(FUN = f, ..., SIMPLIFY = FALSE)
}
<bytecode: 0x3950e00>
<environment: namespace:base>
it seems logical that I should be able to reproduce the above exactly with the following call to mapply, but this is not the case.
> mapply(`*`, df, v, simplify = FALSE)
# Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]],
# simplify = dots[[3L]][[1L]]) : operator needs one or two arguments
The problem seems to be within the arguments of "*", and those arguments are
> args("*")
function (e1, e2)
NULL
So two more tries yield similar errors.
> mapply(`*`, e1 = df, e2 = v, simplify = FALSE)
# Error in .Primitive("*")(e1 = dots[[1L]][[1L]], e2 = dots[[2L]][[1L]], :
# operator needs one or two arguments
> mapply(`*`, ..1 = df, ..2 = v, simplify = FALSE)
# Error in .Primitive("*")(..1 = dots[[1L]][[1L]], ..2 = dots[[2L]][[1L]], :
# operator needs one or two arguments
What is the issue here? And how can I reproduce (exactly) the result from
Map(`*`, df, v)
with mapply?
Notice that Map calls
mapply(FUN = f, ..., SIMPLIFY = FALSE)
not
mapply(FUN = f, ..., simplify = FALSE)
and of course R is case sensitive. Try
mapply(`*`, df, v, SIMPLIFY = FALSE)
# $x
# [1] 5 10 15
#
# $y
# [1] 40 50 60
#
# $z
# [1] 105 120 135
instead. With simplify = FALSE, it's trying to call
`*`(df[[1]], v[1], simplify = FALSE)
which is what is giving that error.

Behavior of do.call() in the presence of arguments without defaults

This question is a follow-up to a previous answer which raised a puzzle.
Reproducible example from the previous answer:
Models <- list( lm(runif(10)~rnorm(10)),lm(runif(10)~rnorm(10)),lm(runif(10)~rnorm(10)) )
lm1 <- lm(runif(10)~rnorm(10))
library(functional)
# This works
do.call( Curry(anova, object=lm1), Models )
# But so does this
do.call( anova, Models )
The question is why does do.call(anova, Models) work fine, as #Roland points out?
The signature for anova is anova(object, ...)
anova calls UseMethod, which should* call anova.lm which should call anova.lmlist, whose first line is objects <- list(object, ...), but object doesn't exist in that formulation.
The only thing I can surmise is that do.call might not just fill in ellipses but fills in all arguments without defaults and leaves any extra for the ellipsis to catch? If so, where is that documented, as it's definitely new to me!
* Which is itself a clue--how does UseMethod know to call anova.lm if the first argument is unspecified? There's no anova.list method or anova.default or similar...
In a regular function call ... captures arguments by position, partial match and full match:
f <- function(...) g(...)
g <- function(x, y, zabc) c(x = x, y = y, zabc = zabc)
f(1, 2, 3)
# x y zabc
# 1 2 3
f(z = 3, y = 2, 1)
# x y zabc
# 1 2 3
do.call behaves in exactly the same way except instead of supplying the arguments directly to the function, they're stored in a list and do.call takes care of passing them into the function:
do.call(f, list(1, 2, 3))
# x y zabc
# 1 2 3
do.call(f, list(z = 3, y = 2, 1))
# x y zabc
# 1 2 3
I think it is worth stressing that the names of the list elements do matter. Hadley mentioned it, but it can be an annoyance. Consider the next example:
x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000) + 0.2
Models <- list()
Models$xy <- lm(z~x)
Models$yz <- lm(z~y)
# This will fail, because do.call will not assign anything to the argument "object" of anova
do.call(anova, Models)
# This won't
do.call(anova, unname(Models))
do.call passes the first element of the list to the first argument:
fun <- function(x,...) {
print(paste0("x=",x))
list(x, ...)
}
do.call(fun, list(1,2))
# [1] "x=1"
# [[1]]
# [1] 1
#
# [[2]]
# [1] 2

Resources