Reproducing the result from Map() with mapply() in R

Take the following data frame and vector,
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)
Assume I want to multiply df column-wise by the elements of v, meaning df[[1]] * v[1], df[[2]] * v[2], and df[[3]] * v[3].
I can do this with Map:
> Map(`*`, df, v)
$x
[1] 5 10 15
$y
[1] 40 50 60
$z
[1] 105 120 135
Now, since Map is defined as
> Map
function (f, ...)
{
f <- match.fun(f)
mapply(FUN = f, ..., SIMPLIFY = FALSE)
}
<bytecode: 0x3950e00>
<environment: namespace:base>
it seems logical that I should be able to reproduce the above exactly with the following call to mapply, but this is not the case.
> mapply(`*`, df, v, simplify = FALSE)
# Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]],
# simplify = dots[[3L]][[1L]]) : operator needs one or two arguments
The problem seems to lie in the arguments of "*", which are:
> args("*")
function (e1, e2)
NULL
Two more attempts that name the arguments yield similar errors:
> mapply(`*`, e1 = df, e2 = v, simplify = FALSE)
# Error in .Primitive("*")(e1 = dots[[1L]][[1L]], e2 = dots[[2L]][[1L]], :
# operator needs one or two arguments
> mapply(`*`, ..1 = df, ..2 = v, simplify = FALSE)
# Error in .Primitive("*")(..1 = dots[[1L]][[1L]], ..2 = dots[[2L]][[1L]], :
# operator needs one or two arguments
What is the issue here? And how can I reproduce (exactly) the result from
Map(`*`, df, v)
with mapply?

Notice that Map calls
mapply(FUN = f, ..., SIMPLIFY = FALSE)
not
mapply(FUN = f, ..., simplify = FALSE)
and of course R is case sensitive. Try
mapply(`*`, df, v, SIMPLIFY = FALSE)
# $x
# [1] 5 10 15
#
# $y
# [1] 40 50 60
#
# $z
# [1] 105 120 135
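instead. Because mapply() has no argument called simplify, the lowercase simplify = FALSE is not matched to SIMPLIFY; it is treated as one more vectorized argument in ... and passed on to `*` on every iteration. In effect, mapply is trying to call
`*`(df[[1]], v[1], simplify = FALSE)
which fails because `*` accepts only one or two arguments. Your e1/e2 and ..1/..2 attempts fail for the same reason.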
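A quick check that the corrected call reproduces Map() exactly, and of where a misspelled argument name ends up. show_args is a hypothetical helper that just reports the names of whatever lands in its ...; unlike the primitive `*`, it tolerates the extra argument, so the stray simplify becomes visible instead of raising an error:

```r
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)

# Map() is mapply() with SIMPLIFY = FALSE, so both calls give the same list
res_map    <- Map(`*`, df, v)
res_mapply <- mapply(`*`, df, v, SIMPLIFY = FALSE)
identical(res_map, res_mapply)
# [1] TRUE

# A lowercase 'simplify' is not an mapply() argument: it falls into ...
# and is forwarded to FUN on every call, as the extra named argument
show_args <- function(e1, e2, ...) names(list(...))
mapply(show_args, df, v, simplify = FALSE, SIMPLIFY = FALSE)$x
# [1] "simplify"
```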

Related

Getting name of a function passed as an argument to a function

Background
A function is passed as an argument to another function. The problem pertains to:
getting the name of that function as a string for convenient subsequent manipulation
locating that function within the package from which it is called
understanding :: and ::: calls
Example
Function fun_tst executes function FUN on x:
fun_tst <- function(x = 1:100, FUN = mean) {
return(FUN(x))
}
mean
fun_tst()
# [1] 50.5
sum
fun_tst(x = 1:1e3, FUN = sum)
# [1] 500500
Problem
fun_tst <- function(x = 1:100, FUN = mean) {
msg <- paste("Executing function", FUN)
print(msg)
return(FUN(x))
}
fun_tst(x = 1:1e3, FUN = sum)
Error in paste("Executing function", FUN) : cannot coerce type
'builtin' to vector of type 'character'
Attempts
1)
Interestingly, print can handle the FUN object, but it prints the function definition rather than its name.
fun_tst <- function(x = 1:100, FUN = mean) {
print(FUN)
return(FUN(x))
}
fun_tst(x = 1:1e3, FUN = sum)
function (..., na.rm = FALSE)  .Primitive("sum")
[1] 500500
2) substitute
fun_tst <- function(x = 1:100, FUN = mean) {
fun_name <- substitute(FUN)
msg <- paste("Executing function", fun_name, collapse = " ")
print(msg)
return(FUN(x))
}
fun_tst(x = 1:1e3, FUN = sum)
[1] "Executing function sum"
[1] 500500
Almost there, but the output is a mess when the function is given with ::, as in:
>> fun_tst(x = 1:1e3, FUN = dplyr::glimpse)
[1] "Executing function :: Executing function dplyr Executing function glimpse"
int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
Desired results
fun_tst(x = 1:1e3, FUN = dplyr::glimpse)
# Executing function glimpse from package dplyr
int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
fun_tst(x = 1:1e3, FUN = sum)
# Executing function sum from package base
You're almost there with your second try (using substitute). The problem comes from the way R converts language objects to character:
> as.character(substitute(dplyr::glimpse))
[1] "::" "dplyr" "glimpse"
Given this, it's not surprising that paste mangles it that way. I would fix this just by handling the two cases separately:
fun_tst <- function(x = 1:100, FUN = mean) {
fun_name <- substitute(FUN)
if (length(fun_name) == 1) {
msg <- paste("Executing function", fun_name, "from package base")
} else {
msg <- paste("Executing function", fun_name[3], "from package", fun_name[2])
}
print(msg)
return(FUN(x))
}
This works on both of your examples:
> fun_tst(x = 1:1e3, FUN = sum)
[1] "Executing function sum from package base"
[1] 500500
> fun_tst(x = 1:1e3, FUN = dplyr::glimpse)
[1] "Executing function glimpse from package dplyr"
int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
However, as written, it reports every function that is not called with :: as coming from base, even if it is user-defined or attached with a library() call. If that applies to your use case, don't explicitly say "from package base".
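One way to avoid hard-coding "from package base" is to ask the function itself where it was defined. A minimal sketch, assuming that the environment a closure was created in identifies its package (true for functions defined in a package namespace); fun_origin is a hypothetical helper name, not part of base R. Re-exported functions will report the package that actually defines them, which can differ from the one written before ::.

```r
# Hypothetical helper: name of the namespace a function was defined in.
# Closures carry their defining environment; primitives such as sum() do not
# (environment(sum) is NULL), and all primitives live in base.
fun_origin <- function(FUN) {
  env <- environment(FUN)
  if (is.null(env)) return("base")
  # "base", "utils", ..., "R_GlobalEnv" for top-level user code,
  # or "" for functions created inside other functions
  environmentName(env)
}

fun_tst <- function(x = 1:100, FUN = mean) {
  fun_expr <- substitute(FUN)
  # pkg::f and pkg:::f are calls whose third element is the function name
  is_ns_call <- is.call(fun_expr) &&
    (identical(fun_expr[[1]], as.name("::")) || identical(fun_expr[[1]], as.name(":::")))
  fun_name <- if (is_ns_call) as.character(fun_expr[[3]]) else deparse(fun_expr)
  message("Executing function ", fun_name, " from package ", fun_origin(FUN))
  FUN(x)
}

fun_tst(x = 1:1e3, FUN = sum)
# Executing function sum from package base
# [1] 500500
```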
If you use deparse() together with substitute(), you get the desired output; see a similar post on passing variable names to plot(), https://stackoverflow.com/a/9666650/1993932.
fun_tst <- function(x = 1:100, FUN = mean) {
message(paste("Executing function", deparse(substitute(FUN))))
return(FUN(x))
}
> fun_tst(x = 1:1e3, FUN = sum)
Executing function sum
[1] 500500
> fun_tst(x = 1:1e3, FUN = dplyr::glimpse)
Executing function dplyr::glimpse
int [1:1000] 1 2 3 4 5 6 7 8 9 10 ...
If you would rather print the message as a character vector, replace message with print.

purrr::pmap with user-defined functions and named list

The following piece of code works as expected:
library(tidyverse)
tib <- tibble(x = c(1,2), y = c(2,4), z = c(3,6))
tib %>% pmap(c)
#[[1]]
#x y z
#1 2 3
#
#[[2]]
#x y z
#2 4 6
But if I define the function
my_c_1 <- function(u, v, w) c(u, v, w)
I get an error:
tib %>% pmap(my_c_1)
#Error in .f(x = .l[[c(1L, i)]], y = .l[[c(2L, i)]], z = .l[[c(3L, i)]], :
# unused arguments (x = .l[[c(1, i)]], y = .l[[c(2, i)]], z = .l[[c(3, i)]])
Similarly, with a named list and the base function c, everything works:
lili_1 <- list(x = list(1,2), y = list(2,4), z = list(3,6))
pmap(lili_1, c)
#[[1]]
#x y z
#1 2 3
#
#[[2]]
#x y z
#2 4 6
And with the user-defined function I get the same error:
pmap(lili_1, my_c_1)
#Error in .f(x = .l[[c(1L, i)]], y = .l[[c(2L, i)]], z = .l[[c(3L, i)]], :
#unused arguments (x = .l[[c(1, i)]], y = .l[[c(2, i)]], z = .l[[c(3, i)]])
However, with an unnamed list and the user-defined function, it works:
lili_2 <- list(list(1,2), list(2,4), list(3,6))
pmap(lili_2, my_c_1)
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 2 4 6
I don't quite understand why things break with named lists and user-defined functions. Any insight?
BTW, I found a temporary workaround by defining:
my_c_2 <- function(...) c(...)
Then all works well, even with named lists... which leaves me even more puzzled.
This is meant as a minimal reproducible example. In my actual code I would like to pipe tibbles to pmap with a more general user-defined function, without resorting to the ... workaround for my variables.
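Your function my_c_1 has arguments u, v, w, but you pass a list with names x, y, z. pmap() calls .f with each element as a named argument (x = ..., y = ..., z = ..., as the error message shows), so R tries to match them to formals called x, y and z and finds none. An unnamed list is matched by position, which is why lili_2 works, and a function whose only formal is ... (like c or your my_c_2) accepts any names. If you don't want a function whose only argument is ..., make sure the names match in your call, either by renaming the function's arguments or by renaming the list elements (tibble columns).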
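do.call() with a named list matches arguments the same way pmap() does for each row, so the failure and the fixes can be shown in base R without purrr. A minimal sketch; row and my_c_3 are illustrative names, and row stands for one row of lili_1 or tib:

```r
my_c_1 <- function(u, v, w) c(u, v, w)
row <- list(x = 1, y = 2, z = 3)  # one "row" of lili_1 or tib

# Named elements are matched to formals by name, so this fails
# with the same "unused arguments" error that pmap() reports
err <- tryCatch(do.call(my_c_1, row), error = conditionMessage)
err

# Fix 1: drop the names so matching is positional (why lili_2 works)
do.call(my_c_1, unname(row))
# [1] 1 2 3

# Fix 2: give the function formals that match the names
my_c_3 <- function(x, y, z) c(x, y, z)
do.call(my_c_3, row)
# [1] 1 2 3

# Fix 3: rename the elements to match the function's formals
do.call(my_c_1, setNames(row, c("u", "v", "w")))
# [1] 1 2 3
```

With purrr, the same fixes would read, for example, tib %>% setNames(c("u", "v", "w")) %>% pmap(my_c_1) or pmap(unname(as.list(tib)), my_c_1).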

bit64 - sum() on a vector of NA produces odd results

When using the bit64 package, summing a vector of NAs together with a vector of integers yields an incorrect result. Depending on whether the NA vector is passed first or last, the result is either twice the correct answer or 0, respectively.
Notice that converting the NA vector away from integer64 will remove the issue.
However, when experimenting with other small values in place of y, the results were awfully strange.
For example:
40 + 35 gives 70 but
35 + 40 gives 80 (the correct answer is 75 in both cases)
Any thoughts as to what is going on?
EXAMPLE:
library(bit64)
x <- as.integer64(c(20, 20))
y <- as.integer64(c(NA, NA))
sum(y, x, na.rm=TRUE)
# integer64
# [1] 80 # <~~~ Twice the correct value
sum(x, y, na.rm=TRUE)
# integer64
# [1] 0 # <~~~~ Incorrect 0. Should be 40.
## Removing the NAs does not help.
y <- y[!is.na(y)]
## A vector of 0's gives the same issue
y <- as.integer64(c(0, 0))
## Same results
sum(y, x, na.rm=TRUE)
# integer64
# [1] 80
sum(x, y, na.rm=TRUE)
# integer64
# [1] 0
## Converting to numeric does away with the issue (but is not a viable workaround, for obvious reasons)
y <- as.numeric(y)
sum(y, x, na.rm=TRUE)
# [1] 1.97626e-322
sum.integer64(y, x, na.rm=TRUE)
# integer64
# [1] 40
sum(x, y, na.rm=TRUE)
# integer64
# [1] 40
Give y a single non-NA value, and the results are just as strange:
y <- as.integer64(c(35, NA, NA))
sum.integer64(x, if (!all(is.na(y))) removeNA(y), na.rm=TRUE)
sum.integer64(x, y[[1]], na.rm=TRUE)
sum.integer64(y[[1]], x, na.rm=TRUE)
## No NAs present
sum.integer64(as.integer64(35), x)
# integer64
# [1] 80
sum.integer64(x, as.integer64(35))
# integer64
# [1] 70
Not an answer, but an exploration. I hope it helps.
From the sum.integer64 function of the bit64 package:
function (..., na.rm = FALSE)
{
l <- list(...)
ret <- double(1)
if (length(l) == 1) {
.Call("sum_integer64", l[[1]], na.rm, ret)
oldClass(ret) <- "integer64"
ret
}
else {
ret <- sapply(l, function(e) {
if (is.integer64(e)) {
.Call("sum_integer64", e, na.rm, ret)
ret
}
else {
as.integer64(sum(e, na.rm = na.rm))
}
})
oldClass(ret) <- "integer64"
sum(ret, na.rm = na.rm)
}
}
Here is your example:
library(bit64)
x <- as.integer64(c(20, 20))
y <- as.integer64(c(NA, NA))
na.rm <- TRUE
l <- list(y, x)
ret <- double(1)
ret
#[1] 0
# We use the sapply function as in the function:
ret <- sapply(l, function(e) { .Call("sum_integer64", e, na.rm, ret) })
oldClass(ret) <- "integer64"
ret
#integer64
#[1] 40 40 <-- twice the value "40"
sum(ret, na.rm = na.rm)
# integer64
#[1] 80 <-- twice the expected value, as you said
Here we decompose the calculation, for each vector:
ret <- double(1)
ret2 <- NULL
ret2[1] <- .Call("sum_integer64", y, na.rm, ret)
ret2[2] <- .Call("sum_integer64", x, na.rm, ret)
oldClass(ret2) <- "integer64"
ret2
#integer64
#[1] 0 40 <-- the value "40" only once, and "0" because the NAs are removed
sum(ret2, na.rm = na.rm)
#integer64
#[1] 40 <- expected value
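One explanation consistent with every output above is that the C routine behind .Call("sum_integer64", e, na.rm, ret) writes its result into ret in place: the length(l) == 1 branch discards the value of .Call() and returns ret, which only works if ret is filled in place. Inside the sapply() branch the closure then returns the same ret object on every iteration, so all collected results show whatever was computed last: sum(y, x) becomes 40 + 40 = 80, sum(x, y) becomes 0 + 0 = 0, and sum(x, as.integer64(35)) becomes 35 + 35 = 70. This is inferred from the outputs, not checked against bit64's C source. The pure-R sketch below mimics that aliasing with an environment; sum_into is a hypothetical stand-in, not a bit64 function.

```r
# Hypothetical stand-in for .Call("sum_integer64", e, na.rm, ret): it writes
# the result into a caller-supplied buffer and returns that same buffer.
# Environments are used because, unlike vectors, they are never copied.
sum_into <- function(e, na.rm, out) {
  out$value <- sum(e, na.rm = na.rm)
  out
}

l <- list(c(NA_real_, NA_real_), c(20, 20))  # mimics list(y, x)

# Shared buffer, as in sum.integer64: every element of res is the same object
buf <- new.env()
res <- lapply(l, function(e) sum_into(e, TRUE, buf))
shared <- vapply(res, function(b) b$value, numeric(1))
shared
# [1] 40 40   both entries hold the last result, so the total would be 80

# Fresh buffer per element: each entry keeps its own result
res2 <- lapply(l, function(e) sum_into(e, TRUE, new.env()))
fresh <- vapply(res2, function(b) b$value, numeric(1))
fresh
# [1]  0 40
```

If this is the cause, passing a single vector should sidestep the sapply() branch, e.g. sum(c(y, x), na.rm = TRUE), which takes the length(l) == 1 path of sum.integer64 shown above.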

R: Extracting the index of the final / last minimum value in a vector

I have a vector
y <- c(10:1, c(0.1,0.1,0.1))
If I do which.min(y), it returns the index 11.
However, I want it to return the last index at which the minimum occurs.
I have a clumsy way of doing this:
rev(which(y == min(y)))[1]
But is there a better way to do this?
I would just create my own utility function
which.min2 <- function(x, last.index = FALSE, ...){
if(last.index) max(which(x == min(x, ...))) else which.min(x)
}
which.min2(y, TRUE)
## [1] 13
which.min2(y)
## [1] 11
This will also work if you have NAs in your vector
y <- c(10:1, c(0.1,0.1,NA))
which.min2(y, TRUE, na.rm = TRUE)
## [1] 12
which.min2(y, na.rm = TRUE)
## [1] 11
You can use the max.col function, which has a richer way of handling ties, but it requires a matrix and I did not find the corresponding min function:
> y <- c(10:1, c(0.1,0.1,0.1))
> max.col(t(as.matrix(-y)), ties.method='last')
[1] 13
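So depending on how your real y vector is built, you can gain from this method.
Alternatively, because order() is stable (unresolved ties keep their original order), the last element of the decreasing order is the index of the last minimum:
tail(order(y, decreasing = TRUE), 1)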
#[1] 13
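Another option, not taken from the answers above, is to run which.min() on the reversed vector and map the position back, which avoids comparing against min(y) with ==. A minimal sketch; which_min_last is a hypothetical name:

```r
# Hypothetical helper: index of the last occurrence of the minimum.
# which.min() returns the first minimum, so apply it to rev(x) and
# convert that position back to an index in x. NAs are skipped, and
# empty or all-NA input gives integer(0), just like which.min().
which_min_last <- function(x) {
  length(x) - which.min(rev(x)) + 1L
}

y <- c(10:1, c(0.1, 0.1, 0.1))
which_min_last(y)
# [1] 13
which_min_last(c(10:1, c(0.1, 0.1, NA)))
# [1] 12
```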

R: how to make passing of arguments using ... to multiple functions well-defined?

V1: Suppose functions f(x, ...) and g(x, ...) can be passed different arguments. If I define a new function that uses both of them, can I make the passing of arguments via the ... operator well-defined? As a simple example:
f1 = function(x, n = 1) x + n
g1 = function(x, m = 1) x + m
f = function(x, ...) f1(x, ...)
g = function(x, ...) g1(x, ...)
h = function(x, ...) {
fgList = list()
fgList[["f"]] = f(x, ...)
fgList[["g"]] = g(x, ...)
return(fgList)
}
h(1:4)
# $f
# [1] 2 3 4 5
# $g
# [1] 2 3 4 5
h(1:4, n = 2)
# Error in g1(x, ...) : unused argument (n = 2)
The argument n is passed down to both f and g, but it is only well-defined for f. I want to guard against this.
V2: If they are functions that I have defined, then Hong Ooi's solution below works perfectly.
Can this solution be extended to predefined functions that don't have a ... argument? Equivalently, can a ... argument be 'added' to a predefined function that doesn't have one? For example:
h = function(x, ...) mean(x, ...) * median(x, ...)
h(1:4, test = 1)
## Error in median(x, ...) : unused argument (test = 1)
You can't have multiple versions of ... in the one environment. What you can do, however, is give each of your called sub-functions a ... argument of their own. This means they will ignore any parameters passed down that don't match their own formal arguments.
f1 = function(x, n = 1, ...) x + n
g1 = function(x, m = 1, ...) x + m
> h(1:4, n = 2)
$f
[1] 3 4 5 6
$g
[1] 2 3 4 5
Edit to answer added question: you can make a new version of median, which will override the predefined function when you call it in your own code. (Due to how R namespaces work, other predefined functions will still use the existing version.)
median <- function(x, na.rm = FALSE, ...)
stats::median(x, na.rm = na.rm) # median() is exported from the stats package, not base
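For functions you cannot edit, one alternative (not from Hong Ooi's answer) is to filter ... against the callee's formal arguments before calling it. A minimal sketch, assuming extra arguments are always passed by name; call_matching is a hypothetical helper, and the original f1 and g1 without ... are used to show it works on functions that lack a ... argument:

```r
# Hypothetical helper: call FUN(x, ...) but forward only the named arguments
# in ... that FUN declares. args() is used so primitives, which have no
# formals of their own, are handled too. If FUN accepts ..., pass everything.
call_matching <- function(FUN, x, ...) {
  dots <- list(...)
  fmls <- names(formals(args(FUN)))
  if (!"..." %in% fmls) {
    nms <- names(dots)
    if (is.null(nms)) nms <- rep("", length(dots))
    dots <- dots[nms %in% fmls]  # unnamed extras (name "") are dropped
  }
  do.call(FUN, c(list(x), dots))
}

f1 <- function(x, n = 1) x + n  # the original f1 and g1, without ...
g1 <- function(x, m = 1) x + m

h <- function(x, ...) {
  list(f = call_matching(f1, x, ...), g = call_matching(g1, x, ...))
}
h(1:4, n = 2)
# $f
# [1] 3 4 5 6
#
# $g
# [1] 2 3 4 5
```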

Resources