Using the following function foo() as a simple example, I'd like to distribute the values given in ... to two different functions, if possible.
foo <- function(x, y, ...) {
list(sum = sum(x, ...), grep = grep("abc", y, ...))
}
In the following example, I would like na.rm to be passed to sum(), and value to be passed to grep(). But I get an error for an unused argument in grep().
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, na.rm = TRUE, value = TRUE)
# Error in grep("abc", y, ...) : unused argument (na.rm = TRUE)
It seems like the arguments were sent to grep() first. Is that correct? I would think R would see and evaluate sum() first, and return an error for that case.
Furthermore, when trying to split up the arguments in ..., I ran into trouble. sum()'s formal arguments are NULL because it is a .Primitive, and therefore I cannot use
names(formals(sum)) %in% names(list(...))
I also don't want to assume that the leftover arguments from
names(formals(grep)) %in% names(list(...))
are to automatically be passed to sum().
How can I safely and efficiently distribute ... arguments to multiple functions so that no unnecessary evaluations are made?
In the long-run, I'd like to be able to apply this to functions with a long list of ... arguments, similar to those of download.file() and scan().
Separate Lists
If you really want to pass different sets of parameters to different functions then it's probably cleaner to specify separate lists:
foo <- function(x, y, sum = list(), grep = list()) {
list(sum = do.call("sum", c(x, sum)), grep = do.call("grep", c("abc", y, grep)))
}
# test
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, sum = list(na.rm = TRUE), grep = list(value = TRUE))
## $sum
## [1] 55
##
## $grep
## [1] "xyzabcxyz"
Hybrid list / ...
An alternative is to use ... for one of these and specify the other as a list, particularly when one of them is used frequently and the other infrequently. The frequently used one would be passed via ... and the infrequently used one via a list, e.g.:
foo <- function(x, y, sum = list(), ...) {
list(sum = do.call("sum", c(x, sum)), grep = grep("abc", y, ...))
}
foo(X, Y, sum = list(na.rm = TRUE), value = TRUE)
Here are a couple of examples of the hybrid approach from R itself:
i) The mapply function takes that approach using both ... and a MoreArgs list:
> args(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
NULL
ii) nls also takes this approach using both ... and the control list:
> args(nls)
function (formula, data = parent.frame(), start, control = nls.control(),
algorithm = c("default", "plinear", "port"), trace = FALSE,
subset, weights, na.action, model = FALSE, lower = -Inf,
upper = Inf, ...)
NULL
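As a quick illustration of the mapply pattern (the helper add_with_offset and its offset argument are made up for this example): arguments that vary per call go through ..., while arguments that stay fixed go in the MoreArgs list.

```r
# Hypothetical helper: adds two vectorised inputs plus a fixed offset.
add_with_offset <- function(x, y, offset) x + y + offset

# x and y vary per call (passed via ...); offset is constant (passed via MoreArgs).
res <- mapply(add_with_offset, 1:3, 4:6, MoreArgs = list(offset = 10))
res
## [1] 15 17 19
```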
Why does grep error before sum?
See that sum is a lot more accommodating with its arguments:
X <- c(1:5, NA, 6:10)
sum(X, na.rm = TRUE, value = TRUE)
## [1] 56
It doesn't fail because it doesn't care about other named arguments, so the value = TRUE simplifies to just TRUE which sums to 1. Incidentally:
sum(X, na.rm = TRUE)
## [1] 55
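To see the contrast side by side, here is a small sketch that passes an unknown named argument to each function directly. The tryCatch wrapper is only there so the example runs to completion; the exact wording of the error message may depend on your R version and locale.

```r
X <- c(1:5, NA, 6:10)

# sum() has `...` in its signature, so an unknown named argument
# is simply treated as another value to add (TRUE counts as 1).
s <- sum(X, na.rm = TRUE, value = TRUE)

# grep() has no `...`, so an unknown named argument is an error.
msg <- tryCatch(
  grep("abc", "xyzabcxyz", na.rm = TRUE),
  error = function(e) conditionMessage(e)
)

s
## [1] 56
msg
```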
How to split ... to different functions?
One method (that is very prone to error) is to look for the args for the target functions. For instance:
foo <- function(x, y, ...){
argnames <- names(list(...))
sumargs <- intersect(argnames, names(as.list(args(sum))))
grepargs <- intersect(argnames, names(as.list(args(grep))))
list(sum = do.call(sum, c(list(x), list(...)[sumargs])),
grep = do.call(grep, c(list("abc", y), list(...)[grepargs])))
}
This is prone to error whenever args does not report the arguments a function actually uses, as with S3 generics. As an example:
names(as.list(args(plot)))
## [1] "x" "y" "..." ""
names(as.list(args(plot.default)))
## [1] "x" "y" "type" "xlim" "ylim"
## [6] "log" "main" "sub" "xlab" "ylab"
## [11] "ann" "axes" "frame.plot" "panel.first" "panel.last"
## [16] "asp" "..." ""
In this case, you could substitute the appropriate S3 method. Because of this, I don't have a generalized solution (though I don't know whether one exists).
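For what it's worth, the lookup can be wrapped in a small helper. This is only a sketch: args_for is a made-up name, it matches purely by argument name, and it inherits the S3 caveat described above. It uses formals(args(fun)) so that primitives such as sum are handled. Note also that list(...) evaluates every argument up front.

```r
# Hypothetical helper: keep only those elements of `dots` whose names
# are formal arguments of `fun` (ignoring `...` itself).
args_for <- function(fun, dots) {
  fmls <- setdiff(names(formals(args(fun))), "...")
  dots[names(dots) %in% fmls]
}

foo <- function(x, y, ...) {
  dots <- list(...)  # forces evaluation of all arguments in ...
  list(
    sum  = do.call(sum,  c(list(x), args_for(sum, dots))),
    grep = do.call(grep, c(list("abc", y), args_for(grep, dots)))
  )
}

res <- foo(c(1:5, NA, 6:10), "xyzabcxyz", na.rm = TRUE, value = TRUE)
res$sum
## [1] 55
res$grep
## [1] "xyzabcxyz"
```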
You can only pass the ... argument to another function, if that other function includes all named arguments that you pass to ... or if it has a ... argument itself. So for sum, this is no problem (args(sum) returns function (..., na.rm = FALSE)). On the other hand grep has neither na.rm nor ... as an argument.
args(grep)
# function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
# fixed = FALSE, useBytes = FALSE, invert = FALSE)
This does not include ... and also does not include a named argument na.rm either. A simple solution is to just define your own function mygrep as follows:
mygrep <- function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE, ...)
grep(pattern, x, ignore.case, perl, value, fixed, useBytes, invert)
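Then it runs without error. Note that sum is now 56 rather than 55, because value = TRUE is also forwarded to sum() and counted as 1: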
foo <- function(x, y, ...){
list(sum = sum(x, ...), grep = mygrep("abc", y, ...))
}
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, na.rm = TRUE, value = TRUE)
# $sum
# [1] 56
#
# $grep
# [1] "xyzabcxyz"
This answer does not directly address the original question but could be helpful to others who experience a similar problem with their own functions (as opposed to existing functions like sum and grep).
@shadow's answer contains an insight that points to a very simple solution in such cases: just make sure your nested functions have ... as an argument and you won't get the unused argument error.
For example:
nested1 <- function(x, a) {
x + a
}
nested2 <- function(x, b) {
x - b
}
f <- function(x, ...) {
if (x >= 0) {
nested1(x, ...)
} else {
nested2(x, ...)
}
}
If we call f(x = 2, a = 3, b = 4) we get an error: Error in nested1(x, ...) : unused argument (b = 4).
But just add a ... to the formals of nested1 and nested2 and run again:
nested1 <- function(x, a, ...) {
x + a
}
nested2 <- function(x, b, ...) {
x - b
}
Now, f(x = 2, a = 3, b = 4) yields the desired result: 5. Problem solved.
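One trade-off worth keeping in mind with this approach: a function that accepts ... will also absorb misspelled arguments without complaint. A small sketch (nested_with_default is a made-up variant of nested1 that gives a a default value):

```r
# Hypothetical variant of nested1() with a default for `a`.
nested_with_default <- function(x, a = 0, ...) {
  x + a
}

# The misspelled argument `A` is absorbed by `...` instead of raising
# an error, so the default a = 0 is used silently.
res <- nested_with_default(2, A = 3)
res
## [1] 2
```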
Related
Is it possible to get the information which arguments are expected by a function and then store it in a character vector?
I know args(foo) but it only prints this information and returns NULL.
Why do I need this?
I want to work with the three dot arguments (dot dot dot, ...) and pass it to different functions.
Let me explain...
The following simple case works.
data <- c(1:10)
cv <- function(x, ...) {
numerator <- mean(x, ...)
denominator <- sd(x, ...)
return(numerator / denominator)
}
cv(data, na.rm = TRUE)
However, in a slightly different case, R will not figure out automatically which arguments match which function.
data <- c(1:10)
roundCv <- function(x, ...) {
numerator <- mean(x, ...)
denominator <- sd(x, ...)
result <- round(numerator / denominator, ...)
return(result)
}
roundCv(data, na.rm = TRUE, digits = 2)
# Error in sd(x, ...) : unused argument (digits = 2)
If I want to separate those arguments, it gets a little hairy. The approach is not generic but has to be adapted to all functions involved.
data <- c(1:10)
roundCv2 <- function(x, ...) {
args <- list(...)
args1 <- args[ names(args) %in% "na.rm"] # For mean/sd
args2 <- args[!names(args) %in% "na.rm"] # For round
numerator <- do.call("mean", c(list(x = x), args1))
denominator <- do.call("sd", c(list(x = x), args1))
tmp <- numerator / denominator
do.call("round", c(list(x = tmp), args2))
}
roundCv2(data, na.rm = TRUE, digits = 2)
Is there a simple way to do this?!
If I would know the arguments each function expects, I could handle it generically. That's why I'm asking:
Is it possible to get the information which arguments are expected by a function and then store it in a character vector?
A shout-out to MrFlick for pointing to similar questions and giving the answer in the comments.
You can use formals() to get a list-like object back, but it won't work for primitive functions. For example, names(formals(...)) gives the argument names as a character vector.
More details can be found here: https://stackoverflow.com/a/4128401/1553796
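A short sketch of what formals() reports for the three kinds of function that come up in this thread. Wrapping a primitive in args() first is one way around the NULL result; for S3 generics you have to look at the method rather than the generic.

```r
# Closure: formals() works directly.
names(formals(sd))
## [1] "x"     "na.rm"

# Primitive: formals() returns NULL, but args() wraps it in a closure.
formals(sum)
## NULL
names(formals(args(sum)))
## [1] "..."   "na.rm"

# S3 generic: only the generic's own formals are visible,
# so inspect the method that will actually be dispatched to.
names(formals(mean))
## [1] "x"   "..."
names(formals(mean.default))
## [1] "x"     "trim"  "na.rm" "..."
```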
So, I'm making a function like
example <- function(x, y){
z <- data.frame("variable name" = y, "Quantity of 1" = sum(x==1, na.rm = TRUE))
eval(as.character(y)) <<- z
}
list <- sample(c(0,1), size = 12, replace = TRUE)
If I evaluate my function using
example(list, "list")
It gives me an
error in eval(as.character(y)) <<- z: object 'y' not found
I want the function to give me a variable which I could find by the name I pass to it (as "y"), given that I'll have to repeat the same procedure multiple times.
I think you are looking for assign:
example <- function(x, y){
z <- data.frame("variable name" = y, "Quantity of 1" = sum(x==1, na.rm = TRUE))
assign(y, z, envir = parent.frame())
}
list <- sample(c(0,1), size = 12, replace = TRUE)
example(list, "list")
list
#> variable.name Quantity.of.1
#> 1 list 5
However, please note that this is not a great idea. You should try to avoid writing functions that can over-write objects in the global environment (or other calling frame) as this can have unintended consequences and is not idiomatic R.
It would be better to have:
example <- function(x, y){
data.frame("variable name" = y, "Quantity of 1" = sum(x==1, na.rm = TRUE))
}
and do
list <- example(list, "list")
or, better yet:
example <- function(x){
data.frame("variable name" = deparse(substitute(x)),
"Quantity of 1" = sum(x==1, na.rm = TRUE))
}
So you can just do:
list <- example(list)
Furthermore, it is a good idea to avoid using a common function name like list as a variable name.
I'm trying to set up details for which function to run and which arguments to include at the start of my script, to then later call the function. I'm having trouble specifying arguments to be input into the function.
I have a fixed object
v <- c(1,2,3,5,6,7,8,9,NA)
I want to specify which measurement function I will use as well as any relevant arguments.
Example 1:
chosenFunction <- mean
chosenArguments <- "trim = 0.1, na.rm = T"
Example 2:
chosenFunction <- median
chosenArguments <- "na.rm = F"
Then I want to be able to run this specified function
chosenFunction(v, chosenArguments)
Unfortunately, I can't just put in the string chosenArguments and expect the function to run. Is there any alternative way to specify the arguments to my function?
Updated answer based on OP's clarifications
chosenFunction <- mean
get_summary <- function(x, fun, ...) fun(x, ...)
v <- 1:100
get_summary(v, chosenFunction, na.rm = TRUE)
# [1] 50.5
Later on if you want to change the function
chosenFunction <- median
get_summary(v, chosenFunction, na.rm = TRUE)
# [1] 50.5
Original answer
get_summary <- function(x, chosenFunction, ...) chosenFunction(x, ...)
v <- 1:100
get_summary(v, mean, na.rm = TRUE, trim = 1)
# [1] 50.5
get_summary(v, median, na.rm = TRUE)
# [1] 50.5
By using ..., you don't have to specify every argument:
get_summary(v, mean, na.rm = TRUE)
# [1] 50.5
If we want to calculate mean, we do it by
mean(v, na.rm = TRUE, trim = 0.1)
#[1] 5.125
Another way is by using do.call
do.call(mean, list(v, na.rm = TRUE, trim = 0.1))
#[1] 5.125
We can leverage this fact and create a named list for chosenArguments and use it in do.call
chosenFunction <- mean
chosenArguments <- list(na.rm = TRUE, trim = 0.1)
do.call(chosenFunction, c(list(v), chosenArguments))
#[1] 5.125
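If you want to keep the function and its arguments together at the top of the script, one possible arrangement is a small "spec" list. This is just a sketch: the names measurement and run_measurement are made up for the example.

```r
v <- c(1, 2, 3, 5, 6, 7, 8, 9, NA)

# Hypothetical spec: the function and its arguments stored side by side.
measurement <- list(fun = mean, args = list(na.rm = TRUE, trim = 0.1))

# Hypothetical runner: prepends the data to the stored arguments.
run_measurement <- function(x, spec) {
  do.call(spec$fun, c(list(x), spec$args))
}

res_mean <- run_measurement(v, measurement)
res_mean
## [1] 5.125

# Swapping the spec changes both the function and its arguments.
res_median <- run_measurement(v, list(fun = median, args = list(na.rm = FALSE)))
res_median
## [1] NA
```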
In the body of some R functions, for example lm, I see calls to the match.call function. As its help page says, when used inside a function match.call returns a call where argument names are specified; and this is supposed to be useful for passing a large number of arguments to another function.
For example, in the lm function we see a call to the function model.frame...
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action",
"offset"), names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(mf, parent.frame())
...
...Why is this more useful than making a straight call to model.frame specifying the argument names as I do next?
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
mf <- model.frame(formula = formula, data = data,
subset = subset, weights = weights, na.action = na.action)
...
(Note that match.call has another use that I do not discuss: storing the call in the resulting object.)
One reason that is relevant here is that match.call captures the language of the call without evaluating it, and in this case it allows lm to treat some of the "missing" variables as "optional". Consider:
lm(x ~ y, data.frame(x=1:10, y=runif(10)))
Vs:
lm2 <- function (
formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...
) {
mf <- model.frame(
formula = formula, data = data, subset = subset, weights = weights
)
}
lm2(x ~ y, data.frame(x=1:10, y=runif(10)))
## Error in model.frame.default(formula = formula, data = data, subset = subset, :
## invalid type (closure) for variable '(weights)'
In lm2, since weights is "missing" but you still use it in weights=weights, R tries to use the stats::weights function which is clearly not what was intended. You could get around this by testing for missingness before you call model.frame, but at that point the match.call starts looking pretty good. Look at what happens if we debug the call:
debug(lm2)
lm2(x ~ y, data.frame(x=1:10, y=runif(10)))
## debugging in: lm2(x ~ y, data.frame(x = 1:10, y = runif(10)))
## debug at #5: {
## mf <- model.frame(formula = formula, data = data, subset = subset,
## weights = weights)
## }
Browse[2]> match.call()
## lm2(formula = x ~ y, data = data.frame(x = 1:10, y = runif(10)))
match.call doesn't involve the missing arguments at all.
You could argue that the optional arguments should have been made explicitly optional via default values, but that's not what happened here.
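For comparison, here is a stripped-down sketch of the match.call pattern that lm uses, applied to a toy function. lm_frame is a made-up name and only a subset of lm's arguments is handled. Because the call is rebuilt from the arguments that were actually supplied, the missing subset and weights never reach model.frame.

```r
# Hypothetical, cut-down version of the first few lines of lm().
lm_frame <- function(formula, data, subset, weights, na.action, ...) {
  mf <- match.call(expand.dots = FALSE)
  # Positions of the arguments that were actually supplied (0 if missing).
  m <- match(c("formula", "data", "subset", "weights", "na.action"),
             names(mf), 0L)
  # Keep the function slot plus the supplied arguments only.
  mf <- mf[c(1L, m)]
  # Redirect the call to model.frame and evaluate it in the caller's frame.
  mf[[1L]] <- quote(stats::model.frame)
  eval(mf, parent.frame())
}

d <- data.frame(x = 1:10, y = (1:10)^2)
mf <- lm_frame(x ~ y, d)
dim(mf)
## [1] 10  2
```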
Here's an example. In it, calc_1 is a function with loads of numerical arguments that wants to add and multiply them. It delegates this work to calc_2 , which is a subsidiary function that takes most of these arguments. But calc_2 also takes some extra arguments (q to t) which calc_1 can't supply from its own actual parameters. Instead, it passes them as extras.
The call to calc_2 would be truly horrendous if written so as to show everything calc_1 passes it. So instead, we assume that if calc_1 and calc_2 share a formal parameter, they give it the same name. This makes it possible to write a caller that works out which arguments calc_1 can pass to calc_2 , constructs a call that will do so, and feeds in the extra values to complete it. The comments in the code below should make this clear.
Incidentally, library "tidyverse" is only needed for %>% and str_c which I defined calc_2 with, library "rlang" for call2 and !!!, and library "assertthat" for one assertion. (Though in a realistic program, I'd put in assertions to check the arguments.)
Here's the output:
> calc_1( a=1, b=11, c=2, d=22, e=3, f=33, g=4, h=44, i=5, j=55, k=6
+ , l=66, m=7, n=77, o=8, p=88
+ )
[1] "87654321QRST"
And here's the code:
library( tidyverse )
library( rlang )
library( assertthat )
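# Thin wrapper, so that call_with_extras() (defined further
# down) is only looked up when the operator is used.
`%(%` <- function( f2, extras ) call_with_extras( f2, extras )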
#
# This is the operator for calling
# a function with arguments passed
# from its parent, supplemented
# with extras. See call_with_extras()
# below.
# A function with a very long
# argument list. It wants to call
# a related function which takes
# most of these arguments and
# so has a long argument list too.
# The second function takes some
# extra arguments.
#
calc_1 <- function( a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p )
{
calc_2 %(% list( t = "T", q = "Q", s = "S", r = "R" )
#
# Call it with those extras, passing
# all the others that calc_2() needs
# as well. %(% is my function for
# doing so: see below.
}
# The function that we call above. It
# uses its own arguments q to t , as
# well as those from calc_1() .
#
calc_2 <- function( a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t )
{
( a + c * 10 + e * 100 + g * 1000 + i * 10000 + k * 100000 +
m * 1000000 + o * 10000000 ) %>%
str_c( q, r, s, t )
}
# Calls function f2 . Passes f2 whichever
# arguments it needs from its caller.
# Corresponding formals should have the
# same name in both. Also passes f2 extra
# arguments from the named list extras.
# Its names should match the
# corresponding formals of f2 .
#
call_with_extras <- function( f2, extras )
{
f1_call <- match.call( sys.function(1), sys.call(1) )
# A call object.
f1_actuals <- as.list( f1_call %>% tail(-1) )
# Named list of f1's actuals.
f1_formals <- names( f1_actuals )
# Names of f1's formals.
f2_formals <- names( formals( f2 ) )
# Names of f2's formals.
f2_formals_from_f1 <- intersect( f2_formals, f1_formals )
# Names of f2's formals which f1 can supply.
f2_formals_not_from_f1 <- setdiff( f2_formals, f1_formals )
# Names of f2's formals which f1 can't supply.
extra_formals <- names( extras )
# Names of f2's formals supplied as extras.
assert_that( setequal( extra_formals, f2_formals_not_from_f1 ) )
# The last two should be equal.
f2_actuals_from_f1 <- f1_actuals[ f2_formals_from_f1 ]
# List of actuals which f1 can supply to f2.
f2_actuals <- append( f2_actuals_from_f1, extras )
# All f2's actuals.
f2_call <- call2( f2, !!! f2_actuals )
# Call to f2.
eval( f2_call )
# Run it.
}
# Test it.
#
calc_1( a=1, b=11, c=2, d=22, e=3, f=33, g=4, h=44, i=5, j=55, k=6
, l=66, m=7, n=77, o=8, p=88
)
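A base-R sketch of the same idea, for comparison. It is a simplification rather than a drop-in replacement: inner_fn and outer_fn are made-up stand-ins with short argument lists, and mget() evaluates the shared arguments instead of passing them on unevaluated as match.call does.

```r
# Hypothetical stand-ins for calc_1()/calc_2() with short argument lists.
inner_fn <- function(a, b, q) paste0(a + b, q)

outer_fn <- function(a, b) {
  # Formal names the two functions share.
  shared <- intersect(names(formals(inner_fn)), names(formals(outer_fn)))
  # Collect the current values of those arguments from this function's frame.
  shared_args <- mget(shared, envir = environment())
  # Add the extras that only inner_fn() takes, then call it.
  do.call(inner_fn, c(shared_args, list(q = "Q")))
}

res <- outer_fn(1, 2)
res
## [1] "3Q"
```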
I usually use the combination of colwise and tapply to calculate grouped values in a data frame. However, I found unexpectedly that the parameter FUN in tapply cannot work correctly with colwise from plyr. The example is as follows:
Data:
df <- data.frame(a = 1:10, b = rep(1:2, each = 5), c = 2:11)
Normal:
library(plyr)
colwise(tapply)(subset(df, select = c(a, c)), df$b, function(x){sum(x[x > 2])})
Above code is correct and can work normally. But if I add FUN, it will be wrong:
colwise(tapply)(subset(df, select = c(a, c)), df$b, FUN = function(x){sum(x[x > 2])})
Error is:
Error in FUN(X[[1L]], ...) :
unused arguments (function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
{
FUN <- if (!is.null(FUN)) match.fun(FUN)
if (!is.list(INDEX)) INDEX <- list(INDEX)
nI <- length(INDEX)
if (!nI) stop("'INDEX' is of length zero")
namelist <- vector("list", nI)
names(namelist) <- names(INDEX)
extent <- integer(nI)
nx <- length(X)
one <- 1
group <- rep.int(one, nx)
ngroup <- one
for (i in seq_along(INDEX)) {
index <- as.factor(INDEX[[i]])
if (length(index) != nx) stop("arguments must have same length")
namelist[[i]] <- levels(index)
extent[i] <- nlevels(index)
group <- group + ngroup * (as.integer(index) - one)
ngroup <- ngroup * nlevels(index)
}
if (is.null(FUN)) return(group)
ans <- lapply(X = split(X, group), FUN = FUN, ...)
index <- as.integer(names(ans))
if (simplify && all(unlist(lapply(ans, length)) == 1)) {
ansmat <- array(dim = extent, dimnames = namelist)
Could anyone explain the reason? Thank you in advance.
Well, the issue is that both lapply and tapply have an optional FUN argument. Note that colwise(tapply) is a function with the following line:
out <- do.call("lapply", c(list(filtered, .fun, ...), dots))
Let's go to this line with our debugger by writing
ct <- colwise(tapply); trace(ct, quote(browser()), at = 6)
and then running
ct(subset(df, select = c(a, c)), df$b, FUN = function(x){sum(x[x > 2])})
Now let's print c(list(filtered, .fun, ...), dots). Notice that the first three (unnamed) arguments are now the data frame, tapply, and df$b, with the FUN argument above coming in last. However, this argument is named. Since this is a do.call on lapply, instead of that argument becoming an optional parameter for tapply, it is matched to lapply's own FUN parameter! So what is happening is that you are turning this into:
lapply(subset(df, select = c(a, c)), function(x){sum(x[x > 2])}, tapply, df$b)
This, of course, makes no sense, and if you execute the above (still in your debugger) manually, you will get the exact same error you are getting. For a simple workaround, try:
tapply2 <- function(.FUN, ...) tapply(FUN = .FUN, ...)
colwise(tapply2)(subset(df, select = c(a, c)), df$b, .FUN = function(x){sum(x[x > 2])})
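The argument-matching collision described above can be reproduced in base R without plyr. In this sketch (double is a made-up helper), the named FUN = takes lapply's FUN slot, so the function that was meant to be applied falls into ... and is handed to double() as an unexpected second argument.

```r
# Hypothetical one-argument function.
double <- function(x) x * 2

# Intended: lapply(list(1:3), sum). The named FUN = is matched first,
# so `sum` lands in `...` and the call becomes double(1:3, sum).
res <- tryCatch(
  lapply(list(1:3), sum, FUN = double),
  error = function(e) e
)

inherits(res, "error")
## [1] TRUE
```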
The plyr package should be checking for ... arguments named FUN (or anything that can interfere with lapply's job), but it doesn't seem the author included this. You can submit a pull request to the plyr package that implements any of the following workarounds:
Defines a local
.lapply <- function(`*X*`, `*FUN*`, ...) lapply(X = `*X*`, `*FUN*`, ...)
(minimizing interference further).
Checks names(list(...)) within the colwise(tapply) function for X and FUN (can introduce problems if the author intended to prevent evaluation of promises until the child call).
Calls do.call("lapply", ...) explicitly with named X and FUN, so that you get the intended error:
formal argument "FUN" matched by multiple actual arguments