Why is match.call useful?

In the body of some R functions, for example lm, I see calls to the match.call function. As its help page says, when used inside a function, match.call returns a call in which all arguments are specified by their full names; this is supposed to be useful for passing a large number of arguments to other functions.
For example, in the lm function we see a call to the function model.frame...
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action",
"offset"), names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(mf, parent.frame())
...
Why is this more useful than calling model.frame directly and specifying the argument names, as I do next?
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
mf <- model.frame(formula = formula, data = data,
subset = subset, weights = weights)
...
(Note that match.call has another use that I do not discuss here: storing the call in the resulting object.)

One reason that is relevant here is that match.call captures the language of the call without evaluating it, and in this case it allows lm to treat some of the "missing" variables as "optional". Consider:
lm(x ~ y, data.frame(x=1:10, y=runif(10)))
Vs:
lm2 <- function (
formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...
) {
mf <- model.frame(
formula = formula, data = data, subset = subset, weights = weights
)
}
lm2(x ~ y, data.frame(x=1:10, y=runif(10)))
## Error in model.frame.default(formula = formula, data = data, subset = subset, :
## invalid type (closure) for variable '(weights)'
In lm2, weights is "missing", yet weights = weights still hands the symbol weights to model.frame. model.frame evaluates that expression in data and then in the environment of the formula, not in lm2's frame, so R finds the stats::weights function, which is clearly not what was intended. You could get around this by testing for missingness before you call model.frame, but at that point match.call starts looking pretty good. Look at what happens if we debug the call:
debug(lm2)
lm2(x ~ y, data.frame(x=1:10, y=runif(10)))
## debugging in: lm2(x ~ y, data.frame(x = 1:10, y = runif(10)))
## debug at #5: {
## mf <- model.frame(formula = formula, data = data, subset = subset,
## weights = weights)
## }
Browse[2]> match.call()
## lm2(formula = x ~ y, data = data.frame(x = 1:10, y = runif(10)))
The call returned by match.call does not include the missing arguments at all.
You could argue that the optional arguments should have been made explicitly optional via default values, but that's not what happened here.
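The idiom from lm() can be reduced to a small, self-contained sketch. lm3() and the toy data frame d are made up for illustration; the body mirrors the lines quoted from lm() above. Only the arguments the caller actually supplied survive in the rebuilt call, so leaving out weights is harmless, and evaluating the call in parent.frame() lets model.frame() see the caller's variables.

```r
# Hypothetical lm()-like wrapper: forwards only the supplied arguments
lm3 <- function(formula, data, subset, weights, na.action) {
  mf <- match.call()
  # Positions of the arguments to forward (0 = not supplied)
  keep <- match(c("formula", "data", "subset", "weights", "na.action"),
                names(mf), 0L)
  mf <- mf[c(1L, keep)]                 # function name + supplied args only
  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- quote(stats::model.frame) # re-target the call
  eval(mf, parent.frame())              # evaluate where lm3() was called
}

d <- data.frame(x = 1:10, y = c(2, 4, 3, 5, 7, 6, 8, 9, 11, 10))

mf <- lm3(y ~ x, d)                     # no weights supplied: no error
dim(mf)
#> [1] 10  2

mf_w <- lm3(y ~ x, d, weights = rep(2, 10))
names(mf_w)
#> [1] "y"         "x"         "(weights)"
```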

Here's an example. In it, calc_1 is a function with loads of numerical arguments that it wants to add and multiply. It delegates this work to calc_2, a subsidiary function that takes most of these arguments. But calc_2 also takes some extra arguments (q to t) which calc_1 can't supply from its own actual parameters. Instead, it passes them as extras.
The call to calc_2 would be truly horrendous if written so as to show everything calc_1 passes it. So instead, we assume that if calc_1 and calc_2 share a formal parameter, they give it the same name. This makes it possible to write a caller that works out which arguments calc_1 can pass to calc_2, constructs a call that will do so, and feeds in the extra values to complete it. The comments in the code below should make this clear.
Incidentally, the tidyverse package is only needed for %>% and str_c(), which calc_2 uses; rlang for call2() and the !!! splice operator; and assertthat for one assertion. (In a realistic program, I'd also put in assertions to check the arguments.)
Here's the output:
> calc_1( a=1, b=11, c=2, d=22, e=3, f=33, g=4, h=44, i=5, j=55, k=6
+ , l=66, m=7, n=77, o=8, p=88
+ )
[1] "87654321QRST"
And here's the code:
library( tidyverse )
library( rlang )
library( assertthat )
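`%(%` <- function( f2, extras ) call_with_extras( f2, extras )
#
# This is the operator for calling
# a function with arguments passed
# from its parent, supplemented
# with extras. See call_with_extras()
# below. It is a thin wrapper rather
# than an alias, because
# call_with_extras() is only defined
# further down this script.
#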
# A function with a very long
# argument list. It wants to call
# a related function which takes
# most of these arguments and
# so has a long argument list too.
# The second function takes some
# extra arguments.
#
calc_1 <- function( a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p )
{
calc_2 %(% list( t = "T", q = "Q", s = "S", r = "R" )
#
# Call it with those extras, passing
# all the others that calc_2() needs
# as well. %(% is my function for
# doing so: see below.
}
# The function that we call above. It
# uses its own arguments q to t, as
# well as those from calc_1().
#
calc_2 <- function( a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t )
{
( a + c * 10 + e * 100 + g * 1000 + i * 10000 + k * 100000 +
m * 1000000 + o * 10000000 ) %>%
str_c( q, r, s, t )
}
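# Calls function f2. Passes f2 whichever
# arguments it needs from its caller.
# Corresponding formals should have the
# same name in both. Also passes f2 extra
# arguments from the named list extras,
# whose names should match the
# corresponding formals of f2.
# Note: sys.function(1) and sys.call(1)
# refer to frame 1, the outermost call
# on the stack, so the caller (calc_1)
# must be called from top level.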
#
call_with_extras <- function( f2, extras )
{
f1_call <- match.call( sys.function(1), sys.call(1) )
# A call object.
f1_actuals <- as.list( f1_call %>% tail(-1) )
# Named list of f1's actuals.
f1_formals <- names( f1_actuals )
# Names of f1's formals.
f2_formals <- names( formals( f2 ) )
# Names of f2's formals.
f2_formals_from_f1 <- intersect( f2_formals, f1_formals )
# Names of f2's formals which f1 can supply.
f2_formals_not_from_f1 <- setdiff( f2_formals, f1_formals )
# Names of f2's formals which f1 can't supply.
extra_formals <- names( extras )
# Names of f2's formals supplied as extras.
assert_that( setequal( extra_formals, f2_formals_not_from_f1 ) )
# The last two should be equal.
f2_actuals_from_f1 <- f1_actuals[ f2_formals_from_f1 ]
# List of actuals which f1 can supply to f2.
f2_actuals <- append( f2_actuals_from_f1, extras )
# All f2's actuals.
f2_call <- call2( f2, !!! f2_actuals )
# Call to f2.
eval( f2_call )
# Run it.
}
# Test it.
#
calc_1( a=1, b=11, c=2, d=22, e=3, f=33, g=4, h=44, i=5, j=55, k=6
, l=66, m=7, n=77, o=8, p=88
)
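For comparison, here is a minimal base-R sketch of the same idea that needs neither tidyverse, rlang nor assertthat. pass_with_extras(), inner_fn(), outer_fn() and wrapper() are hypothetical names. Two design choices differ from call_with_extras() and are assumptions of this sketch: it looks at the immediate caller with sys.function(-1) and sys.call(-1) instead of frame 1, so the caller need not be called from top level; and it fetches the shared arguments with mget() from the caller's frame, so they are evaluated where they were written.

```r
pass_with_extras <- function(f2, extras) {
  # The immediate caller (one frame back), not frame 1
  caller_fun  <- sys.function(-1)
  caller_call <- sys.call(-1)
  # Names of the arguments the caller was actually called with
  supplied <- names(as.list(match.call(caller_fun, caller_call)))[-1]
  f2_formals <- names(formals(f2))
  shared <- intersect(f2_formals, supplied)
  # Every formal of f2 the caller can't supply must come from extras
  stopifnot(setequal(names(extras), setdiff(f2_formals, supplied)))
  # Force the shared arguments in the caller's frame, where they live
  shared_values <- mget(shared, envir = parent.frame())
  do.call(f2, c(shared_values, extras))
}

# Toy functions for illustration (not the calc_1/calc_2 above)
inner_fn <- function(a, c, q) paste0(a + 10 * c, q)
outer_fn <- function(a, b, c) pass_with_extras(inner_fn, list(q = "Q"))

outer_fn(a = 1, b = 2, c = 3)
#> [1] "31Q"

# Still works when outer_fn() is not called from top level
wrapper <- function() outer_fn(a = 1, b = 2, c = 3)
wrapper()
#> [1] "31Q"
```

Because mget() only touches the shared names, arguments the callee does not need (b here) are never evaluated.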

Related

Store the arguments an R function expects in a character vector

Is it possible to get the information which arguments are expected by a function and then store it in a character vector?
I know args(foo), but it only prints this information (followed by NULL); it doesn't give me the argument names as a vector.
Why do I need this?
I want to work with the three-dot argument (dot dot dot, ...) and pass it on to different functions.
Let me explain...
The following simple case works.
data <- c(1:10)
cv <- function(x, ...) {
numerator <- mean(x, ...)
denominator <- sd(x, ...)
return(numerator / denominator)
}
cv(data, na.rm = TRUE)
However, in a slightly different case, R will not figure out automatically which arguments match which function.
data <- c(1:10)
roundCv <- function(x, ...) {
numerator <- mean(x, ...)
denominator <- sd(x, ...)
result <- round(numerator / denominator, ...)
return(result)
}
roundCv(data, na.rm = TRUE, digits = 2)
# Error in sd(x, ...) : unused argument (digits = 2)
If I want to separate those arguments, it gets a little hairy. The approach is not generic but has to be adapted to all functions involved.
data <- c(1:10)
roundCv2 <- function(x, ...) {
args <- list(...)
args1 <- args[ names(args) %in% "na.rm"] # For mean/sd
args2 <- args[!names(args) %in% "na.rm"] # For round
numerator <- do.call("mean", c(list(x = x), args1))
denominator <- do.call("sd", c(list(x = x), args1))
tmp <- numerator / denominator
do.call("round", c(list(x = tmp), args2))
}
roundCv2(data, na.rm = TRUE, digits = 2)
Is there a simple way to do this?!
If I knew the arguments each function expects, I could handle it generically. That's why I'm asking:
Is it possible to get the information which arguments are expected by a function and then store it in a character vector?
A shout-out to MrFlick for pointing to similar questions and giving the answer in the comments.
You can use formals() to get a list-like object back, e.g. names(formals(...)), but it won't work for primitive functions.
More details can be found here: https://stackoverflow.com/a/4128401/1553796
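A small sketch of both cases follows. Reading primitives via formals(args(f)) relies on args() building a closure with the same signature as the primitive. args_for() and roundCv3() are hypothetical helpers written for this example. Note that mean() is an S3 generic whose formals are only (x, ...), so the sketch inspects mean.default() instead.

```r
# Closures: formals() works directly
names(formals(sd))
#> [1] "x"     "na.rm"

# Primitives: formals() returns NULL, but args() builds a closure with
# the same signature, whose formals can be read instead
is.null(formals(sum))
#> [1] TRUE
names(formals(args(sum)))
#> [1] "..."   "na.rm"

# Hypothetical helper: keep only the entries of `dots` whose names are
# formal arguments of `fun`
args_for <- function(fun, dots) {
  dots[names(dots) %in% names(formals(args(fun)))]
}

# Hypothetical generic version of roundCv2()
roundCv3 <- function(x, ...) {
  dots <- list(...)
  # mean() is an S3 generic with formals (x, ...), so inspect the
  # method that actually takes na.rm, mean.default()
  numerator   <- do.call(mean.default, c(list(x), args_for(mean.default, dots)))
  denominator <- do.call(sd, c(list(x), args_for(sd, dots)))
  do.call(round, c(list(numerator / denominator), args_for(round, dots)))
}

roundCv3(c(1:10, NA), na.rm = TRUE, digits = 2)
#> [1] 1.82
```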

How to evaluate a function with different arguments without having to keep writing it out in R [duplicate]

This question already has answers here: Grouping functions (tapply, by, aggregate) and the *apply family
I have a function in which I would like to keep every argument fixed apart from a single one.
ls <- score_model_compound(data, pred, tmp$Prediction, score= "log")
bs <- score_model_compound(data, pred, tmp$Prediction, score="Brier")
ss <- score_model_compound(data, pred, score="spherical")
What I would like is something like
ls = data.frame()
ls <- score_model_compound(data, pred, score= c("log", "Brier", "spherical"))
Is there a function I can use, like apply(), which lets me do this?
Thank you
You can create some kind of wrapping function with only the first argument being the one you want to vary and then pass it to lapply:
## Creating the wrapping function
my.wrapping.function <- function(score, data, pred, tmp) {
return(score_model_compound(data = data,
pred = pred,
tmp = tmp,
score = score))
}
## Making the list of variables
my_variables <- as.list(c("log", "Brier", "spherical"))
## Running the function for all the variables (with the set specific arguments)
result_list <- lapply(my_variables,
my.wrapping.function,
data = data, pred = pred, tmp = tmp$Prediction)
And finally, to transform it into a data.frame (or matrix), you can use the do.call(cbind, ...) function on the results:
## Combining the results into a table
my_results_table <- do.call(cbind, result_list)
Does that answer your question?
mapply() to the rescue:
score_v = c('spherical', 'log', 'Brier')
l = mapply(
score_model_compound, # function
score = score_v, # variable argument vector
MoreArgs = list(data = data, # fixed arguments
pred = pred),
SIMPLIFY = FALSE # don't simplify
)
You will probably have to tweak it a little yourself, since you didn't provide a reproducible example. mapply(SIMPLIFY = FALSE) returns a list. If the function returns data frames, the resulting list of data frames can subsequently be bound with e.g. data.table::rbindlist().
Alternatively you could just use a loop:
l = list()
for (i in seq_along(score_v)) {
sc = score_v[i]
message('Modeling: ', sc)
smc = score_model_compound(data, pred, score = sc)
l[[i]] = smc
names(l)[i] = sc
}
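Since score_model_compound() and the data were not provided, the sketch below uses a made-up stand-in with a similar call shape, so it only illustrates the pattern. It shows an lapply() version with an anonymous function instead of a named wrapper, and Map(), which is mapply() with SIMPLIFY = FALSE, with the fixed arguments supplied through MoreArgs.

```r
# Hypothetical stand-in for score_model_compound(); the real function's
# signature is unknown, so this toy version returns a one-row data frame.
score_model_compound <- function(data, pred, score = "log") {
  data.frame(score = score, n = length(pred), mean_pred = mean(pred))
}

data   <- data.frame(y = c(0, 1, 1, 0))     # made-up inputs
pred   <- c(0.2, 0.7, 0.9, 0.4)
scores <- c("log", "Brier", "spherical")

# 1. lapply() with an anonymous function: only `score` varies
res <- lapply(scores, function(s) score_model_compound(data, pred, score = s))
names(res) <- scores

# 2. Map() is mapply(SIMPLIFY = FALSE); fixed arguments go in MoreArgs
res2 <- Map(score_model_compound, score = scores,
            MoreArgs = list(data = data, pred = pred))

# Stack the one-row data frames into a single table
score_table <- do.call(rbind, res)
score_table
```

Map() names its result after the character vector it iterates over, so res2$Brier works without an extra names() step.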

Handling missing arguments, handled with missing(), to call a function inside a function

I am using rpart() inside a function myFunction(). rpart() accepts several parameters which are handled using the missing() function:
rpart(formula, data, weights, subset, na.action = na.rpart, method, model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ...)
For example, the parameter method can be left unspecified, and is handled inside rpart() using the following code:
if (missing(method)) method <- "whatever default"
How can I pass the argument method as a parameter for myFunction() in the most simple and efficient way so that it handles the default missing argument?
If I do something like
myFunction <- function(foo = 0, method) { rpart(y ~ x, data = data, method = method) } # plus some other code
then this throws an error,
argument "method" is missing, with no default
I have also tried with functions like rlang::missing() with no success whatsoever.
Of course, one option is to pass something like myFunction(method = NULL) and then use if-else statements to either pass or not pass this argument, but then I have to code each possibility (for 4 arguments that would be 16 calls), which is very clumsy.
Note that I would also like to avoid using the ellipsis, as I want to specifically name my arguments.
MINIMAL REPRODUCIBLE EXAMPLE:
library(rpart)
y <- c(0,0.1,0.1,-0.1, 100, 101, 99)
x <- c(1,2,3,4, 100,101,102)
myFunction <- function(x, y,
method,
weights,
subset,
parms){
rpart(formula = y ~ .,
data = data.frame(y, x),
weights = weights,
subset = subset,
parms = parms)
}
myFunction(x,y)
Error in eval(extras, data, env) : argument "weights" is missing,
with no default
Here's a solution using match.call. This kind of pattern is seen quite often inside base R functions.
Consider the following function which we might find inside a package, with optional arguments:
package_fun <- function(x, method1, method2, method3)
{
if(missing(method1)) method1 <- "Unspecified"
if(missing(method2)) method2 <- "Unspecified"
if(missing(method3)) method3 <- "Unspecified"
data.frame(x, method1, method2, method3)
}
Inside our own function, we can build a call to package_fun that swaps in our own optional parameters, swaps out any we don't want to pass, and adds any additional ones we choose. We are left with a single call to package_fun, and don't need to worry about combinatorial explosion:
myFunction <- function(foo = 0, method1, method2, method3)
{
mc <- match.call()
mc[[1]] <- quote(package_fun)
mc$foo <- NULL # drop foo from the call (a no-op if it was not supplied)
mc$x <- foo
eval(mc, envir = parent.frame())
}
So now we can do:
myFunction(foo = 1, method1 = "Specified", method3 = "Specified")
#> x method1 method2 method3
#> 1 1 Specified Unspecified Specified
From the point of view of your reproducible example, this would look like:
myFunction <- function(x, y,
method,
weights,
subset,
parms){
mc <- match.call()
mc[[1]] <- quote(rpart)
mc$formula <- y ~ .
mc$data <- data.frame(y, x)
mc$x <- NULL
mc$y <- NULL
eval(mc, envir = parent.frame())
}
So we would have:
myFunction(x,y)
#> n= 7
#>
#> node), split, n, deviance, yval
#> * denotes terminal node
#>
#> 1) root 7 17136.31 42.87143 *
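An alternative sketch that avoids editing the call: collect the optional arguments the caller actually supplied (their names come from match.call()), fetch their values with mget(), and hand everything to do.call(). package_fun() is the toy function from the answer above; myFunction2() is a hypothetical name. Default values of the wrapper's own arguments (here foo = 0) need no special handling, since foo is read as an ordinary variable.

```r
package_fun <- function(x, method1, method2, method3) {
  if (missing(method1)) method1 <- "Unspecified"
  if (missing(method2)) method2 <- "Unspecified"
  if (missing(method3)) method3 <- "Unspecified"
  data.frame(x, method1, method2, method3)
}

myFunction2 <- function(foo = 0, method1, method2, method3) {
  optional <- c("method1", "method2", "method3")
  # Optional arguments that were actually supplied in this call
  supplied <- intersect(optional, names(match.call())[-1])
  # Fetch (and force) their values from this function's frame
  args <- mget(supplied, envir = environment())
  do.call(package_fun, c(list(x = foo), args))
}

myFunction2(foo = 1, method1 = "Specified", method3 = "Specified")
#>   x   method1     method2   method3
#> 1 1 Specified Unspecified Specified
```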

In R, How do I properly pass a function and a set of parameters to said function so that it executes properly?

I am working on an R project, and I have many different functions (I'm calculating RMSEs on various data sets with various requirements).
I am currently using the "do.call()" function to invoke the function name I'm passing in, but this causes my whole system to stall and nothing works. This has happened many times over, and I've had to restart RStudio (I'm using R 4.0.2).
I would like to pass a function as an argument to my parent function (which is recursive, but only two levels deep), and I would like to be able to pass the parameters from the parent function on to the child functions, as well as to the recursive call.
I'm not sure of the correct execution of this.
Any help on where I'm going wrong is greatly appreciated.
Currently, my code is as follows:
#find_generic_lambda is the parent function that is called, and the FUN argument is the named function I would like to pass in to execute inside
find_generic_lambda <- function(seq_start, seq_end, seq_increment, FUN, detailed_flag = FALSE, training_set, testing_set)
{
lambdas <- seq(seq_start, seq_end, seq_increment)
params = c(lambdas, train_set, test_set)
#invoking the passed-in function here with the parameters I'm setting
#this is where the code stumbles
RMSE <- sapply(lambdas, do.call(FUN, params))
#find the smallest lambda
qplot(lambdas, RMSE)
#saving the first round lambda
min_lambda_first_try <- lambdas[which.min(RMSE)]
min_lambda_first_try
if (detailed_flag)
{
#if this is the first iteration of the function, continue with taking a 10% lower and 10% higher lambda value to iterate through new lambdas that are much more granular, with increments at 10% of what they were previously.
new_lambda_range = (seq_end + seq_start)/10
new_lambda_range
min_lambda_first_try <- find_generic_lambda(seq_start = min_lambda_first_try - new_lambda_range, seq_end = min_lambda_first_try + new_lambda_range,
seq_increment = seq_increment/10, FUN, detailed_flag = FALSE, training_set = training_set, testing_set = testing_set)
}
return (min_lambda_first_try)
}
#this is one of the functions that will be passed in as a parameter
regularized_rmse_3 <- function(l, train_set, test_set)
{
mu <- mean(train_set$rating)
just_the_sum <- train_set %>%
group_by(movieId) %>%
summarize(s = sum(rating - mu), n_i = n())
predicted_ratings <- test_set %>%
left_join(just_the_sum, by='movieId') %>%
mutate(b_i = s/(n_i+l)) %>%
mutate(pred = mu + b_i) %>%
pull(pred)
return(RMSE(predicted_ratings, test_set$rating))
}
rmse3_lambda <- find_generic_lambda(seq_start=0, seq_end=10, seq_increment=0.5,
FUN="regularized_rmse_3",
detailed_flag = TRUE, training_set=training_set, testing_set=testing_set)
Expanding on my comments:
Here's a simplified version of your functions (so I can make an example dataset):
f <- function (l_candidate, FUN) {
RMSE <- sapply(l_candidate, FUN)
l_min_RMSE <- l_candidate[which.min(RMSE)]
return(l_min_RMSE)
}
g <- function (l, trainset, testset) {
p <- mean(trainset + l)
sqrt(mean((testset - p)^2))
}
trainset <- c(1, 1, 2, 1)
testset <- c(3, 4)
Then:
f(1:5, FUN = function (x) g(x, trainset, testset))
# [1] 2
So you pass the function g via a wrapper function into f and it will do the job for you.
Alternative
R allows you to create a function out of another function:
g <- function (trainset, testset) function (l) {
p <- mean(trainset + l)
sqrt(mean((testset - p)^2))
}
g1 <- g(trainset, testset)
g1(1)
# [1] 1.346291
In this situation, g() takes two arguments and returns a function that takes one argument, l. So you can create a new function g1() out of g().
Then you can pass it to your parent function giving you the same results in this example:
f(1:5, FUN = g1)
# [1] 2
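Another option, sketched below, is to let sapply() forward the extra arguments itself: sapply(X, FUN, ...) calls FUN(X[[i]], ...), so anything passed through the parent function's ... reaches FUN unchanged. find_min_lambda() and g_rmse() are hypothetical names; g_rmse() is just the simplified g() from above. This also shows where the original code went wrong: do.call(FUN, params) calls FUN once, immediately, instead of giving sapply() a function to call.

```r
find_min_lambda <- function(l_candidate, FUN, ...) {
  # sapply() calls FUN(l, ...) for every candidate l, forwarding ...
  RMSE <- sapply(l_candidate, FUN, ...)
  l_candidate[which.min(RMSE)]
}

g_rmse <- function(l, trainset, testset) {
  p <- mean(trainset + l)
  sqrt(mean((testset - p)^2))
}

find_min_lambda(1:5, g_rmse, trainset = c(1, 1, 2, 1), testset = c(3, 4))
#> [1] 2
```

In the question's setting this would look something like find_min_lambda(lambdas, regularized_rmse_3, train_set = training_set, test_set = testing_set), assuming those data frames exist.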

Split up `...` arguments and distribute to multiple functions

Using the following function foo() as a simple example, I'd like to distribute the values given in ... to two different functions, if possible.
foo <- function(x, y, ...) {
list(sum = sum(x, ...), grep = grep("abc", y, ...))
}
In the following example, I would like na.rm to be passed to sum(), and value to be passed to grep(). But I get an error for an unused argument in grep().
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, na.rm = TRUE, value = TRUE)
# Error in grep("abc", y, ...) : unused argument (na.rm = TRUE)
It seems like the arguments were sent to grep() first. Is that correct? I would think R would see and evaluate sum() first, and return an error for that case.
Furthermore, when trying to split up the arguments in ..., I ran into trouble. sum()'s formal arguments are NULL because it is a .Primitive, and therefore I cannot use
names(formals(sum)) %in% names(list(...))
I also don't want to assume that the leftover arguments from
names(formals(grep)) %in% names(list(...))
are to automatically be passed to sum().
How can I safely and efficiently distribute ... arguments to multiple functions so that no unnecessary evaluations are made?
In the long-run, I'd like to be able to apply this to functions with a long list of ... arguments, similar to those of download.file() and scan().
Separate Lists
If you really want to pass different sets of parameters to different functions then it's probably cleaner to specify separate lists:
foo <- function(x, y, sum = list(), grep = list()) {
list(sum = do.call("sum", c(x, sum)), grep = do.call("grep", c("abc", y, grep)))
}
# test
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, sum = list(na.rm = TRUE), grep = list(value = TRUE))
## $sum
## [1] 55
##
## $grep
## [1] "xyzabcxyz"
Hybrid list / ...
An alternative is that we could use ... for one of these and then specify the other as a list, particularly in the case that one of them is frequently used and the other is infrequently used. The frequently used one would be passed via ... and the infrequently used via a list. e.g.
foo <- function(x, y, sum = list(), ...) {
list(sum = do.call("sum", c(x, sum)), grep = grep("abc", y, ...))
}
foo(X, Y, sum = list(na.rm = TRUE), value = TRUE)
Here are a couple of examples of the hybrid approach from R itself:
i) The mapply function takes that approach using both ... and a MoreArgs list:
> args(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
NULL
ii) nls also takes this approach using both ... and the control list:
> args(nls)
function (formula, data = parent.frame(), start, control = nls.control(),
algorithm = c("default", "plinear", "port"), trace = FALSE,
subset, weights, na.action, model = FALSE, lower = -Inf,
upper = Inf, ...)
NULL
Why does grep error before sum?
See that sum is a lot more accommodating with its arguments:
X <- c(1:5, NA, 6:10)
sum(X, na.rm = TRUE, value = TRUE)
## [1] 56
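It doesn't fail because sum() accepts any extra arguments through its ... and adds them up: value = TRUE is just another value, TRUE, which counts as 1. So sum() is in fact evaluated first, but it silently returns a wrong answer, and only grep() raises an error. Incidentally: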
sum(X, na.rm = TRUE)
## [1] 55
How to split ... to different functions?
One method (that is very prone to error) is to look for the args for the target functions. For instance:
foo <- function(x, y, ...){
argnames <- names(list(...))
sumargs <- intersect(argnames, names(as.list(args(sum))))
grepargs <- intersect(argnames, names(as.list(args(grep))))
list(sum = do.call(sum, c(list(x), list(...)[sumargs])),
grep = do.call(grep, c(list("abc", y), list(...)[grepargs])))
}
This is prone to error whenever the arguments a function uses are not properly reported by args, such as for S3 generics. As an example:
names(as.list(args(plot)))
## [1] "x" "y" "..." ""
names(as.list(args(plot.default)))
## [1] "x" "y" "type" "xlim" "ylim"
## [6] "log" "main" "sub" "xlab" "ylab"
## [11] "ann" "axes" "frame.plot" "panel.first" "panel.last"
## [16] "asp" "..." ""
In this case, you could substitute the appropriate S3 method (here plot.default). Because of this, I don't have a generalized solution (though I don't know whether one exists).
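A sketch of a somewhat safer version of this approach: route each named argument in ... to every target whose signature lists that name, use formals(args(f)) so that primitives such as sum() work too, and stop with an error if no target accepts an argument instead of silently dropping it. split_dots() and foo_split() are hypothetical names. The sketch still has the S3-generic limitation just described, and list(...) evaluates every argument in ... once, up front.

```r
split_dots <- function(dots, targets) {
  # Argument names each target declares (works for primitives via args())
  accepted <- lapply(targets, function(f) names(formals(args(f))))
  unused <- setdiff(names(dots), unlist(accepted))
  if (length(unused) > 0)
    stop("no target function accepts: ", paste(unused, collapse = ", "))
  # One sub-list of ... per target
  lapply(accepted, function(nms) dots[names(dots) %in% nms])
}

foo_split <- function(x, y, ...) {
  parts <- split_dots(list(...), list(sum = sum, grep = grep))
  list(sum  = do.call(sum, c(list(x), parts$sum)),
       grep = do.call(grep, c(list("abc", y), parts$grep)))
}

X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
res <- foo_split(X, Y, na.rm = TRUE, value = TRUE)
res$sum
#> [1] 55
res$grep
#> [1] "xyzabcxyz"
```

Unlike passing ... straight to sum(), value = TRUE no longer leaks into the sum, so the result is 55 rather than 56.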
You can only pass the ... argument to another function if that other function includes all named arguments that you pass in ... or has a ... argument itself. So for sum, this is no problem (args(sum) returns function (..., na.rm = FALSE)). On the other hand, grep has neither na.rm nor ... as an argument.
args(grep)
# function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
# fixed = FALSE, useBytes = FALSE, invert = FALSE)
This does not include ... and also does not include a named argument na.rm either. A simple solution is to just define your own function mygrep as follows:
mygrep <- function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE, ...)
grep(pattern, x, ignore.case, perl, value, fixed, useBytes, invert)
Then it no longer errors (note, though, that sum() still receives value = TRUE, which is why $sum is 56 rather than 55):
foo <- function(x, y, ...){
list(sum = sum(x, ...), grep = mygrep("abc", y, ...))
}
X <- c(1:5, NA, 6:10)
Y <- "xyzabcxyz"
foo(X, Y, na.rm = TRUE, value = TRUE)
# $sum
# [1] 56
#
# $grep
# [1] "xyzabcxyz"
This answer does not directly address the original question, but it could be helpful to others who experience a similar problem with their own functions (as opposed to existing functions like sum and grep).
@shadow's answer contains an insight that points to a very simple solution in such cases: just make sure your nested functions have ... as an argument, and you won't get the unused argument error.
For example:
nested1 <- function(x, a) {
x + a
}
nested2 <- function(x, b) {
x - b
}
f <- function(x, ...) {
if (x >= 0) {
nested1(x, ...)
} else {
nested2(x, ...)
}
}
If we call f(x = 2, a = 3, b = 4) we get an error: Error in nested1(x, ...) : unused argument (b = 4).
But just add a ... to the formals of nested1 and nested2 and run again:
nested1 <- function(x, a, ...) {
x + a
}
nested2 <- function(x, b, ...) {
x - b
}
Now, f(x = 2, a = 3, b = 4) yields the desired result: 5. Problem solved.
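One trade-off worth keeping in mind, shown in the small sketch below: once a function has ..., a misspelled argument name is absorbed silently instead of triggering the "unused argument" error. The name bb is a deliberate typo for illustration.

```r
nested1 <- function(x, a, ...) {
  x + a
}
# `bb` matches no formal, so it lands in ... and is ignored
nested1(2, a = 3, bb = 4)
#> [1] 5
```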
