Is there a way to retrieve function arguments from an evaluated formula that are not specified in the function call?
For example, consider the call seq(1, 10). If I wanted to get the first argument, I could quote the call and index into it: quote(seq(1, 10))[[2]] (element [[1]] is the function name, seq). However, this only works if the argument is supplied in the function call (rather than left at its default value), and I need to know its exact position.
In this example, is there some way to get the by argument from seq(1, 10) without a lengthy list of if statements to see if it is defined?
The first thing to note is that all of the named arguments you're after (from, to, by, etc.) belong to seq.default(), the method that is dispatched by your call to seq(), and not to seq() itself. (seq() itself only has one formal, ...).
From there you can use these two building blocks
## (1) Retrieves pairlist of all formals
formals(seq.default)
# [long pairlist object omitted to save space]
## (2) Matches supplied arguments to formals
match.call(definition = seq.default, call = quote(seq.default(1,10)))
# seq.default(from = 1, to = 10)
to do something like this:
modifyList(formals(seq.default),
as.list(match.call(seq.default, quote(seq.default(1,10))))[-1])
# $from
# [1] 1
#
# $to
# [1] 10
#
# $by
# ((to - from)/(length.out - 1))
#
# $length.out
# NULL
#
# $along.with
# NULL
#
# $...
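Wrapped up as a small helper, the same idea might look like the sketch below. The name get_arg() is made up for illustration. It assumes you already know which method the call dispatches to (here seq.default), and it returns defaults unevaluated, exactly as they appear in the function definition.

```r
# Hypothetical helper: look up one argument of a quoted call,
# falling back to the function's default when it was not supplied.
get_arg <- function(call, fun, name) {
  supplied <- as.list(match.call(fun, call))[-1]   # drop the function name
  all_args <- modifyList(as.list(formals(fun)), supplied)
  all_args[[name]]
}

get_arg(quote(seq(1, 10)), seq.default, "to")
# [1] 10
get_arg(quote(seq(1, 10)), seq.default, "by")          # unevaluated default expression
get_arg(quote(seq(1, 10, by = 2)), seq.default, "by")
# [1] 2
```

Because the default for by is an expression in terms of the other arguments, you would still have to evaluate it yourself if you need a number rather than the expression.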
The syntax for using scales::label_percent() in a mutate function is unusual because it uses double parentheses:
label_percent()(an_equation_goes_here)
I don't think I have seen ()() syntax in R before and I don't know how to look it up because I don't know what it is called. I tried ?`()()` and ??`()()` and neither helped. What is double parentheses syntax called? Can someone recommend a place to read about it?
Here is an example for context:
library(tidyverse)
members <-
read_csv(
paste0(
"https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
"master/data/2020/2020-09-22/members.csv"
),
show_col_types = FALSE)
members %>%
count(success, died) %>%
group_by(success) %>%
# old syntax:
# mutate(percent = scales::percent(n / sum(n)))
# new syntax:
mutate(percent = scales::label_percent()(n / sum(n)))
#> # A tibble: 4 × 4
#> # Groups: success [2]
#> success died n percent
#> <lgl> <lgl> <int> <chr>
#> 1 FALSE FALSE 46452 98%
#> 2 FALSE TRUE 868 2%
#> 3 TRUE FALSE 28961 99%
#> 4 TRUE TRUE 238 1%
Created on 2023-01-01 with reprex v2.0.2
Most functions return a value, whether something atomic (numeric, integer, character), list-like (including data.frame), or something more complex. For those, the single set of ()s is (as you recognize) the one call.
Occasionally, however, a function call returns a function. For example, if we look at ?scales::label_percent, we can scroll down to
Value:
All 'label_()' functions return a "labelling" function, i.e. a
function that takes a vector 'x' and returns a character vector of
'length(x)' giving a label for each input value.
Let's look at it step-by-step:
fun <- scales::label_percent()
fun
# function (x)
# {
# number(x, accuracy = accuracy, scale = scale, prefix = prefix,
# suffix = suffix, big.mark = big.mark, decimal.mark = decimal.mark,
# style_positive = style_positive, style_negative = style_negative,
# scale_cut = scale_cut, trim = trim, ...)
# }
# <bytecode: 0x00000168ee5440e8>
# <environment: 0x00000168ee5501b8>
fun(0.35)
# [1] "35%"
The first call to scales::label_percent() returned a function. We can then call that function as many times as we want, on whatever input we like.
If you don't want to store the returned function in a variable like fun, you can use it immediately by following the first set of ()s with another set of parens.
scales::label_percent()(0.35)
# [1] "35%"
A related question is "why would you want a function to return another function?" There are many stylistic reasons, but the scales::label_* functions are designed for places where an option needs to be expressed as a function, not as a static value. For example, in ggplot code, axis ticks are usually placed with simple heuristics that determine the count, locations, and rendering of the tick marks. While one can use ggplot2::scale_*_continuous(breaks = ..., labels = ...) to control manually how many ticks there are, where they sit, and what they look like, it is often more convenient not to care a priori how many or where. When faceting is used, the ticks can also vary per facet, so they are not something one can easily assign in a static variable. In those cases, it is often better to supply a function that is given some simple parameters (such as the min/max of the axis) and returns something meaningful.
Why can't we just pass it scales::label_percent? (Good question.) Even though you're using the default values in your call here, one might want to change any or all of the controllable things, such as:
suffix= defaults to "%", but perhaps you want a space as in " %"?
decimal.mark= defaults to ".", but maybe your locale prefers commas?
While it is feasible to have multiple functions for all of the combinations of these options, it is generally easier in the long run to provide a "template function" for creating the function, such as
fun <- scales::label_percent(accuracy = 0.01, suffix = " %", decimal.mark = ",")
fun(0.353)
# [1] "35,30 %"
scales::label_percent(accuracy = 0.01, suffix = " %", decimal.mark = ",")(0.353)
# [1] "35,30 %"
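To see that nothing special is going on inside such a "template function", here is a minimal base-R sketch of the same pattern. make_percent_labeller() is a hypothetical name invented for this illustration; it is not part of scales and does not try to reproduce its formatting rules.

```r
# Hypothetical factory: returns a function that formats proportions as percentages.
make_percent_labeller <- function(digits = 0, suffix = "%") {
  force(digits)   # evaluate the options now so the returned closure captures them
  force(suffix)
  function(x) paste0(formatC(x * 100, format = "f", digits = digits), suffix)
}

lab <- make_percent_labeller(digits = 1, suffix = " %")
lab(0.353)
# [1] "35.3 %"

# Or call the returned function immediately, without naming it:
make_percent_labeller()(0.35)
# [1] "35%"
```

The options are fixed once, when the labeller is created, and the returned function only needs the data. That is what lets it be handed to code that expects a one-argument function.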
An Expression followed by an argument list in round parentheses (( / )) is called a Function Call in R.
There's no need to have a special name for two function calls in a row. They're still just function calls.
If we run a function and the value it returns is itself a function, then we can call that one too.
For example, we first run f using f() assigning the return value to g but the return value is itself a function so g is a function -- it is the function function() 3 -- and we can run that too.
# f is a function which returns a function
f <- function() function() 3
g <- f() # this runs f which returns `function() 3`
g() # thus g is a function so we can call it
## [1] 3
Now putting that all together we can write it in one line as
f()()
## [1] 3
As seen there is only one meaning for () and the fact that there were two together was simply because we were calling the result of a call.
I have a function whose default values I'd like to know. However, the popular answer given in other posts, formals("fun-name"), does not satisfy my need. The function in question is introBox from the rintrojs package.
The call to formals as follows yields:
formals("introBox")
# $...
#
#
# $data.step
#
#
# $data.intro
#
#
# $data.hint
#
#
# $data.position
# c("bottom", "auto", "top", "left", "right",
# "bottom", "bottom-left_aligned", "bottom-middle-aligned",
# "bottom-right-aligned", "auto")
I know that the argument data.position takes 1 value from the set of choices, i.e., "bottom", "auto" etc. How can one know what that default value is without looking into the function's internals?
You can get the default values of a function's arguments with formals() only if the author assigned default values. If formals() shows nothing for an argument, the author did not assign a default for it.
For example,
formals(lm)
#$formula
#$data
#$subset
#$weights
#$na.action
#$method
#[1] "qr"
#$model
#[1] TRUE
#$x
#[1] FALSE
#$y
#[1] FALSE
#$qr
#[1] TRUE
#$singular.ok
#[1] TRUE
#$contrasts
#NULL
#$offset
#$...
It shows that the default value of method argument in lm() is qr, but there is no default value for formula.
We can write a function in R without assigning a default value to its arguments.
myfun1 = function(any_letter, any_number) print(c(any_letter, any_number))
myfun2 = function(any_letter = 'a', any_number) print(c(any_letter, any_number))
myfun3 = function(any_letter = 'a', any_number = 3) print(c(any_letter, any_number))
Because of the lazy evaluation, no error message will pop up when these functions are created. But the error message will pop up when the function with no default value is called and no value is assigned by the user:
# myfun2()
# Error in print(c(any_letter, any_number)) :
#   argument "any_number" is missing, with no default
myfun3()
# [1] "a" "3"
Again, you can find the default values, if any, by using formals:
formals(myfun2)
#$any_letter
#[1] "a"
#$any_number
Edit to mention match.arg
Assigning a character vector as the default of a function argument, as with data.position, is an efficient way to provide a table of allowed candidate values. The function internals then use match.arg to check whether the user's choice matches the table, and produce an error message if it doesn't. The Description in the match.arg documentation says "match.arg matches arg against a table of candidate values as specified by choices, where NULL means to take the first one." So, in this case, the default for data.position is "bottom" because it is the first value in the character vector.
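A minimal sketch of that pattern follows. pick_position() is a hypothetical stand-in, not the actual introBox code, and it assumes the function calls match.arg() on the argument, as described above.

```r
# Hypothetical function using the choices-vector idiom.
pick_position <- function(position = c("bottom", "auto", "top")) {
  position <- match.arg(position)   # no value supplied -> first choice
  position
}

pick_position()
# [1] "bottom"
pick_position("top")
# [1] "top"

# The effective default can also be read off without calling the function:
default_choice <- eval(formals(pick_position)$position)[1]
default_choice
# [1] "bottom"
```

A value outside the table, such as pick_position("middle"), stops with an error from match.arg().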
I'm trying to program over a function inside a package, but I'm stuck with the function internally using match.call() to parse one of its arguments.
A super-simplified example of the function with the usual utilization could look like this:
f1 = function(x, y=0, z=0, a=0, b=0){ #lots of arguments not needed for the example
mc = match.call()
return(mc$x)
#Returning for testing purpose.
#Normally, the function later uses calls as character:
r1 = as.character(mc$x[1])
r2 = as.character(mc$x[2])
#...
}
x1 = f1(x = foo(bar))
x1
# foo(bar)
class(x1)
# [1] "call"
In my case, I need to get the value of x from a variable (value in the following code). The intended usage of f1 is as follows:
value = "foo(bar)" #this line could also be anything else
f1(x=some_magic_function(value))
# Expected result = foo(bar)
# Unwanted result = some_magic_function(value)
Unfortunately, match.call() always returns the argument expression exactly as it was typed. I'm quite out of my league here, so I have only tried a few functions.
Is there any way I could trick match.call() into accepting an external variable?
Failed attempts so far:
#I tried to create the exact same call using rlang::sym()
#This may not be the best way...
value = call("foo", rlang::sym("bar"))
value
# foo(bar)
class(value)
# [1] "call"
x1==value
# [1] TRUE
f1(x=value)
# value
f1(x=eval(value))
# eval(value)
f1(x=substitute(value))
# substitute(value)
There's nothing you can include as a parameter to f1 to make this work. Instead, you would dynamically need to build your call to f1. With base R you might do this with do.call.
do.call("f1", list(parse(text=value)[[1]]))
or with rlang
library(rlang)
eval_tidy(quo(f1(!!parse_expr(value))))
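A self-contained base-R sketch of the "build the call dynamically" idea, using a cut-down f1 and assuming value holds the expression as a string. The bquote() and as.call() variants are alternatives added here for comparison, not part of the original suggestion.

```r
# Cut-down version of f1: returns the unevaluated expression passed as x.
f1 <- function(x, y = 0) {
  mc <- match.call()
  mc$x
}

value <- "foo(bar)"
expr  <- parse(text = value)[[1]]   # turn the string into a call object

# Three ways to splice expr into a call to f1 before it is evaluated:
res1 <- do.call(f1, list(expr))
res2 <- eval(bquote(f1(.(expr))))
res3 <- eval(as.call(list(quote(f1), expr)))

res1
# foo(bar)
```

In each case f1 sees foo(bar) as its argument expression, because the substitution happens before f1 is called. None of this requires foo or bar to exist, since f1 never evaluates x.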
I am working through the section on vectors in "The Book on R", which gives the following examples:
length(x=c(3,2,8,1))
# [1] 4
length(x=5:13)
# [1] 9
foo <- 4
bar <- c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))
length(x=bar)
# [1] 11
But if the input length(c(3,2,8,1)) is going to give you the output 4 anyway, why would you add in x=? What is the purpose of x=? At first I thought it had to do with variables but R did not reflect that x was holding the vector (3,2,8,1) after I typed length(x=c(3,2,8,1)).
And why does length(y=c(5:13)) not work, and instead give this error:
Error in length(y = 5:13) : supplied argument name 'y' does not match 'x'
R has named arguments for functions. Check this section of R's doc for some information on the subject.
So x is just the name that was given to the first argument of function length, it has nothing to do with any variable in your environment that may be named x.
Overall, it's a pretty handy feature:
it allows you to pass arguments in any order (if you use the arg = ... syntax)
the function's writer can give hints to users about what type of arguments are expected
combined with auto-completion, it helps to remember a function's syntax and usage
and it is optional, since you can also pass arguments without naming them:
matrix(data = 1:12, ncol = 3) # is equivalent to:
matrix(1:12,,3)
You can also use it to write some really confusing stuff (of course, not recommended), such as:
x <- 1:3
length(x = x) # 3
length(x = (x <- 1:4)) # 4 ...
x # 1 2 3 4
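To make the matching rules concrete, here is a small sketch with a made-up function, area(), which is not from the book. It shows positional, named, and partially named arguments giving the same result, and captures the error produced by a name that does not match.

```r
# Hypothetical function with two named arguments.
area <- function(width, height = 1) width * height

area(2, 3)                    # matched by position
# [1] 6
area(height = 3, width = 2)   # matched by name, in any order
# [1] 6
area(h = 3, w = 2)            # partial names also match, if unambiguous
# [1] 6

# A name that matches no argument is an error, as with length(y = ...):
msg <- tryCatch(length(y = 1:3), error = function(e) conditionMessage(e))
msg
```

Naming an argument in a call never creates a variable of that name in your workspace; it only tells R which parameter of the function the value belongs to.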
By "replacement functions" I mean those mentioned in this thread What are Replacement Functions in R?, ones that look like 'length<-'(x, value). When I was working with such functions I encountered something weird. It seems that a replacement function only works when variables are named according to a certain rule.
Here is my code:
a <- c(1,2,3)
I will try to change the first element of a, using one of the 3 replacement functions below.
'first0<-' <- function(x, value){
x[1] <- value
x
}
first0(a) <- 5
a
# returns [1] 5 2 3.
The first one works pretty well... but then when I change the name of arguments in the definition,
'first1<-' <- function(somex, somevalue){
somex[1] <- somevalue
somex
}
first1(a) <- 9
# Error in `first1<-`(`*tmp*`, value = 9) : unused argument (value = 9)
a
# returns [1] 5 2 3
It fails to work, though the following code is OK:
a <- 'first1<-'(a, 9)
a
# returns [1] 9 2 3
Some other names work well, too, if they are similar to x and value, it seems:
'first2<-' <- function(x11, value11){
x11[1] <- value11
x11
}
first2(a) <- 33
a
# returns [1] 33 2 3
This doesn't make sense to me. Do the names of variables actually matter or did I make some mistakes?
There are two things going on here. First, the only real rule of replacement functions is that the new value will be passed as a parameter named value and it will be the last parameter. That's why when you specify the signature function(somex, somevalue), you get the error unused argument (value = 9) and the assignment doesn't work.
Secondly, things work with the signature function(x11, value11) thanks to partial matching of parameter names in R. Consider this example
f <- function(a, value1234 = 5) {
print(value1234)
}
f(value = 10)
# [1] 10
Note that 10 is printed rather than the default 5: the name value was partially matched to value1234. This behavior is defined under argument matching in the language definition.
Another way to see what's going on is to print the call signature of what's actually being called.
'first0<-' <- function(x, value){
print(sys.call())
x[1] <- value
x
}
a <- c(1,2,3)
first0(a) <- 5
# `first0<-`(`*tmp*`, value = 5)
So the first parameter is actually passed as an unnamed positional parameter, and the new value is passed as the named parameter value=. This is the only parameter name that matters.
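The same rule holds when a replacement function takes extra arguments: they go between the object and value, and only the last one has to be called value. A short sketch, with nth<- as a made-up name for illustration:

```r
# Hypothetical replacement function with an extra argument n.
`nth<-` <- function(x, n, value) {
  x[n] <- value
  x
}

a <- c(1, 2, 3)
nth(a, 2) <- 20                  # R calls `nth<-`(`*tmp*`, 2, value = 20)
a
# [1]  1 20  3

a <- `nth<-`(a, 3, value = 30)   # the equivalent explicit call
a
# [1]  1 20 30
```

The names x and n could be anything. Renaming value is what would break the nth(a, 2) <- 20 form.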