The syntax for using scales::label_percent() in a mutate function is unusual because it uses double parentheses:
label_percent()(an_equation_goes_here)
I don't think I have seen ()() syntax in R before and I don't know how to look it up because I don't know what it is called. I tried ?`()()` and ??`()()` and neither helped. What is double parentheses syntax called? Can someone recommend a place to read about it?
Here is an example for context:
library(tidyverse)
members <-
read_csv(
paste0(
"https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
"master/data/2020/2020-09-22/members.csv"
),
show_col_types = FALSE)
members %>%
count(success, died) %>%
group_by(success) %>%
# old syntax:
# mutate(percent = scales::percent(n / sum(n)))
# new syntax:
mutate(percent = scales::label_percent()(n / sum(n)))
#> # A tibble: 4 × 4
#> # Groups: success [2]
#> success died n percent
#> <lgl> <lgl> <int> <chr>
#> 1 FALSE FALSE 46452 98%
#> 2 FALSE TRUE 868 2%
#> 3 TRUE FALSE 28961 99%
#> 4 TRUE TRUE 238 1%
Created on 2023-01-01 with reprex v2.0.2
Most functions return a value, whether something atomic (numeric, integer, character), list-like (including data.frame), or something more complex. For those, the single set of ()s (as you recognize) is the one and only call.
Occasionally, however, a function call returns a function. For example, if we look at ?scales::label_percent, we can scroll down to
Value:
All 'label_()' functions return a "labelling" function, i.e. a
function that takes a vector 'x' and returns a character vector of
'length(x)' giving a label for each input value.
Let's look at it step-by-step:
fun <- scales::label_percent()
fun
# function (x)
# {
# number(x, accuracy = accuracy, scale = scale, prefix = prefix,
# suffix = suffix, big.mark = big.mark, decimal.mark = decimal.mark,
# style_positive = style_positive, style_negative = style_negative,
# scale_cut = scale_cut, trim = trim, ...)
# }
# <bytecode: 0x00000168ee5440e8>
# <environment: 0x00000168ee5501b8>
fun(0.35)
# [1] "35%"
The first call, scales::label_percent(), returned a function. We can then call that function on any numeric vector, as many times as we want.
If you don't want to store the returned function in a variable like fun, you can use it immediately by following the first set of ()s with another set of parens.
scales::label_percent()(0.35)
# [1] "35%"
A related question is "why would you want a function to return another function?" There are many stylistic reasons, but in the case of scales::label_*, they are designed to be used in places where the option needs to be expressed as a function, not as a static value. For example, they can be used in ggplot code: axis ticks are often placed with simple heuristics that determine the count, locations, and rendering of the tick marks. While one can manually control how many ticks there are, where they go, and what they look like (for example by passing fixed vectors to breaks= and labels= in a scale_*_continuous() call), it is often more convenient not to decide a priori how many or where. When faceting is used, the breaks can even vary per faceting variable(s), so they are not something one can easily assign in a static variable. In those cases, it is often better to supply a function that is given some simple parameters (such as the axis limits or the break positions) and returns something meaningful.
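A small sketch of that hand-off, assuming scales is installed (and ggplot2 for the second part; the data frame df is made-up toy data): the labelling function is created once, and whoever draws the axis calls it later on the break positions it has chosen.

```r
library(scales)

# Create the labelling function once; nothing is formatted yet
lab <- label_percent()
is.function(lab)     # TRUE
lab(0.35)            # "35%", as in the example above

# It accepts a whole vector of break positions at once
lab(c(0, 0.25, 0.5))

# In ggplot2 you pass the function itself; ggplot2 picks the breaks and
# calls `lab()` on them when the plot is built
if (requireNamespace("ggplot2", quietly = TRUE)) {
  df <- data.frame(x = 1:4, y = c(0.1, 0.35, 0.6, 0.9))  # toy data
  p <- ggplot2::ggplot(df, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::scale_y_continuous(labels = lab)
}
```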
Why can't we just pass it scales::label_percent? (Good question.) Even though you're using the default values in your call here, one might want to change any or all of the controllable things, such as:
suffix= defaults to "%", but perhaps you want a space as in " %"?
decimal.mark= defaults to ".", but maybe your locale prefers commas?
While it is feasible to have multiple functions for all of the combinations of these options, it is generally easier in the long run to provide a "template function" that creates the function (this pattern is commonly called a function factory, which is also the name to search for), such as
fun <- scales::label_percent(accuracy = 0.01, suffix = " %", decimal.mark = ",")
fun(0.353)
# [1] "35,30 %"
scales::label_percent(accuracy = 0.01, suffix = " %", decimal.mark = ",")(0.353)
# [1] "35,30 %"
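To make the factory idea concrete, here is a deliberately simplified, hypothetical my_label_percent(). It is not how scales implements label_percent() (the real one delegates to scales::number(), supports many more options, and takes accuracy= rather than the digits= used here); it only shows the shape: the outer call fixes the options, and the function it returns needs nothing but x.

```r
# Hypothetical, simplified function factory (not the scales implementation)
my_label_percent <- function(digits = 0, suffix = "%", decimal.mark = ".") {
  # force() evaluates the options now, so the returned function keeps
  # the values they had when the factory was called
  force(digits); force(suffix); force(decimal.mark)
  function(x) {
    paste0(
      formatC(x * 100, format = "f", digits = digits, decimal.mark = decimal.mark),
      suffix
    )
  }
}

fun <- my_label_percent()
fun(0.35)                                                               # "35%"
my_label_percent(digits = 2, suffix = " %", decimal.mark = ",")(0.353)  # "35,30 %"
```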
An expression followed by an argument list in round parentheses, ( ... ), is called a function call in R.
There's no need to have a special name for two function calls in a row. They're still just function calls.
If we run a function and the value it returns is itself a function, then we can call that too.
For example, we first run f using f() and assign the return value to g. That return value is itself a function (namely function() 3), so g is a function and we can run it too.
# f is a function which returns a function
f <- function() function() 3
g <- f() # this runs f which returns `function() 3`
g() # thus g is a function so we can call it
## [1] 3
Now putting that all together we can write it in one line as
f()()
## [1] 3
As seen, there is only one meaning for (), and the fact that there were two together was simply because we were calling the result of a call.
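The same rule extends to any number of calls, and the thing being called does not even need a name. A short sketch (make_power and h are made-up names for illustration):

```r
# A function factory with an argument: the returned function "remembers" exp
make_power <- function(exp) {
  force(exp)
  function(x) x^exp
}
square <- make_power(2)
square(4)                 # 16
make_power(3)(2)          # 8, same thing without the intermediate name

# An anonymous function can be called immediately, too
(function(x) x * 2)(21)   # 42

# Three calls in a row: each () calls whatever the previous call returned
h <- function() function() function() "done"
h()()()                   # "done"
```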
Background
Packages can include a lot of functions. Some of them require informative error messages, and perhaps some comments in the function to explain what is happening and why. Here is an example, f1 in a hypothetical f1.R file, with all documentation and comments (both why the error and why the condition) in one place:
f1 <- function(x){
  if(!is.numeric(x)) stop("Only numeric input supported")
  # user input ...
  # .... NaN problem in g()
  # ....
  # ratio of magnitude negative integer i base ^ i is positive
  if(x < .Machine$longdouble.min.exp / .Machine$longdouble.min.exp) stop("oof, an error")
  log(x)
}
f1(-1)
#> Error in f1(-1) : oof, an error
I create a separate conds.R that defines an error function e() (and likewise w() for warnings, s() for suggestions, etc.), for example:
e <- function(x){
  switch(
    as.character(x),
    "1" = "Only numeric input supported",
    # user input ...
    # .... NaN problem in g()
    # ....
    "2" = "oof, and error") |>
    stop()
}
Then in, say, an f.R script I can define f2 as:
f2 <- function(x){
  if(!is.numeric(x)) e(1)
  # ratio of magnitude negative integer i base ^ i is positive
  if(x < .Machine$longdouble.min.exp / .Machine$longdouble.min.exp) e(2)
  log(x)
}
f2(-1)
#> Error in e(2) : oof, and error
This does throw the error and, on top of that, gives a nice traceback and a "rerun with debug" option in the console. Further, as a package maintainer I would prefer this, as it saves me from writing terse if statements with one-line error messages, or aligning comments in a tryCatch statement.
Question
Is there a reason (not opinion on syntax) to avoid writing a conds.R in a package?
There is no reason to avoid writing conds.R. This is very common and good practice in package development, especially as many of the checks you want to do will be applicable across many functions (like asserting the type of the input, as you've done above). Here's a nice example from dplyr.
library(dplyr)
df <- data.frame(x = 1:3, x = c("a", "b", "c"), y = 4:6)
names(df) <- c("x", "x", "y")
df
#> x x y
#> 1 1 a 4
#> 2 2 b 5
#> 3 3 c 6
df2 <- data.frame(x = 2:4, z = 7:9)
full_join(df, df2, by = "x")
#> Error: Input columns in `x` must be unique.
#> x Problem with `x`.
nest_join(df, df2, by = "x")
#> Error: Input columns in `x` must be unique.
#> x Problem with `x`.
traceback()
#> 7: stop(fallback)
#> 6: signal_abort(cnd)
#> 5: abort(c(glue("Input columns in `{input}` must be unique."), x = glue("Problem with {err_vars(vars[dup])}.")))
#> 4: check_duplicate_vars(x_names, "x")
#> 3: join_cols(tbl_vars(x), tbl_vars(y), by = by, suffix = c("", ""), keep = keep)
#> 2: nest_join.data.frame(df, df2, by = "x")
#> 1: nest_join(df, df2, by = "x")
Here, both functions rely on code written in join-cols.R. Both call join_cols(), which in turn calls check_duplicate_vars(), whose source code I've copied below:
check_duplicate_vars <- function(vars, input, error_call = caller_env()) {
dup <- duplicated(vars)
if (any(dup)) {
bullets <- c(
glue("Input columns in `{input}` must be unique."),
x = glue("Problem with {err_vars(vars[dup])}.")
)
abort(bullets, call = error_call)
}
}
Although different in syntax from what you wrote, it is designed to provide the same behaviour, and it shows that this is possible in a package; from my understanding, there is no reason not to do it. However, I would add a few syntax points based on your code above:
I would bundle the check (the if() statement) together with the error raising in the helper, to avoid repeating yourself everywhere else the check is needed.
It's often nicer to include the name of the variable or argument passed in, so the error message is explicit, as in the dplyr example above. This makes it clearer to the user what is causing the problem: in this case, that the x column is not unique in df.
The traceback showing #> Error in e(2) : oof, and error in your example is more obscure to the user, especially as e() is likely not exported in the NAMESPACE, so they would need to read the source code to understand where the error is generated. If you use stop(..., call. = FALSE), or pass the calling environment through the nested functions as join-cols.R does, you can avoid unhelpful information in the error message. This is, for instance, suggested in Hadley's Advanced R:
By default, the error message includes the call, but this is typically not useful (and recapitulates information that you can easily get from traceback()), so I think it’s good practice to use call. = FALSE
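A minimal base-R sketch of a conds.R along those lines (R >= 3.6.0, for errorCondition()). The names cond_messages, abort_cond(), check_log_input(), the mypkg_ classes and the messages are all hypothetical. The check and the message are bundled together, the message names the offending argument, and the error is attributed to the user-facing caller (here f2) rather than to the helper:

```r
# conds.R (hypothetical): messages and their explanatory comments in one place
cond_messages <- c(
  # input must be numeric because log() is applied later
  not_numeric  = "`%s` must be numeric.",
  # non-positive input would give NaN or -Inf from log()
  not_positive = "`%s` must be positive."
)

# Signal a classed error; `call` is the call the user should see
abort_cond <- function(id, arg, call) {
  cnd <- errorCondition(
    sprintf(cond_messages[[id]], arg),
    class = c(paste0("mypkg_", id), "mypkg_error"),
    call = call
  )
  stop(cnd)
}

# Check + message bundled; `arg` is the expression passed in, and
# sys.call(-1) is the call of whichever function called check_log_input()
check_log_input <- function(x, arg = deparse(substitute(x)), call = sys.call(-1)) {
  if (!is.numeric(x)) abort_cond("not_numeric", arg, call)
  if (any(x <= 0)) abort_cond("not_positive", arg, call)
  invisible(x)
}

f2 <- function(x) {
  check_log_input(x)
  log(x)
}

# f2(-1) now reports: Error in f2(-1) : `x` must be positive.
```

Because the errors carry classes, tests and callers can also catch them selectively, e.g. with tryCatch(..., mypkg_error = function(e) ...).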
I am trying to make a function in R that outputs a data frame in a standard way, but that also allows users to add the personalized columns they deem necessary (the goal is to make a data format for paleomagnetic data, which has common information that everybody uses, plus some more unusual fields that a user might like to keep in the format).
However, I realized that if the user wants a column header that is a prefix of one of the defined arguments of the data formatting function (e.g. a 'sheep' argument, which is a prefix of the 'sheepc' argument; see the example below), the function interprets it as that defined argument (through partial name matching, see http://adv-r.had.co.nz/Functions.html#lexical-scoping for more details).
Is there a way to prevent this, or at least to warn the user that they cannot use this name?
PS: I realize this question is similar to "Disabling partial variable names in subsetting data frames", but I would like to avoid changing the options of the future users of my function.
fun <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...)
{
  # I use the sheeta, sheetb and sheepc arguments for computations
  # (more complex than shown below, but here they are just there to give an example)
  a <- sum(sheeta, sheetb)
  df1 <- data.frame(standard = rep(a, sheepc))
  df2 <- as.data.frame(list(...))
  if(nrow(df1) == nrow(df2)){
    res <- cbind(df1, df2)
    return(res)
  } else {
    stop("Extra elements should be of length ", sheepc)
  }
}
fun(ball = rep(1,3))
#> standard ball
#> 1 3 1
#> 2 3 1
#> 3 3 1
fun(sheep = rep(1,3))
#> Error in rep(a, sheepc) : invalid 'times' argument
fun(sheet = rep(1,3))
#> Error in fun(sheet = rep(1, 3)) :
#> argument 1 matches multiple formal arguments
From the language definition:
If the formal arguments contain ‘...’ then partial matching is only
applied to arguments that precede it.
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3)
{<your function body>}
fun(sheep = rep(1,3))
# standard sheep
#1 3 1
#2 3 1
#3 3 1
Of course, your function should have assertion checks for the non-... parameters (see help("stopifnot")). You could also consider adding a . or _ prefix to their names to make name collisions less likely.
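As a sketch, here is the question's function with ... moved first and the placeholder body filled in with the question's own code (using sheepc in the error message), plus the kind of stopifnot() checks suggested above. One side effect worth knowing: with ... first, a positional argument such as fun(1) now lands in ... instead of sheeta, so the formals must always be named in full.

```r
# The question's function with `...` first, plus basic assertions
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3) {
  stopifnot(
    is.numeric(sheeta), is.numeric(sheetb),
    is.numeric(sheepc), length(sheepc) == 1
  )
  a <- sum(sheeta, sheetb)
  df1 <- data.frame(standard = rep(a, sheepc))
  df2 <- as.data.frame(list(...))
  if (nrow(df1) != nrow(df2)) {
    stop("Extra elements should be of length ", sheepc, call. = FALSE)
  }
  cbind(df1, df2)
}

fun(sheep = rep(1, 3))       # `sheep` stays in `...`: columns standard, sheep
fun(sheepc = 2, ball = 1:2)  # formals after `...` are only matched exactly
```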
Edit:
"would it be possible to achieve the same effect without having the ... at the beginning ?"
Yes, here is a quick example with one parameter:
fun <- function(sheepc = 3, ...)
{
stopifnot("partial matching detected" = identical(sys.call(), match.call()))
list(...)
}
fun(sheep = rep(1,3))
# Error in fun(sheep = rep(1, 3)) : partial matching detected
fun(ball = rep(1,3))
#$ball
#[1] 1 1 1
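One caveat with the identical(sys.call(), match.call()) check: it also fires for purely positional calls such as fun(rep(1, 3)), because match.call() adds the argument name (sheepc = ...). A more targeted sketch, using a hypothetical helper stop_if_partial() that only complains when a supplied name is not an exact formal name but is a prefix of one:

```r
# Hypothetical helper (not part of base R): stop if any argument name in
# `call` is not an exact formal of `fn` but is a prefix of one, i.e. R's
# partial matching would have applied
stop_if_partial <- function(call, fn) {
  supplied <- names(as.list(call))[-1]           # argument names as typed
  if (is.null(supplied)) return(invisible(TRUE)) # all arguments positional
  supplied <- supplied[supplied != ""]
  formal_names <- setdiff(names(formals(fn)), "...")
  is_partial <- vapply(
    supplied,
    function(s) !(s %in% formal_names) && any(startsWith(formal_names, s)),
    logical(1)
  )
  if (any(is_partial)) {
    stop("Argument name(s) ",
         paste0("'", supplied[is_partial], "'", collapse = ", "),
         " would be partially matched to a formal argument; please rename.",
         call. = FALSE)
  }
  invisible(TRUE)
}

fun <- function(sheepc = 3, ...) {
  stop_if_partial(sys.call(), sys.function())
  list(...)
}

fun(ball = rep(1, 3))      # fine
fun(rep(1, 3), ball = 2)   # fine: positional sheepc is not flagged
```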
I am trying to write a function to extract package names from a list of R script files. My regular expression does not seem to be working and I am not sure why. For starters, I am not able to match lines that include library. For example:
str <- c(" library(abc)", "library(def)", "some other text")
grep("library\\(", str, value = TRUE)
grep("library\\(+[A-z]\\)", str, value = TRUE)
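Why does my second grep not return elements 1 and 2 of the str vector? I have tried many options, but all my results come back empty.
Your second grep does not return 1, 2 for two reasons:
You used value=TRUE, which makes it return the matching strings instead of their locations.
You misplaced the +: in "library\\(+[A-z]\\)" it applies to the \\(, so the pattern accepts repeated opening parentheses but exactly one letter before the ). You want grep("library\\(\\w+\\)", str). (Note also that [A-z] is not just letters: in ASCII order it also covers [, \, ], ^, _ and `.)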
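A quick check of the corrected pattern, followed by a small extension of my own for pulling out the names themselves. Since R package names may contain dots (e.g. data.table), which \\w does not match, the extraction uses the slightly wider class [[:alnum:].]:

```r
str <- c(" library(abc)", "library(def)", "some other text")

# `+` now applies to \\w (one or more word characters), not to `(`
grep("library\\(\\w+\\)", str)               # positions: 1 2
grep("library\\(\\w+\\)", str, value = TRUE)  # the matching strings

# Extract the package names, allowing dots in them
x <- c(str, "library(data.table)")
pattern <- "library\\(([[:alnum:].]+)\\)"
hits <- grep(pattern, x, value = TRUE)
sub(paste0(".*", pattern, ".*"), "\\1", hits) # "abc" "def" "data.table"
```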
If you'd like something a bit more robust that will handle some edge cases (library() takes a number of parameters and the package one can be a name/symbol or a string and doesn't necessarily have to be specified first):
library(purrr)
script <- '
library(js) ; library(foo)
#
library("V8")
ls()
library(package=rvest)
TRUE
library(package="hrbrthemes")
1 + 1
library(quietly=TRUE, "ggplot2")
library(quietly=TRUE, package=dplyr, verbose=TRUE)
'
x <- parse(textConnection(script)) # parse w/o eval
keep(x, is.language) %>% # `library()` is a language object
keep(~languageEl(.x, 1) == "library") %>% # other things are too, so only keep `library()` ones
map(as.call) %>% # turn it into a `call` object
map(match.call, definition = library) %>% # so we can match up parameters and get them in the right order
map(languageEl, 2) %>% # language element 1 is `library`
map_chr(as.character) %>% # turn names/symbols into characters
sort() # why not
## [1] "dplyr" "foo" "ggplot2" "hrbrthemes" "js" "rvest" "V8"
This won't catch library() calls inside functions (it could be expanded to do that), but if top-level edge cases are infrequent, ones inside functions are even less likely (and those would likely use require() as well).
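One way that expansion could look, sketched in base R without purrr: a hypothetical find_pkgs() that walks the parse tree recursively, so it also finds calls inside function bodies, and treats require() like library(). It still does not handle character.only = TRUE (where the argument is a variable), pkg::fn usage, or requireNamespace().

```r
# Hypothetical helper: recursively collect packages from library()/require()
# calls anywhere in a parsed expression, including inside function bodies
find_pkgs <- function(e) {
  out <- character(0)
  if (!is.call(e)) return(out)          # symbols, constants, etc.
  fn <- e[[1]]
  if (is.name(fn) && as.character(fn) %in% c("library", "require")) {
    def <- if (as.character(fn) == "library") library else require
    pkg <- match.call(def, e)$package    # matched by name or position
    if (!is.null(pkg)) out <- c(out, as.character(pkg))
  }
  # Recurse into sub-calls only (this also skips empty arguments as in df[, 1])
  for (i in seq_along(e)) {
    if (is.call(e[[i]])) out <- c(out, find_pkgs(e[[i]]))
  }
  out
}

script <- '
library(js); library(foo)
library("V8")
f <- function() { if (TRUE) require(data.table) }
y <- df[, 1]
library(quietly = TRUE, package = dplyr)
'
exprs <- parse(text = script)  # parse without evaluating
pkgs <- unlist(lapply(exprs, find_pkgs))
pkgs
```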
I am working through the section on vectors in "The Book of R", which gives the following examples:
length(x=c(3,2,8,1))
# [1] 4
length(x=5:13)
# [1] 9
foo <- 4
bar <- c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))
length(x=bar)
# [1] 11
But if length(c(3,2,8,1)) is going to give the output 4 anyway, why would you add x=? What is the purpose of x=? At first I thought it had to do with variables, but R did not show that x was holding the vector (3,2,8,1) after I typed length(x=c(3,2,8,1)).
And why does length(y=5:13) not work, giving this error instead:
Error in length(y = 5:13) : supplied argument name 'y' does not match 'x'
R has named arguments for functions. See the section on argument matching in the R Language Definition for details.
So x is just the name that was given to the first argument of function length, it has nothing to do with any variable in your environment that may be named x.
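To see which argument names a function accepts, you can inspect its signature. A small sketch: length() is a primitive, so formals() returns NULL for it, but args() gives an ordinary function with the same signature. Naming the argument also does not create a variable:

```r
# length() is a primitive, so formals() has nothing to report ...
formals(length)                 # NULL
# ... but args() builds a plain function with the same signature
args(length)                    # function (x) NULL
names(formals(args(length)))    # "x"

# Naming the argument does not assign anything called x
local({
  length(x = c(3, 2, 8, 1))
  exists("x", inherits = FALSE) # FALSE
})
```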
Overall, it's a pretty handy feature:
it allows you to pass arguments in any order (if you use the arg = ... syntax)
the function's writer can give hints to users about what type of arguments are expected
combined with auto-completion, it helps to remember a function's syntax and usage
and it is optional, since you can also pass arguments without naming them:
matrix(data = 1:12, ncol = 3) # is equivalent to:
matrix(1:12,,3)
You can also use it to write some really confusing stuff (of course, not recommended), such as:
x <- 1:3
length(x = x) # 3
length(x = (x <- 1:4)) # 4 ...
x # 1 2 3 4
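For reference, R matches supplied arguments to formals in three passes: exact names first, then partial (prefix) names, then position for whatever is left. A toy function f makes each pass visible:

```r
f <- function(alpha, beta = 2) alpha - beta

f(1, 10)                 # positional: alpha = 1, beta = 10        -> -9
f(beta = 10, alpha = 1)  # named, in any order                     -> -9
f(b = 10, 1)             # "b" partially matches beta, 1 -> alpha  -> -9
f(1)                     # beta falls back to its default          -> -1

# A name that matches no formal (and there is no ...) is an error,
# which is what happened with length(y = 5:13)
tryCatch(f(1, gamma = 3), error = conditionMessage)
```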
Is there a way to retrieve a function's arguments from a call, including the ones that are not specified in the call?
For example, consider the call seq(1, 10). If I wanted to get the first argument, I could simply use quote(seq(1,10))[[2]]. However, this only works if the argument is supplied in the call (rather than taking its default value), and I need to know its exact position.
In this example, is there some way to get the by argument from seq(1, 10) without a lengthy list of if statements to see if it is defined?
The first thing to note is that all of the named arguments you're after (from, to, by, etc.) belong to seq.default(), the method that is dispatched by your call to seq(), and not to seq() itself. (seq() itself only has one formal, ...).
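A quick way to confirm this yourself is to compare the formals of the generic and of the method (default values come back as unevaluated expressions):

```r
names(formals(seq))            # "..."  : the generic just dispatches
names(formals(seq.default))    # "from" "to" "by" "length.out" "along.with" "..."
formals(seq.default)$by        # ((to - from)/(length.out - 1)), not a number
```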
From there you can use these two building blocks
## (1) Retrieves pairlist of all formals
formals(seq.default)
# [long pairlist object omitted to save space]
## (2) Matches supplied arguments to formals
match.call(definition = seq.default, call = quote(seq.default(1,10)))
# seq.default(from = 1, to = 10)
to do something like this:
modifyList(formals(seq.default),
as.list(match.call(seq.default, quote(seq.default(1,10))))[-1])
# $from
# [1] 1
#
# $to
# [1] 10
#
# $by
# ((to - from)/(length.out - 1))
#
# $length.out
# NULL
#
# $along.with
# NULL
#
# $...