I recently posted two questions (1, 2) related to functions I was trying to write. I received useful answers to each, which resulted in the following two functions:
second_table <- function(dat, variable1, variable2){
dat %>%
tabyl({{variable1}}, {{variable2}}, show_na = FALSE) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns()
}
And
second_table2 = function(dat, variable1, variable2){
variable1 <- sym(variable1)
dat %>%
tabyl(!!variable1, {{variable2}}, show_na = FALSE) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns()
}
These functions work as intended, but I had never used the rlang package before and am still confused about the difference between the {{}} operator and !! + sym() after looking through the available documentation and writing some additional functions. I don't like to use code that I don't fully understand and am sure I will have further use for these rlang operators in the future, so would greatly appreciate a plain-language explanation of what the difference is between these operators.
R has a particular feature called non-standard evaluation (NSE), where expressions are used as-is instead of being evaluated. Most people first encounter NSE when they load packages:
a <- "rlang"
print(a) # Standard evaluation - the expression a is evaluated to its value
# [1] "rlang"
library(a) # Non-standard evaluation - the expression a is used as-is
# Error in library(a) : there is no package called ‘a’
rlang enables sophisticated NSE by providing three main functions to capture unevaluated symbols and expressions:
sym("x") captures a symbol (i.e., variable name, column name, etc.). Older versions allowed for sym(x), but I think the latest version of rlang forces the input to be a string.
expr(a + b) captures arbitrary expressions
quo(a + b) captures arbitrary expressions AND the environment where these expression were defined.
The difference between expressions and quosures is that evaluating the former will be done in the immediate environment, while the latter is always evaluated in the environment where the expression was captured:
f <- function(e) {a <- 2; b <- 3; eval_tidy(e)}
a <- 5; b <- 10
f(expr(a+b)) # Evaluated inside f
# [1] 5
f(quo(a+b)) # Evaluated in the environment where it is captured
# [1] 15
All three verbs have en-equivalents: ensym, enexpr and enquo. These are used to capture symbols and expressions provided to a function from within that function. This is useful when you want to remove the need for a user of the function to use sym, etc. themselves:
f <- function(x) {enexpr(x)} # Expression captured within a function
f(a+b)
# This has exact equivalence to
f <- function(x) {x}
f(expr(a+b)) # The user has to do the capture themselves
In all cases, the operator !! evaluates symbols and expressions. Think of it as eval() on steroids, because !! forces immediate evaluation that takes precedence over everything else. Among other things, this can be useful for iterative construction of more complicated expressions:
a <- expr(b + 2)
expr(d * !!a) # a is evaluated immediately
# d * (b + 2)
expr(d * eval(a)) # evaluation of a is delayed
# d * eval(a)
With all that said, {{x}} is shorthand notation for !!enquo(x)
Related
usage of pipe operator in purrr-dplyr packages is (in short) defined as follows:
y%>%f(x,.,z) is the same as f(x,y,z)
I am trying to do the following task using pipe operator. First I show you the task without using pipes:
#####for reproducibility
set.seed(50)
z0<-factor(sample(c(letters[1:3],NA),100,replace = T))
###the task
rep(1,length(table(z0)))
now I want to do this using pipes:
z0%>%table%>%rep(1,length(.))
however the result is not the same. It seems that pipe operator cannot handle the proper assignation to a composition of functions. That is
y%>%f(x,g(.)) should be the same as f(x,g(y))
so, the concrete question is if ti is possible to do
y%>%f(x,g(.))
Thank you in advance for your comments.
The %>% implements a first argument rule, that is, it passes the previous data as first argument to the function if . is not a direct argument; In your second case, the argument to rep is 1 and length(.), so the first argument rule takes effect; To avoid this, use {} to enclose the expression; You can read more about this at Re-using the placeholder for attributes:
Re-using the placeholder for attributes
It is straight-forward to use the placeholder several times in a
right-hand side expression. However, when the placeholder only appears
in a nested expressions magrittr will still apply the first-argument
rule. The reason is that in most cases this results more clean code.
x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = ncol(x))
The behavior can be overruled by enclosing the right-hand side in
braces:
x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = ncol(x))
rep(1,length(table(z0)))
# [1] 1 1 1
Equivalent would be:
z0 %>% table %>% {rep(1,length(.))}
# [1] 1 1 1
Original question
Can anyone explain to me why unquote does not work in the following?
I want to pass on a (function) user-specified column name in a call to do in version 0.7.4 of dplyr. This does seem somewhat less awkward than the older standard evaluation approach using do_. A basic (successful) example ignoring the fact that using do here is very unnecessary would be something like:
sum_with_do <- function(D, x, ...) {
x <- rlang::ensym(x)
gr <- quos(...)
D %>%
group_by(!!! gr) %>%
do(data.frame(y=sum(.[[quo_name(x)]])))
}
D <- data.frame(group=c('A','A','B'), response=c(1,2,3))
sum_with_do(D, response, group)
# A tibble: 2 x 2
# Groups: group [2]
group y
<fct> <dbl>
1 A 3.
2 B 3.
The rlang:: is unnecessary as of dplyr 0.7.5 which now exports ensym. I have included lionel's suggestion regarding using ensym here rather than enquo, as the former guarantees that the value of x is a symbol (not an expression).
Unquoting not useful here (e.g. other dplyr examples), replacing quo_name(x) with !! x in the above produces the following error:
Error in ~response : object 'response' not found
Explanation
As per the accepted response, the underlying reason is that do does not evaluate the expression in the same environment that other dplyr functions (e.g. mutate) use.
I did not find this to be abundantly clear from either the documentation or the source code (e.g. compare the source for mutate and do for data.frames and follow Alice down the rabbit hole if you wish), but essentially - and this is probably nothing new to most;
do evaluates expressions in an environment whose parent is the calling environment, and attaches the current group (slice) of the data.frame to the symbol ., and;
other dplyr functions 'more or less' evaluate the expressions in the environment of the data.frame with parent being the calling environment.
See also Advanced R. 22. Evaluation for a description in terms of 'data masking'.
This is because of regular do() semantics where there is no data masking apart from .:
do(df, data.frame(y = sum(.$response)))
#> y
#> 1 6
do(df, data.frame(y = sum(.[[response]])))
#> Error: object 'response' not found
So you just need to capture the bare column name as a string and there is no need to unquote since there is no data masking:
sum_with_do <- function(df, x, ...) {
# ensym() guarantees that `x` is a simple column name and not a
# complex expression:
x <- as.character(ensym(x))
df %>%
group_by(...) %>%
do(data.frame(y = sum(.[[x]])))
}
In the official docs, it says:
substitute returns the parse tree for the (unevaluated) expression
expr, substituting any variables bound in env.
quote simply returns its argument. The argument is not evaluated and
can be any R expression.
But when I try:
> x <- 1
> substitute(x)
x
> quote(x)
x
It looks like both quote and substitute returns the expression that's passed as argument to them.
So my question is, what's the difference between substitute and quote, and what does it mean to "substituting any variables bound in env"?
Here's an example that may help you to easily see the difference between quote() and substitute(), in one of the settings (processing function arguments) where substitute() is most commonly used:
f <- function(argX) {
list(quote(argX),
substitute(argX),
argX)
}
suppliedArgX <- 100
f(argX = suppliedArgX)
# [[1]]
# argX
#
# [[2]]
# suppliedArgX
#
# [[3]]
# [1] 100
R has lazy evaluation, so the identity of a variable name token is a little less clear than in other languages. This is used in libraries like dplyr where you can write, for instance:
summarise(mtcars, total_cyl = sum(cyl))
We can ask what each of these tokens means: summarise and sum are defined functions, mtcars is a defined data frame, total_cyl is a keyword argument for the function summarise. But what is cyl?
> cyl
Error: object 'cyl' not found
It isn't anything! Well, not yet. R doesn't evaluate it right away, but treats it as an expression to be parsed later with some parse tree that is different than the global environment your command line is working in, specifically one where the columns of mtcars are defined. Somewhere in the guts of dplyr, something like this is happening:
> substitute(cyl, mtcars)
[1] 6 6 4 6 8 ...
Suddenly cyl means something. That's what substitute is for.
So what is quote for? Well sometimes you want your lazily-evaluated expression to be represented somewhere else before it's evaluated, i.e. you want to display the actual code you're writing without any (or only some) values substituted. The docs you quoted explain this is common for "informative labels for data sets and plots".
So, for example, you could create a quoted expression, and then both print the unevaluated expression in your chart to show how you calculated and actually calculate with the expression.
expr <- quote(x + y)
print(expr) # x + y
eval(expr, list(x = 1, y = 2)) # 3
Note that substitute can do this expression trick also while giving you the option to parse only part of it. So its features are a superset of quote.
expr <- substitute(x + y, list(x = 1))
print(expr) # 1 + y
eval(expr, list(y = 2)) # 3
Maybe this section of the documentation will help somewhat:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
Note the final bit, and consider this example:
e <- new.env()
assign(x = "a",value = 1,envir = e)
> substitute(a,env = e)
[1] 1
Compare that with:
> quote(a)
a
So there are two basic situations when the substitution will occur: when we're using it on an argument of a function, and when env is some environment other than .GlobalEnv. So that's why you particular example was confusing.
For another comparison with quote, consider modifying the myplot function in the examples section to be:
myplot <- function(x, y)
plot(x, y, xlab = deparse(quote(x)),
ylab = deparse(quote(y)))
and you'll see that quote really doesn't do any substitution.
Regarding your question why GlobalEnv is treated as an exception for substitute, it is just a heritage of S. From The R language definition (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Substitutions):
The special exception for substituting at the top level is admittedly peculiar. It has been inherited from S and the rationale is most likely that there is no control over which variables might be bound at that level so that it would be better to just make substitute act as quote.
I have seen the use of %>% (percent greater than percent) function in some packages like dplyr and rvest. What does it mean? Is it a way to write closure blocks in R?
%...% operators
%>% has no builtin meaning but the user (or a package) is free to define operators of the form %whatever% in any way they like. For example, this function will return a string consisting of its left argument followed by a comma and space and then it's right argument.
"%,%" <- function(x, y) paste0(x, ", ", y)
# test run
"Hello" %,% "World"
## [1] "Hello, World"
The base of R provides %*% (matrix mulitiplication), %/% (integer division), %in% (is lhs a component of the rhs?), %o% (outer product) and %x% (kronecker product). It is not clear whether %% falls in this category or not but it represents modulo.
expm The R package, expm, defines a matrix power operator %^%. For an example see Matrix power in R .
operators The operators R package has defined a large number of such operators such as %!in% (for not %in%). See http://cran.r-project.org/web/packages/operators/operators.pdf
igraph This package defines %--% , %->% and %<-% to select edges.
lubridate This package defines %m+% and %m-% to add and subtract months and %--% to define an interval. igraph also defines %--% .
Pipes
magrittr In the case of %>% the magrittr R package has defined it as discussed in the magrittr vignette. See http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
magittr has also defined a number of other such operators too. See the Additional Pipe Operators section of the prior link which discusses %T>%, %<>% and %$% and http://cran.r-project.org/web/packages/magrittr/magrittr.pdf for even more details.
dplyr The dplyr R package used to define a %.% operator which is similar; however, it has been deprecated and dplyr now recommends that users use %>% which dplyr imports from magrittr and makes available to the dplyr user. As David Arenburg has mentioned in the comments this SO question discusses the differences between it and magrittr's %>% : Differences between %.% (dplyr) and %>% (magrittr)
pipeR The R package, pipeR, defines a %>>% operator that is similar to magrittr's %>% and can be used as an alternative to it. See http://renkun.me/pipeR-tutorial/
The pipeR package also has defined a number of other such operators too. See: http://cran.r-project.org/web/packages/pipeR/pipeR.pdf
postlogic The postlogic package defined %if% and %unless% operators.
wrapr The R package, wrapr, defines a dot pipe %.>% that is an explicit version of %>% in that it does not do implicit insertion of arguments but only substitutes explicit uses of dot on the right hand side. This can be considered as another alternative to %>%. See https://winvector.github.io/wrapr/articles/dot_pipe.html
Bizarro pipe. This is not really a pipe but rather some clever base syntax to work in a way similar to pipes without actually using pipes. It is discussed in http://www.win-vector.com/blog/2017/01/using-the-bizarro-pipe-to-debug-magrittr-pipelines-in-r/ The idea is that instead of writing:
1:8 %>% sum %>% sqrt
## [1] 6
one writes the following. In this case we explicitly use dot rather than eliding the dot argument and end each component of the pipeline with an assignment to the variable whose name is dot (.) . We follow that with a semicolon.
1:8 ->.; sum(.) ->.; sqrt(.)
## [1] 6
Update Added info on expm package and simplified example at top. Added postlogic package.
Update 2 The development version of R has defined a |> pipe. Unlike magrittr's %>% it can only substitute into the first argument of the right hand side. Although limited, it works via syntax transformation so it has no performance impact.
%>% is similar to pipe in Unix. For example, in
a <- combined_data_set %>% group_by(Outlet_Identifier) %>% tally()
the output of combined_data_set will go into group_by and its output will go into tally, then the final output is assigned to a.
This gives you handy and easy way to use functions in series without creating variables and storing intermediate values.
My understanding after reading the link offered by G.Grothendieck is that %>% is an operator that pipes functions. This helps readability and productivity as it's easier to follow the flow of multiple functions through these pipes than going backwards when multiple function are nested.
The R packages dplyr and sf import the operator %>% from the R package magrittr.
Help is available by using the following command:
?'%>%'
Of course the package must be loaded before by using e.g.
library(sf)
The documentation of the magrittr forward-pipe operator gives a good example:
When functions require only one argument, x %>% f is equivalent to f(x)
Another usage for %---% is the use of %<-% which means a multi-assignment operator for example:
session <- function(){
x <- 1
y <- 2
z <- y + x
list(x,y,z)
}
c(var1,var2,result) %<-% session()
I don't know much about it but I have seen it in one case study during the study of Multivariate Normal Distribution in R in my college
suppose you have a data frame in a variable called "df_gather" and you want to pipe it into a ggplot then you can use that %>%
EG:
df_gather %>% ggplot(aes(x = Value, fill = Variable, color = Variable))+
geom_density(alpha = 0.3)+ggtitle('Distibution of X')
This question already has answers here:
Is it possible to get F#'s function application "|>" operator in R? [duplicate]
(2 answers)
Closed 7 years ago.
How can you implement F#'s forward pipe operator in R? The operator makes it possible to easily chain a sequence of calculations. For example, when you have an input data and want to call functions foo and bar in sequence, you can write:
data |> foo |> bar
Instead of writing bar(foo(data)). The benefits are that you avoid some parentheses and the computations are written in the same order in which they are executed (left-to-right). In F#, the operator is defined as follows:
let (|>) a f = f a
It would appear that %...% can be used for binary operators, but how would this work?
I don't know how well it would hold up to any real use, but this seems (?) to do what you want, at least for single-argument functions ...
> "%>%" <- function(x,f) do.call(f,list(x))
> pi %>% sin
[1] 1.224606e-16
> pi %>% sin %>% cos
[1] 1
> cos(sin(pi))
[1] 1
For what it's worth, as of now (3 December 2021), in addition to the magrittr/tidyverse pipe (%>%), there also a native pipe |> in R (and an experimental => operator that can be enabled in the development version): see here, for example.
Edit: package now on CRAN. Example included.
The magrittr package is made for this.
install.packages("magrittr")
Example:
iris %>%
subset(Sepal.Length > 5) %>%
aggregate(. ~ Species, ., mean)
Also, see the vignette:http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
It has quite a few useful features if you like the F# pipe, and who doesn't?!
The problem is that you are talking about entirely different paradigms of calling functions so it's not really clear what you want. R only uses what in F# would be tuple arguments (named in R), so one way to think of it is trivially
fp = function(x, f) f(x)
which will perform the call so for example
> fp(4, print)
[1] 4
This is equivalent, but won't work in non-tupple case like 4 |> f x y because there is no such thing in R. You could try to emulate the F# functional behavior, but it would be awkward:
fp = function(x, f, ...) function(...) f(x, ...)
That will be always functional and thus chaining will work so for example
> tri = function(x, y, z) paste(x,y,z)
> fp("foo", fp("mar", tri))("bar")
[1] "mar foo bar"
but since R doesn't convert incomplete calls into functions it's not really useful. Instead, R has much more flexible calling based on the tuple concept. Note that R uses a mixture of functional and imperative paradigm, it is not purely functional so it doesn't perform argument value matching etc.
Edit: since you changed the question in that you are interested in syntax and only a special case, just replace fp above with the infix notation:
`%>%` = function(x, f) f(x)
> 1:10 %>% range %>% mean
[1] 5.5
(Using Ben's operator ;))