Using a binary operator in a lazyeval call with rlang

Let's say I want to add 1 to every value of a column using dplyr and standard evaluation.
I can do:
library(dplyr)
data <- head(iris)
var <- "Sepal.Length"
mutate(data, !!rlang::sym(var) := !!quo(`+`(!!rlang::sym(var), 1)))
But what if I would like to use + as a binary (infix) operator rather than as a function?
I can't figure out how to write the + with a symbol in a quosure.
In most of my attempts I got an error for trying to use a non-numeric argument (the symbol for example) with the binary operator +.
With the deprecated mutate_() you could use lazyeval::interp(), which made this easy:
mutate_(data, .dots = setNames(list(lazyeval::interp(~var + 1, var = as.symbol(var))), var))
Any help would be appreciated. Thanks.

You can just use
mutate(data, !!rlang::sym(var) := (!!rlang::sym(var)) + 1)
Note the parentheses around the bang-bang part. They are only necessary because you are probably using an older version of rlang. In older versions (< 0.2), !! has a very low precedence, so the addition happens before the expansion. Starting with rlang 0.2, !! has been given a different operator precedence and behaves more as you might expect.
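The expansion described above can be inspected directly by building the expression with rlang::expr() instead of handing it to mutate(). A minimal sketch, assuming rlang >= 0.2 (where !! binds tightly); it reuses data and var from the question:

```r
library(rlang)

data <- head(iris)
var <- "Sepal.Length"

# With rlang >= 0.2, !! applies only to sym(var); the "+ 1" stays
# part of the captured expression instead of being evaluated early.
e <- expr(!!sym(var) + 1)
print(e)

# Evaluate the expression with the data frame's columns in scope
res <- eval_tidy(e, data = data)
```

With a current rlang, the unparenthesized form mutate(data, !!sym(var) := !!sym(var) + 1) expands in the same way.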
Of course, if you are applying the same transformation to a bunch of columns, you might want to use the mutate_at(), mutate_all(), or mutate_if() variants, which also allow the transformation to be specified with the formula syntax:
mutate_if(data, is.numeric, ~ .x + 1)
mutate_all(data, ~ .x + 1)  # note: this also hits the Species factor, giving NA with a warning
mutate_at(data, var, ~ .x + 1)
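Since dplyr 1.0.0 the scoped variants (mutate_at(), mutate_all(), mutate_if()) are superseded by across(). A sketch of the same transformations, assuming dplyr >= 1.0.0; out_if and out_at are just illustrative names:

```r
library(dplyr)

data <- head(iris)
var <- "Sepal.Length"

# Equivalent of mutate_if(data, is.numeric, ~ .x + 1): the factor column is skipped
out_if <- mutate(data, across(where(is.numeric), ~ .x + 1))

# Equivalent of mutate_at(data, var, ~ .x + 1); all_of() accepts a character vector
out_at <- mutate(data, across(all_of(var), ~ .x + 1))
```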

Related

Explanation of rlang operators used to write functions

I recently posted two questions (1, 2) related to functions I was trying to write. I received useful answers to each, which resulted in the following two functions:
library(dplyr)    # %>%
library(janitor)  # tabyl(), adorn_percentages(), adorn_pct_formatting(), adorn_ns()
library(rlang)    # sym()
second_table <- function(dat, variable1, variable2){
  dat %>%
    tabyl({{variable1}}, {{variable2}}, show_na = FALSE) %>%
    adorn_percentages("row") %>%
    adorn_pct_formatting(digits = 1) %>%
    adorn_ns()
}
And
second_table2 = function(dat, variable1, variable2){
  variable1 <- sym(variable1)
  dat %>%
    tabyl(!!variable1, {{variable2}}, show_na = FALSE) %>%
    adorn_percentages("row") %>%
    adorn_pct_formatting(digits = 1) %>%
    adorn_ns()
}
These functions work as intended, but I had never used the rlang package before and am still confused about the difference between the {{}} operator and !! + sym() after looking through the available documentation and writing some additional functions. I don't like to use code that I don't fully understand and am sure I will have further use for these rlang operators in the future, so would greatly appreciate a plain-language explanation of what the difference is between these operators.
R has a particular feature called non-standard evaluation (NSE), where expressions are used as-is instead of being evaluated. Most people first encounter NSE when they load packages:
a <- "rlang"
print(a) # Standard evaluation - the expression a is evaluated to its value
# [1] "rlang"
library(a) # Non-standard evaluation - the expression a is used as-is
# Error in library(a) : there is no package called ‘a’
rlang enables sophisticated NSE by providing three main functions to capture unevaluated symbols and expressions:
sym("x") builds a symbol (i.e., a variable name, column name, etc.) from a string. Note that sym() evaluates its argument, so sym(x) uses the string stored in x, not the name x itself.
expr(a + b) captures arbitrary expressions
quo(a + b) captures arbitrary expressions AND the environment where these expressions were defined.
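A short sketch of the three functions side by side, assuming a current rlang; the column name "mpg" is only an example:

```r
library(rlang)

x <- "mpg"
s <- sym(x)            # sym() evaluates its argument: s is the symbol mpg, not x
e <- expr(mean(!!s))   # captured call: mean(mpg)
q <- quo(mean(!!s))    # the same call, bundled with the environment it was built in
```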
The difference between expressions and quosures is that an expression is evaluated in whatever environment evaluates it (here, the body of f, where eval_tidy() is called), while a quosure is always evaluated in the environment where it was captured:
library(rlang)
f <- function(e) {a <- 2; b <- 3; eval_tidy(e)}
a <- 5; b <- 10
f(expr(a+b)) # Evaluated inside f
# [1] 5
f(quo(a+b)) # Evaluated in the environment where it is captured
# [1] 15
All three verbs have en-equivalents: ensym, enexpr and enquo. These are used to capture symbols and expressions provided to a function from within that function. This is useful when you want to remove the need for a user of the function to use sym, etc. themselves:
f <- function(x) {enexpr(x)} # Expression captured within a function
f(a+b)
# This has exact equivalence to
f <- function(x) {x}
f(expr(a+b)) # The user has to do the capture themselves
In all cases, the operator !! (bang-bang, or unquote) evaluates its operand immediately and inserts the result into the expression being captured. Think of it as eval() on steroids: it runs at capture time, before the surrounding expression is quoted. Among other things, this can be useful for iterative construction of more complicated expressions:
a <- expr(b + 2)
expr(d * !!a) # a is evaluated immediately
# d * (b + 2)
expr(d * eval(a)) # evaluation of a is delayed
# d * eval(a)
With all that said, inside a function {{x}} is shorthand notation for !!enquo(x).
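To connect this back to second_table() and second_table2(): {{ }} and !!enquo() expect the caller to pass a bare column name, while !!sym() expects a string. A minimal sketch, assuming dplyr and rlang >= 0.4.0 (for {{ }}); the three helper names are hypothetical:

```r
library(dplyr)

# 1. {{ }}: the caller passes a bare column name
mean_curly <- function(data, col) {
  summarise(data, m = mean({{ col }}))
}

# 2. !! + enquo(): the long form that {{ }} abbreviates
mean_enquo <- function(data, col) {
  col <- rlang::enquo(col)
  summarise(data, m = mean(!!col))
}

# 3. !! + sym(): the caller passes the column name as a string
mean_string <- function(data, col) {
  col <- rlang::sym(col)
  summarise(data, m = mean(!!col))
}

r1 <- mean_curly(mtcars, mpg)
r2 <- mean_enquo(mtcars, mpg)
r3 <- mean_string(mtcars, "mpg")
```

All three produce the same result. The choice is about the interface: use sym() when column names arrive as strings (e.g. from a loop over names()), and {{ }} when users type column names directly.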

How to use rlang operators in a package?

I am writing a package that uses tidyverse functions, that is, functions that use non-standard evaluation, such as dplyr::filter():
setMethod("filter_by_id",
signature(x = "studies", id = "character"),
definition = function(x, id) {
study_id <- rlang::expr(study_id)
lst <- purrr::map(s4_to_list(x), ~ dplyr::filter(.x, !!study_id %in% id))
y <- list_to_s4(lst, "studies")
return(y)
})
I am using the !! operator (and will probably use a few others from the rlang package), and I am wondering if I need to import it explicitly, as with the pipe operator %>%, as explained in this question: R: use magrittr pipe operator in self written package.
Is there something equivalent to usethis::use_pipe() but for the operators from rlang?
According to Hadley, the !! operator is more like a polite fiction than an actual operator, which is why you don't need to import it.
So far we have acted as if !! and !!! are regular prefix operators like + , -, and !.
They’re not. From R’s perspective, !! and !!! are simply the repeated application of !:
!!TRUE
#> [1] TRUE
!!!TRUE
#> [1] FALSE
Once an rlang function detects this "operator", it treats it differently to perform the necessary tidy evaluation (which is why the operator is only useful in an rlang context):
!! and !!! behave specially inside all quoting functions powered by rlang, where they behave like real operators with precedence equivalent to unary + and -.
This is why you only need to import the rlang function you want, because the logic for dealing with !! lies inside rlang internals, not a separate function like the magrittr pipe.
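The "polite fiction" is easy to see with base R's quote(), which knows nothing about tidy evaluation: the parser turns !!x into two nested calls to the ordinary ! function. A small base-R sketch:

```r
# quote() is base R, so no rlang processing happens here:
# !!x is just `!` applied to the call !x.
e <- quote(!!x)
e[[1]]        # the outer function, `!`
e[[2]]        # its argument, the inner call !x
e[[2]][[2]]   # the innermost argument, the symbol x
```

In a package you would therefore import only the rlang functions you actually call, for example with a roxygen2 tag such as #' @importFrom rlang sym enquo, or call them with the rlang:: prefix.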

dplyr, rlang: Unable to predict if minor variants of passing names to nested dplyr functions will work

Data for reproducibility
library(dplyr)
.i <- tibble(a=2*1:4+1, b=2*1:4)
This function is supposed to take its data and other arguments as unquoted names, find those names in the data, and use them to add a column and filter out the top row. It does not work: mutate says it cannot find a.
t1 <- function(.j=.i, X=a, Y=b){
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=.j, pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t1(a,b)
This function, which I found by typo -- note the .i instead of .j in the mutate statement -- does what the previous function was supposed to do, and I don't know why. I think it is skipping over the function arguments and finding .i in the global environment. Or maybe it is using a Ouija board.
t2 <- function(.j=.i, X=a, Y=b){
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=.i, pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t2(a,b)
Since mutate could not find .j when it was passed in the usual R way, maybe it needs to be passed as an rlang-style quosure, like the formals X and Y. This function also does not work, with UQ in mutate saying that it cannot find a. Like the first function above, it works if the .j in mutate is replaced with .i. (It seems like there should be an "enquos" to parallel quos.)
t3 <- function(.j=.i, X=a, Y=b){
e_j <- enquo(.j)
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=UQ(.j), pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t3(a,b)
Finally, it appears that, once the .i substitution in mutate is made, t4() no longer needs a data argument at all. See below, where I replace it with bop_foo_foo. If, however, you replace bop_foo_foo throughout with the name of the data, .i, (t5()) then UQ again fails to find a.
bop_foo_foo <- 0
t4 <- function(bop_foo_foo, X=a, Y=b){
e_j <- enquo(bop_foo_foo)
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=UQ(.i), pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t4(a,b)
The functions above seem to me to be relatively minor variants of a single function. I have run dozens more, and although I have observed some patterns and read the enquo and UQ help files I do not know how many times, a real understanding continues to elude me.
I would like to know why the functions above that don't work don't, and why the ones that do work do. I don't necessarily need a function-by-function critique. If you can state general principles that embody the required understanding, that would be delightful, and more than sufficient.
I think it is skipping over the function arguments and finding .i in the global environment.
Yes. Symbol lookup in R is lexically scoped: variables local to a function are looked up first, then the environment in which the function was defined (the global environment for functions defined at top level), and so on up the chain of enclosing environments.
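A base-R sketch of the lookup that made t2() appear to work; t_like and t_local are hypothetical functions that mimic the shape of the question's code without dplyr:

```r
.i <- "a data frame in the global environment"

# .j is never used, so its default is never evaluated. .i has no local
# binding, so R looks it up in the enclosing environment and finds the global .i.
t_like <- function(.j = .i) {
  .i
}

# A local binding shadows the global one
t_local <- function() {
  .i <- "a local value"
  .i
}
```

Whatever is passed as .j, t_like() returns the global .i, which is why t2() and t4() ignore their data argument entirely.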
mutate(.data = UQ(.j), ...)
I think you are missing the difference between regular arguments and (quasi)quoted arguments. Unquoting is only relevant for quasiquoted arguments. Since the .data argument of mutate() is not quasiquoted it does not make sense to try and unquote stuff. The quasiquoted arguments are the ones that are captured/quoted with enexpr() or enquo(). You can tell whether an argument is quasiquoted either by looking at the documentation or by recognising that the argument supports direct references to columns (regular arguments need to be explicit about where to find the columns).
In the next version of rlang, the exported UQ() function will throw an error to make it clear that it should not be called directly and that it can only be used in quasiquoted arguments.
I would suggest:
Call the first argument of your function data or df rather than .i.
Don't give it a default. The user should always supply the data.
Don't capture it with enquo() or enexpr() or substitute(). Instead pass it directly to the data argument of other verbs.
Once this is out of the way it will be easier to work out the rest.
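Following the suggestions above, a minimal sketch of the question's function, assuming dplyr with rlang >= 0.4.0 so that {{ }} can replace enquo() plus UQ(); t_fixed is a hypothetical name. Note also that the question calls t1(a, b), which matches a positionally to .j, so mutate() tries to evaluate a as the data and fails.

```r
library(dplyr)

.i <- tibble(a = 2 * 1:4 + 1, b = 2 * 1:4)

t_fixed <- function(data, X, Y) {
  data %>%                          # data: a regular argument, passed through as-is
    mutate(pass = {{ X }} + 1) %>%  # X and Y: quasiquoted, forwarded with {{ }}
    filter({{ Y }} > 3)
}

out <- t_fixed(.i, a, b)
```

Here the data is evaluated like any ordinary R argument, and only X and Y are captured, which is exactly the split between regular and quasiquoted arguments described in the answer.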

Passing conditional expression into a user-defined function with dplyr (R)

I'm trying to make a function that subsets and mutates data with dplyr commands. My fake data is like this:
newTest_rv <- data.frame(is_op=c(rep(0,6),rep(1,4)),
has_click=c(0,0,1,1,1,1,0,0,1,1),
num_pimp=c(3,5,1,2,3,5,2,5,3,5),
freq = c(rep(1,5),5,1,2,1,2))
And my function is like this:
reweight <- function(data, conds){
require(dplyr)
require(lazyeval)
data %>%
filter_(lazy(conds)) %>%
group_by(num_pimp) %>%
mutate_(lazy(new_num) = lazy(num_pimp) - lazy(sum(freq[lazy(!conds)]))) %>%
mutate(new_weight=freq*(1/new_num)) %>%
ungroup()
}
> reweight(newTest_rv, is_op==0)
The non-standard evaluation with the conditional statement "is_op==0" seems to work in other places but not in the subset within a group "lazy(sum(freq[lazy(!conds)]))". Is there any way I can circumvent this problem?
Thank you!
It looks like you went a bit overboard with the lazys. The lazy() function creates a lazy object, which basically delays evaluation of an expression. You can't just compose standard expressions and lazy expressions. Generally you combine them via lazyeval's interp() function. I think what you want is
mutate_(new_num = interp(~num_pimp - sum(freq[!(x)]), x=lazy(conds)))
Here we use interp() to take a standard expression (in this case one that uses the formula syntax) and insert the lazy expression as a subsetting vector.
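lazyeval and the underscored verbs (filter_(), mutate_()) have since been deprecated. A sketch of the same pipeline with tidy evaluation, assuming dplyr with rlang >= 0.4.0 for {{ }}. It keeps the question's order of operations (filter first), so within the filtered rows !conds is always FALSE and the subtracted sum is 0; move the filter below the first mutate() if that was not the intent.

```r
library(dplyr)

newTest_rv <- data.frame(is_op = c(rep(0, 6), rep(1, 4)),
                         has_click = c(0, 0, 1, 1, 1, 1, 0, 0, 1, 1),
                         num_pimp = c(3, 5, 1, 2, 3, 5, 2, 5, 3, 5),
                         freq = c(rep(1, 5), 5, 1, 2, 1, 2))

reweight <- function(data, conds) {
  data %>%
    filter({{ conds }}) %>%
    group_by(num_pimp) %>%
    # {{ conds }} can be negated and used as a subsetting vector within each group
    mutate(new_num = num_pimp - sum(freq[!({{ conds }})])) %>%
    mutate(new_weight = freq * (1 / new_num)) %>%
    ungroup()
}

res <- reweight(newTest_rv, is_op == 0)
```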

What does %>% function mean in R?

I have seen the use of %>% (percent greater than percent) function in some packages like dplyr and rvest. What does it mean? Is it a way to write closure blocks in R?
%...% operators
%>% has no built-in meaning, but the user (or a package) is free to define operators of the form %whatever% in any way they like. For example, this function will return a string consisting of its left argument followed by a comma and space and then its right argument.
"%,%" <- function(x, y) paste0(x, ", ", y)
# test run
"Hello" %,% "World"
## [1] "Hello, World"
Base R provides %*% (matrix multiplication), %/% (integer division), %in% (is the lhs a component of the rhs?), %o% (outer product) and %x% (Kronecker product). It is not clear whether %% falls in this category or not, but it represents modulo.
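A quick base-R illustration of the built-in %...% operators listed above:

```r
7 %/% 2                           # integer division: 3
7 %% 2                            # modulo: 1
3 %in% c(1, 3, 5)                 # membership test: TRUE
m <- matrix(1:4, nrow = 2)
p <- m %*% c(1, 1)                # matrix multiplication: a 2 x 1 matrix
o <- 1:2 %o% 1:3                  # outer product: a 2 x 3 matrix
k <- diag(2) %x% matrix(1, 2, 2)  # Kronecker product: a 4 x 4 matrix
```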
expm The R package, expm, defines a matrix power operator %^%. For an example see Matrix power in R.
operators The operators R package has defined a large number of such operators such as %!in% (for not %in%). See http://cran.r-project.org/web/packages/operators/operators.pdf
igraph This package defines %--%, %->% and %<-% to select edges.
lubridate This package defines %m+% and %m-% to add and subtract months and %--% to define an interval. igraph also defines %--% .
Pipes
magrittr In the case of %>% the magrittr R package has defined it as discussed in the magrittr vignette. See http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
magrittr has also defined a number of other such operators too. See the Additional Pipe Operators section of the prior link, which discusses %T>%, %<>% and %$%, and http://cran.r-project.org/web/packages/magrittr/magrittr.pdf for even more details.
dplyr The dplyr R package used to define a %.% operator which is similar; however, it has been deprecated, and dplyr now recommends that users use %>%, which dplyr imports from magrittr and makes available to the dplyr user. As David Arenburg has mentioned in the comments, this SO question discusses the differences between it and magrittr's %>%: Differences between %.% (dplyr) and %>% (magrittr)
pipeR The R package, pipeR, defines a %>>% operator that is similar to magrittr's %>% and can be used as an alternative to it. See http://renkun.me/pipeR-tutorial/
The pipeR package also has defined a number of other such operators too. See: http://cran.r-project.org/web/packages/pipeR/pipeR.pdf
postlogic The postlogic package defined %if% and %unless% operators.
wrapr The R package, wrapr, defines a dot pipe %.>% that is an explicit version of %>% in that it does not do implicit insertion of arguments but only substitutes explicit uses of dot on the right hand side. This can be considered as another alternative to %>%. See https://winvector.github.io/wrapr/articles/dot_pipe.html
Bizarro pipe. This is not really a pipe but rather some clever base syntax to work in a way similar to pipes without actually using pipes. It is discussed in http://www.win-vector.com/blog/2017/01/using-the-bizarro-pipe-to-debug-magrittr-pipelines-in-r/ The idea is that instead of writing:
1:8 %>% sum %>% sqrt
## [1] 6
one writes the following. In this case we explicitly use dot rather than eliding the dot argument, and end each component of the pipeline with an assignment to the variable whose name is dot (.). We follow that with a semicolon.
1:8 ->.; sum(.) ->.; sqrt(.)
## [1] 6
Update Added info on expm package and simplified example at top. Added postlogic package.
Update 2 R 4.1.0 introduced a native |> pipe. Unlike magrittr's %>%, it originally could only substitute into the first argument of the right-hand side (R 4.2.0 added a _ placeholder that can be used with a named argument). Although limited, it works via syntax transformation, so it has no performance impact.
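The "syntax transformation" can be checked with quote(): the parser rewrites the pipe before anything is evaluated. A small sketch, assuming R >= 4.1.0; f, x and y are placeholder names that are never evaluated:

```r
# The parser turns x |> f(y) into f(x, y); no pipe function exists at run time
same <- identical(quote(x |> f(y)), quote(f(x, y)))

# The bizarro-pipe example from above, written with the native pipe
res <- 1:8 |> sum() |> sqrt()
```

Unlike %>%, the right-hand side of |> must be written as a call with parentheses (sum(), not sum).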
%>% is similar to a pipe in Unix shells. For example, in
a <- combined_data_set %>% group_by(Outlet_Identifier) %>% tally()
combined_data_set is passed into group_by(), its output is passed into tally(), and the final result is assigned to a.
This gives you a handy and easy way to apply functions in series without creating variables to store intermediate values.
My understanding after reading the link offered by G.Grothendieck is that %>% is an operator that pipes functions. This helps readability and productivity, as it's easier to follow the flow of multiple functions through these pipes than to read backwards when multiple functions are nested.
The R packages dplyr and sf import the operator %>% from the R package magrittr.
Help is available by using the following command:
?'%>%'
Of course, the package must be loaded first, e.g. with
library(sf)
The documentation of the magrittr forward-pipe operator gives a good example:
When functions require only one argument, x %>% f is equivalent to f(x)
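A short sketch of the main magrittr rules, assuming the magrittr package is installed: the left-hand side becomes the first argument unless the dot placeholder appears as a direct argument.

```r
library(magrittr)

x <- c(1, 4, 9)
a <- x %>% sqrt                 # one-argument function: same as sqrt(x)
b <- x %>% sum(10)              # lhs inserted as the first argument: sum(x, 10)
d <- 2 %>% seq(1, 10, by = .)   # dot as a direct argument: seq(1, 10, by = 2)
```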
Another example of a %...% operator is %<-%, a multiple-assignment operator, for example:
library(zeallot)  # provides %<-% with this c(...) destructuring syntax
session <- function(){
  x <- 1
  y <- 2
  z <- y + x
  list(x,y,z)
}
c(var1,var2,result) %<-% session()
I don't know much about it, but I have seen it in a case study on the multivariate normal distribution in R at my college.
Suppose you have a data frame in a variable called df_gather and you want to pipe it into ggplot(); then you can use %>%.
For example:
library(dplyr)    # %>%
library(ggplot2)
df_gather %>% ggplot(aes(x = Value, fill = Variable, color = Variable)) +
  geom_density(alpha = 0.3) + ggtitle('Distribution of X')
