behavior of pipe operator in compostie functions purrr - dplyr - r

usage of pipe operator in purrr-dplyr packages is (in short) defined as follows:
y%>%f(x,.,z) is the same as f(x,y,z)
I am trying to do the following task using pipe operator. First I show you the task without using pipes:
#####for reproducibility
set.seed(50)
z0<-factor(sample(c(letters[1:3],NA),100,replace = T))
###the task
rep(1,length(table(z0)))
now I want to do this using pipes:
z0%>%table%>%rep(1,length(.))
however the result is not the same. It seems that pipe operator cannot handle the proper assignation to a composition of functions. That is
y%>%f(x,g(.)) should be the same as f(x,g(y))
so, the concrete question is if ti is possible to do
y%>%f(x,g(.))
Thank you in advance for your comments.

The %>% implements a first argument rule, that is, it passes the previous data as first argument to the function if . is not a direct argument; In your second case, the argument to rep is 1 and length(.), so the first argument rule takes effect; To avoid this, use {} to enclose the expression; You can read more about this at Re-using the placeholder for attributes:
Re-using the placeholder for attributes
It is straight-forward to use the placeholder several times in a
right-hand side expression. However, when the placeholder only appears
in a nested expressions magrittr will still apply the first-argument
rule. The reason is that in most cases this results more clean code.
x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = ncol(x))
The behavior can be overruled by enclosing the right-hand side in
braces:
x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = ncol(x))
rep(1,length(table(z0)))
# [1] 1 1 1
Equivalent would be:
z0 %>% table %>% {rep(1,length(.))}
# [1] 1 1 1

Related

Evaluation of one input in a function with multiple inputs in R

I am trying to do the following but can't figure it out. Could someone please help me?
f <- expression(x^3+4*y)
df <- D(f,'x')
x <-0
df0 <- eval(df)
df0 should be a function of y!
If you take the derivative of f with respect to x you get 3 * x^2. The 4*y is a constant as far as x is concerned. So you don't have a function of y as such, your df is a constant as far as y is concerned (although it is a function of x).
Assigning to x doesn't change df; it remains the expression 3 * x^2 and is still a function of x if you wanted to treat it as such.
If you want to substitute a variable in an expression, then substitute() is what you are looking for.
> substitute(3 * x^2, list(x = 0))
3 * 0^2
It is a blind substitute with no simplification of the expression--we probably expected zero here, but we get zero times 3--but that is what you get.
Unfortunately, substituting in an expression you have in a variable is a bit cumbersome, since substitute() thinks its first argument is the verbatim expression, so you get
> substitute(df, list(x = 0))
df
The expression is df, there is no x in that so nothing is substituted, and you just get df back.
You can get around that with two substitutions and an eval:
> df0 <- eval(
+ substitute(substitute(expr, list(x = 0)),
+ list(expr = df)))
> df0
3 * 0^2
> eval(df0)
[1] 0
The outermost substitute() puts the value of df into expr, so you get the right expression there, and the inner substitute() changes the value of x.
There are nicer functions for manipulating expressions in the Tidyverse, but I don't remember them off the top of my head.

Explanation of rlang operators used to write functions

I recently posted two questions (1, 2) related to functions I was trying to write. I received useful answers to each, which resulted in the following two functions:
second_table <- function(dat, variable1, variable2){
dat %>%
tabyl({{variable1}}, {{variable2}}, show_na = FALSE) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns()
}
And
second_table2 = function(dat, variable1, variable2){
variable1 <- sym(variable1)
dat %>%
tabyl(!!variable1, {{variable2}}, show_na = FALSE) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits = 1) %>%
adorn_ns()
}
These functions work as intended, but I had never used the rlang package before and am still confused about the difference between the {{}} operator and !! + sym() after looking through the available documentation and writing some additional functions. I don't like to use code that I don't fully understand and am sure I will have further use for these rlang operators in the future, so would greatly appreciate a plain-language explanation of what the difference is between these operators.
R has a particular feature called non-standard evaluation (NSE), where expressions are used as-is instead of being evaluated. Most people first encounter NSE when they load packages:
a <- "rlang"
print(a) # Standard evaluation - the expression a is evaluated to its value
# [1] "rlang"
library(a) # Non-standard evaluation - the expression a is used as-is
# Error in library(a) : there is no package called ‘a’
rlang enables sophisticated NSE by providing three main functions to capture unevaluated symbols and expressions:
sym("x") captures a symbol (i.e., variable name, column name, etc.). Older versions allowed for sym(x), but I think the latest version of rlang forces the input to be a string.
expr(a + b) captures arbitrary expressions
quo(a + b) captures arbitrary expressions AND the environment where these expression were defined.
The difference between expressions and quosures is that evaluating the former will be done in the immediate environment, while the latter is always evaluated in the environment where the expression was captured:
f <- function(e) {a <- 2; b <- 3; eval_tidy(e)}
a <- 5; b <- 10
f(expr(a+b)) # Evaluated inside f
# [1] 5
f(quo(a+b)) # Evaluated in the environment where it is captured
# [1] 15
All three verbs have en-equivalents: ensym, enexpr and enquo. These are used to capture symbols and expressions provided to a function from within that function. This is useful when you want to remove the need for a user of the function to use sym, etc. themselves:
f <- function(x) {enexpr(x)} # Expression captured within a function
f(a+b)
# This has exact equivalence to
f <- function(x) {x}
f(expr(a+b)) # The user has to do the capture themselves
In all cases, the operator !! evaluates symbols and expressions. Think of it as eval() on steroids, because !! forces immediate evaluation that takes precedence over everything else. Among other things, this can be useful for iterative construction of more complicated expressions:
a <- expr(b + 2)
expr(d * !!a) # a is evaluated immediately
# d * (b + 2)
expr(d * eval(a)) # evaluation of a is delayed
# d * eval(a)
With all that said, {{x}} is shorthand notation for !!enquo(x)

Would like to know how to properly use `%>%` in `mapply`?

Would like to know how to combine %>% with mapply in a proper way.
here is the toy sample
A = data.table(a = letters[1:3], b = 3:1)
mapply(function(x, y) str_c(x,"---", y), A$a, A$b)
it gives a named character vector as following
a b c
"a---3" "b---2" "c---1"
However, it generates a new variables which I try hard to avoid, and would like to make it in this form:
A %>% mapply(function(x, y) str_c(x,"---", y), .$a, .$b)
but the result is
object '.' of mode 'function' was not found
Please advise how I can make it?
To explain why your code isn’t working, you need to know that writing obj %>% f(args) always inserts obj as the first argument in the call to f, unless you use . on its own as another argument. In other words, it’s equivalent to
obj %>% f(., args)
Since you don’t use . on its own as an argument (even though you use .$a and .%b), your call is equivalent to
A %>% mapply(., function(x, y) str_c(x,"---", y), .$a, .$b)
… which doesn’t work, since mapply expects its first argument to be a function.
As Ronak’s answer shows, to circumvent this you can put f(args) into {…}:
obj %>% {f(args)}
This syntactic form explicitly disables the rule about inserting . as explained above. It’s a special case defined for this purpose.
Alternatively, you could use another pipe operator from ‘magrittr’, the exposition pipe, %$%. This one works differently: it pulls out named components from the left-hand expression. That way, you could write
A %$% mapply(function(x, y) str_c(x,"---", y), a, b)
This doesn't seem to right place to use %>%. If you need it to for learning purpose use :
A %>% {mapply(function(x, y) stringr::str_c(x,"---", y), .$a, .$b)}

R dplyr mutate colnames [duplicate]

What are the differences between the assignment operators = and <- in R?
I know that operators are slightly different, as this example shows
x <- y <- 5
x = y = 5
x = y <- 5
x <- y = 5
# Error in (x <- y) = 5 : could not find function "<-<-"
But is this the only difference?
The difference in assignment operators is clearer when you use them to set an argument value in a function call. For example:
median(x = 1:10)
x
## Error: object 'x' not found
In this case, x is declared within the scope of the function, so it does not exist in the user workspace.
median(x <- 1:10)
x
## [1] 1 2 3 4 5 6 7 8 9 10
In this case, x is declared in the user workspace, so you can use it after the function call has been completed.
There is a general preference among the R community for using <- for assignment (other than in function signatures) for compatibility with (very) old versions of S-Plus. Note that the spaces help to clarify situations like
x<-3
# Does this mean assignment?
x <- 3
# Or less than?
x < -3
Most R IDEs have keyboard shortcuts to make <- easier to type. Ctrl + = in Architect, Alt + - in RStudio (Option + - under macOS), Shift + - (underscore) in emacs+ESS.
If you prefer writing = to <- but want to use the more common assignment symbol for publicly released code (on CRAN, for example), then you can use one of the tidy_* functions in the formatR package to automatically replace = with <-.
library(formatR)
tidy_source(text = "x=1:5", arrow = TRUE)
## x <- 1:5
The answer to the question "Why does x <- y = 5 throw an error but not x <- y <- 5?" is "It's down to the magic contained in the parser". R's syntax contains many ambiguous cases that have to be resolved one way or another. The parser chooses to resolve the bits of the expression in different orders depending on whether = or <- was used.
To understand what is happening, you need to know that assignment silently returns the value that was assigned. You can see that more clearly by explicitly printing, for example print(x <- 2 + 3).
Secondly, it's clearer if we use prefix notation for assignment. So
x <- 5
`<-`(x, 5) #same thing
y = 5
`=`(y, 5) #also the same thing
The parser interprets x <- y <- 5 as
`<-`(x, `<-`(y, 5))
We might expect that x <- y = 5 would then be
`<-`(x, `=`(y, 5))
but actually it gets interpreted as
`=`(`<-`(x, y), 5)
This is because = is lower precedence than <-, as shown on the ?Syntax help page.
What are the differences between the assignment operators = and <- in R?
As your example shows, = and <- have slightly different operator precedence (which determines the order of evaluation when they are mixed in the same expression). In fact, ?Syntax in R gives the following operator precedence table, from highest to lowest:
…
‘-> ->>’ rightwards assignment
‘<- <<-’ assignment (right to left)
‘=’ assignment (right to left)
…
But is this the only difference?
Since you were asking about the assignment operators: yes, that is the only difference. However, you would be forgiven for believing otherwise. Even the R documentation of ?assignOps claims that there are more differences:
The operator <- can be used anywhere,
whereas the operator = is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
Let’s not put too fine a point on it: the R documentation is wrong. This is easy to show: we just need to find a counter-example of the = operator that isn’t (a) at the top level, nor (b) a subexpression in a braced list of expressions (i.e. {…; …}). — Without further ado:
x
# Error: object 'x' not found
sum((x = 1), 2)
# [1] 3
x
# [1] 1
Clearly we’ve performed an assignment, using =, outside of contexts (a) and (b). So, why has the documentation of a core R language feature been wrong for decades?
It’s because in R’s syntax the symbol = has two distinct meanings that get routinely conflated (even by experts, including in the documentation cited above):
The first meaning is as an assignment operator. This is all we’ve talked about so far.
The second meaning isn’t an operator but rather a syntax token that signals named argument passing in a function call. Unlike the = operator it performs no action at runtime, it merely changes the way an expression is parsed.
So how does R decide whether a given usage of = refers to the operator or to named argument passing? Let’s see.
In any piece of code of the general form …
‹function_name›(‹argname› = ‹value›, …)
‹function_name›(‹args›, ‹argname› = ‹value›, …)
… the = is the token that defines named argument passing: it is not the assignment operator. Furthermore, = is entirely forbidden in some syntactic contexts:
if (‹var› = ‹value›) …
while (‹var› = ‹value›) …
for (‹var› = ‹value› in ‹value2›) …
for (‹var1› in ‹var2› = ‹value›) …
Any of these will raise an error “unexpected '=' in ‹bla›”.
In any other context, = refers to the assignment operator call. In particular, merely putting parentheses around the subexpression makes any of the above (a) valid, and (b) an assignment. For instance, the following performs assignment:
median((x = 1 : 10))
But also:
if (! (nf = length(from))) return()
Now you might object that such code is atrocious (and you may be right). But I took this code from the base::file.copy function (replacing <- with =) — it’s a pervasive pattern in much of the core R codebase.
The original explanation by John Chambers, which the the R documentation is probably based on, actually explains this correctly:
[= assignment is] allowed in only two places in the grammar: at the top level (as a complete program or user-typed expression); and when isolated from surrounding logical structure, by braces or an extra pair of parentheses.
In sum, by default the operators <- and = do the same thing. But either of them can be overridden separately to change its behaviour. By contrast, <- and -> (left-to-right assignment), though syntactically distinct, always call the same function. Overriding one also overrides the other. Knowing this is rarely practical but it can be used for some fun shenanigans.
Google's R style guide simplifies the issue by prohibiting the "=" for assignment. Not a bad choice.
https://google.github.io/styleguide/Rguide.xml
The R manual goes into nice detail on all 5 assignment operators.
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
x = y = 5 is equivalent to x = (y = 5), because the assignment operators "group" right to left, which works. Meaning: assign 5 to y, leaving the number 5; and then assign that 5 to x.
This is not the same as (x = y) = 5, which doesn't work! Meaning: assign the value of y to x, leaving the value of y; and then assign 5 to, umm..., what exactly?
When you mix the different kinds of assignment operators, <- binds tighter than =. So x = y <- 5 is interpreted as x = (y <- 5), which is the case that makes sense.
Unfortunately, x <- y = 5 is interpreted as (x <- y) = 5, which is the case that doesn't work!
See ?Syntax and ?assignOps for the precedence (binding) and grouping rules.
According to John Chambers, the operator = is only allowed at "the top level," which means it is not allowed in control structures like if, making the following programming error illegal.
> if(x = 0) 1 else x
Error: syntax error
As he writes, "Disallowing the new assignment form [=] in control expressions avoids programming errors (such as the example above) that are more likely with the equal operator than with other S assignments."
You can manage to do this if it's "isolated from surrounding logical structure, by braces or an extra pair of parentheses," so if ((x = 0)) 1 else x would work.
See http://developer.r-project.org/equalAssign.html
From the official R documentation:
The operators <- and = assign into the environment in which they
are evaluated. The operator <- can be used anywhere, whereas the
operator = is only allowed at the top level (e.g., in the
complete expression typed at the command prompt) or as one of the
subexpressions in a braced list of expressions.
This may also add to understanding of the difference between those two operators:
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
For the first element R has assigned values and proper name, while the name of the second element looks a bit strange.
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
R version 3.3.2 (2016-10-31); macOS Sierra 10.12.1
I am not sure if Patrick Burns book R inferno has been cited here where in 8.2.26 = is not a synonym of <- Patrick states "You clearly do not want to use '<-' when you want to set an argument of a function.". The book is available at https://www.burns-stat.com/documents/books/the-r-inferno/
There are some differences between <- and = in the past version of R or even the predecessor language of R (S language). But currently, it seems using = only like any other modern language (python, java) won't cause any problem. You can achieve some more functionality by using <- when passing a value to some augments while also creating a global variable at the same time but it may have weird/unwanted behavior like in
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
Highly recommended! Try to read this article which is the best article that tries to explain the difference between those two:
Check https://colinfay.me/r-assignment/
Also, think about <- as a function that invisibly returns a value.
a <- 2
(a <- 2)
#> [1] 2
See: https://adv-r.hadley.nz/functions.html

What's the difference between substitute and quote in R

In the official docs, it says:
substitute returns the parse tree for the (unevaluated) expression
expr, substituting any variables bound in env.
quote simply returns its argument. The argument is not evaluated and
can be any R expression.
But when I try:
> x <- 1
> substitute(x)
x
> quote(x)
x
It looks like both quote and substitute returns the expression that's passed as argument to them.
So my question is, what's the difference between substitute and quote, and what does it mean to "substituting any variables bound in env"?
Here's an example that may help you to easily see the difference between quote() and substitute(), in one of the settings (processing function arguments) where substitute() is most commonly used:
f <- function(argX) {
list(quote(argX),
substitute(argX),
argX)
}
suppliedArgX <- 100
f(argX = suppliedArgX)
# [[1]]
# argX
#
# [[2]]
# suppliedArgX
#
# [[3]]
# [1] 100
R has lazy evaluation, so the identity of a variable name token is a little less clear than in other languages. This is used in libraries like dplyr where you can write, for instance:
summarise(mtcars, total_cyl = sum(cyl))
We can ask what each of these tokens means: summarise and sum are defined functions, mtcars is a defined data frame, total_cyl is a keyword argument for the function summarise. But what is cyl?
> cyl
Error: object 'cyl' not found
It isn't anything! Well, not yet. R doesn't evaluate it right away, but treats it as an expression to be parsed later with some parse tree that is different than the global environment your command line is working in, specifically one where the columns of mtcars are defined. Somewhere in the guts of dplyr, something like this is happening:
> substitute(cyl, mtcars)
[1] 6 6 4 6 8 ...
Suddenly cyl means something. That's what substitute is for.
So what is quote for? Well sometimes you want your lazily-evaluated expression to be represented somewhere else before it's evaluated, i.e. you want to display the actual code you're writing without any (or only some) values substituted. The docs you quoted explain this is common for "informative labels for data sets and plots".
So, for example, you could create a quoted expression, and then both print the unevaluated expression in your chart to show how you calculated and actually calculate with the expression.
expr <- quote(x + y)
print(expr) # x + y
eval(expr, list(x = 1, y = 2)) # 3
Note that substitute can do this expression trick also while giving you the option to parse only part of it. So its features are a superset of quote.
expr <- substitute(x + y, list(x = 1))
print(expr) # 1 + y
eval(expr, list(y = 2)) # 3
Maybe this section of the documentation will help somewhat:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
Note the final bit, and consider this example:
e <- new.env()
assign(x = "a",value = 1,envir = e)
> substitute(a,env = e)
[1] 1
Compare that with:
> quote(a)
a
So there are two basic situations when the substitution will occur: when we're using it on an argument of a function, and when env is some environment other than .GlobalEnv. So that's why you particular example was confusing.
For another comparison with quote, consider modifying the myplot function in the examples section to be:
myplot <- function(x, y)
plot(x, y, xlab = deparse(quote(x)),
ylab = deparse(quote(y)))
and you'll see that quote really doesn't do any substitution.
Regarding your question why GlobalEnv is treated as an exception for substitute, it is just a heritage of S. From The R language definition (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Substitutions):
The special exception for substituting at the top level is admittedly peculiar. It has been inherited from S and the rationale is most likely that there is no control over which variables might be bound at that level so that it would be better to just make substitute act as quote.

Resources