In R for Data Science, Chapter 21.5.1, this syntax is used with the base function split(): split(.$cyl). Why the dot in .$cyl? The purrr package has a placeholder syntax (. or .x), but purrr is not involved here.
library(tidyverse)
mtcars %>% split(f=.$cyl)
The placeholder syntax used by purrr is also used by the magrittr pipe (%>%). By default, the pipe passes the left-hand side (LHS) as the first argument of the function on the right-hand side (RHS). When this is the case the . is not necessary in the RHS expression.
For instance:
mtcars %>% str()
works fine and is equivalent to:
mtcars %>% str(.)
The . is in this case totally unnecessary because the LHS (mtcars) is the first argument passed to str().
So this is the same as:
str(mtcars)
But in any other situation, you need to use . to mark where, in the RHS, the LHS should be passed.
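As a minimal sketch of the two situations (the values here are made up for illustration and are not from the question), the first call relies on the default first-argument behaviour, while the second needs . because the LHS belongs in a later argument:

```r
library(magrittr)

# LHS becomes the first argument: same as sqrt(c(1, 4, 9))
implicit <- c(1, 4, 9) %>% sqrt()

# LHS is needed elsewhere, so `.` marks the spot: same as rep("x", times = 3)
placed <- 3 %>% rep("x", times = .)

print(implicit)
print(placed)
```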
Your example is a little complex because the LHS (mtcars) is passed twice in the RHS (the function split()):
first, as the first argument (so no . needed)
then, again, as part of the 2nd argument (so you do need a . in that case).
mtcars %>% split(f = .$cyl)
could be written (though that is unnecessary) as:
mtcars %>% split(x = ., f = .$cyl)
and is thus in fact equivalent to:
split(x = mtcars, f = mtcars$cyl)
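To check the claimed equivalence rather than take it on trust, the three spellings can be compared directly. This is only a sketch; the variable names piped, explicit and direct are mine:

```r
library(magrittr)

piped    <- mtcars %>% split(f = .$cyl)         # `.` only inside .$cyl
explicit <- mtcars %>% split(x = ., f = .$cyl)  # `.` also given as x
direct   <- split(x = mtcars, f = mtcars$cyl)   # no pipe at all

print(identical(piped, explicit))
print(identical(piped, direct))
print(names(piped))  # one list element per value of cyl
```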
Related
Recently I have found the %$% pipe operator, but I am missing the point regarding its difference with %>% and if it could completely replace it.
Motivation to use %$%
The operator %$% could replace %>% in many cases:
mtcars %>% summary()
mtcars %$% summary(.)
mtcars %>% head(10)
mtcars %$% head(.,10)
Apparently, %$% is more usable than %>%:
mtcars %>% plot(.$hp, .$mpg) # Does not work
mtcars %$% plot(hp, mpg) # Works
Implicitly fills the built-in data argument:
mtcars %>% lm(mpg ~ hp, data = .)
mtcars %$% lm(mpg ~ hp)
Since % and $ are next to each other in the keyboard, inserting %$% is more convenient than inserting %>%.
Documentation
We find the following information in their respective help pages.
(?magrittr::`%>%`):
Description:
Pipe an object forward into a function or call expression.
Usage:
lhs %>% rhs
(?magrittr::`%$%`):
Description:
Expose the names in ‘lhs’ to the ‘rhs’ expression. This is useful
when functions do not have a built-in data argument.
Usage:
lhs %$% rhs
I was not able to understand the difference between the two pipe operators. What is the difference between piping an object and exposing a name? And in the rhs of %$%, we can still get the piped object with the ., right?
Should I start using %$% instead of %>%? Which problems could I face doing so?
In addition to the provided comments:
%$% also called the Exposition pipe vs. %>%:
This is a short summary of this article https://towardsdatascience.com/3-lesser-known-pipe-operators-in-tidyverse-111d3411803a
"The key difference in using %$% or %>% lies in the type of arguments of used functions."
The one advantage of %$% over %>%, and as far as I can tell the only one, is that
we can avoid repeating the data frame name in functions that have no data argument.
For example, lm() has a data argument. In this case we can use %>% and %$% interchangeably.
But in functions like the cor() which has no data argument:
mtcars %>% cor(disp, mpg) # Will give an Error
cor(mtcars$disp, mtcars$mpg)
is equivalent to
mtcars %$% cor(disp, mpg)
Note that to use the %$% pipe operator you have to load magrittr with library(magrittr).
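A small sketch of both cases, assuming magrittr is attached. The variable names are mine, and with() is included only as a familiar base R comparison:

```r
library(magrittr)

# cor() has no data argument, so %$% exposes the columns by name,
# much as with() does.
r_expose <- mtcars %$% cor(disp, mpg)
r_with   <- with(mtcars, cor(disp, mpg))
r_dollar <- cor(mtcars$disp, mtcars$mpg)

# lm() has a data argument, so either pipe can be used.
fit_pipe   <- mtcars %>% lm(mpg ~ hp, data = .)
fit_expose <- mtcars %$% lm(mpg ~ hp)

print(all.equal(r_expose, r_dollar))
print(all.equal(coef(fit_pipe), coef(fit_expose)))
```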
Update: on OPs comment:
Whichever pipe you use, it lets us turn machine-oriented code into something closer to readable human language.
ggplot2 is special. ggplot2 is not internally consistent.
ggplot1 had a tidier API than ggplot2.
Pipes would work with ggplot1:
library(ggplot1)
mtcars %>%
  ggplot(list(x = mpg, y = wt)) %>%
  ggpoint() %>%
  ggsave("mtcars.pdf", width = 8, height = 6)
In 2016 Hadley Wickham said:
"ggplot2 never would have existed if I'd discovered the pipe 10 years earlier!"
https://www.youtube.com/watch?v=K-ss_ag2k9E&list=LL&index=9
No, you shouldn't use %$% routinely. It is like using the with() function, i.e. it exposes the component parts of the LHS when evaluating the RHS. But it only works when the value on the left has names like a list or dataframe, so you can't always use it. For example,
library(magrittr)
x <- 1:10
x %>% mean()
#> [1] 5.5
x %$% mean()
#> Error in eval(substitute(expr), data, enclos = parent.frame()): numeric 'envir' arg not of length one
Created on 2022-02-06 by the reprex package (v2.0.1.9000)
You'd get a similar error with x %$% mean(.).
Even when the LHS has names, it doesn't automatically put the . argument in the first position. For example,
mtcars %>% nrow()
#> [1] 32
mtcars %$% nrow()
#> Error in nrow(): argument "x" is missing, with no default
In this case mtcars %$% nrow(.) would work, because mtcars has names.
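The two behaviours described above can be checked without stopping the script. This sketch wraps the failing call in try(); the names failed and n are mine:

```r
library(magrittr)

x <- 1:10

# An unnamed numeric vector has no names to expose, so %$% fails.
failed <- inherits(try(x %$% mean(), silent = TRUE), "try-error")

# A data frame has names, and `.` can still be written explicitly.
n <- mtcars %$% nrow(.)

print(failed)
print(n)
```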
Your example involving .$hp and .$mpg is illustrating one of the oddities of magrittr pipes. Because the . is only used in expressions, not alone as an argument, it is passed as the first argument as well as being passed in those expressions. You can avoid this using braces, e.g.
mtcars %>% {plot(.$hp, .$mpg)}
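To see the extra first argument without opening a plot window, here is a sketch with a hypothetical helper, count_args(), that simply reports how many arguments it received:

```r
library(magrittr)

# Hypothetical helper (not part of magrittr): counts the arguments it gets.
count_args <- function(...) length(list(...))

# `.` appears only inside .$hp and .$mpg, so mtcars is ALSO inserted first:
# the call is effectively count_args(mtcars, mtcars$hp, mtcars$mpg).
without_braces <- mtcars %>% count_args(.$hp, .$mpg)

# Braces turn the RHS into a plain expression: nothing is inserted.
with_braces <- mtcars %>% { count_args(.$hp, .$mpg) }

print(without_braces)
print(with_braces)
```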
I am trying to print a data sample of the mpg dataframe for each of the drv types in mpg ('f','r','4'). I tried to do it the following way using the walk function:
walk(unique(mpg$drv),~print(mpg %>% filter(drv == .) %>% head()))
But the result for each of the drive types using this method is an empty tibble.
The following method works perfectly but I wanted to understand what's wrong with the previous one.
walk(drives,~print(mpg[mpg$drv == .,] %>% head()))
When you use pipes, . refers to the object coming from the LHS (left-hand side) of the pipe. In this case that is the data frame mpg, whereas what you need for the filter is the current value of drv. You can use .x to refer to it.
library(dplyr)
library(purrr)
walk(unique(mpg$drv),~print(mpg %>% filter(drv == .x) %>% head()))
It works in the second case because no pipe comes before the . there, so . still refers to the current value of drv.
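The same idea as a sketch on mtcars (so that ggplot2's mpg data is not needed), with map_int() in place of walk() so the result can be inspected. It also shows an ordinary function with a named argument, which sidesteps the . clash entirely:

```r
library(dplyr)
library(purrr)

cyl_values <- sort(unique(mtcars$cyl))

# `.x` is the lambda's argument; the pipe only rebinds `.`, not `.x`.
rows_lambda <- map_int(cyl_values, ~ mtcars %>% filter(cyl == .x) %>% nrow())

# A named argument (here `v`, my choice) cannot be confused with the pipe's dot.
rows_named <- map_int(cyl_values, function(v) mtcars %>% filter(cyl == v) %>% nrow())

print(rows_lambda)
print(identical(rows_lambda, rows_named))
```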
I figured this out while typing my question, but would like to see if there's a cleaner, less code way of doing what I want.
e.g. code block:
target <- "mpg"
# want
mtcars %>%
mutate(target := log(target))
I'd like to update mpg to be the log of mpg based on the variable target.
Looks like I got this working with:
mtcars %>%
mutate(!! rlang::sym(target) := log(!! rlang::sym(target)))
That just reads as pretty repetitive. Is there a 'cleaner', less code way of achieving the same result?
I'm fond of the double curly braces {{var}}, no reason, they are just nicer to read imho but I couldn't get the same results when I tried:
mtcars %>%
mutate(!! rlang::sym(target) := log({{target}}))
What are the various ways I can use tidyeval to mutate a field via transformation based on a pre determined variable to define which field to be transformed, in this case the variable 'target'?
On the lhs of :=, the string can be unquoted with just !!, while on the rhs we need the column's value, so we convert the string to a symbol and then unquote it (!!).
library(dplyr)
mtcars %>%
mutate(!!target := log(!! rlang::sym(target)))
1) Use mutate_at
library(dplyr)
mtcars %>% mutate_at(target, log)
2) We can use the magrittr %<>% operator:
library(magrittr)
mtcars[[target]] %<>% log
3) Of course this is trivial in base R:
mtcars[[target]] <- log(mtcars[[target]])
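Two further variants, sketched under the assumption that dplyr 1.0.0 or later is installed: across() with all_of(), which replaces the now superseded mutate_at(), and the .data pronoun, which avoids rlang::sym() on the rhs. The result names are mine:

```r
library(dplyr)

target <- "mpg"

# across() + all_of(): select the column named by the string, apply log.
res_across <- mtcars %>% mutate(across(all_of(target), log))

# .data[[target]] looks the column up by its string name on the rhs.
res_pronoun <- mtcars %>% mutate(!!target := log(.data[[target]]))

print(all.equal(res_across[[target]], log(mtcars[[target]])))
print(all.equal(res_pronoun[[target]], log(mtcars[[target]])))
```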
I recently discovered the pipe operator %>%, which can make code more readable. Here is my MWE.
library(dplyr) # for the pipe operator
library(lsr) # for the cohensD function
set.seed(4) # make it reproducible
dat <- data.frame( # create data frame
subj = c(1:6),
pre = sample(1:6, replace = TRUE),
post = sample(1:6, replace = TRUE)
)
dat %>% select(pre, post) %>% sapply(., mean) # works as expected
However, I struggle using the pipe operator in this particular case
dat %>% select(pre, post) %>% cohensD(.$pre, .$post) # piping returns an error
cohensD(dat$pre, dat$post) # classical way works fine
Why is it not possible to subset columns using the placeholder . in combination with $? Is it worthwhile to write this line using the pipe operator %>%, or does it complicate the syntax? The classical way of writing this seems more concise.
This would work:
dat %>% select(pre, post) %>% {cohensD(.$pre, .$post)}
Wrapping the last call into curly braces makes it be treated like an expression and not a function call. When you pipe something into an expression, the . gets replaced as expected. I often use this trick to call a function which does not interface well with piping.
What is inside the braces happens to be a function call but could really be any expression of . .
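A sketch of the brace trick next to two alternatives, using a hypothetical two-vector function mean_diff() as a stand-in for cohensD() so that lsr does not need to be installed. The data values are made up:

```r
library(magrittr)

# Hypothetical stand-in for lsr::cohensD(): takes two numeric vectors.
mean_diff <- function(x, y) mean(x) - mean(y)

dat <- data.frame(pre = c(2, 4, 6), post = c(1, 5, 3))

braces <- dat %>% { mean_diff(.$pre, .$post) }  # RHS treated as an expression
expose <- dat %$% mean_diff(pre, post)          # exposition pipe, no `.` needed
plain  <- mean_diff(dat$pre, dat$post)          # classical call

print(c(braces, expose, plain))
```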
Since you're going from a bunch of data to one (row of) value(s), you're summarizing. In a dplyr pipeline you can then use the summarize function; within summarize you don't need to subset and can just refer to pre and post directly.
Like so:
dat %>% select(pre, post) %>% summarize(CD = cohensD(pre, post))
(The select statement isn't actually necessary in this case, but I left it in to show how this works in a pipeline)
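It doesn't work because magrittr only skips the automatic first-argument insertion when . is used directly as an argument. When . appears only inside a nested call (like .$pre), the LHS is still inserted as the first argument, so cohensD() receives the whole data frame in addition to the two columns.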
If you really want to use piping, you can do it with the formula interface, but with a little reshaping before (melt is from reshape2 package):
dat %>% select(pre, post) %>% melt %>% cohensD(value~variable, .)
#### [1] 0.8115027
This starts as an aesthetic question but then turns into a functional one, specifically about magrittr.
I want to add a manually entered data_frame to one that already exists, like so:
cars_0 <- mtcars %>%
mutate(brand = row.names(.)) %>%
select(brand, mpg, cyl)
new_cars <- matrix(ncol = 3, byrow = T, c(
"VW Beetle", 25, 4,
"Peugeot 406", 42, 6)) # Coercing types is not an issue here.
cars_1 <- rbind(cars_0,
set_colnames(new_cars, names(cars_0)))
I'm writing the new cars in a matrix for "increased legibility", and therefore need to set the column names for it to be bound to cars_0.
If anyone likes magrittr as much as I do, they might want to present new_cars first and pipe it to set_colnames
cars_1 <- rbind(cars_0, new_cars %>%
set_colnames(names(cars_0)))
Or to avoid repetition they'll want to indicate cars_0 and pipe it to rbind
cars_1 <- cars_0 %>%
rbind(., set_colnames(new_cars, names(.)))
However, one cannot do both, because it becomes ambiguous which object is being piped:
cars_1 <- cars_0 %>%
rbind(., new_cars %>% set_colnames(names(.)))
## Error in match.names(clabs, names(xi)) :
## names do not match previous names
My question: Is there a way to distinguish the two arguments that are piped?
Short answer: no.
Longer answer: I'm not sure what the rationale for doing this would be. The philosophy behind magrittr was to unnest composite functions, with the primary intent of making it easier to read the code. For example:
f(g(h(x)))
becomes
h(x) %>% g() %>% f()
Trying to use pipes in a manner that places two objects to be interpreted as the . argument goes against the philosophy of simplification. There are circumstances in which you can have nested pipes, but the environments ought to remain distinct. Trying to cross two pipes in the same environment can be likened to crossing the streams.
Don't cross the streams :)
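If the nesting is really wanted, one possible workaround (a sketch, not something the answer above endorses) is to give the outer value its own name through an anonymous function, so that only the inner pipe uses the . placeholder. The argument name base is arbitrary, and cars_0 is rebuilt here in base R to keep the example self-contained:

```r
library(magrittr)

cars_0 <- data.frame(
  brand = rownames(mtcars),
  mpg   = mtcars$mpg,
  cyl   = mtcars$cyl
)

new_cars <- matrix(ncol = 3, byrow = TRUE, c(
  "VW Beetle", 25, 4,
  "Peugeot 406", 42, 6))

# The outer value is bound to `base`; only the inner pipe uses `.`.
cars_1 <- cars_0 %>%
  (function(base) rbind(base, new_cars %>% set_colnames(names(base))))

print(tail(cars_1, 2))
```

Whether this reads better than the plain rbind() call is debatable; it mainly shows that the two piped values can be kept apart by naming one of them.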