%>% .$column_name equivalent for R base pipe |> - r

I frequently use the magrittr pipe (as loaded by dplyr) to extract a column from a tibble as a vector, as below:
iris %>% .$Sepal.Length
iris %>% .$Sepal.Length %>% cut(5)
How can I do the same using R's new built-in pipe operator |>? My attempts fail:
iris |> .$Sepal.Length
iris |> .$Sepal.Length |> cut(5)
Error: function '$' not supported in RHS call of a pipe

We can use getElement().
iris |> getElement('Sepal.Length') |> cut(5)

In R 4.1, the base pipe provides no placeholder for the data that is passed through the pipe. This is one difference between the magrittr pipe and the base R pipe. You can use an anonymous function to access the object.
iris |> {\(x) x$Sepal.Length}()
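As a quick check, here is a minimal sketch (assuming R >= 4.1.0 for |> and the \(x) lambda shorthand) that compares both suggestions above with plain [[ indexing. The variable names are only illustrative.

```r
# Assumes R >= 4.1.0 (native pipe and \(x) lambda shorthand).
via_get_element <- iris |> getElement("Sepal.Length")
via_lambda      <- iris |> (\(x) x$Sepal.Length)()
via_indexing    <- iris[["Sepal.Length"]]

# All three approaches return the same numeric vector.
identical(via_get_element, via_lambda)
identical(via_lambda, via_indexing)
```

getElement() has the advantage of taking the column name as a string, so it also works when the name is stored in a variable.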

Direct use of $ in |> is currently disabled. If you still need to call $ or another disabled function in |>, an option besides creating a function is to call $ through :: as base::`$`, or to wrap it in parentheses as (`$`):
iris |> (`$`)("Sepal.Length")
iris |> base::`$`("Sepal.Length")
iris |> (\(.) .$Sepal.Length)()
fun <- `$`
iris |> fun(Sepal.Length)
This will also work in cases where more than one column will be extracted.
iris |> (`[`)(c("Sepal.Length", "Petal.Length"))
Another option is the bizarro pipe ->.;. Some call it a joke, others a clever use of existing syntax.
iris ->.; .$Sepal.Length
This creates or overwrites . in the .GlobalEnv. rm(.) can be used to remove it. Alternatively it could be processed in local:
local({iris ->.; .$Sepal.Length})
In this case there are two identical objects in the environment, iris and ., but as long as they are not modified they point to the same address.
tracemem(iris)
#[1] "<0x556871bab148>"
tracemem(.)
#[1] "<0x556871bab148>"
|> is used as a pipe operator in R.
The left-hand side expression lhs is inserted as the first free argument in the call to the right-hand side expression rhs.
mtcars |> head() # same as head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars |> head(2) # same as head(mtcars, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
It is also possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.
mtcars |> lm(mpg ~ disp, data = _)
#mtcars |> lm(mpg ~ disp, _) #Error: pipe placeholder can only be used as a named argument
#Call:
#lm(formula = mpg ~ disp, data = mtcars)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
Alternatively, explicitly name the argument(s) that come before the one the lhs should fill:
mtcars |> lm(formula = mpg ~ disp)
If the placeholder is needed more than once, as a named or unnamed argument in any position, or with disabled functions, use an (anonymous) function.
mtcars |> (\(.) .[.$cyl == 6,])()
#mtcars ->.; .[.$cyl == 6,] # Alternative using bizarro pipe
#local({mtcars ->.; .[.$cyl == 6,]}) # Without creating or overwriting . in the calling environment
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
mtcars |> (\(.) lm(mpg ~ disp, .))()
#Call:
#lm(formula = mpg ~ disp, data = .)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
An expression written as x |> f(y) is parsed as f(x, y). While the code in a pipeline is written sequentially, regular R semantics for evaluation apply. So piped expressions will be evaluated only when first used in the rhs expression.
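Both statements can be checked with a small sketch (assuming R >= 4.1.0; the names same_call, lazy_result and unused are made up for this illustration):

```r
# Assumes R >= 4.1.0.
# The parser rewrites the pipe before evaluation, so both quoted
# expressions are the same call object.
same_call <- identical(quote(x |> f(y)), quote(f(x, y)))

# The lhs becomes an ordinary, lazily evaluated argument: because the
# anonymous function never touches `unused`, stop() is never run.
lazy_result <- stop("never evaluated") |> (\(unused) "lhs was not needed")()

same_call
lazy_result
```

This laziness is a difference from code that stores the lhs in a variable first, where the lhs is always evaluated.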

Interesting example and great answers, let me add another version: I usually use select and then unlist in such cases. This follows the "speaking R" paradigm and works the same with both operators, %>% and |>:
library("dplyr")
iris %>% select(Sepal.Length) %>% unlist() %>% cut(5)
iris |> select(Sepal.Length) |> unlist() |> cut(5)
Note that select is from dplyr, and pull, brought in by @jpdugo17, is even better.
If we use usual "base R" indexing, it is also short and works in both worlds:
iris[["Sepal.Length"]] |> cut(5)
iris$Sepal.Length |> cut(5)
and thanks to the comment of @zx8754, one can of course also use base R without any pipes
cut(iris$Sepal.Length, 5)
... but I think that the OP just wanted to point out differences in piping. I guess that it is to be applied in a bigger context and iris is only an example.

This is also an option:
iris |> dplyr::pull(Sepal.Length) |> cut(5)
Edit:
I wonder why calling a function with backticks isn't allowed.
iris |> `[`(, 'Sepal.Length')
#>Error: function '[' not supported in RHS call of a pipe
As pointed out by @Hugh, backticks are allowed but some functions are not.
Here is the list of disallowed functions, extracted from wch's GitHub:
"if", "while", "repeat", "for", "break", "next", "return", "function",
"(", "{",
"+", "-", "*", "/", "^", "%%", "%/%", "%*%", ":", "::", ":::", "?", "|>",
"~", "#", "=>",
"==", "!=", "<", ">", "<=", ">=",
"&", "|", "&&", "||", "!",
"<-", "<<-", "=",
"$", "[", "[[",
"$<-", "[<-", "[[<-",
0

I know this question is closed. Other base R solutions, where we use the symbol name instead of a character string, might include the following (note that subset() returns a one-column data frame rather than a vector):
iris |>
with(Sepal.Length)
iris |>
subset(select = Sepal.Length)

Since R 4.2.0, you can use _ as a placeholder for |>. Because "functions in rhs calls [can] not be syntactically special", you cannot use $ directly, so you have to define the function with another name first, and then use the placeholder and the column name:
set <- `$`
iris |> set(x = _, Sepal.Length)
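A further option, assuming R >= 4.3.0: as far as I know, from that version on the placeholder may also head an extraction chain such as _$name, which avoids the alias altogether. The sketch below parses the pipeline from a string only so that older R versions skip it instead of stopping with a syntax error; the name binned is illustrative.

```r
# Assumption: R >= 4.3.0, where `_` may head an extraction chain like _$name.
# The pipeline is parsed from a string only so that older R versions
# skip it instead of failing when this script is parsed.
if (getRversion() >= "4.3.0") {
  binned <- eval(parse(text = "iris |> _$Sepal.Length |> cut(5)"))
  # Same result as the non-piped call.
  print(identical(binned, cut(iris$Sepal.Length, 5)))
}
```

In a script that only targets R >= 4.3.0, the pipeline inside the string can be written directly.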

Related

Using glue-like constructs on RHS in R/Tidyeval

I've spent hours trying to make glue work on the RHS of a formula and am out of clues. Here is a simple reprex.
meta <- function(x, var, suffix){
x<- x %>% mutate("{{var}}_{suffix}":= 5)
x<- x %>% mutate("{{var}}_{suffix}_new":= {{var}} - "{{var}}_{suffix}")
}
x<- meta(mtcars, mpg, suf)
#Should be equivalent to
x<- mtcars %>% mutate(mpg_suf:= 5)
x<- x%>% mutate(mpg_suf_new:= mpg - mpg_suf)
#N: Tried https://stackoverflow.com/questions/70427403/how-to-correctly-glue-together-prefix-suffix-in-a-function-call-rhs but none of the methods in it worked, unfortunately
Meta function gives me "Error in local_error_context(dots = dots, .index = i, mask = mask) :
promise already under evaluation: recursive default argument reference or earlier problems? "
Went over all hits for the searchwords for it on SO but nothing worked at the moment.
Would really appreciate any insights. Thank you!
Here is a working version:
meta <- function(x, var, suffix){
new_name <- rlang::englue("{{ var }}_{{ suffix }}")
x %>%
mutate("{new_name}" := 5) %>%
mutate("{new_name}_new" := {{ var }} - .data[[new_name]])
}
names(meta(mtcars, mpg, suf))
#> [1] "mpg" "cyl" "disp" "hp"
#> [5] "drat" "wt" "qsec" "vs"
#> [9] "am" "gear" "carb" "mpg_suf"
#> [13] "mpg_suf_new"
To understand what is going on:
Learn about the difference between "{{ var }}" and "{var}" in tidyeval glue strings: https://rlang.r-lib.org/reference/glue-operators.html
Learn about englue() to create glue strings outside of the LHS of :=: https://rlang.r-lib.org/reference/englue.html. This part is not necessary but I thought it was nicer to create and reuse a variable.
The tricky part: you create a new column with a constructed name and then want to use the new column that this name refers to. You'll have to subset it with .data, see: https://rlang.r-lib.org/reference/dot-data.html
See also the general topic: https://rlang.r-lib.org/reference/topic-data-mask-programming.html
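To make the first point concrete, here is a minimal sketch of the two kinds of braces in a glue string on the LHS of :=. It assumes dplyr >= 1.0.0; add_constant is a hypothetical helper, and unlike in the question, suffix is passed as a string here.

```r
library(dplyr)

# Hypothetical helper, for illustration only.
# "{{ var }}" inserts the label of the expression passed as `var`;
# "{suffix}" inserts the value of the string `suffix`.
add_constant <- function(df, var, suffix) {
  df %>% mutate("{{ var }}_{suffix}" := 5)
}

new_names <- names(add_constant(head(mtcars), mpg, "suf"))
tail(new_names, 1)
```

The last column name is built from the bare column name mpg and the string "suf".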
I think it's best if we define the pieces we need first, then we can use them as needed on the LHS or the RHS of the calculation. I will add that it doesn't make much sense to me to pass the suffix argument as a bare name. I think it would be a clearer choice to make it string only.
library(dplyr)
meta <- function(x, var, suffix) {
var <- rlang::as_name(enquo(var))
suffix <- rlang::as_name(enquo(suffix)) # Remove this to make "suffix" string only.
new_var <- glue::glue("{var}_{suffix}")
x %>%
mutate("{new_var}" := 5,
"{new_var}_new" := !!sym(var) - !!sym(new_var))
}
mtcars %>%
head() %>%
meta(mpg, suf)
mpg cyl disp hp drat wt qsec vs am gear carb mpg_suf mpg_suf_new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5 16.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5 16.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5 17.8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5 16.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5 13.7
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 5 13.1

How to evaluate a glue expression in a nested function?

I'm trying to nest a function that glues together two strings within a function that uses the combined string to name a column of a dataframe. However, the problem seems to be that the glue expression is not evaluated to a string early enough. Can (and should) I force the expression to be evaluated before it is being passed on as an argument to another function?
library(tidyverse)
# define inner function
add_prefix <- function(string) {
x <- glue::glue("prefix_{string}")
return(as.character(x))
}
# define outer function
mod_mtcars <- function(df, name) {
df %>%
mutate({{ name }} := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
#> Error: The LHS of `:=` must be a string or a symbol
# alternative outer function with explicit enquoting
mod_mtcars2 <- function(df, name) {
name <- ensym(name)
df %>%
mutate(!!name := mpg ^ 2)
}
mod_mtcars2(mtcars, add_prefix("foo"))
#> Error: Only strings can be converted to symbols
Created on 2021-10-30 by the reprex package (v2.0.1)
First of all, the glue string syntax is now preferred over embracing directly in the LHS. So prefer "{{ var }}" := expr to {{ var }} := expr. In a future version of rlang (next year) we'll make it possible to use glue strings with =. At that point, := will be pretty much superseded. We went with := to allow !! injection on the LHS before glue support was added.
Second, your problem is that you're using {{ instead of simple injection. {{ is for injecting the expression supplied as argument, not the value of the expression. Use normal glue interpolation with "{" to inject the value instead:
mod_mtcars <- function(df, name) {
df %>%
mutate("{name}" := mpg ^ 2)
}
PS: Your !! version had a similar problem. Because you used ensym() on the argument, you were defusing the expression supplied as argument instead of using the value. But ensym() requires the expression to be a simple name and you supplied a full computation, causing an error. You can fix it like this:
mod_mtcars2 <- function(df, name) {
df %>%
mutate(!!name := mpg ^ 2)
}
But glue syntax is now preferred.
name is not a symbol. Try:
mod_mtcars <- function(df, name) {
name <- sym(name)
df %>%
mutate({{ name }} := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
Even better, tidy eval now supports glue strings so you could simplify with:
# don't need add_prefix()
mod_mtcars <- function(df, name){
df %>%
mutate("prefix_{{name}}" := mpg ^ 2)
}
mod_mtcars(mtcars, foo)
Lastly, the curly-curly operator tunnels your expression and you're passing it a string. If you keep your current functions use the bang-bang operator instead:
mod_mtcars <- function(df, name){
df %>%
mutate(!! name := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
We may need to escape the input (keeping the OP's original function unchanged)
mod_mtcars(mtcars, !!add_prefix("foo"))
-output
mpg cyl disp hp drat wt qsec vs am gear carb prefix_foo
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 441.00
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 441.00
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 519.84
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 457.96
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 349.69
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 327.61
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 204.49
...

How to use a short script to eliminate all but one duplicate column variables based on the prefix of the colname

I want to know how to use a short script to eliminate all but one of a set of duplicate column variables, based on the prefix of the colname, without typing in the variables I want to remove by hand.
For example, I created repeats of the mtcars$am variables, called am1, am2, am3, and am4 in a data frame called mtcars_example_2. I removed the original am variable in the mtcars_example_2 data frame.
I can use the script below to remove all variables with the prefix "am" except am1, saving the result in a new data frame called mtcars_example_3, but it requires typing every variable to remove by hand:
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
But this seems like the long way of doing this. Is there a faster way that does not require me to individually type in the name of each variable that I want to remove from the data?
Is this possible? If so, how can this be done?
Thanks ahead of time.
Here is the code for the example:
# example data
## loads packages
library(tidyverse)
## creates mtcars_example data
mtcars_example_1 <- data.frame(mtcars)
mtcars_example_2 <- data.frame(mtcars_example_1)
## creates duplicate variables, based on am variable
mtcars_example_2$am1 <- mtcars_example_1$am
mtcars_example_2$am2 <- mtcars_example_1$am
mtcars_example_2$am3 <- mtcars_example_1$am
mtcars_example_2$am4 <- mtcars_example_1$am
## removes original variable
mtcars_example_2 <-
mtcars_example_2 %>%
select(
-c(
"am"
)
)
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
You can remove all the variables that start with am but keep am1 :
library(dplyr)
mtcars_example_2 %>% select(-starts_with('am'), am1) %>% head
# mpg cyl disp hp drat wt qsec vs gear carb am1
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 4 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 4 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 4 1 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 3 1 0
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 3 2 0
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 3 1 0
Depending on your actual scenario you can also use regex to remove columns.
mtcars_example_2 %>% select(-matches('am[2-4]')) %>% head
We could also do
library(dplyr)
mtcars_example_2 %>%
select(-contains('am'), am1)
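For comparison, a base R sketch of the same idea that needs no packages. The object names dat, prefix, keep_one and deduped are made up for this example, and the data are rebuilt so the block runs on its own.

```r
# Rebuild the example data: drop `am` and add four copies of it.
dat <- mtcars[setdiff(names(mtcars), "am")]
for (nm in paste0("am", 1:4)) dat[[nm]] <- mtcars$am

# Keep a column if it does not start with the prefix,
# or if it is the one copy we want to retain.
prefix <- "am"
keep_one <- "am1"
keep <- !startsWith(names(dat), prefix) | names(dat) == keep_one
deduped <- dat[keep]
names(deduped)
```

Because this filters with a logical vector, the retained columns stay in their original order. startsWith() also matches only at the beginning of the name, whereas contains('am') would match "am" anywhere in a column name.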

Indexing by column name to the end of the dataframe - R

I'm wondering if there is a way to select a group of columns by the name of the first column in the group and then all the next columns either a) to the end of the data frame, or b) to another column, also using its name.
a) As an example for the first question, in the mtcars dataset, is there a way to select the columns from drat to the end of the data frame? (Something like mtcars[,'drat':ncol(mtcars)])
b) For the second question, is there a way to select the columns starting at cyl and ending at wt? (Something like mtcars[,'cyl':'wt'])
Many elegant solutions have already been provided, but one can also get the desired result in base R using which:
Ans a:
mtcars[,which(names(mtcars) == "drat"):ncol(mtcars)]
Ans b:
mtcars[,which(names(mtcars) == "cyl"):which(names(mtcars) == "wt")]
# cyl disp hp drat wt
#Mazda RX4 6 160.0 110 3.90 2.620
#Mazda RX4 Wag 6 160.0 110 3.90 2.875
#Datsun 710 4 108.0 93 3.85 2.320
#Hornet 4 Drive 6 258.0 110 3.08 3.215
#Hornet Sportabout 8 360.0 175 3.15 3.440
#......so on
We can do this with select from dplyr
Answer a)
mtcars %>% select(drat:get(last(names(.))))
Answer b)
mtcars %>% select(cyl:wt)
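For answer a), a possibly simpler sketch, assuming dplyr >= 1.0.0, which re-exports the tidyselect helper last_col(). The name from_drat is illustrative.

```r
library(dplyr)

# Answer a): from `drat` to the last column, whatever it is called.
# Assumes dplyr >= 1.0.0, which re-exports tidyselect's last_col().
from_drat <- mtcars %>% select(drat:last_col())
names(from_drat)
```

This avoids looking up the name of the last column by hand.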
In dplyr, the select function does exactly this (no quotes needed):
mtcars %>%
select(cyl:wt)
If we need to use a quoted string, convert it to a symbol with sym and then unquote it with !!:
mtcars %>%
select(!! (rlang::sym("cyl")): !!(rlang::sym("wt")))
This is more useful when the names are stored in objects:
a <- "cyl"
b <- "wt"
mtcars %>%
select(!! (rlang::sym(a)): !!(rlang::sym(b)))
Or another option is
mtcars %>%
select(!! rlang::parse_expr(glue::glue("{a}:{b}")))

Using 'mutate_' to sum a bunch of columns row-wise

In this blog post, Paul Hiemstra shows how to sum up two columns using dplyr::mutate_. Copy/paste-ing relevant parts:
library(lazyeval)
f = function(col1, col2, new_col_name) {
mutate_call = lazyeval::interp(~ a + b, a = as.name(col1), b = as.name(col2))
mtcars %>% mutate_(.dots = setNames(list(mutate_call), new_col_name))
}
allows one to then do:
head(f('wt', 'mpg', 'hahaaa'))
Great!
I followed up with a question (see comments) as to how one could extend this to 100 columns, since it wasn't quite clear (to me) how one could do it without having to type all the names using the above method. Paul was kind enough to indulge me and provided this answer (thanks!):
# data
df = data.frame(matrix(1:100, 10, 10))
names(df) = LETTERS[1:10]
# answer
sum_all_rows = function(list_of_cols) {
summarise_calls = sapply(list_of_cols, function(col) {
lazyeval::interp(~col_name, col_name = as.name(col))
})
df %>% select_(.dots = summarise_calls) %>% mutate(ans1 = rowSums(.))
}
sum_all_rows(LETTERS[sample(1:10, 5)])
I'd like to improve this answer on these points:
The other columns are gone. I'd like to keep them.
It uses rowSums() which has to coerce the data.frame to a matrix which I'd like to avoid.
Also I'm not sure if the use of . within non-do() verbs is encouraged? Because . within mutate() doesn't seem to adapt to just those rows when used with group_by().
And most importantly, how can I do the same using mutate_() instead of mutate()?
I found this answer, which addresses point 1, but unfortunately, both dplyr answers use rowSums() along with mutate().
PS: I just read Hadley's comment under that answer. IIUC, 'reshape to long form + group by + sum + reshape to wide form' is the recommended dplyr way for this type of operation?
Here's a different approach:
library(dplyr); library(lazyeval)
f <- function(df, list_of_cols, new_col) {
df %>%
mutate_(.dots = ~Reduce(`+`, .[list_of_cols])) %>%
setNames(c(names(df), new_col))
}
head(f(mtcars, c("mpg", "cyl"), "x"))
# mpg cyl disp hp drat wt qsec vs am gear carb x
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 27.0
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 27.0
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 26.8
#4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 27.4
#5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 26.7
#6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 24.1
Regarding your points:
Other columns are kept
It doesn't use rowSums
You are specifically asking for a row-wise operation here so I'm not sure (yet) how a group_by could do any harm when using . inside mutate/mutate_
It makes use of mutate_
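As far as I know, mutate_() and the other underscored verbs have been deprecated since dplyr 0.7.0, so the code above may warn or fail with recent versions. The same Reduce() idea can be sketched in base R without dplyr or lazyeval; sum_cols is a hypothetical helper name.

```r
# Hypothetical helper: adds `new_col` as the element-wise sum of `cols`.
sum_cols <- function(df, cols, new_col) {
  # Reduce() adds the selected columns pairwise, so the data frame
  # is never converted to a matrix (unlike rowSums()).
  df[[new_col]] <- Reduce(`+`, df[cols])
  df
}

with_sum <- sum_cols(mtcars, c("mpg", "cyl"), "x")
head(with_sum$x)
```

All other columns are kept, and the column names to sum can be supplied as a character vector of any length.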
