Using glue-like constructs on RHS in R/Tidyeval - r

I've spent hours trying to make glue on the RHS of a formula work and out of clues. Here is a simple reprex.
meta <- function(x, var, suffix){
x<- x %>% mutate("{{var}}_{suffix}":= 5)
x<- x %>% mutate("{{var}}_{suffix}_new":= {{var}} - "{{var}}_{suffix}")
}
x<- meta(mtcars, mpg, suf)
#Should be equivalent to
x<- mtcars %>% mutate(mpg_suf:= 5)
x<- x%>% mutate(mpg_suf_new:= mpg - mpg_suf)
#N: Tried https://stackoverflow.com/questions/70427403/how-to-correctly-glue-together-prefix-suffix-in-a-function-call-rhs but none of the methods in it worked, unfortunately
Meta function gives me "Error in local_error_context(dots = dots, .index = i, mask = mask) :
promise already under evaluation: recursive default argument reference or earlier problems? "
Went over all hits for the searchwords for it on SO but nothing worked at the moment.
Would really appreciate any insights. Thank you!

Here is a working version:
meta <- function(x, var, suffix){
new_name <- rlang::englue("{{ var }}_{{ suffix }}")
x %>%
mutate("{new_name}" := 5) %>%
mutate("{new_name}_new" := {{ var }} - .data[[new_name]])
}
names(meta(mtcars, mpg, suf))
#> [1] "mpg" "cyl" "disp" "hp"
#> [5] "drat" "wt" "qsec" "vs"
#> [9] "am" "gear" "carb" "mpg_suf"
#> [13] "mpg_suf_new"
To understand what is going on:
Learn about the difference between "{{ var }}" and "{var}" in tidyeval glue strings: https://rlang.r-lib.org/reference/glue-operators.html
Learn about englue() to create glue strings outside of the LHS of :=: https://rlang.r-lib.org/reference/englue.html. This part is not necessary but I thought it was nicer to create and reuse a variable.
Tricky part, you create a new column with a constructed name and then want to use the new column that this name refers to. You'll have to subset it with .data, see: https://rlang.r-lib.org/reference/dot-data.html
See also the general topic: https://rlang.r-lib.org/reference/topic-data-mask-programming.html

I think it's best if we define the pieces we need first, then we can use them as needed on the LHS or the RHS of the calculation. I will add that it doesn't make much sense to me to pass the suffix argument as a bare name. I think it would be a clearer choice to make it string only.
library(dplyr)
meta <- function(x, var, suffix) {
var <- rlang::as_name(enquo(var))
suffix <- rlang::as_name(enquo(suffix)) # Remove this to make "suffix" string only.
new_var <- glue::glue("{var}_{suffix}")
x %>%
mutate("{new_var}" := 5,
"{new_var}_new" := !!sym(var) - !!sym(new_var))
}
mtcars %>%
head() %>%
meta(mpg, suf)
mpg cyl disp hp drat wt qsec vs am gear carb mpg_suf mpg_suf_new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5 16.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5 16.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5 17.8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5 16.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5 13.7
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 5 13.1

Related

How to evaluate a glue expression in a nested function?

I'm trying to nest a function that glues together two strings within a function that uses the combined string to name a column of a dataframe. However, the problem seems to be that the glue expression is not evaluated to a string early enough. Can (and should) I force the expression to be evaluated before it is being passed on as an argument to another function?
library(tidyverse)
# define inner function
add_prefix <- function(string) {
x <- glue::glue("prefix_{string}")
return(as.character(x))
}
# define outer function
mod_mtcars <- function(df, name) {
df %>%
mutate({{ name }} := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
#> Error: The LHS of `:=` must be a string or a symbol
# alternative outer function with explicit enquoting
mod_mtcars2 <- function(df, name) {
name <- ensym(name)
df %>%
mutate(!!name := mpg ^ 2)
}
mod_mtcars2(mtcars, add_prefix("foo"))
#> Error: Only strings can be converted to symbols
Created on 2021-10-30 by the reprex package (v2.0.1)
First of all, the glue string syntax is now preferred over embracing directly in the LHS. So prefer "{{ var }}" := expr to {{ var }} := expr. In a future version of rlang (next year) we'll make it possible to use glue strings with =. At that point, := will be pretty much superseded. We went with := to allow !! injection on the LHS before glue support was added.
Second, your problem is that you're using {{ instead of simple injection. {{ is for injecting the expression supplied as argument, not the value of the expression. Use normal glue interpolation with "{" to inject the value instead:
mod_mtcars <- function(df, name) {
df %>%
mutate("{name}" := mpg ^ 2)
}
PS: Your !! version had a similar problem. Because you used ensym() on the argument, you were defusing the expression supplied as argument instead of using the value. But ensym() requires the expression to be a simple name and you supplied a full computation, causing an error. You can fix it like this:
mod_mtcars2 <- function(df, name) {
df %>%
mutate(!!name := mpg ^ 2)
}
But glue syntax is now preferred.
name is not a symbol. Try:
mod_mtcars <- function(df, name) {
name <- sym(name)
df %>%
mutate({{ name }} := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
Even better, tidy eval now supports glue strings so you could simplify with:
# don't need add_prefix()
mod_mtcars <- function(df, name){
df %>%
mutate("prefix_{{name}}" := mpg ^ 2)
}
mod_mtcars(mtcars, foo)
Lastly, the curly-curly operator tunnels your expression and you're passing it a string. If you keep your current functions use the bang-bang operator instead:
mod_mtcars <- function(df, name){
df %>%
mutate(!! name := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
We may need to escape the input (keeping the OP's original function unchanged)
mod_mtcars(mtcars, !!add_prefix("foo"))
-output
mpg cyl disp hp drat wt qsec vs am gear carb prefix_foo
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 441.00
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 441.00
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 519.84
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 457.96
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 349.69
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 327.61
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 204.49
...

How to use a short script to eliminate all but one duplicate column variables based on the prefix of the colname

I want to know to use a short script to eliminate all but one duplicate column variables based on the prefix of the colname without inputting the variables I want to remove by hand.
For example, I created repeats of the mtcars$am variables, called am1, am2, am3, and am4 in a data frame called mtcars_example_2. I removed the original am variable in the mtcars_example_2 data frame.
I can use the script below to eliminate all variables with the prefix "am" but the am1 variable into a new variable called mtcars_example_3 using the code below, which inputs all variables to remove by hand:
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
But this seems like the long way of doing this. Is there a faster way that does not require me to individual type in the names of each of the variables that I want to remove from the data.
Is this possible? If so, how can this be done?
Thanks ahead of time.
Here is the code for the example:
# example data
## loads packages
library(tidyverse)
## creates mtcars_example data
mtcars_example_1 <- data.frame(mtcars)
mtcars_example_2 <- data.frame(mtcars_example_1)
## creates duplicate variables, based on am variable
mtcars_example_2$am1 <- mtcars_example_1$am
mtcars_example_2$am2 <- mtcars_example_1$am
mtcars_example_2$am3 <- mtcars_example_1$am
mtcars_example_2$am4 <- mtcars_example_1$am
## removes original variable
mtcars_example_2 <-
mtcars_example_2 %>%
select(
-c(
"am"
)
)
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
You can remove all the variables that start with am but keep am1 :
library(dplyr)
mtcars_example_2 %>% select(-starts_with('am'), am1) %>% head
# mpg cyl disp hp drat wt qsec vs gear carb am1
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 4 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 4 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 4 1 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 3 1 0
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 3 2 0
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 3 1 0
Depending on your actual scenario you can also use regex to remove columns.
mtcars_example_2 %>% select(-matches('am[2-4]')) %>% head
We could also do
library(dplyr)
mtcars_example_2 %>%
select(-contains('am'), am1)

%>% .$column_name equivalent for R base pipe |>

I frequently use the dplyr piping to get a column from a tibble into a vector as below
iris %>% .$Sepal.Length
iris %>% .$Sepal.Length %>% cut(5)
How can I do the same using the latest R built-in pipe symbol |>
iris |> .$Sepal.Length
iris |> .$Sepal.Length |> cut(5)
Error: function '$' not supported in RHS call of a pipe
We can use getElement().
iris |> getElement('Sepal.Length') |> cut(5)
In base pipe no placeholder is provided for the data that is passed in the pipe. This is one difference between magrittr pipe and base R pipe. You may use an anonymous function to access the object.
iris |> {\(x) x$Sepal.Length}()
The direct usage of $ in |> is currently disabled. If the call of $ or other disabled functions in |> is still needed, an option, beside the creation of a function is to use $ via the function :: as base::`$` or place it in brakes ($):
iris |> (`$`)("Sepal.Length")
iris |> base::`$`("Sepal.Length")
iris |> (\(.) .$Sepal.Length)()
fun <- `$`
iris |> fun(Sepal.Length)
This will also work in cases where more than one column will be extracted.
iris |> (`[`)(c("Sepal.Length", "Petal.Length"))
Another option can be the use of a bizarro pipe ->.;. Some call it a joke others clever use of existing syntax.
iris ->.; .$Sepal.Length
This creates or overwrites . in the .GlobalEnv. rm(.) can be used to remove it. Alternatively it could be processed in local:
local({iris ->.; .$Sepal.Length})
In this case it produces two same objects in the environment iris and . but as long as they are not modified they point the the same address.
tracemem(iris)
#[1] "<0x556871bab148>"
tracemem(.)
#[1] "<0x556871bab148>"
|> is used as a pipe operator in R.
The left-hand side expression lhs is inserted as the first free argument in the call of to the right-hand side expression rhs.
mtcars |> head() # same as head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars |> head(2) # same as head(mtcars, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
It is also possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.
mtcars |> lm(mpg ~ disp, data = _)
#mtcars |> lm(mpg ~ disp, _) #Error: pipe placeholder can only be used as a named argument
#Call:
#lm(formula = mpg ~ disp, data = mtcars)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
Alternatively explicitly name the argument(s) before the "one":
mtcars |> lm(formula = mpg ~ disp)
In case the placeholder is used more than once or used as a named or also unnamed argument on any position or for disabled functions: Use an (anonymous) function.
mtcars |> (\(.) .[.$cyl == 6,])()
#mtcars ->.; .[.$cyl == 6,] # Alternative using bizarro pipe
#local(mtcars ->.; .[.$cyl == 6,]) # Without overwriting and keeping .
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
mtcars |> (\(.) lm(mpg ~ disp, .))()
#Call:
#lm(formula = mpg ~ disp, data = .)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
An expression written as x |> f(y) is parsed as f(x, y). While the code in a pipeline is written sequentially, regular R semantics for evaluation apply. So piped expressions will be evaluated only when first used in the rhs expression.
Interesting example and great answers, let me add another version: I use usually selectand then unlist in such cases. This follows the "speaking R" paradigm and works same with both operators %>% and |>:
library("dplyr")
iris %>% select(Sepal.Length) %>% unlist() %>% cut(5)
iris |> select(Sepal.Length) |> unlist() |> cut(5)
Note that select is from dplyr and pull brought in from #jpdugo17 is even better.
If we use usual "base R" indexing, it is also short and works in both worlds:
iris[["Sepal.Length"]] |> cut(5)
iris$Sepal.Length |> cut(5)
and thanks to the comment of #zx8754 one can of course also use base R without any pipes
cut(iris$Sepal.Length, 5)
... but I think that the OP just wanted to point out differences in piping. I guess that it is to be applied in a bigger context and iris is only an example.
This is also an option:
iris |> dplyr::pull(Sepal.Length) |> cut(5)
Edit:
I wonder why calling a function with backticks isn't allowed.
iris |> `[`(, 'Sepal.Length')
#>Error: function '[' not supported in RHS call of a pipe
As pointed out by #Hugh, backticks are allowed but some functions are not.
Here's the blacklisted functions list extracted from wch Github
"if", "while", "repeat", "for", "break", "next", "return", "function",
"(", "{",
"+", "-", "*", "/", "^", "%%", "%/%", "%*%", ":", "::", ":::", "?", "|>",
"~", "#", "=>",
"==", "!=", "<", ">", "<=", ">=",
"&", "|", "&&", "||", "!",
"<-", "<<-", "=",
"$", "[", "[[",
"$<-", "[<-", "[[<-",
0
I know this question is closed. Other Base R solutions where we use symbol name instead of the character name might include:
iris |>
with(Sepal.Length)
iris |>
subset(select = Sepal.Length)
Since R 4.2.0, you can use _ as a placeholder for |>. Because "functions in rhs calls [can] not be syntactically special", you cannot use $ directly, so you have to define the function with another name first, and then use the placeholder and the column name:
set <- `$`
iris |> set(x = _, Sepal.Length)

Apply variable function to columns in data.table

I'm wondering if there's a way to apply a function in a string variable to .SD cols in a data.table.
I can generalize all other parts of function calls using a data.table, including input and output columns, which I'm very happy about. But the final piece seems to be applying a variable function to a data.table, which is something I believe I've done before with dplyr and do.call.
mtcars <- as.data.table(mtcars)
returnNames <- "calculatedColumn"
SDnames <- c("mpg","hp")
myfunc <- function(data) {
print(data)
return(data[,1]*data[,2])
}
This obviously works:
mtcars[,eval(returnNames) := myfunc(.SD),.SDcols = SDnames,by = cyl]
But if I want to apply a dynamic function, something like this does not work:
functionCall <- "myfunc"
mtcars[,eval(returnNames) := lapply(.SD,eval(functionCall)),.SDcols = SDnames,by = cyl]
I get this error:
Error in `[.data.table`(mtcars, , `:=`(eval(returnNames), lapply(.SD, : attempt to apply non-function
Is using "apply" with "eval" the right idea, or am I on the wrong track entirely?
You don't want lapply. Since myfunc takes a data.table with multiple columns, you just want to feed such a data table into the function as one object.
To get the function you need get instead of eval
On the left-hand-side of :=, you can just put the character vector in parentheses, eval isn't needed
-
mtcars[, (returnNames) := get(functionCall)(.SD)
, .SDcols = SDnames
, by = cyl]
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb calculatedColumn
# 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2310.0
# 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2310.0
# 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2120.4
# 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 2354.0
# 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3272.5
# 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 1900.5
The code above was run after the following code
mtcars <- as.data.table(mtcars)
returnNames <- "calculatedColumn"
SDnames <- c("mpg","hp")
myfunc <- function(data) {
print(data)
return(data[,1]*data[,2])
}
functionCall <- "myfunc"

R function with variable number of arguments for ifelse

I have a script that creates a column so that I know which rule should be applied to each row in a dataframe.
EndoSubset$FU_Group<-ifelse(EndoSubset$IMorNoIM=="No_IM","Rule1",
ifelse(EndoSubset$IMorNoIM=="IM","Rule2",
ifelse(EndoSubset$IMorNoIM=="AnotherIM","Rule3",
"NoRules")))
I want to make this into a function so that there can be any number of rules and any number of conditions for a column so it could be:
EndoSubset$FU_Group<-ifelse(EndoSubset$IMorNoIM=="No_IM","Rule1",
ifelse(EndoSubset$IMorNoIM=="IM","Rule2",
ifelse(EndoSubset$IMorNoIM=="AnotherIM","Rule3",
ifelse(EndoSubset$IMorNoIM=="SomeOtherIM","Rule4",
ifelse(EndoSubset$IMorNoIM=="LotsOfIM","Rule5",
"NoRules")))
I understand that I can use the ellipsis for this but I don't understand how to use this for both the conditional string ("No_IM, "IM,"AnotherIM", etc) and the Rule string at the same time ("Rule1","Rule2","Rule3" etc.)
This answer is based upon another, incomplete answer that has been deleted.
You can use case_when() from the dplyr package to achieve this. It takes an arbitrary number of conditions. Since you don't give a reproducible example, I show how this works with mtcars:
library(dplyr)
mtcars$cyl_group <- case_when(mtcars$cyl == 4 ~ "Rule1",
mtcars$cyl == 6 ~ "Rule2",
TRUE ~ "NoRules")
mtcars[2:5, ]
## mpg cyl disp hp drat wt qsec vs am gear carb cyl_group
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Rule2
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Rule1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Rule2
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 NoRules
As you can see, you can easily connect a condition with a value using ~. Your two examples can probably be solved like this (I cannot check this, since you don't give your data):
EndoSubset$FU_Group <- case_when(EndoSubset$IMorNoIM == "No_IM" ~ "Rule1",
EndoSubset$IMorNoIM == "IM" ~ "Rule2",
EndoSubset$IMorNoIM == "AnotherIM" ~ "Rule3",
TRUE ~ "NoRules")
EndoSubset$FU_Group <- case_when(EndoSubset$IMorNoIM == "No_IM" ~ "Rule1",
EndoSubset$IMorNoIM == "IM" ~ "Rule2",
EndoSubset$IMorNoIM == "AnotherIM" ~ "Rule3",
EndoSubset$IMorNoIM == "SomeOtherIM" ~ "Rule4",
EndoSubset$IMorNoIM == "LotsOfIM" ~ "Rule5",
TRUE ~ "NoRules")

Resources