How to evaluate a glue expression in a nested function? - r

I'm trying to nest a function that glues together two strings within a function that uses the combined string to name a column of a dataframe. However, the problem seems to be that the glue expression is not evaluated to a string early enough. Can (and should) I force the expression to be evaluated before it is being passed on as an argument to another function?
library(tidyverse)
# define inner function
add_prefix <- function(string) {
x <- glue::glue("prefix_{string}")
return(as.character(x))
}
# define outer function
mod_mtcars <- function(df, name) {
df %>%
mutate({{ name }} := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
#> Error: The LHS of `:=` must be a string or a symbol
# alternative outer function with explicit enquoting
mod_mtcars2 <- function(df, name) {
name <- ensym(name)
df %>%
mutate(!!name := mpg ^ 2)
}
mod_mtcars2(mtcars, add_prefix("foo"))
#> Error: Only strings can be converted to symbols
Created on 2021-10-30 by the reprex package (v2.0.1)

First of all, the glue string syntax is now preferred over embracing directly in the LHS. So prefer "{{ var }}" := expr to {{ var }} := expr. In a future version of rlang (next year) we'll make it possible to use glue strings with =. At that point, := will be pretty much superseded. We went with := to allow !! injection on the LHS before glue support was added.
Second, your problem is that you're using {{ instead of simple injection. {{ is for injecting the expression supplied as argument, not the value of the expression. Use normal glue interpolation with "{" to inject the value instead:
mod_mtcars <- function(df, name) {
df %>%
mutate("{name}" := mpg ^ 2)
}
PS: Your !! version had a similar problem. Because you used ensym() on the argument, you were defusing the expression supplied as argument instead of using the value. But ensym() requires the expression to be a simple name and you supplied a full computation, causing an error. You can fix it like this:
mod_mtcars2 <- function(df, name) {
df %>%
mutate(!!name := mpg ^ 2)
}
But glue syntax is now preferred.
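To make the difference between the two glue forms concrete, here is a minimal sketch, assuming dplyr and glue are installed. The names by_value, by_expr, and the "_sq" suffix are hypothetical and only serve the illustration; add_prefix() is the helper from the question.

```r
library(dplyr)

# Helper from the question: builds the column name as a plain string.
add_prefix <- function(string) {
  as.character(glue::glue("prefix_{string}"))
}

# `name` carries a string value, so single braces interpolate that value.
by_value <- function(df, name) {
  df %>% mutate("{name}" := mpg^2)
}

# `col` is a bare column name, so double braces inject the expression itself.
# (by_expr and the "_sq" suffix are hypothetical, for illustration only.)
by_expr <- function(df, col) {
  df %>% mutate("{{ col }}_sq" := {{ col }}^2)
}

out_value <- by_value(head(mtcars), add_prefix("foo"))
out_expr  <- by_expr(head(mtcars), mpg)

names(out_value)  # includes "prefix_foo"
names(out_expr)   # includes "mpg_sq"
```

The first function is called with a computed string, the second with a bare column name, which is the distinction the answer draws between injecting a value and injecting an expression.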

name is not a symbol. Try:
mod_mtcars <- function(df, name) {
name <- sym(name)
df %>%
mutate({{ name }} := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))
Even better, tidy eval now supports glue strings so you could simplify with:
# don't need add_prefix()
mod_mtcars <- function(df, name){
df %>%
mutate("prefix_{{name}}" := mpg ^ 2)
}
mod_mtcars(mtcars, foo)
Lastly, the curly-curly operator tunnels the expression you supply, but here you're passing it a string value. If you keep your current functions, use the bang-bang operator instead:
mod_mtcars <- function(df, name){
df %>%
mutate(!! name := mpg ^ 2)
}
mod_mtcars(mtcars, add_prefix("foo"))

We may need to inject the input with !! at the call site (keeping the OP's original function unchanged):
mod_mtcars(mtcars, !!add_prefix("foo"))
-output
mpg cyl disp hp drat wt qsec vs am gear carb prefix_foo
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 441.00
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 441.00
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 519.84
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 457.96
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 349.69
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 327.61
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 204.49
...

Related

Using glue-like constructs on RHS in R/Tidyeval

I've spent hours trying to make glue work on the RHS of a formula and am out of clues. Here is a simple reprex.
meta <- function(x, var, suffix){
x<- x %>% mutate("{{var}}_{suffix}":= 5)
x<- x %>% mutate("{{var}}_{suffix}_new":= {{var}} - "{{var}}_{suffix}")
}
x<- meta(mtcars, mpg, suf)
#Should be equivalent to
x<- mtcars %>% mutate(mpg_suf:= 5)
x<- x%>% mutate(mpg_suf_new:= mpg - mpg_suf)
#N: Tried https://stackoverflow.com/questions/70427403/how-to-correctly-glue-together-prefix-suffix-in-a-function-call-rhs but none of the methods in it worked, unfortunately
Meta function gives me "Error in local_error_context(dots = dots, .index = i, mask = mask) :
promise already under evaluation: recursive default argument reference or earlier problems? "
I went over all the search hits for this error on SO, but nothing has worked so far.
Would really appreciate any insights. Thank you!
Here is a working version:
meta <- function(x, var, suffix){
new_name <- rlang::englue("{{ var }}_{{ suffix }}")
x %>%
mutate("{new_name}" := 5) %>%
mutate("{new_name}_new" := {{ var }} - .data[[new_name]])
}
names(meta(mtcars, mpg, suf))
#> [1] "mpg" "cyl" "disp" "hp"
#> [5] "drat" "wt" "qsec" "vs"
#> [9] "am" "gear" "carb" "mpg_suf"
#> [13] "mpg_suf_new"
To understand what is going on:
Learn about the difference between "{{ var }}" and "{var}" in tidyeval glue strings: https://rlang.r-lib.org/reference/glue-operators.html
Learn about englue() to create glue strings outside of the LHS of :=: https://rlang.r-lib.org/reference/englue.html. This part is not necessary but I thought it was nicer to create and reuse a variable.
The tricky part: you create a new column with a constructed name and then want to use the column that this name refers to. You'll have to subset it with .data, see: https://rlang.r-lib.org/reference/dot-data.html
See also the general topic: https://rlang.r-lib.org/reference/topic-data-mask-programming.html
I think it's best if we define the pieces we need first, then we can use them as needed on the LHS or the RHS of the calculation. I will add that it doesn't make much sense to me to pass the suffix argument as a bare name. I think it would be a clearer choice to make it string only.
library(dplyr)
meta <- function(x, var, suffix) {
var <- rlang::as_name(enquo(var))
suffix <- rlang::as_name(enquo(suffix)) # Remove this to make "suffix" string only.
new_var <- glue::glue("{var}_{suffix}")
x %>%
mutate("{new_var}" := 5,
"{new_var}_new" := !!sym(var) - !!sym(new_var))
}
mtcars %>%
head() %>%
meta(mpg, suf)
mpg cyl disp hp drat wt qsec vs am gear carb mpg_suf mpg_suf_new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5 16.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5 16.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5 17.8
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5 16.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5 13.7
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 5 13.1
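The suggestion to make suffix string-only can be sketched as follows, assuming rlang 1.0.0 or later (for englue()) and dplyr. The name meta_str is hypothetical; var is still passed as a bare column name, while suffix is an ordinary string and is interpolated with single braces.

```r
library(dplyr)

# meta_str is a hypothetical variant: `var` is a bare column, `suffix` a string.
meta_str <- function(x, var, suffix) {
  # "{{ var }}" captures the column name, "{suffix}" inserts the string value
  new_name <- rlang::englue("{{ var }}_{suffix}")
  x %>%
    mutate("{new_name}" := 5) %>%
    # .data[[new_name]] looks up the freshly created column by its name
    mutate("{new_name}_new" := {{ var }} - .data[[new_name]])
}

res <- meta_str(head(mtcars), mpg, "suf")
names(res)  # ends with "mpg_suf" "mpg_suf_new"
```

Passing the suffix as a string avoids defusing an argument that is never meant to be a column in the data.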

How to use a short script to eliminate all but one duplicate column variables based on the prefix of the colname

I want to know how to use a short script to eliminate all but one of a set of duplicate column variables, based on the prefix of the column name, without typing in by hand the variables I want to remove.
For example, I created repeats of the mtcars$am variables, called am1, am2, am3, and am4 in a data frame called mtcars_example_2. I removed the original am variable in the mtcars_example_2 data frame.
I can use the script below to drop every variable with the prefix "am" except am1 and store the result in a new data frame called mtcars_example_3, but it lists all the variables to remove by hand:
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
But this seems like the long way of doing this. Is there a faster way that does not require me to individually type in the name of each variable that I want to remove from the data?
Is this possible? If so, how can this be done?
Thanks ahead of time.
Here is the code for the example:
# example data
## loads packages
library(tidyverse)
## creates mtcars_example data
mtcars_example_1 <- data.frame(mtcars)
mtcars_example_2 <- data.frame(mtcars_example_1)
## creates duplicate variables, based on am variable
mtcars_example_2$am1 <- mtcars_example_1$am
mtcars_example_2$am2 <- mtcars_example_1$am
mtcars_example_2$am3 <- mtcars_example_1$am
mtcars_example_2$am4 <- mtcars_example_1$am
## removes original variable
mtcars_example_2 <-
mtcars_example_2 %>%
select(
-c(
"am"
)
)
## long way of removing all variable with am prefix that were not am1
mtcars_example_3 <-
mtcars_example_2 %>%
select(
-c(
"am2", "am3", "am4"
)
)
You can remove all the variables that start with am but keep am1 :
library(dplyr)
mtcars_example_2 %>% select(-starts_with('am'), am1) %>% head
# mpg cyl disp hp drat wt qsec vs gear carb am1
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 4 4 1
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 4 4 1
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 4 1 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 3 1 0
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 3 2 0
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 3 1 0
Depending on your actual scenario you can also use regex to remove columns.
mtcars_example_2 %>% select(-matches('am[2-4]')) %>% head
We could also use contains(), which matches "am" anywhere in the column name rather than only at the start:
library(dplyr)
mtcars_example_2 %>%
select(-contains('am'), am1)
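If the column to keep should not be named explicitly either, a base R sketch along these lines keeps only the first column that starts with a given prefix. The helper name drop_dup_prefix is hypothetical, and the sketch assumes that "first in column order" is the copy you want to keep.

```r
# Rebuild the example data from the question using base R only.
mtcars_example_2 <- mtcars
for (nm in paste0("am", 1:4)) mtcars_example_2[[nm]] <- mtcars$am
mtcars_example_2$am <- NULL

# Hypothetical helper: keep the first column with the prefix, drop the rest.
drop_dup_prefix <- function(df, prefix) {
  hits <- which(startsWith(names(df), prefix))
  # guard: with zero or one match there is nothing to drop
  if (length(hits) > 1) df <- df[, -hits[-1], drop = FALSE]
  df
}

mtcars_example_3 <- drop_dup_prefix(mtcars_example_2, "am")
names(mtcars_example_3)
```

The guard matters because negative indexing with an empty vector would otherwise select no columns at all.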

%>% .$column_name equivalent for R base pipe |>

I frequently use the dplyr piping to get a column from a tibble into a vector as below
iris %>% .$Sepal.Length
iris %>% .$Sepal.Length %>% cut(5)
How can I do the same using the latest R built-in pipe symbol |>
iris |> .$Sepal.Length
iris |> .$Sepal.Length |> cut(5)
Error: function '$' not supported in RHS call of a pipe
We can use getElement().
iris |> getElement('Sepal.Length') |> cut(5)
In base pipe no placeholder is provided for the data that is passed in the pipe. This is one difference between magrittr pipe and base R pipe. You may use an anonymous function to access the object.
iris |> {\(x) x$Sepal.Length}()
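As a quick check that these routes return the same vector, here is a small sketch, assuming R 4.1.0 or later (needed for |> and the \(x) lambda shorthand). The variable names are only for illustration.

```r
# Three ways to extract one column; the first two go through the base pipe.
via_get_element <- iris |> getElement("Sepal.Length")
via_lambda      <- iris |> (\(x) x$Sepal.Length)()
via_indexing    <- iris[["Sepal.Length"]]

identical(via_get_element, via_lambda)    # TRUE
identical(via_get_element, via_indexing)  # TRUE

# The extracted vector can be piped onward as usual.
binned <- iris |> getElement("Sepal.Length") |> cut(5)
nlevels(binned)  # 5
```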
The direct usage of $ in |> is currently disabled. If a call to $ or another disabled function is still needed in |>, an option besides creating a function is to reach $ through :: as base::`$`, or to place it in parentheses as (`$`):
iris |> (`$`)("Sepal.Length")
iris |> base::`$`("Sepal.Length")
iris |> (\(.) .$Sepal.Length)()
fun <- `$`
iris |> fun(Sepal.Length)
This will also work in cases where more than one column will be extracted.
iris |> (`[`)(c("Sepal.Length", "Petal.Length"))
Another option is the bizarro pipe ->.; which some call a joke and others a clever use of existing syntax.
iris ->.; .$Sepal.Length
This creates or overwrites . in the .GlobalEnv. rm(.) can be used to remove it. Alternatively it could be processed in local:
local({iris ->.; .$Sepal.Length})
In this case it produces two identical objects in the environment, iris and ., but as long as they are not modified they point to the same address.
tracemem(iris)
#[1] "<0x556871bab148>"
tracemem(.)
#[1] "<0x556871bab148>"
|> is used as a pipe operator in R.
The left-hand side expression lhs is inserted as the first free argument in the call to the right-hand side expression rhs.
mtcars |> head() # same as head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars |> head(2) # same as head(mtcars, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
It is also possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.
mtcars |> lm(mpg ~ disp, data = _)
#mtcars |> lm(mpg ~ disp, _) #Error: pipe placeholder can only be used as a named argument
#Call:
#lm(formula = mpg ~ disp, data = mtcars)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
Alternatively explicitly name the argument(s) before the "one":
mtcars |> lm(formula = mpg ~ disp)
If the placeholder is needed more than once, as a named or unnamed argument in any position, or with a disabled function, use an (anonymous) function:
mtcars |> (\(.) .[.$cyl == 6,])()
#mtcars ->.; .[.$cyl == 6,] # Alternative using bizarro pipe
#local({mtcars ->.; .[.$cyl == 6,]}) # Without overwriting and keeping .
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
mtcars |> (\(.) lm(mpg ~ disp, .))()
#Call:
#lm(formula = mpg ~ disp, data = .)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
An expression written as x |> f(y) is parsed as f(x, y). While the code in a pipeline is written sequentially, regular R semantics for evaluation apply. So piped expressions will be evaluated only when first used in the rhs expression.
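The lazy-evaluation point can be illustrated with a short sketch, assuming R 4.1.0 or later. The function ignore_input is hypothetical: it never touches its argument, so the piped expression is never evaluated.

```r
# Hypothetical function that never uses its argument.
ignore_input <- function(x) "x was never evaluated"

# Parsed as ignore_input(stop(...)); the stop() call is an unevaluated promise.
result <- stop("this error is never raised") |> ignore_input()
result
#> [1] "x was never evaluated"
```

Because the left-hand side becomes an ordinary function argument, it is only evaluated if and when the right-hand side function uses it.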
Interesting example and great answers. Let me add another version: I usually use select and then unlist in such cases. This follows the "speaking R" paradigm and works the same with both operators, %>% and |>:
library("dplyr")
iris %>% select(Sepal.Length) %>% unlist() %>% cut(5)
iris |> select(Sepal.Length) |> unlist() |> cut(5)
Note that select is from dplyr, and pull, suggested by @jpdugo17, is even better.
If we use usual "base R" indexing, it is also short and works in both worlds:
iris[["Sepal.Length"]] |> cut(5)
iris$Sepal.Length |> cut(5)
and thanks to the comment of @zx8754, one can of course also use base R without any pipes
cut(iris$Sepal.Length, 5)
... but I think that the OP just wanted to point out differences in piping. I guess that it is to be applied in a bigger context and iris is only an example.
This is also an option:
iris |> dplyr::pull(Sepal.Length) |> cut(5)
Edit:
I wonder why calling a function with backticks isn't allowed.
iris |> `[`(, 'Sepal.Length')
#>Error: function '[' not supported in RHS call of a pipe
As pointed out by @Hugh, backticks are allowed but some functions are not.
Here's the list of blacklisted functions, extracted from wch's GitHub:
"if", "while", "repeat", "for", "break", "next", "return", "function",
"(", "{",
"+", "-", "*", "/", "^", "%%", "%/%", "%*%", ":", "::", ":::", "?", "|>",
"~", "#", "=>",
"==", "!=", "<", ">", "<=", ">=",
"&", "|", "&&", "||", "!",
"<-", "<<-", "=",
"$", "[", "[[",
"$<-", "[<-", "[[<-",
0
I know this question is closed. Other Base R solutions where we use symbol name instead of the character name might include:
iris |>
with(Sepal.Length)
iris |>
subset(select = Sepal.Length)
Since R 4.2.0, you can use _ as a placeholder for |>. Because "functions in rhs calls [can] not be syntactically special", you cannot use $ directly, so you have to define the function with another name first, and then use the placeholder and the column name:
set <- `$`
iris |> set(x = _, Sepal.Length)

Apply variable function to columns in data.table

I'm wondering if there's a way to apply a function in a string variable to .SD cols in a data.table.
I can generalize all other parts of function calls using a data.table, including input and output columns, which I'm very happy about. But the final piece seems to be applying a variable function to a data.table, which is something I believe I've done before with dplyr and do.call.
library(data.table)
mtcars <- as.data.table(mtcars)
returnNames <- "calculatedColumn"
SDnames <- c("mpg","hp")
myfunc <- function(data) {
print(data)
return(data[,1]*data[,2])
}
This obviously works:
mtcars[,eval(returnNames) := myfunc(.SD),.SDcols = SDnames,by = cyl]
But if I want to apply a dynamic function, something like this does not work:
functionCall <- "myfunc"
mtcars[,eval(returnNames) := lapply(.SD,eval(functionCall)),.SDcols = SDnames,by = cyl]
I get this error:
Error in `[.data.table`(mtcars, , `:=`(eval(returnNames), lapply(.SD, : attempt to apply non-function
Is using "apply" with "eval" the right idea, or am I on the wrong track entirely?
You don't want lapply. Since myfunc takes a data.table with multiple columns, you just want to feed such a data table into the function as one object.
To get the function you need get instead of eval
On the left-hand-side of :=, you can just put the character vector in parentheses, eval isn't needed
mtcars[, (returnNames) := get(functionCall)(.SD)
, .SDcols = SDnames
, by = cyl]
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb calculatedColumn
# 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2310.0
# 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2310.0
# 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2120.4
# 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 2354.0
# 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 3272.5
# 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 1900.5
The code above was run after the following code
library(data.table)
mtcars <- as.data.table(mtcars)
returnNames <- "calculatedColumn"
SDnames <- c("mpg","hp")
myfunc <- function(data) {
print(data)
return(data[,1]*data[,2])
}
functionCall <- "myfunc"
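To see why eval() fails here while get() works, the following base R sketch may help. It uses a simplified, hypothetical version of myfunc without the print() call, and a plain data frame in place of .SD.

```r
# Simplified stand-in for myfunc (hypothetical): multiply the first two columns.
myfunc <- function(data) data[[1]] * data[[2]]
functionCall <- "myfunc"

# eval() of a character string just returns the string, not the function.
is.function(eval(functionCall))  # FALSE
is.character(eval(functionCall)) # TRUE

# get() looks the name up; mode = "function" skips non-function objects.
f <- get(functionCall, mode = "function")
res <- f(mtcars[c("mpg", "hp")])
head(res, 3)
```

This is why lapply(.SD, eval(functionCall)) reports "attempt to apply non-function": it is handed the string "myfunc" rather than the function it names.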

use get() and eval() to pass argument in dplyr functions

I'm trying to write my own function and need to pass an argument into it.
Use mtcars dataset as an example:
get.param <- function(data, var){
data %>% select(eval(var)) %>%
head()
}
get.param(mtcars, 'hp')
In the above function, replacing eval() with get() gave me an error.
I'm a little bit confused about which one I should use. I use get() in some other functions and it works. What is the difference between these two?
You can get it to work via
get.param <- function(data, var){
var <- enquo(var)
data %>% select(!!var) %>%
head()
}
get.param(mtcars, hp)
hp
Mazda RX4 110
Mazda RX4 Wag 110
Datsun 710 93
Hornet 4 Drive 110
Hornet Sportabout 175
Valiant 105
Normally one does not use get or eval with dplyr. See the vignette in the rlang package for how it is done with that package; however, in this particular case one can just pass var directly to select adding parentheses around it so that it does not confuse it with a column called "var" should it exist. If you are not worried about that edge case you could omit the parentheses.
get.param <- function(data, var) {
data %>% select((var)) %>% head
}
get.param(mtcars, 'hp')
giving:
hp
Mazda RX4 110
Mazda RX4 Wag 110
Datsun 710 93
Hornet 4 Drive 110
Hornet Sportabout 175
Valiant 105
Another possibility is to use ... like this, which gives the same answer. In this variation we don't need to add the parentheses to eliminate an edge case. It also allows multiple columns to be specified.
get.param <- function(data, ...) {
data %>% select(...) %>% head
}
get.param(mtcars, 'hp')
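When the column name arrives as a character string, another option not shown in the answers above is the tidyselect helper all_of(), which dplyr re-exports. This is a minimal sketch, assuming a reasonably recent dplyr; the name get_param_chr is hypothetical.

```r
library(dplyr)

# Hypothetical variant: `var` is a character vector of column names.
get_param_chr <- function(data, var) {
  # all_of() marks `var` as an external vector of names, so it is never
  # confused with a column that happens to be called "var"
  data %>% select(all_of(var)) %>% head()
}

one_col  <- get_param_chr(mtcars, "hp")
two_cols <- get_param_chr(mtcars, c("hp", "mpg"))
names(two_cols)
#> [1] "hp"  "mpg"
```

all_of() raises an error if any of the requested names is missing from the data, which is usually the desired behaviour inside a function.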
