Passing data variable to function environmental variable as object or string - r

Take this toy function - just a wrapper around dplyr::select:
select_col <- function(df, chosen_col) {
if(is.character(enquo(chosen_col))){
df %>%
select(all_of(chosen_col))
} else {
df %>%
select({{ chosen_col }})
}
}
It allows the selection of a column passed to chosen_col, regardless of whether that column is presented as a data variable masked as an environmental variable or a string masked as an environmental variable.
Both of these return the same thing:
mtcars %>%
select_col(chosen_col = mpg) %>%
head(2)
mpg
Mazda RX4 21
Mazda RX4 Wag 21
mtcars %>%
select_col(chosen_col = "mpg") %>%
head(2)
mpg
Mazda RX4 21
Mazda RX4 Wag 21
While select_col works, what I want is something more like this, which doesn't work:
select_col_desired <- function(df, chosen_col) {
if(is.character(enquo(chosen_col))){
chosen_col <- convert_to_env_variable(chosen_col)
}
df %>%
select({{ chosen_col }})
}
What can I use in place of the non-existent function convert_to_env_variable such that select_col_desired returns the same things as select_col?
I am aware that select does this already outside a function.

Solution 1: Single Column
If you truly want something like convert_to_env_variable(), you can just use rlang::enquo() to begin with, and skip the conditional.
select_col_desired <- function(df, chosen_col) {
chosen_col <- enquo(chosen_col)
df %>%
select(!!chosen_col)
}
Heck, you could just skip the enquo() and !! altogether:
select_col_desired <- function(df, chosen_col) {
df %>%
select({{ chosen_col }})
}
This is because the "curly-curly" operator {{ }} first captures the argument passed to chosen_col and defuses it as an expression, then substitutes the result in its own place, where it is evaluated as a data variable in the context of df. Since select() accepts both bare column names and strings, both forms work.
Result
Either way, you'll get this:
# With column as unquoted symbol.
mtcars %>%
select_col_desired(chosen_col = mpg) %>%
head(2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
# With column as string.
mtcars %>%
select_col_desired(chosen_col = "mpg") %>%
head(2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
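To see why the conditional in the question is not needed, here is a small sketch that inspects what enquo() actually returns. The helper show_capture() is hypothetical and exists only for illustration. It suggests that is.character(enquo(chosen_col)) is FALSE whatever is passed, because enquo() returns a quosure rather than the value itself.

```r
library(rlang)

# Hypothetical helper for illustration: reports what enquo() captured.
show_capture <- function(chosen_col) {
  captured <- enquo(chosen_col)
  list(
    is_quosure = is_quosure(captured),
    # A quosure is never a character vector, whatever it wraps.
    quosure_is_character = is.character(captured),
    # The wrapped expression is a string only when a string was passed.
    expr_is_character = is.character(quo_get_expr(captured))
  )
}

from_symbol <- show_capture(mpg)
from_string <- show_capture("mpg")
str(from_symbol)
str(from_string)
```

So in the original select_col only the else branch ever runs, and it is {{ }} inside select() that handles both the symbol and the string.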
Solution 2: Arbitrary Selection
For a more general solution, you could simply pass your columns as ... to select():
select_any_desired <- function(df, ...) {
df %>% select(...)
}
Result
# With column as unquoted symbol.
mtcars %>%
select_any_desired(mpg) %>%
head(2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
# With column as string.
mtcars %>%
select_any_desired("mpg") %>%
head(2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
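Because ... is forwarded to select() untouched, the same wrapper should also accept several selections at once, including tidyselect helpers. A minimal sketch, repeating the definition of select_any_desired() so the block is self-contained:

```r
library(dplyr)

select_any_desired <- function(df, ...) {
  df %>% select(...)
}

# Mix an unquoted symbol, a string, and a tidyselect helper in one call.
mixed <- mtcars %>%
  select_any_desired(mpg, "cyl", starts_with("d"))

names(mixed)
```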
Note
If you're writing a package, you'll naturally need to @import or qualify functions like rlang::enquo(), magrittr::`%>%`() and dplyr::select(), rather than using library().

Step 1: get the name for the variable
This is done via as.character(substitute(chosen_col)) and we also want to save it for step 2.
varname <- as.character(substitute(chosen_col))
Step 2: assign to a different environment
Assignment can be done using assign(x, value, envir = parent.frame()). Here it is essential to use parent.frame() as the target environment, since we want to assign outside of the function and the default behavior is to assign in the current environment. Another option would be to assign to the global environment with envir = globalenv().
This leads us to
assign(varname, x, envir=parent.frame())
where x contains everything we want to assign. Read this as varname <- x, carried out in the calling environment.
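The effect of envir = parent.frame() can be checked in isolation. The sketch below uses two hypothetical functions, assign_in_caller() and caller(), which are not part of the answer: the first writes a variable into whatever environment called it, and the second confirms that the variable appeared there.

```r
# Hypothetical helper: stores `value` under the name `name`
# in the environment of whoever called this function.
assign_in_caller <- function(name, value) {
  assign(name, value, envir = parent.frame())
  invisible(value)
}

caller <- function() {
  assign_in_caller("result", 42)
  # `result` now exists locally in caller(), although caller()
  # never assigned it directly.
  exists("result", envir = environment(), inherits = FALSE)
}

caller()
```

Note that this is a side effect on the caller's environment, so an existing variable with the same name would be silently overwritten.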
Putting both steps together:
select_col <- function(df, chosen_col) {
if(is.character(enquo(chosen_col))){
x <- df %>%
select(all_of(chosen_col))
} else {
x <- df %>%
select({{ chosen_col }})
}
varname <- as.character(substitute(chosen_col))
assign(varname, x, envir=parent.frame())
x
}
select_col(mtcars, "mpg") %>% head(2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
head(mpg, 2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
rm(mpg)
select_col(mtcars, mpg) %>% head(2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21
head(mpg, 2)
#> mpg
#> Mazda RX4 21
#> Mazda RX4 Wag 21

Related

passing column names in functions as strings and evaluate with mutate

I see gobs of posts about passing column names as strings to function but none of them consider this use case. All the methods I see don't work. Here is one. Please compare the what_I_want column to the what_I_get column below. I want the value of items in the column, not the column name, of course. Thanks.
library(dplyr)
Fun <- function(df,column) {
df %>%
mutate(what_I_want = cyl) %>%
# current best practice? Doesn't work in this case.
mutate(what_I_get := {{column}})
}
mtcars[1:2,1:3] %>% Fun("cyl")
#> mpg cyl disp what_I_want what_I_get
#> Mazda RX4 21 6 160 6 cyl
#> Mazda RX4 Wag 21 6 160 6 cyl
Created on 2022-11-07 with reprex v2.0.2
Just add get
Fun <- function(df,column) {
df %>%
mutate(what_I_want = get(column) )
}
mtcars[1:2,1:3] %>% Fun("cyl")
mpg cyl disp what_I_want
Mazda RX4 21 6 160 6
Mazda RX4 Wag 21 6 160 6
We may use ensym, which accepts both quoted and unquoted column names:
Fun <- function(df,column) {
df %>%
mutate(what_I_want = !! rlang::ensym(column))
}
-testing
> mtcars[1:2,1:3] %>% Fun("cyl")
mpg cyl disp what_I_want
Mazda RX4 21 6 160 6
Mazda RX4 Wag 21 6 160 6
> mtcars[1:2,1:3] %>% Fun(cyl)
mpg cyl disp what_I_want
Mazda RX4 21 6 160 6
Mazda RX4 Wag 21 6 160 6
Using the .data pronoun you could do:
library(dplyr)
Fun <- function(df,column) {
df %>%
mutate(what_I_get = .data[[column]])
}
mtcars[1:2,1:3] %>%
Fun("cyl")
#> mpg cyl disp what_I_get
#> Mazda RX4 21 6 160 6
#> Mazda RX4 Wag 21 6 160 6
For more on the .data pronoun see Data mask programming patterns.
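The .data pronoun also combines with dynamic output names. As a sketch, the hypothetical helper add_doubled() below takes a column name as a string, reads the values through .data[[column]], and builds the new column name from the same string with := and a glue-style name (this assumes dplyr 1.0 or later).

```r
library(dplyr)

# Hypothetical helper: `column` is a string naming an existing column.
add_doubled <- function(df, column) {
  df %>%
    # The left-hand side is a glue string, so it becomes e.g. "cyl_doubled".
    mutate("{column}_doubled" := .data[[column]] * 2)
}

doubled <- mtcars[1:2, 1:3] %>% add_doubled("cyl")
doubled
```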

Ignore `NULL` parameters in dplyr::filter

Given:
gear <- NULL
carb <- NULL
mpg <- 20
How would I pass all of these variables to dplyr::filter but have it ignore the NULL arguments and not return an error? This becomes especially problematic when I have 10+ variables in a function, any of which could be NULL, and need to filter a data frame by any possible permutation of the variables. I do not want my function to need a considerable number of if statements in order to run properly.
As expected, this fails but is in-line with what I want to do.
mtcars |>
filter(gear == !!gear & carb == !!carb & mpg == !!mpg)
In practice, I want dplyr to basically evaluate: mtcars |> filter(mpg == !!mpg) because it is the only variable that is not missing.
One way is to customize an operator similar to the operator == but returning TRUE when the second input is NULL.
library(dplyr)
`%==%` <- function (e1, e2) {
if (is.null(e2)) {
return(TRUE)
} else {
return(e1 == e2)
}
}
gear <- NULL
carb <- NULL
mpg <- 21
mtcars |>
filter(gear %==% !!gear & carb %==% !!carb & mpg %==% !!mpg)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
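An alternative sketch, not taken from the answer above, drops the NULL values before filter() ever sees them and splices the remaining conditions with !!!. The function filter_non_null() is hypothetical, and it assumes every argument name is also a column name and that equality is the intended comparison.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: each named argument is a column/value pair;
# pairs whose value is NULL are ignored.
filter_non_null <- function(df, ...) {
  values <- list(...)
  values <- values[!vapply(values, is.null, logical(1))]

  # Build one `.data[["col"]] == value` expression per remaining pair.
  conditions <- lapply(names(values), function(nm) {
    val <- values[[nm]]
    expr(.data[[!!nm]] == !!val)
  })

  filter(df, !!!conditions)
}

gear <- NULL
carb <- NULL
mpg <- 21

filtered <- filter_non_null(mtcars, gear = gear, carb = carb, mpg = mpg)
filtered
```

If every value is NULL, no conditions are spliced in and the data frame comes back unfiltered.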

Dplyr: Conditionally rename multiple variables with regex by name

I need to rename multiple variables using a replacement dataframe. This replacement dataframe also includes regex. I would like to use a solution similar to the one proposed here, e.g.
df %>% rename_with(~ newnames, all_of(oldnames))
MWE:
df <- mtcars[, 1:5]
# works without regex
replace_df_1 <- tibble::tibble(
old = df %>% colnames(),
new = df %>% colnames() %>% toupper()
)
df %>% rename_with(~ replace_df_1$new, all_of(replace_df_1$old))
# with regex
replace_df_2 <- tibble::tibble(
old = c("^m", "cyl101|cyl", "disp", "hp", "drat"),
new = df %>% colnames() %>% toupper()
)
old new
<chr> <chr>
1 ^m MPG
2 cyl101|cyl CYL
3 disp DISP
4 hp HP
5 drat DRAT
# does not work
df %>% rename_with(~ replace_df_2$new, all_of(replace_df_2$old))
df %>% rename_with(~ matches(replace_df_2$new), all_of(replace_df_2$old))
EDIT 1:
The solution of @Mael works in general, but there seems to be an index issue, e.g. consider the following example
replace_df_2 <- tibble::tibble(
old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
new = mtcars[,1:5] %>% colnames() %>% toupper()
)
mtcars[, 1:5] %>%
rename_with(~ replace_df_2$new, matches(replace_df_2$old))
Results in
mpg MPG disp hp drat
<dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9
meaning that the rename_with function correctly finds the column, but replaces it with the first item in the replacement column. How can we tell the function to take the respective row where a replacement has been found?
So in this example (edit 1), I only want to substitute the second column with "CYL", the rest should be left untouched. The problem is that the function takes the first replacement (MPG) instead of the second (CYL).
Thank you for any hints!
matches should be on the regex-y column:
df %>%
rename_with(~ replace_df_2$new, matches(replace_df_2$old))
MPG CYL DISP HP DRAT
Mazda RX4 21.0 6 160.0 110 3.90
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Datsun 710 22.8 4 108.0 93 3.85
Hornet 4 Drive 21.4 6 258.0 110 3.08
Hornet Sportabout 18.7 8 360.0 175 3.15
Valiant 18.1 6 225.0 105 2.76
#...
If the task is simply to set all col names to upper-case, then this works:
sub("^(.+)$", "\\U\\1", colnames(df), perl = TRUE)
[1] "MPG" "CYL" "DISP" "HP" "DRAT"
In dplyr:
df %>%
rename_with(~ sub("^(.+)$", "\\U\\1", .x, perl = TRUE))
I found a solution using the idea of non-standard evaluation from this question and @Maël's answer.
Using map_lgl we create a logical vector that is TRUE for each pattern in replace_df_2$old that matches a column name in the dataframe df. Then we use this logical vector to subset replace_df_2$new and get the correct replacement.
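library(dplyr)
library(purrr)
library(stringr)
df <- mtcars[, 1:5]
df %>%
  # str_detect() takes the strings first and the pattern second
  rename_with(.fn = ~ replace_df_2$new[map_lgl(replace_df_2$old, ~ any(str_detect(names(df), .x)))],
              .cols = matches(replace_df_2$old))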
Result:
mpg CYL disp hp drat
Mazda RX4 21.0 6 160.0 110 3.90
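Another way to keep each pattern paired with its own replacement is to look up every column name individually. The sketch below is only one possible approach, and lookup_new_name() is a hypothetical helper. It assumes the first matching pattern wins, and it would produce duplicate names if two columns matched the same row of the replacement table.

```r
library(dplyr)

# Hypothetical helper: for each column name, return the `new` value of the
# first row whose `old` regex matches it, or the name unchanged if none does.
lookup_new_name <- function(col_names, replacements) {
  vapply(col_names, function(nm) {
    hit <- which(vapply(replacements$old, grepl, logical(1), x = nm))
    if (length(hit) == 0) nm else replacements$new[hit[1]]
  }, character(1), USE.NAMES = FALSE)
}

replacements <- data.frame(
  old = c("xxxx", "cyl101|cyl", "yyy", "xxx", "yyy"),
  new = c("MPG", "CYL", "DISP", "HP", "DRAT"),
  stringsAsFactors = FALSE
)

renamed <- mtcars[, 1:5] %>%
  rename_with(~ lookup_new_name(.x, replacements))

names(renamed)
```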

How to rename a variable with spaces in the name dynamically in dplyr?

I want to rename a variable in my dataframe using dplyr to have spaces but this variable name is a concatenation of a dynamic variable and a static string. In the following example, I'd need "Test1" to be a dynamic variable
df <- mtcars %>% select(`Test1 mpg` = "mpg")
So when I try this, I end up with an error:
var <- "Test1"
df <- mtcars %>% select(paste0(var, " mpg") = "mpg")
How could I go about making those new variable names dynamic?
Using the special assignment operator := you could do:
library(dplyr)
df <- mtcars %>% select(`Test1 mpg` = "mpg")
var <- "Test1"
mtcars %>%
select("{var} mpg" := "mpg")
#> Test1 mpg
#> Mazda RX4 21.0
#> Mazda RX4 Wag 21.0
#> Datsun 710 22.8
#> Hornet 4 Drive 21.4
or using !!sym():
mtcars %>%
select(!!sym(paste(var, "mpg")) := "mpg")
#> Test1 mpg
#> Mazda RX4 21.0
#> Mazda RX4 Wag 21.0
#> Datsun 710 22.8
#> Hornet 4 Drive 21.4
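If both the prefix and the column are dynamic, one more option is to build a named character vector and hand it to all_of(), which renames while selecting. This is a sketch with a hypothetical helper select_with_label(), and it assumes both arguments are strings.

```r
library(dplyr)

# Hypothetical helper: selects `col` and renames it to "<prefix> <col>".
select_with_label <- function(df, prefix, col) {
  new_name <- paste(prefix, col)  # paste() inserts a single space
  df %>% select(all_of(setNames(col, new_name)))
}

labelled <- select_with_label(mtcars, "Test1", "mpg")
names(labelled)
```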

Use pipes in R to set data

Is it possible to use the pipe operator in R not to get but to set data?
Let's say I want to modify the first row of the mtcars dataset and set the value of qsec to 99.
Traditional way:
mtcars[1, 7] <- 99
Is that also possible using the pipe operator?
mtcars %>% filter(qsec == 16.46) %>% select(qsec) <- 99
If the chain is absolutely necessary, or if we are just curious to know whether [<- can be applied in a chain:
library(magrittr)
mtcars %>%
`[<-`(1, 7, 99) %>%
head(2)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21 6 160 110 3.9 2.620 99.00 0 1 4 4
#Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Also, inset (from the comments) is an alias for [<-
mtcars %>%
inset(1, 7, 99) %>%
head(2)
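For comparison, a dplyr-style sketch of the same update uses mutate() with base R's replace(). Like the calls above, it returns a modified copy and leaves mtcars itself unchanged unless the result is assigned back. The name updated is only illustrative.

```r
library(dplyr)

# Replace the first element of qsec with 99 inside a pipe.
updated <- mtcars %>%
  mutate(qsec = replace(qsec, 1, 99))

head(updated, 2)
```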

Resources