I am trying to write a vectorized function to compare a string to a vector of regular expressions and return TRUE if the string matches any of the regular expressions. The function should work on a single string, a vector of strings, and within dplyr::mutate. I'm having trouble vectorizing the function. In the below reprex, the function I would like to vectorize is is_favorite. I came up with work-arounds using base::Vectorize and purrr::map but these quite accomplish the goal. I also tried using stringi::stri_detect_regex instead of stringr::str_detect which produced the same results.
Thank you so much for your help!
library(dplyr)
library(tibble)
library(stringr)
favorite_cars <- c("^Merc", "Firebird$")
mtcars <-
mtcars %>%
tibble::rownames_to_column("car")
is_favorite <- function(x) {
any(stringr::str_detect(x, favorite_cars))
}
# Success: Works on single string
is_favorite("Merc 240D")
#> [1] TRUE
is_favorite("Ferrari Dino")
#> [1] FALSE
# Failure: On vector, returns a single boolean instead of a vector of booleans
is_favorite(mtcars$car)
#> [1] TRUE
# Failure: Add column with single boolean instead of vectorized rowwise
mtcars %>%
mutate(fav_car = is_favorite(car)) %>%
head()
#> car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 TRUE
#> 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
# Success...but not ideal
# https://community.rstudio.com/t/vectorise-text-function-for-dplyr-mutate/53387/2
# https://deanattali.com/blog/mutate-non-vectorized/
# https://www.jimhester.com/post/2018-04-12-vectorize/
v_is_favorite <- Vectorize(is_favorite)
mtcars %>%
mutate(fav_car = v_is_favorite(car)) %>%
head()
#> car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 FALSE
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 FALSE
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 FALSE
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 FALSE
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
#> 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 FALSE
# Success...better but still not ideal
# https://www.jimhester.com/post/2018-04-12-vectorize/
# http://yoshidk6.hatenablog.com/entry/2018/09/05/222248
mtcars %>%
mutate(fav_car = purrr::map_lgl(car, is_favorite)) %>%
head()
#> car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#> 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 FALSE
#> 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 FALSE
#> 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 FALSE
#> 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 FALSE
#> 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
#> 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 FALSE
Created on 2020-08-28 by the reprex package (v0.3.0)
I also consulted the following resources:
https://adv-r.hadley.nz/perf-improve.html#vectorise
https://deanattali.com/blog/mutate-non-vectorized/
https://www.jimhester.com/post/2018-04-12-vectorize/
https://community.rstudio.com/t/vectorise-text-function-for-dplyr-mutate/53387/2
http://yoshidk6.hatenablog.com/entry/2018/09/05/222248
https://dplyr.tidyverse.org/articles/programming.html
any would always return only one logical value as output. You should collapse your favorite_cars regex as length 1 string.
is_favorite <- function(x) {
stringr::str_detect(x, paste0(favorite_cars, collapse = "|"))
#Will also work with base R grepl
#grepl(paste0(favorite_cars, collapse = "|"), x)
}
and then use :
library(dplyr)
mtcars %>% mutate(fav_car = is_favorite(car))
# car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#1 Mazda RX4 21.0 6 160.0 110 3.90 2.62 16.5 0 1 4 4 FALSE
#2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.88 17.0 0 1 4 4 FALSE
#3 Datsun 710 22.8 4 108.0 93 3.85 2.32 18.6 1 1 4 1 FALSE
#4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.21 19.4 1 0 3 1 FALSE
#5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.0 0 0 3 2 FALSE
#6 Valiant 18.1 6 225.0 105 2.76 3.46 20.2 1 0 3 1 FALSE
#7 Duster 360 14.3 8 360.0 245 3.21 3.57 15.8 0 0 3 4 FALSE
#8 Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2 TRUE
#9 Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2 TRUE
#10 Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4 TRUE
#11 Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4 TRUE
#...
#...
where the pattern that we are looking for becomes
paste0(favorite_cars, collapse = "|")
#[1] "^Merc|Firebird$"
Related
This question already has answers here:
In R, how to get an object's name after it is sent to a function?
(4 answers)
Closed 1 year ago.
I am trying to create a function which summarises a grouped dataset and then adds a column to identify which variable is being summarised (ID column).
I am not sure how to add the ID column using the curly curly appraoch.
my_fun <- function(dat, var_name){
dat %>%
mutate(id_column = names({{var_name}}))
}
my_fun(mtcars, cyl)
What I want is for the variable name, in this case cyl, to be recycled.
Just, deparse/subsitute at the start
my_fun <- function(dat, var_name){
nm1 <- deparse(substitute(var_name))
dat %>%
mutate(id_column = nm1)
}
-testing
my_fun(mtcars, cyl)
mpg cyl disp hp drat wt qsec vs am gear carb id_column
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 cyl
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 cyl
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 cyl
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 cyl
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 cyl
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 cyl
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 cyl
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 cyl
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 cyl
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 cyl
...
In the tidyverse, it may also be done directly from a symbol i.e. use ensym to convert to symbol and then evaluate (!!) to get the value or convert to string with as_string
my_fun <- function(dat, var_name){
var_name <- rlang::ensym(var_name)
dat %>%
mutate(id_column = rlang::as_string(var_name), val_column = !! var_name)
}
-testing
my_fun(head(mtcars), cyl)
mpg cyl disp hp drat wt qsec vs am gear carb id_column val_column
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 cyl 6
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 cyl 6
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 cyl 4
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 cyl 6
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 cyl 8
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 cyl 6
With mtcars for example, I'd like to create a new column carb_dots such that when carb = 4, carb_dots = "...."
Using dplyr, I've tried
library(dplyr)
mtcars2 <- mtcars %>% mutate(carb_dots = rep(".", carb))
This errors with
Error in mutate_impl(.data, dots) :
Evaluation error: invalid 'times' argument.
What should I do? Thanks for your suggestions.
With the addition of stringr, you can do:
mtcars %>%
mutate(carb_dots = str_dup(".", carb))
mpg cyl disp hp drat wt qsec vs am gear carb carb_dots
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ....
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ....
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 .
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 .
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ..
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 .
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ....
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 ..
9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 ..
10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 ....
We can use strrep
library(dplyr)
mtcars %>%
mutate(carb_dots = strrep(".", carb))
# mpg cyl disp hp drat wt qsec vs am gear carb carb_dots
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 ....
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 ....
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 .
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 .
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 ..
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 .
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 ....
#...
If we need to use rep
mtcars %>%
rowwise %>%
mutate(carb_dots = paste(rep(".", carb), collapse=""))
I'm looking to change values into their opposite (T becomes F, and vice versa) in specific rows in a column in a datatable
I know that x <- !x works for T/F variables but how to finish this dplyr approach:
library(dplyr)
library(datatable)
library(magrittr)
mtcars$selected <- T
mtcars %>% select(selected) %>% slice(c(1,4,5)) %>% mutate(??)
If you just want to subset those rows, then #Shree's answer is likely right. If you want to invert just those rows but otherwise keep all, then something like:
In dplyr:
library(dplyr)
mtcars %>%
mutate(selected = TRUE) %>%
# the heart of the answer
mutate(selected = if_else(row_number() %in% c(1, 4, 5), !selected, selected))
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 TRUE
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 FALSE
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
# ...
You said datatable, I think you meant data.table, in which case
library(data.table)
DT <- as.data.table(mtcars)
DT[, selected := TRUE]
DT[, selected := ifelse(.I %in% c(1, 3, 4), !selected, selected)]
head(DT, n = 8)
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# 3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 FALSE
# 4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# 5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 TRUE
# 6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# 7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# 8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
Or pipe-wise as
library(magrittr)
DT <- as.data.table(mtcars)
DT %>%
.[, selected := TRUE] %>%
.[, selected := ifelse(.I %in% c(1, 3, 4), !selected, selected)]
head(DT, n = 8)
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# 3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 FALSE
# 4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# 5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 TRUE
# 6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# 7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# 8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
In Base R, you can do it this way:
mtcars$selected <- TRUE
mtcars$selected[c(1, 3, 4)] <- !mtcars$selected[c(1, 3, 4)]
head(mtcars, n = 8)
# mpg cyl disp hp drat wt qsec vs am gear carb selected
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 FALSE
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 TRUE
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 FALSE
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 FALSE
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 TRUE
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 TRUE
# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 TRUE
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 TRUE
Here's one way -
mtcars %>%
select(selected) %>%
slice(c(1,4,5)) %>%
mutate(
selected = !selected # or as.logical(1 - selected)
)
How do I transform a string of expression into a quotable expression?
Example:
This is the result I want:
mutate(mtcars,answer=wt+wt)
# mpg cyl disp hp drat wt qsec vs am gear carb answer
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.240
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.750
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4.640
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 6.430
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 6.880
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6.920
...
Here's the function I am writing:
f<-function(df,string_expression){
se<-enexpr(string_expression)
mutate(df,answer=!!se)
}
It will work if I use the following functional call:
f(mtcars,wt+wt)
# mpg cyl disp hp drat wt qsec vs am gear carb answer
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 5.240
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 5.750
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 4.640
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 6.430
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 6.880
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 6.920
...
However, I would like to provide the expression as a string, so I must use the following function call:
f(mtcars,'wt+wt')
# mpg cyl disp hp drat wt qsec vs am gear carb answer
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 wt+wt
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 wt+wt
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 wt+wt
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 wt+wt
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 wt+wt
...
How do I make it (either change the function definition or function call) to get the result I want?
What I have tried:
I have tried to sym(string_expression) -- didn't work.
I have tried to quo(string_expression) -- didn't work.
Thank you!
You could change your f function to something this:
f<-function(df,string_expression){
mutate(df, answer = eval(parse(text = string_expression)))
}
head(f(mtcars,'wt+wt'))
mpg cyl disp hp drat wt qsec vs am gear carb answer
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5.24
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5.75
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 4.64
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 6.43
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6.88
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 6.92
While within dplyr workflow I would like to append a row across a selected number of columns.
Desired results
Starting with the mtcarsdata and applying function(s) with the goal of adding string "A" to columns 2:5 the one should arrive at the following results:
mpg cyl disp hp drat wt qsec vs am gear carb
NA A A A A NA NA NA NA NA NA
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
The following criteria were met:
For the columns with available index in vars() call the "A" string was added
For the remaining columns the NA value was provided
Approach
require(dplyr)
mtcars %>%
mutate_at(.cols = vars(2:5),
.funs = add_row(. = "A", .before = 1))
Naturally, this results in an error message:
Error: Unsupported index type: NULL
Hence my question: how can I utilise add_row, or a similar approach, to force value across a set of columns initially passed via vars()?
Side notes
I don't mind doing this via rbind but I would like to keep my %>% workflow:
%>% - receive object
Add something across first row to columns x:y %>%
Add something across first row to columns m:n %>%
Other manipulations
Add the row then update:
mtcars %>%
head %>%
add_row(.before = 1) %>%
mutate_at(.cols = vars(2:5),
funs(ifelse(is.na(.), "A", .)))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 NA A A A A NA NA NA NA NA NA
# 2 21.0 6 160 110 3.9 2.620 16.46 0 1 4 4
# 3 21.0 6 160 110 3.9 2.875 17.02 0 1 4 4
# 4 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 6 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# 7 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Note: This will add "A" to any row that has NAs.