Somewhat related to Tidy evaluation programming with dplyr::case_when and Making tidyeval function inside case_when, I want to create strings (using a shiny app) to be parsed later inside a case_when function. Here's an example:
library(tidyverse)
# simulated shiny inputs
new_column = sym("COL_NAME")
number_of_categories = 3
col1_text = "Big"
col1_min = 7.0
col1_max = 8.0
col2_text = "Medium"
col2_min = 5.0
col2_max = 6.9
col3_text = "Small"
col3_max = 4.9
col3_min = 4.0
columninput = sym("Sepal.Length")
DESIRED OUTPUT
iris %>%
mutate(new_column =
case_when(
!!columninput >= col1_min & !!columninput <= col1_max ~ col1_text,
!!columninput >= col2_min & !!columninput <= col2_max ~ col2_text,
!!columninput >= col3_min & !!columninput <= col3_max ~ col3_text
)
)
Because the only thing that changes between the cases is the index, I was thinking we could use the general pattern to create a string:
# create single string
my_string <-function(i) {
paste0("!!", columninput, " >= col", i, "_min & ", "!!", columninput, " <= col", i, "_max ~ col", i, "_text")
}
Then repeat the string for the dynamic number of cases:
mega_string <- map_chr(1:number_of_categories, ~ my_string(.x))
TODO:
This is the part I can't quite piece together: using those strings as the arguments to case_when().
# evaluate somehow?
iris %>%
mutate(
new_column = case_when(
# tidyeval mega_string?
paste(mega_string, collapse = "," )
)
)
Is this even the right approach? How else would you go about solving this - any help high level or otherwise is greatly appreciated!
We could build the whole case_when() call as one string, parse it into an expression, and evaluate it:
library(dplyr)
library(stringr)
iris %>%
mutate(new_column = eval(rlang::parse_expr(str_c('case_when(',
str_c(mega_string, collapse=","), ')'))))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column
#1 5.1 3.5 1.4 0.2 setosa Medium
#2 4.9 3.0 1.4 0.2 setosa Small
#3 4.7 3.2 1.3 0.2 setosa Small
#4 4.6 3.1 1.5 0.2 setosa Small
#5 5.0 3.6 1.4 0.2 setosa Medium
#6 5.4 3.9 1.7 0.4 setosa Medium
#7 4.6 3.4 1.4 0.3 setosa Small
#8 5.0 3.4 1.5 0.2 setosa Medium
#9 4.4 2.9 1.4 0.2 setosa Small
#10 4.9 3.1 1.5 0.1 setosa Small
# ...
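Why the !! prefix inside the strings does no harm: once a string is parsed and evaluated with base eval(), as in the answer above, !! is ordinary base R negation applied twice, not tidy-eval unquoting. Because ! binds more loosely than comparison operators, !!x >= 5 parses as !(!(x >= 5)), which gives the same logical values as x >= 5. A small base R check (x is a hypothetical vector):

```r
# A hypothetical numeric vector
x <- c(4, 6)

# Parsed from a string, "!!" is plain base R negation applied twice
e <- parse(text = "!!x >= 5")[[1]]

# ! binds more loosely than >=, so the call tree is !( !(x >= 5) )
inner <- e[[2]][[2]]
print(inner)

# Double negation of a logical vector returns the same values
print(eval(e))
```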
Or parse each string separately with rlang::parse_expr() and splice the resulting expressions into case_when() with !!!:
library(purrr)
iris %>%
mutate(new_column = case_when(!!! map(mega_string, rlang::parse_expr)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column
#1 5.1 3.5 1.4 0.2 setosa Medium
#2 4.9 3.0 1.4 0.2 setosa Small
#3 4.7 3.2 1.3 0.2 setosa Small
#4 4.6 3.1 1.5 0.2 setosa Small
#5 5.0 3.6 1.4 0.2 setosa Medium
#6 5.4 3.9 1.7 0.4 setosa Medium
#7 4.6 3.4 1.4 0.3 setosa Small
#8 5.0 3.4 1.5 0.2 setosa Medium
#...
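For the "how else" part of the question, the expressions can also be built directly with rlang's expr() and !!, so no strings are parsed at all. A sketch, assuming the Shiny inputs are first collected into three vectors; mins, maxs and labels are hypothetical names, not from the question:

```r
library(dplyr)
library(purrr)
library(rlang)

columninput <- sym("Sepal.Length")
# Hypothetical vectors collecting the Shiny inputs, one element per category
mins   <- c(7.0, 5.0, 4.0)
maxs   <- c(8.0, 6.9, 4.9)
labels <- c("Big", "Medium", "Small")

# One unevaluated formula per category; !! inlines the column symbol
# and the current min/max/label values into the expression
conditions <- pmap(list(mins, maxs, labels), function(lo, hi, label) {
  expr(!!columninput >= !!lo & !!columninput <= !!hi ~ !!label)
})

# Splice the list of formulas into case_when() with !!!
res <- iris %>%
  mutate(new_column = case_when(!!!conditions))

head(res)
```

Because the values are inlined into each formula, case_when() no longer depends on variables such as col1_min existing in the calling environment.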
Thanks for the nice question and answer.
I'm using this in the same context (Shiny).
I'd like to mention another approach that suits my needs better and whose logic I find easier to read: rather than passing variable names in the string to be evaluated, pass the values directly into the string from a tibble with str_glue_data().
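library(tidyverse)  # tribble(), str_glue_data(), map(), mutate()
mega <- tribble(
  ~min, ~max, ~size,
  7,    8,    "Big",
  5,    6.9,  "Medium",
  4,    4.9,  "Small"   # min must not exceed max, or the condition is never TRUE
) %>%
  str_glue_data("Sepal.Length >= {min} & Sepal.Length <= {max} ~ '{size}'")
iris %>%
  mutate(new_column = case_when(!!! map(mega, rlang::parse_expr)))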
I want to rename multiple columns that start with the same string.
However, none of the code I tried changed the column names.
For example this:
df %>% rename_at(vars(matches('^oldname,\\d+$')), ~ str_replace(., 'oldname', 'newname'))
And also this:
df %>% rename_at(vars(starts_with(oldname)), funs(sub(oldname, newname, .))
Is there a suitable way to do this rename?
Thank you!
Taking iris as an example, you can use rename_with() to replace the prefix of the column names that start with "Petal" with a new string:
head(iris) %>%
rename_with(~ sub("^Petal", "New", .x), starts_with("Petal"))
Sepal.Length Sepal.Width New.Length New.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
You can also use rename_at() in this case, although rename_if(), rename_at(), and rename_all() have been superseded by rename_with().
head(iris) %>%
rename_at(vars(starts_with("Petal")), ~ sub("^Petal", "New", .x))
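The question keeps the prefixes in variables (oldname, newname). A minimal sketch with rename_with(), using hypothetical values for those two variables, anchors the pattern with "^" so only the leading prefix is replaced. It assumes oldname contains no regular-expression metacharacters (a "." in it would need escaping):

```r
library(dplyr)

# Hypothetical values standing in for the question's variables
oldname <- "Petal"
newname <- "New"

renamed <- head(iris) %>%
  # starts_with() selects the columns; sub() rewrites only the leading prefix
  rename_with(~ sub(paste0("^", oldname), newname, .x), starts_with(oldname))

print(names(renamed))
```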
I feel I have a simple question, but I cannot get my code to work. In short, I want the condition in a subset() call to be a string. This mostly works, except for the logical operator. So I would want something like this:
my.string = "gender == female"
Subsequently I would run:
myData = subset(myData, my.string)
I have tried things like:
myData = subset(myData, parse(text = my.string))
myData = subset(myData, eval(parse(text = my.string)))
But to no avail. The main reason I want to do this is that I want to be able to define the filter conditions up front in the code, like this:
filter.variable <- filter.condition <- filter.value <- list()
filter.variable[[1]] = "gender"
filter.condition[[1]] = "==" # or %in%
filter.value[[1]] = "female"
i = 1
my.string = paste(filter.variable[[i]],filter.condition[[i]],filter.value[[i]])
This way I do not have to hardwire any filters in R.
Any suggestions are much appreciated,
Alex
The value needs to be quoted inside the string, i.e. gender == "female"; otherwise female is parsed as a variable name. This is easily done with dQuote():
my.string <- paste0('gender == ', dQuote('female', FALSE))
Or you can write the double quotes around the value directly:
my.string = 'gender == "female"'
and then use it in subset() with eval(parse(text = my.string)).
Using a reproducible example:
my.string <- paste0('Species == ', dQuote('setosa', FALSE))
subset(iris, eval(parse(text = my.string)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
#7 4.6 3.4 1.4 0.3 setosa
#8 5.0 3.4 1.5 0.2 setosa
# ...
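Because the question keeps the variable, operator and value in separate lists, another option is to skip string parsing and build the condition as a call object with base R's call() and as.name(). The value is inserted as a constant, so no quoting is needed. This swaps in a different technique from the eval(parse()) answer; it is sketched on iris (standing in for myData) with two illustrative conditions combined with &:

```r
# Filter specifications in the same shape as the question's lists
# (the values here are illustrative, chosen for iris)
filter.variable  <- list("Species", "Sepal.Length")
filter.condition <- list("%in%", ">")
filter.value     <- list(c("setosa", "versicolor"), 5)

# One call per condition, built from operator name, column symbol and value
conds <- Map(function(column, op, val) call(op, as.name(column), val),
             filter.variable, filter.condition, filter.value)

# Combine all conditions with & into a single expression
combined <- Reduce(function(a, b) call("&", a, b), conds)
print(combined)

# Evaluate the expression with the data frame's columns in scope
keep <- eval(combined, iris)
res <- iris[keep, ]
```

With a single condition, such as list("Species"), list("=="), list("setosa"), Reduce() simply returns that one call.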
I want to transform multiple columns in a large data.frame at once using across.
As an example, I want to make this transformation:
library(tidyverse)
iris %>% mutate(Sepal.Length2 = (Sepal.Length^4-min(Sepal.Length^4)) / (max(Sepal.Length^4) - min(Sepal.Length^4)))
but for all columns starting with "Sepal".
I think I can use this command, but I can't figure out how to add my function:
iris %>% mutate(across(starts_with("Sepal")), ... )
Sorry if this is too trivial, but I don't know what to search for on Google to find useful pages.
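We can pass the transformation to across() as a lambda, using .names to add the results as new columns with a "2" suffix: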
library(dplyr)
iris1 <- iris %>%
mutate(across(starts_with("Sepal"),
~ (.^4-min(.^4)) / (max(.^4) - min(.^4)), .names = '{.col}2'))
my_function <- function(x) {
  # min-max scale the 4th power; numerator and denominator need their own parentheses
  y <- (x^4 - min(x^4)) / (max(x^4) - min(x^4))
  return(y)
}
iris %>%
  mutate(across(starts_with("Sepal"), my_function))
I'm looking to use mutate to create multiple columns, but the function isn't a simple across()-style transformation: the other variable it uses has to change along with each column being mutated. Here's an example:
needs=c('Sepal.Length','Petal.Length')
iris %>% mutate_at(needs, ~./'{col}.Width')
This obviously doesn't work, but I'm looking to divide Sepal.Length by Sepal.Width and Petal.Length by Petal.Width.
I think needs should contain the part of the name that both columns have in common, i.e. the prefix.
You can select the columns matching each pattern in needs and divide them by position. !! and := are used to assign the names of the new columns.
library(dplyr)
library(rlang)
needs = c('Sepal','Petal')
purrr::map_dfc(needs, ~iris %>%
select(matches(.x)) %>%
transmute(!!paste0(.x, '_divide') := .[[1]]/.[[2]]))
# Sepal_divide Petal_divide
#1 1.457142857 7.000000000
#2 1.633333333 7.000000000
#3 1.468750000 6.500000000
#4 1.483870968 7.500000000
#...
#...
If you want to add these as new columns to iris, you can bind_cols() the result above with iris.
Here is a base R approach, assuming that the columns you want to divide share a name prefix:
res <- sapply(split.default(iris[-ncol(iris)], sub('\\..*', '', names(iris[-ncol(iris)]))), function(i) i[1] / i[2])
iris[names(res)] <- res
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Petal.Length Sepal.Sepal.Length
#1 5.1 3.5 1.4 0.2 setosa 7.00 1.457143
#2 4.9 3.0 1.4 0.2 setosa 7.00 1.633333
#3 4.7 3.2 1.3 0.2 setosa 6.50 1.468750
#4 4.6 3.1 1.5 0.2 setosa 7.50 1.483871
#5 5.0 3.6 1.4 0.2 setosa 7.00 1.388889
#6 5.4 3.9 1.7 0.4 setosa 4.25 1.384615
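If you would rather keep the question's needs vector of numerator columns, a base R sketch can derive each denominator name by swapping the "Length" suffix for "Width" (an assumption that holds for iris) and divide the columns pairwise with Map(). The "_divide" suffix is a hypothetical naming choice:

```r
needs <- c("Sepal.Length", "Petal.Length")
# Derive the matching denominator column for each numerator
divisors <- sub("Length$", "Width", needs)

df <- iris
# Map() divides each numerator column by its paired denominator column
df[paste0(needs, "_divide")] <- Map(`/`, df[needs], df[divisors])

head(df)
```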
This works well, but listing every column by hand is cumbersome:
> library(dplyr)
> mutate(iris, a = paste( Petal.Width, Petal.Length) ) %>% head
Sepal.Length Sepal.Width Petal.Length Petal.Width Species a
1 5.1 3.5 1.4 0.2 setosa 0.2 1.4
2 4.9 3.0 1.4 0.2 setosa 0.2 1.4
3 4.7 3.2 1.3 0.2 setosa 0.2 1.3
4 4.6 3.1 1.5 0.2 setosa 0.2 1.5
5 5.0 3.6 1.4 0.2 setosa 0.2 1.4
6 5.4 3.9 1.7 0.4 setosa 0.4 1.7
How can I use dplyr's "Select helpers" in paste()?
> mutate(iris, a = paste( starts_with("Petal") ))
Error in mutate_impl(.data, dots) :
wrong result size (0), expected 150 or 1
> mutate_(iris, a = paste( starts_with("Petal") ))
Error in parse(text = x)[[1]] : subscript out of bounds
> mutate_(iris, a = paste( starts_with(Petal) ))
Error in is.string(match) : object 'Petal' not found
> mutate(iris, a = paste( grep("Petal", names(iris), value=T) ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
This did not work either.
> mutate(iris, a = paste( names(iris)[base::startsWith(names(iris),"Petal")] ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
I wrote a very cumbersome function, but it works. I may use it, or keep looking for a simpler one.
> paste.colprefix <- function(DFNAME, PREFIX){
+ TMP <- eval(parse(text= paste0("grep(\"", PREFIX, "\",names(", DFNAME, "), v=T)")))
+ TMP <- paste0(DFNAME, "$",TMP)
+ TMP <- paste0(TMP, collapse = ",")
+ eval(parse(text= paste0( "paste(", TMP, ")")))
+ }
>
> iris$PetalPaste <- paste.colprefix("iris", "Petal")
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species PetalPaste
1 5.1 3.5 1.4 0.2 setosa 1.4 0.2
2 4.9 3.0 1.4 0.2 setosa 1.4 0.2
3 4.7 3.2 1.3 0.2 setosa 1.3 0.2
4 4.6 3.1 1.5 0.2 setosa 1.5 0.2
5 5.0 3.6 1.4 0.2 setosa 1.4 0.2
6 5.4 3.9 1.7 0.4 setosa 1.7 0.4
>
You cannot use select()'s helper functions inside paste().
Here is a trick to get the expected output instead: extract the matching column names from the data frame and pass those columns as arguments to paste().
To extract the column names, you can use either of the following techniques:
base::startsWith(character vector, Starts with string)
cn <- names(iris)[base::startsWith(names(iris),"Petal")]
stringr::str_detect(character vector, regex to find)
cn <- names(iris)[stringr::str_detect(names(iris), "^Petal")]
Each method returns a vector of the column names that start with "Petal".
You can then use it as follows to get the expected result:
iris$a <- do.call(paste,iris[cn])
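Alternatively, tidyr's unite() accepts the same select helpers directly, so the column names need not be extracted first. A sketch: remove = FALSE keeps the source columns, and sep = " " matches paste()'s default separator:

```r
library(tidyr)

# unite() pastes the selected columns together row by row into a new column "a"
res <- unite(iris, "a", starts_with("Petal"), sep = " ", remove = FALSE)

head(res)
```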