I feel I have a simple question, but I cannot get my code to work. In short, I want to supply the condition in a subset() call as a string. This mostly works, except for the logical operator. I would want something like this:
my.string = "gender == female"
Subsequently I would run;
myData = subset(myData, my.string)
I have tried things like;
myData = subset(myData, parse(text = my.string))
myData = subset(myData, eval(parse(text = my.string)))
But to no avail. The main reason I want to do this is that I want to be able to define filter conditions up front in the code, so this would be:
filter.variable[[1]] = "gender"
filter.condition[[1]] = "==" # or %in%
filter.value[[1]] = "female"
i = 1
my.string = paste(filter.variable[[i]],filter.condition[[i]],filter.value[[i]])
This way I do not have to hardwire any filters in R.
Any suggestions are much appreciated,
Alex
We need quotes around 'female' so that it is parsed as a string rather than as an object name. This is easily done with dQuote:
my.string <- paste0('gender == ', dQuote('female', FALSE))
Or write the double quotes directly inside a single-quoted string:
my.string = 'gender== "female"'
and then use that in subset with eval(parse(text = ...)).
Using a reproducible example
my.string <- paste0('Species == ', dQuote('setosa', FALSE))
subset(iris, eval(parse(text = my.string)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
#7 4.6 3.4 1.4 0.3 setosa
#8 5.0 3.4 1.5 0.2 setosa
# ...
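Since the filter in the question is already split into a variable name, an operator and a value, a parse-free route is also possible. The following is only a rough sketch: the helper name filter_rows is made up for illustration, and iris stands in for myData. It looks the operator up with match.fun() and applies it to the column directly.

```r
# Hypothetical helper (not from the original post): apply a filter given as
# three separate pieces instead of one string that has to be parsed.
filter_rows <- function(data, variable, op, value) {
  # match.fun() turns "==" or "%in%" into the corresponding function
  keep <- match.fun(op)(data[[variable]], value)
  # rows where the comparison is NA are dropped, as subset() would do
  data[!is.na(keep) & keep, , drop = FALSE]
}

setosa_only <- filter_rows(iris, "Species", "==", "setosa")
two_species <- filter_rows(iris, "Species", "%in%", c("setosa", "virginica"))
nrow(setosa_only)
nrow(two_species)
```

Because nothing is parsed, the value needs no extra quoting, and a vector of values works with %in% without any string assembly.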
Related
I'm looking to use a non-across function from mutate to create multiple columns. My problem is that the variable in the function will change along with the crossed variables. Here's an example:
needs=c('Sepal.Length','Petal.Length')
iris %>% mutate_at(needs, ~./'{col}.Width')
This obviously doesn't work, but I'm looking to divide Sepal.Length by Sepal.Width and Petal.Length by Petal.Width.
I think your needs should be something that is common to both columns.
You can select the columns matching each pattern in needs and divide them by position. !! and := are used to assign the names of the new columns.
library(dplyr)
library(rlang)
needs = c('Sepal','Petal')
purrr::map_dfc(needs, ~iris %>%
select(matches(.x)) %>%
transmute(!!paste0(.x, '_divide') := .[[1]]/.[[2]]))
# Sepal_divide Petal_divide
#1 1.457142857 7.000000000
#2 1.633333333 7.000000000
#3 1.468750000 6.500000000
#4 1.483870968 7.500000000
#...
#...
If you want to add these as new columns, you can bind_cols the result with iris.
Here is a base R approach that relies on the columns you want to divide sharing a name pattern:
res <- sapply(split.default(iris[-ncol(iris)], sub('\\..*', '', names(iris[-ncol(iris)]))), function(i) i[1] / i[2])
iris[names(res)] <- res
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Petal.Length Sepal.Sepal.Length
#1 5.1 3.5 1.4 0.2 setosa 7.00 1.457143
#2 4.9 3.0 1.4 0.2 setosa 7.00 1.633333
#3 4.7 3.2 1.3 0.2 setosa 6.50 1.468750
#4 4.6 3.1 1.5 0.2 setosa 7.50 1.483871
#5 5.0 3.6 1.4 0.2 setosa 7.00 1.388889
#6 5.4 3.9 1.7 0.4 setosa 4.25 1.384615
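If the doubled-up names (Petal.Petal.Length, Sepal.Sepal.Length) are a nuisance, a more explicit base R variant could look like the sketch below. It assumes every prefix has a matching ".Length" and ".Width" column; the "_divide" suffix and the object names are chosen here purely for illustration.

```r
prefixes <- c("Sepal", "Petal")

# One ratio vector per prefix: <prefix>.Length / <prefix>.Width
ratios <- lapply(prefixes, function(p) {
  iris[[paste0(p, ".Length")]] / iris[[paste0(p, ".Width")]]
})
names(ratios) <- paste0(prefixes, "_divide")

# Bind the new columns onto a copy, leaving iris itself unchanged
iris_ratio <- cbind(iris, as.data.frame(ratios))
head(iris_ratio, 3)
```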
Somewhat related to Tidy evaluation programming with dplyr::case_when and Making tidyeval function inside case_when, I want to create strings (using a shiny app) to be parsed later inside a case_when function. Here's an example:
library(tidyverse)
# simulated shiny inputs
new_column = sym("COL_NAME")
number_of_categories = 3
col1_text = "Big"
col1_min = 7.0
col1_max = 8.0
col2_text = "Medium"
col2_min = 5.0
col2_max = 6.9
col3_text = "Small"
col3_max = 4.9
col3_min = 4.0
columninput = sym("Sepal.Length")
DESIRED OUTPUT
iris %>%
mutate(new_column =
case_when(
!!columninput >= col1_min & !!columninput <= col1_max ~ col1_text,
!!columninput >= col2_min & !!columninput <= col2_max ~ col2_text,
!!columninput >= col3_min & !!columninput <= col3_max ~ col3_text
)
)
Because the only thing changing between functions is the index, I was thinking we can use the general pattern to create a string
# create single string
my_string <- function(i) {
paste0("!!", columninput, " >= col", i, "_min & ", "!!", columninput, " <= col", i, "_max ~ col", i, "_text")
}
Then repeat the string for the dynamic number of cases
mega_string <- map_chr(1:number_of_categories, ~ my_string(.x))
TODO:
This is the part I can't quite piece together: using those strings as the arguments within a case_when.
# evaluate somehow?
iris %>%
mutate(
new_column = case_when(
# tidyeval mega_string?
paste(mega_string, collapse = "," )
)
)
Is this even the right approach? How else would you go about solving this - any help high level or otherwise is greatly appreciated!
We could create an expression and evaluate it:
library(dplyr)
library(stringr)
iris %>%
mutate(new_column = eval(rlang::parse_expr(str_c('case_when(',
str_c(mega_string, collapse=","), ')'))))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column
#1 5.1 3.5 1.4 0.2 setosa Medium
#2 4.9 3.0 1.4 0.2 setosa Small
#3 4.7 3.2 1.3 0.2 setosa Small
#4 4.6 3.1 1.5 0.2 setosa Small
#5 5.0 3.6 1.4 0.2 setosa Medium
#6 5.4 3.9 1.7 0.4 setosa Medium
#7 4.6 3.4 1.4 0.3 setosa Small
#8 5.0 3.4 1.5 0.2 setosa Medium
#9 4.4 2.9 1.4 0.2 setosa Small
#10 4.9 3.1 1.5 0.1 setosa Small
# ...
Or using parse_expr with !!!
library(purrr)
iris %>%
mutate(new_column = case_when(!!! map(mega_string, rlang::parse_expr)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column
#1 5.1 3.5 1.4 0.2 setosa Medium
#2 4.9 3.0 1.4 0.2 setosa Small
#3 4.7 3.2 1.3 0.2 setosa Small
#4 4.6 3.1 1.5 0.2 setosa Small
#5 5.0 3.6 1.4 0.2 setosa Medium
#6 5.4 3.9 1.7 0.4 setosa Medium
#7 4.6 3.4 1.4 0.3 setosa Small
#8 5.0 3.4 1.5 0.2 setosa Medium
#...
Thanks for the nice question and answer.
I'm using this in the same context (shiny).
I'd like to mention another approach that suits my needs better and whose logic I find easier to read: rather than referring to variables in the string to be evaluated, you put the values directly into the string, taking them from a tibble via str_glue_data:
mega <- tribble(
~min, ~max, ~size,
7, 8, "Big",
5, 6.9, "Medium",
4, 4.9, "Small"
) %>%
str_glue_data("Sepal.Length >= {min} & Sepal.Length <= {max} ~ '{size}'")
iris %>%
mutate(new_column = case_when(!!! map(mega, rlang::parse_expr)))
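The string step can also be skipped altogether by building the two-sided formulas as unevaluated calls. This is a sketch under assumptions rather than a tested shiny solution: the vectors mins, maxs and labels stand in for the shiny inputs, and base bquote() is used here in place of parse_expr.

```r
library(dplyr)

# Bounds and labels; in the question these come from shiny inputs
mins   <- c(7, 5, 4)
maxs   <- c(8, 6.9, 4.9)
labels <- c("Big", "Medium", "Small")
column <- as.name("Sepal.Length")

# One unevaluated two-sided formula per category, e.g.
# Sepal.Length >= 7 & Sepal.Length <= 8 ~ "Big"
cases <- unname(Map(
  function(lo, hi, label) {
    bquote(.(column) >= .(lo) & .(column) <= .(hi) ~ .(label))
  },
  mins, maxs, labels
))

# Splice the calls into case_when(), as in the parse_expr answer
result <- iris %>%
  mutate(new_column = case_when(!!!cases))
head(result$new_column)
```

Because the values are inserted as R objects and not as text, labels containing quotes or other special characters need no escaping.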
Before my question, here is a little background.
I am creating a general purpose data shaping and charting library for plotting survey data of a particular format.
As part of my scripts, I am using the subset function on my data frame. The way I am working is that I have a parameter file where I can pass this subsetting criteria into my functions (so I don't need to directly edit my main library). The way I do this is as follows:
subset_criteria <- expression(variable1 != "" & variable2 == TRUE)
(where variable1 and variable2 are columns in my data frame, for example).
Then in my function, I call this as follows:
my.subset <- subset(my.data, eval(subset_criteria))
This part works exactly as I want it to work. But now I want to augment that subsetting criteria inside the function, based on some other calculations that can only be performed inside the function. So I am trying to find a way to combine together these subsetting expressions.
Imagine inside my function I create some new column in my data frame automatically, and then I want to add a condition to my subsetting that says that this additional column must be TRUE.
Essentially, I do the following:
my.data$newcolumn <- with(my.data, ifelse(...some condition..., TRUE, FALSE))
Then I want my subsetting to end up being:
my.subset <- subset(my.data, eval(subset_criteria & newcolumn == TRUE))
But it does not seem like simply doing what I list above is valid. I get the wrong solution. So I'm looking for a way of combining these expressions using expression and eval so that I essentially get the combination of all the conditions.
Thanks for any pointers. It would be great if I can do this without having to rewrite how I do all my expressions, but I understand that might be what is needed...
Bob
You should probably avoid two things: using subset in a non-interactive setting (see the warning in its help page) and eval(parse()). That said, here we go.
You can convert the expression to a string and append whatever you want to it. The trick is to convert the string back into an expression, which is where the aforementioned parse comes in.
sub1 <- expression(Species == "setosa")
subset(iris, eval(sub1))
sub2 <- paste(sub1, '&', 'Petal.Width > 0.2')
subset(iris, eval(parse(text = sub2))) # your case
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
22 5.1 3.7 1.5 0.4 setosa
24 5.1 3.3 1.7 0.5 setosa
27 5.0 3.4 1.6 0.4 setosa
32 5.4 3.4 1.5 0.4 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
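For what it is worth, the two conditions can also be combined without a round trip through strings, which sidesteps both subset and eval(parse()). A minimal sketch, assuming the stored criteria is a length-one expression as in the question; the names extra, combined and my_subset are illustrative only:

```r
# Stored criteria, as in the question
sub1 <- expression(Species == "setosa")

# Extra condition created inside the function, kept as an unevaluated call
extra <- quote(Petal.Width > 0.2)

# bquote() splices both calls into a single `&` call:
# Species == "setosa" & Petal.Width > 0.2
combined <- bquote(.(sub1[[1]]) & .(extra))

# Evaluate with the data frame's columns in scope, then index by the result
keep <- eval(combined, iris)
my_subset <- iris[keep, ]
nrow(my_subset)
```

The original attempt, eval(subset_criteria & newcolumn == TRUE), fails because & is applied to the expression object itself before anything is evaluated; here the combination happens at the level of the calls.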
I am looking for an easy way to construct lists based on R's tidyeval framework, as defined in the rlang package.
Below is what I want to achieve:
a <- "item_name"
b <- "item_value"
identical(
list(!!a := !!b), # list(!!a := b) is of course also fine
list(item_name = "item_value")
)
What I can obtain at the moment is:
list(!!a := !!b)
# output
[[1]]
`:=`(!(!a), !(!b)
Alternatively it can get perhaps a little bit better when adding quosure:
quo(list(!!a := !!b))
# output
<quosure: global>
~list(`:=`("item_name", "item_value"))
Unfortunately I have no idea how to proceed further from here.
In other words I would like to have a similar effect like what we can get in the dplyr package:
mutate(iris, !!a := b)
# first few rows
Sepal.Length Sepal.Width Petal.Length Petal.Width Species item_name
1 5.1 3.5 1.4 0.2 setosa item_value
2 4.9 3.0 1.4 0.2 setosa item_value
3 4.7 3.2 1.3 0.2 setosa item_value
4 4.6 3.1 1.5 0.2 setosa item_value
5 5.0 3.6 1.4 0.2 setosa item_value
6 5.4 3.9 1.7 0.4 setosa item_value
You can use rlang::list2() which supports name-unquoting with := and splicing with !!!.
Note that you shouldn't unquote the argument itself since list2() is not a quoting function, it is just like list() with a few more syntactic features:
library(rlang)
a <- "item_name"
b <- "item_value"
list2(!!a := b)
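If pulling in rlang is not an option, the same list can be built in base R by naming it after the fact. A small sketch; the object name res is arbitrary:

```r
a <- "item_name"
b <- "item_value"

# Build the list first, then attach the name held in `a`
res <- setNames(list(b), a)

identical(res, list(item_name = "item_value"))
```

This covers the single-name case from the question; list2() remains the more convenient choice when names and values are mixed with ordinary arguments or spliced in with !!!.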
This works, but it is cumbersome.
> library(dplyr)
> mutate(iris, a = paste( Petal.Width, Petal.Length) ) %>% head
Sepal.Length Sepal.Width Petal.Length Petal.Width Species a
1 5.1 3.5 1.4 0.2 setosa 0.2 1.4
2 4.9 3.0 1.4 0.2 setosa 0.2 1.4
3 4.7 3.2 1.3 0.2 setosa 0.2 1.3
4 4.6 3.1 1.5 0.2 setosa 0.2 1.5
5 5.0 3.6 1.4 0.2 setosa 0.2 1.4
6 5.4 3.9 1.7 0.4 setosa 0.4 1.7
How can I use dplyr's "Select helpers" in paste()?
> mutate(iris, a = paste( starts_with("Petal") ))
Error in mutate_impl(.data, dots) :
wrong result size (0), expected 150 or 1
> mutate_(iris, a = paste( starts_with("Petal") ))
Error in parse(text = x)[[1]] : subscript out of bounds
> mutate_(iris, a = paste( starts_with(Petal) ))
Error in is.string(match) : object 'Petal' not found
> mutate(iris, a = paste( grep("Petal", names(iris), value=T) ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
This did not work either.
> mutate(iris, a = paste( names(iris)[base::startsWith(names(iris),"Petal")] ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
I wrote a very cumbersome function, but it works. I may use it, or keep looking for a simpler, better one.
> paste.colprefix <- function(DFNAME, PREFIX){
+ TMP <- eval(parse(text= paste0("grep(\"", PREFIX, "\",names(", DFNAME, "), v=T)")))
+ TMP <- paste0(DFNAME, "$",TMP)
+ TMP <- paste0(TMP, collapse = ",")
+ eval(parse(text= paste0( "paste(", TMP, ")")))
+ }
>
> iris$PetalPaste <- paste.colprefix("iris", "Petal")
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species PetalPaste
1 5.1 3.5 1.4 0.2 setosa 1.4 0.2
2 4.9 3.0 1.4 0.2 setosa 1.4 0.2
3 4.7 3.2 1.3 0.2 setosa 1.3 0.2
4 4.6 3.1 1.5 0.2 setosa 1.5 0.2
5 5.0 3.6 1.4 0.2 setosa 1.4 0.2
6 5.4 3.9 1.7 0.4 setosa 1.7 0.4
>
You cannot use select's helper functions inside paste().
The following trick gives the expected output: filter the column names of the data frame and pass the matching columns to paste.
To filter the column names, you can use either of the following techniques.
base::startsWith(character vector, prefix to match)
cn <- names(iris)[base::startsWith(names(iris),"Petal")]
stringr::str_detect(character vector, regex to find)
cn <- names(iris)[stringr::str_detect(names(iris), "^Petal")]
Each method returns a vector of the column names that start with "Petal".
You can then use it as follows to get the expected result:
iris$a <- do.call(paste,iris[cn])
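The reason this works is that a data frame is a list of columns, so do.call() hands each selected column to paste() as a separate argument. Extra arguments such as sep can be appended to that list. A short sketch; the "_" separator and the name pasted are chosen only for illustration:

```r
# Columns whose names start with "Petal"
cn <- names(iris)[startsWith(names(iris), "Petal")]

# c() on a data frame yields a plain list, so `sep` travels along as a
# named argument: paste(Petal.Length, Petal.Width, sep = "_")
pasted <- do.call(paste, c(iris[cn], sep = "_"))
head(pasted)
```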