Evaluate condition using quasiquotation [duplicate] - r

This question already has answers here:
How to store filter expressions as strings? (2 answers)
I would like to filter a dataframe df based on some filter_phrase using quasiquotation (similar to this question here). However, instead of dynamically setting the column, I would like to evaluate the entire condition:
library(dplyr)
library(rlang)
df <- data.frame(a = 1:5, b = letters[1:5])
filter_phrase <- "a < 4"
df %>% filter(sym(filter_phrase))
The expected output should look like this:
> df %>% filter(a < 4)
a b
1 1 a
2 2 b
3 3 c
Any help is greatly appreciated.

An option would be parse_expr. The 'filter_phrase' is an expression stored as a string; we can convert it to a language object with parse_expr and then evaluate it with !!.
library(dplyr)
df %>%
filter(!! rlang::parse_expr(filter_phrase))
# a b
#1 1 a
#2 2 b
#3 3 c
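For comparison, here is a base R sketch (not part of the original answer) that evaluates the same string condition without rlang, by parsing the string and evaluating it against the data frame's columns:
# Parse the string into an unevaluated expression, then evaluate it
# with the data frame supplied as the environment (a sketch).
df[eval(parse(text = filter_phrase), envir = df), ]
#  a b
#1 1 a
#2 2 b
#3 3 c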


In R: is there an elegant way to split data.frame row by "," and add to existing rows matching the splitted strings? [duplicate]

This question already has answers here:
How can I calculate the sum of comma separated in the 2nd column (3 answers)
I have a data frame:
df <- data.frame(sample_names=c("foo","bar","foo, bar"), sample_values=c(1,5,3))
df
sample_names sample_values
1 foo 1
2 bar 5
3 foo, bar 3
and I want a resulting data.frame of the following shape:
sample_names sample_values
1 foo 4
2 bar 8
Is there an elegant way to achieve this? My workaround would be to grep for "," and then somehow fiddle the result back into the existing rows. Since I want to apply this to multiple data frames, I'd like to come up with an easier solution. Any ideas?
We can use separate_rows to split the column, then do a group-by operation to get the sum:
library(dplyr)
library(tidyr)
df %>%
separate_rows(sample_names) %>%
group_by(sample_names) %>%
summarise(sample_values = sum(sample_values), .groups = 'drop')
Output:
# A tibble: 2 x 2
# sample_names sample_values
# <chr> <dbl>
#1 bar 8
#2 foo 4
Or with base R: split the column into a list of vectors with strsplit, then use tapply to do a group-by sum:
lst1 <- strsplit(df$sample_names, ",\\s+")
tapply(rep(df$sample_values, lengths(lst1)), unlist(lst1), FUN = sum)
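Since the asker wants to apply this to multiple data frames, one option (a sketch, assuming the data frames share the same column names) is to wrap the pipeline above in a function and lapply it over a list:
library(dplyr)
library(tidyr)

# Wrap the separate_rows/group_by/summarise pipeline in a reusable function.
sum_split_names <- function(d) {
  d %>%
    separate_rows(sample_names) %>%
    group_by(sample_names) %>%
    summarise(sample_values = sum(sample_values), .groups = 'drop')
}

# 'list_of_dfs' is a placeholder for the asker's collection of data frames.
list_of_dfs <- list(df, df)
lapply(list_of_dfs, sum_split_names)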

Using multiple conditions in select (dplyr)

I want to choose certain columns of a data frame with dplyr::select(), using contains() more than once. I know there are other ways to solve it, but I wonder whether this is possible inside select(). An example:
df <- data.frame(column1= 1:10, col2= 1:10, c3= 1:10)
library(dplyr)
names(select(df, contains("col") & contains("1")))
This gives an error but I would like the function to give "column1".
I expected that select() would allow a similar approach to filter(), where we can set multiple conditions with operators, e.g. filter(df, column1 %in% 1:5 & col2 != 2).
EDIT
I notice that my question is more general: I wonder whether it is possible to pass any combination of conditions to select(), like select(df, contains("1") | !starts_with("c")), and so on. But I can't figure out how to do this.
You can use select_if and grepl
library(dplyr)
df %>%
select_if(grepl("col", names(.)) & grepl(1, names(.)))
# column1
#1 1
#2 2
#3 3
#4 4
#5 5
#6 6
#7 7
#8 8
#9 9
#10 10
If you want to use select with contains you could do something like this:
df %>%
select(intersect(contains("col"), contains("1")))
This can be combined in other ways, as mentioned in the comments:
df %>%
select(intersect(contains("1"), starts_with("c")))
You can also chain two select calls:
library(dplyr)
df <- data.frame(column1 = 1:10, col2 = 1:10, c3 = 1:10)
df %>%
select(contains("col")) %>%
select(contains("1"))
Not the most elegant option for one-liner lovers, though.
You could use the dplyr::intersect function
select(df, intersect(contains("col"), contains("1")))
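Worth noting: in more recent versions of dplyr (>= 1.0.0, with tidyselect >= 1.0.0), selection helpers can be combined directly with logical operators, so the combination the asker originally tried works as written (a sketch, assuming an up-to-date installation):
library(dplyr)
df <- data.frame(column1 = 1:10, col2 = 1:10, c3 = 1:10)

# Boolean combinations of selection helpers are supported directly.
names(select(df, contains("col") & contains("1")))
#[1] "column1"

# Other combinations from the question work the same way.
names(select(df, contains("1") | !starts_with("c")))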

How can I use the gather function to manipulate my data frame? [duplicate]

This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group (6 answers)
I have a data frame as follows:
df <- data.frame(Name = c("a","c","d","b","f","g","h"), group = c(2,1,2,3,1,3,1))
Name group
a 2
c 1
d 2
b 3
f 1
g 3
h 1
I would like to use the gather function from the tidyverse package to reshape my data frame into the following format:
group Name total
1 c,f,h 3
2 a,d 2
3 b,g 2
Do you know how I can do this?
Thanks,
We can group by 'group' and paste the elements of 'Name' with toString, while getting the total number of elements with n()
library(dplyr)
df %>%
group_by(group) %>%
summarise(Name = toString(Name), total = n())
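A base R alternative (a sketch, not from the original answer): aggregate() collapses 'Name' within each group, and table() supplies the per-group counts; both order groups in ascending order, so the columns line up.
# Collapse Name within each group, then attach the per-group counts.
out <- aggregate(Name ~ group, data = df, FUN = toString)
out$total <- as.vector(table(df$group))
out
#  group    Name total
#1     1 c, f, h     3
#2     2    a, d     2
#3     3    b, g     2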

In R, how do you compare two columns with a regex, row-by-row?

I have a data frame (a tibble, actually) df, with two columns, a and b, and I want to filter out the rows in which a is a substring of b. I've tried
df %>%
dplyr::filter(grepl(a,b))
but I get a warning that seems to indicate that R is actually applying grepl with the first argument being the whole column a.
Is there any way to apply a regular expression involving two different columns to each row in a tibble (or data frame)?
If you're only interested in by-row comparisons, you can use rowwise():
df <- data.frame(A=letters[1:5],
B=paste0(letters[3:7],letters[c(2,2,4,3,5)]),
stringsAsFactors=F)
df %>%
rowwise() %>%
filter(grepl(A,B))
A B
1 b db
2 e ge
---------------------------------------------------------------------------------
If you want to know whether each row's entry of A matches anywhere in the whole of column B:
df %>% rowwise() %>% filter(any(grepl(A,df$B)))
A B
1 b db
2 c ed
3 d fc
4 e ge
Or using base R sapply and @Chi-Pak's reproducible example:
df <- data.frame(A=letters[1:5],
B=paste0(letters[3:7],letters[c(2,2,4,3,5)]),
stringsAsFactors=F)
matched <- sapply(1:nrow(df), function(i) grepl(df$A[i], df$B[i]))
df[matched, ]
Result
A B
2 b db
5 e ge
You can use stringr::str_detect, which is vectorised over both string and pattern. (Whereas, as you noted, grepl is only vectorised over its string argument.)
Using @Chi-Pak's example:
library(dplyr)
library(stringr)
df %>%
filter(str_detect(B, fixed(A)))
# A B
# 1 b db
# 2 e ge
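Another base R option (a sketch): since grepl() is not vectorised over its pattern argument, mapply() can apply it pairwise to each (A, B) row.
# Apply grepl() to each (pattern, string) pair, row by row.
df[mapply(grepl, df$A, df$B), ]
#  A  B
#2 b db
#5 e ge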

use dplyr to concatenate a column [duplicate]

This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group (6 answers)
I have a data_frame where I would like vector to be the concatenation of elements in A. So
df <- data_frame(id = c(1, 1, 2, 2), A = c("a", "b", "b", "c"))
df
Source: local data frame [4 x 2]
id A
1 1 a
2 1 b
3 2 b
4 2 c
Should become
newdf
Source: local data frame [4 x 2]
id vector
1 1 "a b"
2 2 "b c"
My first inclination is to use paste() inside summarise but this doesn't work.
df %>% group_by(id) %>% summarise(paste(A))
Error: expecting a single value
Hadley and Romain talk about a similar issue in the GitHub issues, but I can't quite see how that applies directly. It seems like there should be a very simple solution, especially because paste() usually does return a single value.
You need to collapse the values in paste
df %>% group_by(id) %>% summarise(vector=paste(A, collapse=" "))
My data frame was as follows:
col1 col2
1 one
1 one more
2 two
2 two
3 three
I needed to summarise it as follows:
col1 col3
1 one, one more
2 two
3 three
The following code did the trick:
df <- data.frame(col1 = c(1,1,2,2,3), col2 = c("one", "one more", "two", "two", "three"))
df %>%
group_by(col1) %>%
summarise( col3 = toString(unique(col2)))
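For completeness, the same group-and-collapse can also be done with data.table (a sketch, not part of the original answers); it returns one row per col1 value with the unique col2 entries joined by commas.
library(data.table)

# Convert to a data.table and collapse col2 within each col1 group.
dt <- as.data.table(df)
dt[, .(col3 = toString(unique(col2))), by = col1]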
