Extract rows where value appears in any of multiple columns

Extract rows where value appears in any of multiple columns - r

Let' say I have two data.frames
name_df = read.table(text = "player_name
a
b
c
d
e
f
g", header = T)
game_df = read.table(text = "game_id winner_name loser_name
1 a b
2 b a
3 a c
4 a d
5 b c
6 c d
7 d e
8 e f
9 f a
10 g f
11 g a
12 f e
13 a d", header = T)
name_df contains a unique list of all the winner_name or loser_name values in game_df. I want to create a new data.frame that has, for each person in the name_df a row if a given name (e.g. a) appears in either the winner_name or loser_name column
So I essentially want to merge game_df with name_df, but the key column (name) can appear in either winner_name or loser_name.
So, for just a and b the final output would look something like:
final_df = read.table(text = "player_name game_id winner_name loser_name
a 1 a b
a 2 b a
a 3 a c
a 4 a d
a 9 f a
a 11 g a
a 13 a d
b 1 a b
b 2 b a
b 5 b c", header = T)

We can loop over the elements in 'name_df' for 'player_name', filter the rows from 'game_df' for either the 'winner_name' or 'loser_name'
library(dplyr)
library(purrr)
map_dfr(setNames(name_df$player_name, name_df$player_name),
~ game_df %>%
filter(winner_name %in% .x|loser_name %in% .x), .id = 'player_name')
Or if there are many columns, use if_any
map_dfr(setNames(name_df$player_name, name_df$player_name),
~ {
nm1 <- .x
game_df %>%
filter(if_any(c(winner_name, loser_name), ~ . %in% nm1))
}, .id = 'player_name')

Dedicated to our teacher and mentor dear #akrun
I think we can also make use of the add_row() function you first taught me the other day. Unbelievable!!!
library(dplyr)
library(purrr)
library(tibble)
game_df %>%
rowwise() %>%
mutate(player_name = winner_name) %>%
group_split(game_id) %>%
map_dfr(~ add_row(.x, game_id = .x$game_id, winner_name = .x$winner_name,
loser_name = .x$loser_name, player_name = .x$loser_name)) %>%
arrange(player_name) %>%
relocate(player_name)
# A tibble: 26 x 4
player_name game_id winner_name loser_name
<chr> <int> <chr> <chr>
1 a 1 a b
2 a 2 b a
3 a 3 a c
4 a 4 a d
5 a 9 f a
6 a 11 g a
7 a 13 a d
8 b 1 a b
9 b 2 b a
10 b 5 b c
# ... with 16 more rows

This can be directly expressed in SQL:
library(sqldf)
sqldf("select *
from name_df
left join game_df on winner_name = player_name or loser_name = player_name")

Without using purrr. I think this is appropriate use case of tidyr::unite with argument remove = F where we can first unite the winners' and losers' names and then use tidyr::separate_rows to split new column into rows.
library(tidyr)
library(dplyr)
game_df %>% unite(Player_name, winner_name, loser_name, remove = F, sep = ', ') %>%
separate_rows(Player_name) %>%
relocate(Player_name) %>%
arrange(Player_name)
# A tibble: 26 x 4
Player_name game_id winner_name loser_name
<chr> <int> <chr> <chr>
1 a 1 a b
2 a 2 b a
3 a 3 a c
4 a 4 a d
5 a 9 f a
6 a 11 g a
7 a 13 a d
8 b 1 a b
9 b 2 b a
10 b 5 b c
# ... with 16 more rows

A Base R approach :
result <- do.call(rbind, lapply(name_df$player_name, function(x)
cbind(plaername = x,
subset(game_df, winner_name == x | loser_name == x))))
rownames(result) <- NULL
result
# playername game_id winner_name loser_name
#1 a 1 a b
#2 a 2 b a
#3 a 3 a c
#4 a 4 a d
#5 a 9 f a
#6 a 11 g a
#7 a 13 a d
#8 b 1 a b
#...
#...

Related

R: expand grid of all possible combinations within groups and apply functions across all the pairs

data <- tibble(time = c(1,1,2,2), a = c(1,2,3,4), b =c(4,3,2,1), c = c(1,1,1,1))
The result will look like this
result <- tibble(
t = c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2),
firm1 = c("a","a","a","b","b","b","c","c","c","a","a","a","b","b","b","c","c","c"),
firm2 = c("a","b","c","a","b","c","a","b","c","a","b","c","a","b","c","a","b","c"),
value = c(6,10,5,10,14,9,5,9,4,14,10,9,10,6,5,9,5,4))
result
The function could be
function(x, y){sum(x, y)}
Basically I am looking for a tidy solution to expand.grid data at each point of time and apply functions across columns. Can anyone help?
I tried this, but I could not have time in front of the pairs.
expected_result<-expand.grid(names(data[-1]), names(data[-1])) %>%
mutate(value = map2(Var1, Var2, ~ fun1(data[.x], data[.y])))
expected_result

Use exand.grid you get all possible combination of columns, split the data by time and apply fun for each row of tmp.
library(dplyr)
library(purrr)
tmp <- expand.grid(firm1 = names(data[-1]), firm2 = names(data[-1]))
fun <- function(x, y) sum(x, y)
result <- data %>%
group_split(time) %>%
map_df(~cbind(time = .x$time[1], tmp,
value = apply(tmp, 1, function(x) fun(.x[[x[1]]], .x[[x[2]]]))))
result
# time firm1 firm2 value
#1 1 a a 6
#2 1 b a 10
#3 1 c a 5
#4 1 a b 10
#5 1 b b 14
#6 1 c b 9
#7 1 a c 5
#8 1 b c 9
#9 1 c c 4
#10 2 a a 14
#11 2 b a 10
#12 2 c a 9
#13 2 a b 10
#14 2 b b 6
#15 2 c b 5
#16 2 a c 9
#17 2 b c 5
#18 2 c c 4
You may also do this in base R -
result <- do.call(rbind, by(data, data$time, function(x) {
cbind(time = x$time[1], tmp,
value = apply(tmp, 1, function(y) fun(x[[y[1]]], x[[y[2]]])))
}))

We may use
library(dplyr)
library(tidyr)
library(purrr)
data1 <- data %>%
group_by(time) %>%
summarise(across(everything(), sum, na.rm = TRUE), .groups = 'drop') %>%
pivot_longer(cols = -time) %>%
group_split(time)
map_dfr(data1, ~ {dat <- .x
crossing(firm1 = dat$name, firm2 = dat$name) %>%
mutate(value = c(outer(dat$value, dat$value, FUN = `+`))) %>%
mutate(time = first(dat$time), .before = 1)})
-output
# A tibble: 18 × 4
time firm1 firm2 value
<dbl> <chr> <chr> <dbl>
1 1 a a 6
2 1 a b 10
3 1 a c 5
4 1 b a 10
5 1 b b 14
6 1 b c 9
7 1 c a 5
8 1 c b 9
9 1 c c 4
10 2 a a 14
11 2 a b 10
12 2 a c 9
13 2 b a 10
14 2 b b 6
15 2 b c 5
16 2 c a 9
17 2 c b 5
18 2 c c 4

Mutate new column with unique values for each list

I have a list here, and I wish to mutate a new column with unique values for each list relative to the mutation. For example, I want to mutate a column named ID as n >= 1.
Naturally, on a dataframe I would do this:
dat %>% mutate(id = row_number())
For a list, I would do this:
dat%>% map(~ mutate(., ID = row_number()))
And I would get an output likeso:
dat <- list(data.frame(x=c("a", "b" ,"c", "d", "e" ,"f" ,"g") ), data.frame(y=c("p", "lk", "n", "m", "g", "f", "t")))
[[1]]
x id
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
7 g 7
[[2]]
y id
1 p 1
2 lk 2
3 n 3
4 m 4
5 g 5
6 f 6
7 t 7
Though, how would I mutate a new column ID such that the row number continues from the first list.
Expected output:
[[1]]
x id
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
7 g 7
[[2]]
y id
1 p 8
2 lk 9
3 n 10
4 m 11
5 g 12
6 f 13
7 t 14

An option is to bind them into a single dataset, create the 'id' with row_number(), split by 'grp', loop over the list and remove any columns that have all NA values
library(dplyr)
library(purrr)
dat %>%
bind_rows(.id = 'grp') %>%
mutate(id = row_number()) %>%
group_split(grp) %>%
map(~ .x %>%
select(where(~ any(!is.na(.))), -grp))
-output
#[[1]]
# A tibble: 7 x 2
# x id
# <chr> <int>
#1 a 1
#2 b 2
#3 c 3
#4 d 4
#5 e 5
#6 f 6
#7 g 7
#[[2]]
# A tibble: 7 x 2
# y id
# <chr> <int>
#1 p 8
#2 lk 9
#3 n 10
#4 m 11
#5 g 12
#6 f 13
#7 t 14
Or an easier approach is to unlist (assuming single column), get the sequence, add a new column with map2
map2(dat, relist(seq_along(unlist(dat)), skeleton = dat),
~ .x %>% mutate(id = .y))
Or using a for loop
dat[[1]]$id <- seq_len(nrow(dat[[1]]))
for(i in seq_along(dat)[-1]) dat[[i]]$id <-
seq(tail(dat[[i-1]]$id, 1) + 1, length.out = nrow(dat[[i]]), by = 1)

R slide window through tibble

I got a simple question that I cannot figure out solutions.
Also, I didn't find an answer that I understand.
Imagine I got this data frame
(ts <- tibble(
+ a = LETTERS[1:10],
+ b = c(rep(1, 5), rep(2,5))
+ ))
# A tibble: 10 x 2
a b
<chr> <dbl>
1 A 1
2 B 1
3 C 1
4 D 1
5 E 1
6 F 2
7 G 2
8 H 2
9 I 2
10 J 2
What I want is simple. I want to build a df with the column b indexing a sliding window which sizes n f the column a.
The output can be something like this:
# A tibble: 8 x 2
b a
<dbl> <chr>
1 1 A B
2 1 B C
3 1 C D
4 1 D E
5 2 F G
6 2 G H
7 2 H I
8 2 I J
I don't care if the column a contains an array (nest values).
I just need a new data frame based on the sliding window.
Since this operation will run in a relational database I'd like a function compatible with DBI-PostgresSQL.
Any help is appreciated.
Thanks in advance

We can group by 'b', create the new column based on the lead of 'a', remove the NA rows with na.omit
library(dplyr)
ts %>%
group_by(b) %>%
mutate(a2 = lead(a)) %>%
ungroup %>%
na.omit %>%
select(b, everything())
# A tibble: 8 x 3
# b a a2
# <dbl> <chr> <chr>
#1 1 A B
#2 1 B C
#3 1 C D
#4 1 D E
#5 2 F G
#6 2 G H
#7 2 H I
#8 2 I J
If lead doesn't works, then just remove the first element, append NA at the end in the mutate step
ts %>%
group_by(b) %>%
mutate(a2 = c(a[-1], NA)) %>%
ungroup %>%
na.omit %>%
select(b, everything())

r create new data frame that matches in rows elements grouped by another column

I want to create a new data frame from the df one below. In the new data frame (df2), each element in df$name is placed in the first column and matched in its row with other element of df$name grouped by df$group.
df <- data.frame(group = rep(letters[1:2], each=3),
name = LETTERS[1:6])
> df
group name
1 a A
2 a B
3 a C
4 b D
5 b E
6 b F
In this example, "A", "B", and "C" in df$name belong to "a" in df$group, and I want to put them in the same row in a new data frame. The desired output looks like this:
> df2
V1 V2
1 A B
2 A C
3 B A
4 B C
5 C A
6 C B
7 D E
8 D F
9 E D
10 E F
11 F D
12 F E

We could do this in base R with merge
out <- setNames(subset(merge(df, df, by.x = 'group', by.y = 'group'),
name.x != name.y, select = -group), c("V1", "V2"))
row.names(out) <- NULL
out
# V1 V2
#1 A B
#2 A C
#3 B A
#4 B C
#5 C A
#6 C B
#7 D E
#8 D F
#9 E D
#10 E F
#11 F D
#12 F E

In my opinion its case of self-join. Using dplyr a solution can be as:
library(dplyr)
inner_join(df, df, by="group") %>%
filter(name.x != name.y) %>%
select(V1 = name.x, V2 = name.y)
# V1 V2
# 1 A B
# 2 A C
# 3 B A
# 4 B C
# 5 C A
# 6 C B
# 7 D E
# 8 D F
# 9 E D
# 10 E F
# 11 F D
# 12 F E

df <- data.frame(group = rep(letters[1:2], each=3),
name = LETTERS[1:6])
library(tidyverse)
df %>%
group_by(group) %>% # for every group
summarise(v = list(expand.grid(V1=name, V2=name))) %>% # create all combinations of names
select(v) %>% # keep only the combinations
unnest(v) %>% # unnest combinations
filter(V1 != V2) # exclude rows with same names
# # A tibble: 12 x 2
# V1 V2
# <fct> <fct>
# 1 B A
# 2 C A
# 3 A B
# 4 C B
# 5 A C
# 6 B C
# 7 E D
# 8 F D
# 9 D E
# 10 F E
# 11 D F
# 12 E F

Tidyverse friendly hack on combn

Problem is simple and in many other posts, but I haven't found satisfactory answer.
Say you have a tibble with one column of labels (here letters) and other values in other columns (here just one 'value').
data <- tibble(letter = letters[1:5], value = 1:5)
Now what you want is generate all the pairs without permutations and keep the value attached to each of the pair element. Here's the solution I have and which I believe is valid but...inelegant.
combn(data$letter, m = 2) %>%
t() %>%
as_tibble() %>%
rename(letter_1 = V1, letter_2 = V2) %>%
left_join(data, by = c("letter_1" = "letter")) %>%
left_join(data, by = c("letter_2" = "letter"), suffix = c("_1", "_2"))
Which outputs the desired result:
# A tibble: 10 x 4
letter_1 letter_2 value_1 value_2
<chr> <chr> <int> <int>
1 a b 1 2
2 a c 1 3
3 a d 1 4
4 a e 1 5
5 b c 2 3
6 b d 2 4
7 b e 2 5
8 c d 3 4
9 c e 3 5
10 d e 4 5
I'm really looking for a tidyverse approach. I'm a fan boy :)
Thank you in advance for any help.

Here is a tidyverse solution using expand (instead of combn):
data %>%
expand(letter_1 = letter, letter_2 = letter) %>%
mutate(
value_1 = match(letter_1, letters),
value_2 = match(letter_2, letters)) %>%
filter(letter_1 != letter_2) %>%
rowwise() %>%
mutate(id = paste0(sort(c(letter_1, letter_2)), collapse = " ")) %>%
distinct(id, .keep_all = TRUE) %>%
select(-id)
## A tibble: 15 x 4
# letter_1 letter_2 value_1 value_2
# <chr> <chr> <int> <int>
# 2 a b 1 2
# 3 a c 1 3
# 4 a d 1 4
# 5 a e 1 5
# 7 b c 2 3
# 8 b d 2 4
# 9 b e 2 5
#11 c d 3 4
#12 c e 3 5
#13 d d 4 4
#14 d e 4 5

One option could be using combn as:
data <- tibble(letter = letters[1:5], value = 1:5)
res <- cbind(data.frame(t(combn(data$letter, 2))), data.frame(t(combn(data$value, 2))))
names(res) <- c("letter_1", "letter_2", "value_1", "value_2")
res
# letter_1 letter_2 value_1 value_2
# 1 a b 1 2
# 2 a c 1 3
# 3 a d 1 4
# 4 a e 1 5
# 5 b c 2 3
# 6 b d 2 4
# 7 b e 2 5
# 8 c d 3 4
# 9 c e 3 5
# 10 d e 4 5

I find the rowwise() function to work inconsistently in my machine. You might want to try map() functions in the purrr pacakge.
Here's a way to implement this:
library(purrr)
data %>%
expand(letter_1 = letter, letter_2 = letter) %>%
mutate(
value_1 = match(letter_1, letters),
value_2 = match(letter_2, letters)) %>%
filter(letter_1 != letter_2) %>%
mutate(
id = map2_chr(letter_1, letter_2, function(x, y) {
paste(sort(c(x, y)), collapse = " ")
})
) %>%
distinct(id, .keep_all = TRUE) %>%
select(-id)
# # A tibble: 10 x 4
# letter_1 letter_2 value_1 value_2
# <chr> <chr> <int> <int>
# 1 a b 1 2
# 2 a c 1 3
# 3 a d 1 4
# 4 a e 1 5
# 5 b c 2 3
# 6 b d 2 4
# 7 b e 2 5
# 8 c d 3 4
# 9 c e 3 5
# 10 d e 4 5

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Extract rows where value appears in any of multiple columns - r

This can be directly expressed in SQL: library(sqldf) sqldf("select * from name_df left join game_df on winner_name = player_name or loser_name = player_name")

Related

R: expand grid of all possible combinations within groups and apply functions across all the pairs

Mutate new column with unique values for each list

R slide window through tibble

r create new data frame that matches in rows elements grouped by another column

Tidyverse friendly hack on combn

Categories

Resources