Tidyverse: group_by, arrange, and lag across columns - r

I am working on a projection model for sports where, for a given team's most recent game, I need to know:
Who is their next opponent? (solved)
When did their next opponent last play?
A reprex is below. Using row 1 as an example, I need to determine that the most recent game of "e" (the next opponent of "a") was game_id_ 3.
game_id_ <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
game_date_ <- c(rep("2021-01-29", 6), rep("2021-01-30", 6))
team_ <- c("a", "b", "c", "d", "e", "f", "b", "c", "d", "f", "e", "a")
opp_ <- c("b", "a", "d", "c", "f", "e", "c", "b", "f", "d", "a", "e")
df <- data.frame(game_id_, game_date_, team_, opp_)
library(dplyr)
# Next opponent
df <- df %>%
  arrange(game_date_, game_id_, team_) %>%
  group_by(team_) %>%
  mutate(next_opp = lead(opp_, n = 1L))
If I can provide more details, please let me know.

We can use match to return the corresponding game_id_:
library(dplyr)
df %>%
  arrange(game_date_, game_id_, team_) %>%
  group_by(team_) %>%
  mutate(next_opp = lead(opp_, n = 1L)) %>%
  ungroup() %>%
  mutate(last_time = game_id_[match(next_opp, opp_)])
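One caveat on the match call above: match returns the position of the first match only, so it picks the first game in the table in which anyone faced next_opp. In the reprex that coincides with the opponent's most recent game because every team has played just once before the upcoming meeting. With longer schedules, a sketch along the following lines may be closer to what the question asks for. It is written in base R, and the helpers lag1 and lead1 and the columns prev_game, next_game and last_time are names made up for this example. The idea is to look up the opponent's own row for the upcoming game and read that row's previous game.

```r
# Base R sketch; lag1, lead1, prev_game, next_game and last_time are
# hypothetical names introduced for this example.
game_id_   <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
game_date_ <- c(rep("2021-01-29", 6), rep("2021-01-30", 6))
team_      <- c("a", "b", "c", "d", "e", "f", "b", "c", "d", "f", "e", "a")
opp_       <- c("b", "a", "d", "c", "f", "e", "c", "b", "f", "d", "a", "e")
df <- data.frame(game_id_, game_date_, team_, opp_, stringsAsFactors = FALSE)
df <- df[order(df$game_date_, df$game_id_, df$team_), ]

lag1  <- function(x) c(NA, head(x, -1))   # previous value, NA for the first
lead1 <- function(x) c(tail(x, -1), NA)   # next value, NA for the last

# Per team, in schedule order: previous game, next game, next opponent
df$prev_game <- ave(df$game_id_, df$team_, FUN = lag1)
df$next_game <- ave(df$game_id_, df$team_, FUN = lead1)
df$next_opp  <- ave(df$opp_,     df$team_, FUN = lead1)

# Find the opponent's row for the upcoming game (team + game as a key),
# then take the game that opponent played just before it.
key    <- paste(df$team_, df$game_id_)
lookup <- paste(df$next_opp, df$next_game)
df$last_time <- df$prev_game[match(lookup, key)]

df[, c("game_id_", "team_", "next_opp", "last_time")]
```

For row 1 this gives last_time 3, as in the question. Teams with no further game scheduled get NA.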

Related

Adding extra track to outside of circos plot (circlize, chordDiagram)

I'm trying to recreate this figure below, where the "to" variable (i.e. target genes) is further grouped into outer (labelled) categories (i.e. receptors).
I have generated some example data, unfortunately I'm not sure what format is needed for the additional outer categories, but it's possibly not far off the link format.
library(circlize)
links <- data.frame(from = c("A", "B", "C", "B", "C"),
                    to = c("D", "E", "F", "D", "E"),
                    value = c(1, 1, 1, 1, 1))
categories <- data.frame(from = c("D", "E", "F", "D", "E"),
                         to = c("X", "X", "Y", "Y", "Y"),
                         value = c(1, 1, 1, 1, 1))
chordDiagram(links)
Any assistance greatly appreciated!

How to find the similarity in R?

I have a data set as I've shown below:
It shows which book is sold by which shop.
df <- tribble(
~shop, ~book_id,
"A", 1,
"B", 1,
"C", 2,
"D", 3,
"E", 3,
"A", 3,
"B", 4,
"C", 5,
"D", 1,
)
In the data set,
shop A sells 1, 3
shop B sells 1, 4
shop C sells 2, 5
shop D sells 3, 1
shop E sells only 3
Now I want to calculate the Jaccard index for each pair of shops. For instance, take shop A and shop B. Between them they sell three different books (book 1, book 3, book 4), but only one book is sold by both shops (book 1). So the Jaccard index here should be 33.3% (1/3).
Here is the sample of the desired data:
df <- tribble(
  ~shop_1, ~shop_2, ~similarity,
  "A", "B", 33.3,
  "B", "A", 33.3,
  "A", "C", 0,
  "C", "A", 0,
  "A", "D", 100,
  "D", "A", 100,
  "A", "E", 50,
  "E", "A", 50,
)
Any comments/assistance really appreciated! Thanks in advance.
I don't know about a package but you can write your own function. I guess by similarity you mean something like this:
similarity <- function(x, y) {
  k <- length(intersect(x, y))
  n <- length(union(x, y))
  k / n
}
Then you can use tidyr::crossing to combine the data frame with itself, giving every pair of shops:
library(dplyr)
library(tidyr)
library(purrr)
dfg <- df %>% group_by(shop) %>% summarise(books = list(book_id))
crossing(dfg %>% set_names(paste0, "_A"), dfg %>% set_names(paste0, "_B")) %>%
  filter(shop_A != shop_B) %>%
  mutate(similarity = map2_dbl(books_A, books_B, similarity))
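Note that the similarity function above returns a fraction between 0 and 1, while the desired output in the question is in percent. As a quick cross-check of the expected numbers, here is a minimal base R sketch on the same data. The names jaccard, books and pairs are made up for this example, and the result is multiplied by 100 to match the percentages in the question.

```r
# Base R cross-check; jaccard, books and pairs are hypothetical names.
shop    <- c("A", "B", "C", "D", "E", "A", "B", "C", "D")
book_id <- c(1, 1, 2, 3, 3, 3, 4, 5, 1)

# One vector of book ids per shop
books <- split(book_id, shop)

jaccard <- function(x, y) length(intersect(x, y)) / length(union(x, y))

# All ordered pairs of distinct shops
pairs <- expand.grid(shop_1 = names(books), shop_2 = names(books),
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$shop_1 != pairs$shop_2, ]

# Jaccard index in percent for each pair
pairs$similarity <- 100 * mapply(
  function(a, b) jaccard(books[[a]], books[[b]]),
  pairs$shop_1, pairs$shop_2
)

pairs[pairs$shop_1 == "A", ]
```

For shop A this should reproduce the values in the question: about 33.3 with B, 0 with C, 100 with D and 50 with E.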

Find the overlap of two datasets

I have two different datasets as I've shown below: df_A and df_B.
df_A <- tribble(
~book_name, ~sales_id,
"A", 1,
"B", 2,
"C", 3,
"D", 4,
"E", 5,
"F", 3,
"G", 8,
"H", 6,
"I", 7,
"J", 7,
)
df_B <- tribble(
~book_name, ~sales_id,
"A", 1,
"N", 2,
"C", 3,
"E", 4,
"K", 5,
"R", 3,
"S", 8,
"U", 6,
"Z", 7,
"Y", 7,
)
Now I want to see the overlap of these two datasets on book_name. That is, I want a list of the book_name values that appear in both datasets, and a measure of how similar the two datasets are according to the book_name column.
Is there an accurate way to do this?
You can do an inner join between the two data frames, which keeps only the rows whose book_name appears in both, i.e. their intersection.
This should do the trick:
library(dplyr)
# Creating first data frame
df_A <- tribble(
~book_name, ~sales_id,
"A", 1,
"B", 2,
"C", 3,
"D", 4,
"E", 5,
"F", 3,
"G", 8,
"H", 6,
"I", 7,
"J", 7,
)
# Creating second data frame
df_B <- tribble(
~book_name, ~sales_id,
"A", 1,
"N", 2,
"C", 3,
"E", 4,
"K", 5,
"R", 3,
"S", 8,
"U", 6,
"Z", 7,
"Y", 7,
)
# Join the two data frames to keep only the book names common to both
result <-
  df_A %>%
  inner_join(df_B, by = "book_name")
Here is a base R solution using intersect():
overlap <- subset(df_A, book_name %in% intersect(book_name, df_B$book_name))
which gives
> overlap
# A tibble: 3 x 2
  book_name sales_id
  <chr>        <dbl>
1 A                1
2 C                3
3 E                5
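Both answers return the common book names but not the "how similar" part of the question. One possible measure, offered here as an assumption about what is wanted, is the Jaccard index of the two book_name columns: the number of shared names divided by the number of distinct names overall. A minimal base R sketch, using plain vectors in place of the book_name columns (book_A, book_B, common and jaccard are names made up for this example):

```r
# Stand-ins for df_A$book_name and df_B$book_name
book_A <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
book_B <- c("A", "N", "C", "E", "K", "R", "S", "U", "Z", "Y")

# Names present in both datasets
common <- intersect(book_A, book_B)

# Jaccard index: shared names over all distinct names
jaccard <- length(common) / length(union(book_A, book_B))

# Share of df_A's book names that also appear in df_B
share_of_A <- mean(book_A %in% book_B)

common
jaccard
share_of_A
```

Here three names (A, C, E) are shared out of 17 distinct names, so the Jaccard index is 3/17, roughly 0.18, and 30% of the names in df_A also occur in df_B.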

R graph reorder a factor by levels for only a specified level

I am trying to create a graph where the x axis (a factor) is reordered by descending order of the y axis (numerical values), but only for one of two levels of another factor.
Originally, I tried using the code below:
reorder(factor1, desc(value1))
However, this code only reorganizes the graph (in a descending order) by the sum of the two values under each factor2 (I presume); while I am only interested in reorganizing the data for one level (i.e. "A") under factor2.
Here is some sample data to illustrate better.
sampledata <- data.frame(factor1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E",
                                     "F", "F", "G", "G", "H", "H", "I", "I", "J", "J"),
                         factor2 = c("A", "H", "A", "H", "A", "H", "A", "H", "A", "H",
                                     "A", "H", "A", "H", "A", "H", "A", "H", "A", "H"),
                         value1 = c(1, 5, 6, 2, 6, 8, 10, 21, 30, 5,
                                    3, 5, 4, 50, 4, 7, 15, 48, 20, 21))
Here is what I used previously:
sampledata %>%
ggplot(aes(x=reorder(factor1, desc(value1)), y=value1, group=factor2, color=factor2)) +
geom_point()
The reason I would like to reorder by a specific level (say factor2=="A") is so that I can see how far the values for factor2=="H" deviate from the "A" points.
I would prefer a tidyverse or dplyr solution.
library(ggplot2)
library(dplyr)
sampledata %>%
  mutate(value2 = +(factor2=="A")*value1) %>%
  ggplot(aes(x=reorder(factor1, desc(value2 + value1/max(value1))), y=value1,
             group=factor2, color=factor2)) +
  geom_point() +
  xlab("factor1")
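An alternative that may be easier to read is to set the levels of factor1 explicitly, taking their order from the factor2 == "A" rows only. This is a minimal base R sketch of that idea rather than the method used in the answer above. The names ref and lvls are made up for this example, and ties in value1 keep their original order.

```r
# Same data as in the question, built with rep() for brevity
sampledata <- data.frame(
  factor1 = rep(LETTERS[1:10], each = 2),
  factor2 = rep(c("A", "H"), times = 10),
  value1  = c(1, 5, 6, 2, 6, 8, 10, 21, 30, 5,
              3, 5, 4, 50, 4, 7, 15, 48, 20, 21),
  stringsAsFactors = FALSE
)

# Rows for the reference level only
ref <- sampledata[sampledata$factor2 == "A", ]

# factor1 values sorted by descending value1 within the reference level
lvls <- ref$factor1[order(-ref$value1)]

# Apply that order to the whole data set
sampledata$factor1 <- factor(sampledata$factor1, levels = lvls)

levels(sampledata$factor1)
```

With the levels set this way, a plain aes(x = factor1, y = value1, color = factor2) in ggplot should draw the x axis in that order without needing reorder().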

Render multiple transition plots on one page (Gmisc)

I wonder if there is a way to arrange multiple of the nice transition plots of the Gmisc package on one page (e.g. two next to each other or two-by-two)? I tried various common approaches (e.g. par(mfrow = c(2,2)) and grid.arrange()) but was not successful thus far. I would appreciate any help. Thanks!
library(Gmisc)
data.1 <- data.frame(source = c("A", "A", "A", "B", "B", "C", "C"),
target = c("A", "B", "C", "B", "C", "C", "C"))
data.2 <- data.frame(source = c("D", "D", "E", "E", "E", "E", "F"),
target = c("D", "E", "D", "E", "F", "F", "F"))
transitions.1 <- getRefClass("Transition")$new(table(data.1$source, data.1$target), label = c("Before", "After"))
transitions.2 <- getRefClass("Transition")$new(table(data.2$source, data.2$target), label = c("Before", "After"))
# wish to render transition 1 and transition 2 next to each other
transitions.1$render()
transitions.2$render()
This was actually a bug prior to the 1.9 version (uploading to CRAN when writing this, available now from GitHub). What you need to do is use the grid::viewport system:
library(grid)
grid.newpage()
pushViewport(viewport(name = "basevp", layout = grid.layout(nrow=1, ncol=2)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
transitions.1$render(new_page = FALSE)
popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
transitions.2$render(new_page = FALSE)
popViewport(2)  # leave the cell viewport and the base layout viewport
