R graph: reorder a factor by the values of only one level of another factor

I am trying to create a graph where the x axis (a factor) is reordered by descending order of the y axis (numerical values), but only for one of two levels of another factor.
Originally, I tried using the code below:
reorder(factor1, desc(value1))
However, this orders the x axis by the (descending) mean of the two values for each factor1 level, one per factor2 level, because reorder() uses FUN = mean by default. I only want the order to follow the values for one level of factor2 (i.e. "A").
Here is some sample data to illustrate better.
sampledata <- data.frame(factor1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E",
"F", "F", "G", "G", "H", "H", "I", "I", "J", "J"),
factor2 = c("A", "H", "A", "H", "A", "H", "A", "H", "A", "H",
"A", "H", "A", "H", "A", "H", "A", "H", "A", "H"),
value1 = c(1, 5, 6, 2, 6, 8, 10, 21, 30, 5,
3, 5, 4, 50, 4, 7, 15, 48, 20, 21))
Here is what I used previously:
library(ggplot2)
library(dplyr)
sampledata %>%
  ggplot(aes(x = reorder(factor1, desc(value1)), y = value1, group = factor2, color = factor2)) +
  geom_point()
I want to order by a specific level (say factor2 == "A") so that I can see how far the factor2 == "H" values deviate from the "A" points.
I would prefer a solution using the tidyverse or dplyr.

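One approach is to add a helper column, value2, that equals value1 for the factor2 == "A" rows and 0 otherwise, and reorder by it. The added value1/max(value1) term is at most 1, so for integer-valued data like this it mostly acts as a tie-breaker:
library(ggplot2)
library(dplyr)
sampledata %>%
  mutate(value2 = (factor2 == "A") * value1) %>%
  ggplot(aes(x = reorder(factor1, desc(value2 + value1/max(value1))), y = value1,
             group = factor2, color = factor2)) +
  geom_point() +
  xlab("factor1")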
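A possibly simpler sketch that avoids the tie-breaking term: pass reorder() a helper vector holding -value1 for the factor2 == "A" rows and NA for the rest, and forward na.rm = TRUE to mean() through reorder()'s `...`, so each factor1 level is scored by its "A" value alone. The column name factor1_by_A is introduced here only for illustration.

```r
library(ggplot2)

sampledata <- data.frame(factor1 = c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E",
                                     "F", "F", "G", "G", "H", "H", "I", "I", "J", "J"),
                         factor2 = rep(c("A", "H"), 10),
                         value1 = c(1, 5, 6, 2, 6, 8, 10, 21, 30, 5,
                                    3, 5, 4, 50, 4, 7, 15, 48, 20, 21))

# Score each factor1 level by its factor2 == "A" value only:
# "H" rows become NA and are dropped by mean(na.rm = TRUE),
# and negating the values turns reorder()'s ascending order into descending.
sampledata$factor1_by_A <- with(sampledata,
  reorder(factor1, ifelse(factor2 == "A", -value1, NA), FUN = mean, na.rm = TRUE))

p <- ggplot(sampledata, aes(x = factor1_by_A, y = value1, colour = factor2)) +
  geom_point() +
  xlab("factor1")
print(p)
```

Because the order is computed from the "A" rows alone, the "H" points can then be read as deviations from a monotonically decreasing "A" series.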

Related

Creating a treemap, based on count, using R

I would like to create a treemap based on how often each value occurs in names, but I am not sure how to do it. I would appreciate your help.
library(ggplot2)
library(treemapify)
library(dplyr)
names <- c("A", "B", "B", "C", "D", "A", "A", "A", "A", "G", "B", "F", "F", "H")
names <- names %>% as.factor()
ggplot(names, aes(area = names, fill = names)) + geom_treemap()
Many thanks
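Put the vector in a data frame and count the occurrences first; geom_treemap() (from the treemapify package) can then use the count column n as the area:
library(ggplot2)
library(treemapify)
library(dplyr)
names <- c("A", "B", "B", "C", "D", "A", "A", "A", "A", "G", "B", "F", "F", "H")
names <- data.frame(names)
names <- names %>%
  count(names)
ggplot(names, aes(area = n, fill = names)) + geom_treemap()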
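A minimal variant, assuming only ggplot2 and treemapify are installed: base R's table() can do the counting instead of dplyr::count(), and geom_treemap_text() adds a label to each tile. Note that as.data.frame(table(...)) names the count column Freq rather than n.

```r
library(ggplot2)
library(treemapify)

names <- c("A", "B", "B", "C", "D", "A", "A", "A", "A", "G", "B", "F", "F", "H")

# One row per distinct value; table() stores the counts in a column called Freq
counts <- as.data.frame(table(names), stringsAsFactors = FALSE)

p <- ggplot(counts, aes(area = Freq, fill = names, label = names)) +
  geom_treemap() +
  geom_treemap_text(place = "centre")  # write each value inside its tile
print(p)
```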

Adding extra track to outside of circos plot (circlize, chordDiagram)

I'm trying to recreate a figure (not reproduced here) in which the "to" variable (i.e. the target genes) is further grouped into labelled outer categories (i.e. receptors).
I have generated some example data. Unfortunately, I'm not sure what format the additional outer categories need, but it is possibly not far off the links format.
library(circlize)
links <- data.frame(from = c("A", "B", "C", "B", "C"),
to = c("D", "E", "F", "D", "E"),
value = c(1, 1, 1, 1, 1))
categories <- data.frame(from = c("D", "E", "F", "D", "E"),
to = c("X", "X", "Y", "Y", "Y"),
value = c(1, 1, 1, 1, 1))
chordDiagram(links)
Any assistance greatly appreciated!
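One possible sketch, not a confirmed answer: reserve an empty outermost track with chordDiagram()'s preAllocateTracks argument, then shade and label each category's sectors on that track with highlight.sector(). This assumes every target gene belongs to exactly one category; the question's categories table places D and E in both X and Y, which a single contiguous outer band cannot show, so the gene_group mapping below is hypothetical.

```r
library(circlize)

links <- data.frame(from = c("A", "B", "C", "B", "C"),
                    to = c("D", "E", "F", "D", "E"),
                    value = c(1, 1, 1, 1, 1))

# Hypothetical one-category-per-gene mapping (assumption, see above)
gene_group <- c(D = "X", E = "X", F = "Y")
grp_col <- c(X = "steelblue", Y = "orange")

# Pre-allocate one empty track; it becomes track 1, outside the
# default name/grid/axis annotation tracks of the chord diagram
chordDiagram(links, preAllocateTracks = list(track.height = 0.1))

# Shade the sectors of each category on the reserved track and label the band
for (g in unique(gene_group)) {
  highlight.sector(names(gene_group)[gene_group == g], track.index = 1,
                   col = grp_col[[g]], text = g)
}

sectors <- get.all.sector.index()  # keep the sector names before resetting
circos.clear()                     # reset circlize's global state
```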

Tidyverse: group_by, arrange, and lag across columns

I am working on a sports projection model where, for a given team's most recent game, I need to know:
Who is their next opponent? (solved)
When did their next opponent last play?
A reprex is below. Using row 1 as an example, I need to determine that the next opponent of "a" is "e", and that "e"'s most recent game was game_id_ 3.
game_id_ <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
game_date_ <- c(rep("2021-01-29", 6), rep("2021-01-30", 6))
team_ <- c("a", "b", "c", "d", "e", "f", "b", "c", "d", "f", "e", "a")
opp_ <- c("b", "a", "d", "c", "f", "e", "c", "b", "f", "d", "a", "e")
df <- data.frame(game_id_, game_date_, team_, opp_)
library(dplyr)
# Next opponent
df <- df %>%
  arrange(game_date_, game_id_, team_) %>%
  group_by(team_) %>%
  mutate(next_opp = lead(opp_, n = 1L))
If I can provide more details, please let me know.
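We can use match() to return the game_id_ of a row in which next_opp appears as an opponent. match() returns the first such row, i.e. the earliest matching game in the arranged data, which gives the expected result for this reprex: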
library(dplyr)
df %>%
  arrange(game_date_, game_id_, team_) %>%
  group_by(team_) %>%
  mutate(next_opp = lead(opp_, n = 1L)) %>%
  ungroup() %>%
  mutate(last_time = game_id_[match(next_opp, opp_)])
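When teams have several earlier games, "most recent" may need to be computed explicitly rather than via the first match. A sketch under the assumption that game_id_ increases over time: also keep the id of the upcoming meeting (next_game, a column introduced here), then take the next opponent's largest game_id_ strictly before that meeting.

```r
library(dplyr)

game_id_ <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
game_date_ <- c(rep("2021-01-29", 6), rep("2021-01-30", 6))
team_ <- c("a", "b", "c", "d", "e", "f", "b", "c", "d", "f", "e", "a")
opp_ <- c("b", "a", "d", "c", "f", "e", "c", "b", "f", "d", "a", "e")
df <- data.frame(game_id_, game_date_, team_, opp_)

# For each team, record the next opponent and the id of the game against them
df2 <- df %>%
  arrange(game_date_, game_id_, team_) %>%
  group_by(team_) %>%
  mutate(next_opp = lead(opp_), next_game = lead(game_id_)) %>%
  ungroup()

# Assumption: larger game_id_ means later game, so the opponent's most recent
# game is its largest game_id_ strictly before the upcoming meeting
df2$last_time <- mapply(function(opp, nxt) {
  if (is.na(opp)) return(NA_real_)  # no next game for this team
  prior <- df2$game_id_[df2$team_ == opp & df2$game_id_ < nxt]
  if (length(prior) == 0) NA_real_ else max(prior)
}, df2$next_opp, df2$next_game, USE.NAMES = FALSE)

print(df2)
```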

Efficient way to use geom_boxplot with specified quantiles and long data

I have a dataset with calculated quantiles for each department and country. It looks like this:
df <- structure(list(quantile = c("p5", "p25", "p50", "p75", "p95",
"p5", "p25", "p50", "p75", "p95", "p5", "p25", "p50", "p75",
"p95", "p5", "p25", "p50", "p75", "p95"), value = c(6, 12, 20,
33, 61, 6, 14, 23, 38, 63, 7, 12, 17, 26, 50, 7, 12, 18, 26,
51), country = c("A", "A", "A", "A", "A", "B", "B", "B", "B",
"B", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"), dep = c("D",
"D", "D", "D", "D", "D", "D", "D", "D", "D", "I", "I", "I", "I",
"I", "I", "I", "I", "I", "I"), kpi = c("F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F")), row.names = c(NA, -20L), class = c("tbl_df", "tbl",
"data.frame"))
Now I would like to build a boxplot for each department that compares countries, using p5/p95 instead of min/max for the whisker ends. It should look like the plot in the question referenced below, but without outliers (with countries in place of Train_number).
The code for that plot (from the question "ggplot2, geom_boxplot with custom quantiles and outliers") is:
ggplot(MyData, aes(factor(Stations), Arrival_Lateness,
fill = factor(Train_number))) +
stat_summary(fun.data = f, geom="boxplot",
position=position_dodge(1))+
stat_summary(aes(color=factor(Train_number)),fun.y = q, geom="point",
position=position_dodge(1))
I tried to adapt the code above and the answers to that question, but I don't know how to pass the necessary values from the quantile and value columns to ggplot(). Is there a stat_summary() argument I have missed, or is there another simple solution?
From the data you provided, you can generate the following plot:
library(ggplot2)
f <- function(x) {
r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
r
}
ggplot(df, aes(factor(dep), value)) +
stat_summary(fun.data = f, geom="boxplot",
position=position_dodge(1))+
facet_grid(.~country, scales="free")
I don't know whether it is correct or not.
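Because df already holds the quantiles, recomputing quantiles of those five numbers with stat_summary() moves the whisker ends (for example, quantile(c(6, 12, 20, 33, 61), 0.95) is 55.4, not the stored 61). A sketch of an alternative, assuming tidyr is available: reshape to one row per country and department with pivot_wider(), then hand the stored values straight to geom_boxplot(stat = "identity") through the ymin/lower/middle/upper/ymax aesthetics. No outliers are drawn because none are supplied.

```r
library(ggplot2)
library(tidyr)

# Same data as in the question, written more compactly
df <- data.frame(
  quantile = rep(c("p5", "p25", "p50", "p75", "p95"), 4),
  value = c(6, 12, 20, 33, 61, 6, 14, 23, 38, 63,
            7, 12, 17, 26, 50, 7, 12, 18, 26, 51),
  country = rep(rep(c("A", "B"), each = 5), 2),
  dep = rep(c("D", "I"), each = 10),
  kpi = "F"
)

# One row per country/department, with columns p5, p25, p50, p75, p95
wide <- pivot_wider(df, names_from = quantile, values_from = value)

# stat = "identity" draws the boxes from the supplied values instead of computing them
p <- ggplot(wide, aes(x = dep, fill = country)) +
  geom_boxplot(aes(ymin = p5, lower = p25, middle = p50, upper = p75, ymax = p95),
               stat = "identity")
print(p)
```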

How to find the pattern subgraphs in original graph?

I have a graph in which the complete subgraph pattern (three vertices all connected to each other in both directions) occurs twice: A<->B<->C and E<->D<->F. I found the matching subgraphs and took the 1st and 7th ones from the resulting list of igraph objects.
library(igraph)
el <- matrix( c("A", "B",
"A", "C",
"B", "A",
"B", "C",
"C", "A",
"C", "B",
"C", "E",
"E", "D",
"E", "F",
"D", "E",
"D", "F",
"F", "E",
"F", "D"),
nc = 2, byrow = TRUE)
graph <- graph_from_edgelist(el)
# Store the original vertex ids before extracting subgraphs, so that the subgraphs inherit them
V(graph)$id <- seq_len(vcount(graph))
pattern <- graph.isocreate(size=3, number = 15, directed=TRUE)
iso <- subgraph_isomorphisms(pattern, graph)
motifs <- lapply(iso, function (x) { induced_subgraph(graph, x) })
V(graph)$color <- "white"
par(mfrow=c(1,2))
plot(graph, edge.curved=TRUE, main="Original graph")
m1 <- V(motifs[[1]])$id; m2 <- V(motifs[[7]])$id
V(graph)[m1]$color="red"; V(graph)[m2]$color="green"
plot(graph, edge.curved=TRUE, main="Highlight graph")
This works, but only because I selected motifs[[1]] and motifs[[7]] by hand.
Question:
How can I find the vertex lists of the pattern subgraphs (for example, the complete subgraph) automatically?
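One way to automate the selection, sketched here rather than offered as the definitive approach: subgraph_isomorphisms() reports each occurrence once per automorphism of the pattern (3! = 6 mappings for a complete directed triad, which is why motifs[[1]] and motifs[[7]] belong to different occurrences), so reducing every mapping to its sorted vertex names and calling unique() leaves one vertex list per occurrence. make_full_graph(3, directed = TRUE) is swapped in for graph.isocreate() to build the complete directed triad explicitly.

```r
library(igraph)

el <- matrix(c("A", "B", "A", "C", "B", "A", "B", "C", "C", "A", "C", "B",
               "C", "E", "E", "D", "E", "F", "D", "E", "D", "F", "F", "E", "F", "D"),
             ncol = 2, byrow = TRUE)
graph <- graph_from_edgelist(el)

# Complete directed graph on 3 vertices: all 6 arcs present
pattern <- make_full_graph(3, directed = TRUE)
iso <- subgraph_isomorphisms(pattern, graph)

# Collapse the automorphic duplicates: one sorted vertex-name set per occurrence
vsets <- unique(lapply(iso, function(vs) sort(as_ids(vs))))
print(vsets)

# Give each occurrence its own colour
V(graph)$color <- "white"
cols <- rainbow(length(vsets))
for (i in seq_along(vsets)) {
  V(graph)[vsets[[i]]]$color <- cols[i]
}
plot(graph, edge.curved = TRUE, main = "Highlighted pattern occurrences")
```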
