R- Creating the same alluvial order in multiple plots - r

I'm using the ggalluvial package to make multiple alluvial plots for some data that I have, but I'd like to try to order the alluvials so I can compare them across multiple plots.
Here's some example data:
set.seed(234)
Data1 <- data.frame(
ID = rep(1:10, each = 6),
Group = rep(1:2, each = 30),
Week = rep(1:6, times = 10),
Y = sample(c("High", "Low", "None"), 60, replace = TRUE)
)
Data2 <- data.frame(
ID = rep(1:10, each = 6),
Group = rep(1:2, each = 30),
Week = rep(1:6, times = 10),
Y = sample(c("High", "Low", "None"), 60, replace = TRUE)
)
And some example code for making the two graphs:
plot1 <- ggplot(Data1,
aes(x = Week,
stratum = Y,
alluvium = ID,
fill = Y,
label = Y))+
facet_grid(Group ~.)+
scale_fill_manual(values = c("red", "yellow", "green3"))+
geom_flow(stat = "alluvium", lode.guidance = "frontback", color = "darkgray")+
geom_stratum()
plot2 <- ggplot(Data2,
aes(x = Week,
stratum = Y,
alluvium = ID,
fill = Y,
label = Y))+
facet_grid(Group ~.)+
scale_fill_manual(values = c("red", "yellow", "green3"))+
geom_flow(stat = "alluvium", lode.guidance = "frontback", color = "darkgray")+
geom_stratum()
And finally the two output graphs
Is there a way to know which alluvial belongs to which individual? Or designate the order of alluvials so that, for example, the top group 1 alluvial is the same in the first and second graphs? I realize that might make the graph look kind of bad, but for my actual data the outcome in the first week is pretty homogenous across the board so I think it would be alright. Thanks in advance.

The way that makes the most sense to me at this point is to pass the alluvium variable (in this case, ID) to the label aesthetic in a new layer using the alluvium stat and the text geom. Provided the parameters (e.g. lode.guidance) are passed the same values, the new layer should stack the alluvia in the same order at every axis, and the text labels will coincide with the "lodes" where the alluvia intersect the strata.
Here are your examples with the additional layer. Note that it is added after the stratum layer; otherwise the text would be obscured by the filled-in rectangles.
library(ggalluvial)
#> Loading required package: ggplot2
set.seed(234)
Data1 <- data.frame(
ID = rep(1:10, each = 6),
Group = rep(1:2, each = 30),
Week = rep(1:6, times = 10),
Y = sample(c("High", "Low", "None"), 60, replace = TRUE)
)
Data2 <- data.frame(
ID = rep(1:10, each = 6),
Group = rep(1:2, each = 30),
Week = rep(1:6, times = 10),
Y = sample(c("High", "Low", "None"), 60, replace = TRUE)
)
ggplot(Data1,
aes(x = Week,
stratum = Y,
alluvium = ID,
fill = Y,
label = Y))+
facet_grid(Group ~.)+
scale_fill_manual(values = c("red", "yellow", "green3"))+
geom_flow(stat = "alluvium", lode.guidance = "frontback", color = "darkgray")+
geom_stratum()+
geom_text(stat = "alluvium", aes(label = ID), lode.guidance = "frontback")
ggplot(Data2,
aes(x = Week,
stratum = Y,
alluvium = ID,
fill = Y,
label = Y))+
facet_grid(Group ~.)+
scale_fill_manual(values = c("red", "yellow", "green3"))+
geom_flow(stat = "alluvium", lode.guidance = "frontback", color = "darkgray")+
geom_stratum()+
geom_text(stat = "alluvium", aes(label = ID), lode.guidance = "frontback")
Created on 2019-11-22 by the reprex package (v0.3.0)

Related

ggplot line graph with multiple lines by groups

So my data set is to analyse the effect of predation on salmon selected for growth.
Basically I have a start and ending point, 3 different strains and 2 environments (with and without predator). Does anyone knows the best way to do this?
I was thinking of something like this drawing enter image description here
I have been trying but I can only come up either with the separated time points, in which I would have to do 2 graphs, or with and average of both.
The data set is available here.
https://dryad-assetstore-merritt-west.s3.us-west-2.amazonaws.com/ark%3A/13030/m55q9wc8%7C1%7Cproducer/Salmon_Size_Data.txt?response-content-type=text%2Fplain&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEEgaCXVzLXdlc3QtMiJIMEYCIQD%2FrxcoA78DX5N86nFNROptzvNB%2Bo82OubnJESH4AQF5wIhALF9AuZuZMgV6Ik7EBd9Pje07bsANAT%2BB5R%2BBh24rjJYKswECEEQABoMNDUxODI2OTE0MTU3Igxdeqv51kC67yp3Gr0qqQTdXAWYho6s5Xrf3UFxy0BvZ%2Fm1OUwz%2BSvZS2jSWam%2BcFwyEk2gVOvcZis5PLf%2BAUk43X0wn4S5%2FpXkunbyWiWWlwoV1d%2BOlt8M%2FiyuGrg%2Bzydv2d%2FT6l5zdQ2dxa5ISKLmLHvpl5CzfCB2aChuWTwruTMsssEPZQUyxZy2ihgpbPpjV%2FM5LOfOxcunwJXrMBL4BUk6PCqZQYMpe5NiIOvv7mO58trcPKL5hQ0W4HECtiPtoslFn5Gv5v6KWG4A9VDAfgZwc0TxVmqzzbd6xnb57i6bbfgOyX7PkwFXTuNswa1VJL8Zai08%2BmlmvCXYZyhENYuVTk7K9g3N2aUWlP0nSSMyUKoJPgW45fldrgMMfl7uAH5Budh8EfoFUMQMStuse9gR0qiCHWMbohDao0YcOImNYmoCO5znwTbuDerPsGEzQbrK9YFPKbTpFtm%2Fqc5pAPWw4wWPWcj0PmG2FvNphT3IV8M8jL5Nc%2BNkCM2SbKf82XY2sBar43Xn%2BhPFlsaU%2FkeaFINCSRf29FY6mFNgoKWHfcGbiFoS6gegiFc4iyK7zMjReIFjJ9%2Bsur6HpwWVLG%2Br2JZ8OZjjwg1Uy6tWZ5LxUk%2Fm00fhjIuJyYe6vb%2BL98gKyzL9YXEOEDoEbQ6C%2FCGPsYzKs2mEJSic%2FRxGHIt7%2B4wI7ilcdVnpmoBxiQDYIjD5EYF1UYX2RzXCAb%2Ba4Feb5Y%2FnLv5Wd9lZH67KnrCl%2F%2FP80n%2FUMLmNqJsGOqgBUH4Uc6%2BmRqbTXPRp0NF%2BL6Ieni3hFJbOhhF33xQvrX0R75mGpFCUGSh15B1V%2F%2BQyoPJSJ6KpjBbmhvByzaUNzp9Tu9IRVbrAYaQjU1msReCU7%2B8T6NQnphj%2FizbzJsYEAPxVesRFiGfoH%2FcqjfDSIXDWiJU4pzwyaITjlPe2qawZ06sxXaP%2BxkrgINQ93FHpFTh6DX7kcYUG0dXkwGsDVXYln3pXlXTG&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20221108T081734Z&X-Amz-SignedHeaders=host&X-Amz-Expires=14400&X-Amz-Credential=ASIAWSMX3SNW4W5FR7N3%2F20221108%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=8ab2326025376f5c9b96f2b4e31c51ba5fe15a96743794ac8d6cc48a75efe7e0
I am using this code:
ggplot(data = aqua, mapping = aes(x = Env, y = mass, group = Strain, color = Strain))+
geom_line(stat = "summary", fun = mean, size = 1, linetype = 2)+
geom_point(stat = "summary", fun = mean, size = 3)+
stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.1, size = .5)+
labs(x = "Environment", y = "Body mass (g)")+
theme(axis.title.x.bottom = element_text(size = 20), axis.title.y.left = element_text(size = 20))
Since you have not given any data, I made up an example data for you. Next time please include a reproducible example data with your code.
example_data <- tibble(
strain = rep(c("A", "B", "C"), each = 3),
env = rep(c("x", "y", "z"), times = 3),
mass = c(1,4,7,2,6,9,3,8,10)
)
ggplot(example_data, mapping = aes(x = env, y = mass, group = strain, color = strain))+
geom_line(size = 1, linetype = 2)+
geom_point(size = 3) +
labs(x = "Environment", y = "Body mass (g)")+
theme(axis.title.x.bottom = element_text(size = 20), axis.title.y.left = element_text(size = 20))

Matching Histogram ID label with fill colour

I created a geom_histogram using the dataset and code below, and I wanted to label each bar in histogram with the subject ID and color the bar according to the metabolizer group, I noticed that for some reason the ID label and the color don't match, the ID is correct on the x-axis value but it is not colored according to the group.
For example ID 72 in the graph below has a value of -2.85, the ID is correct on the x-axis location but should be colored dark green as a PM, same for ID 33 should be UM light blue color and so on!
Any suggestions! Thanks
The dataset:
Set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM","NM" ,"IM", "PM"), value = 2 - rgamma(72, 3, 2))
Histogram code:
p1 <- ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(aes(name= "Metabolizer group"), palette = "Paired", labels= c("UM","NM" ,"IM", "PM"))+
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
show(p1)
Graph:
You could extract the colors of the Paired palette using brewer.pal from RColorBrewer and manually assign them with scale_fill_manual like this:
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM","NM" ,"IM", "PM"), value = 2 - rgamma(72, 3, 2))
library(ggplot2)
library(RColorBrewer)
colors <- brewer.pal(4, "Paired")
p1 <- ggplot(df, aes(x = value, fill = gengroup)) +
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
scale_fill_manual("Metabolizer group", values = c("UM" = colors[1],
"NM" = colors[2],
"IM" = colors[3],
"PM" = colors[4])) +
labs(x = NULL)
show(p1)
Created on 2022-09-12 with reprex v2.0.2
There are actually two issues in your code:
Using labels= c("UM","NM" ,"IM", "PM") you are changing the labels for your groups in the legend. But under the hood the colors are assigned by the order of the groups in the data, which by default is c("IM","NM" ,"PM", "UM"), e.g. the dark green which is labelled PM is actually assigned to gengroup UM. To fix that you set the limits = c("UM", "NM", "IM", "PM") instead of using labels
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM", "NM", "IM", "PM"), value = 2 - rgamma(72, 3, 2))
library(dplyr)
library(ggplot2)
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(aes(name= "Metabolizer group"), palette = "Paired", limits = c("UM", "NM", "IM", "PM"))+
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
As you see now your ID 72 get the correct dark green and the ID 33 the light blue.
However there are still some issues in all cases where the bars contain more than one ID, e.g. ID 8 should also be colored dark green but is colored light green.
The reason for that is that you apply a different grouping for the geom_histogram and for adding the labels via stat_bin. For the first the grouping is defined by gengroup while for the second you group by ID. This could be seen clearly by grouping the geom_histogram by ID too:
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(aes(name= "Metabolizer group"), palette = "Paired", limits = c("UM", "NM", "IM", "PM"))+
geom_histogram(aes(group = ID), bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
As can be seen now we get the right colors but the bars are no longer stacked in the order of gengroup
To fix that and to stack the labels by gengroup you could convert ID to a factor with the order of the IDs set according to the order of gengroup. To this end I arrange the data first and use forcats::fct_inorder. However, to get right labels we also have to make use of a lookup table to assign the right labels inside after_stat:
df <- df |>
arrange(gengroup) |>
mutate(ID = forcats::fct_inorder(ID))
labels <- setNames(levels(df$ID), seq_along(levels(df$ID)))
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(aes(name = "Metabolizer group"), palette = "Paired", limits = c("UM", "NM", "IM", "PM")) +
geom_histogram(bins = 30) +
stat_bin(
geom = "text", bins = 30, size = 2, na.rm = TRUE,
aes(
label = ifelse(after_stat(count) == 0, NA, after_stat(labels[group])),
group = ID, y = after_stat(count)
),
position = position_stack(vjust = 0.5)
) +
labs(x = NULL)

ggplot line plot with one group`s lines on top

I am making a line plot of several groups and want to make a visualization where one of the groups lines are highlighted
ggplot(df) + geom_line(aes(x=timepoint ,y=var, group = participant_id, color=color)) +
scale_color_identity(labels = c(red = "g1",gray90 = "Other"),guide = "legend")
However, the group lines are partially obscured by the other groups lines
How can I make these lines always on top of other groups lines?
The simplest way to do this is to plot the gray and red groups on different layers.
First, let's try to replicate your problem with a dummy data set:
set.seed(1)
df <- data.frame(
participant_id = rep(1:50, each = 25),
timepoint = factor(rep(0:24, 50)),
var = c(replicate(50, runif(1, 50, 200) + runif(25, 0.3, 1.5) *
sin(0:24/(0.6*pi))^2/seq(0.002, 0.005, length = 25))),
color = rep(sample(c("red", "gray90"), 50, TRUE, prob = c(1, 9)), each = 100)
)
Now we apply your plotting code:
library(ggplot2)
ggplot(df) +
geom_line(aes(x=timepoint ,y=var, group = participant_id, color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend") +
theme_classic()
This looks broadly similar to your plot. If instead we plot in different layers, we get:
ggplot(df, aes(timepoint, var, group = participant_id)) +
geom_line(data = df[df$color == "gray90",], aes(color = "Other")) +
geom_line(data = df[df$color == "red",], aes(color = "gl")) +
scale_color_manual(values = c("red", "gray90")) +
theme_classic()
Created on 2022-06-20 by the reprex package (v2.0.1)
You can use factor releveling to bring the line (-s) of interest to front.
First, let's plot the data as is, with the red line partly hidden by others.
library(ggplot2)
library(dplyr)
set.seed(13)
df <-
data.frame(timepoint = rep(c(1:100), 20),
participant_id = paste0("p_", sort(rep(c(1:20), 100))),
var = abs(rnorm(2000, 200, 50) - 200),
color = c(rep("red", 100), rep("gray90", 1900)))
ggplot(df) +
geom_line(aes(x = timepoint ,
y = var,
group = participant_id, color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend")
Now let's bring p_1 to front by making it the last factor level.
df %>%
mutate(participant_id = factor(participant_id)) %>%
mutate(participant_id = relevel(participant_id, ref = "p_1")) %>%
mutate(participant_id = factor(participant_id, levels = rev(levels(participant_id)))) %>%
ggplot() +
geom_line(aes(x=timepoint,
y=var,
group = participant_id,
color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend")

How to supress/combine the legend for two geoms in R

Im making a scatterplot which shows a value plotted against the date since symptom onset. These patients are categorised based on disease severity, and i wanted to show how the values change over time in each severity category. I have coloured the dots based on severity score, but i prefer to use shape =21 so i can have a border. I also draw a line to see the trend, and i want that coloured in the same way, however, this has added another legend and it looks complicated. This issue doesnt happen if use a different shape that isnt filled, because scale_colour_manual can be used for both the lines and the dots, but i dont think it looks as nice. Any idea how i can fix this?
IC50SymObySS <- ggplot(data = isaric) +
geom_point(mapping = aes(x = Days_since_onset, y = log2IC50, fill = Severity_score), size = 2, colour = "black", shape = 21)+
geom_smooth(mapping = aes(x = Days_since_onset, y = log2IC50, colour = Severity_score), se = FALSE)+
scale_fill_manual(breaks=c("1","2","3","4","5"),
values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"),
labels=c("1","2","3","4","5"),
name = "Severity Score")+
scale_colour_manual(values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"))+
theme_minimal()+
JTheme+
ylab("Serum Log2 IC50")+
xlab("Days Since Symptom Onset")+
guides(colour = guide_legend(title.position = "top", title.hjust = 0.5))
IC50SymObySS
As per this answer, you need to use identical name and labels values for both fill and colour scale.
library(ggplot2)
library(dplyr)
isaric <- transmute(iris,
Days_since_onset = (Sepal.Length - 4)^3,
log2IC50 = Sepal.Width * 3,
Severity_score = cut(Petal.Length, breaks = quantile(Petal.Length, prob = 0:5 / 5), labels = 1:5))
ggplot(data = isaric) +
geom_smooth(mapping = aes(x = Days_since_onset, y = log2IC50, colour = Severity_score), se = FALSE)+
geom_point(mapping = aes(x = Days_since_onset, y = log2IC50, fill = Severity_score), size = 2, colour = "black", shape = 21)+
scale_colour_manual(
name = "Severity Score",
values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"),
labels=c("1","2","3","4","5"))+
scale_fill_manual(
name = "Severity Score",
breaks=c("1","2","3","4","5"),
values=c("1" = "lightblue1","2" = "lightblue3","3" = "lightblue4","4" = "lightcoral","5" = "firebrick2"),
labels=c("1","2","3","4","5"))+
theme_minimal()+
ylab("Serum Log2 IC50")+
xlab("Days Since Symptom Onset")+
guides(colour = guide_legend(title.position = "top", title.hjust = 0.5))

Changing legend labels when using multiple variables

In R using ggplot:
I'm trying to plot a line plot with multiple columns as individual variables. I'm not using a fill = parameter so I know that's why scale_fill_discrete doesn't work. From what I've seen from other similar questions, it seems like all the other options (scale_colour_discrete, scale_shape_discrete etc) require you to use those parameters in the first step of building the plot. That may be my main issue, but I don't know how to fix it with the three different variables. Right now the legend that shows up shows the three different colors but they are not associated with the right variable.
ggplot(summary_5yr) +
geom_line(aes(x = Year, y = NY_Med_Inc, group = 1, color ="blue")) +
geom_line(aes(x = Year, y = FL_Med_Inc, group = 1, color = "red")) +
geom_line(aes(x = Year, y = WA_Med_Inc, group = 1, color = "green")) +
labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")
Try this. To get the colors and the legend right you have to make use of scale_color_manual. Using color = "blue" inside aes() will not set the color to "blue". Instead "blue" is simply a kind of label to which you have to assign a color inside scale_color_manual. Also. To get the correct labels you have to set the labels argument.
A second approach to achieve the desired plot would be to reshape your df into long format via e.g. tidyr::pivot_longer. This way only one geom_line layer is needed and you get the correct labels automatically.
library(ggplot2)
library(tidyr)
library(dplyr)
set.seed(123)
summary_5yr <- data.frame(
Year = 2010:2020,
NY_Med_Inc = runif(11, 10000, 50000),
FL_Med_Inc = runif(11, 10000, 50000),
WA_Med_Inc = runif(11, 10000, 50000)
)
ggplot(summary_5yr) +
geom_line(aes(x = Year, y = NY_Med_Inc, group = 1, color ="blue")) +
geom_line(aes(x = Year, y = FL_Med_Inc, group = 1, color = "red")) +
geom_line(aes(x = Year, y = WA_Med_Inc, group = 1, color = "green")) +
scale_color_manual(values = c(blue = "blue", red = "red", green = "green"),
labels = c(blue = "NY_Med_Inc", red = "FL_Med_Inc", green = "WA_Med_Inc")) +
labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")
summary_5yr %>%
tidyr::pivot_longer(-Year, names_to = "var", values_to = "value") %>%
ggplot() +
geom_line(aes(x = Year, y = value, group = var, color = var)) +
scale_color_manual(values = c(NY_Med_Inc = "blue", FL_Med_Inc = "red", WA_Med_Inc = "green")) +
labs(title = "Median Income Trends", x = "Year", y = "Median Income (USD)")

Resources