I'm combining two layers in ggplot that were created from two different data sets and want to control the order in which the legend appears.
With example data and code:
base <-
data.frame(idea_num = c(1, 2),
value = c(-50, 90),
it_cost = c(30, 10))
group <-
data.frame(idea_num = c(1, 1, 2, 2),
group = c("a", "b", "a", "b"),
is_primary = c(TRUE, FALSE, FALSE, TRUE),
group_value = c(-40, -10, 20, 70))
base %>%
left_join(group) %>%
arrange(desc(value)) %>%
mutate(idea_num = idea_num %>% factor(levels = unique(idea_num)),
is_primary = is_primary %>% factor(levels = c("TRUE", "FALSE"))) %>%
ggplot(aes(x = idea_num, y = group_value, fill = is_primary)) +
geom_bar(stat = "identity") +
geom_bar(data = base %>%
arrange(desc(value)) %>%
mutate(idea_num = idea_num %>% factor(levels = unique(idea_num))),
aes(x = idea_num, y = it_cost, alpha = 0.1, fill = "it_cost"),
stat = "identity") +
scale_fill_manual(name = "Group", labels = c("TRUE" = "Primary", "FALSE" = "Secondary", "it_cost" = "IT Cost"),
values = c("TRUE" = "blue", "FALSE" = "red", "it_cost" = "black")) +
scale_alpha(guide = "none") +
theme(legend.position = "bottom")
I get a figure
but I'd like the legend to appear in the order of Primary, Secondary, IT Cost.
Were all of the numbers I'm trying to plot part of the same grand number, I could easily melt the dataframe and sum everything; however, the values from the group$group_value need to be displayed separate from base$it_cost.
If I plot only the values from the first layer, i.e.,
base %>%
left_join(group) %>%
arrange(desc(value)) %>%
mutate(idea_num = idea_num %>% factor(levels = unique(idea_num)),
is_primary = is_primary %>% factor(levels = c("TRUE", "FALSE"))) %>%
ggplot(aes(x = idea_num, y = group_value, fill = is_primary)) +
geom_bar(stat = "identity") +
scale_fill_manual(name = "Group", labels = c("TRUE" = "Primary", "FALSE" = "Secondary"),
values = c("TRUE" = "blue", "FALSE" = "red")) +
theme(legend.position = "bottom")
I get a figure I expect
How can I add the second layer and adjust the ordering of the legend boxes? I do not believe that this question or this question are entirely relevant to mine as the former is dealing with levels of a factor and the latter deals with ordering of multiple legends.
Can I do what I'd like to do? Is there a better way of constructing this plot?
Use scale_fill_manual(..., limits = ...):
... +
scale_fill_manual(name = "Group",
labels = c("TRUE" = "Primary", "FALSE" = "Secondary", "it_cost" = "IT Cost"),
limits = c("TRUE", "FALSE", "it_cost"),
values = c("TRUE" = "blue", "FALSE" = "red", "it_cost" = "black")) +
...
This gives:
That said, I think you may want to consider a few different approaches:
A: Why do you create your data in such a complex way, ending up with multiple observations of IT cost for the same idea number? I don't know your data, and you may well have your reasons, but a simple dataset along these lines:
  idea_num value      type
1        1   -40   Primary
2        1   -10 Secondary
3        2    20 Secondary
4        2    70   Primary
5        1   -50   IT Cost
6        2    90   IT Cost
would simplify things quite a bit.
B: Why do you want to stack/overplot these two separate barplots? I would do position="dodge" instead to have separate bars.
library(dplyr)
library(tidyr)
library(ggplot2)
df2 <- base %>%
left_join(group) %>%
mutate(is_primary=paste0("pri_", is_primary+0)) %>%
spread(is_primary, group_value) %>%
gather(yvar, y, it_cost, pri_0, pri_1)
# pri_1 is is_primary == TRUE (Primary), pri_0 is FALSE (Secondary)
df2$yvar <- factor(df2$yvar, levels=c("pri_1", "pri_0", "it_cost"),
labels=c("Primary", "Secondary", "IT Cost"))
df2$idea_num <- factor(df2$idea_num, levels=c(2, 1))
ggplot(df2, aes(idea_num, y, fill=yvar)) +
geom_bar(stat="identity", position="dodge") +
scale_fill_manual("Group", values=c("blue", "red", "black")) +
scale_alpha(guide = "none") +
theme(legend.position = "bottom")
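Approach A can also be sketched without any reshaping step. This is only a minimal illustration, assuming the IT Cost rows should carry `it_cost` (the column the question plots); the name `long` is made up for the example. Because `type` is a factor whose levels are given in the desired order, the legend follows that order without any extra scale arguments.

```r
library(ggplot2)

# Data from the question
base <- data.frame(idea_num = c(1, 2),
                   value = c(-50, 90),
                   it_cost = c(30, 10))
group <- data.frame(idea_num = c(1, 1, 2, 2),
                    group = c("a", "b", "a", "b"),
                    is_primary = c(TRUE, FALSE, FALSE, TRUE),
                    group_value = c(-40, -10, 20, 70))

# One row per bar: group values first, then the IT cost per idea.
# Assumption: IT Cost rows use base$it_cost.
long <- rbind(
  data.frame(idea_num = group$idea_num,
             value = group$group_value,
             type = ifelse(group$is_primary, "Primary", "Secondary")),
  data.frame(idea_num = base$idea_num,
             value = base$it_cost,
             type = "IT Cost")
)

# The factor levels set the legend (and dodge) order
long$type <- factor(long$type, levels = c("Primary", "Secondary", "IT Cost"))
long$idea_num <- factor(long$idea_num, levels = c(2, 1))

p <- ggplot(long, aes(idea_num, value, fill = type)) +
  geom_col(position = "dodge") +
  scale_fill_manual("Group",
                    values = c("Primary" = "blue",
                               "Secondary" = "red",
                               "IT Cost" = "black")) +
  theme(legend.position = "bottom")
p
```

Naming the `values` vector keeps each colour attached to its category even if the level order is changed later.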
#Sample data
set.seed(42)
DB = data.frame(Group =c(rep("1",16),
rep("2",4)) ,
Score1 = sample(1:20,20, replace = T),
Score2 = sample(1:20,20, replace = T),
Score3 = sample(1:20,20, replace = T),
Score4 = sample(1:20,20, replace = T))
I want to plot two bar charts comparing the mean of each score in both groups.
The right-hand chart should be titled "Group 1 mean scores" and the left-hand chart "Group 2 mean scores".
Thanks.
You can pivot to long format and use stat = "summary"
library(tidyverse)
DB %>%
pivot_longer(-1, names_to = "Score") %>%
ggplot(aes(Group, value, fill = Score)) +
geom_bar(position = position_dodge(width = 0.8, preserve = "total"),
stat = "summary", fun = mean, width = 0.6) +
scale_fill_brewer(palette = "Set2") +
theme_minimal(base_size = 20)
Or if you prefer facets, you can do:
library(tidyverse)
DB %>%
pivot_longer(-1, names_to = "Score") %>%
mutate(Group = paste("Group", Group)) %>%
ggplot(aes(Score, value, fill = Score)) +
geom_bar(stat = "summary", fun = mean, width = 0.6) +
scale_fill_brewer(palette = "Set2", guide = "none") +
facet_grid(.~Group) +
theme_bw(base_size = 20)
Created on 2022-11-13 with reprex v2.0.2
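If you would rather see the numbers that stat = "summary" computes, the means can be calculated explicitly first and drawn with geom_col. The following is just a sketch of that equivalent route; the names `means` and `mean_value` are made up for the example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)
DB <- data.frame(Group = c(rep("1", 16), rep("2", 4)),
                 Score1 = sample(1:20, 20, replace = TRUE),
                 Score2 = sample(1:20, 20, replace = TRUE),
                 Score3 = sample(1:20, 20, replace = TRUE),
                 Score4 = sample(1:20, 20, replace = TRUE))

# One row per Group/Score combination holding the mean
means <- DB %>%
  pivot_longer(-Group, names_to = "Score") %>%
  group_by(Group, Score) %>%
  summarise(mean_value = mean(value), .groups = "drop") %>%
  mutate(Group = paste("Group", Group, "mean scores"))

# geom_col draws the precomputed values as they are
p <- ggplot(means, aes(Score, mean_value, fill = Score)) +
  geom_col(width = 0.6) +
  scale_fill_brewer(palette = "Set2", guide = "none") +
  facet_grid(. ~ Group) +
  theme_bw(base_size = 20)
p
```

Putting the title text into the Group column means the facet strips double as the per-chart titles.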
I created a histogram with geom_histogram using the dataset and code below. I want to label each bar segment with the subject ID and color it according to the metabolizer group. However, the ID labels and the colors don't match: each ID sits at the correct x-axis value, but it is not colored according to its group.
For example ID 72 in the graph below has a value of -2.85, the ID is correct on the x-axis location but should be colored dark green as a PM, same for ID 33 should be UM light blue color and so on!
Any suggestions! Thanks
The dataset:
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM","NM" ,"IM", "PM"), value = 2 - rgamma(72, 3, 2))
Histogram code:
p1 <- ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(aes(name= "Metabolizer group"), palette = "Paired", labels= c("UM","NM" ,"IM", "PM"))+
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
show(p1)
Graph:
You could extract the colors of the Paired palette using brewer.pal from RColorBrewer and manually assign them with scale_fill_manual like this:
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM","NM" ,"IM", "PM"), value = 2 - rgamma(72, 3, 2))
library(ggplot2)
library(RColorBrewer)
colors <- brewer.pal(4, "Paired")
p1 <- ggplot(df, aes(x = value, fill = gengroup)) +
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
scale_fill_manual("Metabolizer group", values = c("UM" = colors[1],
"NM" = colors[2],
"IM" = colors[3],
"PM" = colors[4])) +
labs(x = NULL)
show(p1)
Created on 2022-09-12 with reprex v2.0.2
There are actually two issues in your code:
By using labels = c("UM", "NM", "IM", "PM") you only change the labels shown for your groups in the legend. Under the hood, the colors are still assigned by the order of the groups in the data, which by default is c("IM", "NM", "PM", "UM"). For example, the dark green that is labelled PM is actually assigned to gengroup UM. To fix that, set limits = c("UM", "NM", "IM", "PM") instead of using labels.
set.seed(4)
df <- data.frame(ID = factor(1:72), gengroup = c("UM", "NM", "IM", "PM"), value = 2 - rgamma(72, 3, 2))
library(dplyr)
library(ggplot2)
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(name = "Metabolizer group", palette = "Paired", limits = c("UM", "NM", "IM", "PM")) +
geom_histogram(bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
As you can see, ID 72 now gets the correct dark green and ID 33 the light blue.
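The difference between labels and limits can be checked without drawing anything. The sketch below only mimics how a discrete scale pairs palette colors with its limits in order; the names `default_map` and `fixed_map` are made up for the illustration.

```r
library(RColorBrewer)

paired <- brewer.pal(4, "Paired")  # light blue, dark blue, light green, dark green
groups <- c("UM", "NM", "IM", "PM")

# Without limits, a character variable is ordered alphabetically
default_limits <- sort(unique(groups))
default_map <- setNames(paired, default_limits)

# With limits = c("UM", "NM", "IM", "PM"), the colors follow that order
fixed_map <- setNames(paired, groups)

default_map[["UM"]]  # dark green goes to UM by default
fixed_map[["PM"]]    # with limits, dark green goes to PM
```

Supplying labels on top of the default mapping would only rename the legend keys, which is why the dark green key read "PM" while the bars it described belonged to UM.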
However there are still some issues in all cases where the bars contain more than one ID, e.g. ID 8 should also be colored dark green but is colored light green.
The reason for that is that you apply a different grouping for the geom_histogram and for adding the labels via stat_bin. For the first the grouping is defined by gengroup while for the second you group by ID. This could be seen clearly by grouping the geom_histogram by ID too:
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(name = "Metabolizer group", palette = "Paired", limits = c("UM", "NM", "IM", "PM")) +
geom_histogram(aes(group = ID), bins = 30) +
stat_bin(geom = "text", bins = 30,size =2, na.rm = TRUE,
aes(label = ifelse(after_stat(count) == 0, NA, after_stat(group)),
group = ID, y = after_stat(count)),
position = position_stack(vjust = 0.5)) +
labs(x = NULL)
As can be seen, we now get the right colors, but the bars are no longer stacked in the order of gengroup.
To fix that and to stack the labels by gengroup you could convert ID to a factor with the order of the IDs set according to the order of gengroup. To this end I arrange the data first and use forcats::fct_inorder. However, to get right labels we also have to make use of a lookup table to assign the right labels inside after_stat:
df <- df |>
arrange(gengroup) |>
mutate(ID = forcats::fct_inorder(ID))
labels <- setNames(levels(df$ID), seq_along(levels(df$ID)))
ggplot(df, aes(x = value, fill = gengroup)) +
scale_fill_brewer(name = "Metabolizer group", palette = "Paired", limits = c("UM", "NM", "IM", "PM")) +
geom_histogram(bins = 30) +
stat_bin(
geom = "text", bins = 30, size = 2, na.rm = TRUE,
aes(
label = ifelse(after_stat(count) == 0, NA, after_stat(labels[group])),
group = ID, y = after_stat(count)
),
position = position_stack(vjust = 0.5)
) +
labs(x = NULL)
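To see why the lookup table is needed, here is a small base R sketch of the same reordering (using order() and factor() in place of arrange() and fct_inorder(), which is a swap made only for this illustration). After reordering, the group number that after_stat(group) returns is the position of an ID among the factor levels, not the ID itself.

```r
set.seed(4)
df <- data.frame(ID = factor(1:72),
                 gengroup = c("UM", "NM", "IM", "PM"),
                 value = 2 - rgamma(72, 3, 2))

# Sort rows by gengroup (ties keep their original order),
# then let the factor levels follow the new row order.
df <- df[order(df$gengroup), ]
df$ID <- factor(df$ID, levels = unique(as.character(df$ID)))

# Position among the levels -> original subject ID
labels <- setNames(levels(df$ID), seq_along(levels(df$ID)))

head(labels, 3)  # positions 1 to 3 are the first IM subjects
labels[["19"]]   # position 19 is the first NM subject
```

Without the lookup, the text layer would print these positions (1, 2, 3, ...) instead of the subject IDs.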
I'm trying to highlight (change the color) specific lines in a plot.
The input data looks like this:
dt <- data.frame(Marker = paste0('m', rep(seq(1,10), 10)),
Year = rep(1990:1999, each = 10),
Ahat = rnorm(100, 0.5, 0.1)) %>%
mutate(Group = if_else(Marker %in% c("m1", "m2", "m3"), "A",
if_else(Marker %in% c("m4", "m5", "m6"), "B",
if_else(Marker %in% c("m7", "m8"), "C", "D")) ) )
And the general plot can be created by:
ggplot(dt, aes(x = Year, y = Ahat, group = interaction(as.factor(Group), Marker), color = as.factor(Group) ) ) +
geom_line(alpha = 0.5, size = 0.5) +
theme_classic() +
scale_y_continuous(name = "Predicted Value", breaks = pretty_breaks()) +
scale_colour_manual(name = "Groups", values = c("black", "red", "blue", "orange")) +
facet_wrap(~Group)
What I'd like to do is to highlight (e.g. make some lines black) some specific lines in specific groups (e.g. "m1" and "m9").
I've tried using something like this gghighlight(Marker %in% c("m1", "m9")), but it doesn't work.
I'd like to have something like this (sorry for my poor drawing skills):
Any suggestion?
P.S: My real data has 50K markers.
Thank you!
One option would be to first group data in subgroups (nesting in the dataframe) and then build the plots...
library(tidyverse)
library(scales)
library(patchwork)
# 1. Create dataframe ----
dt <- data.frame(Marker = as.factor(paste0('m', rep(seq(1,10), 10))),
Year = rep(1990:1999, each = 10),
Ahat = rnorm(100, 0.5, 0.1)) %>%
mutate(Group = case_when(
Marker %in% c("m1", "m2", "m3") ~ "A",
Marker %in% c("m4", "m5", "m6") ~ "B",
Marker %in% c("m7", "m8") ~ "C",
TRUE ~ "D"))
# 2. Function to choose which Marker of sub_dt should be highlighted
getHighlightMarketBasedOntAhatValue <- function(sub_dt) {
sub_dt <- sub_dt %>%
group_by(Marker) %>%
mutate(mean_Ahat = mean(Ahat))
# Using the mean of Ahat to choose the Marker is just a dummy example; instead of a single value you could also return a vector of values.
# Here I am no longer using the index (1, 2, ...) as in the first solution, but the factor itself.
highlightMarket <- first(sub_dt$Marker[sub_dt$mean_Ahat == max(sub_dt$mean_Ahat)])
}
# 3. Function to build plot for sub_df
my_plot <- function(sub_dt, highlighted_one) {
custom_pallete = rep("grey", length(levels(sub_dt$Marker)))
names(custom_pallete) <- levels(sub_dt$Marker)
custom_pallete[highlighted_one] = "blue"
sub_dt %>% ggplot(aes(x = Year,
y = Ahat,
color = as.factor(Marker))) +
geom_line(alpha = 0.5, size = 0.5) +
theme_classic() +
scale_y_continuous(name = "Predicted Value", breaks = pretty_breaks()) +
scale_colour_manual(name = "Marker", values = custom_pallete)
}
# 4. Main ----
# 4.1 Nesting ----
nested_dt <- dt %>%
group_by(Group) %>%
nest()
# 4.2 Choosing the highlighted Marker for each subgroup ----
nested_dt <- nested_dt %>%
mutate(highlighted_one = getHighlightMarketBasedOntAhatValue(data[[1]]))
# 4.3 Build plots ----
nested_dt <- nested_dt %>%
mutate(plot = map2(.x = data,
.y = highlighted_one,
.f = ~ my_plot(.x, .y)))
# 4.4 Use patchwork ... ----
# to combine plots ... see patchwork help to find out how to
# manage titles, labels, etc.
nested_dt %>% pull(plot) %>% patchwork::wrap_plots()
One way could be to set color as Marker.
Then you can change the color of the Marker in this line
scale_colour_manual(name = "Groups", values = c("black", "red", "blue", "orange", "green", "black", "red", "blue", "orange", "green")) +
Change the colors as you like:
ggplot(dt, aes(x = Year, y = Ahat, group = interaction(as.factor(Group), Marker), color = Marker ) ) +
geom_line(alpha = 0.5, size = 0.5) +
theme_classic() +
scale_y_continuous(name = "Predicted Value", breaks = pretty_breaks()) +
scale_colour_manual(name = "Groups", values = c("black", "red", "blue", "orange", "green",
"black", "red", "blue", "orange", "green")) +
facet_wrap(~Group)
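With many markers, typing one color per marker is not practical, so the values vector can be built programmatically instead. The following is a rough sketch of that idea, assuming "m1" and "m9" are the markers to highlight; `highlight` and `marker_cols` are names made up for the example.

```r
library(ggplot2)

set.seed(1)
dt <- data.frame(Marker = paste0("m", rep(1:10, 10)),
                 Year = rep(1990:1999, each = 10),
                 Ahat = rnorm(100, 0.5, 0.1))
dt$Group <- ifelse(dt$Marker %in% c("m1", "m2", "m3"), "A",
            ifelse(dt$Marker %in% c("m4", "m5", "m6"), "B",
            ifelse(dt$Marker %in% c("m7", "m8"), "C", "D")))

# Named color vector: black for highlighted markers, grey for the rest
highlight <- c("m1", "m9")
markers <- unique(dt$Marker)
marker_cols <- setNames(ifelse(markers %in% highlight, "black", "grey70"),
                        markers)

p <- ggplot(dt, aes(x = Year, y = Ahat, group = Marker, color = Marker)) +
  geom_line(alpha = 0.5) +
  theme_classic() +
  scale_colour_manual(values = marker_cols, guide = "none") +
  facet_wrap(~Group)
p
```

The legend is switched off here because one key per marker would be unreadable with thousands of markers.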
I'm trying to fix my legend text so that the text is representing the appropriate symbols and color. However, I have a lot of variables that I need to include in the legend, and they are all in different columns. Does anyone know a quick way to indicate what the colours and symbol are in the ggplot legend?
Here is some sample code
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
temps$Shape = if_else(temps$Growth.Phase == "exponential", 21,
if_else(temps$Growth.Phase == "stationary", 22, 23))
#Graph code
ggplot(data = temps, aes(x = Temperature, y = "Proportions", shape = factor(Shape))) +
geom_point(aes(y = Carbohydrates),colour = "darkred",
fill = "darkred", size = 3) +
geom_line(aes(y = Carbohydrates), size = 1, col = "darkred") +
geom_point(aes(y = Lipids), colour = "darkblue",
fill = "darkblue", size = 3, col ="darkblue") +
geom_line(aes(y = Lipids), size = 1) +
geom_point(aes(y = Protein), colour = "violet",
fill = "violet", size = 3) +
geom_line(aes(y = Protein), size = 1, col ="violet") +
geom_point(aes(y = DNA.RNA), colour = "darkorange",
fill = "darkorange", size = 3) +
geom_line(aes(y = DNA.RNA), size = 1, col = "darkorange") +
geom_point(aes(y = Chlorophyll), size = 3, colour = "darkgreen",
fill = "darkgreen") +
geom_line(aes(y = Chlorophyll), size = 1, col = "darkgreen") +
labs(x = "Temperature (°C)", y = "Proportion")
This is the image I am getting
But as you can see it's not giving me the correct text in the legend. I would like the symbols to specify which Growth.Phase they are and the colour to specify what column I have plotted (ie. Carbohydrate, Protein etc....). Does anyone know a quick fix?
When I use my own data this is what the graph looks like, please note the lines are going through the same symbols, and are the same colours
I'm not sure whether I got the legend right, but the idea is the same as in @dc37's answer. Your plot can be considerably simplified using pivot_longer:
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
temps_long <- temps %>%
pivot_longer(-c(Temperature, Growth.Phase)) %>%
mutate(
shape = case_when(
Growth.Phase == "exponential" ~ 21,
Growth.Phase == "stationary" ~ 22,
TRUE ~ 23
),
color = case_when(
name == "Carbohydrates" ~ "darkred",
name == "Lipids" ~ "darkblue",
name == "Protein" ~ "violet",
name == "DNA.RNA" ~ "darkorange",
name == "Chlorophyll" ~ "darkgreen",
TRUE ~ NA_character_
),
)
# named color vector
colors <- select(temps_long, name, color) %>%
distinct() %>%
deframe()
# named shape vector
shapes <- select(temps_long, Growth.Phase, shape) %>%
distinct() %>%
deframe()
ggplot(data = temps_long, aes(x = Temperature, y = value, shape = Growth.Phase, color = name, fill = name, group = Temperature)) +
geom_point(size = 3) +
geom_line(size = 1) +
scale_shape_manual(values = shapes) +
scale_fill_manual(values = colors) +
scale_color_manual(values = colors) +
labs(x = "Temperature (C)", y = "Proportion", color = "XXXX") +
guides(fill = "none", shape = guide_legend(override.aes = list(fill = "black")))
Created on 2020-04-04 by the reprex package (v0.3.0)
In order to make your code simpler and not have to repeat several times the same line, you can transform your data into a longer format and then use those new variables to attribute color, fill and shape arguments in your aes.
Then, using scale_color_manual or scale_shape_manual, you can set appropriate color and shape.
To add lines between the appropriate points, I add a "Rep" column to mimic the presence of replicates in your experiment. Otherwise, geom_line can't decide which points belong together.
library(tidyr)
library(dplyr)
library(ggplot2)
temps %>% mutate(Rep = rep(1:3,each = 3)) %>%
pivot_longer(cols = Carbohydrates:Protein, names_to = "Type", values_to = "proportions") %>%
ggplot(aes(x = Temperature, y = proportions))+
geom_point(aes(fill = Type, shape = Growth.Phase, color = Type), size = 3)+
geom_line(aes( color = Type, group =interaction(Rep, Type)))+
scale_color_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_fill_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_shape_manual(values = c(23,21,22))+
labs(x = "Temperature (°C)", y = "Proportion")
Does it answer your question?
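The role of the Rep column can be checked on the reshaped data itself. This is a small sketch with only two of the measurement columns (an assumption to keep it short); each Rep/Type combination should contain exactly one point per temperature, which is what lets geom_line connect them.

```r
library(dplyr)
library(tidyr)

set.seed(1)
temps <- data.frame(Temperature = c(15, 25, 35),
                    Growth.Phase = c("exponential", "stationary", "death"),
                    Carbohydrates = sample(3:10, 9, replace = TRUE),
                    Lipids = sample(10:25, 9, replace = TRUE))

# Rows 1-3 form replicate 1, rows 4-6 replicate 2, rows 7-9 replicate 3
long <- temps %>%
  mutate(Rep = rep(1:3, each = 3)) %>%
  pivot_longer(cols = Carbohydrates:Lipids,
               names_to = "Type", values_to = "proportions")

# One line per Rep/Type pair, three temperatures on each line
count(long, Rep, Type)
```

Grouping by interaction(Rep, Type) in geom_line then draws one line per row of this count table.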
I want to draw two densities with two vertical lines for the averages.
There should be two legends: one for the densities and one for the vertical
lines.
I tried the code below. However, only one legend appears and the labeling is wrong.
Can anyone help me?
set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50
mean.a <- mean(data$value[data$type == "a"])
mean.b <- mean(data$value[data$type == "b"])
library(ggplot2)
gp <- ggplot(data = data, aes(x = value))
gp <- gp + geom_density(aes(fill = type), color = "black", alpha=0.3, lwd = 1.0, show.legend = TRUE)
gp <- gp + scale_fill_manual(breaks = 1:2, name = "Density", values = c("a" = "green", "b" = "blue"), labels = c("a" = "Density a", "b" = "Density b") )
gp <- gp + geom_vline(aes(color="mean.a", xintercept=mean.a), linetype="solid", size=1.0, show.legend = NA)
gp <- gp + geom_vline(aes(color="mean.b", xintercept=mean.b), linetype="dashed", size=1.0, show.legend = NA)
gp <- gp + scale_color_manual(name = "", values = c("mean.a" = "red", "mean.b" = "darkblue"), labels = c("mean.a" = "Mean.A", "mean.b" = "Mean.B"))
gp <- gp + theme(legend.position="top")
gp
Here are a couple ways to do it. I'm not sure, but I think some of the difficulty comes from having more than one geom_vline and trying to hard-code values in the aes. You're building three scales here: fill for the density curves, and color and linetype for the vertical lines. But you're aiming (correct me if I'm misreading) for two legends.
The easiest way to deal with getting the proper legends is to make a small data frame for the means, rather than individual values for each mean. You can do this easily with dplyr to calculate means for each type.
library(tidyverse)
set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50
means <- group_by(data, type) %>%
summarise(mean = mean(value))
means
#> # A tibble: 2 x 2
#> type mean
#> <fct> <dbl>
#> 1 a 50.3
#> 2 b 99.9
Then when you plot, you can make a single geom_vline call, assigning the means data frame and allowing the aesthetics you want—color and linetype—to be scaled based on this data. The trick then is reconciling the names and labels: if you don't set the same legend name and labels for both the color and linetype scales, you'll have two legends for the lines. Set them the same, and you get a single legend for the mean lines.
ggplot(data, aes(x = value)) +
geom_density(aes(fill = type), alpha = 0.3) +
geom_vline(aes(xintercept = mean, color = type, linetype = type), data = means) +
scale_color_manual(values = c("red", "darkblue"), labels = c("Mean.A", "Mean.B"), name = NULL) +
scale_linetype_discrete(labels = c("Mean.A", "Mean.B"), name = NULL) +
scale_fill_manual(values = c(a = "green", b = "blue"), name = "Density")
The second way is to just add a step to creating the means data frame where you label the types the way you want later, i.e. "Mean.A" instead of just "a". Then you don't need to adjust labels, and you can skip the linetype scale—unless you want to change linetypes manually—and then just remove the name for that legend for both color and linetype in your labs.
means2 <- group_by(data, type) %>%
summarise(mean = mean(value)) %>%
mutate(type = paste("Mean", str_to_upper(type), sep = "."))
means2
#> # A tibble: 2 x 2
#> type mean
#> <chr> <dbl>
#> 1 Mean.A 50.3
#> 2 Mean.B 99.9
ggplot(data, aes(x = value)) +
geom_density(aes(fill = type), alpha = 0.3) +
geom_vline(aes(xintercept = mean, color = type, linetype = type), data = means2) +
scale_color_manual(values = c(Mean.A = "red", Mean.B = "darkblue")) +
scale_fill_manual(values = c(a = "green", b = "blue"), name = "Density") +
labs(color = NULL, linetype = NULL)
Created on 2018-06-05 by the reprex package (v0.2.0).
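If you also want to keep the solid and dashed line types and the density labels from the question, one possible variation is to give every scale named vectors, so that color and linetype share the same name and labels and therefore merge into one legend. This is only a sketch; it uses aggregate() from base R instead of dplyr, and `line_labels` is a name made up for the example.

```r
library(ggplot2)

set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
                   type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50

# One row per type with its mean
means <- aggregate(value ~ type, data = data, FUN = mean)
names(means)[2] <- "mean"

# Shared labels for the color and linetype scales
line_labels <- c(a = "Mean.A", b = "Mean.B")

p <- ggplot(data, aes(x = value)) +
  geom_density(aes(fill = type), alpha = 0.3) +
  geom_vline(aes(xintercept = mean, color = type, linetype = type),
             data = means) +
  scale_color_manual(values = c(a = "red", b = "darkblue"),
                     labels = line_labels, name = NULL) +
  scale_linetype_manual(values = c(a = "solid", b = "dashed"),
                        labels = line_labels, name = NULL) +
  scale_fill_manual(values = c(a = "green", b = "blue"),
                    labels = c(a = "Density a", b = "Density b"),
                    name = "Density") +
  theme(legend.position = "top")
p
```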