ggplot add variable to legend without including in plot (when using alpha) - r

I want to add a variable to the legend without including it in the plot.
I think the problem doesn't occur when I don't use alpha (see: How do I add a variable to the legend without including it in the graph?).
library(tidyverse)
name_color <- c('black', "blue", "orange", "pink")
names(name_color) <- letters[1:4]
tibble(name = rep(letters[1:4], each = 2),
respond = rep(c("yes", "no"), 4),
n = rep(50, 8),
me = "i") %>%
filter(name != "c") %>%
ggplot(aes(me, n, fill = name, alpha = respond)) +
facet_wrap(~name) +
geom_bar(stat = "identity") +
scale_fill_manual(values = name_color, drop = FALSE)

The issue has nothing to do with alpha. The problem is the class of your data. When you use tibble to create your data, the name column is of class character. You need a factor class to "remember" the unused levels:
name_color <- c('black', "blue", "orange", "pink")
names(name_color) <- letters[1:4]
d = tibble(name = rep(letters[1:4], each = 2),
respond = rep(c("yes", "no"), 4),
n = rep(50, 8),
me = "i")
class(d$name)
# [1] "character"
d %>% mutate(name = factor(name)) %>%
filter(name != "c") %>%
ggplot(aes(me, n, fill = name, alpha = respond)) +
facet_wrap(~name) +
geom_bar(stat = "identity") +
scale_fill_manual(values = name_color, drop = FALSE)
In the original question you link, you had the factor conversion explicitly, which is why it worked.
... %>% mutate(
gear = factor(gear),
vs = factor(vs)
) %>% ...
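To see why the class matters, here is a minimal base R sketch (the names name_chr, name_fct, kept_chr and kept_fct are only for illustration). Dropping "c" from a character vector leaves no trace of it, while a factor keeps "c" as an unused level, and that is what drop = FALSE relies on:

```r
# Same values stored once as character and once as factor
name_chr <- rep(letters[1:4], each = 2)
name_fct <- factor(name_chr)

# Remove every "c" from both
kept_chr <- name_chr[name_chr != "c"]
kept_fct <- name_fct[name_fct != "c"]

unique(kept_chr)              # "c" is gone entirely
levels(kept_fct)              # "c" survives as an unused level
levels(droplevels(kept_fct))  # only an explicit droplevels() removes it
```

With the character column, the scale never learns that "c" existed, so there is nothing for drop = FALSE to keep.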

Related

ggplot2: enforcing empty space for some missing levels in a plot

In the following example (using the iris dataset), I create a factor variable for which one of the species has no values at level C. When I make the plot, I cannot find a way to stop ggplot from dropping the empty level (virginica-C). A previous post (from 10 years ago) suggests using the argument drop = FALSE, but it is not working for me. Any suggestions?
require(dplyr)
require(ggplot2)
iris %>%
mutate(fct_x = factor(x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C"))) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
geom_boxplot() +
scale_fill_discrete(drop = FALSE)
In other words, the code shown above generates the following graphic. As you can see, the virginica group does NOT show an empty space for group C (because there are no elements of type virginica-C) and that is exactly what I want to achieve: to show that empty space in the figure.
PS: There is also another similar post (from 6 years ago) in which they suggest placing values outside the limits. It is not a bad idea when you have to make a point plot, but in my case I am making a script that generates automatic plots from incoming information and, therefore, I cannot limit the y-axis since the script itself defines the ylim according to the values that appear.
You can specify the position function in the geom_boxplot() call. position_dodge2() is the default position for boxplots, and setting preserve = "single" in it keeps every box the same width, even where a group is missing.
iris %>%
mutate(fct_x = factor(x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C"))) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
mutate(fct_x = factor(fct_x, levels = c("A", "B", "C"))) %>%
ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
geom_boxplot(position=position_dodge2(preserve="single"))
See the definition of position_dodge2(): https://ggplot2.tidyverse.org/reference/position_dodge.html
You could get the empty slot by faceting with scales = "free_x" and using scale_x_discrete(drop = FALSE):
(The strip labels could be moved to the bottom, and the fct_x labels & gaps between facets removed, if preferred per the second example.)
require(dplyr)
require(ggplot2)
iris %>%
mutate(fct_x = factor(
x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C")
)) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
ggplot(aes(x = fct_x, y = Sepal.Length, fill = fct_x)) +
geom_boxplot() +
facet_wrap(~ Species, scales = "free_x") +
scale_x_discrete(drop = FALSE)
Created on 2022-06-16 by the reprex package (v2.0.1)
# Mimicking the original plot
require(dplyr)
require(ggplot2)
iris %>%
mutate(fct_x = factor(
x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE),
levels = c("A", "B", "C")
)) %>%
filter(!(Species == "virginica" & fct_x == "C")) %>%
ggplot(aes(x = fct_x, y = Sepal.Length, fill = fct_x)) +
geom_boxplot() +
facet_wrap(~ Species, scales = "free_x", strip.position = "bottom") +
scale_x_discrete(drop = FALSE) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.background = element_blank(),
panel.spacing = unit(0, "lines")) +
labs(x = "Species")
Created on 2022-06-16 by the reprex package (v2.0.1)

How to unify a legend with different types of scales in ggplot?

I have a data frame separated by 3 different factors. I want to represent this data frame with a scatter plot, using different types of scale for each factor.
I want to use shapes 21, 22 and 24, which are shapes with an outline and a colored filling. However, the filling scale does not appear correctly in the legend. Also, I want to unify the legend so that the labels look something like this (in the MWE I represented these labels with numbers from 1 to 18 (labels = 1:18)):
A, M, V1
A, M, V2
A, M, V3
...
B, O, V2
B, O, V3
I followed the recommendations of this answer, but the resulting plot was not as expected. Does anyone know how I can solve this issue?
library(ggplot2)
Factor1 <- c('A', 'B')
Factor2 <- c('M', 'N', 'O')
Factor3 <- c('V1', 'V2', 'V3')
DF <- expand.grid(Factor1 = Factor1,
Factor2 = Factor2,
Factor3 = Factor3)
DF$Result <- runif(n =18,
min = 0,
max = 100)
DF <- DF[order(DF[, "Result"]), ]
DF$Order <- 1:18
ggplot(data = DF,
aes(x = Order,
y = Result,
fill = Factor1,
shape = Factor2,
size = Factor3)) +
geom_point() +
scale_fill_manual(name = "Legend",
values = c('blue', 'red'),
labels = 1:18) +
scale_shape_manual(name = "Legend",
values = c(21,22,24),
labels = 1:18) +
scale_size_manual(name = "Legend",
values = c(2,4,6),
labels = 1:18)
Following your link gives the proper result, but it takes a fair amount of effort. I made an example for Factor1 and Factor2. Naming the values by level keeps each color and shape tied to the right combination, whatever order the factor levels end up in.
library(dplyr)
DF %>%
mutate(Fac = factor(paste(Factor1, Factor2, sep = "-"))) %>%
ggplot( aes(x = Order,
y = Result,
fill = Fac,
shape = Fac,
size = Factor3)) +
geom_point() +
# named vectors: the level names double as the legend labels
scale_fill_manual(name = "Legend",
values = c("A-M" = 'blue', "B-M" = 'red', "A-N" = 'blue', "B-N" = 'red', "A-O" = 'blue', "B-O" = 'red')) +
scale_shape_manual(name = "Legend",
values = c("A-M" = 21, "B-M" = 21, "A-N" = 22, "B-N" = 22, "A-O" = 24, "B-O" = 24)) +
scale_size_manual(name = "Legend",
values = c(2,4,6),
labels = 1:18)
To combine with Factor3, you may start with
DF %>%
mutate(Fac = factor(paste(Factor1, Factor2, Factor3, sep = "-"))) %>%
ggplot( aes(x = Order,
y = Result,
fill = Fac,
shape = Fac,
size = Fac))
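As a rough sketch of where that starting point could lead (not a tested drop-in), the three scales can share one legend if each gets the same name and a values vector named by the combined level. The helper lookup(), the object key, the set.seed() call and the ", " separator are assumptions added for illustration; the colors, shapes and sizes are the ones from the question.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
DF <- expand.grid(Factor1 = c("A", "B"),
                  Factor2 = c("M", "N", "O"),
                  Factor3 = c("V1", "V2", "V3"))
DF$Result <- runif(n = 18, min = 0, max = 100)
DF <- DF[order(DF$Result), ]
DF$Order <- 1:18

# One combined factor; lex.order = TRUE sorts the levels as
# "A, M, V1", "A, M, V2", "A, M, V3", "A, N, V1", ...
DF$Fac <- interaction(DF$Factor1, DF$Factor2, DF$Factor3,
                      sep = ", ", lex.order = TRUE)

# One row per combined level, used to look up each aesthetic value
key <- unique(DF[, c("Fac", "Factor1", "Factor2", "Factor3")])

# Hypothetical helper: pick a value per source level, name it by combined level
lookup <- function(map, by) {
  setNames(unname(map[as.character(by)]), as.character(key$Fac))
}
fill_values  <- lookup(c(A = "blue", B = "red"), key$Factor1)
shape_values <- lookup(c(M = 21, N = 22, O = 24), key$Factor2)
size_values  <- lookup(c(V1 = 2, V2 = 4, V3 = 6), key$Factor3)

# Same name on all three scales, so ggplot2 can merge them into one legend
p <- ggplot(DF, aes(x = Order, y = Result,
                    fill = Fac, shape = Fac, size = Fac)) +
  geom_point() +
  scale_fill_manual(name = "Legend", values = fill_values) +
  scale_shape_manual(name = "Legend", values = shape_values) +
  scale_size_manual(name = "Legend", values = size_values)
p
```

Because every vector is named, the mapping does not depend on the order in which the 18 levels happen to be stored.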

How to highlight specific lines in specific groups with ggplot2?

I'm trying to highlight (change the color) specific lines in a plot.
The input data looks like this:
dt <- data.frame(Marker = paste0('m', rep(seq(1,10), 10)),
Year = rep(1990:1999, each = 10),
Ahat = rnorm(100, 0.5, 0.1)) %>%
mutate(Group = if_else(Marker %in% c("m1", "m2", "m3"), "A",
if_else(Marker %in% c("m4", "m5", "m6"), "B",
if_else(Marker %in% c("m7", "m8"), "C", "D")) ) )
And the general plot can be created by:
ggplot(dt, aes(x = Year, y = Ahat, group = interaction(as.factor(Group), Marker), color = as.factor(Group) ) ) +
geom_line(alpha = 0.5, size = 0.5) +
theme_classic() +
scale_y_continuous(name = "Predicted Value", breaks = pretty_breaks()) +
scale_colour_manual(name = "Groups", values = c("black", "red", "blue", "orange")) +
facet_wrap(~Group)
What I'd like to do is to highlight (e.g. make some lines black) some specific lines in specific groups (e.g. "m1" and "m9").
I've tried using something like this gghighlight(Marker %in% c("m1", "m9")), but it doesn't work.
I'd like to have something like this (sorry for my poor drawing skills):
Any suggestion?
P.S: My real data has 50K markers.
Thank you!
One option would be to first group data in subgroups (nesting in the dataframe) and then build the plots...
library(tidyverse)
library(scales)
library(patchwork)
# 1. Create dataframe ----
dt <- data.frame(Marker = as.factor(paste0('m', rep(seq(1,10), 10))),
Year = rep(1990:1999, each = 10),
Ahat = rnorm(100, 0.5, 0.1)) %>%
mutate(Group = case_when(
Marker %in% c("m1", "m2", "m3") ~ "A",
Marker %in% c("m4", "m5", "m6") ~ "B",
Marker %in% c("m7", "m8") ~ "C",
TRUE ~ "D"))
# 2. Function to choose which Marker of sub_dt should be highlighted
getHighlightMarketBasedOntAhatValue <- function(sub_dt) {
sub_dt <- sub_dt %>%
group_by(Marker) %>%
mutate(mean_Ahat = mean(Ahat))
# using the mean of Ahat to choose is just a dummy example... instead of a single value you could also return a vector of values.
# Here I am no longer using the index (1, 2, ...) as in the first solution, but the factor itself.
highlightMarket <- first(sub_dt$Marker[sub_dt$mean_Ahat == max(sub_dt$mean_Ahat)])
}
# 3. Function to build plot for sub_dt
my_plot <- function(sub_dt, highlighted_one) {
custom_pallete = rep("grey", length(levels(sub_dt$Marker)))
names(custom_pallete) <- levels(sub_dt$Marker)
custom_pallete[highlighted_one] = "blue"
sub_dt %>% ggplot(aes(x = Year, # plot the subgroup passed in, not the global dt
y = Ahat,
color = as.factor(Marker))) +
geom_line(alpha = 0.5, size = 0.5) +
theme_classic() +
scale_y_continuous(name = "Predicted Value", breaks = pretty_breaks()) +
scale_colour_manual(name = "Marker", values = custom_pallete)
}
# 4. Main ----
# 4.1 Nesting ----
nested_dt <- dt %>%
group_by(Group) %>%
nest()
# 4.2 Choosing the Marker to highlight for each subgroup ----
nested_dt <- nested_dt %>%
mutate(highlighted_one = getHighlightMarketBasedOntAhatValue(data[[1]]))
# 4.3 Build plots ----
nested_dt <- nested_dt %>%
mutate(plot = map2(.x = data,
.y = highlighted_one,
.f = ~ my_plot(.x, .y)))
# 4.4 Use patchwork ... ----
# to combine plots ... see patchwork help to find out how to
# manage titles, labels, etc.
nested_dt %>% pull(plot) %>% patchwork::wrap_plots()
One way could be to set color as Marker.
Then you can change the color of the Marker in this line
scale_colour_manual(name = "Groups", values = c("black", "red", "blue", "orange", "green", "black", "red", "blue", "orange", "green")) +
Change the colors as you like:
ggplot(dt, aes(x = Year, y = Ahat, group = interaction(as.factor(Group), Marker), color = Marker ) ) +
geom_line(alpha = 0.5, size = 0.5) +
theme_classic() +
scale_y_continuous(name = "Predicted Value", breaks = pretty_breaks()) +
scale_colour_manual(name = "Groups", values = c("black", "red", "blue", "orange", "green",
"black", "red", "blue", "orange", "green")) +
facet_wrap(~Group)
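With 50K markers, listing one color per marker gets unwieldy. Another possible route, sketched here under assumptions rather than taken from the answers above, is to flag the markers of interest in a logical column and draw them as a second layer on top. The column name Highlight and the set.seed() call are hypothetical additions; "m1" and "m9" are the markers named in the question.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
dt <- data.frame(Marker = paste0("m", rep(1:10, 10)),
                 Year = rep(1990:1999, each = 10),
                 Ahat = rnorm(100, 0.5, 0.1))
dt$Group <- ifelse(dt$Marker %in% c("m1", "m2", "m3"), "A",
            ifelse(dt$Marker %in% c("m4", "m5", "m6"), "B",
            ifelse(dt$Marker %in% c("m7", "m8"), "C", "D")))

# Flag the lines to highlight (hypothetical column name)
dt$Highlight <- dt$Marker %in% c("m1", "m9")

p <- ggplot(dt, aes(x = Year, y = Ahat, group = Marker)) +
  # background lines, colored by group as in the original plot
  geom_line(data = dt[!dt$Highlight, ], aes(color = Group), alpha = 0.5) +
  # highlighted lines drawn last so they sit on top, in a fixed color
  geom_line(data = dt[dt$Highlight, ], color = "black") +
  scale_colour_manual(name = "Groups",
                      values = c(A = "black", B = "red", C = "blue", D = "orange")) +
  theme_classic() +
  facet_wrap(~Group)
p
```

Each highlighted marker still lands in its own facet, because the second layer carries the Group column too.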

scale_color_manual() for different geoms in ggplot

library(tidyverse)
delta <- tibble(
type = c("alpha", "beta", "gamma"),
a = rnorm(3, 5),
b = rnorm(3, 6)
) %>%
mutate(delta = abs(a - b)) %>%
gather(`a`, `b`, `delta`, key = "letter", value = "value")
ggplot(delta %>% filter(letter != "delta"), aes(type, value, fill = letter)) +
geom_col(position = "dodge") +
geom_col(data = delta %>% filter(letter == "delta"), width = 0.5) +
scale_color_manual("grey", "black", "blue")
I'd like the a and b bars to be grey and black. And the delta bar to be blue. How do I do this with scale_color_manual()? Seems my syntax above is off.
There are two things that need to be changed:
Since you've used fill = letter, you should use scale_fill_manual instead of scale_color_manual (which would have been appropriate if you had used color = letter).
The manual color values should be provided as a vector.
library(tidyverse)
delta <- tibble(
type = c("alpha", "beta", "gamma"),
a = rnorm(3, 5),
b = rnorm(3, 6)
) %>%
mutate(delta = abs(a - b)) %>%
gather(`a`, `b`, `delta`, key = "letter", value = "value")
ggplot(delta %>% filter(letter != "delta"), aes(type, value, fill = letter)) +
geom_col(position = "dodge") +
geom_col(data = delta %>% filter(letter == "delta"), width = 0.5) +
scale_fill_manual(values = c("grey", "black", "blue"))
Created on 2018-10-08 by the reprex package (v0.2.1)
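A small variation on the code above, offered as a sketch: naming the entries of values ties each color to a level explicitly instead of relying on their order. The data is rebuilt here with base R so the block stands alone; fill_values and the set.seed() call are assumptions added for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
a <- rnorm(3, 5)
b <- rnorm(3, 6)

# Long format: one row per type and letter
delta <- data.frame(
  type   = rep(c("alpha", "beta", "gamma"), times = 3),
  letter = rep(c("a", "b", "delta"), each = 3),
  value  = c(a, b, abs(a - b))
)

# Named vector: each level of `letter` gets its own color
fill_values <- c(a = "grey", b = "black", delta = "blue")

p <- ggplot(delta[delta$letter != "delta", ],
            aes(type, value, fill = letter)) +
  geom_col(position = "dodge") +
  geom_col(data = delta[delta$letter == "delta", ], width = 0.5) +
  scale_fill_manual(values = fill_values)
p
```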

Reordering legend items in ggplot from two different datasets and layers

I'm combining two layers in ggplot that were created from two different data sets and want to control the order in which the legend appears.
With example data and code:
base <-
data.frame(idea_num = c(1, 2),
value = c(-50, 90),
it_cost = c(30, 10))
group <-
data.frame(idea_num = c(1, 1, 2, 2),
group = c("a", "b", "a", "b"),
is_primary = c(TRUE, FALSE, FALSE, TRUE),
group_value = c(-40, -10, 20, 70))
base %>%
left_join(group) %>%
arrange(desc(value)) %>%
mutate(idea_num = idea_num %>% factor(levels = unique(idea_num)),
is_primary = is_primary %>% factor(levels = c("TRUE", "FALSE"))) %>%
ggplot(aes(x = idea_num, y = group_value, fill = is_primary)) +
geom_bar(stat = "identity") +
geom_bar(data = base %>%
arrange(desc(value)) %>%
mutate(idea_num = idea_num %>% factor(levels = unique(idea_num))),
aes(x = idea_num, y = it_cost, alpha = 0.1, fill = "it_cost"),
stat = "identity") +
scale_fill_manual(name = "Group", labels = c("TRUE" = "Primary", "FALSE" = "Secondary", "it_cost" = "IT Cost"),
values = c("TRUE" = "blue", "FALSE" = "red", "it_cost" = "black")) +
scale_alpha(guide = "none") +
theme(legend.position = "bottom")
I get a figure
but I'd like the legend to appear in the order of Primary, Secondary, IT Cost.
Were all of the numbers I'm trying to plot part of the same grand number, I could easily melt the dataframe and sum everything; however, the values from the group$group_value need to be displayed separate from base$it_cost.
If I plot only the values from the first layer, i.e.,
base %>%
left_join(group) %>%
arrange(desc(value)) %>%
mutate(idea_num = idea_num %>% factor(levels = unique(idea_num)),
is_primary = is_primary %>% factor(levels = c("TRUE", "FALSE"))) %>%
ggplot(aes(x = idea_num, y = group_value, fill = is_primary)) +
geom_bar(stat = "identity") +
scale_fill_manual(name = "Group", labels = c("TRUE" = "Primary", "FALSE" = "Secondary"),
values = c("TRUE" = "blue", "FALSE" = "red")) +
theme(legend.position = "bottom")
I get a figure I expect
How can I add the second layer and adjust the ordering of the legend boxes? I do not believe that this question or this question are entirely relevant to mine as the former is dealing with levels of a factor and the latter deals with ordering of multiple legends.
Can I do what I'd like to do? Is there a better way of constructing this plot?
Use the limits argument of scale_fill_manual() to set the order of the legend entries:
... +
scale_fill_manual(name = "Group",
labels = c("TRUE" = "Primary", "FALSE" = "Secondary", "it_cost" = "IT Cost"),
limits = c("TRUE", "FALSE", "it_cost"),
values = c("TRUE" = "blue", "FALSE" = "red", "it_cost" = "black")) +
...
This gives:
That said, I think you may want to consider a few different approaches:
A: Why do you create your data in such a complex way, ending up with multiple observations of IT Costs for the same idea number? I don't know your data, and you may well have your reasons, but a simple dataset along these lines:
  idea_num value      type
1        1   -40   Primary
2        1   -10 Secondary
3        2    20 Secondary
4        2    70   Primary
5        1   -50   IT Cost
6        2    90   IT Cost
would simplify things quite a bit.
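As a sketch of that idea, the long table can be typed in directly and plotted in a single layer. The object name long is hypothetical; the values are copied from the table above, and the legend order then comes from the factor levels:

```r
library(ggplot2)

# The long-format data from the table above (object name is hypothetical)
long <- data.frame(
  idea_num = c(1, 1, 2, 2, 1, 2),
  value    = c(-40, -10, 20, 70, -50, 90),
  type     = c("Primary", "Secondary", "Secondary", "Primary",
               "IT Cost", "IT Cost")
)

# The level order sets the legend order
long$type <- factor(long$type, levels = c("Primary", "Secondary", "IT Cost"))

p <- ggplot(long, aes(x = factor(idea_num), y = value, fill = type)) +
  geom_col(position = "dodge") +
  scale_fill_manual(name = "Group",
                    values = c("Primary" = "blue",
                               "Secondary" = "red",
                               "IT Cost" = "black")) +
  labs(x = "idea_num") +
  theme(legend.position = "bottom")
p
```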
B: Why do you want to stack/overplot these two separate barplots? I would do position="dodge" instead to have separate bars.
df2 <- base %>%
left_join(group) %>%
mutate(is_primary=paste0("pri_", is_primary+0)) %>%
spread(is_primary, group_value) %>%
gather(yvar, y, it_cost, pri_0, pri_1)
# pri_1 comes from is_primary == TRUE, so it is the "Primary" level
df2$yvar <- factor(df2$yvar, levels=c("pri_1", "pri_0", "it_cost"),
labels=c("Primary", "Secondary", "IT Cost"))
df2$idea_num <- factor(df2$idea_num, levels=c(2, 1))
ggplot(df2, aes(idea_num, y, fill=yvar)) +
geom_bar(stat="identity", position="dodge") +
scale_fill_manual("Group", values=c("blue", "red", "black")) +
scale_alpha(guide = "none") +
theme(legend.position = "bottom")
