ggplot2 - separating box plot labels by colour - r

I am trying to create a box plot with labels for some of the individual data points. The box plot is separated by two variables, mapped to x and colour. However, when I add labels using geom_text_repel from the ggrepel package (necessary for the real data), they separate by x but not by colour. See this minimal reproducible example:
library(ggplot2)
library(ggrepel)
## create dummy data frame
rep_id <- c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e")
dil <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
bleach_time <- c(0, 24, 0, 24, 0, 24, 0, 24, 0, 24)
a_i <- c(0.1, 0.2, 0.35, 0.2, 0.01, 0.4, 0.23, 0.1, 0.2, 0.5)
iex <- data.frame(rep_id, dil, bleach_time, a_i)
rm(rep_id, dil, bleach_time, a_i)
## Box plot of a_i separated by bleach_time and dil
p <- ggplot(iex, aes(x = as.character(bleach_time), y = a_i, fill = as.factor(dil))) +
geom_boxplot() +
geom_text_repel(aes(label = rep_id, colour = as.factor(dil)), na.rm = TRUE, segment.alpha = 0)
p
As you can see the labels are colour coded, but they are all lined up around the centre of each pair of plots rather than separated by the plots. I've tried nudge_x but that moves all the labels together. Is there a way I can move each set of labels individually?
For comparison here is the plot of my full data set with the outliers labelled - you can see how each set of labels isn't centred around the points it's labelling, complicating interpretation:

It looks like geom_text_repel needs position = position_dodge(width = __), not just the position = "dodge" shorthand I'd suggested, hence the error. You can mess around with setting the width; 0.7 looked okay to me.
library(tidyverse)
library(ggrepel)
ggplot(iex, aes(x = as.character(bleach_time), y = a_i, fill = as.factor(dil))) +
geom_boxplot() +
geom_text_repel(aes(label = rep_id, colour = as.factor(dil)), na.rm = TRUE,
segment.alpha = 0, position = position_dodge(width = 0.7))
Since you're plotting distributions, it might be important to keep positions along the y-axis the same, and only let geom_text_repel jitter along the x-axis, so I repeated the plot with direction = "x", which made me notice something interesting...
ggplot(iex, aes(x = as.character(bleach_time), y = a_i, fill = as.factor(dil))) +
geom_boxplot() +
geom_text_repel(aes(label = rep_id, colour = as.factor(dil)), na.rm = TRUE,
segment.alpha = 0, position = position_dodge(width = 0.7), direction = "x")
A couple of the labels are obscured because they have the same color as the fill of the box plots! You can fix this with a better combination of color and fill palettes. The quick fix I did was turning down the luminance of the color and turning up the luminance of the fill in the scale_*_discrete calls to make them distinct (but also pretty ugly).
ggplot(iex, aes(x = as.character(bleach_time), y = a_i, fill = as.factor(dil))) +
geom_boxplot() +
geom_text_repel(aes(label = rep_id, colour = as.factor(dil)), na.rm = TRUE,
segment.alpha = 0, position = position_dodge(width = 0.7), direction = "x") +
scale_color_discrete(l = 30) +
scale_fill_discrete(l = 100)
Note that you can also adjust the force used in the repel, so if you need the labels to not overlap but to also hug closer to the middles of the boxplots, you can mess around with that setting as well.
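As a rough sketch of that last point, the plot below reuses the question's dummy data and passes a smaller force to geom_text_repel so the labels stay closer to their dodged positions. The value 0.5 and the fixed seed are arbitrary choices for illustration, not tuned settings; ggrepel's default force is 1.

```r
library(ggplot2)
library(ggrepel)

# Same dummy data as in the question, rebuilt so the sketch is self-contained
iex <- data.frame(
  rep_id = c("a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
  dil = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  bleach_time = c(0, 24, 0, 24, 0, 24, 0, 24, 0, 24),
  a_i = c(0.1, 0.2, 0.35, 0.2, 0.01, 0.4, 0.23, 0.1, 0.2, 0.5)
)

p_force <- ggplot(iex, aes(x = as.character(bleach_time), y = a_i, fill = as.factor(dil))) +
  geom_boxplot() +
  geom_text_repel(aes(label = rep_id, colour = as.factor(dil)),
                  position = position_dodge(width = 0.7),
                  direction = "x",    # only repel horizontally
                  force = 0.5,        # weaker repulsion than the default of 1 (arbitrary value)
                  seed = 1,           # fixed seed so repeated draws look the same
                  segment.alpha = 0)
p_force
```

Raising force pushes overlapping labels further apart; lowering it keeps them nearer the middle of each box at the cost of more overlap.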

Related

Combined bar plot and points with offset in points when X values are not numbers ggplot2

I'm trying to obtain a plot like the one shown in Combined bar plot and points in ggplot2, with the points off to the side of the bars. One of the answers suggests subtracting an offset from the x values in geom_point(), and that works; my problem is that when the x values are not numbers, I can't subtract a number from them.
For example, this works:
df = data.frame(Xval = c(2, 4, 6), Yval = c(5, 6.1, 5.4))
ggplot() +
geom_bar(df, mapping = aes(Xval, Yval), stat = "identity", width = 0.5, color = "black", fill = "#92DAB8") +
geom_point(df, mapping = aes(Xval-.5, Yval))
But this does not work:
df2 = data.frame(Xval = c("A", "B", "C"), Yval = c(5, 6.1, 5.4))
ggplot() +
geom_bar(df2, mapping = aes(Xval, Yval), stat = "identity", width = 0.5, color = "black", fill = "#92DAB8") +
geom_point(df2, mapping = aes(Xval-.5, Yval))
Is there any way to do the offset like in the first plot? Ideally, I would like a solution that works in both cases, since I want to write a "plotter" function and won't know beforehand whether the values of X are numbers, but any solution (even one that handles the two cases differently depending on the X values) will be helpful. Maybe there is a way to get the actual X positions used in the plot, but I searched for solutions along those lines and found nothing. Thanks in advance!
In your case one option would be to use position_nudge to shift the points:
plot_fun <- function(.data, nudge_x = 0) {
ggplot(.data) +
geom_bar(aes(Xval, Yval), stat = "identity", width = 0.5, color = "black", fill = "#92DAB8") +
geom_point(aes(Xval, Yval), position = position_nudge(x = nudge_x))
}
library(ggplot2)
df = data.frame(Xval = c(2, 4, 6), Yval = c(5, 6.1, 5.4))
df2 = data.frame(Xval = c("A", "B", "C"), Yval = c(5, 6.1, 5.4))
plot_fun(df, nudge_x = -.5)
plot_fun(df2, nudge_x = -.5)

ggdist stat_halfeye not scaling correctly

I seem to have an error in the way my distribution looks. The bottom ridges of each of the facetted graphs are not at the same scale as the other ridges above, or relative to the number of counts (i.e. scale dots shown).
Is there a way to scale all distributions relative to one another?
season_names <- c(`0` = "COOL", `1` = "HOT-DRY",`2` = "HOT-WET")
dCLEAN %>%
ggplot(aes(x = tdb, y = as.factor(tsv), fill = as.factor(season))) +
ggdist::stat_halfeye(
adjust = 0.9,
justification = -0.15,
.width = 0,
point_colour = NA) +
geom_boxplot(
width = 0.2,
outlier.colour = NA,
alpha = 0.5)+
ggdist::stat_dots(
side = "left",
justification = 1.18,
binwidth = 0.1) +
facet_wrap(~ season, labeller = as_labeller(season_names)) +
theme_bw() +
theme(strip.background = element_rect(fill="white")) +
theme(legend.position = "none") +
scale_color_grey()+
scale_fill_grey()
Image of current graph (errors seem to be in the Cool graph -2 distribution, the Hot-Dry graph -1 distribution and the Hot-Wet graph -1 distribution).
By default, the densities are scaled to have equal area regardless of the number of observations. If you wish to scale the areas according to the number of observations, you can set aes(thickness = stat(pdf*n)) in stat_halfeye(). This sets the thickness of the slab according to the product of two computed variables generated by stat_halfeye(): the density (pdf) and the number of observations per group (n).
Here's an example from the raincloud plots section of the dotsinterval vignette:
library(dplyr)   # for %>%
library(ggplot2)
library(ggdist)
set.seed(12345) # for reproducibility
data.frame(
abc = c("a", "b", "b", "c"),
value = rnorm(200, c(1, 8, 8, 3), c(1, 1.5, 1.5, 1))
) %>%
ggplot(aes(y = abc, x = value, fill = abc)) +
stat_slab(aes(thickness = stat(pdf*n)), scale = 0.7) +
stat_dotsinterval(side = "bottom", scale = 0.7, slab_size = NA) +
scale_fill_brewer(palette = "Set2")
This example uses stat_slab() in place of stat_halfeye(). stat_slab() is a shortcut stat equivalent to stat_halfeye() but without the point and the interval (so it saves you passing .width = 0 and point_color = NA to stat_halfeye()).
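The same idea can be sketched directly with stat_halfeye(). The data below is a hypothetical stand-in with two groups of very different sizes, not the asker's dCLEAN. The sketch also uses after_stat(), which ggplot2 3.3.0 and later document as the replacement for stat(); if you are on an older ggplot2, keep stat(pdf*n) as written above.

```r
library(ggplot2)
library(ggdist)

set.seed(1)
# Hypothetical data: one small group and one large group
toy <- data.frame(
  grp = rep(c("small", "large"), times = c(20, 200)),
  value = c(rnorm(20, mean = 0), rnorm(200, mean = 3))
)

p_scaled <- ggplot(toy, aes(x = value, y = grp)) +
  stat_halfeye(
    aes(thickness = after_stat(pdf * n)),  # slab area proportional to group size
    .width = 0,                            # hide the interval, as in the question
    point_colour = NA                      # hide the point, as in the question
  )
p_scaled
```

With this mapping the "small" slab should come out visibly thinner than the "large" one, instead of both being drawn with equal area.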

control overlaying lines while color is continuous value in ggplot

I have some data and would like to plot lines while controlling the order in which they are drawn on top of each other.
I would like to use scale_color_viridis() for my palette, but I can't work out how to draw the lighter (yellow) line on top of the darker ones.
Here is my toy data frame and my code:
toy_data <- data.frame(x = c(1,3,1,2,5,0), y = c(0, 01, 1, 0.6, 1, .7),
col = rep(c("r", "b", "g"), each = 2), group = seq(0,1, by = 0.2))
ggplot(toy_data, aes(x = x, y = y, group = col, color = group)) +
geom_line(size = 2) +
scale_color_viridis()
Any idea how I can do this?
The group aesthetic determines the plotting order, in this case, the col variable which is character data. It will normally plot in alphabetical order (b g r), so to get the yellow line from col "g" to print last, you could convert it to a factor ordered in order of appearance, like with forcats::fct_inorder:
library(ggplot2)
library(dplyr) # provides %>% and if_else()
ggplot(toy_data,
aes(x = x, y = y, group = col %>% forcats::fct_inorder(), color = group)) +
geom_line(size = 2) +
scale_color_viridis_c() # added in ggplot2 3.0 in July 2018.
# scale_color_viridis for older ggplot2 versions
If col is numeric, you could achieve the same thing by giving your "top" series the biggest number.
toy_data2 <- data.frame(x = c(1,3,1,2,5,0), y = c(0, 01, 1, 0.6, 1, .7),
col = rep(c(3, 1, 2), each = 2), group = seq(0,1, by = 0.2))
ggplot(toy_data2,
aes(x = x, y = y, group = if_else(col == 2, 1e10, col), color = group)) +
geom_line(size = 2) +
scale_color_viridis_c()
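To see why the factor conversion changes the drawing order, it can help to compare the level orders directly. This is a small sketch using the same col values as toy_data; the variable names are only for illustration.

```r
library(forcats)

col <- rep(c("r", "b", "g"), each = 2)

# Default factor levels are sorted, so "r" would be the last group drawn
default_order <- levels(factor(col))
default_order
# "b" "g" "r"

# fct_inorder() keeps the order of first appearance, so "g" is drawn last
appearance_order <- levels(fct_inorder(col))
appearance_order
# "r" "b" "g"
```

Whichever group comes last in the level order ends up on top, so any reordering that puts the desired series last (for example forcats::fct_relevel with after = Inf) should work in the same way.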

How to stop ggrepel labels moving between gganimate frames in R/ggplot2?

I would like to add labels to the end of lines in ggplot, avoid them overlapping, and avoid them moving around during animation.
So far I can put the labels in the right place and hold them static using geom_text, but the labels overlap, or I can prevent them overlapping using geom_text_repel but the labels do not appear where I want them to and then dance about once the plot is animated (this latter version is in the code below).
I thought a solution might involve effectively creating a static layer in ggplot (p1 below) then adding an animated layer (p2 below), but it seems not.
How do I hold some elements of a plot constant (i.e. static) in an animated ggplot? (In this case, the labels at the end of lines.)
Additionally, with geom_text the labels appear as I want them - at the end of each line, outside of the plot - but with geom_text_repel, the labels all move inside the plotting area. Why is this?
Here is some example data:
library(dplyr)
library(ggplot2)
library(gganimate)
library(ggrepel)
set.seed(99)
# data
static_data <- data.frame(
hline_label = c("fixed_label_1", "fixed_label_2", "fixed_label_3", "fixed_label_4",
"fixed_label_5", "fixed_label_6", "fixed_label_7", "fixed_label_8",
"fixed_label_9", "fixed_label_10"),
fixed_score = c(2.63, 2.45, 2.13, 2.29, 2.26, 2.34, 2.34, 2.11, 2.26, 2.37))
animated_data <- data.frame(condition = c("a", "b")) %>%
slice(rep(1:n(), each = 10)) %>%
group_by(condition) %>%
mutate(time_point = row_number()) %>%
ungroup() %>%
mutate(score = runif(20, 2, 3))
and this is the code I am using for my animated plot:
# colours for use in plot
condition_colours <- c("red", "blue")
# plot static background layer
p1 <- ggplot(static_data, aes(x = time_point)) +
scale_x_continuous(breaks = seq(0, 10, by = 2), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(2, 3, by = 0.10), limits = c(2, 3), expand = c(0, 0)) +
# add horizontal line to show existing scores
geom_hline(aes(yintercept = fixed_score), alpha = 0.75) +
# add fixed labels to the end of lines (off plot)
geom_text_repel(aes(x = 11, y = fixed_score, label = hline_label),
hjust = 0, size = 4, direction = "y", box.padding = 1.0) +
coord_cartesian(clip = 'off') +
guides(colour = "none") +
labs(title = "[Title Here]", x = "Time", y = "Mean score") +
theme_minimal() +
theme(panel.grid.minor = element_blank(),
plot.margin = margin(5.5, 120, 5.5, 5.5))
# animated layer
p2 <- p1 +
geom_point(data = animated_data,
aes(x = time_point, y = score, colour = condition, group = condition)) +
geom_line(data = animated_data,
aes(x = time_point, y = score, colour = condition, group = condition),
show.legend = FALSE) +
scale_color_manual(values = condition_colours) +
geom_segment(data = animated_data,
aes(xend = time_point, yend = score, y = score, colour = condition),
linetype = 2) +
geom_text(data = animated_data,
aes(x = max(time_point) + 1, y = score, label = condition, colour = condition),
hjust = 0, size = 4) +
transition_reveal(time_point) +
ease_aes('linear')
# render animation
animate(p2, nframes = 50, end_pause = 5, height = 1000, width = 1250, res = 120)
Suggestions for consideration:
The specific repelling direction / amount / etc. in geom_text_repel is determined by a random seed. You can set seed to a constant value in order to get the same repelled positions in each frame of animation.
I don't think it's possible for repelled text to go beyond the plot area, even if you turn off clipping & specify some repel range outside plot limits. The whole point of that package is to keep text labels away from one another while remaining within the plot area. However, you can extend the plot area & use geom_segment instead of geom_hline to plot the horizontal lines, such that these lines stop before they reach the repelled text labels.
Since there are more geom layers using animated_data as their data source, it would be cleaner to put animated_data & associated common aesthetic mappings in the top level ggplot() call, rather than static_data.
Here's a possible implementation. Explanation in annotations:
p3 <- ggplot(animated_data,
aes(x = time_point, y = score, colour = condition, group = condition)) +
# static layers (assuming 11 is the desired ending point)
geom_segment(data = static_data,
aes(x = 0, xend = 11, y = fixed_score, yend = fixed_score),
inherit.aes = FALSE, colour = "grey25") +
geom_text_repel(data = static_data,
aes(x = 11, y = fixed_score, label = hline_label),
hjust = 0, size = 4, direction = "y", box.padding = 1.0, inherit.aes = FALSE,
seed = 123, # set a constant random seed
xlim = c(11, NA)) + # specify repel range to be from 11 onwards
# animated layers (only specify additional aesthetic mappings not mentioned above)
geom_point() +
geom_line() +
geom_segment(aes(xend = time_point, yend = score), linetype = 2) +
geom_text(aes(x = max(time_point) + 1, label = condition),
hjust = 0, size = 4) +
# static aesthetic settings (limits / expand arguments are specified in coordinates
# rather than scales, margin is no longer specified in theme since it's no longer
# necessary)
scale_x_continuous(breaks = seq(0, 10, by = 2)) +
scale_y_continuous(breaks = seq(2, 3, by = 0.10)) +
scale_color_manual(values = condition_colours) +
coord_cartesian(xlim = c(0, 13), ylim = c(2, 3), expand = FALSE) +
guides(colour = "none") +
labs(title = "[Title Here]", x = "Time", y = "Mean score") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
# animation settings (unchanged)
transition_reveal(time_point) +
ease_aes('linear')
animate(p3, nframes = 50, end_pause = 5, height = 1000, width = 1250, res = 120)

How do I group stacked bars in ggplot2 and modify colors for certain values?

Using ggplot2 I am trying to create a grouped AND stacked barchart without using faceting. I want to avoid faceting, because I need to facet on years once I have a grouped and stacked barchart for the variables provided in the example.
This is the best solution so far:
df <- data.frame("industry"=c("A","A", "B", "B", "C", "C",
"A","A", "B", "B", "C", "C"),
"value"=c(4,6,7,1, 5,9,8,3, 5,5,6,7),
"woman"=c(1,0,1,0,1,0,1,0,1,0,1,0),
"disabled"=c(1,1,1,1,1,1,0,0,0,0,0,0))
ggplot(df,aes(paste(industry,disabled),value))+
geom_col(aes(fill=factor(woman)))+
coord_flip()
This is basically what I want (see link above), but the bars should be grouped within each industry, using just one label for industry for both values of disabled. No label needed for disabled. The disabled=0 bars should have a faded color compared to the disabled=1 bars.
The intention of the chart is to display the distribution of employment across industries for the disabled population, compared to the general population (faded) and to show gender proportions for each population. (Values in example just for illustration).
Try this:
library(ggplot2)
ggplot(df, aes(interaction(disabled, industry), value, alpha = factor(woman))) +
geom_col(aes(fill = factor(woman))) +
scale_alpha_manual(values = c(0.5, 1)) +
scale_x_discrete(labels = c(0, 1, 0, 1, 0, 1)) +
annotate("text", label = "A", x = 1.5, y = -2) +
annotate("text", label = "B", x = 3.5, y = -2) +
annotate("text", label = "C", x = 5.5, y = -2) +
coord_flip(ylim = c(0, 15), clip = "off", expand = TRUE) +
theme(axis.title.y = element_blank())
We map alpha to factor(woman) and set the level-specific alpha values with scale_alpha_manual(). The subgroup 0/1 labels are set manually with scale_x_discrete(). The group labels are placed with annotate(); they can sit outside the plotting area because the coordinate system is created with clip = "off". Only one coordinate system applies per plot, so the limits and clipping are set in coord_flip() itself.
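The hard-coded labels c(0, 1, 0, 1, 0, 1) and the annotation positions 1.5, 3.5 and 5.5 depend on the level order that interaction() produces. The sketch below rebuilds the question's df and derives both, so the numbers can be checked rather than assumed. The helper names (x_levels, industry_of_level, label_x) are made up for this illustration.

```r
# Same values as the df in the question
df <- data.frame(
  industry = rep(rep(c("A", "B", "C"), each = 2), times = 2),
  value = c(4, 6, 7, 1, 5, 9, 8, 3, 5, 5, 6, 7),
  woman = rep(c(1, 0), times = 6),
  disabled = rep(c(1, 0), each = 6)
)

# interaction() varies its first argument fastest, so the two disabled
# values for each industry sit next to each other on the axis
x_levels <- levels(interaction(df$disabled, df$industry))
x_levels
# "0.A" "1.A" "0.B" "1.B" "0.C" "1.C"

# Strip the "disabled." prefix, then average the axis positions per industry
industry_of_level <- sub("^[^.]*\\.", "", x_levels)
label_x <- tapply(seq_along(x_levels), industry_of_level, mean)
label_x
# A = 1.5, B = 3.5, C = 5.5
```

Computing the positions this way might be preferable inside a reusable function, since adding an industry would otherwise require editing the annotate() calls by hand.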

Resources