ggplot2: Adjust legend symbols in overlaid plot

I need to create a plot in which a histogram is overlaid with a density curve. Here is my result so far, using some example data:
library("ggplot2")
set.seed(1234)
a <- round(rnorm(10000, 5, 5), 0)
b <- rnorm(10000, 5, 7)
df <- data.frame(a, b)
ggplot(df) +
geom_histogram(aes(x = a, y = ..density.., col = "histogram", linetype = "histogram"), fill = "blue") +
stat_density(aes(x = b, y = ..density.., col = "density", linetype = "density"), geom = "line") +
scale_color_manual(values = c("red", "white"),
breaks = c("density", "histogram")) +
scale_linetype_manual(values = c("solid", "solid")) +
theme(legend.title = element_blank(),
legend.position = c(.75, .75),
legend.text = element_text(size = 15))
Unfortunately, I cannot figure out how to change the symbols in the legend properly. The first symbol should be a relatively thick red line, and the second symbol should be a blue box without the white line in the middle.
Based on some internet research, I tried to change different things in scale_linetype_manual and further I tried to use override.aes, but I could not figure out how I would have to use it in this specific case.
EDIT - Here is the best solution based on the very helpful answers below.
ggplot(df) +
geom_histogram(aes(x = a, y = ..density.., linetype = "histogram"),
fill = "blue",
# I added the following 2 lines to keep the white colour around the histogram.
col = "white") +
scale_linetype_manual(values = c("solid", "solid")) +
stat_density(aes(x = b, y = ..density.., linetype = "density"),
geom = "line", color = "red") +
theme(legend.title = element_blank(),
legend.position = c(.75, .75),
legend.text = element_text(size = 15),
legend.key = element_blank()) +
guides(linetype = guide_legend(override.aes = list(linetype = c(1, 0),
fill = c("white", "blue"),
size = c(1.5, 1.5))))

As you thought, most of the work can be done via override.aes for linetype.
Note I removed color from the aes of both layers to avoid some trouble I was having with the legend box outline. Doing this also avoids the need for the scale_*_* function calls. To set the color of the density line I used color outside of aes.
In override.aes I set the linetype to be solid or blank, the fill to be either white or blue, and the size to be 2 or 0 for the density box and histogram box, respectively.
ggplot(df) +
geom_histogram(aes(x = a, y = ..density.., linetype = "histogram"), fill = "blue") +
stat_density(aes(x = b, y = ..density.., linetype = "density"), geom = "line", color = "red") +
theme(legend.title = element_blank(),
legend.position = c(.75, .75),
legend.text = element_text(size = 15),
legend.key = element_blank()) +
guides(linetype = guide_legend(override.aes = list(linetype = c(1, 0),
fill = c("white", "blue"),
size = c(2, 0))))
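The vectors passed to override.aes are matched to the legend keys by position, so it helps to know the key order. A minimal sketch in base R, assuming the default behaviour where a discrete scale sorts character values and no breaks or limits are set (the `overrides` data frame is only an illustration, not something ggplot2 requires):

```r
# Legend keys as they are written in aes(linetype = ...)
keys <- c("histogram", "density")

# A discrete scale orders character values by sorting them
legend_order <- sort(unique(keys))

# One row per legend key, in the order override.aes expects
overrides <- data.frame(
  key      = legend_order,
  linetype = c(1, 0),            # solid line for density, blank for histogram
  fill     = c("white", "blue"), # white key for density, blue box for histogram
  size     = c(2, 0)             # thick line for density, no outline for histogram
)
print(overrides)
```

Because "density" sorts before "histogram", the first element of each vector applies to the density key and the second to the histogram key.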

The fill and colour aesthetics are labelled "histogram" and "density" respectively, and their values are set using scale_*_manual. This maps directly to the desired legend without needing any overrides.
ggplot(df) +
geom_histogram(aes(x = a, y = ..density.., fill = "histogram")) +
stat_density(aes(x = b, y = ..density.., colour="density"), geom = "line") +
scale_fill_manual(values = c("blue")) +
scale_colour_manual(values = c("red")) +
labs(fill="", colour="") +
theme(legend.title = element_blank(),
legend.position = c(.75, .75),
legend.box.just = "left",
legend.background = element_rect(fill=NULL),
legend.key = element_rect(fill=NULL),
legend.text = element_text(size = 15))
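The `..density..` notation used throughout this page was deprecated in ggplot2 3.4.0 in favour of after_stat(). A minimal sketch of the same fill/colour approach with the newer notation, assuming ggplot2 3.4.0 or later; the named values and the `bins = 30` setting are choices made here for illustration, not part of the original answer:

```r
library(ggplot2)

set.seed(1234)
df <- data.frame(a = round(rnorm(10000, 5, 5), 0),
                 b = rnorm(10000, 5, 7))

p <- ggplot(df) +
  # fill is mapped to a constant label, so the histogram gets a box key
  geom_histogram(aes(x = a, y = after_stat(density), fill = "histogram"),
                 bins = 30) +
  # colour is mapped to a constant label, so the density gets a line key
  stat_density(aes(x = b, y = after_stat(density), colour = "density"),
               geom = "line") +
  # named values make the label-to-colour pairing explicit
  scale_fill_manual(values = c(histogram = "blue")) +
  scale_colour_manual(values = c(density = "red")) +
  theme(legend.title = element_blank())

p
```

Naming the values avoids relying on the order of the labels if more keys are added later.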

Related

Color dataset by group and add geom_vline to legend only

I have a genome-wide dataset that I'm trying to plot in the following way:
1. Have each chromosome be a separate color
2. Have specific windows highlighted by a bar (I'm using geom_vline) - this I'm getting from a separate table
3. Have only geom_vline feature in the legend
I have tried many different things, but it seems I cannot have all three together!
Here is the link to both datasets:
allStats & allStats_fstPi_group15
With this code, I can have the first 2, but not the 3rd:
ggplot(allStats, aes(x = mid2, y = Fst_group1_group5,
color = as_factor(scaffold))) +
geom_point(size = 2) +
geom_vline(xintercept = chrom$add, color = "grey") +
scale_y_continuous(expand = c(0,0), limits = c(0, 1)) +
scale_x_continuous(labels = chrom$chrID, breaks = axis_set$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), unique(length(chrom$chrID)))) +
scale_size_continuous(range = c(0.5,3)) +
labs(x = NULL,
y = "Fst SBM vs OC") +
theme_minimal() +
theme(
legend.position = "none",
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
axis.title.y = element_text(),
axis.text.x = element_text()) +
geom_vline(data = allStats_fstPi_group15,
aes(xintercept = allStats_fstPi_group15$mid2),
color = "orange", show.legend = T)
With this one I can get 2 and 3 only (I'm not able to color code each block separately):
cols <- c("SBM vs OC" = rep(c("#276FBF", "#183059"), unique(length(chrom$chrID))),
"90th percentile (Fst vs Pi)" = "orange")
ggplot(allStats, aes(x = mid2)) +
geom_point(aes(y = Fst_group1_group5,
color = as_factor(scaffold)),
size = 2) +
geom_vline(data = allStats_fstPi_group15,
aes(xintercept = allStats_fstPi_group15$mid2,
color = "90th percentile (Fst vs Pi)")) +
scale_color_manual(values = cols)
I've seen that for an item to appear in the legend, color needs to be mapped within aes(), so my question is: is what I'm trying to do impossible?

How to set a conditional size scale based on name in ggplot?

Below is a simple bubble plot for three character traits (Lg_chr, Mid_chr, and Sm_chr) across three locations.
All good, except that because the range of Lg_chr is several orders of magnitude larger than the ranges for the other two traits, it swamps out the area differences between the smaller states, making the differences very difficult to see - for example, the areas of the points for Location_3's Mid_chr (70) and Sm_chr (5) look almost the same.
Is there a way to set a conditional size scale based on name in ggplot2 without having to facet wrap them? Maybe a conditional statement for scale_size_continuous(range = c(<>, <>)) separately for Lg_chr, Mid_chr, and Sm_chr?
library(ggplot2)
library(tidyr)   # pivot_longer() and %>%
library(stringr) # str_to_title()
test_df = data.frame(lg_chr = c(100000, 150000, 190000),
mid_chr = c(50, 90, 70),
sm_chr = c(15, 10, 5),
names = c("location_1", "location_2", "location_3"))
#reformat for graphing
test_df_long<- test_df %>% pivot_longer(!names,
names_to = c("category"),
values_to = "value")
#plot
ggplot(test_df_long,
aes(x = str_to_title(category),
y = str_to_title(names),
colour = str_to_title(names),
size = value)) +
geom_point() +
geom_text(aes(label = value),
colour = "white",
size = 3) +
scale_x_discrete(position = "top") +
scale_size_continuous(range = c(10, 50)) +
scale_color_manual(values = c("blue", "red",
"orange")) +
labs(x = NULL, y = NULL) +
theme(legend.position = "none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank())
Edit:
You could use ggplot_build to manually modify the point layer [[1]] to specify the sizes of your points like this:
#plot
p <- ggplot(test_df_long,
aes(x = str_to_title(category),
y = str_to_title(names),
colour = str_to_title(names),
size = value)) +
geom_point() +
geom_text(aes(label = value),
colour = "white",
size = 3) +
scale_x_discrete(position = "top") +
scale_color_manual(values = c("blue", "red",
"orange")) +
labs(x = NULL, y = NULL) +
theme(legend.position = "none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank())
q <- ggplot_build(p)
q$data[[1]]$size <- c(7,4,1,8,5,2,9,6,3)*5
q <- ggplot_gtable(q)
plot(q)
You could use scale_size with a log10 transformation to make the differences more visible, like this:
#plot
ggplot(test_df_long,
aes(x = str_to_title(category),
y = str_to_title(names),
colour = str_to_title(names),
size = value)) +
geom_point() +
geom_text(aes(label = value),
colour = "white",
size = 3) +
scale_size(trans="log10", range = c(10, 50)) +
scale_x_discrete(position = "top") +
scale_color_manual(values = c("blue", "red",
"orange")) +
labs(x = NULL, y = NULL) +
theme(legend.position = "none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank())
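A third option, closer to the "conditional size scale" asked about, is to rescale the values within each category before plotting and then use the result as the size directly. This is a minimal sketch under stated assumptions: `rescale_to` and `point_size` are hypothetical names introduced here, and the 10 to 50 range simply mirrors the range used in the question.

```r
library(ggplot2)

# Same toy data as in the question, already in long format
test_df_long <- data.frame(
  names    = rep(c("location_1", "location_2", "location_3"), each = 3),
  category = rep(c("lg_chr", "mid_chr", "sm_chr"), times = 3),
  value    = c(100000, 50, 15,
               150000, 90, 10,
               190000, 70, 5)
)

# Hypothetical helper: linearly rescale x onto [lo, hi]
rescale_to <- function(x, lo = 10, hi = 50) {
  rng <- range(x)
  if (diff(rng) == 0) return(rep((lo + hi) / 2, length(x)))
  lo + (x - rng[1]) / diff(rng) * (hi - lo)
}

# Rescale within each category so every trait spans the full size range
test_df_long$point_size <- ave(test_df_long$value, test_df_long$category,
                               FUN = rescale_to)

p <- ggplot(test_df_long,
            aes(x = category, y = names, colour = names, size = point_size)) +
  geom_point() +
  geom_text(aes(label = value), colour = "white", size = 3) +
  scale_size_identity() +  # use point_size as-is, no further scaling
  scale_color_manual(values = c("blue", "red", "orange")) +
  theme(legend.position = "none")

p
```

With this approach the sizes are only comparable within a column, not across columns, and scale_size_identity() uses the numbers as point sizes rather than mapping them to area.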

Adding 2 vlines to a ggplot, with an additional custom legend for the lines

I'm trying to have a ggplot with two vertical lines on it, with a separate custom legend to explain what the lines represent.
This is my code (using iris):
irate <- as.data.frame(iris)
irate$Species <- as.character(irate$Species)
irritating <- ggplot(irate) +
geom_line(aes(y = Sepal.Length, x = Sepal.Width), color = "blue") +
geom_point(aes(y = Sepal.Length, x = Sepal.Width, color = Species), size = 5) +
theme(legend.position = "right", axis.text.y = element_blank(), axis.title.y = element_blank(), axis.ticks.y = element_blank(), panel.grid.major.y = element_blank())+
labs(title = "The chart", x = "Sepal Width") +
geom_vline(color = "black", linetype = "dashed", aes(xintercept = 3))+
geom_vline(color = "purple", linetype = "dashed", aes(xintercept = 4))
irritating
I've tried using things like scale_color_manual (etc), but for some reason when doing so it will interfere with the main legend and not produce a separate one.
Using answers to questions like "Add legend to geom_vline",
I add: +scale_color_manual(name = "still problematic", values = c("black", "purple", "red"))
Adding "red" to the vector is the only way to get it to produce a chart (otherwise there's an "Insufficient values in manual scale. 3 needed but only 2 provided." error).
One option to achieve your desired result would be to use a different aesthetic to create the color legend for your vlines. In my code below I map on the linetype aes and use the override.aes argument of guide_legend to assign the right colors:
irate <- as.data.frame(iris)
irate$Species <- as.character(irate$Species)
library(ggplot2)
base <- ggplot(irate) +
geom_line(aes(y = Sepal.Length, x = Sepal.Width), color = "white") +
geom_point(aes(y = Sepal.Length, x = Sepal.Width, color = Species), size = 5) +
theme(legend.position = "right", axis.text.y = element_blank(), axis.title.y = element_blank(), axis.ticks.y = element_blank(), panel.grid.major.y = element_blank())+
labs(title = "The chart", x = "Sepal Width")
base +
geom_vline(color = "black", aes(xintercept = 3, linetype = "Black Line"))+
geom_vline(color = "purple", aes(xintercept = 4, linetype = "Purple line")) +
scale_linetype_manual(name = "still problematic", values = c("dashed", "dashed")) +
guides(linetype = guide_legend(override.aes = list(color = c("black", "purple"))))
A second and perhaps cleaner solution would be to use the ggnewscale package, which allows multiple legends for the same aesthetic:
library(ggnewscale)
base +
new_scale_color() +
geom_vline(linetype = "dashed", aes(xintercept = 3, color = "Black Line"))+
geom_vline(linetype = "dashed", aes(xintercept = 4, color = "Purple line")) +
scale_color_manual(name = "still problematic", values = c("black", "purple"))
Here is a way with package ggnewscale that makes plotting two legends for two color mappings very easy.
The main trick is to create a data.frame with the x intercept values and colors, then assign this data set to the data argument of geom_vline. If this is run after new_scale_color() the colors will be the right ones.
library(ggplot2)
library(ggnewscale)
irate <- iris
irate$Species <- as.character(irate$Species)
happy <- data.frame(xintercept = c(3, 4), color = c("black", "purple"))
delightful <- ggplot(irate) +
geom_line(aes(y = Sepal.Length, x = Sepal.Width), color = "blue") +
geom_point(aes(y = Sepal.Length, x = Sepal.Width, color = Species), size = 5) +
theme(legend.position = "right", axis.text.y = element_blank(), axis.title.y = element_blank(), axis.ticks.y = element_blank(), panel.grid.major.y = element_blank())+
labs(title = "The chart", x = "Sepal Width") +
new_scale_color() +
geom_vline(
data = happy,
mapping = aes(xintercept = xintercept, color = color),
linetype = "dashed"
) +
scale_color_manual(values = c(black = "black", purple = "purple"))
delightful
Created on 2022-11-30 with reprex v2.0.2
By mapping linetype inside aes() to put the lines in the legend, you can then override the colours displayed in the guide:
library(ggplot2)
irate <- as.data.frame(iris)
irate$Species <- as.character(irate$Species)
irritating <- ggplot(irate) +
geom_line(aes(y = Sepal.Length, x = Sepal.Width), color = "white") +
geom_point(aes(y = Sepal.Length, x = Sepal.Width, color = Species), size = 5) +
theme(
legend.position = "right",
axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank()
) +
labs(title = "The chart", x = "Sepal Width") +
geom_vline(linewidth = 1.5,
color = "black",
aes(xintercept = 3, linetype = "Something")) +
geom_vline(linewidth = 1.5,
color = "purple",
aes(xintercept = 4, linetype = "Another thing")) +
scale_linetype_manual(
"Things",
values = c("dashed", "dashed"),
guide = guide_legend(override.aes = list(colour = c("purple", "black")))
)
irritating

ggplot pointrange with multiple categories - change fill color

I have the following plot:
ggplot() +
geom_pointrange(data=data_FA, mapping=aes(x=snr, y=median, ymin=p25, ymax=p75, colour=factor(method), group=method), position = pd) +
geom_hline(yintercept=FA_GT, linetype="dashed", color = "blue") +
theme(legend.title = element_blank(), legend.position = "none", panel.border = element_rect(colour = "gray", fill=NA, size=1),
plot.margin = unit( c(0,0.5,0,0) , units = "lines" )) +
labs( title = "", subtitle = "")
obtained from the following dataset:
For each group (red and blue) codified by the factor method, I want to see red/blue dots and lines with different transparency according to the factor subset. Does anyone know how to do that? In addition, how can I add more separation space between the two groups (red and blue)?
Thank you!
You can just map alpha to subset inside aes:
ggplot(data_FA) +
geom_pointrange(aes(snr, median, ymin = p25, ymax = p75,
colour = factor(method), group = method,
alpha = subset),
position = pd) +
geom_hline(yintercept = FA_GT, linetype = "dashed", color = "blue") +
scale_alpha_manual(values = c(0.3, 1)) +
theme_bw() +
theme(legend.position = 'none',
panel.border = element_rect(colour = "gray", fill = NA, size = 1),
plot.margin = unit( c(0,0.5,0,0), units = "lines" )) +
labs(title = "", subtitle = "")
Data
data_FA <- data.frame(X = c("X1", "X1.7", "X1.14", "X1.21"),
snr = "snr10",
subset = c("full", "full", "subset5", "subset5"),
method= c("sc", "trunc", "sc", "trunc"),
median = c(0.4883985, 0.4883985, 0.4923685, 0.4914260),
p25 = c(0.4170183, 0.4170183, 0.4180174, 0.4187472),
p75 = c(0.5617713, 0.5617713, 0.5654203, 0.5661565))
FA_GT <- 0.513
pd <- position_dodge2(width = 1)
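With unnamed values, scale_alpha_manual() assigns them to the levels in sorted order, so c(0.3, 1) gives "full" an alpha of 0.3 and "subset5" an alpha of 1. A minimal sketch that names the values to make the pairing explicit; which subset should be faded is an assumption made here, so swap the numbers if the opposite is wanted:

```r
library(ggplot2)

data_FA <- data.frame(
  snr    = "snr10",
  subset = c("full", "full", "subset5", "subset5"),
  method = c("sc", "trunc", "sc", "trunc"),
  median = c(0.4883985, 0.4883985, 0.4923685, 0.4914260),
  p25    = c(0.4170183, 0.4170183, 0.4180174, 0.4187472),
  p75    = c(0.5617713, 0.5617713, 0.5654203, 0.5661565)
)

# Assumption: "full" is drawn opaque and "subset5" faded
alpha_values <- c(full = 1, subset5 = 0.3)

p <- ggplot(data_FA) +
  geom_pointrange(aes(snr, median, ymin = p25, ymax = p75,
                      colour = factor(method), group = method,
                      alpha = subset),
                  position = position_dodge2(width = 1)) +
  scale_alpha_manual(values = alpha_values) +
  theme_bw() +
  theme(legend.position = "none")

p
```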

ggMarginal ignores coord_cartesian. How to change marginal scales?

I'm trying to plot a 2D density plot with ggplot, with added marginal histograms. Problem is that the polygon rendering is stupid and needs to be given extra padding to render values outside your axis limits (e.g. in this case I set limits between 0 and 1, because values outside this range have no physical meaning). I still want the density estimate though, because often it's much cleaner than a blocky 2D heatmap.
Is there a way around this problem, besides scrapping ggMarginal entirely and spending another 50 lines of code trying to align histograms?
Unsightly lines:
Now rendering works, but ggMarginal ignores coord_cartesian(), which demolishes the plot:
Data here:
http://pasted.co/b581605a
dataset <- read.csv("~/Desktop/dataset.csv")
library(ggplot2)
library(ggthemes)
library(ggExtra)
plot_center <- ggplot(data = dataset, aes(x = E,
y = S)) +
stat_density2d(aes(fill=..level..),
bins= 8,
geom="polygon",
col = "black",
alpha = 0.5) +
scale_fill_continuous(low = "yellow",
high = "red") +
scale_x_continuous(limits = c(-1,2)) + # Render padding for polygon
scale_y_continuous(limits = c(-1,2)) + #
coord_cartesian(ylim = c(0, 1),
xlim = c(0, 1)) +
theme_tufte(base_size = 15, base_family = "Roboto") +
theme(axis.text = element_text(color = "black"),
panel.border = element_rect(colour = "black", fill=NA, size=1),
legend.text = element_text(size = 12, family = "Roboto"),
legend.title = element_blank(),
legend.position = "none")
ggMarginal(plot_center,
type = "histogram",
col = "black",
fill = "orange",
margins = "both")
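You can solve this problem by using xlim() and ylim() instead of coord_cartesian(). Note that xlim() and ylim() set scale limits, so they replace the limits given to scale_x_continuous() and scale_y_continuous() above, and observations outside [0, 1] are dropped before the density is computed.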
dataset <- read.csv("~/Desktop/dataset.csv")
library(ggplot2)
library(ggthemes)
library(ggExtra)
plot_center <- ggplot(data = dataset, aes(x = E,
y = S)) +
stat_density2d(aes(fill=..level..),
bins= 8,
geom="polygon",
col = "black",
alpha = 0.5) +
scale_fill_continuous(low = "yellow",
high = "red") +
scale_x_continuous(limits = c(-1,2)) + # Render padding for polygon
scale_y_continuous(limits = c(-1,2)) + #
xlim(c(0,1)) +
ylim(c(0,1)) +
theme_tufte(base_size = 15, base_family = "Roboto") +
theme(axis.text = element_text(color = "black"),
panel.border = element_rect(colour = "black", fill=NA, size=1),
legend.text = element_text(size = 12, family = "Roboto"),
legend.title = element_blank(),
legend.position = "none")
ggMarginal(plot_center,
type = "histogram",
col = "black",
fill = "orange",
margins = "both")
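The difference between the two approaches comes down to where the limits are applied. A minimal sketch on made-up values, assuming the default out-of-bounds handling for continuous scales (censoring, shown here with scales::censor()); the vector `x` and the data frame `d` are illustrative only:

```r
library(ggplot2)

x <- c(-0.5, 0.25, 0.75, 1.5)

# Scale limits censor out-of-range values to NA, so they never reach the stat
censored <- scales::censor(x, range = c(0, 1))
print(censored)

d <- data.frame(x = x, y = x)

# Zooms the view only; all four points remain part of the data
p_zoom <- ggplot(d, aes(x, y)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 1))

# Sets scale limits; the two outer points are dropped when the plot is drawn
p_limits <- ggplot(d, aes(x, y)) +
  geom_point() +
  xlim(0, 1) +
  ylim(0, 1)
```

Since, as the question observes, ggMarginal does not pick up coord_cartesian(), limits set on the scales are the ones that carry over to the marginal histograms, at the cost of discarding the out-of-range observations.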
