Add label to geom_vline within a ggplot2 figure - r

So far I have managed to build the following geom_density figure using ggplot2:
cuts1 <- data.frame(Ref="p", vals=c(140))
cuts2 <- data.frame(Ref="s", vals=c(300))
cuts3 <- data.frame(Ref="m", vals=c(250))
cuts <- rbind(cuts1, cuts2, cuts3)
ggplot(mtcars, aes(x=disp)) +
geom_density(color = "black",
fill = 4,
alpha = 1) +
geom_vline(data = cuts , aes(xintercept=vals, color= Ref) )
I wondered if someone knew a way to draw the geom_vline lines more like this: [figure: target plot]
The lines should not reach the top and bottom of the figure, and the labels should all be displayed with a rotation.

Here is one potential solution:
library(ggplot2)
cuts1 <- data.frame(Ref="p", vals=c(140))
cuts2 <- data.frame(Ref="s", vals=c(300))
cuts3 <- data.frame(Ref="m", vals=c(250))
cuts <- rbind(cuts1, cuts2, cuts3)
ggplot(mtcars, aes(x=disp)) +
geom_density(color = "black",
fill = 4,
alpha = 1) +
geom_segment(data = cuts, aes(x=vals, xend = vals,
y = 0, yend = max(density(mtcars$disp)[[2]]),
color= Ref), key_glyph = "vpath") +
geom_text(data = cuts, aes(x = vals, y = max(density(mtcars$disp)[[2]]) * 1.02,
label = Ref), nudge_x = 5, angle = 45)
Created on 2022-08-29 by the reprex package (v2.0.1)

Take a look at geom_segment, you can set the yend parameter to where you want your lines to end.
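As a minimal sketch of that idea, the segments can be made to stop at the density curve itself rather than at its global maximum. This assumes that base R's density() with default settings is close enough to what geom_density() draws; the height column is a name introduced here for illustration.

```r
library(ggplot2)

cuts <- data.frame(Ref = c("p", "s", "m"), vals = c(140, 300, 250))

# Estimate the density once, then interpolate its height at each cut.
# Assumption: the default bandwidth matches the one geom_density() uses.
d <- density(mtcars$disp)
cuts$height <- approx(d$x, d$y, xout = cuts$vals)$y

p <- ggplot(mtcars, aes(x = disp)) +
  geom_density(colour = "black", fill = 4, alpha = 1) +
  # Each segment runs from the axis up to the curve at its own x position
  geom_segment(data = cuts,
               aes(x = vals, xend = vals, y = 0, yend = height, colour = Ref),
               key_glyph = "vpath") +
  # Rotated labels, pushed slightly past the end of each segment
  geom_text(data = cuts,
            aes(x = vals, y = height, label = Ref),
            angle = 90, hjust = -0.3)
p
```

Setting yend to a fixed value instead of height gives lines of equal length, as in the answer above.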

Related

Customize the position of `geom_rug`

Below is a working example
library(ggplot2)
set.seed(926)
df <- data.frame(expression = rnorm(900),
time = c(rnorm(300), rnorm(300, 1, 2), rnorm(300, 2,0.5)),
membership = factor(rep(1:3, each = 300)))
ggplot(df, aes(x = time, y = expression, fill = membership)) +
geom_point(shape=21, size = 3) +
geom_rug(data = subset(df, membership ==3), sides = "b", color = "green", length = unit(1.5, "cm")) +
geom_rug(data = subset(df, membership ==2), sides = "b", color = "blue", length = unit(1, "cm")) +
geom_rug(data = subset(df, membership ==1), sides = "b", color = "red") +
scale_y_continuous(expand = c(0.3, 0))
My hope is something like this: [figure: desired rug layout]
Note that I know about the outside = TRUE and sides = "tb" options. But placing all rug plots at the bottom is what I really hope for.
geom_rug is designed to be drawn at the margins of a plot. It's probably best to use geom_point with a custom symbol in this case:
ggplot(df, aes(x = time, y = expression, fill = membership)) +
geom_point(shape=21, size = 3) +
geom_point(aes(y = -as.numeric(membership) - 2.5, color = membership),
shape = "|", size = 8) +
geom_hline(yintercept = -3) +
theme_classic(base_size = 20) +
scale_y_continuous(breaks = c(-2, 0, 2))
I don't think the position of geom_rug() can be easily customised. I'd recommend to use geom_segment() instead to draw the rugs like you'd want them.
library(ggplot2)
set.seed(926)
df <- data.frame(expression = rnorm(900),
time = c(rnorm(300), rnorm(300, 1, 2), rnorm(300, 2,0.5)),
membership = factor(rep(1:3, each = 300)))
# Helper variables
limits <- range(df$expression)
step <- diff(limits) * 0.1
size <- 0.45 * step
ggplot(df, aes(x = time, y = expression, fill = membership)) +
geom_point(shape=21, size = 3) +
geom_segment(
aes(
colour = membership,
xend = time,
y = limits[1] - as.numeric(membership) * step + size,
yend = limits[1] - as.numeric(membership) * step - size
)
)
Created on 2022-12-12 by the reprex package (v2.0.1)

How to add a histogram or density plot on the right hand side of this example plot to describe the distribution of y-values?

To make it clear, I am looking for a simple way of adding a 90-degree-rotated histogram or density plot whose x-axis aligns with the y-axis of the example plot given below.
library(ggplot2)
library(tibble)
x <- seq(100)
y <- rnorm(100)
my_data <- tibble(x = x, y = y)
ggplot(data = my_data, mapping = aes(x = x, y = y)) +
geom_line()
Created on 2019-01-28 by the reprex package (v0.2.1)
I'd try it with either geom_histogram or geom_density, the patchwork library, and dynamically setting limits to match the plots.
Rather than manually setting limits, get the range of y-values, set that as the limits in scale_y_continuous or scale_x_continuous as appropriate, and add some padding with expand_scale (deprecated since ggplot2 3.3.0 in favour of expansion(), which takes the same mult argument). The first plot is the line plot, and the second and third are distribution plots, with the axes flipped. All have the scales set to match.
library(ggplot2)
library(tibble)
library(patchwork)
y_range <- range(my_data$y)
p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) +
geom_line() +
scale_y_continuous(limits = y_range, expand = expand_scale(mult = 0.1))
p2_hist <- ggplot(my_data, aes(x = y)) +
geom_histogram(binwidth = 0.2) +
coord_flip() +
scale_x_continuous(limits = y_range, expand = expand_scale(mult = 0.1))
p2_dens <- ggplot(my_data, aes(x = y)) +
geom_density() +
coord_flip() +
scale_x_continuous(limits = y_range, expand = expand_scale(mult = 0.1))
patchwork allows you to simply add plots to each other, then add the plot_layout function where you can customize the layout.
p1 + p2_hist + plot_layout(nrow = 1)
p1 + p2_dens + plot_layout(nrow = 1)
I've generally seen these types of plots where the distribution is shown in a "marginal" plot, that is, set up to be secondary to the main (in this case, line) plot. The ggExtra package has a marginal plot, but it only seems to work where the main plot is a scatterplot.
To do this styling manually, I'm setting theme arguments on each plot inline as I pass them to plot_layout. I took off the axis markings from the histogram so its left side is clean, and shrunk the margins on the sides of the two plots that meet. In plot_layout, I'm scaling the widths so the histogram appears more in the margins of the line chart. The same could be done with the density plot.
(p1 +
theme(plot.margin = margin(r = 0, unit = "pt"))
) +
(p2_hist +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = margin(l = 0, unit = "pt"))
) +
plot_layout(nrow = 1, widths = c(1, 0.2))
Created on 2019-01-28 by the reprex package (v0.2.1)
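For completeness, here is a sketch of the same marginal styling applied to the density plot. It is self-contained, so it rebuilds the data with an arbitrary seed, and it uses expansion(), which assumes ggplot2 3.3.0 or later; the name combined is introduced only for this example.

```r
library(ggplot2)
library(patchwork)

set.seed(1)  # arbitrary seed, only for reproducibility of the sketch
my_data <- data.frame(x = seq(100), y = rnorm(100))
y_range <- range(my_data$y)

# Main line plot with limits fixed to the data range
p1 <- ggplot(my_data, aes(x = x, y = y)) +
  geom_line() +
  scale_y_continuous(limits = y_range, expand = expansion(mult = 0.1))

# Density of y, flipped so that its axis lines up with the y-axis of p1
p2_dens <- ggplot(my_data, aes(x = y)) +
  geom_density() +
  coord_flip() +
  scale_x_continuous(limits = y_range, expand = expansion(mult = 0.1))

# Strip the shared axis from the density plot and close the gap between plots
combined <- (p1 + theme(plot.margin = margin(r = 0, unit = "pt"))) +
  (p2_dens + theme(axis.text.y = element_blank(),
                   axis.ticks.y = element_blank(),
                   axis.title.y = element_blank(),
                   plot.margin = margin(l = 0, unit = "pt"))) +
  plot_layout(nrow = 1, widths = c(1, 0.2))
combined
```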
You can try using geom_histogram or geom_density; however, it's a little complicated because you have to rotate the axes for them (while keeping the original orientation for geom_line). I would use geom_violin (which is a density plot, but mirrored). If you want only a one-sided violin plot, you can use the custom geom_flat_violin geom. It was first posted by @David Robinson in his gists.
I used this geom in a different answer; however, I don't think this is a duplicate, as you need to put it at the end of the plot and combine it with a different geom.
Final code is:
library(ggplot2)
ggplot(data.frame(x = seq(100), y = rnorm(100))) +
geom_flat_violin(aes(100, y), color = "red", fill = "red", alpha = 0.5, width = 10) +
geom_line(aes(x, y))
geom_flat_violin code:
library(dplyr)
"%||%" <- function(a, b) {
if (!is.null(a)) a else b
}
geom_flat_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
position = "dodge", trim = TRUE, scale = "area",
show.legend = NA, inherit.aes = TRUE, ...) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomFlatViolin,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
trim = trim,
scale = scale,
...
)
)
}
GeomFlatViolin <-
ggproto(
"GeomFlatViolin",
Geom,
setup_data = function(data, params) {
data$width <- data$width %||%
params$width %||% (resolution(data$x, FALSE) * 0.9)
# ymin, ymax, xmin, and xmax define the bounding rectangle for each group
data %>%
dplyr::group_by(.data = ., group) %>%
dplyr::mutate(
.data = .,
ymin = min(y),
ymax = max(y),
xmin = x,
xmax = x + width / 2
)
},
draw_group = function(data, panel_scales, coord)
{
# Find the points for the line to go all the way around
data <- base::transform(data,
xminv = x,
xmaxv = x + violinwidth * (xmax - x))
# Make sure it's sorted properly to draw the outline
newdata <-
base::rbind(
dplyr::arrange(.data = base::transform(data, x = xminv), y),
dplyr::arrange(.data = base::transform(data, x = xmaxv), -y)
)
# Close the polygon: set first and last point the same
# Needed for coord_polar and such
newdata <- rbind(newdata, newdata[1,])
ggplot2:::ggname("geom_flat_violin",
GeomPolygon$draw_panel(newdata, panel_scales, coord))
},
draw_key = draw_key_polygon,
default_aes = ggplot2::aes(
weight = 1,
colour = "grey20",
fill = "white",
size = 0.5,
alpha = NA,
linetype = "solid"
),
required_aes = c("x", "y")
)
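If defining a custom geom feels heavy, a similar one-sided density can be approximated with base R's density() and geom_path(). This is only a sketch of an alternative, not the flat-violin implementation above: the scaling to a width of 10 x-units is an arbitrary choice, and dens is a helper data frame introduced here.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(x = seq(100), y = rnorm(100))

# Kernel density of the y-values, estimated with base R
d <- density(df$y)

# Rotate it: the density height becomes a horizontal offset from the right
# edge of the line plot, scaled so that the peak is 10 x-units wide
dens <- data.frame(
  y = d$x,
  x = max(df$x) + d$y / max(d$y) * 10
)

p <- ggplot(df, aes(x, y)) +
  geom_line() +
  geom_path(data = dens, colour = "red")  # inherits aes(x, y) from ggplot()
p
```

Swapping geom_path() for geom_polygon() with a fill would give a filled shape closer to the violin look.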
You could use egg::ggarrange(). So basically what you want is this:
p <- ggplot(data=my_data, mapping=aes(x=x, y=y)) +
geom_line() + ylim(c(-2, 2))
q <- ggplot(data=my_data, mapping=aes(x=y)) +
geom_histogram(binwidth=.05) + coord_flip() + xlim(c(-2, 2))
egg::ggarrange(p, q, nrow=1)
Result
Data
set.seed(42)
my_data <- data.frame(x=seq(100), y=rnorm(100))
library(plyr)  # count() with a vars argument, returns a freq column
library(grid)  # grid.draw()
my_data1 <- count(my_data, vars=c("y"))
p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) + geom_line()
p2 <- ggplot(my_data1,aes(x=freq,y=y))+geom_line()+theme(axis.title.y = element_blank(),axis.text.y = element_blank())
grid.draw(cbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))

How do I limit the range of the viridis colour scale?

I have two sets of data, which I want to present using a heat map with the viridis color scale. For the first data set, my values range from 0 to 1.2 and I can easily see the differences I want to see. However, my second data set has some outliers, resulting in a range from 0 to 2. Now it's harder to see the differences in the interesting range between 0 and 1, and it's more difficult to compare the two images directly. Is it possible to show the data from 0 to 1.2 using the viridis colour scale while showing the higher values in yellow (the "highest" colour of the viridis scale)?
Here is an example:
library(ggplot2)
library(viridis)
#Create Data
DataSet1 <- expand.grid(x = 0:5, y = 0:5)
DataSet1$z <- runif(36, 0, 1.2)
DataSet2 <- expand.grid(x = 0:5, y = 0:5)
DataSet2$z <- runif(36, 0, 2)
#Plot Data
ggplot(DataSet1, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis() +
geom_text(aes(label = round(z, 2)), size = 2)
DataSet1: Differences between 0.5 and 0.7 are easy to see
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis() +
geom_text(aes(label = round(z, 2)), size = 2)
DataSet2: Differences between 0.5 and 0.7 are difficult to see
EDIT 2022-05-03: The scale function is called scale_fill_viridis_c() these days.
@ClausWilke's solution is better because it shows in the legend, but sometimes one just needs a quick solution without having to write too much specific code. This one also relies on the scales package.
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis_c(limits = c(0.2, 1), oob = scales::squish) +
geom_text(aes(label = round(z, 2)), size = 2)
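To make the role of oob clearer, here is a small sketch comparing squish with censor, the default out-of-bounds handler for continuous scales. The plot_heat helper is hypothetical, introduced only to show both data sets drawn with the same limits (0 to 1.2, as in the question) so that the two images are directly comparable.

```r
library(ggplot2)
library(scales)

# squish() clamps out-of-range values to the nearest limit ...
squish(c(0, 0.5, 1.5, 2), range = c(0, 1.2))
#> [1] 0.0 0.5 1.2 1.2

# ... whereas censor() replaces them with NA, which is drawn in na.value (grey)
censor(c(0, 0.5, 1.5, 2), range = c(0, 1.2))
#> [1] 0.0 0.5  NA  NA

set.seed(1)  # arbitrary seed for reproducibility
DataSet1 <- expand.grid(x = 0:5, y = 0:5)
DataSet1$z <- runif(36, 0, 1.2)
DataSet2 <- expand.grid(x = 0:5, y = 0:5)
DataSet2$z <- runif(36, 0, 2)

# Hypothetical helper: identical limits for every data set, so colours match
plot_heat <- function(data, limits = c(0, 1.2)) {
  ggplot(data, aes(x, y, fill = z)) +
    geom_tile() +
    scale_fill_viridis_c(limits = limits, oob = squish) +
    geom_text(aes(label = round(z, 2)), size = 2)
}

p_a <- plot_heat(DataSet1)
p_b <- plot_heat(DataSet2)  # values above 1.2 all get the top (yellow) colour
p_b
```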
You can define an arbitrary rescaling function. Not sure this looks that great, would likely need some work with the legend, but in principle this mechanism allows you to map data values onto the scale in any way you want.
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis(rescaler = function(x, to = c(0, 1), from = NULL) {
ifelse(x<1.2,
scales::rescale(x,
to = to,
from = c(min(x, na.rm = TRUE), 1.2)),
1)}) +
geom_text(aes(label = round(z, 2)), size = 2)
Are you looking for something like this?
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_gradient(low="green", high="red", limits=c(0, 1.2),
na.value = "yellow") +
geom_text(aes(label = round(z, 2)), size = 2)
Using the viridis colors, as per jazzurro's recommendation:
ggplot(DataSet2, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_gradientn(colors = viridis_pal()(9), limits=c(0, 1.2),
na.value = "#FDE725FF") +
geom_text(aes(label = round(z, 2)), size = 2)
It's not necessarily an improvement, but you could do something like this to show the higher values in yellow:
library(dplyr)  # for %>% and filter()
DataSet2A <- DataSet2 %>% filter(z <= 1.2)
DataSet2B <- DataSet2 %>% filter(z > 1.2)
ggplot(DataSet2A, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_viridis(begin = 0, end = .75) +
geom_text(aes(label = round(z, 2)), size = 2) +
geom_tile(data = DataSet2B, aes(x, y), fill = "yellow")
Maybe if you play around with the cutoff as well as the begin= and end= parameters in the scale, which control the portion of the viridis scale that you're employing, you can achieve the result you want. (Note that you can only have one fill scale per plot, but you can set additional constant fills as I've done here with yellow.)

Slight point strokes in ggplot points

Consider the following:
library(ggplot2)
df = data.frame(x = rep(0,9), y = rep(0,9), alp = c(1:8/20,1))
ggplot(df) +
geom_point(aes(x, y, alpha=alp), size = 20, col = 'red') +
theme_minimal() + facet_wrap(~ alp) + guides(alpha = "none")
As you can see, there are faint outlines. They make overlaying many low-transparency points look a bit like frogspawn. Is this just a Mac thing? Any idea how to remove them?
The default point shape for ggplot2 is pch = 19. It's not one of those points where the colour of its border and its inside can be controlled separately; for instance, in the following, fill = 'black' has no effect.
library(ggplot2)
df = data.frame(x =runif(1000), y = runif(1000))
p = ggplot(df) +
geom_point(aes(x, y), alpha = .1, size = 5, fill = 'black', colour = 'red') +
theme_bw()
p
Yet the point does have a boundary line. The line's width can be changed with stroke; as follows:
p = ggplot(df) +
geom_point(aes(x, y), stroke = 2, alpha = .1, size = 5, fill = 'black', colour = 'red') +
theme_bw()
p
Unfortunately, setting stroke to zero will not remove the boundary line; it seems there is a lower limit.
To remove the boundary line, use one of the shapes that has a border that can be manipulated; for instance, shape = 21. Set its "fill" to red and its "colour" to transparent.
p = ggplot(df) +
geom_point(aes(x, y), shape = 21, alpha = .1, size = 5, fill = 'red', colour = 'transparent') +
theme_bw()
p
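Another option that may be worth trying: base R's documentation for points() notes that pch 16 and pch 19 differ in that 19 uses a border. Assuming your graphics device follows that definition, shape = 16 should give a solid circle without the separate rim. This is a sketch to test on your own device, not a guaranteed fix; p16 is a name introduced here.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed for reproducibility
df <- data.frame(x = runif(1000), y = runif(1000))

# shape = 16: solid circle without a separately drawn border line,
# so overlapping semi-transparent points should not show a darker outline
p16 <- ggplot(df) +
  geom_point(aes(x, y), shape = 16, alpha = 0.1, size = 5, colour = "red") +
  theme_bw()
p16
```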
see::geom_point2 draws points without this border.
library(ggplot2)
library(see)
df = data.frame(x = rep(0,9), y = rep(0,9), alp = c(1:8/20,1))
ggplot(df) +
geom_point2(aes(x, y, alpha=alp), size = 20, col = 'red') +
theme_minimal() + facet_wrap(~ alp) + guides(alpha = "none")
Created on 2020-05-14 by the reprex package (v0.3.0)

Constructing an area plot with outlines for discrete variable (i.e. with steps)

Similar to geom_area plot with areas and outlines ggplot, I'm trying to construct a stacked area plot with outlines. Since my variables are discrete I'm using geom_bar() for stacking them. The code is as follows:
require(ggplot2)
require(reshape)
x = 0:4
y1 = c(3,2,2,1,0)
y2 = c(1,1,0,0,0)
data = data.frame(x,y1,y2)
data.plot <-melt(data, id.vars = "x")
cols = c(y1="darkgrey",y2="lightgrey")
p = ggplot(data.plot,aes(x=x,y=value,fill=variable))
p + geom_bar(aes(width=1),stat = "identity") + theme_bw() + scale_fill_manual(values=cols)
Which gives
My problem is now adding the outlines as in the example I referred to. I can use colour="black" in geom_bar() but this adds vertical lines between the bars which look quite ugly.
Does anyone have a suggestion to get these outlines? The solution doesn't have to be based on geom_bar.
If possible, I am also interested in a solution where only the dark grey part has an outline, since this outline has an important interpretation. Perhaps this could be based on some shifted version of geom_line()?
Here is another approach, using annotate("path"). This suggestion has hard-coded values for some of the path components, but I suspect there is a way to fill in those values algorithmically (perhaps with ggplot_build()).
p <- ggplot(data.plot,aes(x=x, y=value, fill=variable))
p <- p + geom_bar(aes(width=1), stat = "identity") + theme_bw() + scale_fill_manual(values=cols)
p <- p + annotate(x=c(-.5, 0.5, 0.5, 2.5, 2.5, 3.5, 3.5),
y=c(3, 3, 2, 2, 1, 1, 0 ), group = 1, "path", color = "black", size = 2)
# Same path with the coordinates derived from the data instead of hard-coded
x0 <- min(data.plot$x); v.max <- max(data.plot$value); v.min <- min(data.plot$value)
p <- p + annotate(x=c(x0-.5, x0+0.5, x0+0.5, x0+2.5, x0+2.5, x0+3.5, x0+3.5),
y=c(v.max, v.max, v.max- 1, v.max- 1, v.max- 2, v.max- 2, v.min), group = 1, "path", color = "black", size = 2)
p
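One way the path could be filled in algorithmically, sketched here without ggplot_build(): each bar spans x - 0.5 to x + 0.5, so the outline of the dark grey series is its height repeated at both edges of every bar. The sketch assumes y1 is stacked at the bottom, which it forces with position_stack(reverse = TRUE) because the default stacking order differs between ggplot2 versions; edges and outline are names introduced for illustration.

```r
library(ggplot2)

x  <- 0:4
y1 <- c(3, 2, 2, 1, 0)
y2 <- c(1, 1, 0, 0, 0)

# Long format, built by hand to avoid depending on reshape
data.plot <- data.frame(
  x = rep(x, 2),
  variable = factor(rep(c("y1", "y2"), each = length(x))),
  value = c(y1, y2)
)
cols <- c(y1 = "darkgrey", y2 = "lightgrey")

# Bar boundaries: -0.5, 0.5, ..., 4.5
edges <- c(x - 0.5, max(x) + 0.5)

# Every bar contributes its left and right edge at the bar's own height
outline <- data.frame(
  x = rep(edges, each = 2)[-c(1, 2 * length(edges))],
  y = rep(y1, each = 2)
)

p <- ggplot(data.plot, aes(x = x, y = value, fill = variable)) +
  # reverse = TRUE puts the first level (y1) at the bottom of the stack
  geom_col(width = 1, position = position_stack(reverse = TRUE)) +
  geom_path(data = outline, aes(x = x, y = y), inherit.aes = FALSE) +
  scale_fill_manual(values = cols) +
  theme_bw()
p
```

Using y1 + y2 instead of y1 in outline would trace the top of the whole stack.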
Your plotting code (I don't want to use c since that's a function):
p <- ggplot(data.plot, aes(x = x, y = value, fill = variable))
p <- p + geom_bar(aes(width = 1), stat = "identity") + theme_bw() + scale_fill_manual(values = cols)
Now add a stepping line along the bars:
p <- p + geom_step(aes(x = x - 0.5), position = "stack")
It's a bit more work to fix a line along the axes:
library(dplyr)
y.max <- data.plot %>% group_by(x) %>% summarize(s = sum(value))
y.max <- max(y.max$s)
p + geom_step(aes(x = x - 0.5, ymax = value), position = "stack") +
annotate('segment',
x = min(data.plot$x) - 0.5,
xend = min(data.plot$x) - 0.5,
y = 0,
yend = y.max) +
annotate('segment',
x = min(data.plot$x) - 0.5,
xend = max(data.plot$x) - 0.5,
y = 0,
yend = 0)
I'd be interested in simpler solutions!
