I am trying to create a ggplot histogram with a density overlay, where the alpha changes past the value 1. An example can be seen on 538 under the "Every outcome in our simulations" section, where the alpha differs based on the electoral vote count. I am close to getting a similar graph, but I cannot figure out how to get the density and histogram to work together.
Code
library(data.table)
library(ggplot2)
library(dplyr)  # for %>%, which data.table does not export
dt <- data.table(ratio = rnorm(10000, mean = .5, sd = 1))
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio, alpha = al)) +
geom_histogram(aes(), bins = 100,
fill = 'red') +
geom_density(aes(),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly changes alpha past 1 as desired but the density estimate is not scaled.
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density.., alpha = al), bins = 100,
fill = 'red') +
geom_density(aes(y = ..scaled..),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly scales the density curve, but now the geom_histogram is calculated separately for values under 1 and above 1. I want them calculated as one group.
What am I missing?
The reason why knowing your theme is important is that there's an easy shortcut to this, which is not using alpha, but just drawing a semitransparent rectangle over the left half of your plot:
library(data.table)
library(ggplot2)
library(dplyr)
data.table(ratio = rnorm(10000, mean = .5, sd = 1)) %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density..), bins = 100,
fill = 'red') +
geom_line(aes(), stat = "density", size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
annotate("rect", xmin = -Inf, xmax = 1, ymin = 0, ymax = Inf, fill = "white",
alpha = 0.5) +
theme_bw()
Splitting into two groups and using alpha is possible, but it basically requires you to precalculate the histogram and the density curve. That's fine, but it would be an awful lot of extra effort for very little visual gain.
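For reference, the precalculation route could look roughly like the sketch below. It assumes base R's hist() and density() do the precomputation; the 0.05 bin width, the seed, and the names bins and dens_curve are illustrative choices, not part of the original post.

```r
library(ggplot2)

set.seed(1)
ratio <- rnorm(10000, mean = 0.5, sd = 1)

# Bin all values as ONE group. Breaks are multiples of 0.05, so one bin
# edge falls exactly at 1 and no bar straddles the threshold.
breaks <- seq(20 * floor(min(ratio)), 20 * ceiling(max(ratio))) / 20
h <- hist(ratio, breaks = breaks, plot = FALSE)

# One row per bar; the alpha group is decided per bin midpoint.
bins <- data.frame(x = h$mids, density = h$density, past_one = h$mids >= 1)

# Kernel density estimate over the full sample (also a single group).
d <- density(ratio)
dens_curve <- data.frame(x = d$x, density = d$y)

p <- ggplot(bins, aes(x = x, y = density)) +
  geom_col(aes(alpha = past_one), width = 0.05, fill = "red") +
  geom_line(data = dens_curve, colour = "blue") +
  geom_vline(xintercept = 1, colour = "#0080e2") +
  scale_alpha_manual(values = c(`FALSE` = 0.65, `TRUE` = 0.9), guide = "none")
p
```

Because both the bars and the curve are on the density scale, they line up without any extra rescaling, and alpha is just an attribute of each precomputed bar.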
Of course, if theme_josh has a custom background color and zany gridlines, this approach may not be quite so effective. As long as you set the fill color to the panel background you should get a decent result. (The default theme_gray() panel background is "grey92".)
First we prepare some toy data that sufficiently resembles the one I am working with.
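library(ggplot2)
library(ggridges)  # provides geom_density_ridges()
rawdata <- data.frame(Score = rnorm(1000, seq(1, 0, length.out = 10), sd = 1),
Group = rep(LETTERS[1:3], 10000))
stdev <- c(10.78, 10.51, 9.42)
col <- c(A = "#1b9e77", B = "#d95f02", C = "#7570b3")  # assumed palette; `col` is not defined in the original post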
Now we plot the estimated densities via geom_density_ridges. I also add a grey highlight around zero via geom_rect. I also flip the chart with coord_flip.
p <- ggplot(rawdata, aes(x = Score, y = Group)) +
scale_y_discrete() +
geom_rect(inherit.aes = FALSE, mapping = aes(ymin = 0, ymax = Inf, xmin = -0.1 * min(stdev), xmax = 0.1 * max(stdev)),
fill = "grey", alpha = 0.5) +
geom_density_ridges(aes(fill = Group), scale = 0.5, size = 1, alpha=0.5) +
scale_color_manual(values = col) +
scale_fill_manual(values = col) +
labs(title="Toy Graph", y="Group", x="Value") +
coord_flip(xlim = c(-8, 8), ylim = NULL, expand = TRUE, clip = "on")
p
And this is the result I get, which is close to what I was expecting, except for the enormous gap between the y axis and the start of the first factor on the x axis, A. I tried using expand = c(0, 0) inside scale_y_discrete(), following suggestions from other posts, but it does not make the gap any smaller. If possible I would still like to keep a small gap. I have also been trying to flip the densities so that the gap is filled by the first factor's density plot, but without success; it is not as trivial as one might expect.
Sorry, I know this is technically two different questions: "How to reduce the gap from the y axis to the first density plot?" and "How to flip the densities from y axis to reduce the gap?" I would be happy with an answer to the first one, as I understand the second seems less straightforward.
Thanks in advance! Any help is appreciated.
Flipping the densities also effectively reduces the space, so this might be all you need to do. You can achieve it with a negative scale parameter:
ggplot(rawdata, aes(x = Score, y = Group)) +
scale_y_discrete() +
geom_rect(inherit.aes = FALSE,
mapping = aes(ymin = 0, ymax = Inf,
xmin = -0.1 * min(stdev),
xmax = 0.1 * max(stdev)),
fill = "grey", alpha = 0.5) +
geom_density_ridges(aes(fill = Group), scale = -0.5, size = 1, alpha = 0.5) +
scale_color_manual(values = col) +
scale_fill_manual(values = col) +
labs(title = "Toy Graph", y = "Group", x = "Value") +
coord_flip(xlim = c(-8, 8), ylim = NULL, expand = TRUE, clip = "on")
If you want to keep the densities pointing the same way but just reduce space on the left side, simply set hard limits in your coord_flip, with no expansion:
ggplot(rawdata, aes(x = Score, y = Group)) +
geom_rect(inherit.aes = FALSE,
mapping = aes(ymin = 0, ymax = Inf,
xmin = -0.1 * min(stdev),
xmax = 0.1 * max(stdev)),
fill = "grey", alpha = 0.5) +
geom_density_ridges(aes(fill = Group), scale = 0.5, size = 1, alpha = 0.5) +
scale_color_manual(values = col) +
scale_fill_manual(values = col) +
scale_y_discrete() +
labs(title = "Toy Graph", y = "Group", x = "Value") +
coord_flip(xlim = c(-8, 8), ylim = c(0.8, 4), expand = FALSE)
I want to plot 3 numerical size values on one line in R, in order of ascending size, but my research so far has only pointed me towards regular line graphs. I'm looking for something like this:
where size increases from left to right and I can plot my 3 data points on the line to show where each sample falls. It doesn't need to be as complicated as this example, just one standalone line.
How would I go about doing this?
Here's a quick recreation:
library(tidyverse)
mtcars %>%
group_by(gear = as.factor(gear)) %>%
summarize(min = min(wt),
max = max(wt),
mean = mean(wt),
sd = sd(wt),
median = median(wt)) -> summary
ggplot(summary, aes(y=gear)) +
geom_errorbarh(aes(xmin = min, xmax = max), height = 0.04, color = "gray70") +
geom_segment(aes(yend = gear, x = mean-sd, xend = mean+sd), alpha = 0.3,
color = "forestgreen", size = 10) +
geom_point(aes(x = median), shape = 17, color = "darkred") +
geom_text(aes(x = median, label = median), vjust = -1.5) +
theme_minimal() + theme(panel.grid = element_blank())
I'd like to insert median lines for factor levels into a violin plot in ggplot2. Here's some reproducible data:
set.seed(12)
FactorVar <- sample(LETTERS[1:5], 500, replace = TRUE)
NumericVar <- abs(rnorm(500))
df <- data.frame(FactorVar, NumericVar)
To get the grouped medians I use tapply:
medians <- tapply(df$NumericVar, df$FactorVar, FUN = median)
And this is the code for the plot. As can be seen, I'm inserting each median line individually. That's cumbersome and uneconomical:
library(ggplot2)
g <-
ggplot(data = df,
aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
geom_point(aes(y = NumericVar),
position = position_jitter(width = .15), size = 0.9, alpha = 0.8) +
geom_hline(yintercept = mean(NumericVar), color = "blue", size = 0.8, linetype = 4) +
geom_segment(x = 0.5, xend = 1.5, y= medians[1], yend = medians[1], color = "red", linetype = 2) +
geom_segment(x = 1.5, xend = 2.5, y = medians[2], yend = medians[2], color = "red", linetype = 2) +
geom_segment(x = 2.5, xend = 3.5, y = medians[3], yend = medians[3], color = "red", linetype = 2) +
geom_segment(x = 3.5, xend = 4.5, y = medians[4], yend = medians[4], color = "red", linetype = 2) +
geom_segment(x = 4.5, xend = 5.5, y = medians[5], yend = medians[5], color = "red", linetype = 2) +
guides(fill = "none", color = "none") +
coord_flip() +
theme_gray(); g
How can the median segments be inserted in a single command? Also, observe how the median line for factor A is thinner than the others? Why's that?
One method, which simplifies the x and xend offsets, is to facet the plot. First, though, we need to put the medians into a data frame, preferably with the same grouping factors as the original.
mediansdf <- data.frame(FactorVar=names(medians), NumericVar=medians)
g <-
ggplot(data = df,
aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
geom_point(aes(y = NumericVar),
position = position_jitter(width = .15), size = 0.9, alpha = 0.8) +
geom_hline(yintercept = mean(NumericVar), color = "blue", size = 0.8, linetype = 4) +
guides(fill = "none", color = "none") +
coord_flip() +
theme_gray() +
facet_grid(FactorVar~., scales="free") +
geom_segment(aes(x = 0.5, xend = 1.5, yend = NumericVar), color = "red", linetype = 2, data = mediansdf)
g
This example reused the y aesthetic, but since we have a different frame, we could easily use different names (and specify them within aes(...)). One advantage of using the same variable names is (in my opinion) clearer declarative code.
Since the facet_grid adds the factor label on the right side, you likely could remove it from the axis. Note, if you do not use scales="free", then you'll see all factors in each facet, which is distracting and unnecessary.
The reason I am suggesting facets is that it makes the x and xend simple and relative to a single violin, so 0.5 to 1.5; otherwise, as you saw, there is some assumption on which is going with which integer placement.
Last, the appearance of thinner red lines for me was while looking at the raster plot window. If you save to vector-based format (e.g., PDF), the lines appear to be the same thickness.
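If faceting is not wanted, a single geom_segment() call can still draw all medians at once. The sketch below is an alternative, not the approach described above: it assumes the factor levels sit at integer positions 1, 2, 3, ... on the discrete axis, and the names med and pos are made up for illustration.

```r
library(ggplot2)

set.seed(12)
df <- data.frame(
  FactorVar  = sample(LETTERS[1:5], 500, replace = TRUE),
  NumericVar = abs(rnorm(500))
)

# One row per factor level, holding that level's median.
med <- aggregate(NumericVar ~ FactorVar, data = df, FUN = median)

# Integer position of each level on the discrete axis (A = 1, B = 2, ...).
med$pos <- as.numeric(factor(med$FactorVar))

p <- ggplot(df, aes(x = FactorVar, y = NumericVar)) +
  geom_violin(aes(fill = FactorVar), scale = "count", trim = FALSE,
              adjust = 0.75) +
  # A single layer draws every median; each segment spans its own violin slot.
  geom_segment(data = med, inherit.aes = FALSE,
               aes(x = pos - 0.5, xend = pos + 0.5,
                   y = NumericVar, yend = NumericVar),
               colour = "red", linetype = 2) +
  coord_flip() +
  theme(legend.position = "none")
p
```

The violin layer comes first so that the x scale is set up as discrete before the numeric segment positions are added to it.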
I am trying to add four transparent bands to my ggplot for the following y ranges:
y<2 & y>1.5
y<1.5 & y>1
y<1 & y>0.5
y<0.5 & y>0
I don't want the ranges to overlap as that changes the colour that I'm assigning each band (as they're transparent).
I can sort of get the effect I'm after using geom_area (see code below), but they overlap, which changes the colour.
I'm wondering if there is a better way to get the bands specifically in the areas I want?
df <- data.frame(y1=rep(1.99, 100),
y2=rep(1.49, 100),
y3=rep(0.99, 100),
y4=rep(0.49, 100),
x =1:100)
ggplot(aes(x=x), data = df) +
geom_area(aes(y=ifelse(y1<2 & y1>1.5, y1, 0)), data=df, fill="yellow", alpha = 0.3) +
geom_area(aes(y=ifelse(y2<1.5 & y2>1, y2, 0)), data=df, fill="darkgoldenrod1", alpha = 0.3) +
geom_area(aes(y=ifelse(y3<1 & y3>0.5, y3, 0)), data=df, fill="darkorange1", alpha = 0.3) +
geom_area(aes(y=ifelse(y4<0.5 & y4>0, y4, 0)), data=df, fill="darkred", alpha = 0.3) +
theme_classic()
Also, potentially separate question, is there a way to make the fill color go all the way to the axis rather than just leaving a white buffer space around it?
Use geom_rect before plotting any points:
ggplot() +
geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = 1.5, ymax = 2), fill="yellow", alpha = 0.3) +
geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = 1, ymax = 1.5), fill="darkgoldenrod1", alpha = 0.3) +
geom_point(data = df, aes(x = x, y = y1)) +
theme_classic()
See "geom_rect and alpha - does this work with hard coded values?" for getting alpha to work correctly with the rectangles.
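All four bands from the question can also be drawn with one geom_rect() layer fed from a small data frame. This is a sketch under assumptions: the bands and pts data frames are illustrative names, and turning off axis expansion with expand = c(0, 0) is one possible way to address the white buffer mentioned in the question.

```r
library(ggplot2)

# One row per band; the edges touch but never overlap.
bands <- data.frame(
  ymin = c(1.5, 1, 0.5, 0),
  ymax = c(2, 1.5, 1, 0.5),
  fill = c("yellow", "darkgoldenrod1", "darkorange1", "darkred")
)

# Stand-in points, matching y1 from the question.
pts <- data.frame(x = 1:100, y = rep(1.99, 100))

p <- ggplot() +
  geom_rect(data = bands,
            aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax, fill = fill),
            alpha = 0.3) +
  scale_fill_identity() +  # use the colour names in `fill` as given
  geom_point(data = pts, aes(x = x, y = y)) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(limits = c(0, 2), expand = c(0, 0)) +
  theme_classic()
p
```

Since each rectangle is drawn exactly once from its own data row, the alpha value applies once per band and the colours stay as specified.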
I want to create in R a graphic similar to the one below to show where a certain person or company ranks relative to its peers. The score will always be between 1 and 100.
Although I am amenable to any ggplot solution it seemed to me that the best way would be to use geom_rect and then to adapt and add the arrowhead described in baptiste's answer to this question. However, I came unstuck on something even simpler - getting the geom_rect to fill properly with a gradient like that shown in the guide to the right of the plot below. This should be easy. What am I doing wrong?
library(ggplot2)
library(scales)
mydf <- data.frame(id = rep(1, 100), sales = 1:100)
ggplot(mydf) +
geom_rect(aes(xmin = 1, xmax = 1.5, ymin = 0, ymax = 100, fill = sales)) +
scale_x_discrete(breaks = 0:2, labels = 0:2) +
scale_fill_gradient2(low = 'blue', mid = 'white', high = 'red', midpoint = 50) +
theme_minimal()
I think that geom_tile() will be better: use sales for both y and fill. With geom_tile() you will get a separate tile for each sales value and will be able to see the gradient.
ggplot(mydf) +
geom_tile(aes(x = 1, y=sales, fill = sales)) +
scale_x_continuous(limits=c(0,2),breaks=1)+
scale_fill_gradient2(low = 'blue', mid = 'white', high = 'red', midpoint = 50) +
theme_minimal()
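To mark where a particular score falls on the bar, one option is an annotated arrow next to the tiles. This is only a sketch: the score value of 72 and the arrow placement are made-up illustrations, and it uses annotate() with ggplot2's re-exported arrow() rather than the arrowhead technique referenced in the question.

```r
library(ggplot2)

mydf <- data.frame(sales = 1:100)
score <- 72  # hypothetical score to highlight

p <- ggplot(mydf) +
  geom_tile(aes(x = 1, y = sales, fill = sales)) +
  # Arrow pointing at the bar from the right, at the height of the score.
  annotate("segment", x = 1.9, xend = 1.55, y = score, yend = score,
           arrow = arrow(length = unit(0.3, "cm"))) +
  annotate("text", x = 1.95, y = score, label = score, hjust = 0) +
  scale_x_continuous(limits = c(0, 2.5), breaks = NULL) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 50) +
  theme_minimal()
p
```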