How to adjust relative transparency of ggplot2 points

I want to make some points stand out on a ggplot2 chart by giving them less transparency while the rest fade to the background. But no matter what two alpha values I give the sets of points, their relative transparency is the same.
Here's 0.8 vs 0.7:
library(ggplot2)
x <- mtcars
x$opacity <- ifelse(x$cyl == 6, 0.8, 0.7)
ggplot(x, aes(x = wt, y = mpg, color = cyl, alpha = opacity)) +
geom_point()
And here's 0.8 vs 0.1 -- looks the same:
x$opacity <- ifelse(x$cyl == 6, 0.8, 0.1)
ggplot(x, aes(x = wt, y = mpg, color = cyl, alpha = opacity)) +
geom_point()
How can I fine-tune that relative alpha so that the two sets are closer in transparency? Right now the values of the two numbers don't seem to matter. Specifically, in this case I want the darker points (with the higher alpha) to be more transparent.

Since you are trying to pass actual alpha values to the aesthetic mapping, be sure to use
scale_alpha_identity()
Otherwise ggplot will rescale your values just like it created the colors for you automatically.
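As a minimal sketch of that advice (the 0.8 and 0.5 values and the name p_identity are only illustrative), the plot from the question with the identity scale added would look like this:

```r
library(ggplot2)

# Store the alpha values we want drawn literally:
# 0.8 for 6-cylinder cars, 0.5 for everything else (illustrative values).
x <- mtcars
x$opacity <- ifelse(x$cyl == 6, 0.8, 0.5)

p_identity <- ggplot(x, aes(x = wt, y = mpg, color = cyl, alpha = opacity)) +
  geom_point() +
  scale_alpha_identity()  # use the column values as-is, no rescaling
p_identity
```

Note that identity scales draw no legend by default, so the alpha legend disappears from the plot.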

Add scale_alpha_continuous to your plot and define the range. e.g.
scale_alpha_continuous(range = c(0.7, 0.8))

You're mapping the values 0.7 and 0.8 to alpha, not necessarily using them for alpha. A quicker way is to map the condition and then set alpha:
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg, color = cyl, alpha = cyl == 6)) +
geom_point() +
scale_alpha_discrete(range = c(0.2, 0.8))
#> Warning: Using alpha for a discrete variable is not advised.
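If you would rather name the two alpha values explicitly, a manual scale is another option. This is a sketch rather than part of the answer above; the 0.8 and 0.2 values are carried over from it, p_manual is an illustrative name, and I would expect it to avoid the discrete-alpha warning because it does not go through scale_alpha_discrete():

```r
library(ggplot2)

p_manual <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl, alpha = cyl == 6)) +
  geom_point() +
  # Map each level of the logical condition to an explicit alpha value
  # and hide the TRUE/FALSE legend.
  scale_alpha_manual(values = c(`TRUE` = 0.8, `FALSE` = 0.2), guide = "none")
p_manual
```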

Related

ggplot2 density of one dimension in 2D plot

I would like to plot a background that captures the density of points in one dimension in a scatter plot. This would serve a similar purpose to a marginal density plot or a rug plot. I have a way of doing it that is not particularly elegant, I am wondering if there's some built-in functionality I can use to produce this kind of plot.
Mainly there are a few issues with the current approach:
1. Alpha overlap at boundaries causes banding at lower resolution as seen here. - Primary objective, looking for a geom or other solution that draws a nice continuous band filled with specific colour. Something like geom_density_2d() but with the stat drawn from only the X axis.
2. "Background" does not cover expanded area, can use coord_cartesian(expand = FALSE) but would like to cover regular margins. - Not a big deal, is nice-to-have but not required.
3. Setting scale_fill "consumes" the option for the plot, not allowing it to be set independently for the points themselves. - This may not be easily achievable, independent palettes for layers appears to be a fundamental issue with ggplot2.
library(tidyverse)
data(iris)
dns <- density(iris$Sepal.Length)
dns_df <- tibble(
x = dns$x,
density = dns$y
)%>%
mutate(
start = x - mean(diff(x))/2,
end = x + mean(diff(x))/2
)
ggplot() +
geom_rect(
data = dns_df,
aes(xmin = start, xmax = end, fill = density),
ymin = min(iris$Sepal.Width),
ymax = max(iris$Sepal.Width),
alpha = 0.5) +
scale_fill_viridis_c(option = "A") +
geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_rug(data = iris, aes(x = Sepal.Length))
This is a bit of a hacky solution because it (ab)uses knowledge of how objects are internally parametrised, which will yield some warnings, but it gets you what you want.
First, we'll use a geom_raster() + stat_density() decorated with some choice after_stat()/stage() delayed evaluation. Normally, this would result in a height = 1 strip, but by setting the internal parameters ymin/ymax to -Inf/Inf, we'll have the strip extend the whole height of the plot. Using geom_raster() resolves the alpha issue you were having.
library(ggplot2)
p <- ggplot(iris) +
geom_raster(
aes(Sepal.Length,
y = mean(Sepal.Width),
fill = after_stat(density),
ymin = stage(NULL, after_scale = -Inf),
ymax = stage(NULL, after_scale = Inf)),
stat = "density", alpha = 0.5
)
#> Warning: Ignoring unknown aesthetics: ymin, ymax
p
#> Warning: Duplicated aesthetics after name standardisation: NA
Next, we add a fill scale, and immediately follow that by ggnewscale::new_scale_fill(). This allows another layer to use a second fill scale, as demonstrated with fill = Species.
p <- p +
scale_fill_viridis_c(option = "A") +
ggnewscale::new_scale_fill() +
geom_point(aes(Sepal.Length, Sepal.Width, fill = Species),
shape = 21) +
geom_rug(aes(Sepal.Length))
p
#> Warning: Duplicated aesthetics after name standardisation: NA
Lastly, to get rid of the padding at the x-axis, we can manually extend the limits and then shrink in the expansion. It allows for an extended range over which the density can be estimated, making the raster fill the whole area. There is some mismatch between how ggplot2 and scales::expand_range() are parameterised, so the exact values are a bit of trial and error.
p +
scale_x_continuous(
limits = ~ scales::expand_range(.x, mul = 0.05),
expand = c(0, -0.2)
)
#> Warning: Duplicated aesthetics after name standardisation: NA
Created on 2022-07-04 by the reprex package (v2.0.1)
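To see what scales::expand_range() does in the x-axis step of the answer above, here is a small standalone sketch on a made-up range of 0 to 10 (the range is an assumption chosen for easy arithmetic):

```r
library(scales)

# expand_range() widens a range on both sides by mul * (width of the range).
rng <- c(0, 10)
expanded <- expand_range(rng, mul = 0.05)
expanded
#> [1] -0.5 10.5
```

With mul = 0.05 each side grows by 5% of the width, which matches the default continuous-axis padding in ggplot2.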
This doesn't solve your problem (I'm not sure I understand all the issues correctly), but perhaps it will help:
Background does not cover expanded area, can use coord_cartesian(expand = FALSE) but would like to cover regular margins.
If you make the 'background' larger and use coord_cartesian() you can get the same 'filled-to-the-edges' effect; would this work for your use-case?
Alpha overlap at boundaries causes banding at lower resolution as seen here.
I wasn't able to fix the banding completely, but my approach below appears to reduce it.
Setting scale_fill "consumes" the option for the plot, not allowing it to be set independently for the points themselves.
If you use geom_segment() you can map density to colour, leaving fill available for e.g. the points. Again, not sure if this is a useable solution, just an idea that might help.
library(tidyverse)
data(iris)
dns <- density(iris$Sepal.Length)
dns_df <- tibble(
x = dns$x,
density = dns$y
) %>%
mutate(
start = x - mean(diff(x))/2,
end = x + mean(diff(x))/2
)
ggplot() +
geom_segment(
data = dns_df,
aes(x = start, xend = end,
y = min(iris$Sepal.Width) * 0.9,
yend = max(iris$Sepal.Width) * 1.1,
color = density), alpha = 0.5) +
coord_cartesian(ylim = c(min(iris$Sepal.Width),
max(iris$Sepal.Width)),
xlim = c(min(iris$Sepal.Length),
max(iris$Sepal.Length))) +
scale_color_viridis_c(option = "A", alpha = 0.5) +
scale_fill_viridis_d() +
geom_point(data = iris, aes(x = Sepal.Length,
y = Sepal.Width,
fill = Species),
shape = 21) +
geom_rug(data = iris, aes(x = Sepal.Length))
Created on 2022-07-04 by the reprex package (v2.0.1)

ggplot2’s mpg dataset – how to understand the geom_point graph

Why does this graph not show overlaps?
Some of the cars in this dataset share the same combination for x and y (displ and hwy).
For example for displ = 2 and hwy = 29, there are: 1 midsize; 6 compact and 3 subcompact.
However, in this spot there is only a green dot showing only 1 midsize. What am I misunderstanding about this graph?
Thank you so much!
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
Carsten,
The call to geom_point() draws points that share the same coordinates exactly on top of each other, so you only see the last one drawn; this is especially common in small datasets with discrete values. You can address this with geom_jitter(), which adds a little random noise to the positions so that all points become visible.
Solution: geom_jitter()
Here we use geom_jitter(), to insert noise into the plot data allowing us to see all overlapping points.
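# install ggplot2 only if it is missing, then load it (mpg ships with ggplot2)
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)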
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = displ, y = hwy, color = class))
Plot Output: (Points slightly shifted to distinguish each point)
Note how the inserted "noise" allows you to distinguish the plot points.
nb. The jitter geom is a convenient shortcut for geom_point(position = "jitter"). It adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.
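If the default amount of noise is too much or too little, the jitter can be controlled explicitly. The following is a sketch, with width, height and seed values picked arbitrarily for illustration; the seed makes the random offsets reproducible between runs:

```r
library(ggplot2)

p_jitter <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy, color = class),
    # Shift each point by at most 0.05 horizontally and 0.3 vertically.
    position = position_jitter(width = 0.05, height = 0.3, seed = 42)
  )
p_jitter
```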
Apart from jitter, you can also set the alpha argument of geom_point() to 0.3 or 0.4; the default is 1, which means 100% opaque. Set it outside aes() so that it is used as a fixed value rather than mapped through a scale.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class), alpha = 0.3)
This will highlight areas of over-plotting
The geom_jitter solution and alpha changing solution are both excellent. A third possibility is to map the size of the marker to the number of observations at those coordinates (along with an alpha adjustment) using geom_count():
library(ggplot2)
# mpg ships with ggplot2, so no data() call is needed
ggplot(data = mpg) +
geom_count(mapping = aes(x = displ, y = hwy, color = class), alpha = .5)
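To check how many cars really sit on each coordinate, the overlaps can be counted directly. This is a base R sketch, and the object names overlaps and spot are illustrative:

```r
library(ggplot2)  # provides the mpg dataset

# Number of cars per class at each (displ, hwy) coordinate, largest first.
overlaps <- as.data.frame(table(displ = mpg$displ, hwy = mpg$hwy, class = mpg$class))
overlaps <- overlaps[overlaps$Freq > 0, ]
overlaps <- overlaps[order(-overlaps$Freq), ]
head(overlaps)

# The specific spot mentioned in the question.
spot <- mpg[mpg$displ == 2 & mpg$hwy == 29, ]
table(spot$class)
```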

How to create a continuous legend (color bar style) for scale_alpha?

Currently, a continuous colour bar legend, guide_colorbar is available only with scale_fill and scale_colour, and not with scale_alpha. The legend which is generated with scale_alpha is of a discrete type (guide_legend).
A small example where color and alpha are mapped to a continuous variable:
scale_color generates a continuous color bar type legend :
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Sepal.Width)) +
geom_point()
scale_alpha generates a discrete legend, even though alpha is mapped to a continuous variable:
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, alpha = Sepal.Width)) +
geom_point()
Is there some way to get a continuous color bar legend also for scale_alpha?
The default minimum alpha for scale_alpha_continuous is 0.1, and the max is 1. I wrote this assuming that you might adjust the minimum to be more visible, but you'd keep the max at 1.
First I set amin to that default of 0.1, and the chosen colour for the points as highcol. Then we use the col2rgb to make a matrix of the RGB values and blend them with white, as modified from this answer written in C#. Note that we're blending with white, so you should be using a theme that has a white background (e.g. theme_classic() as below). Finally we convert that matrix to hex values and paste it into a single string with # in front for standard RGB format.
require(scales)
amin <- 0.1
highcol <- hue_pal()(1) # or a color name like "blue"
lowcol.hex <- as.hexmode(round(col2rgb(highcol) * amin + 255 * (1 - amin)))
lowcol <- paste0("#",
paste(format(lowcol.hex, width = 2), collapse = ""))
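The same blend can be wrapped in a small function, which makes it easier to try other colours or minimum alphas. This is only a sketch: blend_with_white() is a hypothetical helper name, and it uses base R's rgb() instead of the hexmode formatting above:

```r
# Hypothetical helper: the colour that `col` drawn at opacity `alpha`
# appears as on a white background.
blend_with_white <- function(col, alpha) {
  blended <- col2rgb(col) * alpha + 255 * (1 - alpha)  # per-channel linear blend
  rgb(t(round(blended)), maxColorValue = 255)
}

blend_with_white("red", 1)  # fully opaque: the colour itself
#> [1] "#FF0000"
blend_with_white("red", 0)  # fully transparent: plain white
#> [1] "#FFFFFF"
blend_with_white("black", 0.1)  # a light grey
```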
Then we plot as you might be planning to already, with your variable of choice set to the alpha aesthetic, and here some geom_points. Then we plot another layer of points, but with colour set to that same variable, and alpha = 0 (or invisible). This gives us our colourbar we need. Then we have to set the range of scale_colour_gradient to our colours from above.
ggplot(iris, aes(Sepal.Length, Sepal.Width, alpha = Petal.Length)) +
geom_point(colour = highcol, size = 3) +
geom_point(aes(colour = Petal.Length), alpha = 0) +
scale_colour_gradient(high = highcol, low = lowcol) +
guides(alpha = "none") +
labs(colour = "Alpha\nlabel") +
theme_classic()
I'm guessing you most often would want to use this with only a single colour, and for that colour to be black. In that simplified case, replace highcol and lowcol with "black" and "grey90". If you want to have multiple colours, each with an alpha varied by some other variable... that's a whole other can of worms and probably not a good idea.
Edited to add in a bad idea!
If you replace colour with fill for my solution above, you can still use colour as an aesthetic. Here I used highcol <- hue_pal()(3)[2] to extract that default green colour.
ggplot(iris, aes(Sepal.Length, Sepal.Width, alpha = Petal.Length)) +
geom_point(aes(colour = Species), size = 3) +
geom_point(aes(fill = Petal.Length), alpha = 0) +
scale_fill_gradient(high = highcol, low = lowcol) +
guides(alpha = "none") +
labs(fill = "Petal\nLength") +
theme_classic()

Set reference value for color range in viridis or RColorBrewer

I want to make a plot using a color palette from packages viridis or RColorBrewer, but I would like to set what point in the variable distribution should be considered as the middle point in the color gradient.
For example, in the following plots, R takes 0.5 to be the middle point of the color range. How can I set a different value, for example, 0.25 ?
library(ggplot2)
library(RColorBrewer)
# data
set.seed(1)
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))
# plot
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_distiller( palette="RdBu", guide = "colorbar")
# plot using viridis
library(viridis)
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_viridis()
EDIT: I was looking for a solution similar to this one here, because the distribution of my variable is not symmetrical. However, I'd like to keep the entire color range from dark blue to dark red, which seems not to be possible.
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_gradient2( low = "#2166ac", mid = "#f7f7f7", high = "#b2182b", midpoint = 0.2,
space = "Lab", na.value = "grey50", guide = "colourbar")
Maybe try setting
space = scale(c(-zlim, 0, zlim))
Where zlim is the extreme value of your data.
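A different technique that keeps the full dark blue to dark red range is to hand the palette to scale_fill_gradientn() and place its colour stops yourself. The sketch below is not the approach suggested above; the names mid, cols, mid_pos and stops are illustrative, and it assumes the chosen midpoint lies inside the range of z:

```r
library(ggplot2)
library(RColorBrewer)
library(scales)

set.seed(1)
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))

mid <- 0.25                                  # data value that should get the middle colour
cols <- rev(brewer.pal(11, "RdBu"))          # dark blue -> white -> dark red
mid_pos <- rescale(mid, from = range(df$z))  # position of `mid` on the 0-1 colour axis
# Six stops from the minimum to the midpoint, five more from the midpoint to the maximum.
stops <- c(seq(0, mid_pos, length.out = 6), seq(mid_pos, 1, length.out = 6)[-1])

p_mid <- ggplot(df, aes(x, y, fill = z)) +
  geom_raster() +
  scale_fill_gradientn(colours = cols, values = stops)
p_mid
```

The blue half of the palette is compressed into the span below 0.25 and the red half is stretched over the rest, so both extremes still reach their darkest shade.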

Can I fix overlapping dashed lines in a histogram in ggplot2?

I am trying to plot a histogram of two overlapping distributions in ggplot2. Unfortunately, the graphic needs to be in black and white. I tried representing the two categories with different shades of grey, with transparency, but the result is not as clear as I would like. I tried adding outlines to the bars with different linetypes, but this produced some strange results.
require(ggplot2)
set.seed(65)
a = rnorm(100, mean = 1, sd = 1)
b = rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c('A', 'B'), each = 100),
values = c(a, b))
ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 1) +
scale_fill_grey()
Notice that one of the lines that should appear dotted is in fact solid (at a value of x = 4). I think this must be a result of it actually being two lines - one from the 3-4 bar and one from the 4-5 bar. The dots are out of phase so they produce a solid line. The effect is rather ugly and inconsistent.
Is there any way of fixing this overlap?
Can anyone suggest a more effective way of clarifying the difference between the two categories, without resorting to colour?
Many thanks.
One possibility would be to use a 'hollow histogram', as described here:
# assign your original plot object to a variable
p1 <- ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
# p1
# extract relevant variables from the plot object to a new data frame
# your grouping variable 'category' is named 'group' in the plot object
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
# plot using geom_step
ggplot(data = df, aes(x = xmin, y = y, linetype = factor(group))) +
geom_step()
If you want to vary both linetype and fill, you need to plot a histogram first (which can be filled). Set the outline colour of the histogram to transparent. Then add the geom_step. Use theme_bw to avoid 'grey elements on grey background'
p1 <- ggplot() +
geom_histogram(data = dat, aes(x = values, fill = category),
colour = "transparent", position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
df$category <- factor(df$group, labels = c("A", "B"))
p1 +
geom_step(data = df, aes(x = xmin, y = y, linetype = category)) +
theme_bw()
First, I would recommend theme_set(theme_bw()) or theme_set(theme_classic()) (this sets the background to white, which makes it (much) easier to see shades of gray).
Second, you could try something like scale_linetype_manual(values=c(1,3)) -- this won't completely eliminate the artifacts you're unhappy about, but it might make them a little less prominent since linetype 3 is sparser than linetype 2.
Short of drawing density plots instead (which won't work very well for small samples and may not be familiar to your audience), dodging the positions of the histograms (which is ugly), or otherwise departing from histogram conventions, I can't think of a better solution.
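For comparison, the two alternatives mentioned above (density curves and dodged bars) could be sketched as follows, reusing the data from the question; p_density and p_dodge are illustrative names:

```r
library(ggplot2)

set.seed(65)
a <- rnorm(100, mean = 1, sd = 1)
b <- rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c("A", "B"), each = 100),
                  values = c(a, b))

# Density curves told apart by linetype only.
p_density <- ggplot(dat, aes(x = values, linetype = category)) +
  geom_density() +
  theme_bw()

# Histogram bars placed side by side instead of overlapping.
p_dodge <- ggplot(dat, aes(x = values, fill = category)) +
  geom_histogram(colour = "black", position = "dodge", binwidth = 1) +
  scale_fill_grey() +
  theme_bw()

p_density
p_dodge
```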
