Apply jitter uniformly across geoms in ggplot2 - r

I would like to jitter two geoms by the same amount. Consider the following minimal example:
library(ggplot2)
pdat <- data.frame(x = c(1,1,2,2,4,4,8,8),
y = c(1,1.1,2,2.2,3,3.3,4,4.4),
ymin = c(1,1.1,2,2.2,3,3.3,4,4.4)-.9^(0:7),
ymax = c(1,1.1,2,2.2,3,3.3,4,4.4)+.9^(0:7),
colour = as.factor(rep(1:2,4)))
ggplot(pdat, aes(x=x,y=y,ymin=ymin,ymax=ymax,color=colour)) +
geom_linerange(position='jitter') + geom_point(position='jitter')
ggplot(pdat, aes(x=jitter(x),y=y,ymin=ymin,ymax=ymax,color=colour)) +
geom_linerange() + geom_point()
which produces the following plots:
In both cases, the jittering is random across geoms (the points and lineranges are in different locations), whereas I would like them to be consistent for each data point (the points in the middles of the corresponding lineranges). Is this possible?
Note that I would not consider manually adding noise to the x variable to be a solution as this would destroy the ability to apply coordinate transformations. E.g., defining pdat$x2 <- pdat$x+rnorm(8)/10,
ggplot(pdat, aes(x=x2,y=y,ymin=ymin,ymax=ymax,color=colour)) +
geom_linerange() + geom_point()
looks good, but then the variance of the jitter is subject to any subsequent transformations as can be seen in
ggplot(pdat,aes(x=x2,y=y,ymin=ymin,ymax=ymax,color=colour)) +
geom_linerange() + geom_point() + scale_x_log10()

Using the position_jitter() function, you can set a seed to get a reproducible jitter. Giving both layers the same seed and width makes them jitter by the same amount:
library(ggplot2)
ggplot(pdat, aes(x = x, y = y, ymin = ymin, ymax = ymax, color = colour))+
geom_point(position = position_jitter(seed = 123, width =0.2))+
geom_linerange(position = position_jitter(seed = 123, width = 0.2))
Does that answer your question?
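A minimal sketch of the same idea, assuming a ggplot2 version in which position_jitter() accepts a seed argument. It stores one position object and reuses it for both layers, then checks with layer_data() that the jittered x values agree. The names pos, ranges, points and same_x are illustrative, and height = 0 is a choice made here to keep the jitter horizontal only.

```r
library(ggplot2)

pdat <- data.frame(
  x = c(1, 1, 2, 2, 4, 4, 8, 8),
  y = c(1, 1.1, 2, 2.2, 3, 3.3, 4, 4.4),
  colour = factor(rep(1:2, 4))
)
pdat$ymin <- pdat$y - 0.9^(0:7)
pdat$ymax <- pdat$y + 0.9^(0:7)

# One position object shared by both layers, so seed and width cannot drift apart.
# height = 0 is an assumption of this sketch: jitter horizontally only.
pos <- position_jitter(width = 0.2, height = 0, seed = 123)

p <- ggplot(pdat, aes(x, y, ymin = ymin, ymax = ymax, colour = colour)) +
  geom_linerange(position = pos) +
  geom_point(position = pos)

# Compare the positions each layer actually ends up with
ranges <- layer_data(p, 1)
points <- layer_data(p, 2)
same_x <- isTRUE(all.equal(ranges$x, points$x))
```

As far as I understand, position adjustments are applied after scale transformations, so adding scale_x_log10() should leave the jitter width constant on the transformed axis, which is what the question asks for.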

Related

ggplot2 density of one dimension in 2D plot

I would like to plot a background that captures the density of points in one dimension in a scatter plot. This would serve a similar purpose to a marginal density plot or a rug plot. I have a way of doing it that is not particularly elegant, I am wondering if there's some built-in functionality I can use to produce this kind of plot.
Mainly there are a few issues with the current approach:
Alpha overlap at boundaries causes banding at lower resolution as seen here. - Primary objective, looking for a geom or other solution that draws a nice continuous band filled with specific colour. Something like geom_density_2d() but with the stat drawn from only the X axis.
"Background" does not cover expanded area, can use coord_cartesian(expand = FALSE) but would like to cover regular margins. - Not a big deal, is nice-to-have but not required.
Setting scale_fill "consumes" the option for the plot, not allowing it to be set independently for the points themselves. - This may not be easily achievable, independent palettes for layers appears to be a fundamental issue with ggplot2.
library(tidyverse) # provides tibble(), mutate(), %>% and ggplot2
data(iris)
dns <- density(iris$Sepal.Length)
dns_df <- tibble(
x = dns$x,
density = dns$y
)%>%
mutate(
start = x - mean(diff(x))/2,
end = x + mean(diff(x))/2
)
ggplot() +
geom_rect(
data = dns_df,
aes(xmin = start, xmax = end, fill = density),
ymin = min(iris$Sepal.Width),
ymax = max(iris$Sepal.Width),
alpha = 0.5) +
scale_fill_viridis_c(option = "A") +
geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_rug(data = iris, aes(x = Sepal.Length))
This is a bit of a hacky solution because it (ab)uses knowledge of how objects are parametrised internally, which yields some warnings, but it gets you what you want.
First, we'll use geom_raster() with stat_density(), decorated with some delayed evaluation via after_stat()/stage(). Normally, this would result in a strip of height 1, but by setting the internal parameters ymin/ymax to infinite values, we make the strip extend over the whole height of the plot. Using geom_raster() resolves the alpha issue you were having.
library(ggplot2)
p <- ggplot(iris) +
geom_raster(
aes(Sepal.Length,
y = mean(Sepal.Width),
fill = after_stat(density),
ymin = stage(NULL, after_scale = -Inf),
ymax = stage(NULL, after_scale = Inf)),
stat = "density", alpha = 0.5
)
#> Warning: Ignoring unknown aesthetics: ymin, ymax
p
#> Warning: Duplicated aesthetics after name standardisation: NA
Next, we add a fill scale, and immediately follow that by ggnewscale::new_scale_fill(). This allows another layer to use a second fill scale, as demonstrated with fill = Species.
p <- p +
scale_fill_viridis_c(option = "A") +
ggnewscale::new_scale_fill() +
geom_point(aes(Sepal.Length, Sepal.Width, fill = Species),
shape = 21) +
geom_rug(aes(Sepal.Length))
p
#> Warning: Duplicated aesthetics after name standardisation: NA
Lastly, to get rid of the padding at the x-axis, we can manually extend the limits and then shrink in the expansion. It allows for an extended range over which the density can be estimated, making the raster fill the whole area. There is some mismatch between how ggplot2 and scales::expand_range() are parameterised, so the exact values are a bit of trial and error.
p +
scale_x_continuous(
limits = ~ scales::expand_range(.x, mul = 0.05),
expand = c(0, -0.2)
)
#> Warning: Duplicated aesthetics after name standardisation: NA
Created on 2022-07-04 by the reprex package (v2.0.1)
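To make the limits step in the answer above more concrete, here is a small sketch of what scales::expand_range() returns for the Sepal.Length range. With mul = 0.05 it widens the range by 5% of its width on each side, which matches ggplot2's default multiplicative expansion for continuous scales. The variable names are illustrative only.

```r
library(scales)

# Raw data range of the x variable
raw_range <- range(iris$Sepal.Length)

# Widen by 5% of the range width on each side
expanded <- expand_range(raw_range, mul = 0.05)

# Amount added on each side, for comparison with the additive shrink
# in expand = c(0, -0.2) used above
pad <- expanded[2] - raw_range[2]
```

Here pad comes out a little below 0.2, which may be part of why the answer describes the exact values as trial and error.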
This doesn't solve your problem (I'm not sure I understand all the issues correctly), but perhaps it will help:
Background does not cover expanded area, can use coord_cartesian(expand = FALSE) but would like to cover regular margins.
If you make the 'background' larger and use coord_cartesian() you can get the same 'filled-to-the-edges' effect; would this work for your use-case?
Alpha overlap at boundaries causes banding at lower resolution as seen here.
I wasn't able to fix the banding completely, but my approach below appears to reduce it.
Setting scale_fill "consumes" the option for the plot, not allowing it to be set independently for the points themselves.
If you use geom_segment() you can map density to colour, leaving fill available for e.g. the points. Again, not sure if this is a useable solution, just an idea that might help.
library(tidyverse)
data(iris)
dns <- density(iris$Sepal.Length)
dns_df <- tibble(
x = dns$x,
density = dns$y
) %>%
mutate(
start = x - mean(diff(x))/2,
end = x + mean(diff(x))/2
)
ggplot() +
geom_segment(
data = dns_df,
aes(x = start, xend = end,
y = min(iris$Sepal.Width) * 0.9,
yend = max(iris$Sepal.Width) * 1.1,
color = density), alpha = 0.5) +
coord_cartesian(ylim = c(min(iris$Sepal.Width),
max(iris$Sepal.Width)),
xlim = c(min(iris$Sepal.Length),
max(iris$Sepal.Length))) +
scale_color_viridis_c(option = "A", alpha = 0.5) +
scale_fill_viridis_d() +
geom_point(data = iris, aes(x = Sepal.Length,
y = Sepal.Width,
fill = Species),
shape = 21) +
geom_rug(data = iris, aes(x = Sepal.Length))
Created on 2022-07-04 by the reprex package (v2.0.1)

How do I change the colour of the background to the x axis ticks in ggplot2?

At the moment, my graph looks like this:
However, I'd like to colour the background of the x axis tick so it resembles this:
I think I'm almost there, but can't figure out how to change the background to the x axis ticks, and not the writing itself.
I changed the text colour using:
theme(axis.text.x = element_text(colour = Saltcolour))
Any help would be greatly appreciated!
One general approach is to use grobs (Grid objects) and use a rectGrob object to be displayed under the axis text. I'll demonstrate with an example shown here.
library(ggplot2)
set.seed(8675309)
df <- data.frame(
x=paste0('Test', 1:10),
y=rnorm(10, 10)
)
ggplot(df, aes(x,y)) +
geom_col(color='black', fill='gray', alpha=0.8) +
scale_y_continuous(expand=expansion(mult=c(0,0.05))) +
coord_cartesian(clip='off') +
labs(x=NULL) +
theme_classic()
Display the Boxes
To create the grob, you can use the grid package and rectGrob. Since we want to draw many boxes, we can supply a vector for x (to draw one at x positions 1 through 10), and then supply the fill colors via a vector sent to fill. Note that when using grobs, you can supply the various parameters through the gp argument inside of gpar(). Unlike a ggplot geom, the grobs are not matched to a data frame, so you'll have to manually specify the way colors/sizes are mapped via vectors as I've done here.
library(grid)
muh_grob <- grid::rectGrob(
x=1:10, y=0, gp=gpar(
color='black', fill=rainbow(10), alpha=0.2))
To use the grob, you can use annotation_custom(), where you need to specify the min and max values of y and x. You'll have to likely mess around with the numbers to get things to look right. Note the values are in npc, so 0 is left and 1 is all the way on the right in x axis here (discrete values). It's also very important that you include coord_*(clip="off"). It can be any of the coord_ functions, but you need clipping off or you will not be able to see the grob. I've also applied a margin to the top of the x axis text to move it downward a bit and make room for the box around it.
ggplot(df, aes(x,y)) +
geom_col(color='black', fill='gray', alpha=0.8) +
scale_y_continuous(expand=expansion(mult=c(0,0.05))) +
coord_cartesian(clip='off') +
labs(x=NULL) +
theme_classic() +
theme(
axis.text.x = element_text(margin=margin(t=10))
) +
annotation_custom(
grob=muh_grob, xmin = 0, xmax = 1, ymin = -0.5, ymax=0.1
)
Multiple Facets
OP shared a plot with two facets that contained these boxes... so how to do that? Well, it's not quite as straightforward to do, since annotation_custom() is applied the same way to each facet. Each facet shares the same values of x and y, so if you specify a grob is from xmin=0 and xmax=0.5, this will apply your grob to the left side of each facet.
To get around this, there is a very nice adjustment to the method provided in another answer here, represented below:
library(gridExtra)
annotation_custom2 <-
function (grob, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, data)
{
layer(data = data, stat = StatIdentity, position = PositionIdentity,
geom = ggplot2:::GeomCustomAnn,
inherit.aes = TRUE, params = list(grob = grob,
xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax))
}
I'll then make a left and right grob, and apply it using that function, which allows us to specify a data argument to annotation_custom2() and place the grob on one facet.
muh_left_grob <- rectGrob(
x=1:5, y=0, gp=gpar(color='black', fill='red', alpha=seq(0.7, 0.1, length.out = 5)))
muh_right_grob <- rectGrob(
x=1:5, y=0, gp=gpar(color='black', fill='blue', alpha=seq(0.7, 0.1, length.out = 5)))
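# Assumed facet column (not defined in the snippet as posted): first five bars in one facet, last five in the other
df$my_facet <- rep(c('A Facet', 'Another Facet'), each=5)
ggplot(df, aes(x,y)) +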
geom_col(color='black', fill='gray', alpha=0.8) +
scale_y_continuous(expand=expansion(mult=c(0,0.05))) +
coord_cartesian(clip='off') +
labs(x=NULL) +
theme_classic() +
theme(axis.text.x = element_text(margin=margin(t=10))) +
facet_wrap(~my_facet, scales='free_x') +
annotation_custom2(
data=subset(df, my_facet=='A Facet'), grob=muh_left_grob,
xmin=0, xmax=1, ymin=-0.5, ymax=0.1) +
annotation_custom2(
data=subset(df, my_facet=='Another Facet'), grob=muh_right_grob,
xmin=0, xmax=1, ymin=-0.5, ymax=0.1)

Visualizing two or more data points where they overlap (ggplot R)

I have a scatterplot that has colour-coded data points. When two or more of the data points overlap only one of the colours is shown (whichever is first in the legend). Each of these data points represents an item and I need to show which items fall at each point on the scale. I'm using R (v.3.3.1). Would anyone have any suggestions as per how I could show that there are multiple items at each point on the scatterplot?
Thanks in advance.
pdf('pedplot.pdf', height = 6, width = 10)
# Refer to columns by name inside aes() rather than via e4$...
p3 <- ggplot(data = e4, aes(x = domain, y = ped)) +
geom_point(aes(color = Database_acronym), size = 3, shape = 17) +
labs(x = "Domains", y = "Proportion of Elements per Domain", color = "Data Sources") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p3)
dev.off()
You could jitter the points, meaning add a bit of noise to remove the overlap (probably the most commonly used option). Another option would be to use different marker shapes (plus a small size adjustment), chosen so that the markers remain visible when plotted on top of each other. This will work if you have only two or three different marker types. A third option is to vary the size for each color, once again only for cases with maybe two or three colors/sizes, though the size difference might be confusing. If you can have multiple points of the same color with the same coordinates, then only jitter (among the three options above) will make that apparent. In any case, here are examples of each approach:
library(ggplot2)
dat = data.frame(x=1:5, y=rep(1:5,3), group=rep(LETTERS[1:3],each=5))
theme_set(theme_bw())
# Jitter
set.seed(3)
ggplot(dat, aes(x,y, colour=group)) +
geom_point(size=3, position=position_jitter(h=0.15,w=0.15))
# Vary the marker size
ggplot(dat, aes(x,y, colour=group,size=group)) +
geom_point() +
scale_color_manual(values=c("red","blue","orange")) +
scale_size_manual(values=c(5,3,1))
# Vary the marker shape (plus a small size adjustment)
ggplot(dat, aes(x,y, colour=group, size=group, shape=group)) +
geom_point(stroke=1.5) +
scale_colour_manual(values=(c("black", "green", "orange"))) +
scale_shape_manual(values=c(19,17,4)) +
scale_size_manual(values=c(4,3,3))
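To illustrate what "add a bit of noise" means numerically, here is a small base R sketch using jitter() with an explicit amount. Each value is shifted by a uniform random offset of at most that amount in either direction, so the total spread is twice the amount. The variable names are illustrative.

```r
set.seed(3)

# Three groups sharing the same five x positions, as in the example data
x <- rep(1:5, 3)

# Shift each value by a uniform offset in [-0.15, 0.15]
x_jittered <- jitter(x, amount = 0.15)

# Largest absolute shift and number of distinct positions after jittering
max_shift <- max(abs(x_jittered - x))
n_distinct <- length(unique(x_jittered))
```

Because the shift is bounded, a small amount keeps points visibly attached to their true position while still separating exact overlaps.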
Separately from or in addition to jittering as mentioned here, you could also consider making the points partially transparent:
linecolors <- c("#714C02", "#01587A", "#024E37")
fillcolors <- c("#9D6C06", "#077DAA", "#026D4E")
# partially transparent points by setting `alpha = 0.5`
ggplot(mpg, aes(displ, cty, colour = drv, fill = drv)) +
geom_point(position=position_jitter(h=0.1, w=0.1),
shape = 21, alpha = 0.5, size = 3) +
scale_color_manual(values=linecolors) +
scale_fill_manual(values=fillcolors) +
theme_bw()
What about using different shapes and fills?
ggplot(mpg, aes(displ, cty, fill = drv, shape = drv)) +
geom_point(position=position_jitter(h=0.1, w=0.1), alpha = 0.5, size = 3) +
scale_fill_manual(values=c("red","blue","orange")) +
scale_shape_manual(values= c(23, 24, 25)) +
theme_bw()
Another option could be by counting the overlapping points using geom_count with scale_size_area to scale the sizes of the points. Here is some reproducible code:
library(ggplot2)
ggplot(mpg, aes(x = displ, y = cty)) +
geom_count() +
scale_size_area()
Also, an example when using a color aesthetic to see the difference of counts of groups:
ggplot(mpg, aes(x = displ, y = cty, colour = drv)) +
geom_count() +
scale_size_area()
Created on 2023-01-31 with reprex v2.0.2
You could change the number of breaks in scale_size_area to show different sizes. Please check the link above for more examples.
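As a sketch of what geom_count() computes, the built layer data can be inspected with layer_data(). The small data frame below is made up for illustration, and the column n holds the number of observations at each location.

```r
library(ggplot2)

# Toy data (hypothetical): three points at (1, 1), two at (2, 2), one at (3, 3)
dat <- data.frame(x = c(1, 1, 1, 2, 2, 3),
                  y = c(1, 1, 1, 2, 2, 3))

p <- ggplot(dat, aes(x, y)) +
  geom_count() +
  scale_size_area(max_size = 10)

# One row per distinct location, with its count in column n
counts <- layer_data(p)[, c("x", "y", "n")]
counts <- counts[order(counts$x), ]
```

With scale_size_area() the point area, rather than the radius, is proportional to n.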
Try geom_point(aes(color = e4$Database_acronym), position = "jitter", size = 3, shape = 17).
This adds a little bit of random variation to your scatter plot and thereby prevents overplotting.

Plotting both horizontal and vertical point ranges simultaneously in ggplot

Is there a way to plot both horizontal and vertical point ranges together on the same plot in ggplot? I understand that geom_pointrange(...) plots vertical point ranges, and that horizontal point ranges can be generated with coord_flip(...), but I'm interested in putting both together on the same plot.
set.seed(1)
df <- data.frame(x=sample(1:10,10),y=sample(1:10,10), x.range=1, y.range=2)
library(ggplot2)
ggplot(df) +
geom_pointrange(aes(x=x, y=y, ymin=y-y.range, ymax=y+y.range))
I'm looking for something like this:
ggplot(df) +
geom_pointrange(aes(x=x, y=y,
ymin=y-y.range, ymax=y+y.range,
xmin=x-x.range, xmax=x+x.range))
Which of course produces the same output as above because the xmin and xmax arguments are ignored. Evidently, there is (was) a function geom_hpointrange(...) in ggExtra, but this package has been pulled as far as I can tell.
Is geom_errorbarh what you are looking for?
ggplot(data = df, aes(x = x, y = y)) +
geom_pointrange(aes(ymin = y - y.range, ymax = y + y.range)) +
geom_errorbarh(aes(xmax = x + x.range, xmin = x - x.range, height = 0))
You can also call geom_pointrange twice:
ggplot(df, aes(x=x, y=y)) +
geom_pointrange(aes(ymin=y-y.range, ymax=y+y.range)) +
geom_pointrange(aes(xmin=x-x.range, xmax=x+x.range))
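A variant of the same idea, sketched under the assumption of ggplot2 3.3.0 or later, where geoms such as geom_linerange() infer a horizontal orientation when xmin/xmax are supplied together with y. It draws the two ranges as separate line layers and the point only once, so the point is not overplotted.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = sample(1:10, 10), y = sample(1:10, 10),
                 x.range = 1, y.range = 2)

p <- ggplot(df, aes(x = x, y = y)) +
  # vertical range
  geom_linerange(aes(ymin = y - y.range, ymax = y + y.range)) +
  # horizontal range (orientation inferred from xmin/xmax; assumes ggplot2 >= 3.3.0)
  geom_linerange(aes(xmin = x - x.range, xmax = x + x.range)) +
  # the point itself, drawn once on top
  geom_point(size = 2)

# The second layer carries the horizontal limits
horizontal <- layer_data(p, 2)
```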

How to smartly place text labels beside points of different sizes in ggplot2?

I am trying to make a labeled bubble plot with ggplot2 in R. Here is the simplified scenario:
I have a data frame with 4 variables: 3 quantitative variables, x, y, and z, and another variable that labels the points, lab.
I want to make a scatter plot, where the position is determined by x and y, and the size of the points is determined by z. I then want to place text labels beside the points (say, to the right of the point) without overlapping the text on top of the point.
If the points did not vary in size, I could try to simply modify the aesthetic of the geom_text layer by adding a scaling constant (e.g. aes(x=x+1, y=y+1)). However, even in this simple case, I am having a problem with positioning the text correctly because the points do not scale with the output dimensions of the plot. In other words, the size of the points remains constant in a 500x500 plot and a 1000x1000 plot - they do not scale up with the dimensions of the outputted plot.
Therefore, I think I have to scale the position of the label by the size (e.g. dimensions) of the output plot, or I have to get the radius of the points from ggplot somehow and shift my text labels. Is there a way to do this in ggplot2?
Here is some code:
# Stupid data
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
# Plot with bad label placement
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I should mention, I tried hjust and vjust inside of geom_text, but it does not produce the desired effect.
# Trying hjust and vjust, but it doesn't look nice
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab), hjust=0, vjust=0.5,
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I managed to get something that works for now, thanks to Henrik and shujaa. I will leave the question open just in case someone shares a more general solution.
Just a blurb of what I am using this for: I am plotting a map, and indicating the amount of precipitation at certain stations with a point that is sized proportionally to the amount of precipitation observed. I wanted to add a station label beside each point in an aesthetically pleasing manner. I will be making more of these plots for different regions, and my output plot may have a different resolution or scale (e.g. due to different projections) for each plot, so a general solution is desired. I might try my hand at creating a custom position_jitter, like baptiste suggested, if I have time during the weekend.
It appears that position_*** don't have access to the scales used by other layers, so it's a no go. You could make a clone of GeomText that shifts the labels according to the size mapped,
but it's a lot of effort for a very kludgy and fragile solution,
geom_shiftedtext <- function (mapping = NULL, data = NULL, stat = "identity",
position = "identity",
parse = FALSE, ...) {
GeomShiftedtext$new(mapping = mapping, data = data, stat = stat, position = position,
parse = parse, ...)
}
require(proto)
GeomShiftedtext <- proto(ggplot2:::GeomText, {
objname <- "shiftedtext"
draw <- function(., data, scales, coordinates, ..., parse = FALSE, na.rm = FALSE) {
data <- remove_missing(data, na.rm,
c("x", "y", "label"), name = "geom_shiftedtext")
lab <- data$label
if (parse) {
lab <- parse(text = lab)
}
with(coord_transform(coordinates, data, scales),
textGrob(lab, unit(x, "native") + unit(0.375* size, "mm"),
unit(y, "native"),
hjust=hjust, vjust=vjust, rot=angle,
gp = gpar(col = alpha(colour, alpha),
fontfamily = family, fontface = fontface, lineheight = lineheight))
)
}
})
df <- data.frame(x=c(1,2,3),
y=c(1,2,3),
z=c(1.2,2,1),
lab=c("a","b","c"), stringsAsFactors=FALSE)
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z), shape=1) +
geom_shiftedtext(aes(label=lab, size=z),
hjust=0, colour="red") +
scale_size_continuous(range=c(5, 100), guide="none")
This isn't a very general solution, because you'll need to tweak it every time, but you should be able to add to the x value for the text some value that's linear depending on z.
I had luck with
ggplot(aes(x=x, y=y), data=df) +
geom_point(aes(size=z)) +
geom_text(aes(label=lab, x = x + .06 + .14 * (z - min(z))),
colour="red") +
scale_size_continuous(range=c(5, 50), guide="none")
but, as the font size depends on your window size, you would need to decide on your output size and tweak accordingly. I started with x = x + .05 + 0 * (z-min(z)) and calibrated the intercept based on the smallest point, then when I was happy with that I adjusted the linear term for the biggest point.
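The calibration described above can be written as a small helper, which makes the two tuning constants explicit. label_offset() is a hypothetical name introduced here for illustration; the intercept and slope are the values from the answer and would still need re-tuning for a different output size.

```r
# Hypothetical helper: horizontal label offset that grows linearly with z.
# intercept: offset for the smallest point; slope: extra offset per unit of z.
label_offset <- function(z, intercept = 0.06, slope = 0.14) {
  intercept + slope * (z - min(z))
}

df <- data.frame(x = c(1, 2, 3),
                 y = c(1, 2, 3),
                 z = c(1, 2, 1),
                 lab = c("a", "b", "c"), stringsAsFactors = FALSE)

# x positions for the labels: original x plus the size-dependent offset
df$label_x <- df$x + label_offset(df$z)
```

The column label_x could then be mapped in geom_text(aes(x = label_x, label = lab)) in place of the inline expression.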
Another alternative. Looks OK with your test data, but you need to check how general it is.
dodge <- abs(scale(df$z))/4
ggplot(data = df, aes(x = x, y = y)) +
geom_point(aes(size = z)) +
geom_text(aes(x = x + dodge), label = df$lab, colour = "red") +
scale_size_continuous(range = c(5, 50), guide = "none")
Update
Just tried position_jitter, but the width argument only takes one value, so right now I am not sure how useful that function would be. But I would be happy to find that I am wrong. Example with another small data set:
df3 <- mtcars[1:10, ]
ggplot(data = df3, aes(x = wt, y = mpg)) +
geom_point(aes(size = qsec), alpha = 0.1) +
geom_text(label = df3$carb, position = position_jitter(width = 0.1, height = 0)) +
scale_size_continuous(range = c(5, 50), guide = "none")
