Calculate axis tick locations based on data in faceted plot - r

I have an issue where I would like to calculate the locations of y-axis labels in a large plot made with facet_grid(). Let me show you what I mean using the mpg dataset.
require(tidyverse)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free")
You will notice that both axes use varying numbers of labels. Ignoring problems with the x-axis, I am interested in labelling only three values on the y-axis: 0, half of the maximum value, and the maximum value. It makes sense in my use case, so here is what I tried.
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
geom_blank(aes(y = 0)) + # extends y-axis to 0
scale_y_continuous(expand = expansion(mult = c(0, 0.1)), # prevents ggplot2 from extending beyond y = 0
n.breaks = 3) # Three axis labels, please.
The plot correctly starts at y = 0 and labels it correctly. However, the remaining labels are poorly placed: for some reason, panels get two or four labels instead of three.
If I could only directly calculate the label positions!
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
geom_blank(aes(y = 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)),
n.breaks = 3,
breaks = c(0, 0.5*max(hwy), 1*max(hwy))) # Maybe these formulas work?
Error in check_breaks_labels(breaks, labels) : object 'hwy' not found
I believe providing break points this way should work, and that my syntax is just wrong. How do I access and work with the data underlying the plot? Alternatively, if this doesn't work, can I manually specify y-axis labels for each row of panels?
I could really use some assistance here, please.
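For context on the error: the breaks vector is evaluated as soon as scale_y_continuous() is called, in the ordinary calling environment, where hwy exists only as a column of mpg and not as a free-standing object. The base-R sketch below uses a small hypothetical data frame, toy, in place of mpg to show the failure, why mpg$hwy would give only one maximum for all panels, and how per-row values could be computed by group.

```r
# Hypothetical stand-in for mpg: a few drv/hwy pairs
toy <- data.frame(drv = c("4", "4", "f", "f", "r"),
                  hwy = c(20, 26, 30, 44, 24))

# A bare `hwy` is looked up as a variable, not as a column, so it fails
msg <- tryCatch(max(hwy), error = function(e) conditionMessage(e))
msg  # "object 'hwy' not found"

# Naming the column works, but yields one global maximum for every panel
max(toy$hwy)  # 44

# Per-row values have to be computed per group, e.g. with split()
breaks_by_drv <- lapply(split(toy$hwy, toy$drv),
                        function(v) c(0, max(v) / 2, max(v)))
breaks_by_drv$f  # 0 22 44
```

A single scale_y_continuous() still applies one set of candidate breaks to every panel, which is why a breaks function that works from each panel's own limits is the simpler route.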

If you want custom rules for breaks, the easiest thing is to use a function implementing those rules given the (panel) limits.
Below is an example that labels 0, the maximum, and half the maximum.
library(ggplot2)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)),
limits = c(0, NA), # <- fixed min, flexible max. replaces geom_blank
breaks = function(x){c(x[1], mean(x), x[2])})
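The breaks function is ordinary R, so it can be tried outside the plot. A minimal sketch, assuming you prefer whole-number labels: the hypothetical zero_half_max() floors the upper limit to an even number before halving it, since the limits handed to a breaks function need not be round numbers.

```r
# Hypothetical helper: lower limit, half the maximum, and the maximum, with the
# maximum floored to an even number so both max and half-max are whole numbers.
zero_half_max <- function(limits) {
  top <- 2 * floor(limits[2] / 2)
  c(limits[1], top / 2, top)
}

zero_half_max(c(0, 44))    # 0 22 44
zero_half_max(c(0, 43.7))  # 0 21 42

# Usage in the plot above:
# scale_y_continuous(expand = expansion(mult = c(0, 0.1)),
#                    limits = c(0, NA), breaks = zero_half_max)
```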

You can remove scales = "free": do you then get what you want?
require(tidyverse)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class)

Related

How to plot facets with discontinuous y-axis

I am trying to produce a plot with a discontinuous y-axis but can't get the facet titles to show only once:
Example Data:
library(ggplot2)
data(mpg)
# Store the plot as p so it can be passed to gg.gap() later
p <- ggplot(mpg, aes(displ, cty)) +
geom_point() +
facet_grid(. ~ drv)
p
After much digging it appears that this is impossible in ggplot2, but I have discovered the gg.gap package. However, this package replicates the facet titles for each segment of the plot. Let's say I want a break in the y-axis from 22 to 32, as follows:
library(gg.gap)
gg.gap(plot = p,
segments = c(22, 32),
ylim = c(0, 35))
Facet titles appear for each plot segment, but this is clearly pretty confusing and terrible aesthetically. I would be grateful for any insight or help anyone could provide! I'm stumped.
I know this is possible if I plot in base R, but given other constraints I am unable to do so (I need the graphics/grammar provided by ggplot2).
Thanks in advance!
This is a bit of an ugly workaround. The idea is to set y-values in the broken portion to NA so that no points are drawn there. Then we facet on findInterval() of the values against the axis break points (negated, so that the panels keep the bottom-to-top order of the axis). Finally, we manually resize the panels with ggh4x::force_panelsizes() to give the 2nd panel zero height. Full disclaimer: I wrote ggh4x, so I'm biased.
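To see what the facet variable looks like, here is a small base-R check on a few hypothetical cty values: findInterval() returns 1, 2 or 3 for below, inside and above the gap, and negating it makes the highest values sort first, so they end up in the top row of facet_grid().

```r
# Hypothetical cty values below, inside, and above the 22-32 gap
cty_demo <- c(9, 21, 22, 28, 32, 35)

# Interval index: 1 = below 22, 2 = in [22, 32), 3 = 32 and above; then negate
row_id <- -findInterval(cty_demo, c(-Inf, 22, 32, Inf))
row_id                # -1 -1 -2 -2 -3 -3

# Rows are ordered by sorted value, so -3 (highest cty) comes first, at the top
sort(unique(row_id))  # -3 -2 -1
```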
A few details: the strips along the y-direction are hidden by setting the relevant theme elements to blank. Also, ideally you'd calculate what proportion the upper facet should be relative to the lower facet and replace the 0.2 with that number.
library(ggplot2)
library(ggh4x)
ggplot(mpg, aes(displ, cty)) +
geom_point(aes(y = ifelse(cty >= 22 & cty < 32, NA, cty))) +
facet_grid(-findInterval(cty, c(-Inf, 22, 32, Inf)) ~ drv,
scales = "free_y", space = "free_y") +
theme(strip.background.y = element_blank(),
strip.text.y = element_blank(),
panel.spacing.y = unit(5.5/2, "pt")) +
force_panelsizes(rows = c(0.2, 0, 1))
#> Warning: Removed 20 rows containing missing values (geom_point).
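One way to derive the number that replaces 0.2, as suggested above, is to compare the data span on each side of the gap. The helper upper_panel_ratio() is hypothetical and ignores scale expansion, so treat its result as a starting point rather than an exact panel size.

```r
# Hypothetical helper: height of the upper panel relative to the lower panel,
# from the data span on each side of the gap. Values in [lower, upper) fall in
# the gap and are ignored, matching findInterval(y, c(-Inf, lower, upper, Inf)).
upper_panel_ratio <- function(y, lower, upper) {
  below <- y[y < lower]
  above <- y[y >= upper]
  diff(range(above)) / diff(range(below))
}

# Synthetic check: lower span is 20 - 10 = 10, upper span is 35 - 33 = 2
upper_panel_ratio(c(10, 15, 20, 25, 33, 35), lower = 22, upper = 32)  # 0.2

# Usage with the plot above (assumption: expansion is small enough to ignore):
# force_panelsizes(rows = c(upper_panel_ratio(mpg$cty, 22, 32), 0, 1))
```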
Alternative approach for boxplot:
Instead of censoring the data inside the break, you can duplicate the data and manipulate the position scales to show what you want. We rely on the coordinate system clipping the data to crop the graphical objects.
library(ggplot2)
library(ggh4x)
ggplot(mpg, aes(class, cty)) +
geom_boxplot(data = ~ transform(., facet = 2)) +
geom_boxplot(data = ~ transform(., facet = 1)) +
facet_grid(facet ~ drv, scales = "free_y", space = "free_y") +
facetted_pos_scales(y = list(
scale_y_continuous(limits = c(32, NA), oob = scales::oob_keep, # <- keeps data
expand = c(0, 0, 0.05, 0)),
scale_y_continuous(limits= c(NA, 21), oob = scales::oob_keep,
expand = c(0.05, 0, 0, 0))
)) +
theme(strip.background.y = element_blank(),
strip.text.y = element_blank())
Here's an approach that relies on transforming the data before passing it to ggplot2 and then adjusting the scale labels, similar to what you would do for a secondary y-axis.
library(dplyr)
library(ggplot2)
low_max <- 22.5
high_min <- 32.5
adjust <- high_min - low_max
mpg %>%
mutate(cty2 = as.numeric(cty),
cty2 = case_when(cty < low_max ~ cty2,
cty > high_min ~ cty2 - adjust,
TRUE ~ NA_real_)) %>%
ggplot(aes(displ, cty2)) +
geom_point() +
annotate("segment", color = "white", size = 2,
x = -Inf, xend = Inf, y = low_max, yend = low_max) +
scale_y_continuous(breaks = 1:50,
labels = function(x) {x + ifelse(x >= low_max, adjust, 0)}) +
facet_grid(. ~ drv)
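The two transformations in that answer can be checked on their own. In this sketch, squash_gap() and gap_label() are hypothetical names for the case_when() step and the labels function; a value above the gap should come back to its original number after a round trip.

```r
low_max <- 22.5
high_min <- 32.5
adjust <- high_min - low_max

# Data -> plotting position: drop values in the gap, shift values above it down
squash_gap <- function(y) {
  ifelse(y < low_max, y, ifelse(y > high_min, y - adjust, NA_real_))
}

# Plotting position -> axis label: undo the shift for anything above the gap
gap_label <- function(x) x + ifelse(x >= low_max, adjust, 0)

squash_gap(c(10, 25, 35))  # 10 NA 25
gap_label(squash_gap(35))  # 35
```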

R ggplot Line With Vertical Bands

I wish to create a plot like the one above with data such as this:
data1=data.frame("School"=c(1,2,3,4,5,6,7,8,9,10),
"Score"=c(80,64,79,64,64,89,69,71,61,98),
"ScoreLow"=c(65,62,62,60,60,84,54,55,55,69),
"ScoreHigh"=c(98,79,85,97,88,95,97,90,79,99))
The blue line is 'Score', with Score on the y-axis and 'School' on the x-axis. The length of each black line is determined by 'ScoreLow' and 'ScoreHigh'.
geom_errorbar() would also work, in case you want to add ticks at the ends (or leave them out by setting width = 0, as below):
library(ggplot2)
data1=data.frame("School"=c(1,2,3,4,5,6,7,8,9,10),
"Score"=c(80,64,79,64,64,89,69,71,61,98),
"ScoreLow"=c(65,62,62,60,60,84,54,55,55,69),
"ScoreHigh"=c(98,79,85,97,88,95,97,90,79,99))
ggplot(data1, aes(x=School, y=Score)) + geom_line(colour="#507bc7", size=2)+
geom_errorbar(aes(ymin=ScoreLow, ymax=ScoreHigh), width=0, col="black", size=1.5) +
theme_minimal()
Created on 2020-04-10 by the reprex package (v0.3.0)
I think you are looking for a combination of geom_line() and geom_segment().
library(ggplot2)
ggplot(data1) +
geom_line(aes(x = School, y = Score), color = "blue", size = 1.5) +
geom_segment(aes(x = School, xend = School, y = ScoreLow, yend = ScoreHigh), size = 2) +
scale_x_continuous(breaks = 1:10) +
scale_y_continuous(limits = c(0, 100), breaks = 0:10 * 10) +
theme_minimal()
You will probably need to play around a bit to get it exactly how you want it.

Several distributions in the same plot -- using geom_density function from ggplot2

I think I'm very close to getting this code done, but I'm missing something here.
I want to "combine" two plots into just one like this:
The first plot has this code:
ggplot(test, aes(y=key,x=value)) +
geom_path()+
coord_flip()
And the second one has this one below:
ggplot(test, aes(x=value, fill=key)) +
geom_density() +
coord_flip()
This kind of multiple-distribution plot is often seen in statistics books when reading about normal distributions. The most useful link I've found so far was this one here.
Please use this code to reproduce my question:
library(tidyverse)
test <- data.frame(key = c("communication","gross_motor","fine_motor"),
value = rnorm(n=30,mean=0, sd=1))
ggplot(test, aes(x=value, fill=key)) +
geom_density() +
coord_flip()
ggplot(test, aes(y=key,x=value)) +
geom_path(size=2)+
coord_flip()
Thanks much
You might be interested in ridgeline plots from the ggridges package.
Ridgeline plots are partially overlapping line plots that create the impression of a mountain range. They can be quite useful for visualizing changes in distributions over time or space.
library(tidyverse)
library(ggridges)
set.seed(123)
test <- data.frame(
key = c("communication", "gross_motor", "fine_motor"),
value = rnorm(n = 30, mean = 0, sd = 1)
)
ggplot(test, aes(x = value, y = key)) +
geom_density_ridges(scale = 0.9) +
theme_ridges() +
NULL
#> Picking joint bandwidth of 0.525
Add median line:
ggplot(test, aes(x = value, y = key)) +
stat_density_ridges(quantile_lines = TRUE, quantiles = 2, scale = 0.9) +
coord_flip() +
theme_ridges() +
NULL
#> Picking joint bandwidth of 0.525
Simulate a rug:
ggplot(test, aes(x = value, y = key)) +
geom_density_ridges(
jittered_points = TRUE,
position = position_points_jitter(width = 0.05, height = 0),
point_shape = '|', point_size = 3, point_alpha = 1, alpha = 0.7
) +
theme_ridges() +
NULL
#> Picking joint bandwidth of 0.525
Created on 2018-10-16 by the reprex package (v0.2.1.9000)
I think the easiest way to do this is with facet_wrap(). If you don't like the default appearance of the facets you can tweak them with theme(), e.g.:
ggplot(test, aes(x=value, fill=key)) +
geom_density() +
facet_wrap(~ key) +
coord_flip() +
theme(panel.spacing.x = unit(0, "mm"))

How can I flip and then zoom in on a boxplot?

Consider the following code:
library(ggplot2)
ggplot(diamonds, aes("", price)) + geom_boxplot() + coord_flip()
After flipping the box plot, how can I zoom in to c(0,7000) on price (which is the new x-axis)?
I feel like it has something to do with coord_cartesian(ylim=c(0, 7000)), but this doesn't seem to work in conjunction with coord_flip().
Here is my solution:
ggplot(diamonds, aes("", price)) +
geom_boxplot() +
coord_flip(ylim=c(0, 7000))
Just pass ylim as an argument to coord_flip().
You can use scale_y_continuous():
library(ggplot2)
ggplot(diamonds, aes("", price)) +
geom_boxplot() +
coord_flip() +
scale_y_continuous(limits = c(0, 7000))
Remember that coord_flip() just rotates the plot, so you still use the y scale, because price is mapped to y. I usually like to put the scale call last for that reason, to help limit confusion over which axis is which!
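Be aware that scale limits and coordinate limits behave differently here. With the default oob = scales::censor, values outside scale_y_continuous(limits = ...) become NA and are removed before the boxplot statistics are computed, so the box itself changes, whereas coord_flip(ylim = ...) only zooms. A tiny base-R illustration on hypothetical numbers (not the diamonds data):

```r
# Synthetic data with one large value
price_demo <- c(1, 2, 3, 4, 100)

median(price_demo)                    # 3: all data kept, as when only zooming
median(price_demo[price_demo <= 50])  # 2.5: what remains after dropping the 100
```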
I think you need to manually compute the boxplot statistics and plot these.
# Compute summary statistics with max (y100) set to cutoff (7000)
df <- data.frame(x = 1,
y0 = min(diamonds$price),
y25 = quantile(diamonds$price, 0.25),
y50 = median(diamonds$price),
y75 = quantile(diamonds$price, 0.75),
y100 = 7000
)
ggplot(df, aes(x)) +
geom_boxplot(aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
stat = "identity") +
coord_flip()
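Note that df above uses the raw minimum as the lower whisker, whereas geom_boxplot() by default ends the whiskers at the most extreme points within 1.5 * IQR of the hinges. If you want to keep that rule and only cap the upper whisker, something along these lines could build the data frame. capped_box_stats() is a hypothetical helper, uses quantile() with its default method, and assumes the cap lies above the upper quartile.

```r
# Hypothetical helper: quartiles plus whiskers at the most extreme points within
# coef * IQR of the hinges; the upper whisker is additionally capped at `cap`.
capped_box_stats <- function(y, cap, coef = 1.5) {
  q <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)
  fence <- coef * (q[3] - q[1])
  inside <- y[y >= q[1] - fence & y <= q[3] + fence]
  data.frame(x = 1,
             y0 = min(inside),
             y25 = q[1], y50 = q[2], y75 = q[3],
             y100 = min(max(inside), cap))
}

capped_box_stats(1:10, cap = 9)

# Usage (assumption): df <- capped_box_stats(diamonds$price, cap = 7000)
# then plot df with the same geom_boxplot(stat = "identity") call as above.
```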

How to jitter both geom_line and geom_point by the same magnitude?

I have a ggplot2 linegraph with two lines featuring significant overlap. I'm trying to use position_jitterdodge() so that they are more visible, but I can't get the lines and points to both jitter in the same way. I'm trying to jitter the points and line horizontally only (as I don't want to suggest any change on the y-axis). Here is an MWE:
## Create data frames
dimension <- factor(c("A", "B", "C", "D"))
df <- data.frame("dimension" = rep(dimension, 2),
"value" = c(20, 21, 34, 32,
20, 21, 36, 29),
"Time" = c(rep("First", 4), rep("Second", 4)))
## Plot it
ggplot(data = df, aes(x = dimension, y = value,
shape = Time, linetype = Time, group = Time)) +
geom_line(position = position_jitterdodge(dodge.width = 0.45)) +
geom_point(position = position_jitterdodge(dodge.width = 0.45)) +
xlab("Dimension") + ylab("Value")
This produces an ugly plot.
I've obviously got something fundamentally wrong here: What should I do to make the geom_point jitter follow the geom_line jitter?
Another option, for horizontal displacement only, is to create a position_dodge() object and pass it to the position argument of each geom.
pd <- position_dodge(0.4)
ggplot(data = df, aes(x = dimension, y = value,
shape = Time, linetype = Time, group = Time)) +
geom_line(position = pd) +
geom_point(position = pd) +
xlab("Dimension") + ylab("Value")
One solution is to manually jitter the points:
df$value_j <- jitter(df$value)
ggplot(df, aes(dimension, value_j, shape=Time, linetype=Time, group=Time)) +
geom_line() +
geom_point() +
labs(x="Dimension", y="Value")
The horizontal solution for your discrete X axis isn't as clean (it's clean under the covers when ggplot2 does it since it handles the axis and point transformations for you quite nicely) but it's doable:
df$dim_j <- jitter(as.numeric(factor(df$dimension)))
ggplot(df, aes(dim_j, value, shape=Time, linetype=Time, group=Time)) +
geom_line() +
geom_point() +
scale_x_continuous(breaks = 1:4, labels = levels(dimension)) + # one label per original category
labs(x="Dimension", y="Value")
In July 2017, the ggplot2 developers added a seed argument to the position_jitter() function (https://github.com/tidyverse/ggplot2/pull/1996).
So now (here with ggplot2 3.2.1) you can pass seed to position_jitter() to get the same jitter in geom_point() and geom_line() (see the official documentation: https://ggplot2.tidyverse.org/reference/position_jitter.html).
Note that this seed argument does not exist (yet) in geom_jitter.
ggplot(data = df, aes(x = dimension, y = value,
shape = Time, linetype = Time, group = Time)) +
geom_line(position = position_jitter(width = 0.25, height = 0, seed = 123)) + # height = 0: horizontal jitter only
geom_point(position = position_jitter(width = 0.25, height = 0, seed = 123)) +
xlab("Dimension") + ylab("Value")
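The seed helps because the jitter offsets are random draws: fixing the seed makes them reproducible, so two layers that see the same rows in the same order get identical offsets. A base-R sketch of that idea, using a hypothetical jitter_x() helper that draws uniform offsets in [-width, width] (ggplot2 documents its jitter as being added in both directions):

```r
# Hypothetical helper: horizontal jitter drawn from a fixed seed, so repeated
# calls with the same seed and the same number of points give the same offsets.
jitter_x <- function(x, width, seed) {
  set.seed(seed)  # side effect: resets the global random number state
  x + runif(length(x), min = -width, max = width)
}

x <- c(1, 2, 3, 4)
identical(jitter_x(x, 0.25, seed = 123), jitter_x(x, 0.25, seed = 123))  # TRUE
```

This is only to show the mechanism; in real plots, the seed argument of position_jitter() is the one to use.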

Resources