R and ggplot2: controlling scales in facet grid

I am plotting, for a rowing boat, the relation between cadence, speed, and going upstream or downstream, using facets in ggplot2:
library(tidyr)
library(ggplot2)
s <- function(d) ifelse(d < 500, "down", "up")
dist <- seq(1,1000)
cadence <- floor(25 + 7 * sin(dist/250))
speed <- 4+sin(dist/250)*2
stream <- s(dist)
df <- data.frame(dist,cadence,speed,stream)
g <- gather(df, variable, value, speed, cadence)
p <- ggplot(g, aes(dist, value)) +
geom_line(aes(color=stream, group=1)) +
scale_x_continuous(name = "distance [m]") +
# scale_y_continuous(sec.axis = sec_axis(~ (500/.),
# name = "split [s/500m]",
# breaks=c(90,95,100,105,110,115,120,125,130,140,150)),
# limits=c(3.0,5.5),name="speed [m/s]") +
facet_grid(variable ~ ., scales = "free_y")
I rely on scales="free_y" to get good automatic scaling of the Y axis. However, I would like to have more control and don't know how to achieve this:
I would like to limit each y axis individually, if possible
I would like to add a second y axis for the speed plot, showing a derived pace of time per 500m.
I know how to do this in individual plots but the facet grid makes sure the plots are properly aligned along the distance.

One option is the ggh4x package, which offers facetted_pos_scales() to specify the scale individually for each facet and/or to add a secondary scale. Note, however, that we are still in the world of facets, so you won't be able to set the axis titles individually. To achieve that, I would suggest using patchwork.
library(ggh4x)
library(ggplot2)
ggplot(g, aes(dist, value)) +
geom_line(aes(color = stream, group = 1)) +
scale_x_continuous(name = "distance [m]") +
facet_grid(variable ~ ., scales = "free_y") +
facetted_pos_scales(
y = list(
variable == "speed" ~ scale_y_continuous(
limits = c(3, 5.5), name = "speed [m/s]",
sec.axis = sec_axis(~ (500 / .), name = "split [s/500m]", breaks = c(90, 95, 100, 105, 110, 115, 120, 125, 130, 140, 150))
)
)
) +
theme(strip.placement = "outside")
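If individual axis titles are needed, the patchwork route mentioned above could look roughly like the following. This is a minimal sketch rather than a drop-in replacement: it rebuilds the question's data in wide form, and the cadence axis title and cadence limits are my own illustrative choices. patchwork aligns the panels of the stacked plots, so the distance axes line up.

```r
library(ggplot2)
library(patchwork)

# Same simulated data as in the question, kept in wide form
dist <- seq(1, 1000)
df <- data.frame(
  dist    = dist,
  cadence = floor(25 + 7 * sin(dist / 250)),
  speed   = 4 + sin(dist / 250) * 2,
  stream  = ifelse(dist < 500, "down", "up")
)

# Speed panel: its own title plus a secondary axis derived from speed
p_speed <- ggplot(df, aes(dist, speed)) +
  geom_line(aes(color = stream, group = 1)) +
  scale_y_continuous(
    name = "speed [m/s]",
    sec.axis = sec_axis(~ 500 / ., name = "split [s/500m]")
  ) +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_blank(),
        axis.ticks.x = element_blank())

# Cadence panel: its own title and its own limits (limits are illustrative)
p_cadence <- ggplot(df, aes(dist, cadence)) +
  geom_line(aes(color = stream, group = 1)) +
  scale_x_continuous(name = "distance [m]") +
  scale_y_continuous(name = "cadence", limits = c(15, 35))

# Stack the two plots; patchwork aligns the panels and collects the legend
combined <- p_speed / p_cadence + plot_layout(guides = "collect")
combined
```

Because each plot carries its own scale_y_continuous(), limits, breaks and secondary axes can be set per panel without affecting the other.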

Related

Use free_y scale on first axis and fixed on second + facet_grid + ggplot2

Is there any method to set scale = 'free_y' on the left hand (first) axis in ggplot2 and use a fixed axis on the right hand (second) axis?
I have a dataset where I need to use free scales for one variable and fixed for another but represent both on the same plot. To do so I'm trying to add a second, fixed, y-axis to my data. The problem is I cannot find any method to set a fixed scale for the 2nd axis and have that reflected in the facet grid.
This is the code I have so far to create the graph -
#plot weekly seizure date
p <- ggplot(dfspw_all, aes(x=WkYr, y=Seizures, group = 1)) + geom_line() +
xlab("Week Under Observation") + ggtitle("Average Seizures per Week - To Date") +
geom_line(data = dfsl_all, aes(x =WkYr, y = Sleep), color = 'green') +
scale_y_continuous(
# Features of the first axis
name = "Seizures",
# Add a second axis and specify its features
sec.axis = sec_axis(~.[0:20], name="Sleep")
)
p + facet_grid(vars(Name), scales = "free_y") +
theme(axis.ticks.x=element_blank(),axis.text.x = element_blank())
This is what it is producing (some details omitted from code for simplicity) -
What I need is for the scale on the left to remain "free" and the scale on the right to range from 0-24.
Secondary axes are implemented in ggplot2 as a decoration that is a transformation of the primary axis, so I don't know an elegant way to do this, since it would require the secondary axis formula to be aware of different scaling factors for each facet.
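To make the "transformation of the primary axis" point concrete, here is a minimal single-panel sketch with made-up data. The secondary series is multiplied by a factor k so that it fits the primary axis, and sec_axis() applies the inverse so the right-hand labels read in the original 0-24 units. The names one, k and p_one are hypothetical. With facets, k would have to differ per panel, which is what a single sec_axis() formula cannot express.

```r
library(ggplot2)

# Made-up data: a primary series and a secondary series on a 0-24 scale
one <- data.frame(x = 1:10)
one$y_prim <- (1 + sin(one$x)) * 3
one$y_sec  <- (1 + sin(one$x * 3)) * 12

# Factor that maps the fixed 0-24 range onto the primary series' range
k <- max(one$y_prim) / 24

p_one <- ggplot(one, aes(x)) +
  geom_line(aes(y = y_prim)) +
  geom_line(aes(y = y_sec * k), color = "green") +
  scale_y_continuous(
    name = "primary",
    # inverse of the rescaling above, so labels show the original units
    sec.axis = sec_axis(~ . / k, name = "secondary (0-24)")
  )
p_one
```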
Here's a hacky approach where I scale each secondary series to its respective primary series, and then add some manual annotations for the secondary series. Another way might be to make the plots separately for each facet and use patchwork to combine them.
Given some fake data where the facets have different ranges for the primary series but the same range for the secondary series:
library(tidyverse)
fake <- tibble(facet = rep(1:3, each = 10),
x = rep(1:10, times = 3),
y_prim = (1+sin(x))*facet/2,
y_sec = (1 + sin(x*3))/2)
ggplot(fake, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec), color = "green") +
facet_wrap(~facet, ncol = 1)
...we could scale each secondary series to its primary series, and add custom annotations for that secondary series:
fake2 <- fake %>%
group_by(facet) %>%
mutate(y_sec_scaled = y_sec/max(y_sec) * (max(y_prim))) %>%
ungroup()
fake2_labels <- fake %>%
group_by(facet) %>%
summarize(max_prim = max(y_prim), baseline = 0, x_val = 10.5)
ggplot(fake2, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec_scaled), color = "green") +
facet_wrap(~facet, ncol = 1, scales = "free_y") +
geom_text(data = fake2_labels, aes(x = x_val, y = max_prim, label = "100%"),
hjust = 0, color = "green") +
geom_text(data = fake2_labels, aes(x = x_val, y = baseline, label = "0%"),
hjust = 0, color = "green") +
coord_cartesian(xlim = c(0, 10), clip = "off") +
theme(plot.margin = unit(c(1,3,1,1), "lines"))

How can I get the real scale from a facet_grid plot in R?

I am trying to add captions as it appears in this post.
For that reason, I need the real scale of the plot (x and y axis) when I am using facet_grid. I know that I can use layer_data, since it saves everything from the plot... However, it is not really accurate, because when I try to establish the limits using min and max from that output, the plot changes.
Here you have an example:
library(ggplot2)
library(dplyr)
val1 <- c(2.1490626,2.2035281,1.5927854,3.1399245,2.3967338,3.7915825,4.6691277,3.0727319,2.9230937,2.6239759,3.7664386,4.0160378,1.2500835,4.7648343,0.0000000,5.6740227,2.7510256,3.0709322,2.7998003,4.0809085,2.5178086,5.9713330,2.7779843,3.6724801,4.2648527,3.6841084,2.5597235,3.8477471,2.6587736,2.2742209,4.5862788,6.1989269,4.1167091,3.1769325,4.2404515,5.3627032,4.1576810,4.3387921,1.4024381,0.0000000,4.3999099,3.4381837,4.8269218,2.6308474,5.3481382,4.9549753,4.5389650,1.3002293,2.8648220,2.4015338,2.0962332,2.6774765,3.0581759,2.5786137,5.0539080,3.8545796,4.3429043,4.2233248,2.0434363,4.5980727)
val2 <- c(3.7691229,3.6478055,0.5435826,1.9665861,3.0802654,1.2248374,1.7311236,2.2492826,2.2365337,1.5726119,2.0147144,2.3550348,1.9527204,3.3689502,1.7847986,3.5901329,1.6833872,3.4240479,1.8372175,0.0000000,2.5701453,3.6551315,4.0327091,3.8781182)
df1 <- data.frame(value = val1)
df2 <- data.frame(value = val2)
data <- bind_rows(lst(df1, df2), .id = 'id')
data$Sex <- rep(c("Male", "Female"), times=84/2)
p <- data %>%
ggplot(aes(value)) +
geom_density(lwd = 1.2, colour="red", show.legend = FALSE) +
geom_histogram(aes(y=..density.., fill = id), bins=10, col="black", alpha=0.2) +
facet_grid(id ~ Sex ) +
xlab("type_data") +
ylab("Density") +
ggtitle("title") +
guides(fill=guide_legend(title="legend_title")) +
theme(strip.text.y = element_blank())
p
plot_info <- layer_data(p)
> min(plot_info$density)
[1] 7.166349e-09
> max(plot_info$density)
[1] 0.5738021
As you can see in the plot, the y-axis starts at 0 and finishes at around 0.7, more or less. However, the maximum density is 0.57.
If I try to use the info from layer_data:
p + coord_cartesian(clip="off", ylim=c(min(plot_info$density), max(plot_info$density)),
xlim = c(min(plot_info$x), max(plot_info$x)))
The plot changes completely.
Does anyone know how I can get the scales that ggplot2 and facet_grid are using? I need the information for the density (y axis) and for the x axis.
Yes, to get the scales directly, use layer_scales(p), which gives you the range of each scale across all layers, rather than the data of a single layer (the first one by default), which is what you get from layer_data(p):
p + coord_cartesian(clip = "off",
ylim = layer_scales(p)$y$range$range,
xlim = layer_scales(p)$x$range$range)
Or, to combine this question with your last, where you add the text labels outside of the plotting panels, your result might be something like:
p + coord_cartesian(clip = "off",
ylim = layer_scales(p)$y$range$range,
xlim = layer_scales(p)$x$range$range) +
geom_text(data = data.frame(value = c(0, 6), id = c("df2", "df2"),
Sex = c('Female', 'Male')),
aes(y = -0.15, label = c('Female', 'Male')))
Does this help?
?layer_data
summary(layer_data(p, i = 2))
Here i is the index of the layer you want to return.
You can then take the min of xmin, the max of xmax, etc.
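As a small illustration of the difference, the sketch below uses the built-in mtcars data (not the data from the question) with a density layer and a histogram layer. layer_data() returns one layer at a time, the first by default, whereas the y scale is trained on all layers, so its range can extend beyond the maximum of the density curve alone. after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

p_demo <- ggplot(mtcars, aes(mpg)) +
  geom_density() +
  geom_histogram(aes(y = after_stat(density)), bins = 10, alpha = 0.2) +
  facet_grid(am ~ vs)

dens_layer <- layer_data(p_demo, i = 1)  # geom_density (the default layer)
hist_layer <- layer_data(p_demo, i = 2)  # geom_histogram

# Range the y scale was trained on (all layers, all panels, before expansion)
scale_y <- layer_scales(p_demo)$y$range$range

max(dens_layer$density)  # tallest point of the density curves
max(hist_layer$ymax)     # tallest histogram bar
scale_y                  # covers both layers
```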

How do I correctly connect data points ggplot

I am making a stratigraphic plot but somehow, my data points don't connect correctly.
The purpose of this plot is that the values on the x-axis are connected so you get an overview of the change in d18O throughout time (age, ma).
I've used the following script:
library(readxl)
R_pliocene_tot <- read_excel("Desktop/R_d18o.xlsx")
View(R_pliocene_tot)
install.packages("analogue")
install.packages("gridExtra")
library(tidyverse)
R_pliocene_Rtot <- R_pliocene_tot %>%
gather(key=param, value=value, -age_ma)
R_pliocene_Rtot
R_pliocene_Rtot %>%
ggplot(aes(x=value, y=age_ma)) +
geom_path() +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
which leads to the following figure:
Something is wrong with the geom_path function, I guess, but I can't figure out what it is.
Though the comments seem to have solved the problem, I don't think the question asked was actually answered. So here is a short introduction to how geom_path works in ggplot2.
library(dplyr)
library(ggplot2)
# This dataset contains two groups with random values for y, and x running from 1 to 20.
# The param column just replicates the param variable from the question.
df <- tibble(x = rep(seq(1, 20, by = 1), 2),
y = runif(40, min = 1, max = 100),
group = c(rep("group 1", 20), rep("group 2", 20)),
param = rep("a param", 40))
df %>%
ggplot(aes(x = x, y = y)) +
# geom_path has a group aesthetic, which tells the function
# which data points belong to which path.
# Points in the same group are connected together.
# Here I use color to help distinguish the paths a bit more.
geom_path(aes(group = group, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
Since your data works well with group = 1, I guess all data points belong to one group and you just want to draw a line connecting all of them. Taking my example data above and drawing it with the aesthetic group = 1, you can see that the result has two lines similar to the example above, but now the end point of group 1 is connected to the starting point of group 2.
So all data points are now on one path, but the order in which they are drawn depends on the order in which they appear in the data. (I keep the color just to make this a bit clearer.)
df %>%
ggplot(aes(x = x, y = y)) +
geom_path(aes(group = 1, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
Hope this gives you a better understanding of ggplot2::geom_path.
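For the stratigraphic plot in the question, this suggests two possible fixes, sketched here with made-up values (the column names age_ma, param and value follow the question; the numbers and the param labels are hypothetical). Either sort the rows by age before calling geom_path(), or use geom_line(orientation = "y"), which connects points in order of the y variable. The orientation argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(dplyr)

# Hypothetical long-format data, deliberately not sorted by age
strat <- tibble(
  age_ma = c(3.2, 2.6, 3.0, 2.8, 3.4, 2.6, 3.4, 3.0, 2.8, 3.2),
  param  = rep(c("d18O_a", "d18O_b"), each = 5),
  value  = c(3.1, 2.7, 3.4, 2.9, 3.3, 1.2, 1.6, 1.1, 1.4, 1.3)
)

# Option 1: geom_path() connects rows in data order, so sort by age first
strat_sorted <- strat %>% arrange(param, age_ma)
p_path <- ggplot(strat_sorted, aes(x = value, y = age_ma)) +
  geom_path() +
  geom_point() +
  facet_wrap(~param, scales = "free_x") +
  scale_y_reverse() +
  labs(x = NULL, y = "Age (ma)")

# Option 2: geom_line() ordered along y instead of x, no sorting needed
p_line <- ggplot(strat, aes(x = value, y = age_ma)) +
  geom_line(orientation = "y") +
  geom_point() +
  facet_wrap(~param, scales = "free_x") +
  scale_y_reverse() +
  labs(x = NULL, y = "Age (ma)")

p_path
p_line
```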

Can you remove the space between axis and data in ggplot with discrete scales

Is it possible to reduce the space between the axis and the data in a ggplot (point) with a discrete scale? I've seen a lot of ways of doing this with discrete scales but can't seem to get it to work in this situation (discrete scale with 1 value on the axis).
I have tried converting to numeric and using expand = c(0, 0), based on this response. It works when applied to the x axis but not the y axis (possibly because there is only one break on the y axis?).
library(tidyverse)
df <- tibble(Site = rep("A", 4),
Basin = rep("B1", 4),
Variable = c("V1", "V2", "V3", "V4"),
cls = c("up", "down", "ns", "up"))
#basic plot (output above)
p <- ggplot(df , aes(x=Variable, y=Site)) +
geom_point(size=3, aes(color=cls, shape=cls, fill=cls)) +
facet_grid(Basin~., scales="free", space="free")
p
#attempting to convert discrete scale to continuous and apply limits to axis
p2 <- ggplot(df , aes(x=Variable, y=c(as.numeric(factor(df$Site)))) +
geom_point(size=3, aes(color=cls, shape=cls, fill=cls)) +
scale_y_continuous(limits=c(0.95,1.05),breaks=1,labels=levels(factor(df$Site)), expand=c(0,0))+
facet_grid(Basin ~ ., scales="free", space="free")
#I have also tried using
scale_y_discrete(expand=expand_scale(mult = c(0.01, .01)))
# when the same method is applied to the x axis it seems to work - is this because there is more than 1 break/variable?
# (Variable is a character column, so it has to go through factor() before as.numeric())
p3 <- ggplot(df , aes(x=as.numeric(factor(df$Variable)), y=as.numeric(factor(df$Site)))) +
geom_point(size=3, aes(color=cls, shape=cls, fill=cls)) +
scale_x_continuous(limits=c(0.95,4.05),breaks=seq_along(unique(df$Variable)),labels=levels(factor(df$Variable)),
expand=c(0,0))+
facet_grid(Basin~., scales="free", space="free")
You can use expand_scale() to adjust the padding around the data (in ggplot2 3.3.0 and later the function is called expansion(), and expand_scale() is deprecated). The mult argument gives multiplicative expansion factors for the lower and upper ends of the scale, expressed as fractions of the scale range: mult = c(0.2, 1) pads the bottom by 20% of the range and the top by 100% of the range.
ggplot(df, aes(x = Variable, y = Site)) +
geom_point(aes(color = cls, shape = cls, fill = cls), size = 3) +
facet_grid(Basin ~ ., scales = "free", space = "free") +
scale_y_discrete(expand = expand_scale(mult = c(0.2, 1)))
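In ggplot2 3.3.0 and later the same helper is spelled expansion(). The sketch below is only meant to show the call, applied to the x axis of the question's data, where the default padding of a discrete scale (add = 0.6) is reduced to 0.2. The value 0.2 is an arbitrary choice, and the effect on the single-level y axis is not covered here.

```r
library(ggplot2)

# Data from the question, as a plain data frame
df <- data.frame(
  Site     = rep("A", 4),
  Basin    = rep("B1", 4),
  Variable = c("V1", "V2", "V3", "V4"),
  cls      = c("up", "down", "ns", "up")
)

# expansion() returns c(mult_lower, add_lower, mult_upper, add_upper)
pad <- expansion(add = 0.2)

p_pad <- ggplot(df, aes(x = Variable, y = Site)) +
  geom_point(aes(color = cls, shape = cls), size = 3) +
  facet_grid(Basin ~ ., scales = "free", space = "free") +
  # additive padding of 0.2 units on both sides of the discrete x axis
  scale_x_discrete(expand = pad)
p_pad
```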

How to apply separate coord_cartesian() to "zoom in" into individual panels of a facet_grid()?

Inspired by the Q Finding the elbow/knee in a curve I started to play around with smooth.spline().
In particular, I want to visualize how the parameter df (degrees of freedom) influences the approximation and the first and second derivative. Note that this Q is not about approximation but about a specific problem (or edge case) in visualisation with ggplot2.
First attempt: simple facet_grid()
library(ggplot2)
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
dp is a data.table containing the data points for which an approximation is sought and ap is a data.table with the approximated data plus the derivatives (data are given below).
For each row, facet_grid() with scales = "free_y" has chosen a scale which displays all data. Unfortunately, one panel has kind of "outliers" which make it difficult to see details in the other panels. So, I want to "zoom in".
"Zoom in" using coord_cartesian()
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 50))
With the manually selected range, more details in the panels of row 3 have been made visible. But the limit has been applied to all panels of the grid, so in row 1 details can hardly be distinguished.
What I'm looking for is a way to apply coord_cartesian() with specific parameters separately to each individual panel (or group of panels, e.g., rowwise) of the grid. For instance, is it possible to manipulate the ggplot object afterwards?
Workaround: Combine individual plots with cowplot
As a workaround, we can create three separate plots and combine them afterwards using the cowplot package:
g0 <- ggplot(ap[deriv == 0], aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
g1 <- ggplot(ap[deriv == 1], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-50, 50))
g2 <- ggplot(ap[deriv == 2], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 100))
cowplot::plot_grid(g0, g1, g2, ncol = 1, align = "v")
Unfortunately, this solution requires writing code to create three separate plots, and it duplicates strips and axes and adds whitespace which isn't available for displaying the data.
Is facet_wrap() an alternative?
We can use facet_wrap() instead of facet_grid():
ggplot(ap, aes(x, y)) +
# geom_point(data = dp, alpha = 0.2) + # this line causes error message
geom_line() +
facet_wrap(~ deriv + df, scales = "free_y", labeller = label_both, nrow = 3) +
theme_bw()
Now, the y-axes of every panel are scaled individually exhibiting details of some of the panels. Unfortunately, we still can't "zoom in" into the bottom right panel because using coord_cartesian() would affect all panels.
In addition, the line
geom_point(data = dp, alpha = 0.2)
strangely causes
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
I had to comment this line out, so the data points which are to be approximated are not displayed.
Data
library(data.table)
# data points
dp <- data.table(
x = c(6.6260, 6.6234, 6.6206, 6.6008, 6.5568, 6.4953, 6.4441, 6.2186,
6.0942, 5.8833, 5.7020, 5.4361, 5.0501, 4.7440, 4.1598, 3.9318,
3.4479, 3.3462, 3.1080, 2.8468, 2.3365, 2.1574, 1.8990, 1.5644,
1.3072, 1.1579, 0.95783, 0.82376, 0.67734, 0.34578, 0.27116, 0.058285),
y = 1:32,
deriv = 0)
# approximated data points and derivatives
ap <- rbindlist(
lapply(seq(2, length(dp$x), length.out = 4),
function(df) {
rbindlist(
lapply(0:2,
function(deriv) {
result <- as.data.table(
predict(smooth.spline(dp$x, dp$y, df = df), deriv = deriv))
result[, c("df", "deriv") := list(df, deriv)]
})
)
})
)
Late answer, but the following hack just occurred to me. Would it work for your use case?
Step 1. Create an alternative version of the intended plot, limiting the range of y values such that scales = "free_y" gives a desired scale range for each facet row. Also create the intended facet plot with the full data range:
library(ggplot2)
library(dplyr)
# alternate plot version with truncated data range
p.alt <- ap %>%
group_by(deriv) %>%
mutate(upper = quantile(y, 0.75),
lower = quantile(y, 0.25),
IQR.multiplier = (upper - lower) * 10) %>%
ungroup() %>%
mutate(is.outlier = y < lower - IQR.multiplier | y > upper + IQR.multiplier) %>%
mutate(y = ifelse(is.outlier, NA, y)) %>%
ggplot(aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
# intended plot version with full data range
p <- p.alt %+% ap
Step 2. Use ggplot_build() to generate plot data for both ggplot objects. Apply the panel parameters of the alt version onto the intended version:
p <- ggplot_build(p)
p.alt <- ggplot_build(p.alt)
p$layout$panel_params <- p.alt$layout$panel_params
rm(p.alt)
Step 3. Build the intended plot from the modified plot data, & plot the result:
p <- ggplot_gtable(p)
grid::grid.draw(p)
Note: in this example, I truncated the data range by setting all values more than 10*IQR away from the upper / lower quartile in each facet row as NA. This can be replaced by any other logic for defining outliers.
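Another possible route, building on the ggh4x package used elsewhere on this page, is to give a facet row its own y scale with facetted_pos_scales() and keep out-of-bounds values with scales::oob_keep, so that the limits behave like a zoom rather than dropping data. This is a sketch on made-up data: the data frame toy and the limits are hypothetical, and it assumes a ggh4x version that supports the formula interface.

```r
library(ggplot2)
library(ggh4x)

# Made-up data: three facet rows, one panel with an extreme spike
toy <- expand.grid(x = seq(0, 6, by = 0.1), deriv = 0:2, df = c(2, 12))
toy$y <- sin(toy$x) * 10^toy$deriv
toy$y[toy$deriv == 2 & toy$df == 12 & toy$x > 5.85] <- -5000

p_zoom <- ggplot(toy, aes(x, y)) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  facetted_pos_scales(
    y = list(
      # Only the row with deriv == 2 gets explicit limits; oob_keep keeps
      # the out-of-range points, so the line runs out of the panel
      # instead of being broken by removed values
      deriv == 2 ~ scale_y_continuous(limits = c(-200, 100),
                                      oob = scales::oob_keep)
    )
  ) +
  theme_bw()
p_zoom
```

Rows without a matching formula keep the ordinary free scale, so only the row with the outliers is affected.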
