Pad (expand) only the top of continuous scale in ggplot2 [duplicate] - r

This question already has an answer here:
How to expand axis asymmetrically with ggplot2 without setting limits manually?
(1 answer)
Closed 7 years ago.
I've got some data that share a common x-axis but have two different y variables:
set.seed(42)
data = data.frame(
x = rep(2000:2004, 2),
y = c(rnorm(5, 20, 5), rnorm(5, 150, 15)),
var = rep(c("A", "B"), each = 5)
)
I'm using a faceted line plot to display the data:
p = ggplot(data, aes(x, y)) +
geom_line() +
facet_grid(var ~ ., scales = "free_y")
I'd like the y-axis to include 0. This is easy enough:
p + expand_limits(y = 0)
but then my data looks crowded too close to the top of my facets. So I'd like to pad the range of the axis. Normally scale_y_continuous(expand = ...) is used for padding the axis, but the padding is applied symmetrically to the top and bottom, making the y-axis go well below 0.
p + expand_limits(y = 0) +
scale_y_continuous(expand = c(0.3, 0.2))
# the order of expand_limits and scale_y_continuous
# does not change the output
I can't explicitly set limits because of the facets with free y scales. What's the best way to have the y-scale extend down to 0 (not below!), while multiplicatively padding the top of the y scale?

You could create an extra data set with a single point for each facet and plot it invisibly with geom_blank(). The point is chosen to be a fixed factor larger than the maximum value in the given facet. Here, I choose that factor to be 1.5 to make the effect clearly visible:
max_data <- aggregate(y ~ var, data = data, FUN = function(y) max(y) * 1.5)
max_data <- transform(max_data, x = 2000)
p + geom_blank(data = max_data)
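In more recent ggplot2 releases (3.3.0 and later, as far as I know) the padding can also be set separately for each end of the scale with expansion(). The following is only a minimal sketch, assuming such a version is installed and reusing the question's data; the names p_padded and y_ranges are mine, not part of the original answer:

```r
library(ggplot2)

set.seed(42)
data <- data.frame(
  x = rep(2000:2004, 2),
  y = c(rnorm(5, 20, 5), rnorm(5, 150, 15)),
  var = rep(c("A", "B"), each = 5)
)

p_padded <- ggplot(data, aes(x, y)) +
  geom_line() +
  facet_grid(var ~ ., scales = "free_y") +
  # make sure 0 is part of every facet's y range
  expand_limits(y = 0) +
  # no padding at the bottom, 30% multiplicative padding at the top
  scale_y_continuous(expand = expansion(mult = c(0, 0.3)))

# read back the y range that each panel actually uses
y_ranges <- lapply(ggplot_build(p_padded)$layout$panel_params,
                   function(pp) pp$y.range)
y_ranges
```

The lower multiplier of 0 keeps each axis at exactly 0, while the upper multiplier pads each facet relative to its own range, so the free y scales keep working without any manually set limits.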

Related

How to make histogram in ggplot2 start at zero on X axis? [duplicate]

This question already has answers here:
How to align the bars of a histogram with the x axis?
(5 answers)
Closed 1 year ago.
When I use geom_histogram() to plot a histogram, the bars do not start at zero as I would expect. See my example below:
set.seed(20)
randomnum <- rnorm(40)
data <- data.frame(number = randomnum[randomnum > 0])
ggplot(data, aes(x = number)) +
geom_histogram(color="black", fill="grey40")
Even though the data contain no negative numbers, the histogram starts at a negative value. The code below makes this clearer:
ggplot(data, aes(x = number)) +
geom_histogram(color="black", fill="grey40", binwidth = 0.1) +
scale_x_continuous(breaks = c(0, seq(0, 2, 0.1)))
The histogram starts at -0.1, but the original data contain no negative values.
Ideally, the x-axis would start at 0 and every bar would have the same width.
There are some similar questions on Stack Overflow (link1 and link2). They change the parameters of scale_x_continuous(), but that only changes the axis scale and labels and does not solve my problem.
OK, the solution is:
ggplot(data, aes(x = number)) +
geom_histogram(color="black", fill="grey40", binwidth = 0.1,
boundary = 0, closed = "left") +
scale_x_continuous(breaks = c(0, seq(0, 2, 0.1)))
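The boundary argument is the key: boundary = 0 places a bin edge exactly at 0, and closed = "left" makes each bin include its left edge.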
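To check what boundary does, the bin edges can be read back from the built plot. This is a small sketch that reuses the question's data; the names hist_data, h_boundary and bin_left are my own and only serve the illustration:

```r
library(ggplot2)

set.seed(20)
randomnum <- rnorm(40)
hist_data <- data.frame(number = randomnum[randomnum > 0])

h_boundary <- ggplot(hist_data, aes(x = number)) +
  geom_histogram(binwidth = 0.1, boundary = 0, closed = "left")

# left edges of the bins computed by stat_bin
bin_left <- ggplot_build(h_boundary)$data[[1]]$xmin

# with boundary = 0, every edge should sit on a multiple of the binwidth,
# so no bar can start below 0 for strictly positive data
min(bin_left)
all(abs(bin_left / 0.1 - round(bin_left / 0.1)) < 1e-6)
```

Without boundary (or center), ggplot2 centres the bins on multiples of the binwidth, which is why the first bar in the question reaches below zero.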

facet_zoom() while setting axis limits

I would like to use facet_zoom() to zoom in on part of an axis that has limits explicitly set. However, using scale_*(limits = *) and coord_cartesian(xlim = *) overrides the zoomed facet's scales as well such that both have the same limits. Is there a way around this? Maybe I could add some data points near the limits and then set their alpha = 0... Any other ideas?
library(ggplot2)
library(ggforce)
# works with no limits specified
ggplot(mpg, aes(x = hwy, y = cyl)) +
geom_point() +
facet_zoom(xlim = c(20, 25))
# fails with limits specified
ggplot(mpg, aes(x = hwy, y = cyl)) +
scale_x_continuous(limits = c(0, 50)) +
geom_point() +
facet_zoom(xlim = c(20, 25))
# fails with coord_cartesian()
ggplot(mpg, aes(x = hwy, y = cyl)) +
scale_x_continuous() +
coord_cartesian(xlim = c(0, 50)) +
geom_point() +
facet_zoom(xlim = c(20, 25))
I don't have enough knowledge of the underlying intricacies in FacetZoom, but you can check if the following workarounds provide a reasonable starting point.
Plot for demonstration
One of the key differences between setting limits in scale_* vs. coord_* is the clipping effect (illustrated in the ggplot2 cheatsheet): limits set in a scale drop the data outside the range, while limits set in a coord only zoom the view. Since this effect isn't really clear in a scatterplot, I added a geom_line layer and adjusted the specified limits so that they extend beyond the data range on one end of the x-axis and clip the data on the other end.
p <- ggplot(mpg, aes(x = hwy, y = cyl)) +
geom_point() +
geom_line(aes(colour = fl), size = 2) +
facet_zoom(xlim = c(20, 25)) +
theme_bw()
# normal zoomed plot / zoomed plot with limits set in scale / coord
p0 <- p
p1 <- p + scale_x_continuous(limits = c(0, 35))
p2 <- p + coord_cartesian(xlim = c(0, 35))
We can see that while p0 behaves as expected, both p1 & p2 show both the original facet (top) & the zoomed facet (bottom) with the same range of c(0, 35).
In p1's case, the shaded box also expanded to cover the entire top facet. In p2's case, the zoom box stayed in exactly the same position as p0, & as a result no longer covers the zoomed range of c(20, 25).
Workaround for limits set in scale_*
# convert ggplot objects to form suitable for rendering
gp0 <- ggplot_build(p0)
gp1 <- ggplot_build(p1)
# re-set zoomed facet's limits to match zoomed range
k <- gp1$layout$layout$SCALE_X[gp1$layout$layout$name == "x"]
gp1$layout$panel_scales_x[[k]]$limits <- gp1$layout$panel_scales_x[[k]]$range$range
# re-set zoomed facet's panel parameters based on original version p0
k <- gp1$layout$layout$PANEL[gp1$layout$layout$name == "x"]
gp1$layout$panel_params[[k]] <- gp0$layout$panel_params[[k]]
# convert built ggplot object to gtable of grobs as usual & print result
gt1 <- ggplot_gtable(gp1)
grid::grid.draw(gt1)
The zoomed facet now shows the zoomed range c(20, 25) correctly, while the shaded box shrinks to cover the correct range in the original facet. Since this method removes unseen data points, all lines in the original facet stay within the confines of the facet.
Workaround for limits set in coord_*
# convert ggplot objects to form suitable for rendering
gp0 <- ggplot_build(p0)
gp2 <- ggplot_build(p2)
# apply coord limits to original facet's scale limits
k <- gp2$layout$layout$SCALE_X[gp2$layout$layout$name == "orig"]
gp2$layout$panel_scales_x[[k]]$limits <- gp2$layout$coord$limits$x
# re-set zoomed facet's panel parameters based on original version without setting
# limits in coord
k <- gp2$layout$layout$PANEL[gp2$layout$layout$name == "x"]
gp2$layout$panel_params[[k]] <- gp0$layout$panel_params[[k]]
# convert built ggplot object to gtable of grobs as usual,
# & print result
gt2 <- ggplot_gtable(gp2)
grid::grid.draw(gt2)
The zoomed facet now shows the zoomed range c(20, 25) correctly, while the shaded box shifts to cover the correct range in the original facet. Since this method includes unseen data points, some lines in the original facet extend beyond the facet's confines.
Note: These workarounds should work with zoom in y + limits set in y-axis as well, as long as all references to "x" / panel_scales_x / SCALE_X above are changed to "y" / panel_scales_y / SCALE_Y. I haven't tested this for other combinations such as zoom in both x & y, but the broad principle ought to be similar.

Add new geom as new row in ggplot2, preventing layering of plots

I am pretty sure that this is easy to do, but I can't seem to find a proper way to phrase this question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), effectively creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())

R - creating a bar and line on same chart, how to add a second y axis

I'm trying to create a ggplot2 graph showing a bar graph and a line graph overlaying each other. In Excel this would be done by adding a second axis.
The x axis represents product type, the y values of the bar graph should represent revenue, and the line graph I want to represent profit margin as a percentage. The value of the line graph and the bar chart should be independent of each other, i.e. there is no such relationship.
require(ggplot2)
df <- data.frame(x = c(1:5), y = abs(rnorm(5)*100))
df$y2 <- abs(rnorm(5))
ggplot(df, mapping= aes(x=as.factor(`x`), y = `y`)) +
geom_col(aes(x=as.factor(`x`), y = `y`),fill = 'blue')+
geom_line(mapping= aes(x=as.factor(`x`), y = `y`),group=1) +
geom_label(aes(label= round(y2,2))) +
scale_y_continuous() +
theme_bw() +
theme(axis.text.x = element_text(angle = 20,hjust=1))
The plot produced by the code above is almost what I want. However, the scaling is incorrect: I need the values 1.38 and 0.23 to be ordered by magnitude, i.e. the point 0.23 should appear below 1.38. I am also not sure how to add another axis on the right-hand side.
Starting with version 2.2.0 of ggplot2, it is possible to add a secondary axis - see this detailed demo. Also, some already answered questions with this approach: here, here, here or here. An interesting discussion about adding a second OY axis here.
The main idea is that one needs to apply a transformation for the second OY axis. In the example below, the transformation factor is the ratio between the max values of each OY axis.
# Prepare data
library(ggplot2)
set.seed(2018)
df <- data.frame(x = c(1:5), y = abs(rnorm(5)*100))
df$y2 <- abs(rnorm(5))
# The transformation factor
transf_fact <- max(df$y)/max(df$y2)
# Plot
ggplot(data = df,
mapping = aes(x = as.factor(x),
y = y)) +
geom_col(fill = 'blue') +
# Apply the factor on values appearing on second OY axis
geom_line(aes(y = transf_fact * y2), group = 1) +
# Add second OY axis; note the transformation back (division)
scale_y_continuous(sec.axis = sec_axis(trans = ~ . / transf_fact,
name = "Second axis")) +
geom_label(aes(y = transf_fact * y2,
label = round(y2, 2))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 20, hjust = 1))
But if you want a particular fixed correspondence, say value 100 on Y1 should line up with value 1 on Y2 (200 with 2 and so on), then change the transformation (multiplication) factor to 100 (100/1): transf_fact <- 100/1.
The advantage of transf_fact <- max(df$y)/max(df$y2) is that it uses the plotting area in an optimal way when combining two different scales. Try something like transf_fact <- 1000/1 and I think you'll get the idea.
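The arithmetic behind the factor can be checked without any plotting. Below is a minimal sketch with made-up numbers (the vectors revenue and margin are hypothetical, not taken from the question's data): the line is drawn at margin * transf_fact on the primary scale, and the secondary axis divides by the same factor to label it in the original units.

```r
# hypothetical values for the two series
revenue <- c(50, 120, 80)     # bar heights, primary axis
margin  <- c(0.4, 1.2, 0.9)   # line values, secondary axis

# factor that maps the largest margin onto the largest revenue
transf_fact <- max(revenue) / max(margin)

# position of the line on the primary (revenue) scale
line_pos <- margin * transf_fact

# what the secondary axis shows at those positions (division undoes it)
sec_labels <- line_pos / transf_fact

max(line_pos)                        # same height as the tallest bar
isTRUE(all.equal(sec_labels, margin))
```

With a fixed factor such as 100/1 the same round trip holds, only the line no longer necessarily spans the full height of the panel.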

How do I anchor one side of axis limits? [duplicate]

This question already has answers here:
Force the origin to start at 0
(4 answers)
Closed 1 year ago.
I have a data frame of positive x and y values that I want to present as a scatterplot in ggplot2. The values are clustered away from the point (0,0), but I want to include the x=0 and y=0 lines in the plot to show overall magnitude. How can I do this?
set.seed(349)
d <- data.frame(x = runif(10, 1, 2), y = runif(10, 1, 2))
ggplot(d, aes(x,y)) + geom_point()
But what I want is something roughly equivalent to this, without having to specify both ends of the limits:
ggplot(d, aes(x=x, y=y)) + geom_point() +
scale_x_continuous(limits = c(0,2)) + scale_y_continuous(limits = c(0,2))
One option is to anchor just the x and y minimum and leave the maximum unspecified (NA):
ggplot(d, aes(x,y)) + geom_point() +
scale_x_continuous(limits = c(0,NA)) +
scale_y_continuous(limits = c(0,NA))
This solution is a bit hacky, but it also works for base R plot().
Where d is the original dataframe we add two "fake" data points:
d2 = rbind(d,c(0,NA),c(NA,0))
This first extra data point has x-coordinate=0 and y-coordinate=NA. This means 0 will be included in the xlim, but the point will not be displayed (because it has no y-coordinate).
The other data point does the same for the y limits.
Just plot d2 instead of d and it will work as desired.
If using ggplot, as opposed to plot, you will get a warning about missing values. This can be suppressed by replacing geom_point() with geom_point(na.rm = TRUE).
One downside with this solution (especially for plot) is that an extra value must be added for any other 'per-data-point' parameters, such as col= if you give each point a different colour.
Use the function expand_limits(x=0,y=0), i.e.:
set.seed(349)
d <- data.frame(x = runif(10, 1, 2), y = runif(10, 1, 2))
ggplot(d, aes(x,y)) + geom_point() + expand_limits(x = 0, y = 0)
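For data that are all positive, the limits = c(0, NA) approach and expand_limits() should give the same axis range, which can be verified from the built plots. A short sketch, with a hypothetical helper x_range() that is not part of ggplot2:

```r
library(ggplot2)

set.seed(349)
d <- data.frame(x = runif(10, 1, 2), y = runif(10, 1, 2))
base_plot <- ggplot(d, aes(x, y)) + geom_point()

# anchor the minimum through the scale limits
p_limits <- base_plot +
  scale_x_continuous(limits = c(0, NA)) +
  scale_y_continuous(limits = c(0, NA))

# anchor the minimum through expand_limits()
p_expand <- base_plot + expand_limits(x = 0, y = 0)

# hypothetical helper: x range actually drawn in the first panel
x_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$x.range

x_range(p_limits)
x_range(p_expand)
```

Both ranges start slightly below 0 because of the default scale expansion. The practical difference is that scale limits drop any data falling outside them, whereas expand_limits() only ever widens the range.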
