Force scatter plot grid to be square in ggplot2 - r

I'm trying to force the grid of a scatter plot to be composed of squares, with x and y values that have different ranges.
I tried to force a square shape of the whole plot (aspect.ratio=1), but this does not solve the problem of different ranges. Then I tried to change limits of values of my axes.
1)Here is what I tried first:
p + theme(aspect.ratio = 1) +
coord_fixed(ratio=1, xlim = c(-0.050,0.050),ylim = c(-0.03,0.03))
2) I changed the ratio by using the range of the values for each axis:
p + coord_fixed(ratio=0.06/0.10, xlim = c(-0.050,0.050), ylim = c(-0.03,0.03))
3)Then I changed the limits of y to match those of x:
p + theme(aspect.ratio = 1) +
coord_fixed(ratio=1, xlim = c(-0.050,0.050),ylim = c(-0.05,0.05))
The results:
1) The background grid is made of rectangles.
2) I expected this to move the tick marks automatically and give me a grid of squares, but the cells are still rectangles.
3) This worked, of course, because I matched the ranges of x and y, but it left a lot of empty space in the plot.
Is there something else I should try?
Thanks in advance.

If you want the plot to be square and you want the grid to be square you can do this by rescaling the y variable to be on the same scale as the x variable (or vice versa) for plotting, and then inverting the rescaling to generate the correct axis value labels for the rescaled axis.
Here's an example using the mtcars data frame, and we'll use the rescale function from the scales package.
First let's create a plot of mpg vs. hp but with the hp values rescaled to be on the same scale as mpg:
library(tidyverse)
library(scales)
theme_set(theme_bw())
p = mtcars %>%
mutate(hp.scaled = rescale(hp, to=range(mpg))) %>%
ggplot(aes(mpg, hp.scaled)) +
geom_point() +
coord_fixed() +
labs(x="mpg", y="hp")
Now we can invert the rescaling to generate the correct value labels for hp. We do that below by supplying the inverting function to the labels argument of scale_y_continuous:
p + scale_y_continuous(labels=function(x) rescale(x, from=range(mtcars$mpg), to=range(mtcars$hp)))
But note that rescaling back to the original hp scale results in non-pretty breaks. We can fix that by generating pretty breaks on the hp scale, rescaling those to the mpg scale to get the locations where we want the tick marks and then inverting that to get the label values. However, in that case we won't get a square grid if we want to keep the overall plot panel square:
p + scale_y_continuous(breaks = rescale(pretty_breaks(n=5)(mtcars$hp),
from=range(mtcars$hp),
to=range(mtcars$mpg)),
labels = function(x) rescale(x, from=range(mtcars$mpg), to=range(mtcars$hp)))
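To see why inverting the rescaling recovers the original hp values, here is a minimal base R sketch of the linear map involved. The helper name rescale_linear is made up for illustration; it is only meant to mirror what rescale(x, to, from) does for numeric input, not to replace it.

```r
# Hypothetical helper mirroring the linear map behind scales::rescale()
rescale_linear <- function(x, to, from = range(x)) {
  (x - from[1]) / diff(from) * diff(to) + to[1]
}

hp_range  <- range(mtcars$hp)
mpg_range <- range(mtcars$mpg)

# Forward: put hp on the mpg scale (this is what gets plotted)
hp_scaled <- rescale_linear(mtcars$hp, to = mpg_range, from = hp_range)

# Inverse: map plotted positions back to hp (this is what the labels show)
hp_back <- rescale_linear(hp_scaled, to = hp_range, from = mpg_range)

range(hp_scaled)               # matches range(mtcars$mpg)
all.equal(hp_back, mtcars$hp)  # TRUE up to floating-point error
```

Note that the inverse needs both from and to spelled out. If from is left at its default, it is taken from the values passed in (the axis breaks), not from the data range.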

I'm not sure what code you are using; it is missing in blocks 1 and 3. But using the mtcars data set, the following works:
library(ggplot2)
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
coord_fixed(ratio = 1) +
scale_x_continuous(breaks = seq(10, 35, 1)) +
scale_y_continuous(breaks = seq(1, 6, 1))
The last two lines make it clear that one unit on the x-axis is as long as one unit on the y-axis.
In the documentation you will also find the following note:
ensures that the ranges of axes are equal to the specified ratio by
adjusting the plot aspect ratio
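Applied to the ranges from the question, the same idea might look like the sketch below. The data frame d and its values are made up for illustration. The assumption is that coord_fixed(ratio = 1) together with the same break step (0.01 here) on both axes gives square grid cells without padding the y range. It leaves out theme(aspect.ratio = 1), which, as far as I know, takes precedence over the ratio set by coord_fixed.

```r
library(ggplot2)

# Hypothetical data covering roughly the ranges from the question
set.seed(1)
d <- data.frame(x = runif(50, -0.05, 0.05), y = runif(50, -0.03, 0.03))

step <- 0.01  # same spacing on both axes, so the grid cells are square

p_square <- ggplot(d, aes(x, y)) +
  geom_point() +
  scale_x_continuous(breaks = seq(-0.05, 0.05, by = step)) +
  scale_y_continuous(breaks = seq(-0.03, 0.03, by = step)) +
  coord_fixed(ratio = 1, xlim = c(-0.05, 0.05), ylim = c(-0.03, 0.03))

p_square
```

The panel itself is then wider than it is tall (the ranges differ), but each grid cell covers 0.01 units in both directions.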


How do I make the y axis of a histogram both logarithmic and percentage?

I am trying to make a histogram in ggplot2, and I'm trying to make the y axis both logarithmic and showing percentages, to get it as 0.1%, 1%, 10% etc.
My dataset has 60,000 samples, but I hope this excerpt captures it:
-0.0651
-0.0649
-0.0661
-0.0652
-0.058
-0.045
-0.022
-0.001
+0.028
+0.039
-0.022
-0.0651
-0.0652
I can do both these things (1 making the y axis log and 1 making it percentage) independently. So when I just do percentage, I use the following code:
ggplot(aphist, aes(aphist$baseline1CW_Vm_samp)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth=0.0008)
And I get this output, which has the percentages on it:
But I now want to make the y axis logarithmic. When I do that the way I've been taught, using the following code:
ggplot(aphist, aes(aphist$baseline1CW_Vm_samp)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth=0.0008) +
scale_y_continuous(trans = 'log10')
I suddenly get a very strange, flipped upside down plot:
I suspect it is because there are some samples which are 0 or close to 0 but I'm unsure. Any help would be much appreciated!
Why the bars point downwards and what to do about it
Bar plots in ggplot are drawn so that bars for positive values point upwards from y = 0, while bars for negative values point downwards from the same axis. You are showing relative frequencies on the y-axis, which lie between 0 and 1 by definition. The logarithm of a number in that range is negative, and therefore all your bars point downwards.
I don't know of a way to let ggplot do what you want automatically. However, you can achieve your goal by plotting counts instead of relative frequencies or densities. This works because non-empty bins have counts of 1 or larger, so their logarithms are zero or positive. The exception is, of course, a count of 0. The logarithm of 0 diverges and those values won't be plotted, which is equivalent to plotting bars with zero height.
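The sign argument is easy to check in base R. The values below are arbitrary examples, not taken from the question's data:

```r
# Relative frequencies below 1 have negative logarithms,
# so bars anchored at 0 on the transformed axis point downwards
log_freq <- log10(c(0.001, 0.01, 0.1))
log_freq   # -3 -2 -1

# Counts of 1 or more have logarithms of 0 or above
log_count <- log10(c(1, 10, 100))
log_count  # 0 1 2

# A count of 0 has no finite logarithm
log_zero <- log10(0)
log_zero   # -Inf
```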
A simple example
Since I don't have your data, I will show a solution using the built-in dataset faithful. It should be easy enough to adapt that to your data.
As a demonstration of what I mean, I first show you an example, where the y-axis is not logarithmic. This has the advantage that the plot can be easily created without any tricks:
bw <- 2
n <- nrow(faithful)
ggplot(faithful, aes(waiting)) +
geom_histogram(aes(y = stat(density)), binwidth = bw)
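Note that I have used stat(density) rather than (..count..)/sum(..count..); stat() is the more modern way of referring to a computed variable. Density is the count divided by (n * binwidth), so it differs from the relative frequency only by the constant factor 1/binwidth. I have also stored the bin width and the number of data points in variables, since I will use those values often. The following code gives exactly the same image: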
ggplot(faithful, aes(waiting)) +
geom_histogram(binwidth = bw) +
scale_y_continuous(
breaks = seq(0, 0.05, 0.01) * (bw * n),
labels = function(x) x / (bw * nrow(faithful))
)
Note that this time I plot counts, not density. However, I use the arguments breaks and labels in scale_y_continuous() to redefine the positions of the breaks and their labels such that they show density nevertheless.
Solution with logarithmic y-axis
The same principle can be applied to the log-plot. First, I create the log-plot the same way you did, such that you can see that I end up with the same problem: the bars point downwards.
ggplot(faithful, aes(waiting)) +
geom_histogram(aes(y = stat(density)), binwidth = 2) +
scale_y_log10()
But by plotting counts and redefining the labels, you can get a more appropriate image:
ggplot(faithful, aes(waiting)) +
geom_histogram(binwidth = bw) +
scale_y_log10(
breaks = seq(0, 0.05, 0.01) * (bw * n),
labels = function(x) x / (bw * nrow(faithful))
)
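If the labels should read as percentages of all samples, as asked in the question, the same trick can be adapted. The following is a sketch under the assumption that breaks at 1% and 10% fall inside the range of counts, which should hold for faithful with a bin width of 2. The names pct_breaks and p_pct are made up for this example.

```r
library(ggplot2)

bw <- 2
n <- nrow(faithful)

# Fractions of all samples at which to place tick marks (1% and 10%)
pct_breaks <- c(0.01, 0.1)

p_pct <- ggplot(faithful, aes(waiting)) +
  geom_histogram(binwidth = bw) +   # plot counts, so the log is not negative
  scale_y_log10(
    breaks = pct_breaks * n,                           # positions in counts
    labels = function(x) paste0(100 * x / n, "%")      # shown as percent
  )

p_pct
```

For a data set with 60,000 samples, a break at 0.1% (60 samples) would also make sense; with the 272 rows of faithful it would fall below a count of 1.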

facet_zoom() while setting axis limits

I would like to use facet_zoom() to zoom in on part of an axis that has limits explicitly set. However, using scale_*(limits = *) and coord_cartesian(xlim = *) overrides the zoomed facet's scales as well such that both have the same limits. Is there a way around this? Maybe I could add some data points near the limits and then set their alpha = 0... Any other ideas?
library(ggplot2)
library(ggforce)
# works with no limits specified
ggplot(mpg, aes(x = hwy, y = cyl)) +
geom_point() +
facet_zoom(xlim = c(20, 25))
# fails with limits specified
ggplot(mpg, aes(x = hwy, y = cyl)) +
scale_x_continuous(limits = c(0, 50)) +
geom_point() +
facet_zoom(xlim = c(20, 25))
# fails with coord_cartesian()
ggplot(mpg, aes(x = hwy, y = cyl)) +
scale_x_continuous() +
coord_cartesian(xlim = c(0, 50)) +
geom_point() +
facet_zoom(xlim = c(20, 25))
I don't have enough knowledge of the underlying intricacies in FacetZoom, but you can check if the following workarounds provide a reasonable starting point.
Plot for demonstration
One of the key differences between setting limits in scale_* and in coord_* is the clipping effect: limits set in a scale discard data outside the range, while limits set in a coord only zoom the view. Since this effect isn't really clear in a scatterplot, I added a geom_line layer and adjusted the specified limits so that they extend beyond the data range on one end of the x-axis and clip the data on the other end.
p <- ggplot(mpg, aes(x = hwy, y = cyl)) +
geom_point() +
geom_line(aes(colour = fl), size = 2) +
facet_zoom(xlim = c(20, 25)) +
theme_bw()
# normal zoomed plot / zoomed plot with limits set in scale / coord
p0 <- p
p1 <- p + scale_x_continuous(limits = c(0, 35))
p2 <- p + coord_cartesian(xlim = c(0, 35))
We can see that while p0 behaves as expected, both p1 & p2 show both the original facet (top) & the zoomed facet (bottom) with the same range of c(0, 35).
In p1's case, the shaded box also expanded to cover the entire top facet. In p2's case, the zoom box stayed in exactly the same position as p0, & as a result no longer covers the zoomed range of c(20, 25).
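The different treatment of out-of-range data by scale limits and coord limits can be inspected on a plain plot without facet_zoom. This is a sketch that relies on my understanding that ggplot_build() exposes the position values after the scale has been applied; the variable names are made up for illustration.

```r
library(ggplot2)

p_plain <- ggplot(mpg, aes(x = hwy, y = cyl)) + geom_point()

# Limits in the scale: values outside the limits are replaced by NA
built_scale <- ggplot_build(p_plain + scale_x_continuous(limits = c(20, 25)))
# Limits in the coord: the data are kept and only the view is zoomed
built_coord <- ggplot_build(p_plain + coord_cartesian(xlim = c(20, 25)))

n_dropped_scale <- sum(is.na(built_scale$data[[1]]$x))
n_dropped_coord <- sum(is.na(built_coord$data[[1]]$x))

n_dropped_scale  # greater than 0
n_dropped_coord  # 0
```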
Workaround for limits set in scale_*
# convert ggplot objects to form suitable for rendering
gp0 <- ggplot_build(p0)
gp1 <- ggplot_build(p1)
# re-set zoomed facet's limits to match zoomed range
k <- gp1$layout$layout$SCALE_X[gp1$layout$layout$name == "x"]
gp1$layout$panel_scales_x[[k]]$limits <- gp1$layout$panel_scales_x[[k]]$range$range
# re-set zoomed facet's panel parameters based on original version p0
k <- gp1$layout$layout$PANEL[gp1$layout$layout$name == "x"]
gp1$layout$panel_params[[k]] <- gp0$layout$panel_params[[k]]
# convert built ggplot object to gtable of grobs as usual & print result
gt1 <- ggplot_gtable(gp1)
grid::grid.draw(gt1)
The zoomed facet now shows the zoomed range c(20, 25) correctly, while the shaded box shrinks to cover the correct range in the original facet. Since this method removes unseen data points, all lines in the original facet stay within the confines of the facet.
Workaround for limits set in coord_*
# convert ggplot objects to form suitable for rendering
gp0 <- ggplot_build(p0)
gp2 <- ggplot_build(p2)
# apply coord limits to original facet's scale limits
k <- gp2$layout$layout$SCALE_X[gp2$layout$layout$name == "orig"]
gp2$layout$panel_scales_x[[k]]$limits <- gp2$layout$coord$limits$x
# re-set zoomed facet's panel parameters based on original version p0,
# which has no limits set
k <- gp2$layout$layout$PANEL[gp2$layout$layout$name == "x"]
gp2$layout$panel_params[[k]] <- gp0$layout$panel_params[[k]]
# convert built ggplot object to gtable of grobs as usual,
# & print result
gt2 <- ggplot_gtable(gp2)
grid::grid.draw(gt2)
The zoomed facet now shows the zoomed range c(20, 25) correctly, while the shaded box shifts to cover the correct range in the original facet. Since this method includes unseen data points, some lines in the original facet extend beyond the facet's confines.
Note: These workarounds should work with zoom in y + limits set in y-axis as well, as long as all references to "x" / panel_scales_x / SCALE_X above are changed to "y" / panel_scales_y / SCALE_Y. I haven't tested this for other combinations such as zoom in both x & y, but the broad principle ought to be similar.

How to plot a scatter plot with a radius given by a variable?

I'm pretty new to ggplot. I need to make a scatterplot with point size proportional to a variable of the data frame, so that points where the variable is 0 get zero radius. But when I use the size aesthetic, the points with a 0 value are mapped to a non-zero radius.
How can I get the desired effect?
According to the help, "scale_size_area ensures that a value of 0 is mapped to a size of 0". [Also worth noting, you can get a similar effect if you use scale_radius or scale_size and set the lower limit of the range argument to zero]. So we can do:
df <- data.frame(x=0:5, y=0:5, s=0:5)
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=s)) +
scale_size_area() # NB scale_size(range = c(0, 6)) also works
Note that this still shows the location of 'zero size' points using a single pixel. If you want them to be actually invisible, you can filter out the rows of zero size like this:
ggplot(df, aes(x=x, y=y, size=s)) +
geom_point(data = df[df$s != 0,]) +
scale_size_area()
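To check what each scale does with a value of 0, the sizes assigned to the points can be read from the built plot. This is a small sketch; the variable names are made up, and it assumes the default continuous size scale has a non-zero lower bound for its range.

```r
library(ggplot2)

df <- data.frame(x = 0:5, y = 0:5, s = 0:5)
base_plot <- ggplot(df, aes(x = x, y = y, size = s)) + geom_point()

# Point sizes after the size scale has been applied
size_default <- ggplot_build(base_plot)$data[[1]]$size
size_area    <- ggplot_build(base_plot + scale_size_area())$data[[1]]$size

size_default[1]  # non-zero for s = 0 with the default size scale
size_area[1]     # 0 for s = 0 with scale_size_area()
```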

How to adjust the ordering of labels in the default legend in ggplot2 so that it corresponds to the order in the data

I am plotting a forest plot in ggplot2 and am having issues with the ordering of the labels in the legend matching the order of the labels in the data set. Here is my code below.
data code
d<-data.frame(x=c("Co-K(W) N=720", "IH-K(W) N=67", "IF-K(W) N=198", "CO-K(B)N=78", "IH-K(B) N=13", "CO=A(W) N=874","D-Sco Ad(W) N=346","DR-Ad (W) N=892","CE_A(W) N=274","CO-Ad(B) N=66","D-So Ad(B) N=215","DR-Ad(B) N=123","CE-Ad(B) N=79"),
y = rnorm(13, 0, 0.1))
d <- transform(d, ylo = y-1/13, yhi=y+1/13)
d$x <- factor(d$x, levels=rev(d$x)) # reverse ordering
forest plot code
credplot.gg <- function(d){
# d is a data frame with 4 columns
# d$x gives variable names
# d$y gives center point
# d$ylo gives lower limits
# d$yhi gives upper limits
require(ggplot2)
p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi, group=x, colour=x)) +
geom_pointrange(size=1) +
theme_bw() +
scale_color_discrete(name="Sample") +
coord_flip() +
theme(legend.key=element_rect(fill='cornsilk2')) +
guides(colour = guide_legend(override.aes = list(size=0.5))) +
geom_hline(yintercept = 0, colour = 'red', lty=2) +
xlab('Cohort') + ylab('CI') + ggtitle('Forest Plot')
return(p)
}
credplot.gg(d)
This is what I get. As you can see, the labels on the y axis match the order of the labels in the data. However, the legend does not follow the same order, and I'm not sure how to correct this. This is my first time creating a plot in ggplot2. Any feedback is much appreciated. Thanks in advance.
Nice plot, especially for a first ggplot! I've not tested this, but I think all you need is to add reverse=TRUE inside your colour's guide_legend (I found this in the Cookbook for R).
If I were to make one more comment, I'd say that ordering your vertical factor by numeric value often makes comparisons easier when alphabetical order isn't particularly meaningful. (Though maybe your alpha order is meaningful.)
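A reduced version of the forest plot with the legend reversed might look like the sketch below. The three-row data frame is made up to keep the example short; the only change relative to the question's code that matters here is reverse = TRUE inside guide_legend.

```r
library(ggplot2)

# Small made-up data set; factor levels reversed as in the question
d <- data.frame(x = c("A", "B", "C"), y = c(0.1, -0.05, 0.02))
d <- transform(d, ylo = y - 0.1, yhi = y + 0.1)
d$x <- factor(d$x, levels = rev(d$x))

p_forest <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi, colour = x)) +
  geom_pointrange() +
  coord_flip() +
  scale_color_discrete(name = "Sample") +
  # reverse = TRUE flips the order of the legend keys
  guides(colour = guide_legend(reverse = TRUE,
                               override.aes = list(size = 0.5)))

p_forest
```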

Adjusting y-axis line position in ggplot2

I'm trying to remove the space between the y-axis line and the first tick mark in my plot. Here's an example:
set.seed(201)
n <- 100
dat <- data.frame(xval = (1:n+rnorm(n,sd=5))/20, yval = 2*2^((1:n+rnorm(n,sd=5))/20))
dat[dat[,1] < 0,1] <- 0
dat[dat[,2] < 0,2] <- 0
ggplot(dat, aes(xval, yval)) + geom_point()
This code produces a plot with a gap between the y-axis line and the zero tick mark on the x-axis. How can I remove it?
You can alter this "gap" using the scale_x_continuous function:
ggplot(dat, aes(xval, yval)) + geom_point() +
scale_x_continuous(expand=c(0,0))
From the help file on scale_x_continuous,
expand: numeric vector of length two giving multiplicative and
additive expansion constants. These constants ensure that the data is
placed some distance away from the axes.
To alter the space on the y-axis, use scale_y_continuous in the same way.
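If only the gap at the lower end of an axis should go while some padding stays at the upper end, the expansion() helper (available in ggplot2 3.3.0 and later) can be used instead of c(0, 0). The data below are made up for illustration, and limits = c(0, NA) is an assumption about where the axes should start.

```r
library(ggplot2)

# Made-up, non-negative data standing in for the question's dat
set.seed(201)
dat <- data.frame(xval = runif(100, 0, 5), yval = runif(100, 0, 60))

p_tight <- ggplot(dat, aes(xval, yval)) +
  geom_point() +
  # start at 0 with no padding on the left, keep 5% padding on the right
  scale_x_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.05))) +
  # start at 0 with no padding at the bottom, keep 5% padding at the top
  scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.05)))

p_tight
```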
