I'm pretty sure this is easy to do, but I can't find the right way to phrase the question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), effectively creating one row for each level of a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
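If the grouping variable is a factor (as in the question) rather than a number, the same nudging trick needs numeric positions first. Here is a minimal sketch of that step, assuming a made-up data frame with a factor column called group; the names plot_data, group and pos are only for illustration and are not taken from the original data:

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: a two-level factor and a numeric value
plot_data <- data.frame(
  group = factor(rep(c("A", "B"), each = 10)),
  y     = c(rnorm(10, 2), rnorm(10))
)

# Map factor levels to positions 1, 3, 5, ... so each level has room
# for a jitter row (pos - 0.5) and a violin row (pos + 0.5)
plot_data$pos <- 2 * as.integer(plot_data$group) - 1

p <- ggplot(plot_data) +
  geom_jitter(aes(x = pos - 0.5, y = y), width = 0.25, height = 0) +
  geom_violin(aes(x = pos + 0.5, y = y, group = group), width = 0.5) +
  coord_flip() +
  # label the axis with the factor levels at their numeric positions
  scale_x_reverse(name   = "group",
                  breaks = sort(unique(plot_data$pos)),
                  labels = levels(plot_data$group))
```

Printing p should give the interleaved layout; the spacing factor of 2 and the 0.5 offsets are arbitrary and can be tuned to taste.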
I am making a bar chart with long axis labels which I need to wrap and right-align. The only complication is that I need to add an expression to get superscripts.
library(ggplot2)
library(scales)
df <- data.frame("levs" = c("a long label i want to wrap",
"another also long label"),
"vals" = c(1,2))
p <- ggplot(df, aes(x = levs, y = vals)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_x_discrete(labels = wrap_format(20))
which produces the desired result:
with properly wrapped text with all labels fully right aligned.
However now I attempt to add superscript using the below code, and the axis text alignment changes:
p <- ggplot(df, aes(x = levs, y = vals)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_x_discrete(labels = c(expression("exponent"^1),
wrap_format(20)("another also long label")))
(NB I cannot use unicode as is recommended to others with the same question because it does not work with the font I am required to use).
How can I get the axis text to be aligned right even when one of the axis labels includes an expression?
It's a strange thing, but if a vector (e.g. a character vector of labels) includes an object created by expression(), the whole vector appears to be treated as an expression:
# create a simple vector with one expression & one character string
label.vector <- c(expression("exponent"^1),
wrap_format(20)("another also long label"))
> sapply(label.vector, class) # the items have different classes when considered separately
[1] "call" "character"
> class(label.vector) # but together, it's considered an expression
[1] "expression"
... and expressions are always left-aligned. This isn't a ggplot-specific phenomenon; we can observe it in the base plotting functions as well:
# even with default hjust = 0.5 / vjust = 0.5 (i.e. central alignment), an expression is
# anchored based on the midpoint of its last line, & left-aligned within its text block
ggplot() +
annotate("point", x = 1:2, y = 1) +
annotate("text", x = 1, y = 1,
label = expression("long string\nwith single line break"))+
annotate("text", x = 2, y = 1,
label = expression("long string\nwith multiple line\nbreaks here")) +
xlim(c(0.5, 2.5))
# same phenomenon observed in base plot
par(mfrow = c(1, 3))
plot(0, xlab=expression("short string"))
plot(0, xlab=expression("long string\nwith single line break"))
plot(0, xlab=expression("long string\nwith multiple line\nbreaks here"))
Workaround
If we can force each label to be considered on its own, without the effect of other labels in the label vector, the non-expression labels could be aligned like normal character strings. One way to do this is to convert the ggplot object into a grob, and replace the single textGrob for the y-axis labels with multiple text grobs, one for each label.
Prep work:
# generate plot (leave the labels as default)
p <- ggplot(df, aes(x = levs, y = vals)) +
geom_bar(stat = "identity") +
coord_flip()
p
# define a list (don't use `c(...)` here) of desired y-axis labels, starting with the
# bottom-most label in your plot & work up from there
desired.labels <- list(expression("exponent"^1),
wrap_format(20)("another also long label"))
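To see why list() matters here, this small base-R sketch compares the two containers. The label text is made up for illustration:

```r
# c() coerces everything into a single expression vector ...
labs_c <- c(expression("exponent"^1), "plain text")

# ... while list() keeps each element's own type
labs_list <- list(expression("exponent"^1), "plain text")

class(labs_c)          # "expression"
class(labs_list[[1]])  # "expression"
class(labs_list[[2]])  # "character"
```

Because the second element of the list is still a plain character string, a text grob built from it alone is aligned like ordinary text.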
Grob hacking:
library(grid)
library(magrittr)
# convert to grob object
gp <- ggplotGrob(p)
# locate label grob in the left side y-axis
old.label <- gp$grobs[[grep("axis-l", gp$layout$name)]]$children[["axis"]]$grobs[[1]]$children[[1]]
# define each label as its own text grob, replacing the values with those from
# our list of desired y-axis labels
new.label <- lapply(seq_along(old.label$label),
function(i) textGrob(label = desired.labels[[i]],
x = old.label$x[i], y = old.label$y[i],
just = old.label$just, hjust = old.label$hjust,
vjust = old.label$vjust, rot = old.label$rot,
check.overlap = old.label$check.overlap,
gp = old.label$gp))
# remove the old label
gp$grobs[[grep("axis-l", gp$layout$name)]]$children[["axis"]]$grobs[[1]] %<>%
removeGrob(.$children[[1]]$name)
# add new labels
for(i in seq_along(new.label)) {
gp$grobs[[grep("axis-l", gp$layout$name)]]$children[["axis"]]$grobs[[1]] %<>%
addGrob(new.label[[i]])
}
# check result
grid.draw(gp)
This is a continuation of a question I recently asked (Manually assigning colors with scale_fill_manual only works for certain hexagon sizes).
I was unable to plot geom_hex() so that all hexagons were the same size. Someone solved the problem. However, their solution removed the legend key. Now, I am unable to keep all the hexagons the same size while also retaining the legend.
To be specific, I really want to keep the legend labels sensical. In the example below, the legend has values (0,2,4,6,8,20), rather than hexadecimal labels (#08306B, #08519C, etc).
Below is an MWE illustrating the problem. At the end, as per the 3 comments, you can see that I am able to 1) create a plot with consistent hexagon sizes but no legend, 2) create a plot with a legend but inconsistent hexagon sizes, and 3) attempt to create a plot with consistent hexagon sizes and a legend, but fail:
library(ggplot2)
library(hexbin)
library(RColorBrewer)
library(reshape)
set.seed(1)
xbins <- 10
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
minVal <- min(x, y)
maxVal <- max(x, y)
maxRange <- c(minVal, maxVal)
buffer <- (maxRange[2] - maxRange[1]) / (xbins / 2)
bindata = data.frame(x=x,y=y,factor=as.factor(1))
h <- hexbin(bindata, xbins = xbins, IDs = TRUE, xbnds = maxRange, ybnds = maxRange)
counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts) <- c ("factor", "ID", "counts")
counts$factor =as.factor(counts$factor)
hexdf <- data.frame (hcell2xy (h), ID = h@cell)
hexdf <- merge (counts, hexdf)
my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts, breaks = c(0, my_breaks, Inf), labels = rev(clrs))
# Has consistent hexagon sizes, but no legend
ggplot(hexdf, aes(x=x, y=y, hexID=ID, counts=counts, fill=countColor)) +
geom_hex(stat="identity", fill=hexdf$countColor) +
scale_fill_manual(labels = as.character(c(0, my_breaks)), values = rev(clrs), name = "Count") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
labs(x = "A", y = "C") +
coord_fixed(xlim = c(-0.5, (maxRange[2]+buffer)), ylim = c(-0.5, (maxRange[2]+buffer))) +
theme(aspect.ratio=1)
# Has legend, but inconsistent hexagon sizes
ggplot(hexdf, aes(x=x, y=y, hexID=ID, counts=counts, fill=countColor)) +
geom_hex(data=hexdf, stat="identity", aes(fill=countColor)) +
scale_fill_manual(labels = as.character(c(0, my_breaks)), values = rev(clrs), name = "Count") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
labs(x = "A", y = "C") +
coord_fixed(xlim = c(-0.5, (maxRange[2]+buffer)), ylim = c(-0.5, (maxRange[2]+buffer))) +
theme(aspect.ratio=1)
# One attempt to create consistent hexagon sizes and retain legend
ggplot(hexdf, aes(x=x, y=y, hexID=ID, counts=counts, fill=countColor)) +
geom_hex(data=hexdf, aes(fill=countColor)) +
geom_hex(stat="identity", fill=hexdf$countColor) +
scale_fill_manual(labels = as.character(c(0, my_breaks)), values = rev(clrs), name = "Count") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
labs(x = "A", y = "C") +
coord_fixed(xlim = c(-0.5, (maxRange[2]+buffer)), ylim = c(-0.5, (maxRange[2]+buffer))) +
theme(aspect.ratio=1)
Any suggestions on how to keep the hexagon sizes consistent while retaining the legend would be very helpful!
Wow, this is an interesting one -- geom_hex seems to really dislike mapping color/fill onto categorical variables. I assume that's because it is designed to be a two-dimensional histogram and visualize continuous summary statistics, but if anyone has any insight into what's going on behind the scenes, I would love to know.
For your specific problem, that really throws a wrench in the works, because you're attempting to have categorical colorization that assigns non-linear groups to the individual hexagons. Conceptually, you might consider why you're doing that. There may be a good reason, but you're essentially taking a linear color gradient and mapping it non-linearly onto your data, which can end up being visually misleading.
However, if that is what you want to do, the best approach I could come up with was to create a new continuous variable that mapped linearly onto your chosen colors and then use those to create a color gradient. Let me try to walk you through my thought process.
You essentially have a continuous variable (counts) that you want to map onto colors. That's easy enough with a simple color gradient, which is the default in ggplot2 for continuous variables. Using your data:
ggplot(hexdf, aes(x=x, y=y)) +
geom_hex(stat="identity", aes(fill=counts))
yields something close.
However, the bins with really high counts wash out the gradient for points with much lower counts, so we need to change the way the gradient maps colors onto values. You've already declared the colors you want to use in the clrs variable; we just need to add a column to your data frame to use in conjunction with these colors to create a smooth gradient. I did that as follows:
all_breaks <- c(0, my_breaks)
breaks_n <- 1:length(all_breaks)
get_break_n <- function(n) {
break_idx <- max(which((all_breaks - n) < 0))
breaks_n[break_idx]
}
hexdf$bin <- sapply(hexdf$counts, get_break_n)
We create the bin variable as the index of the largest break that is strictly below the count (so a count of 2 falls in bin 1, matching the right-closed intervals used by cut() above). Now, you'll notice that:
ggplot(hexdf, aes(x=x, y=y)) +
geom_hex(stat="identity", aes(fill=bin))
is getting much closer to the goal.
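As an aside, the bin index computed by get_break_n can probably be obtained without sapply. The sketch below uses base R's findInterval() with left.open = TRUE, which to my understanding uses the same right-closed intervals. The counts vector here is made up purely to compare the two approaches:

```r
all_breaks <- c(0, 2, 4, 6, 8, 20, 1000)

# Loop-style helper from above: index of the largest break strictly below n
get_break_n <- function(n) {
  max(which((all_breaks - n) < 0))
}

# Hypothetical counts covering interval edges and interiors
counts <- c(1, 2, 3, 4, 5, 8, 9, 20, 21, 1000, 1500)

bin_loop <- sapply(counts, get_break_n)

# Vectorized equivalent: left.open = TRUE makes the intervals (lower, upper]
bin_vec <- findInterval(counts, all_breaks, left.open = TRUE)

identical(as.integer(bin_loop), bin_vec)
```

One difference to be aware of: for a count of 0, findInterval() returns 0, whereas the loop version warns and returns -Inf. That should not matter here since every occupied hexagon has a count of at least 1.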
The next step is to change how the color gradient maps onto that bin variable, which we can do by adding a call to scale_fill_gradientn:
ggplot(hexdf, aes(x=x, y=y)) +
geom_hex(stat="identity", aes(fill=bin)) +
scale_fill_gradientn(colors=rev(clrs[-1])) # odd color reversal to
# match OP's color mapping
This takes a vector of colors between which you want to interpolate a gradient. The way we've set it up, the points along the interpolation will perfectly match up with the unique values of the bin variable, meaning each value will get one of the colors specified.
Now we're cooking with gas, and the only thing left to do is add the various bells and whistles from the original graph. Most importantly, we need to make the legend look the way we want. This requires three things: (1) changing it from the default color bar to a discretized legend, (2) specifying our own custom labels, and (3) giving it an informative title.
# create the custom labels for the legend
all_break_labs <- as.character(all_breaks[1:(length(all_breaks)-1)])
ggplot(hexdf, aes(x=x, y=y)) +
geom_hex(stat="identity", aes(fill=bin)) +
scale_fill_gradientn(colors=rev(clrs[-1]),
guide="legend", # (1) make legend discrete
labels=all_break_labs, # (2) specify labels
name="Count") + # (3) legend title
# All the other prettification from the OP
geom_abline(intercept = 0, color = "red", size = 0.25) +
labs(x = "A", y = "C") +
coord_fixed(xlim = c(-0.5, (maxRange[2]+buffer)),
ylim = c(-0.5, (maxRange[2]+buffer))) +
theme(aspect.ratio=1)
All of this leaves us with the following graph:
Hopefully that helps you out. For completeness, here's the new code in full:
# ... the rest of your code before the plots
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts,
breaks = c(0, my_breaks, Inf),
labels = rev(clrs))
### START OF NEW CODE ###
# create new bin variable
all_breaks <- c(0, my_breaks)
breaks_n <- 1:length(all_breaks)
get_break_n <- function(n) {
break_idx <- max(which((all_breaks - n) < 0))
breaks_n[break_idx]
}
hexdf$bin <- sapply(hexdf$counts, get_break_n)
# create legend labels
all_break_labs <- as.character(all_breaks[1:(length(all_breaks)-1)])
# create final plot
ggplot(hexdf, aes(x=x, y=y)) +
geom_hex(stat="identity", aes(fill=bin)) +
scale_fill_gradientn(colors=rev(clrs[-1]),
guide="legend",
labels=all_break_labs,
name="Count") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
labs(x = "A", y = "C") +
coord_fixed(xlim = c(-0.5, (maxRange[2]+buffer)),
ylim = c(-0.5, (maxRange[2]+buffer))) +
theme(aspect.ratio=1)
I have a data frame of positive x and y values that I want to present as a scatterplot in ggplot2. The values are clustered away from the point (0,0), but I want to include the x=0 and y=0 lines in the plot to show overall magnitude. How can I do this?
set.seed(349)
d <- data.frame(x = runif(10, 1, 2), y = runif(10, 1, 2))
ggplot(d, aes(x,y)) + geom_point()
But what I want is something roughly equivalent to this, without having to specify both ends of the limits:
ggplot(d, aes(x=x, y=y)) + geom_point() +
scale_x_continuous(limits = c(0,2)) + scale_y_continuous(limits = c(0,2))
One option is to just anchor the x and y min, but leave the max unspecified
ggplot(d, aes(x,y)) + geom_point() +
scale_x_continuous(limits = c(0,NA)) +
scale_y_continuous(limits = c(0,NA))
This solution is a bit hacky, but it works for standard plot also.
Where d is the original dataframe we add two "fake" data points:
d2 = rbind(d,c(0,NA),c(NA,0))
This first extra data point has x-coordinate=0 and y-coordinate=NA. This means 0 will be included in the xlim, but the point will not be displayed (because it has no y-coordinate).
The other data point does the same for the y limits.
Just plot d2 instead of d and it will work as desired.
If using ggplot, as opposed to plot, you will get a warning about missing values. This can be suppressed by replacing geom_point() with geom_point(na.rm = TRUE).
One downside with this solution (especially for plot) is that an extra value must be added for any other 'per-data-point' parameters, such as col= if you give each point a different colour.
Use the function expand_limits(x=0,y=0), i.e.:
set.seed(349)
d <- data.frame(x = runif(10, 1, 2), y = runif(10, 1, 2))
ggplot(d, aes(x,y)) + geom_point() + expand_limits(x = 0, y = 0)
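For anyone who wants to confirm that the limits really start at 0 without eyeballing the plot, here is a small sketch comparing the limits = c(0, NA) approach with expand_limits(). It inspects the trained x scale via layer_scales(); the variable names are mine:

```r
library(ggplot2)

set.seed(349)
d <- data.frame(x = runif(10, 1, 2), y = runif(10, 1, 2))

# Approach 1: fix only the lower limit, leave the upper limit to the data
p_limits <- ggplot(d, aes(x, y)) + geom_point() +
  scale_x_continuous(limits = c(0, NA)) +
  scale_y_continuous(limits = c(0, NA))

# Approach 2: ask ggplot2 to make sure 0 is included in both scales
p_expand <- ggplot(d, aes(x, y)) + geom_point() +
  expand_limits(x = 0, y = 0)

# Limits of the x scale after training on the data
x_lim_limits <- layer_scales(p_limits)$x$get_limits()
x_lim_expand <- layer_scales(p_expand)$x$get_limits()
```

Both should report a lower limit of 0 for this data. The two approaches differ when some data lies below 0: a hard lower limit of 0 drops those points by default, while expand_limits() only ever widens the range.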
Context
I want to plot two ggplot2 plots on the same page with a shared legend. http://code.google.com/p/gridextra/wiki/arrangeGrob describes how to do this, and the result already looks good. But in my example I have two plots with the same x-axis and different y-axes. When the range of the y-axis in one plot is at least 10 times larger than in the other (e.g. 10000 instead of 1000), ggplot2 (or grid?) does not align the plots correctly (see Output below).
Question
How do I also align the left sides of the plots when they have two different y-axes?
Example Code
x = c(1, 2)
y = c(10, 1000)
data1 = data.frame(x,y)
p1 <- ggplot(data1) + aes(x=x, y=y, colour=x) + geom_line()
y = c(10, 10000)
data2 = data.frame(x,y)
p2 <- ggplot(data2) + aes(x=x, y=y, colour=x) + geom_line()
# Source: http://code.google.com/p/gridextra/wiki/arrangeGrob
leg <- ggplotGrob(p1 + opts(keep="legend_box"))
legend=gTree(children=gList(leg), cl="legendGrob")
widthDetails.legendGrob <- function(x) unit(3, "cm")
grid.arrange(
p1 + opts(legend.position="none"),
p2 + opts(legend.position="none"),
legend=legend, main ="", left = "")
Output
A cleaner way of doing the same thing but in a more generic way is by using the formatter arg:
p1 <- ggplot(data1) +
aes(x=x, y=y, colour=x) +
geom_line() +
scale_y_continuous(formatter = function(x) format(x, width = 5))
Do the same for your second plot and make sure to set the width >= the widest number you expect across both plots.
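Note that the formatter argument belongs to older ggplot2 releases; since ggplot2 0.9.0 the same function is passed through labels instead. A minimal sketch of that variant, where pad_labels is just a name chosen for this example:

```r
library(ggplot2)

data1 <- data.frame(x = c(1, 2), y = c(10, 1000))

# Pad every axis label to a common width of 5 characters
pad_labels <- function(x) format(x, width = 5)

p1 <- ggplot(data1) +
  aes(x = x, y = y, colour = x) +
  geom_line() +
  scale_y_continuous(labels = pad_labels)
```

Keep in mind that padding with spaces only lines the panels up exactly if the axis text uses a font where a space is as wide as a digit, so treat this as an approximation.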
1. Using cowplot package:
library(cowplot)
plot_grid(p1, p2, ncol=1, align="v")
2. Using tracks from ggbio package:
Note: There seems to be a bug, x ticks do not align. (tested on 17/03/2016, ggbio_1.18.5)
library(ggbio)
tracks(data1=p1,data2=p2)
If you don't mind a shameless kludge, just add an extra character to the longest label in p1, like this:
p1 <- ggplot(data1) +
aes(x=x, y=y, colour=x) +
geom_line() +
scale_y_continuous(breaks = seq(200, 1000, 200),
labels = c(seq(200, 800, 200), " 1000"))
I have two underlying questions, which I hope you'll forgive if you have your reasons:
1) Why not use the same y axis on both? I feel like that's a more straight-forward approach, and easily achieved in your above example by adding scale_y_continuous(limits = c(0, 10000)) to p1.
2) Is the functionality provided by facet_wrap not adequate here? It's hard to know what your data structure is actually like, but here's a toy example of how I'd do this:
library(ggplot2)
library(reshape2)  # provides melt()
# Maybe your dataset is like this
x <- data.frame(x = c(1, 2),
y1 = c(0, 1000),
y2 = c(0, 10000))
# Molten data makes a lot of things easier in ggplot
x.melt <- melt(x, id.var = "x", measure.var = c("y1", "y2"))
# Plot it - one page, two facets, identical axes (though you could change them),
# one legend
ggplot(x.melt, aes(x = x, y = value, color = x)) +
geom_line() +
facet_wrap( ~ variable, nrow = 2)
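If the two panels should keep their own y ranges (the "though you could change them" case), facet_wrap() accepts scales = "free_y". A small sketch, building the long-format data by hand so it does not depend on melt(); the data frame name long is only for illustration:

```r
library(ggplot2)

# Same toy data as above, already in long format
long <- data.frame(
  x        = rep(c(1, 2), times = 2),
  variable = rep(c("y1", "y2"), each = 2),
  value    = c(0, 1000, 0, 10000)
)

# One page, two facets, independent y axes, one legend;
# the panels share the same left edge because they live in one plot
p <- ggplot(long, aes(x = x, y = value, color = x)) +
  geom_line() +
  facet_wrap(~ variable, nrow = 2, scales = "free_y")
```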