Date labels overlap when putting multiple ggplot plots on single page - r

I am trying to put multiple ggplot2 time series plots on a page using the gridExtra package's arrange() function. Unfortunately, I am finding that the x-axis labels get pushed together; it appears that the plot is putting the same number of x-axis labels as a full-page chart, even though my charts only take up 1/4 of a page. Is there a better way to do this? I would prefer not to have to manually set any points, since I will be dealing with a large number of charts that span different date ranges and have different frequencies.
Here is some example code that replicates the problem:
dfm <- data.frame(index=seq(from=as.Date("2000-01-01"), length.out=100, by="year"),
x1=rnorm(100),
x2=rnorm(100))
mydata <- melt(dfm, id="index")
pdf("test.pdf")
plot1 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot2 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot3 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot4 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
arrange(plot1, plot2, plot3, plot4, ncol=2, nrow=2)
dev.off()

either rotate the axis labels
+ opts(axis.text.x=theme_text(angle=45, hjust=1))
Note that opts is deprecated in current versions of ggplot2. This functionality has been moved to theme():
+ theme(axis.text.x = element_text(angle = 45, hjust = 1))
or dilute the x-axis
+scale_x_datetime(major = "10 years")
to automatically shift the labels, I think the arrange() function needs to be fiddled with (though I'm not sure how).

I wrote this function to return the proper major axis breaks given that you want some set number of major breaks.
year.range.major <- function(df, column = "index", n = 5){
range <- diff(range(df[,column]))
range.num <- as.numeric(range)
major = max(pretty((range.num/365)/n))
return(paste(major,"years"))
}
So, instead of always fixing the breaks at 10 years, it'll produce fixed number of breaks at nice intervals.
+scale_x_date(major = year.range.major())
or
+scale_x_date(major = year.range.major(n=3))

Related

Ggplot2 Boxplot width setting changes x-axis

I have produced a boxplot with a continuous x-axis unsing geom_boxplot() in ggplot2. However, as there are many boxes they appear as skinny lines. Another stackoverflow chain (see here) suggested using the width= argument to make all the boxes the same width. However, when I use this argument it changes the x-axis and some of the boxes just disappear!
For example, take this example dataframe. I apologise for the number of observations this has but I think the quantity has to do with the problem as I couldn't reproduce it with a more simple boxplot:
Lat<- c(50.70228,50.70228,50.70228,51.82067,51.82067,51.82067,52.45893,52.45893,52.45893,52.76478,52.76478,52.76478,52.78354,52.78354,52.78354,53.56102,53.56102,53.56102,53.65364,53.65364,53.65364,53.63130,53.63130,53.63130,54.19035,54.19035,54.19035,54.25751,54.25751,54.25751,54.23526,54.23526,54.23526,54.62469,54.62469,54.62469,54.67831,54.67831,54.67831,54.67900,54.67900,54.67900,54.94908,54.94908,54.94908,55.19456,55.19456,55.19456,54.79198,54.79198,54.79198,55.34981,55.34981,55.34981,55.85655,55.85655,55.85655,56.06078,56.06078,56.06078,55.84553,55.84553,55.84553,56.00197,56.00197,56.00197,56.71842,56.71842,56.71842,57.00116,57.00116,57.00116,57.06942,57.06942,57.06942,57.26815,57.26815,57.26815,57.45532,57.45532,57.45532,57.88596,57.88596,57.88596,51.07711,51.07711,51.07711,51.07801,51.07621,51.11159,51.11159,51.11159,52.02484,52.02484,52.02484,52.02581,52.02581,52.02581,52.02685,52.02685,52.02685,52.05353,52.05353,52.05626,52.05353,52.05353,52.05353,52.05353,52.05353,52.05353,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.90810,52.90810,52.90810,52.90810,52.90810,52.90810,52.78968,52.78778,52.78968,52.78968,52.78881,52.78883,52.78883,52.78883,52.78970,52.78970,52.79506,52.79506,52.79506,53.77270,53.77276,53.77109,53.77109,53.77276,53.76845,53.76845,53.77109,53.76845,53.77109,53.87020,53.87020,53.87020,53.87103,53.88205,53.88205,53.88205,53.88205,53.87701,53.87701,53.87098,53.87098,53.87098,53.86932,53.86932,53.86932,56.51869,56.51869,56.51869,56.55870,56.55870,56.55870,56.55964,56.55964,56.55964,57.51056,57.49542,57.49542,57.50878,57.50878,57.50878,57.45201,57.45477,57.45192,57.45192,57.45192)
y <- c(33.45407,21.40954,27.73487,20.38318,26.65483,31.68201,23.95467,20.77363,32.94192,22.71228,25.78824,28.39449,35.60615,24.29325,22.95047,25.65343,30.23262,22.05534,37.20565,35.53812,38.20211,39.38034,35.16619,38.82336,29.72370,38.25754,26.51339,39.38283,29.57483,31.80111,24.52967,34.83037,21.75038,35.50868,39.41830,21.96971,22.82504,32.69746,35.10747,27.75669,34.96690,37.61921,37.17226,20.50448,39.26582,22.08668,28.41502,36.69530,23.69404,23.18052,33.27420,23.04157,33.17285,32.00579,21.83845,22.97143,32.27190,21.53771,38.65481,20.14341,33.62718,39.86755,39.77881,30.59810,27.65909,24.11646,34.56981,29.30249,34.99361,32.39553,28.90443,34.88775,22.77049,36.44468,30.64496,35.81501,31.77673,24.19058,39.36298,21.47219,23.02268,31.37647,27.28457,33.14749,23.20842,39.73427,39.81399,35.51515,24.55080,39.41190,29.59987,38.46791,20.94479,37.22109,26.36060,30.91641,39.25975,39.88288,22.59061,30.24439,21.66110,30.36878,28.76901,38.75561,33.80408,31.05842,26.18921,21.30804,35.02966,33.85981,30.84373,31.67341,35.07605,37.93820,31.30481,21.45117,37.13626,25.70964,25.64736,38.58381,31.24448,26.55902,23.90817,33.70300,26.48909,37.73200,32.52413,22.44440,28.19878,32.46415,25.13711,26.66075,28.16254,20.40673,39.89327,30.83327,32.40196,39.81218,39.80391,21.87316,34.95792,33.38958,38.18441,22.03114,35.64410,34.90643,24.23056,36.66581,29.35813,20.86880,30.02044,36.13727,24.65558,39.43175,29.00154,29.78185,22.89196,37.15204,35.88188,28.73920,28.04934,37.50701,30.36306,28.39842,35.20973,26.54260,29.57763,26.03163,26.90440,27.60110,25.80086,39.98019,21.59970,28.83825,32.01711,20.50812,38.43331,32.41898,27.68722,32.59905,24.18150,29.05701,22.38512,32.93342,37.66694,37.65391,34.19613,23.89985,36.90012,20.74244,27.08511,29.21433,35.83771,35.59557,33.74533,27.08854,38.38994)
V3 <-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
df <- as.data.frame(cbind(Lat, y, as.factor(V3)))
head(df)
I plot it on a continuous x-axis as so:
df_plot <- ggplot(df, aes(x=Lat, y=y, group=Lat))+
geom_boxplot(aes(colour=as.factor(V3)))+
theme_classic()
df_plot
Which produces:
As you can see the boxes are represented as skinny lines.
Therefore I tried to use the width= argument as so:
df_plot2 <- ggplot(df, aes(x=Lat, y=y, group=Lat))+
geom_boxplot(aes(colour=as.factor(V3)), width=1)+
theme_classic()
df_plot2
The output is:
The main thing to notice here is that the x-axis range has suddenly changed! Some of the boxes are no longer plotted whilst others seem to be placed at different values of the x-axis.
The range of the x-axis should be:
range(df$Lat)
[1] 50.70228 57.88596
I am completley perplexed as to why the x-axis would change by simply adding the width= argument in geom_boxplot(). I therefore tried to force the limits of the x-axis scale as so:
df_plot3 <- ggplot(df, aes(x=Lat, y=y, group=Lat))+
geom_boxplot(aes(colour=as.factor(V3)), width=1)+
xlim(50,58)+
theme_classic()
df_plot3
ouput:
Please send help!
I think the strange behaviour comes from ggplot trying to automatically dodge your boxplots apart. By setting position = position_dodge(width = 0) the plot seems to be created as expected without changing the placement of boxes along the x-axis. (But gives a warning about overlapping x intervals)
Lat<- c(50.70228,50.70228,50.70228,51.82067,51.82067,51.82067,52.45893,52.45893,52.45893,52.76478,52.76478,52.76478,52.78354,52.78354,52.78354,53.56102,53.56102,53.56102,53.65364,53.65364,53.65364,53.63130,53.63130,53.63130,54.19035,54.19035,54.19035,54.25751,54.25751,54.25751,54.23526,54.23526,54.23526,54.62469,54.62469,54.62469,54.67831,54.67831,54.67831,54.67900,54.67900,54.67900,54.94908,54.94908,54.94908,55.19456,55.19456,55.19456,54.79198,54.79198,54.79198,55.34981,55.34981,55.34981,55.85655,55.85655,55.85655,56.06078,56.06078,56.06078,55.84553,55.84553,55.84553,56.00197,56.00197,56.00197,56.71842,56.71842,56.71842,57.00116,57.00116,57.00116,57.06942,57.06942,57.06942,57.26815,57.26815,57.26815,57.45532,57.45532,57.45532,57.88596,57.88596,57.88596,51.07711,51.07711,51.07711,51.07801,51.07621,51.11159,51.11159,51.11159,52.02484,52.02484,52.02484,52.02581,52.02581,52.02581,52.02685,52.02685,52.02685,52.05353,52.05353,52.05626,52.05353,52.05353,52.05353,52.05353,52.05353,52.05353,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.90810,52.90810,52.90810,52.90810,52.90810,52.90810,52.78968,52.78778,52.78968,52.78968,52.78881,52.78883,52.78883,52.78883,52.78970,52.78970,52.79506,52.79506,52.79506,53.77270,53.77276,53.77109,53.77109,53.77276,53.76845,53.76845,53.77109,53.76845,53.77109,53.87020,53.87020,53.87020,53.87103,53.88205,53.88205,53.88205,53.88205,53.87701,53.87701,53.87098,53.87098,53.87098,53.86932,53.86932,53.86932,56.51869,56.51869,56.51869,56.55870,56.55870,56.55870,56.55964,56.55964,56.55964,57.51056,57.49542,57.49542,57.50878,57.50878,57.50878,57.45201,57.45477,57.45192,57.45192,57.45192)
y <- c(33.45407,21.40954,27.73487,20.38318,26.65483,31.68201,23.95467,20.77363,32.94192,22.71228,25.78824,28.39449,35.60615,24.29325,22.95047,25.65343,30.23262,22.05534,37.20565,35.53812,38.20211,39.38034,35.16619,38.82336,29.72370,38.25754,26.51339,39.38283,29.57483,31.80111,24.52967,34.83037,21.75038,35.50868,39.41830,21.96971,22.82504,32.69746,35.10747,27.75669,34.96690,37.61921,37.17226,20.50448,39.26582,22.08668,28.41502,36.69530,23.69404,23.18052,33.27420,23.04157,33.17285,32.00579,21.83845,22.97143,32.27190,21.53771,38.65481,20.14341,33.62718,39.86755,39.77881,30.59810,27.65909,24.11646,34.56981,29.30249,34.99361,32.39553,28.90443,34.88775,22.77049,36.44468,30.64496,35.81501,31.77673,24.19058,39.36298,21.47219,23.02268,31.37647,27.28457,33.14749,23.20842,39.73427,39.81399,35.51515,24.55080,39.41190,29.59987,38.46791,20.94479,37.22109,26.36060,30.91641,39.25975,39.88288,22.59061,30.24439,21.66110,30.36878,28.76901,38.75561,33.80408,31.05842,26.18921,21.30804,35.02966,33.85981,30.84373,31.67341,35.07605,37.93820,31.30481,21.45117,37.13626,25.70964,25.64736,38.58381,31.24448,26.55902,23.90817,33.70300,26.48909,37.73200,32.52413,22.44440,28.19878,32.46415,25.13711,26.66075,28.16254,20.40673,39.89327,30.83327,32.40196,39.81218,39.80391,21.87316,34.95792,33.38958,38.18441,22.03114,35.64410,34.90643,24.23056,36.66581,29.35813,20.86880,30.02044,36.13727,24.65558,39.43175,29.00154,29.78185,22.89196,37.15204,35.88188,28.73920,28.04934,37.50701,30.36306,28.39842,35.20973,26.54260,29.57763,26.03163,26.90440,27.60110,25.80086,39.98019,21.59970,28.83825,32.01711,20.50812,38.43331,32.41898,27.68722,32.59905,24.18150,29.05701,22.38512,32.93342,37.66694,37.65391,34.19613,23.89985,36.90012,20.74244,27.08511,29.21433,35.83771,35.59557,33.74533,27.08854,38.38994)
V3 <-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
library(ggplot2)
df <- as.data.frame(cbind(Lat, y, as.factor(V3)))
df_plot <- ggplot(df) +
geom_boxplot(aes(colour=as.factor(V3), x=Lat, y=y, group=as.factor(Lat)),
position=position_dodge(width = 0),
width=1) +
theme_classic()

Dividing long time series in multiple panels with ggplot2

I have a rather long timeseries that I want to plot in ggplot, but it's sufficiently long that even using the full width of the page it's barely readable.
What I want to do instead is to divide the plot into 2 (or more, in the general case) panels one on top of each other.
I could do it manually but not only it's cumbersome but also it's hard to get the axis to have the same scale. Ideally I would like to have something like this:
ggplot(data, aes(time, y)) +
geom_line() +
facet_time(time, n = 2)
And then get something like this:
(This plot was made using facet_wrap(~(year(as.Date(time)) > 2000), ncol = 1, scales = "free_x"), which messes up x axis scale, it works only for 2 panels, and doesn't work well with geom_smooth())
Also, ideally it would also handle summary statistics correctly. For example, using the correct data for geom_smooth() (so facetting wouldn't do it, because at the beginning of every facet it would not use the data in the last chunk of the previous one).
Is there a way to do this?
Thank you!
Below I create two separate plots, one for the period 1982-1999 and one for 1999-2016 and then lay them out using grid.arrange from the gridExtra package. The horizontal axes are scaled equivalently in both plots.
I also generate regression lines outside of ggplot using the loess function so that it can be added using geom_line (you can of course use any regression function here, such as lm, gam, splines, etc). With this approach the regression can be run on the entire time series, ensuring continuity of the regression line across the two panels, even though we break the time series into two halves for plotting.
library(dplyr) # For the chaining (%>%) operator
library(purrr) # For the map function
library(gridExtra) # For the grid.arrange function
Function to extract a legend from a ggplot. We'll use this to get one legend across two separate plots.
# http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
# Fake data
set.seed(255)
dat = data.frame(time=rep(seq(1982,2016,length.out=500),2),
value= c(arima.sim(list(ar=c(0.4, 0.05, 0.5)), n=500),
arima.sim(list(ar=c(0.3, -0.3, 0.6)), n=500)),
group=rep(c("A","B"), each=500))
Generate smoother lines using loess: We want a separate regression line for each level of group, so we use group_by with the chaining operator from dplyr:
dat = dat %>% group_by(group) %>%
mutate(smooth = predict(loess(value ~ time, span=0.1)))
Create a list of two plots, one for each time period: We use map to create separate plots for each time period and return a list with the two plot objects as elements (you can also use base lapply for this instead of map):
pl = map(list(c(1982,1999), c(1999,2016)),
~ ggplot(dat %>% filter(time >= .x[1], time <= .x[2]),
aes(colour=group)) +
geom_line(aes(time, value), alpha=0.5) +
geom_line(aes(time, smooth), size=1) +
scale_x_continuous(breaks=1982:2016, expand=c(0.01,0)) +
scale_y_continuous(limits=range(dat$value)) +
theme_bw() +
labs(x="", y="", colour="") +
theme(strip.background=element_blank(),
strip.text=element_blank(),
axis.title=element_blank()))
# Extract legend as a separate graphics object
leg = g_legend(pl[[1]])
Finally, we lay out both plots (after removing legends) plus the extracted legend:
grid.arrange(arrangeGrob(grobs=map(pl, function(p) p + guides(colour=FALSE)), ncol=1),
leg, ncol=2, widths=c(10,1), left="Value", bottom="Year")
You can do this by storing the plot object, then printing it twice. Each time add an option coord_cartesian:
orig_plot <- ggplot(data, aes(time, y)) +
geom_line()
early <- orig_plot + coord_cartesian(xlim = c(1982, 2000))
late <- orig_plot + coord_cartesian(xlim = c(2000, 2016))
That makes sure that both plots use all the data.
To plot them on the same page, use grid (I got this from the ggplot2 book, which is probably around as a pdf somewhere):
library(grid)
vp1 <- viewport(width = 1, height = .5, just = c("center", "bottom"))
vp2 <- viewport(width = 1, height = .5, just = c("center", "top"))
print(early, vp = vp1)
print(late, vp = vp2)

Two column/row Positioning of labels in ggplot

I am making a plot in ggplot2 where on the y axis I have the indices of groups and on the x axis some information. For readability I would like to make the labels bigger but then they start overlapping. Therefore I would like to put the labels into two columns as shown in the figure so they can be bigger. Is there a way to do this in ggplot? I tried vjust and hjust but they only seem to accept 1 argument applying to all labels.
Current labels:
Objective labeling:
Well, there is no obvious parameter responsible for that, at least AFAIK.
However, for your specific goal my first thought was to add some spaces to numeric labels.
avoid_overlap <- function(x)
{
ind <- seq_along(x) %% 2 == 0
x[ind] <- paste0(x[ind], " ")
x
}
ggplot(mtcars, aes(cyl, mpg)) + geom_point() +
scale_y_continuous(breaks = 10:35, labels = avoid_overlap(10:35)) +
theme(axis.text.y = element_text(size = 32))
Play with grid lines (minor/major) via theme if the grid is too dense.

Keep all plot components same size in ggplot2 between two plots

I would like two separate plots. I am using them in different frames of a beamer presentation and I will add one line to the other (eventually, not in example below). Thus I do not want the presentation to "skip" ("jump" ?) from one slide to the next slide. I would like it to look like the line is being added naturally. The below code I believe shows the problem. It is subtle, but not how the plot area of the second plot is slightly larger than of the first plot. This happens because of the y axis label.
library(ggplot2)
dfr1 <- data.frame(
time = 1:10,
value = runif(10)
)
dfr2 <- data.frame(
time = 1:10,
value = runif(10, 1000, 1001)
)
p1 <- ggplot(dfr1, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2))
print(p1)
dev.new()
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(".")
print(p2)
I would prefer to not have a hackish solution such as setting the size of the axis label manually or adding spaces on the x-axis (see one reference below), because I will use this technique in several settings and the labels can change at any time (I like reproducibility so want a flexible solution).
I'm searched a lot and have found the following:
Specifying ggplot2 panel width
How can I make consistent-width plots in ggplot (with legends)?
https://groups.google.com/forum/#!topic/ggplot2/2MNoYtX8EEY
How can I add variable size y-axis labels in R with ggplot2 without changing the plot width?
They do not work for me, mainly because I need separate plots, so it is not a matter of aligning them virtically on one combined plot as in some of the above solutions.
haven't tried, but this might work,
gl <- lapply(list(p1,p2), ggplotGrob)
library(grid)
widths <- do.call(unit.pmax, lapply(gl, "[[", "widths"))
heights <- do.call(unit.pmax, lapply(gl, "[[", "heights"))
lg <- lapply(gl, function(g) {g$widths <- widths; g$heights <- heights; g})
grid.newpage()
grid.draw(lg[[1]])
grid.newpage()
grid.draw(lg[[2]])
How about using this for p2:
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() +
scale_y_continuous(breaks = NULL) +
scale_x_continuous(breaks = NULL) +
ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2)) +
theme(axis.title.y=element_text(color=NA))
This has the same label as p1, but the color is NA so it doesn't display. You could also use color="white".

ggplot2 multiple stat_binhex() plots with different color gradients in one image

I'd like to use ggplot2's stat_binhex() to simultaneously plot two independent variables on the same chart, each with its own color gradient using scale_colour_gradientn().
If we disregard the fact that the x-axis units do not match, a reproducible example would be to plot the following in the same image while maintaining separate fill gradients.
d <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some file>,height=6,width=8))
d <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
try(ggsave(plot=d,filename=<some other file>,height=6,width=8))
I found some conversation of a related issue in ggplot2 google groups here.
Here is another possible solution: I have taken #mnel's idea of mapping bin count to alpha transparency, and I have transformed the x-variables so they can be plotted on the same axes.
library(ggplot2)
# Transforms range of data to 0, 1.
rangeTransform = function(x) (x - min(x)) / (max(x) - min(x))
dat = diamonds
dat$norm_carat = rangeTransform(dat$carat)
dat$norm_depth = rangeTransform(dat$depth)
p1 = ggplot(data=dat) +
theme_bw() +
stat_binhex(aes(x=norm_carat, y=price, alpha=..count..), fill="#002BFF") +
stat_binhex(aes(x=norm_depth, y=price, alpha=..count..), fill="#FFD500") +
guides(fill=FALSE, alpha=FALSE) +
xlab("Range Transformed Units")
ggsave(plot=p1, filename="plot_1.png", height=5, width=5)
Thoughts:
I tried (and failed) to display a sensible color/alpha legend. Seems tricky, but should be possible given all the legend-customization features of ggplot2.
X-axis unit labeling needs some kind of solution. Plotting two sets of units on one axis is frowned upon by many, and ggplot2 has no such feature.
Interpretation of cells with overlapping colors seems clear enough in this example, but could get very messy depending on the datasets used, and the chosen colors.
If the two colors are additive complements, then wherever they overlap equally you will see a neutral gray. Where the overlap is unequal, the gray would shift to more yellow, or more blue. My colors are not quite complements, judging by the slightly pink hue of the gray overlap cells.
I think what you want goes against the principles of ggplot2 and the grammar of graphics approach more generally. Until the issue is addressed (for which I would not hold my breath), you have a couple of choices
Use facet_wrap and alpha
This is will not produce a nice legend, but takes you someway to what you want.
You can set the alpha value to scale by the computed Frequency, accessed by ..Frequency..
I don't think you can merge the legends nicely though.
library(reshape2)
# in long format
dm <- melt(diamonds, measure.var = c('depth','carat'))
ggplot(dm, aes(y = price, fill = variable, x = value)) +
facet_wrap(~variable, ncol = 1, scales = 'free_x') +
stat_binhex(aes(alpha = ..count..), colour = 'grey80') +
scale_alpha(name = 'Frequency', range = c(0,1)) +
theme_bw() +
scale_fill_manual('Variable', values = setNames(c('darkblue','yellow4'), c('depth','carat')))
Use gridExtra with grid.arrange or arrangeGrob
You can create separate plots and use gridExtra::grid.arrange to arrange on a single image.
d_carat <- ggplot(diamonds, aes(x=carat,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("white","blue"),name = "Frequency",na.value=NA)
d_depth <- ggplot(diamonds, aes(x=depth,y=price))+
stat_binhex(colour="white",na.rm=TRUE)+
scale_fill_gradientn(colours=c("yellow","black"),name = "Frequency",na.value=NA)
library(gridExtra)
grid.arrange(d_carat, d_depth, ncol =1)
If you want this to work with ggsave (thanks to #bdemarest comment below and #baptiste)
replace grid.arrange with arrangeGrob something like.
ggsave(plot=arrangeGrob(d_carat, d_depth, ncol=1), filename="plot_2.pdf", height=12, width=8)

Resources