I'm currently trying to present two time series using ggplot2, both with very different scales, using two ggplots. I've combined the two separate ggplots, one on top of the other, using grid.arrange. In order to aid visualization, I'd like to make each line a different colour, and have this legend below the combined plot.
As this may be relevant, I'm currently working in the confines of creating a shiny section of an R markdown document. Hence the renderPlot wrapper around grid.arrange.
The following is similar to the code that I currently have.
testdata = data.frame(var1 = seq(0,10,by=1), var2 = runif(11),
var3 = runif(11, min = 100, max = 500))
renderPlot({grid.arrange(
ggplot(data = testdata, aes(x = var1, y = var2))
+ geom_line(colour = "blue") + xlab(NULL),
ggplot(data = testdata, aes(x = var1, y = var3)) + geom_line(colour = "red"))})
Does anyone have any suggestions about how to create the shared legend? Thanks very much for your help.
Using ggplot2, I usually use one of the following two methods to create a common legend:
Method 1 : When scales are similar
By using facet_grid, or just the color aesthetic, in combination with the reshape2 package, you can easily combine multiple plots under the same legend. This works best when the values in your variables have a similar order of magnitude.
Using color & reshape2:
library('reshape2')
data_melt<-melt(data=testdata,value.name='Value',id.vars='var1')
ggplot(data_melt)+
geom_line(aes(x=var1,y=Value,color=variable))
Using color, facet_grid & reshape2:
library('reshape2')
data_melt<-melt(data=testdata,value.name='Value',id.vars='var1')
ggplot(data_melt)+
geom_line(aes(x=var1,y=Value,color=variable))+
facet_grid(~variable)
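If the magnitudes do differ, one variation on the faceted version is to stack the panels and free the y scale per panel with facet_wrap(scales = "free_y"). This is a sketch, not part of the original answer. It builds the long-format data with base R's stack() instead of melt(), and the names long and p_free are made up for the example.

```r
library(ggplot2)

set.seed(42)
testdata <- data.frame(var1 = seq(0, 10, by = 1), var2 = runif(11),
                       var3 = runif(11, min = 100, max = 500))

# Long format via base R: one row per (var1, series) pair
long <- cbind(var1 = testdata$var1, stack(testdata[c("var2", "var3")]))
names(long) <- c("var1", "Value", "variable")

# One panel per series, stacked vertically, each with its own y axis;
# the colour legend is shared by both panels
p_free <- ggplot(long, aes(x = var1, y = Value, colour = variable)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +
  theme(legend.position = "bottom")
print(p_free)
```

Whether this is preferable to two separately arranged plots depends on how much per-panel customisation you need.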
Method 2: When scales differ wildly
As you can see, the final plot is great!
All you need is to create a plot that carries the legend you want (p3 below) and pass it to the custom function g_legend from the ggplot2 wiki, which extracts the legend so that it can be placed next to the other plots.
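library('ggplot2')
library('gridExtra')  # grid.arrange, arrangeGrob
library('grid')       # unit, unit.c
library('reshape2')
testdata = data.frame(var1 = seq(0,10,by=1), var2 = runif(11),
var3 = runif(11, min = 100, max = 500))
data_melt<-melt(data=testdata,value.name='Value',id.vars='var1')
# one colour per series, reused by the single plots and by the legend plot
cols=c(var2='blue',var3='red')
p1=ggplot(data = testdata)+
geom_line(aes(x = var1, y = var2),color=cols[['var2']])
p2=ggplot(data = testdata) +
geom_line(aes(x = var1, y = var3),color=cols[['var3']])
# p3 is only built to supply the shared legend
p3=ggplot(data_melt)+
geom_line(aes(x=var1,y=Value,color=variable))+
scale_color_manual(values=cols)
# gridExtra >= 2.0 calls the title argument 'top' (formerly 'main')
grid.arrange(p1,p2,nrow=2,top='Line Plots')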
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)}
legend <- g_legend(p3)
lwidth <- sum(legend$widths)
## using grid.arrange for convenience
## could also manually push viewports
grid.arrange(arrangeGrob(p1 + theme(legend.position="none"),
p2 + theme(legend.position="none"),
top ="Variable Name",   # called 'main' in gridExtra < 2.0
left = "Value"),
legend,
widths=unit.c(unit(1, "npc") - lwidth, lwidth), nrow=1)
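The question asked for the legend below the combined plot, so here is a minimal sketch of that layout. It is a variation on the approach above, not the wiki's code. The helper extract_legend and the names long, legend_plot, p_top and p_bottom are hypothetical. Matching on layout names that contain "guide-box" is an assumption meant to cope with different ggplot2 versions.

```r
library(ggplot2)
library(gridExtra)
library(grid)

set.seed(1)
testdata <- data.frame(var1 = seq(0, 10, by = 1), var2 = runif(11),
                       var3 = runif(11, min = 100, max = 500))
cols <- c(var2 = "blue", var3 = "red")

# Hypothetical helper: return the first non-empty legend of a plot.
# Assumes the legend cell's layout name contains "guide-box".
extract_legend <- function(p) {
  gt <- ggplotGrob(p)
  idx <- grep("guide-box", gt$layout$name)
  idx <- idx[vapply(gt$grobs[idx], inherits, logical(1), what = "gtable")]
  gt$grobs[[idx[1]]]
}

# Long-format copy of the data, used only to build the legend
long <- cbind(var1 = testdata$var1, stack(testdata[c("var2", "var3")]))
names(long) <- c("var1", "Value", "variable")
legend_plot <- ggplot(long, aes(var1, Value, colour = variable)) +
  geom_line() +
  scale_colour_manual(values = cols) +
  theme(legend.position = "bottom")
shared_legend <- extract_legend(legend_plot)

# The two real plots use fixed colours that match the legend
p_top <- ggplot(testdata, aes(var1, var2)) +
  geom_line(colour = cols[["var2"]]) + xlab(NULL)
p_bottom <- ggplot(testdata, aes(var1, var3)) +
  geom_line(colour = cols[["var3"]])

# Three rows: plot, plot, legend (relative heights 4:4:1)
combined <- arrangeGrob(p_top, p_bottom, shared_legend,
                        ncol = 1, heights = c(4, 4, 1))
grid.newpage()
grid.draw(combined)
```

Inside a Shiny renderPlot() wrapper, the last two lines would presumably be the only part that needs to sit in the reactive expression.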
I would like to show interpolated data and a histogram of the raw data of each predictor in the same plot. In other threads, like this one, people explain how to draw marginal histograms of the same data that is shown in a scatter plot; in this case, however, the histogram is based on other data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot e.g. with ggMarginal
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
geom_tile()
Ideally, the histograms go even beyond the margins of the tileplot, as these data points also influence the predictions. I would, however, be already very happy to know how to plot a histogram for the range that is shown in the tileplot. (Maybe the values that are outside the range could just be added to the extreme values in different color.)
PS. I managed to more or less align histograms to the margins of the sides of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kind of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size in numbers and to graphically show values outside the plotted range since they also affect the interpolated values.
I intended to include them in a different color in the histograms at the side. To do so, I tried to count them towards the lower and upper ends of the plotted range. I also tried to print the sample size in numbers somewhere on the plot. However, I failed at both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
geom_histogram() +
thm +
coord_cartesian(xlim=xrng)
I tried to add the sample size in numbers with geom_text. I tried fitting it into the far-right panel, but it was difficult (impossible, for me) to adjust. I tried to put it on the main graph (which would probably not be the best solution anyway), but that didn't work either: it removed the histogram and the legend on the right side, and it did not plot all of the geom_text labels. I also tried adding a third row of plots and writing it there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
# Function to extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend) }
thm = list(theme_void(),
guides(fill="none"),   # guides(fill=FALSE) is deprecated since ggplot2 3.3.4
theme(plot.margin=unit(rep(0,4), "lines")))
xrng = c(0.6,4.4)
yrng = c(53,77)
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
geom_tile() +
theme_classic() +
coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
geom_line(stat="density") +
thm +
coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
geom_line(stat="density") +
thm +
coord_flip(xlim=yrng)
plot_grid(
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))
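For the follow-up in the question's edit (flagging values outside the plotted range and reporting sample sizes), one possible approach is sketched below. It is not part of the original answer. The names carat_rng, carat_clamped, n_by_range and count_label are made up for the example, and putting the counts in the subtitle is just one placement that avoids positioning geom_text by hand.

```r
library(ggplot2)

carat_rng <- c(1, 4)  # range covered by the tile plot

plot_data <- diamonds
# Classify each diamond relative to the plotted range
plot_data$carat_range <- factor(
  ifelse(plot_data$carat < carat_rng[1], "below",
         ifelse(plot_data$carat > carat_rng[2], "above", "within")),
  levels = c("below", "within", "above"))
# Move out-of-range values onto the nearest edge of the range
plot_data$carat_clamped <- pmin(pmax(plot_data$carat, carat_rng[1]),
                                carat_rng[2])

# Sample size per class, collapsed into a single label
n_by_range <- table(plot_data$carat_range)
count_label <- paste0("N(", names(n_by_range), ") = ",
                      as.integer(n_by_range), collapse = ", ")

# Stacked histogram: out-of-range counts appear in the edge bins
# in their own fill colour
p_hist <- ggplot(plot_data, aes(carat_clamped, fill = carat_range)) +
  geom_histogram(binwidth = 0.25, boundary = carat_rng[1]) +
  labs(subtitle = count_label)
print(p_hist)
```

To use this as the marginal plot p2, you would still add thm and coord_cartesian(xlim=xrng) as in the answer; note that thm switches off the fill legend.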
The code is as follows:
set.seed(123)
d1=data.frame(x=runif(10),y=runif(10),z=runif(10,1,10))
d2=data.frame(x=runif(10),y=runif(10),z=runif(10,100,1000))
ggplot()+geom_point(aes(x,y,size=z),data=d1)+
geom_line(aes(x,y,size=z),data=d2)
And the result is like this:
The points are too small, so I want to change their size with scale_size. However, it seems that both the lines and the points are affected. Is there a way to scale lines and points separately, each with its own legend?
The two ways I can think of are 1) combining two legend grobs or 2) hacking another legend aesthetic. Both of these were mentioned by @Mike Wise in the comments above.
Approach #1: combining 2 separate legends in the same plot using grobs.
I used code from this answer to grab the legend. Baptiste's arrangeGrob vignette is a useful reference.
library(grid); library(gridExtra)
#Function to extract legend grob
g_legend <- function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
#Create plots
p1 <- ggplot()+ geom_point(aes(x,y,size=z),data=d1) + scale_size(name = "point")
p2 <- ggplot()+ geom_line(aes(x,y,size=z),data=d2) + scale_size(name = "line")
p3 <- ggplot()+ geom_line(aes(x,y,size=z),data=d2) +
geom_point(aes(x,y, size=z * 100),data=d1) # Combined plot
legend1 <- g_legend(p1)
legend2 <- g_legend(p2)
legend.width <- sum(legend2$width)
gplot <- grid.arrange(p3 +theme(legend.position = "none"), legend1, legend2,
ncol = 2, nrow = 2,
layout_matrix = rbind(c(1,2 ),
c(1,3 )),
widths = unit.c(unit(1, "npc") - legend.width, legend.width))
grid.draw(gplot)
Note for saving: build the layout with arrangeGrob() instead of grid.arrange(), which draws immediately. I had to use png(), grid.draw() and dev.off(), in that order, to save the arrangeGrob() result.
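The saving step from the note can be sketched as follows. The plot, the file name and the device size are placeholders chosen for the example.

```r
library(ggplot2)
library(gridExtra)
library(grid)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# arrangeGrob() builds the layout without drawing it
layout_grob <- arrangeGrob(p, p, ncol = 2)

# Open a PNG device, draw the layout into it, then close the device
out_file <- file.path(tempdir(), "combined.png")
png(out_file, width = 800, height = 400)
grid.draw(layout_grob)
dev.off()
```

ggsave() from ggplot2 can also write such an object, which may be more convenient if you do not need direct control over the device.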
Approach #2: hacking another aesthetic legend.
MilanoR has a great post on this, focusing on colour instead of size.
More SO examples: 1) discrete colour and 2) colour gradient.
#Create discrete levels for point sizes (because points will be mapped to fill)
d1$z.bin <- findInterval(d1$z, c(0,2,4,6,8,10), all.inside= TRUE) #Create bins
#Scale the points to the same size as the lines (points * 100).
#Map points to a dummy aesthetic (fill)
#Hack the fill properties.
ggplot()+ geom_line(aes(x,y,size=z),data=d2) +
geom_point(aes(x,y, size=z * 100, fill = as.character(z.bin)),data=d1) +
scale_size("line", range = c(1,5)) +
scale_fill_manual("points", values = rep(1, 10) ,
guide = guide_legend(override.aes =
list(colour = "black",
size = sort(unique(d1$z.bin)) )))
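A third possibility, not covered by the two approaches above, relies on the linewidth aesthetic. This is a sketch that assumes ggplot2 3.4.0 or later, where line thickness is mapped through linewidth and gets its own scale and legend, separate from the size of points. The scale ranges are arbitrary values picked for the example.

```r
library(ggplot2)

set.seed(123)
d1 <- data.frame(x = runif(10), y = runif(10), z = runif(10, 1, 10))
d2 <- data.frame(x = runif(10), y = runif(10), z = runif(10, 100, 1000))

# Points use 'size', lines use 'linewidth': two scales, two legends
p_sep <- ggplot() +
  geom_point(aes(x, y, size = z), data = d1) +
  geom_line(aes(x, y, linewidth = z), data = d2) +
  scale_size("point", range = c(2, 8)) +
  scale_linewidth("line", range = c(0.5, 3))
print(p_sep)
```

With an older ggplot2 this will not work as written, and the grob or dummy-aesthetic approaches remain the options.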
I'm a noob in programming, but you could try this method. As you can see, my code uses points and paths. I define a vector whose length is the number of paths; my lines have size 1. Then I append the sizes of my points to the end of that vector.
size_vec<-c(rep(1, length(unique(Data$Satellite))), 1.4, 4.6, 4.2, 5.5)
plot <- ggplot(Data) +   # same data frame as in size_vec above (R is case-sensitive)
geom_point(aes(x = x_cor, y = y_cor, shape=Type, size=Type)) +
geom_path(aes(x = x_cor, y = y_cor, group = Tour, size=factor(Satellite))) +
scale_size_manual(values = size_vec, guide ='none')
I want to make a number of symmetrical histograms to show butterfly abundance through time. Here's a site that shows the form of the graphs I am trying to create: http://thebirdguide.com/pelagics/bar_chart.htm
For ease, I will use the iris dataset.
library(ggplot2)
g <- ggplot(iris, aes(Sepal.Width)) + geom_histogram(binwidth=.5)
g + coord_fixed(ratio = .003)
Essentially, I would like to mirror this histogram below the x-axis. Another way of thinking about the problem is to create a horizontal violin diagram with distinct bins. I've looked at the plotrix package and the ggplot2 documentation but don't find a solution in either place. I prefer to use ggplot2 but other solutions in base R, lattice or other packages will be fine.
Without your exact data, I can only provide an approximate coding solution, but it is a start for you (if you add more details, I'll be happy to help you tweak the plot). Here's the code:
library(ggplot2)
noSpp <- 3
nTime <- 10
d <- data.frame(
JulianDate = rep(1:nTime , times = noSpp),
sppAbundance = c(c(1:5, 5:1),
c(3:5, 5:1, 1:2),
c(5:1, 1:5)),
yDummy = 1,
sppName = rep(letters[1:noSpp], each = nTime))
ggplot(data = d, aes(x = JulianDate, y = yDummy, size = sppAbundance)) +
geom_line() + facet_grid( sppName ~ . ) + ylab("Species") +
xlab("Julian Date")
And here's the figure.
I would like two separate plots. I am using them in different frames of a beamer presentation, and I will add one line to the other (eventually, not in the example below). Thus I do not want the presentation to "skip" ("jump"?) from one slide to the next. I would like it to look as if the line is being added naturally. I believe the code below shows the problem. It is subtle, but note how the plot area of the second plot is slightly larger than that of the first plot. This happens because of the y-axis label.
library(ggplot2)
dfr1 <- data.frame(
time = 1:10,
value = runif(10)
)
dfr2 <- data.frame(
time = 1:10,
value = runif(10, 1000, 1001)
)
p1 <- ggplot(dfr1, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2))
print(p1)
dev.new()
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(".")
print(p2)
I would prefer to not have a hackish solution such as setting the size of the axis label manually or adding spaces on the x-axis (see one reference below), because I will use this technique in several settings and the labels can change at any time (I like reproducibility so want a flexible solution).
I've searched a lot and have found the following:
Specifying ggplot2 panel width
How can I make consistent-width plots in ggplot (with legends)?
https://groups.google.com/forum/#!topic/ggplot2/2MNoYtX8EEY
How can I add variable size y-axis labels in R with ggplot2 without changing the plot width?
They do not work for me, mainly because I need separate plots, so it is not a matter of aligning them vertically on one combined plot as in some of the above solutions.
haven't tried, but this might work,
gl <- lapply(list(p1,p2), ggplotGrob)
library(grid)
widths <- do.call(unit.pmax, lapply(gl, "[[", "widths"))
heights <- do.call(unit.pmax, lapply(gl, "[[", "heights"))
lg <- lapply(gl, function(g) {g$widths <- widths; g$heights <- heights; g})
grid.newpage()
grid.draw(lg[[1]])
grid.newpage()
grid.draw(lg[[2]])
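For the beamer use case, the same idea can be wrapped so that each plot lands in its own file. This is a sketch under the assumption that both plots share the same theme, so that their grob tables have the same number of rows and columns. The file names and the device size are made up for the example.

```r
library(ggplot2)
library(grid)

dfr1 <- data.frame(time = 1:10, value = runif(10))
dfr2 <- data.frame(time = 1:10, value = runif(10, 1000, 1001))
p1 <- ggplot(dfr1, aes(time, value)) + geom_line() +
  ylab(expression(hat(z) == hat(gamma)[1] * time + hat(gamma)[4] * time^2))
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() + ylab(".")

# Convert to grob tables and take the element-wise maximum of the
# column widths and row heights across both plots
gl <- lapply(list(p1, p2), ggplotGrob)
widths  <- do.call(unit.pmax, lapply(gl, "[[", "widths"))
heights <- do.call(unit.pmax, lapply(gl, "[[", "heights"))
lg <- lapply(gl, function(g) { g$widths <- widths; g$heights <- heights; g })

# One PDF per plot, all on identically sized devices
files <- file.path(tempdir(), c("frame1.pdf", "frame2.pdf"))
for (i in seq_along(lg)) {
  pdf(files[i], width = 5, height = 4)
  grid.draw(lg[[i]])
  dev.off()
}
```

Because the widths are computed from the plots themselves, changing a label only requires re-running the script, not adjusting any sizes by hand.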
How about using this for p2:
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() +
scale_y_continuous(breaks = NULL) +
scale_x_continuous(breaks = NULL) +
ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2)) +
theme(axis.title.y=element_text(color=NA))
This has the same label as p1, but the color is NA so it doesn't display. You could also use color="white".
Context
I want to plot two ggplot2 plots on the same page with the same legend. http://code.google.com/p/gridextra/wiki/arrangeGrob describes how to do this, and the result already looks good. But in my example I have two plots with the same x-axis and different y-axes. When the range of one y-axis is at least 10 times higher than that of the other plot (e.g. 10000 instead of 1000), ggplot2 (or grid?) does not align the plots correctly (see Output below).
Question
How do I also align the left side of the plots when they have two different y-axes?
Example Code
x = c(1, 2)
y = c(10, 1000)
data1 = data.frame(x,y)
p1 <- ggplot(data1) + aes(x=x, y=y, colour=x) + geom_line()
y = c(10, 10000)
data2 = data.frame(x,y)
p2 <- ggplot(data2) + aes(x=x, y=y, colour=x) + geom_line()
# Source: http://code.google.com/p/gridextra/wiki/arrangeGrob
leg <- ggplotGrob(p1 + opts(keep="legend_box"))
legend=gTree(children=gList(leg), cl="legendGrob")
widthDetails.legendGrob <- function(x) unit(3, "cm")
grid.arrange(
p1 + opts(legend.position="none"),
p2 + opts(legend.position="none"),
legend=legend, main ="", left = "")
Output
A cleaner and more generic way of doing the same thing is to pad the labels with a formatting function, passed through the labels argument (called formatter before ggplot2 0.9.0):
p1 <- ggplot(data1) +
aes(x=x, y=y, colour=x) +
geom_line() +
scale_y_continuous(labels = function(x) format(x, width = 5))
Do the same for your second plot and make sure to set the width >= the widest number you expect across both plots.
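To see what the padding does, here is a small base R check. The break values and the name pad5 are invented for the example. format() right-justifies every label to at least the requested width, so labels from both plots end up with the same number of characters.

```r
# Hypothetical axis breaks for the two plots
small <- c(200, 400, 600, 800, 1000)
large <- c(2500, 5000, 7500, 10000)

# Same formatting function as passed to the y scale
pad5 <- function(x) format(x, width = 5)

labels_small <- pad5(small)
labels_large <- pad5(large)

# Every label is five characters wide in both sets
nchar(labels_small)
nchar(labels_large)
```

With a proportional font, equal character counts only approximate equal label widths, so the alignment may still be off by a small amount.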
1. Using cowplot package:
library(cowplot)
plot_grid(p1, p2, ncol=1, align="v")
2. Using tracks from ggbio package:
Note: There seems to be a bug, x ticks do not align. (tested on 17/03/2016, ggbio_1.18.5)
library(ggbio)
tracks(data1=p1,data2=p2)
If you don't mind a shameless kludge, just add an extra character to the longest label in p1, like this:
p1 <- ggplot(data1) +
aes(x=x, y=y, colour=x) +
geom_line() +
scale_y_continuous(breaks = seq(200, 1000, 200),
labels = c(seq(200, 800, 200), " 1000"))
I have two underlying questions, which I hope you'll forgive if you have your reasons:
1) Why not use the same y axis on both? I feel like that's a more straight-forward approach, and easily achieved in your above example by adding scale_y_continuous(limits = c(0, 10000)) to p1.
2) Is the functionality provided by facet_wrap not adequate here? It's hard to know what your data structure is actually like, but here's a toy example of how I'd do this:
library(ggplot2)
library(reshape2)  # provides melt()
# Maybe your dataset is like this
x <- data.frame(x = c(1, 2),
y1 = c(0, 1000),
y2 = c(0, 10000))
# Molten data makes a lot of things easier in ggplot
x.melt <- melt(x, id.vars = "x", measure.vars = c("y1", "y2"))
# Plot it - one page, two facets, identical axes (though you could change them),
# one legend
ggplot(x.melt, aes(x = x, y = value, color = x)) +
geom_line() +
facet_wrap( ~ variable, nrow = 2)