arbitrary number of plots for grid.arrange - r

I'm trying to plot an arbitrary number of bar plots in R Markdown, arranged in 2 columns. In my example there are 20 plots in total, so I was hoping to get 10 plots in each column. However, I can't seem to get this to work with grid.arrange:
plot.categoric = function(df, feature){
  df = data.frame(x=df[,feature])
  plot.feature = ggplot(df, aes(x=x, fill = x)) +
    geom_bar() +
    geom_text(aes(label=scales::percent(..count../1460)), stat='count', vjust=-.4) +
    labs(x=feature, fill=feature) +
    ggtitle(paste0(length(df$x))) +
    theme_minimal()
  return(plot.feature)
}
plist = list()
for (i in 1:20){
  plist = c(plist, list(plot.categoric(train, cat_features[i])))
}
args.list = c(plist, list(ncol=2))
do.call("grid.arrange", args.list)
When I knit this to html I'm getting the following output:
I was hoping I would get something along the lines of:
but even with this the figure sizes are still funky, I've tried playing with heights and widths but still no luck. Apologies if this is a long question

If you have all the ggplot objects in a list then you can easily build the two column graphic via gridExtra::grid.arrange. Here is a simple example that will put eight graphics into a 4x2 matrix.
library(ggplot2)
library(gridExtra)
# Build a set of plots
plots <-
  lapply(unique(diamonds$clarity),
         function(cl) {
           ggplot(subset(diamonds, clarity %in% cl)) +
             aes(x = carat, y = price, color = color) +
             geom_point()
         })
length(plots)
# [1] 8
grid.arrange(grobs = plots, ncol = 2)
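The call above fixes the layout but not the odd panel sizes mentioned in the question. One possible approach, sketched here under the assumption that about 2.5 inches per row reads well (an arbitrary figure, not from the answer above), is to derive the figure height from the number of rows. The names n_col, n_row and row_height_in are illustrative.

```r
library(ggplot2)
library(gridExtra)

# Stand-in plots: one scatter plot per clarity level, as in the answer above
plots <- lapply(levels(diamonds$clarity), function(cl) {
  ggplot(subset(diamonds, clarity == cl), aes(carat, price)) + geom_point()
})

n_col <- 2
n_row <- ceiling(length(plots) / n_col)   # number of rows the grid needs
row_height_in <- 2.5                      # assumed height per row, in inches

# arrangeGrob() builds the layout without drawing it, so it can be saved
layout <- arrangeGrob(grobs = plots, ncol = n_col)
out_file <- file.path(tempdir(), "grid.png")
ggsave(out_file, layout, width = 8, height = n_row * row_height_in)
```

In an R Markdown document the same number can go into the chunk option fig.height (for example fig.height=25 for 20 plots in 2 columns), with grid.arrange(grobs = plots, ncol = 2) in the chunk body.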

Related

Vertically combining multiple bar plots

I'm trying to vertically group bar plots so that they share their x-axes.
I thought of using plotly's subplot in R for that, but I'm running into an issue that I hope someone here has a solution for.
Here are example data with 28 groups. I create a bar plot over 4 families in each group and then try to combine the plots vertically using plotly::subplot:
set.seed(1)
df <- data.frame(group = paste0("G",unlist(lapply(1:28,function(i) rep(i,4)))),
                 family = paste0("F",rep(1:4,28)),
                 log2n = log2(as.integer(runif(4*28,0,30))+1),
                 stringsAsFactors = F)
Creating the list of bar plots:
library(plotly)
library(dplyr)
groups <- unique(df$group)
y.range <- c(0,max(df$log2n))
plot.list <- lapply(1:length(groups),function(g){
  group.df <- dplyr::filter(df,group == groups[g])
  plot_ly(x=group.df$family,y=group.df$log2n,type='bar',name=group.df$family,color=group.df$family,showlegend=(g==length(groups))) %>%
    layout(yaxis=list(range=y.range))
})
If I try:
plotly::subplot(plot.list,shareX=T,nrows=length(plot.list))
I get:
So it seems like some sort of an overflow.
I gradually cut down the number of plots in plot.list that I run subplot on, and when I reached 19 it seemed to stop 'overflowing':
plotly::subplot(plot.list[1:19],shareX=T,nrows=19)
Any idea if there's hope to get all 28 bar plots without overflowing?
Thanks a lot
I would generate the figure with ggplot and then convert it to plotly (or save it as a picture file) with proper size arguments.
library(plotly)
library(tidyverse)
g <- ggplot(df,
            aes(x = family, y = log2n, fill = family)) +
  geom_bar(stat = 'identity') +
  facet_wrap(~group, ncol = 1) +
  theme_minimal() +
  theme(legend.position = "none")
# ggsave() takes the file name first and the plot second
ggsave("temp.png", plot = g, width = 4, height = 40)
ggplotly(g, width = 400, height = 4000)
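The fixed height of 40 suits 28 groups. As a hedged variation, the height can be scaled with the number of facets. The sketch below rebuilds comparable data so that it runs on its own. The 1.4 inches per panel and the names n_groups and panel_height_in are assumptions for illustration. It also turns group into a factor, because facet_wrap otherwise orders character labels alphabetically (G1, G10, G11, ...).

```r
library(ggplot2)

set.seed(1)
df <- data.frame(group  = rep(paste0("G", 1:28), each = 4),
                 family = rep(paste0("F", 1:4), times = 28),
                 log2n  = log2(floor(runif(4 * 28, 0, 30)) + 1))

# Keep G1..G28 in numeric order rather than alphabetical order
df$group <- factor(df$group, levels = paste0("G", 1:28))

n_groups <- nlevels(df$group)
panel_height_in <- 1.4                    # assumed inches per facet row

g <- ggplot(df, aes(x = family, y = log2n, fill = family)) +
  geom_col() +
  facet_wrap(~group, ncol = 1) +
  theme_minimal() +
  theme(legend.position = "none")

out_file <- file.path(tempdir(), "bars.png")
ggsave(out_file, g, width = 4, height = n_groups * panel_height_in)
```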

How to plot histograms of raw data on the margins of a plot of interpolated data

I would like to show interpolated data and a histogram of the raw data of each predictor in the same plot. Other threads explain how to add marginal histograms of the same data that is shown in a scatter plot. In this case, however, the histograms are based on different data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot e.g. with ggMarginal
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
  geom_tile()
Ideally, the histograms go even beyond the margins of the tileplot, as these data points also influence the predictions. I would, however, be already very happy to know how to plot a histogram for the range that is shown in the tileplot. (Maybe the values that are outside the range could just be added to the extreme values in different color.)
PS. I managed to more or less align histograms to the margins of the sides of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kind of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size in numbers and to graphically show values outside the plotted range, since they also affect the interpolated values.
I intended to include them in a different color in the histograms at the side, counting them towards the lower and upper ends of the plotted range. I also attempted to plot the sample size in numbers somewhere on the plot. However, I failed with both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
  geom_histogram() +
  thm +
  coord_cartesian(xlim=xrng)
I tried to add the sample size in numbers with geom_text. I tried fitting it into the far-right panel, but I could not get the position adjusted. I also tried putting it on the main graph (probably not the best solution anyway), but that did not work either: it removed the histogram and the legend on the right side, and it did not plot all of the geom_text labels. Finally, I tried adding a third row of plots and writing the text there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
  geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
  geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
  geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
  thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
                 ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
# Function to extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend) }
# fill="none" replaces fill=FALSE, which is deprecated in recent ggplot2 versions
thm = list(theme_void(),
           guides(fill="none"),
           theme(plot.margin=unit(rep(0,4), "lines")))
xrng = c(0.6,4.4)
yrng = c(53,77)
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
  geom_tile() +
  theme_classic() +
  coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
  geom_line(stat="density") +
  thm +
  coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
  geom_line(stat="density") +
  thm +
  coord_flip(xlim=yrng)
plot_grid(
  plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
            rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
  leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
                 ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))

Saving ggplots to a list in a for loop

I produce nine ggplots within a for loop and subsequently arrange those plots using grid.arrange:
plot_list <- list()
for(i in c(3:ncol(bilanz.vol))) {
  histogram <- ggplot(data = bilanz.vol, aes(x = bilanz.vol[,i])) +
    geom_histogram() +
    scale_x_log10() +
    ggtitle(paste(varnames[i]))
  # ggsave(filename = paste("Graphs/", vars[i], ".png", sep = ""), width = 16, height = 12, units = "cm")
  plot_list <- c(plot_list, list(histogram))
}
library(gridExtra)
png(filename = "Graphs/non-mfi.png", width = 1280, height = 960, units = "px")
do.call(grid.arrange, c(plot_list, list(ncol = 3)))
dev.off()
The code itself works fine and there are no errors. But for some reason I do not understand, the grid shows the same (last) histogram nine times. Still, each plot shows the correct title.
Interestingly, when I uncomment the ggsave line in the code above, each plot is saved correctly (separately) and shows the expected histogram.
Any ideas?
The reason is that ggplot does not evaluate the expression in the aes call until it is used (or so I believe, at least); it just sets up the plot and stores the data inside it. In your case "the data" is the entire data frame bilanz.vol, and since i = ncol(bilanz.vol) after the for loop completes, the expression bilanz.vol[,i] evaluates to the same thing for all plot objects.
To make it work you could do this inside the loop, which makes sure that each plot object contains its own data set my.data:
my.data <- data.frame(x = bilanz.vol[,i])
histogram <- ggplot(data = my.data, aes(x = x)) +
  geom_histogram() +
  scale_x_log10() +
  ggtitle(paste(varnames[i]))
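Put together, a complete version of this fix might look like the sketch below. It uses three columns of the built-in mtcars data as a stand-in for bilanz.vol (an assumption, since the original data is not shown) and lapply in place of the for loop. The name dat and the bins value are illustrative.

```r
library(ggplot2)
library(gridExtra)

# Stand-in for bilanz.vol: three positive numeric columns from mtcars
dat <- mtcars[, c("mpg", "disp", "hp")]

plot_list <- lapply(names(dat), function(col) {
  # Copy the column into its own data frame so each plot stores its own data
  my.data <- data.frame(x = dat[[col]])
  ggplot(my.data, aes(x = x)) +
    geom_histogram(bins = 10) +
    scale_x_log10() +
    ggtitle(col)
})

grid.arrange(grobs = plot_list, ncol = 3)
```

Because every plot carries its own my.data, nothing depends on the value of a loop index at drawing time.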

Moving table created by annotation_custom with geom_bar plot

I tried searching for answers but couldn't find anything.
I have a plot and want to add a table within the plot itself. I can do it, but the table ends up right in the middle.
Is it possible to relocate a table created by annotation_custom if the x-axis is discrete? If so, how?
Thank you!
For example, I want to relocate this table.
library(ggplot2)
library(gridExtra)
my.summary <- summary(chickwts$weight)
my.table <- data.frame(ids = names(my.summary), nums = as.numeric(my.summary))
ggplot(chickwts, aes(feed, weight)) +
  geom_bar(stat = "identity") +
  annotation_custom(tableGrob(my.table))
The custom annotation in ggplot2 can be rearranged inside the plotting area. This at least moves it out of the center. Maybe this solution is already sufficient for you. I'll try and tweak this. It should be possible to put it outside the plotting area as well.
library(ggplot2)
library(gridExtra)
my.summary <- summary(chickwts$weight)
my.table <- data.frame(ids = names(my.summary), nums = as.numeric(my.summary))
ggplot(chickwts, aes(feed, weight)) +
  geom_bar(stat = "identity") +
  annotation_custom(tableGrob(my.table), xmin=5,xmax=6,ymin=300,ymax=1300)
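The values xmin=5 and xmax=6 work because ggplot2 places the levels of a discrete axis at the integer positions 1, 2, ..., n. As a rough sketch, the position can be derived from the data instead of being hard-coded. The names n_levels and y_top and the 0.6 factor are illustrative choices, not part of the answer above.

```r
library(ggplot2)
library(gridExtra)

my.summary <- summary(chickwts$weight)
my.table <- data.frame(ids = names(my.summary), nums = as.numeric(my.summary))

# Discrete x positions are the integers 1..n, in factor-level order
n_levels <- nlevels(chickwts$feed)

# geom_bar(stat = "identity") stacks the weights, so the top of the y range
# is the largest per-feed total, not the largest single weight
y_top <- max(tapply(chickwts$weight, chickwts$feed, sum))

p <- ggplot(chickwts, aes(feed, weight)) +
  geom_bar(stat = "identity") +
  annotation_custom(tableGrob(my.table),
                    xmin = n_levels - 1, xmax = n_levels,
                    ymin = 0.6 * y_top, ymax = y_top)
print(p)
```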
EDIT:
To place the table outside the plot, regardless of what the plot consists of, the grid package could be used:
library(ggplot2)
library(gridExtra)
library(grid)
# data
my.summary <- summary(chickwts$weight)
my.table <- data.frame(ids = names(my.summary), nums = as.numeric(my.summary))
# plot items
my.tGrob <- tableGrob(my.table)
plt <- ggplot(chickwts, aes(feed, weight)) +
  geom_bar(stat = "identity")
# layout
vp.layout <- grid.layout(nrow=1, ncol=2, heights=unit(1, "null"),
                         widths=unit(c(1,9), c("null","line")) )
# start drawing
grid.newpage()
pushViewport(viewport(layout=vp.layout, name="layout"))
# plot
pushViewport(viewport(layout.pos.row=1, layout.pos.col=1, name="plot"))
print(plt, newpage=FALSE)
upViewport()
# table
pushViewport(viewport(layout.pos.row=1, layout.pos.col=2, name="table"))
grid.draw(my.tGrob)
upViewport()
#dev.off()

Keep all plot components same size in ggplot2 between two plots

I would like two separate plots. I am using them in different frames of a beamer presentation, and I will add one line to the other (eventually, not in the example below). Thus I do not want the presentation to "skip" ("jump"?) from one slide to the next. I would like it to look like the line is being added naturally. I believe the code below shows the problem. It is subtle, but note how the plot area of the second plot is slightly larger than that of the first plot. This happens because of the y axis label.
library(ggplot2)
dfr1 <- data.frame(
  time = 1:10,
  value = runif(10)
)
dfr2 <- data.frame(
  time = 1:10,
  value = runif(10, 1000, 1001)
)
p1 <- ggplot(dfr1, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2))
print(p1)
dev.new()
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(".")
print(p2)
I would prefer to not have a hackish solution such as setting the size of the axis label manually or adding spaces on the x-axis (see one reference below), because I will use this technique in several settings and the labels can change at any time (I like reproducibility so want a flexible solution).
I've searched a lot and have found the following:
Specifying ggplot2 panel width
How can I make consistent-width plots in ggplot (with legends)?
https://groups.google.com/forum/#!topic/ggplot2/2MNoYtX8EEY
How can I add variable size y-axis labels in R with ggplot2 without changing the plot width?
They do not work for me, mainly because I need separate plots, so it is not a matter of aligning them vertically on one combined plot as in some of the above solutions.
haven't tried, but this might work,
gl <- lapply(list(p1,p2), ggplotGrob)
library(grid)
widths <- do.call(unit.pmax, lapply(gl, "[[", "widths"))
heights <- do.call(unit.pmax, lapply(gl, "[[", "heights"))
lg <- lapply(gl, function(g) {g$widths <- widths; g$heights <- heights; g})
grid.newpage()
grid.draw(lg[[1]])
grid.newpage()
grid.draw(lg[[2]])
How about using this for p2:
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() +
  scale_y_continuous(breaks = NULL) +
  scale_x_continuous(breaks = NULL) +
  ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2)) +
  theme(axis.title.y=element_text(color=NA))
This has the same label as p1, but the color is NA so it doesn't display. You could also use color="white".

Resources