Context: I have a dataset of 50+ features, and I would like to produce a boxplot, a histogram, and summary statistics for each of them, for presentation purposes. That makes 150+ plots. This is the code I have used:
library(ggplot2)
library(dplyr)
library(ggpubr)
library(ggthemes)
library(Rmisc)
library(gridExtra)
myplots <- list() # new empty list
for (i in seq(2,5,3)){
  local({
    i <- i
    p1 <- ggplot(data=dataset,aes(x=dataset[ ,i], colour=label))+
      geom_histogram(alpha=.01, position="identity",bins = 33, fill = "white") +
      xlab(colnames(dataset)[ i]) + scale_y_log10() + theme_few()
    p2<- ggplot(data=dataset, aes( x=label, y=dataset[ ,i], colour=label)) +
      geom_boxplot()+ylab(colnames(dataset)[ i]) +theme_few()
    p3<- summary(dataset[ ,i])
    print(i)
    print(p1)
    print(p2)
    print(p3)
    myplots[[i]] <<- p1 # histogram
    myplots[[i+1]] <<- p2 # boxplot
    myplots[[i+2]] <<- p3 # summary
  })
}
myplots[[2]]
length(myplots)
n <- length(myplots)
nCol <- floor(sqrt(n))
do.call("grid.arrange", c(myplots, ncol=nCol)) # PROBLEM: can't print summary as grob
I have created a list of plots in which every three consecutive elements hold the histogram, boxplot, and summary for one feature. I iterate through each of the 50+ features, appending each result to the list (not the best way to go about this, I know). I then run into the following error when I try to print the list with grid.arrange():
Error in gList(list(grobs = list(list(x = 0.5, y = 0.5, width = 1, height = 1, :
only 'grobs' allowed in "gList"
Understandably so, as summary() does not produce a graphical object. Any ideas on how to overcome this, other than leaving out the summary statistics altogether?
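A minimal sketch of one workaround: turn each summary() result into a small data frame and wrap it with gridExtra::tableGrob(), which returns a grob and can therefore sit in the same list as the ggplot objects. Assumptions: the built-in iris data stands in for `dataset`, Species stands in for `label`, and the names `features`, `panels`, and `stats_df` are illustrative. It also builds the list with lapply() rather than index arithmetic inside a loop.

```r
library(ggplot2)
library(gridExtra)

# Assumption: iris stands in for the question's 'dataset', Species for 'label'
dataset <- iris
features <- names(dataset)[1:4]

panels <- lapply(features, function(f) {
  # .data[[f]] looks the column up by name, so each plot keeps its own feature
  hist_p <- ggplot(dataset, aes(x = .data[[f]], colour = Species)) +
    geom_histogram(bins = 33, fill = "white", position = "identity") +
    xlab(f)
  box_p <- ggplot(dataset, aes(x = Species, y = .data[[f]], colour = Species)) +
    geom_boxplot() +
    ylab(f)
  # summary() returns a named numeric summary; convert it to a two-column data frame
  s <- summary(dataset[[f]])
  stats_df <- data.frame(stat = names(s), value = format(as.numeric(s), digits = 3))
  # tableGrob() turns the data frame into a grob that grid.arrange() can place
  list(hist_p, box_p, tableGrob(stats_df, rows = NULL))
})

# Flatten into one list: histogram, boxplot, table, histogram, boxplot, table, ...
panels <- do.call(c, panels)
grid.arrange(grobs = panels, ncol = 3)
```

With three columns, each row of the arranged page holds the histogram, boxplot, and summary table for one feature.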
After combining several of the suggestions here, I managed to figure out how to plot the summary statistics for each feature as a grob, by looping through the features of my dataset.
library(skimr)     # assumes skimr 1.x, whose skim() output has 'stat' and 'formatted' columns
library(gridExtra) # provides tableGrob() and grid.arrange() (package name is case-sensitive)
library(ggplot2)
library(dplyr)     # provides select() and slice()
mysumplots <- list() # new empty list
for (i in seq(2,ncol(dataset))){
  local({
    i <- i
    sampletable <- data.frame(skim(dataset[ ,i])) # creates a skim data frame
    summarystats <- select(sampletable, stat, formatted) # select relevant df columns
    summarystats <- slice(summarystats, 4:10) # select relevant stats
    p3 <- tableGrob(summarystats, rows=NULL) # converts df into a tableGrob
    # append the grob to the list of summary table grobs (i - 1 so slot 1 is not left empty)
    mysumplots[[i - 1]] <<- p3
  })
}
do.call("grid.arrange", c(mysumplots, ncol=3)) # use grid.arrange to plot all the grobs
This creates a skim data frame for each column (feature), selects the relevant statistics, converts them to a tableGrob assigned to p3, and appends that grob to a list of table grobs, one per feature. Finally, grid.arrange() draws all of the tableGrobs.
Related
I am trying to construct a list of ggplot graphics, which will be plotted later. What I have so far, using Anscombe's quartet for an example, is:
library(ggplot2)
library(gridExtra)
base <- ggplot() + xlim(4,19)
plots = vector(mode = "list", length = 4)
for(i in 1:4) {
  x <- anscombe[,i]
  y <- anscombe[,i+4]
  p <- geom_point(aes(x,y),colour="blue")
  q <- geom_smooth(aes(x,y),method="lm",colour="red",fullrange=T)
  plots[[i]] <- base+p+q
}
grid.arrange(grobs = plots,ncol=2)
As I iterate through the loop, I want the current p and q layers to be added to the base plot and stored as the i-th element of the list, so that list element i contains the plot for the i-th x and y columns of the dataset.
However, only the last plot is drawn, four times. I've done something very similar in base R with mfrow, plot and abline, so I believe my logic is correct but my implementation isn't. I suspect the issue is with these lines:
plots = vector(mode = "list", length = 4)
plots[[i]] <- base+p+q
How can I create a list of ggplot graphics, starting from an empty list?
(If this is a trivial and stupid question, I apologise. I am very new both to R and to the Grammar of Graphics.)
The code works properly if lapply() is used instead of a for loop.
plots <- lapply(1:4, function(i) {
# create plot number i
})
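The reason for this issue is that ggplot uses lazy evaluation: aes(x, y) records the names x and y rather than their values, and looks them up only when the plot is built for rendering. By then the for loop has finished, so x and y hold the columns from i = 4 and the last plot is displayed four times. With lapply(), each call of the anonymous function has its own environment, so each plot finds its own x and y.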
Full working example:
library(ggplot2)
library(gridExtra)
base <- ggplot() + xlim(4,19)
plots <- lapply(1:4, function(i) {
  x <- anscombe[,i]
  y <- anscombe[,i+4]
  p <- geom_point(aes(x,y),colour="blue")
  q <- geom_smooth(aes(x,y),method="lm",colour="red",fullrange=T)
  base+p+q
})
grid.arrange(grobs = plots,ncol=2)
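To see the lazy evaluation described above in action, here is a small check (a sketch; `loop_plots`, `lapply_plots`, and `first_x()` are illustrative names). ggplot_build() evaluates the aesthetic mappings the same way printing does, so comparing the x values of the first plot from each approach shows which data each one actually draws.

```r
library(ggplot2)

# for loop: aes(x, y) stores the names x and y, looked up later in this environment
loop_plots <- vector(mode = "list", length = 4)
for (i in 1:4) {
  x <- anscombe[, i]
  y <- anscombe[, i + 4]
  loop_plots[[i]] <- ggplot() + geom_point(aes(x, y))
}

# lapply(): each call has its own environment holding its own x and y
lapply_plots <- lapply(1:4, function(i) {
  x <- anscombe[, i]
  y <- anscombe[, i + 4]
  ggplot() + geom_point(aes(x, y))
})

# ggplot_build() evaluates the mappings, as printing would; return the x values drawn
first_x <- function(p) ggplot_build(p)$data[[1]]$x

first_x(loop_plots[[1]])    # the values of anscombe$x4, not x1
first_x(lapply_plots[[1]])  # the values of anscombe$x1
```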
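To force evaluation in your original for loop, there's a simple solution: change aes(...) into aes_(...), which stores the current values of x and y rather than their names, and your code works. (Note that aes_() has been deprecated since ggplot2 3.0.0 in favour of tidy evaluation with aes().)
library(ggplot2)
library(gridExtra)
base <- ggplot() + xlim(4,19)
plots <- vector(mode = "list", length = 4)
for (i in 1:4) {
  x <- anscombe[,i]
  y <- anscombe[,i+4]
  p <- geom_point(aes_(x,y),colour="blue")
  q <- geom_smooth(aes_(x,y),method="lm",colour="red",fullrange=TRUE)
  plots[[i]] <- base+p+q
}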
grid.arrange(grobs = plots,ncol=2)
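An alternative that does not rely on aes_() at all is to give each plot its own data frame, so the mapping refers to columns of data stored inside the plot object rather than to loop variables. This is a sketch (the name `panel_data` is illustrative) that keeps the question's structure of filling a preallocated list in a for loop.

```r
library(ggplot2)
library(gridExtra)

plots <- vector(mode = "list", length = 4)
for (i in 1:4) {
  # The data frame is stored in the plot object, so later iterations cannot change it
  panel_data <- data.frame(x = anscombe[, i], y = anscombe[, i + 4])
  plots[[i]] <- ggplot(panel_data, aes(x = x, y = y)) +
    geom_point(colour = "blue") +
    geom_smooth(method = "lm", formula = y ~ x, colour = "red", fullrange = TRUE) +
    xlim(4, 19)
}
grid.arrange(grobs = plots, ncol = 2)
```

Here aes(x = x, y = y) names columns of panel_data, which ggplot resolves against the data frame captured when each plot was created.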
library(ggplot2)
df <- data.frame(x=1:10, y=rnorm(10))
p1 <- ggplot(df, aes(x,y)) + geom_point()
plist <- list(p1,p1,p1,p1,p1)
# In my real example, a plot function will fit a ggplot to a list of datasets
# and return a list of ggplots like the example above.
I'd like to arrange the plots using grid.arrange() in gridExtra.
How would I do this if the number of plots in plist is variable?
This works:
grid.arrange(plist[[1]],plist[[2]],plist[[3]],plist[[4]],plist[[5]])
but I need a more general solution. Thoughts?
How about this:
library(gridExtra)
n <- length(plist)
nCol <- floor(sqrt(n))
do.call("grid.arrange", c(plist, ncol=nCol))
You can use grid.arrange() and arrangeGrob() with lists as long as you specify the list using the grobs = argument in each function. E.g. in the example you gave:
library(ggplot2)
library(gridExtra)
df <- data.frame(x=1:10, y=rnorm(10))
p1 <- ggplot(df, aes(x,y)) + geom_point()
plist <- list(p1,p1,p1,p1,p1)
grid.arrange(grobs = plist, ncol = 2) ## display plot
ggsave(file = OutFileName, arrangeGrob(grobs = plist, ncol = 2)) ## save plot
For the sake of completeness (and as this old, already answered question has been revived, recently) I would like to add a solution using the cowplot package:
cowplot::plot_grid(plotlist = plist, ncol = 2)
I know the question specifically asks about the gridExtra package, but the wrap_plots() function from the patchwork package is a great way to handle variable-length lists:
library(ggplot2)
# install.packages("patchwork")  # patchwork is now available on CRAN
library(patchwork)
df <- data.frame(x=1:10, y=rnorm(10))
p1 <- ggplot(df, aes(x,y)) + geom_point()
plist <- list(p1,p1,p1,p1,p1)
wrap_plots(plist)
A useful thing about it is that you don't need to specify the number of columns: it aims to keep the numbers of columns and rows roughly equal. For example:
plist <- list(p1,p1,p1,p1,p1,p1,p1,p1,p1,p1,p1,p1,p1)
wrap_plots(plist) # produces a 4 col x 4 row plot
Find out more in the patchwork package documentation.
To fit all plots on one page you can calculate the number of columns and rows like this:
x = length(plots)
cols = round(sqrt(x),0)
rows = ceiling(x/cols)
As most multiple plotting functions have ncol and nrow as arguments you can just put these in there. I like ggarrange from ggpubr.
ggarrange(plotlist = plots, ncol=cols, nrow = rows)
This favours more rows than columns, so swap the two if you want the opposite; e.g. for 6 plots it gives 3 rows and 2 columns, not the other way around.
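The row/column arithmetic above can be wrapped in a small helper to check the layout for a given number of plots before passing it to ggarrange(). `grid_dims()` is a hypothetical name, not part of ggpubr or gridExtra.

```r
# Hypothetical helper wrapping the arithmetic above: columns from round(sqrt(n)),
# then just enough rows to hold all n plots
grid_dims <- function(n) {
  cols <- round(sqrt(n), 0)
  rows <- ceiling(n / cols)
  c(rows = rows, cols = cols)
}

grid_dims(6)  # rows = 3, cols = 2

# Layouts for 1 to 10 plots (one column of the result per n)
sapply(1:10, grid_dims)

# Usage with ggpubr, given a list of ggplots called plots:
# d <- grid_dims(length(plots))
# ggpubr::ggarrange(plotlist = plots, ncol = d[["cols"]], nrow = d[["rows"]])
```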
I am saving multiple ggplots to a list to be used in a subsequent multiplot. The plots are generated in a loop and appended to the list; however, after the loop all plot objects in the list are identical to the last plot of the loop. I have done this type of operation before without any issues. Has anyone experienced the same and solved the problem?
figList <- list()
aoinum <- 1
for (aoi in AOI_list){
  ...
  # prepare dataframe for plotting
  dat <- data.frame(...)
  fig <- ggplot(data=dat, aes(x=x, y=y, fill=z, alpha=q)) +
    geom_bar(...)+
    ...
  figList[[aoi]] <- fig
  aoinum = aoinum + 1
}
This is how I managed to make a list of plots in a for loop:
library(ggplot2)
library(gridExtra) # provides grid.arrange()
#Define list
ggcluster<-list()
for (cluster in 1:nclusters){
  # Simple plot (geom_polygon in my case)
  ggcluster[[cluster]]<-ggplot() +
    geom_polygon(data = datoshp.df, aes(long, lat, group = group))
}
# Build multiplot panel (two columns)
pngname<-paste0(output_path,"plot-name",".png")
png(pngname,width = 1000, height = 1000)
do.call(grid.arrange, c(ggcluster,list(ncol=2)))
dev.off()
My challenge is to plot several bar plots at once: one plot for each variable in several different subsets. My goal is to compare regional differences for each variable, and I would like to print all the resulting plots to an HTML file via R Markdown.
My main difficulty in making grouped bar charts automatically is that the groups have to be tabulated with table(data$Var[i], data$Region), but I don't know how to do this automatically. I would highly appreciate a hint on this.
Here is an example of what one of my subsets looks like:
# To Create this example of data:
b <- rep(matrix(c(1,2,3,2,1,3,1,1,1,1)), times=10)
data <- matrix(b, ncol=10)
colnames(data) <- paste("Var", 1:10, sep = "")
data <- as.data.frame(data)
reg_name <- c("North", "South")
Region <- rep(reg_name, 5)
data <- cbind(data,Region)
Using beside = TRUE, I was able to create one grouped bar plot (grouped by Region for Var1 from data):
tb <- table(data$Var1,data$Region)
barplot(tb, main="Var1", xlab="Values", legend=rownames(tb), beside=TRUE,
col=c("green", "darkblue", "red"))
I would like to loop this process to generate for example 10 plots for Var1 to Var10:
for(i in 1:10){
  tb <- table(data[i], data$Region)
  barplot(tb, main = i, xlab = "Values", legend = rownames(tb), beside = TRUE,
          col=c("green", "darkblue", "red"))
}
In R the apply family of functions is generally preferred, so I tried to create a function to be applied:
fct <- function(i) {
  tb <- table(data[i], data$Region)
  barplot(tb, main=i, xlab="Values", legend = rownames(tb), beside = TRUE,
          col=c("green", "darkblue", "red"))
}
sapply(data, fct)
I have tried other ways but was never successful. Maybe lattice or ggplot2 would offer an easier way to do this. I am just starting with R, so I will gladly accept any tips and suggestions. Thank you!
(I run on Windows, with the most recent R v3.1.2 "Pumpkin Helmet")
Given that you say "My goal is to compare regional differences for each variable", I'm not sure you've chosen the optimal plotting strategy. But yes, it is possible to do what you are asking.
Here's the default plot you get with your code above, for reference:
If you want a list of 10 plots, one for each variable, you can do the following (with ggplot2):
library(ggplot2)
many_plots <-
  # for each column name in data (except the last one, Region)...
  lapply(names(data)[-ncol(data)], function(x) {
    this_dat <- data[, c(x, 'Region')]
    names(this_dat)[1] <- 'Var'
    ggplot(this_dat, aes(x=Var, fill=factor(Var))) +
      geom_bar() + facet_grid(~Region) +
      theme_classic()
  })
Sample output, for many_plots[[1]]:
If you want all the plots in one image, you can do this (using reshape2 and data.table):
library(ggplot2)
library(data.table)
library(reshape2)
dat2 <-
  data.table(melt(data, id.vars='Region'))[, .N, by=list(value, variable, Region)]
ggplot(dat2, aes(y=N, x=value, fill=factor(value))) +
  geom_bar(stat='identity') + facet_grid(variable~Region) +
  theme_classic()
...but that's not a great plot.
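For comparison, the question's base-R loop can also be made to work. This is a sketch assuming the example `data` from the question; the key change is `data[[v]]`, which extracts a column as a plain vector, whereas `data[i]` returns a one-column data frame rather than the vector table() expects. `var_names` and `tables` are illustrative names.

```r
# Recreate the example data from the question
b <- rep(matrix(c(1, 2, 3, 2, 1, 3, 1, 1, 1, 1)), times = 10)
data <- as.data.frame(matrix(b, ncol = 10))
colnames(data) <- paste0("Var", 1:10)
data$Region <- rep(c("North", "South"), 5)

# Every column except Region
var_names <- setdiff(names(data), "Region")

# data[[v]] is a vector, so table() cross-tabulates its values against Region
tables <- lapply(var_names, function(v) table(data[[v]], data$Region))
names(tables) <- var_names

# One grouped bar plot per variable, titled with the column name
for (v in var_names) {
  tb <- tables[[v]]
  barplot(tb, main = v, xlab = "Values", legend.text = rownames(tb), beside = TRUE,
          col = c("green", "darkblue", "red"))
}
```

In an R Markdown chunk, each barplot() call in the loop produces its own figure in the HTML output.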