Plot all columns from a data.frame in a subplot with ggplot2

As the title suggests, I want to plot all the columns of my data.frame, but in a generic way. All the columns I want to plot are factors.
Here is my code so far:
nums <- sapply(train_dataset, is.factor) #Select factor columns
factor_columns <- train_dataset[ , nums]
plotList <- list()
for (i in c(1:NCOL(factor_columns))){
  name = names(factor_columns)[i]
  p <- ggplot(data = factor_columns) + geom_bar(mapping = aes(x = name))
  plotList[[i]] <- p
}
multiplot(plotList, cols = 3)
where the multiplot function comes from the Cookbook for R: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
and my dataset comes from the Kaggle house price prediction competition: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
What I get from my code is shown below; it appears to be only the last column, and it is drawn incorrectly.
[Figure: output of the code above]
This is how the last column should look:
[Figure: the expected bar chart for the last column]
EDIT:
Using gridExtra, as @LAP suggested, doesn't give me a good result either. I use this instead of multiplot:
nCol <- floor(sqrt(length(plotList)))
do.call("grid.arrange", c(plotList, ncol=nCol))
but this is the result:
[Figure: output of grid.arrange]
Again, SaleCondition is the only column plotted, and it is not drawn correctly.
P.S. I also tried cowplot, with the same result.
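For reference, a likely cause of this behaviour: aes(x = name) maps the character string stored in name rather than the column it names, and ggplot2 evaluates aesthetics only when a plot is drawn, so by then every plot in the list sees the value name had after the last iteration (SaleCondition). The Cookbook's multiplot() also expects a list through its plotlist argument, not as the first positional argument. A minimal sketch of one fix, assuming ggplot2 3.0.0 or later, a small hypothetical data frame in place of train_dataset, and gridExtra::grid.arrange() instead of multiplot(): lapply() gives each plot its own name, and the .data[[name]] pronoun looks the column up by its name.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in for train_dataset (the real one is the Kaggle file)
train_dataset <- data.frame(
  Street = factor(c("Pave", "Pave", "Grvl", "Pave")),
  SaleCondition = factor(c("Normal", "Abnorml", "Normal", "Partial")),
  SalePrice = c(208500, 181500, 223500, 140000)
)

# Names of the factor columns only
factor_names <- names(train_dataset)[sapply(train_dataset, is.factor)]

# Each lapply() call has its own `name`, so every plot keeps its own column;
# .data[[name]] maps the column whose name is stored in the string `name`
plotList <- lapply(factor_names, function(name) {
  ggplot(train_dataset, aes(x = .data[[name]])) +
    geom_bar() +
    xlab(name)
})

# grid.arrange() takes a list of plots through its grobs argument
grid.arrange(grobs = plotList, ncol = 2)
```

With the Cookbook function, the equivalent call would be multiplot(plotlist = plotList, cols = 3).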

Using tidyr, you can reshape the factor columns into long format and draw one facet per column:
library(tidyr)
library(ggplot2)
factor_columns %>%
  gather(factor, level) %>%
  ggplot(aes(level)) + geom_bar() + facet_wrap(~factor, scales = "free_x")
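In current tidyr, gather() is superseded by pivot_longer(). A sketch of the same approach with pivot_longer(), assuming a small hypothetical factor-only data frame in place of factor_columns:

```r
library(ggplot2)
library(tidyr)

# Hypothetical factor-only data frame standing in for factor_columns
factor_columns <- data.frame(
  Street = factor(c("Pave", "Pave", "Grvl", "Pave")),
  SaleCondition = factor(c("Normal", "Abnorml", "Normal", "Partial"))
)

# Long format: one row per (original column name, value) pair
long <- pivot_longer(factor_columns, cols = everything(),
                     names_to = "factor", values_to = "level")

# One facet per original column; free_x shows only that column's own levels
p <- ggplot(long, aes(level)) +
  geom_bar() +
  facet_wrap(~factor, scales = "free_x")
print(p)
```

Faceting keeps everything in a single ggplot object, so no separate arranging function is needed.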

Related

Unpacking a list inside a function in R [duplicate]

library(ggplot2)
df <- data.frame(x=1:10, y=rnorm(10))
p1 <- ggplot(df, aes(x,y)) + geom_point()
plist <- list(p1,p1,p1,p1,p1)
# In my real example, a plot function will fit a ggplot to a list of datasets
# and return a list of ggplots like the example above.
I'd like to arrange the plots using grid.arrange() in gridExtra.
How would I do this if the number of plots in plist is variable?
This works:
grid.arrange(plist[[1]],plist[[2]],plist[[3]],plist[[4]],plist[[5]])
but I need a more general solution. thoughts?
How about this:
library(gridExtra)
n <- length(plist)
nCol <- floor(sqrt(n))
do.call("grid.arrange", c(plist, ncol=nCol))
You can use grid.arrange() and arrangeGrob() with lists as long as you specify the list using the grobs = argument in each function. E.g. in the example you gave:
library(ggplot2)
library(gridExtra)
df <- data.frame(x=1:10, y=rnorm(10))
p1 <- ggplot(df, aes(x,y)) + geom_point()
plist <- list(p1,p1,p1,p1,p1)
grid.arrange(grobs = plist, ncol = 2) ## display plot
OutFileName <- "plots.png" ## placeholder: your own output file
ggsave(filename = OutFileName, plot = arrangeGrob(grobs = plist, ncol = 2)) ## save plot
For the sake of completeness (and as this old, already answered question has been revived, recently) I would like to add a solution using the cowplot package:
cowplot::plot_grid(plotlist = plist, ncol = 2)
I know the question specifically asks about the gridExtra package, but the wrap_plots function from the patchwork package is a great way to handle a variable-length list:
library(ggplot2)
# install.packages("patchwork")  # patchwork is now on CRAN
library(patchwork)
df <- data.frame(x=1:10, y=rnorm(10))
p1 <- ggplot(df, aes(x,y)) + geom_point()
plist <- list(p1,p1,p1,p1,p1)
wrap_plots(plist)
A useful thing about it is that you don't need to specify how many columns are required: it aims to keep the numbers of columns and rows roughly equal. For example:
plist <- list(p1,p1,p1,p1,p1,p1,p1,p1,p1,p1,p1,p1,p1)
wrap_plots(plist) # produces a 4 col x 4 row plot
Find out more in the patchwork package documentation.
To fit all plots on one page you can calculate the number of columns and rows like this:
x = length(plots)
cols = round(sqrt(x),0)
rows = ceiling(x/cols)
As most multi-plot functions take ncol and nrow arguments, you can pass these values straight in. I like ggarrange from ggpubr:
library(ggpubr)
ggarrange(plotlist = plots, ncol = cols, nrow = rows)
This favours more rows than columns, so swap the two if you want the opposite; for example, for 6 plots it gives 3 rows and 2 columns rather than the other way around.
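The same arithmetic can be wrapped in a small helper and passed to gridExtra::grid.arrange(). A sketch, where grid_dims() is a hypothetical helper name and the plots are copies of the toy p1 from the question:

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: near-square grid that favours rows over columns
grid_dims <- function(n) {
  cols <- round(sqrt(n))
  rows <- ceiling(n / cols)
  c(nrow = rows, ncol = cols)
}

df <- data.frame(x = 1:10, y = rnorm(10))
p1 <- ggplot(df, aes(x, y)) + geom_point()
plots <- rep(list(p1), 6)  # a list of 6 identical plots

dims <- grid_dims(length(plots))
grid.arrange(grobs = plots, nrow = dims[["nrow"]], ncol = dims[["ncol"]])
```

For 6 plots this gives 3 rows and 2 columns; for 13 plots it gives a 4 by 4 grid.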

ggplot on grid with a grobList in R

I'm trying to plot multiple plots on a grid using ggplot2 in a for loop, followed by grid.arrange. But all the plots are identical afterwards.
library(ggplot2)
library(gridExtra)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
plotlist = list()
for (i in 1:(dim(test)[2]-1)){
  plotlist[[i]] = ggplot(test) +
    geom_point(aes(get(x=names(test)[dim(test)[2]]), y=get(names(test)[i])))
}
pdf("output.pdf")
do.call(grid.arrange, list(grobs=plotlist, nrow=3))
dev.off()
When I run this code, the get() calls seem to be evaluated only when grid.arrange is called, so the y values in every plot are identical (all var_15). Is there a way to force get() to be evaluated immediately, so that I get 15 different plots?
Thanks!
Here are two ways that use purrr::map functions instead of a for-loop. I find that I have less of a clear sense of what's going on when I try to use loops, and since there are functions like the apply and map families that fit so neatly into R's vector operations paradigm, I generally go with mapping instead.
The first example makes use of cowplot::plot_grid, which can take a list of plots and arrange them. The second uses the newer patchwork package, which lets you add plots together—like literally saying plot1 + plot2—and add a layout. To do all those additions, I use purrr::reduce with + as the function being applied to all the plots.
library(tidyverse)
set.seed(722)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
# extract all but last column
xvars <- test[, -ncol(test)]
By using purrr::imap, I can map over all the columns and apply a function with 2 arguments: the column itself, and its name. That way I can set an x-axis label that specifies the column name. I can also easily access the column of data without having to use get or any tidyeval tricks (although for something more complicated, a tidyeval solution might be better).
plots <- imap(xvars, function(variable, var_name) {
  # tibble() replaces the deprecated data_frame()
  df <- tibble(x = variable, y = test[, ncol(test)])
  ggplot(df, aes(x = x, y = y)) +
    geom_point() +
    xlab(var_name)
})
cowplot::plot_grid(plotlist = plots, nrow = 3)
library(patchwork)
# same as plots[[1]] + plots[[2]] + plots[[3]] + ...
reduce(plots, `+`) + plot_layout(nrow = 3)
Created on 2018-07-22 by the reprex package (v0.2.0).
Try this:
library(ggplot2)
library(grid)
library(gridExtra)
set.seed(1234)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
plotlist = list()
for (i in 1:(dim(test)[2]-1)) {
# Define here the dataset for the i-th plot
df <- data.frame(x=test$var_16, y=test[, i])
plotlist[[i]] = ggplot(data=df, aes(x=x, y=y)) + geom_point()
}
grid.arrange(grobs=plotlist, nrow=3)
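The question also asks how to force evaluation immediately. One way, sketched here assuming ggplot2 3.0.0 or later and the rlang package (a ggplot2 dependency), is to inject the column symbol into aes() with !!, so each plot stores var_1, var_2, and so on, rather than an expression that still refers to i:

```r
library(ggplot2)
library(gridExtra)
library(rlang)

set.seed(1234)
test <- data.frame(matrix(rnorm(320), ncol = 16))
names(test) <- paste0("var_", 1:16)

plotlist <- list()
for (i in 1:15) {
  yvar <- names(test)[i]
  # !!sym(yvar) inserts the symbol (e.g. var_1) into the mapping right now,
  # so the plot no longer depends on the later values of i or yvar
  plotlist[[i]] <- ggplot(test, aes(x = var_16, y = !!sym(yvar))) +
    geom_point()
}

grid.arrange(grobs = plotlist, nrow = 3)
```

Unlike the answer above, every plot here shares the full test data frame; only the mapping differs.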

Plotting Several Grouped Bar Plots in Loop [R]

My challenge is to plot several bar plots at once: one plot for each variable of several different subsets. My goal is to compare regional differences for each variable. I would like to print all the resulting plots to an HTML file via R Markdown.
My main difficulty in making grouped bar charts automatically is that the groups have to be tabulated with table(data$Var[i], data$Region), but I don't know how to do this for every variable in turn. I would highly appreciate a hint on this.
Here is an example of what one of my subsets looks like:
# To Create this example of data:
b <- rep(matrix(c(1,2,3,2,1,3,1,1,1,1)), times=10)
data <- matrix(b, ncol=10)
colnames(data) <- paste("Var", 1:10, sep = "")
data <- as.data.frame(data)
reg_name <- c("North", "South")
Region <- rep(reg_name, 5)
data <- cbind(data,Region)
Using beside = TRUE, I was able to create one grouped bar plot (grouped by Region for Var1 from data):
tb <- table(data$Var1,data$Region)
barplot(tb, main="Var1", xlab="Values", legend=rownames(tb), beside=TRUE,
col=c("green", "darkblue", "red"))
I would like to loop this process to generate for example 10 plots for Var1 to Var10:
for(i in 1:10){
tb <- table(data[i], data$Region)
barplot(tb, main = i, xlab = "Values", legend = rownames(tb), beside = TRUE,
col=c("green", "darkblue", "red"))
}
R favours the apply family of functions, so I tried to create a function to be applied:
fct <- function(i) {
tb <- table(data[i], data$Region)
barplot(tb, main=i, xlab="Values", legend = rownames(tb), beside = TRUE,
col=c("green", "darkblue", "red"))
}
sapply(data, fct)
I have tried other ways, but was never successful. Maybe lattice or ggplot2 would offer an easier way to do this. I am just starting in R, so I will gladly accept any tips and suggestions. Thank you!
(I am running Windows with the most recent R version, 3.1.2 "Pumpkin Helmet".)
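For reference, the loop above most likely fails because data[i] returns a one-column data frame rather than a vector, so table() cannot cross-tabulate it with data$Region; data[[i]] extracts the column itself. The sapply() version passes each column's values as i instead of its index or name. A sketch of the base R loop that iterates over the column names, so each plot also gets a proper title (plot_var and var_names are hypothetical names):

```r
# Recreate the example data from the question
b <- rep(matrix(c(1, 2, 3, 2, 1, 3, 1, 1, 1, 1)), times = 10)
data <- as.data.frame(matrix(b, ncol = 10))
colnames(data) <- paste("Var", 1:10, sep = "")
data$Region <- rep(c("North", "South"), 5)

# Hypothetical helper: one grouped bar plot for the column named v
plot_var <- function(v) {
  # data[[v]] is the column as a vector; data[v] would be a one-column data frame
  tb <- table(data[[v]], data$Region)
  barplot(tb, main = v, xlab = "Values", legend.text = rownames(tb),
          beside = TRUE, col = c("green", "darkblue", "red"))
  invisible(tb)
}

# Loop over the column names (everything except Region)
var_names <- setdiff(names(data), "Region")
tables <- lapply(var_names, plot_var)
```

In an R Markdown chunk, each barplot() call should appear as its own figure in the HTML output.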
Given that you say "My goal is to compare regional differences for each variable", I'm not sure you've chosen the optimal plotting strategy. But yes, it is possible to do what you are asking.
[Figure: the default plot produced by the code in the question]
If you want a list of 10 plots, one for each variable, you can do the following with ggplot2 (dat is the question's data frame):
many_plots <-
  # for each column name in dat (except the last one, Region)...
  lapply(names(dat)[-ncol(dat)], function(x) {
    this_dat <- dat[, c(x, 'Region')]
    names(this_dat)[1] <- 'Var'
    # geom_bar() counts each value; binwidth is a histogram setting
    ggplot(this_dat, aes(x=Var, fill=factor(Var))) +
      geom_bar() + facet_grid(~Region) +
      theme_classic()
  })
[Figure: sample output for many_plots[[1]]]
If you want all the plots in one image, you can do this (using reshape2 and data.table):
library(data.table)
library(reshape2)
dat2 <-
  data.table(melt(dat, id.vars='Region'))[, .N, by=list(value, variable, Region)]
ggplot(dat2, aes(y=N, x=value, fill=factor(value))) +
geom_bar(stat='identity') + facet_grid(variable~Region) +
theme_classic()
...but that's not a great plot.
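If the aim is to compare regions for each variable, a single faceted ggplot with bars dodged by Region may be easier to read. A sketch, assuming tidyr is available (pivot_longer() takes the place of the melt() step above) and recreating the question's data as dat:

```r
library(ggplot2)
library(tidyr)

# Recreate the example data from the question
b <- rep(matrix(c(1, 2, 3, 2, 1, 3, 1, 1, 1, 1)), times = 10)
dat <- as.data.frame(matrix(b, ncol = 10))
colnames(dat) <- paste0("Var", 1:10)
dat$Region <- rep(c("North", "South"), 5)

# Long format: one row per (Region, variable, value)
long <- pivot_longer(dat, cols = -Region, names_to = "variable", values_to = "value")
# Keep Var1 ... Var10 in numeric order in the facets
long$variable <- factor(long$variable, levels = paste0("Var", 1:10))

# One panel per variable; within it, North and South bars side by side
p <- ggplot(long, aes(x = factor(value), fill = Region)) +
  geom_bar(position = "dodge") +
  facet_wrap(~variable, ncol = 5) +
  labs(x = "Value", y = "Count")
print(p)
```

This mirrors barplot(..., beside = TRUE), with the region as the dodged group.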

Histograms using ggplot2 within loop

I would like to create a grid of histograms using a loop and ggplot2. Say I have the following code:
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-NULL
for (i in 1:5){
out[[i]]<-ggplot(df, aes(x=df[,i])) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
Note that all of the plots appear, but that they all have the same mean and shape, despite having set each of the columns of df to have different means.
It seems to only plot the last plot (out[[5]]), that is, the loop seems to be reassigning all of the out[[i]]s with out[[5]].
I'm not sure why, could someone help?
I agree with @GabrielMagno: faceting is the way to go. But if for some reason you need to work with a loop, either of these will do the job.
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-list()
for (i in 1:5){
x = df[,i]
out[[i]] <- ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
or
out1 = lapply(df, function(x){
ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5) })
grid.arrange(out1[[1]],out1[[2]],out1[[3]],out1[[4]],out1[[5]], ncol=2)
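Both versions above still list each plot by hand in grid.arrange(). A sketch that passes the whole list through the grobs argument and titles each histogram with its column name, using base R's Map() (out2 is a hypothetical name; the seed is added only to make the example reproducible):

```r
library(ggplot2)
library(gridExtra)

set.seed(1)  # only to make the example reproducible
# Same structure as df above: 5 columns X1..X5 with means 1..5
df <- data.frame(sapply(1:5, function(m) rnorm(2000, m, 1)))

# Map() pairs each column with its name; every call builds its own data frame
out2 <- Map(function(x, nm) {
  ggplot(data.frame(x), aes(x)) +
    geom_histogram(binwidth = .5) +
    ggtitle(nm)
}, df, names(df))

# grobs = accepts a list of any length
grid.arrange(grobs = out2, ncol = 2)
```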
I would recommend using facet_wrap instead of aggregating and arranging the plots yourself. It requires a grouping variable in the data frame that separates the values for each distribution. You can use the melt function from the reshape2 package to create such a data frame. So, with your data stored in df, you could simply do this:
library(ggplot2)
library(reshape2)
ggplot(melt(df), aes(x = value)) +
facet_wrap(~ variable, scales = "free", ncol = 2) +
geom_histogram(binwidth = .5)
[Figure: faceted histograms, one panel per column of df]
