Multiplot of multiplots in ggplot2 - r

I recently discovered the multiplot function from the Rmisc package to produce stacked plots using ggplot2 plots/objects. What I am trying to do now is to create a multiplot of multiplots. Unfortunately, unlike the ggplot function, multiplot does not produce objects, so my issue cannot be resolved by simply nesting multiplot.
I will create a data frame to make my point clear. My data frame, df, has 3 columns: period, group and value. A value is recorded for each of 3 groups over 10 periods. (Note: I don't set a seed below even though I use sample(), because the focus here is graphical, not numerical.)
# Create a data frame for illustration purposes
df <- data.frame(period = rep(1:10, 3),
group = rep(LETTERS[1:3], each = 10),
value = sample(100, 30, replace = TRUE))
I then add a fourth column to df, which is the exponential transformation of the value column.
df$exp.value <- exp(df$value)
I would like to create stacked plots allowing me to compare the values in each group to their exponential counterparts.
# Split dataframe by group
df_split <- split(df, df$group)
library(ggplot2)
# Plots of values in each group
plots <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = value)) + geom_line()
})
# Plots of exponential values in each group
plots_exp <- lapply(df_split, function(i){
ggplot(data = i, aes(x = period, y = exp.value)) + geom_line()
})
plots and plots_exp are both lists of 3 elements each containing ggplot objects. The first element of each list corresponds to group A, the second element corresponds to group B and the third element corresponds to group C.
In order to compare each group's values to the exponential values, I can use the multiplot function. Following is an example with group A:
library(Rmisc)
multiplot(plots[[1]], plots_exp[[1]], cols = 1)
How can I create a grid which will include the multiplot above as well as the ones for groups B and C? As if the code included ... + facet_grid(. ~ group)?

We can use the cowplot package:
library(cowplot)
plot_grid(plots[[1]], plots_exp[[1]],
plots[[2]], plots_exp[[2]],
plots[[3]], plots_exp[[3]],
labels = c("A", "A", "B", "B", "C", "C"),
ncol = 1, align = "v")
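Because plot_grid() returns a ggplot object instead of only drawing (unlike multiplot()), it can also be nested to get the "multiplot of multiplots" layout directly: one stacked value/exp.value pair per group, with the pairs placed side by side as with facet_grid(. ~ group). A minimal sketch, assuming cowplot and ggplot2 are installed; the data are rebuilt as in the question and the name cols is illustrative:

```r
# Data as in the question (no seed: the values themselves don't matter)
df <- data.frame(period = rep(1:10, 3),
                 group = rep(LETTERS[1:3], each = 10),
                 value = sample(100, 30, replace = TRUE))
df$exp.value <- exp(df$value)
df_split <- split(df, df$group)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("cowplot", quietly = TRUE)) {
  library(ggplot2)
  library(cowplot)
  # Inner grids: one vertical pair (value on top, exp.value below) per group
  cols <- lapply(df_split, function(d) {
    plot_grid(ggplot(d, aes(period, value)) + geom_line(),
              ggplot(d, aes(period, exp.value)) + geom_line(),
              ncol = 1, align = "v")
  })
  # Outer grid: the per-group pairs side by side, labelled A, B, C
  print(plot_grid(plotlist = cols, nrow = 1, labels = names(cols)))
}
```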
We can also write the plots to a PDF, looping over the plots and plots_exp lists so that every page holds one pair of plots. This is the better option when there are many groups:
pdf("myPlots.pdf")
for (i in seq_along(plots)) {
  # print() is needed: plots are not drawn automatically inside a loop
  print(plot_grid(plots[[i]], plots_exp[[i]], ncol = 1, align = "v"))
}
dev.off()
Another option is to prepare the data for ggplot and use facet as usual:
library(dplyr)
library(tidyr)
library(ggplot2)
gather(df, valueType, value, -c(group, period)) %>%
mutate(myGroup = paste(group, valueType)) %>%
ggplot(aes(period, value)) +
geom_line() +
facet_grid(myGroup ~ ., scales = "free_y")
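If you would rather not depend on tidyr for the reshaping step, the same long table can be built with base R by stacking the two measurement columns. A minimal sketch, reusing the question's df; the names long and p are illustrative, and the plotting step only runs when ggplot2 is installed:

```r
# Data as in the question
df <- data.frame(period = rep(1:10, 3),
                 group = rep(LETTERS[1:3], each = 10),
                 value = sample(100, 30, replace = TRUE))
df$exp.value <- exp(df$value)

# Long format: one row per (period, group, measurement type)
long <- rbind(
  data.frame(df[c("period", "group")], valueType = "value", value = df$value),
  data.frame(df[c("period", "group")], valueType = "exp.value", value = df$exp.value)
)
# Facet label combining group and measurement, e.g. "A exp.value"
long$myGroup <- paste(long$group, long$valueType)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(period, value)) +
    geom_line() +
    facet_grid(myGroup ~ ., scales = "free_y")
  print(p)
}
```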

Related

Setting per-column y axis limits with facet_grid

Using R and ggplot2, I am plotting the development over time of several variables for several groups in my sample (days of the week, to be precise). An artificial sample (in long format, suitable for plotting) is:
library(tidyverse)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>% ggplot(mapping = aes(x = x, y = values)) + geom_line() + facet_grid(groups2 ~ groups1)
which gives a 7-by-2 grid of line plots (rows: groups2, columns: groups1).
In this example, the first variable (left column) has an unbounded range, while the second variable (right column) is weakly positive.
I would like to reflect this in my plot by allowing the Y axes to differ across the columns in this plot, i.e. set Y axis limits separately for the two variables plotted. However, in order to allow for easy visual comparison of the different groups for each of the two variables, I would also like to have the identical Y axes within each column.
I've looked at the scales option of facet_grid(), but it does not seem to be able to do what I want. Specifically,
passing scales = "free_y" allows the Y axes to vary across rows, while
passing scales = "free_x" allows the X axes to vary across columns, but
there is no option to allow the Y axes to vary across columns (nor, presumably, the X axes across rows).
As usual, my attempts to find a solution have yielded nothing. Thank you very much for your help!
I think the easiest way would be to create one plot per facet column and combine them with something like {patchwork}. To keep the facet look, you can still add a faceting layer to each plot.
library(tidyverse)
library(patchwork)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
set.seed(42) ## always better to set a seed before using random functions
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- tibble(x, groups1, groups2, values)
data %>%
  group_split(groups1) %>%
  map(~ ggplot(.x, aes(x = x, y = values)) +
        geom_line() +
        facet_grid(groups2 ~ groups1)) %>%
  wrap_plots()
Created on 2023-01-11 with reprex v2.0.2
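If adding patchwork is not an option, a different, commonly used workaround stays within a single ggplot: facet_wrap() with scales = "free_y" gives every panel its own y scale, and an invisible geom_blank() layer holding each column's minimum and maximum stretches every panel in that column to the same range. This is a swapped-in technique, not the answer's method; the helper objects lo, hi and blank are illustrative:

```r
set.seed(42)
groups1 <- rep(1:2, each = 7 * 100)
groups2 <- rep(rep(1:7, times = 2), each = 100)
x <- rep(1:100, times = 14)
values <- c(rnorm(n = 700), rgamma(n = 700, shape = 2))
data <- data.frame(x, groups1, groups2, values)

# Range of values within each column (each level of groups1)
lo <- tapply(data$values, data$groups1, min)
hi <- tapply(data$values, data$groups1, max)

# Invisible points: every panel receives its column's min and max
blank <- expand.grid(groups1 = 1:2, groups2 = 1:7, bound = c("lo", "hi"))
key <- as.character(blank$groups1)
blank$values <- ifelse(blank$bound == "lo", lo[key], hi[key])
blank$x <- 1  # any x inside the data range will do

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x = x, y = values)) +
    geom_line() +
    geom_blank(data = blank) +
    # Free y per panel; the blank layer equalises the panels of each column.
    # ncol = 2 with groups2 first puts groups2 in rows and groups1 in columns.
    facet_wrap(~ groups2 + groups1, ncol = 2, scales = "free_y")
  print(p)
}
```

The trade-off is that facet_wrap() draws a strip and a y axis on every panel instead of the grid-style strips of facet_grid().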

Transform the data and create a histogram with facets in R

I have 400 columns with dynamic names (t_namelist1_namelist2). There are 20 names in each of namelist1 and namelist2. I need to create histograms in a 20-by-20 facet grid, with labels:
along row - namelist1
along col - namelist2
Can someone please show how to reshape the data with pivot_longer(), using "_" to separate the three name parts t, namelist1 and namelist2?
In the sample problem, I have a tibble with 4 columns and want 4 individual histograms in a 2-by-2 facet grid, with labels:
along row - a and b
along col - x and y
Thanks
library(tidyverse)
t_a_x <- rnorm(100)
t_b_x <- rnorm(100)
t_a_y <- rnorm(100)
t_b_y <- rnorm(100)
tbl <- tibble(t_a_x, t_a_y, t_b_x, t_b_y)
# create a histogram in 2 by 2 facets with labels -
# along row - a and b
# along col - x and y
The first chunk of code below rearranges the example data into a three-column data frame, each column corresponding to either "ab", "xy", or the value ("t"); basically, it separates the original column names at "_". Then you can plot and facet based on "xy" and "ab".
# Rearrange table by separating the column names by "_" using pivot_longer()
tbl_formatted <- tbl %>% pivot_longer(everything(),
names_to = c(".value", "ab", "xy"),
names_sep = c("_")
)
# Plot: one histogram per combination (rows: ab, columns: xy)
tbl_formatted %>% ggplot(aes(x = t)) +
geom_histogram(bins = 20) +
facet_grid(ab ~ xy)
This is a very basic version of the plot; you can customize it with colors and more.
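The same split can be done in base R from the column names alone, which also shows why the approach scales to the 400-column case: nothing depends on how many names there are. A minimal sketch, assuming every column name has exactly two underscores (no "_" inside the namelist1 or namelist2 entries; the pivot_longer() call above relies on the same assumption). The names parts and long are illustrative:

```r
set.seed(1)
tbl <- data.frame(t_a_x = rnorm(100), t_a_y = rnorm(100),
                  t_b_x = rnorm(100), t_b_y = rnorm(100))

# Split every column name at "_" into its three parts: "t", row label, column label
parts <- do.call(rbind, strsplit(names(tbl), "_", fixed = TRUE))

# One row per observation; each label is repeated once per value in its column
long <- data.frame(
  ab = rep(parts[, 2], each = nrow(tbl)),
  xy = rep(parts[, 3], each = nrow(tbl)),
  t  = unlist(tbl, use.names = FALSE)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(long, aes(x = t)) +
          geom_histogram(bins = 20) +
          facet_grid(ab ~ xy))
}
```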
To get this plot, you need to rearrange your data frame into long format:
library(tidyverse)
tbl <- tibble(value = c(t_a_x, t_b_x, t_a_y, t_b_y),
lab1 = rep(c("a", "b", "a", "b"), each = 100),
lab2 = rep(c("x", "x", "y", "y"), each = 100))
ggplot(tbl) +
geom_histogram(aes(value), binwidth = 0.5) +
facet_grid(lab1~lab2)

Plotting multiple box plots as a single graph in R

I am trying to plot multiple box plots in a single graph. The data are from a Wilcoxon test I ran. It should look like this (figure not available):
I have four or five questions, and I want to plot the respondents' scores for two sets as box plots, for every question (two groups per question).
I am thinking of using ggplot2. My data look like this:
q1o <- c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4)
q1s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4)
q2o <- c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4)
q2s <- c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
....
....
q1 means question 1 and q2 means question 2. I would also like to control how these box plots are arranged, e.g. in one row or in two rows.
This should get you started:
Unfortunately you don't provide a minimal example with sample data, so I will generate some random sample data.
# Generate sample data
set.seed(2017)
df <- cbind.data.frame(
  value = rnorm(1000),
  Label = sample(c("Good", "Bad"), 1000, replace = TRUE),
  variable = sample(paste0("F", 5:11), 1000, replace = TRUE))
# ggplot
library(tidyverse)
df %>%
  mutate(variable = factor(variable, levels = paste0("F", 5:11))) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
You can specify the number of columns and rows in your 2d panel layout through arguments ncol and nrow, respectively, of facet_wrap. Many more details and examples can be found if you follow ?geom_boxplot and ?facet_wrap.
Update 1
A boxplot based on your sample data doesn't make too much sense, because your data are not continuous. But ignoring that, you could do the following:
df <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4))
df %>%
  gather(key, value, 1:4) %>%
  mutate(
    variable = ifelse(grepl("q1", key), "F1", "F2"),
    Label = ifelse(grepl("o$", key), "Bad", "Good")) %>%
  ggplot(aes(variable, value, fill = Label)) +
  geom_boxplot(position = position_dodge()) +
  facet_wrap(~ variable, ncol = 3, scales = "free")
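For the question's own data, the reshaping can also be done in base R, keeping the original o/s suffix instead of renaming it. The nrow argument of facet_wrap() then controls the one-row versus two-row layout asked about. A sketch; the names scores, long, question and set are illustrative, and the regular expressions assume column names of the form q<number><one letter>:

```r
# Scores from the question (sets o and s for questions 1 and 2)
scores <- data.frame(
  q1o = c(4,4,5,4,4,4,4,5,4,5,4,4,5,4,4,4,5,5,5,5,5,5,5,5,5,3,4,4,3,4),
  q1s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,5,4,4),
  q2o = c(3,3,3,4,3,4,4,3,3,3,4,4,3,4,3,3,4,3,3,3,3,4,4,4,4,3,3,3,3,4),
  q2s = c(5,4,4,5,5,5,5,5,4,5,4,4,5,4,5,5,5,5,5,5,5,5,5,5,5,5,4,3,4,4)
)

# Long format: one row per answer, labelled with its original column name
long <- data.frame(
  key   = rep(names(scores), each = nrow(scores)),
  value = unlist(scores, use.names = FALSE)
)
# Split the column name into question ("q1") and set suffix ("o" or "s")
long$question <- sub("^(q[0-9]+)[a-z]$", "\\1", long$key)
long$set      <- sub("^q[0-9]+([a-z])$", "\\1", long$key)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = set, y = value, fill = set)) +
    geom_boxplot() +
    # nrow = 1 puts all questions in one row; nrow = 2 would use two rows
    facet_wrap(~ question, nrow = 1)
  print(p)
}
```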
Update 2
One way of visualising discrete data would be a mosaic plot. Here df2 is the long-format data from Update 1 (columns value, variable and Label):
mosaicplot(table(df2));
The plot shows the counts of each value (as filled rectangles) per variable and Label. See ?mosaicplot for details.

ggplot boxplot: How to order x axis according to a third variable?

I have a simple data frame with three columns: ST_CODE (a factor), VALUE (continuous) and HEIGHT (continuous).
I want a VALUE boxplot for each ST_CODE, but I want the order on the x axis to be determined by the ascending order of HEIGHT.
This is the code:
ggplot(ozone, aes(x = ST_CODE, y = VALUE)) +
geom_boxplot(notch=TRUE)
Ordering ozone inside the ggplot() call with ozone[order(ozone$HEIGHT),] had no effect, because the x-axis order is determined by the factor levels of ST_CODE, not by the row order. What should I do?
Here's the dataset: https://www.dropbox.com/s/kf0jcv50oaa5my9/ozone_example.csv?dl=0
I have found this question, but I didn't really get it: Rearrange x axis according to a variable in ggplot
The solution should be to order the levels of the factor variable ST_CODE according to the HEIGHT column.
Until you provide example data this is my best guess :-)
Edit 1: I have added read.csv() to read your data, and it seems to work. To make the result easier to check, I have used only the first 1000 rows, which contain only three different ST_CODEs.
library(ggplot2)
# example data
# data <- data.frame( ST_CODE = rep(c("A", "B", "C"), 2), VALUE = rep(3:1, 2), HEIGHT = rep(c(2, 1, 3), 2))
# data
# Your data
data <- read.csv("ozone_example.csv")
data <- data[1:1000,]
table(data$ST_CODE, data$HEIGHT) # indicates how to order ST_CODEs
# plot (not sorted by HEIGHT)
ggplot(data, aes(x = ST_CODE, y = VALUE)) +
geom_boxplot(notch=TRUE)
# Plot sorted by HEIGHT by changing the factor level order
ordered.data <- data[order(data$HEIGHT),]
data$ST_CODE <- factor(data$ST_CODE, levels = unique(ordered.data$ST_CODE))
ggplot(data, aes(x = ST_CODE, y = VALUE)) +
geom_boxplot(notch=TRUE)
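A shorter way to get the same level order is base R's reorder(), which sorts a factor's levels by a summary of another variable. A minimal sketch using the example data from the commented-out line above; FUN = mean is an assumption that only matters if a station can have more than one HEIGHT:

```r
# Example data from the commented-out line in the answer above
data <- data.frame(ST_CODE = rep(c("A", "B", "C"), 2),
                   VALUE = rep(3:1, 2),
                   HEIGHT = rep(c(2, 1, 3), 2))

# Levels sorted by the mean HEIGHT of each ST_CODE, ascending
data$ST_CODE <- reorder(factor(data$ST_CODE), data$HEIGHT, FUN = mean)
levels(data$ST_CODE)
# [1] "B" "A" "C"

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(data, aes(x = ST_CODE, y = VALUE)) + geom_boxplot())
}
```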

How to add a trendline to a boxplot of counts(y axis) and ids(x axis) when x axis is ordered

df1 <- data.frame(a=c(1,4,7),
b=c(3, 5, 6),
c=c(1, 1, 4),
d=c(2 ,6 ,3))
df2<-data.frame(id=c("a","f","f","b","b","c","c","c","d","d"),
var=c(12,20,15,18,10,30,5,8,5,5))
mediorder <- with(df2, reorder(id, -var, median))
boxplot(var~mediorder, data = df2)
fc = levels(as.factor(mediorder))
ndf1= df1[,intersect(fc, colnames(df1))]
ln<-lm( #confused here
boxplot(ndf1)
abline(ln)
I have the above boxplot (ndf1) with an x-axis ordered according to medians from another data frame, and I would like to add a trendline to it.
I am confused since it doesn't have an x and y variable to refer to, just columns with counts. Also the ordering is causing me problems.
EDITED for clarification...
I am building on the question here: How to match an ordered list (e.g., levels(as.factor(x)) ) to another dataframe in which only some columns match?
All I would like to do is fit a trend line to ndf1
Something like this should do. It's fairly easy using ggplot2. However, your data/question are a bit confusing, e.g. some factors have very few data points (a has one; d has two identical values). Is this what you want?
df2$id <- factor(df2$id, levels = levels(mediorder))
library(ggplot2)
ggplot(data = df2, aes(x = id, y = var)) + geom_boxplot() +
geom_smooth(method = "lm", aes(group = 1), se = FALSE)
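The ggplot answer fits the line to df2; to fit it to ndf1 as asked, note that boxplot() draws the columns at x positions 1, 2, ..., so a trend line is a regression of the counts on their column's position. A minimal base-R sketch using the question's data; the names long, count and pos are illustrative:

```r
df1 <- data.frame(a = c(1, 4, 7), b = c(3, 5, 6), c = c(1, 1, 4), d = c(2, 6, 3))
df2 <- data.frame(id = c("a", "f", "f", "b", "b", "c", "c", "c", "d", "d"),
                  var = c(12, 20, 15, 18, 10, 30, 5, 8, 5, 5))

# Column order: decreasing median of var in df2, restricted to df1's columns
mediorder <- with(df2, reorder(id, -var, median))
fc <- levels(mediorder)
ndf1 <- df1[, intersect(fc, colnames(df1))]

# Long form: each count paired with the x position of its box (1, 2, ...)
long <- data.frame(count = unlist(ndf1, use.names = FALSE),
                   pos = rep(seq_along(ndf1), each = nrow(ndf1)))
ln <- lm(count ~ pos, data = long)  # slope is -0.5 for this data

boxplot(ndf1)  # boxes are drawn at x = 1, 2, 3, 4
abline(ln)     # so the fitted line lines up with the boxes
```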
