How to visualize multiple bar plots in one (or split) PDF - R

I'm using the tidyverse/ggplot2 combination to plot multiple bar plots. In one of my comparisons I would like to have up to 300 individual plots. Is there a way to make sure the plots stay legible in the PDF file, instead of looking like the attached example?
If possible, I would prefer to have all the plots in a single PDF file; if not, multiple pages would also be fine.
The command to plot the bar charts is:
common %>%
  as_tibble(rownames = "gene") %>%
  left_join(x = ., y = up[, 1:2], by = c("gene" = "ensembl_gene_id")) %>%
  pivot_longer(starts_with("S"), names_to = "sample", values_to = "counts") %>%
  left_join(groups, by = "sample") %>%
  group_by(mgi_symbol, group, cond, time) %>%
  summarize(mean_count = mean(counts)) %>%
  ggplot(aes(x = time, y = mean_count, fill = cond)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  scale_fill_manual(values = c("darkblue", "lightblue", "black")) +
  facet_wrap(~mgi_symbol, scales = "free", ncol = 5) +
  theme_bw()
I forgot to add the groups table:
groups <- tibble(
  sample = colnames(normCounts),
  group = rep(seq(1, ncol(normCounts)/3), each = 3),
  cond = rep(c("WT", "GCN2-KO", "GCN1-KO"), each = 12),
  time = rep(rep(c("0h", "1h", "4h", "8h"), each = 3), times = 3)
)
Thanks!
I also tried adding group_map, like this:
common %>%
  as_tibble(rownames = "gene") %>%
  left_join(x = ., y = up[, 1:2], by = c("gene" = "ensembl_gene_id")) %>%
  pivot_longer(starts_with("S"), names_to = "sample", values_to = "counts") %>%
  left_join(groups, by = "sample") %>%
  group_by(mgi_symbol, group, cond, time) %>%
  summarize(mean_count = mean(counts)) %>%
  group_map(function(g, ...)
    ggplot(g, aes(x = time, y = mean_count, fill = cond)) +
      geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
      scale_fill_manual(values = c("darkblue", "lightblue", "black")) +
      facet_wrap(~mgi_symbol, scales = "free", ncol = 5) +
      theme_bw()
  )
EDIT
This is what the input table looks like (after summarizing the means):
df <-
  common %>%
  as_tibble(rownames = "gene") %>%
  left_join(x = ., y = up[, 1:2], by = c("gene" = "ensembl_gene_id")) %>%
  pivot_longer(starts_with("S"), names_to = "sample", values_to = "counts") %>%
  left_join(groups, by = "sample") %>%
  group_by(mgi_symbol, group, cond, time) %>%
  summarize(mean_count = mean(counts)) %>%
  ungroup()
df
#> `summarise()` regrouping output by 'mgi_symbol', 'group', 'cond' (override with `.groups` argument)
#> # A tibble: 1,212 x 5
#>    mgi_symbol    group cond    time  mean_count
#>    <chr>         <int> <chr>   <chr>      <dbl>
#>  1 0610031O16Rik     1 WT      0h          14.4
#>  2 0610031O16Rik     2 WT      1h          30.9
#>  3 0610031O16Rik     3 WT      4h          45.5
#>  4 0610031O16Rik     4 WT      8h          56.0
#>  5 0610031O16Rik     5 GCN2-KO 0h          18.9
#>  6 0610031O16Rik     6 GCN2-KO 1h          39.4
#>  7 0610031O16Rik     7 GCN2-KO 4h          13.9
#>  8 0610031O16Rik     8 GCN2-KO 8h          13.3
#>  9 0610031O16Rik     9 GCN1-KO 0h          12.3
#> 10 0610031O16Rik    10 GCN1-KO 1h          25.3
#> # … with 1,202 more rows

Start with some dummy data. It stands in for the data you have after running left_join, pivot_longer, group_by and summarize.
library(tidyverse)
df <- tibble(
  time = 1:5,
  mean_count = 1:5,
  cond = "x"
) %>%
  expand_grid(mgi_symbol = c(letters, LETTERS))
Create a column, group, that records which page each mgi_symbol belongs on.
plots_per_page <- 20
df <-
  df %>%
  mutate(group = (dense_rank(mgi_symbol) - 1) %/% plots_per_page)
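To check the page arithmetic before plotting, the same page index can be computed in base R. This is a minimal sketch that assumes 52 hypothetical symbols (letters and LETTERS, as in the dummy data) and 20 plots per page; the names symbols, sorted_unique and page are illustrative only.

```r
# Hypothetical symbols standing in for mgi_symbol (same as the dummy data)
symbols <- c(letters, LETTERS)
plots_per_page <- 20

# Rank each unique symbol (1, 2, ...) and integer-divide by the page size,
# mirroring (dense_rank(mgi_symbol) - 1) %/% plots_per_page
sorted_unique <- sort(unique(symbols))
page <- (match(symbols, sorted_unique) - 1) %/% plots_per_page

# Number of pages needed and number of plots that land on each page
n_pages <- ceiling(length(sorted_unique) / plots_per_page)
plots_on_page <- table(page)
print(n_pages)
print(plots_on_page)
```

With 300 genes and 20 plots per page (4 rows of 5 facets with ncol = 5), the same formula gives ceiling(300 / 20) = 15 pages.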
Create all the plots with group_map.
plots <-
  df %>%
  group_by(group) %>%
  group_map(function(g, ...) {
    ggplot(g, aes(x = time, y = mean_count, fill = cond)) +
      geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
      scale_fill_manual(values = c("darkblue", "lightblue", "black")) +
      facet_wrap(~mgi_symbol, scales = "free", ncol = 5) +
      theme_bw()
  })
Save them as a multi-page PDF using ggpubr (with nrow = 1 and ncol = 1, ggarrange() puts each plot on its own page):
ggpubr::ggexport(
  ggpubr::ggarrange(plotlist = plots, nrow = 1, ncol = 1),
  filename = "plots.pdf"
)
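If you would rather not depend on ggpubr, base R's pdf() graphics device writes one page per printed plot. This is a minimal sketch: example_plots is a hypothetical stand-in for the plots list created with group_map above, and the width and height (in inches) are illustrative values to tune until 20 facets per page stay readable.

```r
library(ggplot2)

# Hypothetical stand-in for the `plots` list produced by group_map()
example_plots <- list(
  ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_bw(),
  ggplot(mtcars, aes(factor(cyl))) + geom_bar() + theme_bw()
)

# Open a PDF device; with the default onefile = TRUE every print() adds a page
out_file <- file.path(tempdir(), "plots.pdf")
pdf(out_file, width = 11, height = 8.5)
for (p in example_plots) {
  print(p)  # ggplot objects must be printed explicitly inside a loop
}
invisible(dev.off())
```

Swapping example_plots for plots gives one page per group of 20 genes. ggsave() saves a single plot per call, which is why a device loop (or ggexport) is used for a multi-page file.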

Related

Bar plot with a different factor order for each x-axis tick

I was answering this question, where @Léo wanted a bar plot with stat = "identity" and position = "identity". This causes the bars (one for every value of the fill aesthetic) to be drawn on top of each other, so some of them are hidden.
His solution was to set alpha = 0.5, but he didn't like the result, because the colors mixed differently at each x-axis tick. So I figured the solution would be a different color ordering for each x-axis tick, but I don't know how to do that in ggplot.
What I've tried:
Dummy data:
library(tidyverse)
set.seed(7)
df = tibble(
  categories = rep(c("a", "b", "c"), each = 3) %>% factor(),
  xaxis = rep(1:3, 3) %>% factor(),
  yaxis = runif(9))
The code that plotted the "original" graph shown above:
ggplot() +
  geom_bar(aes(xaxis, yaxis, fill = categories), df,
           stat = "identity", position = "identity")
My attempt: reorder the levels of categories and create a separate geom_bar for each x-axis value in a for loop:
g = ggplot()
for (x in unique(df$xaxis)) {
  df.x = df %>% filter(xaxis == x) %>% mutate(categories = fct_reorder(categories, yaxis))
  g = g + geom_bar(aes(xaxis, yaxis, fill = categories), df.x,
                   stat = "identity", position = "identity")
}
plot(g)
The levels of df.x do change to the correct order in every iteration, but the same graph as before is produced.
Below I draw a traditional dodged (partially overlapping) plot and, if I understood correctly, your desired plot, so you can compare the results:
library(tidyverse)
set.seed(7)
df = tibble(
  categories = rep(c("a", "b", "c"), each = 3) %>% factor(),
  xaxis = rep(1:3, 3) %>% factor(),
  yaxis = runif(9))
ggplot() +
  geom_bar(aes(xaxis, yaxis, fill = categories, group = categories), df, alpha = 0.8,
           stat = "identity", position = position_dodge(width = 0.3, preserve = "single"))
df <- df %>% group_by(xaxis) %>% mutate(rank = rank(-yaxis)) %>%
  pivot_wider(values_from = yaxis, names_from = rank, values_fill = 0,
              names_sort = TRUE, names_prefix = "rank")
print(df)
#> # A tibble: 9 × 5
#> # Groups:   xaxis [3]
#>   categories xaxis rank1 rank2  rank3
#>   <fct>      <fct> <dbl> <dbl>  <dbl>
#> 1 a          1     0.989 0     0
#> 2 a          2     0     0.398 0
#> 3 a          3     0     0     0.116
#> 4 b          1     0     0     0.0697
#> 5 b          2     0     0     0.244
#> 6 b          3     0.792 0     0
#> 7 c          1     0     0.340 0
#> 8 c          2     0.972 0     0
#> 9 c          3     0     0.166 0
g <- reduce(
  map(paste0("rank", 1:3),
      ~geom_bar(aes(xaxis, .data[[.x]], fill = categories), stat = "identity", position = "identity")),
  `+`, .init = ggplot(df))
g
Created on 2022-11-02 with reprex v2.0.2
EDIT
There is an easier way, thanks to Park and this post:
set.seed(7)
df = tibble(
  categories = rep(c("a", "b", "c"), each = 3) %>% factor(),
  xaxis = rep(1:3, 3) %>% factor(),
  yaxis = runif(9))
df %>% group_by(xaxis) %>% arrange(rank(-yaxis)) %>%
  ggplot() + geom_bar(aes(xaxis, yaxis, fill = categories), stat = "identity", position = "identity")
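Why the sort helps: with stat = "identity" and position = "identity", the bars of a single layer appear to be drawn in the row order of the data, so putting the tallest bar of each x-axis tick first leaves the shorter bars painted on top of it. A minimal base-R sketch of the same idea, using geom_col() (geom_bar() with stat = "identity") and order() instead of group_by() plus arrange(); df_sorted is an illustrative name.

```r
library(ggplot2)

set.seed(7)
df <- data.frame(
  categories = factor(rep(c("a", "b", "c"), each = 3)),
  xaxis = factor(rep(1:3, 3)),
  yaxis = runif(9)
)

# Within each x-axis tick, put the tallest bar first so it is drawn first
# and the shorter bars end up on top of it
df_sorted <- df[order(df$xaxis, -df$yaxis), ]

# geom_col() stacks by default, so position = "identity" is set explicitly
p <- ggplot(df_sorted) +
  geom_col(aes(xaxis, yaxis, fill = categories), position = "identity")
print(p)
```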
How about this?
df %>%
  arrange(xaxis, yaxis) %>%
  group_by(xaxis) %>%
  mutate(yaxis = yaxis - lag(yaxis, default = 0)) %>%
  ggplot() +
  geom_bar(aes(xaxis, yaxis, fill = categories),
           stat = "identity", position = "stack")

Plot multiple variables in the same bar plot

My data frame looks like this (1322 rows in total):
I'd like to make a bar plot showing the percentage of each rating of the CFS score. It should look similar to this:
With this code I can make a single bar plot for the column cfs_triage:
ggplot(data = df) +
  geom_bar(mapping = aes(x = cfs_triage, y = (..count..)/sum(..count..)))
But I can't figure out how to make one with the three variables next to each other.
Thanks in advance to anyone who can help me make this bar plot with the percentage of each rating for these three variables! (I'm not sure my explanation is very clear, but I hope it is.)
Your best bet here is to pivot your data into long format. We don't have your data, but we can reproduce a similar data set like this:
set.seed(1)
df <- data.frame(cfs_triage = sample(10, 1322, TRUE, prob = 1:10),
                 cfs_silver = sample(10, 1322, TRUE),
                 cfs_student = sample(10, 1322, TRUE, prob = 10:1))
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x })
Now the dummy data set looks a lot like yours:
head(df)
#>   cfs_triage cfs_silver cfs_student
#> 1          9         NA           1
#> 2          8          4           2
#> 3         NA          8          NA
#> 4         NA         10           9
#> 5          9          5          NA
#> 6          3          1          NA
If we pivot into long format, then we will end up with two columns: one containing the values, and one containing the column name that the value belonged to in the original data frame:
library(tidyverse)
df_long <- df %>%
  pivot_longer(everything())
head(df_long)
#> # A tibble: 6 x 2
#>   name        value
#>   <chr>       <int>
#> 1 cfs_triage      9
#> 2 cfs_silver     NA
#> 3 cfs_student     1
#> 4 cfs_triage      8
#> 5 cfs_silver      4
#> 6 cfs_student     2
This then allows us to plot with value on the x axis, and we can use name as a grouping / fill variable:
ggplot(df_long, aes(value, fill = name)) +
  geom_bar(position = 'dodge') +
  scale_fill_grey(name = NULL) +
  theme_bw(base_size = 16) +
  scale_x_continuous(breaks = 1:10)
#> Warning: Removed 900 rows containing non-finite values (`stat_count()`).
Created on 2022-11-25 with reprex v2.0.2
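The plot above shows counts. Since the question asks for percentages, one option is to compute the share of each rating within each variable before plotting. This is a minimal sketch that regenerates the same dummy data so it runs on its own; excluding the NA ratings from each variable's denominator is an assumption about how missing values should be treated, and df_pct is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same dummy data as above
set.seed(1)
df <- data.frame(cfs_triage = sample(10, 1322, TRUE, prob = 1:10),
                 cfs_silver = sample(10, 1322, TRUE),
                 cfs_student = sample(10, 1322, TRUE, prob = 10:1))
df[] <- lapply(df, function(x) { x[sample(1322, 300)] <- NA; x })

# Count each rating per variable, then divide by that variable's total
df_pct <- df %>%
  pivot_longer(everything()) %>%
  filter(!is.na(value)) %>%   # assumption: NAs are left out of the percentages
  count(name, value) %>%
  group_by(name) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

p <- ggplot(df_pct, aes(value, percent, fill = name)) +
  geom_col(position = "dodge") +
  scale_fill_grey(name = NULL) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = 1:10) +
  theme_bw(base_size = 16)
print(p)
```

Because the percentages are precomputed, geom_col() is used instead of geom_bar(), and the bars of each variable add up to 100%.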
Maybe you need something like this (the formatting was taken from @Allan Cameron's answer, many thanks!):
library(tidyverse)
library(scales)
df %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id) %>%
  group_by(id) %>%
  mutate(percent = value/sum(value, na.rm = TRUE)) %>%
  mutate(percent = ifelse(is.na(percent), 0, percent)) %>%
  mutate(my_label = str_trim(paste0(format(100 * percent, digits = 1), "%"))) %>%
  ggplot(aes(x = factor(name), y = percent, fill = factor(name), label = my_label)) +
  geom_col(position = position_dodge()) +
  geom_text(aes(label = my_label), vjust = -1) +
  facet_wrap(. ~ id, nrow = 1, strip.position = "bottom") +
  scale_fill_grey(name = NULL) +
  scale_y_continuous(labels = scales::percent) +
  theme_bw(base_size = 16) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

How do I use summarise properly in R for this simple analysis?

Haven't used RStudio in a while, so I am quite rusty.
I want to create a bar chart showing the countries shipping the most freight weight in ascending order.
I have made this simple script that does the job:
df_new %>%
  filter(!is.na(Freight_weight)) %>%
  filter(!is.na(origin_name)) %>%
  select(origin_name, Freight_weight) %>%
  ggplot(aes(x = reorder(origin_name, Freight_weight, FUN = sum), y = Freight_weight)) +
  geom_col() +
  labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
However, when I try to do more with it, like adding a top-10 filter to keep only the countries with the highest shipments, it doesn't work, because it takes the 10 highest individual shipments rather than the totals per country.
Instead, I have tried something like this:
df_new %>%
  group_by(origin_name) %>%
  summarise(n = sum(Freight_weight, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(share = n /sum(n) %>% factor() %>% fct_reorder(share)) %>%
  ggplot(aes(x = origin_name, y = n)) +
  geom_col() +
  labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
But here, I can't get the share function to work. What am I doing wrong?
Greatly appreciate your input - if I get this down I should be able to do most of the concurrent analyses!
If you want to find the top 10 countries ordered by their highest single Freight_weight, one possible solution is shown below. (Note that I have created more countries, denoted by letters, and more data.)
Hope this helps.
library(dplyr)
set.seed(123)
df_new <- structure(
  list(
    Freight_weight = runif(200, min = 1, max = 50),
    origin_name = sample(LETTERS[1:15], size = 200, replace = TRUE)
  ),
  row.names = c(NA, -200L),
  class = c("tbl_df", "tbl", "data.frame")
)
df_new %>%
  group_by(origin_name) %>%
  slice_max(order_by = Freight_weight, n = 1) %>%
  ungroup() %>%
  arrange(desc(Freight_weight)) %>%
  slice(1:10)
#> # A tibble: 10 × 2
#>    Freight_weight origin_name
#>             <dbl> <chr>
#>  1           49.7 N
#>  2           49.3 I
#>  3           49.2 J
#>  4           49.0 F
#>  5           47.9 M
#>  6           47.8 K
#>  7           47.8 E
#>  8           47.4 O
#>  9           47.1 H
#> 10           46.9 G
Created on 2022-07-06 by the reprex package (v2.0.1)
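Note that the code above ranks countries by their single largest shipment. If, as in the question, you want the countries with the largest total freight weight, a sketch is to summarise per country first and only then take the top 10; share is computed from the totals, and fct_reorder() sorts the bars in ascending order. It regenerates equivalent simulated data so it runs on its own; top_countries and total are illustrative names, and computing share relative to all countries (not only the top 10) is an assumption.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Simulated data, generated the same way as in the answer above
set.seed(123)
df_new <- data.frame(
  Freight_weight = runif(200, min = 1, max = 50),
  origin_name = sample(LETTERS[1:15], size = 200, replace = TRUE)
)

top_countries <- df_new %>%
  filter(!is.na(Freight_weight), !is.na(origin_name)) %>%
  group_by(origin_name) %>%
  summarise(total = sum(Freight_weight), .groups = "drop") %>%
  mutate(share = total / sum(total)) %>%                  # share of all freight
  slice_max(order_by = total, n = 10) %>%                 # top 10 by total
  mutate(origin_name = fct_reorder(origin_name, total))   # ascending bar order

p <- ggplot(top_countries, aes(x = origin_name, y = total)) +
  geom_col() +
  labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
print(p)
```

As for why the original share line fails: in R, %>% binds more tightly than /, so n / sum(n) %>% factor() %>% fct_reorder(share) runs the pipe on sum(n) before dividing, and share does not exist yet at that point.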

Add means to histograms by group in ggplot2

I am following this source to do histograms by group in ggplot2.
The sample data looks like this:
set.seed(3)
x1 <- rnorm(500)
x2 <- rnorm(500, mean = 3)
x <- c(x1, x2)
group <- c(rep("G1", 500), rep("G2", 500))
df <- data.frame(x, group = group)
And the code:
# install.packages("ggplot2")
library(ggplot2)
# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) +
geom_histogram(alpha = 0.5, position = "identity")
I know that adding a line like
+ geom_vline(aes(xintercept = mean(group), color = group, fill = group), col = "red")
should give me what I am looking for, but I only get a histogram with a single mean, not one mean per group:
Do you have any suggestions?
I would compute the mean in the data frame:
library(ggplot2)
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(mean_x = mean(x))
Output:
# A tibble: 1,000 × 3
# Groups:   group [2]
         x group mean_x
     <dbl> <chr>  <dbl>
 1 -0.962  G1    0.0525
 2 -0.293  G1    0.0525
 3  0.259  G1    0.0525
 4 -1.15   G1    0.0525
 5  0.196  G1    0.0525
 6  0.0301 G1    0.0525
 7  0.0854 G1    0.0525
 8  1.12   G1    0.0525
 9 -1.22   G1    0.0525
10  1.27   G1    0.0525
# … with 990 more rows
Then plot it:
library(ggplot2)
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(mean_x = mean(x)) %>%
  ggplot(aes(x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(aes(xintercept = mean_x), col = "red")
In addition to the previous suggestion, you can also store the group means separately, i.e. two values instead of 1,000 highly redundant ones:
## a 'tidy' way (one of several valid ways for a groupwise calculation):
group_means <- df %>%
  group_by(group) %>%
  summarise(group_means = mean(x, na.rm = TRUE)) %>%
  pull(group_means)
ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(xintercept = group_means)
A straightforward method without precomputation would be:
ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(xintercept = tapply(df$x, df$group, mean), col = "red")
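A variant that keeps exactly one line per group, coloured to match its histogram, is to pass a small summary data frame to geom_vline() through its data argument. This is a minimal sketch using base R's aggregate() on the sample data from the question; means_df is an illustrative name and the dashed linetype is only a presentational choice.

```r
library(ggplot2)

# Sample data from the question
set.seed(3)
x <- c(rnorm(500), rnorm(500, mean = 3))
df <- data.frame(x, group = c(rep("G1", 500), rep("G2", 500)))

# One row per group: the group label and the mean of x
means_df <- aggregate(x ~ group, data = df, FUN = mean)

p <- ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 30) +
  # geom_vline() uses its own two-row data, so it draws exactly two lines
  geom_vline(data = means_df, aes(xintercept = x, colour = group),
             linetype = "dashed")
print(p)
```

Because colour is mapped to group in the vline layer as well, each mean line takes the same colour as its histogram instead of a fixed red.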

Dygraphs in R: Plot Ribbon and mean line of different groups

I recently started working with dygraphs in R and wanted to create a ribbon line plot with it.
Currently I have the ggplot below, which displays a ribbon (for data from multiple batches over time) and its median for two groups. Here is the code for it:
ggplot(df,
       aes(x = variable, y = A, color = `[category]`, fill = `[category]`)) +
  stat_summary(geom = "ribbon", alpha = 0.35) +
  stat_summary(geom = "line", size = 0.9) +
  theme_minimal() + labs(x = "TimeStamp")
I was able to add the solid median line to the dygraph, but I can't add the ribbon. Below is the dygraph and my code for it:
df_Medians <- df %>%
  group_by(variable, `[category]`) %>%
  summarise(A = median(A[!is.na(A)]))
median <- cbind(as.ts(df_Medians$A))
dygraph(median) %>%
  dyRangeSelector()
Is there any way to plot something similar to the above ggplot with dygraphs? Thanks in advance.
See if the following serves your purpose:
ggplot code (for mean, replace median_se with mean_se in the stat_summary layers):
library(ggplot2)
ggplot(df,
       aes(x = variable, y = A, color = category, fill = category)) +
  stat_summary(geom = "ribbon", alpha = 0.35, fun.data = median_se) +
  stat_summary(geom = "line", size = 0.9, fun.data = median_se) +
  theme_minimal()
dygraph code (for mean, replace median_se with mean_se in the summarise step):
library(dplyr)
library(dygraphs)
# calculate summary statistics for each category, & spread results out such that each row
# corresponds to one position on the x-axis
df_dygraph <- df %>%
  group_by(variable, category) %>%
  summarise(data = list(median_se(A))) %>%
  ungroup() %>%
  tidyr::unnest(data) %>%
  mutate(category = as.integer(factor(category))) %>% # optional: standardizes the column
                                                      # names for summary stats
  tidyr::pivot_wider(id_cols = variable, names_from = category,
                     values_from = c(ymin, y, ymax))
> head(df_dygraph)
# A tibble: 6 x 7
  variable ymin_1 ymin_2   y_1   y_2 ymax_1 ymax_2
     <int>  <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl>
1        1  3817.  2712. 4560. 2918.  5304.  3125.
2        2  3848.  2712. 4564. 2918.  5279.  3125.
3        3  3847.  2826. 4564  2961   5281.  3096.
4        4  3722.  2827. 4331  2962.  4940.  3098.
5        5  3833.  2831. 4570. 2963   5306.  3095.
6        6  3835.  2831. 4572  2964   5309.  3097.
dygraph(df_dygraph, main = "Dygraph title") %>%
  dySeries(c("ymin_1", "y_1", "ymax_1"), label = "Category 1") %>%
  dySeries(c("ymin_2", "y_2", "ymax_2"), label = "Category 2") %>%
  dyRangeSelector()
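Code for median_se, a median counterpart of mean_se (it builds the result with data.frame() rather than the internal ggplot2:::new_data_frame(), which is not part of ggplot2's exported API):
median_se <- function(x) {
  x <- na.omit(x)
  se <- sqrt(var(x) / length(x))
  med <- median(x)
  data.frame(y = med,
             ymin = med - se,
             ymax = med + se)
}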
Sample data:
df <- diamonds %>%
  select(price, cut) %>%
  filter(cut %in% c("Fair", "Ideal")) %>%
  group_by(cut) %>%
  slice(1:1000) %>%
  mutate(variable = rep(seq(1, 50), times = 20)) %>%
  ungroup() %>%
  rename(A = price, category = cut)
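To see how median_se() differs from ggplot2's mean_se(), both can be applied to a small skewed vector. This is a minimal sketch; median_se() is repeated here so the block runs on its own, and v, med_summary and mean_summary are illustrative names. Note that median_se() reuses the standard error of the mean as the half-width of the band, so it is a convenience band around the median rather than a formal standard error of the median.

```r
library(ggplot2)

# Median counterpart of ggplot2::mean_se(), returning y, ymin and ymax
median_se <- function(x) {
  x <- na.omit(x)
  se <- sqrt(var(x) / length(x))   # standard error of the mean
  med <- median(x)
  data.frame(y = med, ymin = med - se, ymax = med + se)
}

# A skewed sample: one large value pulls the mean but not the median
v <- c(1, 2, 3, 4, 100, NA)

med_summary <- median_se(v)
mean_summary <- mean_se(na.omit(v))
print(med_summary)
print(mean_summary)
```

Both bands have the same half-width (about 19.5 here); only the centre moves, from 22 for the mean to 3 for the median, which is why the median ribbon is less affected by outlying batches.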
