facet grid of multiple barplots containing group means [closed] - r

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have a data frame with the variables Group (demented, nondemented), sex (M, F), Age.Group ((60,70], (70,80], (80,90], (90,100]) and MMSE (a continuous numerical variable representing a "score" on an assessment). I want to create a bar graph with a bar for each group (demented versus nondemented) and the group mean on the y-axis. I then want to create a facet grid so there are 8 separate bar charts, one for M and one for F in each age group. I can't figure out how to get a ggplot bar graph to show a different variable (the mean MMSE) on the y-axis instead of counts. Would appreciate any tips!

It sounds like your data is a bit like this:
set.seed(69)
age <- cut(sample(1:39, 500, TRUE) + 60, breaks = 10 * 6:10)
condition <- factor(sample(c("demented", "not demented"), 500, TRUE))
MMSE <- as.numeric(condition) * 10 + 10 - as.numeric(age) + 1 + rnorm(500)
MMSE <- round(MMSE)
MMSE[MMSE > 30] <- 30
sex <- sample(c("male", "female"), 500, TRUE)
df <- data.frame(age, condition, MMSE, sex)
Then you have various options to present your data. One is to have all sixteen bars in a single plot:
library(ggplot2)
library(dplyr)
df %>%
group_by(age, sex, condition) %>%
summarize(MMSE = mean(MMSE)) %>%
ggplot(aes(condition, MMSE, fill = sex)) +
geom_col(position = position_dodge(width = 1)) +
facet_grid(~age, switch = "x") +
theme_bw() +
theme(panel.spacing = unit(0, "points"),
panel.border = element_blank(),
axis.line = element_line(),
strip.background = element_blank(),
strip.placement = "outside")
Or facet by sex and condition:
df %>%
group_by(age, sex, condition) %>%
summarize(MMSE = mean(MMSE)) %>%
ggplot(aes(age, MMSE, fill = sex)) +
geom_col(position = position_dodge(width = 1)) +
facet_grid(condition~sex) +
theme_bw()
Or just by sex:
df %>%
group_by(age, sex, condition) %>%
summarize(MMSE = mean(MMSE)) %>%
ggplot(aes(age, MMSE, fill = condition)) +
geom_col(position = position_dodge(width = 1)) +
facet_grid(~sex) +
theme_bw()
Or just by condition:
df %>%
group_by(age, sex, condition) %>%
summarize(MMSE = mean(MMSE)) %>%
ggplot(aes(age, MMSE, fill = sex)) +
geom_col(position = position_dodge(width = 1)) +
facet_grid(~condition) +
theme_bw()
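Each plot above draws one bar per cell, with height equal to that cell's mean MMSE as computed by group_by() and summarize(). A minimal sketch of the same summary in base R with aggregate(), useful for checking the bar heights. The toy data frame is hypothetical (two scores per age/sex/condition cell), not the simulated data above.

```r
# Hypothetical toy data: every combination of sex, condition and age group
cells <- expand.grid(
  sex = c("female", "male"),
  condition = c("demented", "not demented"),
  age = c("(60,70]", "(70,80]"),
  stringsAsFactors = FALSE
)
# Two observations per cell: stack the grid twice and attach MMSE scores
toy <- rbind(cells, cells)
toy$MMSE <- c(20, 21, 28, 27, 18, 19, 26, 25,
              22, 23, 30, 29, 20, 21, 28, 27)

# One row per age x sex x condition cell holding the mean score;
# these are the heights geom_col() draws
means <- aggregate(MMSE ~ age + sex + condition, data = toy, FUN = mean)
print(means)
```

With the four age groups of the real data, feeding such a summary to ggplot(aes(condition, MMSE)) + geom_col() + facet_grid(sex ~ age) gives the eight panels described in the question. Alternatively, ggplot2 can compute the means itself from the raw data with stat_summary(fun = mean, geom = "bar").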


summary and plot for multiple subgroups(more columns)

I am interested in two things: 1) a summary for multiple subgroups in the same table, and 2) a dot plot for the subgroups based on the summary generated in step 1.
For example, if this is my dataset:
library(survival)
data("pbc")
I would like to generate a summary of cholesterol (chol) by sex, stage, ascites and spiders for the two treatment levels, 1 and 2:
table(pbc$trt)
  1   2 
158 154 
I can do this separately, like this:
library(Hmisc)
summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1))
summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2))
This creates two separate summaries, and two corresponding plots:
plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=1)))
plot(summary(chol ~ sex + stage + ascites + spiders, data = subset(pbc, trt=2)))
I would like the summaries to be in one table with two columns, one for trt=1 and one for trt=2:
              N    chol (trt=1)    chol (trt=2)
sex
  m          ..    .....           ....
  f          ..    .....           ....
And the plots side by side: the first plot for trt=1, the second for trt=2.
Kindly suggest how to adapt the Hmisc:::summary.formula summary function to 1) show summaries by subgroups side by side and 2) plot the summaries side by side. Thanks.
Please note that your current summaries and plots are identical. subset(pbc, trt=1) passes trt = 1 as a named argument instead of testing trt == 1, so no rows are filtered and both calls summarise the whole data set. Use subset(pbc, trt == 1), or filter(), to select each level of trt.
First, I prefer gtsummary for tables, since tbl_continuous() can produce a single table instead of you having to combine two. Second, you will likely have difficulty combining your two plots, since you're using base R plotting functions on Hmisc summary objects; even trying to save each plot to an object returns NULL. In the long run, it may be easier to recreate each plot with ggplot and combine them with cowplot::plot_grid.
library(survival)
library(Hmisc)
# create combined summary
library(gtsummary)
library(tidyverse)
data(pbc)
df <- pbc %>%
select(id, trt, chol, sex, stage, ascites, spiders) %>%
mutate(across(c(sex, stage, ascites, spiders), as.factor)) %>%
mutate(trt = factor(trt)) %>%
mutate(chol = as.numeric(chol))
dftrt1 <- df %>% filter(trt == 1)
dftrt2 <- df %>% filter(trt == 2)
df %>%
select(trt, chol, sex, stage, ascites, spiders) %>%
tbl_continuous(variable = chol,
digits = everything() ~ 2,
statistic = everything() ~ "{mean}",
label = list(sex ~ "Sex",
stage ~ "Stage",
ascites ~ "Ascites",
spiders ~ "Spiders"),
by = trt)
# create combined plot
library(cowplot)
p1 <- dftrt1 %>%
select(-trt) %>% pivot_longer(cols = -c(id, chol)) %>% group_by(name, value) %>%
summarise(chol = mean(chol, na.rm = TRUE)) %>%
ggplot(aes(x = value, y = chol, fill = factor(value))) +
geom_point() + coord_flip() +
facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
theme(panel.spacing = unit(0, "lines"),
panel.border = element_rect(fill = NA),
strip.background = element_blank(),
axis.title.y = element_blank(),
legend.position = "none",
strip.placement = "outside") +
ggtitle("trt = 1") + theme(plot.title = element_text(hjust = 0.5))
p2 <- dftrt2 %>%
select(-trt) %>% pivot_longer(cols = -c(id, chol)) %>% group_by(name, value) %>%
summarise(chol = mean(chol, na.rm = TRUE)) %>%
ggplot(aes(x = value, y = chol, fill = factor(value))) +
geom_point() + coord_flip() +
facet_wrap(~name, scales = "free_y", nrow = 4, strip.position = "top") +
theme(panel.spacing = unit(0, "lines"),
panel.border = element_rect(fill = NA),
strip.background = element_blank(),
axis.title.y = element_blank(),
legend.position = "none",
strip.placement = "outside") +
ggtitle("trt = 2") + theme(plot.title = element_text(hjust = 0.5))
plot_grid(p1, p2, ncol = 2)
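If gtsummary is not an option, a similar side-by-side table of means can be sketched in base R with tapply(), which returns one row per subgroup level and one column per treatment. The helper side_by_side_means() and the toy data are hypothetical, for illustration only; with the real data the call would be something like side_by_side_means(pbc, "chol", "trt", c("sex", "stage", "ascites", "spiders")) after library(survival).

```r
# Hypothetical helper: mean of 'value' for each level of each variable in 'vars',
# with one column per level of 'by' (e.g. one column per treatment arm)
side_by_side_means <- function(data, value, by, vars) {
  blocks <- lapply(vars, function(v) {
    # Two grouping factors: rows = levels of v, columns = levels of by
    m <- tapply(data[[value]], list(data[[v]], data[[by]]), mean, na.rm = TRUE)
    rownames(m) <- paste(v, rownames(m), sep = ": ")
    m
  })
  out <- do.call(rbind, blocks)
  colnames(out) <- paste0(value, " (", by, "=", colnames(out), ")")
  out
}

# Hypothetical toy data shaped like the relevant pbc columns
toy_pbc <- data.frame(
  trt   = c(1, 1, 1, 1, 2, 2, 2, 2),
  sex   = c("m", "f", "f", "m", "m", "f", "f", "f"),
  stage = c(1, 2, 2, 1, 1, 1, 2, 2),
  chol  = c(200, 300, 320, 220, 250, 280, NA, 360)
)
tab <- side_by_side_means(toy_pbc, value = "chol", by = "trt", vars = c("sex", "stage"))
print(tab)
```

A count column like the N in the question could be added per variable with table(data[[v]]) before binding the blocks.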

How to automatically choose a good ylim to read geom_labels in ggplot2 in R

Suppose I write the following code with the diamonds dataset:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value)), size = 6) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
which outputs the following plot:
As you can see, it is impossible to read the last digit(s) of the first category ("Ideal").
So my question is this: I know I can simply write something like coord_flip(ylim = c(0, 80000000)) and this would solve the problem; however, what could I write instead so that ggplot2 works out by itself how much space to provide in ylim for the geom_label()s to be clearly readable, without me having to set it manually?
I'm trying to create an automatic Dashboard with multiple plots such as this, but I cannot manually tune every one of those, I need an automatic mechanism and I haven't found anything regarding this on StackOverflow for geom_label() specifically.
Thanks.
Instead of positioning each label at the end of its bar, you could move it towards the middle of the bar (y = total_value/2) and adjust its position with hjust so it doesn't spill out of the plot area set by the bars.
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value), y = total_value/2), size = 6, hjust = 0.2) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
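If the labels should stay at the bar ends, another option is to derive the axis limit from the data instead of typing it. A minimal sketch with a hypothetical helper label_limits() that adds a fixed fraction of headroom past the longest bar; the totals are made up, and it assumes non-negative values such as summed prices.

```r
# Hypothetical helper: limits for the value axis that leave extra room
# past the longest bar, so labels drawn at the bar ends are not clipped.
# 'headroom' is a fraction of the largest value (0.15 = 15% extra space).
label_limits <- function(values, headroom = 0.15) {
  top <- max(values, na.rm = TRUE)
  c(0, top * (1 + headroom))
}

# Made-up totals for illustration (not the real diamonds sums)
totals <- c(Fair = 7.2e6, Good = 19.3e6, Ideal = 74.5e6)
limits <- label_limits(totals)
print(limits)
```

Because coord_flip() cannot see the piped summary, the summary would first be saved to an object (e.g. totals_df) and then passed as coord_flip(ylim = label_limits(totals_df$total_value)). A variant that needs no precomputation is scale_y_continuous(expand = expansion(mult = c(0, 0.15))). Either way the headroom is proportional to the data, while label width depends on font size, so very long labels may still need a larger fraction.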
That gives:

How to make a bar graph with multiple columns from an Excel file?

How can I make a bar graph using barplot() or ggplot() from an Excel file that has 83 columns?
I need to plot every column that has a value >0 in each row. (Each column represents a gene function, and I need to know how many functions there are in each cluster.)
I was trying this, but it didn't work:
ggplot(x, aes(x=Cluster, y=value, fill=variable)) +
geom_bar(stat="bin", position="dodge") +
theme_bw() +
ylab("Funções no cluster") +
xlab("Cluster") +
scale_fill_brewer(palette="Blues")
Link to the Excel file:
https://github.com/annabmarques/GenesCorazon/blob/master/AllclusPathwayEDIT.xlsx
What about a heatmap? A rough example:
library(dplyr)
library(tidyr)
library(ggplot2)
library(openxlsx)
data <- read.xlsx("AllclusPathwayEDIT.xlsx")
data <- data %>%
mutate(cluster_nr = row_number()) %>%
pivot_longer(cols = -c(Cluster, cluster_nr),
names_to = "observations",
values_to = "value") %>%
mutate(value = as.factor(value))
ggplot(data, aes(x = cluster_nr, y = observations, fill = value)) +
geom_tile() +
scale_fill_brewer(palette = "Blues")
Given the large number of observations, consider breaking this up into multiple charts.
It's difficult to understand exactly what you're trying to do. Is this what you're trying to achieve?
#install.packages("readxl")
library(tidyverse)
library(readxl)
read_excel("AllclusPathwayEDIT.xlsx") %>%
pivot_longer(!Cluster, names_to = "gene_counts", values_to = "count") %>%
mutate(Cluster = as.factor(Cluster)) %>%
ggplot(aes(x = Cluster, y = count, fill = gene_counts)) +
geom_bar(position="stack", stat = "identity") +
theme(legend.position = "right",
legend.key.size = unit(0.4,"line"),
legend.text = element_text(size = 7),
legend.title = element_blank()) +
guides(fill = guide_legend(ncol = 1))
ggsave(filename = "example.pdf", height = 20, width = 35, units = "cm")
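If the goal is the number of gene functions present (value > 0) in each cluster, that count can be computed before plotting. A minimal base R sketch, assuming the sheet has a Cluster column followed by one numeric column per function; the three-function toy data frame is hypothetical.

```r
# Hypothetical stand-in for the spreadsheet: one row per cluster,
# one column per gene function
clusters <- data.frame(
  Cluster = c("c1", "c2", "c3"),
  funcA   = c(0, 2, 1),
  funcB   = c(3, 0, 0),
  funcC   = c(1, 1, 0)
)

# For each row, count how many function columns have a value > 0
n_functions <- rowSums(clusters[, -1] > 0)
names(n_functions) <- clusters$Cluster

# One bar per cluster showing the number of functions present
barplot(n_functions, xlab = "Cluster", ylab = "Number of functions")
```

With the real file, read_excel("AllclusPathwayEDIT.xlsx") from readxl would replace the toy data frame.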

fill and group bar graphs by different variables

I am trying to create faceted geom_bar graphs with the following characteristics:
The proportion of each answer per question is shown
Each bar is colored according to the response
The plot is faceted by question
I seem to be able to do any two of the conditions, but not all 3.
Question:
Is there a way to facet and calculate proportions using one variable, but colour/fill based on another variable?
Code:
df <- data.frame(
Question = rep(c('A', 'B', 'C'), each = 5),
Resp = sample(c('Yes', 'No', 'Unsure', NA), 15, TRUE, c(0.3,0.3,0.3,0.1)),
stringsAsFactors = FALSE
)
# Plot 1: grouping by question to get the right proportions, but has no colour
ggplot(df, aes(x = Resp, fill = Resp)) +
stat_count(aes(y = ..prop.., group = Question)) +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~ Question)
# Plot 2: grouping by response to get colour, but has wrong proportions
ggplot(df, aes(x = Resp, fill = Resp)) +
stat_count(aes(y = ..prop.., group = Resp)) +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~ Question)
Outputs:
This is a "ggplot2-only" option:
ggplot(df, aes(x = Resp)) +
geom_bar(aes(y = ..prop.., group = Question, fill = factor(..x..)), position = "dodge") +
scale_y_continuous(labels = scales::percent_format()) +
scale_fill_discrete(name = "Response", labels = c("No", "Unsure", "Yes", "NA")) +
facet_wrap(~ Question)
One way could be to calculate the proportions and then plot.
library(dplyr)
library(ggplot2)
df %>%
count(Question, Resp) %>%
group_by(Question) %>%
mutate(n = n/sum(n) * 100) %>%
ggplot() + aes(Resp, n, fill = Resp) +
geom_col() +
facet_wrap(~Question)
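The same per-question proportions can be sketched in base R with table() and prop.table(), where margin = 1 divides each question's counts by that question's total. The fixed responses below are hypothetical, so the result is reproducible (the question samples them randomly).

```r
# Hypothetical fixed responses: five answers for each of three questions
resp_df <- data.frame(
  Question = rep(c("A", "B", "C"), each = 5),
  Resp = c("Yes", "No", "No", "Unsure", NA,
           "Yes", "Yes", "Yes", "No", "Unsure",
           "No", "No", "Unsure", "Unsure", "Unsure"),
  stringsAsFactors = FALSE
)

# Counts per Question x Resp; useNA = "ifany" keeps missing answers as their own column
counts <- table(resp_df$Question, resp_df$Resp, useNA = "ifany")

# margin = 1: proportions sum to 1 within each question (row)
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

as.data.frame(props) turns this into long data (columns Var1, Var2, Freq) that can be plotted with geom_col() and facet_wrap(~Var1).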
Plot without facet
df$n <- 1
df <- df %>% group_by(Question, Resp) %>% summarise(n = sum(n))
ggplot(df, aes(x=factor(Question), y=n, fill=Resp)) + geom_col()
Plot with facet
df <- df %>% group_by(Question, Resp) %>% summarise(n = sum(n)) %>% mutate(prop = n / sum(n))
ggplot(df, aes(x=factor(Resp), y=prop, fill=Resp)) + geom_col() + facet_wrap(~Question)

Overlaying barplot with line graphs using ggplot2

My question is similar to those posted here and here.
I am working on a graph in ggplot with one bar plot, on top of which I want to overlay multiple line graphs. For the purposes of this question, I have reproduced my code for three barplots (one that includes all years (2007-2015) and two for specific years (2007 and 2015)), but ultimately I will be overlaying data from 10 different years. The data used can be found here.
library(dplyr)
library(tidyr)
library(gridExtra)
library(ggplot2)
overallpierc<-data[(data$item=="piercing"),]
overp<-overallpierc %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p07<-data[(data$yy=="2007") & (data$item=="piercing"),]
summary(p07)
subp07<-p07 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p15<-data[(data$yy=="2015") & (data$item=="piercing"),]
subp15<-p15 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
grid.arrange(overp, subp07, subp15)
The code I have posted gives me the following figure.
What I am trying to do is plot the frequencies for females in 2007 and 2015 and males in 2007 and 2015 on top of the barplot for total frequencies (where this is also reflected in the legend). Is there a way to do that in R using ggplot2?
UPDATE: I tried using the geom_smooth and geom_line functions to add the lines to my ggplot as suggested in the comments and as other solutions to users questions, but I get the following error:
Error: Discrete value supplied to continuous scale
I created a new data frame for a subset that I would like to plot:
df<-data.frame(age=c(15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,40,50,60), val=c(0,5,13,77,70,106,62,51,46,27,46,16,22,16,14,48,21, 3,4))
And then added it to the ggplot code:
overallpierc %>%
filter(age != "15") %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) +
geom_line(data=df,aes(x=as.numeric(age),y=val),colour="blue") +
geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
Others have encountered similar issues and used as.numeric to solve the problem. However, age needs to be treated as a factor for the purposes of plotting.
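The error arises because geom_line() maps x to as.numeric(age) (continuous) while geom_col() maps x to the factor age (discrete), and a plot has only one x scale. A minimal sketch, assuming the line should share the bars' discrete axis: convert the overlay data's age to a factor with the same levels as the bars. Here the levels are taken from the line data itself as a stand-in; with the real data they would come from the bar data.

```r
# Data frame from the question: age is numeric, val is the count to draw as a line
line_df <- data.frame(
  age = c(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60),
  val = c(0, 5, 13, 77, 70, 106, 62, 51, 46, 27, 46, 16, 22, 16, 14, 48, 21, 3, 4)
)

# Stand-in for the bar chart's age levels (assumption: same ages as the bars)
age_levels <- as.character(sort(unique(line_df$age)))

# Make age discrete with those levels so both layers share one x scale
line_df$age <- factor(line_df$age, levels = age_levels)
str(line_df)
```

In the plot, geom_line(data = line_df, aes(x = age, y = val, group = 1), colour = "blue") then sits on the same axis as geom_col(). group = 1 is needed because with a discrete x each category is otherwise its own group, and geom_line() would have nothing to connect. Since the bar code drops age 15, applying the same filter to line_df would keep the two layers' categories aligned.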
Based on our discussion in the comments, let's try stacked bars and facets. I think it works but you can decide for yourself.
The stacked bar has the advantage of showing both proportions and total count in the same bar. To compare years, a facet grid places years in rows, so the eye can scan downwards to compare the same age in different years. Note that I kept age as a continuous variable here, rather than a factor.
library(dplyr)
library(ggplot2)
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_grid(yy ~ .) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
Not bad: I can see straight away, for example, an increase in both the total and the female count at age 30 over time, although the panels are perhaps a little small and crowded.
We can use a facet wrap instead of a grid to make the bars clearer, but at the expense of quick visual comparison across years.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_wrap(~yy, ncol = 2) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
One more example which does not address your question in terms of total counts or barplots - but I thought it might be of interest. This code generates a "heatmap" style of plot which is poor for quantitative comparison, but can sometimes give a quick visual impression of interesting features. I think it shows, for example, that females aged 20 in 2014 have the highest total count.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(factor(age), yy)) +
geom_tile(aes(fill = n)) +
facet_grid(sex ~ .) +
scale_fill_gradient2() +
scale_y_reverse(breaks = 2006:2015) +
labs(x = "age", y = "Year")
EDIT:
Based on further discussions in the comments, here is one way to plot age as a factor, using bars for sexes, overlaid with a line for the totals and split by year.
overallpierc %>%
count(yy, sex, age) %>%
ggplot() +
geom_col(aes(factor(age), n, fill = sex), position = "dodge") +
stat_summary(aes(factor(age), n, group = 1), fun = "sum", geom = "line") +
facet_grid(yy ~ .)
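The line drawn by stat_summary() with fun = "sum" is the total count per age within each year, summed over sex. A minimal base R sketch of that computation with aggregate(), using hypothetical counts in the shape produced by count(yy, sex, age).

```r
# Hypothetical counts: one row per year x sex x age
counts <- data.frame(
  yy  = c(2007, 2007, 2007, 2007, 2015, 2015, 2015, 2015),
  sex = c("F", "M", "F", "M", "F", "M", "F", "M"),
  age = c(20, 20, 30, 30, 20, 20, 30, 30),
  n   = c(5, 3, 2, 4, 6, 1, 7, 2)
)

# Sum over sex within each year and age: the height of the overlaid total line
totals <- aggregate(n ~ yy + age, data = counts, FUN = sum)
print(totals)
```

Precomputing the totals like this and passing them as separate data, e.g. geom_line(data = totals, aes(factor(age), n, group = 1)), should draw the same line in each facet while making the plotted values easy to inspect.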
