Stacked Histogram per day with ggplot2

I have my data in this Google Doc, and it looks like this:
# A tibble: 57 × 3
date n_sym n_rep
<date> <dbl> <dbl>
1 2020-06-01 153 63
2 2020-06-02 206 168
3 2020-06-03 192 202
4 2020-06-04 168 247
5 2020-06-05 155 211
6 2020-06-06 150 155
7 2020-06-07 100 85
8 2020-06-08 192 125
9 2020-06-09 182 195
10 2020-06-10 198 234
# … with 47 more rows
I would like to create a stacked histogram with daily bins, similar to the one in this figure,
where n_sym and n_rep are counts stacked on top of each other.
I can't figure out how to proceed.

Here is an approach you could modify to achieve your desired plot:
library(tidyverse)
library(scales)
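# df: the 57-row tibble from the question (columns date, n_sym, n_rep)
df1 <- df %>%
pivot_longer(
-date
) %>%
mutate(date = as.Date(date),
name = ifelse(name=="n_sym", "Onset of symptoms", "Date of reporting"))
# The bars are stacked, so the y limit must cover the daily total n_sym + n_rep;
# max(df1$value) alone would censor (drop) every bar taller than that single value
y_max <- max(tapply(df1$value, df1$date, sum))
ggplot(df1, aes(x=date, y=value, fill=name))+
geom_col()+
xlab("Onset of symptoms, alternatively date of reporting (2020)") +
ylab("Number of reported cases") +
scale_fill_manual(values = c("#ffc000", "#045aa0"))+
scale_x_date(date_breaks = "1 day", labels = date_format("%d/%m")) +
scale_y_continuous(expand = c(0, 0), limits = c(0, y_max),
breaks=seq(0,y_max,100))+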
theme_classic() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))+
theme(legend.position="bottom")+
guides(fill=guide_legend(title=""))+
coord_fixed(ratio = .05)+
theme(axis.title = element_text(size = 16))
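The answer uses geom_col() because the data already hold one count per day and series. If the data instead had one row per case (a hypothetical layout, not what the question shows), geom_bar() would do the daily counting itself and stack the series by fill, which is closer to a literal histogram with daily bins. A minimal sketch, assuming that layout is simulated by expanding the ten rows printed in the question with tidyr::uncount():

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Only the ten rows printed in the question (the full data has 57 rows)
df <- tibble(
  date  = as.Date("2020-06-01") + 0:9,
  n_sym = c(153, 206, 192, 168, 155, 150, 100, 192, 182, 198),
  n_rep = c(63, 168, 202, 247, 211, 155, 85, 125, 195, 234)
)

# Hypothetical case-level data: one row per case, labelled by series
cases <- df %>%
  pivot_longer(-date, names_to = "name", values_to = "n") %>%
  uncount(n)

# geom_bar() counts rows per day; width = 1 (one day) makes the bars touch
p <- ggplot(cases, aes(x = date, fill = name)) +
  geom_bar(width = 1) +
  scale_x_date(date_breaks = "1 day", date_labels = "%d/%m")
```

With real case-level data you would skip the uncount() step and plot the raw records directly.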


Same y-axis scale with ggarrange

I would like to use ggarrange to combine two plots into a single one, and use the same scaling for the y-axis to make it easier to compare the values, in my case, temperature calculations.
This is my sample data:
Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39
With the following code, I get this plot:
library(ggplot2)
library(ggpubr)  # provides ggarrange()
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
plate <- ggarrange(p1, p2, align = "hv")
plate
What I would like to have is for them to share their y-axis so that I can easily compare the two temperature calculation methods. I can do this manually with + scale_y_continuous(limits = c(a,b)) where I set the appropriate values for a and b, however, I have a lot of different dataframes with different temperature ranges. Ideally, I would use the lowest value (+ some space) from both plots, and the highest value (+ some space) from both plots for the limits of both plots. Is there a way to achieve this?
My manual approach and the desired output:
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot() +
scale_y_continuous(limits = c(10, 40))
plate <- ggarrange(p1, p2, align = "hv")
plate
This is just to illustrate how you could use facets. Because you have two temperature columns, your data are in the 'wide format'. If you reshape the data to the long format, you can easily use facets, which share the same y-axis by default.
library(ggplot2)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
# Reshape
long <- tidyr::pivot_longer(
df, c("Temperature_A", "Temperature_B"),
values_to = "Temperature", names_to = "Temp_name"
)
# Example of facets
ggplot(long) +
geom_boxplot(aes(Group, Temperature)) +
facet_wrap(~ Temp_name)
Created on 2021-07-27 by the reprex package (v1.0.0)
I agree with teunbrand. Here is a slightly modified approach:
- bring the data into long format
- hide the outlier points with outlier.shape = NA
- set the y-limits with coord_cartesian(ylim = ...)
- use facet_wrap()
library(tidyverse)
df1 <- df %>%
pivot_longer(
cols = starts_with("Temperature"),
names_to = "Temperature",
values_to = "values"
)
ggplot(df1, aes(Group, values)) +
geom_boxplot(outlier.shape = NA) +
coord_cartesian(ylim = c(10, 40)) +
facet_wrap(~Temperature) +
theme_bw()
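Using coord_cartesian(ylim = ...) rather than scale_y_continuous(limits = ...) matters for boxplots: scale limits remove out-of-range observations before the box statistics are computed, whereas coord_cartesian() only zooms the view. With c(10, 40) and this sample data nothing falls outside the range, so both give the same plot. The sketch below uses deliberately narrow, illustrative limits c(20, 40) on Temperature_A to make the difference visible:

```r
library(ggplot2)

df <- data.frame(
  Group = c("A", "B", "A", "A", "B", "A", "B", "B", "A", "B"),
  Temperature_A = c(20, 21, 25, 18, 19, 22, 23, 17, 15, 28)
)

# Scale limits: values outside [20, 40] become NA and are dropped
# before stat_boxplot() computes the quartiles and median
p_scale <- ggplot(df, aes(Group, Temperature_A)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(20, 40))

# coord_cartesian(): every value feeds the statistics; only the view is clipped
p_coord <- ggplot(df, aes(Group, Temperature_A)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(20, 40))

# Median per group (A, B) under each approach; the scale version warns
# about removed rows, which is suppressed here
median_scale <- suppressWarnings(ggplot_build(p_scale))$data[[1]]$middle
median_coord <- ggplot_build(p_coord)$data[[1]]$middle
```

Here median_scale comes out as 22 and 23, while median_coord keeps the true medians of 20 and 21.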
Specifically for J.Doe, who seemed unsatisfied with the accepted answer, I'll post another answer that again ignores the ggarrange part but generalises better. It uses the {patchwork} package instead.
Notably, you can use & scale_y_continuous(...) to apply a scale to all plots in the composition, and the plot + plot syntax to combine plots.
library(ggplot2)
library(patchwork)
df <- read.table(text = " Sample Group Temperature_A Temperature_B
1 Sample01 A 20 34
2 Sample02 B 21 31
3 Sample03 A 25 35
4 Sample04 A 18 27
5 Sample05 B 19 29
6 Sample06 A 22 32
7 Sample07 B 23 33
8 Sample08 B 17 28
9 Sample09 A 15 26
10 Sample10 B 28 39", header = TRUE)
p1 <- ggplot(df, aes(Group, Temperature_A)) +
geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) +
geom_boxplot()
p1 + p2 & scale_y_continuous(limits = c(10, 40))
The same works if you have a list of plots and use wrap_plots().
plot_list <- list(p1, p2)
wrap_plots(plot_list) & scale_y_continuous(limits = c(10, 40))
Created on 2022-12-08 by the reprex package (v2.0.0)
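All of the answers above still hard-code limits = c(10, 40), while the question asks for limits derived from the data. A minimal sketch, assuming that padding both ends by 5% of the combined range is enough "space" (the helper shared_y_limits() and its pad argument are hypothetical, not part of patchwork or ggplot2):

```r
library(ggplot2)
library(patchwork)

df <- data.frame(
  Group = c("A", "B", "A", "A", "B", "A", "B", "B", "A", "B"),
  Temperature_A = c(20, 21, 25, 18, 19, 22, 23, 17, 15, 28),
  Temperature_B = c(34, 31, 35, 27, 29, 32, 33, 28, 26, 39)
)

# Hypothetical helper: overall range of the given columns,
# widened on each side by `pad` times that range
shared_y_limits <- function(data, cols, pad = 0.05) {
  lims <- range(unlist(data[cols]), na.rm = TRUE)
  lims + c(-1, 1) * pad * diff(lims)
}

lims <- shared_y_limits(df, c("Temperature_A", "Temperature_B"))

p1 <- ggplot(df, aes(Group, Temperature_A)) + geom_boxplot()
p2 <- ggplot(df, aes(Group, Temperature_B)) + geom_boxplot()

# `&` applies the same scale to every plot in the composition
plate <- p1 + p2 & scale_y_continuous(limits = lims)
```

With the sample data the limits come out at about 13.8 and 40.2. Because they span every observation, no values are dropped, and the same call works unchanged for other data frames with different temperature ranges.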

Get the proportions in ggplot2 (R) bar charts

Can someone give me some hints about what I am doing wrong in my code, or what I need to correct to get the correct percentages? I am trying to get the proportions by manipulating my ggplot2 code. I would prefer not to mutate a column. However, if I can't get ggplot2 to give me the correct proportions, I am open to adding columns.
Here is the reproducible data:
cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats<-data.frame(cat_type=cat_type,country=country)
My data gives me the following proportions (these are correct):
> table(bigcats$cat_type, bigcats$country) ## raw numbers
India Indonesia Russia
1 7 3 2
2 2 1 1
3 4 5 4
>
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
India Indonesia Russia
1 53.8 33.3 28.6
2 15.4 11.1 14.3
3 30.8 55.6 57.1
However, my ggplot2 is giving me the incorrect proportions:
bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
geom_bar(position = position_fill())+
geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
labs(y="Percent",title="Top Big Cat Populations",x="Country")+
scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
scale_y_continuous(labels = scales::percent)
The issue is that prop.table(stat(count)) divides each count by the grand total of all cats, not by the total within each country. In other words, you are effectively doing:
library(dplyr)
bigcats %>%
count(cat_type, country) %>%
mutate(pct = scales::percent(prop.table(n)))
#> cat_type country n pct
#> 1 1 India 7 24.1%
#> 2 1 Indonesia 3 10.3%
#> 3 1 Russia 2 6.9%
#> 4 2 India 2 6.9%
#> 5 2 Indonesia 1 3.4%
#> 6 2 Russia 1 3.4%
#> 7 3 India 4 13.8%
#> 8 3 Indonesia 5 17.2%
#> 9 3 Russia 4 13.8%
Using a helper function to reduce code duplication, you could compute the desired proportions like so:
library(ggplot2)
prop <- function(count, group) {
count / tapply(count, group, sum)[group]
}
ggplot(bigcats, aes(
x = country, y = prop(after_stat(count), after_stat(x)),
fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
geom_bar(position = position_fill()) +
geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
scale_y_continuous(labels = scales::percent)
Created on 2021-07-28 by the reprex package (v2.0.0)
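To see what prop() computes, it can be applied outside ggplot2 to the same per-country counts that stat_count() produces. The resulting percentages should match the prop.table(..., 2) output from the question. A base-R sketch, assuming as.integer() on the country factor stands in for after_stat(x), which inside the plot holds each country's numeric position (1, 2, 3):

```r
cat_type <- c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country <- c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats <- data.frame(cat_type = cat_type, country = country)

# Helper from the answer: divide each count by the total of its group
prop <- function(count, group) {
  count / tapply(count, group, sum)[group]
}

# One row per cat_type/country combination, as stat_count() would give
counts <- as.data.frame(table(cat_type = bigcats$cat_type,
                              country = bigcats$country))

# Percent of each country's total
counts$pct <- round(100 * prop(counts$Freq, as.integer(counts$country)), 1)
counts
```

The pct column reads 53.8, 15.4, 30.8 for India, 33.3, 11.1, 55.6 for Indonesia and 28.6, 14.3, 57.1 for Russia, the same as the column-wise proportions in the question.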

How to change behaviour of `check_overlap = TRUE`?

My data is structured as follows:
> Comparison
# A tibble: 12 x 3
round TotalShots Year
<int> <dbl> <dbl>
1 1 70 2021
2 2 68 2021
3 3 76 2021
4 4 73 2021
5 5 66 2021
6 6 70 2021
7 1 115 2020
8 2 106 2020
9 3 75 2020
10 4 73 2020
11 5 82 2020
12 6 84 2020
I can plot this in ggplot2 via:
ggplot(Comparison, aes(x = round, y = TotalShots,
colour = factor(Year), label = TotalShots)) +
geom_line() +
geom_point(size = 14) +
geom_text(colour = "black", size = 5, check_overlap = TRUE)
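However, in the plot, the label at round 3 is printed as 76 and not 75. I assume this is because of check_overlap = TRUE; however, the plot is then wrong, as the 2020 point for round 3 should have the label 75, not 76.
Is there any way to fix this?
check_overlap = TRUE does not move labels: it simply skips any label that would overlap one already drawn in the same layer. At round 3 the labels 76 (2021) and 75 (2020) sit at almost the same height, so only one of them is drawn. You can try the ggrepel package instead, which moves overlapping labels apart rather than dropping them: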
library(ggrepel)
library(ggplot2)
ggplot(Comparison, aes(x = round, y = TotalShots,
colour = factor(Year), label = TotalShots)) +
geom_line() +
geom_point(size = 14) +
geom_label_repel(colour = "black", size = 5, nudge_y = 0.8)
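If you would rather avoid an extra dependency, a deterministic alternative (an illustrative sketch, not part of the answer above) is to drop check_overlap and shift each year's labels in opposite directions. The offset of 5 and the smaller point size are arbitrary assumptions to tune for your plot:

```r
library(ggplot2)

Comparison <- data.frame(
  round = rep(1:6, times = 2),
  TotalShots = c(70, 68, 76, 73, 66, 70, 115, 106, 75, 73, 82, 84),
  Year = rep(c(2021, 2020), each = 6)
)

# Assumed offset in data units: 2020 labels above the point, 2021 below
offset <- 5

p <- ggplot(Comparison, aes(x = round, y = TotalShots,
                            colour = factor(Year), label = TotalShots)) +
  geom_line() +
  geom_point(size = 3) +
  # No check_overlap, so every label is drawn at its shifted position
  geom_text(aes(y = TotalShots + ifelse(Year == 2020, offset, -offset)),
            colour = "black", size = 5)
```

At round 3 the 75 label then sits above its point and the 76 label below its own, so both stay visible.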

Geom_Text removes geom_bar in ggplot2 R

I have the following R code for a stacked bar chart:
p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG))+
geom_bar(stat="identity", color='black') +
#geom_text(aes(label = paste0(round(100*percolumn),"%"), y = pos),size = 3)+
scale_fill_manual(values = c("black","#DD1E0D","#003087","#6fa554","#7DABFF","#d6d6d6"))
ggplotly(p)
The result is this:
[Image: the stacked bar chart]
I would like to add labels via geom_text. However, as soon as I do so, the bars vanish:
p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG))+
geom_bar(stat="identity", color='black') +
geom_text(aes(label = paste0(round(100*percolumn),"%"), y = pos),size = 3)+
scale_fill_manual(values = c("black","#DD1E0D","#003087","#6fa554","#7DABFF","#d6d6d6"))
ggplotly(p)
[Image: the plot after adding geom_text(), with the bars no longer visible]
Data:
Datum ZUSAMMENFASSUNG anzahl_tn percolumn pos
<date> <fct> <dbl> <dbl> <dbl>
1 2020-10-01 A 9548 0.258 2745326
2 2020-10-01 B 8213 0.222 2040286.
3 2020-10-01 C 5887 0.159 1404390.
4 2020-10-01 D 4192 0.113 932105
5 2020-10-01 E 5043 0.136 525418.
6 2020-10-01 F 4106 0.111 194945
7 2020-11-01 A 10603 0.267 3082634.
8 2020-11-01 B 9235 0.233 2099054.
9 2020-11-01 C 6108 0.154 1452656
10 2020-11-01 D 4380 0.110 1009419
Any idea what causes this?
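One likely cause, visible in the data: pos is in the millions (e.g. 2745326) while anzahl_tn is around 10,000, so y = pos stretches the y-axis to over 3 million and the bars shrink to a thin sliver at the bottom. A sketch that drops pos and lets position_stack(vjust = 0.5) place each label in the middle of its segment, built only from the ten rows printed above:

```r
library(ggplot2)

# Only the ten rows printed in the question are used here
df <- data.frame(
  Datum = as.Date(rep(c("2020-10-01", "2020-11-01"), times = c(6, 4))),
  ZUSAMMENFASSUNG = factor(c("A", "B", "C", "D", "E", "F", "A", "B", "C", "D")),
  anzahl_tn = c(9548, 8213, 5887, 4192, 5043, 4106, 10603, 9235, 6108, 4380),
  percolumn = c(0.258, 0.222, 0.159, 0.113, 0.136, 0.111, 0.267, 0.233, 0.154, 0.110)
)

p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG)) +
  geom_col(color = "black") +
  # No y = pos: labels are stacked on the same scale as the bars
  geom_text(aes(label = paste0(round(100 * percolumn), "%")),
            position = position_stack(vjust = 0.5), size = 3) +
  scale_fill_manual(values = c("black", "#DD1E0D", "#003087",
                               "#6fa554", "#7DABFF", "#d6d6d6"))
```

The tallest bar now tops out at the stacked total for 2020-10-01 (36989), and every label stays inside the bars.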

Add percentage labels to stacked bar chart ggplot2

I'm stuck on creating a graph in ggplot2. I'm trying to create a stacked bar chart with percentages, similar to the graph on this page, but I am struggling to add percentage labels in the bars: How to draw stacked bars in ggplot2 that show percentages based on group?
All of the answers I've found to try and add percentage labels use something similar to the code
geom_text(aes(label = label), position = position_stack(vjust = 0.5),
size = 2)
but it is not working for me.
My data looks like this:
County Group Plan1 Plan2 Plan3 Plan4 Plan5 Total
County1 Group1 2019 597 513 5342 3220 11691
County2 Group1 521 182 130 1771 731 3335
County3 Group1 592 180 126 2448 1044 4390
County4 Group1 630 266 284 2298 937 4415
County5 Group1 708 258 171 2640 1404 5181
County6 Group1 443 159 71 1580 528 2781
County7 Group1 492 187 157 1823 900 3559
County8 Group1 261 101 84 1418 357 2221
My graph without the percentages looks like this:
Code:
melt(df[df$Group=="Group1",],measure.vars = c("Plan1","Plan2","Plan3","Plan4", "Plan5"),variable.name = "Counties",value.name = "value") %>%
ggplot(aes(x=County,y=value,fill=Counties))+
geom_bar(stat = "identity",position="fill", color="black", width=0.9) +
labs(y="Percent", fill="Plan Type") + ylab("Percentage") + coord_flip() + scale_y_continuous(labels=scales::percent)
After using the geom_text() code above, it turns into this mess:
Code:
melt(df[df$Group=="Group1",],measure.vars = c("Plan1","Plan2","Plan3","Plan4", "Plan5"),variable.name = "Counties",value.name = "value") %>%
ggplot(aes(x=County,y=value,fill=Counties))+
geom_bar(stat = "identity",position="fill", color="black", width=0.9) +
labs(y="Percent", fill="Plan Type") + ylab("Percentage") + coord_flip() + scale_y_continuous(labels=scales::percent)+
geom_text(aes(label=paste0(round(value/100),"%")), position=position_stack(vjust=0.5))
Any suggestions? Any advice/guidance is greatly appreciated! Thank you!!
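Your approach did not work because the labels are built from the raw values rather than percentages, and position_stack() stacks the text at raw-value heights while position = "fill" rescales the bars to 0-1. You have to compute the percentages yourself: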
df <- read.table(text="County Group Plan1 Plan2 Plan3 Plan4 Plan5 Total
County1 Group1 2019 597 513 5342 3220 11691
County2 Group1 521 182 130 1771 731 3335
County3 Group1 592 180 126 2448 1044 4390
County4 Group1 630 266 284 2298 937 4415
County5 Group1 708 258 171 2640 1404 5181
County6 Group1 443 159 71 1580 528 2781
County7 Group1 492 187 157 1823 900 3559
County8 Group1 261 101 84 1418 357 2221", header = TRUE)
library(tidyverse)
df %>%
filter(Group == "Group1") %>%
select(-Total) %>%
gather(key = `Plan Type`, value = value, -County, -Group) %>%
group_by(County, Group) %>%
mutate(Percentage = value/sum(value)) %>%
ggplot(aes(x = County, y = Percentage, fill = `Plan Type`, label = paste0(round(Percentage*100), "%"))) +
geom_col(position = position_stack(), color = "black") +
geom_text(position = position_stack(vjust = .5)) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format())
Edit:
The code above also works for more plans and more groups, but the plot does not account for the groups. Drop the filter and add facet_wrap() to make the plot flexible with respect to the groups as well:
df %>%
# no filter on Group here, so every group gets its own facet
select(-Total) %>%
gather(key = `Plan Type`, value = value, -County, -Group) %>%
group_by(County, Group) %>%
mutate(Percentage = value/sum(value)) %>%
ggplot(aes(x = County, y = Percentage, fill = `Plan Type`, label = paste0(round(Percentage*100), "%"))) +
geom_col(position = position_stack(), color = "black") +
geom_text(position = position_stack(vjust = .5)) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format()) +
facet_wrap(~Group)
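gather() still works but is superseded in tidyr; the same reshaping can be written with pivot_longer(). A sketch of the faceted version under that substitution, which also uses scales::percent(accuracy = 1) for the labels instead of paste0(round(...)):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- read.table(text = "County Group Plan1 Plan2 Plan3 Plan4 Plan5 Total
County1 Group1 2019 597 513 5342 3220 11691
County2 Group1 521 182 130 1771 731 3335
County3 Group1 592 180 126 2448 1044 4390
County4 Group1 630 266 284 2298 937 4415
County5 Group1 708 258 171 2640 1404 5181
County6 Group1 443 159 71 1580 528 2781
County7 Group1 492 187 157 1823 900 3559
County8 Group1 261 101 84 1418 357 2221", header = TRUE)

plot_data <- df %>%
  select(-Total) %>%
  # pivot_longer() replaces the superseded gather()
  pivot_longer(starts_with("Plan"), names_to = "Plan Type", values_to = "value") %>%
  group_by(County, Group) %>%
  # Share of each plan within its county
  mutate(Percentage = value / sum(value)) %>%
  ungroup()

p <- ggplot(plot_data, aes(x = County, y = Percentage, fill = `Plan Type`,
                           label = scales::percent(Percentage, accuracy = 1))) +
  geom_col(position = position_stack(), color = "black") +
  geom_text(position = position_stack(vjust = 0.5)) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format()) +
  facet_wrap(~Group)
```

Because Percentage already sums to 1 within each county, plain position_stack() gives full-width bars, and the labels use the same scale as the bars.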
