geom_text removes geom_bar in ggplot2 (R)

I have the following R code for a stacked bar chart:
p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG)) +
  geom_bar(stat = "identity", color = "black") +
  # geom_text(aes(label = paste0(round(100 * percolumn), "%"), y = pos), size = 3) +
  scale_fill_manual(values = c("black", "#DD1E0D", "#003087", "#6fa554", "#7DABFF", "#d6d6d6"))
ggplotly(p)
This produces the expected stacked bar chart (image not included).
I would like to add labels via geom_text. However, as soon as I do so, the bars vanish:
p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG)) +
  geom_bar(stat = "identity", color = "black") +
  geom_text(aes(label = paste0(round(100 * percolumn), "%"), y = pos), size = 3) +
  scale_fill_manual(values = c("black", "#DD1E0D", "#003087", "#6fa554", "#7DABFF", "#d6d6d6"))
ggplotly(p)
Data:
Datum ZUSAMMENFASSUNG anzahl_tn percolumn pos
<date> <fct> <dbl> <dbl> <dbl>
1 2020-10-01 A 9548 0.258 2745326
2 2020-10-01 B 8213 0.222 2040286.
3 2020-10-01 C 5887 0.159 1404390.
4 2020-10-01 D 4192 0.113 932105
5 2020-10-01 E 5043 0.136 525418.
6 2020-10-01 F 4106 0.111 194945
7 2020-11-01 A 10603 0.267 3082634.
8 2020-11-01 B 9235 0.233 2099054.
9 2020-11-01 C 6108 0.154 1452656
10 2020-11-01 D 4380 0.110 1009419
Any idea what causes this?
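A possible cause, judging only from the data shown: the pos values are in the millions (for example 2745326), while the stacked anzahl_tn values for one date add up to roughly 37,000. Mapping y = pos in geom_text would therefore stretch the y-axis to about 3 million, which squeezes the bars down to a sliver so that they look as if they had vanished. A minimal sketch of one way around this, assuming the labels should sit in the middle of each segment, is to drop y = pos and let position_stack() place the text. The data frame below is a hand-typed copy of the first six rows only.

```r
library(ggplot2)

# Hand-typed copy of the rows for 2020-10-01 (pos is deliberately left out)
df <- data.frame(
  Datum = as.Date("2020-10-01"),
  ZUSAMMENFASSUNG = factor(c("A", "B", "C", "D", "E", "F")),
  anzahl_tn = c(9548, 8213, 5887, 4192, 5043, 4106),
  percolumn = c(0.258, 0.222, 0.159, 0.113, 0.136, 0.111)
)

p <- ggplot(df, aes(x = Datum, y = anzahl_tn, fill = ZUSAMMENFASSUNG)) +
  geom_col(color = "black") +
  # position_stack(vjust = 0.5) centres each label inside its own segment,
  # so the labels stay on the same scale as the bars
  geom_text(aes(label = paste0(round(100 * percolumn), "%")),
            position = position_stack(vjust = 0.5), size = 3)
```

Whether ggplotly(p) renders the stacked labels in the same way is not something this sketch checks.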

Related

Stacked Histogram per day with ggplot2

I have my data here in this googledoc
That looks like this:
# A tibble: 57 × 3
date n_sym n_rep
<date> <dbl> <dbl>
1 2020-06-01 153 63
2 2020-06-02 206 168
3 2020-06-03 192 202
4 2020-06-04 168 247
5 2020-06-05 155 211
6 2020-06-06 150 155
7 2020-06-07 100 85
8 2020-06-08 192 125
9 2020-06-09 182 195
10 2020-06-10 198 234
# … with 47 more rows
I would like to create a stacked histogram with daily bins, like something in this figure.
Where: n_sym and n_rep are counts stacked one over each other.
I can't understand how to proceed....
You could adapt the following to get the plot you want:
library(tidyverse)
library(scales)

# Reshape to long format: one row per date and count type
df1 <- df %>%
  pivot_longer(-date) %>%
  mutate(
    date = as.Date(date),
    name = ifelse(name == "n_sym", "Onset of symptoms", "Date of reporting")
  )

# Stacked bars reach the sum of both counts per day, so the y-axis has to
# extend to the largest daily total rather than the largest single value
y_max <- max(tapply(df1$value, df1$date, sum))

ggplot(df1, aes(x = date, y = value, fill = name)) +
  geom_col() +
  xlab("Onset of symptoms, alternatively date of reporting (2020)") +
  ylab("Number of reported cases") +
  scale_fill_manual(values = c("#ffc000", "#045aa0")) +
  scale_x_date(date_breaks = "1 day", labels = date_format("%d/%m")) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, y_max),
                     breaks = seq(0, y_max, 100)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(title = "")) +
  coord_fixed(ratio = .05) +
  theme(axis.title = element_text(size = 16))
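When adapting the y-axis limits, it helps to remember that a stacked bar reaches the sum of both counts for a day, not the larger of the two. The sketch below illustrates this on the first three rows of the tibble from the question (typed in by hand; the name totals is only illustrative).

```r
library(tidyr)

# First three rows of the tibble shown in the question
df <- data.frame(
  date  = as.Date(c("2020-06-01", "2020-06-02", "2020-06-03")),
  n_sym = c(153, 206, 192),
  n_rep = c(63, 168, 202)
)

# Long format: columns date, name, value
df1 <- pivot_longer(df, -date)

# Height of each stacked bar = sum of both counts on that day
totals <- aggregate(value ~ date, data = df1, FUN = sum)

max(df1$value)     # largest single count: 206
max(totals$value)  # tallest stacked bar: 394
```

An upper limit taken from the single values would cut off the tallest bars, so a limit based on the daily totals is the safer choice.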

ggplot, scale_colour_manual issues

Can anyone spot what is wrong in this code?
a <- c("Afghanistan"="darkgreen","Iraq"="red" ,"Mali"="green", "Nigeria"="purple","Senegal"="orange")
ggplot(data = full) + scale_colour_manual(values=a) +
geom_point(aes(x=Afghanistan_GDPpC, y=Afghanistan_AS), colour = "Afghanistan") +
geom_smooth(aes(x=Afghanistan_GDPpC, y=Afghanistan_AS), colour = "Afghanistan", method = "lm") +
geom_point(aes(x=Iraq_GDPpC, y=Iraq_AS), colour = "Iraq") +
geom_smooth(aes(x=Iraq_GDPpC, y=Iraq_AS), colour = "Iraq", method = "lm") +
geom_point(aes(x=Mali_GDPpC, y=Mali_AS), colour = "Mali") +
geom_smooth(aes(x=Mali_GDPpC, y=Mali_AS), colour = "Mali", method = "lm") +
geom_point(aes(x=Nigeria_GDPpC, y=Nigeria_AS), colour = "Nigeria") +
geom_smooth(aes(x=Nigeria_GDPpC, y=Nigeria_AS), colour = "Nigeria", method = "lm") +
geom_point(aes(x=Senegal_GDPpC, y=Senegal_AS), colour = "Senegal") +
geom_smooth(aes(x=Senegal_GDPpC, y=Senegal_AS), colour = "Senegal", method = "lm") +
labs (x = "Log - GDP per Capita", y = "Log - Asylum Applications - First Time", colour = "Legend") +
theme_classic()
This is the message I keep getting:
Error: Unknown colour name: Afghanistan
Here is the dataset: https://drive.google.com/file/d/1j5I6odeWxaAiJlc7dHtD-Qj42xuP-gMs/view?usp=sharing
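The error arises because colour = "Afghanistan" is set outside aes(), so ggplot2 treats it as a literal colour name instead of looking it up in scale_colour_manual. More generally, I recommend reading up on how ggplot2 and the "grammar of graphics" work (for example: https://ramnathv.github.io/pycon2014-r/visualize/ggplot2.html).
ggplot2 expects data in long format, so first reshape your data: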
library(dplyr)
library(tidyr)

full <- full %>%
  pivot_longer(cols = ends_with(c("AS", "GDPpC")),
               names_to = c("country", ".value"),
               names_sep = "_") %>%
  rename("year" = "X1")
The resulting tibble:
# A tibble: 50 x 4
year country AS GDPpC
<dbl> <chr> <dbl> <dbl>
1 2011 Mali 8.29 6.73
2 2011 Nigeria 9.32 7.82
3 2011 Senegal 7.54 7.22
4 2011 Afghanistan 9.94 6.38
5 2011 Iraq 9.43 8.71
6 2012 Mali 7.75 6.66
7 2012 Nigeria 8.56 7.91
8 2012 Senegal 7.70 7.18
9 2012 Afghanistan 9.90 6.46
10 2012 Iraq 9.30 8.83
# ... with 40 more rows
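To see what the ".value" entry in names_to does, here is a minimal sketch on a small two-country data frame. The name wide is hypothetical and the numbers are copied from the tibble above. The part of each column name before the underscore becomes the country, and the part after it becomes the name of an output column.

```r
library(tidyr)

# Hypothetical wide data: one pair of columns per country
wide <- data.frame(
  X1         = c(2011, 2012),
  Mali_AS    = c(8.29, 7.75),
  Mali_GDPpC = c(6.73, 6.66),
  Iraq_AS    = c(9.43, 9.30),
  Iraq_GDPpC = c(8.71, 8.83)
)

long <- pivot_longer(
  wide,
  cols      = ends_with(c("AS", "GDPpC")),
  names_to  = c("country", ".value"),  # prefix -> country, suffix -> column name
  names_sep = "_"
)

# One row per year and country, with AS and GDPpC as regular columns
long
```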
Then you can map colour to country and need only one set of layers:
ggplot(data = full, mapping = aes(x = GDPpC, y = AS, colour = country)) +
  geom_point() +
  scale_colour_manual(values = c("Afghanistan" = "darkgreen", "Iraq" = "red", "Mali" = "green", "Nigeria" = "purple", "Senegal" = "orange")) +
  geom_smooth(method = "lm") +
  labs(x = "Log - GDP per Capita", y = "Log - Asylum Applications - First Time", colour = "Legend") +
  theme_classic()

Get the proportions in ggplot2 (R) bar charts

Can someone provide me some hints as to what I am doing wrong in my code? Or what I need to correct to get the correct percentages? I am trying to get the proportions by manipulating my ggplot2 code. I would prefer not mutating a column. However, if I can't get ggplot2 to give me the correct proportions, I will then be open to adding columns.
Here is the reproducible data:
cat_type<-c("1", "1","2","3","1","3", "3","2","1","1","1","3","3","2","3","2","3","1","3","3","3","1","3","1","3","1","1","3","1")
country<-c("India","India","India","India","India","India","India","India","India","India","Indonesia","Russia","Indonesia","Russia","Russia","Indonesia","Indonesia","Indonesia","Indonesia","Russia","Indonesia","Russia","Indonesia","Indonesia","Russia", "Russia", "India","India","India")
bigcats<-data.frame(cat_type=cat_type,country=country)
My data gives me the following proportions (these are correct):
> table(bigcats$cat_type, bigcats$country) ## raw numbers
India Indonesia Russia
1 7 3 2
2 2 1 1
3 4 5 4
>
> 100*round(prop.table(table(bigcats$cat_type, bigcats$country),2),3) ## proportions by column total
India Indonesia Russia
1 53.8 33.3 28.6
2 15.4 11.1 14.3
3 30.8 55.6 57.1
However, my ggplot2 is giving me the incorrect proportions:
bigcats %>% ggplot(aes(x=country, y = prop.table(stat(count)), fill=cat_type, label = scales::percent(prop.table(stat(count)))))+
geom_bar(position = position_fill())+
geom_text(stat = "count", position = position_fill(vjust=0.5),colour = "white", size = 5)+
labs(y="Percent",title="Top Big Cat Populations",x="Country")+
scale_fill_discrete(name=NULL,labels=c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar"))+
scale_y_continuous(labels = scales::percent)
The issue is that prop.table(stat(count)) divides each count by the overall total instead of by the total within each country. In other words, it is equivalent to:
library(dplyr)
bigcats %>%
count(cat_type, country) %>%
mutate(pct = scales::percent(prop.table(n)))
#> cat_type country n pct
#> 1 1 India 7 24.1%
#> 2 1 Indonesia 3 10.3%
#> 3 1 Russia 2 6.9%
#> 4 2 India 2 6.9%
#> 5 2 Indonesia 1 3.4%
#> 6 2 Russia 1 3.4%
#> 7 3 India 4 13.8%
#> 8 3 Indonesia 5 17.2%
#> 9 3 Russia 4 13.8%
Making use of a helper function to reduce code duplication you could compute your desired proportions like so:
library(ggplot2)

# Divide each count by the total of its group (here: the x position, i.e. the country)
prop <- function(count, group) {
  count / tapply(count, group, sum)[group]
}

ggplot(bigcats, aes(
  x = country, y = prop(after_stat(count), after_stat(x)),
  fill = cat_type, label = scales::percent(prop(after_stat(count), after_stat(x)))
)) +
  geom_bar(position = position_fill()) +
  geom_text(stat = "count", position = position_fill(vjust = 0.5), colour = "white", size = 5) +
  labs(y = "Percent", title = "Top Big Cat Populations", x = "Country") +
  scale_fill_discrete(name = NULL, labels = c("Siberian/Bengal", "Other wild cats", "Puma/Leopard/Jaguar")) +
  scale_y_continuous(labels = scales::percent)
Created on 2021-07-28 by the reprex package (v2.0.0)
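To check what the prop() helper computes, it can be run on plain vectors outside ggplot2. In this sketch, count holds the raw counts from the table in the question (column by column) and group is a hand-made stand-in for after_stat(x), with 1 = India, 2 = Indonesia and 3 = Russia.

```r
# Same helper as in the answer above
prop <- function(count, group) {
  count / tapply(count, group, sum)[group]
}

# Counts per cat_type within each country, taken from table(bigcats$cat_type, bigcats$country)
count <- c(7, 2, 4,   # India
           3, 1, 5,   # Indonesia
           2, 1, 4)   # Russia
group <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

p <- prop(count, group)

# Percentages within each country
unname(round(100 * p, 1))
# 53.8 15.4 30.8 33.3 11.1 55.6 28.6 14.3 57.1
```

These values agree with the column-wise prop.table() output shown in the question.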

How to change behaviour of `check_overlap = TRUE`?

My data is structured as follows:
> Comparison
# A tibble: 12 x 3
round TotalShots Year
<int> <dbl> <dbl>
1 1 70 2021
2 2 68 2021
3 3 76 2021
4 4 73 2021
5 5 66 2021
6 6 70 2021
7 1 115 2020
8 2 106 2020
9 3 75 2020
10 4 73 2020
11 5 82 2020
12 6 84 2020
I can plot this in ggplot2 via:
ggplot(Comparison, aes(x = round, y = TotalShots,
colour = factor(Year), label = TotalShots)) +
geom_line() +
geom_point(size = 14) +
geom_text(colour = "black", size = 5, check_overlap = TRUE)
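However, in the plot the label at round 3 is printed as 76 and not 75. I assume this is because of check_overlap = TRUE, but the result is wrong: the 2020 point for round 3 should be labelled 75, not 76.
Is there a way to fix this?
With check_overlap = TRUE, geom_text does not draw a label that would overlap one already drawn in the same layer, so of two nearby labels only the first is shown. You can try the ggrepel package instead, which moves labels apart so that they stay readable and do not overlap.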
library(ggrepel)
library(ggplot2)
ggplot(Comparison, aes(x = round, y = TotalShots,
                       colour = factor(Year), label = TotalShots)) +
  geom_line() +
  geom_point(size = 14) +
  geom_label_repel(colour = "black", size = 5, nudge_y = 0.8)

Ordering a 2 bar plot in R

I have the data set below and have created a graph with the code that follows, as suggested in a previous question. What I want to do is order the bars by ranking rather than by team name. Is that possible in ggplot?
Team Names PLRankingsReverse Grreserve
Liverpool 20 20
Chelsea 19 19
Manchester City 15 18
Arsenal 16 17
Tottenham 18 16
Manchester United 8 15
Everton 10 14
Watford 13 13
Burnley 17 12
Southampton 9 11
WBA 11 10
Stoke 4 9
Bournemouth 12 8
Leicester 7 7
Middlesbrough 14 6
C. Palace 6 5
West Ham 1 4
Hull 3 3
Swansea 5 2
Sunderland 2 1
And here is the code:
alldata <- read.csv("premierleague.csv")
library(ggplot2)
library(reshape2)
alldata <- melt(alldata)
ggplot(alldata, aes(x = Team.Names, y= value, fill = variable), xlab="Team Names") +
geom_bar(stat="identity", width=.5, position = "dodge")
Thanks for the help!
You need to capture the desired order before melting the data frame. You can then either convert the team names to a factor with that level order inside aes(), or pass the order to the limits argument of scale_x_discrete. Note that read.csv turns the "Team Names" header into Team.Names.
Using factor:
# Team names sorted by ranking, captured before melting
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata, id.vars = "Team.Names")
ggplot(alldata, aes(x = factor(Team.Names, levels = ordr), y = value, fill = variable)) +
  labs(x = "Team Names") +
  geom_bar(stat = "identity", width = .5, position = "dodge")
Using scale_x_discrete:
ordr <- as.character(alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)])
alldata <- melt(alldata, id.vars = "Team.Names")
ggplot(alldata, aes(x = Team.Names, y = value, fill = variable)) +
  labs(x = "Team Names") +
  geom_bar(stat = "identity", width = .5, position = "dodge") +
  scale_x_discrete(limits = ordr)
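The factor approach can be tried on a few rows in isolation. This is only a sketch: the four teams are copied from the table in the question, the column name Team.Names assumes the default behaviour of read.csv, and the long format is built by hand here instead of with melt so that the example needs only ggplot2.

```r
library(ggplot2)

# Four rows copied from the table in the question
alldata <- data.frame(
  Team.Names        = c("Arsenal", "Chelsea", "Liverpool", "Tottenham"),
  PLRankingsReverse = c(16, 19, 20, 18),
  Grreserve         = c(17, 19, 20, 16)
)

# Team names sorted by ranking, highest first
ordr <- alldata$Team.Names[order(alldata$PLRankingsReverse, decreasing = TRUE)]

# Hand-built long format (stands in for melt(alldata))
long <- data.frame(
  Team.Names = rep(alldata$Team.Names, times = 2),
  variable   = rep(c("PLRankingsReverse", "Grreserve"), each = nrow(alldata)),
  value      = c(alldata$PLRankingsReverse, alldata$Grreserve)
)

# The factor levels, not the alphabet, now decide the order on the x-axis
long$Team.Names <- factor(long$Team.Names, levels = ordr)

p <- ggplot(long, aes(x = Team.Names, y = value, fill = variable)) +
  geom_col(width = 0.5, position = "dodge")
```

With these rows the bars appear in the order Liverpool, Chelsea, Tottenham, Arsenal.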

Resources