Charts using ggplot() to apply geom_text() in R

While learning to plot charts in R, I am using the Australian AIDS Survival Data.
To show survival by gender, I plot two charts with this code:
library(ggplot2)
data <- read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Aids2.csv")
# Chart 1: share of each status within each sex
ggplot(data) +
  geom_bar(aes(sex, fill = as.factor(status)), position = "fill") +
  scale_y_continuous(labels = scales::percent)
# Chart 2: counts per status, stacked by sex
ggplot(data) +
  geom_bar(aes(as.factor(status), fill = sex))
Here are the charts.
Now I want to add the values (counts and percentages) inside the bars.
geom_text() should do this. I searched for references and tried different combinations of the geom_text() aesthetics (x, y, label), but the labels are not displayed properly.
Wrong code:
geom_text(aes(as.factor(status), y = sex, label = sex))
How can I do this?

I found it easiest to summarise the data outside of ggplot and then it became relatively simple.
library(tidyverse)
# Count each sex/status combination, then compute the share within each sex
data2 <- data %>%
  group_by(sex, status) %>%
  summarise(n = n()) %>%
  mutate(percent = n / sum(n) * 100)
# Stacked percentages with labels centred in each segment
ggplot(data2, aes(sex, percent, group = status)) +
  geom_col(aes(fill = status)) +
  geom_text(aes(label = round(percent, 1)), position = position_stack(vjust = 0.5))
# Stacked counts with labels centred in each segment
ggplot(data2, aes(status, n, group = sex)) +
  geom_col(aes(fill = sex)) +
  geom_text(aes(label = n), position = position_stack(vjust = 0.5))
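Since the question asks for both counts and percentages inside the bars, one label can carry both. The following is a minimal sketch, assuming the MASS package is installed (it ships the same Aids2 data as the CSV above). The names aids, label_data, share and label are illustrative and do not come from the original answer.

```r
library(dplyr)
library(ggplot2)

# Assumption: MASS::Aids2 is used instead of the CSV so the example runs offline
aids <- MASS::Aids2

# One row per sex/status combination, with the share computed within each sex
label_data <- aids %>%
  count(sex, status) %>%
  group_by(sex) %>%
  mutate(
    share = n / sum(n),
    label = paste0(n, " (", scales::percent(share, accuracy = 0.1), ")")
  ) %>%
  ungroup()

# Stacked shares, each segment labelled with "count (percent)" at its centre
p <- ggplot(label_data, aes(sex, share, fill = status)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)

print(p)
```

Because geom_col() and geom_text() both stack the same share values, position_stack(vjust = 0.5) places each label in the middle of its own segment.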

Related

Customize ggplot2 legend with different variables

I have the following data about American and German teenagers' coding skills. I can easily display their bar plots, but I need to present the total number of teenagers from each country as well.
DF <- data.frame(code = rep(c("A","B","C"), each = 2),
Freq = c(441,121,700,866,45,95),
Country = rep(c("USA","Germany"),3),
Total = rep(c(1186,1082),3))
ggplot(DF, aes(code, Freq, fill = code)) + geom_bar(stat = "identity", alpha = 0.7) +
facet_wrap(~Country, scales = "free") +
theme_bw() +
theme(legend.position="none")
For example, instead of presenting the default legend for the code, I could replace it with the Country and the Total. Your help is appreciated.
Here's what I would suggest:
library(dplyr); library(ggplot2)
DF %>%
  # Sum Freq within each country. Total already repeats that sum on every row,
  # so weighting by Total would count it three times.
  add_count(Country, wt = Freq) %>%
  mutate(Country_total = paste0(Country, ": Total=", n)) %>%
  ggplot(aes(code, Freq, fill = code)) + geom_bar(stat = "identity", alpha = 0.7) +
  facet_wrap(~Country_total, scales = "free") +
  theme_bw() +
  theme(legend.position="none")
Doing exactly what you're requesting takes a different approach, because the country totals would not strictly be a ggplot2 legend (a legend explains how a variable is mapped to one of the plot's aesthetics). They would instead be a table or annotation displayed alongside the plot. This could be generated separately and added to the figure using the patchwork or grid packages.
For instance:
library(ggplot2); library(dplyr); library(patchwork); library(gridExtra)
ggplot(DF, aes(code, Freq, fill = code)) + geom_bar(stat = "identity", alpha = 0.7) +
  facet_wrap(~Country, scales = "free") +
  theme_bw() +
  theme(legend.position="none") +
  # Table of per-country totals (sum of Freq), placed to the right of the plot
  tableGrob(count(DF, Country, wt = Freq, name = "Total")) +
  plot_layout(widths = c(2,1))
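Another way to get the totals into the facet titles without adding a column is to pass a lookup vector to the labeller. This is only a sketch under that assumption; totals and facet_labels are hypothetical names, and the totals are recomputed from Freq rather than read from the Total column.

```r
library(ggplot2)

DF <- data.frame(code = rep(c("A", "B", "C"), each = 2),
                 Freq = c(441, 121, 700, 866, 45, 95),
                 Country = rep(c("USA", "Germany"), 3),
                 Total = rep(c(1186, 1082), 3))

# Per-country totals computed from the frequencies
totals <- tapply(DF$Freq, DF$Country, sum)

# Named character vector: names are the facet values, values are the titles
facet_labels <- setNames(paste0(names(totals), ": Total=", totals), names(totals))

p <- ggplot(DF, aes(code, Freq, fill = code)) +
  geom_col(alpha = 0.7) +
  facet_wrap(~Country, scales = "free", labeller = as_labeller(facet_labels)) +
  theme_bw() +
  theme(legend.position = "none")

print(p)
```

The data stays untouched this way, and the same lookup vector can be reused for other plots that facet by Country.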

R Stacked percentage bar plot with two factor variables - How to label the % inside the plot, without counting the NA?

I am trying to plot two factor variables and label the results with percentages inside the plot.
I have already tried code recommended in previous questions, but I can't fix the percentages in the labels.
This is my code:
library(dplyr)
library(ggplot2)
data2 <- data %>% group_by(anoletivo_cat) %>%
count(anoletivo_cat, qsd_distrito_nascimento_rec) %>%
mutate(pct = n / sum(n), pct_label = scales::percent(pct))
ggplot(data2[!is.na(data2$qsd_distrito_nascimento_rec),], aes(x= anoletivo_cat, fill = qsd_distrito_nascimento_rec, y = pct)) +
geom_bar(position = "fill", stat="identity") +
geom_text(aes(label = paste(pct_label), y = pct),
position = position_fill(vjust = 0.5)) +
scale_y_continuous(labels = scales::percent)
And this is the plot I'm getting:
As you can see, my labels include the NA rows in the percentage calculation (which is why the labels in each bar do not sum to 100%, as they should). So my question is: how can I label the percentages inside the plot without counting the NA values?
I have already omitted them from the plot, so the bar heights differ from the percentages shown in the labels.
Thank you!
You could try filtering out NAs up front, such as:
library(dplyr)
library(ggplot2)
data2 <- data %>%
  filter(!is.na(qsd_distrito_nascimento_rec)) %>%
  group_by(anoletivo_cat) %>%
  count(anoletivo_cat, qsd_distrito_nascimento_rec) %>%
  mutate(pct = n / sum(n), pct_label = scales::percent(pct))
ggplot(data2, aes(x= anoletivo_cat, fill = qsd_distrito_nascimento_rec, y = pct)) +
  geom_bar(position = "fill", stat="identity") +
  geom_text(aes(label = paste(pct_label), y = pct),
            position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
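The order of the filter matters because the denominator sum(n) is computed from whatever rows are present at that point. A small sketch with made-up data illustrates the difference; the toy data frame and the helper share_by_year() are hypothetical, and only the column names are taken from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's data frame
toy <- data.frame(
  anoletivo_cat = c("2019", "2019", "2019", "2020", "2020", "2020", "2020"),
  qsd_distrito_nascimento_rec = c("Lisboa", "Porto", NA, "Lisboa", "Lisboa", "Porto", NA)
)

# Share of each district within each year, using the rows it is given
share_by_year <- function(df) {
  df %>%
    count(anoletivo_cat, qsd_distrito_nascimento_rec) %>%
    group_by(anoletivo_cat) %>%
    mutate(pct = n / sum(n)) %>%
    ungroup()
}

# NA rows counted in the denominator, then hidden afterwards
late <- share_by_year(toy) %>%
  filter(!is.na(qsd_distrito_nascimento_rec))

# NA rows removed before counting
early <- toy %>%
  filter(!is.na(qsd_distrito_nascimento_rec)) %>%
  share_by_year()

# Per-year sums of the label values for each approach
late %>% group_by(anoletivo_cat) %>% summarise(total = sum(pct))
early %>% group_by(anoletivo_cat) %>% summarise(total = sum(pct))
```

In the late version the per-year totals fall below 100% because the hidden NA rows still take up part of the denominator. In the early version they sum to 100%, matching the bar heights drawn with position = "fill".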

r/ggplot: compute bar plot share within group

I am using ggplot2 to make a bar plot that is grouped by one variable and reported in shares.
I would like the percentages to instead be a percentage of the grouping variable rather than a percentage of the whole data set.
For example,
library(ggplot2)
library(tidyverse)
ggplot(mtcars, aes(x = as.factor(cyl),
y = (..count..) / sum(..count..),
fill = as.factor(gear))) +
geom_bar(position = position_dodge(preserve = "single")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..)),
y= ((..count..)/sum(..count..))), stat="count") +
theme(legend.position = "none")
Produces this output:
I'd like the percentages (and bar heights) to reflect the "within cyl" proportion rather than share across the entire sample. Is this possible? Would this involve a stat argument?
As an aside, if it's possible to position the geom_text labels over the relevant bars as well, that would be ideal. Any guidance would be appreciated.
Here is one way:
library(dplyr)
library(ggplot2)
mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(prop = prop.table(n) * 100) %>%
  ggplot() + aes(cyl, prop, fill = factor(gear),
                 label = paste0(round(prop, 2), '%')) +
  geom_col(position = "dodge") +
  geom_text(position = position_dodge(width = 2), vjust = -0.5, hjust = 0.5)
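A possible variant, sketched here as an assumption rather than as part of the original answer, converts cyl to a factor so that the x axis is discrete. Bars on a discrete axis default to a width of 0.9, so giving both layers position_dodge(width = 0.9) should line the labels up with their bars. The name shares is illustrative.

```r
library(dplyr)
library(ggplot2)

# Share of each gear count within each cylinder group
shares <- mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  mutate(cyl = factor(cyl), gear = factor(gear))

# Same dodge width for bars and text so each label sits above its own bar
p <- ggplot(shares, aes(cyl, prop, fill = gear)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = scales::percent(prop, accuracy = 0.1)),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous(labels = scales::percent)

print(p)
```

Keeping prop as a fraction and formatting it with scales::percent() lets the axis and the labels share one scale.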

How to make a barplot of percentages in ggplot2

I have a data set like this:
Station;Species;
CamA;SpeciesA
CamA;SpeciesB
CamB;SpeciesA
etc...
I would like to create a stacked barplot with the camera stations on the x axis and the percentage of each species stacked within each bar. I have tried the following code:
ggplot(data=data, aes(x=Station, y=Species, fill = Species))+ geom_col(position="stack") + theme(axis.text.x =element_text(angle=90)) + labs (x="Cameras", y= NULL, fill ="Species")
And I end up with the following graph:
But clearly I don't have a percentage on the y axis, just the species name, which is, in the end, what I coded for.
How could I have the percentages on the y axis, the cameras on the x axis and the species as a fill?
Thanks !
Using mtcars as an example dataset, one approach to get a barplot of percentages is to use geom_bar with position = "fill".
library(ggplot2)
library(dplyr)
mtcars2 <- mtcars
mtcars2$cyl = factor(mtcars2$cyl)
mtcars2$gear = factor(mtcars2$gear)
# Use geom_bar with position = "fill"
ggplot(data = mtcars2, aes(x = cyl, fill = gear)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Cameras", y = NULL, fill = "Species")
A second approach would be to manually pre-compute the percentages and make use of geom_col with position="stack".
# Pre-compute percentages
mtcars2_sum <- mtcars2 %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(pct = n / sum(n))
ggplot(data = mtcars2_sum, aes(x = cyl, y = pct, fill = gear)) +
  geom_col(position = "stack") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Cameras", y = NULL, fill = "Species")
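Applied to data shaped like the question's (one row per detection, with Station and Species columns), the first approach might look like the sketch below. The cams data frame is invented for illustration, since the real data is not shown.

```r
library(ggplot2)

# Hypothetical detections: one row per sighting, as in the question's layout
cams <- data.frame(
  Station = c("CamA", "CamA", "CamB", "CamB", "CamB", "CamB"),
  Species = c("SpeciesA", "SpeciesB", "SpeciesA", "SpeciesA", "SpeciesA", "SpeciesB")
)

# Row-wise proportions: the shares the stacked bars should display
station_shares <- prop.table(table(cams$Station, cams$Species), margin = 1)

# No y aesthetic: geom_bar counts the rows and position = "fill" rescales to 100%
p <- ggplot(cams, aes(x = Station, fill = Species)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Cameras", y = NULL, fill = "Species")

print(p)
```

The main change from the code in the question is dropping y = Species, so that ggplot2 counts rows instead of placing the species names on the y axis.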

How to create scaled and faceted clustered bargraphs of a summarized dataframe in ggplot2?

I am trying to create a grid of bargraphs that show the average for different species. I am using the iris dataset for this question.
I summarised the data, melted it into long form, and tried to use facet_wrap.
library(dplyr)
library(tidyr)
library(ggplot2)
iris %>%
  group_by(Species) %>%
  summarise(M.Sepal.Length=mean(Sepal.Length),
            M.Sepal.Width=mean(Sepal.Width),
            M.Petal.Length= mean(Petal.Length),
            M.Petal.Width=mean(Petal.Width)) %>%
  gather(key = Part, value = Value, M.Sepal.Length:M.Petal.Width) %>%
  ggplot(., aes(Part, Value, group = Species, fill=Species)) +
  geom_col(position = "dodge") +
  facet_grid(cols = vars(Part))
However, the graph I am getting has x-axis labels that are strung across each facet. Additionally, the clustered bars are not centered within each facet box. Instead, they appear at the location of their respective x-axis label. I'd like to get rid of the x-axis labels, center the bars, and scale the y axis within each facet.
Here is an image of the resulting graph marked up with my expected output:
Perhaps this is what you're looking for?
The key changes are:
- Remove Part as the variable mapped to x, so that the data is plotted in the same location in every facet.
- Switch to facet_wrap so you can use scales = "free_y".
- Use labs to manually add the x title.
- Add theme to get rid of the x-axis ticks and tick labels.
library(ggplot2)
library(dplyr) # Version >= 1.0.0
library(tidyr) # provides gather()
iris %>%
  group_by(Species) %>%
  summarise(across(1:4, mean, .names = "M.{col}")) %>%
  gather(key = Part, value = Value, M.Sepal.Length:M.Petal.Width) %>%
  ggplot(., aes(x = 1, y = Value, group = Species, fill=Species)) +
  geom_col(position = "dodge") +
  facet_wrap(.~Part, nrow = 1, scales = "free_y") +
  labs(x = "Part") +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank())
I also took the liberty of switching out your manual call to summarise with the new across functionality.
Here's how you might also calculate error bars:
library(tidyr)
iris %>%
  group_by(Species) %>%
  summarise(across(1:4, list(M = mean, SE = ~ sd(.)/sqrt(length(.))),
                   .names = "{fn}_{col}")) %>%
  pivot_longer(-Species, names_to = c(".value","Part"),
               names_pattern = "([SEM]+)_(.+)") %>%
  ggplot(., aes(x = 1, y = M, group = Species, fill=Species)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = M - SE, ymax = M + SE), width = 0.5,
                position = position_dodge(0.9)) +
  facet_wrap(.~Part, nrow = 1, scales = "free_y") +
  labs(x = "Part", y = "Value") +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank())
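To see what the pivot_longer() step with the ".value" sentinel hands to ggplot, it can help to build the summary table on its own. The sketch below assumes dplyr >= 1.0.0 and tidyr >= 1.0.0. The helper se() and the name iris_summary are illustrative, and the pattern "(M|SE)_(.+)" is a slightly stricter stand-in for the one used above.

```r
library(dplyr)
library(tidyr)

# Standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

iris_summary <- iris %>%
  group_by(Species) %>%
  # Produces columns such as M_Sepal.Length and SE_Sepal.Length
  summarise(across(where(is.numeric), list(M = mean, SE = se),
                   .names = "{.fn}_{.col}")) %>%
  # ".value" turns the prefix (M or SE) into column names and the rest into Part
  pivot_longer(-Species, names_to = c(".value", "Part"),
               names_pattern = "(M|SE)_(.+)")

print(iris_summary)
```

The result has one row per Species and Part, with M and SE side by side, which is the shape geom_col() and geom_errorbar() need.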

Resources