I am using ggplot2 to make a bar plot that is grouped by one variable and reported in shares.
I would like the percentages to instead be a percentage of the grouping variable rather than a percentage of the whole data set.
For example,
library(ggplot2)
library(tidyverse)
ggplot(mtcars, aes(x = as.factor(cyl),
y = (..count..) / sum(..count..),
fill = as.factor(gear))) +
geom_bar(position = position_dodge(preserve = "single")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..)),
y= ((..count..)/sum(..count..))), stat="count") +
theme(legend.position = "none")
Produces this output:
I'd like the percentages (and bar heights) to reflect the "within cyl" proportion rather than share across the entire sample. Is this possible? Would this involve a stat argument?
As an aside, if it's possible to similarly position the geom_text labels over the relevant bars, that would be ideal. Any guidance would be appreciated.
Here is one way:
library(dplyr)
library(ggplot2)
mtcars %>%
count(cyl, gear) %>%
group_by(cyl) %>%
mutate(prop = prop.table(n) * 100) %>%
ggplot() + aes(cyl, prop, fill = factor(gear),
label = paste0(round(prop, 2), '%')) +
geom_col(position = "dodge") +
geom_text(position = position_dodge(width = 2), vjust = -0.5, hjust = 0.5)
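Before plotting, it can help to confirm that the grouping step really produces within-cyl shares. The following is a minimal sketch on the same mtcars data; the object names `props` and `totals` are introduced here for illustration only and are not part of the answer above.

```r
library(dplyr)

# Count each cyl/gear combination, then convert the counts
# to percentages within each cyl group
props <- mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(prop = prop.table(n) * 100) %>%
  ungroup()

# Sum the percentages per cyl; each group should add up to 100
totals <- props %>%
  group_by(cyl) %>%
  summarise(total = sum(prop))

print(totals)
```

If any total differs from 100, the percentages are being computed against the wrong denominator (for example the whole data set rather than each cyl group).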
I am trying to plot two factor variables and label the results with % inside the plots.
I have already tried some of the code recommended in previous questions, but I can't solve the problem with the % in the labels.
This is my code:
library(dplyr)
library(ggplot2)
data2 <- data %>% group_by(anoletivo_cat) %>%
count(anoletivo_cat, qsd_distrito_nascimento_rec) %>%
mutate(pct = n / sum(n), pct_label = scales::percent(pct))
ggplot(data2[!is.na(data2$qsd_distrito_nascimento_rec),], aes(x= anoletivo_cat, fill = qsd_distrito_nascimento_rec, y = pct)) +
geom_bar(position = "fill", stat="identity") +
geom_text(aes(label = paste(pct_label), y = pct),
position = position_fill(vjust = 0.5)) +
scale_y_continuous(labels = scales::percent)
And this is the plot I'm getting:
As you can see, my labels include the NA data when computing the % (which is why the bar labels don't sum to 100%, as they should). So my question is: how can I label the % inside the plot without counting the NAs?
I have already omitted them from the plot, so the % shown by the bars differs from the % I am getting in the labels.
Thank you!
You could try filtering out NAs up front, such as:
library(dplyr)
library(ggplot2)
data2 <- data %>%
filter(!is.na(qsd_distrito_nascimento_rec)) %>%
group_by(anoletivo_cat) %>%
count(anoletivo_cat, qsd_distrito_nascimento_rec) %>%
mutate(pct = n / sum(n), pct_label = scales::percent(pct))
ggplot(data2, aes(x= anoletivo_cat, fill = qsd_distrito_nascimento_rec, y = pct)) +
geom_bar(position = "fill", stat="identity") +
geom_text(aes(label = paste(pct_label), y = pct),
position = position_fill(vjust = 0.5)) +
scale_y_continuous(labels = scales::percent)
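To see why the order of operations matters, here is a small sketch with hypothetical toy data (the data frame `toy` and its columns `year` and `district` are made up for illustration, standing in for the real `data`). It compares percentages computed with the NA row still in the denominator against percentages computed after filtering.

```r
library(dplyr)

# Hypothetical toy data: four rows in one group, one with a missing district
toy <- data.frame(
  year = c("A", "A", "A", "A"),
  district = c("North", "North", "South", NA)
)

# NA row still present: it counts toward sum(n), so the
# non-NA percentages cannot add up to 100%
with_na <- toy %>%
  count(year, district) %>%
  group_by(year) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# NA row removed first: the denominator only contains rows that are plotted
without_na <- toy %>%
  filter(!is.na(district)) %>%
  count(year, district) %>%
  group_by(year) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

print(with_na)
print(without_na)
```

Dropping the NA rows only at plotting time, as in the question, removes the bar segment but leaves the labels computed against the larger denominator.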
I tried this code without faceting and it works.
When I try to add counts on each bar and use facets in my plot, it breaks. I managed to get close to what I want, like this:
mtcars %>% group_by(gear, am, vs) %>% summarize(hp_sum = sum(hp), hp = hp) %>%
ggplot(aes(gear, hp_sum, fill = factor(am))) + facet_grid(.~vs) +
geom_bar(stat = 'identity', position = 'dodge', alpha = 0.5, size = 0.25) +
geom_text(aes(label=..count.., y = ..count..), stat='count', position = position_dodge(width = 0.95), size=4)
But I want the number on top of each bar. If I use y = hp_sum, I get this error:
Error: stat_count() can only have an x or y aesthetic.
Run `rlang::last_error()` to see where the error occurred.
I might have formatted the dataset the wrong way. Any ideas? Thanks!
I learned from this post that geom_text does not do counts by groups.
A solution is to do the summary beforehand:
mtcars %>% group_by(gear, am, vs) %>%
summarize(hp_sum = sum(hp), count = length(hp)) %>%
ggplot(aes(gear, hp_sum, fill = factor(am))) + facet_grid(.~vs) +
geom_bar(stat = 'identity', position = 'dodge', alpha = 0.5, size = 0.25) +
geom_text(aes(gear, hp_sum, label = count),
position = position_dodge(width = 0.95), size=4)
Be sure to group data the same way in the plot. Here x=gear, facet_grid(.~vs), fill = factor(am) are three factors putting y=hp into groups. So you should group this way: group_by(gear, am, vs). Hope this helps anyone who is struggling with this issue.
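One way to check that the grouping matches the plot is to confirm that the summary has exactly one row per bar. This is a minimal sketch, not the original answer's code: it assumes `n()` in place of `length(hp)` and adds `.groups = "drop"` so the result is ungrouped; the name `bars` is illustrative.

```r
library(dplyr)

# One row per gear/am/vs combination, i.e. one row per bar in the faceted plot
bars <- mtcars %>%
  group_by(gear, am, vs) %>%
  summarise(hp_sum = sum(hp), count = n(), .groups = "drop")

# The counts should account for every row of mtcars,
# and no gear/am/vs combination should appear twice
print(sum(bars$count) == nrow(mtcars))
print(anyDuplicated(bars[c("gear", "am", "vs")]) == 0)
```

If the second check fails, a grouping variable used in the plot (x, fill, or facet) is missing from `group_by()`.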
I have a set of data as such;
Station;Species;
CamA;SpeciesA
CamA;SpeciesB
CamB;SpeciesA
etc...
I would like to create a stacked bar plot with the camera stations on the x axis and the percentage of each species stacked within each bar. I have tried the following code:
ggplot(data=data, aes(x=Station, y=Species, fill = Species))+ geom_col(position="stack") + theme(axis.text.x =element_text(angle=90)) + labs (x="Cameras", y= NULL, fill ="Species")
And end up with the following graph;
But clearly I don't have a percentage on the y axis, just the species names, which is, after all, what I coded.
How could I have the percentages on the y axis, the cameras on the x axis and the species as a fill?
Thanks !
Using mtcars as example dataset one approach to get a barplot of percentages is to use geom_bar with position = "fill".
library(ggplot2)
library(dplyr)
mtcars2 <- mtcars
mtcars2$cyl = factor(mtcars2$cyl)
mtcars2$gear = factor(mtcars2$gear)
# Use geom_bar with position = "fill"
ggplot(data = mtcars2, aes(x = cyl, fill = gear)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent_format()) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Cameras", y = NULL, fill = "Species")
A second approach would be to manually pre-compute the percentages and make use of geom_col with position="stack".
# Pre-compute percentages
mtcars2_sum <- mtcars2 %>%
count(cyl, gear) %>%
group_by(cyl) %>%
mutate(pct = n / sum(n))
ggplot(data = mtcars2_sum, aes(x = cyl, y = pct, fill = gear)) +
geom_col(position = "stack") +
scale_y_continuous(labels = scales::percent_format()) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Cameras", y = NULL, fill = "Species")
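Applied to data shaped like the question's, the first approach might look like the sketch below. The data frame `cams` is hypothetical toy data built from the Station;Species layout shown in the question, not the asker's real file.

```r
library(ggplot2)

# Hypothetical toy data in the question's layout: one row per observation
cams <- data.frame(
  Station = c("CamA", "CamA", "CamB", "CamB", "CamB"),
  Species = c("SpeciesA", "SpeciesB", "SpeciesA", "SpeciesA", "SpeciesB")
)

# No y aesthetic: geom_bar counts rows per Station/Species,
# and position = "fill" rescales each stack to sum to 1
p <- ggplot(cams, aes(x = Station, fill = Species)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Cameras", y = NULL, fill = "Species")

p
```

The key difference from the code in the question is that Species is mapped only to fill, not to y, so the y axis is free to show the computed shares.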
I am trying to create a grid of bargraphs that show the average for different species. I am using the iris dataset for this question.
I summarised the data, melted it into long form, and tried to use facet_grid.
iris %>%
group_by(Species) %>%
summarise(M.Sepal.Length=mean(Sepal.Length),
M.Sepal.Width=mean(Sepal.Width),
M.Petal.Length= mean(Petal.Length),
M.Petal.Width=mean(Petal.Width)) %>%
gather(key = Part, value = Value, M.Sepal.Length:M.Petal.Width) %>%
ggplot(., aes(Part, Value, group = Species, fill=Species)) +
geom_col(position = "dodge") +
facet_grid(cols = vars(Part))
However, the graph I am getting has x-axis labels that are strung across each facet. Additionally, the clustered bars are not centered within each facet box. Instead they appear at the location of their respective x-axis label. I'd like to get rid of the x-axis labels, center the bars, and scale the y axis within each facet.
Here is an image of the resulting graph marked up with my expected output:
Perhaps this is what you're looking for?
The key changes are:
Remove Part as the variable mapped to x, that way the data is plotted in the same location in every facet
Switch to facet_wrap so you can use scales = "free_y"
Use labs to manually add the x title
Add theme to get rid of the x-axis ticks and tick labels.
library(ggplot2)
library(dplyr) # Version >= 1.0.0
iris %>%
group_by(Species) %>%
summarise(across(1:4, mean, .names = "M.{col}")) %>%
gather(key = Part, value = Value, M.Sepal.Length:M.Petal.Width) %>%
ggplot(., aes(x = 1, y = Value, group = Species, fill=Species)) +
geom_col(position = "dodge") +
facet_wrap(.~Part, nrow = 1, scales = "free_y") +
labs(x = "Part") +
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank())
I also took the liberty of switching out your manual call to summarise with the new across functionality.
Here's how you might also calculate error bars:
library(tidyr)
iris %>%
group_by(Species) %>%
summarise(across(1:4, list(M = mean, SE = ~ sd(.)/sqrt(length(.))),
.names = "{fn}_{col}")) %>%
pivot_longer(-Species, names_to = c(".value","Part"),
names_pattern = "([SEM]+)_(.+)") %>%
ggplot(., aes(x = 1, y = M, group = Species, fill=Species)) +
geom_col(position = "dodge") +
geom_errorbar(aes(ymin = M - SE, ymax = M + SE), width = 0.5,
position = position_dodge(0.9)) +
facet_wrap(.~Part, nrow = 1, scales = "free_y") +
labs(x = "Part", y = "Value") +
theme(axis.ticks.x = element_blank(),
axis.text.x = element_blank())
While learning to plot charts in R, I am using the Australian AIDS Survival Data.
To show survival by gender, I plot 2 charts with this code:
data <- read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Aids2.csv")
ggplot(data) +
geom_bar(aes(sex, fill = as.factor(status)), position = "fill") +
scale_y_continuous(labels = scales::percent)
ggplot(data) +
geom_bar(aes(as.factor(status), fill = sex))
Here are the charts.
Now I want to add the values (numbers and percentages) into the bars body.
geom_text() should do this. I googled some references and tried different combinations of the geom_text aesthetics (x, y, label), but the labels are not shown properly.
Wrong code:
geom_text(aes(as.factor(status), y = sex, label = sex))
How can I do this?
I found it easiest to summarise the data outside of ggplot and then it became relatively simple.
library(tidyverse)
data2 <- data %>%
group_by(sex, status) %>%
summarise (n = n()) %>%
mutate(percent = n / sum(n) * 100)
ggplot(data2, aes(sex, percent, group = status)) +
geom_col(aes(fill = status)) +
geom_text(aes(label = round(percent, 1)), position = position_stack(vjust = 0.5))
ggplot(data2, aes(status, n, group = sex)) +
geom_col(aes(fill = sex)) +
geom_text(aes(label = n), position = position_stack(vjust = 0.5))