ggplot2: plot correct proportions using geom_bar - r

I am trying to plot proportion of diamonds using geom_bar and position = "dodge". Here is what I have done.
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))
The image below tells me how many diamonds there are for each cut type.
Now let's do something fancy.
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
The image below shows the count of diamonds for each clarity, grouped by cut type.
What I would like to do is get the same dodge plot as above but showing proportion instead of count.
For example, for cut=ideal and clarity = VS2, the proportion should be 5071/21551 = 0.23.

You can try
library(tidyverse)
diamonds %>%
count(cut, clarity) %>%
group_by(cut) %>%
mutate(Sum=sum(n)) %>%
mutate(proportion = n/Sum) %>%
ggplot(aes(y=proportion, x=cut,fill=clarity)) +
geom_col(position = "dodge")
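As a quick sanity check on the pipeline above (a sketch only, the object names `props` and `ideal_vs2` are made up for illustration), the proportions within each cut should sum to 1, and the Ideal / VS2 cell should match the 5071/21551 figure from the question:

```r
library(dplyr)
library(ggplot2)  # loaded here only for the diamonds dataset

# Same count -> group -> divide steps as the answer, without the plot
props <- diamonds %>%
  count(cut, clarity) %>%
  group_by(cut) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

# Each cut's proportions should add up to 1
props %>%
  group_by(cut) %>%
  summarise(total = sum(proportion))

# The cell the question asks about
ideal_vs2 <- props %>% filter(cut == "Ideal", clarity == "VS2")
ideal_vs2$proportion
```

If the per-cut totals are not 1, the grouping step was probably dropped or applied to the wrong variable.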

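Create a column with the correct percentages (named "percentage"), and use
library(ggplot2)
library(scales)
# `percentage` must already exist as a column in the data;
# stat = "identity" tells geom_bar to use it as the bar height instead of counting rows
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = percentage, fill = clarity), stat = "identity", position = "dodge") +
scale_y_continuous(labels = scales::percent)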
You can also calculate the percentage inline, as Maurits Evers suggests.
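One way the inline calculation might look is sketched below. It assumes ggplot2 3.3.0 or later (for `after_stat()`) and a single-panel plot; it is not necessarily the form the other answer had in mind. The idea is to divide each bar's computed `count` by the total count at the same x position.

```r
library(ggplot2)

# Sketch: within-cut proportions computed inside aes(), no pre-summarised data.
# as.numeric(x) strips any class ggplot2 attaches to the bar position so that
# ave() groups on a plain number; that conversion is a defensive assumption.
p <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(
    aes(y = after_stat(count / ave(count, as.numeric(x), FUN = sum))),
    position = "dodge"
  ) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "proportion within cut")
p
```

With facets, the grouping inside `ave()` would also need to include the panel, so pre-computing the column is usually easier to reason about.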

Related

r/ggplot: compute bar plot share within group

I am using ggplot2 to make a bar plot that is grouped by one variable and reported in shares.
I would like the percentages to instead be a percentage of the grouping variable rather than a percentage of the whole data set.
For example,
library(ggplot2)
library(tidyverse)
ggplot(mtcars, aes(x = as.factor(cyl),
y = (..count..) / sum(..count..),
fill = as.factor(gear))) +
geom_bar(position = position_dodge(preserve = "single")) +
geom_text(aes(label = scales::percent((..count..)/sum(..count..)),
y= ((..count..)/sum(..count..))), stat="count") +
theme(legend.position = "none")
Produces this output:
I'd like the percentages (and bar heights) to reflect the "within cyl" proportion rather than share across the entire sample. Is this possible? Would this involve a stat argument?
As an aside, if it's possible to similarly position the geom_text call over the relevant bars, that would be ideal. Any guidance would be appreciated.
Here is one way:
library(dplyr)
library(ggplot2)
mtcars %>%
count(cyl, gear) %>%
group_by(cyl) %>%
mutate(prop = prop.table(n) * 100) %>%
ggplot() + aes(cyl, prop, fill = factor(gear),
label = paste0(round(prop, 2), '%')) +
geom_col(position = "dodge") +
geom_text(position = position_dodge(width = 2), vjust = -0.5, hjust = 0.5)
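A variation on the answer above, offered as a sketch rather than a correction: if cyl is converted to a factor, the bars sit one unit apart and have a default width of 0.9, so giving the columns and the labels the same `position_dodge(width = 0.9)` lines the text up over each bar. The names `shares` and `p` are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Share of each gear within each cyl group
shares <- mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Same dodge width for bars and labels keeps them aligned
p <- ggplot(shares, aes(factor(cyl), prop, fill = factor(gear),
                        label = scales::percent(prop, accuracy = 0.1))) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous(labels = scales::percent)
p
```

Keeping `prop` as a fraction and formatting it with `scales::percent()` means the axis and the labels share one scale.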

How to make a barplot of percentages in ggplot2

I have a set of data as such;
Station;Species;
CamA;SpeciesA
CamA;SpeciesB
CamB;SpeciesA
etc...
I would like to create a stacked barplot with the camera stations on the x axis and the percentage of each species stacked within each bar. I have tried the following code:
ggplot(data=data, aes(x=Station, y=Species, fill = Species))+ geom_col(position="stack") + theme(axis.text.x =element_text(angle=90)) + labs (x="Cameras", y= NULL, fill ="Species")
And end up with the following graph;
But clearly I don't have a percentage on the y axis, just the species name, which is, in the end, what I coded for.
How could I have the percentages on the y axis, the cameras on the x axis and the species as a fill?
Thanks !
Using mtcars as an example dataset, one approach to get a barplot of percentages is to use geom_bar with position = "fill".
library(ggplot2)
library(dplyr)
mtcars2 <- mtcars
mtcars2$cyl = factor(mtcars2$cyl)
mtcars2$gear = factor(mtcars2$gear)
# Use geom_bar with position = "fill"
ggplot(data = mtcars2, aes(x = cyl, fill = gear)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent_format()) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Cameras", y = NULL, fill = "Species")
A second approach would be to manually pre-compute the percentages and make use of geom_col with position="stack".
# Pre-compute percentages
mtcars2_sum <- mtcars2 %>%
count(cyl, gear) %>%
group_by(cyl) %>%
mutate(pct = n / sum(n))
ggplot(data = mtcars2_sum, aes(x = cyl, y = pct, fill = gear)) +
geom_col(position = "stack") +
scale_y_continuous(labels = scales::percent_format()) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Cameras", y = NULL, fill = "Species")
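For data laid out like the question's (one row per observation, with Station and Species columns), the first approach might be applied as sketched below. The rows in `cams` are made up purely for illustration and are not the asker's data.

```r
library(ggplot2)

# Made-up rows in the same Station/Species layout as the question
cams <- data.frame(
  Station = c("CamA", "CamA", "CamA", "CamB", "CamB"),
  Species = c("SpeciesA", "SpeciesB", "SpeciesA", "SpeciesA", "SpeciesB")
)

# No y aesthetic: geom_bar counts rows, position = "fill" rescales each bar to 100%
p <- ggplot(cams, aes(x = Station, fill = Species)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Cameras", y = NULL, fill = "Species")
p
```

The key change from the question's code is dropping `y = Species`, so that the species only drives the fill.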

How can I have different geom_text() labels in a faceted, stacked bar graph in R with ggplot?

I am trying to use facet_wrap with stacked bar graphs, and I'd like to have labels on the bars showing the value of each part of the bar.
Using the diamonds dataset as an example:
My geom_text code works fine when there is only one graph, albeit cramped for the shorter bars:
diamonds %>%
ggplot(aes(x = cut, fill = clarity)) +
geom_bar() +
geom_text(data = . %>%
group_by(cut, clarity) %>%
tally() %>%
ungroup() %>%
group_by(cut) %>%
ungroup(),
aes(y = n, label = n),
position = position_stack(0.5),
show.legend = FALSE)
(Image: labeled bar plot without faceting)
However, when I add the faceting, all the labels display in all the individual facets:
diamonds %>%
ggplot(aes(x = cut, fill = clarity)) +
geom_bar() +
facet_wrap(~ color) +
geom_text(data = . %>%
group_by(cut, clarity) %>%
tally() %>%
ungroup() %>%
group_by(cut) %>%
ungroup(),
aes(y = n, label = n),
position = position_stack(0.5),
show.legend = FALSE)
(Image: faceted bar plot with replicated labeling)
How can I make it so that the labels only show up on the relevant bars?
Thanks!
I think you need to include color in the group_by + tally so that it can be assigned to the correct facet:
diamonds %>%
ggplot(aes(x = cut, fill = clarity)) +
geom_bar() +
facet_wrap(~ color, scales = "free_y") +
geom_text(data = . %>%
count(cut, clarity,color),
aes(y = n, label = n),size=1,
position = position_stack(0.5),
show.legend = FALSE)
Personally, I find the computed count variable easier to work with. It is written after_stat(count) in ggplot2 3.3.0 and later; the older ..count.. notation is deprecated as of 3.4.0.
diamonds %>%
ggplot(aes(x = cut, fill = clarity)) +
geom_bar() +
facet_wrap(~ color, scales = "free_y") +
stat_count(geom = "text",
aes(y = after_stat(count), label = after_stat(count)),
position = position_stack(0.5), size = 2)
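To see why the labels were repeated in every facet, it can help to compare the two label data frames directly. This is a small sketch with made-up object names: a layer whose data has no column for the faceting variable gets drawn in every panel, so the fix comes down to whether `color` survives the summarising step.

```r
library(dplyr)
library(ggplot2)  # loaded here only for the diamonds dataset

# Label data as built in the question: color is dropped by the summary
no_facet_var <- diamonds %>% count(cut, clarity)

# Label data as built in the answer: color is kept
with_facet_var <- diamonds %>% count(cut, clarity, color)

"color" %in% names(no_facet_var)    # FALSE
"color" %in% names(with_facet_var)  # TRUE
```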

Draw a line on top of stacked bar_plot

I would like to draw a line (or points) on top of my stacked bar plots. As I have no real data points I can refer to (only the separated values and not their sum), I don't know how to add such a line. The code produces this plot:
I want to add this black line(my real data are not linear):
library(tidyverse)
##Create some fake data
data3 <- tibble(
year = 1991:2020,
One = c(31:60),
Two = c(21:50),
Three = c(11:40)
)
##Gather the variables to create a long dataset
new_data3 <- data3 %>%
gather(model, value, -year)
##plot the data
ggplot(new_data3, aes(x = year, y = value, fill=model)) +
geom_bar(stat = "identity",position = "stack")
You can use stat_summary and sum for the summary function:
ggplot(new_data3, aes(year, value)) +
geom_col(aes(fill = model)) +
stat_summary(geom = "line", fun = sum, group = 1, size = 2)  # `fun` replaces the deprecated `fun.y` (ggplot2 3.3.0+)
Result:
You could compute the sum by year and plot it with a separate geom_line layer:
library(dplyr)
library(ggplot2)
newdata4 <- new_data3 %>%
group_by(year) %>%
summarise(total = sum(value))
ggplot(new_data3, aes(x = year, y = value, fill=model)) +
geom_bar(stat = "identity",position = "stack") +
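# inherit.aes = FALSE keeps the plot-level fill = model mapping out of this layer,
# since newdata4 has no model column
geom_line(aes(year, total), data = newdata4, inherit.aes = FALSE, size = 2)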
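The yearly totals that the line follows can be checked on their own. The sketch below rebuilds the fake data from the question and uses `pivot_longer()` as a stand-in for `gather()` (my substitution, not part of the original answers); `totals` is a made-up name.

```r
library(dplyr)
library(tidyr)

# Fake data from the question
data3 <- tibble(
  year  = 1991:2020,
  One   = 31:60,
  Two   = 21:50,
  Three = 11:40
)

# Long format, then one total per year: this is the height of each stacked bar
totals <- data3 %>%
  pivot_longer(-year, names_to = "model", values_to = "value") %>%
  group_by(year) %>%
  summarise(total = sum(value))

range(totals$total)  # 63 150
```

The first total is 31 + 21 + 11 = 63 and the last is 60 + 50 + 40 = 150, which should match the top of the first and last bars.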

Charts using ggplot() to apply geom_text() in R

While learning to plot charts in R, I am using the Australian AIDS Survival Data.
To show survival by gender, I plot 2 charts with this code:
data <- read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Aids2.csv")
ggplot(data) +
geom_bar(aes(sex, fill = as.factor(status)), position = "fill") +
scale_y_continuous(labels = scales::percent)
ggplot(data) +
geom_bar(aes(as.factor(status), fill = sex))
Here are the charts.
Now I want to add the values (numbers and percentages) into the bars body.
geom_text() should do it. I googled some references and tried different combinations of the geom_text() aesthetics (x, y, label), but the labels are not shown properly.
Wrong code:
geom_text(aes(as.factor(status), y = sex, label = sex))
How can I do this?
I found it easiest to summarise the data outside of ggplot and then it became relatively simple.
library(tidyverse)
data2 <- data %>%
group_by(sex, status) %>%
summarise (n = n()) %>%
mutate(percent = n / sum(n) * 100)
ggplot(data2, aes(sex, percent, group = status)) +
geom_col(aes(fill = status)) +
geom_text(aes(label = round(percent, 1)), position = position_stack(vjust = 0.5))
ggplot(data2, aes(status, n, group = sex)) +
geom_col(aes(fill = sex)) +
geom_text(aes(label = n), position = position_stack(vjust = 0.5))
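The summarise-then-mutate step above relies on summarise() dropping the last grouping variable, so the percentages end up being computed within each sex. A sketch that makes the grouping explicit, using a few made-up rows (`toy` is not the Aids2 data) so it runs without a download:

```r
library(dplyr)

# Made-up rows with the same two columns used in the answer
toy <- tibble(
  sex    = c("F", "F", "F", "M", "M", "M", "M"),
  status = c("A", "D", "D", "A", "A", "D", "D")
)

# Count each sex/status pair, then take percentages within each sex
toy_pct <- toy %>%
  count(sex, status) %>%
  group_by(sex) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ungroup()

toy_pct
```

Each sex's percentages add up to 100, which is what the position = "fill" chart in the question displays.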
