Adding proportions to a bar chart - r

I've used my df to create a filled bar chart (Code used below). I want to have the proportions of each "race" printed within the bar chart.
Demo_17 <- tidyr::pivot_longer(Race_17, -State, names_to = "Race", values_to = "num") %>%
ggplot(aes(x=State, y=num, fill = Race)) +
geom_bar(position="fill", stat="identity")
Demo_17 +
labs(x = "State", y = "Share of population", title = "US State Demographics 2017")
This is the df I'm using: US Demographic Data
I've looked at other similar questions but the code is long and hard to follow, particularly if it doesn't relate to your own data.
Can anyone lead me in the right direction?

Try this. Simply compute shares before plotting. Use scales::percent for nice formatting:
Demo_17 <- tidyr::pivot_longer(Race_17, -State, names_to = "Race", values_to = "num") %>%
# compute pct share of race by state
group_by(State) %>%
mutate(pct = num / sum(num)) %>%
ggplot(aes(x=State, y=num, fill = Race)) +
geom_bar(position="fill", stat="identity") +
geom_text(aes(label = scales::percent(pct)), position = "fill")
Demo_17 + labs(x = "State",
y = "Share of population",
title = "US State Demographics 2017")
An example of this approach using mtcars:
library(ggplot2)
library(dplyr)
mtcars %>%
count(cyl, gear, name = "num") %>%
group_by(cyl) %>%
mutate(pct = num / sum(num)) %>%
ggplot(aes(x=cyl, y=num, fill = gear)) +
geom_bar(position="fill", stat="identity") +
geom_text(aes(label = scales::percent(pct)), position = "fill", vjust = 1.5, color = "white")
Created on 2020-04-20 by the reprex package (v0.3.0)
Additionally, if you prefer to show labels only for shares over 10% (just an example; adjust as you wish), add an ifelse() inside the label aesthetic of geom_text:
mtcars %>%
count(cyl, gear, name = "num") %>%
group_by(cyl) %>%
mutate(pct = num / sum(num)) %>%
ggplot(aes(x=cyl, y=num, fill = gear)) +
geom_bar(position="fill", stat="identity") +
geom_text(aes(label = ifelse(pct>0.10, scales::percent(pct), "")), position = "fill", vjust = 1.5, color = "white")
As you can see, the 9% label no longer appears.
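If you would rather center each label inside its segment than nudge it with vjust, position_fill(vjust = 0.5) is one option. Here is a minimal sketch on mtcars. Converting cyl and gear to factors (so the fill scale is discrete) and rounding with accuracy = 1 are my own choices, not part of the code above:

```r
library(ggplot2)
library(dplyr)

# Share of each gear count within each cylinder class
shares <- mtcars %>%
  count(cyl, gear, name = "num") %>%
  group_by(cyl) %>%
  mutate(pct = num / sum(num)) %>%
  ungroup()

p <- ggplot(shares, aes(x = factor(cyl), y = num, fill = factor(gear))) +
  geom_col(position = "fill") +
  # position_fill(vjust = 0.5) puts each label in the middle of its segment
  geom_text(aes(label = scales::percent(pct, accuracy = 1)),
            position = position_fill(vjust = 0.5), color = "white") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "cyl", y = "Share", fill = "gear")

p
```

Because fill is set in the top-level aes(), the text layer is grouped the same way as the bars, so the labels stack in the same order as the segments.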

What adds the labels to your charts is the geom_text(). Maybe something like this:
Demo_17 <- tidyr::pivot_longer(Race_17, -State, names_to = "Race", values_to = "num") %>%
ggplot(aes(x=State, y=num, fill = Race)) +
geom_bar(position="fill", stat="identity")
Demo_17 +
labs(x = "State", y = "Share of population", title = "US State Demographics 2017") +
geom_text(aes(label = num), position = position_fill(vjust = 0.5))
I can't test whether this works as is or needs modification, since you supplied only a screenshot of your dataset rather than a reproducible example. Let me know if it works; if it needs more attention, read here so that people can help you effectively.
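One common way to share data reproducibly is dput(), which prints R code that recreates an object. A small sketch on mtcars follows; the object name small is just for illustration, and with your data you would call dput() on (a subset of) Race_17 instead:

```r
# Take a few rows and columns so the output stays short
small <- head(mtcars[, c("cyl", "gear")], 3)

# Prints a structure(...) call that can be pasted straight into a question
dput(small)
```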

Related

Placing data labels for stacked bar chart at top of bar

I have been attempting to add a label on top of each bar to represent the proportion that each ethnic group makes up in referrals.
For some reason I cannot get the labels to be placed at the top of each bar. How do I fix this?
My code below
freq <- df %>%
group_by(ethnicity) %>%
summarise(n = n()) %>%
mutate(f = round(n / sum(n) * 100, 1))
df %>%
group_by(pathway) %>%
count(ethnicity) %>%
ggplot(aes(x = ethnicity, y = n , fill = pathway)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(data = freq,
aes(x= ethnicity, y = f, label = f),
inherit.aes = FALSE) +
theme(legend.position = "bottom") +
scale_fill_manual(name = "",
values = c("light blue", "deepskyblue4"),
labels = c("a", "b")) +
xlab("") +
ylab("Number of Referrals") +
scale_y_continuous(breaks = seq(0, 2250, 250), expand = c(0, 0))
Here is what it currently looks like
Since you are using the count as your y-axis position in geom_bar, you need to use the same thing in your geom_text to get the labels in the right place. Below is an example using mtcars dataset. Using vjust = -1 I put a little bit of space between the label and the bars to make it more legible and aesthetically pleasing.
library(tidyverse)
mtcars %>%
group_by(carb) %>%
summarise(n = n()) %>%
mutate(f = round(proportions(n) * 100, 1)) -> frq
mtcars %>%
group_by(gear) %>%
count(carb) -> df
df %>%
ggplot(aes(x = carb, y = n, fill = gear)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(data = frq,
vjust = -1,
aes(x= carb, y = n, label = f),
inherit.aes = FALSE)
Created on 2022-10-31 by the reprex package (v2.0.1)
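A variation on the same idea, as a sketch rather than a definitive fix: derive the bar totals and the percentages from one summary, so the label height always equals the top of the stack. The names counts and totals, the factor() conversions, and the 10% headroom via expansion() (ggplot2 3.3.0 or later) are my assumptions:

```r
library(dplyr)
library(ggplot2)

# Counts per stacked segment
counts <- mtcars %>% count(carb, gear)

# One row per bar: total height (n) and its share of all rows (f)
totals <- counts %>%
  group_by(carb) %>%
  summarise(n = sum(n)) %>%
  mutate(f = round(n / sum(n) * 100, 1))

p <- ggplot(counts, aes(x = factor(carb), y = n)) +
  geom_col(aes(fill = factor(gear))) +
  # y = n is inherited, so each label sits at the top of its bar
  geom_text(data = totals, aes(label = paste0(f, "%")), vjust = -0.5) +
  # leave some headroom above the tallest bar for its label
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(x = "carb", fill = "gear")

p
```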

How to automatically choose a good ylim to read geom_labels in ggplot2 in R

Suppose I write the following code with the diamonds dataset:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value)), size = 6) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
which outputs the following plot:
As you can see, it is impossible to read the last digit(s) of the first category ("Ideal").
So, my question is: I know I can simply write something like coord_flip(ylim = c(0, 80000000)) and this would solve the problem. However, what could I write instead so that ggplot2 works out by itself how much space ylim needs for people to clearly read the geom_label()s, without me having to do this manually?
I'm trying to create an automatic dashboard with multiple plots such as this, and I cannot manually tune every one of them. I need an automatic mechanism, and I haven't found anything about this on StackOverflow for geom_label() specifically.
Thanks.
Instead of positioning your label at the end of the bar, you could move it closer to the middle and adjust its position with hjust so it won't spill out of the plot area, which is sized to include only the bars.
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value), y = total_value/2), size = 6, hjust = 0.2) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
That gives:
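Another option, if you want the labels to stay at the bar ends, is to pad the value axis by a relative amount instead of hard-coding ylim. This is only a sketch: expansion() needs ggplot2 3.3.0 or later, and the 35% padding is a guess that depends on label size and plot width, so it is relative rather than truly automatic:

```r
library(dplyr)
library(ggplot2)

totals <- diamonds %>%
  group_by(cut) %>%
  summarise(total_value = sum(price, na.rm = TRUE))

p <- ggplot(totals, aes(x = reorder(cut, total_value), y = total_value)) +
  geom_col(aes(fill = cut), show.legend = FALSE) +
  # hjust = 0 starts each label at the end of its bar
  geom_label(aes(label = scales::dollar(total_value)), hjust = 0, size = 6) +
  # pad the upper end of the value axis by 35% of the data range
  scale_y_continuous(expand = expansion(mult = c(0, 0.35))) +
  coord_flip() +
  labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "")

p
```

Since the padding is a multiple of the data range, the same setting carries over to plots whose totals are on a different scale.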

Creating the pie chart according to the dataframe

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv',
stringsAsFactors = FALSE)
How can I create a pie chart of the death, confirmed and recovered fields in this data set, by region?
This is a good fit for the tidyverse:
library(tidyverse)
df %>%
as_tibble() %>%
select(region, confirmed, recovered, death) %>%
gather(type, value, -region) %>%
group_by(region,type) %>%
summarise(value= sum(value)) %>%
ggplot(aes(x="", value, fill =region)) +
geom_col(position = position_fill(), color="white") +
ggrepel::geom_text_repel(aes(label = region), direction = "y",
position = position_fill(vjust = 0.5)) +
coord_polar(theta = "y") +
scale_fill_discrete("") +
facet_wrap(~type) +
theme_void() +
theme(legend.position = "bottom")
For the labels I used geom_text_repel from the ggrepel package to avoid overplotting.
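If you want percentage labels rather than region names, the same reshaping can be written with pivot_longer (the newer replacement for gather). The sketch below runs on a made-up toy table, because I can't guarantee the URL stays reachable; the regions and numbers are hypothetical and only illustrate the shape of the data:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the downloaded data; the numbers are made up
toy <- tibble(
  region    = c("Asia", "Europe", "Americas"),
  confirmed = c(100, 200, 300),
  recovered = c(50, 80, 90),
  death     = c(5, 10, 15)
)

# Long format, with each region's share within each type
long <- toy %>%
  pivot_longer(-region, names_to = "type", values_to = "value") %>%
  group_by(type) %>%
  mutate(share = value / sum(value)) %>%
  ungroup()

p <- ggplot(long, aes(x = "", y = share, fill = region)) +
  geom_col(color = "white") +
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  facet_wrap(~type) +
  theme_void() +
  theme(legend.position = "bottom")

p
```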

geom_raster to visualize missing values with additional colorcode

This question is a follow-up to my previous question: Adding color code (fill) to vis_miss plot
I would like to visualize the "missing info" in a data frame using geom_raster from ggplot2 in R while also highlighting some additional data structure using color-coding.
Solution attempt:
library(tidyverse)
x11()
airquality %>%
mutate(id = row_number()) %>%
gather(-c(id,Month), key = "key", value = "val") %>%
mutate(isna = is.na(val)) %>%
mutate(Month=as.factor(ifelse(isna==TRUE,NA,Month))) %>%
ggplot(aes(key, id, fill = Month)) +
geom_raster() +
labs(x = "Variable",
y = "Row Number", title = "Missing values in rows") +
coord_flip()
This is almost what I want, but it would be nicer to separate the month and NA legends. Is that possible? (Note that my system does not allow me to use transparency (alpha)).
Here, I removed the legend for NA. If this doesn't serve your purpose properly, I can think of a hacky solution to add another legend for data vs. missing.
library(tidyverse)
airquality %>%
mutate(id = row_number()) %>%
gather(-c(id,Month), key = "key", value = "val") %>%
mutate(isna = is.na(val)) %>%
mutate(Month_Dummy=as.factor(ifelse(isna==TRUE,NA,Month))) %>%
mutate(Month=as.factor(Month)) %>%
ggplot() +
geom_raster(aes(key, id, fill = Month)) +
geom_raster(aes(key, id, fill = Month_Dummy)) +
labs(x = "Variable",
y = "Row Number", title = "Missing values in rows") +
coord_flip()
Update:
The hacky solution I can think of is adding a geom_point for just one of the missing values and using it for the legend of missing data points. It's not the best in terms of appearance, but it is the only solution I can think of.
library(tidyverse)
airquality %>%
mutate(id = row_number()) %>%
gather(-c(id,Month), key = "key", value = "val") %>%
mutate(isna = is.na(val)) %>%
mutate(Month_Dummy=as.factor(ifelse(isna==TRUE,NA,Month))) %>%
mutate(Month=as.factor(Month)) -> aqdf
ggplot(data = aqdf, aes(key, id)) +
geom_raster(aes(fill = Month)) +
geom_raster(aes(fill = Month_Dummy)) +
geom_point(data=aqdf[aqdf$isna==TRUE,][1,],
aes(NA, id, colour = "NA"),
inherit.aes = FALSE) +
scale_color_manual(values=c("grey50")) +
labs(x = "Variable", y = "Row Number",
title = "Missing values in rows", color = "Missing") +
coord_flip() +
theme(legend.key = element_rect(fill = "grey50"))

Overlaying barplot with line graphs using ggplot2

My question is similar to those posted here and here.
I am working on creating a graph in ggplot where I have one bar plot and then want to overlay multiple line graphs. For the purposes of this question, I have reproduced my code for three barplots (one that includes all years, 2007-2015, and two from specific years, 2007 and 2015), but ultimately I will be overlaying data from 10 different years. The data used can be found here.
library(dplyr)
library(tidyr)
library(gridExtra)
library(ggplot2)
overallpierc<-data[(data$item=="piercing"),]
overp<-overallpierc %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p07<-data[(data$yy=="2007") & (data$item=="piercing"),]
summary(p07)
subp07<-p07 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
p15<-data[(data$yy=="2015") & (data$item=="piercing"),]
subp15<-p15 %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) + geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
grid.arrange(overp, subp07, subp15)
The code I have posted gives me the following figure.
What I am trying to do is plot the frequencies for females in 2007 and 2015 and males in 2007 and 2015 on top of the barplot for total frequencies (where this is also reflected in the legend). Is there a way to do that in R using ggplot2?
UPDATE: I tried using the geom_smooth and geom_line functions to add the lines to my ggplot, as suggested in the comments and in answers to other users' questions, but I get the following error:
Error: Discrete value supplied to continuous scale
I created a new data frame for a subset that I would like to plot:
df<-data.frame(age=c(15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,40,50,60), val=c(0,5,13,77,70,106,62,51,46,27,46,16,22,16,14,48,21, 3,4))
And then added it to the ggplot code:
overallpierc %>%
filter(age != "15") %>%
group_by(age) %>%
count(sex) %>%
ungroup %>%
mutate(age = factor(age)) %>%
complete(age, sex, fill = list(n = 0)) %>%
ggplot(aes(age, n)) +
geom_line(data=df,aes(x=as.numeric(age),y=val),colour="blue") +
geom_col(aes(fill = sex), position = "dodge") +
theme_classic() +
scale_fill_manual(values=c("#000000", "#CCCCCC"), name = "Sex") +
labs(x = "Age", y = "Number of observations") +
theme(legend.position=c(0.4,0.8),
plot.title = element_text(size = 10),
legend.title=element_text(size=15),
axis.title=element_text(size=15),
legend.key.size = unit(1.13, "cm"),
legend.direction="vertical",
legend.text=element_text(size=15))
Others have encountered similar issues and used as.numeric to solve the problem. However, age needs to be treated as a factor for the purposes of plotting.
Based on our discussion in the comments, let's try stacked bars and facets. I think it works but you can decide for yourself.
The stacked bar has the advantage of showing both proportions and total count in the same bar. To compare years, a facet grid places years in rows, so the eye can scan downwards to compare the same age in different years. Note that I kept age as a continuous variable here, rather than a factor.
library(dplyr)
library(ggplot2)
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_grid(yy ~ .) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
Not bad - I can see straight away, for example, an increase in both total and female count at age 30 over time, but perhaps a little small and crowded.
We can use a facet wrap instead of a grid to make the bars clearer, but at the expense of quick visual comparison across years.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(age, n)) +
geom_col(aes(fill = sex)) +
facet_wrap(~yy, ncol = 2) +
theme_bw() +
scale_fill_manual(values = c("#000000", "#cccccc"))
One more example which does not address your question in terms of total counts or barplots - but I thought it might be of interest. This code generates a "heatmap" style of plot which is poor for quantitative comparison, but can sometimes give a quick visual impression of interesting features. I think it shows, for example, that females aged 20 in 2014 have the highest total count.
data30g %>%
count(yy, sex, age) %>%
ggplot(aes(factor(age), yy)) +
geom_tile(aes(fill = n)) +
facet_grid(sex ~ .) +
scale_fill_gradient2() +
scale_y_reverse(breaks = 2006:2015) +
labs(x = "age", y = "Year")
EDIT:
Based on further discussions in the comments, here is one way to plot age as a factor, using bars for sexes, overlaid with a line for the totals and split by year.
overallpierc %>%
count(yy, sex, age) %>%
ggplot() +
geom_col(aes(factor(age), n, fill = sex), position = "dodge") +
stat_summary(aes(factor(age), n), fun = sum, geom = "line", group = 1) +
facet_grid(yy ~ .)
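If the line values live in a separate data frame, the "Discrete value supplied to continuous scale" error can usually be avoided by giving the line layer the same discrete x as the bars: a factor with matching levels, plus group = 1 so the points are joined. A sketch on hypothetical toy data (bars, line, and all the counts are made up for illustration):

```r
library(ggplot2)

# Hypothetical toy data standing in for the piercing counts
bars <- data.frame(
  age = factor(rep(c(18, 19, 20), each = 2)),
  sex = rep(c("F", "M"), times = 3),
  n   = c(5, 3, 8, 4, 6, 7)
)

# One value per age for the line, using the same factor levels as the bars
line <- data.frame(
  age = factor(c(18, 19, 20), levels = levels(bars$age)),
  val = c(8, 12, 13)
)

p <- ggplot(bars, aes(age, n)) +
  geom_col(aes(fill = sex), position = "dodge") +
  # group = 1 tells geom_line to connect points across the discrete x values
  geom_line(data = line, aes(age, val, group = 1), colour = "blue")

p
```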

Resources