ggplot2 with side by side and proportional fill - r

I have data that looks like this:
My goal is to have a barplot grid as follows: Each plot will be specific to 1 race_ethnicity group. The x-axis in each plot will be the different age_bin groups. For each age_bin, there will be two bars: 1 for men, and 1 for women. For each bar, I want it to be filled with the proportion of Likely/(Unlikely + Likely). Preferably, each bar would have a height of 1 and a line cut through it so Likely% of that bar is one color with a label. This is what I currently have:
I am running into issues with 1) using a predefined proportion as the fill, and 2) having two different "fills" (one for biological sex, one for the predefined proportion).
Thanks to anyone who can help with this. My code is currently the following:
ggplot(data = who_votes_data, aes(x = age_bin, y = 1, fill = gender)) +
  geom_bar(stat = "identity", aes(fill = gender), position = position_dodge2()) +
  facet_wrap(~race_ethnicity, nrow = 2, scales = "free") +
  geom_text(aes(label = paste0(sprintf("%1.1f", prop * 100), "%"), y = prop),
            colour = "white") +
  labs(x = expression("Age Group"), y = ("Proportion of Likely Voters"),
       title = "Proportion of Likely Voters Across Age Groups, Race/Ethnicity, and Sex",
       caption = "Figure 1") +
  theme(plot.caption = element_text(hjust = 0.5, vjust = -0.5, size = 18))
https://docs.google.com/spreadsheets/d/1a7433iwXNSwcuXDJOvqsxNDN6oaYULVlyw22E41JROU/edit?usp=sharing
Updated Code:
library(tidyverse)
library(ggplot2)
df<- read.csv("samplevotes.csv")
df %>%
group_by(race_ethnicity, age_bin, gender) %>%
summarise(Likely = sum(Likely),
Unlikely = sum(Unlikely),
proportion = Likely/(Likely+Unlikely)) %>% ungroup() %>%
ggplot(aes(x = age_bin, y = proportion, fill = gender)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~race_ethnicity, nrow = 2) +
geom_text(aes(label=paste0(sprintf("%1.1f", proportion*100),"%"), y=proportion), position = position_dodge(width = 1), colour="Black", size = 2.2) +
labs(x = expression("Age Group"), y= ("Proportion of Likely Voters"), title = "Proportion of Likely Voters Across Age Groups, Race/Ethnicity, and Sex", caption="Figure 1") +
theme(plot.caption = element_text(hjust = 0.5, vjust = -0.5, size = 18))

Here is the code I would use. I did make some changes based on the way the data was combined.
df %>%
  group_by(race_ethnicity, age_bin, gender) %>%
  summarise(Likely = sum(Likely),
            Unlikely = sum(Unlikely),
            proportion = Likely / (Likely + Unlikely),
            .groups = "drop") %>%
  ggplot(aes(x = age_bin, y = proportion, fill = gender)) +
  # use the same dodge width for bars and labels so the labels stay centred
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  facet_wrap(~race_ethnicity, nrow = 2) +
  geom_text(aes(label = paste0(sprintf("%1.1f", proportion * 100), "%"), y = proportion),
            position = position_dodge(width = 0.9), colour = "black", size = 2.2) +
  labs(x = "Age Group", y = "Proportion of Likely Voters",
       title = "Proportion of Likely Voters Across Age Groups, Race/Ethnicity, and Sex",
       caption = "Figure 1") +
  theme(plot.caption = element_text(hjust = 0.5, vjust = -0.5, size = 18))
Here is what it looks like
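The plot above draws each bar only up to its proportion. For the layout described in the question (every bar reaching 1, with the Likely share filled in), one possible sketch is to stack two layers: a faded full-height bar and the proportion drawn on top of it. The toy data frame below is made up for illustration and only stands in for samplevotes.csv; the column names follow the question, and the names votes, props and dodge are my own.

```r
library(dplyr)
library(ggplot2)

# Made-up data standing in for samplevotes.csv (same column names as the question)
votes <- expand.grid(
  race_ethnicity = c("Group A", "Group B"),
  age_bin = c("18-29", "30-44", "45+"),
  gender = c("Men", "Women"),
  stringsAsFactors = FALSE
)
votes$Likely   <- c(30, 45, 50, 60, 70, 80, 35, 40, 55, 65, 75, 85)
votes$Unlikely <- c(70, 55, 50, 40, 30, 20, 65, 60, 45, 35, 25, 15)

# One row per race/age/gender cell with the share of likely voters
props <- votes %>%
  group_by(race_ethnicity, age_bin, gender) %>%
  summarise(Likely = sum(Likely), Unlikely = sum(Unlikely), .groups = "drop") %>%
  mutate(proportion = Likely / (Likely + Unlikely))

# Shared dodge so both bar layers and the labels line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(props, aes(x = age_bin, fill = gender)) +
  geom_col(aes(y = 1), position = dodge, alpha = 0.3) +   # full-height background bar
  geom_col(aes(y = proportion), position = dodge) +       # Likely share drawn on top
  geom_text(aes(y = proportion, label = sprintf("%.1f%%", proportion * 100)),
            position = dodge, vjust = 1.5, colour = "white", size = 3) +
  facet_wrap(~race_ethnicity, nrow = 2) +
  labs(x = "Age Group", y = "Proportion of Likely Voters", fill = "Sex")

print(p)
```

The fill still encodes sex only; the proportion is shown by bar height and transparency, which sidesteps the need for two separate fill scales.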

Related

Selecting using geom_bar

I am using ggplot2 and geom_bar to summarize the number of spices in a column. I want the number of bars to be restricted to a value, say count > 10.
My code is:
ggplot(r_df, aes(spice, colour="black", fill = "orange")) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "black", size = 3.,
aes(label = after_stat(count)), position=position_stack(vjust=1.05)) +
theme(axis.text.x = element_text(angle=90, vjust=.6))
Since the code is counting the number of each item for me, I don't think I can subset prior to executing the plot. I am hoping there is a simple way to put the count>10 into the ggplot function itself.
Since I don't have the exact data in r_df, I'm taking a guess here. You should be able to use filter() prior to building your plot:
library(tidyverse)
r_df %>%
group_by(spice) %>%
filter(n() > 10) %>%
ungroup() %>%
ggplot(aes(spice)) +
# constant colours go outside aes(); inside aes() they are treated as data values
geom_bar(stat = "count", colour = "black", fill = "orange") +
stat_count(
geom = "text",
colour = "black",
size = 3.,
aes(label = after_stat(count)),
position = position_stack(vjust = 1.05)
) +
theme(axis.text.x = element_text(angle = 90, vjust = .6))
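An alternative sketch, in case it is easier to reason about: count first, filter on the count, and plot the pre-computed values with geom_col. The r_df below is a made-up stand-in (the real data was not posted), and the column name count is my own choice.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for r_df: one row per observation of a spice
r_df <- data.frame(spice = rep(c("cumin", "basil", "thyme"), times = c(15, 12, 4)))

# Count rows per spice, then keep only the spices seen more than 10 times
spice_counts <- r_df %>%
  count(spice, name = "count") %>%
  filter(count > 10)

p <- ggplot(spice_counts, aes(x = spice, y = count)) +
  geom_col(colour = "black", fill = "orange") +
  geom_text(aes(label = count), vjust = -0.5, size = 3) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.6))

print(p)
```

Here thyme (4 rows) is dropped before plotting, so the threshold lives in one filter() call rather than inside the ggplot layers.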

how to add a legend in a ggplot?

I am having a problem with my ggplot: I cannot insert a legend. I just want to show the total number of facilities per region (manually).
Here is my code:
Note: my csv file has 17,333 IDs. I was thinking that might be the reason, but I'm not really sure.
library(ggplot2)
library(dplyr)
library(ggthemes)
library(tidyverse)
doh = read.csv("doh.csv")
doh %>%
ggplot(aes( y = region, color = region)) +
geom_bar(position = "identity", size = 0.7, alpha = 0.8, fill = "#28d1eb", colour="black") +
labs(title = "Total Number of COVID-19 Facilities per Region",
x = "Count",
y = "Region") +
theme_minimal() +
theme(plot.title = element_text(lineheight=6, face="bold", color="black",size=15))
I have tried inserting a legend in my code but it isn't working and I'm not sure where I went wrong.
code:
library(ggplot2)
library(dplyr)
library(ggthemes)
library(tidyverse)
doh = read.csv("doh.csv")
doh %>%
ggplot(aes( y = region, fill = region, color = region)) +
geom_bar(position = "identity", size = 1.0, alpha = 0.8, fill = "#28d1eb", colour="black") +
labs(title = "Total Number of COVID-19 Facilities per Region") +
theme_minimal() +
theme(plot.title = element_text(lineheight=6, face="bold", color="black",size=15)) +
barplot(data,
col = c("#f2b50c", "#d96fe3")) +
legend("topright",
legend = c("REGION XIII (CARAGA)"))
PS: I only included "REGION XIII (CARAGA)" because I just wanted to see if it's working, but it's not.
Thanks in adv!
Here is a way. Count the regions before piping to ggplot.
The example below uses the built-in data set diamonds; substitute your own data for diamonds and region for clarity, and the code should work.
suppressPackageStartupMessages({
library(ggplot2)
library(dplyr)
})
data(diamonds)
diamonds %>%
select(clarity) %>%
count(clarity, name = "Count") %>%
ggplot(aes(x = clarity, y = Count, fill = clarity)) +
geom_col(alpha = 0.8, fill = "#28d1eb", colour = "black") +
geom_text(aes(label = Count), hjust = -0.1) +
coord_flip() +
labs(title = "Total Number of COVID-19 Facilities per Region") +
theme_minimal(base_size = 15) +
theme(plot.title = element_text(lineheight = 6, face = "bold", color="black"))
Created on 2022-11-21 with reprex v2.0.2
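A note on the legend itself: as far as I can tell, a legend only appears when an aesthetic is mapped inside aes() and is not overridden by a constant in the geom (the fixed fill = "#28d1eb" above hides it). A minimal sketch, again on diamonds, that keeps the fill mapped and writes the counts into the legend labels. The names counts and legend_labels are my own, and scale_fill_viridis_d is just one possible discrete fill scale.

```r
library(ggplot2)
library(dplyr)

# One row per clarity level with its count
counts <- diamonds %>%
  count(clarity, name = "Count")

# Legend entries such as "SI1 (13065)", in the same order as the factor levels
legend_labels <- paste0(counts$clarity, " (", counts$Count, ")")

p <- ggplot(counts, aes(x = Count, y = clarity, fill = clarity)) +
  geom_col(alpha = 0.8, colour = "black") +              # no constant fill, so the legend is kept
  geom_text(aes(label = Count), hjust = -0.1, show.legend = FALSE) +
  scale_fill_viridis_d(name = "Clarity (count)", labels = legend_labels) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +  # room for the labels
  labs(title = "Count per clarity level", x = "Count", y = "Clarity") +
  theme_minimal(base_size = 15)

print(p)
```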

How to label the count of each bin within ggridges package?

I have a data frame that simulates the NFL season with 2 columns: team and rank. I am trying to use ggridges to make a distribution plot of the frequency of each team at each rank from 1-10. I can get the plot working, but I'd like to display the count of each team/rank in each bin. I have been unsuccessful so far.
ggplot(results,
aes(x=rank, y=team, group = team)) +
geom_density_ridges2(aes(fill=team), stat='binline', binwidth=1, scale = 0.9, draw_baseline=T) +
scale_x_continuous(limits = c(0,11), breaks = seq(1,10,1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL)
Which creates this plot:
I tried adding in this line to get the count added to each bin, but it did not work.
geom_text(stat='bin', aes(y = team + 0.95*stat(count/max(count)),
label = ifelse(stat(count) > 0, stat(count), ""))) +
Not the exact dataset but this should be enough to at least run the original plot:
results = data.frame(team = rep(c('Jets', 'Giants', 'Washington', 'Falcons', 'Bengals', 'Jaguars', 'Texans', 'Cowboys', 'Vikings'), 1000), rank = sample(1:20,9000,replace = T))
How about calculating the count for each bin, joining to the original data and using the new variable n as the label?
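library(dplyr)    # for count, left_join
library(ggplot2)
library(ggridges) # for geom_density_ridges2, theme_ridges
results %>%
count(team, rank) %>%
left_join(results, by = c("team", "rank")) %>%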
ggplot(aes(rank, team, group = team)) +
geom_density_ridges2(aes(fill = team),
stat = 'binline',
binwidth = 1,
scale = 0.9,
draw_baseline = TRUE) +
scale_x_continuous(limits = c(0, 11),
breaks = seq(1, 10, 1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F",
"#0C264C", "#192E6C", "#136677", "#203731"), name = NULL) +
geom_text(aes(label = n),
color = "white",
nudge_y = 0.2)
Result:
Neilfws' answer is great, but I've always found geom_ridgelines difficult to work with in circumstances like this so I usually recreate them with geom_rect:
library(dplyr)
library(ggplot2)
library(ggridges) # for theme_ridges
results %>%
count(team, rank) %>%
filter(rank<=10) %>%
mutate(team=factor(team)) %>%
ggplot() +
geom_rect(aes(xmin=rank-0.5, xmax=rank+0.5, ymin=team, fill=team,
ymax=as.numeric(team)+n*0.75/max(n))) +
geom_text(aes(x=rank, y=as.numeric(team)-0.1, label=n)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930",
"#00143F", "#0C264C", "#192E6C", "#136677",
"#203731"), name = NULL) +
ylab("team")
I especially like the level of fine control I get from geom_rect rather than ridgelines. But you do lose out on the nice bounding line drawn around each ridgeline, so if that's important then go with the other answer.
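A possible middle ground, sketched under my own assumptions: keep the binline ridges on the raw rows but hand the text layer a summarised table, so each bin gets exactly one label instead of one per original row. The data here is a smaller made-up version of the sample in the question, and top10 and bin_counts are names I introduced.

```r
library(dplyr)
library(ggplot2)
library(ggridges)

set.seed(1)
# Smaller made-up version of the sample data from the question
results <- data.frame(
  team = rep(c("Jets", "Giants", "Vikings"), 300),
  rank = sample(1:20, 900, replace = TRUE)
)

top10 <- filter(results, rank <= 10)      # raw rows that feed the ridges
bin_counts <- count(top10, team, rank)    # one row per team/rank bin, with n

p <- ggplot(top10, aes(x = rank, y = team, group = team)) +
  geom_density_ridges2(aes(fill = team), stat = "binline", binwidth = 1,
                       scale = 0.9, draw_baseline = TRUE) +
  geom_text(data = bin_counts, aes(label = n), nudge_y = 0.2, size = 3) +
  scale_x_continuous(breaks = 1:10) +
  theme_ridges() +
  theme(legend.position = "none")

print(p)
```

Filtering to rank <= 10 up front also avoids the removed-rows warnings that come from clipping the x scale.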

How can I round values in geom_text that are contained within other text?

I have some code:
library(dplyr)
library(tidyr)
library(ggplot2)
eu_chats %>%
spread(type, count, fill = 0) %>% # Spread the count column in missed and completed
mutate(Total = Completed + Missed) %>% # Create the Total column
ggplot(aes(date, Total)) +
geom_col(aes(fill = "Total"),
colour = "black") + # total bar (with stat = "identity")
geom_col(aes(y = Missed, fill = "Missed"),
colour = "black") + # missed bar
geom_text(aes(label = paste("Total chats:", Total)), # add total label
hjust = -0.05, vjust = 1) +
geom_text(aes(label = paste("Missed chats:", Missed, "(", Missed/Total*100, "%)")), # add missed label and calculate percentage
hjust = -0.05, vjust = -0.5, color = "red") +
scale_fill_manual(name = "", # Manual fill scale
values = c("Total" = "forestgreen", "Missed" = "red")) +
facet_grid(retailer~.) + # Displayed per retailer
scale_y_continuous(limits = c(0, max(eu_chats$count) * 2))
Which produces this chart:
How can I round the values in the Missed chats: label to two decimal places?
Other solutions seem only to focus on those cases where no other text is present other than the value itself.
Can this be achieved in my case, whilst still preserving the label text?
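One way this could be done, as a sketch: format the number before it goes into the label, for example with sprintf(), where "%.2f" fixes two decimals and "%%" prints a literal percent sign. The chats data frame below is made up (eu_chats was not posted) and is assumed to already be in the spread form with Completed and Missed columns.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for eu_chats after spread(): one row per date
chats <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-02")),
  Completed = c(2, 2),
  Missed = c(1, 2)
) %>%
  mutate(
    Total = Completed + Missed,
    # surrounding text is kept; only the percentage is rounded to 2 decimals
    missed_label = sprintf("Missed chats: %s (%.2f%%)", Missed, Missed / Total * 100)
  )

p <- ggplot(chats, aes(date, Total)) +
  geom_col(fill = "forestgreen", colour = "black") +
  geom_col(aes(y = Missed), fill = "red", colour = "black") +
  geom_text(aes(label = missed_label), vjust = -0.5, colour = "red")

print(p)
```

Using round(Missed / Total * 100, 2) inside paste() should also work, but it drops trailing zeros (50 rather than 50.00), which is why sprintf() is used here.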

How to separately label and scale double y-axis in ggplot2?

I have a test dataset like this:
df_test <- data.frame(
proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
proj_ID = c(1, 2, 3, 4, 5, 6, 7),
stage = c('B','B','B','A','C','A','C'),
value = c(15,15,20,20,20,70,5)
)
Preparation for viz:
input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
filter(proj_manager=='Emma') %>%
do({
proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
mutate(., proj_value_by_manager = proj_value_by_manager)
}) %>%
group_by(stage) %>%
do({
sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
mutate(.,sum_value_byStage= sum_value_byStage)
}) %>%
mutate(count_proj = length(unique(proj_ID)))
commapos <- function(x, ...) {
format(abs(x), big.mark = ",", trim = TRUE,
scientific = FALSE, ...) }
Visualization:
ggplot (input, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-proj_value_by_manager),
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage), hjust = 5) +
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
My questions are:
Why are the y-values not showing up right? E.g. C is labeled 20, but the bar nearly reaches 100 on the scale.
How do I adjust the position of the labels so that each one sits on top of its bar?
How do I re-scale the y-axis so that both the very short 'count of projects' bars and the long 'project value' bars are displayed well?
Thank you all for the help!
I think your issues are coming from the fact that:
(1) Your dataset has duplicated values. This causes geom_bar to add all of them together. For example there are 3 obs for B where proj_value_by_manager = 90 which is why the blue bar extends to 270 for that group (they all get added).
(2) In your second geom_bar you use y = -proj_value_by_manager, but in the geom_text that labels it you use sum_value_byStage. That's why the blue bar for A extends to 90 (since proj_value_by_manager is 90) but the label reads 20.
To get what I believe is the chart you want, you could do:
# Q1: de-duplicate the data so geom_bar doesn't erroneously sum repeated rows
input2 <- input[!duplicated(input[,-c(2,4)]),]
ggplot (input2, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
For your last question, there is no good way to display both of these. One option would be to rescale the small data and still label it with a 1 or 3. However, I didn't do this because once you scale down the blue bars the other bars look OK to me.
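For completeness, here is a rough sketch of the rescaling option mentioned above, which I did not use in the answer: stretch the project counts by a constant factor so the longest count bar matches the longest value bar, and label both sides with the original numbers. It uses df_test from the question; by_stage, scale_factor and count_scaled are names I made up, and the choice of factor is arbitrary.

```r
library(dplyr)
library(ggplot2)

df_test <- data.frame(
  proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
  proj_ID = c(1, 2, 3, 4, 5, 6, 7),
  stage = c('B','B','B','A','C','A','C'),
  value = c(15, 15, 20, 20, 20, 70, 5)
)

# One row per stage for Emma: number of projects and total value
by_stage <- df_test %>%
  filter(proj_manager == "Emma") %>%
  distinct(proj_ID, stage, value) %>%
  group_by(stage) %>%
  summarise(count_proj = n_distinct(proj_ID), sum_value = sum(value), .groups = "drop")

# Stretch counts so the longest count bar is as long as the longest value bar
scale_factor <- max(by_stage$sum_value) / max(by_stage$count_proj)
by_stage <- mutate(by_stage, count_scaled = count_proj * scale_factor)

p <- ggplot(by_stage, aes(x = stage)) +
  geom_col(aes(y = count_scaled)) +
  geom_col(aes(y = -sum_value), fill = "blue") +
  geom_text(aes(y = count_scaled, label = count_proj), hjust = -0.5) +  # original counts
  geom_text(aes(y = -sum_value, label = sum_value), hjust = 1.5) +      # original values
  geom_hline(yintercept = 0, linetype = 1) +
  scale_y_continuous(breaks = NULL,                         # two units share this axis
                     expand = expansion(mult = 0.15)) +     # room for the labels
  coord_flip() +
  labs(x = "stage", y = NULL, title = "Commitment | Projects") +
  theme(plot.title = element_text(hjust = 0.5))

print(p)
```

Because the two sides now use different units, the axis ticks are hidden and the numbers on the bars carry the values.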
