Mixing stacked and unstacked bars in single graph - r

I'm trying to mix stacked and unstacked bars in one graph.
Here is sample code:
Indicators <- c("Z","X","C","NO C","NA C","LOW C","HIGH C","D","E","F","G","H","LOW H","MED H","HIGH H",
"I","LOW I","MED I","HIGH I","J","LOW J","MED J","HIGH J")
Class_1 <- c(0.44,0.01,"",0.6,0.13,0.11,0.16,0.08,0.01,0.14,0.21,"",0.55,
0.23,0.22,"",0.4,0.31,0.29,"",0.41,0.23,0.36)
data <- data.frame(Indicators,Class_1)
How can I make a graph with both unstacked and stacked bars? I really appreciate all the help there is! Thanks!

Basically you want a stacked barchart even if for some categories there is nothing to stack. To this end you have to do some data wrangling and cleaning like getting rid of the empty value rows and splitting the Indicators column in two:
library(dplyr)
library(ggplot2)
library(tidyr)
data1 <- data |>
filter(Class_1 != "") |>
mutate(Class_1 = as.numeric(Class_1)) |>
separate(Indicators, into = c("level", "which"), fill = "left") |>
replace_na(list(level = "Total"))
ggplot(data1, aes(which, Class_1, fill = level)) +
geom_col()
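As a quick sanity check on the wrangling above, the following sketch rebuilds the sample data and confirms that each split indicator (C, H, I, J) stacks to a total of 1, while the unsplit ones get the single level "Total". The name `check` is made up for illustration and is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; Class_1 is character because of the "" entries
Indicators <- c("Z","X","C","NO C","NA C","LOW C","HIGH C","D","E","F","G","H",
                "LOW H","MED H","HIGH H","I","LOW I","MED I","HIGH I",
                "J","LOW J","MED J","HIGH J")
Class_1 <- c(0.44,0.01,"",0.6,0.13,0.11,0.16,0.08,0.01,0.14,0.21,"",0.55,
             0.23,0.22,"",0.4,0.31,0.29,"",0.41,0.23,0.36)
data <- data.frame(Indicators, Class_1)

# Same wrangling as in the answer
data1 <- data |>
  filter(Class_1 != "") |>
  mutate(Class_1 = as.numeric(Class_1)) |>
  separate(Indicators, into = c("level", "which"), fill = "left") |>
  replace_na(list(level = "Total"))

# Hypothetical check: number of stacked segments and bar height per x value
check <- data1 |>
  group_by(which) |>
  summarise(n_levels = n(), total = sum(Class_1))
print(check)
```

Bars with `n_levels` equal to 1 are the "unstacked" ones; the rest are stacked from their levels.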

Update II:
library(tidyverse)
my_pattern1 <- c("No|Non|Low-risk|High-risk|Low|Medium|High")
data %>%
filter(Class_1 != "") %>%
mutate(x = case_when(str_detect(Indicators, 'CSA|childhood sexual abuse') ~ "Childhood sexual abuse",
str_detect(Indicators, 'discrimination') ~ "discrimination"),
y = str_extract(Indicators, my_pattern1)) %>%
mutate(z = str_remove_all(Indicators, my_pattern1)) %>%
mutate(y= ifelse(is.na(y), "Total", y)) %>%
ggplot(aes(x = z, y = as.numeric(Class_1), fill= y))+
geom_bar(stat = "identity", position = "stack")+
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1))
Update: (@camille is right. I have now adapted my answer and removed the first one.)
library(tidyverse)
data %>%
separate(Indicators, c("Indicators_1", "Indicators_2")) %>%
mutate(Indicators_3 = ifelse(is.na(Indicators_2), "Total", Indicators_1),
Indicators_2 = coalesce(Indicators_2, Indicators_1)) %>%
filter(Class_1 != "") %>%
ggplot(aes(x = Indicators_2, y = as.numeric(Class_1), fill= Indicators_3))+
geom_bar(stat = "identity", position = "stack")

Related

How to stack partially matched time periods with geom_area (ggplot2)?

With the following example, I get a plot where the areas are not stacked. I would like to stack them. This should be a partial stack, intensity starting at 0.5, then reaching 0.8 where stacked, then reaching 0.3 at the end.
I assume that the position argument does not work as the start and end date are not the same.
Am I missing an argument that could solve this issue? Or maybe another geom?
Do I have to subset the data into days to get the desired output? If so, how can I achieve that?
Thanks in advance,
# Library
library(tidyverse)
library(lubridate)
# Data
df <- tibble(date_debut = as_date(c("2022-09-28", "2022-10-05")),
intensity = c(0.5, 0.3),
duration = days(c(14, 10)),
type = (c("a", "b")))
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
pivot_longer(cols = c(date_debut, date_fin),
names_to = "date_type",
values_to = "date")
# Plot
df %>%
ggplot(aes(x = date, y = intensity, fill = type))+
geom_area(position = "stack")
This is a tough data wrangling problem. The area plots only stack where the points in the two series have the same x values. The following will achieve that, though it's quite a profligate approach.
df %>%
mutate(interval = interval(date_debut, date_debut + duration)) %>%
group_by(type) %>%
summarize(time = seq(as.POSIXct(min(df$date_debut)),
as.POSIXct(max(df$date_debut + df$duration)), by = 'min'),
intensity = ifelse(time %within% interval, intensity, 0)) %>%
ggplot(aes(x = time, y = intensity, fill = type)) +
geom_area(position = position_stack())
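To make the "same x values" point concrete, here is a minimal base-R sketch (the names `periods`, `all_days`, `grid`, and `stacked` are made up for illustration, and a daily grid is assumed instead of the per-minute one above). It puts both series on one shared grid, sets the intensity to 0 outside each period, and sums per day, which is the height a stacked area would show.

```r
# The two periods from the question, with explicit start and end dates
periods <- data.frame(
  type      = c("a", "b"),
  start     = as.Date(c("2022-09-28", "2022-10-05")),
  intensity = c(0.5, 0.3),
  n_days    = c(14, 10)
)
periods$end <- periods$start + periods$n_days

# One shared daily grid covering both periods
all_days <- data.frame(date = seq(min(periods$start), max(periods$end), by = "day"))

# Cartesian product: every day paired with every series
grid <- merge(all_days, periods, by = NULL)

# Intensity counts only inside the series' own period, otherwise 0
inside <- grid$date >= grid$start & grid$date <= grid$end
grid$y <- ifelse(inside, grid$intensity, 0)

# Stacked height per day = sum over series
stacked <- tapply(grid$y, format(grid$date), sum)
print(stacked[c("2022-09-28", "2022-10-05", "2022-10-14")])
```

Because every series now has a value on every day, `geom_area(position = "stack")` has matching x values to stack on.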
Allan Cameron's answer inspired me to look further into complete.
The proposed answer was solving my question, so I accepted. However, it is indeed more complex than needed.
I solved it this way:
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
group_by(type) %>%
complete(date_debut = seq(min(date_debut), max(date_fin), by = "1 day")) %>%
fill(intensity) %>%
select(date_debut, intensity, type)
ggplot(df, aes(x = date_debut, y = intensity, fill = type)) +
geom_area()+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
To avoid the weird empty space, it is fine for me to use geom_col (the question was about geom_area, so no worries).
ggplot(df, aes(x = date_debut, y = intensity, fill = type, colour = type)) +
geom_col(width = 0.95)+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")

ggplot percentage positive per factor class

Apologies for the noob question.
I have scoured the forum but have only come across counts as proportions per 100% bar, or from all groups combined.
Can someone please help me get the following into a form where only the % positive ("yes") is shown per "Parent" group, in a much neater fashion? I.e. a simple bar chart showing the % yes, maybe with the percentage above each bar?
library(dplyr)
library(ggplot2)
library(forcats)
test1<- data.frame(Class= c("yes","yes","NA","yes","NA","yes","yes","NA","yes","yes"),
Parent= c("LN1","LN2","LN1","LN2","LN1","LN2","LN1","LN2","LN1","LN2"))
test1<- test1 %>% mutate_if(is.character, factor)
test1 %>%
mutate(Parent= fct_relevel(Parent, "LN1", "LN2")) %>%
ggplot(aes(Parent, fill= Class))+
geom_bar(position = "fill",
stat= "count") +
geom_label(aes(y = ..prop..,
label = Class),
position = "fill",
stat = "Count")
Here you go:
test1 %>% mutate(Parent= fct_relevel(Parent, "LN1", "LN2")) %>%
group_by(Parent) %>%
count(Class) %>%
mutate(pct = n/sum(n)) %>%
filter(Class == "yes") %>%
ggplot(aes(Parent, pct))+
geom_bar(stat = "identity") +
geom_text(aes(label = scales::percent(pct)), color = "black", vjust = -1) +
scale_y_continuous(limits = c(0, 1))
This should give you one bar per Parent showing the share of "yes", with the percentage printed above each bar (use themes and other formatting as desired).
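The percentages can also be checked without plotting. The following base-R sketch (the names `pct_yes` and `labels` are made up for illustration) computes the share of "yes" per Parent from the sample data and formats it the way a bar label would read.

```r
# Sample data from the question; "NA" is a literal string here, not a missing value
test1 <- data.frame(
  Class  = c("yes","yes","NA","yes","NA","yes","yes","NA","yes","yes"),
  Parent = c("LN1","LN2","LN1","LN2","LN1","LN2","LN1","LN2","LN1","LN2")
)

# Share of "yes" within each Parent group (the mean of a logical is a proportion)
pct_yes <- sapply(split(test1$Class == "yes", test1$Parent), mean)

# Text labels such as one might place above the bars
labels <- sprintf("%.0f%%", 100 * pct_yes)
print(data.frame(Parent = names(pct_yes), pct = unname(pct_yes), label = labels))
```

Note that the "NA" rows count toward the denominator here, as they do in the answer's `n/sum(n)`.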

how to control the color of ggrepel segments

Going to try this again with a better MRE...for context, here's the product I'm currently trying to improve
What I'm trying to do is get the lines from the endpoints to the labels to be the same color as the data lines.
For purposes of this question we can work with this script
library(ggplot2)
library(babynames)
library(dplyr)
library(ggrepel)
library(ggsci)
data <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Mary", "Minnie")) %>%
filter(sex=="F")
data <- data %>% group_by(name) %>%
mutate(change = n - lag(n)) %>%
mutate(meanC = mean(change, na.rm = TRUE)) %>%
ungroup()
data$label <- paste(data$name,"\n",round(data$meanC,0),sep="" )
minYear = min(data$year)
maxYear = max(data$year)
#endpoint layer
Endpoints <- data %>%
group_by(name) %>%
filter(year == max(year)) %>%
select(year, name, n, label) %>%
ungroup()
namePlot <- data %>%
ggplot(mapping = aes(x=year, y=n)) +
geom_line(aes(color=name), show.legend = FALSE) +
coord_cartesian(xlim = c(minYear, maxYear+10)) +
scale_color_ucscgb() +
geom_point(data = Endpoints, size=1.5, shape=21,
aes(color=name, fill=name), show.legend=FALSE) +
geom_label_repel(data=Endpoints, aes(label=label),
color = c("forestgreen","red")[1+grepl("\\-\\d",Endpoints$label)],
show.legend = FALSE,
vjust = 0, xlim=c(maxYear+3,maxYear+10), size=3, direction='y')
print(namePlot)
which produces this plot
The color of the labels is controlled by color = c("forestgreen","red")[1+grepl("\\-\\d",Endpoints$label)], so that, in this case, data with a positive value in the label is green and data with a negative value is red. What I'd like to do is make the connecting lines from the endpoints to the label boxes the same color as the data lines, which are controlled by geom_line(aes(color=name), show.legend = FALSE).
In the ggrepel docs there is a segment.color parameter that can control the color of the line segment, but it is not an aesthetic. So it appears it has to be "hard-coded" like segment.color="red" which doesn't really help me. I also found this discussion about the issue that seemed to present a solution, but I have been unable to get it to work. Part of the issue there is that it involves scale_color_discrete(aesthetics = c("color", "segment.color")) and I already have scale_color_ucscgb() so I get a warning about replacing scales...
Any guidance would be most appreciated.
Working version based on guidance from @aosmith
library(ggplot2)
library(babynames)
library(dplyr)
library(ggrepel)
library(ggsci)
data <- babynames %>%
filter(name %in% c("Ashley", "Patricia", "Mary", "Minnie")) %>%
filter(sex=="F")
data <- data %>% group_by(name) %>%
mutate(change = n - lag(n)) %>%
mutate(meanC = mean(change, na.rm = TRUE)) %>%
ungroup()
data$label <- paste(data$name,"\n",round(data$meanC,0),sep="" )
minYear = min(data$year)
maxYear = max(data$year)
#endpoint layer
Endpoints <- data %>%
group_by(name) %>%
filter(year == max(year)) %>%
select(year, name, n, label) %>%
ungroup()
namePlot <- data %>%
ggplot(mapping = aes(x=year, y=n)) +
geom_line(aes(color=name), show.legend = FALSE) +
coord_cartesian(xlim = c(minYear, maxYear+15)) +
geom_point(data = Endpoints, size=1.5, shape=21,
aes(color=name, fill=name), show.legend=FALSE) +
geom_label_repel(data=Endpoints, aes(label=label,
segment.color=name),
color = c("forestgreen","red")[1+grepl("\\-\\d",Endpoints$label)],
show.legend = FALSE,
force = 50,
vjust = 0, xlim=c(maxYear+5,maxYear+12), size=3, direction='y') +
scale_color_discrete(aesthetics = c("color", "segment.color"))
print(namePlot)
produces

Reorder vertical axis alphabetically and change position of binary variable of stacked percent bar graph (ggplot2)

I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to:
1) Create a stacked percentage bar graph with the IDs on the vertical axis (arranged alphabetically, with A at the top) and the percent on the horizontal axis. I can't find code that sorts the IDs alphabetically automatically; my original dataset has quite a number of categories, so arranging them manually would be difficult.
2) Show the infected category (1) in red, to the left of the blue non-infected category (0). Is it also possible to change the legend label from "Non_infected" to "Non-infected"?
3) Include in each displayed ID the number of times that ID appears in the dataset, e.g. "A (n=6)", "B (n=3)".
My sample code is as follows:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = T)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!
First, we need to add a count to your original data.frame.
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected,
count = n())
Then, we augment our ID column, turn the Infection Status into a factor variable, use forcats::fct_rev to reverse the ID ordering, and use scale_fill_manual to control your legend.
df.plot %>%
mutate(ID = paste0(ID, " (n=", count, ")")) %>%
select(-count) %>%
melt() %>%
mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_manual("Infection Status",
values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
labels = c("Non-Infected", "Infected"))+
coord_flip()
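The labelling and ordering steps can be sketched on their own in base R. This is only an illustration under the assumption that the IDs sort alphabetically as plain strings; the names `counts`, `id_label`, and `id_factor` are hypothetical.

```r
# IDs from the question
ID <- c(rep("A", 6), rep("B", 3), rep("C", 7), rep("D", 9))

# Count how often each ID occurs and build labels such as "A (n=6)"
counts <- table(ID)
id_label <- paste0(names(counts), " (n=", as.vector(counts), ")")

# After coord_flip() the first factor level is drawn at the bottom,
# so the levels are reversed to put "A" at the top
id_factor <- factor(id_label, levels = rev(sort(id_label)))
print(levels(id_factor))
```

This is the same effect `forcats::fct_rev()` has in the answer: the alphabetical order is kept, just reversed for the flipped axis.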

bar chart of row freq ggplot2

I have the following data:
dataf <- read.table(text = "index,group,taxa1,taxa2,taxa3,total
s1,g1,2,5,3,10
s2,g1,3,4,3,10
s3,g2,1,2,7,10
s4,g2,0,4,6,10", header = T, sep = ",")
I'm trying to make a stacked bar plot of the frequencies in the data, counting across each row (not down a column): first for each index (s1, s2, s3, s4) and then for each group (g1, g2), with the taxa stacked. I can only figure out how to graph one taxon, not all three stacked on each other.
Here are some examples of what I'm trying to make:
These were made in Google Sheets, so they don't look like ggplot output, but it would be easier to make them in R with ggplot2 because the real data set is larger.
You would need to reshape the data.
Here is my solution (broken down by plot)
For first plot
library(tidyverse)
##For first plot
prepare_data_1 <- dataf %>% select(index, taxa1:taxa3) %>%
gather(taxa,value, -index) %>%
mutate(index = str_trim(index)) %>%
group_by(index) %>% mutate(prop = value/sum(value))
##Plot 1
prepare_data_1 %>%
ggplot(aes(x = index, y = prop, fill = fct_rev(taxa))) + geom_col()
For second plot
##For second plot
prepare_data_2 <- dataf %>% select(group, taxa1:taxa3) %>%
gather(taxa,value, -group) %>%
mutate(group = str_trim(group)) %>%
group_by(group) %>% mutate(prop = value/sum(value))
##Plot 2
prepare_data_2 %>%
ggplot(aes(x = group, y = prop, fill = fct_rev(taxa))) + geom_col()
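The row-wise proportions behind both plots can be verified with a short base-R sketch. It assumes the same four sample rows, entered directly instead of via read.table; the names `taxa`, `by_index`, and `by_group` are made up for illustration.

```r
# Sample data from the question (the total column is not needed)
dataf <- data.frame(
  index = c("s1", "s2", "s3", "s4"),
  group = c("g1", "g1", "g2", "g2"),
  taxa1 = c(2, 3, 1, 0),
  taxa2 = c(5, 4, 2, 4),
  taxa3 = c(3, 3, 7, 6)
)

# Counts as a matrix with one row per index
taxa <- as.matrix(dataf[, c("taxa1", "taxa2", "taxa3")])
rownames(taxa) <- dataf$index

# Plot 1: proportions across each row (each index sums to 1)
by_index <- prop.table(taxa, margin = 1)

# Plot 2: add up the rows of each group first, then take row proportions
by_group <- prop.table(rowsum(taxa, dataf$group), margin = 1)

print(by_index)
print(by_group)
```

Each row of `by_index` and `by_group` sums to 1, which is why the stacked bars in both plots reach the same height.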
## You need to reshape the data before plotting.
library(reshape2)
library(ggplot2)
dfm = melt(dataf, id.vars=c("index","group"),
measure.vars=c("taxa1","taxa2","taxa3"),
variable.name="variable", value.name="values")
ggplot(dfm, aes(x = index, y = values, group = variable)) +
geom_col(aes(fill=variable)) +
geom_text(aes(label = values), position = position_stack(vjust = .5), size = 3) +
# theme_gray() is a complete theme, so it goes before theme(); placed after, it would reset the axis text settings
theme_gray() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))
