Adding labels to individual % inside geom_bar() using R / ggplot2 [duplicate] - r

bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
success is a factor with 4 categories, one for each of the 4 outcomes in the data set, and the percentages are each category's share within a location. I could easily calculate them separately, but as the plot is currently built, the bar segments are generated by geom_bar(aes(fill = success)).
data <- as.data.frame(c(1,1,1,1,1,1,2,2,3,3,3,3,4,4,4,4,4,4,
4,4,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7))
data[["success"]] <- c("a","b","c","c","d","d","a","b","b","b","c","d",
"a","b","b","b","c","c","c","d","a","b","c","d",
"a","b","c","c","d","d","a","b","b","c","d")
names(data) <- c("location","success")
bgraph <- ggplot(data = data, aes(x = location)) +
geom_bar(aes(fill = success))
bgraph
How do I get labels on the individual percentages? More specifically, I want 4 percentages for each bar: one each for yellow, light orange, orange, and red. The percentages within each bar add up to 100%.

Maybe there is a way to do this in ggplot directly, but with some pre-processing in dplyr you can get the desired output.
library(dplyr)
library(ggplot2)
data %>%
count(location, success) %>%
group_by(location) %>%
mutate(n = n/sum(n) * 100) %>%
ggplot() + aes(x = location, n, fill = success,label = paste0(round(n, 2), "%")) +
geom_bar(stat = "identity") +
geom_text(position=position_stack(vjust=0.5))
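For what it's worth, here is a minimal sketch of doing it in ggplot2 directly, without the dplyr step. It assumes ggplot2 3.3.0 or later (for after_stat()) and uses ave() to divide each count by the total of its own bar. The object name p and the one-call construction of data are only for illustration.

```r
library(ggplot2)

# Same data as in the question, built in one call
data <- data.frame(
  location = rep(1:7, times = c(6, 2, 4, 8, 4, 6, 5)),
  success = c("a","b","c","c","d","d","a","b","b","b","c","d",
              "a","b","b","b","c","c","c","d","a","b","c","d",
              "a","b","c","c","d","d","a","b","b","c","d")
)

p <- ggplot(data, aes(x = location, fill = success)) +
  geom_bar() +
  geom_text(
    # count / (sum of counts sharing the same x) = share within each bar
    aes(label = after_stat(
      scales::percent(count / ave(count, x, FUN = sum), accuracy = 1)
    )),
    stat = "count",
    position = position_stack(vjust = 0.5)
  )
```

Printing p draws the plot. Because fill = success sits in the top-level aes(), the text layer is grouped the same way as the bars, so the labels should land in the matching segments.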

How about creating a summary frame with the relative frequencies within location and then using that with geom_col() and geom_text()?
# Create summary stats
library(dplyr)
library(ggplot2)
library(forcats) # provides fct_rev()
tots <-
  data %>%
  group_by(location, success) %>%
  summarise(
    n = n()
  ) %>% # the result is still grouped by location
  mutate(
    rel = round(100 * n / sum(n)) # percentage within each location
  )
# Plot
ggplot(data = tots, aes(x = location, y = n)) +
  geom_col(aes(fill = fct_rev(success))) + # could only get it with this reversed
  geom_text(aes(label = rel), position = position_stack(vjust = 0.5))
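A possible explanation for the reversal, offered as a guess rather than a confirmed diagnosis: the geom_text() layer has no discrete aesthetic, so it is not grouped by success and stacks its labels in a different order from the bars, which stack by fill. A minimal sketch that gives the text layer the same grouping via group = success, so that fct_rev() should no longer be needed (the data and tots are rebuilt here, with count(), so the snippet stands alone):

```r
library(dplyr)
library(ggplot2)

data <- data.frame(
  location = rep(1:7, times = c(6, 2, 4, 8, 4, 6, 5)),
  success = c("a","b","c","c","d","d","a","b","b","b","c","d",
              "a","b","b","b","c","c","c","d","a","b","c","d",
              "a","b","c","c","d","d","a","b","b","c","d")
)

tots <- data %>%
  count(location, success) %>%
  group_by(location) %>%
  mutate(rel = round(100 * n / sum(n))) %>%
  ungroup()

p <- ggplot(tots, aes(x = location, y = n)) +
  geom_col(aes(fill = success)) +
  # group = success makes the labels stack in the same order as the bars
  geom_text(aes(label = rel, group = success),
            position = position_stack(vjust = 0.5))
```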

Related

How to make a dual axis in ggplot R

I have made a time series plot for total count data of 4 different species. As you can see, the sharksucker results have a much higher count than the other 3 species. To see the trends of the other 3 species, they need to be plotted separately (or on a smaller y axis). However, I have a figure limit in my master's paper, so I was trying to create a dual-axis plot or split the y axis in two. Does anyone know of a way I could do this?
library(tidyverse)
library(readxl) # read_xlsx() is not attached by library(tidyverse)
library(reshape2)
dat <- read_xlsx("ReefPA.xlsx")
dat1 <- dat
dat1$Date <- format(dat1$Date, "%Y/%m")
plot_dat <- dat1 %>%
group_by(Date) %>%
summarise(Sharksucker_Remora = sum(Sharksucker_Remora)) %>%
melt("Date") %>%
filter(Date > '2018-01-01') %>%
arrange(Date)
names(plot_dat) <- c("Date", "Species", "Count")
ggplot(data = plot_dat) +
geom_line(mapping = aes(x = Date, y = Count, group = Species, colour = Species)) +
stat_smooth(method=lm, aes(x = Date, y = Count, group = Species, colour = Species)) +
scale_colour_manual(values=c(Golden_Trevally="goldenrod2", Red_Snapper="firebrick2", Sharksucker_Remora="darkolivegreen3", Juvenile_Remora="aquamarine2")) +
xlab("Date") +
ylab("Total Presence Per Month") +
theme(legend.title = element_blank()) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
The thing is, the problem you're trying to solve doesn't seem like a 2nd Y axis issue. The problem here is of relative scale of the species. You might want to think of something like standardizing the initial species presence to 100 and showing growth or decline from there.
Another option would be faceting by species.
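Both suggestions can be sketched roughly as follows. The counts below are made up for illustration (the real data live in ReefPA.xlsx, which is not shown), and the names indexed, p_index and p_facet are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical monthly counts standing in for the real data
plot_dat <- data.frame(
  Date = rep(seq(as.Date("2018-01-01"), by = "month", length.out = 6), times = 2),
  Species = rep(c("Sharksucker_Remora", "Red_Snapper"), each = 6),
  Count = c(120, 150, 140, 180, 200, 210, 4, 6, 5, 8, 7, 9)
)

# Option 1: index every species to 100 at its first observation
indexed <- plot_dat %>%
  group_by(Species) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(Index = 100 * Count / first(Count)) %>%
  ungroup()

p_index <- ggplot(indexed, aes(x = Date, y = Index, colour = Species)) +
  geom_line()

# Option 2: one panel per species, each with its own y scale
p_facet <- ggplot(plot_dat, aes(x = Date, y = Count)) +
  geom_line() +
  facet_wrap(~ Species, scales = "free_y", ncol = 1)
```

The faceted version still counts as a single figure, which may help with a figure limit.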

ggplot2 barplot - adding percentage labels inside the stacked bars but retaining counts on the y-axis

I have created a stacked barplot with the counts of a variable. I want to keep these as counts, so that the different bar sizes represent different group sizes. However, inside the bars I would like to add labels that show the proportion of each stack as a percentage.
I managed to create the stacked plot of counts for every group. I have also created the labels, and they are placed correctly. What I struggle with is how to calculate the percentage there.
I have tried this, but I get an error:
dataex <- iris %>%
dplyr::group_by(group, Species) %>%
dplyr::summarise(N = n())
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
geom_bar(position="stack", stat="identity") +
geom_text(aes(label = ifelse((..count..)==0,"",scales::percent((..count..)/sum(..count..)))), position = position_stack(vjust = 0.5), size = 3) +
theme_pubclean()
Error in (count) == 0 : comparison (1) is possible only for atomic and list types
desired result:
Well, I just found an answer, or at least a workaround. Maybe this will help someone in the future: calculate the percentage before the ggplot call and then just use that column as the labels.
dataex <- iris %>%
dplyr::group_by(group, Species) %>%
dplyr::summarise(N = n()) %>%
dplyr::mutate(pct = paste0((round(N/sum(N)*100, 2))," %"))
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
geom_bar(position="stack", stat="identity") +
geom_text(aes(label = pct), position = position_stack(vjust = 0.5), size = 3) + # refer to the column by name; avoid dataex$pct inside aes()
theme_pubclean()
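As a guess at why the first attempt failed: geom_text() uses stat = "identity" by default, so there is no computed count variable, and count probably resolved to the dplyr function instead, hence the odd comparison error. Since iris has no group column, here is a self-contained sketch of the workaround using mtcars as a stand-in (gear plays the role of group and cyl the role of Species; both are my substitutions, not the asker's data):

```r
library(dplyr)
library(ggplot2)

# Counts per (gear, cyl), plus each count's share within its gear
dataex <- mtcars %>%
  count(gear, cyl, name = "N") %>%
  group_by(gear) %>%
  mutate(pct = scales::percent(N / sum(N), accuracy = 0.1)) %>%
  ungroup()

# y stays a count, so bar heights still reflect group sizes
p <- ggplot(dataex, aes(x = factor(gear), y = N, fill = factor(cyl))) +
  geom_col() +
  geom_text(aes(label = pct), position = position_stack(vjust = 0.5), size = 3)
```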

Reorder vertical axis alphabetically and change position of binary variable of stacked percent bar graph (ggplot2)

I have a dataset with two variables: 1) ID, 2) Infection Status (Binary:1/0).
I would like to use ggplot2 to:
1) Create a stacked percentage bar graph with the various IDs on the vertical axis (arranged alphabetically, with A on top) and the percent on the horizontal axis. I can't seem to find code that will sort the IDs alphabetically automatically; my original dataset has quite a number of categories, so arranging them manually would be difficult.
2) Have the infected category (1) in red and to the left of the blue non-infected category (0). Is it also possible to change the legend label from "Non_infected" to "Non-infected"?
3) Have the displayed ID in the plot include the number of times the ID appears in the dataset, e.g. "A (n=6)", "B (n=3)".
My sample code is as follows:
ID <- c("A","A","A","A","A","A",
"B","B","B",
"C","C","C","C","C","C","C",
"D","D","D","D","D","D","D","D","D")
Infection <- sample(c(1, 0), size = length(ID), replace = TRUE)
df <- data.frame(ID, Infection)
library(ggplot2)
library(dplyr)
library(reshape2)
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected)
df.plot %>%
melt() %>%
ggplot(aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_discrete(guide = guide_legend(title = "Infection Status")) +
coord_flip()
Right now I managed to get this output:
I hope to get this:
Thank you so much!
First, we need to add a count to your summary data frame.
df.plot <- df %>%
group_by(ID) %>%
summarize(Infected = sum(Infection)/n(),
Non_Infected = 1-Infected,
count = n())
Then, we augment our ID column, turn the Infection Status into a factor variable, use forcats::fct_rev to reverse the ID ordering, and use scale_fill_manual to control your legend.
df.plot %>%
mutate(ID = paste0(ID, " (n=", count, ")")) %>%
select(-count) %>%
melt() %>%
mutate(variable = factor(variable, levels = c("Non_Infected", "Infected"))) %>%
ggplot(aes(x = forcats::fct_rev(ID), y = value, fill = variable)) +
geom_bar(stat = "identity", position = "stack") +
xlab("ID") +
ylab("Percent Infection") +
scale_fill_manual("Infection Status",
values = c("Infected" = "#F8766D", "Non_Infected" = "#00BFC4"),
labels = c("Non-Infected", "Infected"))+
coord_flip()
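reshape2 is retired, so in case it helps, here is a hedged sketch of the same data preparation with tidyr::pivot_longer() swapped in for melt(). The set.seed() call and the name df.long are my additions; the result can be passed to the same ggplot() call as above.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
ID <- rep(c("A", "B", "C", "D"), times = c(6, 3, 7, 9))
df <- data.frame(ID, Infection = sample(c(1, 0), length(ID), replace = TRUE))

df.long <- df %>%
  group_by(ID) %>%
  summarize(Infected = sum(Infection) / n(),
            Non_Infected = 1 - Infected,
            count = n()) %>%
  mutate(ID = paste0(ID, " (n=", count, ")")) %>%  # e.g. "A (n=6)"
  select(-count) %>%
  # long format: one row per ID and infection status
  pivot_longer(c(Infected, Non_Infected),
               names_to = "variable", values_to = "value") %>%
  mutate(variable = factor(variable, levels = c("Non_Infected", "Infected")))
```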

is it possible to ggplot grouped partial boxplots w/o facets w/ a single `geom_boxplot()`?

I needed to add some partial boxplots to the following plot:
library(tidyverse)
foo <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(value = rnorm(n()) + 10 * as.integer(group)) %>%
ungroup()
foo %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE)
I would like to add a grid of 2 x 4 = 8 boxplots (4 per group) to the plot above. Each boxplot should cover a consecutive selection of 25 (or n) points in each group. That is, the first two boxplots represent the points between the 1st and the 25th (one boxplot below for group a, and one above for group b). Next to them come two more boxplots for the points between the 26th and the 50th, and so on. It would be even better if they were not in a perfect grid (which I suppose would be both harder to obtain and uglier): I would prefer them to "follow" their corresponding smooth line!
All of that without using facets (because I have to insert them in a plot which is already faceted :-))
I tried:
bar <- foo %>%
group_by(group) %>%
mutate(cut = 12.5 * (time %/% 25)) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(x = cut))
but it doesn't work.
I tried to call geom_boxplot() using group instead of x
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = cut))
But it draws the boxplots without considering the groups and even loses the colors (and adding a redundant aes() including color = group doesn't help).
Finally, I decided to try it roughly:
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(data = filter(bar, group == "a"), aes(group = cut)) +
geom_boxplot(data = filter(bar, group == "b"), aes(group = cut))
And it works (maintaining even the correct colors from the main aes)!
Does someone know if it is possible to obtain it using a single call to geom_boxplot()?
Thanks!
This was interesting! I haven't tried to use geom_boxplot with a continuous x before and didn't know how it behaved. I think what is happening is that an explicit group aesthetic replaces the default grouping, which ggplot2 otherwise builds from all the discrete variables in the layer (here, colour), so the boxplots no longer split by the inherited or repeated colour aesthetic. I think this workaround does the trick: we combine the group and cut variables into group_cut, which takes 8 different values (one for each desired boxplot). Now we can map aes(group = group_cut) and get the desired output. I don't think this is particularly intuitive, and it might be worth raising on GitHub, since we usually expect aesthetics to combine nicely (e.g. combining colour and linetype works fine).
library(tidyverse)
bar <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(
value = rnorm(n()) + 10 * as.integer(group),
cut = 12.5 * ((time - 1) %/% 25), # modified this to prevent an extra boxplot
group_cut = str_c(group, cut)
) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, colour = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = group_cut), position = "identity")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-08-13 by the reprex package (v0.3.0)
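In case it is useful, the helper column is not strictly needed: interaction() can build the same combined grouping inside aes(). A minimal sketch under the same setup as above (the set.seed() call and the object name p are my additions, and geom_smooth() is left out to keep it short):

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
bar <- data.frame(
  time = 1:100,
  group = factor(sample(c("a", "b"), 100, replace = TRUE))
) %>%
  mutate(
    value = rnorm(n()) + 10 * as.integer(group),
    cut = 12.5 * ((time - 1) %/% 25)
  )

p <- ggplot(bar, aes(x = time, y = value, colour = group)) +
  geom_point() +
  # one boxplot per (group, cut) combination; colour is still inherited
  geom_boxplot(aes(group = interaction(group, cut)), position = "identity")
```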

Back to back bar chart with three levels: Can I center the plot?

I wish to create a back-to-back bar chart. In my data, I have a number of species observations (n) from 2017 and 2018. Some species occurred only in 2017, others occurred in both years, and some occurred only in 2018. I wish to depict this in a graph centered on the number of species occurring in both years, across multiple sites (a, b, c).
First, I create a data set:
n <- sample(1:50, 9)
reg <- c(rep("2017", 3), rep("Both",3), rep("2018", 3))
plot <- c(rep(c("a", "b", "c"), 3))
d4 <- data.frame(n, reg, plot)
I use ggplot to try to plot my graph - I have tried two ways:
library(ggplot2)
ggplot(d4, aes(plot, n, fill = reg)) +
geom_col() +
coord_flip()
ggplot(d4, aes(x = plot, y = n, fill = reg))+
coord_flip()+
geom_bar(stat = "identity", width = 0.75)
I get a plot similar to what I want. However, I would like the blue 'Both' bar to be in between the 2017 and 2018 bars. Further, and this is my main problem, I would like to center the 'Both' bar in the middle of the plot. The 2017 column should extend to the left and the 2018 column to the right. My question is somewhat similar to the one linked below; however, as I have only three levels in my graph rather than four, I cannot use the same approach.
Creating a stacked bar chart centered on zero using ggplot
I'm not sure this is the best way, but here is one way to do it:
library(dplyr)
d4pos <- d4 %>%
filter(reg != "2018") %>%
group_by(reg, plot) %>%
summarise(total = sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
d4neg <- d4 %>%
filter(reg != "2017") %>%
group_by(reg, plot) %>%
summarise(total = - sum(n)) %>%
ungroup() %>%
mutate(total = total * ifelse(reg == "Both", .5, 1))
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
coord_flip()
I generate two data frames for the total of each group. One contains the 2017 and (half of) Both, and the other contains the rest. The value for the 2018 data frame is flipped to plot on the negative side.
The output looks like this:
EDIT
If you want to have positive values in both directions for the horizontal axis, you can do something like this:
ggplot(data = d4pos, aes(x = plot, y = total, fill = reg)) +
geom_bar(stat = "identity") +
geom_bar(data = d4neg, stat = "identity", aes(x = plot, y = total, fill = reg)) +
scale_y_continuous(breaks = seq(-50, 50, by = 25),
labels = abs(seq(-50, 50, by = 25))) +
coord_flip()
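A variation on the same idea, sketched under a few assumptions of mine: everything goes into one data frame (called centred here) and a single geom_col(), and 2017 is put on the negative side so that it extends to the left as the question asks. It relies on position_stack() stacking positive and negative values separately and, as far as I know, drawing the last factor level next to zero, which is why "Both" is made the last level.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
d4 <- data.frame(
  n = sample(1:50, 9),
  reg = rep(c("2017", "Both", "2018"), each = 3),
  plot = rep(c("a", "b", "c"), 3)
)

# 2017 and half of Both go left (negative); 2018 and the other half go right
centred <- bind_rows(
  d4 %>% filter(reg == "2017") %>% mutate(total = -n),
  d4 %>% filter(reg == "Both") %>% mutate(total = -n / 2),
  d4 %>% filter(reg == "Both") %>% mutate(total = n / 2),
  d4 %>% filter(reg == "2018") %>% mutate(total = n)
) %>%
  # "Both" is the last level so that it is stacked next to zero
  mutate(reg = factor(reg, levels = c("2017", "2018", "Both")))

p <- ggplot(centred, aes(x = plot, y = total, fill = reg)) +
  geom_col() +
  scale_y_continuous(labels = abs) +  # show magnitudes on both sides
  coord_flip()
```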
