Barplot in ggplot - dodge position + counting - r

Hey, I have the following code:
library(ggplot2)
df <- data.frame(Type = c("A", "B", "A", "A", "B"), FLAG = c(1, 1, 0, 1, 0))
df
ggplot(df, aes(x = Type)) +
  geom_bar(stat = "count", aes(fill = factor(FLAG)), position = "dodge") +
  coord_flip() +
  stat_count(geom = "text", colour = "white", size = 3.5,
             aes(label = ..count..), position = position_stack(vjust = 0.5)) +
  theme_bw()
But it doesn't work as I want. The graph is OK, but instead of displaying the total number of observations for each type, I want to display the number for each flag (so instead of 2 for type "B" I want to display 1 and 1, because for "B" we have 1 observation with FLAG 1 and 1 observation with FLAG 0). What should I change?

If you map the interaction of Type and FLAG to the x axis, there is one bar per Type/FLAG combination, so each label shows the count for that combination.
library(ggplot2)
ggplot(df, aes(x = interaction(Type, FLAG))) +
  geom_bar(stat = "count",
           aes(fill = factor(FLAG)), position = "dodge") +
  coord_flip() +
  stat_count(geom = "text",
             aes(label = after_stat(count)),  # after_stat() replaces the deprecated ..count..
             position = position_stack(vjust = 0.5),
             colour = "white", size = 3.5) +
  theme_bw()
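If you prefer to keep a single axis entry per Type (with the FLAG 0 and FLAG 1 bars side by side) rather than one entry per Type/FLAG combination, the labels only need the same dodging as the bars instead of position_stack(). A minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat()); the shared `dodge` object and the `hjust = 1.5` nudge are illustrative choices, not part of the original answer:

```r
library(ggplot2)

df <- data.frame(Type = c("A", "B", "A", "A", "B"), FLAG = c(1, 1, 0, 1, 0))

# One dodge definition shared by bars and labels so that they line up.
dodge <- position_dodge(width = 0.9)

# fill = factor(FLAG) is set globally, so both layers are grouped by
# Type x FLAG and stat "count" counts each combination separately.
p <- ggplot(df, aes(x = Type, fill = factor(FLAG))) +
  geom_bar(position = dodge) +
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = dodge,
            hjust = 1.5,          # pull labels inside the bar ends (count axis is horizontal after coord_flip)
            colour = "white", size = 3.5) +
  coord_flip() +
  theme_bw()
```

Printing `p` draws the chart; each Type then gets one label per FLAG (2 and 1 for "A", 1 and 1 for "B").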

You could replace stat_count() and geom_bar() with a little pre-processing using dplyr's count(), followed by geom_col(). Here is an example:
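library(dplyr)
library(ggplot2)
df %>%
  janitor::clean_names() %>%   # Type -> type, FLAG -> flag
  count(type, flag) %>%
  ggplot(aes(type, n, fill = as.factor(flag))) +
  geom_col(position = "dodge") +
  # hjust = 1 right-aligns each label just inside the end of its bar;
  # the dodge width matches geom_col()'s default bar width of 0.9
  geom_text(aes(label = n, y = n - 0.05), color = "white", hjust = 1,
            position = position_dodge(width = 0.9)) +
  scale_y_continuous(breaks = 0:3, limits = c(0, 3)) +
  labs(fill = "flag") +
  coord_flip() +
  theme_bw()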
Here, janitor::clean_names() only transforms the variable names (uppercase to lowercase, spaces to underscores), so Type and FLAG become type and flag.
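The count(type, flag) step produces one row per Type/FLAG combination, with the number of rows in `n`. If you'd rather not depend on janitor and dplyr for that step, a base-R sketch of the same table (the name `counts` is just illustrative) could be:

```r
df <- data.frame(Type = c("A", "B", "A", "A", "B"), FLAG = c(1, 1, 0, 1, 0))

# Cross-tabulate Type x FLAG and turn the table into a long data frame,
# one row per combination, with the count stored in column n.
counts <- as.data.frame(table(type = df$Type, flag = df$FLAG),
                        responseName = "n")
counts
```

Unlike count(), table() also keeps combinations that never occur, with n = 0. Here all four combinations occur, so the result matches.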

Related

gghighlight (R): Labeling bar charts

Alright, after a long time of silently reading along, here's my first question. I am trying to add labels for the unhighlighted bars in a grouped bar plot. When I put gghighlight() before geom_text(), I get the following plot:
library(tidyverse)
library(gghighlight)
df <- data.frame (group = c("A", "A", "B", "B", "C", "C"),
value = c("value_1", "value_2","value_1", "value_2","value_1", "value_2"),
mean = c(1.331, 1.931, 3.231, 3.331, 4.631, 3.331)
)
ggplot(data = df, aes(x = group, y = mean, fill = value)) +
geom_bar(stat = "identity", position = "dodge") +
gghighlight(group != "B",
label_key = group
) +
geom_text(aes(label = round(mean, digits = 2)),
stat= "identity",
vjust = -.5,
position = position_dodge(width = .9)
)
If I move gghighlight() after geom_text(), I get the following plot:
ggplot(data = df, aes(x = group, y = mean, fill = value)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = round(mean, digits = 2)),
stat= "identity",
vjust = -.5,
position = position_dodge(width = .9)
) +
gghighlight(group != "B",
label_key = group)
Is there a way to label the unhighlighted bars like the highlighted ones?
Thanks in advance.
############## EDIT ###########
Besides graying out certain bars (see @TarJae's answer), you can also make them transparent (the essential parts come from this post: ggplot transparency on individual bar):
subset_df <- df %>%
mutate(alpha.adj = as.factor(ifelse(group != "B", 1, 0.6)))
ggplot(data = subset_df, aes(x = group, y = mean, fill = value, alpha=factor(alpha.adj))) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = round(mean, digits = 2)),
stat= "identity",
vjust = -.5,
position = position_dodge(width = .9)
) +
scale_alpha_manual(values = c("0.6"=0.6, "1"=1), guide='none')
Are you looking for this?
This is a solution without the gghighlight package:
library(tidyverse)
subset_df <- df %>%
  mutate(highlight = if_else(group != "B", mean, NA_real_))
ggplot(data = subset_df, aes(x = group, y = mean, group = value)) +
  # all bars drawn in light grey as the background ...
  geom_col(fill = 'grey', alpha = 0.6, position = 'dodge') +
  # ... then the highlighted ones drawn on top in colour
  geom_col(aes(y = highlight, fill = value), position = 'dodge') +
  # labels for every bar, dodged by the same width as the bars (0.9)
  geom_text(aes(label = round(mean, digits = 2)), vjust = -0.5,
            position = position_dodge(width = 0.9))
Here is a solution with the gghighlight package and a small hack.
While reading the vignette, I noticed that the package "filters out" the data that are not highlighted. You can see this if you save your highlighted plot as p_h and look at p_h$data: the values for group B have disappeared.
library(tidyverse)
library(gghighlight)
p_h <- ggplot(data = df, aes(x = group, y = mean, fill = value)) +
geom_bar(stat = "identity", position = "dodge") +
gghighlight(group != "B",
label_key = group) +
geom_text(aes(label = round(mean, digits = 2)),
stat= "identity",
vjust = -.5,
position = position_dodge(width = .9))
> p_h$data
group value mean
1 A value_1 1.331
2 A value_2 1.931
5 C value_1 4.631
6 C value_2 3.331
If we re-insert the data (after the call to gghighlight() has removed them), then geom_text() will be able to find the means for group B again.
One can "recover" the data and re-insert them with the following code:
### create a ggplot object with the original complete data
### you could check that with p_to_copy_data$data
p_to_copy_data <- ggplot(data = df)
### copy the complete data to your highlighted plot data section
p_h$data <- p_to_copy_data$data
p_h
This yields the following graph:

ggplot: Order stacked barplots by variable proportion

I am creating a plot with 3 variables as below. Is there a way to arrange the plot in descending order, so that the bar with the highest proportion of variable "c" comes first? In this example, the last bar should come first, then the middle one, and the first bar last.
long<- data.frame(
Name = c("abc","abc","abc","gif","gif","gif","xyz","xyz","xyz"),
variable = c("a","b","c","a","b","c","c","b","a"),
value = c(4,6,NA,2,8,1,6,NA,NA))
long_totals <- long %>%
group_by(Name) %>%
summarise(Total = sum(value, na.rm = T))
p <- ggplot()+
geom_bar(data = long,
aes(x = Name,
y = value,
fill=variable),
stat="summary",
position = "fill") +
geom_text(data = long_totals,
aes(y = 100,
x = Name,
label = Total),
size = 7,
position = position_fill(vjust = 1.02)) +
scale_y_continuous(labels = scales::percent_format()) +
ylab("Total_num") +
ggtitle("Totalnum") +
theme(plot.title = element_text(size = 20, hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 75, vjust = 0.95, hjust=1))
The following code does arrange the bars by count of "c" but not by proportion. How can I arrange by proportion?
p<-long %>%
mutate(variable = fct_relevel(variable,
c("c", "b", "a"))) %>%
arrange(variable) %>%
mutate(Name = fct_inorder(Name))
p %>%
ggplot() +
aes(x = Name,
y = value,
fill = variable) +
geom_bar(position = "fill",
stat = "summary")
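We could use fct_rev() from the forcats package (part of the tidyverse). It reverses the current, alphabetical order of Name, which in this example happens to match the order by proportion of "c":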
p <- ggplot()+
geom_bar(data = long,
aes(x = fct_rev(Name),
y = value,
fill=variable),
stat="summary",
position = "fill") +
geom_text(data = long_totals,
aes(y = 100,
x = Name,
label = Total),
size = 7,
position = position_fill(vjust = 1.02)) +
scale_y_continuous(labels = scales::percent_format()) +
ylab("Total_num") +
ggtitle("Totalnum") +
theme(plot.title = element_text(size = 20, hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 75, vjust = 0.95, hjust=1))
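Since the question asks for ordering by the proportion of "c", that proportion can also be computed explicitly instead of relying on alphabetical order. A base-R sketch, assuming missing values count as 0 (rows with NA values are not drawn, so this matches the displayed shares); the names `prop_c` and `name_order` are hypothetical helpers:

```r
long <- data.frame(
  Name = c("abc","abc","abc","gif","gif","gif","xyz","xyz","xyz"),
  variable = c("a","b","c","a","b","c","c","b","a"),
  value = c(4,6,NA,2,8,1,6,NA,NA))

# Treat missing values as 0 when computing shares.
val <- ifelse(is.na(long$value), 0, long$value)

# Per Name: total of all variables and the amount contributed by "c".
total_by_name <- tapply(val, long$Name, sum)
c_by_name <- tapply(val * (long$variable == "c"), long$Name, sum)
prop_c <- c_by_name / total_by_name

# Names ordered by decreasing share of "c" become the factor levels;
# ggplot2 draws a discrete x axis in level order.
name_order <- names(prop_c)[order(prop_c, decreasing = TRUE)]
long$Name <- factor(long$Name, levels = name_order)
levels(long$Name)
# [1] "xyz" "gif" "abc"
```

Use this `long` with aes(x = Name, ...) in geom_bar() instead of fct_rev(Name). If you also draw the totals from long_totals, give its Name column the same levels, e.g. `long_totals$Name <- factor(long_totals$Name, levels = name_order)`, so both layers agree on the order.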

How to automatically change label color depending on relative values (maximum/minimum)?

To make a dynamic visualization, for example in a dashboard, I want to display the labels (percentages or totals) in black or white depending on their actual values.
As you can see from my reprex below, I manually changed the colour of the label with the highest percentage to black to improve its visibility.
Is there a way to set the label colour automatically? The label with the highest percentage should always be black, even when the data change over time.
library(ggplot2)
library(dplyr)
set.seed(3)
reviews <- data.frame(review_star = as.character(sample.int(5,400, replace = TRUE)),
stars = 1)
df <- reviews %>%
group_by(review_star) %>%
count() %>%
ungroup() %>%
mutate(perc = `n` / sum(`n`)) %>%
arrange(perc) %>%
mutate(labels = scales::percent(perc))
ggplot(df, aes(x = "", y = perc, fill = review_star)) +
geom_col(color = "black") +
geom_label(aes(label = labels), color = c( "white", "white","white",1,"white"),
position = position_stack(vjust = 0.5),
show.legend = FALSE) +
guides(fill = guide_legend(title = "Answer")) +
scale_fill_viridis_d() +
coord_polar(theta = "y") +
theme_void()
You can set the colours with replace(rep('white', nrow(df)), which.max(df$perc), 'black'), which makes the label in the row with the highest percentage black and all other labels white:
ggplot(df, aes(x = "", y = perc, fill = review_star)) +
geom_col(color = "black") +
geom_label(aes(label = labels),
color = replace(rep('white', nrow(df)), which.max(df$perc), 'black'),
position = position_stack(vjust = 0.5),
show.legend = FALSE) +
guides(fill = guide_legend(title = "Answer")) +
scale_fill_viridis_d() +
coord_polar(theta = "y") +
theme_void()
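An alternative that keeps each colour attached to its own row of data, rather than depending on the rows being in the same order as a separately built vector, is to map the condition inside aes() and translate it with scale_colour_manual(). A sketch, using small made-up data in place of the sampled reviews and a base-R sprintf() for the labels:

```r
library(ggplot2)

# Illustrative data (made up for this sketch): one row per answer category.
df <- data.frame(review_star = as.character(1:5),
                 perc = c(0.15, 0.25, 0.30, 0.18, 0.12))
df$labels <- sprintf("%.0f%%", 100 * df$perc)

p <- ggplot(df, aes(x = "", y = perc, fill = review_star)) +
  geom_col(colour = "black") +
  # Map "is this the largest share?" to colour inside aes() ...
  geom_label(aes(label = labels, colour = perc == max(perc)),
             position = position_stack(vjust = 0.5),
             show.legend = FALSE) +
  # ... and turn TRUE/FALSE into concrete colours.
  scale_colour_manual(values = c("TRUE" = "black", "FALSE" = "white"),
                      guide = "none") +
  guides(fill = guide_legend(title = "Answer")) +
  scale_fill_viridis_d() +
  coord_polar(theta = "y") +
  theme_void()
```

Because the condition is re-evaluated on whatever data the plot receives, the largest slice stays black when the data change. If several slices tie for the maximum, all of them become black.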

Control colour of geom_text_repel

I would like to change the colour of one of my ggrepel labels to black. I have tried to override the inheritance by specifying ...geom_text_repel(...colour='black') but that doesn't seem to work.
My attempt at a fix to the problem is in the second geom_text_repel function (below).
N.B. If there is a way to control the colour of individual geom_text_repel elements, rather than having to call the function twice, I would prefer that.
library("tidyverse")
library("ggthemes")
library("ggrepel")
df1 <- gather(economics, variable_name, observation, -date) %>%
rename(period = date) %>%
filter(variable_name == 'psavert')
df2 <- gather(economics, variable_name, observation, -date) %>%
rename(period = date) %>%
filter(variable_name == 'uempmed')
ggplot(df1, aes(x = period, y = observation, colour = variable_name)) +
geom_line() +
geom_line(data = df2, colour = 'black', size = .8) +
geom_text_repel(
data = subset(df1, period == max(as.Date(period, "%Y-%m-%d"))),
aes(label = variable_name),
size = 3,
nudge_x = 45,
segment.color = 'grey80'
) +
geom_text_repel(
data = subset(df2, period == max(as.Date(period, "%Y-%m-%d"))),
aes(label = variable_name, colour = 'black'), #How do I set the colour of the label text to black?
size = 3,
nudge_x = 45,
segment.color = 'grey80'
) +
scale_y_continuous(labels = scales::comma) +
theme_minimal(base_size = 16) +
scale_color_tableau() +
scale_fill_tableau() +
theme(legend.position = 'none') +
labs(x="", y="", title = "Economic Data") +
scale_x_date(limits = c(min(df1$period), max(df1$period) + 1200))
Do the same thing you did in your geom_line() layer. You want to set a color, not a mapping. Make colour = 'black' an argument to geom_text_repel(), not aes().
ggplot(df1, aes(x = period, y = observation, colour = variable_name)) +
geom_line() +
geom_line(data = df2, colour = 'black', size = .8) + # just like this layer
geom_text_repel(
data = subset(df1, period == max(as.Date(period, "%Y-%m-%d"))),
aes(label = variable_name),
size = 3,
nudge_x = 45,
segment.color = 'grey80'
) +
geom_text_repel(
data = subset(df2, period == max(as.Date(period, "%Y-%m-%d"))),
aes(label = variable_name), # don't assign colour inside aes()
size = 3,
nudge_x = 45,
segment.color = 'grey80',
colour = "black" # assign it here
) +
scale_y_continuous(labels = scales::comma) +
theme_minimal(base_size = 16) +
scale_color_tableau() +
scale_fill_tableau() +
theme(legend.position = 'none') +
labs(x="", y="", title = "Economic Data") +
scale_x_date(limits = c(min(df1$period), max(df1$period) + 1200))
Note that the uempmed line (df2) and its label are now both set manually to "black", so the colour scale only assigns colours to the remaining layers that map colour. If you want to set another one manually to a different colour, use the same strategy: set it as an argument to the geom, not inside aes().
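Regarding the N.B. in the question: to colour individual labels without a second label layer, one option is to map colour to variable_name for all series and pin specific colours with a named scale_colour_manual(). A minimal sketch of that idea, with these assumptions: it uses ggplot2's geom_text() so it runs without ggrepel, builds the long data in base R instead of gather(), and replaces scale_color_tableau() with scale_colour_manual() and illustrative colours:

```r
library(ggplot2)

# Two series from ggplot2's built-in economics data, in long format.
long <- rbind(
  data.frame(period = economics$date, variable_name = "psavert",
             observation = economics$psavert),
  data.frame(period = economics$date, variable_name = "uempmed",
             observation = economics$uempmed)
)

# One row per series at the last date: these become the labels.
last_points <- long[long$period == max(long$period), ]

p <- ggplot(long, aes(x = period, y = observation, colour = variable_name)) +
  geom_line() +
  # A single label layer for both series; ggrepel::geom_text_repel()
  # accepts the same aes() if you need the repelling behaviour.
  geom_text(data = last_points, aes(label = variable_name),
            size = 3, hjust = 0, nudge_x = 45) +
  # Named values fix the colour of each series, line and label alike.
  scale_colour_manual(values = c(psavert = "steelblue", uempmed = "black")) +
  theme_minimal(base_size = 16) +
  theme(legend.position = "none")
```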

Stacked bar: bringing labels to the graph

I'm plotting a stacked bar graph and use geom_text to add the value and name of each segment. The problem is that some segments are very small, so the labels of adjacent segments overlap and are hard to read. How can I modify the code to solve this?
Type<-c("ddddddddddd","ddddddddddd","bbbbbbbbbbbbb","ddddddddddd","eeeeeeeeeeeeee","bbbbbbbbbbbbb","ddddddddddd","bbbbbbbbbbbbb","ddddddddddd",
"eeeeeeeeeeeeee","mmmmmmmmmmmmmmmmmmm","bbbbbbbbbbbbb","ddddddddddd","bbbbbbbbbbbbb","eeeeeeeeeeeeee")
Category<-c("mmmmm","mmmmm","gggggggggggggggggg","ffffffffffff","ffffffffffff","ffffffffffff","sanddddddddd","sanddddddddd","yyyyyyyyyyy",
"yyyyyyyyyyy","yyyyyyyyyyy","sssssssssssssss","sssssssssssssss","sssssssssssssss","ttttttttttttt")
Frequency<-c(4,1,30,7,127,11,1,1,6,9,1,200,3,4,5)
Data <- data.frame(Type, Category, Frequency)
p <- ggplot(Data, aes(x = Type, y = Frequency)) +
geom_bar(aes(fill = Category), stat="identity", show.legend = FALSE) +
geom_text(aes(label = Frequency), size = 3) +
geom_text(aes(label = Category), size = 3)
Considering your data, a facetted plot might be a better approach:
# summarise your data
library(dplyr)
library(ggplot2)
d1 <- Data %>%
  mutate(across(c(Type, Category), ~ substr(.x, 1, 2))) %>% # keep only the first 2 characters
  group_by(Type, Category) %>%
  summarise(Freq = sum(Frequency)) %>%
  mutate(lbl = paste(Category, Freq)) # create a label by pasting the 'Category' and the 'Freq' variables together
# plot
ggplot(d1, aes(x = Category, y = Freq, fill = Category)) +
geom_bar(stat="identity", width = 0.7, position = position_dodge(0.8)) +
geom_text(aes(label = lbl), angle = 90, size = 5, hjust = -0.1, position = position_dodge(0.8)) +
scale_y_continuous(limits = c(0,240)) +
guides(fill = "none") +
facet_grid(.~Type, scales = "free", space = "free") +
theme_bw(base_size = 14)
which gives:
In the above plot I shortened the labels on purpose. If you don't want to do that, you could consider this:
d2 <- Data %>%
group_by(Type,Category) %>%
summarise(Freq = sum(Frequency)) %>%
mutate(lbl = paste(Category,Freq))
ggplot(d2, aes(x = Category, y = Freq, fill = Category)) +
geom_bar(stat="identity", width = 0.7, position = position_dodge(0.8)) +
geom_text(aes(y = 5, label = lbl), alpha = 0.6, angle = 90, size = 5, hjust = 0, position = position_dodge(0.8)) +
scale_y_continuous(limits = c(0,240)) +
guides(fill = "none") +
facet_grid(.~Type, scales = "free", space = "free") +
theme_bw(base_size = 14) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())
which gives:
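If you want to keep the original stacked layout instead, another option is to centre each label in its segment with position_stack(vjust = 0.5) and blank out the labels of segments too small to hold text. A sketch under assumptions: duplicate Type/Category rows are summed first, and the `min_label` cut-off of 10 is a hypothetical value you would tune to your data:

```r
library(ggplot2)

# Data from the question.
Type <- c("ddddddddddd","ddddddddddd","bbbbbbbbbbbbb","ddddddddddd","eeeeeeeeeeeeee",
          "bbbbbbbbbbbbb","ddddddddddd","bbbbbbbbbbbbb","ddddddddddd",
          "eeeeeeeeeeeeee","mmmmmmmmmmmmmmmmmmm","bbbbbbbbbbbbb","ddddddddddd",
          "bbbbbbbbbbbbb","eeeeeeeeeeeeee")
Category <- c("mmmmm","mmmmm","gggggggggggggggggg","ffffffffffff","ffffffffffff",
              "ffffffffffff","sanddddddddd","sanddddddddd","yyyyyyyyyyy",
              "yyyyyyyyyyy","yyyyyyyyyyy","sssssssssssssss","sssssssssssssss",
              "sssssssssssssss","ttttttttttttt")
Frequency <- c(4,1,30,7,127,11,1,1,6,9,1,200,3,4,5)
Data <- data.frame(Type, Category, Frequency)

# Sum duplicated Type/Category rows so each stacked segment has one label.
Data2 <- aggregate(Frequency ~ Type + Category, data = Data, FUN = sum)

# Hypothetical cut-off: segments smaller than this get an empty label.
min_label <- 10
Data2$lbl <- ifelse(Data2$Frequency >= min_label,
                    paste(Data2$Category, Data2$Frequency), "")

# fill is set globally so bars and text are grouped (and stacked) the same
# way; vjust = 0.5 puts each label in the middle of its segment.
p <- ggplot(Data2, aes(x = Type, y = Frequency, fill = Category)) +
  geom_col() +
  geom_text(aes(label = lbl), size = 3,
            position = position_stack(vjust = 0.5)) +
  theme(legend.position = "none")
```

Keeping the small segments in the data with an empty label, rather than filtering them out of the text layer, keeps the stacked positions of the remaining labels aligned with their bars.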
