I am having trouble applying ggrepel to an alluvial plot whose columns (axes) come from different variables. Some strata are so small that I need ggrepel to keep their labels readable.
Because there are three columns, I want to treat the labels of each column differently:
Left (region): Align labels to the left of axis
Middle (supplySector): Do nothing (i.e. leave text in axis)
Right (demandSector): Align to right of axis.
I've found these issues:
https://cran.r-project.org/web/packages/ggalluvial/vignettes/labels.html
and
How to align and label the stratum in ggalluvial using ggrepel (or otherwise)
The difference is that those examples have only two columns, and both columns are built from the same variable (different subsets of it). The previously published fixes work through an ifelse() that selects a subset within that variable.
ReprEx:
library(ggplot2)
library(ggrepel)
library(tidyr)
library(dplyr)
df <- data.frame(region = c("A","A","A","B","B","B"),
supplySector = c("coal","gas","wind","coal","gas","wind"),
demandSector = c("resid","indus","ag","resid","indus","ag"),
Freq = 20*runif(6)); df
p<- ggplot(df, aes(y = Freq, axis1 = region, axis2 = supplySector, axis3=demandSector, label = after_stat(stratum))) +
ggalluvial::geom_alluvium(aes(fill = demandSector), width = 1/12, color="black", alpha=0.8) +
ggalluvial::geom_stratum(width = 1/3, fill = "grey70", color = "grey10", alpha=1) +
scale_x_discrete(limits = c("Region", "Supply Sector", "Demand Sector"), expand = c(0.3,0),drop=F) +
scale_y_continuous("Frequency (n)")+
theme_classic()+
theme(legend.position = "none")
I've tried feeding colnames(df) == "region" into the label aesthetic to get a TRUE/FALSE vector:
p + ggrepel::geom_text_repel(
aes(label = ifelse(colnames(df) == "region", as.character(region), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = -.5
)
I would then repeat this for aes(label = ifelse(colnames(df) == "demandSector" with nudge_x = 1.5.
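Maybe I misunderstood you, but after a closer look at your example I would call this a duplicate of the question you linked, which my answer there covers. The computed variable after_stat(x) holds the axis position (1, 2 or 3 here), so you can test it to label each column separately: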
library(ggplot2)
library(ggrepel)
library(ggalluvial)
p + ggrepel::geom_text_repel(
aes(label = ifelse(after_stat(x) == 1, as.character(after_stat(stratum)), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = -.5
) + ggrepel::geom_text_repel(
aes(label = ifelse(after_stat(x) == 2, as.character(after_stat(stratum)), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = 0
) + ggrepel::geom_text_repel(
aes(label = ifelse(after_stat(x) == 3, as.character(after_stat(stratum)), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = +.5
)
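The three layers above differ only in the axis index and the nudge, so the pattern can be wrapped in a small helper. This is a minimal sketch, assuming ggalluvial and ggrepel are installed; label_axis() is a hypothetical name introduced here, not a function of either package.

```r
library(ggplot2)
library(ggalluvial)  # provides stat = "stratum"
library(ggrepel)

# Hypothetical helper: one repelled label layer for a single axis.
# `axis` is the axis position (1 = leftmost), `nudge` the horizontal shift.
label_axis <- function(axis, nudge) {
  force(axis)  # evaluate now, so the value is fixed when the plot is built
  geom_text_repel(
    aes(label = ifelse(after_stat(x) == axis,
                       as.character(after_stat(stratum)), NA)),
    stat = "stratum", size = 4, direction = "y", nudge_x = nudge
  )
}

# Usage with the plot `p` from the question:
# p + label_axis(1, -0.5) + label_axis(2, 0) + label_axis(3, 0.5)
```

Labels that evaluate to NA are dropped by ggrepel, so each layer only draws the strata of its own axis.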
I have the following data & code to produce a barplot (building on this answer)
tmpdf <- tibble(class = c("class 1", rep("class 2", 4), rep("class 3", 4)),
var_1 = c("none", rep(c("A", "B", "C", "D"), 2)),
y_ = as.integer(c(runif(9, min = 100, max=250))))
tmpdf <- rbind(tmpdf, cbind(expand.grid(class = levels(as.factor(tmpdf$class)),
var_1 = levels(as.factor(tmpdf$var_1))),
y_ = NA))
ggplot(data=tmpdf, aes(x = class, y = y_, fill=var_1, width=0.75 )) +
geom_bar(stat = "identity", position=position_dodge(width = 0.90), color="black", size=0.2)
This produces the below plot:
However, since not all class / var_1 combinations are present, some space on the x-axis is lost. I would now like to remove the empty space on the x-axis without making the bars wider(!).
Can someone point me to the right direction?
You can use na.omit to drop the rows with missing y_ values (the padded class / var_1 combinations), and then use facet_grid with scales = "free_x" and space = "free_x" so that each class only takes up the space its bars need.
ggplot(data=na.omit(tmpdf), aes(x = var_1, y = y_, fill=var_1, width=0.75)) +
geom_col(position=position_dodge(width = 0.90), color="black", size=0.2) +
facet_grid(~ class, scales = "free_x", space = "free_x", switch = "x") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.background = element_blank())
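To see why na.omit is enough here, it helps to look at what it does to a padded data frame. The sketch below uses a smaller toy version of tmpdf (the names obs, pad, padded and cleaned are made up for illustration) and only base R.

```r
# Toy data mirroring the structure of tmpdf: observed rows plus NA padding rows
obs <- data.frame(class = c("class 1", "class 2", "class 2"),
                  var_1 = c("none", "A", "B"),
                  y_    = c(120L, 180L, 210L))
pad <- cbind(expand.grid(class = unique(obs$class),
                         var_1 = unique(obs$var_1),
                         stringsAsFactors = FALSE),
             y_ = NA)
padded <- rbind(obs, pad)

nrow(padded)                 # 3 observed rows + 2 x 3 padding rows
cleaned <- na.omit(padded)   # drops every row that contains an NA
nrow(cleaned)                # only the observed rows remain

# Number of bars each facet has to show once the padding is gone
table(cleaned$class)
```

With scales = "free_x" each facet keeps only the var_1 values that survive, and space = "free_x" makes the panel width proportional to that number, which is what keeps the bar width constant.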
Technically, you could tweak a column chart (geom_col) to the desired effect, like so:
# use tmpdf without the NA padding rows (the 9 observed rows), so xpos lines up
na.omit(tmpdf) %>%
mutate(xpos = c(1.6, 2 + .2 * 0:3, 3 + .2 * 0:3)) %>%
ggplot() +
geom_col(aes(x = xpos, y = y_, fill = var_1)) +
scale_x_continuous(breaks = c(1.6, 2.3 + 0:1), labels = unique(tmpdf$class))
However, the resulting barplot (condensed or not) might be difficult to interpret as long as you want to convey differences between classes. For example, the plot has to be studied carefully to detect that variable D runs against the pattern of increasing values from class 2 to 3.
I am trying to obtain a back-to-back bar plot (or pyramid plot) similar to the ones shown here:
Population pyramid with gender and comparing across two time periods with ggplot2
Basically, a pyramid plot of a quantitative variable whose values have to be displayed for combinations of three categorical variables.
library(ggplot2)
library(dplyr)
df <- data.frame(Gender = rep(c("M", "F"), each = 20),
Age = rep(c("0-10", "11-20", "21-30", "31-40", "41-50",
"51-60", "61-70", "71-80", "81-90", "91-100"), 4),
Year = factor(rep(c(2009, 2010, 2009, 2010), each= 10)),
Value = sample(seq(50, 100, 5), 40, replace = TRUE)) %>%
mutate(Value = ifelse(Gender == "F", Value *-1 , Value))
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge") +
scale_y_continuous(labels = abs,
expand = c(0, 0)) +
scale_fill_manual(values = hcl(h = c(15,195,15,195),
c = 100,
l = 65,
alpha=c(0.4,0.4,1,1)),
name = "") +
coord_flip() +
facet_wrap(.~ Gender,
scale = "free_x",
strip.position = "bottom") +
theme_minimal() +
theme(legend.position = "bottom",
panel.spacing.x = unit(0, "pt"),
strip.background = element_rect(colour = "black"))
example of back-to-back barplot I want to mimic
When I try to mimic this example on my data, things go wrong from the first ggplot function call, as the bars are not dodged on both sides of the axis:
mydf = read.table("https://raw.githubusercontent.com/gilles-guillot/IPUMS_R/main/tmp/df.csv",
header=TRUE,sep=";")
ggplot(mydf) +
geom_col(aes(fill = interaction(mig,ISCO08WHO_yrstud, sep = "-"),
x = country,
y = f),
position = "dodge")
failed attempt to get a back-to-back bar plot
whereas I was expecting the kind of result I get from:
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge")
geom_col plot with bars dodged symmetrically around the axis
In the example you are following, df$Value is made negative when Gender == "F" (the mutate() step). You need to do something similar with f in your own data to get the bars dodged symmetrically around the axis.
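A minimal sketch of that sign flip, on made-up data: the column names country, mig and f follow the question, but the values and the two levels of mig ("native", "migrant") are assumptions, since the real file is not reproduced here.

```r
library(ggplot2)

# Hypothetical stand-in for mydf; values and mig levels are invented
toy <- data.frame(
  country = rep(c("X", "Y", "Z"), times = 2),
  mig     = rep(c("native", "migrant"), each = 3),
  f       = c(0.30, 0.45, 0.25, 0.20, 0.35, 0.40)
)

# Flip the sign for one group so its bars grow in the opposite direction
toy$f_signed <- ifelse(toy$mig == "migrant", -toy$f, toy$f)

pyramid <- ggplot(toy) +
  geom_col(aes(x = country, y = f_signed, fill = mig)) +
  scale_y_continuous(labels = abs) +  # show magnitudes, not signed values
  coord_flip()
pyramid
```

With a third variable such as ISCO08WHO_yrstud, the fill could be mapped to interaction(mig, ISCO08WHO_yrstud) with position = "dodge", as in the population pyramid example.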
I would like to combine stacked with dodge style of a barplot in ggplot. I'm quite near it with this code:
library(ggplot2)
library(plotly)  # for ggplotly()
dates_g <- as.Date(c("2020-03-30","2020-03-30", "2020-04-30","2020-04-30", "2020-05-30","2020-05-30"))
value_x <- c(1, 2, 4, 1.4, 3.2, 1.3)
value_y <- c(1.2, 3, 4.6, 1, 3, 1)
ID <- c("A", "B", "A", "B", "A", "B")
results <- data.frame(dates_g, value_x, value_y, ID)
barwidth = 13
bar_comparison <- ggplot() +
geom_bar(data = results[,c(1,2,4)],
mapping = aes(x=dates_g , y=value_x, fill=ID),
stat ="identity",
position = "stack",
width = barwidth) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
geom_bar(data = results[,c(1,3,4)],
mapping = aes(x=dates_g + barwidth + 0.01 , y=value_y, fill=ID),
stat ="identity",
position = "stack",
width = barwidth) +
xlab("Date") + ylab("Value (in millions)")
ggplotly(bar_comparison)
which gives as a result:
I'm still not happy about two things. First, I would like the date label to sit between the two bars (this is a minor problem). Second, I would really like the two bars at each date to have different colors: for example, the left bar in a scale of greens (dark green and light green) and the right one in a scale of blues (dark blue and light blue). Is that possible?
This is at least a solution for the main question.
I would suggest using facets (facet_grid() in the code below).
Data preparation: bring the data into long format, extract the month name from the date (I use lubridate for this), then plot with ggplot.
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
results_long <- results %>%
pivot_longer(
cols = starts_with("value"),
names_to = "Names",
values_to = "Values"
) %>%
# month name of each date, as an ordered factor (Mar, Apr, May)
mutate(dates_name = month(ymd(dates_g), label = TRUE))
ggplot(results_long, aes(x = Names, y = Values, fill = ID)) +
geom_bar(stat = 'identity', position = 'stack') + facet_grid(~ dates_name) +
theme_bw()
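The faceted layout can also carry the green and blue scales asked for in the question, by filling on the combination of value column and ID. This is a sketch under assumptions, not part of the answer above: the long format is built in base R to keep the block self-contained, and month, bar_fill and fill_values are names made up here.

```r
library(ggplot2)

# Same data as in the question
results <- data.frame(
  dates_g = as.Date(rep(c("2020-03-30", "2020-04-30", "2020-05-30"), each = 2)),
  value_x = c(1, 2, 4, 1.4, 3.2, 1.3),
  value_y = c(1.2, 3, 4.6, 1, 3, 1),
  ID      = rep(c("A", "B"), times = 3)
)

# Long format in base R: one row per date, ID and value column
results_long <- rbind(
  data.frame(results[c("dates_g", "ID")], Names = "value_x", Values = results$value_x),
  data.frame(results[c("dates_g", "ID")], Names = "value_y", Values = results$value_y)
)
results_long$month <- format(results_long$dates_g, "%Y-%m")

# One fill level per (value column, ID) pair, e.g. "value_x.A"
results_long$bar_fill <- interaction(results_long$Names, results_long$ID)

# Greens for the value_x bar, blues for the value_y bar
fill_values <- c(value_x.A = "darkgreen", value_x.B = "lightgreen",
                 value_y.A = "darkblue",  value_y.B = "lightblue")

ggplot(results_long, aes(x = Names, y = Values, fill = bar_fill)) +
  geom_col(position = "stack") +
  facet_grid(~ month) +
  scale_fill_manual(values = fill_values) +
  theme_bw()
```

Because the fill values are a named vector, each color is matched to its level by name rather than by position.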
I am trying to create a faceted alluvial plot in which the labels for the strata on the first axis are repelled to the left and left justified, and the labels on the second axis are repelled to the right and right justified.
# Small working example
# Install Packages and Libraries
install.packages("ggplot2")
install.packages("ggalluvial")
install.packages("ggrepel")
library(ggplot2)
library(ggalluvial)
library(ggrepel)
# Data Frame with 2 regions, 3 supply sectors and 3 demand sectors
df <- data.frame(region = c("A","A","A","B","B","B"),
supplySector = c("coal","gas","wind","coal","gas","wind"),
demandSector = c("resid","indus","ag","resid","indus","ag"),
value = 10*runif(6)); df
# Faceted plot with ggrepel (nudge_x and hjust assigned for each label) works.
p <- ggplot(df, aes(y = value, axis1 = supplySector, axis2 = demandSector, group=region)) +
ggalluvial::geom_alluvium(aes(fill = supplySector), width = 1/12, color="black", alpha=0.6) +
ggalluvial::geom_stratum(width = 1/12, fill = "grey70", color = "grey10", alpha=1) +
scale_x_discrete(limits = c("supplySector", "demandSector"), expand = c(0.3,0),drop=F) +
facet_wrap(region~.) +
ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50',
nudge_x = rep(c(-3,-3,-3,3,3,3),2),
hjust = rep(c(1,1,1,-1,-1,-1),2)); p
# Faceted plot with ggrepel (nudge_x and hjust assigned for each label)
# does not work when different number of variables in each facet
df1 <- df[-nrow(df),]; df1 # Remove one of the rows from df
# So this gives the following plot with different alluvia in each facet
p1 <- ggplot(df1, aes(y = value, axis1 = supplySector, axis2 = demandSector, group=region)) +
ggalluvial::geom_alluvium(aes(fill = supplySector), width = 1/12, color="black", alpha=0.6) +
ggalluvial::geom_stratum(width = 1/12, fill = "grey70", color = "grey10", alpha=1) +
scale_x_discrete(limits = c("supplySector", "demandSector"), expand = c(0.3,0),drop=F) +
facet_wrap(region~.); p1
# If we try and label these and assigns the nudge and hjust for each axis we get an error
# It expects the same length vector for nudge and hjust for each facet
p1 + ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50',
nudge_x = rep(c(-3,-3,-3,3,3,3),2),
hjust=rep(c(1,1,1,-1,-1,-1),2))
# Gives error: Error: Aesthetics must be either length 1 or the same as the data (10): hjust
# If we adjust the vectors for nudge_x and hjust to 10
p1 + ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50',
nudge_x = c(-3,-3,-3,3,3,3,-3-3,3,3),
hjust = c(1,1,1,-1,-1,-1,1,1,-1,-1))
# Get Error: Error in data.frame(x = data$x + nudge_x, y = data$y + nudge_y) :
# arguments imply differing number of rows: 9, 6
# In addition: Warning message:
# In data$x + nudge_x :
# longer object length is not a multiple of shorter object length
# It can be plotted without specifying the nudge_x and hjust values
p1 + ggrepel::geom_text_repel(stat = "stratum", label.strata = TRUE, direction = "y",
size = 4, segment.color = 'grey50')
In summary, what I am trying to do is:
For plot p1 (with different number of alluvia in different facets)
Label each x axis stratum column
Have axis1 labels repel to the left and be left justified
Have axis2 labels repel to the right and be right justified
This answer suggested the different vector length for labels but it doesn't work for varying facets.
Labelling and theme of ggalluvial plot in R
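This is tricky! The nudge_* and *just arguments generally aren't dynamic: they expect a single value or a vector matching the data, which breaks when facets hold different numbers of strata. One way to solve this is to dig into the guts of the plot using ggplot_build().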
ggplot_build() has all of the "instructions" of how ggplot() builds the chart. You can edit the data and then run plot(ggplot_gtable()) to see the plot with your modifications. I have added comments to help explain these steps.
library(dplyr)  # for %>% and mutate()
# here is the base plot + the new layer for labels
plot_and_label <-
p1 +
geom_text_repel(
stat = "stratum", label.strata = TRUE,
direction = "y", size = 4,
segment.color = 'grey50',
nudge_x = 0
)
# this is the plot under the hood
gg_guts <- ggplot_build(plot_and_label)
# the geom_text_repel layer was the 3rd one we added so you can
# access and edit it like this
gg_guts$data[[3]] <-
gg_guts$data[[3]] %>%
mutate(hjust = ifelse(x%%2 == 1, 2, -2))
# once you've made your adjustments, you can plot it again
plot(ggplot_gtable(gg_guts))
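The reason this works with uneven facets is that hjust is computed row by row from the axis position x, instead of being supplied as a vector of fixed length. A small base R illustration on a mock of the layer data (layer_data_mock is invented here; the real gg_guts$data[[3]] has many more columns):

```r
# Mock of the relevant columns: 6 labels in the first panel, 4 in the second.
# x is the axis position of each stratum label (1 = axis1, 2 = axis2).
layer_data_mock <- data.frame(
  PANEL = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  x     = c(1, 1, 1, 2, 2, 2, 1, 1, 2, 2)
)

# Same rule as in the mutate() call: odd axis positions get one
# justification, even positions the opposite one
layer_data_mock$hjust <- ifelse(layer_data_mock$x %% 2 == 1, 2, -2)
layer_data_mock
```

The rule never refers to the number of rows per panel, so it does not matter that one facet has fewer strata than the other.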
I have a test dataset like this:
df_test <- data.frame(
proj_manager = c('Emma','Emma','Emma','Emma','Emma','Alice','Alice'),
proj_ID = c(1, 2, 3, 4, 5, 6, 7),
stage = c('B','B','B','A','C','A','C'),
value = c(15,15,20,20,20,70,5)
)
Preparation for viz:
input <- select(df_test, proj_manager, proj_ID, stage, value) %>%
filter(proj_manager=='Emma') %>%
do({
proj_value_by_manager = sum(distinct(., proj_ID, value)$value);
mutate(., proj_value_by_manager = proj_value_by_manager)
}) %>%
group_by(stage) %>%
do({
sum_value_byStage = sum(distinct(.,proj_ID,value)$value);
mutate(.,sum_value_byStage= sum_value_byStage)
}) %>%
mutate(count_proj = length(unique(proj_ID)))
commapos <- function(x, ...) {
format(abs(x), big.mark = ",", trim = TRUE,
scientific = FALSE, ...) }
Visualization:
ggplot (input, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-proj_value_by_manager),
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage), hjust = 5) +
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
My questions are:
Why are the y-values not showing up right? E.g., C is labeled 20, but its bar nearly hits 100 on the scale.
How can I adjust the position of the labels so that each one sits on top of its bar?
How can I re-scale the y axis so that both the very short bars of 'count of project' and the long bars of 'Project value' are displayed well?
Thank you all for the help!
I think your issues are coming from the fact that:
(1) Your dataset has duplicated values. This causes geom_bar to add all of them together. For example there are 3 obs for B where proj_value_by_manager = 90 which is why the blue bar extends to 270 for that group (they all get added).
(2) in your second geom_bar you use y = -proj_value_by_manager but in the geom_text to label this you use sum_value_byStage. That's why the blue bar for A is extending to 90 (since proj_value_by_manager is 90) but the label reads 20.
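Both points can be checked without plotting. The sketch below rebuilds Emma's rows by hand from df_test (the data frame emma is a reduced stand-in for input, keeping only the relevant columns) and sums the values the way stacked bars do.

```r
# Emma's five projects: values 15, 15, 20, 20, 20, so the manager total is 90
emma <- data.frame(
  stage                 = c("B", "B", "B", "A", "C"),
  proj_value_by_manager = 90,
  sum_value_byStage     = c(50, 50, 50, 20, 20)
)

# Rows sharing an x value are stacked by geom_bar(stat = "identity"),
# so the drawn bar length is the per-stage sum
bar_length <- tapply(emma$proj_value_by_manager, emma$stage, sum)
bar_length   # A = 90, B = 270, C = 90

# The label, however, comes from a different column
label_value <- tapply(emma$sum_value_byStage, emma$stage, unique)
label_value  # A = 20, B = 50, C = 20
```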
To get what I believe is the chart you want, you could do:
# Q1: de-duplicated dataset, so geom_bar doesn't erroneously add rows together
input2 <- input[!duplicated(input[,-c(2,4)]),]
ggplot (input2, aes(x=stage, y = count_proj)) +
geom_bar(stat = 'identity')+
geom_bar(aes(y=-sum_value_byStage), #Q1: changed so this y-value matches your label
stat = "identity", fill = "Blue") +
scale_y_continuous(labels = commapos)+
coord_flip() +
ylab('') +
geom_text(aes(label= sum_value_byStage, y = -sum_value_byStage), hjust = 1) + #Q2: Added in y-value for label and hjust so it will be on top
geom_text(aes(label= count_proj), hjust = -1) +
labs(title = "Emma: 4 projects| $90M Values \n \n Commitment|Projects") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype =1)
For your last question, there is no good way to display both of these. One option would be to rescale the small data and still label it with a 1 or 3. However, I didn't do this because once you scale down the blue bars the other bars look OK to me.
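For completeness, the rescaling option mentioned above could look roughly like this. It is only a sketch: input2 is rebuilt by hand from df_test, and scale_factor is an arbitrary value picked by eye, not something derived from the data.

```r
library(ggplot2)

# Per-stage summary for Emma, derived from df_test
input2 <- data.frame(
  stage             = c("A", "B", "C"),
  count_proj        = c(1, 3, 1),
  sum_value_byStage = c(20, 50, 20)
)

scale_factor <- 10  # assumption: stretches the count bars so they are visible
input2$count_scaled <- input2$count_proj * scale_factor

ggplot(input2, aes(x = stage)) +
  geom_col(aes(y = count_scaled)) +                        # stretched counts
  geom_col(aes(y = -sum_value_byStage), fill = "blue") +   # values, mirrored
  # labels still show the original, unscaled numbers
  geom_text(aes(y = count_scaled, label = count_proj), hjust = -0.5) +
  geom_text(aes(y = -sum_value_byStage, label = sum_value_byStage), hjust = 1.2) +
  geom_hline(yintercept = 0, linetype = 1) +
  coord_flip() +
  ylab("")
```

The axis ticks on the count side no longer correspond to real counts after this, so the bar labels are the only trustworthy numbers there.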