Mixed Variable Stacked Barplot - r

I would like to produce a ggplot2 stacked barplot with the variables on the horizontal axis, but the variables have different response types.
Some variables are 'Y/N' responses, some are 'Old/Young', and some are on a Likert scale of 0-5.
I would therefore like to plot them as stacked bars, with each variable type encoded with a different colour palette and a legend reflecting the different palettes/variable types.
I wondered if someone could help with this, please? Would be greatly appreciated.
X1<-c("N","N","N","N","Y","N","Y","N","N","N","N","N","Y","N","N","Y","N","N","N","Y","N","Y","Y","N","N","Y","Y","Y","N","N","N","N","N","N","N","N","Y","N","Y","N","N","N","N","Y","N","N","Y","N","Y","Y","N","Y","N","N")
X2 <-c("N","N","N","N","Y","N","Y","N","N","N","N","N","Y","N","N","Y","N","N","N","Y","N","Y","Y","N","N","Y","Y","Y","N","N","N","N","N","N","N","N","Y","N","Y","N","N","N","N","Y","N","N","Y","N","Y","Y","N","Y","N","N")
X3<-c(1,1,0,1,2,0,0,0,0,0,1,1,1,2,0,1,2,1,1,0,0,0,4,1,0,0,0,0,1,0,2,0,0,2,1,1,0,0,0,1,1,0,1,0,1,0,1,1,0,1,0,1,1,1)
X4 <-c("YouNg","Old","Old","YouNg","Old","Old","Old","YouNg","YouNg","YouNg","Old","Old","Old",
"Old","Old","Old","Old","YouNg","Old","Old","Old","YouNg","YouNg","Old","Old","Old",
"Old","Old","Old","Old","Old","Old","Old","YouNg","Old","YouNg","Old","YouNg","Old",
"Old","YouNg","Old","YouNg","YouNg","Old","Old","Old","YouNg","Old","Old","Old","YouNg", "Old", "Old")
Y <- data.frame(X1, X2, X3, X4)

One option to achieve your desired result would be to use the ggnewscale package, which allows for multiple scales and legends for the same aesthetic:
library(ggplot2)
library(ggnewscale)
library(tidyr)
library(dplyr)
dat <- Y |>
mutate(across(everything(), as.character)) |>
pivot_longer(everything(), names_to = "var")
ggplot(dat, aes(y = var)) +
geom_bar(data = ~subset(.x, var %in% c("X1", "X2")), aes(fill = value), position = "fill") +
scale_fill_brewer(palette = "Accent", guide = guide_legend(order = 3))+
new_scale_fill() +
geom_bar(data = ~subset(.x, var %in% c("X3")), aes(fill = value), position = "fill") +
scale_fill_brewer(palette = "Dark2", guide = guide_legend(order = 2)) +
new_scale_fill() +
geom_bar(data = ~subset(.x, var %in% c("X4")), aes(fill = value), position = "fill") +
scale_fill_brewer(palette = "Paired", guide = guide_legend(order = 1))
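As a quick check on what position = "fill" draws, the share of each response within each variable can be computed from the same long format. This is only a minimal sketch; Y_small is a made-up four-row stand-in for Y, not the data from the question:

```r
library(dplyr)
library(tidyr)

# Y_small is a hypothetical, shortened stand-in for the question's data frame Y
Y_small <- data.frame(
  X1 = c("N", "Y", "N", "N"),
  X3 = c(0, 1, 1, 2),
  X4 = c("Old", "Young", "Old", "Old")
)

props <- Y_small |>
  mutate(across(everything(), as.character)) |>  # mixed types -> character so they can share one column
  pivot_longer(everything(), names_to = "var") |>
  count(var, value) |>                           # number of rows per variable/response pair
  group_by(var) |>
  mutate(prop = n / sum(n)) |>                   # share within each variable, as drawn by position = "fill"
  ungroup()

props
```

Within each variable the prop values sum to 1, which is why every bar in the plot has the same length.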
EDIT: Depending on what you want to achieve, there are three arguments to consider:
Use position = position_fill(reverse = TRUE) in geom_bar to reverse the order of the stack.
Use direction = -1 in scale_fill_xxx to reverse the order of the fill colors.
Use reverse = TRUE in guide_legend to reverse the order in the legend.
The example code below uses all three for X3, i.e.
the bars run from 0 on the left to 4 on the right,
the fill colors are reversed, so that 0 is now assigned the "pink" and 4 the "green" color,
and the order in the legend is reversed.
ggplot(dat, aes(y = var)) +
geom_bar(data = ~subset(.x, var %in% c("X1", "X2")), aes(fill = value), position = "fill") +
scale_fill_brewer(palette = "Accent", guide = guide_legend(order = 3))+
new_scale_fill() +
geom_bar(data = ~subset(.x, var %in% c("X3")), aes(fill = value), position = position_fill(reverse = TRUE)) +
scale_fill_brewer(palette = "Dark2", guide = guide_legend(order = 2, reverse = TRUE), direction = -1) +
new_scale_fill() +
geom_bar(data = ~subset(.x, var %in% c("X4")), aes(fill = value), position = "fill") +
scale_fill_brewer(palette = "Paired", guide = guide_legend(order = 1))

Related

Show alpha/opacity in ggplot legend

I have two dodged bar charts, which I have put on top of each other to create this plot.
Groups <- c(1, 2,1,2,1,2)
variable <- c("Yes", "Yes", "Maybe", "Maybe", "No", "No")
value <- c(50,60,70,80,90,100)
df <- data.frame(Groups, variable, value)
Groups <- c(1, 2,1,2,1,2)
variable <- c("Yes*", "Yes*", "Maybe*", "Maybe*", "No*", "No*")
value <- c(5,6,7,8,9,10)
df2 <- data.frame(Groups, variable, value)
ggplot() +
geom_bar(data=df, aes(x=Groups, y=value, fill=variable),
stat="identity", position=position_dodge(), alpha=0.2)+
geom_bar(data=df2, aes(x=Groups, y=value, fill=variable),
stat="identity", position=position_dodge())
I would like for the opacity/alpha from the plot behind to show in the legend.
I have tried +guides(colour = guide_legend(override.aes = list(alpha = 0.2))) but this does not work.
Thanks.
You can use a little trick with after_scale to achieve this. Map fill to the variable in one geom layer, and map color to variable in the second layer, but set the second layer's fill to after_scale(color). This will give you two legends, one for the first data set with its transparency, and the other for the second set, fully opaque. You can call these whatever you like using labs:
ggplot(df, aes(Groups, value)) +
geom_col(aes(fill = variable), position = 'dodge', alpha = 0.2) +
geom_col(data = df2, aes(color = variable, fill = after_scale(color)),
position = 'dodge') +
labs(color = 'df2', fill = 'df1')
If you want everything in a single legend, you can do this in a couple of ways, for example, using a manual fill value with 6 colors, 3 of which have alpha set, or just merging the two legends in the method above using theme tweaks:
ggplot(df, aes(Groups, value)) +
geom_col(aes(fill = variable), position = 'dodge', alpha = 0.2) +
geom_col(data = df2, position = 'dodge', size = 0,
aes(color = variable, fill = after_scale(color))) +
labs(color = 'key', fill = NULL) +
guides(fill = guide_legend(order = 2),
color = guide_legend(order = 1,
override.aes = list(color = 'white', size = 0.3))) +
theme(legend.spacing.y = unit(-2, 'mm'),
legend.title = element_text(vjust = 5))
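The other route mentioned above, a manual fill scale with six colors of which three carry transparency, could look roughly like this. It is a sketch, not a tested drop-in: the three base colors are arbitrary choices of mine, and it relies on alpha() (re-exported by ggplot2 from scales) to bake the transparency into the color values:

```r
library(ggplot2)

df <- data.frame(Groups = c(1, 2, 1, 2, 1, 2),
                 variable = c("Yes", "Yes", "Maybe", "Maybe", "No", "No"),
                 value = c(50, 60, 70, 80, 90, 100))
df2 <- data.frame(Groups = c(1, 2, 1, 2, 1, 2),
                  variable = c("Yes*", "Yes*", "Maybe*", "Maybe*", "No*", "No*"),
                  value = c(5, 6, 7, 8, 9, 10))

# arbitrary base colors (an assumption, not taken from the question)
base_cols <- c(Maybe = "#F8766D", No = "#00BA38", Yes = "#619CFF")

# six named fill values: semi-transparent for df, opaque for the starred levels of df2
pal <- c(setNames(alpha(unname(base_cols), 0.2), names(base_cols)),
         setNames(base_cols, paste0(names(base_cols), "*")))

p <- ggplot(mapping = aes(Groups, value, fill = variable)) +
  geom_col(data = df, position = "dodge") +   # background bars, transparency comes from pal
  geom_col(data = df2, position = "dodge") +  # foreground bars, fully opaque
  scale_fill_manual(values = pal)
p
```

Because both layers share one fill scale, all six levels end up in a single legend, and the transparent entries are drawn with their transparency.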

How to automatically change label color depending on relative values (maximum/minimum)?

In order to make a dynamic visualization, for example in a dashboard, I want the label color (for percentages or totals) to be black or white depending on the underlying values.
As you can see from my reprex below, I manually changed the color of the label with the highest percentage to black to make it more readable.
Is there a way to set the label color automatically? The label with the highest percentage should always be black, even when the data changes over time.
library(ggplot2)
library(dplyr)
set.seed(3)
reviews <- data.frame(review_star = as.character(sample.int(5,400, replace = TRUE)),
stars = 1)
df <- reviews %>%
group_by(review_star) %>%
count() %>%
ungroup() %>%
mutate(perc = `n` / sum(`n`)) %>%
arrange(perc) %>%
mutate(labels = scales::percent(perc))
ggplot(df, aes(x = "", y = perc, fill = review_star)) +
geom_col(color = "black") +
geom_label(aes(label = labels), color = c( "white", "white","white",1,"white"),
position = position_stack(vjust = 0.5),
show.legend = FALSE) +
guides(fill = guide_legend(title = "Answer")) +
scale_fill_viridis_d() +
coord_polar(theta = "y") +
theme_void()
You can set the colors using replace(rep('white', nrow(df)), which.max(df$perc), 'black'):
ggplot(df, aes(x = "", y = perc, fill = review_star)) +
geom_col(color = "black") +
geom_label(aes(label = labels),
color = replace(rep('white', nrow(df)), which.max(df$perc), 'black'),
position = position_stack(vjust = 0.5),
show.legend = FALSE) +
guides(fill = guide_legend(title = "Answer")) +
scale_fill_viridis_d() +
coord_polar(theta = "y") +
theme_void()
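To see what the replace() call evaluates to, here is the same expression on a small made-up vector of percentages (perc is hypothetical, not the data from the question):

```r
# hypothetical shares; the third one is the largest
perc <- c(0.18, 0.20, 0.25, 0.17, 0.20)

# start with all-white labels, then switch the position of the maximum to black
label_cols <- replace(rep("white", length(perc)), which.max(perc), "black")
label_cols
# [1] "white" "white" "black" "white" "white"
```

Note that which.max() returns only the first position when the maximum is tied, so in that case only one label turns black.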

Add secondary axis to stacked bar chart

I have this code which creates the plot below
library(ggplot2)
library(dplyr)
sec_axis_data <- mpg %>%
group_by(manufacturer) %>%
summarise(entries = n())
p <- ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
geom_bar(position = "fill") +
scale_fill_manual(values = c('blue', 'red')) +
scale_y_continuous(sec.axis = sec_axis(~. * 50))
p
However, I'm not sure how to get the secondary axis data to display properly as a line across the plot? When, for example, I try:
p <- ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
geom_bar(position = "fill") +
scale_fill_manual(values = c('blue', 'red')) +
scale_y_continuous(sec.axis = sec_axis(~. * 50)) +
geom_line(data = sec_axis_data, aes(x = manufacturer, y = entries))
p
... I get an error. I think the issue is linked to the different lengths of the data for mpg and sec_axis_data, but I'm not sure how to resolve this.
You were quite close to the solution.
You need to add inherit.aes = FALSE, because the variable mapped to fill (class) is not found in your second data frame.
Also, to place the line at the appropriate height, you need to divide your "entries" values by the same factor you used to build the second axis in sec_axis():
library(ggplot2)
ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
geom_bar(position = "fill", alpha = 0.5) +
scale_fill_manual(values = c('blue', 'red')) +
scale_y_continuous(sec.axis = sec_axis(~. * 50, name = "Second axis")) +
geom_line(inherit.aes = FALSE, data = sec_axis_data,
aes(x = manufacturer, y = entries/50, group = 1), size = 2)
Does it answer your question?
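The factor 50 is just the number that links the two axes: whatever the secondary axis multiplies by, the line data has to be divided by. Here is a sketch that derives the factor from the data instead of hard-coding it (using the largest count as the factor is my assumption, chosen so that the line spans the full 0-1 range):

```r
library(ggplot2)
library(dplyr)

sec_axis_data <- mpg %>%
  group_by(manufacturer) %>%
  summarise(entries = n())

# scaling factor shared by the secondary axis and the line (assumption: largest count)
k <- max(sec_axis_data$entries)

p <- ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
  geom_bar(position = "fill", alpha = 0.5) +
  scale_fill_manual(values = c("blue", "red")) +
  # the secondary axis maps the 0-1 primary range back to counts
  scale_y_continuous(sec.axis = sec_axis(~ . * k, name = "Entries")) +
  # the line is drawn on the primary scale, so the counts are divided by k
  geom_line(data = sec_axis_data, inherit.aes = FALSE,
            aes(x = manufacturer, y = entries / k, group = 1))
p
```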

Aesthetics must be either length 1 or the same as the data (1): x, y, label

I'm working on some data on party polarization (something like this) and used geom_dumbbell from ggalt and ggplot2. I keep getting the same aes error and other solutions in the forum did not address this as effectively. This is my sample data.
df <- data_frame(policy=c("Not enough restrictions on gun ownership", "Climate change is an immediate threat", "Abortion should be illegal"),
Democrats=c(0.54, 0.82, 0.30),
Republicans=c(0.23, 0.38, 0.40),
diff=sprintf("+%d", as.integer((Democrats-Republicans)*100)))
I wanted to keep order of the plot, so converted policy to factor and wanted % to be shown only on the first line.
df <- arrange(df, desc(diff))
df$policy <- factor(df$policy, levels=rev(df$policy))
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
x[2:length(x)] <- sub("%$", "", x[2:length(x)])
x
}
Then I used ggplot that rendered something close to what I wanted.
gg2 <- ggplot()
gg2 <- gg + geom_segment(data = df, aes(y=country, yend=country, x=0, xend=1), color = "#b2b2b2", size = 0.15)
# making the dumbbell
gg2 <- gg + geom_dumbbell(data=df, aes(y=country, x=Democrats, xend=Republicans),
size=1.5, color = "#B2B2B2", point.size.l=3, point.size.r=3,
point.color.l = "#9FB059", point.color.r = "#EDAE52")
I then wanted the dumbbell to read Democrat and Republican on top to label the two points (like this). This is where I get the error.
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Democrats, y=country, label="Democrats"),
color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Republicans, y=country, label="Republicans"),
color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
Any thoughts on what I might be doing wrong?
I think it would be easier to build your own "dumbbells" with geom_path() and geom_point(). Working with your df and changing the variable references from "country" to "policy":
library(tidyverse)
# gather data into long form to make ggplot happy
df2 <- gather(df,"party", "value", Democrats:Republicans)
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
# our dumbbell
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
# the text labels
geom_text(aes(label = party), vjust = -1.5) + # use vjust to shift the text up to avoid overlap
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) + # named vector to map colors to values in df2
scale_x_continuous(limits = c(0,1), labels = scales::percent) # format the axis with scales::percent instead of pasting "%"
This produces a plot that has some overlapping labels. I think you could avoid that if you use just the first letter of party, like this:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(aes(label = gsub("^(\\D).*", "\\1", party)), vjust = -1.5) + # just the first letter instead
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red"),
guide = "none") +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
To label only the top issue with names:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(data = filter(df2, policy == "Not enough restrictions on gun ownership"),
aes(label = party), vjust = -1.5) +
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
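In current versions of tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshaping step with pivot_longer(), using the sample data from the question (built with tibble() here, since data_frame() is deprecated, and without the diff column for brevity):

```r
library(tidyr)
library(tibble)

df <- tibble(
  policy = c("Not enough restrictions on gun ownership",
             "Climate change is an immediate threat",
             "Abortion should be illegal"),
  Democrats = c(0.54, 0.82, 0.30),
  Republicans = c(0.23, 0.38, 0.40)
)

# one row per policy/party pair; same columns as the gather() call, though the row order differs
df2 <- pivot_longer(df, c(Democrats, Republicans),
                    names_to = "party", values_to = "value")
df2
```

The resulting df2 can be passed to the ggplot() calls above unchanged.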

stacked geom_bar issue with stacked bar and labels misplaced

I have this bar chart:
group = c("A","A","B","B")
value = c(25,-75,-40,-76)
day = c(1,2,1,2)
dat = data.frame(group = group , value = value, day = day)
ggplot(data = dat, aes(x = group, y = value, fill = factor(day))) +
geom_bar(stat = "identity", position = "identity")+
geom_text(aes(label = round(value,0)), color = "black", position = "stack")
and I'd like the bars stacked and the values to show up. When I run the code above, the -76 is not in the correct location (and neither is the -75, it seems).
Any idea how to get the numbers to appear in the correct location?
ggplot(data=dat, aes(x=group, y=value, fill=factor(day))) +
geom_bar(stat="identity", position="identity")+
geom_text(label =round(value,0),color = "black")+
scale_y_continuous(breaks=c(-80,-40,0))
Stacking a mix of negative and positive values is difficult for ggplot2. The easiest thing to do is to split the dataset into two, one for positives and one for negatives, and then add bar layers separately. A classic example is here.
You can do the same thing with the text, adding one text layer for the positive y values and one for the negatives.
dat1 = subset(dat, value >= 0)
dat2 = subset(dat, value < 0)
ggplot(mapping = aes(x = group, y = value, fill = factor(day))) +
geom_bar(data = dat1, stat = "identity", position = "stack")+
geom_bar(data = dat2, stat = "identity", position = "stack") +
geom_text(data = dat1, aes(label = round(value,0)), color = "black", position = "stack") +
geom_text(data = dat2, aes(label = round(value,0)), color = "black", position = "stack")
If you are using the current development version of ggplot2 (2.1.0.9000), stacking does not seem to work correctly in geom_text for negative values. You can always calculate the text positions "by hand" if you need to.
library(dplyr)
dat2 = dat2 %>%
group_by(group) %>%
mutate(pos = cumsum(value))
ggplot(mapping = aes(x = group, y = value, fill = factor(day))) +
geom_bar(data = dat1, stat = "identity", position = "stack")+
geom_bar(data = dat2, stat = "identity", position = "stack") +
geom_text(data = dat1, aes(label = round(value,0)), color = "black") +
geom_text(data = dat2, aes(label = round(value,0), y = pos), color = "black")
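For what it's worth, ggplot2 2.2.0 and later stack negative values separately from positive ones, so on a recent version the split into two data sets may not be needed. A minimal sketch under that assumption, centering each label in its bar segment with position_stack(vjust = 0.5):

```r
library(ggplot2)

dat <- data.frame(group = c("A", "A", "B", "B"),
                  value = c(25, -75, -40, -76),
                  day = c(1, 2, 1, 2))

p <- ggplot(dat, aes(x = group, y = value, fill = factor(day))) +
  geom_col() +  # geom_col() uses position = "stack" by default
  # same stacking for the labels; vjust = 0.5 puts each label in the middle of its segment
  geom_text(aes(label = round(value, 0)), color = "black",
            position = position_stack(vjust = 0.5))
p
```

Using the same position for the bars and the text keeps the labels attached to their segments without computing cumulative sums by hand.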
