Formatting GGplot stacked barplot - r

I am making a set of scorecards where I am generating a set of graphs that show the distribution of responses from a survey and also where the response for a specific company falls. I need to modify the formatting of a graph, a stacked barchart, and add a few features I’ve outlined below. I’ve already spent a few hours getting my chart to where it is now and would appreciate your help with the features I outline below.
Data is
Data<-data.frame(Reviewed = c("Annually", "Annually", "Hourly", "Monthly", "Weekly","Monthly","Weekly","Other","Other","Monthly","Weekly"),Company=c("a","b","c","d","e","f","g","h","i","j","k"),Question="Q1")
So far I’ve developed this
ggplot(Data, aes(x="Question", fill=Reviewed)) + geom_bar(position='fill' ) +
coord_flip()
I would like to do the following:
Order the variables so they are arranged on plot as follows: Annually,Monthly,Weekly,Hourly,Other
Express the y axis in terms of percent. I.e. 0.25 turns into 25%
Move y-axis directly underneath the bar.
Remove the legend but move the terms underneath the respective part of the graph on a diagonal slant.
Add a black line that cuts down the 50% mark
Add a dot in at the midpoint of the stack for the value of company “e”.
Remove gray background
This is what I'm hoping the finished graph will look like.

There's a lot to unpack here, so I'll break it down bit by bit:
Order the variables so they are arranged on plot as follows: Annually,Monthly,Weekly,Hourly,Other
Assign "Reviewed" as an ordered factor. I'm reversing the order here because position = 'fill' stacks the first level farthest from zero, so the last level ends up on the left after coord_flip().
Data$Reviewed <- factor(Data$Reviewed,
levels = rev(c('Annually', 'Monthly', 'Weekly', 'Hourly', 'Other')),
ordered = TRUE)
ggplot(Data, aes(x="Question", fill=Reviewed)) + geom_bar(position='fill' ) +
coord_flip()
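A quick way to confirm the ordering before plotting is to inspect the factor in base R. This is only a sketch on the Reviewed values from the question; the names reviewed and wanted are introduced here for illustration.

```r
# Reviewed values copied from the question's data frame
reviewed <- c("Annually", "Annually", "Hourly", "Monthly", "Weekly", "Monthly",
              "Weekly", "Other", "Other", "Monthly", "Weekly")

# Desired left-to-right order on the plot (illustrative name)
wanted <- c("Annually", "Monthly", "Weekly", "Hourly", "Other")

# Same construction as above: the levels are stored in reverse
reviewed <- factor(reviewed, levels = rev(wanted), ordered = TRUE)

levels(reviewed)  # level order as stored, i.e. rev(wanted)
table(reviewed)   # responses per level, listed in level order
```

The counts from table() are also the numbers that determine how wide each segment of the filled bar will be.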
Express the y axis in terms of percent. I.e. 0.25 turns into 25%
Use scale_y_continuous(labels = scales::percent) to adjust the labels. The scales package is installed as a dependency of ggplot2, so it should already be available.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip()
Move y-axis directly underneath the bar.
Remove gray background
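Both are handled at once by adding expand = F to coord_flip: with no padding around the bar, the axis sits directly beneath it and none of the gray panel background is left showing.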
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip(expand = F)
Remove the legend...
Add theme(legend.position = 'none').
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
scale_y_continuous(labels = scales::percent) +
coord_flip(expand = F) +
theme(legend.position = 'none')
but move the terms underneath the respective part of the graph on a diagonal slant.
This is tougher and takes a good amount of fiddling.
Use geom_text to make the labels
Calculate the position along the bar using the 'count' stat
Move the labels to the bottom of the plot by providing a fake x coordinate
Align the labels in the center of the bars using position_stack, and make them abut the x axis using hjust.
Add angle.
Use clip = 'off' in coord_flip to make sure that these values are not cut out since they're outside the plotting area.
Fiddle with the x limits to crop out empty plotting area.
Adjust the plot margin in theme to make sure everything can be seen.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = after_stat(count / sum(count))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 35, 10),
legend.position = 'none')
Add a black line that cuts down the 50% mark
Use geom_hline(yintercept = 0.5); remember that it's a "horizontal" line since the coordinates are flipped.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = after_stat(count / sum(count))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 20, 10),
legend.position = 'none')
Add a dot in at the midpoint of the stack for the value of company “e”.
This is pretty hack-y. Using the same y values as in geom_text, use geom_point to plot a point for every value of Reviewed, then use position_stack(0.5) to nudge them to the center of the bar. Then use scale_color_manual to only color "Weekly" values (which is the corresponding value of Reviewed for Company "e"). I'm sure there's a way to do this more programmatically.
ggplot(Data, aes(x="Question", fill=Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = after_stat(count / sum(count))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
geom_point(aes(y = after_stat(count / sum(count)),
color = Reviewed), stat = 'count',
position = position_stack(0.5), size = 5) +
scale_color_manual(values = 'black', limits = 'Weekly', na.value = NA) + # na.value = NA keeps the other categories' points hidden
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off',expand = F) +
theme(plot.margin = margin(0, 0, 20, 10),
legend.position = 'none')
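One more programmatic route, offered as a sketch rather than a drop-in replacement: compute the midpoint of company e's segment directly and draw a single point with annotate(). It assumes the segments are laid out from 0 in the order Annually, Monthly, Weekly, Hourly, Other (the order requested in the question); the names wanted, props, ends, target and mid_e are introduced here for illustration.

```r
library(ggplot2)

Data <- data.frame(
  Reviewed = c("Annually", "Annually", "Hourly", "Monthly", "Weekly", "Monthly",
               "Weekly", "Other", "Other", "Monthly", "Weekly"),
  Company  = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"),
  Question = "Q1"
)

# Assumed left-to-right order of the segments, starting at 0
wanted <- c("Annually", "Monthly", "Weekly", "Hourly", "Other")

# Share of responses per category, in plotting order
props <- as.numeric(table(factor(Data$Reviewed, levels = wanted))) / nrow(Data)
names(props) <- wanted

# Right edge of each segment, then the midpoint of the target company's segment
ends   <- cumsum(props)
target <- as.character(Data$Reviewed[Data$Company == "e"])
mid_e  <- unname(ends[target] - props[target] / 2)

# Reversed levels, as in the answer, so Annually is drawn nearest 0
Data$Reviewed <- factor(Data$Reviewed, levels = rev(wanted))

p <- ggplot(Data, aes(x = "Question", fill = Reviewed)) +
  geom_bar(position = "fill") +
  # x = 1 is the position of the single bar on the discrete axis
  annotate("point", x = 1, y = mid_e, size = 5) +
  scale_y_continuous(labels = scales::percent) +
  coord_flip(expand = FALSE) +
  theme(legend.position = "none")
p
```

Changing "e" to another company letter moves the dot without touching any colour scale.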
This is what I'm hoping the finished graph will look like.
Prettying things up:
ggplot(Data, aes(x="Question", fill = Reviewed)) +
geom_bar(position = 'fill') +
geom_text(aes(label = Reviewed, x = 0.45,
y = after_stat(count / sum(count))), stat = 'count',
position = position_stack(0.5),
hjust = 0,
angle = 45) +
geom_hline(yintercept = 0.5) +
geom_point(aes(y = after_stat(count / sum(count)),
color = Reviewed), stat = 'count',
position = position_stack(0.5), size = 5) +
scale_color_manual(values = 'black', limits = 'Weekly', na.value = NA) + # na.value = NA keeps the other categories' points hidden
scale_y_continuous(labels = scales::percent) +
coord_flip(xlim = c(0.555, 1.4), clip = 'off', expand = F) +
labs(x = NULL, y = NULL) +
theme_minimal() +
theme(plot.margin = margin(0, 0, 35, 10),
legend.position = 'none')

Related

Why are colours appearing in the labels of my gganimate sketch?

I have a gganimate sketch in R and I would like to have the percentages of my bar chart appear as labels.
But for some bizarre reason, I am getting seemingly random colours in place of the labels that I'm requesting.
If I run the ggplot part without animating then it's a mess (as it should be), but it's obvious that the percentages are appearing correctly.
Any ideas? The colour codes don't correspond to the colours of the bars which I have chosen separately. The codes displayed also cycle through about half a dozen different codes, at a rate different to the frame rate that I selected. And while the bars are the same height (they grow until they reach the chosen height displayed in the animation) then they display the same code until they stop and it gets frozen.
Code snippet:
df_new <- data.frame(index, rate, year, colour)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"), paste0(round(df_new$rate, 1), "%"))
p <- ggplot(df_new, aes(x = year, y = rate, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = colour) +
#geom_text(aes(y = rate, label = paste0(rate, "%")), vjust = -0.7) +
geom_shadowtext(aes(y = rate, label = rate_label),
bg.colour='white',
colour = 'black',
size = 9,
fontface = "bold",
vjust = -0.7,
alpha = 1
) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none") +
theme(plot.title = element_text(size = 18, face = "bold")) +
theme(axis.text = element_text(size = 14)) +
scale_y_continuous(limits = c(0, 45), breaks = 10*(0:4))
p
p <- p + transition_reveal(index) + view_follow(fixed_y = T)
animate(p, renderer = gifski_renderer(), nframes = 300, fps = frame_rate, height = 500, width = 800,
end_pause = 0)
anim_save("atheism.gif")
I think you have missed a few subtle points about ggplot2, which I will try to describe. First of all, discrete values need to be supplied as a factor or integer, so you can call as.factor() before plotting or wrap the variable in factor() inside the aesthetic. You should also round the percentages to the precision you want. Here is an example:
set.seed(2023)
df_new <- data.frame(index=1:10, rate=runif(10), year=2001:2010, colour=1:10)
df_new$rate_label <- ifelse(round(df_new$rate, 1) %% 1 == 0,
paste0(round(df_new$rate, 1), ".0%"),
paste0(round(df_new$rate, 1), "%"))
The ggplot for this data is:
library(ggplot2)
p <- ggplot(df_new, aes(x = factor(year), y = rate, fill = factor(colour))) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(y = rate, label = paste0(round(rate,2), "%")), vjust = -0.7) +
coord_cartesian(clip = 'off') +
ggtitle("% population belonging to 'No religion', England and Wales census") +
theme_minimal() +
xlab("") + ylab("") +
theme(legend.position = "none",
plot.title = element_text(size = 18, face = "bold"),
axis.text = element_text(size = 14))
p
You can also combine all theme elements in a single theme() call (as I did). The output is:
And you can easily animate the plot using the following code:
library(gganimate)
p + transition_reveal(index)
And the output is as below:
Hope it helps.
So it was answered here although I don't know why the fix works.
For some reason, labels need to go into gganimate as factors, via as.factor().
I just had to add the line:
df_new$rate_label <- as.factor(df_new$rate_label)
and it works fine.
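As a side note on building the labels themselves, sprintf() can format to one decimal place in a single step, which gives the same kind of label as the ifelse()/paste0() construction, and the result can be converted to a factor straight away as in the fix above. A minimal sketch; the rate values below are made up for illustration.

```r
# Hypothetical rates (percentages), not the census figures
rate <- c(15, 25.1, 37.2)

# "%.1f" always prints one decimal place; "%%" is a literal percent sign
rate_label <- sprintf("%.1f%%", rate)
rate_label  # "15.0%" "25.1%" "37.2%"

# Convert to a factor, as in the fix above
rate_label <- as.factor(rate_label)
```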

Breaking y-axis in ggplot2 with geom_bar

I'm having a hard time dealing with this plot.
The height of the values at ANI > 96 makes it hard to read the red and blue percentage text.
I tried to break the y-axis by following answers from other posts on Stack Overflow, but failed.
Any suggestions?
Thanks.
library(data.table)
library(ggplot2)
dt <- data.table("ANI"= sort(c(seq(79,99),seq(79,99))), "n_pairs" = c(5, 55, 13, 4366, 6692, 59568, 382873, 397996, 1104955, 282915,
759579, 261170, 312989, 48423, 120574, 187685, 353819, 79468, 218039, 66314, 41826, 57668, 112960, 81652, 28613,
64656, 21939, 113656, 170578, 238967, 610234, 231853, 1412303, 5567, 4607268, 5, 14631942, 0, 17054678, 0, 3503846, 0),
"same/diff" = rep(c("yes","no"), 21))
for (i in 1:nrow(dt)) {
if (i%%2==0) {
next
}
total <- dt$n_pairs[i] + dt$n_pairs[i+1]
dt$total[i] <- total
dt$percent[i] <- paste0(round(dt$n_pairs[i]/total *100,2), "%")
dt$total[i+1] <- total
dt$percent[i+1] <- paste0(round(dt$n_pairs[i+1]/total *100,2), "%")
}
ggplot(data=dt, aes(x=ANI, y=n_pairs, fill=`same/diff`)) +
geom_text(aes(label=percent), position=position_dodge(width=0.9), hjust=0.75, vjust=-0.25) +
geom_bar(stat="identity") + scale_x_continuous(breaks = dt$ANI) +
labs(x ="ANI", y = "Number of pairs", fill = "Share one common species taxonomy?") +
theme_classic() + theme(legend.position="bottom")
Here is the list of major changes I made:
I reduced the y axis range by zooming into the chart with the ylim argument of coord_flip, which zooms the way coord_cartesian does instead of dropping data.
coord_flip should also improve the readability of the chart by switching x and y. I don't know whether the switch is desirable for you.
Dodging now also works as expected: two bars next to each other with the labels on top (on the left in this case).
I set geom_bar before geom_text so that the text is always in front of the bars in the chart.
I set scale_y_continuous to change the labels of the y axis (in the chart the x axis because of the switch) to improve the readability of the zeros.
ggplot(data=dt, aes(x = ANI, y = n_pairs, fill = `same/diff`)) +
geom_bar(stat = "identity", position = position_dodge2(width = 1), width = 0.8) +
geom_text(aes(label = percent), position = position_dodge2(width = 1), hjust = 0, size = 3) +
scale_x_continuous(breaks = dt$ANI) +
scale_y_continuous(labels = scales::comma) +
labs(x ="ANI", y = "Number of pairs", fill = "Share one common species taxonomy?") +
theme_classic() +
theme(legend.position = "bottom") +
coord_flip(ylim = c(0, 2e6))
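The reason for zooming through the coord rather than setting limits on the y scale can be seen on a toy example. This is a sketch with made-up numbers (the names toy, p_zoom and p_lim are illustrative): limits on the scale treat taller bars as out of range, while coord_flip(ylim = ...) only changes the visible window.

```r
library(ggplot2)

# Made-up data: one bar is taller than the intended window
toy <- data.frame(x = c(1, 2), y = c(1e6, 3e6))

base <- ggplot(toy, aes(x, y)) + geom_bar(stat = "identity")

# Zooming: every bar is kept, the tall one simply runs off the edge
p_zoom <- base + coord_flip(ylim = c(0, 2e6))

# Scale limits: values beyond the limits are treated as missing
p_lim <- base + scale_y_continuous(limits = c(0, 2e6)) + coord_flip()

layer_data(p_zoom)$y  # both values still present
layer_data(p_lim)$y   # the out-of-range value is replaced by NA
```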
EDIT
This way the columns and labels are stacked, but the labels never overlap.
ggplot(data=dt, aes(x = ANI, y = n_pairs, fill = `same/diff`)) +
geom_bar(stat = "identity", width = 0.8) +
geom_text(aes(label = percent,
hjust = ifelse(`same/diff` == "yes", 1, 0)),
position = "stack", size = 3) +
scale_x_continuous(breaks = dt$ANI) +
scale_y_continuous(labels = scales::comma) +
labs(x ="ANI", y = "Number of pairs", fill = "Share one common species taxonomy?") +
theme_classic() +
theme(legend.position = "bottom") +
coord_flip(ylim = c(0, 2e6))
Alternatively, you can avoid labels overlapping with check_overlap = TRUE, but sometimes one of the labels will not be shown.
ggplot(data=dt, aes(x = ANI, y = n_pairs, fill = `same/diff`)) +
geom_bar(stat = "identity", width = 0.8) +
geom_text(aes(label = percent), hjust = 1, position = "stack", size = 3, check_overlap = TRUE) +
scale_x_continuous(breaks = dt$ANI) +
scale_y_continuous(labels = scales::comma) +
labs(x ="ANI", y = "Number of pairs", fill = "Share one common species taxonomy?") +
theme_classic() +
theme(legend.position = "bottom") +
coord_flip(ylim = c(0, 2e6))
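Separately from the plotting changes, the for loop in the question that fills total and percent pair by pair could be written as grouped data.table assignments. This is a sketch on a toy table with made-up numbers; only the column names follow the question.

```r
library(data.table)

# Toy table with the question's column names; the counts are made up
dt <- data.table(ANI = c(79, 79, 80, 80),
                 n_pairs = c(5, 15, 30, 10),
                 `same/diff` = rep(c("yes", "no"), 2))

# Total per ANI value, added by reference
dt[, total := sum(n_pairs), by = ANI]

# Share of each row within its ANI group, formatted like the original loop
dt[, percent := paste0(round(n_pairs / total * 100, 2), "%")]

dt$percent  # "25%" "75%" "75%" "25%"
```

Grouping by ANI also removes the loop's reliance on the rows arriving in strict yes/no pairs.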

Expand y scale but limit height of y axis line

My goal is to make a simple column chart in ggplot2 that looks like the following chart (made in Excel):
What I'm finding is that, with example data such as this (where one percentage value is very close to 100%), my options for plotting this data in ggplot2 leave something to be desired. In particular, I haven't found a way to make the following two simple things happen together:
1) Make the y-axis line end at 100%
and
2) Make the percentage labels over each bar visible
To address this issue, I've tried experimenting with different arguments to scale_y_continuous() but haven't found a way to meet both of the goals above at the same time. You can see this in the example plots and code below.
My question is: how do I expand the y scale so that my percentage labels over each data point are visible, but the y-axis line ends at 100%?
library(dplyr)
library(ggplot2)
library(scales)
# data_frame() is deprecated; tibble() is its replacement
example_df <- tibble(Label = c("A", "B"),
Percent = c(0.5, 0.99))
example_plot <- example_df %>%
ggplot(aes(x = Label, y = Percent)) +
geom_bar(stat = "identity",
fill = "dodgerblue4", width = .6) +
geom_text(aes(label = percent(Percent)),
size = 3, vjust = -0.5) +
scale_x_discrete(NULL, expand = c(0, .5)) +
theme_classic()
Plot with desired y-axis line, but non-visible label over bar
Here is what happens when I set the limit on scale_y_continuous() to c(0,1):
example_plot +
scale_y_continuous(NULL, limits = c(0, 1.0), breaks = seq(0, 1, .2),
labels = function(x) scales::percent(x),
expand = c(0, 0)) +
labs(title = "Y axis line looks perfect, but the label over the bar is off")
Plot with y-axis line too long, but visible label over bar
And here is what happens when I set the limit on scale_y_continuous() to c(0,1.05):
example_plot +
scale_y_continuous(NULL, limits = c(0, 1.05), breaks = seq(0, 1, .2),
labels = function(x) scales::percent(x),
expand = c(0, 0)) +
labs(title = "Y axis line is too long, but the label over the bar is visible")
You could remove the regular axis line and then use geom_segment to create a new one:
example_df %>%
ggplot(aes(x = Label, y = Percent)) +
geom_bar(stat = "identity", fill = "dodgerblue4", width = .6) +
geom_text(aes(label = percent(Percent)), size = 3, vjust = -0.5) +
scale_x_discrete("", expand = c(0, .5)) +
scale_y_continuous("", breaks = seq(0, 1, .2), labels = percent, limits=c(0,1.05),
expand=c(0,0)) +
theme_classic() +
theme(axis.line.y=element_blank()) +
geom_segment(x=.5025, xend=0.5025, y=0, yend=1.002)
To respond to your comment: Even when it's outside the plot area, the 99% label is still being drawn, but it's "clipped", meaning that plot elements outside the plot area are masked. So, another option, still hacky, but less hacky than my original answer, is to turn off clipping so that the label appears:
library(grid)
p = example_df %>%
ggplot(aes(x = Label, y = Percent)) +
geom_bar(stat = "identity", fill = "dodgerblue4", width = .6) +
geom_text(aes(label = percent(Percent)), size = 3, vjust = -0.5) +
scale_x_discrete("", expand = c(0, .5)) +
scale_y_continuous("", breaks = seq(0, 1, .2), labels = percent, limits=c(0,1),
expand=c(0,0)) +
theme_classic() +
theme(plot.margin=unit(c(10,0,0,0),'pt'))
# Turn off clipping
pg <- ggplot_gtable(ggplot_build(p))
pg$layout$clip[pg$layout$name=="panel"] <- "off"
grid.draw(pg)
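In ggplot2 versions where the coord functions accept a clip argument (coord_cartesian(clip = 'off') appears in other code on this page), the gtable editing above may not be needed. A sketch under that assumption, reusing the example data; the name p_clip is illustrative.

```r
library(ggplot2)
library(scales)

example_df <- data.frame(Label = c("A", "B"), Percent = c(0.5, 0.99))

p_clip <- ggplot(example_df, aes(x = Label, y = Percent)) +
  geom_bar(stat = "identity", fill = "dodgerblue4", width = .6) +
  geom_text(aes(label = percent(Percent)), size = 3, vjust = -0.5) +
  scale_x_discrete("", expand = c(0, .5)) +
  scale_y_continuous("", breaks = seq(0, 1, .2), labels = percent,
                     limits = c(0, 1), expand = c(0, 0)) +
  # clip = "off" lets the 99% label draw above the panel
  coord_cartesian(clip = "off") +
  theme_classic() +
  # the top margin leaves room for the label outside the panel
  theme(plot.margin = unit(c(10, 0, 0, 0), "pt"))

p_clip
```

The y limits stay at c(0, 1), so the axis line still ends at 100%.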

ggplot dot covering error bar

I have a huge file and I don't really know what small test dataset I can give here to produce the same problem in the plot, so I will not give any test dataset, I will only attach the plot image here to show the problem.
My code:
ggplot(tgc, aes(x=Week, y=MuFreq)) +
theme_gray(base_size=18) +
theme(plot.title=element_text(hjust=.5),
axis.title.x = element_text(face="bold"),
axis.title.y = element_text(face="bold")) +
geom_errorbar(aes(ymin=MuFreq-(1.96*se), ymax=MuFreq+(1.96*se)), width=3) +
geom_line() +
geom_point(aes(size= N), color="blue")+
scale_x_continuous(breaks=c(68,98,188), labels=c("Wk68", "Wk98", "Wk188")) +
scale_y_continuous(limits=c(0,0.15)) +
scale_size( breaks = unique(tgc$N))
The problem is that I'm sizing the dots by the sample size for each week. The middle dot does have error bars associated with it, but the dot is covering them. I tried to use horizontal error bars, but that didn't work because my x-axis is customized to be non-numerical.
What can I do to show the error bar that's being covered?
Also is there any way to make the background vertical grid lines spaced evenly?
The question asks to improve two things in the ggplot2 chart:
Show error bars that are being covered
Make the background vertical grid lines spaced evenly
Data
As the OP didn't supply any data, we need a dummy data set. This is easily done by reading values from the plot:
tgc <- data.frame(Week = c(68, 98, 188),
MuFreq = c(0.08, 0.09, 0.091),
se = c(0.003, 0.001, 0.019)/1.96,
N = c(91, 835, 7))
This reproduces the original plot quite nicely:
Variant 1
This one is picking up Nick Criswell's comments:
Change order in which layers are plotted, so that error bars are plotted on top
Change colour and alpha
plus
Remove all vertical grid lines except those which are explicitly specified as breaks. The distances between major grid lines are still uneven but reflect the difference in time.
With this code
library(ggplot2)
ggplot(tgc, aes(x = Week, y = MuFreq)) +
theme_gray(base_size = 18) +
theme(plot.title = element_text(hjust = .5),
axis.title = element_text(face = "bold")) +
geom_line() +
geom_point(aes(size = N), color = "dodgerblue1", alpha = 0.5) +
geom_errorbar(aes(ymin = MuFreq - (1.96 * se),
ymax = MuFreq + (1.96 * se)), width = 3) +
scale_x_continuous(
breaks = c(68, 98, 188),
labels = c("Wk68", "Wk98", "Wk188"),
minor_breaks = NULL
) +
scale_y_continuous(limits = c(0, 0.15)) +
scale_size(breaks = unique(tgc$N))
we do get:
Variant 2
To get evenly spaced data points on the x-axis we can turn the weeks into a factor. This requires telling ggplot2 that the data belong to one group so that the lines are plotted, and adding a custom x-axis label.
In addition, theme_bw is used instead of theme_gray:
library(ggplot2)
ggplot(tgc, aes(x = factor(Week, labels = c("Wk68", "Wk98", "Wk188")),
y = MuFreq, group = 1)) +
theme_bw(base_size = 18) +
theme(plot.title = element_text(hjust = .5),
axis.title = element_text(face = "bold")) +
geom_line() +
geom_point(aes(size = N), color = "dodgerblue1", alpha = 0.5) +
geom_errorbar(aes(ymin = MuFreq - (1.96 * se),
ymax = MuFreq + (1.96 * se)), width = 0.05 ) +
scale_y_continuous(limits = c(0, 0.15)) +
scale_size(breaks = unique(tgc$N)) +
xlab("Week")

Overlay circles in ggplot2

What I'm trying to do is overlay circles that have a dark outline on the ones I already have, but I'm not sure how to size them since the existing ones already vary in size. Also, is there any way to change the legend labels to something like $1M, $2M?
mikebay_usergraph <-
ggplot(mikebay_movies_dt, aes(y = tomatoUserMeter, x = Released, label = Title)) +
geom_point(aes(size = BoxOffice)) + (aes(color = tomatoImage)) +
geom_text(hjust = .45, vjust = -.75, family = "Futura", size = 5, colour = "#535353") +
ggtitle("The Fall of Bayhem: How Michael Bay movies have declined") +
theme(plot.title = element_text(size = 15, vjust = 1, family = "Futura"),
axis.text.x = element_text(size = 12.5, family = "Futura"),
axis.text.y = element_text(size = 12.0, family = "Futura"),
panel.background = element_rect(fill = '#F0F0F0'),
panel.grid.major=element_line(colour ="#D0D0D0",size=.75)) +
scale_colour_manual(values = c('#336333', '#B03530')) +
geom_hline(yintercept = 0,size = 1.2, colour = "#535353") +
scale_x_date(limits = c(as.Date("1994-1-1"),as.Date("2017-1-1"))) +
theme(axis.ticks = element_blank())
I offer two possible solutions for adding a circle or outline around size-scaled points in a scatterplot. For the first solution, I propose using plotting symbols that allow separate fill and outline colors. The drawback here is that older ggplot2 versions give you no control over the thickness of the outline; versions that support the stroke aesthetic for these shapes do. For the second solution, I propose adding an extra layer of slightly larger black points positioned under the primary geom_point layer. In this case, the thickness of the outline can be adjusted manually by setting thickness to a value between 0 and 1.
Finally, dollar legend formatting can be added by loading the scales package, and adding scale_size_continuous(labels=dollar) to your ggplot call.
library(ggplot2)
library(scales) # Needed for dollar labelling.
dat = data.frame(rating=c(80, 60, 40),
date=as.Date(c("1995-1-1", "2005-1-1", "2015-1-1")),
boxoffice=c(3e7, 1e8, 7e7),
tomato=c("fresh", "rotten", "rotten"))
p1 = ggplot(dat, aes(x=date, y=rating, size=boxoffice, fill=tomato)) +
geom_point(shape=21, colour="black") +
scale_fill_manual(values = c(fresh="green", rotten="red")) +
scale_size_continuous(labels=dollar, range=c(8, 22))
thickness = 0.35
p2 = ggplot(dat, aes(x=date, y=rating)) +
geom_point(colour="black",
aes(size=boxoffice + (thickness * mean(boxoffice)))) +
geom_point(aes(colour=tomato, size=boxoffice)) +
scale_colour_manual(values = c(fresh="green", rotten="red")) +
scale_size_continuous(labels=dollar, range=c(8, 22), name="Box Office")
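Two hedged additions to the first approach. Where the installed ggplot2 supports the stroke aesthetic for shapes 21 to 25, it sets the outline width directly; and scales::dollar() takes scale and suffix arguments, which can produce the $1M-style legend labels asked about. The helper name dollar_millions and the plot name p3 are introduced here for illustration.

```r
library(ggplot2)
library(scales)

dat <- data.frame(rating = c(80, 60, 40),
                  date = as.Date(c("1995-1-1", "2005-1-1", "2015-1-1")),
                  boxoffice = c(3e7, 1e8, 7e7),
                  tomato = c("fresh", "rotten", "rotten"))

# Illustrative helper: format dollars in millions, e.g. 3e7 -> "$30M"
dollar_millions <- function(x) {
  dollar(x, accuracy = 1, scale = 1e-6, suffix = "M")
}

p3 <- ggplot(dat, aes(x = date, y = rating, size = boxoffice, fill = tomato)) +
  # stroke sets the outline width of the filled circle (shape 21)
  geom_point(shape = 21, colour = "black", stroke = 1.5) +
  scale_fill_manual(values = c(fresh = "green", rotten = "red")) +
  scale_size_continuous(labels = dollar_millions, range = c(8, 22),
                        name = "Box Office")
p3
```

The stroke value of 1.5 is arbitrary; larger values give a heavier outline.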
