ggplot, facet, piechart: placing text in the middle of pie chart slices - r

I'm trying to produce a facetted pie chart with ggplot and am having trouble placing text in the middle of each slice:
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1), y=Cnt, label=Cnt, ymax=Cnt),
position=position_fill(width=1))
The output:
What parameters of geom_text should be adjusted in order to place numerical labels in the middle of piechart slices?
Pie plot getting its text on top of each other is a related question, but it doesn't handle the facetted case.
UPDATE: following Paul Hiemstra's advice and the approach in the question above, I changed the code as follows (new and changed lines are marked with comments):
pie_text = dat$Cnt/2 + c(0,cumsum(dat$Cnt)[-length(dat$Cnt)]) # new
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1),
y=pie_text, # changed
label=Cnt, ymax=Cnt), position=position_fill(width=1))
As I expected, the adjusted text coordinates are absolute (accumulated over the whole data set), but they need to be computed within each facet's data:

NEW ANSWER: With the introduction of ggplot2 v2.2.0, position_stack() can be used to position the labels without the need to calculate a position variable first. The following code will give you the same result as the old answer:
ggplot(data = dat, aes(x = "", y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
To remove the "hollow" center, adapt the code to:
ggplot(data = dat, aes(x = 0, y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
scale_x_continuous(expand = c(0,0)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
OLD ANSWER: The solution to this problem is creating a position variable, which can be done quite easily with base R or with the data.table, plyr or dplyr packages:
Step 1: Creating the position variable for each Channel
# with base R
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5*x))
# with the data.table package
library(data.table)
setDT(dat)
dat <- dat[, pos:=cumsum(Cnt)-0.5*Cnt, by="Channel"]
# with the plyr package
library(plyr)
dat <- ddply(dat, .(Channel), transform, pos=cumsum(Cnt)-0.5*Cnt)
# with the dplyr package
library(dplyr)
dat <- dat %>% group_by(Channel) %>% mutate(pos=cumsum(Cnt)-0.5*Cnt)
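As a quick sanity check of the formula cumsum(Cnt) - 0.5*Cnt, the base R sketch below recomputes the label positions for the question's data. The name pos_check is only illustrative; the calculation is the same one used in the base R variant above.

```r
dat <- read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header = TRUE)

# Midpoint of each slice: running total within the channel minus half the slice.
# ave() applies the function separately to the rows of each Channel.
pos_check <- ave(dat$Cnt, dat$Channel, FUN = function(x) cumsum(x) - 0.5 * x)

# Show the counts next to the computed label positions
print(data.frame(dat, pos = pos_check))
```

For AGENT the running totals are 8344, 13792 and 37615, so the midpoints come out as 4172, 11068 and 25703.5; the KIOSK positions restart from zero because the grouping is by Channel.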
Step 2: Creating the facetted plot
library(ggplot2)
ggplot(data = dat) +
geom_bar(aes(x = "", y = Cnt, fill = Volume), stat = "identity") +
geom_text(aes(x = "", y = pos, label = Cnt)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
The result:

I would like to speak out against the conventional way of making pies in ggplot2, which is to draw a stacked barplot in polar coordinates. While I appreciate the mathematical elegance of that approach, it does cause all sorts of headaches when the plot doesn't look quite the way it's supposed to. In particular, precisely adjusting the size of the pie can be difficult. (If you don't know what I mean, try to make a pie chart that extends all the way to the edge of the plot panel.)
I prefer drawing pies in a normal cartesian coordinate system, using geom_arc_bar() from ggforce. It requires a little bit of extra work on the front end, because we have to calculate angles ourselves, but that's easy and the level of control we get as a result is more than worth it.
I've used this approach in previous answers here and here.
The data (from the question):
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
The pie-drawing code:
library(ggplot2)
library(ggforce)
library(dplyr)
# calculate the start and end angles for each pie
dat_pies <- left_join(dat,
dat %>%
group_by(Channel) %>%
summarize(Cnt_total = sum(Cnt))) %>%
group_by(Channel) %>%
mutate(end_angle = 2*pi*cumsum(Cnt)/Cnt_total, # ending angle for each pie slice
start_angle = lag(end_angle, default = 0), # starting angle for each pie slice
mid_angle = 0.5*(start_angle + end_angle)) # middle of each pie slice, for the text label
rpie = 1 # pie radius
rlabel = 0.6 * rpie # radius of the labels; a number slightly larger than 0.5 seems to work better,
# but 0.5 would place it exactly in the middle as the question asks for.
# draw the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt),
hjust = 0.5, vjust = 0.5) +
coord_fixed() +
scale_x_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
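The angle bookkeeping can also be checked in isolation. The following base R sketch uses only the AGENT counts from the question (the variable names cnt and label_xy are illustrative) and shows that the last slice ends at 2*pi and that every label sits on a circle of radius rlabel. Pairing x with sin() and y with cos() means the angle is measured clockwise from the 12 o'clock position.

```r
cnt <- c(high = 8344, medium = 5448, low = 23823)  # AGENT counts from the question

end_angle   <- 2 * pi * cumsum(cnt) / sum(cnt)     # where each slice ends
start_angle <- c(0, head(end_angle, -1))           # where each slice starts
mid_angle   <- 0.5 * (start_angle + end_angle)     # direction of the label

rlabel <- 0.6                                      # label radius, as in the answer
label_xy <- data.frame(x = rlabel * sin(mid_angle),
                       y = rlabel * cos(mid_angle))
print(label_xy)
```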
To show why I think this approach is so much more powerful than the conventional (coord_polar()) approach, let's say we want the labels on the outside of the pie rather than inside. This creates a couple of problems: we have to adjust hjust and vjust depending on the side of the pie on which a label falls, and we have to make the plot panel wider than it is high to make space for the labels on the side without generating excessive space above and below. Solving these problems in the polar coordinate approach is not fun, but it's trivial in cartesian coordinates:
# generate hjust and vjust settings depending on the quadrant into which each
# label falls
dat_pies <- mutate(dat_pies,
hjust = ifelse(mid_angle>pi, 1, 0),
vjust = ifelse(mid_angle<pi/2 | mid_angle>3*pi/2, 0, 1))
rlabel = 1.05 * rpie # now we place labels outside of the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt,
hjust = hjust, vjust = vjust)) +
coord_fixed() +
scale_x_continuous(limits = c(-1.5, 1.4), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
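To make the quadrant rule explicit, here is a small base R sketch that wraps the two ifelse() calls in a helper (label_just is a hypothetical name, not part of ggplot2 or ggforce) and evaluates it for one angle in each quadrant, going clockwise from the top.

```r
# Justification for a label placed outside the pie at a given mid angle.
# Angles are in radians, measured clockwise from 12 o'clock.
label_just <- function(mid_angle) {
  data.frame(
    hjust = ifelse(mid_angle > pi, 1, 0),                              # left half: right-align
    vjust = ifelse(mid_angle < pi / 2 | mid_angle > 3 * pi / 2, 0, 1)  # top half: sit above the point
  )
}

# top-right, bottom-right, bottom-left, top-left
j <- label_just(c(pi / 4, 3 * pi / 4, 5 * pi / 4, 7 * pi / 4))
print(j)
```

Labels on the right half are left-aligned (hjust = 0) and labels on the left half are right-aligned (hjust = 1), so the text always grows away from the pie.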

To tweak the position of the label text relative to the coordinate, you can use the vjust and hjust arguments of geom_text. This will determine the position of all labels simultaneously, so this might not be what you need.
Alternatively, you could tweak the coordinate of the label. Define a new data.frame in which the label coordinate is the midpoint of each slice, i.e. the average of the cumulative Cnt values at the slice's two edges, so that the label sits in the center of that particular slice. Just pass this new data.frame to geom_text in place of the original data.frame.
In addition, piecharts have some visual interpretation flaws. In general I would not use them, especially where good alternatives exist, e.g. a dotplot:
ggplot(dat, aes(x = Cnt, y = Volume)) +
geom_point() +
facet_wrap(~ Channel, ncol = 1)
For example, from this plot it is obvious that Cnt is higher for KIOSK than for AGENT; this information is lost in the piechart.

The following answer is partial and clunky, and I won't accept it.
The hope is that it will solicit a better solution.
text_KIOSK = dat$Cnt
text_AGENT = dat$Cnt
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
text_KIOSK = text_KIOSK/1.7 + c(0,cumsum(text_KIOSK)[-length(dat$Cnt)])
text_AGENT = text_AGENT/1.7 + c(0,cumsum(text_AGENT)[-length(dat$Cnt)])
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
pie_text = text_KIOSK + text_AGENT
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position=position_fill(width=1)) +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(y=pie_text, label=format(Cnt,format="d",big.mark=','), ymax=Inf), position=position_fill(width=1))
It produces the following chart:
As you can see, I can't move the labels for green (low).

Related

Use free_y scale on first axis and fixed on second + facet_grid + ggplot2

Is there any method to set scale = 'free_y' on the left hand (first) axis in ggplot2 and use a fixed axis on the right hand (second) axis?
I have a dataset where I need to use free scales for one variable and fixed for another but represent both on the same plot. To do so I'm trying to add a second, fixed, y-axis to my data. The problem is I cannot find any method to set a fixed scale for the 2nd axis and have that reflected in the facet grid.
This is the code I have so far to create the graph -
#plot weekly seizure date
p <- ggplot(dfspw_all, aes(x=WkYr, y=Seizures, group = 1)) + geom_line() +
xlab("Week Under Observation") + ggtitle("Average Seizures per Week - To Date") +
geom_line(data = dfsl_all, aes(x =WkYr, y = Sleep), color = 'green') +
scale_y_continuous(
# Features of the first axis
name = "Seizures",
# Add a second axis and specify its features
sec.axis = sec_axis(~.[0:20], name="Sleep")
)
p + facet_grid(vars(Name), scales = "free_y") +
theme(axis.ticks.x=element_blank(),axis.text.x = element_blank())
This is what it is producing (some details omitted from code for simplicity) -
What I need is for the scale on the left to remain "free" and the scale on the right to range from 0-24.
Secondary axes are implemented in ggplot2 as a decoration that is a transformation of the primary axis, so I don't know an elegant way to do this, since it would require the secondary axis formula to be aware of different scaling factors for each facet.
Here's a hacky approach where I scale each secondary series to its respective primary series, and then add some manual annotations for the secondary series. Another way might be to make the plots separately for each facet like here and use patchwork to combine them.
Given some fake data where the facets have different ranges for the primary series but the same range for the secondary series:
library(tidyverse)
fake <- tibble(facet = rep(1:3, each = 10),
x = rep(1:10, times = 3),
y_prim = (1+sin(x))*facet/2,
y_sec = (1 + sin(x*3))/2)
ggplot(fake, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec), color = "green") +
facet_wrap(~facet, ncol = 1)
...we could scale each secondary series to its primary series, and add custom annotations for that secondary series:
fake2 <- fake %>%
group_by(facet) %>%
mutate(y_sec_scaled = y_sec/max(y_sec) * (max(y_prim))) %>%
ungroup()
fake2_labels <- fake %>%
group_by(facet) %>%
summarize(max_prim = max(y_prim), baseline = 0, x_val = 10.5)
ggplot(fake2, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec_scaled), color = "green") +
facet_wrap(~facet, ncol = 1, scales = "free_y") +
geom_text(data = fake2_labels, aes(x = x_val, y = max_prim, label = "100%"),
hjust = 0, color = "green") +
geom_text(data = fake2_labels, aes(x = x_val, y = baseline, label = "0%"),
hjust = 0, color = "green") +
coord_cartesian(xlim = c(0, 10), clip = "off") +
theme(plot.margin = unit(c(1,3,1,1), "lines"))
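The per-facet rescaling is the core of this workaround, so here it is once more as a base R sketch without any plotting. The names scale_factor and y_sec_back are illustrative; the data are the same fake series as above. Keeping the factor around also lets you map scaled values back to the original secondary units when writing the manual labels.

```r
# Same fake data as above, built with base R vectors
facet  <- rep(1:3, each = 10)
x      <- rep(1:10, times = 3)
y_prim <- (1 + sin(x)) * facet / 2
y_sec  <- (1 + sin(x * 3)) / 2

# One scaling factor per facet: stretch the secondary series so that its
# maximum coincides with the maximum of the primary series in that facet
scale_factor <- ave(y_prim, facet, FUN = max) / ave(y_sec, facet, FUN = max)
y_sec_scaled <- y_sec * scale_factor

# Back-transform: recover the secondary values from the scaled ones
y_sec_back <- y_sec_scaled / scale_factor

print(tapply(y_sec_scaled, facet, max))
print(tapply(y_prim, facet, max))
```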

Is it possible to align x axis title to a value of the axis?

Having a tibble and a simple scatterplot:
p <- tibble(
x = rnorm(50, 1),
y = rnorm(50, 10)
)
ggplot(p, aes(x, y)) + geom_point()
I get something like this:
I would like to align (center, left, right, as the case may be) the title of the x-axis - here rather blandly x - with a specific value on the axis, say the off-center 0 in this case. Is there a way to do that declaratively, without having to resort to the dumb (as in "free of context") trial-and-error element_text(hjust=??)? The ?? are rather appropriate here because every value is a result of experimentation (my screen and PDF export in RStudio never agree on quite some plot elements). Any change in the data or the dimensions of the rendering may (or may not) invalidate the hjust value, and I am looking for a solution that gracefully repositions itself, much like the axes do.
Following the suggestions in the comments by @tjebo, I dug a little deeper into the coordinate spaces. hjust = 0.0 and hjust = 1.0 clearly align the label with the extent of the Cartesian coordinate system (but magically left-aligned and right-aligned, respectively), so when I set specific limits, calculating the exact value of hjust is straightforward (aiming for 0, hjust = (0 - -1.5) / (3.5 - -1.5) = 0.3):
ggplot(p, aes(x, y)) +
geom_point() +
coord_cartesian(ylim = c(8, 12.5), xlim = c(-1.5, 3.5), expand=FALSE) +
theme(axis.title.x = element_text(hjust = 0.3))
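The arithmetic above generalises to any target value and any pair of fixed limits. A minimal base R sketch, assuming the panel runs exactly from xlim[1] to xlim[2] (that is, expand = FALSE as in the code above); hjust_for is a hypothetical helper name:

```r
# Fraction of the way from the left limit to the right limit at which
# `target` sits; usable as hjust when the axis is not expanded
hjust_for <- function(target, xlim) {
  (target - xlim[1]) / diff(xlim)
}

h <- hjust_for(0, c(-1.5, 3.5))
print(h)  # 0.3
```

The limits themselves map to 0 and 1, which matches the observation that hjust = 0 and hjust = 1 line up with the panel edges.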
This gives an acceptable result for a label like x, but for longer labels the alignment is off again:
ggplot(p %>% mutate(`Longer X label` = x), aes(x = `Longer X label`, y = y)) +
geom_point() +
coord_cartesian(ylim = c(8, 12.5), xlim = c(-1.5, 3.5), expand=FALSE) +
theme(axis.title.x = element_text(hjust = 0.3))
Any further suggestions much appreciated.
Another option (different enough hopefully to justify the second answer) is as already mentioned to create the annotation as a separate plot. This removes the range problem. I like {patchwork} for this.
library(tidyverse)
library(patchwork)
p <- tibble( x = rnorm(50, 1), y = rnorm(50, 10))
p1 <- tibble( x = rnorm(50, 1), y = 100*rnorm(50, 10))
## I like to define constants outside my ggplot call
mylab <- "longer_label"
x_demo <- c(-1, 2)
demo_fct <- function(p){
p1 <- ggplot(p, aes(x, y)) +
geom_point() +
labs(x = NULL) +
theme(plot.margin = margin())
p2 <- ggplot(p, aes(x, y)) +
## you need that for your correct alignment with the first plot
geom_blank() +
annotate(geom = "text", x = x_demo, y = 1,
label = mylab, hjust = 0) +
theme_void() +
# you need that for those annoying margin reasons
coord_cartesian(clip = "off")
p1 / p2 + plot_layout(heights = c(1, .05))
}
demo_fct(p) + plot_annotation(title = "demo1 with x at -1 and 2")
demo_fct(p1) + plot_annotation(title = "demo2 with larger data range")
Created on 2021-12-04 by the reprex package (v2.0.1)
I still think you will fare better, and more easily, with a custom annotation. There are typically two ways to do that: either direct labelling with a text layer (for single labels I prefer annotate(geom = "text")), or creating a separate plot and stitching both together, e.g. with patchwork.
The biggest challenge is the positioning in the y dimension. For this I typically take a semi-automatic approach where I only need to define one constant and set the coordinates relative to the data range, so changes in range should in theory not matter much (they still do a bit, because the panel dimensions also change). Below are examples of exact label positioning for two different data ranges, using the same constant for both.
library(tidyverse)
# I only need patchwork for demo purpose, it is not required for the answer
library(patchwork)
p <- tibble( x = rnorm(50, 1), y = rnorm(50, 10))
p1 <- tibble( x = rnorm(50, 1), y = 100*rnorm(50, 10))
## I like to define constants outside my ggplot call
y_fac <- .1
mylab <- "longer_label"
x_demo <- c(-1, 2)
demo_fct <- function(df, x) {map(x, ~{
## I like to define constants outside my ggplot call
ylims <- range(df$y)
ggplot(df, aes(x, y)) +
geom_point() +
## set hjust = 0 for full positioning control
annotate(geom = "text", x = ., y = min(ylims) - y_fac*mean(ylims),
label = mylab, hjust = 0) +
coord_cartesian(ylim = ylims, clip = "off") +
theme(plot.margin = margin(b = .5, unit = "in")) +
labs(x = NULL)
})
}
demo_fct(p, x_demo) %>% wrap_plots() + plot_annotation(title = "demo 1, label at x = -1 and x = 2")
demo_fct(p1, x_demo) %>% wrap_plots() + plot_annotation(title = "demo 2 - different data range")
Created on 2021-12-04 by the reprex package (v2.0.1)
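Why a single constant works for both data ranges can be seen from the y coordinate alone. The base R sketch below isolates the expression min(ylims) - y_fac*mean(ylims) in a hypothetical helper, label_y, and applies it to three made-up y values and to the same values multiplied by 100: the label position scales with the data.

```r
# y coordinate of the label: a fixed fraction of the mean of the y limits
# below the lower limit (same expression as in the annotate() call above)
label_y <- function(y, y_fac = 0.1) {
  ylims <- range(y)
  min(ylims) - y_fac * mean(ylims)
}

y_small <- c(8, 9.5, 12)   # made-up values for illustration
y_large <- 100 * y_small   # same shape, larger range

print(label_y(y_small))
print(label_y(y_large))
```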

ggplot make default point size larger when size is already determined by another variable

I am trying to display data that includes non-detects. For the ND I want to have a circular outline at different sizes so that the lines do not overlap each other. I pretty much have what I want, but for the parameter cis-DCE the circular outline just makes the point look bigger instead of being a distinct outline. How do I attribute size to the parameter and also make the starting size larger?
I will include all of the code I am using for the graphing, but I am specifically working on this bit right now.
geom_point(aes(x= date, y = lrl, group = parm_nmShort, size = parm_nmShort), shape = 1) + #marking lower limit
I also know that I could use facet_wrap, and I've done that previously. Historically, however, this data has been shown in one graph without identifying the NDs, and I do not want to drastically alter the display of the data and confuse anyone.
{
#graphing
library(ggplot2)
library(scales) # date_format()
library(egg) # theme_article()
# folder where you want the graphs to be saved:
results <- 'C:/Users/cbuckley/OneDrive - DOI/Documents/Projects/New Haven/Data/Graphs/'
{
VOC.graph <- function(df, na.rm = TRUE, ...){
df$parm_nmShort <- factor(df$parm_nm, levels = c("cis.1.2.Dichloroethene_77093",
"Trichloroethene_34485",
"Tetrachloroethene_34475"),
labels = c("cis-DCE", "TCE", "PCE"))
# create list of sites in data to loop over
site_list <- unique(df$site_nm)
# create for loop to produce ggplot2 graphs
for (i in seq_along(site_list)) {
# create plot for each county in df
plot <-
ggplot(subset(df, df$site_nm==site_list[i]),
aes(x = date, y = result,
group = parm_nmShort,
color = parm_nmShort)) +
geom_point() + #add data point plot
geom_line() + #add line plot
#geom_point(aes(y = lrl, group = parm_nmShort, shape = parm_nmShort)) +
geom_point(aes(x= date, y = lrl, group = parm_nmShort, size = parm_nmShort), shape = 1) + #marking lower limit
#scale_shape_manual(values = c("23","24","25")) + #create outlier shapes
#facet_wrap(~parm_nmShort) +
ggtitle(site_list[i]) + #name graphs well names
# theme(legend.position="none") + #removed legend
labs(x = "Year", y = expression(paste("Value, ug/L"))) + #add x and y label titles
theme_article() + #remove grey boxes, outline graph in black
theme(legend.title = element_blank()) + #removes legend title
scale_x_date(labels = date_format("%y"),
limits = as.Date(c("2000-01-01","2021-01-01"))) #+ # set x axis for all graphs
# geom_hline(yintercept = 5) #+ #add 5ug/L contaminant limit horizontal line
# theme(axis.text.x = element_text(angle = 45, size = 12, vjust = 1)) + #angles x axis titles 45 deg
# theme(aspect.ratio = 1) +
# scale_color_hue(labels = c("cic-DCE", "PCE", "TCE")) + #change label names
# scale_fill_discrete(breaks = c("PCE", "TCE", "cic-DCE"))
# Code below will let you block out below the resolution limit
# geom_ribbon(aes(ymin = 0, ymax = ###LRL###), fill ="white", color ="grey3") +
# geom_line(color ="black", lwd = 1)
#ggsave(plot,
# file=paste(results, "", site_list[i], ".png", sep=''),
# scale=1)
# print plots to screen
print(plot)
}
}
#run graphing function with long data set
VOC.graph(data)
}}
Well after a lot of playing around, I figured out the answer to my own question. I figured I'd leave the question up because none of the solutions I found online worked for me but this code did.
geom_point(aes(x= date, y = lrl, group = parm_nmShort, shape = parm_nmShort, size = parm_nmShort)) + #identify non detects
scale_shape_manual(values = c(1,1,1)) +
scale_size_manual(values = c(3,5,7)) +
I'm not very good at R, but for some reason, when I didn't include both group and shape in the aes as parm_nmShort, I couldn't manually change the values. I don't know if it's because I have more than one geom_point in my whole script, so maybe it didn't know which one to change.

ggplot: Grouped, adjacent bars of variable width

I'm trying to produce a barplot in which the width and height of the bars both convey information: the height is the number of hours spent on a task, and the widths indicate the perceived aptitude and importance associated with the task, respectively. I've managed to produce this monstrosity:
It's functional but horrible. I would really like to place the bars alongside one another (rather than overlaying them), so that each activity is represented by two touching bars of the same height (=time spent) but different widths and colors. I've been trying to pass a width argument to this plot:
but setting 'aes(width = widthVariable)' gives me overlapping bars (similar to the first image) and the following warning message:
"position_dodge requires non-overlapping x intervals".
Is there a way of grouping my bars by activity, displaying them adjacently and varying their widths?
Here's a bit of the df I'm using:
molten = data.frame(Activity = rep(c('Administration','Working with Colleagues','Use of Social Media','Leadership Role'),2),
variable = c(rep('Importance',4),rep('Competence',4)),
value = rep(c(3.02,1.71,2.39,3.32),2),
width = c(3.48,3.52,4.01,2.98,
3.85,2.34,4.87,3.81))
The second plot is this:
ggplot(molten, aes(x=Activity, y=value, fill=variable)) + geom_bar(stat='identity',position = 'dodge')
and the first is something like this:
ggplot(molten, aes(x=Activity, y=value, fill=variable)) + geom_bar(stat='identity',aes(width = width/10))
Although I actually made it using a slightly simpler data frame, which I melt()-ed into the one above.
Not a perfect solution, but you can create a new column that combines Activity and Variable, use that as the x, and fill by variable:
library(dplyr) # for mutate()
molten<-mutate(molten,activity=paste(Activity,variable))
ggplot(molten, aes(x=activity, y=value,width = width/10)) +
geom_bar(stat='identity', aes(fill=variable)) +
theme(axis.text.x = element_text(angle = 45,hjust=1)) +
scale_x_discrete(breaks=molten$activity, labels=molten$Activity)
I've made some progress, building on iod's idea of mutating the original data frame.
I've made two separate geom-bars and nudged them into one another. I'd really love it if every bar touched its neighbor, but position_nudge() only takes a constant. I'm still getting into ggplot so the most obvious solution in my mind is a recycled 'nudge' vector, akin to barplot()'s color argument.
tldr: Little gaps between bars but reasonably pretty now.
molten<-mutate(molten,activity=paste(Activity,variable))
# with() makes the columns value and variable visible without attach()
molten$importanceBars = with(molten, c(value[variable=='Importance'],rep(0,nrow(molten)-sum(variable=='Importance'))))
molten$competenceBars = with(molten, c(rep(0,nrow(molten)-sum(variable=='Competence')),value[variable=='Competence']))
ggplot(molten, aes(x=activity,width = width/6, fill = variable)) +
geom_bar(stat='identity', aes(y=importanceBars),position=position_nudge(x=-0.2-0.35)) +
geom_bar(stat='identity', aes(y=competenceBars),position=position_nudge(x=-0.35)) +
theme(axis.text.x = element_text(angle = 45,hjust=1)) +
scale_x_discrete(breaks=molten$activity[molten$variable=='Competence'],
labels=molten$Activity[molten$variable=='Competence'])
I've done it - had to draw every bar as a rectangle, adjusting xmin and xmax accordingly.
wadjust = 5.5
gap = 0.0
minv = 1:length(value) - 0.5 + gap/2
maxv = minv + 1 -gap
minv[1:length(minv)%%2!=0] = maxv[1:length(maxv)%%2!=0] - width[order(Activity)][(1:length(width))%%2!=0]/wadjust
maxv[1:length(maxv)%%2==0] = minv[1:length(minv)%%2==0] + width[order(Activity)][(1:length(width))%%2==0]/wadjust
minv = minv +0.525
maxv = maxv +0.525
minvord = minv[order(Activity)]
maxvord = maxv[order(Activity)]
ggplot(molten, aes(x=activity,width = width/6, fill = variable)) +
geom_rect(xmin = minv,xmax = maxv, ymin = rep(0,28), ymax = value[order(Activity)],
fill = rep(c('#e1de00','#e84619'),len=28)) +
theme(axis.text.x = element_text(angle = 45,hjust=1)) +
theme(plot.margin=unit(c(1,0.5,1,2),"cm")) +
scale_x_discrete(breaks=molten$activity[molten$variable=='Importance'],
labels=molten$Activity[molten$variable=='Importance'][order(Activity[molten$variable=='Importance'])]) +
scale_y_continuous(labels = 0:3, breaks = 0:3, limits = c(0,3)) +
xlab('Activity') + ylab('Hours Spent') +
labs(title = 'Perceived Importance & Competence\nAssociated with Clerical Duties') +
theme(panel.grid.major.x = element_blank()) +
geom_vline(xintercept = (maxv[1:length(maxv)%%2!=0]+minv[1:length(minv)%%2==0])/2,col='white') +
geom_vline(xintercept = seq(len = 14, by = 2),col = 'white')
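The rectangle idea can be written more compactly by computing xmin and xmax per row. The sketch below is an alternative formulation, not the code above: it uses the eight-row molten sample from the question, places each activity at an integer position, and lets the Importance bar end and the Competence bar start exactly at that position so the two always touch. The divisor wadjust = 10 is an assumption, chosen so that neighbouring pairs stay clear of each other.

```r
molten <- data.frame(
  Activity = rep(c('Administration', 'Working with Colleagues',
                   'Use of Social Media', 'Leadership Role'), 2),
  variable = c(rep('Importance', 4), rep('Competence', 4)),
  value    = rep(c(3.02, 1.71, 2.39, 3.32), 2),
  width    = c(3.48, 3.52, 4.01, 2.98,
               3.85, 2.34, 4.87, 3.81))

wadjust <- 10                                   # assumed divisor, see lead-in
centre  <- as.numeric(factor(molten$Activity))  # integer x position per activity
w       <- molten$width / wadjust               # bar width in x units
is_left <- molten$variable == 'Importance'      # Importance goes left of centre

molten$xmin <- ifelse(is_left, centre - w, centre)
molten$xmax <- ifelse(is_left, centre, centre + w)

print(molten[order(centre), c('Activity', 'variable', 'xmin', 'xmax')])
```

These columns could then be mapped in geom_rect(aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = value, fill = variable)), with scale_x_continuous() supplying the activity names as labels at the integer breaks.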

R: placing a text with combination of variables over bars in ggplot

Let's draw a bar chart with ggplot2 from the following data (already in long format). The values of the variable are then placed in the middle of the bars via geom_text().
stuff.dat<-read.csv(text="continent,stuff,num
America,apples,13
America,bananas,13
Europe,apples,30
Europe,bananas,21
total,apples,43
total,bananas,34")
library(ggplot2)
ggplot(stuff.dat, aes(x=continent, y=num,fill=stuff))+geom_col() +
geom_text(position = position_stack(vjust=0.5),
aes(label=num))
Now it is necessary to add the "Apple-Bananas Index", defined as f=apples/bananas, on top of the bars, just as it was added manually in the figure. How can this be programmed in ggplot? And how could it be added to the legend as a separate entry?
I think that the easiest way to achieve this is to prepare the data before you create the plot. I define a function abi() that computes the apple-banana-index from stuff.dat given a continent:
abi <- function(cont) {
with(stuff.dat,
num[continent == cont & stuff == "apples"] / num[continent == cont & stuff == "bananas"]
)
}
And then I create a data frame with all the necessary data:
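# levels() returns NULL when continent is read as character (the default since R 4.0.0),
# so take the sorted unique values instead; this matches the row order of aggregate()
conts <- sort(unique(stuff.dat$continent))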
abi_df <- data.frame(continent = conts,
yf = aggregate(num ~ continent, sum, data = stuff.dat)$num + 5,
abi = round(sapply(conts, abi), 1))
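The index values can be cross-checked without the helper function. This base R sketch reshapes the data with xtabs() and divides the two columns; wide and abi_check are illustrative names.

```r
stuff.dat <- read.csv(text = "continent,stuff,num
America,apples,13
America,bananas,13
Europe,apples,30
Europe,bananas,21
total,apples,43
total,bananas,34")

# One row per continent, one column per kind of stuff
wide <- xtabs(num ~ continent + stuff, data = stuff.dat)

# Apple-Banana Index per continent, rounded as in the answer
abi_check <- round(wide[, "apples"] / wide[, "bananas"], 1)
print(abi_check)
```

This gives 1 for America (13/13), 1.4 for Europe (30/21) and 1.3 for the total (43/34).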
Now, I can add that information to the plot:
library(ggplot2)
ggplot(stuff.dat, aes(x = continent, y = num, fill = stuff)) +
geom_col() +
geom_text(position = position_stack(vjust = 0.5), aes(label = num)) +
geom_text(data = abi_df, aes(y = yf, label = paste0("f = ", abi), fill = NA))
Adding fill = NA to the geom_text() is a bit of a hack and leads to a warning. But if fill is not set, plotting will fail with a message that stuff was not found. I also tried to move fill = stuff from ggplot() to geom_col(), but this breaks the y-coordinate of the text labels inside the bars. There might be a cleaner solution to this, but I haven't found it yet.
Adding the additional legend is, unfortunately, not trivial, because one cannot easily add text outside the plot area. This actually needs two steps: first one adds text using annotation_custom(). Then, you need to turn clipping off to make the text visible (see, e.g., here). This is a possible solution:
library(grid) # provides textGrob(), gpar() and grid.draw(); load it before building the plot
p <- ggplot(stuff.dat, aes(x = continent, y = num, fill = stuff)) +
geom_col() +
geom_text(position = position_stack(vjust = 0.5), aes(label = num)) +
geom_text(data = abi_df, aes(y = yf, label = paste0("f = ", abi), fill = NA)) +
guides(size = guide_legend(title = "f: ABI", override.aes = list(fill = 1))) +
annotation_custom(grob = textGrob("f: ABI\n(Apple-\nBanana-\nIndex)",
gp = gpar(cex = .8), just = "left"),
xmin = 3.8, xmax = 3.8, ymin = 17, ymax = 17)
# turn off clipping so that text outside the panel is drawn
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name == "panel"] <- "off"
grid.draw(gt)
