How to add a 3rd variable to a bar chart in ggplot2?

I'm trying to make a bar chart with a 3rd variable (in this case "frequency") that controls the width of the bars (higher frequency = larger width). I still have to work out the sizing, but that is just aesthetics and I can sort it out later. When I use this code I keep getting the error "position_dodge requires non-overlapping x intervals", and the plot then stacks the bars instead of grouping them. Also (maybe this could help), I'm wondering if there is a way to increase the distance between labels on the x-axis (that is, between "Iso", "Transition", "P&R Handler", etc.).
All help appreciated.
library(tidyverse)
library(ggrepel)
percentile_playtype = c(70.10, 41.20, 83.90, 0, 0, 97.30, 40, 0, 49.30, 20.10, 88.90, 91.80,
94.60, 0, 83.60, 86.90, 42, 41.10, 46.90, 0, 81.50, 84.00)
frequency = c(8.5,16.5,53.3,0,0,6,7.2,0,2.1,0.6,5.4,1.9,12.4,0,28,8.1,16,1.9,13.6,0,10.6,6.1)
v1 = sqrt(sqrt(sqrt(frequency)))/10
lowsize <- element_text(size=8)
playtype = c("Iso","Transition","P&R Handler","P&R Roll","Post Up","Spot Up",
"Handoff","Cut","Off Screen","Putbacks","Misc")
Player = rep(c("Trae Young","John Collins"), each=11)
PlayData <- data.frame(percentile_playtype,frequency,playtype,Player)
a1 <- ggplot(PlayData, aes(fill=Player, y=percentile_playtype, x=playtype)) +
geom_bar(position="dodge", stat="identity",width=v1)
a1

If you really want frequency to be mapped to bar width, you need to do it the hard way, and calculate those widths, plotting geom_rect rather than geom_bar. From a visual impact point of view it is better to scale the area of the bars rather than their absolute width:
PlayData$playtype_n <- as.numeric(as.factor(PlayData$playtype))
PlayData$frequency_n <- PlayData$frequency/max(PlayData$frequency) * 0.5 /
(PlayData$percentile_playtype / 100) *
(2 * as.numeric(as.factor(PlayData$Player)) - 3) +
as.numeric(as.factor(PlayData$playtype))
ggplot(PlayData, aes(fill = Player)) +
geom_rect(aes(xmin = playtype_n, xmax = frequency_n, ymin = 0,
ymax = percentile_playtype)) +
scale_x_continuous(breaks = sort(unique(PlayData$playtype_n)),
labels = levels(as.factor(PlayData$playtype))) +
scale_fill_manual(values = c("deepskyblue4", "orange")) +
labs(x = "Play type (area scaled to frequency)", y = "Percentile playtype") +
theme_bw()
Personally, I don't think this looks great, and I'm not convinced it's worth the trouble. Another, easier-to-understand approach might be to use facets:
ggplot(PlayData, aes(fill=frequency, y=percentile_playtype, x=playtype)) +
geom_col(position = "dodge", width=0.75) +
geom_text(aes(label = frequency), vjust = 1.5, color = "white") +
facet_wrap(Player~., ncol = 1) +
scale_fill_viridis_c() +
theme_classic() +
theme(panel.grid.major.y = element_line(color = "gray90"),
strip.background = element_blank(),
strip.text.x = element_text(size = 16),
axis.line.x.bottom = element_line())
Or perhaps a labelled scatter plot using ggrepel:
ggplot(PlayData, aes(percentile_playtype, frequency, color = Player)) +
geom_point() +
geom_text_repel(aes(label = playtype), size = 5) +
scale_color_manual(values = c("deepskyblue4", "orange")) +
theme_bw()
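For the area-scaling idea in the first approach above, the arithmetic can be isolated in a small helper. This is only a sketch: area_width() and its max_width argument are hypothetical names, not part of ggplot2, and the normalisation differs slightly from the answer's code (here the widest bar is capped at max_width, and bars of zero height get zero width instead of a division by zero).

```r
# Hypothetical helper: choose bar widths so that width * height is
# proportional to frequency (area encodes frequency).
area_width <- function(freq, height, max_width = 0.5) {
  # Bars with zero height cannot carry an area, so give them zero width
  raw <- ifelse(height > 0, freq / height, 0)
  if (max(raw) == 0) return(raw)
  # Rescale so the widest bar is exactly max_width
  raw / max(raw) * max_width
}

# First few values from the question, plus one zero-height bar
freq   <- c(8.5, 16.5, 53.3, 0)
height <- c(70.1, 41.2, 83.9, 0)
w <- area_width(freq, height)
w
```

The resulting widths could then feed xmin/xmax in geom_rect(), offset to the left or right of each play type depending on the player.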

Are you trying to mimic something like a mosaic plot?
percentile_playtype = c(70.10, 41.20, 83.90, 0, 0, 97.30, 40, 0, 49.30, 20.10, 88.90, 91.80,
94.60, 0, 83.60, 86.90, 42, 41.10, 46.90, 0, 81.50, 84.00)
frequency = c(8.5,16.5,53.3,0,0,6,7.2,0,2.1,0.6,5.4,1.9,12.4,0,28,8.1,16,1.9,13.6,0,10.6,6.1)
v1 = sqrt(sqrt(sqrt(frequency)))/10
playtype = c("Iso","Transition","P&R Handler","P&R Roll","Post Up","Spot Up",
"Handoff","Cut","Off Screen","Putbacks","Misc")
Player = rep(c("Trae Young","John Collins"), each=11)
PlayData <- data.frame(percentile_playtype,frequency,playtype,Player)
CGPfunctions::PlotXTabs2(PlayData,
x = playtype,
y = Player,
counts = percentile_playtype,
plottype = "mosaic",
x.axis.orientation = "slant",
sample.size.label = FALSE,
label.text.size = 2)

Related

Trying to separate bars in two overlayed bar charts in ggplot

I am trying to create a single chart from two created bar charts to show the differences in their distribution. I have both charts merging together, and the axis labels are correct. However, I cannot figure out how to get the bars in each section to be next to each other for comparison instead of overlaying. Data for this chart are two variables within the same DF. I am relatively new to r and new to ggplot so even plotting what I have was a challenge. Please be kind and I apologize if this is a question that has been answered before.
Here is the code I am using:
Labeled <- ggplot(NULL, aes(lab),position_dodge(.5)) + ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges")+
geom_bar(aes(x=AgeFactor,fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5,width = 0.6) +
geom_bar(aes(x=FdgFactor,fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5,width=.6)+
scale_x_discrete(limits=c("0-6months","7-12months","1-1.99","2-2.99","3-3.99","4-4.99","5-5.99","6-6.99","7-7.99","8-8.99","9-9.99","10-10.99"))+
xlab("Age")+
ylab("")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_discrete(name = "")
and this is the graph it is creating for me:
I really appreciate any insight. This is my first time asking a question on stack too - so I am happy to edit/adjust info as needed.
The issue is that you plot from different columns of your dataset. To dodge your bars position_dodge requires a column to group the data by. To this end you could reshape your data to long format using e.g. tidyr::pivot_longer so that your two columns are stacked on top of each other and you get a new column containing the column or group names as categories.
Using some fake random example data, I first replicate your issue with your code:
set.seed(123)
levels <- c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")
Graph <- data.frame(
AgeFactor = sample(levels, 100, replace = TRUE),
FdgFactor = sample(levels, 100, replace = TRUE),
lab = 1:100
)
library(ggplot2)
ggplot(NULL, aes(lab), position_dodge(.5)) +
ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
geom_bar(aes(x = AgeFactor, fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5, width = 0.6) +
geom_bar(aes(x = FdgFactor, fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5, width = .6) +
scale_x_discrete(limits = c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")) +
xlab("Age") +
ylab("") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
scale_fill_discrete(name = "")
And now the fix using reshaping. Additionally I simplified your code a bit:
library(tidyr)
library(dplyr)
Graph_long <- Graph %>%
select(AgeFactor, FdgFactor) %>%
pivot_longer(c(AgeFactor, FdgFactor))
ggplot(Graph_long, aes(x = value, fill = name)) +
geom_bar(alpha = 0.5, width = 0.6, position = position_dodge()) +
scale_fill_discrete(labels = c("Age of Autism Diagnosis", "Feeding Challenge Onset")) +
scale_x_discrete(limits = levels) +
labs(x = "Age", y = NULL, fill = NULL, title = "Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
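To see what the reshaping step does, here is a minimal sketch on a three-row toy data frame (the values are made up for illustration). With its default arguments, pivot_longer() stacks the two columns into a value column and records the source column in name, which is the grouping variable position_dodge() needs.

```r
library(tidyr)

# Toy data: two categorical columns in wide format (values are made up)
wide <- data.frame(
  AgeFactor = c("1-1.99", "2-2.99", "2-2.99"),
  FdgFactor = c("0-6months", "1-1.99", "2-2.99")
)

# Long format: one row per (original row, column) pair
long <- pivot_longer(wide, c(AgeFactor, FdgFactor))
long

# Counts per category and source column, i.e. what geom_bar() will draw
counts <- table(long$value, long$name)
counts
```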

Plot with geom_smooth(,) multiple colours, double y-axis with four variables in ggplot2

I have an issue with ggplot2 plotting system with R.
I would like to draw a graph (scatterplot + smoothing) with two grades (ref) and two variables each (Vix, monomer), with Vix referring to the left y-axis and monomer referring to the right y-axis. I would like dark red and dark blue for ref at 130°C, and the same but pale colours for 150°C. The colours are the following, though this is not really important for understanding the problem: '#644196', '#bba6d9', '#f92410', '#fca49c'. In this way I would obtain 4 lines with 4 different colours.
I define the colours with the command:
scale_color_manual(values=c('#644196', '#bba6d9', '#f92410', '#fca49c')) +
The problem is that I obtain 4 lines but only two colours, and the legend also has only two entries (not 4 as I expected). It looks like the colours change with ref only, and no colour change is assigned to the two variables Vix and monomer.
Below I report the whole code.
Dati <- data.frame("Vix" = c(62500, 87000, 122000, 140000, 82700, 73000, 110000, 110000, 140300, 81500), "monomer" = c(0.089,0.08,0.095,0.1,0.111, 0.09, 0.094, 0.099, 0.111, 0.197), "Time" = c(30, 60, 90, 120, 135, 30, 60, 90, 120, 135), "ref" = c('130°C', '130°C', '130°C', '130°C', '130°C', '150°C', '150°C', '150°C', '150°C', '150°C'))
attach(Dati)
library(ggplot2)
library(readxl)
####Graph processing
scaleFactor <- max(Vix) / max(monomer)
Graph <- ggplot(Dati, aes(x= Time, col=(ref))) +
geom_point(aes(y= Vix, col=(ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y= Vix), method="loess") +
geom_point(aes(y= monomer * scaleFactor, col=ref), shape = 1, size = 3.5) +
geom_smooth(aes(y=monomer * scaleFactor), method="loess") +
scale_color_manual(values=c('#644196', '#bba6d9', '#f92410', '#fca49c')) +
scale_y_continuous(name="Vix", sec.axis=sec_axis(~./scaleFactor, name="monomer")) +
theme(
axis.title.y.left=element_text(color='#f92410'),
axis.text.y.left=element_text(color='#f92410'),
axis.title.y.right=element_text(color='#644196'),
axis.text.y.right=element_text(color='#644196')
)
Graph
Obtained output graph
Can somebody see what I could do to fix this issue?
Thank you in advance for any reply.
Probably the easiest way is to add information to the variable at the specification of aesthetics. In the example below, we paste0() the extra information whether the series is Vix or monomer to the colours.
Graph <- ggplot(Dati, aes(x= Time)) +
geom_point(aes(y= Vix, col=paste0("Vix ", ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y= Vix, col = paste0("Vix ", ref)), method="loess") +
geom_point(aes(y= monomer * scaleFactor, col=paste0("Monomer ", ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y=monomer * scaleFactor, col = paste0("Monomer ", ref)), method="loess") +
scale_color_manual(values=c('#644196', '#bba6d9', '#f92410', '#fca49c'),
name = "Series?") +
scale_y_continuous(name="Vix", sec.axis=sec_axis(~./scaleFactor, name="monomer")) +
theme(
axis.title.y.left=element_text(color='#f92410'),
axis.text.y.left=element_text(color='#f92410'),
axis.title.y.right=element_text(color='#644196'),
axis.text.y.right=element_text(color='#644196')
)
Graph
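One thing to watch with this approach: scale_color_manual() matches an unnamed values vector to the levels in sorted order, so the four pasted labels are coloured alphabetically. A named vector makes the pairing explicit. A minimal sketch, assuming the labels produced by the paste0() calls above (series_cols is a hypothetical name):

```r
library(ggplot2)

ref <- c("130°C", "150°C")
# The four labels created by paste0("Vix ", ref) and paste0("Monomer ", ref)
series <- c(paste0("Vix ", ref), paste0("Monomer ", ref))

# Sorted order is the order in which unnamed colours would be assigned
sort(series)

# Naming the colours removes the dependence on that order
series_cols <- c(
  "Monomer 130°C" = "#644196",
  "Monomer 150°C" = "#bba6d9",
  "Vix 130°C"     = "#f92410",
  "Vix 150°C"     = "#fca49c"
)

# Drop-in replacement for the scale in the plot above
col_scale <- scale_color_manual(values = series_cols, name = "Series")
```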
You have 2 colors because the variable mapped to color (ref) has 2 distinct values. I guess you would like to have Vix and monomer curves for each value of ref. You can get that by reshaping your data into long format and creating a new variable that combines the temperature with Vix or monomer:
library(dplyr)    # %>% and mutate()
library(tidyr)    # pivot_longer()
library(stringr)  # str_c()
scaleFactor <- max(Dati$Vix) / max(Dati$monomer)
STEP 1: rescale monomer, create a column that tells you whether the value is Vix or monomer (long format for those two variables), and recreate ref
Dati <- Dati %>%
mutate(
monomer = monomer * scaleFactor
) %>%
pivot_longer(cols = c(Vix, monomer)) %>%
mutate(ref = str_c(ref, name, sep = "-"))
STEP 2: map ref to the color aesthetic (long format is neat for ggplot2)
ggplot(Dati, aes(Time, value, color = ordered(ref, levels = unique(ref)))) +
geom_point(shape = 1, size = 3.5) +
geom_smooth(method = "loess") +
scale_color_manual("groups", values = c('#fca49c', '#bba6d9', '#f92410', '#644196')) +
scale_y_continuous(name = "Vix", sec.axis = sec_axis(~./scaleFactor, name = "monomer")) +
theme(
axis.title.y.left = element_text(color = '#f92410'),
axis.text.y.left = element_text(color = '#f92410'),
axis.title.y.right = element_text(color = '#644196'),
axis.text.y.right = element_text(color = '#644196')
)
RESULT:
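As a side check on the secondary axis used in both answers: monomer is multiplied by scaleFactor so that it can be drawn against the left (Vix) axis, and sec_axis(~ . / scaleFactor) divides the axis values back for the right-hand labels. The round trip can be verified numerically with the data from the question (plotted and recovered are hypothetical names used only here):

```r
# Data from the question
Vix <- c(62500, 87000, 122000, 140000, 82700, 73000, 110000, 110000, 140300, 81500)
monomer <- c(0.089, 0.08, 0.095, 0.1, 0.111, 0.09, 0.094, 0.099, 0.111, 0.197)

scaleFactor <- max(Vix) / max(monomer)

# Values actually drawn, expressed in units of the left (Vix) axis
plotted <- monomer * scaleFactor

# What the secondary axis shows after dividing back
recovered <- plotted / scaleFactor

# The largest monomer value lands at the same height as the largest Vix value
max(plotted)
```

This choice of scaleFactor simply aligns the two maxima; any other positive factor would also work as long as the same factor is used in both places.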

Need help on customizing my Odds Ratio (ggplot)!

I've been assigned to create an odds ratio plot with ggplot in R. The plot I'm supposed to create is given below.
Given plot
My job is to figure out the code that creates the exact plot in R. I've done most parts. Here is my work.
My work
Before jumping into my code, it is important to note that I am not using the correct values for boxOdds, boxCILow, and boxCIHigh, since I have not figured out the correct values yet. I wanted to work out the ggplot code first so I can enter the right values as soon as I find them.
This is the code I used:
library(ggplot2)
boxLabels = c("Females/Males", "Student-Centered Prac. (+1)", "Instructor Quality (+1)", "Undecided / STM",
"non-STEM / STM", "Pre-med / STM", "Engineering / STM", "Std. test percentile (+10)",
"No previous calc / HS calc", "College calc / HS calc")
df <- data.frame(yAxis = length(boxLabels):1,
boxOdds =
c(2.23189, 1.315737, 1.22866, 0.8197413, 0.9802449, 0.9786673, 0.6559005, 0.5929812, 0.6923759, 1.3958275),
boxCILow =
c(.7543566,1.016,.9674772,.6463458,.9643047,.864922,.4965308,.3572142, 0.4523759, 1.2023275),
boxCIHigh =
c(6.603418,1.703902,1.560353,1.039654,.9964486,1.107371,.8664225,.9843584, 0.9323759, 1.5893275)
)
(p <- ggplot(df, aes(x = boxOdds, y = boxLabels)) +
geom_vline(aes(xintercept = 1), size = 0.75, linetype = 'dashed') +
geom_errorbarh(aes(xmax = boxCIHigh, xmin = boxCILow), size = .5, height =
0, color = 'gray50') +
geom_point(size = 3.5, color = 'orange') +
theme_bw() +
theme(panel.grid.minor = element_blank()) +
scale_x_continuous(breaks = seq(0,7,1) ) +
ylab('') +
xlab('Odds Ratio') +
annotate(geom = 'text', y =1.1, x = 3.5, label ='',
size = 3.5, hjust = 0) + ggtitle('Estimated Odds of Switching') +
theme(plot.title = element_text(hjust = 0.5, size = 30),
axis.title.x = (element_text(size = 15))) +
theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank())
)
p
Where I'm stuck:
Removing the small vertical lines at the beginning and end of each row's CI. I was not sure what they are called, so I was having a hard time looking it up. SOLVED
I'm also stuck on coloring specific rows in different colors.
The last part I'm stuck on is assigning the proper order of the variables on the y-axis. As you can see in my code (the "boxLabels" part), I have put all the variables in the order of the given plot, but R doesn't seem to respect that order. So the variable located at the very top is "Undecided / STM" instead of "Females/Males".
How do I decrease the space from 0 to 1? SOLVED
Any help would be appreciated!
First, you probably want ggstance::geom_pointrangeh. Second, you could define colors by yAxis right at the beginning; to group some factors, create a new variable group. Third, regarding your data, you could assign factor labels. Fourth, remove coord_trans as suggested by @beetroot.
Assign factor labels
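# levels=1:10 keeps each row paired with its own label: yAxis 10 becomes "Females/Males", the last level, drawn at the top
dat$yAxis <- factor(dat$yAxis, levels=1:10, labels=rev(boxLabels))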
Create groups
dat$group <- 1
dat$group[which(dat$yAxis %in% c("Females/Males", "Undecided / STM", "non-STEM / STM",
"Pre-med / STM"))] <- 2
dat$group[which(dat$yAxis %in% c("Student-Centered Prac. (+1)",
"No previous calc / HS calc",
"College calc / HS calc"))] <- 3
Colors
colors <- c("#860fc2", "#fc691d", "black")
Plot
library(ggplot2)
library(ggstance)
ggplot(dat, aes(x=boxOdds, y=yAxis, color=as.factor(group))) +
geom_vline(aes(xintercept=1), size=0.75, linetype='dashed') +
geom_pointrangeh(aes(xmax=boxCIHigh, xmin=boxCILow), size=.5,
show.legend=FALSE) +
geom_point(size=3.5, show.legend=FALSE) +
theme_bw() +
scale_color_manual(values=colors)+
theme(panel.grid.minor=element_blank()) +
scale_x_continuous(breaks=seq(0,7,1), limits=c(0, max(dat[2:4]))) +
ylab('') +
xlab('Odds Ratio') +
annotate(geom='text', y =1.1, x=3.5, label ='',
size=3.5, hjust=0) + ggtitle('Estimated Odds of Switching') +
theme(plot.title=element_text(hjust=.5, size=20)) +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
Gives
Data
dat <- structure(list(yAxis = 10:1, boxOdds = c(2.23189, 1.315737, 1.22866,
0.8197413, 0.9802449, 0.9786673, 0.6559005, 0.5929812, 0.6923759,
1.3958275), boxCILow = c(0.7543566, 1.016, 0.9674772, 0.6463458,
0.9643047, 0.864922, 0.4965308, 0.3572142, 0.4523759, 1.2023275
), boxCIHigh = c(6.603418, 1.703902, 1.560353, 1.039654, 0.9964486,
1.107371, 0.8664225, 0.9843584, 0.9323759, 1.5893275)), class = "data.frame", row.names = c(NA,
-10L))
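The ordering problem from the question comes down to factor levels: a discrete y-axis is drawn from the first level at the bottom to the last level at the top, and a plain character vector is ordered alphabetically. A small base R sketch, using three of the labels for brevity:

```r
boxLabels <- c("Females/Males", "Student-Centered Prac. (+1)", "Instructor Quality (+1)")

# Default factor(): levels are sorted alphabetically, not in the given order
alphabetical <- factor(boxLabels)
levels(alphabetical)

# Reversed levels: the first label becomes the last level, i.e. the top row
y <- factor(boxLabels, levels = rev(boxLabels))
levels(y)

# Integer codes show the vertical position of each label (3 = top)
as.integer(y)
```

Mapping y (instead of the raw character vector) to the y aesthetic keeps the rows in the intended top-to-bottom order.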

How to stack two ggplot points on top of each other with error bars

I am attempting to plot two points that have error bars on them on top of each other in ggplot. However, the error bars are not syncing up with the points. This is the code that I'm using, and I have my ensuing graph attached:
df = data.frame(rtix = mean(DandI_Variance$`1RTI`[1:11]),
rtiy = 50,
rtixmin = DandI_Variance$`1RTI`[11],
rtixmax = DandI_Variance$`1RTI`[1],
rtiymin = 52,
rtiymax = 42,
rtcx = mean(DandI_Variance$`1RTC`[1:11]),
rtcy = 75,
rtcxmin = DandI_Variance$`1RTC`[11],
rtcxmax = DandI_Variance$`1RTC`[1],
rtcymin = 69,
rtcymax = 79)
ggplot(data = df, aes(x = rtix, y = rtiy)) +
geom_point() +
geom_errorbar(aes(ymin = rtiymin, ymax = rtiymax, width = .07, color = "blue")) +
geom_errorbarh(aes(xmin = rtixmin, xmax = rtixmax, height = 10, color = "blue")) +
geom_point(aes(x = rtcx, y = rtcy)) +
geom_errorbar(aes(ymin = rtcymin, ymax = rtcymax, width = .07, color = "red")) +
geom_errorbarh(aes(xmin = rtcxmin, xmax = rtcxmax, height = 10, color = "red")) +
xlab("S Equalibrium") +
ylab("Time to Equalibrium") +
ylim(0, 100) +
xlim(0, 1) +
ggtitle("Performance of Models")
I think there might be some confusion caused within ggplot since you have the geom_errorbar() and geom_errorbarh() functions twice in the same call. It also just looked like you're structuring your data frame in a weird way. Rather than having one row, why not give your data frame 2 rows, each with identifying columns?
I'd try structuring the code like this as a first step (hopefully this solves it).
I've compressed the data frame into 2 rows and 7 columns (adding a new rttype column to use for color), and I call each ggplot2 function once rather than twice. I've also moved width and height outside of the aes() call: aes() is for mapping columns of the data to aesthetics, while fixed values such as a numerical width of .07 belong outside it. The color stays inside aes() only because it now refers to a column instead of a literal name. Notice that on your plot "blue" is actually red and vice versa: inside aes(), "blue" and "red" are treated as data values (category labels), and ggplot2 assigns its default colours to them. Finally, I've added a manual colour scale so we can choose which type gets which colour. You can switch blue and red around if you want them in the other order.
df = data.frame(rtx = c(mean(DandI_Variance$`1RTI`[1:11]),
mean(DandI_Variance$`1RTC`[1:11])),
rty = c(50,75),
rtxmin = c(DandI_Variance$`1RTI`[11],
DandI_Variance$`1RTC`[11]),
rtxmax = c(DandI_Variance$`1RTI`[1],
DandI_Variance$`1RTC`[1]),
rtymin = c(52,69),
rtymax = c(42,79),
rttype = c('I', 'C')
)
ggplot(data = df, aes(x = rtx, y = rty)) +
geom_point() +
geom_errorbar(aes(ymin = rtymin, ymax = rtymax, color = rttype), width = .07) +
geom_errorbarh(aes(xmin = rtxmin, xmax = rtxmax, color = rttype), height = 10) +
scale_color_manual(values = c("blue", "red")) +
xlab("S Equalibrium") +
ylab("Time to Equalibrium") +
ylim(0, 100) +
xlim(0, 1) +
ggtitle("Performance of Models")
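Since DandI_Variance is not available here, the following is a self-contained sketch of the same layout with made-up x values (the numbers in df are assumptions chosen only so the example runs). It also checks that each point lies inside its own error bars, and uses a named colour vector so the pairing of type and colour does not depend on level order.

```r
library(ggplot2)

# Hypothetical values standing in for the DandI_Variance summaries
df <- data.frame(
  rtx    = c(0.40, 0.65),
  rty    = c(50, 75),
  rtxmin = c(0.30, 0.55),
  rtxmax = c(0.50, 0.75),
  rtymin = c(42, 69),
  rtymax = c(52, 79),
  rttype = c("I", "C")
)

# Sanity check: every point should be bracketed by its min/max columns
brackets <- with(df, rtxmin <= rtx & rtx <= rtxmax & rtymin <= rty & rty <= rtymax)

p <- ggplot(df, aes(x = rtx, y = rty, color = rttype)) +
  geom_point() +
  geom_errorbar(aes(ymin = rtymin, ymax = rtymax), width = 0.07) +
  geom_errorbarh(aes(xmin = rtxmin, xmax = rtxmax), height = 10) +
  scale_color_manual(values = c(I = "blue", C = "red")) +
  xlim(0, 1) +
  ylim(0, 100)
```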

ggplot, facet, piechart: placing text in the middle of pie chart slices

I'm trying to produce a facetted pie-chart with ggplot and facing problems with placing text in the middle of each slice:
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1), y=Cnt, label=Cnt, ymax=Cnt),
position=position_fill(width=1))
The output:
What parameters of geom_text should be adjusted in order to place numerical labels in the middle of piechart slices?
Related question is Pie plot getting its text on top of each other but it doesn't handle case with facet.
UPDATE: following Paul Hiemstra advice and approach in the question above I changed code as follows:
pie_text = dat$Cnt/2 + c(0,cumsum(dat$Cnt)[-length(dat$Cnt)])  # added
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position="fill") +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(x=factor(1),
y=pie_text,  # changed
label=Cnt, ymax=Cnt), position=position_fill(width=1))
As I expected, the tweaked text coordinates are absolute (computed over the whole data set), but they need to be computed within each facet's data:
NEW ANSWER: With the introduction of ggplot2 v2.2.0, position_stack() can be used to position the labels without the need to calculate a position variable first. The following code will give you the same result as the old answer:
ggplot(data = dat, aes(x = "", y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
To remove "hollow" center, adapt the code to:
ggplot(data = dat, aes(x = 0, y = Cnt, fill = Volume)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Cnt), position = position_stack(vjust = 0.5)) +
scale_x_continuous(expand = c(0,0)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
OLD ANSWER: The solution to this problem is creating a position variable, which can be done quite easily with base R or with the data.table, plyr or dplyr packages:
Step 1: Creating the position variable for each Channel
# with base R
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5*x))
# with the data.table package
library(data.table)
setDT(dat)
dat <- dat[, pos:=cumsum(Cnt)-0.5*Cnt, by="Channel"]
# with the plyr package
library(plyr)
dat <- ddply(dat, .(Channel), transform, pos=cumsum(Cnt)-0.5*Cnt)
# with the dplyr package
library(dplyr)
dat <- dat %>% group_by(Channel) %>% mutate(pos=cumsum(Cnt)-0.5*Cnt)
Step 2: Creating the facetted plot
library(ggplot2)
ggplot(data = dat) +
geom_bar(aes(x = "", y = Cnt, fill = Volume), stat = "identity") +
geom_text(aes(x = "", y = pos, label = Cnt)) +
coord_polar(theta = "y") +
facet_grid(Channel ~ ., scales = "free")
The result:
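To make the position variable from Step 1 concrete, here is the base R version applied to the question's data with the resulting numbers. Each pos is the midpoint of its slice on the cumulative scale within one Channel, so the cumulative sum restarts for KIOSK.

```r
dat <- read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header = TRUE)

# Midpoint of each slice: cumulative sum up to the slice, minus half the slice
dat$pos <- with(dat, ave(Cnt, Channel, FUN = function(x) cumsum(x) - 0.5 * x))
dat$pos
# AGENT: 4172.0 11068.0 25703.5
# KIOSK: 9637.5 26052.0 51975.5
```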
I would like to speak out against the conventional way of making pies in ggplot2, which is to draw a stacked barplot in polar coordinates. While I appreciate the mathematical elegance of that approach, it does cause all sorts of headaches when the plot doesn't look quite the way it's supposed to. In particular, precisely adjusting the size of the pie can be difficult. (If you don't know what I mean, try to make a pie chart that extends all the way to the edge of the plot panel.)
I prefer drawing pies in a normal cartesian coordinate system, using geom_arc_bar() from ggforce. It requires a little bit of extra work on the front end, because we have to calculate angles ourselves, but that's easy and the level of control we get as a result is more than worth it.
I've used this approach in previous answers here and here.
The data (from the question):
dat = read.table(text = "Channel Volume Cnt
AGENT high 8344
AGENT medium 5448
AGENT low 23823
KIOSK high 19275
KIOSK medium 13554
KIOSK low 38293", header=TRUE)
The pie-drawing code:
library(ggplot2)
library(ggforce)
library(dplyr)
# calculate the start and end angles for each pie
dat_pies <- left_join(dat,
dat %>%
group_by(Channel) %>%
summarize(Cnt_total = sum(Cnt))) %>%
group_by(Channel) %>%
mutate(end_angle = 2*pi*cumsum(Cnt)/Cnt_total, # ending angle for each pie slice
start_angle = lag(end_angle, default = 0), # starting angle for each pie slice
mid_angle = 0.5*(start_angle + end_angle)) # middle of each pie slice, for the text label
rpie = 1 # pie radius
rlabel = 0.6 * rpie # radius of the labels; a number slightly larger than 0.5 seems to work better,
# but 0.5 would place it exactly in the middle as the question asks for.
# draw the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt),
hjust = 0.5, vjust = 0.5) +
coord_fixed() +
scale_x_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
To show why I think this approach is so much more powerful than the conventional (coord_polar()) approach, let's say we want the labels on the outside of the pie rather than inside. This creates a couple of problems: we will have to adjust hjust and vjust depending on the side of the pie a label falls on, and we will have to make the plot panel wider than it is high to make space for the labels on the side without generating excessive space above and below. Solving these problems in the polar coordinate approach is not fun, but it's trivial in cartesian coordinates:
# generate hjust and vjust settings depending on the quadrant into which each
# label falls
dat_pies <- mutate(dat_pies,
hjust = ifelse(mid_angle>pi, 1, 0),
vjust = ifelse(mid_angle<pi/2 | mid_angle>3*pi/2, 0, 1))
rlabel = 1.05 * rpie # now we place labels outside of the pies
ggplot(dat_pies) +
geom_arc_bar(aes(x0 = 0, y0 = 0, r0 = 0, r = rpie,
start = start_angle, end = end_angle, fill = Volume)) +
geom_text(aes(x = rlabel*sin(mid_angle), y = rlabel*cos(mid_angle), label = Cnt,
hjust = hjust, vjust = vjust)) +
coord_fixed() +
scale_x_continuous(limits = c(-1.5, 1.4), name = "", breaks = NULL, labels = NULL) +
scale_y_continuous(limits = c(-1, 1), name = "", breaks = NULL, labels = NULL) +
facet_grid(Channel~.)
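The angle calculation in this answer can be checked in isolation. The sketch below redoes it in base R for the AGENT rows only, replacing dplyr's lag() with an equivalent base R expression; the variable names mirror those in the answer.

```r
cnt <- c(high = 8344, medium = 5448, low = 23823)  # AGENT rows from the question

# Ending angle of each slice, in radians, measured clockwise from 12 o'clock
end_angle <- 2 * pi * cumsum(cnt) / sum(cnt)

# Each slice starts where the previous one ended; the first starts at 0
# (base R equivalent of lag(end_angle, default = 0))
start_angle <- c(0, head(end_angle, -1))

# Middle of each slice, used to place the label
mid_angle <- 0.5 * (start_angle + end_angle)

# Label coordinates on a circle of radius rlabel
rlabel <- 0.6
label_x <- rlabel * sin(mid_angle)
label_y <- rlabel * cos(mid_angle)
```

The sin/cos pairing (rather than the usual cos/sin) matches geom_arc_bar(), which measures angles clockwise from the top of the circle.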
To tweak the position of the label text relative to the coordinate, you can use the vjust and hjust arguments of geom_text. This will determine the position of all labels simultaneously, so this might not be what you need.
Alternatively, you could tweak the coordinate of the label. Define a new data.frame where you compute the midpoint of each slice on the cumulative Cnt scale (e.g. label_y[i] = cumsum(Cnt)[i] - Cnt[i]/2) to position the label in the center of that particular slice. Just pass this new data.frame to geom_text in place of the original data.frame.
In addition, piecharts have some visual interpretation flaws. In general I would not use them, especially where good alternatives exist, e.g. a dotplot:
ggplot(dat, aes(x = Cnt, y = Volume)) +
geom_point() +
facet_wrap(~ Channel, ncol = 1)
For example, from this plot it is obvious that Cnt is higher for Kiosk than for Agent, this information is lost in the piechart.
The following answer is partial and clunky, and I won't accept it.
The hope is that it will solicit a better solution.
text_KIOSK = dat$Cnt
text_AGENT = dat$Cnt
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
text_KIOSK = text_KIOSK/1.7 + c(0,cumsum(text_KIOSK)[-length(dat$Cnt)])
text_AGENT = text_AGENT/1.7 + c(0,cumsum(text_AGENT)[-length(dat$Cnt)])
text_KIOSK[dat$Channel=='AGENT'] = 0
text_AGENT[dat$Channel=='KIOSK'] = 0
pie_text = text_KIOSK + text_AGENT
vis = ggplot(data=dat, aes(x=factor(1), y=Cnt, fill=Volume)) +
geom_bar(stat="identity", position=position_fill(width=1)) +
coord_polar(theta="y") +
facet_grid(Channel~.) +
geom_text(aes(y=pie_text, label=format(Cnt,format="d",big.mark=','), ymax=Inf), position=position_fill(width=1))
It produces following chart:
As you noticed I can't move labels for green (low).
