How to get complete, rather than partial, pie charts using gganimate - r

I have a problem when doing an animated pie chart with gganimate and ggplot.
I want to have normal pies each year, but my output is totally different.
You can see an example of the code using mtcars:
library(ggplot2)
library(gganimate)
#Some Data
df<-aggregate(mtcars$mpg, list(mtcars$cyl,mtcars$carb), sum)
colnames(df)<-c("X","Y","Z")
bp<- ggplot(df, aes(x="", y=Z, fill=X, frame=Y))+
geom_bar(width = 1, stat = "identity") + coord_polar("y", start=0)
gganimate(bp, "output.gif")
And this is the output:
It works well when the frame has only one level:

The ggplot code creates a single stacked bar chart with a section for every row in df. With coord_polar this becomes a single pie chart with a wedge for each row in the data frame. Then when you use gg_animate, each frame includes only the wedges that correspond to a given level of Y. That's why you're getting only a section of the full pie chart each time.
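To see why each frame shows only a slice, it can help to compute how much of the total each level of Y accounts for. This is a minimal base R sketch using the same df as in the question; frame_share is a name introduced here purely for illustration:

```r
# Same aggregation as in the question
df <- aggregate(mtcars$mpg, list(mtcars$cyl, mtcars$carb), sum)
colnames(df) <- c("X", "Y", "Z")

# Fraction of the single big pie covered by the wedges of each frame (level of Y)
frame_share <- tapply(df$Z, df$Y, sum) / sum(df$Z)
round(frame_share, 2)
```

Every share is below 1 and the shares add up to 1, which is consistent with each frame drawing only its own part of the one pie.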
If instead you want a full pie for each level of Y, then one option would be to create a separate pie chart for each level of Y and then combine those pies into a GIF. Here's an example with some fake data that (I hope) is similar to your real data:
library(animation)
# Fake data
set.seed(40)
df = data.frame(Year = rep(2010:2015, 3),
disease = rep(c("Cardiovascular","Neoplasms","Others"), each=6),
count=c(sapply(c(1,1.5,2), function(i) cumsum(c(1000*i, sample((-200*i):(200*i),5))))))
saveGIF({
for (i in unique(df$Year)) {
p = ggplot(df[df$Year==i,], aes(x="", y=count, fill=disease, frame=Year))+
geom_bar(width = 1, stat = "identity") +
facet_grid(~Year) +
coord_polar("y", start=0)
print(p)
}
}, movie.name="test1.gif")
The pies in the GIF above are all the same size. But you can also change the size of the pies based on the sum of count for each level of Year (code adapted from this SO answer):
library(dplyr)
df = df %>% group_by(Year) %>%
mutate(cp1 = c(0, head(cumsum(count), -1)),
cp2 = cumsum(count))
saveGIF({
for (i in unique(df$Year)) {
p = ggplot(df %>% filter(Year==i), aes(fill=disease)) +
geom_rect(aes(xmin=0, xmax=max(cp2), ymin=cp1, ymax=cp2)) +
facet_grid(~Year) +
coord_polar("y", start=0) +
scale_x_continuous(limits=c(0,max(df$cp2)))
print(p)
}
}, movie.name="test2.gif")
If I can editorialize for a moment, although animation is cool (but pie charts are uncool, so maybe animating a bunch of pie charts just adds insult to injury), the data will probably be easier to comprehend with a plain old static line plot. For example:
ggplot(df, aes(x=Year, y=count, colour=disease)) +
geom_line() + geom_point() +
scale_y_continuous(limits=c(0, max(df$count)))
Or maybe this:
ggplot(df, aes(x=Year, y=count, colour=disease)) +
geom_line() + geom_point(show.legend=FALSE) +
geom_line(data=df %>% group_by(Year) %>% mutate(count=sum(count)),
aes(x=Year, y=count, colour="All"), lwd=1) +
scale_y_continuous(limits=c(0, df %>% group_by(Year) %>%
summarise(count=sum(count)) %>% pull(count) %>% max())) +
scale_colour_manual(values=c("black", hcl(seq(15,275,length=4)[1:3],100,65)))

Related

ggplot geom_bar plot percentages by group and facet_wrap

I want to plot multiple categories on a single graph, with the percentages of each category adding up to 100%. For example, if I were plotting male versus female, each grouping (male or female), would add up to 100%. I'm using the following code, where the percentages appear to be for all groups on both graphs, i.e. if you added up all the bars on the left and right hand graphs, they would total 100%, rather than the yellow bars on the left hand graph totalling 100%, the purple bars on the left hand graph totalling 100% etc.
I appreciate that this is doable by using stat = 'identity', but is there a way to do this in ggplot without wrangling the dataframe prior to plotting?
library(ggplot2)
tmp <- diamonds %>% filter(color %in% c("E","I")) %>% select(color, cut, clarity)
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
When computing the percentages inside ggplot2, you have to group the data just as you would when summarizing it before passing it to ggplot. In your case, the PANEL column that ggplot2 adds internally to the data can be used for the grouping.
Using after_stat and tapply this could be achieved like so:
library(ggplot2)
library(dplyr)
tmp <- diamonds %>% filter(color %in% c("E","I")) %>% select(color, cut, clarity)
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = after_stat(count/tapply(count, PANEL, sum)[PANEL])), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
Or using the .. notation:
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = ..count../tapply(..count.., ..PANEL.., sum)[..PANEL..]), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
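The tapply indexing used in both versions above can be tried on a small vector outside of ggplot2. This is only a sketch with made-up numbers; count and PANEL here are plain vectors standing in for the columns ggplot2 computes internally:

```r
# Made-up counts for two panels (two bars in each panel)
count <- c(10, 30, 20, 40)
PANEL <- factor(c(1, 1, 2, 2))

# One total per panel, then looked up again for every row via [PANEL]
panel_totals <- tapply(count, PANEL, sum)
pct <- as.numeric(count / panel_totals[PANEL])
pct
```

Within each panel the values of pct add up to 1, which is the behaviour wanted for the facets.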
EDIT: If you need to group by more than one variable, I would suggest a helper function. Here I use dplyr for the computations:
comp_pct <- function(count, PANEL, cut) {
data.frame(count, PANEL, cut) %>%
group_by(PANEL, cut) %>%
mutate(pct = count / sum(count)) %>%
pull(pct)
}
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = after_stat(comp_pct(count, PANEL, fill))), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
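If you would rather not depend on dplyr inside the helper, the same grouped percentage can probably be computed with base R's ave(). This is a sketch of an alternative, not the code from the answer; comp_pct_base and the toy vectors are names and values made up for illustration:

```r
# Base R variant of the helper: divide each count by the sum of its (PANEL, cut) group
comp_pct_base <- function(count, PANEL, cut) {
  count / ave(count, PANEL, cut, FUN = sum)
}

# Toy input: three groups, (1, a), (1, b) and (2, a)
count <- c(5, 15, 10, 30, 8, 12)
PANEL <- c(1, 1, 1, 1, 2, 2)
cut   <- c("a", "a", "b", "b", "a", "a")

pct <- comp_pct_base(count, PANEL, cut)
pct
```

It should be usable in the same place, i.e. y = after_stat(comp_pct_base(count, PANEL, fill)), since it takes the same three arguments and returns one value per row.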

Plotting a bar chart with years grouped together

I am using the fivethirtyeight bechdel dataset, located here https://github.com/rudeboybert/fivethirtyeight, and am attempting to recreate the first plot shown in the article here https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/. I am having trouble getting the years to group together similarly to how they did in the article.
This is the current code I have:
ggplot(data = bechdel, aes(year)) +
geom_histogram(aes(fill = clean_test), binwidth = 5, position = "fill") +
scale_fill_manual(breaks = c("ok", "dubious", "men", "notalk", "nowomen"),
values=c("red", "salmon", "lightpink", "dodgerblue",
"blue")) +
theme_fivethirtyeight()
I see where you were going with the histogram geom, but this really looks more like a categorical bar chart. Once you take that approach it gets easier, after a bit of ugly code to get the correct labels on the year columns.
The bars are stacked in the wrong order on this one, and there needs to be some formatting applied to look like the 538 chart, but I'll leave that for you.
library(fivethirtyeight)
library(tidyverse)
library(ggthemes)
library(scales)
# Create date range column
bechdel_summary <- bechdel %>%
mutate(date.range = ((year %/% 10)* 10) + ((year %% 10) %/% 5 * 5)) %>%
mutate(date.range = paste0(date.range," - '",substr(date.range + 5,3,5)))
ggplot(data = bechdel_summary, aes(x = date.range, fill = clean_test)) +
geom_bar(position = "fill", width = 0.95) +
scale_y_continuous(labels = percent) +
theme_fivethirtyeight()
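The binning arithmetic in the mutate() calls can be checked on a handful of years without loading the dataset. A minimal sketch, with year, bin_start and label as illustrative names:

```r
# A few example years, chosen to sit at the edges of 5-year bins
year <- c(1970, 1974, 1975, 1979, 2013)

# Same arithmetic as in the answer: decade start plus 0 or 5
bin_start <- (year %/% 10) * 10 + (year %% 10) %/% 5 * 5

# Same label construction as in the answer
label <- paste0(bin_start, " - '", substr(bin_start + 5, 3, 5))
data.frame(year, bin_start, label)
```

The expression gives the same result as (year %/% 5) * 5. Note that the end year in the label is the start of the next bin; using bin_start + 4 instead would show the last year actually contained in the bin.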

Merge x axis and legend of multiple plots

Using the following R code I generated multiple plots onto one plot.
library(ggplot2)
Adata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='A',]
Cdata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='C',]
Gdata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='G',]
Udata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='U',]
# Grouped
Aplot <- ggplot(Adata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
Cplot <- ggplot(Cdata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
Gplot <- ggplot(Gdata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
Uplot <- ggplot(Udata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
# grid.arrange() comes from gridExtra
library(gridExtra)
grid.arrange(Aplot,Cplot,Gplot,Uplot,ncol=1)
How can I merge the x axis label, the y axis labels, and the Species legend into 1 for the entire figure?
Also, would it make more sense to have the Position tick marks labeled for the entire figure or for each figure?
Try this:
require(dplyr)
require(ggplot2)
nt.df %>%
filter(Species %in% c('Human', 'Arabidopsis')) %>%
ggplot(aes(fill = Species, y = Percent, x = Position)) +
geom_bar(position="dodge", stat="identity") +
facet_wrap(. ~ Nucleotide)
Since the data wasn't posted I couldn't test it, but this should work. Let me know if you get an error. If you've never used piping before (%>%), it's a popular way to make code more readable and concise. Basically it makes whatever is to the left the first arg in the function to the right. In this case, data is the first arg in ggplot() so the filtered dataset goes into ggplot()
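As a quick illustration of what the pipe does, the two calls below should give the same result. The sketch uses the built-in mtcars data and assumes the magrittr package is installed (dplyr re-exports the same %>% operator):

```r
library(magrittr)

# Ordinary call: the data frame is the first argument
a <- head(mtcars, 3)

# Piped call: the left-hand side becomes the first argument of head()
b <- mtcars %>% head(3)

identical(a, b)
```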

Display the total number of bin elements in a stacked histogram with ggplot2

I'd like to show data values on a stacked bar chart in ggplot2. After many attempts, the only way I found to show the total amount (for each bin) is using the following code
set.seed(1234)
df <- data.frame(
sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)
p<-ggplot(df, aes(x=weight, fill=sex, color=sex))
p<-p + geom_histogram(position="stack", alpha=0.5, binwidth=5)
tbl <- (ggplot_build(p)$data[[1]])[, c("x", "count")]
agg <- aggregate(tbl["count"], by=tbl["x"], FUN=sum)
for(i in 1:length(agg$x))
if(agg$count[i])
p <- p + geom_text(x=agg$x[i], y=agg$count[i] + 1.5, label=agg$count[i], colour="black" )
which generates the following plot:
Is there a better (and more efficient) way to get the same result using ggplot2?
Thanks a lot in advance
You can use stat_bin to count up the values and add text labels.
p <- ggplot(df, aes(x=weight)) +
geom_histogram(aes(fill=sex, color=sex),
position="stack", alpha=0.5, binwidth=5) +
stat_bin(aes(y=..count.. + 2, label=..count..), geom="text", binwidth=5)
I moved the fill and color aesthetics to geom_histogram so that they would apply only to that layer and not globally to the whole plot, because we want stat_bin to generate an overall count for each bin, rather than separate counts for each level of sex. ..count.. is an internal variable returned by stat_bin that stores the counts.
In this case, it was straightforward to add the counts directly. However, in more complicated situations, you might sometimes want to summarise the data outside of ggplot and then feed the summary data to ggplot. Here's how you would do that in this case:
library(dplyr)
counts = df %>% group_by(weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
summarise(n = n())
countsByGroup = df %>% group_by(sex, weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
summarise(n = n())
ggplot(countsByGroup, aes(x=weight, y=n, fill=sex, color=sex)) +
geom_bar(stat="identity", alpha=0.5, width=1) +
geom_text(data=counts, aes(label=n, y=n+2), colour="black")
Or, you can just create countsByGroup and then create the equivalent of counts on the fly inside ggplot:
ggplot(countsByGroup, aes(x=weight, y=n, fill=sex, color=sex)) +
geom_bar(stat="identity", alpha=0.5, width=1) +
geom_text(data=countsByGroup %>% group_by(weight) %>% mutate(n=sum(n)),
aes(label=n, y=n+2), colour="black")
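As a sanity check on the summaries, the per-sex counts in each bin should add up to the overall count that goes on the label. A base R sketch using the same simulated data and the same explicit breaks as the cut() calls above; note that these breaks need not coincide with the default bin placement stat_bin chooses for binwidth=5:

```r
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 65, sd = 5)))
)

# Same bins as in the dplyr summaries: [30,35), [35,40), ...
bins <- cut(df$weight, seq(30, 100, 5), right = FALSE)

overall <- table(bins)          # one count per bin (the text labels)
by_sex  <- table(df$sex, bins)  # one count per sex and bin (the stacked bars)

all(colSums(by_sex) == as.vector(overall))
```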

ggplot Donut chart

Hi I really have googled this a lot without any joy. Would be happy to get a reference to a website if it exists. I'm struggling to understand the Hadley documentation on polar coordinates and I know that pie/donut charts are considered inherently evil.
That said, what I'm trying to do is
Create a donut/ring chart (so a pie with an empty middle) like the tikz ring chart shown here
Add a second layer circle on top (with alpha=0.5 or so) that shows a second (comparable) variable.
Why? I'm looking to show financial information. The first ring is costs (broken down) and the second is total income. The idea is then to add + facet=period for each review period to show the trend in both revenues and expenses and the growth in both.
Any thoughts would be most appreciated
Note: if an MWE is needed, suppose (completely arbitrarily) that this was tried with
donut_data=iris[,2:4]
revenue_data=iris[,1]
facet=iris$Species
That would be similar to what I'm trying to do.. Thanks
I don't have a full answer to your question, but I can offer some code that may help get you started making ring plots using ggplot2.
library(ggplot2)
# Create test data.
dat = data.frame(count=c(10, 60, 30), category=c("A", "B", "C"))
# Add additional columns, needed for drawing with geom_rect.
dat$fraction = dat$count / sum(dat$count)
dat = dat[order(dat$fraction), ]
dat$ymax = cumsum(dat$fraction)
dat$ymin = c(0, head(dat$ymax, n=-1))
p1 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect() +
coord_polar(theta="y") +
xlim(c(0, 4)) +
labs(title="Basic ring plot")
p2 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
xlim(c(0, 4)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank()) +
labs(title="Customized ring plot")
library(gridExtra)
png("ring_plots_1.png", height=4, width=8, units="in", res=120)
grid.arrange(p1, p2, nrow=1)
dev.off()
Thoughts:
You may get more useful answers if you post some well-structured sample data. You have mentioned using some columns from the iris dataset (a good start), but I am unable to see how to use that data to make a ring plot. For example, the ring plot you have linked to shows proportions of several categories, but neither iris[, 2:4] nor iris[, 1] are categorical.
You want to "Add a second layer circle on top": Do you mean to superimpose the second ring directly on top of the first? Or do you want the second ring to be inside or outside of the first? You could add a second internal ring with something like geom_rect(data=dat2, xmax=3, xmin=2, aes(ymax=ymax, ymin=ymin))
If your data.frame has a column named period, you can use facet_wrap(~ period) for facetting.
To use ggplot2 most easily, you will want your data in 'long-form'; melt() from the reshape2 package may be useful for converting the data.
Make some barplots for comparison, even if you decide not to use them. For example, try:
ggplot(dat, aes(x=category, y=count, fill=category)) +
geom_bar(stat="identity")
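The suggestion of a second internal ring could be sketched roughly as follows. This is an illustration under assumptions, not a tested solution for the financial data: ring_bounds is a hypothetical helper introduced here, and the values in dat2 are made up.

```r
library(ggplot2)

# Hypothetical helper: add the ymin/ymax columns that geom_rect needs for one ring
ring_bounds <- function(d) {
  d$fraction <- d$count / sum(d$count)
  d$ymax <- cumsum(d$fraction)
  d$ymin <- c(0, head(d$ymax, n = -1))
  d
}

dat  <- ring_bounds(data.frame(count = c(10, 60, 30), category = c("A", "B", "C")))
# Second variable to compare against (made-up values)
dat2 <- ring_bounds(data.frame(count = c(40, 40, 20), category = c("A", "B", "C")))

p <- ggplot(dat, aes(fill = category, ymin = ymin, ymax = ymax)) +
  geom_rect(aes(xmin = 3, xmax = 4)) +                            # outer ring
  geom_rect(data = dat2, aes(xmin = 2, xmax = 3), alpha = 0.5) +  # inner ring
  coord_polar(theta = "y") +
  xlim(c(0, 4))
p
```

Giving the two rings different xmin/xmax ranges places them side by side; giving them the same range would superimpose them instead, with alpha controlling how much of the first ring shows through.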
Just trying to solve question 2 with the same approach from bdemarest's answer. Also using his code as a scaffold. I added some tests to make it more complete but feel free to remove them.
library(broom)
library(tidyverse)
library(magrittr) # provides %<>%, which library(tidyverse) does not attach
# Create test data.
dat = data.frame(count=c(10,60,20,50),
ring=c("A", "A","B","B"),
category=c("C","D","C","D"))
# compute pvalue
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
ungroup() %>% select(-ring) %>%
chisq.test() %>% tidy()
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
select(-ring) %>%
fisher.test() %>% tidy() %>% full_join(cs.pvalue)
# compute fractions
#dat = dat[order(dat$count), ]
dat %<>% group_by(ring) %>% mutate(fraction = count / sum(count),
ymax = cumsum(fraction),
ymin = c(0, head(ymax, -1)))
# Add x limits
baseNum <- 4
#numCat <- length(unique(dat$ring))
# ring is a character column in R >= 4.0, so go through factor() to get 1, 2, ...
dat$xmax <- as.numeric(factor(dat$ring)) + baseNum
dat$xmin = dat$xmax -1
# plot
p2 = ggplot(dat, aes(fill=category,
alpha = ring,
ymax=ymax,
ymin=ymin,
xmax=xmax,
xmin=xmin)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
geom_text(inherit.aes = F,
x=c(-1,1),
y=0,
data = cs.pvalue,aes(label = paste(method,
"\n",
format(p.value,
scientific = T,
digits = 2))))+
xlim(c(0, 6)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank(),
panel.border = element_blank()) +
labs(title="Customized ring plot") +
scale_fill_brewer(palette = "Set1") +
scale_alpha_discrete(range = c(0.5,0.9))
p2
And the result:

Resources