Hi I really have googled this a lot without any joy. Would be happy to get a reference to a website if it exists. I'm struggling to understand the Hadley documentation on polar coordinates and I know that pie/donut charts are considered inherently evil.
That said, what I'm trying to do is:
1. Create a donut/ring chart (so a pie with an empty middle) like the tikz ring chart shown here.
2. Add a second layer circle on top (with alpha=0.5 or so) that shows a second (comparable) variable.
Why? I'm looking to show financial information. The first ring is costs (broken down) and the second is total income. The idea is then to add + facet=period for each review period to show the trend in both revenues and expenses and the growth in both.
Any thoughts would be most appreciated
Note: completely arbitrarily, if an MWE is needed, suppose this was tried with
donut_data=iris[,2:4]
revenue_data=iris[,1]
facet=iris$Species
That would be similar to what I'm trying to do. Thanks.
I don't have a full answer to your question, but I can offer some code that may help get you started making ring plots using ggplot2.
library(ggplot2)
# Create test data.
dat = data.frame(count=c(10, 60, 30), category=c("A", "B", "C"))
# Add additional columns, needed for drawing with geom_rect.
dat$fraction = dat$count / sum(dat$count)
dat = dat[order(dat$fraction), ]
dat$ymax = cumsum(dat$fraction)
dat$ymin = c(0, head(dat$ymax, n=-1))
p1 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect() +
coord_polar(theta="y") +
xlim(c(0, 4)) +
labs(title="Basic ring plot")
p2 = ggplot(dat, aes(fill=category, ymax=ymax, ymin=ymin, xmax=4, xmin=3)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
xlim(c(0, 4)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank()) +
labs(title="Customized ring plot")
library(gridExtra)
png("ring_plots_1.png", height=4, width=8, units="in", res=120)
grid.arrange(p1, p2, nrow=1)
dev.off()
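To make the geom_rect bookkeeping above concrete, here is a minimal sketch that recomputes the same columns in base R and prints them. The data frame name `ring` is only an illustrative name, not part of the code above.

```r
# Same test data as above; `ring` is an illustrative name.
ring <- data.frame(count = c(10, 60, 30), category = c("A", "B", "C"))
ring$fraction <- ring$count / sum(ring$count)
ring <- ring[order(ring$fraction), ]         # smallest wedge first: A, C, B
ring$ymax <- cumsum(ring$fraction)           # upper edge of each wedge
ring$ymin <- c(0, head(ring$ymax, n = -1))   # lower edge = previous upper edge
print(ring[, c("category", "fraction", "ymin", "ymax")])
```

Each wedge spans ymin to ymax on the angular axis, and the last ymax equals 1, so the ring closes.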
Thoughts:
You may get more useful answers if you post some well-structured sample data. You have mentioned using some columns from the iris dataset (a good start), but I am unable to see how to use that data to make a ring plot. For example, the ring plot you have linked to shows proportions of several categories, but neither iris[, 2:4] nor iris[, 1] are categorical.
You want to "Add a second layer circle on top": Do you mean to superimpose the second ring directly on top of the first? Or do you want the second ring to be inside or outside of the first? You could add a second internal ring with something like geom_rect(data=dat2, xmax=3, xmin=2, aes(ymax=ymax, ymin=ymin))
If your data.frame has a column named period, you can use facet_wrap(~ period) for facetting.
To use ggplot2 most easily, you will want your data in 'long-form'; melt() from the reshape2 package may be useful for converting the data.
Make some barplots for comparison, even if you decide not to use them. For example, try:
ggplot(dat, aes(x=category, y=count, fill=category)) +
geom_bar(stat="identity")
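Putting several of these thoughts together, here is a rough sketch of two nested rings faceted by period. Everything about the data is an assumption: the data frame `fin`, its column names, and the amounts are invented for illustration. Scaling both rings by the larger of the two totals in each period is also just one possible design choice, made so that costs and income stay comparable.

```r
library(ggplot2)

# Hypothetical long-form data: one row per period x ring x category.
fin <- data.frame(
  period   = rep(c("2011", "2012"), each = 4),
  ring     = rep(c("costs", "costs", "costs", "income"), times = 2),
  category = rep(c("staff", "rent", "other", "income"), times = 2),
  amount   = c(40, 25, 15, 100, 50, 30, 20, 120)
)

# Scale both rings by the larger ring total within each period,
# so the smaller ring covers only part of the circle.
ring_total <- ave(fin$amount, fin$period, fin$ring, FUN = sum)
denom <- ave(ring_total, fin$period, FUN = max)
fin$fraction <- fin$amount / denom
fin$ymax <- ave(fin$fraction, fin$period, fin$ring, FUN = cumsum)
fin$ymin <- fin$ymax - fin$fraction

# Income ring inside (x from 2 to 3), cost ring outside (x from 3 to 4).
fin$xmin <- ifelse(fin$ring == "income", 2, 3)
fin$xmax <- fin$xmin + 1

p <- ggplot(fin, aes(fill = category, xmin = xmin, xmax = xmax,
                     ymin = ymin, ymax = ymax)) +
  geom_rect(colour = "grey30") +
  coord_polar(theta = "y") +
  xlim(c(0, 4)) +
  facet_wrap(~ period)
print(p)
```

With these made-up numbers the cost ring covers 80% of the circle in the first period, because costs total 80 against an income of 100.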
Just trying to solve question 2 with the same approach from bdemarest's answer. Also using his code as a scaffold. I added some tests to make it more complete but feel free to remove them.
library(broom)
library(tidyverse)
# Create test data.
dat = data.frame(count=c(10,60,20,50),
ring=c("A", "A","B","B"),
category=c("C","D","C","D"))
# compute pvalue
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
ungroup() %>% select(-ring) %>%
chisq.test() %>% tidy()
cs.pvalue <- dat %>% spread(value = count,key=category) %>%
select(-ring) %>%
fisher.test() %>% tidy() %>% full_join(cs.pvalue)
# compute fractions
#dat = dat[order(dat$count), ]
# %<>% comes from magrittr, which library(tidyverse) does not attach, so assign explicitly.
dat <- dat %>% group_by(ring) %>% mutate(fraction = count / sum(count),
ymax = cumsum(fraction),
ymin = c(0, head(ymax, -1)))
# Add x limits
baseNum <- 4
#numCat <- length(unique(dat$ring))
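# ring is a character column in R >= 4.0, so convert it to a factor before taking integer codes.
dat$xmax <- as.numeric(factor(dat$ring)) + baseNum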
dat$xmin = dat$xmax -1
# plot
p2 = ggplot(dat, aes(fill=category,
alpha = ring,
ymax=ymax,
ymin=ymin,
xmax=xmax,
xmin=xmin)) +
geom_rect(colour="grey30") +
coord_polar(theta="y") +
geom_text(inherit.aes = F,
x=c(-1,1),
y=0,
data = cs.pvalue,aes(label = paste(method,
"\n",
format(p.value,
scientific = T,
digits = 2))))+
xlim(c(0, 6)) +
theme_bw() +
theme(panel.grid=element_blank()) +
theme(axis.text=element_blank()) +
theme(axis.ticks=element_blank(),
panel.border = element_blank()) +
labs(title="Customized ring plot") +
scale_fill_brewer(palette = "Set1") +
scale_alpha_discrete(range = c(0.5,0.9))
p2
And the result:
Related
I need to look for correlations in the publicly available flights package. I managed to make a scatter plot using ggplot.
With the code:
library(nycflights13)
library(ggplot2)
ggplot(flights, aes(x = arr_delay, y = dep_delay)) +
geom_point(size = 2) +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
As shown in the image, most points are concentrated in the bottom left. Is there any way to make this graph more visually appealing by spreading the plotted values better?
You can plot your points using the alpha parameter, which sets their transparency (a value between 0 and 1, where 1 is fully opaque). This makes overlapping points easier to distinguish and makes regions with a higher concentration of points look darker. The plot will look better, too.
Start with a value of alpha = 0.7 then experiment with it until you get the best results.
ggplot(flights, aes(x = arr_delay, y = dep_delay)) +
geom_point(size = 2, alpha = 0.7) +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
I used facets to separate the flights that arrived early or on time (arr_delay <= 0) from those that arrived late (arr_delay > 0). The relationship seems different.
library(nycflights13)
library(dplyr)
library(ggplot2)
ff <- flights %>%
filter(!is.na(arr_delay), origin=="LGA") %>% # Filtered to reduce waiting time!
mutate(`Arrival time`=ifelse(arr_delay<=0, "Early", "Delayed"))
ggplot(ff, aes(x = arr_delay, y = dep_delay)) +
geom_point(size = 2, alpha = 0.3) +
geom_smooth(method="auto", fullrange=FALSE, level=0.95) +
facet_wrap(~`Arrival time`, scales="free", labeller=label_both) +
labs(x="Arrival delay (minutes)", y="Departure delay (minutes)")
You could plot aggregated data for the points and use the raw data for the smoother.
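library("ggplot2")
library("nycflights13")
# Bin flights by departure delay (10-minute bins) and average within each bin.
flights <- within(flights, {
bin <- floor(dep_delay / 10)
av_arr <- ave(arr_delay, bin, FUN=function(x) mean(x, na.rm=TRUE))
av_dep <- ave(dep_delay, bin, FUN=function(x) mean(x, na.rm=TRUE))
})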
ggplot(flights) +
geom_point(aes(x=av_arr, y=av_dep), size=2) +
geom_smooth(aes(x=arr_delay, y=dep_delay), method="auto", se=TRUE,
fullrange=FALSE, level=0.95)
I have a problem combining plotly::ggplotly (v4.7.1) with facet_wrap (ggplot2 v3.0.0) that I can't seem to generalise, but it is reproducible with a particular dataset (summary metrics for a set of tweets):
require(tidyverse)
require(plotly)
d = read_csv('https://gist.githubusercontent.com/geotheory/21c4eacbf38ed397f7cf984f8d92e931/raw/9148df79326f53a66a8cc363241a440752487357/data.csv')
d = d %>% mutate(key = fct_reorder(key, n)) # order the bars
p = ggplot(d, aes(key, n)) + geom_bar(stat='identity') +
facet_wrap(~ set, scales='free', nrow=1) +
labs(x=NULL, y=NULL) + coord_flip()
print(p)
Enter ggplotly:
print(ggplotly(p))
This seems to relate to the combination of nrow=1 and scales='free' arguments. Any ideas about the cause?
I have a problem when doing an animated pie chart with gganimate and ggplot.
I want to have normal pies each year, but my output is totally different.
You can see an example of the code using mtcars:
library(ggplot2)
library(gganimate)
#Some Data
df<-aggregate(mtcars$mpg, list(mtcars$cyl,mtcars$carb), sum)
colnames(df)<-c("X","Y","Z")
bp<- ggplot(df, aes(x="", y=Z, fill=X, frame=Y))+
geom_bar(width = 1, stat = "identity") + coord_polar("y", start=0)
gganimate(bp, "output.gif")
And this is the output:
It works well when the frame has only one level:
The ggplot code creates a single stacked bar chart with a section for every row in df. With coord_polar this becomes a single pie chart with a wedge for each row in the data frame. Then when you use gganimate, each frame includes only the wedges that correspond to a given level of Y. That's why you're getting only a section of the full pie chart each time.
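A quick way to check this, sketched in base R with the same aggregation as in the question, is to compute what share of the single full pie belongs to each level of Y. The name `share` is only illustrative.

```r
# Same aggregation as in the question.
df <- aggregate(mtcars$mpg, list(mtcars$cyl, mtcars$carb), sum)
colnames(df) <- c("X", "Y", "Z")

# Share of the single full pie that each frame (level of Y) covers.
share <- tapply(df$Z, df$Y, sum) / sum(df$Z)
print(round(share, 3))
```

The shares add up to 1 across all frames, so no individual frame can show a complete pie.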
If instead you want a full pie for each level of Y, then one option would be to create a separate pie chart for each level of Y and then combine those pies into a GIF. Here's an example with some fake data that (I hope) is similar to your real data:
library(animation)
# Fake data
set.seed(40)
df = data.frame(Year = rep(2010:2015, 3),
disease = rep(c("Cardiovascular","Neoplasms","Others"), each=6),
count=c(sapply(c(1,1.5,2), function(i) cumsum(c(1000*i, sample((-200*i):(200*i),5))))))
saveGIF({
for (i in unique(df$Year)) {
p = ggplot(df[df$Year==i,], aes(x="", y=count, fill=disease, frame=Year))+
geom_bar(width = 1, stat = "identity") +
facet_grid(~Year) +
coord_polar("y", start=0)
print(p)
}
}, movie.name="test1.gif")
The pies in the GIF above are all the same size. But you can also change the size of the pies based on the sum of count for each level of Year (code adapted from this SO answer):
library(dplyr)
df = df %>% group_by(Year) %>%
mutate(cp1 = c(0, head(cumsum(count), -1)),
cp2 = cumsum(count))
saveGIF({
for (i in unique(df$Year)) {
p = ggplot(df %>% filter(Year==i), aes(fill=disease)) +
geom_rect(aes(xmin=0, xmax=max(cp2), ymin=cp1, ymax=cp2)) +
facet_grid(~Year) +
coord_polar("y", start=0) +
scale_x_continuous(limits=c(0,max(df$cp2)))
print(p)
}
}, movie.name="test2.gif")
If I can editorialize for a moment, although animation is cool (but pie charts are uncool, so maybe animating a bunch of pie charts just adds insult to injury), the data will probably be easier to comprehend with a plain old static line plot. For example:
ggplot(df, aes(x=Year, y=count, colour=disease)) +
geom_line() + geom_point() +
scale_y_continuous(limits=c(0, max(df$count)))
Or maybe this:
ggplot(df, aes(x=Year, y=count, colour=disease)) +
geom_line() + geom_point(show.legend=FALSE) +
geom_line(data=df %>% group_by(Year) %>% mutate(count=sum(count)),
aes(x=Year, y=count, colour="All"), lwd=1) +
scale_y_continuous(limits=c(0, max(tapply(df$count, df$Year, sum)))) +
scale_colour_manual(values=c("black", hcl(seq(15,275,length=4)[1:3],100,65)))
I'm trying to produce a boxplot of some numeric outcome broken down by treatment condition and visit number, with the number of observations in each box placed under the plot, and the visit numbers labeled as well. Here's some fake data that will serve to illustrate, and I give two examples of things I've tried that didn't quite work.
library(ggplot2)
library(plyr)
trt <- factor(rep(LETTERS[1:2],150),ordered=TRUE)
vis <- factor(c(rep(1,150),rep(2,100),rep(3,50)),ordered=TRUE)
val <- rnorm(300)
data <- data.frame(trt,vis,val)
data.sum <- ddply(data, .(vis, trt), summarise,
N=length(na.omit(val)))
mytheme <- theme_bw() + theme(panel.spacing = unit(0, "lines"), strip.background = element_blank())
The below code produces a plot that has N labels where I want them. It does this by grabbing summary data from an auxiliary dataset I created. However, I couldn't figure out how to also label visit on the x-axis (ideally, below the individual box labels), or to delineate visits visually in other ways (e.g. lines separating them into panels).
plot1 <- ggplot(data) +
geom_boxplot(aes(x=vis:trt,y=val,group=vis:trt,colour=trt), show.legend=FALSE) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
The plot below is closer to what I want than the one above, in that it has a nice hierarchy of treatments and visits, and a pretty format delineating the visits. However, for each panel it grabs the Ns from the first row in the summary data that matches the treatment condition, because it doesn't "know" that each facet needs to use the row corresponding to that visit.
plot2 <- ggplot(data) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=paste(data.sum$trt,data.sum$N,sep="\n")) +
labs(x="Visit") + mytheme
One workaround is to manipulate your dataset so your x variable is the interaction between trt and N.
Working off what you already have, you can add N to the original dataset via a merge.
test = merge(data, data.sum)
Then make a new variable that is the combination of trt and N.
test = transform(test, trt2 = paste(trt, N, sep = "\n"))
Now make the plot, using the new trt2 variable on the x axis and using scales = "free_x" in facet_wrap to allow for the different labels per facet.
ggplot(test) +
geom_boxplot(aes(x = trt2, y = val, group = trt, colour = trt), show.legend = FALSE) +
facet_wrap(~ vis, drop = FALSE, switch="x", nrow = 1, scales = "free_x") +
labs(x="Visit") +
mytheme
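Since this functionality isn't built in, a good workaround is gridExtra:
library(gridExtra)
# Labels in the order of data.sum (visit 1: A, B; visit 2: A, B; visit 3: A, B).
lb <- paste(data.sum$trt, data.sum$N, sep="\n")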
p1 <- ggplot(data[data$vis==1,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[1:2]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
p2 <- ggplot(data[data$vis==2,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[3:4]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
p3 <- ggplot(data[data$vis==3,]) + geom_boxplot(aes(x=trt,y=val,group=trt,colour=trt), show.legend=FALSE) +
#facet_wrap(~ vis, drop=FALSE, switch="x", nrow=1) +
scale_x_discrete(labels=lb[5:6]) + #paste(data.sum$trt,data.sum$N,sep="\n")
labs(x="Visit") + mytheme
grid.arrange(p1,p2,p3,nrow=1,ncol=3) # fully customizable
Related:
Varying axis labels formatter per facet in ggplot/R
You can also make them vertical or do other transformations:
I'm trying to recreate this graph produced by Tableau using ggplot2. I've gotten far enough but I can't seem to figure out how to add color (whose intensity is proportional to the amount of profit).
The dataset is here
Here's the plot I want to replicate
https://www.dropbox.com/s/wcu780m72a85lvi/Screen%20Shot%202014-05-11%20at%209.05.49%20PM.png
Here's my code so far:
ggplot(coffee,aes(x=Product,weight=Sales)) +
geom_bar()+facet_grid(Market~Product.Type,scales="free_x",space="free") +
ylab("Sales")+theme(axis.text.x=element_text(angle=90))
Using the aggregate function.
library(ggplot2)
coffee <- read.csv('CoffeeChain.csv')
agg <- aggregate(cbind(Profit, Sales) ~ Product+Market+Product.Type, data=coffee, FUN=sum)
ggplot(agg, aes(x=Product, weight=Sales, fill=Profit)) +
geom_bar() +
scale_fill_gradientn(colours=c("#F37767", "#9FC08D", "#6BA862", "#2B893E", "#036227")) +
facet_grid(Market~Product.Type, scales="free_x", space="free") +
ylab("Sales") +
theme(axis.text.x=element_text(angle=90))
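For readers unfamiliar with the formula interface of aggregate, here is a small sketch on a made-up miniature of the coffee data. The data frame `toy` and all of its values are hypothetical. It shows how cbind() on the left-hand side sums several columns at once within each combination of the grouping variables.

```r
# Hypothetical miniature of the coffee data; values are made up.
toy <- data.frame(
  Product = c("Latte", "Latte", "Mocha", "Mocha"),
  Market  = c("East", "East", "East", "West"),
  Profit  = c(10, 5, -2, 7),
  Sales   = c(100, 50, 40, 70)
)

# One output row per Product x Market combination, with Profit and Sales summed.
agg_toy <- aggregate(cbind(Profit, Sales) ~ Product + Market, data = toy, FUN = sum)
print(agg_toy)
```

After aggregation there is exactly one row per bar, so a continuous fill such as Profit has a single value to map for each bar.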
Probably not the best way to do it:
require(ggplot2)
aggProfit <- ave(coffee$Profit, coffee$Product.Type, coffee$Product, coffee$Market, FUN=sum)
coffee$Breaks<- cut(aggProfit, c(seq(-8000, 25000, 5000), max(aggProfit)), dig.lab = 10)
appcolors <- c("#F37767", "#9FC08D", "#6BA862", "#2B893E", "#036227")
gg <- ggplot(coffee,aes(x=Product,weight=Sales, fill = Breaks))+
geom_bar()+facet_grid(Market~Product.Type,scales="free_x",space="free")+
ylab("Sales")+theme(axis.text.x=element_text(angle=90)) +
scale_fill_manual(values=colorRampPalette(appcolors)( length(levels(coffee$Breaks)) ))
plot(gg)
To get the colors c("#F37767", "#9FC08D", "#6BA862", "#2B893E", "#036227") I used the ColorZilla plugin.