R: Combine pie charts with ggplot2

EDITED
I have the following example where I create 3 pie charts, but I would like to combine all 3 into a single pie chart with donut rings.
It would also be really useful to show the numbers on the chart. How can this be accomplished? Thanks a lot.
df.mut <- data.frame(Avrg.muts.HLA.A11.A24=c(20.20000,37.39286,11.85714,50.26087,20.20000,37.39286,11.85714,50.26087,20.20000,37.39286,11.85714,50.26087), Avrg.muts.HLA.A11=c(32.86842,32.86842,35.72973,35.72973,32.86842,32.86842,35.72973,35.72973,32.86842,32.86842,35.72973,35.72973), Avrg.muts.HLA.A24=c(15.33333,43.19608,15.33333,43.19608,15.33333,43.19608,15.33333,43.19608,15.33333,43.19608,15.33333,43.19608), variable=c("HLA.A11.A24","HLA.A11.A24","HLA.A11.A24","HLA.A11.A24","HLA.A11","HLA.A11","HLA.A11","HLA.A11","HLA.A24","HLA.A24","HLA.A24","HLA.A24"), value=c("+/+","+/-","-/+","-/-","+","+","-","-","+","-","+","-"))
df.mut$variable <- factor(df.mut$variable, levels=unique(df.mut$variable))
png(file="IMAGES/test1.png")
print(
ggplot(df.mut, aes(x="")) +
facet_grid(variable~., scales="free_y") +
geom_bar(data=subset(df.mut, variable=='HLA.A11.A24'),
aes(x='0', y=Avrg.muts.HLA.A11.A24, fill=value), width = 1, stat = "identity") +
geom_bar(data=subset(df.mut, variable=='HLA.A11'),
aes(x='1', y=Avrg.muts.HLA.A11, fill=value), width = 1, stat = "identity") +
geom_bar(data=subset(df.mut, variable=='HLA.A24'),
aes(x='2', y=Avrg.muts.HLA.A24, fill=value), width = 1, stat = "identity") +
ggtitle("TEST1") +
theme(axis.text.x=element_blank(), legend.title=element_blank(), legend.position="right", legend.background=element_blank(), legend.box.just="left", plot.title=element_text(size=15, face="bold", colour="black", vjust=1.5)) +
scale_y_continuous(name="") +
scale_x_discrete(name="") +
coord_polar(theta="y")
)
dev.off()
This produces the following image:
However, when I try to put the 3 of them together, the best I get is this mess:
How can I combine the pie charts above and include the numbers?

This should get you started:
df.test <- data.frame(genotype.1=c("+","+","-","-"), genotype.2=c("+","-","+","-"), count=c(345,547,678,987))
require(ggplot2)
require(grid)
ggplot(df.test, aes(y = count)) +
geom_bar(aes(x='0', fill = paste(genotype.1, genotype.2, sep="/")), color='black', width = 1, stat = "identity") +
geom_bar(aes(x='1', fill = genotype.1), width = 1, color='black', stat = "identity") +
geom_bar(aes(x='2', fill = genotype.2), width = 1, color='black', stat = "identity") +
coord_polar(theta="y") +
scale_x_discrete(name='', breaks=c('0', '1', '2'), labels=rep('', 3)) +
theme(axis.ticks.length = unit(0, "npc")) +
scale_fill_discrete(name='genotype', breaks = c('-', '+', '-/-', '-/+', '+/-', '+/+')) +
scale_y_continuous(breaks=0)
EDIT: Part of the reason you get something different with faceting than without is that you use scales="free_y". To get the same thing without the facets, you can scale the variables yourself.
p <- ggplot(df.mut, aes(x="")) +
geom_bar(data=subset(df.mut, variable=='HLA.A11.A24'),
aes(x='0', y=Avrg.muts.HLA.A11.A24/sum(Avrg.muts.HLA.A11.A24), fill=value), color='black', width = 1, stat = "identity") +
geom_bar(data=subset(df.mut, variable=='HLA.A11'),
aes(x='1', y=Avrg.muts.HLA.A11/sum(Avrg.muts.HLA.A11), fill=value), color='black', width = 1, stat = "identity") +
geom_bar(data=subset(df.mut, variable=='HLA.A24'),
aes(x='2', y=Avrg.muts.HLA.A24/sum(Avrg.muts.HLA.A24), fill=value), color='black', width = 1, stat = "identity") +
ggtitle("TEST1") +
theme(axis.text.x=element_blank(), legend.title=element_blank(), legend.position="right", legend.background=element_blank(), legend.box.just="left", plot.title=element_text(size=15, face="bold", colour="black", vjust=1.5)) +
scale_y_continuous(name="") +
scale_x_discrete(name="") +
coord_polar(theta="y")
# now look at the faceted and unfaceted plots...
p
p + facet_grid(variable~., scales="free_y")
However, your faceted plots also don't line up as nicely as your previous test data did. That appears to be because the data do not actually line up: there are really only 2 unique values each for HLA.A11 and HLA.A24, so it's impossible to get 4 different sizes.
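The question also asked for the numbers on the chart, which the code above does not add. A minimal sketch, assuming the df.test data from this answer and ggplot2 >= 2.2.0 (for geom_col): one geom_text() layer per ring, stacked with position_stack(vjust = 0.5) so that each label sits in the middle of its slice. The combined column and the names mid and p.labelled are introduced here only for illustration.

```r
library(ggplot2)

df.test <- data.frame(genotype.1 = c("+", "+", "-", "-"),
                      genotype.2 = c("+", "-", "+", "-"),
                      count = c(345, 547, 678, 987))
# helper column (not in the original data) holding the combined genotype
df.test$combined <- paste(df.test$genotype.1, df.test$genotype.2, sep = "/")

# the group of each text layer matches the fill of its ring,
# so the labels are stacked in the same order as the slices
mid <- position_stack(vjust = 0.5)
p.labelled <- ggplot(df.test, aes(y = count)) +
  geom_col(aes(x = "0", fill = combined), colour = "black", width = 1) +
  geom_col(aes(x = "1", fill = genotype.1), colour = "black", width = 1) +
  geom_col(aes(x = "2", fill = genotype.2), colour = "black", width = 1) +
  geom_text(aes(x = "0", label = count, group = combined), position = mid) +
  geom_text(aes(x = "1", label = count, group = genotype.1), position = mid) +
  geom_text(aes(x = "2", label = count, group = genotype.2), position = mid) +
  coord_polar(theta = "y")
p.labelled
```

To label proportions instead of raw counts, the label could be computed beforehand, for example with round(100 * count / sum(count)).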

Related

How to solve the issue of the group title being cut off?

This is my first time asking a question on Stack Overflow.
Recently I did some data visualization of amino acids by making multiple pie charts.
First, this is my dataset:
library(ggplot2)
df = data.frame(Species = c('Chicken','Chicken','Chicken','Chicken','Chicken','Human','Human','Human','Human','Human','Crab-eating macaque','Crab-eating macaque','Crab-eating macaque','Crab-eating macaque','Crab-eating macaque','Mouse','Mouse','Mouse','Mouse','Mouse','Zebrafish','Zebrafish','Zebrafish','Zebrafish','Zebrafish'),
Amino_acids = c('E','R','G','P','Others','E','R','G','P','Others','E','R','G','P','Others','E','R','G','P','Others','E','R','G','P','Others'),
value = c(18,6,10,9,57,26,14,8,5,46,29,15,10,4,42,23,17,7,5,48,31,4,13,7,46))
df$Species <- factor(df$Species)
df$Amino_acids <- factor(df$Amino_acids)
Then, using ggplot2 to make the data visualization
ggplot(data=df, aes(x=" ", y=value, group=Amino_acids, colour=Amino_acids, fill=Amino_acids)) +
geom_bar(width = 1, stat = "identity", position= "fill") +
coord_polar("y", start=0)+
facet_grid(.~ Species) + facet_wrap(.~Species, strip.position="top")+theme_void()
and the result is this
Results: https://i.stack.imgur.com/8EheY.png
The problem is that I don't know why some of the group (species) titles are being cut off, for example Crab-eating macaque, which is very confusing to me. I have already tried hjust on axis.title.x and axis.title.y, and it has no effect at all.
Can anyone solve this issue for me? I would really appreciate it.
You're looking to edit strip.text.x. There are several ways to go about this. I'd prefer editing the box size for the text:
ggplot(data=df, aes(x=" ", y=value, group=Amino_acids, colour=Amino_acids, fill=Amino_acids)) +
geom_bar(width = 1, stat = "identity", position= "fill") +
coord_polar("y", start=0)+
facet_grid(.~ Species) +
facet_wrap(.~Species) +
theme_void() +
theme(strip.text.x = element_text(margin = margin(1, 0, 1, 0, "cm")))
However, you could also adjust the vertical position:
ggplot(data=df, aes(x=" ", y=value, group=Amino_acids, colour=Amino_acids, fill=Amino_acids)) +
geom_bar(width = 1, stat = "identity", position= "fill") +
coord_polar("y", start=0)+
facet_grid(.~ Species) +
facet_wrap(.~Species) +
theme_void() +
theme(strip.text.x = element_text(vjust = 1))
Both should result in this:
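Another option, not part of the answer above, is to wrap long strip labels onto several lines. This is a sketch under the assumption that the labels are cut off because they are wider than their panels. label_wrap_gen() is ggplot2's built-in wrapping labeller, and the width of 10 characters is an arbitrary choice. Only facet_wrap() is kept, because adding a second facet specification replaces the first, so the facet_grid() call has no effect. The name p.wrapped is introduced here for illustration.

```r
library(ggplot2)

# same values as in the question, built with rep() for brevity
df <- data.frame(
  Species = rep(c("Chicken", "Human", "Crab-eating macaque", "Mouse", "Zebrafish"),
                each = 5),
  Amino_acids = rep(c("E", "R", "G", "P", "Others"), times = 5),
  value = c(18, 6, 10, 9, 57, 26, 14, 8, 5, 46, 29, 15, 10, 4, 42,
            23, 17, 7, 5, 48, 31, 4, 13, 7, 46)
)

p.wrapped <- ggplot(df, aes(x = " ", y = value, fill = Amino_acids)) +
  geom_bar(width = 1, stat = "identity", position = "fill") +
  coord_polar("y", start = 0) +
  # wrap strip labels that are longer than about 10 characters
  facet_wrap(~ Species, labeller = label_wrap_gen(width = 10)) +
  theme_void()
p.wrapped
```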

Adding an extra item to the legend

I have the following data:
trait,beta,se,p,analysis,signif
trait1,0.078,0.01,9.00E-13,group1,1
trait2,0.076,0.01,1.70E-11,group1,1
trait3,-0.032,0.01,0.004,group1,0
trait4,0.026,0.01,0.024,group1,0
trait5,0.023,0.01,0.037,group1,0
trait1,0.042,0.01,4.50E-04,group2,1
trait2,0.04,0.01,0.002,group2,1
trait3,0.03,0.01,0.025,group2,0
trait4,0.025,0.01,0.078,group2,0
trait5,0.015,0.01,0.294,group2,0
trait1,0.02,0.01,0.078,group3,0
trait2,0.03,0.01,0.078,group3,0
trait3,0.043,0.01,1.90E-04,group3,0
trait4,0.043,0.01,2.40E-04,group3,1
trait5,0.029,0.01,0.013,group3,0
And make a plot with the following code:
library(ggplot2)
ggplot(GEE, aes(y=beta, x=reorder(trait, beta), group=analysis)) +
geom_point(data = GEE[GEE$signif == 1, ],
color="red",
shape = "*",
size=12,
show.legend = F) +
geom_point(aes(color=analysis)) +
geom_errorbar(aes(ymin=beta-2*se, ymax=beta+2*se,color=analysis), width=.2,
position=position_dodge(.2)) +
geom_hline(yintercept = 0) +
theme_light() +
theme(axis.title.y=element_blank(),
legend.title=element_blank()) +
coord_flip()
Which gives me the following plot:
I would like to add an extra element to the legend, namely the red asterisk, and I want it to say "significant". How do I go about doing that?
PS. If you like this piece of code, I have another problem with it, specified here :)
Add a dummy aesthetic to the geom_point() call that draws the asterisks, for example a fill mapped to a constant string: aes(fill = "Significant").
# Using OPs data
library(ggplot2)
ggplot(GEE, aes(y=beta, x=reorder(trait, beta), group=analysis)) +
geom_point(data = GEE[GEE$signif == 1, ],
color="red",
shape = "*",
size=12,
aes(fill = "Significant")) +
geom_point(aes(color=analysis)) +
geom_errorbar(aes(ymin=beta-2*se, ymax=beta+2*se,color=analysis), width=.2,
position=position_dodge(.2)) +
geom_hline(yintercept = 0) +
theme_light() +
theme(axis.title.y=element_blank(),
legend.title=element_blank()) +
coord_flip() +
guides(colour = guide_legend(order = 1),
fill = guide_legend(override.aes = list(size = 5))) +
theme(legend.margin = margin(-0.5,0,0,0, unit="cm"))
PS: I also removed show.legend = F from the asterisk geom_point.
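Neither code block defines GEE. A minimal sketch, assuming GEE is simply the table from the question read into a data frame (here from inline CSV text rather than a file):

```r
# assumption: GEE is the question's table, read from inline CSV text
GEE <- read.csv(text = "trait,beta,se,p,analysis,signif
trait1,0.078,0.01,9.00E-13,group1,1
trait2,0.076,0.01,1.70E-11,group1,1
trait3,-0.032,0.01,0.004,group1,0
trait4,0.026,0.01,0.024,group1,0
trait5,0.023,0.01,0.037,group1,0
trait1,0.042,0.01,4.50E-04,group2,1
trait2,0.04,0.01,0.002,group2,1
trait3,0.03,0.01,0.025,group2,0
trait4,0.025,0.01,0.078,group2,0
trait5,0.015,0.01,0.294,group2,0
trait1,0.02,0.01,0.078,group3,0
trait2,0.03,0.01,0.078,group3,0
trait3,0.043,0.01,1.90E-04,group3,0
trait4,0.043,0.01,2.40E-04,group3,1
trait5,0.029,0.01,0.013,group3,0")

# rows flagged as significant are the ones that receive the red asterisk
GEE[GEE$signif == 1, c("trait", "analysis", "p")]
```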

Position dodge does not work with geom_point and geom_errorbar

I have this overplotting issue going on. Even after reading a lot of posts on dodge, jitter and jitter dodge in all kinds of implementations I can't figure it out.
Here you can get my data: http://pastebin.com/embed_js.php?i=uPXN7nPt
library(dplyr)
library(gdata)
library(ggplot2)
library(directlabels)
all<-read.xls('all_auto_bio_adjusted_c.xls')
all$size.new<-sqrt(all$size.new)
all$station<-as.factor(all$station)
all$group.new<-factor(all$group, levels=c('C. hyperboreus','C. glacialis','Special Calanus','M. longa','Pseudocalanus sp.','Copepoda'))
pd <- position_dodge(width = 50)
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group.new, group=group.new)) +
geom_abline(intercept = 0, slope = 1) +
geom_point(aes(size = size.new), show.legend=TRUE, position=pd) +
scale_size_identity()+
geom_errorbar(aes(ymin = averagebiol - stdevbiol, ymax = averagebiol + stdevbiol),colour = "grey", width = 0.1, position=pd) +
facet_grid(group.new~station, scales="free") +
xlab("Automatic identification") + ylab("Manual identification") +
ggtitle("Comparison of automatic vs manual identification") +
theme_bw() +
theme(plot.title = element_text(lineheight=.8, face="bold", size=20,vjust=1), axis.text.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=.5,face="bold"), axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="bold"), axis.title.x = element_text(colour="grey20",size=20,angle=0,hjust=.5,vjust=0,face="bold"), axis.title.y = element_text(colour="grey20",size=20,angle=90,hjust=.5,vjust=1,face="bold"), legend.position="none", strip.text.x = element_text(size = 12, face="bold", colour = "black", angle = 0), strip.text.y = element_text(size = 12, face="bold", colour = "black"))
allp
Which produces this nice plot
But as you can see a lot of the points and error bars are cramped together. Shouldn't my implementation of position dodge work?
If I understood right, position dodge takes the scale of the axes, so with a dodge of 50 I should see some results. I also tried putting the dodge argument directly into the geom, but that had no effect either.
Any ideas?
If you leave out position = pd in both geom_errorbar() and geom_point() you get the same plot. The reason the data look 'cramped' is because of the spread of the x-values. As far as I know, dodging will only happen if two points 'overlap', which I interpret as having the same x-value, e.g. data on a categorical x-axis like in the case of a bar plot. Your x-axis is continuous so the points will not be dodged.
To deal with the overplotting you could try logarithmic scales:
library(ggplot2)
tmp <- tempfile()
download.file("http://pastebin.com/raw.php?i=uPXN7nPt", tmp)
all <- read.csv(tmp)
all$size.new <- sqrt(all$size.new)
all$station <- as.factor(all$station)
all$group.new <- factor(all$group, levels = c("C. hyperboreus", "C. glacialis",
"Special Calanus", "M. longa",
"Pseudocalanus sp.", "Copepoda"))
# explicitly remove missing data
all <- all[complete.cases(all), ]
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group.new,
group = group.new, ymin = averagebiol - stdevbiol,
ymax = averagebiol + stdevbiol)) +
theme_bw() +
geom_abline(intercept = 0, slope = 1) +
geom_errorbar(colour = "grey", width = 0.1) +
geom_point(aes(size = size.new)) +
scale_size_area() + # Just so I could see all the points on my monitor :)
xlab("Automatic identification") +
ylab("Manual identification") +
ggtitle("Comparison of automatic vs manual identification")
allp + scale_x_log10() +
scale_y_log10() +
facet_grid(group.new ~ station, scales = "fixed")
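To illustrate the point about dodging, here is a small sketch with made-up data (the toy data frame, its columns and the name p.dodged are hypothetical, not the asker's data). Two groups share the same positions on a discrete x-axis, so position_dodge() has something to separate:

```r
library(ggplot2)

# hypothetical data: two groups measured at the same two stations
toy <- data.frame(
  station = factor(rep(c("S1", "S2"), each = 2)),
  grp     = rep(c("a", "b"), times = 2),
  avg     = c(10, 12, 20, 18),
  sdev    = c(1, 2, 1.5, 1)
)

# the same dodge object is used for both layers so points and bars stay aligned
pd <- position_dodge(width = 0.4)
p.dodged <- ggplot(toy, aes(x = station, y = avg, colour = grp, group = grp)) +
  geom_point(position = pd) +
  geom_errorbar(aes(ymin = avg - sdev, ymax = avg + sdev),
                width = 0.1, position = pd)
p.dodged
```

Sharing one position object between geom_point() and geom_errorbar() keeps each point centred on its own error bar.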

ggplot does not show legend in geom_histogram

I have this code
ggplot() +
geom_histogram(aes(x=V1, y=(..count..)/sum(..count..)), fill="red", alpha=.4, colour="red", data=coding, stat = "bin", binwidth = 30) +
geom_histogram(aes(x=V1,y=(..count..)/sum(..count..)), fill="blue", alpha=.4, colour="blue", data=lncrna, stat = "bin", binwidth = 30) +
coord_cartesian(xlim = c(0, 2000)) +
xlab("Size (nt)") +
ylab("Percentage (%)") +
geom_vline(data=cdf, aes(xintercept=rating.mean, colour=Labels), linetype="dashed", size=1)
that produces a beautiful histogram without legend:
Every post I have found about the same problem says to put color inside aes. Nevertheless, this does not give me any legend.
I tried:
ggplot() + geom_histogram(aes(x=V1, y=(..count..)/sum(..count..),color="red", fill="red"), fill="red", alpha=.4, colour="red", data=coding, stat = "bin", binwidth = 30) +
geom_histogram(aes(x=V1,y=(..count..)/sum(..count..), color="blue", fill="blue"), fill="blue", alpha=.4, colour="blue", data=lncrna, stat = "bin", binwidth = 30) +
coord_cartesian(xlim = c(0, 2000)) +
xlab("Size (nt)") +
ylab("Percentage (%)") +
geom_vline(data=cdf, aes(xintercept=rating.mean, colour=Labels), linetype="dashed", size=1)
without success.
How can I put a legend in my graph?
If you don't want to put the data in one data.frame, you can do this:
set.seed(42)
coding <- data.frame(V1=rnorm(1000))
lncrna <- data.frame(V1=rlnorm(1000))
library(ggplot2)
ggplot() +
geom_histogram(aes(x=V1, y=(..count..)/sum(..count..), fill="r", colour="r"), alpha=.4, data=coding, stat = "bin") +
geom_histogram(aes(x=V1,y=(..count..)/sum(..count..), fill="b", colour="b"), alpha=.4, data=lncrna, stat = "bin") +
scale_colour_manual(name="group", values=c("r" = "red", "b"="blue"), labels=c("b"="blue values", "r"="red values")) +
scale_fill_manual(name="group", values=c("r" = "red", "b"="blue"), labels=c("b"="blue values", "r"="red values"))
The problem is that you can't map your color inside aes because you've got two separate sets of data. One idea is to bind them, then apply the melt function from the reshape2 package so that you create a dummy categorical variable you can pass into aes. The code:
require(reshape2)
df=cbind(blue=mtcars$mpg, red=mtcars$mpg*0.8)
df=melt(df, id.vars=1:2)
ggplot()+geom_histogram(aes(y=(..count..)/sum(..count..),x=value, fill=Var2, color=Var2), alpha=.4, data=df, stat = "bin")
There you've got your legend
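A base-R variant of the single-data-frame idea, as a sketch: it assumes ggplot2 >= 3.3.0 for after_stat(), reuses the simulated coding and lncrna data from the first answer, and the type column and the names both and p.hist are introduced here. Mapping y to density * width gives each group's bars a total of 1, whereas count / sum(count) in a single layer would, as far as I know, divide by the total over both groups.

```r
library(ggplot2)

set.seed(42)
coding <- data.frame(V1 = rnorm(1000))
lncrna <- data.frame(V1 = rlnorm(1000))

# stack both data sets and record where each row came from
both <- rbind(cbind(coding, type = "coding"),
              cbind(lncrna, type = "lncrna"))

p.hist <- ggplot(both, aes(x = V1, fill = type, colour = type)) +
  # position = "identity" overlays the histograms instead of stacking them
  geom_histogram(aes(y = after_stat(density * width)),
                 binwidth = 0.5, alpha = 0.4, position = "identity") +
  ylab("Proportion within group")
p.hist
```

Because fill and colour are mapped to a column, the legend appears without any manual scale.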

to show mean value in ggplot box plot

I need to be able to show the mean value in a ggplot box plot. The code below works for a point, but I need white dashed lines instead. Can anybody help?
x
Team Value
A 10
B 5
C 29
D 35
ggplot(aes(x = Team , y = Value), data = x) +
geom_boxplot (aes(fill=Team), alpha=.25, width=0.5, position = position_dodge(width = .9)) +
stat_summary(fun.y=mean, colour="red", geom="point")
Here's my way of adding mean to boxplots:
ggplot(RQA, aes(x = Type, y = engagementPercent)) +
geom_boxplot(aes(fill = Type),alpha = .6,size = 1) +
scale_fill_brewer(palette = "Set2") +
stat_summary(fun.y = "mean", geom = "text", label="----", size= 10, color= "white") +
ggtitle("Participation distribution by type") +
theme(axis.title.y=element_blank()) + theme(axis.title.x=element_blank())
ggplot(df, aes(x = Type, y = scorepercent)) +
geom_boxplot(aes(fill = Type),alpha = .6,size = 1) +
scale_fill_brewer(palette = "Set2") +
stat_summary(fun.y = "mean", geom = "point", shape= 23, size= 3, fill= "white") +
ggtitle("score distribution by type") +
theme(axis.title.y=element_blank()) + theme(axis.title.x=element_blank())
I would caution against using text for this and would use geom_line instead, because text is offset slightly and so misrepresents the position of the mean.
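One way to draw the mean as a line rather than as text, sketched here under the assumption of ggplot2 >= 3.3.0 (for the fun argument and after_stat()): a zero-height error bar, of which only the horizontal stroke is visible. This swaps geom_line for geom_errorbar, and the small Team/Value data frame and the name p.mean are made up for illustration.

```r
library(ggplot2)

# hypothetical data; the means are a = 3 and b = 6
x <- data.frame(Team = rep(c("a", "b"), each = 3),
                Value = c(1, 2, 6, 4, 5, 9))

p.mean <- ggplot(x, aes(x = Team, y = Value)) +
  geom_boxplot(aes(fill = Team), alpha = 0.25, width = 0.5) +
  # ymin and ymax both equal the mean, so the bar collapses to a horizontal line
  stat_summary(fun = mean, geom = "errorbar",
               aes(ymin = after_stat(y), ymax = after_stat(y)),
               width = 0.5, linetype = "dashed", colour = "white")
p.mean
```

The line sits exactly at the computed mean, and its width matches the box width set in geom_boxplot().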
Hey user1471980, I think people are more inclined to help if you have a unique user name but then again you have a lot of points :)
this is a hack but does this help:
Value<-c(1,2,3,4,5,6)
Team<-c("a","a","a","b","b","b")
x<-data.frame(Team,Value) #note means for a=2, mean for b=5
ggplot(aes(x = Team , y = Value), data = x) + geom_boxplot (aes(fill=Team), alpha=.25, width=0.5, position = position_dodge(width = .9)) +
annotate(geom="text", x=1, y=2, label="----", colour="white", size=7, fontface="bold", angle=0) +
annotate(geom="text", x=2, y=5, label="----", colour="white", size=7, fontface="bold", angle=0)
