unstack stacked ggplot legend - r

I'm working with a chemistry dataset that has 11 different chemicals, labeled here as c1, c2, ..., c11.
I have made pie charts using library(ggplot2) and would like to do two things with my plot:
Display all variables in the legend in a horizontal fashion (done), and not have them stacked (not done), as you see in my example. Having just one line would be great. 2 lines could also be acceptable.
Change colors to be color-blind friendly
Here is a pretend dataset we can work with so you can see what I have at this point. I have tried searching "legend margins" to increase the area the legend is plotted on, but to no avail.
data <- read.delim("https://pastebin.com/raw/MS5GLAxa", header = T)
ggplot(data, aes(x="", y=ratio, fill=chemical)) +
geom_bar(stat="identity", width=1,position = position_fill()) + facet_wrap(~treatment, nrow=1)+
coord_polar("y", start=0)+
theme_void(base_size = 20)+
theme(legend.position=c(0.5, 1.2),legend.direction = "horizontal")+
theme(plot.margin=unit(c(0,0,0,0), 'cm'))
Some side bonuses here would be to be able to:
increase the size of the pie chart (I believe I achieved this with making my margins as small as possible on the sides)
have the pie chart have solid colors, and no white lines in graph

Use guides() to set the number of legend rows to 1, and use scale_fill_brewer() with a color-blind-friendly palette.
ggplot(data, aes(x="", y=ratio, fill=chemical)) +
geom_bar(stat="identity", width=1,position = position_fill()) +
facet_wrap(~treatment, nrow=1)+
coord_polar("y", start=0) +
scale_fill_brewer(palette="Paired") +
theme_void(base_size = 20) +
theme(legend.position=c(0.5, 1.5),legend.direction = "horizontal",
plot.margin=unit(c(0,0,0,0), 'cm')) +
guides(fill = guide_legend(nrow = 1)) # if required nrow = 2
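If you would rather control the colour mapping yourself, a named colour vector can be passed to scale_fill_manual(values = ...) in place of scale_fill_brewer(). The following is only a minimal sketch: it assumes the levels of chemical are literally c1 to c11 (the pastebin data may use other labels), and it uses base R's viridis palette (hcl.colors(), available from R 3.6), which is generally regarded as readable under common forms of colour blindness.

```r
# Assumed level names; replace with unique(data$chemical) for the real data
chemicals <- paste0("c", 1:11)

# One viridis colour per chemical, named so that the mapping is explicit
chem_colors <- setNames(hcl.colors(length(chemicals), palette = "viridis"),
                        chemicals)
chem_colors
```

The vector would then be used as scale_fill_manual(values = chem_colors) in the plot above.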

Related

Spacing between groups of bars in histogram

When I produce histograms in ggplot2 where the bar positions are dodge, I expect something like this where there is space between the groups of bars (i.e. notice the white space between each groups of red/green pairs):
I'm having a hard time producing the same effect when I build a histogram with continuous data. I can't seem to add space between the groups of bars, and instead, everything gets squashed together. As you can see, it makes it visually difficult to compare the red/green pairs:
To reproduce my problem, I created a sample data set here: https://www.dropbox.com/s/i9nxzo1cmbwwfsa/data.csv?dl=0
Code to reproduce:
data <- read.csv("https://www.dropbox.com/s/i9nxzo1cmbwwfsa/data.csv?dl=1")
ggplot(data, aes(x = soldPrice, fill = month)) +
geom_histogram(binwidth=1e5, position=position_dodge()) +
labs(x="Sold Price", y="Sales", fill="") +
scale_x_continuous(labels=scales::comma, breaks=seq(0, 2e6, by = 1e5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
How can I add white space between the groups of red/green pairs?
Alternative 1: overlapping bars with geom_histogram()
From ?position_dodge():
Dodging preserves the vertical position of an geom while adjusting the horizontal position
This function accepts a width argument that determines the space to be created.
To get what I think you want, you need to supply a suitable width to position_dodge(). In your case, where binwidth=1e5, you might try a dodge width 20% smaller than that value (i.e. 80% of the bin width): position=position_dodge(1e5-20*(1e3)).
(I left the rest of your code untouched.)
You could use the following code:
ggplot(data, aes(x = soldPrice, fill = month)) +
geom_histogram(binwidth=1e5, position=position_dodge(1e5-20*(1e3))) + ### <-----
labs(x="Sold Price", y="Sales", fill="") +
scale_x_continuous(labels=scales::comma, breaks=seq(0, 2e6, by = 1e5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
yielding this plot:
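The expression 1e5-20*(1e3) is simply the bin width minus a 20% gap. A small sketch of the same arithmetic follows; gap_fraction is a name introduced here for illustration and is not part of ggplot2.

```r
binwidth <- 1e5       # same bin width as passed to geom_histogram()
gap_fraction <- 0.2   # hypothetical: share of each bin left empty between groups

# Width handed to position_dodge(); the rest of the bin stays white
dodge_width <- binwidth * (1 - gap_fraction)
dodge_width
```

Writing position = position_dodge(dodge_width) makes it easier to experiment with larger or smaller gaps.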
Alternative 2: use ggplot-object and render with geom_bar
geom_histogram() was not designed to produce what you want. geom_bar() on the other hand provides the flexibility you need.
You can generate the histogram with geom_histogram() and save it in a ggplot object. Then extract the plotting information with ggplot_build(). Finally, you can use that histogram plotting information to generate a bar plot with geom_bar():
## save ggplot object to h
h <- ggplot(data, aes(x = soldPrice, fill = month)) +
geom_histogram(binwidth=1e5, position=position_dodge(1e5-20*(1e3)))
## get plotting information as data.frame
h_plotdata <- ggplot_build(h)$data[[1]]
h_plotdata$group <- as.factor(h_plotdata$group)
levels(h_plotdata$group) <- c("May 2018", "May 2019")
## plot with geom_bar
ggplot(h_plotdata, aes(x=x, y=y, fill = group)) +
geom_bar(stat = "identity") +
labs(x="Sold Price", y="Sales", fill="") +
scale_x_continuous(labels=scales::comma, breaks=seq(0, 2e6, by = 1e5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
yielding this graph:
Please, let me know whether this is what you want.

format internal lines of a stacked geom_bar ggplot

I want to remove the internal borders from my ggplot, leaving a coloured border around the outside of each bar only. Here is a test data frame, with a stacked bar plot. Ideally, I will end up with the groups in the stack still being a shade of grey, with a colourful outline per box.
test <- data.frame(iso=rep(letters[1:5],3),
num= sample(1:99, 15, replace=T),
fish=rep(c("pelagic", "reef", "benthic"), each=5),
colour=rep(rainbow(n=5),3))
ggplot(data=test, aes(x=iso, y=num, fill=fish, colour=colour)) +
geom_bar(stat="identity") +
theme_bw() +
scale_colour_identity() + scale_fill_grey(start = 0, end = .9)
You can accomplish this by moving the fill and colour aes() settings into two separate geom_bar() elements: one which takes the sum for each iso value (the outline), and another which splits things up by fish:
ggplot(data=test, aes(x=iso, y=num)) +
geom_bar(stat="summary", fun="sum", aes(color=colour)) + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
geom_bar(stat="identity", aes(fill=fish)) +
theme_bw() +
scale_colour_identity() +
scale_fill_grey(start = 0, end = .9)
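The outline layer draws one bar per iso value whose height is the sum of num. As a quick check, those heights can be computed directly in base R. This sketch rebuilds the test data frame with a fixed seed, which is an assumption added here so the numbers are reproducible.

```r
set.seed(1)  # assumption: fixed seed, not part of the original example
test <- data.frame(iso  = rep(letters[1:5], 3),
                   num  = sample(1:99, 15, replace = TRUE),
                   fish = rep(c("pelagic", "reef", "benthic"), each = 5))

# Height of each outlined bar: total num per iso, across all fish groups
outline_heights <- aggregate(num ~ iso, data = test, FUN = sum)
outline_heights
```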

Removing ggplot legend symbol while retaining label

Example code and figure:
data <- data.frame( ID = c(LETTERS[1:26], paste0("A",LETTERS[1:26])),
Group = rep(c("Control","Treatment"),26),
x = rnorm(52,50,20),
y = rnorm(52,50,10))
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
theme(legend.text = element_text(color=c("blue","red")))
What I'm trying to solve is removing the legend symbols (the "a") and coloring the Group labels (Control and Treatment) as they appear in the plot (Blue and Red respectively).
I've tried:
geom_text(show.legend = FALSE)
But that just removes the legend entirely.
To keep it simple I could just use annotate...but wondering if there's a legend specific solution.
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8, show.legend=FALSE) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
annotate("text",label="Control", color="blue",x=20,y=80,size=8) +
annotate("text",label="Treatment", color="Red",x=23,y=77,size=8)
Another option is to use point markers (instead of the letter "a") as the legend symbols, which you can do with the following workaround:
Remove the geom_text legend.
Add a "dummy" point geom and set the point marker size to NA, so no points are actually plotted, but a legend will be generated.
Override the size of the point markers in the legend, so that point markers will appear in the legend key to distinguish each group.
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8, show.legend=FALSE) +
geom_point(size=NA) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
labs(colour="") +
guides(colour=guide_legend(override.aes=list(size=4)))
Beginning with ggplot2 3.2.0, you can specify the glyph used in the legend using the argument key_glyph:
ggplot(data, aes(x=x, y=y, label=ID, color=Group)) +
geom_text(size=8, key_glyph="point") +
scale_color_manual(values=c("blue", "red")) +
labs(color=NULL) +
theme_classic()
For a full list of glyphs, refer to the ggplot2 documentation for draw_key. Credit to R Data Berlin for alerting me to this simple solution. Emil Hvitfeldt also has a nice blog post showcasing the options.
As a quick fix, you can tweak the legend key by hard-coding the text you want. This works the other way around from what you asked: it keeps the key and removes the label.
library(grid)
GeomText$draw_key <- function (data, params, size) {
txt <- ifelse(data$colour=="blue", "Control", "Treatment")
# change x=0 and left justify
textGrob(txt, 0, 0.5,
just="left",
gp = gpar(col = alpha(data$colour, data$alpha),
fontfamily = data$family,
fontface = data$fontface,
# also added 0.5 to reduce size
fontsize = data$size * .pt* 0.5))
}
Then, when you plot, suppress the legend labels and make the legend key a bit wider to fit the text.
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
theme(legend.text = element_blank(),
legend.key.width = unit(1.5, "cm"))

How to illustrate non available data points in a different shape using ggplot2?

Is there a way to change the shape of the points for missing data in R? I am plotting .csv files like this one in a lollipop style.
Name,chr,Pos,Reads...ME_016,Reads...ME_017,Reads...ME_018,Reads...ME_019
cg01389728,chr10,6620395,33.82,41.38,41.38,38.46
cg01389728,chr10,6620410,0,-,-,-
cg01389728,chr10,6620430,0,0,-,-
cg01389728,chr10,6620447,0,-,0,-
cg01389728,chr10,6620478,0,-,-,-
cg01389728,chr10,6620510,28.33,29.85,25.64,28.13
cg01389728,chr10,6620520,0,0,-,0
cg01389728,chr10,6620531,0,-,50,-
Using ggplot2, my graphs are created with this:
dataset <-read.table("testset", sep=",",na.strings="-", header=TRUE)
dataset <- subset(dataset, select=c(-Name, -chr))
dataset <- melt(dataset, id.vars="Pos")
dataset$variable <- gsub("\\.\\.\\.","_",dataset$variable)
xaxes <- unique(dataset$Pos)
dataset$Pos <- as.factor(dataset$Pos)
ggplot(dataset, aes(x=Pos, y=variable,fill=cut(value, breaks=10))) + geom_point(size=4, shape=21) + geom_line() + scale_fill_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(fill="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
However, I want to set the shape of the missing points ("-") in the plot to an "x" (shape=4) and also show them in the legend.
I've tried approaches like:
scale_fill_manual(values=c(value, NA))
or:
scale_shape_manual(values=c(21,4))
By default, the "-" are also shown with shape 21 and grey colour. There must be a way to manipulate this? Writing a method like this might be the trick, but how to call it for the whole column?
formas <- function(x){
if(is.na(x)) forma <- 4
if(!is.na(x)) forma <- 21
return(forma)
}
This comes pretty close, I think.
ggplot(dataset, aes(x=Pos, y=variable,
color=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_shape_manual(name="",values=c(Missing=4,Present=19))+
scale_color_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
Changes are:
used color instead of fill, with shape=19 for points with data
added shape aesthetic to ggplot(...) call.
removed shape=21 from geom_point(...) call.
added scale_shape_manual(...) to define the shapes for Missing and Present, and turn off the guide label.
I know you wanted filled points with a black outline (it does look better), but when I tried that with the added shape aesthetic, the fill legend does not display the colors correctly. Try it yourself.
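The shape mapping above works for the whole column because ifelse() and is.na() are vectorized, so no per-row function call is needed. A minimal sketch with a made-up value vector (the numbers are for illustration only):

```r
# Hypothetical coverage values; NA marks a missing measurement
value <- c(33.82, 0, NA, 28.33, NA)

# Vectorized label per point, as used in aes(shape = ...)
status <- ifelse(is.na(value), "Missing", "Present")

# Same lookup that scale_shape_manual(values = ...) performs by name
shape_codes <- c(Missing = 4, Present = 19)
unname(shape_codes[status])
```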
Here is another approach that comes closer to producing the graph you specified (circular points with black outline and fill color determined by coverage).
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable,
fill=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_fill_manual(name="Coverage in %",
values=fill.colors,
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),
drop=FALSE) +
scale_shape_manual(name="",values=c(Missing=4,Present=21),limits=c("Missing"))+
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5))+
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
The problem in the other answer with using point shape 21 and the fill aesthetic is that, while the fill colors are displayed correctly in the plot, they are not displayed correctly in the legend. One way around that is to force ggplot to set the legend fill colors using
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
Unfortunately, to do that you have to specify the fill colors manually (so that the actual fill and the override fill are the same). This code does that using
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
which creates a color palette that mimics the ggplot default. You could of course use your own color palette here.
While this does come closer to your original intent, I actually think the other answer provides a better data visualization. The black outlines around the points, while attractive, make it much more difficult to distinguish between fill colors, especially with 10 possible colors (which is at the edge of discernibility anyway).
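The hcl() call above spaces hues evenly around the colour wheel and drops the last one, because hue 375 is the same as the starting hue 15. A small sketch that wraps this in a helper for any number of colours; default_like_palette is a hypothetical name introduced here, not a ggplot2 function.

```r
# Hypothetical helper: n evenly spaced hues starting at 15 degrees,
# with the same chroma (100) and luminance (65) as in the answer above
default_like_palette <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)[seq_len(n)]
  hcl(h = hues, c = 100, l = 65)
}

fill.colors <- default_like_palette(10)
fill.colors
```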
I can't see why this is not working:
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable
,color=cut(value, breaks=c(-0.01,10,20,30,40,50,60,70,80,90,100))
,shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
scale_shape_manual(name="",values=c("Missing"=4,"Present"=19),limits=c("Missing"))+
scale_color_manual(name="Coverage in %",
values=ifelse(is.na(dataset$value),"grey",fill.colors),
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),drop=FALSE) +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5)) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
NA values are not shown anymore with an X, and instead of displaying them in "grey", the class 90-100% will be shown in grey. No error message is shown - what is the problem?
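One likely cause of the grey 90-100% class, judging from the code alone: ifelse(is.na(dataset$value), "grey", fill.colors) returns one colour per data row rather than one per class, and a manual scale given an unnamed vector assigns colours to the classes in order. The sketch below uses the first two sample columns of the posted CSV in the order melt() would stack them. It does not address the missing "x" markers.

```r
fill.colors <- hcl(h = seq(15, 375, length.out = 11), l = 65, c = 100)[1:10]

# Reads_ME_016 followed by Reads_ME_017, as stacked by melt()
value <- c(33.82, 0, 0, 0, 0, 28.33, 0, 0,
           41.38, NA, 0, NA, NA, 29.85, 0, NA)

# What actually gets passed as values = ...
values_arg <- ifelse(is.na(value), "grey", fill.colors)
length(values_arg)   # one entry per row, not one per class
values_arg[1:10]     # the tenth entry is "grey" because value[10] is NA
```

If that is the cause, passing values = fill.colors together with na.value = "grey" to scale_color_manual() would be the usual way to colour missing values.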

Pie charts in ggplot2 with variable pie sizes

I've tried various ways to get a facet_grid of pie charts in ggplot2 to vary width/radii according to another variable (strength).
geom_bar accepts width=0.5 as a parameter, but it is ignored once coord_polar is added. Adding width=0.5 to the ggplot aes, or adding an aes to geom_bar, doesn't work. I can't see any other relevant options for coord_polar. What's the easiest way to do this? The code below makes a nice grid of pie charts but doesn't change the sizes of the pie charts. What am I missing?
mydata <- data.frame(side1=rep(LETTERS[1:3],3,each=9),
side2=rep(LETTERS[1:3],9,each=3),
widget=rep(c("X","Y","Z"),9*3),
val=runif(9*3),
strength=rep(c(1,2,3),3,each=3))
ggplot(mydata, aes(x="",y = val, fill = widget, width = strength)) +
geom_bar(position="fill") +
facet_grid(side1 ~ side2) +
coord_polar("y") +
theme(axis.text.x = element_blank()) # opts()/theme_blank() were removed from ggplot2
Do you mean like this?
ggplot(mydata, aes(x=strength/2, y = val, fill = widget, width = strength)) +
geom_bar(position="fill", stat="identity") +
facet_grid(side1 ~ side2) +
coord_polar("y") +
theme(axis.text.x = element_blank()) # opts()/theme_blank() were removed from ggplot2
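The reason this works, as far as I can tell: with x = strength/2 and width = strength, each bar spans from 0 to strength on the x axis, and coord_polar("y") turns x into the radius. A short sketch of that arithmetic (variable names are illustrative only):

```r
strength <- c(1, 2, 3)

x_center   <- strength / 2            # value mapped to x
inner_edge <- x_center - strength / 2 # left edge of the bar
outer_edge <- x_center + strength / 2 # right edge, i.e. the pie radius

data.frame(strength, inner_edge, outer_edge)
```

Because facet_grid() shares the x scale across panels by default, the radii should be comparable between facets.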

Resources