ggplot: legend for a plot the combines bars / lines? - r

I have a empirical PDF + CDF combo I'd like to plot on the same panel. distro.df has columns pdf, cdf, and day. I'd like the pdf values to be plotted as bars, and the cdf as lines. This does the trick for making the plot:
p <- ggplot(distro.df, aes(x=day))
p <- p + geom_bar(aes(y=pdf/max(pdf)), stat="identity", width=0.95, fill=fillCol)
p <- p + geom_line(aes(y=cdf))
p <- p + xlab("Day") + ylab("")
p <- p + theme_bw() + theme_update(panel.background = element_blank(), panel.border=element_blank())
However, I'm having trouble getting a legend to appear. I'd like a line for the cdf and a filled block for the pdf. I've tried various contortions with guides, but can't seem to get anything to appear.
Suggestions?
EDIT:
Per #Henrik's request: to make a suitable distro.df object:
df <- data.frame(day=0:10)
df$pdf <- runif(length(df$day))
df$pdf <- df$pdf / sum(df$pdf)
df$cdf <- cumsum(df$pdf)
Then the above to make the plot, then invoke p to see the plot.

This generally involves moving fill into aes and using it in both the geom_bar and geom_line layers. In this case, you also need to add show_guide = TRUE to geom_line.
Once you have that, you just need to set the fill colors in scale_fill_manual so CDF doesn't have a fill color and use override.aes to do the same thing for the lines. I didn't know what your fill color was, so I just used red.
ggplot(df, aes(x=day)) +
geom_bar(aes(y=pdf/max(pdf), fill = "PDF"), stat="identity", width=0.95) +
geom_line(aes(y=cdf, fill = "CDF"), show_guide = TRUE) +
xlab("Day") + ylab("") +
theme_bw() +
theme_update(panel.background = element_blank(),
panel.border=element_blank()) +
scale_fill_manual(values = c(NA, "red"),
breaks = c("PDF", "CDF"),
name = element_blank(),
guide = guide_legend(override.aes = list(linetype = c(0,1))))

I'd still like a solution to the above (and will checkout #aosmith's answer), but I am currently going with a slightly different approach to eliminate the need to solve the problem:
p <- ggplot(distro.df, aes(x=days, color=pdf, fill=pdf))
p <- p + geom_bar(aes(y=pdf/max(pdf)), stat="identity", width=0.95)
p <- p + geom_line(aes(y=cdf), color="black")
p <- p + xlab("Day") + ylab("CDF")
p <- p + theme_bw() + theme_update(panel.background = element_blank(), panel.border=element_blank())
p
This also has the advantage of displaying some of the previously missing information, namely the PDF values.

Related

How to create custom labels at the ends of the x-axis in a faceted ggplot

I've created a plot (below) and basically want the left and right ends of each facet to state "Ov" and "Cx." I've tried using scale_x_continuous but the issue is that the x-axis for each facet is different.
What I have right now (image):
What I'd like to get ideally:
all_prm %>% ggplot(aes(y_coord, prominence)) +
geom_point() +
facet_wrap(~interaction(ms, sample), scales="free_x") +
scale_x_continuous(breaks=c(10000), labels=c("Ov")) +
theme(
axis.title.y=element_text(margin=margin(r=7)),
axis.title.x=element_text(margin=margin(t=7)),
panel.background = element_rect(fill='white', color='grey10')) +
xlab("Oviduct-Cervical Axis") +
ylab("Prominence")
You can use a custom function to set the break points, in this case using the range of the x limit values with an adjustment argument to move the labels away from the axis limits relative to their scale.
Using the iris dataset:
break_range <- function(x, adjust = .025) {
rng <- range(x)
rng + diff(rng) * c(adjust, -adjust)
}
ggplot(iris, aes(Petal.Length, Sepal.Length )) +
geom_point() +
facet_wrap(. ~ Species, nrow = 3, scales="free_x") +
scale_x_continuous(breaks = break_range, labels = c("Ov", "Cx")) +
theme(
axis.title.y=element_text(margin=margin(r=7)),
axis.title.x=element_text(margin=margin(t=7)),
panel.background = element_rect(fill='white', color='grey10'))

How to trim extra space from ggplot

I am trying to make an extremely single heatmap of percentages using ggplot2 which ideally will just be two single thin columns. I tried the following code, believing that the width option in aes would solve the problem.
p_prev_tg <- ggplot(tg_melt, aes(x = variable , y = OTU, fill = value,
width=.3)) + geom_tile() +
scale_fill_gradientn(colours = hm.palette2(10)) +
xlab(NULL) + ylab(NULL) +
theme(axis.text=element_text(size=7))
p_prev_tg
Unfortunately, this returns a plot with lots of empty space as shown. The plot I would like is those two bars side by side, how can I do this in ggplot?
thanks
What about this solution ?
set.seed(1234)
tg_melt <- data.frame(variable=rep(c("Prevalence_T","Prevalence_NT"), each=10),
OTU=rep(paste0("OTU_",1:10),2),
value=rnorm(20))
library(RColorBrewer)
library(ggplot2)
hm.palette2 <- colorRampPalette(rev(brewer.pal(11, 'Spectral')))
p_prev_tg <- ggplot(tg_melt, aes(x = as.numeric(variable), y = OTU, fill = value)) +
geom_tile() +
scale_fill_gradientn(colours = hm.palette2(10)) +
xlab(NULL) + ylab(NULL) +
theme(axis.text=element_text(size=7)) +
scale_x_continuous(breaks=c(1,2),
limits=c(0,3),
labels=levels(tg_melt$variable))+
theme_bw()
p_prev_tg

Removing ggplot legend symbol while retaining label

Example code and figure:
data <- data.frame( ID = c(LETTERS[1:26], paste0("A",LETTERS[1:26])),
Group = rep(c("Control","Treatment"),26),
x = rnorm(52,50,20),
y = rnorm(52,50,10))
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
theme(legend.text = element_text(color=c("blue","red")))
What I'm trying to solve is removing the legend symbols (the "a") and coloring the Group labels (Control and Treatment) as they appear in the plot (Blue and Red respectively).
I've tried:
geom_text(show_guide = F)
But that just removes the legend entirely.
To keep it simple I could just use annotate...but wondering if there's a legend specific solution.
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8, show_guide=F) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
annotate("text",label="Control", color="blue",x=20,y=80,size=8) +
annotate("text",label="Treatment", color="Red",x=23,y=77,size=8)
Another option is to use point markers (instead of the letter "a") as the legend symbols, which you can do with the following workaround:
Remove the geom_text legend.
Add a "dummy" point geom and set the point marker size to NA, so no points are actually plotted, but a legend will be generated.
Override the size of the point markers in the legend, so that point markers will appear in the legend key to distinguish each group.
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8, show.legend=FALSE) +
geom_point(size=NA) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
labs(colour="") +
guides(colour=guide_legend(override.aes=list(size=4)))
Beginning with ggplot2 2.3.2, you can specify the glyph used in the legend using the argument key_glyph:
ggplot(data, aes(x=x, y=y, label=ID, color=Group)) +
geom_text(size=8, key_glyph="point") +
scale_color_manual(values=c("blue", "red")) +
labs(color=NULL) +
theme_classic()
For a full list of glyphs, refer to the ggplot2 documentation for draw_key. Credit to R Data Berlin for alerting me to this simple solution. Emil Hvitfeldt also has a nice blog post showcasing the options.
As a quick fix you can tweak the legend key, by hard coding the info you want, although around the other way - keep the key and remove the label.
library(grid)
GeomText$draw_key <- function (data, params, size) {
txt <- ifelse(data$colour=="blue", "Control", "Treatment")
# change x=0 and left justify
textGrob(txt, 0, 0.5,
just="left",
gp = gpar(col = alpha(data$colour, data$alpha),
fontfamily = data$family,
fontface = data$fontface,
# also added 0.5 to reduce size
fontsize = data$size * .pt* 0.5))
}
And when you plot you suppress the legend labels, and make legend key a bit wider to fit text.
ggplot(data, aes(y=y,x=x, label=ID, color=Group)) +
geom_text(size=8) +
scale_color_manual(values=c("blue","red")) +
theme_classic() +
theme(legend.text = element_blank(),
legend.key.width = unit(1.5, "cm"))

Cannot remove grey area behind legend symbol when using smooth

I'm using ggplot2 with a GAM smooth to look at the relationship between two variables. When plotting I'd like to remove the grey area behind the symbol for the two types of variables. For that I would use theme(legend.key = element_blank()), but that doesn't seem to work when using a smooth.
Can anyone tell me how to remove the grey area behind the two black lines in the legend?
I have a MWE below.
library(ggplot2)
len <- 10000
x <- seq(0, len-1)
df <- as.data.frame(x)
df$y <- 1 - df$x*(1/len)
df$y <- df$y + rnorm(len,sd=0.1)
df$type <- 'method 1'
df$type[df$y>0.5] <- 'method 2'
p <- ggplot(df, aes(x=x, y=y)) + stat_smooth(aes(lty=type), col="black", method = "auto", size=1, se=TRUE)
p <- p + theme_classic()
p <- p + theme(legend.title=element_blank())
p <- p + theme(legend.key = element_blank()) # <--- this doesn't work?
p
Here is a very hacky workaround, based on the notion that if you map things to aestethics in ggplot, they appear in the legend. geom_smooth has a fill aesthetic which allows for different colourings of different groups if one so desires. If it's hard to fix that downstream, sometimes it's easier to keep those unwanted items out of the legend altogether. In your case, the color of the se appeared in the legend. As such, I've created two geom_smooths. One without a line color (but grouped by type) to create the plotted se's, and one with linetype mapped to aes but se set to false.
p <- ggplot(df, aes(x=x, y=y)) +
#first smooth; se only
stat_smooth(aes(group=type),col=NA, method = "auto", size=1, se=TRUE)+
#second smooth: line only
stat_smooth(aes(lty=type),col="black", method = "auto", size=1, se=F) +
theme_classic() +
theme(
legend.title = element_blank(),
legend.key = element_rect(fill = NA, color = NA)) #thank you #alko989

How to illustrate non available data points in a different shape using ggplot2?

Is there a way to change the shape of the points for missing data in R? I am plotting .csv files like this one in a lollipop style.
Name,chr,Pos,Reads...ME_016,Reads...ME_017,Reads...ME_018,Reads...ME_019
cg01389728,chr10,6620395,33.82,41.38,41.38,38.46
cg01389728,chr10,6620410,0,-,-,-
cg01389728,chr10,6620430,0,0,-,-
cg01389728,chr10,6620447,0,-,0,-
cg01389728,chr10,6620478,0,-,-,-
cg01389728,chr10,6620510,28.33,29.85,25.64,28.13
cg01389728,chr10,6620520,0,0,-,0
cg01389728,chr10,6620531,0,-,50,-
Using ggplot2, my graphs are created with this:
dataset <-read.table("testset", sep=",",na.strings="-", header=TRUE)
dataset <- subset(dataset, select=c(-Name, -chr))
dataset <- melt(dataset, id.vars="Pos")
dataset$variable <- gsub("\\.\\.\\.","_",dataset$variable)
xaxes <- unique(dataset$Pos)
dataset$Pos <- as.factor(dataset$Pos)
ggplot(dataset, aes(x=Pos, y=variable,fill=cut(value, breaks=10))) + geom_point(size=4, shape=21) + geom_line() + scale_fill_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(fill="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
However, I want to set the shape of the missing points ("-") in the plot to an "x", (shape=4) and show them also in the legend.
I've tried approaches like:
scale_fill_manual(values=c(value, NA))
or:
scale_shape_manual(values=c(21,4))
By default, the "-" are also shown with shape 21 and grey colour. There must be a way to manipulate this? Writing a method like this might be the trick, but how to call it for the whole column?
formas <- function(x){
+ if(is.na(x)) forma <- 4
+ if(!is.na(x)) forma <- 21
+ return(forma)
+ }
This comes pretty close, I think.
ggplot(dataset, aes(x=Pos, y=variable,
color=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_shape_manual(name="",values=c(Missing=4,Present=19))+
scale_color_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
Change are:
used color instead of fill, with shape=19 for points with data
added shape aesthetic to ggplot(...) call.
removed shape=21 from geom_point(...) call.
added scale_shape_manual(...) to define the shapes for Missing and Present, and turn off the guide label.
I know you wanted filled points with a black outline (it does look better), but when I tried that with the added shape aesthetic, the fill legend does not display the colors correctly. Try it yourself.
Here is another approach that comes closer to producing the graph you specified (circular points with black outline and fill color determined by coverage).
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable,
fill=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_fill_manual(name="Coverage in %",
values=fill.colors,
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),
drop=FALSE) +
scale_shape_manual(name="",values=c(Missing=4,Present=21),limits=c("Missing"))+
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5))+
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
The problem in the other answer with using point shape 21 and the fill aesthetic is that, while the fill colors are displayed correctly in the plot, they are not displayed correctly in the legend. One way around that is to force ggplot to set the legend fill colors using
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
Unfortunately, to do that you have to specify the fill colors manually (so that the actual fill and the override fill are the same). This code does that using
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
which creates a color palette that mimics the ggplot default. You could of course use your own color palette here.
While this does come closer to your original intent, I actually think the other answer provides a better data visualization. The black outlines around the points, while "attractive", make it much more difficult to distinguish between fill colors, especially with 10 possible colors (which is at the edge of discernability anyway).
I can't see, why this is not working:
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable
,color=cut(value, breaks=c(-0.01,10,20,30,40,50,60,70,80,90,100))
,shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
scale_shape_manual(name="",values=c("Missing"=4,"Present"=19),limits=c("Missing"))+
scale_color_manual(name="Coverage in %",
values=ifelse(is.na(dataset$value),"grey",fill.colors),
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),drop=FALSE) +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5)) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
NA values are not shown anymore with an X, and instead of displaying them in "grey", the class 90-100% will be shown in grey. No error message is shown - what is the problem?

Resources