Cannot remove grey area behind legend symbol when using smooth

I'm using ggplot2 with a GAM smooth to look at the relationship between two variables. When plotting, I'd like to remove the grey area behind the legend symbols for the two types. Normally I would use theme(legend.key = element_blank()), but that doesn't seem to work when using a smooth.
Can anyone tell me how to remove the grey area behind the two black lines in the legend?
Here is a minimal working example:
library(ggplot2)
len <- 10000
x <- seq(0, len-1)
df <- as.data.frame(x)
df$y <- 1 - df$x*(1/len)
df$y <- df$y + rnorm(len,sd=0.1)
df$type <- 'method 1'
df$type[df$y>0.5] <- 'method 2'
p <- ggplot(df, aes(x=x, y=y)) + stat_smooth(aes(lty=type), col="black", method = "auto", size=1, se=TRUE)
p <- p + theme_classic()
p <- p + theme(legend.title=element_blank())
p <- p + theme(legend.key = element_blank()) # <--- this doesn't work?
p

Here is a very hacky workaround, based on the notion that if you map things to aesthetics in ggplot, they appear in the legend. geom_smooth has a fill aesthetic, which allows different groups to get different ribbon colours if one so desires. When an unwanted item is hard to fix downstream in the legend, it is sometimes easier to keep it out of the legend altogether. In your case, the fill of the se ribbon appeared in the legend key. So I've created two smooth layers: one without a line colour (but grouped by type) that draws the se ribbons, and one with linetype mapped in aes() but with se = FALSE.
p <- ggplot(df, aes(x=x, y=y)) +
# first smooth: se ribbon only (no line colour), grouped by type
stat_smooth(aes(group=type), col=NA, method = "auto", size=1, se=TRUE) +
# second smooth: line only; linetype is mapped so it appears in the legend
stat_smooth(aes(lty=type), col="black", method = "auto", size=1, se=FALSE) +
theme_classic() +
theme(
legend.title = element_blank(),
legend.key = element_rect(fill = NA, color = NA)) # thanks to @alko989
p
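In ggplot2 3.2.0 and later there is a less hacky option. The grey box is drawn by the smooth layer's own legend key (draw_key_smooth() paints the se fill behind the line), which is why legend.key cannot remove it. Passing key_glyph = "path" to the layer swaps in a line-only key. A minimal sketch on a smaller, seeded copy of the question's data; method = "loess" is used here only to keep it fast:

```r
library(ggplot2)

# Smaller, seeded version of the question's data (1000 points instead of 10000)
set.seed(1)
len <- 1000
df <- data.frame(x = seq(0, len - 1))
df$y <- 1 - df$x / len + rnorm(len, sd = 0.1)
df$type <- ifelse(df$y > 0.5, "method 2", "method 1")

p <- ggplot(df, aes(x = x, y = y)) +
  # key_glyph = "path" draws a plain line in the legend instead of the
  # default smooth key, which paints the se ribbon colour behind the line
  stat_smooth(aes(linetype = type), colour = "black", method = "loess",
              formula = y ~ x, se = TRUE, key_glyph = "path") +
  theme_classic() +
  theme(legend.title = element_blank())
p
```

The ribbons are still drawn in the panel; only the legend key changes. Another option is guides(linetype = guide_legend(override.aes = list(fill = NA))), which keeps the smooth key but makes its fill transparent.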

Related

Average line for 2D Histogram?

I am a very new user of R and have a question. I am currently making 2D histograms in R. The data itself does not really matter, but how do I plot an average line on the 2D histogram? The code I am running is this:
load("mydatabin.RData")
# Color housekeeping
library(RColorBrewer)
rf <- colorRampPalette(rev(brewer.pal(11,'Spectral')))
r <- rf(32)
# Extract the two variables to plot
x <- mydata$AGE
y <- mydata$BP
df <- data.frame(x,y)
# Plot
plot(df, pch=16, col='black', cex=0.5)
This gives me a scatter plot and then to turn it into a 2D Histogram I do:
library(ggplot2)
# Default call (as object)
p <- ggplot(df, aes(x,y))
h3 <- p + stat_bin2d()
h3
# Default call (using qplot)
qplot(x,y,data=df, geom='bin2d')
After this I do:
h3 <- p + stat_bin2d(bins=25) + scale_fill_gradientn(colours=r)
h3
to add color.
So, from here, how do I plot an average line of the data?
Also, can anyone tell me how to plot a heat map that looks like this using mydatabin.RData:
Thanks.
You can use geom_hline or geom_vline in ggplot2, passing yintercept or xintercept to draw a horizontal or vertical line. In your case, the intercept can be the mean of one of your columns, which gives an average line. See the code for an example.
I also played around with two different ways of drawing 2D histograms. Yours (stat_bin2d) seems better and more precise, though I removed the RColorBrewer palette.
library(ggplot2)
# Create normally distributed data for plotting
x <- rnorm(10000)
y <- rnorm(10000)
df <- data.frame(x,y)
# stat_density2d way, with average lines
p1 <- ggplot(df,aes(x=x,y=y))+
stat_density2d(aes(fill=..level..), geom="polygon") +
scale_fill_gradient(low="navy", high="yellow") +
# Here go average lines
geom_hline(yintercept = mean(df$y), color = "red") +
geom_vline(xintercept = mean(df$x), color = "red") +
# Just to remove grid and set background color
theme(line = element_blank(),
panel.background = element_rect(fill = "navy"))
p1
# stat_bin2d way, with average lines
p2 <- ggplot(df, aes(x,y)) +
stat_bin2d(bins=50) +
scale_fill_gradient(low="navy", high="yellow") +
# Here go average lines
geom_hline(yintercept = mean(df$y), color = "red") +
geom_vline(xintercept = mean(df$x), color = "red") +
# Just to remove grid and set background color
theme(line = element_blank(),
panel.background = element_rect(fill = "navy"))
p2
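If "average line" instead means the mean of y as a function of x (for the question's data, the mean BP at each AGE), one option is to compute binned means first and add them as a separate geom_line() layer. A sketch on the same simulated data; the 25 bins are an arbitrary choice, and stat_bin_2d() is the underscore spelling of stat_bin2d() used in newer ggplot2 releases:

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x = rnorm(10000), y = rnorm(10000))

# Mean of y (and of x, for positioning) inside 25 equal-width bins of x
df$xbin <- cut(df$x, breaks = 25)
bin_means <- aggregate(cbind(x, y) ~ xbin, data = df, FUN = mean)

p3 <- ggplot(df, aes(x, y)) +
  stat_bin_2d(bins = 50) +
  scale_fill_gradient(low = "navy", high = "yellow") +
  # Conditional-mean line drawn from the summary table, not from df
  geom_line(data = bin_means, aes(x = x, y = y), colour = "red") +
  theme(line = element_blank(),
        panel.background = element_rect(fill = "navy"))
p3
```

Empty bins are dropped by aggregate(), so the line simply skips them.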

ggplot: Manually add legends for aesthetics that are not mapped

I want to produce a barplot overlaid with dots, where both have separate legends. Also, I want to choose the color of the bars and the size of the dots using arguments outside aes(). As neither is mapped, no legend is produced.
1) How can I add a legend manually for both fill and size?
library(ggplot2)
d <- data.frame(group = 1:3,
prop = 1:3 )
ggplot(d, aes(x=group, y=prop)) +
geom_bar(stat="identity", fill="red") +
geom_point(size=5)
This is what I came up with: I used dummy mappings and modified the legends afterwards to suit my needs. But this approach seems clumsy to me.
2) Is there a manual way to say: Add a legend with this title, these shapes, these colors etc.?
d <- data.frame(dummy1="d1",
dummy2="d2",
group = 1:3,
prop = 1:3 )
ggplot(d, aes(x=group, y=prop, fill=dummy1, size=dummy2)) +
geom_bar(stat="identity", fill="red") +
geom_point(size=5) +
scale_fill_discrete(name="fill legend", label="fill label") +
scale_size_discrete(name="size legend", label="size label")
Above I mapped fill to dummy1. So I would expect scale_fill_discrete to alter this legend. But it appears to modify the size legend instead.
3) I am not sure what went wrong here. Any ideas?
I'm not sure why you say "Also, I want to choose the color of the bars and the size of the dots using the arguments outside aes()". Is it something you want to do, or something you think you have to do given how ggplot works?
If it's the latter, one solution is as follows:
library(ggplot2)
d <- data.frame(group = 1:3,
prop = 1:3 )
ggplot(d, aes(x=group, y=prop)) +
geom_bar(stat="identity",aes( fill="label")) +
geom_point(aes(size='labelsize')) +
scale_fill_manual(breaks = 'label', values = 'red')+
scale_size_manual(breaks = 'labelsize', values = 5)
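As for question 3, the most likely cause is that the fixed fill = "red" in geom_bar() and size = 5 in geom_point() override the inherited mappings in those layers. The fill legend is then built only from the point layer (drawn as a dot) and the size legend only from the bar layer (drawn as a box), so the two legends look swapped. Mapping each aesthetic to a constant string, as above, also covers question 2, because the manual scale can set the legend title, the key label and the actual value in one place. A sketch (the strings "bars" and "dots" are arbitrary; geom_col() is shorthand for geom_bar(stat = "identity")):

```r
library(ggplot2)

d <- data.frame(group = 1:3, prop = 1:3)

p <- ggplot(d, aes(x = group, y = prop)) +
  # Map each aesthetic to a constant string so that a legend is created
  geom_col(aes(fill = "bars")) +
  geom_point(aes(size = "dots")) +
  # The manual scales set the real colour/size plus the legend title and label
  scale_fill_manual(name = "fill legend", breaks = "bars",
                    values = c(bars = "red"), labels = "fill label") +
  scale_size_manual(name = "size legend", breaks = "dots",
                    values = c(dots = 5), labels = "size label")
p
```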

ggplot: legend for a plot that combines bars / lines?

I have an empirical PDF + CDF combo that I'd like to plot on the same panel. distro.df has columns pdf, cdf, and day. I'd like the pdf values to be plotted as bars, and the cdf as a line. This does the trick for making the plot:
p <- ggplot(distro.df, aes(x=day))
p <- p + geom_bar(aes(y=pdf/max(pdf)), stat="identity", width=0.95, fill=fillCol)
p <- p + geom_line(aes(y=cdf))
p <- p + xlab("Day") + ylab("")
p <- p + theme_bw() + theme(panel.background = element_blank(), panel.border=element_blank())
However, I'm having trouble getting a legend to appear. I'd like a line for the cdf and a filled block for the pdf. I've tried various contortions with guides, but can't seem to get anything to appear.
Suggestions?
EDIT:
Per @Henrik's request, here is how to make a suitable distro.df object:
df <- data.frame(day=0:10)
df$pdf <- runif(length(df$day))
df$pdf <- df$pdf / sum(df$pdf)
df$cdf <- cumsum(df$pdf)
distro.df <- df
Then run the code above to build the plot, and print p to see it.
This generally involves moving fill into aes() and using it in both the geom_bar and geom_line layers. In this case, you also need to add show.legend = TRUE to geom_line (this argument was called show_guide before ggplot2 2.0.0).
Once you have that, you just need to set the fill colours in scale_fill_manual so that CDF has no fill colour, and use override.aes to do the equivalent for the lines, so that PDF has no line. I didn't know what your fill colour was, so I just used red.
ggplot(df, aes(x=day)) +
geom_bar(aes(y=pdf/max(pdf), fill = "PDF"), stat="identity", width=0.95) +
geom_line(aes(y=cdf, fill = "CDF"), show.legend = TRUE) +
xlab("Day") + ylab("") +
theme_bw() +
theme(panel.background = element_blank(),
panel.border=element_blank()) +
scale_fill_manual(values = c(NA, "red"),
breaks = c("PDF", "CDF"),
name = NULL,
guide = guide_legend(override.aes = list(linetype = c(0,1))))
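A variant that avoids putting fill on geom_line() (newer ggplot2 releases warn that fill is an unknown aesthetic there) is to map fill for the bars and colour for the line. This gives two stacked, untitled legends: a filled block for the PDF and a line for the CDF. A sketch using the example data from the question's edit, seeded for reproducibility; red again stands in for the unknown fillCol:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(day = 0:10)
df$pdf <- runif(length(df$day))
df$pdf <- df$pdf / sum(df$pdf)
df$cdf <- cumsum(df$pdf)

p <- ggplot(df, aes(x = day)) +
  # Bars get a fill legend entry, the line gets a colour legend entry
  geom_col(aes(y = pdf / max(pdf), fill = "PDF"), width = 0.95) +
  geom_line(aes(y = cdf, colour = "CDF")) +
  scale_fill_manual(name = NULL, values = c(PDF = "red")) +
  scale_colour_manual(name = NULL, values = c(CDF = "black")) +
  labs(x = "Day", y = NULL) +
  theme_bw() +
  theme(panel.background = element_blank(), panel.border = element_blank())
p
```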
I'd still like a solution to the above (and will check out @aosmith's answer), but I am currently going with a slightly different approach that sidesteps the problem:
p <- ggplot(distro.df, aes(x=day, color=pdf, fill=pdf))
p <- p + geom_bar(aes(y=pdf/max(pdf)), stat="identity", width=0.95)
p <- p + geom_line(aes(y=cdf), color="black")
p <- p + xlab("Day") + ylab("CDF")
p <- p + theme_bw() + theme(panel.background = element_blank(), panel.border=element_blank())
p
This also has the advantage of displaying some of the previously missing information, namely the PDF values.

HeatMap not displaying correctly using ggplot()

I am seeing something strange when I try to plot a heat map of a data set I have, which can be found here.
I am using the following code to plot the heat map:
library(ggplot2)
library(reshape2) # for melt(); `red` is the data set mentioned above
xaxis<-c('density')
midrange<-range(red[,xaxis])
xaxis <- c(xaxis,'quality')
molten<-melt(red[,xaxis],'quality')
p <- ggplot(molten, aes(x = value, y = quality))
p <- p + geom_tile(aes(fill = value), colour = "white")
p <- p + theme_minimal()
# turn y-axis text 90 degrees (optional, saves space)
p <- p + theme(axis.text.y = element_text(angle = 90, hjust = 0.5))
# remove axis titles, tick marks, and grid
p <- p + theme(axis.title = element_blank())
p <- p + theme(axis.ticks = element_blank())
p <- p + theme(panel.grid = element_blank())
p <- p + scale_y_discrete(expand = c(0, 0))
# optionally remove row labels (not useful depending on molten)
p <- p + theme(axis.text.x = element_blank())
# get diverging color scale from colorbrewer
# #008837 is green, #7b3294 is purple
palette <- c("#008837", "#b7f7f4", "#b7f7f4", "#7b3294")
if(midrange[1] == midrange[2]) {
# use a 3 color gradient instead
p <- p + scale_fill_gradient2(low = palette[1], mid = palette[2], high = palette[4], midpoint = midrange[1]) +
xlim(midrange[1],midrange[2])
}else{
# use a 4 color gradient (with a swath of white in the middle)
p <- p + scale_fill_gradientn(colours = palette, values = c(0, midrange[1], midrange[2], 1)) +
xlim(midrange[1],midrange[2])
}
p
I am trying to plot a heat map of the variable density, using the variable quality to separate the rows. When I use the above code, I get the following plot:
It is clearly a blank image. This seems to happen because the range of density is very narrow; it doesn't happen if I switch to a variable with a wider range (pH, for example).
Should ggplot automatically adjust to this? If not, how can I get ggplot to show the real plot?
Any help in this regard will be much appreciated.
So there are (at least) two problems here.
First, you have almost 1600 tiles in the x direction, so with colour = "white" for the tile outlines, all you see is the outline, hence white. Try taking this out.
Second, the values = c(...) argument to scale_fill_gradientn(...) expects positions between 0 and 1 on the rescaled fill scale, but you pass midrange[1] and midrange[2], which are in data units (midrange[2] = 1.003).
After taking out color="white" from the call to geom_tile(...), I get this:
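Putting both fixes together, here is a sketch on simulated data. The real data set is not reproduced here, so `molten` below is a made-up stand-in with a similarly narrow range, quality is stored as a factor so it suits a discrete y axis, and the two interior break points are arbitrary:

```r
library(ggplot2)

# Hypothetical stand-in for the wine data: 1600 rows with a narrow-range
# variable (like density) and an integer quality score
set.seed(1)
molten <- data.frame(
  quality = factor(sample(3:8, 1600, replace = TRUE)),
  value   = runif(1600, min = 0.990, max = 1.004)
)

rng <- range(molten$value)
# values= takes positions on the rescaled 0-1 fill scale, so convert the
# data-space break points with scales::rescale() first
mid <- scales::rescale(c(0.995, 0.999), from = rng)

p <- ggplot(molten, aes(x = value, y = quality)) +
  geom_tile(aes(fill = value)) +  # no white outline, so thin tiles stay visible
  scale_fill_gradientn(colours = c("#008837", "#b7f7f4", "#b7f7f4", "#7b3294"),
                       values = c(0, mid, 1)) +
  theme_minimal()
p
```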

overlaying plots in ggplot2

How can I overlay one plot on top of another in ggplot2? I want to draw the grey time series on top of the red one (currently the red one is drawn above the grey one, and I want it the other way around). Here is my code (I generate some data to show the problem; the real data set is much more complex):
install.packages("ggplot2")
library(ggplot2)
time <- rep(1:100,2)
timeseries <- c(rep(0.5,100),rep(c(0,1),50))
upper <- c(rep(0.7,100),rep(0,100))
lower <- c(rep(0.3,100),rep(0,100))
legend <- c(rep("red should be under",100),rep("grey should be above",100))
dataset <- data.frame(timeseries,upper,lower,time,legend)
ggplot(dataset, aes(x=time, y=timeseries)) +
geom_line(aes(colour=legend, size=legend)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_colour_manual(limits=c("grey should be above","red should be under"),values = c("grey50","red")) +
scale_fill_manual(values = c(NA, "red")) +
scale_size_manual(values=c(0.5, 1.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Convert the variable you are grouping on into a factor and explicitly set the order of the levels. Within a layer, ggplot draws the groups in the order of these levels, so the last level ends up on top. Also, for readability it is a good idea to place each scale_*_manual() call next to the geom it applies to.
legend <- factor(legend, levels = c("red should be under","grey should be above"))
dataset <- data.frame(timeseries,upper,lower,time,legend)
ggplot(dataset, aes(x=time, y=timeseries)) +
geom_ribbon(aes(ymax=upper, ymin=lower, fill=legend), alpha = 0.2) +
scale_fill_manual(values = c("red", NA)) +
geom_line(aes(colour=legend, size=legend)) +
scale_colour_manual(values = c("red","grey50")) +
scale_size_manual(values=c(1.5,0.5)) +
theme(legend.position="top", legend.direction="horizontal",legend.title = element_blank())
Note that the values in each scale_*_manual() call are now listed in level order: first "red should be under", then "grey should be above".
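A related tweak: if the values in each scale_*_manual() call are named by level, the colours no longer depend on the level order, so the factor can be reordered freely to control which series is drawn on top. A sketch with the question's data (the size mapping is left out to keep it short):

```r
library(ggplot2)

time <- rep(1:100, 2)
timeseries <- c(rep(0.5, 100), rep(c(0, 1), 50))
upper <- c(rep(0.7, 100), rep(0, 100))
lower <- c(rep(0.3, 100), rep(0, 100))
# Level order controls drawing order within a layer: the last level is on top
legend <- factor(c(rep("red should be under", 100), rep("grey should be above", 100)),
                 levels = c("red should be under", "grey should be above"))
dataset <- data.frame(timeseries, upper, lower, time, legend)

p <- ggplot(dataset, aes(x = time, y = timeseries)) +
  geom_ribbon(aes(ymax = upper, ymin = lower, fill = legend), alpha = 0.2) +
  # Named values tie each colour to a level, independent of level order
  scale_fill_manual(values = c("red should be under" = "red",
                               "grey should be above" = NA)) +
  geom_line(aes(colour = legend)) +
  scale_colour_manual(values = c("red should be under" = "red",
                                 "grey should be above" = "grey50")) +
  theme(legend.position = "top", legend.direction = "horizontal",
        legend.title = element_blank())
p
```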
