Create ggplot2 legend for multiple datasets - r

I am trying to display background data in grey in a ggplot and have it appear in the legend automatically. My aim is either to include the grey data points in the main legend or to create a second legend with a manual title. However, I have not managed to do either. My data is in long format.
require(ggplot2)
xx<-data.frame(observation="all cats",x=1:2,y=1:2)
yy<-data.frame(observation=c("red cats","blue cats"),x=3:4,y=3:4)
g<-ggplot() +
geom_point(aes(x,y, colour=factor(observation)), colour="grey60", size=5, data=xx) +
geom_point(aes(x,y, colour=factor(observation)), size=5, data=yy) +
scale_color_discrete(name = "ltitle")
g
I tried to merge the data.frames with rbind.data.frame, which produces a nice legend, but then I am not able to colour the background data in grey and keep ggplot colours at the same time.
I also realized that this solves the problem:
g<-ggplot(aes(x,y, colour=factor(observation)), colour="grey60", data=xx) +
geom_point(size=5) +
geom_point(aes(x,y, colour=factor(observation)), size=5, data=yy) +
scale_color_discrete(name = "ltitle")
g
However, I can't do this, because I'm using a function that first creates a complicated empty plot, to which I then add the geom_point layers.

Assuming your plot doesn't have other geoms that require a fill parameter, the following is a workaround that fixes the colour of your background data geom_point layer without affecting the other geom_point layers:
g <- ggplot() +
geom_point(aes(x, y,
fill = "label"), # key change 1
shape = 21, # key change 2
color = "grey50", size = 5,
data = xx) +
geom_point(aes(x, y, colour = factor(observation)), size = 5, data = yy) +
scale_color_discrete(name = "ltitle") +
scale_fill_manual(name = "", values = c("label" = "grey50")) # key change 3
g
shape = 21 gives you a shape that looks like the default round dot, but accepts a fill parameter in addition to the colour parameter. You can then set xx's geom_point layer's fill to grey in scale_fill_manual() (this creates a fill legend), while leaving color = "grey50" outside aes() (this does not add to the colour legend).
The colour scale for yy's geom_point layer is not affected by any of this.
p.s. Just realized I used "grey50" instead of "grey60"... But everything else still applies. :)

One solution is to create a color vector and pass it to scale_color_manual.
xx <- data.frame(observation = "all cats",x = 1:2,y = 1:2)
yy <- data.frame(observation = c("red cats", "blue cats"),x = 3:4,y = 3:4)
# rbind both datasets
# OP tried to use rbind.data.frame here
plotData <- rbind(xx, yy)
# Create color vector
library(RColorBrewer)
# Extract 3 colors from brewer Set1 palette
colorData <- brewer.pal(length(unique(plotData$observation)), "Set1")
# Replace the first color with the wanted grey
colorData[1] <- "grey60"
# Plot data
library(ggplot2)
ggplot(plotData, aes(x, y, colour = observation)) +
geom_point(size = 5)+
scale_color_manual(values = colorData, name = "ltitle")

I came up with pretty much the same solution as Z.Lin, but using the combined data frame from rbind.data.frame. Similarly, it uses scale_colour_manual with a named vector, colors, specifying the color mapping:
require(ggplot2)
xx<-data.frame(observation="all cats",x=1:2,y=1:2)
yy<-data.frame(observation=c("red cats","blue cats"),x=3:4,y=3:4)
zz <- rbind.data.frame(xx,yy)
colors <- c(
"all cats" = "grey60",
"red cats" = "red",
"blue cats" = "blue"
)
g<-ggplot() +
geom_point(aes(x,y, colour=factor(observation)), size=5, data=zz) +
scale_color_manual(values= colors, name = "ltitle")
g
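The named-vector idea above does not strictly require merging the data frames. The following is a minimal sketch, assuming the layers are added separately (as in the question) and that the default ggplot2 hues are wanted for the foreground groups. The helper names fg_levels and colors are illustrative, and scales::hue_pal() is used here only to reproduce the default palette.

```r
library(ggplot2)

xx <- data.frame(observation = "all cats", x = 1:2, y = 1:2)
yy <- data.frame(observation = c("red cats", "blue cats"), x = 3:4, y = 3:4)

# Foreground groups get the default ggplot2 hues; the background group gets grey
fg_levels <- sort(unique(yy$observation))
colors <- c("all cats" = "grey60",
            setNames(scales::hue_pal()(length(fg_levels)), fg_levels))

# Both layers map colour inside aes(), so both end up in the same legend
g <- ggplot() +
  geom_point(aes(x, y, colour = observation), size = 5, data = xx) +
  geom_point(aes(x, y, colour = observation), size = 5, data = yy) +
  scale_colour_manual(values = colors, name = "ltitle")
g
```

Because the values are matched by name, the order of the layers and of the vector entries should not matter.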

Related

ggplot2 unable to color legend icons

I'm trying to use ggplot2 to make some sort of timeline using values from a dataframe (df). I've managed to plot the data exactly how I want it (the different colored line segments connecting the x-marks in this exact order, i.e., from left to right: 'early', 'unknown', 'late', 'sub'). The startpoint and endpoint columns in the dataframe are used to define the positions of the points and line segments.
The problem is that the legend doesn't show the color of the 'x' icons, they are just grey. I've tried adding scale_color_manual() and scale_fill_manual() commands but they don't seem to change anything. The legend does display the correct color when I change the shape to shape = 21, however, I really want the shape to be 4 (x icons). I don't care about the shape of the legend though but scale_shape_manual() again didn't change anything about the legend.
I have also tried placing different color arguments inside and outside the aes() argument of ggplot(), geom_segment() and/or geom_point().
How can I make the icons from the legend show the correct color?
Below I added a piece of code to reproduce the problem.
library(ggplot2)
library(RColorBrewer)
## Define dataframe
df <- data.frame(Var = c("sub","late","unknown","early"),
Time = c(10,267,0,1256),
Endpoint = c(1533,1523,1256,1256),
Startpoint = c(1523,1256,1256,0))
colorscheme <- RColorBrewer::brewer.pal(9, "Set1")[c(1,4,2,3)]
## Make plot
ggplot(df, aes(x="", y=Endpoint, fill=Var), color =colorscheme) +
geom_segment( aes(x="", xend="", y=Startpoint, yend=Endpoint), color = colorscheme) +
geom_point(aes(x="", y=Endpoint),size=5, shape=4 , color = colorscheme) +
coord_flip()
Thanks in advance for any suggestions!
You should use color instead of fill. To remove the line from the legend, use guides(color = guide_legend(override.aes = list(linetype = 0))) or set show.legend = FALSE in geom_segment.
Also, arguments passed to ggplot() do not need to be repeated in the later layers.
ggplot(df, aes(x="", y=Endpoint, color=Var), colorscheme) +
geom_segment(aes(xend="", y=Startpoint, yend=Endpoint)) +
geom_point(size=5, shape=4) +
coord_flip() +
guides(color = guide_legend(override.aes = list(linetype = 0)))
# or
ggplot(df, aes(x="", y=Endpoint, color=Var), colorscheme) +
geom_segment(aes(xend="", y=Startpoint, yend=Endpoint), show.legend = FALSE) +
geom_point(size=5, shape=4) +
coord_flip()
Try this:
ggplot(df, aes(x = "", y = Endpoint, color = Var), colorscheme) +
geom_segment(aes(x = "", xend = "", y = Startpoint, yend = Endpoint), show.legend = FALSE) +
geom_point(aes(x = "", y = Endpoint), size = 5, shape = 4) +
coord_flip()
This way the legend will show only the X marks.
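The answers above pass colorscheme to ggplot() as an extra positional argument, which, as far as I can tell, does not set the colours. A minimal sketch of one way to apply the asker's palette, assuming the colours should follow the row order of df as in the original plot, is to name the vector by Var and hand it to scale_color_manual():

```r
library(ggplot2)

df <- data.frame(Var = c("sub", "late", "unknown", "early"),
                 Endpoint = c(1533, 1523, 1256, 1256),
                 Startpoint = c(1523, 1256, 1256, 0))

# Name each colour after the Var value it belonged to in the question's row order
colorscheme <- setNames(RColorBrewer::brewer.pal(9, "Set1")[c(1, 4, 2, 3)], df$Var)

p <- ggplot(df, aes(x = "", y = Endpoint, color = Var)) +
  # Segments inherit x and color; hide them from the legend so only the X marks show
  geom_segment(aes(xend = "", y = Startpoint, yend = Endpoint), show.legend = FALSE) +
  geom_point(size = 5, shape = 4) +
  scale_color_manual(values = colorscheme) +
  coord_flip()
p
```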

R: ggplot2 density plot shows wrong fill colors

I would like to plot densities of two variables ("red_variable", "green_variable") from two independent dataframes on one density plot, using red and green color for the two variables.
This is my attempt at coding:
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = red_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = green_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
Result: The legend shows correct colors, but the colors on the plot are wrong: The "red" variable is plotted with green color, the "green" variable with red color. The "green" density (mean=8) should appear left and the "red" density (mean=12) on the right on the x-axis. This behavior of the plot doesn't make any sense to me.
I can in fact get the desired result by switching red and green in the code:
### load ggplot2
library(ggplot2)
### Create dataframes
red_dataframe <- data.frame(red_variable = c(10,11,12,13,14))
green_dataframe <- data.frame(green_variable = c(6,7,8,9,10))
mean(red_dataframe$red_variable) # mean is 12
mean(green_dataframe$green_variable) # mean is 8
### Set colors
red_color= "#FF0000"
green_color= "#008000"
### Trying to plot densities with correct colors and correct legend entries
ggplot() +
geom_density(aes(x=red_variable, fill = green_color, alpha=0.5), data=red_dataframe) +
geom_density(aes(x=green_variable, fill = red_color, alpha=0.5), data=green_dataframe) +
scale_fill_manual(labels = c("Density of red_variable", "Density of green_variable"), values = c(red_color, green_color)) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha=FALSE)
... While the plot makes sense now, the code doesn't. I cannot really trust code doing the opposite of what I would expect it to do. What's the problem here? Am I color blind?
In your code, to get each color in the right place, you need to specify fill = red_color or fill = green_color (as well as alpha, since it is a constant, as pointed out by @Gregor) outside of aes(), such as:
...+
geom_density(aes(x=red_variable), alpha=0.5, fill = red_color, data=red_dataframe) +
geom_density(aes(x=green_variable), alpha=0.5, fill = green_color, data=green_dataframe) + ...
Alternatively, you can bind your dataframes together, reshape them into a longer format (much more appropriate for ggplot), and then add a color column that you can use with the scale_fill_identity function (https://ggplot2.tidyverse.org/reference/scale_identity.html):
df <- cbind(red_dataframe,green_dataframe)
library(tidyr)
library(ggplot2)
library(dplyr)
df <- df %>% pivot_longer(.,cols = c(red_variable,green_variable), names_to = "var",values_to = "val") %>%
mutate(Color = ifelse(grepl("red",var),red_color,green_color))
ggplot(df, aes(val, fill = Color))+
geom_density(alpha = 0.5)+
scale_fill_identity(guide = "legend", name = "Legend", labels = levels(as.factor(df$var)))+
xlab("X value") +
ylab("Density")
Does it answer your question?
You're trying to use ggplot as if it's base graphics... the mindset shift can take a little while to get used to. dc37's answer shows how you should do it. I'll try to explain what goes wrong in your attempt:
When you put fill = green_color inside aes(), because it's inside aes() ggplot essentially creates a new column of data filled with the green_color values in your green_dataframe, i.e., "#008000", "#008000", "#008000", .... Ditto for the red color values in the red data frame. We can see this if we modify your plot by simply deleting your scale:
ggplot() +
geom_density(aes(x = red_variable, fill = green_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = red_color, alpha = 0.5), data =
green_dataframe) +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
We can actually get what you want by adding the identity scale, which is designed for the (common in base, rare in ggplot2) case where you actually put color values in the data. With the identity scale the values are used as given, so each layer takes its own color again (red_color for red_variable, green_color for green_variable):
ggplot() +
geom_density(aes(x = red_variable, fill = red_color, alpha = 0.5), data =
red_dataframe) +
geom_density(aes(x = green_variable, fill = green_color, alpha = 0.5), data =
green_dataframe) +
scale_fill_identity() +
xlab("X value") +
ylab("Density") +
labs(fill = "Legend") +
guides(alpha = FALSE)
When you added your scale_fill_manual, ggplot was like "okay, cool, you want to specify colors and labels". But you were thinking in the order that you added the layers to the plot (much like base graphics), whereas ggplot was thinking of these newly created variables "#FF0000" and "#008000", which it ordered alphabetically by default (just as if they were factor or character columns in a data frame). And since you happened to add the layers in reverse alphabetical order, it was switched.
dc37's answer shows a couple of better methods. With ggplot you should (a) work with a single, long-format data frame whenever possible, (b) keep constants (constant color, constant alpha, etc.) outside aes(), and (c) set colors in a scale_fill_* or scale_color_* function when they're not constant.
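Points (a) to (c) can be combined in one short sketch. It assumes base R is enough to build the long data frame (no tidyr), and the names long and fill_values are illustrative rather than taken from either answer.

```r
library(ggplot2)

red_dataframe <- data.frame(red_variable = c(10, 11, 12, 13, 14))
green_dataframe <- data.frame(green_variable = c(6, 7, 8, 9, 10))

# (a) one long-format data frame: a grouping column plus a value column
long <- rbind(
  data.frame(var = "red_variable", val = red_dataframe$red_variable),
  data.frame(var = "green_variable", val = green_dataframe$green_variable)
)

# (c) colours live in the scale, matched to groups by name rather than by order
fill_values <- c(red_variable = "#FF0000", green_variable = "#008000")

p <- ggplot(long, aes(val, fill = var)) +
  geom_density(alpha = 0.5) +  # (b) constant alpha stays outside aes()
  scale_fill_manual(name = "Legend",
                    values = fill_values,
                    breaks = c("red_variable", "green_variable"),
                    labels = c("Density of red_variable", "Density of green_variable")) +
  xlab("X value") +
  ylab("Density")
p
```

Naming the values avoids relying on the alphabetical ordering that caused the swap in the question.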

Legend with un-plotted variable in ggplot2

I have data where each point lies on a spectrum between two centroids. I have generated a color for each point by specifying a color for each centroid, then setting the color of each point as a function of its position between its two centroids. I used this to manually specify colors for each point and plotted the data in the following way:
lb.plot.dat <- data.frame('UMAP1' = lb.umap$layout[,1], 'UMAP2' = lb.umap$layout[,2],
'sample' = as.factor(substr(colnames(lb.vip), 1, 5)),
'fuzzy.class' = color.vect)
p3 <- ggplot(lb.plot.dat, aes(x = UMAP1, y = UMAP2)) + geom_point(aes(color = color.vect)) +
ggtitle('Fuzzy Classification') + scale_color_identity()
p3 + facet_grid(cols = vars(sample)) + theme(legend.) +
ggsave(filename = 'ref-samps_bcell-vip-model_fuzzy-class.png', height = 8, width = 16)
(color.vect is the aforementioned vector of colors for each point in the plot)
I would like to generate a legend of this plot that gives the color used for each centroid. I have a named vector class.cols that contains the colors used for each centroid and is named according to the corresponding class.
Is there a way to transform this vector into a legend for the plot even though it is not explicitly used in the plotting call?
You can turn on legend drawing in scale_color_identity() by setting guide = "legend". You'll have to specify the breaks and labels in the scale function so that the legend correctly states what each color represents, and not just the name of the color.
library(ggplot2)
df <- data.frame(x = 1:3, y = 1:3, color = c("red", "green", "blue"))
# no legend by default
ggplot(df, aes(x, y, color = color)) +
geom_point() +
scale_color_identity()
# legend turned on
ggplot(df, aes(x, y, color = color)) +
geom_point() +
scale_color_identity(guide = "legend")
Created on 2019-12-15 by the reprex package (v0.3.0)
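The example above turns the legend on but does not show the breaks and labels step. A minimal sketch of that step follows, assuming a named vector like the asker's class.cols. The class names, the colour values, and the blended point colours here are all hypothetical stand-ins, not the asker's data.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's named vector of centroid colours
class.cols <- c(classA = "#E41A1C", classB = "#377EB8")

# Hypothetical per-point colours: blends between the two centroid colours
point_colors <- colorRampPalette(unname(class.cols))(5)
df <- data.frame(x = 1:5, y = 1:5, color = point_colors)

p <- ggplot(df, aes(x, y, color = color)) +
  geom_point(size = 4) +
  # Only the centroid colours become legend keys, labelled by class name
  scale_color_identity(guide = "legend",
                       name = "Class",
                       breaks = unname(class.cols),
                       labels = names(class.cols))
p
```

In this sketch the centroid colours occur among the plotted colours. I believe a break that never occurs in the data may be dropped from the legend, so that is worth checking with real data.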

vline legends not showing on geom_histogram type plot

I'm struggling to get the legends for two vline elements to show on a histogram. I set show_guide=T on the vline elements to force the legend to show, but it does not help.
I suspect this is because the geom_histogram plot shows only one series of data, but I am not sure how to force the legend to show in the first place.
ggplot(mttr, aes(x=Resolution.Time)) +
geom_histogram(binwidth=0.5) +
geom_vline(aes(xintercept=mean(Resolution.Time, na.rm=T)),
color="red", linetype="dashed", size=1, show_guide=T) +
geom_vline(aes(xintercept=median(Resolution.Time, na.rm=T)),
color="green", linetype="dashed", size=1, show_guide=T) +
xlim(c(0,40)) +
xlab("Resolution Time (days)") # + theme(legend.position=c(1,0), legend.justification=c(1,0))
Is there a way to force the legend to show (for vlines), even if the histogram does not have an aesthetic fill ?
I already tried a number of potential solutions found on stackoverflow but without success.
Any help would be greatly appreciated.
Thanks!
Method 1: Manually map colors
Place colour=<your label> inside the aes function for geom_vline
Add scale_color_manual and correctly map your labels to the colors that you want
Example:
mttr <- data.frame(Resolution.Time = rexp(1000, 0.25))
ggplot(mttr, aes(x=Resolution.Time)) +
geom_histogram(binwidth=0.5) +
# Notice that I have color = "Mean" inside aes
geom_vline(aes(xintercept=mean(Resolution.Time, na.rm=T), color = "Mean"),
linetype="dashed", size=1, show.legend=TRUE) +
# Here I have color = "Median" inside aes
geom_vline(aes(xintercept=median(Resolution.Time, na.rm=T), color = "Median"),
linetype="dashed", size=1, show.legend=TRUE) +
xlim(c(0,40)) +
xlab("Resolution Time (days)") +
# Here I map my labels, "Mean" and "Median", to their colors
# The legend title is "Statistics"
scale_color_manual("Statistics", values = c("Mean" = "red", "Median" = "green"))
Method 2: Create a separate data frame to plot your vertical lines
This method scales more nicely, and involves a single geom_vline instead of several.
Make a data.frame for the vertical lines that you want to plot, with columns describing the labels and the values
Have a single geom_vline that maps the xintercept to the values and the color to the labels
Optionally, control the colors by choosing an appropriate scale_color_*()
Example:
# Create a data frame with two columns, a label and a value
vlines = data.frame("Statistic" = c("Mean", "Median"),
"Value" = c(mean(mttr$Resolution.Time), median(mttr$Resolution.Time)),
stringsAsFactors = FALSE)
# Plot
ggplot(mttr, aes(x=Resolution.Time)) +
geom_histogram(binwidth=0.5) +
# Here we have a single geom_vline call
# Map the xintercept to Value, color to Statistic
# Specify data = vlines outside the aes function
geom_vline(aes(xintercept = Value, color = Statistic),
linetype = "dashed", size = 1, show.legend = TRUE, data = vlines) +
xlim(c(0,40)) +
xlab("Resolution Time (days)") # + scale_color_*() to change your colors

How to remove the background and labels when we have two geoms in ggplot2

I am developing a graph in R with ggplot2 that has two geoms (one geom_line and one geom_text). It draws a line graph and then places text labels on start and end points of each line segment.
(myplot <- ggplot(data=datatable, aes(x, y, group = group,colour = group, label=mylabels)) + geom_line(size = 1.5))
myplot + geom_text(color = "black")
Now my question is how I can do the following tasks in ggplot2. They all work when I have only one geom, but not with both (it seems that they override each other).
1 - making the background white.
The following code works with geom_line, but as soon as I add geom_text the background becomes gray again. Even if I add this line after geom_text, it gets rid of the point labels that are on the chart.
myplot + opts(panel.background = theme_rect(fill = "white", colour = NA))
2 - x labels and x label format. Again, the following code works with only one geom but breaks when I have the second geom.
myplot + scale_x_date(format="%m", 'my x label')
3 - While we are at it, how can I put the legend at the bottom and spread it horizontally? (p + opts(legend.position="bottom")) spreads it vertically, which looks very stupid.
For 1), you haven't saved the object myplot after the second and third calls involving it. This works for me:
set.seed(3)
dat <- data.frame(x = runif(10), y = rnorm(10), group = gl(2,5),
mylabel = paste(1:10, "foo"))
require(ggplot2)
myplot <- ggplot(data=dat, aes(x, y, group = group, colour = group,
label = mylabel)) + geom_line(size = 1.5)
myplot + geom_text(color = "black") +
opts(panel.background = theme_rect(fill = "white", colour = NA))
Note that I only ever save myplot once. The second call involving myplot modifies it on the fly but doesn't save it.
For the rest, you'll need to provide a reproducible example.
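In later ggplot2 releases, opts() and theme_rect() were replaced by theme() and element_rect(). A minimal sketch of the same idea in that newer syntax, assuming a current ggplot2 installation and reusing the example data from the answer above:

```r
library(ggplot2)

set.seed(3)
dat <- data.frame(x = runif(10), y = rnorm(10), group = gl(2, 5),
                  mylabel = paste(1:10, "foo"))

# Build the whole plot in one chain so that no modification is lost
myplot <- ggplot(dat, aes(x, y, group = group, colour = group, label = mylabel)) +
  geom_line() +
  geom_text(colour = "black") +
  theme(panel.background = element_rect(fill = "white", colour = NA),
        legend.position = "bottom",        # legend under the panel
        legend.direction = "horizontal")   # keys laid out side by side
myplot
```

The theme settings apply to the plot as a whole, so they should not depend on how many geoms have been added.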
