Objective: to have a color legend for geom_point showing a colored dot for each group, together with a legend for geom_abline showing one colored line for each abline.
Am I doing something wrong? Is there a solution?
# data: mtcars + some made-up straight lines
library(ggplot2)
df = data.frame(Intercept = c(2,3,4), Slope = c(0,0,0), Group = factor(c(1, 2, 3)))
Comment about #1: There's nothing special about the basic plot, except that I have set both group and color inside aes(). I think it is standard to put both "group" and "color" inside aes() to achieve grouping and coloring.
# 1. my basic plot
ggplot(data = mtcars, aes(x = mpg, y = wt, group = vs, color = factor(vs))) +
geom_point() -> p
Comment about #2: Clearly I did not set up ggplot correctly to handle the legend. I also tried adding group = Group inside the aes(). But there is a more serious problem: geom_point forms 2 groups and geom_abline forms 3 groups, yet the legend shows only 4 color/line combinations. One of them (the green one) has been merged. What have I done wrong here?
# 2. my naive attempt to add ablines of 3 different colours
p + geom_abline(data = df, aes(intercept = Intercept, slope = Slope,
colour = Group))
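The merged entry can be traced to the single discrete colour scale that both layers train: its levels are the union of factor(vs) ("0", "1") and Group ("1", "2", "3"), so level "1" gets one colour that is used by a point group and by a line. A minimal sketch that checks this with layer_data(), reusing the question's df and only inspecting the colours ggplot2 assigns:

```r
library(ggplot2)

df <- data.frame(Intercept = c(2, 3, 4), Slope = c(0, 0, 0),
                 Group = factor(c(1, 2, 3)))

p2 <- ggplot(data = mtcars, aes(x = mpg, y = wt, group = vs, color = factor(vs))) +
  geom_point() +
  geom_abline(data = df, aes(intercept = Intercept, slope = Slope, colour = Group))

# Levels seen by the one shared colour scale: "0" "1" "2" "3" (4 legend keys)
shared_levels <- union(levels(factor(mtcars$vs)), levels(df$Group))
print(shared_levels)

# Colours actually assigned in each layer once the scale is trained
point_cols <- unique(layer_data(p2, 1)$colour)
line_cols  <- unique(layer_data(p2, 2)$colour)

# Exactly one colour is used by both layers: the one for level "1"
print(intersect(point_cols, line_cols))
```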
Comment about #3: The ablines have been removed from the legend, but the points are still not right. From here on it gets more and more desperate.
# 3. Suppress the ab_line legend?
p + geom_abline(data = df, aes(intercept = Intercept, slope = Slope,
colour = Group), show.legend = FALSE)
Comment about #4: This is what I'm going for at the moment. Better no legend than a wrong legend. Shame about losing the colors though.
# 4. Remove the geom_abline legend AND colors
p + geom_abline(data = df, aes(intercept = Intercept, slope = Slope))
Comment about #5: I don't know what I was hoping for here... that if I defined the data and aes inside the call to geom_point() rather than in ggplot(), geom_abline() would somehow not hijack the colors and legend. But no, it does not appear to make a difference.
# 5. An attempt to set the aes inside geom_point() instead of ggplot()
ggplot() +
geom_point(data = mtcars, aes(x = mpg, y = wt, group = vs, color = factor(vs))) +
geom_abline(data = df, aes(intercept = Intercept, slope = Slope, color = "groups")) +
scale_color_manual(values = c("red", "blue", "black"))
One option would be to use a filled shape for the mtcars data; then you can have a fill scale and a colour scale rather than two colour scales. If you don't want the black outlines, you can add an option such as colour="white" to the geom_point() call to change the colour of the points' edges.
library(ggplot2)
df = data.frame(Intercept = c(2,3,4), Slope = c(0,0,0), Group = factor(c(1, 2, 3)))
ggplot(data = mtcars, aes(x = mpg, y = wt, group = vs, fill = factor(vs))) +
geom_point(shape=21, size=2) +
geom_abline(data = df, aes(intercept = Intercept, slope = Slope,
colour = Group))
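A sketch of the white-outline variant mentioned above, assuming the same df: colour = "white" is set outside aes(), so it only changes the edge of the filled circles and does not feed the colour scale used by the lines.

```r
library(ggplot2)

df <- data.frame(Intercept = c(2, 3, 4), Slope = c(0, 0, 0),
                 Group = factor(c(1, 2, 3)))

p <- ggplot(data = mtcars, aes(x = mpg, y = wt, group = vs, fill = factor(vs))) +
  # shape 21 is a filled circle: fill shows the vs groups,
  # while the fixed colour = "white" only draws the outline
  geom_point(shape = 21, size = 2, colour = "white") +
  # the lines map Group to colour, so they get their own colour legend
  geom_abline(data = df, aes(intercept = Intercept, slope = Slope, colour = Group))
```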
If you need or want a horizontal line in the legend keys, you might consider using geom_hline() instead:
library(ggplot2)
df = data.frame(Intercept = c(2,3,4), Slope = c(0,0,0),
Group = factor(c(1, 2, 3)))
ggplot(data = mtcars,
aes(x = mpg, y = wt, group = vs, fill = factor(vs))) +
geom_point(shape=21, size=2) +
geom_hline(data = df,
aes(yintercept = Intercept,colour = Group))
[Figure: plot with geom_hline]
Problem
I have some data points stored in a data.frame with three variables: x, y and gender. My goal is to draw several generally fitted lines, plus lines fitted separately for male and female, over the scatter plot, with the points coloured by gender. It sounds easy, but some issues persist.
What I currently do is take a new set of x values, predict y for every model, combine the fitted lines in one data.frame, and convert it from wide to long format with the model name as the third variable. (From the posts "ggplot2: how to add the legend for a line added to a scatter plot?" and "Add legend to ggplot2 line plot" I learnt that a mapping should be used instead of setting colours/legends separately.) However, while I get a multicolour line plot, the points are not coloured by gender (which is already a factor), contrary to what I expected from those posts.
I also know it might be possible to use aes(y = predict(model)), but I ran into other problems with that. I also tried colouring the points directly in aes() and assigning colours separately for each line, but then no legend is generated for the lines unless I map lty, which draws all legend keys in the same colour.
I would appreciate any ideas, and I am also open to changing the whole approach.
Code
Note that two pairs of lines overlap, so only two lines appear. Adding some jitter to the data might make them distinguishable.
slrmen<-lm(tc~x+I(x^2),data=data[data['gender']==0,])
slrwomen<-lm(tc~x+I(x^2),data=data[data['gender']==1,])
prdf <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(1,100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(0,100)))
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)
rawplotdata<-data.frame(x=prdf$x, fullf=prdf$fit, fullm=prdm$fit,
linf=predict(slrwomen, newdata = prdf),
linm=predict(slrmen, newdata = prdm))
plotdata<-reshape2::melt(rawplotdata,id.vars="x",
measure.vars=c("fullf","fullm","linf","linm"),
variable.name="fitmethod", value.name="y")
plotdata$fitmethod<-as.factor(plotdata$fitmethod)
plt <- ggplot() +
geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
colour=fitmethod)) +
scale_colour_manual(name = "Fit Methods",
values = c("fullf" = "lightskyblue",
"linf" = "cornflowerblue",
"fullm"="darkseagreen", "linm" = "olivedrab")) +
geom_point(data = data, aes(x = x, y = y, fill = gender)) +
scale_fill_manual(values=c("blue","green")) ## This does not work as I expected...
show(plt)
Code for another method (two lines omitted), which produces a multicolour plot but a legend drawn in a single colour:
ggplot(data = prdf, aes(x = x, y = fit)) + # prdf and prdm are just data frames containing the x's and fitted values for different models
geom_line(aes(lty="Female"),colour = "chocolate") +
geom_line(data = prdm, aes(x = x, y = fit, lty="Male"), colour = "darkblue") +
geom_point(data = data, aes(x = x, y = y, colour = gender)) +
scale_colour_discrete(name="Gender", breaks=c(0,1),
labels=c("Male","Female"))
The problem comes from using the colour aesthetic for the lines and the fill aesthetic for the points in your first example. The second example works because the colour aesthetic is used for both lines and points.
By default, geom_point() cannot display a variable mapped to fill, because the default point shape (19) has no fill.
For fill to work on points, you have to set shape to one of the fillable shapes 21 to 25 (e.g. shape = 21) in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
tc = c(x1, x2),
gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <-lm(tc~x+I(x^2), data = data[data["gender"]=="Male",])
slrwomen <-lm(tc~x+I(x^2),data = data[data["gender"]=="Female",])
newdat <- data.frame(x = seq(20,80,length.out = 200))
fitted.male <- data.frame(x = newdat,
gender = "Male",
tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
gender = "Female",
tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetic for both points and lines (specify it in ggplot() so that every layer inherits it). By default, geom_point() can map a variable to colour.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
geom_point() +
geom_line(data = fitted.male) +
geom_line(data = fitted.female) +
scale_colour_manual(values = c("tomato","blue")) +
theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetic for the points and the colour aesthetic for the lines (specify each aesthetic in its own geom_* call so that it is not inherited). This reproduces the problem.
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender)) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
To fix this, change the shape argument in geom_point to a point shape that can be filled (21:25).
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender), shape = 21) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the colour and fill legends are merged into one automatically here, because the same variable is mapped to both aesthetics and the two legends therefore have the same title and labels.
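If you would rather keep two separate legends (observed points versus fitted lines), a minimal sketch is to give the two scales different names, so their legends no longer match and are not merged. It uses mtcars and geom_smooth() as stand-ins for the simulated data and the hand-made predictions; the titles "Observed" and "Fitted" are arbitrary.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # fill on a fillable shape (21) for the points
  geom_point(aes(fill = factor(am)), shape = 21) +
  # colour for one linear fit per group
  geom_smooth(aes(colour = factor(am)), method = "lm", formula = y ~ x, se = FALSE) +
  # different legend titles keep the fill and colour legends apart
  scale_fill_manual(name = "Observed", values = c("tomato", "blue")) +
  scale_colour_manual(name = "Fitted", values = c("tomato", "blue"))
```

With the same name on both scales (or no names, so both default to "factor(am)"), the two legends would be combined again.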
It seems to me that what you really want to do is use ggplot2::stat_smooth instead of trying to predict yourself.
Borrowing the data from @scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
geom_point() +
stat_smooth(aes(linetype = "X^2"), method = 'lm',formula = y~x + I(x^2)) +
stat_smooth(aes(linetype = "X^3"), method = 'lm',formula = y~x + I(x^2) + I(x^3)) +
scale_color_manual(values = c("darkseagreen","lightskyblue"))
I have plotted a boxplot with points, and I want to colour the points. position_jitterdodge() worked fine without colour, as shown in Figure B: the points are close together, which is what I intended. But when I add colours to the points, the jitter.width parameter no longer has the intended effect (Figure A): the points are too far apart. I tried different values for jitter.width, but none worked. How do I solve this problem?
library(tidyverse)
library(ggpubr)
mtcars$cyl <- factor(mtcars$cyl)
p1 <- mtcars %>% ggplot(aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot() +
geom_point(position = position_jitterdodge(jitter.width = 0.2),
aes(color = factor(wt)), show.legend = FALSE)
p2 <- mtcars %>% ggplot(aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot() +
geom_point(position = position_jitterdodge(jitter.width = 0.2))
ggarrange(p1, p2, labels = c("A", "B"))
In p1, the points are not only jittered, they are also dodged by factor(wt). If you only want jitter, set dodge.width = 0 in position_jitterdodge.
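Applied to p1, that suggestion might look like the sketch below. One caveat, hedged: in the ggplot2 versions I am aware of, position_jitterdodge() divides jitter.width by the number of dodge groups plus two, so with many factor(wt) groups the jitter can become very small and you may need a larger value. As an alternative that is not part of this answer, position_jitter() applies its width directly. mtcars2 is a hypothetical copy so that mtcars itself is left untouched.

```r
library(ggplot2)

mtcars2 <- transform(mtcars, cyl = factor(cyl))

# Keep position_jitterdodge() but switch off the dodging with dodge.width = 0
p_nododge <- ggplot(mtcars2, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  geom_point(aes(color = factor(wt)),
             position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0),
             show.legend = FALSE)

# Alternative: plain horizontal jitter, not rescaled by the number of groups
p_jitter <- ggplot(mtcars2, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  geom_point(aes(color = factor(wt)),
             position = position_jitter(width = 0.1, height = 0),
             show.legend = FALSE)
```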
It looks like the problem is that the points have a discrete color aesthetic, but no group aesthetic. If you want to keep coloring by a discrete variable, add group = cyl to the aesthetics for the geom_point layer. If you're plotting with another dataset, the grouping variable would be the same variable you plot along the x axis.
One catch: you have to increase the jitter.width when you apply grouping for it to be visible. I had to dial it up from 0.2 to 3 here.
Another option would be to color by a continuous variable.
library(tidyverse)
library(ggpubr)
mtcars$cyl <- factor(mtcars$cyl)
# discrete colour plus group = cyl: jitter.width raised from 0.2 to 3 so the jitter stays visible
p3 <- mtcars %>% ggplot(aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot() +
geom_point(aes(color = factor(wt), group = cyl),
position = position_jitterdodge(jitter.width = 3),
show.legend = FALSE)
# continuous colour: wt does not split the points into extra groups
p4 <- mtcars %>% ggplot(aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot() +
geom_point(aes(color = wt),
position = position_jitterdodge(jitter.width = 0.2),
show.legend = FALSE)
ggarrange(p3, p4)
[Figure: color_and_jitter]
This is very similar to the question asked here. However, in that case the fill aesthetic differs between the two plots; in my case the fill aesthetic is the same for both plots, but I want different colour schemes.
I would like to change the colours of the boxplots and the scatter plots manually (for example, making the boxes white and the points coloured).
Example:
library(dplyr)
library(ggplot2)
n<-4*3*10
myvalues<- rexp((n))
days <- ntile(rexp(n),4)
doses <- ntile(rexp(n), 3)
test <- data.frame(values =myvalues,
day = factor(days, levels = unique(days)),
dose = factor(doses, levels = unique(doses)))
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot( aes(fill = dose))+
geom_point( aes(fill = dose), alpha = 0.4,
position = position_jitterdodge())
produces a plot like this:
Using scale_fill_manual() changes the fill of both the boxplot and the scatter plot.
I have found a hack: if I add a colour mapping to geom_point(), then scale_fill_manual() no longer changes the colours of the scatter points:
p<- ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(fill = dose), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = factor(test$dose)),
position = position_jitterdodge(jitter.width = 0.1))+
scale_fill_manual(values = c('white', 'white', 'white'))
Are there more efficient ways of getting the same result?
You can use group to separate the different boxplots. There is no need to set the fill and then overwrite it:
ggplot(data = test, aes(x = day, y = values)) +
geom_boxplot(aes(group = interaction(day, dose)), outlier.shape = NA)+
geom_point(aes(fill = dose, colour = dose),
position = position_jitterdodge(jitter.width = 0.1))
And you should never use data$column inside aes(); just use the bare column name. Using data$column works in simple cases, but can break as soon as there are stat layers or facets.
So a handful of posts already address how to remove unwanted legends in ggplot.
The wonderful answer posted to "Remove extra legends in ggplot2"
suggests:
For any mapped variable you can suppress the appearance of a legend by using guide = 'none' in the appropriate scale_...
However, I'm having problems with unwanted legends being created by adding the group aesthetic.
I tried the scale approach, but it doesn't seem to work with the group argument:
could not find function "scale_group"
A search here didn't provide any insight on the proper function call to modify group aesthetics either.
User @joran provided the following insight in the linked post above:
That's because the group aesthetic doesn't generate any scales or guides on its own. It's always sort of modifying something else. You'll never get a legend for the group aesthetic.
Example
So I could just add show.legend = FALSE to the layer that contains group to remove all of that layer's legends, but this doesn't work if I want some other aesthetic of that layer to be included in the legend.
#Set Up Example:
library(lme4)
library(ggplot2)
mod <- lmer(mpg ~ hp + (1 |cyl), data = mtcars)
pred <- predict(mod,re.form = NA)
pdat <- data.frame(mtcars[,c('hp','cyl')], mpg = pred, up = pred+1, low = pred-1)
Adding show.legend = F to the layer calls works as expected:
gp <-
ggplot(data = mtcars, aes(x = hp, y = mpg, color = cyl, group = cyl), show.legend = F) +
geom_point(aes(group = cyl),show.legend = F) +
facet_wrap(~cyl) +
geom_line(data = pdat, aes(group = cyl),show.legend = F, color = 'orange')
But when I want to add a legend for a geom_ribbon fill based on the same group (and therefore cannot use the show.legend = F argument), I get a legend for my group again...
gp + geom_ribbon(data = pdat, aes(ymin = low, ymax = up, group = cyl, fill = 'mod'), alpha = 0.3) +
scale_fill_manual(values=c("orange"), name="model")
The outputs:
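The legend you are seeing does not come from group. cyl is numeric, so color = cyl creates a continuous colour scale, and geom_ribbon() inherits that mapping, which brings the colourbar back. You can try hiding it in the colour scale instead: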
gp <- ggplot(data = mtcars, aes(x = hp, y = mpg, color = cyl, group = cyl), show.legend = F) +
geom_point(aes(group = cyl)) +
facet_wrap(~cyl) +
geom_line(data = pdat, aes(group = cyl),color = 'orange')
gp + geom_ribbon(data = pdat, aes(ymin = low, ymax = up, group = cyl, fill = 'mod'), alpha = 0.3) +
scale_fill_manual(values=c("orange"), name="model")+
scale_color_continuous(guide ='none')
By the way, my idea comes from the link you posted at the top.
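As a variation, a sketch that hides only the colour legend with guides(colour = "none") while keeping the fill legend for the ribbon. To stay self-contained it swaps the lmer() fit for a plain lm() with cyl as a factor (an assumption made only to avoid the lme4 dependency), so the predicted values differ from the question's.

```r
library(ggplot2)

# Stand-in for the lmer() model in the question (plain lm, no lme4 needed)
mod  <- lm(mpg ~ hp + factor(cyl), data = mtcars)
pred <- predict(mod)
pdat <- data.frame(mtcars[, c("hp", "cyl")], mpg = pred, up = pred + 1, low = pred - 1)

p <- ggplot(data = mtcars, aes(x = hp, y = mpg, color = cyl, group = cyl)) +
  geom_point() +
  facet_wrap(~cyl) +
  geom_line(data = pdat, color = "orange") +
  geom_ribbon(data = pdat, aes(ymin = low, ymax = up, fill = "mod"), alpha = 0.3) +
  scale_fill_manual(values = c(mod = "orange"), name = "model") +
  # cyl is numeric, so colour = cyl yields a colourbar; drop only that guide
  guides(colour = "none")
```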
I plotted two ggplots from two different datasets in a single plot; both are simple linear regressions. I want to add a legend for both the lines and the points, with different colours. How can I do that? The code I used for the plot is below, but I could not get the legend I want.
ggplot() +
geom_point(aes(x = Time_1, y = value1)) +
geom_point(aes(x = Time_2, y = value2)) +
geom_line(aes(x = Time_1, y = predict(reg, newdata = dataset)))+
geom_line(aes(x = Time_Month.x, y = predict(regressor, newdata = training_set)))+
ggtitle('Two plots in a single plot')
ggplot2 adds a legend automatically when something is mapped to an aesthetic such as colour, shape or size inside aes(). Your original code gives ggplot() the minimum amount of information: enough to draw the plot, but not enough to create a legend.
Since your data come from two different objects because of the two regressions, it looks like all you need here is a color = "..." mapping inside the aes() of each geom_point() and each geom_line(). The string becomes the legend label; the colours themselves come from the default palette unless you set them, e.g. with scale_color_manual(). Using R's built-in mtcars data set as an example, what you have is similar to
ggplot(mtcars) + geom_point(aes(x = cyl, y = mpg)) + geom_point(aes(x = cyl, y = wt)) + ggtitle("Example Graph")
[Figure: graph without legend]
And what you want can be obtained by using something similar to,
ggplot(mtcars) + geom_point(aes(x = cyl, y = mpg, color = "blue")) + geom_point(aes(x = cyl, y = wt, color = "green")) + ggtitle("Example Graph")
[Figure: graph with legend]
In your case, that would translate to
ggplot() +
geom_point(aes(x = Time_1, y = value1, color = "blue")) +
geom_point(aes(x = Time_2, y = value2, color = "green")) +
geom_line(aes(x = Time_1, y = predict(reg, newdata = dataset), color = "red"))+
geom_line(aes(x = Time_Month.x, y = predict(regressor, newdata = training_set), color = "yellow"))+
ggtitle('Two plots in a single plot')
You could also map size, shape or alpha inside aes() to distinguish the series.
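Because a string mapped inside aes() only names a legend entry, color = "blue" does not by itself make anything blue. A minimal sketch of choosing the actual colours: descriptive labels are mapped inside aes() and each label is tied to a colour by name in scale_color_manual(). mtcars and the two lm() fits stand in for the question's datasets and models; the labels and colours are arbitrary.

```r
library(ggplot2)

# Two simple regressions on mtcars, standing in for reg and regressor
fit_mpg <- lm(mpg ~ cyl, data = mtcars)
fit_wt  <- lm(wt ~ cyl, data = mtcars)
grid <- data.frame(cyl = seq(4, 8, length.out = 50))
grid$mpg_fit <- predict(fit_mpg, newdata = grid)
grid$wt_fit  <- predict(fit_wt, newdata = grid)

p <- ggplot(mtcars) +
  # each constant string becomes one legend entry
  geom_point(aes(x = cyl, y = mpg, color = "mpg data")) +
  geom_point(aes(x = cyl, y = wt, color = "wt data")) +
  geom_line(data = grid, aes(x = cyl, y = mpg_fit, color = "mpg fit")) +
  geom_line(data = grid, aes(x = cyl, y = wt_fit, color = "wt fit")) +
  # named values link each label to an actual colour
  scale_color_manual(name = "Series",
                     values = c("mpg data" = "blue", "wt data" = "darkgreen",
                                "mpg fit" = "red", "wt fit" = "orange")) +
  ggtitle("Example Graph")
```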