Adding a legend to a single plot with multiple linear regression lines - r

I plotted two ggplots from two different datasets in one single plot. Both plots are simple linear regressions. I want to add a legend for both the lines and the dots in the plot, with different colours. How can I do that? The code I used for the plot is below, but I failed to add a suitable legend to it.
ggplot() +
geom_point(aes(x = Time_1, y = value1)) +
geom_point(aes(x = Time_2, y = value2)) +
geom_line(aes(x = Time_1, y = predict(reg, newdata = dataset)))+
geom_line(aes(x = Time_Month.x, y = predict(regressor, newdata = training_set)))+
ggtitle('Two plots in a single plot')

ggplot2 adds a legend automatically for every aesthetic (colour, shape, size, and so on) that is mapped inside aes(). Your original code provides the minimum amount of information to ggplot(), basically enough for it to work but not enough to create a legend.
Since your data comes from two different objects (the two different regressions), it looks like all you need in this case is to add a color = "SOME LABEL" argument inside the aes() of each geom_point() and each geom_line(). Using R's built-in mtcars data set as an example, what you have is similar to
ggplot(mtcars) + geom_point(aes(x = cyl, y = mpg)) + geom_point(aes(x = cyl, y = wt)) + ggtitle("Example Graph")
Graph without Legend
What you want can be obtained with something similar to:
ggplot(mtcars) + geom_point(aes(x = cyl, y = mpg, color = "blue")) + geom_point(aes(x = cyl, y = wt, color = "green")) + ggtitle("Example Graph")
Graph with Legend
For your data, this would translate to:
ggplot() +
geom_point(aes(x = Time_1, y = value1, color = "blue")) +
geom_point(aes(x = Time_2, y = value2, color = "green")) +
geom_line(aes(x = Time_1, y = predict(reg, newdata = dataset), color = "red"))+
geom_line(aes(x = Time_Month.x, y = predict(regressor, newdata = training_set), color = "yellow"))+
ggtitle('Two plots in a single plot')
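Note that a string placed inside aes() is treated as a legend label, not as a colour: the series above are drawn in ggplot2's default palette, with legend entries named "blue", "green", and so on. To control the actual colours, add scale_colour_manual() with a values vector whose names match those labels. You could also use the size, shape, or alpha arguments inside of aes() to differentiate the different series.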
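A minimal sketch of that last step, reusing the mtcars example: the series labels ("mpg" and "wt", chosen here for illustration) are mapped inside aes(), and scale_colour_manual() assigns the real colours by matching names.

```r
library(ggplot2)

# The strings inside aes() are only legend labels ("mpg", "wt");
# scale_colour_manual() decides which colour each label is drawn in.
p <- ggplot(mtcars) +
  geom_point(aes(x = cyl, y = mpg, colour = "mpg")) +
  geom_point(aes(x = cyl, y = wt, colour = "wt")) +
  scale_colour_manual(name = "Series",
                      values = c("mpg" = "blue", "wt" = "green")) +
  ggtitle("Example Graph")
```

Printing p draws the plot with a "Series" legend. For the question's plot, the same pattern would mean giving each geom_point() and geom_line() its own descriptive label (for example a hypothetical colour = "dataset 1 points") and listing those labels as the names of values.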

Related

How to assign colors to multicolor scatter plot with multicolor fitted lines in ggplot2

Problem
I have some data points stored in a data.frame with three variables: x, y, and gender. My goal is to draw several generally fitted lines, and also lines fitted specifically for male/female, over the scatter plot, with points coloured by gender. It sounds easy, but some issues just persist.
What I currently do is generate a new set of x values, predict y for every model, combine the fitted lines in one data.frame, and then convert it from wide to long format with the model name as the third variable. (From the posts ggplot2: how to add the legend for a line added to a scatter plot? and Add legend to ggplot2 line plot, I learnt that mappings should be used instead of setting colours/legends separately.) However, while I get a multicolour line plot, the points are not coloured by gender (already a factor), as I expected from the posts I referenced.
I also know it might be possible to use aes(y = predict(model)), but I ran into other problems with that. I also tried colouring the points directly in aes() and assigning colours separately for each line, but then the legend cannot be generated unless I use lty, which makes all legend entries the same colour.
I would appreciate any ideas, and I am also open to changing the whole method.
Code
Note that two pairs of lines overlap, so only two lines appear. I guess adding some jitter to the data might make them look different.
slrmen<-lm(tc~x+I(x^2),data=data[data['gender']==0,])
slrwomen<-lm(tc~x+I(x^2),data=data[data['gender']==1,])
prdf <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(1,100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(0,100)))
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)
rawplotdata<-data.frame(x=prdf$x, fullf=prdf$fit, fullm=prdm$fit,
linf=predict(slrwomen, newdata = prdf),
linm=predict(slrmen, newdata = prdm))
plotdata<-reshape2::melt(rawplotdata,id.vars="x",
measure.vars=c("fullf","fullm","linf","linm"),
variable.name="fitmethod", value.name="y")
plotdata$fitmethod<-as.factor(plotdata$fitmethod)
plt <- ggplot() +
geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
colour=fitmethod)) +
scale_colour_manual(name = "Fit Methods",
values = c("fullf" = "lightskyblue",
"linf" = "cornflowerblue",
"fullm"="darkseagreen", "linm" = "olivedrab")) +
geom_point(data = data, aes(x = x, y = y, fill = gender)) +
scale_fill_manual(values=c("blue","green")) ## This does not work as I expected...
show(plt)
Code for another method (omitted two lines), which generates same-colour legend and multi-color plot:
ggplot(data = prdf, aes(x = x, y = fit)) + # prdf and prdm are just data frames containing the x's and fitted values for different models
geom_line(aes(lty="Female"),colour = "chocolate") +
geom_line(data = prdm, aes(x = x, y = fit, lty="Male"), colour = "darkblue") +
geom_point(data = data, aes(x = x, y = y, colour = gender)) +
scale_colour_discrete(name="Gender", breaks=c(0,1),
labels=c("Male","Female"))
The difference comes from the aesthetics you use. In your own (first) example, the lines use the colour aesthetic and the points use the fill aesthetic. In the second example, both lines and points use colour, which is why it works.
The default point shape (19) has no fill, so mapping a variable to fill in geom_point() has no visible effect.
For fill to work on points, set shape to one of the fillable shapes 21 to 25 (e.g. shape = 21) in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
tc = c(x1, x2),
gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <-lm(tc~x+I(x^2), data = data[data["gender"]=="Male",])
slrwomen <-lm(tc~x+I(x^2),data = data[data["gender"]=="Female",])
newdat <- data.frame(x = seq(20,80,length.out = 200))
fitted.male <- data.frame(x = newdat,
gender = "Male",
tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
gender = "Female",
tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetic for both points and lines (specify it in ggplot() so that it is inherited by every layer). By default, geom_point can map a variable to colour.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
geom_point() +
geom_line(data = fitted.male) +
geom_line(data = fitted.female) +
scale_colour_manual(values = c("tomato","blue")) +
theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetic for points and the colour aesthetic for lines (specify the aesthetics in each geom_* so they are not inherited). This reproduces the problem.
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender)) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
To fix this, change the shape argument in geom_point to a point shape that can be filled (21:25).
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender), shape = 21) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the scales for colour and fill get merged automatically if the same variable is mapped to both aesthetics.
It seems to me that what you really want to do is use ggplot2::stat_smooth instead of trying to predict yourself.
Borrowing the data from @scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
geom_point() +
stat_smooth(aes(linetype = "X^2"), method = 'lm',formula = y~x + I(x^2)) +
stat_smooth(aes(linetype = "X^3"), method = 'lm',formula = y~x + I(x^2) + I(x^3)) +
scale_color_manual(values = c("darkseagreen","lightskyblue"))

How to add legends to ggplot charts with facet_wrap() in R?

I made the chart in R using ggplot() and facet_wrap(), but the legends for geom_line() and stat_smooth() do not appear.
My example code is:
p <- ggplot(data = mtcars, aes(x = hp, y = disp)) +
geom_line(color="red")+
facet_wrap(~cyl)+
stat_smooth()+
guides()
How can I add legends to the final chart?
Map the colour aesthetic to a label inside aes() in each layer, then link each label to an actual colour with a named vector in the values argument of scale_color_manual().
ggplot(data = mtcars, aes(x = hp, y = disp)) +
geom_line(aes(color = "line.color")) +
stat_smooth(aes(color = "smooth.color")) +
facet_wrap(~cyl) +
scale_color_manual(name = "", values = c("line.color" = "red", "smooth.color" = "blue"))

How do i manually add a legend to a ggplot and geom_point?

I am plotting 2 sets of data on the same plot using ggplot. I have specified the colour for each data set, but there is no legend that comes out when the dot plot is generated.
What can i do to manually add a legend?
# Create an index to hold values of m from 1 to 100
m_index <- (1:100)
data_frame_50 <- data.frame(prob_max_abs_cor_50)
data_frame_20 <- data.frame(prob_max_abs_cor_20)
library(ggplot2)
plot1 <- ggplot(data_frame_50, mapping = aes(x = m_index,
y = prob_max_abs_cor_50),
colour = 'red') +
geom_point() +
ggplot(data_frame_20, mapping = aes(x = m_index,
y = prob_max_abs_cor_20),
colour = 'blue') +
geom_point()
plot1 + labs(x = " Values of m ",
y = " Maximum Absolute Correlation ",
title = "Dot plot of probability")
First, I would suggest tidying your ggplot code a little. This does what your posted code is trying to do:
ggplot() +
geom_point(data = data_frame_50, aes(x = m_index, y = prob_max_abs_cor_50),
colour = 'red') +
geom_point(data = data_frame_20, aes(x = m_index, y = prob_max_abs_cor_20),
colour = 'blue') +
labs(x = " Values of m ", y = " Maximum Absolute Correlation ",
title = "Dot plot of probability")
You won't get a legend here, because each colour is set as a fixed value outside aes() and each dataset contains only one category, so nothing is mapped for ggplot to build a legend from. You need a single dataset with a column grouping your data (i.e. 20 or 50). Using some example data, this is the equivalent of what you are plotting, and ggplot won't provide a legend:
ggplot() +
geom_point(data = iris, aes(x = Sepal.Length, y = Petal.Width), colour = 'red') +
geom_point(data = iris, aes(x = Sepal.Length, y = Petal.Length), colour = 'blue')
If you want to colour by category, include a colour argument inside the aes() call:
ggplot() +
geom_point(data = iris, aes(x = Sepal.Length, y = Petal.Width,
colour = factor(Species)))
Have a look at the iris dataset to get a sense of how you need to shape your data. It's hard to give precise advice because you haven't shown what your data look like, but something like this might work:
df.20 <- data.frame("m" = 1:100, "Group" = 20, "Numbers" = prob_max_abs_cor_20)
df.50 <- data.frame("m" = 1:100, "Group" = 50, "Numbers" = prob_max_abs_cor_50)
df.All <- rbind(df.20, df.50)
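To finish that idea, here is a sketch of plotting the combined df.All. Because the question does not show the data, prob_max_abs_cor_20 and prob_max_abs_cor_50 are replaced by hypothetical random vectors of length 100; the colours (blue for 20, red for 50) simply follow the question.

```r
library(ggplot2)

# Hypothetical stand-ins for the question's vectors (100 values each);
# replace these with the real prob_max_abs_cor_20 / prob_max_abs_cor_50.
set.seed(1)
prob_max_abs_cor_20 <- runif(100)
prob_max_abs_cor_50 <- runif(100)

df.20 <- data.frame("m" = 1:100, "Group" = 20, "Numbers" = prob_max_abs_cor_20)
df.50 <- data.frame("m" = 1:100, "Group" = 50, "Numbers" = prob_max_abs_cor_50)
df.All <- rbind(df.20, df.50)

# One dataset, one mapping: colour comes from the Group column, so ggplot
# builds the legend itself; the manual scale only picks the colours.
p <- ggplot(df.All, aes(x = m, y = Numbers, colour = factor(Group))) +
  geom_point() +
  scale_colour_manual(name = "Group", values = c("20" = "blue", "50" = "red")) +
  labs(x = " Values of m ", y = " Maximum Absolute Correlation ",
       title = "Dot plot of probability")
```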

ggplot legend: geom_abline interference

Objective: to have a color legend for geom_point showing a colored dot for each group together with a legend for geom_abline showing one colored line for each line.
Am I doing something wrong? Is there a solution?
# data: mtcars + some made-up straight lines
library(ggplot2)
df = data.frame(Intercept = c(2,3,4), Slope = c(0,0,0), Group = factor(c(1, 2, 3)))
Comment about #1: There's nothing special about the basic plot, but I have grouped the data inside the aes() and made color an aes(). I think it is standard to have both "group" and "color" inside the aes to achieve grouping and coloring.
# 1. my basic plot
ggplot(data = mtcars, aes(x = mpg, y = wt, group = vs, color = factor(vs))) +
geom_point() -> p
Comment about #2: Clearly I did not set up ggplot right to handle the legend properly. I also tried to add group = Group inside the aes. But there is a somewhat more serious problem: geom_point forms 2 groups, geom_abline forms 3 groups, but the legend is showing only 4 color/line combinations. One of them has been merged (the green one). What have I done wrong here?
# 2. my naive attempt to add ablines of 3 different colours
p + geom_abline(data = df, aes(intercept = Intercept, slope = Slope,
colour = Group))
Comment about #3: The ablines have been removed from the legend, but the points are still not right. From here on it gets more and more desperate.
# 3. Suppress the ab_line legend?
p + geom_abline(data = df, aes(intercept = Intercept, slope = Slope,
colour = Group), show.legend = FALSE)
Comment about #4: This is what I'm going for at the moment. Better no legend than a wrong legend. Shame about losing the colors though.
# 4. Remove the geom_abline legend AND colors
p + geom_abline(data = df, aes(intercept = Intercept, slope = Slope))
Comment #5: I don't know what I was hoping for here... that if I defined the data and aes inside the call to geom_point() rather than ggplot(), somehow geom_abline() would not hijack the colors and legend, but no, it does not appear to make a difference.
# 5. An attempt to set the aes inside geom_point() instead of ggplot()
ggplot() +
geom_point(data = mtcars, aes(x = mpg, y = wt, group = vs, color = factor(vs))) +
geom_abline(data = df, aes(intercept = Intercept, slope = Slope, color = "groups")) +
scale_color_manual(values = c("red", "blue", "black"))
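Both geom_point() and geom_abline() map to the same colour scale, so its levels are the union of the levels of factor(vs) ("0", "1") and Group ("1", "2", "3"). That gives four legend keys, and level "1" is shared by the points and one of the lines, which is why one entry looks merged. One option is to use a filled shape for the mtcars data; then you can have a fill scale and a colour scale, rather than one shared colour scale. You could add an option such as colour = "white" to the geom_point() statement to change the colour of the point edges, if you don't want the black outlines.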
library(ggplot2)
df = data.frame(Intercept = c(2,3,4), Slope = c(0,0,0), Group = factor(c(1, 2, 3)))
ggplot(data = mtcars, aes(x = mpg, y = wt, group = vs, fill = factor(vs))) +
geom_point(shape=21, size=2) +
geom_abline(data = df, aes(intercept = Intercept, slope = Slope,
colour = Group))
If you need or want a horizontal line in the legend keys (geom_abline() draws its keys as diagonal lines), you might consider using geom_hline() instead:
library(ggplot2)
df = data.frame(Intercept = c(2,3,4), Slope = c(0,0,0),
Group = factor(c(1, 2, 3)))
ggplot(data = mtcars,
aes(x = mpg, y = wt, group = vs, fill = factor(vs))) +
geom_point(shape=21, size=2) +
geom_hline(data = df,
aes(yintercept = Intercept,colour = Group))
plot with geom_hline
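The merged legend entry from attempt #2 can be checked directly. This sketch rebuilds that plot and inspects the colours ggplot2 assigns to each layer via ggplot_build(); it uses only mtcars and the df from the question.

```r
library(ggplot2)

# Same straight lines as in the question
df <- data.frame(Intercept = c(2, 3, 4), Slope = c(0, 0, 0), Group = factor(c(1, 2, 3)))

p <- ggplot(data = mtcars, aes(x = mpg, y = wt, group = vs, colour = factor(vs))) +
  geom_point() +
  geom_abline(data = df, aes(intercept = Intercept, slope = Slope, colour = Group))

b <- ggplot_build(p)
point_colours <- unique(b$data[[1]]$colour)  # one colour per level of factor(vs): "0", "1"
line_colours  <- b$data[[2]]$colour          # one colour per level of Group: "1", "2", "3"

# Both layers feed the SAME colour scale, whose levels are the union of
# c("0", "1") and c("1", "2", "3"). Only four distinct colours are used,
# and the line for Group "1" gets the same colour as the vs == 1 points.
n_distinct_colours <- length(unique(c(point_colours, line_colours)))
shared_colour <- line_colours[1] %in% point_colours
```

Keeping the points on fill (shape 21) and the lines on colour, as in the answer above, gives each layer its own scale, so no level can be shared.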

Plot multiple group histogram with overlaid line ggplot

I'm trying to plot a multiple group histogram with overlaid line, but I cannot get the right scaling for the histogram.
For example:
ggplot() + geom_histogram(data=df8,aes(x=log(Y),y=..density..),binwidth=0.15,colour='black') +
geom_line(data = as.data.frame(pdf8), aes(y=pdf8$f,x=pdf8$x), col = "black",size=1)+theme_bw()
produces the right scale. But when I try to fill according to groups, each group is scaled separately.
ggplot() + geom_histogram(data=df8,aes(x=log(Y),fill=vec8,y=..density..),binwidth=0.15,colour='black') +
geom_line(data = as.data.frame(pdf8), aes(y=pdf8$f,x=pdf8$x), col = "black",size=1)+theme_bw()
How can I scale it so that a black line is overlaid on the histogram and the y axis shows density?
It is going to be difficult for others to help you without a reproducible example, but perhaps something like this is what you're after:
library(ggplot2)
ggplot(data = mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram(aes(y = after_stat(density))) +
geom_line(stat = "density")
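If you would rather the density line pertain to the entire dataset, you need to move the fill aesthetic into the geom_histogram function. Note that the density statistic of the bars is still computed within each fill group, so each group's bars integrate to 1 on their own and the stacked bars together enclose an area equal to the number of groups rather than 1: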
ggplot(data = mtcars, aes(x = mpg)) +
geom_histogram(aes(y = after_stat(density), fill = factor(cyl))) +
geom_line(data = mtcars, stat = "density")
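If the stacked bars should instead share the overall density scale of the single black line, one option is to divide each group's bin counts by the total number of observations and by the bin width, instead of using the per-group density. This is a sketch assuming ggplot2 3.3.0 or later (for after_stat()), with mtcars standing in for the question's df8; the bin width bw = 2 is an arbitrary choice for this example.

```r
library(ggplot2)

# Hypothetical bin width (in mpg units) chosen for this sketch; the same value
# is used both in the y calculation and in binwidth.
bw <- 2

p <- ggplot(mtcars, aes(x = mpg)) +
  # fill splits the bars into groups, but y divides each group's bin count by
  # the count summed over ALL groups (after_stat() sees the whole layer's data,
  # here a single panel) and by the bin width, so the stacked bars form one
  # overall density with total area 1.
  geom_histogram(aes(y = after_stat(count / sum(count) / bw), fill = factor(cyl)),
                 binwidth = bw, colour = "black") +
  # A single density curve for the whole dataset (no fill or group mapped here).
  geom_line(stat = "density") +
  labs(y = "density", fill = "cyl")
```

With the question's data, the equivalent would presumably be aes(x = log(Y), fill = vec8) with bw <- 0.15.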

Resources