Adding a separate line of regression to ggplot in R

Suppose I have a data frame df with the following columns: height, weight, gender.
I am using ggplot to create a scatter graph.
ggplot(df, aes(x = height, y = weight , col = gender)) +
geom_point() +
theme_classic() +
geom_smooth(method = "lm", se = FALSE)
This plots two regression lines, one for males and one for females. I would like to add a third regression line for the overall relationship between height and weight, ignoring gender. How do I do this?

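You can add another geom_smooth layer with col mapped to a constant label inside aes(). This overrides the colour mapping inherited from ggplot() (aes(col = gender) in your code, Species in the example below), puts all observations back into one group and gives that group the label you want to use: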
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, col = Species)) +
geom_point() +
theme_classic() +
geom_smooth(method = "lm", se = FALSE) +
geom_smooth(aes(col = "Overall"), method = "lm", se = FALSE)
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
Created on 2020-12-11 by the reprex package (v0.3.0)
Edit: Following up on the questions below, this also works with scale_colour_lancet:
library(ggplot2)
library(dplyr)
iris %>%
  modelr::add_predictions(model = lm(Sepal.Length ~ Sepal.Width + Species,
                                     data = iris)) %>%
  ggplot(aes(x = Sepal.Width, y = Sepal.Length, col = Species)) +
  geom_point() +
  theme_classic() +
  geom_line(aes(y = pred)) +
  geom_smooth(aes(col = "Overall"), method = "lm", se = FALSE) +
  ggsci::scale_color_lancet()
#> `geom_smooth()` using formula 'y ~ x'
Created on 2020-12-12 by the reprex package (v0.3.0)
The geom_smooth used above fits lm(y ~ x) separately for each group, which in this case gives the same lines as lm(Sepal.Length ~ Sepal.Width * Species). To get the lines for lm(Sepal.Length ~ Sepal.Width + Species) instead, a simple way is to use modelr::add_predictions() to create a pred variable holding the fitted y values before passing the data into ggplot.
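As a rough check of the equivalence described above, here is a sketch that uses only base R. The object names by_species, fit_int, slopes_int and fit_add are mine, not from the answer. It compares the slopes from separate per-species fits with the slopes implied by the interaction model:

```r
# Separate simple regressions, one per species (what geom_smooth does per group)
by_species <- sapply(split(iris, iris$Species), function(d) {
  coef(lm(Sepal.Length ~ Sepal.Width, data = d))
})

# One model with a Sepal.Width x Species interaction
fit_int <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)
b <- coef(fit_int)

# Per-species slopes implied by the interaction model:
# baseline slope plus the species-specific slope adjustment
slopes_int <- c(
  setosa     = b[["Sepal.Width"]],
  versicolor = b[["Sepal.Width"]] + b[["Sepal.Width:Speciesversicolor"]],
  virginica  = b[["Sepal.Width"]] + b[["Sepal.Width:Speciesvirginica"]]
)

# Should be TRUE up to numerical tolerance
all.equal(by_species["Sepal.Width", ], slopes_int)

# The additive model has a single slope shared by all species
fit_add <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
coef(fit_add)[["Sepal.Width"]]
```

The additive model, by contrast, has only one Sepal.Width coefficient, so its three lines are parallel and differ only in their intercepts. That is why it cannot be drawn with a plain per-group geom_smooth.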

Related

How to add legend into ggplot with both by-group and combined effects?

I'm trying to create a plot that has an overall effect from the regression as well as by-group effects for the same regression. As an example, I have used this plot for the mtcars dataset:
#### Group x Total Effect Plot ####
mtcars %>%
ggplot(aes(x=disp,
y=wt))+
geom_point()+
geom_smooth(method = "lm",
se=F,
color = "gray",
aes(group=factor(am)))+
geom_smooth(method = "lm",
se=F)
Which looks like this:
However, I'd like to add a legend to the plot, as you normally get when mapping a factor inside aes(), but I'm unsure how to do this for the given example, because the total effect gets lost in the legend when I try:
#### Group by Total Effect Plot ####
mtcars %>%
ggplot(aes(x=disp,
y=wt))+
geom_point()+
geom_smooth(method = "lm",
se=F,
aes(color=factor(am)))+
geom_smooth(method = "lm",
se=F)
Is there a way I can artificially add the legend in some way? Or is there some workaround I'm not considering? My desired result is below:
You can do something like this:
library(ggplot2)
mtcars |>
ggplot(aes(x = disp, y = wt)) +
geom_point() +
geom_smooth(
aes(group = factor(am), colour = "Automatic/Manual Transmission"),
method = "lm",
se = FALSE
) +
geom_smooth(
aes(colour = "Total Effect"),
method = "lm",
se = FALSE
) +
scale_colour_manual(
values = c(
"Automatic/Manual Transmission" = "grey",
"Total Effect" = "blue"
)
) +
labs(colour = "Legend")
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
Created on 2022-10-18 with reprex v2.0.2
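If you would rather have a separate legend entry for each transmission type, one possible variation is sketched here. It is not part of the answer above: the labels and grey shades are arbitrary choices of mine, and it relies on the documented mtcars coding of am (0 = automatic, 1 = manual). It maps colour to a labelled factor in the by-group layer and to a constant string in the overall layer:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = disp, y = wt)) +
  geom_point() +
  # One line per transmission type, each with its own legend entry
  geom_smooth(
    aes(colour = factor(am, labels = c("Automatic", "Manual"))),
    method = "lm", formula = y ~ x, se = FALSE
  ) +
  # A constant label puts all rows in one group: the overall line
  geom_smooth(
    aes(colour = "Total Effect"),
    method = "lm", formula = y ~ x, se = FALSE
  ) +
  scale_colour_manual(
    values = c(
      "Automatic" = "grey40",
      "Manual" = "grey70",
      "Total Effect" = "blue"
    )
  ) +
  labs(colour = "Legend")

p
```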

Plotting in layers in R

I'm trying to plot individual regression lines for all of my experimental subjects (n=40) on the same plot where I show the overall regression line.
I can do the plots separately with ggplot, but I haven't found a way to superpose them on the same graph.
I can illustrate what I did with the iris data frame:
#first plot
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
# second plot, grouped by species
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, colour =Species)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
# and I've been trying things like this:
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic() +
geom_point(aes(x = Sepal.Width, y = Sepal.Length, colour =Species))) +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
which returns the message "Error: Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?", so I get that this is not the right way to combine them, but what is?
How can I combine both graphs in one?
Thanks in advance!
Duplicate the whole data set and, in the copy, set Species to a new label ("Together" in the example below). Append the copy to the original data and then just call your second plot on the combined data.
d1 = iris
d2 = rbind(d1, transform(d1, Species = "Together"))
ggplot(d2, aes(x = Sepal.Width, y = Sepal.Length, colour =Species)) +
stat_smooth(method = lm, se = FALSE) +
geom_point(data = d1) +
theme_classic()
Similar to @d.b's answer, consider expanding the data frame with rbind, assigning an "All" category for Species and adjusting the factor levels (so that "All" shows at the top of the legend):
new_species_level <- c("All", unique(as.character(iris$Species)))
iris_expanded <- rbind(
  transform(iris, Species = factor("All", levels = new_species_level)),
  transform(iris, Species = factor(Species, levels = new_species_level))
)
ggplot(iris_expanded, aes(x=Sepal.Width, y=Sepal.Length, colour=Species)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
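One thing to keep in mind with the expanded data frame is that every observation now appears twice, once under its own species and once under "All". A plain geom_point() on the expanded data therefore draws each point twice. A small sketch that combines the two answers above (my own combination, not taken from either answer) fits the lines on the expanded data and draws the points from the original iris data only:

```r
library(ggplot2)

new_species_level <- c("All", levels(iris$Species))
iris_expanded <- rbind(
  transform(iris, Species = factor("All", levels = new_species_level)),
  transform(iris, Species = factor(Species, levels = new_species_level))
)

# Each original row now appears twice: 150 rows under "All", 50 per species
table(iris_expanded$Species)

p <- ggplot(iris_expanded, aes(x = Sepal.Width, y = Sepal.Length, colour = Species)) +
  # Lines: fitted on the expanded data, so "All" gets its own line
  stat_smooth(method = lm, formula = y ~ x, se = FALSE) +
  # Points: original data only, so no observation is drawn twice
  geom_point(data = iris) +
  theme_classic()

p
```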

Scatter plot with horizontal lines representing averages with R and ggplot

The below code produces a scatter plot with regression lines for each group. Instead of the sloped regression lines is it possible to plot horizontal lines that represent the average of each group's y values? I tried modifying the formula parameter to "y ~ 0 *x" but can't think of anything else that's obvious to use.
Thanks
ggplot(data = iris, aes(y = Sepal.Length, x = Sepal.Width, colour = Species)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ x , se = F)
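We can specify the formula as y ~ 1. This intercept-only model fits a single constant, so each group's line is horizontal at the mean of that group's y values.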
library(ggplot2)
ggplot(data = iris, aes(y = Sepal.Length, x = Sepal.Width, colour = Species)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ 1)
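To see why y ~ 1 gives the group averages, here is a small sketch (the names group_means and intercepts are mine). It compares the intercept of an intercept-only lm within each species to the plain group mean, and then draws the same horizontal lines with geom_hline from the precomputed means as an alternative:

```r
library(ggplot2)

# Mean of y within each group
group_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# Intercept of an intercept-only lm fitted within each group
intercepts <- sapply(split(iris, iris$Species), function(d) {
  unname(coef(lm(Sepal.Length ~ 1, data = d)))
})

# Should be TRUE up to numerical tolerance
all.equal(group_means$Sepal.Length, unname(intercepts))

# Alternative: draw the precomputed means directly as horizontal lines
p <- ggplot(iris, aes(y = Sepal.Length, x = Sepal.Width, colour = Species)) +
  geom_point() +
  geom_hline(data = group_means,
             aes(yintercept = Sepal.Length, colour = Species))

p
```

Unlike geom_smooth, the geom_hline version spans the full width of the panel rather than just the x range of each group, and it has no confidence band.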

ggplot error: Found object is not a stat

ggplot() +
geom_point(aes(x = Africa_set$Africa_Predict, y = Africa_set$Africa_Real), color ="red") +
geom_line(aes(x = Africa_set$Africa_Predict, y = predict(simplelm, newdata = Africa_set)),color="blue") +
labs(title = "Africa Population",fill="") +
xlab("Africa_set$Africa_Predict") +
ylab("Africa_set$Africa_Real")
This then shows the error message:
Error: Found object is not a stat
How can I fix this error?
It looks like you are trying to plot points with a fitted regression line on top. You can do this using:
library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width)) +
geom_point() +
geom_smooth(method = "lm")
Or, if you really do want to use the model you've stored ahead of time in a simplelm object like you have in your example, you could use augment from the broom package:
library(ggplot2)
library(broom)
simplelm <- lm(Petal.Width ~ Petal.Length, data = iris)
ggplot(data = augment(simplelm),
aes(Petal.Length, Petal.Width)) +
geom_point() +
geom_line(aes(Petal.Length, .fitted), color = "blue")
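If you would rather not depend on broom, a similar result can be sketched with base R's predict() by storing the fitted values as a column of the plotting data first. The names plot_df and fitted are hypothetical and only for illustration. This keeps everything that aes() refers to inside one data frame:

```r
library(ggplot2)

simplelm <- lm(Petal.Width ~ Petal.Length, data = iris)

# Store the model's predictions next to the data they belong to
plot_df <- iris
plot_df$fitted <- predict(simplelm, newdata = plot_df)

p <- ggplot(plot_df, aes(Petal.Length, Petal.Width)) +
  geom_point(colour = "red") +
  # Columns are referred to by bare name, not as plot_df$...
  geom_line(aes(y = fitted), colour = "blue")

p
```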

multiple log regression models ggplot2

In the following example, I follow on from the following link, in which we learn the basics of creating a logistic regression model.
data(mtcars)
dat <- subset(mtcars, select=c(mpg, am, vs))
logr_vm <- glm(vs ~ mpg, data=dat, family=binomial)
library(ggplot2)
ggplot(dat, aes(x=mpg, y=vs)) + geom_point() +
stat_smooth(method="glm", method.args=list(family="binomial"), se=T) +
theme_bw()
Now I want to create a second log model where we predict a new outcome vs2.
How can I use ggplot2 to show the two models with different colours?
dat$vs2 <- with(dat, ifelse(mpg > 20, 1, vs))
so that the secondary log model is ....
logr_vm2 <- glm(vs2 ~ mpg, data=dat, family=binomial)
When you fit the models with ggplot itself and only have a few models, you can easily add a legend by manually mapping a model name to an aesthetic inside aes (fill in the code below). The rest will then be taken care of.
Additionally, I use geom_count instead of geom_point to show that we have overlapping values here, and add some colors to show your different categories:
ggplot(dat, aes(x = mpg)) +
geom_count(aes(y = vs, col = mpg > 20), alpha = 0.3) +
stat_smooth(aes(y = vs, fill = 'm1'), col = 'black',
method = "glm", method.args = list(family = "binomial")) +
stat_smooth(aes(y = vs2, fill = 'm2'), col = 'black',
method = "glm", method.args = list(family = "binomial")) +
scale_size_area() +
scale_color_discrete(h.start = 90) +
theme_bw()
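If you want to plot the models you have already fitted (logr_vm and logr_vm2) rather than let stat_smooth refit them, one option is to predict both on a grid of mpg values and stack the results in long format so that the model name can be mapped to colour. This is only a sketch: the names grid and curves, and the grid size of 100, are my own choices.

```r
library(ggplot2)

dat <- subset(mtcars, select = c(mpg, am, vs))
dat$vs2 <- with(dat, ifelse(mpg > 20, 1, vs))

logr_vm  <- glm(vs  ~ mpg, data = dat, family = binomial)
logr_vm2 <- glm(vs2 ~ mpg, data = dat, family = binomial)

# Evenly spaced mpg values over the observed range
grid <- data.frame(mpg = seq(min(dat$mpg), max(dat$mpg), length.out = 100))

# Predicted probabilities from each model, stacked in long format
curves <- rbind(
  data.frame(mpg = grid$mpg, model = "m1",
             prob = predict(logr_vm, newdata = grid, type = "response")),
  data.frame(mpg = grid$mpg, model = "m2",
             prob = predict(logr_vm2, newdata = grid, type = "response"))
)

p <- ggplot(curves, aes(x = mpg, y = prob, colour = model)) +
  geom_line() +
  theme_bw()

p
```

This version has no confidence bands. If you need them, the stat_smooth approach above is simpler.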
