I have a model:
lm(Y ~ A + B + X), where A and B are covariates.
I have plotted the raw data using:
ggplot(data = data, aes(x = X, y = Y)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
I would like to plot the data with the effects of the covariates A and B regressed out. Is there a way to do this?
Here is an example with one covariate. I leave it to you as an exercise to create a nice visualization with two covariates. I would probably use facets to illustrate this for different (constant) values of the second covariate.
fit <- lm(mpg ~ I(1/hp) + wt, data = mtcars)
summary(fit)
newdata <- expand.grid(hp = seq(50, 350, by = 1),
wt = 2:5)
newdata$mpg <- predict(fit, newdata = newdata)
library(ggplot2)
ggplot(mtcars, aes(x = hp, y = mpg, color = wt)) +
geom_point() +
geom_line(data = newdata, aes(group = wt))
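Another reading of "regressed out" is an added-variable (partial regression) plot: regress both Y and X on the covariates, then plot the two sets of residuals against each other. By the Frisch-Waugh-Lovell theorem, the slope of that residual-on-residual fit equals the coefficient of X in the full model lm(Y ~ A + B + X). The sketch below assumes mtcars as stand-in data, with hp playing X and wt and qsec playing the covariates A and B; those choices are for illustration only. The car package's avPlots() draws the same kind of plot in base graphics.

```r
library(ggplot2)

# Assumed stand-ins: mpg = Y, hp = X, wt and qsec = covariates A and B
full <- lm(mpg ~ wt + qsec + hp, data = mtcars)

# Regress the covariates out of both the outcome and the predictor
avp <- data.frame(
  hp_resid  = resid(lm(hp  ~ wt + qsec, data = mtcars)),
  mpg_resid = resid(lm(mpg ~ wt + qsec, data = mtcars))
)

# Frisch-Waugh-Lovell: this slope equals coef(full)["hp"]
partial <- lm(mpg_resid ~ hp_resid, data = avp)

p <- ggplot(avp, aes(x = hp_resid, y = mpg_resid)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "hp adjusted for wt and qsec (residuals)",
       y = "mpg adjusted for wt and qsec (residuals)")
p
```

Both axes are on a residual scale centred at zero, so the plot shows the adjusted relationship rather than the raw values.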
I like the neatness of using facet_wrap() or facet_grid() with ggplot, since the plots are all made the same size and are arranged row- and column-wise automatically.
I have a data frame, and I am experimenting with various transformations and their impact on fit as measured by R-squared:
dm1 <- lm(price ~ x, data = diamonds)
dm1R2 <- summary(dm1)$r.squared #0.78
dm2 <- lm(log(price) ~ x, data = diamonds)
dm2R2 <- summary(dm2)$r.squared # 0.9177831
dm3 <- lm(log(price) ~ x^2, data = diamonds)
dm3R2 <- summary(dm3)$r.squared # also 0.9177831. Aside, why?
ggplot(diamonds, aes(x = x, y = price)) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3.5, y = 10000, label = paste0('R-Squared: ', round(dm1R2, 3)))
ggplot(diamonds, aes(x = x, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 9, label = paste0('R-Squared: ', round(dm2R2, 3)))
ggplot(diamonds, aes(x = x^2, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 20, label = paste0('R-Squared: ', round(dm3R2, 3)))
This produces three completely separate plots. Within an Rmd file, they appear one after the other.
Is there a way to add them to a grid like when using facet_wrap?
You can use ggplot2's built-in faceting if you generate a "long" data frame from the regression model objects. The model object returned by lm includes the data used to fit the model, so we can extract the data and the r-squared for each model, stack them into a single data frame, and generate a faceted plot.
The disadvantage of this approach is that you lose the ability to easily set separate x-axis and y-axis titles for each panel, which is important, because the x and y values have different transformations in different panels. In an effort to mitigate that problem, I've used the model formulas as the facet labels.
Also, the reason you got the same R-squared for the models specified by log(price) ~ x and log(price) ~ x^2 is that R treats them as the same model: in a model formula, ^ denotes crossing of terms rather than exponentiation, and x^2 reduces to x. To tell R that you literally mean x squared, wrap it in the I() function, making the formula log(price) ~ I(x^2). Alternatively, log(price) ~ poly(x, 2, raw=TRUE) fits both x and x^2, which is a different model from one containing x^2 alone.
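The effect of ^ versus I() is visible in the terms R builds from each formula. This base R sketch uses made-up data (an assumption for illustration) to show that y ~ x and y ~ x^2 give the same R-squared, while y ~ I(x^2) is a genuinely different model.

```r
# In a formula, ^ means crossing to a given degree, not squaring
attr(terms(y ~ x^2), "term.labels")          # "x"
attr(terms(y ~ I(x^2)), "term.labels")       # "I(x^2)"
attr(terms(y ~ (a + b)^2), "term.labels")    # "a" "b" "a:b"

# Made-up data: because x^2 collapses to x, the first two fits are identical
set.seed(1)
d <- data.frame(x = runif(50, 1, 10))
d$y <- log(d$x^2) + rnorm(50, sd = 0.1)
r2 <- function(f) summary(lm(f, data = d))$r.squared
c(plain = r2(y ~ x), pow = r2(y ~ x^2), asis = r2(y ~ I(x^2)))
```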
library(tidyverse)
theme_set(theme_bw(base_size=14))
# Generate a small subset of the diamonds data frame
set.seed(2)
dsub = diamonds[sample(1:nrow(diamonds), 2000), ]
dm1 <- lm(price ~ x, data = dsub)
dm2 <- lm(log(price) ~ x, data = dsub)
dm3 <- lm(log(price) ~ I(x^2), data = dsub)
# Create long data frame from the three model objects
dat = list(dm1, dm2, dm3) %>%
map_df(function(m) {
tibble(r2=summary(m)$r.squared,
form=as_label(formula(m))) %>%
cbind(m[["model"]] %>% set_names(c("price","x")))
}, .id="Model") %>%
mutate(form=factor(form, levels=unique(form)))
# Create data subset for geom_text
text.dat = dat %>% group_by(form) %>%
summarise(x = quantile(x, 1),
price = quantile(price, 0.05),
r2=r2[1])
dat %>%
ggplot(aes(x, price)) +
geom_point(alpha=0.3, colour="red") +
geom_smooth(method="lm") +
geom_text(data=text.dat, parse=TRUE,
aes(label=paste0("r^2 ==", round(r2, 2))),
hjust=1, size=3.5, colour="grey30") +
facet_wrap(~ form, scales="free")
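The set_names() step in the code above matters because lm() stores its model frame in the model component with columns named after the deparsed terms, so each model's columns carry different names (for example log(price) or I(x^2)) and could not be stacked as they are. A small base R illustration on mtcars, an assumed example rather than the diamonds data:

```r
# Columns of the stored model frame are named after the terms themselves
fit <- lm(log(mpg) ~ I(wt^2), data = mtcars)
names(fit$model)   # "log(mpg)" "I(wt^2)"

# Giving every model frame the same column names is what allows stacking
mf <- setNames(fit$model, c("y", "x"))
head(mf)
```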
ggarrange from the ggpubr package can do this:
p1 = ggplot(diamonds, aes(x = x, y = price)) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3.5, y = 10000, label = paste0('R-Squared: ', round(dm1R2, 3)))
p2 = ggplot(diamonds, aes(x = x, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 9, label = paste0('R-Squared: ', round(dm2R2, 3)))
p3 = ggplot(diamonds, aes(x = x^2, y = log(price))) +
geom_point() +
geom_smooth(method = "lm", se = F) +
geom_text(x = 3, y = 20, label = paste0('R-Squared: ', round(dm3R2, 3)))
ggpubr::ggarrange(p1, p2, p3, ncol = 2, nrow = 2, align = "hv")
Other packages suggested in the comments, such as cowplot and patchwork, also offer good options for this.
I'm trying to plot individual regression lines for all of my experimental subjects (n=40) on the same plot where I show the overall regression line.
I can do the plots separately with ggplot, but I haven't found a way to superpose them on the same graph.
I can illustrate what I did with the iris data frame:
#first plot
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
# second plot, grouped by species
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, colour =Species)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
# and I've been trying things like this:
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic() +
geom_point(aes(x = Sepal.Width, y = Sepal.Length, colour =Species))) +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
which returns the message "Error: Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?", so I get that this is not the right way to combine them, but what is?
How can I combine both graphs in one?
Thanks in advance!
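Duplicate the whole data set, setting Species in the copy to a new value ("Together" in the example below). Bind the copy to the original data and reuse the second plot: the smooth for the "Together" group is the overall regression line, and drawing the points from the original data avoids plotting them twice.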
d1 = iris
d2 = rbind(d1, transform(d1, Species = "Together"))
ggplot(d2, aes(x = Sepal.Width, y = Sepal.Length, colour =Species)) +
stat_smooth(method = lm, se = FALSE) +
geom_point(data = d1) +
theme_classic()
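A sketch of an alternative that avoids duplicating the data (a suggestion beyond the answer above): map colour only in the layers that should be split by Species, and add one more smooth with no grouping aesthetic, which ggplot2 then fits to all rows. For the 40 subjects, Species would be replaced by the subject identifier column, whose name is unknown here since the real data were not posted.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(colour = Species)) +
  # One line per species: colour is mapped in this layer only
  stat_smooth(aes(colour = Species), method = "lm", formula = y ~ x, se = FALSE) +
  # No grouping aesthetic here, so this line is fitted to all 150 rows
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  theme_classic()
p
```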
Similar to @d.b's answer, consider expanding the data frame with rbind(), assigning an "All" category to Species, and adjusting the factor levels so that All appears first in the legend:
new_species_level <- c("All", unique(as.character(iris$Species)))
iris_expanded <- rbind(transform(iris, Species=factor("All", levels=new_species_level)),
transform(iris, Species=factor(Species, levels=new_species_level)))
ggplot(iris_expanded, aes(x=Sepal.Width, y=Sepal.Length, colour=Species)) +
geom_point() +
stat_smooth(method = lm, se = FALSE) +
theme_classic()
I have a data set (dat) with raw data (raw_x and raw_y). I have fitted a model, and its predictions are stored in dat$predict.
I wish to plot the raw data and overlay a geom_smooth (here a quadratic function), but using the predicted values. This is my attempt at the basic code; I am not yet sure how to use the predicted values in geom_smooth.
ggplot(dat, aes(x = raw_x, y = raw_y, colours = "red")) +
geom_point() +
theme_bw() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2))
The following plots the original points, the fitted quadratic curve and the fitted points. I use made-up data, since you have not posted any.
set.seed(1234)
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100, sd = 50)
dat <- data.frame(raw_x = x, raw_y = y)
fit <- lm(raw_y ~ raw_x + I(raw_x^2), data = dat)
dat$predict <- predict(fit)
ggplot(dat, aes(x = raw_x, y = raw_y)) +
geom_point(colour = "blue") +
theme_bw() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), colour = "red") +
geom_point(aes(y = predict), colour = "black")
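If the curve should come from the model's own predictions rather than from a refit inside geom_smooth(), one option is to predict on an evenly spaced grid of raw_x values and draw the result with geom_line(). This sketch regenerates the made-up data from the answer; the model is fitted with the data frame's column names so that predict() can evaluate it on new raw_x values, and the grid size of 200 points is an arbitrary choice.

```r
library(ggplot2)

set.seed(1234)
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100, sd = 50)
dat <- data.frame(raw_x = x, raw_y = y)

# Fitting on the data frame's columns lets predict() accept new raw_x values
fit <- lm(raw_y ~ raw_x + I(raw_x^2), data = dat)

# Evenly spaced grid over the observed range gives a smooth curve
grid <- data.frame(raw_x = seq(min(dat$raw_x), max(dat$raw_x), length.out = 200))
grid$predict <- predict(fit, newdata = grid)

p <- ggplot(dat, aes(x = raw_x, y = raw_y)) +
  geom_point(colour = "blue") +
  geom_line(data = grid, aes(y = predict), colour = "red") +
  theme_bw()
p
```

Because this is the same quadratic least-squares fit, the red line should coincide with the geom_smooth() curve from the answer.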
The code below produces a scatter plot with a regression line for each group. Instead of sloped regression lines, is it possible to plot horizontal lines that represent the mean of each group's y values? I tried changing the formula argument to "y ~ 0 * x" but can't think of anything else obvious to use.
Thanks
ggplot(data = iris, aes(y = Sepal.Length, x = Sepal.Width, colour = Species)) + geom_point() +
geom_smooth(method = 'lm', formula = y ~ x , se = F)
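We can specify the formula as y ~ 1. That is an intercept-only model, whose fitted value is the group mean, so each line is horizontal at the mean Sepal.Length of its species.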
library(ggplot2)
ggplot(data = iris, aes(y = Sepal.Length, x = Sepal.Width, colour = Species)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ 1)
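To check numerically that the horizontal lines sit at the group means, this base R sketch fits the intercept-only model to each species separately (mirroring what the colour grouping does inside geom_smooth()) and compares the intercepts with the means from tapply().

```r
# One intercept-only fit per species
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Sepal.Length ~ 1, data = d))
intercepts  <- sapply(fits, function(m) unname(coef(m)[1]))
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Side by side: each intercept equals that species' mean Sepal.Length
cbind(intercept = intercepts, mean = as.vector(group_means))
```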
In the following example, I build on a linked post that covers the basics of creating a logistic regression model.
data(mtcars)
dat <- subset(mtcars, select=c(mpg, am, vs))
logr_vm <- glm(vs ~ mpg, data=dat, family=binomial)
library(ggplot2)
ggplot(dat, aes(x=mpg, y=vs)) + geom_point() +
stat_smooth(method="glm", method.args=list(family="binomial"), se=T) +
theme_bw()
Now I want to create a second logistic model that predicts a new outcome, vs2.
How can I use ggplot2 to show the two models with different colours?
dat$vs2 <- with(dat, ifelse(mpg > 20, 1, vs))
so that the second logistic model is:
logr_vm2 <- glm(vs2 ~ mpg, data=dat, family=binomial)
When ggplot fits the models itself and you only have a few of them, you can add a legend by mapping a constant model name to an aesthetic inside aes() (here fill, which colours the confidence bands, since the line colour is fixed to black). ggplot2 takes care of the rest.
Additionally, I use geom_count instead of geom_point to show that there are overlapping values here, and colour the points by mpg > 20, the condition used to define vs2:
ggplot(dat, aes(x = mpg)) +
geom_count(aes(y = vs, col = mpg > 20), alpha = 0.3) +
stat_smooth(aes(y = vs, fill = 'm1'), col = 'black',
method = "glm", method.args = list(family = "binomial")) +
stat_smooth(aes(y = vs2, fill = 'm2'), col = 'black',
method = "glm", method.args = list(family = "binomial")) +
scale_size_area() +
scale_color_discrete(h.start = 90) +
theme_bw()
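If you would rather use the fitted objects logr_vm and logr_vm2 directly, a sketch is to predict probabilities from each model on a common mpg grid, stack them into one data frame with a model column, and map that column to colour. The 100-point grid and the model labels are arbitrary choices for illustration.

```r
library(ggplot2)

dat <- subset(mtcars, select = c(mpg, am, vs))
dat$vs2 <- with(dat, ifelse(mpg > 20, 1, vs))

logr_vm  <- glm(vs  ~ mpg, data = dat, family = binomial)
logr_vm2 <- glm(vs2 ~ mpg, data = dat, family = binomial)

# Predicted probabilities from both models on the same mpg grid
grid <- data.frame(mpg = seq(min(dat$mpg), max(dat$mpg), length.out = 100))
curves <- rbind(
  data.frame(grid, model = "vs ~ mpg",
             prob = unname(predict(logr_vm,  newdata = grid, type = "response"))),
  data.frame(grid, model = "vs2 ~ mpg",
             prob = unname(predict(logr_vm2, newdata = grid, type = "response")))
)

p <- ggplot(dat, aes(x = mpg)) +
  geom_point(aes(y = vs), alpha = 0.4) +
  geom_line(data = curves, aes(y = prob, colour = model)) +
  theme_bw()
p
```

Predicting yourself also makes it easy to add more models later: each one is just another block of rows in curves.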