geom_smooth for more than two variables - r

I want to create a colored scatter plot and display a (multiple) linear regression. At the moment my code looks like this (using the cars data-set as an example)
my.formula <- y ~ x
ggplot(mtcars, aes(x=mpg, y=cyl, color=(disp))) +
geom_point() +
geom_smooth(method=lm, se=FALSE) +
ggpmisc::stat_poly_eq(formula = my.formula,
aes(label = paste(..rr.label.., sep = "~~~")),
parse = TRUE) +
scale_colour_gradientn(colours=RColorBrewer::brewer.pal(9,"YlOrRd")) +
theme_bw()
However, I would like to include also the information from the second (coloured) information of the scatter plot in the regression model. Does anybody have a suggestion on how to achieve this?
The idea would be to use a formula like: my.formula <- y ~ x1 + x2 where x1 is mpg and x2 is disp. and to create e.g. a plot with the regression and the corresponding data if possible in 2D (also subplots would be possible to see all information)

You can manually create the plot using stat_function and the fit from your model, well described in this ggiraphExtra vignette. However, that package has a nice wrapper that can do exactly this.
library(ggiraphExtra)
mdl <- lm(data = mtcars, cyl ~ mpg + disp)
ggPredict(mdl)

Related

slope of lines in interaction plot in ggplot2 does not match estimates

I am trying to plot the interaction effects from a multiple linear regression using ggplot2. However, the slope of the lines plotted do not match what they should be based on the estimates returned by the lm function.
Here is my code:
lm.sense <- lm(sense_of_belonging ~ active*mathEAL + MathID + comfort_speaking, data=Data)
library(ggplot2)
p.sense <- ggplot(lm.sense, aes(y=sense_of_belonging, x=active, color=mathEAL)) + geom_smooth(method="lm", se=FALSE)```
Does ggplot not hold the other variables constant?
ggplot2 works with data.frames and doesn't naturally know what to do with an lm object. (Try plot(lm.sense) to see what base R offers here.)
Your ggplot call is using the underlying data from Data (tucked away inside your lm.sense object) to make a plot where x = active and y = sense_of_belonging. It uses that underlying data to do a linear regression that doesn't relate to the mathEAL, MathID, and comfort_speaking variables. Compare these: (they have the same result)
lm.mtcars <- lm(mpg ~ wt + cyl, data = mtcars)
ggplot(lm.mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
ggplot(mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
Depending on what you want to do, you could show some of the impact of other variables within your geom_smooth by referencing those:
ggplot(mtcars, aes(mpg, wt, color = as.character(cyl))) +
geom_point() + geom_smooth(method="lm", se=FALSE, fullrange = TRUE)
It would help to understand what kind of output you're hoping to generate to give more specific suggestions.

stat_smooth with different colors using geom_point

I want to plot two numeric variables against each other in a scatterplot and the points should have different colors for each category of another binary variable. I also want to have regression lines.
This is my straight forward code:
library(MASS)
library(ggplot2)
ggplot(cats, aes(Bwt, Hwt, color = Sex)) +
geom_point() +
stat_smooth(method = "lm")
However these are lines from two separate regressions.
I want to have the regression lines from the following regression:
lm(Hwt ~ Bwt + Sex, data = cats)
I've tried the following, but this doesn't work:
ggplot(cats, aes(Bwt, Hwt, color = Sex)) +
geom_point() +
stat_smooth(method = "lm", formula = Hwt ~ Bwt + Sex)
Is there an easy (!) way to achieve this?
It would be no problem for me to write a more complex code but that's not what I'm searching for.

Changing axis limits of effects plot in R

I'm using the effects package to plot interaction effects of a linear regression like this:
library(effects)
Model <- lm(drat~hp*cyl, data=mtcars)
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE)
How do I change the limits so that they go from say 0-10? I've tried with ylim=(0,10), and other variations with no effect. Alternatively, can a regression be plotted in the same way using ggplot2?
And here is the ggplot2 version:
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
ef <- effect(term = "hp:cyl", Model, default.levels = 9) # 9 because the breaks are nicer
ef2 <- as.data.frame(ef)
ggplot(ef2, aes(hp, fit, col = factor(cyl))) +
geom_line() +
labs(y = 'drat') +
ylim(0, 10)
With the plot function, set ylim like this
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE,ylim=c(0,10))
ggplot2 doesn't know how to deal with an data of class 'eff', so you need to convert your effects data into a dataframe before plotting. You can then use group= with your data inside aes() to get lines for each group.
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
e<-effect(term="hp*cyl",mod=Model,default.levels=10)
ee<-data.frame(e)
ee$cyl<-factor(ee$cyl)
ggplot(ee, aes(x = hp, y = fit, group = cyl, colour = cyl)) +
geom_line() +
scale_y_continuous(limits = c(0,10))

plot linear regressions lines without interaction in ggplot2

This code plots regression lines with interactions in ggplot2:
library(ggplot2)
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() + stat_smooth(method = "lm")
Can lines without interactions be plotted with stat_smooth?
Workaround would be to make model outside the ggplot(). Then make predicition for this model and add result to the original data frame. This will add columns fit, lwr and upr.
mod<-lm(mpg~factor(cyl)+hp,data=mtcars)
mtcars<-cbind(mtcars,predict(mod,interval="confidence"))
Now you can use geom_line() with fit values as y to add three regression lines and geom_ribbon() with lwr and upr to add confidence interval.
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() +
geom_line(aes(y=fit))+geom_ribbon(aes(ymin=lwr,ymax=upr),alpha=0.4)

Reduced major axis line and CI for ggplot in R

Is there anyway to add a reduced major axis line (and ideally CI) to a ggplot? I know I can use method="lm" to get an OLS fit, but there doesn't seem to be a default method for RMA. I can get the RMA coefs and the CI interval from package lmodel2, but adding them with geom_abline() doesn't seem to work. Here's dummy data and code. I just want to replace the OLS line and CI with a RMA line and CI:
dat <- data.frame(a=log10(rnorm(50, 30, 10)), b=log10(rnorm(50, 20, 2)))
ggplot(dat, aes(x=a, y=b) ) +
geom_point(shape=1) +
geom_smooth(method="lm")
Edit1: the code below gets the RMA (here called SMA - standardized major axis) coefs and CIs. Package lmodel2 provides more detailed output, while package smatr returns just the coefs and CIs, if that's any help:
library(lmodel2)
fit1 <- lmodel2(b ~ a, data=dat)
library(smatr)
fit2 <- line.cis(b, a, data=dat)
Building off Joran's answer, I think it's a little easier to pass the whole data frame to geom_abline:
library(ggplot2)
library(lmodel2)
dat <- data.frame(a=log10(rnorm(50, 30, 10)), b=log10(rnorm(50, 20, 2)))
mod <- lmodel2(a ~ b, data=dat,"interval", "interval", 99)
reg <- mod$regression.results
names(reg) <- c("method", "intercept", "slope", "angle", "p-value")
ggplot(dat) +
geom_point(aes(b, a)) +
geom_abline(data = reg, aes(intercept = intercept, slope = slope, colour = method))
As Chase commented, the actual lmodel2() code and the ggplot code you are using would be helpful. But here's an example that may point you in the right direction:
dat <- data.frame(a=log10(rnorm(50, 30, 10)), b=log10(rnorm(50, 20, 2)))
mod <- lmodel2(a ~ b, data=dat,"interval", "interval", 99)
#EDIT: mod is a list, with components (data.frames) regression.results and
# confidence.intervals containing the intercepts+slopes for different
# estimation methods; just put the right values into geom_abline
ggplot(dat,aes(x=b,y=a)) + geom_point() +
geom_abline(intercept=mod$regression.results[4,2],
slope=mod$regression.results[4,3],colour="blue") +
geom_abline(intercept=mod$confidence.intervals[4,2],
slope=mod$confidence.intervals[4,4],colour="red") +
geom_abline(intercept=mod$confidence.intervals[4,3],
slope=mod$confidence.intervals[4,5],colour="red") +
xlim(c(-10,10)) + ylim(c(-10,10))
Full disclosure: I know nothing about RMA regression, so I just plucked out the relevent slopes and intercepts and plopped them into geom_abline(), using some example code from lmodel2 as a guide. The CIs produced in this toy example don't seem to make much sense, since I had to force ggplot to zoom out using xlim() and ylim() in order to see the CI lines (red).
But maybe this will help you construct a working example in ggplot().
EDIT2: With OPs added code to extract the coefficients, the ggplot() would be something like this:
ggplot(dat,aes(x=b,y=a)) + geom_point() +
geom_abline(intercept=fit2[1,1],slope=fit2[2,1],colour="blue") +
geom_abline(intercept=fit2[1,2],slope=fit2[2,2],colour="red") +
geom_abline(intercept=fit2[1,3],slope=fit2[2,3],colour="red")
I found myself in the same situation.
Obtain fitted values and their confidence intervals using the ggpmisc package:
cibrary(ggpmisc)
ci <- predict.lmodel2(fit1, method= 'RMA', interval= "confidence")
Add the model predictions to your data:
datci <- cbind(dat, ci)
Plot using geom_smooth arguments like transparency and line width (of course, you can customize them)
p <- ggplot(datci, aes(x= b, y= a)) + geom_point() + geom_line(aes(x= b, y= a)), lwd= 1.1, alpha= 0.6)
Use geom_ribbon if you want to add confidence intervals:
p + geom_ribbon(aes(ymin= lwr, ymax= upr, fill= feather), alpha= 0.3, color= NA)

Resources