slope of lines in interaction plot in ggplot2 does not match estimates - r

I am trying to plot the interaction effects from a multiple linear regression using ggplot2. However, the slopes of the plotted lines do not match what they should be, based on the estimates returned by lm().
Here is my code:
lm.sense <- lm(sense_of_belonging ~ active*mathEAL + MathID + comfort_speaking, data=Data)
library(ggplot2)
p.sense <- ggplot(lm.sense, aes(y = sense_of_belonging, x = active, color = mathEAL)) +
  geom_smooth(method = "lm", se = FALSE)
Does ggplot not hold the other variables constant?

ggplot2 works with data frames and doesn't naturally know what to do with an lm object. (Try plot(lm.sense) to see what base R offers here.)
Your ggplot call is using the underlying data from Data (tucked away inside your lm.sense object) to make a plot where x = active and y = sense_of_belonging. geom_smooth() then fits its own simple regression of sense_of_belonging on active within each colour group (one per level of mathEAL, if it is a factor). Those regressions know nothing about MathID and comfort_speaking, so nothing is held constant and the slopes can differ from the lm estimates. Compare these two calls, which produce the same plot:
lm.mtcars <- lm(mpg ~ wt + cyl, data = mtcars)
ggplot(lm.mtcars, aes(mpg, wt)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE)
ggplot(mtcars, aes(mpg, wt)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE)
Depending on what you want to do, you could show some of the impact of other variables within your geom_smooth by mapping them to an aesthetic such as colour:
ggplot(mtcars, aes(mpg, wt, color = as.character(cyl))) +
  geom_point() + geom_smooth(method = "lm", se = FALSE, fullrange = TRUE)
It would help to understand what kind of output you're hoping to generate to give more specific suggestions.
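If the goal is lines that do hold the other predictors constant, one common approach is to skip geom_smooth() and plot predictions from the fitted model instead: build a grid over the focal predictor and the moderator, fix the remaining covariates at a chosen value (their sample means here), and draw the predict() output with geom_line(). Because Data isn't available, this sketch uses mtcars as a stand-in, with wt playing the role of active, a factor am the role of mathEAL, and hp and qsec the role of the other covariates; the grid size and the use of means are arbitrary choices.

```r
library(ggplot2)

# Stand-in for the question's model: wt ~ active, am ~ mathEAL,
# hp and qsec ~ the other covariates (assumed mapping, for illustration)
mt <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit <- lm(mpg ~ wt * am + hp + qsec, data = mt)

# Prediction grid: vary wt and am, hold hp and qsec at their sample means
grid <- expand.grid(
  wt = seq(min(mt$wt), max(mt$wt), length.out = 50),
  am = levels(mt$am)
)
grid$hp   <- mean(mt$hp)
grid$qsec <- mean(mt$qsec)
grid$mpg  <- predict(fit, newdata = grid)

# Raw data as points; lines come from the model, not from geom_smooth()
p <- ggplot(mt, aes(wt, mpg, colour = am)) +
  geom_point() +
  geom_line(data = grid)
print(p)
```

With this construction the slope of each line is exactly the model's slope for that group: coef(fit)["wt"] for the reference level and coef(fit)["wt"] + coef(fit)["wt:ammanual"] for the other.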

Related

geom_smooth for more than two variables

I want to create a colored scatter plot and display a (multiple) linear regression. At the moment my code looks like this (using the mtcars data set as an example):
library(ggplot2)
my.formula <- y ~ x
ggplot(mtcars, aes(x = mpg, y = cyl, color = disp)) +
  geom_point() +
  geom_smooth(method = lm, formula = my.formula, se = FALSE) +
  ggpmisc::stat_poly_eq(formula = my.formula,
                        aes(label = after_stat(rr.label)),
                        parse = TRUE) +
  scale_colour_gradientn(colours = RColorBrewer::brewer.pal(9, "YlOrRd")) +
  theme_bw()
However, I would also like to include the second (colour-coded) variable of the scatter plot in the regression model. Does anybody have a suggestion on how to achieve this?
The idea would be to use a formula like my.formula <- y ~ x1 + x2, where x1 is mpg and x2 is disp, and to create a plot showing the regression together with the corresponding data, ideally in 2D (subplots showing all the information would also be fine).
You can create the plot manually using stat_function() and the fit from your model, as described in this ggiraphExtra vignette. However, that package also has a wrapper, ggPredict(), that does exactly this:
library(ggiraphExtra)
mdl <- lm(data = mtcars, cyl ~ mpg + disp)
ggPredict(mdl)
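The manual route can also be sketched without ggiraphExtra by predicting from the model on a grid. This version swaps stat_function() for predict() plus geom_line() and draws one fitted line at each of three fixed disp values (its quartiles, an arbitrary choice made for illustration). Because the model has no interaction term, the lines come out parallel.

```r
library(ggplot2)

mdl <- lm(cyl ~ mpg + disp, data = mtcars)

# Fitted lines at three fixed values of disp (its quartiles)
grid <- expand.grid(
  mpg  = seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 50),
  disp = unname(quantile(mtcars$disp, c(0.25, 0.5, 0.75)))
)
grid$cyl <- predict(mdl, newdata = grid)

# Continuous colour does not split groups, so group = disp is set explicitly
p <- ggplot(mtcars, aes(mpg, cyl, colour = disp)) +
  geom_point() +
  geom_line(data = grid, aes(group = disp)) +
  scale_colour_distiller(palette = "YlOrRd", direction = 1) +
  theme_bw()
print(p)
```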

stat_smooth with different colors using geom_point

I want to plot two numeric variables against each other in a scatterplot and the points should have different colors for each category of another binary variable. I also want to have regression lines.
This is my straightforward code:
library(MASS)
library(ggplot2)
ggplot(cats, aes(Bwt, Hwt, color = Sex)) +
  geom_point() +
  stat_smooth(method = "lm")
However, these are the lines from two separate regressions, one per sex.
I want to have the regression lines from the following regression:
lm(Hwt ~ Bwt + Sex, data = cats)
I've tried the following, but this doesn't work:
ggplot(cats, aes(Bwt, Hwt, color = Sex)) +
  geom_point() +
  stat_smooth(method = "lm", formula = Hwt ~ Bwt + Sex)
Is there an easy (!) way to achieve this?
I could write more complex code, but that's not what I'm looking for.
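stat_smooth() fits a separate model for each colour group, and its formula can only refer to x and y, so it cannot include Sex as a term. One fairly short way to get the lines from lm(Hwt ~ Bwt + Sex) is to fit that model outside ggplot, store its fitted values, and draw them with geom_line(). This is a sketch of one approach; the names mod, cats_fit and fit are illustrative choices.

```r
library(MASS)     # provides the cats data set
library(ggplot2)

# Fit the additive model once, outside ggplot
mod <- lm(Hwt ~ Bwt + Sex, data = cats)
cats_fit <- transform(cats, fit = predict(mod))

# Points as before; lines are the model's fitted values for each Sex
p <- ggplot(cats_fit, aes(Bwt, Hwt, colour = Sex)) +
  geom_point() +
  geom_line(aes(y = fit))
print(p)
```

Because Bwt enters the model without an interaction, the two lines share the slope coef(mod)["Bwt"] and differ only in intercept.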

Changing axis limits of effects plot in R

I'm using the effects package to plot interaction effects of a linear regression like this:
library(effects)
Model <- lm(drat~hp*cyl, data=mtcars)
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE)
How do I change the limits so that they go from, say, 0 to 10? I've tried ylim=(0,10) and other variations, with no effect. Alternatively, can a regression be plotted in the same way using ggplot2?
And here is the ggplot2 version:
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
ef <- effect(term = "hp:cyl", Model, default.levels = 9) # 9 because the breaks are nicer
ef2 <- as.data.frame(ef)
ggplot(ef2, aes(hp, fit, col = factor(cyl))) +
  geom_line() +
  labs(y = 'drat') +
  ylim(0, 10)
With the plot() function, set ylim like this:
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE,ylim=c(0,10))
ggplot2 doesn't know how to deal with data of class 'eff', so you need to convert your effects object into a data frame before plotting. You can then use group= inside aes() to get one line per group.
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
e <- effect(term = "hp*cyl", mod = Model, default.levels = 10)
ee <- data.frame(e)
ee$cyl <- factor(ee$cyl)
ggplot(ee, aes(x = hp, y = fit, group = cyl, colour = cyl)) +
  geom_line() +
  scale_y_continuous(limits = c(0, 10))
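For the ggplot2 part of the question, the effects package isn't strictly needed: predict() on a grid of hp and cyl values gives the same kind of lines. This sketch uses the observed cyl values (4, 6, 8) rather than the evenly spaced levels effect() generates, and it uses coord_cartesian() instead of ylim(). ylim() and scale limits drop data outside the range before drawing, whereas coord_cartesian() only zooms the view, which matters if part of a line falls outside 0 to 10.

```r
library(ggplot2)

Model <- lm(drat ~ hp * cyl, data = mtcars)

# Grid: hp across its observed range, cyl at its observed values
grid <- expand.grid(
  hp  = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50),
  cyl = sort(unique(mtcars$cyl))
)
grid$fit <- predict(Model, newdata = grid)

p <- ggplot(grid, aes(hp, fit, colour = factor(cyl))) +
  geom_line() +
  labs(y = "drat", colour = "cyl") +
  coord_cartesian(ylim = c(0, 10))  # zooms without removing data
print(p)
```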

plot linear regressions lines without interaction in ggplot2

This code plots regression lines with interactions in ggplot2:
library(ggplot2)
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() + stat_smooth(method = "lm")
Can lines without interactions be plotted with stat_smooth?
A workaround is to fit the model outside ggplot(), then make predictions from it and add them to the original data frame. This adds the columns fit, lwr and upr.
mod<-lm(mpg~factor(cyl)+hp,data=mtcars)
mtcars<-cbind(mtcars,predict(mod,interval="confidence"))
Now you can use geom_line() with the fit values as y to add the three regression lines, and geom_ribbon() with lwr and upr to add the confidence intervals.
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() +
  geom_line(aes(y = fit)) + geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4)

correlation values in a facet grid from ggplot2

When using facet_grid in ggplot2, I would like to show the value of the correlation for the subsetted data in the top right corner of each grid cell.
e.g. if running:
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + facet_grid(vs ~ am, margins=TRUE)
I would like to see the value for correlation for each of the 9 plots in the grid somewhere. In this specific case from the example, I would expect each to be close to -0.9 or so from visual inspection.
Or perhaps an output table to go with the plot that gives the correlation values for each of the cells in the table matching up with the facet_grid...(this is less desirable but also an option).
Ideally, I would like to extend this to any other function I choose, so that it can use either or both of the plotted variables to calculate statistics.
Is this possible?
Winston Chang suggested an answer on the ggplot2 mailing list. This is what he said (it's not a bad answer):
You could do something like this:
library(ggplot2)
library(plyr)  # for ddply() and summarise()
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
# Calculate correlation for each group
cors <- ddply(mtcars, c("vs", "am"), summarise, cor = round(cor(mpg, wt), 2))
p + facet_grid(vs ~ am) +
  geom_text(data = cors, aes(label = paste("r=", cor, sep = "")), x = 30, y = 4)
I don't think it's possible to make this come out correctly with margins=TRUE, though. If you want the margins, you may need to preprocess your data to add an ALL value for each faceting variable.
-Winston
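Winston's suggestion of adding an ALL value for each faceting variable can be sketched as follows: the data are stacked with "(all)" copies so that the margins become ordinary panels, and each of the nine panels then gets its own correlation label. The "(all)" label, the base-R replacement for ddply(), and the label position are assumptions made for illustration.

```r
library(ggplot2)

# Stack "(all)" copies so the margins become ordinary facet panels
d <- transform(mtcars, vs = as.character(vs), am = as.character(am))
d_all <- rbind(
  d,
  transform(d, vs = "(all)"),
  transform(d, am = "(all)"),
  transform(d, vs = "(all)", am = "(all)")
)
# Put the margin panels last
d_all$vs <- factor(d_all$vs, levels = c("0", "1", "(all)"))
d_all$am <- factor(d_all$am, levels = c("0", "1", "(all)"))

# One correlation per panel (base R instead of plyr::ddply)
cors <- do.call(rbind, lapply(
  split(d_all, list(d_all$vs, d_all$am), drop = TRUE),
  function(s) data.frame(vs = s$vs[1], am = s$am[1],
                         cor = round(cor(s$mpg, s$wt), 2))
))

p <- ggplot(d_all, aes(mpg, wt)) +
  geom_point() +
  facet_grid(vs ~ am) +
  geom_text(data = cors, aes(label = paste0("r = ", cor)),
            x = 33, y = 5.3, hjust = 1)
print(p)
```

Swapping cor() for another function of s$mpg and/or s$wt inside the lapply() call gives other per-panel statistics.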
I would rather add a (linear) smoother to the data. It gives you a lot more information than a correlation.
ggplot(mtcars, aes(mpg, wt)) +
  geom_smooth(method = "loess", colour = "red", fill = "red") +
  geom_smooth(method = "lm", colour = "blue", fill = "blue") +
  geom_point() + facet_grid(vs ~ am, margins = TRUE)
ggplot(mtcars, aes(mpg, wt)) + geom_smooth(method = "lm") + geom_point() +
  facet_grid(vs ~ am, margins = TRUE)
