Changing axis limits of effects plot in R - r

I'm using the effects package to plot interaction effects of a linear regression like this:
library(effects)
Model <- lm(drat~hp*cyl, data=mtcars)
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE)
How do I change the y-axis limits so that they run from, say, 0 to 10? I've tried ylim=(0,10) and other variations, with no effect. Alternatively, can a regression be plotted in the same way using ggplot2?

And here is the ggplot2 version:
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
ef <- effect(term = "hp:cyl", Model, default.levels = 9) # 9 because the breaks are nicer
ef2 <- as.data.frame(ef)
ggplot(ef2, aes(hp, fit, col = factor(cyl))) +
geom_line() +
labs(y = 'drat') +
ylim(0, 10)

With the plot function, pass ylim as a numeric vector built with c() (ylim=(0,10) is not valid R syntax):
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE,ylim=c(0,10))
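In more recent releases of the effects package (4.0 and later, as far as I know), ylim, multiline and default.levels are treated as legacy arguments, and the documented route goes through axes, lines and xlevels instead. A sketch under that assumption; check ?plot.eff and ?effect for the version you have installed:

```r
library(effects)

Model <- lm(drat ~ hp * cyl, data = mtcars)

# xlevels takes over from default.levels:
# evaluate hp and cyl at 10 equally spaced values each
ef <- effect(term = "hp*cyl", mod = Model,
             xlevels = list(hp = 10, cyl = 10))

# axes$y$lim plays the role of ylim,
# lines$multiline the role of multiline
plot(ef,
     lines = list(multiline = TRUE),
     axes  = list(y = list(lim = c(0, 10))))
```

If the legacy call above still works for you without warnings, there is no need to switch.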

ggplot2 doesn't know how to deal with an object of class 'eff', so you need to convert your effects object into a data frame before plotting. You can then use group= inside aes() to get one line per group.
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
e<-effect(term="hp*cyl",mod=Model,default.levels=10)
ee<-data.frame(e)
ee$cyl<-factor(ee$cyl)
ggplot(ee, aes(x = hp, y = fit, group = cyl, colour = cyl)) +
geom_line() +
scale_y_continuous(limits = c(0,10))
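One caveat with ylim() and scale_y_continuous(limits = ...): both set the scale limits, so any rows falling outside the range are dropped before anything is drawn. If the aim is only to zoom the view, coord_cartesian() is a possible alternative. A small sketch on the raw mtcars data (not the effects data frame, so that the difference is visible), assuming only ggplot2 is installed:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(hp, drat, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Scale limits: observations with drat outside [3, 4] are removed
# before the regression lines are fitted (ggplot2 warns about this)
p_scale <- base + ylim(3, 4)

# Coordinate limits: all observations are kept, only the view is zoomed
p_zoom <- base + coord_cartesian(ylim = c(3, 4))
```

For the effects data, where every fitted value already lies inside 0 to 10, the two approaches should give the same picture.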

Related

slope of lines in interaction plot in ggplot2 does not match estimates

I am trying to plot the interaction effects from a multiple linear regression using ggplot2. However, the slope of the lines plotted do not match what they should be based on the estimates returned by the lm function.
Here is my code:
lm.sense <- lm(sense_of_belonging ~ active*mathEAL + MathID + comfort_speaking, data=Data)
library(ggplot2)
p.sense <- ggplot(lm.sense, aes(y=sense_of_belonging, x=active, color=mathEAL)) + geom_smooth(method="lm", se=FALSE)
Does ggplot not hold the other variables constant?
ggplot2 works with data.frames and doesn't naturally know what to do with an lm object. (Try plot(lm.sense) to see what base R offers here.)
Your ggplot call is using the underlying data from Data (tucked away inside your lm.sense object) to make a plot where x = active and y = sense_of_belonging. geom_smooth then fits its own linear regression to that plotted data, which ignores the other terms in your model, so its slopes are not the coefficients reported by lm.sense. Compare these two calls, which give the same result:
lm.mtcars <- lm(mpg ~ wt + cyl, data = mtcars)
ggplot(lm.mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
ggplot(mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
Depending on what you want to do, you could show some of the impact of other variables within your geom_smooth by referencing those:
ggplot(mtcars, aes(mpg, wt, color = as.character(cyl))) +
geom_point() + geom_smooth(method="lm", se=FALSE, fullrange = TRUE)
It would help to understand what kind of output you're hoping to generate to give more specific suggestions.
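To make the point about slopes concrete, here is a rough sketch in base R using the mtcars example above (not the asker's Data, which isn't available). It compares the slope geom_smooth would draw with the slope from a model that also adjusts for cyl, then builds a prediction grid with cyl held at its mean. Holding cyl at the mean is just one possible choice, and the names simple, adjusted and grid are made up for illustration:

```r
# geom_smooth(method = "lm") with aes(mpg, wt) fits wt ~ mpg on its own
simple <- lm(wt ~ mpg, data = mtcars)

# A model that also adjusts for cyl
adjusted <- lm(wt ~ mpg + cyl, data = mtcars)

# The two mpg slopes differ
coef(simple)["mpg"]
coef(adjusted)["mpg"]

# To draw the adjusted line, predict over a grid with cyl held constant
grid <- data.frame(
  mpg = seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 50),
  cyl = mean(mtcars$cyl)
)
grid$wt <- predict(adjusted, newdata = grid)
```

The grid can then be drawn with geom_line(data = grid, aes(mpg, wt)) on top of the scatter plot.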

geom_smooth for more than two variables

I want to create a colored scatter plot and display a (multiple) linear regression. At the moment my code looks like this (using the mtcars dataset as an example):
my.formula <- y ~ x
ggplot(mtcars, aes(x=mpg, y=cyl, color=(disp))) +
geom_point() +
geom_smooth(method=lm, se=FALSE) +
ggpmisc::stat_poly_eq(formula = my.formula,
aes(label = paste(..rr.label.., sep = "~~~")),
parse = TRUE) +
scale_colour_gradientn(colours=RColorBrewer::brewer.pal(9,"YlOrRd")) +
theme_bw()
However, I would also like to include the second variable (the one mapped to colour in the scatter plot) in the regression model. Does anybody have a suggestion on how to achieve this?
The idea would be to use a formula like my.formula <- y ~ x1 + x2, where x1 is mpg and x2 is disp, and to create e.g. a plot with the regression and the corresponding data, if possible in 2D (subplots would also be fine, as long as all the information is visible).
You can manually create the plot using stat_function and the fit from your model, as described in the ggiraphExtra vignette. However, that package also has a convenient wrapper that does exactly this.
library(ggiraphExtra)
mdl <- lm(data = mtcars, cyl ~ mpg + disp)
ggPredict(mdl)

Linear model diagnostics: smoothing line obtained in ggplot2 is different from the one obtained with base plot

I am trying to reproduce the diagnostics plots for a linear regression model using ggplot2. The smoothing line that I get is different from the one obtained using base plots or ggplot2::autoplot.
library(survival)
library(ggplot2)
model <- lm(wt.loss ~ meal.cal, data=lung)
## Fitted vs. residuals using base plot:
plot(model, which=1)
## Fitted vs. residuals using ggplot
model.frame <- fortify(model)
ggplot(model.frame, aes(.fitted, .resid)) + geom_point() + geom_smooth(method="loess", se=FALSE)
The smoothing line is different: the influence of the first few points is much larger with the loess method provided by ggplot. My question is: how can I reproduce the smoothing line obtained with plot() using ggplot2?
The red line in the original diagnostic plot is drawn with lowess, so you can calculate it yourself using the base function of the same name.
smoothed <- as.data.frame(with(model.frame, lowess(x = .fitted, y = .resid)))
ggplot(model.frame, aes(.fitted, .resid)) +
theme_bw() +
geom_point(shape = 1, size = 2) +
geom_hline(yintercept = 0, linetype = "dotted", col = "grey") +
geom_path(data = smoothed, aes(x = x, y = y), col = "red")
And the original, from plot(model, which=1), for comparison: [image not included]
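To see how far apart the two smoothers can be, here is a small base-R sketch. It uses a model on mtcars rather than the lung data (so it runs without the survival package), and it relies on the defaults of lowess() and loess(), which, as far as I can tell, are what plot.lm and geom_smooth(method = "loess") use respectively:

```r
fit <- lm(mpg ~ wt, data = mtcars)
d <- data.frame(.fitted = fitted(fit), .resid = resid(fit))
d <- d[order(d$.fitted), ]

# lowess(): the smoother behind the red line in plot(fit, which = 1)
lw <- lowess(d$.fitted, d$.resid)

# loess(): the function geom_smooth(method = "loess") fits with
lo <- predict(loess(.resid ~ .fitted, data = d))

# Largest gap between the two curves at the observed fitted values
max(abs(lw$y - lo))
```

A non-zero gap here is expected: the two functions use different algorithms and defaults, so tuning span in geom_smooth will generally not reproduce the lowess line exactly.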

plot linear regressions lines without interaction in ggplot2

This code plots regression lines with interactions in ggplot2:
library(ggplot2)
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() + stat_smooth(method = "lm")
Can lines without interactions be plotted with stat_smooth?
A workaround is to fit the model outside ggplot(), then compute predictions from it and add them to the original data frame. This adds the columns fit, lwr and upr.
mod<-lm(mpg~factor(cyl)+hp,data=mtcars)
mtcars<-cbind(mtcars,predict(mod,interval="confidence"))
Now you can use geom_line() with the fit values as y to add the three regression lines, and geom_ribbon() with lwr and upr to add the confidence intervals.
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() +
geom_line(aes(y=fit))+geom_ribbon(aes(ymin=lwr,ymax=upr),alpha=0.4)

How would you plot a box plot and specific points on the same plot?

We can draw box plot as below:
qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")
and point as:
qplot(factor(cyl), mpg, data = mtcars, geom = "point")
How would you combine both - but just to show a few specific points(say when wt is less than 2) on top of the box?
If you are trying to plot two geoms with two different datasets (boxplot for mtcars, points for a data.frame of literal values), this is a way to do it that makes your intent clear. This works with the current (Sep 2016) version of ggplot (ggplot2_2.1.0)
library(ggplot2)
ggplot() +
# box plot of mtcars (mpg vs cyl)
geom_boxplot(data = mtcars,
aes(x = factor(cyl), y= mpg)) +
# points of data.frame literal
geom_point(data = data.frame(x = factor(c(4,6,8)), y = c(15,20,25)),
aes(x=x, y=y),
color = 'red')
I threw in a color = 'red' for the set of points, so it's easy to distinguish them from the outlier points drawn by geom_boxplot.
Use + geom_point(...) on your qplot (just add a + geom_point() to get all the points plotted).
To plot selectively just select those points that you want to plot:
n <- nrow(mtcars)
# plot every second point
idx <- seq(1,n,by=2)
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( data=mtcars[idx,] ) # <-- only the rows in idx; x and y are inherited from qplot
If you know the points before-hand, you can feed them in directly e.g.:
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( data=data.frame(x=factor(c(4,6,8)),y=c(15,20,25)), aes(x=x,y=y) ) # plot (4,15),(6,20),...
You can show both by using ggplot() rather than qplot(). The syntax may be a little harder to understand, but you can usually get much more done. If you want to plot both the box plot and the points you can write:
boxpt <- ggplot(data = mtcars, aes(factor(cyl), mpg))
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(aes(factor(cyl), mpg))
I don't know what you mean by only plotting specific points on top of the box, but if you want a cheap (and probably not very smart) way of just showing points above the edge of the box, here it is:
library(plyr) # provides ddply(), .() and summarise()
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(data = ddply(mtcars, .(cyl),summarise, mpg = mpg[mpg > quantile(mpg, 0.75)]), aes(factor(cyl), mpg))
Basically it's the same thing except for the data supplied to geom_point is adjusted to include only the mpg numbers in the top quarter of the distribution by cylinder. In general I'm not sure this is good practice because I think people expect to see points beyond the whiskers only, but there you go.
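For the specific case mentioned in the question (points where wt is less than 2), one possible approach is to subset the data and hand the subset to geom_point. A minimal sketch, assuming a reasonably recent ggplot2; the names light and p are only for illustration:

```r
library(ggplot2)

# Rows to highlight: cars with wt below 2 (wt is in units of 1000 lbs)
light <- subset(mtcars, wt < 2)

# Box plot of all cars, with only the light cars overlaid as red points;
# geom_point inherits x and y from the ggplot() call
p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  geom_point(data = light, colour = "red")
```

Printing p draws the plot. Any other condition can be swapped into subset() in the same way.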
