This code plots regression lines with interactions in ggplot2:
library(ggplot2)
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() + stat_smooth(method = "lm")
Can lines without interactions be plotted with stat_smooth?
Workaround would be to make model outside the ggplot(). Then make predicition for this model and add result to the original data frame. This will add columns fit, lwr and upr.
mod<-lm(mpg~factor(cyl)+hp,data=mtcars)
mtcars<-cbind(mtcars,predict(mod,interval="confidence"))
Now you can use geom_line() with fit values as y to add three regression lines and geom_ribbon() with lwr and upr to add confidence interval.
ggplot(mtcars, aes(hp, mpg, group = cyl)) + geom_point() +
geom_line(aes(y=fit))+geom_ribbon(aes(ymin=lwr,ymax=upr),alpha=0.4)
Related
I am trying to plot the interaction effects from a multiple linear regression using ggplot2. However, the slope of the lines plotted do not match what they should be based on the estimates returned by the lm function.
Here is my code:
lm.sense <- lm(sense_of_belonging ~ active*mathEAL + MathID + comfort_speaking, data=Data)
library(ggplot2)
p.sense <- ggplot(lm.sense, aes(y=sense_of_belonging, x=active, color=mathEAL)) + geom_smooth(method="lm", se=FALSE)```
Does ggplot not hold the other variables constant?
ggplot2 works with data.frames and doesn't naturally know what to do with an lm object. (Try plot(lm.sense) to see what base R offers here.)
Your ggplot call is using the underlying data from Data (tucked away inside your lm.sense object) to make a plot where x = active and y = sense_of_belonging. It uses that underlying data to do a linear regression that doesn't relate to the mathEAL, MathID, and comfort_speaking variables. Compare these: (they have the same result)
lm.mtcars <- lm(mpg ~ wt + cyl, data = mtcars)
ggplot(lm.mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
ggplot(mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
Depending on what you want to do, you could show some of the impact of other variables within your geom_smooth by referencing those:
ggplot(mtcars, aes(mpg, wt, color = as.character(cyl))) +
geom_point() + geom_smooth(method="lm", se=FALSE, fullrange = TRUE)
It would help to understand what kind of output you're hoping to generate to give more specific suggestions.
I'm using the effects package to plot interaction effects of a linear regression like this:
library(effects)
Model <- lm(drat~hp*cyl, data=mtcars)
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE)
How do I change the limits so that they go from say 0-10? I've tried with ylim=(0,10), and other variations with no effect. Alternatively, can a regression be plotted in the same way using ggplot2?
And here is the ggplot2 version:
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
ef <- effect(term = "hp:cyl", Model, default.levels = 9) # 9 because the breaks are nicer
ef2 <- as.data.frame(ef)
ggplot(ef2, aes(hp, fit, col = factor(cyl))) +
geom_line() +
labs(y = 'drat') +
ylim(0, 10)
With the plot function, set ylim like this
plot(effect(term="hp*cyl",mod=Model,default.levels=10),multiline=TRUE,ylim=c(0,10))
ggplot2 doesn't know how to deal with an data of class 'eff', so you need to convert your effects data into a dataframe before plotting. You can then use group= with your data inside aes() to get lines for each group.
library(effects)
library(ggplot2)
Model <- lm(drat~hp*cyl, data=mtcars)
e<-effect(term="hp*cyl",mod=Model,default.levels=10)
ee<-data.frame(e)
ee$cyl<-factor(ee$cyl)
ggplot(ee, aes(x = hp, y = fit, group = cyl, colour = cyl)) +
geom_line() +
scale_y_continuous(limits = c(0,10))
I am trying to create a plot with facets. Each facet should have its own scale, but for ease of visualization I would like each facet to show a fixed y point. Is this possible with ggplot?
This is an example using the mtcars dataset. I plot the weight (wg) as a function of the number of miles per gallon (mpg). The facets represent the number of cylinders of each car. As you can see, I would like the y scales to vary across facets, but still have a reference point (3, in the example) at the same height across facets. Any suggestions?
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
[EDIT: in my actual data, the fixed reference point should be at y = 0. I used y = 3 in the example above because 0 didn't make sense for the range of the data points in the example]
It's unclear where the line should be, let's assume in the middle; you could compute limits outside ggplot, and add a dummy layer to set the scales,
library(ggplot2)
library(plyr)
# data frame where 3 is the middle
# 3 = (min + max) /2
dummy <- ddply(mtcars, "cyl", summarise,
min = 6 - max(wt),
max = 6 - min(wt))
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_blank(data=dummy, aes(y=min, x=Inf)) +
geom_blank(data=dummy, aes(y=max, x=Inf)) +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
When using a facet_grid in ggplot2 I would like to be able to have value of the correlation for the subsetted data for each grid cell in the top right corner of the specific plot.
e.g. if running:
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + facet_grid(vs ~ am, margins=TRUE)
I would like to see the value for correlation for each of the 9 plots in the grid somewhere. In this specific case from the example, I would expect each to be close to -0.9 or so from visual inspection.
Or perhaps an output table to go with the plot that gives the correlation values for each of the cells in the table matching up with the facet_grid...(this is less desirable but also an option).
Ideally I would like to extend this to any other function I choose that so that it can use either or both of the two variables plotted to calculate statistics.
Is this possible?
Thanks in advance
Winston Chang suggested an answer on the ggplot2 group...this is what he said...its not a bad answer...
You could do something like this:
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
# Calculate correlation for each group
cors <- ddply(mtcars, c("vs", "am"), summarise, cor = round(cor(mpg, wt), 2))
p + facet_grid(vs ~ am) +
geom_text(data=cors, aes(label=paste("r=", cor, sep="")), x=30, y=4)
I don't think it's possible to make this come out correctly with margins=TRUE, though. If you want the margins, you may need to preprocess your data to add an ALL value for each faceting variable.
-Winston
I would rather add a (linear) smoother to the data. It gives you a lot more information than a correlation.
ggplot(mtcars, aes(mpg, wt)) +
geom_smooth(method = "loess", colour = "red", fill = "red") +
geom_smooth(method = "lm", colour = "blue", fill = "blue") +
geom_point() + facet_grid(vs ~ am, margins=TRUE)
ggplot(mtcars, aes(mpg, wt)) + geom_smooth(method = "lm") + geom_point() +
facet_grid(vs ~ am, margins=TRUE)
We can draw box plot as below:
qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")
and point as:
qplot(factor(cyl), mpg, data = mtcars, geom = "point")
How would you combine both - but just to show a few specific points(say when wt is less than 2) on top of the box?
If you are trying to plot two geoms with two different datasets (boxplot for mtcars, points for a data.frame of literal values), this is a way to do it that makes your intent clear. This works with the current (Sep 2016) version of ggplot (ggplot2_2.1.0)
library(ggplot2)
ggplot() +
# box plot of mtcars (mpg vs cyl)
geom_boxplot(data = mtcars,
aes(x = factor(cyl), y= mpg)) +
# points of data.frame literal
geom_point(data = data.frame(x = factor(c(4,6,8)), y = c(15,20,25)),
aes(x=x, y=y),
color = 'red')
I threw in a color = 'red' for the set of points, so it's easy to distinguish them from the points generated as part of geom_boxplot
Use + geom_point(...) on your qplot (just add a + geom_point() to get all the points plotted).
To plot selectively just select those points that you want to plot:
n <- nrow(mtcars)
# plot every second point
idx <- seq(1,n,by=2)
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( aes(x=factor(cyl)[idx],y=mpg[idx]) ) # <-- see [idx] ?
If you know the points before-hand, you can feed them in directly e.g.:
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
geom_point( aes(x=factor(c(4,6,8)),y=c(15,20,25)) ) # plot (4,15),(6,20),...
You can show both by using ggplot() rather than qplot(). The syntax may be a little harder to understand, but you can usually get much more done. If you want to plot both the box plot and the points you can write:
boxpt <- ggplot(data = mtcars, aes(factor(cyl), mpg))
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(aes(factor(cyl), mpg))
I don't know what you mean by only plotting specific points on top of the box, but if you want a cheap (and probably not very smart) way of just showing points above the edge of the box, here it is:
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(data = ddply(mtcars, .(cyl),summarise, mpg = mpg[mpg > quantile(mpg, 0.75)]), aes(factor(cyl), mpg))
Basically it's the same thing except for the data supplied to geom_point is adjusted to include only the mpg numbers in the top quarter of the distribution by cylinder. In general I'm not sure this is good practice because I think people expect to see points beyond the whiskers only, but there you go.