correlation values in a facet grid from ggplot2 - r

When using facet_grid in ggplot2, I would like to show the correlation for the subset of data in each grid cell, in the top right corner of that cell's plot.
e.g. if running:
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + facet_grid(vs ~ am, margins=TRUE)
I would like to see the value for correlation for each of the 9 plots in the grid somewhere. In this specific case from the example, I would expect each to be close to -0.9 or so from visual inspection.
Or perhaps an output table to go with the plot that gives the correlation values for each of the cells in the table matching up with the facet_grid...(this is less desirable but also an option).
Ideally I would like to extend this to any other function I choose, so that it can use either or both of the two plotted variables to calculate statistics.
Is this possible?
Thanks in advance

Winston Chang suggested an answer on the ggplot2 group. This is what he said; it's not a bad answer:
You could do something like this:
library(ggplot2)
library(plyr)  # provides ddply()
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
# Calculate correlation for each group
cors <- ddply(mtcars, c("vs", "am"), summarise, cor = round(cor(mpg, wt), 2))
p + facet_grid(vs ~ am) +
  geom_text(data=cors, aes(label=paste("r=", cor, sep="")), x=30, y=4)
I don't think it's possible to make this come out correctly with margins=TRUE, though. If you want the margins, you may need to preprocess your data to add an ALL value for each faceting variable.
-Winston
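The preprocessing Winston mentions could look roughly like the sketch below. It is only one way to do it: add_margins is a hypothetical helper (not part of ggplot2 or plyr) that duplicates the rows so each faceting variable gets an extra "(all)" level, and the correlations are then computed with base R instead of ddply. The label position x = 30, y = 4 is copied from the answer above.

```r
library(ggplot2)

# Hypothetical helper: for each faceting variable, append a copy of the
# data in which that variable is set to "(all)".
add_margins <- function(df, vars) {
  for (v in vars) {
    lv <- c(sort(unique(as.character(df[[v]]))), "(all)")
    all_rows <- df
    all_rows[[v]] <- "(all)"
    df[[v]] <- as.character(df[[v]])
    df <- rbind(df, all_rows)
    df[[v]] <- factor(df[[v]], levels = lv)  # keep "(all)" as the last panel
  }
  df
}

mt_all <- add_margins(mtcars, c("vs", "am"))

# One correlation per (vs, am) cell, margin cells included
cells <- split(mt_all, list(mt_all$vs, mt_all$am), drop = TRUE)
cors <- do.call(rbind, lapply(cells, function(d) {
  data.frame(vs = d$vs[1], am = d$am[1], cor = round(cor(d$mpg, d$wt), 2))
}))

p <- ggplot(mt_all, aes(mpg, wt)) +
  geom_point() +
  facet_grid(vs ~ am) +  # margins are already in the data
  geom_text(data = cors, aes(label = paste0("r=", cor)), x = 30, y = 4)
p
```

Because the margins live in the data, facet_grid is called without margins=TRUE here.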

I would rather add a (linear) smoother to the data. It gives you a lot more information than a correlation.
ggplot(mtcars, aes(mpg, wt)) +
geom_smooth(method = "loess", colour = "red", fill = "red") +
geom_smooth(method = "lm", colour = "blue", fill = "blue") +
geom_point() + facet_grid(vs ~ am, margins=TRUE)
ggplot(mtcars, aes(mpg, wt)) + geom_smooth(method = "lm") + geom_point() +
facet_grid(vs ~ am, margins=TRUE)
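If you also want a number next to each smoother, the same labelling idea works for any per-cell statistic. The sketch below is only an illustration: cell_stat and lm_slope are made-up helper names, the statistic shown is the slope of wt on mpg, and the label position x = 28, y = 5 is an arbitrary choice. It leaves out the margins.

```r
library(ggplot2)

# Hypothetical helper: apply any function f to the data of each (vs, am) cell
cell_stat <- function(df, f) {
  cells <- split(df, list(df$vs, df$am), drop = TRUE)
  do.call(rbind, lapply(cells, function(d) {
    data.frame(vs = d$vs[1], am = d$am[1], value = f(d))
  }))
}

# Example statistic: slope of the regression of wt on mpg within a cell
lm_slope <- function(d) unname(coef(lm(wt ~ mpg, data = d))[2])

slopes <- cell_stat(mtcars, lm_slope)
slopes$label <- sprintf("slope = %.3f", slopes$value)

p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(vs ~ am) +
  geom_text(data = slopes, aes(label = label), x = 28, y = 5)
p
```

Swapping lm_slope for another function of the cell's data frame (for example function(d) cor(d$mpg, d$wt)) changes the statistic without touching the plotting code.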

Related

slope of lines in interaction plot in ggplot2 does not match estimates

I am trying to plot the interaction effects from a multiple linear regression using ggplot2. However, the slopes of the plotted lines do not match what they should be based on the estimates returned by the lm function.
Here is my code:
lm.sense <- lm(sense_of_belonging ~ active*mathEAL + MathID + comfort_speaking, data=Data)
library(ggplot2)
p.sense <- ggplot(lm.sense, aes(y=sense_of_belonging, x=active, color=mathEAL)) + geom_smooth(method="lm", se=FALSE)
Does ggplot not hold the other variables constant?
ggplot2 works with data.frames and doesn't naturally know what to do with an lm object. (Try plot(lm.sense) to see what base R offers here.)
Your ggplot call is using the underlying data from Data (tucked away inside your lm.sense object) to make a plot where x = active and y = sense_of_belonging. geom_smooth then fits its own linear regression of y on x from that data, which does not adjust for the MathID and comfort_speaking variables in your model. Compare these two calls, which give the same result:
lm.mtcars <- lm(mpg ~ wt + cyl, data = mtcars)
ggplot(lm.mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
ggplot(mtcars, aes(mpg, wt)) +
geom_point() + geom_smooth(method="lm", se=FALSE)
Depending on what you want to do, you could show some of the impact of other variables within your geom_smooth by referencing those:
ggplot(mtcars, aes(mpg, wt, color = as.character(cyl))) +
geom_point() + geom_smooth(method="lm", se=FALSE, fullrange = TRUE)
It would help to understand what kind of output you're hoping to generate to give more specific suggestions.
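If the goal is lines whose slopes match the lm estimates, one common approach is to predict from the fitted model over a grid while holding the other covariates fixed, and draw those predictions with geom_line. The sketch below uses mtcars as a stand-in because the asker's Data is not available; the model formula and the choice to hold hp at its mean are assumptions made for illustration.

```r
library(ggplot2)

# Stand-in model with an interaction (wt * am) and one extra covariate (hp)
fit <- lm(mpg ~ wt * factor(am) + hp, data = mtcars)

# Prediction grid: vary wt and am, hold hp constant at its mean
grid <- expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  am = c(0, 1),
  hp = mean(mtcars$hp)
)
grid$mpg <- predict(fit, newdata = grid)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(am))) +
  geom_point() +
  geom_line(data = grid)  # lines come from the model, not from geom_smooth
p
```

Here the slope of the am = 0 line equals the wt coefficient of fit, and the am = 1 line adds the interaction coefficient to it.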

R ggplot 2: how to set geom point alpha based on another variable?

Let's use the mtcars dataset:
p <- ggplot(mtcars, aes(wt, mpg, color = as.factor(gear)))
p + geom_point()
Inside geom_point(), I want to set alpha so that it varies based on another column.
For example
p <- ggplot(mtcars, aes(wt, mpg, color = as.factor(gear)))
p + geom_point(alpha = cyl)
The higher the cyl for a point, the more intense its color should be. However, it seems that alpha doesn't take a variable. Is there a workaround for this? Thanks
This will work if you put cyl in aes. Anything that should vary with the data has to be mapped inside aes(); arguments given outside aes() are treated as fixed values, which is why alpha = cyl fails there. Is this what you are looking for?
ggplot(mtcars, aes(wt, mpg, color = as.factor(gear), alpha = cyl)) +
geom_point()
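The mapping can also go in the layer itself, and the range of transparency can be tuned with scale_alpha_continuous. This is a small sketch; the range c(0.3, 1) and the point size are arbitrary choices, not something from the question.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(gear))) +
  geom_point(aes(alpha = cyl), size = 3) +    # alpha is mapped, so it varies per point
  scale_alpha_continuous(range = c(0.3, 1))   # lowest cyl -> 0.3, highest cyl -> 1
p
```

Raising the lower end of the range keeps the low-cyl points from fading out completely.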

geom_point plot with only number without circles

In ggplot in R, is it possible to plot each point as a unique number, without a circle around it? I tried to use color "white" but it doesn't work.
I would recommend geom_text.
set.seed(101)
dd <- data.frame(x=rnorm(50),y=rnorm(50),id=1:50)
library(ggplot2)
ggplot(dd,aes(x,y))+geom_text(aes(label=id))
I'll show how to do it with geom_text and/or geom_point.
Using geom_text (recommended)
For this example I'll use the built-in dataset mtcars, and let's pretend the numbers you want to display are the values of the weight (wt) variable:
data(mtcars)
p <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars)))
p + geom_text(aes(label = wt),
parse = TRUE)
or, if you want an example with truly unique numbers, we can make up an index using seq_len:
data(mtcars)
p <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars)))
p + geom_text(aes(label = seq_len(nrow(mtcars))),
parse = TRUE)
Using geom_point
While it would require more work, it actually is possible to do this with geom_point.
[Image: reference chart of some of the shapes you can use with geom_point]
In that chart, shapes 48 to 57 are drawn as the characters 0 to 9. You can leverage these shapes (and combine them to form larger numbers) via geom_point like this:
d=data.frame(p=c(48:57))
ggplot() +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
geom_point(data=d, mapping=aes(x=p%%16, y=p%/%16, shape=p), size=5, fill="red")
Finally, a trivial example using mtcars + geom_point with arbitrary numbers:
# Put the coordinates in the same data frame as the shapes instead of using attach()
d <- data.frame(wt = mtcars$wt, mpg = mtcars$mpg,
                p = c(48:57, 48:57, 48:57, 48, 49))
ggplot(d) +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
geom_point(mapping=aes(x=wt, y=mpg, shape=p), size=5, fill="red")

How would you plot a box plot and specific points on the same plot?

We can draw box plot as below:
qplot(factor(cyl), mpg, data = mtcars, geom = "boxplot")
and point as:
qplot(factor(cyl), mpg, data = mtcars, geom = "point")
How would you combine both - but just to show a few specific points(say when wt is less than 2) on top of the box?
If you are trying to plot two geoms with two different datasets (boxplot for mtcars, points for a data.frame of literal values), this is a way to do it that makes your intent clear. This works with the current (Sep 2016) version of ggplot (ggplot2_2.1.0)
library(ggplot2)
ggplot() +
# box plot of mtcars (mpg vs cyl)
geom_boxplot(data = mtcars,
aes(x = factor(cyl), y= mpg)) +
# points of data.frame literal
geom_point(data = data.frame(x = factor(c(4,6,8)), y = c(15,20,25)),
aes(x=x, y=y),
color = 'red')
I threw in a color = 'red' for the set of points, so it's easy to distinguish them from the points generated as part of geom_boxplot.
Use + geom_point(...) on your qplot (just add a + geom_point() to get all the points plotted).
To plot selectively just select those points that you want to plot:
n <- nrow(mtcars)
# plot every second point
idx <- seq(1, n, by = 2)
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
  geom_point( data = mtcars[idx, ] ) # <-- only the rows in idx; x and y are inherited
If you know the points before-hand, you can feed them in directly e.g.:
qplot( factor(cyl), mpg, data=mtcars, geom="boxplot" ) +
  geom_point( data = data.frame(cyl = c(4,6,8), mpg = c(15,20,25)) ) # plot (4,15),(6,20),...
You can show both by using ggplot() rather than qplot(). The syntax may be a little harder to understand, but you can usually get much more done. If you want to plot both the box plot and the points you can write:
boxpt <- ggplot(data = mtcars, aes(factor(cyl), mpg))
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(aes(factor(cyl), mpg))
I don't know what you mean by only plotting specific points on top of the box, but if you want a cheap (and probably not very smart) way of just showing points above the edge of the box, here it is:
library(plyr)  # provides ddply() and .()
top_quarter <- ddply(mtcars, .(cyl), summarise, mpg = mpg[mpg > quantile(mpg, 0.75)])
boxpt + geom_boxplot(aes(factor(cyl), mpg)) + geom_point(data = top_quarter, aes(factor(cyl), mpg))
Basically it's the same thing, except that the data supplied to geom_point is restricted to the mpg values in the top quarter of the distribution for each cylinder count. In general I'm not sure this is good practice, because I think people expect to see only points beyond the whiskers, but there you go.
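For the exact case in the question (only points where wt is less than 2), a shorter variant of the same idea is to pass a filtered data frame to geom_point. This is a minimal sketch; the name light and the red colour are just illustrative choices.

```r
library(ggplot2)

# The "specific points" from the question: cars with wt below 2
light <- subset(mtcars, wt < 2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +                            # boxes use all of mtcars
  geom_point(data = light, colour = "red")    # points use only the subset
p
```

The point layer inherits the x and y mapping from ggplot(), so only its data argument needs to change.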

Qplot in ggplot2 causes multiple regression lines when a variable is factored

When I do a simple qplot() I want one regression line for the plot. How do I tell stat_smooth to ignore the factors?
Here's my example code:
library("ggplot2")
qplot(y=wt, x=mpg, size=cyl, col=factor(gear), data=mtcars) +
stat_smooth(method=lm, formula=y~x)
That gives this image:
When I remove the factor I get the graph that I want (although I can't remove the factor in my real dataset):
qplot(y=wt, x=mpg, size=cyl, col=gear, data=mtcars) +
stat_smooth(method=lm, formula=y~x)
You can separate the points (which should be sized and colored by cyl and gear) from the smoother (which should use only the x and y aesthetics, and nothing else).
ggplot( mtcars, aes( y=wt, x=mpg ) ) +
geom_point( aes(size=cyl, colour=factor(gear)) ) +
stat_smooth( method="lm" )
Or, if you have a lot of geoms and want to remove the default aesthetics from just one of them:
ggplot( mtcars, aes( y=wt, x=mpg, size=cyl, colour=factor(gear)) ) +
geom_point() +
stat_smooth(method="lm", aes(size = NULL, colour = NULL))
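Another option along the same lines, sketched here as an alternative rather than a replacement, is to switch off inheritance for the smoother with inherit.aes = FALSE and give it only the x and y mapping it needs:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(y = wt, x = mpg, size = cyl, colour = factor(gear))) +
  geom_point() +
  # The smoother ignores the plot-level size/colour mapping entirely,
  # so it is fitted once over all the points.
  stat_smooth(aes(x = mpg, y = wt), method = "lm", formula = y ~ x,
              inherit.aes = FALSE)
p
```

This avoids listing every aesthetic to null out, which can help when the plot-level aes() maps many variables.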
