(ggplot) facet_grid implicit subsetting not respected in geom_text (?) - r

In geom_text(...), the default dataset is only sometimes subsetted based on facet variables. Easiest to explain with an example.
This example attempts to simulate pairs(...) with ggplot (and yes, I know about lattice, and plotmatrix, and ggpairs – the point is to understand how ggplot works).
require(data.table)
require(reshape2) # for melt(…)
require(plyr) # for .(…)
require(ggplot2)
Extract mpg, hp, disp, and wt from mtcars, and use cyl as the grouping factor.
xx <- data.table(mtcars)
xx <- data.table(id=rownames(mtcars),xx[,list(group=cyl, mpg, hp, disp, wt)])
Reshape so we can use ggplot facets.
yy <- melt(xx,id=1:2, variable.name="H", value.name="xval")
yy <- data.table(yy,key="id,group")
ww <- yy[,list(V=H,yval=xval), key="id,group"]
zz <- yy[ww,allow.cartesian=T]
In zz,
H: facet variable for horizontal direction
V: facet variable for vertical direction
xval: x-value for a given facet (given value of H and V)
yval: y-value for a given facet
Now, the following generates something close to pairs(…),
ggp <- ggplot(zz, aes(x=xval, y=yval))
ggp <- ggp + geom_point(subset =.(H!=V), size=3, shape=1)
ggp <- ggp + facet_grid(V~H, scales="free")
ggp <- ggp + labs(x="",y="")
ggp
In other words, the values of xval and yval used in geom_point are appropriate for each facet; they have been subsetted based on the values of H and V. However, adding the following to center the variable names in the diagonal facets:
ggp + geom_text(subset = .(H==V),aes(label=factor(H),
x=min(xval)+0.5*diff(range(xval)),
y=min(yval)+0.5*diff(range(yval))),
size=10)
gives this:
It appears that H has been subsetted properly for each facet (e.g. the labels are correct), but xval and yval seem to apply to the whole dataset zz, not to the subset corresponding to H and V for each facet.
My question is: in the above, why are xval and yval treated differently than H in aes? Is there a way around this? (Note: I am much more interested in understanding why this is happening than in a workaround.)

One observation is that actually the labels are overplotted:
ggp + geom_text(subset = .(H==V), aes(label=factor(H),
x=min(xval)+0.5*diff(range(xval))
+ runif(length(xval), max=10),
y=min(yval)+0.5*diff(range(yval))
+ runif(length(yval), max=20)), size=10)
adds some noise to the position of the labels, and you can see that for each observation in zz one text is added.
To your original question: From the perspective of ggplot it might be faster to evaluate all aesthetics at once and split later for faceting, which leads to the observed behavior. I'm not sure if doing the evaluation separately for each facet will ever be implemented in ggplot -- the only application I can think of is to aggregate facet-wise, and there are workarounds to achieve this easily. Also, to avoid the overplotting shown above, you'll have to build a table with four observations (one per text) anyway. Makes your code simpler, too.
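The "table with four observations" idea can be sketched roughly as follows. This is only an illustration under a few assumptions: it rebuilds zz with base R instead of data.table, it replaces the subset= argument with explicit data= arguments, and the names vars, long, and labels are made up for the example.

```r
library(ggplot2)

vars <- c("mpg", "hp", "disp", "wt")

# Long format: one row per car and variable
long <- data.frame(
  id   = rep(rownames(mtcars), times = length(vars)),
  H    = rep(vars, each = nrow(mtcars)),
  xval = unlist(mtcars[vars], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Self-join on id to get every (H, V) combination per car
zz <- merge(long, setNames(long, c("id", "V", "yval")), by = "id")
zz$H <- factor(zz$H, levels = vars)
zz$V <- factor(zz$V, levels = vars)

# One row per diagonal facet, with the midpoint of that variable's range
# computed up front rather than inside aes()
mid <- sapply(mtcars[vars], function(v) mean(range(v)))
labels <- data.frame(
  H    = factor(vars, levels = vars),
  V    = factor(vars, levels = vars),
  xval = mid,
  yval = mid
)

ggp <- ggplot(zz, aes(xval, yval)) +
  geom_point(data = subset(zz, H != V), size = 3, shape = 1) +
  geom_text(data = labels, aes(label = H), size = 10) +
  facet_grid(V ~ H, scales = "free") +
  labs(x = "", y = "")
```

Because labels has exactly one row per diagonal facet, each name is drawn once and there is nothing to overplot.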

Related

plotting multiple plots in ggplot2 on same graph that are unrelated

How would one use the smooth.spline() method in a ggplot2 scatterplot?
If my data is in the data frame called data, with two columns, x and y.
The smooth.spline would be sm <- smooth.spline(data$x, data$y). I believe I should use geom_line(), with sm$x and sm$y as the xy coordinates. However, how would one plot a scatterplot and a lineplot on the same graph that are completely unrelated? I suspect it has something to do with the aes() but I am getting a little confused.
You can use different data frames in different geoms and refer to the relevant variables with aes, or you can combine the relevant variables from the output of smooth.spline into one data frame:
# example data
set.seed(1)
dat <- data.frame(x = rnorm(20, 10,2))
dat$y <- dat$x^2 - 20*dat$x + rnorm(20,10,2)
# spline
s <- smooth.spline(dat)
# plot - combine the original x & y and the fitted values returned by
# smooth.spline into a data.frame
library(ggplot2)
ggplot(data.frame(x=s$data$x, y=s$data$y, xfit=s$x, yfit=s$y)) +
geom_point(aes(x,y)) + geom_line(aes(xfit, yfit))
# or you could use geom_smooth
ggplot(dat, aes(x , y)) + geom_point() + geom_smooth()
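For the first option (a separate data frame per geom), a minimal sketch could look like this. The names grid and spline_df are made up here, and evaluating the spline on a 200-point grid is just one reasonable choice.

```r
library(ggplot2)

# Same example data as above
set.seed(1)
dat <- data.frame(x = rnorm(20, 10, 2))
dat$y <- dat$x^2 - 20 * dat$x + rnorm(20, 10, 2)

# Fit the spline and evaluate it on a fine grid of x values
fit <- smooth.spline(dat$x, dat$y)
grid <- seq(min(dat$x), max(dat$x), length.out = 200)
spline_df <- as.data.frame(predict(fit, grid))  # columns x and y

# Points come from dat, the line from spline_df; both reuse aes(x, y)
p <- ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_line(data = spline_df, colour = "blue")
```

The data= argument on geom_line overrides the plot-level data for that layer only, so the two layers do not need to have the same number of rows.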

49 plots arranged in a 7x7 matrix

I don't know if this question is trivial, but...
I'm trying to plot a group of variables in a similar form as a PAIRS plot.
But instead of using the same variables in the rows and columns of the graphic, I would like to have different variables. For example, if I have a dataset with X1,...,X7 and another dataset with Y1,...,Y7.
I've tried with layout and par(mfrow), but as I want to cross 7 variables x 7 variables it gave me an overflow error.
Is there any way to do this plot matrix 7x7?
Thank you
I'm not aware of a way to do this using pairs(...) in base R, but here's a ggplot solution, assuming your x- and y-values are in dataframes named df.x and df.y.
# create a sample dataset - you have this already...
set.seed(1) # for reproducible example
df.x <- data.frame(matrix(sample(1:50,350,replace=T),nc=7))
df.y <- 2*df.x + rnorm(350,sd=5)
colnames(df.y) <- paste0("Y",1:7)
# this makes the plot - you start here.
library(ggplot2)
library(data.table)
library(reshape2) # for melt(...)
xDT <- data.table(melt(cbind(id=1:nrow(df.x),df.x),id="id",value.name="xval",variable.name="H"),key="id")
yDT <- data.table(melt(cbind(id=1:nrow(df.y),df.y),id="id",value.name="yval",variable.name="V"),key="id")
xy <- xDT[yDT,allow.cartesian=T]
# simulates pairs() in base R
ggp = ggplot(xy,aes(x=xval,y=yval))
ggp = ggp + geom_point()
ggp = ggp + facet_grid(V~H, scales="free")
ggp = ggp + labs(x="",y="")
print(ggp)
This assumes, but does not test, that df.x and df.y have the same number of rows.
You do not necessarily need data.tables to do this, but it's likely to be faster if your datasets are large, and the syntax is cleaner.
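For reference, a version without data.table might look roughly like the sketch below. The helper to_long is hypothetical (written just for this example), and merge() stands in for the keyed join.

```r
library(ggplot2)

# Same sample data as above
set.seed(1)
df.x <- data.frame(matrix(sample(1:50, 350, replace = TRUE), ncol = 7))
df.y <- 2 * df.x + rnorm(350, sd = 5)
colnames(df.y) <- paste0("Y", 1:7)

# This time the row-count assumption is tested explicitly
stopifnot(nrow(df.x) == nrow(df.y))

# Hypothetical helper: stack the columns of df into id / name / value
to_long <- function(df, name_col, value_col) {
  out <- data.frame(
    id    = rep(seq_len(nrow(df)), times = ncol(df)),
    name  = factor(rep(names(df), each = nrow(df)), levels = names(df)),
    value = unlist(df, use.names = FALSE)
  )
  names(out) <- c("id", name_col, value_col)
  out
}

# Every X column paired with every Y column, matched by row id
xy <- merge(to_long(df.x, "H", "xval"), to_long(df.y, "V", "yval"), by = "id")

ggp <- ggplot(xy, aes(x = xval, y = yval)) +
  geom_point() +
  facet_grid(V ~ H, scales = "free") +
  labs(x = "", y = "")
```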

log-scaled density plot: ggplot2 and freqpoly, but with points instead of lines

What I really want to do is plot a histogram with the y-axis on a log scale. Obviously this is a problem with the ggplot2 geom_histogram, since the bottom of the bar is at zero, and the log of that gives you trouble.
My workaround is to use the freqpoly geom, and that more-or less does the job. The following code works just fine:
ggplot(zcoorddist) +
geom_freqpoly(aes(x=zcoord,y=..density..),binwidth = 0.001) +
scale_y_continuous(trans = 'log10')
The issue is that at the edges of my data, I get a couple of garish vertical lines that really throw you off visually when combining a bunch of these freqpoly curves in one plot. What I'd like to be able to do is use points at every vertex of the freqpoly curve, with no lines connecting them. Is there a way to do this easily?
The easiest way to get the desired plot is to just recast your data. Then you can use geom_point. Since you don't provide an example, I used the standard example for geom_histogram to show this:
# load packages
require(ggplot2)
require(reshape)
# get data
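# the movies data has since moved out of ggplot2 into the ggplot2movies package
data(movies, package = "ggplot2movies")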
movies <- movies[, c("title", "rating")]
# here's the equivalent of your plot
ggplot(movies) + geom_freqpoly(aes(x=rating, y=..density..), binwidth=.001) +
scale_y_continuous(trans = 'log10')
# recast the data
df1 <- recast(movies, value~., measure.var="rating")
names(df1) <- c("rating", "number")
# alternative way to recast data
df2 <- as.data.frame(table(movies$rating))
names(df2) <- c("rating", "number")
df2$rating <- as.numeric(as.character(df2$rating))
# plot
p <- ggplot(df1, aes(x=rating)) + scale_y_continuous(trans="log10", name="density")
# with lines
p + geom_linerange(aes(ymax=number, ymin=.9))
# only points
p + geom_point(aes(y=number))
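If you want true binned densities rather than counts per distinct value, one way to recast the data is to bin it with hist() first and plot the bin midpoints. This is a sketch on made-up data (the zcoorddist below is simulated, not the asker's), and 50 breaks is an arbitrary choice.

```r
library(ggplot2)

# Simulated stand-in for the asker's data
set.seed(42)
zcoorddist <- data.frame(zcoord = rnorm(5000))

# Bin the data once, outside ggplot
h <- hist(zcoorddist$zcoord, breaks = 50, plot = FALSE)
binned <- data.frame(zcoord = h$mids, density = h$density)

# Empty bins have density 0, which has no finite log, so drop them
binned <- binned[binned$density > 0, ]

# One point per non-empty bin, log-scaled y axis, no connecting lines
p <- ggplot(binned, aes(zcoord, density)) +
  geom_point() +
  scale_y_log10()
```

Since nothing is drawn down to zero, the vertical lines at the edges of the data disappear along with the empty bins.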

R: ggplot, legend control using scale_shape_manual and one data frame

Using scale_shape_manual in ggplot, I created different values for three different types of factories (squares, triangles, and circles), which correspond to North, South, and West respectively. Is it possible to have the North/South/West labels in the legend without creating three different data frames for each region? Can I add these labels to the original data frame?
I have one data frame for a plot (as recommended by the ggplot2 book), and with my code below, the default legend lists every row in my data frame, which is repetitive and not what I want.
Basically, I would like to know the best way to label these regions in the plot. The only reason I would like to maintain one data frame is because the code will be easy to use over and over again by just switching the data frame (the benefit of one df mentioned in the ggplot2 book).
I think part of the problem is that I am using scale_shape_manual to assign values to each point individually. Should I put the North/South/West labels in my data frame and alter my scale_shape_manual call? If so, what is the best way to accomplish this?
Please let me know if my question is unclear. My code is below, and it replicates my plot as it stands. Thanks.
#Data frame
points <- c(3,5,4,7,12)
bars <- c(.8,1.2,1.4,2.1,4)
points_df<-data.frame(points)
row.names(points_df) <- c( "Factory 1","Factory 2","Factory 3","Factory 4","Factory 5" )
df<-data.frame(Output=points,Errors=bars,lev.names= rownames(points_df))
df$lev.names<-factor(df$lev.names,levels=df$lev.names[order(df$Output)])
# GGPLOT #
library(ggplot2)
library(scales)
p2 <- ggplot(df,aes(lev.names,Output,shape=lev.names))
p2 <- p2 +geom_errorbar(aes(ymin=Output-Errors, ymax=Output+Errors), width=0,color="gray40", lty=1, size=0)
p2 <- p2 + geom_point(aes(size=2))
p2 <- p2 + scale_shape_manual(values=c(6,7,6,1,1))
p2 <- p2 + theme_bw() + xlab(" ") + ylab("Output")
p2 <- p2 + opts(title = expression("Production"))
p2 <- p2+ coord_flip()
print(p2)
Yes, put the location in your data.frame and use it in the aes mapping:
df$location <- c("North","South","North","West","West")
p2 <- ggplot(df,aes(lev.names,Output,shape=location)) +
geom_errorbar(aes(ymin=Output-Errors, ymax=Output+Errors),
width=0,color="gray40", lty=1, size=0) +
geom_point(size=3) +
theme_bw() + xlab(" ") + ylab("Output") +
ggtitle(expression("Production")) +
coord_flip()
print(p2)
I've also fixed some other stuff (e.g., opts is deprecated and you don't want to map size, but to set it).
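If you still want to pick the shapes yourself, scale_shape_manual accepts a named vector keyed by the values of the mapped column. The sketch below assumes filled square, triangle, and circle (shape codes 15, 17, 16) for North, South, and West; the name shape_values and the legend title "Region" are made up for the example.

```r
library(ggplot2)

df <- data.frame(
  lev.names = paste("Factory", 1:5),
  Output    = c(3, 5, 4, 7, 12),
  Errors    = c(0.8, 1.2, 1.4, 2.1, 4),
  location  = c("North", "South", "North", "West", "West")
)
df$lev.names <- factor(df$lev.names, levels = df$lev.names[order(df$Output)])

# One shape per region, matched by name rather than by row order
shape_values <- c(North = 15, South = 17, West = 16)

p2 <- ggplot(df, aes(lev.names, Output, shape = location)) +
  geom_errorbar(aes(ymin = Output - Errors, ymax = Output + Errors),
                width = 0, colour = "gray40") +
  geom_point(size = 3) +
  scale_shape_manual(values = shape_values, name = "Region") +
  theme_bw() + xlab(" ") + ylab("Output") +
  ggtitle("Production") +
  coord_flip()
```

The legend then has three entries, one per region, and swapping in another data frame only requires that it has a location column with the same region names.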

How to control ylim for a faceted plot with different scales in ggplot2?

In the following example, how do I set separate ylims for each of my facets?
qplot(x, value, data=df, geom=c("smooth")) + facet_grid(variable ~ ., scale="free_y")
In each of the facets, the y-axis takes a different range of values, and I would like to set different ylims for each of the facets.
The default ylims are too wide for the trend that I want to see.
This was brought up on the ggplot2 mailing list a short while ago. What you are asking for is currently not possible but I think it is in progress.
As far as I know this has not been implemented in ggplot2 yet. However, a workaround that gives you ylims wider than what ggplot provides automatically is to add "artificial data". To narrow the ylims, simply remove the data you don't want to plot (see the end for an example).
Here is an example:
Let's just set up some dummy data that you want to plot
df <- data.frame(x=rep(seq(1,2,.1),4),f1=factor(rep(c("a","b"),each=22)),f2=factor(rep(c("x","y"),22)))
df <- within(df,y <- x^2)
Which we could plot using line graphs
p <- ggplot(df,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")
print(p)
Assume we want y to start at -10 in the first row and at 0 in the second row, so we add a point at (0,-10) to the upper-left plot and one at (0,0) to the lower-right plot (with facet_grid, all panels in a row share one y scale, so one point per row is enough):
ylim <- data.frame(x=rep(0,2),y=c(-10,0),f1=factor(c("a","b")),f2=factor(c("x","y")))
dfy <- rbind(df,ylim)
Now by limiting the x-scale between 1 and 2 those added points are not plotted (a warning is given):
p <- ggplot(dfy,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
Same would work for extending the margin above by adding points with higher y values at x values that lie outside the range of xlim.
This will not work if you want to reduce the ylim, in which case subsetting your data would be a solution, for example to limit the upper row between -10 and 1.5 you could use:
p <- ggplot(dfy,aes(x,y))+geom_line(subset=.(y < 1.5 | f1 != "a"))+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
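A variant of the same trick, sketched here as an alternative rather than as part of the approach above, is to put the extra points in their own data frame and add them with geom_blank. That layer draws nothing but still takes part in scale training, so no xlim is needed to hide the points. The name lims is made up for the example.

```r
library(ggplot2)

# Same dummy data as above
df <- data.frame(x  = rep(seq(1, 2, 0.1), 4),
                 f1 = factor(rep(c("a", "b"), each = 22)),
                 f2 = factor(rep(c("x", "y"), 22)))
df$y <- df$x^2

# One invisible point per facet row; x stays inside the data range
# so the x axis is not stretched
lims <- data.frame(x  = 1,
                   y  = c(-10, 0),
                   f1 = factor(c("a", "b")),
                   f2 = factor(c("x", "x")))

p <- ggplot(df, aes(x, y)) +
  geom_line() +
  geom_blank(data = lims) +
  facet_grid(f1 ~ f2, scales = "free_y")
```

Like the original workaround, this can only widen the limits, not narrow them.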
There are actually two packages that solve that problem now:
https://github.com/zeehio/facetscales, and https://cran.r-project.org/package=ggh4x.
I would recommend ggh4x because it has other very useful tools as well, such as nested facets (two variables defining the rows or columns), setting the x and y scales individually for each facet, and multiple fill and colour scales.
For your problem, the solution would look like this:
library(ggh4x)
scales <- list(
# Here you have to specify all the scales, one for each facet row in your case
scale_y_continuous(limits = c(2, 10)),
scale_y_continuous(breaks = c(3, 4))
)
qplot(x, value, data=df, geom=c("smooth")) +
facet_grid(variable ~ ., scale="free_y") +
facetted_pos_scales(y = scales)
Here is an example using facet_wrap:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(vars(class), scales = "free",
nrow=2,ncol=4)
The code above generates a grid of panels, one per class, each with its own x and y scales (plot image not included).
