R: display values in levelplot stratified by a grouping variable

In the following example, I need to display the values for each of the cells in each of the panels stratified by the grouping variable class:
library("lattice")
x <- seq(pi/4, 5*pi, length.out=5)
y <- seq(pi/4, 5*pi, length.out=5)
r1 <- as.vector(sqrt(outer(x^2, y^2, "+")))
r2 <- as.vector(sqrt(outer(x^2, y^2, "/")))
grid1 <- grid2 <- expand.grid(x=x, y=y)
grid1$z <- cos(r1^2)*exp(-r1/(pi^3))
grid2$z <- cos(r2^2)*exp(-r2/(pi^3))
grid <- rbind(grid1, grid2)
grid$class <- c(rep("addition",length(x)^2), rep("division", length(x)^2))
p <- levelplot(z~x*y | factor(class), grid,
panel=function(...) {
arg <- list(...)
panel.levelplot(...)
panel.text(arg$x, arg$y, round(arg$z,1))})
print(p)
However, the cell values are superimposed on each other because the panel option does not distinguish between the two groups. How can I get the values to display correctly in each group?

Slightly behind the scenes, lattice uses an argument called subscripts to subset data for display in different panels. Often, it does so without you needing to be aware of it, but this is not one of those cases.
A look at the source code for panel.levelplot reveals that it handles subscripts on its own. args(panel.levelplot) shows that subscripts is among the function's formal arguments, and the function's body shows how it uses them.
panel.text() (really just a wrapper for lattice:::ltext.default()), on the other hand, doesn't know about or do anything with subscripts. From within a call to panel.text(x, y, z), the x, y, and z that are seen are the full columns of the data.frame grid, which is why you saw the overplotting that you did.
To plot text for the values that are a part of the current panel, you need to make explicit use of the subscripts argument, like this:
myPanel <- function(x, y, z, ..., subscripts) {
panel.levelplot(x=x, y=y, z=z, ..., subscripts=subscripts)
panel.text(x = x[subscripts],
y = y[subscripts],
labels = round(z[subscripts], 1))
}
p <- levelplot(z~x*y | factor(class), grid, panel = myPanel)
print(p)
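To see what subscripts actually contains, here is a minimal sketch that records the subscripts each panel receives. The toy data frame d and the seen list are made up for illustration, and the plot is drawn to a null PDF device only so that the panel function runs.

```r
library(lattice)

# Toy data (hypothetical): a 2 x 2 grid repeated for two groups, 8 rows in total.
d <- rbind(expand.grid(x = 1:2, y = 1:2),
           expand.grid(x = 1:2, y = 1:2))
d$z <- 1:8
d$g <- rep(c("a", "b"), each = 4)

seen <- list()  # will hold the subscripts handed to each panel

p_demo <- levelplot(z ~ x * y | g, data = d,
                    panel = function(x, y, z, subscripts, ...) {
                      # Record which rows of d belong to this panel.
                      seen[[length(seen) + 1]] <<- subscripts
                      panel.levelplot(x, y, z, subscripts = subscripts, ...)
                    })

pdf(NULL)               # null device: nothing is written to disk
print(p_demo)           # printing triggers the panel function once per panel
invisible(dev.off())

print(seen)
```

Each element of seen holds the row numbers of d that belong to one panel, which is the index needed for x[subscripts], y[subscripts], and z[subscripts].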

Related

Can someone explain what these lines of code mean?

I have been trying to find a way to make a scatter plot with colour intensity that is indicative of the density of points plotted in the area (it's a big data set with lots of overlap). I found these lines of code which allow me to do this but I want to make sure I actually understand what each line is actually doing.
Thanks in advance :)
get_density <- function(x, y, ...){
dens <- MASS::kde2d(x, y, ...)
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
ii <- cbind(ix, iy)
return(dens$z[ii])
}
set.seed(1)
dat <- data.frame(x = subset2$conservation.phyloP, y = subset2$gene.expression.RPKM)
dat$density <- get_density(dat$x, dat$y, n = 100)
Below is the function with some explanatory comments, let me know if anything is still confusing:
# The function "get_density" takes two arguments, called x and y
# The "..." allows you to pass other arguments
get_density <- function(x, y, ...){
# The "MASS::" means it comes from the MASS package, but makes it so you don't have to load the whole MASS package and can just pull out this one function to use.
# This is where the arguments passed as "..." (above) would get passed along to the kde2d function
dens <- MASS::kde2d(x, y, ...)
# These lines use the base R function "findInterval" to find, for each x and y value, the index of the grid interval in dens$x / dens$y that contains it
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
# This command "cbind" pastes the two sets of values together, each as one column
ii <- cbind(ix, iy)
# This line uses the two-column matrix "ii" as (row, column) indices into the density matrix dens$z, returning one density estimate per point
return(dens$z[ii])
}
# The "set.seed()" function makes sure that any randomness used by a function is the same if it is re-run (as long as the same number is used), so it makes code more reproducible
set.seed(1)
dat <- data.frame(x = subset2$conservation.phyloP, y = subset2$gene.expression.RPKM)
dat$density <- get_density(dat$x, dat$y, n = 100)
If your question is about the MASS::kde2d function itself, it might be better to rewrite this StackOverflow question to reflect that!
It looks like the same function is wrapped into a ggplot2 method described here, so if you switch to making your plot with ggplot2 you could give it a try.
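The two less obvious steps in get_density are findInterval() and indexing a matrix with a two-column matrix. A small base R sketch with made-up numbers (grid_x, m, and ii are illustrative names, not part of the original code) may make them clearer:

```r
# A made-up grid of break points, standing in for dens$x.
grid_x <- c(0, 1, 2, 3)

# For each value, findInterval() returns the index of the interval it falls in.
ix <- findInterval(c(0.5, 2.2), grid_x)
print(ix)

# A made-up 4 x 3 matrix, standing in for dens$z.
m <- matrix(1:12, nrow = 4)

# Indexing with a two-column matrix picks one element per row of the index:
# row 1 of ii asks for m[1, 2], row 2 of ii asks for m[3, 3].
ii <- cbind(c(1, 3), c(2, 3))
picked <- m[ii]
print(picked)
```

In get_density the same pattern looks up, for every point, the density estimate of the grid cell that contains it.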

Retrieve facet labels from a ggplot or a gtable/gTree/grob/gDesc object

I have data I'm plotting using ggplot's facet_grid:
My data:
species <- c("species1","species2")
conditions <- c("cond1","cond2","cond3")
batches <- 1:6
df <- expand.grid(species=species,condition=conditions,batch=batches)
set.seed(1)
df$y <- rnorm(nrow(df))
df$replicate <- 1
df$col.fill <- paste(df$species,df$condition,df$batch,sep=".")
My plot:
integerBreaks <- function(n = 5, ...)
{
library(scales)
breaker <- pretty_breaks(n, ...)
function(x){
breaks <- breaker(x)
breaks[breaks == floor(breaks)]
}
}
library(ggplot2)
p <- ggplot(df,aes(x=replicate,y=y,color=col.fill))+
geom_point(size=3)+facet_grid(~col.fill,scales="free_x")+
scale_x_continuous(breaks=integerBreaks())+
theme_minimal()+theme(legend.position="none",axis.title=element_text(size=8))
which gives:
Obviously the labels are long and come out pretty messed up in the figure, so I was wondering if there's a way to edit these labels in the ggplot object (p) or the gtable/gTree/grob/gDesc object (ggplotGrob(p)).
I am aware that one way of getting better labels is to use the labeller function when the ggplot object is created but in my case I'm specifically looking for a way to edit the facet labels after the ggplot object has been created.
As I mentioned in the comments, the facet names are nested quite deeply within the gtable that ggplotGrob() gives you. However, editing them is still possible, and since the OP explicitly wants to change them after the plot object has been created, you can do it with:
library(grid)
gg <- ggplotGrob(p)
edited_grobs <- mapply(FUN = function(x, y) {
x[["grobs"]][[1]][["children"]][[2]][["children"]][[1]][["label"]] <- y
return(x)
},
gg$grobs[which(grepl("strip-t",gg$layout$name))],
unique(gsub("cond","c", df$condition)),
SIMPLIFY = FALSE)
gg$grobs[which(grepl("strip-t",gg$layout$name))] <- edited_grobs
grid.draw(gg)
Note that this extracts all the strips using gg$grobs[which(grepl("strip-t",gg$layout$name))] and passes them to the mapply to be reset with the gsub(...) that OP specified in their comment.
In general, if you want to access just one of the text labels, there is a very similar structure which I made use of in my mapply:
num_to_access <- 1
gg$grobs[which(grepl("strip-t",gg$layout$name))][[num_to_access]][["grobs"]][[1]][["children"]][[2]][["children"]][[1]]$label
So to access the 4th label, for example, all you would need to do is change num_to_access to 4. Hope this helps!
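Before digging into the nested children, it can help to check which layout entries are the strips. The following is a minimal sketch on a hypothetical plot of mtcars (p_demo and gg_demo are illustrative names). The exact nesting inside each strip grob may differ between ggplot2 versions, so inspect it with str() rather than assuming the path used above.

```r
library(ggplot2)

# Hypothetical example plot: mtcars faceted by number of cylinders.
p_demo <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(~ cyl)

gg_demo <- ggplotGrob(p_demo)

# Layout entries for the top strips, one per facet column.
is_strip <- grepl("strip-t", gg_demo$layout$name)
strip_names <- gg_demo$layout$name[is_strip]
print(strip_names)

# The matching grobs; look at their structure before editing labels.
strips <- gg_demo$grobs[is_strip]
str(strips[[1]], max.level = 2, give.attr = FALSE)
```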

How to combine custom panels with splom() (or xyplot() or pairs())

I'm having trouble combining heterogeneous panels with lattice package tools. I tried splom(), pairs(), and xyplot(), but unsuccessfully so far. Suppose I have simple time series data with 3 columns as an xts object:
library(xts)
S = as.xts(apply(matrix(rnorm(300), ,3), 2, cumsum), Sys.Date()+1:100)
Diagonal panels (top left to bottom right or diag(5) format) need to show 3 density plots, one for each series.
Upper triangular panels need to show latticeExtra::densityplot (or equivalently panel.densityplot) for the three series. The order doesn't matter for now; I'll work it out later.
Lower triangular panels need to show horizontal box plots. I suppose panel.bwplot would work, but could not successfully tame it.
Here is a skeleton of what may work, but I'll be thankful for any successful version.
library(lattice); library(latticeExtra)
splom(as.data.frame(S),
upper.panel = function(x, y, ...){
panel.abline() # temporary placeholder
},
diag.panel = function(x, ...){
yrng <- current.panel.limits()$ylim
d <- density(x, na.rm=TRUE)
d$y <- with(d, yrng[1] + 0.95 * diff(yrng) * y / max(y) )
panel.lines(d)
diag.panel.splom(x, ...)
},
lower.panel = function(x, y, ...){
panel.abline() # temporary placeholder
},
pscale=0, as.matrix = TRUE
)

Different data in upper and lower panel of scatterplot matrix

I want to plot two different data sets in a scatterplot matrix.
I know that I can use upper.panel and lower.panel to differentiate the plot function. However, I don’t succeed in putting my data in a suitable format to harness this.
Assume I have two tissues (“brain” and “heart”) and four conditions (1–4). Now I can use e.g. pairs(data$heart) to get a scatterplot matrix for one of the data sets. Assume I have the following data:
conditions <- 1 : 4
noise <- rnorm(100)
data <- list(brain = sapply(conditions, function (x) noise + 0.1 * rnorm(100)),
heart = sapply(conditions, function (x) noise + 0.3 * rnorm(100)))
How do I get this into a format so that pairs(data, …) plots one data set above and one below the diagonal, as shown here (green = brain, violet = heart):
Just using
pairs(data, upper.panel = something, lower.panel = somethingElse)
doesn’t work because that will plot all conditions versus all conditions without regard for the different tissues. It essentially ignores the list, and the same happens when reordering the hierarchy (i.e. having data = (A=list(brain=…, heart=…), B=list(brain=…, heart=…), …)).
This is the best I seem to be able to do via passing arguments:
foo.upper <- function(x,y,ind.upper,col.upper,ind.lower,col.lower,...){
points(x[ind.upper],y[ind.upper],col = col.upper,...)
}
foo.lower <- function(x,y,ind.lower,col.lower,ind.upper,col.upper,...){
points(x[ind.lower],y[ind.lower],col = col.lower,...)
}
pairs(dat[,-5],
lower.panel = foo.lower,
upper.panel = foo.upper,
ind.upper = dat$type == 'brain',
ind.lower = dat$type == 'heart',
col.upper = 'blue',
col.lower = 'red')
Note that each panel needs all arguments. ... is a cruel mistress. If you include only the panel specific arguments in each function, it appears to work, but you get lots and lots of warnings from R trying to pass these arguments on to regular plotting functions and obviously they won't exist.
This was my quick first attempt, but it seems ugly:
dat <- as.data.frame(do.call(rbind,data))
dat$type <- rep(c('brain','heart'),each = 100)
foo.upper <- function(x,y,...){
points(x[dat$type == 'brain'],y[dat$type == 'brain'],col = 'red',...)
}
foo.lower <- function(x,y,...){
points(x[dat$type == 'heart'],y[dat$type == 'heart'],col = 'blue',...)
}
pairs(dat[,-5],lower.panel = foo.lower,upper.panel = foo.upper)
I'm abusing R's scoping in this second version in a somewhat ugly way. (Of course, you could probably do this more cleanly in lattice, but you probably knew that.)
The only other option I can think of is to design your own scatter plot matrix using layout, but that's probably quite a bit of work.
Lattice Edit
Here's at least a start on a lattice solution. It should handle varying x,y axis ranges better, but I haven't tested that.
dat <- do.call(rbind,data)
dat <- as.data.frame(dat)
dat$grp <- rep(letters[1:2],each = 100)
plower <- function(x,y,grp,...){
panel.xyplot(x[grp == 'a'],y[grp == 'a'],col = 'red',...)
}
pupper <- function(x,y,grp,...){
panel.xyplot(x[grp == 'b'],y[grp == 'b'],...)
}
splom(~dat[,1:4],
data = dat,
lower.panel = plower,
upper.panel = pupper,
grp = dat$grp)

How can I arrange an arbitrary number of ggplots using grid.arrange?

This is cross-posted on the ggplot2 google group
My situation is that I'm working on a function that outputs an arbitrary number of plots (depending upon the input data supplied by the user). The function returns a list of n plots, and I'd like to lay those plots out in 2 x 2 formation. I'm struggling with the simultaneous problems of:
How can I allow the flexibility to be handed an arbitrary (n) number of plots?
How can I also specify that I want them laid out 2 x 2?
My current strategy uses grid.arrange from the gridExtra package. It's probably not optimal, especially since, and this is key, it totally doesn't work. Here's my commented sample code, experimenting with three plots:
library(ggplot2)
library(gridExtra)
x <- qplot(mpg, disp, data = mtcars)
y <- qplot(hp, wt, data = mtcars)
z <- qplot(qsec, wt, data = mtcars)
# A normal, plain-jane call to grid.arrange is fine for displaying all my plots
grid.arrange(x, y, z)
# But, for my purposes, I need a 2 x 2 layout. So the command below works acceptably.
grid.arrange(x, y, z, nrow = 2, ncol = 2)
# The problem is that the function I'm developing outputs a LIST of an arbitrary
# number plots, and I'd like to be able to plot every plot in the list on a 2 x 2
# laid-out page. I can at least plot a list of plots by constructing a do.call()
# expression, below. (Note: it totally even surprises me that this do.call expression
# DOES work. I'm astounded.)
plot.list <- list(x, y, z)
do.call(grid.arrange, plot.list)
# But now I need 2 x 2 pages. No problem, right? Since do.call() is taking a list of
# arguments, I'll just add my grid.layout arguments to the list. Since grid.arrange is
# supposed to pass layout arguments along to grid.layout anyway, this should work.
args.list <- c(plot.list, "nrow = 2", "ncol = 2")
# Except that the line below is going to fail, producing an "input must be grobs!"
# error
do.call(grid.arrange, args.list)
As I am wont to do, I humbly huddle in the corner, eagerly awaiting the sagacious feedback of a community far wiser than I. Especially if I'm making this harder than it needs to be.
You're ALMOST there! The problem is that do.call expects your args to be in a named list object. You've put them in the list, but as character strings, not named list items.
I think this should work:
args.list <- c(plot.list, 2,2)
names(args.list) <- c("x", "y", "z", "nrow", "ncol")
As Ben and Joshua pointed out in the comments, I could have assigned names when I created the list:
args.list <- c(plot.list,list(nrow=2,ncol=2))
or
args.list <- list(x=x, y=y, z=z, nrow=2, ncol=2)
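The difference between a character string and a named list element can be seen without any plotting. This is a minimal base R sketch in which layout_args is a made-up stand-in for grid.arrange: it only reports how many unnamed arguments it received and what nrow and ncol ended up being.

```r
# Hypothetical stand-in for grid.arrange(): counts the items passed via ...
layout_args <- function(..., nrow = 1, ncol = 1) {
  list(n_items = length(list(...)), nrow = nrow, ncol = ncol)
}

items <- list("plot1", "plot2", "plot3")

# Strings are just two more unnamed items; nrow and ncol keep their defaults.
bad <- do.call(layout_args, c(items, "nrow = 2", "ncol = 2"))
str(bad)

# Named list elements are matched to the nrow and ncol arguments.
good <- do.call(layout_args, c(items, list(nrow = 2, ncol = 2)))
str(good)
```

In the original call, the two extra strings were handed to grid.arrange as if they were plots, which is why it complained that its input must be grobs.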
Try this,
require(ggplot2)
require(gridExtra)
plots <- lapply(1:11, function(.x) qplot(1:10,rnorm(10), main=paste("plot",.x)))
params <- list(nrow=2, ncol=2)
n <- with(params, nrow*ncol)
## add one page if division is not complete
pages <- length(plots) %/% n + as.logical(length(plots) %% n)
groups <- split(seq_along(plots),
gl(pages, n, length(plots)))
pl <-
lapply(names(groups), function(g)
{
do.call(arrangeGrob, c(plots[groups[[g]]], params,
# gridExtra >= 2.0 names this argument `top`; older versions used `main`
list(top=paste("page", g, "of", pages))))
})
class(pl) <- c("arrangelist", "ggplot", class(pl))
print.arrangelist = function(x, ...) lapply(x, function(.x) {
if(dev.interactive()) dev.new() else grid.newpage()
grid.draw(.x)
}, ...)
## interactive use; open new devices
pl
## non-interactive use, multipage pdf
ggsave("multipage.pdf", pl)
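The paging arithmetic in the answer above can be checked on its own, without building any plots. A minimal sketch in base R, assuming 11 plots and 4 plots per page as in the example (n_plots and per_page are illustrative names):

```r
n_plots  <- 11
per_page <- 4   # nrow * ncol for a 2 x 2 layout

# Integer division gives the full pages; any remainder adds one more page.
pages <- n_plots %/% per_page + as.logical(n_plots %% per_page)
print(pages)

# gl() builds a factor of page numbers, truncated to n_plots entries,
# and split() groups the plot indices by page.
groups <- split(seq_len(n_plots), gl(pages, per_page, n_plots))
print(groups)
```

The last page simply gets whatever plots are left over, here three of them.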
I'm answering a bit late, but stumbled on a solution at the R Graphics Cookbook that does something very similar using a custom function called multiplot. Perhaps it will help others who find this question. I'm also adding the answer as the solution may be newer than the other answers to this question.
Multiple graphs on one page (ggplot2)
Here's the current function, though please use the above link, as the author noted that it's been updated for ggplot2 0.9.3, which indicates it may change again.
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols: Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
require(grid)
# Make a list from the ... arguments and plotlist
plots <- c(list(...), plotlist)
numPlots = length(plots)
# If layout is NULL, then use 'cols' to determine layout
if (is.null(layout)) {
# Make the panel
# ncol: Number of columns of plots
# nrow: Number of rows needed, calculated from # of cols
layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
ncol = cols, nrow = ceiling(numPlots/cols))
}
if (numPlots==1) {
print(plots[[1]])
} else {
# Set up the page
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
# Make each plot, in the correct location
for (i in 1:numPlots) {
# Get the i,j matrix positions of the regions that contain this subplot
matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
layout.pos.col = matchidx$col))
}
}
}
One creates plot objects:
p1 <- ggplot(...)
p2 <- ggplot(...)
# etc.
And then passes them to multiplot:
multiplot(p1, p2, ..., cols = n)
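How multiplot turns the layout matrix into plot positions can be sketched in base R without drawing anything. The variable names below are illustrative; the two matrices are the custom layout from the function's comments and the default layout it would build for three plots with cols = 2.

```r
# Custom layout from the comments: plot 3 spans the whole bottom row.
layout_custom <- matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE)
pos3 <- as.data.frame(which(layout_custom == 3, arr.ind = TRUE))
print(pos3)   # rows and columns occupied by plot 3

# Default layout for 3 plots and cols = 2, built the same way multiplot does.
num_plots <- 3
cols <- 2
layout_default <- matrix(seq(1, cols * ceiling(num_plots / cols)),
                         ncol = cols, nrow = ceiling(num_plots / cols))
print(layout_default)

# Position of plot 2 in the default layout.
pos2 <- as.data.frame(which(layout_default == 2, arr.ind = TRUE))
print(pos2)
```

Note that matrix() fills column by column, so with the default layout the plots run down the first column before moving to the second.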
