How to create a pairs-plot (matrix-like plot) with `grid`? - r

I'm trying my first steps purely in grid. As an exercise, I would like to create a pairs plot (similar to pairs()) purely based on grid. The function myplotGrob below should create the grid object (grob; or gTree) and return the object.
I'm not sure what the best way to continue is:
Which units should one use? (I tried "null", too.)
Is frameGrob meant to set up the layout? (This is what I understood from Paul Murrell's book.)
How do I have to choose/adjust the viewports so that I get the desired plot? (So far, I only see a mess.)
Is the layout meant to be set up beforehand, or is it better to "concatenate" additional panels step by step to get the (4, 4) plot matrix?
require(grid)
require(mvtnorm)
set.seed(271)
X <- rmvnorm(1000, mean=1:4, sigma=diag(4:1)) # goal: draw this in a pairs plot
## auxiliary function
panel <- function(x, y) pointsGrob(x=x, y=y, name="panel", gp=gpar(), vp=NULL)
## creates and returns a gTree (class)
myplotGrob <- function(X, name=NULL, gp=NULL, vp=NULL)
{
## x-axis grob
## y-axis grob
## ...
## set up layout
layout <- grid.layout(4, 4, # (4, 4) matrix
widths=rep(0.25, 4), heights=rep(0.25, 4),
default.units="npc")
## pushViewport(viewport(layout=layout)) # required???
all <- frameGrob(layout=layout) # produces a gTree without children
for(i in 1:4) {
for(j in 1:4) {
## group grobs together
gt <- gTree(X,
children=gList(panel(X[,i], X[,j])),
name=name, gp=gp, vp=vp, cl="myplotGrob")
all <- placeGrob(all, gt, row=i, col=j)
}
}
all
}
## draw the gTree
grid.myplot <- function(...) grid.draw(myplotGrob(...))
## call
grid.myplot(X)
UPDATE
As it was asked for, here is the design/layout of the original problem I have in mind (the above was only meant as a minimal/learning example). The units in cm were just for me (they should be 'relative' in the end). Of course, the number of panels may vary. I would like all parts to be grid objects, so that the function which creates the graphic will return an object (without printing/drawing). This way, each part can be modified afterwards. The graphic should display results from an array of dimension 5 (or less): one dimension is displayed in the row panels [row.vars], one in the column panels [col.vars], one on the x axis of each panel [xvar], and each panel can contain 2 different dimensions of the array (differing by color and line type) [I used d and n in the drawing]. Of course, if the array is four-dimensional, then row 8 of the above design should be missing. I can construct the layout via grid, but the whole question is how to continue from there. That's what I wanted to express with my "minimal example" above.

I think you can divide the task in two main parts, like the basic examples in grid.panel() and grid.multipanel()
1- build a function that will produce a single panel, returned as a gTree. You need to figure out all the parameters, i.e. limits, axes, colours, shapes, grid, coordinates, ... You might end up rewriting lattice panel functions and axes,
grid.newpage()
grid::grid.panel(vp=viewport(width=0.8, height=0.8))
2- assemble the panels in a layout. This is much easier (and cleaner) with gtable,
library(gtable)
grid.newpage()
lg <- replicate(16, grobTree(rectGrob(), pointsGrob()), simplify=FALSE)
gt <- gtable_matrix("pairs", grobs=matrix(lg, ncol=4),
widths=unit(rep(1, 4), "null"),
heights=unit(rep(1, 4), "null"))
gt <- gtable_add_col_space(gt, width=unit(0.5,"line"))
gt <- gtable_add_row_space(gt, height=unit(0.5,"line"))
gt <- gtable_add_padding(gt, padding=unit(1,"line"))
grid.draw(gt)
If you want to build everything from scratch, here too you'll end up having to reinvent a good portion of gtable, I reckon.
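To make the gtable route concrete for the pairs plot, here is a minimal sketch that fills the matrix with actual scatter panels instead of placeholder grobs. The helper make_panel and the simulated X (plain rnorm, standing in for the rmvnorm sample) are assumptions for illustration only; axes and labels are left out.

```r
library(grid)
library(gtable)

set.seed(271)
X <- matrix(rnorm(400), ncol = 4)  # stand-in data (assumption)
N <- ncol(X)

# Hypothetical helper: one scatter panel (border + points) as a gTree
make_panel <- function(x, y) {
  dvp <- dataViewport(xData = x, yData = y)  # native scales taken from the data
  gTree(children = gList(
    rectGrob(),
    pointsGrob(x, y, pch = 20, size = unit(2, "pt"),
               default.units = "native", vp = dvp)
  ))
}

# expand.grid varies i fastest, matching the column-major fill of matrix()
idx <- expand.grid(i = seq_len(N), j = seq_len(N))
panels <- mapply(function(i, j) make_panel(X[, j], X[, i]),
                 idx$i, idx$j, SIMPLIFY = FALSE)

gt <- gtable_matrix("pairs", grobs = matrix(panels, ncol = N),
                    widths  = unit(rep(1, N), "null"),
                    heights = unit(rep(1, N), "null"))
gt <- gtable_add_col_space(gt, width = unit(0.5, "line"))
gt <- gtable_add_row_space(gt, height = unit(0.5, "line"))

grid.newpage()
grid.draw(gt)
```

Because gt is an ordinary object, nothing is drawn until grid.draw() is called, and individual panels can still be replaced or edited beforehand.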

Here's an attempt similar to grid.multipanel() but returning a gTree, and more specific to your pairs plot,
require(grid)
require(mvtnorm)
set.seed(271)
X <- rmvnorm(100, mean=1:4, sigma=diag(4:1)) # goal: draw this in a pairs plot
panelGrob <- function(x=runif(10, -10, 10), y=runif(10, -10, 100), ...,
xlim = range(x), ylim=range(y),
axis.x=TRUE, axis.y=TRUE){
xx <- pretty(x) ; yy <- pretty(y)
xx <- xx[xx <= xlim[2] & xx >= xlim[1]]
yy <- yy[yy <= ylim[2] & yy >= ylim[1]]
r <- rectGrob()
dvp <- dataViewport(xData=xx, yData=yy)
p <- pointsGrob(x, y, pch=".", gp=gpar(col="red"), default.units="native",
vp = dvp)
ax <- if(axis.x) xaxisGrob(at=xx, vp=dvp) else nullGrob()
ay <- if(axis.y) yaxisGrob(at=yy, vp=dvp) else nullGrob()
grobTree(r, ax, ay, p, ...)
}
grid.panel <- function(...)
grid.draw(panelGrob(...))
grid.newpage()
grid.panel(vp=viewport(width=0.8, height=0.8))
pairsGrob <- function(X, ..., name=NULL, gp=NULL, vp=NULL){
N <- NCOL(X)
layout <- grid.layout(N+1, N+1,
widths=unit(c(2, rep(1, N)), c("lines", rep("null", N))),
heights = unit(c(rep(1, N), 2), c(rep("null", N), "lines")))
wrap <- function(ii, jj, ...){
panelGrob(X[,ii], X[,jj], ..., axis.x= ii == N, axis.y = jj == 1,
vp=viewport(layout.pos.row=ii, layout.pos.col=jj+1))
}
rowcol <- expand.grid(ii=seq_len(N), jj=seq_len(N))
gl <- mapply(wrap, ii=rowcol[,"ii"], jj=rowcol[,"jj"], MoreArgs=list(...),
SIMPLIFY=FALSE)
gTree(children=do.call(gList, gl), vp=viewport(layout=layout))
}
grid.pairs <- function(...) grid.draw(pairsGrob(...))
grid.newpage()
grid.pairs(X, xlim=c(-10,10), ylim=c(-10,10))
Many problems are already apparent: i) it's cumbersome to add spacings in the layout, keeping track of the right viewports; ii) most parameters of the panel function are hard-wired (point shape, colour, grid, axis labels, ...), be prepared for an explosion in complexity, as in args(lattice::panel.xyplot); iii) the range of the axes should match across one row / column, which requires some thought about splitting the data properly in groups (faceting in ggplot2 or lattice); iv) the legend is yet another thing to reinvent in grid; v) ...
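For point iii), one possible approach is to compute a single padded range per variable up front and build every panel viewport from those shared ranges. This is only a sketch: panel_vp is a hypothetical helper, X is simulated stand-in data, and the 5% padding is an arbitrary choice.

```r
library(grid)

set.seed(271)
X <- matrix(rnorm(400), ncol = 4)  # stand-in data (assumption)

# One padded range per variable: a 2 x N matrix (row 1 = lower, row 2 = upper)
lims <- apply(X, 2, extendrange, f = 0.05)

# Hypothetical helper: the panel in row i, column j shows variable j on x
# and variable i on y, so x scales agree within a column and y scales
# agree within a row
panel_vp <- function(i, j) {
  viewport(xscale = lims[, j], yscale = lims[, i])
}

vp_a <- panel_vp(1, 2)
vp_b <- panel_vp(3, 2)  # same column as vp_a, built from the same x limits
```

Axis ticks can then be computed once per variable, for example with pretty(lims[, j]), rather than once per panel.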

Related

R: How to plot 4 graphs using a loop statement with the plot function

I have the following code, which creates a plot; the data is located here
Data
data<-lidar
x<-lidar$range
y<-lidar$logratio
h<-20
par(mfrow=c(2,2))
r<-max(x)-min(x)
bn<-ceiling(r/h)
binwidth=c(5,10,30,100)
#Creates a matrix to handle the data of same length
W<-matrix(nrow=length(x),ncol=bn)
for (j in 1:bn){
for (i in 1:length(x)){
if (x[i]>=(min(x)+(j-1)*h) && x[i]<=(min(x)+(j)*h)){W[i,j]=1}
else {W[i,j]=0}
}
}
#Sets up the y-values of the bins
fit<-rep(0,bn)
for (j in 1:bn){
fit[j]<- sum(y*W[,j]/sum(W[,j]))
}
#Sets up the x values of the bins
t<-numeric(bn)
for (j in 1:bn){
t[j]=(min(x)+0.5*h)+(j-1)*h
}
plot(x,y)
lines(t,fit,type = "S", col = 1, lwd = 2)
This creates a single plot in the top-left corner of the page, since I have set
par(mfrow=c(2,2))
Is there a way to write a for loop that plots 4 graphs on that one page, using h values of 5, 10, 30 and 100 (the values stored in the variable binwidth), so that I don't have to change h manually every time to produce a new plot? My final result should look like this:
Essentially, I want to run the code 4 times with different values of h, using another for loop that plots all 4 results without me having to change h each time. Any help or hints are greatly appreciated.
Here's a fully reproducible example that loads the data directly from the url then uses the apply family to iterate through the different plots
lidar <- read.table(paste0("http://www.stat.cmu.edu/%7Elarry",
"/all-of-nonpar/=data/lidar.dat"),
header = TRUE)
par(mfrow = c(2, 2))
breaks <- lapply(c(5, 10, 30, 100), function(i) {
val <- seq(min(lidar$range), max(lidar$range), i)
c(val, max(val) + i)})
means <- lapply(breaks, function(i) {
vals <- tapply(lidar$logratio,
cut(lidar$range, breaks = i, include.lowest = TRUE), mean)
c(vals[1], vals)})
invisible(mapply(function(a, b) {
plot(lidar$range, lidar$logratio)
lines(a, b, type = "S", lwd = 2)
}, breaks, means))
Created on 2020-09-25 by the reprex package (v0.3.0)
To answer your question directly: keep the other parameters the same:
data<-lidar
x<-lidar$range
y<-lidar$logratio
h<-20
par(mfrow=c(2,2))
r<-max(x)-min(x)
bn<-ceiling(r/h)
binwidth=c(5,10,30,100)
wrap the plotting code in a function (not necessary, but good practice)
doplot = function(h){
# recompute the number of bins, since it depends on h
bn<-ceiling(r/h)
#Creates a matrix to handle the data of same length
W<-matrix(nrow=length(x),ncol=bn)
for (j in 1:bn){
for (i in 1:length(x)){
if (x[i]>=(min(x)+(j-1)*h) && x[i]<=(min(x)+(j)*h)){W[i,j]=1}
else {W[i,j]=0}
}
}
#Sets up the y-values of the bins
fit<-rep(0,bn)
for (j in 1:bn){
fit[j]<- sum(y*W[,j]/sum(W[,j]))
}
#Sets up the x values of the bins
t<-numeric(bn)
for (j in 1:bn){
t[j]=(min(x)+0.5*h)+(j-1)*h
}
plot(x,y)
lines(t,fit,type = "S", col = 1, lwd = 2)
}
and then loop on the h parameter
for(h in c(5,10,30,100)){
doplot(h)
}
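The indicator-matrix loops can also be replaced by cut() and tapply(). The following is a sketch under stated assumptions: bin_means is a hypothetical helper, the data are simulated as a stand-in for lidar, and bins are half-open ([a, b), last bin closed), so boundary points are counted once rather than twice as in the original loops.

```r
# Hypothetical helper: mean of y within bins of width h along x
bin_means <- function(x, y, h) {
  breaks <- seq(min(x), max(x) + h, by = h)  # last break lies beyond max(x)
  bin <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
  data.frame(mid = head(breaks, -1) + h / 2,          # bin midpoints
             fit = as.numeric(tapply(y, bin, mean)))  # NA for empty bins
}

# Simulated stand-in for the lidar data (assumption)
set.seed(1)
x <- sort(runif(200, 390, 720))
y <- -0.6 / (1 + exp(-(x - 600) / 30)) + rnorm(200, sd = 0.05)

par(mfrow = c(2, 2))
for (h in c(5, 10, 30, 100)) {
  b <- bin_means(x, y, h)
  plot(x, y, main = paste("h =", h))
  lines(b$mid, b$fit, type = "S", lwd = 2)
}
```

The number of bins is derived from h inside the helper, so nothing has to be recomputed by hand when h changes.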
A general comment: you could gain a lot by learning how to use data.frames, a bit of dplyr or data.table, and ggplot2 for this. I feel that you could replicate your entire code and plots in 10 more comprehensible lines.

Putting multiple graphs (ggplot2 and other types) in one plot in R

Is there a way of mixing ggplot2 with other type of plots (survplot, plot, etc.). I have tried par and layout but nothing seems to be appropriate.
Thanks
I use the function grid.arrange from the package gridExtra.
You haven't provided sample data, but if you have 4 plots saved as "a", "b", "c" and "d", your code would be as follows:
grid.arrange(a, b, c, d, nrow=2, ncol=2)
You can use "?grid.arrange" to learn more about adding additional things into your plot, like a title, the heights of the images, etc.
grid.arrange(a, b, c, d, nrow=4, top="YourTitleHere", heights=c(3,1,3,1))
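As a self-contained illustration of the top and heights arguments, here is a sketch that uses plain grid grobs as stand-ins for the four plots (g1 to g4 are assumptions; in practice they would be ggplot objects or other grobs). arrangeGrob() builds the layout without drawing it, and plain numeric heights are treated as relative sizes.

```r
library(grid)
library(gridExtra)

# Stand-in grobs (assumption): replace with real plots
g1 <- rectGrob(gp = gpar(fill = "grey85"))
g2 <- textGrob("plot 2")
g3 <- circleGrob(r = 0.4)
g4 <- textGrob("plot 4")

# 2 x 2 layout; the first row is three times as tall as the second
g <- arrangeGrob(g1, g2, g3, g4, nrow = 2, ncol = 2,
                 top = "YourTitleHere", heights = c(3, 1))

grid.newpage()
grid.draw(g)
```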
There exists a nice function, multiplot, which I always have loaded in my own standard library. It can be googled, but here it is.
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols: Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
library(ggplot2)
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
library(grid)
# Make a list from the ... arguments and plotlist
plots <- c(list(...), plotlist)
numPlots = length(plots)
# If layout is NULL, then use 'cols' to determine layout
if (is.null(layout)) {
# Make the panel
# ncol: Number of columns of plots
# nrow: Number of rows needed, calculated from # of cols
layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
ncol = cols, nrow = ceiling(numPlots/cols))
}
if (numPlots==1) {
print(plots[[1]])
} else {
# Set up the page
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
# Make each plot, in the correct location
for (i in 1:numPlots) {
# Get the i,j matrix positions of the regions that contain this subplot
matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
layout.pos.col = matchidx$col))
}
}
}
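The layout argument is easiest to understand by looking at the lookup that multiplot performs for each plot. A small sketch, using the example matrix from the comments above:

```r
# Plot 1 top left, plot 2 top right, plot 3 across the whole bottom row
layout <- matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE)

# Same lookup as inside multiplot(): all cells occupied by plot 3
pos <- which(layout == 3, arr.ind = TRUE)
pos[, "row"]  # rows spanned by plot 3
pos[, "col"]  # columns spanned by plot 3
```

The row and column indices found this way are what get passed to viewport(layout.pos.row = , layout.pos.col = ), which is how one plot can span several cells.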

how to plot a figure with specific distance between each line

I am trying to plot a figure, but all the columns (lines) are drawn on top of each other, so the plot is not representative. Below I simulate some data, show how I plot it, and show what I want.
I don't know how to simulate data like the example I show below, but here is what I do:
set.seed(1)
M <- matrix(rnorm(20),20,5)
x <- as.matrix(sort(runif(20, 5.0, 7.5)))
df <- as.data.frame(cbind(x,M))
After making the data frame, I will plot all columns versus the first one by melting it and using ggplot
require(ggplot2)
require(reshape)
dff <- melt(df , id.vars = 'V1')
b <- ggplot(dff, aes(V1,value)) + geom_line(aes(colour = variable))
I want to have a specific distance between each line (in this case we have 6), something like the picture below. One dimension is V1, the other is the column number. I don't care about the function; I just want the picture.
This solution uses rgl and produces this plot:
It uses this function that accepts 3 arguments:
df : a data.frame just like your 'M' above
x : a numeric vector (or a 1-column `data.frame`) for the x-axis
cols : (optional) a vector of colours to repeat. If missing, black lines are drawn
Here is the function:
nik_plot <- function(df, x, cols){
require(rgl)
# if x is a 1-column data.frame, convert it to a numeric vector
if (is.data.frame(x) && ncol(x)==1)
x <- as.numeric(x[, 1])
# prepare a vector of colors, one per column (i.e. per line drawn)
if (missing(cols))
cols <- rep_len("#000000", ncol(df))
else
cols <- rep_len(cols, ncol(df))
# initialize an empty 3D plot
plot3d(NA, xlim=range(x), ylim=c(1, ncol(df)-1), zlim=range(df), xlab="Mass/Charge (M/Z)", ylab="Time", zlab="Ion Spectra", box=FALSE)
# draw lines, silently
silence_please <- sapply(1:ncol(df), function(i) lines3d(x=x, y=i, z=df[, i], col=cols[i]))
}
Note that you can remove require(rgl) from the function and call library(rgl) somewhere in your script instead, e.g. at the beginning.
If you don't have rgl installed, then install.packages("rgl").
Black lines, the default, may produce some moiré effect, but a repeating color palette is worse. This may be brain-dependent. A single colour would also avoid introducing an artificial dimension (and a strong one).
An example below:
# black lines
nik_plot(M, x)
# as in the image above
nik_plot(M, x, "grey40")
# an unreadable rainbow
nik_plot(M, x, rainbow(12))
The 3D window can be navigated with the mouse.
Do you need something else?
EDIT
You can build your second plot with the function below. The range of your data is very large, and I think the whole idea of shifting every line upwards prevents having a y-axis with a reliable scale. Here I have normalized all signals (0 <= signal <= 1). The parameter gap can also be used to play with this. We could decouple the two behaviours, but I think it's nice this way. Try different values of gap and see the examples below.
df : a data.frame just like your 'M' above
x : a numeric vector (or a 1-column `data.frame`) for the x-axis
cols : (optional) a vector of colours to repeat. If missing, black lines are drawn
gap : gap factor between individual lines
more_gap_each: every n lines, a bigger gap is produced...
more_gap_relative: ... and will be gap x more_gap_relative wide
Here is the function:
nik_plot2D <- function(df, x, cols, gap=10, more_gap_each=1, more_gap_relative=0){
if (is.data.frame(x) && ncol(x)==1)
x <- as.numeric(x[, 1])
# we normalize ( 0 <= signal <= 1)
df <- df-min(df)
df <- (df/max(df))
# we prepare a vector of colors, one per column (i.e. per line drawn)
if (missing(cols))
cols <- rep_len("#00000055", ncol(df))
else
cols <- rep_len(cols, ncol(df))
# we prepare gap handling. there is probably a more elegant way
gaps <- 1
for (i in 2:ncol(df))
gaps[i] <- gaps[i - 1] + 1/gap + ifelse((i %% more_gap_each) == 0, (1/gap)*more_gap_relative, 0)
# we initialize the plot
plot(NA, xlim=range(x), ylim=c(min(df), 1+max(gaps)), xlab="Time", ylab="", axes=FALSE, mar=rep(0, 4))
axis(1)
# finally, the lines
silent <- lapply(1:ncol(df), function(i) lines(x, df[, i] + gaps[i], col=cols[i]))
}
We can use it with (default):
nik_plot2D(M, x) # gap=10
And you obtain this plot:
or:
nik_plot2D(M, x, 50)
or, with colors:
nik_plot2D(M, x, gap=20, cols=1:3)
nik_plot2D(M, x, gap=20, cols=rep(1:3, each=5))
or, still with colours but with larger gaps:
nik_plot2D(M, x, gap=20, cols=terrain.colors(10), more_gap_each = 1, more_gap_relative = 0) # no gap by default
nik_plot2D(M, x, gap=20, cols=terrain.colors(10), more_gap_each = 10, more_gap_relative = 4) # large gaps every 10 lines
nik_plot2D(M, x, gap=20, cols=terrain.colors(10), more_gap_each = 5, more_gap_relative = 2) # small gaps every 5 lines
As others have pointed out, your data have very large peaks, and it's not clear whether you want to allow some curves to overlap,
m <- read.table("~/Downloads/M.txt", head=T)
fudge <- 0.05
shifty <- function(m, fudge=1){
shifts <- fudge * max(abs(apply(m, 2, diff))) * seq(0, ncol(m)-1)
m + matrix(shifts, nrow=nrow(m), ncol=ncol(m), byrow=TRUE)
}
par(mfrow=c(1,2), mar=c(0,0,1,0))
cols <- colorRampPalette(blues9[4:9])(ncol(m))
matplot(shifty(m), t="l", lty=1, bty="n", yaxt="n", xaxt="n", ylab="", col=cols)
title("no overlap")
matplot(shifty(m, 0.05), t="l", lty=1, bty="n", yaxt="n", xaxt="n", ylab="", col=cols)
title("some overlap")
Alternatively, some outlier/peak detection scheme could be used to filter them out before calculating the shift between curves,
library(outliers)
shifty2 <- function(m, outliers = 10){
tmp <- m
for(ii in seq_len(outliers)) tmp <- rm.outlier(tmp, median = TRUE)
shifts <- max(abs(apply(tmp, 2, diff))) * seq(0, ncol(m)-1)
m + matrix(shifts, nrow=nrow(m), ncol=ncol(m), byrow=TRUE)
}
matplot(shifty2(m), t="l", lty=1, bty="n", yaxt="n", xaxt="n", ylab="", col=cols)
(there are probably good algorithms to decide which points to remove, but I don't know them)
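Since M.txt is not available here, the shifting idea can be tried on simulated curves. This is a sketch, not the method above: offset_columns is a hypothetical helper, and the gap is taken as the full data range (rather than the largest jump between successive points), which guarantees that consecutive curves never overlap.

```r
set.seed(1)
m <- sapply(1:6, function(k) cumsum(rnorm(50)))  # 50 x 6 simulated curves

# Hypothetical helper: add (k - 1) * gap to column k
offset_columns <- function(m, gap) {
  sweep(m, 2, gap * (seq_len(ncol(m)) - 1), "+")
}

gap <- diff(range(m))  # full data range, so shifted curves cannot cross
shifted <- offset_columns(m, gap)

matplot(shifted, type = "l", lty = 1, yaxt = "n", ylab = "")
```

Multiplying gap by a factor below 1 allows some overlap, in the same spirit as the fudge argument of shifty().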
For 3D plotting I prefer the rgl package. This should be close to your desired solution.
The color of each scan changes on every third one.
library(rgl)
M<-read.table("M.txt", sep="\t", header = TRUE, colClasses = "numeric")
x<-read.table("x.txt", sep="\t", header = TRUE)
n<-ncol(M)
M[M<1]<-1
plot3d(x='', xlim=range(x$Time), ylim=c(1, n), zlim=(range(M)), box=FALSE)
sapply(seq(1,n), function(t){lines3d(x$Time, y=t*10, z=(M[,t])/10000, col=t/3+1)})
title3d(xlab="scan", ylab="time", zlab="intensity")
title3d(main ="Extracted Spectra Subset")
axes3d()
#axis3d(edge="x")
#axis3d(edge="y")
#axis3d(edge="z")
Due to the huge differences in magnitude of the data points, I needed to scale some factors to make a readable graph. The intensity goes from 0 to nearly 1,000,000, thus distorting the graph. I attempted to normalize by taking the ln, but the plot became unreadable.

R: multiple ggplot2 plot using d*ply

I know variations on this question have been up several times, but couldn't figure out how to apply those solutions to this particular challenge:
I would like to use ggplot inside a d*ply call to plot the data (data frame dat below) broken up by the v3 variable and display the numeric variable v2 for the 3 conditions in v1. I want to have the plots on one page (pdf), so I thought I could use dlply to collect the resulting plots in a list that could then be fed to the multiplot wrapper function for ggplot2 found in 'Cookbook for R' here
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols: Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
require(grid)
# Make a list from the ... arguments and plotlist
plots <- c(list(...), plotlist)
numPlots = length(plots)
# If layout is NULL, then use 'cols' to determine layout
if (is.null(layout)) {
# Make the panel
# ncol: Number of columns of plots
# nrow: Number of rows needed, calculated from # of cols
layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
ncol = cols, nrow = ceiling(numPlots/cols))
}
if (numPlots==1) {
print(plots[[1]])
} else {
# Set up the page
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
# Make each plot, in the correct location
for (i in 1:numPlots) {
# Get the i,j matrix positions of the regions that contain this subplot
matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
layout.pos.col = matchidx$col))
}
}
}
Here is a toy data frame:
set.seed(999)
dat <- data.frame(
v1 = rep(c("A","B","C"),25),
v2 = runif(75,-1,2),
v3 = sample(c("hippo", "smoke", "meat"), 75, replace=T))
Here is the best I could come up with. It produces the plots separately but doesn't merge them, and it gives strange output in the console. Note that any solution not using multiplot() is just as good for me.
require(dplyr)
require(ggplot2)
p <- dlply(dat, .(v3), function(x){
ggplot(x,aes(v1, v2)) +
geom_point()})
multiplot(plotlist=p, cols=2)
Here's a different way that avoids multiplot() and uses techniques shown here and here:
library(ggplot2)
library(dplyr)
results <- dat %>%
group_by(v3) %>%
do(plot = ggplot(., aes(v1, v2)) + geom_point())
pdf('all.pdf')
invisible(lapply(results$plot, print))
dev.off()
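If the goal is simply one page with one panel per level of v3, faceting may be enough and avoids building a list of plots altogether. A minimal sketch, reusing the toy data frame from the question; the output path in tempdir() and the page size are arbitrary assumptions:

```r
library(ggplot2)

set.seed(999)
dat <- data.frame(
  v1 = rep(c("A", "B", "C"), 25),
  v2 = runif(75, -1, 2),
  v3 = sample(c("hippo", "smoke", "meat"), 75, replace = TRUE))

# One panel per level of v3, arranged in two columns on a single page
p <- ggplot(dat, aes(v1, v2)) +
  geom_point() +
  facet_wrap(~ v3, ncol = 2)

out <- file.path(tempdir(), "all.pdf")  # hypothetical output location
ggsave(out, p, width = 7, height = 7)
```

Unlike the do() approach, which writes one plot per PDF page, this puts all groups on the same page with shared axes.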

Sub Column Names on Grid Extra

I'm trying to create a table using the gridExtra package in R, and I want to have sub column names under a general column name. For example have one large column titled "Urbana-Champaign" that spans over two smaller column names "element" and "number of genes." I have looked everywhere on the gridExtra support site but can't seem to find a way to create overall column names that encompass subcolumns. Does anyone know how?
It's rather easy to get a basic gtable, and add new text to it, but you'd have to add all the formatting and styling of the cells. That's where I always give up -- way too many parameters and options to take care of.
library(grid) # for textGrob, grobHeight, grobWidth, unit.c
library(gtable)
gtable_add_grobs <- gtable_add_grob #misleading name
d <- head(iris, 3)
extended_matrix <- cbind(c("", rownames(d)), rbind(colnames(d), as.matrix(d)))
all_grobs <- matrix(lapply(extended_matrix, textGrob), ncol=ncol(d) + 1)
row_heights <- function(m){
do.call(unit.c, apply(m, 1, function(l)
max(do.call(unit.c, lapply(l, grobHeight)))))
}
col_widths <- function(m){
do.call(unit.c, apply(m, 2, function(l)
max(do.call(unit.c, lapply(l, grobWidth)))))
}
g <- gtable_matrix("table", grobs=all_grobs,
widths=col_widths(all_grobs) + unit(4,"mm"),
heights=row_heights(all_grobs) + unit(4,"mm"))
g <- gtable_add_rows(g, unit(1, "line"), 0)
g <- gtable_add_grobs(g, list(textGrob("Sepal's main title"),
textGrob("Petal's main title")),
t=1,b=1,l=c(2, 4), r=c(3, 5))
grid.newpage()
grid.draw(g)
