Visual Comparison of Regression & PCA - r

I'm trying to perfect a method for comparing regression and PCA, inspired by the blog Cerebral Mastication, which has also been discussed from a different angle on SO. Before I forget, many thanks to JD Long and Josh Ulrich for much of the core of this. I'm going to use this in a course next semester. Sorry this is long!
UPDATE: I found a different approach which almost works (please fix it if you can!). I posted it at the bottom. A much smarter and shorter approach than I was able to come up with!
I basically followed the previous schemes up to a point: Generate random data, figure out the line of best fit, draw the residuals. This is shown in the second code chunk below. But I also dug around and wrote some functions to draw lines normal to a line through a random point (the data points in this case). I think these work fine, and they are shown in First Code Chunk along with proof they work.
Now, the Second Code Chunk shows the whole thing in action using the same flow as @JDLong, and I'm adding an image of the resulting plot. Data in black, red is the regression with residuals pink, blue is the 1st PC and the light blue should be the normals, but obviously they are not. The functions in First Code Chunk that draw these normals seem fine, but something is not right with the demonstration: I think I must be misunderstanding something or passing the wrong values. My normals come in horizontal, which seems like a useful clue (but so far, not to me). Can anyone see what's wrong here?
Thanks, this has been vexing me for a while...
First Code Chunk (Functions to Draw Normals and Proof They Work):
##### The functions below are based very loosely on the citation at the end
pointOnLineNearPoint <- function(Px, Py, slope, intercept) {
# Px, Py is the point to test, can be a vector.
# slope, intercept is the line to check distance.
Ax <- Px-10*diff(range(Px))
Bx <- Px+10*diff(range(Px))
Ay <- Ax * slope + intercept
By <- Bx * slope + intercept
pointOnLine(Px, Py, Ax, Ay, Bx, By)
}
pointOnLine <- function(Px, Py, Ax, Ay, Bx, By) {
# This approach based upon comingstorm's answer on
# stackoverflow.com/questions/3120357/get-closest-point-to-a-line
# Vectorized by Bryan
PB <- data.frame(x = Px - Bx, y = Py - By)
AB <- data.frame(x = Ax - Bx, y = Ay - By)
PB <- as.matrix(PB)
AB <- as.matrix(AB)
k_raw <- k <- c()
for (n in 1:nrow(PB)) {
k_raw[n] <- (PB[n,] %*% AB[n,])/(AB[n,] %*% AB[n,])
if (k_raw[n] < 0) { k[n] <- 0
} else { if (k_raw[n] > 1) k[n] <- 1
else k[n] <- k_raw[n] }
}
x = (k * Ax + (1 - k)* Bx)
y = (k * Ay + (1 - k)* By)
ans <- data.frame(x, y)
ans
}
# The following proves that pointOnLineNearPoint
# and pointOnLine work properly and accept vectors
par(mar = c(4, 4, 4, 4)) # otherwise the plot is slightly distorted
# and right angles don't appear as right angles
m <- runif(1, -5, 5)
b <- runif(1, -20, 20)
plot(-20:20, -20:20, type = "n", xlab = "x values", ylab = "y values")
abline(b, m )
Px <- rnorm(10, 0, 4)
Py <- rnorm(10, 0, 4)
res <- pointOnLineNearPoint(Px, Py, m, b)
points(Px, Py, col = "red")
segments(Px, Py, res[,1], res[,2], col = "blue")
##========================================================
##
## Credits:
## Theory by Paul Bourke http://local.wasp.uwa.edu.au/~pbourke/geometry/pointline/
## Based in part on C code by Damian Coventry Tuesday, 16 July 2002
## Based on VBA code by Brandon Crosby 9-6-05 (2 dimensions)
## With grateful thanks for answering our needs!
## This is an R (http://www.r-project.org) implementation by Gregoire Thomas 7/11/08
##
##========================================================
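As a cross-check on the clamped, looped projection above, the foot of the perpendicular can also be written in closed form when the line is given as y = intercept + slope * x. This is only a sketch under assumptions, not the code used in the question: project_onto_line is a hypothetical name, and it treats the line as infinite rather than clamping to a segment.

```r
# Hypothetical helper (not from the original post): foot of the perpendicular
# from each point (px, py) onto the infinite line y = intercept + slope * x.
project_onto_line <- function(px, py, slope, intercept) {
  # x-coordinate where the connecting segment is perpendicular to the line
  x <- (px + slope * (py - intercept)) / (1 + slope^2)
  y <- intercept + slope * x
  data.frame(x = x, y = y)
}

set.seed(1)
px <- rnorm(10, 0, 4)
py <- rnorm(10, 0, 4)
feet <- project_onto_line(px, py, slope = 2, intercept = -3)

# Dot product of each connecting segment with the line direction (1, slope);
# it should be numerically zero if the segments are true normals.
perp_check <- (px - feet$x) * 1 + (py - feet$y) * 2
```

Drawn with segments(px, py, feet$x, feet$y) on a plot created with asp = 1, these should appear as right angles to the line.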
Second Code Chunk (Plots the Demonstration):
set.seed(55)
np <- 10 # number of data points
x <- 1:np
e <- rnorm(np, 0, 60)
y <- 12 + 5 * x + e
par(mar = c(4, 4, 4, 4)) # otherwise the plot is slightly distorted
plot(x, y, main = "Regression minimizes the y-residuals & PCA the normals")
yx.lm <- lm(y ~ x)
lines(x, predict(yx.lm), col = "red", lwd = 2)
segments(x, y, x, fitted(yx.lm), col = "pink")
# pca "by hand"
xyNorm <- cbind(x = x - mean(x), y = y - mean(y)) # mean centers
xyCov <- cov(xyNorm)
eigenValues <- eigen(xyCov)$values
eigenVectors <- eigen(xyCov)$vectors
# Add the first PC by denormalizing back to original coords:
new.y <- (eigenVectors[2,1]/eigenVectors[1,1] * xyNorm[x]) + mean(y)
lines(x, new.y, col = "blue", lwd = 2)
# Now add the normals
yx2.lm <- lm(new.y ~ x) # zero residuals: already a line
res <- pointOnLineNearPoint(x, y, yx2.lm$coef[2], yx2.lm$coef[1])
points(res[,1], res[,2], col = "blue", pch = 20) # segments should end here
segments(x, y, res[,1], res[,2], col = "lightblue1") # the normals
############ UPDATE
Over at Vincent Zoonekynd's Page I found almost exactly what I wanted. But, it doesn't quite work (obviously used to work). Here is a code excerpt from that site which plots normals to the first PC reflected through a vertical axis:
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)
plot(y~x, asp = 1)
r <- lm(y~x)
abline(r, col='red')
r <- princomp(cbind(x,y))
b <- r$loadings[2,1] / r$loadings[1,1]
a <- r$center[2] - b * r$center[1]
abline(a, b, col = "blue")
title(main='Appears to use the reflection of PC1')
u <- r$loadings
# Projection onto the first axis
p <- matrix( c(1,0,0,0), nrow=2 )
X <- rbind(x,y)
X <- r$center + solve(u, p %*% u %*% (X - r$center))
segments( x, y, X[1,], X[2,] , col = "lightblue1")
And here is the result:

Alright, I'll have to answer my own question! After further reading and comparison of methods that people have put on the internet, I have solved the problem. I'm not sure I can clearly state what I "fixed" because I went through quite a few iterations. Anyway, here is the plot and the code (MWE). The helper functions are at the end for clarity.
# Comparison of Linear Regression & PCA
# Generate sample data
set.seed(39) # gives a decent-looking example
np <- 10 # number of data points
x <- -np:np
e <- rnorm(length(x), 0, 10)
y <- rnorm(1, 0, 2) * x + 3*rnorm(1, 0, 2) + e
# Plot the main data & residuals
plot(x, y, main = "Regression minimizes the y-residuals & PCA the normals", asp = 1)
yx.lm <- lm(y ~ x)
lines(x, predict(yx.lm), col = "red", lwd = 2)
segments(x, y, x, fitted(yx.lm), col = "pink")
# Now the PCA using built-in functions
# rotation = loadings = eigenvectors
r <- prcomp(cbind(x,y), retx = TRUE)
b <- r$rotation[2,1] / r$rotation[1,1] # gets slope of loading/eigenvector 1
a <- r$center[2] - b * r$center[1]
abline(a, b, col = "blue") # Plot 1st PC
# Plot normals to 1st PC
X <- pointOnLineNearPoint(x, y, b, a)
segments( x, y, X[,1], X[,2], col = "lightblue1")
###### Needed Functions
pointOnLineNearPoint <- function(Px, Py, slope, intercept) {
# Px, Py is the point to test, can be a vector.
# slope, intercept is the line to check distance.
Ax <- Px-10*diff(range(Px))
Bx <- Px+10*diff(range(Px))
Ay <- Ax * slope + intercept
By <- Bx * slope + intercept
pointOnLine(Px, Py, Ax, Ay, Bx, By)
}
pointOnLine <- function(Px, Py, Ax, Ay, Bx, By) {
# This approach based upon comingstorm's answer on
# stackoverflow.com/questions/3120357/get-closest-point-to-a-line
# Vectorized by Bryan
PB <- data.frame(x = Px - Bx, y = Py - By)
AB <- data.frame(x = Ax - Bx, y = Ay - By)
PB <- as.matrix(PB)
AB <- as.matrix(AB)
k_raw <- k <- c()
for (n in 1:nrow(PB)) {
k_raw[n] <- (PB[n,] %*% AB[n,])/(AB[n,] %*% AB[n,])
if (k_raw[n] < 0) { k[n] <- 0
} else { if (k_raw[n] > 1) k[n] <- 1
else k[n] <- k_raw[n] }
}
x = (k * Ax + (1 - k)* Bx)
y = (k * Ay + (1 - k)* By)
ans <- data.frame(x, y)
ans
}
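The plot title claims that regression minimizes the y-residuals while PCA minimizes the normals. A small numerical check can make that concrete. The sketch below uses its own simulated data and hypothetical helper names (ss_vertical, ss_perpendicular); it is an illustration, not part of the original solution.

```r
set.seed(39)
x <- -10:10
y <- 0.8 * x + rnorm(length(x), 0, 5)

# Line 1: ordinary least squares regression of y on x
fit <- lm(y ~ x)
a_lm <- unname(coef(fit)[1])
b_lm <- unname(coef(fit)[2])

# Line 2: first principal component, passing through the centroid
pca <- prcomp(cbind(x, y))
b_pc <- unname(pca$rotation[2, 1] / pca$rotation[1, 1])
a_pc <- unname(pca$center[2] - b_pc * pca$center[1])

# Sum of squared vertical residuals for a line y = a + b * x
ss_vertical <- function(a, b) sum((y - a - b * x)^2)
# Sum of squared perpendicular distances to the same line
ss_perpendicular <- function(a, b) sum((y - a - b * x)^2) / (1 + b^2)

criteria <- rbind(
  regression = c(vertical = ss_vertical(a_lm, b_lm),
                 perpendicular = ss_perpendicular(a_lm, b_lm)),
  pc1        = c(vertical = ss_vertical(a_pc, b_pc),
                 perpendicular = ss_perpendicular(a_pc, b_pc))
)
print(criteria)
```

The regression row should have the smaller value in the vertical column, and the pc1 row the smaller value in the perpendicular column.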

Try changing this line of your code:
res <- pointOnLineNearPoint(x, y, yx2.lm$coef[2], yx2.lm$coef[1])
to
res <- pointOnLineNearPoint(x, new.y, yx2.lm$coef[2], yx2.lm$coef[1])
So you're calling the correct y values.

In Vincent Zoonekynd's code, change the line u <- r$loadings to u <- solve(r$loadings). In the second instance of solve(), the predicted component scores along the first principal axis (i.e., the matrix of predicted scores with the second component's scores set to zero) need to be multiplied by the inverse of the loadings/eigenvectors. Multiplying data by the loadings gives predicted scores; dividing predicted scores by the loadings gives data. Hope that helps.
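The round trip described here (data to scores, zero the second component, scores back to data) can be sketched as follows. This is an illustration under assumptions: it uses prcomp rather than princomp, the variable names are made up for the example, and it relies on the rotation matrix being orthogonal so that its inverse is its transpose.

```r
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)
xy <- cbind(x, y)

pca <- prcomp(xy)                       # rotation = loadings = eigenvectors
centered <- sweep(xy, 2, pca$center)    # subtract the column means
scores <- centered %*% pca$rotation     # data -> component scores
scores[, 2] <- 0                        # keep only the first component
# scores -> data: multiply by the inverse (here the transpose) of the rotation
projected <- sweep(scores %*% t(pca$rotation), 2, pca$center, "+")

if (interactive()) {
  plot(xy, asp = 1)
  segments(x, y, projected[, 1], projected[, 2], col = "lightblue1")
}
```

The rows of projected are the points on the first principal axis closest to each observation, so the segments drawn in the interactive branch are the normals.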

Related

Creating a 3D surface plot from two vectors and a matrix

I have two vectors and a 2D matrix from which I want to create a 3D surface plot. I have already split my data into X and Y (vectors: time "t" and wavelength "w") and Z (matrix: absorbance "NIR" at each time and wavelength), with matching numbers of rows and columns respectively:
t = matrix(1:456, ncol= 1)
w = matrix(1350:1650, nrow = 1)
NIR = as.matrix(read.table("NIR_alle_pur.txt", header = TRUE, dec =","))
colnames(NIR) = c(paste0("NIR.", 1350:1650))
dim(NIR)
# [1] 456 301
dput(NIR_example)
structure(c(60771.93, 57230.56, 56235.96, 41617.47, 41709.93,
57466.6, 59916.97, 63376.4, 41966.73, 41254.34, 65535, 61468.76,
65535, 41238.03, 42530.97, 56936.03, 65009.4, 65535, 40375.5,
41021.6, 62757, 65455.44, 63795.6, 41349.6, 41178.2), .Dim = c(5L,
5L), .Dimnames = list(NULL, c("NIR.Spectrum_1350.0000000", "NIR.Spectrum_1351.0000000",
"NIR.Spectrum_1352.0000000", "NIR.Spectrum_1353.0000000", "NIR.Spectrum_1354.0000000"
)))
I tried to insert those into the rgl.surface function, but I get the following error message:
Error in rgl.surface(x, y, z, coords = 1:3) : Bad dimension for rows
I've also tried to plot them with plotly, but my success was equally low.
Can someone give me some input on how I can get my spectral data to look like the last ones (multiple surfaces) on this site, individually? I'll try the overlay of the surfaces with plotly later on!
I am happy for every extra input and information on my level!
Thank you!
After looking at the source code, I'd guess the problem is that you stored your x and y vectors as matrices. If they are matrices, they need to be identical in shape to z.
As I mentioned in a comment, you should avoid using rgl.surface (and the other rgl.* functions in most cases), and use surface3d instead, or persp3d if you want axes.
The *3d functions are higher level functions that act more like other R functions, and they will lead to fewer problems in the long run.
You haven't posted any data, so I'll post a completely artificial example. Let's suppose z = x^2 + y^2 + a, where a is a different constant for each surface. Then you can plot it like this:
library(rgl) # provides persp3d, surface3d and aspect3d
x <- seq(-2, 2, length = 7)
y <- seq(-3, 3, length = 5) # I've chosen different ranges
# and lengths just to illustrate.
z <- outer(x, y, function(x, y) x^2 + y^2)
colours <- heat.colors(100)
minval <- min(z)
maxval <- max(z) + 10
col <- colours[(z - minval)/(maxval - minval)*99 + 1]
persp3d(x, y, z, col = col) # get axes the first time
z <- outer(x, y, function(x, y) x^2 + y^2 + 5)
col <- colours[(z - minval)/(maxval - minval)*99 + 1]
surface3d(x, y, z, col = col)
z <- outer(x, y, function(x, y) x^2 + y^2 + 10)
col <- colours[(z - minval)/(maxval - minval)*99 + 1]
surface3d(x, y, z, col = col)
aspect3d(1, 1, 1) # Make axes all equal
That produces this plot:
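Applied to the variable names in the question, the point about x and y being stored as matrices could be checked like this. The sketch uses a small made-up stand-in for the NIR matrix (the real file is not available), and the names t_vec, w_vec and dims_ok are hypothetical.

```r
# Hypothetical stand-in for the question's data: 6 time points x 4 wavelengths
t <- matrix(1:6, ncol = 1)
w <- matrix(1350:1353, nrow = 1)
NIR <- outer(1:6, 1:4, function(i, j) 40000 + 1000 * i + 500 * j)

# Drop the matrix shape so that x and y are plain numeric vectors
t_vec <- as.vector(t)
w_vec <- as.vector(w)

# z needs one row per x value and one column per y value
dims_ok <- length(t_vec) == nrow(NIR) && length(w_vec) == ncol(NIR)

# Only draw when rgl is installed and a display is likely available
if (dims_ok && interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::persp3d(t_vec, w_vec, NIR, col = "lightblue",
               xlab = "time", ylab = "wavelength", zlab = "absorbance")
  rgl::aspect3d(1, 1, 1)
}
```

With the real data, t_vec would have 456 entries and w_vec 301, matching the 456 x 301 shape reported by dim(NIR).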

using grid.arrange with loop holding multiple plots

I am trying to create a plot where for each i there is a density graph and a histogram side by side. For this instance i = 1..3
The problem I have is creating the list to pass to grid.arrange. However I do it, the output seems to repeat itself somehow.
df:
x1 x2 x3
1 108.28 17.05 1484.10
2 152.36 16.59 750.33
3 95.04 10.91 766.42
4 65.45 14.14 1110.46
5 62.97 9.52 1031.29
6 263.99 25.33 195.26
7 265.19 18.54 193.83
8 285.06 15.73 191.11
9 92.01 8.10 1175.16
10 165.68 11.13 211.15
X <- df
mu.X <- colMeans(X)
cov.X <- cov(X)
eg <- eigen(cov.X)
myprinboot = function(
X,
iter = 10000,
alpha = 0.05,
prettyPlot = T
){
# Find the dimensions of X
nrX <- dim(X)[1]
nx <- dim(X)[2]
# Make matrices of suitable sizes to hold the booted parameter estimates
# lambda
# each cov matrix will have nx lambdas
lambda.mat <- matrix(NA, nr = nx, nc = iter)
# e vectors nx components each and one vector per eigen value
# Each cov matrix will therefore produce a nx X nx matrix of components
Y.mat <- matrix(NA, nr = nx, nc = iter * nx)
# For loop to fill the matrices created above
for (i in 1:iter)
{
# ind will contain random integers used to make random samples of the X matrix
# Must use number of rows nrX to index
ind <- sample(1:nrX,nrX,replace=TRUE)
# eigen will produce lambdas in decreasing order of size
# make an object, then extract the list entries using $
eigvalvec <- eigen(cov(X[ind,]))
lambda.mat[,i] <- eigvalvec$values
colstart <- 1 + nx * (i - 1)
colend <- colstart + nx - 1
Y.mat[,colstart:colend] = eigvalvec$vectors
}
if(prettyPlot){
p <- list()
i <- 0
for(j in 1:(2*nx))
{
if (j %% 2 == 0){
p[[j]] <- ggplot(NULL, aes(lambda.mat[i,])) +
geom_histogram(color = 'black', fill = 'green', alpha = .5) +
xlab(substitute(lambda[i])) +
ggtitle(substitute(paste("Histogram of the pc variance ", lambda[i])))
} else {
i <- i + 1
p[[j]] <- ggplot(NULL, aes(lambda.mat[i,])) +
geom_density(fill = 'blue', alpha = .5) +
xlab((substitute(lambda[i]))) +
ggtitle(substitute(paste("Density plot of the pc variance ", lambda[i])))
}
do.call(grid.arrange, p)
}
do.call(grid.arrange, p)
} else {
layout(matrix(1:(2*nx),nr=nx,nc=2,byrow=TRUE))
for(i in 1:nx)
{
plot(density(lambda.mat[i,]),xlab=substitute(lambda[i]),
main=substitute(paste("Density plot of the pc variance ", lambda[i])
))
hist(lambda.mat[i,],xlab=substitute(lambda[i]),
main=substitute(paste("Histogram of the pc variance ", lambda[i])))
}
}
library(rgl)
plot3d(t(lambda.mat))
list(lambda.mat = lambda.mat, Y.mat = Y.mat)
}
pc <- myprinboot(X = Y, iter=1000, alpha=0.5)
Output
Anyone have any clue what I am doing wrong or is this just not possible?
I don't understand your code, Jay, as it seems to do lots of things and use both base and ggplot plotting, but if all you want is to create a combined histogram and density plot for each j, why not loop over j and inside that for j loop do something like this:
d <- your density plot created so that it depends on j only
h <- your histogram plot created so that it depends on j only
p[[j]] <- grid.arrange(d,h,ncol=2)
Then, when you come out of the loop, you'll have an object p which consists of a list of plots, with each plot consisting of a combination of density plot and histogram.
Then you could use the cowplot package (after installing it) to do something like this:
cowplot::plot_grid(plotlist = p, ncol = 2)
where the number of columns may need to be changed. See here for other ways to plot a list of plots: How do I arrange a variable list of plots using grid.arrange?
I don't know enough about your problem to understand why you treat the case of j even and j odd differently. But the underlying idea should be the same as what I suggested here.
I eventually got this working as follows.
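library(ggplot2)   # qplot
library(gridExtra) # arrangeGrob, marrangeGrob
library(grid)      # textGrob, gpar
library(latex2exp) # TeX
getHist <- function(x, i){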
lam <- paste('$\\lambda_', i, '$', sep='')
p <- qplot(x[i,],
geom="histogram",
fill = I('green'),
color = I('black'),
alpha = I(.5),
main=TeX(paste("Histogram of the pc variance ", lam, sep='')),
xlab=TeX(lam),
ylab="Count",
show.legend=F)
return(p)
}
getDens <- function(x, i){
lam <- paste('$\\lambda_', i, '$', sep='')
p <- qplot(x[i,],
geom="density",
fill = I('blue'),
alpha = I(.5),
main=TeX(paste("Density plot of the pc variance ", lam, sep='')),
xlab=TeX(lam),
ylab="Density",
show.legend=F)
return(p)
}
fp <- lapply(1:3, function(x) arrangeGrob(getHist(lambda.mat, x), getDens(lambda.mat, x), ncol=2))
print(marrangeGrob(fp, nrow = 3, ncol=1, top = textGrob("Lambda.mat Histogram and Density Plot",gp=gpar(fontsize=18))))

R: Add points to surface plot with persp having the appropriate size

I would like the points I add to the plot to have their size adjusted by depth, to give a better 3D impression. I know that I somehow have to use the transformation matrix returned by persp() to compute the length of the vector, orthogonal to the 2D viewing plane, that reaches each point in 3D, but I don't know how to do that.
Here is an example:
x1 <- rnorm(100)
x2 <- 4 + rpois(100, 4)
y <- 0.1*x1 + 0.2*x2 + rnorm(100)
dat <- data.frame(x1, x2, y)
m1 <- lm(y ~ x1 + x2, data=dat)
x1r <- range(dat$x1)
x1seq <- seq(x1r[1], x1r[2], length=30)
x2r <- range(dat$x2)
x2seq <- seq(x2r[1], x2r[2], length=30)
z <- outer(x1seq, x2seq, function(a,b){
predict(m1, newdata=data.frame(x1=a, x2=b))
})
res <- persp(x1seq, x2seq, z)
mypoints <- trans3d(dat$x1, dat$x2, dat$y, pmat=res)
points(mypoints, pch=1, col="red")
You can use the function presented here to determine distance to the observer, then scale the pointsize (cex) to that distance:
# volcano data
z <- 2 * volcano # Exaggerate the relief
x <- 10 * (1:nrow(z)) # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z)) # 10 meter spacing (E to W)
# draw volcano and store transformation matrix
pmat <- persp(x, y, z, theta = 35, phi = 40, col = 'green4', scale = FALSE,
ltheta = -120, shade = 0.75, border = NA, box = TRUE)
# take some xyz values from the matrix
s = sample(1:prod(dim(z)), size=500)
xx = x[row(z)[s] ]
yy = y[col(z)[s]]
zz = z[s] + 10
# depth calculation function (adapted from Duncan Murdoch at https://stat.ethz.ch/pipermail/r-help/2005-September/079241.html)
depth3d <- function(x,y,z, pmat, minsize=0.2, maxsize=2) {
# determine depth of each point from xyz and transformation matrix pmat
tr <- as.matrix(cbind(x, y, z, 1)) %*% pmat
tr <- tr[,3]/tr[,4]
# scale depth to point sizes between minsize and maxsize
psize <- ((tr-min(tr) ) * (maxsize-minsize)) / (max(tr)-min(tr)) + minsize
return(psize)
}
# determine distance to eye
psize = depth3d(xx,yy,zz,pmat,minsize=0.1, maxsize = 1)
# from 3D to 2D coordinates
mypoints <- trans3d(xx, yy, zz, pmat=pmat)
# plot in 2D space with pointsize related to distance
points(mypoints, pch=8, cex=psize, col=4)
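To see what the transformation matrix is doing, it can help to multiply the homogeneous coordinates by hand and compare with trans3d. The sketch below uses a made-up saddle surface and hypothetical variable names. It maps the third transformed coordinate linearly to cex in the same way as depth3d above, so whether larger means nearer should be checked visually and flipped if needed.

```r
x <- seq(-2, 2, length.out = 20)
y <- seq(-2, 2, length.out = 20)
z <- outer(x, y, function(a, b) a^2 - b^2)

# persp() draws the surface and returns the 4 x 4 viewing transformation
pmat <- persp(x, y, z, theta = 30, phi = 25)

set.seed(42)
px <- runif(50, -2, 2)
py <- runif(50, -2, 2)
pz <- px^2 - py^2

# Homogeneous coordinates: columns 1-2 give the screen position after
# dividing by column 4; column 3 carries the depth information
h <- cbind(px, py, pz, 1) %*% pmat
screen <- trans3d(px, py, pz, pmat)
depth <- h[, 3] / h[, 4]

# Rescale depth linearly to point sizes between 0.3 and 2
cex <- 0.3 + 1.7 * (depth - min(depth)) / diff(range(depth))
points(screen, pch = 16, cex = cex, col = "red")
```

The first two columns of h, divided by the fourth, should match what trans3d returns, which is why only the third column is needed for the sizing.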

Shorten Arrows/Lines/Segments Between Coordinates

I am drawing arrows from one set of points to another with arrows(). I'd like to shorten the arrows by a common length so that they don't overlap with the label. However, it's not obvious how one does that, given that arrows() takes coordinates as input.
For instance, here's an example.
x <- stats::runif(12); y <- stats::rnorm(12)
i <- order(x, y); x <- x[i]; y <- y[i]
plot(x,y, main = "Stack Example", type = 'n')
text(x = x, y = y, LETTERS[1:length(x)], cex = 2, col = sample(colors(), 12))
s <- seq(length(x)-1) # one shorter than data
arrows(x[s], y[s], x[s+1], y[s+1])
How do I shorten the arrows so they don't overlap with the labels?
UPDATE
These are all great answers. In an attempt to come up with something that doesn't presume that points connect in a chain, I wrote the following function, which moves x0y0 (a dataframe where column 1 is x and column 2 is y) closer to xy (same format as x0y0) by absolute distance d.
movePoints <- function(x0y0, xy, d){
total.dist <- apply(cbind(x0y0, xy), 1,
function(x) stats::dist(rbind(x[1:2], x[3:4])))
p <- d / total.dist
p <- 1 - p
x0y0[,1] <- xy[,1] + p*(x0y0[,1] - xy[,1])
x0y0[,2] <- xy[,2] + p*(x0y0[,2] - xy[,2])
return(x0y0)
}
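The same idea can be written without apply() by working on coordinate vectors directly. This is a sketch under assumptions, not a replacement for movePoints: shorten_segments is a hypothetical name, it trims both ends of each segment, and d is measured in data units, so it only matches visual distance when both axes share a scale.

```r
# Hypothetical helper: trim an absolute distance d off both ends of each
# segment from (x0, y0) to (x1, y1). Inputs are numeric vectors.
shorten_segments <- function(x0, y0, x1, y1, d) {
  len <- sqrt((x1 - x0)^2 + (y1 - y0)^2)
  ux <- (x1 - x0) / len   # unit direction, x component
  uy <- (y1 - y0) / len   # unit direction, y component
  data.frame(x0 = x0 + d * ux, y0 = y0 + d * uy,
             x1 = x1 - d * ux, y1 = y1 - d * uy)
}

# A 3-4-5 triangle makes the arithmetic easy to follow
s <- shorten_segments(x0 = 0, y0 = 0, x1 = 3, y1 = 4, d = 1)
new_len <- sqrt((s$x1 - s$x0)^2 + (s$y1 - s$y0)^2)
```

The result can be passed to arrows(s$x0, s$y0, s$x1, s$y1). Segments shorter than 2 * d would come out reversed, so those are best filtered out before drawing.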
I don't think there is a built-in solution, but if you can guarantee that your points are spaced far enough (otherwise drawing arrows would be difficult anyway!) then you can "shrink" the points the arrows are drawn on by the length of the radius of an imaginary circle circumscribing each letter.
Note, however, that since the scales of the x and y axes differ, we have to be careful to normalize the x and y values before transformation. The reduce_length parameter below is the estimated fraction of the total viewport that a typical letter occupies. You can tweak this if you want a little more space around the letters. Also be careful not to pick bad colors that make the letter invisible.
Finally, the imperfections are because of different dimensions for different letters. To really address this, we would need a map of letters to micro x and y adjustments.
x <- stats::runif(12); y <- stats::rnorm(12)
i <- order(x, y); x <- x[i]; y <- y[i]
initx <- x; inity <- y
plot(x,y, main = "Stack Example", type = 'n')
text(x = x, y = y, LETTERS[1:length(x)], cex = 2, col = sample(colors()[13:100], 12))
spaced_arrows <- function(x, y, reduce_length = 0.048) {
s <- seq(length(x)-1) # one shorter than data
xscale <- max(x) - min(x)
yscale <- max(y) - min(y)
x <- x / xscale
y <- y / yscale
# shrink the line around its midpoint, normalizing for differences
# in scale of x and y
lapply(s, function(i) {
dist <- sqrt((x[i+1] - x[i])^2 + (y[i+1] - y[i])^2)
# calculate our normalized unit vector, accounting for scale
# differences in x and y
tmp <- reduce_length * (x[i+1] - x[i]) / dist
x[i] <- x[i] + tmp
x[i+1] <- x[i+1] - tmp
tmp <- reduce_length * (y[i+1] - y[i]) / dist
y[i] <- y[i] + tmp
y[i+1] <- y[i+1] - tmp
newdist <- sqrt((x[i+1] - x[i])^2 + (y[i+1] - y[i])^2)
if (newdist > reduce_length * 1.5) # don't show too short arrows
# we have to rescale back to the original dimensions
arrows(xscale*x[i], yscale*y[i], xscale*x[i+1], yscale*y[i+1])
})
TRUE
}
spaced_arrows(x, y)
I was seeing that some of the arrows were reversed in @RobertKrzyzanowski's answer when the letters were close, so I reduced the factor. I also vectorized the function using the diff() function:
plot(x,y, main = "Stack Example", type = 'n')
text(x = x, y = y, LETTERS[1:length(x)], cex = 2)
gap_arrows <- function(x, y, fact = 0.075) {
# length of each segment between consecutive points
dist <- sqrt(diff(x)^2 + diff(y)^2)
# pull both ends of every segment inward by `fact` along its direction
x0 <- x[-length(x)] + (tmp <- fact * diff(x) / dist)
x1 <- x[-1] - tmp
y0 <- y[-length(y)] + (tmp <- fact * diff(y) / dist)
y1 <- y[-1] - tmp
arrows(x0, y0, x1, y1)
}
gap_arrows(x, y)
I don't really think this is a finished answer, but perhaps useful? I think using a factor rather than an absolute reduction creates some shortening when the line is near horizontal that I don't understand. The G-G transition seems odd (too short) in this data:
> dput(x)
c(0.058478488586843, 0.152887222822756, 0.171698493883014, 0.197744736680761,
0.260856857057661, 0.397151953307912, 0.54208036721684, 0.546826156554744,
0.633055359823629, 0.662317642010748, 0.803418542025611, 0.83192756283097
)
> dput(y)
c(-0.256092192198247, -0.961856634130129, 0.0412329219929399,
0.235386572284857, 1.84386200523221, -0.651949901695459, -0.490557443700668,
1.44455085842335, -0.422496832339625, 0.451504053079215, -0.0713080861235987,
0.0779608495637108)

R: Plotting a 3D surface from x, y, z

Imagine I have a 3-column matrix
x, y, z
where z is a function of x and y.
I know how to plot a "scatter plot" of these points with
plot3d(x,y,z)
But if I want a surface instead I must use other commands such as surface3d
The problem is that it doesn't accept the same inputs as plot3d
it seems to need a matrix with
(number of elements of z) = (number of elements of x) * (number of elements of y)
How can I get this matrix?
I've tried with the command interp, as I do when I need to use contour plots.
How can I plot a surface directly from x,y,z without calculating this matrix?
If I had too many points this matrix would be too big.
cheers
If your x and y coords are not on a grid then you need to interpolate your x,y,z surface onto one. You can do this with kriging using any of the geostatistics packages (geoR, gstat, others) or simpler techniques such as inverse distance weighting.
I'm guessing the 'interp' function you mention is from the akima package. Note that the output matrix is independent of the size of your input points. You could have 10000 points in your input and interpolate that onto a 10x10 grid if you wanted. By default akima::interp does it onto a 40x40 grid:
require(akima)
require(rgl)
x = runif(1000)
y = runif(1000)
z = rnorm(1000)
s = interp(x,y,z)
dim(s$z)
# [1] 40 40
surface3d(s$x,s$y,s$z)
That'll look spiky and rubbish because it's random data. Hopefully your data isn't!
You can use the function outer() to generate it.
Have a look at the demo for the function persp(), which is a base graphics function to draw perspective plots for surfaces.
Here is their first example:
x <- seq(-10, 10, length.out = 50)
y <- x
rotsinc <- function(x,y) {
sinc <- function(x) { y <- sin(x)/x ; y[is.na(y)] <- 1; y }
10 * sinc( sqrt(x^2+y^2) )
}
z <- outer(x, y, rotsinc)
persp(x, y, z)
The same applies to surface3d():
require(rgl)
surface3d(x, y, z)
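If the x, y, z triples already lie on a regular grid but are stored in long form (one row per point), the z matrix can be rebuilt by matching each row to its grid position. The sketch below uses made-up data and hypothetical names (grid, xs, ys, zmat) and assumes exactly one z per (x, y) pair; scattered points would still need interpolation.

```r
# Made-up long-form data on a regular grid, shuffled to show that row
# order does not matter
set.seed(7)
grid <- expand.grid(x = seq(-2, 2, by = 0.5), y = seq(-1, 1, by = 0.5))
grid$z <- grid$x^2 + grid$y^2
grid <- grid[sample(nrow(grid)), ]

# Sorted unique coordinates define the rows (x) and columns (y) of z
xs <- sort(unique(grid$x))
ys <- sort(unique(grid$y))

# Fill the matrix by (row, column) index pairs; missing cells stay NA
zmat <- matrix(NA_real_, nrow = length(xs), ncol = length(ys))
zmat[cbind(match(grid$x, xs), match(grid$y, ys))] <- grid$z

if (interactive()) {
  persp(xs, ys, zmat, theta = 30, phi = 25)
}
```

The same xs, ys and zmat can be handed to surface3d() from rgl in place of persp().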
You could look at using Lattice. In this example I have defined a grid over which I want to plot z~x,y. It looks something like this. Note that most of the code is just building a 3D shape that I plot using the wireframe function.
The variables "b" and "s" could be x or y.
require(lattice)
# begin generating my 3D shape
b <- seq(from=0, to=20,by=0.5)
s <- seq(from=0, to=20,by=0.5)
payoff <- expand.grid(b=b,s=s)
payoff$payoff <- payoff$b - payoff$s
payoff$payoff[payoff$payoff < -1] <- -1
# end generating my 3D shape
wireframe(payoff ~ s * b, payoff, shade = TRUE, aspect = c(1, 1),
light.source = c(10,10,10), main = "Study 1",
scales = list(z.ticks=5,arrows=FALSE, col="black", font=10, tck=0.5),
screen = list(z = 40, x = -75, y = 0))
rgl is great, but takes a bit of experimentation to get the axes right.
If you have a lot of points, why not take a random sample from them, and then plot the resulting surface. You can add several surfaces all based on samples from the same data to see if the process of sampling is horribly affecting your data.
So, here is a pretty horrible function but it does what I think you want it to do (but without the sampling). Given a matrix (x, y, z) where z is the heights it will plot both the points and also a surface. Limitations are that there can only be one z for each (x,y) pair. So planes which loop back over themselves will cause problems.
The plot_points = T will plot the individual points from which the surface is made - this is useful to check that the surface and the points actually meet up. The plot_contour = T will plot a 2d contour plot below the 3d visualization. Set colour to rainbow to give pretty colours, anything else will set it to grey, but then you can alter the function to give a custom palette. This does the trick for me anyway, but I'm sure that it can be tidied up and optimized. The verbose = T prints out a lot of output which I use to debug the function as and when it breaks.
plot_rgl_model_a <- function(fdata, plot_contour = T, plot_points = T,
verbose = F, colour = "rainbow", smoother = F){
## takes a model in long form, in the format
## 1st column x
## 2nd is y,
## 3rd is z (height)
## and draws an rgl model
## includes a contour plot below and plots the points in blue
## if these are set to TRUE
# note that x has to be ascending, followed by y
if (verbose) print(head(fdata))
fdata <- fdata[order(fdata[, 1], fdata[, 2]), ]
if (verbose) print(head(fdata))
##
require(reshape2)
require(rgl)
orig_names <- colnames(fdata)
colnames(fdata) <- c("x", "y", "z")
fdata <- as.data.frame(fdata)
## work out the min and max of x,y,z
xlimits <- c(min(fdata$x, na.rm = T), max(fdata$x, na.rm = T))
ylimits <- c(min(fdata$y, na.rm = T), max(fdata$y, na.rm = T))
zlimits <- c(min(fdata$z, na.rm = T), max(fdata$z, na.rm = T))
l <- list (x = xlimits, y = ylimits, z = zlimits)
xyz <- do.call(expand.grid, l)
if (verbose) print(xyz)
x_boundaries <- xyz$x
if (verbose) print(class(xyz$x))
y_boundaries <- xyz$y
if (verbose) print(class(xyz$y))
z_boundaries <- xyz$z
if (verbose) print(class(xyz$z))
if (verbose) print(paste(x_boundaries, y_boundaries, z_boundaries, sep = ";"))
# now turn fdata into a wide format for use with the rgl.surface
fdata[, 2] <- as.character(fdata[, 2])
fdata[, 3] <- as.character(fdata[, 3])
#if (verbose) print(class(fdata[, 2]))
wide_form <- dcast(fdata, y ~ x, value.var = "z")
if (verbose) print(head(wide_form))
wide_form_values <- as.matrix(wide_form[, 2:ncol(wide_form)])
if (verbose) print(wide_form_values)
x_values <- as.numeric(colnames(wide_form[2:ncol(wide_form)]))
y_values <- as.numeric(wide_form[, 1])
if (verbose) print(x_values)
if (verbose) print(y_values)
wide_form_values <- wide_form_values[order(y_values), order(x_values)]
wide_form_values <- as.numeric(wide_form_values)
x_values <- x_values[order(x_values)]
y_values <- y_values[order(y_values)]
if (verbose) print(x_values)
if (verbose) print(y_values)
if (verbose) print(dim(wide_form_values))
if (verbose) print(length(x_values))
if (verbose) print(length(y_values))
zlim <- range(wide_form_values)
if (verbose) print(zlim)
zlen <- zlim[2] - zlim[1] + 1
if (verbose) print(zlen)
if (colour == "rainbow"){
colourut <- rainbow(zlen, alpha = 0)
if (verbose) print(colourut)
col <- colourut[ wide_form_values - zlim[1] + 1]
# if (verbose) print(col)
} else {
col <- "grey"
if (verbose) print(table(col))
}
open3d()
plot3d(x_boundaries, y_boundaries, z_boundaries,
box = T, col = "black", xlab = orig_names[1],
ylab = orig_names[2], zlab = orig_names[3])
rgl.surface(z = x_values, ## these are all different because
x = y_values, ## of the confusing way that
y = wide_form_values, ## rgl.surface works! - y is the height!
coords = c(2,3,1),
color = col,
alpha = 1.0,
lit = F,
smooth = smoother)
if (plot_points){
# plot points in blue just to be on the safe side!
points3d(fdata, col = "blue")
}
if (plot_contour){
# plot the plane underneath
flat_matrix <- wide_form_values
if (verbose) print(flat_matrix)
y_intercept <- (zlim[2] - zlim[1]) * (-2/3) # place the flat plane at a height of -2/3 of the z-range
flat_matrix[which(flat_matrix != y_intercept)] <- y_intercept
if (verbose) print(flat_matrix)
rgl.surface(z = x_values, ## these are all different because
x = y_values, ## of the confusing way that
y = flat_matrix, ## rgl.surface works! - y is the height!
coords = c(2,3,1),
color = col,
alpha = 1.0,
smooth = smoother)
}
}
The add_rgl_model does the same job without the options, but overlays a surface onto the existing 3dplot.
add_rgl_model <- function(fdata){
## takes a model in long form, in the format
## 1st column x
## 2nd is y,
## 3rd is z (height)
## and draws an rgl model
##
# note that x has to be ascending, followed by y
print(head(fdata))
fdata <- fdata[order(fdata[, 1], fdata[, 2]), ]
print(head(fdata))
##
require(reshape2)
require(rgl)
orig_names <- colnames(fdata)
#print(head(fdata))
colnames(fdata) <- c("x", "y", "z")
fdata <- as.data.frame(fdata)
## work out the min and max of x,y,z
xlimits <- c(min(fdata$x, na.rm = T), max(fdata$x, na.rm = T))
ylimits <- c(min(fdata$y, na.rm = T), max(fdata$y, na.rm = T))
zlimits <- c(min(fdata$z, na.rm = T), max(fdata$z, na.rm = T))
l <- list (x = xlimits, y = ylimits, z = zlimits)
xyz <- do.call(expand.grid, l)
#print(xyz)
x_boundaries <- xyz$x
#print(class(xyz$x))
y_boundaries <- xyz$y
#print(class(xyz$y))
z_boundaries <- xyz$z
#print(class(xyz$z))
# now turn fdata into a wide format for use with the rgl.surface
fdata[, 2] <- as.character(fdata[, 2])
fdata[, 3] <- as.character(fdata[, 3])
#print(class(fdata[, 2]))
wide_form <- dcast(fdata, y ~ x, value.var = "z")
print(head(wide_form))
wide_form_values <- as.matrix(wide_form[, 2:ncol(wide_form)])
x_values <- as.numeric(colnames(wide_form[2:ncol(wide_form)]))
y_values <- as.numeric(wide_form[, 1])
print(x_values)
print(y_values)
wide_form_values <- wide_form_values[order(y_values), order(x_values)]
x_values <- x_values[order(x_values)]
y_values <- y_values[order(y_values)]
print(x_values)
print(y_values)
print(dim(wide_form_values))
print(length(x_values))
print(length(y_values))
rgl.surface(z = x_values, ## these are all different because
x = y_values, ## of the confusing way that
y = wide_form_values, ## rgl.surface works!
coords = c(2,3,1),
alpha = .8)
# plot points in red just to be on the safe side!
points3d(fdata, col = "red")
}
So my approach would be to try it with all your data first (I easily plot surfaces generated from ~15k points). If that doesn't work, take several smaller samples and plot them all at once using these functions.
Maybe it's late now, but following Spacedman, did you try duplicate="strip" or any other option?
x=runif(1000)
y=runif(1000)
z=rnorm(1000)
s=interp(x,y,z,duplicate="strip")
surface3d(s$x,s$y,s$z,color="blue")
points3d(s)
