How to plot a regression plane with an interaction in rgl - r

I want to plot the regression surface from a model with an interaction term using rgl's interactive plotting system. It is easy to plot a regression plane for a model without an interaction term using:
plot3d(x=x1, y=x2, z=y1, type="s", col="yellow", size=1)
planes3d(a=coef(mod1)[2], b=coef(mod1)[3], c=-1, d=coef(mod1)[1], alpha=.5)
However, when the plane twists, this seems to be more difficult. Following on this question: 3D equivalent of the curve function in r, I am trying:
f2 <- function(x, y) as.vector(coef(mod2)%*%c(1, x, y, x*y))
curve_3d <- function(f2, x_range=c(0, 40), y_range=c(0, 40)){
if (!require(rgl) ) {stop("load rgl")}
xvec <- seq(x_range[1], x_range[2], by=1)
yvec <- seq(y_range[1], y_range[2], by=1)
fz <- outer(xvec, yvec, FUN=f2)
persp3d(xvec, yvec, fz, alpha=.5)
}
open3d()
plot3d(x=x1, y=x2, z=y2, type="s", col="yellow", size=1)
curve_3d(f2)
But, it's not working. (I've tried some other things as well, but I'm keeping this short.) My main problem so far seems to be with f2; however, I will also want this to look like planes3d, and I'm not sure if this is going to give me a wireframe.
Here's an example:
set.seed(897)
x1 = rep(c(0, 10, 20, 30, 40), times=25)
x2 = rep(c(0, 10, 20, 30, 40), each=25)
y2 = 37 + 0.7*x1 + 1.2*x2 - 0.05*x1*x2 + rnorm(125, mean=0, sd=5)
mod2 = lm(y2~x1*x2)
open3d()
plot3d(x=x1, y=x2, z=y2, type="s", col="yellow", size=1)
curve_3d(f2)

grd <- expand.grid(x1=unique(x1), x2=unique(x2) )
grd$pred <-predict(mod2, newdata=grd)
persp3d(x=unique(grd[[1]]), y=unique(grd[[2]]),
z=matrix(grd[[3]],5,5), add=TRUE)

Related

Variation on "How to plot decision boundary of a k-nearest neighbor classifier from Elements of Statistical Learning?"

This is a question related to https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o
For completeness, here's the original example from that link:
library(ElemStatLearn)
require(class)
x <- mixture.example$x
g <- mixture.example$y
xnew <- mixture.example$xnew
mod15 <- knn(x, xnew, g, k=15, prob=TRUE)
prob <- attr(mod15, "prob")
prob <- ifelse(mod15=="1", prob, 1-prob)
px1 <- mixture.example$px1
px2 <- mixture.example$px2
prob15 <- matrix(prob, length(px1), length(px2))
par(mar=rep(2,4))
contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="", main=
"15-nearest neighbour", axes=FALSE)
points(x, col=ifelse(g==1, "coral", "cornflowerblue"))
gd <- expand.grid(x=px1, y=px2)
points(gd, pch=".", cex=1.2, col=ifelse(prob15>0.5, "coral", "cornflowerblue"))
box()
I've been playing with that example, and would like to try to make it work with three classes. I can change some values of g with something like
g[8:16] <- 2
just to pretend that there are some samples which are from a third class. I can't make the plot work, though. I guess I need to change the lines that deal with the proportion of votes for winning class:
prob <- attr(mod15, "prob")
prob <- ifelse(mod15=="1", prob, 1-prob)
and also the levels on the contour:
contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="", main=
"15-nearest neighbour", axes=FALSE)
I am also not sure contour is the right tool for this. One alternative that works is to create a matrix of data that covers the region I'm interested, classify each point of this matrix and plot those with a large marker and different colors, similar to what is being done with the points(gd...) bit.
The final purpose is to be able to show different decision boundaries generated by different classifiers. Can someone point me to the right direction?
thanks
Rafael
Separating the main parts in the code will help outlining how to achieve this:
Test data with 3 classes
train <- rbind(iris3[1:25,1:2,1],
iris3[1:25,1:2,2],
iris3[1:25,1:2,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
Test data covering a grid
require(MASS)
test <- expand.grid(x=seq(min(train[,1]-1), max(train[,1]+1),
by=0.1),
y=seq(min(train[,2]-1), max(train[,2]+1),
by=0.1))
Classification for that grid
3 classes obviously
require(class)
classif <- knn(train, test, cl, k = 3, prob=TRUE)
prob <- attr(classif, "prob")
Data structure for plotting
require(dplyr)
dataf <- bind_rows(mutate(test,
prob=prob,
cls="c",
prob_cls=ifelse(classif==cls,
1, 0)),
mutate(test,
prob=prob,
cls="v",
prob_cls=ifelse(classif==cls,
1, 0)),
mutate(test,
prob=prob,
cls="s",
prob_cls=ifelse(classif==cls,
1, 0)))
Plot
require(ggplot2)
ggplot(dataf) +
geom_point(aes(x=x, y=y, col=cls),
data = mutate(test, cls=classif),
size=1.2) +
geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
bins=2,
data=dataf) +
geom_point(aes(x=x, y=y, col=cls),
size=3,
data=data.frame(x=train[,1], y=train[,2], cls=cl))
We can also be a little fancier and plot the probability of class membership as a indication of the "confidence".
ggplot(dataf) +
geom_point(aes(x=x, y=y, col=cls, size=prob),
data = mutate(test, cls=classif)) +
scale_size(range=c(0.8, 2)) +
geom_contour(aes(x=x, y=y, z=prob_cls, group=cls, color=cls),
bins=2,
data=dataf) +
geom_point(aes(x=x, y=y, col=cls),
size=3,
data=data.frame(x=train[,1], y=train[,2], cls=cl)) +
geom_point(aes(x=x, y=y),
size=3, shape=1,
data=data.frame(x=train[,1], y=train[,2], cls=cl))

How to shade a graph using curve() in R

I am plotting the standard normal distribution.
curve(dnorm(x), from=-4, to=4,
main = "The Standard Normal Distibution",
ylab = "Probability Density",
xlab = "X")
For pedagogical reasons, I want to shade the area below a certain quantile of my choice. How can I do this?
If you want to use curve and base plot, then you can write a little function yourself with polygon:
colorArea <- function(from, to, density, ..., col="blue", dens=NULL){
y_seq <- seq(from, to, length.out=500)
d <- c(0, density(y_seq, ...), 0)
polygon(c(from, y_seq, to), d, col=col, density=dens)
}
A little example follows:
curve(dnorm(x), from=-4, to=4,
main = "The Standard Normal Distibution",
ylab = "Probability Density",
xlab = "X")
colorArea(from=-4, to=qnorm(0.025), dnorm)
colorArea(from=qnorm(0.975), to=4, dnorm, mean=0, sd=1, col=2, dens=20)
We could use the following R code too, in order to shade the regions under the standard normal curve below a certain (given) quantile:
library(ggplot2)
z <- seq(-4,4,0.01)
fz <- dnorm(z)
q <- qnorm(0.1) # the quantile
x <- seq(-4, q, 0.01)
y <- c(dnorm(x), 0, 0)
x <- c(x, q, -4)
ggplot() + geom_line(aes(z, fz)) +
geom_polygon(data = data.frame(x=x, y=y), aes(x, y), fill='blue')

Polynomial regression (second order) plot in R

I am doing a polynomial regression in R for the following data but I cannot display the correct graph of the polynomial of 2rd degree. I got the equation of polynomial of degree 2 right, however I did something wrong in the last part of the script. Could anyone help? Thanks
Here is my script:
Vegetation_Cover <- c(5,0,10,40,100,30,80,2,70,2,0)
NDVI <- c(0.35,0.32,0.36,0.68,0.75,0.48,0.75,0.35,0.70,0.34,0.28)
plot(Vegetation_Cover,NDVI, main=list
("Vegetation Cover and NDVI",cex=1.5),pch=20,cex=1.4,col="gray0")
sample1 <- data.frame(Vegetation_Cover, NDVI)
sample1
fit2 <- lm(sample1$NDVI ~ sample1$Vegetation_Cover + I(sample1$Vegetation_Cover^2))
summary(fit2)
lm(formula = sample1$NDVI ~ sample1$Vegetation_Cover + I(sample1$Vegetation_Cover^2))
anova(fit2)
pol2 <- function(x) fit2$coefficient[3]*x^2 + fit2$coefficient[2]*x + fit2$coefficient[1]
plot(sample1$Vegetation_Cover, sample1$NDVI, type="p", lwd=3)
pol2 <- function(x) fit2$coefficient[3]*x^2 + fit2$coefficient[2]*x + fit2$coefficient[1]
curve(pol2, col="red", lwd=2)
points(sample1$Vegetation_Cover, sample1$NDVI, type="p", lwd=3)
It looks like you're just missing add=TRUE for your call to curve. This seems to plot what you're looking for:
pol2 <- function(x) fit2$coefficient[3]*x^2 + fit2$coefficient[2]*x + fit2$coefficient[1]
plot(sample1$Vegetation_Cover, sample1$NDVI, type="p", lwd=3)
curve(pol2, col="red", lwd=2, add=T)

draw several ablines at once with specific color scheme

I have a data frame with slopes and intercepts coming from a series of simple linear regressions. In plotting the ablines I want to use a color coding that is specific for all possible combinations of class and category.
Say the data frame looks as follows:
(intercept <- rnorm(n = 40, mean = 1, sd = 0.25))
(slope <- rnorm(n = 40, mean = 2, sd = 1))
(clss <- c(rep("a", 20), rep("b", 20)))
(ctg <- c(rep("mm", 10), rep("nn", 10), rep("mm", 10), rep("nn", 10)))
df <- data.frame(intercept, slope, clss, ctg)
I managed to plot all ablines using:
plot(1, type="n", axes=FALSE, xlab="", ylab="", xlim=c(0, 10), ylim=c(0, 10))
mapply(abline, df$intercept, df$slope)
I want to plot these lines all in say green when clss=="a" and ctg=="mm" and use different colors for the other clss * ctg combinations.
Probably something like this would work:
by(df, paste(df$clss, df$ctg), mapply(abline, ... ))
But I could not figure out how.
Using ggplot:
library(ggplot2)
gg <- df
gg$color <- paste(gg$clss,".",gg$ctg,sep="")
ggplot(gg) +
geom_point(aes(x=-10,y=-10,color=color)) + # need this to display a legend...
geom_abline(aes(slope=slope, intercept=intercept, color=color)) +
xlim(0,10) + ylim(0,10) + labs(x="X",y="Y")
Produces this:
It turns out in your case you only have 4 unique clss and ctg combinations, so I just picked some random colours and modified your mapply
# get colour for each combination
x <- sample(colours(), length(unique(paste0(df$clss, df$ctg))))
# how many of each combination are there
q <- aggregate(df$intercept, by=list(paste0(df$clss, df$ctg)), length)
# make a colour vector
mycols <- rep(x, q[,2])
mapply(function(x,y,z) { abline(x, y, col=z) },
df$intercept, df$slope,
as.list(mycols) )
#You could obviously pick the colours yourself or choose a gradient

Adding markers to 3D plot in R

I need to mark where certain observations appear in a 3D plotted joint density function -- I envision adding a vector, (x, y, f(x,y) + something_small) to the density plot, showing where the point is. I have tried using trans3d(), but that hasn't worked.
Here is an example:
library(MASS)
Sigma <- matrix(c(12,1,1,12),2,2)
Sample <- mvrnorm(n=1000, rep(0, 2), Sigma)
empDen <- kde2d(Sample[,1],Sample[,2])
par(bg = "white")
x <- empDen$x
y <- empDen$y
z <- empDen$z
nrz <- nrow(z)
ncz <- ncol(z)
jet.colors <- colorRampPalette( c("lightblue", "blue") )
nbcol <- 100
color <- jet.colors(nbcol)
zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]
facetcol <- cut(zfacet, nbcol)
persp(x, y, z, col = color[facetcol], phi = 15, theta = -50, xlab="x", ylab="y", zlab="Empirical Joint Density", border=NA)
The question is: How do I indicate where Sample[1,] appears in the joint density, i.e. add this to the plot?
Thanks for any tips!
This works:
fmt=persp(x, y, z, col = color[facetcol], phi = 15, theta = -50, xlab="x", ylab="y", zlab="Empirical Joint Density", border=NA)
pt = Sample[1,]
points(trans3d(pt[1],pt[2],.001,fmt),pch=20, col="Red")
lines(trans3d(c(pt[1],pt[1]), c(pt[2],pt[2]), c(0,.001),fmt),col="Red",cex=2)
Although, it would be nice to replace .001 with some information based off the empirical joint density instead of manually specifying values for each point.

Resources