How to create a "Clustergram" plot? (in R)

I came across this interesting website, with an idea of a way to visualize a clustering algorithm called "Clustergram":
(source: schonlau.net)
I am not sure how useful this really is, but I would like to play with it by reproducing it in R, and I am not sure how to go about it.
How would you draw a line for each item so that it stays consistent across the different numbers of clusters?
Here is some example code/data to play with for a potential answer:
hc <- hclust(dist(USArrests), "ave")
plot(hc)

Update: I posted a solution with a lengthy example and discussion here (it is based on the code I give below). Also, Hadley was very kind and offered a ggplot2 implementation of the code.
Here is a basic solution (for a better one, look at the "update" above):
set.seed(100)
# two Gaussian blobs in 2D, 50 points each
Data <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(Data) <- c("x", "y")
# noise <- runif(100,0,.05)
line.width <- rep(.004, nrow(Data))  # vertical space given to each item's line
Y <- NULL
X <- NULL
k.range <- 2:10
plot(0, 0, col = "white", xlim = c(1, 10), ylim = c(-.5, 1.6),
     xlab = "Number of clusters", ylab = "Clusters means",
     main = "(Basic) Clustergram")
axis(side = 1, at = k.range)
abline(v = k.range, col = "grey")
centers.points <- list()
for (k in k.range) {
  cl <- kmeans(Data, k)
  clusters.vec <- cl$cluster
  # one y value per cluster: the mean of the coordinates of its centre
  the.centers <- apply(cl$centers, 1, mean)
  # stack the items within each cluster, then restore the original item order
  noise <- unlist(tapply(line.width, clusters.vec,
                         cumsum))[order(seq_along(clusters.vec)[order(clusters.vec)])]
  noise <- noise - mean(range(noise))
  y <- the.centers[clusters.vec] + noise
  Y <- cbind(Y, y)
  x <- rep(k, length(y))
  X <- cbind(X, x)
  # index by name so that no empty first element is left in the list
  centers.points[[as.character(k)]] <- data.frame(y = the.centers, x = rep(k, k))
  # points(the.centers ~ rep(k , k), pch = 19, col = "red", cex = 1.5)
}
library(colorspace)
COL <- rainbow_hcl(100)
matlines(t(X), t(Y), pch = 19, col = COL, lty = 1, lwd = 1.5)
# add points
invisible(lapply(centers.points,
                 function(xx) { with(xx, points(y ~ x, pch = 19, col = "red", cex = 1.3)) }))
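The least obvious step in the code above is the line that builds noise: it stacks the items of each cluster with cumsum and then puts the offsets back into the original item order. As a small sketch (the toy vectors cl and w are made up for illustration and are not part of the original answer), base R's ave() appears to give the same result more directly:

```r
# Toy cluster assignment and line widths (hypothetical values, for illustration only)
cl <- c(2, 1, 2, 1, 3, 2)
w  <- rep(0.004, length(cl))

# The construction used in the answer: cumulate within clusters, then reorder
original <- unlist(tapply(w, cl, cumsum))[order(seq_along(cl)[order(cl)])]

# A shorter equivalent: ave() applies cumsum within each group and
# returns the result in the original item order
compact <- ave(w, cl, FUN = cumsum)

print(unname(original))
print(compact)
```

Each item gets an offset equal to its rank within its own cluster times the line width, which is what keeps the lines from overlapping inside a cluster.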

Related

Why does plot(Vectorize(function)) work but curve(Vectorize(function)) does not?

Why does plot(Vectorize(f)), where f is a function, work, but curve(Vectorize(f)) does not? This inconsistency bothers me a bit. Here is a full reproducible example:
set.seed(123)
N <- 9
X <- rnorm(N)
a <- 0.2
b <- 0.8
sigma <- 0.7
eps <- rnorm(N, 0, sigma)
Y <- a + b * X + eps
cor_p_value <- function(new_Y1)
{
  Y[1] <- new_Y1
  cor.test(X, Y)$p.value
}
lm_F_p_value <- function(new_Y1)
{
  Y[1] <- new_Y1
  anova(lm(Y ~ X))$Pr[1]
}
plot(Vectorize(cor_p_value), xlim = c(-50,50), lwd = 3)
curve(Vectorize(lm_F_p_value), col = "red", lwd = 3, lty = 2, add = TRUE)
# Error in curve(Vectorize(lm_F_p_value), col = "red", lwd = 3, lty = 2, :
# 'expr' must be a function, or a call or an expression containing 'x'
So for curve, I must do this annoying workaround:
cu <- Vectorize(lm_F_p_value)
curve(cu, col = "red", lwd = 3, lty = 2, add = TRUE)
But why? Vectorize returns the same function, plot takes it, curve does not. All these functions are from the base built-in R packages, so I expect them to cooperate and be consistent. Probably a bug?
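According to ?curve:
expr - The name of a function, or a call or an expression written as a function of x which will evaluate to an object of the same length as x.
Vectorize(lm_F_p_value) is neither the name of a function nor a call containing x, so curve rejects it. Writing the call as a function of x works:
curve(Vectorize(lm_F_p_value)(x), col = "red", lwd = 3, lty = 2,
      add = TRUE, ylab = "y")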
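To make the rule from ?curve concrete, here is a minimal sketch with a made-up scalar function f (not from the question). It tries the three forms side by side: a function name, a call containing x, and a call without x. The null PDF device is only there so the example runs without opening a window.

```r
# Hypothetical scalar-only function, used purely for illustration
f  <- function(z) if (z > 0) sqrt(z) else 0
vf <- Vectorize(f)

pdf(NULL)  # null device: nothing is written to disk

# 1. the name of a function: accepted
res_name <- curve(vf, from = -1, to = 4, n = 11)

# 2. a call written as a function of x: accepted
res_call <- curve(Vectorize(f)(x), from = -1, to = 4, n = 11)

# 3. a call that does not contain x: rejected with an error
err <- tryCatch(curve(Vectorize(f), from = -1, to = 4),
                error = function(e) conditionMessage(e))

dev.off()
print(err)
```

Forms 1 and 2 evaluate the same function on the same grid, so they should produce the same y values.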

Colour a line by a given value in a plot in R

Building on the answer given here, where the same question was asked for a scatter plot: is it possible to plot a line where the colour is based on the y value?
Example data
x = 1:11
y = abs(6 - x)
plot(1:22,c(y,y), col = ifelse(c(y,y) < 2.5, 2, 3), pch = 16)
Will give
However, trying
plot(1:22,c(y,y), col = ifelse(c(y,y) < 2.5, 2, 3), type = "l")
Gives
or doing lines on y<2.5 which gives
instead of the solution I am after, which is
Is there any easy method to do this? This is only a simple case, and I can't manually add each section to my actual data.
Thanks!
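Try this:
x = 1:11
y = abs(6 - x)
y = c(y, y)
# y is already doubled here, so use it directly for the point colours
plot(1:22, y, col = ifelse(y < 2.5, 2, 3), pch = 16)
# draw each segment separately: green if both endpoints are above 1.9, red otherwise
for (i in 1:21) {
  if (y[i] > 1.9 && y[i + 1] > 1.9) {
    linecolour = "green"
  } else {
    linecolour = "red"
  }
  lines(c((1:22)[i], (1:22)[i + 1]), c(y[i], y[i + 1]), col = linecolour)
}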
Here is a vectorized solution. It is partly based on the answers to this question with the main difference that the function plotted there is a smooth function, unlike this one. This makes a difference when computing the points to plot and their colors.
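fun <- function(x) abs(6 - x)
x <- 1:11
y <- fun(x)
X <- c(x, x + 11)
Y <- c(y, y)
n <- length(X)
# one colour per segment: 2 only if both endpoints are below 2.5
color <- ifelse((Y[-n] < 2.5) & (Y[-1] < 2.5), 2, 3)
# colour the points by their own value; 'color' has only n - 1 entries
plot(X, Y, col = ifelse(Y < 2.5, 2, 3), pch = 16)
segments(x0 = X[-n], y0 = Y[-n],
         x1 = X[-1], y1 = Y[-1],
         col = color)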
To remove the points, start with
plot(X, Y, type = "n")
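For data that is not this toy example, the segment-colouring rule can be wrapped in a small helper. The sketch below is only an illustration: segment_colours and its arguments are hypothetical names, not part of either answer, and it assumes the same rule as above (a segment gets the "below" colour only when both of its endpoints are under the threshold).

```r
# Hypothetical helper (not from the thread): one colour per segment,
# 'below' only when both endpoints lie under the threshold
segment_colours <- function(y, threshold, below = 2, above = 3) {
  n <- length(y)
  ifelse(y[-n] < threshold & y[-1] < threshold, below, above)
}

Y <- rep(abs(6 - 1:11), 2)
X <- seq_along(Y)
seg_col <- segment_colours(Y, threshold = 2.5)

pdf(NULL)               # null device so the sketch runs without a window
plot(X, Y, type = "n")  # empty frame, no points
segments(X[-length(X)], Y[-length(Y)], X[-1], Y[-1], col = seg_col)
dev.off()
```

Note that a segment crossing the threshold is drawn entirely in the "above" colour; splitting it exactly at the crossing would require interpolating extra points.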

Using layout to plot rasters with a common legend

Follow-up to this question. I am trying to plot 4 different rasters with a common scale (legend). I have managed to do it but now I want to plot them in the same graphic device, with the scale showing just once, at the right side of the device, and another legend for some elements I have plotted over the rasters placed at the bottom of the device.
I was sure this would be no problem with layout(), but it turns out there is some incompatibility between the methods in plot.raster and layout. Following the answer to the linked question, I plotted my rasters using image instead of plot, which seems to fix the problem. In this way, I was able to use layout() to get the points legend below my four rasters. However, now I can't seem to find a way of plotting just the scale to the right of my raster images.
Example code:
library(raster)
# generate example rasters
set.seed(123)
ras1 <- raster(ncol = 10, nrow= 10)
values(ras1) <- runif(100, 1, 10)
ras2 <- raster(ncol = 10, nrow = 10)
values(ras2) <- runif(100, 5, 50)
ras3 <- raster(ncol = 10, nrow = 10)
values(ras3) <- runif(100, 10, 100)
ras4 <- raster(ncol = 10, nrow = 10)
values(ras4) <- runif(100, 8, 80)
# stack them
rasStack <- stack(ras1, ras2, ras3, ras4)
# obtain max and min values
maxv <- max(maxValue(rasStack))+1
minv <- min(minValue(rasStack))
# set the breaks between min and max values
brks <- seq(minv,maxv,by=0.1)
nbrks <- length(brks)-1
r.range <- c(minv, maxv)
# generate palette
colfunc<-colorRampPalette(c("springgreen", "royalblue", "yellow", "red"))
# add up the 4 layers for the common legend
rasTot <- ras1 + ras2 + ras3 + ras4
# plot in a loop with a common legend, using legend.only = TRUE
par(mfrow = c(2, 2))
for (i in seq_len(nlayers(rasStack))) {
  tmp <- rasStack[[i]]
  plot(tmp, breaks = brks, col = colfunc(nbrks), legend = FALSE, zlim = c(minv, maxv),
       main = names(tmp))
  plot(rasTot, legend.only = TRUE, col = colfunc(nbrks),
       legend.width = 1, legend.shrink = 0.75,
       legend.args = list(text = 'value', side = 4, font = 2, line = 2.5, cex = 0.8))
  points(x = rnorm(5, 0, 100), y = rnorm(5, 0, 10), col = seq(1:10), pch = 16)
}
# so far so good. But when trying to add legends...
# setting layout parameters
m <- matrix(c(1,2,6,3,4,6,5,5,5),nrow = 3,ncol = 3,byrow = TRUE)
png("example_in_png_device.png")
layout(mat = m,heights = c(0.46,0.46,0.08), widths = c(0.45, 0.45, 0.1))
par(mar=c(4,2,4,2))
# plotting the images
for (i in seq_len(nlayers(rasStack))) {
  tmp <- rasStack[[i]]
  image(tmp, breaks = brks, col = colfunc(nbrks), zlim = c(minv, maxv),
        main = names(tmp))
  points(x = rnorm(5, 0, 100), y = rnorm(5, 0, 10), col = seq(1:10), pch = 16)
}
# adding legend at the bottom
par(mar = c(0, 2, 0, 2) + 0.1)
plot(1, type = "n", axes = FALSE, frame.plot = FALSE)
legend("topleft",
       c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
       fill = seq(1:10), border = "black", ncol = 10,
       bg = "black", bty = "n",
       y.intersp = 0.8, cex = 1.2)
dev.off()
All this generates an image that looks like this:
Which is pretty much what I wanted so that's great, but I can't figure out how to add the scale (common for all 4 raster layers) in the space number 6 in this layout.
Any ideas will be greatly appreciated and put to use immediately!
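One possible approach, offered only as a sketch and not taken from the original thread, is to draw the colour scale by hand in slot 6 as a one-column image whose cells are the midpoints of the breaks. To keep the example base-R only, plain matrices stand in for the raster layers, and the widths and margins are assumptions chosen so the narrow slot does not trigger a "figure margins too large" error.

```r
set.seed(123)
# Stand-in data (assumption): plain matrices instead of raster layers
mats <- list(matrix(runif(100, 1, 10), 10), matrix(runif(100, 5, 50), 10),
             matrix(runif(100, 10, 100), 10), matrix(runif(100, 8, 80), 10))
zr   <- range(unlist(mats))
brks <- seq(zr[1], zr[2], length.out = 101)  # common breaks for all panels
cols <- colorRampPalette(c("springgreen", "royalblue", "yellow", "red"))(100)
mids <- (brks[-1] + brks[-length(brks)]) / 2  # one value per colour cell

pdf(NULL, width = 7, height = 7)  # null device; use png() for a real file
m <- matrix(c(1, 2, 6, 3, 4, 6, 5, 5, 5), nrow = 3, byrow = TRUE)
n_fig <- layout(m, heights = c(0.46, 0.46, 0.08), widths = c(0.42, 0.42, 0.16))

# slots 1-4: the four images, all using the same breaks and colours
par(mar = c(4, 2, 4, 2))
for (i in seq_along(mats)) {
  image(mats[[i]], breaks = brks, col = cols, main = paste("layer", i))
}

# slot 5: the points legend along the bottom
par(mar = c(0, 2, 0, 2) + 0.1)
plot.new()
legend("topleft", legend = letters[1:10], fill = 1:10, ncol = 10, bty = "n")

# slot 6: the colour scale, drawn as a 1 x 100 image of break midpoints
par(mar = c(4, 0.5, 4, 2.5))
image(x = c(0, 1), y = brks, z = matrix(mids, nrow = 1),
      breaks = brks, col = cols, xaxt = "n", yaxt = "n", xlab = "", ylab = "")
axis(4, las = 1)  # value labels on the right-hand side
dev.off()
```

With real raster layers, the image() calls in the loop would presumably be the same ones used in the question; only the slot 6 block is new.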

Specifying axes in base plot for multiple plots

I'm trying to make a plot with specific axes while keeping the aspect ratio at 1.
The problem is that there are parts of the plot that I don't need and want to remove.
I can manage it using margins:
## Creating Data
x <- seq(1, 100, length.out = 100)
y <- seq(1, 400, length.out = 100)
## Playing with margins
par(fin = c(3.75, 5.3) , mar = c(2, 9, 1, 3) + 0.1 )
## Making
plot(y ~ x ,asp = 1)
abline(v = -10)
abline(v = 120)
But if I want to draw multiple plots, I don't know how to remove them:
## Using mfrow
par(mfrow = c(3, 2))
for (i in 1:6) {
  plot(y ~ x, asp = 1, xlim = c(0, 100), ylim = c(0, 400))
  abline(v = -10)
  abline(v = 120)
}
How can I do it for multiple plots?
This might be closer, using the layout method (see this question and layout R documentation):
x <- seq(1, 100, length.out = 100)
y <- seq(1, 400, length.out = 100)
plot.new()
par(mai = c(0.6,0.5,0.3,0.3))
layout(matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3, byrow = TRUE))
for (i in 1:6) {
  plot(y ~ x, asp = 1, ylim = c(0, 400))
  abline(v = -10)
  abline(v = 120)
}
The par(mai = c(b, l, t, r)) option sets the bottom, left, top and right margins in inches, which changes the size of the whitespace surrounding the subplots.

Plot A Confusion Matrix with Color and Frequency in R

I want to plot a confusion matrix, but, I don't want to just use a heatmap, because I think they give poor numerical resolution. Instead, I want to also plot the frequency in the middle of the square. For instance, I like the output of this:
library(mlearning)
data("Glass", package = "mlbench")
Glass$Type <- as.factor(paste("Glass", Glass$Type))
summary(glassLvq <- mlLvq(Type ~ ., data = Glass))
(glassConf <- confusion(predict(glassLvq, Glass, type = "class"), Glass$Type))
plot(glassConf) # Image by default
However, 1.) I don't understand what the "01, 02, etc." along each axis mean. How can we get rid of them?
2.) I would like 'Predicted' to be the label of the 'y' dimension, and 'Actual' to be the label of the 'x' dimension.
3.) I would like to replace absolute counts by frequency / probability.
Alternatively, is there another package that will do this?
In essence, I want this in R:
http://www.mathworks.com/help/releases/R2013b/nnet/gs/gettingstarted_nprtool_07.gif
OR:
http://c431376.r76.cf2.rackcdn.com/8805/fnhum-05-00189-HTML/image_m/fnhum-05-00189-g009.jpg
The mlearning package seems quite inflexible with plotting confusion matrices.
Starting with your glassConf object, you probably want to do something like this:
prior(glassConf) <- 100
# The above rescales the confusion matrix such that columns sum to 100.
opar <- par(mar=c(5.1, 6.1, 2, 2))
x <- x.orig <- unclass(glassConf)
x <- log(x + 0.5) * 2.33
x[x < 0] <- NA
x[x > 10] <- 10
diag(x) <- -diag(x)
image(1:ncol(x), 1:ncol(x),
      -(x[, nrow(x):1]), xlab='Actual', ylab='',
      col=colorRampPalette(c(hsv(h = 0, s = 0.9, v = 0.9, alpha = 1),
                             hsv(h = 0, s = 0, v = 0.9, alpha = 1),
                             hsv(h = 2/6, s = 0.9, v = 0.9, alpha = 1)))(41),
      xaxt='n', yaxt='n', zlim=c(-10, 10))
axis(1, at=1:ncol(x), labels=colnames(x), cex.axis=0.8)
axis(2, at=ncol(x):1, labels=colnames(x), las=1, cex.axis=0.8)
title(ylab='Predicted', line=4.5)
abline(h = 0:ncol(x) + 0.5, col = 'gray')
abline(v = 0:ncol(x) + 0.5, col = 'gray')
text(1:6, rep(6:1, each=6),
     labels = sub('^0$', '', round(c(x.orig), 0)))
box(lwd=2)
par(opar) # reset par
The above code uses bits and pieces of the confusionImage function called by plot.confusion.
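If the confusion matrix is available as a plain matrix of counts, the rescaling to percentages can also be sketched in base R with prop.table(). The matrix below is made up for illustration, and its orientation (rows = predicted, columns = actual) is an assumption; check the orientation of your own object before choosing the margin.

```r
# Hypothetical 3-class confusion matrix of counts
# (assumed orientation: rows = predicted, columns = actual)
counts <- matrix(c(50,  3,  2,
                    5, 40,  5,
                    0,  7, 38),
                 nrow = 3,
                 dimnames = list(Predicted = c("A", "B", "C"),
                                 Actual    = c("A", "B", "C")))

# Rescale so that every column (every actual class) sums to 100
pct <- 100 * prop.table(counts, margin = 2)

# Cell labels that could be passed to text() instead of the raw counts
cell_labels <- sprintf("%.1f%%", pct)

print(round(pct, 1))
```

Using margin = 1 instead would make each row sum to 100, which answers a different question (how the predictions for one class are distributed).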
Here is a function for plotting confusion matrices that I developed from jbaums' excellent answer.
It is similar, but looks a bit nicer (IMO), and does not transpose the confusion matrix you feed it, which might be helpful.
### Function for plotting confusion matrices
confMatPlot = function(confMat, titleMy, shouldPlot = TRUE) {
  #' Function for plotting confusion matrices
  #'
  #' @param confMat: confusion matrix with counts, i.e. integers.
  #' Fractions won't work
  #' @param titleMy: String containing plot title
  #' @return Nothing: It only plots
  ## Prepare data
  x.orig = confMat; rm(confMat) # Lazy conversion to function internal variable name
  n = nrow(x.orig) # conf mat is square by definition, so nrow(x) == ncol(x)
  opar <- par(mar = c(5.1, 8, 3, 2))
  x <- x.orig
  x <- log(x + 0.5) # x<1 -> x<0 , x>=1 -> x>0
  x[x < 0] <- NA
  diag(x) <- -diag(x) # change sign to give diagonal different color
  ## Plot confusion matrix
  image(1:n, 1:n, # grid of coloured boxes
        # matrix giving color values for the boxes
        # t() and [, n:1] since image puts [1,1] in bottom left by default
        -t(x)[, n:1],
        # ylab added later to avoid overlap with tick labels
        xlab = 'Actual', ylab = '',
        col = colorRampPalette(c("darkorange3", "white", "steelblue"),
                               bias = 1.65)(100),
        xaxt = 'n', yaxt = 'n'
  )
  # Plot counts
  text(rep(1:n, each = n), rep(n:1, times = n),
       labels = sub('^0$', '', round(c(x.orig), 0)))
  # Axis ticks but no labels
  axis(1, at = 1:n, labels = rep("", n), cex.axis = 0.8)
  axis(2, at = n:1, labels = rep("", n), cex.axis = 0.8)
  # Tilted axis labels
  text(cex = 0.8, x = (1:n), y = -0.1, colnames(x), xpd = TRUE, srt = 30, adj = 1)
  text(cex = 0.8, y = (n:1), x = +0.1, colnames(x), xpd = TRUE, srt = 30, adj = 1)
  title(main = titleMy)
  title(ylab = 'Predicted', line = 6)
  # Grid and box
  abline(h = 0:n + 0.5, col = 'gray')
  abline(v = 0:n + 0.5, col = 'gray')
  box(lwd = 1, col = 'gray')
  par(opar)
}
Example of output:
