Dendrogram based on col values in a dataframe using R [duplicate]

I would like to create a dendrogram in R which has colored branches, like the one shown below.
So far I have used the following commands to create a standard dendrogram:
d <- dist(as.matrix(data[,29])) # find distance matrix
hc <- hclust(d) # apply hierarchical clustering
plot(hc,labels=data[,1], main="", xlab="") # plot the dendrogram
How should I modify this code to obtain the desired result?
Thanks in advance for your help.

You could use the dendextend package, which is designed for tasks such as this:
# install the package:
if (!require('dendextend')) install.packages('dendextend'); library('dendextend')
## Example:
dend <- as.dendrogram(hclust(dist(USArrests), "ave"))
d1=color_branches(dend,k=5, col = c(3,1,1,4,1))
plot(d1) # selective coloring of branches :)
d2=color_branches(d1,k=5) # auto-coloring 5 clusters of branches.
plot(d2)
# More examples are in ?color_branches
You can see many examples in the presentations and vignettes of the package, in the "usage" section in the following URL: https://github.com/talgalili/dendextend

You should use dendrapply (see ?dendrapply).
For instance:
# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)
# Create dendrogram
d <- dist(data)
hc <- as.dendrogram(hclust(d))
# Function to color branches
colbranches <- function(n, col)
{
a <- attributes(n) # Find the attributes of current node
# Color edges with requested color
attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
n # Don't forget to return the node!
}
# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")
# Plot
plot(hc)
Which gives:

FigTree can make color dendrograms. See for example, this paper.
To get data into FigTree from an R distance matrix dm,
library(ape)
z <- as.phylo(hclust(as.dist(dm)))
write.nexus(z, file="output.nex")

Related

Function to color branches in dendrogram plot using base R

I would like to write R function for coloring branches in dendrogram based on the given dendrogram object, specified number of clusters and vector of colors. I want to use base R instead of dendextend.
Using the exact code from this answer to a similar question (https://stackoverflow.com/a/18036096/7064628):
# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)
# Create dendrogram
d <- dist(data)
hc <- as.dendrogram(hclust(d))
# Function to color branches
colbranches <- function(n, col)
{
a <- attributes(n) # Find the attributes of current node
# Color edges with requested color
attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
n # Don't forget to return the node!
}
# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")
# Plot
plot(hc)
In the code above, you have to select the branches to recolor manually. I would like a function that finds the k highest branches and changes the color of each of them (and all of their sub-branches). So far I have experimented with iteratively searching for the highest sub-branch, but that seems needlessly difficult. It would be great if there were a way to extract the heights of all branches, find the k highest, and change the edgePar of each of their sub-branches.
The dendextend R package is designed for these tasks. You can see the many options for changing a dendrogram's branch colors in the vignette.
For example:
library(dendextend) # also re-exports the %>% pipe used below
par(mfrow = c(1,2))
dend <- USArrests %>% dist %>% hclust(method = "ave") %>% as.dendrogram
d1=color_branches(dend,k=5, col = c(3,1,1,4,1))
plot(d1) # selective coloring of branches :)
d2=color_branches(d1,5)
plot(d2)
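For the base R route asked about in the question, one possible sketch is to let cutree() define the k clusters and then walk the dendrogram, colouring every edge whose leaves all fall into a single cluster. The name color_k_branches and its inner helper walk are hypothetical (they are not part of base R or dendextend), and the function takes the hclust object rather than the dendrogram so that cutree() can be applied to it directly.

```r
# Hypothetical helper (not from any package): colour the k top-level clusters
# of an hclust tree using base R only.
color_k_branches <- function(hc, k, cols, lwd = 2) {
  stopifnot(inherits(hc, "hclust"), length(cols) >= k)
  memb <- cutree(hc, k = k)  # cluster id (1..k) for each observation
  walk <- function(node) {
    # Leaf values of a dendrogram are the original observation indices
    leaves <- as.integer(unlist(node, use.names = FALSE))
    cl <- unique(memb[leaves])
    if (length(cl) == 1L) {
      # All leaves below this node share one cluster: colour the edge above it
      attr(node, "edgePar") <- c(attr(node, "edgePar"),
                                 list(col = cols[cl], lwd = lwd))
    }
    if (!is.leaf(node)) {
      for (i in seq_along(node)) node[[i]] <- walk(node[[i]])
    }
    node
  }
  walk(as.dendrogram(hc))
}

dend <- color_k_branches(hclust(dist(USArrests), "average"), k = 3,
                         cols = c("red", "orange", "blue"))
plot(dend)
```

Edges above the cut (those whose leaves span more than one cluster) keep the default colour, so the k coloured subtrees are the k highest branches that cutree() separates.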

How to display the same legend for all raster layers in a rasterStack

I've looked through all SO but haven't been able to find an answer to this specific question:
I have a rasterStack with several layers whose values span over a quite large range. Knowing the values of each layer and the chosen colour scale I believe I have managed to plot all the rasters with the same colour scale, but I face three problems now:
1. I can't be sure that the values are being plotted with the same colour scale, although it seems like it.
2. I can't manage to plot the same scale and scale labels for all layers.
3. One of my cells in ras3, ras3[3,9], doesn't get coloured, but it is not NA!
Example script follows:
library(raster)
# generate example rasters
set.seed(123)
ras1 <- raster(ncol = 10, nrow= 10)
values(ras1) <- runif(100, 1, 10)
ras2 <- raster(ncol = 10, nrow = 10)
values(ras2) <- runif(100, 5, 50)
ras3 <- raster(ncol = 10, nrow = 10)
values(ras3) <- runif(100, 10, 100)
# stack them
rasStack <- stack(ras1, ras2, ras3)
# plot normally to check the values
plot(rasStack)
# obtain max and min values
maxv <- max(maxValue(rasStack))+1
minv <- min(minValue(rasStack))
# set the breaks between min and max values
brks <- seq(minv,maxv,by=0.1)
nbrks <- length(brks)-1
r.range <- c(minv, maxv)
# generate palette
colfunc<-colorRampPalette(c("springgreen", "royalblue", "yellow", "red"))
# plot in a loop with a common legend, using legend.only = T
for(i in seq_len(nlayers(rasStack))){
tmp <- rasStack[[i]]
plot(tmp, breaks=brks,col=colfunc(nbrks), legend = F, zlim=c(minv,maxv),
main = names(tmp))
plot(tmp, legend.only=TRUE, col=colfunc(nbrks),
legend.width=1, legend.shrink=0.75,
axis.args=list(at=seq(minv, maxv, 5),
labels=round(seq(r.range[1], r.range[2], 5), 2),
cex.axis=0.6),
legend.args=list(text='value', side=4, font=2, line=2.5, cex=0.8))}
# check that the blank cell has a valid value
ras3[3, 9]
> 99.01704
Any help will be greatly appreciated!
EDIT: as per ycw's answer I've edited the code and now problem no. 3 has disappeared!
I just fixed this problem, so I'll post the solution in case someone else stumbles upon this.
It might be a bit of a workaround, and it certainly is not very elegant, but it works:
First of all, we add up all three raster layers in a new one:
rasTot <- ras1 + ras2 + ras3
Now we run the loop from the previous code, but in the plot call with legend.only we use this total raster:
for(i in seq_len(nlayers(rasStack))){
tmp <- rasStack[[i]]
plot(tmp, breaks=brks,col=colfunc(nbrks), legend = F, zlim=c(minv,maxv),
main = names(tmp))
plot(rasTot, legend.only=TRUE, col=colfunc(nbrks),
legend.width=1, legend.shrink=0.75,
legend.args=list(text='value', side=4, font=2, line=2.5, cex=0.8))
}
I also edited out some of the legend label specifications, as the defaults are OK.
The last break number should be larger than the maximum value of your data (maxv) so that the cell with the maximum can be colored based on the last color category. Otherwise, the cell will be blank.
I modified your code by changing maxv <- max(maxValue(rasStack)) + 1 but did not change other parts. The result looks good.
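A likely reason the cell stayed blank can be sketched in base R: seq() with a fixed step usually stops short of the upper limit, so the largest data value lies beyond the last break and falls into no colour class. The values below are made up for illustration (only 99.01704 comes from the question), and cut() is used here as a stand-in for the binning done by the plotting code.

```r
# Made-up values; only 99.01704 is taken from the question
vals <- c(1.23, 50.4, 99.01704)

# Breaks ending at the data maximum: seq() stops at the last full 0.1 step
brks_short <- seq(min(vals), max(vals), by = 0.1)
max(brks_short) < max(vals)                   # TRUE: the maximum is not covered
cut(vals, brks_short, include.lowest = TRUE)  # last element is NA

# Breaks extended past the maximum, as in the fix above
brks_long <- seq(min(vals), max(vals) + 1, by = 0.1)
anyNA(cut(vals, brks_long, include.lowest = TRUE))  # FALSE
```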

R - draw new layer behind current plot

Just curious: when plotting in R, one can easily change the order of the executed code to change the order of the "layers" on the plot, e.g.
plot(x, type = "n")
lines(y)
points(x)
to get x drawn over y. Is there any way to do it in an ad hoc way, e.g.
plot(x)
lines(y, behind = TRUE) # fictional option behind
While there isn't explicitly a behind option or layers in plot, an easy way to overlay two plots might be using the add = TRUE option in plot. Here is an example with artificial data:
# Load sp package for creating artificial data
library(sp)
# Create sample town points
towns <- data.frame(lon = sample(100), lat = sample(100))
towns <- SpatialPoints(towns)
# Create sample polygon grid
grd <- GridTopology(c(1,1), c(10,10), c(10,10))
polys <- as.SpatialPolygons.GridTopology(grd)
# Plot polygons
plot(polys)
# Add towns (in red colour)
plot(towns, add = TRUE, col = 'red')
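As another example, you can plot lines on different layers with ggplot2, after reshaping the data with reshape2::melt, like this:
library(ggplot2)
library(reshape2)
a <- c(3, 6, 16, 17, 11, 21)
b <- c(0.3, 2.3, 9, 9, 5 ,12)
c <- c(3, 7, 9, 7, 6, 10)
dat <- data.frame(a=a,b=b,c=c)
dat <- melt(dat) # long format with columns 'variable' and 'value'
Add an explicit 'x' variable to our data frame:
dat$x <- rep(1:6,times=3)
Then just plot the graph:
# 'colours' was not defined in the original snippet; these values are arbitrary placeholders
colours <- c("red", "blue", "darkgreen")
ggplot(dat,aes(x=x,y=value)) +
geom_line(aes(colour=variable)) +
scale_colour_manual(values=colours) +
labs(x="time[h]",y="a",colour="") +
ggtitle("bla") # replaces opts(title = "bla"), which no longer exists in ggplot2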
Finally, there is explicit support for layers in other packages, such as in PBSmapping for maps.
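In base graphics, one way to get close to the fictional behind option is the panel.first argument of plot.default, which is evaluated after the axes are set up but before the data are drawn. A minimal sketch with made-up x and y:

```r
# Made-up example data
x <- c(2, 4, 3, 6, 5)
y <- c(1, 3, 5, 4, 6)

# lines(y) is evaluated before the points of x are drawn, so it ends up behind them
plot(x, pch = 19, cex = 2, ylim = range(x, y),
     panel.first = lines(y, col = "grey", lwd = 3))
```

This only helps at the moment plot() is called. Base graphics paints in order, so nothing can be slipped behind elements that are already on the device.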

R legend for color density scatterplot produced using smoothScatter

I am producing a color density scatterplot in R using the smoothScatter() function.
Example:
## A largish data set
n <- 10000
x1 <- matrix(rnorm(n), ncol = 2)
x2 <- matrix(rnorm(n, mean = 3, sd = 1.5), ncol = 2)
x <- rbind(x1, x2)
oldpar <- par(mfrow = c(2, 2))
smoothScatter(x, nrpoints = 0)
Output:
The issue I am having is that I am unsure how to add a legend/color scale that describes the relative difference in numeric terms between different shades. For example, there is no way to tell whether the darkest blue in the figure above is 2 times, 10 times or 100 times as dense as the lightest blue without some sort of legend or color scale. Is there any way in R to retrieve the requisite information to make such a scale, or anything built in that can produce a color scale of this nature automatically?
Here is an answer that relies on fields::image.plot and some fiddling with par(mar) to get the margins correct:
fudgeit <- function(){
xm <- get('xm', envir = parent.frame(1))
ym <- get('ym', envir = parent.frame(1))
z <- get('dens', envir = parent.frame(1))
colramp <- get('colramp', parent.frame(1))
fields::image.plot(xm, ym, z, col = colramp(256), legend.only = TRUE, add = FALSE)
}
par(mar = c(5,4,4,5) + .1)
smoothScatter(x, nrpoints = 0, postPlotHook = fudgeit)
You can fiddle around with image.plot to get what you want and look at ?bkde2D and the transformation argument to smoothScatter to get an idea of what the colours represent.
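To see which numbers sit behind the shades, one option is to recompute the density estimate yourself. The sketch below assumes, following ?smoothScatter, that the plot is based on KernSmooth::bkde2D with a 128 x 128 grid and the default transformation x^0.25; the bandwidth formula is an assumption about the default and may not match your R version exactly.

```r
library(KernSmooth)

set.seed(1)
n <- 10000
x <- rbind(matrix(rnorm(n), ncol = 2),
           matrix(rnorm(n, mean = 3, sd = 1.5), ncol = 2))

# Assumed default bandwidth: 5%-95% quantile range of each column, divided by 25
bw <- as.numeric(diff(apply(x, 2, quantile, probs = c(0.05, 0.95)))) / 25

est <- bkde2D(x, bandwidth = bw, gridsize = c(128, 128))
range(est$fhat)  # raw density values on the grid

# Default transformation of smoothScatter(); pmax() guards against tiny negatives
dens <- pmax(est$fhat, 0)^0.25
range(dens)      # the scale that is mapped onto the colour ramp
```

Because of the fourth-root transformation, a value twice as high on the transformed scale corresponds to a raw density 16 times higher.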

R: How do I display clustered matrix heatmap (similar color patterns are grouped)

I searched a lot of questions about heatmap throughout the site and packages, but I still have a problem.
I have clustered data (kmeans/EM/DBscan..), and I want to create a heatmap by grouping the same cluster. I want the similar color patterns to be grouped in the heatmap, so generally, it looks like a block-diagonal.
I tried to order the data by the cluster number and display it,
k = kmeans(data, 3)
d = data.frame(data)
d = data.frame(d, k$cluster)
d = d[order(d$k.cluster),]
heatmap(as.matrix(d))
but the result is still not sorted and looks like this link. Instead, I want it to be sorted by cluster number and to look like this:
Can I do this in R?
I searched lots of packages and tried many ways, but I still have a problem.
Thanks a lot.
You can do this using reshape2 and ggplot2 as follows:
library(reshape2)
library(ggplot2)
# Create dummy data
set.seed(123)
df <- data.frame(
a = sample(1:5, 1000, replace=TRUE),
b = sample(1:5, 1000, replace=TRUE),
c = sample(1:5, 1000, replace=TRUE)
)
# Perform clustering
k <- kmeans(df, 3)
# Append id and cluster
dfc <- cbind(df, id=seq(nrow(df)), cluster=k$cluster)
# Add idsort, the id number ordered by cluster
dfc$idsort <- dfc$id[order(dfc$cluster)]
dfc$idsort <- order(dfc$idsort)
# use reshape2::melt to create data.frame in long format
dfm <- melt(dfc, id.vars=c("id", "idsort"))
ggplot(dfm, aes(x=variable, y=idsort)) + geom_tile(aes(fill=value))
You should set Rowv and Colv to NA if you don't want the dendrograms and the subsequent reordering. By the way, you should also turn off the scaling. Using the df from Andrie's answer:
heatmap(as.matrix(df)[order(k$cluster),],Rowv=NA,Colv=NA,scale="none",labRow=NA)
In fact, this whole heatmap is based on image(). You can hack away using image() to construct a plot exactly the way you want. heatmap() uses layout() internally, so it is difficult to set the margins. With image() you could do, e.g.:
myHeatmap <- function(x,ord,xlab="",ylab="",main="My Heatmap",
col=heat.colors(5), ...){
op <- par(mar=c(3,0,2,0)+0.1)
on.exit(par(op))
nc <- NCOL(x)
nr <- NROW(x)
labCol <- names(x)
x <- t(x[ord,])
image(1L:nc, 1L:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 +
c(0, nr), axes = FALSE, xlab=xlab, ylab=ylab, main=main,
col=col,...)
axis(1, 1L:nc, labels = labCol, las = 2, line = -0.5, tick = 0)
axis(2, 1L:nr, labels = NA, las = 2, line = -0.5, tick = 0)
}
library(RColorBrewer)
myHeatmap(df,order(k$cluster),col=brewer.pal(5,"BuGn"))
This produces a plot with smaller margins on the sides. You can also manipulate the axes, colors, and so on. You should definitely take a look at the RColorBrewer package.
(This custom function is based on the internal plotting used by heatmap btw, simplified for the illustration and to get rid of all the dendrogram stuff)
