R: display clustered heatmap with similarity matrix - r

I have a similarity matrix as follows:
xx <- cor(matrix(rnorm(650), ncol =25))
I want to cluster this similarity matrix and display it as a heatmap. Is the following correct?
yy <- heatmap(1-xx, Rowv=T, scale='none',symm = T,keep.dendro=F,
Here, I am taking 1-xx which is a dissimilarity matrix. Is this the right thing to do, or should it be input in some other way?

I have figured it out upon reading one of the examples in R. Here is what one has to do using the similarity matrix.
hU <- heatmap(xx, Rowv = FALSE, symm = TRUE,
distfun = function(c) as.dist(1 - c),
hclustfun = function(d) hclust(d, method = "single"),
keep.dendro = FALSE)
I hope that this helps someone!
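For anyone who wants to see the intermediate steps, here is a minimal sketch that does the conversion and clustering by hand and then passes the finished dendrogram to heatmap(). The seed and the names d and hc are my own additions; the rest follows the code above.

```r
set.seed(1)
xx <- cor(matrix(rnorm(650), ncol = 25))   # 25 x 25 similarity matrix

# Turn similarity into dissimilarity: correlation 1 -> 0, correlation -1 -> 2
d  <- as.dist(1 - xx)
hc <- hclust(d, method = "single")

# Supply the finished dendrogram; heatmap() then uses it as given
heatmap(xx, Rowv = as.dendrogram(hc), symm = TRUE, scale = "none")
```

Note that 1 - xx treats strongly negative correlations as the most distant pairs. If the sign should not matter for your data, 1 - abs(xx) might be a more suitable dissimilarity, but that is a modelling choice rather than a rule.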

Related

How to label just one observation in hierarchical clustering tree with dendextend?

I'd like to create a hierarchical clustering tree of a relatively large dataset (>3000 obs). Unfortunately, by including so many labels at the terminal nodes, the tree looks very cluttered and contains lots of unnecessary information. So to reduce the clutter, I'd like to just label one observation of interest. I have removed all of the labels but I don't know how to retrieve and add the label that I'm interested in.
For this MWE, let's assume I'd like to add the letter k to my dendrogram.
library(dendextend)
library(cluster)
library(tidyverse)
set.seed(1)
a <- rnorm(20)
b <- rnorm(20)
c <- rnorm(20)
df <- data.frame(a, b, c)
rownames(df) <- letters[seq_len(nrow(df))]  # name the 20 observations a..t
my_dist <- dist(df)
my_clust <- hclust(my_dist)
my_dend <- as.dendrogram(my_clust)
plot(color_branches(my_dend, k = 3), leaflab = "none", horiz = T)
You can specify the labels with the set function. If you only want to show one, make the others the empty string.
LAB = rep("", nobs(my_dend))
LAB[15] = "N15"
my_dend = set(my_dend, "labels", LAB)
plot(color_branches(my_dend, k = 3), horiz = T)
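If you would rather pick the observation by name than by its position in the tree, one option is to blank the labels on the hclust object before converting it. This is only a sketch in base R, and it assumes the observations are named a to t via row names, which is my reading of the question.

```r
set.seed(1)
df <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
rownames(df) <- letters[1:20]          # assumed: observations are named a..t

hc <- hclust(dist(df))
# hc$labels is in the original row order, so match by name, not by position
hc$labels <- ifelse(hc$labels == "k", "k", "")

dend <- as.dendrogram(hc)
plot(dend, horiz = TRUE)
```

The resulting dend is an ordinary dendrogram, so it should be possible to pass it to color_branches as in the question.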

Most efficient way of drawing a heatmap from matrix with OpenGL?

Assume a matrix m of integer values:
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 10)
Given a colour palette that maps those values from 1 to 10 to some colours, how to show matrix m as a heatmap in R with OpenGL graphics, e.g. using the rgl package? (Preferably in the most efficient way.)
The very thorough answer here suggests this may not be what you want; you might want to try the solution below against the other solutions benchmarked there. Nonetheless:
Set up data and colour map
set.seed(101)
library(viridisLite)
vv <- viridis(10)
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 10)
Draw the picture:
library(rgl)
view3d(theta=0, phi=0) ## head-on view
par3d(zoom=0.7) ## (almost) fill window
surface3d(x = 1:10, y = 1:10, z = matrix(0, 10,10),
color = vv[m],
smooth=FALSE, lit=FALSE ## turn off smoothing/lights
)
You may need to use pop3d() between surfaces to clear the previous surface ...
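The color = vv[m] argument above relies on indexing a palette with an integer matrix. A small base R sketch of what that indexing returns, using hcl.colors as a stand-in for the viridis palette (the names pal and cols are mine):

```r
pal <- hcl.colors(10, "Viridis")     # stand-in palette with 10 colours
set.seed(101)
m <- matrix(sample(1:10, 100, replace = TRUE), nrow = 10)

cols <- pal[m]            # character vector of length 100, column-major order
dim(cols) <- dim(m)       # optional: restore the 10 x 10 shape for inspection
```

After restoring the dimensions, cols[i, j] is the colour assigned to m[i, j], which makes it easy to check the mapping before drawing anything.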

Random Graph Function in R

I have an assignment in which I have to write my own random graph function in R, with igraph output. I've figured out that the easiest way to do this is to generate a square matrix and then build a function which creates edges between the nodes in the matrix. However, I'd like to do something special, where the edge probabilities are biased towards forming Sybil networks. It would look like this:
My matrix is generated and visualised quite simply like this:
library(ggraph)
library(igraph)
NCols <- 20
NRows <- 20
myMat <-matrix(runif(NCols*NRows), ncol = NCols)
myMat
randomgraph <- graph_from_adjacency_matrix(myMat, mode = "undirected", weighted = NULL, diag = TRUE, add.colnames = NULL, add.rownames = NA)
randomgraph %>%
ggraph() +
geom_node_point(colour = "firebrick4", size = 0.5, show.legend = F)
I know there are functions for Erdős-Rényi random graphs (a true random graph), Barabási-Albert scale-free graphs and Watts-Strogatz small-world graphs. I'm trying to write my own with a unique twist.
Any advice or code snippets on how to write my own preferential attachment function for the random matrix would be greatly appreciated! Thank you!
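One possible starting point, offered as a rough sketch rather than a finished answer: grow the adjacency matrix one node at a time and let each new node pick its neighbours with probability proportional to their current degree. The function name pref_attach_adj and its arguments are hypothetical, and this is plain degree-proportional attachment with no Sybil-specific twist; the weights passed to prob are where such a twist could go.

```r
# Hypothetical helper (not an igraph function): returns a symmetric 0/1
# adjacency matrix grown by degree-proportional attachment.
pref_attach_adj <- function(n, m = 1) {
  stopifnot(n >= 2, m >= 1)
  adj <- matrix(0L, n, n)
  adj[1, 2] <- adj[2, 1] <- 1L                 # seed graph: one edge
  for (v in seq_len(n)[-(1:2)]) {              # add nodes 3, 4, ..., n
    old <- seq_len(v - 1)
    deg <- rowSums(adj[old, old, drop = FALSE])
    k   <- min(m, length(old))
    # +1 smooths the weights so low-degree nodes are not starved;
    # sampling is without replacement, so no duplicate edges
    targets <- old[sample.int(length(old), size = k, prob = deg + 1)]
    adj[v, targets] <- 1L
    adj[targets, v] <- 1L
  }
  adj
}

set.seed(42)
adj <- pref_attach_adj(20, m = 2)
```

The result can then be handed to graph_from_adjacency_matrix(adj, mode = "undirected") to get an igraph object for plotting.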

K-means clustering in R

I'm a beginner in R and I followed this tutorial on K-means clustering. However, I'm trying to run this algorithm on real data. I chose: http://exoplanet.eu/catalog/
I have loaded the data:
d <- read.csv2(
"exoplanet.eu_catalog.csv",
header = TRUE,
sep = ","
)
With this code:
plot(
x = log(as.numeric(as.character(d$semi_major_axis))),
y = log(as.numeric(as.character(d$mass))),
xlab = "Star-exoplanet distance (log(AU))",
ylab = "Mass of exoplanets (log(M[Jupiter]))"
)
I get the following plot:
I'd like to run the K-means clustering algorithm on this plot to show three clusters in different colours, but I don't know how to proceed in R. I suppose I have to begin with:
y = log(as.numeric(as.character(d$mass)))
y <- y[!is.na(y)]
x = log(as.numeric(as.character(d$semi_major_axis)))
x <- x[!is.na(x)]
But I don't know how to format the data into a matrix in order to run kmeans(matrix, 3, nstart = 20). Any pointers, please?
Since you read your file using
d <- read.csv2("exoplanet.eu_catalog.csv",
header = TRUE,
sep = ",")
your data is a data frame, which you need to convert to a matrix.
Use this code to convert the data frame into a matrix:
inMatrixForm <- data.matrix(d)
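Converting the whole data frame may pull in columns you don't want (data.matrix turns factors into integer codes), and dropping missing values from x and y separately can leave the two vectors misaligned. A minimal sketch of an alternative, using simulated stand-in data because the catalogue file is not available here; only the column names semi_major_axis and mass come from the question.

```r
set.seed(1)
# Stand-in for the catalogue: two numeric columns with some missing values
d <- data.frame(
  semi_major_axis = c(rlnorm(60), NA, rlnorm(39)),
  mass            = c(NA, rlnorm(99))
)

# Build a two-column matrix, then drop a row if either value is missing
xy <- cbind(x = log(d$semi_major_axis), y = log(d$mass))
xy <- xy[complete.cases(xy), ]

km <- kmeans(xy, centers = 3, nstart = 20)
plot(xy, col = km$cluster, pch = 19)   # colour each point by its cluster
```

With the real file, the two columns would first need the as.numeric(as.character(...)) conversion shown in the question.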

setting distance matrix and clustering methods in heatmap.2

heatmap.2 defaults to dist for calculating the distance matrix and hclust for clustering.
Does anyone know how I can set dist to use the euclidean method and hclust to use the centroid method?
I have provided a runnable code sample below.
I tried: distfun = dist(method = "euclidean"),
but that doesn't work. Any ideas?
library("gplots")
library("RColorBrewer")
test <- matrix(c(79,38.6,30.2,10.8,22,
81,37.7,28.4,9.7,19.9,
82,36.2,26.8,9.8,20.9,
74,29.9,17.2,6.1,13.9,
81,37.4,20.5,6.7,14.6),ncol=5,byrow=TRUE)
colnames(test) <- c("18:0","18:1","18:2","18:3","20:0")
rownames(test) <- c("Sample 1","Sample 2","Sample 3", "Sample 4","Sample 5")
test <- as.table(test)
mat=data.matrix(test)
heatmap.2(mat,
dendrogram="row",
Rowv=TRUE,
Colv=NULL,
distfun = dist,
hclustfun = hclust,
xlab = "Lipid Species",
ylab = NULL,
colsep=c(1),
sepcolor="black",
key=TRUE,
keysize=1,
trace="none",
density.info=c("none"),
margins=c(8, 12),
col=bluered
)
Glancing at the code for heatmap.2, I'm fairly sure that the default is to use dist, and its default is in turn to use euclidean distances.
The reason your attempt at passing distfun = dist(method = 'euclidean') didn't work is that distfun (and hclustfun) are supposed to be functions, not the result of calling one. So if you want to alter the defaults and pass arguments, you need to write a wrapper function like this:
heatmap.2(...,hclustfun = function(x) hclust(x,method = 'centroid'),...)
As I mentioned, I'm fairly certain that heatmap.2 is using euclidean distances by default, but a similar solution can be used to alter the distance function used:
heatmap.2(...,distfun = function(x) dist(x,method = 'euclidean'),...)
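To check the two wrappers on their own before handing them to heatmap.2, something like the following sketch may help. It uses base R's heatmap, which takes the same distfun and hclustfun arguments, so it runs without gplots; the names euclid and centroid are mine.

```r
mat <- matrix(c(79, 38.6, 30.2, 10.8, 22,
                81, 37.7, 28.4,  9.7, 19.9,
                82, 36.2, 26.8,  9.8, 20.9,
                74, 29.9, 17.2,  6.1, 13.9,
                81, 37.4, 20.5,  6.7, 14.6), ncol = 5, byrow = TRUE)

# Wrappers: each is a function, so it can be passed as an argument
euclid   <- function(x) dist(x, method = "euclidean")
centroid <- function(d) hclust(d, method = "centroid")

hc <- centroid(euclid(mat))   # what the heatmap does internally for the rows

heatmap(mat, Colv = NA, scale = "none",
        distfun = euclid, hclustfun = centroid)
```

One caveat from the hclust documentation: the centroid method is meant to be used with squared Euclidean distances, so it may be worth comparing the result against dist(x)^2.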
