How to draw dendrogram using ape package in R?

I have a distance matrix of roughly 200 x 200, and I am unable to plot a readable dendrogram using the BioNJ option of the ape library in R.
The matrix is too big for the plot to be legible.
How can I improve the visibility?

There are two options, depending on your data.
If you need to calculate the distance matrix from your data, use:
set.seed(1) # makes random sampling with rnorm reproducible
# example matrix
m <- matrix(rnorm(100), nrow = 5) # any MxN matrix
distm <- dist(m) # distance matrix
hm <- hclust(distm)
If your data is already a distance matrix (it must be a square matrix!):
# example matrix
m <- matrix(rnorm(25), nrow=5) # must be square matrix!
distm <- as.dist(m)
hm <- hclust(distm)
A 200 x 200 distance matrix gives me a reasonable plot:
# example matrix
m <- matrix(rnorm(200*200), nrow=200) # must be square matrix!
distm <- as.dist(m)
hm <- hclust(distm)
plot(hm) # draw the dendrogram
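If the labels still overlap with 200 leaves, one option is to shrink the label text and draw on a much wider page. The following is only a minimal base R sketch of that idea: the random points, the item names and the output file name are made up for illustration. The same `cex` idea should carry over to ape, where `plot()` on the tree returned by `bionj()` also accepts `cex` and layouts such as `type = "fan"`.

```r
set.seed(1)
# Hypothetical data: 200 items described by 10 variables
pts <- matrix(rnorm(200 * 10), nrow = 200)
rownames(pts) <- paste0("item", seq_len(200))

hm <- hclust(dist(pts))

# Write to a wide PDF page so that 200 leaf labels have room
out_file <- file.path(tempdir(), "dendrogram.pdf")
pdf(out_file, width = 30, height = 8)
# cex shrinks the labels; hang = -1 aligns all leaves on the baseline
plot(hm, cex = 0.5, hang = -1, main = "200 items", xlab = "", sub = "")
dev.off()
```

Opening the PDF and zooming in is usually easier than trying to read the same tree in the default plotting window.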


Error in R: Cosine similarity and MDS

I calculate the cosine similarity with cosine() from the package 'lsa'. Here with three test vectors:
library(lsa) # provides cosine()
d <- data.frame(c(-1,1,0,-1,1,1,-1,1,0),c(-1,1,1,1,-1,1,-1,0,1),c(0,0,1,0,-1,-1,0,1,-1))
colnames(d) <- c("vector1","vector2","vector3")
d_dist <- cosine(as.matrix(d))
Now I want to do dimensionality reduction with cmdscale and then plot the result as a scatterplot:
fit <- cmdscale(d_dist,k=2)
x <- fit[,2]
y <- fit[,1]
But I always get the warning "In cmdscale(d_dist, k = 2): only 0 of the first 2 eigenvalues are > 0" [translated from German] and an empty fit object.
What am I doing wrong? Thank you so much for your help!
cmdscale expects a distance (dissimilarity) matrix, but cosine() returns a similarity matrix. Convert it first, e.g.:
d_dist <- 1-d_dist
fit <- cmdscale(d_dist,k=2)
x <- fit[,2]
y <- fit[,1]
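To see the effect without the lsa package, here is a small sketch that computes the column-wise cosine similarity in base R (a stand-in for `cosine()`, not its actual implementation), converts it to a dissimilarity and runs `cmdscale` on the same three test vectors:

```r
d <- cbind(vector1 = c(-1, 1, 0, -1, 1, 1, -1, 1, 0),
           vector2 = c(-1, 1, 1, 1, -1, 1, -1, 0, 1),
           vector3 = c(0, 0, 1, 0, -1, -1, 0, 1, -1))

# Column-wise cosine similarity: dot products divided by the vector norms
norms <- sqrt(colSums(d^2))
sim <- crossprod(d) / outer(norms, norms)

# Similarity -> dissimilarity: zero on the diagonal, larger means less alike
d_dist <- 1 - sim

fit <- cmdscale(as.dist(d_dist), k = 2)
plot(fit[, 1], fit[, 2], type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(fit[, 1], fit[, 2], labels = rownames(fit))
```

With the raw similarity matrix the diagonal is 1 rather than 0, so it cannot be read as a set of distances, which is why the conversion step matters.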

To find intersection of clusters in R

Let's assume I have done several operations and created cluster vectors of correlation values, as shown below:
D <- matrix(rexp(10*10,rate=.1), ncol=10) #create a randomly filled 10x10 matrix
C <- matrix(rexp(10*10,rate=.1),ncol=10)
DCor <- cor(D) # generate correlation matrix
CCor <- cor(C)
DUpper<- DCor[upper.tri(DCor)] # extract upper triangle
CUpper<- CCor[upper.tri(CCor)]
ClusterD <- kmeans(DUpper,3) # cluster correlations
ClusterC <- kmeans(CUpper,3)
ClusterC <- cbind(c(1:45),matrix(ClusterC$cluster)) # add row numbers as column
ClusterD <- cbind(c(1:45),matrix(ClusterD$cluster))
I would like to generate a matrix that shows the intersection of each pair of cluster groups. In such a matrix, for example, 5 rows might belong to both the C1 and the D2 group.
How can I generate a matrix like this?
Before the cbind lines, you could do:
table(ClusterC$cluster, ClusterD$cluster)
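As a self-contained sketch of the same idea, the cluster labels can be prefixed so the rows and columns of the cross-tabulation read as C1..C3 and D1..D3. The seed and the names `clC`, `clD` and `overlap` are only illustrative:

```r
set.seed(42)
D <- matrix(rexp(10 * 10, rate = 0.1), ncol = 10)
C <- matrix(rexp(10 * 10, rate = 0.1), ncol = 10)
DCor <- cor(D)
CCor <- cor(C)
DUpper <- DCor[upper.tri(DCor)]   # 45 correlations
CUpper <- CCor[upper.tri(CCor)]

clD <- kmeans(DUpper, 3)$cluster
clC <- kmeans(CUpper, 3)$cluster

# Each cell counts the rows that fall in that C cluster and that D cluster
overlap <- table(C = paste0("C", clC), D = paste0("D", clD))
print(overlap)
```

The cell counts add up to 45, the number of upper-triangle correlations that were clustered.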

How to input dissimilarity matrix in spatial analysis in spdep R

Aim: I want to create a dissimilarity matrix between pairs of coordinates. I want to use this matrix as an input to calculate local spatial clusters using Moran's I (LISA) and later in geographically weighted regression (GWR).
Problem: I know I can use dnearneigh{spdep} to calculate a distance matrix. However, I want to use the travel times between polygons that I have already estimated. In practice, I think this would be like inputting a dissimilarity matrix that gives the distance/difference between polygons based on another characteristic. I've tried passing my matrix to dnearneigh{spdep}, but I get the error "Error: ncol(x) == 2 is not TRUE":
dist_matrix <- dnearneigh(diss_matrix_invers, d1=0, d2=5, longlat = F, row.names=rn)
Any suggestions? There is a reproducible example below:
EDIT: Digging a bit further, I think I could use mat2listw{spdep}, but I'm still not sure it keeps the correspondence between the matrix and the polygons. If I add row.names = T it returns the error "row.names wrong length" :(
listw_dissi <- mat2listw(diss_matrix_invers)
lmoran <- localmoran(oregon.tract@data$white, listw_dissi,
zero.policy=T, alternative= "two.sided")
Reproducible example
# load data
# get centroids as a data.frame
centroids <- as.data.frame(gCentroid(oregon.tract, byid=TRUE))
# Convert row names into first column
setDT(centroids, keep.rownames = TRUE)[]
# create Origin-destination pairs
od_pairs <- expand.grid.df(centroids, centroids) %>% setDT()
colnames(od_pairs) <- c("origi_id", "long_orig", "lat_orig", "dest_id", "long_dest", "lat_dest")
# calculate dissimilarity between each pair.
# For the sake of this example, let's use ellipsoid distances. In my real case I have travel-time estimates
od_pairs[ , dist := distGeo(matrix(c(long_orig, lat_orig), ncol = 2),
matrix(c(long_dest, lat_dest), ncol = 2))]
# This is the format of my travel-time estimates. Some values are missing, namely origin-destination pairs that are too far apart (more than 2 hours)
od_pairs <- od_pairs[, .(origi_id, dest_id, dist)]
od_pairs$dist[3] <- NA
> origi_id dest_id dist
> 1: oregon_0 oregon_0 0.00000
> 2: oregon_1 oregon_0 NA
> 3: oregon_2 oregon_0 39874.63673
> 4: oregon_3 oregon_0 31259.63100
> 5: oregon_4 oregon_0 33047.84249
# Convert to matrix
diss_matrix <- acast(od_pairs, origi_id~dest_id, value.var="dist") %>% as.matrix()
# get an inverse matrix of distances, make sure diagonal=0
diss_matrix_invers <- 1/diss_matrix
diag(diss_matrix_invers) <- 0
Calculate simple distance matrix
# get row names
rn <- sapply(slot(oregon.tract, "polygons"), function(x) slot(x, "ID"))
# get centroids coordinates
coords <- coordinates(oregon.tract)
# get distance matrix
diss_matrix <- dnearneigh(coords, d1=0, d2=5, longlat =T, row.names=rn)
class(diss_matrix)
> [1] "nb"
Now, how can I use my diss_matrix_invers here?
You are right about the use of mat2listw{spdep}. By default the function preserves the row names, which keeps the correspondence between the matrix and the polygons. You can also specify row.names explicitly, like so:
listw_dissi <- mat2listw(diss_matrix_invers, row.names = row.names(diss_matrix_invers))
The list that is created will contain the appropriate names for the neighbours along with their distance as weights. You can check this by looking at the neighbours.
And you should be able to use this directly to calculate Moran's I.
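Before handing the matrix to mat2listw, it may help to make that correspondence explicit. The sketch below uses base R only, with made-up polygon IDs and travel times. Treating missing (too far) pairs as zero weight is an assumption on my part, not something spdep requires:

```r
# Hypothetical polygon IDs, in the order the polygons are stored
polygon_ids <- paste0("oregon_", 0:3)

# Hypothetical travel times in minutes; NA marks pairs that are too far apart.
# The rows and columns are deliberately given in a different order.
shuffled <- polygon_ids[c(3, 1, 4, 2)]
tt <- matrix(c( 0, NA, 25, 18,
               NA,  0, 30, 12,
               25, 30,  0, NA,
               18, 12, NA,  0),
             nrow = 4, byrow = TRUE, dimnames = list(shuffled, shuffled))

w <- 1 / tt              # inverse travel time as weight
w[is.na(w)] <- 0         # assumption: unknown pairs get no link
diag(w) <- 0             # no self-neighbours (also removes the Inf from 1/0)

# Reorder rows and columns so they match the polygon order
w <- w[polygon_ids, polygon_ids]
```

After this step `rownames(w)` can be compared directly with the polygon IDs, and the same vector can be passed as `row.names`.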
You cannot use diss_matrix within dnearneigh{spdep}, because that function takes a matrix of coordinates.
However, if you need to define a set of neighbours within a distance threshold (d1, d2) using your own distance matrix (travel time), I think this function can do the trick:
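dis.neigh <- function(x, d1 = 0, d2 = 50){
  # x must be a symmetric distance matrix
  # NOTE: the initialisation, counter and closing lines are a reconstruction,
  # and "region.id" is an assumed attribute name
  style <- "M" # for style unknown
  # create empty lists
  neighbours <- list()
  weights <- list()
  # check each row for neighbours that satisfy the distance threshold
  for(row in seq_len(nrow(x))){
    neighbour <- integer(0)
    weight <- numeric(0)
    j <- 1
    for(col in seq_len(ncol(x))){
      if(!is.na(x[row,col]) && x[row,col] > d1 && x[row,col] < d2){
        neighbour[j] <- col
        weight[j] <- 1/x[row,col] # inverse distance (dissimilarity)
        j <- j + 1
      }
    }
    neighbours[[row]] <- neighbour
    weights[[row]] <- weight
  }
  # set attributes of neighbours list
  attr(neighbours, "class") <- "nb"
  attr(neighbours, "distances") <- c(d1, d2)
  attr(neighbours, "region.id") <- colnames(x)
  # create neighbour and weight list
  res <- list(style = style, neighbours = neighbours, weights = weights)
  class(res) <- c("listw", "nb")
  attr(res, "region.id") <- attr(neighbours, "region.id")
  attr(res, "call") <- match.call()
  res
}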
And use it like so:
nb_list<-dis.neigh(diss_matrix, d1=0, d2=10000)
lmoran <- localmoran(oregon.tract@data$white, nb_list, alternative= "two.sided")

Calculating divergence between joint posterior distributions

I wish to calculate the distance between two 3-dimensional posterior distributions. The draws are stored in two 30,000 x 3 matrices.
So far I have been successful in calculating the Total Variation distance between two 2-dimensional posteriors (two 30,000 x 2 matrices) by splitting the grid into bins. However, I am having trouble calculating the divergence between posteriors with more parameters. Some examples of related distance measures can be found here.
NOTE: I do not wish to calculate the distance between the marginals (column-wise entries), but rather to obtain a single overall value from comparing the joint distributions in R.
I would really appreciate it if somebody could point out what I am missing here.
EDIT 1: Some example code for calculating Total variation distance between posterior samples stored in two matrices has been added below:
EDIT 2: This is a R question.
comparison.2D <- matrix(rnorm(40000*2,0,1),ncol=2)
ground.truth.2D <- matrix(rnorm(40000*2,0,2),ncol=2)
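# Function to calculate TVD between matrices with 2 columns:
# NOTE: the function header, bandwidth and range lines are reconstructed from the call
# at the end; using window.size as the bin width in both dimensions is an assumption
Total.Variation.Distance.2D <- function(true, comparison, burnin, window.size){
# Bandwidth for theta.1.
my_bw_x <- window.size
# Bandwidth for theta.2.
my_bw_y <- window.size
# Common range of the post-burn-in draws in each dimension
range_x <- range(c(true[-c(1:burnin),1], comparison[-c(1:burnin),1]))
range_y <- range(c(true[-c(1:burnin),2], comparison[-c(1:burnin),2]))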
xx <- seq(range_x[1],range_x[2],by=my_bw_x)
yy <- seq(range_y[1],range_y[2],by=my_bw_y)
true.pointidxs <- matrix( c( findInterval(true[-c(1:burnin),1], xx),
findInterval(true[-c(1:burnin),2], yy) ), ncol=2)
comparison.pointidxs <- matrix( c( findInterval(comparison[-c(1:burnin),1], xx),
findInterval(comparison[-c(1:burnin),2], yy) ), ncol=2)
# Count the frequencies in the corresponding cells:
square.mat.dims <- max(length(xx), length(yy))
frequencies.true <- frequencies.comparison <- matrix(0, ncol=square.mat.dims, nrow=square.mat.dims)
for (i in 1:dim(true.pointidxs)[1]){
frequencies.true[true.pointidxs[i,1], true.pointidxs[i,2]] <- frequencies.true[true.pointidxs[i,1],
true.pointidxs[i,2]] + 1
frequencies.comparison[comparison.pointidxs[i,1], comparison.pointidxs[i,2]] <- frequencies.comparison[comparison.pointidxs[i,1],
comparison.pointidxs[i,2]] + 1
}# End for
# Normalize frequencies matrix:
frequencies.true <- frequencies.true/dim(true.pointidxs)[1]
frequencies.comparison <- frequencies.comparison/dim(comparison.pointidxs)[1]
TVD <-0.5*sum(abs(frequencies.comparison-frequencies.true))
}# End function
TVD.2D <- Total.Variation.Distance.2D(true=ground.truth.2D, comparison=comparison.2D,burnin=10000,window.size=0.05)
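The binning idea extends to any number of columns if each draw is mapped to a cell label instead of a position in a 2-D frequency matrix. The following is only a sketch under that assumption: `tvd_binned` and `bin_width` are hypothetical names, the same bin width is used in every dimension, and burn-in removal is left out.

```r
# Hypothetical helper: binned total variation distance between two sample
# matrices with the same number of columns (one column per parameter)
tvd_binned <- function(a, b, bin_width) {
  stopifnot(ncol(a) == ncol(b))
  # Shared grid origin per dimension so both samples use the same cells
  lo <- pmin(apply(a, 2, min), apply(b, 2, min))
  # Map every draw to a label such as "3_0_7" (its cell index per dimension)
  cell_id <- function(m) {
    idx <- floor(sweep(m, 2, lo, "-") / bin_width)
    do.call(paste, c(as.data.frame(idx), sep = "_"))
  }
  ida <- cell_id(a)
  idb <- cell_id(b)
  cells <- union(ida, idb)
  # Relative frequency of each occupied cell in each sample
  pa <- tabulate(match(ida, cells), nbins = length(cells)) / nrow(a)
  pb <- tabulate(match(idb, cells), nbins = length(cells)) / nrow(b)
  0.5 * sum(abs(pa - pb))
}

set.seed(1)
comparison.3D <- matrix(rnorm(30000 * 3, 0, 1), ncol = 3)
ground.truth.3D <- matrix(rnorm(30000 * 3, 0, 2), ncol = 3)
TVD.3D <- tvd_binned(ground.truth.3D, comparison.3D, bin_width = 0.5)
```

Only occupied cells are stored, so memory does not grow with the full grid size. Note that with a fixed number of draws the cells become sparsely populated as the dimension grows, so the binned estimate tends to overstate the distance unless the bins are widened.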

applying the pvclust R function to a precomputed dist object

I'm using R to perform a hierarchical clustering. As a first approach I used hclust and performed the following steps:
I imported the distance matrix
I used the as.dist function to transform it into a dist object
I ran hclust on the dist object
Here's the R code:
distm <- read.csv("distMatrix.csv")
d <- as.dist(distm)
hclust(d, "ward")
At this point I would like to do something similar with the function pvclust; however, I cannot, because it is not possible to pass a precomputed dist object. How can I proceed, considering that I'm using a distance that is not among those provided by R's dist function?
I've tested Vincent's suggestion. You can do the following (my data set is a dissimilarity matrix):
# Import your data
distm <- read.csv("distMatrix.csv")
d <- as.dist(distm)
# Compute the eigenvalues
x <- cmdscale(d,1,eig=T)
# Plot the eigenvalues and choose the correct number of dimensions (eigenvalues close to 0)
plot(x$eig,
type="h", lwd=5, las=1,
xlab="Number of dimensions")
# Recover the coordinates that give the same distance matrix with the correct number of dimensions
x <- cmdscale(d,nb_dimensions)
# As mentioned by Stéphane, pvclust() clusters columns, so transpose the coordinates
pvclust(t(x))
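To decide how many dimensions to keep, it can help to count the clearly positive eigenvalues and to check how well the recovered coordinates reproduce the original distances. Here is a small base R sketch with made-up Euclidean data, where the answer is known to be 5. The tolerance `1e-8` is an arbitrary choice, and for non-Euclidean distances some eigenvalues can be negative, so the reproduction is then only approximate.

```r
set.seed(1)
pts <- matrix(rnorm(20 * 5), nrow = 20)   # 20 points in 5 dimensions
d <- dist(pts)                            # Euclidean distance matrix

fit <- cmdscale(d, k = 5, eig = TRUE)

# Number of eigenvalues that are clearly above numerical noise
n_dims <- sum(fit$eig > 1e-8 * max(fit$eig))

# Largest discrepancy between the original and the recovered distances
embedding_error <- max(abs(dist(fit$points) - d))
```

If `embedding_error` is not close to zero for your own matrix, clustering the recovered coordinates will not be equivalent to clustering the original distances.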
If the dataset is not too large, you can embed your n points in a space of dimension n-1, with the same distance matrix.
# Sample distance matrix
n <- 100
k <- 1000
d <- dist( matrix( rnorm(k*n), ncol=k ), method="manhattan" )
# Recover some coordinates that give the same distance matrix
x <- cmdscale(d, n-1)
stopifnot( sum(abs(dist(x) - d)) < 1e-6 )
# You can then use x or d interchangeably
r1 <- hclust(d)
r2 <- hclust(dist(x)) # identical to r1
r3 <- pvclust(t(x)) # pvclust() clusters columns, so pass the points as columns
If the dataset is large, you may have to check how pvclust is implemented.
It's not clear to me whether you only have a distance matrix, or whether you computed it yourself beforehand. In the former case, as already suggested by @Vincent, it would not be too difficult to tweak the R code of pvclust itself (using fix() or whatever; I provided some hints on another question on CrossValidated). In the latter case, the authors of pvclust provide an example of how to use a custom distance function, although that means you will have to install their "unofficial version".
