I want to do agglomerative clustering in R, but I want to use my own distance instead of the predefined ones. How can I supply my own distance to the function hclust in R?
I used the R package corrplot to visualize the correlation matrix of my data. I also clustered the variables using the built-in order="hclust" option.
The invocation of the command was like this (plus various arrangements of titles, axes etc):
corrplot(Rbas,type="upper",order="hclust",method="ellipse")
Now I am doing further analysis and visualization with other packages, and the question of whether the results are compatible has come up. In particular, I have to repeat the clustering of the correlation matrix manually. The corrplot documentation leaves one point unclear: which dissimilarity measure does corrplot use behind its defaults? Is it 1-|corr|, sqrt(1-corr^2), or something else? The literature offers multiple choices, for example, as described in this article
Update, to answer my own question: I made an educated guess and tried the dissimilarity measure 1-corr. That is, I coded (Rbas is the correlation matrix):
dissim1<-1-Rbas
dist1<-as.dist(dissim1)
plot(hclust(dist1))
and recovered an ordering of the variables that coincides with the one produced by corrplot with order="hclust" and default settings. But it is not clear whether this is really the mechanism corrplot uses, or whether the match will hold for any other matrix.
The function used by corrplot to reorder variables is corrMatOrder (try ?corrMatOrder). It returns a single permutation vector.
When order = "hclust" is selected in corrplot, corrMatOrder invokes the corrplot:::reorder_using_hclust function:
function (corr, hclust.method)
{
    hc <- hclust(as.dist(1 - corr), method = hclust.method)
    order.dendrogram(as.dendrogram(hc))
}
So the function uses 1-corr as the dissimilarity measure, for any correlation matrix passed to it.
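To reproduce this ordering outside corrplot, the same two calls can be run directly in base R. The following is only a sketch: reorder_sketch is a hypothetical name, mtcars is used purely as stand-in data, and "complete" is assumed as the linkage because it is the default of both hclust and corrplot's hclust.method argument.

```r
# Stand-in correlation matrix (assumption: any square correlation matrix works).
corr <- cor(mtcars)

# Hypothetical helper mirroring the two calls shown above.
reorder_sketch <- function(corr, hclust.method = "complete") {
  # 1 - corr maps correlation +1 to distance 0 and correlation -1 to distance 2.
  hc <- hclust(as.dist(1 - corr), method = hclust.method)
  # Leaf order of the dendrogram, as a permutation of the column indices.
  order.dendrogram(as.dendrogram(hc))
}

ord <- reorder_sketch(corr)

# Variable names in clustered order, and the reordered matrix.
colnames(corr)[ord]
corr_reordered <- corr[ord, ord]
```

Note that with 1-corr, strongly negatively correlated variables end up far apart, which would not be the case with 1-|corr|.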
I am trying to run the function pvclust, but using Simpson dissimilarities instead of one of the default distances. Can I pass a distance function to pvclust (method.dist)? I already have my Simpson dissimilarity index as a dist object from the package betapart, and that is the one I want to use in pvclust.
Thanks!
I would like to apply k-NN to a learning set to predict the class of the data, using the Euclidean distance.
I am having some difficulty implementing this method.
You can try the knn() function in the class package. It uses Euclidean distance.
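A minimal sketch of such a call, assuming the built-in iris data as a stand-in for the learning set (the split sizes and k = 3 are arbitrary choices for illustration):

```r
library(class)  # recommended package shipped with standard R installations

set.seed(1)
# Assumed split: 100 rows for training, the remaining 50 for testing.
idx <- sample(nrow(iris), 100)
train <- iris[idx, 1:4]
test <- iris[-idx, 1:4]
train_labels <- iris$Species[idx]

# Classify each test row by majority vote among its 3 nearest
# training rows (Euclidean distance).
pred <- knn(train, test, cl = train_labels, k = 3)

# Confusion table of predicted versus true classes.
table(predicted = pred, actual = iris$Species[-idx])
```

Because the distance is Euclidean, variables on very different scales may need to be standardized (for example with scale()) before calling knn().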
You can check its documentation for more detail:
https://stat.ethz.ch/R-manual/R-devel/library/class/html/knn.html
I am using the function kmeans to perform K-means clustering.
I have special data that need a custom distance measure and a custom mean function.
Can I pass (1) a custom distance measure function and (2) a custom mean function to the kmeans function?
It seems to use the Euclidean measure only.
The standard kmeans does not allow this, for good reasons. It uses a clever algorithm (Hartigan and Wong, which is why it is much faster than the standard Lloyd textbook algorithm you find in about 100 other R packages). But this only works for the classic k-means scenario with squared deviations (which means assigning each point to the Euclidean nearest center, although what it actually optimizes is least squares, not Euclidean distances).
I doubt you can simply plug other distances and centroid functions into the Hartigan and Wong method (apart from it being written in Fortran, so you cannot just plug in an R function there anyway).
Beware that there are very few combinations of distance and mean that are known to always converge well. Bregman divergences should be fine, and cosine distance on vectors normalized to unit length is equivalent to squared Euclidean distance on a sphere, so it will also work.
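For illustration only, a plain Lloyd-style iteration with pluggable functions can be sketched in a few lines of R. This is not what kmeans does internally; custom_kmeans, dist_fun and center_fun are hypothetical names, and the example pairs Manhattan distance with the column-wise median (k-medians), one of the combinations for which the center update matches the distance.

```r
# Sketch of Lloyd-style clustering with a user-supplied distance and center.
custom_kmeans <- function(x, k, dist_fun, center_fun, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k randomly chosen rows.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # n x k matrix: distance from every point (row) to every center (column).
    d <- sapply(seq_len(k), function(j) apply(x, 1, dist_fun, centers[j, ]))
    # Index of the nearest center for each point (arg-min per row).
    new_assignment <- max.col(-d, ties.method = "first")
    if (all(new_assignment == assignment)) break  # assignments are stable
    assignment <- new_assignment
    # Recompute each non-empty cluster's center with the custom function.
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- center_fun(members)
    }
  }
  list(cluster = assignment, centers = centers)
}

# Example: k-medians on the numeric columns of iris.
manhattan <- function(a, b) sum(abs(a - b))
col_median <- function(m) apply(m, 2, median)

set.seed(42)
fit <- custom_kmeans(iris[, 1:4], k = 3,
                     dist_fun = manhattan, center_fun = col_median)
table(fit$cluster)
```

With an arbitrary distance and center function the loop may fail to settle, which is why max_iter caps the number of passes.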
I know very little about R, but I need to convert the dendrogram resulting from hierarchical clustering in MATLAB into an R dendrogram structure. The following table shows the dendrogram produced by the MATLAB hierarchical clustering function, where the first and second columns are the IDs of the objects or branches, and the third column is the distance.
Is there a way to map this table (or the MATLAB dendrogram) into an R dendrogram?
I think the easiest way for you to get a dendrogram in R is to use an intermediate result from your MATLAB analysis instead of the final table.
Assuming that you have a dissimilarity matrix called Diss_Mat (which your MATLAB analysis must compute at some point), you could do the following:
DIST_Mat <- as.dist(Diss_Mat)  # create a dist object
dendro <- as.dendrogram(hclust(DIST_Mat))
The second line performs the hierarchical clustering in R and then converts the result into a dendrogram object.
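A small self-contained sketch of these two steps, using a made-up 4 x 4 dissimilarity matrix in place of the one exported from MATLAB. One assumption to check: hclust defaults to complete linkage, whereas MATLAB's linkage function defaults to single linkage, so the method argument should be set to whatever was used on the MATLAB side for the two trees to agree.

```r
# Made-up symmetric dissimilarity matrix standing in for the MATLAB export.
Diss_Mat <- matrix(c( 0, 2, 6, 10,
                      2, 0, 5,  9,
                      6, 5, 0,  4,
                     10, 9, 4,  0),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(LETTERS[1:4], LETTERS[1:4]))

DIST_Mat <- as.dist(Diss_Mat)              # create a dist object
hc <- hclust(DIST_Mat, method = "single")  # assumed to match the MATLAB linkage
dendro <- as.dendrogram(hc)

# Merge heights: A-B at 2, C-D at 4, then the two pairs at 5.
hc$height
plot(dendro)
```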