I am trying to do hierarchical clustering on a dataset whose columns are ordinal on a scale of 1 to 5.
Hierarchical clustering can be done with the hclust() function. For ordinal data, the "maximum" (Chebyshev) distance is a recommended choice.
But which linkage should I use with Chebyshev distance? Several linkage methods assume squared Euclidean distances - in particular Ward, centroid and median linkage.
The available linkages are: ward.D, ward.D2, single, complete, average, centroid and median.
So which linkage should I use with Chebyshev distance to do hierarchical clustering of ordinal data?
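For what it's worth, the linkages that assume squared Euclidean input (ward.D, ward.D2, centroid, median) are indeed not appropriate here, while single, complete, and average linkage only require a dissimilarity matrix, so any of them can be combined with Chebyshev distance. A minimal sketch with made-up ordinal data:

```r
# Hierarchical clustering of ordinal (1-5) items with Chebyshev
# ("maximum") distance and average linkage. The data are made up.
set.seed(1)
x <- matrix(sample(1:5, 60, replace = TRUE), nrow = 12)  # 12 cases, 5 items
d  <- dist(x, method = "maximum")    # Chebyshev distance
hc <- hclust(d, method = "average")  # "single" and "complete" also work
plot(hc)                             # dendrogram
```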
I'd like to ask if there is an R package that can perform multiple correspondence analysis (MCA) with both row weights and a varimax-based rotation. I have only found FactoMineR, which accepts row weights but does no rotation, and PCAmixdata, which does rotation but only accepts column weights.
Thanks!
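One possible workaround (a sketch, not a full substitute for a dedicated package, and it assumes that rotating the MCA coordinates with stats::varimax() afterwards is acceptable for your analysis): run the weighted MCA in FactoMineR, then rotate the retained dimensions yourself. The data here are made up for illustration:

```r
# Sketch: row-weighted MCA via FactoMineR, then a varimax rotation of
# the retained column coordinates with base R's varimax().
have_fm <- requireNamespace("FactoMineR", quietly = TRUE)
if (have_fm) {
  set.seed(7)
  dat <- data.frame(replicate(4, factor(sample(letters[1:3], 40, TRUE)),
                              simplify = FALSE))
  names(dat) <- paste0("v", 1:4)
  w <- runif(40)                                     # row weights
  m <- FactoMineR::MCA(dat, row.w = w, ncp = 2, graph = FALSE)
  rot <- varimax(m$var$coord[, 1:2])                 # rotate category coords
  print(rot$loadings)
}
```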
The core of Principal Component Analysis (PCA) lies in calculating eigenvalues and eigenvectors of the variance-covariance matrix of a dataset (for example, a matrix of multivariate data from a set of individuals). The textbook knowledge I have is that:
a) by multiplying those eigenvectors with the (centered) original data matrix one obtains "scores" (as many as there are original variables) that are uncorrelated with each other;
b) the eigenvalues give the amount of variance of each score.
These two properties make this process a very effective data-transformation technique for simplifying the analysis of multivariate data.
My question is: why is that so? Why does calculating eigenvalues and eigenvectors of a variance-covariance matrix yield scores with these unique properties?
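Both properties follow from the spectral decomposition of the symmetric covariance matrix, Sigma = V Lambda V': projecting the centered data onto the orthonormal eigenvectors V gives scores whose covariance matrix is V' Sigma V = Lambda, which is diagonal (hence the scores are uncorrelated) with the eigenvalues on the diagonal (hence they are the score variances). A quick numerical check, with made-up data:

```r
# Numerical check: PCA scores are uncorrelated and their variances
# equal the eigenvalues of the covariance matrix.
set.seed(42)
X  <- matrix(rnorm(200), ncol = 4)            # 50 observations, 4 variables
Xc <- scale(X, center = TRUE, scale = FALSE)  # center the data
e  <- eigen(cov(Xc))                          # eigendecomposition of covariance
scores <- Xc %*% e$vectors                    # project onto eigenvectors
# Off-diagonal covariances of the scores are numerically zero, and the
# diagonal reproduces the eigenvalues:
print(round(cov(scores), 10))
print(all.equal(diag(cov(scores)), e$values))   # TRUE
```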
I created a phylogenetic NJ tree in R using the ape package. My data contain metric measurements from multiple individuals belonging to known groups, so I decided to calculate the Mahalanobis distances between these groups in order to incorporate the covariance structure in my analyses. Creating the NJ tree itself was not a problem:
require(ape)
require(MASS)  # for lda()

fit <- lda(y, grouping = as.factor(ynames))  # y: measurements, ynames: group labels
d <- dist(predict(fit, fit$means)$x, upper = TRUE, diag = TRUE)
plot(nj(d))
However, now I'd like to compute bootstrap values for the branch splits. I would use the boot.phylo function, but I have no idea how to specify its FUN argument, i.e. how to correctly recompute the Mahalanobis distances for each bootstrapped data set.
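boot.phylo() resamples the columns (variables) of the data matrix and calls FUN on each resampled copy, so one approach is to wrap the whole pipeline (LDA, distances between group means, nj) into a single function of the data matrix. A sketch, using iris as a stand-in for y and ynames (assumptions: your ynames has at least three groups, and resampling variables is an acceptable bootstrap unit for your data; LDA may warn about collinear variables on resampled matrices):

```r
require(MASS)  # for lda()

# Build an NJ tree of Mahalanobis-type distances between group means,
# from a raw individuals-by-variables matrix.
make_tree <- function(mat, groups) {
  fit <- lda(mat, grouping = groups)
  d <- dist(predict(fit, fit$means)$x, upper = TRUE, diag = TRUE)
  ape::nj(d)
}

have_ape <- requireNamespace("ape", quietly = TRUE)
if (have_ape) {
  y <- as.matrix(iris[, 1:4])   # stand-in for your measurement matrix
  ynames <- iris$Species        # stand-in for your group labels
  tr <- make_tree(y, as.factor(ynames))
  # boot.phylo resamples columns of y and calls FUN on each replicate:
  bp <- ape::boot.phylo(tr, y, function(m) make_tree(m, as.factor(ynames)),
                        B = 100)
  print(bp)   # bootstrap counts for the internal branches
}
```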
I am trying to find the centroid and covariance matrix that mvoutlier uses to calculate its Mahalanobis distances. When I calculate the Mahalanobis distances myself, they do not line up with the results from mvoutlier, and they also vary with the value of alpha passed to mvoutlier. Is there a function in the mvoutlier package to extract these two quantities, or is there another way?
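As far as I can tell, mvoutlier's distances are based on the robust MCD estimator (covMcd from robustbase), whose alpha argument sets the fraction of observations used, which would explain why the distances change with alpha (and mvoutlier may additionally use the reweighted MCD estimates, another possible source of mismatch). A hedged sketch of reproducing the robust centroid, covariance, and distances directly, with illustrative data:

```r
# Robust centroid/covariance via MCD, then robust Mahalanobis distances,
# as a way to reproduce what mvoutlier appears to compute internally.
have_rb <- requireNamespace("robustbase", quietly = TRUE)
if (have_rb) {
  X <- as.matrix(iris[, 1:4])                 # illustrative data
  mcd <- robustbase::covMcd(X, alpha = 0.75)  # alpha = subset fraction
  ctr <- mcd$center                           # robust "centroid"
  cv  <- mcd$cov                              # robust covariance matrix
  md2 <- mahalanobis(X, ctr, cv)              # squared robust distances
  print(head(md2))
}
```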
I am trying to train my data with the nearest shrunken centroid classifier using the pamr.train() function from the pamr package in R. However, in addition to the training data I also have a vector of sample weights. Is there any way to make this function take these sample weights into account?
Or is there a way to obtain the source code of this function? If so, I could write the code for weighted means and weighted variances instead of the unweighted ones.
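Getting the source is straightforward, since pamr is written in plain R: typing a function's name prints its body, and the full package source is downloadable from the package's CRAN page ("Package source" link), which you can edit and re-install. A quick sketch:

```r
# Inspecting pamr's source code from within R.
have_pamr <- requireNamespace("pamr", quietly = TRUE)
if (have_pamr) {
  print(pamr::pamr.train)          # prints the function's R source
  print(ls(getNamespace("pamr")))  # lists the internal helpers too
}
```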
Thank you,