I have several questions:
1. What's the difference between isoMDS and cmdscale?
2. May I use an asymmetric matrix?
3. Is there any way to determine optimal number of dimensions (in result)?
One of the MDS methods is distance scaling, which is divided into metric and non-metric variants. The other is classical scaling (also called distance geometry by those in bioinformatics). Classical scaling can be carried out in R with the command cmdscale. Kruskal's method of non-metric distance scaling (using the stress function and isotonic regression) can be carried out with the command isoMDS in the MASS library.
The standard treatment of classical scaling yields an eigendecomposition problem and as such is the same as PCA if the goal is dimensionality reduction. The distance scaling methods, on the other hand, use iterative procedures to arrive at a solution.
If you refer to the distance structure, I guess you should pass an object of class dist, which holds the distance information, or a (symmetric) matrix of distances, or an object that can be coerced to such a matrix using as.matrix(). (According to the help page, only the lower triangle of the matrix is used; the rest is ignored.)
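For concreteness, a minimal sketch of both calls using the built-in eurodist distances (any dist object, or a matrix coercible to one, would do):

library(MASS)                              # provides isoMDS
d <- eurodist                              # an object of class "dist"
fit_cmd <- cmdscale(d, k = 2, eig = TRUE)  # classical (metric) scaling
fit_iso <- isoMDS(d, k = 2)                # Kruskal's non-metric scaling, reports the stress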
(For the classical scaling method): One way of determining the dimensionality of the resulting configuration is to look at the eigenvalues of the doubly centered symmetric matrix B (= HAH). The usual strategy is to plot the ordered eigenvalues (or some function of them) against dimension and then identify a dimension at which the eigenvalues become “stable” (i.e., do not change perceptibly). At that dimension, we may observe an “elbow” that shows where stability occurs (for points in an n-dimensional space, stability in the plot should occur at dimension n+1). For easier graphical interpretation of a classical scaling solution, we usually choose n to be small, on the order of 2 or 3.
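A sketch of that scree-type plot, again on the eurodist example: plot the ordered eigenvalues returned by cmdscale() and look for the elbow.

eig <- cmdscale(eurodist, k = 2, eig = TRUE)$eig   # all eigenvalues of B, in decreasing order
plot(eig, type = "b", xlab = "Dimension", ylab = "Eigenvalue")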
I'm working on a standard problem: a random walk in two dimensions.
In particular, I'm trying to generate a migration matrix for a toroidal model (Toroidal Transition Matrix) in R.
I have previously posted the same problem here: Transition matrix for a two-dimensional random walk in a torus: compute matrix in R
However, I did not get any feedback.
As I mentioned in that post, I decided to assume independent movement along each dimension. Therefore, instead of calculating a toroidal migration matrix and retrieving the transition probabilities from it, I multiplied independent probabilities from two separate one-dimensional circular models.
I'm not sure whether this is formally correct and I would like to have some opinion on this regard.
Yes, to walk on the torus with each dimension independent, you simply multiply the transition probabilities. If there are n states in your circular graph, then there are n^2 states in your torus, so you should expect an n^2 x n^2 transition matrix.
To give any more detail you'll have to specify exactly what you need to know.
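As an illustration only (assuming a simple symmetric walk on a circle of n states; P1 and P_torus are made-up names), the torus matrix can be built as the Kronecker product of the two one-dimensional circular matrices:

n <- 4                                   # states per dimension (illustrative)
P1 <- matrix(0, n, n)
for (i in 1:n) {
  P1[i, (i %% n) + 1]       <- 0.5       # step one way around the circle
  P1[i, ((i - 2) %% n) + 1] <- 0.5       # step the other way
}
P_torus <- kronecker(P1, P1)             # n^2 x n^2 matrix; rows still sum to 1
dim(P_torus)                             # 16 16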
I have created a positive definite matrix from a Wishart distribution in Julia using the Distributions package. I want to use it to generate random multivariate normals with the specified precision, so I use the canonical form of MvNormal, which is MvNormalCanon.
However, I am a bit confused: although the matrix generated from the Wishart is positive definite, Julia reports that its inverse is not. This sometimes causes trouble when generating from the multivariate normal with that precision.
For example:
using Distributions
using LinearAlgebra
X = rand(Wishart(10, Matrix(1.0I, 10, 10)))   # identity scale matrix (eye(10) in older Julia)
isposdef(X)        # true
isposdef(inv(X))   # false
I also use MvNormalCanon to generate random vectors, as below:
rand(MvNormalCanon(X*μ, X))
where μ is my mean vector. But the call above throws a Base.LinAlg.PosDefException(1).
Should the inverse also be positive definite, and if yes why does Julia act like this?
P.S. It might be that adding a tiny amount to the scale matrix in the Wishart would resolve the problem.
I am using the function kmeans to perform k-means clustering.
I have special data which needs a custom distance measure function and a custom mean function.
Can I put (1) a custom distance measure function and (2) custom mean function to the kmeans function?
It seems it uses the Euclidean measure only.
The standard kmeans does not allow this, for good reason. It uses a clever algorithm (Hartigan and Wong), which is why it is much faster than the textbook Lloyd algorithm you find in about 100 other R packages. But that algorithm only works for the classic k-means scenario with squared deviations (i.e., assigning each point to the nearest center by Euclidean distance, while actually optimizing least squares, not Euclidean distances).
I doubt you can simply plug other distances and centroid functions into the Hartigan and Wong method (apart from it being written in Fortran, so you cannot just plug in an R function there anyway).
Beware that there are only a few combinations of distance and mean for which this kind of procedure is known to converge reliably. Bregman divergences are fine, and cosine distance is equivalent to squared Euclidean distance on the unit sphere, so it will also work.
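If the plain Lloyd iteration is good enough for you (it is much slower than Hartigan-Wong), a generic version with pluggable distance and mean functions is easy to write. A rough sketch, where custom_kmeans, dist_fn and mean_fn are made-up names rather than any real kmeans API, and empty clusters are not handled:

custom_kmeans <- function(X, k, dist_fn, mean_fn, iter_max = 20) {
  centers <- X[sample(nrow(X), k), , drop = FALSE]   # random initial centers
  for (it in 1:iter_max) {
    # assignment step: nearest center under dist_fn
    D <- sapply(1:k, function(j) apply(X, 1, function(x) dist_fn(x, centers[j, ])))
    cl <- max.col(-D)
    # update step: recompute each center with mean_fn
    new_centers <- t(sapply(1:k, function(j) mean_fn(X[cl == j, , drop = FALSE])))
    if (max(abs(new_centers - centers)) < 1e-8) break
    centers <- new_centers
  }
  list(cluster = cl, centers = centers)
}

# the classic case: squared deviations and the coordinate-wise mean
res <- custom_kmeans(as.matrix(iris[, 1:4]), 3,
                     dist_fn = function(x, ctr) sum((x - ctr)^2),
                     mean_fn = colMeans)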
I'm wondering how one chooses a specific k in the Shi-Malik algorithm.
Do we choose several values of k and rank them via their SSE measures?
Does k reflect the number of clusters we assume for the data?
kind regards Mikey
Yes, k is the number of natural groupings we believe there are in the data.
You can find K by exploring the eigenvalues.
One tool which is designed particularly for spectral clustering is the eigengap heuristic (also called the spectral gap): the number of clusters k is usually given by the value of k that maximizes the eigengap (the difference between consecutive eigenvalues). That is, choose the number k such that the eigenvalues λ1, …, λk are all very small, but λk+1 is relatively large.
The larger this eigengap is, the closer the eigenvectors are to those of the ideal case, and hence the better spectral clustering works. If you're interested in the justification for this procedure, it is based on perturbation theory and spectral graph theory.
You can read more here: A Tutorial on Spectral Clustering - Ulrike von Luxburg
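A rough R sketch of the eigengap heuristic on toy data (two Gaussian blobs; the Gaussian affinity and its bandwidth are arbitrary choices):

set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),   # 20 points around (0, 0)
           matrix(rnorm(40, mean = 5), ncol = 2))   # 20 points around (5, 5)
W <- exp(-as.matrix(dist(X))^2 / 2)                 # Gaussian affinity matrix
d_inv_sqrt <- diag(1 / sqrt(rowSums(W)))
L_sym <- diag(nrow(W)) - d_inv_sqrt %*% W %*% d_inv_sqrt   # normalized Laplacian
ev <- sort(eigen(L_sym, symmetric = TRUE)$values)   # eigenvalues in increasing order
which.max(diff(ev[1:10]))                           # largest gap suggests k (2 for this toy data)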
Another way to explore the natural grouping is through the number of connected components and the spectrum of the Laplacian matrix: the number of times 0 appears as an eigenvalue of the Laplacian is the number of connected components in the graph. Your affinity matrix can be viewed as a graph, so try to see how many connected components the graph has. That will give you a sense of the natural structure of your data.
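A tiny self-contained sketch of that idea on a hand-built graph with two obvious components (nodes 1-3 and 4-5):

A <- matrix(0, 5, 5)
A[1, 2] <- A[2, 3] <- A[4, 5] <- 1
A <- A + t(A)                                 # symmetric adjacency (affinity) matrix
L_graph <- diag(rowSums(A)) - A               # unnormalized graph Laplacian
sum(abs(eigen(L_graph, symmetric = TRUE)$values) < 1e-8)   # 2 zero eigenvalues = 2 components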
In addition, as you mentioned, we can set a validation criterion (for example, SSE) and look at its value under different values of k. That's fine once you have labeled data (which is not always the case in clustering) and you know that this criterion/quality measure is really meaningful.
First of all, can someone explain what vector quantization is, its purpose, and what it does? Secondly, an explanation of how k-means is used to do this would be appreciated as well.
For the record, I don't know if this will make a difference in the explanation, but I'm trying to learn about vector quantization in the context of boundary descriptors. If I calculated a number of boundary descriptors for a particular segment in an image and I wanted to vector quantize them using k-means, what would this mean, what would it do, why would I want to do it, and how would I do it?
Vector quantization is the process of discretizing a random variable valued in some vector space. The result is the projection of that random variable onto a finite set of knots. It is used for signal transmission, quadrature, variance reduction and a lot of other applications.
Optimal quantization consists in choosing the knots in such a way as to minimize the mean L^p discretization error.
K-means, also called the Lloyd algorithm, consists in starting from an arbitrary set of knots (a codebook) and iteratively replacing each one by the L^p-median (or simply by the mean, for quadratic quantization) of the probability distribution restricted to the Voronoi cell of that knot. An interactive animation is available here.
The historical reference on the Lloyd algorithm is the following:
Stuart P. Lloyd, "Least squares quantization in PCM", IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
K-means always decreases the quantization error, but it does not always converge to the globally optimal quantizer. However, in the case of one-dimensional log-concave distributions, the algorithm converges to the unique global minimum.
The optimal quantization web site contains an extensive bibliography on the matter of vector quantization and functional quantization.
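To tie this back to the boundary-descriptor question, here is a hedged R sketch of vector quantization with kmeans; descr is made-up data standing in for a matrix of boundary-descriptor vectors, and 16 is an arbitrary codebook size:

set.seed(42)
descr <- matrix(rnorm(200 * 8), ncol = 8)        # 200 descriptors with 8 features (toy data)
vq <- kmeans(descr, centers = 16, nstart = 10)   # learn a 16-knot codebook
codebook <- vq$centers
quantized <- codebook[vq$cluster, ]              # each descriptor replaced by its nearest knot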