I need to calculate the trace of a matrix to the power of 3 and 4 and it needs to be as fast as it can get.
The matrix here is an adjacency matrix of a simple graph, therefore it is square, symmetric, its entries are always 1 or 0 and the diagonal elements are always 0.
Optimization is trivial for the trace of the matrix to the power of 2:
We only need the diagonal entries (i,i) for the trace, skip all others
As the matrix is symmetric these entries are just the entries of the i-th row squared and summed up
And as the entries are just 1 or 0 the square-operation can be skipped
Another idea I found on wikipedia was summing up all elements of the Hadamard product, i.e. entry-wise multiplication, but I don't know how to extend this method to the power of 3 and 4.
See http://en.wikipedia.org/wiki/Trace_(linear_algebra)#Properties
Maybe I'm just blind but I can't think of a simple solution.
In the end I need a C++ implementation, but I think that's not important to the question.
Thanks in advance for any help.
The trace is the sum of the eigenvalues and the eigenvalues of a matrix power are just the eigenvalues to that power.
That is, if l_1,...,l_n are the eigenvalues of your matrix then trace(M^p) = 1_1^p + l_2^p +...+l_n^p.
Depending on your matrix you may want to go with computing the eigenvalues and then summing. If your matrix has low rank (or can be well approximated with a low rank matrix) you can compute the eigenvalues very cheaply (a partial eigendecomposition has complexity O(n*k^2) where k is the rank).
Edit: You mention in the comments that it's 1600x1600 in which case finding all the eigenvalues should be no problem. Here's one of many C++ codes that you can use for this http://code.google.com/p/redsvd/
Ok, I just figured this one out myself.
The important thing I did not know was this:
If A is the adjacency matrix of the directed or undirected graph G, then the matrix An (i.e., the matrix product of n copies of A) has an interesting interpretation: the entry in row i and column j gives the number of (directed or undirected) walks of length n from vertex i to vertex j. This implies, for example, that the number of triangles in an undirected graph G is exactly the trace of A^3 divided by 6.
(Copied from http://en.wikipedia.org/wiki/Adjacency_matrix#Properties)
Retrieving the number of paths of a given length from node i to i for all n nodes can essentially be done in O(n) when dealing with sparse graphs and using adjacency lists instead of matrices.
Nevertheless, thanks for your answers!
Related
I am not a mathematician, so I need to understand what SVD does and WHY more than how it works exactly from the math perspective. (I understand at least what is the decomposition though).
This guy on youtube gave the only human explanation of SVD saying, that the U matrix maps "user to concept correlation" Sigma matrix defines the strength of each concept, and V maps "movie to concept correlation" given that initial matrix M has users in the rows, and movie (ratings) in the columns.
He also mentioned two concept specifically "sci fi" and "romance" movies. See the picture below.
My questions are:
How SVD knows the number of concepts. He as human mentioned two - sci fi, and romance, but in reality in resulting matrices are 3 concepts. (for example matrix U - that one with blue titles - has 3 columns not 2).
How SVD knows what is the concept after all. I mean, what If i shuffle the columns randomly how SVD then knows what is sci fi, what is romance. I mean, I suppose there is no rule, group the concepts together in the column order. What if scifi movie is the first and last one? and not first 3 columns in the initial matrix M?
What is the practical usage of either U, Sigma or V matrices? (Except that you can multiply them to get the initial matrix M)
Is there also any other possible human explanation of SVD than the guy up provided, or it is the only one possible function? Matrices of correlations.
As was pointed out in the comments you may well get better explanations elsewhere. However since the question is still open, here is my tuppence worth.
Throughout I'll suppose that A is mxn where m>=n, ie that A has more rows than columns.
First of all there are many forms of the SVD, differing in the sizes of the matrices. They all share the fundamental properties that
A = U*S*V'
S is diagonal
U and V have orthogonal columns (ie U'*U = I, V'*V = I)
Perhaps the most useful from a theoretical point of view is the 'full fat' svd where we have that U is mxm, S is mxn and V is nxn. However this has rather a lot of elements that don't really contribute to A. For example S being diagonal we can write
S = ( S1 ) (where S1 is nxn )
( 0 )
If we divide up U into
U = ( U1 U2) (where U1 is mxn and U2 is (mx(m-n)))
Then its straightforward to calculate that
U*S = U1*S1
and so we can throw away the last m-n columns of U and the last m-n rows of S, and still recover A.
Moreover some of the diagonal elements of S1 may be 0; suppose in fact that p<n of them are non zero. Then we can write
S1 = ( S2 0)
( 0 0)
And arguing as above for U and analogously for V' we can in fact throw away all but the first p columns of U and all of S but S2, and all but the first p rows of V, and still recover A.
This latter is the form of SVD ('thin') in your question:
U is mxp
S is pxp
V' is pxm
where p is the number of non-zero singular values of A. This is my answer to your 1.
By convention the elements of S decrease as you move down the diagonal. To achieve this the routine that calculates the svd in effect works with a version of A with shuffled columns. This shuffling is undone by incorporating the shuffle in the U and V' output. This is my answer to your 2: however you shuffle A, it will be in effect shuffled again to ensure that the singular values decrease down the diagonal.
I struggle to answer 3, because I suspect that our ideas of 'practical' are rather different.
One thing that I think practical is to find simpler approximations to A. The reconstruction of A can be written
A = Sum{ 1<=i<=p | U[i]*S[i]*V[i]' }
where the S[i] are the diagonal elements of S, U[i] are the columns of U and V[i] those of V
We might want to use a simpler model for A, for example want to simplify it down to just one term. That is, we might wonder how much we would lose by using fewer 'concepts'. The 'thin' svd above has already done this in the sense that it has thrown away all the coluns that make no contribution to A. In an extreme case, we might wonder what we would get if we reduced to just one concept. This approximation is found by taking just the first term of the sum above. This extends to however many terms -- q say -- we want to allow: we just take the first q terms of the sum above.
I'm sorry, I can't answer 4.
first of all I'm sorry but I don't know how to write expressions, so I used images to explain my problem.
I have to find the following parameters: alpha and beta in order that the following bound holds:
where the matrix A is the following one
According to the definition of norm of a matrix, I should find the max eigenvalue of this matrix A.
First of all, I have computed:
Then looking at the solution, I have seen that, the lower 2 × 2 block on the diagonal of this ATA matrix has been called B and the maximum eigenvalue of AT A is:
Question 1: Is the maximum eigenvalue of a matrix always computed in this way? (Cosidering the trace). I usually compute the det(lambda*I-ATA) and then I compute the max eigenvalue
Then in the solution, there was written: Since we are looking for a bound on the norm of A(q), we can write the chain of inequalities
Question 2: Is the bound on the norm of a matrix, always less than the trace of the same matrix?
Question 3: I didn't understand from where the highlighted expressions come up. Could anyone help me?
The final result is the following (but I don't care, I would like to understand the steps):
(1) I have n points in 3D space
(2) I have a random vector
(3) I project all n points into the vector
Then I find the average distance between all points
How could I find the vector in which after projecting the points into it, the average distance between points is the greatest?
Can this be done in O(n)?
There is one method which you can use from machine learning, specifically dimensionality reduction. (This is based on PCA which was mentioned in one of the comments.)
Compute the covariance matrix.
Find the eigenvalues and the eigenvectors.
The eigenvector with the largest eigenvalue will correspond to the direction of the most variance, so the direction in which the points are most spread out.
Map the points onto the line defined by the vector.
Centring the points around 0 before the projection, and then moving them back afterwards may be needed as well. The issue with this, is that it is quite expensive in terms of time. For more details looks at this question: How is the complexity of PCA O(min(p^3,n^3))?
I'm having trouble understanding the time complexity of the solution to a problem.
Let X, Y and Z be n × n matrices. Suppose we want to verify whether XY = Z. What is the complexity of solving the problem directly by computing XY?
The correct answer is O(n3), but I don't understand why. Why is this the case?
The standard algorithm for computing the product of two n × n matrices is to use the fact that the entry at position (i, j) in the product is the inner product of the ith row of the first matrix and the jth column of the second matrix. Computing the inner product of this row and this column takes time Θ(n), because there are n entries that need to be pairwise multiplied and summed together. Therefore, each entry of the resulting matrix takes time Θ(n). Since there are n2 total entries in the matrices, the total time complexity using the naive algorithm is Θ(n3).
There are faster matrix multiplication approaches than the one described here that use more sophisticated algorithms. You might want to look up the Coppersmith-Winograd algorithm or the Strassen algorithm, which are asymptotically faster than the naive algorithm.
However, there are better randomized algorithms for checking whether the product is correct. Check out Frievalds' algorithm for an O(n2)-time randomized algorithm that with high probability can detect if the multiplication is correct.
Hope this helps!
Suppose there are three sequences to be compared: a, b, and c. Traditionally, the resulting 3-by-3 pairwise distance matrix is symmetric, indicating that the distance from a to b is equal to the distance from b to a.
I am wondering if TraMineR provides some way to produce an asymmetric pairwise distance matrix.
No, TraMineR does not produce 'assymetric' dissimilaries precisely for the reasons stressed in Pat's comment.
The main interest of computing pairwise dissimilarities between sequences is that once we have such dissimilarities we can for instance
measure the discrepancy among sequences, determine neighborhoods, find medoids, ...
run cluster algorithms, self-organizing maps, MDS, ...
make ANOVA-like analysis of the sequences
grow regression trees for the sequences
Inputting a non symmetric dissimilarity matrix in those processes would most probably generate irrelevant outcomes.
It is because of this symmetry requirement that the substitution costs used for computing Optimal Matching distances MUST be symmetrical. It is important to not interpret substitution costs as the cost of switching from one state to the other, but to understand them for what they are, i.e., edit costs. When comparing two sequences, for example
aabcc and aadcc, we can make them equal either by replacing arbitrarily b with d in the first one or d with b in the second one. It would then not make sense not giving the same cost for the two substitutions.
Hope this helps.