Matrix multiplication complexity - math

I'm having trouble understanding the time complexity of the solution to a problem.
Let X, Y and Z be n × n matrices. Suppose we want to verify whether XY = Z. What is the complexity of solving the problem directly by computing XY?
The correct answer is O(n^3), but I don't understand why. Why is this the case?

The standard algorithm for computing the product of two n × n matrices uses the fact that the entry at position (i, j) in the product is the inner product of the i-th row of the first matrix and the j-th column of the second matrix. Computing that inner product takes time Θ(n), because there are n pairs of entries that need to be multiplied and summed together. Therefore, each entry of the resulting matrix takes time Θ(n). Since there are n^2 entries in the resulting matrix, the total time complexity of the naive algorithm is Θ(n^2) · Θ(n) = Θ(n^3).
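To make the counting concrete, here is a sketch of that triple loop in C++ (matrices stored as flat row-major arrays):

    #include <vector>

    // Computes Z = X * Y for n x n matrices stored row-major in flat vectors.
    std::vector<double> multiply(const std::vector<double>& X,
                                 const std::vector<double>& Y, int n)
    {
        std::vector<double> Z(n * n, 0.0);
        for (int i = 0; i < n; ++i)             // n choices of row
            for (int j = 0; j < n; ++j)         // n choices of column -> n^2 entries
                for (int k = 0; k < n; ++k)     // Theta(n) work per entry
                    Z[i * n + j] += X[i * n + k] * Y[k * n + j];
        return Z;                                // total: Theta(n^2) * Theta(n) = Theta(n^3)
    }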
There are matrix multiplication algorithms that are asymptotically faster than the naive one described here. You might want to look up Strassen's algorithm or the Coppersmith-Winograd algorithm.
However, if all you want is to check whether the product is correct, there are better randomized algorithms. Check out Freivalds' algorithm, an O(n^2)-time randomized algorithm that detects an incorrect product with high probability.
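For illustration, a rough sketch of one round of Freivalds' check (assuming exact arithmetic, e.g. integer entries; repeating the round k times pushes the failure probability below 2^-k):

    #include <random>
    #include <vector>

    // One round of Freivalds' check: pick a random 0/1 vector r and test whether
    // X*(Y*r) == Z*r, which costs only three matrix-vector products, i.e. O(n^2).
    // If XY == Z this always returns true; if XY != Z it returns false with
    // probability at least 1/2.
    bool freivalds_round(const std::vector<double>& X,
                         const std::vector<double>& Y,
                         const std::vector<double>& Z, int n, std::mt19937& gen)
    {
        std::bernoulli_distribution coin(0.5);
        std::vector<double> r(n), Yr(n, 0.0), XYr(n, 0.0), Zr(n, 0.0);
        for (int i = 0; i < n; ++i) r[i] = coin(gen) ? 1.0 : 0.0;

        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                Yr[i] += Y[i * n + j] * r[j];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                XYr[i] += X[i * n + j] * Yr[j];
                Zr[i]  += Z[i * n + j] * r[j];
            }
        for (int i = 0; i < n; ++i)
            if (XYr[i] != Zr[i]) return false;   // exact comparison; see the arithmetic caveat above
        return true;
    }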
Hope this helps!

Related

Questions about SVD, Singular Value Decomposition

I am not a mathematician, so I need to understand what SVD does and why, more than how exactly it works mathematically. (I do at least understand what the decomposition itself is.)
A guy on YouTube gave the only human explanation of SVD I have found, saying that the U matrix maps "user to concept" correlation, the Sigma matrix defines the strength of each concept, and V maps "movie to concept" correlation, given that the initial matrix M has users in the rows and movie ratings in the columns.
He also mentioned two concepts specifically, "sci fi" and "romance" movies. See the picture below.
My questions are:
1. How does SVD know the number of concepts? He, as a human, mentioned two (sci fi and romance), but in the resulting matrices there are actually 3 concepts (for example matrix U, the one with blue titles, has 3 columns, not 2).
2. How does SVD know what the concepts are at all? I mean, what if I shuffle the columns randomly, how does SVD then know what is sci fi and what is romance? I suppose there is no rule that the concepts have to be grouped together in column order. What if the sci fi movies are the first and last columns of the initial matrix M, rather than the first 3?
3. What is the practical usage of the U, Sigma or V matrices individually (except that you can multiply them to get the initial matrix M)?
4. Is there any other possible human explanation of SVD than the one the guy above provided, or are "matrices of correlations" the only way to think of it?
As was pointed out in the comments you may well get better explanations elsewhere. However since the question is still open, here is my tuppence worth.
Throughout I'll suppose that A is mxn where m >= n, i.e. that A has at least as many rows as columns.
First of all there are many forms of the SVD, differing in the sizes of the matrices. They all share the fundamental properties that
A = U*S*V'
S is diagonal
U and V have orthonormal columns (i.e. U'*U = I, V'*V = I)
Perhaps the most useful form from a theoretical point of view is the 'full fat' SVD, where U is mxm, S is mxn and V is nxn. However, this has rather a lot of elements that don't really contribute to A. For example, since S is diagonal we can write
S = ( S1 )    (where S1 is nxn)
    ( 0  )
If we divide up U into
U = ( U1 U2 )    (where U1 is mxn and U2 is mx(m-n))
Then it's straightforward to calculate that
U*S = U1*S1
and so we can throw away the last m-n columns of U and the last m-n rows of S, and still recover A.
Moreover, some of the diagonal elements of S1 may be 0; suppose in fact that only p < n of them are non-zero. Then we can write
S1 = ( S2 0 )
     ( 0  0 )
Arguing as above for U, and analogously for V', we can in fact throw away all but the first p columns of U, all of S except S2, and all but the first p rows of V', and still recover A.
This latter is the form of SVD ('thin') in your question:
U is mxp
S is pxp
V' is pxn
where p is the number of non-zero singular values of A. This is my answer to your 1.
By convention the elements of S decrease as you move down the diagonal. To achieve this the routine that calculates the svd in effect works with a version of A with shuffled columns. This shuffling is undone by incorporating the shuffle in the U and V' output. This is my answer to your 2: however you shuffle A, it will be in effect shuffled again to ensure that the singular values decrease down the diagonal.
I struggle to answer 3, because I suspect that our ideas of 'practical' are rather different.
One thing that I think is practical is finding simpler approximations to A. The reconstruction of A can be written
A = Sum{ 1<=i<=p | U[i]*S[i]*V[i]' }
where the S[i] are the diagonal elements of S, U[i] are the columns of U and V[i] those of V
We might want to use a simpler model for A, for example simplify it down to just one term. That is, we might wonder how much we would lose by using fewer 'concepts'. The 'thin' SVD above has already done this in the sense that it has thrown away all the columns that make no contribution to A. In an extreme case, we might wonder what we would get if we reduced to just one concept. That approximation is found by taking just the first term of the sum above. This extends to however many terms -- q say -- we want to allow: we just take the first q terms of the sum above.
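For what it's worth, here is a minimal sketch of that rank-q truncation; I'm using the Eigen library purely as one convenient way to get an SVD in C++:

    #include <Eigen/Dense>
    #include <iostream>

    int main()
    {
        // A small example matrix A (m x n with m >= n).
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 4);

        // Thin SVD: A = U * S * V'
        Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
        Eigen::MatrixXd U = svd.matrixU();
        Eigen::MatrixXd V = svd.matrixV();
        Eigen::VectorXd s = svd.singularValues();   // sorted in decreasing order

        // Rank-q approximation: keep only the first q terms of the sum above.
        const int q = 1;
        Eigen::MatrixXd Aq = U.leftCols(q) * s.head(q).asDiagonal() * V.leftCols(q).transpose();

        std::cout << "approximation error: " << (A - Aq).norm() << "\n";
        return 0;
    }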
I'm sorry, I can't answer 4.

Computational complexity of n-dimensional Discrete Fourier Transform?

The computational complexity of n-dimensional Fast Fourier Transform was discussed here and (as the former's duplicate) here.
The computational complexity of a 1-dimensional Discrete Fourier Transform is O(N^2), where N is the data set size.
Could you please tell us what the computational complexity of the n-dimensional Discrete Fourier Transform is, with {N1, N2, ..., Nn} points along each dimension?
The FFT itself is also a DFT (with some constraints). Will assume that you mean the naive summation method.
Re-writing the 1D DFT in integral form (the continuous version):
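Up to sign and normalization conventions it looks like this, writing f for the signal and f-tilde for its transform:

    \tilde{f}(k) = \int f(x) \, e^{-2 \pi i k x} \, dx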
A particular value of f-tilde is equivalent to a single element in your DFT array. When the integral is discretized (i.e. converted to a finite sum), there are N terms in the sum. This gives O(N) for each element and hence O(N^2) overall.
In case you were wondering, writing in this form allows for more compact notation for a general n-D DFT:
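That is, something along the lines of the following, with x and k now n-component vectors:

    \tilde{f}(\mathbf{k}) = \int f(\mathbf{x}) \, e^{-2 \pi i \, \mathbf{k} \cdot \mathbf{x}} \, d^{n}x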
When this is discretized, we can see that for each element there are n nested sums, each over one of the dimensions and of length N, i.e. N^n terms per element. There are N^n values in the input "array", and just as many elements in the transformed array, so the complexity is O(N^n * N^n) = O(N^(2n)), i.e. quadratic in the total number of data points. For unequal dimensions {N1, N2, ..., Nn} the same argument gives O((N1*N2*...*Nn)^2).

Perform sum of vectors in CUDA/thrust

So I'm trying to implement stochastic gradient descent in CUDA, and my idea is to parallelize it similarly to the way described in the paper Optimal Distributed Online Prediction Using Mini-Batches.
That implementation is aimed at MapReduce distributed environments so I'm not sure if it's optimal when using GPUs.
In short the idea is: at each iteration, calculate the error gradients for each data point in a batch (map), take their average by sum/reducing the gradients, and finally perform the gradient step updating the weights according to the average gradient. The next iteration starts with the updated weights.
The Thrust library allows me to perform a reduction on a vector, for example to sum all of its elements.
My question is: How can I sum/reduce an array of vectors in CUDA/thrust?
The input would be an array of vectors and the output would be a vector that is the sum of all the vectors in the array (or, ideally, their average).
Converting my comment into this answer:
Let's say each vector has length m and the array has size n.
An "array of vectors" is then the same as a matrix of size n x m.
If you change your storage format from this "array of vectors" to a single vector of size n * m, you can use thrust::reduce_by_key to sum each row of this matrix separately.
The sum_rows example shows how to do this.
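For what it's worth, a minimal sketch of that approach (the functor name is mine, and I'm assuming a component-major layout, i.e. element j of every vector stored contiguously, so that the segmented sums come out as the element-wise sum you asked for):

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/iterator/transform_iterator.h>

    // Maps a linear element index to the index of the segment (row) it belongs to.
    struct linear_index_to_row
    {
        int row_length;
        linear_index_to_row(int len) : row_length(len) {}
        __host__ __device__ int operator()(int i) const { return i / row_length; }
    };

    int main()
    {
        const int n = 4;   // number of vectors
        const int m = 3;   // length of each vector

        // Flattened component-major storage: each "row" of length n holds one
        // component of all n vectors, so there are m rows of n values each.
        thrust::device_vector<float> data(m * n, 1.0f);

        thrust::device_vector<int>   out_keys(m);
        thrust::device_vector<float> sums(m);     // element-wise sum of the n vectors

        // The key of element i is i / n, so each run of n consecutive values is reduced.
        thrust::reduce_by_key(
            thrust::make_transform_iterator(thrust::counting_iterator<int>(0),     linear_index_to_row(n)),
            thrust::make_transform_iterator(thrust::counting_iterator<int>(m * n), linear_index_to_row(n)),
            data.begin(),
            out_keys.begin(),
            sums.begin());

        // sums[j] now holds the sum of component j over all n vectors;
        // dividing by n would give the average the question mentions.
        return 0;
    }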

Calculating the trace of a matrix to the power k

I need to calculate the trace of a matrix to the powers of 3 and 4, and it needs to be as fast as possible.
The matrix here is an adjacency matrix of a simple graph, therefore it is square, symmetric, its entries are always 1 or 0 and the diagonal elements are always 0.
Optimization is trivial for the trace of the matrix to the power of 2 (see the sketch below):
We only need the diagonal entries (i, i) for the trace, so all others can be skipped
As the matrix is symmetric, these entries are just the entries of the i-th row squared and summed up
And as the entries are only 1 or 0, the squaring can be skipped
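In code that trivial case looks something like this (a sketch, with A stored as a full 0/1 matrix):

    #include <vector>

    // trace(A^2) for a symmetric 0/1 adjacency matrix with zero diagonal:
    // (A^2)(i,i) = sum_j A(i,j) * A(j,i) = sum_j A(i,j)^2 = sum_j A(i,j),
    // so the trace is simply the total number of 1-entries (twice the edge count).
    long long trace_of_square(const std::vector<std::vector<int>>& A)
    {
        long long t = 0;
        for (const auto& row : A)
            for (int x : row)
                t += x;   // squaring skipped: x is 0 or 1
        return t;
    }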
Another idea I found on wikipedia was summing up all elements of the Hadamard product, i.e. entry-wise multiplication, but I don't know how to extend this method to the power of 3 and 4.
See http://en.wikipedia.org/wiki/Trace_(linear_algebra)#Properties
Maybe I'm just blind but I can't think of a simple solution.
In the end I need a C++ implementation, but I think that's not important to the question.
Thanks in advance for any help.
The trace is the sum of the eigenvalues, and the eigenvalues of a matrix power are just the eigenvalues raised to that power.
That is, if l_1, ..., l_n are the eigenvalues of your matrix, then trace(M^p) = l_1^p + l_2^p + ... + l_n^p.
Depending on your matrix you may want to go with computing the eigenvalues and then summing. If your matrix has low rank (or can be well approximated with a low rank matrix) you can compute the eigenvalues very cheaply (a partial eigendecomposition has complexity O(n*k^2) where k is the rank).
Edit: You mention in the comments that it's 1600x1600 in which case finding all the eigenvalues should be no problem. Here's one of many C++ codes that you can use for this http://code.google.com/p/redsvd/
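For example, with the Eigen library (just one option; any symmetric eigensolver will do) the eigenvalue route looks roughly like this:

    #include <Eigen/Dense>
    #include <cmath>
    #include <iostream>

    // trace(A^p) for a symmetric matrix A, computed as the sum of eigenvalues^p.
    double trace_power(const Eigen::MatrixXd& A, int p)
    {
        Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(A, Eigen::EigenvaluesOnly);
        double t = 0.0;
        for (Eigen::Index i = 0; i < es.eigenvalues().size(); ++i)
            t += std::pow(es.eigenvalues()[i], p);
        return t;
    }

    int main()
    {
        // Tiny test: the adjacency matrix of a triangle (3-cycle).
        Eigen::MatrixXd A(3, 3);
        A << 0, 1, 1,
             1, 0, 1,
             1, 1, 0;
        std::cout << trace_power(A, 3) << "\n";   // prints 6, i.e. one triangle (trace(A^3)/6 = 1)
        return 0;
    }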
Ok, I just figured this one out myself.
The important thing I did not know was this:
If A is the adjacency matrix of the directed or undirected graph G, then the matrix A^n (i.e., the matrix product of n copies of A) has an interesting interpretation: the entry in row i and column j gives the number of (directed or undirected) walks of length n from vertex i to vertex j. This implies, for example, that the number of triangles in an undirected graph G is exactly the trace of A^3 divided by 6.
(Copied from http://en.wikipedia.org/wiki/Adjacency_matrix#Properties)
Retrieving the number of closed walks of a given length from node i back to i, for all n nodes, can essentially be done in O(n) when dealing with sparse graphs and using adjacency lists instead of matrices.
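For length 3, here is a sketch of that adjacency-list idea (the function name is mine); it counts the closed walks i -> j -> k -> i, which is exactly trace(A^3):

    #include <vector>

    // Counts closed walks of length 3 in a simple undirected graph given as
    // adjacency lists. The result equals trace(A^3); dividing by 6 gives the
    // number of triangles.
    long long closed_walks_of_length_3(const std::vector<std::vector<int>>& adj)
    {
        const int n = static_cast<int>(adj.size());
        std::vector<char> is_neighbor(n, 0);
        long long count = 0;
        for (int i = 0; i < n; ++i) {
            for (int j : adj[i]) is_neighbor[j] = 1;      // mark the neighbors of i
            for (int j : adj[i])
                for (int k : adj[j])
                    if (is_neighbor[k]) ++count;          // edges i-j, j-k and k-i all exist
            for (int j : adj[i]) is_neighbor[j] = 0;      // unmark
        }
        return count;   // = trace(A^3) = 6 * (number of triangles)
    }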
Nevertheless, thanks for your answers!

Multinomial Generation of Degree n

I'm basically looking for a summation function that will compute multinomials given the number of variables and a degree.
Example
2 Variables; 2 Degrees:
x^2+y^2+x*y+x+y+1
Thanks.
See Knuth The Art of Computer Programming, Vol. 4, Fascicle 3 for a comprehensive answer.
Short answer: it's enough to generate all multinomial expressions in n variables with degree exactly d. Then, for your problem, you can either put together the answers with degrees ≤d, or add a dummy variable "1".
The problem of generating all expressions with degree exactly d is thus simply one of generating all ordered partitions (i.e., all nonnegative integer solutions to x1 + ... + xn = d), and this can be done with a simple backtracking algorithm. ("Depth-first search")
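A minimal backtracking sketch along those lines: it enumerates all nonnegative integer solutions of x1 + ... + xn = d (i.e. the exponent vectors of the degree-d monomials) and loops d from D down to 0 to cover every degree <= D.

    #include <iostream>
    #include <vector>

    // Emits every exponent vector (e1, ..., en) with e1 + ... + en == remaining,
    // filling positions pos, pos+1, ..., n-1 by depth-first search.
    void generate(std::vector<int>& exps, int pos, int remaining,
                  std::vector<std::vector<int>>& out)
    {
        if (pos == static_cast<int>(exps.size()) - 1) {
            exps[pos] = remaining;                 // the last variable takes what is left
            out.push_back(exps);
            return;
        }
        for (int e = remaining; e >= 0; --e) {     // try every exponent for this variable
            exps[pos] = e;
            generate(exps, pos + 1, remaining - e, out);
        }
    }

    int main()
    {
        const int n = 2;   // number of variables, e.g. x and y
        const int D = 2;   // maximum total degree
        std::vector<std::vector<int>> monomials;
        for (int d = D; d >= 0; --d) {             // degree exactly d, for d = D, ..., 0
            std::vector<int> exps(n, 0);
            generate(exps, 0, d, monomials);
        }
        // For n = 2, D = 2 this yields the exponent vectors of the six terms
        // x^2, x*y, y^2, x, y, 1 from the example in the question.
        for (const auto& m : monomials) {
            for (int e : m) std::cout << e << ' ';
            std::cout << '\n';
        }
        return 0;
    }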
Given N variables, and a maximum degree of D, you have an array of D slots to fill with all possible combinations of variables.
[_, _, ..., _, _]
You are allowed to fill the slots with any of the N variables, using at most D slots in total. Since multiplication is commutative, the ordering of the variables doesn't matter. As such, this problem is reduced to generating (1) partitions of an integer and (2) subsets of a set.
I hope this is at least a start to your solution.
This also seems to be a Dynamic programming variant of the 0-1 Knapsack problem. Here we would be interested in all possible leaves of the decision tree.
