Related
I have a 3 million x 9 million sparse matrix with several billion non-zero entries. R and Python do not allow sparse matrices with more than MAXINT non-zero entries, thus why I found myself using Julia.
While scaling this data with the standard deviation is trivial, demeaning is of course a no-go in a naive manner as that would create a dense, 200+ terabyte matrix.
The relevant code for doing svd is julia can be found at https://github.com/JuliaLang/julia/blob/343b7f56fcc84b20cd1a9566fd548130bb883505/base/linalg/arnoldi.jl#L398
From my reading, a key element of this code is the AtA_or_AAt struct and several of the functions around those, specifically A_mul_B!. Copied below for your convenience
struct AtA_or_AAt{T,S} <: AbstractArray{T, 2}
A::S
buffer::Vector{T}
end
function AtA_or_AAt(A::AbstractMatrix{T}) where T
Tnew = typeof(zero(T)/sqrt(one(T)))
Anew = convert(AbstractMatrix{Tnew}, A)
AtA_or_AAt{Tnew,typeof(Anew)}(Anew, Vector{Tnew}(max(size(A)...)))
end
function A_mul_B!(y::StridedVector{T}, A::AtA_or_AAt{T}, x::StridedVector{T}) where T
if size(A.A, 1) >= size(A.A, 2)
A_mul_B!(A.buffer, A.A, x)
return Ac_mul_B!(y, A.A, A.buffer)
else
Ac_mul_B!(A.buffer, A.A, x)
return A_mul_B!(y, A.A, A.buffer)
end
end
size(A::AtA_or_AAt) = ntuple(i -> min(size(A.A)...), Val(2))
ishermitian(s::AtA_or_AAt) = true
This is passed into the eigs function, where some magic happens, and the output is then processed in to the relevant components for SVD.
I think the best way to make this work for a 'centering on the fly' type setup is to do something like subclass AtA_or_AAT with a AtA_or_AAT_centered version that more or less mimics the behavior but also stores the column means, and redefines the A_mul_B! function appropriately.
However, I do not use Julia very much and have run in to some difficulty modifying things already. Before I try to dive into this again, I was wondering if I could get feedback if this would be considered an appropriate plan of attack, or if there is simply a much easier way of doing SVD on such a large matrix (I haven't seen it, but I may have missed something).
edit: Instead of modifying base Julia, I've tried writing a "Centered Sparse Matrix" package that keeps the sparsity structure of the input sparse matrix, but enters the column means where appropriate in various computations. It's limited in what it has implemented, and it works. Unfortunately, it is still too slow, despite some pretty extensive efforts to try to optimize things.
After much fuddling with the sparse matrix algorithm, I realized that distributing the multiplication over the subtraction was dramatically more efficient:
If our centered matrix Ac is formed from the original nxm matrix A and its vector of column means M, with a nx1 vector of ones that I will just call 1. We are multiplying by a mxk matrix X
Ac := (A - 1M')
AcX = X
= AX - 1M'X
And we are basically done. Stupidly simple, actually.
AX is can be carried out with the usual sparse matrix multiplication function, M'X is a dense vector-matrix inner product, and the vector of 1's "broadcasts" (to use Julia's terminology) to each row of the AX intermediate result. Most languages have a way of doing that broadcasting without realizing the extra memory allocation.
This is what I've implemented in my package for AcX and Ac'X. The resulting object can then be passed to algorithms, such as the svds function, which only depend on matrix multiplication and transpose multiplication.
I've been struggling with the basics of functional programming lately. I started writing small functions in SML, so far so good. Although, there is one problem I can not solve. It's on Project Euler (https://projecteuler.net/problem=5) and it simply asks for the smallest natural number that is divisible from all the numbers from 1 - n (where n is the argument of the function I'm trying to build).
Searching for the solution, I've found that through prime factorization, you analyze all the numbers from 1 to 10, and then keep the numbers where the highest power on a prime number occurs (after performing the prime factorization). Then you multiply them and you have your result (eg for n = 10, that number is 2520).
Can you help me on implementing this to an SML function?
Thank you for your time!
Since coding is not a spectator sport, it wouldn't be helpful for me to give you a complete working program; you'd have no way to learn from it. Instead, I'll show you how to get started, and start breaking down the pieces a bit.
Now, Mark Dickinson is right in his comments above that your proposed approach is neither the simplest nor the most efficient; nonetheless, it's quite workable, and plenty efficient enough to solve the Project Euler problem. (I tried it; the resulting program completed instantly.) So, I'll go with it.
To start with, if we're going to be operating on the prime decompositions of positive integers (that is: the results of factorizing them), we need to figure out how we're going to represent these decompositions. This isn't difficult, but it's very helpful to lay out all the details explicitly, so that when we write the functions that use them, we know exactly what assumptions we can make, what requirements we need to satisfy, and so on. (I can't tell you how many times I've seen code-writing attempts where different parts of the program disagree about what the data should look like, because the exact easiest form for one function to work with was a bit different from the exact easiest form for a different function to work with, and it was all done in an ad hoc way without really planning.)
You seem to have in mind an approach where a prime decomposition is a product of primes to the power of exponents: for example, 12 = 22 × 31. The simplest way to represent that in Standard ML is as a list of pairs: [(2,2),(3,1)]. But we should be a bit more precise than this; for example, we don't want 12 to sometimes be [(2,2),(3,1)] and sometimes [(3,1),(2,2)] and sometimes [(3,1),(5,0),(2,2)]. So, we can say something like "The prime decomposition of a positive integer is represented as a list of prime–exponent pairs, with the primes all being positive primes (2,3,5,7,…), the exponents all being positive integers (1,2,3,…), and the primes all being distinct and arranged in increasing order." This ensures a unique, easy-to-work-with representation. (N.B. 1 is represented by the empty list, nil.)
By the way, I should mention — when I tried this out, I found that everything was a little bit simpler if instead of storing exponents explicitly, I just repeated each prime the appropriate number of times, e.g. [2,2,3] for 12 = 2 × 2 × 3. (There was no single big complication with storing exponents explicitly, it just made a lot of little things a bit more finicky.) But the below breakdown is at a high level, and applies equally to either representation.
So, the overall algorithm is as follows:
Generate a list of the integers from 1 to 10, or 1 to 20.
This part is optional; you can just write the list by hand, if you want, so as to jump into the meatier part faster. But since your goal is to learn the basics of functional programming, you might as well do this using List.tabulate [documentation].
Use this to generate a list of the prime decompositions of these integers.
Specifically: you'll want to write a factorize or decompose function that takes a positive integer and returns its prime decomposition. You can then use map, a.k.a. List.map [documentation], to apply this function to each element of your list of integers.
Note that this decompose function will need to keep track of the "next" prime as it's factoring the integer. In some languages, you would use a mutable local variable for this; but in Standard ML, the normal approach is to write a recursive helper function with a parameter for this purpose. Specifically, you can write a function helper such that, if n and p are positive integers, p ≥ 2, where n is not divisible by any prime less than p, then helper n p is the prime decomposition of n. Then you just write
local
fun helper n p = ...
in
fun decompose n = helper n 2
end
Use this to generate the prime decomposition of the least common multiple of these integers.
To start with, you'll probably want to write a lcmTwoDecompositions function that takes a pair of prime decompositions, and computes the least common multiple (still in prime-decomposition form). (Writing this pairwise function is much, much easier than trying to create a multi-way least-common-multiple function from scratch.)
Using lcmTwoDecompositions, you can then use foldl or foldr, a.k.a. List.foldl or List.foldr [documentation], to create a function that takes a list of zero or more prime decompositions instead of just a pair. This makes use of the fact that the least common multiple of { n1, n2, …, nN } is lcm(n1, lcm(n2, lcm(…, lcm(nN, 1)…))). (This is a variant of what Mark Dickinson mentions above.)
Use this to compute the least common multiple of these integers.
This just requires a recompose function that takes a prime decomposition and computes the corresponding integer.
(I'm not sure whether I should post this problem on this site or on the math site. Please feel free to migrate this post if necessary.)
My problem at hand is that given a value of k I'd like to numerically compute a rational function of nonlinear polynomials in k which looks like the following: (sorry I don't know how to typeset equations here...)
where {a_0, ..., a_N; b_0, ..., b_N} are complex constants, {u_0, ..., u_N, v_0, ..., v_N} are real constants and i is the imaginary number. I learned from Numerical Recipes that there are whole bunch of ways to compute polynomials quickly, in the meanwhile keeping the rounding error small enough, if all coefficients were constant. But I do not think those ideas are useful in my case since the exponential prefactors also depend on k.
Currently I calculate it in a brute force way in C with complex.h (this is just a pseudo code):
double complex function(double k)
{
return (a_0+a_1*cexp(I*u_1*k)*k+a_2*cexp(I*u_2*k)*k*k+...)/(b_0+b_1*cexp(I*v_1*k)*k+v_2*cexp(I*v_2*k)*k*k+...);
}
However when the number of calls of function increases (because this is just a part of my real calculation), it is very slow and inaccurate (only 6 valid digits). I appreciate any comments and/or suggestions.
I trust that this isn't a homework assignment!
Normally the trick is to use a loop add the next coefficient to the running sum, and multiply by k. However, in your case, I think the "e" term in the coefficient is going to overwhelm any savings by factoring out k. You can still do it, but the savings will probably be small.
Is u_i a constant? Depending on how many times you need to run this formula, maybe you could premultiply u_i * k (unless k changes each run). It's been so many decades since I took a Numerical Analysis course that I have only vague recollections of the tricks of the trade. Let's see... is e^(i*u_i*k) the same as (e^(i*u_i))^k? I don't remember the rules on imaginary numbers, or whether you'll save anything since you've got a real^real (assuming k is real) anyway (internally done using e^power).
If you're getting only 6 digits, that suggests that your math, and maybe your library, is working in single precision (32 bit) reals. Check your library and check your declarations that you are using at least double precision (64 bit) reals everywhere.
I am attempting to generate QR codes on an extremely limited embedded platform. Everything in the specification seems fairly straightforward except for generating the error correction codewords. I have looked at a bunch of existing implementations, and they all try to implement a bunch of polynomial math that goes straight over my head, particularly with regards to the Galois fields. The most straightforward way I can see, both in mathematical complexity and in memory requirements is a circuit concept that is laid out in the spec itself:
With their description, I am fairly confident I could implement this with the exception of the parts labeled GF(256) addition and GF(256) Multiplication.
They offer this help:
The polynomial arithmetic for QR Code shall be calculated using bit-wise modulo 2 arithmetic and byte-wise
modulo 100011101 arithmetic. This is a Galois field of 2^8
with 100011101 representing the field's prime modulus
polynomial x^8+x^4+x^3+x^2+1.
which is all pretty much greek to me.
So my question is this: What is the easiest way to perform addition and multiplication in this kind of Galois field arithmetic? Assume both input numbers are 8 bits wide, and my output needs to be 8 bits wide also. Several implementations precalculate, or hardcode in two lookup tables to help with this, but I am not sure how those are calculated, or how I would use them in this situation. I would rather not take the 512 byte memory hit for the two tables, but it really depends on what the alternative is. I really just need help understanding how to do a single multiplication and addition operation in this circuit.
In practice only one table is needed. That would be for the GP(256) multiply. Note that all arithmetic is carry-less, meaning that there is no carry-propagation.
Addition and subtraction without carry is equivalent to an xor.
So in GF(256), a + b and a - b are both equivalent to a xor b.
GF(256) multiplication is also carry-less, and can be done using carry-less multiplication in a similar way with carry-less addition/subtraction. This can be done efficiently with hardware support via say Intel's CLMUL instruction set.
However, the hard part, is reducing the modulo 100011101. In normal integer division, you do it using a series of compare/subtract steps. In GF(256), you do it in a nearly identical manner using a series of compare/xor steps.
In fact, it's bad enough where it's still faster to just precompute all 256 x 256 multiplies and put them into a 65536-entry look-up table.
page 3 of the following pdf has a pretty good reference on GF256 arithmetic:
http://www.eecs.harvard.edu/~michaelm/CS222/eccnotes.pdf
(I'm following up on the pointer to zxing in the first answer, since I'm the author.)
The answer about addition is exactly right; that's why working in this field is convenient on a computer.
See http://code.google.com/p/zxing/source/browse/trunk/core/src/com/google/zxing/common/reedsolomon/GenericGF.java
Yes multiplication works, and is for GF256. a * b is really the same as exp(log(a) + log(b)). And because GF256 has only 256 elements, there are only 255 unique powers of "x", and same for log. So these are easy to put in a lookup table. The tables would "wrap around" at 256, so that is why you see the "% size". "/ size" is slightly harder to explain in a sentence -- it's because really 1-255 "wrap around", not 0-255. So it's not quite just a simple modulus that's needed.
The final piece perhaps is how you reduce modulo an irreducible polynomial. The irreducibly polynomial is x^8 plus some lower-power terms, right -- call it I(x) = x^8 + R(x). And the polynomial is congruent to 0 in the field, by definition; I(x) == 0. So x^8 == -R(x). And, conveniently, addition and subtraction are the same, so x^8 == -R(x) == R(x).
The only time we need to reduce higher-power polynomials is when constructing the exponents table. You just keep multiplying by x (which is a shift left) until it gets too big -- gets an x^8 term. But x^8 is the same as R(x). So you take out the x^8 and add in R(x). R(x) merely has powers up to x^7 so it's all in a byte still, all in GF(256). And you know how to add in this field.
Helps?
Disclaimer
This is not strictly a programming question, but most programmers soon or later have to deal with math (especially algebra), so I think that the answer could turn out to be useful to someone else in the future.
Now the problem
I'm trying to check if m vectors of dimension n are linearly independent. If m == n you can just build a matrix using the vectors and check if the determinant is != 0. But what if m < n?
Any hints?
See also this video lecture.
Construct a matrix of the vectors (one row per vector), and perform a Gaussian elimination on this matrix. If any of the matrix rows cancels out, they are not linearly independent.
The trivial case is when m > n, in this case, they cannot be linearly independent.
Construct a matrix M whose rows are the vectors and determine the rank of M. If the rank of M is less than m (the number of vectors) then there is a linear dependence. In the algorithm to determine the rank of M you can stop the procedure as soon as you obtain one row of zeros, but running the algorithm to completion has the added bonanza of providing the dimension of the spanning set of the vectors. Oh, and the algorithm to determine the rank of M is merely Gaussian elimination.
Take care for numerical instability. See the warning at the beginning of chapter two in Numerical Recipes.
If m<n, you will have to do some operation on them (there are multiple possibilities: Gaussian elimination, orthogonalization, etc., almost any transformation which can be used for solving equations will do) and check the result (eg. Gaussian elimination => zero row or column, orthogonalization => zero vector, SVD => zero singular number)
However, note that this question is a bad question for a programmer to ask, and this problem is a bad problem for a program to solve. That's because every linearly dependent set of n<m vectors has a different set of linearly independent vectors nearby (eg. the problem is numerically unstable)
I have been working on this problem these days.
Previously, I have found some algorithms regarding Gaussian or Gaussian-Jordan elimination, but most of those algorithms only apply to square matrix, not general matrix.
To apply for general matrix, one of the best answers might be this:
http://rosettacode.org/wiki/Reduced_row_echelon_form#MATLAB
You can find both pseudo-code and source code in various languages.
As for me, I transformed the Python source code to C++, causes the C++ code provided in the above link is somehow complex and inappropriate to implement in my simulation.
Hope this will help you, and good luck ^^
If computing power is not a problem, probably the best way is to find singular values of the matrix. Basically you need to find eigenvalues of M'*M and look at the ratio of the largest to the smallest. If the ratio is not very big, the vectors are independent.
Another way to check that m row vectors are linearly independent, when put in a matrix M of size mxn, is to compute
det(M * M^T)
i.e. the determinant of a mxm square matrix. It will be zero if and only if M has some dependent rows. However Gaussian elimination should be in general faster.
Sorry man, my mistake...
The source code provided in the above link turns out to be incorrect, at least the python code I have tested and the C++ code I have transformed does not generates the right answer all the time. (while for the exmample in the above link, the result is correct :) -- )
To test the python code, simply replace the mtx with
[30,10,20,0],[60,20,40,0]
and the returned result would be like:
[1,0,0,0],[0,1,2,0]
Nevertheless, I have got a way out of this. It's just this time I transformed the matalb source code of rref function to C++. You can run matlab and use the type rref command to get the source code of rref.
Just notice that if you are working with some really large value or really small value, make sure use the long double datatype in c++. Otherwise, the result will be truncated and inconsistent with the matlab result.
I have been conducting large simulations in ns2, and all the observed results are sound.
hope this will help you and any other who have encontered the problem...
A very simple way, that is not the most computationally efficient, is to simply remove random rows until m=n and then apply the determinant trick.
m < n: remove rows (make the vectors shorter) until the matrix is square, and then
m = n: check if the determinant is 0 (as you said)
m < n (the number of vectors is greater than their length): they are linearly dependent (always).
The reason, in short, is that any solution to the system of m x n equations is also a solution to the n x n system of equations (you're trying to solve Av=0). For a better explanation, see Wikipedia, which explains it better than I can.