On the Subtleties of Sparseness in Chapel

Given a dense domain dom: domain(n); where n < 3, the declaration sps1: sparse subdomain(dom); yields a sparse subdomain sps1 of dom. With sps1 the usual array/matrix slicing is possible; that is, given a matrix A: [sps1], one can take (n - 1)-dimensional slices of A. However, the usual matrix operation transpose() is not applicable.
Defining a second matrix B: [sps2] over another sparse subdomain sps2 = CSRDomain(dom) enables one to take transpose()s of B, but the ability to slice into B is forfeited.
Both of these abilities seem like ones a user should always have access to. Is there a better way to declare sparse subdomains that preserves the two?

Is there a better way to declare sparse subdomains that preserves the two?
I think you are just hitting a shortcoming of the current implementation as of Chapel 1.16.0.
COO sparse arrays & domains, the language's default sparse distribution, created with sps1: sparse subdomain(dom), are not yet supported in the LinearAlgebra.Sparse module, so there is no library-supported transpose.
CSR sparse arrays & domains, LinearAlgebra's default (and only supported) sparse distribution, created with sps2 = CSRDomain(dom), do not yet support slicing.
Both of these should be possible some day as sparse arrays and Linear Algebra features are further developed.

Related

Memory Efficient Centered Sparse SVD/PCA (in Julia)?

I have a 3 million x 9 million sparse matrix with several billion non-zero entries. R and Python do not allow sparse matrices with more than MAXINT non-zero entries, which is why I found myself using Julia.
While scaling this data by the standard deviation is trivial, naively demeaning it is of course a no-go, as that would create a dense, 200+ terabyte matrix.
The relevant code for doing svd in Julia can be found at https://github.com/JuliaLang/julia/blob/343b7f56fcc84b20cd1a9566fd548130bb883505/base/linalg/arnoldi.jl#L398
From my reading, a key element of this code is the AtA_or_AAt struct and several of the functions around those, specifically A_mul_B!. It is copied below for your convenience:
struct AtA_or_AAt{T,S} <: AbstractArray{T, 2}
    A::S
    buffer::Vector{T}
end

function AtA_or_AAt(A::AbstractMatrix{T}) where T
    Tnew = typeof(zero(T)/sqrt(one(T)))
    Anew = convert(AbstractMatrix{Tnew}, A)
    AtA_or_AAt{Tnew,typeof(Anew)}(Anew, Vector{Tnew}(max(size(A)...)))
end

function A_mul_B!(y::StridedVector{T}, A::AtA_or_AAt{T}, x::StridedVector{T}) where T
    if size(A.A, 1) >= size(A.A, 2)
        A_mul_B!(A.buffer, A.A, x)
        return Ac_mul_B!(y, A.A, A.buffer)
    else
        Ac_mul_B!(A.buffer, A.A, x)
        return A_mul_B!(y, A.A, A.buffer)
    end
end

size(A::AtA_or_AAt) = ntuple(i -> min(size(A.A)...), Val(2))
ishermitian(s::AtA_or_AAt) = true
This is passed into the eigs function, where some magic happens, and the output is then processed into the relevant components for the SVD.
I think the best way to make this work for a 'centering on the fly' type setup is to do something like subclass AtA_or_AAt with an AtA_or_AAt_centered version that more or less mimics the behavior but also stores the column means, and redefines the A_mul_B! function appropriately.
However, I do not use Julia very much and have run into some difficulty modifying things already. Before I try to dive into this again, I was wondering if I could get feedback on whether this would be considered an appropriate plan of attack, or if there is simply a much easier way of doing SVD on such a large matrix (I haven't seen it, but I may have missed something).
edit: Instead of modifying base Julia, I've tried writing a "Centered Sparse Matrix" package that keeps the sparsity structure of the input sparse matrix, but enters the column means where appropriate in various computations. It's limited in what it has implemented, and it works. Unfortunately, it is still too slow, despite some pretty extensive efforts to try to optimize things.
After much fiddling with the sparse matrix algorithm, I realized that distributing the multiplication over the subtraction was dramatically more efficient:
If our centered matrix Ac is formed from the original n x m matrix A, its m x 1 vector of column means M, and an n x 1 vector of ones that I will just call 1, and we are multiplying by an m x k matrix X, then

Ac := A - 1M'
AcX = (A - 1M')X
    = AX - 1(M'X)
And we are basically done. Stupidly simple, actually.
AX can be carried out with the usual sparse matrix multiplication function, M'X is a dense vector-matrix inner product, and the vector of 1's "broadcasts" (to use Julia's terminology) the row vector M'X across each row of the AX intermediate result. Most languages have a way of doing that broadcasting without actually allocating the extra memory.
This is what I've implemented in my package for AcX and Ac'X. The resulting object can then be passed to algorithms, such as the svds function, which only depend on matrix multiplication and transpose multiplication.
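For concreteness, the two products can be sketched in a few lines of present-day Julia using SparseArrays; the names colmeans, centered_mul and centered_tmul below are hypothetical, not the API of the package mentioned above:

using SparseArrays, LinearAlgebra

# Column means of the n x m sparse matrix A, as a length-m vector.
colmeans(A::SparseMatrixCSC) = vec(sum(A, dims = 1)) ./ size(A, 1)

# Ac * x without ever forming Ac = A - 1M':
# (A - 1M')x = A*x .- (M'x), where M'x is a scalar broadcast over the rows.
centered_mul(A::SparseMatrixCSC, M::AbstractVector, x::AbstractVector) =
    A * x .- dot(M, x)

# Ac' * x: (A - 1M')'x = A'x - M*(1'x) = A'*x .- M .* sum(x)
centered_tmul(A::SparseMatrixCSC, M::AbstractVector, x::AbstractVector) =
    A' * x .- M .* sum(x)

A thin wrapper object carrying A and its column means, plus these two products, is all that a Lanczos-style routine such as svds needs, since it only depends on matrix multiplication and transpose multiplication.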

affinity propagation clustering on a large sparse matrix

I am trying the R package apcluster on a set of objects that I want to cluster, but I'm running into performance/memory problems, and I suspect I'm not doing it right. I'd like to hear your opinion, please.
In short: I have a set of about 13000 objects. Each object is associated with a set of 2 to 5 'features'. The similarity (by which I want to cluster, eventually) between any two objects i and j is equal to the number of features they have in common divided by the total number of distinct features they 'span'. E.g. if i = {a,b,c} and j = {c,d}, then sim[i,j] = 1/4 = 0.25, because they have only 1 feature in common ({c}) and in total they describe 4 distinct features ({a,b,c,d}).
Calculating my NxN similarity matrix is not a problem in theory: it can be done using set operations if each object's features are stored as a list; or features can be pivoted to a matrix of 1's and 0's, where each column is a feature, and then R's function dist with method="binary" does the trick.
In practice, however, the first problem is that such similarity calculations are extremely slow. For 13 K objects, there are about 84.5 M similarities to calculate, but this doesn't sound so bad for a modern computer. I don't understand why it should take a few hours to do that. And the set-operation version, which should be quicker as far as I can tell, is actually much slower than dist. [Another package called fingerprint is supposed to deal with such cases more efficiently, but so far I haven't been able to make it work; it gives a lot of errors when trying to make what they call 'featvec' objects.]
The other thing to consider is that the 2-5 features per object are not very repetitive. There may be a group of 100 or so objects with at least one feature in common between them, but then none of the other 12.9 K objects has any feature in common with these 100 objects. The consequence is that the pivoted feature matrix is very sparse (if we consider 0's as empty). There are about 4000 columns in the pivoted matrix, and each row has at most 5 1's. I wonder if this is negatively impacting the performance of dist, in that it has to multiply through a lot of 0's that could instead be ignored.
Does it seem normal to you that it should take a few hours to apply dist to a matrix like the one I described? Can you suggest a different way to calculate the similarity that takes advantage of the sparseness of the matrix?
Anyway, I managed to get the output from dist, which however had class 'dist', and was a distance matrix, not a similarity one, so I had to use 1 - as.matrix(distance_matrix) to be able to make the similarity matrix apcluster needs as input.
That's when I got the first 'memory' problem. R said the vector could not be allocated due to its size. I tried the usual tricks, but in the end I could not get more than 4 GB, and my matrices are (apparently) bigger.
I overcame this by assigning each time new matrices to their old 'self'.
And then when I submitted this painstakingly put together similarity matrix to apcluster, again the vector size error popped up, as if the first thing apcluster did was create some other large object from what I had fed it.
I had a look at as.Sparse... in apcluster, but it does not seem to help a lot, considering that you have to calculate the full matrix first anyway.
In the end the only thing that worked a little bit was 'leveraged affinity propagation' by apclusterL, which however is an approximation.
Does anybody know if and how I could do this better? E.g. is it wise to pivot the data first, or should I stick to list and set operations? Or, can the fact that the initial matrix is sparse be used to compute directly a sparse similarity matrix, rather than compute it fully and reduce it to sparse later?
Any advice would be greatly appreciated. Thanks!
BTW, yes, I saw this thread: Cluster Analysis in R on large sparse matrix, which does not seem to have been answered conclusively.
The R interpreter is really slow.
So you should use R mostly to "drive" your program, but implement all the computation-heavy stuff in C or FORTRAN.
You didn't show the code you are using, but I guess it involves nested for loops? Try to rewrite it without any for loops in R, or rewrite it in C.
But no matter what, AP clustering will always remain very slow. It involves many passes over O(n²) matrices, i.e. it scales very badly.
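One way to exploit the sparsity directly, as the question asks, is to note that all pairwise intersection counts come from a single sparse matrix product B*B' of the pivoted 0/1 matrix, and similarities only need to be computed where that product is nonzero. A rough sketch, written in Julia rather than R purely to illustrate the idea (in R the analogous heavy step would be a sparse tcrossprod via the Matrix package):

using SparseArrays

# B: n_objects x n_features sparse 0/1 matrix (the "pivoted" matrix, Int entries).
# Similarity of objects i and j: |i ∩ j| / |i ∪ j|, with |i ∪ j| = |i| + |j| - |i ∩ j|.
function jaccard_similarities(B::SparseMatrixCSC{Int,Int})
    counts = vec(sum(B, dims = 2))    # |i|: number of features of each object
    inter  = B * B'                   # |i ∩ j|: nonzero only where two objects share a feature
    I, J, V = findnz(inter)
    sims   = [v / (counts[i] + counts[j] - v) for (i, j, v) in zip(I, J, V)]
    return sparse(I, J, sims, size(B, 1), size(B, 1))   # sparse similarity matrix (diagonal = 1)
end

Because feature sharing is so localized here (groups of ~100 objects at most), B*B' should stay sparse, so this route should be far cheaper than the dense dist computation.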

Graph library implementation

I'm trying to implement a weighted graph. I know that there are two ways to implement a weighted graph: either with a two-dimensional array (adjacency matrix) or with an array of linked lists (adjacency list). Which one of the two is more efficient and faster?
Which one of the two is more efficient and faster?
That depends on your usage and the kinds of graphs you want to store.
Let n be the number of nodes and m be the number of edges. If you want to know whether two nodes u and v are connected (and the weight of the edge), an adjacency matrix allows you to determine this in constant time (in O-notation, O(1)), simply by retrieving the entry A[u,v]. With an adjacency list, you will have to look at every entry in u's list, or v's list - in the worst case, there could be n entries. So edge lookup for an adjacency list is in O(n).
The main downside of an adjacency matrix is the memory required. Altogether, you need to store n^2 entries. With an adjacency list, you need to store only the edges that actually exist (m entries, assuming a directed graph). So if your graph is sparse, adjacency lists clearly occupy much less memory.
My conclusion would be: Use an adjacency matrix if your main operation is retrieving the edge weight for two specific nodes; under the condition that your graphs are small enough so that n^2 entries fit in memory. Otherwise, use the adjacency list.
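To make the trade-off concrete, here is a minimal sketch of both representations (written in Julia purely for illustration; the question itself is language-agnostic):

# Adjacency matrix: n^2 entries, O(1) weight lookup.
struct MatrixGraph
    w::Matrix{Float64}                         # w[u, v] = edge weight, 0.0 meaning "no edge"
end
MatrixGraph(n::Integer) = MatrixGraph(zeros(n, n))
add_edge!(g::MatrixGraph, u, v, wt) = (g.w[u, v] = wt)
weight(g::MatrixGraph, u, v) = g.w[u, v]

# Adjacency list: n + m entries, O(degree(u)) weight lookup.
struct ListGraph
    adj::Vector{Vector{Tuple{Int,Float64}}}    # adj[u] = [(v, weight), ...]
end
ListGraph(n::Integer) = ListGraph([Tuple{Int,Float64}[] for _ in 1:n])
add_edge!(g::ListGraph, u, v, wt) = push!(g.adj[u], (Int(v), Float64(wt)))
function weight(g::ListGraph, u, v)
    for (dest, wt) in g.adj[u]
        dest == v && return wt
    end
    return nothing                             # no edge u -> v
end

(Using 0.0 as "no edge" in the matrix version is a simplification; a sentinel such as Inf, or a separate Bool matrix, is needed if genuine zero-weight edges must be stored.)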
Personally I'd go for the linked lists approach, assuming that it will often be a sparse graph (i.e. most of the array cells are a waste of space).
Went to wikipedia to read up on adjacency lists (been ages since I used graphs) and it has a nice section on the trade-offs between the 2 approaches. Ultimately, as with many either/or choices it will come down to 'it depends', based on what are the likely use cases for your library.
After reading the wiki article, I think another point in favor of using lists would be attaching data to each directed segment (or even different weights, think of walk/bike/car distance between 2 points etc.)

Fortran: multiplication with matrices only containing +1 and -1 as entries

What would be an efficient way (in terms of CPU-time and/or memory requirements) of multiplying, in fortran9x, an arbitrary M x N matrix, say A, only containing +1 and -1 as its entries (and fully populated!), with an arbitrary (dense) N-dimensional vector, v?
Many thanks,
Osmo
P.S. The size of A (i.e., M and N) is not known at compilation time.
My guess is that it would be faster to just do the multiplication instead of trying to avoid the multiplication by checking the sign of the matrix element and adding/subtracting accordingly. Hence, just use a general optimized matrix-vector multiply routine. E.g. xGEMV from BLAS.
Depending on the usage scenario, if you have to apply the same matrix multiple times, you might separate it into two parts, one with the positive entries and one with the negatives.
With this you can avoid the need for multiplications; however, it would introduce an indirection, which might be more expensive than the multiplications.
Thus janneb's solution might be the most suitable.
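A rough sketch of that split (in Julia for brevity, although the question is about Fortran): store, once, the column indices of the +1 and -1 entries of each row; every subsequent product is then pure additions and subtractions, at the cost of one indirect load per element, which is exactly the trade-off described above.

function signed_matvec(plus_cols::Vector{Vector{Int}},
                       minus_cols::Vector{Vector{Int}},
                       v::AbstractVector{Float64})
    y = zeros(length(plus_cols))
    for i in eachindex(y)
        s = 0.0
        for j in plus_cols[i];  s += v[j]; end   # +1 entries of row i
        for j in minus_cols[i]; s -= v[j]; end   # -1 entries of row i
        y[i] = s
    end
    return y
end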

Difference between a vector in maths and programming

Maybe this question is better suited to the math section of the site, but I guess Stack Overflow is suited too. In mathematics, a vector has a position and a direction, but in programming, a vector is usually defined as:
Vector v (3, 1, 5);
Where are the direction and magnitude? To me, this is a point, not a vector... So what gives? Probably I am not getting something, so if anybody can explain this to me, it would be much appreciated.
If we are working in Cartesian coordinates, and assume (0,0,0) to be the origin, then a point p = (3,1,5) can be written as

p = 3i + 1j + 5k

where i, j and k are the unit vectors in the x, y and z directions. For convenience's sake, the unit vectors are dropped from programming constructs.
The magnitude of the vector is

|p| = sqrt(3^2 + 1^2 + 5^2)

and its direction cosines are

cos(alpha) = 3/|p|,   cos(beta) = 1/|p|,   cos(gamma) = 5/|p|

respectively, both of which can be computed programmatically. You can also take dot products and cross-products, which I'm sure you know about. So the usage is consistent between programming and mathematics. The difference in notations is mostly because of convenience.
However as Tomas pointed out, in programming, it is also common to define a vector of strings or objects, which really have no mathematical meaning. You can consider such vectors to be a one dimensional array or a list of items that can be accessed or manipulated easily by indexing.
In mathematics, it is easy to represent a vector by a point - just say that the "base" of the vector is implied to be the origin. Thus, a mathematical point for all practical purposes is also a representation of a mathematical vector, and the vector in your example has the magnitude sqrt(3^2 + 1^2 + 5^2) = sqrt(35) ≈ 6 and the direction approximately (1/2, 1/6, 5/6) (a normalized vector from the origin).
However, a vector in programming usually has no geometrical use, which means you really aren't interested in things like magnitude or direction. A vector in programming is rather just an ordered list of items. Important here is that the items need not be numbers - it can be anything handled by the language in question! Thus, ("Hello", "little", "world") is also a vector in programming, although it (obviously) has no vector interpretation in the mathematical sense.
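The geometric reading of (3, 1, 5) above is easy to reproduce in code; for instance, in Julia (illustrative only):

using LinearAlgebra

v = [3.0, 1.0, 5.0]
norm(v)          # magnitude: sqrt(3^2 + 1^2 + 5^2) = sqrt(35) ≈ 5.92
v ./ norm(v)     # direction: the unit vector, ≈ [0.51, 0.17, 0.85]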
Practically speaking (!):
A vector in mathematics is only a direction without a position (actually something more general, but to stay in your terminology). In programming you often use vectors for points. You can think of your vector as the vector pointing from the origin (0,0,0) to the point (3,1,5), called the location vector of the point. Consult texts on analytical and affine geometry for more insight.
A vector in computer science is a "one-dimensional" data structure (an array), which can be thought of as the direction, with a usually dynamic size (length/magnitude). That is why it is called a vector. But at heart it is just an array.
A vector also means a set of coordinates. This is how it is used in programming. Just as a set of numbers. You might want to represent position vectors, velocity vectors, momentum vectors, force vectors with a vector object, or you may wish to represent it any way that suits you.
Many times vector quantities may be represented by 4 coordinates instead of 3 (see homogeneous coordinates in computer graphics) so a physical vector is represented by a computer vector with 4 elements. Alternatively you can store direction and magnitude separately, or encode them with 3, 4 or more coordinates.
I guess what I am getting at is that computer languages are not designed to represent physical models as such, but to provide abstract data containers that the programmer uses as tools for his/her modeling.
A vector in math is an element of an n-dimensional space over some field (e.g. real/complex numbers, functions, strings). It may have infinite dimension, e.g. the function space L^2. I don't recall infinite-dimensional vectors being used in programming (infinite vectors are not vectors with unlimited length, but vectors with an infinite number of elements).
The most rigorous statement is that a mathematical vector is a first-order tensor that transforms from one coordinate system to another according to tensor transformation rules. The physical idea to keep in mind is that vectors have both magnitude and direction.
Programming vectors are data structures that need not transform according to any rules and may or may not have a notion of a coordinate system as reference. If you happen to use a vector data structure to hold numbers, they may conform to the mathematical definition. But if you have a vector of objects, it's unlikely that they have anything to do with coordinate transformations.
