Why aren't there any multidimensional sparse matrices/arrays in Julia? Why can we only have 2D sparse matrices and not, for example, 3D sparse matrices (or arrays)?
The problem, as I understand it (I'm not a sparse linear algebra expert, although Viral Shah, one of the other Julia co-founders, is), is that all the libraries for doing sparse computations (e.g. SuiteSparse) are matrix-only: they don't support sparse vectors, and they don't support higher-dimensional tensors either. So we could define types for higher-dimensional sparse tensors, but you wouldn't be able to do anything useful with them.
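To make that concrete, here is a minimal sketch of what such a type could look like; the SparseTensor name and the Dict-backed layout are purely illustrative, not part of any library. It can store and retrieve entries, but none of the factorizations or solvers you get for SparseMatrixCSC would exist for it.

# Hypothetical COO-style N-dimensional sparse array, for illustration only
struct SparseTensor{T,N} <: AbstractArray{T,N}
    data::Dict{NTuple{N,Int},T}   # index tuple => stored value
    dims::NTuple{N,Int}
end

SparseTensor{T}(dims::Int...) where {T} =
    SparseTensor{T,length(dims)}(Dict{NTuple{length(dims),Int},T}(), dims)

Base.size(A::SparseTensor) = A.dims
Base.getindex(A::SparseTensor{T,N}, I::Vararg{Int,N}) where {T,N} =
    get(A.data, I, zero(T))       # unstored entries read as zero
Base.setindex!(A::SparseTensor{T,N}, v, I::Vararg{Int,N}) where {T,N} =
    A.data[I] = v

A = SparseTensor{Float64}(100, 100, 100)   # a "3D sparse array"
A[3, 7, 42] = 1.5
A[3, 7, 42]                                # 1.5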
Related
I have a 10000x10000 array in Julia, say A = rand(10000, 10000). How can I store that large array so I can work with it in an IDE like Atom/Juno, performing matrix operations, determinants, eigenvalues and so on? Or, if I transfer that array to R, is there a way to work with such a large array in R?
If your data is sparse (not all cells have values), you can store it as a sparse matrix, which will greatly reduce the memory footprint (see https://docs.julialang.org/en/v1/stdlib/SparseArrays/). Whether or not it fits into memory also depends on what the elements of the matrix are: can you represent the values with Int8, or do you need 64-bit precision elements? A Matrix is not just a Matrix.
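As a rough illustration (a sketch; exact numbers depend on density and element type):

using SparseArrays

D = zeros(Float64, 10_000, 10_000)   # dense: 10^8 * 8 bytes, ~800 MB
S = sprand(10_000, 10_000, 0.001)    # sparse, ~0.1% of entries stored

Base.summarysize(D)                  # ~800 MB
Base.summarysize(S)                  # ~2 MB: non-zeros plus index vectors
Base.summarysize(zeros(Int8, 10_000, 10_000))  # ~100 MB: element type matters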
On a more general note, if your objects become so big that they don't fit into memory, you can write them to disk and "memory-map" them; that way you can use an on-disk Matrix for anything you can use a normal Matrix for. You can check the documentation here: https://docs.julialang.org/en/v1/stdlib/Mmap
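A minimal sketch of how that looks (the file name is just an example):

using Mmap

io = open("large_matrix.bin", "w+")                   # backing file on disk
A = Mmap.mmap(io, Matrix{Float64}, (10_000, 10_000))  # grows the file as needed
A[1, 1] = 42.0                                        # use it like a normal Matrix
Mmap.sync!(A)                                         # flush changes to disk
close(io)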
How do I deal with sparse matrices in JuMP?
For example, suppose I want to impose a constraint of the form:
A * x == 0
where A is a sparse matrix and x is a vector of variables. I assume the sparsity of A could be exploited to make the optimization faster. How can I take advantage of this in JuMP?
JuMP already benefits from sparse matrices in several ways. I've not checked the source, but the paper cited by JuMP.jl says:
In the case of LP, the input data structures are the vectors c and b and the matrix A in sparse format, and the routines to generate these data structures are called matrix generators.
One point to note is that the main task of algebraic modeling languages (AMLs) like JuMP is to generate input data structures for solvers. AMLs like JuMP do not solve the generated problems themselves; they call appropriate standard solvers to do the task.
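In practice, you can pass a SparseMatrixCSC to @constraint directly, and JuMP keeps the sparsity when it generates the solver's data structures. A minimal sketch (assuming the HiGHS solver is installed; any LP solver works):

using JuMP, SparseArrays, HiGHS

A = sprand(100, 50, 0.05)           # sparse constraint matrix
model = Model(HiGHS.Optimizer)
@variable(model, 0 <= x[1:50] <= 1)
@constraint(model, A * x .== 0)     # broadcast ==: one constraint per row of A
@objective(model, Max, sum(x))
optimize!(model)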
I am researching sparse adjacency matrices, where most cells are zeros with some ones here and there. Each relationship between two cells has a polynomial description that can be very long, and analysing them manually is time-consuming. My instructor suggests a purely algebraic method in terms of Gröbner bases, but before proceeding I would like to know, from a purely computer science and programming perspective, how to analyse sparse adjacency matrices. Do there exist data mining tools to analyse them?
Multivariate polynomial computation and Gröbner bases are an active research area. In 1991, Sturmfels outlined resultant methods and Gröbner basis methods in Sparse elimination theory, and there was related analysis at the July 2015 CoCoA conference.
SE is gathering awesome material on this, such as Gröbner basis computational analysis in M2, where you can find step-by-step examples outlined in the books and in different answers. For sparse matrices, there are sparse matrix algorithms built with Gröbner bases, such as Faugère's F4 and F5 algorithms, which build on the Buchberger algorithm.
I'll update this when I find more!
Using the Matrix package, I can create a two-dimensional sparse matrix.
Can someone suggest a package that would allow me to create a multidimensional (specifically a 3-dimensional) sparse matrix (array, or technically a three-way tensor) in R?
The slam package has a simple_sparse_array class: http://finzi.psych.upenn.edu/R/library/slam/html/array.html . It only has support for indexing and coercion, though; if you wanted to do tensor operations or elementwise arithmetic without converting back to a regular dense array, you'd have to implement them yourself (a short usage sketch follows after the search snippet below).
I found this by doing
library("sos")
findFn("{sparse array}")
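A minimal usage sketch for slam (illustrative values; see ?simple_sparse_array for details):

library(slam)
# 3x3x3 array with two stored entries; i holds one row of indices per entry
a <- simple_sparse_array(i = rbind(c(1, 1, 1), c(2, 3, 1)),
                         v = c(10, 20),
                         dim = c(3, 3, 3))
a[2, 3, 1]      # indexing works
as.array(a)     # coercion back to a dense array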
There's also the tensorr package, which looks promising in providing support for sparse tensors; tensor decompositions like PARAFAC/CANDECOMP are also on the to-do list:
https://cran.r-project.org/web/packages/tensorr/README.html
I'm preparing for a coding interview and was refreshing my mind on graphs. I was wondering the following: everywhere I've looked, it is assumed that adjacency lists are more memory-efficient than adjacency matrices for large sparse graphs, and should thus be preferred in that case. In addition, computing the number of outgoing edges of a node requires O(N) time with a matrix but O(1) with a list, and listing the adjacent nodes takes O(number of adjacent nodes) with a list instead of O(N) with a matrix.
Such places include Cormen et al.'s book, the StackOverflow question "Size of a graph using adjacency list versus adjacency matrix?", and Wikipedia.
However, using a sparse matrix representation such as Compressed Row Storage (CRS), the memory requirement is just O(number of non-zeros) = O(number of edges), the same as with lists. The number of outgoing edges of a node is available in O(1) (it is directly stored in CRS), and the adjacent nodes can be listed in O(number of adjacent nodes).
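For concreteness, a sketch in Julia (its SparseMatrixCSC is the column-wise twin of CRS, so out-edges are stored per column of the transpose; the complexity argument is identical):

using SparseArrays

# directed graph on 4 nodes with edges 1->2, 1->3, 2->3, 4->1
src = [1, 1, 2, 4]
dst = [2, 3, 3, 1]
At = sparse(dst, src, ones(Int, 4), 4, 4)   # column v holds the out-edges of v
# storage: colptr (N+1 ints) + rowval/nzval (one entry per edge) = O(N + E)

outdeg(v)    = At.colptr[v + 1] - At.colptr[v]            # O(1)
neighbors(v) = At.rowval[At.colptr[v]:At.colptr[v+1]-1]   # O(out-degree)

outdeg(1)      # 2
neighbors(1)   # [2, 3]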
Why isn't this discussed? Should I assume that CRS is a kind of adjacency-list representation of the graph represented by the matrix? Or is the argument that matrices are memory-intensive flawed because it doesn't consider sparse matrix representations?
Thanks!
Not everyone uses sparse matrix representations every day (I just happen to do so :), so I guess nobody thought of them. They are a kind of intermediate between adjacency lists and adjacency matrices, with performance similar to the former if you pick the right representation, and they are very convenient for some graph algorithms.
E.g., to get a proximity matrix over two hops, you just square the matrix. I've successfully done this with sparse matrix representations of the Wikipedia link structure in modest amounts of CPU time.
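As a minimal sketch of the two-hop trick (random data here, not the Wikipedia graph):

using SparseArrays

n = 1_000
i, j, _ = findnz(sprand(n, n, 0.01))          # random ~1% sparsity pattern
A = sparse(i, j, ones(Int, length(i)), n, n)  # 0/1 adjacency matrix

A2 = A * A            # A2[u, w] = number of length-2 paths u -> w
twohop = A2 .> 0      # sparse Boolean proximity matrix over two hops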