Huge diagonal matrix in R

The following code causes a memory error:
diag(1:100000)
Is there an alternative to diag() that allows producing a huge diagonal matrix?

Longer answer: I suggest not creating a diagonal matrix, because in most situations you can do without it. To make that clear, consider the most typical matrix operations:
Multiply the diagonal matrix D by a vector v to produce Dv. Instead of maintaining a matrix, keep your "matrix" as a vector d of the diagonal elements, and then multiply d elementwise by v. Same result.
Invert the matrix. Again, easy: take the reciprocal of each element of d (of course, this is the correct inverse only for diagonal matrices).
Various decompositions/eigenvalues/determinants/trace. Again, these can all be done on the vector d.
In short, though it requires a bit of attention in your code, you can always represent a diagonal matrix as a vector, and that should solve your memory issues (see the sketch below).
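A minimal sketch of the vector representation, reusing the diagonal from the question:
d <- 1:100000            # stands in for diag(1:100000)
v <- runif(100000)
Dv <- d * v              # same result as diag(d) %*% v
d_inv <- 1 / d           # diagonal of the inverse matrix
tr <- sum(d)             # trace
log_det <- sum(log(d))   # log-determinant (valid here since all entries are positive)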
Shorter answer: Having said all that, people have of course already implemented these ideas in sparse-matrix libraries, which perform the steps above under the hood. In R, the Matrix package is good for sparse matrices: https://cran.r-project.org/web/packages/Matrix/Matrix.pdf
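For instance, a quick sketch of how little memory the sparse representation needs:
library(Matrix)
D <- Diagonal(x = 1:100000)  # stores only the 100000 diagonal entries
object.size(D)               # a dense 100000 x 100000 double matrix would need about 80 GB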

Related

How to create a sparse diagonal matrix?

Might be a very silly question, but I cannot seem to find a proper way to create a sparse diagonal matrix in R.
I've found the functions:
diag.spam()
spdiags()
and used them with the Matrix library loaded and the spam package installed, but R did not seem to recognize these functions. Does anyone know a function or library I need to download?
I need it because I want to create diagonal matrices larger than 256 by 256.
Use the Diagonal() function in the Matrix package. (Matrix is a "recommended" package, which means it is automatically available when you install R.)
library(Matrix)
m <- Diagonal(500)
image(m)
Diagonal(n) creates an n x n identity matrix. If you want to create a diagonal matrix with a specified diagonal x, use Diagonal(x = <your vector>).
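For example, well beyond the 256 x 256 limit mentioned in the question:
d <- Diagonal(x = 1:1000)  # sparse 1000 x 1000 matrix with 1, 2, ..., 1000 on the diagonal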
Alternatively, use bandSparse() from the Matrix package. To get an n-by-n matrix with the value m on its diagonal, write:
bandSparse(n, n, 0, list(rep(m, n)))
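A concrete sketch with arbitrary values:
library(Matrix)
n <- 500
B <- bandSparse(n, n, 0, list(rep(7, n)))  # sparse 500 x 500 matrix with 7 on the diagonal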

Set up a diagonal matrix faster?

Calling diag(x) is apparently very slow. Is there not a faster way to set up a diagonal matrix? It seems like a fairly easy operation, yet R takes forever.
Using the diagonal matrix later on in multiplications is also extremely slow. So if I wanted to use sparse matrices, is there a faster way to set up a sparse diagonal matrix?
I don't have any idea what "too slow" means, but
Matrix::Diagonal(n=100)
will produce a 100x100 (sparse) identity matrix, and
Matrix::Diagonal(x=1:100)
will produce a sparse diagonal matrix with entries 1, 2, ..., 100.
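For a rough sense of the difference, a timing sketch (the size is arbitrary):
n <- 5000
system.time(diag(n))              # dense: allocates n^2 = 25 million entries
system.time(Matrix::Diagonal(n))  # sparse: stores essentially just the diagonal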

Preallocate sparse matrix with max nonzeros in R

I'm looking to preallocate a sparse matrix in R (using simple_triplet_matrix) by providing the dimensions of the matrix, m x n, and also the number of non-zero elements I expect to have. Matlab has the function "spalloc" (see below), but I have not been able to find an equivalent in R. Any suggestions?
S = spalloc(m,n,nzmax) creates an all zero sparse matrix S of size m-by-n with room to hold nzmax nonzeros.
Whereas it may make sense to preallocate a traditional dense matrix in R (in the same way it is much more efficient to preallocate a regular (atomic) vector rather than growing it one element at a time),
I'm pretty sure preallocating sparse matrices will not pay off in R, in most situations.
Why?
For dense matrices, you allocate and then assign "piece by piece", e.g.,
m[i,j] <- value
For sparse matrices, however, that is very different: if you do something like
S[i,j] <- value
the internal code has to check whether [i,j] is an existing (typically non-zero) entry. If it is, it can change the value; otherwise, one way or another, the triplet (i, j, value) needs to be stored, and that means extending the current structure, etc. If you do this piece by piece, it is inefficient, mostly irrespective of whether you did any preallocation.
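For illustration, the slow piece-by-piece pattern looks roughly like this (the sizes are made up):
library(Matrix)
# an all-zero 1000 x 1000 sparse matrix
S <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0), dims = c(1000, 1000))
for (k in 1:500)
  S[k, k] <- k  # each assignment may have to restructure S's internals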
If, on the other hand, you already know in advance all the [i,j] combinations which will contain non-zeroes, you could "pre-allocate", but in this case,
just store the vectors i and j, each of length nnzero, say. Then use your underlying "algorithm" to also construct a vector x of the same length containing all the corresponding values, i.e., entries.
Now, indeed, as @Pafnucy suggested, use spMatrix() or sparseMatrix(), two slightly different versions of the same functionality: constructing a sparse matrix, given its contents.
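A small sketch of that one-shot construction, with arbitrary indices and values:
library(Matrix)
i <- c(1, 3, 5)     # row indices of the nonzeros
j <- c(2, 4, 6)     # column indices
x <- c(10, 20, 30)  # the corresponding values
S <- sparseMatrix(i = i, j = j, x = x, dims = c(6, 6))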
I am happy to help further, as I am the maintainer of the Matrix package.

Avoiding automatic conversion of dgCMatrix to dgeMatrix

I use the class dgCMatrix from the Matrix package to store a square matrix of about 255 million values, with a size of about 1.7 MB.
However, after I perform variable <- variable/rowSums(variable), where variable is the sparse matrix, the result changes to class dgeMatrix and its size balloons to almost 2 GB, effectively taking up all available memory and in some instances crashing the script.
Is there a way to coerce the output to remain in the class dgCMatrix ?
I suspect that the reason is that the number of non-zero elements increases to the point that the matrix is no longer considered sparse, due to the introduction of NaN in elements where the row sum is zero. If there's a workaround to address the NaNs, I'm open to that too. Note, however, that I cannot avoid producing the zero rows, because my matrix needs to be square, and the corresponding column sums are generally non-zero.
You could try a simple ifelse() on the divisor:
variable <- variable / ifelse(rowSums(variable) != 0, rowSums(variable), 1)
Unless there's some reason you need to divide by the 0 there, that seems like the simplest way to avoid NaNs.
I have the same problem. This is the workaround I am using to avoid NaNs and keep the output in class dgCMatrix:
tmp <- 1 / rowSums(variable)   # reciprocal row sums
tmp[is.infinite(tmp)] <- 0     # zero rows give Inf here; set those factors to 0
variable <- variable * tmp     # tmp recycles down the columns, scaling each row
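An equivalent sketch that does the row scaling with a sparse diagonal matrix instead (this assumes the Matrix package is loaded; the product stays in a sparse class):
rs <- rowSums(variable)
rs[rs == 0] <- 1                              # leave all-zero rows untouched
variable <- Diagonal(x = 1 / rs) %*% variable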

Adding a vector to matrix rows in numpy

Is there a fast way in numpy to add a vector to every row or column of a matrix?
Lately, I have been tiling the vector to the size of the matrix, which can use a lot of memory. For example:
import numpy as np

mat = np.arange(15, dtype=float)  # float dtype so in-place addition of a float vector works
mat.shape = (5, 3)
vec = np.ones(3)
mat += np.tile(vec, (5, 1))
The other way I can think of is using a python loop, but loops are slow:
for i in range(len(mat)):
    mat[i, :] += vec
Is there a fast way to do this in numpy without resorting to C extensions?
It would be nice to be able to virtually tile a vector, like a more flexible version of broadcasting. Or to be able to iterate an operation row-wise or column-wise, which you may almost be able to do with some of the ufunc methods.
For adding a 1d array to every row, broadcasting already takes care of things for you:
mat += vec
However, more generally you can use np.newaxis to coerce the array into a broadcastable form. For example:
mat + np.ones(3)[np.newaxis,:]
While not necessary for adding the array to every row, this is needed for the column-wise version:
mat + np.ones(5)[:,np.newaxis]
EDIT: as Sebastian mentions, for row addition, mat + vec already handles the broadcasting correctly. It is also faster than using np.newaxis. I've edited my original answer to make this clear.
Numpy broadcasting will automatically add a vector (1D array) of compatible size to a matrix (2D array, not numpy matrix). It does this by matching shapes dimension by dimension from right to left, "stretching" missing dimensions or dimensions of size 1 to match the other array. This is explained in https://numpy.org/doc/stable/user/basics.broadcasting.html:
mat: 5 x 3
vec: 3
vec (broadcasted): 5 x 3
By default, numpy arrays are row-major ("C order"), with axis 0 as the matrix rows and axis 1 as the matrix columns, so broadcasting clones the vector along axis 0, i.e., once per matrix row.
