Adding a vector to matrix rows in numpy

Is there a fast way in numpy to add a vector to every row or column of a matrix?
Lately, I have been tiling the vector to the size of the matrix, which can use a lot of memory. For example:
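import numpy as np
mat = np.arange(15, dtype=float)  # float, so that adding a float vector in place is allowed
mat.shape = (5, 3)
vec = np.ones(3)
mat += np.tile(vec, (5, 1))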
The other way I can think of is using a python loop, but loops are slow:
for i in range(len(mat)):
    mat[i, :] += vec
Is there a fast way to do this in numpy without resorting to C extensions?
It would be nice to be able to virtually tile a vector, like a more flexible version of broadcasting. Or to be able to iterate an operation row-wise or column-wise, which you may almost be able to do with some of the ufunc methods.

For adding a 1d array to every row, broadcasting already takes care of things for you:
mat += vec
More generally, however, you can use np.newaxis to coerce the array into a broadcastable shape. For example:
mat + np.ones(3)[np.newaxis,:]
While this is not necessary for adding the array to every row, it is necessary for column-wise addition:
mat + np.ones(5)[:,np.newaxis]
EDIT: as Sebastian mentions, for row addition, mat + vec already handles the broadcasting correctly. It is also faster than using np.newaxis. I've edited my original answer to make this clear.
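A quick sketch comparing the row-wise and column-wise broadcasting forms above with an explicit loop. The vector values are made up for illustration, and mat is created as float because an in-place += of a float vector onto an integer array raises a casting error in current NumPy.

```python
import numpy as np

mat = np.arange(15, dtype=float).reshape(5, 3)

# Row-wise: a length-3 vector lines up with the last axis (columns),
# so it is added to every one of the 5 rows.
row_vec = np.array([10.0, 20.0, 30.0])
rows_added = mat + row_vec

# Column-wise: a length-5 vector needs a trailing axis of length 1
# (shape (5, 1)) so that it lines up with axis 0 instead.
col_vec = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
cols_added = mat + col_vec[:, np.newaxis]

# Reference results built with an explicit Python loop, for comparison.
expected_rows = mat.copy()
expected_cols = mat.copy()
for r in range(mat.shape[0]):
    expected_rows[r, :] += row_vec
    expected_cols[r, :] += col_vec[r]

print(np.array_equal(rows_added, expected_rows))  # True
print(np.array_equal(cols_added, expected_cols))  # True

# In-place version, as in the question: no tiled copy is allocated.
mat += row_vec
```

The in-place form modifies mat directly, which is what the question's tile-and-add was doing, just without the temporary (5, 3) array.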

Numpy broadcasting will automatically add a vector (1D array) of compatible size to a matrix (a 2D array, not a numpy matrix). It does this by comparing shapes dimension by dimension from right to left and "stretching" missing dimensions, or dimensions of size 1, to match the other array. This is explained in https://numpy.org/doc/stable/user/basics.broadcasting.html:
mat: 5 x 3
vec: 3
vec (broadcasted): 5 x 3
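By default, numpy arrays are row-major ("C order"), with axis 0 indexing matrix rows and axis 1 indexing matrix columns. Because broadcasting aligns the length-3 vector with the last axis (axis 1), the vector is repeated along axis 0, i.e. added to every row. This alignment follows from the right-to-left shape-matching rule, not from the memory layout.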
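The "stretching" can be made visible with np.broadcast_to, which gives the "virtually tiled" vector the question asks for as a read-only view rather than a copy. A sketch, assuming a float64 vector (so each element takes 8 bytes):

```python
import numpy as np

mat = np.arange(15, dtype=float).reshape(5, 3)
vec = np.ones(3)

# np.broadcast_to returns a read-only view with the broadcast shape.
# No data is copied: the stride along the new axis 0 is 0, so every
# "row" of the view points at the same three numbers in memory.
virtual_tile = np.broadcast_to(vec, mat.shape)
print(virtual_tile.shape)    # (5, 3)
print(virtual_tile.strides)  # (0, 8)

# Shapes are aligned from the right: (5, 3) with (3,) gives (5, 3).
print(np.broadcast(mat, vec).shape)  # (5, 3)

# np.tile, by contrast, materialises a full (5, 3) copy.
tiled = np.tile(vec, (5, 1))
print(np.array_equal(virtual_tile, tiled))  # True
```

The zero stride is what lets mat + vec avoid the memory cost of np.tile.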

Related

Huge diagonal matrix in R

The following code causes a memory error:
diag(1:100000)
Is there any alternative for diag which allows producing a huge diagonal matrix?
Longer answer: I suggest not creating a diagonal matrix, because in most situations you can do without it. To make that clear, consider the most typical matrix operations:
Multiply the diagonal matrix D by a vector v to produce Dv. Instead of maintaining a matrix, keep your "matrix" as a vector d of the diagonal elements, and then multiply d elementwise by v. Same result.
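Invert the matrix. Again, easy: take the reciprocal of each element of d (this is the correct inverse only because the matrix is diagonal, and it requires every diagonal entry to be nonzero).
Eigenvalues, determinant, trace and the usual decompositions. Again, these can all be done on the vector d: the eigenvalues are the entries of d, the determinant is their product and the trace is their sum.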
In short, though it requires a bit of attention in your code, you can always represent a diagonal matrix as a vector, and that should solve your memory issues.
Shorter answer: having said all that, people have of course already implemented these steps in sparse matrix classes, which do this bookkeeping under the hood. In R, the Matrix package is nice for sparse matrices (it also has a dedicated Diagonal() constructor): https://cran.r-project.org/web/packages/Matrix/Matrix.pdf
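The vector-only bookkeeping from the longer answer, sketched in NumPy rather than R for illustration. The diagonal is kept as a plain vector d, and each shortcut is checked against the dense matrix for a small n; that dense comparison is exactly what you would skip for n = 100000.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # kept small so the dense sanity checks below are cheap
d = np.arange(1, n + 1, dtype=float)  # diagonal entries, like diag(1:n) in R
v = rng.standard_normal(n)

# D @ v without building D: elementwise product of the diagonal and v.
dv = d * v

# Inverse of a diagonal matrix: reciprocal of each (nonzero) diagonal entry.
d_inv = 1.0 / d

# Eigenvalues are the entries of d; determinant is their product,
# trace is their sum.
det = np.prod(d)
trace = np.sum(d)

# Sanity checks against the dense matrix (only sensible for small n).
D = np.diag(d)
assert np.allclose(D @ v, dv)
assert np.allclose(np.linalg.inv(D), np.diag(d_inv))
assert np.isclose(np.linalg.det(D), det)
assert np.isclose(np.trace(D), trace)
```

Every operation here is O(n) in time and memory, versus O(n^2) memory for the dense D.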

Preallocate sparse matrix with max nonzeros in R

I'm looking to preallocate a sparse matrix in R (using simple_triplet_matrix) by providing the dimensions of the matrix, m x n, and also the number of non-zero elements I expect to have. Matlab has the function "spalloc" (see below), but I have not been able to find an equivalent in R. Any suggestions?
S = spalloc(m,n,nzmax) creates an all zero sparse matrix S of size m-by-n with room to hold nzmax nonzeros.
Whereas it may make sense to preallocate a traditional dense matrix in R (in the same way that it is much more efficient to preallocate a regular (atomic) vector than to grow it one element at a time),
I'm pretty sure it will not pay to preallocate sparse matrices in R, in most situations.
Why?
For dense matrices, you allocate and then assign "piece by piece", e.g.,
m[i,j] <- value
For sparse matrices, however, that is very different. If you do something like
S[i,j] <- value
the internal code has to check whether [i,j] is an existing entry (typically non-zero) or not. If it is, it can change the value; otherwise, one way or the other, the triplet (i, j, value) needs to be stored, and that means extending the current structure, etc. If you do this piece by piece, it is inefficient, mostly regardless of whether you had done some preallocation or not.
If, on the other hand, you already know in advance all the [i,j] combinations that will contain non-zeroes, you could "pre-allocate", but in that case
just store the vectors i and j, each of length nnzero, say. Then use your underlying "algorithm" to construct a vector x of the same length containing all the corresponding values, i.e., entries.
Now, indeed, as @Pafnucy suggested, use spMatrix() or sparseMatrix(), two slightly different versions of the same functionality: constructing a sparse matrix, given its contents.
I am happy to help further, as I am the maintainer of the Matrix package.

Getting elements of a list in R

This is my problem:
There is a predefined list named gamma with three entries: gamma$'2' is a 2x2 matrix, gamma$'3' a 3x3 matrix and gamma$'4' a 4x4 matrix. I would like to have a function that returns the matrix I need:
GiveMatrix <- function(n) {
  gamma.list <- #init the list of matrices
  gamma.list$n # return the list entry named n
}
Since n is not a character, the last line does not work. I tried gamma.list$paste(n) and gamma.list$as.character(n), but neither worked. Is there a function that converts n to the right format? Or is there maybe a much better way? I know, I am not really good at R.
You need to use:
gamma.list[[as.character(n)]]
In your example, R is looking for an entry in the list literally named "n". When using [[, the value of n is used instead, which is what you need.
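I've found it!
gamma.list[as.character(n)] is the solution I needed. (Strictly, single brackets return a one-element sub-list containing the matrix; gamma.list[[as.character(n)]] returns the matrix itself.)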

Numpy indexing using array

I'm trying to return a (square) section from an array, where the indices wrap around the edges. I need to juggle some indexing, but it works. However, I expect the last two lines of code to give the same result; why don't they? How does numpy interpret the last line?
And as a bonus question: Am I being woefully inefficient with this approach? I'm using the product because I need to modulo the range so it wraps around, otherwise I'd use a[imin:imax, jmin:jmax, :], of course.
import numpy as np
from itertools import product
i = np.arange(-1, 2) % 3
j = np.arange(1, 4) % 3
a = np.random.randint(1, 10, (3, 3, 2))
print(a[i, j, :])
# Gives 3 entries [(i[0],j[0]), (i[1],j[1]), (i[2],j[2])]
# This is not what I want...
indices = list(product(i, j))
print(indices)
indices = tuple(zip(*indices))  # tuple() is needed in Python 3, where zip is lazy
print('a[indices]\n', a[indices])
# This works, but when I'm explicit:
print('a[indices, :]\n', a[indices, :])
# Huh?
The problem is that advanced indexing is triggered if:
the selection object, obj, is [...] a tuple with at least one sequence object or ndarray
The easiest fix in your case is to use repeated indexing:
a[i][:, j]
An alternative would be to use ndarray.take, which will perform the modulo operation for you if you specify mode='wrap':
a.take(np.arange(-1, 2), axis=0, mode='wrap').take(np.arange(1, 4), axis=1, mode='wrap')
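Both fixes can be checked against each other and against a plain double loop over the wrapped indices. A sketch that uses a seeded generator (np.random.default_rng, NumPy 1.17 or later) in place of the question's np.random.randint, so the check is reproducible:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.integers(1, 10, size=(3, 3, 2))

# Wrapped indices: [-1, 0, 1] % 3 -> [2, 0, 1] and [1, 2, 3] % 3 -> [1, 2, 0]
i = np.arange(-1, 2) % 3
j = np.arange(1, 4) % 3

# Option 1: index one axis at a time, so each step is plain 1-D fancy indexing.
window_repeated = a[i][:, j]

# Option 2: let take() do the modulo itself via mode='wrap'.
window_take = (a.take(np.arange(-1, 2), axis=0, mode='wrap')
                .take(np.arange(1, 4), axis=1, mode='wrap'))

print(window_repeated.shape)                         # (3, 3, 2)
print(np.array_equal(window_repeated, window_take))  # True
```

The take() version avoids building i and j by hand, which is convenient when the window origin changes from call to call.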
To give another method of advanced indexing, which in my opinion is better than the product solution:
If you have an integer array for every dimension, these are broadcast together and the output has the broadcast shape (you will see what I mean)...
i, j = np.ix_(i, j)  # reshapes i to (3, 1) and j to (1, 3) by adding length-1 axes
print(i, j)
print(a[i, j])
# and now you will actually *not* be surprised:
print(a[i, j, :])
Note that this gives a 3x3x2 array, while you had a 9x2 array, but a simple reshape will fix that, and the 3x3x2 array is probably closer to what you want anyway.
Actually, the surprise is still hidden in a way: in your examples a[indices] is the same as a[indices[0], indices[1]], but a[indices, :] is a[(indices[0], indices[1]), :], so it is not surprising that the result differs. Note that a[indices[0], indices[1], :] does give the same result as a[indices].
See : http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
When you add :, you are mixing integer-array indexing with slicing. The rules are quite complicated, and the link above explains them better than I could.
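A sketch putting the np.ix_ version next to the product/zip version from the question, written for Python 3 (where zip returns an iterator, so it is wrapped in tuple()). A deterministic np.arange array stands in for the random one so the shapes and values are reproducible:

```python
import numpy as np
from itertools import product

a = np.arange(18).reshape(3, 3, 2)
i = np.arange(-1, 2) % 3
j = np.arange(1, 4) % 3

# np.ix_ reshapes the index arrays to (3, 1) and (1, 3) so that they
# broadcast to a (3, 3) grid of (row, col) pairs.
ii, jj = np.ix_(i, j)
print(ii.shape, jj.shape)  # (3, 1) (1, 3)
grid = a[ii, jj, :]
print(grid.shape)  # (3, 3, 2)

# The product/zip approach from the question:
indices = tuple(zip(*product(i, j)))  # (row_indices, col_indices)
flat = a[indices]                     # same as a[indices[0], indices[1]]
print(flat.shape)  # (9, 2)
print(np.array_equal(flat.reshape(3, 3, 2), grid))  # True

# a[indices, :] is a[(rows, cols), :]: the nested tuple becomes a single
# (2, 9) integer array that indexes axis 0 only.
print(a[indices, :].shape)  # (2, 9, 3, 2)
```

The last line is the "Huh?" from the question: the extra nesting turns two per-axis index lists into one two-row index array for the first axis.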

nrow(matrix) function

I have an assignment using R and have a small problem. In the assignment, several matrices have to be generated with a random number of rows and later used for various calculations. Everything works perfectly, unless the number of rows is 1.
In the calculations I use nrow(matrix) in different ways, for example if (i <= nrow(matrix) ) {action} and also statements like matrix[,4] and so on.
So when the number of rows is 1 (I know it is actually a vector), R gives errors, presumably because nrow() of a 1-dimensional matrix is NULL. Is there a simple way to deal with this? Otherwise the whole code probably has to be rewritten, but I'm very short on time :(
It is not that single-row/column matrices in R have ncol/nrow set to NULL. In R a matrix is a 1D vector that behaves like a matrix (i.e. prints as a matrix, accepts matrix indexing, etc.) when it has a dim attribute set. It only seems otherwise because indexing a matrix down to a single row or column drops dim and leaves the data in its default (1D vector) state.
Thus you can accomplish your goal either by directly recreating dim attribute of a vector (say it is called x):
dim(x)<-c(length(x),1)
x #Now a single column matrix
dim(x)<-c(1,length(x))
x #Now a single row matrix
Or by preventing the [ operator from dropping dim, by adding the drop=FALSE argument:
x<-matrix(1:12,3,4)
x #OK, matrix
x[,3] #Boo, vector
x[,3,drop=FALSE] #Matrixicity saved!
Let's call your vector x. Try using matrix(x) or t(matrix(x)) to convert it into a proper (2D) matrix.
