Arithmetic/logic operations with large sparse tensors in R

I need to make quick calculations (+, *, >) with large 3D arrays (tensors) in R, on the order of 1500 x 150 x 30000. Since these arrays are very sparse (only 0.03% of entries are non-zero), I first use the as_sptensor() function from the 'tensorr' package to convert my tensors to sparse ones, like:
x <- array(data = c(1,0,0,0,0,0,0,1,1,1,1,1) , dim = c(3,2,2))
s <- as_dtensor(x)
s1 <- as_sptensor(s)
And then I do some arithmetic operations, e.g. multiplication:
s1*s1
I also have a total memory limit of 8 GB, so the sparse representation also helps me store the result.
The problem is that when I deal with large tensors like:
A <- some_index_matrix[1:3, 1:1000000]
A2 <- sptensor(A, rep(1, ncol(A)), dims = c(max(A[1,]), max(A[2,]), max(A[3,])))
A2*A2
I fail to get the product within a reasonable time. How can I optimize my code so that such calculations complete within seconds?
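One direction worth trying, sketched below as an assumption rather than tensorr's own implementation: for an element-wise product, only coordinates that are non-zero in both operands can survive, so the product can be computed directly on the subscript/value representation with match(). The helper ew_prod() is hypothetical; subs1/subs2 are 3 x nnz subscript matrices like A above, and vals1/vals2 are the matching value vectors.
library(tensorr)

ew_prod <- function(subs1, vals1, subs2, vals2, dims) {
  # encode each (i,j,k) subscript as one linear index so coordinates can be
  # matched with match(); stored as doubles, which stay exact at these sizes
  lin  <- function(s) (s[1, ] - 1) + dims[1] * ((s[2, ] - 1) + dims[2] * (s[3, ] - 1))
  hit  <- match(lin(subs1), lin(subs2))   # position of each operand-1 coordinate in operand 2
  keep <- !is.na(hit)
  sptensor(subs1[, keep, drop = FALSE], vals1[keep] * vals2[hit[keep]], dims = dims)
}

# For A2 * A2 the non-zero pattern is unchanged, so this reduces to squaring the values.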

Related

Avoiding memory allocations with linear algebra calculations in R

As a trivial case, let's say I'm interested in calculating:
r = v*M^t
Where r and v are vectors and M is an extremely large sparse matrix.
I can solve it one of two ways:
r = v*(M*M*M*M...)
r = ((((v*M)*M)*M)*M)...
The first approach results in intermediate dense matrices that are impractical to store in RAM (I would need at least tens of terabytes for my target minimum use case). The second approach, instead, only ever produces intermediate vectors, and it does in fact work in practice.
The problem is that at larger values of t, memory allocations are causing a substantial performance bottleneck.
library(pryr)
n = 20000
v = 1:n
M = matrix(1:(n*n), n) # Not a sparse matrix like my use case, but easier to start with
for (i in 1:10) {
  v = v %*% M
  print(address(v))
}
As the address() function shows, v is being reallocated every iteration. It is not being modified in place. Not only are the memory allocations slowing things down, but according to profvis, the garbage collector is constantly being called as well and taking up a large portion of the time.
So my question is, is there a way to perform this calculation (and potentially others similar to it) in R without the excess memory allocations and gc() calls happening under the hood?
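One possible direction, sketched here as an assumption rather than a known fix: push the whole loop into compiled code with Rcpp/RcppArmadillo, so the repeated vector-matrix products run against a single working buffer on the C++ side and never touch R's allocator or garbage collector. The name vm_power() is made up, and M is assumed to be a sparse dgCMatrix from the Matrix package.
library(Rcpp)

cppFunction(depends = "RcppArmadillo", code = '
  arma::rowvec vm_power(const arma::rowvec& v, const arma::sp_mat& M, const int t) {
    arma::rowvec r = v;          // single working buffer kept on the C++ side
    for (int i = 0; i < t; ++i) {
      r = r * M;                 // sparse vector-matrix product
    }
    return r;
  }
')

# usage (hypothetical): r <- vm_power(as.numeric(v), M, t = 10)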

Implementing fast numerical calculations in R

I was trying to do an extensive computation in R. Eighteen hours have passed and my RStudio session is still running. I'm not sure if I could have written the script differently to make it faster. I was trying to implement a Crank–Nicolson type method over a 50000 by 350 matrix, as shown below:
#defining the discretization of cells
dt<-1
t<-50000
dz<-0.0075
z<-350*dz
#velocity & diffusion
v<-2/(24*60*60)
D<-0.02475/(24*60*60)
#make the big matrix (all filled with zeros)
m <- as.data.frame(matrix(0, t/dt+1, z/dz+2)) #extra columns/rows for boundary conditions
#fill the first and last columns with constant boundary values
m[,1]<-400
m[,length(m)]<-0
#implement the calculation
for(j in 2:(length(m[1,])-1)){
  for(i in 2:length(m[[1]])){
    m[i,][2:length(m)-1][[j]]<-m[i-1,][[j]]+
      D*dt*(m[i-1,][[j+1]]-2*m[i-1,][[j]]+m[i-1,][[j-1]])/(dz^2)-
      v*dt*(m[i-1,][[j+1]]-m[i-1,][[j-1]])/(2*dz)
  }
}
Is there a way to know how long it would take R to finish? Is there a better way of constructing the numerical calculation? At this point, I feel like Excel could have been faster!
Just making a few simple optimisations really helps here. The original version of your code would take ~5 days on my laptop. Using a matrix instead of a data frame, and calculating just once the values that are reused in the loop, brings this down to around 7 minutes.
Also, think about messy constructions like
m[i,][2:length(m)-1][[j]]
This is equivalent to
m[[i, j]]
which would be faster (as well as much easier to understand). Making this change further reduces the runtime by another factor of more than 2, to around 3 minutes.
Putting this together we have
dt<-1
t<-50000
dz<-0.0075
z<-350*dz
#velocity & diffusion
v<-2/(24*60*60)
D<-0.02475/(24*60*60)
#make the big matrix (all filled with zeros)
m <- matrix(0, t/dt+1, z/dz+2) # extra columns/rows for boundary conditions
# cache a few values that get reused many times
NC = NCOL(m)
NR = NROW(m)
C1 = D*dt / dz^2
C2 = v*dt / (2*dz)
#fill the first and last columns with constant boundary values
m[,1]<-400
m[,NC]<-0
#implement the calculation
for (j in 2:(NC-1)) {
  for (i in 2:NR) {
    ma = m[i-1,]
    ma.1 = ma[[j+1]]
    ma.2 = ma[[j-1]]
    m[[i,j]] <- ma[[j]] + C1*(ma.1 - 2*ma[[j]] + ma.2) - C2*(ma.1 - ma.2)
  }
}
If you need to go even faster than this, you can try some further optimisations. For example, see here for how different ways of indexing the same element can have very different execution times. In general it is better to refer to the column first, then the row.
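As a rough illustration of that point (assuming the microbenchmark package is installed), compare a few ways of reading the same element; exact timings will of course vary by machine.
library(microbenchmark)

mm <- matrix(rnorm(1e6), 1000, 1000)
microbenchmark(
  mm[[500, 500]],   # scalar double-bracket access
  mm[500, 500],     # standard single-bracket access
  mm[, 500][500],   # extract a column, then index it
  mm[500, ][500]    # extract a row, then index it (what the original code did)
)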
If all the optimisations you can do in R are not enough for your speed requirements, then you might implement the loop in Rcpp instead.
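If you do go the Rcpp route, a minimal sketch might look like the following; cn_loop() is a made-up name, and it reproduces the same loop order as the R version above.
library(Rcpp)

cppFunction('
  NumericMatrix cn_loop(NumericMatrix m, double C1, double C2) {
    int NR = m.nrow(), NC = m.ncol();
    for (int j = 1; j < NC - 1; ++j) {     // columns 2..(NC-1) in R terms
      for (int i = 1; i < NR; ++i) {       // rows 2..NR in R terms
        m(i, j) = m(i - 1, j) +
                  C1 * (m(i - 1, j + 1) - 2 * m(i - 1, j) + m(i - 1, j - 1)) -
                  C2 * (m(i - 1, j + 1) - m(i - 1, j - 1));
      }
    }
    return m;                               // note: m is modified in place
  }
')

# usage: m <- cn_loop(m, C1, C2)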

How to efficiently do cross-validation with big.matrix in R?

I have a function, as follows, that takes a design matrix X of class big.matrix as input and predicts the responses.
NOTE: the size of matrix X is over 10 GB. So I cannot load it into memory. I used read.big.matrix() to generate backing files X.bin and X.desc.
myfun <- function(X) {
  ## do something with X. class(X) == 'big.matrix'
}
My question is: how can I do cross-validation efficiently with this huge big.matrix?
My attempt: (It works, but is time consuming.)
Step 1: for each fold, get indices for training idx.train and test idx.test;
Step 2: divide X into X.train and X.test. Since X.train and X.test are also very large, I have to store them as big.matrix, and create associated backing files (.bin, .desc) for the training and test sets for each fold.
Step 3: feed the X.train to build the model, and predict responses for X.test.
The time-consuming part is Step 2, where I have to create the backing files for the training and test sets (almost like copy/pasting the original big matrix) many times. For example, suppose I do 10-fold cross-validation. Step 2 would take over 30 minutes just to create the backing files for all 10 folds!
To solve this issue in Step 2, I thought maybe I could divide the original matrix into 10 sub-matrices (of class big.matrix) just once. Then for each fold, I would use one portion for testing and combine the remaining 9 portions into one big matrix for training. But the new issue is that there is no way to combine small big.matrix objects into a larger one efficiently without copy/paste.
Of course I can do distributed computing for this cross validation procedure. But I am just wondering whether there is a better way to speed up the procedure if just using a single core.
Any ideas? Thanks in advance.
UPDATE:
It turns out that @cdeterman's answer doesn't work when X is very large. The reason is that mpermute() permutes the rows essentially by doing copy/paste. mpermute() calls ReorderRNumericMatrix() in C++, which in turn calls the reorder_matrix() function. That function reorders the matrix by looping over all columns and rows and copying values. See the source code here.
Are there any better ideas for solving my problem?? Thanks.
END UPDATE
You will want to use the sub.big.matrix function. This avoids any further copies and points to the same original data. However, it can currently only subset contiguous rows, so you will want to permute your rows first.
# Step 1 - generate random indices and permute the rows
idx <- sample(nrow(X), nrow(X))
mpermute(X, idx)

# Step 2 - create your folds
fold_size <- nrow(X)/10 # assuming 10 folds
idx_list <- split(seq(nrow(X)), ceiling(seq(nrow(X))/fold_size))

# Step 3 - list of sub.big.matrix objects
sm_list <- lapply(idx_list, function(x)
  sub.big.matrix(X, firstRow = x[1], lastRow = x[length(x)]))
You now have the original big.matrix split into 10 different matrices that you can use as you like.
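A possible usage sketch, with hypothetical myfun_train()/myfun_predict() helpers: hold out one block per fold and hand the remaining blocks to your modelling code, so that no new backing files need to be created.
for (k in seq_along(sm_list)) {
  X.test       <- sm_list[[k]]   # hold out one contiguous block
  train_blocks <- sm_list[-k]    # the other 9 sub.big.matrix objects
  # fit  <- myfun_train(train_blocks)   # hypothetical: accepts a list of blocks
  # pred <- myfun_predict(fit, X.test)  # hypothetical
}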

Preallocate sparse matrix with max nonzeros in R

I'm looking to preallocate a sparse matrix in R (using simple_triplet_matrix) by providing the dimensions of the matrix, m x n, and also the number of non-zero elements I expect to have. Matlab has the function "spalloc" (see below), but I have not been able to find an equivalent in R. Any suggestions?
S = spalloc(m,n,nzmax) creates an all zero sparse matrix S of size m-by-n with room to hold nzmax nonzeros.
Whereas it may make sense to preallocate a traditional dense matrix in R (in the same way it is much more efficient to preallocate a regular (atomic) vector rather than growing it one element at a time), I'm pretty sure it will not pay to preallocate sparse matrices in R, in most situations.
Why?
For dense matrices, you allocate and then assign "piece by piece", e.g.,
m[i,j] <- value
For sparse matrices, however, things are very different. If you do something like
S[i,j] <- value
the internal code has to check whether [i,j] is an existing (typically non-zero) entry or not. If it is, it can change the value; otherwise, one way or the other, the triplet (i, j, value) needs to be stored, which means extending the current structure, etc. If you do this piece by piece it is inefficient, mostly irrespective of whether you did any preallocation or not.
If, on the other hand, you already know in advance all the [i,j] combinations which will contain non-zeros, you could "pre-allocate", but in this case just store the vectors i and j, each of length nnzero, say. Then use your underlying algorithm to also construct a vector x of the same length which contains all the corresponding values, i.e. the entries.
Now, indeed, as @Pafnucy suggested, use spMatrix() or sparseMatrix(), two slightly different versions of the same functionality: constructing a sparse matrix given its contents.
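A minimal sketch of that construction with the Matrix package; the index and value vectors here are just placeholders for whatever your algorithm produces.
library(Matrix)

i <- c(1, 3, 5)          # row indices of the non-zeros, known in advance
j <- c(2, 2, 4)          # column indices
x <- c(0.5, -1.2, 3.0)   # the corresponding values
S <- sparseMatrix(i = i, j = j, x = x, dims = c(10, 10))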
I am happy to help further, as I am the maintainer of the Matrix package.

R: Anything faster than outer()?

Using R, I need to evaluate an expression of the form [using latex notation]
\frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n f(x_i-x_j),
where x_i,x_j are real (scalar) numbers and f is a nonlinear function with scalar input and output.
My current best [now using R commands] is
mat <- outer(x,x,function(y,z) f(y-z))
res <- mean(mat)
where x is a vector of length n which holds all the x_i's.
For n = 10000, this operation takes about 26 seconds on my PC, but (as expected) computation time increases steeply with n. I'd like to speed this up, mainly because I want to pass the above result to an optimizer later on. Any suggestions? Thanks!
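One direction that is easy to try, sketched below under the assumption that f is vectorized: evaluate the double sum in blocks of rows so that the full n-by-n matrix is never held in memory at once. The arithmetic cost is unchanged, so the gain depends on how memory-bound the original outer() call is; block_mean() is a hypothetical helper.
block_mean <- function(x, f, block = 1000) {
  n <- length(x)
  total <- 0
  for (start in seq(1, n, by = block)) {
    idx   <- start:min(start + block - 1, n)
    total <- total + sum(f(outer(x[idx], x, "-")))  # one block of rows at a time
  }
  total / n^2   # equals mean(outer(x, x, function(y, z) f(y - z)))
}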
