I am looking for an efficient way of computing the Kronecker product of two large matrices. I have tried using the function kronecker() as follows:
I <- diag(700)                      # 700 x 700 identity matrix
data <- replicate(15, rnorm(120))   # 120 x 15 matrix of standard normals
test <- kronecker(I, data)          # fails: would be a dense 84000 x 10500 matrix
However, it takes a long time to execute and then gives the following error:
Error: cannot allocate vector of size 6.8 Gb
As long as you use Matrix::Diagonal to construct your diagonal matrix, you'll automatically get your test object constructed as a sparse matrix:
library(Matrix)
I <- Diagonal(700)
data <- replicate(15, rnorm(120))
system.time(test <- kronecker(I, data))
##   user  system elapsed
##  0.600   0.044   0.671
dim(test)
## [1] 84000 10500
format(object.size(test), "Mb")
## [1] "19.2 Mb"
If you are computing kron(I,A)*v, where v is a vector, you can do it as vec(A*V), where V is v reshaped into a matrix. This is a special case of the more general rule vec(ABC) = kron(C',A)*vec(B). It avoids ever forming the Kronecker product and requires far fewer operations to perform the computation.
Note that V may need to be transposed depending on whether matrices are stored column-major or row-major; in R, matrix() fills by column, which matches vec().
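For example, here is a minimal sketch in base R (the objects A, v, V, and k are illustrative, with small dimensions) confirming that the two computations agree:
A <- matrix(rnorm(6 * 4), nrow = 6)        # an arbitrary 6 x 4 matrix
k <- 3                                     # number of diagonal blocks
v <- rnorm(ncol(A) * k)
V <- matrix(v, nrow = ncol(A), ncol = k)   # reshape v column-wise (the inverse of vec)
r1 <- kronecker(diag(k), A) %*% v          # the direct, expensive route
r2 <- as.vector(A %*% V)                   # vec(A %*% V), no Kronecker product formed
all.equal(as.vector(r1), r2)
## [1] TRUE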
I am trying to read and write data to files at each time step.
To do this, I am using the h5 package to store large datasets, but the code that uses this package runs slowly. Because the datasets are very large, I also run into memory limits. Here is a reproducible example:
library(ff)
library(h5)
set.seed(12345)
for(t in 1:3650){
  print(t)
  ## Initialize the matrix to fill
  mat_to_fill <- ff(-999, dim = c(7200000, 48),
                    dimnames = list(NULL, paste0("P", as.character(seq(1, 48, 1)))),
                    vmode = "double", overwrite = TRUE)
  ## print(mat_to_fill)
  ## summary(mat_to_fill[,])
  ## Create the output file
  f_t <- h5file(paste0("file", t, ".h5"))
  ## Retrieve the matrix at t - 1 if t > 1
  if(t > 1){
    f_t_1 <- h5file(paste0("file", t - 1, ".h5"))
    mat_t_1 <- f_t_1["testmat"][] ## *********** ##
    ## f_t_1["testmat"][]
    h5close(f_t_1) ## close the previous file to avoid leaking handles
  } else {
    mat_t_1 <- 0
  }
  ## Fill the matrix
  mat_to_fill[,] <- matrix(data = sample(1:100, 7200000*48, replace = TRUE),
                           nrow = 7200000, ncol = 48) + mat_t_1
  ## mat_to_fill[1:3,]
  ## Write data
  system.time(f_t["testmat"] <- mat_to_fill[,]) ## *********** ##
  ## f_t["testmat"][]
  h5close(f_t)
}
Is there an efficient way to speed up my code (see the lines marked ## *********** ##)? Any advice would be much appreciated.
EDIT
I have tried to create a data frame with the createDataFrame function of the "SparkR" package, but I get this error message:
Error in writeBin(batch, con, endian = "big") :
long vectors not supported yet: connections.c:4418
I have also tested other functions for writing large data to a file:
test <- mat_to_fill[,]
library(data.table)
system.time(fwrite(test, file = "Test.csv", row.names=FALSE))
user system elapsed
33.74 2.10 13.06
system.time(save(test, file = "Test.RData"))
user system elapsed
223.49 0.67 224.75
system.time(saveRDS(test, "Test.Rds"))
user system elapsed
197.42 0.98 199.01
library(feather)
test <- data.frame(mat_to_fill[,])
system.time(write_feather(test, "Test.feather"))
user system elapsed
0.99 1.22 10.00
If possible, I would like to reduce the elapsed time to <= 1 sec.
SUPPLEMENTARY INFORMATION
I am building an agent-based model in R, but I have memory issues because I work with large 3D arrays. In these arrays, the first dimension corresponds to time (each array has 3650 rows), the second dimension holds the properties of individuals or landscape cells (each array has 48 columns), and the third dimension indexes each individual (720000 individuals in total) or landscape cell (90000 cells in total). In total, I have 8 such 3D arrays. Currently, the 3D arrays are allocated at initialization, and data are stored in them at each time step (1 day) by several functions. However, to fill a 3D array at time t, the model only needs the data at t - 1 and t - tf - 1, where tf is a fixed duration parameter (e.g., tf = 320 days). I don't know how to manage these 3D arrays in the ABM at each time step. My first idea for avoiding the memory issues was therefore to save the data contained in the 3D array for each individual or cell at each time step (i.e., a 2D array) and to read the data back from files at t - 1 and t - tf - 1.
Your matrix is 7200000 x 48, and with 4-byte integers that is 7200000 * 48 * 4 bytes, or ~1.3 Gb. With an HDD read/write speed of about 120 Mb/s, you are lucky to get 10 seconds on an average HDD. With a good SSD you should be able to get 2-3 Gb/s and therefore about 0.5 second using the fwrite or write_feather approaches you tried. I assume you don't have an SSD, as it is not mentioned. You have 32 Gb of memory, which seems to be enough for 8 datasets of that size, so chances are you are using the memory to copy this data around. You can try to optimize your memory usage instead of writing to the hard drive, or work with a portion of the dataset at a time, although both approaches probably present implementation challenges. The problem of splitting data and merging results comes up frequently in distributed computing, which requires splitting datasets across workers and then merging their results. Using a database is generally slower than plain disk operations, unless it is an in-memory database, which is ruled out here because you say the data do not fit into memory, or unless you have some very specific sparse data that can be easily compressed and extracted.
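As a quick sanity check of those numbers in R (assuming 4-byte integers and treating Mb/Gb as MiB/GiB):
bytes <- 7200000 * 48 * 4    # ~1.38e9 bytes, i.e. roughly 1.3 Gb
bytes / (120 * 1024^2)       # ~11 s at ~120 Mb/s (average HDD)
bytes / (2.5 * 1024^3)       # ~0.5 s at ~2.5 Gb/s (good SSD)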
You can try using:
library(fst)
write.fst(x, path, compress = 50, uniform_encoding = TRUE)
You can find a more detailed comparison here:
https://www.fstpackage.org/
Note: you can use the compress parameter to tune the trade-off between file size and write speed.
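For instance, a minimal sketch of how this could be slotted into the loop from the question (the .fst file names are hypothetical; fst works on data frames, so the ff matrix has to be converted first, and write_fst/read_fst are the current names for write.fst/read.fst):
library(fst)
## write the current step
write_fst(as.data.frame(mat_to_fill[,]), paste0("file", t, ".fst"), compress = 50)
## read the previous step back as a matrix
mat_t_1 <- as.matrix(read_fst(paste0("file", t - 1, ".fst")))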
In the R programming language...
Bottleneck in my code:
a <- a[b]
where:
a and b are vectors of length 90 million.
a is a logical vector.
b is a permutation of the indices of a.
This operation is slow: it takes ~ 1.5 - 2.0 seconds.
I thought straightforward indexing would be much faster, even for large vectors.
Am I simply stuck? Or is there a way to speed this up?
Context:
P is a large matrix (10k rows, 5k columns).
rows = names, columns = features, values = real numbers.
Problem: Given a subset of names, I need to obtain matrix Q, where:
Each column of Q is sorted (independently of the other columns of Q).
The values in a column of Q come from the corresponding column of P and are only those from the rows of P which are in the given subset of names.
Here is a naive implementation:
Psub <- P[names,]
Q <- apply( Psub , 2 , sort )   # sort each column independently
But I am given 10,000 distinct subsets of names (each subset covers anywhere from 20% to 90% of the total). Taking the subset and sorting each time is incredibly slow.
Instead, I can pre-compute the order vector:
b <- apply( P , 2 , order )   # column-wise ordering
b <- convert_to_linear_index( as.data.frame(b) , dim(P) )
# my own function.
# Now b is a vector of length nrow(P) * ncol(P)
a <- rownames(P) %in% myNames
a <- rep(a , ncol(P) )
a <- a[b]
a <- matrix(a , nrow = length(myNames) )
I don't see this getting much faster than that. You can try to write an optimized C function to do exactly this, which might cut the time in half or so (and that's optimistic -- vectorized R operations like this don't have much overhead), but not much more than that.
You've got approximately 10^8 values to go through. Each time through the inner loop, it needs to increment the iterator, fetch the index b[i] from memory, look up a[b[i]], and then store that value into newa[i]. I'm not a compiler/assembly expert by a long shot, but this sounds like on the order of 5-10 instructions per element, which means you're looking at roughly a billion instructions in total, so there's a clock-rate limit to how fast this can go.
Also, R stores logical values as 32-bit ints, so the vector a takes up about 360 MB, which doesn't fit into cache; if b is a more or less random permutation, you're going to miss the cache regularly (on most lookups into a, in fact). Again, I'm not an expert, but I would think the cache misses here are the bottleneck, and if that's the case, optimized C won't help much.
Aside from writing it in C, the other thing to do is determine whether there are any assumptions you can make that would let you not go through the whole array. For example, if you know most of the indices will not change, and you can figure out which ones do change, you might be able to make it go faster.
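For what it's worth, here is a minimal Rcpp sketch of the C-level loop described above (permute_logical is just an illustrative name); as argued, it is unlikely to beat the built-in indexing by much, since both are dominated by the same cache misses:
library(Rcpp)
cppFunction('
LogicalVector permute_logical(LogicalVector a, IntegerVector b) {
  // newa[i] = a[b[i]]; assumes b is a 1-based permutation of seq_along(a)
  R_xlen_t n = b.size();
  LogicalVector newa(n);
  for (R_xlen_t i = 0; i < n; ++i) newa[i] = a[b[i] - 1];
  return newa;
}')
## permute_logical(a, b) gives the same result as a[b]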
On edit, here are some numbers. I have an AMD with clock speed of 2.8GHz. It takes me 3.4 seconds with a random permutation (i.e. lots of cache misses) and 0.7 seconds with either 1:n or n:1 (i.e. very few cache misses), which breaks into 0.6 seconds of execution time and 0.1 of system time, presumably to allocate the new array. So it does appear that cache misses are the thing. Maybe optimized C code could shave something like 0.2 or 0.3 seconds off of that base time, but if the permutation is random, that won't make much difference.
> x<-sample(c(T,F),90*10**6,T)
> prm<-sample(90*10**6)
> prm1<-1:length(prm)
> prm2<-rev(prm1)
> system.time(x<-x[prm])
user system elapsed
3.317 0.116 3.436
> system.time(x<-x[prm1])
user system elapsed
0.593 0.140 0.734
> system.time(x<-x[prm2])
user system elapsed
0.631 0.112 0.743
>
I have a very tall integer matrix (mat) and a sparse binary vector (v) whose length equals the number of rows of mat. For each column of mat, I want to find the minimum value over the rows where v == 1.
Here are several possible solutions:
mat <- matrix(as.integer(runif(100000*100,0,2^31)),nrow=100000,ncol=100)
v<-(rbinom(100000,1,.01))
a<-apply(v*mat,2, function(x) min(x[x>0]))
b<-apply(mat,2,function(x) min(x[v==1]))
c<-sapply(subset(data.frame(mat),v==1), min)
These all work fine, and on my machine solution c seems fastest (an admittedly older, slower MacBook). But if I have a function that feeds in unique sets of v, the computation time scales linearly with the number of sets, so a large number of unique sets (>10,000) will take hours to process.
Any ideas on how to do such an operation faster, or is this as fast as I can go?
I guess that subsetting and then calling apply gains a lot, given that v is almost always 0:
system.time(b<-apply(mat[as.logical(v),],2, min))
# user system elapsed
# 0.012 0.000 0.013
system.time(a<-apply(v*mat,2, function(x) min(x[x>0])))
# user system elapsed
# 0.628 0.019 0.649
identical(a,b)
#[1] TRUE
I also dropped the x[x>0] part, since mat appears to always be greater than 0.
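If you have to do this for many subsets, the same trick can simply be applied once per subset; here is a minimal sketch (vlist is a hypothetical list of 0/1 vectors, one per subset):
vlist <- replicate(5, rbinom(100000, 1, .01), simplify = FALSE)
res <- vapply(vlist,
              function(v) apply(mat[as.logical(v), , drop = FALSE], 2, min),
              numeric(ncol(mat)))
## res holds one column of minima per subset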
I have a large correlation matrix, 62589 x 62589. I've binarised the matrix above a certain threshold with no problems, but I'm confused by the large difference in calculation time between the two versions below.
The first time I did this: number of 1's: 425,491; number of 0's: 3,916,957,430.
The sum of these two numbers is 62589^2, so the matrix really is fully binarised. I saved it as an RData object (31 Mb). Performing a basic calculation on the matrix takes ~3.5 minutes:
fooB <- foo %*% foo
The second time, with a lower threshold: number of 1's: 30,384,683; number of 0's: 3,886,998,238. The sum is again 62589^2, so this matrix is also fully binarised. The RData object is 84 Mb. The same multiplication as above is still running after an hour.
Should the increased number of 1's in the newest matrix increase the file size and processing time so drastically?
Thanks for reading
Edit: the final time for the same calculation on the second matrix was 65 minutes.
Edit 2: is() on the matrix returns: matrix, array, structure, vector.
Here is a reproducible example that may help with memory size and processing times for binary sparse matrices from package Matrix:
n <- 62589
N1 <- 425491
require(Matrix)
foo <- sparseMatrix(i=sample(n, N1, TRUE), j=sample(n, N1, TRUE), dims=c(n, n))
print(object.size(foo), units="Mb")
#1.9 Mb
sum(foo)
#[1] 425464
(Note that the sampling may give some duplicates in pairs (i,j), thus the number above is slightly less than N1.)
Squaring:
system.time(fooB <- foo %*% foo)
# user system elapsed
# 0.39 0.03 0.42
print(object.size(fooB), units="Mb")
#11.3 Mb
sum(fooB)
#[1] 2892234
Cubing:
system.time(fooC <- fooB %*% foo)
# user system elapsed
# 2.74 0.11 2.87
print(object.size(fooC), units="Mb")
#75 Mb
sum(fooC)
#[1] 19610641