3D Matrix Multiplication in R

I have a simple problem. I want to multiply a 3D array by another 3D array in R without using a for-loop.
To illustrate:
Suppose I have a 1x3 matrix A:
[A1, A2, A3]
And I have a 3x3 matrix B:
[B1, B2, B3
B4, B5, B6
B7, B8, B9]
My main operation is A %*% B resulting in a 1x3 matrix.
But now I want to repeat the process 10,000 times, each time with a different A and B of the same dimensions as above. I can use a for-loop:
for (i in 1:10000) {
A[i] %*% B[i]
}
Then I can store the 10,000 values.
But is there any way to achieve the same thing without using a for-loop? I am thinking of possibly a 3D array multiplication, but I am not sure how to do this in R.
Matrix A: 1 x 3 x 10000
[A1, A2, A3]
Matrix B: 3 x 3 x 10000
[B1, B2, B3
B4, B5, B6
B7, B8, B9]
Also, would vectorization help?
Can you guys please help? Thank you!

There are several ways to accomplish this with array multiplication. The price you pay is to reformat the matrices into much larger tensors with many zeros. Those are sparse, by definition, and so the principal cost is the overhead for conversion. It's actually superior to a loop by the time you have 10,000 arrays to multiply.
Let n be the number of (A,B) pairs and k=3 the dimension.
The sleekest solution seems to be to reorganize the n rows of A (an n by k matrix) into an n*k by n*k block-diagonal matrix of k by k blocks. Block i, i=1..n, contains row i of A in its top row and otherwise is zero. Multiplying this (on the right) by B (arranged as a k*n by k matrix consisting of a "stack" of n blocks of dimension k by k) computes all the individual products, depositing them at rows 1, k+1, 2k+1, ..., of the result, where they can be picked out.
f3 <- function(a, b) {
  require(RcppArmadillo) # sparseMatrix package
  n <- dim(b)[3]
  k <- dim(b)[2]
  i0 <- (1:n-1)*k+1
  i <- rep(i0, each=k)
  j <- 1:(k*n)
  aa <- sparseMatrix(i, j, x=c(t(a)), dims=c(n*k, n*k))
  bb <- matrix(aperm(b, c(1,3,2)), nrow=n*k)
  t((aa %*% bb)[i0, ])
}
As you can see, the array operations are basic: create sparse matrices, transpose arrays (with aperm and t), and multiply. It returns its results in a k by n array (which you may transpose if you prefer), one result vector per column.
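As a minimal illustration of the layout, here is a tiny hand-checkable sketch with n = 2 and k = 2 (loading the Matrix package, which is where sparseMatrix() lives); the two row-by-matrix products land in rows 1 and k+1 = 3 of aa %*% bb:
library(Matrix)
n <- 2; k <- 2
a <- matrix(1:4, nrow = n, byrow = TRUE)  # rows (1,2) and (3,4)
b <- array(1:8, dim = c(k, k, n))         # two 2x2 matrices
i0 <- (1:n - 1) * k + 1                   # result rows: 1 and 3
aa <- sparseMatrix(rep(i0, each = k), 1:(k*n), x = c(t(a)), dims = c(n*k, n*k))
bb <- matrix(aperm(b, c(1, 3, 2)), nrow = n*k)
(aa %*% bb)[i0, ]  # row i equals a[i,] %*% b[,,i]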
As a test, here is a brute-force loop using the same array data structures.
f1 <- function(a, b) sapply(1:nrow(a), function(i) a[i,] %*% b[,,i])
We may apply these solutions to the same input and compare the results:
#
# Create random matrices for testing.
#
k <- 3
n <- 1e6 # Number of (a,B) pairs
a <- matrix(runif(k*n), ncol=k)
b <- array(runif(k^2*n), dim=c(k,k,n))
system.time(c1 <- f1(a,b)) # 4+ seconds
system.time(c3 <- f3(a,b)) # 2/3 second
mean((c1-c3)^2) # Want around 10^-32 or less
The results aren't completely equal, but their mean squared difference is less than 10^-32, showing they can be considered the same up to floating point roundoff error.
The array-oriented procedure f3 is initially slower than the looping procedure f1, but catches up by the time n is 10,000. After that it's about twice as fast or better (on this machine; YMMV). Both algorithms should scale linearly in n (and the timing suggests they do, at least out to n=10,000,000).

If your A and B are lists, you can use mapply():
> nn <- 1e1
> set.seed(1)
> A <- replicate(nn,matrix(rnorm(3),nrow=1),simplify=FALSE)
> B <- replicate(nn,matrix(rnorm(9),nrow=3),simplify=FALSE)
> head(mapply("%*%",A,B,SIMPLIFY=FALSE),3)
[[1]]
[,1] [,2] [,3]
[1,] -1.193976 0.1275999 -0.6831007
[[2]]
[,1] [,2] [,3]
[1,] 1.371143 1.860379 -1.639078
[[3]]
[,1] [,2] [,3]
[1,] 0.8250047 -0.6967286 1.949236
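If your data start out as arrays rather than lists (as in the question's layout), a small sketch of the conversion so that mapply() still applies; the names aArr and bArr are only illustrative:
nn <- 1e4
aArr <- matrix(rnorm(3*nn), ncol=3)        # nn x 3: one row per A
bArr <- array(rnorm(9*nn), dim=c(3,3,nn))  # 3 x 3 x nn: one slice per B
A <- lapply(seq_len(nn), function(i) aArr[i, , drop=FALSE])  # list of 1 x 3 matrices
B <- lapply(seq_len(nn), function(i) bArr[, , i])            # list of 3 x 3 matrices
res <- mapply("%*%", A, B, SIMPLIFY=FALSE)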

The for-loop is more efficient than you think
Your problem of multiplying n (A,B) pairs is not equivalent to tensor multiplication in the usual sense, although whuber has presented a very neat way of turning it into a matrix multiplication by stacking the Bs as blocks in a sparse matrix.
You have said you want to avoid a for-loop, but the for-loop approach is actually very competitive when programmed efficiently, and I would suggest that you reconsider it.
I will use the same notation as whuber, with A of dimension n x k and B of dimension k x k x n, for example:
n <- 1e4
k <- 3
A <- array(rnorm(k*n),c(n,k))
B <- array(rnorm(k*k*n),c(k,k,n))
A simple and efficient for-loop solution would go like this
justAForLoop <- function(A, B) {
  n <- nrow(A)
  for (i in 1:n) A[i,] <- A[i,] %*% B[,,i]
  A
}
producing an n x k matrix of results.
I have modified whuber's f3 function to load the Matrix package, otherwise the sparseMatrix function is unavailable. My version of f3 is very slightly faster than the original because I've eliminated the last matrix transpose before returning the result.
With this modification, it returns identical numerical results to justAForLoop.
f3 <- function(a, b) {
  require(Matrix)
  n <- dim(b)[3]
  k <- dim(b)[2]
  i0 <- (1:n-1)*k+1
  i <- rep(i0, each=k)
  j <- 1:(k*n)
  aa <- sparseMatrix(i, j, x=c(t(a)), dims=c(n*k, n*k))
  bb <- matrix(aperm(b, c(1,3,2)), nrow=n*k)
  (aa %*% bb)[i0, ]
}
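A quick way to check that agreement yourself, sketched on a small n (f3 returns a Matrix-class object, hence the as.matrix()):
n <- 100; k <- 3
a <- matrix(runif(k*n), ncol=k)
b <- array(runif(k^2*n), dim=c(k,k,n))
all.equal(as.matrix(f3(a, b)), justAForLoop(a, b), check.attributes = FALSE)  # should be TRUE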
Now I rerun whuber's simulation in a fresh R session:
> k <- 3
> n <- 1e6
> a <- matrix(runif(k*n), ncol=k)
> b <- array(runif(k^2*n), dim=c(k,k,n))
>
> system.time(c1 <- f1(a,b))
user system elapsed
3.40 0.09 3.50
> system.time(c3 <- f3(a,b))
Loading required package: Matrix
user system elapsed
1.06 0.24 1.30
> system.time(c4 <- justAForLoop(a,b))
user system elapsed
1.27 0.00 1.26
The for-loop approach is actually the fastest by a narrow margin. It is very much faster than f1, which relies on sapply. (My machine is a Windows 10 PC with 32 GB of RAM running R 3.6.0).
If I run all three methods a second time, then f3 becomes the fastest because this time the Matrix package is already in the search path and doesn't have to be reloaded:
> system.time(c1 <- f1(a,b))
user system elapsed
3.23 0.04 3.26
> system.time(c3 <- f3(a,b))
user system elapsed
0.33 0.20 0.53
> system.time(c4 <- justAForLoop(a,b))
user system elapsed
1.28 0.01 1.30
However, f3 uses more RAM than the for-loop. On my PC, I can run justAForLoop successfully with n=1e8, whereas f1 and f3 both run out of memory and fail.
Summary
A direct for-loop approach is much more efficient than sapply.
For your problem with n=10,000 matrix multiplications, running the for-loop is simple and efficient, taking <0.02 sec. By contrast, merely loading the package with the sparse matrix functions takes about 2/3 sec.
For n between 1 and 10 million, whuber's sparse matrix solution starts to outperform, especially if the Matrix package is already loaded.
The for-loop uses the least RAM of the three methods. For n of 100 million on my PC with 32 GB of RAM, only the for-loop approach works.

Related

How to generate unique combinations of cases to maximize a value while minimizing another?

I am currently trying to work on some code that I hope will be able to accomplish two things for data that looks like this (roughly 100 observations):
Lines <- " key error money
1 7224 0.5500000 2483118
2 7223 0.5200000 2451469
3 7222 1.6600000 2425693
4 7247 0.6400000 2324070
5 7256 0.4400000 1785569
6 7248 0.2541168 1476720"
DF <- read.table(text = Lines)
I want to write a function that will create various combinations of "key" (perhaps with the ability to threshold to n cases) in order to maximize "money" (at perhaps an arbitrary amount, say >=50,000 or <=30,000) while minimizing the "error" amount. A good way to think about it is that I want to create various portfolios of these keys on-the-fly.
I am still somewhat of an R beginner, so I understand that this may be a complicated function - I would mainly want a way to get started, but I am happy with a complete explanation as well. Thank you!
If the problem is to find the subset of rows whose sum of errors is minimized subject to the sum of money being greater than or equal to some known constant M then it can be expressed as a 0-1 integer linear programming problem:
minimize error'x subject to money'x >= M, where x is a vector of 0's and 1's
In terms of R code:
library(lpSolve)
M <- 4000000
res <- lp("min", DF$error, t(DF$money), ">=", M, all.bin = TRUE)
res
## Success: the objective function is 0.96
DF$key[res$solution == 1]
## [1] 7223 7256
N Best Feasible Solutions via num.bin.solns argument
A proposed solution is said to be feasible if it satisfies the constraints. To get the N best feasible solutions, the following is supposed to work, but it seems to be a bit buggy here; ?lp does warn of this. It would still be worthwhile to try it on your problem in case it works for you.
N <- 3
res <- lp("min", DF$error, t(DF$money), ">=", M, all.bin = TRUE,
num.bin.solns = N, use.rw = TRUE)
N Best Feasible Solutions via cutting planes
Another possibility for N best feasible solutions is that for the ith solution cut off the first i-1 solutions by adding a constraint that excludes them and re-run:
res <- list(objval = 0)
N <- 3 # no of solutions desired
for (i in 1:N) {
  res <- lp("min", DF$error, rbind(DF$money, DF$error), ">=", c(M, res$objval * 1.0001),
            all.bin = TRUE)
  print(res)
  print(DF$key[res$solution == 1])
}
giving:
Success: the objective function is 0.96
[1] 7223 7256
Success: the objective function is 0.99
[1] 7224 7256
Success: the objective function is 1.07
[1] 7224 7223
One caveat is that this method will only return one of multiple feasible solutions that give the same objective value (or objective values very near to each other). For example, if there were two combinations that both had an objective value of 0.96, then the first iteration of the loop would find one of them and the second iteration would look for objective values >= 0.96 * 1.0001, hence it would eliminate both of them from further consideration.
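If such ties matter, one workaround (a sketch, not a feature of lp() itself) is to cut off only the exact solutions already found, using a "no-good" constraint for each: with coefficient vector 2*x_prev - 1 and right-hand side sum(x_prev) - 1, the previous 0/1 solution becomes infeasible while every other subset remains allowed:
library(lpSolve)
M <- 4000000
N <- 3
cons.mat <- t(DF$money); cons.dir <- ">="; cons.rhs <- M
for (i in 1:N) {
  res <- lp("min", DF$error, cons.mat, cons.dir, cons.rhs, all.bin = TRUE)
  print(DF$key[res$solution == 1])
  # exclude exactly this 0/1 solution from later iterations
  cons.mat <- rbind(cons.mat, 2 * res$solution - 1)
  cons.dir <- c(cons.dir, "<=")
  cons.rhs <- c(cons.rhs, sum(res$solution) - 1)
}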
Note: The input DF in reproducible form is:
Lines <- " key error money
1 7224 0.5500000 2483118
2 7223 0.5200000 2451469
3 7222 1.6600000 2425693
4 7247 0.6400000 2324070
5 7256 0.4400000 1785569
6 7248 0.2541168 1476720"
DF <- read.table(text = Lines)

How to calculate sum over term including rising factorial?

I am new to programming and R and would like to compute the following sum
I used pochMpfr() from the Rmpfr package for the rising factorial and a for-loop in order to compute the sum.
B <- rep(1, k+1)
for (i in 0:k) {
  B[(i+1)] <- (-1)^i * choose(k,i) * pochMpfr((-i)*sigma, n)
}
sum(B)
Doing so, I get the results as a list (each element an 'mpfr' object) and thus cannot compute the sum.
Is there a way to get the results directly as a matrix, or to convert the list to a vector containing only the relevant elements?
The solution is probably quite easy but I haven't found it while looking through the forums.
There is no need to use a for-loop; this should work:
library(Rmpfr)
# You do not define these in your question,
# so I just take some arbitrary values
k <- 10
n <- 3
sigma <- 0.3
i <- 0:k
B <- (-1)^i *choose(k,i)*pochMpfr((-i)*sigma, n)
sum(B)
## 1 'mpfr' number of precision 159 bits
## [1] 6.2977401071861993597462780570563107354142915151e-14
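For completeness, if you do end up with a list of mpfr values (as in the original loop), one sketch of a workaround is to combine them with Reduce(), since "+" is defined for mpfr objects:
B_list <- lapply(0:k, function(i) (-1)^i * choose(k, i) * pochMpfr((-i) * sigma, n))
Reduce(`+`, B_list)  # same sum as above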

How many unique powers x^y are there for x in 1-1000 and y in 1-1000, using R?

Using R, with x and y integers ∈ [1, 1000], calculate how many unique powers x^y exist.
This is what I have right now; I just don't know how to eliminate the duplicate numbers:
x <- 1:1000
y <- 1:1000
for (i in x) {
  for (j in y) {
    print(i^j)
  }
}
A combinatorial approach to this could split the numbers from 1-1000 into equivalence classes where each number in the class is the power of some other number. For instance, we would split the numbers 1-10 into (1), (2, 4, 8), (3, 9), (5), (6), (7), (10). None of the powers of values between equivalence classes will coincide, so we can just handle each equivalence class separately.
num.unique.comb <- function(limit) {
  # Count number of powers in each equivalence class (labeled by lowest val)
  num.powers <- rep(0, limit)
  # Handle 1 as special case
  num.powers[1] <- 1
  # Beyond sqrt(limit), all unhandled numbers are in own equivalence class
  handled <- c(T, rep(F, limit-1))
  for (base in 2:ceiling(sqrt(limit))) {
    if (!handled[base]) {
      # Handle all the values in 1:limit that are powers of base
      num.handle <- floor(log(limit, base))
      handled[base^(1:num.handle)] <- T
      # Compute the powers of base that we cover
      num.powers[base] <- length(unique(as.vector(outer(1:num.handle, 1:limit))))
    }
  }
  num.powers[!handled] <- limit
  # Handle sums too big for standard numeric types
  library(gmp)
  print(sum(as.bigz(num.powers)))
}
num.unique.comb(10)
# [1] 76
num.unique.comb(1000)
# [1] 978318
One nice property of this combinatorial approach is that it's very fast compared to a brute-force approach. For instance, it takes less than 0.1 seconds to compute with limit set to 1000. This allows us to compute the result for much larger values:
# ~0.15 seconds
num.unique.comb(10000)
# [1] 99357483
# ~4 seconds
num.unique.comb(100000)
# [1] 9981335940
# ~220 seconds
num.unique.comb(1000000)
# [1] 999439867182
This is a pretty neat result -- in under 4 minutes we can compute the number of unique values within 1 trillion numbers, where each number can have up to 6 million digits!
Update: Based on this combinatorial code I've updated the OEIS entry for this sequence to include terms up to 10,000.
A brute-force approach would be to just compute all the powers and count the number of unique values:
num.unique.bf <- function(limit) {
  length(unique(as.vector(sapply(1:limit, function(x) x^(1:limit)))))
}
num.unique.bf(10)
# [1] 76
A problem with this brute-force analysis is that you are dealing with large numbers that will create numerical issues. For instance:
1000^1000
# [1] Inf
As a result we get an inaccurate value:
# Wrong due to numerical issues!
num.unique.bf(1000)
# [1] 119117
However, a package like gmp enables us to compute numbers even as large as 1000^1000. My computer has trouble storing all 1 million numbers in memory at once, so I'll write them to a file (the file size for limit=1000 is 1.2 GB on my computer) and then compute the number of unique values in that file:
library(gmp)
num.unique.bf2 <- function(limit) {
  sink("foo.txt")
  for (x in 1:limit) {
    vals <- as.bigz(x)^(1:limit)
    for (idx in 1:limit) {
      cat(paste0(as.character(vals[idx]), "\n"))
    }
  }
  sink()
  as.numeric(system("sort foo.txt | uniq | wc -l", intern=T))
}
num.unique.bf2(10)
# [1] 76
num.unique.bf2(1000)
# [1] 978318
A quick visit to the OEIS (which lists the first 1000 values of this sequence) shows that this is correct. This approach is rather slow (roughly 40 minutes on my computer), and combinatorial approaches should be significantly faster.

R Large Matrix Size differences

I have a large correlation matrix, 62589x62589. I've binarised the matrix above a certain threshold with no problems, but I'm slightly confused by the significant difference in basic calculation time.
The first time I did this.... number of 1's : 425,491 ... Number of 0's : 3,916,957,430
Sum of these two numbers == 62589^2, implying that the matrix is truly binarised. I saved this as an Rdata object (31Mb). Performing a basic calculation of the matrix takes ~3.5 minutes.
fooB <- foo %*% foo
The second time, with a lower threshold..... number of 1's : 30,384,683 ... Number of 0's : 3,886,998,238. Sum of these is again, 62589^2, and therefore truly binarised. The Rdata Object is 84Mb. Performing the same multiplication step as above is still currently calculating after an hour.
Should the increased number of 1's in the newest matrix increase the file size and processing time so drastically?
Thanks for reading
Edit: the final time for the same calculation on the second matrix is 65 minutes
Edit2: performing is() results in : Matrix Array Structure Vector
Here is a reproducible example that may help with memory size and processing times for binary sparse matrices from package Matrix:
n <- 62589
N1 <- 425491
require(Matrix)
foo <- sparseMatrix(i=sample(n, N1, TRUE), j=sample(n, N1, TRUE), dims=c(n, n))
print(object.size(foo), units="Mb")
#1.9 Mb
sum(foo)
#[1] 425464
(Note that the sampling may give some duplicates in pairs (i,j), thus the number above is slightly less than N1.)
Squaring:
system.time(fooB <- foo %*% foo)
# user system elapsed
# 0.39 0.03 0.42
print(object.size(fooB), units="Mb")
#11.3 Mb
sum(fooB)
#[1] 2892234
Cubing:
system.time(fooC <- fooB %*% foo)
# user system elapsed
# 2.74 0.11 2.87
print(object.size(fooC), units="Mb")
#75 Mb
sum(fooC)
#[1] 19610641
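If your own foo is currently a dense base-R 0/1 matrix, a sketch of the conversion (assuming the dense original already fits in memory) so that the sparse arithmetic above applies:
library(Matrix)
fooSparse <- Matrix(foo, sparse = TRUE)  # coerce the dense binary matrix to a sparse Matrix
print(object.size(fooSparse), units = "Mb")
system.time(fooB <- fooSparse %*% fooSparse)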

subset slow in large matrix

I have a numeric vector of length 5,000,000
>head(coordvec)
[1] 47286545 47286546 47286547 47286548 47286549 472865
and a 1,400,000 x 3 numeric matrix
>head(subscores)
V1 V2 V3
1 47286730 47286725 0.830
2 47286740 47286791 0.065
3 47286750 47286806 -0.165
4 47288371 47288427 0.760
5 47288841 47288890 0.285
6 47288896 47288945 0.225
What I am trying to accomplish is, for each number in coordvec, to find the average of V3 over the rows of subscores whose V1 and V2 encompass that number. To do that, I am taking the following approach:
results <- numeric(length(coordvec))
for (i in 1:length(coordvec)) {
  select_rows <- subscores[, 1] < coordvec[i] & subscores[, 2] > coordvec[i]
  scores_subset <- subscores[select_rows, 3]
  results[i] <- mean(scores_subset)
}
This is very slow, and would take a few days to finish. Is there a faster way?
Thanks,
Dan
I think there are two challenging parts to this question. The first is finding the overlaps. I'd use the IRanges package from Bioconductor (?findInterval in the base package might also be useful)
library(IRanges)
Creating width-1 ranges representing the coordinate vector, and a set of ranges representing the scores; I sort the coordinate vector for convenience, assuming that duplicate coordinates can be treated the same:
coord <- sort(sample(.Machine$integer.max, 5000000))
starts <- sample(.Machine$integer.max, 1200000)
scores <- runif(length(starts))
q <- IRanges(coord, width=1)
s <- IRanges(starts, starts + 100L)
Here we find which query overlaps which subject
system.time({
olaps <- findOverlaps(q, s)
})
This takes about 7s on my laptop. There are different types of overlaps (see ?findOverlaps) so maybe this step requires a bit of refinement.
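For instance, the question's condition is strict (V1 < coord and coord < V2) while IRanges overlaps include the endpoints, so one possible refinement (a sketch using the simulated ranges above) is to shrink each subject range by one position at both ends:
s_strict <- IRanges(starts + 1L, starts + 99L)  # strict interior of [starts, starts + 100]
olaps_strict <- findOverlaps(q, s_strict)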
The result is a pair of vectors indexing the query and overlapping subject.
> olaps
Hits of length 281909
queryLength: 5000000
subjectLength: 1200000
queryHits subjectHits
<integer> <integer>
1 19 685913
2 35 929424
3 46 1130191
4 52 37417
I think this is the end of the first complicated part, finding the 281909 overlaps. (I don't think the data.table answer offered elsewhere addresses this, though I could be mistaken...)
The next challenging part is calculating a large number of means. The built-in way would be something like
olaps0 <- head(olaps, 10000)
system.time({
res0 <- tapply(scores[subjectHits(olaps0)], queryHits(olaps0), mean)
})
which takes about 3.25s on my computer and appears to scale linearly, so maybe 90s for the 280k overlaps. But I think we can accomplish this tabulation efficiently with data.table. The original coordinates are start(q)[queryHits(olaps)], so:
require(data.table)
dt <- data.table(coord = start(q)[queryHits(olaps)],
                 score = scores[subjectHits(olaps)])
res1 <- dt[, mean(score), by = coord]$V1
which takes about 2.5s for all 280k overlaps.
Some more speed can be had by recognizing that the query hits are ordered. We want to calculate a mean for each run of query hits. We start by creating a variable to indicate the ends of each query hit run
idx <- c(queryHits(olaps)[-1] != queryHits(olaps)[-length(olaps)], TRUE)
and then calculate the cumulative scores at the ends of each run, the length of each run, and the difference between the cumulative score at the end and at the start of the run
scoreHits <- cumsum(scores[subjectHits(olaps)])[idx]
n <- diff(c(0L, seq_along(idx)[idx]))
xt <- diff(c(0L, scoreHits))
And finally, the mean is
res2 <- xt / n
This takes about 0.6s for all the data, and is identical to (though more cryptic than?) the data.table result
> identical(res1, res2)
[1] TRUE
The original coordinates corresponding to the means are
start(q)[ queryHits(olaps)[idx] ]
Something like this might be faster:
require(data.table)
subscores <- as.data.table(subscores)
subscores[, cond := V1 < coordvec & V2 > coordvec]
subscores[list(cond)[[1]], mean(V3)]
list(cond)[[1]] because: "When i is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope." source: ?data.table
Since your example isn't easily reproducible, and even if it were, none of your subscores meet your boolean condition, I'm not sure if this does exactly what you're looking for, but you can use one of the apply family with a function.
myfun <- function(x) {
  y <- subscores[, 1] < x & subscores[, 2] > x
  mean(subscores[y, 3])
}
sapply(coordvec, myfun)
You can also take a look at mclapply. If you have enough memory this will probably speed things up significantly. However, you could also look at the foreach package with similar results. You've got your for loop "correct" by assigning into results rather than growing it, but really, you're doing a lot of comparisons. It will be hard to speed this up much.
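As a sketch of the parallel variant mentioned above (mclapply() forks, so it is for Unix-alikes; the core count here is only an example):
library(parallel)
results <- unlist(mclapply(coordvec, myfun, mc.cores = 4))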
