How to perform matrix operations similar to multiplication fast? - r

I have two matrices.
A<-matrix(c(1,0,2,3),2,2)
B<-matrix(c(0,1,4,2),2,2)
Ordinary multiplication (A %*% B) gives results like:
C[1,1]<-A[1,1]*B[1,1]+ A[1,2]*B[2,1]
C[1,2]<-A[1,1]*B[1,2]+ A[1,2]*B[2,2]
C[2,1]<-A[2,1]*B[1,1]+ A[2,2]*B[2,1]
C[2,2]<-A[2,1]*B[1,2]+ A[2,2]*B[2,2]
Instead, how can I get a modified version of multiplication that gives results like the following?
C[1,1]<-min(A[1,1],B[1,1])+ min(A[1,2],B[2,1])
C[1,2]<-min(A[1,1],B[1,2])+ min(A[1,2],B[2,2])
C[2,1]<-min(A[2,1],B[1,1])+ min(A[2,2],B[2,1])
C[2,2]<-min(A[2,1],B[1,2])+ min(A[2,2],B[2,2])
I know that I can do it with iteration (a loop), but I am looking for a faster solution.
result <- matrix(nrow = 2, ncol = 2)
for (i in 1:2) {
  minMat <- t(apply(B, 2, function(x) pmin(x, A[i, ])))
  result[i, ] <- rowSums(minMat)
}

Part of the solution could be to wrap the loop body in a function, as follows (from one of the comments above):
## Defining the function
sum.min.row <- function(i, A, B) {
  minMat <- t(apply(B, 2, function(x) pmin(x, A[i, ])))
  rowSums(minMat)
}
## Applying it to the whole matrix
t(sapply(1:nrow(A), sum.min.row, A, B))
#      [,1] [,2]
# [1,]    1    3
# [2,]    1    2
This is still not optimal though...
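One possible alternative, offered as a minimal sketch rather than a benchmarked optimum: loop only over the shared (inner) dimension and let outer() with pmin build one layer of minima per step. The function name min_sum_prod is made up for this illustration, and whether it beats the apply version depends on the matrix sizes.

```r
A <- matrix(c(1, 0, 2, 3), 2, 2)
B <- matrix(c(0, 1, 4, 2), 2, 2)

# Hypothetical helper: C[i, j] = sum over k of min(A[i, k], B[k, j])
min_sum_prod <- function(A, B) {
  stopifnot(ncol(A) == nrow(B))
  C <- matrix(0, nrow(A), ncol(B))
  for (k in seq_len(ncol(A))) {
    # outer() pairs every A[i, k] with every B[k, j]; pmin is vectorized
    C <- C + outer(A[, k], B[k, ], pmin)
  }
  C
}

min_sum_prod(A, B)
#      [,1] [,2]
# [1,]    1    3
# [2,]    1    2
```

This runs ncol(A) vectorized steps instead of one apply() call per row, which tends to help when the inner dimension is small relative to the number of rows and columns.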

Related

How do I adjust my function to multiply multiple (random number of) matrices?

I have written the following function for multiplying two matrices A and B:
f <- function(A, B) {
  m <- nrow(A)
  n <- ncol(A)  # must equal nrow(B) for the product to be defined
  stopifnot(n == nrow(B))
  p <- ncol(B)
  Result <- matrix(0, nrow = m, ncol = p)
  for (i in 1:m) {
    for (j in 1:p) {
      for (k in 1:n) {
        Result[i, j] <- Result[i, j] + A[i, k] * B[k, j]
      }
    }
  }
  return(Result)
}
How would I adjust my function to multiply 3 or more matrices, i.e., an arbitrary number of matrices rather than just 2?
You just iteratively apply two-matrix multiplication. Let f be the fundamental function multiplying two matrices A and B. Normally we use the internal one %*%, but you can use the one defined in your question.
Since the number of matrices is unknown, I suggest using the ... (dots) argument. We collect all input matrices into a "matrix list" with list(...), then use Reduce to cumulatively apply the two-operand matrix multiplication.
g <- function (...) Reduce(f, list(...))
Note that it is your responsibility to ensure the matrix dimensions are conformable, especially when you have a lot of matrices. In the following, I just use square matrices as an example.
set.seed(0)
A <- matrix(rnorm(4),2)
B <- matrix(rnorm(4),2)
C <- matrix(rnorm(4),2)
f <- "%*%"
g(A, B, C)
#          [,1]        [,2]
#[1,] -3.753667  0.08634328
#[2,] -0.161250 -1.54194176
This is the same as:
A %*% B %*% C
#          [,1]        [,2]
#[1,] -3.753667  0.08634328
#[2,] -0.161250 -1.54194176
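The conformability caveat can also be enforced inside the reducer. A small sketch, assuming a hypothetical wrapper name g_checked (not part of the original answer), that stops with an error as soon as two adjacent operands do not fit:

```r
# Hypothetical variant of g() that checks dimensions at every step
g_checked <- function(...) {
  mats <- list(...)
  Reduce(function(X, Y) {
    # X is the running product; its columns must match the rows of Y
    if (ncol(X) != nrow(Y)) stop("non-conformable matrices")
    X %*% Y
  }, mats)
}

set.seed(0)
A <- matrix(rnorm(4), 2)
B <- matrix(rnorm(4), 2)
C <- matrix(rnorm(4), 2)
all.equal(g_checked(A, B, C), A %*% B %*% C)
```

Because Reduce folds from the left, each check compares the running product with the next matrix, so a mismatch anywhere in the chain is reported at the step where it occurs.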

How to combine two loops

In a tutorial on for() loops I came across the following exercise:
Exercise 4.4. Write a function to perform matrix-vector multiplication. It should take a matrix A and a vector b as arguments, and return the vector Ab. Use two loops to do this, rather than %*% or any vectorization.
Let's say I use a specific matrix A (3 x 3 in the session below) and a vector b of length 3.
> # Ex 4.4
> out<-c(1,1,1)
> Ab<-function(A,b) {
+ for(i in 1:dim(A)[1]) {
+
+ out[i]=sum(A[i,]*b)
+ }
+ out
+ }
> a = c(1,1,1)
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6   10
> a
[1] 1 1 1
> Ab(A,a)
[1] 12 15 19
This works for a very specific case, i.e. a matrix with 3 rows and a vector of length 3, but leaves much to be desired. I don't know what a good solution to this exercise would be, but the question says 'use two loops'. Suggestions will be much appreciated.
Thanks.
You are hiding the inner loop with A[i,]*b, which does vectorized multiplication (i.e. a hidden loop). If you expand that out explicitly, you will have the two required loops.
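Ab <- function(A, b) {
  # the number of columns of A must match the length of b
  if (dim(A)[2] != NROW(b)) stop("wrong dimensions")
  out <- matrix(, nrow(A), 1)  # one-column result, filled in below
  for (i in 1:dim(A)[1]) {
    s <- 0
    # explicit inner loop replaces the vectorized A[i,]*b
    for (j in 1:dim(A)[2]) s <- s + A[i,j] * b[j]
    out[i] <- s
  }
  out
}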
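For reference, here is a variant of the same two-loop idea that returns a plain vector instead of a one-column matrix, checked against %*% on the 3 x 3 matrix from the question. The name Ab_vec is made up here so that it does not clash with the answer's Ab.

```r
# Hypothetical variant: two explicit loops, plain numeric vector as result
Ab_vec <- function(A, b) {
  if (ncol(A) != length(b)) stop("wrong dimensions")
  out <- numeric(nrow(A))              # starts at 0 for every row
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(A))) {
      out[i] <- out[i] + A[i, j] * b[j]
    }
  }
  out
}

A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 10), nrow = 3)
b <- c(1, 1, 1)
Ab_vec(A, b)
# [1] 12 15 19
all.equal(Ab_vec(A, b), drop(A %*% b))
```

Using seq_len() rather than 1:n keeps the loops from running when a dimension is zero.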

R-apply a function to each row of a matrix, with a changing argument?

I have a function with two arguments: the first takes a vector and the second takes a scalar. I want to apply this function to each row of a matrix, but with a different second argument for each row. I tried the following, but it didn't work. I expected to calculate the p-value for each row and then divide it by the row number, giving a vector, but I got a matrix instead. This is a toy example, but it illustrates my purpose.
> foo = matrix(rnorm(100),ncol=20)
> f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
> goo = 1:5
> apply(foo,1,f,y=goo)
          [,1]      [,2]      [,3]       [,4]       [,5]
[1,] 0.9406881 0.6134117 0.5484542 0.11299535 0.20420786
[2,] 0.4703440 0.3067059 0.2742271 0.05649767 0.10210393
[3,] 0.3135627 0.2044706 0.1828181 0.03766512 0.06806929
[4,] 0.2351720 0.1533529 0.1371135 0.02824884 0.05105196
[5,] 0.1881376 0.1226823 0.1096908 0.02259907 0.04084157
The following for loop strategy produces the expected result, but would be very slow for the real data.
> res = numeric(5)
> for (i in 1:5){
res[i]=f(foo[i,],i)
}
> res
[1] 0.94068810 0.30670585 0.18281807 0.02824884 0.04084157
Any suggestions would be appreciated!
If your real purpose is like your example, you can vectorize the division:
f <- function(x) t.test(x[1:10], x[11:20])$p.value
apply(foo, 1, f) / goo
Based on the comment, the above is not appropriate.
In the case of the example, you might observe that the diagonal of the returned matrix is the desired result:
f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
goo = 1:5
diag(apply(foo,1,f,y=goo))
Besides being inefficient in time and space, this has another problem: it is correct for the example only because the operation on y (the division) is vectorized. In that case, the former solution is better anyway. So I suspect that in your actual problem, the operation is not vectorized.
Sometimes a for loop really is the best answer. The apply family of functions is not magical; they are still loops.
Here is an sapply solution. It won't beat the for loop on time (it probably won't lose either), but it doesn't have a high space overhead. The idea is to apply over the row index and use it to extract the row of foo and the element of goo to pass to f:
sapply(seq(nrow(foo)), function(i) f(foo[i,], goo[i]))
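As a quick sanity check, the index-based sapply call and an mapply call over the split rows can be compared with the explicit loop. This is only a sketch; the seed and the names res_loop, res_sapply and res_mapply are chosen here for illustration.

```r
set.seed(101)
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

# Reference: explicit loop over rows
res_loop <- numeric(nrow(foo))
for (i in seq_len(nrow(foo))) res_loop[i] <- f(foo[i, ], goo[i])

# Apply over the row index
res_sapply <- sapply(seq_len(nrow(foo)), function(i) f(foo[i, ], goo[i]))

# Split the matrix into a list of rows and walk it in parallel with goo
res_mapply <- unname(mapply(f, split(foo, row(foo)), goo))

all.equal(res_loop, res_sapply)
all.equal(res_loop, res_mapply)
```

All three pair row i with goo[i], so none of them builds the full 5 x 5 matrix that apply(foo, 1, f, y = goo) produces.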
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y
f2 <- function(a, b) {
  tt <- t.test(x = a[1:10], y = a[11:20])$p.value
  tt / b
}
f3 <- function() {
  res <- numeric(5)
  for (i in 1:5) {
    res[i] <- f(foo[i, ], i)
  }
  res
}
f4 <- function(x) t.test(x[1:10], x[11:20])$p.value
set.seed(101)
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
library(rbenchmark)
benchmark(
  apply(foo, 1, f4) / goo,
  mapply(f, split(foo, row(foo)), goo),
  f2(foo, goo),
  f3(),
  sapply(seq(nrow(foo)), function(i) f(foo[i, ], goo[i])),
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative"))
##                      test replications elapsed relative
## 1 apply(foo, 1, f4)/goo         1000   1.581    5.528
## 3          f2(foo, goo)         1000   0.286    1.000
## 4                  f3()         1000   1.458    5.098
## 2           mapply(...)         1000   1.599    5.591
## 5           sapply(...)         1000   1.486    5.196
The direct division is fastest (but not actually applicable). For this example there's not much difference between the other solutions, although the for loop is slightly faster than sapply, which is slightly faster than mapply. You should try this on a more realistic example to see how it scales for your problem.

Efficient way to calculate array multiplication

Is there an efficient way to calculate the 2x2 matrix H without a for loop?
n=10
a=array(rnorm(n),c(2,1,n))
b=array(rnorm(n),c(2,1,n))
H=matrix(0,2,2)
for(i in 1:n) H=H+a[,,i] %*% t(b[,,i])
H=matrix(0,2,2)
for(i in 1:n) H=H+a[,,i] %*% t(b[,,i])
H
#----------
          [,1]       [,2]
[1,] 10.770929 -0.4245556
[2,] -5.613436 -1.7588095
H2 <-a[ ,1, ] %*% t(b[ ,1, ])
H2
#-------------
          [,1]       [,2]
[1,] 10.770929 -0.4245556
[2,] -5.613436 -1.7588095
This does depend on the arrays in question having one of their dimensions == 1, and on the fact that "[" will drop length-1 dimensions unless you specify drop=FALSE.
If the second dimension truly has only 1 level, you can use
tcrossprod(matrix(a, nrow = 2), matrix(b, nrow = 2))
which is the same (up to FAQ 7.31 floating-point issues) as what you calculate. More generally, when the second dimension has more than one level,
crossprod(matrix(aperm(a, c(3, 2, 1)), ncol = dim(a)[1]), matrix(aperm(b, c(3, 2, 1)), ncol = dim(b)[1]))
If you can create 'a' and 'b' ordered so that you do not need the aperm(), it will be faster still.
The relative speed of different solutions depends on the dimensions. If the first two are both big and the last one small, a loop like yours (but using crossprod) might be as quick as you can get.
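A short check of the crossprod idea on arrays whose second dimension is larger than 1. This is a sketch under assumed dimensions (2 x 3 x n) and an arbitrary seed. Note that it permutes with c(3, 2, 1), so that each column of the reshaped matrix collects every entry belonging to one row index; with a second dimension of length 1, that ordering and c(3, 1, 2) lay the data out identically.

```r
set.seed(1)
n <- 10
a <- array(rnorm(2 * 3 * n), c(2, 3, n))
b <- array(rnorm(2 * 3 * n), c(2, 3, n))

# Reference: the explicit loop from the question
H_loop <- matrix(0, 2, 2)
for (i in seq_len(n)) H_loop <- H_loop + a[, , i] %*% t(b[, , i])

# Vectorized: put the summed-over dimensions (slice, column) into the rows
H_vec <- crossprod(
  matrix(aperm(a, c(3, 2, 1)), ncol = dim(a)[1]),
  matrix(aperm(b, c(3, 2, 1)), ncol = dim(b)[1])
)

all.equal(H_loop, H_vec)
```

The comparison uses all.equal rather than identical because the two computations add the terms in a different order, which is where the FAQ 7.31 rounding differences come from.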

outer() equivalent for non-vector lists in R

I understand how outer() works in R:
> outer(c(1,2,4),c(8,16,32), "*")
     [,1] [,2] [,3]
[1,]    8   16   32
[2,]   16   32   64
[3,]   32   64  128
It basically takes 2 vectors, forms every pair of elements from them (the Cartesian product), and applies the function to each pair.
I don't have two vectors, however. I have two lists of matrices:
M = list();
M[[1]] = matrix(...)
M[[2]] = matrix(...)
M[[3]] = matrix(...)
And I want to do an operation on my list of matrices. I want to do:
outer(M, M, "*")
In this case, I want to take the dot product of each combination of matrices I have.
Actually, I am trying to generate a kernel matrix (and I have written a kernel function), so I want to do:
outer(M, M, kernelFunction)
where kernelFunction calculates a distance between my two matrices.
The problem is that outer() only takes "vector" arguments, rather than "list"s etc. Is there a function that does the equivalent of outer() for non-vector entities?
Alternately, I could use a for-loop to do this:
M = list() # Each element in M is a matrix
for (i in 1:numElements)
{
  for (j in 1:numElements)
  {
    k = kernelFunction(M[[i]], M[[j]])
    kernelMatrix[i,j] = k;
  }
}
but I am trying to avoid this in favor of an R construct (which might be more efficient). (Yes I know I can modify the for-loop to compute the diagonal matrix and save 50% of the computations. But that's not the code that I'm trying to optimize!)
Is this possible? Any thoughts/suggestions?
The outer function actually DOES work on lists, but the function that you provide gets the two input vectors repeated so that they contain all possible combinations...
As for which is faster, combining outer with vapply is 3x faster than the double for-loop on my machine. If the actual kernel function does "real work", the difference in looping speed is probably not so important.
f1 <- function(a,b, fun) {
  outer(a, b, function(x,y) vapply(seq_along(x), function(i) fun(x[[i]], y[[i]]), numeric(1)))
}
f2 <- function(a,b, fun) {
  kernelMatrix <- matrix(0L, length(a), length(b))
  for (i in seq_along(a))
  {
    for (j in seq_along(b))
    {
      kernelMatrix[i,j] = fun(a[[i]], b[[j]])
    }
  }
  kernelMatrix
}
n <- 300
m <- 2
a <- lapply(1:n, function(x) matrix(runif(m*m),m))
b <- lapply(1:n, function(x) matrix(runif(m*m),m))
kernelFunction <- function(x,y) 0 # dummy, so we only measure the loop overhead
> system.time( r1 <- f1(a,b, kernelFunction) )
   user  system elapsed
   0.08    0.00    0.07
> system.time( r2 <- f2(a,b, kernelFunction) )
   user  system elapsed
   0.23    0.00    0.23
> identical(r1, r2)
[1] TRUE
Just use the for loop. Any built-in functions will degenerate to that anyway, and you'll lose clarity of expression, unless you carefully build a function that generalises outer to work with lists.
The biggest improvement you could make would be to preallocate the matrix:
M <- list()
length(M) <- numElements ^ 2
dim(M) <- c(numElements, numElements)
PS. A list is a vector.
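If a plain numeric kernel matrix is wanted, it can be preallocated directly, and a symmetric kernel only needs the upper triangle computed. A sketch under those assumptions: kernel_matrix and the Frobenius-distance kernelFunction below are stand-ins for illustration, not the asker's actual function.

```r
# Stand-in kernel: Frobenius distance between two matrices (symmetric)
kernelFunction <- function(x, y) sqrt(sum((x - y)^2))

# Hypothetical helper: preallocate, fill the upper triangle, mirror it
kernel_matrix <- function(M, fun) {
  n <- length(M)
  K <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in i:n) {
      K[i, j] <- fun(M[[i]], M[[j]])
      K[j, i] <- K[i, j]   # valid only if fun(x, y) == fun(y, x)
    }
  }
  K
}

set.seed(1)
M <- lapply(1:4, function(k) matrix(runif(4), 2))
K <- kernel_matrix(M, kernelFunction)
isSymmetric(K)
```

For a kernel that is not symmetric, the inner loop would have to run over all j again.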
Although this is an old question, here is another solution that is more in the spirit of the outer function. The idea is to apply outer along the indices of list1 and list2:
cor2 <- Vectorize(function(x, y) {
  vec1 <- list1[[x]]
  vec2 <- list2[[y]]
  cor(vec1, vec2, method = "spearman")
})
outer(seq_along(list1), seq_along(list2), cor2)
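To make the index-based approach runnable, here is a self-contained sketch. The contents of list1 and list2 are invented for illustration (random vectors of length 10); only the pattern of calling outer() on the indices comes from the answer.

```r
set.seed(42)
# Made-up inputs: two lists of numeric vectors of equal length
list1 <- replicate(3, rnorm(10), simplify = FALSE)
list2 <- replicate(4, rnorm(10), simplify = FALSE)

# Vectorize() lets outer() pass whole index vectors; each (x, y) pair
# is then looked up in the lists one at a time
cor2 <- Vectorize(function(x, y) {
  cor(list1[[x]], list2[[y]], method = "spearman")
})

res <- outer(seq_along(list1), seq_along(list2), cor2)
dim(res)
# [1] 3 4
```

Entry res[i, j] holds the Spearman correlation between list1[[i]] and list2[[j]], so the result has one row per element of list1 and one column per element of list2.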
