Is there any efficient way to calculate the 2x2 matrix H without a for loop?
n=10
a=array(rnorm(2*n),c(2,1,n)) # a 2 x 1 x n array needs 2*n values; rnorm(n) would silently recycle
b=array(rnorm(2*n),c(2,1,n))
H=matrix(0,2,2)
for(i in 1:n) H=H+a[,,i] %*% t(b[,,i])
H
#----------
[,1] [,2]
[1,] 10.770929 -0.4245556
[2,] -5.613436 -1.7588095
H2 <- a[, 1, ] %*% t(b[, 1, ])
H2
#-------------
[,1] [,2]
[1,] 10.770929 -0.4245556
[2,] -5.613436 -1.7588095
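And the two agree:
> all.equal(H, H2)
[1] TRUE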
This does depend on the arrays in question having one of their dimensions == 1, and on the fact that "[" will drop length-1 dimensions unless you specify drop=FALSE.
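A quick illustration of that drop behavior, using the a defined above:
> dim(a)
[1]  2  1 10
> dim(a[ , 1, ])
[1]  2 10
> dim(a[ , 1, , drop=FALSE])
[1]  2  1 10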
This is the same (up to FAQ 7.31 floating-point issues) as what your loop calculates.
In case the second dimension truly has only 1 level, you can use
tcrossprod(matrix(a, nrow = 2), matrix(b, nrow = 2))
and more generally,
crossprod(matrix(aperm(a, c(3,1,2)), ncol = 2), matrix(aperm(b, c(3,1,2)), ncol = 2))
If you can create 'a' and 'b' already ordered so that you do not need the aperm(), it will be faster still.
The relative speed of the different solutions depends on the dimensions: if the first two dimensions are both big and the last one is small, a loop like yours (but using crossprod) might be as quick as you can get.
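If you want to check that for your own dimensions, a rough timing sketch (assuming the microbenchmark package is installed):
library(microbenchmark)
microbenchmark(
  loop   = { H <- matrix(0, 2, 2)
             for(i in 1:n) H <- H + a[, , i] %*% t(b[, , i]) },
  slice  = a[, 1, ] %*% t(b[, 1, ]),
  tcross = tcrossprod(matrix(a, nrow = 2), matrix(b, nrow = 2))
)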
I hope I can find an answer to this question here. I have this piece of code that I am trying to analyze closely:
alphas <- matrix(runif(900), ncol=3, byrow=TRUE)
z <- t(apply(alphas, 1, cumsum))
for(i in 1:nrow(z)){
  z[i, ] <- z[i, ] / (1:ncol(z))
}
I am trying to understand what z[i, ] <- z[i, ] / (1:ncol(z)) is doing to the matrix alphas. I know we are dividing each element by its column index in the input matrix. I also know that apply with MARGIN = 1 applies the function we are interested in, in this case cumsum, over the rows of the matrix alphas. That is basically all I know; I have no clue why the next line is needed or what it does to my matrix alphas.
I would appreciate some insights. Thank you very much.
With your code I would say you are calculating row-wise cumulative means of your alphas.
With the line in your loop you are doing a vectorized division that turns each row of cumulative sums into a row of cumulative means.
Look at what ncol(z) yields:
> ncol(z)
[1] 3
So basically, what z[i, ] / (1:ncol(z)) does in your loop is divide each row by a sequence whose length is the number of columns, i.e. c(1, 2, 3) or just 1:3.
Consider the first row of your alphas and your z.
set.seed(42) # for sake of reproducibility
alphas <- matrix(runif(900), ncol=3, byrow=TRUE)
z <- t(apply(alphas, 1, cumsum))
> alphas[1, ]
[1] 0.9148060 0.9370754 0.2861395
> z[1, ]
[1] 0.914806 1.851881 2.138021
> cbind(alphas[1, 1], mean(c(alphas[1, 1:2])), mean(c(alphas[1, 1:3])))
[,1] [,2] [,3]
[1,] 0.914806 0.9259407 0.7126737
The core of your loop yields
> z[1, ] / 1:ncol(z)
[1] 0.9148060 0.9259407 0.7126737
So each element of the row z[1, ] is divided by its corresponding divisor from that vector, yielding the mean of the cells aggregated up to that position.
Your loop simply does this for your whole z matrix.
Apropos: in R it is faster and more convenient to do this in a vectorized way inside a function. Since you understand apply(), you will also understand sapply(), which we use here after first defining a function.
FUN1 <- function(i){
  z[i, ] / 1:ncol(z)
}
M <- t(sapply(1:nrow(z), FUN1))
> head(M, 3)
[,1] [,2] [,3]
[1,] 0.9148060 0.9259407 0.7126737
[2,] 0.8304476 0.7360966 0.6637630
[3,] 0.7365883 0.4356275 0.5094157
This yields the same as your loop but in the R way.
We can do the whole thing in one step:
z <- t(sapply(seq_len(nrow(alphas)),
function(i) cumsum(alphas[i, ]) / seq_along(alphas[i, ])))
> head(z, 3)
[,1] [,2] [,3]
[1,] 0.9148060 0.9259407 0.7126737
[2,] 0.8304476 0.7360966 0.6637630
[3,] 0.7365883 0.4356275 0.5094157
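As an aside, the division step can also be vectorized over the whole matrix at once with sweep(), avoiding the row-wise sapply(); a minimal sketch:
z2 <- t(apply(alphas, 1, cumsum))
z2 <- sweep(z2, 2, seq_len(ncol(z2)), "/")  # divide column j by j
all.equal(z2, z)  # same result as the sapply() version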
I have two matrices.
A <- matrix(c(1,0,2,3), 2, 2)
B <- matrix(c(0,1,4,2), 2, 2)
Instead of multiplication (A%*%B) and having results like:
C[1,1] <- A[1,1]*B[1,1] + A[1,2]*B[2,1]
C[1,2] <- A[1,1]*B[1,2] + A[1,2]*B[2,2]
C[2,1] <- A[2,1]*B[1,1] + A[2,2]*B[2,1]
C[2,2] <- A[2,1]*B[1,2] + A[2,2]*B[2,2]
How can I get a modified version of multiplication that produces results like the following?
C[1,1] <- min(A[1,1],B[1,1]) + min(A[1,2],B[2,1])
C[1,2] <- min(A[1,1],B[1,2]) + min(A[1,2],B[2,2])
C[2,1] <- min(A[2,1],B[1,1]) + min(A[2,2],B[2,1])
C[2,2] <- min(A[2,1],B[1,2]) + min(A[2,2],B[2,2])
I know that I can do it with a loop, but I am looking for a faster solution.
result <- matrix(nrow = 2, ncol = 2)
for(i in 1:2){
  minMat <- t(apply(B, 2, function(x) pmin(x, A[i, ])))
  result[i, ] <- rowSums(minMat)
}
A piece of the solution could be to use a function as follows (from one of the comments above):
## Defining the function
sum.min.row <- function(i, A, B) {
  minMat <- t(apply(B, 2, function(x) pmin(x, A[i, ])))
  rowSums(minMat)
}
## Applying it to the whole matrix
t(sapply(1:nrow(A), sum.min.row, A, B))
# [,1] [,2]
# [1,] 1 3
# [2,] 1 2
This is still not optimal though...
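One way to cut the work further is to loop only over the shared index k and let outer() with pmin() build each piece; a sketch (minplus is just a made-up name):
minplus <- function(A, B) {
  C <- matrix(0, nrow(A), ncol(B))
  for(k in seq_len(ncol(A)))
    C <- C + outer(A[, k], B[k, ], pmin)  # adds min(A[i,k], B[k,j]) to each C[i,j]
  C
}
minplus(A, B)
#      [,1] [,2]
# [1,]    1    3
# [2,]    1    2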
I have a site by species matrix. The dimensions are 375 x 360. Each value represents the frequency of a species in samples of that site.
I am trying to convert this matrix from frequencies to relative abundances at each site.
I've tried a few ways to achieve this and the only one that has worked is using a for loop. However, this takes an incredibly long time or simply never finishes.
Is there a function or a vectorised method of achieving this? I've included my for-loop as an example of what I am trying to do.
relative_abundance <- matrix(0, nrow = nrow(data),
                             ncol = ncol(data), dimnames = dimnames(data))
for(i in 1:nrow(relative_abundance)){
  for(j in 1:ncol(relative_abundance)){
    species_freq <- data[i,j]
    row_sum <- sum(data[i,])
    relative_abundance[i,j] <- species_freq/row_sum
  }
}
You could do this using apply, but scale in this case makes things even simpler. Assuming you want to divide columns by their sums:
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
freqs <- scale(relative_abundance, center = FALSE,
scale = colSums(relative_abundance))
The matrix is too big to output here, but this is what it should look like:
> head(freqs[, 1:5])
[,1] [,2] [,3] [,4] [,5]
[1,] 0.004409603 0.0014231499 0.003439803 0.004052685 0.0024026910
[2,] 0.001469868 0.0023719165 0.002457002 0.005065856 0.0004805382
[3,] 0.001959824 0.0018975332 0.004914005 0.001519757 0.0043248438
[4,] 0.002939735 0.0042694497 0.002948403 0.002532928 0.0009610764
[5,] 0.004899559 0.0009487666 0.000982801 0.001519757 0.0028832292
[6,] 0.001469868 0.0023719165 0.002457002 0.002026342 0.0009610764
And a sanity check:
> head(colSums(freqs))
[1] 1 1 1 1 1 1
Using apply:
freqs2 <- apply(relative_abundance, 2, function(i) i/sum(i))
This has the advantage of being easily changed to run by rows, but the results will be joined as columns anyway, so you would have to transpose the result.
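For instance, a row-wise version of the same idea:
freqs_rows <- t(apply(relative_abundance, 1, function(i) i/sum(i)))
range(rowSums(freqs_rows))  # each row should now sum to 1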
Firstly, you could just do
relative_abundance[i,j] <- data[i,j]/sum(data[i,])
so you don't create the intermediate variables...
But to vectorize it, I suggest: compute the row sums with the (fast) rowSums function and then divide each column by them using apply:
relative_freq <- apply(data, 2, function(x) x / rowSums(data))
Using some simple linear algebra we can produce faster results. Simply multiply on the left by a diagonal matrix with the scaling factors you need, like this:
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
Diagonal_Matrix <- diag(1/rowSums(relative_abundance))
And then we multiply from the left:
row_normalized_matrix <- Diagonal_Matrix %*% relative_abundance
If you want to normalize column-wise, simply make:
Diagonal_Matrix <- diag(1/colSums(relative_abundance))
and multiply from the right.
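That is:
col_normalized_matrix <- relative_abundance %*% Diagonal_Matrix
colSums(col_normalized_matrix)  # each column should now sum to 1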
You can do something like this:
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
datnorm <- relative_abundance/rowSums(relative_abundance)
This will be faster if relative_abundance is a matrix rather than a data.frame.
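A quick sanity check, since each row should now sum to one:
range(rowSums(datnorm))
# [1] 1 1  (up to floating-point error)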
Suppose I have 3 matrices: C, W, and S.
C <- matrix(1:3)
W <- matrix(2:4)
S <- matrix(3:5)
I want to make a matrix with those matrices as elements, say a matrix K, where each element of K is itself a matrix. Just as a list of matrices works, but in matrix form instead, i.e.:
> K
[,1] [,2] [,3]
[1,] C 0 0
[2,] 0 W S
C, W and S would each be matrix objects stored inside the larger matrix K.
Ultimately, I would like to be able to then use matrix multiplication like K %*% K or similar.
There are not a lot of classes that can be an element in an R matrix. In particular, objects that rely on attributes for their behavior will not retain their essential features when stored as matrix elements. Ironically, that includes matrices themselves, since their behavior is governed by the dim(ension) attribute. The exclusion also applies to dates, factors, and specialized lists such as dataframes. You can include lists as index-able items in a matrix, but as #thelatemail's comment points out, this will be somewhat clunky.
> C <- matrix(0, 3,2)
> W <- matrix(1, 4,5)
> S <- matrix(2, 6,7)
> bigM <- matrix( list(), 2, 3)
> bigM[1,1] <- list(C)
> bigM[2,2] <- list(W)
> bigM[2,3] <- list(S)
> bigM
[,1] [,2] [,3]
[1,] Numeric,6 NULL NULL
[2,] NULL Numeric,20 Numeric,42
> bigM[2,3][[1]][42]
[1] 2
Notice the need to extract the matrix itself with [[1]] after extracting it as a list with [2,3]. It is only after that additional step that you can get the 42nd item in the matrix, which would also have been the [6,7]th item if you chose to reference it by row, column indices.
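If you eventually want the K %*% K-style multiplication mentioned in the question, you would have to implement block multiplication over such a list-matrix yourself. Here is a minimal sketch (block_mult is a hypothetical helper; it treats NULL cells as zero blocks and assumes the stored blocks are conformable, which the toy C, W and S above are not):
block_mult <- function(M1, M2) {
  stopifnot(ncol(M1) == nrow(M2))
  out <- matrix(list(), nrow(M1), ncol(M2))
  for(i in seq_len(nrow(M1))) {
    for(j in seq_len(ncol(M2))) {
      acc <- NULL
      for(k in seq_len(ncol(M1))) {
        a <- M1[i, k][[1]]
        b <- M2[k, j][[1]]
        if(!is.null(a) && !is.null(b)) {
          p <- a %*% b
          acc <- if(is.null(acc)) p else acc + p
        }
      }
      out[i, j] <- list(acc)  # NULL result stands for a zero block
    }
  }
  out
}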
I have a 4-dimensional array and I want to fill in the slots with values which are a function of the inputs. Through searching the forums here I found that the function outer is helpful for two-dimensional matrices but cannot be applied directly to general multidimensional arrays. Is there anything which can achieve this in R more efficiently than the following code?
K <- array(0, dim = c(2,2,2,2))  # dimensions will be much larger
for(x1 in 1:2) {
  for(y1 in 1:2) {
    for(x2 in 1:2) {
      for(y2 in 1:2) {
        K[x1,y1,x2,y2] <- x1*y2 - sin(x2*y1)  # this is just a dummy function
      }
    }
  }
}
Thank you in advance for any help.
Edit: Here's what I think will be an even faster solution. It assumes that you have predefined K as you offered. It uses the K[] <- construct to insert values calculated in a data frame environment. Using the square brackets on the LHS of the assignment preserves K's structure, and I think it is both vectorized and self-documenting. Note that the expand.grid arguments are given in the same order as K's dimensions, so the rows vary in step with K's column-major filling order:
dfm <- expand.grid(x1=1:2, y1=1:2, x2=1:2, y2=1:2)
K[] <- with(dfm, x1*y2 - sin(x2*y1))
First solution offered:
If you can create a data.frame or matrix that has the indices x1, y1, x2, y2 and the values, you can use the K[cbind(index_vectors)] <- values construction. The columns of the index matrix are used positionally, so they must be in the same order as K's dimensions:
mtx <- data.matrix(expand.grid(x1=1:2, y1=1:2, x2=1:2, y2=1:2))
K[mtx] <- apply(mtx, 1, function(x) x["x1"]*x["y2"] - sin(x["x2"]*x["y1"]))
#----------------
> K
, , 1, 1
[,1] [,2]
[1,] 0.158529 0.09070257
[2,] 1.158529 1.09070257
, , 2, 1
[,1] [,2]
[1,] 0.09070257 1.756802
[2,] 1.09070257 2.756802
, , 1, 2
[,1] [,2]
[1,] 1.158529 1.090703
[2,] 3.158529 3.090703
, , 2, 2
[,1] [,2]
[1,] 1.090703 2.756802
[2,] 3.090703 4.756802
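For sanity, either vectorized fill can be checked against the original quadruple loop:
K_loop <- array(0, dim = c(2,2,2,2))
for(x1 in 1:2) for(y1 in 1:2) for(x2 in 1:2) for(y2 in 1:2)
  K_loop[x1,y1,x2,y2] <- x1*y2 - sin(x2*y1)
all.equal(K, K_loop)
# [1] TRUE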