Enhancing performance of matrix transformation in R - r

I have a little problem with a fast matrix transformation. That transformation has to be performed a lot of times, so I'm looking for a fast way to do this. Imagine a given matrix A and an integer parameter L. Matrix A should be transformed into a new matrix newA with L rows and nrow(A)*ncol(A)/L columns. I take each block of L consecutive rows of A and want to place the blocks side by side as columns. A small example with two possible solutions to this problem, newA1 and newA2 (where newA1 = newA2), should clarify my explanation:
A = matrix(1:24,6,4,byrow=T) #example matrix
L = 2 #number of rows for new matrix
newA1 = NULL
newA2 = matrix(0,L,nrow(A)*ncol(A)/L)
for(i in 1:(nrow(A)/L)){
newA1 = cbind(newA1,A[((i-1)*L+1):(i*L),]) #slower than newA2
newA2[1:L,((i-1)*ncol(A)+1):(i*ncol(A))] = A[((i-1)*L+1):(i*L),] #faster than newA1
}
The matrices look like that:
> A
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
[6,] 21 22 23 24
> newA1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 9 10 11 12 17 18 19 20
[2,] 5 6 7 8 13 14 15 16 21 22 23 24
L=3 works also in this example.
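A loop-free alternative, offered as a minimal sketch rather than a definitive answer: view A as a three-dimensional array (row within block, block, column), swap the last two dimensions with aperm, and collapse back to two dimensions. The helper name block_cbind is hypothetical, and the sketch assumes nrow(A) is an exact multiple of L.

```r
# Hypothetical helper: bind consecutive blocks of L rows side by side.
# Assumes nrow(A) is an exact multiple of L.
block_cbind <- function(A, L) {
  stopifnot(nrow(A) %% L == 0)
  k <- nrow(A) / L                          # number of row blocks
  arr <- array(A, dim = c(L, k, ncol(A)))   # [row in block, block, column]
  out <- aperm(arr, c(1, 3, 2))             # [row in block, column, block]
  dim(out) <- c(L, ncol(A) * k)             # collapse column and block
  out
}

A <- matrix(1:24, 6, 4, byrow = TRUE)
newA3 <- block_cbind(A, 2)
newA3[1, ]
# [1]  1  2  3  4  9 10 11 12 17 18 19 20
```

This avoids the repeated copying that growing newA1 with cbind incurs; whether it actually beats the preallocated newA2 loop on realistic sizes is best checked with a benchmark.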

Related

How to transform hyperspectral 3d array to a 2d matrix in R

I have an array having dimension
128 128 1000
I would like to transform it to a 2d matrix in R
Any suggestions?
This could be done by changing dim of the array:
A <- array(1:(2*3*4), c(2,3,4))
dim(A) <- c(dim(A)[1], prod(dim(A)[2:3]))
A
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#[1,] 1 3 5 7 9 11 13 15 17 19 21 23
#[2,] 2 4 6 8 10 12 14 16 18 20 22 24
or
A <- array(1:(2*3*4), c(2,3,4))
dim(A) <- c(prod(dim(A)[1:2]), dim(A)[3])
A
# [,1] [,2] [,3] [,4]
#[1,] 1 7 13 19
#[2,] 2 8 14 20
#[3,] 3 9 15 21
#[4,] 4 10 16 22
#[5,] 5 11 17 23
#[6,] 6 12 18 24
In case other orders are needed aperm could be used.
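As a small illustration of the aperm remark (a sketch on the same toy array, not the asker's data): swapping the first two dimensions before collapsing them changes the order in which the rows of the resulting matrix are enumerated.

```r
A <- array(1:(2*3*4), c(2, 3, 4))
B <- aperm(A, c(2, 1, 3))                  # B[j, i, k] equals A[i, j, k]
dim(B) <- c(prod(dim(B)[1:2]), dim(B)[3])  # collapse to a 6 x 4 matrix
B[, 1]
# [1] 1 3 5 2 4 6
```

Without the aperm call the first column would simply be 1 to 6, as in the second example above.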

Efficiently reshuffling a long matrix into one consisting of column bound subblocks (of the original) in R

"I have a very long matrix, measuring 3*10^5 x 3 entries. It basically consists of 10,000 subblocks of 30 x 3 matrices, stacked on top of one another. I want to efficiently "cbind" them next to one another (without looping constructs), leading to a 30 x 3*10^4 matrix.
Just changing the matrix dimensions does not work, as R fills the new matrix per individual column.
I'm sure there is a very compact, superefficient way of doing this, and I'll slap myself on the forehead as soon as you fill me in on the obvious solution.
Thanks!"
"Just changing the matrix dimensions does not work, as R fills the new matrix per individual column."
```R
test <- matrix(c(1:18), 6, 3, byrow = FALSE)
>test
[,1] [,2] [,3]
[1,] 1 7 13
[2,] 2 8 14
[3,] 3 9 15
[4,] 4 10 16
[5,] 5 11 17
[6,] 6 12 18
dim(test) <- c(3,6)
>test
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
```
The output I'm looking for is:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 7 13 4 10 16
[2,] 2 8 14 5 11 17
[3,] 3 9 15 6 12 18
We can create a grouping variable to split the sequence of rows, subset the matrix and then cbind
do.call(cbind, lapply(split(seq_len(nrow(test)),
as.integer(gl(nrow(test), 3, nrow(test)))), function(i) test[i,]))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 7 13 4 10 16
#[2,] 2 8 14 5 11 17
#[3,] 3 9 15 6 12 18

Taking the transpose of square blocks in a rectangular matrix r

Suppose I have two square matrices (actually many more) that are bound together:
mat = matrix(1:18,nrow=3,ncol=6)
mat
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
I want to take the transpose of each (3x3) matrix and keep them glued side by side, so the result is:
mat2
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 10 11 12
[2,] 4 5 6 13 14 15
[3,] 7 8 9 16 17 18
I do not want to do this manually because it is MANY matrices cbound together, not just 2.
I would like a solution that avoids looping or apply (which is just a wrapper for a loop). I need the efficient solution because this will have to run tens of thousands of times.
One way is to use matrix indexing
matrix(t(m), nrow=nrow(m))[, c(matrix(1:ncol(m), nrow(m), byrow=T)) ]
This takes the transposed matrix and rearranges the columns in the desired order.
m <- matrix(1:18,nrow=3,ncol=6)
matrix(t(m), nrow=nrow(m))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 10 2 11 3 12
# [2,] 4 13 5 14 6 15
# [3,] 7 16 8 17 9 18
So we want the 1st, 3rd, and 5th columns together, followed by the 2nd, 4th, and 6th columns.
One way is to index these with
c(matrix(1:ncol(m), nrow(m), byrow=T))
#[1] 1 3 5 2 4 6
As an alternative, you could use
idx <- rep(1:ncol(m), each=nrow(m), length.out=ncol(m))
do.call(cbind, split.data.frame(t(m), idx))
Try on a new matrix
(m <- matrix(1:50, nrow=5))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
matrix(t(m), nrow=nrow(m))[, c(matrix(1:ncol(m), nrow(m), byrow=T)) ]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 2 3 4 5 26 27 28 29 30
# [2,] 6 7 8 9 10 31 32 33 34 35
# [3,] 11 12 13 14 15 36 37 38 39 40
# [4,] 16 17 18 19 20 41 42 43 44 45
# [5,] 21 22 23 24 25 46 47 48 49 50
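The same block-wise transpose can also be sketched with a three-dimensional array: treat each square block as one slice, transpose every slice at once with aperm, and flatten again. The function name transpose_blocks is hypothetical, and the sketch assumes ncol(m) is an exact multiple of nrow(m).

```r
# Hypothetical helper: transpose each square block of a wide matrix.
# Assumes ncol(m) is an exact multiple of nrow(m).
transpose_blocks <- function(m) {
  n <- nrow(m)
  stopifnot(ncol(m) %% n == 0)
  k <- ncol(m) / n                    # number of square blocks
  arr <- array(m, dim = c(n, n, k))   # one n x n slice per block
  out <- aperm(arr, c(2, 1, 3))       # transpose every slice
  dim(out) <- c(n, n * k)             # glue the slices side by side again
  out
}

m <- matrix(1:18, nrow = 3, ncol = 6)
res <- transpose_blocks(m)
res[1, ]
# [1]  1  2  3 10 11 12
```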
This might do it:
mat = matrix(1:18,nrow=3,ncol=6)
mat
output <- lapply(seq(3, ncol(mat), 3), function(i) { t(mat[, c((i - 2):i)]) } )
output
do.call(cbind, output)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 2 3 10 11 12
#[2,] 4 5 6 13 14 15
#[3,] 7 8 9 16 17 18
Was curious and timed the two approaches. The matrix approach used by user20650 is much faster than the lapply approach I used:
library(microbenchmark)
mat = matrix(1:1600, nrow=4, byrow = FALSE)
# Both functions now use their argument x instead of the global mat.
lapply.function <- function(x) {
  step1 <- lapply(seq(nrow(x), ncol(x), nrow(x)), function(i) {
    t(x[, c((i - (nrow(x) - 1) ):i)])
  } )
  l.output <- do.call(cbind, step1)
  return(l.output)
}
lapply.output <- lapply.function(mat)
matrix.function <- function(x) {
  matrix(t(x), nrow=nrow(x))[, c(matrix(1:ncol(x), nrow(x), byrow=TRUE)) ]
}
matrix.output <- matrix.function(mat)
identical(lapply.function(mat), matrix.function(mat))
microbenchmark(lapply.function(mat), matrix.function(mat), times = 1000)
#Unit: microseconds
# expr min lq mean median uq max neval
# lapply.function(mat) 735.602 776.652 824.44917 791.443 809.856 2260.834 1000
# matrix.function(mat) 32.298 35.619 37.75495 36.826 37.732 78.481 1000

Pairwise calculation in r

I have been thinking about a problem I have but I don't know how to express the problem to even search for it. I'd be very thankful if you could explain it to me.
So, I have a data set with the following format:
10 6 4 4
10 6 4 4
7 6 4 4
I want to conduct a pairwise calculation in which I sum each column with each of the others, one pair at a time. That is 1 with 2, 1 with 3, 1 with 4, 2 with 3, 2 with 4 and 3 with 4.
I thought of using a nested loop in R, which I read about, and I started like this:
for (i in 1:(r-1)) { ## r is the number of columns; note the parentheses, since 1:r-1 would mean 0:(r-1)
for (j in (i+1):r) {
....
}
}
I am stuck at this stage; I don't know how to express in code what I need to do. I am sorry for posting unfinished code. Some advice on how I should go about it would be very welcome.
Thanks a lot in advance.
Use combn to create the "pairs":
(pairs <- combn(4,2))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 2 2 3
[2,] 2 3 4 3 4 4
Then apply over the rows of your data and, within each row, apply over the columns of the pairs, summing the two elements that each pair selects:
dat <- matrix(c(10,10,7,6,6,6,4,4,4,4,4,4),ncol=4)
t(apply(dat, 1, function(x) apply(combn(4,2),2,function(y) sum(x[y]))))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 16 14 14 10 10 8
[2,] 16 14 14 10 10 8
[3,] 13 11 11 10 10 8
You could slightly modify your loop:
d <- read.table(text='
10 6 4 4
10 6 4 4
7 6 4 4')
nc <- ncol(d)
r <- NULL
for (i in 1:nc) {
for (j in 1:nc) {
if (i < j) { # crucial condition
r <- cbind(r, d[, i] + d[, j]) # calculate new column and bind to calculated ones
}
}
}
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 16 14 14 10 10 8
[2,] 16 14 14 10 10 8
[3,] 13 11 11 10 10 8
Another application of combn but perhaps easier to understand:
apply(combn(ncol(dat),2), 2, function(x) rowSums(dat[,x]))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 16 14 14 10 10 8
## [2,] 16 14 14 10 10 8
## [3,] 13 11 11 10 10 8
Here, the matrix dat is indexed by each column of the result of combn giving a matrix of two columns (the two columns to be summed). rowSums then does the arithmetic.
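When only sums of two columns are needed, the apply call can be avoided altogether by indexing dat with each row of the combn result and adding the two resulting matrices. This is a sketch that reuses dat as defined above (repeated here so the block is self-contained); the variable names pairs and pair_sums are illustrative.

```r
dat <- matrix(c(10,10,7,6,6,6,4,4,4,4,4,4), ncol = 4)
pairs <- combn(ncol(dat), 2)   # one column per pair of column indices

# First members of all pairs plus second members of all pairs, in one step
pair_sums <- dat[, pairs[1, ], drop = FALSE] + dat[, pairs[2, ], drop = FALSE]
pair_sums[3, ]
# [1] 13 11 11 10 10  8
```

Unlike the combn/rowSums version, this form is tied to pairs and does not extend directly to sums of three or more columns.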
Because I really like the functional package (which provides Compose and Curry), here is a slight variation on the above:
apply(combn(ncol(dat),2), 2, Compose(Curry(`[`, dat, i=seq(nrow(dat))), rowSums))
It should be noted that a combn approach is more flexible than using nested for loops for this sort of computation. In particular, it is easily adapted to any number of columns to sum:
f <- function(dat, num=2)
{
apply(combn(ncol(dat),num), 2, function(x) rowSums(dat[,x,drop=FALSE]))
}
This will give all combinations of num columns, and sum them:
f(dat, 1)
## [,1] [,2] [,3] [,4]
## [1,] 10 6 4 4
## [2,] 10 6 4 4
## [3,] 7 6 4 4
f(dat, 2)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 16 14 14 10 10 8
## [2,] 16 14 14 10 10 8
## [3,] 13 11 11 10 10 8
f(dat, 3)
## [,1] [,2] [,3] [,4]
## [1,] 20 20 18 14
## [2,] 20 20 18 14
## [3,] 17 17 15 14
f(dat, 4)
## [,1]
## [1,] 24
## [2,] 24
## [3,] 21

How to index by 3's in a for loop R

I have a matrix that is 39 columns wide and I want to get the average values across rows for the first three columns, then the next three, etc., so I would have 13 columns in total when everything is done. Triple holds the indexes I would like to use, but it just makes a vector from 1:39.
Triple <- c(1:3, 4:6, 7:9, 10:12, 13:15, 16:18, 19:21, 22:24, 25:27, 28:30, 31:33, 34:36, 37:39)
AveFPKM <- matrix(nrow=54175, ncol=13)
for (i in 1:39){
Ave <- rowMeans(AllFPKM[,i:i+2])
AveFPKM[,i] <- Ave
i+2
}
Thanks for the help
With some specifying of dimensions and apply-ing, you can pretty easily get your result. Here's a smaller example:
test <- matrix(1:36,ncol=12)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#[1,] 1 4 7 10 13 16 19 22 25 28 31 34
#[2,] 2 5 8 11 14 17 20 23 26 29 32 35
#[3,] 3 6 9 12 15 18 21 24 27 30 33 36
Now get the mean of each row in each block of three columns:
apply(structure(test,dim=c(3,3,4)),c(1,3),mean)
# [,1] [,2] [,3] [,4]
#[1,] 4 13 22 31
#[2,] 5 14 23 32
#[3,] 6 15 24 33
Or generally, assuming your number of columns is always exactly divisible by the group size:
grp.row.mean <- function(x,grpsize) {
apply(structure(x,dim=c(nrow(x),grpsize,ncol(x)/grpsize)),c(1,3),mean)
}
grp.row.mean(test,3)
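An alternative sketch that avoids apply: base R's rowsum adds up rows by group, so transposing, summing the rows in groups of grpsize, and transposing back gives the block sums, which are then divided by the group size. The function name grp.row.mean2 is hypothetical, and exact divisibility of the column count is again assumed.

```r
# Hypothetical variant of grp.row.mean built on rowsum().
grp.row.mean2 <- function(x, grpsize) {
  stopifnot(ncol(x) %% grpsize == 0)
  grp <- rep(seq_len(ncol(x) / grpsize), each = grpsize)  # group label per column
  unname(t(rowsum(t(x), grp))) / grpsize                  # block sums -> block means
}

test <- matrix(1:36, ncol = 12)
res <- grp.row.mean2(test, 3)
res[1, ]
# [1]  4 13 22 31
```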
Here's a solution using sapply, taking advantage of the fact we know the number of columns is exactly a multiple of 3:
sapply(1:13, function(x) {
i <- (x-1)*3 + 1 # Get the actual starting index
rowMeans(AllFPKM[,i:(i+2)]) # average the source matrix AllFPKM, not the empty result matrix
})
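For completeness, a sketch of why the loop in the question does not work and how it could be repaired. In R the colon operator binds more tightly than +, so i:i+2 is evaluated as (i:i)+2, which is the single number i+2; in addition, the bare statement i+2 inside the loop body does not change the loop variable. The matrix below is a small stand-in for AllFPKM, which is not shown in the question, and the names n.groups and first are illustrative.

```r
i <- 4
i:i+2        # parsed as (i:i) + 2
# [1] 6
i:(i+2)
# [1] 4 5 6

AllFPKM <- matrix(1:36, ncol = 12)   # stand-in data: 12 columns instead of 39
n.groups <- ncol(AllFPKM) / 3
AveFPKM <- matrix(NA_real_, nrow = nrow(AllFPKM), ncol = n.groups)
for (g in seq_len(n.groups)) {       # loop over groups, not over columns
  first <- (g - 1) * 3 + 1           # first column of group g
  AveFPKM[, g] <- rowMeans(AllFPKM[, first:(first + 2)])
}
AveFPKM[1, ]
# [1]  4 13 22 31
```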
