Taking the transpose of square blocks in a rectangular matrix in R

Suppose I have two square matrices (actually many more) that are bound together:
mat = matrix(1:18,nrow=3,ncol=6)
mat
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
I want to take the transpose of each (3x3) matrix and keep them glued side by side, so the result is:
mat2
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 10 11 12
[2,] 4 5 6 13 14 15
[3,] 7 8 9 16 17 18
I do not want to do this manually because it is MANY matrices cbound together, not just 2.
I would like a solution that avoids looping or apply (which is just a wrapper for a loop). I need an efficient solution because this will have to run tens of thousands of times.

One way is to use matrix indexing
matrix(t(m), nrow=nrow(m))[, c(matrix(1:ncol(m), nrow(m), byrow=T)) ]
This takes the transposed matrix and rearranges the columns in the desired order.
m <- matrix(1:18,nrow=3,ncol=6)
matrix(t(m), nrow=nrow(m))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 10 2 11 3 12
# [2,] 4 13 5 14 6 15
# [3,] 7 16 8 17 9 18
So we want the 1st, 3rd, and 5th columns together, followed by the 2nd, 4th, and 6th columns.
One way is to index these with
c(matrix(1:ncol(m), nrow(m), byrow=T))
#[1] 1 3 5 2 4 6
As an alternative, you could use
idx <- rep(1:ncol(m), each=nrow(m), length.out=ncol(m))
do.call(cbind, split.data.frame(t(m), idx))
Try on a new matrix
(m <- matrix(1:50, nrow=5))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
matrix(t(m), nrow=nrow(m))[, c(matrix(1:ncol(m), nrow(m), byrow=T)) ]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 2 3 4 5 26 27 28 29 30
# [2,] 6 7 8 9 10 31 32 33 34 35
# [3,] 11 12 13 14 15 36 37 38 39 40
# [4,] 16 17 18 19 20 41 42 43 44 45
# [5,] 21 22 23 24 25 46 47 48 49 50
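Another loop-free way to look at the same operation, offered only as a sketch and not part of the answer above, is to treat the wide matrix as a stack of n x n slices and swap the first two dimensions with aperm(). The helper name transpose_blocks is made up for illustration, and the code assumes ncol(m) is an exact multiple of nrow(m).

```r
# Hypothetical helper: transpose each square block of a wide matrix.
# Assumes ncol(m) is an exact multiple of nrow(m).
transpose_blocks <- function(m) {
  n <- nrow(m)
  stopifnot(ncol(m) %% n == 0)
  k <- ncol(m) / n
  arr <- array(m, dim = c(n, n, k))          # [row, column within block, block]
  matrix(aperm(arr, c(2, 1, 3)), nrow = n)   # swap row/column inside every block
}

m <- matrix(1:18, nrow = 3, ncol = 6)
res <- transpose_blocks(m)

# Cross-check against the matrix-indexing approach shown above.
by_index <- matrix(t(m), nrow = nrow(m))[, c(matrix(1:ncol(m), nrow(m), byrow = TRUE))]
identical(res, by_index)
```

Whether this is faster than the indexing approach would have to be measured on the real data.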

This might do it:
mat = matrix(1:18,nrow=3,ncol=6)
mat
output <- lapply(seq(3, ncol(mat), 3), function(i) { t(mat[, c((i - 2):i)]) } )
output
do.call(cbind, output)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 2 3 10 11 12
#[2,] 4 5 6 13 14 15
#[3,] 7 8 9 16 17 18
Was curious and timed the two approaches. The matrix approach used by user20650 is much faster than the lapply approach I used:
library(microbenchmark)
mat = matrix(1:1600, nrow=4, byrow = FALSE)
# Both functions now use their argument x instead of the global mat
lapply.function <- function(x) {
  step1 <- lapply(seq(nrow(x), ncol(x), nrow(x)), function(i) {
    t(x[, c((i - (nrow(x) - 1)):i)])
  })
  l.output <- do.call(cbind, step1)
  return(l.output)
}
lapply.output <- lapply.function(mat)
matrix.function <- function(x) {
  matrix(t(x), nrow=nrow(x))[, c(matrix(1:ncol(x), nrow(x), byrow=TRUE)) ]
}
matrix.output <- matrix.function(mat)
identical(lapply.function(mat), matrix.function(mat))
microbenchmark(lapply.function(mat), matrix.function(mat), times = 1000)
#Unit: microseconds
# expr min lq mean median uq max neval
# lapply.function(mat) 735.602 776.652 824.44917 791.443 809.856 2260.834 1000
# matrix.function(mat) 32.298 35.619 37.75495 36.826 37.732 78.481 1000


array_reshape() not reshaping arrays as desired

I have "old_array", I want to reshape it to become "new_array" using array_reshape()
old_array <- array(seq(1,30,1),c(2,3,5))
new_array <- t(array(seq(1,30,1),c(6,5)))
The old_array is:
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
, , 3
[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18
, , 4
[,1] [,2] [,3]
[1,] 19 21 23
[2,] 20 22 24
, , 5
[,1] [,2] [,3]
[1,] 25 27 29
[2,] 26 28 30
The new_array is:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
[3,] 13 14 15 16 17 18
[4,] 19 20 21 22 23 24
[5,] 25 26 27 28 29 30
I tried the following code, however the reshaped array is not the way I want:
array_reshape(old_array,c(6,5))
Expected results:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
[3,] 13 14 15 16 17 18
[4,] 19 20 21 22 23 24
[5,] 25 26 27 28 29 30
Actual results:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 7 13 19 25
[2,] 3 9 15 21 27
[3,] 5 11 17 23 29
[4,] 2 8 14 20 26
[5,] 4 10 16 22 28
[6,] 6 12 18 24 30
You can call matrix and specify the dimensions you desire. Given how R fills matrices, you need to specify byrow = TRUE in this scenario:
old_array <- array(seq(1,30,1),c(2,3,5))
matrix(old_array, nrow = dim(old_array)[3], ncol = prod(dim(old_array)[1:2]), byrow = TRUE)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 2 3 4 5 6
#> [2,] 7 8 9 10 11 12
#> [3,] 13 14 15 16 17 18
#> [4,] 19 20 21 22 23 24
#> [5,] 25 26 27 28 29 30
Created on 2019-03-31 by the reprex package (v0.2.1)
First, you want a 5x6 matrix, but your current code asks for a 6x5 result, so you would want to write
array_reshape(old_array, c(5,6))
However, your old_array consists of 5 separate 2x3 matrices. By default array_reshape reads and fills in row-major order, in which the last index varies fastest. It therefore takes the first value of each of the five matrices (1, 7, 13, 19, 25) and, since you are telling it to have 6 columns, then moves to the next column of the first matrix (3) to fill out the first row of the 5x6. It repeats this pattern to fill in the other 4 rows. This will return:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 7 13 19 25 3
[2,] 9 15 21 27 5 11
[3,] 17 23 29 2 8 14
[4,] 20 26 4 10 16 22
[5,] 28 6 12 18 24 30
If you can remove c(2,3,5) from your old_array line, so that old_array is a plain vector, it will work fine. Otherwise array_reshape is not the most appropriate function in this case. But if you really want to use it, you can tell it to fill a 6x5 matrix by column and then transpose the matrix. This will give you the result you want:
t(array_reshape(old_array, c(6,5), "F"))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
[3,] 13 14 15 16 17 18
[4,] 19 20 21 22 23 24
[5,] 25 26 27 28 29 30
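The row-major versus column-major behaviour described above can be reproduced in base R without array_reshape. This is only an illustrative sketch: it assumes array_reshape is the reticulate function with its default row-major ("C") order, and the variable names c_order, c_filled and f_filled are made up here.

```r
old_array <- array(seq(1, 30, 1), c(2, 3, 5))

# Row-major (C-style) read: the last index varies fastest.
# Reversing the dimensions with aperm() and flattening reproduces that order.
c_order <- as.vector(aperm(old_array, 3:1))

# Row-major fill into 5 x 6, mirroring array_reshape(old_array, c(5, 6)).
c_filled <- matrix(c_order, nrow = 5, byrow = TRUE)

# Column-major (Fortran-style) read and fill into 6 x 5, then transpose,
# mirroring t(array_reshape(old_array, c(6, 5), "F")).
f_filled <- t(matrix(as.vector(old_array), nrow = 6))

c_filled
f_filled
```

Under these assumptions, c_filled is the scrambled 5x6 matrix and f_filled is the desired one.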

Calculate the max and the min difference between two adjacent numbers

Having a matrix A like:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 11 14 17 20 23 26
[2,] 12 15 18 21 24 27
[3,] 13 16 19 22 25 28
I want to calculate the max and the min difference between two adjacent numbers in all rows.
Then I want to keep only the rows where the minimum adjacent difference is between 4 and 7 and the maximum is between 6 and 12.
The output should return no row.
For the following matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 11 16 17 28 23 26
[2,] 12 15 18 21 24 27
[3,] 13 16 19 22 25 28
the result should be row 1
You could approach this as follows:
d <- abs(apply(m, 1, diff))
m[apply(d, 2, min) %in% 4:7 & apply(d, 2, max) %in% 6:12,]
which gives:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 21 15 22 13 23 17
[2,] 27 18 13 25 16 11
Used data:
set.seed(2)
m <- matrix(sample(11:28, 54, TRUE), nrow = 9)
Here is a vectorized approach
library(matrixStats)
m1 <- abs(m[,-ncol(m) ] - m[,-1])
m[rowMins(m1) %in% 4:7 & rowMaxs(m1) %in% 6:12,]
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 21 15 22 13 23 17
#[2,] 27 18 13 25 16 11
data
set.seed(2)
m <- matrix(sample(11:28, 54, TRUE), nrow = 9)
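As a small base R sketch of the same idea on the first matrix from the question, the filter can also be written with explicit range comparisons instead of %in%, which would also cope with non-integer differences. The variable names are illustrative, and the bounds are taken as inclusive, which is an assumption about what the question means.

```r
A <- matrix(11:28, nrow = 3)

# Absolute differences between horizontally adjacent entries, one row per row of A.
d <- abs(A[, -1, drop = FALSE] - A[, -ncol(A), drop = FALSE])

min_d <- apply(d, 1, min)
max_d <- apply(d, 1, max)

# Inclusive range test on the per-row minimum and maximum difference.
keep <- min_d >= 4 & min_d <= 7 & max_d >= 6 & max_d <= 12

# drop = FALSE keeps the result a matrix even when zero or one row survives.
filtered <- A[keep, , drop = FALSE]
nrow(filtered)
```

Every adjacent difference in this matrix is 3, so no row passes the filter, which agrees with the question's statement that the output should return no row.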

R: calculating with different matrices

I am using the following packages:
library("quantmod")
library("PerformanceAnalytics")
library("termstrc")
Data:
AAA <- matrix(sample(30), ncol = 10)
BBB <- matrix(sample(30), ncol = 10)
CCC <- matrix(sample(30), ncol = 10)
with
print(AAA)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 28 18 16 10 20 21 23 27 5 6
[2,] 19 22 24 13 17 14 15 30 4 8
[3,] 1 25 11 2 29 9 3 7 12 26
> print(BBB)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 26 22 24 21 23 25 11 17 8 13
[2,] 14 18 16 28 12 1 10 6 20 15
[3,] 9 4 30 7 5 27 2 3 19 29
> print(CCC)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 6 29 9 24 26 10 12 21 5 22
[2,] 14 4 28 19 8 23 20 27 16 1
[3,] 7 17 13 18 30 2 3 15 11 25
Now I have the following problem:
There are 3 matrices (AAA, BBB and CCC), and they all have the same number of observations (3 obs. and 10 var.).
I calculated the min- & max-position for each row or observation in "AAA" (min/max for time t).
Calculated the following:
maxAAA_pos <- max.col(AAA)
minAAA_pos <- max.col(-AAA)
Result:
> print(maxAAA_pos)
[1] 1 8 5
> print(minAAA_pos)
[1] 9 9 1
The positions of these min/max variables now tell me which variable I have to take from the matrices BBB and CCC to calculate the following (example for the first observation):
Ft = variable from BBB at time t
St+1 = variable from CCC at time t+1
Result_max = (Ft / St+1) - 1
Result_min = 1 - (Ft / St+1)
My problem now is how to select "Ft" and "St+1", whose column positions are given by the vectors maxAAA_pos and minAAA_pos at time t.
This means the calculation should look like this for t=1 or the first observation:
Result_max = (26 / 14) - 1
Result_min = 1 - (8 / 16)
Thanks in advance!
I found a solution, maybe the way is a bit complicated, but it works....
> AAA <- matrix(sample(30), ncol = 10)
> BBB <- matrix(sample(30), ncol = 10)
> CCC <- matrix(sample(40), ncol = 10)
>
> print(AAA)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 27 12 15 5 3 25 16 28 11 19
[2,] 4 10 14 2 17 21 13 22 24 26
[3,] 23 1 9 30 18 6 7 29 20 8
> print(BBB)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 20 3 23 2 8 17 19 22 15
[2,] 16 26 4 30 6 10 13 7 24 27
[3,] 18 28 5 11 21 9 12 29 25 14
> print(CCC)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 17 29 11 28 21 39 25 34 1 37
[2,] 36 8 5 19 6 26 33 32 14 3
[3,] 15 16 2 27 22 35 30 40 7 9
[4,] 20 4 12 23 31 24 10 18 38 13
>
>
> maxAAA_pos <- max.col(AAA)
> minAAA_pos <- max.col(-AAA)
>
>
> print(maxAAA_pos)
[1] 8 10 4
> print(minAAA_pos)
[1] 5 4 2
>
>
> pos_AAAmax <- cbind(1:3, maxAAA_pos)
> pos_AAAmin <- cbind(1:3, minAAA_pos)
>
>
> returnmax <- function(Ft, St) {
+ (Ft/St)-1
+ }
>
> returnmin <- function(Ft, St) { 1-(Ft/St)}
>
> returnmax(BBB[pos_AAAmax], CCC[cbind(pos_AAAmax[ ,1]+1, maxAAA_pos)])
[1] -0.4062500 2.0000000 -0.5217391
> returnmin(BBB[pos_AAAmin], CCC[cbind(pos_AAAmin[ ,1]+1, minAAA_pos)])
[1] 0.6666667 -0.1111111 -6.0000000
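The key step in the solution above is indexing a matrix with a two-column matrix of (row, column) pairs. As a sketch, here is the same idea applied to the 3-row matrices printed in the question. Because CCC has only 3 rows there, the last observation has no t+1 row and is dropped, which is an assumption made for this example. The names t_idx, result_max and result_min are illustrative, and ties.method = "first" is used only to make max.col deterministic.

```r
AAA <- matrix(c(28, 18, 16, 10, 20, 21, 23, 27,  5,  6,
                19, 22, 24, 13, 17, 14, 15, 30,  4,  8,
                 1, 25, 11,  2, 29,  9,  3,  7, 12, 26), nrow = 3, byrow = TRUE)
BBB <- matrix(c(26, 22, 24, 21, 23, 25, 11, 17,  8, 13,
                14, 18, 16, 28, 12,  1, 10,  6, 20, 15,
                 9,  4, 30,  7,  5, 27,  2,  3, 19, 29), nrow = 3, byrow = TRUE)
CCC <- matrix(c( 6, 29,  9, 24, 26, 10, 12, 21,  5, 22,
                14,  4, 28, 19,  8, 23, 20, 27, 16,  1,
                 7, 17, 13, 18, 30,  2,  3, 15, 11, 25), nrow = 3, byrow = TRUE)

max_pos <- max.col(AAA, ties.method = "first")
min_pos <- max.col(-AAA, ties.method = "first")

# Only observations that have a following row (t + 1) in CCC.
t_idx <- seq_len(nrow(AAA) - 1)

# Each row of the cbind() index matrix is one (row, column) pair.
result_max <- BBB[cbind(t_idx, max_pos[t_idx])] /
  CCC[cbind(t_idx + 1, max_pos[t_idx])] - 1
result_min <- 1 - BBB[cbind(t_idx, min_pos[t_idx])] /
  CCC[cbind(t_idx + 1, min_pos[t_idx])]
```

The first elements of result_max and result_min correspond to the hand calculation in the question, (26 / 14) - 1 and 1 - (8 / 16).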

Enhancing performance of matrix transformation in R

I have a little problem with a fast matrix transformation. The transformation has to be performed many times, so I'm looking for a fast way to do this. Imagine a given matrix A and an integer parameter L. Matrix A should be transformed into a new matrix newA with L rows and nrow(A)*ncol(A)/L columns: I take consecutive groups of L rows of A and want to place them side by side. A small example with two possible solutions to this problem, newA1 and newA2 (where newA1 = newA2), should clarify my explanation:
A = matrix(1:24,6,4,byrow=T) #example matrix
L = 2 #number of rows for new matrix
newA1 = NULL
newA2 = matrix(0,L,nrow(A)*ncol(A)/L)
for(i in 1:(nrow(A)/L)){
newA1 = cbind(newA1,A[((i-1)*L+1):(i*L),]) #slower than newA2
newA2[1:L,((i-1)*ncol(A)+1):(i*ncol(A))] = A[((i-1)*L+1):(i*L),] #faster than newA1
}
The matrices look like that:
> A
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
[6,] 21 22 23 24
> newA1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 9 10 11 12 17 18 19 20
[2,] 5 6 7 8 13 14 15 16 21 22 23 24
L=3 works also in this example.
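One possible loop-free sketch, which is not from the original post, reshapes A into a three-dimensional array and reorders its dimensions with aperm(). The helper name stack_row_blocks is hypothetical, and the code assumes nrow(A) is an exact multiple of L. Whether it is actually faster than the preallocated loop would need to be benchmarked.

```r
# Hypothetical helper: lay consecutive groups of L rows side by side.
# Assumes nrow(A) is an exact multiple of L.
stack_row_blocks <- function(A, L) {
  stopifnot(nrow(A) %% L == 0)
  arr <- array(A, dim = c(L, nrow(A) / L, ncol(A)))  # [row in block, block, column]
  matrix(aperm(arr, c(1, 3, 2)), nrow = L)           # [row in block, column, block]
}

A <- matrix(1:24, 6, 4, byrow = TRUE)
newA3 <- stack_row_blocks(A, 2)
newA3
```

With L = 2 this should reproduce newA1 from the question; calling stack_row_blocks(A, 3) gives the corresponding 3 x 8 layout.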

How to index by 3's in a for loop R

I have a matrix that is 39 columns wide, and I want to get the average values across rows for the first three columns, then the next three, etc., so I would have 13 columns in total when everything is done. Triple holds the indexes I would like to use, but it just makes a vector from 1:39.
Triple <- c(1:3, 4:6, 7:9, 10:12, 13:15, 16:18, 19:21, 22:24, 25:27, 28:30, 31:33, 34:36, 37:39)
AveFPKM <- matrix(nrow=54175, ncol=13)
for (i in 1:39){
Ave <- rowMeans(AllFPKM[,i:i+2])
AveFPKM[,i] <- Ave
i+2
}
Thanks for the help
With some specifying of dimensions and apply-ing, you can pretty easily get your result. Here's a smaller example:
test <- matrix(1:36,ncol=12)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#[1,] 1 4 7 10 13 16 19 22 25 28 31 34
#[2,] 2 5 8 11 14 17 20 23 26 29 32 35
#[3,] 3 6 9 12 15 18 21 24 27 30 33 36
Now get the mean of each row in each block of three columns:
apply(structure(test,dim=c(3,3,4)),c(1,3),mean)
# [,1] [,2] [,3] [,4]
#[1,] 4 13 22 31
#[2,] 5 14 23 32
#[3,] 6 15 24 33
Or generally, assuming your number of columns is always exactly divisible by the group size:
grp.row.mean <- function(x,grpsize) {
apply(structure(x,dim=c(nrow(x),grpsize,ncol(x)/grpsize)),c(1,3),mean)
}
grp.row.mean(test,3)
Here's a solution using sapply, taking advantage of the fact we know the number of columns is exactly a multiple of 3:
sapply(1:13, function(x) {
i <- (x-1)*3 + 1 # Get the actual starting index
rowMeans(AllFPKM[,i:(i+2)]) # average the three source columns, not the empty result matrix
})
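For completeness, here is a sketch of what goes wrong in the loop from the question and how it could be repaired. In R the : operator binds more tightly than +, so i:i+2 is read as (i:i) + 2, a single index rather than a range of three. The loop also needs to step through the starting columns and write into output column k rather than i. The random matrix below is a hypothetical stand-in for the asker's AllFPKM, with 5 rows instead of 54175.

```r
# Hypothetical stand-in for the asker's data.
set.seed(1)
AllFPKM <- matrix(runif(5 * 39), nrow = 5, ncol = 39)

i <- 4
i:i + 2    # (i:i) + 2, a single number
i:(i + 2)  # the intended range of three indices

# Step through the first column of each group of three.
starts <- seq(1, ncol(AllFPKM), by = 3)
AveFPKM <- matrix(NA_real_, nrow = nrow(AllFPKM), ncol = length(starts))
for (k in seq_along(starts)) {
  cols <- starts[k]:(starts[k] + 2)
  AveFPKM[, k] <- rowMeans(AllFPKM[, cols])
}

# Cross-check with the array-based approach from the first answer.
arr <- array(AllFPKM, dim = c(nrow(AllFPKM), 3, length(starts)))
AveFPKM2 <- apply(arr, c(1, 3), mean)
all.equal(AveFPKM, AveFPKM2)
```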
