Pairwise calculation in R

I have been thinking about a problem I have, but I don't know how to express it well enough to even search for it. I'd be very thankful if you could explain it to me.
So, I have a data set with the following format:
10 6 4 4
10 6 4 4
7 6 4 4
I want to conduct a pairwise calculation in which I sum each column with every other column, one pair at a time: column 1 with 2, 1 with 3, 1 with 4, 2 with 3, 2 with 4, and 3 with 4.
I thought of using a nested loop in R, which I had read about, and started like this:
for (i in 1:(r-1)) { ## r is the number of columns; note 1:r-1 would mean (1:r)-1, i.e. 0:(r-1)
for (j in (i+1):r) {
....
}
}
I am stuck at this stage: I don't know how to express what I need to do in code. I am sorry for posting such unfinished code; any advice on how I should go about it would be very welcome.
Thanks a lot in advance.

Use combn to create the "pairs":
(pairs <- combn(4,2))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 2 2 3
[2,] 2 3 4 3 4 4
Then apply over the rows of your data and, for each row, apply over the columns of the pairs, summing the corresponding subset of the row:
dat <- matrix(c(10,10,7,6,6,6,4,4,4,4,4,4),ncol=4)
t(apply(dat, 1, function(x) apply(combn(4,2),2,function(y) sum(x[y]))))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 16 14 14 10 10 8
[2,] 16 14 14 10 10 8
[3,] 13 11 11 10 10 8
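A sketch of an alternative, not from the original answer: combn() can call the summing function itself through its FUN argument, so neither the nested apply() nor the t() is needed. With the default simplify = TRUE, each call's vector of row sums becomes one column of the result, so this should reproduce the matrix above.

```R
# Example data from the question: 3 rows, 4 columns
dat <- matrix(c(10, 10, 7, 6, 6, 6, 4, 4, 4, 4, 4, 4), ncol = 4)

# combn() calls FUN once per pair p of column indices; each call returns
# the 3 row-wise sums, so the result is a 3 x choose(4, 2) matrix
pair_sums <- combn(ncol(dat), 2, FUN = function(p) dat[, p[1]] + dat[, p[2]])
pair_sums
```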

You could slightly modify your loop:
d <- read.table(text='
10 6 4 4
10 6 4 4
7 6 4 4')
nc <- ncol(d)
r <- NULL
for (i in 1:nc) {
for (j in 1:nc) {
if (i < j) { # crucial condition
r <- cbind(r, d[, i] + d[, j]) # calculate new column and bind to calculated ones
}
}
}
r
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 16 14 14 10 10 8
[2,] 16 14 14 10 10 8
[3,] 13 11 11 10 10 8
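A sketch of the same loop with the result preallocated (growing r with cbind() copies the whole matrix on every iteration) and with each column labelled by its pair. It also uses the corrected outer-loop range for the question's 1:r-1. The "i+j" column labels are an illustrative addition, not part of the original answer.

```R
d <- read.table(text = "
10 6 4 4
10 6 4 4
7 6 4 4")

nc <- ncol(d)
# One column per pair: choose(nc, 2) columns in total
r <- matrix(NA_real_, nrow = nrow(d), ncol = choose(nc, 2))
colnames(r) <- character(ncol(r))
k <- 0
for (i in seq_len(nc - 1)) {       # i = 1, ..., nc - 1
  for (j in (i + 1):nc) {          # only j > i, so each pair appears once
    k <- k + 1
    r[, k] <- d[, i] + d[, j]
    colnames(r)[k] <- paste(i, j, sep = "+")
  }
}
r
```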

Another application of combn but perhaps easier to understand:
apply(combn(ncol(dat),2), 2, function(x) rowSums(dat[,x]))
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 16 14 14 10 10 8
## [2,] 16 14 14 10 10 8
## [3,] 13 11 11 10 10 8
Here, the matrix dat is indexed by each column of the result of combn, giving a two-column matrix (the two columns to be summed); rowSums then does the arithmetic.
Because I really like the functional package, here is a slight variation on the above:
library(functional) # provides Compose() and Curry()
apply(combn(ncol(dat),2), 2, Compose(Curry(`[`, dat, i=seq(nrow(dat))), rowSums))
It should be noted that a combn approach is more flexible than using nested for loops for this sort of computation. In particular, it is easily adapted to any number of columns to sum:
f <- function(dat, num=2)
{
apply(combn(ncol(dat),num), 2, function(x) rowSums(dat[,x,drop=FALSE]))
}
This will give all combinations of num columns, and sum them:
f(dat, 1)
## [,1] [,2] [,3] [,4]
## [1,] 10 6 4 4
## [2,] 10 6 4 4
## [3,] 7 6 4 4
f(dat, 2)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 16 14 14 10 10 8
## [2,] 16 14 14 10 10 8
## [3,] 13 11 11 10 10 8
f(dat, 3)
## [,1] [,2] [,3] [,4]
## [1,] 20 20 18 14
## [2,] 20 20 18 14
## [3,] 17 17 15 14
f(dat, 4)
## [,1]
## [1,] 24
## [2,] 24
## [3,] 21
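A sketch of a variant of f() that labels each output column with the input columns it sums (for example "1+3"), so the results of f(dat, 3) can be read without recomputing combn(). The name f_labelled is hypothetical; the extra matrix() call is there so a single-row input still returns a matrix.

```R
dat <- matrix(c(10, 10, 7, 6, 6, 6, 4, 4, 4, 4, 4, 4), ncol = 4)

f_labelled <- function(dat, num = 2) {
  idx <- combn(ncol(dat), num)                  # one combination per column
  out <- apply(idx, 2, function(x) rowSums(dat[, x, drop = FALSE]))
  out <- matrix(out, nrow = nrow(dat))          # keep a matrix even for one row
  colnames(out) <- apply(idx, 2, paste, collapse = "+")
  out
}

f_labelled(dat, 3)
```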

Related

How to rearrange matrices?

I need to create a function that will rearrange any square matrix based on the values in the matrix.
So if I have matrix like this:
M <- matrix(1:16, ncol = 4)
M
#> [,1] [,2] [,3] [,4]
#> [1,] 1 5 9 13
#> [2,] 2 6 10 14
#> [3,] 3 7 11 15
#> [4,] 4 8 12 16
After rearrangement it needs to look like this:
[,1] [,2] [,3] [,4]
[1,] 1 3 6 10
[2,] 2 5 9 13
[3,] 4 8 12 15
[4,] 7 11 14 16
So it is sorted from lowest (upper-left corner) to highest (lower-right corner), but the numbers are sorted along the diagonals (is that the right word?), not along rows or columns.
I know how to do this "manually", but I can't figure out any rule that this rearrangement follows.
1) row(m) + col(m) is constant along each reverse diagonal, so, with m being the input matrix:
m <- matrix(1:16, ncol = 4)
M <- replace(m, order(row(m) + col(m)), m)
giving:
> M
[,1] [,2] [,3] [,4]
[1,] 1 3 6 10
[2,] 2 5 9 13
[3,] 4 8 12 15
[4,] 7 11 14 16
It is not clear whether "sorted on the diagonal" means just that the values are unravelled from storage order onto the reverse diagonals, or that they are actually sorted within each reverse diagonal afterwards. In the example in the question the two interpretations give the same answer; however, if you do wish to sort within each reverse diagonal afterwards, for different data, then apply this:
ave(M, row(M) + col(M), FUN = sort)
2) A longer version:
M2 <- matrix(m[order(unlist(tapply(seq_along(m), row(m) + col(m), c)))], nrow(m))
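If "sorted from lowest to highest" is meant literally for arbitrary data, one option, sketched here under that interpretation, is to pass sort(m) instead of m as the values in the replace() call from 1). The smallest value then lands in the upper-left corner and the largest in the lower-right, whatever the input order. fill_antidiagonals is a hypothetical helper name.

```R
# Put the sorted values of m onto the reverse (anti-)diagonals,
# filling diagonal by diagonal starting from the upper-left corner.
fill_antidiagonals <- function(m) {
  # order() is stable, so within one anti-diagonal the positions stay in
  # column-major order (the lower-left element of that diagonal first)
  replace(m, order(row(m) + col(m)), sort(m))
}

set.seed(1)
m <- matrix(sample(16), ncol = 4)   # 1:16 in a random order
fill_antidiagonals(m)               # same layout as the target in the question
```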
Here's a function columns_to_diagonals in base R that ought to do what you're after. It uses split and unsplit with the appropriate factors.
columns_to_diagonals <- function(M) {
n <- ncol(M)
f <- matrix(rep(1:(2*n-1), c(1:n, (n-1):1)), ncol = n)
m <- split(M, f)
d <- row(M) + col(M)
matrix(unsplit(m, d), ncol = n)
}
First, we may test this on your original case:
M <- matrix(1:16, ncol = 4)
columns_to_diagonals(M)
#> [,1] [,2] [,3] [,4]
#> [1,] 1 3 6 10
#> [2,] 2 5 9 13
#> [3,] 4 8 12 15
#> [4,] 7 11 14 16
And then a larger, randomly permuted matrix, to check that the values (taken in column-major order, not sorted) are laid out along the reverse diagonals as expected:
M <- matrix(sample(1:25), ncol = 5)
M
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 4 15 12 10 21
#> [2,] 19 7 5 23 6
#> [3,] 9 17 2 8 1
#> [4,] 3 11 16 25 14
#> [5,] 22 18 20 13 24
columns_to_diagonals(M)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 4 9 15 18 20
#> [2,] 19 22 11 16 25
#> [3,] 3 17 2 8 6
#> [4,] 7 5 23 21 14
#> [5,] 12 10 13 1 24
Created on 2019-12-15 by the reprex package (v0.2.1)

Efficiently reshuffling a long matrix into one consisting of column bound subblocks (of the original) in R

"I have a very long matrix, measuring 30^5 x 3 entries. It basically consists of 10.000 subblocks of 30 x 3 matrices, stacked on top of one another. I want to efficiently "cbind" them next to one another (without looping constructs), leading to a 30 x 30^4 matrix.
Just changing the matrix dimensions does not work, as R fills the new matrix per individual column.
I'm sure there is a very compact, superefficient way of doing this, and I'll slap myself on the forehead as soon as you fill me in on the obvious solution.
Thanks!"
"Just changing the matrix dimensions does not work, as R fills the new matrix per individual column."
```R
test <- matrix(c(1:18), 6, 3, byrow = FALSE)
test
#> [,1] [,2] [,3]
#> [1,] 1 7 13
#> [2,] 2 8 14
#> [3,] 3 9 15
#> [4,] 4 10 16
#> [5,] 5 11 17
#> [6,] 6 12 18
dim(test) <- c(3, 6)
test
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 4 7 10 13 16
#> [2,] 2 5 8 11 14 17
#> [3,] 3 6 9 12 15 18
```
The output I'm looking for is:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 7 13 4 10 16
[2,] 2 8 14 5 11 17
[3,] 3 9 15 6 12 18
We can create a grouping variable to split the sequence of rows, subset the matrix with each group, and then cbind the pieces:
do.call(cbind, lapply(split(seq_len(nrow(test)),
as.integer(gl(nrow(test), 3, nrow(test)))), function(i) test[i,]))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 7 13 4 10 16
#[2,] 2 8 14 5 11 17
#[3,] 3 9 15 6 12 18
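Since the question asks for a solution without looping constructs, here is a sketch of another approach: treat the long matrix as a 3-d array of [row within block, block, column], swap the last two dimensions with aperm(), and read the data back as a matrix with b rows. It assumes the number of rows is an exact multiple of the block size `b` (a name introduced here; 30 in the question). Because array() and aperm() operate on the whole data at once, it should scale to the large case, although it makes a full copy of the data.

```R
test <- matrix(1:18, 6, 3)
b  <- 3                              # rows per sub-block (30 in the question)
nb <- nrow(test) %/% b               # number of stacked sub-blocks
stopifnot(nrow(test) == nb * b)      # the sketch assumes an exact split

# [row in block, block, column] -> [row in block, column, block], then
# flatten to b rows: block 1's columns come first, then block 2's, etc.
out <- matrix(aperm(array(test, dim = c(b, nb, ncol(test))), c(1, 3, 2)),
              nrow = b)
out
```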

Applying a function that takes columns and rows of matrices as input with a matrix as output without using loop

I would like to write a function that takes columns and rows of matrices as arguments and gives a matrix as an output.
For example, a function that takes row i of an m by k matrix A and column j of a k by n matrix B, and returns a matrix M whose element m_{i,j} equals min(A[i,] * B[,j]) (element-wise multiplication):
Is there any simple way to avoid using loops? Does an sapply equivalent for matrices exist?
> matrix_A
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 6
[3,] 3 4 5 6 7
[4,] 0 1 2 3 4
[5,] 5 6 7 8 9
> matrix_B
[,1] [,2] [,3] [,4] [,5]
[1,] 7 6 5 4 3
[2,] 6 5 4 3 2
[3,] 1 2 3 4 5
[4,] 8 7 6 5 4
[5,] 9 8 7 6 5
>
> output_matrix <- matrix(, nrow=nrow(matrix_A), ncol=ncol(matrix_B))
> for (row_i in 1:nrow(matrix_A)) {
+ for (col_j in 1:ncol(matrix_B)) {
+ output_matrix[row_i, col_j] <- min(matrix_A[row_i,]*matrix_B[,col_j])
+ }
+ }
> output_matrix
[,1] [,2] [,3] [,4] [,5]
[1,] 3 6 5 4 3
[2,] 4 8 10 8 6
[3,] 5 10 15 12 8
[4,] 0 0 0 0 0
[5,] 7 14 21 18 12
>
Using apply from base R (with m1 <- matrix_A and m2 <- matrix_B):
apply(m2, 2, function(i) apply(m1, 1, function(j) min(j*i)))
which gives,
[,1] [,2] [,3] [,4] [,5]
[1,] 3 6 5 4 3
[2,] 4 8 10 8 6
[3,] 5 10 15 12 8
[4,] 0 0 0 0 0
[5,] 7 14 21 18 12
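A fully vectorized solution (again with m1 = matrix_A and m2 = matrix_B; the hard-coded 5s assume the 5 x 5 matrices from the question) can be: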
t(matrix(do.call(pmin,
as.data.frame(
do.call(rbind, rep(split(m1, 1:nrow(m1)), each = 5)) * do.call(rbind, rep(split(t(m2), 1:nrow(m2)), 5)))),
nrow(m1)))
You can avoid R loops (*apply functions are loops too) for this specific example. Often an efficient solution is possible, but it needs a problem-specific algorithm, as I demonstrate here. If you don't need to optimize for speed, use loops: your for loop offers the best readability and is easy to understand.
matrix_A <- matrix(c(1,2,3,0,5,
2,3,4,1,6,
3,4,5,2,7,
4,5,6,3,8,
5,6,7,4,9), 5)
matrix_B <- matrix(c(7,6,1,8,9,
6,5,2,7,8,
5,4,3,6,7,
4,3,4,5,6,
3,2,5,4,5), 5)
#all combinations of i and j
inds <- expand.grid(seq_len(nrow(matrix_A)), seq_len(ncol(matrix_B)))
#subset A and transposed B then multiply the resulting matrices
#then calculate rowwise min and turn result into a matrix
library(matrixStats)
matrix(rowMins(matrix_A[inds[[1]],] * t(matrix_B)[inds[[2]],]), nrow(matrix_A))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 3 6 5 4 3
#[2,] 4 8 10 8 6
#[3,] 5 10 15 12 8
#[4,] 0 0 0 0 0
#[5,] 7 14 21 18 12
We use expand.grid to create all possible combinations of row and column indices. We then use mapply to multiply each row-column combination element-wise and take the min of the result.
mat <- expand.grid(1:nrow(matrix_A), 1:ncol(matrix_B))
mapply(function(x, y) min(matrix_A[x,] * matrix_B[, y]) , mat[,1], mat[,2])
#[1] 3 4 5 0 7 6 8 10 0 14 5 10 15 0 21 4 8 12 0 18 3 6 8 0 12
Assuming matrix_A, matrix_B and output_matrix all have the same dimensions, we can relist the output from mapply to get the original dimensions.
output_matrix <- mapply(function(x, y) min(matrix_A[x,] * matrix_B[, y]),
mat[,1], mat[,2])
relist(output_matrix, matrix_A)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 3 6 5 4 3
#[2,] 4 8 10 8 6
#[3,] 5 10 15 12 8
#[4,] 0 0 0 0 0
#[5,] 7 14 21 18 12
Here we use pmap to iterate over the rows of matrix_A and the columns of matrix_B:
library(tidyverse)
pmap_dbl(expand.grid(1:nrow(matrix_A), 1:ncol(matrix_B)), ~ min(matrix_A[..1, ] * matrix_B[ , ..2])) %>%
matrix(nrow = nrow(matrix_A))
[,1] [,2] [,3] [,4] [,5]
[1,] 3 6 5 4 3
[2,] 4 8 10 8 6
[3,] 5 10 15 12 8
[4,] 0 0 0 0 0
[5,] 7 14 21 18 12
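In the spirit of a problem-specific algorithm, a sketch that loops only over the shared dimension k (5 here) instead of over all (i, j) pairs: for each k, outer(A[, k], B[k, ]) holds every product A[i,k] * B[k,j], and a running element-wise pmin() across those matrices gives min over k of A[i,k] * B[k,j]. min_product is a hypothetical name; when k is small relative to m and n, this does far fewer R-level iterations than the pairwise approaches.

```R
matrix_A <- matrix(c(1,2,3,0,5, 2,3,4,1,6, 3,4,5,2,7, 4,5,6,3,8, 5,6,7,4,9), 5)
matrix_B <- matrix(c(7,6,1,8,9, 6,5,2,7,8, 5,4,3,6,7, 4,3,4,5,6, 3,2,5,4,5), 5)

# M[i, j] = min over k of A[i, k] * B[k, j]
min_product <- function(A, B) {
  stopifnot(ncol(A) == nrow(B))
  # one nrow(A) x ncol(B) matrix of products for each k
  prods <- lapply(seq_len(ncol(A)), function(k) outer(A[, k], B[k, ]))
  # element-wise minimum across them; pmin keeps the dim of its first argument
  Reduce(pmin, prods)
}

min_product(matrix_A, matrix_B)
```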

Flip the matrix

Hi everyone who both loves and hates R:
Let's say you want to turn matrix M
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
to N
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 6 5 4
[3,] 9 8 7
All you need to do is
N<-M[,c(3:1)]
And N is still a matrix.
However, when you want to turn matrix M
[,1] [,2] [,3]
[1,] 1 2 3
to N
[,1] [,2] [,3]
[1,] 3 2 1
if you do
N<-M[,c(3:1)]
R will give you
N
[1] 3 2 1
N now is a vector! Not a matrix!
My solution is
N<-M%*%diag(3)[,c(3:1)]
However, this needs extra space to store the (reversed) identity matrix.
Any better idea?
You're looking for this:
N<-M[,c(3:1),drop = FALSE]
Read ?Extract for more information. This is also a FAQ. This behavior is one of the most common debates folks have about the way things "should" be in R. My general impression is that many people agree that drop = FALSE might be a more sensible default, but that behavior is so old that changing it would be enormously disruptive to vast swaths of existing code.
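A sketch that generalises this fix to any number of columns (flip_cols is a hypothetical helper name): rev(seq_len(ncol(M))) replaces the hard-coded c(3:1), drop = FALSE keeps a one-row input as a matrix, and no identity matrix has to be built.

```R
# Reverse the column order while always returning a matrix
flip_cols <- function(M) M[, rev(seq_len(ncol(M))), drop = FALSE]

M1 <- matrix(1:3, nrow = 1)
flip_cols(M1)                                 # still a 1 x 3 matrix

M3 <- matrix(1:9, nrow = 3, byrow = TRUE)
flip_cols(M3)
```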
Alternatively, build the reversal (anti-identity) matrix explicitly and post-multiply by it:
A=t(matrix(1:25,5,5))
B=matrix(0,5,5)
for(i in 1:5){
B[i,(nrow(A)+1-i)]=1
}
A
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
# [2,] 6 7 8 9 10
# [3,] 11 12 13 14 15
# [4,] 16 17 18 19 20
# [5,] 21 22 23 24 25
A%*%B
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 4 3 2 1
# [2,] 10 9 8 7 6
# [3,] 15 14 13 12 11
# [4,] 20 19 18 17 16
# [5,] 25 24 23 22 21

Elegant indexing up to end of vector/matrix

Is it possible in R to say "I want all indices from position i to the end of the vector/matrix"?
Say I want a submatrix from 3rd column onwards. I currently only know this way:
A = matrix(rep(1:8, each = 5), nrow = 5) # just generate some example matrix...
A[,3:ncol(A)] # get submatrix from 3rd column onwards
But do I really need to write ncol(A)? Isn't there any elegant way how to say "from the 3rd column onwards"? Something like A[,3:]? (or A[,3:...])?
Sometimes it's easier to tell R what you don't want. In other words, exclude columns from the matrix using negative indexing:
Here are two alternative ways that both produce the same results:
A[, -(1:2)]
A[, -seq_len(2)]
Results:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
But to answer your question as asked: Use ncol to find the number of columns. (Similarly there is nrow to find the number of rows.)
A[, 3:ncol(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
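Both idioms have an edge case worth knowing, shown in this sketch with a hypothetical helper `cols_from`: 3:ncol(A) counts backwards when A has fewer than 3 columns (3:2 is c(3, 2)), and negative indexing with an empty index drops everything (A[, -seq_len(0)] selects no columns, because -integer(0) is still integer(0)). Comparing column positions against i with a logical index avoids both.

```R
# Columns i, i+1, ..., ncol(x) of x; zero columns if i > ncol(x)
cols_from <- function(x, i) {
  x[, seq_len(ncol(x)) >= i, drop = FALSE]
}

A <- matrix(rep(1:8, each = 5), nrow = 5)
cols_from(A, 3)                 # same as A[, 3:ncol(A)]
ncol(cols_from(A, 1))           # nothing dropped
ncol(cols_from(A[, 1:2], 3))    # no columns left; A[, 1:2][, 3:2] would be an error
```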
For rows (rather than columns, as in your example), head() and tail() can be used.
A <- matrix(rep(1:8, each = 5), nrow = 5)
tail(A, 3)
is almost the same as
A[3:dim(A)[1],]
(only the printed row names/indices differ).
Those work for vectors and data frames too:
> tail(1:10, 4)
[1] 7 8 9 10
> tail(data.frame(A = 1:5, B = 1:5), 3)
A B
3 3 3
4 4 4
5 5 5
For the column versions, you could adapt tail(), but it is a bit trickier. I wonder if NROW() and NCOL() might be useful here, rather than dim()?:
> A[, 3:NCOL(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
Or flip this on its head and instead of asking R for things, ask it to drop things instead. Here is a function that encapsulates this:
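give <- function(x, i, dimen = 1L) {
ind <- seq_len(i-1)
## when i == 1, ind is empty and x[-ind, ] would select nothing, so return x unchanged
if(isTRUE(all.equal(dimen, 1L))) { ## rows
out <- if (length(ind)) x[-ind, , drop = FALSE] else x
} else if(isTRUE(all.equal(dimen, 2L))) { ## cols
out <- if (length(ind)) x[, -ind, drop = FALSE] else x
} else {
stop("Only for 2d objects")
}
out
}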
> give(A, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 1 2 3 4 5 6 7 8
[3,] 1 2 3 4 5 6 7 8
> give(A, 3, dimen = 2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
You can use the following instruction (length(A[1, ]) is the number of columns; length(A[, 1]) would be the number of rows):
A[, 3:length(A[1, ])]
A more readable dplyr approach to the same thing:
library(dplyr)
A %>% as_tibble() %>%
select(-c(V1,V2))
A %>% as_tibble() %>%
select(V3:ncol(A))
