'R' Matrix subset without loop - r

I started using R about six months ago and have gained a little experience with it. Recently, I ran into an issue with subsetting a matrix and would like help making my current solution more efficient.
What I would like to do is the following. Suppose I have a matrix and two vectors as follows:
# matrix
a <- matrix(seq(1,100,by=1),10,10)
# vector (first column of matrix a)
b <- c(2,4,5,6,7,8)
# vector (column numbers of matrix a)
c <- c(5,3,1,4,6,2)
Just to reiterate,
Vector b refers to the first column of matrix a.
Vector c refers to column numbers of matrix a.
I would like to get tmp99 <- a[b,c:8]. However, when I do that I get the following warning message.
Warning message:
In c:8 : numerical expression has 6 elements: only the
first used (index has to be scalar and not vector)
So I worked around the problem using a loop and a list, which gives the result I want. I assume there is a more time-efficient solution. What I have so far is the following:
a <- matrix(seq(1,100,by=1),10,10)
b <- c(2,4,5,6,7,8)
c <- c(5,3,1,4,6,2)
tmp <- list()
for (i in 1:length(b)) tmp[[i]] <- c(a[b[i],(c[i]:8)])
tmp99 <- t(sapply(tmp, '[', 1:max(sapply(tmp, length))))
tmp99[is.na(tmp99)] <- 0
Is there a way to avoid the loop? My matrix is 200000 x 200 and I have to do this many times, so I would like to cut down the time taken. (In my problem, b and c are computed by another part of the code, so I cannot hard-code the index numbers.) Any help will be greatly appreciated. Thank you.

You might try a matrix-indexing solution like this. It's not clear whether it will actually be faster: in small cases I think it definitely will be, but in big cases the overhead of creating the index matrices might exceed the cost of just running through a for loop. To get a better answer, make up a data set similar to yours that we could test against.
idx.in <- cbind(rep(b, 8-c+1), unlist(lapply(c, function(x) x:8)))
idx.out <- cbind(rep(seq_along(b), 8-c+1), unlist(lapply(c, function(x) 1:(8-x+1))))
tmp99 <- array(0, dim=apply(idx.out, 2, max))
tmp99[idx.out] <- a[idx.in]
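For readers who haven't met it, the code above relies on a base R feature: indexing a matrix by a two-column matrix picks one element per (row, column) pair instead of a rectangular block. A minimal sketch using the question's a (the names pairs and picked are only illustrative):

```r
a <- matrix(seq(1, 100, by = 1), 10, 10)

# Each row of `pairs` is one (row, column) coordinate in `a`.
pairs <- cbind(c(2, 4, 5), c(5, 3, 1))
picked <- a[pairs]              # a[2, 5], a[4, 3], a[5, 1]
picked
# [1] 42 24  5

# By contrast, ordinary two-vector indexing returns the full 3 x 3 block.
dim(a[c(2, 4, 5), c(5, 3, 1)])
# [1] 3 3
```

This is why idx.in and idx.out each need one row per extracted value.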
Here's a version that also uses matrix indexing but processes each row separately. This might be faster, depending on how many rows and columns are involved. What you want to avoid is running out of memory, and the for loop can help with that, as it doesn't keep the index details for every step in memory at the same time.
out <- array(0, dim=c(length(b), 8-min(c)+1))
for(idx in seq_along(b)) {
out[cbind(idx, 1:(8-c[idx]+1))] <- a[cbind(b[idx], c[idx]:8)]
}
out
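If the lapply calls turn out to be the bottleneck, the same index matrices can probably be built without them using base R's sequence(). This is only a sketch under the question's assumptions (every slice ends at column 8); last_col, len, pos and rows are names introduced here for illustration.

```r
a <- matrix(seq(1, 100, by = 1), 10, 10)
b <- c(2, 4, 5, 6, 7, 8)
c <- c(5, 3, 1, 4, 6, 2)
last_col <- 8                      # hypothetical name for the fixed end column

len  <- last_col - c + 1           # number of values taken from each row
pos  <- sequence(len)              # 1..len[1], 1..len[2], ... (output columns)
rows <- rep(seq_along(b), len)     # output row for every extracted value

# Fill a zero matrix in one assignment; untouched cells stay 0 (the padding).
out <- matrix(0, nrow = length(b), ncol = max(len))
out[cbind(rows, pos)] <- a[cbind(rep(b, len), rep(c, len) + pos - 1)]
out
```

Whether this beats the loop on a 200000 x 200 matrix is something to measure with system.time() on realistic data, since the index matrices hold one row per extracted value.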

The following is one way to do it using base packages. There might be a better solution using data.table, but this works :)
a <- matrix(seq(1, 100, by = 1), 10, 10)
b <- c(2, 4, 5, 6, 7, 8)
c <- c(5, 3, 1, 4, 6, 2)
res <- t(sapply(X = mapply(FUN = function(b, c) expand.grid(b, seq(from = c, to = 8)), b, c, SIMPLIFY = FALSE), FUN = function(x) {
c(a[as.matrix(x)], rep(0, 8 - nrow(x)))
}))
res
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 42 52 62 72 0 0 0 0
## [2,] 24 34 44 54 64 74 0 0
## [3,] 5 15 25 35 45 55 65 75
## [4,] 36 46 56 66 76 0 0 0
## [5,] 57 67 77 0 0 0 0 0
## [6,] 18 28 38 48 58 68 78 0
# Let's break it down in multiple steps.
coordinates <- mapply(FUN = function(b, c) expand.grid(b, seq(from = c, to = 8)), b, c, SIMPLIFY = FALSE)
# The sapply below subsets a using each element of coordinates and pads the result with 0s so that 8 elements are returned in total.
res <- sapply(X = coordinates, FUN = function(x) {
c(a[as.matrix(x)], rep(0, 8 - nrow(x)))
})
res
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 42 24 5 36 57 18
## [2,] 52 34 15 46 67 28
## [3,] 62 44 25 56 77 38
## [4,] 72 54 35 66 0 48
## [5,] 0 64 45 76 0 58
## [6,] 0 74 55 0 0 68
## [7,] 0 0 65 0 0 78
## [8,] 0 0 75 0 0 0
# you probably need the result transposed
res <- t(res)
res
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 42 52 62 72 0 0 0 0
## [2,] 24 34 44 54 64 74 0 0
## [3,] 5 15 25 35 45 55 65 75
## [4,] 36 46 56 66 76 0 0 0
## [5,] 57 67 77 0 0 0 0 0
## [6,] 18 28 38 48 58 68 78 0

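# For each target row, take a[b[i], c[i]:8] and pad with zeros to width 8.
tmp <- lapply(seq_along(b), function(i) {
  res <- a[b[i], c[i]:8]
  c(res, rep(0, c[i] - 1))
})
# Stack the padded rows into a matrix.
tmp99 <- do.call(rbind, tmp)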
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 42 52 62 72 0 0 0 0
# [2,] 24 34 44 54 64 74 0 0
# [3,] 5 15 25 35 45 55 65 75
# [4,] 36 46 56 66 76 0 0 0
# [5,] 57 67 77 0 0 0 0 0
# [6,] 18 28 38 48 58 68 78 0

Related

Extract submatrix from matrix

I create a matrix in R with 10x10 (10 rows and 10 columns):
matriz <- matrix(1:100, nrow = 10, ncol = 10, byrow=T)
I want to extract square submatrices (3x3) from matrix (matriz), randomly and without overlap.
I came across "subset.matrix" in R, but I couldn't get it to select submatrices randomly.
Any suggestion?
You can define the following function f
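f <- function(mat, submat.size = 3) {
  ridx <- Filter(function(x) length(x) == submat.size, split(sample(seq(nrow(mat))), ceiling(seq(nrow(mat)) / submat.size)))
  cidx <- Filter(function(x) length(x) == submat.size, split(sample(seq(ncol(mat))), ceiling(seq(ncol(mat)) / submat.size)))
  # Enumerate every (row block, column block) pair and draw two distinct pairs,
  # so the same block can never be returned twice.
  blocks <- expand.grid(r = seq_along(ridx), c = seq_along(cidx))
  pick <- blocks[sample(nrow(blocks), min(2, nrow(blocks))), ]
  Map(function(r, c) mat[ridx[[r]], cidx[[c]]], pick$r, pick$c)
}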
This function returns a pair of random, non-overlapping sub-matrices.
Example Result
> f(matriz)
[[1]]
[,1] [,2] [,3]
[1,] 68 67 70
[2,] 38 37 40
[3,] 88 87 90
[[2]]
[,1] [,2] [,3]
[1,] 63 62 69
[2,] 33 32 39
[3,] 83 82 89
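The key step in f is how the row and column indices are grouped: the indices are shuffled, cut into consecutive chunks of submat.size, and incomplete chunks are dropped. A small sketch of just that step for 10 indices and size 3 (the names shuffled, groups and full are illustrative, and set.seed is there only for reproducibility):

```r
set.seed(1)                       # only to make the sketch reproducible
n <- 10
size <- 3

shuffled <- sample(seq(n))                           # random order of 1..10
groups   <- split(shuffled, ceiling(seq(n) / size))  # chunks of 3, 3, 3 and 1
full     <- Filter(function(x) length(x) == size, groups)

lengths(groups)   # chunk sizes 3 3 3 1: the last chunk is incomplete
length(full)      # 3 complete chunks remain
```

Because the chunks are disjoint, sub-matrices built from different row chunks or different column chunks cannot share a cell.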
If you want all possible exclusive random sub-matrices each time, you can try
f2 <- function(mat, submat.size = 3) {
ridx <- Filter(function(x) length(x) == submat.size, split(sample(seq(nrow(mat))), ceiling(seq(nrow(mat)) / submat.size)))
cidx <- Filter(function(x) length(x) == submat.size, split(sample(seq(ncol(mat))), ceiling(seq(ncol(mat)) / submat.size)))
r <- list()
for (i in seq_along(ridx)) {
for (j in seq_along(cidx)) {
r[[length(r) + 1]] <- mat[ridx[[i]], cidx[[j]]]
}
}
r
}
and you will obtain
> f2(matriz)
[[1]]
[,1] [,2] [,3]
[1,] 3 6 5
[2,] 63 66 65
[3,] 83 86 85
[[2]]
[,1] [,2] [,3]
[1,] 2 8 4
[2,] 62 68 64
[3,] 82 88 84
[[3]]
[,1] [,2] [,3]
[1,] 1 10 7
[2,] 61 70 67
[3,] 81 90 87
[[4]]
[,1] [,2] [,3]
[1,] 13 16 15
[2,] 33 36 35
[3,] 23 26 25
[[5]]
[,1] [,2] [,3]
[1,] 12 18 14
[2,] 32 38 34
[3,] 22 28 24
[[6]]
[,1] [,2] [,3]
[1,] 11 20 17
[2,] 31 40 37
[3,] 21 30 27
[[7]]
[,1] [,2] [,3]
[1,] 43 46 45
[2,] 53 56 55
[3,] 73 76 75
[[8]]
[,1] [,2] [,3]
[1,] 42 48 44
[2,] 52 58 54
[3,] 72 78 74
[[9]]
[,1] [,2] [,3]
[1,] 41 50 47
[2,] 51 60 57
[3,] 71 80 77
I agree with the comment from user2974951 regarding randomness. However, this code block will do what you asked.
matriz <- matrix(1:100, nrow = 10, ncol = 10, byrow=T)
attempts <- 50
# Initialize a list to hold the results
sub_mats <- vector(mode = "list", length = attempts)
# The top-left corner of a 3x3 submatrix can't have a row or column index > 8
rand_x <- sample(1:8, attempts, replace = T)
rand_y <- sample(1:8, attempts, replace = T)
for (i in 1:attempts) {
# Get the three-length vectors
x_range <- rand_x[i] : (rand_x[i] + 2)
y_range <- rand_y[i] : (rand_y[i] + 2)
# Subset the matrix
sub_mat <- matriz[x_range, y_range]
# We'll use NAs to mark submatrices from previous loops
if (any(is.na(sub_mat))) next
# If there's no overlap, add it to the list
sub_mats[[i]] <- sub_mat
# Set this submatrix as NAs
matriz[x_range, y_range] <- rep(NA, 9)
}
# Remove failed attempts
sub_mats <- sub_mats[!sapply(sub_mats, is.null)]
Instead of a set number of attempts for the loop, you could use a counter. With 50 attempts, I get 4-6 sub-matrices. 1000 gives 6-8.
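One way to read the counter suggestion is a while loop that stops once enough sub-matrices have been collected. The sketch below assumes a target of 4 and a cap of 1000 tries; wanted, max_tries and tries are hypothetical names, not part of the code above.

```r
matriz <- matrix(1:100, nrow = 10, ncol = 10, byrow = TRUE)

wanted    <- 4      # hypothetical target number of sub-matrices
max_tries <- 1000   # hypothetical safety cap so the loop always terminates
sub_mats  <- list()
tries     <- 0

while (length(sub_mats) < wanted && tries < max_tries) {
  tries <- tries + 1
  # Random top-left corner, then the two following rows/columns
  x_range <- sample(1:8, 1) + 0:2
  y_range <- sample(1:8, 1) + 0:2
  sub_mat <- matriz[x_range, y_range]
  # NAs mark cells already used by an earlier sub-matrix
  if (anyNA(sub_mat)) next
  sub_mats[[length(sub_mats) + 1]] <- sub_mat
  matriz[x_range, y_range] <- NA
}
length(sub_mats)
```

The cap matters because a 10x10 matrix may have no free 3x3 block left before the target is reached.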

Is there a way to vectorize selection of columns (with repetition) from a matrix?

I have a matrix L of size n x k and a vector Z of size p. Z is composed of integers which represent the column indices of L. I want to create a matrix X of size n x p which is the aggregation of the corresponding columns of L selected based on the values in Z.
Z = c(1, 3, 1, 2)
L = matrix(c(73,50,4,14,87,5,34,51,17,57,47,65),nrow=4)
> L
[,1] [,2] [,3]
[1,] 73 87 17
[2,] 50 5 57
[3,] 4 34 47
[4,] 14 51 65
I want X to be
> X
[,1] [,2] [,3] [,4]
[1,] 73 17 73 87
[2,] 50 57 50 5
[3,] 4 47 4 34
[4,] 14 65 14 51
In my original data, p, k and n are quite big (30K, 500 and 2K, respectively), and a loop over all Z values to select and combine the columns from L takes a very long time. Can there be a vectorized way (no loops) to do this task?
Pretty sure this is just:
L[,Z]
# [,1] [,2] [,3] [,4]
#[1,] 73 17 73 87
#[2,] 50 57 50 5
#[3,] 4 47 4 34
#[4,] 14 65 14 51
R doesn't care if you have repeating column indexes when you do selections from most objects.
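A quick check of that on the question's data, plus one caveat that may matter if Z ever has length 1: by default `[` drops a single selected column to a plain vector, and drop = FALSE keeps the matrix shape. This is a sketch, not a benchmark.

```r
L <- matrix(c(73, 50, 4, 14, 87, 5, 34, 51, 17, 57, 47, 65), nrow = 4)
Z <- c(1, 3, 1, 2)

X <- L[, Z]                    # columns 1, 3, 1, 2 of L, repeats included
dim(X)
# [1] 4 4

# With a single index, the result is a vector unless drop = FALSE is given.
dim(L[, 2])                    # NULL
dim(L[, 2, drop = FALSE])
# [1] 4 1
```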

Multiply specific columns of one matrix with specific columns of another matrix for many indices

I have two large matrices P and Q (around 10k x 50k each; to test this yourself, random 10x10 matrices for P and Q are sufficient). I have a list of indices, e.g.
i j
1 4
1 625
1 9207
2 827
... ...
etc. This means that I need to find the dot product of column 1 in P and column 4 in Q, then column 1 in P and column 625 in Q and so on. I could easily solve this with a for loop but I know they are not very efficient in R. Anyone got any ideas?
edit: asked for a reproducible example
P <- matrix(c(1,0,1,0,0,1,0,1,0), nrow = 3, ncol = 3)
Q <- matrix(c(0,0,1,0,1,0,1,0,1), nrow = 3, ncol = 3)
i <- c(1,1,2)
j <- c(2,1,3)
gives output (if in dot product form)
1: 0
2: 1
3: 1
P <- matrix(1:50, nrow = 5,ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1,2,4,7)
j <- c(5,3,7,2)
P
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
Q
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
P[,i] * Q[, j]
# [,1] [,2] [,3] [,4]
# [1,] 21 66 496 186
# [2,] 44 84 544 224
# [3,] 69 104 594 264
# [4,] 96 126 646 306
# [5,] 125 150 700 350
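The element-wise product above still has one value per row; summing each of its columns gives the dot products the question asks for. A minimal sketch on the question's 3 x 3 example (colSums is base R; the name dots is only illustrative):

```r
P <- matrix(c(1, 0, 1, 0, 0, 1, 0, 1, 0), nrow = 3, ncol = 3)
Q <- matrix(c(0, 0, 1, 0, 1, 0, 1, 0, 1), nrow = 3, ncol = 3)
i <- c(1, 1, 2)
j <- c(2, 1, 3)

# Column k of the product is P[, i[k]] * Q[, j[k]]; its sum is the dot product.
dots <- colSums(P[, i, drop = FALSE] * Q[, j, drop = FALSE])
dots
# [1] 0 1 1
```

This only ever holds matrices with length(i) columns, which should scale better than forming a full matrix product of the selected columns when the index list is long.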
Using matrix multiplication, you can do
diag(t(P[, i]) %*% Q[, j])
[1] 0 1 1
Here is a second solution with apply.
apply(cbind(i, j), 1, function(x) t(P[, x[1]]) %*% Q[, x[2]])
[1] 0 1 1
To verify these agree in a second example:
set.seed(1234)
A <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
B <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
inds <- matrix(sample(10, 10, replace=TRUE), 5)
matrix multiplication
diag(t(A[, inds[,1]]) %*% B[, inds[,2]])
[1] 215 260 306 237 317
and apply
apply(inds, 1, function(x) t(A[, x[1]]) %*% B[, x[2]])
[1] 215 260 306 237 317

Converting code from MATLAB to R

I'm trying to convert some code from MATLAB to R.
I'm having particular problems converting this part of a differential equation:
In MATLAB :
dA.*(A*N - N.*sum(A,2))
where dA is a scalar, A is a 10x10 matrix and N is a 10x1 matrix (see example code below)
In R so far I've got this:
dA*(A*N - N*colSums(A))
but for some reason it doesn't seem to be giving the same result. Does anyone have any ideas as to what I've done wrong?
Example of the data I'm using below:
in MATLAB:
dA = 0.1;
N = 120000*ones(1,nN);
seq = [0 1 0 0 0 1 0];
seq2 = repmat(seq,1,20);
seq100 = seq2(1:100)
A = AA-diag(diag(AA));
in R:
dA <- 0.1
N <- c(120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000)
num_zeros_int <- zeros(70, 1)
num_ones_int <- ones(30, 1)
seq <- c(0,1,0,0,0,1,0)
seq2<- rep(seq, times = 20)
seq100 <- seq2[1:100]  # R indexes from 1; an index of 0 is silently dropped
int_mat <- matrix(seq100, nests, nests)
Matlab expression:
dA.*(A*N - N.*sum(A,2))
where
dA: real number
A: 10 x 10 matrix
N: 10 X 1 matrix
A*N: matrix multiplication
sum(A,2): sum of rows in A (10x1 matrix)
N.*sum(A,2): element by element multiplication (10 x 1 matrix)
Let's set up the following example in R:
A = matrix(data = 1:100,nrow = 10)
N = matrix(data = 1:10)
dA = 0.1
> A
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 61 71 81 91
[2,] 2 12 22 32 42 52 62 72 82 92
[3,] 3 13 23 33 43 53 63 73 83 93
[4,] 4 14 24 34 44 54 64 74 84 94
[5,] 5 15 25 35 45 55 65 75 85 95
[6,] 6 16 26 36 46 56 66 76 86 96
[7,] 7 17 27 37 47 57 67 77 87 97
[8,] 8 18 28 38 48 58 68 78 88 98
[9,] 9 19 29 39 49 59 69 79 89 99
[10,] 10 20 30 40 50 60 70 80 90 100
> N
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] 5
[6,] 6
[7,] 7
[8,] 8
[9,] 9
[10,] 10
The first term is:
z1 = A %*% N
And the second term:
srow = rowSums(A)
z2 = srow * N
Which leads to the final result:
result = dA * (z1-z2)
Final equation
result = dA * (A %*% N - rowSums(A)*N)
This should give you the same answer as Matlab's dA.*(A*N - N.*sum(A,2))
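The original attempt differed in two places: MATLAB's A*N is a matrix product, which is %*% in R (R's * is element-wise), and sum(A,2) sums along rows, which is rowSums rather than colSums. As a sanity check, the sketch below compares the final expression with the same formula written as an explicit loop; the name check is introduced here only for illustration.

```r
A  <- matrix(1:100, nrow = 10)
N  <- matrix(1:10)
dA <- 0.1

result <- dA * (A %*% N - rowSums(A) * N)

# Same formula, one row at a time: dA * (sum_j A[r, j] * N[j] - N[r] * sum_j A[r, j])
check <- numeric(10)
for (r in 1:10) {
  check[r] <- dA * (sum(A[r, ] * N[, 1]) - N[r, 1] * sum(A[r, ]))
}
all.equal(as.vector(result), check)
# [1] TRUE
```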

Find row sums for a subset of the columns of a matrix

Here is a 10 x 12 matrix:
mat <- matrix(runif(120, 0, 1), 10)
I am trying to sum groups of columns (columns 1 through 4, 5 through 8, and 9 through 12) within each row. The desired output would be a 10 x 3 matrix.
I tried the approaches from this answer using tapply and by (with detours to rowsum and aggregate), but encountered errors with all of them.
What the OP is describing is called a row sum in R:
# using Matthew Lundberg's example data
x <- matrix(1:36, 3,12)
g = split(seq(ncol(x)), (seq(ncol(x)) - 1) %/% 4 )
sapply(g, function(cols) rowSums( x[, cols] ))
# 0 1 2
# [1,] 22 70 118
# [2,] 26 74 122
# [3,] 30 78 126
It's typical to have grouping variables over rows/observations not columns/variables. To reach this case, the OP could transpose:
rowsum( t(x), (seq(ncol(x))-1) %/% 4 )
# [,1] [,2] [,3]
# 0 22 26 30
# 1 70 74 78
# 2 118 122 126
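Note that the rowsum result above has one row per group, so it is the transpose of the layout the question asks for. A small sketch that spells out the grouping vector and transposes back (grp and sums are illustrative names):

```r
x <- matrix(1:36, 3, 12)

# Integer division maps columns 1-4 to group 0, 5-8 to group 1, 9-12 to group 2.
grp <- (seq(ncol(x)) - 1) %/% 4
grp
# [1] 0 0 0 0 1 1 1 1 2 2 2 2

# rowsum() groups rows, so transpose in, sum, and transpose back out.
sums <- t(rowsum(t(x), grp))
dim(sums)
# [1] 3 3
```

Changing the 4 to another group width works as long as that is the intended grouping; a final partial group is simply summed over fewer columns.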
You can do this with a brute-force approach, specifying each column within apply:
t(apply(x, 1, function(y) c(sum(y[1:4]), sum(y[5:8]), sum(y[9:12]))))
It's easier to see with non-random data, and a shorter matrix for input:
> x <- matrix(1:36, 3,12)
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 4 7 10 13 16 19 22 25 28 31 34
[2,] 2 5 8 11 14 17 20 23 26 29 32 35
[3,] 3 6 9 12 15 18 21 24 27 30 33 36
> t(apply(x, 1, function(y) c(sum(y[1:4]), sum(y[5:8]), sum(y[9:12]))))
[,1] [,2] [,3]
[1,] 22 70 118
[2,] 26 74 122
[3,] 30 78 126
You can also split the vector with split, and while this is more idiomatic for R and more flexible, it is not really more readable:
> t(apply(x, 1, function(y) sapply(split(y, ceiling(seq_along(y)/4)), sum)))
1 2 3
[1,] 22 70 118
[2,] 26 74 122
[3,] 30 78 126
We could convert to array, use apply with MARGIN=1 and get the colSums
n <- 4
t(apply(array(mat, dim=c(nrow(mat), n, ncol(mat)/n)), 1, colSums))
Or another option is melt/acast from library(reshape2)
library(reshape2)
acast(melt(mat), Var1~(Var2-1)%/%n, value.var='value', sum)
The wrapper function recast can be used to make this compact
recast(mat, Var1~(Var2-1)%/%4, id.var=NULL, sum)
