apply function to subsets of each row in R

I am struggling to find a way to apply a specific function using apply, only to a "chunk" of a specific row.
For instance, I have a matrix:
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
[,1] [,2] [,3] [,4]
[1,] 5 3 10 1
[2,] 12 2 7 11
[3,] 4 8 9 6
And I would like to end up with a new matrix made up of two sums per row: the sum of the first two values and the sum of the last two values. Like so:
[,1] [,2]
[1,] 8 11
[2,] 14 18
[3,] 12 15
I have tried something like this:
chunks<-c("1:2","3:4")
sumchunks<-function(x,chunks){
apply(x,1,
function(row){
for (i in chunks){
v<-sum(row[chunks[i]])
}})
}
But it doesn't work at all. Any suggestion on successful ways?
Thank you.

You can do:
chunks <- list(1:2, 3:4)
sumchunks <- function(x, chunks) sapply(chunks, function(ch) sum(x[ch]))
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
apply(x, 1, sumchunks, chunks=chunks)
# [,1] [,2] [,3]
# [1,] 8 14 12
# [2,] 11 18 15
You may want to transpose the result with t() so that each row of the output corresponds to a row of x.
Here is a vectorized variant:
chunks <- list(1:2, 3:4)
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
sapply(chunks, function(ch) rowSums(x[, ch, drop = FALSE])) # drop = FALSE keeps single-column chunks as matrices
# [,1] [,2]
# [1,] 8 11
# [2,] 14 18
# [3,] 12 15
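For what it's worth, the attempt in the question fails for two reasons: the chunks are stored as strings ("1:2"), which cannot be used as numeric indices, and the body of the anonymous function is a for loop, which returns NULL, so apply has nothing to collect. Below is a minimal corrected sketch of the same idea. It assumes the chunks are kept as a list of index vectors and that there are at least two chunks; the use of vapply is my choice, not part of the original attempt.

```r
x <- matrix(c(5, 12, 4, 3, 2, 8, 10, 7, 9, 1, 11, 6), nrow = 3)
chunks <- list(1:2, 3:4)   # index vectors, not strings

sumchunks <- function(x, chunks) {
  # For each row, compute one sum per chunk; apply() returns chunks x rows
  per_row <- apply(x, 1, function(row) {
    vapply(chunks, function(ch) sum(row[ch]), numeric(1))
  })
  t(per_row)               # transpose to rows x chunks
}

res <- sumchunks(x, chunks)
res
#      [,1] [,2]
# [1,]    8   11
# [2,]   14   18
# [3,]   12   15
```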

We can convert to array and then do
t(apply(array(x, c(3, 2, 2)), 1, colSums))
Or
sapply(seq(1, ncol(x), 2), function(i) rowSums(x[,i:(i+1)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15
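To see why the array version works, here is a small sketch of the layout. It assumes equal-width chunks whose width divides ncol(x); the names width and a are only for illustration. Because R stores matrices column by column, reshaping to rows x width x chunks puts each chunk in its own slice.

```r
x <- matrix(c(5, 12, 4, 3, 2, 8, 10, 7, 9, 1, 11, 6), nrow = 3)
width <- 2                                   # columns per chunk (assumed to divide ncol(x))

a <- array(x, c(nrow(x), width, ncol(x) / width))
# a[i, j, k] is x[i, (k - 1) * width + j], so slice k holds chunk k

res <- apply(a, c(1, 3), sum)                # sum over j for every (row, chunk) pair
res
#      [,1] [,2]
# [1,]    8   11
# [2,]   14   18
# [3,]   12   15
```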

like this?
x <- matrix(sample(1:12),nrow=3)
f = function(s) {
c(sum(s[1:2]), sum(s[3:4]))
}
t(apply(x, 1, f))

rowSums was built to sum over rows so should be quite fast. You can limit the columns you want to sum over and then cbind them to get what you want:
cbind(rowSums(x[,c(1,2)]), rowSums(x[,c(3,4)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15

Related

Split a matrix in blocks of size n with offset i (vectorized method)

I want to split matrices of size k x l into blocks of size n x n with an offset o (as Mathematica's Partition function does).
For example, given a matrix A like
A <- matrix(seq(1:16), nrow = 4, ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
and block size = 3, offset = 1, I want as output the four submatrices that I'd get from
A[1:3, 1:3]
A[1:3, 2:4]
A[2:4, 1:3]
A[2:4, 2:4]
If offset were equal to 2 or 3, the output for this example should be only the submatrix that I get from
A[1:3, 1:3]
How can I vectorize this?
There might be a more elegant way, but here is how I'd do it: a myPartition function that mimics Mathematica's Partition. First, use Map with seq to build the candidate index ranges along the rows and along the columns, stepping by the offset. Then use cross2 from purrr to build every combination of a row range with a column range. Finally, use lapply to subset the matrix with each combination and return the list of submatrices.
The results for offsets 1, 2 and 3 are below and seem to behave as expected:
library(purrr)
ind <- function(k, n, o) Map(`:`, seq(1, k-n+1, by = o), seq(n, k, by = o))
# Helper that generates the index ranges along one dimension of the matrix.
# The first seq gives the start of each range, stepping by the offset o;
# the second seq gives the matching end of each range.
# Map builds start:end for each pair, which in the OP's case is 1:3 and 2:4.
myPartition <- function(mat, n, o) {
lapply(cross2(ind(nrow(mat),n,o), ind(ncol(mat),n,o)), function(i) mat[i[[1]], i[[2]]])
}
# This is basically an lapply. cross2 builds every combination of a row range and a
# column range (1:3 & 1:3, 2:4 & 1:3, 1:3 & 2:4 and 2:4 & 2:4 in the OP's case), and
# lapply loops over the combinations and subsets the matrix.
# Testing case for offset = 1
myPartition(A, 3, 1)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# [[2]]
# [,1] [,2] [,3]
# [1,] 2 6 10
# [2,] 3 7 11
# [3,] 4 8 12
# [[3]]
# [,1] [,2] [,3]
# [1,] 5 9 13
# [2,] 6 10 14
# [3,] 7 11 15
# [[4]]
# [,1] [,2] [,3]
# [1,] 6 10 14
# [2,] 7 11 15
# [3,] 8 12 16
# Testing case for offset = 2
myPartition(A, 3, 2)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# Testing case for offset = 3
myPartition(A, 3, 3)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
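As far as I know, cross2() is deprecated in recent purrr releases, so here is a rough base R sketch of the same approach. The name my_partition_base is made up for this example, expand.grid stands in for cross2, and the code assumes n is no larger than either dimension of the matrix.

```r
# Same helper as above: index ranges along one dimension
ind <- function(k, n, o) Map(`:`, seq(1, k - n + 1, by = o), seq(n, k, by = o))

# Hypothetical base R variant of myPartition (assumes n <= nrow(mat) and n <= ncol(mat))
my_partition_base <- function(mat, n, o) {
  ri <- ind(nrow(mat), n, o)                       # row ranges
  ci <- ind(ncol(mat), n, o)                       # column ranges
  grid <- expand.grid(i = seq_along(ri), j = seq_along(ci))  # i varies fastest
  Map(function(i, j) mat[ri[[i]], ci[[j]], drop = FALSE], grid$i, grid$j)
}

A <- matrix(1:16, nrow = 4, ncol = 4)
blocks <- my_partition_base(A, 3, 1)
length(blocks)                           # 4 blocks for offset 1
length(my_partition_base(A, 3, 2))       # 1 block for offset 2
```

The blocks come out in the same order as in the output above: the row range changes fastest, then the column range.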
How about this, using base R? The idea is to generate all possible windows (winds) of size n*n, taking the offset into account, and then print the submatrix of A for every combination of a row window and a column window (perms). It works for any A of size k*l.
A <- matrix(seq(1:16), nrow = 4, ncol = 4)
c <- ncol(A); r <- nrow(A)
offset <- 1; size <- 3
sq <- seq(1, max(r,c), offset)
winds <- t(sapply(sq, function(x) c(x,(x+size-1))))
winds <- winds[winds[,2]<=max(r, c),] # check the range
if (is.vector(winds)) dim(winds) <- c(1,2) # vector to matrix
perms <- expand.grid(list(1:nrow(winds), 1:nrow(winds)))
out=apply(perms, 1, function(x) {
a11 <- winds[x[1],1];a12 <- winds[x[1],2];a21 <- winds[x[2],1];a22 <- winds[x[2],2]
if (ifelse(r<c, a12<=r, a22<=c)) { # check the range
cat("A[", a11, ":", a12, ", ", a21, ":", a22, "]", sep="", "\n")
print(A[a11:a12, a21:a22])
}
})
# A[1:3, 1:3]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# A[2:4, 1:3]
# [,1] [,2] [,3]
# [1,] 2 6 10
# [2,] 3 7 11
# [3,] 4 8 12
# A[1:3, 2:4]
# [,1] [,2] [,3]
# [1,] 5 9 13
# [2,] 6 10 14
# [3,] 7 11 15
# A[2:4, 2:4]
# [,1] [,2] [,3]
# [1,] 6 10 14
# [2,] 7 11 15
# [3,] 8 12 16
For size=3 and offset=2 or offset=3:
# A[1:3, 1:3]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
For offset=2 and size=2:
# A[1:2, 1:2]
# [,1] [,2]
# [1,] 1 5
# [2,] 2 6
# A[3:4, 1:2]
# [,1] [,2]
# [1,] 3 7
# [2,] 4 8
# A[1:2, 3:4]
# [,1] [,2]
# [1,] 9 13
# [2,] 10 14
# A[3:4, 3:4]
# [,1] [,2]
# [1,] 11 15
# [2,] 12 16

split big matrix in subsets,R

I would like to split a matrix into subgroups that each contain the same number of columns. For example, given a matrix with 1000 rows and 420 columns, I would like to split it into 35 sub-matrices in order: the first contains the first 12 columns, the second contains the next 12 columns, and so on. I think I could index it the way one indexes iris, as below. Please help me!
iris[c()]
Not sure if this is what you're after, but here's a reproducible example:
# Define matrix
M = matrix(
c(1:20),
nrow=2,
ncol=10)
# Split into 5 submatrices of equal size
lapply(split(M, rep(1:5, each = 4)), matrix, ncol = 2)
$`1`
[,1] [,2]
[1,] 1 3
[2,] 2 4
$`2`
[,1] [,2]
[1,] 5 7
[2,] 6 8
$`3`
[,1] [,2]
[1,] 9 11
[2,] 10 12
$`4`
[,1] [,2]
[1,] 13 15
[2,] 14 16
$`5`
[,1] [,2]
[1,] 17 19
[2,] 18 20
You can turn your 1000 × 420 matrix into a 1000 × 12 × 35 array with
dim(x) <- c(1000, 12, 35)
where x is the original matrix. Then x[, , 1] gives you the first 1000 × 12 sub-matrix, x[, , 2] the second sub-matrix, and so forth.
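A small sketch to check that the reshaped slices really are the consecutive column blocks. It uses a 2 × 12 stand-in matrix m with blocks of 4 columns instead of the 1000 × 420 case; the names m, width and n_blocks are only for illustration, and the block width is assumed to divide the number of columns.

```r
m <- matrix(1:24, nrow = 2)          # 2 x 12 stand-in for the big matrix
width <- 4                           # columns per block (assumed to divide ncol(m))
n_blocks <- ncol(m) / width

# Array version: slice k holds columns ((k - 1) * width + 1):(k * width)
a <- array(m, dim = c(nrow(m), width, n_blocks))

# List version, built by explicit column indexing
blocks <- lapply(seq_len(n_blocks), function(k) {
  m[, ((k - 1) * width + 1):(k * width), drop = FALSE]
})

# Both give the same sub-matrices
all_same <- all(vapply(seq_len(n_blocks),
                       function(k) identical(a[, , k], blocks[[k]]),
                       logical(1)))
all_same
```

The array form avoids copying into a list; the list form is handier if the blocks are to be passed to lapply later.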

subtract a constant vector from each row in a matrix in r

I have a matrix with 5 columns and 4 rows. I also have a vector of length 3. I want to subtract the values in the vector from columns 3, 4 and 5 respectively, in every row of the matrix.
b <- matrix(rep(1:20), nrow=4, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
c <- c(5,6,7)
to get
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 4 7 10
[2,] 2 6 5 8 11
[3,] 3 7 6 9 12
[4,] 4 8 7 10 13
This is exactly what sweep was made for:
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- sweep(b[,3:5], 2, x)
b
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 5 4 7 10
#[2,] 2 6 5 8 11
#[3,] 3 7 6 9 12
#[4,] 4 8 7 10 13
...or even without subsetting or reassignment (applied to the original b):
sweep(b, 2, c(0,0,x))
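A quick sketch to confirm that the two sweep forms agree when both start from the original b. The names res1 and res2 are only for illustration, and FUN = "-" is spelled out even though it is sweep's default.

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

# Form 1: sweep only columns 3 to 5 and assign them back
res1 <- b
res1[, 3:5] <- sweep(b[, 3:5], 2, x, FUN = "-")

# Form 2: sweep the whole matrix, padding x with zeros for columns 1 and 2
res2 <- sweep(b, 2, c(0, 0, x), FUN = "-")

all(res1 == res2)
```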
Perhaps not that elegant, but
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- t(t(b[,3:5])-x)
should do the trick. We subset the matrix to change only the part we need, and we use t() (transpose) to flip it so that simple vector recycling lines each value of x up with the correct column of the original; the outer t() flips the result back.
If you want to avoid the transposes, you could do something like
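To illustrate why the transposes matter, here is a small sketch comparing a plain subtraction with the transposed one. The names wrong and right are only for illustration. R recycles a vector down the columns of a matrix, so without t() the values of x get matched to rows instead of columns.

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

# Plain subtraction: x is recycled down each column, so it is matched to rows
wrong <- b[, 3:5] - x
wrong[, 1]   # 9 - 5, 10 - 6, 11 - 7, 12 - 5
# [1] 4 4 4 7

# Transposed subtraction: x is matched to the columns of the original
right <- t(t(b[, 3:5]) - x)
right[, 1]   # every value in column 3 minus 5
# [1] 4 5 6 7
```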
b[,3:5] <- b[,3:5]-x[col(b[,3:5])]
as well. Here we subset twice: col(b[,3:5]) gives the column number of every element, so x[col(b[,3:5])] repeats each value of x in the same order as the elements of b[,3:5].
I think my favorite from the question that @thelatemail linked was
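A short sketch of what col() contributes here. The names sub and expanded are only for illustration.

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)
sub <- b[, 3:5]

col(sub)[1, ]   # column number of each element in the first row
# [1] 1 2 3

# Indexing x by col(sub) repeats x[1] for column 1, x[2] for column 2, and so on,
# in the same column-major order in which the elements of sub are stored
expanded <- matrix(x[col(sub)], nrow = nrow(sub))
expanded[, 2]
# [1] 6 6 6 6

res <- sub - expanded
```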
b[,3:5] <- sweep(b[,3:5], 2, x, `-`)
Another way, with apply:
b[,3:5] <- t(apply(b[,3:5], 1, function(x) x-c))
A simple solution:
b <- matrix(rep(1:20), nrow=4, ncol=5)
c <- c(5,6,7)
for(i in 1:nrow(b)) {
b[i,3:5] <- b[i,3:5] - c
}
This can be done with the rray package in a very satisfying way, using its NumPy-like broadcasting subtraction operator %b-%:
#install.packages("rray")
library(rray)
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5, 6, 7)
b[, 3:5] <- b[, 3:5] %b-% matrix(x, 1)
b
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 5 4 7 10
#> [2,] 2 6 5 8 11
#> [3,] 3 7 6 9 12
#> [4,] 4 8 7 10 13
For large matrices this is even faster than sweep:
#install.packages("bench")
res <- bench::press(
size = c(10, 1000, 10000),
frac_selected = c(0.1, 0.5, 1),
{
B <- matrix(sample(size*size), nrow=size, ncol=size)
B2 <- B
x <- sample(size, size=ceiling(size*frac_selected))
idx <- sample(size, size=ceiling(size*frac_selected))
bench::mark(rray = {B2[, idx] <- B[, idx, drop = FALSE] %b-% matrix(x, nrow = 1); B2},
sweep = {B2[, idx] <- sweep(B[, idx, drop = FALSE], MARGIN = 2, x); B2}
)
}
)
plot(res)

Form matrix from rows in 3-dimensional array

I have X, a three-dimensional array in R. I want to take a vector of indices indx (length equal to dim(X)[1]) and form a matrix where the first row is the first row of X[ , , indx[1]], the second row is the second row of X[ , , indx[2]], and so on.
For example, I have:
R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
R> indx <- c(2, 3, 1)
My desired output is
R> rbind(X[1, , 2], X[2, , 3], X[3, , 1])
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
As of now I'm using the inelegant (and slow) sapply(1:dim(X)[2], function(x) X[cbind(1:3, x, indx)]). Is there any way to do this using the built-in indexing functions? I had no luck experimenting with the matrix indexing methods described in ?Extract, but I may just be doing it wrong.
Maybe like this:
t(sapply(1:3,function(x) X[,,indx][x,,x]))
I may be answering the wrong question (I can't reconcile your first description and your sample output)... This produces your sample output, but I can't say that it's much faster without running it on your data.
do.call(rbind, lapply(1:dim(X)[1], function(i) X[i, , indx[i]]))
Matrix indexing to the rescue! No applys needed.
Figure out which indices you want:
n <- dim(X)[2]
foo <- cbind(rep(seq_along(indx),n),
rep(seq.int(n), each=length(indx)),
rep(indx,n))
(the result is this)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 1 3
[3,] 3 1 1
[4,] 1 2 2
[5,] 2 2 3
[6,] 3 2 1
and use it as index, converting back to a matrix to make it look like your output.
> matrix(X[foo],ncol=n)
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
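The same idea can be wrapped in a small function and checked against the rbind version from the question. This is only a sketch: the name pick_rows is made up for this example, and it assumes length(indx) equals dim(X)[1].

```r
X <- array(1:18, dim = c(3, 2, 3))
indx <- c(2, 3, 1)

# Hypothetical helper: row i of the result is row i of X[, , indx[i]]
pick_rows <- function(X, indx) {
  n <- dim(X)[2]
  idx <- cbind(rep(seq_along(indx), times = n),       # row index
               rep(seq_len(n), each = length(indx)),  # column index
               rep(indx, times = n))                  # slice index
  matrix(X[idx], ncol = n)                            # one value per (row, column) pair
}

res <- pick_rows(X, indx)
expected <- rbind(X[1, , 2], X[2, , 3], X[3, , 1])
all(res == expected)
```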
