I currently have a 185*185 matrix, and the goal is to convert it into a 35*35 matrix by aggregating (summing) the values over groups of rows and columns of the 185*185 matrix.
Example:
I have an 8*8 matrix as below:
matrix_x <- matrix(1:64, nrow = 8)
Then I want to convert it into a 4*4 matrix:
matrix_y <- matrix(NA, nrow = 4, ncol = 4)
The list below defines, for each row/column of the 4*4 matrix, which rows/columns of the 8*8 matrix are aggregated into it:
col_list <- list(
1,
2:3,
c(4,8),
5:7
)
So far I have done this by assigning each value manually:
matrix_y[1,1] <- sum(matrix_x[col_list[[1]],col_list[[1]]])
matrix_y[1,2] <- sum(matrix_x[col_list[[1]],col_list[[2]]])
matrix_y[1,3] <- sum(matrix_x[col_list[[1]],col_list[[3]]])
matrix_y[1,4] <- sum(matrix_x[col_list[[1]],col_list[[4]]])
matrix_y[2,1] <- sum(matrix_x[col_list[[2]],col_list[[1]]])
matrix_y[2,2] <- sum(matrix_x[col_list[[2]],col_list[[2]]])
matrix_y[2,3] <- sum(matrix_x[col_list[[2]],col_list[[3]]])
matrix_y[2,4] <- sum(matrix_x[col_list[[2]],col_list[[4]]])
matrix_y[3,1] <- sum(matrix_x[col_list[[3]],col_list[[1]]])
matrix_y[3,2] <- sum(matrix_x[col_list[[3]],col_list[[2]]])
matrix_y[3,3] <- sum(matrix_x[col_list[[3]],col_list[[3]]])
matrix_y[3,4] <- sum(matrix_x[col_list[[3]],col_list[[4]]])
matrix_y[4,1] <- sum(matrix_x[col_list[[4]],col_list[[1]]])
matrix_y[4,2] <- sum(matrix_x[col_list[[4]],col_list[[2]]])
matrix_y[4,3] <- sum(matrix_x[col_list[[4]],col_list[[3]]])
matrix_y[4,4] <- sum(matrix_x[col_list[[4]],col_list[[4]]])
This approach works, but it needs one line per cell (16 lines here, and 1,225 for a 35*35 result), so I'm looking for a more concise way to achieve this.
There may be a neater way to do this, but here is one straightforward option:
n <- 4
t(sapply(seq_len(n), function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 2 28 84 126
#[3,] 3 30 86 129
#[4,] 4 32 88 132
This reproduces matrix_y from the original version of the question, in which only the columns were aggregated (rows 1 to 4 were kept as-is).
For the updated question, where both rows and columns are aggregated, we can use outer:
apply_fun <- function(x, y) sum(matrix_x[x, y])
outer(col_list, col_list, Vectorize(apply_fun))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 5 58 170 255
#[3,] 12 72 184 276
#[4,] 18 108 276 414
Or, following the same approach as in the original answer, with nested sapply:
t(sapply(col_list, function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
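For the real 185*185 to 35*35 case, a loop-free alternative (a swapped-in technique, not one of the answers above) is to encode col_list as a 0/1 membership matrix G and compute t(G) %*% matrix_x %*% G: entry [i, j] then sums matrix_x over the rows in group i and the columns in group j. A minimal sketch, assuming the groups do not overlap; G is a name introduced here for illustration:

```r
matrix_x <- matrix(1:64, nrow = 8)
col_list <- list(1, 2:3, c(4, 8), 5:7)

# Membership matrix (8 x 4): G[r, g] is 1 if index r belongs to group g, else 0.
G <- sapply(col_list, function(idx) as.numeric(seq_len(nrow(matrix_x)) %in% idx))

# crossprod(G, M) equals t(G) %*% M, so this is t(G) %*% matrix_x %*% G.
matrix_y <- crossprod(G, matrix_x %*% G)
matrix_y
```

It should give the same values as the outer() result above. For the real data, G would be a 185 x 35 matrix built the same way from a list of 35 index vectors.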
I want to fill an array using the apply function.
For example, I want to simplify the repetition in the following code:
tmp <- matrix(1:100, 20, 5)
i <- replicate(2, sample(1:nrow(tmp), 3))
dt <- array(NA, c(3, 5, 2))
# the repetition:
dt[, , 1] <- tmp[i[,1], ]
dt[, , 2] <- tmp[i[,2], ]
The following may be a solution:
for (s in 1:2) {
dt[, , s] <- tmp[i[,s], ]
}
But I do not want to use a for loop.
The purpose is to make pseudo data sets for a simulation.
That is, "tmp" is a population matrix with 20 individuals and 5 variables.
Each column of "i" holds the indices of the persons sampled in one of two iterations of the simulation.
Thus, I am trying to stack the two samples in an array.
How can I turn the two repeated lines in the code into just one line?
I think an apply function may be a candidate for this.
(This question is closely related to "Fill a NA matrix using apply functions", but the solution might be quite different.)
Instead of filling dt, subset tmp using each column of i inside apply, and then create the array with the desired dimensions.
(dt <- array(apply(i, 2, \(x) tmp[x, ]), dim=c(3, 5, 2)))
# , , 1
#
# [,1] [,2] [,3] [,4] [,5]
# [1,] 14 34 54 74 94
# [2,] 4 24 44 64 84
# [3,] 15 35 55 75 95
#
# , , 2
#
# [,1] [,2] [,3] [,4] [,5]
# [1,] 15 35 55 75 95
# [2,] 19 39 59 79 99
# [3,] 20 40 60 80 100
Your for loop, however, is perfectly fine and easy to understand.
Data:
tmp <- structure(1:100, dim = c(20L, 5L))
i <- structure(c(14L, 4L, 15L, 15L, 19L, 20L), dim = 3:2)
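Another loop-free option (a different technique from the apply() answer) is sapply() with simplify = "array", which stacks the 3 x 5 matrix returned for each sample into a 3 x 5 x 2 array, so the dimensions do not have to be typed by hand. A sketch using the data above; dt2 is just a new name chosen here:

```r
tmp <- structure(1:100, dim = c(20L, 5L))
i <- structure(c(14L, 4L, 15L, 15L, 19L, 20L), dim = 3:2)

# One call per column of i; each call returns a 3 x 5 matrix, and
# simplify = "array" binds these along a new third dimension.
dt2 <- sapply(seq_len(ncol(i)), function(s) tmp[i[, s], ], simplify = "array")
dim(dt2)
```

This still assumes every sample has the same size, so that all slices share one shape.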
I have a cluster of 250 observations. Each observation is a 4 by 9 matrix:
4 is the number of variables observed and 9 is the number of days on which observations were collected.
I want to know the variance across the 250 observations, which are in matrix form. As far as I have studied, variance is calculated for one-dimensional variables.
Any suggestions for data in matrix form?
mat1 <- matrix(c(0:69),10,7)
mat2 <- matrix(c(3:72),10,7)
mat3 <- matrix(c(0:69),10,7)
...
var <- var(mat1,mat2, mat3,..)
For these three matrices, var() returns a 7 by 7 matrix with 9.166667 in every element. I do not know what R is doing, or how to get what I want.
I think this will reflect what you're hoping for.
First, I'll create a few very small matrices:
set.seed(42)
mat1 <- matrix(sample(100,12),2,4)
mat2 <- matrix(sample(100,12),2,4)
mat3 <- matrix(sample(100,12),2,4)
From here, I think you want to get
var(c(mat1[1,1], mat2[1,1], mat3[1,1]))
# [1] 193
but for every set of corresponding cells across all matrices.
One way to do this is to abind all matrices into a 3D array and then use apply:
ary <- do.call(abind::abind, c(list(mat1, mat2, mat3), along = 3))
ary
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 49 25 18 47
# [2,] 65 74 100 24
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 26 41 27 5
# [2,] 3 89 36 84
# , , 3
# [,1] [,2] [,3] [,4]
# [1,] 24 43 22 8
# [2,] 30 15 58 36
apply(ary, 1:2, var)
# [,1] [,2] [,3] [,4]
# [1,] 193.0000 97.33333 20.33333 549
# [2,] 966.3333 1530.33333 1057.33333 1008
Here, 193 is the variance of the [1,1] elements, 97.333 is the variance of the [1,2] elements, and so on.
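With 250 matrices, the same cell-wise variance can also be computed on whole matrices at once instead of building an array (a swapped-in alternative, not part of the answer above): take the element-wise mean with Reduce(`+`, ...), then sum the element-wise squared deviations and divide by k - 1, the same denominator var() uses. A sketch assuming the matrices are held in a list L; to keep it reproducible without the random seed, the three matrices printed above are typed in directly:

```r
# The three 2 x 4 matrices printed above, entered column by column.
mat1 <- matrix(c(49, 65, 25, 74, 18, 100, 47, 24), nrow = 2)
mat2 <- matrix(c(26, 3, 41, 89, 27, 36, 5, 84), nrow = 2)
mat3 <- matrix(c(24, 30, 43, 15, 22, 58, 8, 36), nrow = 2)
L <- list(mat1, mat2, mat3)  # with real data: a list of all 250 matrices

k <- length(L)
# Element-wise mean across the k matrices.
M <- Reduce(`+`, L) / k
# Element-wise sample variance, with denominator k - 1 as in var().
V <- Reduce(`+`, lapply(L, function(m) (m - M)^2)) / (k - 1)
V
```

V[1, 1] should again be 193, and V should match apply(ary, 1:2, var).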
The arguments to var are:
> args(var)
function (x, y = NULL, na.rm = FALSE, use)
So mat1 is passed to x, mat2 to y, and mat3 to na.rm. Element [i, j] of the result is the covariance of x[, i] and y[, j].
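A quick way to see this is to call var() with only the first two matrices from the question. Every column of mat1 and mat2 is a run of 10 consecutive integers, so each column-by-column covariance equals var(1:10) = 55/6, which explains the 9.166667 in every cell. A small check:

```r
mat1 <- matrix(c(0:69), 10, 7)
mat2 <- matrix(c(3:72), 10, 7)

# x = mat1, y = mat2: a 7 x 7 matrix of column-by-column covariances.
C <- var(mat1, mat2)
dim(C)
# Entry [2, 5] is the covariance of column 2 of mat1 and column 5 of mat2.
all.equal(C[2, 5], cov(mat1[, 2], mat2[, 5]))
# Every entry equals var(1:10), i.e. about 9.166667.
all.equal(C, matrix(var(1:10), 7, 7))
```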
The code in the question really makes no sense, and some reading of ?var would help. It is not clear what "I want to know the variance between 250 observations which are in matrix form" means. If it means that v[i, j] should be the variance of c(mat1[i, j], mat2[i, j], mat3[i, j]), then we can use one of several list comprehension packages or just nested sapply calls. They all rely on the fact that, for fixed i and j, the following two expressions are equivalent, the first being more general:
var(sapply(L, `[`, i, j))
var(c(L[[1]][i, j], L[[2]][i,j], L[[3]][i,j]))
The syntax for the listcompr alternative seems particularly intuitive here.
L <- list(mat1, mat2, mat3)
nr <- nrow(L[[1]])
nc <- ncol(L[[1]])
library(listcompr)
v1 <- gen.matrix(var(sapply(L, `[`, i, j)), i = 1:nr, j = 1:nc)
# or
library(eList)
v2 <- Mat(for(j in 1:nc) for(i in 1:nr) var(sapply(L, `[`, i, j)))
# or (no packages):
v3 <- sapply(1:nc, \(j) sapply(1:nr, \(i) var(sapply(L, `[`, i, j))))
# checks
identical(v1, v2)
## [1] TRUE
identical(v1, v3)
## [1] TRUE
i <- 2; j <- 3
identical(v1[i, j], var(c(L[[1]][i, j], L[[2]][i,j], L[[3]][i,j])))
## [1] TRUE
I would like to compute the product of each row of a matrix x with itself, and then sum the results of all these products. The result is a scalar. I wrote the following code, which works but is not efficient. Can someone help me avoid the for loop?
resid2 <- numeric(nrow(x))
for(i in 1:nrow(x)){
resid2[i] <- t(x[i,]) %*% x[i,]
}
V = sum(resid2)/
The solution is just the sum of squares of all elements of the matrix.
V = sum(x^2)
which can also be calculated via matrix multiplication as:
V = crossprod(as.vector(x))
The intermediate vector resid2 can be calculated as
resid2 = rowSums(x^2)
V = sum(resid2)
Here is an answer that swaps the for loop out for the apply family.
sum(apply(x, MARGIN = 1, function(z) z %*% z))
The apply function takes the matrix x; MARGIN = 1 means for each row (as opposed to MARGIN = 2, which means each column). Note that the argument name is uppercase: a lowercase margin = 1 is not matched to MARGIN. So, for each row in x, apply runs a function that multiplies that row by itself: function(z) z %*% z
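To check that the loop, sum(x^2), rowSums() and apply() versions agree, here is a small sketch on an arbitrary 2 x 3 matrix (the values are made up for illustration):

```r
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Loop version from the question, with resid2 pre-allocated.
resid2 <- numeric(nrow(x))
for (i in seq_len(nrow(x))) resid2[i] <- drop(x[i, ] %*% x[i, ])

v_loop  <- sum(resid2)
v_sq    <- sum(x^2)                                        # sum of squares
v_rows  <- sum(rowSums(x^2))                               # via per-row sums
v_apply <- sum(apply(x, MARGIN = 1, function(z) z %*% z))  # apply version
c(v_loop, v_sq, v_rows, v_apply)  # all four are 91
```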
If I understand you correctly, you don't need to loop at all. mat %*% mat should do it:
mat <- matrix(seq.int(9), nrow=3)
mat
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
mat %*% mat
## [,1] [,2] [,3]
## [1,] 30 66 102
## [2,] 36 81 126
## [3,] 42 96 150
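Note that mat %*% mat multiplies rows by columns, so its entries are not the row-with-itself products from the question. Those products sit on the diagonal of mat %*% t(mat), which tcrossprod() computes. A short check with the same mat:

```r
mat <- matrix(seq.int(9), nrow = 3)

# tcrossprod(mat) is mat %*% t(mat); element [i, i] is row i dotted with itself.
row_self <- diag(tcrossprod(mat))
row_self  # 66 93 126
# Summing them gives the scalar from the question, i.e. sum(mat^2).
sum(row_self) == sum(mat^2)
```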
How can one extract the "diagonal" from a three-dimensional array in R? For a matrix (2D array) one can use the diag(...) function. In a similar way, given an N x N x M array, a natural operation is to convert it into an N x M matrix by taking the diagonal from each N x N slice and returning it as a matrix.
It's easy to do this using a loop, but that is not idiomatic R and is slow. Another possibility is to use slightly complex indexing (see my own answer to this question), but it is a bit hard to read. What other alternatives are there? Is there a standard R way to do this?
Create an array and fill it with some values:
> a=array(0,c(10,10,5))
> for (i in 1:10) for (j in 1:10) for (k in 1:5) a[i,j,k]=100*i+10*j+k-111
Run the apply function:
> apply(a,3,diag)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 2 3 4
[2,] 110 111 112 113 114
[3,] 220 221 222 223 224
[4,] 330 331 332 333 334
[5,] 440 441 442 443 444
[6,] 550 551 552 553 554
[7,] 660 661 662 663 664
[8,] 770 771 772 773 774
[9,] 880 881 882 883 884
[10,] 990 991 992 993 994
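The triple loop used above to fill a can itself be replaced by outer(), because a[i, j, k] = 100*i + 10*j + k - 111 is a sum of separate terms in i, j and k. A sketch that builds the same array without loops and then takes the diagonals:

```r
# outer() of a 10 x 10 matrix with a length-5 vector gives a 10 x 10 x 5 array.
a <- outer(outer(100 * (1:10), 10 * (1:10), "+"), 1:5, "+") - 111
dim(a)
apply(a, 3, diag)
```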
Various diagonals:
A = array(1:12, c(2, 2, 3))
apply(A, 1, diag)
# [,1] [,2]
#[1,] 1 2
#[2,] 7 8
apply(A, 2, diag)
# [,1] [,2]
#[1,] 1 3
#[2,] 6 8
apply(A, 3, diag)
# [,1] [,2] [,3]
#[1,] 1 5 9
#[2,] 4 8 12
Although I'm not enamored of the term "3d.diagonal" for this result, it can be achieved with this simple function (the result matches diag.3d(arr) up to a transpose):
arr <- array(1:27,c(3,3,3) )
apply(arr, 3, function(x) x[row(x)==col(x)] )
# same values as diag.3d(arr), transposed
[,1] [,2] [,3]
[1,] 1 10 19
[2,] 5 14 23
[3,] 9 18 27
I think a "real diagonal" would be arr[cbind(1:3, 1:3, 1:3)].
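For an N x N x N array, that "real diagonal" generalizes to an index matrix whose i-th row is (i, i, i); matrix(seq_len(n), n, 3) builds exactly that. A small sketch, where n is a name introduced here for illustration:

```r
arr <- array(1:27, c(3, 3, 3))
n <- dim(arr)[1]

# Row i of idx is c(i, i, i), so arr[idx] returns arr[1,1,1], arr[2,2,2], ...
idx <- matrix(seq_len(n), nrow = n, ncol = 3)
arr[idx]  # 1 14 27
```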
One possible approach is to use indexing, where the indices are a matrix with three columns. For example:
diag.3d <- function(A) {
# Expect a N x N x M array
stopifnot(length(dim(A)) == 3)
n <- nrow(A)
stopifnot(n == ncol(A))
m <- dim(A)[3]
IXS <- cbind(1:n, 1:n, rep(1:m, each = n))
cn <- colnames(A)
rn <- dimnames(A)[[3]]
matrix(A[IXS], ncol = n, byrow = TRUE, dimnames = list(rn, cn))
}
Admittedly, the indices (in the variable IXS) are a bit hard to read.
Another approach is subsetting the three-dimensional array with a two-dimensional index matrix:
a <- array(1:100,dim = c(5,5,4))
ref <- cbind(1:5,1:5,rep(1:4,each= 5))
a[ref]
The output is a vector instead of a matrix. On my computer it is more efficient than apply(), and the same index matrix can also be used to fill (assign) the diagonal values.
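Two follow-ups to this answer, as a sketch: reshaping the vector into the N x M layout that apply(a, 3, diag) returns, and using the same index matrix on the left-hand side of an assignment (zero is an arbitrary fill value chosen here):

```r
a <- array(1:100, dim = c(5, 5, 4))
ref <- cbind(1:5, 1:5, rep(1:4, each = 5))

# Column k of d holds the diagonal of slice k, matching apply(a, 3, diag).
d <- matrix(a[ref], nrow = 5)
all(d == apply(a, 3, diag))

# The same index matrix works for assignment, here zeroing every diagonal.
a[ref] <- 0L
all(a[ref] == 0)
```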
I have taken a look at a question posted before, "sum specific columns among rows", which was about summing every 2 values in each row of a matrix. I also took a look at another question, "R Sum every k columns in matrix", which is more similar to mine, but I could not get its solution to work in this case. Here is the code that I am working with:
y <- matrix(1:27, nrow = 3)
y
m1 <- as.matrix(y)
n <- 3
dim(m1) <- c(nrow(m1)/n, ncol(m1), n)
res <- matrix(rowSums(apply(m1, 1, I)), ncol=n)
identical(res[1,],rowSums(y[1:3,]))
sapply(split.default(y, 0:(length(y)-1) %/% 3), rowSums)
I just get an error message when applying this. The desired output is a matrix with the following values:
[,1] [,2] [,3]
[1,] 12 39 66
[2,] 15 42 69
[3,] 18 45 72
To sum consecutive sets of n elements from each row, you just need to write a function that does the summing and apply it to each row:
n <- 3
t(apply(y, 1, function(x) tapply(x, ceiling(seq_along(x)/n), sum)))
# 1 2 3
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
Transform the matrix to an array and use colSums (as suggested by @nongkrong):
y <- matrix(1:27, nrow = 3)
n <- 3
a <- y
dim(a) <- c(nrow(a), ncol(a)/n, n)
b <- aperm(a, c(2,1,3))
colSums(b)
# [,1] [,2] [,3]
#[1,] 12 39 66
#[2,] 15 42 69
#[3,] 18 45 72
Of course this assumes that ncol(y) is divisible by n.
PS: You can of course avoid creating so many intermediate objects. They are there for didactic purposes.
I would do something similar to the OP and apply rowSums to column subsets of the matrix:
n = 3
ng = ncol(y)/n
sapply( 1:ng, function(jg) rowSums(y[, (jg-1)*n + 1:n ]))
# [,1] [,2] [,3]
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
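Base R's rowsum() offers another route (a swapped-in technique, not one of the answers above): it sums the rows of a matrix by a grouping vector, so transposing y, summing within column groups, and transposing back gives the same result. A sketch, where grp is a name introduced here:

```r
y <- matrix(1:27, nrow = 3)
n <- 3

# Group label for each column: 1,1,1,2,2,2,3,3,3.
grp <- ceiling(seq_len(ncol(y)) / n)

# rowsum() sums rows within groups, so work on t(y) and transpose back.
res <- t(rowsum(t(y), grp))
res
```

Because the labels come from ceiling(), the last group is simply smaller when ncol(y) is not divisible by n.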