Fill a NA array using apply functions - r

I want to fill an array using the apply function.
For example, my purpose is to simplify the repetition in the following code
tmp <- matrix(1:100, 20, 5)
i <- replicate(2, sample(1:nrow(tmp), 3))
dt <- array(NA, c(3, 5, 2))
# the repetition:
dt[, , 1] <- tmp[i[,1], ]
dt[, , 2] <- tmp[i[,2], ]
The following may be a solution:
for (s in 1:2) {
dt[, , s] <- tmp[i[,s], ]
}
However, I do not want to use a for loop.
The purpose is to make pseudo data sets for a simulation.
That is, "tmp" is a matrix for the population, with 20 individuals and 5 variables.
Each column of "i" holds the indices of the sampled persons for one of the two iterations of the simulation.
Thus, I am trying to stack the two samples in an array.
How can I turn the two assignment lines in the code into just one line?
I think an apply function may be a candidate for this.
Can we reduce the last lines?
(This question is closely related to Fill a NA matrix using apply functions, but the solution might be quite different.)

Instead of filling dt, subset tmp with i inside apply and create the array with the desired dimensions. (The \(x) shorthand needs R 4.1 or later; on older versions write function(x).)
(dt <- array(apply(i, 2, \(x) tmp[x, ]), dim=c(3, 5, 2)))
# , , 1
#
# [,1] [,2] [,3] [,4] [,5]
# [1,] 14 34 54 74 94
# [2,] 4 24 44 64 84
# [3,] 15 35 55 75 95
#
# , , 2
#
# [,1] [,2] [,3] [,4] [,5]
# [1,] 15 35 55 75 95
# [2,] 19 39 59 79 99
# [3,] 20 40 60 80 100
Your for loop, however, is perfectly fine and well understandable.
Data:
tmp <- structure(1:100, dim = c(20L, 5L))
i <- structure(c(14L, 4L, 15L, 15L, 19L, 20L), dim = 3:2)
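As a hedged alternative to the array(apply(...)) call above, here is a minimal sketch of two other ways to build the same array, checked against the for loop from the question. The names dt_loop, dt_sapply and dt_vec are made up for this illustration; tmp and i are the data given in the answer.

```r
# Data from the answer above
tmp <- structure(1:100, dim = c(20L, 5L))
i <- structure(c(14L, 4L, 15L, 15L, 19L, 20L), dim = 3:2)

# Reference result: the for loop from the question
dt_loop <- array(NA_integer_, c(3, 5, 2))
for (s in 1:2) dt_loop[, , s] <- tmp[i[, s], ]

# sapply() with simplify = "array" stacks the 3 x 5 slices along a third dimension
dt_sapply <- sapply(seq_len(ncol(i)), function(s) tmp[i[, s], ], simplify = "array")

# Fully vectorised: pick all sampled rows at once, then reshape and reorder dimensions
dt_vec <- aperm(array(tmp[c(i), ], c(nrow(i), ncol(i), ncol(tmp))), c(1, 3, 2))

all(dt_sapply == dt_loop)
all(dt_vec == dt_loop)
```

The vectorised form relies on tmp[c(i), ] returning the sampled rows in the column order of i, so the 6 x 5 result can be reshaped to 3 x 2 x 5 and then permuted to 3 x 5 x 2.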

Related

How to calculate variance between observations, when observations are in matrix form each

I have a cluster of 250 observations. Each observation is a 4 by 9 matrix.
4 is the number of variable parameters observed, and 9 is the number of days on which observations were collected.
I want to know the variance between the 250 observations, which are in matrix form. As far as I have studied, variance is calculated for one-dimensional variables.
Any suggestions for data in matrix form?
mat1 <- matrix(c(0:69),10,7)
mat2 <- matrix(c(3:72),10,7)
mat3 <- matrix(c(0:69),10,7)
...
var <- var(mat1,mat2, mat3,..)
For these three matrices, var() returns a 7 by 7 matrix with 9.166667 in every element. I do not know what R is doing, or how it gets to this.
I think this will reflect what you're hoping for.
First, I'll create a few matrices, very small:
set.seed(42)
mat1 <- matrix(sample(100,12),2,4)
mat2 <- matrix(sample(100,12),2,4)
mat3 <- matrix(sample(100,12),2,4)
From here, I think you want to get
var(c(mat1[1,1], mat2[1,1], mat3[1,1]))
# [1] 193
but for every set of cells in all matrices.
One way to do this is to abind all matrices into a 3D array and then use apply:
ary <- do.call(abind::abind, c(list(mat1, mat2, mat3), along = 3))
ary
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 49 25 18 47
# [2,] 65 74 100 24
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 26 41 27 5
# [2,] 3 89 36 84
# , , 3
# [,1] [,2] [,3] [,4]
# [1,] 24 43 22 8
# [2,] 30 15 58 36
apply(ary, 1:2, var)
# [,1] [,2] [,3] [,4]
# [1,] 193.0000 97.33333 20.33333 549
# [2,] 966.3333 1530.33333 1057.33333 1008
Where 193 is the variance of the [1,1] elements, 97.333 is the variance of the [1,2] elements, etc.
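If installing abind is not an option, the same stacking step can probably be done in base R. The following is only a sketch under that assumption: it uses simplify2array() on a list of equally sized matrices, and the small matrices and the names obs and cell_var are invented for the illustration.

```r
# Small deterministic matrices, made up for this illustration
mat1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
mat2 <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 2)
mat3 <- matrix(c(3, 9, 0, 1, 7, 5), nrow = 2)
obs <- list(mat1, mat2, mat3)   # with real data this would hold all 250 matrices

# Stack the equally sized matrices into a 2 x 3 x 3 array (base R only)
ary <- simplify2array(obs)

# Variance across the third dimension, cell by cell
cell_var <- apply(ary, 1:2, var)

# Spot check against the manual calculation for cell [1, 1]
all.equal(cell_var[1, 1], var(c(mat1[1, 1], mat2[1, 1], mat3[1, 1])))
```

Keeping the observations in a list from the start avoids having 250 separately named matrices in the workspace.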
The arguments to var are:
> args(var)
function (x, y = NULL, na.rm = FALSE, use)
so mat1 is being passed to x and mat2 to y and mat3 to na.rm. Element i, j of the result is the covariance of x[, i] and y[, j].
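To see why the question's call produced 9.166667 everywhere, here is a small sketch of the two-argument form using the first two matrices from the question (the third argument is left out, since it would be matched to na.rm). The names x, y and v are chosen for the illustration only.

```r
x <- matrix(0:69, 10, 7)   # mat1 from the question
y <- matrix(3:72, 10, 7)   # mat2 from the question

# Two-argument form: element [i, j] is the covariance of x[, i] and y[, j]
v <- var(x, y)

# Every column of x and y is 0:9 plus a constant, so each pairwise
# covariance equals var(0:9), which is 9.166667
v[2, 5]
cov(x[, 2], y[, 5])
var(0:9)
```

Adding a constant to a column does not change its covariance with another column, which is why all 49 entries coincide.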
The call in the question therefore does not compute what was intended, and some reading of ?var would help. It is not clear what "I want to know the variance between 250 observations which are in matrix form" means. If it means that v[i, j] should be calculated as the variance of c(mat1[i, j], mat2[i, j], mat3[i, j]), then we can use one of several list comprehension packages or just iterated sapply. They all rely on the fact that, with L <- list(mat1, mat2, mat3), the following two expressions give the same result for fixed i and j, the first being more general.
var(sapply(L, `[`, i, j))
var(c(L[[1]][i, j], L[[2]][i,j], L[[3]][i,j]))
The syntax for the listcompr alternative seems particularly intuitive here.
L <- list(mat1, mat2, mat3)
nr <- nrow(L[[1]])
nc <- ncol(L[[1]])
library(listcompr)
v1 <- gen.matrix(var(sapply(L, `[`, i, j)), i = 1:nr, j = 1:nc)
# or
library(eList)
v2 <- Mat(for(j in 1:nc) for(i in 1:nr) var(sapply(L, `[`, i, j)))
# or (no packages):
v3 <- sapply(1:nc, \(j) sapply(1:nr, \(i) var(sapply(L, `[`, i, j))))
# checks
identical(v1, v2)
## [1] TRUE
identical(v1, v3)
## [1] TRUE
i <- 2; j <- 3
identical(v1[i, j], var(c(L[[1]][i, j], L[[2]][i,j], L[[3]][i,j])))
## [1] TRUE

Convert bigger dimension matrix to smaller dimension matrix with a loop

I currently have 185*185 matrix and the goal is to convert this matrix into a 35*35 matrix by aggregating the value based on the rows and cols of the 185 matrix.
Example:
I have a 8*8 matrix as below:
matrix_x <- matrix(1:64, nrow = 8)
Then I want to convert it into a 4*4 matrix:
matrix_y <- matrix(NA, nrow = 4, ncol = 4)
The list below is created for aggregating the 8*8 matrix cols to a 4*4 matrix
col_list <- list(
1,
2:3,
c(4,8),
5:7
)
What I've done to achieve this is by assigning the value manually as below
matrix_y[1,1] <- sum(matrix_x[col_list[[1]],col_list[[1]]])
matrix_y[1,2] <- sum(matrix_x[col_list[[1]],col_list[[2]]])
matrix_y[1,3] <- sum(matrix_x[col_list[[1]],col_list[[3]]])
matrix_y[1,4] <- sum(matrix_x[col_list[[1]],col_list[[4]]])
matrix_y[2,1] <- sum(matrix_x[col_list[[2]],col_list[[1]]])
matrix_y[2,2] <- sum(matrix_x[col_list[[2]],col_list[[2]]])
matrix_y[2,3] <- sum(matrix_x[col_list[[2]],col_list[[3]]])
matrix_y[2,4] <- sum(matrix_x[col_list[[2]],col_list[[4]]])
matrix_y[3,1] <- sum(matrix_x[col_list[[3]],col_list[[1]]])
matrix_y[3,2] <- sum(matrix_x[col_list[[3]],col_list[[2]]])
matrix_y[3,3] <- sum(matrix_x[col_list[[3]],col_list[[3]]])
matrix_y[3,4] <- sum(matrix_x[col_list[[3]],col_list[[4]]])
matrix_y[4,1] <- sum(matrix_x[col_list[[4]],col_list[[1]]])
matrix_y[4,2] <- sum(matrix_x[col_list[[4]],col_list[[2]]])
matrix_y[4,3] <- sum(matrix_x[col_list[[4]],col_list[[3]]])
matrix_y[4,4] <- sum(matrix_x[col_list[[4]],col_list[[4]]])
This approach works well, but I'm looking for a more efficient way to achieve this since the approach I've done takes so many code lines.
There may be a neater way to do this, but here is one straightforward option:
n <- 4
t(sapply(seq_len(n), function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 2 28 84 126
#[3,] 3 30 86 129
#[4,] 4 32 88 132
This reproduces matrix_y from the original version of the question. Note that it aggregates the columns only and takes rows 1 to 4 as they are.
For the updated question, we can use outer
apply_fun <- function(x, y) sum(matrix_x[x, y])
outer(col_list, col_list, Vectorize(apply_fun))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 5 58 170 255
#[3,] 12 72 184 276
#[4,] 18 108 276 414
Or following the same approach as in original answer with nested sapply
t(sapply(col_list, function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
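For the 185 by 185 case, a grouped sum may scale better than calling sum() once per cell. The sketch below is one possible way to do that with rowsum(), not the answerer's method; it assumes every row/column index appears in exactly one element of col_list, and the names g and row_summed are made up for the illustration.

```r
matrix_x <- matrix(1:64, nrow = 8)
col_list <- list(1, 2:3, c(4, 8), 5:7)

# Group label for each index: g[k] is the group that row/column k belongs to
g <- integer(nrow(matrix_x))
g[unlist(col_list)] <- rep(seq_along(col_list), lengths(col_list))

# Sum rows by group, then sum the columns of that result by group
row_summed <- rowsum(matrix_x, g)           # 4 x 8: rows are row groups
matrix_y <- t(rowsum(t(row_summed), g))     # 4 x 4: columns are column groups

# Compare with the nested sapply() result from the answer
expected <- t(sapply(col_list, function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
all(unname(matrix_y) == expected)
```

The result carries the group labels as dimnames, so unname() is used before comparing.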

is there a way to skip rows within lapply to automate a task?

How can I use an R function to automate something like this? I understand how to do it if row1 and row2 were one digit apart, but the only way I can think of to skip values like this is to use a global variable (even though the offset is exactly 5 each time). Can someone suggest a better method? This is only a snippet; I have to perform this task almost 2000 times.
cond<-rbind(c(1,2,3),c(4,5,6),c(9,9,9),c(9,9,9),c(9,9,9),c(9,9,9),c(7,8,9))
row1<-rbind(cond[c(1,2),])
avatar10<-matrix(colMeans(row1), ncol=3, byrow = TRUE)
row2<-rbind(cond[c(6,7),])
avatar20<-matrix(colMeans(row2), ncol=3, byrow = TRUE)
result<-rbind(avatar10,avatar20)
You can get the indices you want to apply it to with seq, using a step argument of 5. Then you can use sapply to perform your colMeans operation on each. In your case:
# example 10000 by 3 matrix
cond <- replicate(3, rnorm(10000))
# stop at nrow(cond) - 1 so that i + 1 never runs past the last row
indices <- seq(1, nrow(cond) - 1, 5)
result <- t(sapply(indices, function(i) colMeans(cond[i:(i + 1), ])))
(Note that you have to transpose the result, since sapply stores each result as a column of a matrix, while you want each result as a row.)
In the case of the specific problem you're solving, there's an even easier (and more computationally efficient) solution:
result <- (cond[indices, ] + cond[indices + 1, ]) / 2
However, this may not extend to more complex problems (e.g. if you're averaging more than two rows).
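If more than two rows need to be averaged, the sapply idea can be wrapped in a small helper. This is only a sketch: block_means and its step and width parameters are hypothetical names introduced here, and cond is the 7-row matrix from the question.

```r
# Hypothetical helper: column means of `width` consecutive rows,
# starting at rows 1, 1 + step, 1 + 2 * step, ...
block_means <- function(m, step = 5, width = 2) {
  # last start is chosen so that a full block of `width` rows still fits
  starts <- seq(1, nrow(m) - width + 1, by = step)
  t(vapply(
    starts,
    function(s) colMeans(m[s:(s + width - 1), , drop = FALSE]),
    numeric(ncol(m))
  ))
}

# Data from the question
cond <- rbind(c(1, 2, 3), c(4, 5, 6), c(9, 9, 9), c(9, 9, 9),
              c(9, 9, 9), c(9, 9, 9), c(7, 8, 9))

# Averages rows 1-2 and rows 6-7, like avatar10 and avatar20 in the question
result <- block_means(cond, step = 5, width = 2)
result
```

vapply is used instead of sapply so that the shape of each result is checked explicitly.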
You can use rowsum.
Here's an example, just to make sure we're talking about the same thing:
set.seed(1)
n <- 5
nr <- 50
dat <- replicate(3, rnorm(nr))
idx <- sapply(seq(1, nr-1, by=n), `+`, 0:1)
idx
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
So, you want to col-mean rows 1&2, 6&7, 11&12 and so on.
res = rowsum(dat[c(idx),], c(col(idx))) / nrow(idx)
# [,1] [,2] [,3]
# 1 -0.22140524 -0.10696026 -0.2891254
# 2 -0.16651967 0.80658921 1.2419974
# 3 0.95081220 1.18118888 -0.5486906
# 4 -0.03056194 -0.80808316 -0.3564004
# 5 0.85055684 -0.11721845 0.4185407
# 6 -0.10596212 -0.07592282 0.3195510
# 7 0.62794591 -0.35192367 -0.2643670
# 8 -0.40464226 0.69802510 -0.9187130
# 9 -0.20894264 0.33267389 -0.3688881
# 10 -0.17145660 -0.35905289 0.6681738
# for comparison, the first row computed manually
colMeans(dat[c(1,2),])
# [1] -0.2214052 -0.1069603 -0.2891254
rowsum is a really weird name for this, but it's all explained in the docs, at ?rowsum.

Sum Every N Values in Matrix

I have looked at a previously posted question about summing every 2 values in each row of a matrix: sum specific columns among rows. I also looked at another question, R Sum every k columns in matrix, which is more similar to mine, but I could not get its solution to work in my case. Here is the code that I am working with...
y <- matrix(1:27, nrow = 3)
y
m1 <- as.matrix(y)
n <- 3
dim(m1) <- c(nrow(m1)/n, ncol(m1), n)
res <- matrix(rowSums(apply(m1, 1, I)), ncol=n)
identical(res[1,],rowSums(y[1:3,]))
sapply(split.default(y, 0:(length(y)-1) %/% 3), rowSums)
I just get an error message when applying this. The desired output is a matrix with the following values:
[,1] [,2] [,3]
[1,] 12 39 66
[2,] 15 42 69
[3,] 18 45 72
To sum consecutive sets of n elements from each row, you just need to write a function that does the summing and apply it to each row:
n <- 3
t(apply(y, 1, function(x) tapply(x, ceiling(seq_along(x)/n), sum)))
# 1 2 3
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
Transform the matrix to an array and use colSums (as suggested by @nongkrong):
y <- matrix(1:27, nrow = 3)
n <- 3
a <- y
dim(a) <- c(nrow(a), ncol(a)/n, n)
b <- aperm(a, c(2,1,3))
colSums(b)
# [,1] [,2] [,3]
#[1,] 12 39 66
#[2,] 15 42 69
#[3,] 18 45 72
Of course this assumes that ncol(y) is divisible by n.
PS: You can of course avoid creating so many intermediate objects. They are there for didactic purposes.
I would do something similar to the OP -- apply rowSums on subsets of the matrix:
n = 3
ng = ncol(y)/n
sapply( 1:ng, function(jg) rowSums(y[, (jg-1)*n + 1:n ]))
# [,1] [,2] [,3]
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
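Another way to express "sum every n columns" is a matrix product with an indicator matrix. The sketch below is an illustration rather than one of the posted answers; the names ng, ind and res are chosen here, and it assumes ncol(y) is divisible by n.

```r
y <- matrix(1:27, nrow = 3)
n <- 3
ng <- ncol(y) / n   # number of column groups

# 9 x 3 indicator matrix: column g has ones in the n rows that belong to group g
ind <- kronecker(diag(ng), matrix(1, nrow = n, ncol = 1))

# Each column of the product is the sum of one group of n columns of y
res <- y %*% ind
res
```

The product is stored as double, so compare with == or all.equal() rather than identical() when checking against an integer result.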

R: How to compare 2 matrices

I have 2 matrices
matrix1 (nrow=3, ncol=3)
matrix2 (nrow=5, ncol=5)
I know how to compare and conditionally replace an element of a matrix with an element of another matrix, but ONLY if these 2 elements share the same [i,j], like this:
ifelse(matrix1<0.5, matrix2[,], matrix1[,])
Question:
Here I'd like to replace an element of matrix1 with an element of matrix2 from another column, like this:
If matrix1[i,j]<0.5, then I want to replace it with matrix2[i,j+2]
Else I want to replace it with matrix2[i,j+1]
The problem is:
I can't use a loop because of performance
I don't know how to tell ifelse to move to another column.
How can I do this kind of comparison efficiently on a big matrix?
Here is the data:
> dput(matrix1)
structure(c(0.782534098718315, 0.279918688116595, 0.139927505282685,
0.485497816000134, 0.150636059232056, 0.976677459431812, 0.101831247797236,
0.491994257550687, 0.492571017006412), .Dim = c(3L, 3L))
> dput(matrix2)
structure(1:25, .Dim = c(5L, 5L))
Here m1 and m2 are the two matrices. Based on @alexis_laz's comments, you could try
indx1 <- tail(seq(ncol(m1)+1),ncol(m1))
indx2 <- tail(seq(ncol(m1)+2),ncol(m1))
rowInd <- 1:nrow(m1)
ifelse(m1 < 0.5, m2[rowInd,indx2], m2[rowInd, indx1])
# [,1] [,2] [,3]
#[1,] 6 16 21
#[2,] 12 17 22
#[3,] 13 13 23
Or you can create index by
indx <- cbind(c(row(m1)), c(col(m1)))
indx1 <- cbind(indx[,1], indx[,2]+1)
indx2 <- cbind(indx[,1], indx[,2]+2)
ifelse(m1 < 0.5, m2[indx2], m2[indx1])
# [,1] [,2] [,3]
#[1,] 6 16 21
#[2,] 12 17 22
#[3,] 13 13 23
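A variation on the index-matrix idea is to compute the column shift first and then do a single lookup in m2. This is a sketch only: the name shift is introduced here, the m1 values are the question's data rounded to three decimals, and it assumes m2 has at least ncol(m1) + 2 columns.

```r
# matrix1 from the question, rounded to three decimals for readability
m1 <- matrix(c(0.783, 0.280, 0.140,
               0.485, 0.151, 0.977,
               0.102, 0.492, 0.493), nrow = 3)
m2 <- matrix(1:25, nrow = 5)

# Column offset per cell: 2 where m1 < 0.5, otherwise 1
shift <- ifelse(m1 < 0.5, 2L, 1L)

# One (row, column) pair per cell of m1, looked up in m2 in a single step
res <- m1
res[] <- m2[cbind(c(row(m1)), c(col(m1) + shift))]
res
```

Assigning into res[] keeps the dimensions of m1, so no reshaping is needed afterwards.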
