how to avoid for loop using apply function in this question - r

mat<-matrix(1:9,nrow=3,ncol=3)
for (i in 1:3) {
  # drop = FALSE keeps a one-row matrix for i = 1, which colSums() requires
  print(colSums(mat[1:i, , drop = FALSE]))
}
I'm trying to calculate the mean of the colSums of part of a matrix.
How do I avoid the for loop in this case? The answer may be similar to the code below, but I don't know how to proceed.
apply(mat,2,function(x) colSums(mat[]))
Thanks in advance!

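The simplest way is to use cumsum() on each column to get the running column sums and rowMeans() to get their means. The output below is for a 4x4 matrix, mat <- matrix(1:16, nrow = 4), using rows 2 to 4 of the running sums; for the 3x3 matrix in the question, adjust the row index to match: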
apply(mat, 2, cumsum)[2:4, ]
# [,1] [,2] [,3] [,4]
# [1,] 3 11 19 27
# [2,] 6 18 30 42
# [3,] 10 26 42 58
rowMeans(apply(mat, 2, cumsum)[2:4, ])
# [1] 15 24 34
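For the 3x3 matrix from the question, the same idea can be checked against an explicit loop. This is only a minimal sketch; the names running, means and loop_means are illustrative and do not come from the original posts.

```r
mat <- matrix(1:9, nrow = 3, ncol = 3)

# Row i of `running` holds the column sums of rows 1..i of mat.
running <- apply(mat, 2, cumsum)

# Mean of those column sums for each i.
means <- rowMeans(running)

# Reference: the explicit loop, with drop = FALSE so that i = 1 still yields a matrix.
loop_means <- vapply(
  1:3,
  function(i) mean(colSums(mat[1:i, , drop = FALSE])),
  numeric(1)
)

print(means)
print(all.equal(means, loop_means))
```

The cumsum() version does one pass over each column, whereas the loop recomputes every partial sum from scratch.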

Related

More general or efficient approach for this matrix multiplication?

In R, is there a more efficient and/or general way to produce the desired output from the two matrices below? I'm suspicious that what I've done is just some esoteric matrix multiplication operation of which I'm not aware.
ff <- matrix(1:6,ncol=2)
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6
bb <- matrix(7:10,ncol=2)
# [,1] [,2]
# [1,] 7 9
# [2,] 8 10
# DESIRE:
# 7 36
# 14 45
# 21 54
# 8 40
# 16 50
# 24 60
This works, but isn't the general solution I'm looking for:
rr1 <- t(t(ff) * bb[1,])
rr2 <- t(t(ff) * bb[2,])
rbind(rr1,rr2)
# [,1] [,2]
# [1,] 7 36
# [2,] 14 45
# [3,] 21 54
# [4,] 8 40
# [5,] 16 50
# [6,] 24 60
This next code block seems pretty efficient and is general. But is there a better way?
Something like kronecker(ffa,bba)? (which clearly doesn't work in this case)
ffa <- matrix(rep(t(ff),2), ncol=2, byrow=T)
bba <- matrix(rep(bb,each=3), ncol=2)
ffa * bba
# [,1] [,2]
# [1,] 7 36
# [2,] 14 45
# [3,] 21 54
# [4,] 8 40
# [5,] 16 50
# [6,] 24 60
This is related to my other questions:
Using apply function over the row margin with expectation of stacked results, where I'm trying to understand the behavior of apply itself and:
Is this an example of some more general matrix product?, where I'm asking about the theoretical math, specifically.
Use a kronecker product and pick off the appropriate columns:
kronecker(bb, ff)[, c(diag(ncol(bb))) == 1]
or using the infix operator for kronecker:
(bb %x% ff)[, c(diag(ncol(bb))) == 1]
Another approach is to convert the arguments to data frames and mapply kronecker across them. For the case in the question this performs the calculation cbind(bb[, 1] %x% ff[, 1], bb[, 2] %x% ff[, 2]) but in a more general manner without resorting to indices:
mapply(kronecker, as.data.frame(bb), as.data.frame(ff))
or using the infix operator for kronecker:
mapply(`%x%`, as.data.frame(bb), as.data.frame(ff))
The functionality you are looking for is available in the Matrix package as the function KhatriRao. Since the function comes from Matrix, its output is a sparse matrix of class "dgCMatrix". You can convert it to an ordinary matrix of class "matrix" with as.matrix().
library(Matrix)
as.matrix(KhatriRao(bb, ff))
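As a quick check, the two base R approaches can be compared with the desired output from the question. This is a sketch using only base R; desired, keep, via_kronecker and via_mapply are illustrative names, and desired is typed in by hand from the question.

```r
ff <- matrix(1:6, ncol = 2)
bb <- matrix(7:10, ncol = 2)

# Desired output from the question, written out by hand.
desired <- cbind(c(7, 14, 21, 8, 16, 24),
                 c(36, 45, 54, 40, 50, 60))

# Full Kronecker product, keeping only the columns that pair
# column j of bb with column j of ff.
keep <- c(diag(ncol(bb))) == 1
via_kronecker <- kronecker(bb, ff)[, keep]

# Column-by-column Kronecker product; unname() drops the V1, V2 column names.
via_mapply <- unname(mapply(kronecker, as.data.frame(bb), as.data.frame(ff)))

print(all(via_kronecker == desired))
print(all(via_mapply == desired))
```

The full Kronecker product computes ncol(bb) * ncol(ff) columns and then discards most of them, so the column-by-column version is likely the better fit for wide matrices.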

R: Picking values from matrix by indice matrix

I have a datamatrix with n rows and m columns (in this case n=192, m=1142) and an indice matrix of nxp (192x114). Each row of the indice matrix shows the column numbers of the elements that I would like to pick from the matching row of the datamatrix. Thus I have a situation something like this (with example values):
data<-matrix(1:30, nrow=3)
data
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 4 7 10 13 16 19 22 25 28
[2,] 2 5 8 11 14 17 20 23 26 29
[3,] 3 6 9 12 15 18 21 24 27 30
columnindices<-matrix(sample(1:10,size=9, replace=TRUE),nrow=3)
columnindices
[,1] [,2] [,3]
[1,] 8 7 4
[2,] 10 8 10
[3,] 8 10 2
I would like to pick values from the rows of the datamatrix using the indices in the columnindices matrix, so that the resulting matrix would look like this:
[,1] [,2] [,3]
[1,] 22 19 10
[2,] 29 23 29
[3,] 24 30 6
I tried using a for loop:
result<-0
for(i in 1:3) {
result[i]<-data[i,][columnindices[,i]]
print[i]
}
but this doesn't give the desired result. I guess my problem should be fairly simple to solve, but unfortunately, despite many hours of work and multiple searches, I still haven't been able to solve it (I am a rookie). I would really appreciate some help!
Your loop is just a little bit off:
result <- matrix(rep(NA, 9), nrow = 3)
for(i in 1:3){
result[i,] <- data[i, columnindices[i,]]
}
> result
[,1] [,2] [,3]
[1,] 25 13 7
[2,] 29 29 23
[3,] 15 15 18
Note that the matrix is not exactly the one you posted as the expected result, because the code that generates your example columnindices uses sample() and so does not reproduce the matrix you posted. The code should work as you want it to.
The for loop described by @LAP is easier to understand and to implement.
If you would like something more general, i.e. you don't need to
adjust the number of rows every time, you can use the mapply function:
result <- mapply(
FUN = function(i, j) data[i,j],
row(columnindices),
columnindices)
dim(result) <- dim(columnindices)
mapply loops through the elements of the two matrices in parallel:
row(columnindices) supplies the row index i,
columnindices supplies the column index j.
It returns a vector, which you have to reshape to the dimensions of columnindices.
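The same (row, column) pairing can also be written without any loop, using base R's matrix indexing with a two-column index matrix. This is a sketch of an alternative, not the method from the answers above; idx is an illustrative name, and the index matrix is typed in from the question's printed example rather than re-sampled.

```r
data <- matrix(1:30, nrow = 3)
# Index matrix typed in from the question's printed example (not re-sampled).
columnindices <- matrix(c(8, 10, 8,
                          7, 8, 10,
                          4, 10, 2), nrow = 3)

# Two-column index matrix: one (row, column) pair per element to pick.
idx <- cbind(c(row(columnindices)), c(columnindices))

# data[idx] returns a vector; reshape it to the shape of columnindices.
result <- matrix(data[idx], nrow = nrow(columnindices))
print(result)
```

With these indices the result matches the expected matrix shown in the question.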

Writing a loop/function that compares adjacent columns in a matrix and picks the max value so to reduce the number of columns

I'm new to R and stuck. I want to reduce the number of columns in a 92x8192 matrix. The matrix consists of 92 observations, and each column represents a data point in a spectrum. Each value is an integer intensity. I want to reduce the "resolution" (i.e. the number of data points = columns) of the spectrum in a somewhat controlled way.
Example:
[,1] [,2] [,3] [,4] [,5] [,6] [...]
[1,] 1 2 3 4 5 6
[2,] 7 8 9 10 11 12
[3,] 13 14 15 16 17 18
[4,] 19 20 21 22 23 24
[5,] 25 26 27 28 29 30
[6,] 31 32 33 34 35 36
What I would like to do is compare adjacent columns (for each row), e.g. [1,1] and [1,2], and find the max value of those two entries (that would be [1,2] in this case). The smaller value should be dropped, and then the next two adjacent columns should be evaluated, so that in the end only ncol/2 columns are left. I know there is something like pmax, but since my knowledge of loops and functions is far too limited at this point, I don't know how to compare not just two columns at a time but all 4096 pairs of values in each row. In the end the matrix should look like this:
[,1] [,2] [,3] [...]
[1,] 2 4 6
[2,] 8 10 12
[3,] 14 16 18
[4,] 20 22 24
[5,] 26 28 30
[6,] 32 34 36
The values I have used are not a good example, because in this case it looks like I could just drop every other column, and I know how to do that.
Apologies if the question is worded in a complicated way, but I think the task isn't really all that complicated.
Thanks for any help or suggestions on how to go about this task.
Example matrix:
> set.seed(101)
> x_full <- matrix(runif(30), nrow=5)
> x_full
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.37219838 0.3000548 0.8797957 0.59031973 0.7007115 0.79571976
[2,] 0.04382482 0.5848666 0.7068747 0.82043609 0.9568375 0.07121255
[3,] 0.70968402 0.3334671 0.7319726 0.22411848 0.2133520 0.38940777
[4,] 0.65769040 0.6220120 0.9316344 0.41166683 0.6610615 0.40645122
[5,] 0.24985572 0.5458286 0.4551206 0.03861056 0.9233189 0.65935508
Now reduce:
> x_reduced <- sapply(seq(1, ncol(x_full), 2), function(colnum) { pmax(x_full[, colnum], x_full[, colnum + 1]) })
> x_reduced
[,1] [,2] [,3]
[1,] 0.3721984 0.8797957 0.7957198
[2,] 0.5848666 0.8204361 0.9568375
[3,] 0.7096840 0.7319726 0.3894078
[4,] 0.6576904 0.9316344 0.6610615
[5,] 0.5458286 0.4551206 0.9233189
How it works: seq(1, ncol(x_full), 2) generates a sequence of integers representing the odd numbers up to the number of columns of x_full. Then sapply() applies a function to this sequence and presents the results in a tidy format (in this case it happens to be a matrix as we require). The function being applied is one that we specify using function: for column numbered colnum it just applies pmax() across that column and the next.
Example solution
mat <- matrix(1:16, nrow = 4)
m <- matrix(nrow = nrow(mat), ncol = ncol(mat) / 2)  # preallocate the result matrix to save time
for (i in seq(1, ncol(mat), 2)) {
  m[, (i + 1) / 2] <- pmax(mat[, i], mat[, i + 1])  # columns 1-2 go to m[, 1], columns 3-4 to m[, 2], ...
}
Your solution is then stored in m.
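Since pmax() is already vectorised, the pairwise comparison can also be done in a single call on the odd and even columns. This is a sketch that assumes an even number of columns; x, odd, even and x_reduced are illustrative names.

```r
# Same values as the small example in the question.
x <- matrix(1:36, nrow = 6, byrow = TRUE)

odd  <- seq(1, ncol(x), by = 2)   # columns 1, 3, 5
even <- odd + 1                   # columns 2, 4, 6

# pmax() works element-wise and keeps the dimensions of its first argument.
x_reduced <- pmax(x[, odd], x[, even])
print(x_reduced)
```

For this example the result equals columns 2, 4 and 6 of x, as in the expected output shown in the question.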

Mean of list of matrices by element [duplicate]

This question already has answers here:
Mean of each element of a list of matrices
(3 answers)
Closed 6 years ago.
I have a list of matrices:
.list <- list(matrix(1:25, ncol = 5), matrix(11:35, ncol = 5))
I would like to use the Reduce method to find the element-by-element means of the matrices in the list.
In other words, I am looking for the following result:
res = matrix(6:30, ncol = 5)
I tried the following:
res = Reduce(mean, .list)
but I get an error:
Error in mean.default(init, x[[i]]) :
'trim' must be numeric of length one
Note that an element of a matrix can be NA.
Any help would be appreciated! Thank you!!
I just realized that this could be achieved the following way (using the Reduce function):
tmp = Reduce('+', .list)
result = tmp/length(.list)
This is probably easier to solve via an array, rather than a list, as R has some inbuilt, vectorised approaches to this problem.
To get an array from .list, unlist it and supply the relevant dimensions (which could be automated by looking up the dim() of .list[[1]] and length(.list)):
arr <- array(unlist(.list), dim = c(5,5,2))
Then, the desired result is obtained via rowMeans() (yes, really!)
R> rowMeans(arr, dims = 2)
[,1] [,2] [,3] [,4] [,5]
[1,] 6 11 16 21 26
[2,] 7 12 17 22 27
[3,] 8 13 18 23 28
[4,] 9 14 19 24 29
[5,] 10 15 20 25 30
The na.rm argument handles the NA case too:
R> rowMeans(arr, dims = 2, na.rm = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 6 11 16 21 26
[2,] 7 12 17 22 27
[3,] 8 13 18 23 28
[4,] 9 14 19 24 29
[5,] 10 15 20 25 30
A slower way is to use apply(), which may be more instructive as to what rowMeans() is doing:
R> apply(arr, 1:2, mean, na.rm = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 6 11 16 21 26
[2,] 7 12 17 22 27
[3,] 8 13 18 23 28
[4,] 9 14 19 24 29
[5,] 10 15 20 25 30
i.e. applying the mean function, grouping the data by the row and column dimensions. Think of the array as a box, with the height of the box being the third dimension. This box consists of little cubes, like a Rubik's cube. We want the mean of the little cubes stacked up above each row and column combination: the mean of the little cubes stacked above (1,1), and so on. This is what the apply() and rowMeans() functions do for you, if you treat the multiple matrices in a list as an array.
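The array dimensions do not have to be hard-coded. The sketch below reads them from the list itself and also inserts one NA to show the effect of na.rm; mats is an illustrative stand-in for .list, and the NA is added purely for demonstration.

```r
mats <- list(matrix(1:25, ncol = 5), matrix(11:35, ncol = 5))
mats[[1]][1, 1] <- NA   # introduce a missing value for illustration

# Stack the matrices into a 3-d array; dimensions are read from the list itself.
arr <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))

# Average over the third dimension, ignoring NAs.
res <- rowMeans(arr, dims = 2, na.rm = TRUE)
print(res)
```

Element (1,1) is then the mean of the single remaining value, 11, while every other element is the usual two-matrix mean.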
Here is one way with mapply.
matrix(do.call(mapply, c(function(...) mean(unlist(list(...))), .list)), ncol=5)
As a side note, .list isn't the best choice of variable name. In R, a leading period marks something like a "meta-variable": such names are hidden by default and won't show up when you call ls() (use ls(all.names = TRUE) to see them). You could use list. or the easier-to-read list_ instead.

do.call in r with matrix and lists

In R, in order to generate a new matrix (N*6) from an older one (N*3), is there a better way than the following, one that does not require "unpacking/unlisting" the inner lists created in the apply function in order to "expand" the source matrix?
transformed <- matrix(byrow=T)
transformed <- as.matrix(
do.call("rbind", as.list(
apply(dataset, 1, function(x) {
x <- list(x[1], x[2], x[3], x[2]*x[3], x[2]^2, x[3]^2)
})
))
)
#Unpack all inner lists from the expanded matrix
ret_trans <- as.matrix( apply(transformed, 2, function(x) unlist(x)) )
EDIT: I add an example of that
dataset
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 13
[4,] 4 9 14
[5,] 5 10 15
Applying the code above, I want to expand it to N*6, here 5*6 (sorry, I got the column dimension and the margin of the apply function wrong up there). It should look like this:
transformed
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 6 11 66 36 121
[2,] 2 7 12 84 49 144
[3,] 3 8 13 104 64 169
[4,] 4 9 14 126 81 196
[5,] 5 10 15 150 100 225
The question is whether there is another way of doing this without the last apply call, i.e. without having to coerce x to a list.
Thanks all for your replies.
As suggested in the comments, do:
cbind(dataset, dataset[,2] * dataset[,3], dataset[,c(2, 3)]^2)
It will be a lot faster than using apply, which should have looked like this (apply over rows returns one column per row, so the result has to be transposed):
transformed <- function(x) c(x[1], x[2], x[3], x[2]*x[3], x[2]^2, x[3]^2)
t(apply(dataset, 1, transformed))
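To see that the two versions agree on the example data, here is a small check. It is a sketch only; fast, slow and expand_row are illustrative names, and dataset is rebuilt from the values printed in the question.

```r
dataset <- matrix(1:15, ncol = 3)

# Vectorised version: build the new columns directly.
fast <- cbind(dataset, dataset[, 2] * dataset[, 3], dataset[, c(2, 3)]^2)

# apply() version: one call per row; results come back as columns, hence t().
expand_row <- function(x) c(x[1], x[2], x[3], x[2] * x[3], x[2]^2, x[3]^2)
slow <- t(apply(dataset, 1, expand_row))

print(fast[1, ])
print(all(fast == slow))
```

Both produce a 5x6 numeric matrix, with no lists involved and therefore nothing to unlist afterwards.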
