Element-wise mean over list of matrices [duplicate] - r

Suppose you have a list of matrices. What is the most convenient way to calculate the mean matrix on an element-by-element basis? For example:
> A <- matrix(c(1:9), 3, 3)
> A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> B <- matrix(c(2:10), 3, 3)
> B
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 4 7 10
> my.list <- list(A, B)
So the desired output should be:
[,1] [,2] [,3]
[1,] 1.5 4.5 7.5
[2,] 2.5 5.5 8.5
[3,] 3.5 6.5 9.5

You can use:
Reduce("+", my.list) / length(my.list)
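According to the comments, you want both mean and sd applied across a list of matrices, and the Reduce approach above does not extend to sd, because sd is not a pairwise operation that Reduce can accumulate. Stack the matrices into an array and use apply instead: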
apply(simplify2array(my.list), 1:2, mean)
apply(simplify2array(my.list), 1:2, sd)
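The two approaches above can be checked against each other. The following is a minimal sketch, assuming every matrix in the list has the same dimensions; the helper name elementwise() is hypothetical and not part of base R.

```r
# Hypothetical helper: apply a summary function cell by cell across a list
# of equally sized matrices. Assumes all matrices share the same dimensions.
elementwise <- function(mats, f, ...) {
  arr <- simplify2array(mats)   # rows x cols x number of matrices
  apply(arr, 1:2, f, ...)       # summarise each cell across the matrices
}

A <- matrix(1:9, 3, 3)
B <- matrix(2:10, 3, 3)
my.list <- list(A, B)

mean_reduce <- Reduce(`+`, my.list) / length(my.list)
mean_apply  <- elementwise(my.list, mean)
sd_apply    <- elementwise(my.list, sd)

# Both routes to the mean should agree
stopifnot(isTRUE(all.equal(mean_reduce, mean_apply)))
```

Extra arguments are passed through to the summary function, so something like elementwise(my.list, mean, na.rm = TRUE) should work when the matrices contain missing values.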

Here is an alternative that should be pretty quick as we are working with base functions designed to work with matrices. We just take your list and use array to turn it into a 3D array then either use apply or just rowMeans...
# Make some data, a list of 3 matrices of 4x4
ll <- replicate( 3 , matrix( sample(5,16,replace=TRUE) , 4 ) , simplify = FALSE )
# Make a 3D array from list of matrices
arr <- array( unlist(ll) , c(4,4,3) )
# Get the mean across the third dimension (one value per cell)
apply( arr , 1:2 , mean )
# [,1] [,2] [,3] [,4]
#[1,] 3.000000 3.666667 3.000000 1.666667
#[2,] 2.666667 3.666667 3.333333 3.666667
#[3,] 4.666667 2.000000 1.666667 3.666667
#[4,] 1.333333 4.333333 3.666667 3.000000
Or you can use rowMeans, which is quicker. With dims = 2 the first two dimensions are kept and the mean is taken over the remaining (third) dimension...
# Get the mean across the third dimension
rowMeans( arr , dims = 2 )
# [,1] [,2] [,3] [,4]
#[1,] 3.000000 3.666667 3.000000 1.666667
#[2,] 2.666667 3.666667 3.333333 3.666667
#[3,] 4.666667 2.000000 1.666667 3.666667
#[4,] 1.333333 4.333333 3.666667 3.000000
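As a sketch of why the two calls agree: rowMeans(x, dims = 2) treats the first two dimensions as the "rows" to keep and averages over whatever remains, here the third dimension. The example below also derives the array dimensions from the list instead of hard-coding c(4,4,3). The seed value is arbitrary and only there to make the run reproducible.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# A list of 3 random 4x4 matrices
ll <- replicate(3, matrix(sample(5, 16, replace = TRUE), 4), simplify = FALSE)

# Take the array dimensions from the data rather than typing them in
arr <- array(unlist(ll), dim = c(dim(ll[[1]]), length(ll)))

by_apply    <- apply(arr, 1:2, mean)      # general, works for any function
by_rowmeans <- rowMeans(arr, dims = 2)    # specialised, mean only

stopifnot(isTRUE(all.equal(by_apply, by_rowmeans)))
```

apply() remains the more general tool (it accepts sd, median and so on), while rowMeans() only computes means.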

Related

Dividing a list of matrices by a matrix

I have a list of matrices, and I would like to divide the values in each matrix by a different value.
l1 <- list(1,2,3,4,5,6)
l2 <- list(7,8,9,10,11,12)
mat <- Map(
function(x, y) outer(unlist(x), unlist(y), `+`) / 2,
split(l1, ceiling(seq_along(l1) / 3)),
split(l2, ceiling(seq_along(l2) / 3))
)
For example the output below shows one of the elements in the mat list:
$`1`
[,1] [,2] [,3]
[1,] 4.0 4.5 5.0
[2,] 4.5 5.0 5.5
[3,] 5.0 5.5 6.0
I would like to divide the values in the matrix by another matrix with different values,
perhaps a matrix that looks like this (I wasn't sure how to create a matrix in R):
2 1 2
3 2 3
1 2 3
My desired output would then look like this:
[,1] [,2] [,3]
[1,] 4.0/2 4.5/1 5.0/2
[2,] 4.5/3 5.0/2 5.5/3
[3,] 5.0/1 5.5/2 6.0/3
How could I create this output? And how do I create a matrix with my desired values in R?
Thank you.
If your matrices have the same dimensions, you can divide them element-wise with the / operator.
# create matrix to divide by (matrix() fills column by column)
mat_div <- matrix(c(2,3,1,1,2,2,2,3,3), nrow = 3)
# divide each matrix in the list
lapply(mat, `/`, mat_div)
#------
$`1`
[,1] [,2] [,3]
[1,] 2.0 4.50 2.500000
[2,] 1.5 2.50 1.833333
[3,] 5.0 2.75 2.000000
$`2`
[,1] [,2] [,3]
[1,] 3.5 7.50 4.000000
[2,] 2.5 4.00 2.833333
[3,] 8.0 4.25 3.000000
We can use Map
mat <- Map(`/`, mat, list(mat2))
-output
mat
$`1`
[,1] [,2] [,3]
[1,] 2.0 4.50 2.500000
[2,] 1.5 2.50 1.833333
[3,] 5.0 2.75 2.000000
$`2`
[,1] [,2] [,3]
[1,] 3.5 7.50 4.000000
[2,] 2.5 4.00 2.833333
[3,] 8.0 4.25 3.000000
data
mat2 <- cbind(c(2, 3, 1), c(1, 2, 2), c(2, 3, 3))
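On the side question of how to create the divisor matrix: matrix() fills column by column by default, which is why the answers above list the values in a different order from the way they are written in the question. A small sketch using byrow = TRUE lets you type the values row by row instead; the variable names here are only illustrative.

```r
# Type the values exactly as they appear in the question, row by row
div_byrow <- matrix(c(2, 1, 2,
                      3, 2, 3,
                      1, 2, 3), nrow = 3, byrow = TRUE)

# The same matrix written column by column, as in the answers above
div_bycol <- matrix(c(2, 3, 1, 1, 2, 2, 2, 3, 3), nrow = 3)

stopifnot(identical(div_byrow, div_bycol))

# Element-wise division of one 3x3 matrix by the divisor
m <- outer(1:3, 7:9, `+`) / 2
m / div_byrow
```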

R: How can I obtain returns by row in a matrix?

First I create a 5x4 matrix with random numbers from 1 to 10:
A <- matrix(sample(1:10, 20, TRUE), 5, 4)
> A
[,1] [,2] [,3] [,4]
[1,] 1 5 6 6
[2,] 5 9 9 4
[3,] 10 6 1 8
[4,] 4 4 10 2
[5,] 10 9 7 5
In the following step I would like to obtain the returns by row (for row 1: (5-1)/1, (6-5)/5, (6-6)/6 and the same procedure for the other rows). The final matrix should therefore be a 5x3 matrix.
You can make use of the base R function diff() applied to your transposed matrix:
Code:
# Data
set.seed(1)
A <- matrix(sample(1:10, 20, TRUE), 5, 4)
# [,1] [,2] [,3] [,4]
#[1,] 9 7 5 9
#[2,] 4 2 10 5
#[3,] 7 3 6 5
#[4,] 1 1 10 9
#[5,] 2 5 7 9
# transpose so we get per row and not column returns
t(diff(t(A))) / A[, -ncol(A)]
[,1] [,2] [,3]
[1,] -0.2222222 -0.2857143 0.8000000
[2,] -0.5000000 4.0000000 -0.5000000
[3,] -0.5714286 1.0000000 -0.1666667
[4,] 0.0000000 9.0000000 -0.1000000
[5,] 1.5000000 0.4000000 0.2857143
A <- matrix(sample(1:10, 20, TRUE), 5, 4)
# return from b (earlier value) to a (later value): (a - b) / b
fn.Calc <- function(a, b) {(a - b) / b}
B <- matrix(NA, nrow(A), ncol(A) - 1)
for (ir in seq_len(nrow(B))) {
  for (ic in seq_len(ncol(B))) {
    B[ir, ic] <- fn.Calc(A[ir, ic + 1], A[ir, ic])
  }
}
A small note: when working with random functions, providing a seed is welcome ;)
So what we have here:
fn.Calc is just the calculation you are trying to do. I've isolated it in a function so that it's easier to change if needed.
Then a new matrix B is created with one column fewer than A but the same number of rows.
Finally, we loop over every element of B. I like to use ir for the row index and ic for the column index. Inside the loop, B[ir, ic] <- fn.Calc(A[ir, ic+1], A[ir, ic]) is where the actual values are calculated and stored in B.
It's a very basic approach that needs no packages; there are probably many other ways to solve this that require less code.
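One of those shorter ways is plain column slicing, which avoids both the loop and the transposes. This is a minimal sketch; row_returns() is a hypothetical helper name, and the input uses the first two rows printed in the question so the result can be checked by hand.

```r
# Hypothetical helper: simple returns along each row of a numeric matrix.
# Assumes at least two columns and no zeros among the earlier values.
row_returns <- function(m) {
  later   <- m[, -1, drop = FALSE]        # columns 2..n
  earlier <- m[, -ncol(m), drop = FALSE]  # columns 1..n-1
  (later - earlier) / earlier
}

A <- matrix(c(1, 5, 6, 6,
              5, 9, 9, 4), nrow = 2, byrow = TRUE)

r <- row_returns(A)
r

# Same result as the diff()-based version
stopifnot(isTRUE(all.equal(r, t(diff(t(A))) / A[, -ncol(A)])))
```

For row 1 this gives (5-1)/1, (6-5)/5 and (6-6)/6, as described in the question.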

Calculating standard error of the mean from multiple files in a directory in R

I have multiple text files (hundreds of them) in a directory. Each text file has 225 rows and 50 columns (all with the same row names and column names). All text files contain numbers, and I need to generate one data frame that holds the standard error of the mean of each cell across all of these text files.
There is plenty of code for calculating one master data frame that has the average of each cell across all text files in a directory, but none for calculating one master data frame that shows the standard error of the mean in every cell.
For example, this will bring in all text files, read them, and generate one master data frame that has the average of each cell across all text files.
txt <- lapply(list.files(pattern = "\\.txt$"), read.delim)
Z <- Reduce("+", txt) / length(txt)
Which gives one data frame that looks like this:
>head(Z)
C1 C2 C3
Row_1 20 22 25
Row_2 14 9 22
But these are averages of all text files combined into one data frame. I would like this to be standard errors of the mean instead, and unfortunately I haven't found posts that can generate this result. There are plenty of posts that take the standard error of columns of one data-frame, just not this many stored in a directory.
I have tried this, but unfortunately it does not work:
SE <- Reduce("sd", txt) / sqrt(length(txt))
Any help would be greatly appreciated. Thank-you.
One option would be to unlist, create an array and use one of the custom functions that calculate standard error
library(plotrix)
dim1 <- c(dim(txt[[1]]), length(txt))
apply(array(unlist(txt), dim1), 1:2, std.error)
# [,1] [,2] [,3] [,4]
#[1,] 1.666667 1.2018504 1.452966 1.7638342
#[2,] 2.081666 1.5275252 1.527525 2.3333333
#[3,] 2.027588 0.8819171 1.855921 0.8819171
which gives the same result as computing the standard error directly as sd(x)/sqrt(length(x)), the formula the OP was aiming for
apply(array(unlist(txt), dim1), 1:2, function(x) sd(x)/sqrt(length(x)))
# [,1] [,2] [,3] [,4]
#[1,] 1.666667 1.2018504 1.452966 1.7638342
#[2,] 2.081666 1.5275252 1.527525 2.3333333
#[3,] 2.027588 0.8819171 1.855921 0.8819171
It can also be used to calculate the mean
Reduce(`+`, txt)/length(txt)
# V1 V2 V3 V4
#1 5.333333 6.333333 5.333333 4.666667
#2 4.000000 3.000000 4.000000 5.333333
#3 4.666667 4.666667 6.666667 6.666667
apply(array(unlist(txt), dim1), 1:2, mean)
# [,1] [,2] [,3] [,4]
#[1,] 5.333333 6.333333 5.333333 4.666667
#[2,] 4.000000 3.000000 4.000000 5.333333
#[3,] 4.666667 4.666667 6.666667 6.666667
apply(array(unlist(txt), dim1), 2, rowMeans)
data
set.seed(24)
txt <- lapply(1:3, function(i) as.data.frame(matrix(sample(1:9, 3 * 4,
replace = TRUE), 3, 4)))
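The same calculation can be sketched in base R only, without plotrix. The helper name sem() is hypothetical. For files like those in the question, the row-name column would need to be read as row names (for example via the row.names argument of read.delim) so that every cell is numeric; that detail is an assumption about the file layout.

```r
set.seed(24)  # same toy data as in the answer above
txt <- lapply(1:3, function(i) {
  as.data.frame(matrix(sample(1:9, 3 * 4, replace = TRUE), 3, 4))
})

# Hypothetical helper: standard error of the mean of a numeric vector
sem <- function(x) sd(x) / sqrt(length(x))

# Convert each data frame to a matrix, then stack into a 3 x 4 x 3 array
arr <- simplify2array(lapply(txt, as.matrix))

# One standard error per cell, computed across the three data frames
SE <- apply(arr, 1:2, sem)
SE
```

Because each cell is handled independently, swapping sem for mean or sd in the apply() call gives the corresponding per-cell summary.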

apply and lapply in one function return an NAN

I have a function that returns a list of lists, and I would like to find the standard deviation of the matrices in its output. The output of my function is a list of two lists. I tried this code, but it returns NaN. Since my function is complex, I use this example from another question, since it is quite close to what I am trying to do.
> A <- matrix(c(1:9), 3, 3)
> A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> B <- matrix(c(2:10), 3, 3)
> B
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 4 7 10
> my.list1 <- list(A, B)
so the mean of the first list is:
[,1] [,2] [,3]
[1,] 1.5 4.5 7.5
[2,] 2.5 5.5 8.5
[3,] 3.5 6.5 9.5
Then the standard deviation will be:
[,1] [,2] [,3]
[1,] 0.7071068 0.7071068 0.7071068
[2,] 0.7071068 0.7071068 0.7071068
[3,] 0.7071068 0.7071068 0.7071068
> c <- matrix(c(1:9), 3, 3)
> c
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> d <- matrix(c(2:10), 3, 3)
> d
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 4 7 10
> my.list2 <- list(c, d)
my.list <-list(my.list1,my.list2)
How can I get the standard deviation of my matrices on an element-by-element basis for the list?
Try ?rapply
> rapply(my.list, sd)
[1] 2.738613 2.738613 2.738613 2.738613
You could bind your lists into an array, or perhaps make your function return an array(?), then you could use apply() to apply your chosen functions...
A <- matrix(1:9, 3, 3)
B <- matrix(2:10, 3, 3)
my.list1 <- list(A, B)
c <- matrix(1:9, 3, 3)
d <- matrix(2:10, 3, 3)
my.list2 <- list(c, d)
Create an array from all 4 matrices
my.array1 <- abind::abind(c(my.list1, my.list2), along = 3)
Find the mean() and sd() of each cell across the third dimension
apply(my.array1, c(1, 2), mean)
apply(my.array1, c(1, 2), sd)
Output of the mean() call
[,1] [,2] [,3]
[1,] 1.5 4.5 7.5
[2,] 2.5 5.5 8.5
[3,] 3.5 6.5 9.5
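If the goal is one standard deviation matrix per inner list, rather than one across all four matrices, the nesting can be kept by combining lapply() over the outer list with apply() inside. This is a sketch in base R only, assuming the matrices within each inner list share the same dimensions; sd_per_group is an illustrative name.

```r
A <- matrix(1:9, 3, 3)
B <- matrix(2:10, 3, 3)

# Same values as my.list1 and my.list2 in the question
my.list <- list(list(A, B), list(A, B))

# For each inner list: stack its matrices, then take sd cell by cell
sd_per_group <- lapply(my.list, function(mats) {
  apply(simplify2array(mats), 1:2, sd)
})

sd_per_group[[1]]
```

With these inputs every cell compares two values that differ by 1, so each entry is about 0.7071068, matching the standard deviation matrix shown in the question.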

Remove NA's when adding in R

I'm trying to add two matrices in R, and I'd like the addition to treat any NA's as 0's. I know I could always do something like this:
ifelse(is.na(A), 0, A) + ifelse(is.na(B), 0, B)
but it seems like there should be a more elegant way of doing this. For example, is there some way of supplying the na.rm argument to the + function?
Assuming that "A" and "B" have the same dimensions,
`dim<-`(colSums(rbind(c(A), c(B)), na.rm=TRUE), dim(A))
# [,1] [,2] [,3] [,4]
#[1,] 4 7 6 6
#[2,] 5 7 2 4
#[3,] 8 9 6 1
#[4,] 4 2 5 5
Or instead of ifelse, we could use replace which will be a bit faster
replace(A, is.na(A), 0) +replace(B, is.na(B), 0)
# [,1] [,2] [,3] [,4]
#[1,] 4 7 6 6
#[2,] 5 7 2 4
#[3,] 8 9 6 1
#[4,] 4 2 5 5
Or if there are multiple datasets, we can place it in a list and work with Reduce
Reduce(`+`, lapply(list(A,B), function(x) replace(x, is.na(x), 0)))
Another compact option would be to use NAer from qdap
library(qdap)
NAer(A)+NAer(B)
For multiple datasets
Reduce(`+`, lapply(list(A,B), NAer))
data
set.seed(324)
A <- matrix(sample(c(NA,1:5), 4*4, replace=TRUE), ncol=4)
set.seed(59)
B <- matrix(sample(c(NA,1:5), 4*4, replace=TRUE), ncol=4)
You can try recode from the car package
A <- matrix(c(1,NA,5,9,3,NA), 2)
B <- matrix(c(NA,10,3,NA,21,3), 2)
library(car)
Reduce("+", lapply(list(A, B), recode, "NA=0"))
# [,1] [,2] [,3]
# [1,] 1 8 24
# [2,] 10 9 3
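A package-free alternative, sketched under the assumption that all matrices have the same dimensions: stack them into an array and let rowSums() drop the NA values through its own na.rm argument. The variable names are illustrative.

```r
A <- matrix(c(1, NA, 5, 9, 3, NA), 2)
B <- matrix(c(NA, 10, 3, NA, 21, 3), 2)

# Stack into a 2 x 3 x 2 array, then sum over the third dimension,
# ignoring NA values
arr   <- simplify2array(list(A, B))
total <- rowSums(arr, na.rm = TRUE, dims = 2)
total
```

A cell that is NA in every matrix comes out as 0, which is consistent with treating NA as 0. The list can hold any number of matrices.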
