I have a list of three matrices:
a<-matrix(runif(100))
b<-matrix(runif(100))
c<-matrix(runif(100))
mylist<-list(a,b,c)
I would like to obtain the mean of each element in the three matrices.
I tried aaply(laply(mylist, as.matrix), c(1, 1), mean), but this returns the mean of each whole matrix instead of the element-wise mean across the three matrices.
Maybe what you want is:
> set.seed(1)
> a<-matrix(runif(4))
> b<-matrix(runif(4))
> c<-matrix(runif(4))
> mylist<-list(a,b,c) # a list of 3 matrices
>
> apply(simplify2array(mylist), c(1,2), mean)
[,1]
[1,] 0.3654349
[2,] 0.4441000
[3,] 0.5745011
[4,] 0.5818541
The vector c(1, 2) for MARGIN in the apply call indicates that mean should be applied over both dimensions at once, i.e. to each (row, column) cell across the matrices; see ?apply for further details.
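To see why this works: simplify2array stacks the list into a 3-D array whose third dimension indexes the list elements, and MARGIN = c(1, 2) averages over that third dimension. A quick check, using the a, b, c defined above:
arr <- simplify2array(mylist)
dim(arr)
# [1] 4 1 3   (rows, columns, list position)
all.equal(apply(arr, c(1, 2), mean), (a + b + c) / 3)
# [1] TRUE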
Another alternative is to use the Reduce function:
> Reduce("+", mylist)/ length(mylist)
[,1]
[1,] 0.3654349
[2,] 0.4441000
[3,] 0.5745011
[4,] 0.5818541
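Reduce folds the binary `+` over the list, computing (a + b) + c, and never materializes a 3-D array, which is why it is fast (see the benchmark below). A quick sanity check:
identical(Reduce(`+`, mylist), a + b + c)
# [1] TRUE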
The simplify2array option is really slow because it calls the mean function nrow*ncol times:
library(microbenchmark)
l = lapply(1:3, function(i) matrix(i*(1:1e6), 10))
microbenchmark(times = 10,
  reduce = {Reduce(`+`, l)/length(l)},
  simplify2array = {apply(simplify2array(l), c(1,2), mean)},
  via_vector = {matrix(rowMeans(sapply(l, as.numeric)), nrow(l[[1]]))},
  for_loop = {o = l[[1]]; for(i in 2:length(l)) o = o + l[[i]]; o/length(l)}
)
Unit: milliseconds
           expr         min          lq       mean      median         uq        max neval
         reduce    7.320327    8.051267   11.23352   12.17859   13.59846   13.72176    10
 simplify2array 4233.090223 4674.827077 4802.74033 4808.00417 5010.75771 5228.05362    10
     via_vector   27.720372   42.757517   51.95250   59.47917   60.11251   61.83605    10
       for_loop   10.405315   12.919731   13.93157   14.46218   15.82175   15.89977    10
Your question is not clear.
For the mean of all elements of each matrix:
sapply(mylist, mean)
For the mean of every row of each matrix:
sapply(mylist, rowMeans)
For the mean of every column of each matrix:
sapply(mylist, colMeans)
Note that sapply will automatically simplify the result to a vector or matrix if possible. In the first case the result is a vector; in the second and third it is a list when the matrices have different dimensions (as in the example below) and a matrix when they all match.
Example:
a <- matrix(1:6,2,3)
b <- matrix(7:10,2,2)
c <- matrix(11:16,3,2)
mylist <- list(a,b,c)
> mylist
[[1]]
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[[2]]
[,1] [,2]
[1,] 7 9
[2,] 8 10
[[3]]
[,1] [,2]
[1,] 11 14
[2,] 12 15
[3,] 13 16
Results:
> sapply(mylist, mean)
[1] 3.5 8.5 13.5
> sapply(mylist, rowMeans)
[[1]]
[1] 3 4
[[2]]
[1] 8 9
[[3]]
[1] 12.5 13.5 14.5
> sapply(mylist, colMeans)
[[1]]
[1] 1.5 3.5 5.5
[[2]]
[1] 7.5 9.5
[[3]]
[1] 12 15
Related
I want to split a large matrix, mt, into a list of sub-matrices, res. The number of rows in each sub-matrix is specified by a vector, lens.
For example,
> mt=matrix(c(1:20),ncol=2)
> mt
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
[6,] 6 16
[7,] 7 17
[8,] 8 18
[9,] 9 19
[10,] 10 20
lens=c(2,3,5)
What I want is a function, some_function, that produces the following result:
> res=some_function(mt,lens)
> res
[[1]]
[,1] [,2]
[1,] 1 11
[2,] 2 12
[[2]]
[,1] [,2]
[1,] 3 13
[2,] 4 14
[3,] 5 15
[[3]]
[,1] [,2]
[1,] 6 16
[2,] 7 17
[3,] 8 18
[4,] 9 19
[5,] 10 20
Speed is a big concern. The faster, the better!
Many thanks!
A function that builds an index from the group lengths and splits the matrix accordingly:
mt <- matrix(c(1:20), ncol=2)
# Two arguments: m - matrix, len - length of each group
m_split <- function(m, len){
  index <- 1:sum(len)
  group <- rep(1:length(len), times = len)   # grouping vector, e.g. 1 1 2 2 2 3 3 3 3 3
  index_list <- split(index, group)          # row indices for each group
  mt_list <- lapply(index_list, function(vec) m[vec, , drop = FALSE])  # use m, not the global mt
  return(mt_list)
}
m_split(mt, c(2, 3, 5))
$`1`
[,1] [,2]
[1,] 1 11
[2,] 2 12
$`2`
[,1] [,2]
[1,] 3 13
[2,] 4 14
[3,] 5 15
$`3`
[,1] [,2]
[1,] 6 16
[2,] 7 17
[3,] 8 18
[4,] 9 19
[5,] 10 20
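The workhorse is the grouping vector built by rep(), which labels each row with its group number:
rep(1:3, times = c(2, 3, 5))
# [1] 1 1 2 2 2 3 3 3 3 3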
Update
I used the following code to compare the performance of each method in this post.
library(microbenchmark)
library(data.table)
# Test case from @missuse
mt <- matrix(c(1:20000000),ncol=10)
lens <- c(20000,15000,(nrow(mt)-20000-15000))
# Functions from @Damiano Fantini
split.df <- function(mt, lens) {
  fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
  split(as.data.frame(mt), f = fac)
}
split.mat <- function(mt, lens) {
  fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
  lapply(unique(fac), (function(i) {mt[fac==i,]}))
}
# Benchmarking
microbenchmark(m1 = {m_split(mt, lens)},            # @ycw's method
               m2 = {pam = rep(1:length(lens), times = lens)
                     split(data.table(mt), pam)},   # @missuse's data.table method
               m3 = {split.df(mt, lens)},           # @Damiano Fantini's data frame method
               m4 = {split.mat(mt, lens)})          # @Damiano Fantini's matrix method
Unit: milliseconds
expr min lq mean median uq max neval
m1 167.6896 209.7746 251.0932 230.5920 274.9347 555.8839 100
m2 402.3415 497.2397 554.1094 547.9603 599.7632 787.4112 100
m3 552.8548 657.6245 719.2548 711.4123 769.6098 989.6779 100
m4 166.6581 203.6799 249.2965 235.5856 275.4790 547.4927 100
As we can see, m1 and m4 are the fastest and nearly indistinguishable, so there is no need to convert the matrix to a data frame or a data.table, especially if the OP will keep working with matrices. Operating directly on the matrix (m1 and m4) is sufficient.
If you are OK working with data.frames instead of matrices, you can build a grouping factor/vector according to lens and then use split(). Alternatively, use the same grouping vector to subset your matrix and return a list. Below, the two solutions are wrapped into two functions:
# your data
mt=matrix(c(1:20),ncol=2)
lens=c(2,3,5)
# based on split
split.df <- function(mt, lens) {
  fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
  split(as.data.frame(mt), f = fac)
}
split.df(mt, lens)
# based on subsetting
split.mat <- function(mt, lens) {
  fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
  lapply(unique(fac), (function(i) {mt[fac==i,]}))
}
split.mat(mt, lens)
The second option is about 10 times faster than the first, according to microbenchmark:
library(microbenchmark)
microbenchmark({split.df(mt, lens)}, times = 1000)
# median = 323.743 microseconds
microbenchmark({split.mat(mt, lens)}, times = 1000)
# median = 31.7645 microseconds
One approach is split; however, it operates on vectors and data.frames, so you need to convert the matrix first. data.table should be efficient:
mt=matrix(c(1:20),ncol=2)
lens=c(2,3,5)
pam = rep(1:length(lens), times = lens)
library(data.table)
mt_split <- split(data.table(mt), pam)
mt_split
#output
$`1`
V1 V2
1: 1 11
2: 2 12
$`2`
V1 V2
1: 3 13
2: 4 14
3: 5 15
$`3`
V1 V2
1: 6 16
2: 7 17
3: 8 18
4: 9 19
5: 10 20
Checking speed
mt=matrix(c(1:20000000),ncol=10)
lens=c(20000,15000,(nrow(mt)-20000-15000))
pam = rep(1:length(lens), times = lens)
system.time(split(data.table(mt), pam))
#output
user system elapsed
0.75 0.20 0.96
I have a relatively large list. Each element of the list is itself a list of six elements, and each of those is a matrix with a fixed number of rows and a variable number of columns. I would like to combine the matrices so that I end up with one list of six matrices, where each matrix is the result of calling cbind on the corresponding elements from each sub-list: the first matrix is a cbind of all first matrices from the inner lists, the second matrix is a cbind of all second matrices, and so on.
For instance:
temp = list()
temp[["a"]] = list(matrix(1, nrow=2, ncol=1), matrix(2, nrow=2,ncol=2))
temp[["b"]] = list(matrix(3, nrow=2, ncol=3), matrix(4, nrow=2,ncol=4))
*call some R code* should result in the output of
$`1`
[,1] [,2] [,3] [,4]
[1,] 1 3 3 3
[2,] 1 3 3 3
$`2`
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 2 2 4 4 4 4
[2,] 2 2 4 4 4 4
I can see that calling
mapply(cbind, temp[[1]], temp[[2]])
produces the desired output for this toy example, but how do I do it for a large list with a possibly variable number of elements from execution to execution?
Is there an elegant and performant solution for this?
Thanks in advance!
purrr::pmap iterates over the list elements in parallel and passes them to the function you specify, so you can obtain the desired result with just:
library(purrr)
temp %>% pmap(cbind)
## [[1]]
## [,1] [,2] [,3] [,4]
## [1,] 1 3 3 3
## [2,] 1 3 3 3
##
## [[2]]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 2 2 4 4 4 4
## [2,] 2 2 4 4 4 4
A base R one-liner: do.call spreads the sub-lists as arguments to mapply, which then cbinds the corresponding elements.
do.call(mapply, c(cbind, temp))
We can use transpose from purrr together with do.call(cbind, ...):
library(purrr)
lapply(transpose(temp), function(x) do.call(cbind, x))
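To see what transpose contributes here: it flips the nesting so that each element of the result collects the matrices that belong together, ready for cbind:
# transpose(temp)[[1]] is list(a = temp$a[[1]], b = temp$b[[1]]),
# transpose(temp)[[2]] is list(a = temp$a[[2]], b = temp$b[[2]])
str(transpose(temp), max.level = 1)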
Or we can just use transpose with map from purrr
transpose(temp) %>%
map(~matrix(unlist(.), nrow=2))
#[[1]]
# [,1] [,2] [,3] [,4]
#[1,] 1 3 3 3
#[2,] 1 3 3 3
#[[2]]
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 2 2 4 4 4 4
#[2,] 2 2 4 4 4 4
We can also try the split option:
library(data.table)
with(melt(temp), lapply(split(value, L1), matrix, nrow=2))
Benchmarks
set.seed(24)
lst <- lapply(1:1e5, function(x) replicate(2, matrix(sample(1:5, 10,
replace=TRUE), nrow=2), simplify = FALSE))
system.time({
do.call(mapply, c(cbind, lst))
})
# user system elapsed
# 0.66 0.00 0.65
system.time({
lst %>% pmap(cbind)
})
# user system elapsed
# 0.61 0.00 0.61
system.time({
lapply(transpose(lst), function(x) do.call(cbind, x))
})
# user system elapsed
# 0.39 0.00 0.40
library(microbenchmark)
microbenchmark(Hong = do.call(mapply, c(cbind, lst)),
ae = lst %>% pmap(cbind),
akrun = lapply(transpose(lst), function(x) do.call(cbind, x)),
unit = "relative")
# Unit: relative
# expr min lq mean median uq max neval
# Hong 1.716893 2.346379 1.975948 2.069316 2.012889 1.288478 100
# ae 1.623129 2.096566 1.697061 1.805834 1.702961 1.193930 100
# akrun 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100
I would like to create 3 matrices from a bigger matrix.
The new matrices should contain:
new matrix 1: the 1st, 4th, 7th, ... rows of the old matrix
new matrix 2: the 2nd, 5th, 8th, ... rows of the old matrix
new matrix 3: the 3rd, 6th, 9th, ... rows of the old matrix
So if my matrix looks like this:
m<-matrix(c(1:3),nrow=12, ncol=2)
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 1 1
[5,] 2 2
[6,] 3 3
[7,] 1 1
[8,] 2 2
[9,] 3 3
[10,] 1 1
[11,] 2 2
[12,] 3 3
I tried it with a for loop like this
for(i in 1:4){
m1<-m[i+3,]
m2<-m[i+4,]
m3<-m[i+5,]
}
But this overwrites m1, m2, and m3 on every iteration, so it neither starts at the 1st/2nd/3rd rows nor collects all of them.
There has to be a more elegant way to do it.
Take advantage of the recycling rule of indexing in R:
m[c(T, F, F),]
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
# [3,] 1 1
# [4,] 1 1
m[c(F, T, F),]
# [,1] [,2]
# [1,] 2 2
# [2,] 2 2
# [3,] 2 2
# [4,] 2 2
m[c(F, F, T),]
# [,1] [,2]
# [1,] 3 3
# [2,] 3 3
# [3,] 3 3
# [4,] 3 3
When we index the matrix with a logical vector that is shorter than the number of rows, the vector is recycled until the lengths match. In the first case the index effectively becomes c(T, F, F, T, F, F, T, F, F, T, F, F), which picks up the 1st, 4th, 7th, and 10th rows as expected. The same goes for cases two and three.
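A minimal way to inspect the recycled index (m being the 12-row matrix from the question):
idx <- rep_len(c(TRUE, FALSE, FALSE), nrow(m))
which(idx)
# [1]  1  4  7 10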
We can use seq to do this. This will be faster for big datasets.
m[seq(1, nrow(m), by =3),]
Or we could do:
m[seq(nrow(m))%%3==1,] # 1st, 4th, 7th, ...
m[seq(nrow(m))%%3==2,] # 2nd, 5th, 8th, ...
m[seq(nrow(m))%%3==0,] # 3rd, 6th, 9th, ...
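If all three sub-matrices are needed at once, the modulo trick wraps neatly into lapply; k %% 3 maps group 3 to remainder 0:
ms <- lapply(1:3, function(k) m[seq(nrow(m)) %% 3 == k %% 3, ])
# ms[[1]], ms[[2]], ms[[3]] hold rows 1,4,7,...; 2,5,8,...; and 3,6,9,... respectively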
BENCHMARKING
library(microbenchmark)
m <- matrix(c(1:3),nrow=12, ncol=2)
func_Psidom <- function(m){m[c(T, F, F),]}
func_akrun <- function(m){ m[seq(1, nrow(m), by =3),]}
func_42 <- function(m){ m[c(TRUE,FALSE,FALSE), ]}
func_m0h3n <- function(m){m[seq(nrow(m))%%3==1,]}
r <- func_Psidom(m)
all(func_akrun(m)==r)
# [1] TRUE
all(func_42(m)==r)
# [1] TRUE
all(func_m0h3n(m)==r)
# [1] TRUE
microbenchmark(func_Psidom(m), func_akrun(m), func_42(m), func_m0h3n(m))
# Unit: microseconds
# expr min lq mean median uq max neval
# func_Psidom(m) 2.566 3.850 4.49990 4.2780 4.7050 14.543 100
# func_akrun(m) 38.923 39.779 43.58536 40.2065 41.0615 252.359 100
# func_42(m) 2.994 3.422 4.13628 4.2770 4.7050 13.688 100
# func_m0h3n(m) 18.820 20.103 22.37447 20.7445 21.3860 104.365 100
# ============================================================
m <- matrix(c(1:3),nrow=1200, ncol=2)
r <- func_Psidom(m)
all(func_akrun(m)==r)
# [1] TRUE
all(func_42(m)==r)
# [1] TRUE
all(func_m0h3n(m)==r)
# [1] TRUE
microbenchmark(func_Psidom(m), func_akrun(m), func_42(m), func_m0h3n(m))
# Unit: microseconds
# expr min lq mean median uq max neval
# func_Psidom(m) 12.832 13.6875 14.41458 14.542 14.543 22.242 100
# func_akrun(m) 56.033 57.3150 65.17700 57.743 58.599 289.998 100
# func_42(m) 12.832 13.4735 14.76962 14.115 14.543 56.032 100
# func_m0h3n(m) 76.990 78.2730 97.82522 78.702 79.557 1873.437 100
# ============================================================
m <- matrix(c(1:3),nrow=120000, ncol=2)
r <- func_Psidom(m)
all(func_akrun(m)==r)
# [1] TRUE
all(func_42(m)==r)
# [1] TRUE
all(func_m0h3n(m)==r)
# [1] TRUE
microbenchmark(func_Psidom(m), func_akrun(m), func_42(m), func_m0h3n(m))
# Unit: microseconds
# expr min lq mean median uq max neval
# func_Psidom(m) 963.665 978.6355 1168.161 1026.113 1076.798 3648.498 100
# func_akrun(m) 1674.117 1787.6785 2808.231 1890.760 2145.043 58450.377 100
# func_42(m) 960.672 976.2835 1244.467 1033.812 1115.507 3114.268 100
# func_m0h3n(m) 5817.920 6127.8070 7697.345 7455.895 8055.565 62414.963 100
Logical index vectors get recycled to the number of rows (or columns) when indexing a matrix:
m[c(TRUE,FALSE,FALSE), ]
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
m[c(TRUE,FALSE,FALSE)[c(2,1,3)], ] # the numeric vector permutes the logical values
[,1] [,2]
[1,] 2 2
[2,] 2 2
[3,] 2 2
[4,] 2 2
m[c(TRUE,FALSE,FALSE)[c(2,3,1)], ]
[,1] [,2]
[1,] 3 3
[2,] 3 3
[3,] 3 3
[4,] 3 3
I designed my own function called SharpeRatio(dat),
where dat is an n x 2 matrix.
The function works fine on a given matrix dat; however, when I try to use rollapply(dat, 20, SharpeRatio), I get the following error: Error in dat[, 1] : incorrect number of dimensions
The following is the function definition:
SharpeRatio <- function(dat){
  Returns = dat[,1]
  RiskFree = dat[,2]
  ER = (Returns - RiskFree)/100
  Volatility = sd(Returns/100)
  return((exp(mean(log(1+ER))) - 1)/Volatility)
}
rollapply applies a function to rolling windows of an array, but by default it does so column by column: it does not present a matrix to your function, it presents each column's window as a plain vector, N separate times (N = 2 in your case).
Here's an example:
(m <- matrix(1:10, ncol=2))
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
We'll rollapply a mean, and see what gets passed in at each iteration:
y <- rollapply(m, width=2, FUN=function(x) {print(x); mean(x)})
## [1] 1 2
## [1] 2 3
## [1] 3 4
## [1] 4 5
## [1] 6 7
## [1] 7 8
## [1] 8 9
## [1] 9 10
No matrices are passed to the function, just individual vectors. rollapply then packages the result up as an array in the correct shape:
y
## [,1] [,2]
## [1,] 1.5 6.5
## [2,] 2.5 7.5
## [3,] 3.5 8.5
## [4,] 4.5 9.5
The fix is in Mr. Grothendieck's comment, to pass by.column=FALSE to rollapply:
z <- rollapply(m, width=2, by.column=FALSE, FUN=function(x) {print(x); colMeans(x)})
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [,1] [,2]
## [1,] 2 7
## [2,] 3 8
## [,1] [,2]
## [1,] 3 8
## [2,] 4 9
## [,1] [,2]
## [1,] 4 9
## [2,] 5 10
Here, matrices of two rows each are being passed to the function. The result is the same as above:
z
## [,1] [,2]
## [1,] 1.5 6.5
## [2,] 2.5 7.5
## [3,] 3.5 8.5
## [4,] 4.5 9.5
rollapply works column by column unless the by.column = FALSE argument is used, so try this:
rollapply(dat, 20, SharpeRatio, by.column = FALSE)
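For instance, on made-up data (a minimal sketch; SharpeRatio is the function defined in the question):
library(zoo)
set.seed(1)
dat <- cbind(rnorm(100, mean = 1), rnorm(100, mean = 0.1))  # fake returns and risk-free rates
head(rollapply(dat, 20, SharpeRatio, by.column = FALSE))    # one Sharpe ratio per 20-row window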
Since rollapply does not pass a matrix to the function by default (hence the "incorrect number of dimensions" error), I wrote my own:
rollmatapply <- function(m, by = 100, FUN)
{
  v = vector()
  # one window per starting row: rows i to (by + i - 1), so nrow(m) - by + 1 windows in total
  for(i in 1:(nrow(m) - by + 1))
  {
    v[i] = do.call(FUN, list(m[i:(by + i - 1), ]))
  }
  v
}
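Usage mirrors the rollapply call above (assuming the same toy dat and the question's SharpeRatio):
v <- rollmatapply(dat, by = 20, FUN = SharpeRatio)
length(v)  # one value per 20-row window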
I have a matrix in R that is supposed to be symmetric, however, due to machine precision the matrix is never symmetric (the values differ by around 10^-16). Since I know the matrix is symmetric I have been doing this so far to get around the problem:
s.diag = diag(s)
s[lower.tri(s,diag=T)] = 0
s = s + t(s) + diag(s.diag)
Is there a better one line command for this?
s<-matrix(1:25,5)
s[lower.tri(s)] = t(s)[lower.tri(s)]
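A quick check that the copy worked:
isSymmetric(s)
# [1] TRUE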
You can force the matrix to be symmetric with the forceSymmetric function from the Matrix package:
library(Matrix)
x<-Matrix(rnorm(9), 3)
> x
3 x 3 Matrix of class "dgeMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] 0.7076883 -1.0411563 0.4324291
[3,] -0.4108909 -0.3292247 -0.3076071
A <- forceSymmetric(x)
> A
3 x 3 Matrix of class "dsyMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] -0.4460452 -1.0411563 0.4324291
[3,] -0.2828216 0.4324291 -0.3076071
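By default forceSymmetric keeps the upper triangle, as the output above shows; pass uplo = "L" to propagate the lower triangle instead:
forceSymmetric(x, uplo = "L")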
Is the workaround really necessary if the values only differ by that much?
Someone pointed out that my previous answer was wrong. I like some of the other ones better, but since I can't delete this one (accepted by a user who left), here's yet another solution, using the micEcon package (its symMatrix function now lives in miscTools):
symMatrix(s[upper.tri(s, TRUE)], nrow=nrow(s), byrow=TRUE)
s<-matrix(1:25,5)
pmean <- function(x, y) (x + y)/2
s[] <- pmean(s, matrix(s, nrow(s), byrow = TRUE))  # matrix(s, nrow(s), byrow = TRUE) is t(s) for a square matrix
s
#-------
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 4 7 10 13 16
[3,] 7 10 13 16 19
[4,] 10 13 16 19 22
[5,] 13 16 19 22 25
I was curious to compare all the methods, so I ran a quick microbenchmark. Clearly, the simplest 0.5 * (S + t(S)) is the fastest.
The specific function Matrix::forceSymmetric() is sometimes slightly faster, but it returns an object of a different class (dsyMatrix instead of matrix), and converting back to matrix takes a lot of time (although one might argue that it is a good idea to keep the output as dsyMatrix for further gains in computation).
S <- matrix(1:50^2, 50)
pick_lower <- function(M) {M[lower.tri(M)] <- t(M)[lower.tri(M)]; M}
microbenchmark::microbenchmark(
  micEcon    = miscTools::symMatrix(S[upper.tri(S, TRUE)], nrow = nrow(S), byrow = TRUE),
  Matri_raw  = Matrix::forceSymmetric(S),
  Matri_conv = as.matrix(Matrix::forceSymmetric(S)),
  pick_lower = pick_lower(S),
  base       = 0.5 * (S + t(S)),
  times = 100)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> micEcon 62.133 74.7515 136.49538 104.2430 115.6950 3581.001 100 a
#> Matri_raw 14.766 17.9130 24.15157 24.5060 26.6050 63.939 100 a
#> Matri_conv 46.767 59.8165 5621.96140 66.3785 73.5380 555393.346 100 a
#> pick_lower 27.907 30.7930 235.65058 48.9760 53.0425 12484.779 100 a
#> base 10.771 12.4535 16.97627 17.1190 18.3175 47.623 100 a
Created on 2021-02-08 by the reprex package (v1.0.0)
as.dist() keeps only the lower triangle, so converting back with as.matrix() overwrites the upper triangle with the lower one and zeroes the diagonal, which must then be restored. This method only works on numeric matrices.
mat <- matrix(1:25, 5)
unname(`diag<-`(as.matrix(as.dist(mat)), diag(mat)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
# [2,] 2 7 8 9 10
# [3,] 3 8 13 14 15
# [4,] 4 9 14 19 20
# [5,] 5 10 15 20 25
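The same operation written out step by step, for readability:
sym <- as.matrix(as.dist(mat))  # mirror the lower triangle; diagonal becomes 0, dimnames are added
diag(sym) <- diag(mat)          # restore the original diagonal
unname(sym)                     # drop the dimnames added by as.matrix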
Inspired by user3318600. Note that the tempting s[lower.tri(s)] <- s[upper.tri(s)] is wrong for matrices larger than 3 x 3, because lower.tri and upper.tri extract elements in different column-major orders; transposing first fixes the order:
s <- matrix(1:25, 5)
s[lower.tri(s)] <- t(s)[lower.tri(s)]