I have a matrix in R that is supposed to be symmetric; however, due to machine precision it never comes out exactly symmetric (the values differ by around 10^-16). Since I know the matrix is symmetric, I have been doing this so far to get around the problem:
s.diag = diag(s)
s[lower.tri(s, diag = TRUE)] = 0
s = s + t(s) + diag(s.diag)
Is there a better one-line command for this?
s <- matrix(1:25, 5)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
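To verify, base R's isSymmetric() should now return TRUE:
isSymmetric(s)
# [1] TRUE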
You can force the matrix to be symmetric using the forceSymmetric() function from the Matrix package in R:
library(Matrix)
x <- Matrix(rnorm(9), 3)
> x
3 x 3 Matrix of class "dgeMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] 0.7076883 -1.0411563 0.4324291
[3,] -0.4108909 -0.3292247 -0.3076071
A <- forceSymmetric(x)
> A
3 x 3 Matrix of class "dsyMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] -0.4460452 -1.0411563 0.4324291
[3,] -0.2828216 0.4324291 -0.3076071
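Note that forceSymmetric() replicates the upper triangle by default. To mirror the lower triangle instead, use its uplo argument:
A <- forceSymmetric(x, uplo = "L")  # uplo = "U" (the default) keeps the upper triangle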
Is the workaround really necessary if the values only differ by that much?
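For what it's worth, base R's isSymmetric() is built on all.equal() and therefore compares with a numeric tolerance, so differences on the order of 10^-16 usually pass as symmetric without any fix-up. A minimal sketch:
m <- matrix(rnorm(9), 3)
s <- m %*% t(m)   # mathematically symmetric, computed entry by entry
isSymmetric(s)    # TRUE: ~1e-16 discrepancies fall within the default tolerance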
Someone pointed out that my previous answer was wrong. I like some of the other ones better, but since I can't delete this one (it was accepted by a user who has left), here's yet another solution, using the micEcon package (its symMatrix() function now lives in miscTools):
symMatrix(s[upper.tri(s, TRUE)], nrow=nrow(s), byrow=TRUE)
s <- matrix(1:25, 5)
pmean <- function(x, y) (x + y) / 2
s[] <- pmean(s, matrix(s, nrow(s), byrow = TRUE))  # matrix(s, ..., byrow = TRUE) is t(s) for square s
s
#-------
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 4 7 10 13 16
[3,] 7 10 13 16 19
[4,] 10 13 16 19 22
[5,] 13 16 19 22 25
I was curious to compare all the methods, so I ran a quick microbenchmark. Clearly, the simplest 0.5 * (S + t(S)) is the fastest.
The dedicated function Matrix::forceSymmetric() is sometimes slightly faster, but it returns an object of a different class (dsyMatrix instead of matrix), and converting back to matrix takes a lot of time (although one might argue that keeping the output as a dsyMatrix is a good idea for further gains in computation).
S <- matrix(1:50^2, 50)
pick_lower <- function(M) {
  M[lower.tri(M)] <- t(M)[lower.tri(M)]
  M
}
microbenchmark::microbenchmark(
  micEcon    = miscTools::symMatrix(S[upper.tri(S, TRUE)], nrow = nrow(S), byrow = TRUE),
  Matri_raw  = Matrix::forceSymmetric(S),
  Matri_conv = as.matrix(Matrix::forceSymmetric(S)),
  pick_lower = pick_lower(S),
  base       = 0.5 * (S + t(S)),
  times = 100)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> micEcon 62.133 74.7515 136.49538 104.2430 115.6950 3581.001 100 a
#> Matri_raw 14.766 17.9130 24.15157 24.5060 26.6050 63.939 100 a
#> Matri_conv 46.767 59.8165 5621.96140 66.3785 73.5380 555393.346 100 a
#> pick_lower 27.907 30.7930 235.65058 48.9760 53.0425 12484.779 100 a
#> base 10.771 12.4535 16.97627 17.1190 18.3175 47.623 100 a
Created on 2021-02-08 by the reprex package (v1.0.0)
as.dist() will overwrite the upper triangle of a matrix with the lower one and replace the diagonal with zeros, so the original diagonal has to be restored afterwards (hence the diag<- call below). This method only works on numeric matrices.
mat <- matrix(1:25, 5)
unname(`diag<-`(as.matrix(as.dist(mat)), diag(mat)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
# [2,] 2 7 8 9 10
# [3,] 3 8 13 14 15
# [4,] 4 9 14 19 20
# [5,] 5 10 15 20 25
Inspired by user3318600 (with t() added on the right-hand side: plain s[upper.tri(s)] only lines up with the lower triangle for matrices up to 3 x 3)
s <- matrix(1:25, 5)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
I have an n x 3 x m array; call it I. It contains 3 columns, n rows (say n = 10), and m slices. I have a computation that must be done to replace the third column in each slice, based on the other two columns in that slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for (simIndex in 1:m) {
  I[, , simIndex] = insertNewRows(I[, , simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(I, MARGIN = c(1, 2, 3), FUN = insertNewRows) but that just produces gibberish?
Thank you in advance!
IK
The question has not defined the input, the transformation, or the desired result, so we can't really answer it, but here is an example of adding a row of ones to a[,,i] for each i; maybe that will suggest how you could solve the problem yourself.
This is how you could use sapply, apply, plyr::aaply, reshaping via matrix/aperm, and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa  <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0)) # sapply over slice indices
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))                             # apply over the 3rd margin
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))                              # aaply, then restore dimension order
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))             # flatten to matrix, rbind, reshape
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)                    # bind a slab of 1's along rows
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1
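Separately: if insertNewRows() really only fills column 3 from columns 1 and 2, the per-slice loop can potentially be dropped by operating on the whole array at once. A rough sketch, where prob_fun() is a hypothetical stand-in for the OP's actual probability rule:
prob_fun <- function(x1, x2) plogis(x1 + x2)       # placeholder, not the OP's rule
n <- 10; m <- 5
I <- array(rnorm(n * 3 * m), dim = c(n, 3, m))
p <- prob_fun(I[, 1, ], I[, 2, ])                  # n x m matrix of probabilities
I[, 3, ] <- rbinom(length(p), size = 1, prob = p)  # one binomial draw per row/slice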
In R, is there a more efficient and/or general way to produce the desired output from the two matrices below? I'm suspicious that what I've done is just some esoteric matrix multiplication operation of which I'm not aware.
ff <- matrix(1:6,ncol=2)
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6
bb <- matrix(7:10,ncol=2)
# [,1] [,2]
# [1,] 7 9
# [2,] 8 10
# DESIRE:
# 7 36
# 14 45
# 21 54
# 8 40
# 16 50
# 24 60
This works, but isn't the general solution I'm looking for:
rr1 <- t(t(ff) * bb[1,])
rr2 <- t(t(ff) * bb[2,])
rbind(rr1,rr2)
# [,1] [,2]
# [1,] 7 36
# [2,] 14 45
# [3,] 21 54
# [4,] 8 40
# [5,] 16 50
# [6,] 24 60
This next code block seems pretty efficient and is general. But is there a better way?
Something like kronecker(ffa,bba)? (which clearly doesn't work in this case)
ffa <- matrix(rep(t(ff),2), ncol=2, byrow=T)
bba <- matrix(rep(bb,each=3), ncol=2)
ffa * bba
# [,1] [,2]
# [1,] 7 36
# [2,] 14 45
# [3,] 21 54
# [4,] 8 40
# [5,] 16 50
# [6,] 24 60
This is related to two of my other questions:
Using apply function over the row margin with expectation of stacked results, where I'm trying to understand the behavior of apply itself, and:
Is this an example of some more general matrix product?, where I'm asking specifically about the underlying math.
Use a Kronecker product and pick off the appropriate columns. kronecker(bb, ff) contains every column of bb crossed with every column of ff; c(diag(ncol(bb))) == 1 flags exactly the columns where the two column indices match:
kronecker(bb, ff)[, c(diag(ncol(bb))) == 1]
or using the infix operator for kronecker:
(bb %x% ff)[, c(diag(ncol(bb))) == 1]
Another approach is to convert the arguments to data frames and mapply kronecker across them. For the case in the question this performs the calculation cbind(bb[, 1] %x% ff[, 1], bb[, 2] %x% ff[, 2]) but in a more general manner without resorting to indices:
mapply(kronecker, as.data.frame(bb), as.data.frame(ff))
or using the infix operator for kronecker:
mapply(`%x%`, as.data.frame(bb), as.data.frame(ff))
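For the matrices in the question this returns the desired result, with column names inherited from the data frames:
mapply(`%x%`, as.data.frame(bb), as.data.frame(ff))
#      V1 V2
# [1,]  7 36
# [2,] 14 45
# [3,] 21 54
# [4,]  8 40
# [5,] 16 50
# [6,] 24 60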
The functionality you are seeking is available in the Matrix package as the function KhatriRao(). Since the function is in Matrix, the output is a sparse matrix of class "dgCMatrix". You can convert it to an ordinary matrix of class "matrix" with as.matrix().
library(Matrix)
as.matrix(KhatriRao(bb, ff))
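As a quick check, the Khatri-Rao product is exactly the column-wise Kronecker product computed above:
all.equal(as.matrix(KhatriRao(bb, ff)),
          cbind(bb[, 1] %x% ff[, 1], bb[, 2] %x% ff[, 2]),
          check.attributes = FALSE)
# [1] TRUE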
I want to split a large matrix, mt, into a list of sub-matrices, res. The number of rows in each sub-matrix is specified by a vector, lens.
For example,
> mt=matrix(c(1:20),ncol=2)
> mt
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
[6,] 6 16
[7,] 7 17
[8,] 8 18
[9,] 9 19
[10,] 10 20
lens=c(2,3,5)
What I want is a function, some_function, that produces the following result:
> res=some_function(mt,lens)
> res
[[1]]
[,1] [,2]
[1,] 1 11
[2,] 2 12
[[2]]
[,1] [,2]
[1,] 3 13
[2,] 4 14
[3,] 5 15
[[3]]
[,1] [,2]
[1,] 6 16
[2,] 7 17
[3,] 8 18
[4,] 9 19
[5,] 10 20
Speed is a big concern. The faster, the better!
Many thanks!
A function that creates an index based on the length of each group and splits the matrix accordingly.
mt <- matrix(c(1:20), ncol=2)
# Two arguments: m - matrix, len - length of each group
m_split <- function(m, len){
  index <- 1:sum(len)
  group <- rep(1:length(len), times = len)
  index_list <- split(index, group)
  m_list <- lapply(index_list, function(vec) m[vec, , drop = FALSE])  # use m, not the global mt
  return(m_list)
}
m_split(mt, c(2, 3, 5))
$`1`
[,1] [,2]
[1,] 1 11
[2,] 2 12
$`2`
[,1] [,2]
[1,] 3 13
[2,] 4 14
[3,] 5 15
$`3`
[,1] [,2]
[1,] 6 16
[2,] 7 17
[3,] 8 18
[4,] 9 19
[5,] 10 20
Update
I used the following code to compare the performance of each method in this post.
library(microbenchmark)
library(data.table)
# Test case from @missuse
mt <- matrix(c(1:20000000),ncol=10)
lens <- c(20000,15000,(nrow(mt)-20000-15000))
# Functions from @Damiano Fantini
split.df <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
split(as.data.frame(mt), f = fac)
}
split.mat <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
lapply(unique(fac), (function(i) {mt[fac==i,]}))
}
# Benchmarking
microbenchmark(m1 = {m_split(mt, lens)},                  # @ycw's method
               m2 = {pam = rep(1:length(lens), times = lens)
                     split(data.table(mt), pam)},         # @missuse's data.table method
               m3 = {split.df(mt, lens)},                 # @Damiano Fantini's data frame method
               m4 = {split.mat(mt, lens)})                # @Damiano Fantini's matrix method
Unit: milliseconds
expr min lq mean median uq max neval
m1 167.6896 209.7746 251.0932 230.5920 274.9347 555.8839 100
m2 402.3415 497.2397 554.1094 547.9603 599.7632 787.4112 100
m3 552.8548 657.6245 719.2548 711.4123 769.6098 989.6779 100
m4 166.6581 203.6799 249.2965 235.5856 275.4790 547.4927 100
As we can see, m1 and m4 are the fastest, with almost no difference between them. This means there is no need to convert the matrix to a data frame or a data.table, especially if the OP will keep working on the matrix; working directly on it (m1 and m4) is sufficient.
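One more base R option, not benchmarked above: despite its name, split.data.frame() splits by rows and also accepts a matrix, returning a list of matrices, so it should behave much like m1/m4:
grp <- rep(seq_along(lens), times = lens)
res <- split.data.frame(mt, grp)  # splits rows, keeps the matrix class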
If you are OK working with data.frames instead of matrices, you might build a grouping factor/vector according to lens and then use split(). Alternatively, use this grouping vector to subset your matrix and return a list. In this example, I wrapped the two solutions into two functions:
# your data
mt=matrix(c(1:20),ncol=2)
lens=c(2,3,5)
# based on split
split.df <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
split(as.data.frame(mt), f = fac)
}
split.df(mt, lens)
# based on subsetting
split.mat <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
lapply(unique(fac), (function(i) {mt[fac==i,]}))
}
split.mat(mt, lens)
This second option is about 10 times faster than the first one, according to microbenchmark:
library(microbenchmark)
microbenchmark({split.df(mt, lens)}, times = 1000)
# median = 323.743 microseconds
microbenchmark({split.mat(mt, lens)}, times = 1000)
# median = 31.7645 microseconds
One approach is to use split; however, it operates on vectors and data.frames, so you need to convert the matrix first. data.table should be efficient:
mt=matrix(c(1:20),ncol=2)
lens=c(2,3,5)
pam = rep(1:length(lens), times = lens)
library(data.table)
mt_split <- split(data.table(mt), pam)
mt_split
#output
$`1`
V1 V2
1: 1 11
2: 2 12
$`2`
V1 V2
1: 3 13
2: 4 14
3: 5 15
$`3`
V1 V2
1: 6 16
2: 7 17
3: 8 18
4: 9 19
5: 10 20
Checking speed
mt=matrix(c(1:20000000),ncol=10)
lens=c(20000,15000,(nrow(mt)-20000-15000))
pam = rep(1:length(lens), times = lens)
system.time(split(data.table(mt), pam))
#output
user system elapsed
0.75 0.20 0.96
I want to raise the entries of a matrix to powers given by a vector. The matrix is as below:
ID var_0 var_01 var_02 var_03
1 1 2 3 4
2 5 6 7 8
3 9 10 11 12
...
and the vector is (0.1, 0.2, 0.3, 0.4).
I want to get the result as below
ID var_0 var_01 var_02 var_03
1 1^0.1 2^0.2 3^0.3 4^0.4
2 5^0.1 6^0.2 7^0.3 8^0.4
3 9^0.1 10^0.2 11^0.3 12^0.4
...
That is, I want to get (i-th var)^(i-th vector element) for each ID.
You can use R's recycling of vectors. Transpose your matrix so that the power calculations are applied in the correct order and then transpose back.
(m <- matrix(1:12, nrow=3, ncol=4, byrow=TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 5 6 7 8
# [3,] 9 10 11 12
p <- 1:4
t(t(m)^p)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Or you could do (data from @user20650's post)
m^p[col(m)]
# [,1] [,2] [,3] [,4]
#[1,] 1 4 27 256
#[2,] 5 36 343 4096
#[3,] 9 100 1331 20736
Or maybe (using @user20650's data set)
m^rep(p, each = nrow(m))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Another option
m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Some benchmarks on a bigger data set. It seems like my two answers and @akrun's scale the best:
n <- 1e6
cols <- 100
m <- matrix(seq_len(n), nrow = n, ncol = cols)
p <- seq_len(cols)
user20650 = function() {t(t(m)^p)}
Nick = function() {sweep(m, 2, p, `^`)}
akrun = function() {m^p[col(m)]}
David1 = function() {m^rep(p, each = nrow(m))}
David2 = function() {m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)}
library(microbenchmark)
Res <- microbenchmark(
user20650() ,
Nick(),
akrun(),
David1(),
David2()
)
Res
# Unit: seconds
# expr min lq median uq max neval
# user20650() 9.692392 9.800470 9.878385 10.010198 11.002012 100
# Nick() 10.487660 10.595750 10.687573 10.896852 14.083319 100
# akrun() 8.213784 8.316646 8.395962 8.529671 9.325273 100
# David1() 9.115449 9.219430 9.304380 9.425614 10.445129 100
# David2() 8.157632 8.275277 8.335884 8.437017 9.348252 100
boxplot(Res)
You can do this using the sweep function. The signature is
sweep(x, MARGIN, STATS, FUN)
This function iterates over parts of x according to how you set MARGIN. On each iteration, the current part of x and the entire argument STATS get passed to FUN, which should be a function taking 2 arguments.
Setting MARGIN to 1 means STATS lines up with the rows of x (dimension 1), 2 means STATS lines up with the columns of x (dimension 2). Other variations are also possible.
So for your particular example, use
sweep(your.matrix, 2, your.exponents, `^`)
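For instance, with the example matrix and exponents used in the other answers:
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
p <- 1:4
sweep(m, 2, p, `^`)
#      [,1] [,2] [,3]  [,4]
# [1,]    1    4   27   256
# [2,]    5   36  343  4096
# [3,]    9  100 1331 20736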
Edit: Based on @david-arenburg's answer, you probably shouldn't use sweep. I had no idea it was so slow!
I want to go from something like this:
1> a = matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4),4)
1> a
[,1] [,2] [,3] [,4]
[1,] 1 2 4 1
[2,] 4 5 4 6
[3,] 2 2 3 7
[4,] 5 1 2 4
To something like this:
[,1] [,2]
[1,] 12 15
[2,] 10 16
...without using for-loops, plyr, or other explicit looping. Possible? I'm trying to shrink a geographic lat/long dataset from 5 arc-minutes to half-degree resolution, and I've got an ASCII grid. A little function where I specify the block size would be great. I've got hundreds of such files, so anything that lets me do it quickly without parallelization/supercomputers would be much appreciated.
You can use matrix multiplication for this.
# Aggregation matrix: column j has 1's in rows ((j-1)*r + 1):(j*r).
# suppressWarnings() silences the vector-recycling warning.
mat <- function(n, r) {
  suppressWarnings(matrix(c(rep(1, r), rep(0, n)), n, n/r))
}
Square-matrix example, which uses the matrix and its transpose on each side of a:
# Reduce a 4x4 matrix by a factor of 2:
x <- mat(4, 2)
x
## [,1] [,2]
## [1,] 1 0
## [2,] 1 0
## [3,] 0 1
## [4,] 0 1
t(x) %*% a %*% x
## [,1] [,2]
## [1,] 12 15
## [2,] 10 16
Non-square example:
b <- matrix(1:24, 4 ,6)
t(mat(4, 2)) %*% b %*% mat(6, 2)
## [,1] [,2] [,3]
## [1,] 14 46 78
## [2,] 22 54 86
tapply(a, list((row(a) + 1L) %/% 2L, (col(a) + 1L) %/% 2L), sum)
# 1 2
# 1 12 15
# 2 10 16
I used 1L and 2L instead of 1 and 2 so indices remain integers (as opposed to numerics) and it should run faster that way.
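Wrapped into the little blocksize function the question asks for (a sketch; it assumes blocksize divides both dimensions evenly):
block_sum <- function(m, blocksize) {
  tapply(m, list((row(m) - 1L) %/% blocksize + 1L,
                 (col(m) - 1L) %/% blocksize + 1L), sum)
}
block_sum(a, 2)
#    1  2
# 1 12 15
# 2 10 16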
I guess this might help you, but it still uses sapply, which can be considered a loop-ish tool.
a <- matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4), 4)
block.step <- 2
# Outer sapply walks block columns; inner sapply walks block rows
res <- sapply(seq(1, ncol(a), by = block.step), function(x)
  sapply(seq(1, nrow(a), by = block.step), function(y)
    sum(a[y:(y + block.step - 1), x:(x + block.step - 1)])
  )
)
res
Is this at all helpful?