I have a vector:
v1 = c(1,2,3)
From this vector I want to create a matrix where the element at position i,j is the sum of the vector's i-th and j-th members:
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 3 4 5
[3,] 4 5 6
Questions:
1. The elements at i,j and j,i are the same, so there is no reason to compute them twice. How can I avoid the duplicate work for better performance?
2. How can I create a variant that does not compute the elements where i == j and simply returns NA instead? I'm not asking for diag(m) <- NA; I'm asking how to prevent computing those elements in the first place.
PS: This is a reduced version of my problem.
There is an approach that is much faster than a straightforward calculation with two nested loops. It is not optimized in the sense you describe in question 1, but it is fast because it is vectorized. Maybe it will be enough for your purposes.
The vectorized (matrix-based) approach itself:
f1 <- function(x){
  n <- length(x)
  m <- matrix(rep(x, n), n)   # n x n matrix whose every column is x
  m + t(m)                    # (m + t(m))[i, j] == x[i] + x[j]
}
> f1(1:3)
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 3 4 5
[3,] 4 5 6
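As an aside, base R's outer() expresses the same computation in a single call (it builds essentially the same intermediate as f1):
> outer(1:3, 1:3, `+`)
     [,1] [,2] [,3]
[1,]    2    3    4
[2,]    3    4    5
[3,]    4    5    6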
We can also write a function for the straightforward approach, to benchmark against. This function does even less work than needed: it calculates only the upper triangle, yet we will see that it is much slower.
f2 <- function(x){
  n <- length(x)
  m <- matrix(rep(NA, n^2), n)   # preallocate an all-NA matrix
  for(i in 1:(n-1)){
    for(j in (i+1):n) m[i,j] <- x[[i]] + x[[j]]   # fill the strict upper triangle only
  }
  m
}
> f2(1:3)
[,1] [,2] [,3]
[1,] NA 3 4
[2,] NA NA 5
[3,] NA NA NA
Benchmark:
library(microbenchmark)
> microbenchmark(f1(1:100), f2(1:100))
Unit: microseconds
expr min lq mean median uq max neval
f1(1:100) 124.775 138.6175 181.6401 187.731 196.454 294.301 100
f2(1:100) 10227.337 10465.1285 11000.1493 10616.830 10907.148 15826.259 100
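If you do want to address questions 1 and 2 directly (compute each sum once and leave the diagonal uncomputed) while staying vectorized, here is a minimal sketch; f1_tri is a made-up name, and whether it beats f1 in practice depends on n, since the index bookkeeping has its own cost:
f1_tri <- function(x){
  n <- length(x)
  m <- matrix(NA_real_, n, n)
  idx <- which(upper.tri(m), arr.ind = TRUE)    # strict upper-triangle positions
  m[idx] <- x[idx[, "row"]] + x[idx[, "col"]]   # each pairwise sum computed once
  m[lower.tri(m)] <- t(m)[lower.tri(m)]         # mirror; the diagonal stays NA
  m
}
> f1_tri(1:3)
     [,1] [,2] [,3]
[1,]   NA    3    4
[2,]    3   NA    5
[3,]    4    5   NA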
I have a cluster of 250 observations. Each observation is a 4 by 9 matrix:
4 is the number of observed variables and 9 is the number of days over which the observations were collected.
I want to know the variance across the 250 observations, which are in matrix form. As far as I have studied, variance is calculated among one-dimensional variables.
Any suggestion for data in matrix form?
mat1 <- matrix(c(0:69),10,7)
mat2 <- matrix(c(3:72),10,7)
mat3 <- matrix(c(0:69),10,7)
...
var <- var(mat1,mat2, mat3,..)
For these three matrices, var() returns a 7 by 7 matrix with 9.166667 in every cell. I do not know what R is doing, or how it gets to this result.
I think this will reflect what you're hoping for.
First, I'll create a few matrices, very small:
set.seed(42)
mat1 <- matrix(sample(100,12),2,4)
mat2 <- matrix(sample(100,12),2,4)
mat3 <- matrix(sample(100,12),2,4)
From here, I think you want to get
var(c(mat1[1,1], mat2[1,1], mat3[1,1]))
# [1] 193
but for every set of cells in all matrices.
One way to do this is to abind all matrices into a 3D array and then use apply:
ary <- do.call(abind::abind, c(list(mat1, mat2, mat3), along = 3))
ary
# , , 1
# [,1] [,2] [,3] [,4]
# [1,] 49 25 18 47
# [2,] 65 74 100 24
# , , 2
# [,1] [,2] [,3] [,4]
# [1,] 26 41 27 5
# [2,] 3 89 36 84
# , , 3
# [,1] [,2] [,3] [,4]
# [1,] 24 43 22 8
# [2,] 30 15 58 36
apply(ary, 1:2, var)
# [,1] [,2] [,3] [,4]
# [1,] 193.0000 97.33333 20.33333 549
# [2,] 966.3333 1530.33333 1057.33333 1008
Where 193 is the variance of the [1,1] elements, 97.333 is the variance of the [1,2] elements, etc.
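If you prefer to avoid the abind dependency, base R's simplify2array should build the same 3D array from a list of equally sized matrices:
ary2 <- simplify2array(list(mat1, mat2, mat3))  # also a 2 x 4 x 3 array
apply(ary2, 1:2, var)
#          [,1]       [,2]       [,3] [,4]
# [1,] 193.0000   97.33333   20.33333  549
# [2,] 966.3333 1530.33333 1057.33333 1008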
The arguments to var are:
> args(var)
function (x, y = NULL, na.rm = FALSE, use)
so mat1 is being passed to x and mat2 to y and mat3 to na.rm. Element i, j of the result is the covariance of x[, i] and y[, j].
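A quick check with the question's own matrices (dropping mat3, which would be swallowed by na.rm) shows where the constant 9.166667 comes from: every column of both matrices is a run of 10 consecutive integers, and the covariance of any two such runs is var(0:9):
mat1 <- matrix(0:69, 10, 7)
mat2 <- matrix(3:72, 10, 7)
var(mat1, mat2)[1, 1]   # cov(mat1[, 1], mat2[, 1])
# [1] 9.166667
var(0:9)
# [1] 9.166667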
The code in the question does not really make sense, and some reading of ?var would help. It is not clear what "I want to know the variance between 250 observations which are in matrix form" means. If it means that v[i, j] should be calculated as the variance of c(mat1[i,j], mat2[i, j], mat3[i, j]), then we can use one of several list comprehension packages or just an iterated sapply. They all rely on the fact that these two expressions are the same for fixed i and j, except that the first is more general:
var(sapply(L, `[`, i, j))
var(c(L[[1]][i, j], L[[2]][i,j], L[[3]][i,j]))
The syntax for the listcompr alternative seems particularly intuitive here.
L <- list(mat1, mat2, mat3)
nr <- nrow(L[[1]])
nc <- ncol(L[[1]])
library(listcompr)
v1 <- gen.matrix(var(sapply(L, `[`, i, j)), i = 1:nr, j = 1:nc)
# or
library(eList)
v2 <- Mat(for(j in 1:nc) for(i in 1:nr) var(sapply(L, `[`, i, j)))
# or (no packages):
v3 <- sapply(1:nc, \(j) sapply(1:nr, \(i) var(sapply(L, `[`, i, j))))
# checks
identical(v1, v2)
## [1] TRUE
identical(v1, v3)
## [1] TRUE
i <- 2; j <- 3
identical(v1[i, j], var(c(L[[1]][i, j], L[[2]][i,j], L[[3]][i,j])))
## [1] TRUE
I would like to compute the product of each row of a matrix x with itself, and then sum the results of all these products. The result is a scalar. I wrote the following code, which works but is not efficient. Can someone help me avoid the for loop?
resid2 <- numeric(nrow(x))   # preallocate
for(i in 1:nrow(x)){
  resid2[i] <- t(x[i,]) %*% x[i,]
}
V = sum(resid2)
The solution is just the sum of squares of all elements of the matrix.
V = sum(x^2)
which can also be calculated via matrix multiplication as:
V = crossprod(as.vector(x))
The intermediate vector resid2 can be calculated as
resid2 = rowSums(x^2)
V = sum(resid2)
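A quick sanity check on a small matrix that the three formulations agree:
x <- matrix(1:6, 2, 3)
sum(x^2)                     # direct sum of squares
# [1] 91
c(crossprod(as.vector(x)))   # via matrix multiplication (drop the 1 x 1 matrix)
# [1] 91
sum(rowSums(x^2))            # via the intermediate resid2
# [1] 91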
Here is an answer that swaps the for loop out for the apply family.
sum(apply(x, MARGIN = 1, function(z) z %*% z))
The apply function takes the matrix x; MARGIN = 1 means to operate on each row (as opposed to MARGIN = 2, which means each column). So, for each row in x, it runs a function that multiplies that row by itself: function(z) z %*% z.
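For instance, on a small example matrix:
x <- matrix(1:6, 2, 3)
apply(x, MARGIN = 1, function(z) z %*% z)   # the per-row products
# [1] 35 56
sum(apply(x, MARGIN = 1, function(z) z %*% z))
# [1] 91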
If I understand you correctly, you don't need to loop at all. mat %*% mat should do it:
mat <- matrix(seq.int(9), nrow=3)
mat
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
mat %*% mat
## [,1] [,2] [,3]
## [1,] 30 66 102
## [2,] 36 81 126
## [3,] 42 96 150
Consider I have the following matrix:
M <- matrix(1:9, 3, 3)
M
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
I just want to find the last element, i.e. M[3, 3].
As the matrix's row and column sizes are dynamic, we can't hardcode it as M[3, 3].
How can I get the value of last element?
Currently I've done it using the code below:
M[nrow(M), ncol(M)]
# [1] 9
Is there any better way to do it?
A matrix in R is just a vector with a dim attribute, so you can subset it as one:
M[length(M)]
## [1] 9
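You can see this by inspecting the question's M: its only attribute beyond the data is dim, and linear indexing treats it like any other vector:
attributes(M)
# $dim
# [1] 3 3
M[5]   # column-major position 5, i.e. M[2, 2]
# [1] 5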
Though (as mentioned by @James) your solution can be more general in case you want to keep the matrix structure, as you can add drop = FALSE:
M[nrow(M), ncol(M), drop = FALSE]
# [,1]
# [1,] 9
My solution can also be modified in a similar manner, using the dim<- replacement function:
`dim<-`(M[length(M)], c(1,1))
# [,1]
# [1,] 9
Some benchmarks (contributed by @zx8754):
M <- matrix(runif(1000000), nrow = 1000)
library(microbenchmark)
microbenchmark(
nrow_ncol={
M[nrow(M),ncol(M)]
},
dim12={
M[dim(M)[1],dim(M)[2]]
},
length1={
M[length(M)]
},
tail1={
tail(c(M),1)
},
times = 1000
)
# Unit: nanoseconds
# expr min lq mean median uq max neval cld
# nrow_ncol 605 1209 3799.908 3623.0 6038 27167 1000 a
# dim12 302 605 2333.241 1811.0 3623 19922 1000 a
# length1 0 303 2269.564 1510.5 3925 14792 1000 a
# tail1 3103005 3320034 4022028.561 3377234.0 3467487 42777080 1000 b
I would rather do:
tail(c(M),1)
# [1] 9
One way to do this, and to avoid unnecessary repetition of the object name (or silly typos), would be to use pipes, like this:
require(magrittr)
M %>% .[nrow(.), ncol(.)]
##[1] 9
M %>% `[`(nrow(.), ncol(.))
##[1] 9
M %>% extract(nrow(.), ncol(.))
##[1] 9
The approaches are equivalent, so you can choose whichever feels more intuitive to you.
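On R 4.1 or later, the base pipe should manage the same without magrittr, at the price of an explicit anonymous function:
M |> (\(m) m[nrow(m), ncol(m)])()
##[1] 9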
How is it possible to convert a list object (whose elements have different lengths) into a matrix object in an efficient way? The following example clarifies the goal:
Imagine you have a list object with this structure:
l <- list(c(1,2), c(5,7,3,11))
print(l)
# [[1]]
# [1] 1 2
# [[2]]
# [1] 5 7 3 11
The aim is to get a matrix or data.frame of the form:
[,1] [,2] [,3] [,4]
[1,] 1 2 NA NA
[2,] 5 7 3 11
It's very easy to tackle the problem with a for loop. Do you have any idea how to make this kind of transformation easily? Thank you in advance!
You could also try
t(sapply(l, `length<-`, max(sapply(l, length))))
# [,1] [,2] [,3] [,4]
#[1,] 1 2 NA NA
#[2,] 5 7 3 11
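The cryptic-looking part is the `length<-` replacement function: assigning a longer length pads a vector with NA, which is exactly the padding we need here:
x <- c(1, 2)
length(x) <- 4   # extending the length pads with NA
x
# [1]  1  2 NA NA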
Here's one way to do it:
n <- max(sapply(l, length))
t(sapply(l, function(x) if(length(x) < n) c(x, rep(NA, n - length(x))) else x))
[,1] [,2] [,3] [,4]
[1,] 1 2 NA NA
[2,] 5 7 3 11
First we find out the maximum vector length per list element and store it in n (which is 4 in this case).
Then we sapply over the list and check whether the length of each element equals n: if it does, we return it as is; if it is shorter than n, we return the element followed by NA repeated as many times as the difference in length. This returns a matrix, which we transpose with t() to get the desired result.
If you're open to using a package, you could also consider stri_list2matrix from the "stringi" package:
library(stringi)
l <- list(c(1,2), c(5,7,3,11))
stri_list2matrix(l, byrow = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] "1" "2" NA NA
# [2,] "5" "7" "3" "11"
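Note that stri_list2matrix always returns a character matrix; if you need numbers back, one option is to change the storage mode afterwards:
out <- stri_list2matrix(l, byrow = TRUE)
mode(out) <- "numeric"   # convert the character cells back to numeric
out
#      [,1] [,2] [,3] [,4]
# [1,]    1    2   NA   NA
# [2,]    5    7    3   11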
Regarding your question about doing this efficiently, @akrun's answer is already pretty efficient, but could be made more efficient by using vapply instead of sapply. The "stringi" approach is also pretty efficient (and has the benefit of not resorting to cryptic code like length<-).
funDD <- function() {
n <- max(sapply(l, length))
t(sapply(l, function(x) if(length(x) < n) c(x, rep(NA, n - length(x))) else x))
}
funAK <- function() t(sapply(l, `length<-`, max(sapply(l, length))))
funAM <- function() {
x <- max(vapply(l, length, 1L))
t(vapply(l, `length<-`, numeric(x), x))
}
funStringi <- function() stri_list2matrix(l, byrow = TRUE)
## Make a big list to test on
set.seed(1)
l <- lapply(sample(3:10, 1000000, TRUE), function(x) sample(10, x, TRUE))
system.time(out1 <- funDD())
# user system elapsed
# 5.81 0.33 7.02
library(microbenchmark)
microbenchmark(funAK(), funAM(), funStringi(), times = 10)
# Unit: seconds
# expr min lq mean median uq max neval
# funAK() 2.350877 2.499963 2.974141 3.123008 3.200545 3.418648 10
# funAM() 1.154151 1.238235 1.337607 1.287610 1.494964 1.508884 10
# funStringi() 2.080901 2.168248 2.352030 2.344763 2.462959 2.716910 10
I have a matrix in R that is supposed to be symmetric; however, due to machine precision the matrix is never exactly symmetric (the values differ by around 10^-16). Since I know the matrix is symmetric, I have been doing this so far to get around the problem:
s.diag = diag(s)
s[lower.tri(s, diag = TRUE)] = 0
s = s + t(s) + diag(s.diag)
Is there a better one-line command for this?
s <- matrix(1:25, 5)
s[lower.tri(s)] <- t(s)[lower.tri(s)]   # copy the upper triangle into the lower one
You can force the matrix to be symmetric using the forceSymmetric function from the Matrix package in R:
library(Matrix)
x<-Matrix(rnorm(9), 3)
> x
3 x 3 Matrix of class "dgeMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] 0.7076883 -1.0411563 0.4324291
[3,] -0.4108909 -0.3292247 -0.3076071
A <- forceSymmetric(x)
> A
3 x 3 Matrix of class "dsyMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] -0.4460452 -1.0411563 0.4324291
[3,] -0.2828216 0.4324291 -0.3076071
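By default forceSymmetric replicates the upper triangle; if I recall its signature correctly, an uplo argument lets the lower triangle win instead:
B <- forceSymmetric(x, uplo = "L")   # mirror the lower triangle instead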
Is the workaround really necessary if the values only differ by that much?
Someone pointed out that my previous answer was wrong. I like some of the other ones better, but since I can't delete this one (accepted by a user who left), here's yet another solution, using symMatrix (historically from the micEcon package, now in miscTools):
symMatrix(s[upper.tri(s, TRUE)], nrow = nrow(s), byrow = TRUE)
s <- matrix(1:25, 5)
pmean <- function(x, y) (x + y)/2   # elementwise mean of two matrices
s[] <- pmean(s, matrix(s, nrow(s), byrow = TRUE))   # the byrow refill equals t(s) for a square matrix
s
#-------
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 4 7 10 13 16
[3,] 7 10 13 16 19
[4,] 10 13 16 19 22
[5,] 13 16 19 22 25
I was curious to compare all the methods, so ran a quick microbenchmark. Clearly, the simplest 0.5 * (S + t(S)) is the fastest.
The specific function Matrix::forceSymmetric() is sometimes slightly faster, but it returns an object of a different class (dsyMatrix instead of matrix), and converting back to matrix takes a lot of time (although one might argue that it is a good idea to keep the output as dsyMatrix for further gains in computation).
S <- matrix(1:50^2, 50)
pick_lower <- function(M) {
  M[lower.tri(M)] <- t(M)[lower.tri(M)]
  M   # return the modified copy
}
microbenchmark::microbenchmark(
  micEcon    = miscTools::symMatrix(S[upper.tri(S, TRUE)], nrow = nrow(S), byrow = TRUE),
  Matri_raw  = Matrix::forceSymmetric(S),
  Matri_conv = as.matrix(Matrix::forceSymmetric(S)),
  pick_lower = pick_lower(S),
  base       = 0.5 * (S + t(S)),
  times = 100)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> micEcon 62.133 74.7515 136.49538 104.2430 115.6950 3581.001 100 a
#> Matri_raw 14.766 17.9130 24.15157 24.5060 26.6050 63.939 100 a
#> Matri_conv 46.767 59.8165 5621.96140 66.3785 73.5380 555393.346 100 a
#> pick_lower 27.907 30.7930 235.65058 48.9760 53.0425 12484.779 100 a
#> base 10.771 12.4535 16.97627 17.1190 18.3175 47.623 100 a
Created on 2021-02-08 by the reprex package (v1.0.0)
as.dist() will overwrite the upper triangle of a matrix with the lower one and replace the diagonal with zeros. This method only works on numeric matrices.
mat <- matrix(1:25, 5)
unname(`diag<-`(as.matrix(as.dist(mat)), diag(mat)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
# [2,] 2 7 8 9 10
# [3,] 3 8 13 14 15
# [4,] 4 9 14 19 20
# [5,] 5 10 15 20 25
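Unpacking that one-liner step by step:
d <- as.dist(mat)      # keep only the lower triangle, diagonal implicitly zero
m <- as.matrix(d)      # back to a full symmetric matrix with a zero diagonal
diag(m) <- diag(mat)   # restore the original diagonal
unname(m)              # drop the dimnames that as.dist introduced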
Inspired by user3318600's answer. Note that a plain s[lower.tri(s)] <- s[upper.tri(s)] lines the elements up correctly only for matrices up to 3 x 3; going through t() keeps the element order right for any size:
s <- matrix(1:25, 5)
s[lower.tri(s)] <- t(s)[lower.tri(s)]