I want to calculate exponential with a matrix and vector. The matrix is as below
ID var_0 var_01 var_02 var_03
1 1 2 3 4
2 5 6 7 8
3 9 10 11 12
...
and vector is (0.1,0.2,0.3,0.4)
I want to get the result as below
ID var_0 var_01 var_02 var_03
1 1^0.1 2^0.2 3^0.3 4^0.4
2 5^0.1 6^0.2 7^0.3 8^0.4
3 9^0.1 10^0.2 11^0.3 12^0.4
...
That is, I want to get (ith var)^ith vector for each ID
You can use R's recycling of vectors. Transpose your matrix so that the power calculations are applied in the correct order and then transpose back.
(m <- matrix(1:12, nrow=3, ncol=4, byrow=TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 5 6 7 8
# [3,] 9 10 11 12
p <- 1:4
t(t(m)^p)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Or you could do (data from #user20650's post)
m^p[col(m)]
# [,1] [,2] [,3] [,4]
#[1,] 1 4 27 256
#[2,] 5 36 343 4096
#[3,] 9 100 1331 20736
Or maybe (using #user20650's data set)
m^rep(p, each = nrow(m))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Another option
m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Some benchmarks on a bigger data set. Seems like my two answers and #akruns scales the best
n <- 1e6
cols <- 100
m <- matrix(seq_len(n), nrow = n, ncol = cols)
p <- seq_len(cols)
user20650 = function() {t(t(m)^p)}
Nick = function() {sweep(m, 2, p, `^`)}
akrun = function() {m^p[col(m)]}
David1 = function() {m^rep(p, each = nrow(m))}
David2 = function() {m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)}
library(microbenchmark)
Res <- microbenchmark(
user20650() ,
Nick(),
akrun(),
David1(),
David2()
)
Res
# Unit: seconds
# expr min lq median uq max neval
# user20650() 9.692392 9.800470 9.878385 10.010198 11.002012 100
# Nick() 10.487660 10.595750 10.687573 10.896852 14.083319 100
# akrun() 8.213784 8.316646 8.395962 8.529671 9.325273 100
# David1() 9.115449 9.219430 9.304380 9.425614 10.445129 100
# David2() 8.157632 8.275277 8.335884 8.437017 9.348252 100
boxplot(Res)
You can do this using the sweep function. The signature is
sweep(x, MARGIN, STATS, FUN)
This function iterates over parts of x according to how you set MARGIN. On each iteration, the current part of x and the entire argument STATS get passed to FUN, which should be a function taking 2 arguments.
Setting MARGIN to 1 means STATS lines up with the rows of x (dimension 1), 2 means STATS lines up with the columns of x (dimension 2). Other variations are also possible.
So for your particular example, use
sweep(your.matrix, 2, your.exponents, `^`)
Edit: Based on #david-arenburg's answer, you probably shouldn't use sweep. I had no idea it was so slow!
Related
I have a 5x4 matrix. I have created a function call fun1, fun1 use double for loop to loop through the matrix and use distance function to work out the distance between two-row. The final results matrix will be a 5x5 matrix.
I am struggling to covert this fun1 to a vectorization function(no loop, only apply function).
x =
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
distance = function(a, b) {
sqrt(sum((a - b)^2))
}
fun1 = function(x) {
n = nrow(x)
results = matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
for (j in seq_len(n)) {
results[i,j] = distance(m[i,], m[j,])
}
}
results
}
You can do it with just a matrix multiplication, some additions and a transpose.
x <- matrix(1:20, nrow = 5)
z <- x %*% t(x)
sqrt(diag(z)+t(diag(z)-2*z))
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 2 4 6 8
#> [2,] 2 0 2 4 6
#> [3,] 4 2 0 2 4
#> [4,] 6 4 2 0 2
#> [5,] 8 6 4 2 0
Interestingly this is faster than the in built method mentioned in the comments above!
mdist <- function(x) {
z <- x %*% t(x)
sqrt(diag(z)+t(diag(z)-2*z))
}
n <- 1000
l <- 100
x <- matrix(runif(n*l), ncol = l)
microbenchmark::microbenchmark(
z1 = as.matrix(dist(x)),
z2 = dist(x, diag = TRUE, upper = TRUE),
z3 = mdist(x),
times = 100
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> z1 82.98502 90.20049 98.54552 94.85027 101.78114 140.1809 100
#> z2 72.54279 76.22054 82.75410 79.31865 83.47765 231.3008 100
#> z3 54.58258 59.73461 65.62313 63.14435 67.49865 115.0379 100
In a pinch, Vectorize can do what you need:
outer(seq_len(nrow(m)), seq_len(nrow(m)),
Vectorize(function(i,j) distance(m[i,], m[j,]), vectorize.args=c("i","j")))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 2 4 6 8
# [2,] 2 0 2 4 6
# [3,] 4 2 0 2 4
# [4,] 6 4 2 0 2
# [5,] 8 6 4 2 0
Vectorize takes a function as an argument and returns a function that accepts vectors, iterating internally. The function passed to it is called once for each element within the vector passed. By default, Vectorize only vectorizes the first argument of the function, but it can "zip" along multiple arguments, assuming they are all the same length, by using vectorize.args=.
This might be a little easier to visualize by redefining distance:
distance_ind = function(i, j, data) {
sqrt(sum((data[i,] - data[j,])^2))
}
distance_ind(1, 2, m)
# [1] 2
distance_ind(c(1,3), c(2,3), m)
# [1] 2 ### wrong
distance_ind_vec <- Vectorize(distance_ind, vectorize.args = c("i", "j"))
distance_ind_vec(c(1,3), c(2,3), m)
# [1] 2 0
And the outer call:
outer(seq_len(nrow(m)), seq_len(nrow(m)), distance_ind_vec, data = m)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 2 4 6 8
# [2,] 2 0 2 4 6
# [3,] 4 2 0 2 4
# [4,] 6 4 2 0 2
# [5,] 8 6 4 2 0
I want to split a large matrix, mt, into a list of sub-matrices, res. The number of rows for each sub-matrix is specified by a vector, len.
For example,
> mt=matrix(c(1:20),ncol=2)
> mt
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
[6,] 6 16
[7,] 7 17
[8,] 8 18
[9,] 9 19
[10,] 10 20
lens=c(2,3,5)
What I want is a function some_function, that can offer the following result,
> res=some_function(mt,lens)
> res
[[1]]
[,1] [,2]
[1,] 1 11
[2,] 2 12
[[2]]
[,1] [,2]
[1,] 3 13
[2,] 4 14
[3,] 5 15
[[3]]
[,1] [,2]
[1,] 6 16
[2,] 7 17
[3,] 8 18
[4,] 9 19
[5,] 10 20
Speed is a big concern. The faster, the better!
Many thanks!
A function to create index based on length of each value and split the matrix.
mt <- matrix(c(1:20), ncol=2)
# Two arguments: m - matrix, len - length of each group
m_split <- function(m, len){
index <- 1:sum(len)
group <- rep(1:length(len), times = len)
index_list <- split(index, group)
mt_list <- lapply(index_list, function(vec) mt[vec, ])
return(mt_list)
}
m_split(mt, c(2, 3, 5))
$`1`
[,1] [,2]
[1,] 1 11
[2,] 2 12
$`2`
[,1] [,2]
[1,] 3 13
[2,] 4 14
[3,] 5 15
$`3`
[,1] [,2]
[1,] 6 16
[2,] 7 17
[3,] 8 18
[4,] 9 19
[5,] 10 20
Update
I used the following code to compare the performance of each method in this post.
library(microbenchmark)
library(data.table)
# Test case from #missuse
mt <- matrix(c(1:20000000),ncol=10)
lens <- c(20000,15000,(nrow(mt)-20000-15000))
# Functions from #Damiano Fantini
split.df <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
split(as.data.frame(mt), f = fac)
}
split.mat <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
lapply(unique(fac), (function(i) {mt[fac==i,]}))
}
# Benchmarking
microbenchmark(m1 = {m_split(mt, lens)}, # #ycw's method
m2 = {pam = rep(1:length(lens), times = lens)
split(data.table(mt), pam)}, # #missuse's data.table method
m3 = {split.df(mt, lens)}, # #Damiano Fantini's data frame method
m4 = {split.mat(mt, lens)}) # #Damiano Fantini's matrix method
Unit: milliseconds
expr min lq mean median uq max neval
m1 167.6896 209.7746 251.0932 230.5920 274.9347 555.8839 100
m2 402.3415 497.2397 554.1094 547.9603 599.7632 787.4112 100
m3 552.8548 657.6245 719.2548 711.4123 769.6098 989.6779 100
m4 166.6581 203.6799 249.2965 235.5856 275.4790 547.4927 100
As we can see, m1 and m4 are the fastest, while there are almost no differences between them, which means it is not needed to convert the matrix to a data frame or a data.table especially if the OP will keep working on the matrix. Working directly on the matrix (m1 and m4) should be sufficient.
If you are OK working with data.frames instead of matrices, you might build a grouping factor/vector according to lens and then use split(). Alternatively, use this grouping vector to subset your matrix and return a list. In this example, I wrapped the two solutions into two functions: .
# your data
mt=matrix(c(1:20),ncol=2)
lens=c(2,3,5)
# based on split
split.df <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
split(as.data.frame(mt), f = fac)
}
split.df(mt, lens)
# based on subsetting
split.mat <- function(mt, lens) {
fac <- do.call(c, lapply(1:length(lens), (function(i){ rep(i, lens[i])})))
lapply(unique(fac), (function(i) {mt[fac==i,]}))
}
split.mat(mt, lens)
This second option is about ~10 times faster than the other one according to microbenchmark
library(microbenchmark)
microbenchmark({split.df(mt, lens)}, times = 1000)
# median = 323.743 microseconds
microbenchmark({split.mat(mt, lens)}, times = 1000)
# median = 31.7645 microseconds
One aproach is using split, however it can operate on vectors and data.frames so you need to convert the matrix - data.table should be efficient
mt=matrix(c(1:20),ncol=2)
lens=c(2,3,5)
pam = rep(1:length(lens), times = lens)
library(data.table)
mt_split <- split(data.table(mt), pam)
mt_split
#output
$`1`
V1 V2
1: 1 11
2: 2 12
$`2`
V1 V2
1: 3 13
2: 4 14
3: 5 15
$`3`
V1 V2
1: 6 16
2: 7 17
3: 8 18
4: 9 19
5: 10 20
Checking speed
mt=matrix(c(1:20000000),ncol=10)
lens=c(20000,15000,(nrow(mt)-20000-15000))
pam = rep(1:length(lens), times = lens)
system.time(split(data.table(mt), pam))
#output
user system elapsed
0.75 0.20 0.96
I want to split matrices of size k x l into blocks of size n x n considering an ofset o (Like Mathematica's Partition function does).
For example, given a matrix A like
A <- matrix(seq(1:16), nrow = 4, ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
and block size = 3, offset = 1, I want as output the four submatrices that I'd get from
A[1:3, 1:3]
A[1:3, 2:4]
A[2:4, 1:3]
A[2:4, 2:4]
If offset were equal to 2 or 3, the output for this example should be only the submatrix that I get from
A[1:3, 1:3]
How can I vectorize this?
There might be a more elegant way. Here is how I'd do it by writing a myPartition function which simulates the mathematica Partition function. Firstly use Map to construct possible index along the row and column axis where we use seq to take offset into consideration, and then use cross2 from purrr to construct a list of all possible combinations of the subset index. Finally use lapply to subset the matrix and return a list of subset matrix;
The testing results on offset 1, 2 and 3 are as follows which seems to behave as expected:
library(purrr)
ind <- function(k, n, o) Map(`:`, seq(1, k-n+1, by = o), seq(n, k, by = o))
# this is a little helper function that generates subset index according to dimension of the
# matrix, the first sequence construct the starting point of the subset index with an interval
# of o which is the offset while the second sequence construct the ending point of the subset index
# use Map to construct vector from start to end which in OP's case will be 1:3 and 2:4.
myPartition <- function(mat, n, o) {
lapply(cross2(ind(nrow(mat),n,o), ind(ncol(mat),n,o)), function(i) mat[i[[1]], i[[2]]])
}
# This is basically an lapply. we use cross2 to construct combinations of all subset index
# which will be 1:3 and 1:3, 1:3 and 2:4, 2:4 and 1:3 and 2:4 and 2:4 in OP's case. Use lapply
# to loop through the index and subset.
# Testing case for offset = 1
myPartition(A, 3, 1)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# [[2]]
# [,1] [,2] [,3]
# [1,] 2 6 10
# [2,] 3 7 11
# [3,] 4 8 12
# [[3]]
# [,1] [,2] [,3]
# [1,] 5 9 13
# [2,] 6 10 14
# [3,] 7 11 15
# [[4]]
# [,1] [,2] [,3]
# [1,] 6 10 14
# [2,] 7 11 15
# [3,] 8 12 16
# Testing case for offset = 2
myPartition(A, 3, 2)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# Testing case for offset = 3
myPartition(A, 3, 3)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
How about this using base R, the idea is to generate all possible windows (i.e. winds) of size n*n while taking into account the offset. Then print all possible permutations of winds's elements in matrix A (i.e. perms). It works for any A of size k*l.
A <- matrix(seq(1:16), nrow = 4, ncol = 4)
c <- ncol(A); r <- nrow(A)
offset <- 1; size <- 3
sq <- seq(1, max(r,c), offset)
winds <- t(sapply(sq, function(x) c(x,(x+size-1))))
winds <- winds[winds[,2]<=max(r, c),] # check the range
if (is.vector(winds)) dim(winds) <- c(1,2) # vector to matrix
perms <- expand.grid(list(1:nrow(winds), 1:nrow(winds)))
out=apply(perms, 1, function(x) {
a11 <- winds[x[1],1];a12 <- winds[x[1],2];a21 <- winds[x[2],1];a22 <- winds[x[2],2]
if (ifelse(r<c, a12<=r, a22<=c)) { # check the range
cat("A[", a11, ":", a12, ", ", a21, ":", a22, "]", sep="", "\n")
print(A[a11:a12, a21:a22])
}
})
# A[1:3, 1:3]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# A[2:4, 1:3]
# [,1] [,2] [,3]
# [1,] 2 6 10
# [2,] 3 7 11
# [3,] 4 8 12
# A[1:3, 2:4]
# [,1] [,2] [,3]
# [1,] 5 9 13
# [2,] 6 10 14
# [3,] 7 11 15
# A[2:4, 2:4]
# [,1] [,2] [,3]
# [1,] 6 10 14
# [2,] 7 11 15
# [3,] 8 12 16
For size=3 and offset=2 or offset=3:
# A[1:3, 1:3]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
For offset=2 and size=2:
# A[1:2, 1:2]
# [,1] [,2]
# [1,] 1 5
# [2,] 2 6
# A[3:4, 1:2]
# [,1] [,2]
# [1,] 3 7
# [2,] 4 8
# A[1:2, 3:4]
# [,1] [,2]
# [1,] 9 13
# [2,] 10 14
# A[3:4, 3:4]
# [,1] [,2]
# [1,] 11 15
# [2,] 12 16
I have a matrix in R that is supposed to be symmetric, however, due to machine precision the matrix is never symmetric (the values differ by around 10^-16). Since I know the matrix is symmetric I have been doing this so far to get around the problem:
s.diag = diag(s)
s[lower.tri(s,diag=T)] = 0
s = s + t(s) + diag(s.diag,S)
Is there a better one line command for this?
s<-matrix(1:25,5)
s[lower.tri(s)] = t(s)[lower.tri(s)]
You can force the matrix to be symmetric using forceSymmetric function in Matrix package in R:
library(Matrix)
x<-Matrix(rnorm(9), 3)
> x
3 x 3 Matrix of class "dgeMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] 0.7076883 -1.0411563 0.4324291
[3,] -0.4108909 -0.3292247 -0.3076071
A <- forceSymmetric(x)
> A
3 x 3 Matrix of class "dsyMatrix"
[,1] [,2] [,3]
[1,] -1.3484514 -0.4460452 -0.2828216
[2,] -0.4460452 -1.0411563 0.4324291
[3,] -0.2828216 0.4324291 -0.3076071
Is the workaround really necessary if the values only differ by that much?
Someone pointed out that my previous answer was wrong. I like some of the other ones better, but since I can't delete this one (accepted by a user who left), here's yet another solution using the micEcon package:
symMatrix(s[upper.tri(s, TRUE)], nrow=nrow(s), byrow=TRUE)
s<-matrix(1:25,5)
pmean <- function(x,y) (x+y)/2
s[] <- pmean(s, matrix(s, nrow(s), byrow=TRUE))
s
#-------
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 4 7 10 13 16
[3,] 7 10 13 16 19
[4,] 10 13 16 19 22
[5,] 13 16 19 22 25
I was curious to compare all the methods, so ran a quick microbenchmark. Clearly, the simplest 0.5 * (S + t(S)) is the fastest.
The specific function Matrix::forceSymmetric() is sometimes slightly faster, but it returns an object of a different class (dsyMatrix instead of matrix), and converting back to matrix takes a lot of time (although one might argue that it is a good idea to keep the output as dsyMatrix for further gains in computation).
S <-matrix(1:50^2,50)
pick_lower <- function(M) M[lower.tri(M)] = t(M)[lower.tri(M)]
microbenchmark::microbenchmark(micEcon=miscTools::symMatrix(S[upper.tri(S, TRUE)], nrow=nrow(S), byrow=TRUE),
Matri_raw =Matrix::forceSymmetric(S),
Matri_conv =as.matrix(Matrix::forceSymmetric(S)),
pick_lower = pick_lower(S),
base =0.5 * (S + t(S)),
times=100)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> micEcon 62.133 74.7515 136.49538 104.2430 115.6950 3581.001 100 a
#> Matri_raw 14.766 17.9130 24.15157 24.5060 26.6050 63.939 100 a
#> Matri_conv 46.767 59.8165 5621.96140 66.3785 73.5380 555393.346 100 a
#> pick_lower 27.907 30.7930 235.65058 48.9760 53.0425 12484.779 100 a
#> base 10.771 12.4535 16.97627 17.1190 18.3175 47.623 100 a
Created on 2021-02-08 by the reprex package (v1.0.0)
as.dist() will overwrite the upper triangle of a matrix with the lower one and replace the diagonal with zeros. This method only works on numeric matrices.
mat <- matrix(1:25, 5)
unname(`diag<-`(as.matrix(as.dist(mat)), diag(mat)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 2 3 4 5
# [2,] 2 7 8 9 10
# [3,] 3 8 13 14 15
# [4,] 4 9 14 19 20
# [5,] 5 10 15 20 25
Inspired by user3318600
s<-matrix(1:25,5)
s[lower.tri(s)]<-s[upper.tri(s)]
I want to go from something like this:
1> a = matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4),4)
1> a
[,1] [,2] [,3] [,4]
[1,] 1 2 4 1
[2,] 4 5 4 6
[3,] 2 2 3 7
[4,] 5 1 2 4
To something like this:
[,1] [,2]
[1,] 12 15
[2,] 10 16
...without using for-loops, plyr, or otherwise without looping. Possible? I'm trying to shrink a geographic lat/long dataset from 5 arc-minutes to half-degree, and I've got an ascii grid. A little function where I specify blocksize would be great. I've got hundreds of such files, so things that allow me to do it quickly without parallelization/supercomputers would be much appreciated.
You can use matrix multiplication for this.
# Computation matrix:
mat <- function(n, r) {
suppressWarnings(matrix(c(rep(1, r), rep(0, n)), n, n/r))
}
Square-matrix example, uses a matrix and its transpose on each side of a:
# Reduce a 4x4 matrix by a factor of 2:
x <- mat(4, 2)
x
## [,1] [,2]
## [1,] 1 0
## [2,] 1 0
## [3,] 0 1
## [4,] 0 1
t(x) %*% a %*% x
## [,1] [,2]
## [1,] 12 15
## [2,] 10 16
Non-square example:
b <- matrix(1:24, 4 ,6)
t(mat(4, 2)) %*% b %*% mat(6, 2)
## [,1] [,2] [,3]
## [1,] 14 46 78
## [2,] 22 54 86
tapply(a, list((row(a) + 1L) %/% 2L, (col(a) + 1L) %/% 2L), sum)
# 1 2
# 1 12 15
# 2 10 16
I used 1L and 2L instead of 1 and 2 so indices remain integers (as opposed to numerics) and it should run faster that way.
I guess that might help you, but still it uses sapply which can be considered as loop-ish tool.
a <- matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4),4)
block.step <- 2
res <- sapply(seq(1, nrow(a), by=block.step), function(x)
sapply(seq(1, nrow(a), by=block.step), function(y)
sum(a[x:(x+block.step-1), y:(y+block.step-1)])
)
)
res
Is it anyhow helpful ?