How to vectorize a function - r

I have a 5x4 matrix. I have created a function called fun1, which uses a double for loop to go through the matrix and a distance function to work out the distance between each pair of rows. The final result will be a 5x5 matrix.
I am struggling to convert fun1 into a vectorized function (no loops, only apply functions).
x =
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
distance = function(a, b) {
sqrt(sum((a - b)^2))
}
fun1 = function(x) {
n = nrow(x)
results = matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
for (j in seq_len(n)) {
results[i,j] = distance(x[i,], x[j,])
}
}
results
}

You can do it with just a matrix multiplication, some additions and a transpose.
x <- matrix(1:20, nrow = 5)
z <- x %*% t(x)
sqrt(diag(z)+t(diag(z)-2*z))
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 2 4 6 8
#> [2,] 2 0 2 4 6
#> [3,] 4 2 0 2 4
#> [4,] 6 4 2 0 2
#> [5,] 8 6 4 2 0
Interestingly, this is faster than the built-in dist() function mentioned in the comments:
mdist <- function(x) {
z <- x %*% t(x)
sqrt(diag(z)+t(diag(z)-2*z))
}
n <- 1000
l <- 100
x <- matrix(runif(n*l), ncol = l)
microbenchmark::microbenchmark(
z1 = as.matrix(dist(x)),
z2 = dist(x, diag = TRUE, upper = TRUE),
z3 = mdist(x),
times = 100
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> z1 82.98502 90.20049 98.54552 94.85027 101.78114 140.1809 100
#> z2 72.54279 76.22054 82.75410 79.31865 83.47765 231.3008 100
#> z3 54.58258 59.73461 65.62313 63.14435 67.49865 115.0379 100
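The trick rests on the identity |a - b|^2 = a.a + b.b - 2 a.b, where every dot product is an entry of x %*% t(x). A minimal sketch of the same idea follows; mdist_safe is a hypothetical name, and the pmax() clamp is an added assumption to guard against tiny negative values from floating-point rounding, which would otherwise turn into NaN under sqrt():

```r
# Hypothetical variant of mdist(): same identity, with a guard against
# small negative squared distances caused by rounding.
mdist_safe <- function(x) {
  z <- tcrossprod(x)                          # same as x %*% t(x)
  d2 <- outer(diag(z), diag(z), "+") - 2 * z  # squared distances
  sqrt(pmax(d2, 0))                           # clamp rounding noise at 0
}

x <- matrix(1:20, nrow = 5)
# Compare against the built-in; dist() attaches dimnames, so ignore attributes.
all.equal(mdist_safe(x), as.matrix(dist(x)), check.attributes = FALSE)
```

Because the subtraction can lose precision when rows are nearly identical, it may be worth checking the result against dist() on your own data before relying on the speed-up.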

In a pinch, Vectorize can do what you need:
outer(seq_len(nrow(m)), seq_len(nrow(m)),
Vectorize(function(i,j) distance(m[i,], m[j,]), vectorize.args=c("i","j")))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 2 4 6 8
# [2,] 2 0 2 4 6
# [3,] 4 2 0 2 4
# [4,] 6 4 2 0 2
# [5,] 8 6 4 2 0
Vectorize takes a function as an argument and returns a function that accepts vectors, iterating internally (it is a wrapper around mapply). The wrapped function is called once for each element of the vectors passed. By default, Vectorize vectorizes over every argument of the function; vectorize.args= restricts it to the named arguments, which are "zipped" together (shorter ones are recycled), while the remaining arguments are passed through unchanged. That matters below, where the data argument must not be iterated over.
This might be a little easier to visualize by redefining distance:
distance_ind = function(i, j, data) {
sqrt(sum((data[i,] - data[j,])^2))
}
distance_ind(1, 2, m)
# [1] 2
distance_ind(c(1,3), c(2,3), m)
# [1] 2 ### wrong
distance_ind_vec <- Vectorize(distance_ind, vectorize.args = c("i", "j"))
distance_ind_vec(c(1,3), c(2,3), m)
# [1] 2 0
And the outer call:
outer(seq_len(nrow(m)), seq_len(nrow(m)), distance_ind_vec, data = m)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 2 4 6 8
# [2,] 2 0 2 4 6
# [3,] 4 2 0 2 4
# [4,] 6 4 2 0 2
# [5,] 8 6 4 2 0
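Since Vectorize is built on mapply, the outer call above still evaluates distance once per (i, j) pair at R level. A rough sketch of the equivalent explicit mapply call, assuming m is the 5x4 matrix from the question (it is defined here so the snippet runs on its own):

```r
m <- matrix(1:20, nrow = 5)  # assumed to match the question's matrix
distance <- function(a, b) sqrt(sum((a - b)^2))

# All (i, j) index pairs, with i varying fastest (column-major order).
idx <- expand.grid(i = seq_len(nrow(m)), j = seq_len(nrow(m)))

# One call to distance() per pair, zipped along i and j.
d <- mapply(function(i, j) distance(m[i, ], m[j, ]), idx$i, idx$j)
res <- matrix(d, nrow = nrow(m))
res
```

This removes the explicit for loops but not the per-pair function calls, so it should not be expected to run much faster than fun1.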

Related

Split a matrix in blocks of size n with offset i (vectorized method)

I want to split matrices of size k x l into blocks of size n x n, taking an offset o into account (like Mathematica's Partition function does).
For example, given a matrix A like
A <- matrix(seq(1:16), nrow = 4, ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
and block size = 3, offset = 1, I want as output the four submatrices that I'd get from
A[1:3, 1:3]
A[1:3, 2:4]
A[2:4, 1:3]
A[2:4, 2:4]
If offset were equal to 2 or 3, the output for this example should be only the submatrix that I get from
A[1:3, 1:3]
How can I vectorize this?
There might be a more elegant way, but here is how I'd do it: write a myPartition function that simulates Mathematica's Partition function. First, use Map to construct the possible index ranges along the row and column axes, using seq to take the offset into account. Then use cross2 from purrr to build a list of all combinations of row and column index ranges. Finally, use lapply to subset the matrix and return a list of submatrices.
The test results for offsets 1, 2 and 3 are shown below and behave as expected:
library(purrr)
ind <- function(k, n, o) Map(`:`, seq(1, k-n+1, by = o), seq(n, k, by = o))
# ind is a small helper that generates the subset indices for one dimension of the
# matrix: the first sequence gives the starting points of the index ranges, spaced by the
# offset o, and the second sequence gives the corresponding end points.
# Map then builds a vector from each start to its end, which in OP's case gives 1:3 and 2:4.
myPartition <- function(mat, n, o) {
lapply(cross2(ind(nrow(mat),n,o), ind(ncol(mat),n,o)), function(i) mat[i[[1]], i[[2]]])
}
# This is basically an lapply. we use cross2 to construct combinations of all subset index
# which will be 1:3 and 1:3, 1:3 and 2:4, 2:4 and 1:3 and 2:4 and 2:4 in OP's case. Use lapply
# to loop through the index and subset.
# Testing case for offset = 1
myPartition(A, 3, 1)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# [[2]]
# [,1] [,2] [,3]
# [1,] 2 6 10
# [2,] 3 7 11
# [3,] 4 8 12
# [[3]]
# [,1] [,2] [,3]
# [1,] 5 9 13
# [2,] 6 10 14
# [3,] 7 11 15
# [[4]]
# [,1] [,2] [,3]
# [1,] 6 10 14
# [2,] 7 11 15
# [3,] 8 12 16
# Testing case for offset = 2
myPartition(A, 3, 2)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# Testing case for offset = 3
myPartition(A, 3, 3)
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
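As far as I know, cross2 is deprecated in recent purrr releases (1.0.0 onwards), so here is a sketch of the same approach in base R only. partition_base is a hypothetical name; expand.grid stands in for cross2 and varies the row index fastest, which matches the order of the output above. It assumes the block size n is no larger than either dimension of the matrix.

```r
# Same index helper as above: start:end ranges of length n, spaced by offset o.
ind <- function(k, n, o) Map(`:`, seq(1, k - n + 1, by = o), seq(n, k, by = o))

# Hypothetical base R counterpart of myPartition().
partition_base <- function(mat, n, o) {
  ri <- ind(nrow(mat), n, o)  # row index ranges
  ci <- ind(ncol(mat), n, o)  # column index ranges
  grid <- expand.grid(r = seq_along(ri), c = seq_along(ci))
  # One submatrix per (row range, column range) pair.
  Map(function(r, c) mat[ri[[r]], ci[[c]], drop = FALSE], grid$r, grid$c)
}

A <- matrix(1:16, nrow = 4)
blocks <- partition_base(A, 3, 1)
length(blocks)
```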
How about this using base R? The idea is to generate all possible windows (i.e. winds) of size n*n while taking the offset into account, and then print the submatrix of A for every pair of windows (i.e. perms). It works for any A of size k*l.
A <- matrix(seq(1:16), nrow = 4, ncol = 4)
c <- ncol(A); r <- nrow(A)
offset <- 1; size <- 3
sq <- seq(1, max(r,c), offset)
winds <- t(sapply(sq, function(x) c(x,(x+size-1))))
winds <- winds[winds[,2]<=max(r, c),] # check the range
if (is.vector(winds)) dim(winds) <- c(1,2) # vector to matrix
perms <- expand.grid(list(1:nrow(winds), 1:nrow(winds)))
out=apply(perms, 1, function(x) {
a11 <- winds[x[1],1];a12 <- winds[x[1],2];a21 <- winds[x[2],1];a22 <- winds[x[2],2]
if (ifelse(r<c, a12<=r, a22<=c)) { # check the range
cat("A[", a11, ":", a12, ", ", a21, ":", a22, "]", sep="", "\n")
print(A[a11:a12, a21:a22])
}
})
# A[1:3, 1:3]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
# A[2:4, 1:3]
# [,1] [,2] [,3]
# [1,] 2 6 10
# [2,] 3 7 11
# [3,] 4 8 12
# A[1:3, 2:4]
# [,1] [,2] [,3]
# [1,] 5 9 13
# [2,] 6 10 14
# [3,] 7 11 15
# A[2:4, 2:4]
# [,1] [,2] [,3]
# [1,] 6 10 14
# [2,] 7 11 15
# [3,] 8 12 16
For size=3 and offset=2 or offset=3:
# A[1:3, 1:3]
# [,1] [,2] [,3]
# [1,] 1 5 9
# [2,] 2 6 10
# [3,] 3 7 11
For offset=2 and size=2:
# A[1:2, 1:2]
# [,1] [,2]
# [1,] 1 5
# [2,] 2 6
# A[3:4, 1:2]
# [,1] [,2]
# [1,] 3 7
# [2,] 4 8
# A[1:2, 3:4]
# [,1] [,2]
# [1,] 9 13
# [2,] 10 14
# A[3:4, 3:4]
# [,1] [,2]
# [1,] 11 15
# [2,] 12 16

Creating a 5x5 matrix with 0's lined diagonally

In R, I want to create a 5x5 matrix of 0,1,3,5,7 such that:
0 1 3 5 7
1 0 3 5 7
1 3 0 5 7
1 3 5 0 7
1 3 5 7 0
So obviously I can generate the starting matrix:
z <- c(0,1,3,5,7)
matrix(z, ncol=5, nrow=5, byrow = TRUE)
but I'm unsure of how to move the 0's position. I'm sure I have to use some sort of for/in loop, but I really don't know what exactly I need to do.
How about this:
m <- 1 - diag(5)
m[m==1] <- rep(c(1,3,5,7), each=5)
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 1 3 5 7
# [2,] 1 0 3 5 7
# [3,] 1 3 0 5 7
# [4,] 1 3 5 0 7
# [5,] 1 3 5 7 0
Or we can do:
z <- c(1,3,5,7)
mat <- 1-diag(5)
mat[mat==1] <- z
t(mat)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 1 3 5 7
# [2,] 1 0 3 5 7
# [3,] 1 3 0 5 7
# [4,] 1 3 5 0 7
# [5,] 1 3 5 7 0
Yet another solution just to enjoy combn as well:
r <- integer(5)
t(combn(5, 1, function(v) {r[v]<-0;r[-v]<-z;r}))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 1 3 5 7
# [2,] 1 0 3 5 7
# [3,] 1 3 0 5 7
# [4,] 1 3 5 0 7
# [5,] 1 3 5 7 0
Or using sapply:
v <- integer(5)
t(sapply(seq(5), function(x) {v[x]<-0;v[-x]<-z;v}))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 1 3 5 7
# [2,] 1 0 3 5 7
# [3,] 1 3 0 5 7
# [4,] 1 3 5 0 7
# [5,] 1 3 5 7 0
Here's a solution that builds the data vector with a couple of calls to rep(), a couple of calls to c(), a seq(), and an rbind(), and then wraps it in a call to matrix():
N <- 5L;
matrix(rep(c(0,rbind(seq(1,(N-1)*2,2),0)),rep(c(1,N),len=N*2-1)),N);
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 1 3 5 7
## [2,] 1 0 3 5 7
## [3,] 1 3 0 5 7
## [4,] 1 3 5 0 7
## [5,] 1 3 5 7 0
Another idea, using two calls to diag() and a cumsum():
N <- 5L;
(1-diag(N))*(cumsum(diag(N)*2)-1);
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 1 3 5 7
## [2,] 1 0 3 5 7
## [3,] 1 3 0 5 7
## [4,] 1 3 5 0 7
## [5,] 1 3 5 7 0
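The one-liner is easier to follow when split into its parts. A small sketch, where mask, steps and vals are names introduced here purely for illustration:

```r
N <- 5L
mask <- 1 - diag(N)               # 1 off the diagonal, 0 on it
# cumsum() walks the matrix column by column, so the running total
# goes up by 2 each time it passes a diagonal cell.
steps <- cumsum(diag(N) * 2) - 1
vals <- matrix(steps, N)          # reshape only to make the pattern visible
vals
res <- mask * vals                # zero out the diagonal
res
```

Printing vals shows that each column already holds the right off-diagonal values, with the switch to the next odd number happening exactly at the diagonal; multiplying by mask then replaces the diagonal entries with 0.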
Benchmarking
Note: For the following benchmarking tests I modified everyone's solutions where necessary to ensure they are parameterized on the matrix size N. For the most part, this just involved replacing some literals with N, and replacing instances of c(1,3,5,7) with seq(1,(N-1)*2,2). I think this is fair.
library(microbenchmark);
josh <- function(N) { m <- 1-diag(N); m[m==1] <- rep(seq(1,(N-1)*2,2),each=N); m; };
marat <- function(N) matrix(rbind(0,col(diag(N))*2-1),nrow=N,ncol=N);
gregor <- function(N) { x = seq(1,(N-1)*2,2); t(mapply(FUN = append, after = c(0, seq_along(x)), MoreArgs = list(x = x, values = 0))); };
barkley <- function(N) { my_vec <- seq(1,(N-1)*2,2); my_val <- 0; my_mat <- matrix(NA, ncol = length(my_vec)+1, nrow = length(my_vec)+1); for (i in 1:nrow(my_mat)) { my_mat[i, i] <- my_val; my_mat[i, -i] <- my_vec; }; my_mat; };
m0h3n <- function(N) { z <- seq(1,(N-1)*2,2); mat=1-diag(N); mat[mat==1]=z; t(mat); };
bgoldst1 <- function(N) matrix(rep(c(0,rbind(seq(1,(N-1)*2,2),0)),rep(c(1,N),len=N*2-1)),N);
bgoldst2 <- function(N) (1-diag(N))*(cumsum(diag(N)*2)-1);
## small-scale: 5x5
N <- 5L;
ex <- josh(N);
identical(ex,marat(N));
## [1] TRUE
identical(ex,gregor(N));
## [1] TRUE
identical(ex,barkley(N));
## [1] TRUE
identical(ex,m0h3n(N));
## [1] TRUE
identical(ex,bgoldst1(N));
## [1] TRUE
identical(ex,bgoldst2(N));
## [1] TRUE
microbenchmark(josh(N),marat(N),gregor(N),barkley(N),m0h3n(N),bgoldst1(N),bgoldst2(N));
## Unit: microseconds
## expr min lq mean median uq max neval
## josh(N) 20.101 21.8110 25.71966 23.0935 24.8045 108.197 100
## marat(N) 5.987 8.1260 9.01131 8.5535 8.9820 24.805 100
## gregor(N) 49.608 51.9605 57.61397 53.8850 61.7965 98.361 100
## barkley(N) 29.081 32.0750 36.33830 33.7855 41.9110 54.740 100
## m0h3n(N) 22.666 24.8040 28.45663 26.0870 28.4400 59.445 100
## bgoldst1(N) 20.528 23.0940 25.49303 23.5220 24.8050 56.879 100
## bgoldst2(N) 3.849 5.1320 5.73551 5.5600 5.9880 16.251 100
## medium-scale: 50x50
N <- 50L;
ex <- josh(N);
identical(ex,marat(N));
## [1] TRUE
identical(ex,gregor(N));
## [1] TRUE
identical(ex,barkley(N));
## [1] TRUE
identical(ex,m0h3n(N));
## [1] TRUE
identical(ex,bgoldst1(N));
## [1] TRUE
identical(ex,bgoldst2(N));
## [1] TRUE
microbenchmark(josh(N),marat(N),gregor(N),barkley(N),m0h3n(N),bgoldst1(N),bgoldst2(N));
## Unit: microseconds
## expr min lq mean median uq max neval
## josh(N) 106.913 110.7630 115.68488 113.1145 116.1080 179.187 100
## marat(N) 62.866 65.4310 78.96237 66.7140 67.9980 1163.215 100
## gregor(N) 195.438 205.2735 233.66129 213.6130 227.9395 1307.334 100
## barkley(N) 184.746 194.5825 227.43905 198.6455 207.1980 1502.771 100
## m0h3n(N) 73.557 76.1230 92.48893 78.6885 81.6820 1176.045 100
## bgoldst1(N) 51.318 54.3125 95.76484 56.4500 60.0855 1732.421 100
## bgoldst2(N) 18.817 21.8110 45.01952 22.6670 23.5220 1118.739 100
## large-scale: 1000x1000
N <- 1e3L;
ex <- josh(N);
identical(ex,marat(N));
## [1] TRUE
identical(ex,gregor(N));
## [1] TRUE
identical(ex,barkley(N));
## [1] TRUE
identical(ex,m0h3n(N));
## [1] TRUE
identical(ex,bgoldst1(N));
## [1] TRUE
identical(ex,bgoldst2(N));
## [1] TRUE
microbenchmark(josh(N),marat(N),gregor(N),barkley(N),m0h3n(N),bgoldst1(N),bgoldst2(N));
## Unit: milliseconds
## expr min lq mean median uq max neval
## josh(N) 40.32035 43.42810 54.46468 45.36386 80.17241 90.69608 100
## marat(N) 41.00074 45.34248 54.74335 47.00904 50.74608 93.85429 100
## gregor(N) 33.65923 37.82393 50.50060 40.24914 75.09810 83.27246 100
## barkley(N) 31.02233 35.42223 43.08745 36.85615 39.81999 85.28585 100
## m0h3n(N) 27.08622 31.00202 38.98395 32.33244 34.33856 90.82652 100
## bgoldst1(N) 12.53962 13.02672 18.31603 14.92314 16.96433 59.87945 100
## bgoldst2(N) 13.23926 16.87965 28.81906 18.92319 54.60009 62.01258 100
## very large scale: 10,000x10,000
N <- 1e4L;
ex <- josh(N);
identical(ex,marat(N));
## [1] TRUE
identical(ex,gregor(N));
## [1] TRUE
identical(ex,barkley(N));
## [1] TRUE
identical(ex,m0h3n(N));
## [1] TRUE
identical(ex,bgoldst1(N));
## [1] TRUE
identical(ex,bgoldst2(N));
## [1] TRUE
microbenchmark(josh(N),marat(N),gregor(N),barkley(N),m0h3n(N),bgoldst1(N),bgoldst2(N));
## Unit: seconds
## expr min lq mean median uq max neval
## josh(N) 3.698714 3.908910 4.067409 4.046770 4.191938 4.608312 100
## marat(N) 6.440882 6.977273 7.272962 7.223293 7.493600 8.471888 100
## gregor(N) 3.546885 3.850812 4.032477 4.022563 4.221085 4.651799 100
## barkley(N) 2.955906 3.162409 3.324033 3.279032 3.446875 4.444848 100
## m0h3n(N) 3.355968 3.667484 3.829618 3.777151 3.973279 4.649226 100
## bgoldst1(N) 1.044510 1.260041 1.363827 1.369945 1.441194 1.819248 100
## bgoldst2(N) 1.144168 1.391711 1.517189 1.519653 1.629994 2.478636 100
Perhaps not the most beautiful solution ever, but maybe elegant in its simplicity:
my_vec <- c(1,3,5,7)
my_val <- 0
my_mat <- matrix(NA, ncol = length(my_vec)+1, nrow = length(my_vec)+1)
for (i in 1:nrow(my_mat)) {
my_mat[i, i] <- my_val
my_mat[i, -i] <- my_vec
}
my_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 3 5 7
[2,] 1 0 3 5 7
[3,] 1 3 0 5 7
[4,] 1 3 5 0 7
[5,] 1 3 5 7 0
You could use
n <- 5
matrix(rbind(0,col(diag(n))*2-1),nrow=n,ncol=n)
Fun question! In poking around, I saw that append has an after argument.
x = c(1, 3, 5, 7)
t(mapply(FUN = append, after = c(0, seq_along(x)),
MoreArgs = list(x = x, values = 0)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 1 3 5 7
# [2,] 1 0 3 5 7
# [3,] 1 3 0 5 7
# [4,] 1 3 5 0 7
# [5,] 1 3 5 7 0
Another option, directly constructing each row:
v = c(1, 3, 5, 7)
n = length(v)
t(sapply(0:n, function(i) c(v[0:i], 0, v[seq(to = n, length.out = n - i)])))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 1 3 5 7
#[2,] 1 0 3 5 7
#[3,] 1 3 0 5 7
#[4,] 1 3 5 0 7
#[5,] 1 3 5 7 0

Imputation mean in a matrix in R

I have a matrix in R with 440 rows and 261 columns.
There are some 0 values.
In each row I need to change the 0 values to the mean of all the values.
I tried to do it with the code below, but every time it changed with only the first mean value.
snp2<- read.table("snp2.txt",h=T)
mean <- rowMeans(snp2)
for(k in 1:nrow(snp2))
{
snp2[k==0]<-mean[k]
}
Instead of looping through the rows, you could do this in one shot by identifying all the 0 indices in the matrix and replacing them with the appropriate row mean:
# Sample data
(mat <- matrix(c(0, 1, 2, 1, 0, 3, 11, 11, 11), nrow=3))
# [,1] [,2] [,3]
# [1,] 0 1 11
# [2,] 1 0 11
# [3,] 2 3 11
(zeroes <- which(mat == 0, arr.ind=TRUE))
# row col
# [1,] 1 1
# [2,] 2 2
mat[zeroes] <- rowMeans(mat)[zeroes[,"row"]]
mat
# [,1] [,2] [,3]
# [1,] 4 1 11
# [2,] 1 4 11
# [3,] 2 3 11
While you could fix up your function to replace these missing values row by row, this will not be as efficient as the one-shot approach (in addition to being more typing):
josilber <- function(mat) {
zeroes <- which(mat == 0, arr.ind=TRUE)
mat[zeroes] <- rowMeans(mat)[zeroes[,"row"]]
mat
}
OP.fixed <- function(mat) {
means <- rowMeans(mat)
for(k in 1:nrow(mat)) {
mat[k,][mat[k,] == 0] <- means[k]
}
mat
}
bgoldst <- function(m) ifelse(m==0,rowMeans({ mt <- m; mt[mt==0] <- NA; mt; },na.rm=T)[row(m)],m);
# 4400 x 2610 matrix
bigger <- matrix(sample(0:10, 4400*2610, replace=TRUE), nrow=4400)
all.equal(josilber(bigger), OP.fixed(bigger))
# [1] TRUE
# bgoldst differs because it takes means of non-zero values only
library(microbenchmark)
microbenchmark(josilber(bigger), OP.fixed(bigger), bgoldst(bigger), times=10)
# Unit: milliseconds
# expr min lq mean median uq max neval
# josilber(bigger) 262.541 382.0706 406.1107 395.3815 452.0872 532.4742 10
# OP.fixed(bigger) 1033.071 1184.7288 1236.6245 1238.8298 1271.7677 1606.6737 10
# bgoldst(bigger) 3820.044 4033.5826 4368.5848 4201.6302 4611.9697 5581.5514 10
For a fairly large matrix (4400 x 2610), the one-shot procedure is about 3 times quicker than the fixed-up solution from the question and about 10 times faster than the one proposed by @bgoldst.
Here's a solution using ifelse(), assuming you want to exclude zeroes from the mean calculation:
NR <- 5; NC <- 5;
set.seed(1); m <- matrix(sample(c(rep(0,5),1:5),NR*NC,replace=T),NR);
m;
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 4 0 0 5
## [2,] 0 5 0 3 0
## [3,] 1 2 2 5 2
## [4,] 5 2 0 0 0
## [5,] 0 0 3 3 0
ifelse(m==0,rowMeans({ mt <- m; mt[mt==0] <- NA; mt; },na.rm=T)[row(m)],m);
## [,1] [,2] [,3] [,4] [,5]
## [1,] 4.5 4 4.5 4.5 5.0
## [2,] 4.0 5 4.0 3.0 4.0
## [3,] 1.0 2 2.0 5.0 2.0
## [4,] 5.0 2 3.5 3.5 3.5
## [5,] 3.0 3 3.0 3.0 3.0
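If zeroes should be excluded from the mean, the one-shot indexing idea still applies, because zeroes contribute nothing to a row sum. A minimal sketch: impute_nonzero_mean is a hypothetical name, and the sample matrix is typed in by hand to match the one printed above, so it does not depend on the random number generator. A row made up entirely of zeroes would come out as NaN.

```r
# Hypothetical helper: replace zeroes with the mean of the non-zero
# values in the same row, without ifelse().
impute_nonzero_mean <- function(m) {
  nz_mean <- rowSums(m) / rowSums(m != 0)   # sum / count of non-zero cells
  zeroes <- which(m == 0, arr.ind = TRUE)   # (row, col) pairs of the zeroes
  m[zeroes] <- nz_mean[zeroes[, "row"]]
  m
}

m <- matrix(c(0, 4, 0, 0, 5,
              0, 5, 0, 3, 0,
              1, 2, 2, 5, 2,
              5, 2, 0, 0, 0,
              0, 0, 3, 3, 0), nrow = 5, byrow = TRUE)
out <- impute_nonzero_mean(m)
out
```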

How to do exponential calculation with matrix?

I want to calculate exponential with a matrix and vector. The matrix is as below
ID var_0 var_01 var_02 var_03
1 1 2 3 4
2 5 6 7 8
3 9 10 11 12
...
and vector is (0.1,0.2,0.3,0.4)
I want to get the result as below
ID var_0 var_01 var_02 var_03
1 1^0.1 2^0.2 3^0.3 4^0.4
2 5^0.1 6^0.2 7^0.3 8^0.4
3 9^0.1 10^0.2 11^0.3 12^0.4
...
That is, I want to get (ith var)^ith vector for each ID
You can use R's recycling of vectors. Transpose your matrix so that the power calculations are applied in the correct order and then transpose back.
(m <- matrix(1:12, nrow=3, ncol=4, byrow=TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 5 6 7 8
# [3,] 9 10 11 12
p <- 1:4
t(t(m)^p)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
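To see why the transposes are needed: R stores matrices column by column, so in a plain m^p the exponent vector is recycled down each column rather than across each row. A small check with the same m and p (wrong and right are names used here only for illustration):

```r
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
p <- 1:4

wrong <- m^p        # p is recycled down the columns: m[2, 1] gets p[2]
right <- t(t(m)^p)  # after transposing, p lines up with the original columns

wrong[2, 1]  # 5^2, not what was asked for
right[2, 1]  # 5^1
```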
Or you could do (data from @user20650's post)
m^p[col(m)]
# [,1] [,2] [,3] [,4]
#[1,] 1 4 27 256
#[2,] 5 36 343 4096
#[3,] 9 100 1331 20736
Or maybe (using @user20650's data set)
m^rep(p, each = nrow(m))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Another option
m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Some benchmarks on a bigger data set. It seems that my two answers and @akrun's scale the best:
n <- 1e6
cols <- 100
m <- matrix(seq_len(n), nrow = n, ncol = cols)
p <- seq_len(cols)
user20650 = function() {t(t(m)^p)}
Nick = function() {sweep(m, 2, p, `^`)}
akrun = function() {m^p[col(m)]}
David1 = function() {m^rep(p, each = nrow(m))}
David2 = function() {m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)}
library(microbenchmark)
Res <- microbenchmark(
user20650() ,
Nick(),
akrun(),
David1(),
David2()
)
Res
# Unit: seconds
# expr min lq median uq max neval
# user20650() 9.692392 9.800470 9.878385 10.010198 11.002012 100
# Nick() 10.487660 10.595750 10.687573 10.896852 14.083319 100
# akrun() 8.213784 8.316646 8.395962 8.529671 9.325273 100
# David1() 9.115449 9.219430 9.304380 9.425614 10.445129 100
# David2() 8.157632 8.275277 8.335884 8.437017 9.348252 100
boxplot(Res)
You can do this using the sweep function. The signature is
sweep(x, MARGIN, STATS, FUN)
This function lines STATS up with the dimension(s) of x named by MARGIN: it expands STATS into an array with the same shape as x and then calls FUN once on x and that array. FUN should therefore be a vectorized function taking 2 arguments.
Setting MARGIN to 1 means STATS lines up with the rows of x (dimension 1), 2 means STATS lines up with the columns of x (dimension 2). Other variations are also possible.
So for your particular example, use
sweep(your.matrix, 2, your.exponents, `^`)
Edit: Based on @david-arenburg's answer, you probably shouldn't use sweep. I had no idea it was so slow!
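For completeness, a short sketch of how MARGIN changes which dimension the exponents are matched against, using the small example matrix from the other answers (by_col and by_row are illustrative names):

```r
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
p <- 1:4

by_col <- sweep(m, 2, p, `^`)    # p[j] applied to column j (what the question asks for)
by_row <- sweep(m, 1, 1:3, `^`)  # one exponent per row instead

all.equal(by_col, t(t(m)^p))     # same result as the transpose approach
by_row[2, 1]                     # 5^2
```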

Build a square-ish matrix with a specified number of cells

I would like to write a function that transforms an integer, n, (specifying the number of cells in a matrix) into a square-ish matrix that contain the sequence 1:n. The goal is to make the matrix as "square" as possible.
This involves a couple of considerations:
1. How to maximize "square"-ness? I was thinking of a penalty equal to the difference in the dimensions of the matrix, e.g. penalty <- abs(dim(mat)[1]-dim(mat)[2]), such that penalty==0 when the matrix is square and is positive otherwise. Ideally this would then, e.g., for n==12 lead to a preference for a 3x4 rather than 2x6 matrix. But I'm not sure of the best way to do this.
2. Account for odd-numbered values of n. Odd-numbered values of n do not necessarily produce an obvious choice of matrix (unless they have an integer square root, like n==9). I thought about simply adding 1 to n, and then handling it as an even number and allowing for one blank cell, but I'm not sure if this is the best approach. I imagine it might be possible to obtain a more square matrix (by the definition in #1) by adding more than 1 to n.
3. Allow the function to trade off squareness (as described in #1) against the number of blank cells (as described in #2), so the function should have some kind of parameter(s) to address this trade-off. For example, for n==11, a 3x4 matrix is pretty square but not as square as a 4x4, but the 4x4 would have many more blank cells than the 3x4.
4. The function needs to optionally produce wider or taller matrices, so that n==12 can produce either a 3x4 or a 4x3 matrix. But this would be easy to handle with a t() of the resulting matrix.
Here's some intended output:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Here's basically a really terrible start to this problem.
makemat <- function(n) {
n <- abs(as.integer(n))
d <- seq_len(n)
out <- d[n %% d == 0]
if(length(out)<2)
stop('n has fewer than two factors')
dim1a <- out[length(out)-1]
m <- matrix(1:n, ncol=dim1a)
m
}
As you'll see, I haven't really been able to account for odd-numbered values of n (look at the output of makemat(7) or makemat(11)) as described in #2, or enforce the "squareness" rule described in #1, or the trade-off between them as described in #3.
I think the logic you want is already in the utility function n2mfrow(), which as its name suggests is for creating input to the mfrow graphical parameter and takes an integer input and returns the number of panels in rows and columns to split the display into:
> n2mfrow(11)
[1] 4 3
It favours tall layouts over wide ones, but that is easily fixed via rev() on the output or t() on a matrix produced from the results of n2mfrow().
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
m <- matrix(seq_len(prod(dims)), nrow = dims[1], ncol = dims[2])
m
}
Notice I have to special-case n = 3 as we are abusing a function intended for another use and a 3x1 layout on a plot makes more sense than a 2x2 with an empty space.
In use we have:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> makemat(11, wide = TRUE)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Edit:
The original function padded seq_len(n) with NA, but I realised the OP wanted to have a sequence from 1 to prod(nrows, ncols), which is what the version above does. The one below pads with NA.
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
s <- rep(NA, prod(dims))
ind <- seq_len(n)
s[ind] <- ind
m <- matrix(s, nrow = dims[1], ncol = dims[2])
m
}
I think this function implicitly satisfies your constraints. The parameter can range from 0 to Inf. The function always returns either a square matrix with sides of ceiling(sqrt(n)), or a (maybe) rectangular matrix with rows floor(sqrt(n)) and just enough columns to "fill it out". The parameter trades off the selection between the two: if it is less than 1, then the second, more rectangular matrices are preferred, and if greater than 1, the first, always square matrices are preferred. A param of 1 weights them equally.
makemat<-function(n,param=1,wide=TRUE){
if (n<1) stop('n must be positive')
s<-sqrt(n)
bottom<-n-(floor(s)^2)
top<-(ceiling(s)^2)-n
if((bottom*param)<top) {
rows<-floor(s)
cols<-rows + ceiling(bottom / rows)
} else {
cols<-rows<-ceiling(s)
}
if(!wide) {
hold<-rows
rows<-cols
cols<-hold
}
m<-seq.int(rows*cols)
dim(m)<-c(rows,cols)
m
}
Here is an example where the parameter is left at its default, so the two options are weighted equally:
lapply(c(2,3,9,11),makemat)
# [[1]]
# [,1] [,2]
# [1,] 1 2
#
# [[2]]
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
#
# [[4]]
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
Here is an example of using the param with 11, to get a 4x4 matrix.
makemat(11,3)
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
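The penalty idea from the question can also be written down directly. The following is only a sketch under stated assumptions: makemat_penalty and blank_weight are hypothetical names, and the score abs(rows - cols) + blank_weight * (number of blank cells) is one possible way to combine the two criteria, not something taken from the answers above. For each candidate short side it tries the fewest columns that hold n cells, plus the full square.

```r
# Hypothetical penalty-based variant: lower blank_weight favours squareness,
# higher blank_weight favours fewer blank cells.
makemat_penalty <- function(n, blank_weight = 1, wide = TRUE) {
  stopifnot(n >= 1)
  big <- ceiling(sqrt(n))
  short <- seq_len(big)
  rows <- c(short, big)               # candidate shapes: (r, ceiling(n / r)) ...
  cols <- c(ceiling(n / short), big)  # ... plus the big x big square
  penalty <- abs(rows - cols) + blank_weight * (rows * cols - n)
  best <- which.min(penalty)          # first minimum wins on ties
  dims <- sort(c(rows[best], cols[best]))  # short side first gives a wide matrix
  if (!wide) dims <- rev(dims)
  matrix(seq_len(prod(dims)), nrow = dims[1], ncol = dims[2])
}

dim(makemat_penalty(12))        # 3 4
dim(makemat_penalty(11))        # 3 4
dim(makemat_penalty(11, 0.1))   # 4 4: blanks are cheap, so the square wins
```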
What about something fairly simple and you can handle the exceptions and other requests in a wrapper?
library(taRifx)
neven <- 8
nodd <- 11
nsquareodd <- 9
nsquareeven <- 16
makemat <- function(n) {
s <- seq(n)
if( odd(n) ) {
s[ length(s)+1 ] <- NA
n <- n+1
}
sq <- sqrt( n )
dimx <- ceiling( sq )
dimy <- floor( sq )
if( dimx*dimy < length(s) ) dimy <- ceiling( sq )
l <- dimx*dimy
ldiff <- l - length(s)
stopifnot( ldiff >= 0 )
if( ldiff > 0 ) s[ seq( length(s) + 1, length(s) + ldiff ) ] <- NA
matrix( s, nrow = dimx, ncol = dimy )
}
> makemat(neven)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 NA
> makemat(nodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 NA
> makemat(nsquareodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 NA
[3,] 3 7 NA
[4,] 4 8 NA
> makemat(nsquareeven)
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
