What is the right way to multiply data frame by vector?

I'm trying to multiply a data frame df by a vector v, so that the product is a data frame, where the i-th row is given by df[i,]*v. I can do this, for example, by
df <- data.frame(A=1:5, B=2:6); v <- c(0,2)
as.data.frame(t(t(df) * v))
A B
1 0 4
2 0 6
3 0 8
4 0 10
5 0 12
I am sure there has to be a more R-style approach (and a very simple one!), but nothing comes to mind. I even tried something like
apply(df, MARGIN=1, function(x) x*v)
but still, non-readable constructions like as.data.frame(t(.)) are required.
How can I find an efficient and elegant workaround here?

This works too:
data.frame(mapply(`*`,df,v))
In that solution, you are taking advantage of the fact that a data.frame is a type of list, so mapply can iterate over the columns of df and the elements of v at the same time.
Unfortunately, mapply can only return a simple list or a matrix. If your data are huge, this is likely to be more efficient:
data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))
because mapply then returns a list, which is cheaper to convert to a data.frame than a matrix is.
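A minimal sketch of the same idea with base R's Map(), which is mapply() with SIMPLIFY = FALSE, so the list result described above is what you always get. Map() is a swapped-in equivalent here rather than this answer's exact code; df and v are the question's example data.

```r
df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)

# Map() pairs column i of df with v[i] and always returns a list,
# named after the columns of df (its first argument)
res <- as.data.frame(Map(`*`, df, v))
res
```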

If you're looking for speed and memory efficiency - data.table to the rescue:
library(data.table)
dt = data.table(df)
for (i in seq_along(dt))
dt[, (i) := dt[[i]] * v[i]]
eddi = function(dt) { for (i in seq_along(dt)) dt[, (i) := dt[[i]] * v[i]] }
arun = function(df) { df * matrix(v, ncol=ncol(df), nrow=nrow(df), byrow=TRUE) }
nograpes = function(df) { data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)) }
N = 1e6
dt = data.table(A = rnorm(N), B = rnorm(N))
v = c(0,2)
library(microbenchmark)
microbenchmark(eddi(copy(dt)), arun(copy(dt)), nograpes(copy(dt)), times = 10)
#Unit: milliseconds
# expr min lq mean median uq max neval
# eddi(copy(dt)) 23.01106 24.31192 26.47132 24.50675 28.87794 34.28403 10
# arun(copy(dt)) 337.79885 363.72081 450.93933 433.21176 516.56839 644.70103 10
# nograpes(copy(dt)) 19.44873 24.30791 36.53445 26.00760 38.09078 95.41124 10
As Arun points out in the comments, the set function from the data.table package can also modify a data.frame in place:
for (i in seq_along(df))
set(df, j = i, value = df[[i]] * v[i])
This of course also works for data.tables, and it can be significantly faster when the number of columns is large, since set() is a low-overhead, loopable version of :=.

A language that lets you combine vectors with matrices has to decide at some point whether matrices are ordered row-major or column-major. The reason you get
> df * v
A B
1 0 4
2 4 0
3 0 8
4 8 0
5 0 12
is that R recycles v down the columns first. The double-transpose trick subverts this. Sorry if this is just explaining what you already know, but I don't know another way of doing it, except explicitly expanding v into a matrix of the same size.
Or you could write a small function that wraps the not-very-R-style code into something more R-stylish.
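One way to follow that suggestion, as a minimal sketch: mult_cols is a hypothetical name, and the body uses Map() (base R's mapply() with SIMPLIFY = FALSE) to scale each column by its entry of v. The last line shows the other option mentioned above, expanding v explicitly so that R's column-major recycling lines up with the columns.

```r
# Hypothetical helper: multiply column j of a data frame by v[j]
mult_cols <- function(df, v) {
  stopifnot(length(v) == ncol(df))
  # df[] <- ... replaces the columns but keeps the data.frame class,
  # column names and row names
  df[] <- Map(`*`, df, v)
  df
}

df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)
r1 <- mult_cols(df, v)

# Explicit expansion: each v[j] is repeated nrow(df) times,
# so column-major recycling matches column j with v[j]
r2 <- df * rep(v, each = nrow(df))
```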

What's wrong with
t(apply(df, 1, function(x) x*v))
?

library(purrr)
map2_dfc(df, v, `*`)
Benchmark
library(data.table)
N = 1e6
dt = data.table(A = rnorm(N), B = rnorm(N))
v = c(0,2)
eddi = function(dt) { for (i in seq_along(dt)) dt[, (i) := dt[[i]] * v[i]]; dt }
arun = function(df) { df * matrix(v, ncol=ncol(df), nrow=nrow(df), byrow=TRUE) }
nograpes = function(df) { data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)) }
ryan = function(df) {map2_dfc(df, v, `*`) }
library(microbenchmark)
microbenchmark(
eddi(copy(dt))
, arun(copy(dt))
, nograpes(copy(dt))
, ryan(copy(dt))
, times = 100)
# Unit: milliseconds
# expr min lq mean median uq max neval
# eddi(copy(dt)) 8.367513 11.06719 24.26205 12.29132 19.35958 171.6212 100
# arun(copy(dt)) 94.031272 123.79999 186.42155 148.87042 251.56241 364.2193 100
# nograpes(copy(dt)) 7.910739 10.92815 27.68485 13.06058 21.39931 172.0798 100
# ryan(copy(dt)) 8.154395 11.02683 29.40024 13.73845 21.77236 181.0375 100

I think the fastest way (without testing data.table) is data.frame(t(t(df)*v)).
My tests:
library(microbenchmark)
testit <- function(nrow, ncol)
{
df <- as.data.frame(matrix(rnorm(nrow*ncol),nrow=nrow,ncol=ncol))
v <- runif(ncol)
r1 <- data.frame(t(t(df)*v))
r2 <- data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))
r3 <- df * rep(v, each=nrow(df))
stopifnot(identical(r1, r2) && identical(r1, r3))
microbenchmark(data.frame(t(t(df)*v)), data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)), df * rep(v, each=nrow(df)))
}
Result
> set.seed(1)
>
> testit(100,100)
Unit: milliseconds
expr min lq median uq max neval
data.frame(t(t(df) * v)) 2.297075 2.359541 2.455778 3.804836 33.05806 100
data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE)) 9.977436 10.401576 10.658964 11.762009 15.09721 100
df * rep(v, each = nrow(df)) 14.309822 14.956705 16.092469 16.516609 45.13450 100
> testit(1000,10)
Unit: microseconds
expr min lq median uq max neval
data.frame(t(t(df) * v)) 754.844 805.062 844.431 1850.363 27955.79 100
data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE)) 1457.895 1497.088 1567.604 2550.090 4732.03 100
df * rep(v, each = nrow(df)) 5383.288 5527.817 5875.143 6628.586 32392.81 100
> testit(10,1000)
Unit: milliseconds
expr min lq median uq max neval
data.frame(t(t(df) * v)) 17.07548 18.29418 19.91498 20.67944 57.62913 100
data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE)) 99.90103 104.36028 108.28147 114.82012 150.05907 100
df * rep(v, each = nrow(df)) 112.21719 118.74359 122.51308 128.82863 164.57431 100

Related

Efficient and fast application of a function to 3D arrays in R

I have a very large 3D array (say 100 x 100 x 10) that I would like to apply a function over for pairwise comparisons. I've tried a number of solutions, using data.table, mapply, etc. I'm maybe naively hoping for faster speedups, and am considering just doing this with C++/Rcpp. But before doing that, I thought I'd see if anyone is aware of a more elegant / faster solution to this problem? Many thanks!
Example code in R. For this smaller version of the problem, mapply() is a little faster than data.table:
m <- 20
n <- 10 # number of data points per row/col combination
R <- array(runif(n*m*m), dim=c(m,m,n)) # 3D array to apply function over
grid <- expand.grid(A = 1:m, B = 1:m, C = 1:m, D = 1:m) # array indices (used as args below)
# correlation between the series R[a, b, ] and R[c, d, ], e.g. R[1,2,] and R[1,10,]
ss2 <- function(a,b,c,d) {
rho = cor(R[a, b, ], R[c, d, ])
}
#solution with data.table
dt <- setDT(grid) # convert from df -> dt
sol_1 <- dt[, ss2(A, B,C,D), by = seq_len(nrow(dt))]
#solution with mapply
sol_2 <- mapply(ss2, grid$A, grid$B, grid$C, grid$D)
I tried this with mapply() and data.table(). I've also tried using a parallelized version of apply() (parApply, https://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/parallel.html).

UPDATE: cora from the Rfast package gives further performance improvements.
By reshaping the array, we can use cor directly for a ~2K times speedup:
library(data.table)
library(Rfast)
m <- 20
n <- 10 # number of data points per row/col combination
R <- array(runif(n*m*m), dim=c(m,m,n)) # 3D array to apply function over
grid <- expand.grid(A = 1:m, B = 1:m, C = 1:m, D = 1:m)
ss2 <- function(a,b,c,d) rho = cor(R[a, b, ], R[c, d, ])
dt <- setDT(grid)
microbenchmark::microbenchmark(
sol_1 = dt[, ss2(A, B, C, D), by = seq_len(nrow(dt))][[2]],
sol_2 = mapply(ss2, grid$A, grid$B, grid$C, grid$D),
sol_3 = c(cor(t(matrix(R, m*m, n)))),
sol_4 = c(cora(t(matrix(R, m*m, n)))),
check = "equal",
times = 10
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> sol_1 2101327.2 2135311.0 2186922.33 2178526.6 2247049.6 2301429.5 10
#> sol_2 2255828.9 2266427.5 2306180.23 2287911.0 2321609.6 2471711.7 10
#> sol_3 1203.8 1222.2 1244.75 1236.1 1243.9 1343.5 10
#> sol_4 922.6 945.8 952.68 951.9 955.8 988.8 10
Timing the full 100 x 100 x 10 array:
m <- 100L
n <- 10L
R <- array(runif(n*m*m), dim=c(m,m,n))
microbenchmark::microbenchmark(
sol_3 = c(cor(t(matrix(R, m*m, n)))),
sol_4 = c(cora(t(matrix(R, m*m, n)))),
check = "equal",
times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sol_3 1293.0739 1298.4997 1466.546 1503.453 1513.746 1902.802 10
#> sol_4 879.8659 892.2699 1058.064 1055.668 1143.767 1300.282 10
Note that filling by column then transposing tends to be slightly faster than filling by row in this case. Also note that ss2 and grid are no longer needed.
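To see why the reshaping is valid, here is a small sketch (m = 3 and the seed are arbitrary, chosen only to keep the per-combination reference cheap) that checks the reshaped cor() result against the original one-call-per-combination computation, element for element.

```r
set.seed(42)
m <- 3L   # small, so the per-combination reference stays cheap
n <- 10L
R <- array(runif(n * m * m), dim = c(m, m, n))

# Reference: one cor() call per (A, B, C, D) combination, A varying fastest
grid <- expand.grid(A = 1:m, B = 1:m, C = 1:m, D = 1:m)
ref <- mapply(function(a1, b1, a2, b2) cor(R[a1, b1, ], R[a2, b2, ]),
              grid$A, grid$B, grid$C, grid$D)

# Reshape: row a + (b - 1) * m of M holds the series R[a, b, ]
M <- matrix(R, m * m, n)

# cor() correlates columns, hence t(); flattening the result column-major
# puts the (A, B) index fastest and (C, D) slowest, matching expand.grid
fast <- c(cor(t(M)))
```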

How can I vectorize a for loop used for permutation?

I am using R for analysis and would like to perform a permutation test. For this I am using a for loop that is quite slow and I would like to make the code as fast as possible. I think that vectorization is key for this. However, after several days of trying I still haven't found a suitable solution how to re-code this. I would deeply appreciate your help!
I have a symmetrical matrix with pairwise ecological distances between populations ("dist.mat"). I want to randomly shuffle the rows and columns of this distance matrix to generate a permuted distance matrix ("dist.mat.mix"). Then I would like to save the upper triangular values of this permuted distance matrix (of the size of "nr.pairs"). This process should be repeated several times ("nr.runs"). The result should be a matrix ("result") containing the permuted upper triangular values of all runs, with dimensions nrow=nr.runs and ncol=nr.pairs. Below is example R code that does what I want using a for loop:
# example number of populations
nr.pops <- 20
# example distance matrix
dist.mat <- as.matrix(dist(matrix(rnorm(20), nr.pops, 5)))
# example number of runs
nr.runs <- 1000
# find number of unique pairwise distances in distance matrix
nr.pairs <- nr.pops*(nr.pops-1) / 2
# start loop
result <- matrix(NA, nr.runs, nr.pairs)
for (i in 1:nr.runs) {
mix <- sample(nr.pops, replace=FALSE)
dist.mat.mix <- dist.mat[mix, mix]
result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
}
# inspect result
result
I already made some clumsy vectorization attempts with the base::replicate function, but this doesn't speed things up. Actually it's a bit slower:
# my for loop approach
my.for.loop <- function() {
result <- matrix(NA, nr.runs, nr.pairs)
for (i in 1:nr.runs){
mix <- sample(nr.pops, replace=FALSE)
dist.mat.mix <- dist.mat[mix ,mix]
result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
}
result
}
# my replicate approach
my.replicate <- function() {
results <- t(replicate(nr.runs, {
mix <- sample(nr.pops, replace=FALSE)
dist.mat.mix <- dist.mat[mix, mix]
dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
}))
}
# compare speed
require(microbenchmark)
microbenchmark(my.for.loop(), my.replicate(), times=100L)
# Unit: milliseconds
# expr min lq mean median uq max neval
# my.for.loop() 23.1792 24.4759 27.1274 25.5134 29.0666 61.5616 100
# my.replicate() 25.5293 27.4649 30.3495 30.2533 31.4267 68.6930 100
I would deeply appreciate your support in case you know how to speed up my for loop using a neat vectorized solution. Is this even possible?

Slightly faster:
minem <- function() {
result <- matrix(NA, nr.runs, nr.pairs)
ut <- upper.tri(matrix(NA, nr.pops, nr.pops)) # upper triangular index matrix, created once outside the loop
for (i in 1:nr.runs) {
mix <- sample.int(nr.pops) # slightly faster sampling function
result[i, ] <- dist.mat[mix, mix][ut]
}
result
}
microbenchmark(my.for.loop(), my.replicate(), minem(), times = 100L)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# my.for.loop() 75.062 78.222 96.25288 80.1975 104.6915 249.284 100 a
# my.replicate() 118.519 122.667 152.25681 126.0250 165.1355 495.407 100 a
# minem() 45.432 48.000 104.23702 49.5800 52.9380 4848.986 100 a
Update:
We can compute the required matrix indices a little differently, so that the elements for all runs are extracted in a single subsetting operation:
minem4 <- function() {
n <- dim(dist.mat)[1]
ut <- upper.tri(matrix(NA, n, n))
im <- matrix(1:n, n, n)
p1 <- im[ut]
p2 <- t(im)[ut]
dm <- as.vector(dist.mat) # plain vector: dm[p] is then linear indexing even if p has 2 columns
si <- replicate(nr.runs, sample.int(nr.pops))
p <- (si[p1, ] - 1L) * n + si[p2, ]
result2 <- matrix(dm[p], nr.runs, nr.pairs, byrow = TRUE)
result2
}
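The linear-index trick in minem4 can be checked in isolation. This is a small sketch for a single permutation (n = 5 and the seed are arbitrary): cell (i, j) of dist.mat[mix, mix] is dist.mat[mix[i], mix[j]], whose column-major linear index is (mix[j] - 1) * n + mix[i]. minem4 uses the mirrored index (mix[i] - 1) * n + mix[j], which picks the same value only because dist.mat is symmetric.

```r
set.seed(1)
n <- 5L
dist.mat <- as.matrix(dist(matrix(rnorm(n * 2), n, 2)))

ut <- upper.tri(dist.mat)       # upper-triangle mask, computed once
im <- matrix(seq_len(n), n, n)  # im[i, j] == i
p1 <- im[ut]                    # row index i of each upper-triangle cell
p2 <- t(im)[ut]                 # column index j of each upper-triangle cell

mix <- sample.int(n)

# Reference: permute rows and columns, then take the upper triangle
by_subset <- dist.mat[mix, mix][ut]

# Same values read directly from dist.mat with one linear index per cell
by_index <- as.vector(dist.mat)[(mix[p2] - 1L) * n + mix[p1]]

# minem4's mirrored index; equal here because dist.mat is symmetric
by_mirror <- as.vector(dist.mat)[(mix[p1] - 1L) * n + mix[p2]]
```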
microbenchmark(my.for.loop(), minem(), minem4(), times = 100L)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# my.for.loop() 13.797526 14.977970 19.14794 17.071401 23.161867 29.98952 100 b
# minem() 8.366614 9.080490 11.82558 9.701725 15.748537 24.44325 100 a
# minem4() 7.716343 8.169477 11.91422 8.723947 9.997626 208.90895 100 a
Update2:
We can get some additional speedup by using the dqsample.int function from the dqrng package:
minem5 <- function() {
n <- dim(dist.mat)[1]
ut <- upper.tri(matrix(NA, n, n))
im <- matrix(1:n, n, n)
p1 <- im[ut]
p2 <- t(im)[ut]
dm <- as.vector(dist.mat) # plain vector: dm[p] is then linear indexing even if p has 2 columns
require(dqrng)
si <- replicate(nr.runs, dqsample.int(nr.pops))
p <- (si[p1, ] - 1L) * n + si[p2, ]
result2 <- matrix(dm[p], nr.runs, nr.pairs, byrow = TRUE)
result2
}
microbenchmark(my.for.loop(), minem(), minem4(), minem5(), times = 100L)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# my.for.loop() 13.648983 14.672587 17.713467 15.265771 16.967894 36.18290 100 d
# minem() 8.282466 8.773725 10.679960 9.279602 10.335206 27.03683 100 c
# minem4() 7.719503 8.208984 9.039870 8.493231 9.097873 25.32463 100 b
# minem5() 6.134911 6.379850 7.226348 6.733035 7.195849 19.02458 100 a

Compare Matrices in R efficiently

I have an array a containing several matrices. Now I need to efficiently check how many different matrices there are and at which indices (in ascending order) they occur in the array. My approach is the following: paste the columns of each matrix into a character vector and look at the frequency table, like this:
n <- 10 #observations
a <- array(round(rnorm(2*2*n),1),
c(2,2,n))
paste_a <- apply(a, c(3), paste, collapse=" ") #paste by column
names(paste_a) <- 1:n
freq <- as.numeric( table(paste_a) ) # frequencies of different matrices (in ascending order)
indizes <- as.numeric(names(sort(paste_a[!duplicated(paste_a)])))
nr <- length(freq) #number of different matrices
However, as you increase n to large numbers, this gets very inefficient (it's mainly paste() that's getting slower and slower). Does anyone have a better solution?
Here is a "real" dataset with 100 observations where some matrices are actual duplicates (as opposed to my example above): https://pastebin.com/aLKaSQyF
Thank you very much.

Since your actual data is made up of the integers 0, 1, 2 and 3, why not take advantage of base 4? Integers are much faster to compare than entire matrix objects. (In all the code below, a is the real data set from the link.)
Base4Approach <- function() {
toBase4 <- sapply(1:dim(a)[3], function(x) {
v <- as.vector(a[,,x])
pows <- which(v > 0)
coefs <- v[pows]
sum(coefs*(4^pows))
})
myDupes <- duplicated(toBase4)
# logical indexing keeps every matrix when there are no duplicates (a[,,-integer(0)] would keep none)
a[,,!myDupes]
}
And since the question is about efficiency, let's benchmark:
MartinApproach <- function() {
### commented this out for comparison reasons
# dimnames(a) <- list(1:dim(a)[1], 1:dim(a)[2], 1:dim(a)[3])
a <- a[,,!duplicated(a, MARGIN = 3)]
nr <- dim(a)[3]
a
}
identical(MartinApproach(), Base4Approach())
[1] TRUE
microbenchmark(Base4Approach(), MartinApproach())
Unit: microseconds
expr min lq mean median uq max neval
Base4Approach() 291.658 303.525 339.2712 325.4475 352.981 636.361 100
MartinApproach() 983.855 1000.958 1160.4955 1071.9545 1187.321 3545.495 100
The approach by @d.b. doesn't really do the same thing as the previous two approaches (it only identifies duplicates and doesn't remove them).
DBApproach <- function() {
a[, , 9] = a[, , 1]
#Convert to list
mylist = lapply(1:dim(a)[3], function(i) a[1:dim(a)[1], 1:dim(a)[2], i])
temp = sapply(mylist, function(x) sapply(mylist, function(y) identical(x, y)))
temp2 = unique(apply(temp, 1, function(x) sort(which(x))))
#The indices in 'a' where the matrices are same
temp2[lengths(temp2) > 1]
}
However, Base4Approach still dominates:
microbenchmark(Base4Approach(), MartinApproach(), DBApproach())
Unit: microseconds
expr min lq mean median uq max neval
Base4Approach() 298.764 324.0555 348.8534 338.899 356.0985 476.475 100
MartinApproach() 1012.601 1087.9450 1204.1150 1110.662 1162.9985 3224.299 100
DBApproach() 9312.902 10339.4075 11616.1644 11438.967 12413.8915 17065.494 100
Update courtesy of @alexis_laz
As mentioned in the comments by @alexis_laz, we can do much better.
AlexisBase4Approach <- function() {
toBase4 <- colSums(a * (4 ^ (0:(prod(dim(a)[1:2]) - 1))), dims = 2)
myDupes <- duplicated(toBase4)
# logical indexing keeps every matrix when there are no duplicates (a[,,-integer(0)] would keep none)
a[,,!myDupes]
}
microbenchmark(Base4Approach(), MartinApproach(), DBApproach(), AlexisBase4Approach(), unit = "relative")
Unit: relative
expr min lq mean median uq max neval
Base4Approach() 11.67992 10.55563 8.177654 8.537209 7.128652 5.288112 100
MartinApproach() 39.60408 34.60546 27.930725 27.870019 23.836163 22.488989 100
DBApproach() 378.91510 342.85570 262.396843 279.190793 231.647905 108.841199 100
AlexisBase4Approach() 1.00000 1.00000 1.000000 1.000000 1.000000 1.000000 100
## Still gives accurate results
identical(MartinApproach(), AlexisBase4Approach())
[1] TRUE
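For the question as asked (how many distinct matrices, how often each occurs, and where each first appears), the base-4 keys can feed duplicated(), match() and tabulate() directly. A minimal sketch, assuming, as in the real data set, that every entry is an integer in 0..3 and that each matrix has at most 26 cells, so the keys stay unique and exact in double precision; the random array below is made-up test data. Note that freq here is ordered by first occurrence, not by the sorted pasted strings that table() produces in the question.

```r
set.seed(1)
n <- 12L
# Made-up test data: 2 x 2 matrices with integer entries in 0..3
a <- array(sample(0:3, 2 * 2 * n, replace = TRUE), c(2, 2, n))
a[, , 5] <- a[, , 2]   # force slices 2, 5 and 9 to be equal
a[, , 9] <- a[, , 2]

k <- prod(dim(a)[1:2])                        # cells per matrix
keys <- colSums(a * 4^(0:(k - 1)), dims = 2)  # one base-4 number per matrix

first <- which(!duplicated(keys))             # first index of each distinct matrix, ascending
nr <- length(first)                           # number of distinct matrices
freq <- tabulate(match(keys, keys[first]), nbins = nr)  # occurrences, in the order of `first`
```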

My first attempt was actually really slow, so here is a slightly changed version of yours:
dimnames(a) <- list(1:dim(a)[1], 1:dim(a)[2], 1:dim(a)[3])
a <- a[,,!duplicated(a, MARGIN = 3)]
nr <- dim(a)[3] #number of different matrices
idx <- dimnames(a)[[3]] # indices of left over matrices

I don't know if this is exactly what you want, but here is a way to extract the indices where the matrices are the same. More processing may be necessary to get what you want.
#DATA
n <- 10
a <- array(round(rnorm(2*2*n),1), c(2,2,n))
a[, , 9] = a[, , 1]
temp = unique(apply(X = sapply(1:dim(a)[3], function(i)
sapply(1:dim(a)[3], function(j) identical(a[, , i], a[, , j]))),
MARGIN = 1,
FUN = function(x) sort(which(x))))
temp[lengths(temp) > 1]
#[[1]]
#[1] 1 9

Faster way to unlist a list of large matrices?

I have a list of large matrices. All these matrices have the same number of rows and I want to "unlist" them and bind all their columns together. Below is a piece of code that I wrote, but I am not sure if this is the best I can achieve in terms of computational efficiency.
# simulate
n <- 10
nr <- 24
nc <- 8000
test <- list()
set.seed(1234)
for (i in 1:n){
test[[i]] <- matrix(rnorm(nr*nc),nr,nc)
}
> system.time( res <- matrix( as.numeric( unlist(test) ) ,nr,nc*n) )
user system elapsed
0.114 0.006 0.120

To work on a list and call a function on all its objects, do.call is my usual first idea, along with cbind here to bind all objects by column.
For n=100 (with the other answers included for the sake of completeness):
n <- 10
nr <- 24
nc <- 8000
test <- list()
set.seed(1234)
for (i in 1:n){
test[[i]] <- matrix(rnorm(nr*nc),nr,nc)
}
require(data.table)
ori <- function() { matrix( as.numeric( unlist(test) ) ,nr,nc*n) }
Tensibai <- function() { do.call(cbind,test) }
BrodieG <- function() { `attr<-`(do.call(c, test), "dim", c(nr, nc * n)) }
nicola <- function() { setattr(unlist(test),"dim",c(nr,nc*n)) }
library(microbenchmark)
microbenchmark(r1 <- ori(),
r2 <- Tensibai(),
r3 <- BrodieG(),
r4 <- nicola(), times=10)
Results:
Unit: milliseconds
expr min lq mean median uq max neval cld
r1 <- ori() 23.834673 24.287391 39.49451 27.066844 29.737964 93.74249 10 a
r2 <- Tensibai() 17.416232 17.706165 18.18665 17.873083 18.192238 21.29512 10 a
r3 <- BrodieG() 6.009344 6.145045 21.63073 8.690869 10.323845 77.95325 10 a
r4 <- nicola() 5.912984 6.106273 13.52697 6.273904 6.678156 75.40914 10 a
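As for why (asked in the comments), @nicola gave the answer: there is less copying than in the original method.
All methods give the same result (identical() compares only two objects, so each result is checked against r1):
> all(sapply(list(r2, r3, r4), identical, r1))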
[1] TRUE

It seems that do.call beats the other method because of a copy made during the matrix call. Interestingly, you can avoid that copy with the data.table::setattr function, which sets attributes by reference and so avoids any copy. I also omitted the as.numeric part, since it is not necessary (unlist(test) is already numeric). So:
require(microbenchmark)
require(data.table)
f1<-function() setattr(unlist(test),"dim",c(nr,nc*n))
f2<-function() do.call(cbind,test)
microbenchmark(res <-f1(),res2 <- f2(),times=10)
#Unit: milliseconds
# expr min lq mean median uq max neval
# res <- f1() 4.088455 4.183504 7.540913 4.44109 4.988605 35.05378 10
#res2 <- f2() 18.325302 18.379328 18.776834 18.66857 19.100681 19.47415 10
identical(res,res2)
#[1] TRUE

I think I have a better one. We can avoid some of the overhead from cbind since we know these all have the same number of rows and columns. Instead, we use c knowing that the underlying vector nature of the matrices will allow us to re-wrap them into the correct dimensions:
microbenchmark(
x <- `attr<-`(do.call(c, test), "dim", c(nr, nc * n)),
y <- do.call(cbind, test)
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# x <- `attr<-`(do.call(c, test), "dim", c(nr, nc * n)) 4.435943 4.699006 12.76214 5.209938 9.095001 379.77856 100 a
# y <- do.call(cbind, test) 19.339477 19.567063 21.64878 20.000279 24.210848 26.02499 100 b
identical(x, y)
# [1] TRUE
If you have a varying number of columns, you can probably still do this with some care in computing the total number of columns.
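A minimal sketch of that varying-column case: the only extra step is summing the column counts before setting the dim attribute. The list below is hypothetical test data; all matrices must still share the same number of rows.

```r
set.seed(1234)
nr <- 24L
# Hypothetical input: same number of rows, different numbers of columns
test <- lapply(c(3L, 5L, 2L), function(k) matrix(rnorm(nr * k), nr, k))

# Total column count is the only extra bookkeeping needed
total_cols <- sum(vapply(test, ncol, integer(1)))

# c() concatenates the column-major data of each matrix in order,
# so setting dim gives the same matrix as cbind()
res <- `attr<-`(do.call(c, test), "dim", c(nr, total_cols))
```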

partial product of two matrices

I'm trying to find a vectorised trick to calculate the products between row i and col i of two matrices, without wasting resources on the other products (row i and col j, i!=j).
A <- matrix(rnorm(4*5), nrow=4)
B <- matrix(rnorm(5*4), ncol=4)
diag(A %*% B)
Is there a name for this product, a base R function, or a reshaping strategy that avoids a for loop?
for (ii in seq.int(nrow(A)))
print(crossprod(A[ii,], B[,ii]))

rowSums(A * t(B)) seems to be quite fast:
A <- matrix(rnorm(400*500), nrow=400)
B <- matrix(rnorm(500*400), ncol=400)
bF <- function() diag(A %*% B)
jF <- function() rowSums(A * t(B))
vF <- function() mapply(crossprod, as.data.frame(t(A)), as.data.frame(B))
lF <- function() {
vec <- numeric(nrow(A))
for (ii in seq.int(nrow(A)))
vec[ii] <- crossprod(A[ii,], B[,ii])
vec
}
library(microbenchmark)
microbenchmark(bF(), jF(), vF(), lF(), times = 100)
# Unit: milliseconds
# expr min lq median uq max neval
# bF() 137.828993 183.320782 185.823658 200.747130 207.67997 100
# jF() 4.434627 5.300882 5.341477 5.475393 46.96347 100
# vF() 39.110948 51.071936 54.147338 55.127911 102.17793 100
# lF() 14.029454 18.667055 18.931154 22.166137 65.40562 100
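Why rowSums(A * t(B)) works: t(B) has the same shape as A, and (A * t(B))[i, k] = A[i, k] * B[k, i], so row i sums to sum over k of A[i, k] * B[k, i], which is exactly the i-th diagonal element of A %*% B. A small sketch checking this on the question's dimensions (the seed is arbitrary); colSums(t(A) * B) is the equivalent column-wise form.

```r
set.seed(1)
A <- matrix(rnorm(4 * 5), nrow = 4)   # 4 x 5
B <- matrix(rnorm(5 * 4), ncol = 4)   # 5 x 4

# Element-wise product with the transpose: entry [i, k] is A[i, k] * B[k, i]
d_rows <- rowSums(A * t(B))
# Equivalent column-wise form: entry [k, i] of t(A) * B is A[i, k] * B[k, i]
d_cols <- colSums(t(A) * B)
# Reference that computes the full 4 x 4 product
d_ref <- diag(A %*% B)
```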

How about this?
mapply(crossprod, as.data.frame(t(A)), as.data.frame(B))
