Parallel allocation of a matrix [duplicate]

I want to do a simple column (Nx1) times row (1xM) multiplication, resulting in an (NxM) matrix.
Here is the code, where I create a row from a sequence and a column by transposing a similar sequence:
row1 <- seq(1:6)
col1 <- t(seq(1:6))
col1 * row1
The output, which shows that R is doing element-wise multiplication rather than producing an NxM matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 9 16 25 36
Expected output: an NxM matrix (here 6x6).
OS: Debian 8.5
Linux kernel: 4.6 backports
Hardware: Asus Zenbook UX303UA

In this case, outer is the more natural choice:
outer(1:6, 1:6)
In general, for two numeric vectors x and y, the rank-1 matrix can be computed as
outer(x, y)
If you want to resort to real matrix multiplication routines, use tcrossprod:
tcrossprod(x, y)
If either x or y is already a matrix (i.e., has a dim attribute), use as.numeric to cast it to a vector first.
Using the general matrix multiplication operator "%*%" is not recommended for this. But if you do, make sure the dimensions are conformable: x must be a one-column matrix and y a one-row matrix, so that x %*% y works.
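To make the equivalence concrete, here is a minimal sketch (not from the original answer) checking that all three routes give the same 6 x 6 matrix:
# Minimal sketch: three equivalent ways to form the rank-1 (outer) product
x <- as.numeric(1:6)
y <- as.numeric(1:6)
m_outer <- outer(x, y)                                   # N x M via outer
m_tcp   <- tcrossprod(x, y)                              # N x M via tcrossprod
m_mm    <- matrix(x, ncol = 1) %*% matrix(y, nrow = 1)   # explicit (N x 1) %*% (1 x M)
dim(m_outer)                 # [1] 6 6
all.equal(m_outer, m_tcp)    # TRUE
all.equal(m_outer, m_mm)     # TRUE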
Can you say anything about efficiency?
The rank-1 operation is known to be memory-bound, so call gc() after every replicate to make R release memory from the heap (otherwise your system may stall):
x <- runif(500)
y <- runif(500)
xx <- matrix(x, ncol = 1)
yy <- matrix(y, nrow = 1)
system.time(replicate(200, {outer(x,y); gc();}))
# user system elapsed
# 4.484 0.324 4.837
system.time(replicate(200, {tcrossprod(x,y); gc();}))
# user system elapsed
# 4.320 0.324 4.653
system.time(replicate(200, {xx %*% yy; gc();}))
# user system elapsed
# 4.372 0.324 4.708
In terms of performance, they are all very alike.
Follow-up
When I came back I saw another answer with a different benchmark. The thing is, the result depends on the problem size: with a small example you cannot eliminate the interpretation / function-call overhead of the three functions. If you do
library(microbenchmark)
x <- y <- runif(500)
microbenchmark(tcrossprod(x, y), x %*% t(y), outer(x, y), times = 200)
you will see roughly identical performance again.
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# tcrossprod(x, y) 2.09644 2.42466 3.402483 2.60424 3.94238 35.52176 200 a
# x %*% t(y) 2.22520 2.55678 3.707261 2.66722 4.05046 37.11660 200 a
# outer(x, y) 2.08496 2.55424 3.695660 2.69512 4.08938 35.41044 200 a

Here's a comparison of the execution speed of the three methods when the vectors are of length 100. The fastest is tcrossprod, with x%*%t(y) taking about 16% longer and outer(x,y) about 43% longer in median time (per the table below).
In the table, neval is the number of times the function was evaluated to get the benchmark scores.
> x <- runif(100,0,100)
> y <- runif(100,0,100)
> microbenchmark(tcrossprod(x,y), x%*%t(y), outer(x,y), times=5000)
Unit: microseconds
expr min lq mean median uq max neval
tcrossprod(x, y) 11.404 16.6140 50.42392 17.7300 18.7555 5590.103 5000
x %*% t(y) 13.878 19.4315 48.80170 20.5405 21.7310 4459.517 5000
outer(x, y) 19.238 24.0810 72.05250 25.3595 26.8920 89861.855 5000
To plot the benchmark distributions (the graph itself is not reproduced here), use:
library("ggplot2")
bench <- microbenchmark(tcrossprod(x,y), x%*%t(y), outer(x,y), times=5000)
autoplot(bench)
Edit: The performance depends on the size of x and y, and of course the machine running the code. I originally did the benchmark with vectors of length 100 because that's what Masi asked about. However, it appears the three methods have very similar performance for larger vectors. For vectors of length 1000, the median times of the three methods are within 5% of each other on my machine.
> x <- runif(1000)
> y <- runif(1000)
> microbenchmark(tcrossprod(x,y),x%*%t(y),outer(x,y),times=2000)
Unit: milliseconds
expr min lq mean median uq max neval
tcrossprod(x, y) 1.870282 2.030541 4.721175 2.916133 4.482346 75.77459 2000
x %*% t(y) 1.861947 2.067908 4.921061 3.067670 4.527197 105.60500 2000
outer(x, y) 1.886348 2.078958 5.114886 3.033927 4.556067 93.93450 2000

An easy way to look at this is to convert your vectors to matrices
row1.mat = matrix(row1)
col1.mat = matrix(col1)
and then use dim to see the dimension of the matrices:
dim(row1.mat)
dim(col1.mat)
For the product to work here you need a 6x1 matrix multiplied by a 1x6 matrix, so you need to transpose col1.mat using t(col1.mat).
And, as you may know, the matrix product operator is %*%:
row1.mat %*% t(col1.mat)
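A quick sketch of the dimensions involved, using the question's row1 and col1:
row1 <- seq(1:6)
col1 <- t(seq(1:6))
row1.mat <- matrix(row1)        # 6 x 1
col1.mat <- matrix(col1)        # 6 x 1 (matrix() flattens the 1 x 6 input into a column)
dim(row1.mat)                   # [1] 6 1
dim(t(col1.mat))                # [1] 1 6
dim(row1.mat %*% t(col1.mat))   # [1] 6 6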
Comparison of this method to others
library("microbenchmark")
x <- runif(1000)
y <- runif(1000)
xx = matrix(x)
yy = matrix(y)
microbenchmark(tcrossprod(x,y),x%*%t(y),outer(x,y), xx %*% t(yy), times=2000)
Unit: milliseconds
expr min lq mean median uq max neval
tcrossprod(x, y) 2.829099 3.243785 6.015880 4.801640 5.040636 77.87932 2000
x %*% t(y) 2.847175 3.251414 5.942841 4.810261 5.049474 86.53374 2000
outer(x, y) 2.886059 3.277811 5.983455 4.788054 5.074997 96.12442 2000
xx %*% t(yy) 2.868185 3.255833 6.126183 4.699884 5.056234 87.80024 2000

Related

Fast Simplex in R

I am not sure whether this is the right place to ask, but I need to efficiently implement a very large linear programming problem in R (and there is no way around R here). I have tried out some packages like lpSolve but the results seem unsatisfactory. I would be glad for any advice on packages, or alternatively a better place to ask this question.
Here is the problem:
library(MASS)    # for mvrnorm
library(lpSolve) # for lp
N <- 10^4 # number of products
K <- 10^4 # number of scenarios
### Get expectation and covariance matrix
mu <- rep(100,N)
A <- matrix(rnorm(N^2,0,1), nrow=N, ncol=N)
Sigma <- t(A) %*% A
R <- mvrnorm(K, mu, Sigma) # create scenarios
means <- apply(R, 2, mean) # computes mean for each product
### The LP
# There are some additional constraints to pure expectation maximization
# This leads to additional variables
c <- c(-means,0,rep(0,K))
A1 <- c(rep(1,N),0,rep(0,K))
A2 <- c(rep(0,N),1,-rep((0.05*K)^(-1),K))
A3 <- cbind(R,rep(-1,K),diag(1,K))
A <- rbind(A1,A2,A3)
b <- c(1,98,rep(0,K))
system.time(lp <- lp(direction = "min", objective.in = c, const.mat = A,
                     const.dir = c("=", ">=", rep(">=", K)), const.rhs = b))
You can try the Rglpk package. In the kantorovich package, there are two functions which solve the same linear programming problem: one using Rglpk, the other one using lpSolve. Benchmarks show that the first one is faster when the data is large.
library(kantorovich)
library(microbenchmark)
mu <- rpois(50, 20)
mu <- mu/sum(mu)
nu <- rpois(50, 20)
nu <- nu/sum(nu)
microbenchmark(
  Rglpk = kantorovich_glpk(mu, nu),
  lpSolve = kantorovich_lp(mu, nu),
  times = 3
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# Rglpk 402.0955 552.3754 605.159 702.6553 706.6908 710.7264 3 a
# lpSolve 1092.7131 1184.7517 1327.208 1276.7904 1444.4552 1612.1200 3 b
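For reference, here is a minimal sketch of calling Rglpk directly; the tiny LP below is made up for illustration and is not the question's problem:
library(Rglpk)
# maximize 2*x1 + 3*x2  subject to  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x1, x2 >= 0
obj <- c(2, 3)
mat <- rbind(c(1, 1),
             c(1, 3))
dir <- c("<=", "<=")
rhs <- c(4, 6)
sol <- Rglpk_solve_LP(obj, mat, dir, rhs, max = TRUE)
sol$optimum   # optimal objective value
sol$solution  # optimal values of x1, x2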
Edit: better with 'CVXR'
I tried CVXR, and with the ECOS solver it is faster:
Unit: milliseconds
expr min lq mean median uq max neval cld
Rglpk 364.2730 378.3198 383.0741 392.3666 392.4746 392.5827 3 b
lpSolve 1012.4465 1087.0728 1128.5777 1161.6990 1186.6433 1211.5875 3 c
CVXR 370.5944 386.2229 392.8022 401.8515 403.9061 405.9607 3 b
CVXR_GLPK 483.2246 488.4495 508.6683 493.6744 521.3902 549.1060 3 b
CVXR_ECOS 219.0252 222.9561 224.3361 226.8871 226.9915 227.0960 3 a
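The answer above doesn't show its CVXR code, so here is a hedged sketch of how the same made-up toy LP could be solved with CVXR and the ECOS solver:
library(CVXR)
obj <- c(2, 3)
mat <- rbind(c(1, 1),
             c(1, 3))
rhs <- c(4, 6)
x <- Variable(2)
prob <- Problem(Maximize(t(obj) %*% x),
                list(mat %*% x <= rhs, x >= 0))
result <- solve(prob, solver = "ECOS")
result$value        # optimal objective value
result$getValue(x)  # optimal decision vector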

Compare Matrices in R efficiently

I have an array a with some matrices in it. Now I need to efficiently check how many different matrices I have and what indices (in ascending order) they have in the array. My approach is the following: paste each matrix into a character string and look at the frequency table, like this:
n <- 10 #observations
a <- array(round(rnorm(2*2*n), 1), c(2, 2, n))
paste_a <- apply(a, 3, paste, collapse = " ") # paste each matrix (column-major) into one string
names(paste_a) <- 1:n
freq <- as.numeric( table(paste_a) ) # frequencies of different matrices (in ascending order)
indizes <- as.numeric(names(sort(paste_a[!duplicated(paste_a)])))
nr <- length(freq) #number of different matrices
However, as you increase n to large numbers, this gets very inefficient (it's mainly paste() that's getting slower and slower). Does anyone have a better solution?
Here is a "real" dataset with 100 observations where some matrices are actual duplicates (as opposed to my example above): https://pastebin.com/aLKaSQyF
Thank you very much.
Since your actual data is made up of the integers 0, 1, 2, 3, why not take advantage of base 4? Integers are much faster to compare than entire matrix objects. (All occurrences of a below refer to the real data set from the link.)
Base4Approach <- function() {
  toBase4 <- sapply(1:dim(a)[3], function(x) {
    v <- as.vector(a[,,x])
    pows <- which(v > 0)
    coefs <- v[pows]
    sum(coefs*(4^pows))
  })
  myDupes <- which(duplicated(toBase4))
  a[,,-(myDupes)]
}
And since the question is about efficiency, let's benchmark:
MartinApproach <- function() {
  ### commented this out for comparison reasons
  # dimnames(a) <- list(1:dim(a)[1], 1:dim(a)[2], 1:dim(a)[3])
  a <- a[,,!duplicated(a, MARGIN = 3)]
  nr <- dim(a)[3]
  a
}
identical(MartinApproach(), Base4Approach())
[1] TRUE
microbenchmark(Base4Approach(), MartinApproach())
Unit: microseconds
expr min lq mean median uq max neval
Base4Approach() 291.658 303.525 339.2712 325.4475 352.981 636.361 100
MartinApproach() 983.855 1000.958 1160.4955 1071.9545 1187.321 3545.495 100
The approach by #d.b. doesn't really do the same thing as the previous two approaches (it simply identifies and doesn't remove duplicates).
DBApproach <- function() {
  a[, , 9] = a[, , 1]
  # Convert to list
  mylist = lapply(1:dim(a)[3], function(i) a[1:dim(a)[1], 1:dim(a)[2], i])
  temp = sapply(mylist, function(x) sapply(mylist, function(y) identical(x, y)))
  temp2 = unique(apply(temp, 1, function(x) sort(which(x))))
  # The indices in 'a' where the matrices are the same
  temp2[lengths(temp2) > 1]
}
However, Base4Approach still dominates:
microbenchmark(Base4Approach(), MartinApproach(), DBApproach())
Unit: microseconds
expr min lq mean median uq max neval
Base4Approach() 298.764 324.0555 348.8534 338.899 356.0985 476.475 100
MartinApproach() 1012.601 1087.9450 1204.1150 1110.662 1162.9985 3224.299 100
DBApproach() 9312.902 10339.4075 11616.1644 11438.967 12413.8915 17065.494 100
Update courtesy of #alexis_laz
As mentioned in the comments by #alexis_laz, we can do much better.
AlexisBase4Approach <- function() {
  toBase4 <- colSums(a * (4 ^ (0:(prod(dim(a)[1:2]) - 1))), dims = 2)
  myDupes <- which(duplicated(toBase4))
  a[,,-(myDupes)]
}
microbenchmark(Base4Approach(), MartinApproach(), DBApproach(), AlexisBase4Approach(), unit = "relative")
Unit: relative
expr min lq mean median uq max neval
Base4Approach() 11.67992 10.55563 8.177654 8.537209 7.128652 5.288112 100
MartinApproach() 39.60408 34.60546 27.930725 27.870019 23.836163 22.488989 100
DBApproach() 378.91510 342.85570 262.396843 279.190793 231.647905 108.841199 100
AlexisBase4Approach() 1.00000 1.00000 1.000000 1.000000 1.000000 1.000000 100
## Still gives accurate results
identical(MartinApproach(), AlexisBase4Approach())
[1] TRUE
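To see what the colSums line is doing, here is a small illustration (not from the original answer) on a single 2x2 slice, reading its entries as base-4 digits in column-major order:
m <- matrix(c(1, 0, 3, 2), 2, 2)   # column-major entries: 1, 0, 3, 2
sum(m * 4^(0:3))                   # 1*4^0 + 0*4^1 + 3*4^2 + 2*4^3 = 177
# colSums(a * 4^(0:3), dims = 2) applies this same encoding to every 2 x 2 slice of a at once,
# because the length-4 vector recycles exactly once per slice.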
My first attempt was actually really slow, so here is a slightly changed version of yours:
dimnames(a) <- list(1:dim(a)[1], 1:dim(a)[2], 1:dim(a)[3])
a <- a[,,!duplicated(a, MARGIN = 3)]
nr <- dim(a)[3] #number of different matrices
idx <- dimnames(a)[[3]] # indices of left over matrices
I don't know if this is exactly what you want, but here is a way to extract the indices where the matrices are the same. More processing may be necessary to get exactly what you need.
#DATA
n <- 10
a <- array(round(rnorm(2*2*n),1), c(2,2,n))
a[, , 9] = a[, , 1]
temp = unique(apply(X = sapply(1:dim(a)[3], function(i)
                      sapply(1:dim(a)[3], function(j) identical(a[, , i], a[, , j]))),
                    MARGIN = 1,
                    FUN = function(x) sort(which(x))))
temp[lengths(temp) > 1]
#[[1]]
#[1] 1 9

How to do R multiplication with Nx1 1xM for Matrix NxM?


Faster way to unlist a list of large matrices?

I have a list of large matrices. All these matrices have the same number of rows and I want to "unlist" them and bind all their columns together. Below is a piece of code that I wrote, but I am not sure if this is the best I can achieve in terms of computational efficiency.
# simulate
n <- 10
nr <- 24
nc <- 8000
test <- list()
set.seed(1234)
for (i in 1:n){
  test[[i]] <- matrix(rnorm(nr*nc), nr, nc)
}
> system.time( res <- matrix( as.numeric( unlist(test) ) ,nr,nc*n) )
user system elapsed
0.114 0.006 0.120
To call a function on all the objects in a list, do.call is my usual first idea, here with cbind to bind all the objects by column.
For the setup below (with the other answers included for the sake of completeness):
n <- 10
nr <- 24
nc <- 8000
test <- list()
set.seed(1234)
for (i in 1:n){
  test[[i]] <- matrix(rnorm(nr*nc), nr, nc)
}
require(data.table)
ori <- function() { matrix( as.numeric( unlist(test) ) ,nr,nc*n) }
Tensibai <- function() { do.call(cbind,test) }
BrodieG <- function() { `attr<-`(do.call(c, test), "dim", c(nr, nc * n)) }
nicola <- function() { setattr(unlist(test),"dim",c(nr,nc*n)) }
library(microbenchmark)
microbenchmark(r1 <- ori(),
               r2 <- Tensibai(),
               r3 <- BrodieG(),
               r4 <- nicola(), times = 10)
Results:
Unit: milliseconds
expr min lq mean median uq max neval cld
r1 <- ori() 23.834673 24.287391 39.49451 27.066844 29.737964 93.74249 10 a
r2 <- Tensibai() 17.416232 17.706165 18.18665 17.873083 18.192238 21.29512 10 a
r3 <- BrodieG() 6.009344 6.145045 21.63073 8.690869 10.323845 77.95325 10 a
r4 <- nicola() 5.912984 6.106273 13.52697 6.273904 6.678156 75.40914 10 a
As for the why (asked in the comments), #nicola gave the answer: there is less copying than in the original method.
All methods give the same result (note that identical() compares only its first two arguments and treats the rest as its option flags, so a pairwise check such as identical(r1, r2) && identical(r2, r3) && identical(r3, r4) is more rigorous):
> identical(r1,r2,r3,r4)
[1] TRUE
It seems that do.call beats the other method because of a copy made during the matrix call. What is interesting is that you can avoid that copy with the data.table::setattr function, which sets attributes by reference and so avoids any copy. I also omitted the as.numeric part, since it is not necessary (unlist(test) is already numeric). So:
require(microbenchmark)
require(data.table)
f1<-function() setattr(unlist(test),"dim",c(nr,nc*n))
f2<-function() do.call(cbind,test)
microbenchmark(res <-f1(),res2 <- f2(),times=10)
#Unit: milliseconds
# expr min lq mean median uq max neval
# res <- f1() 4.088455 4.183504 7.540913 4.44109 4.988605 35.05378 10
#res2 <- f2() 18.325302 18.379328 18.776834 18.66857 19.100681 19.47415 10
identical(res,res2)
#[1] TRUE
I think I have a better one. We can avoid some of the overhead from cbind since we know these all have the same number of rows and columns. Instead, we use c knowing that the underlying vector nature of the matrices will allow us to re-wrap them into the correct dimensions:
microbenchmark(
  x <- `attr<-`(do.call(c, test), "dim", c(nr, nc * n)),
  y <- do.call(cbind, test)
)
# Unit: milliseconds
# expr min lq
# x <- `attr<-`(do.call(c, test), "dim", c(nr, nc * n)) 4.435943 4.699006
# y <- do.call(cbind, test) 19.339477 19.567063
# mean median uq max neval cld
# 12.76214 5.209938 9.095001 379.77856 100 a
# 21.64878 20.000279 24.210848 26.02499 100 b
identical(x, y)
# [1] TRUE
If you have a varying number of columns, you can probably still do this with some care in computing the total number of columns; see the sketch below.
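Here is a hedged sketch of that varying-column case, using a small made-up list test2 with the same row count but different column counts:
nr <- 24
test2 <- list(matrix(rnorm(nr * 5), nr, 5),
              matrix(rnorm(nr * 3), nr, 3),
              matrix(rnorm(nr * 7), nr, 7))
total_cols <- sum(vapply(test2, ncol, integer(1)))
z <- `attr<-`(do.call(c, test2), "dim", c(nr, total_cols))
identical(z, do.call(cbind, test2))  # should be TRUE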

Vectorization while using which() function in R

I have 3 vectors and I want to apply the which() function to each of them separately.
I'm trying to find the maximum index of the values less than some given number.
How can I do this in a vectorized way?
My 3 vectors (they may have different lengths):
vec1 <- c(1,2,3,4,5)
vec2 <- c(11,12,13)
vec3 <- c(1,2,3,4,5,6,7,8)
This is what I'd like to vectorize:
max(which(vec1<3))
max(which(vec2<12.3))
max(which(vec3<5.7))
The expected result is:
2
2
5
One way to get a speedup would be to use Rcpp to search for elements smaller than your cutoff, starting from the right side of the vector and moving left. You can return as soon as you find the element that meets your criteria, which means that if your target is near the right side of the vector you might avoid looking at most of the vector's elements (meanwhile, which looks at all vector elements and max looks at all values returned by which). The speedup would be largest for long vectors where the target element is close to the end.
library(Rcpp)
rightmost.small <- cppFunction(
  'double rightmostSmall(NumericVector x, const double cutoff) {
     for (int i = x.size() - 1; i >= 0; --i) {
       if (x[i] < cutoff) return i + 1; // 1-indexed
     }
     return 0; // none found
   }')
rightmost.small(vec1, 3)
# [1] 2
rightmost.small(vec2, 12.3)
# [1] 2
rightmost.small(vec3, 5.7)
# [1] 5
Let's look at the performance for a vector where we expect this to give us a big speedup:
set.seed(144)
vec.large <- rnorm(1000000)
all.equal(max(which(vec.large < -1)), rightmost.small(vec.large, -1))
# [1] TRUE
library(microbenchmark)
microbenchmark(max(which(vec.large < -1)), rightmost.small(vec.large, -1))
# Unit: microseconds
# expr min lq mean median uq max neval
# max(which(vec.large < -1)) 4912.016 8097.290 12816.36406 9189.0685 9883.9775 60405.585 100
# rightmost.small(vec.large, -1) 1.643 2.476 8.54274 8.8915 12.8375 58.152 100
For this vector of length 1 million, we see a speedup of about 1000x using the Rcpp code.
This speedup should carry directly over to the case where you have many vectors stored in a list; you can use #JoshO'Brien's mapply code and observe a speedup when you switch from max(which(...)) to the Rcpp code:
f <- function(v,m) max(which(v < m))
l <- list(vec.large)[rep(1, 100)]
m <- rep(-1, 100)
microbenchmark(mapply(f, l, m), mapply(rightmost.small, l, m))
Unit: microseconds
expr min lq mean median uq max neval
mapply(f, l, m) 865287.828 907893.8505 931448.1555 918637.343 935632.0505 1133909.950 100
mapply(rightmost.small, l, m) 253.573 281.6855 344.5437 303.094 335.1675 712.897 100
We see a 3000x speedup by using the Rcpp code here.
l <- list(vec1,vec2,vec3)
m <- c(3, 12.3, 5.7)
f <- function(v,m) max(which(v < m))
mapply(f,l,m)
# [1] 2 2 5
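One caveat not raised in the answers: max(which(v < m)) warns and returns -Inf when no element is below the cutoff. A small hypothetical guard (f_safe below) avoids that:
f_safe <- function(v, m) {
  idx <- which(v < m)
  if (length(idx)) max(idx) else NA_integer_
}
mapply(f_safe, l, m)
# [1] 2 2 5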
