I have a very large 3D array (say 100 x 100 x 10) and would like to apply a function over it for pairwise comparisons. I've tried a number of solutions, using data.table, mapply, etc. I'm perhaps naively hoping for a bigger speedup, and am considering just doing this in C++/Rcpp. But before doing that, I thought I'd ask whether anyone is aware of a more elegant / faster solution to this problem. Many thanks!
Example code in R. For this smaller version of the problem, mapply() is a little faster than data.table.
m <- 20
n <- 10 # number of data points per row/col combination
R <- array(runif(n*m*m), dim=c(m,m,n)) # 3D array to apply function over
grid <- expand.grid(A = 1:m, B = 1:m, C = 1:m, D = 1:m) # array indices (used as args below)
# function to compute a basic correlation, e.g. between R[1,2,] and R[1,10,]
ss2 <- function(a,b,c,d) {
cor(R[a, b, ], R[c, d, ])
}
#solution with data.table
library(data.table)
dt <- setDT(grid) # convert from df -> dt (by reference)
sol_1 <- dt[, ss2(A, B,C,D), by = seq_len(nrow(dt))]
#solution with mapply
sol_2 <- mapply(ss2, grid$A, grid$B, grid$C, grid$D)
I tried this with mapply() and data.table. I've also tried a parallelized version of apply() (parApply, https://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/parallel.html).
UPDATE: cora from the Rfast package gives further performance improvements.
By reshaping the array, we can use cor directly for a ~2K times speedup:
library(data.table)
library(Rfast)
m <- 20
n <- 10 # number of data points per row/col combination
R <- array(runif(n*m*m), dim=c(m,m,n)) # 3D array to apply function over
grid <- expand.grid(A = 1:m, B = 1:m, C = 1:m, D = 1:m)
ss2 <- function(a,b,c,d) rho = cor(R[a, b, ], R[c, d, ])
dt <- setDT(grid)
microbenchmark::microbenchmark(
sol_1 = dt[, ss2(A, B, C, D), by = seq_len(nrow(dt))][[2]],
sol_2 = mapply(ss2, grid$A, grid$B, grid$C, grid$D),
sol_3 = c(cor(t(matrix(R, m*m, n)))),
sol_4 = c(cora(t(matrix(R, m*m, n)))),
check = "equal",
times = 10
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> sol_1 2101327.2 2135311.0 2186922.33 2178526.6 2247049.6 2301429.5 10
#> sol_2 2255828.9 2266427.5 2306180.23 2287911.0 2321609.6 2471711.7 10
#> sol_3 1203.8 1222.2 1244.75 1236.1 1243.9 1343.5 10
#> sol_4 922.6 945.8 952.68 951.9 955.8 988.8 10
Timing the full 100 x 100 x 10 array:
m <- 100L
n <- 10L
R <- array(runif(n*m*m), dim=c(m,m,n))
microbenchmark::microbenchmark(
sol_3 = c(cor(t(matrix(R, m*m, n)))),
sol_4 = c(cora(t(matrix(R, m*m, n)))),
check = "equal",
times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sol_3 1293.0739 1298.4997 1466.546 1503.453 1513.746 1902.802 10
#> sol_4 879.8659 892.2699 1058.064 1055.668 1143.767 1300.282 10
Note that filling by column then transposing tends to be slightly faster than filling by row in this case. Also note that ss2 and grid are no longer needed.
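To see why the reshaped call reproduces the pairwise results, it may help to spell out the index mapping. The sketch below is only an illustration and not part of the benchmark; the helper idx() and the small dimensions are assumptions chosen for the example. Row a + (b - 1) * m of the reshaped matrix holds R[a, b, ], so entry [idx(a, b), idx(c, d)] of the correlation matrix should match cor(R[a, b, ], R[c, d, ]).

```r
set.seed(1)
m <- 4   # small dimensions, chosen only for illustration
n <- 10
R <- array(runif(n * m * m), dim = c(m, m, n))

# matrix() fills column-major, so row a + (b - 1) * m holds R[a, b, ]
flat <- matrix(R, m * m, n)
C <- cor(t(flat))              # (m*m) x (m*m) matrix of all pairwise correlations

# hypothetical helper: position of the (a, b) cell in the flattened layout
idx <- function(a, b) a + (b - 1L) * m

a <- 2; b <- 3; c2 <- 4; d <- 1
from_matrix <- C[idx(a, b), idx(c2, d)]
direct <- cor(R[a, b, ], R[c2, d, ])
all.equal(from_matrix, direct)
```

Keeping the result as a matrix (rather than flattening it with c()) makes lookups like this straightforward.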
I have a very large data set with categorical labels a and a vector b that contains all possible labels in the data set:
a <- c(1,1,3,2) # artificial data
b <- c(1,2,3,4) # fixed categories
Now I want to find for each observation in a the set of all remaining categories (that is, the elements of b excluding the given observation in a). From these remaining categories, I want to sample one at random.
My approach using a loop is
goal <- numeric() # container for results
for(i in 1:4){
d <- setdiff(b, a[i]) # find the categories except the one observed in the data
goal[i] <- sample(d,1) # sample one of the remaining categories randomly
}
goal
[1] 4 4 1 1
However, this has to be done a large number of times and applied to very large data sets. Does anyone have a more efficient version that leads to the desired result?
EDIT:
The function by akrun is unfortunately slower than the original loop. If anyone has a creative idea with a competitive result, I'm happy to hear it!
We can use vapply
vapply(a, function(x) sample(setdiff(b, x), 1), numeric(1))
set.seed(24)
a <- sample(c(1:4), 10000, replace=TRUE)
b <- 1:4
system.time(vapply(a, function(x) sample(setdiff(b, x), 1), numeric(1)))
# user system elapsed
# 0.208 0.007 0.215
It turns out that drawing a label for every observation and then resampling only those draws that equal the observed label is an even faster approach:
test = sample(b, length(a), replace=TRUE)
resample = (a == test)
while(any(resample)){
test[resample] = sample(b, sum(resample), replace=TRUE)
resample = (a == test)
}
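For reuse, the resampling loop can be wrapped in a function. This is a minimal sketch, assuming every element of a occurs in b and b has at least two distinct labels; the name resample_other is made up for this example. Because each draw is uniform over b, each pass leaves on average a 1/length(b) share of the remaining draws to redo, so the loop usually ends after a few passes.

```r
# hypothetical wrapper around the resampling idea
resample_other <- function(a, b) {
  stopifnot(length(unique(b)) >= 2L)
  out <- sample(b, length(a), replace = TRUE)   # first draw for every observation
  clash <- out == a                             # draws that hit the observed label
  while (any(clash)) {
    out[clash] <- sample(b, sum(clash), replace = TRUE)  # redo only the clashes
    clash <- out == a
  }
  out
}

set.seed(24)
a <- sample(1:4, 10000, replace = TRUE)
b <- 1:4
goal <- resample_other(a, b)
any(goal == a)   # no result should equal its observed label
```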
Updated Benchmarks for N=10,000:
Unit: microseconds
                              expr       min        lq       mean    median         uq       max neval
                              loop 14337.492 14954.595 16172.2165 15227.010 15585.5960 24071.727   100
                             akrun 14899.000 15507.978 16271.2095 15736.985 16050.6690 24085.839   100
                          resample    87.242   102.423   113.4057   112.473   122.0955   174.056   100
       shree(data = a, labels = b)  5195.128  5369.610  5472.4480  5454.499  5574.0285  5796.836   100
shree_mapply(data = a, labels = b)  1500.207  1622.516  1913.1614  1682.814  1754.0190 10449.271   100
Update: Here's a fast version with mapply. It avoids calling sample() on every iteration, so it is a bit faster:
mapply(function(x, y) b[!b == x][y], a, sample(length(b) - 1, length(a), replace = T))
Here's a version without setdiff (setdiff can be a bit slow), although I think even more optimization is possible:
vapply(a, function(x) sample(b[!b == x], 1), numeric(1))
Benchmarks:
set.seed(24)
a <- sample(c(1:4), 1000, replace=TRUE)
b <- 1:4
microbenchmark::microbenchmark(
akrun = vapply(a, function(x) sample(setdiff(b, x), 1), numeric(1)),
shree = vapply(a, function(x) sample(b[!b == x], 1), numeric(1)),
shree_mapply = mapply(function(x, y) b[!b == x][y], a, sample(length(b) - 1, length(a), replace = T))
)
Unit: milliseconds
expr min lq mean median uq max neval
akrun 28.7347 30.66955 38.319655 32.57875 37.45455 237.1690 100
shree 5.6271 6.05740 7.531964 6.47270 6.87375 45.9081 100
shree_mapply 1.8286 2.01215 2.628989 2.14900 2.54525 7.7700 100
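The mapply version draws all random indices in a single sample() call; the same idea can be pushed further so that no per-element function call is needed. The following is a sketch of that variation, not something taken from the answers above, and it assumes b has no duplicates and every element of a occurs in b. The name sample_other is hypothetical. The trick is to draw an index k into "b without the observed label" and shift it by one whenever it lands at or beyond the observed label's position.

```r
# hypothetical fully vectorised variant of the mapply idea
sample_other <- function(a, b) {
  pos <- match(a, b)                                  # position of each observed label in b
  k <- sample.int(length(b) - 1L, length(a), replace = TRUE)
  b[k + (k >= pos)]                                   # b[-pos][k], computed without a loop
}

set.seed(24)
a <- sample(1:4, 1000, replace = TRUE)
b <- 1:4
res <- sample_other(a, b)
table(res == a)
```

Since k is uniform on 1..(length(b) - 1), each remaining category is equally likely, as with sample(setdiff(b, x), 1).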
I have this data.frame:
set.seed(1)
df <- cbind(matrix(rnorm(26,100),26,100),data.frame(id=LETTERS,parent.id=sample(letters[1:5],26,replace = T),stringsAsFactors = F))
Each row is 100 measurements from a certain subject (designated by id), which is associated with a parent ID (designated by parent.id). The relationship between parent.id and id is one-to-many.
I'm looking for a fast way to get the fraction of each df$id (for each of its 100 measurements) out of the measurements of its parent.id. That is, for each id in df$id I want to divide each of its 100 measurements by the sum of that measurement across all df$id's which share its df$parent.id.
What I'm trying is:
library(dplyr) # provides %>%
sum.df <- dplyr::select(df,-id) %>% dplyr::group_by(parent.id) %>% dplyr::summarise_all(sum)
fraction.df <- do.call(rbind,lapply(df$id,function(i){
pid <- dplyr::filter(df,id == i)$parent.id
(dplyr::filter(df,id == i) %>% dplyr::select(-id,-parent.id))/
(dplyr::filter(sum.df,parent.id == pid) %>% dplyr::select(-parent.id))
}))
But for the real dimensions of my data: length(df$id) = 10,000 with 1,024 measurements, this is not fast enough.
Any idea how to improve this, ideally using dplyr functions?
Let's compare these options with microbenchmark, all using the new definition of the dataset from @Sathish's answer:
OP method:
Units: seconds
min lq mean median uq max neval
1.423583 1.48449 1.602001 1.581978 1.670041 2.275105 100
@Sathish's method speeds it up by a factor of about 4 (median of 363 milliseconds vs 1,582 milliseconds). This is valuable, to be sure:
Units: milliseconds
min lq mean median uq max neval
299.3581 334.787 388.5283 363.0363 398.6714 951.4654 100
One possible base R implementation below, using principles of efficient R code, improves things by a factor of about 65 (24 milliseconds, vs 1,582 milliseconds):
Units: milliseconds
min lq mean median uq max neval
21.49046 22.59205 24.97197 23.81264 26.36277 34.72929 100
Here's the base R implementation. As is the case for the OP's implementation, the parent.id and id columns are not included in the resulting structure (here fractions). fractions is a matrix with rows ordered according to sort(interaction(df$id, df$parent.id, drop = TRUE)).
values <- df[1:100]
parents <- split(values, df$parent.id)
# column sums per parent: a 100 x (number of parents) matrix
sums <- vapply(parents, colSums, numeric(100), USE.NAMES = FALSE)
fractions <- matrix(0, 26, 100)
f_count <- 0
for (p_count in seq_along(parents)){
parent <- as.matrix(parents[[p_count]])
dimnames(parent) <- NULL
n <- nrow(parent)
for (p_row in seq_len(n)){
fractions[(f_count + p_row),] <- parent[p_row,] / sums[,p_count]
}
f_count <- f_count + n
}
Note: there's still room for improvement. split() is not particularly efficient.
Note 2: What "principles of efficient R code" were used?
Get rid of names whenever you can
It's faster to find things in a matrix than a data frame
Don't be afraid of for loops for efficiency, provided you're not growing an object
Prefer vapply to the other apply family functions.
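As a sketch of how far the "matrix instead of data frame" principle can be taken, the explicit loops and split() can be replaced by base R's rowsum(), which computes the per-group column sums in one call. This is a different technique from the implementation above, shown only as an illustration. The data are regenerated with mean 100 (an assumption made here so that no group sum is close to zero), and the result keeps the original row order of df rather than the parent-sorted order.

```r
set.seed(1)
df <- cbind(matrix(rnorm(2600, mean = 100), nrow = 26, ncol = 100),
            data.frame(id = LETTERS,
                       parent.id = sample(letters[1:5], 26, replace = TRUE),
                       stringsAsFactors = FALSE))

values <- as.matrix(df[1:100])           # numeric matrix of the 100 measurements
sums <- rowsum(values, df$parent.id)     # one row of column sums per parent.id

# look up each row's parent sums by row name and divide element-wise
fractions <- values / sums[df$parent.id, , drop = FALSE]
dim(fractions)
```

Within each parent.id, every column of fractions then sums to 1.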
The problem with your data is that all columns are duplicates of each other (rnorm(26, 100) yields only 26 values, which are recycled across the 100 columns), so I changed it slightly to give the dataset distinct values.
Data:
set.seed(1L)
df <- cbind(matrix(rnorm(2600), nrow = 26, ncol = 100),data.frame(id=LETTERS,parent.id=sample(letters[1:5],26,replace = T),stringsAsFactors = F))
Code:
library('data.table')
setDT(df) # assign data.table class by reference
# compute sum for each `parent.id` for each column (100 columns)
sum_df <- df[, .SD, .SDcols = which(colnames(df) != 'id' )][, lapply(.SD, sum ), by = .(parent.id ) ]
# get column names for sum_df and df which are sorted for consistency
no_pid_id_df <- gtools::mixedsort( colnames(df)[ ! ( colnames(df) %in% c( 'id', 'parent.id' ) ) ] )
no_pid_sum_df <- gtools::mixedsort( colnames(sum_df)[ colnames(sum_df) != 'parent.id' ] )
# match the `parent.id` for each `id` and then divide its value by the value of `sum_df`.
df[, .( props = {
pid <- parent.id
unlist( .SD[, .SD, .SDcols = no_pid_id_df ] ) /
unlist( sum_df[ parent.id == pid, ][, .SD, .SDcols = no_pid_sum_df ] )
}, parent.id ), by = .(id)]
Output:
# id props parent.id
# 1: A -0.95157186 e
# 2: A 0.06105359 e
# 3: A -0.42267771 e
# 4: A -0.03376174 e
# 5: A -0.16639600 e
# ---
# 2596: Z 2.34696158 e
# 2597: Z 0.23762369 e
# 2598: Z 0.60068440 e
# 2599: Z 0.14192337 e
# 2600: Z 0.01292592 e
Benchmark:
library('microbenchmark')
microbenchmark( sathish(), frank(), dan())
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# sathish() 404.450219 413.456675 433.656279 420.46044 429.876085 593.44202 100 c
# frank() 2.035302 2.304547 2.707019 2.47257 2.622025 18.31409 100 a
# dan() 17.396981 18.230982 19.316653 18.59737 19.700394 27.13146 100 b
I have the following data frame 'df'.
Each participant (here 10 participants) saw several stimuli (here 100), and made
a judgment about each (here a random number). For each stimulus, I know the true
answer (here a random number; a different number for each stimulus but always
the same answer for all participants).
library(dplyr) # for %>% and arrange()
participant <- rep(1:10, each=100)
stimuli <- rep(1:100, 10)
judgment <- rnorm(1000)
df1 <- data.frame(participant, stimuli, judgment)
df2 <- data.frame(stimuli=1:100, criterion=rnorm(100))
df <- merge(df1, df2, by='stimuli') %>% arrange(participant, stimuli)
Here is what I am trying to do:
1) Take n randomly selected participants (here n is between 1 and 10).
2) Compute the mean of their judgments per stimulus.
3) Compute the correlation between this mean and the true answer.
I want to perform steps 1-3 for all n; that is, take 1 randomly selected participant and perform steps 1-3, then take 2 randomly selected participants and perform steps 1-3, and so on up to 10 randomly selected participants.
The result should be a data frame with 10 rows and 2 variables: N and the correlation. I want to work only with dplyr.
My solution is based on lapply. Here it is:
participants_id = unique (df$participant)
MyFun = function(Data) {
HelpFun = function(x, Data) {
# x is the number of participants to sample.
# It will be used in the lapply call below
participants_x = sample(participants_id, x)
filter(Data, participant %in% participants_x) %>%
group_by(stimuli) %>%
summarise( mean_x = mean(judgment),
criterion = unique(criterion) ) %>%
summarise(cor = cor(.$mean_x, .$criterion))
}
N <- length(unique(Data$participant))
lapply(1:N, HelpFun, Data) %>% bind_rows()
}
MyFun(df)
The problem is that this code is slow. Since every selection is random, I
repeat all of this 10,000 times, and that is slow. On my machine (Windows 10, 16 GB), 1,000 simulations take 2 minutes and 10,000 simulations take 20 minutes. (I also tried loops; for some reason that was a little faster, but it did not really help.) There has to be a faster solution. After all, the computations are not that complicated.
Below I run only 100 simulations so as not to tie up your computer.
system.time(replicate(100, MyFun(df), simplify = FALSE ) %>% bind_rows())
Any idea about making all of this faster?
Using data.table and for loops we can get a solution that is about 10 times faster.
My function:
minem <- function(n) { # n - simulation count
require(data.table)
participants_id <- unique(df$participant)
N <- length(unique(df$participant))
dt <- as.data.table(df)
setkey(dt, stimuli)
L <- list()
for (j in 1:n) {
corss <- rep(0, N)
for (i in 1:N) {
participants_x <- sample(participants_id, i)
xx <- dt[participant %in% participants_x,
.(mean_x = mean(judgment),
criterion = first(criterion)),
by = stimuli]
corss[i] <- cor(xx$mean_x, xx$criterion)
}
L[[j]] <- corss
}
unlist(L)
}
head(minem(10))
# [1] 0.13642499 -0.02078109 -0.14418400 0.04966805 -0.09108837 -0.15403185
Your function:
Meir <- function(n) {
replicate(n, MyFun(df), simplify = FALSE) %>% bind_rows()
}
Benchmarks:
microbenchmark::microbenchmark(
Meir(10),
minem(10),
times = 10)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# Meir(10) 1897.6909 1956.3427 1986.5768 1973.5594 2043.4337 2048.5809 10 b
# minem(10) 193.5403 196.0426 201.4132 202.1085 204.9108 215.9961 10 a
This is around 10 times faster.
system.time(minem(1000)) # ~19 sec
Update
If your data size and memory limit allow, you can do it much faster with this approach:
minem2 <- function(n) {
require(data.table)
participants_id <- unique(df$participant)
N <- length(unique(df$participant))
dt <- as.data.table(df)
setkey(dt, participant)
L <- lapply(1:n, function(x)
sapply(1:N, function(i)
sample(participants_id, i)))
L <- unlist(L, recursive = F)
names(L) <- 1:length(L)
g <- sapply(seq_along(L), function(x) rep(names(L[x]), length(L[[x]])))
L <- data.table(participant = unlist(L), .id = as.integer(unlist(g)),
key = "participant")
L <- dt[L, allow.cartesian = TRUE]
xx <- L[, .(mean_x = mean(judgment), criterion = first(criterion)),
keyby = .(.id, stimuli)]
xx <- xx[, cor(mean_x, criterion), keyby = .id][[2]]
xx
}
microbenchmark::microbenchmark(
Meir(100),
minem(100),
minem2(100),
times = 2, unit = "relative")
# Unit: relative
# expr min lq mean median uq max neval cld
# Meir(100) 316.34965 316.34965 257.30832 257.30832 216.85190 216.85190 2 c
# minem(100) 31.49818 31.49818 26.48945 26.48945 23.05735 23.05735 2 b
# minem2(100) 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 2 a
But you will need to test it yourself.
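Another option, shown here only as a sketch and not taken from the answers above, is to leave data frames behind entirely: store the judgments as a stimuli x participants matrix, so that steps 1-3 reduce to picking columns, rowMeans() and cor(). The names J and one_sim are made up for this example, and the data are generated directly in matrix form; with the df from the question (sorted by participant, then stimuli), matrix(df$judgment, nrow = 100) would presumably give the same layout.

```r
set.seed(1)
n_part <- 10
n_stim <- 100
J <- matrix(rnorm(n_stim * n_part), n_stim, n_part)  # rows: stimuli, columns: participants
criterion <- rnorm(n_stim)                           # true answer per stimulus

# one simulation: for each k = 1..n_part, sample k participants,
# average their judgments per stimulus, correlate with the criterion
one_sim <- function(J, criterion) {
  vapply(seq_len(ncol(J)), function(k) {
    cols <- sample.int(ncol(J), k)
    cor(rowMeans(J[, cols, drop = FALSE]), criterion)
  }, numeric(1))
}

sims <- replicate(100, one_sim(J, criterion))  # 10 x 100: one column per simulation
dim(sims)
```

Row k of sims holds the correlations obtained with k participants, so it can be turned into the requested N / correlation data frame per simulation if needed.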
I have a string matrix (my_data) of dimensions 9000000x10 with each value being a single character string. I want to transform it to a numeric matrix using the function utf8ToInt, but it takes a long time and crashes my session.
new_matrix <- apply(my_data, 1:2, "utf8ToInt")
The result is what I expect, but I need a more efficient way of doing that.
Any help is deeply appreciated.
Imagine my data is:
my_data <- matrix(c("a","b","c","d"), ncol = 2)
but it is actually 9000000x10 instead of 2x2.
stringi::stri_enc_toutf32 may be an alternative.
From ?stri_enc_toutf32:
This function is roughly equivalent to a vectorized call to utf8ToInt(enc2utf8(str))
On a 1e3 * 2 matrix, stri_enc_toutf32 is about 10 and 20 times faster than vapply / apply + utf8ToInt respectively:
library(stringi)
library(microbenchmark)
nr = 1e3
nc = 2
m = matrix(sample(letters, nr*nc, replace = TRUE), nrow = nr, ncol = nc)
microbenchmark(
f_apply = apply(m, 1:2, utf8ToInt),
f_vapply = structure(vapply(m, utf8ToInt, numeric(1)), dim=dim(m)),
f = matrix(unlist(stri_enc_toutf32(m), use.names = FALSE), nrow = nrow(m)),
times = 10L, check = "equal")
# Unit: microseconds
# expr min lq mean median uq max neval
# f_apply 2283.4 2297.2 2351.17 2325.40 2354.5 2583.6 10
# f_vapply 1276.1 1298.0 1348.88 1322.00 1353.4 1611.3 10
# f 87.6 92.3 108.53 105.15 111.0 163.8 10
Using vapply would be almost twice as fast. Since vapply returns a vector, it is necessary to re-establish the matrix format (here with structure).
library(microbenchmark)
my_data <- matrix(sample(letters, 2*100, replace = TRUE), ncol = 2)
microbenchmark(
apply = apply(my_data, 1:2, utf8ToInt),
vapply = structure(vapply(my_data, utf8ToInt, numeric(1)), dim=dim(my_data)),
times = 500L, check = 'equal'
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> apply 199.201 208.001 224.811 213.801 220.1515 1560.400 500
#> vapply 111.000 115.501 136.343 120.401 124.9505 1525.901 500
Created on 2021-03-06 by the reprex package (v1.0.0)
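If every cell really is a single character, as stated in the question, a base R alternative may be worth trying: paste all cells into one string and call utf8ToInt once, since it returns one code point per character. This is a sketch of that idea and not one of the benchmarked solutions above; it relies on the one-character-per-cell assumption, which the code checks explicitly.

```r
my_data <- matrix(c("a", "b", "c", "d"), ncol = 2)

# the trick only works if every cell holds exactly one character
stopifnot(all(nchar(my_data) == 1))

# one call to utf8ToInt on the concatenated string, then restore the shape
codes <- matrix(utf8ToInt(paste(my_data, collapse = "")), nrow = nrow(my_data))
codes
all.equal(codes, apply(my_data, 1:2, utf8ToInt))
```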
I'm trying to multiply a data frame df by a vector v, so that the product is a data frame, where the i-th row is given by df[i,]*v. I can do this, for example, by
df <- data.frame(A=1:5, B=2:6); v <- c(0,2)
as.data.frame(t(t(df) * v))
A B
1 0 4
2 0 6
3 0 8
4 0 10
5 0 12
I am sure there has to be a more R-style approach (and a very simple one!), but nothing comes to mind. I even tried something like
apply(df, MARGIN=1, function(x) x*v)
but constructions that are hard to read, like as.data.frame(t(.)), are still required.
How can I find an efficient and elegant workaround here?
This works too:
data.frame(mapply(`*`,df,v))
In that solution, you are taking advantage of the fact that data.frame is a type of list, so you can iterate over both the elements of df and v at the same time with mapply.
Unfortunately, you are limited in what you can output from mapply: a simple list or a matrix. If your data are huge, this would likely be more efficient:
data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))
With SIMPLIFY=FALSE the result stays a list, which is more efficient to convert to a data.frame.
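The difference between the two calls can be inspected directly. A small sketch using the df and v from the question (the variable names as_matrix and as_list are made up here):

```r
df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)

as_matrix <- mapply(`*`, df, v)                    # simplified to a 5 x 2 matrix
as_list   <- mapply(`*`, df, v, SIMPLIFY = FALSE)  # stays a list with one element per column

is.matrix(as_matrix)
is.list(as_list)

# both routes should give the same data frame
all.equal(data.frame(as_matrix), data.frame(as_list))
```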
If you're looking for speed and memory efficiency - data.table to the rescue:
library(data.table)
dt = data.table(df)
for (i in seq_along(dt))
dt[, (i) := dt[[i]] * v[i]]
eddi = function(dt) { for (i in seq_along(dt)) dt[, (i) := dt[[i]] * v[i]] }
arun = function(df) { df * matrix(v, ncol=ncol(df), nrow=nrow(df), byrow=TRUE) }
nograpes = function(df) { data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)) }
N = 1e6
dt = data.table(A = rnorm(N), B = rnorm(N))
v = c(0,2)
microbenchmark(eddi(copy(dt)), arun(copy(dt)), nograpes(copy(dt)), times = 10)
#Unit: milliseconds
# expr min lq mean median uq max neval
# eddi(copy(dt)) 23.01106 24.31192 26.47132 24.50675 28.87794 34.28403 10
# arun(copy(dt)) 337.79885 363.72081 450.93933 433.21176 516.56839 644.70103 10
# nograpes(copy(dt)) 19.44873 24.30791 36.53445 26.00760 38.09078 95.41124 10
As Arun points out in the comments, one can also use the set function from the data.table package to do this in-place modification on data.frames:
for (i in seq_along(df))
set(df, j = i, value = df[[i]] * v[i])
This of course also works for data.tables and could be significantly faster if the number of columns is large.
A language that lets you combine vectors with matrices has to make a decision at some point whether the matrices are row-major or column-major ordered. The reason:
> df * v
A B
1 0 4
2 4 0
3 0 8
4 8 0
5 0 12
is because R operates down the columns first. Doing the double-transpose trick subverts this. Sorry if this is just explaining what you know, but I don't know another way of doing it, except explicitly expanding v into a matrix of the same size.
Or write a nice function that wraps the not very R-style code into something that is R-stylish.
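One way such a wrapper might look, as a sketch: the name scale_columns is made up, and it uses Map() to pair each column with its multiplier instead of the double transpose, so the result stays a data frame with its column names intact.

```r
# hypothetical helper: multiply column i of df by v[i]
scale_columns <- function(df, v) {
  stopifnot(length(v) == ncol(df))
  df[] <- Map(`*`, df, v)   # assigning into df[] keeps the data.frame structure
  df
}

df <- data.frame(A = 1:5, B = 2:6)
v <- c(0, 2)
res <- scale_columns(df, v)
res
```

Because the work happens column by column, the column-major recycling issue described above does not come into play.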
What's wrong with
t(apply(df, 1, function(x)x*v))
?
library(purrr)
map2_dfc(df, v, `*`)
Benchmark
N = 1e6
dt = data.table(A = rnorm(N), B = rnorm(N))
v = c(0,2)
eddi = function(dt) { for (i in seq_along(dt)) dt[, (i) := dt[[i]] * v[i]]; dt }
arun = function(df) { df * matrix(v, ncol=ncol(df), nrow=nrow(df), byrow=TRUE) }
nograpes = function(df) { data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)) }
ryan = function(df) {map2_dfc(df, v, `*`) }
library(microbenchmark)
microbenchmark(
eddi(copy(dt))
, arun(copy(dt))
, nograpes(copy(dt))
, ryan(copy(dt))
, times = 100)
# Unit: milliseconds
# expr min lq mean median uq max neval
# eddi(copy(dt)) 8.367513 11.06719 24.26205 12.29132 19.35958 171.6212 100
# arun(copy(dt)) 94.031272 123.79999 186.42155 148.87042 251.56241 364.2193 100
# nograpes(copy(dt)) 7.910739 10.92815 27.68485 13.06058 21.39931 172.0798 100
# ryan(copy(dt)) 8.154395 11.02683 29.40024 13.73845 21.77236 181.0375 100
I think the fastest way (without testing data.table) is data.frame(t(t(df)*v)).
My tests:
testit <- function(nrow, ncol)
{
df <- as.data.frame(matrix(rnorm(nrow*ncol),nrow=nrow,ncol=ncol))
v <- runif(ncol)
r1 <- data.frame(t(t(df)*v))
r2 <- data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE))
r3 <- df * rep(v, each=nrow(df))
stopifnot(identical(r1, r2) && identical(r1, r3))
microbenchmark(data.frame(t(t(df)*v)), data.frame(mapply(`*`,df,v,SIMPLIFY=FALSE)), df * rep(v, each=nrow(df)))
}
Result
> set.seed(1)
>
> testit(100,100)
Unit: milliseconds
expr min lq median uq max neval
data.frame(t(t(df) * v)) 2.297075 2.359541 2.455778 3.804836 33.05806 100
data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE)) 9.977436 10.401576 10.658964 11.762009 15.09721 100
df * rep(v, each = nrow(df)) 14.309822 14.956705 16.092469 16.516609 45.13450 100
> testit(1000,10)
Unit: microseconds
expr min lq median uq max neval
data.frame(t(t(df) * v)) 754.844 805.062 844.431 1850.363 27955.79 100
data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE)) 1457.895 1497.088 1567.604 2550.090 4732.03 100
df * rep(v, each = nrow(df)) 5383.288 5527.817 5875.143 6628.586 32392.81 100
> testit(10,1000)
Unit: milliseconds
expr min lq median uq max neval
data.frame(t(t(df) * v)) 17.07548 18.29418 19.91498 20.67944 57.62913 100
data.frame(mapply(`*`, df, v, SIMPLIFY = FALSE)) 99.90103 104.36028 108.28147 114.82012 150.05907 100
df * rep(v, each = nrow(df)) 112.21719 118.74359 122.51308 128.82863 164.57431 100