Apply an operation across multiple lists - r

I know in R if I have a list of matrices, I can use the Reduce function to apply an operation across all the matrices. For example:
l <- list(matrix(rnorm(16), 4, 4), matrix(rnorm(16), 4, 4))
Reduce(`*`, l)
But what if I want to apply this operation across multiple lists? I could take a brute-force approach with a for loop, but I feel like there should be a better way. I can handle two lists with mapply:
l2 <- l
mapply(`*`, l, l2, SIMPLIFY = FALSE)
But with more than two lists I'm not sure how to proceed.
The following attempts all result in errors:
l3 <- l2
mapply(`*`, l, l2, l3, SIMPLIFY = FALSE)
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) :
operator needs one or two arguments
Reduce(`*`, list(l, l2, l3))
Error in f(init, x[[i]]) : non-numeric argument to binary operator
The desired output is a list of length 2 whose i-th element is the elementwise product of the i-th matrix from each list. The brute-force loop would look like this:
out <- vector("list", length = 2)
for(i in 1:2){
out[[i]] <- l[[i]] * l2[[i]] * l3[[i]]
}

This combination of Reduce and Map will produce the desired result in base R.
# copy the matrix list
l3 <- l2 <- l
out2 <- Reduce(function(x, y) Map(`*`, x, y), list(l, l2, l3))
which returns
out2
[[1]]
[,1] [,2] [,3] [,4]
[1,] -5.614351e-01 -0.06809906 -0.16847839 0.8450600
[2,] -1.201886e-05 0.02008037 5.64656727 -2.4845526
[3,] 5.587296e-02 -0.54793853 0.02254552 0.4608697
[4,] -9.732049e-04 11.73020448 1.83408770 -1.4844601
[[2]]
[,1] [,2] [,3] [,4]
[1,] -4.7372339865 -0.398501528 0.8918474 0.12433983
[2,] 0.0007413892 0.151864126 -0.2138688 -0.10223482
[3,] -0.0790846342 -0.413330364 2.0640126 -0.01549591
[4,] -0.1888032661 -0.003773035 -0.9246891 -2.30731237
We can check that this is the same as the for loop in the OP.
identical(out, out2)
[1] TRUE
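The nesting can also be flipped: Map over element positions and Reduce within each position. This is a sketch assuming, as in the question, that all lists have the same length; lists, out_reduce and out_map are illustrative names.

```r
set.seed(1)
l <- list(matrix(rnorm(16), 4, 4), matrix(rnorm(16), 4, 4))
l3 <- l2 <- l
lists <- list(l, l2, l3)

# Reduce across lists, combining two whole lists at a time with Map()
out_reduce <- Reduce(function(x, y) Map(`*`, x, y), lists)

# Map across positions: position i receives l[[i]], l2[[i]], l3[[i]] as ...,
# which are then multiplied together with Reduce()
out_map <- do.call(Map, c(f = function(...) Reduce(`*`, list(...)), lists))

identical(out_reduce, out_map)
#[1] TRUE
```

With the lists passed as separate arguments, the same idea reads Map(function(...) Reduce(`*`, list(...)), l, l2, l3).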

Related

Sequentially multiply matrices from a list by a vector at time t-1 (recursively)

I am trying to multiply a list of matrices, in the order they appear within the list starting with matrix 1, by an initial vector and then recursively, so that matrix 2 from the list is multiplied by the resulting vector, and so on. I have tried various iterations of lapply and map but am unable to project forward and perform this recursively. More explicitly: A[[1]] %*% allYears[,1], then A[[2]] %*% allYears[,2], ..., A[[4]] %*% allYears[,4], which produces the final (5th) column of allYears. Below is the sample code; the for loop fails at the A[[i]] indexing because i is never defined.
A <- lapply(1:4, function(x) # construct list of matrices
  matrix(c(0, 0, 10,
           rbeta(1, 5, 4), 0, 0,
           0, rbeta(1, 10, 2), 0), nrow=3, ncol=3, byrow=TRUE))
n <- c(1000, 100, 10) # initial vector of abundances
nYears <- 4 # define the number of years to project over
allYears <- matrix(0, nrow=3, ncol=nYears+1) # build a storage array for all abundances
allYears[, 1] <- n # set the year 0 abundance
for (t in 1:(nYears + 1)) { # loop through all years
allYears[, t] <- A[[i]] %*% allYears[, t - 1]
}
Based on the description, we need to loop over the elements of A: A has length 4, whereas allYears has 5 columns. Create an index from 2 to ncol(allYears), loop over the positions of that index, and at each step multiply the corresponding element of A by the previous column of allYears:
i1 <- 2:(nYears + 1)
for(t in seq_along(i1)) {
allYears[,i1[t]] <- A[[t]] %*% allYears[,i1[t]-1]
}
Output:
> allYears
[,1] [,2] [,3] [,4] [,5]
[1,] 1000 100.00000 817.24277 2081.08322 333.6702
[2,] 100 261.46150 55.44237 423.22095 1244.6680
[3,] 10 81.72428 208.10832 33.36702 355.5175
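Alternative, possibly too clever, solution:
Construct a list of (A[[1]], A[[1]] %*% A[[2]], ... ). Because Reduce multiplies left to right, these products are in the reverse of the loop's year-by-year order (A[[2]] %*% A[[1]] %*% n, ...), so the columns below match the loop only if the matrices commute.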
Alist <- Reduce("%*%", A, accumulate=TRUE)
Multiply each of these by the initial value
vlist <- lapply(Alist, "%*%", n)
Combine:
do.call(cbind, vlist)
[,1] [,2] [,3] [,4]
[1,] 100.00000 856.66558 4864.20044 486.420
[2,] 569.23739 56.92374 543.02451 3307.690
[3,] 78.55101 553.62548 55.36255 445.619
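@MikaelJagan points out that this can be done in fewer steps. Reversing A and folding from the right (right = TRUE) yields n, A[[1]] %*% n, A[[2]] %*% A[[1]] %*% n, ..., i.e. the same columns, in the same order, as the loop: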
do.call(cbind,
rev(Reduce("%*%", rev(A), init = n, right = TRUE, accumulate = TRUE)))
or, with the native pipe (R >= 4.1.0):
(A
|> rev()
|> Reduce(f = "%*%", init = n, right = TRUE, accumulate = TRUE)
|> rev()
|> do.call(what = cbind)
)
(The last step could be replaced by |> unlist() |> matrix(nrow = length(n)).)
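The recursion can also be written as a single left fold whose step function puts the next matrix on the left of the running vector, so the columns come out in the loop's order (n, A[[1]] %*% n, A[[2]] %*% A[[1]] %*% n, ...) without any rev(). A sketch; set.seed(1) and the names proj and allYears_reduce are only for illustration.

```r
set.seed(1)
A <- lapply(1:4, function(x)  # list of 3x3 matrices, as in the question
  matrix(c(0, 0, 10,
           rbeta(1, 5, 4), 0, 0,
           0, rbeta(1, 10, 2), 0), nrow = 3, ncol = 3, byrow = TRUE))
n <- c(1000, 100, 10)

# init = n makes the initial vector the first element of the accumulated list;
# each step computes M %*% v, i.e. the next matrix times the previous result
proj <- Reduce(function(v, M) M %*% v, A, init = n, accumulate = TRUE)
allYears_reduce <- do.call(cbind, proj)
dim(allYears_reduce)
#[1] 3 5
```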

double sum calculation, what is the most efficient approach?

I need to calculate

$$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} f(x_i, x_j)$$

where x is a vector of length n and f is a function.
What is the most efficient calculation for this in R?
One method is a double for loop, but that is obviously slow.
One fast way to do this is the following:
Assume we have this vector:
x = c(0,1,2)
i.e. n = 3, and assume f is multiplication, f(x, y) = x * y.
Now we use the custom function expand.grid.unique, which produces the unique combinations of the elements of a vector; in other words, it is similar to the base function expand.grid but returns each unordered pair only once:
expand.grid.unique <- function(x, y, include.equals=FALSE)
{
x <- unique(x)
y <- unique(y)
g <- function(i)
{
z <- setdiff(y, x[seq_len(i-include.equals)])
if(length(z)) cbind(x[i], z, deparse.level=0)
}
do.call(rbind, lapply(seq_along(x), g))
}
In our vector case, calling expand.grid.unique(x, x) produces the following result:
> expand.grid.unique(x,x)
[,1] [,2]
[1,] 0 1
[2,] 0 2
[3,] 1 2
Let's assign the result to two_by_two:
two_by_two <- expand.grid.unique(x,x)
Since f is assumed to be multiplication, we need the sum of products, i.e. the dot product of the first and second columns of two_by_two. For this we can use the %*% operator:
output <- two_by_two[,1] %*% two_by_two[,2]
> output
[,1]
[1,] 2
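For the multiplication case specifically, no pairs need to be built at all: since (x_1 + ... + x_n)^2 equals the sum of squares plus twice the sum of x_i * x_j over i < j, the double sum has a closed form. A minimal sketch (the identity holds only for f(x, y) = x * y; the helper names are illustrative, and the combn() version is a brute-force check):

```r
x <- c(0, 1, 2)

# Closed form: (sum(x)^2 - sum(x^2)) / 2 equals the sum of x[i] * x[j] over i < j,
# computed in O(n) time
pair_sum_closed <- function(x) (sum(x)^2 - sum(x^2)) / 2

# Brute force over all i < j pairs for comparison
pair_sum_brute <- function(x) sum(combn(x, 2, FUN = prod))

pair_sum_closed(x)
#[1] 2
pair_sum_brute(x)
#[1] 2
```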
See ?combn
x <- 0:2
combn(x, 2)
# unique combos
#     [,1] [,2] [,3]
#[1,]    0    0    1
#[2,]    1    2    2
sum(combn(x, 2))
#[1] 6
combn() creates all the unique combinations. If you have a function of each pair that you want to sum, pass it as FUN:
random_f <- function(x){x[1] + 2 * x[2]}
combn(x, 2, FUN = random_f)
#[1] 2 4 5
sum(combn(x, 2, FUN = random_f))
#[1] 11
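If f can be written to work elementwise on vectors, outer() evaluates it on every (x[i], x[j]) pair in a single call, and upper.tri() keeps exactly the i < j cells. A sketch, assuming a vectorised rewrite of random_f (random_f2 is a hypothetical name); note that it allocates an n-by-n matrix, so memory grows quadratically with n.

```r
x <- 0:2

# Vectorised two-argument version of random_f: outer() calls it once with two long vectors
random_f2 <- function(a, b) a + 2 * b

# M[i, j] = random_f2(x[i], x[j]); upper.tri() selects the cells with i < j
M <- outer(x, x, random_f2)
sum(M[upper.tri(M)])
#[1] 11
```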

Apply a function to each combination of columns

I have a data frame with n columns and want to apply a function to each combination of columns. This is very similar to how the cor() function takes a data frame as input and produces a correlation matrix as output, for example:
X <- data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100))
cor(X)
Which will generate this output:
> cor(X)
A B C
A 1.00000000 -0.01199511 0.02337429
B -0.01199511 1.00000000 0.07918920
C 0.02337429 0.07918920 1.00000000
However, I have a custom function that I need to apply to each combination of columns. I am now using a solution that uses nested for loops, which works:
f <- function(x, y) sum((x+y)^2) # some placeholder function
out <- matrix(NA, ncol = ncol(X), nrow = ncol(X)) # pre-allocate
for(i in seq_along(X)) {
for(j in seq_along(X)) {
out[i, j] <- f(X[, i], X[, j]) # apply f() to each combination
}
}
Which produces:
> out
[,1] [,2] [,3]
[1,] 422.4447 207.0833 211.4198
[2,] 207.0833 409.1242 218.2430
[3,] 211.4198 218.2430 397.5321
I am currently trying to transition into the tidyverse and would prefer to avoid using for loops. Could someone show me a tidy solution for this situation? Thanks!
You could do
library(tidyverse)
f <- function(x, y) sum((x+y)^2)
X <- data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100))
as.list(X) %>%
expand.grid(., .) %>%
mutate(out = map2_dbl(Var1, Var2, f)) %>%
as_tibble()
This isn’t a tidyverse solution, but it does avoid for loops. We use RcppAlgos (I am the author) to generate all pairwise permutations of columns with repetition (so each column is also paired with itself) and apply your custom function to each of these. After that, we coerce to a matrix.
set.seed(42)
X <- data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100))
library(RcppAlgos)
matrix(unlist(permuteGeneral(ncol(X), 2, repetition = TRUE, FUN = function(y) {
sum((X[,y[1]] + X[,y[2]])^2)
})), ncol = ncol(X)) # FUN results come back as a list, so unlist() them first
# [,1] [,2] [,3]
# [1,] 429.8549 194.4271 179.4449
# [2,] 194.4271 326.8032 197.2585
# [3,] 179.4449 197.2585 409.6313
Using base R you could do:
set.seed(42)
X <- data.frame(A=rnorm(100), B=rnorm(100), C=rnorm(100))
OUT = diag(colSums((X+X)^2))
OUT[lower.tri(OUT)] = combn(X, 2, function(x) sum(do.call('+', x)^2)) #combn(X,2,function(x)sum(rowSums(x)^2))
OUT[upper.tri(OUT)] = t(OUT)[upper.tri(OUT)] # mirror the lower triangle; t() keeps pairs aligned for any number of columns
OUT
[,1] [,2] [,3]
[1,] 429.8549 194.4271 179.4449
[2,] 194.4271 326.8032 197.2585
[3,] 179.4449 197.2585 409.6313
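Another base R option that returns the full matrix directly is outer() over column indices. Because outer() calls its function once with whole vectors of indices, the scalar helper has to be wrapped in Vectorize(). A sketch; f_ij and out_outer are illustrative names.

```r
set.seed(42)
X <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))
f <- function(x, y) sum((x + y)^2)

# outer() passes vectors of column indices; Vectorize() turns the scalar
# (i, j) -> f(X[[i]], X[[j]]) helper into one that handles those vectors
f_ij <- Vectorize(function(i, j) f(X[[i]], X[[j]]))
out_outer <- outer(seq_along(X), seq_along(X), f_ij)
dimnames(out_outer) <- list(names(X), names(X))
out_outer
```

This evaluates f for every ordered pair, including both (i, j) and (j, i); for a symmetric f like this one, the combn() approach above does roughly half the work.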

How to combine subsequent list elements into a new list in R?

For example: I have a list of matrices, and I would like to evaluate their differences, sort of a 3-D diff. So if I have:
m1 <- matrix(1:4, ncol=2)
m2 <- matrix(5:8, ncol=2)
m3 <- matrix(9:12, ncol=2)
mat.list <- list(m1,m2,m3)
I want to obtain
mat.diff <- list(m2-m1, m3-m2)
The solution I found is the following:
mat.diff <- mapply(function (A,B) B-A, mat.list[-length(mat.list)], mat.list[-1], SIMPLIFY = FALSE)
Is there a nicer/built-in way to do this?
You can do this with just lapply or other ways of looping:
mat.diff <- lapply( tail( seq_along(mat.list), -1 ),
function(i) mat.list[[i]] - mat.list[[ i-1 ]] )
You can use combn to generate the pairs of list indices and apply a function to each combination; non-consecutive pairs return NULL.
combn(seq_along(mat.list), 2, FUN = function(x)
if (diff(x) == 1) ## apply just for consecutive indices
mat.list[[x[2]]] - mat.list[[x[1]]],
simplify = FALSE) ## to get a list
Using @Arun's data, I get:
[[1]]
[,1] [,2]
[1,] 4 4
[2,] 4 4
[[2]]
NULL
[[3]]
[,1] [,2]
[1,] 4 4
[2,] 4 4
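A compact base R sketch pairs each element with its predecessor via Map(), which (unlike mapply() with its default SIMPLIFY = TRUE) always returns a list. It is the same idea as the mapply() call in the question, with the subtraction passed directly as `-` and the argument order swapped so that no wrapper function is needed.

```r
m1 <- matrix(1:4, ncol = 2)
m2 <- matrix(5:8, ncol = 2)
m3 <- matrix(9:12, ncol = 2)
mat.list <- list(m1, m2, m3)

# Element k+1 minus element k, for k = 1, ..., length(mat.list) - 1
mat.diff <- Map(`-`, mat.list[-1], mat.list[-length(mat.list)])
```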

positions of non-NA cells in a matrix

Consider the following matrix,
m <- matrix(letters[c(1,2,NA,3,NA,4,5,6,7,8)], 2, byrow=TRUE)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "a" "b" NA "c" NA
## [2,] "d" "e" "f" "g" "h"
I wish to obtain the column indices corresponding to all non-NA elements, merged with the NA elements immediately following:
result <- c(list(1), list(2:3), list(4:5),
list(1), list(2), list(3), list(4), list(5))
Any ideas?
The column (and row) indices of non-NA elements can be obtained with
which(!is.na(m), TRUE)
A full answer:
Since you want to work row-wise, but R stores matrices column by column, it is easier to work on the transpose of m.
t_m <- t(m)
n_cols <- ncol(m)
We get the array indices as mentioned above, which give the start point of each list.
ind_non_na <- which(!is.na(t_m), TRUE)
Since we are working on the transpose, we want the row indices, and we need to deal with each column separately.
start_points <- split(ind_non_na[, 1], ind_non_na[, 2])
The length of each list is the difference between consecutive starting points or, for the last one, the difference between that point and the end of the row (+1). Then we just call seq to generate each sequence.
unlist(
lapply(
start_points,
function(x)
{
len <- c(diff(x), n_cols - x[length(x)] + 1L)
mapply(seq, x, length.out = len, SIMPLIFY = FALSE)
}
),
recursive = FALSE
)
This will get you close:
cols <- col(m)
cbind(cols[which(is.na(m))-1],cols[is.na(m)])
[,1] [,2]
[1,] 2 3
[2,] 4 5
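Another way to get the same runs, sketched here as an alternative rather than as either answer's method, is to give every non-NA cell a new group number with cumsum() so that the NAs after it share its number, then split the column positions by group. This assumes leading NAs in a row (before any non-NA cell) should be dropped; runs_in_row is a hypothetical helper name.

```r
m <- matrix(letters[c(1, 2, NA, 3, NA, 4, 5, 6, 7, 8)], 2, byrow = TRUE)

# Hypothetical helper: every non-NA cell starts a new group (cumsum of !is.na),
# the NAs after it inherit that group, and leading NAs (group 0) are dropped
runs_in_row <- function(row) {
  grp <- cumsum(!is.na(row))
  keep <- grp > 0
  unname(split(seq_along(row)[keep], grp[keep]))
}

# Process rows in order and flatten the per-row lists into one list
result <- unlist(lapply(seq_len(nrow(m)), function(r) runs_in_row(m[r, ])),
                 recursive = FALSE)
str(result)
```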

Resources