Converting categorical lists into a matrix with ranks - r

Sorry, another basic question.
I have six lists of categorical data, for example:
[1] A,B,C,D
[2] C,D,B,A
In the end, I would like to have a matrix that looks like:
[1],[2]
[A] 1,4
[B] 2,3
[C] 3,1
[D] 4,2
So that each column contains the ranks of the categories in the corresponding list. Thank you very much again!

It is not entirely clear what your input dataset looks like. Assuming it is a list:
l1 <- list(LETTERS[1:4], LETTERS[c(3:4, 2:1)])
library(reshape2)
# melt to long form, then number the elements within each list ('L1' is the list index)
dat1 <- transform(melt(l1), indx = ave(seq_along(value), L1, FUN = seq_along))[, -2]
split(dat1$indx, dat1$value)
# $A
#[1] 1 4
# $B
#[1] 2 3
# $C
#[1] 3 1
# $D
#[1] 4 2
If you need a matrix output
do.call(rbind,split(dat1$indx, dat1$value))
Update
Or you could use mapply
res2 <- mapply(match, rep(list(LETTERS[1:4]),length(l1)), l1)
rownames(res2) <- LETTERS[1:4]
res2
# [,1] [,2]
#A 1 4
#B 2 3
#C 3 1
#D 4 2
Or using sapply, contributed by @alexis_laz:
res3 <- sapply(l1, function(x) match(LETTERS[1:4], x))
rownames(res3) <- rownames(res2)
Or, since each vector here is a permutation of the same sorted categories, simply:
sapply(l1, order)
# [,1] [,2]
#[1,] 1 4
#[2,] 2 3
#[3,] 3 1
#[4,] 4 2
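A hedged caveat on the order() shortcut: it coincides with the match()-based rank lookup only when every list element is a permutation of the same sorted categories; otherwise the two differ:
l2 <- list(c("B", "D"), c("D", "B"))            # not all categories present
sapply(l2, order)                                # positions within each vector
sapply(l2, function(x) match(LETTERS[1:4], x))   # NA where a category is absent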

This should work for you:
a <- toupper(letters[1:4])  # i.e. LETTERS[1:4]
b <- c("C", "D", "B", "A")
n <- length(a)
dfs <- list(a, b)
ranks <- rep(list(seq(n)), length(dfs))
for (i in 1:length(ranks)) {
  names(ranks[[i]]) <- dfs[[i]]                       # label positions with categories
  ranks[[i]] <- ranks[[i]][order(names(ranks[[i]]))]  # sort each list by category
}
sapply(ranks, FUN = function(x) x)  # bind into a matrix, one column per input list

R values that go into matrix multiplication

What is the fastest approach to saving unique values that go into matrix multiplication (without 0)?
For example, if I have a data.table object
library(data.table)
A = data.table(j3=c(3,0,3),j5=c(0,5,5),j7=c(0,7,0),j8=c(8,0,8))
I would like to see which unique values go into A*transpose(A) (or as.matrix(A) %*% as.matrix(t(A))). Right now, I can do it using for loops as:
B <- t(A)
L <- list()
models <- c('A1', 'A2', 'A3')
for (i in 1:nrow(A)) {
  for (j in 1:ncol(B)) {
    u <- union(unlist(A[i, ]), B[, j])
    u <- u[u != 0]  # remove 0
    L[[paste(models[i], models[j])]] <- u
  }
}
However, is there a faster and more RAM-efficient way? The output doesn't have to be a list object; in my case a data.table (or data.frame) works as well. Also, the order of values is not important: 3 5 8 is as good as 5 3 8, 8 5 3, etc.
Any help is appreciated.
EDIT: So as.matrix(A) %*% as.matrix(t(A)) is:
[,1] [,2] [,3]
[1,] 73 0 73
[2,] 0 74 25
[3,] 73 25 98
The first element is calculated as 3*3+0*0+0*0+8*8 = 73, the second as 3*0+0*5+0*7+8*0 = 0, and so on. I need the unique numbers that go into this calculation, but without 0.
Therefore outputs (saved in the list L) are:
> L
$`A1 A1`
[1] 3 8
$`A1 A2`
[1] 3 8 5 7
$`A1 A3`
[1] 3 8 5
$`A2 A1`
[1] 5 7 3 8
$`A2 A2`
[1] 5 7
$`A2 A3`
[1] 5 7 3 8
$`A3 A1`
[1] 3 5 8
$`A3 A2`
[1] 3 5 8 7
$`A3 A3`
[1] 3 5 8
Once again, the output doesn't have to be a list object. I would prefer a data.table if it is doable. Is it possible to rewrite my approach as an Rcpp function?
Potential optimizations
Following up on @user2554330's answer, note that if A is an m-by-n matrix, then AAᵀ = A %*% t(A) (equivalently tcrossprod(A)) is an m-by-m symmetric matrix. AAᵀ[i, j] and AAᵀ[j, i] are computed from the same entries of A, so you only need to inspect m*(m+1)/2 pairs of rows of A, not m*m.
You can do even better by finding and caching the unique elements of each row before pairing them. Preprocessing in this way avoids redundant computation and should noticeably improve performance when m << n.
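As a quick sanity check of the symmetry claim on the question's example (a minimal sketch using only base R):
A <- matrix(c(3, 0, 0, 8,
              0, 5, 7, 0,
              3, 5, 0, 8), nrow = 3, byrow = TRUE)
AAT <- tcrossprod(A)    # same as A %*% t(A)
identical(AAT, t(AAT))  # TRUE, so only one triangle needs to be inspected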
Limitations
Another aspect of the problem is how unique works under the hood. unique has an argument nmax that you can use to specify an expected maximum number of unique elements. From ?duplicated:
Except for factors, logical and raw vectors the default nmax = NA is equivalent to nmax = length(x). Since a hash table of size 8*nmax bytes is allocated, setting nmax suitably can save large amounts of memory. For factors it is automatically set to the smaller of length(x) and the number of levels plus one (for NA). If nmax is set too small there is liable to be an error: nmax = 1 is silently ignored.
Long vectors are supported for the default method of duplicated, but may only be usable if nmax is supplied.
These comments apply to unique as well. Since you have a 300-by-4e+07 matrix, you would be evaluating (with preprocessing):
unique(<4e+07-length vector>), 300 times,
unique(<up to 8e+07-length vector>), 299*300/2 times.
That can consume a lot of memory if you don't know anything about your matrix that might allow you to set nmax. And it can take a long time if you don't have access to many CPUs.
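To make the nmax point concrete, here is a small hedged illustration (the sizes are made up; the memory claim comes from the documentation quoted above):
x <- sample(0:9, 1e6, replace = TRUE)  # many repeats of only 10 distinct values
u1 <- unique(x)              # hash table sized for length(x) elements
u2 <- unique(x, nmax = 10L)  # hash table sized for at most 10 distinct values
identical(sort(u1), sort(u2))  # TRUE; only the memory footprint differs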
So I agree with comments asking you to consider why you need to do this at all and whether your underlying problem has a nicer solution.
Two answers
FWIW, here are two approaches to your general problem that actually take advantage of symmetry. f and g are without and with preprocessing. [[.utri allows you to extract elements from the return value, an m*(m+1)/2-length list, as if it were an m-by-m matrix. as.matrix.utri constructs the full, symmetric m-by-m list matrix.
f <- function(A, nmax = NA) {
  a <- seq_len(nrow(A))
  J <- cbind(sequence(a), rep.int(a, a))
  FUN <- function(i) {
    if (i[1L] == i[2L]) {
      x <- A[i[1L], ]
    } else {
      x <- c(A[i[1L], ], A[i[2L], ])
    }
    unique.default(x[x != 0], nmax = nmax)
  }
  res <- apply(J, 1L, FUN, simplify = FALSE)
  class(res) <- "utri"
  res
}
g <- function(A, nmax = NA) {
  l <- lapply(asplit(A, 1L), function(x) unique.default(x[x != 0], nmax = nmax))
  a <- seq_along(l)
  J <- cbind(sequence(a), rep.int(a, a))
  FUN <- function(i) {
    if (i[1L] == i[2L]) {
      l[[i[1L]]]
    } else {
      unique.default(c(l[[i[1L]]], l[[i[2L]]]))
    }
  }
  res <- apply(J, 1L, FUN, simplify = FALSE)
  class(res) <- "utri"
  res
}
`[[.utri` <- function(x, i, j) {
  stopifnot(length(i) == 1L, length(j) == 1L)
  class(x) <- NULL
  if (i <= j) {
    x[[i + (j * (j - 1L)) %/% 2L]]
  } else {
    x[[j + (i * (i - 1L)) %/% 2L]]
  }
}
as.matrix.utri <- function(x) {
  p <- length(x)
  n <- as.integer(round(0.5 * (-1 + sqrt(1 + 8 * p))))
  i <- rep.int(seq_len(n), n)
  j <- rep.int(seq_len(n), rep.int(n, n))
  r <- i > j
  ir <- i[r]
  i[r] <- j[r]
  j[r] <- ir
  res <- x[i + (j * (j - 1L)) %/% 2L]
  dim(res) <- c(n, n)
  res
}
Here is a simple test on a 4-by-4 integer matrix:
mkA <- function(m, n) {
  A <- sample(0:(n - 1L), size = as.double(m) * n, replace = TRUE,
              prob = rep.int(c(n - 1, 1), c(1L, n - 1L)))
  dim(A) <- c(m, n)
  A
}
set.seed(1L)
A <- mkA(4L, 4L)
A
## [,1] [,2] [,3] [,4]
## [1,] 0 0 2 3
## [2,] 0 1 0 0
## [3,] 2 1 0 3
## [4,] 1 2 0 0
identical(f(A), gA <- g(A))
## [1] TRUE
gA[[1L, 1L]] # used for 'tcrossprod(A)[1L, 1L]'
## [1] 2 3
gA[[1L, 2L]] # used for 'tcrossprod(A)[1L, 2L]'
## [1] 2 3 1
gA[[2L, 1L]] # used for 'tcrossprod(A)[2L, 1L]'
## [1] 2 3 1
gA # under the hood, an 'm*(m+1)/2'-length list
## [[1]]
## [1] 2 3
##
## [[2]]
## [1] 2 3 1
##
## [[3]]
## [1] 1
##
## [[4]]
## [1] 2 3 1
##
## [[5]]
## [1] 1 2 3
##
## [[6]]
## [1] 2 1 3
##
## [[7]]
## [1] 2 3 1
##
## [[8]]
## [1] 1 2
##
## [[9]]
## [1] 2 1 3
##
## [[10]]
## [1] 1 2
##
## attr(,"class")
## [1] "utri"
mgA <- as.matrix(gA) # the full, symmetric, 'm'-by-'m' list matrix
mgA
## [,1] [,2] [,3] [,4]
## [1,] integer,2 integer,3 integer,3 integer,3
## [2,] integer,3 1 integer,3 integer,2
## [3,] integer,3 integer,3 integer,3 integer,3
## [4,] integer,3 integer,2 integer,3 integer,2
mgA[1L, ] # used for first row of 'tcrossprod(A)'
## [[1]]
## [1] 2 3
##
## [[2]]
## [1] 2 3 1
##
## [[3]]
## [1] 2 3 1
##
## [[4]]
## [1] 2 3 1
## If you need names
dimnames(mgA) <- rep.int(list(sprintf("A%d", seq_len(nrow(mgA)))), 2L)
mgA["A1", ]
## $A1
## [1] 2 3
##
## $A2
## [1] 2 3 1
##
## $A3
## [1] 2 3 1
##
## $A4
## [1] 2 3 1
## If you need an 'm'-by-'m' 'data.table' result
DT <- data.table::as.data.table(mgA)
DT
## A1 A2 A3 A4
## 1: 2,3 2,3,1 2,3,1 2,3,1
## 2: 2,3,1 1 1,2,3 1,2
## 3: 2,3,1 1,2,3 2,1,3 2,1,3
## 4: 2,3,1 1,2 2,1,3 1,2
And here are two benchmarks on two large integer matrices, showing that preprocessing can help quite a bit:
set.seed(1L)
A <- mkA(100L, 1e+04L)
microbenchmark::microbenchmark(f(A), g(A), times = 10L, setup = gc(FALSE))
## Unit: milliseconds
## expr min lq mean median uq max neval
## f(A) 2352.0572 2383.3100 2435.7954 2403.8968 2431.6214 2619.553 10
## g(A) 843.0206 852.5757 858.7262 858.2746 863.8239 881.450 10
A <- mkA(100L, 1e+06L)
microbenchmark::microbenchmark(f(A), g(A), times = 10L, setup = gc(FALSE))
## Unit: seconds
## expr min lq mean median uq max neval
## f(A) 290.93327 295.54319 302.57001 301.17810 307.50226 318.14203 10
## g(A) 72.85608 73.83614 76.67941 76.57313 77.78056 83.73388 10
Perhaps we can try this
f <- function(A, models) {
  AA <- replace(A, A == 0, NA)
  setNames(
    c(t(outer(
      1:nrow(A),
      1:nrow(A),
      Vectorize(function(x, y) unique(na.omit(c(t(AA[c(x, y)])))))
    ))),
    t(outer(models, models, paste))
  )
}
which gives
$`A1 A1`
[1] 3 8
$`A1 A2`
[1] 3 8 5 7
$`A1 A3`
[1] 3 8 5
$`A2 A1`
[1] 5 7 3 8
$`A2 A2`
[1] 5 7
$`A2 A3`
[1] 5 7 3 8
$`A3 A1`
[1] 3 5 8
$`A3 A2`
[1] 3 5 8 7
$`A3 A3`
[1] 3 5 8
If you care about the speed, you can try
lst <- asplit(replace(A, A == 0, NA), 1)
mat <- matrix(list(), nrow = nrow(A), ncol = nrow(A))
mat[lower.tri(mat)] <- combn(lst, 2, function(...) unique(na.omit(unlist(...))), simplify = FALSE)
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
diag(mat) <- Map(function(x) unname(x)[!is.na(x)], lst)
L <- c(t(mat))
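One hedged follow-up on the fast version: it leaves L unnamed. If you want the 'A1 A1'-style names of the loop output, they can be attached afterwards (models as defined in the question):
models <- c('A1', 'A2', 'A3')
names(L) <- t(outer(models, models, paste))  # row-major pair labels, matching c(t(mat))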
Thanks for posting the additional information in your edits. From what you posted, it appears that for all pairs of rows of a matrix or data table A, you want the unique non-zero values in those two rows.
To do that efficiently I'd suggest ensuring that A is a matrix. Row indexing in data frames or data tables is a lot slower than in matrices. (Column indexing can be faster, but I doubt it's worth transposing the table to get that.)
Once you have a matrix, A[i, ] is a vector containing the values in row i, and that's a pretty fast calculation. You want the unique non-zero values in c(A[i, ], A[j, ]). The unique function will produce these but won't leave out the zeros. I'd suggest experimenting: depending on the contents of each row, removing the zeros before computing the unique entries could be either faster or slower than computing all the unique values and deleting 0 afterwards.
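For concreteness, the two orders of operations suggested above look like this (a runnable sketch on one pair of example rows):
A <- rbind(c(3, 0, 0, 8), c(0, 5, 7, 0))  # two example rows
x <- c(A[1, ], A[2, ])       # combined values of one pair of rows
u1 <- unique(x[x != 0])      # drop zeros first, then deduplicate
u2 <- setdiff(unique(x), 0)  # deduplicate first, then drop 0
setequal(u1, u2)             # TRUE; only the timing may differ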
You say you want to do this for a few hundred rows, but each row is very long. I'd guess you won't be able to improve much on nested loops: the time will be spent on each entry, not on the loops. However, you could experiment with vectorization using the apply() function, e.g.
nrows <- nrow(A)  # assuming 'A' is your matrix
result <- vector("list", nrows)
for (i in 1:nrows)
  result[[i]] <- apply(A, 1, function(row) setdiff(unique(c(row, A[i, ])), 0))
This will give a list of lists; if you want to examine entry i, j, you can use result[[c(i,j)]].

Function on all column combinations of n*m matrix yielding a m*m result

Say I have a function which takes two vectors and returns a single integer, for instance the number of elements in one vector that are also in the other, like:
f <- function(v1, v2) sum(v1 %in% v2)
How can I apply that function to all pairwise combinations of the m columns of an n*m matrix?
set.seed(1)
m <- replicate(3, sample(letters[1:10], size = 5))
dimnames(m) <- list(NULL, paste0('c', 1:ncol(m)))
Now,
> m
[,1] [,2] [,3]
[1,] "c" "i" "c"
[2,] "d" "j" "b"
[3,] "e" "f" "f"
[4,] "g" "e" "j"
[5,] "b" "a" "e"
And applying the function to the first two columns:
> f(m[,1], m[,2])
[1] 1 #'e' is shared.
How to do that for all column combinations? The result could be an m*m matrix (with results symmetric around the diagonal) or, alternatively, a long-format data frame with columns for v1, v2 and the function's result (e.g. the first row would be c1, c2 and 1).
I tried to investigate the functions outer and expand.grid but could not find a solution.
sapply(1:3, function(i) sapply(1:3, function(j) f(m[,i], m[,j])))
# [,1] [,2] [,3]
#[1,] 5 1 3
#[2,] 1 5 3
#[3,] 3 3 5
Or the following output might be friendlier
sapply(data.frame(m), function(x1) sapply(data.frame(m), function(x2) f(x1, x2)))
# c1 c2 c3
#c1 5 1 3
#c2 1 5 3
#c3 3 3 5
Using expand.grid to get all combinations, then looping through the pairs to get the length of each intersection:
myComb <- expand.grid(colnames(m), colnames(m))
myComb$N <- apply(myComb, 1, function(i) {
  length(intersect(m[, i[1]], m[, i[2]]))
  # or use your own function
  # f(m[, i[1]], m[, i[2]])
})
myComb
# Var1 Var2 N
# 1 c1 c1 5
# 2 c2 c1 1
# 3 c3 c1 3
# 4 c1 c2 1
# 5 c2 c2 5
# 6 c3 c2 3
# 7 c1 c3 3
# 8 c2 c3 3
# 9 c3 c3 5
We can do this with outer
f1 <- function(x, y) length(intersect(m[,x], m[,y]))
res <- outer(colnames(m), colnames(m), FUN = Vectorize(f1))
dimnames(res) <- list(colnames(m), colnames(m))
res
# c1 c2 c3
#c1 5 1 3
#c2 1 5 3
#c3 3 3 5
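Since f is symmetric for these columns (each holds distinct values), you could also compute each unordered pair only once with combn and mirror the result; a hedged sketch (res2 is my own name):
res2 <- matrix(NA_integer_, ncol(m), ncol(m),
               dimnames = list(colnames(m), colnames(m)))
res2[lower.tri(res2)] <- combn(ncol(m), 2, function(ij) f(m[, ij[1]], m[, ij[2]]))
res2[upper.tri(res2)] <- t(res2)[upper.tri(res2)]  # mirror the lower triangle
diag(res2) <- nrow(m)  # a column shares all of its elements with itself
res2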
A double loop also works. The only change is that I converted m to a data frame M:
f <- function(v1, v2) sum(v1 %in% v2)
set.seed(1)  # leads to different m values than you posted
m <- replicate(3, sample(letters[1:10], size = 5))
dimnames(m) <- list(NULL, paste0('c', 1:ncol(m)))
# Convert m to a data frame M
M <- as.data.frame(m)
# Initialize a data frame of answers
df <- data.frame(matrix(ncol = 3, nrow = ncol(M)))
# Loop and collect answers
row <- 1
for (i in 1:(ncol(M) - 1)) {
  for (j in 1:(ncol(M) - i)) {
    df[row, 1] <- names(M)[i]
    df[row, 2] <- names(M)[i + j]
    df[row, 3] <- f(M[, i], M[, i + j])
    row <- row + 1
  }
}
df
X1 X2 X3
1 c1 c2 1
2 c1 c3 3
3 c2 c3 3

Generalize R %in% operator to match tuples

I spent a while the other day looking for a way to check if a row vector is contained in some set of row vectors in R. Basically, I want to generalize the %in% operator to match a tuple instead of each entry in a vector. For example, I want:
row.vec = c("A", 3)
row.vec
# [1] "A" "3"
data.set = rbind(c("A",1),c("B",3),c("C",2))
data.set
# [,1] [,2]
# [1,] "A" "1"
# [2,] "B" "3"
# [3,] "C" "2"
row.vec %tuple.in% data.set
# [1] FALSE
for my made-up operator %tuple.in% because the row vector c("A",3) is not a row vector in data.set. Using the %in% operator gives:
row.vec %in% data.set
# [1] TRUE TRUE
because "A" and 3 are in data.set, which is not what I want.
I have two questions. First, are there any good existing solutions to this?
Second, since I couldn't find them (even if they exist), I tried to write my own function to do it. It works for an input matrix of row vectors, but I'm wondering whether any experts can suggest improvements:
is.tuple.in <- function(matrix1, matrix2) {
  # Apply rbind() so that matrix1 has columns even if it is a row vector.
  matrix1 <- rbind(matrix1)
  if (ncol(matrix1) != ncol(matrix2)) {
    stop("Matrices must have the same number of columns.")
  }
  # Now check for the first row and handle other rows recursively
  row.vec <- matrix1[1, ]
  tuple.found <- FALSE
  for (i in 1:nrow(matrix2)) {
    # If we find a match, this row exists in matrix2 and we can break the loop
    if (all(row.vec == matrix2[i, ])) {
      tuple.found <- TRUE
      break
    }
  }
  # If there are more rows to be checked, use a recursive call
  if (nrow(matrix1) > 1) {
    return(c(tuple.found, is.tuple.in(matrix1[2:nrow(matrix1), ], matrix2)))
  } else {
    return(tuple.found)
  }
}
I see a couple of problems with it that I'm not sure how to fix. First, I'd like the base case to be clear at the start of the function. I didn't manage that because I pass matrix1[2:nrow(matrix1), ] in the recursive call, which produces an error if matrix1 has one row. So instead of reaching a case where matrix1 is empty, I have an if condition at the end deciding whether more iterations are necessary.
Second, I think the use of rbind() at the start is sloppy, but I needed it for when matrix1 has been reduced to a single row. Without rbind(), ncol(matrix1) produced an error in the 1-row case. I figure my trouble here has to do with a lack of knowledge about R data types.
Any help would be appreciated.
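For what it's worth, here is a hedged, non-recursive variant of the same idea (rows_in is my own name); looping over rows sidesteps the base-case and single-row issues, which are otherwise usually cured by indexing with drop = FALSE:
rows_in <- function(matrix1, matrix2) {
  matrix1 <- rbind(matrix1)  # promote a bare vector to a one-row matrix
  if (ncol(matrix1) != ncol(matrix2))
    stop("Matrices must have the same number of columns.")
  vapply(seq_len(nrow(matrix1)), function(i) {
    target <- matrix1[i, ]
    any(apply(matrix2, 1, function(r) all(r == target)))
  }, logical(1))
}
rows_in(c("A", 3), rbind(c("A", 1), c("B", 3), c("C", 2)))  # FALSE
rows_in(c("B", 3), rbind(c("A", 1), c("B", 3), c("C", 2)))  # TRUE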
I'm wondering if you have made this a bit more complicated than it needs to be. For example,
set.seed(1618)
vec <- c(1,3)
mat <- matrix(rpois(1000,3), ncol = 2)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# gives me this
# [,1] [,2]
# 6 3 1
# 38 3 1
# 39 3 1
# 85 1 3
# 88 1 3
# 89 1 3
# 95 3 1
# 113 1 3
# ...
you could subset this further if you care about the order
or you could modify the function slightly:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2]
# 85 1 3
# 88 1 3
# 89 1 3
# 113 1 3
# 133 1 3
# 139 1 3
# 187 1 3
# ...
another example with a longer vector
set.seed(1618)
vec <- c(1,4,5,2)
mat <- matrix(rpois(10000, 3), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# [,1] [,2] [,3] [,4]
# 57 2 5 1 4
# 147 1 5 2 4
# 279 1 2 5 4
# 303 1 5 2 4
# 437 1 5 4 2
# 443 1 4 5 2
# 580 5 4 2 1
# ...
I see a couple that match:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2] [,3] [,4]
# 443 1 4 5 2
# 901 1 4 5 2
# 1047 1 4 5 2
but only three
for your single row case:
vec <- c(1,4,5,2)
mat <- matrix(c(1,4,5,2), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [1] 1 4 5 2
here is a simple function with the above code
is.tuplein <- function(vec, mat, exact = TRUE) {
  rownames(mat) <- 1:nrow(mat)
  if (exact)
    tmp <- mat[sapply(1:nrow(mat), function(x)
      all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
  else tmp <- mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
  return(tmp)
}
is.tuplein(vec = vec, mat = mat)
# [1] 1 4 5 2
seems to work, so let's make our own %in% operator:
`%tuple%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = TRUE)
`%tuple1%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = FALSE)
and try her out
set.seed(1618)
c(1,2,3) %tuple% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 133 1 2 3
# 190 1 2 3
# 321 1 2 3
set.seed(1618)
c(1,2,3) %tuple1% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 48 2 3 1
# 64 2 3 1
# 71 1 3 2
# 73 3 1 2
# 108 3 1 2
# 112 1 3 2
# 133 1 2 3
# 166 2 1 3
Does this do what you want (even for more than 2 columns)?
paste(row.vec, collapse = "_") %in% apply(data.set, 1, paste, collapse = "_")
Note that the "_" separator matters: collapsing with "" could produce false matches, e.g. c(1, 23) and c(12, 3) both collapse to "123".

Multiplying Combinations of a list of lists in R

Given a list of two lists, I am trying to obtain, without using for loops, a list of all element-wise products of the first list with the second. For example:
> a <- list(c(1,2), c(2,3), c(4,5))
> b <- list(c(1,3), c(3,4), c(6,2))
> c <- list(a, b)
The function should return a list with 9 entries, each of size two. For example,
> answer
[[1]]
[1] 1 6
[[2]]
[1] 3 8
[[3]]
[1] 6 4
[[4]]
[1] 2 9
[[5]]
[1] 6 12
etc...
Any suggestions would be much appreciated!
A fast (but memory-intensive) way would be to use the mechanism of mapply in combination with argument recycling, something like this:
mapply(`*`,a,rep(b,each=length(a)))
Gives :
> mapply(`*`,a,rep(b,each=length(a)))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 2 4 3 6 12 6 12 24
[2,] 6 9 15 8 12 20 4 6 10
Or replace a with c[[1]] and b with c[[2]] to obtain the same. To get a list, set the argument SIMPLIFY = FALSE.
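A hedged addendum: with SIMPLIFY = FALSE and the recycling flipped (rep over a instead of b), the same mechanism returns the 9-entry list in exactly the order shown in the question:
answer <- mapply(`*`, rep(a, each = length(b)), b, SIMPLIFY = FALSE)
answer[1:2]  # list(c(1, 6), c(3, 8))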
I have no idea if this is fast or memory-intensive, just that it works; Joris Meys's answer is more elegant:
x <- expand.grid(1:length(a), 1:length(b))
x <- x[order(x$Var1), ]  # gives the order you asked for
FUN <- function(i) diag(outer(a[[x[i, 1]]], b[[x[i, 2]]], "*"))
sapply(1:nrow(x), FUN)  # I like this output
lapply(1:nrow(x), FUN)  # this one matches what you asked for
EDIT: Now that Brian has introduced benchmarking (which I love (LINK)), I have to respond. I actually have a faster answer using what I call expand.grid2, a lighter-weight version of the original that I stole from HERE. I was going to post it before, but when I saw how fast Joris's is, I figured why bother: his is short, sweet, and fast. But now that Diggs has dug in, I figured I'd put up expand.grid2 for educational purposes.
expand.grid2 <- function(seq1, seq2) {
  cbind(Var1 = rep.int(seq1, length(seq2)),
        Var2 = rep.int(seq2, rep.int(length(seq1), length(seq2))))
}
x <- expand.grid2(1:length(a), 1:length(b))
x <- x[order(x[, 'Var1']), ]  # gives the order you asked for
FUN <- function(i) diag(outer(a[[x[i, 1]]], b[[x[i, 2]]], "*"))
lapply(1:nrow(x), FUN)
Here are the results (same labeling as Brian's, except TylerEG2 uses expand.grid2):
Unit: microseconds
expr min lq median uq max
1 DiggsL(a, b) 5102.296 5307.816 5471.578 5887.516 70965.58
2 DiggsM(a, b) 384.912 428.769 443.466 461.428 36213.89
3 Joris(a, b) 91.446 105.210 123.172 130.171 16833.47
4 TylerEG2(a, b) 392.377 425.503 438.100 453.263 32208.94
5 TylerL(a, b) 1752.398 1808.852 1847.577 1975.880 49214.10
6 TylerM(a, b) 1827.515 1888.867 1925.959 2090.421 75766.01
7 Wojciech(a, b) 1719.740 1771.760 1807.686 1924.325 81666.12
And if I take the ordering step out I can squeak out even more but it still isn't close to Joris's answer.
Pulling ideas from the other answers together, I'll throw another one-liner in for fun:
do.call(mapply, c(FUN=`*`, as.list(expand.grid(b, a))))
which gives
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 3 6 2 6 12 4 12 24
[2,] 6 8 4 9 12 6 15 20 10
If you really need it in the format you gave, then you can use the plyr library to transform it into that:
library("plyr")
as.list(unname(alply(do.call(mapply, c(FUN=`*`, as.list(expand.grid(b, a)))), 2)))
which gives
[[1]]
[1] 1 6
[[2]]
[1] 3 8
[[3]]
[1] 6 4
[[4]]
[1] 2 9
[[5]]
[1] 6 12
[[6]]
[1] 12 6
[[7]]
[1] 4 15
[[8]]
[1] 12 20
[[9]]
[1] 24 10
Just for fun, benchmarking:
Joris <- function(a, b) {
  mapply(`*`, a, rep(b, each = length(a)))
}
TylerM <- function(a, b) {
  x <- expand.grid(1:length(a), 1:length(b))
  x <- x[order(x$Var1), ]  # gives the order you asked for
  FUN <- function(i) diag(outer(a[[x[i, 1]]], b[[x[i, 2]]], "*"))
  sapply(1:nrow(x), FUN)
}
TylerL <- function(a, b) {
  x <- expand.grid(1:length(a), 1:length(b))
  x <- x[order(x$Var1), ]  # gives the order you asked for
  FUN <- function(i) diag(outer(a[[x[i, 1]]], b[[x[i, 2]]], "*"))
  lapply(1:nrow(x), FUN)
}
Wojciech <- function(a, b) {
  # Matrix with indices of elements to multiply
  G <- expand.grid(1:3, 1:3)
  # Conversion of G to a list
  L <- lapply(1:nrow(G), function(x, d = G) d[x, ])
  lapply(L, function(i, x = a, y = b) x[[i[[2]]]] * y[[i[[1]]]])
}
DiggsM <- function(a, b) {
  do.call(mapply, c(FUN = `*`, as.list(expand.grid(b, a))))
}
DiggsL <- function(a, b) {
  as.list(unname(alply(t(do.call(mapply, c(FUN = `*`, as.list(expand.grid(b, a))))), 1)))
}
and the benchmarks
> library("rbenchmark")
> benchmark(Joris(b,a),
+ TylerM(a,b),
+ TylerL(a,b),
+ Wojciech(a,b),
+ DiggsM(a,b),
+ DiggsL(a,b),
+ order = "relative",
+ replications = 1000,
+ columns = c("test", "elapsed", "relative"))
test elapsed relative
1 Joris(b, a) 0.08 1.000
5 DiggsM(a, b) 0.26 3.250
4 Wojciech(a, b) 1.34 16.750
3 TylerL(a, b) 1.36 17.000
2 TylerM(a, b) 1.40 17.500
6 DiggsL(a, b) 3.49 43.625
and to show they are equivalent:
> identical(Joris(b,a), TylerM(a,b))
[1] TRUE
> identical(Joris(b,a), DiggsM(a,b))
[1] TRUE
> identical(TylerL(a,b), Wojciech(a,b))
[1] TRUE
> identical(TylerL(a,b), DiggsL(a,b))
[1] TRUE
# Your data
a <- list(c(1, 2), c(2, 3), c(4, 5))
b <- list(c(1, 3), c(3, 4), c(6, 2))
# Matrix with indices of elements to multiply
G <- expand.grid(1:3, 1:3)
# Conversion of G to a list
L <- lapply(1:nrow(G), function(x, d = G) d[x, ])
lapply(L, function(i, x = a, y = b) x[[i[[2]]]] * y[[i[[1]]]])

in R, how to retrieve a complete matrix using combn?

My problem, stripped of its specific purpose, looks like this:
how do I transform a combination like the following?
First use combn(letters[1:4], 2) to calculate the combinations:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "a" "a" "a" "b" "b" "c"
[2,] "b" "c" "d" "c" "d" "d"
Then use each column to obtain another data frame:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
Each element is obtained from the corresponding column; for example, the first element comes from the first column above.
How can I then transform the above data frame into a matrix, for example result, like:
a b c d
a 0 1 2 3
b 1 0 4 5
c 2 4 0 6
d 3 5 6 0
Entries with the same row and column name have value zero; the other entries take the corresponding values above.
Here is one way that works:
inputs <- letters[1:4]
combs <- combn(inputs, 2)
N <- seq_len(ncol(combs))
nams <- unique(as.vector(combs))
out <- matrix(ncol = length(nams), nrow = length(nams))
out[lower.tri(out)] <- N
out <- t(out)
out[lower.tri(out)] <- N
out <- t(out)
diag(out) <- 0
rownames(out) <- colnames(out) <- inputs
Which gives:
> out
a b c d
a 0 1 2 3
b 1 0 4 5
c 2 4 0 6
d 3 5 6 0
If I had to do this a lot, I'd wrap those function calls into a function.
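Such a wrapper might look like this (a sketch; combn_matrix is my own name, and the final transpose from above is dropped because the matrix is already symmetric at that point):
combn_matrix <- function(inputs) {
  N <- seq_len(ncol(combn(inputs, 2)))
  n <- length(inputs)
  out <- matrix(0, n, n, dimnames = list(inputs, inputs))
  out[lower.tri(out)] <- N  # fill the lower triangle in combn order
  out <- t(out)             # move it to the upper triangle
  out[lower.tri(out)] <- N  # mirror into the lower triangle
  out
}
combn_matrix(letters[1:4])  # same matrix as 'out' above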
Another option is to use as.matrix.dist() to do the conversion for us by setting up a "dist" object by hand. Using some of the objects from earlier:
## Far easier
out2 <- N
class(out2) <- "dist"
attr(out2, "Labels") <- as.character(inputs)
attr(out2, "Size") <- length(inputs)
attr(out2, "Diag") <- attr(out2, "Upper") <- FALSE
out2 <- as.matrix(out2)
Which gives:
> out2
a b c d
a 0 1 2 3
b 1 0 4 5
c 2 4 0 6
d 3 5 6 0
Again, I'd wrap this in a function if I had to do it more than once.
Does it have to be a mirror matrix with zeros on the diagonal?
combo <- combn(letters[1:4], 2)
in.combo <- matrix(1:6, nrow = 1)
combo <- rbind(combo, in.combo)
out.combo <- matrix(rep(NA, 16), ncol = 4)
colnames(out.combo) <- letters[1:4]
rownames(out.combo) <- letters[1:4]
for (cols in 1:ncol(combo)) {
  vec1 <- combo[, cols]
  out.combo[vec1[1], vec1[2]] <- as.numeric(vec1[3])
}
> out.combo
a b c d
a NA 1 2 3
b NA NA 4 5
c NA NA NA 6
d NA NA NA NA
