Looping a matrix - vector multiplication, whose elements change with every loop - r

I have a dataset with very large matrices and vectors. I would like to multiply a matrix with a vector that has one "1" element with the rest being zero. I would like to loop this calculation so that every possible position of the 1 within the vector has been multiplied with the matrix, and then store the results in a vector.
I give an example of what I'm trying to do.
I have two matrices, a and b:
a <- matrix(1:16, nrow = 4, byrow = TRUE)
b <- matrix(17:32, nrow = 4, byrow = TRUE)
and a vector with 1's and 0's. As I don't know how to loop well yet, I write down the code for each combination:
c1 <- rep(0, times = 4)
c1[1] <- 1
c2 <- rep(0, times = 4)
c2[2] <- 1
c3 <- rep(0, times = 4)
c3[3] <- 1
c4 <- rep(0, times = 4)
c4[4] <- 1
I multiply a with each of the vector combinations c, diagonalize it, multiply this with b and sum each row and column. I then store this result in a vector results:
d1 <- sum(colSums(b %*% diag(as.vector(a %*% c1), nrow = 4)))
d2 <- sum(colSums(b %*% diag(as.vector(a %*% c2), nrow = 4)))
d3 <- sum(colSums(b %*% diag(as.vector(a %*% c3), nrow = 4)))
d4 <- sum(colSums(b %*% diag(as.vector(a %*% c4), nrow = 4)))
results <- cbind(d1, d2, d3, d4)
that gives:
d1 d2 d3 d4
[1,] 2824 3216 3608 4000
Is there a good line of code that does this more efficiently than what I did here?

Because of the special structure of your calculation you can shorten it to
a <- matrix(1:16, nrow = 4, byrow = TRUE)
b <- matrix(17:32, nrow = 4, byrow = TRUE)
results <- apply(a, 2, function(ai) sum(b %*% ai))
# [1] 2824 3216 3608 4000
or even shorter
colSums(b %*% a)
# [1] 2824 3216 3608 4000
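Why the shortcut works: a %*% ck just picks column k of a, multiplying b by diag(v) scales the columns of b by v, and summing every entry therefore gives sum(colSums(b) * v). The sketch below checks this against an explicit loop over the unit vectors; the names loop_res and short_res are only illustrative.

```r
a <- matrix(1:16, nrow = 4, byrow = TRUE)
b <- matrix(17:32, nrow = 4, byrow = TRUE)

# Loop version of the original calculation, one unit vector per iteration
loop_res <- sapply(seq_len(ncol(a)), function(k) {
  ck <- replace(numeric(ncol(a)), k, 1)  # unit vector with a 1 in position k
  sum(b %*% diag(as.vector(a %*% ck), nrow = nrow(a)))
})

# Closed form: column sums of b, weighted by each column of a
short_res <- as.vector(colSums(b) %*% a)

all.equal(loop_res, short_res)
# [1] TRUE
```

For very large matrices the closed form also avoids building any diagonal matrix, since only colSums(b) and one vector-matrix product are needed.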

Related

How to calculate the number of common neighbours for all edges in a graph?

I have a network defined by a list of edges. The network is large and sparse. For each pair of connected vertices, I would like to calculate the number of common neighbours. This post discusses how to do this for a single pair of vertices, but it strikes me as inefficient to loop over all edges to calculate this statistic for each edge in the graph. Instead, the statistic I'm after can be calculated from the product of the adjacency matrix with itself, as follows:
library(igraph)
library(data.table)
set.seed(1111)
E <- data.table(i = sample(as.character(1:5e4), 1e5, replace = TRUE),
j = sample(as.character(1:5e4), 1e5, replace = TRUE))
G <- simplify(graph_from_data_frame(E, directed = FALSE)) # remove loops and multiples
N <- as_adjacency_matrix(G) %*% as_adjacency_matrix(G)
However, I don't know how to efficiently get the information out of the resulting matrix N, without looping over all the cells, which would look like this:
extract_entries <- function(x, M) {
nl <- M@p[x] + 1 # index from 1, not 0
nu <- M@p[x+1]
j.col <- M@Dimnames[[1]][M@i[nl:nu] + 1]
i.col <- M@Dimnames[[2]][x]
nb.col <- M@x[nl:nu]
data.table(i = i.col, j = j.col, nb = nb.col)
}
system.time(E.nb <- rbindlist(lapply(1:N@Dim[1], extract_entries, N), fill = T))
# user system elapsed
# 8.29 0.02 8.31
E <- E.nb[E, on = c('i', 'j')][is.na(nb), nb := 0]
Even in the reproducible example above, looping is slow, and the true graph might have millions of vertices and tens of millions of edges. My end goal is to add a column to the data frame E with the number of common neighbours for each edge, as illustrated in the MWE.
My question is: is there a (much) more efficient way of extracting the number of common neighbours for each pair of vertices and merging this information back into the list of edges?
I have seen that the package DiagrammeR includes a function that calculates the number of common neighbours; however, it again appears intended to be used for a limited number of edges, and wouldn't solve the problem of adding the information on the number of common neighbours back to the original data frame.
You're pretty much there. Just a couple things: converting N to a triangular matrix lets us build E.nb without lapply. Also, the i-j ordering is messing up the E.nb[E join. Temporarily sorting each row fixes this.
I've also included a function that uses the igraph triangles function instead of squaring the adjacency matrix, which is a bit faster on this example dataset.
library(igraph)
library(data.table)
library(Matrix)
set.seed(1111)
E <- data.table(i = sample(as.character(1:5e4), 1e5, replace = TRUE),
j = sample(as.character(1:5e4), 1e5, replace = TRUE))
f1 <- function(E) {
blnSort <- E[[1]] > E[[2]]
E[blnSort, 2:1 := .SD, .SDcols = 1:2]
G <- simplify(graph_from_data_frame(E, directed = FALSE)) # remove loops and multiples
N <- as(tril(as_adjacency_matrix(G) %*% as_adjacency_matrix(G), -1), "dtTMatrix")
data.table(
i = N@Dimnames[[1]][N@i + 1],
j = N@Dimnames[[2]][N@j + 1],
nb = as.integer(N@x)
)[
i > j, 2:1 := .(i, j)
][
E, on = .(i, j)
][
is.na(nb), nb := 0L
][
blnSort, 2:1 := .(i, j)
]
}
f2 <- function(E) {
blnSort <- E[[1]] > E[[2]]
E[blnSort, 2:1 := .SD, .SDcols = 1:2]
G <- simplify(graph_from_data_frame(E, directed = FALSE)) # remove loops and multiples
tri <- matrix(triangles(G), ncol = 3, byrow = TRUE)
data.table(
i = names(V(G)[tri[,c(1, 1, 2)]]),
j = names(V(G)[tri[,c(2, 3, 3)]])
)[
i > j, 2:1 := .(i, j)
][
,.(nb = .N), .(i, j)
][
E, on = .(i, j)
][
is.na(nb), nb := 0L
][
blnSort, 2:1 := .(i, j)
]
}
microbenchmark::microbenchmark(f1 = f1(EE),
f2 = f2(EE),
setup = {EE <- copy(E)})
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> f1 257.4803 281.8928 325.0267 303.4441 370.4977 478.7524 100
#> f2 123.5213 139.5152 169.3914 151.3065 190.7800 284.2644 100
identical(f1(copy(E)), f2(copy(E)))
#> [1] TRUE
I think you can simply use a two-column matrix for indexing. Since you are using character vertex names, which become the row/column names of the sparse matrix N, we have to use match to find the corresponding indices:
cbind(df, nb = N[matrix(match(as.matrix(df), colnames(N)), ncol = 2)])
Example
Given an edge list df and resulting sparse matrix N
df <- data.frame(
from = letters[c(1, 1, 2, 2, 6, 6, 6, 1, 1, 1)],
to = letters[c(2, 4, 3, 5, 3, 2, 5, 5, 6, 3)]
)
g <- graph_from_data_frame(df, directed = FALSE)
m <- get.adjacency(g)
N <- m %*% m
by running the solution above, we will obtain
> cbind(df, nb = N[matrix(match(as.matrix(df), colnames(N)), ncol = 2)])
from to nb
1 a b 3
2 a d 0
3 b c 2
4 b e 2
5 f c 2
6 f b 3
7 f e 2
8 a e 2
9 a f 3
10 a c 2
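The same idea can be sketched in base R, without igraph, to show why entry (i, j) of the squared adjacency matrix counts common neighbours and how a two-column index matrix pulls out one value per edge. The names v, A, idx and nb are only illustrative, and a dense matrix is used here purely for readability; a graph of realistic size would need the sparse version.

```r
df <- data.frame(
  from = letters[c(1, 1, 2, 2, 6, 6, 6, 1, 1, 1)],
  to   = letters[c(2, 4, 3, 5, 3, 2, 5, 5, 6, 3)],
  stringsAsFactors = FALSE
)

v <- sort(unique(unlist(df)))                    # vertex names
A <- matrix(0, length(v), length(v), dimnames = list(v, v))
idx <- cbind(match(df$from, v), match(df$to, v)) # one (row, col) pair per edge
A[idx] <- 1                                      # undirected: fill both triangles
A[idx[, 2:1]] <- 1

N <- A %*% A    # N[i, j] = number of vertices adjacent to both i and j
nb <- N[idx]    # two-column matrix indexing: one value per edge
cbind(df, nb)
```

Each row of idx is treated as a single (row, column) coordinate, so N[idx] returns a vector with one entry per edge rather than a submatrix.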

Multiplying a 5X3 matrix and 3X1 vector in R

I am starting to learn R and am trying to multiply a 5x3 matrix by a 3x1 column vector. However, when I create a new variable to hold the result, R throws the error "non-conformable arrays". Can someone please point out the mistake in the code below?
# 5x3 matrix
X <- matrix(c(1,25.5,1.23,1,40.8,1.89,1,30.2,1.55,1,4.3,1.18,1,10.7,1.68),nrow=5,ncol=3,byrow=TRUE)
# 3x1 column vector
b1 <- matrix(c(23,0.1,-8), nrow = 3, ncol = 1, byrow = TRUE)
v1 <- X * b1
v1
Appreciate your help :)
You need the matrix-multiplication operator %*%:
X <- matrix(c(1,25.5,1.23,1,40.8,1.89,1,30.2,1.55,1,4.3,1.18,1,10.7,1.68),nrow=5,ncol=3,byrow=TRUE)
b1 <- matrix(c(23,0.1,-8), nrow = 3, ncol = 1, byrow = TRUE)
v1 <- X %*% b1
v1
#> [,1]
#> [1,] 15.71
#> [2,] 11.96
#> [3,] 13.62
#> [4,] 13.99
#> [5,] 10.63
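To see why the original attempt fails, note that * is element-wise and requires both arrays to have identical dimensions, while %*% only requires the number of columns of the left operand to equal the number of rows of the right one. A small sketch of that check (the variable res is only illustrative):

```r
X <- matrix(c(1,25.5,1.23,1,40.8,1.89,1,30.2,1.55,1,4.3,1.18,1,10.7,1.68),
            nrow = 5, ncol = 3, byrow = TRUE)
b1 <- matrix(c(23, 0.1, -8), nrow = 3, ncol = 1)

dim(X)    # 5 3
dim(b1)   # 3 1

# Element-wise product: dimensions differ, so R raises an error
res <- tryCatch(X * b1, error = function(e) conditionMessage(e))
res

# Matrix product: inner dimensions agree (3 and 3), result is 5 x 1
ncol(X) == nrow(b1)
v1 <- X %*% b1
dim(v1)   # 5 1
```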
Normally one would use the first alternative below but the others are possible too. The first four alternatives below give a column vector as the result while the others give a plain vector without dimensions. The first three work even if b1 has more than one column. The remainder assume b1 has one column but could be generalized.
X %*% b1
crossprod(t(X), b1)
library(einsum)
einsum("ij,jk -> ik", X, b1)
out <- matrix(0, nrow(X), ncol(b1))
for(i in 1:nrow(X)) {
for(k in 1:ncol(X)) out[i] <- out[i] + X[i, k] * b1[k, 1]
}
out
colSums(t(X) * c(b1))
apply(X, 1, crossprod, b1)
sapply(1:nrow(X), function(i) sum(X[i, ] * b1))
rowSums(mapply(`*`, as.data.frame(X), b1))
rowSums(sapply(1:ncol(X), function(j) X[, j] * b1[j]))
X[, 1] * b1[1, 1] + X[, 2] * b1[2, 1] + X[, 3] * b1[3, 1]
Note
The input shown in the question is:
X <- matrix(c(1,25.5,1.23,1,40.8,1.89,1,30.2,1.55,1,4.3,1.18,1,10.7,1.68),nrow=5,ncol=3,byrow=TRUE)
b1 <- matrix(c(23,0.1,-8), nrow = 3, ncol = 1, byrow = TRUE)

Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero

In an earlier question (R: Logical Conditions Not Being Respected), I learned how to make the following simulation :
Step 1: Keep generating two random numbers "a" and "b" until both "a" and "b" are greater than 12
Step 2: Track how many pairs of random numbers had to be generated before Step 1 was completed
Step 3: Repeat Step 1 and Step 2 100 times
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- 1
while(a < 12 | b < 12) {
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- i + 1
}
x <- c(a,b,i)
res <- rbind(res, x)
}
head(res)
[,1] [,2] [,3]
x 12.14232 12.08977 399
x 12.27158 12.01319 1695
x 12.57345 12.42135 302
x 12.07494 12.64841 600
x 12.03210 12.07949 82
x 12.34006 12.00365 782
Question: Now, I am trying to make a slight modification to the above code - Instead of "a" and "b" being produced separately, I want them to be produced "together" (in math terms: "a" and "b" were being produced from two independent univariate normal distributions, now I want them to come from a bivariate normal distribution).
I tried to modify this code myself:
library(MASS)
Sigma = matrix(
c(1,0.5, 0.5, 1), # the data elements
nrow=2, # number of rows
ncol=2, # number of columns
byrow = TRUE) # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
e_i$i <- 1
while(e_i$X1 < 12 | e_i$X2 < 12) {
e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
e_i$i <- i + 1
}
x <- c(e_i$X1, e_i$X2 ,i)
res <- rbind(res, x)
}
res = data.frame(res)
But this is producing the following error:
Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero
If I understand your code correctly you are trying to see how many samples occur before both values are >=12 and doing that for 100 trials? This is the approach I would take:
library(MASS)
for(i in 1:100){
n <- 1
while(any((x <- mvrnorm(1, mu=c(10,10), Sigma=diag(0.5, nrow=2)+0.5))<12)) n <- n+1
if(i==1) res <- data.frame("a"=x[1], "b"=x[2], n)
else res <- rbind(res, data.frame("a"=x[1], "b"=x[2], n))
}
Here I am assigning the result of mvrnorm() to x within the while() condition. That same condition uses any() to check whether either value is less than 12. If it evaluates to TRUE, n (the counter) is increased and the process is repeated. Once it evaluates to FALSE, the values are appended to the data.frame and execution returns to the start of the for loop.
Regarding your code, the mvrnorm() function is returning a vector, not a matrix, when n=1 so both values go into a single variable in the data.frame:
data.frame(mvrnorm(n = 1, c(10,10), Sigma))
Returns:
mvrnorm.n...1..c.10..10...Sigma.
1 9.148089
2 10.605546
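A short sketch of how this leads to the "argument is of length zero" error: with a single unnamed column there is no X1 or X2, so e_i$X1 is NULL and the comparison has length zero. The variable names draw and e_ok are only illustrative, and the column is named draw here rather than the long auto-generated name shown above.

```r
library(MASS)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2, byrow = TRUE)

draw <- mvrnorm(n = 1, mu = c(10, 10), Sigma = Sigma)
is.null(dim(draw))    # TRUE: a plain length-2 vector, not a 1 x 2 matrix

e_i <- data.frame(draw)
dim(e_i)              # 2 rows, 1 column
is.null(e_i$X1)       # TRUE: there is no column called X1
length(e_i$X1 < 12)   # 0, which is what while() complains about

# Reshaping to a 1 x 2 matrix first gives the expected columns X1 and X2
e_ok <- data.frame(matrix(draw, ncol = 2))
names(e_ok)
```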
The matrix() function within your data.frame() calls, along with some tweaks to your use of i, will fix your code:
library(MASS)
Sigma = matrix(
c(1,0.5, 0.5, 1), # the data elements
nrow=2, # number of rows
ncol=2, # number of columns
byrow = TRUE) # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:10){
e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
i <- 1
while(e_i$X1[1] < 12 | e_i$X2[1] < 12) {
e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
i <- i + 1
}
x <- c(e_i$X1, e_i$X2 ,i)
res <- rbind(res, x)
}
res = data.frame(res)

Symbolic matrix multiplication error (Ryacas)

I would like to do symbolic matrix operations with Ryacas, using a function that converts base R matrices to Ryacas format. The result of the function seems to match the Ryacas format, but when I attempt to multiply the matrices I get the error
# Error in aa %*% aa : requires numeric/complex matrix/vector arguments
The code below is a minimal example that reproduces the problem.
Any suggestion, please?
library(Ryacas)
conv.mat <- function(x) {
conv <- lapply(1:nrow(x), function(i) paste0(x[i, ], collapse = ", "))
conv <- paste0("List(", paste0("List(", unlist(conv), ")", collapse = ", "), ")")
noquote(conv)
}
# Writing a matrix manually for Ryacas format
a <- List(List(1, 2), List(3, 7))
a * a
# expression(list(list(7, 16), list(24, 55)))
# Writing a matrix in R and convert it to Ryacas format by the function conv.mat
aa <- matrix(c(1, 2, 3, 7), 2, byrow = TRUE)
aa <- conv.mat(aa)
# [1] List(List(1, 2), List(3, 7))
aa * aa
# Error in aa * aa : non-numeric argument to binary operator
Firstly, to multiply Ryacas matrices you want aa * aa rather than aa %*% aa. But that alone doesn't help in your case as conv.mat doesn't give exactly what we need (an expression).
We may use, e.g.,
conv.mat <- function(x)
do.call(List, lapply(1:nrow(x), function(r) do.call(List, as.list(x[r, ]))))
Then
M <- matrix(c(1, 2, 3, 7), 2, byrow = TRUE)
M %*% M
# [,1] [,2]
# [1,] 7 16
# [2,] 24 55
M <- conv.mat(M)
M * M
# expression(list(list(7, 16), list(24, 55)))

Function to multiply every matrix by every vector?

I am implementing a technique whereby I take two matrices (M1 and M2) and multiply them each by the same "skewer" vector (B), producing results vectors R1 and R2, then taking a correlation of these vectors, as so:
P1 <- data.frame(split(rnorm(5*16, 1, 1), 1:16))
M1 <- matrix(unlist(P1[1,]), nrow = 4)
M1[upper.tri(M1)] <- t(M1)[upper.tri(M1)]
P2 <- data.frame(split(rnorm(5*16, 1, 1), 1:16))
M2 <- matrix(unlist(P2[1,]), nrow = 4)
M2[upper.tri(M2)] <- t(M2)[upper.tri(M2)]
B <- rnorm(4, 0, 1)
R1 <- M1 %*% B
R2 <- M2 %*% B
cor(R1, R2)
However, I need to extend this in two ways: i) I need to do this for n (4000, but showing here for 2) vectors of B, which I have achieved using a function as below, and ii) performing this for each iteration of a posterior distribution of the matrices (1000, using 5 here in the example), which I have achieved using a for loop inside the function. This returns a data frame with one row per iteration, and 1 column per skewer, and each cell giving the correlation of response vectors. While this works, the for loop is slow -
com_rsk_p2 <- function(m1, m2, n = 2){
nitt <- length(m1[,1])
k <- sqrt(length(m1))
B <- split(rnorm(n*k, 0, 1), 1:n)
rv_cor <- split(rep(NA, times = n*nitt), 1:nitt)
for(i in 1:nitt){
R1 <- sapply(B, function(x) x %*% matrix(unlist(m1[i,]), ncol = k))
R2 <- sapply(B, function(x) x %*% matrix(unlist(m2[i,]), ncol = k))
rv_cor[[i]] <- diag(matrix(mapply(cor, list(R1), list(R2)), ncol = n))
}
return(t(data.frame(rv_cor)))
}
I've been working on this for a couple of days, but coming up short - is it possible to use a non-looping/apply approach so that each iteration of M1 and M2 are multiplied by each skewer, storing the result vector correlations for each case? I'm sure there must be some trick that I am missing!
> out <- com_rsk_p2(P1, P2)
> out
[,1] [,2]
X1 0.7622732 0.8156658
X2 0.4414054 0.4266134
X3 0.4388098 -0.1248999
X4 0.5438046 0.7723585
X5 -0.5833943 -0.5294521
Ideally I'd like to have the objects R1 and R2 remain within the function because I will use these for some other things later on when I add to this function (calculating angles between vectors etc.).
Updated 26/04/2018 I have created a list of the matrices, and matrix of the B vectors, and I can multiply a single matrix by each vector of B as below - the key I am looking for is to extend this to an efficient approach that multiplies each matrix in the list by each vector of B:
P1 <- data.frame(split(round(rnorm(5*16, 1, 1),2), 1:16))
P2 <- data.frame(split(round(rnorm(5*16, 1, 1),2), 1:16))
nitt <- length(P1[,1])
k <- sqrt(length(P1))
M1L <- list(rep(NA, times = nitt))
M2L <- list(rep(NA, times = nitt))
for(i in 1:nitt){
M <- matrix(P1[i,], byrow = T, ncol = k)
M[lower.tri(M)] <- t(M)[lower.tri(M)]
M1L[[i]] <- M
M <- matrix(P2[i,], byrow = T, ncol = k)
M[lower.tri(M)] <- t(M)[lower.tri(M)]
M2L[[i]] <- M
}
B <- matrix(round(rnorm(2*4, 0, 1),2), ncol = 2)
matrix(unlist(M2L[[1]]), ncol = 4) %*% B
> matrix(unlist(M2L[[1]]), ncol = 4) %*% B
[,1] [,2]
[1,] 0.1620 -0.3203
[2,] 0.6027 0.8148
[3,] 0.9763 -1.3177
[4,] -0.5369 0.5605
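One possible way to extend this to every matrix in the list, offered only as a sketch: apply the same product to each list element with lapply, then correlate matching columns of the two results. The lists below are hypothetical stand-ins for M1L and M2L built from random symmetric matrices (via an illustrative helper make_sym), not the posterior samples described above.

```r
set.seed(1)
k <- 4; nitt <- 5; n <- 2

# Hypothetical helper: a random symmetric k x k matrix
make_sym <- function() {
  M <- matrix(rnorm(k * k, 1, 1), k)
  M[lower.tri(M)] <- t(M)[lower.tri(M)]
  M
}
M1L <- replicate(nitt, make_sym(), simplify = FALSE)
M2L <- replicate(nitt, make_sym(), simplify = FALSE)
B <- matrix(rnorm(k * n), ncol = n)   # one skewer per column

# One matrix product per iteration handles all skewers at once
R1 <- lapply(M1L, function(M) M %*% B)
R2 <- lapply(M2L, function(M) M %*% B)

# cor() on two k x n matrices gives all column pairs; the diagonal keeps
# the correlation for matching skewers
rv_cor <- t(mapply(function(r1, r2) diag(cor(r1, r2)), R1, R2))
dim(rv_cor)   # nitt rows, n columns
```

R1 and R2 stay available as lists of response-vector matrices, so later steps such as angles between vectors can reuse them.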
