Pairwise matrix to list of outcomes - r

Suppose we have a matrix M
M <- matrix(c(1:9),3,3)
diag(M) <- NA
M
     [,1] [,2] [,3]
[1,]   NA    4    7
[2,]    2   NA    8
[3,]    3    6   NA
where each entry describes the outcomes of pairwise interactions. Each interaction of row i with column j is interpreted as "object i outperformed object j X times". Examples: Object 2 performs better than object 1 in 2 cases. Object 1 performs better than object 3 in 7 cases.
Is there a quick way to transform this matrix into an object holding this information in a format where each row fully describes the interactions between two objects? The goal is something like this:
     [,1]   [,2]   [,3] [,4]
[1,] "OBJ1" "OBJ2" "N1" "N2"
[2,] "1"    "2"    "4"  "2"
[3,] "1"    "3"    "7"  "3"
[4,] "2"    "3"    "8"  "6"
where the first two columns give the objects that are compared, while columns 3 and 4 describe how often OBJ1 outperformed OBJ2 and vice versa. The interpretation of the first data row is: Object 1 outperformed Object 2 four times, whereas Object 2 outperformed Object 1 twice. I have been playing around with reshape2 and aggregating without useful results so far.

Maybe you can try the code below:
# all pairs (i, j) with i < j, one pair per row
inds <- t(combn(dim(M)[1], 2))
# bind the pair indices with the counts in both directions and name the
# columns "Obj1", "Obj2", "N1", "N2"
Mout <- `colnames<-`(
  cbind(inds, M[inds], M[inds[, 2:1]]),
  do.call(paste0, rev(expand.grid(1:2, c("Obj", "N"))))
)
which gives
> Mout
     Obj1 Obj2 N1 N2
[1,]    1    2  4  2
[2,]    1    3  7  3
[3,]    2    3  8  6
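In case the `colnames<-` call looks cryptic, here is a quick breakdown of how the column names are produced (not part of the original answer, just an illustration):
rev(expand.grid(1:2, c("Obj", "N")))   # cross the suffixes 1:2 with the prefixes, prefix column first
#   Var2 Var1
# 1  Obj    1
# 2  Obj    2
# 3    N    1
# 4    N    2
do.call(paste0, rev(expand.grid(1:2, c("Obj", "N"))))
# [1] "Obj1" "Obj2" "N1"   "N2"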

Another solution could be:
M <- matrix(1:9, 3, 3)
diag(M) <- NA
M1 <- M
# keep only the lower triangle in M and the upper triangle in M1
M[upper.tri(M, diag = TRUE)] <- NA
M1[lower.tri(M1, diag = TRUE)] <- NA
# melt each triangle to long format, then combine the two count columns
R1 <- reshape2::melt(M1, na.rm = TRUE, value.name = "N1")
R2 <- reshape2::melt(M, na.rm = TRUE, value.name = "N2")
R1$N2 <- R2$N2
rownames(R1) <- NULL
Output:
> R1
  Var1 Var2 N1 N2
1    1    2  4  2
2    1    3  7  3
3    2    3  8  6
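If you would rather stay in base R, a similar table can be built without reshape2; this is just a sketch along the same lines as the first answer, starting from a fresh M:
M <- matrix(1:9, 3, 3)
diag(M) <- NA
idx <- which(upper.tri(M), arr.ind = TRUE)   # all pairs (i, j) with i < j
cbind(Obj1 = idx[, 1], Obj2 = idx[, 2], N1 = M[idx], N2 = M[idx[, 2:1]])
#      Obj1 Obj2 N1 N2
# [1,]    1    2  4  2
# [2,]    1    3  7  3
# [3,]    2    3  8  6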


Identify groups of identical rows in a matrix

tl;dr What is the idiomatic way to identify groups of identical rows in a matrix in R?
Given an n-by-2 matrix where some rows occur more than once,
> mat <- matrix(c(2,5,5,3,4,6,2,5,4,6,4,6), ncol=2, byrow=T)
> mat
     [,1] [,2]
[1,]    2    5
[2,]    5    3
[3,]    4    6
[4,]    2    5
[5,]    4    6
[6,]    4    6
I am looking to get the groups of row indices of identical rows. In the example above, rows (1,4) are identical, and so are rows (3,5,6). Finally, there is row (2). I am looking to get these groups, represented in whatever way is idiomatic in R.
The output could be something like this,
> groups <- matrix(c(1,1, 2,2, 3,3, 4,1, 5,3, 6,3), ncol=2, byrow=T)
> groups
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    1
[5,]    5    3
[6,]    6    3
where the first column contains the row indices of mat and the second the group index for each row index. Or it could be like this:
> split(groups[,1], groups[,2])
$`1`
[1] 1 4
$`2`
[1] 2
$`3`
[1] 3 5 6
Either will do. I am not sure what is the best way to represent groups in R, and advice on this is also welcome.
For benchmarking purposes, here's a larger dataset:
set.seed(123)
n <- 10000000
mat <- matrix(sample.int(10, 2*n, replace = T), ncol=2)
Use cbind with the sequence of row indices and the match of each row against the unique rows:
v1 <- paste(mat[,1], mat[,2])
# or if there are more columns
#v1 <- do.call(paste, as.data.frame(mat))
out <- cbind(seq_len(nrow(mat)), match(v1, unique(v1)))
Output:
> out
     [,1] [,2]
[1,]    1    1
[2,]    2    2
[3,]    3    3
[4,]    4    1
[5,]    5    3
[6,]    6    3
If we want a list output
split(out[,1], out[,2])
Output:
$`1`
[1] 1 4
$`2`
[1] 2
$`3`
[1] 3 5 6
Benchmarks
With the OP's big data
> system.time({
+   v1 <- paste(mat[,1], mat[,2])
+   out <- cbind(seq_len(nrow(mat)), match(v1, unique(v1)))
+ })
   user  system elapsed
  2.603   0.130   2.706
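Not benchmarked here, but data.table can produce the same group ids with .GRP; a minimal sketch, assuming mat as above:
library(data.table)
dt <- as.data.table(mat)
dt[, grp := .GRP, by = names(dt)]   # group id in order of first appearance
out2 <- cbind(seq_len(nrow(mat)), dt$grp)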

Subset assignment of multidimensional array in R

I am trying to assign rows of a 3D array, but I don't know exactly how.
I have a 2D index array, where each row gives the first and second index of the 3D array, and a 2D value array which I want to insert into the 3D array. The simplest way I found to do this was
indexes <- cbind(1:30, rep(c(1, 2), 15))
rows <- cbind(1:20, 31:50, 71:90)
for (i in 1:nrow(indexes)) for (j in 1:3)
  data[indexes[i, 1], indexes[i, 2], j] <- rows[i, j]
But this is hard to read, because it uses nested indexing, so I was hoping there was a simpler way, like
data[indexes,] <- rows
(this does not work)
What I've tried:
a related question shows how to index the array (without assignment)
apply(data, 3, `[`, indexes)
but this doesn't allow assignment
apply(data, 3, `[`, indexes) <- rows #: could not find function "apply<-"
nor does using [<- work:
apply(data, 3, `[<-`, indexes, rows)
because it treats rows as a vector.
Neither of the following works either
data[indexes[1], indexes[2],] <- rows #: subscript out of bounds
data[indexes,] <- rows #: incorrect number of subscripts on matrix
So is there a simpler way of assigning to a multidimensional array?
Your indexes variable implies that data has a first dimension of 30, but rows only has 20 rows, so rows[30, j] doesn't exist. So your problem isn't well posed, and I'll change it.
The basic idea is that you can index a three-way array by an n x 3 matrix. Each row of the matrix corresponds to a location in the three-way array, so if you want to set entry data[1,2,3] to 4 and entry data[5,6,7] to 8, you'd use
index <- rbind(c(1,2,3), c(5,6,7))
data[index] <- c(4,8)
You will need to expand your indexes variable to replicate each row 3 times, then read the rows matrix as a vector, and then this works:
data <- array(NA, dim=c(30, 2, 3))
indexes <- cbind(1:30, rep(c(1, 2), 15))
rows <- cbind(1:30, 31:60, 71:100)
indexes1 <- indexes[rep(1:nrow(indexes), each = 3),]
indexes2 <- cbind(indexes1, 1:3)
data[indexes2] <- t(rows) # Transpose because R reads down columns first
I don't think this is any simpler than what you had with the for loops, but maybe you'll find it preferable.
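As an optional sanity check (illustrative only, not from the original answer; data_loop is a made-up name), the vectorized assignment can be compared against the original loop:
data_loop <- array(NA, dim = c(30, 2, 3))
for (i in 1:nrow(indexes)) for (j in 1:3)
  data_loop[indexes[i, 1], indexes[i, 2], j] <- rows[i, j]
identical(data_loop, data)
# [1] TRUE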
After reading #user2554330's answer, I found a slightly simpler solution
# initialize as in user2554330's answer
data <- ...
indexes <- ...
rows <- ...
indexes3 <- as.matrix(merge(indexes, 1:3))  # merge() with no common columns returns the Cartesian product (cross join)
data[indexes3] <- rows
comparison of indexes2 and indexes3 (using fewer elements):
# print(indexes2)
      [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    1    1    2
 [3,]    1    1    3
 [4,]    2    2    1
 [5,]    2    2    2
 [6,]    2    2    3
 [7,]    3    1    1
 [8,]    3    1    2
 [9,]    3    1    3
[10,]    4    2    1
[11,]    4    2    2
[12,]    4    2    3

# print(indexes3)
      [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    2    2    1
 [3,]    3    1    1
 [4,]    4    2    1
 [5,]    1    1    2
 [6,]    2    2    2
 [7,]    3    1    2
 [8,]    4    2    2
 [9,]    1    1    3
[10,]    2    2    3
[11,]    3    1    3
[12,]    4    2    3
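A quick check (again illustrative; chk2 and chk3 are made-up names) that both index orderings fill the array identically:
chk2 <- array(NA, dim = c(30, 2, 3)); chk2[indexes2] <- t(rows)
chk3 <- array(NA, dim = c(30, 2, 3)); chk3[indexes3] <- rows
identical(chk2, chk3)
# [1] TRUE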

Need to vectorize a function that uses a loop (replace NA rows with values from vector)

How can I rewrite this function as a vectorized variant? As far as I know, using loops is not good practice in R:
# replaces rows that contain all NAs with the non-NA value from the previous row's k-th column
na.replace <- function(x, k) {
  for (i in 2:nrow(x)) {
    if (!all(is.na(x[i - 1, ])) && all(is.na(x[i, ]))) {
      x[i, ] <- x[i - 1, k]
    }
  }
  x
}
This is input data and returned data for function:
m <- cbind(c(NA,NA,1,2,NA,NA,NA,6,7,8), c(NA,NA,2,3,NA,NA,NA,7,8,9))
m
      [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA
 [3,]    1    2
 [4,]    2    3
 [5,]   NA   NA
 [6,]   NA   NA
 [7,]   NA   NA
 [8,]    6    7
 [9,]    7    8
[10,]    8    9
na.replace(m, 2)
      [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA
 [3,]    1    2
 [4,]    2    3
 [5,]    3    3
 [6,]    3    3
 [7,]    3    3
 [8,]    6    7
 [9,]    7    8
[10,]    8    9
Here is a solution using na.locf in the zoo package. row.na is a vector with one component per row of m such that a component is TRUE if the corresponding row of m is all NA and FALSE otherwise. We then set all elements of such rows to the result of applying na.locf to column 2.
At the expense of a bit of speed the lines ending with ## could be replaced with row.na <- apply(is.na(m), 1, all) which is a bit more readable.
If we knew that if any row has an NA in column 2 then all columns of that row are NA, as in the question, then the lines ending in ## could be reduced to just row.na <- is.na(m[, 2])
library(zoo)
nr <- nrow(m) ##
nc <- ncol(m) ##
row.na <- .rowSums(is.na(m), nr, nc) == nc ##
m[row.na, ] <- na.locf(m[, 2], na.rm = FALSE)[row.na]
The result is:
> m
      [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA
 [3,]    1    2
 [4,]    2    3
 [5,]    3    3
 [6,]    3    3
 [7,]    3    3
 [8,]    6    7
 [9,]    7    8
[10,]    8    9
REVISED: Some revisions to improve speed, as in the comments below. Also added alternatives in the discussion.
Notice that, unless you have a pathological condition where the first row is all NA (in which case you're screwed anyway), you don't need to check whether all(is.na(x[i - 1, ])) is TRUE or FALSE, because in the previous time through the loop you "fixed" row i - 1.
Further, all you care about is that the designated k-th value is not NA. The rest of the row doesn't matter.
BUT: The k-th value always "falls through" from the top, so perhaps you should:
1) treat the k-th column as a vector, e.g. c(NA,1,NA,NA,3,NA,4,NA,NA) and "fill-down" all numeric values. That's been done many times on SO questions.
2) Every row which is entirely NA except for column k gets filled with that same value.
I think that's still best done using either a loop or apply
You probably need to clarify whether some rows have both numeric and NA values, which your example fails to include. If that's the case, then things get trickier.
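A minimal sketch of that two-step idea, using zoo's na.locf for the fill-down (the function name na.replace2 is just illustrative, not from the original answer):
library(zoo)
na.replace2 <- function(x, k) {
  filled <- na.locf(x[, k], na.rm = FALSE)   # 1) fill down the k-th column
  all.na <- rowSums(is.na(x)) == ncol(x)     # 2) rows that are entirely NA
  x[all.na, ] <- filled[all.na]
  x
}
na.replace2(m, 2)   # reproduces the expected output shown in the question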
The most important part in this answer is getting the grouping you want, which is:
groups = cumsum(rowSums(is.na(m)) != ncol(m))
groups
#[1] 0 0 1 2 2 2 2 3 4 5
Once you have that the rest is just doing your desired operation by group, e.g.:
library(data.table)
dt = as.data.table(m)
k = 2
cond = rowSums(is.na(m)) != ncol(m)
dt[, (k) := .SD[[k]][1], by = cumsum(cond)]
dt[!cond, names(dt) := .SD[[k]]]
dt
#     V1 V2
#  1: NA NA
#  2: NA NA
#  3:  1  2
#  4:  2  3
#  5:  3  3
#  6:  3  3
#  7:  3  3
#  8:  6  7
#  9:  7  8
# 10:  8  9
Here is another base-only vectorized approach:
na.replace <- function(x, k) {
  is.all.na <- rowSums(is.na(x)) == ncol(x)
  ref.idx <- cummax((!is.all.na) * seq_len(nrow(x)))
  ref.idx[ref.idx == 0] <- NA
  x[is.all.na, ] <- x[ref.idx[is.all.na], k]
  x
}
And for fair comparison with #Eldar's solution, replace is.all.na with is.all.na <- is.na(x[, k]).
Finally, I worked out my own vectorized solution, and it works as expected. Any comments and suggestions are welcome :)
# Last Observation Move Forward
# works as na.locf but much faster and accepts only 1D structures
na.lomf <- function(object, na.rm = F) {
  idx <- which(!is.na(object))
  if (!na.rm && is.na(object[1])) idx <- c(1, idx)
  rep.int(object[idx], diff(c(idx, length(object) + 1)))
}
na.replace <- function(x, k) {
  v <- x[, k]
  i <- which(is.na(v))
  r <- na.lomf(v)
  x[i, ] <- r[i]
  x
}
Here's a workaround with the na.locf function from zoo (note that it carries the whole previous non-NA row forward, so rows 5-7 become 2 3 rather than the 3 3 expected in the question):
m[na.locf(ifelse(apply(m, 1, function(x) all(is.na(x))), NA, 1:nrow(m)), na.rm=F),]
      [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA
 [3,]    1    2
 [4,]    2    3
 [5,]    2    3
 [6,]    2    3
 [7,]    2    3
 [8,]    6    7
 [9,]    7    8
[10,]    8    9

Construct dynamic-sized array in R

I was wondering what the ways are to construct a dynamically sized array in R.
For one example, I want to construct an n-vector where the length n is determined dynamically. The following code works:
> x=NULL
> n=2;
> for (i in 1:n) x[i]=i;
> x
[1] 1 2
For another example, I want to construct an n-by-2 matrix where the number of rows n is dynamically determined. But I fail even at assigning the first row:
> tmp=c(1,2)
> x=NULL
> x[1,]=tmp
Error in x[1, ] = tmp : incorrect number of subscripts on matrix
> x[1,:]=tmp
Error: unexpected ':' in "x[1,:"
Thanks and regards!
I think the answers you are looking for are rbind() and cbind():
> x=NULL # could also use x <- c()
> rbind(x, c(1,2))
     [,1] [,2]
[1,]    1    2
> x <- rbind(x, c(1,2))
> x <- rbind(x, c(1,2)) # now extend row-wise
> x
     [,1] [,2]
[1,]    1    2
[2,]    1    2
> x <- cbind(x, c(1,2)) # or column-wise
> x
     [,1] [,2] [,3]
[1,]    1    2    1
[2,]    1    2    2
The strategy of trying to assign to "new indices" on the fly as you attempted can be done in some languages but cannot be done that way in R.
You can also use sparse matrices provided in the Matrix package. They would allow assignments of the form M <- sparseMatrix(i=200, j=50, x=234) resulting in a single value at row 200, column 50 and 0's everywhere else.
require(Matrix)
M <- sparseMatrix(i=200, j=50, x=234)
M[1,1]
# [1] 0
M[200, 50]
# [1] 234
But I think the use of sparse matrices is best reserved for later use after mastering regular matrices.
It is possible to set the array's dimensions after we fill it (in a one-dimensional, vector fashion).
Emulating the one-dimensional snippet from the question, here's how it can be done for higher dimensions.
> x = c()
> tmp = c(1,2)
> n = 6
> for (i in seq(1, by=2, length=n)) x[i:(i+1)] = tmp
> dim(x) = c(2,n)
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    1    1    1
[2,]    2    2    2    2    2    2
>
Rather than using i:(i+1) as index, it may be preferable to use seq(i, length=2) or better yet, seq(i, length=length(tmp)) for a more generic approach, as illustrated below (for a 4 x 7 array example)
> x = c()
> tmp = c(1,2,3,4)
> n = 7
> for (i in seq(1, by=length(tmp), length=n))
+   x[seq(i, length=length(tmp))] = tmp
> dim(x) = c(length(tmp), n)
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    1    1    1    1
[2,]    2    2    2    2    2    2    2
[3,]    3    3    3    3    3    3    3
[4,]    4    4    4    4    4    4    4
>
We can also obtain a similar result by re-assigning x with cbind/rbind, as follows.
> tmp=c(1,2)
> n=6
> x=rbind(tmp)
> for (i in 1:n) x=rbind(x, tmp);
> x
     [,1] [,2]
tmp     1    2
tmp     1    2
tmp     1    2
tmp     1    2
tmp     1    2
tmp     1    2
tmp     1    2
Note: one can get rid of the "tmp" names (these are a side effect of the rbind), with
> dimnames(x)=NULL
You can rbind it:
tmp = c(1,2)
x = NULL
rbind(x, tmp)
I believe this is an approach you need:
arr <- array(1)        # start with a length-1 array
arr <- append(arr, 3)  # grow it by appending (append() returns a plain vector)
arr[1] <- 2            # assign by index as usual
print(arr[1])
(found on rosettacode.org)
When I want to dynamically construct an array (matrix), I do it like so:
n <- 500
new.mtrx <- matrix(ncol = 2, nrow = n)
head(new.mtrx)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
[3,]   NA   NA
[4,]   NA   NA
[5,]   NA   NA
[6,]   NA   NA
Your matrix is now ready to accept vectors.
Assuming you already have a vector, you pass that to the matrix() function. Notice how the values are "broken" into the matrix column-wise. This can be changed with the byrow argument (see the short example after the output below).
matrix(letters, ncol = 2)
      [,1] [,2]
 [1,] "a"  "n"
 [2,] "b"  "o"
 [3,] "c"  "p"
 [4,] "d"  "q"
 [5,] "e"  "r"
 [6,] "f"  "s"
 [7,] "g"  "t"
 [8,] "h"  "u"
 [9,] "i"  "v"
[10,] "j"  "w"
[11,] "k"  "x"
[12,] "l"  "y"
[13,] "m"  "z"
n = 5
# outer product with a vector of ones repeats c(1,2) across n columns
x = c(1,2) %o% rep(1,n)
x
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    1    1    1    1
# [2,]    2    2    2    2    2
# ... or down n rows
x = rep(1,n) %o% c(1,2)
x
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    2
# [3,]    1    2
# [4,]    1    2
# [5,]    1    2

Form matrix from rows in 3-dimensional array

I have X, a three-dimensional array in R. I want to take a vector of indices indx (length equal to dim(X)[1]) and form a matrix where the first row is the first row of X[ , , indx[1]], the second row is the second row of X[ , , indx[2]], and so on.
For example, I have:
R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18
R> indx <- c(2, 3, 1)
My desired output is
R> rbind(X[1, , 2], X[2, , 3], X[3, , 1])
     [,1] [,2]
[1,]    7   10
[2,]   14   17
[3,]    3    6
As of now I'm using the inelegant (and slow) sapply(1:dim(X)[2], function(x) X[cbind(1:3, x, indx)]). Is there any way to do this using the built-in indexing functions? I had no luck experimenting with the matrix indexing methods described in ?Extract, but I may just be doing it wrong.
Maybe like this:
t(sapply(1:3, function(x) X[,,indx][x,,x]))
I may be answering the wrong question (I can't reconcile your first description and your sample output)... This produces your sample output, but I can't say that it's much faster without running it on your data.
do.call(rbind, lapply(1:dim(X)[1], function(i) X[i, , indx[i]]))
Matrix indexing to the rescue! No apply calls needed.
Figure out which indices you want:
n <- dim(X)[2]
foo <- cbind(rep(seq_along(indx), n),
             rep(seq.int(n), each = length(indx)),
             rep(indx, n))
(the result is this)
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    1    3
[3,]    3    1    1
[4,]    1    2    2
[5,]    2    2    3
[6,]    3    2    1
and use it as an index, converting back to a matrix to make it look like your desired output:
> matrix(X[foo], ncol=n)
     [,1] [,2]
[1,]    7   10
[2,]   14   17
[3,]    3    6
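Wrapped up as a small helper function (a sketch; the name rows_by_slice is made up, not from the original answer):
rows_by_slice <- function(X, indx) {
  n <- dim(X)[2]
  foo <- cbind(rep(seq_along(indx), n),
               rep(seq.int(n), each = length(indx)),
               rep(indx, n))
  matrix(X[foo], ncol = n)
}
rows_by_slice(X, indx)   # same result as above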
