Transform binary vector to binary matrix - r

I have a binary vector that holds information on whether or not some event happened for some observation:
v <- c(0,1,1,0)
What I want is a matrix that holds this information for all pairs of observations in the vector. That is, if two observations both have 0 or both have 1 in v, the pair should get a 1 in the matrix. If one has 0 and the other has 1, the pair should get a 0.
Hence, the goal is this matrix:
[,1] [,2] [,3] [,4]
[1,] 0 0 0 1
[2,] 0 0 1 0
[3,] 0 1 0 0
[4,] 1 0 0 0
Whether the main diagonal is 0 or 1 does not matter for me.
Is there an efficient and simple way to achieve this that does not require a combination of if statements and for loops? v might be of considerable size.
Thanks!

We can use outer
out <- outer(v, v, `==`)
diag(out) <- 0L # as you don't want to compare each element to itself
out
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 1
#[2,] 0 0 1 0
#[3,] 0 1 0 0
#[4,] 1 0 0 0

Another option is to use expand.grid to create all pairwise combinations of v with itself. Since the values are only 0 and 1, the matching pairs are exactly those whose row sum is 0 (0 + 0) or 2 (1 + 1).
inds <- rowSums(expand.grid(v, v))
matrix(+(inds == 0 | inds == 2), nrow = length(v))
# [,1] [,2] [,3] [,4]
#[1,] 1 0 0 1
#[2,] 0 1 1 0
#[3,] 0 1 1 0
#[4,] 1 0 0 1
Since the diagonal elements are not important to you, I have left them as they are; if you want to change them, you can use diag as shown in @markus's answer.

An alternative to outer (slightly less efficient) would be sapply:
out <- sapply(v, function(x){
x == v
})
diag(out) <- 0L
out
[,1] [,2] [,3] [,4]
[1,] 0 0 0 1
[2,] 0 0 1 0
[3,] 0 1 0 0
[4,] 1 0 0 0
microbenchmark on a vector of length 1000:
> test <- microbenchmark("LAP" = sapply(v, function(x){
+ x == v
+ }),
+ "markus" = outer(v, v, `==`), times = 1000, unit = "ms")
> test
Unit: milliseconds
expr min lq mean median uq max neval
LAP 3.973111 4.065555 5.747905 4.573002 6.324607 101.03498 1000
markus 3.515725 3.535067 4.852606 3.694924 4.908930 84.85184 1000
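Before comparing timings, it is worth confirming that the two expressions return the same object. A minimal check, assuming a random 0/1 vector of length 1000 (the post does not show how the benchmark vector was built, so the seed and the sampling call are assumptions):

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
v <- sample(0:1, 1000, replace = TRUE)       # assumed test vector of length 1000

m_outer  <- outer(v, v, `==`)                # vectorised comparison of all pairs
m_sapply <- sapply(v, function(x) x == v)    # one column per element of v

# Both are 1000 x 1000 logical matrices; this should report that they agree
same <- identical(m_outer, m_sapply)
same
```

Both results are logical matrices; the diagonal is only set to 0 afterwards with diag(out) <- 0L, which is not part of the timed code.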

If you allow the main diagonal to be 1, then there are always at most two unique rows in this matrix, v and 1 - v, no matter how large v is. Since the matrix is symmetric, the same holds for its columns. This makes it trivial to construct the matrix.
## example `v`
set.seed(0)
v <- sample.int(2, 10, replace = TRUE) - 1L
#[1] 1 0 0 1 1 0 1 1 1 1
## column expansion from unique columns
cbind(v, 1 - v, deparse.level = 0L)[, 2 - v]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 0 0 1 1 0 1 1 1 1
# [2,] 0 1 1 0 0 1 0 0 0 0
# [3,] 0 1 1 0 0 1 0 0 0 0
# [4,] 1 0 0 1 1 0 1 1 1 1
# [5,] 1 0 0 1 1 0 1 1 1 1
# [6,] 0 1 1 0 0 1 0 0 0 0
# [7,] 1 0 0 1 1 0 1 1 1 1
# [8,] 1 0 0 1 1 0 1 1 1 1
# [9,] 1 0 0 1 1 0 1 1 1 1
#[10,] 1 0 0 1 1 0 1 1 1 1
What is the purpose of this matrix?
If there are n0 zeros and n1 ones, the matrix has dimension (n0 + n1) x (n0 + n1) and contains (n0 x n0 + n1 x n1) ones. That is always at least half of the entries, so the matrix is not sparse in the usual sense, but for a long vector v it is extremely redundant, as it consists of a large number of duplicated rows / columns.
Obviously, if you want to store the position of 1 in this matrix, you can simply get it without forming this matrix at all.
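As a rough sketch of that last point, the positions of the 1s can be listed directly from the two groups of indices. The names idx0, idx1 and ones are made up for this illustration, and the diagonal is counted as 1 here:

```r
v <- c(0, 1, 1, 0)

idx0 <- which(v == 0)   # observations with value 0
idx1 <- which(v == 1)   # observations with value 1

# Every (row, col) pair drawn from the same group is a 1 in the full matrix
ones <- rbind(expand.grid(row = idx0, col = idx0),
              expand.grid(row = idx1, col = idx1))

nrow(ones)   # n0^2 + n1^2 = 4 + 4 = 8 positions
```

The data frame ones grows with n0^2 + n1^2, so for very long v it may be preferable to keep only idx0 and idx1 and generate the pairs on demand.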

Related

R, Trying to transform a vector of integer to a specific binary Matrix

I would like to transform a vector of integers such as:
vector = c(0,6,1,8,5,4,2)
length(vector) = 7
max(vector) = 8
into a matrix m with nrow = length(vector) and ncol = max(vector):
m =
0 0 0 0 0 0 0 0
1 1 1 1 1 1 0 0
1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 0 0 0
1 1 1 1 0 0 0 0
1 1 0 0 0 0 0 0
This is just an example of what I am trying to do. I want the function to work with any vector of integers.
I tried to use mapply(rep, 1, vector), but I obtained a list and did not manage to convert it into a matrix...
It would be very helpful if someone could help me.
Best Regards,
Maxime
If you apply c(rep(1, x), rep(0, max(vector) - x)) to each element x of your variable vector, you get the desired binary rows. Looping over the elements with sapply even returns a matrix. You only need to transpose it afterwards to get your result.
vector = c(0,6,1,8,5,4,2)
result <- t(sapply(vector, function(x) c(rep(1, x), rep(0, max(vector)-x))))
is.matrix(result)
#> [1] TRUE
result
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 1 1 1 1 1 1 0 0
#> [3,] 1 0 0 0 0 0 0 0
#> [4,] 1 1 1 1 1 1 1 1
#> [5,] 1 1 1 1 1 0 0 0
#> [6,] 1 1 1 1 0 0 0 0
#> [7,] 1 1 0 0 0 0 0 0
Putting that into a function is easy:
binaryMatrix <- function(v) {
t(sapply(v, function(x) c(rep(1, x), rep(0, max(v)-x))))
}
binaryMatrix(vector)
# same result as before
Created on 2021-02-14 by the reprex package (v1.0.0)
Another straightforward approach would be to exploit matrix sub-assignment using row/column indices in a matrix form (see, also, ?Extract).
Define a matrix of 0s:
x = c(0, 6, 1, 8, 5, 4, 2)
m = matrix(0L, nrow = length(x), ncol = max(x))
And fill with 1s:
i = rep(seq_along(x), x) ## row indices of 1s
j = sequence(x) ## column indices of 1s
ij = cbind(i, j)
m[ij] = 1L
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] 0 0 0 0 0 0 0 0
#[2,] 1 1 1 1 1 1 0 0
#[3,] 1 0 0 0 0 0 0 0
#[4,] 1 1 1 1 1 1 1 1
#[5,] 1 1 1 1 1 0 0 0
#[6,] 1 1 1 1 0 0 0 0
#[7,] 1 1 0 0 0 0 0 0
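To see how the two index vectors line up, here is a smaller sketch with a made-up three-element input. The values in the comments follow from rep and sequence expanding each element of x in turn:

```r
x <- c(2, 0, 3)

i <- rep(seq_along(x), x)   # 1 1 3 3 3  (row 2 gets no entries because x[2] is 0)
j <- sequence(x)            # 1 2 1 2 3  (columns 1..x[k] within each row k)

m <- matrix(0L, nrow = length(x), ncol = max(x))
m[cbind(i, j)] <- 1L        # each row of cbind(i, j) addresses one cell
m
```

Each (i, j) pair names exactly one cell to set, so no loop over rows is needed and a zero entry in x simply contributes no pairs.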
Assuming that all values in the vector are non-negative integers, you can define the following function
transformVectorToMatrix <- function(v) {
nrOfCols <- max(v)
zeroRow <- integer(nrOfCols)
do.call("rbind",lapply(v,function(nrOfOnes) {
if(nrOfOnes==0) return(zeroRow)
if(nrOfOnes==nrOfCols) return(zeroRow+1)
c(integer(nrOfOnes)+1,integer(nrOfCols-nrOfOnes))
}))
}
and finally do
m = transformVectorToMatrix(vector)
to get your desired binary matrix.
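For comparison, the same matrix can probably be obtained without an explicit loop by comparing each value with the column numbers. This is an alternative sketch, not one of the methods posted above, and it assumes non-negative integers with at least one positive value:

```r
vector <- c(0, 6, 1, 8, 5, 4, 2)

# Cell (r, k) is 1 exactly when vector[r] >= k; unary + turns TRUE/FALSE into 1/0
m <- +outer(vector, seq_len(max(vector)), `>=`)

dim(m)   # 7 rows, 8 columns
```

The row sums of m reproduce the input vector, which is a quick way to check the result.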

Obtain all possible matrices by swapping only two positions in any given column

Let's start with the following matrix.
M <- matrix(c(0,0,1,1,0,0,1,1,
0,1,1,0,0,1,1,0,
0,0,0,0,1,1,1,1,
0,1,0,1,1,0,1,0,
0,0,1,1,1,1,0,0,
0,1,1,0,1,0,0,1),nrow = 8,ncol = 6)
Here is M:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 1 0 1 0 1
[3,] 1 1 0 0 1 1
[4,] 1 0 0 1 1 0
[5,] 0 0 1 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
If I pick a random column, say 4, I want to swap two positions in that column. One such possibility, swapping the 5th and 6th positions, gives
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 1 0 1 0 1
[3,] 1 1 0 0 1 1
[4,] 1 0 0 1 1 0
[5,] 0 0 1 0 1 1
[6,] 0 1 1 1 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
I want to do this for every possible swap in each column and then for all columns to obtain all the possible matrices.
Here's another solution:
# Return all unique permutations for c(0,0,0,0,1,1,1,1)
library(gtools)
perms = unique(permutations(8, 8, M[,1], set = FALSE))
# Create nested list
Mat_list = lapply(vector("list", ncol(M)), function(x) vector("list", nrow(perms)))
# Loop through every column and every permutations replacing each column
# with each unique permutation one at a time
for(ii in 1:ncol(M)){
for(jj in 1:nrow(perms)){
New_Mat = M
New_Mat[,ii] = perms[jj,]
Mat_list[[ii]][[jj]] = New_Mat
}
}
Result:
> Mat_list[[1]][[2]]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 1 0 1 0 1
[3,] 1 1 0 0 1 1
[4,] 1 0 0 1 1 0
[5,] 0 0 1 1 1 1
[6,] 1 1 1 0 1 0
[7,] 0 1 1 1 0 0
[8,] 1 0 1 0 0 1
Note:
Instead of creating one very long list, I've created a nested list of matrices with one element per column of M (6 here) and n sub-elements per element, where n is the number of unique permutations (70 here). If you prefer the long list form, you can flatten the result with unlist(Mat_list, recursive = FALSE).
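A small sketch of the flattening step, using a made-up nested list of 2 x 2 matrices rather than the real result. The point is that recursive = FALSE removes only one level of nesting, so the matrices themselves stay intact:

```r
# Toy stand-in for a nested list: 2 outer elements, 2 matrices each
nested <- list(list(matrix(1, 2, 2), matrix(2, 2, 2)),
               list(matrix(3, 2, 2), matrix(4, 2, 2)))

flat <- unlist(nested, recursive = FALSE)   # drops one level only

length(flat)          # 4 matrices in one flat list
is.matrix(flat[[3]])  # still a matrix, not a plain vector
```

With the default recursive = TRUE, unlist would instead collapse everything into one long numeric vector.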
This code gives every permutation of 0s and 1s by column. I used a smaller toy example here, because the number of possibilities can get very large -- prod(choose(nrow(M), colSums(M))). As a note, this will likely not run on a standard computer for the matrix given, because of memory requirements.
library(gtools)
set.seed(1234)
M <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
M
# [,1] [,2] [,3] [,4]
# [1,] 0 1 1 0
# [2,] 1 1 1 1
# [3,] 1 0 1 0
# [4,] 1 0 1 1
perm1s <- function(n, N) {
unique(permutations(N, N, c(rep(0, N - n), rep(1, n)), FALSE, FALSE))
}
createMat <- function(vec, lst) {
tmp <- lapply(seq_along(vec), function(x) lst[[x]][vec[x], ])
do.call(cbind, tmp)
}
makeMats <- function(M) {
sums <- colSums(M)
rows <- nrow(M)
rowPerm <- lapply(sums, perm1s, N = rows)
comb <- expand.grid(lapply(sapply(rowPerm, nrow), seq))
comb <- lapply(split(comb, seq(nrow(comb))), unlist)
mats <- lapply(comb, createMat, lst = rowPerm)
mats
}
res <- makeMats(M)
res[[1]]
# [,1] [,2] [,3] [,4]
# [1,] 0 0 1 0
# [2,] 1 0 1 0
# [3,] 1 1 1 1
# [4,] 1 1 1 1
To hold other columns constant when varying 1 column -- sum(choose(nrow(M), colSums(M))) possibilities:
makeMats2 <- function(M) {
sums <- colSums(M)
rows <- nrow(M)
rowPerm <- lapply(sums, perm1s, N = rows)
ind <- rep(seq_along(rowPerm), sapply(rowPerm, nrow))
rowPerm <- lapply(rowPerm, function(x) split(x, seq(nrow(x))))
rowPerm <- unlist(rowPerm, recursive = FALSE)
mats <- rep(list(M), length(rowPerm))
mats <- mapply(function(x, y, z) {x[ , y] <- z; x},
x = mats, y = ind, z = rowPerm, SIMPLIFY = FALSE)
mats
}
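If only single two-position swaps within one column are wanted, as the question literally describes, something along these lines may be enough. The function name swap_two is hypothetical, and swaps of two equal values are skipped because they leave the matrix unchanged:

```r
M <- matrix(c(0,0,1,1,0,0,1,1,
              0,1,1,0,0,1,1,0,
              0,0,0,0,1,1,1,1,
              0,1,0,1,1,0,1,0,
              0,0,1,1,1,1,0,0,
              0,1,1,0,1,0,0,1), nrow = 8, ncol = 6)

# All matrices obtained from M by swapping two unequal entries of column `col`
swap_two <- function(M, col) {
  pairs <- combn(nrow(M), 2)                          # all row pairs (a, b), a < b
  keep  <- M[pairs[1, ], col] != M[pairs[2, ], col]   # equal values: swap is a no-op
  pairs <- pairs[, keep, drop = FALSE]
  lapply(seq_len(ncol(pairs)), function(k) {
    N <- M
    N[pairs[, k], col] <- N[rev(pairs[, k]), col]     # exchange the two entries
    N
  })
}

res4 <- swap_two(M, 4)
length(res4)   # 4 ones x 4 zeros = 16 distinct swaps in column 4
```

Applying lapply(seq_len(ncol(M)), swap_two, M = M) would give the swaps for every column, as a nested list with one element per column.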

Changing the values in a binary matrix

Consider the 8 by 6 binary matrix, M:
M <- matrix(c(0,0,1,1,0,0,1,1,
0,1,1,0,0,1,1,0,
0,0,0,0,1,1,1,1,
0,1,0,1,1,0,1,0,
0,0,1,1,1,1,0,0,
0,1,1,0,1,0,0,1),nrow = 8,ncol = 6)
Here is M:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 1 0 1 0 1
[3,] 1 1 0 0 1 1
[4,] 1 0 0 1 1 0
[5,] 0 0 1 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
The following matrix contains the row indices of the 1's in each column of M:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 2 5 2 3 2
[2,] 4 3 6 4 4 3
[3,] 7 6 7 5 5 5
[4,] 8 7 8 7 6 8
Let's denote it by ind:
ind <- matrix(c(3,4,7,8,
2,3,6,7,
5,6,7,8,
2,4,5,7,
3,4,5,6,
2,3,5,8),nrow = 4, ncol=6)
I'm trying to change a single 1 into a 0 in each column of M.
For example, one possible choice of the 1s to change would be (4,2,5,4,3,2), i.e. the 4th position of column 1, the 2nd position of column 2, the 5th position of column 3 and so on. Let N be the resulting matrix. This choice produces the following matrix N:
N <- matrix(c(0,0,1,0,0,0,1,1,
0,0,1,0,0,1,1,0,
0,0,0,0,0,1,1,1,
0,1,0,0,1,0,1,0,
0,0,0,1,1,1,0,0,
0,0,1,0,1,0,0,1),nrow = 8,ncol = 6)
Here is that N
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 1 0 0
[3,] 1 1 0 0 0 1
[4,] 0 0 0 0 1 0
[5,] 0 0 0 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
For EACH of the resulting matrices N, I do the following calculation:
X <- cbind(c(rep(1,nrow(N))),N)
ans <- sum(diag(solve(t(X)%*%X)[-1,-1]))
Then, I want to obtain the matrix N that produces the smallest value of ans. How do I do this efficiently?
Let me know if this works.
We first build a conversion function that I'll need, and also the reverse function, as you may need it at some point:
ind_to_M <- function(ind){
M <- matrix(rep(0,6*8),ncol=6)
for(i in 1:ncol(ind)){M[ind[,i],i] <- 1}
return(M)
}
M_to_ind <- function(M){apply(M==1,2,which)}
Then we will build a matrix of possible ways to ditch a value
all_possible_ways_to_ditch_value <- 1:4
for (i in 2:ncol(M)){
all_possible_ways_to_ditch_value <- merge(all_possible_ways_to_ditch_value,1:4,by=NULL)
}
# there's probably a more elegant way to do that
head(all_possible_ways_to_ditch_value)
# x y.x y.y y.x y.y y
# 1 1 1 1 1 1 1 # will be used to ditch the 1st value of ind for every column
# 2 2 1 1 1 1 1
# 3 3 1 1 1 1 1
# 4 4 1 1 1 1 1
# 5 1 2 1 1 1 1
# 6 2 2 1 1 1 1
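One candidate for the more elegant way mentioned in the comment is expand.grid, which builds the same kind of cross product in a single call. A sketch, assuming 6 columns with 4 choices each as in this example (the name ways is made up):

```r
# One factor 1:4 per column of M (6 columns here); the first column varies fastest
ways <- expand.grid(rep(list(1:4), 6))

dim(ways)    # 4^6 = 4096 rows, 6 columns
head(ways)
```

The rows come out in the same order as in the head() output above, with the first column cycling through 1 to 4 before the second one changes.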
Then we iterate through those, each time storing ans and N (as data is quite small overall).
ans_list <- list()
N_list <- list()
for(j in 1:nrow(all_possible_ways_to_ditch_value)){
#print(j)
ind_N <- matrix(rep(0,6*3),ncol=6) # initiate ind_N as an empty matrix
for(i in 1:ncol(M)){
ind_N[,i] <- ind[-all_possible_ways_to_ditch_value[j,i],i] # fill with ind except for the value we ditch
}
N <- ind_to_M(ind_N)
X <- cbind(c(rep(1,nrow(N))),N)
ans_list[[j]] <- try(sum(diag(solve(t(X)%*%X)[-1,-1])),silent=TRUE) # some systems are not well defined, we'll just ignore the errors
N_list[[j]] <- N
}
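The value computed inside the loop can also be wrapped in a small helper that treats singular and nearly singular systems explicitly. This is a sketch: the name score and the tolerance 1e-10 are assumptions, not part of the code above. Because t(X) %*% X is positive semi-definite, the diagonal of its inverse should be positive whenever the inverse exists, so a huge negative value is most likely a numerical artefact of a nearly singular system.

```r
# Sum of the diagonal of (X'X)^-1 without the intercept entry, or NA if ill-conditioned
score <- function(N, tol = 1e-10) {
  X <- cbind(1, N)          # prepend the intercept column
  A <- crossprod(X)         # same as t(X) %*% X
  ok <- tryCatch(rcond(A) > tol, error = function(e) FALSE)
  if (!ok) return(NA_real_) # singular or nearly singular: no usable value
  sum(diag(solve(A))[-1])
}

score(diag(3)[, 1:2])       # well-conditioned toy input
score(matrix(0, 3, 2))      # all-zero columns, so X'X is singular
```

Storing vapply(N_list, score, numeric(1)) gives a plain numeric vector, on which which.min ignores the NA entries.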
We finally retrieve the minimal ans and the relevant N
ans <- ans_list[[which.min(ans_list)]]
# [1] -3.60288e+15
N <- N_list[[which.min(ans_list)]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 0 0 0
# [2,] 0 1 0 1 0 1
# [3,] 1 1 0 0 1 1
# [4,] 1 0 0 1 1 0
# [5,] 0 0 1 1 1 1
# [6,] 0 1 1 0 0 0
# [7,] 1 0 1 0 0 0
# [8,] 0 0 0 0 0 0
EDIT:
To get the minimal positive ans (the code below picks the ans that is smallest in absolute value, which is positive here):
ans_list[which(!sapply(ans_list,is.numeric))] <- Inf
ans <- ans_list[[which.min(abs(unlist(ans_list)))]]
# [1] 3.3
N <- N_list[[which.min(abs(unlist(ans_list)))]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 0 0 0
# [2,] 0 1 0 1 0 0
# [3,] 1 1 0 0 0 1
# [4,] 1 0 0 0 1 0
# [5,] 0 0 0 1 1 1
# [6,] 0 1 1 0 1 0
# [7,] 1 0 1 1 0 0
# [8,] 0 0 1 0 0 1
EDIT 2 : to generalize the number of rows of ind to ditch
It seems to give the same result for ans for n_ditch = 1, and results make sense for n_ditch = 2
n_ditch <- 2
ditch_possibilities <- combn(1:4,n_ditch) # these are all the possible sets of indices to ditch for one given columns
all_possible_ways_to_ditch_value <- 1:ncol(ditch_possibilities) # this will be all the possible sets of indices of ditch_possibilities to test
for (i in 2:ncol(M)){
all_possible_ways_to_ditch_value <- merge(all_possible_ways_to_ditch_value,1:ncol(ditch_possibilities),by=NULL)
}
ans_list <- list()
N_list <- list()
for(j in 1:nrow(all_possible_ways_to_ditch_value)){
#print(j)
ind_N <- matrix(rep(0,6*(4-n_ditch)),ncol=6) # initiate ind_N as an empty matrix
for(i in 1:ncol(M)){
ind_N[,i] <- ind[-ditch_possibilities[,all_possible_ways_to_ditch_value[j,i]],i] # fill with ind except for the value we ditch
}
N <- ind_to_M(ind_N)
X <- cbind(c(rep(1,nrow(N))),N)
ans_list[[j]] <- try(sum(diag(solve(t(X)%*%X)[-1,-1])),silent=TRUE) # some systems are not well defined, we'll just ignore the errors
N_list[[j]] <- N
}

one in a matrix full of zeros - R

I am new to programming and I am trying to figure out how to make a matrix of all zeros and insert just a single one at a random position.
I've looked for help but I can only find code to create a random matrix with zeros and ones but I only want a "one" to appear at random places in a matrix.
I've looked in here for example,
http://www.r-bloggers.com/making-matrices-with-zeros-and-ones/
set.seed(1)
mm <- matrix(0, 10, 5)
apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 0 1 0 1
# [2,] 0 0 0 1 1
# [3,] 1 1 1 0 1
# [4,] 1 0 0 0 1
# [5,] 0 1 0 1 1
# [6,] 1 0 0 1 1
# [7,] 1 1 0 1 0
# [8,] 1 1 0 0 0
# [9,] 1 0 1 1 1
# [10,] 0 1 0 0 1
Creating an all-zeros matrix is easy:
X <- matrix(0, 10, 10)
Now notice that a matrix in R is stored as a vector with an additional dimension attribute:
> str(X)
num [1:10, 1:10] 0 0 0 0 0 0 0 0 0 0 ...
So if you want to insert a 1 at a random position, then just pick a random position in the vector of length N*M and replace it with the value:
X[sample(10*10, 1)] <- 1
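If the row and column of the chosen cell are also needed, the linear position can be translated back with arrayInd. A short sketch; the seed and the variable name pos are arbitrary choices for illustration:

```r
set.seed(42)                      # arbitrary seed, only to make the example repeatable
X <- matrix(0, 10, 10)

pos <- sample(length(X), 1)       # one linear index between 1 and 100
X[pos] <- 1

rc <- arrayInd(pos, dim(X))       # 1 x 2 matrix holding (row, column) of the 1
sum(X)                            # exactly one cell was changed
```

Using length(X) instead of a hard-coded 10*10 keeps the code correct if the matrix dimensions change.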

Sampling and replacing random elements of a vector, conditionally

Suppose I have a vector containing data:
c <- c(1:100)
c[1:75] <- 0
c[76:100] <- 1
What I need to do is select a number of the 0's and turn them into 1's. There are potentially many ways to do this (if I'm switching 25 of the 0's, it'd be 75 choose 25, so about 5.26x10^19), so I need to do it randomly, say, 1000 times. (This is part of a larger model. I'll be using the mean of the results.)
I know (or think) that I need to use sample() and a for loop, but how do I select n values randomly among the 0's and then change them to 1's?
vec <- c(rep(0, 75), rep(1, 25))
n <- 25
to_change <- sample(which(vec == 0), n)
modified_vec <- vec
modified_vec[to_change] <- 1
Something like this. You could wrap it up in a function.
And you should really do it in a matrix with apply, rather than a for loop.
With this small example it is easy to see it work:
n_vecs <- 5
vec_length <- 10
n_0 <- 7 # Number of 0's at the start of each vector
vec_mat <- matrix(c(rep(0, n_vecs * n_0), rep(1, n_vecs * (vec_length - n_0))),
nrow = vec_length, ncol = n_vecs, byrow = TRUE)
> vec_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 0 0 0
[6,] 0 0 0 0 0
[7,] 0 0 0 0 0
[8,] 1 1 1 1 1
[9,] 1 1 1 1 1
[10,] 1 1 1 1 1
change_n_0 <- function(x, n) {
x_change <- sample(which(x == 0), n)
x[x_change] <- 1
return(x)
}
vec_mat <- apply(vec_mat, MARGIN = 2, FUN = change_n_0, n = 2)
> vec_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 0 1
[2,] 0 0 0 1 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 1 0 1
[6,] 0 1 0 1 0
[7,] 1 0 1 0 0
[8,] 1 1 1 1 1
[9,] 1 1 1 1 1
[10,] 1 1 1 1 1
You can scale up the constants at the beginning as big as you'd like.
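For the 1000 random repetitions mentioned in the question, replicate is one possible shortcut. A sketch under the question's numbers (75 zeros, 25 ones, 25 flips); the helper name flip_zeros is made up, and sample.int is used on the positions so that a single remaining zero is not misread by sample() as a range:

```r
set.seed(1)                                   # arbitrary seed for reproducibility
vec <- c(rep(0, 75), rep(1, 25))

# Turn n randomly chosen 0's of x into 1's
flip_zeros <- function(x, n) {
  zeros <- which(x == 0)
  stopifnot(n <= length(zeros))               # cannot flip more 0's than exist
  x[zeros[sample.int(length(zeros), n)]] <- 1
  x
}

sims <- replicate(1000, flip_zeros(vec, 25))  # 100 x 1000 matrix, one run per column
avg  <- rowMeans(sims)                        # share of runs in which each position is 1
```

Each column of sims is one modified vector, so summaries such as rowMeans(sims) can be taken directly across the runs.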

Resources