I would like to transform a vector of integers such as:
vector = c(0,6,1,8,5,4,2)
length(vector) = 7
max(vector) = 8
into a matrix m of nrow = length(vector) and ncol = max(vector) :
m =
0 0 0 0 0 0 0 0
1 1 1 1 1 1 0 0
1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 0 0 0
1 1 1 1 0 0 0 0
1 1 0 0 0 0 0 0
This is just an example of what I am trying to do. I want the function to work with any vector of integers.
I tried to use mapply(rep, 1, vector), but I obtained a list and did not succeed in converting it into a matrix...
It would be very useful for me if someone could help.
Best Regards,
Maxime
If you use c(rep(1, x), rep(0, max(vector) - x)) on each element x of your variable vector, you get the desired binary rows. Because every row has the same length, looping over the elements with sapply returns a matrix directly. You only need to transpose it afterwards to get your result.
vector = c(0,6,1,8,5,4,2)
result <- t(sapply(vector, function(x) c(rep(1, x), rep(0, max(vector)-x))))
is.matrix(result)
#> [1] TRUE
result
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 1 1 1 1 1 1 0 0
#> [3,] 1 0 0 0 0 0 0 0
#> [4,] 1 1 1 1 1 1 1 1
#> [5,] 1 1 1 1 1 0 0 0
#> [6,] 1 1 1 1 0 0 0 0
#> [7,] 1 1 0 0 0 0 0 0
Putting that into a function is easy:
binaryMatrix <- function(v) {
t(sapply(v, function(x) c(rep(1, x), rep(0, max(v)-x))))
}
binaryMatrix(vector)
# same result as before
Created on 2021-02-14 by the reprex package (v1.0.0)
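As a vectorized alternative to the sapply loop above, the same matrix can probably be obtained by comparing each element with the column numbers 1..max(v) via outer. This is a minimal sketch, not part of the original answer: the name binaryMatrixOuter is made up for illustration, and it assumes v holds non-negative integers.

```r
# Hypothetical helper (name chosen for illustration): entry [i, j] is 1
# exactly when j <= v[i], i.e. row i starts with v[i] ones.
binaryMatrixOuter <- function(v) {
  cols <- seq_len(max(v))        # column numbers 1..max(v)
  +outer(v, cols, ">=")          # logical comparison; unary + converts to 0/1
}

vector <- c(0, 6, 1, 8, 5, 4, 2)
m <- binaryMatrixOuter(vector)
dim(m)
rowSums(m)   # should reproduce the input vector
```

No transpose is needed here because outer already puts the elements of v along the rows.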
Another straightforward approach would be to exploit matrix sub-assignment using row/column indices in a matrix form (see, also, ?Extract).
Define a matrix of 0s:
x = c(0, 6, 1, 8, 5, 4, 2)
m = matrix(0L, nrow = length(x), ncol = max(x))
And fill with 1s:
i = rep(seq_along(x), x) ## row indices of 1s
j = sequence(x) ## column indices of 1s
ij = cbind(i, j)
m[ij] = 1L
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] 0 0 0 0 0 0 0 0
#[2,] 1 1 1 1 1 1 0 0
#[3,] 1 0 0 0 0 0 0 0
#[4,] 1 1 1 1 1 1 1 1
#[5,] 1 1 1 1 1 0 0 0
#[6,] 1 1 1 1 0 0 0 0
#[7,] 1 1 0 0 0 0 0 0
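The two steps can be combined into a small function. This is only a sketch under the assumption that x contains non-negative integers; the function name ones_by_index is hypothetical.

```r
# Hypothetical wrapper around the index-matrix sub-assignment shown above.
ones_by_index <- function(x) {
  m <- matrix(0L, nrow = length(x), ncol = max(x))
  i <- rep(seq_along(x), x)   # row i is repeated x[i] times
  j <- sequence(x)            # 1..x[i] for each i, concatenated
  m[cbind(i, j)] <- 1L        # one assignment for all (row, column) pairs
  m
}

x <- c(0, 6, 1, 8, 5, 4, 2)
m <- ones_by_index(x)
all(rowSums(m) == x)   # each row holds x[i] ones
```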
Assuming that all values in the vector are non-negative integers, you can define the following function
transformVectorToMatrix <- function(v) {
nrOfCols <- max(v)
zeroRow <- integer(nrOfCols)
do.call("rbind",lapply(v,function(nrOfOnes) {
if(nrOfOnes==0) return(zeroRow)
if(nrOfOnes==nrOfCols) return(zeroRow+1)
c(integer(nrOfOnes)+1,integer(nrOfCols-nrOfOnes))
}))
}
and finally do
m = transformVectorToMatrix(vector)
to get your desired binary matrix.
I have a binary vector that holds information on whether or not some event happened for some observation:
v <- c(0,1,1,0)
What I want to achieve is a matrix that holds information on all bivariate pairs of observations in this vector. That is, if two observations both have 0 or both have 1 in this vector v, they should get a 1 in the matrix. If one has 0 and the other has 1, they should get a 0.
Hence, the goal is this matrix:
[,1] [,2] [,3] [,4]
[1,] 0 0 0 1
[2,] 0 0 1 0
[3,] 0 1 0 0
[4,] 1 0 0 0
Whether the main diagonal is 0 or 1 does not matter for me.
Is there an efficient and simple way to achieve this that does not require a combination of if statements and for loops? v might be of considerable size.
Thanks!
We can use outer
out <- outer(v, v, `==`)
diag(out) <- 0L # as you don't want to compare each element to itself
out
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 1
#[2,] 0 0 1 0
#[3,] 0 1 0 0
#[4,] 1 0 0 0
Another option with expand.grid is to create all pairwise combinations of v with itself. Since v only holds the values 0 and 1, the matching pairs are those whose sum is 0 (0 + 0) or 2 (1 + 1).
inds <- rowSums(expand.grid(v, v))
matrix(+(inds == 0 | inds == 2), nrow = length(v))
# [,1] [,2] [,3] [,4]
#[1,] 1 0 0 1
#[2,] 0 1 1 0
#[3,] 0 1 1 0
#[4,] 1 0 0 1
Since the diagonal elements are not important to you, I will keep them as they are; if you want to change them, you can use diag as shown in @markus's answer.
Another (slightly less efficient) approach than the use of outer would be sapply:
out <- sapply(v, function(x){
x == v
})
diag(out) <- 0L
out
[,1] [,2] [,3] [,4]
[1,] 0 0 0 1
[2,] 0 0 1 0
[3,] 0 1 0 0
[4,] 1 0 0 0
microbenchmark on a vector of length 1000:
> test <- microbenchmark("LAP" = sapply(v, function(x){
+ x == v
+ }),
+ "markus" = outer(v, v, `==`), times = 1000, unit = "ms")
> test
Unit: milliseconds
expr min lq mean median uq max neval
LAP 3.973111 4.065555 5.747905 4.573002 6.324607 101.03498 1000
markus 3.515725 3.535067 4.852606 3.694924 4.908930 84.85184 1000
If you allow the main diagonal to be 1, then there will always be two unique rows v and 1 - v in this matrix no matter how large v is. Since the matrix is symmetric, it also has two such unique columns. This makes it trivial to construct this matrix.
## example `v`
set.seed(0)
v <- sample.int(2, 10, replace = TRUE) - 1L
#[1] 1 0 0 1 1 0 1 1 1 1
## column expansion from unique columns
cbind(v, 1 - v, deparse.level = 0L)[, 2 - v]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 0 0 1 1 0 1 1 1 1
# [2,] 0 1 1 0 0 1 0 0 0 0
# [3,] 0 1 1 0 0 1 0 0 0 0
# [4,] 1 0 0 1 1 0 1 1 1 1
# [5,] 1 0 0 1 1 0 1 1 1 1
# [6,] 0 1 1 0 0 1 0 0 0 0
# [7,] 1 0 0 1 1 0 1 1 1 1
# [8,] 1 0 0 1 1 0 1 1 1 1
# [9,] 1 0 0 1 1 0 1 1 1 1
#[10,] 1 0 0 1 1 0 1 1 1 1
What is the purpose of this matrix?
If there are n0 zeros and n1 ones, the matrix will have dimension (n0 + n1) x (n0 + n1) and contain (n0 x n0 + n1 x n1) ones. So for a long vector v, the matrix is highly redundant: it has a large number of duplicated rows / columns and can be described far more compactly than by storing every entry.
Obviously, if you want to store the position of 1 in this matrix, you can simply get it without forming this matrix at all.
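As a rough illustration of that last point, the positions of the 1s can be listed directly from v by grouping the indices by value: every pair of indices within the same group is a 1. This is a sketch, not the answerer's code; the names groups and pairs are made up here, and the full matrix is built at the end only to check the result.

```r
v <- c(0, 1, 1, 0)

# Indices of the observations, grouped by their value in v.
groups <- split(seq_along(v), v)

# Within each group, every (row, col) combination is a position of a 1.
pairs <- do.call(rbind, lapply(groups, function(idx) {
  expand.grid(row = idx, col = idx)
}))

# Check against the full matrix (diagonal included).
m <- matrix(0L, length(v), length(v))
m[as.matrix(pairs)] <- 1L
all(m == outer(v, v, "=="))
```

For a long v, groups alone (length(v) integers in total) already encodes the whole matrix, so pairs only needs to be expanded when the explicit positions are required.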
Consider the 8 by 6 binary matrix, M:
M <- matrix(c(0,0,1,1,0,0,1,1,
0,1,1,0,0,1,1,0,
0,0,0,0,1,1,1,1,
0,1,0,1,1,0,1,0,
0,0,1,1,1,1,0,0,
0,1,1,0,1,0,0,1),nrow = 8,ncol = 6)
Here is the M
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 1 0 1 0 1
[3,] 1 1 0 0 1 1
[4,] 1 0 0 1 1 0
[5,] 0 0 1 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
The following matrix contains, for each column of M, the row indices of the 1's in that column
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 2 5 2 3 2
[2,] 4 3 6 4 4 3
[3,] 7 6 7 5 5 5
[4,] 8 7 8 7 6 8
Let's denote that
ind <- matrix(c(3,4,7,8,
2,3,6,7,
5,6,7,8,
2,4,5,7,
3,4,5,6,
2,3,5,8),nrow = 4, ncol=6)
I'm trying to change a single 1 into a 0 in each column of M.
For example, one possible choice of the 1's to change, one per column, would be (4,2,5,4,3,2), i.e. the 4th position of Column 1, the 2nd position of Column 2, the 5th position of Column 3 and so on. This will produce the following matrix N
N <- matrix(c(0,0,1,0,0,0,1,1,
0,0,1,0,0,1,1,0,
0,0,0,0,0,1,1,1,
0,1,0,0,1,0,1,0,
0,0,0,1,1,1,0,0,
0,0,1,0,1,0,0,1),nrow = 8,ncol = 6)
Here is that N
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 1 0 0
[3,] 1 1 0 0 0 1
[4,] 0 0 0 0 1 0
[5,] 0 0 0 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
For EACH of the resulting matrices N, I do the following calculation.
X <- cbind(c(rep(1,nrow(N))),N)
ans <- sum(diag(solve(t(X)%*%X)[-1,-1]))
Then, I want to obtain the matrix N which produces the smallest value of ans. How do I do this efficiently?
Let me know if this works.
We first build the conversion function that I'll need, and also the reverse function, as you may need it at some point:
ind_to_M <- function(ind){
M <- matrix(rep(0,6*8),ncol=6)
for(i in 1:ncol(ind)){M[ind[,i],i] <- 1}
return(M)
}
M_to_ind <- function(M){apply(M==1,2,which)}
Then we will build a matrix of possible ways to ditch a value
all_possible_ways_to_ditch_value <- 1:4
for (i in 2:ncol(M)){
all_possible_ways_to_ditch_value <- merge(all_possible_ways_to_ditch_value,1:4,by=NULL)
}
# there's probably a more elegant way to do that
head(all_possible_ways_to_ditch_value)
# x y.x y.y y.x y.y y
# 1 1 1 1 1 1 1 # will be used to ditch the 1st value of ind for every column
# 2 2 1 1 1 1 1
# 3 3 1 1 1 1 1
# 4 4 1 1 1 1 1
# 5 1 2 1 1 1 1
# 6 2 2 1 1 1 1
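One possibly more elegant way to build the same table of combinations is expand.grid, which creates every combination of its arguments in one call. This is a sketch; n_cols and n_ones stand in for ncol(M) and nrow(ind) and are assumptions of this example.

```r
n_cols <- 6   # number of columns of M (assumed here)
n_ones <- 4   # number of 1s per column, i.e. nrow(ind) (assumed here)

# One argument per column of M, each ranging over the positions 1..n_ones.
all_ways <- expand.grid(rep(list(seq_len(n_ones)), n_cols))

dim(all_ways)       # n_ones^n_cols rows, n_cols columns
head(all_ways, 3)   # first column varies fastest, as in the merge version
```

The columns are named Var1 to Var6 instead of x, y.x, ..., but indexing by position (all_ways[j, i]) works the same way.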
Then we iterate through those, each time storing ans and N (as data is quite small overall).
ans_list <- list()
N_list <- list()
for(j in 1:nrow(all_possible_ways_to_ditch_value)){
#print(j)
ind_N <- matrix(rep(0,6*3),ncol=6) # initiate ind_N as an empty matrix
for(i in 1:ncol(M)){
ind_N[,i] <- ind[-all_possible_ways_to_ditch_value[j,i],i] # fill with ind except for the value we ditch
}
N <- ind_to_M(ind_N)
X <- cbind(c(rep(1,nrow(N))),N)
ans_list[[j]] <- try(sum(diag(solve(t(X)%*%X)[-1,-1])),silent=TRUE) # some systems are not well defined, we'll just ignore the errors
N_list[[j]] <- N
}
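The criterion can also be wrapped in a small helper that returns NA instead of a try-error object, so that the results can be kept in a plain numeric vector and passed to which.min directly (which.min discards NA values). This is a sketch under that assumption; criterion is a hypothetical name, and crossprod(X) is just a shorter way to write t(X) %*% X.

```r
# Hypothetical helper: sum of the diagonal of the inverse of X'X,
# leaving out the intercept entry.
criterion <- function(N) {
  X <- cbind(1, N)                      # add the intercept column
  tryCatch(
    sum(diag(solve(crossprod(X))[-1, -1])),
    error = function(e) NA_real_        # singular system: no usable value
  )
}

# Small invertible example (3 x 2), chosen only for illustration.
N_ok <- matrix(c(1, 0, 0,
                 0, 1, 0), nrow = 3)
criterion(N_ok)   # approximately 4

# An all-zero N gives a singular X'X, so the helper returns NA.
criterion(matrix(0, nrow = 3, ncol = 2))
```

With such a helper, the loop could store its results with something like ans_vec[j] <- criterion(N) in a numeric vector ans_vec (a name assumed here) rather than in a list.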
We finally retrieve the minimal ans and the relevant N
ans <- ans_list[[which.min(ans_list)]]
# [1] -3.60288e+15
N <- N_list[[which.min(ans_list)]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 0 0 0
# [2,] 0 1 0 1 0 1
# [3,] 1 1 0 0 1 1
# [4,] 1 0 0 1 1 0
# [5,] 0 0 1 1 1 1
# [6,] 0 1 1 0 0 0
# [7,] 1 0 1 0 0 0
# [8,] 0 0 0 0 0 0
EDIT:
To get the minimal ans in absolute value (results that are errors are set to Inf so that they are ignored)
ans_list[which(!sapply(ans_list,is.numeric))] <- Inf
ans <- ans_list[[which.min(abs(unlist(ans_list)))]]
# [1] 3.3
N <- N_list[[which.min(abs(unlist(ans_list)))]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 0 0 0
# [2,] 0 1 0 1 0 0
# [3,] 1 1 0 0 0 1
# [4,] 1 0 0 0 1 0
# [5,] 0 0 0 1 1 1
# [6,] 0 1 1 0 1 0
# [7,] 1 0 1 1 0 0
# [8,] 0 0 1 0 0 1
EDIT 2 : to generalize the number of rows of ind to ditch
It seems to give the same result for ans for n_ditch = 1, and results make sense for n_ditch = 2
n_ditch <- 2
ditch_possibilities <- combn(1:4,n_ditch) # these are all the possible sets of indices to ditch for one given column
all_possible_ways_to_ditch_value <- 1:ncol(ditch_possibilities) # this will be all the possible sets of indices of ditch_possibilities to test
for (i in 2:ncol(M)){
all_possible_ways_to_ditch_value <- merge(all_possible_ways_to_ditch_value,1:ncol(ditch_possibilities),by=NULL)
}
ans_list <- list()
N_list <- list()
for(j in 1:nrow(all_possible_ways_to_ditch_value)){
#print(j)
ind_N <- matrix(rep(0,6*(4-n_ditch)),ncol=6) # initiate ind_N as an empty matrix
for(i in 1:ncol(M)){
ind_N[,i] <- ind[-ditch_possibilities[,all_possible_ways_to_ditch_value[j,i]],i] # fill with ind except for the value we ditch
}
N <- ind_to_M(ind_N)
X <- cbind(c(rep(1,nrow(N))),N)
ans_list[[j]] <- try(sum(diag(solve(t(X)%*%X)[-1,-1])),silent=TRUE) # some systems are not well defined, we'll just ignore the errors
N_list[[j]] <- N
}
I want to generate a 7 column by 10 row matrix with a total of exactly 20 randomly generated 1's, but with at least two 1's per row and two 1's per column. How could I do that?
Also, how would the code be different if I wanted to set a range of acceptable row and column totals instead of minimums?
Thanks!
I feel like there should be a more elegant solution, but here's a ball of duct tape:
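matbuilder <- function(n,nrow,ncol) {
  finished <- FALSE
  while(!finished) {
    # shuffle n ones and nrow*ncol-n zeros into a matrix
    trial <- matrix(sample(c(rep(1,n),rep(0,nrow*ncol-n))),nrow=nrow,ncol=ncol)
    # accept only if every row and every column has at least two 1's
    if(all(rowSums(trial)>=2) && all(colSums(trial)>=2)) finished <- TRUE
  }
  return(trial)
}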
x <- matbuilder(20, 10, 7)
x
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 1 0 0 0 0 0
## [2,] 0 0 0 0 1 0 1
## [3,] 0 1 0 0 0 1 0
## [4,] 1 0 0 0 1 0 0
## [5,] 0 1 0 0 0 1 0
## [6,] 0 0 1 1 0 0 0
## [7,] 0 0 1 1 0 0 0
## [8,] 0 0 0 0 1 0 1
## [9,] 0 0 0 0 0 1 1
## [10,] 0 1 1 0 0 0 0
sum(x)
## [1] 20
rowSums(x)
## [1] 2 2 2 2 2 2 2 2 2 2
colSums(x)
## [1] 2 4 3 2 3 3 3
Or, to give a range of acceptable row/column totals...
matbuilder <- function(n,nrow,ncol,rowmin,rowmax,colmin,colmax,ntimeout=100000) {
finished <- F
i <- 1
trial <- NA
while(!finished) {
trial <- matrix(sample(c(rep(1,n),rep(0,nrow*ncol-n))),nrow=nrow,ncol=ncol)
if(all(rowSums(trial)>=rowmin) & all(rowSums(trial)<=rowmax) & all(colSums(trial)>=colmin) & all(colSums(trial)<=colmax)) finished <- T
i <- i+1
if(i>ntimeout) {
finished <- T
cat("sorry boss, timeout.")
}
}
return(trial)
}
x <- matbuilder(25,10,7,rowmin=2,rowmax=3,colmin=2,colmax=4)
x
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 0 0 0 0 1 1
## [2,] 0 1 1 0 0 0 1
## [3,] 1 0 0 0 1 0 0
## [4,] 1 0 1 1 0 0 0
## [5,] 1 0 0 0 0 1 1
## [6,] 0 1 1 1 0 0 0
## [7,] 0 0 0 1 0 0 1
## [8,] 0 0 1 0 1 0 0
## [9,] 0 0 0 1 1 0 0
## [10,] 0 0 0 0 1 1 0
sum(x)
## [1] 25
rowSums(x)
## [1] 3 3 2 3 3 3 2 2 2 2
colSums(x)
## [1] 4 2 4 4 4 3 4
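Because the loop above only stops on success or timeout, it can help to rule out impossible requests before sampling. The check below is a sketch of necessary (not sufficient) conditions on the totals; the function name feasible_totals is hypothetical, and its arguments mirror those of matbuilder.

```r
# Hypothetical pre-check: n ones must be able to satisfy every row and
# column minimum without exceeding any row or column maximum.
# These conditions are necessary, not sufficient.
feasible_totals <- function(n, nrow, ncol, rowmin, rowmax, colmin, colmax) {
  rowmax <- min(rowmax, ncol)   # a 0/1 row cannot hold more than ncol ones
  colmax <- min(colmax, nrow)   # a 0/1 column cannot hold more than nrow ones
  n >= max(nrow * rowmin, ncol * colmin) &&
    n <= min(nrow * rowmax, ncol * colmax)
}

feasible_totals(25, 10, 7, rowmin = 2, rowmax = 3, colmin = 2, colmax = 4)   # TRUE
feasible_totals(19, 10, 7, rowmin = 2, rowmax = 7, colmin = 2, colmax = 10)  # FALSE
```

In the second call, 10 rows with at least two 1's each already need 20 ones, so 19 can never work and the sampling loop would only end by timeout.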
This one involves sampling a matrix of indices such that each row is repeated twice and columns are repeated at least 2 times.
set.seed(42)
m = matrix(rep(0, 70), nrow = 10)
#Sample rows 1-10 twice
rows = sample(c(1:10, 1:10))
#Sample columns 1-7 twice and additional 6 to make 20
columns = sample(c(sample(1:7, 6, replace = TRUE), 1:7, 1:7))
#Create a matrix of indices that should be 1
inds = cbind(rows, columns)
#Remove duplicates in inds if any (Refer: https://stackoverflow.com/q/44555420/7128934)
inds[,2] = replace(x = inds[,2],
list = duplicated(inds),
values = sample(x = columns[!(columns %in% inds[,2][duplicated(inds)])],
size = 1))
m[inds] = 1
#Check
rowSums(m)
#[1] 2 2 2 2 2 2 2 2 2 2
colSums(m)
#[1] 4 2 2 3 2 2 5
sum(m)
#[1] 20
Suppose I have a vector containing data:
c <- c(1:100)
c[1:75] <- 0
c[76:100] <- 1
What I need to do is select a number of the 0's and turn them into 1's. There are potentially many ways to do this (if I'm switching 25 of the 0's, it's 75 choose 25, so about 5.26x10^19), so I need to do it, say, 1000 times randomly. (This is part of a larger model. I'll be using the mean of the results.)
I know (or think) that I need to use sample() and a for loop, but how do I select n values randomly among the 0's, then change them to 1's?
vec <- c(rep(0, 75), rep(1, 25))
n <- 25
to_change <- sample(which(vec == 0), n)
modified_vec <- vec
modified_vec[to_change] <- 1
Something like this. You could wrap it up in a function.
And you should really do it in a matrix with apply, rather than a for loop.
It is easy to see this work on a small example:
n_vecs <- 5
vec_length <- 10
n_0 <- 7 # Number of 0's at the start of each vector
vec_mat <- matrix(c(rep(0, n_vecs * n_0), rep(1, n_vecs * (vec_length - n_0))),
nrow = vec_length, ncol = n_vecs, byrow = T)
> vec_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 0 0 0
[6,] 0 0 0 0 0
[7,] 0 0 0 0 0
[8,] 1 1 1 1 1
[9,] 1 1 1 1 1
[10,] 1 1 1 1 1
change_n_0 <- function(x, n) {
x_change <- sample(which(x == 0), n)
x[x_change] <- 1
return(x)
}
vec_mat <- apply(vec_mat, MARGIN = 2, FUN = change_n_0, n = 2)
> vec_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 0 1
[2,] 0 0 0 1 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 1 0 1
[6,] 0 1 0 1 0
[7,] 1 0 1 0 0
[8,] 1 1 1 1 1
[9,] 1 1 1 1 1
[10,] 1 1 1 1 1
You can scale up the constants at the beginning as big as you'd like.
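For the 1000 repetitions and the mean mentioned in the question, replicate gives the same column-per-simulation layout in one call. This is a minimal sketch; flip_zeros and n_sims are names introduced here, and the helper indexes with sample.int so that a single remaining 0 at position k is not misread by sample() as the range 1:k.

```r
set.seed(1)                       # only for reproducibility of the sketch
vec <- c(rep(0, 75), rep(1, 25))
n_sims <- 1000

# Hypothetical helper: turn n randomly chosen 0s of x into 1s.
flip_zeros <- function(x, n) {
  zeros <- which(x == 0)
  x[zeros[sample.int(length(zeros), n)]] <- 1
  x
}

sims <- replicate(n_sims, flip_zeros(vec, 25))  # 100 x 1000 matrix
dim(sims)
all(colSums(sims) == 50)          # every simulation now has 25 + 25 ones
head(rowMeans(sims))              # share of simulations in which each 0 was flipped
```

Each column of sims is one modified copy of vec, so any per-simulation result of the larger model can be computed with apply(sims, 2, ...) and then averaged.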