Consider the following matrix:
set.seed(3)
nn <- 9 # This is always fixed
mm <- 6 # This is always multiple of 3. Other possible values are 9,12,15 etc.
testMat <- matrix(rbinom(nn*mm,1,.5), nrow = nn, ncol = mm)
I am trying to take, row by row, the products of all possible combinations of the first 3 columns with the next 3 columns. With the help of solutions found online, I can do it the following way:
testMat1 <- testMat[,1:3]
testMat2 <- testMat[,4:6]
t(sapply(1:nn, function(i) tcrossprod(testMat1[i, ], testMat2[i, ])))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 0 1 1 0 1 1 0 0 0
[2,] 1 1 0 0 0 0 0 0 0
[3,] 0 1 0 0 1 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 1 1 0
[6,] 0 0 0 1 1 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 1 0 0 1
[9,] 0 0 0 0 0 0 1 1 1
However, it requires me to split testMat manually. I would like to automate this for cases where mm is greater than 6, for example 9. Can you suggest an efficient way to do this?
I propose this solution to the question I asked. However, I would appreciate it if someone can propose a more elegant solution.
test.WinCombo <- matrix(NA, nrow = nn, ncol = 3^(mm/3))
for (i in 1:nrow(testMat)){
temp.split <- split(testMat[i,],ceiling(seq_along(testMat[i,])/3))
test.WinCombo[i,] <- apply(expand.grid(temp.split), 1, prod)
}
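The loop above can also be written without iterating over rows. The following is a minimal sketch, assuming the column blocks always have a fixed width (3 in the question); the function name rowwise_block_products and its block argument are made up for illustration. It multiplies the blocks together with Reduce and keeps the same column order as the expand.grid version, where the first block varies fastest.

```r
# Hypothetical helper: row-wise products over all combinations of column blocks.
rowwise_block_products <- function(mat, block = 3) {
  stopifnot(ncol(mat) %% block == 0)
  starts <- seq(1, ncol(mat), by = block)
  # Split the matrix into consecutive blocks of `block` columns
  blocks <- lapply(starts, function(s) mat[, s:(s + block - 1), drop = FALSE])
  # Combine two blocks: every column of A times every column of B,
  # with the columns of A varying fastest (as in expand.grid)
  Reduce(function(A, B) {
    A[, rep(seq_len(ncol(A)), times = ncol(B)), drop = FALSE] *
      B[, rep(seq_len(ncol(B)), each = ncol(A)), drop = FALSE]
  }, blocks)
}

set.seed(3)
nn <- 9
mm <- 9
testMat <- matrix(rbinom(nn * mm, 1, .5), nrow = nn, ncol = mm)
res <- rowwise_block_products(testMat)
dim(res)  # nn rows and 3^(mm/3) columns
```

Because each step works on whole columns, the cost per row is handled by vectorized multiplication rather than by a call to expand.grid for every row.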
I want to generate a random M x N matrix with zeros and ones with the following special properties:
1) The ones are only in m of the M rows.
2) The ones are only in n of the N columns.
Suppose I am ONLY given that M=10, N=10, m=6 and n=4. One possible random matrix is given by
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 1 0 0 0 1 0 1 0
[2,] 0 0 1 0 0 0 1 0 1 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 1 0 0 0 1 0 0 0
[6,] 0 1 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 1 0 0 0 0 0 0 0
[9,] 0 0 1 0 0 0 0 0 1 0
[10,] 0 0 0 0 0 0 0 0 0 0
For reproducibility, I've artificially generated the above "random" matrix using
ex <- matrix(0,10,10)
ex[1,3] <- ex[1,7] <- ex[1,9] <- ex[2,3] <- ex[2,7] <-
ex[2,9] <- ex[5,3] <- ex[5,7] <- ex[6,2] <- ex[8,3] <- ex[9,3] <- ex[9,9] <- 1
Note that
sum(rowSums(ex)>0)
[1] 6
sum(colSums(ex)>0)
[1] 4
which match exactly m and n above. The number of ones can be random. On one extreme I could have 6 ones spread out over 6 rows and 4 columns (2 columns will have 2 ones, while the rest have 1) or on the other extreme, I could have 24 ones (each of the 6 rows will have a 1 in the same 4 columns).
Question
I can generate this in a brute force manner, sampling over the rows and columns, but I need to do this over thousands of such matrices (because the m and n will be different each time) and these matrices are large (M=5000 and N=8000, typically). Is there a way to do this efficiently in R?
M=10 #total number rows
N=10 #total number columns
m=6 #number valid rows
n=4 #number valid columns
#number of cells to simulate
k=12
ex <- matrix(0,M,N)
#sample m valid rows and n valid columns from uniform
mi <- sample(1:M, m)
ni <- sample(1:N, n)
#get all valid cells (valid rows and columns)
mn_i <- expand.grid(mi, ni)
#sample k cells from valid cells
x <- mn_i[sample(1:nrow(mn_i), k), ]
#update sampled cells using matrix subset on ex
ex[as.matrix(x)] <- 1
# > ex
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 0 0 0 0 0 0 0 0 0
# [2,] 0 0 0 0 0 0 0 0 0 0
# [3,] 0 0 0 0 0 0 1 1 0 1
# [4,] 0 0 0 0 0 0 0 0 0 0
# [5,] 1 0 0 0 0 0 1 1 0 1
# [6,] 1 0 0 0 0 0 1 0 0 0
# [7,] 0 0 0 0 0 0 0 0 0 0
# [8,] 0 0 0 0 0 0 0 1 0 1
# [9,] 0 0 0 0 0 0 0 0 0 0
# [10,] 0 0 0 0 0 0 0 0 0 0
You might want to wrap in a function to call it something like
ex <- constrained_matrix_sample(M, N, m, n, k)
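One way such a wrapper might look is sketched below. It only packages the steps shown above; the argument checks are an addition, and like the code above it does not guarantee that every one of the m rows and n columns actually receives a 1.

```r
# Sketch of the wrapper named above: k ones placed at random inside
# m randomly chosen rows and n randomly chosen columns of an M x N matrix.
constrained_matrix_sample <- function(M, N, m, n, k) {
  stopifnot(m <= M, n <= N, k <= m * n)
  ex <- matrix(0, M, N)
  mi <- sample.int(M, m)                      # valid rows
  ni <- sample.int(N, n)                      # valid columns
  cells <- as.matrix(expand.grid(mi, ni))     # all valid (row, col) pairs
  picked <- cells[sample.int(nrow(cells), k), , drop = FALSE]
  ex[picked] <- 1                             # matrix indexing by (row, col)
  ex
}

ex <- constrained_matrix_sample(M = 10, N = 10, m = 6, n = 4, k = 12)
sum(ex)  # k ones in total
```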
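A dplyr variant (it sets every cell in the chosen rows and columns to 1):
library(dplyr) # provides %>% and mutate()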
M = 10 # rows
N = 10 # columns
m = 6
n = 4
ni = sample(1:N, n)
mi = sample(1:M, m)
expand.grid(N = 1:N, M = 1:M) %>%
mutate(value = ifelse(N %in% ni & M %in% mi, 1, 0)) %>%
.$value %>%
matrix(., nrow = M, byrow = TRUE)
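Neither approach above makes sure that exactly m rows and exactly n columns end up non-empty with a random number of ones. The following is a rough sketch of one way to get that, under assumptions of my own: the function name random_constrained_matrix and the fill probability p are hypothetical, and the result is not drawn uniformly from all valid matrices. It first places max(m, n) ones so that every chosen row and column is covered, then adds further ones at random inside the m x n block.

```r
# Hypothetical helper: exactly m non-empty rows and n non-empty columns.
random_constrained_matrix <- function(M, N, m, n, p = 0.3) {
  stopifnot(m >= 1, n >= 1, m <= M, n <= N)
  L <- max(m, n)
  # L distinct cells of the m x n block that touch every row and every column
  r  <- c(seq_len(m), sample.int(m, L - m, replace = TRUE))
  cc <- c(seq_len(n), sample.int(n, L - n, replace = TRUE))
  sub <- matrix(rbinom(m * n, 1, p), m, n)  # extra ones with probability p
  sub[cbind(r, cc)] <- 1                    # force the covering cells to 1
  ex <- matrix(0, M, N)
  # Drop the block into randomly chosen rows and columns of the full matrix
  ex[sample.int(M, m), sample.int(N, n)] <- sub
  ex
}

ex <- random_constrained_matrix(M = 10, N = 10, m = 6, n = 4)
sum(rowSums(ex) > 0)  # m
sum(colSums(ex) > 0)  # n
```

Only the m x n block is simulated; the rest of the matrix is filled once with zeros.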
Example code here:
> temp2
a b c d e f g h
i 1 1 0 0 0 1 0 1
j 0 1 0 0 0 1 0 1
k 0 1 1 0 0 1 1 1
l 0 0 0 0 1 0 0 1
m 0 0 1 1 0 0 1 1
n 0 0 1 1 0 0 1 1
o 0 0 0 1 0 0 1 1
p 0 0 0 0 1 0 0 1
> dput(temp2)
structure(list(a = c(1, 0, 0, 0, 0, 0, 0, 0), b = c(1, 1, 1,
0, 0, 0, 0, 0), c = c(0, 0, 1, 0, 1, 1, 0, 0), d = c(0, 0, 0,
0, 1, 1, 1, 0), e = c(0, 0, 0, 1, 0, 0, 0, 1), f = c(1, 1, 1,
0, 0, 0, 0, 0), g = c(0, 0, 1, 0, 1, 1, 1, 0), h = c(1, 1, 1,
1, 1, 1, 1, 1)), .Names = c("a", "b", "c", "d", "e", "f", "g",
"h"), class = "data.frame", row.names = c("i", "j", "k", "l",
"m", "n", "o", "p"))
I have this 8x8 grid of 1s and 0s. I need to solve for some grid where each row and each column has exactly one 1 and the rest 0s, but the 1 has to be in a place where the original grid has a 1. It's almost like a sudoku question but not exactly. Any thoughts on how to get started?
I would need some function that can do this for a general grid, not simply this specific one. We can assume that there's always a solution grid, given some starting grid.
Thanks!
Edit: a valid solution
> temp3
a b c d e f g h
i 1 0 0 0 0 0 0 0
j 0 1 0 0 0 0 0 0
k 0 0 0 0 0 1 0 0
l 0 0 0 0 1 0 0 0
m 0 0 0 1 0 0 0 0
n 0 0 1 0 0 0 0 0
o 0 0 0 0 0 0 1 0
p 0 0 0 0 0 0 0 1
EDIT2: given that there are only 8! candidate arrangements for any 8x8 grid, I may attempt a brute force / matching approach.
This can be solved as a transportation problem or as an integer programming problem. We also show a one-line solution using only base R that generates random matrices in which each row and each column sums to 1, then filters them to return the ones satisfying the additional constraint that each element of the solution matrix be less than or equal to the corresponding element of temp2.
1) transportation problem Using lp.transport in lpSolve we can solve it in one statement:
library(lpSolve)
res <- lp.transport(as.matrix(temp2), "max",
rep("=", 8), rep(1, 8), rep("=", 8), rep(1, 8), integers = 0:1)
res
## Success: the objective function is 8
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
2) integer programming
If X is the solution matrix, we specify the row and column constraints but not the X <= temp2 constraints. They are satisfied automatically, because the objective sum(temp2 * X) can only reach its maximum of 8 if every 1 in X sits on a 1 in temp2.
library(lpSolve)
n <- nrow(temp2)
obj <- unlist(temp2)
const_row <- t(sapply(1:n, function(i) c(row(temp2)) == i)) # each row sums to 1
const_col <- t(sapply(1:n, function(i) c(col(temp2)) == i)) # each col sums to 1
const.mat <- rbind(const_row, const_col)
res <- lp("max", obj, const.mat, "=", 1, all.bin = TRUE)
res
## Success: the objective function is 8
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
(Note that, by the same argument, we could have relaxed this to a linear programming problem, provided we add the constraints 0 <= soln[i, j] <= 1: the reasoning that allowed us to omit the soln[i, j] <= temp2[i, j] constraints also means the maximization will force the elements of soln to be 0 or 1 anyway.)
2a) This approach is longer but does spell out the X <= temp2 constraints explicitly:
n <- nrow(temp2)
obj <- numeric(n*n)
const1 <- diag(n*n) # soln[i,j] <= temp2[i,j]
const2 <- t(sapply(1:n, function(i) c(row(temp2)) == i)) # each row sums to 1
const3 <- t(sapply(1:n, function(i) c(col(temp2)) == i)) # each col sums to 1
const.mat <- rbind(const1, const2, const3)
const.dir <- rep(c("<=", "="), c(n*n, 2*n))
const.rhs <- c(unlist(temp2), rep(1, 2*n))
res <- lp("max", obj, const.mat, const.dir, const.rhs, all.bin = TRUE)
res
## Success: the objective function is 0
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
2b) Note that if X is the solution matrix then in X <= temp2 only the positions of X corresponding to zeros in temp2 actually constrain so we could eliminate any constraint corresponding to a 1 in temp2 in the (2a) solution. With this change all constraints become equality constraints.
n <- nrow(temp2)
obj <- numeric(n*n)
const1 <- diag(n*n)[unlist(temp2) == 0, ]
const2 <- t(sapply(1:n, function(i) c(row(temp2)) == i)) # each row sums to 1
const3 <- t(sapply(1:n, function(i) c(col(temp2)) == i)) # each col sums to 1
const.mat <- rbind(const1, const2, const3)
const.dir <- "="
const.rhs <- c(numeric(nrow(const1)), rep(1, 2*n))
res <- lp("max", obj, const.mat, const.dir, const.rhs, all.bin = TRUE)
res
## Success: the objective function is 0
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
In fact, we could go further and remove the variables that correspond to zero elements of temp2.
3) r2dtable Here we use r2dtable to generate 10,000 8x8 tables whose rows and columns sum to 1 and then filter them to pick out only those satisfying the X <= temp2 constraints. With temp2 from the question and the random seed shown, it finds 3 solutions. If it finds no solutions with different inputs, try generating a larger number of random proposals. This approach does not use any packages.
set.seed(123) # for reproducibility
Filter(function(x) all(x <= temp2), r2dtable(10000, rep(1, 8), rep(1, 8)))
giving:
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 1 0 0
[3,] 0 1 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 1
[5,] 0 0 0 0 0 0 1 0
[6,] 0 0 1 0 0 0 0 0
[7,] 0 0 0 1 0 0 0 0
[8,] 0 0 0 0 1 0 0 0
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 1 0 0
[3,] 0 1 0 0 0 0 0 0
[4,] 0 0 0 0 1 0 0 0
[5,] 0 0 0 1 0 0 0 0
[6,] 0 0 1 0 0 0 0 0
[7,] 0 0 0 0 0 0 1 0
[8,] 0 0 0 0 0 0 0 1
[[3]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0
[3,] 0 0 0 0 0 1 0 0
[4,] 0 0 0 0 1 0 0 0
[5,] 0 0 1 0 0 0 0 0
[6,] 0 0 0 0 0 0 1 0
[7,] 0 0 0 1 0 0 0 0
[8,] 0 0 0 0 0 0 0 1
A brute-force way:
m = as.matrix(temp2)
w = data.frame(which(m == 1, arr.ind = TRUE))
combos = as.matrix(do.call(expand.grid, with(w, split(col, row))))
combos[ apply(combos, 1, function(x) !anyDuplicated(x)), ]
1 2 3 4 5 6 7 8
[1,] 1 6 2 8 7 3 4 5
[2,] 1 2 6 8 7 3 4 5
[3,] 1 6 2 8 3 7 4 5
[4,] 1 2 6 8 3 7 4 5
[5,] 1 6 2 8 4 3 7 5
[6,] 1 2 6 8 4 3 7 5
[7,] 1 6 2 8 3 4 7 5
[8,] 1 2 6 8 3 4 7 5
[9,] 1 6 2 5 7 3 4 8
[10,] 1 2 6 5 7 3 4 8
[11,] 1 6 2 5 3 7 4 8
[12,] 1 2 6 5 3 7 4 8
[13,] 1 6 2 5 4 3 7 8
[14,] 1 2 6 5 4 3 7 8
[15,] 1 6 2 5 3 4 7 8
[16,] 1 2 6 5 3 4 7 8
OP claims to only ever need to handle an 8x8 grid, so I guess this performs well enough. Each row of the result is a solution. The first row says that (1,1), (2,6), (3,2) ... is a solution.
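To turn one row of that result back into a 0/1 grid like temp3, something along these lines should work. This is only a sketch: to_grid is a made-up helper name, and the grid is rebuilt here as a plain matrix so the example runs on its own.

```r
# temp2 from the question, rebuilt as a matrix (rows i..p, columns a..h)
m <- matrix(c(
  1, 1, 0, 0, 0, 1, 0, 1,
  0, 1, 0, 0, 0, 1, 0, 1,
  0, 1, 1, 0, 0, 1, 1, 1,
  0, 0, 0, 0, 1, 0, 0, 1,
  0, 0, 1, 1, 0, 0, 1, 1,
  0, 0, 1, 1, 0, 0, 1, 1,
  0, 0, 0, 1, 0, 0, 1, 1,
  0, 0, 0, 0, 1, 0, 0, 1), nrow = 8, byrow = TRUE,
  dimnames = list(letters[9:16], letters[1:8]))

# Same brute force as above: one candidate column per row, no column reused
w <- which(m == 1, arr.ind = TRUE)
combos <- as.matrix(do.call(expand.grid, split(w[, "col"], w[, "row"])))
valid <- combos[apply(combos, 1, function(x) !anyDuplicated(x)), , drop = FALSE]

# Hypothetical helper: cols[i] is the column that holds the 1 in row i
to_grid <- function(cols, template) {
  out <- template * 0                       # keeps the dimnames
  out[cbind(seq_along(cols), unname(cols))] <- 1
  out
}

sol1 <- to_grid(valid[1, ], m)
sol1
```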
A variation using data.table:
library(data.table)
m = as.matrix(temp2)
comboDT = setDT(melt(m))[ value == 1, do.call(CJ, split(Var2, Var1)) ][,
rid := .I ][, melt(.SD, id="rid", variable.name="row", value.name="col")]
setkey(comboDT, rid)
comboDT[ .( comboDT[, !anyDuplicated(col), by=rid][(V1), rid]) ]
This works. Let grid be my grid as a matrix (as.matrix(temp2) from above). The loop below then stops with a valid grid in rand_grid, or gives up once the counter reaches 10,000.
# create random sufficient grid
counter = 0
while(2 > 1) {
counter = counter + 1
if(counter == 10000) {
break
}
rand_grid = matrix(0, nrow = 8, ncol = 8)
indices_avail = seq(1,8,by=1)
for(i in 1:8) {
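# sample by position: sample(x, 1) with a single number x would draw from 1:x
k = indices_avail[sample.int(length(indices_avail), 1)]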
rand_grid[i, k] = 1
indices_avail = indices_avail[indices_avail != k]
}
if(sum(grid[which(rand_grid == 1)]) == 8) {
break
}
print(counter)
}
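The same guess-and-check idea can be written more compactly by drawing a whole random permutation at once. A minimal sketch, with the function name find_random_assignment and the max_tries argument invented for illustration, and a small 3x3 grid standing in for the real one:

```r
# Hypothetical helper: try random permutations until one only hits 1s in grid.
find_random_assignment <- function(grid, max_tries = 10000) {
  n <- nrow(grid)
  for (attempt in seq_len(max_tries)) {
    perm <- sample.int(n)              # column used by each row
    idx <- cbind(seq_len(n), perm)
    if (all(grid[idx] == 1)) {
      out <- matrix(0, n, n, dimnames = dimnames(grid))
      out[idx] <- 1
      return(out)
    }
  }
  NULL                                 # nothing found within max_tries
}

small_grid <- rbind(c(1, 1, 0),
                    c(0, 1, 1),
                    c(1, 0, 1))
find_random_assignment(small_grid)
```

Like the loop above, this relies on luck, so grids with few valid arrangements may need a larger max_tries.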
This approach will return all valid combinations. First find all matrix row combinations. Then search through exhaustively. This method would have to be improved if your matrix size increased. One simple improvement would be to run the diag test in parallel.
st<-as.matrix(temp2) # make sure we are working with matrices
## This method will return all possible matrices of combinations
## in essence, if sum(diag(matr)) equals the matrix width, then you have
## a valid choice
## Helper function to build all combinations, there may be better way to
## do this but it gets the job done
allCombinationsAux<-function(z,nreg,x){
if(sum(nreg)>1){
## recurse on the indices that are still unused
innerLoop<-do.call(rbind,lapply(x[nreg&(z!=x)], allCombinationsAux,nreg&(z!=x),x))
ret<-cbind(z,innerLoop )
}
else{
ret<-x[nreg]
}
ret
}
## Build all of the combinations of possible matrices
x<-seq_len(nrow(st)) # row indices to permute (assumed definition of x)
combs<-do.call(rbind,lapply(x,function(y) allCombinationsAux(y,y!=x,x)))
## iterate through all the possible combinations of matrices, to find out
## which ones have 1s throughout the diag
inds<-which(apply(combs,1,function(x) sum(diag(st[x,]))==8))
lapply(inds,function(x) st[combs[x,],])
While there are great answers already here for the brute-force approach and actually using math, just for kicks, here's a version that guesses and checks lags of the non-matching columns. For the example in question, it actually turns out to be quite quick, and as a bonus, you could find a new answer on any particular run! How fun! To the code:
set.seed(47) # remove this to have more fun
mat.in <- as.matrix(temp2) # we'll work in matrices
mat.out <- diag(8) # a starting guess
dimnames(mat.out) <- dimnames(mat.in) # make our answer pretty
iteration <- 1 # for kicks, a loop counter
while (any((mat.out != mat.in)[as.logical(mat.out)])) {
mat.ref <- mat.out
mat.out <- mat.out[, sample(8)] # make this deterministic if you like
inner <- 1 # don't repeat yourself (too much)
while (any(mat.out != mat.ref) & inner <= 8) {
mat.ref <- mat.out
# find non-matching indices and lag those columns
to.lag <- which((mat.out != mat.in)[as.logical(mat.out)])
i <- 1:8
i[to.lag] <- c(to.lag[length(to.lag)], to.lag[-length(to.lag)])
mat.out <- mat.out[, i]
cat(inner, " ") # let's see what it does
inner <- inner + 1
}
print(iteration) # whoo, scrolling numbers
iteration <- iteration + 1
}
## 1 2 3 [1] 1
## 1 2 3 4 5 6 7 8 [1] 2
## 1 2 [1] 3
## 1 2 3 [1] 4
which, for this particular seed returns
mat.out
## a c e g d b f h
## i 1 0 0 0 0 0 0 0
## j 0 0 0 0 0 1 0 0
## k 0 1 0 0 0 0 0 0
## l 0 0 0 0 1 0 0 0
## m 0 0 1 0 0 0 0 0
## n 0 0 0 0 0 0 1 0
## o 0 0 0 1 0 0 0 0
## p 0 0 0 0 0 0 0 1
It could certainly be optimized further, but it's already pretty quick (without the printing, which slows it down):
Unit: microseconds
expr min lq mean median uq max neval
let's guess 137.796 383.6445 838.2327 693.819 1163.08 2510.436 100
running all 100 times in a fraction of a second. It's quite a bit faster than actual guessing (chopping out the inner loop):
Unit: microseconds
expr min lq mean median uq max neval cld
guess smart 148.997 349.916 848.6314 588.162 1085.841 3117.78 100 a
actually guess 322.458 7341.961 31197.1237 20012.969 47677.501 160250.02 100 b
Note, though, that luck plays a role here, and if there are fewer solutions, it will take longer. If there are no solutions, it will run forever. It could, of course, be optimized to avoid such a fate by making sure it doesn't reuse the same starting permutation provided by sample(8) (a good idea regardless, which I deemed superfluous here as it only runs through a handful of permutations each run anyway). Hack away.
Suppose I have a vector containing data:
c <- c(1:100)
c[1:75] <- 0
c[76:100] <- 1
What I need to do is select a number of the 0's and turn them into 1's. There are potentially many ways to do this (if I'm switching 25 of the 0's, it'd be 75 choose 25, so about 5.26x10^19), so I need to do it randomly, say, 1000 times. (This is part of a larger model. I'll be using the mean of the results.)
I know (think), that I need to use sample() and a for loop - but how do I select n values randomly among the 0's, then change them to 1's?
vec <- c(rep(0, 75), rep(1, 25))
n <- 25
to_change <- sample(which(vec == 0), n)
modified_vec <- vec
modified_vec[to_change] <- 1
Something like this. You could wrap it up in a function.
And you should really do it in a matrix with apply, rather than a for loop.
This small example makes it easy to see how it works:
n_vecs <- 5
vec_length <- 10
n_0 <- 7 # Number of 0's at the start of each vector
vec_mat <- matrix(c(rep(0, n_vecs * n_0), rep(1, n_vecs * (vec_length - n_0))),
nrow = vec_length, ncol = n_vecs, byrow = TRUE)
> vec_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 0 0 0
[6,] 0 0 0 0 0
[7,] 0 0 0 0 0
[8,] 1 1 1 1 1
[9,] 1 1 1 1 1
[10,] 1 1 1 1 1
change_n_0 <- function(x, n) {
x_change <- sample(which(x == 0), n)
x[x_change] <- 1
return(x)
}
vec_mat <- apply(vec_mat, MARGIN = 2, FUN = change_n_0, n = 2)
> vec_mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 0 1
[2,] 0 0 0 1 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 1 0 1
[6,] 0 1 0 1 0
[7,] 1 0 1 0 0
[8,] 1 1 1 1 1
[9,] 1 1 1 1 1
[10,] 1 1 1 1 1
You can scale up the constants at the beginning as big as you'd like.
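One caveat with sample(which(x == 0), n): if only a single 0 is left, which() returns one number and sample() then draws from 1 up to that number instead of from the zero positions. A hedged sketch of a variant that samples by position instead (the name change_n_0_safe is made up here), together with the 1000 repetitions the question asks for:

```r
# Variant of change_n_0 that is safe when only one 0 remains
change_n_0_safe <- function(x, n) {
  zeros <- which(x == 0)
  stopifnot(n <= length(zeros))
  # sample positions within `zeros`, never the values themselves
  x[zeros[sample.int(length(zeros), n)]] <- 1
  x
}

vec <- c(rep(0, 75), rep(1, 25))
# One simulated vector per column, 1000 columns
sims <- replicate(1000, change_n_0_safe(vec, 25))
dim(sims)
```

Summaries such as rowMeans(sims) can then be taken over the simulated columns.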
I have a data.frame and I'm trying to create a frequency table that shows the frequency of values for each row. So I'm starting with something like this:
d <- data.frame(a=c(1,2,3), b=c(3,4,5), c=c(1,2,5))
which looks like this:
a b c
1 3 1
2 4 2
3 5 5
What I'd really like to create is a contingency data.frame or matrix that looks like this:
1, 2, 3, 4, 5, 6, 7, 8, 9
2, 0, 1, 0, 0, 0, 0, 0, 0
0, 2, 0, 1, 0, 0, 0, 0, 0
0, 0, 1, 0, 2, 0, 0, 0, 0
The top row is simply a label row and need not be in the final result. But I add it there for illustration. Each row shows the digits 1:9 and the number of times each digit shows up in each row of the starting data.
I can't wrap my head around an easy way to create this. Although it seems like the table() function should be helpful, I can't get it to give me any love. Any help or ideas are appreciated.
Here you go:
t(apply(d, 1, tabulate, nbins=9))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 2 0 1 0 0 0 0 0 0
[2,] 0 2 0 1 0 0 0 0 0
[3,] 0 0 1 0 2 0 0 0 0
(Though it probably doesn't matter in this application, tabulate() (which is used inside of the code for table()) is also nice for the impressive speed with which it performs its calculations.)
EDIT: tabulate() isn't set up to deal with 0s or negative integers. If you want another one liner that does, you could use table() though, doing something like this:
d <- data.frame(a=c(0,-1,-2), b=c(3,4,5), c=c(1,2,5))
t(apply(d, 1, function(X) table(c(X, -9:9)) - 1))
-9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
[1,] 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0
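A closely related idiom, shown here as a sketch rather than as part of the original answer, is to convert each row to a factor with fixed levels so that table() reports a zero for every value that does not occur. The range -9:9 is the same assumption about the possible values as above.

```r
d <- data.frame(a = c(0, -1, -2), b = c(3, 4, 5), c = c(1, 2, 5))
lv <- -9:9  # every value that should get a column

# table() on a factor counts all levels, including the absent ones
counts <- t(apply(d, 1, function(r) table(factor(r, levels = lv))))
counts
```

This avoids padding the data with one extra copy of each value and subtracting 1 afterwards.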
Another solution using table:
library(reshape)
d <- data.frame(a=c(1,2,3), b=c(3,4,5), c=c(1,2,5))
d2 <- melt(d)
d2$rows <- rep(1:nrow(d), ncol(d))
table(d2$rows, d2$value)