How can I create all binary combinations of matrices with the condition that there can only be a single 1 per column and row? The example below will clarify. This particular example should yield 6 matrices; I am showing only the first 2.
c1 <- matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), nrow = 3) #First combination
c2 <- matrix(c(0, 1, 0, 1, 0, 0, 0, 0, 1), nrow = 3) #Second combination
What you are asking for is equivalent to finding all permutations of length = n where n = nrow(c1) (or c2 above). Using the FUN argument of permuteGeneral from RcppAlgos (I am the author), we can easily generate the desired outcome:
n <- 3L
myIdentity <- diag(nrow = n)
library(RcppAlgos)
permuteGeneral(n, n, FUN = function(x) myIdentity[x, ])
[[1]]
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[[2]]
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 0 1
[3,] 0 1 0
[[3]]
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 1 0 0
[3,] 0 0 1
[[4]]
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 0 0 1
[3,] 1 0 0
[[5]]
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 1 0 0
[3,] 0 1 0
[[6]]
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 1 0
[3,] 1 0 0
There are many other ways of generating the requested output. Most notably, utilizing the tried and true combinat package, we can get a similar result (the output will be in a different order):
combinat::permn(3, fun = function(x) myIdentity[x, ])
Now that we have reduced the problem to simply generating permutations, we can use any of the great packages (arrangements, gtools, multicool, partitions, etc.) for generating permutations to obtain our desired result with the help of lapply:
library(arrangements)
myPerms <- permutations(n)
lapply(1:nrow(myPerms), function(x) myIdentity[myPerms[x,], ])
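If none of those packages is available, the same idea can be sketched in base R. The helper all_perms below is hypothetical (it is not part of any package mentioned above) and simply builds permutations recursively; it is meant as an illustration for small n, not as a fast implementation.

```r
# Hypothetical helper (not from any package): all permutations of 1..n as a list
all_perms <- function(n) {
  if (n <= 1L) return(list(seq_len(n)))
  out <- list()
  for (p in all_perms(n - 1L)) {
    # Insert n at every possible position of each shorter permutation
    for (pos in 0:(n - 1L)) {
      out[[length(out) + 1L]] <- append(p, n, after = pos)
    }
  }
  out
}

n <- 3L
# Permuting the rows of the identity matrix gives one permutation matrix each
perm_mats <- lapply(all_perms(n), function(p) diag(n)[p, , drop = FALSE])
length(perm_mats)
```

The order of the matrices will differ from the permuteGeneral output, and the number of results grows as n!, so this is only practical for small n.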
I have a few vectors that I would like to arrange into square matrices of the same dimensions for future multiplication. Some vectors are shorter than others, so I would like to pad the shorter ones with zeros so that all the resulting matrices have the same dimensions.
I tried appending 0 to the tail of the shorter vectors, but I haven't been able to generate the reducible matrices that I want. Below is some dummy data. Thank you for your time!
seq_a <- rep(1,4)
seq_b <- rep(1,3)
matA <- diag(seq_a)
matB <- matrix(c(diag(seq_b),0),nrow=4,ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 0
[2,] 0 0 0 0
[3,] 0 0 1 1
[4,] 0 0 0 0
Warning message:
In matrix(c(diag(seq_b), 0), nrow = 4, ncol = 4) :
data length [10] is not a sub-multiple or multiple of the number of rows [4]
The desired matB should be
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 0
That is just a typo: you should first pad the vector, and then convert it to a diagonal matrix.
matB <- diag( c(seq_b,0) )
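For several vectors of different lengths, the same pad-then-diag step can be wrapped in a small function. This is only a sketch: pad_diag and its argument names are made up for illustration, and it assumes no vector is longer than the target size n.

```r
# Hypothetical helper: pad v with zeros up to length n, then build an n x n diagonal matrix
pad_diag <- function(v, n) {
  stopifnot(length(v) <= n)
  padded <- c(v, rep(0, n - length(v)))
  # nrow is given explicitly so a length-one vector is not read as a dimension
  diag(padded, nrow = n)
}

seq_a <- rep(1, 4)
seq_b <- rep(1, 3)
n <- max(length(seq_a), length(seq_b))
matA <- pad_diag(seq_a, n)
matB <- pad_diag(seq_b, n)
matA %*% matB
```

Passing nrow explicitly matters because diag(5) on its own returns a 5 x 5 identity matrix rather than a 1 x 1 matrix containing 5.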
I have a list of matrices (several hundred thousand). I want to create a single matrix whose cells summarise the sign that each cell takes in, e.g., 95% of the matrices. By that I mean this: if e.g. cell mat[1,2] is positive (i.e. >0) in 95% of the matrices it is scored a 1, and if e.g. cell mat[2,1] is negative (i.e. <0) in 95% of the matrices it is scored a -1. If they fall below this threshold they are scored a 0.
#Dummy data
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0.5, sd = 1),5,5)}
listX2 <- listX
for(i in 1:10) { listX2[[i]] <- ifelse(listX[[i]] >0, 1, -1) }
For the sake of the dummy data, the 95% can be changed to say 60%, such that the cells that keep their sign in 6 out of 10 matrices are kept and scored either 1 or -1 and the rest 0.
I'm stuck on this, hence cannot provide any more code.
I would do:
listX <- list()
set.seed(20)
# I set a seed for reproducibility, and changed your mean so you could see the negatives
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1),5,5)}
threshold <- 0.7
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 -1 1
[2,] -1 1 -1 -1 1
[3,] 0 0 0 1 1
[4,] 0 1 0 0 0
[5,] 0 0 0 0 0
This basically checks both conditions and combines the two checks. To break down one of the conditions, (Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold):
lapply(listX,function(x){x > 0}) loops through each matrix and converts it to a matrix of TRUE/FALSE values, TRUE wherever the value is above zero.
Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) then adds these all together (Reduce) and divides by the number of observations. If the proportion is at least our threshold, we set that value to one; if not, it is zero.
We then subtract the same matrix with x < 0 as the test, which gives -1 in each case where enough sub-values are negative.
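The same steps can be wrapped in a function to make them easier to reuse. This is a sketch of the approach described above, not a different method; the name sign_consensus and the small hand-made list are assumptions for illustration.

```r
# Hypothetical wrapper around the Reduce-based check
sign_consensus <- function(matrix_list, threshold) {
  n <- length(matrix_list)
  # Proportion of matrices in which each cell is positive / negative
  prop_pos <- Reduce(`+`, lapply(matrix_list, function(x) x > 0)) / n
  prop_neg <- Reduce(`+`, lapply(matrix_list, function(x) x < 0)) / n
  # 1 where enough are positive, -1 where enough are negative, 0 otherwise
  (prop_pos >= threshold) - (prop_neg >= threshold)
}

# Small list with signs that can be checked by hand
toy <- list(
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0, -1, 1), nrow = 2),
  matrix(c(1, 0,  1, 0), nrow = 2)
)
sign_consensus(toy, 0.5)
sign_consensus(toy, 1)
```

Note that with a threshold at or below 0.5 a cell could in principle satisfy both conditions, in which case the two checks cancel and the cell is scored 0.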
You can change the list to a three-dimensional array and then take the mean across the matrices (the third dimension) for each cell.
arr <- simplify2array(listX)
grzero = rowMeans(arr > 0, dims = 2)
lezero = rowMeans(arr < 0, dims = 2)
prop = 0.6
1* (grzero >= prop) + -1* (lezero >= prop)
Test case showing which answers work so far! (edit)
Below you'll find my original answer. It ended up producing comparable results to the other answers on test cases involving randomly seeded data. To triple check, I created a small test data set with a known answer. It turns out that only the answer by #Chris passes right now (though the answer by #user20650 should be OK if it uses >= on this example, as indicated in the comments). Here it is in case anyone else wants to use it:
listX <- list(
matrix(c(1,0,-1,1), nrow = 2),
matrix(c(1,0,-1,1), nrow = 2),
matrix(c(1,0, 1,0), nrow = 2)
)
# With any threshold < .67,
# result should be...
matrix(c(1, 0, -1, 1), nrow = 2)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
# Otherwise...
matrix(c(1, 0, 0, 0), nrow = 2)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 0
# #Chris answer passes
threshold <- 0.5
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
threshold <- 1.0
(Reduce('+',lapply(listX,function(x){x > 0}))/length(listX) >= threshold) - (Reduce('+',lapply(listX,function(x){x < 0}))/length(listX) >= threshold)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 0
# My function fails...
prob_matrix(listX, .5)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
prob_matrix(listX, 1)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 1
# #user20650 answer fails...
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 0.5
1* (grzero > prop) + -1* (lezero > prop)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 1.0
1* (grzero > prop) + -1* (lezero > prop)
#> [,1] [,2]
#> [1,] 0 0
#> [2,] 0 0
Original answer
Here's one approach...
Combine sign and Reduce to sum the signs of the values in each cell across all matrices, returning a single matrix.
Any cell where the absolute value of this sum falls below the threshold number (derived from your probability * number of matrices in the list) is converted to 0.
Return the sign() of all cells.
Below is an example with a wrapper function:
Toy data...
set.seed(12)
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1), 5, 5)}
Function...
prob_matrix <- function(matrix_list, prob) {
# Sum the signs of values in each cell
matrix_list <- lapply(matrix_list, sign)
x <- Reduce(`+`, matrix_list)
# Convert cells below prob to 0, others to relevant sign
x[abs(x) < (prob * length(matrix_list)) / 2] <- 0
sign(x)
}
Example cases...
prob_matrix(listX, .2)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1 1 0 1 0
#> [2,] -1 0 -1 -1 0
#> [3,] 1 -1 1 1 1
#> [4,] 0 -1 1 1 -1
#> [5,] -1 0 -1 0 -1
prob_matrix(listX, .4)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1 1 0 1 0
#> [2,] -1 0 -1 -1 0
#> [3,] 1 -1 1 1 1
#> [4,] 0 -1 1 1 -1
#> [5,] -1 0 -1 0 -1
prob_matrix(listX, .6)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 0 1 0
#> [2,] -1 0 0 -1 0
#> [3,] 1 -1 0 1 1
#> [4,] 0 0 0 1 -1
#> [5,] -1 0 0 0 -1
prob_matrix(listX, .8)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 0 1 0
#> [2,] -1 0 0 -1 0
#> [3,] 1 -1 0 1 1
#> [4,] 0 0 0 1 -1
#> [5,] -1 0 0 0 -1
I am looking for an easier way to do the following:
m <- matrix(0, nrow=3, 3)
v <- c(1, 3, 2)
for (i in 1:nrow(m)) {
m[[i, v[i]]] = 1
}
The above code creates the following index matrix:
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 0 1
[3,] 0 1 0
There surely must be a better way to do this?!
One way to do it without pre-defining the matrix would be to use outer:
num.col <- 3
outer(v, seq_len(num.col), "==") * 1
# [,1] [,2] [,3]
# [1,] 1 0 0
# [2,] 0 0 1
# [3,] 0 1 0
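Two other base-R idioms give the same index matrix and may be worth knowing. This is a sketch that assumes v holds valid column indices; the names m and m2 are just for illustration.

```r
v <- c(1, 3, 2)

# 1) Matrix indexing: each row of the two-column index matrix is a (row, column) pair
m <- matrix(0, nrow = length(v), ncol = 3)
m[cbind(seq_along(v), v)] <- 1

# 2) Reorder the rows of an identity matrix
m2 <- diag(3)[v, ]

identical(m, m2)
```

The first form also works when the number of columns differs from length(v); the second requires a square identity matrix with as many columns as the largest index.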
I'm doing some kind of optical character recognition and face the following issue. I store the glyphs in a list of binary matrices and they can be of different size, but their maximum possible width is wid = 3 columns (may be any defined constant, not just 3). In some cases after the first stage of processing I get data which look like this:
myll <- list(matrix(c(0, 0, 0, 1, 1, 0), ncol = 2),
matrix(c(0), ncol = 1),
matrix(c(1, 1, 0), ncol = 3),
matrix(c(1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1), ncol = 7),
matrix(c(1, 1, 1, 1), ncol = 2))
# [[1]]
# [,1] [,2]
# [1,] 0 1
# [2,] 0 1
# [3,] 0 0
#
# [[2]]
# [,1]
# [1,] 0
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 1 0
#
# [[4]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 1 1 1 0 0 0 1
# [2,] 0 1 0 1 0 0 1
# [3,] 1 1 1 1 0 0 1
#
# [[5]]
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
So, some glyphs may not be separated for some reason. This happens only with glyphs of maximum possible width. Moreover, there may be some junk at the end of the matrix. I have to split them into matrices of width ncol = wid, leaving the last piece (junk) as is. Then I store these matrices in separate elements of a list to get the following output:
# [[1]]
# [,1] [,2]
# [1,] 0 1
# [2,] 0 1
# [3,] 0 0
#
# [[2]]
# [,1]
# [1,] 0
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 1 0
#
# [[4]]
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 0 1 0
# [3,] 1 1 1
#
# [[5]]
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 1 0 0
# [3,] 1 0 0
#
# [[6]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 1
#
# [[7]]
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
At the moment I can do it with the help of these functions:
checkGlyphs <- function(gl_m, wid = 3) {
if (ncol(gl_m) > wid)
return(list(gl_m[,1:wid], matrix(gl_m[,-(1:wid)], nrow = nrow(gl_m)))) else
return(gl_m)
}
separateGlyphs <- function(myll, wid = 3) {
require("magrittr")
presplit <- lapply(myll, checkGlyphs, wid)
total_new_length <-
presplit[unlist(lapply(presplit, is.list))] %>% lapply(length) %>% unlist() %>% sum() +
as.integer(!unlist(lapply(presplit, is.list))) %>% sum()
splitted <- vector("list", length = total_new_length)
spl_index <- 1
for (i in 1:length(presplit))
{
if (!is.list(presplit[[i]]))
{
splitted[[spl_index]] <- presplit[[i]]
spl_index <- spl_index + 1
} else
{
for (j in 1:length(presplit[[i]]))
{
splitted[[spl_index]] <- presplit[[i]][[j]]
spl_index <- spl_index + 1
}
}
}
if (any(lapply(splitted, ncol) > wid)) return(separateGlyphs(splitted, wid)) else
return(splitted)
}
But I believe there is a faster and more convenient way to achieve the same result (without using for loops, this looped reassignment of elements, and then recursion if needed O_o).
I will be thankful for any suggestions on the point or, alternatively, for recommending some OCR-packages for R.
This should do the trick, with the values in final being what you're after.
combined <- do.call(cbind, lapply(myll, unlist))
idx <- seq(1, ncol(combined), 2)
final <- do.call(list, lapply(idx, function(x) combined[, x:(x+1)]))
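Note that cbind needs matrices with matching row counts, which the example list in the question does not have, and the snippet above cuts every 2 columns rather than every wid. An alternative sketch that splits each matrix on its own into chunks of at most wid columns is shown below; split_glyph and separate_glyphs are hypothetical names, and the approach assumes every element is a matrix with at least one column.

```r
myll <- list(matrix(c(0, 0, 0, 1, 1, 0), ncol = 2),
             matrix(c(0), ncol = 1),
             matrix(c(1, 1, 0), ncol = 3),
             matrix(c(1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1), ncol = 7),
             matrix(c(1, 1, 1, 1), ncol = 2))

# Split one matrix into consecutive column chunks of width <= wid
split_glyph <- function(gl_m, wid = 3) {
  groups <- ceiling(seq_len(ncol(gl_m)) / wid)
  lapply(split(seq_len(ncol(gl_m)), groups),
         function(cols) gl_m[, cols, drop = FALSE])
}

# Apply to every glyph and flatten the result into a single list
separate_glyphs <- function(glyphs, wid = 3) {
  unname(do.call(c, lapply(glyphs, split_glyph, wid = wid)))
}

final <- separate_glyphs(myll, wid = 3)
length(final)
```

Using drop = FALSE keeps single-row and single-column pieces as matrices, so ncol() keeps working on every element of the result.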
A few questions, for which the R language might have elegant solutions....
Given a matrix m containing binary values 1 and 0, and a vector v of column indices,
how would I write a function to extract all the rows in m that have
the value of 1 in each of the columns indexed by the integers in v?
As an extra feature, how would one return the row indices along with
the corresponding rows?
Probably best if I illustrate with an example.
Assuming the logic I'm asking for resides in function selectByIndices( matrix, indexVector).
so if we have the matrix (or perhaps the equivalent dataframe):
>(m= matrix(c( 1, 0, 1, 1, 1,0, 1, 1, 0, 1,1, 0, 1, 1, 0,1, 1, 1,
0, 1,0, 1, 0, 0, 1), 5))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[2,] 0 1 0 1 1
[3,] 1 1 1 1 0
[4,] 1 0 1 0 0
[5,] 1 1 0 1 1
and index vectors:
>c1 = c(1,3,4)
>c2 = c(4,5)
>c3 = c(1,3,5)
The function would behave something like this:
>selectByIndices( m, c1)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[3,] 1 1 1 1 0
>selectByIndices( m, c2)
[,1] [,2] [,3] [,4] [,5]
[2,] 0 1 0 1 1
[5,] 1 1 0 1 1
>selectByIndices( m, c3)
#no rows (i.e. empty collection) returned
Hoping it's clear enough, thanks in advance for your help.
## Create a function that extracts the qualifying rows
f <- function(m, j) {
m[rowSums(m[, j]) == length(j),]
# m[apply(m[, j], 1, function(X) all(X==1)),] ## This would also work
# which(rowSums(m[, j]) == length(j)) ## & this would get row indices
}
## Try it out
f(m, c1)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 1 1 0
# [2,] 1 1 1 1 0
f(m, c2)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 1 0 1 1
# [2,] 1 1 0 1 1
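One caveat with f: m[, j] collapses to a plain vector when j has a single element, and the result collapses to a vector when only one row qualifies. A slightly more defensive sketch that also returns the original row indices is shown below; select_by_indices is a hypothetical name and the list layout of the return value is just one possible choice.

```r
m <- matrix(c(1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1,
              0, 1, 0, 1, 0, 0, 1), 5)

# Hypothetical variant of f: keeps matrix shape and reports the row indices
select_by_indices <- function(m, j) {
  # drop = FALSE keeps a matrix even when j has length one
  hits <- which(rowSums(m[, j, drop = FALSE] == 1) == length(j))
  list(indices = hits, rows = m[hits, , drop = FALSE])
}

select_by_indices(m, c(1, 3, 4))$indices
select_by_indices(m, c(4, 5))$rows
nrow(select_by_indices(m, c(1, 3, 5))$rows)
```

When no row qualifies, the rows element is a matrix with zero rows and the original number of columns, which is usually easier to handle downstream than an empty vector.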
> selectRows <- function(mat, rown) suppressWarnings(mat[apply( mat[, rown], 1, all) , ])
> selectRows(m, c1)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[2,] 1 1 1 1 0
> whichRows <-function(mat, rown) suppressWarnings( which( apply( mat[, rown], 1, all) ) )
> whichRows(m, c1)
[1] 1 3