Create an X% probability matrix from a list of matrices in R

I have a list of matrices (a few hundred thousand of them). I want to collapse them into a single matrix whose cells encode, say, a 95% consensus. By that I mean: if a cell such as mat[1,2] is positive (i.e. > 0) in at least 95% of the matrices, it is scored 1, and if a cell such as mat[2,1] is negative (i.e. < 0) in at least 95% of the matrices, it is scored -1. Cells that fall below this threshold are scored 0.
#Dummy data
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0.5, sd = 1),5,5)}
listX2 <- listX
for(i in 1:10) { listX2[[i]] <- ifelse(listX[[i]] >0, 1, -1) }
For the sake of the dummy data, the 95% can be changed to, say, 60%, so that cells keeping their sign in at least 6 out of 10 matrices are scored either 1 or -1, and the rest 0.
I'm stuck on this, hence cannot provide any more code.

I would do:
listX <- list()
set.seed(20)
# I set the seed for reproducibility, and changed your mean so you could see the negatives
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1),5,5)}
threshold <- 0.7
(Reduce('+', lapply(listX, function(x){x > 0})) / length(listX) >= threshold) -
  (Reduce('+', lapply(listX, function(x){x < 0})) / length(listX) >= threshold)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 -1 1
[2,] -1 1 -1 -1 1
[3,] 0 0 0 1 1
[4,] 0 1 0 0 0
[5,] 0 0 0 0 0
This checks both conditions and combines the two results (the matrix of negative indicators is subtracted from the matrix of positive indicators). To break down one of the conditions, (Reduce('+', lapply(listX, function(x){x > 0})) / length(listX) >= threshold):
lapply(listX, function(x){x > 0}) loops through each matrix and converts it to a matrix of TRUE/FALSE for every value that is above zero.
Reduce('+', lapply(listX, function(x){x > 0})) / length(listX) then adds these all together (Reduce) and divides by the number of observations. If the proportion is at least our threshold, that value is set to one, and if not it is zero.
We then subtract the same matrix with x < 0 as the test, which gives -1 in each case where enough sub-values are negative.
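If this is needed repeatedly, the same logic can be wrapped in a small helper (a sketch; sign_consensus is an illustrative name, not from the original answer):
# Sketch: wrap the consensus-by-sign logic above in a reusable function
sign_consensus <- function(mats, threshold) {
  pos <- Reduce(`+`, lapply(mats, function(x) x > 0)) / length(mats)
  neg <- Reduce(`+`, lapply(mats, function(x) x < 0)) / length(mats)
  (pos >= threshold) - (neg >= threshold)
}
sign_consensus(listX, 0.7)  # same result as the expression above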

You can convert the list to a 3-D array and then take means across the third dimension (rowMeans with dims = 2), which gives the proportion of matrices meeting each condition.
arr <- simplify2array(listX)
grzero = rowMeans(arr > 0, dims = 2)
lezero = rowMeans(arr < 0, dims = 2)
prop = 0.6
1* (grzero >= prop) + -1* (lezero >= prop)
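As a quick sanity check (my addition, not part of the answer), the array-based proportions agree with the Reduce()-based proportions from the previous answer:
# The two ways of computing the proportion of positive cells should match
all.equal(rowMeans(simplify2array(listX) > 0, dims = 2),
          Reduce(`+`, lapply(listX, function(x) x > 0)) / length(listX))
#> [1] TRUE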

Test case showing which answers work so far! (edit)
Below you'll find my original answer. It ended up producing comparable results to the other answers on test cases involving randomly seeded data. To triple check, I created a small test data set with a known answer. It turns out that only the answer by @Chris passes right now (though @user20650's should be fine if using >= on this example, as indicated in the comments). Here it is in case anyone else wants to use it:
listX <- list(
matrix(c(1,0,-1,1), nrow = 2),
matrix(c(1,0,-1,1), nrow = 2),
matrix(c(1,0, 1,0), nrow = 2)
)
# With any threshold < .67,
# result should be...
matrix(c(1, 0, -1, 1), nrow = 2)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
# Otherwise...
matrix(c(1, 0, 0, 0), nrow = 2)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 0
# @Chris's answer passes
threshold <- 0.5
(Reduce('+', lapply(listX, function(x){x > 0})) / length(listX) >= threshold) -
  (Reduce('+', lapply(listX, function(x){x < 0})) / length(listX) >= threshold)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
threshold <- 1.0
(Reduce('+', lapply(listX, function(x){x > 0})) / length(listX) >= threshold) -
  (Reduce('+', lapply(listX, function(x){x < 0})) / length(listX) >= threshold)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 0
# My function fails...
prob_matrix(listX, .5)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
prob_matrix(listX, 1)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 1
# @user20650's answer fails...
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 0.5
1* (grzero > prop) + -1* (lezero > prop)
#> [,1] [,2]
#> [1,] 1 -1
#> [2,] 0 1
arr <- simplify2array(listX)
grzero = rowSums(arr > 0, dims = 2) / length(listX)
lezero = rowSums(arr < 0, dims = 2) / length(listX)
prop = 1.0
1* (grzero > prop) + -1* (lezero > prop)
#> [,1] [,2]
#> [1,] 0 0
#> [2,] 0 0
Original answer
Here's one approach...
Combine sign and Reduce to do a cumulative sum of the signs of values in each cell, returning a single matrix.
Any cells where this value is less than the threshold number (your probability * the number of matrices in the list) are converted to 0.
Return the sign() of all cells.
Below is an example with a wrapper function:
Toy data...
set.seed(12)
listX <- list()
for(i in 1:10){listX[[i]]<-matrix(rnorm(n = 25, mean = 0, sd = 1), 5, 5)}
Function...
prob_matrix <- function(matrix_list, prob) {
  # Sum the signs of values in each cell
  matrix_list <- lapply(matrix_list, sign)
  x <- Reduce(`+`, matrix_list)
  # Convert cells below prob to 0, others to relevant sign
  x[abs(x) < (prob * length(matrix_list)) / 2] <- 0
  sign(x)
}
Example cases...
prob_matrix(listX, .2)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1 1 0 1 0
#> [2,] -1 0 -1 -1 0
#> [3,] 1 -1 1 1 1
#> [4,] 0 -1 1 1 -1
#> [5,] -1 0 -1 0 -1
prob_matrix(listX, .4)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1 1 0 1 0
#> [2,] -1 0 -1 -1 0
#> [3,] 1 -1 1 1 1
#> [4,] 0 -1 1 1 -1
#> [5,] -1 0 -1 0 -1
prob_matrix(listX, .6)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 0 1 0
#> [2,] -1 0 0 -1 0
#> [3,] 1 -1 0 1 1
#> [4,] 0 0 0 1 -1
#> [5,] -1 0 0 0 -1
prob_matrix(listX, .8)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 0 1 0
#> [2,] -1 0 0 -1 0
#> [3,] 1 -1 0 1 1
#> [4,] 0 0 0 1 -1
#> [5,] -1 0 0 0 -1
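For completeness, a proportion-based variant that does pass the test case above could look like the following; it is essentially @Chris's logic wrapped in a function (a sketch, with prob_matrix2 as a made-up name):
# Sketch of a passing variant: score by proportion of positive/negative cells
prob_matrix2 <- function(matrix_list, prob) {
  pos <- Reduce(`+`, lapply(matrix_list, function(x) x > 0)) / length(matrix_list)
  neg <- Reduce(`+`, lapply(matrix_list, function(x) x < 0)) / length(matrix_list)
  (pos >= prob) - (neg >= prob)
}
# With the 3-matrix test listX from the test-case section above:
prob_matrix2(listX, .5)  # matches the expected result for thresholds < .67
prob_matrix2(listX, 1)   # matches the expected result otherwise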

Related

Optimize a network for three connections each in R

I have a list of locations and their weights (pairwise distances) in a matrix.
I would like the optimal solution in which each location has exactly 3 connections, minimizing the total distance.
library(lpSolve)
costs6 <- matrix(c(0, 399671, 1525211, 990914, 1689886, 1536081,
                   399671, 0, 1802419, 1128519, 1964930, 1603803,
                   1525211, 1802419, 0, 814942, 164677, 943489,
                   990914, 1128519.4, 814942.7, 0, 953202, 565712,
                   1689886, 1964930, 164677, 953202, 0, 1004916,
                   1536081, 1603803, 943489, 565712, 1004916, 0), ncol = 6, byrow = TRUE)
plantcap <- rep(3, 6)
citydemand <- rep(3, 6)
plant.signs <- rep("=", 6)
city.signs <- rep("=", 6)
lptrans <- lp.transport(costs6, "min", plant.signs, plantcap, city.signs, citydemand)
lptrans$solution
lptrans
This LP solver returns
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 0 0 0 0 0
[2,] 0 3 0 0 0 0
[3,] 0 0 3 0 0 0
[4,] 0 0 0 3 0 0
[5,] 0 0 0 0 3 0
[6,] 0 0 0 0 0 3
I am wondering if there is a way to cap each Xij at 1, so that the solver gives me three 1s in each column/row rather than a single 3 in each column/row. If not, is there another solver I can use to find the solution?
Something like this, setting it up as an LP problem (assuming a symmetric solution matrix)?
library(lpSolve)
costs6 <- matrix(c(0,399671,1525211,990914,1689886,1536081,
399671,0,1802419,1128519,1964930,1603803,
1525211,1802419,0,814942,164677,943489,
990914,1128519.4,814942.7,0,953202,565712,
1689886,1964930,164677,953202,0, 1004916,
1536081,1603803,943489,565712,1004916,0),ncol=6,byrow=TRUE)
nLoc <- nrow(costs6)
nParams <- sum(1:(nLoc - 1L))
# set up the constraint matrix
# columns are parameters corresponding to the lower triangular of costs6 (by column)
# the first six constraints are for the row/column sums
# the last 15 constraints are for the maximum number of times each path can be used (1)
nConst <- sum(1:nLoc)
mConst <- matrix(0L, nConst, nParams)
mConst[matrix(c(c(combn(1:nLoc, 2)), rep(1:nParams, each = 2)), ncol = 2)] <- 1L
mConst[(nLoc + 1L):nConst,] <- diag(nParams)
lpSol <- lp(
  direction = "min",
  objective.in = unlist(costs6[lower.tri(costs6)]),
  const.mat = mConst,
  const.dir = c(rep("=", nLoc), rep("<=", nParams)),
  const.rhs = c(rep(3L, nLoc), rep(1L, nParams)),
  all.int = TRUE
)
lpSol
#> Success: the objective function is 8688039
# convert the solution to a transport matrix
mSol <- matrix(0, nLoc, nLoc)
mSol[lower.tri(mSol)] <- lpSol$solution
mSol[upper.tri(mSol)] <- t(mSol)[upper.tri(mSol)]
mSol
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 1 1 1 0 0
#> [2,] 1 0 0 1 1 0
#> [3,] 1 0 0 0 1 1
#> [4,] 1 1 0 0 0 1
#> [5,] 0 1 1 0 0 1
#> [6,] 0 0 1 1 1 0
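As a quick sanity check on the solution (my addition, not part of the answer), every location should end up with exactly three connections and no self-loops:
# Each row of the transport matrix should sum to 3, with a zero diagonal
rowSums(mSol)
#> [1] 3 3 3 3 3 3
all(diag(mSol) == 0)
#> [1] TRUE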

Create matrix with for-loop

I am trying to create the following matrix A for n rows and n+1 columns (the band matrix shown in the answers below: -2 on the diagonal, 1 on the adjacent off-diagonals, and -1 in the top-left corner). n will likely be around 20 or 30, but for the purpose of the question I set it to 4 and 5.
Here is what I have so far:
N <- 5 # n+1
n <- 4 # n
columns <- list()
# first column:
columns[1] <- c(-1, 1, rep(0, N-2))
# all other columns:
for(i in N:2) {
columns[i] <- c((rep(0, N-i), 1, -2, 1, rep(0, i-3)))
}
# combine into matrix:
A <- cbind(columns)
I keep getting the following error msg:
In columns[1] <- c(-1, 1, rep(0, N - 2)) :
number of items to replace is not a multiple of replacement length
And later:
for(i in N:2) {
  columns[i] <- c((rep(0, N-i),
}
Error: unexpected '}' in "}"
I guess you can try the for loop below to create your matrix A:
N <- 5
n <- 4
A <- matrix(0,n,N)
for (i in 1:nrow(A)) {
  if (i == 1) {
    A[i, 1:2] <- c(-1, 1)
  } else {
    A[i, i + (-1:1)] <- c(1, -2, 1)
  }
}
such that
> A
[,1] [,2] [,3] [,4] [,5]
[1,] -1 1 0 0 0
[2,] 1 -2 1 0 0
[3,] 0 1 -2 1 0
[4,] 0 0 1 -2 1
Another solution is to use outer(), which looks more compact than the for loop approach (and may be faster), i.e.,
A <- `diag<-`(replace(z <- abs(outer(1:n, 1:N, "-")), !z %in% c(0, 1), 0),
              c(-1, rep(-2, length(diag(z)) - 1)))
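Since the one-liner is dense, here is the same construction unrolled into steps (my own breakdown for readability, not part of the original answer):
z <- abs(outer(1:n, 1:N, "-"))       # |i - j| for every cell of an n x N matrix
A <- replace(z, !z %in% c(0, 1), 0)  # keep only the diagonal (0) and its neighbours (1)
diag(A) <- c(-1, rep(-2, n - 1))     # first diagonal entry is -1, the rest -2
A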
I thought this would be fast compared to the loop, but when I tested on a 5000x5001 example, the loop in ThomasIsCoding's answer was about 5x faster. Go with that one!
N = 5
n = N - 1
A = matrix(0, nrow = n, ncol = N)
delta = row(A) - col(A)
diag(A) = -2
A[delta %in% c(1, -1)] = 1
A[1, 1] = -1
A
# [,1] [,2] [,3] [,4] [,5]
# [1,] -1 1 0 0 0
# [2,] 1 -2 1 0 0
# [3,] 0 1 -2 1 0
# [4,] 0 0 1 -2 1
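The timing comparison can be reproduced roughly as follows (a sketch with my own wrapper names; exact timings will vary by machine):
# Wrap the two approaches so they can be timed on a large case
build_loop <- function(n, N) {
  A <- matrix(0, n, N)
  for (i in 1:n) {
    if (i == 1) A[i, 1:2] <- c(-1, 1) else A[i, i + (-1:1)] <- c(1, -2, 1)
  }
  A
}
build_rowcol <- function(n, N) {
  A <- matrix(0, nrow = n, ncol = N)
  delta <- row(A) - col(A)
  diag(A) <- -2
  A[delta %in% c(1, -1)] <- 1
  A[1, 1] <- -1
  A
}
system.time(build_loop(5000, 5001))
system.time(build_rowcol(5000, 5001))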
You could use data.table::shift to shift the vector c(1, -2, 1, 0) by all increments from -1 (backwards shift / lead by 1) to n - 1 (forward shift / lagged by n - 1) and then cbind all the shifted outputs together. The first-row first-column element doesn't follow this pattern so that's fixed at the end.
library(data.table)
out <- do.call(cbind, shift(c(1, -2, 1, 0), seq(-1, n - 1), fill = 0))
out[1, 1] <- -1
out
# [,1] [,2] [,3] [,4] [,5]
# [1,] -1 1 0 0 0
# [2,] 1 -2 1 0 0
# [3,] 0 1 -2 1 0
# [4,] 0 0 1 -2 1
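To see the intermediate step (my own illustration, not part of the answer): shift() returns a list with one shifted copy of the vector per shift amount, which do.call(cbind, ...) then binds into the columns of out.
shift(c(1, -2, 1, 0), seq(-1, n - 1), fill = 0)
# a list of n + 1 shifted vectors: c(-2, 1, 0, 0), c(1, -2, 1, 0), c(0, 1, -2, 1), ...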

Conditional combination matrices in R

How can I create all binary combination matrices with the condition that there can be only a single 1 per column and row? The example will clarify: this particular case must have 6 combination matrices, of which I am showing only the first 2.
c1 <- matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), nrow = 3) #First combination
c2 <- matrix(c(0, 1, 0, 1, 0, 0, 0, 0, 1), nrow = 3) #Second combination
What you are asking for is equivalent to finding all permutations of length = n where n = nrow(c1) (or c2 above). Using the FUN argument of permuteGeneral from RcppAlgos (I am the author), we can easily generate the desired outcome:
n <- 3L
myIdentity <- diag(nrow = n)
library(RcppAlgos)
permuteGeneral(n, n, FUN = function(x) myIdentity[x, ])
[[1]]
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[[2]]
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 0 1
[3,] 0 1 0
[[3]]
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 1 0 0
[3,] 0 0 1
[[4]]
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 0 0 1
[3,] 1 0 0
[[5]]
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 1 0 0
[3,] 0 1 0
[[6]]
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 1 0
[3,] 1 0 0
There are many other ways of generating the requested output. Most notably, utilizing the tried and true combinat package, we can get a similar result (the output will be in a different order):
combinat::permn(3, fun = function(x) myIdentity[x, ])
Now that we have reduced the problem to simply generating permutations, we can use any of the great permutation packages (arrangements, gtools, multicool, partitions, etc.) to obtain our desired result with the help of lapply:
library(arrangements)
myPerms <- permutations(n)
lapply(1:nrow(myPerms), function(x) myIdentity[myPerms[x,], ])
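For instance, with gtools (one of the packages mentioned above; this snippet is my own illustration and assumes the package is installed):
library(gtools)
myPerms2 <- permutations(n, n)  # all 3! orderings of 1:3, one per row
lapply(seq_len(nrow(myPerms2)), function(i) myIdentity[myPerms2[i, ], ])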

Splitting list elements, expanding the list

I'm doing some kind of optical character recognition and face the following issue. I store the glyphs in a list of binary matrices; they can be of different sizes, but their maximum possible width is wid = 3 columns (it may be any defined constant, not just 3). In some cases, after the first stage of processing, I get data which look like this:
myll <- list(matrix(c(0, 0, 0, 1, 1, 0), ncol = 2),
matrix(c(0), ncol = 1),
matrix(c(1, 1, 0), ncol = 3),
matrix(c(1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1), ncol = 7),
matrix(c(1, 1, 1, 1), ncol = 2))
# [[1]]
# [,1] [,2]
# [1,] 0 1
# [2,] 0 1
# [3,] 0 0
#
# [[2]]
# [,1]
# [1,] 0
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 1 0
#
# [[4]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 1 1 1 0 0 0 1
# [2,] 0 1 0 1 0 0 1
# [3,] 1 1 1 1 0 0 1
#
# [[5]]
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
So, some glyphs may not be separated for some reason. This happens only with glyphs of the maximum possible width. Moreover, there may be some junk at the end of the matrix. I have to split them into matrices of width ncol = wid, leaving the last piece (junk) as is, and then store these matrices in separate list elements to get the following output:
# [[1]]
# [,1] [,2]
# [1,] 0 1
# [2,] 0 1
# [3,] 0 0
#
# [[2]]
# [,1]
# [1,] 0
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 1 0
#
# [[4]]
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 0 1 0
# [3,] 1 1 1
#
# [[5]]
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 1 0 0
# [3,] 1 0 0
#
# [[6]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 1
#
# [[7]]
# [,1] [,2]
# [1,] 1 1
# [2,] 1 1
At the moment I can do it with the help of these functions:
checkGlyphs <- function(gl_m, wid = 3) {
  if (ncol(gl_m) > wid)
    return(list(gl_m[, 1:wid], matrix(gl_m[, -(1:wid)], nrow = nrow(gl_m)))) else
    return(gl_m)
}

separateGlyphs <- function(myll, wid = 3) {
  require("magrittr")
  presplit <- lapply(myll, checkGlyphs, wid)
  total_new_length <-
    presplit[unlist(lapply(presplit, is.list))] %>% lapply(length) %>% unlist() %>% sum() +
    as.integer(!unlist(lapply(presplit, is.list))) %>% sum()
  splitted <- vector("list", length = total_new_length)
  spl_index <- 1
  for (i in 1:length(presplit)) {
    if (!is.list(presplit[[i]])) {
      splitted[[spl_index]] <- presplit[[i]]
      spl_index <- spl_index + 1
    } else {
      for (j in 1:length(presplit[[i]])) {
        splitted[[spl_index]] <- presplit[[i]][[j]]
        spl_index <- spl_index + 1
      }
    }
  }
  if (any(lapply(splitted, ncol) > wid)) return(separateGlyphs(splitted, wid)) else
    return(splitted)
}
But I believe there is a faster and more convenient way to achieve the same result (without for loops, this enlooped reassignment of elements, and then recursion if needed O_o).
I will be thankful for any suggestions on this point or, alternatively, for recommendations of OCR packages for R.
This should do the trick, with the values in final being what you're after.
combined <- do.call(cbind, lapply(myll, unlist))
idx <- seq(1, ncol(combined), 2)
final <- do.call(list, lapply(idx, function(x) combined[, x:(x+1)]))
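The cbind() approach above assumes all glyphs can be bound into equal-length columns, which does not hold for the ragged example data in the question. A loop-free sketch closer to the requested output might look like this (my own suggestion, not from the original answers; splitGlyphs is a made-up name):
# Split each over-wide glyph into chunks of at most `wid` columns, then flatten
splitGlyphs <- function(myll, wid = 3) {
  pieces <- lapply(myll, function(m) {
    grp <- ceiling(seq_len(ncol(m)) / wid)               # column groups: 1,1,1, 2,2,2, 3, ...
    lapply(split(seq_len(ncol(m)), grp),
           function(cols) m[, cols, drop = FALSE])       # keep matrix shape for 1-column pieces
  })
  unlist(pieces, recursive = FALSE, use.names = FALSE)   # flatten one level of nesting
}
splitGlyphs(myll, wid = 3)  # gives the 7-element list shown in the desired output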

Subset selection from binary matrix with dynamic column indices

A few questions, for which the R language might have elegant solutions....
Given a matrix m containing binary values 1 and 0, and a vector v of column indices:
how would I write a function to extract all rows in m that have the value 1 in each of the columns indexed by the integers in v?
as an extra feature, how would one return the row indices along with the corresponding rows?
Probably best if I illustrate with an example...
Assume the logic I'm asking for resides in a function selectByIndices(matrix, indexVector).
so if we have the matrix (or perhaps the equivalent dataframe):
> (m = matrix(c(1, 0, 1, 1, 1,
                0, 1, 1, 0, 1,
                1, 0, 1, 1, 0,
                1, 1, 1, 0, 1,
                0, 1, 0, 0, 1), 5))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[2,] 0 1 0 1 1
[3,] 1 1 1 1 0
[4,] 1 0 1 0 0
[5,] 1 1 0 1 1
and index vectors:
>c1 = c(1,3,4)
>c2 = c(4,5)
>c3 = c(1,3,5)
The function would behave something like this:
>selectByIndices( m, c1)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[3,] 1 1 1 1 0
>selectByIndices( m, c2)
[,1] [,2] [,3] [,4] [,5]
[2,] 0 1 0 1 1
[5,] 1 1 0 1 1
>selectByIndices( m, c3)
#no rows (i.e. empty collection) returned
Hoping it's clear enough, thanks in advance for your help.
## Create a function that extracts the qualifying rows
f <- function(m, j) {
  m[rowSums(m[, j]) == length(j), ]
  # m[apply(m[, j], 1, function(X) all(X == 1)), ]  ## This would also work
  # which(rowSums(m[, j]) == length(j))             ## & this would get row indices
}
## Try it out
f(m, c1)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 1 1 0
# [2,] 1 1 1 1 0
f(m, c2)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 1 0 1 1
# [2,] 1 1 0 1 1
> selectRows <- function(mat, rown) suppressWarnings(mat[apply(mat[, rown], 1, all), ])
> selectRows(m, c1)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[2,] 1 1 1 1 0
> whichRows <- function(mat, rown) suppressWarnings(which(apply(mat[, rown], 1, all)))
> whichRows(m, c1)
[1] 1 3
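To also cover the "extra feature" in the question (returning the row indices together with the rows), a combined sketch might look like this (my own wrapper, reusing the question's function name; drop = FALSE guards against a single-column v and empty results):
selectByIndices <- function(m, v) {
  idx <- which(rowSums(m[, v, drop = FALSE] == 1) == length(v))  # rows with 1 in every indexed column
  list(indices = idx, rows = m[idx, , drop = FALSE])
}
selectByIndices(m, c1)  # indices 1 and 3, with the corresponding rows
selectByIndices(m, c3)  # integer(0) and a zero-row matrix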
