Get the maximum permutation matrix from logical matrix - math

A (m rows, n columns) is a (0,1)-matrix (or logical matrix).
How can I get a submatrix B (p rows, p columns) from A such that B is a permutation matrix and p is as large as possible?
PS: A permutation matrix is a square binary matrix that has exactly one entry 1 in each row and each column and 0s elsewhere.

One possibility is to exploit that every permutation matrix can be built up one row and column at a time.
So we can take every permutation matrix of a certain size, try to extend it by all possible rows or columns,
and see what results in a permutation matrix that is one size larger.
The running time isn't that great. I think it's something like O(2^(m+n)). (And I've used Python, FWIW.)
#!/usr/local/bin/python3
import itertools

A = ((0,1,0,0),
     (0,0,1,0),
     (0,1,1,0),
     (1,0,0,1))
maximalSubmatrices = { ( (), () ), }
# each tuple is a tuple of rows and then columns
maxP = 0

def isPerm(rows,cols):
    if ( len(rows) != len(cols) ):
        return False
    for row in rows:
        if not exactlyOne( A[row][col] for col in cols ):
            return False
    for col in cols:
        if not exactlyOne( A[row][col] for row in rows ):
            return False
    return True

def exactlyOne(sequence):
    return sum( 1 for elt in sequence if elt ) == 1

while True:
    moreMaxl = set()
    for submatrix in maximalSubmatrices:
        for row,col in itertools.product(range(len(A)),range(len(A[0]))):
            if ( row not in submatrix[0] and col not in submatrix[1] ):
                moreMaxl.add( ( tuple(sorted(submatrix[0]+(row,))) , tuple(sorted(submatrix[1]+(col,))) ) )
    moreMaxl = set( ( maxl for maxl in moreMaxl if isPerm(*maxl) ) )
    if ( len(moreMaxl) ):
        maxP += 1
        maximalSubmatrices = moreMaxl
    else:
        break

for maxl in maximalSubmatrices:
    print("maximal rows: ",maxl[0],"\nmaximal cols: ",maxl[1],end="\n\n")
print("maximum permutation size is: ",maxP)
The output is:
maximal rows: (0, 1, 3)
maximal cols: (0, 1, 2)
maximal rows: (0, 1, 3)
maximal cols: (1, 2, 3)
maximum permutation size is: 3
Explanation:
In Python, a tuple is an immutable array of objects. Because it’s immutable, it can be hashed and made an element of a set. So maximalSubmatrices is a set of the rows and columns needed to make a submatrix. In Java, we’d do something like:
class Submatrix {
List<Integer> rows;
List<Integer> columns;
public int hashCode();
public boolean equals(Object);
}
Set<Submatrix> maximalSubmatrices;
but Python can take care of all that by itself.
We start with the rows and columns needed to make a submatrix of size 0: both are the empty tuple (). Each time through the while loop, we take all possible row,column pairs and see if that row,column could extend a current permutation matrix (in other words, they're not already in the matrix). If so, we add the extended matrix to the set moreMaxl. Then we go through moreMaxl and keep only the permutation matrices. If there are still elements in moreMaxl, then they're permutation matrices that are one size larger than the matrices in maximalSubmatrices. Since we could extend, the while loop continues; once nothing can be extended, the loop stops and maximalSubmatrices holds the largest permutation submatrices.
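For a small matrix, the result of the incremental search can be cross-checked by brute force. The following is only a sketch and not part of the answer above: the helper names is_permutation and brute_force_max are hypothetical, and the code simply tries every choice of p rows and p columns, starting from the largest possible p. It assumes the entries of A are 0 or 1 and is exponential in the matrix size, so it is meant as a sanity check only.

```python
from itertools import combinations

# Same example matrix as in the answer above
A = ((0, 1, 0, 0),
     (0, 0, 1, 0),
     (0, 1, 1, 0),
     (1, 0, 0, 1))

def is_permutation(A, rows, cols):
    """True if the submatrix picked by rows/cols has exactly one 1 per row and column."""
    if len(rows) != len(cols):
        return False
    rows_ok = all(sum(A[r][c] for c in cols) == 1 for r in rows)
    cols_ok = all(sum(A[r][c] for r in rows) == 1 for c in cols)
    return rows_ok and cols_ok

def brute_force_max(A):
    """Return (p, list of (rows, cols)) for the largest permutation submatrices."""
    m, n = len(A), len(A[0])
    for p in range(min(m, n), 0, -1):
        found = [(rows, cols)
                 for rows in combinations(range(m), p)
                 for cols in combinations(range(n), p)
                 if is_permutation(A, rows, cols)]
        if found:
            return p, found
    return 0, [((), ())]

p, found = brute_force_max(A)
print(p, found)
```

On the example matrix this agrees with the output listed above: size 3, with rows (0, 1, 3) and columns (0, 1, 2) or (1, 2, 3).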

Related

Julia beginner: Iterate getindex() for increasing indices

I'm working with a 121x137 (i,j) array (a mortality table) and am trying to create a 121x1 vector from it made up of the (i+1,j+1) values (i.e. index (50,50), (51,51) and so forth).
I'm using the following code - BaseTable is my 121x137 array:
age=50
survivalcurve = for i in age:nrow(BaseTable)-1
    for j in age:ncol(BaseTable)-1
        println(getindex(FemaleBaseTable, i+1, j+1))
    end
end
However, when I do this, it's returning all values of i and j: if I picture the values I want as a diagonal line running top to bottom, left to right across my table, it's giving me all the values to the top right of my imaginary diagonal line.
If I fix i and loop through j, it works and returns the entire 50th row:
age=50
survivalcurve =
    for j in age:ncol(FemaleBaseTable)
        println(getindex(FemaleBaseTable, 50, j+1))
    end
and likewise if I fix j and loop through i, this works and returns the entire jth column:
age=50
survivalcurve =
    for i in age:nrow(FemaleBaseTable)-1
        println(getindex(FemaleBaseTable, i+1, 50))
    end
I've surmised it's returning all values because I'm using age:nrow / age:ncol, but I'm not sure what a suitable replacement would be to return only the (i+1),(j+1) values. Any help would be appreciated!
You can collect the diagonal elements starting at any index (i,j) by using either diag from LinearAlgebra.jl or manually using a single loop.
Assuming you have:
using LinearAlgebra  # provides diag
nrows, ncols = 121, 137
FemaleBaseTable = rand(nrows, ncols)
# And the diagonal should start at any (i,j)
i, j = 50, 50
You can either use
diag(@view FemaleBaseTable[i:end,j:end])
or,
[FemaleBaseTable[r,c] for (r,c) in zip(i:nrows,j:ncols)]
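The nested loops in the question print a whole rectangle because every i is paired with every j, whereas a diagonal needs both indices to advance together. A minimal Python sketch of the same idea, using a small made-up nested list and 0-based indices (the name diagonal_from is hypothetical):

```python
def diagonal_from(table, i, j):
    """Collect table[i][j], table[i+1][j+1], ... until a row or column runs out."""
    nrows, ncols = len(table), len(table[0])
    # zip advances both indices in lockstep and stops at the shorter range
    return [table[r][c] for r, c in zip(range(i, nrows), range(j, ncols))]

# Made-up 3x4 table whose entries encode their own position (10*row + column)
table = [[10 * r + c for c in range(4)] for r in range(3)]

# Nested loops pair every row with every column: a 2x3 rectangle of 6 values
nested = [table[r][c] for r in range(1, 3) for c in range(1, 4)]

# Lockstep iteration yields only the diagonal starting at (1, 1)
diagonal = diagonal_from(table, 1, 1)
print(nested)
print(diagonal)
```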

function which finds a common number in multiple lists

How would this function be completed to return the common integers between two lists?
How would I complete the get_common_elements(list1, list2) function? The function should select all the integers common to both parameters and display them in the result.
i.e. numbers1 = 3,6,8,9,12,35
numbers2 = 6,7,13,34,35
result = 6,35
You can assume that each number occurs only once in each list.
def common_member(a, b):
    a_set = set(a)
    b_set = set(b)
    if (a_set & b_set):
        print(a_set & b_set)
    else:
        print("No common elements")
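The snippet above prints a set, whereas the question asks for a function named get_common_elements that returns the result. A minimal sketch under that reading, keeping the order of the first list (the ordering choice is an assumption; the question does not specify one):

```python
def get_common_elements(list1, list2):
    """Return the integers present in both lists, in the order they appear in list1."""
    lookup = set(list2)  # set membership tests are O(1) on average
    return [x for x in list1 if x in lookup]

numbers1 = [3, 6, 8, 9, 12, 35]
numbers2 = [6, 7, 13, 34, 35]
result = get_common_elements(numbers1, numbers2)
print(result)  # [6, 35]
```

Returning a list instead of printing lets the caller decide how to display the result, and an empty list naturally covers the "no common elements" case.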

Identify which rows (or columns) have values in sparse Matrix

I need to identify the rows (/columns) that have defined values in a large sparse Boolean Matrix. I want to use this to 1. slice (actually view) the Matrix by those rows/columns; and 2. slice (/view) vectors and matrices that have the same dimensions as the margins of a Matrix. I.e. the result should probably be a Vector of indices / Bools or (preferably) an iterator.
I've tried the obvious:
a = sprand(10000, 10000, 0.01)
cols = unique(a.colptr)
rows = unique(a.rowval)
but each of these takes about 20ms on my machine, probably because they allocate about 1MB (at least they allocate cols and rows). This is inside a performance-critical function, so I'd like the code to be optimized. The Base code seems to have an nzrange iterator for sparse matrices, but it is not easy for me to see how to apply that to my case.
Is there a suggested way of doing this?
Second question: I'd need to also perform this operation on views of my sparse Matrix - would that be something like x = view(a,:,:); cols = unique(x.parent.colptr[x.indices[:,2]]) or is there specialized functionality for this? Views of sparse matrices appear to be tricky (cf https://discourse.julialang.org/t/slow-arithmetic-on-views-of-sparse-matrices/3644 – not a cross-post)
Thanks a lot!
Regarding getting the non-zero rows and columns of a sparse matrix, the following functions should be pretty efficient:
nzcols(a::SparseMatrixCSC) = collect(i
    for i in 1:a.n if a.colptr[i]<a.colptr[i+1])
function nzrows(a::SparseMatrixCSC)
    active = falses(a.m)
    for r in a.rowval
        active[r] = true
    end
    return find(active)
end
For a 10_000x10_000 matrix with 0.1 density it takes 0.2ms and 2.9ms for cols and rows, respectively. It should also be quicker than the method in the question (quite apart from that method's correctness issue).
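To make the idea behind nzcols and nzrows concrete, here is a minimal Python sketch of the compressed sparse column layout with plain lists and 0-based indices. The names nz_cols and nz_rows and the tiny example pattern are made up for illustration. It also hints at why unique(a.colptr) in the question does not give column indices: colptr holds offsets into the row-index array, not column numbers.

```python
def nz_cols(n, colptr):
    """Columns that store at least one entry: their colptr slice is non-empty."""
    return [j for j in range(n) if colptr[j] < colptr[j + 1]]

def nz_rows(m, rowval):
    """Rows that appear at least once among the stored row indices."""
    active = [False] * m
    for r in rowval:
        active[r] = True
    return [r for r, flag in enumerate(active) if flag]

# Made-up 4x5 pattern with stored entries at (row, col) = (0,1), (2,1), (3,4)
m, n = 4, 5
colptr = [0, 0, 2, 2, 2, 3]   # column j owns rowval[colptr[j]:colptr[j+1]]
rowval = [0, 2, 3]

cols = nz_cols(n, colptr)
rows = nz_rows(m, rowval)
print(cols, rows)
```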
Regarding views of sparse matrices, a quick solution would be to turn view into a sparse matrix (e.g. using b = sparse(view(a,100:199,100:199))) and use functions above. In code:
nzcols(b::SubArray{T,2,P}) where {T,P<:AbstractSparseArray} = nzcols(sparse(b))
nzrows(b::SubArray{T,2,P}) where {T,P<:AbstractSparseArray} = nzrows(sparse(b))
A better solution would be to customize the functions according to view. For example, when the view uses UnitRanges for both rows and columns:
# utility predicate returning true if element of sorted v in range r
inrange(v,r) = searchsortedlast(v,last(r))>=searchsortedfirst(v,first(r))
function nzcols(b::SubArray{T,2,P,Tuple{UnitRange{Int64},UnitRange{Int64}}}
        ) where {T,P<:SparseMatrixCSC}
    return collect(i+1-start(b.indexes[2])
        for i in b.indexes[2]
        if b.parent.colptr[i]<b.parent.colptr[i+1] &&
            inrange(b.parent.rowval[nzrange(b.parent,i)],b.indexes[1]))
end
function nzrows(b::SubArray{T,2,P,Tuple{UnitRange{Int64},UnitRange{Int64}}}
        ) where {T,P<:SparseMatrixCSC}
    active = falses(length(b.indexes[1]))
    for c in b.indexes[2]
        for r in nzrange(b.parent,c)
            if b.parent.rowval[r] in b.indexes[1]
                active[b.parent.rowval[r]+1-start(b.indexes[1])] = true
            end
        end
    end
    return find(active)
end
which work faster than the versions for the full matrices (for 100x100 submatrix of above 10,000x10,000 matrix cols and rows take 16μs and 12μs, respectively on my machine, but these are unstable results).
A proper benchmark would use fixed matrices (or at least fix the random seed). I'll edit this line with such a benchmark if I do it.
In case the indices are not ranges, the fallback of converting to a sparse matrix works, but here are versions for indices which are Vectors. If the indices are mixed, yet another set of versions needs to be made. This is quite repetitive, but it is also a strength of Julia: once the versions are written, the code will choose the optimized method from the types in the caller without much effort.
function sortedintersecting(v1, v2)
    i,j = start(v1), start(v2)
    while i <= length(v1) && j <= length(v2)
        if v1[i] == v2[j]
            return true
        elseif v1[i] > v2[j]
            j += 1
        else
            i += 1
        end
    end
    return false
end
function nzcols(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
        ) where {T,P<:SparseMatrixCSC}
    brows = sort(unique(b.indexes[1]))
    return [k
        for (k,i) in enumerate(b.indexes[2])
        if b.parent.colptr[i]<b.parent.colptr[i+1] &&
            sortedintersecting(brows,b.parent.rowval[nzrange(b.parent,i)])]
end
function nzrows(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
        ) where {T,P<:SparseMatrixCSC}
    active = falses(length(b.indexes[1]))
    for c in b.indexes[2]
        active[findin(b.indexes[1],b.parent.rowval[nzrange(b.parent,c)])] = true
    end
    return find(active)
end
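The sortedintersecting helper above is a two-pointer scan over two sorted sequences. A minimal Python sketch of the same technique with 0-based indices (the snake_case name is just a transliteration, not an existing library function):

```python
def sorted_intersecting(v1, v2):
    """Return True if two ascending sequences share at least one element."""
    i = j = 0
    while i < len(v1) and j < len(v2):
        if v1[i] == v2[j]:
            return True
        elif v1[i] > v2[j]:
            j += 1   # v2[j] is too small to match anything left in v1
        else:
            i += 1   # v1[i] is too small to match anything left in v2
    return False

print(sorted_intersecting([1, 4, 9], [2, 4, 6]))
print(sorted_intersecting([1, 3], [2, 4]))
```

Each step advances one of the two pointers, so the scan takes at most len(v1) + len(v2) comparisons and stops at the first common element.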
-- ADDENDUM --
Since it was noted nzrows for Vector{Int} indices is a bit slow, this is an attempt to improve its speed by replacing findin with a version exploiting sortedness:
function findin2(inds,v,w)
    i,j = start(v),start(w)
    res = Vector{Int}()
    while i<=length(v) && j<=length(w)
        if v[i]==w[j]
            push!(res,inds[i])
            i += 1
        elseif (v[i]<w[j])
            i += 1
        else
            j += 1
        end
    end
    return res
end
function nzrows(b::SubArray{T,2,P,Tuple{Vector{Int64},Vector{Int64}}}
        ) where {T,P<:SparseMatrixCSC}
    active = falses(length(b.indexes[1]))
    inds = sortperm(b.indexes[1])
    brows = (b.indexes[1])[inds]
    for c in b.indexes[2]
        active[findin2(inds,brows,b.parent.rowval[nzrange(b.parent,c)])] = true
    end
    return find(active)
end
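The role of inds in findin2 may be easier to see in a small Python sketch: the view's row indices are sorted once, and inds remembers where each sorted value came from, so matches can be reported as positions in the original, unsorted vector. The function name find_in_sorted and the sample data are hypothetical, and indices are 0-based.

```python
def find_in_sorted(inds, v, w):
    """For sorted v and w, return inds[i] for every v[i] that also occurs in w."""
    i = j = 0
    res = []
    while i < len(v) and j < len(w):
        if v[i] == w[j]:
            res.append(inds[i])   # report the position in the original order
            i += 1
        elif v[i] < w[j]:
            i += 1
        else:
            j += 1
    return res

view_rows = [7, 2, 5]                                        # unsorted row indices of a view
inds = sorted(range(len(view_rows)), key=view_rows.__getitem__)  # like sortperm
brows = [view_rows[k] for k in inds]                         # sorted copy: [2, 5, 7]
stored = [2, 3, 7]                                           # sorted stored rows of one column
hits = find_in_sorted(inds, brows, stored)
print(hits)
```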

Julia: swap gives errors

I'm using Julia 0.3.4
I'm trying to write LU-decomposition using Gaussian elimination. So I have to swap rows. And here's my problem:
If I'm using a,b = b,a I get an error,
but if I'm using:
function swapRows(row1, row2)
    temp = row1
    row1 = row2
    row2 = temp
end
then everything works just fine.
Am I doing something wrong, or is it a bug?
Here's my source code:
function lu_t(A::Matrix)
    # input value: (A), where A is a matrix
    # return value: (L,U), where L,U are matrices
    function swapRows(row1, row2)
        temp = row1
        row1 = row2
        row2 = temp
        return null
    end
    if size(A)[1] != size(A)[2]
        throw(DimException())
    end
    n = size(A)[1] # matrix dimension
    U = copy(A) # upper triangular matrix
    L = eye(n) # lower triangular matrix
    for k = 1:n-1 # direct Gaussian elimination for each column `k`
        (val,id) = findmax(U[k:end,k]) # find max pivot element and its row `id`
        if val == 0 # check matrix for singularity
            throw(SingularException())
        end
        swapRows(U[k,k:end],U[id,k:end]) # swap row `k` and `id`
        # U[k,k:end],U[id,k:end] = U[id,k:end],U[k,k:end] - error
        for i = k+1:n # for each row `i` > `k`
            μ = U[i,k] / U[k,k] # find elimination coefficient `μ`
            L[i,k] = μ # save to an appropriate position in lower triangular matrix `L`
            for j = k:n # update each value of the row `i`
                U[i,j] = U[i,j] - μ⋅U[k,j]
            end
        end
    end
    return (L,U)
end
###### main code ######
A = rand(4,4)
@time (L,U) = lu_t(A)
@test_approx_eq(L*U, A)
The swapRows function is a no-op and has no effect whatsoever – all it does is swap around some local variable names. See various discussions of the difference between assignment and mutation:
https://groups.google.com/d/msg/julia-users/oSW5hH8vxAo/llAHRvvFVhMJ
http://julia.readthedocs.org/en/latest/manual/faq/#i-passed-an-argument-x-to-a-function-modified-it-inside-that-function-but-on-the-outside-the-variable-x-is-still-unchanged-why
http://julia.readthedocs.org/en/latest/manual/faq/#why-does-x-y-allocate-memory-when-x-and-y-are-arrays
The constant null doesn't mean what you think it does – in Julia v0.3 it's a function that computes the null space of a linear transformation; in Julia v0.4 it still means this but has been deprecated and renamed to nullspace. The "uninteresting" value in Julia is called nothing.
I'm not sure what's wrong with your commented out row swapping code, but this general approach does work:
julia> X = rand(3,4)
3x4 Array{Float64,2}:
0.149066 0.706264 0.983477 0.203822
0.478816 0.0901912 0.810107 0.675179
0.73195 0.756805 0.345936 0.821917
julia> X[1,:], X[2,:] = X[2,:], X[1,:]
(
1x4 Array{Float64,2}:
0.478816 0.0901912 0.810107 0.675179,
1x4 Array{Float64,2}:
0.149066 0.706264 0.983477 0.203822)
julia> X
3x4 Array{Float64,2}:
0.478816 0.0901912 0.810107 0.675179
0.149066 0.706264 0.983477 0.203822
0.73195 0.756805 0.345936 0.821917
Since this creates a pair of temporary arrays whose allocation we can't yet eliminate, this isn't the most efficient approach. If you want the most efficient code here, looping over the two rows and swapping pairs of scalar values will be faster:
function swapRows!(X, i, j)
    for k = 1:size(X,2)
        X[i,k], X[j,k] = X[j,k], X[i,k]
    end
end
Note that it is conventional in Julia to name functions that mutate one or more of their arguments with a trailing !. Currently, closures (i.e. inner functions) have some performance issues, so you'll want such a helper function to be defined at the top-level scope instead of inside of another function the way you've got it.
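The difference between assignment and mutation described above is not specific to Julia; Python names behave analogously. The following sketch contrasts the two (the function names are hypothetical): rebinding the parameters changes nothing for the caller, while writing into the container's elements does.

```python
def swap_rows_noop(row1, row2):
    # Only the local names row1 and row2 are rebound; the caller's data is untouched
    row1, row2 = row2, row1

def swap_rows_inplace(X, i, j):
    # Mutates X itself by exchanging the elements of rows i and j pairwise
    for k in range(len(X[i])):
        X[i][k], X[j][k] = X[j][k], X[i][k]

X = [[1, 2], [3, 4], [5, 6]]

swap_rows_noop(X[0], X[1])
after_noop = [row[:] for row in X]   # snapshot: still in the original order

swap_rows_inplace(X, 0, 1)
print(after_noop)
print(X)
```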
Finally, I assume this is an exercise since Julia ships with carefully tuned generic (i.e. it works for arbitrary numeric types) LU decomposition: http://docs.julialang.org/en/release-0.3/stdlib/linalg/#Base.lu.
-
It's quite simple
julia> A = rand(3,4)
3×4 Array{Float64,2}:
0.241426 0.283391 0.201864 0.116797
0.457109 0.138233 0.346372 0.458742
0.0940065 0.358259 0.260923 0.578814
julia> A[[1,2],:] = A[[2,1],:]
2×4 Array{Float64,2}:
0.457109 0.138233 0.346372 0.458742
0.241426 0.283391 0.201864 0.116797
julia> A
3×4 Array{Float64,2}:
0.457109 0.138233 0.346372 0.458742
0.241426 0.283391 0.201864 0.116797
0.0940065 0.358259 0.260923 0.578814

Finding duplicate values in r

So, in a string containing multiple 1's, it is possible that the number '1' appears at several positions. What I want is (3)
This is not a complete answer, but some ideas (partly based on comments):
z <- "1101101101"
zz <- as.numeric(strsplit(z,"")[[1]])
Compute autocorrelation function and draw plot: in this case I'm getting the periodicity=3 pretty crudely as the first point at which there is an increase followed by a decrease ...
a1 <- acf(zz)
first.peak <- which(diff(sign(diff(a1$acf[,,1])))==-2)[1]
Now we know the periodicity is 3; create runs of 3 with embed() and analyze their similarities:
ee <- embed(zz,first.peak)
pp <- apply(ee,1,paste,collapse="")
mm <- outer(pp,pp,"==")
aa <- apply(mm[!duplicated(mm),],1,which)
sapply(aa,length) ## 3 3 2 ## number of repeats
sapply(aa,function(x) unique(diff(x))) ## 3 3 3
The following code does exactly what you ask for. Try it with str_groups('1101101101'). It returns a list of 3-vectors. Note that the first triple is (1, 3, 4) because the character at the 10th position is also a 1.
Final version, optimized and without errors
str_groups <- function (s) {
  digits <- as.numeric(strsplit(s, '')[[1]])
  index1 <- which(digits == 1)
  len <- length(digits)
  back <- length(index1)
  if (back == 0) return(list())
  maxpitch <- (len - 1) %/% 2
  patterns <- matrix(0, len, maxpitch)
  result <- list()
  for (pitch in 1:maxpitch) {
    divisors <- which(pitch %% 1:(pitch %/% 2) == 0)
    while (index1[back] > len - 2 * pitch) {
      back <- back - 1
      if (back == 0) return(result)
    }
    for (startpos in index1[1:back]) {
      if (patterns[startpos, pitch] != 0) next
      pos <- seq(startpos, len, pitch)
      if (digits[pos[2]] != 1 || digits[pos[3]] != 1) next
      repeats <- length(pos)
      if (repeats > 3) for (i in 4:repeats) {
        if (digits[pos[i]] != 1) {
          repeats <- i - 1
          break
        }
      }
      continue <- F
      for (subpitch in divisors) {
        sublen <- patterns[startpos, subpitch]
        if (sublen > pitch / subpitch * (repeats - 1)) {
          continue <- T
          break
        }
      }
      if (continue) next
      for (i in 1:repeats) patterns[pos[i], pitch] <- repeats - i + 1
      result <- append(result, list(c(startpos, pitch, repeats)))
    }
  }
  return(result)
}
Note: this algorithm has roughly quadratic runtime complexity, so if you make your strings twice as long, it will take four times as much time to find all patterns on average.
Pseudocode version
To aid understanding of the code. For particulars of R functions such as which, consult the R online documentation, for example by running ?which on the R command line.
PROCEDURE str_groups WITH INPUT $s (a string of the form /(0|1)*/):
  digits := array containing the digits in $s
  index1 := positions of the digits in $s that are equal to 1
  len := pointer to last item in $digits
  back := pointer to last item in $index1
  IF there are no items in $index1, EXIT WITH empty list
  maxpitch := the greatest possible interval between 1-digits, given $len
  patterns := array with $len rows and $maxpitch columns, initially all zero
  result := array of triplets, initially empty
  FOR EACH possible $pitch FROM 1 TO $maxpitch:
    divisors := array of divisors of $pitch (including 1, excluding $pitch)
    UPDATE $back TO the last position at which a pattern could start;
    IF no such position remains, EXIT WITH result
    FOR EACH possible $startpos IN $index1 up to $back:
      IF $startpos is marked as part of a pattern, SKIP TO NEXT $startpos
      pos := possible positions of pattern members given $startpos, $pitch
      IF either the 2nd or 3rd $pos is not 1, SKIP TO NEXT $startpos
      repeats := the number of positions in $pos
      IF there are more than 3 positions in $pos THEN
        count how long the pattern continues
        UPDATE $repeats TO the length of the pattern
      END IF (more than 3 positions)
      FOR EACH possible $subpitch IN $divisors:
        check $patterns for pattern with interval $subpitch at $startpos
        IF such a pattern is found AND it envelopes the current pattern,
          SKIP TO NEXT $startpos
          (using helper variable $continue to cross two loop levels)
        END IF (pattern found)
      END FOR (subpitch)
      FOR EACH consecutive position IN the pattern:
        UPDATE $patterns at row of position and column of $pitch TO ...
        ... the remaining length of the pattern at that position
      END FOR (position)
      APPEND the triplet ($startpos, $pitch, $repeats) TO $result
    END FOR (startpos)
  END FOR (pitch)
  EXIT WITH $result
END PROCEDURE (str_groups)
Perhaps the following route will help:
Convert the string to a vector of integers:
v <- as.integer(strsplit(s, "")[[1]])
Repeatedly convert this vector to matrices of varying number of rows...
m <- matrix(v, nrow=...)
...and use rle to find relevant patterns in the rows of the matrix m:
rle(m[1, ]); rle(m[2, ]); ...
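The reshape-then-rle route can be sketched in Python as follows. This is an illustration rather than a port of the R calls: the helper names rle and rows_for_period are hypothetical, positions are 0-based, and each "row" is simply every p-th digit starting at offset k, so rows may differ in length when the string length is not a multiple of p.

```python
from itertools import groupby

def rle(seq):
    """Run-length encode seq as a list of (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(seq)]

def rows_for_period(s, p):
    """Digits of s at positions k, k+p, k+2p, ... for each offset k in 0..p-1."""
    digits = [int(ch) for ch in s]
    return [digits[k::p] for k in range(p)]

encoded = [rle(row) for row in rows_for_period("1101101101", 3)]
for k, runs in enumerate(encoded):
    print(k, runs)
```

For period 3, the row at offset 0 is a single run of four 1s, which corresponds to the first triple (1, 3, 4) mentioned in the earlier answer (start position 1, pitch 3, 4 repeats, in R's 1-based indexing).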
