Counting and listing motifs in Sage

The question was correctly answered in http://ask.sagemath.org/question/2612/motifs-and-subgraphs
I'm counting the number of 3-motifs (isomorphism classes of connected 3-node subgraphs) in a random directed network. There are 13 of them. For example, one is S1={1 -> 2, 2 -> 3} and another is S2={1 -> 2, 2 -> 3, 1 -> 3}: they are two distinct motifs, and I don't want to count an S1 when I actually find an S2. The problem is that S1 is contained in S2, so subgraph_search() finds an S1 inside each S2, and all related functions inherit the problem (wrong counts, wrong iterator...).
Any idea how to resolve this issue? The same thing would happen for 4-node motifs and so on... I could remove the occurrences of S2 from the graph after counting them, but that would be an awful trick (and dangerous if I also wanted to count 4-node motifs).
The code I used goes like this:
import numpy
M1 = DiGraph(numpy.array([[0,1,0],[0,0,1],[0,0,0]]))  # first motif
M5 = DiGraph(numpy.array([[0,1,1],[0,0,1],[0,0,0]]))  # second motif
g = digraphs.RandomDirectedGNP(20, 0.1)  # a random network
l1 = []
for p in g.subgraph_search_iterator(M1):  # search for the first motif
    l1.append(p)  # make a list of its occurrences
l5 = []
for p in g.subgraph_search_iterator(M5):  # the same for the second motif
    l5.append(p)

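The trick was to pass the option induced=True to subgraph_search() (and likewise to subgraph_search_iterator() and subgraph_search_count()), as correctly answered in http://ask.sagemath.org/question/2612/motifs-and-subgraphs . With induced=True, a copy of M1 only matches when the three nodes have no extra edge among them, so the S1 contained in each S2 is no longer reported.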
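To make "induced" concrete outside Sage, here is a minimal pure-Python sketch. It is not Sage's implementation, and the helper names canonical_form and count_induced_motif are hypothetical. It checks every node triple, takes all graph edges among those nodes (the induced subgraph), and compares it to the motif up to relabeling. Unlike Sage's iterator, which may report the same node set once per embedding, it counts each node set once.

```python
from itertools import combinations, permutations


def canonical_form(nodes, edges):
    """Smallest sorted edge tuple over all relabelings of `nodes` to 0..k-1."""
    best = None
    for perm in permutations(range(len(nodes))):
        label = dict(zip(nodes, perm))
        relabeled = tuple(sorted((label[u], label[v]) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def count_induced_motif(graph_edges, motif_edges):
    """Count node sets whose induced subgraph is isomorphic to the motif."""
    motif_nodes = sorted({x for edge in motif_edges for x in edge})
    target = canonical_form(motif_nodes, motif_edges)
    edge_set = set(graph_edges)
    graph_nodes = sorted({x for edge in graph_edges for x in edge})
    count = 0
    for subset in combinations(graph_nodes, len(motif_nodes)):
        # induced subgraph: every graph edge between the chosen nodes
        induced = [(u, v) for u in subset for v in subset if (u, v) in edge_set]
        if len(induced) == len(motif_edges) and canonical_form(list(subset), induced) == target:
            count += 1
    return count


S1 = [(1, 2), (2, 3)]          # 1 -> 2 -> 3
S2 = [(1, 2), (2, 3), (1, 3)]  # S1 plus the shortcut 1 -> 3
print(count_induced_motif(S2, S1))  # 0: the S1 inside S2 is not induced
print(count_induced_motif(S2, S2))  # 1
```

Brute force over all triples is only practical for small graphs; for a 20-node network like the one above it is still cheap (1140 triples).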

Related

How to arrange 5 M, 5 S and 5 T such that M and T are not adjacent and string starts with M and ends with T

Problem: 5 monkeys, 5 snakes and 5 tigers are standing in a line in a grocery store, with animals of the same species being indistinguishable. A monkey stands at the front of the line, and a tiger stands at the end of the line. Unfortunately, tigers and monkeys are sworn enemies, so monkeys and tigers cannot stand in adjacent places in line. Compute the number of possible arrangements of the line.
Solving this problem by hand is daunting. I want to write a program that outputs the possible arrangements and also counts them. My first thought was to use brute force. Monkeys, snakes, and tigers can be represented by the letters M, S, and T respectively. With one M at the start of the string and one T at the end, the 13 positions in between hold 4 M, 5 S and 4 T, so there are 13!/(4!4!5!) = 90,090 possibilities. I would then remove the arrangements that violate the adjacency condition.
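The brute-force idea above can be sketched in Python with itertools. Instead of generating all 13! orderings (most of them duplicates), it chooses positions for the remaining monkeys and then for the remaining tigers, which enumerates exactly the 90,090 distinct strings; every string containing "MT" or "TM" is discarded. The function name brute_force_count is a hypothetical helper, not from the original posts.

```python
from itertools import combinations


def brute_force_count(n_monkeys, n_snakes, n_tigers):
    """Count lines that start with M, end with T and never have M next to T."""
    inner = n_monkeys + n_snakes + n_tigers - 2  # positions between the fixed ends
    count = 0
    # choose positions for the remaining monkeys, then for the remaining tigers;
    # every position left over holds a snake
    for m_pos in combinations(range(inner), n_monkeys - 1):
        free = [i for i in range(inner) if i not in m_pos]
        for t_pos in combinations(free, n_tigers - 1):
            line = ["S"] * inner
            for i in m_pos:
                line[i] = "M"
            for i in t_pos:
                line[i] = "T"
            arrangement = "M" + "".join(line) + "T"
            if "MT" not in arrangement and "TM" not in arrangement:
                count += 1
    return count


print(brute_force_count(5, 5, 5))  # 1251
print(brute_force_count(2, 2, 2))  # 3
```

This is fine for 90,090 candidates, but the number of strings grows quickly with the counts, which is why a counting argument scales better.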
My second thought was to first compute the number of arrangements where M and T are adjacent and then subtract this number from 90,090. I am new to programming so I am not sure how to do this.
Is there a better way to approach these types of problems? Any hints?
Thank you.
TL;DR: Python solution using sympy
import sympy  # sympy.ntheory.multinomial_coefficients
import math  # math.comb
def count_monkeytigers(n_monkeys, n_snakes, n_tigers):
    return sum(
        m * math.comb(n_monkeys - 1, mb_minus1) * math.comb(n_tigers - 1, tb_minus1)
        for (mb_minus1, eb, tb_minus1), m in
        sympy.ntheory.multinomial_coefficients(3, n_snakes - 1).items()
    )
Explanation
We already know that there is an M at the beginning, a T at the end, and five S in the string:
M?? S ?? S ?? S ?? S ?? S ??T
Since M and T cannot be adjacent, and the only way to separate them is with an S, you can think of the S as separators: the five S cut the string into 6 "bins". Every bin is either empty, or contains only M (one or more), or contains only T (one or more). Furthermore, the first bin contains at least one M, and the last bin contains at least one T.
To count all permutations of the string, we can do the following:
Loop over the triplets (monkey_bins, empty_bins, tiger_bins) deciding how many bins have monkeys, are empty, or have tigers;
For the loop, we can use bounds 1 <= monkey_bins <= 5; 0 <= empty_bins <= 5 - monkey_bins; tiger_bins = 6 - monkey_bins - empty_bins;
Count the number m of ways to assign these types to the 4 inner bins (the first bin is always a monkey bin and the last is always a tiger bin), i.e. the multinomial coefficient of (monkey_bins - 1, empty_bins, tiger_bins - 1);
Count the number monkey_partitions of ways to place n_monkeys 'M' into monkey_bins bins with at least one M per bin (stars and bars, theorem 1): C(n_monkeys - 1, monkey_bins - 1);
Count the number tiger_partitions of ways to place n_tigers 'T' into tiger_bins bins with at least one T per bin (stars and bars, theorem 1): C(n_tigers - 1, tiger_bins - 1);
Add m * monkey_partitions * tiger_partitions to the count.
Python code with loops
import math
def multinomial(*params):
    return math.prod(math.comb(sum(params[:i]), x) for i, x in enumerate(params, 1))
def count_monkeytigers(n_monkeys, n_snakes, n_tigers):
    result = 0
    for monkey_bins in range(1, n_snakes + 1):
        for empty_bins in range(0, n_snakes + 1 - monkey_bins):
            tiger_bins = n_snakes + 1 - monkey_bins - empty_bins
            # number of arrangements of the 3 kinds of inner bins (first and last bins are fixed)
            m = multinomial(monkey_bins - 1, empty_bins, tiger_bins - 1)
            monkey_partitions = math.comb(n_monkeys - 1, monkey_bins - 1)
            tiger_partitions = math.comb(n_tigers - 1, tiger_bins - 1)
            result += m * monkey_partitions * tiger_partitions
    return result
print(count_monkeytigers(5, 5, 5))
# 1251
print(count_monkeytigers(2,2,2))
# 3
# = len(['MMSSTT', 'MSMSTT', 'MMSTST'])
The code for multinomial comes from this question:
Does python have a function which computes multinomial coefficients?
Note that we're only using a "trinomial" coefficient here, so you can replace the function multinomial with this simpler one if you want:
def trinomial(k1, k2, k3):
    return math.comb(k1 + k2 + k3, k1) * math.comb(k2 + k3, k2)
Python code using sympy
In the previous Python code, we're manually looping over the possible triplets (monkey_bins, empty_bins, tiger_bins) and computing the corresponding coefficients. As it turns out, sympy.ntheory.multinomial_coefficients(m, n) returns a dictionary whose keys are all m-tuples of nonnegative integers summing to n and whose values are the corresponding multinomial coefficients, which is exactly what we need, up to a shift by one explained below!
We can use that to shorten our code:
import sympy  # sympy.ntheory.multinomial_coefficients
import math  # math.comb
def count_monkeytigers(n_monkeys, n_snakes, n_tigers):
    return sum(
        m * math.comb(n_monkeys - 1, mb_minus1) * math.comb(n_tigers - 1, tb_minus1)
        for (mb_minus1, eb, tb_minus1), m in
        sympy.ntheory.multinomial_coefficients(3, n_snakes - 1).items()
    )
print(count_monkeytigers(5, 5, 5))
# 1251
print(count_monkeytigers(2,2,2))
# 3
# = len(['MMSSTT', 'MSMSTT', 'MMSTST'])
Note that the dictionary multinomial_coefficients(3, n) contains all triplets of nonnegative numbers summing to n, including those where the middle element is n and the other two are 0. But we want at least one bin with monkeys and at least one bin with tigers; hence I called the triplet (mb_minus1, eb, tb_minus1) rather than (mb, eb, tb), and accordingly used n_snakes - 1 rather than n_snakes + 1 as the sum of the triplet: subtracting 1 from both the monkey and the tiger bin counts lowers the total of n_snakes + 1 bins by 2.
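If sympy is not available, the dictionary that multinomial_coefficients(3, n) returns can be rebuilt with the standard library alone. The sketch below assumes Python 3.8+ for math.comb; trinomial_coefficients is a hypothetical helper name. It uses n!/(a! b! c!) = C(n, a) * C(n - a, b) and plugs the result into the same one-line sum as above.

```python
import math


def trinomial_coefficients(n):
    """Map each (a, b, c) of non-negative ints with a + b + c == n to n!/(a! b! c!)."""
    coeffs = {}
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            coeffs[(a, b, c)] = math.comb(n, a) * math.comb(n - a, b)
    return coeffs


def count_monkeytigers(n_monkeys, n_snakes, n_tigers):
    return sum(
        m * math.comb(n_monkeys - 1, mb_minus1) * math.comb(n_tigers - 1, tb_minus1)
        for (mb_minus1, eb, tb_minus1), m in trinomial_coefficients(n_snakes - 1).items()
    )


print(count_monkeytigers(5, 5, 5))  # 1251
print(count_monkeytigers(2, 2, 2))  # 3
```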
Before writing any code, solve the problem on paper until you have an expression in factorials; turning factorials into code is then easy.
First, fix 1 monkey at the front and 1 tiger at the end.
Then place the remaining tigers, then place snakes next to the tigers (at least one snake must be adjacent to a tiger to keep it away from the monkeys), and finally place the monkeys next to the snakes.

The Eight-Queen Puzzle in Programming in Lua Fourth Edition

I'm currently reading Programming in Lua Fourth Edition and I'm already stuck on the first exercise of "Chapter 2. Interlude: The Eight-Queen Puzzle."
The example code is as follows:
N = 8 -- board size
-- check whether position (n, c) is free from attacks
function isplaceok (a, n, c)
  for i = 1, n - 1 do -- for each queen already placed
    if (a[i] == c) or -- same column?
       (a[i] - i == c - n) or -- same diagonal?
       (a[i] + i == c + n) then -- same diagonal?
      return false -- place can be attacked
    end
  end
  return true -- no attacks; place is OK
end
-- print a board
function printsolution (a)
  for i = 1, N do -- for each row
    for j = 1, N do -- and for each column
      -- write "X" or "-" plus a space
      io.write(a[i] == j and "X" or "-", " ")
    end
    io.write("\n")
  end
  io.write("\n")
end
-- add to board 'a' all queens from 'n' to 'N'
function addqueen (a, n)
  if n > N then -- all queens have been placed?
    printsolution(a)
  else -- try to place n-th queen
    for c = 1, N do
      if isplaceok(a, n, c) then
        a[n] = c -- place n-th queen at column 'c'
        addqueen(a, n + 1)
      end
    end
  end
end
-- run the program
addqueen({}, 1)
The code is well commented and the book is quite explicit, but I can't answer the first question:
Exercise 2.1: Modify the eight-queen program so that it stops after printing the first solution.
At the end of this program, a contains all possible solutions; I can't figure out whether addqueen(a, n) should be modified so that a contains only one possible solution, or whether printsolution(a) should be modified so that it only prints the first possible solution.
Even though I'm not sure I fully understand backtracking, I tried to implement both hypotheses without success, so any help would be much appreciated.
At the end of this program, a contains all possible solutions
As far as I understand the solution, a never contains all possible solutions; it holds either one complete solution or one incomplete/incorrect one that the algorithm is still working on. The algorithm simply enumerates candidate placements, discarding those that generate conflicts as early as possible (for example, if the first and second queens attack each other, the second queen is moved to the next column without trying any positions for the remaining queens, since they could not complete a solution anyway).
So, to stop after printing the first solution, you can simply add os.exit() after the printsolution(a) line.
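os.exit() terminates the whole process. As a contrasting technique (not from the book, and written in Python rather than Lua), the search can be made lazy so that the caller decides how many solutions to take; Lua coroutines allow a similar design. The sketch below mirrors isplaceok and addqueen with 0-based rows and columns; queens, is_place_ok and add_queen are hypothetical names.

```python
def queens(n_size):
    """Yield every solution as a list of 1-based columns, one per row,
    trying columns in the same order as the book's addqueen."""
    a = [0] * n_size

    def is_place_ok(n, c):
        # same checks as isplaceok: same column or same diagonal
        return all(a[i] != c and a[i] - i != c - n and a[i] + i != c + n
                   for i in range(n))

    def add_queen(n):
        if n == n_size:
            yield [col + 1 for col in a]
            return
        for c in range(n_size):
            if is_place_ok(n, c):
                a[n] = c
                yield from add_queen(n + 1)

    yield from add_queen(0)


first = next(queens(8))  # the search is suspended after the first solution
print(first)  # [1, 5, 8, 6, 3, 7, 2, 4]
```

Because the generator is suspended rather than exhausted, no further boards are explored unless you ask for them, and the program keeps running afterwards.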
Listing 1 is an alternative way to implement the requirement. The three lines marked (1), (2), and (3) are the modifications to the original implementation from the book, as listed in the question. With these modifications, if the function returns true, a solution was found and a contains it.
-- Listing 1
function addqueen (a, n)
  if n > N then -- all queens have been placed?
    return true -- (1)
  else -- try to place n-th queen
    for c = 1, N do
      if isplaceok(a, n, c) then
        a[n] = c -- place n-th queen at column 'c'
        if addqueen(a, n + 1) then return true end -- (2)
      end
    end
    return false -- (3)
  end
end
-- run the program
a = {1}
if not addqueen(a, 2) then print("failed") end
printsolution(a)
a = {1, 4}
if not addqueen(a, 3) then print("failed") end
printsolution(a)
Let me start from Exercise 2.2 in the book, which, in my experience explaining backtracking algorithms to other people, helps in understanding both the original implementation and my modifications.
Exercise 2.2 asks you to generate all possible permutations first and then check each one. A straightforward, intuitive solution is Listing 2, which uses nested for-loops to generate every arrangement (strictly speaking all N^N column assignments, since columns may repeat) and validates them one by one in the innermost loop. Although it fulfills the requirement of Exercise 2.2, the code looks awkward, and it is hard-coded for an 8x8 board.
-- Listing 2
local function allsolutions (a)
  -- generate all possible permutations
  for c1 = 1, N do
    a[1] = c1
    for c2 = 1, N do
      a[2] = c2
      for c3 = 1, N do
        a[3] = c3
        for c4 = 1, N do
          a[4] = c4
          for c5 = 1, N do
            a[5] = c5
            for c6 = 1, N do
              a[6] = c6
              for c7 = 1, N do
                a[7] = c7
                for c8 = 1, N do
                  a[8] = c8
                  -- validate the permutation
                  local valid
                  for r = 2, N do -- start from 2nd row
                    valid = isplaceok(a, r, a[r])
                    if not valid then break end
                  end
                  if valid then printsolution(a) end
                end
              end
            end
          end
        end
      end
    end
  end
end
-- run the program
allsolutions({})
Listing 3 is equivalent to Listing 2 when N = 8. The for-loop in the else-end block does what all the nested for-loops in Listing 2 do. Using a recursive call makes the code not only compact but also flexible: it can solve an NxN board and a board with pre-set rows. However, recursive calls sometimes cause confusion; I hope the code in Listing 2 helps.
-- Listing 3
local function addqueen (a, n)
  n = n or 1
  if n > N then
    -- verify the permutation
    local valid
    for r = 2, N do -- start from 2nd row
      valid = isplaceok(a, r, a[r])
      if not valid then break end
    end
    if valid then printsolution(a) end
  else
    -- generate all possible permutations
    for c = 1, N do
      a[n] = c
      addqueen(a, n + 1)
    end
  end
end
-- run the program
addqueen({}) -- empty board, equivalent to allsolutions({})
addqueen({1}, 2) -- a queen in 1st row and 1st column
Comparing the code in Listing 3 with the original implementation, the difference is that Listing 3 validates only after all eight queens are placed on the board, while the original implementation validates every time a queen is added and does not go on to the next row if the newly added queen causes a conflict. That is what backtracking is about: it performs a brute-force search, but it abandons a search branch as soon as it finds a node that cannot lead to a solution, and it has to reach a leaf of the search tree to confirm a valid solution.
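The difference can be measured. The Python sketch below (a translation for illustration, with hypothetical names generate_and_test and backtracking, and 0-based indices) counts the full boards that the generate-and-test approach of Listings 2 and 3 builds and the queen placements that the backtracking approach of the original addqueen accepts. Both find the same solutions, but backtracking prunes most of the tree.

```python
from itertools import product


def is_place_ok(a, n, c):
    """True if a queen at row n, column c is not attacked by rows 0..n-1."""
    return all(a[i] != c and a[i] - i != c - n and a[i] + i != c + n
               for i in range(n))


def generate_and_test(size):
    """Listings 2/3 style: build every full board first, validate afterwards."""
    solutions = boards = 0
    for a in product(range(size), repeat=size):
        boards += 1
        if all(is_place_ok(a, r, a[r]) for r in range(1, size)):
            solutions += 1
    return solutions, boards


def backtracking(size):
    """Original addqueen style: validate each queen as soon as it is placed."""
    a = [0] * size
    solutions = placements = 0

    def add_queen(n):
        nonlocal solutions, placements
        if n == size:
            solutions += 1
            return
        for c in range(size):
            if is_place_ok(a, n, c):
                placements += 1
                a[n] = c
                add_queen(n + 1)

    add_queen(0)
    return solutions, placements


print(generate_and_test(6))  # (4, 46656): all 6**6 boards are built
print(backtracking(6))       # same 4 solutions, far fewer placements
```

N = 6 keeps the generate-and-test run short in Python; for N = 8 it would build 8**8 = 16,777,216 boards.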
Back to the modifications in Listing 1.
(1) When execution reaches this point, the search has reached a leaf of the search tree and a valid solution has been found, so the function returns true to signal success.
(2) This is where the function stops searching further. In the original implementation, the for-loop continues regardless of the result of the recursive call. With modification (1) in place, the recursive call returns true if a solution was found; in that case the function stops and propagates the success signal back to its caller. Otherwise, it continues the for-loop, trying the remaining columns.
(3) The function reaches this point only after the for-loop has finished. With modifications (1) and (2) in place, that means no solution was found, so it explicitly returns false to signal failure.

Generate Unique Combinations of Integers

I am looking for help with pseudocode (unless you use Game Maker 8.0 by Mark Overmars and know the GML equivalent of what I need) for generating a list/array of unique combinations of a set of X integers, where X is variable. It can be 1-5 or 1-1000.
For example:
IntegerList{1,2,3,4}
1,2
1,3
1,4
2,3
2,4
3,4
I feel like the math behind this is simple; I just can't seem to wrap my head around it after checking multiple sources on how to do it in languages such as C++ and Java. Thanks, everyone.
As there are not many details in the question, I assume:
Your input is a natural number n and the resulting array contains all natural numbers from 1 to n.
The expected output given by the combinations above resembles a symmetric relation, i.e. in your case [1, 2] is considered the same as [2, 1].
Combinations [x, x] are excluded.
There are only combinations with 2 elements.
There is no List<> datatype or dynamic array, so the array length has to be known before creating the array.
The number of elements in your result is therefore the binomial coefficient m = C(n, 2) = n! / (2! * (n - 2)!) = n * (n - 1) / 2 (which is 4! / (2! * (4 - 2)!) = 24 / 4 = 6 in your example), with ! being the factorial.
First, initializing the array with the first n natural numbers should be quite easy using the array element index. However, the index is a property of the array elements, so you don't need to initialize them in the first place.
You need 2 nested loops processing the array. The outer loop ranges i from 1 to n - 1, the inner loop ranges j from i + 1 to n, so that every pair appears exactly once and [x, x] is never produced. If your indexes start from 0 instead of 1, adjust the loop limits accordingly. Now you only need to fill your target array with the combinations [i, j]. To find the correct index in your target array, use a third counter variable, initialized with the first index and incremented at the end of each inner-loop iteration.
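The loop structure described above can be sketched in Python. To mimic a language without dynamic lists, the result array is preallocated with exactly C(n, 2) slots and filled through a separate counter k; the function name pairs is a hypothetical choice.

```python
def pairs(n):
    """All pairs (i, j) with 1 <= i < j <= n, stored in a preallocated array."""
    m = n * (n - 1) // 2            # binomial coefficient C(n, 2)
    result = [None] * m             # fixed-size array, length known in advance
    k = 0                           # third counter: next free slot in result
    for i in range(1, n):           # outer loop: i = 1 .. n-1
        for j in range(i + 1, n + 1):  # inner loop: j = i+1 .. n
            result[k] = (i, j)
            k += 1                  # advance after each stored pair
    return result


print(pairs(4))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

For n = 1000 this produces 499,500 pairs, so the array size grows quadratically with n.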
I agree, the math behind is not that hard and I think this explanation should suffice to develop the corresponding code yourself.

Maximize the number of isolated nodes in a network

I would like to know which node(s) I should delete to maximize the number of isolated nodes in my undirected network.
For instance in the following R script, I would like the result to be H if I delete 1 node and H & U if I delete 2 nodes and so on ...
library(igraph)
graph <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
G-H-I,
K-L-M-N-K, O-K:L:M:N,
P-Q-R-S-P,
C-I, L-T, O-T, M-S,
C-P, C-L, I-U-V,V-H,U-H,H-W)
plot(graph)
Thanks for your help.
You will want to do something like:
Compute the k-coreness of each node (called Graph.coreness() in the Python bindings; the R package has coreness()).
Find the node with coreness 2 that connects to the largest number of nodes with coreness 1.
Edit:
Your counter-example was spot on, so I resorted to brute force (which is still linear time in this case).
This is a brute force python implementation that could be optimised (only loop over nodes with k-coreness 1), but it completes in linear time and should be accessible even if you don't know python.
import numpy as np
import igraph
def maximise_damage(graph):
    coreness = graph.coreness()
    # find number of leaves for each node
    n = graph.vcount()
    number_of_leaves = np.zeros(n)
    for ii in range(n):
        # a leaf has coreness 1 and exactly one neighbour
        if coreness[ii] == 1 and graph.degree(ii) == 1:
            neighbour = graph.neighbors(ii)  # list of length 1
            number_of_leaves[neighbour] += 1
    # rank nodes by number of leaves
    order = np.argsort(number_of_leaves)
    # reverse order such that the first element has the most leaves
    order = order[::-1]
    return order, number_of_leaves[order]
EDIT 2:
Just realised this will not work in general for cases where you want to delete more than 1 node at a time. But I think the general approach would still work -- I will think about it some more.
EDIT 3:
Here we go; still linear. You will need to post-process the output a little, though: some solutions contain fewer nodes than the number of nodes you want to delete, and you then have to combine them.
import numpy as np
import igraph
def maximise_damage(graph, delete=1):
    # get vulnerability:
    # a node is vulnerable if its degree is at most
    # the number of nodes that we want to delete
    vulnerability = np.array(graph.degree())
    # create a hash table to keep track of all combinations of nodes to delete
    combinations = dict()
    # loop over vulnerable nodes (skip nodes that are already isolated)
    for ii in np.where((vulnerability >= 1) & (vulnerability <= delete))[0]:
        # find the neighbours of the vulnerable node and
        # count the number of vulnerable nodes for that combination
        neighbours = tuple(sorted(graph.neighbors(int(ii))))
        if neighbours in combinations:
            combinations[neighbours] += 1
        else:
            combinations[neighbours] = 1
    # rank combinations by the number of vulnerable nodes dangling from them
    ranked = sorted(combinations.items(), key=lambda item: item[1], reverse=True)
    combinations = [nodes for nodes, _ in ranked]
    counts = [count for _, count in ranked]
    # TODO:
    # some solutions will contain fewer nodes than the number of nodes that we want to delete;
    # combine these solutions
    return combinations, counts
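To check heuristics like the ones above on small graphs, an exhaustive search is a useful reference. The sketch below is plain Python without igraph (an assumption made to keep it self-contained), with hypothetical helper names; it tries every set of k nodes, so it is exponential in k and only practical for small examples. The edge list is the example graph from the question, and nodes without any edges are not represented.

```python
from itertools import combinations

# Edges of the example graph from the question's make_graph() call.
EXAMPLE_EDGES = ("A-B B-C C-D D-A E-A E-B E-C E-D G-H H-I "
                 "K-L L-M M-N N-K O-K O-L O-M O-N P-Q Q-R R-S S-P "
                 "C-I L-T O-T M-S C-P C-L I-U U-V V-H U-H H-W").split()


def build_adjacency(edge_strings):
    """Undirected adjacency sets from strings like 'A-B'."""
    adj = {}
    for e in edge_strings:
        u, v = e.split("-")
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_isolated(adj, removed):
    """Surviving nodes whose neighbours have all been removed."""
    return sum(1 for node, nbrs in adj.items()
               if node not in removed and nbrs <= removed)


def best_deletion(adj, k):
    """Try every set of k nodes; return (sorted node list, isolated count)."""
    best_nodes, best_count = None, -1
    for candidate in combinations(sorted(adj), k):
        isolated = count_isolated(adj, set(candidate))
        if isolated > best_count:
            best_nodes, best_count = list(candidate), isolated
    return best_nodes, best_count


adj = build_adjacency(EXAMPLE_EDGES)
print(best_deletion(adj, 1))  # (['H'], 2)
print(best_deletion(adj, 2))  # (['H', 'U'], 3)
```

This reproduces the expected answers from the question (H for one deletion, H and U for two) and can serve as ground truth when testing a faster method.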

How do you find the first index of multiple 'motifs' in a sequence?

I'm learning Julia, but have relatively little programming experience outside of R. I'm taking this problem directly from rosalind.info, where you can find more detail.
I'm given two strings, a motif and a sequence, where the motif is a substring of the sequence, and I'm tasked with finding the starting index of every occurrence of the motif in the sequence.
For example:
Sequence: "GATATATGCATATACTT"
Motif: "ATAT"
ATAT is found three times, once beginning at index 2, once at index 4, and once at index 10. This is assuming 1-based indexing. So the final output would be: 2 4 10
Here's what I have so far:
f = open("motifs.txt")
stream = readlines(f)
sequence = chomp(stream[1])
motif = chomp(stream[2])
println("Sequence: $sequence")
println("Motif: $motif")
result = searchindex(sequence, motif)
println("$result")
close(f)
My main problem seems to be that there isn't a searchindexall function. The current script gives me only the index of the first occurrence of the motif (index 2). I've tried a variety of for loops without much success, so I'm hoping someone can give me some insight on this.
Here is one solution with a while loop:
sequence = "GATATATGCATATACTT"
motif = "ATAT"
function find_indices(sequence, motif)
    # initialise empty array of integers
    found_indices = Array{Int, 1}()
    # position at which the next search starts
    start_at = 1
    while true
        # search for the next occurrence of motif, starting at start_at;
        # searchindex returns the absolute index of the match, or 0 if none
        result = searchindex(sequence, motif, start_at)
        # if motif not found, terminate while loop
        result == 0 && break
        # add new index to results
        push!(found_indices, result)
        # restart one position after this match so overlapping matches are found
        start_at = result + 1
    end
    return found_indices
end
This gives what you want:
julia> find_indices(sequence, motif)
2
4
10
If performance is not so important, a regular expression can be a good choice.
julia> map(x->x.offset, eachmatch(r"ATAT", "GATATATGCATATACTT", true))
3-element Array{Any,1}:
2
4
10
Note that the third argument of eachmatch is overlap; don't forget to set it to true, otherwise overlapping matches (such as the one starting at index 4) are skipped.
If better performance is required, consider implementing a string-search algorithm such as KMP (Knuth-Morris-Pratt).
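As a language-neutral illustration of that suggestion (written in Python, not Julia), here is a sketch of Knuth-Morris-Pratt that reports every start position, including overlapping ones, using 1-based indices like the Rosalind problem. The function names are hypothetical. After a full match it falls back via the prefix table instead of restarting, which is what keeps the scan linear.

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def find_all_kmp(text, pattern):
    """1-based start positions of all (possibly overlapping) occurrences."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    positions = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 2)  # convert 0-based end to 1-based start
            k = pi[k - 1]                # keep partial match to allow overlaps
    return positions


print(find_all_kmp("GATATATGCATATACTT", "ATAT"))  # [2, 4, 10]
```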
