How is the number of random walks determined in GDS/Neo4j? - graph

I am running the random walk algorithm on my Neo4j graph named 'example', with the minimum allowed walk length (2) and walks per node (1). Namely,
CALL gds.beta.randomWalk.stream(
'example',
{
walkLength: 2,
walksPerNode: 1,
randomSeed: 42,
concurrency: 1
}
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name ] AS event_name
And I get 41 walks. How is this number determined? I checked the graph and it contains 161 nodes and 574 edges. Any insights?
Added later: Here is more info on the projected graph that I am constructing. Basically, I am filtering on nodes and relationships and just projecting the subgraph and doing nothing else. Here is the code -
// Filter for only IDH Codel recurrent events
WITH [path=(m:IDHcodel)--(n:Tissue)
WHERE (m.node_category = 'molecular' AND n.event_class = 'Recurrence')
AND NOT EXISTS((m)--(:Tissue{event_class:'Primary'})) | m] AS recur_events
// Obtain the sub-network with 2 or more patients in edges
MATCH p=(m1)-[r:hasIDHcodelPatients]-(m2)
WHERE (m1 IN recur_events AND m2 IN recur_events AND r.total_common_patients >= 2)
WITH COLLECT(p) AS all_paths
WITH [p IN all_paths | nodes(p)] AS path_nodes, [p IN all_paths | relationships(p)] AS path_rels
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Form the GDS Cypher projection
CALL gds.graph.create.cypher(
'example',
'MATCH (n) where n in $sn RETURN id(n) as id',
'MATCH ()-[r]-() where r in $sr RETURN id(startNode(r)) as source , id(endNode(r)) as target, { LINKS: { orientation: "UNDIRECTED" } }',
{parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
)
YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph, nodes, rels
Thanks.

The documentation appears to be missing a description of the sourceNodes parameter, which determines how many walks are created.
The default value isn't documented there, but you can set the parameter explicitly to choose the nodes that the walks should start from.
For example, you could treat every node in the graph as a source node, so that a random walk starts from each of them.
MATCH (n)
WITH collect(n) AS nodes
CALL gds.beta.randomWalk.stream(
'example',
{ sourceNodes:nodes,
walkLength: 2,
walksPerNode: 1,
randomSeed: 42,
concurrency: 1
}
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name ] AS event_name
This way you should get 161 walks as there are 161 nodes in your graph and the walksPerNode is set to 1, so a single random walk will start from every node in the graph. In essence, the number of source nodes times the walks per node will determine the number of random walks.
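To make the counting rule concrete, here is a minimal pure-Python sketch. It is not the GDS implementation: the function name random_walks and the toy adjacency dict are made up for illustration, and how GDS itself handles nodes without outgoing relationships is not modelled (the sketch simply ends such a walk early). It only shows that the number of walks equals the number of source nodes times the walks per node.

```python
import random


def random_walks(adjacency, source_nodes, walks_per_node, walk_length, seed=42):
    """Start `walks_per_node` walks from every source node (illustrative only)."""
    rng = random.Random(seed)
    walks = []
    for source in source_nodes:
        for _ in range(walks_per_node):
            walk = [source]
            # walk_length counts nodes here, so a length-2 walk has one step
            while len(walk) < walk_length:
                neighbours = adjacency.get(walk[-1], [])
                if not neighbours:
                    break  # dead end: this sketch just stops the walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy graph: node "d" has no outgoing edges
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
walks = random_walks(adjacency, source_nodes=list(adjacency),
                     walks_per_node=1, walk_length=2)
print(len(walks))  # one walk per source node
```

With all four nodes as sources and one walk per node, the sketch produces four walks; passing only a subset of nodes as source_nodes reduces the count accordingly.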

Related

Creating a subgraph using Cypher projection

I am trying to create a subgraph of my graph using Cypher projection because I want to use the GDS library. First, I am creating a subgraph using Cypher query which works perfectly fine. Here is the query:
// Filter for only recurrent events
WITH [path=(m:IDHcodel)--(n:Tissue)
WHERE (m.node_category = 'molecular' AND n.event_class = 'Recurrence')
AND NOT EXISTS((m)--(:Tissue{event_class:'Primary'})) | m] AS recur_events
// Obtain the sub-network with 2 or more patients in edges
MATCH p=(m1)-[r:hasIDHcodelPatients]->(m2)
WHERE (m1 IN recur_events AND m2 IN recur_events AND r.total_common_patients >= 2)
WITH COLLECT(p) AS all_paths
WITH [p IN all_paths | nodes(p)] AS path_nodes, [p IN all_paths | relationships(p)] AS path_rels
RETURN apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
So far so good. Now all I am trying to do is a Cypher projection by sending the subgraph nodes and subgraph rels as parameters in the GDS create query and this gives me a null pointer exception:
// All the above lines, except using WITH instead of RETURN in the last line, i.e.,
...
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Call gds library to create a graph by sending subgraph_nodes and subgraph_rels as parameters
CALL gds.graph.create.cypher(
'example',
'MATCH (n) where n in $sn RETURN id(n) as id',
'MATCH ()-[r]-() where r in $sr RETURN r.start as source , r.end as target',
{parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
) YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph
What could be wrong? Thanks.
To access the start and end node of a relationship, you need a slightly different syntax from the one you are using:
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Call gds library to create a graph by sending subgraph_nodes and subgraph_rels as parameters
CALL gds.graph.create.cypher(
'example',
'MATCH (n) where n in $sn RETURN id(n) as id',
'MATCH ()-[r]-() where r in $sr RETURN id(startNode(r)) as source , id(endNode(r)) as target',
{parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
) YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph
This is what I noticed, hopefully this is the only error.

All path *lengths* from source to target in Directed Acyclic Graph

I have a graph with an adjacency matrix shape (adj_mat.shape = (4000, 4000)). My current problem involves finding the list of path lengths (the sequence of nodes is not so important) that traverses from the source (row = 0 ) to the target (col = trans_mat.shape[0] -1).
I am not interested in finding the path sequences; I am only interested in propagating the path length. As a result, this is different from finding all simple paths - which would be too slow (ie. find all paths from source to target; then score each path). Is there a performant way to do this quickly?
DFS is suggested as one possible strategy (noted here). My current implementation (below) is simply not optimal:
import networkx as nx
import numpy as np

# create graph (on NetworkX >= 3.0 use nx.from_numpy_array instead)
G = nx.from_numpy_matrix(adj_mat, create_using=nx.DiGraph())
# initialize nodes
for node in G.nodes:
    G.nodes[node]['cprob'] = []
# set starting node value
G.nodes[0]['cprob'] = [0]
def propagate_prob(G, node):
    # find incoming edges to node
    predecessors = list(G.predecessors(node))
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = G.get_edge_data(prev_node, node)['weight']
        # get predecessor node value
        if len(G.nodes[prev_node]['cprob']) == 0:
            G.nodes[prev_node]['cprob'] = propagate_prob(G, prev_node)
        prev_node_arr = G.nodes[prev_node]['cprob']
        # add incoming edge weight to prev_node arr
        curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(prev_node_arr)])
    # update current node array
    G.nodes[node]['cprob'] = curr_node_arr
    return G.nodes[node]['cprob']
# calculate all path lengths from source to sink (the last node index)
part_func = propagate_prob(G, adj_mat.shape[0] - 1)
I don't have a large example at hand (e.g. >300 nodes), but I found a non-recursive solution:
import networkx as nx
g = nx.DiGraph()
nx.add_path(g, range(7))
g.add_edge(0, 3)
g.add_edge(0, 5)
g.add_edge(1, 4)
g.add_edge(3, 6)
# first step retrieve topological sorting
sorted_nodes = nx.algorithms.topological_sort(g)
start = 0
target = 6
path_lengths = {start: [0]}
for node in sorted_nodes:
    if node == target:
        print(path_lengths[node])
        break
    if node not in path_lengths or g.out_degree(node) == 0:
        continue
    new_path_length = path_lengths[node]
    new_path_length = [i + 1 for i in new_path_length]
    for successor in g.successors(node):
        if successor in path_lengths:
            path_lengths[successor].extend(new_path_length)
        else:
            path_lengths[successor] = new_path_length.copy()
    if node != target:
        del path_lengths[node]
Output: [2, 4, 2, 4, 4, 6]
If you are only interested in the number of paths with different length, e.g. {2:2, 4:3, 6:1} for above example, you could even reduce the lists to dicts.
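As a rough sketch of that dict variant (standard library only; the function name path_length_counts and the edge-list input are my own choices, not part of the code above), each node can carry a Counter mapping path length to number of paths instead of a full list. It assumes an unweighted DAG; for weighted edges you would add the edge weight instead of 1.

```python
from collections import Counter, deque


def path_length_counts(edges, start, target):
    """Count start->target paths in a DAG, grouped by number of edges."""
    successors, indegree = {}, {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
        successors.setdefault(v, [])
        indegree[v] = indegree.get(v, 0) + 1
        indegree.setdefault(u, 0)

    # Kahn's algorithm for a topological order
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    counts = {start: Counter({0: 1})}  # one path of length 0 to the start
    for node in order:
        if node not in counts:
            continue  # not reachable from start
        if node == target:
            break  # all predecessors of target were already processed
        for nxt in successors[node]:
            bucket = counts.setdefault(nxt, Counter())
            for length, n_paths in counts[node].items():
                bucket[length + 1] += n_paths
    return dict(counts.get(target, Counter()))


# Same example graph as above: the path 0..6 plus four shortcut edges
edges = [(i, i + 1) for i in range(6)] + [(0, 3), (0, 5), (1, 4), (3, 6)]
result = path_length_counts(edges, 0, 6)
print(result)  # contains 2: 2, 4: 3, 6: 1
```

Memory then grows with the number of distinct path lengths per node rather than with the number of paths.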
Background
Some explanation what I'm doing (and I hope works for larger examples as well). First step is to retrieve the topological sorting. Why? Then I know in which "direction" the edges flow and I can simply process the nodes in that order without "missing any edge" or any "backtracking" like in a recursive variant. Afterwards, I initialise the start node with a list containing the current path length ([0]). This list is copied to all successors, while updating the path length (all elements +1). The goal is that in each iteration the path length from the starting node to all processed nodes is calculated and stored in the dict path_lengths. The loop stops after reaching the target-node.
With igraph I can calculate up to 300 nodes in ~1 second. I also found that accessing the adjacency matrix itself (rather than calling igraph functions to retrieve edges/vertices) saves time. The two key bottlenecks are 1) appending to a long list efficiently (while also limiting memory use) and 2) finding a way to parallelize. The run time grows exponentially past ~300 nodes; I would love to see if someone has a faster solution (that also fits into memory).
import igraph
import numpy as np
# create graph from adjacency matrix
G = igraph.Graph.Adjacency((trans_mat_pad > 0).tolist())
# add edge weights
G.es['weight'] = trans_mat_pad[trans_mat_pad.nonzero()]
# initialize nodes
for node in range(trans_mat_pad.shape[0]):
    G.vs[node]['cprob'] = []
# set starting node value
G.vs[0]['cprob'] = [0]
def propagate_prob(G, node, trans_mat_pad):
    # find incoming edges to node
    predecessors = trans_mat_pad[:, node].nonzero()[0] # G.get_adjlist(mode='IN')[node]
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = trans_mat_pad[prev_node, node] # G.es[prev_node]['weight']
        # get predecessor node value
        if len(G.vs[prev_node]['cprob']) == 0:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + propagate_prob(G, prev_node, trans_mat_pad)])
        else:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(G.vs[prev_node]['cprob'])])
        ## NB: If memory constraint, uncomment below
        # set max size
        # if len(curr_node_arr) > 100:
        #     curr_node_arr = np.sort(curr_node_arr)[:100]
    # update current node array
    G.vs[node]['cprob'] = curr_node_arr
    return G.vs[node]['cprob']
# calculate path lengths
path_len = propagate_prob(G, trans_mat_pad.shape[0]-1, trans_mat_pad)

Neo4j: match with multiple relations in timely manner

Consider following nodes that are connected between each other with 2 type of edges: direct and intersect. The query needs to discover all possible paths between 2 nodes that satisfies all following rules:
0..N direct edges
0..1 intersect edge
intersect edge can be between direct edges
These paths are considered valid between nodeA and nodeZ:
(nodeA)-[:direct]->(nodeB)-[:direct]->(nodeC)-[:direct]->(nodeZ)
(nodeA)-[:intersect]->(nodeB)-[:direct]->(nodeC)-[:direct]->(nodeZ)
(nodeA)-[:direct]->(nodeB)-[:intersect]->(nodeC)-[:direct]->(nodeZ)
(nodeA)-[:direct]->(nodeB)-[:direct]->(nodeC)-[:intersect]->(nodeZ)
Basically intersect edge can happen anywhere in the path but only once.
My ideal cypher query in non-existing neo4j version would be this:
MATCH (from)-[:direct*0..N|:intersect*0..1]->(to)
But neo4j doesn't support multiple constraints for edge types :(.
UPDATE 23.04.16
There are 6609 nodes (out of 550k total), 5184 edges of type direct (out of 440k total) and 34119 of type intersect (out of 37289 total). Some circular references are expected (which neo4j avoids, doesn't it?)
The query that looked promising but failed to finish in a matter of seconds:
MATCH p = (from {from: 1})-[:direct|intersect*0..]->(to {to: 99})
WHERE
123 < from.departureTS < 123 + 86400 //next day
AND REDUCE(s = 0, x IN RELATIONSHIPS(p) | CASE TYPE(x) WHEN 'intersect' THEN s + 1 ELSE s END) <= 1
return p;
Here is a query that conforms to the stated requirements:
MATCH p = (from)-[:direct|intersect*0..]->(to)
WHERE REDUCE(s = 0, x IN RELATIONSHIPS(p) |
CASE WHEN TYPE(x) = 'intersect' THEN s + 1 ELSE s END) <= 1
return p;
It returns all paths with 0 or more direct relationships and 0 or 1 intersect relationships.
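For readers who want to see the filtering rule outside Cypher, here is a small Python sketch of the same idea (not how Neo4j evaluates the query; the function name and the edge-tuple format are invented for illustration). It enumerates paths that never reuse an edge, and it prunes a branch as soon as a second intersect edge would be added, instead of counting intersect edges after the whole path has been built as the REDUCE filter does.

```python
def paths_with_limited_intersect(edges, source, target, max_intersect=1):
    """Enumerate edge-simple paths with at most `max_intersect` 'intersect' edges."""
    outgoing = {}
    for idx, (u, v, kind) in enumerate(edges):
        outgoing.setdefault(u, []).append((idx, v, kind))

    results = []

    def walk(node, used, path, n_intersect):
        if node == target:
            results.append(list(path))
        for idx, nxt, kind in outgoing.get(node, []):
            if idx in used:
                continue  # never traverse the same edge twice in one path
            extra = 1 if kind == "intersect" else 0
            if n_intersect + extra > max_intersect:
                continue  # prune: this would be a second intersect edge
            used.add(idx)
            path.append((node, kind, nxt))
            walk(nxt, used, path, n_intersect + extra)
            path.pop()
            used.remove(idx)

    walk(source, set(), [], 0)
    return results


# Hypothetical toy data: every hop exists as both a direct and an intersect edge
edges = [
    ("A", "B", "direct"), ("A", "B", "intersect"),
    ("B", "C", "direct"), ("B", "C", "intersect"),
    ("C", "Z", "direct"), ("C", "Z", "intersect"),
]
paths = paths_with_limited_intersect(edges, "A", "Z")
print(len(paths))  # all-direct path plus three paths with one intersect edge
```

Pruning early is the main design point: on a graph with many intersect edges it avoids expanding paths that can never satisfy the constraint.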
This will do what you want:
// Cybersam's correction:
MATCH p = ((from)-[:direct*0..]->(middle)-[:intersect*0..1]->(middle2)-[:direct*0..]->(to)) return DISTINCT p;
Here's the test scenario I used:
create (a:nodeA {name: "A"})
create (b:nodeB {name: "B"})
create (c:nodeC {name: "C"})
create (z:nodeZ {name: "Z"})
merge (a)-[:direct {name: "D11"}]->(b)-[:direct {name: "D21"}]->(c)-[:direct {name: "D31"}]->(z)
merge (a)-[:intersect {name: "I12"}]->(b)-[:direct {name: "D22"}]->(c)-[:direct {name: "D32"}]->(z)
merge (a)-[:direct {name: "D13"}]->(b)-[:intersect {name: "I23"}]->(c)-[:direct {name: "D33"}]->(z)
merge (a)-[:direct {name: "D14"}]->(b)-[:direct {name: "D24"}]->(c)-[:intersect {name: "I34"}]->(z)
merge (a)-[:intersect {name: "I15"}]->(z)
// Cybersam's correction:
MATCH p = ((from)-[:direct*0..]->(middle)-[:intersect*0..1]->(middle2)-[:direct*0..]->(to)) return DISTINCT p;
I made the mistake of thinking the graph on the browser reflected the data that was returned in "p" - it did not, you have to look at the "rows" part of the report to get all the details.
This query will also return single nodes, which fits the requirements.

Recognition of interval graphs

I need an algorithm for recognizing an interval graph and generate its intervals.
After some research I found the Algorithm developed by Wen-Lian Hsu.
(http://www.iis.sinica.edu.tw/IASL/webpdf/paper-1992-A_New_Test_for_Interval_Graphs.pdf).
It seems to be an algorithm, which solves my problem. But, I am not a computer scientist so I am having problems understanding the algorithm.
Could anybody explain this algorithm to a novice, plain and simple?
Having worked through some examples I think I know what is going on, though I still do not follow algorithm 4. My algorithm for determining whether a graph is an interval graph is below, followed by some Javascript code that implements it. Putting it in code allowed me to check whether it worked or not. The code can be found at this JSFiddle. I have tested the code with these three graphs. (1 and 2 are interval graphs; 3 is not.)
As I have made the algorithm from my interpretation of the paper given in my earlier answer, I can give no guarantees that it is fully correct, but it seems to work.
The algorithm will be worked through with graph 1.
When x and y are vertices of the graph, x and y are neighbours when they are joined by an edge of the graph.
Algorithm for Interval Graphs.
Stage 1 create a Lexicographically Ordered list, L, of vertices from the graph.
Form an arbitrarily ordered list U of vertices of the graph, called a CLASS.
Form US, an ordered list of one element the class U.
While US is not empty
    Take the 1st vertex, v, from the 1st class in US and put it at the front of L.
    Set T to be an empty class
    For each vertex in each class in US
        If it is a neighbour of v remove it from its class and push to back of T
    If T is not empty put T at front of US.
    Remove any empty classes from US
L is now a Lexicographically Ordered list of vertices from the graph
For graph 1
U= (3,6,1,4,5,8,2,7)
US=( (3,6,1,4,5,8,2,7))
v=3 US=( (3,6,1,4,5,8,2,7)) L=(3)
neighbours of 3 to front
US=( (6,8),(1,4,5,2,7))
v=6 US=((8)(1,4,5,2,7)) L=(6,3)
neighbours of 6 to front
US=((8,1,7),(4,5,2))
v=8 US=((1,7)(4,5,2)) L=(8,6,3)
neighbours of 8 to front
US=((7,4,5,2)(1))
v=7 US=((4,5,2)(1)) L=(7,8,6,3)
neighbours of 7 to front
US=( (4,5)(2)(1))
v=4 US=((5)(2)(1)) L=(4,7,8,6,3)
neighbours of 4 to front – no neighbours so no change
US=((5)(2)(1))
v=5 US=((2)(1)) L=(5,4,7,8,6,3)
neighbours of 5 to front – no neighbours so no change
US=((2)(1))
v=2 US=((1)) L=(2,5,4,7,8,6,3)
neighbours of 2 to front – no neighbours so no change
US=((1))
v=1 US=() L=(1,2,5,4,7,8,6,3)
L finished
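The stage 1 steps as written above can be sketched in Python as follows. This is my own transcription of the prose, not the JavaScript listed further down; the name lex_order is made up, and the neighbour lists are copied as they appear in the graph One definition of that listing.

```python
def lex_order(neighbours, initial_order):
    """Stage 1 as described: repeatedly move the neighbours of v to the front."""
    classes = [list(initial_order)]  # US: starts as one class containing U
    L = []
    while classes:
        v = classes[0].pop(0)   # first vertex of the first class
        L.insert(0, v)          # put it at the front of L
        T = []
        for cls in classes:
            for u in list(cls):  # iterate over a copy while removing
                if u in neighbours[v]:
                    cls.remove(u)
                    T.append(u)
        classes = [c for c in classes if c]  # drop empty classes
        if T:
            classes.insert(0, T)             # T becomes the first class
    return L


# Graph 1, with neighbour lists as given in the code listing
neighbours = {
    3: {6, 8}, 6: {1, 3, 7, 8}, 1: {6}, 4: {7, 8},
    5: {7, 8}, 8: {2, 3, 4, 5, 6, 7}, 2: {8}, 7: {4, 5, 8},
}
order = lex_order(neighbours, [3, 6, 1, 4, 5, 8, 2, 7])
print(order)  # [1, 2, 5, 4, 7, 8, 6, 3]
```

The result is the same vertex order as the rows of the stage 2 table.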
Stage 2 – First test for an Interval Graph
When x is a vertex of the graph and L a lexicographical Ordered list of the graph
Set RN(x) to be the neighbours of x which come after x in L placed in the same order that they occur in L
Set parent(x) to be the first vertex in RN(x)
Set RNnoP(x) to be the vertices in RN(x) with parent(x) removed, ie RN(x)/parent(x)
If the graph is an interval graph then for all vertices x, if RNnoP(x) has any vertices in it then they will all appear in RN(parent(x)), ie RNnoP(x) is a subset of RN(parent(x))
If any x fails this test then the graph cannot be an interval graph.
Below shows the results for the example graph
x | RN(x) | Parent(x) | RN(x)/parent(x) | RN(parent(x)) | Pass
1 | 6 | 6 | - | 3 | T
2 | 8 | 8 | - | 6,3 | T
5 | 7,8 | 7 | 8 | 8 | T
4 | 7,8 | 7 | 8 | 8 | T
7 | 8 | 8 | - | 6,3 | T
8 | 6,3 | 6 | 3 | 3 | T
6 | 3 | 3 | - | - | T
3 | - | - | - | - | T
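Here is a short Python sketch of the stage 2 test, assuming the ordered list L from stage 1 is already available. The function name passes_first_test is hypothetical, and the 4-cycle used as a failing case is my own example, not one of the three test graphs.

```python
def passes_first_test(neighbours, L):
    """Check that RNnoP(x) is a subset of RN(parent(x)) for every vertex x."""
    position = {v: i for i, v in enumerate(L)}
    # RN(x): neighbours of x that come after x in L, in L order
    RN = {
        x: sorted((y for y in neighbours[x] if position[y] > position[x]),
                  key=position.get)
        for x in L
    }
    for x in L:
        if not RN[x]:
            continue  # no right neighbours, nothing to check
        parent, rest = RN[x][0], RN[x][1:]
        if not set(rest) <= set(RN[parent]):
            return False
    return True


# Graph 1 with the order found in stage 1
graph1 = {
    3: {6, 8}, 6: {1, 3, 7, 8}, 1: {6}, 4: {7, 8},
    5: {7, 8}, 8: {2, 3, 4, 5, 6, 7}, 2: {8}, 7: {4, 5, 8},
}
graph1_ok = passes_first_test(graph1, [1, 2, 5, 4, 7, 8, 6, 3])

# A 4-cycle a-b-c-d-a (not chordal, so not an interval graph)
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
cycle_ok = passes_first_test(cycle, ["d", "c", "b", "a"])
print(graph1_ok, cycle_ok)  # True False
```

For the cycle, RN(d) = (c, a), so parent(d) = c and RNnoP(d) = (a), which is not contained in RN(c) = (b), and the test fails.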
Stage 3 - for graphs that pass stage 2 form cliques for each vertex of the graph and create a set of maximal cliques
A CLIQUE is a set of vertices from the graph such that for any pair of different vertices in the set x,y then x and y are neighbours.
Set C(x) to be the set containing the vertices in RN(x) together with x, ie RN(x) Union {x}
Now it is necessary to form a set of maximal cliques. A clique is MAXIMAL if the addition of any other vertex into the clique stops it being a clique.
Using the parent relationships found in stage 2 form a tree
Traverse the tree and form a set, CS, of vertices where for each x in CS C(x) is a maximal clique, using the following process on all vertices except for the root.
If C(parent(x)) is a subset of C(x)
    if parent(x) is in CS remove it
put x in CS
Below is a tree for the example graph showing x and C[x] at each node
x | RN(x) | Parent(x) | C(x) | C(parent(x))
1 | 6 | 6 | 1,6 | 3,6
2 | 8 | 8 | 2,8 | 3,6,8
5 | 7,8 | 7 | 5,7,8 | 7,8
4 | 7,8 | 7 | 4,7,8 | 7,8
7 | 8 | 8 | 7,8 | 3,6,8
8 | 6,3 | 6 | 3,6,8 | 3,6
6 | 3 | 3 | 3,6 | 3
3 | - | - | 3 | -
The process on above tree
x=3 is root
x=6 C(6)={3,6} CS=(6) C(3) is a subset of C(6) but 3 not in CS, put in 6
x=1 C(1)={1,6} CS=(1,6) C(6) not a subset of C(1), put in 1
x=8 C(8)={3,6,8} CS=(1,8) C(6) is a subset of C(8), remove 6, put in 8
x=7 C(7)={7,8} CS=(1,8,7) C(8) not a subset of C(7), put in 7
x=4 C(4)={4,7,8} CS=(1,8,4) C(7) is a subset of C(4), remove 7, put in 4
x=5 C(5)={5,7,8} CS=(1,8,4,5) C(7) is a subset of C(5) but no 7, put in 5
x=2 C(2)={2,8} CS=(1,8,4,5,2) C(8) is not a subset of C(2), put in 2
NOTE in the code I used a Children’s relationship to traverse the tree as a way of excluding the root.
Stage 4 Attempt to order the maximal cliques so that they are consecutive.
Cliques are in CONSECUTIVE order if for any vertex x, if x is in cliques n and n+m, m>0, then x is in cliques n+1, n+2, ... n+m-1
The following algorithm will put the maximal cliques in consecutive order if it is possible to do so
Set NC to be the ordered list, or class, of maximal cliques from CS, such that if x is in CS then C(x) is in NC
Set P to be the ordered list containing NC, P=(NC)
Set OC=(), an empty ordered list
While P is not empty
    Take the last clique, LST, from the last class of P and put at front of OC
    For each clique Q in the last class of P partition into two classes
        OUT if Q and LST have no vertices in common (intersection empty)
        IN if Q and LST have vertices in common (non empty intersection)
    Replace the last class of P with the classes OUT IN if both non empty
    Replace the last class of P with the class OUT if OUT non empty and IN empty
    Replace the last class of P with the classes IN if IN non empty and OUT empty
    Leave P if both empty
For the example graph P=(({3,6,8},{4,7,8},{1,6},{5,7,8},{2,8})) (I have mixed up the order to show how the process works)
P=(({3,6,8},{4,7,8},{1,6},{5,7,8},{2,8})) OC=()
P=(({3,6,8},{4,7,8},{1,6},{5,7,8})) OC=({2,8})
OUT=({1,6}) IN=({3,6,8},{4,7,8},{5,7,8})
P=(({1,6}),({3,6,8},{4,7,8},{5,7,8})) OC=({2,8})
P=(({1,6}),({3,6,8},{4,7,8})) OC=({5,7,8},{2,8})
OUT=() IN=({3,6,8},{4,7,8})
P=(({1,6}),({3,6,8},{4,7,8})) OC=({5,7,8},{2,8})
P=(({1,6}),({3,6,8})) OC=({4,7,8},{5,7,8},{2,8})
OUT=() IN=({3,6,8})
P=(({1,6}),({3,6,8})) OC=({4,7,8},{5,7,8},{2,8})
P=(({1,6})) OC=({3,6,8},{4,7,8},{5,7,8},{2,8})
OUT=() IN=({1,6})
P=(({1,6})) OC=({3,6,8},{4,7,8},{5,7,8},{2,8})
P=(()) OC=({1,6},{3,6,8},{4,7,8},{5,7,8},{2,8})
P=()
NOTE in the code I have left NC = cliques as a list of vertices and used clique(label) for C(x)
Stage 5 check if OC is consecutively ordered
For each v in CS (as in stage 3)
    For each vertex, x in C(v) the clique associated with v
        If x is only in adjacent cliques in OC then
            Interval graph
        else
            Not an interval graph
Stage 6 if the consecutive test is passed draw intervals
Where n is the number of vertices determine n columns numbered 1 to n with gaps between them and equal width
For each v in CS with v first appearing in clique i and lastly in clique j, 1<=i<=j<=n, draw the interval from the start of column i to the end of column j
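Stages 5 and 6 can be sketched compactly in Python, assuming the ordered clique list OC from stage 4 is given as a list of sets. The function names below are my own, and the sketch reports each interval as a (first clique, last clique) pair of 1-based positions rather than drawing it.

```python
def is_consecutive(ordered_cliques):
    """Stage 5: every vertex must occupy a contiguous run of cliques."""
    positions = {}
    for i, clique in enumerate(ordered_cliques):
        for x in clique:
            positions.setdefault(x, []).append(i)
    return all(p[-1] - p[0] + 1 == len(p) for p in positions.values())


def clique_intervals(ordered_cliques):
    """Stage 6: map each vertex to (first, last) clique position, 1-based."""
    spans = {}
    for i, clique in enumerate(ordered_cliques, start=1):
        for x in clique:
            first, _ = spans.get(x, (i, i))
            spans[x] = (first, i)
    return spans


# OC as obtained for graph 1 at the end of stage 4
oc = [{1, 6}, {3, 6, 8}, {4, 7, 8}, {5, 7, 8}, {2, 8}]
consecutive = is_consecutive(oc)
spans = clique_intervals(oc)
print(consecutive)         # True
print(spans[8], spans[6])  # (2, 5) (1, 2)
```

Two vertices are neighbours exactly when their spans overlap; for example 6 with span (1, 2) and 8 with span (2, 5) share clique 2.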
CODE IN JAVASCRIPT
Styles for Output of Intervals
.interval {
border-bottom: 1px solid black;
position: absolute
}
.label {
position:absolute;
}
Code
//Array methods added to carry out interval graph check
if(!Array.indexOf){ //Needed for earlier versions of IE;
Array.prototype.indexOf = function(obj){
for(var i=0; i<this.length; i++){
if(this[i]===obj){
return i;
}
}
return -1;
}
}
Array.prototype.subsetOf = function(set){ //returns true if Array is a subset of set
if(this.length==0) {return true;} //empty set is subset of all sets
var subset=true;
for(var i=0; i<this.length; i++){
subset = subset && (set.indexOf(this[i])>-1); //element of Array not in set forces subset to be false
}
return subset;
}
Array.prototype.intersection = function(set){ //returns the intersection of Array and set
if(this.length==0) {return [];} //empty set intersect any set is empty set
var inter=[];
for(var i=0; i<this.length; i++){
if(set.indexOf(this[i])>-1) {//element of Array and set
inter.push(this[i]);
}
}
return inter;
}
Array.prototype.union = function(set){ //returns the union of Array and set
var union=[];
for(var i=0; i<this.length; i++){
union.push(this[i]);
}
for(var i=0; i<set.length; i++) {
if(union.indexOf(set[i])==-1) {//element not yet in union
union.push(set[i]);
}
}
return union;
}
//A set is an array with no repeating elements
function vertex(label,neighbours) {
this.label=label;
this.neighbours=neighbours;
}
//Using the following format for each vertex on the graph [vertex label, [neighbour labels] ] set up the model of the graph
//order of vertices does not matter
//graph One - an interval graph
graph=[
[3,[6,8]],
[6,[1,3,7,8]],
[1,[6]],
[4,[7,8]],
[5,[7,8]],
[8,[2,3,4,5,6,7]],
[2,[8]],
[7,[4,5,8]]
];
//graph Two - an interval graph
/*graph=[
['A',['B','C','D']],
['B',['A','C']],
['C',['A','B','D','E','F']],
['D',['A','C','E','F']],
['E',['C','D']],
['F',['D','G']],
['G',['F']]
];
//graph Three - not an interval graph
graph=[
['W',['Y','Z']],
['X',['Z']],
['Y',['W']],
['Z',['W']]
];
*/
/*Create a new vertex object U[i] where U is an unordered array.
*Unordered as at this point the numbering of vertices does not matter.
*Referencing by name rather than an array index is easier to follow
*/
var U=[];
for(var i=0; i<graph.length; i++) {
U[i]=new vertex(graph[i][0],graph[i][1]); //graph[i] is [label, [neighbour labels]]
}
var US=[U];
/*US is an array containing the single array U
* during Lexicographical ordering US will contain other arrays
*/
//********************Lexicographical Ordering Start*************************
var L=[]; //L will contain the vertices in Lexicographical order.
while (US.length>0) {
F=US[0]; //First array in US
vertex=F.shift(); //In Javascript shift removes first element of an array
if(F.length==0) { //F is empty
US.shift();
}
L.unshift(vertex); //In Javascript unshift adds to front of array
var T=new Array(); //new array to add to front of US
tus=[]; //temporary stack for US sets
while(US.length>0) { //for remaining vertices in the arrays in US check if neighbours of vertex
set=US.shift(); //first set of US
ts=[]; //temporary stack for set elements
while(set.length>0){
v=set.shift(); //v is one of the remaining vertices //
lbl=v.label;
if (vertex.neighbours.indexOf(lbl) != -1) { //is v a neighbour of vertex
T.push(v); //push v to T
}
else {
ts.unshift(v); //not a neighbour store for return
}
}
while(ts.length>0) { //restore list of v not moved to set
set.push(ts.pop());
}
if(set.length>0) {//if set not empty store for restoration
tus.unshift(set);
}
}
if(T.length>0) { //if T not empty
US.push(T); //put T as first set of US
}
while(tus.length>0) {
US.push(tus.pop()); // restore non empty sets
}
}
//************************End of Lexicographical Ordering*************************
//----------------------Chordality Check and Clique Generation Start----------------------------------------
RN={}; //RN as an object so that an associative array can be used if labels are letters
Parent={}; //Parent as an object so that an associative array can be used if labels are letters
RNnoP={}; //RN with parent removed as an object so that an associative array can be used if labels are letters
Children={}; //Used in clique generation NOTE this is a deviation from given algorithm 4 which I do not follow, again object for associative array
for(var i=0; i<L.length;i++) {
Children[L[i].label]=[];
}
var chordal=true;
for(var i=0; i<L.length-1; i++) {
vertex=L[i];
RN[vertex.label]=[];
RNnoP[vertex.label]=[];
for(j=i+1;j<L.length; j++) {
v=L[j];
lbl=v.label;
if(vertex.neighbours.indexOf(lbl) != -1) {
RN[vertex.label].push(lbl); //store vertex labels in order they are processed
}
}
Parent[vertex.label]=RN[vertex.label][0]; //Parent is front vertex of RN
Children[RN[vertex.label][0]].push(vertex.label);//used for Clique generation my method
for(k=1;k<RN[vertex.label].length;k++) {
RNnoP[vertex.label][k-1]=RN[vertex.label][k];
}
}
//************** chordality check ************
for(i=0; i<L.length-1; i++) {
vertex=L[i];
var x=vertex.label;
parentx=Parent[x];
for(j=0;j<RNnoP[x].length;j++) {
chordal = chordal && (RNnoP[x].subsetOf(RN[parentx]));
}
}
if(!chordal) {
alert('Not an Interval Graph');
}
else {
//Construct maximal clique list from tree formed by parent and child relationships determined above NOTE not algorithm 4
var root = Children[L[L.length-1].label]; //last vertex in L which has no parent
RN[L[L.length-1].label]=[]; //no vertices to right of last vertex
var clique={}; //clique for each vertex label -- object so associative array using labels
var cliques=[]; //stores maximal cliques from subtree of vertices processed
clique[L[L.length-1].label]=[L[L.length-1].label]; //clique for root contains last vertex label
generateCliques(root); //cliques becomes a list of labels of vertices with maximal cliques
var pivots=[];
for(i=0;i<cliques.length;i++) {
pivots=pivots.union(clique[cliques[i]]);
}
/*attempt to place each clique in cliques in consecutive order
* ie all cliques containing a given label are all next to each other
* if at end of process the cliques are in consecutive order then have an interval graph otherwise not an interval graph
*/
var orderedCliques=[]; //will contain maximal cliques in consecutive order if possible
var partitions=[cliques]; //holds partitions of cliques during process
while(partitions.length>0) {
inPartition=new Array(); //partition of elements containing pivot
outPartition=new Array(); //partition of elements not containing pivot
lastPartition=partitions.pop(); //last partition of cliques
orderedCliques.unshift(lastPartition.shift());//first label in partition moved to front of orderedCliques
pivotClique=clique[orderedCliques[0]]; //which points to pivot clique
for(var i=0; i<lastPartition.length; i++) {
if(pivotClique.intersection(clique[lastPartition[i]]).length>0){ //non empty intersection
inPartition=inPartition.union([lastPartition[i]]);
}
else {
outPartition=outPartition.union([lastPartition[i]]);
}
}
if(outPartition.length>0) {
partitions.push(outPartition);
}
if(inPartition.length>0) {
partitions.push(inPartition);
}
}
//----------------------End of Chordality Check and Clique Generation----------------------------------------
var start={}; //start is an associative array;
var end={}; //end is an associative array;
if (consecutive()){
//draw intervals......................
var across=20;
var down=20;
var colwidth=20;
var gap=30;
var coldepth=30;
var height=20;
for(v=0;v<pivots.length;v++) {
var vertex=pivots[v];
var line=document.createElement('div');
line.style.top=(down+(coldepth+height)*v)+'px';
line.style.height=(coldepth+height)+'px';
line.style.left=(across+start[vertex]*(colwidth+gap))+'px';
line.style.width=((end[vertex]-start[vertex])*gap+(1+end[vertex]-start[vertex])*colwidth)+'px';
line.className='interval';
document.body.appendChild(line);
var label=document.createElement('div');
label.style.left=line.style.left;
label.style.top=(parseInt(line.style.top)+28)+'px';
label.style.height='17px';
label.style.width='30px';
label.innerHTML=vertex;
label.className='label';
document.body.appendChild(label);
}
}
else {
alert('Not an Interval Graph')
};
}
function generateCliques(node) {
for(var i=0; i<node.length;i++) {
lbl=node[i];
clique[lbl]=[];
for(j=0;j<RN[lbl].length;j++) {
clique[lbl][j]=RN[lbl][j]; //each element of RN[x] becomes an element of clique[x]
}
clique[lbl].push(lbl); //RN(x) U {x} is a clique of subgraph processed so far and now clique[x] = RN[x] as sets
var parentx=Parent[lbl];
if(clique[parentx].subsetOf(clique[lbl])) {
var indx=cliques.indexOf(parentx);
if(indx>-1) { //if parent of lbl is in cliques list remove it
cliques.splice(indx,1);
}
}
cliques.push(lbl); //add lbl to cliques list
if(Children[lbl].length>0) { //lbl is not a leaf
generateCliques(Children[lbl]);
}
}
}
function consecutive() {
var p;
for(v=0;v<pivots.length;v++) {
var vertex=pivots[v];
p=0;
for(cl=0;cl<orderedCliques.length;cl++) {
if(clique[orderedCliques[cl]].indexOf(vertex)>-1) { //is vertex in maximal clique
if(p==0){
p=cl+1;
start[vertex]=p;
if(p==orderedCliques.length) {
end[vertex]=p;
}
}
else {
p+=1;
if(p==orderedCliques.length) {
end[vertex]=p;
}
if(p!=cl+1) {
return false;
}
}
}
else {
if(!end[vertex] && p>0) {
end[vertex]=cl;
}
}
}
}
return true;
}
This is not the full answer you are looking for but I hope it goes some way to helping.
Wikipedia led me to the Interval graph page, then on to the Lexicographic breadth-first search page, on which I found a reference to the paper Habib, Michel; McConnell, Ross; Paul, Christophe; Viennot, Laurent (2000), "Lex-BFS and partition refinement, with applications to transitive orientation, interval graph recognition and consecutive ones testing".
Now this paper does give actual algorithms for determining if a graph is an interval graph, using algorithms 2, 3, 4 and 9. Algorithms 2 and 3 can be found in alternative forms on the Lex-BFS page above and can be worked through. However, so far, over the last couple of days, algorithm 4 has defeated me. Even working through the example graph they give does not produce the results they state.
So three possibilities.
I am not clever enough to understand it;
The algorithm is not sufficiently detailed;
There are mistakes in the algorithm.
Assuming that 2 or 3 is the true one, I will continue working on it on and off to see if I can crack it. Then there is algorithm 9 to tackle.
Perhaps the above pages and paper will give you enough insight into solving your problem. If I find a full answer I will post it. Good Luck.
For those who suffer from this paper like I do, I confirm that the algorithm 4 in the mentioned reference paper is weird/broken. Instead, I have found the second paper from the same authors about the same topics. You can check both papers here: http://citeseer.uark.edu:8080/citeseerx/showciting;jsessionid=B9CECB9E4B9DA156C687A414FA8743BF?cid=1681311
The second one appears to be written after a month and seems to be corrected by authors. I hope this may help someone now or later. In case the mentioned link will be unavailable ever, here are the 2 headings of the papers to search for:
Lex-BFS and Partition Refinement, with Applications to Transitive Orientation, Interval Graph Recognition and Consecutive Ones Testing.
Lex-BFS, a Partition Refining Technique. Application to Transitive Orientation, Interval Graph Recognition and Consecutive 1's Testing.
I have implemented the algorithms described in the second paper, but the algorithm itself appears to have some bugs. I have met one of the authors (Prof. Michel Habib) about this, and the issue turned out to require deeper analysis. My implementation can be found here: https://github.com/Hack06/LexBFS

Changing attribute of nodes during breadth first search in R

I have created a random (Erdos-Renyi) graph that has 100 nodes. I have set an attribute value for all 100 nodes as 0. I find the node with the maximum degree (the most neighbors), and change its attribute value from 0 to 1. Then, using the node as the root node, and another node as a second root node, I do a breadth first search (BFS) on the network.
This is related to this question.
I do the breadth first search like this:
# BFS on the network
bfs <- graph.bfs(graph, root = c(root_node, root_node2), unreachable = FALSE,
order = TRUE, dist = TRUE)
I want to look at the neighbors of the first root node, then the neighbors of the second root node, then the neighbors of the first root node's neighbors, then the neighbors of the second root node's neighbors, and so on.
So something like this:
O # Note: O* is the first root node
| # and O! is the second root node
|
O----O----O!----O----O*----O----O----O
| |
| |
O O
So, to start with, the neighbors of the first root node are looked at:
O # Note: double connections are
| # the paths taken to the neighbors
|
O----O----O!----O====O*====O----O----O
| ||
| ||
O O
Then the neighbors of the second root node are looked at:
O
|
|
O----O====O!====O----O*----O----O----O
|| |
|| |
O O
Then, the neighbors of the first root node's neighbors:
O
||
||
O----O----O!----O----O*----O====O----O
| |
| |
O O
Then the neighbors of the second root node's neighbors:
O
|
|
O====O----O!----O----O*----O----O----O
| |
| |
O O
And so on until all of the nodes have been looked at:
O
|
|
O----O----O!----O----O*----O----O====O
| |
| |
O O
As each node is looked at, I want to change its attribute value from 0 to 1, so that if another path comes to it, that path knows this node has already been looked at.
Also, is there a way to count how many iterations it takes to look through all of the nodes? For example, here it is 6 (including the original).
Note: the two root nodes are connected in some way (i.e. there is a path between them).
Sorry about the images, but that's the basic idea. Hope this makes sense.
Any help would be much appreciated. Thanks!
Here is how to do it. First, here is a randomly generated graph.
numnodes <- 50
the.graph <- grg.game(numnodes, 0.3)
V(the.graph)$visited <- 0
graph.degree <- degree(the.graph)
Now, we take a maximum-degree vertex and a random vertex. (You didn't specify how you chose the second one.) We re-pick the random vertex until it differs from the maximum-degree vertex and is connected to it.
maxdeg <- which(graph.degree == max(graph.degree))
# sample(x, 1) treats a single number x as 1:x, so pick by index instead
maxvertex <- maxdeg[sample.int(length(maxdeg), 1)]
randvertex <- sample.int(numnodes, 1)
while((randvertex == maxvertex) ||
      (shortest.paths(the.graph,maxvertex,randvertex) == Inf)) {
  randvertex <- sample.int(numnodes, 1)
}
When traversing graphs like this, I like to keep track of where I am. Here is the starting position and a line to mark these initial nodes as visited.
curpos <- c(maxvertex, randvertex)
for(num in curpos) V(the.graph)[num]$visited <- 1
Now we actually do the search and mark nodes as visited. The loop terminates when all of the nodes are marked as visited or when there are no more connected nodes to explore. As a safeguard, the search can never need more iterations than there are nodes, so the loop is also capped at that number. In each iteration, we go through the vector containing our currently occupied nodes. If any of a node's neighbors haven't been visited, we mark them as visited and add them to the vector for next time. Once we have processed all of the nodes for this iteration, we start the next cycle.
maxloops <- length(V(the.graph))
curloop <- 0
while((curloop < maxloops) && (length(curpos)>0) &&
      (sum(V(the.graph)$visited) < numnodes)) {
  nextpos <- c()
  while(length(curpos)>0) {
    curnode <- curpos[1]
    curpos <- curpos[-1]
    adjnodes <- which(the.graph[curnode] == 1)
    for(adjnode in adjnodes) {
      if(!V(the.graph)[adjnode]$visited) {
        nextpos <- c(nextpos,adjnode)
        V(the.graph)[adjnode]$visited <- 1
      }
    }
  }
  curpos <- nextpos
  curloop <- curloop + 1
}
Now we have visited all nodes connected to the maximal degree node. We now print the number of iterations it took to traverse the graph. If any nodes are not visited, this will additionally print a message stating that the graph is not connected.
print(curloop)
if(sum(V(the.graph)$visited) < numnodes) print("Not a connected graph.")
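For readers without igraph at hand, the same level-by-level search from two roots can be sketched in Python on a plain adjacency dictionary. This is only an illustration of the idea above: the function name multi_root_bfs and the input format are assumptions of this sketch, and here a round is counted only when it discovers at least one new node.

```python
def multi_root_bfs(adj, roots):
    """Breadth-first search started from several roots at once.

    adj maps each node to a list of its neighbours (hypothetical input
    format). Returns (visit order, number of rounds that found new
    nodes, visited flags).
    """
    visited = {node: 0 for node in adj}
    frontier = []
    # Mark the roots as visited, in the order given.
    for root in roots:
        if not visited[root]:
            visited[root] = 1
            frontier.append(root)
    order = list(frontier)
    rounds = 0
    while frontier:
        next_frontier = []
        # Expand the first root's side, then the second root's side, etc.
        for node in frontier:
            for neighbour in adj[node]:
                if not visited[neighbour]:
                    visited[neighbour] = 1
                    next_frontier.append(neighbour)
        if next_frontier:
            rounds += 1
        order.extend(next_frontier)
        frontier = next_frontier
    return order, rounds, visited


if __name__ == "__main__":
    # A path 0 - 1 - 2 - 3 - 4 searched from roots 2 and 0.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    order, rounds, visited = multi_root_bfs(path, [2, 0])
    print(order, rounds)
    if not all(visited.values()):
        print("Not a connected graph.")
```

Any node whose flag is still 0 at the end is unreachable from both roots, which corresponds to the "Not a connected graph." check above.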

Resources