Finding longest cyclic path in a graph with Gremlin
I am trying to construct Gremlin queries to use within DSE Graph with geo-searches enabled (indexed in Solr). The problem is that the graph is so densely interconnected that the cyclic path traversals time out. Right now the prototype graph I'm working with has ~1600 vertices and ~35K edges. The number of triangles passing through each vertex is also summarised:
+--------------------+-----+
| gps|count|
+--------------------+-----+
|POINT (-0.0462032...| 1502|
|POINT (-0.0458048...| 405|
|POINT (-0.0460680...| 488|
|POINT (-0.0478356...| 1176|
|POINT (-0.0479465...| 5566|
|POINT (-0.0481031...| 9896|
|POINT (-0.0484724...| 433|
|POINT (-0.0469379...| 302|
|POINT (-0.0456595...| 394|
|POINT (-0.0450722...| 614|
|POINT (-0.0475904...| 3080|
|POINT (-0.0479464...| 5566|
|POINT (-0.0483400...| 470|
|POINT (-0.0511753...| 370|
|POINT (-0.0521901...| 1746|
|POINT (-0.0519999...| 1026|
|POINT (-0.0468071...| 1247|
|POINT (-0.0469636...| 1165|
|POINT (-0.0463685...| 526|
|POINT (-0.0465805...| 1310|
+--------------------+-----+
only showing top 20 rows
I anticipate the graph growing to a massive size eventually but I will limit the searches for cycles to geographic regions (say of radius ~ 300 meters).
My best attempt so far has been some versions of the following:
g.V().has('gps',Geo.point(lon, lat)).as('P')
.repeat(both()).until(cyclicPath()).path().by('gps')
Script evaluation exceeded the configured threshold of realtime_evaluation_timeout at 180000 ms for the request
For the sake of illustration, the map below shows a starting vertex in green and a terminating vertex in red. Assume that all the vertices are interconnected. I am interested in the longest path between green and red, which would be to circumnavigate the block.
A few links I've read through to no avail:
1) http://tinkerpop.apache.org/docs/current/recipes/#cycle-detection
2) Longest acyclic path in a directed unweighted graph
3) https://groups.google.com/forum/#!msg/gremlin-users/tc8zsoEWb5k/9X9LW-7bCgAJ
EDIT
Using Daniel's suggestion below to create a subgraph, it still times out:
gremlin> hood = g.V().hasLabel('image').has('gps', Geo.inside(point(-0.04813968113126384, 51.531259899256995), 100, Unit.METERS)).bothE().subgraph('hood').cap('hood').next()
==>tinkergraph[vertices:640 edges:28078]
gremlin> hg = hood.traversal()
==>graphtraversalsource[tinkergraph[vertices:640 edges:28078], standard]
gremlin> hg.V().has('gps', Geo.point(-0.04813968113126384, 51.531259899256995)).as('x')
==>v[{~label=image, partition_key=2507574903070261248, cluster_key=RFAHA095CLK-2017-09-14 12:52:31.613}]
gremlin> hg.V().has('gps', Geo.point(-0.04813968113126384, 51.531259899256995)).as('x').repeat(both().simplePath()).emit(where(both().as('x'))).both().where(eq('x')).tail(1).path()
Script evaluation exceeded the configured threshold of realtime_evaluation_timeout at 180000 ms for the request: [91b6f1fa-0626-40a3-9466-5d28c7b5c27c - hg.V().has('gps', Geo.point(-0.04813968113126384, 51.531259899256995)).as('x').repeat(both().simplePath()).emit(where(both().as('x'))).both().where(eq('x')).tail(1).path()]
The longest path, based on the number of hops, will be the last one you can find.
g.V().has('gps', Geo.point(x, y)).as('x').
repeat(both().simplePath()).
emit(where(both().as('x'))).
both().where(eq('x')).tail(1).
path()
There's no way to make this query perform well in OLTP, unless you have a very tiny (sub)graph. So, depending on what you see as a "city block" in your graph, you should probably extract that first as a subgraph and then apply the longest path query (in memory).
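On a small in-memory subgraph, the idea behind the traversal above (extend simple paths and keep the longest one that closes back at the start) can be sketched in plain Python. This is an illustration only, with a made-up toy adjacency list, not DSE Graph syntax:

```python
# Sketch: brute-force longest simple cycle through a start vertex,
# the in-memory analogue of repeat(both().simplePath()) above.
# Feasible only on a small extracted subgraph, as the answer notes.
def longest_cycle(adj, start):
    best = []

    def dfs(path):
        nonlocal best
        for nxt in adj[path[-1]]:
            if nxt == start and len(path) > 2:   # closed a proper cycle
                if len(path) + 1 > len(best):
                    best = path + [start]
            elif nxt not in path:                # simplePath(): no revisits
                dfs(path + [nxt])

    dfs([start])
    return best

# toy undirected "block": square 0-1-2-3 with a diagonal 0-2
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(longest_cycle(adj, 0))  # prints [0, 1, 2, 3, 0]
```

The runtime is exponential in the number of vertices, which is why the subgraph extraction step matters.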
One solution I've come up with involves using Spark GraphFrames and a label propagation algorithm (GraphFrames, LPA). Each community's average GPS location can then be computed (in fact you don't even need the average; a single member of each community would suffice), along with all the edges that exist between the community representatives (average or otherwise).
Select and save a region of the graph and save the vertices and edges:
g.V().has('gps', Geo.inside(Geo.point(x,y), radius, Unit.METERS))
.subgraph('g').cap('g')
Spark snippet:
import org.graphframes.GraphFrame
val V = spark.read.json("v.json")
val E = spark.read.json("e.json")
val g = GraphFrame(V,E)
val result = g.labelPropagation.maxIter(5).run()
val rdd = result.select("fullgps", "label").map(row => {
val coords = row.getString(0).split(",")
val x = coords(0).toDouble
val y = coords(1).toDouble
val z = coords(2).toDouble
val id = row.getLong(1)
(x,y,z,id)
}).rdd
// Average GPS:
val newVertexes = rdd.map{ case (x:Double,y:Double,z:Double,id:Long) => (id, (x,y,z)) }.toDF("lbl","gps")
rdd.map{ case (x:Double, y:Double, z:Double, id:Long) => (id, (x,y,z)) }
   .mapValues(value => (value, 1))
   .reduceByKey{ case (((xL,yL,zL), countL), ((xR,yR,zR), countR)) =>
     ((xR+xL, yR+yL, zR+zL), countR+countL) }
   .map{ case (id, ((x,y,z), c)) => (id, ((x/c, y/c, z/c), c)) }
   .map{ case (id, ((x,y,z), count)) =>
     Array(x.toString, y.toString, z.toString, id.toString, count.toString) }
   .map(a => toCsv(a))
   .saveAsTextFile("avg_gps.csv")
// Keep IDs
val rdd2 = result.select("id", "label").map(row => {
val id = row.getString(0)
val lbl = row.getLong(1)
(lbl, id) }).rdd
val edgeDF = E.select("dst","src").map(row => (row.getString(0),row.getString(1))).toDF("dst","src")
// Src
val tmp0 = result.select("id","label").join(edgeDF, result("id") === edgeDF("src")).withColumnRenamed("lbl","src_lbl")
val srcDF = tmp0.select("src","dst","label").map(row => { (row.getString(0)+"###"+row.getString(1),row.getLong(2)) }).withColumnRenamed("_1","src_lbl").withColumnRenamed("_2","src_edge")
// Dst
val tmp1 = result.select("id","label").join(edgeDF, result("id") === edgeDF("dst")).withColumnRenamed("lbl","dst_lbl")
val dstDF = tmp1.select("src","dst","label").map(row => { (row.getString(0)+"###"+row.getString(1),row.getLong(2)) }).withColumnRenamed("_1","dst_lbl").withColumnRenamed("_2","dst_edge")
val newE = srcDF.join(dstDF, srcDF("src_lbl")===dstDF("dst_lbl"))
val newEdges = newE.filter(newE("src_edge")=!=newE("dst_edge")).select("src_edge","dst_edge").map(row => { (row.getLong(0).toString + "###" + row.getLong(1).toString, row.getLong(0), row.getLong(1)) }).withColumnRenamed("_1","edge").withColumnRenamed("_2","src").withColumnRenamed("_3","dst").dropDuplicates("edge").select("src","dst")
val newGraph = GraphFrame(newVertexes, newEdges)
The averaged locations are then connected by edges and the problem is reduced in this case from ~1600 vertices and ~35K edges to 25 vertices and 54 edges:
Here the non-green coloured segments (red, white, black, etc.) represent the individual communities. The green circles are the averaged GPS locations, and their sizes are proportional to the number of members in each community. Now it is considerably easier to apply an OLTP algorithm such as the one proposed by Daniel in the answer above.
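The reduction can be sketched in plain Python (a toy stand-in for the GraphFrames/Spark pipeline above; the example graph, the coordinates, and the synchronous smallest-label tie-breaking rule are my own assumptions for the sake of a deterministic illustration):

```python
# Sketch of the community-contraction idea: label propagation to find
# communities, average each community's (made-up) position, then keep a
# single edge per pair of communities joined by at least one original edge.
def label_propagation(adj, iters=5):
    # synchronous updates; ties broken by smallest label so this toy
    # version is deterministic (unlike most real LPA implementations)
    label = {v: v for v in adj}
    for _ in range(iters):
        new = {}
        for v in adj:
            counts = {}
            for u in adj[v]:
                counts[label[u]] = counts.get(label[u], 0) + 1
            top = max(counts.values())
            new[v] = min(l for l, c in counts.items() if c == top)
        label = new
    return label

# two 4-cliques joined by the single edge 3-4; gps is a stand-in for positions
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
gps = {v: (float(v), 0.0) for v in adj}

label = label_propagation(adj)
members = {}
for v, l in label.items():
    members.setdefault(l, []).append(v)

# average position per community
avg = {l: (sum(gps[v][0] for v in vs) / len(vs),
           sum(gps[v][1] for v in vs) / len(vs)) for l, vs in members.items()}

# one edge per pair of communities that share an original edge
new_edges = {tuple(sorted((label[u], label[v])))
             for u in adj for v in adj[u] if label[u] != label[v]}
print(len(avg), "communities and", len(new_edges), "edges")
```

The eight vertices and nine edges collapse to two community vertices and one edge, mirroring the ~1600/~35K to 25/54 reduction described above.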
Related
How is the number of random walks determined in GDS/Neo4j?
I am running the random walk algorithm on my Neo4j graph named 'example', with the minimum allowed walk length (2) and walks per node (1). Namely,

CALL gds.beta.randomWalk.stream(
  'example',
  { walkLength: 2, walksPerNode: 1, randomSeed: 42, concurrency: 1 }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name] AS event_name

And I get 41 walks. How is this number determined? I checked the graph and it contains 161 nodes and 574 edges. Any insights?

Added later: Here is more info on the projected graph that I am constructing. Basically, I am filtering on nodes and relationships, projecting the subgraph, and doing nothing else. Here is the code:

// Filter for only IDH Codel recurrent events
WITH [path=(m:IDHcodel)--(n:Tissue)
  WHERE (m.node_category = 'molecular' AND n.event_class = 'Recurrence')
  AND NOT EXISTS((m)--(:Tissue{event_class:'Primary'})) | m] AS recur_events
// Obtain the sub-network with 2 or more patients in edges
MATCH p=(m1)-[r:hasIDHcodelPatients]-(m2)
WHERE (m1 IN recur_events AND m2 IN recur_events AND r.total_common_patients >= 2)
WITH COLLECT(p) AS all_paths
WITH [p IN all_paths | nodes(p)] AS path_nodes, [p IN all_paths | relationships(p)] AS path_rels
WITH apoc.coll.toSet(apoc.coll.flatten(path_nodes)) AS subgraph_nodes, apoc.coll.flatten(path_rels) AS subgraph_rels
// Form the GDS Cypher projection
CALL gds.graph.create.cypher(
  'example',
  'MATCH (n) where n in $sn RETURN id(n) as id',
  'MATCH ()-[r]-() where r in $sr RETURN id(startNode(r)) as source, id(endNode(r)) as target, { LINKS: { orientation: "UNDIRECTED" } }',
  {parameters: {sn: subgraph_nodes, sr: subgraph_rels} }
)
YIELD graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipQuery, relationshipCount AS rels
RETURN graph, nodes, rels

Thanks.
It seems that the documentation is missing the description for the sourceNodes parameter, which would tell you how many walks will be created. We don't know the default value, but we can use the parameter to set the source nodes that the walks should start from. For example, you could treat all the nodes in the graph as source nodes (a random walk will start from each of them):

MATCH (n)
WITH collect(n) AS nodes
CALL gds.beta.randomWalk.stream(
  'example',
  { sourceNodes: nodes, walkLength: 2, walksPerNode: 1, randomSeed: 42, concurrency: 1 }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.name] AS event_name

This way you should get 161 walks, as there are 161 nodes in your graph and walksPerNode is set to 1, so a single random walk will start from every node in the graph. In essence, the number of source nodes times the walks per node determines the number of random walks.
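A toy Python walker (not GDS itself; the function and graph here are made up for illustration) shows the same accounting: the number of walks returned equals len(source_nodes) * walks_per_node.

```python
# Toy random walker illustrating why walk count = sources x walks_per_node.
import random

def random_walks(adj, source_nodes, walk_length, walks_per_node, seed=42):
    """adj: dict node -> list of neighbours. Returns a list of walks."""
    rng = random.Random(seed)
    walks = []
    for start in source_nodes:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:          # dead end: walk stops early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0]}
walks = random_walks(adj, source_nodes=list(adj), walk_length=2, walks_per_node=1)
print(len(walks))  # prints 4: four source nodes times one walk per node
```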
Getting the Vertices numbers from an Edge
I am using the shortest path algorithm from LightGraphs.jl. In the end I want to collect some information about the nodes along the path. In order to do that I need to be able to extract the vertices from the edges that the function gives back.

using LightGraphs
g = cycle_graph(4)
path = a_star(g, 1, 3)
edge1 = path[1]

Using this I get:

Edge 1 => 2

How would I automatically get the vertices 1, 2 without having to look at the Edge manually? I am thinking about something like edge1[1] or edge1.From, neither of which works. Thanks in advance!
The accessors for AbstractEdge classes are src and dst, used like this:

using LightGraphs
g = cycle_graph(4)
path = a_star(g, 1, 3)
edge1 = path[1]
s = src(edge1)
d = dst(edge1)
println("source: $s")      # prints "source: 1"
println("destination: $d") # prints "destination: 2"
All path *lengths* from source to target in Directed Acyclic Graph
I have a graph with an adjacency matrix of shape (adj_mat.shape = (4000, 4000)). My current problem involves finding the list of path lengths (the sequence of nodes is not so important) that traverse from the source (row = 0) to the target (col = trans_mat.shape[0] - 1). I am not interested in finding the path sequences; I am only interested in propagating the path lengths. As a result, this is different from finding all simple paths - which would be too slow (i.e. find all paths from source to target; then score each path). Is there a performant way to do this quickly? DFS is suggested as one possible strategy (noted here). My current implementation (below) is simply not optimal:

# create graph
G = nx.from_numpy_matrix(adj_mat, create_using=nx.DiGraph())

# initialize nodes
for node in G.nodes:
    G.nodes[node]['cprob'] = []

# set starting node value
G.nodes[0]['cprob'] = [0]

def propagate_prob(G, node):
    # find incoming edges to node
    predecessors = list(G.predecessors(node))
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = G.get_edge_data(prev_node, node)['weight']
        # get predecessor node value
        if len(G.nodes[prev_node]['cprob']) == 0:
            G.nodes[prev_node]['cprob'] = propagate_prob(G, prev_node)
        prev_node_arr = G.nodes[prev_node]['cprob']
        # add incoming edge weight to prev_node arr
        curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(prev_node_arr)])
    # update current node array
    G.nodes[node]['cprob'] = curr_node_arr
    return G.nodes[node]['cprob']

# calculate all path lengths from source to sink (last node index)
part_func = propagate_prob(G, adj_mat.shape[0] - 1)
I don't have a large example at hand (e.g. >300 nodes), but I found a non-recursive solution:

import networkx as nx

g = nx.DiGraph()
nx.add_path(g, range(7))
g.add_edge(0, 3)
g.add_edge(0, 5)
g.add_edge(1, 4)
g.add_edge(3, 6)

# first step: retrieve the topological sorting
sorted_nodes = nx.algorithms.topological_sort(g)
start = 0
target = 6

path_lengths = {start: [0]}
for node in sorted_nodes:
    if node == target:
        print(path_lengths[node])
        break
    if node not in path_lengths or g.out_degree(node) == 0:
        continue
    new_path_length = path_lengths[node]
    new_path_length = [i + 1 for i in new_path_length]
    for successor in g.successors(node):
        if successor in path_lengths:
            path_lengths[successor].extend(new_path_length)
        else:
            path_lengths[successor] = new_path_length.copy()
    if node != target:
        del path_lengths[node]

Output:

[2, 4, 2, 4, 4, 6]

If you are only interested in the number of paths of each length, e.g. {2: 2, 4: 3, 6: 1} for the above example, you could even reduce the lists to dicts.

Background

Some explanation of what I'm doing (and I hope it works for larger examples as well). The first step is to retrieve the topological sorting. Why? Then I know in which "direction" the edges flow and I can simply process the nodes in that order without "missing any edge" or any "backtracking" like in a recursive variant. Afterwards, I initialise the start node with a list containing the current path length ([0]). This list is copied to all successors, while updating the path length (all elements +1). The goal is that in each iteration the path lengths from the starting node to all processed nodes are calculated and stored in the dict path_lengths. The loop stops after reaching the target node.
With igraph I can calculate up to 300 nodes in ~1 second. I also found that accessing the adjacency matrix itself (rather than calling igraph functions to retrieve edges/vertices) also saves time. The two key bottlenecks are 1) appending a long list in an efficient manner (while also keeping memory) and 2) finding a way to parallelize. The runtime grows exponentially past ~300 nodes; I would love to see if someone has a faster solution (while also fitting into memory).

import igraph

# create graph from adjacency matrix
G = igraph.Graph.Adjacency((trans_mat_pad > 0).tolist())

# add edge weights
G.es['weight'] = trans_mat_pad[trans_mat_pad.nonzero()]

# initialize nodes
for node in range(trans_mat_pad.shape[0]):
    G.vs[node]['cprob'] = []

# set starting node value
G.vs[0]['cprob'] = [0]

def propagate_prob(G, node, trans_mat_pad):
    # find incoming edges to node
    predecessors = trans_mat_pad[:, node].nonzero()[0]  # G.get_adjlist(mode='IN')[node]
    curr_node_arr = []
    for prev_node in predecessors:
        # get incoming edge weight
        edge_weight = trans_mat_pad[prev_node, node]  # G.es[prev_node]['weight']
        # get predecessor node value
        if len(G.vs[prev_node]['cprob']) == 0:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + propagate_prob(G, prev_node, trans_mat_pad)])
        else:
            curr_node_arr = np.concatenate([curr_node_arr, np.array(edge_weight) + np.array(G.vs[prev_node]['cprob'])])
    ## NB: if memory constrained, uncomment below to cap the list size
    # if len(curr_node_arr) > 100:
    #     curr_node_arr = np.sort(curr_node_arr)[:100]
    # update current node array
    G.vs[node]['cprob'] = curr_node_arr
    return G.vs[node]['cprob']

# calculate path lengths
path_len = propagate_prob(G, trans_mat_pad.shape[0]-1, trans_mat_pad)
Best way to count downstream with edge data
I have a NetworkX problem. I create a digraph with a pandas DataFrame and there is data that I set along the edges. I now need to count the # of unique sources for a node's descendants and access the edge attribute. This is my code and it works for one node, but I need to pass a lot of nodes to this and get unique counts.

graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"],
                                create_using=nx.DiGraph)

downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes).copy()

domain_sources = {}
for s, t, v in subgraph.edges(data=True):
    if v["domain"] in domain_sources:
        domain_sources[v["domain"]].append(s)
    else:
        domain_sources[v["domain"]] = [s]

down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(list(set(v)))

It works but, again, for one node the time is not a big deal; I'm feeding this routine at least 40 to 50 nodes. Is this the best way? Is there something else I can do that can group by an edge attribute and uniquely count the nodes?
Two possible enhancements:

Remove copy from the line creating the subgraph. You are not changing anything, so the copy is redundant.
Create a defaultdict with values of type set. Read more here.

from collections import defaultdict
import networkx as nx

# missing part of df creation
graph = nx.from_pandas_edgelist(df, source="source", target="target",
                                edge_attr=["domain", "category"],
                                create_using=nx.DiGraph)

downstream_nodes = list(nx.descendants(graph, node))
downstream_nodes.append(node)
subgraph = graph.subgraph(downstream_nodes)

domain_sources = defaultdict(set)
for s, t, v in subgraph.edges(data=True):
    domain_sources[v["domain"]].add(s)

down_count = {}
for k, v in domain_sources.items():
    down_count[k] = len(v)  # v is already a set, no need to deduplicate again
Recognition of interval graphs
I need an algorithm for recognizing an interval graph and generate its intervals. After some research I found the Algorithm developed by Wen-Lian Hsu. (http://www.iis.sinica.edu.tw/IASL/webpdf/paper-1992-A_New_Test_for_Interval_Graphs.pdf). It seems to be an algorithm, which solves my problem. But, I am not a computer scientist so I am having problems understanding the algorithm. Could anybody explain this algorithm to a novice, plain and simple?
Having worked through some examples I think I know what is going on, though I still do not follow algorithm 4. My algorithm for determining whether graphs are interval graphs is below, followed by some Javascript code implementing it. Putting it in code allowed me to check whether it worked or not. The code can be found at this JSFiddle. I have tested the code with these three graphs (1 and 2 are interval graphs; 3 is not). As I have made the algorithm from my interpretation of the paper given in my earlier answer, I can give no guarantees that it is fully correct, but it seems to work. The algorithm will be worked through with graph 1.

When x and y are vertices of the graph, x and y are NEIGHBOURS when x and y are joined by an edge of the graph.

Algorithm for Interval Graphs

Stage 1: create a Lexicographically Ordered list, L, of vertices from the graph.

Form an arbitrarily ordered list U of vertices of the graph, called a CLASS.
Form US, an ordered list of one element, the class U.
While US is not empty:
    Take the first vertex, v, from the first class in US and put it at the front of L.
    Set T to be an empty class.
    For each vertex in each class in US:
        If it is a neighbour of v, remove it from its class and push it to the back of T.
    If T is not empty, put T at the front of US.
    Remove any empty classes from US.

L is now a Lexicographically Ordered list of vertices from the graph.

For graph 1:

U=(3,6,1,4,5,8,2,7)
US=((3,6,1,4,5,8,2,7))
v=3 US=((3,6,1,4,5,8,2,7)) L=(3) neighbours of 3 to front US=((6,8),(1,4,5,2,7))
v=6 US=((8),(1,4,5,2,7)) L=(6,3) neighbours of 6 to front US=((8,1,7),(4,5,2))
v=8 US=((1,7),(4,5,2)) L=(8,6,3) neighbours of 8 to front US=((7,4,5,2),(1))
v=7 US=((4,5,2),(1)) L=(7,8,6,3) neighbours of 7 to front US=((4,5),(2),(1))
v=4 US=((5),(2),(1)) L=(4,7,8,6,3) neighbours of 4 to front US=((5),(2),(1))
v=5 US=((2),(1)) L=(5,4,7,8,6,3) neighbours of 5 to front - no neighbours so no change US=((2),(1))
v=2 US=((1)) L=(2,5,4,7,8,6,3) neighbours of 2 to front - no neighbours so no change US=((1))
v=1 US=() L=(1,2,5,4,7,8,6,3)
L finished

Stage 2: first test for an interval graph.

When x is a vertex of the graph and L a lexicographically ordered list of the graph:

Set RN(x) to be the neighbours of x which come after x in L, in the same order that they occur in L.
Set parent(x) to be the first vertex in RN(x).
Set RNnoP(x) to be the vertices of RN(x) with parent(x) removed, i.e. RN(x)/parent(x).

If the graph is an interval graph then for all vertices x, if RNnoP(x) has any vertices in it then they all appear in RN(parent(x)), i.e. RNnoP(x) is a subset of RN(parent(x)). If any x fails this test then the graph cannot be an interval graph.

Below are the results for the example graph:

x | RN(x) | parent(x) | RN(x)/parent(x) | RN(parent(x)) | Pass
1 | 6     | 6         | -               | 3             | T
2 | 8     | 8         | -               | 6,3           | T
5 | 7,8   | 7         | 8               | 8             | T
4 | 7,8   | 7         | 8               | 8             | T
7 | 8     | 8         | -               | 6,3           | T
8 | 6,3   | 6         | 3               | 3             | T
6 | 3     | 3         | -               | -             | T
3 | -     | -         | -               | -             | T

Stage 3: for graphs that pass stage 2, form cliques for each vertex of the graph and create a set of maximal cliques. A CLIQUE is a set of vertices from the graph such that any pair of distinct vertices x, y in the set are neighbours.
Set C(x) to be the set containing the vertices in RN(x) together with x, i.e. RN(x) union {x}.

Now it is necessary to form a set of maximal cliques. A clique is MAXIMAL if the addition of any other vertex into the clique stops it being a clique.

Using the parent relationships found in stage 2, form a tree. Using a post-order traversal of the tree, form a set, CS, of vertices where for each x in CS, C(x) is a maximal clique, using the following process on all vertices except the root:

If C(parent(x)) is a subset of C(x):
    if parent(x) is in CS, remove it
put x in CS

Below is a tree for the example graph showing x and C(x) at each node:

x | RN(x) | parent(x) | C(x)  | C(parent(x))
1 | 6     | 6         | 1,6   | 3,6
2 | 8     | 8         | 2,8   | 3,6,8
5 | 7,8   | 7         | 5,7,8 | 7,8
4 | 7,8   | 7         | 4,7,8 | 7,8
7 | 8     | 8         | 7,8   | 3,6,8
8 | 6,3   | 6         | 3,6,8 | 3,6
6 | 3     | 3         | 3,6   | 3
3 | -     | -         | 3     | -

The process on the above tree:

x=3 is the root
x=6 C(6)={3,6} CS=(6); C(3) is a subset of C(6) but 3 is not in CS
x=1 C(1)={1,6} CS=(6,1); C(6) is not a subset of C(1), put in 1
x=8 C(8)={3,6,8} CS=(1,8); C(6) is a subset of C(8), remove 6, put in 8
x=7 C(7)={7,8} CS=(1,8,7); C(8) is not a subset of C(7), put in 7
x=4 C(4)={4,7,8} CS=(1,8,4); C(7) is a subset of C(4), remove 7, put in 4
x=5 C(5)={5,7,8} CS=(1,8,4,5); C(7) is a subset of C(5) but 7 is no longer in CS, put in 5
x=2 C(2)={2,8} CS=(1,8,4,5,2); C(8) is not a subset of C(2), put in 2

NOTE: in the code I used a children relationship to traverse the tree as a way of excluding the root.

Stage 4: attempt to order the maximal cliques so that they are consecutive.
Cliques are in CONSECUTIVE order if, for any vertex x, whenever x is in cliques n and n+m (m>0), then x is in cliques n+1, n+2, ..., n+m-1.

The following algorithm will put the maximal cliques in consecutive order if it is possible to do so.

Set NC to be the ordered list (or class) of maximal cliques from CS, such that if x is in CS then C(x) is in NC.
Set P to be the ordered list containing NC, P=(NC).
Set OC=(), an empty ordered list.
While P is not empty:
    Take the last clique, LST, from the last class of P and put it at the front of OC.
    For each clique Q in the last class of P, partition into two classes:
        OUT if Q and LST have no vertices in common (empty intersection)
        IN if Q and LST have vertices in common (non-empty intersection)
    Replace the last class of P with the classes OUT, IN if both are non-empty;
    with OUT alone if only OUT is non-empty; with IN alone if only IN is non-empty;
    remove the class if both are empty.

For the example graph (I have mixed up the order to show how the process works):

P=(({3,6,8},{4,7,8},{1,6},{5,7,8},{2,8})) OC=()
P=(({3,6,8},{4,7,8},{1,6},{5,7,8})) OC=({2,8}) OUT=({1,6}) IN=({3,6,8},{4,7,8},{5,7,8})
P=(({1,6}),({3,6,8},{4,7,8},{5,7,8})) OC=({2,8})
P=(({1,6}),({3,6,8},{4,7,8})) OC=({5,7,8},{2,8}) OUT=() IN=({3,6,8},{4,7,8})
P=(({1,6}),({3,6,8},{4,7,8})) OC=({5,7,8},{2,8})
P=(({1,6}),({3,6,8})) OC=({4,7,8},{5,7,8},{2,8}) OUT=() IN=({3,6,8})
P=(({1,6}),({3,6,8})) OC=({4,7,8},{5,7,8},{2,8})
P=(({1,6})) OC=({3,6,8},{4,7,8},{5,7,8},{2,8}) OUT=() IN=({1,6})
P=(({1,6})) OC=({3,6,8},{4,7,8},{5,7,8},{2,8})
P=(()) OC=({1,6},{3,6,8},{4,7,8},{5,7,8},{2,8})
P=()

NOTE: in the code I have left NC = cliques as a list of vertex labels and used clique[label] for C(x).

Stage 5: check whether OC is consecutively ordered.

For each v in CS (as in stage 3):
    For each vertex x in C(v), the clique associated with v:
        if x appears only in adjacent cliques in OC then it is an interval graph, else it is not an interval graph.
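Stage 1 above can also be sketched compactly in Python (my own transcription of the partition-refinement steps, separate from the Javascript implementation):

```python
# Minimal sketch of the stage-1 lexicographic ordering (partition refinement),
# using the same adjacency-list idea as the Javascript code: dict of
# label -> list of neighbour labels.
def lex_order(graph):
    """Return the lexicographically ordered vertex list L."""
    US = [list(graph)]          # one class holding every vertex
    L = []
    while US:
        v = US[0].pop(0)        # first vertex of the first class
        if not US[0]:
            US.pop(0)           # drop the class if it emptied
        L.insert(0, v)          # v goes to the front of L
        T = []
        for cls in US:          # move neighbours of v into T, keeping order
            for u in [u for u in cls if u in graph[v]]:
                cls.remove(u)
                T.append(u)
        # T (if non-empty) becomes the new first class; drop empty classes
        US = ([T] if T else []) + [cls for cls in US if cls]
    return L

graph = {3: [6, 8], 6: [1, 3, 7, 8], 1: [6], 4: [7, 8],
         5: [7, 8], 8: [2, 3, 4, 5, 6, 7], 2: [8], 7: [4, 5, 8]}
print(lex_order(graph))  # prints [1, 2, 5, 4, 7, 8, 6, 3]
```

Run on graph 1 (with U in the same arbitrary order as the worked example), this reproduces L=(1,2,5,4,7,8,6,3).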
Stage 6 if past consecutive test draw intervals Where n is the number of vertices determine n columns numbered 1 to n with gaps between them and equal width For each v in CS with v first appearing in clique i and lastly in clique j 1=i<=j=n draw the interval from the start of column i to the end of column j CODE IN JAVASCRIPT Styles for Output of Intervals .interval { border-bottom: 1px solid black; position: absolute } .label { position:absolute; } Code //Array methods added to carry out interval graph check if(!Array.indexOf){ //Needed for earlier versions of IE; Array.prototype.indexOf = function(obj){ for(var i=0; i<this.length; i++){ if(this[i]===obj){ return i; } } return -1; } } Array.prototype.subsetOf = function(set){ //returns true if Array is a subset of set if(this.length==0) {return true;} //empty set is subset of all sets var subset=true; for(var i=0; i<this.length; i++){ subset = subset && (set.indexOf(this[i])>-1); //element of Array not in set forces subset to be false } return subset; } Array.prototype.intersection = function(set){ //returns the intersection of Array and set if(this.length==0) {return [];} //empty set intersect any set is empty set var inter=[]; for(var i=0; i<this.length; i++){ if(set.indexOf(this[i])>-1) {//element of Array and set inter.push(this[i]); } } return inter; } Array.prototype.union = function(set){ //returns the union of Array and set var union=[]; for(var i=0; i<this.length; i++){ union.push(this[i]); } for(var i=0; i<set.length; i++) { if(union.indexOf(set[i])==-1) {//element not yet in union union.push(set[i]); } } return union; } //A set is an array with no repeating elements function vertex(label,neighbours) { this.label=label; this.neighbours=neighbours; } //Using the following format for each vertex on the graph [vertex lable, [neighbour lables] ] set up the model of the graph //order of vertices does not matter //graph One - an interval graph graph=[ [3,[6,8]], [6,[1,3,7,8]], [1,[6]], [4,[7,8]], [5,[7,8]], 
[8,[2,3,4,5,6,7]], [2,[8]], [7,[4,5,8]] ]; //graph Two - an interval graph /*graph=[ ['A',['B','C','D']], ['B',['A','C']], ['C',['A','B','D','E','F']], ['D',['A','C','E','F']], ['E',['C','D']], ['F',['D','G']], ['G',['F']] ]; //graph Three - not an interval graph graph=[ ['W',['Y','Z']], ['X',['Z']], ['Y',['W']], ['Z',['W']] ]; */ /*Create a new vertex object U[i] where U is an unordered array. *Unordered as at this point the numbering of vertices does not matter. *Referencing by name rather than an array index is easier to follow */ var U=[]; for(var i=0; i<graph.length; i++) { U[i]=new vertex(graph[i][0],graph[i][4]); } var US=[U]; /*US is an array containing the single array U * during Lexicographical ordering US will contain other arrays */ //********************Lexicographical Ordering Start************************* var L=[]; //L with contain the vertices in Lexicographical order. while (US.length>0) { F=US[0]; //First array in US vertex=F.shift(); //In Javascript shift removes first element of an array if(F.length==0) { //F is empty US.shift(); } L.unshift(vertex); //In Javascript unshift adds to front of array var T=new Array(); //new array to add to front of US tus=[]; //tempory stack for US sets while(US.length>0) { //for remaining vertices in the arrays in US check if neighbours of vertex set=US.shift(); //first set of US ts=[]; //tempory stack for set elements while(set.length>0){ v=set.shift(); //v is one of the remaining vertices // lbl=v.label; if (vertex.neighbours.indexOf(lbl) != -1) { //is v a neighbour of vertex T.push(v); //push v to T } else { ts.unshift(v); //not a neighbour store for return } } while(ts.length>0) { //restore list of v not moved to set set.push(ts.pop()); } if(set.length>0) {//if set not empty store for restoration tus.unshift(set); } } if(T.length>0) { //if T not empty US.push(T); //put T as first set of US } while(tus.length>0) { US.push(tus.pop()); // restore non empty sets } } //************************End of 
***** Lexicographical Ordering *****

//----------------------Chordality Check and Clique Generation Start----------------------------------------
RN = {};      //RN as an object so that an associative array can be used if labels are letters
Parent = {};  //Parent as an object so that an associative array can be used if labels are letters
RNnoP = {};   //RN with parent removed, as an object so that an associative array can be used if labels are letters
Children = {}; //Used in clique generation. NOTE: this is a deviation from the given algorithm 4, which I do not follow; again an object for an associative array
for (var i = 0; i < L.length; i++) {
    Children[L[i].label] = [];
}
var chordal = true;
for (var i = 0; i < L.length - 1; i++) {
    vertex = L[i];
    RN[vertex.label] = [];
    RNnoP[vertex.label] = [];
    for (j = i + 1; j < L.length; j++) {
        v = L[j];
        lbl = v.label;
        if (vertex.neighbours.indexOf(lbl) != -1) {
            RN[vertex.label].push(lbl); //store vertex labels in the order they are processed
        }
    }
    Parent[vertex.label] = RN[vertex.label][0];      //Parent is the front vertex of RN
    Children[RN[vertex.label][0]].push(vertex.label); //used for clique generation (my method)
    for (k = 1; k < RN[vertex.label].length; k++) {
        RNnoP[vertex.label][k - 1] = RN[vertex.label][k];
    }
}
//************** chordality check ************
for (i = 0; i < L.length - 1; i++) {
    vertex = L[i];
    var x = vertex.label;
    parentx = Parent[x];
    for (j = 0; j < RNnoP[x].length; j++) {
        chordal = chordal && (RNnoP[x].subsetOf(RN[parentx]));
    }
}
if (!chordal) {
    alert('Not an Interval Graph');
} else {
    //Construct the maximal clique list from the tree formed by the parent/child relationships determined above. NOTE: not algorithm 4
    var root = Children[L[L.length - 1].label]; //last vertex in L, which has no parent
    RN[L[L.length - 1].label] = [];             //no vertices to the right of the last vertex
    var clique = {};  //clique for each vertex label -- object so an associative array using labels
    var cliques = []; //stores maximal cliques from the subtree of vertices processed
    clique[L[L.length - 1].label] = [L[L.length - 1].label]; //clique for root contains the last vertex label
    generateCliques(root); //cliques becomes a list of labels of vertices with maximal cliques
    var pivots = [];
    for (i = 0; i < cliques.length; i++) {
        pivots = pivots.union(clique[cliques[i]]);
    }
    /* Attempt to place each clique in cliques in consecutive order,
     * i.e. all cliques containing a given label are next to each other.
     * If at the end of the process the cliques are in consecutive order
     * then we have an interval graph; otherwise it is not an interval graph.
     */
    var orderedCliques = [];    //will contain maximal cliques in consecutive order if possible
    var partitions = [cliques]; //holds partitions of cliques during the process
    while (partitions.length > 0) {
        inPartition = new Array();  //partition of elements containing the pivot
        outPartition = new Array(); //partition of elements not containing the pivot
        lastPartition = partitions.pop(); //last partition of cliques
        orderedCliques.unshift(lastPartition.shift()); //first label in partition moved to front of orderedCliques
        pivotClique = clique[orderedCliques[0]];       //which points to the pivot clique
        for (var i = 0; i < lastPartition.length; i++) {
            if (pivotClique.intersection(clique[lastPartition[i]]).length > 0) { //non-empty intersection
                inPartition = inPartition.union([lastPartition[i]]);
            } else {
                outPartition = outPartition.union([lastPartition[i]]);
            }
        }
        if (outPartition.length > 0) {
            partitions.push(outPartition);
        }
        if (inPartition.length > 0) {
            partitions.push(inPartition);
        }
    }
    //----------------------End of Chordality Check and Clique Generation----------------------------------------
    var start = {}; //start is an associative array
    var end = {};   //end is an associative array
    if (consecutive()) {
        //draw intervals......................
        var across = 20;
        var down = 20;
        var colwidth = 20;
        var gap = 30;
        var coldepth = 30;
        var height = 20;
        for (v = 0; v < pivots.length; v++) {
            var vertex = pivots[v];
            var line = document.createElement('div');
            line.style.top = (down + (coldepth + height) * v) + 'px';
            line.style.height = (coldepth + height) + 'px';
            line.style.left = (across + start[vertex] * (colwidth + gap)) + 'px';
            line.style.width = ((end[vertex] - start[vertex]) * gap + (1 + end[vertex] - start[vertex]) * colwidth) + 'px';
            line.className = 'interval';
            document.body.appendChild(line);
            var label = document.createElement('div');
            label.style.left = line.style.left;
            label.style.top = (parseInt(line.style.top) + 28) + 'px';
            label.style.height = '17px';
            label.style.width = '30px';
            label.innerHTML = vertex;
            label.className = 'label';
            document.body.appendChild(label);
        }
    } else {
        alert('Not an Interval Graph');
    }
}

function generateCliques(node) {
    for (var i = 0; i < node.length; i++) {
        lbl = node[i];
        clique[lbl] = [];
        for (j = 0; j < RN[lbl].length; j++) {
            clique[lbl][j] = RN[lbl][j]; //each element of RN[x] becomes an element of clique[x]
        }
        clique[lbl].push(lbl); //RN(x) U {x} is a clique of the subgraph processed so far, and now clique[x] = RN[x] as sets
        var parentx = Parent[lbl];
        if (clique[parentx].subsetOf(clique[lbl])) {
            var indx = cliques.indexOf(parentx);
            if (indx > -1) { //if the parent of lbl is in the cliques list, remove it
                cliques.splice(indx, 1);
            }
        }
        cliques.push(lbl); //add lbl to the cliques list
        if (Children[lbl].length > 0) { //lbl is not a leaf
            generateCliques(Children[lbl]);
        }
    }
}

function consecutive() {
    var p;
    for (v = 0; v < pivots.length; v++) {
        var vertex = pivots[v];
        p = 0;
        for (cl = 0; cl < orderedCliques.length; cl++) {
            if (clique[orderedCliques[cl]].indexOf(vertex) > -1) { //is vertex in this maximal clique
                if (p == 0) {
                    p = cl + 1;
                    start[vertex] = p;
                    if (p == orderedCliques.length) {
                        end[vertex] = p;
                    }
                } else {
                    p += 1;
                    if (p == orderedCliques.length) {
                        end[vertex] = p;
                    }
                    if (p != cl + 1) {
                        return false;
                    }
                }
            } else {
                if (!end[vertex] && p > 0) {
                    end[vertex] = cl;
                }
            }
        }
    }
    return true;
}
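Note that the code above calls `subsetOf`, `union` and `intersection` on plain arrays; these are not standard JavaScript Array methods, so the snippet only runs if set-style helpers are attached to `Array.prototype` first. A minimal sketch of what such helpers could look like (my own assumption about their intended semantics, not part of the original answer):

```javascript
// Hypothetical set-style helpers assumed by the snippet above.
// Each treats the array as a set of labels; duplicates are not expected.
Array.prototype.subsetOf = function (other) {
    // true when every element of this array also appears in `other`
    return this.every(function (x) { return other.indexOf(x) !== -1; });
};
Array.prototype.union = function (other) {
    // elements of this array plus any elements of `other` not already present
    var result = this.slice();
    other.forEach(function (x) {
        if (result.indexOf(x) === -1) { result.push(x); }
    });
    return result;
};
Array.prototype.intersection = function (other) {
    // elements common to both arrays, in this array's order
    return this.filter(function (x) { return other.indexOf(x) !== -1; });
};
```

Extending `Array.prototype` is generally discouraged in shared code (it can clash with libraries or future language additions), but it matches how the snippet above is written.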
This is not the full answer you are looking for, but I hope it goes some way towards helping. Wikipedia led me to the Interval graph page, and on to the Lexicographic breadth-first search (Lex-BFS) page, where I found a reference to the paper Habib, Michel; McConnell, Ross; Paul, Christophe; Viennot, Laurent (2000), "Lex-BFS and partition refinement, with applications to transitive orientation, interval graph recognition and consecutive ones testing". This paper gives actual algorithms (algorithms 2, 3, 4 and 9) for determining whether a graph is an interval graph. Algorithms 2 and 3 can be found in alternative forms on the Lex-BFS page above and can be worked through. However, so far, over the last couple of days, algorithm 4 has defeated me: even working through the example graph the authors give does not produce the results they state. So there are three possibilities: 1) I am not clever enough to understand it; 2) the algorithm is not sufficiently detailed; 3) there are mistakes in the algorithm. Working on the assumption that it is 2 or 3, I will continue working on it on and off to see if I can crack it. Then there is algorithm 9 to tackle. Perhaps the above pages and paper will give you enough insight into solving your problem. If I find a full answer I will post it. Good luck.
For those who suffer through this paper as I did, I can confirm that algorithm 4 in the reference paper mentioned above is broken. Instead, I found a second paper by the same authors on the same topic. You can check both papers here: http://citeseer.uark.edu:8080/citeseerx/showciting;jsessionid=B9CECB9E4B9DA156C687A414FA8743BF?cid=1681311 The second one appears to have been written a month later and seems to contain the authors' corrections. In case the link above ever becomes unavailable, here are the two titles to search for: 1) Lex-BFS and Partition Refinement, with Applications to Transitive Orientation, Interval Graph Recognition and Consecutive Ones Testing. 2) Lex-BFS, a Partition Refining Technique. Application to Transitive Orientation, Interval Graph Recognition and Consecutive 1's Testing. I have implemented the algorithms described in the second paper, but even that paper appears to contain some bugs in the algorithm. I have met one of the authors (Prof. Michel Habib) about this; it requires some deeper analysis. My implementation can be found here: https://github.com/Hack06/LexBFS I hope this helps someone, now or later.