Build a graph of relationships using string vertices

According to the books I have read, graphs are usually built with vertices numbered 1 to n, so every vertex has a unique name.
What if I have to build a graph whose vertices are strings, say
V = {'Arm', 'Bob', 'Lin', 'Kok'} # vertices
E = {('Arm', 'Lin'), ('Bob', 'Lin'), ('Bob', 'Kok')} # edges
Am I supposed to map these string vertices to integers before I build the graph for them?
Any example I can refer to?

I'm gonna go out on a limb and guess you're drawing with pygraphviz.
Using strings instead of numbers, the simple.py example would look like:
import pygraphviz as pgv

A = pgv.AGraph()
A.add_edge('foo', 'bar')   # vertices are created on first use
A.add_edge('bar', 'baz')
A.add_edge('baz', 'foo')
A.write('simple.dot')      # save as a DOT file

B = pgv.AGraph('simple.dot')   # reload the graph
B.layout()                     # compute a layout
B.draw('simple.png')           # render to PNG
Or, if you're not drawing, just building, the code you posted is a great way to represent a graph; no need to use numbers when strings work just fine.
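To make that last point concrete: vertices in Python can be any hashable value, strings included, so the sets from the question translate directly into an adjacency dict with no integer mapping. A plain-Python sketch (no library needed):

```python
# Build an undirected adjacency dict straight from the question's sets.
V = {'Arm', 'Bob', 'Lin', 'Kok'}
E = {('Arm', 'Lin'), ('Bob', 'Lin'), ('Bob', 'Kok')}

adj = {v: set() for v in V}
for a, b in E:       # undirected: record both directions
    adj[a].add(b)
    adj[b].add(a)
```

After this, `adj['Lin']` is `{'Arm', 'Bob'}`, so neighbor lookups work directly on the string names.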

Related

Find all disjoint connected paths in a graph

I have k pairs of starting points and end points on a graph.
As shown in the picture, I have colored each pair differently.
I need to connect them two by two.
Each node can be passed through only once, and a path cannot pass through the starting points of other colors.
The problem is to output the number of solutions that satisfy these constraints, or 0 if there is no solution.
Does this problem have a well-known name? Is there a library that solves it?

Does igraph have a "has_path" function?

I am trying to port code from Python NetworkX to R igraph. NetworkX has a function named has_path that reports whether two vertices are connected by a path. I want to find, efficiently, all pairs of vertices of a graph that have no edge between them but do have a path.
I think you can use the code below to check whether there is a path from vertex V1 to V2 (the graph can be directed or undirected):
c(!is.infinite(distances(g, V1, V2, mode = "out")))
If you need to check this repeatedly in an undirected graph, simply break it into connected components and check if both vertices are within the same component. This will be very efficient, as the components need to be found only once.
See the components function. It gives you a membership vector. You need to check if the position corresponding to the two vertices has the same value (same component index).
If the graph is directed, the simplest solution is the one posted by @ThomasIsCoding. This is perfectly fine for a one-time check; speeding up repeated checks is more trouble and warrants its own question.
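The component-membership idea translates to any language; here is a plain-Python sketch (not the igraph or NetworkX API) that computes a membership map once with BFS, after which each path query is a constant-time comparison:

```python
from collections import deque

def component_membership(adj):
    """Assign each vertex of an undirected graph a component index via BFS."""
    membership = {}
    comp = 0
    for start in adj:
        if start in membership:
            continue
        membership[start] = comp
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in membership:
                    membership[w] = comp
                    queue.append(w)
        comp += 1
    return membership

# Repeated path queries become O(1) lookups:
adj = {'a': {'b'}, 'b': {'a'}, 'c': set()}
m = component_membership(adj)
has_path = m['a'] == m['b']   # same component means a path exists
```

This mirrors what igraph's components function gives you: a membership vector indexed by vertex.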

Gremlin find highest match

I am planning to use a graph database (AWS Neptune), queried with Gremlin, as a sort of knowledge base (KB). The KB would be used as a classification tool for entities with multiple features. For simplicity, I am using geometric shapes to encode the properties of my entities in this example. Suppose I want to classify Points that can be related to Squares, Triangles and Circles. I have modeled the possible relationships of Points with the possible Squares, Triangles and Circles in a graph, as depicted in the picture below.
Created with:
g.addV('Square').property(id, 'S_A')
.addV('Square').property(id, 'S_B')
.addV('Circle').property(id, 'C_A')
.addV('Triangle').property(id, 'T_A')
.addV('Triangle').property(id, 'T_B')
.addV('Point').property(id, 'P1')
.addV('Point').property(id, 'P2')
.addV('Point').property(id, 'P3')
g.V('P1').addE('Has_Triangle').to(g.V('T_B'))
g.V('P2').addE('Has_Triangle').to(g.V('T_A'))
g.V('P1').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Square').to(g.V('S_B'))
The different entities are for example Points, Squares, Triangles, Circles.
So my ultimate goal is to find the Point that satisfies the highest number of conditions. E.g.
g.V().hasLabel('Point').where(and(
    out('Has_Triangle').hasId('T_A'),
    out('Has_Circle').hasId('C_A'),
    out('Has_Square').hasId('S_A')
))
// ==>v[P2]
The query above works very well for classifying a Point with properties (T_A, S_A, C_A) as a Point of type P2, for example. But if I were to use the same query to classify a Point with properties (C_A, S_B, T_X), for example:
g.V().hasLabel('Point').where(and(
    out('Has_Triangle').hasId('T_X'),
    out('Has_Circle').hasId('C_A'),
    out('Has_Square').hasId('S_B')
))
The query fails to classify this point as Point 3 (P3), because the KB has no known Triangle property for P3.
Is there a way I can express a query that returns the vertex with the highest match, which in this case would be P3?
Thank you in advance.
EDIT
The best idea so far is to put sentinel values on KB properties that do not exist, then modify the query to match each exact property or the sentinel value. But this means that if I add a new "type" of property to a Point in the future, e.g. a Point Has_Hexagon, then I need to add a sentinel Hexagon to all Points in my graph.
EDIT 2
Added Gremlin script that creates sample data
You can use the choose() step to increment a counter (sack) for each match, then order by counter values (descending) and pick the first one (highest match).
gremlin> g.withSack(0).V().hasLabel('Point').
           choose(out('Has_Triangle').hasId('T_A'), sack(sum).by(constant(1))).
           choose(out('Has_Circle').hasId('C_A'), sack(sum).by(constant(1))).
           choose(out('Has_Square').hasId('S_A'), sack(sum).by(constant(1))).
           order().
             by(sack(), decr).
           limit(1)
==>v[P2]
gremlin> g.withSack(0).V().hasLabel('Point').
           choose(out('Has_Triangle').hasId('T_X'), sack(sum).by(constant(1))).
           choose(out('Has_Circle').hasId('C_A'), sack(sum).by(constant(1))).
           choose(out('Has_Square').hasId('S_B'), sack(sum).by(constant(1))).
           order().
             by(sack(), decr).
           limit(1)
==>v[P3]
Each choose() step in the queries above can be read as: if (condition) increment counter. Either way, whether the condition is met or not, the original vertex (the Point) is emitted by the choose() step.
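Outside Gremlin, the "highest match" logic is just counting satisfied conditions per Point and taking the maximum. A hypothetical in-memory sketch of the sample data (not a Neptune client, just the counting idea):

```python
# Outgoing edges from the sample data, keyed by Point id.
points = {
    'P1': {('Has_Triangle', 'T_B'), ('Has_Square', 'S_A')},
    'P2': {('Has_Triangle', 'T_A'), ('Has_Square', 'S_A'), ('Has_Circle', 'C_A')},
    'P3': {('Has_Circle', 'C_A'), ('Has_Square', 'S_B')},
}

def best_match(conditions):
    """Return the Point satisfying the most (edge label, target id) conditions."""
    return max(points, key=lambda p: len(points[p] & conditions))
```

For the conditions (T_X, C_A, S_B), P3 satisfies two of the three and wins even though the Triangle condition matches nothing, which is exactly the behavior the sack-based query achieves.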

Reduce openstreetmap graph size in networkx

I have a graph (converted from OSMnx) of London's walking paths, containing 667,588 edges with different highway attributes (the street types in OpenStreetMap). Running a shortest_path algorithm is quite slow (4 seconds). To improve the speed, I want to reduce the number of edges substantially and systematically, without losing the main connections and city structure, but I am not sure how to do it. Any suggestions? Is there a way to group some close nodes under a more important one, thus reducing the size?
You can extract edges with desired highway types from your main graph G:
highways_to_keep = ['motorway', 'trunk', 'primary']
H = nx.MultiDiGraph()
for u, v, attr in G.edges(data=True):
    if attr['highway'] in highways_to_keep:
        H.add_edge(u, v, attr_dict=attr)
        H.node[u] = G.node[u]
        H.node[v] = G.node[v]
Here, we first initialize an empty MultiDiGraph, the graph type used by OSMnx, then populate it with data from the main graph G whenever the 'highway' attribute is in our list highways_to_keep. You can find more about highway types on this OpenStreetMap page.
Our graph is a valid NetworkX graph, but you need to do one more thing before you can take advantage of OSMnx functionality as well. If you execute G.graph, you will see the graph attributes, which contain crs (the coordinate reference system) and some other things. You should add this information to your newly created graph:
H.graph = G.graph
Here is the plot of H, produced with osmnx.plot_graph(H):
It depends what type of network you're working with (e.g., walk, bike, drive, drive_service, all, etc.). The drive network type would be the smallest and prioritize major routes, but at the expense of pedestrian paths and passageways.
OSMnx also provides the ability to simplify the graph's topology with a built-in function. This is worth doing if you haven't already, as it can sometimes reduce graph size by 90% while faithfully retaining all intersection and dead-end nodes as well as edge geometries.
The above solution no longer works, since the networkx library has changed. Specifically,
H.node[u] = G.node[u]
is no longer supported.
The following solution relies on osmnx.geo_utils.induce_subgraph and passes a node list as the argument to that function:
highways_to_keep = ['motorway', 'trunk', 'primary', 'secondary', 'tertiary']
Hlist = []  # node list
for u, v, attr in G.edges(data=True):
    if 'highway' in attr and attr['highway'] in highways_to_keep:
        Hlist.append(G.nodes[u]['osmid'])
H = ox.geo_utils.induce_subgraph(G, Hlist)
The osmnx simplification module worked for me in this case: https://osmnx.readthedocs.io/en/stable/osmnx.html#module-osmnx.simplification
osmnx.simplification module
Simplify, correct, and consolidate network topology.
osmnx.simplification.consolidate_intersections(G, tolerance=10, rebuild_graph=True, dead_ends=False, reconnect_edges=True)
Consolidate intersections comprising clusters of nearby nodes.
osmnx.simplification.simplify_graph(G, strict=True, remove_rings=True)
Simplify a graph’s topology by removing interstitial nodes.
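The core idea behind simplify_graph, removing interstitial (degree-2) nodes that merely subdivide an edge while keeping intersections and dead ends, can be sketched in plain Python on an undirected adjacency dict. This is only an illustration of the concept, not the OSMnx implementation (which also preserves edge geometries):

```python
def remove_interstitial(adj):
    """Collapse degree-2 nodes: a - b - c with deg(b) == 2 becomes a - c.

    `adj` is an undirected adjacency dict without self-loops; nodes with
    degree >= 3 (intersections) and degree 1 (dead ends) are preserved.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in adj and len(adj[v]) == 2:
                a, b = adj[v]
                adj[a].discard(v)     # unhook v from both neighbors
                adj[b].discard(v)
                adj[a].add(b)         # connect the neighbors directly
                adj[b].add(a)
                del adj[v]
                changed = True
    return adj
```

On a path a-b-c-d the two middle nodes disappear, leaving a single a-d edge; a star-shaped intersection is left untouched.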

How can I find all 'long' simple acyclic paths in a graph?

Let's say we have a fully connected directed graph G. The vertices are [a, b, c, d]. There are edges in both directions between every pair of vertices.
Given a starting vertex a, I would like to traverse the graph in all directions and save the path only when I hit a vertex which is already in the path.
So, the function full_paths(a,G) should return:
- [{a,b}, {b,c}, {c,d}]
- [{a,b}, {b,d}, {d,c}]
- [{a,c}, {c,b}, {b,d}]
- [{a,c}, {c,d}, {d,b}]
- [{a,d}, {d,c}, {c,b}]
- [{a,d}, {d,b}, {b,c}]
I do not need 'incomplete' results like [{a,b}] or [{a,b}, {b,c}], because they are already contained in the first result.
Is there any way to do it other than generating the power set of G and filtering out results of a certain size?
How can I calculate this?
Edit: As Ethan pointed out, this could be solved with a depth-first search, but unfortunately I do not understand how to modify it so that it stores a path before it backtracks (I use Ruby Gratr to implement my algorithm).
Have you looked into depth first search or some variation? A depth first search traverses as far as possible and then backtracks. You can record the path each time you need to backtrack.
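To sketch that idea concretely (in Python rather than Ruby, as an illustration): recurse into every unvisited neighbor, and record the current path exactly when no unvisited neighbor remains, i.e. just before backtracking.

```python
def full_paths(start, graph):
    """All maximal simple paths from `start`, as lists of edges.

    `graph` maps each vertex to the set of vertices it has an edge to.
    """
    results = []

    def dfs(node, visited, path):
        extended = False
        for nxt in graph[node]:
            if nxt not in visited:
                extended = True
                dfs(nxt, visited | {nxt}, path + [(node, nxt)])
        if not extended and path:     # dead end: every neighbor already visited
            results.append(path)

    dfs(start, {start}, [])
    return results
```

On the complete directed graph over {a, b, c, d}, starting from a, this yields the six three-edge paths listed in the question, and no 'incomplete' prefixes.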
If you know your graph G is fully connected, there are N! paths, where N is the number of vertices in G. You can compute this easily: you have N choices of starting point, then for each starting point N-1 choices for the second vertex on the path, and so on, until only the last unvisited vertex can be chosen. So there are N*(N-1)*...*2*1 = N! possible paths. When you can't choose the starting point, i.e. it is given, it is the same as counting paths in a graph G' with N-1 vertices. The possible paths are exactly the permutations of the set of all vertices, in your case all vertices except the starting point. Given a permutation, you can generate the path with:
perm_to_path([A|[B|_]=T]) -> [{A,B}|perm_to_path(T)];
perm_to_path(_) -> [].
The simplest way to generate permutations is:
permutations([]) -> [[]];
permutations(L) ->
    [[H|T] || H <- L, T <- permutations(L -- [H])].
So in your case:
paths(A, GV) -> [perm_to_path([A|P]) || P <- permutations(GV -- [A])].
where GV is the list of vertices of graph G.
If you would like a more efficient version, it would need a bit more trickery.
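For comparison, the same permutation-based construction in Python, with itertools doing the permutation work (an illustrative translation of the Erlang sketch above):

```python
from itertools import permutations

def perm_to_path(perm):
    """Turn a vertex sequence into the list of edges along it."""
    return list(zip(perm, perm[1:]))

def paths(a, vertices):
    """All full paths starting at `a`: one per permutation of the remaining vertices."""
    rest = [v for v in vertices if v != a]
    return [perm_to_path([a, *p]) for p in permutations(rest)]
```

For vertices [a, b, c, d] and starting point a this produces the 3! = 6 paths from the question.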
