Gremlin: Traversing back up a graph, aggregating the results

https://imgur.com/a/bvDoEZn
Hello! I have a graph like in the picture above.
I am trying to write a query that traverses the graph until the leaf vertices are reached,
runs a computation on each leaf that returns a list of vertices, and then traverses back up to the level above, aggregating the results from the child vertices into a single list of unique vertices.
On each parent vertex, I then run the same computation and add the resulting list of vertices (shown in the picture as the list before the +) to the aggregated list.
This is repeated for every level until the first vertex is reached.
The question is: once I have reached the leaves and have a list of vertices for each leaf, how do I traverse back to the parents, aggregating the results into a single list?
I am using AWS Neptune with gremlin python.
This is the query I have so far; it reaches the leaves and gets a list for each leaf traversal:
g.V(vertexId).repeat(out()).until(outE().count().is(0)).[additional steps that return a list of vertices]
Sample graph:
g.addV("Website").property("name", "www.ex1.com").property("type", "root")
.addV("Website").property("name", "www.ex1.com/sub1")
.addV("Website").property("name", "www.ex1.com/sub2")
.addV("Website").property("name", "www.ex1.com/sub1/about")
.addV("Endpoint").property("name", "Node 1")
.addV("Endpoint").property("name", "Node 2")
.addV("Endpoint").property("name", "Node 3")
.addV("Endpoint").property("name", "Node 4")
.addE("SUBPATH").from(V().has("name", "www.ex1.com")).to(V().has("name", "www.ex1.com/sub1"))
.addE("SUBPATH").from(V().has("name", "www.ex1.com")).to(V().has("name", "www.ex1.com/sub2"))
.addE("SUBPATH").from(V().has("name", "www.ex1.com/sub1")).to(V().has("name", "www.ex1.com/sub1/about"))
.addE("RELATED").from(V().has("name", "www.ex1.com")).to(V().has("name", "Node 3"))
.addE("RELATED").from(V().has("name", "www.ex1.com/sub1")).to(V().has("name", "Node 1"))
.addE("RELATED").from(V().has("name", "www.ex1.com/sub1")).to(V().has("name", "Node 4"))
.addE("RELATED").from(V().has("name", "www.ex1.com/sub2")).to(V().has("name", "Node 2"))
.addE("RELATED").from(V().has("name", "www.ex1.com/sub1/about")).to(V().has("name", "Node 2"))
.addE("RELATED").from(V().has("name", "www.ex1.com/sub1/about")).to(V().has("name", "Node 4"))
Edit:
https://imgur.com/a/bvDoEZn
Hello everyone, thanks for the help and sorry for the wait. I constructed a more detailed graph to better illustrate what the problem is. I have a bunch of these "Website" nodes in a hierarchical tree-like fashion, and each of them is hooked up to an arbitrary number of distinct nodes with the same Label. We can call those nodes endpoints.
I have to find out for each Website node in the tree how many distinct endpoints there are for both that node and everything beneath it (union of all endpoints).
I have done this in the brute-force way of recalculating the set of endpoints for each node's sub-tree using a repeat(out().as("")) and afterwards selecting every child node in order to traverse towards the endpoints all at once in order to dedup() them.
g.V()
  .hasLabel("Website").as("k")
  .project("website", "count")
    .by(select("k").by(T.id))
    .by(
      repeat(out().as("k")).until(outE().count().is(0))
      .select(all, "k")
      .unfold()
      .out("RELATED")
      .dedup()
      .count())
I am now searching for a memoization type approach where I could gather the endpoints for each leaf and use the calculated set when going another level up without having to recalculate them.
sack() would have been a great candidate for this task, but I couldn't make it work in Gremlin-Java with Sets or custom merge operators. Perhaps I was just doing something wrong. If it were possible, it would be easy to traverse to the leaves of the tree, get the endpoints, store the vertices in the sack's set, go back towards the root, append each parent's endpoints to the set, and merge the traversers' sacks where a parent has more than one child.
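For comparison, the memoization idea itself can be sketched outside Gremlin. This is not a sack()-based traversal; it is a minimal client-side sketch in plain Python that assumes the SUBPATH and RELATED edges of the sample graph have already been fetched into two dictionaries. The names SUBPATH, RELATED and subtree_endpoints are made up for the illustration.

```python
from functools import lru_cache

# Sample graph from the question, held as plain adjacency maps
# (assumed to have been fetched from the database beforehand).
SUBPATH = {
    "www.ex1.com": ["www.ex1.com/sub1", "www.ex1.com/sub2"],
    "www.ex1.com/sub1": ["www.ex1.com/sub1/about"],
    "www.ex1.com/sub2": [],
    "www.ex1.com/sub1/about": [],
}
RELATED = {
    "www.ex1.com": ["Node 3"],
    "www.ex1.com/sub1": ["Node 1", "Node 4"],
    "www.ex1.com/sub2": ["Node 2"],
    "www.ex1.com/sub1/about": ["Node 2", "Node 4"],
}


@lru_cache(maxsize=None)
def subtree_endpoints(site):
    """Endpoints of `site` plus those of every Website beneath it.

    lru_cache memoizes the result, so each subtree is computed once
    and reused when its parent is evaluated.
    """
    result = set(RELATED.get(site, []))
    for child in SUBPATH.get(site, []):
        result |= subtree_endpoints(child)
    return frozenset(result)


# Distinct endpoint count per Website, including its whole subtree.
counts = {site: len(subtree_endpoints(site)) for site in SUBPATH}
for site, count in counts.items():
    print(site, count)
```

Each Website's set is built from its own endpoints plus the cached sets of its children, which is the bottom-up aggregation described above. For very deep trees the recursion would need to be replaced by an explicit post-order loop.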

Related

Gremlin: How to obtain outgoing edges and their target vertices in a single query

Given a set of vertices (say, for simplicity, that I start with one: G.V().hasId("something")), I want to obtain all outgoing edges and their target vertices. I know that .out() will give me all target vertices, but without the information about the edges (which have properties on them, too). On the other hand, .outE() will give me the edges but not the target vertices. Can I obtain both in a single Gremlin query?
Gremlin is as much about transforming graph data as it is navigating graph data. Typically folks seem to understand the navigation first which got you to:
g.V().hasId("something").outE()
You then need to transform those edges into the result you want - one that includes the edge data and its adjacent vertex. One way to do that is with project():
g.V().hasId("something").outE().
  project('e','v').
    by().
    by(inV())
Each by()-modulator supplied to project() aligns to the keys supplied as arguments. The first applies to "e" and the second to "v". The first by() is empty and is effectively by(identity()) which returns the same argument given to it (i.e. the current edge in the stream).
Never mind. Figured this out.
G.V().hasId("something").outE().as("E").otherV().as("V").select("E", "V")

Break up graph into smallest sub-components of 2-nodes or greater

I wish to be able to separate my graph into subcomponents such that the removal of any single node would create no further sub-components (excluding single nodes). As an example see the two images below.
The first image shows the complete graph. The second image shows the sub-components of the graph when it has been split into the smallest possible subcomponents. As can be seen from the second image, the vertex names have been maintained. I don't need the new structure to be a single graph; it can be a list of graphs, or even a list of the nodes in each component.
The component of nodes 4-5-6 remains as removing any of the three nodes will not create a new component as the node that was broken off will only be a single node.
At the moment I am trying to put together an iterative process, that removes nodes sequentially in ascending degree order and recurses into the resultant new components. However, it is difficult and I imagine someone else has done it better before.
You say you want the "smallest subcomponents of 2 nodes or greater", and that your example has the "smallest possible subcomponents". But what you actually meant is the largest possible subcomponents such that the removal of any single node would create no further sub-components, right? Otherwise you could just separate the graph into a collection of all of the 2-graphs.
I believe, then, that your problem can be described as finding all "biconnected components" (aka maximal biconnected subgraphs of a graph): https://en.wikipedia.org/wiki/Biconnected_component
As you said in the comments, igraph has the function biconnected_components(g), which will solve your problem. :)
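To make the idea concrete without depending on igraph, here is a minimal sketch of the classic DFS approach (discovery times, low-links and an edge stack) in plain Python. It assumes a simple undirected graph given as an adjacency dictionary; the example graph (two triangles joined by a bridge) is made up, since the original images are not available, and the function name is only chosen to mirror the igraph one.

```python
def biconnected_components(adj):
    """Return the vertex sets of the biconnected components.

    `adj` maps each vertex to a list of its neighbours (simple,
    undirected graph). Bridges come out as 2-vertex components.
    """
    disc = {}          # discovery time of each visited vertex
    low = {}           # lowest discovery time reachable from the subtree
    edge_stack = []    # edges of the components currently being built
    components = []
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # Nothing below v reaches above u, so the edges
                    # stacked since (u, v) form one component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif disc[v] < disc[u]:
                # Back edge to an ancestor.
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for start in adj:
        if start not in disc:
            dfs(start, None)
    return components


# Hypothetical example: triangle 1-2-3, bridge 3-4, triangle 4-5-6.
example = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
result = biconnected_components(example)
print(sorted(sorted(c) for c in result))
```

The recursion depth grows with the longest DFS path, so a large graph would need an iterative version or a library routine instead.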

Easy way to find a graph edge between graph vertexes

Suppose there are 100 graph vertices, each with 4 edges toward other vertices, stored in an array X. The array's size is X(100, 4), and X(38, 2) means the contents of the array at two-dimensional index 38, 2.
Is there any simple way to find a way from a given starting vertex to another given vertex?
It does not have to be the shortest way, as long as the destination can be reached.
Thanks!
Yes. This is the same as finding a path between two vertices in an undirected graph, and is a thoroughly studied concept in mathematics and computer science. The usual method is a "Depth First Search" (DFS). A suitable algorithm is described here.
Essentially it follows this pattern:
1. Start with x equal to the start node.
2. If x is the end node, then we're done.
3. If we've already visited x, then abandon this path.
4. For each of the nodes y connected to x:
5. Add x to the current path and set x=y.
6. Run the algorithm from step 2.
7. Loop to step 4.
That will explore every possible path from x, going as deep down each branch as possible to find the goal or a deadend. Hence "depth first".
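The steps above can be sketched in Python as follows. This is only an illustration: it assumes the array is held as a list of lists with 0-based vertex numbers, so X[38][2] is the third neighbour of vertex 38, and the neighbour pattern used to fill X is made up.

```python
def find_path(X, start, goal):
    """Depth-first search for any path from start to goal.

    X[v] lists the vertices reachable from v by one edge.
    Returns the path as a list of vertices, or None if there is none.
    """
    visited = set()
    path = []

    def visit(x):
        if x in visited:          # already explored: abandon this branch
            return False
        visited.add(x)
        path.append(x)
        if x == goal:             # reached the destination
            return True
        for y in X[x]:            # go as deep as possible along each edge
            if visit(y):
                return True
        path.pop()                # dead end: back up one step
        return False

    return path if visit(start) else None


# Hypothetical 100-vertex array with 4 outgoing edges per vertex.
X = [[(i + 1) % 100, (i + 2) % 100, (i + 5) % 100, (i + 10) % 100]
     for i in range(100)]

route = find_path(X, 0, 37)
print(route)
```

The path found this way is valid but usually not the shortest; a breadth-first search would be the usual swap if path length matters.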

Trying to reverse a MetaGraph in Julia

I am currently working on Julia with the MetaGraphs.jl package.
I have created the graph from a dataset. Its characteristic is that there is one start vertex and one end vertex. However, the graph is built by generating all possible options, so some edges that "take too much time" never reach the end vertex.
To get a decent graph for my next step (an optimization problem), I clean the graph by going through all edges and deleting those that don't reach the end vertex. This method is costly in time, and I know a wiser way to do it.
The method is simple:
1. Reverse the graph (the end vertex becomes the start vertex).
2. Calculate the distance of all vertices from the start vertex.
3. Erase the vertices that don't have a defined distance from the start vertex.
I have created an example of a small digraph that would be the same type as mine:
module EssaiModule
using LightGraphs, MetaGraphs
g = DiGraph(8)
mg = MetaDiGraph(g, 1.0)
add_vertex!(mg)
add_edge!(mg,1,2)
add_edge!(mg,1,3)
add_edge!(mg,1,4)
add_edge!(mg,2,4)
add_edge!(mg,2,5)
add_edge!(mg,3,5)
add_edge!(mg,5,6)
add_edge!(mg,4,6)
add_edge!(mg,3,7)
add_edge!(mg,4,8)
rg=reverse(mg)
end
the end vertex is number 6 so normally I would like to erase edges 3->7 and 4->8
But I can't even start my function because I simply cannot reverse this graph.
I get the error message "LoadError: type MetaDiGraph has no field badjlist"
I know it's because the graph is a metaGraph and not a lightGraph but shouldn't we be able to reverse a metagraph? it seems like a basic function that could be useful on many occasions while working with graphs.
Thanks for your help!
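Independently of the reverse error, the three-step method described above can be sketched in a few lines. The sketch below is plain Python rather than Julia and does not use MetaGraphs.jl at all; it works on the edge list from the example, and the function name prune_dead_ends is made up for the illustration.

```python
from collections import deque

# Edge list of the example digraph; vertex 6 is the end vertex.
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (2, 5), (3, 5),
         (5, 6), (4, 6), (3, 7), (4, 8)]
end_vertex = 6


def prune_dead_ends(edges, end_vertex):
    """Keep only the edges from which end_vertex is still reachable."""
    # Step 1: reverse the graph (store each edge dst -> src).
    reverse_adj = {}
    for src, dst in edges:
        reverse_adj.setdefault(dst, []).append(src)

    # Step 2: breadth-first distances from the end vertex
    # in the reversed graph.
    dist = {end_vertex: 0}
    queue = deque([end_vertex])
    while queue:
        v = queue.popleft()
        for u in reverse_adj.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)

    # Step 3: drop everything that has no defined distance.
    kept = [(s, d) for s, d in edges if s in dist and d in dist]
    return kept, dist


kept, dist = prune_dead_ends(edges, end_vertex)
removed = [e for e in edges if e not in kept]
print(removed)
```

Building the reversed adjacency by hand sidesteps the need for a reverse function on the graph type; the same idea should carry over to any graph library that exposes in-neighbours.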

Algorithms for converting trace to graph?

Are there any algorithms to convert single line traces to graphs?
Example.
I have traces of events as they occur.
T1: A -> B -> C-> D
T2: A'-> X -> B' -> D'
T3: A"-> C" -> F" -> D"
I want to take these and create a graph structure. I have some way of establishing equivalence between A, A', A" and so on.
Is there a standardized algorithm to convert from the traces above to a graph?
I can think of an intuitive one which creates nodes for all events and add edges that are present in the traces but want to know if there is something better or my algorithm has a name.
Thanks
You just have to decide how to implement your graph: whether it is static (i.e. you get all the traces at the same time) or dynamic (you get the traces incrementally, in an online fashion, so you need to update the graph each time).
There is no particular algorithm as far as I remember. In the first case, you collect all the nodes by looping through all the traces and then add all the edges. In the second case, you add nodes incrementally to your structure (checking whether you've already seen the same node) as you add edges to them (it doesn't show in your examples, but I guess even edges can be repeated).
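A minimal sketch of the incremental variant, in Python, could look like this. The canonical function is a stand-in for whatever equivalence rule the asker has (here it simply strips the trailing quote marks), and add_trace and the graph dictionary layout are likewise made up for the illustration.

```python
from collections import Counter


def canonical(event):
    """Hypothetical equivalence rule: A, A' and A" are the same event."""
    return event.rstrip("'\"")


def add_trace(graph, trace):
    """Merge one trace such as "A -> B -> C" into the graph."""
    events = [canonical(e.strip()) for e in trace.split("->")]
    for node in events:
        graph["nodes"].add(node)             # a set ignores repeats
    for src, dst in zip(events, events[1:]):
        graph["edges"][(src, dst)] += 1      # count repeated edges


graph = {"nodes": set(), "edges": Counter()}
traces = [
    "A -> B -> C-> D",
    "A'-> X -> B' -> D'",
    'A"-> C" -> F" -> D"',
]
for trace in traces:
    add_trace(graph, trace)

print(sorted(graph["nodes"]))
print(sorted(graph["edges"]))
```

Because add_trace handles one trace at a time, the same code covers the static case (loop over all traces) and the online case (call it whenever a new trace arrives). The edge counter keeps track of how often each transition was observed.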
