Working with redis-graph

I'm a beginner with redis-graph, and presently I'm working on a K-shortest-paths algorithm implemented in Java (where the graph is created using a HashMap). As the dataset is quite large (27 million rows), I need a database to store the graph, and for that reason I plan to use redis-graph; but redis-graph uses the Cypher query language. How can I integrate these two applications?
Any other suggestion(s) would be welcome.

Although you can use RedisGraph to hold the graph for you, at the moment there's no way of finding the K shortest paths from node A to node B. I've implemented a shortest-path algorithm within RedisGraph but have yet to expose it to clients. I'm not sure of the approach you had in mind for finding K shortest paths; I've implemented one using a cost-edge flow network, and you can find my JavaScript implementation here.
I might include a K-shortest-paths algorithm within RedisGraph, but I need some time to think about that. In any case, using the current subset of Cypher supported by RedisGraph, finding the K shortest paths is not possible.
You might be able to retrieve a relevant sub-graph from RedisGraph into your Java application, find path i out of K there, and once no additional paths can be found, extend that sub-graph by retrieving additional nodes/edges from RedisGraph.
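To make that concrete, here is a minimal Java sketch of that hybrid loop. The GraphClient wrapper, the fetchNeighborhood query, and the LocalSolver hook are all hypothetical placeholders (none of them are RedisGraph API); a real implementation would issue GRAPH.QUERY commands through whichever Redis client you use and plug in your existing HashMap-based algorithm (e.g. Yen's):

import java.util.*;

public class HybridKShortestPaths {

    // Edge as fetched from the database; a record gives us equals/hashCode for dedup.
    public record Edge(String src, String dst, double weight) {}

    // Hypothetical wrapper around GRAPH.QUERY -- not a real RedisGraph API.
    public interface GraphClient {
        List<Edge> fetchNeighborhood(String srcId, String dstId, int maxHops);
    }

    // Plug in your existing in-memory K-shortest-paths code here.
    public interface LocalSolver {
        List<List<String>> findKShortestPaths(Map<String, Set<Edge>> adj,
                                              String src, String dst, int k);
    }

    public static List<List<String>> kShortest(GraphClient db, LocalSolver solver,
                                               String src, String dst, int k, int maxHops) {
        Map<String, Set<Edge>> adj = new HashMap<>();
        int edgeCount = -1;
        List<List<String>> paths = List.of();
        for (int hops = 2; hops <= maxHops; hops += 2) {
            // widen the fetched neighborhood; the Set dedupes edges fetched twice
            for (Edge e : db.fetchNeighborhood(src, dst, hops)) {
                adj.computeIfAbsent(e.src(), x -> new HashSet<>()).add(e);
            }
            paths = solver.findKShortestPaths(adj, src, dst, k);
            int newCount = adj.values().stream().mapToInt(Set::size).sum();
            // stop once K paths exist or widening the sub-graph added nothing new
            if (paths.size() >= k || newCount == edgeCount) break;
            edgeCount = newCount;
        }
        return paths;
    }
}

The loop widens the fetched neighborhood until either K paths exist in the local sub-graph or the sub-graph stops growing.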

Related

How to do efficient bulk upserts (insert new vertex, or update property(ies)) in Gremlin?

Context:
I have a graph with about 2,000 vertices and 6,000 edges; over time this might grow to 10,000 vertices and 100,000 edges. Currently I am upserting the new vertices using the following traversal query:
Upserting Vertices & Edges
queryVertex = "g.V().has(label, name, foo).fold().coalesce(
    unfold(),
    addV(label).property(name, foo).property(model, 2)
).property(model, 2)"
The intent here is to look for a vertex named foo and, if found, update its model property; otherwise create a new vertex and set the model property. This is issued twice: once for the source vertex and then for the target vertex.
Once the two related vertices are created, another query is issued to create the edge between them:
queryEdge = "g.V('id_of_source_vertex').coalesce(
    outE(edge_label).filter(inV().hasId('id_of_target_vertex')),
    addE(edge_label).to(V('id_of_target_vertex'))
).property(model, 2)"
Here, if there is an edge between the two vertices, the model property on the edge is updated; otherwise the edge between them is created.
And the pseudocode that does this, is something as follows:
for each edge in the list of new edges:
    // upsert source and target vertices:
    execute queryVertex for edge.source
    execute queryVertex for edge.target
    // upsert edge:
    execute queryEdge
This works, but it is highly inefficient: for the graph size mentioned it takes several minutes to finish, and adding some in-app concurrency reduces the time only by a couple of minutes. Surely there must be a more efficient way of doing this for such a small graph.
Question
* How can I make these upserts faster?
Bulk loading should typically be relegated to the provider-specific tools that are optimized to handle such tasks. Gremlin really doesn't provide abstractions to cover the diverse group of bulk-loader tools that are out there for each of the various graph database systems that implement TinkerPop. For Neptune, which is how you tagged your question, that would mean using the Neptune Bulk Loader.
Speaking specifically to your question, though, you might see some optimizations to what you described as your approach. From a Gremlin perspective, I imagine you would see some savings here by submitting a single Gremlin request per edge, combining your existing traversals:
g.V().has(label, name, foo).fold().
coalesce(unfold(),
addV(label).property(name, foo)).
property(model, 2).as('source').
V().has(label, name, bar).fold().
coalesce(unfold(),
addV(label).property(name, bar)).
property(model, 2).as('target').
coalesce(inE(edge_label).where(outV().as('source')),
addE(edge_label).from('source').to('target')).
property(model, 2)
I think I got that right - untested, but hopefully you get the idea. Basically, we just reference the vertices already in memory via step labels so that we don't need to re-query them. If you continue with Gremlin-style bulk loading, you might also try other tactics, like ordering your edges so that you can batch together more edge loads (reducing the number of vertex lookups) and submitting vertex/edge data in a more dynamic fashion as described here.
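As a rough illustration of that edge-ordering tactic, the Java sketch below groups edges by source vertex and builds one request per group, so each source vertex is upserted once rather than once per edge. The vertex label, property names, and EdgeData shape are illustrative only, and the generated script just chains the same coalesce pattern shown above; submission would go through whatever driver you already use (for example the TinkerPop Java driver's Client.submit):

import java.util.*;
import java.util.stream.Collectors;

public class BatchedEdgeLoader {

    public record EdgeData(String source, String target, String label) {}

    // Build one script that upserts the source vertex once, then upserts each
    // outgoing edge (and its target vertex) using the coalesce pattern above.
    // NOTE: real code should pass values as bindings rather than concatenating
    // them into the script, both for safety and for script-cache reuse.
    static String buildScript(String source, List<EdgeData> outgoing) {
        StringBuilder s = new StringBuilder()
            .append("g.V().has('vlabel','name','").append(source).append("').fold().")
            .append("coalesce(unfold(), addV('vlabel').property('name','").append(source)
            .append("')).property('model',2).as('src')");
        for (EdgeData e : outgoing) {
            String t = "t_" + e.target();   // step label for this target vertex
            s.append(".V().has('vlabel','name','").append(e.target()).append("').fold().")
             .append("coalesce(unfold(), addV('vlabel').property('name','").append(e.target())
             .append("')).property('model',2).as('").append(t).append("')")
             .append(".select('src').coalesce(outE('").append(e.label())
             .append("').where(inV().as('").append(t).append("')),")
             .append("addE('").append(e.label()).append("').to('").append(t)
             .append("')).property('model',2)");
        }
        return s.toString();
    }

    // Group edges by source so each source is looked up once per request.
    public static Map<String, String> scriptsBySource(List<EdgeData> edges) {
        return edges.stream()
                .collect(Collectors.groupingBy(EdgeData::source))
                .entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        en -> buildScript(en.getKey(), en.getValue())));
    }
}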

Path-finding on a real-life map

I want to create a path-finding algorithm to find the shortest distance between a business location and a supplier location. I plan on converting the map into a graph, but I am not sure what the nodes would be in this case; the roads?
I have researched the A* path-finding algorithm, but I don't know what the nodes are supposed to be.
I am asking here so that I can be directed to a solution or to available information.

Gremlin: how to get all of the graph structure surrounding a single vertex into a subgraph

I would like to get all of the graph structure surrounding a single vertex into a subgraph.
The TinkerPop documentation shows how to do this for a fixed number of traversal steps.
My question: are there any recipes for getting the entire surrounding structure (which may include cycles) without knowing ahead of time how many traversal steps are needed?
I have updated this question for the benefit of anyone who might land on it. Here is a Gremlin statement that will capture an arbitrary graph structure surrounding a vertex with id='xxx':
g.V('xxx').repeat(out().simplePath()).until(outE().count().is(eq(0))).path()
-- this incorporates Stephen Mallette's suggestion to use simplePath within the repeat step.
That example uses repeat()...times(), but you could easily replace times() with until() and not know the number of steps ahead of time. See the Reference Documentation on how repeat() works for how to express different types of flow control and traversal stop conditions.
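If you also want the surrounding structure as an actual subgraph rather than a set of paths, a sketch along these lines should work; it combines the subgraph() side-effect step from the TinkerPop recipes with the until() condition already shown (the 'sg' side-effect key is arbitrary):

g.V('xxx').
  repeat(outE().subgraph('sg').inV().simplePath()).
  until(outE().count().is(0)).
  cap('sg').next()

Because subgraph() collects edges as a side effect, the value returned at cap('sg') is a Graph containing every visited edge together with its incident vertices.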

Routing Engine Using OpenStreetMap Data

As part of my academic project, I have to build a routing engine based on data supplied from OSM. I have looked at the data model of OSM and I'm all fine with it. However, I'm having trouble converting an OSM XML file into a graph structure (nodes and edges) that I can use to apply search algorithms (Dijkstra, A* etc.) on. I would like the graph to be stored in memory to allow fast read/write.
So can anyone shed light or suggest techniques on how this can be done, or even provide pointers for further research?
Please note that I'm not allowed to re-use existing routing engines as this would defeat the purpose of doing the project.
All you need to do is:
* create a node for every <node> item
* every <way> entry is a sequenced list of <nd> items, each of which is a back-reference to a node; so for each <way>, iterate pairwise through its <nd>s and create an arc between the two nodes referenced
You can do this in one pass using a streaming XML parser, since the XML data defines all the nodes before the ways.
The data doesn't intrinsically include distances, so you need to calculate them from the lat/lon of each node. You should also take account of the road type (highway=*) and the access info (access=*) in your routing, and you probably also want to ignore ways that are not traversable (e.g. waterway=stream), but that's all down to your specific situation. (A rough sketch of the basic parse follows below.)
http://wiki.openstreetmap.org/wiki/Elements
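As a rough illustration of that one-pass approach, here is a minimal Java sketch using the JDK's streaming StAX parser. It builds an in-memory adjacency map keyed by node ID and computes haversine distances from the lat/lon attributes; all class and field names are mine, and the highway=*/access=*/oneway=* filtering described above is left out for brevity:

import javax.xml.stream.*;
import java.io.InputStream;
import java.util.*;

public class OsmGraphBuilder {

    record LatLon(double lat, double lon) {}
    record Arc(long to, double meters) {}

    final Map<Long, LatLon> nodes = new HashMap<>();
    final Map<Long, List<Arc>> adjacency = new HashMap<>();

    public void parse(InputStream osmXml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(osmXml);
        List<Long> wayNodes = new ArrayList<>();
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (r.getLocalName()) {
                    // OSM XML lists all <node> items before the <way> items,
                    // so a single pass is enough.
                    case "node" -> nodes.put(
                            Long.parseLong(r.getAttributeValue(null, "id")),
                            new LatLon(Double.parseDouble(r.getAttributeValue(null, "lat")),
                                       Double.parseDouble(r.getAttributeValue(null, "lon"))));
                    case "way" -> wayNodes.clear();
                    case "nd" -> wayNodes.add(Long.parseLong(r.getAttributeValue(null, "ref")));
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && r.getLocalName().equals("way")) {
                // iterate pairwise through the way's <nd> refs and create arcs
                for (int i = 1; i < wayNodes.size(); i++) {
                    addArc(wayNodes.get(i - 1), wayNodes.get(i));
                    addArc(wayNodes.get(i), wayNodes.get(i - 1)); // assumes two-way; check oneway=* in real code
                }
            }
        }
    }

    private void addArc(long from, long to) {
        double d = haversineMeters(nodes.get(from), nodes.get(to));
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(new Arc(to, d));
    }

    // great-circle distance between two lat/lon points, in metres
    static double haversineMeters(LatLon a, LatLon b) {
        double dLat = Math.toRadians(b.lat() - a.lat());
        double dLon = Math.toRadians(b.lon() - a.lon());
        double h = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(a.lat())) * Math.cos(Math.toRadians(b.lat()))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * 6_371_000 * Math.asin(Math.sqrt(h));
    }
}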

Hadoop MapReduce implementation of shortest PATH in a graph, not just the distance

I have been looking for a MapReduce implementation of shortest-path search algorithms.
However, all the instances I could find computed the shortest distance from node x to y, and none actually output the actual shortest path, like x-a-b-c-y.
As for what I am trying to achieve: I have graphs with hundreds of thousands of nodes, and I need to perform frequent-pattern analysis on the shortest paths among the various nodes. This is for a research project I am working on.
It would be a great help if someone could point me to an implementation (if it exists) or give some pointers on how to modify the existing SSSP implementations to generate the paths along with the distances.
Basically these implementations work with some kind of messaging, so messages are sent via HDFS between the map and reduce stages.
In the reducer the messages are grouped and filtered by distance; the lowest distance wins. When the distance is updated in this case, you have to store the vertex (well, some ID probably) that the message came from.
This costs additional space per vertex, but it lets you reconstruct every possible shortest path in the graph.
Based on your comment:
yes, probably I will need to write another class of the vertex object to hold this additional information. Thanks for the tip, though it would be very helpful if you could point out where and when I can capture this information of where the minimum weight came from, anything from your blog maybe :-)
Yeah, this could be quite a cool theme, also for Apache Hama. Most of the implementations just consider the costs, not the real path. In your case (from the blog you linked above) you will have to extract a vertex class which actually holds the adjacent vertices as LongWritable (maybe a list instead of the split on the Text object) and simply add a parent or source ID as a field (of course also a LongWritable).
You will set this when propagating in the mapper, that is, in the for loop that iterates over the adjacent vertices of the current key node.
In the reducer you update the lowest distance while iterating over the grouped values; there you have to set the source vertex in the key vertex to the vertex that produced the minimum.
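A minimal sketch of such a vertex class, assuming plain Hadoop Writables, might look like the following. The field and method names are mine rather than from the blog code, but the predecessor field is exactly the extra piece of state described above:

import java.io.*;
import java.util.*;
import org.apache.hadoop.io.Writable;

public class PathVertexWritable implements Writable {

    private long distance = Long.MAX_VALUE;        // cost of the best path seen so far
    private long predecessor = -1;                 // vertex ID the best distance came from
    private final List<Long> adjacentIds = new ArrayList<>();

    // Called in the reducer while iterating over the grouped messages:
    // keep the minimum distance and remember which vertex it came from.
    public void updateIfShorter(long candidateDistance, long fromVertexId) {
        if (candidateDistance < distance) {
            distance = candidateDistance;
            predecessor = fromVertexId;
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(distance);
        out.writeLong(predecessor);
        out.writeInt(adjacentIds.size());
        for (long id : adjacentIds) out.writeLong(id);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        distance = in.readLong();
        predecessor = in.readLong();
        adjacentIds.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) adjacentIds.add(in.readLong());
    }
}

Once the algorithm converges, walking the predecessor field backwards from the target vertex yields the full path, e.g. x-a-b-c-y instead of just the distance.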
You can actually get some of the vertex classes from my blog:
Source
or directly from the repository:
Source
Maybe it helps you; it is quite unmaintained, so please come back to me if you have a specific question.
Here is the same algorithm in BSP with Apache Hama:
https://github.com/thomasjungblut/tjungblut-graph/blob/master/src/de/jungblut/graph/bsp/SSSP.java
