Creating an edge using vertex ids (JanusGraph)

I am trying to bulk load data into JanusGraph, following the bulk loading recommendations. I am struggling with point (d), which is "Add all the edges using the map to look-up JanusGraph's vertex id and retrieving the vertices using that id".
My current code (Scala) looks like this:
val key = Key[String]("Key")
val sLabel : StepLabel[Vertex] = StepLabel("target")
case (src, edgeType, dest) =>
graph.V().has(key, dest).as(sLabel).traversal.V().has(key, src).addE(edgeType).to(sLabel).traversal
.property("propertyKey", "propertyValue")
I have the vertex ids of the source and destination vertices, but I am unable to figure out how to change this code to create the edge using those vertex ids.
I am quite new to Gremlin; any help would be appreciated.

If you already have the JanusGraph vertex ids, you can select both vertices directly by id and label the source:
case (idSrc, edgeType, idDest) =>
  g.V(idSrc).as("a").V(idDest).addE(edgeType).from("a")
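For bulk loading, this can be wrapped in a loop over the edge triples. A minimal Scala sketch, assuming idMap is the look-up map from point (d) that records JanusGraph's vertex id for each application key (idMap and edges are illustrative names, not part of the original code):

// Assumed for illustration: idMap maps each application key to the
// JanusGraph vertex id (java.lang.Long) recorded while loading the vertices;
// edges holds the (src, edgeType, dest) triples.
val g = graph.traversal()
edges.foreach { case (src, edgeType, dest) =>
  g.V(idMap(src)).as("a")
    .V(idMap(dest))
    .addE(edgeType).from("a")
    .property("propertyKey", "propertyValue")
    .iterate()
}
graph.tx().commit() // for large loads, commit in batches rather than once at the end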

Sorting traversal paths based on Edge property and Dedup
Hello,
I have an in-memory graph and I want to sort paths based on an edge property, and also dedup paths leading to the same destination.
E.g.
String NAME = "name";
String id = "id";
String rel = "rel"; // edge label (not defined in the original snippet)
g.addV().property(id, 1).property(NAME, "u1").as("u1")
 .addV().property(id, 2).property(NAME, "u2").as("u2")
 .addV().property(id, 3).property(NAME, "u3").as("u3")
 .addV().property(id, 4).property(NAME, "u4").as("u4")
 .addE(rel).from("u2").to("u1").property("order", 2)
 .addE(rel).from("u3").to("u1").property("order", 1)
 .addE(rel).from("u4").to("u2").property("order", 3)
 .addE(rel).from("u4").to("u3").property("order", 4)
 .iterate();
What I'm trying to achieve is a traversal that gives me only one path, i.e.
vertices = [path[u1, u3, u4]].
I tried the Gremlin below.
List<Path> maps = g.V()
.has("id", 1)
.repeat(in()
.simplePath())
.until(inE().count().is(0))
.order().by(outE("rel").values("order"), Order.asc)
.path().by("name")
.toList();
However, the sorting doesn't happen. It gives me two paths:
vertices = [path[u1, u2, u4], path[u1, u3, u4]]
But I'm looking for the output vertices = [path[u1, u3, u4]].
I'm new to Gremlin and have run out of options to try. Can someone help?
g.V()
    .has("id", 1)
    .repeat(in("rel")
        .order().by(outE().values("order"), Order.asc)
        .simplePath())
    .until(inE().count().is(0))
    .dedup()
    .path()
    .by("name")
    .toList();
Using toList() will give you all the possible traversals. In your case you ordered the results but didn't take only the first one.
You should add a limit step:
...
.limit(1).toList()
Or you can use next() instead of toList().
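Putting the two suggestions together, a minimal Scala sketch against the TinkerPop API (the same fluent calls work in Java; whether the first path returned is [u1, u3, u4] depends on the ordering applied inside the repeat):

import org.apache.tinkerpop.gremlin.process.traversal.Order
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.{in, inE, outE}

val best = g.V().has("id", 1)
  .repeat(in("rel")
    .order().by(outE().values("order"), Order.asc)
    .simplePath())
  .until(inE().count().is(0))
  .path().by("name")
  .limit(1) // keep only the first path
  .next()   // or .limit(1).toList() to stay with a list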

ArangoDB copy Vertex and Edges to neighbors

I'm trying to copy a vertex and retain its relationships in ArangoDB, but I'm getting an "access after data-modification" error (1579). It doesn't like it when I iterate over the source node's edges and insert an edge copy within the loop. That makes sense, but I'm struggling to figure out how to do what I want within a single transaction.
var query = arangojs.aqlQuery`
let tmpNode = (FOR v IN vertices FILTER v._id == ${nodeId} RETURN v)[0]
let nodeCopy = UNSET(tmpNode, '_id', '_key', '_rev')
let nodeCopyId = (INSERT nodeCopy IN 'vertices' RETURN NEW._id)[0]
FOR e IN GRAPH_EDGES('g', ${nodeId}, {'includeData': true, 'maxDepth': 1})
let tmpEdge = UNSET(e, '_id', '_key', '_rev')
let edgeCopy = MERGE(tmpEdge, {'_from': nodeCopyId})
INSERT edgeCopy IN 'edges'
`;
This question is somewhat similar to 'In AQL how to re-parent a vertex', so let me explain this in a similar way.
One should use the ArangoDB 2.8 pattern-matching traversals to solve this.
We will copy Alice to become Sally with similar relations:
LET alice = DOCUMENT("persons/alice")
/* copy Alice, drop the server-managed _id, and give the copy its own key and name */
LET newSally = UNSET(MERGE(alice, {_key: "sally", name: "Sally"}), '_id')
LET r = (
    FOR v, e IN 1..1 ANY alice GRAPH "knows_graph"
        /* drop the attributes ArangoDB has to autogenerate */
        LET me = UNSET(e, "_id", "_key", "_rev")
        /* re-point whichever end referenced Alice at Sally */
        LET newEdge = (me._to == "persons/alice") ?
            MERGE(me, {_to: "persons/sally"}) :
            MERGE(me, {_from: "persons/sally"})
        INSERT newEdge IN knows RETURN newEdge
)
INSERT newSally IN persons RETURN newSally
We therefore first load Alice and UNSET the properties ArangoDB should set on its own. We change the properties that have to be unique, so that afterwards we have a Sally.
Then we open a subquery to traverse the first-level relations of Alice in ANY direction. In this subquery we copy the edges e. We once more UNSET the document attributes that ArangoDB has to autogenerate, find out which side (_from or _to) pointed to Alice, and relocate it to Sally.
The final insert of Sally has to be outside of the subquery, else this statement would attempt to insert one Sally per edge we traverse. We can't insert Sally in front of the query, as you already found out: no subsequent fetches are allowed after the insert.
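The original statement is issued through arangojs; purely for illustration, here is the same AQL sent from the com.arangodb Java driver in Scala, with the start vertex passed as a bind variable instead of string interpolation (the database name and credentials are placeholders):

import com.arangodb.ArangoDB
import com.arangodb.entity.BaseDocument

val arango = new ArangoDB.Builder().user("root").password("").build()
val db = arango.db("mydb") // placeholder database name
val aql =
  """LET alice = DOCUMENT(@aliceId)
    |LET newSally = UNSET(MERGE(alice, {_key: "sally", name: "Sally"}), '_id')
    |LET r = (
    |    FOR v, e IN 1..1 ANY alice GRAPH "knows_graph"
    |        LET me = UNSET(e, "_id", "_key", "_rev")
    |        LET newEdge = (me._to == @aliceId) ?
    |            MERGE(me, {_to: "persons/sally"}) :
    |            MERGE(me, {_from: "persons/sally"})
    |        INSERT newEdge IN knows RETURN newEdge
    |)
    |INSERT newSally IN persons RETURN newSally""".stripMargin
val bindVars = new java.util.HashMap[String, AnyRef]()
bindVars.put("aliceId", "persons/alice")
// returns a cursor over the inserted Sally document
val sally = db.query(aql, bindVars, null, classOf[BaseDocument])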

How to modify the Avro key/value schema in an RDD map transformation

I'm trying to migrate some Hadoop MapReduce code to Spark, and I have doubts about how to manage map and reduce transformations when the schema of either the key or the value changes from input to output.
I have Avro files with Indicator records that I want to process. I already have this code that works:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[Indicator]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, Indicator.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)
val indicatorsRdd = sc.newAPIHadoopRDD(myAvroJob.getConfiguration,
classOf[AvroKeyInputFormat[Indicator]],
classOf[AvroKey[Indicator]],
classOf[NullWritable])
val myRecordOnlyRdd = indicatorsRdd.map(x => (doSomethingWith(x._1), NullWritable.get))
val indicatorPairRDD = new PairRDDFunctions(myRecordOnlyRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)
But this code works because the schema of the input and output keys does not change; it is always Indicator. In Hadoop MapReduce you can define map or reduce functions that modify the schema from input to output. In fact, I have map functions which process every Indicator record and generate a new SoporteCartera record. How can I do this in Spark? Is it possible from the same RDD, or do I have to define two different RDDs and pass from one to the other somehow?
Thanks for your help.
To answer my own question: the problem is that you cannot change the type of an RDD in place; you must define a new RDD, so I solved it with the code below:
val myAvroJob = new Job()
myAvroJob.setInputFormatClass(classOf[AvroKeyInputFormat[SoporteCartera]])
myAvroJob.setOutputFormatClass(classOf[AvroKeyOutputFormat[Indicator]])
myAvroJob.setOutputValueClass(classOf[NullWritable])
AvroJob.setInputValueSchema(myAvroJob, Schema.create(Schema.Type.NULL))
AvroJob.setInputKeySchema(myAvroJob, SoporteCartera.SCHEMA$)
AvroJob.setOutputKeySchema(myAvroJob, Indicator.SCHEMA$)
val soporteCarteraRdd = sc.newAPIHadoopRDD(myAvroJob.getConfiguration,
classOf[AvroKeyInputFormat[SoporteCartera]],
classOf[AvroKey[SoporteCartera]],
classOf[NullWritable])
val indicatorsRdd = soporteCarteraRdd.map(x => (fromSoporteCarteraToIndicator(x._1), NullWritable.get))
val indicatorPairRDD = new PairRDDFunctions(indicatorsRdd)
indicatorPairRDD.saveAsNewAPIHadoopDataset(myAvroJob.getConfiguration)
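For reference, a hypothetical sketch of the conversion function used in the map above; the field accessors (getSomeField/setSomeField) are placeholders, since the real mapping between SoporteCartera and Indicator is application-specific:

import org.apache.avro.mapred.AvroKey

// Unwrap the input key, build the record for the new schema, re-wrap it.
def fromSoporteCarteraToIndicator(key: AvroKey[SoporteCartera]): AvroKey[Indicator] = {
  val source = key.datum()                // the SoporteCartera record
  val indicator = Indicator.newBuilder()  // Avro-generated builder
    .setSomeField(source.getSomeField)    // placeholder field mapping
    .build()
  new AvroKey[Indicator](indicator)
}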

Select paths from traversal and filter on the destination vertex (OrientDB)

I am new to graph databases and OrientDB, so I appreciate your patience.
I have the following SQL query to produce an expanded set of results for the shortest path between two vertices (I am using the GratefulDeadConcerts database):
select expand(sp) from (select shortestPath(#9:2,#9:15,'BOTH') as sp)
For whatever reason, using expand without aliasing produces no results, but that isn't really an issue.
What I want is not the shortest path, but a collection of potential paths and branches.
I have tried playing with traversedVertex:
SELECT traversedVertex(-1) FROM ( TRAVERSE out() FROM #9:2 WHILE $depth <= 10 )
But I don't know how to set the destination, or (honestly) how to interpret the results I get.
EDIT
If there are multiple ways to get from A to B, I want each of those paths returned as a set, something like:
{
paths: [
[#9:2, #4:16, #8:7, #9:15],
[#9:2, #4:2, #16:5, #11:3, #9:15],
[#9:2, #4:4, #11:6, #9:15]
]
}
Thank you for your help.
First, $path is the string representation of the current path.
Second, you can filter on the destination @rid in the WHERE clause of the outer query. Try this:
SELECT $path
FROM (
    TRAVERSE out() FROM #9:2 WHILE $depth <= 10
)
WHERE @rid = #9:15
I get the following output (the paths from #9:2 that end at #9:15): [output screenshot omitted]
Is this what you are looking for?
If I don't add the WHERE clause, I get every traversed path up to depth 10: [output screenshot omitted]

OrientDB GraphDatabase: OSQLSynchQuery for #RID to get graph.getVertex(rid) ... the fastest way to load a vertex from index key?

Given a basic Blueprints-compatible OrientGraph with an index 'name' (unique or notunique), any suggestions for how the following could be improved, if need be?
Note: I can't find a definitive guide to loading a [Blueprints] vertex using an index. I have a large graph, and using has('name','bob') (in the console) takes 2 minutes! On the other hand, an index-based search returns in milliseconds.
The best I've come up with so far:
OrientGraph graph = new OrientGraph("local:/graph1/databases/test", "admin", "admin");
List<ODocument> resultlist = graph.getRawGraph().query(new OSQLSynchQuery<ODocument>("SELECT FROM INDEX:name WHERE KEY = 'bob'"));
ODocument resultodoc = resultlist.get(0).field("rid");
String rid = resultodoc.getIdentity().toString(); // would return something like #6:1500000
Vertex v1 = graph.getVertex(rid);
System.out.println(v1.getProperty("name"));
OrientDB supports the Blueprints IndexableGraph interface. To use it, take a look at:
https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
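For example, a minimal Scala sketch against the Blueprints 2.x IndexableGraph API (this assumes the 'name' index was created as a Blueprints index; package names can differ between OrientDB/Blueprints versions):

import com.tinkerpop.blueprints.{Index, Vertex}
import com.tinkerpop.blueprints.impls.orient.OrientGraph

val graph = new OrientGraph("local:/graph1/databases/test", "admin", "admin")
val index: Index[Vertex] = graph.getIndex("name", classOf[Vertex])
val hits = index.get("name", "bob") // CloseableIterable[Vertex], served from the index
val it = hits.iterator()
if (it.hasNext) {
  println(it.next().getProperty("name"))
}

This goes straight from the index hit to a Blueprints Vertex, avoiding the detour through OSQLSynchQuery, the intermediate ODocument, and the string rid.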
