DSE Graph batch write with ifNotExists on edges - graph

I am using DSE Graph to load data from an Excel file, preparing addE Gremlin queries in Java code, and finally executing them against DSE Graph.
In the current test I need to fire 400,000 addE Gremlin queries covering two edge labels.
1) What is the best practice to finish this execution in a few minutes?
Right now I am submitting the Gremlin queries in batches of 1,000 to dseSession.executeGraph(new SimpleGraphStatement("")), which leads to the exception "Method code too large!" at groovyjarjarasm.asm.MethodWriter.
2) For the edge labels in this use case, my schema defines single cardinality.
I am also using custom vertex IDs for the vertices.
So if an edge already exists, shouldn't DSE just ignore it without throwing an exception?

The query parameter should be a simple array that looks like this:
[[from1, to1, label1], [from2, to2, label2], ...]
Then your script should look like this:
for (def triple in arg) {
    def (id1, id2, lbl) = triple
    // Look up both endpoints by their custom vertex ids
    def v1 = graph.vertices(id1).next()
    def v2 = graph.vertices(id2).next()
    // Only add the edge if one with this label doesn't already connect v1 to v2
    if (!g.V(v1).outE(lbl).filter(inV().is(v2)).hasNext()) {
        v1.addEdge(lbl, v2)
    }
}
Alternatively:
for (def triple in arg) {
    def (id1, id2, lbl) = triple
    def v1 = graph.vertices(id1).next()
    // Check for the edge by the target vertex id and resolve v2 only when needed
    if (!g.V(v1).outE(lbl).filter(inV().hasId(id2)).hasNext()) {
        v1.addEdge(lbl, graph.vertices(id2).next())
    }
}
Try both variants; at least one of them should outperform any other solution.
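If it helps to see the parameter passing end to end, here is a minimal sketch using the DataStax Python driver (the Java driver's SimpleGraphStatement accepts a parameter map the same way). The contact point and ids are illustrative, and a graph execution profile naming your graph is assumed to be configured:

from cassandra.cluster import Cluster
from cassandra.graph import SimpleGraphStatement

session = Cluster(['127.0.0.1']).connect()  # illustrative contact point

# The script is compiled once; only the bound 'arg' parameter varies,
# so the "Method code too large!" Groovy limit is never hit.
script = SimpleGraphStatement("""
for (def triple in arg) {
    def (id1, id2, lbl) = triple
    def v1 = graph.vertices(id1).next()
    def v2 = graph.vertices(id2).next()
    if (!g.V(v1).outE(lbl).filter(inV().is(v2)).hasNext()) {
        v1.addEdge(lbl, v2)
    }
}
""")

triples = [["id-a", "id-b", "label1"], ["id-c", "id-d", "label2"]]  # hypothetical ids
for i in range(0, len(triples), 1000):  # keep the batches of 1000
    session.execute_graph(script, {"arg": triples[i:i + 1000]})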

Related

How to traverse a graph and pattern-match a subgraph in Gremlin

I have a graph which is made of many instances of the same pattern (or subgraph).
The subgraph of interest is pictured below.
The relationship cardinalities between the nodes are:
s -> c (one-many)
c -> p (many-many)
p -> aid (one-many)
p -> rA (one-one)
p -> rB (one-one)
p -> o (many-one)
The goal is to return a list of all instances of this subgraph/pattern, shaped like the response below:
[
  {
    s-1,
    c-1,
    p-1,
    aid-1,
    o-1,
    rA-1,
    rB-1
  },
  {
    s-2,
    c-2,
    p-2,
    aid-2,
    o-2,
    rA-2,
    rB-2
  },
  {
    ... so on and so forth
  }
]
How do I query my graph to return this response?
I have tried using a combination of and() and or(), as shown below, but that did not capture the entire subpattern as desired:
g.V().hasLabel('severity').as('s').out('severity').as('c').out('affecting').as('p').
  and(
    out('ownedBy').as('o'),
    out('rA').as('rA'),
    out('rB').as('rB'),
    out('package_to_aid').as('aid')
  ).
  project('p', 'c', 's', 'o', 'rA', 'rB', 'aid').
    by(valueMap()).
    by(__.in('affecting').values('cve_id')).
    by(__.in('affecting').in('severity').values('severity')).
    by(out('ownedBy').values('name')).
    by(out('rA').valueMap()).
    by(out('rB').valueMap()).
    by(out('package_to_aid').values('aid'))
I know I can use a series of out() and in() steps to traverse a non-branching path (for example the nodes s->c->p); however, I am struggling with capturing/traversing paths that branch out (for example, the node p and its three children rA, rB, and o).
I looked at union() but I am unable to make it work either.
I am unable to find examples of similar queries online. Does Gremlin allow this sort of traversal, or do I have to remodel my graph as a linked list for this to work?
P.S. I am doing this on Cosmos DB, where the match() step is not supported.
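For what it's worth, project() alone can capture the branching: each by() modulator runs its own sub-traversal from the matched p vertex, so the and() block is not needed. A minimal gremlin-python sketch under that assumption (labels taken from the query above, endpoint illustrative, untested on Cosmos DB):

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Illustrative endpoint; Cosmos DB has its own connection details.
g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 'g'))

# Each by() traverses one branch out of 'p'; fold() collects the
# one-many aid branch instead of keeping only its first value.
result = (g.V().hasLabel('severity').as_('s').
          out('severity').as_('c').
          out('affecting').
          project('s', 'c', 'p', 'o', 'rA', 'rB', 'aid').
          by(__.select('s').values('severity')).
          by(__.select('c').values('cve_id')).
          by(__.valueMap()).
          by(__.out('ownedBy').values('name')).
          by(__.out('rA').valueMap()).
          by(__.out('rB').valueMap()).
          by(__.out('package_to_aid').values('aid').fold()).
          toList())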

Vertex in Python Gremlin not updating

Using gremlin-python on the Neptune workbench, I have two functions:
The first adds a vertex with a set of properties and returns a reference to the traversal.
The second adds further property steps to that traversal.
For some reason, the first function's operations are persisted to the DB, but the second's are not. Why is this?
Here are the two functions:
def add_v(v_type, name):
    tmp_id = get_id(f"{v_type}-{name}")
    result = g.addV(v_type).property('id', tmp_id).property('name', name)
    result.iterate()
    return result

def process_records(features):
    for i in features:
        v_type = i[0]
        name = i[1]
        v = add_v(v_type, name)
        if len(i) > 2:
            %debug
            props = i[2]
            for r in props:
                v.property(r[0], r[1]).iterate()
Your add_v function has already iterated the traversal. If you want to return the traversal from add_v in a way that lets you keep adding to it, remove the iterate().
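A minimal sketch of that fix, keeping the question's function names (get_id is the question's own helper):

def add_v(v_type, name):
    tmp_id = get_id(f"{v_type}-{name}")
    # Return the un-iterated traversal so callers can keep chaining steps.
    return g.addV(v_type).property('id', tmp_id).property('name', name)

def process_records(features):
    for i in features:
        v = add_v(i[0], i[1])
        if len(i) > 2:
            for key, value in i[2]:
                v = v.property(key, value)
        # Iterate exactly once, after every property has been chained.
        v.iterate()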

Using Parameters in Neo4j Relationship Queries

I'm struggling to work around a small limitation of Neo4j: I am unable to use a parameter for the relationship type in a Cypher query.
Christophe Willemsen has already graciously assisted me in working my query to the following:
MATCH (n1:Point { name: {n1name} }),
(n2:Point { name: {n2name} }),
p = shortestPath((n1)-[r]->(n2))
WHERE type(r) = {relType}
RETURN p
Unfortunately, as r is a collection of relationships and not a single relationship, this fails with an error:
scala.collection.immutable.Stream$Cons cannot be cast to org.neo4j.graphdb.Relationship
Removing the use of shortestPath() allows the query to run successfully but returns no results.
Essentially my graph is a massive collection of "paths" that link "points" together. It is currently structured as such:
http://console.neo4j.org/r/rholp
I need to be able to provide a starting point (n1name), an ending point (n2name), and a single path to travel along (relType). I need the query to return a list of nodes (all the ones along the path).
Have I structured my graph incorrectly / not optimally? I am open to advice on whether the overall structure is not optimal as well as advice on how best to structure the query!
EDIT
Regarding your edit, the nodes() function returns the nodes along the path:
MATCH p=allShortestPaths((n:Point { name:"Point5" })-[*]->(n2:Point { name:"Point8" }))
WHERE ALL (r IN rels(p) WHERE type(r)={relType})
RETURN nodes(p)
In the console link, it returns the nodes Point5, Point6, Point7 and Point8.
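Since the relationship type cannot be parameterized inside the pattern itself, the WHERE clause carries the filter. A minimal sketch of running that query from the official neo4j Python driver, if that's useful - connection details are illustrative, and the modern $param syntax replaces the older {param} style shown above:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

query = """
MATCH p = allShortestPaths((n:Point {name: $n1name})-[*]->(n2:Point {name: $n2name}))
WHERE ALL (r IN relationships(p) WHERE type(r) = $relType)
RETURN [node IN nodes(p) | node.name] AS names
"""

with driver.session() as session:
    record = session.run(query, n1name="Point5", n2name="Point8",
                         relType="PATH_TWO").single()
    print(record["names"])  # expect ['Point5', 'Point6', 'Point7', 'Point8']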
I guess that in your case, using a common relationship type name for connecting your Point nodes would be more efficient.
If having Path1, Path2, ... is just for knowing the distance between two points, you can easily get the distance by asking for the length of the path, as in this query against your console link:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=shortestPath((n)-[]->(n2))
RETURN length(p)
If you need to return only paths of a defined relationship length, you can drop the shortestPath() and specify a strict depth instead:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=(n)-[*3..3]->(n2)
RETURN length(p)
LIMIT 1
As you can see here, specifying the relationship type is not mandatory; you can just omit it, or add the :NEXT type if you have other relationship types in your graph.
If you need to match on the type, e.g. the path from Point5 to Point8 in your console link where the path can only have PATH_TWO relationships, then you can do this:
MATCH (n:Point { name:"Point5" })
WITH n
MATCH (n2:Point { name:"Point8" })
WITH n, n2
MATCH p=(n)-[r*]->(n2)
WHERE type(r[0]) = 'PATH_TWO'
WITH p, length(p) AS l
ORDER BY l
RETURN p, l
LIMIT 1
If you really NEED to have the Path1, Path2 style, a short explanation of why could help us find a more appropriate query.
MATCH p=shortestpath((n1:Point{name:{n1name}})-[:relType *]->(n2:Point {name:{n2name}}))
RETURN p

Making cyclic graphs in F#. Is mutability required?

I'm trying to build a cyclic graph in F#.
My node type looks something like this:
type Node = { Value : int; Edges : Node list }
My question is: Do I need to make Edges mutable in order to have cycles?
F# makes it possible to create immediately recursive object references with cycles, but this really only works on (fairly simple) records. So, if you try this with your definition, it won't work:
let rec loop =
    { Value = 0;
      Edges = [loop] }
However, you can still avoid mutation - one reasonable alternative is to use lazy values:
type Node = { Value : int; Edges : Lazy<Node list>}
This way, you are giving the compiler "enough time" to create a loop value before it needs to evaluate the edges (and access the loop value again):
let rec loop =
    { Value = 0;
      Edges = lazy [loop] }
In practice, you'll probably want to call some functions to create the edges, but that should work too. You should be able to write e.g. Edges = lazy (someFancyFunction loop).
Alternatively, you could also use seq<Node> for the edges (as sequences are lazy by default), but that would re-evaluate the edges every time, so you probably don't want to do that.
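For comparison, the same trick sketched in Python (purely illustrative, with hypothetical names): a zero-argument closure plays the role of Lazy, giving the cycle "enough time" to form before the edge list is built:

class Node:
    """A node whose edges are computed on first access, like F#'s Lazy."""
    def __init__(self, value, edges_thunk):
        self.value = value
        self._edges_thunk = edges_thunk
        self._edges = None

    @property
    def edges(self):
        if self._edges is None:
            self._edges = self._edges_thunk()  # forced once, then cached
        return self._edges

# The lambda captures the name 'loop', which is only looked up when
# edges is first accessed, by which time the assignment has completed.
loop = Node(0, lambda: [loop])
assert loop.edges[0] is loop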

Parallelize function on dictionary in IPython

Up till now, I have parallelized functions by mapping them onto lists that are distributed out to the various engines using map_sync(function, list).
Now, I need to run a function on each entry of a dictionary.
map_sync does not seem to work on dictionaries. I have also tried to scatter the dictionary and use decorators to run the function in parallel. However, dictionaries don't seem to lend themselves to scattering either. Is there some other way to parallelize functions on dictionaries without having to convert them to lists?
These are my attempts thus far:
from IPython.parallel import Client
rc = Client()
dview = rc[:]
test_dict = {'43': "lion", '34': "tiger", '343': "duck"}
dview.scatter("test", test_dict)
dview["test"]
# this yields [['343'], ['43'], ['34'], []] on 4 engines,
# which suggests that a dictionary can't be scattered?
Needless to say, when I run the function itself, I get an error:
@dview.parallel(block=True)
def run():
    for d, v in test.iteritems():
        print d, v

run()
AttributeError                            Traceback (most recent call last)
   in ()
   in run(dict)
AttributeError: 'str' object has no attribute 'iteritems'
I don't know if it's relevant, but I'm using an IPython Notebook connected to Amazon AWS clusters.
You can scatter a dict with:
def scatter_dict(view, name, d):
    """partition a dictionary across the engines of a view"""
    ntargets = len(view)
    keys = d.keys()  # list(d.keys()) in Python 3
    for i, target in enumerate(view.targets):
        subd = {}
        for key in keys[i::ntargets]:
            subd[key] = d[key]
        view.client[target][name] = subd
scatter_dict(dview, 'test', test_dict)
and then operate on it remotely, as you normally would.
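For example (a hypothetical follow-up: shout is an illustrative function, and apply_sync is a standard DirectView method; each engine sees only its own partition in the global test):

def shout():
    # runs on each engine; 'test' resolves in the engine's namespace
    return dict((k, v.upper()) for k, v in test.items())

partial_results = dview.apply_sync(shout)
# a list of per-engine dicts, e.g. one of them being {'343': 'DUCK'}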
You can also gather the remote dicts into one local one again with:
def gather_dict(view, name):
    """gather dictionaries from a DirectView"""
    merged = {}
    for d in view.pull(name):
        merged.update(d)
    return merged
gather_dict(dview, 'test')
An example notebook
