Gremlin where query won't use values from labeled vertices - gremlin

I'm trying to find the edge between two vertices in a single query on Neptune, but something is really weird.
Here's the query
g.V().has("PRINCIPAL", "principal id", "Test User").as("principal").id().as("principal_id")
.select("principal").out().hasLabel("LICENSE").as("license").valueMap(true).as("license_vm")
.select("license").in("is attached to").as("attachments")
.select("license").inE().where(outV().hasId(select("principal").id()))
.valueMap(true)
I know it's complicated, but here's the idea:
Visit a PRINCIPAL vertex, save a reference to it as "principal" and a reference to its id as "principal_id".
Visit its out() vertices and label them as "license". I'm also saving their valueMaps for later.
Return to license, and get all of the in() vertices that are connected by an edge labelled "is attached to"; save a reference to those as "attachments".
Return to license, but now get the in-edge that comes from my "principal" (from earlier). This is where it gets weird: no matter what I do here, I can't get it to recognize the principal in a where clause. The above example fails, but if I copy-paste the literal id, it works fine.
What am I missing?

Replace the where step on the second to last line of the query with:
filter(outV().where(eq('principal')).by(T.id))
That should get it working.
The hasId step can take either a predicate such as gt(123) or a list of one or more ID values. That's why it works when you use the ID value.

Related

How to filter edges by attributes in Gephi?

I have some edges with their corresponding labels, and I want to filter in only records with label 1, but it just doesn't work as shown below.
The filter works for nodes but not for edges. I thought the cause might be that there were too many edges, so I tried .gexf files with only a few hundred edges, but the problem remains. I also tried creating a new column in the app and creating the column with Python in the .gexf file, but both failed. Sometimes an error appears: an error occurred while fetching data.
I wonder how to filter in only matched edges on Gephi?
It seems that you must have an entry for every edge in the Label column.
What you can do in your situation:
Sort the edges according to Label by clicking on the column name (might click twice).
Select edges that don't have a label yet.
Right-click: Edit all edges.
Give a default Label in the edit menu.
If you don't already have labels and want to manually assign them in the Data Table, you can also use Fill column with a value and give a default value to every edge.
This is probably a bug, since we sometimes get a NullPointerException, likely because the filter doesn't expect null values in the Label column (at least judging from a quick glance at the stack trace). You might file this on their GitHub Issue Page over here.
In addition:
A useful tutorial notes: "However, looking at the "catalogue" of filters, we see no filter on Label. The reason is that Label is an internal property of nodes, inaccessible to filters. So we must first copy the Labels of the nodes in a new attribute, which we will be able to apply a filter on." Whilst the tutorial refers to nodes the same idea works for edges: create a new edge column named whatever you choose and copy your edge labels into it. You can filter using this new column. NB: I can find the filter under Attributes: Equal but not Attributes: Partition, but it may help you. NNB: If you can't see the filter after creating the new column, you may have to hit Reset at the top of the Filters panel.

Shortest Path between nodes of some specific type and destination node

Say I want to find the shortest path between some node of a specific type (say "central production unit") and a defined end node (say "consumer" with an id). How do I calculate this in Neo4j with Cypher?
With such queries I'd like to answer questions like: "Which production unit feeds this customer over the shortest distance?"
I tried with queries like:
match p=AllShortestPaths((source:Asset)-[:LINKS_TO*]-(destination:Asset))
where source.type = 'central production unit' and destination.id = '1234'
return extract(n in nodes(p)| n.type) as type_path,
extract(n in nodes(p)| n.id) as id_path,
length(p) as path_length;
Queries like the one above run into an out-of-memory error.
Using the same query with a specific id instead of a node type works perfectly fine.
Looking around on Stack Overflow I've found several examples going from one specific node to one other specific node, but none going from a yet-to-be-determined node of a certain type to one specific node.
I think I've found a solution using the spanningTree procedure.
This works pretty fast! I do not understand why, though, or how to include link properties so that I minimize a physical property instead of the number of hops.
// first match to and collect end nodes
MATCH (m:Asset {type:'central production unit'})
WITH collect(m) as endNodes
MATCH (n:Asset {id:'1234'})
// proc call will be executed per n node, finding the first shortest path found from n to one of the end nodes
CALL apoc.path.spanningTree(n, {endNodes:endNodes, limit:1}) YIELD path
RETURN path
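A likely reason the spanningTree version is fast: it expands once, outward from the single consumer node, visits each node at most once, and stops at the first matching end node, instead of evaluating shortest paths for every candidate source. Here is a minimal Python sketch of that idea. The graph, the node names and the nearest_of_type helper are all made up for illustration, and this is not how APOC is implemented internally.

```python
from collections import deque

def nearest_of_type(adjacency, node_type, start, wanted_type):
    """Breadth-first search from `start`; return the hop-shortest path to the
    first node whose type equals `wanted_type`, or None if there is none."""
    previous = {start: None}          # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node_type.get(node) == wanted_type:
            path = []
            while node is not None:   # walk back to the start node
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:   # each node is expanded at most once
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical undirected asset network (illustration only).
adjacency = {
    "1234": ["j1"],
    "j1": ["1234", "j2", "cpu_a"],
    "j2": ["j1", "j3"],
    "j3": ["j2", "cpu_b"],
    "cpu_a": ["j1"],
    "cpu_b": ["j3"],
}
node_type = {
    "1234": "consumer",
    "cpu_a": "central production unit",
    "cpu_b": "central production unit",
}

result = nearest_of_type(adjacency, node_type, "1234", "central production unit")
print(result)  # ['1234', 'j1', 'cpu_a']
```

Note that this only minimizes the number of hops. Minimizing a physical property stored on the links would need a weighted search instead of plain breadth-first expansion.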

Why does SELECT then performing a step like hasId() change what was selected?

Am I not using select() properly in my code? When I re-select("pair") for some reason, what it contained originally has been updated after performing some step. Shouldn't what was labeled using as() preserve what was contained?
g.V()
.hasLabel("Project")
.hasId("parentId","childId").as("pair")
.select("pair")
.hasId("parentId").as("parent")
.select("pair") // no longer what it was originally set to
I think this is expected. You (presumably) find two vertices with hasId("parentId","childId") and so the first select("pair") would of course show each vertex. But, then you filter again, hasId("parentId") and kill the traverser that contains the vertex with the id of "childId". It gets filtered away and therefore never triggers the second/last select("pair") step and would only therefore return the one vertex that has the id of "parentId".
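To make the traverser behaviour concrete, here is a small Python model of it. This is only a sketch: the has_id, as_ and select helpers and the dict-based traversers are made up for illustration and are not TinkerPop's implementation. Each traverser carries its current vertex plus its own labels, and a filter step drops whole traversers.

```python
def has_id(traversers, *ids):
    """Filter step: keep only traversers whose current vertex id is in `ids`."""
    return [t for t in traversers if t["current"] in ids]

def as_(traversers, label):
    """Label step: each traverser remembers its current vertex under `label`."""
    return [{**t, "labels": {**t["labels"], label: t["current"]}} for t in traversers]

def select(traversers, label):
    """Select step: each surviving traverser jumps back to what it labeled."""
    return [{**t, "current": t["labels"][label]} for t in traversers]

# One traverser per vertex, as g.V() would produce (hypothetical ids).
start = [{"current": vid, "labels": {}} for vid in ["parentId", "childId", "otherId"]]

pair = as_(has_id(start, "parentId", "childId"), "pair")
first_select = select(pair, "pair")
parent = as_(has_id(first_select, "parentId"), "parent")   # drops the childId traverser
second_select = select(parent, "pair")

print([t["current"] for t in first_select])   # ['parentId', 'childId']
print([t["current"] for t in second_select])  # ['parentId']
```

The "pair" label is not a shared collection that gets overwritten. Each traverser holds its own copy, and the one holding "childId" simply no longer exists by the time the last select("pair") runs.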

How to retrieve more than 2 properties in a path in gremlin.

I wanted to get two properties for each vertex in the result, but I only got one. This is the Gremlin query I used:
g.V().repeat(out()).until(has('title','school')).path().by('title').by('name')
How can I get both of them?
The by() modulators are applied round-robin to the Path objects so, for the first item in the path you'll get "title", then the second item will get "name", then the third item, 'title'. If you want both "title" and "name" for each vertex in the path then you need to specify that in a single by().
by() can take more than just a string (i.e. property key) as a value. It can also take a traversal and therefore you have many options to get what you want. Here's one way to do it:
g.V().repeat(out()).until(has('title','school')).
path().by(values('name','title').fold())
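The round-robin behaviour can be mimicked in a few lines of Python. This is a sketch only: the apply_by_round_robin helper and the sample vertices are made up for illustration, with each vertex modelled as a dict of properties.

```python
from itertools import cycle

def apply_by_round_robin(path, modulators):
    """Apply the modulators to successive path elements, cycling when they run out."""
    return [by(element) for element, by in zip(path, cycle(modulators))]

# Hypothetical three-vertex path.
path = [
    {"title": "student", "name": "Ann"},
    {"title": "class", "name": "Math 101"},
    {"title": "school", "name": "Central High"},
]

by_title = lambda v: v["title"]
by_name = lambda v: v["name"]
by_both = lambda v: [v["name"], v["title"]]   # plays the role of values('name','title').fold()

alternating = apply_by_round_robin(path, [by_title, by_name])
combined = apply_by_round_robin(path, [by_both])

print(alternating)  # ['student', 'Math 101', 'school']
print(combined)     # [['Ann', 'student'], ['Math 101', 'class'], ['Central High', 'school']]
```

With two modulators the output alternates between title and name, which matches what the question observed. With a single modulator that returns both values, every element in the path gets both.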

How to get a path from one node to another including all other nodes and relationships involved in between

I have designed a model in Neo4j in order to get paths from one station to another, including the platforms and legs involved. The model is depicted below. Basically, I need a query that takes me from NBW to RD and also shows the platforms and legs involved. I am struggling with the query: I get no result. I'd appreciate any help.
Here is my cypher statement:
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]->(c:Station)
WHERE (a.name='NBW')
AND c.name='RD'
RETURN p
Model:
As mentioned in the comments, in Cypher you can't use a directed variable-length relationship that uses differing directions for some of the relationships.
However, APOC Procedures just added the ability to expand based on sequences of relationships. You can give this a try:
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='THT'
CALL apoc.path.expandConfig(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>, can_board>, goto>, can_alight>, <has_platform'}) YIELD path
RETURN path
I added a limit so that only the first (and shortest) path to your end station will be returned. Removing the limit isn't advisable, since this will continue to repeat the relationships in the expansion, going from station to station, until it finds all possible ways to get to your end station, which could hang your query.
EDIT
Regarding the new model changes, the reason the above will not work is because relationship sequences can't contain a variable-length sequence within them. You have 2 goto> relationships to traverse, but only one is specified in the sequence.
Here's an alternative that doesn't use sequences, just a whitelisting of allowed relationships. The spanningTree() procedure uses NODE_GLOBAL uniqueness so there will only be a single unique path to each node found (paths will not backtrack or revisit previously-visited nodes).
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='RD'
CALL apoc.path.spanningTree(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>|can_board>|goto>|can_alight>|<has_platform'}) YIELD path
RETURN path
Your query is directed --> and not all of the relationships between your two stations run in the same direction. If you remove the relationship direction you will get a result.
Once you have a result, something like the query below could point you in the right direction for extracting particular details from the resulting path.
Essentially, I am assuming that everything you are interested in is in the returned path, so you just need to filter out the different pieces.
As @InverseFalcon points out, this query should be limited in a larger graph or it could easily run away.
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]-(c:Station)
WHERE (a.name='NBW')
AND c.name='THT'
RETURN filter( n in nodes(p) WHERE 'Platform' in labels(n)) AS Platforms
