Unpack vertices' properties returned from a select() - gremlin

I would like to write a MATCH query in gremlin to find a subgraph matching a particular pattern. The pattern of interest contains 4 different types/labels of nodes: c, p, r, and s. And 4 different types of edges as shown below:
(c)-[affecting]->(p)
(c)-[c_found_in_release]->(r)
(p)-[p_found_in_release]->(r)
(s)-[severity]->(c)
So far I have the query below, which works fine; however, the results do not show the properties of the vertices returned. Since the vertices returned from the select() step belong to different types of nodes, I cannot use something like value() or valueMap().
g.V().match(
__.as('c').out('affecting').as('p'), \
__.as('c').out('cve_found_in_release').as('r'), \
__.as('p').out('pack_found_in_release').as('r'), \
__.as('s').both('severity').as('c') \
). \
select('c', 'p', 'r', 's').limit(10)
Current result:
==>[c:v[0],p:v[3],r:v[6],s:v[10]]
How to get something more detailed like this instead:
Desired result:
==>[
c:[cve_id:[CVE-2021-3618],publishedOn:[2022-03-23]],
p:[name:[vsftpd],version:[3.0.3]],
r:[sourceBranch:[1.0],detectedOn:[2022-04-05],status:[Upgraded]],
s:[severity:[High]],
]

You can simply add by() modulators after the select() step; they are applied in order to the selected labels. On the TinkerGraph modern example graph:
g = TinkerFactory.createModern().traversal()
g.V().match(
__.as('v').hasLabel('software').as('s'),
__.as('s').both().hasLabel('person').as('p')
).select('s', 'p')
.by(values('name', 'lang').fold())
.by(values('name', 'age').fold())
==>[s:[lop,java],p:[marko,29]]
==>[s:[lop,java],p:[josh,32]]
==>[s:[lop,java],p:[peter,35]]
==>[s:[ripple,java],p:[josh,32]]

Related

How to migrate Presto `map` function to hive

The Presto map() function is quite a bit easier to use than the Hive one. A Presto map() invocation takes two lists: the first for the keys, the second for the values.
A Hive map() takes a variable-length (varargs) parameter list of alternating keys and values.
Here is a query snippet that I need to migrate (backwards?) from Presto to Hive:
, map(
concat(map_keys(decision_feature_importance), array['id_queue', 'queue_disposition']),
concat(map_values(decision_feature_importance), array[CAST(id_queue AS VARCHAR), queue_disposition])) other_info
The core of it is that the Presto map() accepts two parallel arrays, but Hive objects rather strongly to that. What is the pattern to [reverse-?] migrate the map()?
There are several questions about zipping lists in Hive, e.g. "hive create map or key/value pair from two arrays". The answers are pretty complicated and may involve UDFs (which I do not have the ability to create) or libraries such as brickhouse (which I do not have the ability to install on a shared cluster with hundreds of users). They also cover only a portion of the problem here.
The following toy query shows how to build the Hive-format map entries from two parallel lists. Basically, we need to zip the lists manually, since Hive has no builtin function for this.
Hive partial equivalent
with mydata as (
select 1 id, map('key11','val11','key12','val12','key13','val13') as mymap
union all
select 2 id, map('key21','val21','key22','val22','key13','val13') as mymap
)
select split(concat_ws(',',collect_list(concat(key,',',value ))),',') keyval from (
select * from mydata lateral view outer explode (mymap) m
) d;
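The manual zip described above amounts to interleaving two parallel lists into one alternating key, value sequence, which is the argument shape Hive's map() expects. A minimal Python sketch of that reshaping, meant only to illustrate the idea (the function name interleave and the sample values are made up for this example and are not part of Hive or Presto):

```python
from itertools import chain


def interleave(keys, values):
    """Return [k1, v1, k2, v2, ...] from two parallel lists of equal length."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    # zip pairs the lists element by element; chain flattens the pairs.
    return list(chain.from_iterable(zip(keys, values)))


# Hypothetical sample data standing in for the two parallel arrays.
keys = ["feature_a", "id_queue", "queue_disposition"]
values = ["0.7", "42", "approved"]
alternating = interleave(keys, values)
print(alternating)
```

The Hive query above reaches the same shape by exploding the map into (key, value) rows and concatenating them back into one comma-separated list.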

Gremlin continue traversal only if 2 vertices are not the same

I have a query which looks at 2 different vertices and I want to stop traversing if they don't both roll up to the same root ancestor via a path of "contains" edges.
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).as('node2Root')
//FILTER|WHERE clause
I'd like to confirm that node1Root and node2Root are the same vertex before continuing the traversal, but for the life of me I cannot figure out how to do this.
I've tried the following:
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).as('node2Root')
//.where('node1Root', P.eq('node2Root')
//.where(select("node1Root").is(P.eq("node2Root")))
//.where(select("node1Root").is("node2Root"))
What's interesting is that the following query does work to filter appropriately.
g.V('node1').as('1')
.V('node2').as('2')
.where('1', P.eq('2'))
I'm not sure if there's something up with the until/repeat that screws it up or if I'm just doing something blatantly wrong. Any help would be much appreciated.
Thanks!
I found "How to check equality with nodes from an earlier part of query in Gremlin?"
It seems that you can use as() with the same key as a previous as(), and if the two match, they are considered equal.
So here's the winner (I think):
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).as('node2Root')
.where(select('node1Root').as('node2Root'))
//.not(select('node1Root').as('node2Root')) //OR this to determine they aren't the same
//continue traversal
I also found that my original issue was that the .until().repeat() steps could return a list. In my case I know that my graph model will always return a single 'root', so to make it work I can use unfold():
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).unfold().as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).unfold().as('node2Root')
.where('node1Root', P.eq('node2Root'))
I think I'll be going with the second solution because I'm much more confident in it, unless I hear otherwise.
You can try this Gremlin query:
g.V(node1-id)
.map(until(hasLabel('root')).repeat(in().aggregate('x')).cap('x')).as("array")
.V(node2-id)
.until(
as("i").select("array").unfold().as("j")
.where("i", eq("j"))
).repeat(in())
Here we first put all the vertices on the path from node1 to the root into an array, and then, while walking up from node2, check whether the current vertex exists in that array.
This query only works when the traversal starts from a single vertex, because the aggregate() step collects into a side effect that is global to the traversal, so the array would be the same for every starting vertex. To fix this, if you are running on the JVM, you can use a lambda (Groovy closure):
g.V(node-start-id-1,node-start-id-2)
.map(
{ x->
var v = x.get()
var g = getGraph().get().traversal();
g.V(v.id()).until(hasLabel('root')).repeat(in().aggregate('x')).cap('x').next()
}
)
.as("array")
.V(node2-id)
.until(
as("i").select("array").unfold().as("j")
.where("i", eq("j"))
).repeat(in())
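The idea behind this answer (collect the ancestors of the first vertex, then climb from the second vertex until one of them is reached) can be sketched outside Gremlin. The Python below is only an illustration under assumptions: the graph is reduced to a hypothetical child-to-parent dictionary standing in for the 'contains' edges, and the vertex names are invented.

```python
def ancestors(parent, node):
    """Collect every vertex on the path from node up to its root (node excluded)."""
    seen = []
    while node in parent:
        node = parent[node]
        seen.append(node)
    return seen


def first_shared_ancestor(parent, node1, node2):
    """Climb from node2 until reaching a vertex on node1's path; None if there is none."""
    on_path = set(ancestors(parent, node1))
    current = node2
    while current not in on_path:
        if current not in parent:  # reached a root without finding a match
            return None
        current = parent[current]
    return current


# Hypothetical hierarchy: child -> parent, roots have no entry.
parent = {
    "node1": "folderA",
    "folderA": "root1",
    "node2": "folderB",
    "folderB": "root1",
    "node3": "root2",
}
shared = first_shared_ancestor(parent, "node1", "node2")
unrelated = first_shared_ancestor(parent, "node1", "node3")
print(shared, unrelated)
```

Because each call builds its own ancestor set, this sketch does not have the shared-array limitation that the global aggregate() side effect introduces in the first query.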

Gremlin - Using an OR step to get different types of connected vertices

So I have a graph schema where vertex type A can connect inwards to vertex type B or type C in a one to many relationship. I'm trying to write a query that outputs any of those relationships if they exist, for instance a sample output would be:
Type A | Type B | Type C
Sample1A,'', Sample1C
Sample2A, Sample2B, ''
Sample3A, Sample3B, Sample3C
Sample4A, 'Sample4Ba, Sample4Bb', Sample4C
The fourth example is if A is connected to multiple B types. If B and C don't exist, then nothing is output.
So far I have the query:
g.V().hasLabel('A').as('A').in('connect').hasLabel('B').as('B').or().in('connect').hasLabel('C').as('C').select('A','B','C')
But this query only returns the A vertices without any B's or C's.
Using AWS Neptune if that matters.
As Kevin mentioned in the comments, you can use the project() step for this scenario.
g.V().hasLabel("A").as("A").project("a","b", "c")
.by(select("A"))
.by(choose(in("connect").hasLabel("B").count().is(0), constant("NO_B"), in("connect").hasLabel("B")))
.by(choose(in("connect").hasLabel("C").count().is(0), constant("NO_C"), in("connect").hasLabel("C")));
Your or() steps are not returning a result as written. You could simplify the query as follows:
g.V().hasLabel('A').as('A').in('connect').hasLabel(within('B','C')).as('B').select('A','B')
This avoids using select('A','B','C') as only one of 'B' or 'C' will have a result in the or() case.
Here is a version that still uses or()
g.V().hasLabel('A').as('A').in().or(hasLabel('B'),hasLabel('C')).as('B').select('A','B')
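To make the intended result shape concrete, here is a small Python sketch of the "one row per A, with a placeholder when B or C is missing" projection. It is not Gremlin and makes assumptions: the edge list, the vertex names, and the function project_neighbors are invented for illustration, and A vertices with neither B nor C are dropped, as the question asks.

```python
from collections import defaultdict


def project_neighbors(a_vertices, edges):
    """Build one row per A vertex from (source, source_label, target) edges."""
    by_target = defaultdict(lambda: {"B": [], "C": []})
    for source, label, target in edges:
        if label in ("B", "C"):
            by_target[target][label].append(source)
    rows = []
    for a in a_vertices:
        found = by_target.get(a)  # .get() does not create missing entries
        if found is None:
            continue  # A has neither B nor C: emit nothing
        rows.append({
            "a": a,
            "b": found["B"] or "NO_B",  # placeholder when no B is connected
            "c": found["C"] or "NO_C",  # placeholder when no C is connected
        })
    return rows


# Hypothetical data: each edge points inwards to an A vertex.
edges = [
    ("Sample1C", "C", "Sample1A"),
    ("Sample2B", "B", "Sample2A"),
    ("Sample4Ba", "B", "Sample4A"),
    ("Sample4Bb", "B", "Sample4A"),
    ("Sample4C", "C", "Sample4A"),
]
rows = project_neighbors(["Sample1A", "Sample2A", "Sample4A", "Sample5A"], edges)
for row in rows:
    print(row)
```

Note that the sketch keeps every connected B for an A vertex in a list, which corresponds to the 'Sample4Ba, Sample4Bb' case in the question.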

zsh : Testing the existence of a variable via indirect reference

If I want to know whether variable v exists in zsh, I can use ${+v}. Example:
u=xxx
v=
print ${+u} ${+v} ${+w}
outputs 1 1 0.
If I want to access the content of a variable, where I have the NAME of it stored in variable v, I can do it with ${(P)v}. Example:
a=xxx
b=a
print ${(P)b}
outputs xxx.
Now I would like to combine the two: Testing whether a variable exists, but the name of the variable is stored in another variable. How can I do this? Example:
r=XXX
p=r
q=s
Here is my approach which does NOT work:
print ${+${(P)p}} # Expect 1, because $p is r and r exists.
print ${+${(P)q}} # Expect 0, because $q is s and s does not exist
However, I get the error message zsh: bad substitution.
Is there a way I can achieve my goal without reverting to eval?
print ${(P)+p}
print ${(P)+q}
The opening parenthesis of a Parameter Expansion Flag needs to follow immediately after the opening brace. Also, it is not necessary to explicitly substitute p or q, as (P) takes care of that. Nevertheless, ${(P)+${p}} and ${(P)+${q}} would also work.

PostScript forall on dictionaries

According to the PLRM it doesn't matter in which order you execute a forall on a dict:
(p. 597) forall pushes a key and a value on the operand stack and executes proc for each key-value pair in the dictionary
...
(p. 597) The order in which forall enumerates the entries in the dictionary is arbitrary. New entries put in the dictionary during the execution of proc may or may not be included in the enumeration. Existing entries removed from the dictionary by proc will not be encountered later in the enumeration.
Now I was executing some code:
/d 5 dict def
d /abc 123 put
d { } forall
My output (operand stack) is:
--------top-
/abc
123
-----bottom-
The output of Ghostscript and the PLRM (operand stack) is:
--------top-
123
/abc
-----bottom-
Does it really not matter in what order you process the key-value pairs of the dict?
On the stack, do you first need to push the value and then the key, or do you need to push the key first? (The PLRM only talks about "a key and a value", but doesn't tell you anything about the order.)
Thanks in advance.
It would probably help if you quoted the page number when you quote sections from the PLRM; it's hard to see where you are getting this from.
When executing forall, the order in which it enumerates the dictionary pairs is arbitrary; you have no influence over it. However, forall always pushes the key and then the value. Even if this is only implied in the text you (didn't quite) quote, you can see from the example in the description of the forall operator that this is the case.
When you say 'my output', do you mean you are writing your own PostScript interpreter? If so, then your output is incorrect: when pushing a key/value pair, the key is pushed first.
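For someone writing their own interpreter, the push order can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not PostScript: a list stands in for the operand stack (top at the end), a dict stands in for the dictionary, and Python's insertion-ordered iteration replaces the arbitrary enumeration order of a real interpreter.

```python
def forall(dictionary, proc, stack):
    """Mimic forall on a dict: for each entry push the key, then the value, then run proc."""
    for key, value in list(dictionary.items()):
        stack.append(key)    # key is pushed first, so it ends up lower
        stack.append(value)  # value is pushed second, so it ends up on top
        proc(stack)


stack = []
d = {"/abc": 123}
forall(d, lambda s: None, stack)  # empty procedure, like `d { } forall`
print(stack)  # the last element is the top of the stack
```

With an empty procedure, the value 123 is left on top and /abc directly below it, which matches the Ghostscript stack shown in the question.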

Resources