Can Gremlin aggregate the values of edges connected to the same node? - gremlin

Suppose you have one node with label 'A'. This node is connected to many nodes with label 'B' via edges with label 'e'. For a given B, there can be many edges between A and B with the same label, 'e'. On each edge, there is a property, 'p'.
We want to aggregate all the 'p' properties from the edges that connect A to the same B.
E.g. suppose we have a particular B. One edge between A and that B has a 'p' value of 'foo', and another edge connecting to the same B has a 'p' value of 'bar'. Their aggregation would be:
{'e': {'p': ['foo', 'bar']}}
How can this be achieved?
At the moment, I have this query:
g.V()
.hasLabel('A').as('A')
.outE().hasLabel('e').as('e')
.inV().hasLabel('B').as('B')
.select('A', 'e', 'B')
.by(valueMap())
It would produce an output like this:
[
{'A': {'name': ['john']}, 'e': {'p': ['foo']}, 'B': {'place': 'Qatar'}},
{'A': {'name': ['john']}, 'e': {'p': ['bar']}, 'B': {'place': 'Qatar'}},
{'A': {'name': ['john']}, 'e': {'p': ['hello']}, 'B': {'place': 'Argentina'}},
{'A': {'name': ['john']}, 'e': {'p': ['goodbye']}, 'B': {'place': 'Argentina'}}
]
Whereas, I would want this:
[
{'A': {'name': ['john']}, 'e': {'p': ['foo', 'bar']}, 'B': {'place': 'Qatar'}},
{'A': {'name': ['john']}, 'e': {'p': ['hello', 'goodbye']}, 'B': {'place': 'Argentina'}}
]

Using the data from the question, the following graph can be built:
g.addV('A').property('name','john').property(id,'J1').as('j').
addV('B').property('place','Qatar').property(id,'Q1').as('q').
addV('B').property('place','Argentina').property(id,'A1').as('a').
addE('e').from('j').to('q').property('p','foo').
addE('e').from('j').to('q').property('p','bar').
addE('e').from('j').to('a').property('p','hello').
addE('e').from('j').to('a').property('p','goodbye')
Using that graph, we can get close to what you are looking for using a nested group step. From these building blocks you should be able to construct other variations:
g.V().hasLabel('A').as('a').outE('e').as('e').inV().hasLabel('B').as('b').
group().
by(select('a').values('name')).
by(group().
by(select('b').values('place')).
by(select('e').values('p').fold()))
Which yields
{'john': {'Argentina': ['hello', 'goodbye'], 'Qatar': ['foo', 'bar']}}
Using valueMap we can add the keys to the result:
g.V().hasLabel('A').as('a').outE('e').as('e').inV().hasLabel('B').as('b').
group().
by(select('a').values('name')).
by(group().
by(select('b').valueMap('place')).
by(select('e').valueMap('p').unfold().group().by(keys).by(values)))
Which produces
{'john': {{'place': ('Argentina',)}: {'p': ['hello', 'goodbye']}, {'place': ('Qatar',)}: {'p': ['foo', 'bar']}}}
So what we end up with, for each person (just "john" in this case), is a map keyed by each place they visited, holding the "p" values of every edge that got them there. You can then select into that nested structure in whatever way you need to extract individual pieces. With these building blocks you should be able to tweak the query to get whichever variation of this output you prefer.
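As a rough client-side illustration of the same nested grouping (plain Python rather than Gremlin; the `rows` list is a hand-written stand-in for the A/e/B paths in the sample graph, not something returned by a driver), the grouped result can be rebuilt and then reshaped into the list form asked for in the question:

```python
from collections import defaultdict

# Hypothetical stand-in for the (A name, edge p, B place) paths in the sample graph.
rows = [
    ("john", "foo", "Qatar"),
    ("john", "bar", "Qatar"),
    ("john", "hello", "Argentina"),
    ("john", "goodbye", "Argentina"),
]

# Outer key: A's name; inner key: B's place; value: folded list of p values.
nested = defaultdict(lambda: defaultdict(list))
for name, p, place in rows:
    nested[name][place].append(p)

# Reshape into one record per (A, B) pair, as in the desired output.
records = [
    {"A": {"name": [name]}, "e": {"p": ps}, "B": {"place": place}}
    for name, places in nested.items()
    for place, ps in places.items()
]

for record in records:
    print(record)
```

The same reshaping could probably be done inside the traversal as well; doing it on the client is just one option if the nested map is easier to produce.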

Related

jq flatten list of objects into one object

I want to go from
[
{"key_skjdghkbs": "deep house"},
{"key_kjsskjbgs": "deadmau5"},
{"key_jhw98w4hl": "progressive house"},
{"key_sjkh348vg": "swedish house mafia"},
{"key_js3485jwh": "dubstep"},
{"key_jsg587jhs": "escape"}
]
to
{
"key_skjdghkbs": "deep house",
"key_kjsskjbgs": "deadmau5",
"key_jhw98w4hl": "progressive house",
"key_sjkh348vg": "swedish house mafia",
"key_js3485jwh": "dubstep",
"key_jsg587jhs": "escape"
}
Each object in the original list has exactly one key, and the keys are unique across the list.
I could do something like jq .[] .genre if the keys were the same, but they're not.
jq's add function does exactly this:
jq 'add'
Try this (assuming your file is named so72297039.json):
jq '[.[] | to_entries] | flatten | from_entries' < so72297039.json
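For readers more at home in Python, the effect of both jq expressions above is roughly a dictionary merge. The sketch below uses the input from the question; it is only meant to show the shape of the operation, not to replace jq:

```python
import json

# Input copied from the question: a list of single-key objects.
items = json.loads("""
[
  {"key_skjdghkbs": "deep house"},
  {"key_kjsskjbgs": "deadmau5"},
  {"key_jhw98w4hl": "progressive house"},
  {"key_sjkh348vg": "swedish house mafia"},
  {"key_js3485jwh": "dubstep"},
  {"key_jsg587jhs": "escape"}
]
""")

# Merge every object into one; with duplicate keys, the later value would win.
merged = {}
for obj in items:
    merged.update(obj)

print(json.dumps(merged, indent=2))
```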
(Edit: OP edited question, so here's relevant answer)
Since duplicate keys are not possible in a single object (see the other answer), you can produce a format like this instead:
{
"artist": [
"deadmau5",
"swedish house mafia"
],
"genre": [
"deep house",
"progressive house",
"dubstep"
],
"song": [
"escape"
]
}
with a jq call like this:
jq '
map(to_entries)
| flatten
| group_by(.key)
| map({key: first.key, value: map(.value)})
| from_entries
' input.json
If the keys artist, genre, and song are known in advance, an easier-to-understand expression can be used.
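The grouping done by the jq call above can be sketched in Python as follows. The input list is an assumption: the edited question's data is not shown here, so the sample below simply reuses the values from the original list under the keys artist, genre, and song.

```python
from collections import defaultdict

# Assumed input: single-key objects whose keys repeat (not taken from the thread).
items = [
    {"genre": "deep house"},
    {"artist": "deadmau5"},
    {"genre": "progressive house"},
    {"artist": "swedish house mafia"},
    {"genre": "dubstep"},
    {"song": "escape"},
]

# Collect every value under its key, keeping the original order of values.
buckets = defaultdict(list)
for obj in items:
    for key, value in obj.items():
        buckets[key].append(value)

# jq's group_by sorts groups by key, so sort the keys to mirror that.
grouped = {key: buckets[key] for key in sorted(buckets)}
print(grouped)
```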

Getting properties of a vertex along with the properties of its child vertices [Gremlin]

I’m using Gremlin for querying a Graph DB and I’m having trouble figuring out how to retrieve all properties of a specific vertex along with all properties of specific child vertices. I know valueMap() is generally the operation to use to expose properties of nodes, but I'm not sure I am using it correctly.
Here is a visual representation of the Graph that I am working with. In this graph there are author nodes, which can be related to multiple book nodes connected by a wrote edge. And a book node can be connected to multiple chapter nodes by a hasChapter edge. A book node has a title and year as additional properties, while a chapter node has a name and a length as additional properties:
Here is the data that produces the above graph:
g.addV('author').property(id, 'author1').as('a1').
addV('book').
property(id,'book1').
property('title', 'Book 1').
property('year', '1999').
as('b1').
addV('book').
property(id,'book2').
property('title', 'Book 2').
property('year', '2000').
as('b2').
addV('book').
property(id,'book3').
property('title', 'Book 3').
property('year', '2002').
as('b3').
addE('wrote').from('a1').to('b1').
addE('wrote').from('a1').to('b2').
addE('wrote').from('a1').to('b3').
addV('chapter').
property(id,'b1chapter1').
property('name', 'The Start').
property('length', '350').
as('b1c1').
addV('chapter').
property(id,'b1chapter2').
property('name', 'Trees').
property('length', '500').
as('b1c2').
addV('chapter').
property(id,'b2chapter1').
property('name', 'Chapter 1').
property('length', '425').
as('b2c1').
addV('chapter').
property(id,'b2chapter2').
property('name', 'Chapter 2').
property('length', '650').
as('b2c2').
addV('chapter').
property(id,'b2chapter3').
property('name', 'Chapter 3').
property('length', '505').
as('b2c3').
addE('hasChapter').from('b1').to('b1c1').
addE('hasChapter').from('b1').to('b1c2').
addE('hasChapter').from('b2').to('b2c1').
addE('hasChapter').from('b2').to('b2c2').
addE('hasChapter').from('b2').to('b2c3').
iterate()
I would like to form a query that is able to return the properties of all books that are written by author1, along with all properties of each book’s chapters, ideally sorted by date (ascending). I'm wondering if it's possible to make a query that would return the results in the following fashion (or something similar enough that I can parse through on the client side):
{'year': ['1999'], 'title': ['Book 1'], 'chapters': [{'name': ['The Start'], 'length': ['350']}, {'name': ['Trees'], 'length': ['500']}]}
{'year': ['2000'], 'title': ['Book 2'], 'chapters': [{'name': ['Chapter 1'], 'length': ['425']}, {'name': ['Chapter 2'], 'length': ['650']}, {'name': ['Chapter 3'], 'length': ['505']}]}
{'year': ['2002'], 'title': ['Book 3'], 'chapters': []}
So far, I have attempted a few variations of this query with no luck:
g.V('author1').as('writer')
.out('wrote').as('written')
.order().by('year', asc)
.out('hasChapter').as('chapter')
.project('written', 'chapter')
.by(valueMap())
which returns:
{'written': {'name': ['The Start'], 'length': ['350']}, 'chapter': {'name': ['The Start'], 'length': ['350']}}
{'written': {'name': ['Trees'], 'length': ['500']}, 'chapter': {'name': ['Trees'], 'length': ['500']}}
{'written': {'name': ['Chapter 1'], 'length': ['425']}, 'chapter': {'name': ['Chapter 1'], 'length': ['425']}}
{'written': {'name': ['Chapter 2'], 'length': ['650']}, 'chapter': {'name': ['Chapter 2'], 'length': ['650']}}
{'written': {'name': ['Chapter 3'], 'length': ['505']}, 'chapter': {'name': ['Chapter 3'], 'length': ['505']}}
The above query only returns the chapter properties for books that have chapters, whereas I'm looking for a query that will give me all book properties (regardless of whether the book has chapters) and all chapter properties of each book. Does anyone have advice on using valueMap() across multiple levels of a traversal? Ideally I would like to avoid multiple queries to the Graph DB, but I'm open to solutions that involve doing so.
The query below gives exactly the result you need.
I think the key is projecting the values inside the by() modulator.
gremlin> g.V('author1').
......1> out('wrote').
......2> order().by('year', asc).
......3> project('year', 'title', 'chapters').
......4> by('year').
......5> by('title').
......6> by(out('hasChapter').valueMap().fold())
==>[year:1999,title:Book 1,chapters:[[name:[Trees],length:[500]],[name:[The Start],length:[350]]]]
==>[year:2000,title:Book 2,chapters:[[name:[Chapter 1],length:[425]],[name:[Chapter 2],length:[650]],[name:[Chapter 3],length:[505]]]]
==>[year:2002,title:Book 3,chapters:[]]
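To make the shape of that result concrete, here is a small Python sketch of the same "one record per book, with its chapters folded into a list" idea. The dictionaries are a hand-copied, simplified stand-in for the sample graph (single values instead of valueMap() lists); nothing here talks to a graph database.

```python
# Hypothetical in-memory copy of the sample data: books written by author1.
books = {
    "book1": {"title": "Book 1", "year": "1999"},
    "book2": {"title": "Book 2", "year": "2000"},
    "book3": {"title": "Book 3", "year": "2002"},
}

# Chapters reachable over hasChapter edges; book3 has none.
chapters = {
    "book1": [
        {"name": "The Start", "length": "350"},
        {"name": "Trees", "length": "500"},
    ],
    "book2": [
        {"name": "Chapter 1", "length": "425"},
        {"name": "Chapter 2", "length": "650"},
        {"name": "Chapter 3", "length": "505"},
    ],
}

# One record per book, ordered by year (string comparison is fine for 4-digit years).
# A book without chapters still appears, with an empty list, like fold() on no results.
result = [
    {"year": book["year"], "title": book["title"], "chapters": chapters.get(book_id, [])}
    for book_id, book in sorted(books.items(), key=lambda kv: kv[1]["year"])
]

for record in result:
    print(record)
```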

Aggregation/GroupCount on graphDatabase

I have a graph database in Gremlin shaped like this image:
Vertices:
Person
Event
Edge:
Attends
I need help to build a query to get results between all "Persons", with the edge as a count of all "Events" in common. The result should be something like this:
{
nodes: [
{id:"PersonA", label: "Person A"},
{id:"PersonB", label: "Person B"},
{id:"PersonC", label: "Person C"},
{id:"PersonD", label: "Person D"},
{id:"PersonE", label: "Person E"},
{id:"PersonF", label: "Person F"},
],
edges: [
{from: "PersonA", to: "PersonB", label: 1},
{from: "PersonA", to: "PersonC", label: 2},
{from: "PersonA", to: "PersonD", label: 2},
{from: "PersonA", to: "PersonE", label: 1},
{from: "PersonA", to: "PersonF", label: 1},
{from: "PersonB", to: "PersonC", label: 1},
{from: "PersonB", to: "PersonD", label: 1},
{from: "PersonC", to: "PersonD", label: 2},
{from: "PersonC", to: "PersonE", label: 1},
{from: "PersonC", to: "PersonF", label: 1},
{from: "PersonD", to: "PersonE", label: 1},
{from: "PersonD", to: "PersonF", label: 1},
{from: "PersonE", to: "PersonF", label: 1}
]
}
I've been struggling with this for a few hours and can't get anything close to what I'm looking for.
A picture is nice, but when asking questions about Gremlin it's best to provide a Gremlin script to create your data:
g.addV('person').property(id,'a').as('a').
addV('person').property(id,'b').as('b').
addV('person').property(id,'c').as('c').
addV('person').property(id,'d').as('d').
addV('person').property(id,'e').as('e').
addV('person').property(id,'f').as('f').
addV('event').property(id,'1').as('1').
addV('event').property(id,'2').as('2').
addE('attends').from('a').to('1').
addE('attends').from('a').to('2').
addE('attends').from('b').to('2').
addE('attends').from('c').to('1').
addE('attends').from('c').to('2').
addE('attends').from('d').to('1').
addE('attends').from('d').to('2').
addE('attends').from('e').to('1').
addE('attends').from('f').to('1').iterate()
I went with this approach to solve your problem:
g.V().hasLabel('person').as('s').
out().in().
where(neq('s')).
path().by(id).
groupCount().
by(union(limit(local,1),tail(local,1)).fold()).
unfold().
dedup().
by(select(keys).order(local)).
order().
by(select(keys).limit(local,1)).
by(select(keys).tail(local,1))
which produces the output you're seeking:
gremlin> g.V().hasLabel('person').as('s').
......1> out().in().
......2> where(neq('s')).
......3> path().by(id).
......4> groupCount().
......5> by(union(limit(local,1),tail(local,1)).fold()).
......6> unfold().
......7> dedup().
......8> by(select(keys).order(local)).
......9> order().by(select(keys).limit(local,1))
==>[a, b]=1
==>[a, e]=1
==>[a, c]=2
==>[a, d]=2
==>[a, f]=1
==>[b, c]=1
==>[b, d]=1
==>[c, d]=2
==>[c, e]=1
==>[c, f]=1
==>[d, e]=1
==>[d, f]=1
==>[e, f]=1
The approach above uses path() to gather the "person->event<-person" paths that Gremlin travels over, and where(neq('s')) keeps a traversal from ending on the person it started from. It then does a groupCount() keyed by the first and last elements of each path, which are the person pairs. We now have a Map with the person pairs and their counts as you want, but it needs a bit of post-processing, so we unfold() the Map to key-value pairs. The first step is to dedup() by the person pairs: the Map currently contains both "a->b" and "b->a" and we don't need both, so deduping by the ordered list of each pair gives us the unique list. Finally, we add some order() to make the results look like yours.
I suppose you could try to dedup() immediately after the path() and avoid some extra work in groupCount().
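As a cross-check of the expected counts, the same "events in common" computation can be sketched in plain Python. The `attends` mapping is copied by hand from the sample script above; this is only an illustration of the logic, not a substitute for the traversal.

```python
from itertools import combinations

# person id -> set of event ids, copied from the addE('attends') lines above.
attends = {
    "a": {"1", "2"},
    "b": {"2"},
    "c": {"1", "2"},
    "d": {"1", "2"},
    "e": {"1"},
    "f": {"1"},
}

# For each unordered pair of persons, count the events they share.
# Pairs with nothing in common are left out, as in the Gremlin output.
pair_counts = {}
for left, right in combinations(sorted(attends), 2):
    shared = attends[left] & attends[right]
    if shared:
        pair_counts[(left, right)] = len(shared)

for (left, right), count in pair_counts.items():
    print(f"[{left}, {right}]={count}")
```

Because combinations() only yields each unordered pair once, no dedup step is needed here, which is the part the Gremlin query handles with dedup() on the ordered keys.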

ansible flattened map filter results

I'm using Ansible's map filter to extract data, but the output is a list of lists; what I need is a flattened list. The closest I've come is illustrated by the "energy.yml" playbook below. Invoke as
ansible-playbook ./energy.yml --extra-vars='src=solar'
---
- hosts: localhost
  vars:
    region: [ 'east', 'west' ]
    sources:
      wind:
        east:
          filenames:
            - noreaster.txt
            - gusts.txt
            - drafty.txt
        west:
          filenames:
            - zephyr.txt
            - jetstream.txt
      solar:
        east:
          filenames:
            - sunny.txt
            - cloudy.txt
        west:
          filenames:
            - blazing.txt
            - frybaby.txt
            - skynuke.txt
    src: wind
  tasks:
    - name: Do the {{ src }} data
      debug:
        msg: "tweak file '/energy/{{src}}/{{ item[0] }}/{{ item[1] }}'."
      with_nested:
        - "{{ region }}"
        - "{{
          (region|map('extract',sources[src],'filenames')|list)[0] +
          (region|map('extract',sources[src],'filenames')|list)[1]
          }}"
      when: "item[1] in sources[src][item[0]].filenames"
The output of the map() filter is a number of lists the same length as "region". Jinja's "+" operator is the only mechanism I've found to join lists, but since it's a binary operator rather than a filter, I can't apply it to an arbitrary number of lists. The code above depends on "region" having length 2, and having to map() multiple times is ugly in the extreme.
Restructuring the data (or the problem) is not an option. The aspect I'd like to focus on is flattening the map() output, or some other way of generating the same "msg:" lines that the code above produces.
sum filter with start=[] is your friend:
region | map('extract',sources[src],'filenames') | sum(start=[])
From this:
[
[
"noreaster.txt",
"gusts.txt",
"drafty.txt"
],
[
"zephyr.txt",
"jetstream.txt"
]
]
It will do this:
[
"noreaster.txt",
"gusts.txt",
"drafty.txt",
"zephyr.txt",
"jetstream.txt"
]
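The same trick works in plain Python, since the built-in sum() also accepts a start value and "+" concatenates lists. The sketch below mirrors the wind data from the playbook; the variable names are just for illustration.

```python
# Same structure as the playbook's "sources" variable, wind only.
sources = {
    "wind": {
        "east": {"filenames": ["noreaster.txt", "gusts.txt", "drafty.txt"]},
        "west": {"filenames": ["zephyr.txt", "jetstream.txt"]},
    }
}
region = ["east", "west"]
src = "wind"

# Equivalent of: region | map('extract', sources[src], 'filenames')
nested = [sources[src][r]["filenames"] for r in region]

# Equivalent of: ... | sum(start=[])
# Starting from an empty list makes "+" concatenate instead of adding numbers.
flat = sum(nested, [])

print(flat)
```

This works for any number of regions, which is what the hard-coded `[0] + [1]` version could not do.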

How to translate a first-order logic sentence into a restriction in Protégé with string matching?

I'm trying to build an ontology to infer some information about a domain classification and a terminology, but I'm experiencing some conceptual difficulties.
Let me explain the problem. In Protégé 4.1 I created 6 subclasses of Thing: Concept, ConceptTitle, ConceptSynonym (for the classification) and Term, TermTitle, TermSynonym (for the terminology). I have also created hasConceptTitle, hasConceptSynonym, hasTermTitle and hasTermSynonym object relationships (with some constraints) to say that every Concept has one (and only one) title and may have some synonyms, and every Term has one (and only one) title and some synonyms. Both Concept and Term have another relationship, isA, giving the classification a DAG/tree structure, while the terminology has a lattice structure (in other words, a term may be a subclass of more than one term).
Here comes the problem: I would like to create a subclass of Concept, let's say "MappedConcept", which should be the set of mapped concepts, that is, the set of concepts whose title equals a term's title, or that have a synonym equal to a term's title, or that have a synonym equal to a synonym of a term.
In the first-order logic, this set may be expressed as:
∀x∃y( ∃z(hasConceptTitle(x,z) ∧ hasTermTitle(y,z)) ∨
∃z(hasConceptTitle(x,z) ∧ hasTermSynonym(y,z)) ∨
∃z(hasConceptSynonym(x,z) ∧ hasTermTitle(y,z)) ∨
∃z(hasConceptSynonym(x,z) ∧ hasTermSynonym(y,z)) )
How can I obtain this? Defining data properties for "ConceptTitle", "ConceptSynonym", "TermTitle" and "TermSynonym"? And how to describe the string matches?
Maybe those 4 classes should be just data properties of Concept and Term classes?
I read Matthew Horridge's practical guide several times, but I can't translate the ideas I have in mind into an ontology in Protégé.
Thanks in advance.
I'm afraid you cannot do this in OWL 2 DL nor in Protégé, which is an editor for OWL 2 DL, because, as far as I can tell, it seems necessary to introduce the inverse of a datatype property, which is forbidden in OWL 2 DL. However, it's possible in OWL Full, and some DL reasoners may even be able to deal with it. Here, in Turtle:
<MappedConcept> a owl:Class;
owl:equivalentClass [
a owl:Class;
owl:unionOf (
[
a owl:Restriction;
owl:onProperty <hasConceptTitle>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermTitle> ];
owl:someValuesFrom <Term>
]
] [
a owl:Restriction;
owl:onProperty <hasConceptTitle>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermSynonym> ];
owl:someValuesFrom <Term>
]
] [
a owl:Restriction;
owl:onProperty <hasConceptSynonym>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermSynonym> ];
owl:someValuesFrom <Term>
]
] [
a owl:Restriction;
owl:onProperty <hasConceptSynonym>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermTitle> ];
owl:someValuesFrom <Term>
]
]
)
] .
You can also do it without OWL, with a rule language for instance. The rules would look closer to how you would do it in programming languages. In SWRL:
hasConceptTitle(?x,?z), hasTermTitle(?y,?z) -> MappedConcept(?x)
hasConceptTitle(?x,?z), hasTermSynonym(?y,?z) -> MappedConcept(?x)
hasConceptSynonym(?x,?z), hasTermTitle(?y,?z) -> MappedConcept(?x)
hasConceptSynonym(?x,?z), hasTermSynonym(?y,?z) -> MappedConcept(?x)
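To show what these four rules compute, here is a small Python sketch of the same matching logic. The concepts and terms are invented sample data, and the comparison is exact string equality; it is only an illustration of the rules, not of how a reasoner evaluates them.

```python
# Hypothetical sample data: each entry has one title and a set of synonyms.
concepts = {
    "c1": {"title": "heart attack", "synonyms": {"cardiac infarction"}},
    "c2": {"title": "common cold", "synonyms": set()},
    "c3": {"title": "flu", "synonyms": {"influenza"}},
}
terms = {
    "t1": {"title": "myocardial infarction", "synonyms": {"heart attack"}},
    "t2": {"title": "influenza", "synonyms": set()},
}


def labels(entry):
    """Return the title and all synonyms of a concept or term as one set."""
    return {entry["title"]} | entry["synonyms"]


# Every string that is a title or synonym of some term (the ?z bound via ?y).
term_labels = set().union(*(labels(term) for term in terms.values()))

# A concept is mapped if any of its labels equals any term label; this covers
# all four title/synonym combinations from the rules at once.
mapped_concepts = {
    concept_id
    for concept_id, concept in concepts.items()
    if labels(concept) & term_labels
}

print(sorted(mapped_concepts))
```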
