How to export edgelist from Grakn without using client APIs - vaticle-typedb

I'm trying to export edges from grakn. I can do that with Python client like so:
edge_query = "match $c2c($c1, $c2) isa c2c; $c1 has id $id1; $c2 has id $id2;get $id1,$id2;"
with open(f"grakn.edgelist","w") as outfile:
with GraknClient(uri="localhost:48555") as client:
with client.session(keyspace=KEYSPACE) as session:
with session.transaction().read() as read_transaction:
answer_iterator = read_transaction.query(edge_query)
for answer in tqdm(answer_iterator):
id1 = answer.get("id1")
id2 = answer.get("id2")
outfile.write(f"{id1.value()} {id2.value()} \n")
Edit: For each Relation, I want to export entities pairwise. The output can be a pair of Grakn IDs.
I can ignore the attributes of relation or entities.
Exporting to edges seems like a common task. Is there a better way(more elegant, faster, more efficient) to do it in Grakn?

This works as long as the relation type c2c always has two roleplayers. However, this will produce two edges for every $c1, $c2, which is probably not what you want.
Let's take a pair of Things, with ids V123 and V456. If they satisfy $c2c($c1, $c2) isa c2c; with $c1 = V123 and $c2 = V456 then they will also satisfy the same pattern as $c1 = V456 and $c2 = V123. Grakn will return all combinations of $c1, $c2 that satisfy your query, so you'll get two answers back for this one c2c relation.
Assuming this isn't what you want, if $c1 and $c2 play different roles in the relation c2c (likely implying there is direction to the edge) then try changing the query, adding the roles, to:
edge_query = "match $c2c(role1: $c1, role2: $c2) isa c2c; $c1 has id $id1; $c2 has id $id2; get $id1,$id2;"
If they both play the same role (implying undirected edges), then we need to do something different in our logic. Either store edges as a set of sets of ids to remove duplicates without much effort, or perhaps consider using the Python ConceptAPI, something like this:
relation_query = "match $rc2c isa c2c;get;"
with open(f"grakn.edgelist","w") as outfile:
with GraknClient(uri="localhost:48555") as client:
with client.session(keyspace=KEYSPACE) as session:
with session.transaction().read() as read_transaction:
answer_iterator = read_transaction.query(relation_query)
for answer in answer_iterator:
relation_concept = answer.get("rc2c")
role_players_map = relation_concept.role_players_map()
role_player_ids = set()
for role, thing in role_players_map.items():
# Here you can do any logic regarding what things play which roles
for t in thing:
role_player_ids.add(t.id) # Note that you can retrieve a concept id from the concept object, you don't need to ask for it in the query
outfile.write(", ".join(role_player_ids) + "\n")
Of course, I have no idea what you're doing with the resulting edgelist, but for completeness, the more Grakn-esque way would be to treat the Relation as a first-class citizen since it represents a hyperedge in the Grakn knowledge model, in this case we would treat the Roles of the relation as edges. This means we aren't stuck when we have ternary or N-ary relations. We can do this by changing the query:
match $c2c($c) isa c2c; get;
Then in the result we get the id of the $c2c and of the $c.

Related

Neo4j - Project graph from Neo4j with GraphDataScience

So I have a graph of nodes -- "Papers" and relationships -- "Citations".
Nodes have properties: "x", a list with 0/1 entries corresponding to whether a word is present in the paper or not, and "y" an integer label (one of the classes from 0-6).
I want to project the graph from Neo4j using GraphDataScience.
I've been using this documentation and I indeed managed to project the nodes and vertices of the graph:
Code
from graphdatascience import GraphDataScience
AURA_CONNECTION_URI = "neo4j+s://xxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = "my_code:)"
# Client instantiation
gds = GraphDataScience(
AURA_CONNECTION_URI,
auth=(AURA_USERNAME, AURA_PASSWORD),
aura_ds=True
)
#Shorthand projection --works
shorthand_graph, result = gds.graph.project(
"short-example-graph",
["Paper"],
["Citation"]
)
When I do print(result) it shows
nodeProjection {'Paper': {'label': 'Paper', 'properties': {}}}
relationshipProjection {'Citation': {'orientation': 'NATURAL', 'aggre...
graphName short-example-graph
nodeCount 2708
relationshipCount 10556
projectMillis 34
Name: 0, dtype: object
However, no properties of the nodes are projected. I then use the extended syntax as described in the documentation:
# Project a graph using the extended syntax
extended_form_graph, result = gds.graph.project(
"Long-form-example-graph",
{'Paper': {properties: "x"}},
"Citation"
)
print(result)
#Errors
I get the error:
NameError: name 'properties' is not defined
I tried various variations of this, with or without " ", but none have worked so far (also documentation is very confusing because one of the docs always uses " " and in another place I did not see " ").
Also, note that all my properties are integers in the Neo4j db (in AuraDS), as I used to have the error that String properties are not supported.
Some clarification on the correct way of projecting node features (aka properties) would be very useful.
thank you,
Dina
The keys in the Python dictionaries that you use with the GraphDataScience library should be enclosed in quotation marks. This is different from Cypher syntax, where map keys are not enclosed with quotation marks.
This should work for you.
extended_form_graph, result = gds.graph.project(
"Long-form-example-graph",
{'Paper': {"properties": "x"}},
"Citation"
)
Best wishes,
Nathan

Iterate list of values from traversal A in traversal B (Gremlin)

This is my test data:
graph = TinkerGraph.open()
g= graph.traversal()
g.addV('Account').property('id',"0x0").as('a1').
addV('Account').property('id',"0x1").as('a2').
addV('Account').property('id',"0x2").as('a3').
addV('Token').property('address','1').as('tk1').
addV('Token').property('address','2').as('tk2').
addV('Token').property('address','3').as('tk3').
addV('Trx').property('address','1').as('Trx1').
addV('Trx').property('address','1').as('Trx2').
addV('Trx').property('address','3').as('Trx3').
addE('sent').from('a1').to('Trx1').
addE('sent').from('a2').to('Trx2').
addE('received_by').from('Trx1').to('a2').
addE('received_by').from('Trx2').to('a3').
addE('distributes').from('a1').to('tk1').
addE('distributes').from('a1').to('tk2').
addE('distributes').from('a1').to('tk3').
iterate()
I need to first get all the Token addresses using the distributes relationship and then with those values loop through a traversal. This is an example of what I need for one single token
h = g.V().has('Account','id','0x0').next()
token = '1'
g.V(h).
out('sent').has('address',token).as('t1').
out('received_by').as('a2').
out('sent').has('address',token).as('t2').
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList()
This is the output:
[a3:0x2,a2:0x1]
Instead of doing that has('address',token) on each hop I could omit it and just make sure the token address is the same by placing a where('t1',eq('t2')).by('address') at the end of the traversal, but this performs badly given my database design and indexes.
So what I do to iterate is:
tokens = g.V(h).out('distributes').values('address').toList()
finalList = []
for (token in tokens){
finalList.add(g.V(h).
out('sent').has('address',token).
out('received_by').as('a2').
out('sent').has('address',token).
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList())
}
And this is what's stored in finalList at the end:
==>[[a3:0x2,a2:0x1]]
==>[]
==>[]
This works but I was wondering how can I iterate that token list this way without leaving Gremlin and without introducing that for loop. Also, my results contain empty results which is not optimal. The key here for me is to always be able to do that has('address',token) for each hop with the tokens that the Account node has ever sent. Thank you very much.
There is still uncertainty about what you are trying to achieve.
Nevertheless, I think this query does what you need:
g.V().has('Account', 'id', '0x0').as('a').
out('distributes').values('address').as('t').
select('a').
repeat(out('sent').where(values('address').
as('t')).
out('received_by')).
emit()
Example: https://gremlify.com/spwya4itlvd

Merging Value of one dictionary as key of another

I have 2 Dictionaries:
StatePops={'AL':4887871, 'AK':737438, 'AZ':7278717, 'AR':3013825}
StateNames={'AL':'Alabama', 'AK':'Alaska', 'AZ':'Arizona', 'AR':'Arkansas'}
I am trying to merge so the Value of StateNames is the Key for StatePops.
Ex.
{'Alabama': 4887871, 'Alaska': 737438, ...
I also have to display the name of states w/ population over 4million.
Any help is appreciated!!!
You have not specified in what programming language you want this problem to be solved.
Nonetheless, here is a solution in Python.
state_pops = {
'AL': 4887871,
'AK': 737438,
'AZ':7278717,
'AR':3013825
}
state_names = {
'AL':'Alabama',
'AK':'Alaska',
'AZ':'Arizona',
'AR':'Arkansas'
}
states = dict([([state_names[k],state_pops[k]]) for k in state_pops])
final = {k:v for k, v in states.items() if v > 4000000}
print(states)
print(final)
First, you can merge two dictionaries with the predefined dict python function in the states variable as such. Here, k is an iterator and it is used as index for state_names and state_pops.
Then, store the filtered dictionary in final where the states.items() is used to access the keys and values in states and type-cast it as a string with the str function.
There may be more simpler solutions but this is as far as I can optimize the problem.
Hope this helps.
Dictionary Keys cannot be changed in Python. You need to either add the modified key-value pair to the dictionary and then delete the old key, or you can create a new dictionary. I'd opt for the second option, i.e., creating a new dictionary.
myDict = {}
for i in StatePops:
myDict.update({StateNames[i] : StatePops[i]})
This outputs myDict as
{'Alabama': 4887871, 'Alaska': 737438, 'Arizona': 7278717, 'Arkansas': 3013825}

ArangoDB copy Vertex and Edges to neighbors

I'm trying to copy a vertex node and retain it's relationships in ArangoDB. I'm getting a "access after data-modification" error (1579). It doesn't like it when I iterate over the source node's edges and insert an edge copy within the loop. This makes sense but I'm struggling to figure out how to do what I'm wanting within a single transaction.
var query = arangojs.aqlQuery`
let tmpNode = (FOR v IN vertices FILTER v._id == ${nodeId} RETURN v)[0]
let nodeCopy = UNSET(tmpNode, '_id', '_key', '_rev')
let nodeCopyId = (INSERT nodeCopy IN 'vertices' RETURN NEW._id)[0]
FOR e IN GRAPH_EDGES('g', ${nodeId}, {'includeData': true, 'maxDepth': 1})
let tmpEdge = UNSET(e, '_id', '_key', '_rev')
let edgeCopy = MERGE(tmpEdge, {'_from': nodeCopyId})
INSERT edgeCopy IN 'edges'
`;
This quesion is somewhat similar to 'In AQL how to re-parent a vertex' - so let me explain this in a similar way.
One should use the ArangoDB 2.8 pattern matching traversals to solve this.
We will copy Alice to become Sally with similar relations:
let alice=DOCUMENT("persons/alice")
let newSally=UNSET(MERGE(alice, {_key: "sally", name: "Sally"}), '_id')
let r=(for v,e in 1..1 ANY alice GRAPH "knows_graph"
LET me = UNSET(e, "_id", "_key", "_rev")
LET newEdge = (me._to == "persons/alice") ?
MERGE(me, {_to: "persons/sally"}) :
MERGE(me, {_from: "persons/sally"})
INSERT newEdge IN knows RETURN newEdge)
INSERT newSally IN persons RETURN newSally
We therefore first load Alice. We UNSET the properties ArangoDB should set on its own. We change the properties that have to be uniq to be uniq for Alice so we have a Sally afterwards.
Then we open a subquery to traverse ANY first level relations of Alice. In this subequery we want to copy the edges - e. We need to UNSET once more the document attributes that have to be autogenerated by ArangoDB. We need to find out which side of _from and _to pointed to Alice and relocate it to Sally.
The final insert of Sally has to be outside of the subquery, else this statement will attempt to insert one Sally per edge we traverse. We can't insert Saly in front of the query as you already found out - no subsequent fetches are allowed after the insert.

Users who mentioned each other in Gremlin

We have a smaller example twitter database:
user -[TWEETED]-> tweet -[MENTIONED]-> user2
and I would like to find out how to write a query in Gremlin, that shows who were the users who mentioned each other. I have already read the docs but I don't know how to do it.
Given this sample data that assume marko and stephen mention each other and marko and daniel mention each other:
g = new TinkerGraph()
vMarko = g.addVertex("marko", [type:"user"])
vStephen = g.addVertex("stephen", [type:"user"])
vDaniel = g.addVertex("daniel", [type:"user"])
vTweetm1s = g.addVertex("m1s", [type:"tweet"])
vTweetm2d = g.addVertex("m2d", [type:"tweet"])
vTweets1m = g.addVertex("s1m", [type:"tweet"])
vTweetd1m = g.addVertex("d1m", [type:"tweet"])
vMarko.addEdge("tweeted",vTweetm1s)
vMarko.addEdge("tweeted",vTweetm2d)
vStephen.addEdge("tweeted",vTweets1m)
vDaniel.addEdge("tweeted",vTweetd1m)
vTweetm1s.addEdge("mentioned", vStephen)
vTweetm2d.addEdge("mentioned", vDaniel)
vTweets1m.addEdge("mentioned", vMarko)
vTweetd1m.addEdge("mentioned", vMarko)
you could handle it with the following:
gremlin> g.V.has("type","user").as('s')
.out("tweeted").out("mentioned").as('m').out("tweeted")
.out("mentioned").as('e').select.filter{it[0]==it[2]}
==>[s:v[daniel], m:v[marko], e:v[daniel]]
==>[s:v[stephen], m:v[marko], e:v[stephen]]
==>[s:v[marko], m:v[stephen], e:v[marko]]
==>[s:v[marko], m:v[daniel], e:v[marko]]
This approach uses select to extract the data from the labelled steps then a final filter to find those where "s" (vertex in the first position) is equal to the "e" (vertex in the final position). This of course means that there is cycle pattern detected where the one user mentioned another and the other mentioned that person back at some point.
If you follow that much then we can clean up the result a little bit so as to get the unique set of pairs:
gremlin> g.V.has("type","user").as('s')
.out("tweeted").out("mentioned").as('m')
.out("tweeted").out("mentioned").as('e')
.select.filter{it[0]==it[2]}
.transform{[it[0].id,it[1].id] as Set}.toList() as Set
==>[daniel, marko]
==>[stephen, marko]
By adding a transform to the previous code, we can convert the result to "id" (the user's name in this case) and flip everything to Set so as to get unique pairs of results.

Resources