Neo4j - Project graph from Neo4j with GraphDataScience - graph

So I have a graph of nodes -- "Papers" and relationships -- "Citations".
Nodes have properties: "x", a list with 0/1 entries corresponding to whether a word is present in the paper or not, and "y" an integer label (one of the classes from 0-6).
I want to project the graph from Neo4j using GraphDataScience.
I've been using this documentation and I indeed managed to project the nodes and vertices of the graph:
Code
from graphdatascience import GraphDataScience
AURA_CONNECTION_URI = "neo4j+s://xxxx.databases.neo4j.io"
AURA_USERNAME = "neo4j"
AURA_PASSWORD = "my_code:)"
# Client instantiation
gds = GraphDataScience(
AURA_CONNECTION_URI,
auth=(AURA_USERNAME, AURA_PASSWORD),
aura_ds=True
)
#Shorthand projection --works
shorthand_graph, result = gds.graph.project(
"short-example-graph",
["Paper"],
["Citation"]
)
When I do print(result) it shows
nodeProjection {'Paper': {'label': 'Paper', 'properties': {}}}
relationshipProjection {'Citation': {'orientation': 'NATURAL', 'aggre...
graphName short-example-graph
nodeCount 2708
relationshipCount 10556
projectMillis 34
Name: 0, dtype: object
However, no properties of the nodes are projected. I then use the extended syntax as described in the documentation:
# Project a graph using the extended syntax
extended_form_graph, result = gds.graph.project(
"Long-form-example-graph",
{'Paper': {properties: "x"}},
"Citation"
)
print(result)
#Errors
I get the error:
NameError: name 'properties' is not defined
I tried various variations of this, with or without " ", but none have worked so far (also documentation is very confusing because one of the docs always uses " " and in another place I did not see " ").
Also, note that all my properties are integers in the Neo4j db (in AuraDS), as I used to have the error that String properties are not supported.
Some clarification on the correct way of projecting node features (aka properties) would be very useful.
thank you,
Dina

The keys in the Python dictionaries that you use with the GraphDataScience library should be enclosed in quotation marks. This is different from Cypher syntax, where map keys are not enclosed with quotation marks.
This should work for you.
extended_form_graph, result = gds.graph.project(
"Long-form-example-graph",
{'Paper': {"properties": "x"}},
"Citation"
)
Best wishes,
Nathan

Related

BertModel transformers outputs string instead of tensor

I'm following this tutorial that codes a sentiment analysis classifier using BERT with the huggingface library and I'm having a very odd behavior. When trying the BERT model with a sample text I get a string instead of the hidden state. This is the code I'm using:
import transformers
from transformers import BertModel, BertTokenizer
print(transformers.__version__)
PRE_TRAINED_MODEL_NAME = 'bert-base-cased'
PATH_OF_CACHE = "/home/mwon/data-mwon/paperChega/src_classificador/data/hugingface"
tokenizer = BertTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME,cache_dir = PATH_OF_CACHE)
sample_txt = 'When was I last outside? I am stuck at home for 2 weeks.'
encoding_sample = tokenizer.encode_plus(
sample_txt,
max_length=32,
add_special_tokens=True, # Add '[CLS]' and '[SEP]'
return_token_type_ids=False,
padding=True,
truncation = True,
return_attention_mask=True,
return_tensors='pt', # Return PyTorch tensors
)
bert_model = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME,cache_dir = PATH_OF_CACHE)
last_hidden_state, pooled_output = bert_model(
encoding_sample['input_ids'],
encoding_sample['attention_mask']
)
print([last_hidden_state,pooled_output])
that outputs:
4.0.0
['last_hidden_state', 'pooler_output']
While the answer from Aakash provides a solution to the problem, it does not explain the issue. Since one of the 3.X releases of the transformers library, the models do not return tuples anymore but specific output objects:
o = bert_model(
encoding_sample['input_ids'],
encoding_sample['attention_mask']
)
print(type(o))
print(o.keys())
Output:
transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions
odict_keys(['last_hidden_state', 'pooler_output'])
You can return to the previous behavior by adding return_dict=False to get a tuple:
o = bert_model(
encoding_sample['input_ids'],
encoding_sample['attention_mask'],
return_dict=False
)
print(type(o))
Output:
<class 'tuple'>
I do not recommend that, because it is now unambiguous to select a specific part of the output without turning to the documentation as shown in the example below:
o = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'], return_dict=False, output_attentions=True, output_hidden_states=True)
print('I am a tuple with {} elements. You do not know what each element presents without checking the documentation'.format(len(o)))
o = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'], output_attentions=True, output_hidden_states=True)
print('I am a cool object and you can acces my elements with o.last_hidden_state, o["last_hidden_state"] or even o[0]. My keys are; {} '.format(o.keys()))
Output:
I am a tuple with 4 elements. You do not know what each element presents without checking the documentation
I am a cool object and you can acces my elements with o.last_hidden_state, o["last_hidden_state"] or even o[0]. My keys are; odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states', 'attentions'])
I faced the same issue while learning how to implement Bert. I noticed that using
last_hidden_state, pooled_output = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'])
is the issue. Use:
outputs = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'])
and extract the last_hidden state using
output[0]
You can refer to the documentation here which tells you what is returned by the BertModel

Extracting dict keys from values

I am still learning about python and I face some trouble extracting data from a dict. I need to create a loop which check each values and extract the keys. So for this code I need to find the nice students. I am stuck at line 3 #blank.
How do i go about this?
Thanks in advance
class = {"James":"naughty", "Lisa":"nice", "Bryan":"nice"}
for student in class:
if #blank:
print("Hello, "+student+" students!")
else:
print("odd")
Uses dictionary methods "keys(), values(), items()":
def get_students_by_criteria(student_class, criteria):
students = []
for candidate, value in student_class.items():
if value == criteria:
students.append(candidate)
return students
my_class = {"James":"naughty", "Lisa":"nice", "Bryan":"nice"}
print(get_students_by_criteria(my_class, "nice"))
Warning to the word "class" it is a keyword reserved for python programming oriented object

How to export edgelist from Grakn without using client APIs

I'm trying to export edges from grakn. I can do that with Python client like so:
edge_query = "match $c2c($c1, $c2) isa c2c; $c1 has id $id1; $c2 has id $id2;get $id1,$id2;"
with open(f"grakn.edgelist","w") as outfile:
with GraknClient(uri="localhost:48555") as client:
with client.session(keyspace=KEYSPACE) as session:
with session.transaction().read() as read_transaction:
answer_iterator = read_transaction.query(edge_query)
for answer in tqdm(answer_iterator):
id1 = answer.get("id1")
id2 = answer.get("id2")
outfile.write(f"{id1.value()} {id2.value()} \n")
Edit: For each Relation, I want to export entities pairwise. The output can be a pair of Grakn IDs.
I can ignore the attributes of relation or entities.
Exporting to edges seems like a common task. Is there a better way(more elegant, faster, more efficient) to do it in Grakn?
This works as long as the relation type c2c always has two roleplayers. However, this will produce two edges for every $c1, $c2, which is probably not what you want.
Let's take a pair of Things, with ids V123 and V456. If they satisfy $c2c($c1, $c2) isa c2c; with $c1 = V123 and $c2 = V456 then they will also satisfy the same pattern as $c1 = V456 and $c2 = V123. Grakn will return all combinations of $c1, $c2 that satisfy your query, so you'll get two answers back for this one c2c relation.
Assuming this isn't what you want, if $c1 and $c2 play different roles in the relation c2c (likely implying there is direction to the edge) then try changing the query, adding the roles, to:
edge_query = "match $c2c(role1: $c1, role2: $c2) isa c2c; $c1 has id $id1; $c2 has id $id2; get $id1,$id2;"
If they both play the same role (implying undirected edges), then we need to do something different in our logic. Either store edges as a set of sets of ids to remove duplicates without much effort, or perhaps consider using the Python ConceptAPI, something like this:
relation_query = "match $rc2c isa c2c;get;"
with open(f"grakn.edgelist","w") as outfile:
with GraknClient(uri="localhost:48555") as client:
with client.session(keyspace=KEYSPACE) as session:
with session.transaction().read() as read_transaction:
answer_iterator = read_transaction.query(relation_query)
for answer in answer_iterator:
relation_concept = answer.get("rc2c")
role_players_map = relation_concept.role_players_map()
role_player_ids = set()
for role, thing in role_players_map.items():
# Here you can do any logic regarding what things play which roles
for t in thing:
role_player_ids.add(t.id) # Note that you can retrieve a concept id from the concept object, you don't need to ask for it in the query
outfile.write(", ".join(role_player_ids) + "\n")
Of course, I have no idea what you're doing with the resulting edgelist, but for completeness, the more Grakn-esque way would be to treat the Relation as a first-class citizen since it represents a hyperedge in the Grakn knowledge model, in this case we would treat the Roles of the relation as edges. This means we aren't stuck when we have ternary or N-ary relations. We can do this by changing the query:
match $c2c($c) isa c2c; get;
Then in the result we get the id of the $c2c and of the $c.

Users who mentioned each other in Gremlin

We have a smaller example twitter database:
user -[TWEETED]-> tweet -[MENTIONED]-> user2
and I would like to find out how to write a query in Gremlin, that shows who were the users who mentioned each other. I have already read the docs but I don't know how to do it.
Given this sample data that assume marko and stephen mention each other and marko and daniel mention each other:
g = new TinkerGraph()
vMarko = g.addVertex("marko", [type:"user"])
vStephen = g.addVertex("stephen", [type:"user"])
vDaniel = g.addVertex("daniel", [type:"user"])
vTweetm1s = g.addVertex("m1s", [type:"tweet"])
vTweetm2d = g.addVertex("m2d", [type:"tweet"])
vTweets1m = g.addVertex("s1m", [type:"tweet"])
vTweetd1m = g.addVertex("d1m", [type:"tweet"])
vMarko.addEdge("tweeted",vTweetm1s)
vMarko.addEdge("tweeted",vTweetm2d)
vStephen.addEdge("tweeted",vTweets1m)
vDaniel.addEdge("tweeted",vTweetd1m)
vTweetm1s.addEdge("mentioned", vStephen)
vTweetm2d.addEdge("mentioned", vDaniel)
vTweets1m.addEdge("mentioned", vMarko)
vTweetd1m.addEdge("mentioned", vMarko)
you could handle it with the following:
gremlin> g.V.has("type","user").as('s')
.out("tweeted").out("mentioned").as('m').out("tweeted")
.out("mentioned").as('e').select.filter{it[0]==it[2]}
==>[s:v[daniel], m:v[marko], e:v[daniel]]
==>[s:v[stephen], m:v[marko], e:v[stephen]]
==>[s:v[marko], m:v[stephen], e:v[marko]]
==>[s:v[marko], m:v[daniel], e:v[marko]]
This approach uses select to extract the data from the labelled steps then a final filter to find those where "s" (vertex in the first position) is equal to the "e" (vertex in the final position). This of course means that there is cycle pattern detected where the one user mentioned another and the other mentioned that person back at some point.
If you follow that much then we can clean up the result a little bit so as to get the unique set of pairs:
gremlin> g.V.has("type","user").as('s')
.out("tweeted").out("mentioned").as('m')
.out("tweeted").out("mentioned").as('e')
.select.filter{it[0]==it[2]}
.transform{[it[0].id,it[1].id] as Set}.toList() as Set
==>[daniel, marko]
==>[stephen, marko]
By adding a transform to the previous code, we can convert the result to "id" (the user's name in this case) and flip everything to Set so as to get unique pairs of results.

How to access and mutate node property value by the property name string in Cypher?

My goal is to access and mutate a property of a node in a cypher query where the name of the property to be accessed and mutated is an unknown string value.
For example, consider a command:
Find all nodes containing a two properties such that the name of the first property is lower-case and the name of the latter is the upper-case representation of the former. Then, propagate the value of the property with the lower-case string name to the value of the property with the upper-case name.
The particular case is easy:
MATCH ( node )
WHERE has(node.age) AND has(node.AGE) AND node.age <> node.AGE
SET node.AGE = node.age
RETURN node;
But I can't seem to find a way to implement the general case in a single request.
Specifically, I am unable to:
Access the property of the node with a string and a value
Mutate the property of the node with a string and a value
For the sake of clarity, I'll include my attempt to handle the general case. Where I failed to modify the property of the node I was able to generate the cypher for a command that would accomplish my end goal if it were executed in a subsequent transaction.
MERGE ( justToMakeSureOneExists { age: 14, AGE : 140 } ) WITH justToMakeSureOneExists
MATCH (node)
WHERE ANY ( kx IN keys(node) WHERE kx = LOWER(kx) AND ANY ( ky in keys(node) WHERE ky = UPPER(kx) ) )
REMOVE node.name_conflicts // make sure results are current
FOREACH(kx in keys(node) |
SET node.name_conflicts
= COALESCE(node.name_conflicts,[])
+ CASE kx
WHEN lower(kx)
THEN []
+ CASE WHEN any ( ky in keys(node) WHERE ky = upper(kx) )
THEN ['match (node) where id(node) = ' + id(node)+ ' and node.' + upper(kx) + ' <> node.' + kx + ' set node.' + upper(kx) + ' = node.' + kx + ' return node;']
ELSE [] END
ELSE []
END )
RETURN node,keys(node)
Afterthought: It seems like the ability to mutate a node property by property name would be a pretty common requirement, but the lack of obvious support for the feature leads me to believe that the feature was omitted deliberately? If this feature is indeed unsupported is there any documentation to explain why and if there is some conflict between the approach and the recommended way of doing things in Neo/Cypher?
There is some discussion going on regarding improved support for dynamic property access in Cypher. I'm pretty confident that we will see support for this in the future, but I cannot comment on a target release nor on a date.
As a workaround I'd recommend implementing that into a unmanaged extension.
It appears that the desired language feature was added to Cypher in Neo4j 2.3.0 under the name "dynamic property". The Cypher docs from version 2.3.0-up declare the following syntax group as a valid cypher expression:
A dynamic property: n["prop"], rel[n.city + n.zip], map[coll[0]].
This feature is documented for 2.3.0 but is absent from the previous version (2.2.9).
Thank you Neo4j Team!

Resources