How to attach relationships to existing nodes in Neo4j?

I'm trying to build a graph from a CSV file, but I'm not able to add an additional relationship to existing nodes.
My current code is:
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'my_file.csv' AS line
MERGE (p:Title { title: line[0]})
MERGE (a:Author { name: line[1]})
MERGE (a)-[:COLABORATE_IN]->(p)
WITH line WHERE line[2] IS NOT NULL
MERGE (b:Author {name: line[2]})
MERGE (b)-[:COLABORATE_IN]->(p) //not working
RETURN line[2]
It should be simple. It creates the nodes and the first relationships correctly, but for line[2] it only creates the relationships for new nodes. What could I do?
Thanks

Everything that is not piped through the WITH clause is unavailable to the next part of the query:
MERGE (a:Author { name: line[1]})
MERGE (a)-[:COLABORATE_IN]->(p)
WITH line WHERE line[2] IS NOT NULL
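// p is no longer available here
After that WITH, p is a new, unbound variable, so the final MERGE no longer refers to the Title node. Just add the p identifier to the WITH clause to make it available in the remaining part of the query: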
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'my_file.csv' AS line
MERGE (p:Title { title: line[0]})
MERGE (a:Author { name: line[1]})
MERGE (a)-[:COLABORATE_IN]->(p)
WITH p, line
WHERE line[2] IS NOT NULL
MERGE (b:Author {name: line[2]})
MERGE (b)-[:COLABORATE_IN]->(p) // p is still bound to the Title node here
RETURN line[2]

Related

How to select all columns within a JSON structure in aws databrew?

DataBrew recipes can be written in JSON for transformations that will be reused across multiple datasets.
This is an example that I copied from the DataBrew Developer Guide for joining datasets:
`
{
"Action": {
"Operation": "JOIN",
"Parameters": {
"joinKeys": "[{\"key\":\"assembly_session\",\"value\":\"assembly_session\"},{\"key\":\"state_code\",\"value\":\"state_code\"}]",
"joinType": "INNER_JOIN",
"leftColumns": "[\"year\",\"assembly_session\",\"state_code\",\"state_name\",\"all_votes\",\"yes_votes\",\"no_votes\",\"abstain\",\"idealpoint_estimate\",\"affinityscore_usa\",\"affinityscore_russia\",\"affinityscore_china\",\"affinityscore_india\",\"affinityscore_brazil\",\"affinityscore_israel\"]",
"rightColumns": "[\"assembly_session\",\"vote_id\",\"resolution\",\"state_code\",\"state_name\",\"member\",\"vote\"]",
"secondInputLocation": "s3://databrew-public-datasets-us-east-1/votes.csv",
"secondaryDatasetName": "votes"
}
}
}
`
Is it possible to select all columns with a * in "leftColumns", or something close to that?
I've tried passing only * but it doesn't work.
I will apply the same transformations to multiple tables, and this would work well if I could select everything on a left join without having to list all the columns.
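This does not add wildcard support, but one possible workaround is to generate the column strings instead of typing them. Below is a minimal Python sketch, assuming the column names of each table can be obtained elsewhere as plain lists. The helpers `encode_columns` and `join_parameters` are hypothetical names, not part of DataBrew; the sketch only reproduces the string encoding visible in the recipe above.

```python
import json

def encode_columns(columns):
    """Serialize a list of column names into a single JSON string,
    the way leftColumns/rightColumns are stored in the recipe above."""
    return json.dumps(list(columns), separators=(",", ":"))

def join_parameters(left_columns, right_columns, keys, second_location, second_name):
    """Build the "Parameters" object of a JOIN step from plain Python lists.

    Assumes each join key has the same name on both sides, as in the example.
    """
    join_keys = [{"key": k, "value": k} for k in keys]
    return {
        "joinKeys": json.dumps(join_keys, separators=(",", ":")),
        "joinType": "INNER_JOIN",
        "leftColumns": encode_columns(left_columns),
        "rightColumns": encode_columns(right_columns),
        "secondInputLocation": second_location,
        "secondaryDatasetName": second_name,
    }
```

The returned dictionary can be placed under "Parameters" and dumped with `json.dumps` to get one recipe step per table.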

Return data from multiple vertices in a single nested object

I am trying to return a single nested object which combines data from multiple vertices using Gremlin.
Example Data Model:
For the above data model I would like to return a list of externalReferences for a given referenceSignal and system.
I have been able to return the vertices I want data from by using .select() steps but I am unsure how I can manipulate the returned vertices into one single object.
Current query:
g.V().has('id', 'SYSTEM_ID').hasLabel('system').as("system")
.in("partOf").hasLabel('component').as("component")
.in("partOf").hasLabel('signal').as("signal")
.where(out("instanceOf").hasLabel('referenceSignal').has("name", "REF_SIGNAL_NAME")).in("describes").hasLabel('externalReference').as("externalReference")
.select('system', 'component', 'signal', 'externalReference')
Output:
[{system=v[2776], component=v[2780], signal=v[2797], externalReference=v[2843]}, {system=v[2776], component=v[2785], signal=v[2802], externalReference=v[2848]}]
I want the returned data to be in the following format:
{
"system_id": "{system_id from the system vertex}",
"system_name": "{system_name from the system vertex}",
"components": [ # Array of components adjacent to system
{
"component_id": "{component_id from the component vertex}",
"component_name": "{component_name from the component vertex}",
"signals": [ # Array of signals adjacent to component
{
"signal_id": "{signal_id from the signal vertex}",
"signal_name": "{signal_name from the signal vertex}",
"external_reference_id": "{external_reference_id from the external reference vertex adjacent to the signal}"
}
]
}
]
}
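For illustration, the target shape can also be produced on the client side. The following is a minimal Python sketch, not a Gremlin traversal: it assumes every row of the current output has already been converted into plain dictionaries of properties, and that the property names match the desired format above. The helper `nest_rows` is a hypothetical name.

```python
def nest_rows(rows):
    """Fold flat system/component/signal/externalReference rows into one nested dict."""
    if not rows:
        return {}
    system = rows[0]["system"]  # every row shares the same system vertex
    result = {
        "system_id": system["system_id"],
        "system_name": system["system_name"],
        "components": [],
    }
    components = {}  # component_id -> nested component entry
    for row in rows:
        comp = row["component"]
        entry = components.get(comp["component_id"])
        if entry is None:
            # First time this component appears: create its entry once.
            entry = {
                "component_id": comp["component_id"],
                "component_name": comp["component_name"],
                "signals": [],
            }
            components[comp["component_id"]] = entry
            result["components"].append(entry)
        entry["signals"].append({
            "signal_id": row["signal"]["signal_id"],
            "signal_name": row["signal"]["signal_name"],
            "external_reference_id": row["externalReference"]["external_reference_id"],
        })
    return result
```

Grouping by component id keeps one entry per component even when several signals of the same component match.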

firebase database set method appending to array instead of replacing the elements

I'm using the Firebase database to store a complex data structure. It looks like this:
../thousand_island/
genomes:[
0:{
"somthing": "some content"
},
... 49 more
],
version: 1223
The genomes array always has 50 elements. Every time I call set to replace all the data in /my_record_name, 50 new records are inserted into the genomes array instead of replacing the existing ones.
Code sample:
// set the ref into the instance
this.ref = firebase.database().ref('/genomes/thousand_island');
...
// record = { genomes:[...], version: *** }
if(version> record.version){
this.ref.set(record,function(){
console.log('updated thousand_island ', version, '->', record.version);
})
}
So how can I make it replace the array instead of appending to it? I was thinking of deleting the data and then inserting it again, but that sounds tedious and costs two requests.

Multiple commits to neo4j from R

I have collected some tweets using the twitteR package and thereafter exported them to a neo4j database using Nicole White's various tutorials. I extract the tweets to a dataframe called kdf and thereafter use functions from stringr for basic cleaning up as demonstrated by Nicole. I am then sending this to neo4j from R. The essential part of my code is:
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/", username="xxxx", password="xxxx")
clear(graph)
addConstraint(graph, "Tweet", "id")
addConstraint(graph, "User", "username")
addConstraint(graph, "Hashtag", "hashtag")
addConstraint(graph, "Tags", "ent_tag")
query = "
CREATE (tweet:Tweet {id: {tweetID}})
SET tweet.text = {text}
CREATE (user:User {name: {Username}})
CREATE (user)-[:TWEETED]->(tweet)
FOREACH(reply_to_sn IN CASE {reply_to_sn} WHEN NULL then [] else [{reply_to_sn}] END |
MERGE (replytouser:User {username:{reply_to_sn}})
CREATE (tweet)-[:IN_REPLY_TO]->(replytouser)
)
FOREACH(retweet_sn IN CASE {retweet_sn} WHEN NULL THEN [] ELSE [{retweet_sn}] END |
MERGE(retweet_user:User {username: {retweet_sn}})
CREATE (tweet)-[:RETWEET_OF]->(retweet_user)
)
FOREACH(hastag_nodes IN CASE {hashtag_nodes} WHEN NULL then [] else [{hashtag_nodes}] END |
MERGE (h:Hashtag {hashtag :{hashtag_nodes}})
CREATE (tweet)-[:HASHTAG]->(h)
)
FOREACH(mentioned_users IN CASE {mentioned_users} WHEN NULL then [] else [{mentioned_users}] END |
MERGE (m:User {username :{mentioned_users}})
CREATE (tweet)-[:MENTIONED]->(m)
)
"
tx = newTransaction(graph)
for(i in 1:nrow(kdf)){
row = kdf[i, ]
appendCypher(tx, query,
tweetID=row$id,
text=row$text,
Username=row$screenName,
reply_to_sn=row$replyToSN,
retweet_sn=getRetweetSN(row$text),
hashtag_nodes=getHashtags(row$text),
mentioned_users=getMentions(row$text))
}
commit(tx)
What I did next was extract named entities for all the text using Watson's Alchemy API. The result is stored in a dataframe called ent_tbl, which contains three variables: tweetid, etext and etype. Now I am trying to export this data to the same neo4j database as well and join it on the id of the tweets. This is the other part of the code:
query="
MATCH(t:ent_tag {id : $twid, type :$etype, text :$etext})
MATCH(tw:tweet {tweetID : $twid })
CREATE (tw)-[:HAS_ENT]->(t)
"
tx=newTransaction(graph)
for (i in 1:nrow(ent_tbl)){
row = ent_tbl[i,]
appendCypher(tx, query,
twid=row2$tweetid,
etype=row2$etype,
etext=row2$etext)
}
commit(tx)
While I do not get any errors on committing this, summary(graph) does not show me the relationship between the tags (t) and the tweets (tw) that I expected to see.
> summary(graph)
This To That
1 User TWEETED Tweet
2 Tweet RETWEET_OF User
3 Tweet HASHTAG Hashtag
4 Tweet MENTIONED User
5 Tweet IN_REPLY_TO User
Why would this happen?
This is my db.schema in neo4j:
That is because the MATCH clauses do not find any tag or tweet, so the query produces no rows and the CREATE never runs. If you want to add data to existing nodes, you should match them by ID and then set their properties. You also have to be consistent with labels, property names, and upper/lower case. I think this is what you are looking for:
query="
MATCH(t:Tags {ent_tag : $twid})
MATCH(tw:Tweet {id : $twid })
SET t.type=$etype, t.text=$etext
CREATE (tw)-[:HAS_ENT]->(t)
"
tx=newTransaction(graph)
for (i in 1:nrow(ent_tbl)){
row = ent_tbl[i,]
appendCypher(tx, query,
twid=row$tweetid,
etype=row$etype,
etext=row$etext)
}
commit(tx)

Neo4j - Cypher: mutual object with traversing relationships

I have a small Graph:
CREATE
(Dic1:Dictioniary { name:'Dic1' }),
(Dic2:Dictioniary { name: 'Dic2' }),
(Dic3:Dictioniary { name: 'Dic3' }),
(File1:File { name: 'File1' }),
(File2:File { name: 'File2' }),
(File3:File { name: 'File3' }),
(Dic2)-[:contains]->(Dic1),
(Dic1)-[:contains]->(File1),
(Dic3)-[:contains]->(File2),
(File1)-[:references]->(File3),
(File2)-[:references]->(File3)
I need a Cypher query to find out whether, for example, Dic2 and Dic3 have paths/relationships through which they reference the same File.
In this case it would be true; the mutual File is File3.
Thanks for your help
When you are looking for just two dictionaries you can achieve this in a single statement:
MATCH (d2:Dictioniary { name:'Dic2' }),(d3:Dictioniary { name:'Dic3' })
MATCH (d2)-[:contains|references*]->(f:File)<-[:contains|references*]-(d3)
RETURN f
The two unbounded path matches can be expensive in general, but the cost stays manageable here because the search is anchored from the outset by the two dictionary matches.
If you had an arbitrary number of Dictionaries to test you could do something like:
MATCH (d1:Dictioniary { name:'Dic1' }),(d2:Dictioniary { name:'Dic2' }),(d3:Dictioniary { name:'Dic3' })
WITH [d1,d2,d3] AS ds
MATCH (d)-[:contains|references*]->(f:File)
WHERE d IN ds
WITH f, ds, COLLECT(d) AS fds
WHERE size(ds) = size(fds)
RETURN f
This matches the dictionaries that you are interested in first, and for each of them in turn it finds the files that they reference. Importantly, the File object is preserved and the Dictionary that referenced it is collected into an array (fds). If we know that we had 3 dictionaries to begin with (size(ds)) and that a given file has the same number of related dictionaries (size(fds)), then all dictionaries must reference it.
Assuming that there may be multiple paths to a given File from a given Dictionary, you can insert the DISTINCT modifier into the second WITH statement:
WITH f, ds, COLLECT(DISTINCT(d)) AS fds
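To make the counting argument concrete, here is a minimal Python sketch of the same logic outside Neo4j. The adjacency dictionary mirrors the sample graph from the question. The names `EDGES`, `reachable_files` and `common_files` are hypothetical, and relationship types are not distinguished, so this only illustrates the idea and says nothing about how Cypher evaluates the query.

```python
# Sample graph from the question: each node maps to the nodes it points to
# via contains/references (relationship types are ignored in this sketch).
EDGES = {
    "Dic1": ["File1"],
    "Dic2": ["Dic1"],
    "Dic3": ["File2"],
    "File1": ["File3"],
    "File2": ["File3"],
    "File3": [],
}

def reachable_files(start, edges):
    """Return every File node reachable from start over one or more hops."""
    seen, stack = set(), list(edges.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return {n for n in seen if n.startswith("File")}

def common_files(dictionaries, edges):
    """Files reached by every requested dictionary."""
    reached_by = {}
    for d in dictionaries:
        for f in reachable_files(d, edges):
            # A set plays the role of COLLECT(DISTINCT d): multiple paths
            # from the same dictionary are counted only once.
            reached_by.setdefault(f, set()).add(d)
    # Same count comparison as the WHERE clause: a file qualifies only if
    # as many dictionaries reach it as were requested.
    return sorted(f for f, ds in reached_by.items() if len(ds) == len(dictionaries))
```

Calling `common_files(["Dic2", "Dic3"], EDGES)` on the sample graph yields `["File3"]`, matching the expected mutual file.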
