Do I need to create all nodes by hand in Neo4j? - graph

I am probably missing something because I am very new to Neo4j, but looking at their Movie graph - probably the very first graph to play with when you are learning the platform - they give us a really big piece of code where every node and labels and properties are imputed by hand, one after the other. Ok, it seems fair to a small graph for learning purpose. But, how should I proceed when I want to import a CSV and create a graph from this data? I believe a hand-imput is not expected at all.
My data look something like this:
date
origin
destiny
value
type
balance
01-05-2021
A
B
500
transf
2500
It has more than 10 thousand rows like this.
I loaded it as:
LOAD CSV FROM "file:///MyData.csv" AS data
RETURN data;
and it worked. The data was loaded etc. But now I have some questions:
1- How do I proceeed if I want origin to be a node and destiny to be another node with type to be edges with value as property? I mean, I know how to create it like (a)->[]->(b) but how to create the entire graph without creating edge by edge, node by node, property by property etc...?
2- Am I able to select the date and see something like a time evolution for this graph? I want to see all transactions in 20-05-2021, 01-05-2021 etc and see how it evolves. Is it possible?

As example in the official docs says here: https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/#tutorial-neo4j-admin-import
You may want to create 3 separate files for the import:
First: you need the movies.csv to import nodes with label :Movie
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Second: you need actors.csv to import nodes with label :Actor
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally, you can import relationships
As you see, actors and movies are already imported. So now you just need to specify the relationships. In the example, you're importing ROLE relationship in the given format:
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
So as you see in the header, you've got values:
START_ID - where the relationship starts, from which node
role - property name (you can specify multiple properties here, just make sure the csv format contains data for it)
:END_IN - where the relationship ends, to which node
:TYPE - type of the relationship
That's all :)

Related

Efficient way to create relationships from a csv in neo4j

I am trying to populate my graph db with relationships that I currently have access to in a file.
They are in the form were each line in the relationship csv has the unique IDs of the two nodes that relationship is describing as well as the kind of the relationship it is.
Each line in the relationship csv has something along of the lines of:
uniqueid1,uniqueid2,relationship_name,property1_value,..., propertyn_value
I already had all nodes created and was working on matching the nodes that match the uniqueids specified in each of the files and then creating the relationship between them.
However, the tend to be taking a long time to be creating for each of the relationships and my suspicion is that I am doing something wrong.
The csv file has about 2.5 million lines with different relationship types. So i manually set the relationships.rela property to one of them and try to run through creating all nodes involved in that relationship and follow up with the next using my where clause.
The amount of properties each node has has been reduced by an ellipsis(...) and the names redacted.
I currently have the query to create the relationships set up in the following way
:auto USING PERIODIC COMMIT 100 LOAD CSV WITH HEADERS FROM 'file:///filename.csv' as relationships
WITH relationships.uniqueid1 as uniqueid1, relationships.uniqueid2 as uniqueid2, relationships.extraproperty1 as extraproperty1, relationships.rela as rela... , relationships.extrapropertyN as extrapropertyN
WHERE relations.rela = "manager_relationship"
MATCH (a:Item {uniqueid: uniqueid1})
MATCH (b:Item {uniqueid: uniqueid2})
MERGE (b) - [rel: relationship_name {propertyvalue1: extraproperty1,...propertyvalueN: extrapropertyN }] -> (a)
RETURN count(rel)
Would appreciate if alternate patterns could be recommended.
Indexing is a mechanism that databases use, to speed up data lookups. In your case, since Item nodes are not indexed, these two matches can take a lot of time, especially if the number of Item nodes is very large.
MATCH (a:Item {uniqueid: uniqueid1})
MATCH (b:Item {uniqueid: uniqueid2})
To speed this up, you can create an index on Item nodes uniqueid property, like this:
CREATE INDEX unique_id_index FOR (n:Item) ON (n.uniqueid)
When you'll run your import query after creating the index, it will be much faster. But it will still take a bit of time as there are 2.5 million relationships. Read more about indexing in neo4j here.
Aside from suggestion from Charchit about create an index, I will recommend using an APOC function apoc.periodic.iterate which will execute the query into parallel batches of 10k rows.
https://neo4j.com/labs/apoc/4.4/overview/apoc.periodic/apoc.periodic.iterate/
For example:
CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'file:///filename.csv' as relationships RETURN relationships",
"WITH relationships.uniqueid1 as uniqueid1, relationships.uniqueid2 as uniqueid2, relationships.extraproperty1 as extraproperty1, relationships.rela as rela... , relationships.extrapropertyN as extrapropertyN
WHERE relations.rela = "manager_relationship"
MATCH (a:Item {uniqueid: uniqueid1})
MATCH (b:Item {uniqueid: uniqueid2})
MERGE (b) - [rel: relationship_name {propertyvalue1: extraproperty1,...propertyvalueN: extrapropertyN }] -> (a)",{batchSize:10000, parallel:true})
The first parameter will return all the data in the csv file then it will divide the rows into 10k per batch and it will run it in parallel using default concurrency (50 workers).
I use it often where I load 40M nodes/edges in about 30mins.

How to source statments with neoj4 graph database?

I'm looking forward to use NeoJ4 for some researchs. However I have to check if it can do what I want first.
I would like to build a graph that says :
StatementID1 = Cannabidiol hasPositiveEffectOn ChronicPain
StatementID1 isSupportedBy Study1
StatementID1 isSupportedBy Study2
StatementID1 isNotSupportedBy Study3
This is easy to add key:Value properties to a relationship in NeoJ4.
The difficulty is that I want Study1,2,3 to be nodes. So that these can have them own set of properties.
This can be done in a triplestore where each triple has an ID like "Statment1" here. This is a matter of adding triples where the object is another triple ID.
url:TripleID1 = url:Cannabidiol url:hasPositiveEffectOn url:ChronicPain
url:TripleID2 = url:TripleID1 url:isSupportedBy url:Study1
url:TripleID3 = url:TripleID1 url:isSupportedBy url:Study2
url:TripleID4 = url:TripleID1 url:isNotSupportedBy url:Study3
For the moment I can't find how to do it simplely in NeoJ4.
I could add the DOI of the study as a property :
Study 1 :
DOI:123/123
Then add the same DOI in the link :
isSupportedBy:
DOI:123/123
Since the DOI is unique it could be possible to make some searchs. However this will make things much more complex.
Is there a simpler method to achieve that?
I suppose this is a database design issue.
Would a node/relationship model something like the following fit your data and make your queries easy?
Neo4j doesn’t support edges going from an edge to a node. Edges are always between nodes. So you’ll have to convert your positiveEffect edge to a node (as proposed in rickhg12hs’s answer) or model the positiveEffect as a non-edge (as you yourself proposed).

arcmap network analyst iteration over multiple files using model builder

I have 10+ files that I want to add to ArcMap then do some spatial analysis in an automated fashion. The files are in csv format which are located in one folder and named in order as "TTS11_path_points_1" to "TTS11_path_points_13". The steps are as follows:
Make XY event layer
Export the XY table to a point shapefile using the feature class to feature class tool
Project the shapefiles
Snap the points to another line shapfile
Make a Route layer - network analyst
Add locations to stops using the output of step 4
Solve to get routes between points based on a RouteName field
I tried to attach a snapshot of the model builder to show the steps visually but I don't have enough points to do so.
I have two problems:
How do I iterate this procedure over the number of files that I have?
How to make sure that every time the output has a different name so it doesn't overwrite the one form the previous iteration?
Your help is much appreciated.
Once you're satisfied with the way the model works on a single input CSV, you can batch the operation 10+ times, manually adjusting the input/output files. This easily addresses your second problem, since you're controlling the output name.
You can use an iterator in your ModelBuilder model -- specifically, Iterate Files. The iterator would be the first input to the model, and has two outputs: File (which you link to other tools), and Name. The latter is a variable which you can use in other tools to control their output -- for example, you can set the final output to C:\temp\out%Name% instead of just C:\temp\output. This can be a little trickier, but once it's in place it tends to work well.
For future reference, gis.stackexchange.com is likely to get you a faster response.

neo4j cypher : how to query a linked list

I'm having a bit of trouble to design a cypher query.
I have a graph data structures that records some data in time, using
(starting_node)-[:last]->(data1)-[:previous]->(data2)-[:previous]->(data3)->...
Each of the data nodes has a date, and some data as attributes that I want to sum.
Now, for the sake of the example, let's say I want to query what happened last week.
The closer I got is to query something like
start n= ... // definition of the many starting nodes here
match n-[:last]->d1, path = d1-[:previous*0..7]->dn
where dn.date > some_date_a_week_ago
Unfortunately, as I get the right path, I also get all the intermediate paths (from 2 days ago, from 3 days ago...etc).
Since there is many starting nodes, and thus many possible path lengths, I cannot ask for the longest path in my query. Furthermore, dn.date can be different from date_a_week_ago ( if there is only one data node this week, and one data node last month, then the expected path is of length=1).
Any tip on how to filter the intermediate paths in my query ?
Thanks in advance !
ps : by the way, I'm quite new with the graph modeling, and I'd be interested with any answer that would require to change the graph structure if needed.
You can add a further point "dnnext" in your path, and add a condition to ensure the "dn" is the last one that satisfis the condifition,
start n= ... // definition of the many starting nodes here
match n-[:last]->d1, path = d1-[:previous*0..7]->dn-[:previous*0..1]->dnnext
where dn.date > some_date_a_week_ago and dnnext < some_date_a_week

R: Associate values to tips in phylogeny

I have a phylogeny and some data (traits values). I've reconstructed trait values for all nodes using ace in caper.
I used makeNodeLabel in ape to associate the reconstructed trait values with their appropriate nodes.
What I want to do is to export a nexus file (phylogeny) from R that contains both the node values (recontructed values) and the tip labels (emperical data).
I want to use color codes (in FigTree) to indicate the values, but right now I'm only able to do this with nodes, i.e. the tip-branches do not have data and are hence not "color codable".
I need to associate values to the tips in order to do this, but I haven't been able to figure out how to do this. I also need all the data I associate to the phylogeny to be in a "similar category", i.e. similarly to for example how theta values from *BEAST are coded in nexus files.
Any and all help is greatly appreciated.
I've found a workaround:
Export a tree (nexus format) from R with reconstructed traits associated to the nodes. Open in FigTree and define the "label" trait as "trait". Import tip annotations from a text file which contain empirical data in a column with the header "trait". Then export the tree from FigTree to a new file with nexus-block and include annotations. Lasty, copy names from the name block in the nexus file (animal/organism[&Trait=2.35754]) and exchange the names with the ones in the coded tree. You will then have trait values coded through [&Trait=value] for both nodes and tips. Now you can color code the entire tree which includes the empirical values that are now associated to the tips of the phylogeny.
Sooo, that's a stupid way of doing it. If anyone has a better way, I'd love to hear it.
You could have a try 'phytools' package in the R. The function 'plotTree.wBars' could be conducted. Best wishes.

Resources