How to get a relation from CSV in Neo4j/Cypher using LOAD CSV

I use Neo4J Community Edition version 3.2.1.
Consider this CSV-file with edges:
node1,relation,node2,type
1,RELATED_TO,2,Married
2,RELATED_TO,1,Married
1,RELATED_TO,3,Child
2,RELATED_TO,3,Child
3,RELATED_TO,4,Sibling
3,RELATED_TO,5,Sibling
4,RELATED_TO,5,Sibling
I have already created the nodes for this. I then run the following LOAD CSV command:
load csv with headers from
"file:///test_dataset/edges.csv" as line
match (person1:Person {pid:line.node1}),
(person2:Person {pid:line.node2})
create (person1)-[:line.relation {type:line.type}]->(person2)
But this returns the following error:
Invalid input '.': expected an identifier character, whitespace, '|', a length specification, a property map or ']' (line 5, column 24 (offset: 167))
"create (person1)-[:line.relation {type:line.type}]->(person2)"
It seems that I cannot use "line.relation" like this. How can I use the relation from the CSV file (second column) with LOAD CSV?
I have seen this answer, but I would like to do this using the native query language.
To verify that the rest of the query is correct, I have managed to create the edges correctly by hardcoding the relationship type like this:
load csv with headers from
"file:///test_dataset/edges.csv" as line
match (person1:Person {pid:line.node1}),
(person2:Person {pid:line.node2})
create (person1)-[:RELATED_TO {type:line.type}]->(person2)

Natively it's not possible to create a node with a dynamic label or a relationship with a dynamic type.
That's why there is a procedure for that (the APOC library provides apoc.create.relationship).
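With the APOC plugin installed, a minimal sketch like the following should create the relationship type dynamically from the CSV column (same file and :Person nodes as above; apoc.create.relationship takes the start node, the type string, a property map, and the end node):
LOAD CSV WITH HEADERS FROM "file:///test_dataset/edges.csv" AS line
MATCH (person1:Person {pid:line.node1})
MATCH (person2:Person {pid:line.node2})
// the relationship type is read from the CSV's relation column
CALL apoc.create.relationship(person1, line.relation, {type: line.type}, person2) YIELD rel
RETURN count(rel)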
If you want to do it natively and you know all the distinct values of your relation column, you can write one Cypher script per value, like this:
LOAD CSV WITH HEADERS FROM "file:///test_dataset/edges.csv" AS line
WITH line WHERE line.relation = 'RELATED_TO'
MATCH (person1:Person {pid:line.node1})
MATCH (person2:Person {pid:line.node2})
CREATE (person1)-[:RELATED_TO {type:line.type}]->(person2)

Related

Neo4J Self Lookup

I am struggling to get a simple Neo4J file to map to itself.
I have two CSV files
File A
ID,Name
0,abc
1,def
2,ghi
3,JJK
And File B
ID,Primary_ID,Secondary_ID
0,2,3
What I want is to import File A into Bloom and then link to the other elements by looking up File B if there is a relationship.
A Neo4J expert could probably tell me what I'm doing wrong.
This is my neo4j command:
neo4j#neo4j> LOAD CSV WITH HEADERS FROM 'file:///FileB.csv' AS row
WITH toInteger(row["ID"]) as ID, row["Primary_ID"] as Primary, row["Secondary_ID"] as Secondary
MATCH (c:item {itemId: Secondary})
MATCH (p:item {itemId: Primary})
MERGE (o)-->(p)
RETURN count(o);
Looks like there’s a typo in your query: you used o instead of c in the MERGE clause. Also note that Cypher requires an explicit relationship type when you MERGE a relationship, so give the pattern a type (the :LINKED_TO name below is just a placeholder):
LOAD CSV WITH HEADERS FROM 'file:///FileB.csv' AS row
WITH toInteger(row["ID"]) as ID, row["Primary_ID"] as Primary, row["Secondary_ID"] as Secondary
MATCH (c:item {itemId: Secondary})
MATCH (p:item {itemId: Primary})
MERGE (c)-[:LINKED_TO]->(p)
RETURN count(c);
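As a quick sanity check after the load, something like this (a hypothetical verification query, assuming the :item nodes carry an itemId property) lists a few of the merged relationships:
MATCH (c:item)-[r]->(p:item)
RETURN c.itemId, type(r), p.itemId
LIMIT 10;
One more thing to watch: LOAD CSV yields strings, so if itemId was stored as an integer (as you do with toInteger(row["ID"])), the string values in Primary and Secondary will never match; wrapping them in toInteger() as well would fix that.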

Error loading data into Neo4j with a CASE statement in a Cypher query

I am trying to upload CSV data with a CASE statement in the query, but the following error appears:
Cypher:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
MATCH(a:test_t{tid:line.pid})
CASE
WHEN line.key !='NA' THEN
WITH split(line.key,",") as name
UNWIND name as x
MERGE(k:test_key{key_term:toLower(x)})
MERGE(a)-[:contains]->(k)
END
Error
Neo.ClientError.Statement.SyntaxError: Invalid input 'S': expected 'l/L' (line 5, column 3 (offset: 137))
"CASE"
Can anyone help me?
CASE is an expression, not a clause, and it does not support embedding other Cypher clauses (but it can invoke functions). In fact, CASE is not actually needed for your use case.
This query should work (the :auto at the beginning is needed in neo4j 4.0+):
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line FIELDTERMINATOR ';'
WITH line
WHERE line.key <> 'NA'
MATCH (a:test_t {tid: line.pid})
UNWIND split(line.key, ',') as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)
This query filters out all unwanted lines as soon as they are obtained from the file. Reducing the number of rows of data being worked on as early as possible is good practice.
Also, you have a second issue. Your data file cannot use the comma as both the (default) field terminator AND as the delimiter between your x values.
To resolve this ambiguity, the above query chose to use the FIELDTERMINATOR ';' option to specify that the ";" character will be used as the field terminator. A sample data file would look like this:
pid;key
123;NA
234;Foo,Bar
345;Bar,Baz
456;NA
567;Baz
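For completeness, CASE is legal when used as an expression that computes a value; here is a minimal sketch against the same hypothetical file:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line FIELDTERMINATOR ';'
RETURN line.pid,
       // CASE here only selects a value; it contains no update clauses
       CASE WHEN line.key <> 'NA' THEN split(line.key, ',') ELSE [] END AS keys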
You are using CASE incorrectly: you cannot have update clauses inside a CASE expression. Instead you can use a WHERE clause to filter the rows of the file. For instance, adding WHERE line.key <> 'NA' while processing the file before you move on to the updates will work. Something like this should fit the bill (note that a must be carried through each WITH, or it falls out of scope):
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
MATCH (a:test_t {tid: line.pid})
WITH a, line
WHERE line.key <> 'NA'
WITH a, split(line.key, ",") as name
UNWIND name as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)
It looks like, from your logic, you could even move the test above the MATCH. So this might be better (fewer unnecessary matches):
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as line
WITH line
WHERE line.key <> 'NA'
MATCH (a:test_t {tid: line.pid})
WITH a, split(line.key, ",") as name
UNWIND name as x
MERGE (k:test_key {key_term: toLower(x)})
MERGE (a)-[:contains]->(k)

AWS Glue Custom Classifiers Json Path

I have a set of JSON data files that look like this:
[
{"client":"toys",
"filename":"toy1.csv",
"file_row_number":1,
"secondary_db_index":"4050",
"processed_timestamp":1535004075,
"processed_datetime":"2018-08-23T06:01:15+0000",
"entity_id":"4050",
"entity_name":"4050",
"is_emailable":false,
"is_txtable":false,
"is_loadable":false}
]
I have created a Glue Crawler with the following custom classifier JSON path:
$[*]
Glue returns the correct schema with the columns correctly identified.
However, when I query the data on Athena... all the data is landing in the first column and the rest of the columns are empty.
How can I get the data to spread into the correct columns?
image of Athena query
Thank you!
It is an issue connected to Hive. I suggest two approaches. First, you can create a new table in Athena with a struct data type, like this:
CREATE EXTERNAL TABLE `example`(
`row` struct<client:string,filename:string,file_row_number:int,secondary_db_index:string,processed_timestamp:int,processed_datetime:string,entity_id:string,entity_name:string,is_emailable:boolean,is_txtable:boolean,is_loadable:boolean> COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://example'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'UPDATED_BY_CRAWLER'='example',
'averageRecordSize'='271',
'classification'='json',
'compressionType'='none',
'jsonPath'='$[*]',
'objectCount'='1',
'recordCount'='1',
'sizeKey'='271',
'transient_lastDdlTime'='1535533583',
'typeOfData'='file')
And then you can run the query as follows:
SELECT row.client, row.filename, row.file_row_number FROM "example"
Second, you can redesign your JSON file as below and then run the Crawler again. This example uses the single-JSON-record-per-line format (each line is a standalone object; note there is no enclosing array and no comma between records):
{"client":"toys","filename":"toy1.csv","file_row_number":1,"secondary_db_index":"4050","processed_timestamp":1535004075,"processed_datetime":"2018-08-23T06:01:15+0000","entity_id":"4050","entity_name":"4050","is_emailable":false,"is_txtable":false,"is_loadable":false}
{"client":"toys2","filename":"toy2.csv","file_row_number":1,"secondary_db_index":"4050","processed_timestamp":1535004075,"processed_datetime":"2018-08-23T06:01:15+0000","entity_id":"4050","entity_name":"4050","is_emailable":false,"is_txtable":false,"is_loadable":false}

Using MarkLogic XQuery for data population

I have data in the following form:
<Status>Active Leave Terminated</Status>
<date>05/06/2014 09/10/2014 01/10/2015</date>
I want to get the data in the following form:
<status>Active</status>
<date>05/06/2014</date>
<status>Leave</status>
<date>09/10/2014</date>
<status>Terminated</status>
<date>01/10/2015</date>
Please help me with the query to retrieve the data as specified above.
Well, you have a string and want to split it at the whitespace. That's what tokenize() is for, and \s matches a whitespace character. To get the corresponding date you can track the current position in the for loop using at. Together it looks something like this (note that I assume the input data is the current context item):
let $dates := tokenize(date, "\s+")
for $status at $pos in tokenize(Status, "\s+")
return (
<status>{$status}</status>,
<date>{$dates[$pos]}</date>
)
You did not indicate whether your data is on the file system or already loaded into MarkLogic. It's also not clear if this is something you need to do once on a small set of data or on an ongoing basis with a lot of data.
If it's on the file system, you can transform it as it is being loaded. For instance, MarkLogic Content Pump can apply a transformation during load.
If you have already loaded the content and you want to transform it in place, you can use Corb2.
If you have a small amount of data, then you can just loop across it using Query Console.
Regardless of how you apply the transformation code, dirkk's answer shows how you need to change it. If you are updating content already in your database, you'll xdmp:node-delete() the original Status and date elements and xdmp:node-insert-child() the new ones.

BizTalk Varying Length Flat File using Single Schema for Transform

I have a pipe-delimited .txt flat file that I'm using to do a bulk insert into SQL. Everything works well for a straight one-to-one mapping. However, the flat file now contains 2 new fields that can repeat an unknown number of times.
Is there a way to create a single flat file schema where I can have an unbounded child within the main unbounded child? I think where I'm getting tripped up is how to make ChildRoot, listed below, just a "group heading" like Root, where ChildRoot doesn't correspond to an actual location in the flat file. How do I insert something like that?
Schema:
-Roots
--Root (unbounded)
---ChildID
---ChildName
Roots gets a direct link to my SQL stored procedure to do a bulk insert on as many "Root" rows as come in.
Now I have:
Schema:
-Roots
--Root (unbounded)
---Child
---ChildName
---ChildRoot (unbounded)
----ChildRootID
----ChildRootName
EDIT: I should also add that ChildRootID and ChildRootName can repeat an indefinite number of times until the row delimiter (a carriage return) is found.
