I am struggling to get a simple Neo4j import where nodes from one file link back to each other.
I have two CSV files.
File A
ID,Name
0,abc
1,def
2,ghi
3,JJK
And File B
ID,Primary_ID,Secondary_ID
0,2,3
What I want is to import File A into Bloom and then link the elements to each other by looking up File B to see whether a relationship exists.
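For context, File A was already imported into :item nodes with something like this (simplified; itemId holds the CSV ID, kept as a string so the lookups below match):
LOAD CSV WITH HEADERS FROM 'file:///FileA.csv' AS row
MERGE (i:item {itemId: row.ID})
SET i.name = row.Name;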
A Neo4j expert could probably tell me what I'm doing wrong.
This is my Neo4j command:
LOAD CSV WITH HEADERS FROM 'file:///FileB.csv' AS row
WITH toInteger(row["ID"]) as ID, row["Primary_ID"] as Primary, row["Secondary_ID"] as Secondary
MATCH (c:item {itemId: Secondary})
MATCH (p:item {itemId: Primary})
MERGE (o)-->(p)
RETURN count(o);
Looks like there's a typo in your query: you used o instead of c in the MERGE clause, so the pattern never involves the c node you matched. Note also that MERGE (and CREATE) require an explicit relationship type; :LINKED_TO below is only a placeholder, so substitute whatever type fits your model.
LOAD CSV WITH HEADERS FROM 'file:///FileB.csv' AS row
WITH toInteger(row["ID"]) as ID, row["Primary_ID"] as Primary, row["Secondary_ID"] as Secondary
MATCH (c:item {itemId: Secondary})
MATCH (p:item {itemId: Primary})
MERGE (c)-[:LINKED_TO]->(p)
RETURN count(c);
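To sanity-check the result, a quick lookup like this should list the new links (again with the placeholder type):
MATCH (c:item)-[:LINKED_TO]->(p:item)
RETURN c.itemId, p.itemId;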
I want to use READ_NOS to read a file from S3 and get all rows back, but it only returns some rows.
I created a foreign table for a Parquet file,
but this is the result: https://imgur.com/a/E0KLNJT
Using Teradata Studio gives the same result: https://imgur.com/a/d8UP9uH
How can I get all rows returned?
The first SQL statement (COUNT(*)) shows the number of records, the second one the number of Parquet files. So on average each file holds 6.470 records.
There is a Teradata Orange Book dedicated to the use of NOS, with some background as well as example SQL. Chapter 5 of it focuses on Parquet files.
It looks like RETURNTYPE ('NOSREAD_PARQUET_SCHEMA') is important in the combination of READ_NOS and Parquet.
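For example, something along these lines (a sketch; the LOCATION path is a placeholder, and it assumes your S3 credentials or authorization object are already in place) returns the schema READ_NOS detects in the Parquet files:
SELECT * FROM READ_NOS (
USING
LOCATION('/s3/your-bucket.s3.amazonaws.com/parquet/')
RETURNTYPE('NOSREAD_PARQUET_SCHEMA')
) AS d;
Comparing that detected schema against the foreign table definition is a good first check when rows silently go missing.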
I am using the following code to create a graph:
LOAD CSV WITH HEADERS
FROM "file:///fileName.csv"
AS network
MERGE (n:sourceNode {id:network.node1})
MERGE (m:destNode {id:network.node2})
WITH n,m,network
CALL apoc.create.relationship(n, network.connection, {}, m) yield rel
RETURN n,
rel,
m
The CSV file contains repeating values like:
node1,connection,node2
A,0.75,B
c,0.5,A
This code creates a graph like this (screenshot omitted: node A appears twice, once per label).
But I need a graph like the following to perform analysis (screenshot omitted: a single shared node A).
One solution I came up with is to create both node1 and node2 with a single MERGE clause, as that would create non-repeating nodes. I have tried to modify the code like this:
MERGE (n:sourceNode {id:network.node1}, m:destNode {id:network.node2})
and other variants, but I get a syntax error. Can someone please help me out with this situation, or suggest another solution to this problem?
You have two A nodes because your MERGE clauses are not using the same label.
So at the end you have:
one node A with the label sourceNode
one node A with the label destNode
If you want to have only one node A, please use a common label on both the source and destination nodes, something like this:
LOAD CSV WITH HEADERS
FROM "file:///fileName.csv"
AS network
MERGE (n:Node {id:network.node1})
MERGE (m:Node {id:network.node2})
WITH n,m,network
CALL apoc.create.relationship(n, network.connection, {}, m) yield rel
RETURN n,
rel,
m
Moreover, for this example you should create a unique constraint on the label Node for the property id:
CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE;
Besides guaranteeing uniqueness, the backing index speeds up the MERGE lookups.
I use Neo4j Community Edition version 3.2.1.
Consider this CSV file with edges:
node1,relation,node2,type
1,RELATED_TO,2,Married
2,RELATED_TO,1,Married
1,RELATED_TO,3,Child
2,RELATED_TO,3,Child
3,RELATED_TO,4,Sibling
3,RELATED_TO,5,Sibling
4,RELATED_TO,5,Sibling
I have already created the nodes for this. I then run the following CSV load command:
load csv with headers from
"file:///test_dataset/edges.csv" as line
match (person1:Person {pid:line.node1}),
(person2:Person {pid:line.node2})
create (person1)-[:line.relation {type:line.type}]->(person2)
But this returns the following error:
Invalid input '.': expected an identifier character, whitespace, '|', a length specification, a property map or ']' (line 5, column 24 (offset: 167))
"create (person1)-[:line.relation {type:line.type}]->(person2)"
It seems that I cannot use "line.relation" like this. How can I use the relation from the CSV file (second column) with LOAD CSV?
I have seen this answer, but I would like to do this using native query language.
To verify that the rest of the query is correct, I have managed to create the edges correctly by hardcoding the relation like this:
load csv with headers from
"file:///test_dataset/edges.csv" as line
match (person1:Person {pid:line.node1}),
(person2:Person {pid:line.node2})
create (person1)-[:RELATED_TO {type:line.type}]->(person2)
Natively it's not possible to create a node with a dynamic label or a relationship with a dynamic type.
That's why there is an APOC procedure for that.
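With the APOC plugin installed, a minimal sketch for your CSV would be (apoc.create.relationship takes the start node, the dynamic type string, a property map, and the end node):
LOAD CSV WITH HEADERS FROM "file:///test_dataset/edges.csv" AS line
MATCH (person1:Person {pid:line.node1})
MATCH (person2:Person {pid:line.node2})
CALL apoc.create.relationship(person1, line.relation, {type:line.type}, person2) YIELD rel
RETURN count(rel);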
If you want to do it natively and you know all the distinct values of your relation column, you can create one Cypher script per value, like this:
LOAD CSV WITH HEADERS FROM "file:///test_dataset/edges.csv" AS line
WITH line WHERE line.relation ='RELATED_TO'
MATCH (person1:Person {pid:line.node1})
MATCH (person2:Person {pid:line.node2})
CREATE (person1)-[:RELATED_TO {type:line.type}]->(person2)
Is it possible to extract files for only 3 days, without extracting all the files?
DROP VIEW IF EXISTS dbo.Read;
CREATE VIEW IF NOT EXISTS dbo.Read AS
EXTRACT
Statements
FROM
"adl://Test/{date:yyyy}/{date:M}/{date:d}/Testfile.csv"
USING Extractors.Csv(silent : true, quoting : true, nullEscape : "/N");
@res =
SELECT * FROM dbo.Read
WHERE date BETWEEN DateTime.Parse("2015/07/01") AND DateTime.Parse("2015/07/03");
OUTPUT @res
TO "adl://test/Testing/loop.csv"
USING Outputters.Csv();
For your query, partition elimination already ensures that only the files matching the predicate are actually read (you can confirm this in the job graph).
See also my previous answer on How to implement Loops in U-SQL.
If you have remaining concerns about performance, the job graph can also help you nail down where they originate.
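To make the mechanics concrete, here is a minimal sketch (col1 stands in for your real column list): the {date} pattern identifier is declared as a DateTime column in the EXTRACT, and the predicate on it is what partition elimination uses to skip files:
@data =
    EXTRACT col1 string,
            date DateTime // virtual column filled from the {date:...} parts of the path
    FROM "adl://Test/{date:yyyy}/{date:M}/{date:d}/Testfile.csv"
    USING Extractors.Csv(silent : true, quoting : true, nullEscape : "/N");
@res =
    SELECT col1
    FROM @data
    WHERE date >= DateTime.Parse("2015/07/01") AND date <= DateTime.Parse("2015/07/03");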
You can use the pattern identifiers in the fileset specification in parts of the path or even parts of the name (see https://msdn.microsoft.com/en-us/library/azure/mt771650.aspx). You can also give a list of files, so if you only have one file in each directory you can do:
EXTRACT ...
FROM "adl://Test/2015/07/1/Testfile.csv"
, "adl://Test/2015/07/2/Testfile.csv"
USING ...;
If there is more than one file in each directory, you can do individual extracts for each day and then union the results. Something like:
@a = EXTRACT ....
FROM "adl://Test/2015/07/1/{*}.csv"
USING ...;
@b = EXTRACT ....
FROM "adl://Test/2015/07/2/{*}.csv"
USING ...;
@fullset = SELECT * FROM @a UNION SELECT * FROM @b;
Unfortunately, I believe there is no list syntax for filesets at the moment that would allow you to handle the above case in a single EXTRACT statement.
I have a pipe-delimited .txt flat file that I'm using to bulk insert into SQL. Everything works well for a straight one-to-one mapping. However, the flat file now contains 2 new fields that can repeat an unknown number of times.
Is there a way to create a single flat file schema where I can have an unbounded child within the main unbounded child? I think the place I'm getting tripped up is how to make the ChildRoot listed below just a "group heading" like Root is, where ChildRoot doesn't correspond to an actual location in the flat file. How do I set something like that up?
Schema:
-Roots
--Root (unbounded)
---ChildID
---ChildName
Roots gets a direct link to my SQL stored procedure to bulk insert however many "Root" rows come in.
Now I have:
Schema:
-Roots
--Root (unbounded)
---Child
---ChildName
---ChildRoot (unbounded)
----ChildRootID
----ChildRootName
EDIT:
I should also add that ChildRootID & ChildRootName can repeat an indefinite number of times until the row delimiter (a carriage return) is found.