I am using NebulaGraph Explorer Workflow to build DAG pipelines on NebulaGraph.
I created a DAG with three tasks, like this:
┌──────────────────────────────────────────┐
│                                          │
│  MATCH ()-[e]->()                        │
│  WITH e LIMIT 10000                      │
│  WITH e AS e                             │
│  WHERE e.goals > 10                      │
│    AND toFloat(e.goals)/e.caps > 0.2     │
│  RETURN src(e), dst(e)                   │
│          │                               │
└──────────┼───────────────────────────────┘
           │
           ▼
┌──────────────────────────────────┐
│                                  │
│  MATCH (v0)-[:belongto]-(v1)     │
│  WHERE id(v0) == ${src}          │   # here I mapped src(e) from the last task into ${src}
│  RETURN id(v0), id(v1)           │
│           │        │             │
└───────────┼────────┼─────────────┘
            │        │
 ┌──────────▼────────▼────────────┐
 │                                │
 │  Betweenness Centrality        │
 │                                │
 │                                │
 └────────────────────────────────┘
However, it failed in the second task:
Process exited with status 1
/root/nebula-analytics-3.3.0-centos7.x86_64/3rd/mpich/bin/mpiexec.hydra -n 1 -hosts 192.168.8.237 /root/nebula-analytics-3.3.0-centos7.x86_64/bin/exec_ngql --ngql=MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku","Neymar","Naïm Sliti","Mehdi Taremi","Mario Götze","Kylian Mbappé","Kasper Dolberg","Jordan Morris","Joel Campbell","Ivan Perišić","Enner Valencia (captain","Christian Eriksen","Alphonso Davies","Ali Assadalla","Akram Afif","İlkay Gündoğan","Youssef En-Nesyri","Serge Gnabry","Sardar Azmoun","Richarlison","Raúl Jiménez","Mohammed Muntari","Almoez Ali","Memphis Depay","Marcus Rashford","Luis Suárez","Lucas Cavallini","Leroy Sané","Karl Toko Ekambi","Hirving Lozano","Haris Seferovic","Hakim Ziyech","Gareth Bale (captain","Gabriel Jesus","Dušan Tadić (captain","Christian Pulisic","Bruno Fernandes","Andrej Kramarić","Aleksandar Mitrović","Aaron Ramsey","Thomas Partey","Robert Lewandowski (captain","Raheem Sterling","Lionel Messi (captain","Kwon Chang-hoon","Junior Hoilett","Hassan Al-Haydos (captain","Cristiano Ronaldo (captain","André Silva","André Ayew (captain","Alireza Jahanbakhsh","Ángel Di María","Wahbi Khazri","Vincent Aboubakar (captain","Thomas Müller","Takumi Minamino","Leon Goretzka","Krzysztof Piątek","Karim Benzema","Jordan Ayew","Harry Kane (captain","Ferran Torres","Cyle Larin","Arkadiusz Milik","Antoine Griezmann","Xherdan Shaqiri","Son Heung-min (captain","Salem Al-Dawsari","Olivier Giroud","Michy Batshuayi","Lautaro Martínez","Kevin De Bruyne","Karim Ansarifard","Jonathan David","Hwang Ui-jo","Eric Maxim Choupo-Moting","Edinson Cavani","Eden Hazard (captain"
RETURN id(v0), id(v1) --threads=1 --datasource_user=root --datasink_hdfs_url=hdfs://192.168.8.168:9000/ll_test/analytics/1999/tasks/query_2/ --datasource_graphd=192.168.8.131:9669 --datasource_space=fifa_2020 --datasource_graphd_timeout=60000
I20230109 02:52:37.758909 2224521 base.hpp:179] thread support level provided by MPI:
I20230109 02:52:37.759177 2224521 base.hpp:182] MPI_THREAD_MULTIPLE
I20230109 02:52:37.759189 2224521 base.hpp:215] threads: 1
I20230109 02:52:37.759192 2224521 base.hpp:216] sockets: 2
I20230109 02:52:37.759194 2224521 base.hpp:217] partitions: 1
I20230109 02:52:37.759402 2224521 license.cc:519] [part-0]Signature validation started
I20230109 02:52:37.759697 2224521 license.cc:547] [part-0]Signature validation succeed
I20230109 02:52:37.759745 2224521 license.cc:717] [part-0][License] Trial license detected, hardware checking is skipped.
I20230109 02:52:37.759830 2224521 license.cc:623] [part-0]The number of cpus of the current machine is 4
I20230109 02:52:37.759858 2224521 license.cc:253] [License] Expiration timestamp in UTC: 4826275199
I20230109 02:52:37.759866 2224521 license.cc:259] [License] Timezone difference: 0 seconds
I20230109 02:52:37.759869 2224521 license.cc:263] [License] Expiration timestamp in local time zone: 4826275199
I20230109 02:52:37.759872 2224521 license.cc:607] [part-0][License] Expiration check passed
I20230109 02:52:37.810142 2224521 exec_ngql.cc:52] stmt:MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku","Neymar","Naïm Sliti","Mehdi Taremi","Mario Götze","Kylian Mbappé","Kasper Dolberg","Jordan Morris","Joel Campbell","Ivan Perišić","Enner Valencia (captain","Christian Eriksen","Alphonso Davies","Ali Assadalla","Akram Afif","İlkay Gündoğan","Youssef En-Nesyri","Serge Gnabry","Sardar Azmoun","Richarlison","Raúl Jiménez","Mohammed Muntari","Almoez Ali","Memphis Depay","Marcus Rashford","Luis Suárez","Lucas Cavallini","Leroy Sané","Karl Toko Ekambi","Hirving Lozano","Haris Seferovic","Hakim Ziyech","Gareth Bale (captain","Gabriel Jesus","Dušan Tadić (captain","Christian Pulisic","Bruno Fernandes","Andrej Kramarić","Aleksandar Mitrović","Aaron Ramsey","Thomas Partey","Robert Lewandowski (captain","Raheem Sterling","Lionel Messi (captain","Kwon Chang-hoon","Junior Hoilett","Hassan Al-Haydos (captain","Cristiano Ronaldo (captain","André Silva","André Ayew (captain","Alireza Jahanbakhsh","Ángel Di María","Wahbi Khazri","Vincent Aboubakar (captain","Thomas Müller","Takumi Minamino","Leon Goretzka","Krzysztof Piątek","Karim Benzema","Jordan Ayew","Harry Kane (captain","Ferran Torres","Cyle Larin","Arkadiusz Milik","Antoine Griezmann","Xherdan Shaqiri","Son Heung-min (captain","Salem Al-Dawsari","Olivier Giroud","Michy Batshuayi","Lautaro Martínez","Kevin De Bruyne","Karim Ansarifard","Jonathan David","Hwang Ui-jo","Eric Maxim Choupo-Moting","Edinson Cavani","Eden Hazard (captain"
RETURN id(v0), id(v1)
I20230109 02:52:37.810698 2224521 exec_ngql.cc:55] session execute failed, statment: MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku","Neymar","Naïm Sliti","Mehdi Taremi","Mario Götze","Kylian Mbappé","Kasper Dolberg","Jordan Morris","Joel Campbell","Ivan Perišić","Enner Valencia (captain","Christian Eriksen","Alphonso Davies","Ali Assadalla","Akram Afif","İlkay Gündoğan","Youssef En-Nesyri","Serge Gnabry","Sardar Azmoun","Richarlison","Raúl Jiménez","Mohammed Muntari","Almoez Ali","Memphis Depay","Marcus Rashford","Luis Suárez","Lucas Cavallini","Leroy Sané","Karl Toko Ekambi","Hirving Lozano","Haris Seferovic","Hakim Ziyech","Gareth Bale (captain","Gabriel Jesus","Dušan Tadić (captain","Christian Pulisic","Bruno Fernandes","Andrej Kramarić","Aleksandar Mitrović","Aaron Ramsey","Thomas Partey","Robert Lewandowski (captain","Raheem Sterling","Lionel Messi (captain","Kwon Chang-hoon","Junior Hoilett","Hassan Al-Haydos (captain","Cristiano Ronaldo (captain","André Silva","André Ayew (captain","Alireza Jahanbakhsh","Ángel Di María","Wahbi Khazri","Vincent Aboubakar (captain","Thomas Müller","Takumi Minamino","Leon Goretzka","Krzysztof Piątek","Karim Benzema","Jordan Ayew","Harry Kane (captain","Ferran Torres","Cyle Larin","Arkadiusz Milik","Antoine Griezmann","Xherdan Shaqiri","Son Heung-min (captain","Salem Al-Dawsari","Olivier Giroud","Michy Batshuayi","Lautaro Martínez","Kevin De Bruyne","Karim Ansarifard","Jonathan David","Hwang Ui-jo","Eric Maxim Choupo-Moting","Edinson Cavani","Eden Hazard (captain"
RETURN id(v0), id(v1)
errorCode: -1004, errorMsg: SyntaxError: syntax error near `E id(v0)'
I followed chapter 4 of the docs, but something still seems to have gone wrong.
I also tried using a GO clause instead of MATCH in the second task, but it reported a similar error.
Could anyone help me figure out where I went wrong?
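The log shows that ${src} was replaced by all the src(e) values from the first task, joined with commas, so the generated statement reads WHERE id(v0) == "Álvaro Morata","Romelu Lukaku",..., which is not a valid single comparison and may well be what triggers the syntax error. The Python sketch below is a hypothetical reconstruction of that substitution. It assumes plain text templating, which the log suggests but which is not confirmed here, and shows one way to build a membership test with IN [...] instead. Whether ${src} can be placed inside [...] in a Workflow task is also an assumption to check against the Explorer docs.

```python
# Hypothetical reconstruction of the ${src} substitution seen in the log.
# ASSUMPTION: the Workflow does plain text templating; this is not its code.
from string import Template
from typing import List

TEMPLATE = Template(
    "MATCH (v0)-[:belongto]-(v1)\n"
    "WHERE id(v0) == ${src}\n"
    "RETURN id(v0), id(v1)"
)


def naive_substitute(values: List[str]) -> str:
    """Join quoted values with commas, as the logged statement shows."""
    joined = ",".join(f'"{v}"' for v in values)
    return TEMPLATE.substitute(src=joined)


def quote_ngql_string(value: str) -> str:
    """Quote a string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def in_list_query(values: List[str]) -> str:
    """Build a membership test with IN [...] instead of a single ==."""
    id_list = ", ".join(quote_ngql_string(v) for v in values)
    return (
        "MATCH (v0)-[:belongto]-(v1)\n"
        f"WHERE id(v0) IN [{id_list}]\n"
        "RETURN id(v0), id(v1)"
    )


if __name__ == "__main__":
    players = ["Neymar", "Richarlison", "Lionel Messi (captain"]
    print(naive_substitute(players))  # the invalid `== "a","b",...` form
    print(in_list_query(players))     # the `IN ["a", "b", ...]` form
```

Escaping matters here because the substituted values are free text: a name containing a double quote would otherwise break the statement in a different way.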
I have a SecurityLog table with fields such as DstIP_s, and I want to display the records that match my trojanDst table:
let trojanDst = datatable (DstIP_s:string)
[ "1.1.1.1","2.2.2.2","3.3.3.3"
];
SecurityLog |
| join trojanDst on DstIP_s
I am getting a "query could not be parsed" error. Why?
The query you posted has a redundant pipe (|) before the join; the last statement should read: SecurityLog | join trojanDst on DstIP_s
From an efficiency standpoint, make sure the left side of the join is the smaller one, as suggested here: https://learn.microsoft.com/en-us/azure/kusto/query/best-practices#join-operator
This is too long for a comment. As @Yoni L pointed out, the problem is the doubled pipe operator.
For anyone with an SQL background, join may be a bit counterintuitive (by default it is actually kind=innerunique):
JOIN operator:
kind unspecified, kind=innerunique
Only one row from the left side is matched for each value of the on
key. The output contains a row for each match of this row with rows
from the right.
Kind=inner
There's a row in the output for every combination of matching rows
from left and right.
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ a │ 1 │ c │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
SQL-style JOIN version:
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join kind=inner t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ b │ 1 │ c │
│ 1 │ a │ 1 │ c │
│ 1 │ b │ 1 │ d │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
There are many join kinds in KQL, such as innerunique, inner, leftouter, rightouter, fullouter, anti and more; the full list is in the KQL join operator documentation.
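To make the difference between the two kinds concrete, here is a small Python model of the t1/t2 example (plain lists of tuples, not a Kusto API). For innerunique, the left side is first deduplicated on the key. KQL does not specify which of the duplicate left rows survives, so this sketch simply keeps the first one it encounters.

```python
# Hypothetical Python model of KQL's inner vs innerunique join kinds,
# using plain lists of (key, value) tuples; this is not a Kusto API.
from typing import Dict, List, Tuple

Row = Tuple[int, str]
Joined = Tuple[int, str, int, str]


def join_inner(left: List[Row], right: List[Row]) -> List[Joined]:
    """kind=inner: one output row per matching (left, right) pair."""
    return [(lk, lv, rk, rv) for lk, lv in left for rk, rv in right if lk == rk]


def join_innerunique(left: List[Row], right: List[Row]) -> List[Joined]:
    """Default kind: deduplicate the left side on the key, then join.

    KQL keeps an arbitrary left row per key; this sketch keeps the first.
    """
    first_per_key: Dict[int, Row] = {}
    for row in left:
        first_per_key.setdefault(row[0], row)
    return join_inner(list(first_per_key.values()), right)


t1 = [(1, "a"), (1, "b")]
t2 = [(1, "c"), (1, "d")]

print(join_innerunique(t1, t2))  # [(1, 'a', 1, 'c'), (1, 'a', 1, 'd')]
print(join_inner(t1, t2))        # 4 rows: a/c, a/d, b/c, b/d
```

The row counts match the two outputs shown above: two rows for the default kind, four for kind=inner.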
Given a graph like:
CREATE (Alice:Person {id:'a', fraud:1})
CREATE (Bob:Person {id:'b', fraud:0})
CREATE (Charlie:Person {id:'c', fraud:0})
CREATE (David:Person {id:'d', fraud:0})
CREATE (Esther:Person {id:'e', fraud:0})
CREATE (Fanny:Person {id:'f', fraud:0})
CREATE (Gabby:Person {id:'g', fraud:0})
CREATE (Fraudster:Person {id:'h', fraud:1})
CREATE
(Alice)-[:CALL]->(Bob),
(Bob)-[:SMS]->(Charlie),
(Charlie)-[:SMS]->(Bob),
(Fanny)-[:SMS]->(Charlie),
(Esther)-[:SMS]->(Fanny),
(Esther)-[:CALL]->(David),
(David)-[:CALL]->(Alice),
(David)-[:SMS]->(Esther),
(Alice)-[:CALL]->(Esther),
(Alice)-[:CALL]->(Fanny),
(Fanny)-[:CALL]->(Fraudster)
When I try a query like:
MATCH (a)-->(b)
WHERE b.fraud = 1
RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)
I want to compute the fraudulence of a user, which (as fraud is either 0 or 1) is defined as the mean fraud level of all connected nodes:
MATCH ()--(f)
RETURN f.id, f.fraud, COUNT(*), COLLECT(f) AS fs
returns the correct number of friends, but cannot access the friends themselves; the COLLECT statement only collects the node itself:
╒══════╤═════════╤══════════════╤══════════╤══════════════════════════════════════════════════════════════════════╕
│"f.id"│"f.fraud"│"avg(f.fraud)"│"COUNT(*)"│"fs" │
╞══════╪═════════╪══════════════╪══════════╪══════════════════════════════════════════════════════════════════════╡
│"h" │1 │1 │1 │[{"fraud":1,"id":"h"}] │
├──────┼─────────┼──────────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│"f" │0 │0 │4 │[{"fraud":0,"id":"f"},{"fraud":0,"id":"f"},{"fraud":0,"id":"f"},{"frau│
│ │ │ │ │d":0,"id":"f"}] │
....
That is, naively calculating the average
MATCH ()--(f)
RETURN f.id, avg(f.fraud)
considers only this single node, not its network. How can I take the social network of a node into account instead (up to a defined depth, here 1), to improve on the original answer to "neo4j percentage of attribute for social network"?
edit
MATCH p = ()--()
UNWIND nodes(p) AS f
RETURN f.id, f.fraud, COUNT(*), COLLECT({id: f.id, fraud: f.fraud}) AS fs
returns only duplicates of the original node in the list, not the connected nodes:
│"f.id"│"f.fraud"│"COUNT(*)"│"fs" │
╞══════╪═════════╪══════════╪══════════════════════════════════════════════════════════════════════╡
│"h" │1 │2 │[{"id":"h","fraud":1},{"id":"h","fraud":1}] │
├──────┼─────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│"f" │0 │8 │[{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":│
│ │ │ │"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fr│
│ │ │ │aud":0},{"id":"f","fraud":0}] │
edit 2
MATCH p = (source)--(destination)
RETURN source.id, source.fraud, COUNT(*), COLLECT({id: destination.id, fraud: destination.fraud}) AS neighbors
is already pretty close, but lacks the avg function.
MATCH p = (source)-[*..3]-(destination)
RETURN source.id, source.fraud, COUNT(*), avg(destination.fraud), COLLECT({id: destination.id, fraud: destination.fraud}) AS neighbors
includes the fraudulence, defined as the average fraud level of the nodes reached within up to 3 hops.
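As a cross-check of what the depth-1 version computes on the example graph, here is a hypothetical Python model of MATCH (source)--(destination) RETURN source.id, COUNT(*), avg(destination.fraud). An undirected pattern matches each relationship once from each end, so the two SMS relationships between Bob and Charlie both count, and Gabby, who has no relationships, produces no row. The sketch only models depth 1; the variable-length [*..3] pattern enumerates paths and would need a different model.

```python
# Hypothetical Python model of
#   MATCH (source)--(destination)
#   RETURN source.id, COUNT(*), avg(destination.fraud)
# on the example graph (depth 1 only).
from collections import defaultdict
from typing import Dict, List, Tuple

FRAUD = {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0, "f": 0, "g": 0, "h": 1}

# (start, end) of every CALL/SMS relationship in the CREATE statement.
EDGES = [
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"), ("e", "f"), ("e", "d"),
    ("d", "a"), ("d", "e"), ("a", "e"), ("a", "f"), ("f", "h"),
]


def neighbour_fraud(
    fraud: Dict[str, int], edges: List[Tuple[str, str]]
) -> Dict[str, Tuple[int, float]]:
    """Return {node: (row count, mean fraud of matched neighbours)}."""
    neighbours: Dict[str, List[str]] = defaultdict(list)
    for start, end in edges:
        # An undirected pattern matches each relationship from both ends.
        neighbours[start].append(end)
        neighbours[end].append(start)
    return {
        node: (len(ns), sum(fraud[n] for n in ns) / len(ns))
        for node, ns in neighbours.items()
    }


for node, (count, avg) in sorted(neighbour_fraud(FRAUD, EDGES).items()):
    print(node, count, round(avg, 3))
```

The counts agree with the COUNT(*) values in the question (4 for "f", 1 for "h"), and "f" comes out at 0.5 because two of its four neighbours (Alice and the Fraudster) have fraud = 1.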
This may be a stupid question, but for the life of me I can't figure out how to get Julia to read a CSV file with column names that start with numbers and use them in DataFrames. How does one do this?
For example, say I have the file "test.csv" which contains the following:
,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23
If I just use readtable(), I get this:
julia> using DataFrames
julia> df = readtable("test.csv")
2x4 DataFrames.DataFrame
| Row | x | x1Y | x2Y | x3Y |
|-----|------|-----|-----|-----|
| 1 | "1Y" | 11 | 12 | 13 |
| 2 | "2Y" | 21 | 22 | 23 |
What gives? How can I get the column names to be what they're supposed to be, "1Y", "2Y", etc.?
The problem is that in DataFrames, column names are Symbols, which aren't meant to (see comment below) start with a number.
You can see this by evaluating e.g. typeof(:2), which returns Int64 rather than (as you might expect) Symbol. Thus, to get your column names into a usable format, DataFrames has to prefix them with a letter: typeof(:x2) returns Symbol, so :x2 is a valid column name.
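Python identifiers follow the same rule (they cannot start with a digit), which makes it a convenient language-neutral illustration. The sketch below mimics the renaming seen in the question and derives the reverse mapping that the rename! call further down writes out by hand. The rule "prefix x when the name is not a valid identifier" is inferred from the readtable output above, not taken from the DataFrames.jl source, and it only covers cases like a leading digit or an empty name.

```python
# Illustration in Python, which has the same "no leading digit" rule for
# identifiers. ASSUMPTION: the "prefix x" rule is inferred from the
# readtable output in the question, not copied from DataFrames.jl.
from typing import Dict, List


def mangle(name: str) -> str:
    """Keep valid identifiers as-is; otherwise prefix them with 'x'."""
    return name if name.isidentifier() else "x" + name


def restore_map(header: List[str]) -> Dict[str, str]:
    """Map each mangled, non-empty column name back to its header text."""
    return {mangle(h): h for h in header if h and mangle(h) != h}


header = ["", "1Y", "2Y", "3Y"]
print([mangle(h) for h in header])  # ['x', 'x1Y', 'x2Y', 'x3Y']
print(restore_map(header))          # {'x1Y': '1Y', 'x2Y': '2Y', 'x3Y': '3Y'}
```

The pairs in the resulting mapping are exactly the ones passed to rename! in the Julia answer (x1Y to 1Y, and so on).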
Unfortunately, you can't start column names with a number in DataFrames.
The code that parses the names enforces this restriction.
I believe this is because of how parsing works in Julia: :aa names a symbol, while :2aa is a value (which makes more sense when you consider that 1:2aa is a range).
You could just use rename!() after the import:
using DataFrames
df = csv"""
,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23
"""
rename!(df, Dict(:x1Y =>Symbol("1Y"), :x2Y=>Symbol("2Y"), :x3Y=>Symbol("3Y") ))
2×4 DataFrames.DataFrame
│ Row │ x │ 1Y │ 2Y │ 3Y │
├─────┼──────┼────┼────┼────┤
│ 1 │ "1Y" │ 11 │ 12 │ 13 │
│ 2 │ "2Y" │ 21 │ 22 │ 23 │
You may still run into problems later in your code, so it is better to avoid column names that start with numbers.