How to properly wire queries in NebulaGraph Explorer Workflow - nebula-graph

I am using NebulaGraph Explorer Workflow to create DAG pipelines on NebulaGraph.
I am creating a DAG with three tasks, like this:
┌──────────────────────────────────────────┐
│                                          │
│  MATCH ()-[e]->()                        │
│  WITH e LIMIT 10000                      │
│  WITH e AS e                             │
│  WHERE e.goals > 10                      │
│    AND toFloat(e.goals)/e.caps > 0.2     │
│  RETURN src(e), dst(e)                   │
│                                          │
└──────────┬───────────────────────────────┘
           │
           ▼
┌──────────────────────────────────┐
│                                  │
│  MATCH (v0)-[:belongto]-(v1)     │
│  WHERE id(v0) == ${src}          │  # src(e) from the previous task is mapped into ${src}
│  RETURN id(v0), id(v1)           │
│                                  │
└───────────┬────────┬─────────────┘
            │        │
┌───────────▼────────▼────────────┐
│                                 │
│      Betweenness Centrality     │
│                                 │
└─────────────────────────────────┘
However, it failed in the second task:
Process exited with status 1
/root/nebula-analytics-3.3.0-centos7.x86_64/3rd/mpich/bin/mpiexec.hydra -n 1 -hosts 192.168.8.237 /root/nebula-analytics-3.3.0-centos7.x86_64/bin/exec_ngql --ngql=MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku","Neymar","Naïm Sliti","Mehdi Taremi","Mario Götze","Kylian Mbappé","Kasper Dolberg","Jordan Morris","Joel Campbell","Ivan Perišić","Enner Valencia (captain","Christian Eriksen","Alphonso Davies","Ali Assadalla","Akram Afif","İlkay Gündoğan","Youssef En-Nesyri","Serge Gnabry","Sardar Azmoun","Richarlison","Raúl Jiménez","Mohammed Muntari","Almoez Ali","Memphis Depay","Marcus Rashford","Luis Suárez","Lucas Cavallini","Leroy Sané","Karl Toko Ekambi","Hirving Lozano","Haris Seferovic","Hakim Ziyech","Gareth Bale (captain","Gabriel Jesus","Dušan Tadić (captain","Christian Pulisic","Bruno Fernandes","Andrej Kramarić","Aleksandar Mitrović","Aaron Ramsey","Thomas Partey","Robert Lewandowski (captain","Raheem Sterling","Lionel Messi (captain","Kwon Chang-hoon","Junior Hoilett","Hassan Al-Haydos (captain","Cristiano Ronaldo (captain","André Silva","André Ayew (captain","Alireza Jahanbakhsh","Ángel Di María","Wahbi Khazri","Vincent Aboubakar (captain","Thomas Müller","Takumi Minamino","Leon Goretzka","Krzysztof Piątek","Karim Benzema","Jordan Ayew","Harry Kane (captain","Ferran Torres","Cyle Larin","Arkadiusz Milik","Antoine Griezmann","Xherdan Shaqiri","Son Heung-min (captain","Salem Al-Dawsari","Olivier Giroud","Michy Batshuayi","Lautaro Martínez","Kevin De Bruyne","Karim Ansarifard","Jonathan David","Hwang Ui-jo","Eric Maxim Choupo-Moting","Edinson Cavani","Eden Hazard (captain"
RETURN id(v0), id(v1) --threads=1 --datasource_user=root --datasink_hdfs_url=hdfs://192.168.8.168:9000/ll_test/analytics/1999/tasks/query_2/ --datasource_graphd=192.168.8.131:9669 --datasource_space=fifa_2020 --datasource_graphd_timeout=60000
I20230109 02:52:37.758909 2224521 base.hpp:179] thread support level provided by MPI:
I20230109 02:52:37.759177 2224521 base.hpp:182] MPI_THREAD_MULTIPLE
I20230109 02:52:37.759189 2224521 base.hpp:215] threads: 1
I20230109 02:52:37.759192 2224521 base.hpp:216] sockets: 2
I20230109 02:52:37.759194 2224521 base.hpp:217] partitions: 1
I20230109 02:52:37.759402 2224521 license.cc:519] [part-0]Signature validation started
I20230109 02:52:37.759697 2224521 license.cc:547] [part-0]Signature validation succeed
I20230109 02:52:37.759745 2224521 license.cc:717] [part-0][License] Trial license detected, hardware checking is skipped.
I20230109 02:52:37.759830 2224521 license.cc:623] [part-0]The number of cpus of the current machine is 4
I20230109 02:52:37.759858 2224521 license.cc:253] [License] Expiration timestamp in UTC: 4826275199
I20230109 02:52:37.759866 2224521 license.cc:259] [License] Timezone difference: 0 seconds
I20230109 02:52:37.759869 2224521 license.cc:263] [License] Expiration timestamp in local time zone: 4826275199
I20230109 02:52:37.759872 2224521 license.cc:607] [part-0][License] Expiration check passed
I20230109 02:52:37.810142 2224521 exec_ngql.cc:52] stmt:MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku",[... same list of names as above ...],"Eden Hazard (captain"
RETURN id(v0), id(v1)
I20230109 02:52:37.810698 2224521 exec_ngql.cc:55] session execute failed, statment: MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku",[... same list of names as above ...],"Eden Hazard (captain"
RETURN id(v0), id(v1)
errorCode: -1004, errorMsg: SyntaxError: syntax error near `E id(v0)'
I am following chapter 4 of the docs, but something still seems to go wrong.
I tried using the GO clause instead of MATCH in the second task, but it complained with a similar error.
Could anyone help point out where I could be wrong?
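Looking at the expanded statement in the log, ${src} is injected as a bare comma-separated list of quoted strings, which is not valid nGQL after the == operator, and that would explain the syntax error. My guess is that matching against a list with IN would at least be syntactically valid; a minimal sketch, assuming Explorer keeps expanding ${src} to the same comma-separated values:

// Hypothetical rewrite: assumes ${src} expands to "name1","name2",... as shown in the log
MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) IN [${src}]
RETURN id(v0), id(v1)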

Related

Maxscale readwritesplit show the same number of connections

I've set up MaxScale with readwritesplit, with no reads on the master (the default), in front of a Galera cluster (3 nodes).
#
# Global configuration
#
[maxscale]
threads=auto
local_address=10.1.0.11
query_retries=2
#
# Servers
#
[sql1]
type=server
address=10.1.0.2
port=3306
protocol=MariaDBBackend
persistpoolmax=16
persistmaxtime=300s
priority=1
ssl=required
ssl_ca_cert=/var/lib/maxscale/ssl/ca-cert.pem
ssl_cert=/var/lib/maxscale/ssl/client.pem
ssl_key=/var/lib/maxscale/ssl/client.key
[sql2]
type=server
address=10.1.0.3
port=3306
protocol=MariaDBBackend
persistpoolmax=16
persistmaxtime=300s
priority=2
ssl=required
ssl_ca_cert=/var/lib/maxscale/ssl/ca-cert.pem
ssl_cert=/var/lib/maxscale/ssl/client.pem
ssl_key=/var/lib/maxscale/ssl/client.key
[sql3]
type=server
address=10.1.0.4
port=3306
protocol=MariaDBBackend
persistpoolmax=16
persistmaxtime=300s
priority=3
ssl=required
ssl_ca_cert=/var/lib/maxscale/ssl/ca-cert.pem
ssl_cert=/var/lib/maxscale/ssl/client.pem
ssl_key=/var/lib/maxscale/ssl/client.key
#
# Monitor
#
[monitor]
type=monitor
module=galeramon
servers=sql1,sql2,sql3
user=maxscale
password=324F7B3BE796AD5F4BB2FAD65E1F9052A976701742729400
available_when_donor=true
use_priority=true
#
# Listeners
#
[listener-rw]
type=listener
service=readwritesplit
protocol=MariaDBClient
address=10.1.0.1
port=3306
ssl=required
ssl_ca_cert=/var/lib/maxscale/ssl/ca-cert.pem
ssl_cert=/var/lib/maxscale/ssl/server.pem
ssl_key=/var/lib/maxscale/ssl/server.key
#
# Services
#
[readwritesplit]
type=service
router=readwritesplit
servers=sql1,sql2,sql3
user=maxscale
password=324F74A347291B3BE79956AD5F4BB917701742729400
enable_root_user=1
max_sescmd_history=150
While testing some read queries using loader.io, I always get the same number of connections across all nodes:
> maxctrl list servers
┌────────┬───────────┬──────┬─────────────┬─────────────────────────┬───────────────────────────────┐
│ Server │ Address │ Port │ Connections │ State │ GTID │
├────────┼───────────┼──────┼─────────────┼─────────────────────────┼───────────────────────────────┤
│ sql1 │ 10.1.0.2 │ 3306 │ 87 │ Master, Synced, Running │ 0-1-12474939,1-1-148225,2-2-2 │
├────────┼───────────┼──────┼─────────────┼─────────────────────────┼───────────────────────────────┤
│ sql2 │ 10.1.0.3 │ 3306 │ 87 │ Slave, Synced, Running │ 0-2-410,2-2-2 │
├────────┼───────────┼──────┼─────────────┼─────────────────────────┼───────────────────────────────┤
│ sql3 │ 10.1.0.4 │ 3306 │ 87 │ Slave, Synced, Running │ 2-2-2 │
└────────┴───────────┴──────┴─────────────┴─────────────────────────┴───────────────────────────────┘
Shouldn't I expect to see a high number of connections on nodes 2 and 3 (the slaves) and a low number on node 1?
By default readwritesplit creates a connection to all nodes. You need to define max_slave_connections=1 to have it create only one slave connection.
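For example, a minimal sketch based on the [readwritesplit] service section above (only the last parameter is new; adjust the rest to your setup):

[readwritesplit]
type=service
router=readwritesplit
servers=sql1,sql2,sql3
user=maxscale
password=324F74A347291B3BE79956AD5F4BB917701742729400
enable_root_user=1
max_sescmd_history=150
# Open one slave connection per session instead of connecting to every node
max_slave_connections=1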

How do I read a CSV file with times in AM and PM in Julia

I am using Julia CSV and I am trying to read data with a DateTime in the form 10/17/2012 12:00:00 AM. I tried:
dfmt = dateformat"mm/dd/yyyy HH:MM:SS"
data = CSV.File("./Fremont_Bridge_Bicycle_Counter.csv", dateformat=dfmt) |> DataFrame
println(first(data,8))
but I think the AM and PM make the string not recognized as a date. Can someone help show how to parse this as a date?
You can use the p specifier, which matches AM or PM. With that, your date format would look like this:
dfmt = dateformat"mm/dd/yyyy HH:MM:SS p"
You can see that the parsing is correct:
julia> DateTime("10/17/2012 12:00:00 AM", dfmt)
2012-10-17T00:00:00
To see all the possible format characters, check out the docstring of Dates.DateFormat, which is accessible in the REPL through ?DateFormat.
With the file Fremont_Bridge_Bicycle_Counter.csv
N1, N2, fecha
hola, 3, 10/03/2020 10:30:00 AM
pepe, 5, 10/03/2020 11:40:50 AM
juan, 5, 03/04/2020 08:10:12 PM
And with the Julia code:
using DataFrames, Dates, CSV
dfmt = dateformat"mm/dd/yyyy HH:MM:SS p"
data = CSV.File("./Fremont_Bridge_Bicycle_Counter.csv", dateformat=dfmt) |> DataFrame
println(first(data,8))
It gives the right result:
3×3 DataFrame
│ Row │ N1 │ N2 │ fecha │
│ │ String │ Int64 │ DateTime │
├─────┼────────┼───────┼─────────────────────┤
│ 1 │ hola │ 3 │ 2020-10-03T10:30:00 │
│ 2 │ pepe │ 5 │ 2020-10-03T11:40:50 │
│ 3 │ juan │ 5 │ 2020-03-04T20:10:12 │

Problem with Sqlite Query - How to convert the code from SQLite v0.9.0 to v1.0.0

I am pretty new to Julia and I am just playing around, and suddenly
the following code started throwing errors, even though it worked in the past.
using SQLite
db = SQLite.DB("db")
data = SQLite.Query(db,"SELECT * FROM d")
throws:
ERROR: LoadError: MethodError: no method matching
SQLite.Query(::SQLite.DB, ::String)
Can someone please enlighten me as to what the problem is? Thank you.
I also tried with lower case: query.
Here is a short MWE of the differences between SQLite.jl v0.9.0 and v1.0.0 with the current Julia version (1.3.1).
You do not have the table, so you need to create it first:
using SQLite
using DataFrames
db = SQLite.DB("db")
# v0.9.0
SQLite.Query(db,"CREATE TABLE d (col1 INT, col2 varchar2(100))")
# v1.0.0
DBInterface.execute(db,"CREATE TABLE d (col1 INT, col2 varchar2(100))")
Now you can check if the table exists:
julia> SQLite.tables(db) |> DataFrame
1×1 DataFrames.DataFrame
│ Row │ name │
│ │ String⍰ │
├─────┼─────────┤
│ 1 │ d │
Let's insert some rows (note how one should separate data from SQL code via prepared statements):
stmt = SQLite.Stmt(db, "INSERT INTO d (col1, col2) VALUES (?, ?)")
#v0.9.0
SQLite.execute!(stmt; values=(1, "Hello world"))
SQLite.execute!(stmt; values=(2, "Goodbye world"))
#v1.0.0
DBInterface.execute(stmt, (1, "Hello world"))
DBInterface.execute(stmt, (2, "Goodbye world"))
Now let us get the data
v0.9.0
julia> data = SQLite.Query(db,"SELECT * FROM d") |> DataFrame
2×2 DataFrame
│ Row │ col1 │ col2 │
│ │ Int64⍰ │ String⍰ │
├─────┼────────┼───────────────┤
│ 1 │ 1 │ Hello world │
│ 2 │ 2 │ Goodbye world │
v1.0.0
julia> data = DBInterface.execute(db, "select * from d") |> DataFrame
2×2 DataFrame
│ Row │ col1 │ col2 │
│ │ Int64⍰ │ String⍰ │
├─────┼────────┼───────────────┤
│ 1 │ 1 │ Hello world │
│ 2 │ 2 │ Goodbye world │

Kusto query could not be parsed

I have a SecurityLog table with fields like DstIP_s and want to display records matching my trojanDst table:
let trojanDst = datatable (DstIP_s:string)
[ "1.1.1.1","2.2.2.2","3.3.3.3"
];
SecurityLog |
| join trojanDst on DstIP_s
I am getting a "query could not be parsed" error.
The query you posted has a redundant pipe (|) before the join.
From an efficiency standpoint, make sure the left side of the join is the smaller one, as suggested here: https://learn.microsoft.com/en-us/azure/kusto/query/best-practices#join-operator
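Putting both points together, a corrected version of your query could look like this (extra pipe removed, and the small lookup table moved to the left side of the join):

let trojanDst = datatable (DstIP_s:string)
[ "1.1.1.1","2.2.2.2","3.3.3.3"
];
// the small table drives the join; kind=inner keeps every matching SecurityLog row
trojanDst
| join kind=inner (SecurityLog) on DstIP_s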
This is too long for a comment. As @Yoni L. pointed out, the problem is the doubled pipe operator.
For anyone with a SQL background, join may be a bit counterintuitive (in reality it is kind=innerunique):
JOIN operator:
kind unspecified, kind=innerunique
Only one row from the left side is matched for each value of the on key. The output contains a row for each match of this row with rows from the right.
kind=inner
There's a row in the output for every combination of matching rows from left and right.
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ a │ 1 │ c │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
Demo
SQL style JOIN version:
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join kind=inner t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ b │ 1 │ c │
│ 1 │ a │ 1 │ c │
│ 1 │ b │ 1 │ d │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
Demo
There are many join types in KQL, such as innerunique, inner, leftouter, rightouter, fullouter, anti, and more; here you can find the full list.

Neo4j: How time-consuming is EVERY branch between node A and F?

The following graph is given:
    -> B --> E -
   /            \
A -              -> F
   \            /
    -> C --> D -
All nodes are of type task. As properties, they have a start time and an end time (both are of the data type DateTime).
All relationships are CONNECT_TO and are directed to the right. The relationships have no properties.
Can somebody help me with how the following query should look in Cypher?
How time-consuming is EVERY branch between node A and F?
A list as a result would be fine:
Path         Duration [minutes]
---------------------------------
A->B->E->F   100
A->C->D->F   50
Thanks for your help.
Creating your graph
The first statement creates the nodes, the second the relationships between them.
CREATE
(TaskA:Task {name: 'TaskA', time:10}),
(TaskB:Task {name: 'TaskB', time:20}),
(TaskC:Task {name: 'TaskC', time:30}),
(TaskD:Task {name: 'TaskD', time:10}),
(TaskE:Task {name: 'TaskE', time:40}),
(TaskF:Task {name: 'TaskF', time:10})
CREATE
(TaskA)-[:CONNECT_TO]->(TaskB),
(TaskB)-[:CONNECT_TO]->(TaskE),
(TaskE)-[:CONNECT_TO]->(TaskF),
(TaskA)-[:CONNECT_TO]->(TaskC),
(TaskC)-[:CONNECT_TO]->(TaskD),
(TaskD)-[:CONNECT_TO]->(TaskF);
Your desired solution
1. Defining your start node (Task A)
2. Finding a path of variable length
3. Defining your end node (Task F)
4. Retrieving all task nodes for each path
5. Summing the duration over all tasks of each path
6. Bonus: the number of tasks per path
Neo4j Statement:
//           |----------- 1 ------------||------ 2 ------||----------- 3 ------------|
MATCH path = (taskA:Task {name: 'TaskA'})-[:CONNECT_TO*]->(taskF:Task {name: 'TaskF'})
//     |--- 4 ---|
UNWIND nodes(path) AS task
//           |------------ 5 -------------|  |----------- 6 ------------|
RETURN path, sum(task.time) AS timeConsumed, length(path)+1 AS taskAmount;
Result
╒══════════════════════════════════════════════════════════════════════╤════════════════╤════════════╕
│"path" │ "timeConsumed" │"taskAmount"│
╞══════════════════════════════════════════════════════════════════════╪════════════════╪════════════╡
│[{"name":"TaskA","time":10},{},{"name":"TaskB","time":20},{"name":"Tas│80 │4 │
│kB","time":20},{},{"name":"TaskE","time":40},{"name":"TaskE","time":40│ │ │
│},{},{"name":"TaskF","time":10}] │ │ │
├──────────────────────────────────────────────────────────────────────┼────────────────┼────────────┤
│[{"name":"TaskA","time":10},{},{"name":"TaskC","time":30},{"name":"Tas│60 │4 │
│kC","time":30},{},{"name":"TaskD","time":10},{"name":"TaskD","time":10│ │ │
│},{},{"name":"TaskF","time":10}] │ │ │
└──────────────────────────────────────────────────────────────────────┴────────────────┴────────────┘
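Bonus variant: since your tasks store a start time and an end time as DateTime properties rather than a precomputed duration, a hedged sketch of the same statement (assuming the properties are named start and end) could sum the minutes per path with duration.inSeconds:

// Hypothetical variant: assumes each Task node has DateTime properties `start` and `end`
MATCH path = (taskA:Task {name: 'TaskA'})-[:CONNECT_TO*]->(taskF:Task {name: 'TaskF'})
UNWIND nodes(path) AS task
RETURN path, sum(duration.inSeconds(task.start, task.end).minutes) AS durationMinutes;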
