Neo4j: How time-consuming is EVERY branch between node A and F?

The following graph is given:
    -> B --> E -
   /            \
A -              -> F
   \            /
    -> C --> D -
All nodes are of type task. As properties, they have a start time and an end time (both are of the data type DateTime).
All relationships are CONNECT_TO and are directed to the right. The relationships have no properties.
Can somebody help me with how the following query should look in Cypher:
How time-consuming is EVERY branch between node A and F?
A list as result would be fine:
Path          Duration [minutes]
---------------------------------
A->B->E->F    100
A->C->D->F    50
Thanks for your help.

Creating your graph
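The first CREATE clause creates the nodes, the second the relationships between them. For simplicity, each task carries a single numeric time property as its duration instead of a start time and an end time.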
CREATE
(TaskA:Task {name: 'TaskA', time:10}),
(TaskB:Task {name: 'TaskB', time:20}),
(TaskC:Task {name: 'TaskC', time:30}),
(TaskD:Task {name: 'TaskD', time:10}),
(TaskE:Task {name: 'TaskE', time:40}),
(TaskF:Task {name: 'TaskF', time:10})
CREATE
(TaskA)-[:CONNECT_TO]->(TaskB),
(TaskB)-[:CONNECT_TO]->(TaskE),
(TaskE)-[:CONNECT_TO]->(TaskF),
(TaskA)-[:CONNECT_TO]->(TaskC),
(TaskC)-[:CONNECT_TO]->(TaskD),
(TaskD)-[:CONNECT_TO]->(TaskF);
Your desired solution
1. Defining your start node (Task A)
2. Finding path of variable length
3. Defining your end node (Task F)
4. Retrieve all task nodes for each path
5. Sum the duration for all tasks of each path
6. Bonus: amount of tasks per path
Neo4j Statement:
//           |----------- 1 -----------|  |----- 2 ----|  |----------- 3 -----------|
MATCH path = (taskA:Task {name: 'TaskA'})-[:CONNECT_TO*]->(taskF:Task {name: 'TaskF'})
UNWIND
//  |-- 4 -|
    nodes(path) AS task
//           |---- 5 -----|                  |--- 6 ----|
RETURN path, sum(task.time) AS timeConsumed, length(path)+1 AS taskAmount;
Result
╒══════════════════════════════════════════════════════════════════════╤════════════════╤════════════╕
│"path" │ "timeConsumed" │"taskAmount"│
╞══════════════════════════════════════════════════════════════════════╪════════════════╪════════════╡
│[{"name":"TaskA","time":10},{},{"name":"TaskB","time":20},{"name":"Tas│80 │4 │
│kB","time":20},{},{"name":"TaskE","time":40},{"name":"TaskE","time":40│ │ │
│},{},{"name":"TaskF","time":10}] │ │ │
├──────────────────────────────────────────────────────────────────────┼────────────────┼────────────┤
│[{"name":"TaskA","time":10},{},{"name":"TaskC","time":30},{"name":"Tas│60 │4 │
│kC","time":30},{},{"name":"TaskD","time":10},{"name":"TaskD","time":10│ │ │
│},{},{"name":"TaskF","time":10}] │ │ │
└──────────────────────────────────────────────────────────────────────┴────────────────┴────────────┘
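To cross-check the numbers in the result, the same computation can be sketched outside Neo4j. The Python snippet below is only an illustration and assumes the graph is acyclic; the names TASK_TIME, EDGES, all_paths and path_durations are made up for this sketch and are not part of any Neo4j API.

```python
from typing import Dict, List, Tuple

# Durations copied from the CREATE statement above.
TASK_TIME: Dict[str, int] = {
    "TaskA": 10, "TaskB": 20, "TaskC": 30,
    "TaskD": 10, "TaskE": 40, "TaskF": 10,
}

# Directed CONNECT_TO relationships as an adjacency list.
EDGES: Dict[str, List[str]] = {
    "TaskA": ["TaskB", "TaskC"],
    "TaskB": ["TaskE"],
    "TaskC": ["TaskD"],
    "TaskD": ["TaskF"],
    "TaskE": ["TaskF"],
    "TaskF": [],
}


def all_paths(start: str, end: str) -> List[List[str]]:
    """Enumerate every directed path from start to end (assumes no cycles)."""
    if start == end:
        return [[start]]
    paths = []
    for successor in EDGES.get(start, []):
        for tail in all_paths(successor, end):
            paths.append([start] + tail)
    return paths


def path_durations(start: str, end: str) -> List[Tuple[str, int, int]]:
    """Return (path, summed time, number of tasks) for each path."""
    return [
        ("->".join(path), sum(TASK_TIME[task] for task in path), len(path))
        for path in all_paths(start, end)
    ]


if __name__ == "__main__":
    for path, time_consumed, task_amount in path_durations("TaskA", "TaskF"):
        print(path, time_consumed, task_amount)
```

With the sample data this yields 80 for the branch through TaskB and TaskE and 60 for the branch through TaskC and TaskD, matching the timeConsumed column above.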

Related

How to wire queries in Nebula explorer workflow properly

I am using NebulaGraph Explorer Workflow to create DAG pipelines on NebulaGraph today.
I am creating a DAG with three tasks like this:
┌──────────────────────────────────────────┐
│                                          │
│  MATCH ()-[e]->()                        │
│  WITH e LIMIT 10000                      │
│  WITH e AS e                             │
│  WHERE e.goals > 10                      │
│  AND toFloat(e.goals)/e.caps > 0.2       │
│  RETURN src(e), dst(e)                   │
│          │                               │
└──────────┼───────────────────────────────┘
           │
           ▼
┌──────────────────────────────────┐
│                                  │
│  MATCH (v0)-[:belongto]-(v1)     │
│  WHERE id(v0) == ${src}          │ # here I mapped src(e) from last task into ${src}
│  RETURN id(v0), id(v1)           │
│           │        │             │
└───────────┼────────┼─────────────┘
            │        │
 ┌──────────▼────────▼────────────┐
 │                                │
 │  Betweenness Centrality        │
 │                                │
 │                                │
 └────────────────────────────────┘
However, it failed in the second task:
Process exited with status 1
/root/nebula-analytics-3.3.0-centos7.x86_64/3rd/mpich/bin/mpiexec.hydra -n 1 -hosts 192.168.8.237 /root/nebula-analytics-3.3.0-centos7.x86_64/bin/exec_ngql --ngql=MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku","Neymar","Naïm Sliti","Mehdi Taremi","Mario Götze","Kylian Mbappé","Kasper Dolberg","Jordan Morris","Joel Campbell","Ivan Perišić","Enner Valencia (captain","Christian Eriksen","Alphonso Davies","Ali Assadalla","Akram Afif","İlkay Gündoğan","Youssef En-Nesyri","Serge Gnabry","Sardar Azmoun","Richarlison","Raúl Jiménez","Mohammed Muntari","Almoez Ali","Memphis Depay","Marcus Rashford","Luis Suárez","Lucas Cavallini","Leroy Sané","Karl Toko Ekambi","Hirving Lozano","Haris Seferovic","Hakim Ziyech","Gareth Bale (captain","Gabriel Jesus","Dušan Tadić (captain","Christian Pulisic","Bruno Fernandes","Andrej Kramarić","Aleksandar Mitrović","Aaron Ramsey","Thomas Partey","Robert Lewandowski (captain","Raheem Sterling","Lionel Messi (captain","Kwon Chang-hoon","Junior Hoilett","Hassan Al-Haydos (captain","Cristiano Ronaldo (captain","André Silva","André Ayew (captain","Alireza Jahanbakhsh","Ángel Di María","Wahbi Khazri","Vincent Aboubakar (captain","Thomas Müller","Takumi Minamino","Leon Goretzka","Krzysztof Piątek","Karim Benzema","Jordan Ayew","Harry Kane (captain","Ferran Torres","Cyle Larin","Arkadiusz Milik","Antoine Griezmann","Xherdan Shaqiri","Son Heung-min (captain","Salem Al-Dawsari","Olivier Giroud","Michy Batshuayi","Lautaro Martínez","Kevin De Bruyne","Karim Ansarifard","Jonathan David","Hwang Ui-jo","Eric Maxim Choupo-Moting","Edinson Cavani","Eden Hazard (captain"
RETURN id(v0), id(v1) --threads=1 --datasource_user=root --datasink_hdfs_url=hdfs://192.168.8.168:9000/ll_test/analytics/1999/tasks/query_2/ --datasource_graphd=192.168.8.131:9669 --datasource_space=fifa_2020 --datasource_graphd_timeout=60000
I20230109 02:52:37.758909 2224521 base.hpp:179] thread support level provided by MPI:
I20230109 02:52:37.759177 2224521 base.hpp:182] MPI_THREAD_MULTIPLE
I20230109 02:52:37.759189 2224521 base.hpp:215] threads: 1
I20230109 02:52:37.759192 2224521 base.hpp:216] sockets: 2
I20230109 02:52:37.759194 2224521 base.hpp:217] partitions: 1
I20230109 02:52:37.759402 2224521 license.cc:519] [part-0]Signature validation started
I20230109 02:52:37.759697 2224521 license.cc:547] [part-0]Signature validation succeed
I20230109 02:52:37.759745 2224521 license.cc:717] [part-0][License] Trial license detected, hardware checking is skipped.
I20230109 02:52:37.759830 2224521 license.cc:623] [part-0]The number of cpus of the current machine is 4
I20230109 02:52:37.759858 2224521 license.cc:253] [License] Expiration timestamp in UTC: 4826275199
I20230109 02:52:37.759866 2224521 license.cc:259] [License] Timezone difference: 0 seconds
I20230109 02:52:37.759869 2224521 license.cc:263] [License] Expiration timestamp in local time zone: 4826275199
I20230109 02:52:37.759872 2224521 license.cc:607] [part-0][License] Expiration check passed
I20230109 02:52:37.810142 2224521 exec_ngql.cc:52] stmt:MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku","Neymar","Naïm Sliti","Mehdi Taremi","Mario Götze","Kylian Mbappé","Kasper Dolberg","Jordan Morris","Joel Campbell","Ivan Perišić","Enner Valencia (captain","Christian Eriksen","Alphonso Davies","Ali Assadalla","Akram Afif","İlkay Gündoğan","Youssef En-Nesyri","Serge Gnabry","Sardar Azmoun","Richarlison","Raúl Jiménez","Mohammed Muntari","Almoez Ali","Memphis Depay","Marcus Rashford","Luis Suárez","Lucas Cavallini","Leroy Sané","Karl Toko Ekambi","Hirving Lozano","Haris Seferovic","Hakim Ziyech","Gareth Bale (captain","Gabriel Jesus","Dušan Tadić (captain","Christian Pulisic","Bruno Fernandes","Andrej Kramarić","Aleksandar Mitrović","Aaron Ramsey","Thomas Partey","Robert Lewandowski (captain","Raheem Sterling","Lionel Messi (captain","Kwon Chang-hoon","Junior Hoilett","Hassan Al-Haydos (captain","Cristiano Ronaldo (captain","André Silva","André Ayew (captain","Alireza Jahanbakhsh","Ángel Di María","Wahbi Khazri","Vincent Aboubakar (captain","Thomas Müller","Takumi Minamino","Leon Goretzka","Krzysztof Piątek","Karim Benzema","Jordan Ayew","Harry Kane (captain","Ferran Torres","Cyle Larin","Arkadiusz Milik","Antoine Griezmann","Xherdan Shaqiri","Son Heung-min (captain","Salem Al-Dawsari","Olivier Giroud","Michy Batshuayi","Lautaro Martínez","Kevin De Bruyne","Karim Ansarifard","Jonathan David","Hwang Ui-jo","Eric Maxim Choupo-Moting","Edinson Cavani","Eden Hazard (captain"
RETURN id(v0), id(v1)
I20230109 02:52:37.810698 2224521 exec_ngql.cc:55] session execute failed, statment: MATCH (v0)-[:belongto]-(v1)
WHERE id(v0) == "Álvaro Morata","Romelu Lukaku","Neymar","Naïm Sliti","Mehdi Taremi","Mario Götze","Kylian Mbappé","Kasper Dolberg","Jordan Morris","Joel Campbell","Ivan Perišić","Enner Valencia (captain","Christian Eriksen","Alphonso Davies","Ali Assadalla","Akram Afif","İlkay Gündoğan","Youssef En-Nesyri","Serge Gnabry","Sardar Azmoun","Richarlison","Raúl Jiménez","Mohammed Muntari","Almoez Ali","Memphis Depay","Marcus Rashford","Luis Suárez","Lucas Cavallini","Leroy Sané","Karl Toko Ekambi","Hirving Lozano","Haris Seferovic","Hakim Ziyech","Gareth Bale (captain","Gabriel Jesus","Dušan Tadić (captain","Christian Pulisic","Bruno Fernandes","Andrej Kramarić","Aleksandar Mitrović","Aaron Ramsey","Thomas Partey","Robert Lewandowski (captain","Raheem Sterling","Lionel Messi (captain","Kwon Chang-hoon","Junior Hoilett","Hassan Al-Haydos (captain","Cristiano Ronaldo (captain","André Silva","André Ayew (captain","Alireza Jahanbakhsh","Ángel Di María","Wahbi Khazri","Vincent Aboubakar (captain","Thomas Müller","Takumi Minamino","Leon Goretzka","Krzysztof Piątek","Karim Benzema","Jordan Ayew","Harry Kane (captain","Ferran Torres","Cyle Larin","Arkadiusz Milik","Antoine Griezmann","Xherdan Shaqiri","Son Heung-min (captain","Salem Al-Dawsari","Olivier Giroud","Michy Batshuayi","Lautaro Martínez","Kevin De Bruyne","Karim Ansarifard","Jonathan David","Hwang Ui-jo","Eric Maxim Choupo-Moting","Edinson Cavani","Eden Hazard (captain"
RETURN id(v0), id(v1)
errorCode: -1004, errorMsg: SyntaxError: syntax error near `E id(v0)'
I am following chapter 4 of the docs, but something still seems to go wrong.
I tried to use the GO clause instead of MATCH in the second task, but it complained with similar errors.
Could anyone help me find out where I went wrong?
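Judging only from the log above, ${src} appears to be expanded into a comma-separated list of quoted vertex IDs, so the statement ends up comparing id(v0) with a bare list of strings. The Python sketch below imitates that textual substitution; render is a hypothetical helper written for this illustration, not a NebulaGraph or Explorer API. Wrapping the placeholder in a list and using IN is one possible rewrite, an assumption that the source does not confirm for the workflow.

```python
from typing import List


def render(template: str, values: List[str]) -> str:
    """Hypothetical stand-in for the substitution seen in the log:
    replace ${src} with the quoted values joined by commas."""
    joined = ",".join('"{}"'.format(value) for value in values)
    return template.replace("${src}", joined)


ids = ["Neymar", "Richarlison"]

# The original template: after substitution, `==` is followed by a bare
# comma-separated list, which matches the shape of the failing statement.
broken = render(
    "MATCH (v0)-[:belongto]-(v1) WHERE id(v0) == ${src} RETURN id(v0), id(v1)",
    ids,
)

# Assumed alternative: put the expansion inside a list literal and use IN.
candidate = render(
    "MATCH (v0)-[:belongto]-(v1) WHERE id(v0) IN [${src}] RETURN id(v0), id(v1)",
    ids,
)

if __name__ == "__main__":
    print(broken)
    print(candidate)
```

The first rendered statement has the same shape as the one rejected with the SyntaxError; the second is syntactically a list membership test.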

ArgumentError: no default `Tables.columns` implementation for type: XLSX.XLSXFile

I have a very simple Excel file (.xlsx) that contains just 2 columns and 2 rows with values.
But when I run the code below, I get ArgumentError: no default `Tables.columns` implementation for type: XLSX.XLSXFile, for both readxlsx and openxlsx. Why is that? Is it something with DataFrames or with the XLSX package?
The excel file looks like this
| text | text |
------------------
| 0 | 1 |
------------------
using DataFrames
using XLSX
df = XLSX.readxlsx("Test1.xlsx")
Following a suggested solution, I am now running this code instead:
using DataFrames
using XLSX
df = DataFrame(XLSX.readtable("Test1.xlsx", "Blad1"))
but that gives the following error: ArgumentError: 'Tuple{Vector{Any}, Vector{Symbol}}' iterates 'Vector{Any}' values, which doesn't satisfy the Tables.jl `AbstractRow` interface

The tutorial includes a relevant example with readtable:
julia> using DataFrames, XLSX
julia> df = DataFrame(XLSX.readtable("myfile.xlsx", "mysheet"))
3×2 DataFrames.DataFrame
│ Row │ HeaderA │ HeaderB │
├─────┼─────────┼──────────┤
│ 1 │ 1 │ "first" │
│ 2 │ 2 │ "second" │
│ 3 │ 3 │ "third" │
I've tried to replicate your setup as closely as possible with the information you've given. However, I can't replicate your error with readtable:
$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.1 (2022-09-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
(#v1.8) pkg> activate --temp
Activating new project
julia> using DataFrames, XLSX
julia> df = DataFrame(XLSX.readtable("Test1.xlsx", "Blad1"))
1×2 DataFrame
Row │ Zero One
│ Any Any
─────┼───────────
1 │ 0 1
This makes me suspect that the problem is elsewhere. Could you please try running everything in a temporary project, as I've done here? I want to be sure that another package is not interfering.

Kusto query could not be parsed

I have SecurityLog with fields like DstIP_s and want to display records matching my trojanDst table
let trojanDst = datatable (DstIP_s:string)
[ "1.1.1.1","2.2.2.2","3.3.3.3"
];
SecurityLog |
| join trojanDst on DstIP_s
I am getting a "query could not be parsed" error. Why?

The query you posted has a redundant pipe (|) before the join.
From an efficiency standpoint, make sure the left side of the join is the smaller one, as suggested here: https://learn.microsoft.com/en-us/azure/kusto/query/best-practices#join-operator

This is too long for a comment. As @Yoni L pointed out, the problem is the doubled pipe operator.
For anyone with an SQL background, join may be a bit counterintuitive (by default it is kind=innerunique):
JOIN operator:
kind unspecified, kind=innerunique
Only one row from the left side is matched for each value of the on
key. The output contains a row for each match of this row with rows
from the right.
Kind=inner
There's a row in the output for every combination of matching rows
from left and right.
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ a │ 1 │ c │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
SQL style JOIN version:
let t1 = datatable(key:long, value:string)
[
1, "a",
1, "b"
];
let t2 = datatable(key:long, value:string)
[
1, "c",
1, "d"
];
t1| join kind=inner t2 on key;
Output:
┌─────┬───────┬──────┬────────┐
│ key │ value │ key1 │ value1 │
├─────┼───────┼──────┼────────┤
│ 1 │ b │ 1 │ c │
│ 1 │ a │ 1 │ c │
│ 1 │ b │ 1 │ d │
│ 1 │ a │ 1 │ d │
└─────┴───────┴──────┴────────┘
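The difference between the two kinds can also be imitated in a few lines of Python. This is only a sketch of the semantics quoted above, not how Kusto implements joins: innerunique is modelled as "deduplicate the left side on the key, then inner join". Kusto may keep any one of the left rows per key, whereas this sketch keeps the first one, and the row order of the real output may differ.

```python
from typing import List, Tuple

Row = Tuple[int, str]                 # (key, value)
Joined = Tuple[int, str, int, str]    # (key, value, key1, value1)


def inner_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """One output row for every combination of matching left and right rows."""
    return [
        (lkey, lvalue, rkey, rvalue)
        for lkey, lvalue in left
        for rkey, rvalue in right
        if lkey == rkey
    ]


def innerunique_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Keep one left row per key (here: the first seen), then inner join."""
    seen = set()
    deduplicated = []
    for key, value in left:
        if key not in seen:
            seen.add(key)
            deduplicated.append((key, value))
    return inner_join(deduplicated, right)


t1 = [(1, "a"), (1, "b")]
t2 = [(1, "c"), (1, "d")]

if __name__ == "__main__":
    print(innerunique_join(t1, t2))  # two rows, both built from left row "a"
    print(inner_join(t1, t2))        # four rows, every left/right combination
```

For the sample tables, the default kind produces two rows and kind=inner produces four, which is the difference visible in the two outputs above.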

There are many join kinds in KQL, such as innerunique, inner, leftouter, rightouter, fullouter, anti and more. The full list is given in the documentation of the join operator.

neo4j mean of property for all friends

Having a graph like:
CREATE (Alice:Person {id:'a', fraud:1})
CREATE (Bob:Person {id:'b', fraud:0})
CREATE (Charlie:Person {id:'c', fraud:0})
CREATE (David:Person {id:'d', fraud:0})
CREATE (Esther:Person {id:'e', fraud:0})
CREATE (Fanny:Person {id:'f', fraud:0})
CREATE (Gabby:Person {id:'g', fraud:0})
CREATE (Fraudster:Person {id:'h', fraud:1})
CREATE
(Alice)-[:CALL]->(Bob),
(Bob)-[:SMS]->(Charlie),
(Charlie)-[:SMS]->(Bob),
(Fanny)-[:SMS]->(Charlie),
(Esther)-[:SMS]->(Fanny),
(Esther)-[:CALL]->(David),
(David)-[:CALL]->(Alice),
(David)-[:SMS]->(Esther),
(Alice)-[:CALL]->(Esther),
(Alice)-[:CALL]->(Fanny),
(Fanny)-[:CALL]->(Fraudster)
I tried a query like this, which does not work:
MATCH (a)-->(b)
WHERE b.fraud = 1
RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)
I want to compute the fraudulence of a user, which (as fraud is only ever 0 or 1) is defined as the mean fraud level of all connected nodes:
MATCH ()--(f)
RETURN f.id, f.fraud, COUNT(*), COLLECT(f) AS fs
returns the correct number of friends but cannot access them, i.e. the COLLECT statement only accesses the node itself:
╒══════╤═════════╤══════════════╤══════════╤══════════════════════════════════════════════════════════════════════╕
│"f.id"│"f.fraud"│"avg(f.fraud)"│"COUNT(*)"│"fs" │
╞══════╪═════════╪══════════════╪══════════╪══════════════════════════════════════════════════════════════════════╡
│"h" │1 │1 │1 │[{"fraud":1,"id":"h"}] │
├──────┼─────────┼──────────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│"f" │0 │0 │4 │[{"fraud":0,"id":"f"},{"fraud":0,"id":"f"},{"fraud":0,"id":"f"},{"frau│
│ │ │ │ │d":0,"id":"f"}] │
....
I.e. naively calculating the average
MATCH ()--(f)
RETURN f.id, avg(f.fraud)
will only consider this single node and not the network. How can I consider the social network of a node instead (up to a defined depth, here 1), to improve the original answer of "neo4j percentage of attribute for social network"?
Edit
MATCH p = ()--()
UNWIND nodes(p) AS f
RETURN f.id, f.fraud, COUNT(*), COLLECT({id: f.id, fraud: f.fraud}) AS fs
will return only duplicates of the original node in the list and not the connected nodes:
│"f.id"│"f.fraud"│"COUNT(*)"│"fs" │
╞══════╪═════════╪══════════╪══════════════════════════════════════════════════════════════════════╡
│"h" │1 │2 │[{"id":"h","fraud":1},{"id":"h","fraud":1}] │
├──────┼─────────┼──────────┼──────────────────────────────────────────────────────────────────────┤
│"f" │0 │8 │[{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":│
│ │ │ │"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fraud":0},{"id":"f","fr│
│ │ │ │aud":0},{"id":"f","fraud":0}] │
Edit 2
MATCH p = (source)--(destination)
RETURN source.id, source.fraud, COUNT(*), COLLECT({id: destination.id, fraud: destination.fraud}) AS neighbors
is already pretty close, but it lacks the avg function.

MATCH p = (source)-[*..3]-(destination)
RETURN source.id, source.fraud, COUNT(*), avg(destination.fraud), COLLECT({id: destination.id, fraud: destination.fraud}) AS neighbors
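includes the fraudulence, defined as the average over all nodes reachable within one to three hops ([*..3]). For a depth of 1, the plain pattern (source)--(destination) is sufficient.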
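For the depth-1 case asked about in the question, the expected values can be checked by hand with a small Python sketch. It is only an illustration: FRAUD, RELATIONSHIPS and neighbor_fraud_mean are names invented here, relationships are treated as undirected (as in the (source)--(destination) pattern), and each relationship contributes one row, so Bob and Charlie, who are linked twice, count twice for each other.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Node properties copied from the CREATE statements in the question.
FRAUD: Dict[str, int] = {
    "a": 1, "b": 0, "c": 0, "d": 0,
    "e": 0, "f": 0, "g": 0, "h": 1,
}

# One tuple per relationship; the relationship type is irrelevant here.
RELATIONSHIPS: List[Tuple[str, str]] = [
    ("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
    ("e", "f"), ("e", "d"), ("d", "a"), ("d", "e"),
    ("a", "e"), ("a", "f"), ("f", "h"),
]


def neighbor_fraud_mean() -> Dict[str, float]:
    """Mean fraud value of the direct neighbours of each node.

    Nodes without any relationship (here "g") do not appear in the
    result, just as they would not match the undirected pattern.
    """
    neighbor_values = defaultdict(list)
    for source, destination in RELATIONSHIPS:
        # Undirected: each end sees the other end as a neighbour.
        neighbor_values[source].append(FRAUD[destination])
        neighbor_values[destination].append(FRAUD[source])
    return {
        node: sum(values) / len(values)
        for node, values in neighbor_values.items()
    }


if __name__ == "__main__":
    for node, mean in sorted(neighbor_fraud_mean().items()):
        print(node, round(mean, 3))
```

Node f, for example, has the four neighbours c, e, a and h (consistent with the COUNT(*) of 4 shown earlier), two of which are fraudulent, so its mean is 0.5.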

Julia DataFrame columns starting with number?

This may be a stupid question, but for the life of me I can't figure out how to get Julia to read a csv file with column names that start with numbers and use them in DataFrames. How does one do this?
For example, say I have the file "test.csv" which contains the following:
,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23
If I just use readtable(), I get this:
julia> using DataFrames
julia> df = readtable("test.csv")
2x4 DataFrames.DataFrame
| Row | x | x1Y | x2Y | x3Y |
|-----|------|-----|-----|-----|
| 1 | "1Y" | 11 | 12 | 13 |
| 2 | "2Y" | 21 | 22 | 23 |
What gives? How can I get the column names to be what they're supposed to be, "1Y", "2Y", etc.?

The problem is that in DataFrames, column names are symbols, which aren't meant to start with a number.
You can see this by doing e.g. typeof(:2), which will return Int64, rather than (as you might expect) Symbol. Thus, to get your column names into a usable format, DataFrames has to prefix them with a letter: typeof(:x2) returns Symbol, so x2 is a valid column name.

Unfortunately, you can't use numbers for starting names in DataFrames.
The code that does the parsing of names makes sure that this restriction stays like this.
I believe this is because of how parsing takes place in julia: :aa names a symbol, while :2aa is a value (makes more sense considering 1:2aa is a range)

You could just use rename!() after the import:
df = csv"""
,1Y,2Y,3Y
1Y,11,12,13
2Y,21,22,23
"""
rename!(df, Dict(:x1Y =>Symbol("1Y"), :x2Y=>Symbol("2Y"), :x3Y=>Symbol("3Y") ))
2×4 DataFrames.DataFrame
│ Row │ x │ 1Y │ 2Y │ 3Y │
├─────┼──────┼────┼────┼────┤
│ 1 │ "1Y" │ 11 │ 12 │ 13 │
│ 2 │ "2Y" │ 21 │ 22 │ 23 │
Still, you may experience problems later in your code, so it is better to avoid column names starting with numbers.
