I have model implemented in titan graph database with relations presented below:
[A] ---(e1)---> [B] <---(e2)--- [C] ---(e3)---> [D]

Vertex A:  prop:id
Vertex B:  prop:number
Vertex C:  prop:id
Vertex D:  prop:number
Edge e1:   label:e1, prop:prop1
Edge e2:   label:e2
Edge e3:   label:e3
A and C are "main vertices" (for example users), while vertices B and D are "less important vertices" describing some data connected with the users.
The input for the query algorithm is property id of vertex A.
I want to find all vertices D that are connected with A in the manner shown above. What's more, I want to remember the property prop1 of the edge e1 between A and B.
More precisely, I want to efficiently retrieve pairs (prop1, numberD), where prop1 is the property of the edge from A to B (if the edge has this property) and numberD is the number property of D.
I don't know how to efficiently implement this query.
It is easy to retrieve only vertices D (using GremlinPipes):
pipe
.start(startVertex)
.outE("e1")
.inV().hasProperty("number")
.inE("e2")
.outV().hasProperty("id")
.outE("e3")
.inV().hasProperty("number");
But problems occur when I also need to get the edges e1 and match them with the vertices D.
I tried to compute all these steps separately, but it seems to be very inefficient.
Do you have any suggestions how to implement this (maybe using several queries) using gremlin-java or gremlin-groovy?
Thanks!
Take a look at the Pattern Match Pattern described here:
https://github.com/tinkerpop/gremlin/wiki/Pattern-Match-Pattern
t = new Table()
startVertex.outE('e1').as('e')
           .inV().hasProperty('number')
           .inE('e2')
           .outV().hasProperty('id')
           .outE('e3')
           .inV().hasProperty('number').as('d')
           .table(t)
           .iterate()  // drain the pipeline so the table gets filled
This should fill t with rows of the form
[e:e1, d:D]
From each of these rows, you can easily extract the properties you are interested in.
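For example, once the table is filled, you could pull out the (prop1, numberD) pairs like this (a Gremlin-Groovy sketch; the null check handles edges that lack prop1):
pairs = []
t.each { row ->
    def e = row.getColumn('e')          // the A->B edge
    def d = row.getColumn('d')          // the matched vertex D
    def prop1 = e.getProperty('prop1')  // may be null if the edge lacks it
    if (prop1 != null) {
        pairs << [prop1, d.getProperty('number')]
    }
}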
Why can't I use the vertex in the format of tagName.propName in the GET SUBGRAPH statement, like in the GO statement?
(root@nebula) [subgraph]> GET SUBGRAPH 6 STEPS FROM "player101" WHERE $$.player.age>50 YIELD VERTICES AS nodes, EDGES AS relationships;
+---------------------------+---------------+
| nodes | relationships |
+---------------------------+---------------+
| [("player101" :player{})] | [] |
+---------------------------+---------------+
Got 1 rows (time spent 5803/7091 us)
Thu, 20 Oct 2022 03:45:11 UTC
(root@nebula) [subgraph]> GET SUBGRAPH 6 STEPS FROM "player101" WHERE player.age>50 YIELD VERTICES AS nodes, EDGES AS relationships;
[ERROR (-1005)]: EdgeName `player' is nonexistent
(root@nebula) [subgraph]> GO FROM "player100" OVER follow WHERE follow.degree > 90 YIELD dst(edge);
+-----------+
| dst(EDGE) |
+-----------+
+-----------+
In the GET SUBGRAPH statement, you cannot use the tagName.propName format because the WHERE clause of GET SUBGRAPH operates on the vertices and edges within the subgraph, rather than on the starting vertex or edge that you specify in the FROM clause.
Therefore, you should refer to vertex or edge properties in the WHERE clause using a reference symbol such as $$, rather than the bare tagName.propName syntax.
To put it simply, NebulaGraph does not support that usage.
I was asked to find the asymptotic complexity of the given function using a recursion tree, but I'm struggling to find the correct amount of work done at each level.
Let's draw out the first two levels of the recursion tree:
+------------------+
| Input size n |
| Work done: n^2 |
+------------------+
/ \
+--------------------+ +--------------------+
| Input size: 3n/4 | | Input size: n/3 |
| Work done: 9n^2/16 | | Work done: n^2/9 |
+--------------------+ +--------------------+
Once we've done that, let's sum up the work done by each layer. The top layer does n^2 work. The next layer does
(9/16)n^2 + (1/9)n^2 = (97/144)n^2
total work. Notice that the work done by this second level is (97/144)ths of the work done in the level just above it. If you expand out a few more levels of the recursion tree, you'll find that the next level does (97/144)^2 n^2 work, the level below that does (97/144)^3 n^2 work, and, more generally, the work done by level l in the tree is (97/144)^l n^2. (Convince yourself of this - don't just take my word for it!)
From there, you can compute the total amount of work done by the recursion tree by summing up the work done per level across all the levels of the tree. As a hint, you're looking at the sum of a geometric sequence that decays from one term to the next - does this remind you of any of the cases of the Master Theorem?
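If you want to check your final answer: bounding the tree by an infinitely deep one (which only overestimates the total work) turns the sum into a convergent geometric series,

\[
\sum_{l=0}^{\infty} \left(\frac{97}{144}\right)^{l} n^2
  \;=\; \frac{n^2}{1 - \frac{97}{144}}
  \;=\; \frac{144}{47}\, n^2
  \;=\; O(n^2).
\]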
This question concerns Oracle DB; I would like to know any general answers too, but I am disregarding information about Derby/MySQL and other DBs on this subject.
Let's say I have several queries using the following columns on its WHERE clause:
Column | Cardinality | Selectivity
-------+-------------+------------
A      | low         | low
B      | high        | low
C      | low         | low
D      | high        | high
E      | low         | low
F      | low         | low
-- Queries
SELECT * FROM T WHERE A=? AND B=?
SELECT * FROM T WHERE A=? AND B=? AND C=?
SELECT * FROM T WHERE A=? AND C=?
SELECT * FROM T WHERE A=? AND C=? AND D=?
SELECT * FROM T WHERE A=? AND E=? AND F=?
Is there any benefit from pairing these columns (taking into account cardinality mixing) as composite indexes? If so, what is the logic to follow?
I have understood this explanation, but it is for SQL Server and Oracle may behave differently.
Is it worthwhile to do covering indexes instead of individual small composite indexes?
Does the column order of composite indexes matter? I.e.:
-- Regardless the column order on the table creation.
CREATE INDEX NDX_1 ON T (A, C);
-- Versus:
CREATE INDEX NDX_1 ON T (C, A);
Would this index be useful?
CREATE INDEX NDX_2 ON T(E, F); -- (low + low) Ignoring 'A' column.
A few things, and bear in mind these are generalities.
Generally, you can only use the leading parts of an index. So, looking at your examples: if you have an index on (A, B, C) and you have predicates on A and C, then only the leading A column of the index can be used. Now, there are some cases where a non-leading part of an index can be used; you will see this in an execution plan as a SKIP SCAN operation, but it is often sub-optimal. So you may want to have both (A, C) and (C, A).
A covering index can be useful if you are not projecting columns other than those in the index.
Again generally, you do not usually want or need an index if the column has low selectivity. However, it's possible that you have two columns that individually have low selectivity, but have high selectivity when used in combination. (In fact, this is the premise of a bitmap index / star transformation in a dimensional model).
If a multi-column index is useful, you may want to put the column with the lowest selectivity first and enable index compression. Index compression can save a huge amount of space in some cases and has very little CPU overhead.
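For example (a sketch using the table and columns from the question; COMPRESS 1 tells Oracle to prefix-compress the first index column):
-- Low-selectivity column first, so the prefix repeats often,
-- which is what makes prefix compression effective.
CREATE INDEX ndx_ac ON t (a, c) COMPRESS 1;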
Finally, a SQL Monitor report will help you optimize a SQL statement when it comes time to run it.
The minimum number of indexes to optimally handle all 5 cases:
(A, B, C) -- in exactly this order
(A, C, D) -- in exactly this order
(A, E, F) -- in any order
If you add another SELECT, all bets are off.
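Spelled out as DDL (index names are just illustrative):
CREATE INDEX ndx_abc ON t (a, b, c);  -- covers queries 1 and 2
CREATE INDEX ndx_acd ON t (a, c, d);  -- covers queries 3 and 4
CREATE INDEX ndx_aef ON t (a, e, f);  -- covers query 5; these three columns can go in any order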
When to have (A, C) and (C, A)?...
Each handles the case where only the first column is being used.
The former is optimal for WHERE A=1 AND C>5; the latter is not. (Etc) Note: = versus some kind of "range" test matters.
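To illustrate the = versus "range" point (hypothetical predicates):
CREATE INDEX ndx_ac ON t (a, c);  -- optimal for WHERE a = 1 AND c > 5
CREATE INDEX ndx_ca ON t (c, a);  -- optimal for WHERE c = 5 AND a > 1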
When designing indexes for a table, first write out all the queries.
More discussion: Higher cardinality column first in an index when involving a range?
I have a typical friend of friend graph database i.e. a social network database. The requirement is to extract all the nodes as a list in such a way that the least connected nodes appear together in the list and the most connected nodes are placed further apart in the list.
Basically it's asking for a graph to be represented as a list, and I'm not sure if we can really do that. For example, if A is related to B with strength 10, B is related to C with strength 80, and A to C with strength 20,
then how do we place these in a list?
A, B, C - no, because then A is relatively more distant from C than from B, which is not the case.
A, C, B - yes, because A and B are less related than A,C and C,B.
With 3 nodes it's very simple, but with a lot of nodes - is it possible to put them in a list based on relationship strength?
OK, I think this is maybe what you want: an inverse of the shortestPath traversal with weights. If not, tell me what the output should be.
http://console.neo4j.org/r/n8npue
MATCH p=(n)-[*]-(m) // search all paths
WHERE n <> m
AND ALL (x IN nodes(p) WHERE length([x2 IN nodes(p) WHERE x2=x])=1) // this filters simple paths
RETURN [n IN nodes(p)| n.name] AS names, // get the names out
reduce(acc=0, r IN relationships(p)| acc + r.Strength) AS totalStrength // calculate total strength produced by following this path
ORDER BY length(p) DESC , totalStrength ASC // get the max length (hopefully a full traversal), and the minimum strength
LIMIT 1
This is not going to be efficient for a large graph, but I think it's definitely doable; it probably needs the traversal/graphalgo API shortest-path functionality if you need speed on a large graph.
http://imageshack.us/photo/my-images/707/graphpw.png/
I would like to know how I can get the number of leaf nodes from a certain node using a method or something in Neo4j.
Example.
At Node A --> contains 12 leaf nodes
At Node B --> contains 6 leaf nodes
Thanks in advance.
I would model the intermediate relationships as contains and the leaf relationships as leaf, see http://console.neo4j.org/r/ulo3yc
Then, with a setup of
create (f1{name:'folder1'}), ({name:'root'})-[:contains]->(f1)-[:leaf]->(f2{name:'folder2'}), f1-[:leaf]->({name:'folder3'})
you can do something like
start root=node(1)
match root-[:contains*0..]->()-[:leaf]->leaf
return leaf
returning
+-------------------------+
| leaf |
+-------------------------+
| Node[2]{name:"folder2"} |
| Node[3]{name:"folder3"} |
+-------------------------+
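And since you asked for the number of leaf nodes, you can return an aggregate instead of the nodes themselves:
start root=node(1)
match root-[:contains*0..]->()-[:leaf]->leaf
return count(leaf)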