Use aggregate functions in the WHERE clause (Neo4j)

How do I select all nodes that are connected to node(2) [from] with more than one path?
START from=node(2)
MATCH p=from-->to
where count(p) > 1
return from,to
To the Neo4j team: are there any plans to implement COUNT/HAVING functionality?
Great job so far with the product!

I actually found a solution using the WITH keyword:
START from=node(*)
MATCH p=from-->to
WITH from as from , to as to, count(p) as paths
WHERE paths >1
RETURN to,paths
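The WITH ... WHERE pattern plays the role of SQL's HAVING: aggregate first, then filter on the aggregate. As a rough illustration of that logic outside Cypher, here is a minimal Python sketch. The edge list and the function name multi_path_targets are made up for the example, and it only counts direct relationships, as the one-hop pattern in the query does.

```python
from collections import Counter


def multi_path_targets(edges, min_paths=2):
    """Return {(from, to): count} for pairs joined by at least min_paths direct edges."""
    # Aggregate step: count(p) grouped by (from, to), like the WITH clause.
    counts = Counter(edges)
    # Filter step: keep only groups whose aggregate passes, like WHERE paths > 1.
    return {pair: n for pair, n in counts.items() if n >= min_paths}


# Hypothetical sample data: node 2 has two relationships to node 3 and one to node 4.
edges = [(2, 3), (2, 3), (2, 4), (5, 4)]
print(multi_path_targets(edges))  # {(2, 3): 2}
```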


How to migrate Presto `map` function to hive

Presto's map() function is quite a bit easier to use than Hive's. A Presto map() invocation takes two lists: the first for the keys, the second for the values.
A Hive map() takes a variable-length (varargs) parameter list of alternating keys and values.
Here is a query snippet that I need to migrate (backwards?) from Presto to Hive:
, map(
concat(map_keys(decision_feature_importance), array['id_queue', 'queue_disposition']),
concat(map_values(decision_feature_importance), array[CAST(id_queue AS VARCHAR), queue_disposition])) other_info
The core of it is that Presto's map() accepts two parallel arrays, but Hive objects rather strongly to that. What is the pattern to [reverse-?] migrate the map() call?
There are several questions about zipping lists in Hive, e.g. "hive create map or key/value pair from two arrays". The answers are pretty complicated and may involve UDFs (which I cannot create) or libraries such as brickhouse (which I cannot install on a shared cluster with hundreds of users). They also cover only part of the problem here.
The following toy query shows how to build Hive-format map entries from two parallel lists. Basically we need to zip the lists manually, since Hive has no builtin function for this.
Hive partial equivalent:
with mydata as (
select 1 id, map('key11','val11','key12','val12','key13','val13') as mymap
union all
select 2 id, map('key21','val21','key22','val22','key13','val13') as mymap
)
select split(concat_ws(',',collect_list(concat(key,',',value ))),',') keyval from (
select * from mydata lateral view outer explode (mymap) m
) d;
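To make the target shape concrete: the toy query flattens key/value pairs into one alternating list, which is the argument order Hive's map() expects. The same zip-and-flatten step can be sketched in Python. The function name interleave and the sample values are hypothetical and only illustrate the data shape, not Hive itself.

```python
from itertools import chain


def interleave(keys, values):
    """Zip two parallel lists into one alternating key, value, key, value list."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    # zip pairs the lists up; chain.from_iterable flattens the pairs in order.
    return list(chain.from_iterable(zip(keys, values)))


# Hypothetical sample data mirroring the parallel-array form used by Presto.
keys = ["key11", "key12", "id_queue"]
values = ["val11", "val12", "42"]
print(interleave(keys, values))
# ['key11', 'val11', 'key12', 'val12', 'id_queue', '42']
```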

Gremlin continue traversal only if 2 vertices are not the same

I have a query which looks at 2 different vertices and I want to stop traversing if they don't both roll up to the same root ancestor via a path of "contains" edges.
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).as('node2Root')
//FILTER|WHERE clause
I'd like to confirm that node1Root and node2root are the same vertex before continuing the traversal, but for the life of me I cannot figure out how to do this.
I've tried the following:
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).as('node2Root')
//.where('node1Root', P.eq('node2Root'))
//.where(select("node1Root").is(P.eq("node2Root")))
//.where(select("node1Root").is("node2Root"))
What's interesting is that the following query does work to filter appropriately.
g.V('node1').as('1')
.V('node2').as('2')
.where('1', P.eq('2'))
I'm not sure if there's something up with the until/repeat that screws it up or if I'm just doing something blatantly wrong. Any help would be much appreciated.
Thanks!
I found "How to check equality with nodes from an earlier part of query in Gremlin?",
and it seems that you use "as" with the same key as a previous "as"; if they match, the two are considered equal.
So here's the winner (I think):
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).as('node2Root')
.where(select('node1Root').as('node2Root'))
//.not(select('node1Root').as('node2Root')) //OR this to determine they aren't the same
//continue traversal
I also found that my original issue was that the .until().repeat() steps could return a list. In my case I know that my graph model will always return a single 'root', so to make it work I can use unfold():
g.V('node1')
.until(hasLabel('root')).repeat(in('contains')).unfold().as('node1Root')
.V('node2')
.until(hasLabel('root')).repeat(in('contains')).unfold().as('node2Root')
.where('node1Root', P.eq('node2Root'))
I think I'll be going with the second solution because I'm much more confident in it, unless I hear otherwise.
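For readers who want the intended check spelled out independently of Gremlin, here is a minimal Python sketch of "walk up the contains edges to the root, then compare roots". It assumes every node has at most one container, and it treats a node with no container as the root (a stand-in for hasLabel('root')). The parent_of mapping and the function names are hypothetical.

```python
def find_root(node, parent_of):
    """Follow 'contains' parents upward until a node with no parent is reached."""
    seen = set()
    while node in parent_of:
        if node in seen:
            raise ValueError("cycle detected in contains edges")
        seen.add(node)
        node = parent_of[node]
    return node


def same_root(a, b, parent_of):
    """True when both nodes roll up to the same root ancestor."""
    return find_root(a, parent_of) == find_root(b, parent_of)


# Hypothetical sample data: child -> container.
parent_of = {
    "node1": "folderA",
    "folderA": "root1",
    "node2": "folderB",
    "folderB": "root1",
    "node3": "root2",
}
print(same_root("node1", "node2", parent_of))  # True
print(same_root("node1", "node3", parent_of))  # False
```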
You can try this Gremlin query:
g.V(node1-id)
.map(until(hasLabel('root')).repeat(in().aggregate('x')).cap('x')).as("array")
.V(node2-id)
.until(
as("i").select("array").unfold().as("j")
.where("i", eq("j"))
).repeat(in())
Here we put all the vertices on the path from node1 to the root into an array, and then check whether each vertex visited from node2 exists in that array.
This query only works for a traversal with a single iteration, because the aggregate step collects into a variable that is global to the traversal, so every iteration sees the same array. To fix this, if you are running on the JVM, use lambda/Groovy closures:
g.V(node-start-id-1,node-start-id-2)
.map(
{ x->
var v = x.get()
var g = getGraph().get().traversal();
g.V(v.id()).until(hasLabel('root')).repeat(in().aggregate('x')).cap('x').next()
}
)
.as("array")
.V(node2-id)
.until(
as("i").select("array").unfold().as("j")
.where("i", eq("j"))
).repeat(in())

MDX : get members from a subselect (FILTER BY in MDX+)

I've got the following MDX statement:
WITH
MEMBER [Measures].[ist] AS __get_time_member__
SELECT
// Measures
{[Measures].[ist],[Measures].[soll]} ON 0,
// Rows
FROM [Finance]
FROM ( SELECT [Time].[Time].[month].&[2018-04-01] on 0 from [Finance]
or in MDX+
FILTERBY [Time].[Time].[month].&[2018-04-01]
How can I get in the calculated measure, [ist], the time member defined in the subselect ?
MDX+ has a couple of functions that let you get information from the slicer and the subselect:
ContextMember - works like currentMember, but also takes the slicer and subselect into account
GetFilterInfo(hierarchy) - extracts only from the slicer and subselect
In your case you can use the GetFilterInfo function with the hierarchy you're looking for.
I guess it's just a question of playing around with these functions.
PS: We could easily add GetSlicerInfo and GetSubselectInfo if needed.

Drools, graph traversal, query to find root nodes

I have a Java-side class with essential behaviour like:
declare Datum
description: String
broader: List <Datum>
narrower: List <Datum>
end
I want to write
query rootDatumsFor(Datum datum)
that provides a list of the root datums - that is, work "up" the broader property and return a list of each datum that has an empty broader list.
I am getting totally confused how to write this - mainly because of the negation involved.
I think I want something like
query rootDatumsFor( Datum datum )
not Datum() from $datum.broader
or
rootDatumsFor( $datum.broader )
end
But I am getting confused on both parts. If there are no broader terms, which the not should detect, how do I "return" the current value of $datum? I feel each part wants a $result, and I want to do a $result: $datum, but that isn't valid.
And I'm not certain how to do the recursion. Should I have rootDatumsFor(datum, result) and do it via binding?
I've seen examples that do things like Datum( this == $datum ), but that doesn't seem to be accepted when I try it.
Any assistance, whilst I keep re-reading the documentation to find a little clue how to proceed, would be much appreciated.
To find all Datum facts with an empty broader list, all you have to do is
query rootDatumsFor( Datum $datum )
$datum: Datum( broader.size() == 0 )
end
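The query above matches every Datum whose broader list is empty. The recursion the question describes (start from one datum, work up the broader lists, collect the datums with no broader terms) can be sketched outside Drools as follows. This is plain Python, not DRL; the Datum dataclass mirrors the declared type, and the function name root_datums_for and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Datum:
    description: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)


def root_datums_for(datum):
    """Walk up the broader lists and return each datum with an empty broader list."""
    seen = set()
    roots = []
    stack = [datum]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue  # already visited via another branch
        seen.add(id(current))
        if not current.broader:
            roots.append(current)  # base case: nothing broader, so it is a root
        else:
            stack.extend(current.broader)  # recursive case: keep climbing
    return roots


# Hypothetical sample data: dog -> {mammal, pet} -> animal.
animal = Datum("animal")
mammal = Datum("mammal", broader=[animal])
pet = Datum("pet", broader=[animal])
dog = Datum("dog", broader=[mammal, pet])
print([d.description for d in root_datums_for(dog)])  # ['animal']
```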

Get nth element of a collection in Cypher

Using Cypher 1.8, there are some functions working on collections and returning a single element:
HEAD( expression ):
START a=node(2)
RETURN a.array, head(a.array)
LAST( expression ):
START a=node(2)
RETURN a.array, last(a.array)
However, I could not find a function to return the nth element of a collection. What am I missing?
There's no good way to do that at the moment. Please submit a feature request at https://github.com/neo4j/neo4j
I've seen people do head(tail(tail(tail(coll)))), and while it's probably acceptably fast, it still makes me a little ill to see in a query, especially if you're talking about the 17th element or worse.
Example:
http://console.neo4j.org/r/bbo6o4
Update:
Here's a way to do it using reduce and range. It makes it so you can give a parameter for nth at least, even though it still makes me cringe:
start n=node(*)
with collect(n) as allnodes
return head(reduce(acc=allnodes, x in range(1,3): tail(acc)));
http://console.neo4j.org/r/8erfup
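The reduce trick applies tail once per value in the range and then takes the head of what is left. The same idea in Python, as a minimal sketch: the function name nth_by_tails and the sample list are hypothetical, and slicing with [1:] stands in for Cypher's tail().

```python
from functools import reduce


def nth_by_tails(items, n):
    """Return items[n] by dropping the head n times, then taking the head."""
    # Cypher's range(1, n) is inclusive at both ends, so it yields n values;
    # Python's range(n) also yields n values, giving one tail step per value.
    remaining = reduce(lambda acc, _: acc[1:], range(n), items)
    return remaining[0]


# Hypothetical sample data.
allnodes = ["a", "b", "c", "d", "e"]
print(nth_by_tails(allnodes, 3))  # d
```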
Update 2 (8/31/2013):
The new collection syntax is now merged into 2.0 and will theoretically be a part of M05! So, you'll be able to do:
start n=node(*)
with collect(n) as allnodes
return allnodes[3]; // or slices, like [1..3]
I'll add a link to the snapshot documentation when it gets updated.
I've just come across this old question, and for the benefit of anyone else recently coming across it... it seems the list support has improved.
From the Cypher 4 list docs:
Cypher has comprehensive support for lists.
^ Sidenote: I think that's a list comprehensions pun? ;-)
They go on to give an example showing how you'd access the n'th element of a list:
To access individual elements in the list, we use the square brackets again. This will extract from the start index and up to but not including the end index.
... we’ll use the range function. It gives you a list containing all numbers between given start and end numbers. Range is inclusive in both ends.
RETURN range(0, 10)[3]
^ returns "3"
As of APOC Procedures 3.3.0.2, you can use aggregation functions.
This way, you can do things like:
create (:Node {node_id : 1}),
(:Node {node_id : 2}),
(:Node {node_id : 3});
match(n:Node)
with n order by n.node_id
// returns {"node_id":2}
return apoc.agg.nth(n, 1);
or:
match(n:Node)
with n order by n.node_id
// returns {"node_id":1}
// you can also use apoc.agg.last
return apoc.agg.first(n);
To work with lists, UNWIND the list first:
with ['first', 'second', 'third'] as list
unwind list as value
// returns 'second'
return apoc.agg.nth(value, 1);
