Union and intersection operations on collections in Cypher (Neo4j) - collections

I need to calculate both the union and the intersection of a set of arrays/collections in Cypher. Let's say, for instance, I have the topics of interest of a number of individuals saved as array properties on each individual node, and I need to know (1) the topics that every member of a given group finds interesting; but I also need to know (2) the topics that may attract the attention of any of the group members.
So, take the following individuals as the members of a group of two:
CREATE ({name: 'bill', interests: ["biking", "hiking", "fishing", "swimming"]})
CREATE ({name: 'joe', interests: ["swimming", "hiking", "biking", "tennis"]})
Inspired by this great answer I have written the following scripts to get what I need:
Intersection (n.interests ∩ m.interests)
MATCH (n {name:'bill'}), (m {name:'joe'})
RETURN FILTER(x IN n.interests WHERE x IN m.interests)
Response: biking, hiking, swimming
Union (n.interests ∪ m.interests)
MATCH (n {name:'bill'}), (m {name:'joe'})
RETURN FILTER(x IN n.interests WHERE x IN m.interests)+
FILTER(x IN n.interests WHERE NOT(x IN m.interests))+
FILTER(x IN m.interests WHERE NOT(x IN n.interests))
Response: biking, hiking, swimming, fishing, tennis
Both work pretty well for groups of two. The problem is that the union script is not generalizable and needs to be expanded further for each additional group member. This is because, instead of doing a straightforward n.interests ∪ m.interests, I am going the long way round by producing (n.interests ∩ m.interests) ∪ (n.interests - m.interests) ∪ (m.interests - n.interests), which equals n.interests ∪ m.interests but necessitates pairwise comparison of all individuals in the group.
Hence my question: Is there any better way in Cypher to produce the union of two collections/arrays, without redundant results in the response collection?
P.S. As you may have noticed these interests don't really have an ordering, so I am actually treating Neo4j collections as sets.
P.S.2 It is possible that I am misunderstanding and incorrectly conflating the notions of collection and array in Cypher, in which case please don't hesitate to point out what the mistake is.

APOC Procedures has union and intersection functions, which should be exactly what you need.
MATCH (n {name:'bill'}), (m {name:'joe'})
RETURN apoc.coll.union(n.interests, m.interests) as interests_union,
apoc.coll.intersection(n.interests, m.interests) as interests_intersection
The above is usable with Neo4j 3.1 and up (which supports user-defined functions). In Neo4j 3.0, these are procedures instead, and you'll need to CALL them as procedures.
This is also easily applied to multiple collections instead of just two. If you collect() the interest lists, you can run REDUCE() on the resulting list of lists to apply the union or intersection across all of them.
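For example, something along these lines should work (a rough sketch, assuming APOC is installed and that members are matched via a name property):
MATCH (p) WHERE p.name IN ['bill', 'joe']
WITH collect(p.interests) AS lists
// fold the APOC set operation over the collected list of interest lists
RETURN reduce(acc = head(lists), l IN tail(lists) | apoc.coll.union(acc, l)) AS interests_union,
       reduce(acc = head(lists), l IN tail(lists) | apoc.coll.intersection(acc, l)) AS interests_intersection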

I recently solved the same problem by first taking the duplicated union, and then deduplicating using distinct
MATCH (n {name:'bill'}), (m {name:'joe'})
UNWIND n.interests + m.interests AS interests
RETURN COLLECT(distinct interests) AS interests_union
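The same idea generalizes to any number of members. Here is a rough sketch of my own (not part of the original answer), assuming each member's interests list is free of duplicates; counting how many members mention a topic also yields the intersection in the same pass:
MATCH (p) WHERE p.name IN ['bill', 'joe']
WITH count(p) AS members, collect(p.interests) AS lists
UNWIND lists AS list
UNWIND list AS topic
WITH members, topic, count(*) AS cnt
// collect() skips NULLs, so the CASE keeps only topics shared by every member
RETURN collect(topic) AS interests_union,
       collect(CASE WHEN cnt = members THEN topic END) AS interests_intersection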

I think you can generalize it using reduce to produce your collections.
And probably use one of the quantifier predicates (ANY, ALL, SINGLE, NONE).
Something like this for intersection:
WITH [1,2,3] AS a, [3,4,5] AS b, [2,3,4] AS c
RETURN REDUCE(res = [], x IN a | CASE WHEN x IN b AND x IN c THEN res + [x] ELSE res END) AS intersection
WITH [1,2,3] AS a, [3,4,5] AS b, [2,3,4] AS c
RETURN REDUCE(res = [], x IN a | CASE WHEN ALL(coll IN [b,c] WHERE x IN coll) THEN res + [x] ELSE res END) AS intersection
But all of these operations won't have really good runtime characteristics.

Related

Finding all topological orders

I have to make an algorithm that finds all the topological orders (using predecessor counting) and the highest-cost paths and their costs between 2 pairs of vertices. My algorithm looks like this for now:
def topologicalSort(self):
    sorted = []
    count = {}
    q = deque()
    for x in self.parseX():
        count[x] = self.innerDegree(x)
        if count[x] == 0:
            q.append(x)
    while len(q) > 0:
        x = q.popleft()
        sorted.append(x)
        for y in self.parseNout(x):
            count[y] -= 1
            if count[y] == 0:
                q.append(y)
    return sorted
It works fine, but the problem is that it will find only one topological order. And my question would be: How can I make it find all the topological orders?
Your loops run in a fixed order. Different topological sorts are achieved by iterating over the vertices in different orders, so you need another level of recursion that tries each currently available vertex as the next one in the order.
I'd elaborate, but a cursory search found several pages apparently describing the algorithm you want (albeit in different languages):
https://www.geeksforgeeks.org/all-topological-sorts-of-a-directed-acyclic-graph/
https://www.techiedelight.com/find-all-possible-topological-orderings-of-dag/
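For reference, a minimal sketch of that recursive approach (my own, not taken from the pages above), written against a plain adjacency dict rather than the asker's graph class:
def all_topological_sorts(graph):
    # graph: dict mapping each vertex to the list of its successors
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    order, results = [], []
    def backtrack():
        found = False
        for v in graph:
            # try every vertex whose predecessors have all been placed already
            if indegree[v] == 0 and v not in order:
                for w in graph[v]:
                    indegree[w] -= 1
                order.append(v)
                backtrack()
                order.pop()
                for w in graph[v]:
                    indegree[w] += 1
                found = True
        if not found and len(order) == len(graph):
            results.append(list(order))
    backtrack()
    return results

# example: the diamond a -> {b, c} -> d has exactly two topological orders
print(all_topological_sorts({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))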

Isabelle function to find the longest sequence of members of a relation

I have a relation R :: w => w => bool that is both transitive and irreflexive.
I have the axiom Ax1: "finite {x::w. True}". Therefore, for each x there is always a longest sequence wn R ... R w2 R w1 R x.
I need a function F :: w => nat that, for a given x, gives back the "length" of this sequence (or 0 if there is no y such that xRy). How would I go about building one in Isabelle?
Also: Is Ax1 a good way to axiomatize the "finiteness of type w" or is there a better one?
First of all, a more idiomatic way of writing {x::w. True} is UNIV :: w set. I suggest writing finite (UNIV :: w set), or possibly using the finite type class, although that might make your theorem more difficult to apply because you need a finite instance for your type. I think it's not really necessary or helpful for your use case.
I then suggest the following approach:
Define an inductive predicate (using inductive) on lists of type w list stating that the first element is x and that for each two successive list elements y and z, R y z holds, i.e. the list is an ascending chain w.r.t. R (see the sketch after this list).
Show that any list that is such a chain must have distinct elements (cf. distinct :: 'a list ⇒ bool).
Show that there are finitely many distinct lists over a finite set.
Use the Max operator to find the biggest n such that there exists a list of length n that is an ascending chain w.r.t. R. That this works should be easy since there is at least one such chain, and you've already shown that there are only finitely many chains.
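A rough sketch of the first step (my own, not from the answer above; the locale is there only to fix R so the snippet is self-contained, and the names are illustrative):
locale rel_chain =
  fixes R :: "'w ⇒ 'w ⇒ bool"
begin

(* xs is an ascending chain w.r.t. R; "a chain starting at x" is then
   a list xs with  chain xs  and  hd xs = x *)
inductive chain :: "'w list ⇒ bool" where
  single: "chain [x]"
| step:   "R x y ⟹ chain (y # ys) ⟹ chain (x # y # ys)"

end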

Difference in implementation of gcd between logic and functional programming

I'm currently learning programming language concepts and pragmatics, and hence I feel I need help in differentiating two sub-branches of the declarative language family.
Consider the following code snippets which are written in Scheme and Prolog, respectively:
;Scheme
(define gcd
  (lambda (a b)
    (cond ((= a b) a)
          ((> a b) (gcd (- a b) b))
          (else (gcd (- b a) a)))))
%Prolog
gcd(A, B, G) :- A = B, G = A.
gcd(A, B, G) :- A > B, C is A-B, gcd(C, B, G).
gcd(A, B, G) :- B > A, C is B-A, gcd(C, A, G).
The thing that I didn't understand is:
How do these two different programming languages behave differently?
Where do we make the distinction so that they are categorized as either functional or logic-based programming languages?
As far as I'm concerned, they do exactly the same thing: they call recursive functions until the computation terminates.
Since you are using very low-level predicates in your logic programming version, you cannot easily see the increased generality that logic programming gives you over functional programming.
Consider this slightly edited version of your code, which uses CLP(FD) constraints for declarative integer arithmetic instead of the low-level arithmetic you are currently using:
:- use_module(library(clpfd)).
gcd(A, A, A).
gcd(A, B, G) :- A #> B, C #= A - B, gcd(C, B, G).
gcd(A, B, G) :- B #> A, C #= B - A, gcd(C, A, G).
Importantly, we can use this as a true relation, which makes sense in all directions.
For example, we can ask:
Are there two integers X and Y such that their GCD is 3?
That is, we can use this relation in the other direction too! Not only can we, given two integers, compute their GCD. No! We can also ask, using the same program:
?- gcd(X, Y, 3).
X = Y, Y = 3 ;
X = 6,
Y = 3 ;
X = 9,
Y = 3 ;
X = 12,
Y = 3 ;
etc.
We can also post even more general queries and still obtain answers:
?- gcd(X, Y, Z).
X = Y, Y = Z ;
Y = Z,
Z#=>X+ -1,
2*Z#=X ;
Y = Z,
_1712+Z#=X,
Z#=>X+ -1,
Z#=>_1712+ -1,
2*Z#=_1712 ;
etc.
That's a true relation, which is more general than a function of two arguments!
See clpfd for more information.
The GCD example only lightly touches on the differences between logic programming and functional programming as they are much closer to each other than to imperative programming. I will concentrate on Prolog and OCaml, but I believe it is quite representative.
Logical Variables and Unification:
Prolog allows us to express partial data structures: e.g. in the term node(24,Left,Right) we don't need to specify what Left and Right stand for; they might be any term. A functional language might insert a lazy function or a thunk which is evaluated later on, but at the creation of the term we need to know what to insert.
Logical variables can also be unified (i.e. made equal). A search function in OCaml might look like:
let rec find v = function
  | [] -> false
  | x::_ when v = x -> true
  | _::xs (* otherwise *) -> find v xs
While the Prolog implementation can use unification instead of v=x:
member_of(X, [X|_]).
member_of(X, [_|Xs]) :-
    member_of(X, Xs).
For the sake of simplicity, the Prolog version has some drawbacks (see below in backtracking).
Backtracking:
Prolog's strength lies in successively instantiating variables which can be easily undone. If you try the above program with variables, Prolog will return you all possible values for them:
?- member_of(X,[1,2,3,1]).
X = 1 ;
X = 2 ;
X = 3 ;
X = 1 ;
false.
This is particularly handy when you need to explore search trees, but it comes at a price. If we do not specify the size of the list, Prolog will successively create all lists fulfilling our property - in this case infinitely many:
?- member_of(X,Xs).
Xs = [X|_3836] ;
Xs = [_3834, X|_3842] ;
Xs = [_3834, _3840, X|_3848] ;
Xs = [_3834, _3840, _3846, X|_3854] ;
Xs = [_3834, _3840, _3846, _3852, X|_3860] ;
Xs = [_3834, _3840, _3846, _3852, _3858, X|_3866] ;
Xs = [_3834, _3840, _3846, _3852, _3858, _3864, X|_3872]
[etc etc etc]
This means that you need to be more careful using Prolog, because termination is harder to control. In particular, the old-style way to do that (the cut operator !) is pretty hard to use correctly, and there's still some discussion about the merits of more recent approaches (deferring goals with e.g. dif, constraint arithmetic, or a reified if). In a functional programming language, backtracking is usually implemented by using a stack or a backtracking state monad.
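A much simpler functional device that covers many of the same use cases is the "list of successes" style, where a function just returns all solutions at once. A small OCaml sketch of my own (not from the original answer), mirroring the member_of/2 queries above:
(* all indices at which v occurs in a list: every "solution" is kept in
   the result instead of being found one at a time via backtracking *)
let find_all v xs =
  let rec go i = function
    | [] -> []
    | x :: rest when x = v -> i :: go (i + 1) rest
    | _ :: rest -> go (i + 1) rest
  in
  go 0 xs

(* find_all 1 [1; 2; 3; 1]  =  [0; 3] *)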
Invertible Programs:
Perhaps one more appetizer for using Prolog: functional programming has a direction of evaluation. We can use the find function only to check if some v is a member of a list, but we can not ask which lists fulfill this. In Prolog, this is possible:
?- Xs = [A,B,C], member_of(1,Xs).
Xs = [1, B, C],
A = 1 ;
Xs = [A, 1, C],
B = 1 ;
Xs = [A, B, 1],
C = 1 ;
false.
These are exactly the lists with three elements which contain (at least) one element 1. Unfortunately, the standard arithmetic predicates are not invertible; this, together with the fact that the GCD of two numbers is always unique, is the reason why you could not find much of a difference between functional and logic programming here.
To summarize: logic programming has logical variables, which allow for easier pattern matching, invertibility, and exploring multiple solutions of the search tree. This comes at the cost of more complicated control flow. Depending on the problem, it is easier either to have a backtracking execution that is sometimes restricted, or to add backtracking to a functional language.
The difference is not very clear from one example. Programming languages are categorized as logic, functional, ... based on certain characteristics that they support, and as a result they are designed to be easier to use for programmers in each field (logic, functional, ...). As an example, imperative programming languages (like C) are very different from object-oriented ones (like Java, C++), and there the differences are more obvious.
More specifically, regarding your question, the Prolog programming language has adopted the philosophy of logic programming, and this is obvious to someone who knows a little bit about mathematical logic. Prolog has predicates (rather than functions - basically almost the same thing) which return true or false based on the "world" we have defined: for example, what facts and clauses we have already defined, what mathematical facts hold, and more. All these things are inherited from mathematical logic (propositional and first-order logic). So we could say that Prolog is used as a model of logic, which makes logical problems (like games, puzzles, ...) easier to solve. Moreover, Prolog has some features that general-purpose languages have. For example, you could write a program to calculate the gcd, as in your example:
gcd(A, B, G) :- A = B, G = A.
gcd(A, B, G) :- A > B, C is A-B, gcd(C, B, G).
gcd(A, B, G) :- B > A, C is B-A, gcd(C, A, G).
In your program you use a predicate gcd which returns TRUE if G unifies with the GCD of A and B, and you use multiple clauses to match all cases. When you query gcd(2,5,1). it will return True (note that in other languages like Scheme you can't give the result as a parameter), while if you query gcd(2,5,G). it unifies G with the gcd of A and B and returns 1; it is like asking Prolog what G should be in order for gcd(2,5,G). to be true. So you can understand that it is all about when the predicate succeeds, and for that reason you can have more than one solution, while in functional programming languages you can't.
Functional languages are based on functions, so they always return the SAME TYPE of result. This does not always hold in Prolog: you could have a predicate predicate_example(Number,List). and query predicate_example(5,List)., which returns List = ... (a list), and also query predicate_example(Number,[1,2,3])., which returns Number = ... (a number).
The result of a function should be unique: in mathematics, a function is a relation between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.
For a function, it should also be clear which parameter is the variable that will be returned: for example, the gcd function is of type N * N -> N, so it takes parameters A, B which belong to N (the natural numbers) and returns their gcd. But Prolog (with some changes to your program) could also return the parameter A, so querying gcd(A,5,1). would give the possible values of A for which the predicate gcd succeeds (A = 1, 2, 3, 4, ...).
In order to find the gcd, Prolog tries every possible way with choice points, so at every step it will try all three of your clauses and will find every possible solution. Functional programming languages, on the other hand, like functions, have uniquely and well-defined steps to find the solution.
So you can understand that the difference between functional and logic languages may not always be visible, but they are based on different philosophies and ways of thinking.
Imagine how hard it would be to solve tic-tac-toe, the N-queens problem, or the man-goat-wolf-cabbage problem in Scheme.

How to implement a binary heap using list in OCaml?

I am implementing a binary heap using a list in OCaml, just to sharpen my OCaml skills.
I find it very difficult using lists, and after struggling for 2 days I have come here for suggestions and hints.
Here are my thoughts so far:
Obviously, I can't use the original array-based algorithm to implement it using a list.
What I am trying to utilise is a binary tree. I keep the invariant that a node should be bigger than any node below it.
I roughly figured out how to implement insert, although I am not sure whether it is correct or not.
In the binary tree, each node has two children, a value, and a size n, which is the total number of offspring it has. This n is used to balance the tree.
When inserting x, I compare it with a node (starting from the root, recursively). Assume x < the value of the node; then:
If one or both of the node's children are Leaf, then I insert x into that Leaf position.
If neither of the node's children is a Leaf, then I choose the child whose n is smaller and recursively insert into it.
Here is my code
type 'a heap =
  | Node of 'a * 'a heap * 'a heap * int
  | Leaf
exception EmptyHeapException
let create_heap () = Leaf;;
let rec insert x = function
  | Leaf -> Node (x, Leaf, Leaf, 0)
  | Node (v, l, r, n) ->
    let (stay, move) = if x > v then (x, v) else (v, x)
    in
    match (l, r) with
    | (Leaf, Leaf) ->
      Node (stay, Node (move, Leaf, Leaf, 0), Leaf, 1)
    | (Leaf, _) ->
      Node (stay, Node (move, Leaf, Leaf, 0), r, n+1)
    | (_, Leaf) ->
      Node (stay, l, Node (move, Leaf, Leaf, 0), n+1)
    | (Node (_, _, _, n1), Node (_, _, _, n2)) ->
      if n1 <= n2 then
        Node (stay, (insert move l), r, n1+1)
      else
        Node (stay, l, (insert move r), n2+1);;
OK, I have the following questions.
Am I heading in the correct direction? Is my thought process or implementation correct?
I am stuck implementing the get_top function. I don't know how to continue. Any hints?
OCaml Batteries implements an efficient batHeap.ml. I have had a look, but its approach is totally different from mine and I can't understand it. Can anyone help me understand it?
This insertion code looks pretty nice to me. (I was confused by the counts for a while, but now I see they're counting the number of offspring.)
The function to remove the largest element (the root) is basically a deletion, which is always the most difficult. In essence you need to merge two trees while maintaining your invariant. I don't have time right now to work through it in detail, but I think it will turn out to be possible.
If you look in Okasaki (which you can do if you get stuck!) you'll see his trees have an extra invariant that makes it easier to do these operations. I'm pretty sure it's not something I would come up with right away. His implementation is based on an operation that merges two trees. It's used for insertion and deletion.
At a quick glance the Batteries heap code is based on "binomial trees", which are in fact a lot more complicated. They're explained in Okasaki also.
Update
Okasaki's book Purely Functional Data Structures is an elaboration of his PhD thesis. It appears that priority queues appear only in the book--sorry. If you're really interested in FP and not too strapped for cash the book is really worth owning.
As I said, your insert code looks great to me. It seems to me you actually have two invariants:
The value in a node is greater than or equal to the values at the roots of its subtrees (ordering invariant).
The populations of the subtrees of a node differ by at most 1 (balance invariant).
As I said, I don't have time to verify in detail, but it looks to me like your insert code maintains the invariants and thus is O(log n).
The usefulness of this structure depends on your being able to delete the root in O(log n) while maintaining these two invariants.
The sketch of delete would be something like this:
let pop = function Leaf -> 0 | Node (_, _, _, p) -> p
let rec merge a b =
  (* populations of a, b differ by at most one. pop a >= pop b *)
  match a, b with
  | Leaf, Leaf -> Leaf
  | Leaf, _ -> b
  | _, Leaf -> a
  | Node (av, al, ar, ap), Node (bv, bl, br, bp) ->
    if av >= bv then Node (av, merge al ar, b, ap + bp)
    else Node (bv, merge al ar, insert av (delete_min b), ap + bp)
and delete_min = function
  | Leaf -> Leaf
  | Node (_, Leaf, Leaf, _) -> Leaf
  | Node (_, l, Leaf, _) -> l
  | Node (_, Leaf, r, _) -> r
  | Node (_, l, r, _) ->
    if pop l >= pop r then merge l r else merge r l
I still don't have a lot of time, so this might need some fixing up for correctness or for complexity.
Update
Being a purely cerebral guy, I (truly) never wondered what Chris Okasaki is like in real life. He teaches at West Point, and it's not too difficult to find his personal page there. It might satisfy some of your curiosity.

Are there any algebraic structures used in functional programming other than monoid?

I have recently been getting to know functional programming (in Haskell and Scala). Its capabilities and elegance are quite charming.
But when I met Monads, which make use of an algebraic structure named Monoid, I was surprised and glad to see the theoretical knowledge I have been learning in mathematics being put to use in programming.
This observation brought a question to my mind: Can Groups, Fields or Rings (see Algebraic Structures for others) be used in programming for more abstraction and code-reuse purposes, and to achieve mathematics-like programming?
As far as I know, the language named Fortress (which I would surely prefer over any other language once its compiler is completed) defines these structures in its library code. But the only uses I have seen so far were for numeric types, which we are already familiar with. Could there be any other uses of them?
Best regards,
ciun
You can model many structures. Here's a group:
class Group a where
  mult :: a -> a -> a
  identity :: a
  inverse :: a -> a
instance Group Integer where
  mult = (+)
  identity = 0
  inverse = negate
-- S_3 (group of all bijections of a 3-element set)
data S3 = ABC | ACB | BAC | BCA | CAB | CBA
instance Group S3 where
  mult ABC x = x
  ... -- some boring code
  identity = ABC
  inverse ABC = ABC
  ... -- remaining cases
-- Operations on groups. Dual:
data Dual a = Dual { getDual :: a }
instance Group a => Group (Dual a) where
  mult (Dual x) (Dual y) = Dual (mult y x)
  identity = Dual identity
  inverse (Dual x) = Dual (inverse x)
-- Product:
instance (Group a, Group b) => Group (a,b) where
  mult (x,y) (z,t) = (x `mult` z, y `mult` t)
  identity = (identity, identity)
  inverse (x,y) = (inverse x, inverse y)
Now, you can write mult (Dual CAB, 5) (Dual CBA, 1) and get a result. This will be a computation in group S3* ⨯ Z. You can add other groups, combine them in any possible way and do computations with them.
Similar things can be done with rings, fields, orderings, vector spaces, categories, etc. Haskell's numeric hierarchy is unfortunately badly modeled, but there's the numeric prelude, which attempts to fix that. There is also DoCon, which takes it to the extreme. For a tour of type classes (mainly motivated by category theory), there's the Typeclassopedia, which has a large list of examples and applications.
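For instance, a ring can be layered on top of the Group class above in the same style. This is a rough sketch of my own (the class and method names are illustrative, not a standard library):
-- a ring: an (additive) group with an extra associative multiplication
-- that has a unit and distributes over the group operation
class Group a => Ring a where
  rmult :: a -> a -> a
  runit :: a

instance Ring Integer where
  rmult = (*)
  runit = 1

-- rings combine componentwise, just like groups
instance (Ring a, Ring b) => Ring (a, b) where
  rmult (x, y) (z, t) = (x `rmult` z, y `rmult` t)
  runit = (runit, runit)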
Haskell's Arrows are a generalisation of monads and probably are relevant.
I'd recommend Edward Kmett's very readable blog and related category extras package. Should keep you busy for years.
