I am trying to implement the FP-Growth (frequent pattern mining) algorithm in Java. I have built the tree, but I am having difficulties with the conditional FP-tree construction; I do not understand what the recursive function should do. Given a list of frequent items (in increasing order of frequency counts), i.e. the header, and a tree (a list of Node class instances), what steps should the function take?
I have a hard time understanding this pseudocode. Are alpha and beta nodes in the tree, and what do the generate and construct functions do? I can do FP-Growth by hand, but I find the implementation extremely confusing. If it helps, I can share my code for FP-tree generation. Thanks in advance.
alpha is the prefix that led to this specific prefix tree.
beta is the new prefix (of the tree to be constructed).
The generate line means something like: add the pattern beta, with support anItem.support, to the result set.
The construct function builds the conditional pattern base from which the new tree is created.
An example of the construct function (working bottom-up) would look something like this:
function construct(Tree, anItem)
    conditional_pattern_base = empty list
    in Tree find all nodes with tag = anItem
    for each node found:
        support = node.support
        conditional_pattern = empty list
        while node.parent != root_node
            conditional_pattern.append(node.parent)
            node = node.parent
        conditional_pattern_base.append( (conditional_pattern, support) )
    return conditional_pattern_base
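Since the question is about a Java implementation, here is a rough Java sketch of that construct step. The Node and PrefixPath classes below are hypothetical stand-ins (your real FP-tree Node class was not shown); adapt the field names to your own tree, and note that the root is assumed to have parent == null.

import java.util.ArrayList;
import java.util.List;

class ConditionalPatternBase {

    // Hypothetical FP-tree node; replace with your own class.
    static class Node {
        String item;   // the "tag"
        int count;     // the support count of this node
        Node parent;   // null for the root

        Node(String item, int count, Node parent) {
            this.item = item;
            this.count = count;
            this.parent = parent;
        }
    }

    // One entry of the conditional pattern base: a prefix path plus its support.
    static class PrefixPath {
        final List<String> items;
        final int support;

        PrefixPath(List<String> items, int support) {
            this.items = items;
            this.support = support;
        }
    }

    static List<PrefixPath> construct(List<Node> treeNodes, String anItem) {
        List<PrefixPath> base = new ArrayList<>();
        for (Node node : treeNodes) {
            if (!anItem.equals(node.item)) continue;  // "find all nodes with tag = anItem"
            int support = node.count;
            List<String> prefix = new ArrayList<>();
            // Walk up towards the root, collecting the items on the way
            // (excluding anItem itself and the root). Items end up in
            // bottom-up order; reverse the list if you prefer root-first.
            Node current = node.parent;
            while (current != null && current.parent != null) {
                prefix.add(current.item);
                current = current.parent;
            }
            if (!prefix.isEmpty()) {
                base.add(new PrefixPath(prefix, support));
            }
        }
        return base;
    }
}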
I am familiar with what the choice operator (?) does: it takes two arguments and non-deterministically yields either of them. We could define it as follows:
a ? _ = a
_ ? b = b
This can be used to introduce non-determinism between two values. However, what I don't understand is why we would want to do that.
What would be an example of a problem that could be solved by using (?)?
One example that is typically used to motivate non-determinism is a function that computes all permutations of a list.
insert e [] = [e]
insert e (x:xs) = (e : x : xs) ? (x : insert e xs)
perm [] = []
perm (x:xs) = insert x (perm xs)
The nice thing here is that you do not need to specify how to enumerate all lists; the search algorithm underlying a logic programming language like Curry (by default, depth-first search) does the job for you. You merely give a specification of what a list in your result should look like.
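For instance, evaluating perm [1,2,3] non-deterministically yields each of the six permutations as a separate solution; with the default depth-first search you would get them roughly in this order (the exact order depends on the search strategy of your Curry system):

perm [1,2,3]
-- yields, one solution at a time:
-- [1,2,3], [2,1,3], [2,3,1], [1,3,2], [3,1,2], [3,2,1]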
Hopefully, you will find some more realistic examples in the following papers.
New Functional Logic Design Patterns
Functional Logic Design Patterns
Edit: As I recently published work on this topic, I want to add the application of probabilistic programming. In that paper we show that using non-determinism to model probabilistic values can have advantages over list-based approaches with respect to pruning of the search space. More precisely, when we perform a query on the probabilistic values, i.e., filter a distribution based on a predicate, the non-determinism behaves less strictly than a list-based approach and can prune the search space.
I have a Neo4j graph with directed cycles. I have had no issue finding all descendants of A assuming I don't care about loops using this Cypher query:
match (n:TEST{name:"A"})-[r:MOVEMENT*]->(m:TEST)
return n,m,last(r).movement_time
The relationships between my nodes have a timestamp property on them, movement_time. I've simulated that in my test data below using numbers imported as floats. I would like to traverse the graph using the timestamp as a constraint: only follow a relationship if its movement_time is greater than the movement_time of the relationship that brought us to the current node.
Here is the CSV sample data:
from,to,movement_time
A,B,0
B,C,1
B,D,1
B,E,1
B,X,2
E,A,3
Z,B,5
C,X,6
X,A,7
D,A,7
Here is what the graph looks like:
I would like to calculate the descendants of every node in the graph and include the timestamp from the last relationship using Cypher; so I'd like my output data to look something like this:
Node:[{Descendant,Movement Time},...]
A:[{B,0},{C,1},{D,1},{E,1},{X,2}]
B:[{C,1},{D,1},{E,1},{X,2},{A,7}]
C:[{X,6},{A,7}]
D:[{A,7}]
E:[{A,3}]
X:[{A,7}]
Z:[{B,5}]
This non-Neo4J implementation looks similar to what I'm trying to do: Cycle enumeration of a directed graph with multi edges
This one is not 100% what you want, but very close:
MATCH (n:TEST)-[r:MOVEMENT*]->(m:TEST)
WITH n, m, r, [x IN range(0,length(r)-2) |
(r[x+1]).movement_time - (r[x]).movement_time] AS deltas
WHERE ALL (x IN deltas WHERE x>0)
RETURN n, collect(m), collect(last(r).movement_time)
ORDER BY n.name
We basically find all paths between any pair of your nodes (beware: cartesian products get very expensive on non-trivial datasets). In the WITH we build a collection deltas that holds the differences between subsequent movement_time properties.
The WHERE applies an ALL predicate to filter out paths having any non-positive delta, i.e., we guarantee strictly increasing values of movement_time along the path.
The RETURN then just assembles the results, not as a map, but as one collection of the reachable nodes and one of the last movement_time value of each path.
The current issue is that we have duplicates since e.g. there are multiple paths from B to A.
As a general note: this problem can be solved much more elegantly and performantly with the Java traversal API (http://neo4j.com/docs/stable/tutorial-traversal.html). There you would use a PathExpander that skips paths with decreasing movement_time early, instead of collecting everything and filtering afterwards (as Cypher does).
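To illustrate, here is a hedged sketch of such a PathExpander against the Neo4j 2.x embedded API; the class name and the reading of the movement_time property are assumptions based on the data model above, not tested code.

import java.util.ArrayList;
import java.util.List;

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.Path;
import org.neo4j.graphdb.PathExpander;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.traversal.BranchState;

public class IncreasingMovementTimeExpander implements PathExpander<Object> {

    private static final RelationshipType MOVEMENT =
            DynamicRelationshipType.withName("MOVEMENT");

    @Override
    public Iterable<Relationship> expand(Path path, BranchState<Object> state) {
        Relationship last = path.lastRelationship();
        // At the start node there is no previous relationship, so expand everything.
        if (last == null) {
            return path.endNode().getRelationships(Direction.OUTGOING, MOVEMENT);
        }
        double lastTime = ((Number) last.getProperty("movement_time")).doubleValue();
        // Only follow relationships with a strictly greater movement_time than the
        // relationship that led us here; everything else is pruned early.
        List<Relationship> next = new ArrayList<>();
        for (Relationship r : path.endNode().getRelationships(Direction.OUTGOING, MOVEMENT)) {
            if (((Number) r.getProperty("movement_time")).doubleValue() > lastTime) {
                next.add(r);
            }
        }
        return next;
    }

    @Override
    public PathExpander<Object> reverse() {
        throw new UnsupportedOperationException();
    }
}

You would then plug this into a traversal, roughly graphDb.traversalDescription().expand(new IncreasingMovementTimeExpander()).traverse(startNode), and read the end node plus the last relationship's movement_time from each returned Path.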
I am new to OCaml, and am trying to figure out a way to replace all nodes in a tree by their depth. I think I'll have to construct a new tree. Can anyone help?
Yes, you have to build a new tree. You should try to define a (recursive) function that takes two parameters, a subtree and the depth of that subtree, and returns the subtree in which all nodes have been replaced by their depth. Then you get what you want by applying this function to the whole tree with depth 0.
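For illustration, a minimal sketch along those lines, assuming a simple binary tree type (your tree type was not shown, so adapt the constructors to your own definition):

type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* Return a new tree in which every node's value is replaced by its depth,
   counting the root as depth 0. *)
let replace_by_depth tree =
  let rec go depth = function
    | Leaf -> Leaf
    | Node (left, _, right) ->
        Node (go (depth + 1) left, depth, go (depth + 1) right)
  in
  go 0 tree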
I am beginning to write graph-related data structures and algorithms in OCaml.
I would like to write them in a functional way, i.e., avoiding arrays, mutable types, etc.
But if I write all of this using lists, will it be efficient or even make sense?
You don't need to use lists for everything. There are many other immutable data structures in the library, or you can define your own. You can think of a graph as a map from nodes to lists of successors. I've written large amounts of graph processing code in OCaml using this representation, and I was always pretty happy with the results.
Update
Here's a sketch of the representation I'm talking about. It assumes you label your nodes with unique strings. Note that (as monniaux commented) using a list of successors is probably more suitable for sparse graphs than for dense ones.
type label = string
(* Map keyed by node labels. *)
module GMap = Map.Make(struct type t = label let compare = compare end)
(* A node carries its label, its contents and the labels of its successors. *)
type 'a node = label * 'a * label list
(* A graph maps each label to its node. *)
type 'a graph = 'a node GMap.t
let empty = GMap.empty
let add (label: label) (contents: 'a) successors (graph: 'a graph) : 'a graph =
  GMap.add label (label, contents, successors) graph
Make a graph containing just a cycle of length 2:
# let g = add "a" () ["b"] (add "b" () ["a"] empty);;
val g : unit graph = <abstr>
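A lookup of a node's successors could then look something like this (an untested sketch; the successors name is just for illustration):

(* Return a node's successor labels, or [] if the label is not in the graph. *)
let successors (graph : 'a graph) (l : label) : label list =
  try
    let (_, _, succs) = GMap.find l graph in
    succs
  with Not_found -> []

With the example above, successors g "a" gives ["b"] and successors g "b" gives ["a"].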
This all depends on the kind of algorithms you wish to implement and whether you wish to consider sparse or dense graphs. A dense graph is usually represented by a matrix (which, by the way, can be kept immutable and used functionally).
If you need arrays with O(1) update (e.g. for marking nodes), you may still use them in a functional way by using a wrapper library that presents arrays as updatable structures, giving a new array each time. The trick is to implement these updates by keeping a mutable array as "head" version and keeping past versions as deltas from "head". This keeps all the imperative issues wrapped up inside a module presenting, semantically at least, a functional interface.
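Here is a rough sketch of that "head version plus deltas" idea (a simplified rerooting scheme; the names parray, reroot, get and set are just for this sketch, not an existing library):

type 'a parray = 'a data ref
and 'a data =
  | Head of 'a array                  (* the current, mutable version *)
  | Diff of int * 'a * 'a parray      (* differs from a newer version at one index *)

let make n x : 'a parray = ref (Head (Array.make n x))

(* Replay the deltas so that [t] becomes the head again; afterwards reads and
   writes on [t] are O(1) array operations. *)
let rec reroot (t : 'a parray) : unit =
  match !t with
  | Head _ -> ()
  | Diff (i, v, t') ->
      reroot t';
      (match !t' with
       | Head a ->
           let old = a.(i) in
           a.(i) <- v;
           t := Head a;
           t' := Diff (i, old, t)
       | Diff _ -> assert false)

let get (t : 'a parray) i =
  reroot t;
  match !t with Head a -> a.(i) | Diff _ -> assert false

(* Returns a new version; the old version [t] stays usable as a delta. *)
let set (t : 'a parray) i v : 'a parray =
  reroot t;
  match !t with
  | Head a ->
      let old = a.(i) in
      a.(i) <- v;
      let fresh = ref (Head a) in
      t := Diff (i, old, fresh);
      fresh
  | Diff _ -> assert false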
If you have sparse graphs, it is common to talk about "adjacency lists". I think it is a bad idea to use them unless the graph is extremely sparse (nodes with small degree, in absolute terms), because they have O(n) access. Binary trees à la Set are, I think, much more suitable, because they allow testing in O(log n).
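For example, a Set-based adjacency representation along those lines might look like this (a sketch; the module and function names are illustrative):

(* Map each node label to the set of its successors, so that testing whether
   an edge exists costs O(log n). *)
module LSet = Set.Make(String)
module LMap = Map.Make(String)

type graph = LSet.t LMap.t

let add_edge u v (g : graph) : graph =
  let succs = try LMap.find u g with Not_found -> LSet.empty in
  LMap.add u (LSet.add v succs) g

let has_edge u v (g : graph) : bool =
  try LSet.mem v (LMap.find u g) with Not_found -> false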
Cyclomatic complexity measures how many possible branches can be taken through a function. Is there an existing function/tool to calculate it for R functions? If not, suggestions are appreciated for the best way to write one.
A cheap start towards this would be to count up all the occurrences of if, ifelse or switch within your function. To get a real answer, though, you need to understand when branches start and end, which is much harder. Maybe some R parsing tools would get us started?
You can use codetools::walkCode to walk the code tree. Unfortunately codetools' documentation is pretty sparse. Here's an explanation and sample to get you started.
walkCode takes an expression and a code walker. A code walker is a list that you create containing three callback functions: handler, call, and leaf. (You can use the helper function makeCodeWalker to provide sensible default implementations of each.) walkCode walks over the code tree and makes calls into the code walker as it goes.
call(e, w) is called when a compound expression is encountered. e is the expression and w is the code walker itself. The default implementation simply recurses into the expression's child nodes (for (ee in as.list(e)) if (!missing(ee)) walkCode(ee, w)).
leaf(e, w) is called when a leaf node in the tree is encountered. Again, e is the leaf node expression and w is the code walker. The default implementation is simply print(e).
handler(v, w) is called for each compound expression and can be used to easily provide an alternative behavior to call for certain types of expressions. v is the character string at the head of the compound expression (a little hard to explain, but basically <- if it's an assignment expression, { if it's the start of a block, if if it's an if-statement, etc.). If the handler returns NULL, then call is invoked as usual; if it returns a function, that function is called instead.
Here's an extremely simplistic example that counts occurrences of if and ifelse in a function. Hopefully this can at least get you started!
library(codetools)

countBranches <- function(func) {
  count <- 0
  walkCode(body(func),
           makeCodeWalker(
             # Called for every compound expression; v is the name at its head.
             handler = function(v, w) {
               if (v == 'if' || v == 'ifelse')
                 count <<- count + 1
               NULL # returning NULL lets the default call() recurse as usual
             },
             # Override the default leaf behaviour, which would print every leaf.
             leaf = function(e, w) NULL))
  count
}
Also, I just found a new package called cyclocomp (released 2016). Check it out!