I'm writing a bunch of recursive graph algorithms where graph nodes have parents, children, and a number of other properties. The algorithms can also create nodes dynamically, and make use of recursive functions.
What are the right data structures to use in this case? In C++ I would've implemented this via pointers (i.e. each node has a vector<Node*> parents, vector<Node*> children), but I'm not sure if Julia pointers are the right tool for that, or if there's something else ... ?
In Julia state-of-the-art with this regard is the LightGraphs.jl library.
It uses adjacency lists for graph representation and assumes that the data for nodes is being kept outside the graph (for example in Vectors indexed by node identifiers) rather than inside the graph.
This approach is generally most efficient and most convenient (operating Array indices rather than references).
LightGraphs.jl provides implementation for several typical graph algorithms and is usually the way to go when doing computation on graphs.
However, LightGraphs.jl,'s approach might be less convenient in scenarios where you are continuously at the same time adding and destroying many nodes within the graph.
Now, regarding an equivalent of the C++ approach you have proposed it can be accomplished as
struct MyNode{T}
data::T
children::Vector{MyNode}
parents::Vector{MyNode}
MyNode(data::T,children=MyNode[],parents=MyNode[]) where T= new{T}(data,children,parents)
end
And this API can be used as:
node1 = MyNode(nothing)
push!(node1.parents, MyNode("hello2"))
Finally, since LightGraphs.jl is a Julia standard it is usually worth to provide some bridging implementation so your API is able to use LightGraphs.jl functions.
For illustration how it can be done for an example have a look for SimpleHypergraphs.jl library.
EDIT:
Normally, for efficiency reasons you will want the data field to be be homogenous across the graph, in that case better is:
struct MyNode{T}
data::T
children::Vector{MyNode{T}}
parents::Vector{MyNode{T}}
MyNode(data::T,children=MyNode{T}[],parents=MyNode{T}[]) where T= new{T}(data,children,parents)
end
Related
I am using the Julia package ReinforcementLearning.jl. I would like to gain from the fact that DQN does not require enumerate and revising the whole state space. So, my question is how to describe state_space for discrete environments with no need to enumerate states. In other words, let's assume states are represented by an array of N elements and each of these elements can take M possible values, I would like to avoid the enumeration of the N^M potential states and instead of it, have some generative function.
I have implemented DQN by using ReinforcementLearning.jl for environments where actions and states are discrete. To do so, I have enumerated states at the state_space definition. It works quite well, but the enumeration is avoiding me to get the computational advantages of DQN.
Terminology note: "vertices"="nodes", "vertex/node labels" = "indices"
LightGraphs in Julia changes node indices when producing induced subgraphs.
For instance, if a graph has nodes [1, 2, 3, 4], its LightGraphs.induced_subgraph induced by nodes [3,4] will be a new graph with nodes [3,4] getting relabeled as [1,2].
In state-of-the-art graph algorithms, recursive subgraphing is used, with sets of nodes being modified and passed up and down the recursion layers. For these algorithms to properly keep track of node identities (labels), subgraphing must not change the indices.
Subgraphing in networkx in Python, for instance, preserves node labels.
One can use MetaGraphs by adding a node attribute :id, which is preserved by subgraphing, but then you have to write a lot of extra code to convert between node indices and node :id's.
Is there not a Julia package that "just works" when it comes to subgraphing and preserving node identities?
I'd first like to take the opportunity to clarify some terminology here: LightGraphs itself doesn't dictate a graph type. It's a collection of algorithms and an interface specification. The limitations you're seeing are for SimpleGraphs, which is a graph type that ships with the LightGraphs package and is the default type for Graph and DiGraph.
The reason this is significant is that it is (or at least should be) very easy to create a graph type that does exactly what you want and that can take advantage of the existing LightGraphs infrastructure. All you (theoretically) need to do is to implement the interface functions described in src/interface.jl. If you implement them correctly, all the LightGraphs algorithms should Just Work (tm) (though they might not be performant; that's up to the data structures you've chosen and interface decisions you've made).
So - my advice is to write the graph structure you want, and implement the dozen or so interface functions, and see what works and what doesn't. If there's an existing algorithm that breaks with your interface implementation, file a bug report and we'll see where the problem is.
I am working on a parallel finite element method on moving meshes.
So I will need to call ParMETIS_V3_AdaptiveRepart from ParMetis to perform re-partitioning every time I re-mesh.
When successful, the function only generates the partitioning information, i.e. the elements on the processors.
However, the neighbors of a process are important as well, in order to construct the ghost layers of a sub-mesh.
So I am wondering if there is any efficient way to get the information about shared (overlapped) entities and neighbors, or does the ParMetis actually provide this information?
ParMetis is the function ParMETIS_V3_AdaptiveRepart does more or less the smae thing as ParMETIS_V3_PartKway
The ouput of ParMETIS_V3_PartKway is part "an array of size equal to the number of locally-stored vertices. Upon successful completion the
partition vector of the locally-stored vertices is written to this array."
It also returns the number of edges that are cut. (which is only a part of what you want).
But METIS does not provide a way to create the "ghost layers" as you elegantly say.
However since you have the created the graph you know how to find each neighbour for each element. And you can check if your neighbour element is in your current process's graph and if part[element]==part[neighbour_element]. If the neighbour element is not in your current process you will have to do a bit of MPI.
Say I wanted to write an algorithm working on an immutable tree data structure that has a list of leaves as its input. It needs to return a new tree with changes made to the old tree going upwards from those leaves.
My problem is that there seems to be no way to do this purely functional without reconstructing the entire tree checking at leaves if they are in the list, because you always need to return a complete new tree as the result of an operation and you can't mutate the existing tree.
Is this a basic problem in functional programming that only can be avoided by using a better suited algorithm or am I missing something?
Edit: I not only want to avoid to recreate the entire tree but also the functional algorithm should have the same time complexity than the mutating variant.
The most promising I have seen so far (which admittedly is not very long...) is the Zipper data structure: It basically keeps a separate structure, a reverse path from the node to root, and does local edits on this separate structure.
It can do multiple local edits, most of which are constant time, and write them back to the tree (reconstructing the path to root, which are the only nodes that need to change) all in one go.
The Zipper is a standard library in Clojure (see the heading Zippers - Functional Tree Editing).
And there's the original paper by Huet with an implementation in OCaml.
Disclaimer: I have been programming for a long time, but only started functional programming a couple of weeks ago, and had never even heard of the problem of functional editing of trees until last week, so there may very well be other solutions I'm unaware of.
Still, it looks like the Zipper does most of what one could wish for. If there are other alternatives at O(log n) or below, I'd like to hear them.
You may enjoy reading
http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!248.entry
This depends on your functional programming language. For instance in Haskell, which is a Lazy functional programming language, results are calculated at the last moment; when they are acutally needed.
In your example the assumption is that because your function creates a new tree, the whole tree must be processed, whereas in reality the function is just passed on to the next function and only executed when necessary.
A good example of lazy evaluation is the sieve of erastothenes in Haskell, which creates the prime numbers by eliminating the multiples of the current number in the list of numbers. Note that the list of numbers is infinite. Taken from here
primes :: [Integer]
primes = sieve [2..]
where
sieve (p:xs) = p : sieve [x|x <- xs, x `mod` p > 0]
I recently wrote an algorithm that does exactly what you described - https://medium.com/hibob-engineering/from-list-to-immutable-hierarchy-tree-with-scala-c9e16a63cb89
It works in 2 phases:
Sort the list of nodes by their depth in the hierarchy
constructs the tree from bottom up
Some caveats:
No Node mutation, The result is an Immutable-tree
The complexity is O(n)
Ignores cyclic referencing in the incoming list
I read the mapreduce at http://en.wikipedia.org/wiki/MapReduce ,understood the example of how to get the count of a "word" in many "documents". However I did not understand the following line:
Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.
Can someone elaborate on the difference again(MapReduce framework VS map and reduce combination)? Especially, what does the reduce functional programming do?
Thanks a great deal.
The main difference would be that MapReduce is apparently patentable. (Couldn't help myself, sorry...)
On a more serious note, the MapReduce paper, as I remember it, describes a methodology of performing calculations in a massively parallelised fashion. This methodology builds upon the map / reduce construct which was well known for years before, but goes beyond into such matters as distributing the data etc. Also, some constraints are imposed on the structure of data being operated upon and returned by the functions used in the map-like and reduce-like parts of the computation (the thing about data coming in lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialisation of the map & reduce combination.
As for the Wikipedia comment on the function being mapped in the functional programming's map / reduce construct producing one value per input... Well, sure it does, but here there are no constraints at all on the type of said value. In particular, it could be a complex data structure like perhaps a list of things to which you would again apply a map / reduce transformation. Going back to the "counting words" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, map that over your documents (or chunks of documents, as the case may be) and reduce the results.
In fact, that's exactly what happens in this article by Phil Hagelberg. It's a fun and supremely short example of a MapReduce-word-counting-like computation implemented in Clojure with map and something equivalent to reduce (the (apply + (merge-with ...)) bit -- merge-with is implemented in terms of reduce in clojure.core). The only difference between this and the Wikipedia example is that the objects being counted are URLs instead of arbitrary words -- other than that, you've got a counting words algorithm implemented with map and reduce, MapReduce-style, right there. The reason why it might not fully qualify as being an instance of MapReduce is that there's no complex distribution of workloads involved. It's all happening on a single box... albeit on all the CPUs the box provides.
For in-depth treatment of the reduce function -- also known as fold -- see Graham Hutton's A tutorial on the universality and expressiveness of fold. It's Haskell based, but should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go... Things like ++ = list concatenation, no deep Haskell magic.
Using the word count example, the original functional map() would take a set of documents, optionally distribute subsets of that set, and for each document emit a single value representing the number of words (or a particular word's occurrences) in the document. A functional reduce() would then add up the global counts for all documents, one for each document. So you get a total count (either of all words or a particular word).
In MapReduce, the map would emit a (word, count) pair for each word in each document. A MapReduce reduce() would then add up the count of each word in each document without mixing them into a single pile. So you get a list of words paired with their counts.
MapReduce is a framework built around splitting a computation into parallelizable mappers and reducers. It builds on the familiar idiom of map and reduce - if you can structure your tasks such that they can be performed by independent mappers and reducers, then you can write it in a way which takes advantage of a MapReduce framework.
Imagine a Python interpreter which recognized tasks which could be computed independently, and farmed them out to mapper or reducer nodes. If you wrote
reduce(lambda x, y: x+y, map(int, ['1', '2', '3']))
or
sum([int(x) for x in ['1', '2', '3']])
you would be using functional map and reduce methods in a MapReduce framework. With current MapReduce frameworks, there's a lot more plumbing involved, but it's the same concept.