This is an ambitious question from a Wolfram Science Conference: Is there such a thing as a network analog of a recursive function? Maybe a kind of iterative "map-reduce" pattern? If we add interaction to iteration, things become complicated: continuous iteration of a large number of interacting entities can produce very complex results. It would be nice to have a way of seeing the consequences of the myriad interactions that define a complex system. Can we find a counterpart of a recursive function in an iterative network of connected nodes which contain nested propagation loops?
One of the basic patterns of distributed computation is Map-Reduce: it can be found in Cellular Automata (CA) and Neural Networks (NN). Neurons in a NN collect information through their synapses (reduce) and send it on to other neurons (map). Cells in a CA act similarly: they gather information from their neighbors and apply a transition rule (reduce), then offer the result to their neighbors again (map). Thus if there is a network analog of a recursive function, then Map-Reduce is certainly an important part of it. What kinds of iterative "map-reduce" patterns exist? Do certain kinds of "map-reduce" patterns result in certain kinds of streams, or even vortices or whirls? Can we formulate a calculus for map-reduce patterns?
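As a purely illustrative reading of this pattern, here is a minimal Haskell sketch of one step of a 1-D cellular automaton written as "reduce each cell's neighborhood, then map over all cells"; the majority rule and the wrap-around neighborhood are my own assumptions, not part of the original question.

-- Minimal sketch: a CA step as reduce-then-map (rule and neighborhood are assumed).
type Cell = Int

-- "reduce": collapse a cell's neighborhood into its next state
majorityRule :: [Cell] -> Cell
majorityRule neighborhood
  | 2 * sum neighborhood > length neighborhood = 1
  | otherwise                                  = 0

-- "map": apply the reduction to every cell's neighborhood
step :: [Cell] -> [Cell]
step cells = map majorityRule neighborhoods
  where
    n             = length cells
    neighborhoods = [ [ cells !! ((i + d) `mod` n) | d <- [-1, 0, 1] ]
                    | i <- [0 .. n - 1] ]

-- Iterating the map-reduce step gives the kind of propagation loop asked about.
main :: IO ()
main = mapM_ print (take 5 (iterate step [0, 0, 1, 1, 1, 0, 0, 1]))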
I'll take a stab at the question about recursion in neural networks, but I really don't see how map-reduce plays into this at all. I get that a neural network can perform distributed computation and then reduce it to a more local representation, but the term map-reduce is a very specific brand of this distributed/local piping, mainly associated with Google and Hadoop.
Anyway, the simple answer to your question is that there isn't a general method for recursion in neural networks; in fact, the closely related and simpler problem of implementing general-purpose role-value bindings in neural networks is currently still an open question.
The general reason why things like role binding and recursion are so hard in artificial neural networks (ANNs) is that ANNs are highly interdependent by nature; indeed, that is where most of their computational power comes from. Function calls and variable bindings, by contrast, are sharply delineated operations: what they include is an all-or-nothing affair, and that discreteness is a valuable property in many cases. So implementing one inside the other without sacrificing any computational power is very tricky indeed.
Here is a small sampling of papers that try their hand at partial solutions. Lucky for you, a great many people find this problem very interesting!
Visual Segmentation and the Dynamic Binding Problem: Improving the Robustness of an Artificial Neural Network Plankton Classifier (1993)
A Solution to the Binding Problem for Compositional Connectionism
A (Somewhat) New Solution to the Binding Problem
I am learning about these algorithms in class and in textbooks. However, these algorithms assume that the network topology does not change and that the network is reliable. We know this is not true for a network in real life. For example, if each node can only communicate with a subset of the nodes (and the relationship is not bidirectional), many weird cases emerge.
My question is: how are these algorithms actually implemented in production distributed systems, in the face of these challenges? It seems like we would need far more complicated solutions than what these simple algorithms provide.
What is the point of learning these algorithms if they don't even work in real software?
From what I understand, it is usually difficult to select the best possible clustering method for your data a priori, and we can use cluster validity to compare the results of different clustering algorithms and choose the one with the best validation scores.
I use an internal validation function from the R stats package on my clustering results (for clustering methods I used fast.greedy and walk.trap from R igraph).
The outcome is a list of many validation scores.
In the list, the fast.greedy method has better scores than walk.trap on almost every validation measure; the exception is entropy, where walk.trap scores better.
Can I use this list of validation results as one of my reasons to explain to others why I chose the fast.greedy method rather than the walk.trap method?
Also, is there any way to validate a disconnected graph?
Short answer: NO!
You can't use an internal index to justify the choice of one algorithm over another. Why?
Because evaluation indexes were designed to evaluate clustering results, i.e., partitions and hierarchies. You can only use them to assess the quality of a clustering and therefore justify its choice over the other options. But again, you can't use them to justify choosing a particular algorithm to apply to a different dataset based on a single previous experiment.
For this task, several benchmarks are needed to determine which algorithms are generally better and should be tried first. Here is a paper about it: Community detection algorithms: a comparative analysis.
Edit: What I am saying is, your validation indexes may show that fast.greedy's solution is better than walk.trap's. However, they do not explain why you chose these algorithms instead of any others. Only your data, your assumptions, and your constraints can do that.
Also, is there any way to validate a disconnected graph?
Theoretically, any evaluation index can do this. Technically, some implementations don't handle disconnected components.
I'm a newbie in functional programming.
I have a huge neural network with thousands of neurons and every connection between neurons has its weight. I have to update these weights very often, several thousand times per learning session.
Is FP still applicable here? I mean, in FP we can't modify variables; we can only return new values, not change previous ones. Does this mean I have to recreate the whole network on every weight update?
Is FP still applicable here?
You can certainly write this in a functional style with decent asymptotic algorithmic efficiency, but you are not likely to get within 10× of the performance of a decent imperative solution, because purely functional programming makes it difficult to use CPU caches effectively.
I mean in fp we can't modify variables and only able to return new variables not changing previous values. Does this mean I have to recreate whole network on every weight update?
No, for two reasons:
Purely functional data structures can be updated efficiently because they decompose large structures (e.g. a hash table) into many small recursively-defined structures (e.g. a balanced binary tree). When you update a single node within an immutable tree, you copy data from every node in the path from the root to the destination but refer back to all other branches by reference safe in the knowledge that they cannot be changed under you because they are immutable. So you only do O(log n) work instead of O(n) work.
Purely functional data structures usually offer functions like map that allow every element to be updated in the same way and avoid rebalancing by copying the structure of the source tree. So the time for n updates is O(n) instead of O(n log n).
So you should be able to achieve similar or even equal asymptotic time complexity but, in absolute terms, you will be using several times as much space and time as an imperative solution. I described these basics in detail in my book Visual F# 2010 for Technical Computing and I wrote the article Artificial Intelligence: Neural Networks (8th May 2010) for the OCaml Journal.
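To make the path-copying idea above concrete, here is a small sketch of my own (the names and numbers are invented, not from the answer) using Haskell's Data.Map, a balanced binary tree: a point update returns a new map that shares every untouched branch with the old one, and a bulk map visits each entry once.

import qualified Data.Map.Strict as Map

type NeuronId = Int
type Weights  = Map.Map (NeuronId, NeuronId) Double

-- Point update: builds a new map, copying only the O(log n) path to the entry.
updateWeight :: (NeuronId, NeuronId) -> Double -> Weights -> Weights
updateWeight = Map.insert

-- Bulk update: visits every entry once, O(n) rather than n separate inserts.
decayAll :: Double -> Weights -> Weights
decayAll factor = Map.map (* factor)

main :: IO ()
main = do
  let w0 = Map.fromList [ ((i, j), 0.1) | i <- [1 .. 3], j <- [1 .. 3] ]
      w1 = updateWeight (1, 2) 0.9 w0   -- w0 is untouched and still usable
  print (Map.lookup (1, 2) w0)          -- Just 0.1
  print (Map.lookup (1, 2) w1)          -- Just 0.9
  print (Map.size (decayAll 0.5 w1))    -- 9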
Look into Haskell arrays which include mutable variants in a monad.
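For illustration, a minimal version of that suggestion might look like the following (the array size and the update rule are placeholders of mine): an unboxed mutable array of weights, updated in place inside the ST monad and then returned as an ordinary immutable UArray.

import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, elems)

-- Mutate weights in place inside ST, then freeze to an immutable array.
updateWeights :: Int -> UArray Int Double
updateWeights n = runSTUArray $ do
  weights <- newArray (0, n - 1) 0.1      -- initial weights (assumed value)
  forM_ [0 .. n - 1] $ \i -> do
    w <- readArray weights i
    writeArray weights i (w * 0.9)        -- placeholder update rule
  return weights

main :: IO ()
main = print (elems (updateWeights 5))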
You should not need to recreate the entire network every time a weight update occurs. Presumably, your neurons are modeled as individual objects - this means that to "update" an individual neuron, you would actually be creating a new neuron with the updated weight. Then this neuron would be inserted into the network in place of the old one, which would in turn be free for reclamation by the garbage collector.
I do not agree with the idea of using mutable state. Implementations of functional languages know that the code is functional, so they optimize for functional programming. If a functional language really is the best tool for the job, then take advantage of its benefits.
If you structure your data in such a way that you can use a persistent data structure to model your neural network, functional updates to the neural network will be cheap (at least compared to copying the whole thing).
If it is still not fast enough, your language may allow other techniques (such as careful use of mutation) to speed it up; for example, if you were using Clojure, you could use transients to gain some additional speed.
I've found UML useful for documenting various aspects of OO systems, particularly class diagrams for overall architecture and sequence diagrams to illustrate particular routines. I'd like to do the same kind of thing for my clojure applications. I'm not currently interested in Model Driven Development, simply on communicating how applications work.
Is UML a common / reasonable approach to modelling functional programming? Is there a better alternative to UML for FP?
the "many functions on a single data structure" approach of idiomatic Clojure code waters down the typical "this uses that" UML diagram because many of the functions end up pointing at map/reduce/filter.
I get the impression that, because Clojure is a somewhat more data-centric language, a way of visualizing the flow of data could help more than a way of visualizing control flow, especially once you take lazy evaluation into account. It would be really useful to get a "pipeline" diagram of the functions that build sequences.
map and reduce etc. would turn these pipelines into trees.
Most functional programmers prefer types to diagrams. (I mean types very broadly speaking, to include such things as Caml "module types", SML "signatures", and PLT Scheme "units".) To communicate how a large application works, I suggest three things:
Give the type of each module (a small illustrative sketch follows this list). Since you are using Clojure you may want to check out the "Units" language invented by Matthew Flatt and Matthias Felleisen. The idea is to document the types and the operations that the module depends on and that the module provides.
Give the import dependencies of the interfaces. Here a diagram can be useful; in many cases you can create a diagram automatically using dot. This has the advantage that the diagram always accurately reflects the code.
For some systems you may want to talk about important dependencies of implementations. But usually not—the point of separating interfaces from implementations is that the implementations can be understood only in terms of the interfaces they depend on.
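To illustrate the first point, here is an invented example in Haskell (the idea carries over to Clojure namespaces, SML signatures, or PLT units): the module's "type" is its export list together with the signatures it promises, and that is exactly what you would document.

-- Invented library module: the export list plus the signatures is the interface.
module Stats
  ( mean      -- :: [Double] -> Double
  , variance  -- :: [Double] -> Double
  ) where

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

variance :: [Double] -> Double
variance xs = mean [ (x - m) ^ (2 :: Int) | x <- xs ]
  where m = mean xs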
There was recently a related question on architectural thinking in functional languages.
It's an interesting question (I've upvoted it), I expect you'll get at least as many opinions as you do responses. Here's my contribution:
What do you want to represent on your diagrams? In OO, considering class diagrams, one answer to that question might be: state (or attributes, if you prefer) and methods. So, I would suggest, class diagrams are obviously not the right thing to start from, since functions have no state and each generally implements a single operation (aka method). Do any of the other UML diagrams provide a better starting point for your thinking? The answer is probably yes, but you need to consider what you want to show and find that starting point yourself.
Once you've written a (sub-)system in a functional language, then you have a (UML) component to represent on the standard sorts of diagram, but perhaps that is too high-level, too abstract, for you.
When I write functional programs, which is not a lot I admit, I tend to document functions as I would document mathematical functions (I work in scientific computing, lots of maths knocking around so this is quite natural for me). For each function I write:
an ID;
sometimes, a description;
a specification of the domain;
a specification of the co-domain;
a statement of the rule, i.e. the operation that the function performs;
sometimes I write post-conditions too though these are usually adequately specified by the co-domain and rule.
I use LaTeX for this (it's good for mathematical notation), but any other reasonably flexible text or word processor would do. As for diagrams, no, not so much. But that's probably a reflection of the primitive state of the design of the systems I program functionally. Most of my computing is done on arrays of floating-point numbers, so most of my functions are very easy to compose ad hoc and the structuring of a system is very loose. I imagine a diagram which showed functions as nodes and inputs/outputs as edges between nodes -- in my case there would be edges between most pairs of nodes. I'm not sure drawing such a diagram would help me at all.
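For illustration, one entry in that style might look like the following LaTeX fragment (the function and its sets are made up):

% Invented example of the per-function record described above.
% Requires \usepackage{amssymb} for \mathbb.
\paragraph{ID} \texttt{smooth3}
\paragraph{Description} three-point moving average of a signal.
\paragraph{Domain} $\mathbb{R}^{n}$ with $n \ge 3$.
\paragraph{Co-domain} $\mathbb{R}^{n-2}$.
\paragraph{Rule} $y_i = \tfrac{1}{3}\,(x_{i-1} + x_i + x_{i+1})$ for $i = 2, \dots, n-1$.
\paragraph{Post-condition} $\min_j x_j \le y_i \le \max_j x_j$ for every $i$.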
I seem to be coming down on the side of telling you no, UML is not a reasonable way of modelling functional systems. Whether it's common, SO will tell us.
This is something I've been trying to experiment with also, and after a few years of programming in Ruby I was used to class/object modeling. In the end I think the types of designs I create for Clojure libraries are actually pretty similar to what I would do for a large C program.
Start by doing an outline of the domain model. List the main pieces of data being moved around and the primary functions being performed on this data. I write these in my notebook, and a lot of the time it will be just a name with 3-5 bullet points underneath it. This outline will probably be a good approximation of your initial namespaces, and it should point out some of the key high-level interfaces.
If it seems pretty straightforward, then I'll create empty functions for the high-level interface and just start filling them in. Typically each high-level function will require a couple of support functions, and as you build up the whole interface you will find opportunities for sharing more code, so you refactor as you go.
If it seems like a more difficult problem, then I'll start diagramming out the structure of the data and the flow of key functions. Often the diagram and conceptual model that make the most sense will depend on the type of abstractions you choose to use in a specific design. For example, if you use a dataflow library for a Swing GUI, then a dependency graph would make sense, but if you are writing a server for processing relational database queries, then you might want to diagram pools of agents and pipelines for processing tuples. I think these kinds of models and diagrams are also much more descriptive in terms of conveying to another developer how a program is architected. They show more of the functional connectivity between aspects of your system, rather than the pretty non-specific information conveyed by something like UML.
Suppose I have a set of directed graphs. I need to query those graphs. I would like to get a feeling for my best choice for the graph modeling task. So far I have these options, but please don't hesitate to suggest others:
Proprietary implementation (matrix) and graph traversal algorithms
RDBM and SQL option (too space consuming)
RDF and SPARQL option (too slow)
What would you guys suggest? Regards.
EDIT: Just to answer Mad's questions:
Each one is relatively small, no more than 200 vertices, 400 edges. However, there are hundreds of them.
Frequency of querying: hard to say, it's an experimental system.
Speed: not real time, but practical, say 4-5 seconds tops.
You didn't give us enough information to respond with a well-thought-out answer. For example: what size are these graphs? With what frequency do you expect to query them? Do you need real-time responses to these queries? More information about what your application is for and what its purpose is will be helpful.
Anyway, to counter the usual responses that assume SQL-based DBMSes are unable to handle graph structures effectively, I will give some references:
Graph Transformation in Relational Databases (.pdf), by G. Varro, K. Friedl, D. Varro, presented at International Workshop on Graph-Based Tools (GraBaTs) 2004;
5 Conclusion and Future Work
In the paper, we proposed a new graph transformation engine based on off-the-shelf relational databases. After sketching the main concepts of our approach, we carried out several test cases to evaluate our prototype implementation by comparing it to the transformation engines of the AGG [5] and PROGRES [18] tools.
The main conclusion that can be drawn from our experiments is that relational databases provide a promising candidate as an implementation framework for graph transformation engines. We call attention to the fact that our promising experimental results were obtained using a worst-case assessment method i.e. by recalculating the views of the next rule to be applied from scratch which is still highly inefficient, especially, for model transformations with a large number of independent matches of the same rule. ...
They used PostgreSQL as the DBMS, which is probably not particularly good for this kind of application. You can try LucidDB and see if it is better, as I suspect it will be.
Incremental SQL Queries (more than one paper here; you should concentrate on "Maintaining Transitive Closure of Graphs in SQL"):
.. we showed that transitive closure, alternating paths, same generation, and other recursive queries, can be maintained in SQL if some auxiliary relations are allowed. In fact, they can all be maintained using at most auxiliary relations of arity 2. ..
Incremental Maintenance of Shortest Distance and Transitive Closure in First Order Logic and SQL.
Edit: you gave more details, so... I think the best way is to experiment a little with both a main-memory dedicated graph library and a DBMS-based solution, then carefully evaluate the pros and cons of both.
For example: a DBMS needs to be installed (unless you use an "embeddable" DBMS like SQLite), and only you know if/where your application needs to be deployed and who your users are. On the other hand, a DBMS gives you immediate benefits, like persistence (I don't know what support graph libraries provide for persisting their graphs), transaction management, and countless others. Are these relevant for your application? Again, only you know.
The first option you mentioned seems best. If your graph won't have many edges (|E| = O(|V|)), then you might get better time and space complexity using a Dictionary:
var graph = new Dictionary<Vertex, HashSet<Vertex>>();
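In the same spirit (an adjacency map plus hand-written traversal code), here is what a breadth-first search over such a structure can look like; this is only a Haskell sketch with an invented graph, not the answer's C#.

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Vertex = Int
type Graph  = Map.Map Vertex [Vertex]

-- Breadth-first search, returning vertices in the order they are first reached.
bfs :: Graph -> Vertex -> [Vertex]
bfs g start = go (Set.singleton start) [start]
  where
    go _       []          = []
    go visited (v : queue) = v : go visited' (queue ++ fresh)
      where
        fresh    = [ w | w <- Map.findWithDefault [] v g
                       , not (w `Set.member` visited) ]
        visited' = foldr Set.insert visited fresh

main :: IO ()
main = do
  let g = Map.fromList [(1, [2, 3]), (2, [4]), (3, [4]), (4, [])]
  print (bfs g 1)   -- [1,2,3,4]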
An interesting graph library is QuickGraph. Never used it but it seems promising :)
I wrote and designed quite a few graph algorithms for various programming contests and for production code. And I noticed that every time I need one, I have to develop it from scratch, assembling concepts from graph theory (BFS, DFS, topological sorting, etc.).
Perhaps a lack of experience is the reason, but it seems to me that there's still no reasonable general-purpose query language for graph problems. Pick a couple of general-purpose graph libraries and solve your particular task in a programming (not query!) language. That will give you the best performance and space consumption, but it will also require an understanding of basic graph theory concepts and their limitations.
And the last one: do not use SQL for graphs.