I'm playing arround with a Genetic Algorithm in which I want to evolve graphs.
Do you know a way to apply crossover and mutation when the chromosomes are graphs?
Or am I missing a coding for the graphs that let me apply "regular" crossover and mutation over bit strings?
thanks a lot!
Any help, even if it is not directly related to my problem, is appreciated!
Manuel
I like Sandor's suggestion of using Ken Stanley's NEAT algorithm.
NEAT was designed to evolve neural networks with arbitrary topologies, but those are just basically directed graphs. There were many ways to evolve neural networks before NEAT, but one of NEAT's most important contributions was that it provided a way to perform meaningful crossover between two networks that have different toplogies.
To accomplish this, NEAT uses historical markings attached to each gene to "line up" the genes of two genomes during crossover (a process biologists call synapsis). For example:
(source: natekohl.net)
(In this example, each gene is a box and represents a connection between two nodes. The number at the top of each gene is the historical marking for that gene.)
In summary: Lining up genes based on historical markings is a principled way to perform crossover between two networks without expensive topological analysis.
You might as well try Genetic Programming. A graph would be the closest thing to a tree and GP uses trees... if you still want to use GAs instead of GPs then take a look at how crossover is performed on a GP and that might give you an idea how to perform it on the graphs of your GA:
(source: geneticprogramming.com)
Here is how crossover for trees (and graphs) works:
You select 2 specimens for mating.
You pick a random node from one parent and swap it with a random node in the other parent.
The resulting trees are the offspring.
As others have mentioned, one common way to cross graphs (or trees) in a GA is to swap subgraphs (subtrees). For mutation, just randomly change some of the nodes (w/ small probability).
Alternatively, if you are representing a graph as an adjacency matrix, then you might swap/mutate elements in the matrices (kind of like using a two-dimensional bit string).
I'm not sure if using a bitstring is the best idea, I'd rather represent at least the weights with real values. Nevertheless bitstrings may also work.
If you have a fixed topology, then both crossover and mutation are quite easy (assuming you only evolve the weights of the network):
Crossover: take some weights from one parent, the rest from the other, can be very easily done if you represent the weights as an array or list. For more details or alternatives see http://en.wikipedia.org/wiki/Crossover_%28genetic_algorithm%29.
Mutation: simply select some of the weights and adjust them slightly.
Evolving some other stuff (e.g. activation function) is pretty similar to these.
If you also want to evolve the topology then things become much more interesting. There are quite some additional mutation possibilities, like adding a node (most likely connected to two already existing nodes), splitting a connection (instead of A->B have A->C->B), adding a connection, or the opposites of these.
But crossover will not be too easy (at least if the number of nodes is not fixed), because you will probably want to find "matching" nodes (where matching can be anything, but likely be related to a similar "role", or a similar place in the network). If you also want to do it I'd highly recommend studying already existing techniques. One that I know and like is called NEAT. You can find some info about it at
http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies
http://nn.cs.utexas.edu/?neat
and http://www.cs.ucf.edu/~kstanley/neat.html
Well, I have never played with such an implementation, but eventually for crossover you could pick a branch of one of the graphs and swap it with a branch from another graph.
For mutation you could randomly change a node inside the graph, with small probability.
Related
I have time-series data of 12 consumers. The data corresponding to 12 consumers (named as a ... l) is
I want to cluster these consumers so that I may know which of the consumers have utmost similar consumption behavior. Accordingly, I found clustering method pamk, which automatically calculates the number of clusters in input data.
I assume that I have only two options to calculate the distance between any two time-series, i.e., Euclidean, and DTW. I tried both of them and I do get different clusters. Now the question is which one should I rely upon? and why?
When I use Eulidean distance I got following clusters:
and using DTW distance I got
Conclusion:
How will you decide which clustering approach is the best in this case?
Note: I have asked the same question on Cross-Validated also.
none of the timeseries above look similar to me. Do you see any pattern? Maybe there is no pattern?
the clustering visualizations indicate that there are no clusters, too. b and l appear to be the most unusual outliers; followed by d,e,h; but there are no clusters there.
Also try hierarchical clustering. The dendrogram may be more understandable.
But in either way, there may be no clusters. You need to be prepared for this outcome, and consider it a valid hypothesis. Double-check any result. As you have seen, pam will always return a result, and you have absolutely no means to decide which result is more "correct" than the other (most likely, neither is correct, and you should rely on neither, to answer your question).
The problem
I have a set of locations on a plane (actually they are pins in a KML file) and I want to partition this graph into subgraphs. Connectivity is pretty good - as with all real world road networks - so I assume that if two locations are close they have some kind of connection. The resulting set of subgraphs should adhere to these constraints:
Every node has to be covered by a subgraph
Every node should be in exactly 1 subgraph
Every node within a subgraph should be close to each other (L2 norm distances)
Every subgraph should contain at least 5 locations
The amount of subgraphs should be minimal
Right now the amount of locations is no more than 100 so I thought about brute forcing through every possibility but this obviously won't scale well.
I thought about using some k-Nearest-Neighbors algorithm (e.g. using QuickGraph) but I can't get my head around where to start and how to extend/shrink the subgraphs on the way. Maybe it's possible to map this problem to another problem that can easily be solved with some numerical procedure (e.g. Simplex) ...
Maybe someone has experience in this kind of optimization problems and is willing to help me find a solution? I don't have access to Mathematica/Matlab or the like ... but sufficient .NET programming skills and hmm Excel :-)
Thanks a lot!
As soon as there are multiple criteria that need to be appeased in the best possible way simultanously, it is usually starting to get difficult.
A numerical solution could work as follows: You could define yourself a utility function, that maps partitionings of your locations to positive real values, describing how "good" a partition is by assigning it a "rating" (good could be high "bad" could be near zero).
Once you have such a function assigning partitions their according "values", you simply need to optimize it and then you hopefully obtain a good solution if you defined your utility function reasonably. Evolutionary algorithms are good at that task since your utility function is probably analytically too complex to solve due to its discrete nature.
The problem is then only how you assign "values" to partitions via this utility function. This is then your task. It can be done for example by weighing each criterion with a factor and summing the results up, or even more complex functions (least squares etc.). The factors you use in the definition of the utility function are tuning parameters and can be varied until the result seems to be good.
Some CA software wold help a lot for testing if you can get your hands on one, bit I guess to obtain a black box solver for your partitioning problem, you need to implement the complete procedure yourself using a language of your choice.
I implemented a differential evolution algorithm for a side project I was doing. Because the crossover step seemed to involve a lot of parameter choices (e.g. crossover probabilities), I decided to skip it and just use mutation. The method seemed to work ok, but I am unsure whether I would get better performance if I introduced crossover.
Main Question: What is the motivation behind introducing crossover to differential evolution? Can you provide a toy example where introducing crossover out-performs pure mutation?
My intuition is that crossover will produce something like the following in 2-dimensions. Say
we have two parent vectors (red). Uniform crossover could produce a new trial vector at one of the blue points.
I am not sure why this kind of exploration would be expected to be beneficial. In fact, it seems like this could make performance worse if high-fitness solutions follow some linear trend. In the figure below, lets say the red points are the current population, and the optimal solution is towards the lower right corner. The population is traveling down a valley such that the upper right and lower left corners produce bad solutions. The upper left corner produces "okay" but suboptimal solutions. Notice how uniform crossover produces trials (in blue) that are orthogonal to the direction of improvement. I've used a cross-over probability of 1 and neglected mutation to illustrate my point (see code). I imagine this situation could arise quite frequently in optimization problems, but could be misunderstanding something.
Note: In the above example, I am implicitly assuming that the population was randomly initialized (uniformly) across this space, and has begun to converge to the correct solution down the central valley (top left to bottom right).
This toy example is convex, and thus differential evolution wouldn't even be the appropriate technique. However, if this motif was embedded in a multi-modal fitness landscape, it seems like crossover might be detrimental. While crossover does support exploration, which could be beneficial, I am not sure why one would choose to explore in this particular direction.
R code for the example above:
N = 50
x1 <- rnorm(N,mean=2,sd=0.5)
x2 <- -x1+4+rnorm(N,mean=0,sd=0.1)
plot(x1,x2,pch=21,col='red',bg='red',ylim=c(0,4),xlim=c(0,4))
x1_cx = list(rep(0, 50))
x2_cx = list(rep(0, 50))
for (i in 0:N) {
x1_cx[i] <- x1[i]
x2_cx[i] <- x2[sample(1:N,1)]
}
points(x1_cx,x2_cx,pch=4,col='blue',lwd=4)
Follow-up Question: If crossover is beneficial in certain situations, is there a sensible approach to a) determining if your specific problem would benefit from crossover, and b) how to tune the crossover parameters to optimize the algorithm?
A related stackoverflow question (I am looking for something more specific, with a toy example for instance): what is the importance of crossing over in Differential Evolution Algorithm?
A similar question, but not specific to differential evolution: Efficiency of crossover in genetic algorithms
I am not particularly familiar with the specifics of the DE algorithm but in general the point of crossover is that if you have two very different individuals with high fitness it will produce an offspring that is intermediate between them without being particularly similar to either. Mutation only explores the local neighbourhood of each individual without taking the rest of the population into account. If you think of genomes as points in some high dimensional vector space, then a mutation is shift in a random direction. Therefore mutation needs to take small steps since if your are starting from a significantly better than random position, a long step in a random direction is almost certain to make things worse because it is essentially just introducing entropy into an evolved genome. You can think of a cross over as a step from one parent towards the other. Since the other parent is also better than random, it is more promising to take a longer step in that direction. This allows for faster exploration of the promising parts of the fitness landscape.
In real biological organisms the genome is often organized in such a way that genes that depend on each other are close together on the same chromosome. This means that crossover is unlikely to break synergetic gene combinations. Real evolution actually moves genes around to achieve this (though this is much slower than the evolution of individual genes) and sometimes the higher order structure of the genome (the 3 dimensional shape of the DNA) evolves to prevent cross-overs in particularly sensitive areas. These mechanisms are rarely modeled in evolutionary algorithms, but you will get more out of crossovers if you order your genome in a way that puts genes that are likely to interact close to each other.
No. Crossover is not useful. There I said it. :P
I've never found a need for crossover. People seem to think it does some kind of magic. But it doesn't (and can't) do anything more useful than simple mutation. Large mutations can be used to explore the entire problem space and small mutations can be used to exploit niches.
And all the explanations I've read are (to put it mildly) unsatisfactory. Crossover only complicates your algorithms. Drop it asap. Your life will be simpler. .... IMHO.
As Daniel says, cross over is a way to take larger steps across the problem landscape, allowing you to escape local maxima that a single mutation would be unable to do so.
Whether it is appropriate or not will depend on the complexity of the problem space, how the genotype -> phenotype expression works (will related genes be close together), etc.
More formally this is the concept of 'Connectivity' in Local Search algorithms, providing strong enough operators that the local search neighbourhood is sufficentally large to escape local minima.
I want to carry out Graph Clustering in a huge undirected graph with millions of edges and nodes. Graph is almost clustered with different clusters joined together only by some nodes(kind of ambiguous nodes which can relate to multiple clusters). There will be very few or almost no edges between two clusters. This problem is almost similar to finding vertex cut set of a graph, with one exception that graph needs to be partitioned into many components(their number being unknown).(Refer this picture https://docs.google.com/file/d/0B7_3zLD0XdtAd3ZwMFAwWDZuU00/edit?pli=1)
Its almost like different strongly connected components sharing a couple of nodes between them and i am supposed to remove those nodes to separate those strongly connected components. Edges are weigthed but this problem is more like finding structures in a graph, so edge weigths won't be of relevance. (Another way to think about the problem would be to visualize Solid Spheres touching each other at some points with Spheres being those strongly connected components and touching points being those ambiguous nodes)
I am prototyping something, so am quiet short of time to pick up Graph Clustering Algorithms by myself and to select the best possible. Plus i need a solution that would cut nodes and not edges since different clusters share nodes and not edges in my case.
Is there any research paper, blog that addresses this or somewhat related problem? Or can anyone come up with a solution to this problem howsoever dirty.
Since millions of nodes and edges are involved, i would need a MapReduce implementation of the solution. Any inputs, links for that too?
Is there any current open source implementation in MapReduce that can i directly use?
I think this problem is analogous to Finding Communities in online social networks by removing vertices.
Your problem is not so simple. I am afraid that it is related to the clique problem, which is NP complete, so unless you quantify somehow the statement "there are almost no edges between the clusters", your problem might be still very difficult. But what I would do in your shoes, would be to try one dirty, greedy approach, namely regarding the nodes as the following kind of quasi-neural net:
Each vertex I would consider to have inputs, outputs and a sigmoid activation function which convert the input value (sum of inputs) into the output value. The output value, and I consider this important, would not be cloned and sent to all the neighbors, but rather divided evenly between the neighbors. In addition to this, I would define a logarithmic decay of activity in a neuron (self-suppression, suppressive connection to itself), defined by a decay parameter global for the net.
Now, I would start simulation with all the neurons starting from activity 0.5 (activity range is 0 to 1) with very high decay parameter, which would lead to all the neuronst quickly stabilizing in 0 state. I would then gradually decrease the decay parameter until the steady state result would yield the first clique with non-zero stable activity.
The question is what to do next. One possibility is to subtract the found clique from the graph and run the same process again until we find all the cliques. This greedy approach might succeed if your graph is indeed as well behaved (really almost clustered) as you say, but might lead to unexpected results otherwise. Another possibility is to give the found clique a unique clique smell that would be repulsive (mutual suppresion) to other cliques an rerun the algorithm until the second clique is found, give it a different clique smell repulsive to all others etc., until each node has its own assigned smell.
I think this would be as many big ideas as i have about this.
The key is, that since it is probably not possible to solve this problem in the general case (likely NP complete), you need to take use of whatever special properties your graph has. That means you need to play with parameters for a while until the algorithm solves 99% of the cases that you encounter. I don't think that it is possible to give the numerically precise answer to your question without long experimentation with the actual datasets that you encounter.
Since millions of nodes and edges are involved, i would need a MapReduce implementation of the solution. Any inputs, links for that too?
In my experience I doubt if using Map/Reduce over here would be truly advantageous. First 10^6 order of nodes isn't really that large [that too in a non hyper-connected graph, since you are considering clustering], and the over head of using Map/Reduce [unless you already have setup your hardware/software for it] for your problem will not be worth it.
Map/Reduce will work much better, where once you have solved the clustering issue, and then want to process each cluster with similar analysis. Basically when you can break your task into relatively isolated sub-tasks, which can be performed in parallel. This of course can be cascaded to several layers.
In a relatively similar situation, I personally first modelled my graph into a graph database (I used Neo4J, and would recommend it highly) and then ran my analytic and queries on it. You will be surprised as to how white board friendly this solution is, and even massively joined and connected queries will be executed near instantaneously especially at the scale of only a few million nodes. For example, you can do a filtered analysis, based on degrees of separation, followed by listing of commons.
I have a list of about 100 igraph objects with a typical object having about 700 vertices and 3500 edges.
I would like to identify groups of vertices within which ties are more likely. My plan is to then use a mixed model to predict how many within-group ties vertices have using vertex and group attributes.
Some people may want to respond to other aspects of my project, which would be great, but the thing I'm most interested in is information about functions in igraph for grouping vertices. I've come across these community detection algorithms but I'm not sure of their advantages and disadvantages, or whether some other function would be better for my case. I saw the links here as well, but they aren't specific to igraph. Thanks for your advice.
Here is a short summary about the community detection algorithms currently implemented in igraph:
edge.betweenness.community is a hierarchical decomposition process where edges are removed in the decreasing order of their edge betweenness scores (i.e. the number of shortest paths that pass through a given edge). This is motivated by the fact that edges connecting different groups are more likely to be contained in multiple shortest paths simply because in many cases they are the only option to go from one group to another. This method yields good results but is very slow because of the computational complexity of edge betweenness calculations and because the betweenness scores have to be re-calculated after every edge removal. Your graphs with ~700 vertices and ~3500 edges are around the upper size limit of graphs that are feasible to be analyzed with this approach. Another disadvantage is that edge.betweenness.community builds a full dendrogram and does not give you any guidance about where to cut the dendrogram to obtain the final groups, so you'll have to use some other measure to decide that (e.g., the modularity score of the partitions at each level of the dendrogram).
fastgreedy.community is another hierarchical approach, but it is bottom-up instead of top-down. It tries to optimize a quality function called modularity in a greedy manner. Initially, every vertex belongs to a separate community, and communities are merged iteratively such that each merge is locally optimal (i.e. yields the largest increase in the current value of modularity). The algorithm stops when it is not possible to increase the modularity any more, so it gives you a grouping as well as a dendrogram. The method is fast and it is the method that is usually tried as a first approximation because it has no parameters to tune. However, it is known to suffer from a resolution limit, i.e. communities below a given size threshold (depending on the number of nodes and edges if I remember correctly) will always be merged with neighboring communities.
walktrap.community is an approach based on random walks. The general idea is that if you perform random walks on the graph, then the walks are more likely to stay within the same community because there are only a few edges that lead outside a given community. Walktrap runs short random walks of 3-4-5 steps (depending on one of its parameters) and uses the results of these random walks to merge separate communities in a bottom-up manner like fastgreedy.community. Again, you can use the modularity score to select where to cut the dendrogram. It is a bit slower than the fast greedy approach but also a bit more accurate (according to the original publication).
spinglass.community is an approach from statistical physics, based on the so-called Potts model. In this model, each particle (i.e. vertex) can be in one of c spin states, and the interactions between the particles (i.e. the edges of the graph) specify which pairs of vertices would prefer to stay in the same spin state and which ones prefer to have different spin states. The model is then simulated for a given number of steps, and the spin states of the particles in the end define the communities. The consequences are as follows: 1) There will never be more than c communities in the end, although you can set c to as high as 200, which is likely to be enough for your purposes. 2) There may be less than c communities in the end as some of the spin states may become empty. 3) It is not guaranteed that nodes in completely remote (or disconencted) parts of the networks have different spin states. This is more likely to be a problem for disconnected graphs only, so I would not worry about that. The method is not particularly fast and not deterministic (because of the simulation itself), but has a tunable resolution parameter that determines the cluster sizes. A variant of the spinglass method can also take into account negative links (i.e. links whose endpoints prefer to be in different communities).
leading.eigenvector.community is a top-down hierarchical approach that optimizes the modularity function again. In each step, the graph is split into two parts in a way that the separation itself yields a significant increase in the modularity. The split is determined by evaluating the leading eigenvector of the so-called modularity matrix, and there is also a stopping condition which prevents tightly connected groups to be split further. Due to the eigenvector calculations involved, it might not work on degenerate graphs where the ARPACK eigenvector solver is unstable. On non-degenerate graphs, it is likely to yield a higher modularity score than the fast greedy method, although it is a bit slower.
label.propagation.community is a simple approach in which every node is assigned one of k labels. The method then proceeds iteratively and re-assigns labels to nodes in a way that each node takes the most frequent label of its neighbors in a synchronous manner. The method stops when the label of each node is one of the most frequent labels in its neighborhood. It is very fast but yields different results based on the initial configuration (which is decided randomly), therefore one should run the method a large number of times (say, 1000 times for a graph) and then build a consensus labeling, which could be tedious.
igraph 0.6 will also include the state-of-the-art Infomap community detection algorithm, which is based on information theoretic principles; it tries to build a grouping which provides the shortest description length for a random walk on the graph, where the description length is measured by the expected number of bits per vertex required to encode the path of a random walk.
Anyway, I would probably go with fastgreedy.community or walktrap.community as a first approximation and then evaluate other methods when it turns out that these two are not suitable for a particular problem for some reason.
A summary of the different community detection algorithms can be found here: http://www.r-bloggers.com/summary-of-community-detection-algorithms-in-igraph-0-6/
Notably, the InfoMAP algorithm is a recent newcomer that could be useful (it supports directed graphs too).