I'm making a program that implements procedural content generation (PCG) to create maps in a 2d game.
I use the graph data structure as the basis. then the graph will be transformed into a map like in the example image I attached.
with graph specifications as follows:
-vertex can have more than 4 edges
-allowed the formation of cycles in the graph
any suggestions on what method I can use to transform the graph to a 2d map in a grid with space-tight results?
thanks
Uh, that is a tough one. The first problem you will encounter is whether this is even possible for the graph you use. See more below for that specific topic.
Let's say we ignore the fact that your graph could be impossible to map to a grid. I faced the same issue in my Master's Thesis a few years back. (PDF available here; 3.4 World Generation; page 25). I tried to find an algorithm, that could generate my world from a graph structure but ultimately failed. I tried placing one element after the other and implemented some backtracking in case it got stuck. But in the end you're facing a similar complexity to calculating chess moves. At some point you know you messed up, but you don't know how many steps you should go back/reverse, before trying the next one. If you try to solve this by brute force, you're not going to have a good time. And I did not come up with good heuristics to solve it in an adequate time.
My solution: I decided in the end to go with AnswerSet Programming. You're basically not solving the problem with an algorithm, but you find a (more or less) elegant logical representation of your problem and let a logic solver (program specifically made to find a valid solution to your logical problem-representation) do the work. Have a look in my thesis about the details, it was a few years ago and I didn't use one since. I remember however, that this process was not easy and it took me a few days to find a good logical representation of my problem.
Another question to ask: Could you work on the grid directly? Or maybe on a graph structure representing a grid? In the end a grid is nothing else than a graph; every cell is a node and neighbouring connections are the edges. I have quite some experience in the field and would be happy to help you, if you'd like to share what you want to achieve with your generator. I have also a vast collection of resources about procedural generation, maybe you find something helpful there, too.
More on the planarity of a graph: For your graph to be mappable to a plane, it needs to be planar, and checking so is also not trivial. The easiest way - if I'm not mistaken - is to prove the existence of a non-planar sub-graph, e.g. the K5 (the smallest non-planar a complete graph) or K3,3 (the smallest non-planar complete bipartite graph). And even if your graph is planar, it is not necessarily guaranteed that you can put it on your grid.
Related
The problem
I have a set of locations on a plane (actually they are pins in a KML file) and I want to partition this graph into subgraphs. Connectivity is pretty good - as with all real world road networks - so I assume that if two locations are close they have some kind of connection. The resulting set of subgraphs should adhere to these constraints:
Every node has to be covered by a subgraph
Every node should be in exactly 1 subgraph
Every node within a subgraph should be close to each other (L2 norm distances)
Every subgraph should contain at least 5 locations
The amount of subgraphs should be minimal
Right now the amount of locations is no more than 100 so I thought about brute forcing through every possibility but this obviously won't scale well.
I thought about using some k-Nearest-Neighbors algorithm (e.g. using QuickGraph) but I can't get my head around where to start and how to extend/shrink the subgraphs on the way. Maybe it's possible to map this problem to another problem that can easily be solved with some numerical procedure (e.g. Simplex) ...
Maybe someone has experience in this kind of optimization problems and is willing to help me find a solution? I don't have access to Mathematica/Matlab or the like ... but sufficient .NET programming skills and hmm Excel :-)
Thanks a lot!
As soon as there are multiple criteria that need to be appeased in the best possible way simultanously, it is usually starting to get difficult.
A numerical solution could work as follows: You could define yourself a utility function, that maps partitionings of your locations to positive real values, describing how "good" a partition is by assigning it a "rating" (good could be high "bad" could be near zero).
Once you have such a function assigning partitions their according "values", you simply need to optimize it and then you hopefully obtain a good solution if you defined your utility function reasonably. Evolutionary algorithms are good at that task since your utility function is probably analytically too complex to solve due to its discrete nature.
The problem is then only how you assign "values" to partitions via this utility function. This is then your task. It can be done for example by weighing each criterion with a factor and summing the results up, or even more complex functions (least squares etc.). The factors you use in the definition of the utility function are tuning parameters and can be varied until the result seems to be good.
Some CA software wold help a lot for testing if you can get your hands on one, bit I guess to obtain a black box solver for your partitioning problem, you need to implement the complete procedure yourself using a language of your choice.
This has to be a well researched problem, but I am struggling researching it.
I started here, but I am looking for algorithms to study and implement.
http://en.wikipedia.org/wiki/Graph_isomorphism_problem
For example, if I have two of these DAGs (Directe Acyclic Graphs), I want to mark/delete one of them because it is just a rotation/reflection of the first. Being in the same automorphism group means they can be rotated/reflected to have the exact same adjacency matrix right?
you can use nauty or saucy algorithm to compute this problem.
This links might be helpful to you. :)
Nauty:
http://cs.anu.edu.au/~bdm/nauty/
http://cs.anu.edu.au/~bdm/nauty/
Saucy:
http://vlsicad.eecs.umich.edu/BK/SAUCY/
There is also a list of ready made tools available (especially for command line in linux based OS), in the saucy page.
There seems to be a similar question here, but I was satisfied with neither the clarity of the answer, nor its practicality. I was asked in a recent interview what data structure I would store my large set of floating point numbers so that I look up a new arrival for itself or its closest neighbor. I said I would use a binary search tree, and try to make it balanced to achieve O(log n).
Then the question was extended to two dimensions: What data structure would I use to store a large set of (x,y) pairs such as geographical coordinates for fast look up? I couldn't think of a satisfactory answer and totally gave up when extended to K-dimensions. Using a k-dimensional tree directly that uses coordinates values to "split" the space doesn't seem to work since two close points near the origin but inside different quadrants may end up on far far away leaves.
After the interview, I remembered Voronoi diagrams for partitioning K-dimensional space nicely. What is the best way of implementing this using what data structure? How are the look ups going to be performed? I feel like this question is so common in computer science that by now it has even a dedicated data structure.
You can use a grid and sort the points into the grid-cells. There is a similar question here:Distance Calculation for massive number of devices/nodes. You can also use a space filling curve (quadkey) or a quadtree,e.g. r-tree when you need additional informations,e.g hierarchy.
http://msl.cs.uiuc.edu/rrt/
Can anyone explain how rrt works with simple wording that is easy to understand?
I read the description in the site and in wikipedia.
What I would like to see, is a short implementation of a rrt or a thorough explanation of the following thing:
Why does the rrt grow outwards instead of just growing very dense around the center?
How is it different from a naive random tree?
How is the next new vertex that we attempt to reach picked?
I know there is an Motion Strategy Library I could download but I would much rather understand the idea before I delve into the code rather than the other way around.
The simplest possible RRT algorithm has been so successful because it is pretty easy to implement. Things tend to get complicated when you:
need to visualise planning concepts in more than two dimensions
are unfamiliar with the terminology associated with planning, and;
in the huge number of variants of RRT that are have been described in the literature.
Pseudo code
The basic algorithm looks something like this:
Start with an empty search tree
Add your initial location (configuration) to the search tree
while your search tree has not reached the goal (and you haven't run out of time)
3.1. Pick a location (configuration), q_r, (with some sampling strategy)
3.2. Find the vertex in the search tree closest to that random point, q_n
3.3. Try to add an edge (path) in the tree between q_n and q_r, if you can link them without a collision occurring.
Although that description is adequate, after a while working in this space, I really do prefer the pseudocode of figure 5.16 on RRT/RDT in Steven LaValle's book "Planning Algorithms".
Tree Structure
The reason that the tree ends up covering the entire search space (in most cases) is because of the combination of the sampling strategy, and always looking to connect from the nearest point in the tree. This effect is described as reducing the Voronoi bias.
Sampling Strategy
The choice of where to place the next vertex that you will attempt to connect to is the sampling problem. In simple cases, where search is low dimensional, uniform random placement (or uniform random placement biased toward the goal) works adequately. In high dimensional problems, or when motions are very complex (when joints have positions, velocities and accelerations), or configuration is difficult to control, sampling strategies for RRTs are still an open research area.
Libraries
The MSL library is a good starting point if you're really stuck on implementation, but it hasn't been actively maintained since 2003. A more up-to-date library is the Open Motion Planning Library (OMPL). You'll also need a good collision detection library.
Planning Terminology & Advice
From a terminology point of view, the hard bit is to realise that although lots of the diagrams you see in the (early years of) publications on RRT are in two dimensions (trees that link 2d points), that this is the absolute simplest case.
Typically, a mathematically rigorous way to describe complex physical situations is required. A good example of this is planning for a robot arm with n- linkages. Describing the end of such an arm requires a minimum of n joint angles. This set of minimum parameters to describe a position is a configuration (or some publications state). A single configuration is often denoted q
The combination of all possible configurations (or a subset thereof) that can be achieved make up a configuration space (or state space). This can be as simple as an unbounded 2d plane for a point in the plane, or incredibly complex combinations of ranges of other parameters.
I want to carry out Graph Clustering in a huge undirected graph with millions of edges and nodes. Graph is almost clustered with different clusters joined together only by some nodes(kind of ambiguous nodes which can relate to multiple clusters). There will be very few or almost no edges between two clusters. This problem is almost similar to finding vertex cut set of a graph, with one exception that graph needs to be partitioned into many components(their number being unknown).(Refer this picture https://docs.google.com/file/d/0B7_3zLD0XdtAd3ZwMFAwWDZuU00/edit?pli=1)
Its almost like different strongly connected components sharing a couple of nodes between them and i am supposed to remove those nodes to separate those strongly connected components. Edges are weigthed but this problem is more like finding structures in a graph, so edge weigths won't be of relevance. (Another way to think about the problem would be to visualize Solid Spheres touching each other at some points with Spheres being those strongly connected components and touching points being those ambiguous nodes)
I am prototyping something, so am quiet short of time to pick up Graph Clustering Algorithms by myself and to select the best possible. Plus i need a solution that would cut nodes and not edges since different clusters share nodes and not edges in my case.
Is there any research paper, blog that addresses this or somewhat related problem? Or can anyone come up with a solution to this problem howsoever dirty.
Since millions of nodes and edges are involved, i would need a MapReduce implementation of the solution. Any inputs, links for that too?
Is there any current open source implementation in MapReduce that can i directly use?
I think this problem is analogous to Finding Communities in online social networks by removing vertices.
Your problem is not so simple. I am afraid that it is related to the clique problem, which is NP complete, so unless you quantify somehow the statement "there are almost no edges between the clusters", your problem might be still very difficult. But what I would do in your shoes, would be to try one dirty, greedy approach, namely regarding the nodes as the following kind of quasi-neural net:
Each vertex I would consider to have inputs, outputs and a sigmoid activation function which convert the input value (sum of inputs) into the output value. The output value, and I consider this important, would not be cloned and sent to all the neighbors, but rather divided evenly between the neighbors. In addition to this, I would define a logarithmic decay of activity in a neuron (self-suppression, suppressive connection to itself), defined by a decay parameter global for the net.
Now, I would start simulation with all the neurons starting from activity 0.5 (activity range is 0 to 1) with very high decay parameter, which would lead to all the neuronst quickly stabilizing in 0 state. I would then gradually decrease the decay parameter until the steady state result would yield the first clique with non-zero stable activity.
The question is what to do next. One possibility is to subtract the found clique from the graph and run the same process again until we find all the cliques. This greedy approach might succeed if your graph is indeed as well behaved (really almost clustered) as you say, but might lead to unexpected results otherwise. Another possibility is to give the found clique a unique clique smell that would be repulsive (mutual suppresion) to other cliques an rerun the algorithm until the second clique is found, give it a different clique smell repulsive to all others etc., until each node has its own assigned smell.
I think this would be as many big ideas as i have about this.
The key is, that since it is probably not possible to solve this problem in the general case (likely NP complete), you need to take use of whatever special properties your graph has. That means you need to play with parameters for a while until the algorithm solves 99% of the cases that you encounter. I don't think that it is possible to give the numerically precise answer to your question without long experimentation with the actual datasets that you encounter.
Since millions of nodes and edges are involved, i would need a MapReduce implementation of the solution. Any inputs, links for that too?
In my experience I doubt if using Map/Reduce over here would be truly advantageous. First 10^6 order of nodes isn't really that large [that too in a non hyper-connected graph, since you are considering clustering], and the over head of using Map/Reduce [unless you already have setup your hardware/software for it] for your problem will not be worth it.
Map/Reduce will work much better, where once you have solved the clustering issue, and then want to process each cluster with similar analysis. Basically when you can break your task into relatively isolated sub-tasks, which can be performed in parallel. This of course can be cascaded to several layers.
In a relatively similar situation, I personally first modelled my graph into a graph database (I used Neo4J, and would recommend it highly) and then ran my analytic and queries on it. You will be surprised as to how white board friendly this solution is, and even massively joined and connected queries will be executed near instantaneously especially at the scale of only a few million nodes. For example, you can do a filtered analysis, based on degrees of separation, followed by listing of commons.