Best data structure to store coordinates for efficient look-up? - voronoi

There seems to be a similar question here, but I was satisfied with neither the clarity of the answer, nor its practicality. I was asked in a recent interview what data structure I would store my large set of floating point numbers so that I look up a new arrival for itself or its closest neighbor. I said I would use a binary search tree, and try to make it balanced to achieve O(log n).
Then the question was extended to two dimensions: What data structure would I use to store a large set of (x,y) pairs such as geographical coordinates for fast look up? I couldn't think of a satisfactory answer and totally gave up when extended to K-dimensions. Using a k-dimensional tree directly that uses coordinates values to "split" the space doesn't seem to work since two close points near the origin but inside different quadrants may end up on far far away leaves.
After the interview, I remembered Voronoi diagrams for partitioning K-dimensional space nicely. What is the best way of implementing this using what data structure? How are the look ups going to be performed? I feel like this question is so common in computer science that by now it has even a dedicated data structure.

You can use a grid and sort the points into the grid-cells. There is a similar question here:Distance Calculation for massive number of devices/nodes. You can also use a space filling curve (quadkey) or a quadtree,e.g. r-tree when you need additional informations,e.g hierarchy.

Related

How to Transform Graph into Grid Map?

I'm making a program that implements procedural content generation (PCG) to create maps in a 2d game.
I use the graph data structure as the basis. then the graph will be transformed into a map like in the example image I attached.
with graph specifications as follows:
-vertex can have more than 4 edges
-allowed the formation of cycles in the graph
any suggestions on what method I can use to transform the graph to a 2d map in a grid with space-tight results?
thanks
Uh, that is a tough one. The first problem you will encounter is whether this is even possible for the graph you use. See more below for that specific topic.
Let's say we ignore the fact that your graph could be impossible to map to a grid. I faced the same issue in my Master's Thesis a few years back. (PDF available here; 3.4 World Generation; page 25). I tried to find an algorithm, that could generate my world from a graph structure but ultimately failed. I tried placing one element after the other and implemented some backtracking in case it got stuck. But in the end you're facing a similar complexity to calculating chess moves. At some point you know you messed up, but you don't know how many steps you should go back/reverse, before trying the next one. If you try to solve this by brute force, you're not going to have a good time. And I did not come up with good heuristics to solve it in an adequate time.
My solution: I decided in the end to go with AnswerSet Programming. You're basically not solving the problem with an algorithm, but you find a (more or less) elegant logical representation of your problem and let a logic solver (program specifically made to find a valid solution to your logical problem-representation) do the work. Have a look in my thesis about the details, it was a few years ago and I didn't use one since. I remember however, that this process was not easy and it took me a few days to find a good logical representation of my problem.
Another question to ask: Could you work on the grid directly? Or maybe on a graph structure representing a grid? In the end a grid is nothing else than a graph; every cell is a node and neighbouring connections are the edges. I have quite some experience in the field and would be happy to help you, if you'd like to share what you want to achieve with your generator. I have also a vast collection of resources about procedural generation, maybe you find something helpful there, too.
More on the planarity of a graph: For your graph to be mappable to a plane, it needs to be planar, and checking so is also not trivial. The easiest way - if I'm not mistaken - is to prove the existence of a non-planar sub-graph, e.g. the K5 (the smallest non-planar a complete graph) or K3,3 (the smallest non-planar complete bipartite graph). And even if your graph is planar, it is not necessarily guaranteed that you can put it on your grid.

Graph partitioning optimization

The problem
I have a set of locations on a plane (actually they are pins in a KML file) and I want to partition this graph into subgraphs. Connectivity is pretty good - as with all real world road networks - so I assume that if two locations are close they have some kind of connection. The resulting set of subgraphs should adhere to these constraints:
Every node has to be covered by a subgraph
Every node should be in exactly 1 subgraph
Every node within a subgraph should be close to each other (L2 norm distances)
Every subgraph should contain at least 5 locations
The amount of subgraphs should be minimal
Right now the amount of locations is no more than 100 so I thought about brute forcing through every possibility but this obviously won't scale well.
I thought about using some k-Nearest-Neighbors algorithm (e.g. using QuickGraph) but I can't get my head around where to start and how to extend/shrink the subgraphs on the way. Maybe it's possible to map this problem to another problem that can easily be solved with some numerical procedure (e.g. Simplex) ...
Maybe someone has experience in this kind of optimization problems and is willing to help me find a solution? I don't have access to Mathematica/Matlab or the like ... but sufficient .NET programming skills and hmm Excel :-)
Thanks a lot!
As soon as there are multiple criteria that need to be appeased in the best possible way simultanously, it is usually starting to get difficult.
A numerical solution could work as follows: You could define yourself a utility function, that maps partitionings of your locations to positive real values, describing how "good" a partition is by assigning it a "rating" (good could be high "bad" could be near zero).
Once you have such a function assigning partitions their according "values", you simply need to optimize it and then you hopefully obtain a good solution if you defined your utility function reasonably. Evolutionary algorithms are good at that task since your utility function is probably analytically too complex to solve due to its discrete nature.
The problem is then only how you assign "values" to partitions via this utility function. This is then your task. It can be done for example by weighing each criterion with a factor and summing the results up, or even more complex functions (least squares etc.). The factors you use in the definition of the utility function are tuning parameters and can be varied until the result seems to be good.
Some CA software wold help a lot for testing if you can get your hands on one, bit I guess to obtain a black box solver for your partitioning problem, you need to implement the complete procedure yourself using a language of your choice.

Scotch/PT-Scotch graph vertex reordering for minimizing adjacency matrix bandwidth

Although the Scotch documentation is pretty clear it lacks examples of using the API. Even using Google for finding other third party documents, examples or tutorials is a dead end.
My problem is the following:
I want to reorder the vertices of a graph with the goal to reduce the adjacency matrix bandwidth using Scotch. Now Scotch has the GPS (Gibbs-Poole-Stockmeyer) algorithm implemented which is one algorithm that can do this type of reordering. But the documentation says:
This method is mainly used on separators to reduce the number and
extent of extra-diagonal blocks.
I used the strategy string "g" to select the GPS algorithm, tried it with different pass values, but the result doesn't come. All I get is a matrix with bandwidth larger than the original.
My question is:
How can I tell SCOTCH_graphOrder() to do reordering (to reduce bandwidth) on the whole graph?
If you could at least recommend any resources where I might find the answer, I'd be thankful.

rapid exploring random trees

http://msl.cs.uiuc.edu/rrt/
Can anyone explain how rrt works with simple wording that is easy to understand?
I read the description in the site and in wikipedia.
What I would like to see, is a short implementation of a rrt or a thorough explanation of the following thing:
Why does the rrt grow outwards instead of just growing very dense around the center?
How is it different from a naive random tree?
How is the next new vertex that we attempt to reach picked?
I know there is an Motion Strategy Library I could download but I would much rather understand the idea before I delve into the code rather than the other way around.
The simplest possible RRT algorithm has been so successful because it is pretty easy to implement. Things tend to get complicated when you:
need to visualise planning concepts in more than two dimensions
are unfamiliar with the terminology associated with planning, and;
in the huge number of variants of RRT that are have been described in the literature.
Pseudo code
The basic algorithm looks something like this:
Start with an empty search tree
Add your initial location (configuration) to the search tree
while your search tree has not reached the goal (and you haven't run out of time)
3.1. Pick a location (configuration), q_r, (with some sampling strategy)
3.2. Find the vertex in the search tree closest to that random point, q_n
3.3. Try to add an edge (path) in the tree between q_n and q_r, if you can link them without a collision occurring.
Although that description is adequate, after a while working in this space, I really do prefer the pseudocode of figure 5.16 on RRT/RDT in Steven LaValle's book "Planning Algorithms".
Tree Structure
The reason that the tree ends up covering the entire search space (in most cases) is because of the combination of the sampling strategy, and always looking to connect from the nearest point in the tree. This effect is described as reducing the Voronoi bias.
Sampling Strategy
The choice of where to place the next vertex that you will attempt to connect to is the sampling problem. In simple cases, where search is low dimensional, uniform random placement (or uniform random placement biased toward the goal) works adequately. In high dimensional problems, or when motions are very complex (when joints have positions, velocities and accelerations), or configuration is difficult to control, sampling strategies for RRTs are still an open research area.
Libraries
The MSL library is a good starting point if you're really stuck on implementation, but it hasn't been actively maintained since 2003. A more up-to-date library is the Open Motion Planning Library (OMPL). You'll also need a good collision detection library.
Planning Terminology & Advice
From a terminology point of view, the hard bit is to realise that although lots of the diagrams you see in the (early years of) publications on RRT are in two dimensions (trees that link 2d points), that this is the absolute simplest case.
Typically, a mathematically rigorous way to describe complex physical situations is required. A good example of this is planning for a robot arm with n- linkages. Describing the end of such an arm requires a minimum of n joint angles. This set of minimum parameters to describe a position is a configuration (or some publications state). A single configuration is often denoted q
The combination of all possible configurations (or a subset thereof) that can be achieved make up a configuration space (or state space). This can be as simple as an unbounded 2d plane for a point in the plane, or incredibly complex combinations of ranges of other parameters.

applying crossover and mutation to a graph (genetic algorithm)

I'm playing arround with a Genetic Algorithm in which I want to evolve graphs.
Do you know a way to apply crossover and mutation when the chromosomes are graphs?
Or am I missing a coding for the graphs that let me apply "regular" crossover and mutation over bit strings?
thanks a lot!
Any help, even if it is not directly related to my problem, is appreciated!
Manuel
I like Sandor's suggestion of using Ken Stanley's NEAT algorithm.
NEAT was designed to evolve neural networks with arbitrary topologies, but those are just basically directed graphs. There were many ways to evolve neural networks before NEAT, but one of NEAT's most important contributions was that it provided a way to perform meaningful crossover between two networks that have different toplogies.
To accomplish this, NEAT uses historical markings attached to each gene to "line up" the genes of two genomes during crossover (a process biologists call synapsis). For example:
(source: natekohl.net)
(In this example, each gene is a box and represents a connection between two nodes. The number at the top of each gene is the historical marking for that gene.)
In summary: Lining up genes based on historical markings is a principled way to perform crossover between two networks without expensive topological analysis.
You might as well try Genetic Programming. A graph would be the closest thing to a tree and GP uses trees... if you still want to use GAs instead of GPs then take a look at how crossover is performed on a GP and that might give you an idea how to perform it on the graphs of your GA:
(source: geneticprogramming.com)
Here is how crossover for trees (and graphs) works:
You select 2 specimens for mating.
You pick a random node from one parent and swap it with a random node in the other parent.
The resulting trees are the offspring.
As others have mentioned, one common way to cross graphs (or trees) in a GA is to swap subgraphs (subtrees). For mutation, just randomly change some of the nodes (w/ small probability).
Alternatively, if you are representing a graph as an adjacency matrix, then you might swap/mutate elements in the matrices (kind of like using a two-dimensional bit string).
I'm not sure if using a bitstring is the best idea, I'd rather represent at least the weights with real values. Nevertheless bitstrings may also work.
If you have a fixed topology, then both crossover and mutation are quite easy (assuming you only evolve the weights of the network):
Crossover: take some weights from one parent, the rest from the other, can be very easily done if you represent the weights as an array or list. For more details or alternatives see http://en.wikipedia.org/wiki/Crossover_%28genetic_algorithm%29.
Mutation: simply select some of the weights and adjust them slightly.
Evolving some other stuff (e.g. activation function) is pretty similar to these.
If you also want to evolve the topology then things become much more interesting. There are quite some additional mutation possibilities, like adding a node (most likely connected to two already existing nodes), splitting a connection (instead of A->B have A->C->B), adding a connection, or the opposites of these.
But crossover will not be too easy (at least if the number of nodes is not fixed), because you will probably want to find "matching" nodes (where matching can be anything, but likely be related to a similar "role", or a similar place in the network). If you also want to do it I'd highly recommend studying already existing techniques. One that I know and like is called NEAT. You can find some info about it at
http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies
http://nn.cs.utexas.edu/?neat
and http://www.cs.ucf.edu/~kstanley/neat.html
Well, I have never played with such an implementation, but eventually for crossover you could pick a branch of one of the graphs and swap it with a branch from another graph.
For mutation you could randomly change a node inside the graph, with small probability.

Resources