Break up graph into smallest sub-components of 2-nodes or greater - r

I wish to be able to separate my graph into subcomponents such that the removal of any single node would create no further subcomponents (excluding single nodes). As an example, see the two images below.
The first image shows the complete graph. The second image shows the subcomponents of the graph when it has been split into the smallest possible subcomponents. As can be seen from the second image, the vertex names have been maintained. I don't need the new structure to be a single graph; it can be a list of graphs, or even a list of the nodes in each component.
The component of nodes 4-5-6 remains as removing any of the three nodes will not create a new component as the node that was broken off will only be a single node.
At the moment I am trying to put together an iterative process, that removes nodes sequentially in ascending degree order and recurses into the resultant new components. However, it is difficult and I imagine someone else has done it better before.

You say you want the "smallest sub-components of 2 nodes or greater", and that your example has the "smallest possible subcomponents". But what you actually meant is the largest possible subcomponents such that the removal of any single node would create no further subcomponents, right? Otherwise you could just separate the graph into a collection of all of its 2-node subgraphs.
I believe, then, that your problem can be described as finding all "biconnected components" (aka maximal biconnected subgraphs of a graph): https://en.wikipedia.org/wiki/Biconnected_component
As you said in the comments, igraph has the function biconnected_components(g), which will solve your problem. :)
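If it helps to see what such a function computes, here is a minimal pure-Python sketch of the classic edge-stack depth-first search for biconnected components. It is not igraph's implementation; the function name, the edge-list input format and the example graph (two triangles joined by a bridge) are assumptions made for illustration, and the recursion is only suitable for small graphs.

```python
from collections import defaultdict

def biconnected_components(edges):
    """Return the vertex sets of the biconnected components of a simple undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    depth = {}       # DFS depth at which each vertex was discovered
    low = {}         # smallest depth reachable from the vertex's DFS subtree
    edge_stack = []  # edges visited since the last component was emitted
    components = []

    def dfs(u, parent, d):
        depth[u] = low[u] = d
        for v in adj[u]:
            if v == parent:
                continue
            if v not in depth:
                edge_stack.append((u, v))
                dfs(v, u, d + 1)
                low[u] = min(low[u], low[v])
                if low[v] >= depth[u]:
                    # Nothing below v reaches above u, so pop one whole component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif depth[v] < depth[u]:
                # Back edge to an ancestor of u.
                edge_stack.append((u, v))
                low[u] = min(low[u], depth[v])

    for start in list(adj):
        if start not in depth:
            dfs(start, None, 0)
    return components

# Hypothetical example: triangle a-b-c, bridge c-d, triangle d-e-f.
example = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
           ("d", "e"), ("e", "f"), ("f", "d")]
for comp in biconnected_components(example):
    print(sorted(comp))
```

Every component produced this way contains at least two nodes, because components are built from edges; isolated vertices never appear in the result.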

Related

Best data structure & packages to represent geometric units on a grid

I want to write a program with 'geometry automata'. I'd like it to be a companion to a book on artistic designs. There will be different units, like the 'four petal unit' and 'six petal unit' shown below, and users can choose rulesets to draw unique patterns onto the units:
I don't know what the best data structure to use for this project is. I also don't know if similar things have been done and if so, using what packages or languages. I'm willing to learn anything.
All I know right now is 2D arrays to represent a grid of units. I'm also having trouble mathematically partitioning the 'subunits'. I can see myself just overlapping a bunch of unit circle formulas and shrinking the x/y domains (cartesian system). I can also see myself representing the curve from one unit to another (radians).
Any help would be appreciated.
Thanks!!

I can't guarantee that this is the most efficient solution, but it is a solution, so it should get you started.
It seems that a graph (vertices with edges) is a natural way to encode this grid. Each node has 4 or 6 neighbours (the number of neighbours matches the number of petals). Each node has 8 or 12 edges, two for each neighbour.
Each vertex has an (x,y) co-ordinate. For example, the first node in the first row of the left image, starting from the left, is at location (1,0), and the next node to its right is at (3,0). The first node on the second row is at (0,1). This lets you make sure the nodes get plotted correctly, but otherwise the co-ordinate doesn't have much to do with it.
The trouble comes from having two different edges to each neighbour, each aligned with a different circle. You could identify them with the centres of their circles, or you could just call one "upper" and the other "lower".
This structure lets you follow edges easily, and can be stored sparsely if necessary in a hash set (keyed by co-ordinate), or linked list.
Data structure:
The vertices can naturally be stored as a 2-dimensional array (row, column), with the special characteristic that every second column has a horizontal offset.
Each vertex has a set of possible connections to those vertices to its right (upper-right, right, or lower right). The set of possible connections depends on the grid. Whether a connection should be displayed as a thin or a thick line can be represented as a single bit, so all possible connections for the vertex could be packed into a single byte (more compact than a boolean array). For your 4-petal variant, only 4 bits need storing; for the 6-petal variant you need to store 6 bits.
That means your data structure should be a 2-dimensional array of bytes.
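A rough sketch of that byte-per-vertex layout in Python, in case it makes the idea more concrete. The direction names, the two arcs per neighbour and the bit numbering (bit = 2 * direction + arc) are assumptions made for illustration, not something fixed by the description above.

```python
# Directions a vertex can connect to on its right (names are assumptions).
UPPER_RIGHT, RIGHT, LOWER_RIGHT = 0, 1, 2
# Each neighbour is reached by two arcs, one per circle (names are assumptions).
UPPER_ARC, LOWER_ARC = 0, 1

def bit_index(direction, arc):
    """Position of the thin/thick flag for one connection inside the vertex's byte."""
    return 2 * direction + arc

class ConnectionGrid:
    """A 2-dimensional array of bytes, one byte of connection flags per vertex."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.cells = [bytearray(cols) for _ in range(rows)]

    def set_thick(self, row, col, bit, thick=True):
        mask = 1 << bit
        if thick:
            self.cells[row][col] |= mask
        else:
            self.cells[row][col] &= ~mask & 0xFF

    def is_thick(self, row, col, bit):
        return bool(self.cells[row][col] & (1 << bit))

grid = ConnectionGrid(rows=4, cols=5)
grid.set_thick(1, 2, bit_index(RIGHT, LOWER_ARC))
print(grid.is_thick(1, 2, bit_index(RIGHT, LOWER_ARC)))  # True
print(grid.is_thick(1, 2, bit_index(RIGHT, UPPER_ARC)))  # False
```

With three directions and two arcs this uses 6 bits per vertex; a variant with two directions would use only 4.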
Package:
Anything you like that allows drawing and mouse/touch interaction. Drawing the connections is pretty straightforward; you could either draw arcs with SVG or you could even use a set of PNG sprites for different connection bit-patterns (the sprites having partial transparency so as not to obscure other connections).

Edge-connectivity: Does it mean to split a graph into two?

The minimum number of edges whose deletion from a graph G disconnects G.
Above is the definition of edge connectivity. Does it mean G will be split into two pieces only,
or can it be split into any number of pieces?
I just did not see that point. Which one is right?

Say the edge-connectivity is k. That means you need to remove at least k links to split the graph into separate components. Now remove only the first k-1 links of a minimum cut. At this point the graph is still connected, and removing the kth link will split it. But a link connects only two nodes, so even if each of those nodes ends up in a different component, the link joins at most two potential components. So removing the kth link always splits the graph into exactly 2 components. This is not true for node-connectivity, since a node can be attached to several links, i.e. to several other nodes, i.e. to more than two potential components.
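The claim can be checked by brute force on a small graph. The Python sketch below tries every set of k edges for increasing k, stops at the first k for which some set disconnects the graph, and reports how many components each such minimum cut leaves. The function names and the 4-cycle example are assumptions made for illustration, and the search is exponential, so it is only meant for tiny connected graphs.

```python
from itertools import combinations

def count_components(nodes, edges):
    """Count connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})

def minimum_edge_cuts(nodes, edges):
    """Return (k, component counts left by every disconnecting set of k edges)."""
    for k in range(1, len(edges) + 1):
        counts = []
        for removed in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            c = count_components(nodes, kept)
            if c > 1:
                counts.append(c)
        if counts:
            return k, counts
    return None

# Hypothetical example: a cycle on four nodes, whose edge-connectivity is 2.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
k, counts = minimum_edge_cuts(nodes, edges)
print(k, set(counts))
```

Every minimum cut found this way leaves exactly two components, whichever pair of edges is removed.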

How to find the longest path in a graph with a set of start and target points?

I have a DAG (with costs/weights per edge) and want to find the longest path between two sets of nodes. The two sets of start and target nodes are disjoint and small in size compared to the total number of nodes in the graph.
I know how to do this efficiently between one start and one target node. With multiple, I can list all paths from every start to every target node and pick the longest one, but that takes a quadratic number of single-path searches. Is there a better way?

I assume that you want the longest possible path that starts in any of the nodes from the first set and ends in any of the nodes in the second set. Then you can add two virtual nodes:
The first node has no predecessors and its successors are the nodes from the first set.
The second node has no successors and its predecessors are the nodes from the second set.
All the newly added edges should have zero weight.
The graph would still be a DAG. Now if you use the standard algorithm to find the longest path in the DAG between the two new nodes, you’ll get the longest path that starts in the first set and ends in the second set, except that there will be an extra zero-weighted edge at the beginning and an extra zero-weighted edge at the end.
By the way, this solution is essentially executing the algorithm from all the nodes from the first set, but in parallel as opposed to the sequential approach your question suggests.
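Here is a minimal Python sketch of the virtual-node idea, assuming the DAG is given as a list of (u, v, weight) triples. The function name, the input format and the example graph are assumptions made for illustration; the longest-path part is the usual relaxation in topological order.

```python
from collections import defaultdict

def longest_path_between_sets(edges, starts, targets):
    """Longest path in a DAG from any node in `starts` to any node in `targets`."""
    SOURCE, SINK = object(), object()  # virtual nodes that cannot clash with real names
    all_edges = list(edges)
    all_edges += [(SOURCE, s, 0) for s in starts]   # zero-weight edges into the start set
    all_edges += [(t, SINK, 0) for t in targets]    # zero-weight edges out of the target set

    graph = defaultdict(list)
    indegree = {SOURCE: 0, SINK: 0}
    for u, v, w in all_edges:
        graph[u].append((v, w))
        indegree.setdefault(u, 0)
        indegree[v] = indegree.get(v, 0) + 1

    # Topological order (Kahn's algorithm).
    order = []
    ready = [n for n, d in indegree.items() if d == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Relax edges in topological order, starting from the virtual source only.
    best, prev = {SOURCE: 0}, {}
    for u in order:
        if u not in best:
            continue  # not reachable from any start node
        for v, w in graph[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    if SINK not in best:
        return None
    # Walk back from the sink, dropping the two virtual nodes.
    path, node = [], prev[SINK]
    while node is not SOURCE:
        path.append(node)
        node = prev[node]
    path.reverse()
    return best[SINK], path

# Hypothetical example graph.
dag = [("a", "c", 2), ("b", "c", 5), ("c", "d", 1), ("c", "e", 4), ("a", "e", 3)]
print(longest_path_between_sets(dag, starts={"a", "b"}, targets={"d", "e"}))
```

This runs one longest-path computation over the whole graph, regardless of how many start and target nodes there are.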

How is this Huffman Table created?

I have a table that shows the probability of an event happening.
I'm fine with part 1, but part 2 is not clicking with me. I'm trying to get my head around how the binary numbers are derived in part 2.
I understand 0 is assigned to the largest probability and we work back from there, but how do we work out what the next set of binary numbers is? And what do the circles around the numbers mean, and what do the two shades of grey differentiate?
It's just not clicking. Maybe someone can explain it in a way that will make me understand?

To build Huffman codes, one approach is to build a binary tree using a priority queue, into which the data items to be assigned codes are inserted, ordered by frequency.
To start with, you have a queue containing only leaf nodes, one for each of your data items.
At each step you take the two nodes with the lowest frequency from the queue, make a new node with a frequency equal to the sum of the two removed nodes, and attach those two nodes as its left and right children. This new node is reinserted into the queue according to its frequency.
You repeat this until you only have one node in the queue, which will be the root.
Now you can traverse the tree from the root to any leaf node, and the path you take (whether you go left or right) at each level gives you either a 0 or a 1, and the length of the path (how far down the tree the node is) gives you the length of the code.
In practice you can build the codes as you build the tree, by prepending 0 or 1 to the code of each leaf in a sub-tree, according to whether that sub-tree is being attached to the left or the right of a new parent.
In your diagram, the numbers in the circles are indicating the sum of the frequency of the two nodes which have been combined at each stage of building the tree.
You should also see that the two being combined have been assigned different bits (one a 0, the other a 1).
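The procedure described above can be sketched in a few lines of Python using heapq as the priority queue. The function name and the example frequencies are assumptions made for illustration (they are not the values from your table), and which child gets 0 and which gets 1 is just a convention.

```python
import heapq
from itertools import count

def huffman_codes(frequencies):
    """Map each symbol to its Huffman code, building codes while merging nodes."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    # Each heap entry is (frequency, tiebreak, {symbol: code so far}).
    heap = [(freq, next(tiebreak), {sym: ""}) for sym, freq in frequencies.items()]
    heapq.heapify(heap)

    if len(heap) == 1:
        # A single symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}

    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)   # lowest frequency
        f1, _, right = heapq.heappop(heap)  # second lowest
        # The new parent adds one bit in front of every code beneath it.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical frequencies.
codes = huffman_codes({"a": 4, "b": 2, "c": 1, "d": 1})
for symbol in sorted(codes):
    print(symbol, codes[symbol])
```

The frequency stored with each merged entry is the sum of its two children, which is the number written in the circles of a diagram like yours.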
A diagram may help. Apologies for my hand-writing:

Find All Cycle Bases In a Graph, With the Vertex Coordinates Given

A similar question is posted here.
I have an undirected graph with vertices V and edges E. I am looking for an algorithm to identify all the cycle bases in that graph. An example of such a graph is shown below:
Now, all the vertex coordinates are known (unlike in the previous question, and contrary to the explanation in the above diagram), so it is possible to find the smallest cycles that encompass the whole graph.
In this graph, it is possible that there are edges that don't form any cycles.
What is the best algorithm to do this?
Here's another example that you can take a look at:
Assuming that e1 is the edge that gets picked first, and the arrow shows the direction of the edge.

I haven't tried this and it is rather greedy, but it should work:
1. Pick one node.
2. Go to one of its neighbours.
3. Keep on going until you get back to your starting node, but you're not allowed to visit an old node.
4. If you get a cycle, save it unless it already exists or a subset of its nodes already makes up a cycle. If the nodes of a saved cycle are a superset of the nodes of the new cycle, remove the larger cycle (or maybe split it in two?).
5. Start over at 2 with a new neighbour.
6. Start over at 1 with a new node.
Comment: at step 3 you should of course do the same as in step 2, i.e. try all possible paths.
Maybe that's a start? As I said, I haven't tried it so it is not optimized.
EDIT: An undocumented and unoptimized version of one implementation of the algorithm can be found here: https://gist.github.com/750015. However, it doesn't solve the problem completely, since it can only recognize "true" subsets.
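For reference, here is a small Python sketch of the greedy idea. It is not the code from the gist. It enumerates every simple cycle by depth-first search and then drops any cycle whose node set strictly contains the node set of another cycle. The function names, the adjacency-dict input and the example graph are assumptions made for illustration; it ignores the vertex coordinates, runs in exponential time, and has the same limitation of only recognizing strict subsets.

```python
def find_simple_cycles(adj):
    """Enumerate each simple cycle of an undirected graph once, as a tuple of nodes."""
    found = {}  # set of edges -> one node ordering of that cycle

    def walk(path, on_path):
        start, last = path[0], path[-1]
        for nxt in adj[last]:
            if nxt == start and len(path) >= 3:
                # Closed a cycle; identify it by its edges so both directions match.
                edge_set = frozenset(
                    frozenset(pair) for pair in zip(path, path[1:] + [start])
                )
                found.setdefault(edge_set, tuple(path))
            elif nxt > start and nxt not in on_path:
                # Only walk through nodes larger than the start, so each cycle
                # is discovered from its smallest node.
                walk(path + [nxt], on_path | {nxt})

    for node in adj:
        walk([node], {node})
    return list(found.values())

def drop_enclosing_cycles(cycles):
    """Remove cycles whose node set strictly contains another cycle's node set."""
    node_sets = [set(c) for c in cycles]
    return [
        c for c, s in zip(cycles, node_sets)
        if not any(other < s for other in node_sets)
    ]

# Hypothetical example: a square 0-1-2-3 with the diagonal 0-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
for cycle in drop_enclosing_cycles(find_simple_cycles(adj)):
    print(cycle)
```

On this example the outer square is discarded because its nodes contain those of either triangle, which leaves the two triangles.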
