Mixing ordered and unordered trees - math

I understand that there are ordered trees, in which the children of every node have some kind of order, and unordered trees, in which the children of a node have no order at all. But I find that these two structures are not complementary. So is there a name for trees in which only the children of some nodes have an order, while the children of other nodes do not? I would have guessed 'partially ordered trees', but unfortunately that's already taken by a different topic, namely heap trees.
In particular, I am interested in trees built from mathematical expressions, e.g. "2+x^5". When transformed into a tree, I want this to be equal to the tree of "x^5 + 2", but not equal to the tree of "2+5^x".
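For what it's worth, here is a minimal sketch in Python of how such a tree could be modelled (the Node class and its ordered flag are my own illustration, not standard terminology): commutative operators mark their children as unordered and compare them as multisets, while non-commutative operators compare them as sequences.

class Node:
    """Expression-tree node; `ordered` says whether child order matters."""
    def __init__(self, label, children=(), ordered=True):
        self.label = label
        self.children = list(children)
        self.ordered = ordered

    def key(self):
        child_keys = [c.key() for c in self.children]
        if not self.ordered:
            child_keys.sort()              # canonicalize unordered children
        return (self.label, tuple(child_keys))

    def __eq__(self, other):
        return self.key() == other.key()

# "2+x^5" equals "x^5+2" but differs from "2+5^x"
two, five, x = Node("2"), Node("5"), Node("x")
e1 = Node("+", [two, Node("^", [x, five])], ordered=False)
e2 = Node("+", [Node("^", [x, five]), two], ordered=False)
e3 = Node("+", [two, Node("^", [five, x])], ordered=False)
assert e1 == e2 and e1 != e3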

Related

decision tree for significant variables

How can I use a decision tree graph to determine the significant variables? I know the variable with the largest information gain (i.e. the smallest entropy) should be at the root of the tree. This is my graph; if I want to know which variables are significant, how can I interpret it?
What does significant mean to you? At each node, the variable selected is the most significant given the context, assuming that selecting by information gain actually works (it's not always the case). For example, at node 11, BB is the most significant discriminator given AA>20.
Clearly, AA and BB are the most useful, assuming that selecting by information gain gives the best way to partition the data. The rest give further refinement; C and N would be next.
What you should be asking is: Should I keep all the nodes?
The answer depends on many things and there is likely no best answer.
One way would be to use the total case count of each leaf and merge leaves with small counts.
Not sure how I would do this given your image. It's not really clear what is being shown at the leaves and what 'n' is. Also not sure what 'p' is.

Finding largest family in graph

This is homework for a data structures course. I'm not asking for code, but I'm having a hard time coming up with an effective algorithm for this :l
I have information about different family trees. Among those, I have to find the largest family and return the name of its greatest elder and the number of his descendants. The descendants may have kids with each other (a brother and a sister may have a kid), and this has to be done in at most O(n^2) time.
What would be the most effective way to solve this? I imagine doing a breadth-first search on the graphs, but that means I would have to keep children counters for many levels upwards (if I am traversing a grand^99 child, for example).
CMIIW, but my assumption is that every family tree is separate from the others and the root is the eldest ancestor. If that's the case, since you're counting all tree nodes regardless, I think any unweighted graph traversal algorithm would give the same result; BFS would do the job. I don't get what you mean by "keeping up children counters for many levels upwards" though, just 1 counter is fine, right?
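In case it helps, a hedged sketch of that idea in Python (the `children` mapping and `elders` list are assumed inputs, not part of the assignment statement): because two descendants can have a child together, the structure is really a DAG, so each BFS tracks a visited set and counts distinct descendants. One BFS per elder is O(n) each, O(n^2) overall.

from collections import deque

def descendant_count(children, elder):
    seen = {elder}
    queue = deque([elder])
    while queue:
        person = queue.popleft()
        for child in children.get(person, ()):
            if child not in seen:          # a child reachable via two parents
                seen.add(child)            # is only counted once
                queue.append(child)
    return len(seen) - 1                   # exclude the elder themselves

def largest_family(children, elders):
    best = max(elders, key=lambda e: descendant_count(children, e))
    return best, descendant_count(children, best)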

Rejecting isomorphisms from collection of graphs

I have a collection of 15M (million) DAGs (directed acyclic graphs - directed hypercubes, actually) from which I would like to remove isomorphisms. What is the common algorithm for this? Each graph is fairly small: a hypercube of dimension N, where N is 3 to 6 (for now), resulting in graphs of 64 nodes each in the N=6 case.
Using networkx and Python, I implemented it as below, which works just fine for small sets like 300k (thousand) (it runs in a few days' time).
import networkx as nx

def isIsomorphicDuplicate(hcL, hc):
    """Checks if hc is an isomorphism of any of the hc's in hcL.
    Returns True if hcL contains an isomorphism of hc.
    Returns False if it is not found."""
    # For each cube in hcL, run progressively more expensive checks:
    # cheap invariant tests first, the full isomorphism test last.
    for saved_hc in hcL:
        if nx.faster_could_be_isomorphic(saved_hc, hc):
            if nx.fast_could_be_isomorphic(saved_hc, hc):
                if nx.is_isomorphic(saved_hc, hc):
                    return True
    # all comparisons have been made; hc is not an isomorphic duplicate
    return False
One better way to do it would be to convert each graph to its canonical ordering, sort the collection, then remove the duplicates. This bypasses checking each of the 15M graphs in a pairwise is_isomorphic() test. I believe the above implementation is something like O(N!·N) (not taking the isomorphism-test time into account), whereas a clean convert-all-to-canonical-ordering-and-sort should take O(N) for the conversion + O(N·log(N)) for the sort + O(N) for the removal of duplicates. O(N!·N) >> O(N·log(N))
I found this paper on Canonical graph labeling, but it is very tersely described with mathematical equations, no pseudocode: "McKay's Canonical Graph Labeling Algorithm" - http://www.math.unl.edu/~aradcliffe1/Papers/Canonical.pdf
tl;dr: I have an impossibly large number of graphs to check via pairwise isomorphism testing. I believe the common way this is done is via canonical ordering. Do any packaged algorithms or published, straightforward-to-implement algorithms (i.e. with pseudocode) exist?
Here is a breakdown of McKay's Canonical Graph Labeling Algorithm, as presented in the paper by Hartke and Radcliffe [link to paper].
I should start by pointing out that an open source implementation is available here: nauty and Traces source code.
Ok, let's do this! Unfortunately this algorithm is heavy in graph theory, so we need some terms. First I will start by defining isomorphic and automorphic.
Isomorphism:
Two graphs are isomorphic if they are the same, except that the vertices are labelled differently. The following two graphs are isomorphic.
Automorphic:
Two graphs are automorphic if they are completely the same, including the vertex labeling. The following two graphs are automorphic. This seems trivial, but turns out to be important for technical reasons.
Graph Hashing:
The core idea of this whole thing is to have a way to hash a graph into a string, then for a given graph you compute the hash strings for all graphs which are isomorphic to it. The isomorphic hash string which is alphabetically (technically lexicographically) largest is called the "Canonical Hash", and the graph which produced it is called the "Canonical Isomorph", or "Canonical Labelling".
With this, to check if any two graphs are isomorphic you just need to check if their canonical isomorphs (or canonical labellings) are equal (i.e. are automorphs of each other). Wow, jargon! Unfortunately this is even more confusing without the jargon :-(
The hash function we are going to use is called i(G) for a graph G: build a binary string by looking at every pair of vertices in G (in order of vertex label) and put a "1" if there is an edge between those two vertices, a "0" if not. This way the j-th bit in i(G) represents the presence or absence of that edge in the graph.
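For concreteness, here is how i(G) might look for an undirected networkx graph whose vertices are labelled 0..n-1 (my own sketch, not code from the paper):

def i_hash(G):
    """Binary string with one bit per vertex pair, in vertex-label order."""
    n = G.number_of_nodes()
    return "".join("1" if G.has_edge(u, v) else "0"
                   for u in range(n) for v in range(u + 1, n))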
McKay's Canonical Graph Labeling Algorithm
The problem is that for a graph on n vertices, there are O(n!) possible isomorph hash strings based on how you label the vertices, and many many more if we have to compute the same string multiple times (i.e. automorphs). In general we have to compute every isomorph hash string in order to find the biggest one; there's no magic shortcut. McKay's algorithm is a search algorithm that finds this canonical isomorph faster by pruning all the automorphs out of the search tree, forcing the vertices in the canonical isomorph to be labelled in increasing degree order, and a few other tricks that reduce the number of isomorphs we have to hash.
(1) Sect. 4: The first step of McKay's is to sort vertices according to degree, which prunes out the majority of isomorphs to search, but is not guaranteed to produce a unique ordering since there may be more than one vertex of a given degree. For example, the following graph has 6 vertices: verts {1,2,3} have degree 1, verts {4,5} have degree 2 and vert {6} has degree 3. Its partial ordering according to vertex degree is {1,2,3|4,5|6}.
(2) Sect. 5: Impose artificial symmetry on the vertices which were not distinguished by vertex degree; basically we take one of the groups of vertices with the same degree, and in turn pick one at a time to come first in the total ordering (fig. 2 in the paper). So in our example above, the node {1,2,3|4,5|6} would have children { {1|2,3|4,5|6}, {2|1,3|4,5|6}, {3|1,2|4,5|6} } by expanding the group {1,2,3}, and also children { {1,2,3|4|5|6}, {1,2,3|5|4|6} } by expanding the group {4,5}. This splitting can be done all the way down to the leaf nodes, which are total orderings like {1|2|3|4|5|6} that describe a full isomorph of G. This allows us to take the partial ordering by vertex degree from (1), {1,2,3|4,5|6}, and build a tree listing all candidates for the canonical isomorph -- which is already WAY fewer than n! combinations since, for example, vertex 6 will never come first. Note that McKay evaluates the children in a depth-first way, starting with the smallest group first; this leads to a deeper but narrower tree, which is better for online pruning in the next step. Also note that each total-ordering leaf node may appear in more than one subtree; that's where the pruning comes in!
(3) Sect. 6: While searching the tree, look for automorphisms and use them to prune the tree. The math here is a bit above me, but I think the idea is that if you discover that two nodes in the tree are automorphisms of each other, then you can safely prune one of their subtrees because you know that they will both yield the same leaf nodes.
I have only given a high-level description of McKay's, the paper goes into a lot more depth in the math, and building an implementation will require an understanding of this math. Hopefully I've given you enough context to either go back and re-read the paper, or read the source code of the implementation.
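To make the goal concrete (and only the goal -- McKay's whole contribution is avoiding this blowup), the canonical hash is the maximum of i(G) over all n! relabellings. A toy-sized brute-force sketch, reusing i_hash from the sketch above:

from itertools import permutations
import networkx as nx

def canonical_hash_bruteforce(G):
    """Lexicographically largest i(G) over all relabellings; O(n!), toy graphs only."""
    nodes = list(G.nodes())
    best = ""
    for perm in permutations(range(len(nodes))):
        H = nx.relabel_nodes(G, dict(zip(nodes, perm)))
        best = max(best, i_hash(H))
    return best

Two graphs are then isomorphic exactly when their brute-force canonical hashes are equal.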
This is indeed an interesting problem.
I would approach it from the adjacency matrix angle. Two isomorphic graphs will have adjacency matrices whose rows/columns are in a different order. So my idea is to compute for each graph several matrix properties which are invariant to row/column swaps; off the top of my head:
numVerts, min, max, sum/mean, trace (probably not useful if there are no reflexive edges), norm, rank, min/max/mean column/row sums, min/max/mean column/row norm
and any pair of isomorphic graphs will be the same on all properties.
You could make a hash function which takes in a graph and spits out a hash string like
hashstr = str(numVerts) + str(min) + str(max) + str(sum) + ...
then sort all graphs by hash string and you only need to do full isomorphism checks for graphs which hash the same.
Given that you have 15 million graphs on 36 nodes, I'm assuming that you're dealing with weighted graphs; for unweighted undirected graphs this technique will be far less effective.
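A sketch of that hash with numpy (which invariants pay for their compute time is an open choice; these are a few from the list above, and the float-valued ones are rounded to dodge order-of-summation round-off):

import numpy as np
import networkx as nx

def invariant_hash(G):
    A = nx.to_numpy_array(G)                       # adjacency matrix
    return (A.shape[0],                            # numVerts
            round(float(A.sum()), 9),              # sum of all entries
            round(float(np.linalg.norm(A)), 9),    # Frobenius norm
            int(np.linalg.matrix_rank(A)),         # rank
            tuple(np.round(np.sort(A.sum(axis=1)), 9)))  # sorted row sums

Sorting the collection by this tuple then leaves full isomorphism checks only for graphs that collide.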
This is an interesting question which I do not have an answer for! Here is my two cents:
By 15M do you mean 15 MILLION undirected graphs? How big is each one? Any properties known about them (trees, planar, k-trees)?
Have you tried minimizing the number of checks by detecting false positives in advance? Examples include computing and comparing numbers such as vertex and edge counts, degrees, and degree sequences, in addition to other heuristics for testing whether two given graphs are NOT isomorphic. Also, check out nauty. It may be your way to check them (and generate a canonical ordering).
If all your graphs are hypercubes (like you said), then this is trivial: all hypercubes of the same dimension are isomorphic, and hypercubes of different dimensions aren't. So run through your collection in linear time and throw each graph into a bucket according to its number of nodes (for hypercubes: different dimension <=> different number of nodes) and be done with it.
Since you mentioned that smaller sets of ~300k graphs can be checked for isomorphy in reasonable time, I would try to split the 15M graphs into groups of ~300k graphs and run the test for isomorphy on each group.
say: each graph G_i := (V_i, E_i) (vertices, edges)
(1) create buckets of graphs such that the n-th bucket contains only graphs with |V|=n
(2) for each bucket created in (1) create subbuckets such that the (n,m)-th subbucket contains only graphs such that |V|=n and |E|=m
(3) if the groups are still too large, sort the nodes within each graph by their degrees (i.e. the number of edges incident to the node), create a vector from it, and distribute the graphs by this vector
example for (3):
assume 4 nodes V = {v1, v2, v3, v4}. Let d(v) be v's degree, with d(v1)=3, d(v2)=1, d(v3)=5, d(v4)=4. Then find < := transitive hull( { (v2,v1), (v1,v4), (v4,v3) } ) and create a vector depending on the degrees and the order, which leaves you with
(1,3,4,5) = (d(v2), d(v1), d(v4), d(v3)) = d( {v2, v1, v4, v3} ) = d(<)
now you have divided the 15M graphs into buckets where each bucket has the following characteristics:
n nodes
m edges
each graph in the group has the same 'out-degree-vector'
I assume this to be fine-grained enough if you are not expecting to find too many isomorphisms
cost so far: O(n) + O(n) + O(n*log(n))
(4) Now you can assume that members inside each bucket are likely to be isomorphic. You can run your isomorphism check on the bucket and only need to compare the currently tested graph against all representatives you have already found within this bucket; by assumption there shouldn't be too many, so I assume this to be quite cheap (see the sketch below).
At step (4) you can also happily distribute the computation to several compute nodes, which should really speed up the process.
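Here is how steps (1)-(4) might look in Python with networkx (a sketch assuming `graphs` is your collection; a single dict keyed by node count, edge count and degree vector covers (1)-(3) at once):

from collections import defaultdict
import networkx as nx

def bucket_key(G):
    degree_vector = tuple(sorted(d for _, d in G.degree()))
    return (G.number_of_nodes(), G.number_of_edges(), degree_vector)

# steps (1)-(3): distribute the graphs into buckets
buckets = defaultdict(list)
for G in graphs:
    buckets[bucket_key(G)].append(G)

# step (4): compare each graph only against its bucket's representatives
representatives = []
for group in buckets.values():
    reps = []
    for G in group:
        if not any(nx.is_isomorphic(G, R) for R in reps):
            reps.append(G)
    representatives.extend(reps)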
Maybe you can just use McKay's implementation? It is found here now: http://pallini.di.uniroma1.it/
You can convert your 15M graphs to the compact graph6 format (or sparse6) which nauty uses and then run the nauty tool labelg to generate the canonical labels (also in graph6 format).
For example - removing isomorphic graphs from a set of random graphs:
#gnp.py -- write 100000 random G(n, p) graphs to stdout in graph6 format (Python 2)
import networkx as nx

for i in range(100000):
    graph = nx.gnp_random_graph(10, 0.1)
    print nx.generate_graph6(graph, header=False)
[nauty25r9]$ python gnp.py > gnp.g6
[nauty25r9]$ cat gnp.g6 |./labelg |sort |uniq -c |wc -l
>A labelg
>Z 10000 graphs labelled from stdin to stdout in 0.05 sec.
710

How can I check if two graphs with LABELED vertices are isomorphic?

For example, suppose I had a graph G that had all blue nodes except one red node. I also had a graph F with all blue nodes except one red node.
What is an algorithm that I can run to verify that these two graphs are isomorphic with respect to their colored nodes?
I have made a few attempts at creating a polynomial graph isomorphism algorithm, and while I have yet to create an algorithm that is proven to be polynomial for every case, one algorithm I came up with is particularly suited for this purpose. It's based on a DFA minimization algorithm (specifically Hopcroft's algorithm: http://en.wikipedia.org/wiki/DFA_minimization#Hopcroft.27s_algorithm ; you may want to find a description from elsewhere, since Wikipedia's is difficult to follow).
The original algorithm was initialized by organizing the vertexes into distinct groups based on degree (one group for vertexes of degree 1, one for vertexes of degree 2, etc.). For your purposes, you will want to organize the vertexes into groups based upon both degree and label; this will ensure that no two nodes will be paired if they have different labels. Each graph should have its own structure containing such groups. Check the collection of groups for both graphs; there should be the same number of groups for the two graphs, and for each group in one graph, there should be a group in the other graph containing the same number of vertexes of the same degree and label. If this isn't the case, the graphs aren't isomorphic.
At each iteration of the main algorithm, you should generate a new data structure for each of the two graphs for the vertex groups that the next step will use. For each group, generate a list for each vertex of group indices/IDs that correspond to the vertexes that are adjacent to the vertex in question (include duplicate groups in this list). Check each group to see if the sorted group index/ID list for each contained vertex is the same. If this is the case, create an unmodified copy of this group in the next step's group structure. If this isn't the case, then for each unique list of group indices/IDs within that group, create a new group for vertexes within the original group that generated that list and add this new group to the next step's group structure. If you do not subdivide any of the groups of either graph in a given iteration, stop running the main portion of this algorithm. If you subdivide at least one group, you will need to once again check to make sure the group structures of the two graphs correspond to each other. This check will be similar to the one performed at the end of the algorithm's initialization (you may even be able to use the same function for both). If this check fails, then the graphs aren't isomorphic. If the check passes, then discard/free the current group structures and start the next iteration with the freshly created ones.
To make the process of determining "corresponding groups" easier, I would highly recommend using a predictable scheme for adding groups to the structure. For example, if you add groups during initialization in (degree, label) order, subdivide groups in ascending index order, and add subdivided groups to the new structure based on the order of the group index list (i.e., sorted by first listed index, then second, etc.), then corresponding groups between the two group structures will always have the same index, which makes the process of keeping track of which groups correspond to each other much easier.
If all groups contain 3 or fewer vertexes when the algorithm completes, then the graphs are isomorphic (for corresponding groups containing 2 or 3 vertexes, any vertex pairing is valid). If this isn't the case (this always happens for graphs where all nodes have equal degree and label, and sometimes happens for subgraphs with that property), then the graphs are not yet determined to be isomorphic or non-isomorphic. To differentiate between the two cases, choose an arbitrary node of the first graph's largest group and separate it into its own group. Then, for each node of the other graph's largest group, try running the algorithm again with that node separated into its own group. In essence, you are choosing an unpaired node from the first graph and pairing it by guess-and-check to every node in the second graph that is still a plausible pairing. If any of the forked iterations returns an isomorphism, the graphs are isomorphic. If none of them do, the graphs are not isomorphic.
For general cases, this algorithm is polynomial. In corner cases, the algorithm might be exponential. Whether this is the case or not is related to how frequently the algorithm can be forced to fork in the worst case of both graph input and node selection, which I have had difficulties trying to put useful bounds on. For example, although the algorithm forks at every step when comparing two full graphs, every branch of that tree produces an isomorphism; therefore, the algorithm returns in polynomial time in this case even though traversing the entire execution tree would require exponential time since traversing only one branch of the execution tree takes polynomial time.
Regardless, this algorithm should work well for your purposes. I hope my explanation of it was comprehensible; if not, I can try providing examples of the algorithm handling simple cases or expressing it as pseudocode instead.
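Here is a compressed sketch of the refinement loop described above (my paraphrase in Python, omitting the guess-and-check forking step; `labels` maps each vertex to a comparable label such as a color string, and a "group" is everything sharing a color):

import networkx as nx

def refine(G, labels):
    """Split (degree, label) groups by neighbor groups until stable."""
    color = {v: (G.degree(v), labels[v]) for v in G}
    while True:
        new = {v: (color[v], tuple(sorted(color[u] for u in G[v])))
               for v in G}
        if len(set(new.values())) == len(set(color.values())):
            return color                    # no group subdivided: stable
        color = new

def plausibly_isomorphic(G, glabels, H, hlabels):
    """Necessary (not sufficient) condition: stable colorings must match."""
    return sorted(refine(G, glabels).values()) == \
           sorted(refine(H, hlabels).values())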
Years ago, I created a simple and flexible algorithm for exactly this problem (graph isomorphism with labels).
I named it "Powerhash", and creating the algorithm required two insights. The first is the power iteration graph algorithm, also used in PageRank. The second is the ability to replace power iteration's inner step function with anything that we want. I replaced it with a function that does the following on each iteration, and for each node:
Sort the hashes (from previous iteration) of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by its direct neighbors. On the second step, a node's hash is affected by the neighborhood 2 hops away from it. On the Nth step, a node's hash will be affected by the neighborhood N hops around it. So you only need to continue running Powerhash for N = graph_radius steps. In the end, the graph center node's hash will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to find whether two graphs are isomorphic. If you have labels, then add them (on the first iteration) into the internal hashes that you calculate for each node.
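A rough sketch of those steps (not the actual madIS implementation; G is assumed to be a networkx-style graph, and hashlib plus the iteration count are my own choices here):

import hashlib

def powerhash(G, labels, steps=None):
    # seed each node's hash with its label (this is where labels enter)
    h = {v: hashlib.sha256(str(labels[v]).encode()).hexdigest() for v in G}
    if steps is None:
        steps = len(G)          # any value >= the graph radius is enough
    for _ in range(steps):
        # per node: sort neighbor hashes, hash their concatenation
        h = {v: hashlib.sha256(
                    "".join(sorted(h[u] for u in G[v])).encode()
                ).hexdigest()
             for v in G}
    return "".join(sorted(h.values()))   # sorted node hashes, concatenated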
For more on this you can look at my post here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
The algorithm above was implemented inside the "madIS" functional relational database. You can find the source code of the algorithm here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
Just checking: do you mean strict graph isomorphism or something else? Isomorphic graphs have the same adjacency relations (i.e. if node A is adjacent to node B in one graph, then node g(A) is adjacent to node g(B) in another graph that is the result of applying the transformation g to the first one...). If you just want to check whether one graph has the same types and number of nodes as another, then you can just compare counts.

Why is a Fibonacci heap called a Fibonacci heap?

The Fibonacci heap data structure has the word "Fibonacci" in its name, but nothing in the data structure seems to use Fibonacci numbers. According to the Wikipedia article:
The name of Fibonacci heap comes from Fibonacci numbers which are used in the running time analysis.
How do these Fibonacci numbers arise in the Fibonacci heap?
The Fibonacci heap is made up of a collection of smaller heap-ordered trees of different "orders" that obey certain structural constraints. The Fibonacci sequence arises because these trees are constructed in a way such that a tree of order n has at least Fn+2 nodes in it, where Fn+2 is the (n + 2)nd Fibonacci number.
To see why this result is true, let's begin by seeing how the trees in the Fibonacci heap are constructed. Initially, whenever a node is put into a Fibonacci heap, it is put into a tree of order 0 that contains just that node. Whenever a value is removed from the Fibonacci heap, some of the trees in the Fibonacci heap are coalesced together such that the number of trees doesn't grow too large.
When combining trees together, the Fibonacci heap only combines together trees of the same order. To combine two trees of order n into a tree of order n + 1, the Fibonacci heap takes whichever of the two trees has a greater root value than the other, then makes that tree a child of the other tree. One consequence of this fact is that trees of order n always have exactly n children.
The main attraction of the Fibonacci heap is that it supports the decrease-key efficiently (in amortized O(1)). In order to support this, the Fibonacci heap implements decrease-key as follows: to decrease the key of a value stored in some node, that node is cut from its parent tree and treated as its own separate tree. When this happens, the order of its old parent node is decreased by one. For example, if an order 4 tree has a child cut from it, it shrinks to an order 3 tree, which makes sense because the order of a tree is supposed to be the number of children it contains.
The problem with doing this is that if too many trees get cut off from the same tree, we might have a tree with a large order but which contains a very small number of nodes. The time guarantees of a Fibonacci heap are only possible if trees with large orders contain a huge number of nodes, and if we can just cut any nodes we'd like from trees we could easily get into a situation where a tree with a huge order only contains a small number of nodes.
To address this, Fibonacci heaps make one requirement - if you cut two children from a tree, you have to in turn cut that tree from its parent. This means that the trees that form a Fibonacci heap won't be too badly damaged by decrease-key.
And now we can get to the part about Fibonacci numbers. At this point, we can say the following about the trees in a Fibonacci heap:
A tree of order n has exactly n children.
Trees of order n are formed by taking two trees of order n - 1 and making one the child of another.
If a tree loses two children, that tree is cut away from its parent.
So now we can ask - what are the smallest possible trees that you can make under these assumptions?
Let's try out some examples. There is only one possible tree of order 0, which is just a single node:
Smallest possible order 0 tree: *
The smallest possible tree of order 1 has to be at least a node with one child, and the smallest child we could give it is the smallest order-0 tree, i.e. a single node. That gives this tree:
Smallest possible order 1 tree:   *
                                  |
                                  *
What about the smallest tree of order 2? This is where things get interesting. This tree certainly has to have two children, and it would be formed by merging together two trees of order 1. Consequently, the tree would initially have two children - a tree of order 0 and a tree of order 1. But remember - we can cut away children from trees after merging them! In this case, if we cut away the child of the tree of order 1, we would be left with a tree with two children, both of which are trees of order 0:
Smallest possible order 2 tree:   *
                                 / \
                                *   *
How about order 3? As before, this tree would be made by merging together two trees of order 2. We would then try to cut away as much of the subtrees of this order-3 tree as possible. When it's created, the tree has subtrees of orders 2, 1, and 0. We can't cut away from the order 0 tree, but we can cut a single child from each of the order 2 and order 1 trees. If we do this, we're left with a tree with three children, one of order 1 and two of order 0:
Smallest possible order 3 tree:   *
                                 /|\
                                * * *
                                |
                                *
Now we can spot a pattern. The smallest possible order-(n + 2) tree would be formed as follows: start by creating a normal order (n + 2) tree, which has children of orders n + 1, n, n - 1, ..., 2, 1, 0. Then, make those trees as small as possible by cutting away nodes from them without cutting two children from the same node. This leaves a tree with children of orders n, n - 1, ..., 1, 0, and 0.
We can now write a recurrence relation to try to determine how many nodes are in these trees. If we do this, we get the following, where NC(n) represents the smallest number of nodes that could be in a tree of order n:
NC(0) = 1
NC(1) = 2
NC(n + 2) = NC(n) + NC(n - 1) + ... + NC(1) + NC(0) + NC(0) + 1
Here, the final +1 accounts for the root node itself.
If we expand out these terms, we get the following:
NC(0) = 1
NC(1) = 2
NC(2) = NC(0) + NC(0) + 1 = 3
NC(3) = NC(1) + NC(0) + NC(0) + 1 = 5
NC(4) = NC(2) + NC(1) + NC(0) + NC(0) + 1 = 8
If you'll notice, this is exactly the Fibonacci series offset by two positions. In other words, each of these trees has to have at least Fn+2 nodes in them, where Fn+2 is the (n + 2)nd Fibonacci number.
So why is it called a Fibonacci heap? Because each tree of order n has to have at least Fn+2 nodes in it!
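A quick numeric check of that claim (a throwaway sketch: NC follows the recurrence above, and fib is the standard sequence with F1 = F2 = 1):

def NC(n):
    """Smallest node count of an order-n tree, per the recurrence above."""
    if n == 0: return 1
    if n == 1: return 2
    return sum(NC(k) for k in range(n - 1)) + NC(0) + 1

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(NC(n) == fib(n + 2) for n in range(10))   # NC(n) = F(n+2)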
If you're curious, the original paper on Fibonacci heaps has pictures of these smallest possible trees. It's pretty nifty to see! Also, check out this CS Theory Stack Exchange Post for an alternative explanation as to why Fibonacci heap trees have the sizes they do.
Hope this helps!
I want to give an intuitive explanation that I myself had an "Aha!" moment with.
Tree structures achieve O(log n) runtimes because they are able to store an exponential number of items relative to their height. Binary trees can store 2^h, ternary trees can store 3^h, and so on, with x^h as the generic case.
Can x be less than 2? Sure it can! As long as x > 1, we still achieve log runtimes and the ability to store an exponentially large number of items relative to height. But how do we build such a tree? The Fibonacci heap is a data structure where x ≈ 1.62, the Golden Ratio. And whenever we encounter the Golden Ratio, there are Fibonacci numbers lurking somewhere...
A Fibonacci heap is actually a forest of trees. After a process called "consolidation", each of these trees holds a distinct count of items that is an exact power of 2. For example, if your Fibonacci heap has 15 items, it would have 4 trees holding 1, 2, 4 and 8 items respectively.
The details of the "consolidation" process are not relevant, but in essence it keeps unioning trees of the same degree in the forest until all trees have distinct degrees (try a cool visualization to see how these trees get built). As you can write any n as a sum of distinct powers of 2, it's easy to imagine how the consolidated trees of a Fibonacci heap would look for any n.
OK, so far we still don't see any direct connection to Fibonacci numbers. Where do they come into the picture? They appear in a very special case, which is also the key to why a Fibonacci heap can offer O(1) time for the DECREASE-KEY operation. When we decrease a key, if the new key is still larger than the parent's key, then we don't need to do anything else because the min-heap property is not violated. But if it isn't, we can't leave a smaller child buried under a larger parent, so we need to cut its subtree out and make a new tree out of it. Obviously we can't keep doing this for each delete, because otherwise we would end up with trees that are too tall and no longer hold an exponential number of items, which means no more O(log n) time for the extract operation. So the question is: what rule can we set up so that a tree still holds an exponential number of items for its height? The clever insight is this:
We will allow each parent to lose up to one child. If there is a subsequent attempt to remove another child from the same parent, then we will cut that parent out of the tree as well and put it in the root list as a tree of 1.
The above rule makes sure trees still hold an exponential number of items for their height, even in the worst case. What is the worst case? The worst case occurs when we make each parent lose one child. If a parent has > 1 child, we have a choice of which one to get rid of; in that case, let's get rid of the child with the largest subtree. So in the above diagram, if you do that for each parent, you will have trees of sizes 1, 1, 2 and 3. See a pattern here? Just for fun, add a new tree of order 4 (i.e. 16 items) to the diagram and guess what you would be left with after running this rule for each parent: 5. We have a Fibonacci sequence here!
As the Fibonacci sequence is exponential, each tree still retains an exponential number of items and thus manages an O(log n) runtime for the EXTRACT-MIN operation. However, notice that DECREASE-KEY now takes only O(1). One other cool thing is that you can represent any number as a sum of Fibonacci numbers. For example, 32 = 21 + 8 + 3, which means that if you needed to hold 32 items in a Fibonacci heap, you could do so using 3 trees holding 21, 8 and 3 items respectively. However, the "consolidation" process does not produce trees with Fibonacci numbers of nodes; those only occur when this worst case of DECREASE-KEY or DELETE happens.
Further reading
Binomial Heap - Fibonacci heaps are essentially lazy binomial heaps. It's a cool data structure because it shows a different way of storing an exponential number of items in a tree for its height, other than a binary heap.
Intuition behind Fibonacci Heaps - required reading for anyone who wants to understand Fibonacci heaps at their core. Textbooks are often either too shallow or too cluttered on this subject, but the author of this SO answer has nailed it hands down.
