How is this Huffman Table created? - networking

I have a table that shows the probability of an event happening.
I'm fine with part 1, but part 2 is not clicking with me. I'm trying to get my head around how the binary numbers are derived in part 2.
I understand 0 is assigned to the largest probability and we work back from there, but how do we work out what the next set of binary numbers is? And what do the circles around the numbers mean, and what do the two shades of grey differentiate?
It's just not clicking. Maybe someone can explain it in a way that will make me understand?

To build Huffman codes, one approach is to build a binary tree using a priority queue, into which the data items to be assigned codes are inserted, sorted by frequency.
To start with, you have a queue with only leaf nodes, one for each of your data items.
At each step you take the two lowest-frequency nodes from the queue, make a new node with a frequency equal to the sum of the two removed nodes, and then attach those two nodes as its left and right children. This new node is reinserted into the queue, according to its frequency.
You repeat this until you only have one node in the queue, which will be the root.
Now you can traverse the tree from the root to any leaf node, and the path you take (whether you go left or right) at each level gives you either a 0 or a 1, and the length of the path (how far down the tree the node is) gives you the length of the code.
In practice you can just build the codes as you build the tree, by prepending a 0 or a 1 to the code of every leaf in a sub-tree, according to whether that sub-tree is being attached to the left or the right of the new parent.
In your diagram, the numbers in the circles indicate the sum of the frequencies of the two nodes which have been combined at each stage of building the tree.
You should also see that the two being combined have been assigned different bits (one a 0, the other a 1).
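The procedure above can be sketched in Python roughly as follows. This is a minimal sketch, not the method used to draw your table: the function name `huffman_codes` and the sample frequencies are made up for illustration, and the choice of which child gets 0 and which gets 1 is an arbitrary convention (your table may use the opposite one). It builds the codes while merging, as described above, by prepending a bit to every symbol in each merged sub-tree.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {symbol: bit string} for a {symbol: frequency} mapping."""
    tie = count()  # unique tie-breaker so the heap never compares the dicts
    # Each heap entry is (frequency, tie-breaker, {symbol: code so far}).
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        # A single symbol still needs one bit.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Take the two lowest-frequency sub-trees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Attach them under a new parent: the left side gets a leading 0,
        # the right side a leading 1 (an arbitrary convention).
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        # The parent's frequency is the sum of its children (the circled numbers).
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


codes = huffman_codes({"a": 40, "b": 30, "c": 20, "d": 10})
for symbol, code in sorted(codes.items()):
    print(symbol, code)
```

The most frequent symbol ends up with the shortest code, and no code is a prefix of another, which is what makes the result decodable.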
A diagram may help. Apologies for my hand-writing:

Related

Graph - Algorithm to position nodes

I am trying to create a dynamic graph where users can add new nodes using ELK.js
The graph is a tree that has one root node. I am trying to set the position of nodes in a row using (x,y) position (y is not important for now).
Assumptions:
Lower x value brings the node to the left.
Two nodes in one row can't have the same x value.
When we add a new node it should appear to the right of the other children, if there are any (the green boxes in rows 2 and 3, for example, are new nodes)
New nodes can be added to every row at any moment (green box in row 2 and row 3)
Max number we can use to set x value is 16 digit long: 9999999999999999
A simple example of how positions behave can be found here (see the position of nodes n2, n3, n4 and change them in JSON)
I am trying not to calculate the position of every node in a row. I have tried a lot of different numbers, but I am stuck and need fresh ideas.
Any help would be appreciated. Thank you
You could approach this as follows:
When a new node is the first one on its level, give it 0 as its x-value.
When it is not the first, find its two immediate siblings on that level (one to the left of the node, one to the right). In some cases you'll need to traverse from the node via one or more of its ancestors to find such an immediate sibling.
Get the x-values of these two siblings, and take the average of those two values for the new node's x value.
It might be that there is only a sibling on one side. If there is no sibling on the right side, take the average of the left sibling's x-value and 10^16. If it is the left sibling that is missing, take the average of the right sibling's x-value and -10^16.
This practically means you use an initial range of -10^16...10^16 and keep cutting segments in half when a new node must be placed within a segment.
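A minimal sketch of that midpoint rule, in Python for brevity (the same arithmetic carries over to JavaScript). It assumes the outer bounds are plus and minus 10^16, in line with the 16-digit limit from the question; the function name `new_x` and its parameters are hypothetical, and finding the left and right siblings in the tree is left out.

```python
LEFT_BOUND = -10**16   # assumed lower end of the usable x range
RIGHT_BOUND = 10**16   # assumed upper end of the usable x range


def new_x(left_x=None, right_x=None):
    """x-value for a new node placed between its immediate siblings.

    left_x / right_x are the x-values of the nearest sibling on each side,
    or None when there is no sibling on that side.
    """
    if left_x is None and right_x is None:
        return 0.0  # first node on its level
    lo = LEFT_BOUND if left_x is None else left_x
    hi = RIGHT_BOUND if right_x is None else right_x
    return (lo + hi) / 2  # cut the free segment in half


first = new_x()                                 # first node on the level
second = new_x(left_x=first)                    # appended to its right
between = new_x(left_x=first, right_x=second)   # inserted between the two
print(first, between, second)
```

Each insertion halves a segment, so repeated insertions at the same spot will eventually run out of floating-point precision; at that point the row would need to be renumbered.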

Break up graph into smallest sub-components of 2-nodes or greater

I wish to be able to separate my graph into subcomponents such that the removal of any single node would create no further subcomponents (excluding single nodes). As an example, see the two images below.
The first image shows the complete graph. The second image shows the subcomponents of the graph when it has been split into the smallest possible subcomponents. As can be seen from the second image, the vertex names have been maintained. I don't need the new structure to be a single graph; it can be a list of graphs, or even a list of the nodes in each component.
The component of nodes 4-5-6 remains as removing any of the three nodes will not create a new component as the node that was broken off will only be a single node.
At the moment I am trying to put together an iterative process, that removes nodes sequentially in ascending degree order and recurses into the resultant new components. However, it is difficult and I imagine someone else has done it better before.
You say you want the "smallest subcomponents of 2 nodes or greater", and that your example has the "smallest possible subcomponents". But what you actually meant is the largest possible subcomponents such that the removal of any single node would create no further subcomponents, right? Otherwise you could just separate the graph into a collection of all of the 2-graphs.
I believe, then, that your problem can be described as finding all "biconnected components" (aka maximal biconnected subgraphs of a graph): https://en.wikipedia.org/wiki/Biconnected_component
As you said in the comments, igraph has the function biconnected_components(g), which will solve your problem. :)
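If you want to see what such a function computes, here is a rough, library-free sketch in Python of the classic DFS approach with an edge stack (Hopcroft and Tarjan). It is not igraph's implementation, and the example graph is made up (two triangles joined by a bridge), not the one from your images. It returns each component as a set of node names.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Return the biconnected components of an undirected graph as node sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    disc = {}         # discovery time of each visited node
    low = {}          # lowest discovery time reachable from the node's subtree
    edge_stack = []   # edges seen since the last component was closed
    components = []
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # u separates v's subtree from the rest: close a component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, v):
                            break
                    components.append(comp)
            elif disc[v] < disc[u]:
                # Back edge to an ancestor.
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return components


edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4)]
print(sorted(sorted(c) for c in biconnected_components(edges)))
```

Note that a bridge shows up as its own 2-node component, and a node that joins several components (an articulation point) appears in each of them. The recursion is fine for small graphs; very deep graphs would need an iterative DFS.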

Uber Question : Toggle a cell from 0 to 1 to get the optimum path

In a binary maze of 0s and 1s, 0 is a valid cell that we can travel to and 1 means that the cell is blocked. Given a source and a destination, we have to find:
1. Whether a path exists and, if so, the shortest path.
2. If we are given the chance to toggle a single cell from 1 to 0, which cell to toggle so that we get the shortest possible path.
For the second part, how can we check each cell without toggling them one by one? Is there an efficient way to do this?
I think that instead of a dynamic programming approach, where at each index i of the grid you take the minimum over all the directions you can move in, you can use a greedy approach: build a graph in which each index i is connected to all the surrounding indexes (at most 8) that have the value zero, and run a shortest-path algorithm between the two given nodes. This also makes toggling easy: you only need to consider toggling the indexes that are shared between the source and the destination.
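One possible way to make this concrete, sketched in Python. The 8-neighbour connectivity follows the answer above; the rest is an assumption on my part rather than something stated in the thread: run one breadth-first search from the source and one from the destination, let each search record a distance for blocked cells it touches without expanding through them, and then score every blocked cell by the sum of its two distances. The names `bfs` and `best_toggle` are hypothetical, and the source and destination are assumed to be open cells.

```python
from collections import deque

# All 8 surrounding offsets, as in the answer above.
NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def bfs(grid, start):
    """Distances from start; blocked cells get a distance but are not expanded."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            cell = (r + dr, c + dc)
            if 0 <= cell[0] < rows and 0 <= cell[1] < cols and cell not in dist:
                dist[cell] = dist[(r, c)] + 1
                if grid[cell[0]][cell[1]] == 0:
                    queue.append(cell)  # only open cells are walked through
    return dist


def best_toggle(grid, src, dst):
    """Return (current shortest length or None, best cell to open, new length)."""
    from_src = bfs(grid, src)
    from_dst = bfs(grid, dst)
    baseline = from_src.get(dst)
    best_cell, best_len = None, baseline
    for cell, d in from_src.items():
        if grid[cell[0]][cell[1]] == 1 and cell in from_dst:
            length = d + from_dst[cell]  # shortest path forced through this cell
            if best_len is None or length < best_len:
                best_cell, best_len = cell, length
    return baseline, best_cell, best_len


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(best_toggle(grid, (0, 0), (2, 0)))
```

If `best_cell` comes back as `None`, no single toggle improves on the existing path. The whole thing is two searches plus one pass over the grid, instead of one search per blocked cell.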

How to find the longest path in a graph with a set of start and target points?

I have a DAG (with costs/weights per edge) and want to find the longest path between two sets of nodes. The two sets of start and target nodes are disjoint and small in size compared to the total number of nodes in the graph.
I know how to do this efficiently between one start and one target node. With multiple, I can list all paths from every start to every target node and pick the longest one, but that takes a quadratic number of single-path searches. Is there a better way?
I assume that you want the longest path possible that starts in any of the nodes from the first set and ends in any of the nodes in the second set. Then you can add two virtual nodes:
The first node has no predecessors and its successors are the nodes from the first set.
The second node has no successors and its predecessors are the nodes from the second set.
All the newly added edges should have zero weight.
The graph would still be a DAG. Now if you use the standard algorithm to find the longest path in the DAG between the two new nodes, you’ll get the longest path that starts in the first set and ends in the second set, except that there will be an extra zero-weighted edge at the beginning and an extra zero-weighted edge at the end.
By the way, this solution is essentially executing the algorithm from all the nodes from the first set, but in parallel as opposed to the sequential approach your question suggests.
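A minimal Python sketch of this idea, assuming the graph is given as a list of `(u, v, weight)` edges (that input format and the function name `longest_path_between_sets` are my own choices for illustration). It adds the two virtual nodes with zero-weight edges, orders the nodes topologically with Kahn's algorithm, and relaxes edges in that order, keeping predecessors so the path can be read back without the virtual endpoints.

```python
from collections import defaultdict, deque


def longest_path_between_sets(edges, starts, targets):
    """Longest path in a DAG from any node in starts to any node in targets.

    Returns (total weight, list of nodes) or None if no such path exists.
    """
    source, sink = object(), object()  # the two virtual nodes
    all_edges = list(edges)
    all_edges += [(source, s, 0) for s in starts]   # zero-weight edges in
    all_edges += [(t, sink, 0) for t in targets]    # zero-weight edges out

    graph = defaultdict(list)
    indegree = defaultdict(int)
    for u, v, w in all_edges:
        graph[u].append((v, w))
        indegree[v] += 1
        indegree[u] += 0  # make sure u is registered as a node

    # Kahn's algorithm for a topological order.
    queue = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Relax edges in topological order, starting from the virtual source only.
    best = {source: 0}
    pred = {}
    for u in order:
        if u not in best:
            continue  # not reachable from any start node
        for v, w in graph[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                pred[v] = u

    if sink not in best:
        return None
    path = []
    node = pred[sink]
    while node is not source:
        path.append(node)
        node = pred[node]
    path.reverse()
    return best[sink], path


edges = [("a", "b", 2), ("b", "c", 3), ("a", "c", 4), ("d", "c", 10), ("c", "e", 1)]
print(longest_path_between_sets(edges, {"a", "d"}, {"e"}))
```

This is a single pass over the graph, linear in the number of nodes and edges, regardless of how many start and target nodes there are.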

Find All Cycle Bases In a Graph, With the Vertex Coordinates Given

A similar question is posted here.
I have an undirected graph with vertices V and edges E. I am looking for an algorithm to identify all the cycle bases in that graph. An example of such a graph is shown below:
Now, all the vertex coordinates are known (unlike the previous question, and contrary to the explanation in the above diagram), therefore it is possible to find the smallest cycles that encompass the whole graph.
In this graph, it is possible that there are edges that don't form any cycles.
What is the best algorithm to do this?
Here's another example that you can take a look at:
Assuming that e1 is the edge that gets picked first, and the arrow shows the direction of the edge.
I haven't tried this and it is rather greedy but should work:
1. Pick one node.
2. Go to one of its neighbors.
3. Keep on going until you get back to your starting node, but you're not allowed to visit an old node.
4. If you get a cycle, save it unless it already exists or a subset of its nodes already makes up a cycle. If the nodes in the cycle are a subset of the nodes in another cycle, remove the larger cycle (or maybe split it in two?).
5. Start over at 2 with a new neighbor.
6. Start over at 1 with a new node.
Comments: At 3 you should of course do the same thing as for step 2, so take all possible paths.
Maybe that's a start? As I said, I haven't tried it so it is not optimized.
EDIT: An undocumented and not optimized version of one implementation of the algorithm can be found here: https://gist.github.com/750015. But it doesn't solve the problem completely, since it can only recognize "true" subsets.
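For reference, the steps above might look roughly like this in Python. This is my own sketch of the described procedure, not the code from the gist: the function name `small_cycles` and the adjacency-dict input are assumptions, it ignores the vertex coordinates, it takes exponential time in the worst case, and it shares the limitation just mentioned (a cycle is only dropped when its node set strictly contains the node set of another cycle).

```python
def small_cycles(adj):
    """Collect cycles as node sets, discarding any that contain a smaller one.

    adj maps each node to a list of its neighbours (undirected graph).
    """
    cycles = []  # list of frozensets of nodes

    def record(nodes):
        new = frozenset(nodes)
        # Skip if already known, or if a known cycle uses a subset of these nodes.
        if any(old <= new for old in cycles):
            return
        # Drop known cycles whose node set strictly contains the new one.
        cycles[:] = [old for old in cycles if not new < old]
        cycles.append(new)

    def walk(start, node, path, visited):
        for nxt in adj[node]:
            if nxt == start and len(path) >= 3:
                record(path)  # got back to the starting node
            elif nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                walk(start, nxt, path, visited)  # try every onward path
                path.pop()
                visited.discard(nxt)

    for start in adj:
        walk(start, start, [start], {start})
    return cycles


# A square 0-1-2-3 with the diagonal 0-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(sorted(sorted(c) for c in small_cycles(adj)))
```

On the square with a diagonal, the outer four-node cycle is dropped because both triangles use subsets of its nodes, leaving just the two triangles.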
