How to calculate the average branching factor on a given tree

Can anyone explain what the value of the average branching factor b is if we exclude leaf nodes when computing it?
Example: (tree diagram not included)
I don't know how to calculate this the right way :/
Thanks a lot

Iterate over each non-terminal node and average the number of children it has. In your example, the average should be between 1 and 2.
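As a sketch of that procedure (pure Python, with the tree as a made-up child-list dictionary):

```python
def average_branching_factor(children):
    """children maps each node to the list of its child nodes;
    leaves map to an empty list and are excluded from the average."""
    internal = [kids for kids in children.values() if kids]
    return sum(len(kids) for kids in internal) / len(internal)

# A hypothetical tree: the root has 2 children, and "a" has 1 child.
tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
print(average_branching_factor(tree))  # (2 + 1) / 2 = 1.5
```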
Related:
How to Find the Branching Factor of a Tree

Related

How is the MeanDecreaseGini for each feature calculated in the randomForest package?

My understanding is that the Gini decrease at a split can be calculated straightforwardly by subtracting the Gini impurity of the child nodes from that of the parent node. How are these per-node values aggregated per feature across the forest?
For example, I have seen many MeanDecreaseGini plots showing values of over 100 for some features. It seems unrealistic (or maybe it isn't?) that summing the decreases at all nodes splitting on a given feature (each between 0 and 1) in a given tree would produce such large numbers.
Any help would be greatly appreciated!
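One detail that resolves the "over 100" puzzle: in common implementations, the impurity decrease at each split is weighted by the number of samples reaching that node rather than kept in [0, 1]. (This is how scikit-learn computes impurity-based importances; R's randomForest likewise accumulates unnormalized totals, though its exact bookkeeping may differ.) A hedged sketch, using a made-up per-split tuple encoding:

```python
def mean_decrease_gini(trees, feature):
    """trees: a list of trees; each tree is a list of per-split tuples
    (split_feature, n_samples, gini_parent, n_left, gini_left, n_right, gini_right).
    This encoding is invented for illustration."""
    totals = []
    for tree in trees:
        total = 0.0
        for feat, n, g_par, n_l, g_l, n_r, g_r in tree:
            if feat == feature:
                # Decrease weighted by sample counts: this weighting is what
                # lets per-feature totals grow far beyond 1.
                total += n * g_par - n_l * g_l - n_r * g_r
        totals.append(total)
    return sum(totals) / len(totals)  # average over trees

# One tree, one split on "x": 100 samples, parent gini 0.5, two purer children.
tree = [("x", 100, 0.5, 50, 0.2, 50, 0.2)]
print(mean_decrease_gini([tree], "x"))  # 100*0.5 - 50*0.2 - 50*0.2 = 30.0
```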

How to interpret a Dendrogram from hierarchical clustering to find optimal number of clusters?

When viewing this, how do we find the optimal number of clusters? With k-means I used the "elbow" in the graph to pick the optimal point, but I am having trouble figuring this out from just the dendrogram.
The interpretation varies depending on the metric and linkage you used.
But in general, you want to keep branches that contain "many" observations and have a "large" distance above them (to the next merge).
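One common heuristic that makes the "large distance above" idea concrete is to cut the tree where the gap between successive merge heights is largest. A sketch with SciPy on made-up data (the Ward linkage and the largest-gap rule here are assumptions, not the only reasonable choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Hypothetical data: three well-separated 2-D blobs of 20 points each.
X = np.vstack([rng.normal(loc, 0.1, size=(20, 2)) for loc in (0.0, 3.0, 6.0)])

Z = linkage(X, method="ward")          # the same linkage you would plot as a dendrogram
gaps = np.diff(Z[:, 2])                # gaps between successive merge heights
k = len(X) - 1 - int(np.argmax(gaps))  # cut just below the largest gap
labels = fcluster(Z, t=k, criterion="maxclust")
print(k)  # 3 for these blobs
```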

What is the maximum number of possible topological sorts of an N-order Directed Acyclic Graph?

I need to find the maximum number of topological sorts of a Directed Acyclic Graph of order N. I've checked by running depth-first search on various DAGs, and it looks like it is the size of the DFS forest created after running DFS on the graph. Or maybe I'm completely wrong or missing something. I also need to prove it. Any help will be appreciated. Thank you.
If you have a total of n elements, the maximum number of possible ways to order those n elements is n! (the number of permutations of n elements). So you certainly can't do any better than that. If we can find a family of graphs with n nodes that have n! possible topological orderings, then we know that has to be the maximum possible number of topological orderings.
As a hint, it is indeed possible to find n-node DAGs with n! possible topological orderings. Try thinking about what that would mean about the possible edges between those nodes. Once you've found this family of graphs, it's very easy to show that they have the maximum possible number of topological orderings by using the above argument.
Hope this helps!
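If you want to test candidate graph families against the hint above, a brute-force counter (an assumed helper, exponential in n, so only for small cases) is enough to experiment with:

```python
from itertools import permutations

def count_topological_orders(n, edges):
    """Brute force: a permutation of the n nodes is a topological
    order iff every edge (u, v) places u before v."""
    count = 0
    for perm in permutations(range(n)):
        pos = {node: i for i, node in enumerate(perm)}
        count += all(pos[u] < pos[v] for u, v in edges)
    return count

# A chain 0 -> 1 -> 2 -> 3 is fully constrained: exactly one order.
print(count_topological_orders(4, [(0, 1), (1, 2), (2, 3)]))  # 1
# A "diamond" 0 -> {1, 2} -> 3 leaves 1 and 2 unordered: two orders.
print(count_topological_orders(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # 2
```

Notice how removing ordering constraints increases the count, which is a useful nudge toward the family of graphs the hint is pointing at.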

How to calculate correlation between time periods

If I have 2 lists of time intervals:
List 1:
1. 2010-06-06 to 2010-12-12
2. 2010-05-04 to 2010-11-02
3. 2010-02-04 to 2010-10-08
4. 2010-04-01 to 2010-08-02
5. 2010-01-03 to 2010-02-02
and
List 2:
1. 2010-06-08 to 2010-12-14
2. 2010-04-04 to 2010-10-10
3. 2010-02-02 to 2010-12-16
What would be the best way to calculate some sort of correlation or similarity factor between the two lists?
Thanks!
Is that the extent of the data or just a sample to give an idea of the structure you have?
Just a few ideas about how to look at this... My apologies if it is redundant with what you have already tried.
Two basic ideas come to mind for comparing intervals like this: absolute or relative. A relative comparison would ignore absolute time and look for repeating structures or signatures that occur in both groups, though not necessarily at the same time. The absolute version would consider simultaneous events to be relevant: it doesn't matter if something happens every week if the occurrences are separated by a year... You can often make this distinction by knowing something about the origin of the data.
If this is the grand total of data available, the decision about associations will come down to some assumptions about what constitutes "correlation". For instance, if you have a specific model for what is going on - e.g. a time-to-start, time-to-stop (failure) model - you could evaluate the likelihood of observing one sequence given the other. However, without more example data it seems unlikely you'd be able to draw any firm conclusions.
The first intervals in the two groups are nearly identical, so they will contribute strongly to any correlation measure I can think of for the two groups. Under a random model for this set, I would expect many models to flag these two observations as "unlikely" just because of that.
One way to assess "similarity" would be to ask what portion of the time axis is covered (possibly generalized to multiple coverage) and compare the two groups on that basis.
Another possibility is to define a function that adds one for each interval covering any particular day in the overall span of these events. That way you have a step function with a rudimentary description of multiple events covering the same date. Calculating a correlation between the two groups' functions might suggest structural similarity, but again you would need more groups of data to draw any conclusions.
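The daily-count idea above can be made concrete with a pure-Python sketch, using the intervals from the question (Pearson correlation is just one choice of similarity measure here):

```python
from datetime import date

def daily_counts(intervals, start, end):
    """For each day in [start, end], count how many intervals cover it."""
    days = (end - start).days + 1
    counts = [0] * days
    for s, e in intervals:
        lo = max((s - start).days, 0)
        hi = min((e - start).days, days - 1)
        for i in range(lo, hi + 1):
            counts[i] += 1
    return counts

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

list1 = [(date(2010, 6, 6), date(2010, 12, 12)),
         (date(2010, 5, 4), date(2010, 11, 2)),
         (date(2010, 2, 4), date(2010, 10, 8)),
         (date(2010, 4, 1), date(2010, 8, 2)),
         (date(2010, 1, 3), date(2010, 2, 2))]
list2 = [(date(2010, 6, 8), date(2010, 12, 14)),
         (date(2010, 4, 4), date(2010, 10, 10)),
         (date(2010, 2, 2), date(2010, 12, 16))]

start, end = date(2010, 1, 1), date(2010, 12, 31)
r = pearson(daily_counts(list1, start, end), daily_counts(list2, start, end))
print(round(r, 3))
```

As noted above, a single coefficient from two short lists is suggestive at best; more groups of data would be needed before reading anything into it.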
Ok that was a little rambling. Good luck with your project!
You may try cross-correlation.
However, you should be aware that you have vector data (start, length), while those algorithms assume a functional dependency between the series. Whether that holds depends on the semantics of your data, which is not clear from the question.
HTH!
A more useful link for your current problem here.

In a graph, how to find the nearest node to a group of nodes?

I have an undirected, unweighted graph, which doesn't have to be planar. I also have a subset of the graph's nodes (a proper subset), and I need to find a node not belonging to the subset with the minimum sum of distances to all nodes in the subset.
So far, I have implemented breadth-first search starting from each node in the subset, and the intersection that occurs first is the node I am looking for. Unfortunately, it runs too slowly since the graph contains a large number of nodes.
An all-pairs shortest path algorithm lets you find the distance between every pair of nodes in O(V^3) time; see Floyd-Warshall. Summing the distances afterwards is at most quadratic (every candidate node times every subset node). It's a very straightforward and not terribly fast way of doing it, but it may still be an order of magnitude faster than what you're doing right now.
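A plain-Python sketch of that approach on a small hypothetical graph (for large graphs you would want an optimized implementation, or per-subset-node BFS, instead of O(V^3) Floyd-Warshall):

```python
def closest_outside_node(n, edges, subset):
    """Floyd-Warshall over an undirected, unweighted graph on nodes 0..n-1,
    then pick the node outside `subset` with the smallest total distance
    to all subset nodes."""
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v in edges:                  # each edge has weight 1
        dist[u][v] = dist[v][u] = 1
    for k in range(n):                  # standard Floyd-Warshall relaxation
        for i in range(n):
            dik = dist[i][k]
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    best, best_sum = None, INF
    for v in range(n):
        if v in subset:
            continue
        s = sum(dist[v][u] for u in subset)
        if s < best_sum:
            best, best_sum = v, s
    return best

# Star graph: center 0 joined to leaves 1..4; subset {1, 2, 3}.
print(closest_outside_node(5, [(0, 1), (0, 2), (0, 3), (0, 4)], {1, 2, 3}))  # 0
```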

Resources