Finding largest family in a graph

This is homework for a data structures course. I'm not asking for code, but I'm having a hard time coming up with an effective algorithm for this.
I have information about several family trees. Among those I have to find the largest family and return the name of its greatest elder and the number of his descendants. The descendants may have kids between them (a brother and a sister may have a kid), and this has to be done in O(n^2) time at worst.
What would be the most effective way to solve this? I imagine running a breadth-first search on the graphs, but that seems to mean keeping descendant counters many levels upwards (if I am traversing a grand^99 child, for example).

CMIIW, but my assumption is that every family tree is separate from the others and that its root is the eldest ancestor. If that's the case, since you're counting all of a tree's nodes regardless, any unweighted graph traversal algorithm would give the same result; BFS would do the job. I don't get what you mean by "keeping up children counters for many levels upwards", though; a single counter per tree is fine, right?
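To make that concrete, here is a minimal sketch of the idea in Python. It assumes the input has already been parsed into a children mapping (each person to a list of their kids) and a list of elders (the roots of the separate trees); those names are made up for illustration:

    from collections import deque

    def largest_family(children, elders):
        """Return (elder, descendant_count) for the elder with the most descendants.

        children: dict mapping each person to a list of their kids.
        elders:   roots of the separate family trees (the eldest ancestors).
        """
        best_elder, best_count = None, -1
        visited = set()
        for elder in elders:
            count = 0
            visited.add(elder)
            queue = deque([elder])
            while queue:
                person = queue.popleft()
                for kid in children.get(person, []):
                    if kid not in visited:      # a kid of two relatives is counted once
                        visited.add(kid)
                        count += 1
                        queue.append(kid)
            if count > best_count:
                best_elder, best_count = elder, count
        return best_elder, best_count

Because relatives may have children together, each "tree" is really a DAG, so the visited set prevents counting a shared child twice; the whole pass is O(V + E), comfortably inside an O(n^2) budget.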

Related

Time complexity for detecting a cycle in a graph

I am trying to understand the time complexity of some efficient methods of detecting cycles in a graph.
Two approaches for doing this are explained here. I would assume that time complexity is provided in terms of worst-case.
The first is union-find, which is said to have a time complexity of O(V log E).
The second uses a DFS-based approach and is said to have a time complexity of O(V+E). If I am correct, this is asymptotically more efficient than O(V log E). It is also convenient that the DFS-based approach can be used for both directed and undirected graphs.
My issue is that I fail to see how the second approach can be considered to run in O(V+E) time, because DFS runs in O(V+E) time and the algorithm checks the nodes adjacent to each discovered node. Surely this would mean that the algorithm runs in O(V^2) time, because up to V-1 adjacent nodes might have to be traversed for each discovered node? It is obviously impossible for more than one node to require the traversal of V-1 adjacent nodes, but from my understanding this would still be the upper bound of the runtime.
Hopefully someone understands why I think this and can help me to understand why the complexity is O(V+E).
The algorithm, based on DFS, typically maintains a "visited" boolean flag for each vertex, which holds one bit of information: whether that vertex has already been visited or not. So no vertex can be visited more than once.
If the graph is connected, then starting the DFS from any vertex will give you an answer right away. If the graph is a tree, then all the vertices will be visited in a single call to the DFS. If the graph is not a tree, then a single call to the DFS will find a cycle, and in this case not all the vertices might be visited. In both cases the subgraph induced by the already visited vertices is a tree at each step of the DFS, so the total number of traversed edges is O(V). Because of that we can tighten the O(V+E) estimate of the cycle detection algorithm to O(V).
Starting the DFS from all vertices of the graph is necessary in the case when the graph consists of a number of connected components - the "visited" boolean variable guarantees that the DFS won't traverse the same component again and again.
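For reference, a minimal sketch of this DFS-based check for a simple undirected graph (adjacency-list input assumed); each vertex is marked when it is pushed, so it is processed at most once, which is where the linear bound comes from:

    def has_cycle(adj):
        """adj: dict mapping each vertex to a list of neighbours (simple, undirected graph)."""
        visited = set()
        for start in adj:                      # one DFS per connected component
            if start in visited:
                continue
            visited.add(start)
            stack = [(start, None)]            # (vertex, vertex we came from)
            while stack:
                v, parent = stack.pop()
                for w in adj[v]:
                    if w not in visited:
                        visited.add(w)
                        stack.append((w, v))
                    elif w != parent:          # a visited neighbour other than the parent closes a cycle
                        return True
        return False

    print(has_cycle({0: [1], 1: [0, 2], 2: [1]}))          # False (a path)
    print(has_cycle({0: [1, 2], 1: [0, 2], 2: [0, 1]}))    # True (a triangle)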

Possibilities of dividing a class into groups with several criteria

I have to divide a class of 50 students writing a dissertation into 10 different discussion groups of 5 members each. In theory, there are 1.35363x10^37 possible ways of doing this, which is just the result of 50!/((5!)^10 * 10!), if it is already decided that the groups will consist of 5.
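(As an aside, that first count is easy to check numerically; the snippet below just re-evaluates the formula above in Python:)

    from math import factorial

    # 50! / ((5!)^10 * 10!): partitions of 50 students into 10 unlabelled groups of 5
    ways = factorial(50) // (factorial(5) ** 10 * factorial(10))
    print(f"{ways:.5e}")   # about 1.35363e+37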
However, each group is to be led by a facilitator. This reduces the number of possible combinations considerably, because each facilitator has one field of expertise among 5 possible ones, which should be matched to the topics the students are writing about as much as possible. If there are three facilitators with competence A, three with competence B, two with competence C, one with competence D and one with competence E, and 15 students are assigned to A, 15 to B, 10 to C, 5 to D and 5 to E, the number of possible combinations comes down to 252,505.
But both students and facilitators keep advocating for the use of more criteria, instead of just focusing on field of expertise. For example, wanting to be in a group of students that know each other, or being in a group with a facilitator that has particular knowledge of a specific research method.
I am trying to illustrate my intuitive reasoning, which tells me that each new criterion increases the complexity/impossibility of the task, if the objective is a completely efficient solution. But I can't get my head around expressing this analytically in a satisfactory manner.
Is my reasoning correct that adding criteria would reduce the number of possibilities that can be discarded following the inclusion-exclusion principle, thus making the task more complex by adding possible combinations? I also think that if the criteria are not compatible (for example, if students who know each other are writing about different topics, and there aren't enough competent facilitators), certain constraints become infeasible.
You need to distinguish between computational complexity and human complexity. Adding constraints almost automatically increases the human complexity of the problem in the sense that it means that there is more to wrap your mind around. But -- it isn't true that the computational complexity increases. At least sometimes it decreases.
For example, say you have a set of 200 items and you want to determine if there is a subset of them which satisfies some constraint. Depending on the constraint, there might be no feasible way to do it. After all, 2^200 is much too large to brute-force. Now add the constraint that the subset needs to have exactly 3 elements. All of a sudden it is possible to brute force (just run through all 1,313,400 3-element subsets until you either find a solution or determine that none exists). This is enough to show that it isn't true that adding a constraint always makes a problem intrinsically more difficult. In the discrete case a new constraint can cut down on the size of the search space in a way that can be exploited. In the continuous case it can reduce degrees of freedom and thus lower the dimension of the problem. This isn't to say that it always makes things easier. Probably as a rule of thumb, additional constraints tend to make a problem more difficult.
Your actual problem isn't spelled out enough to give concrete advice. One possibility (and one way to handle a proliferation of somewhat extraneous constraints) is to divide the constraints into hard constraints which need to be satisfied and soft constraints which are merely desired but not strictly needed. Turn it into an optimization problem: find the solution which maximizes the number of soft-constraints that are satisfied, subject to the condition that it satisfies the hard constraints. Perhaps you can formulate it as an integer programming problem and hopefully find an exact solution. Or, if it is easy to generate solutions that satisfy the hard constraints and it is easy to mutate one such solution to obtain another (e.g. swap two students who are in different groups), then an evolutionary algorithm would be a reasonable heuristic.
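To illustrate that last suggestion, here is a rough local-search sketch, with entirely made-up hooks: hard_ok would check the expertise matching (and whatever else is non-negotiable), and soft_score would count how many of the soft wishes are met. It swaps random pairs of students between groups and keeps swaps that don't break hard constraints and don't lower the score:

    import random

    def improve(groups, hard_ok, soft_score, iterations=10_000):
        """Local search over a feasible group assignment.

        groups:     list of lists of students (must already satisfy the hard constraints).
        hard_ok:    callable(groups) -> bool, True if all hard constraints hold.
        soft_score: callable(groups) -> int, number of soft constraints satisfied.
        """
        best = soft_score(groups)
        for _ in range(iterations):
            g1, g2 = random.sample(range(len(groups)), 2)          # two distinct groups
            i = random.randrange(len(groups[g1]))
            j = random.randrange(len(groups[g2]))
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]   # swap two students
            score = soft_score(groups)
            if hard_ok(groups) and score >= best:
                best = score                                       # keep the non-worsening swap
            else:
                groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]  # undo it
        return groups, best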

decision tree for significant variables

How can I use a decision tree graph to determine the significant variables? I know the one with the largest information gain (i.e. the smallest resulting entropy) should be at the root of the tree. So this is my graph; if I want to know which variables are significant, how do I interpret it?
What does significant mean to you? At each node, the variable selected is the most significant one given the context, assuming that selecting by information gain actually works (it's not always the case). For example, at node 11, BB is the most significant discriminator given AA > 20.
Clearly, AA and BB are the most useful assuming selecting by information gain gives the best way to partition the data. The rest give further refinement. C and N would be next.
What you should be asking is: Should I keep all the nodes?
The answer depends on many things and there is likely no best answer.
One way would be to use the total case count of each leaf and merge them.
Not sure how I would do this given your image. It's not really clear what is being shown at the leaves and what 'n' is. Also not sure what 'p' is.
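For what it's worth, the quantity used to rank the splits above is straightforward to compute by hand; a minimal sketch (the class labels and the example split are made up):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

    def information_gain(parent_labels, child_label_groups):
        """Entropy of the parent node minus the weighted entropy of its children."""
        total = len(parent_labels)
        weighted = sum(len(g) / total * entropy(g) for g in child_label_groups)
        return entropy(parent_labels) - weighted

    parent = ["y"] * 5 + ["n"] * 5
    split = [["y"] * 4 + ["n"], ["y"] + ["n"] * 4]
    print(information_gain(parent, split))   # about 0.278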

How many different DFS's and BFS's can we make out of a graph? Do DFS's show more variety or BFS's?

I'm trying to find out how many different BFS and DFS trees I can construct out of a given graph. If this is impossible to determine exactly, then I want to know whether DFS trees show more variety than BFS trees. All of this with respect to the given graph, of course!
Thank you
Perhaps the complexities answer your question
Depth-first search
Worst-case performance: O(|E|) for explicit graphs traversed without repetition, O(b^d) for implicit graphs with branching factor b searched to depth d.
Worst-case space complexity: O(|V|) if the entire graph is traversed without repetition, O(longest path length searched) for implicit graphs without elimination of duplicate nodes.
&
Breadth-first search
Worst-case performance: O(|E|) = O(b^d)
Worst-case space complexity: O(|V|) = O(b^d)
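If you want a feel for the "variety" on small graphs, you can simply brute-force it: fix a start vertex, try every ordering of every adjacency list, build the resulting search tree, and count how many distinct edge sets appear. A rough sketch (exponential in the graph size, so only for toy inputs):

    from collections import deque
    from itertools import permutations, product

    def dfs_tree(order, start):
        """Tree edges produced by DFS when neighbours are tried in the given order."""
        visited, edges = {start}, set()
        stack = [(start, iter(order[start]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in visited:
                    visited.add(v)
                    edges.add((u, v))
                    stack.append((v, iter(order[v])))
                    break
            else:
                stack.pop()
        return frozenset(edges)

    def bfs_tree(order, start):
        """Tree edges produced by BFS when neighbours are tried in the given order."""
        visited, edges = {start}, set()
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in order[u]:
                if v not in visited:
                    visited.add(v)
                    edges.add((u, v))
                    queue.append(v)
        return frozenset(edges)

    def count_trees(adj, start, tree_fn):
        """Number of distinct search trees over all neighbour orderings (brute force)."""
        nodes = list(adj)
        trees = set()
        for combo in product(*(permutations(adj[u]) for u in nodes)):
            trees.add(tree_fn(dict(zip(nodes, combo)), start))
        return len(trees)

    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(count_trees(triangle, 0, bfs_tree))   # 1
    print(count_trees(triangle, 0, dfs_tree))   # 2

On the triangle, for instance, BFS from vertex 0 discovers both neighbours at depth 1 and always yields the same tree, while DFS yields two different paths, which matches the intuition that DFS trees show more variety.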

Explain BFS and DFS in terms of backtracking

Wikipedia about Depth First Search:
Depth-first search (DFS) is an algorithm for traversing or searching a tree, tree structure, or graph. One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.
So what is Breadth First Search?
"an algorithm that choose a starting
node, checks all nodes backtracks,
chooses the shortest path, chose neighbour nodes backtracks,
chose the shortest path, finally
finds the optimal path because of
traversing each path due to continuous
backtracking.
Regex find's pruning -- backtracking?
The term backtracking is confusing because of its variety of uses. An SO user explained UNIX find's pruning in terms of backtracking. Regex Buddy uses the term "catastrophic backtracking" when you do not limit the scope of your regexes. It seems to be too widely used as an umbrella term. So:
How do you define "backtracking" specifically for Graph Theory?
What is "backtracking" in Breadth First Search and Depth First Search?
[Added]
Good definitions about backtracking and examples
The Brute-force method
Stallman's(?) invented term "dependency-directed backtracking"
Backtracking and regex example
Depth First Search definition.
The confusion comes in because backtracking is something that happens during search, but it also refers to a specific problem-solving technique where a lot of backtracking is done. Such programs are called backtrackers.
Picture driving into a neighborhood, always taking the first turn you see (let's assume there are no loops) until you hit a dead end, at which point you drive back to the intersection of the next unvisited street. This is the "first" kind of backtracking, and it's roughly equivalent to colloquial usage of the word.
The more specific usage refers to a problem-solving strategy that is similar to depth-first search but backtracks when it realizes that it's not worth continuing down some subtree.
Put another way -- a naive DFS blindly visits each node until it reaches the goal. Yes, it "backtracks" on leaf nodes. But a backtracker also backtracks on useless branches. One example is searching a Boggle board for words. Each tile is surrounded by 8 others, so the tree is huge, and naive DFS can take too long. But when we see a combination like "ZZQ," we can safely stop searching from this point, since adding more letters won't make that a word.
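A condensed sketch of that pruning idea (the grid, the word list, and the prefix check are stand-ins, not a full Boggle solver):

    def find_words(grid, words):
        """Backtracking search over a letter grid, pruning paths that cannot
        start any word (the "ZZQ" case). grid: list of strings, words: set of strings."""
        prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
        rows, cols = len(grid), len(grid[0])
        found = set()

        def extend(r, c, path, used):
            word = path + grid[r][c]
            if word not in prefixes:           # prune: no word starts like this
                return
            if word in words:
                found.add(word)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in used:
                        extend(nr, nc, word, used | {(nr, nc)})

        for r in range(rows):
            for c in range(cols):
                extend(r, c, "", {(r, c)})
        return found

    print(find_words(["cat", "dog", "xqz"], {"cat", "cot", "god", "zebra"}))  # {'cat', 'cot', 'god'}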
I love these lectures by Prof. Julie Zelenski. She solves 8 queens, a sudoku puzzle, and a number substitution puzzle using backtracking, and everything is nicely animated.
Programming Abstractions, Lecture 10
Programming Abstractions, Lecture 11
A tree is a graph where any two vertices only have one path between them. This eliminates the possibility of cycles. When you're searching a graph, you will usually have some logic to eliminate cycles anyway, so the behavior is the same. Also, with a directed graph, you can't follow edges in the "wrong" direction.
From what I can tell, in the Stallman paper they developed a logic system that doesn't just say "yes" or "no" on a query but actually suggests fixes for incorrect queries by making the smallest number of changes. You can kind of see where the first definition of backtracking might come into play.
