Graph traversal - graph

At a party with n people P1, ..., Pn, certain pairs of individuals cannot stand each other.
Given a list of such pairs, determine if we can divide the n people into two groups such that all the people
in each group are amicable, that is, they can stand each other.

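Build a graph G in which two people are joined by an edge exactly when they cannot be in the same group. Run DFS on G: put the start vertex s in Group1, its successors in Group2, their successors in Group1, and so on, alternating at each step. If G is disconnected, repeat from any vertex that has not been visited yet. If the traversal finishes without a conflict, the two groups are the answer. If some edge ends up with both endpoints in the same group, there is a collision, which means we can't divide the people into two groups as the question asked.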
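The DFS idea above amounts to checking whether the "cannot stand each other" graph is two-colorable. Here is a minimal sketch, assuming people are numbered 0..n-1 and the pairs arrive as a list of tuples; the function name `split_into_two_groups` is made up for illustration.

```python
def split_into_two_groups(n, hostile_pairs):
    """Return (group1, group2) as lists of people 0..n-1, or None if impossible."""
    # Edge = the two people cannot be in the same group.
    adj = [[] for _ in range(n)]
    for a, b in hostile_pairs:
        adj[a].append(b)
        adj[b].append(a)

    group = [None] * n  # None = not visited yet, otherwise 0 or 1
    for start in range(n):
        if group[start] is not None:
            continue  # already handled as part of an earlier component
        group[start] = 0
        stack = [start]  # iterative DFS
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if group[v] is None:
                    group[v] = 1 - group[u]  # opposite group to u
                    stack.append(v)
                elif group[v] == group[u]:
                    return None  # collision: two enemies forced together
    return ([p for p in range(n) if group[p] == 0],
            [p for p in range(n) if group[p] == 1])
```

Each person and each pair is looked at a constant number of times, so this runs in time proportional to n plus the number of pairs.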

One brute-force solution would be to try every way of splitting the n people into two groups (the groups need not be the same size), verify that everyone in the first group is amicable and, if so, check everyone in the other group as well. If both sides are happy then you've found a solution. Otherwise, move on to the next split. Obviously, this is not an ideal solution, since the number of splits grows exponentially with n, but it does work deterministically. Typically in an interview, it is best to start with something that works and iterate on to better ideas.
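A more sophisticated solution would compute the complement graph (where an edge means two people get along), then remove any edges that are not bi-directional, pick an arbitrary node to start from, use depth-first search, and mark every node found as group 1. Then pick any unmarked node, and mark every node found from it as group 2. If there are any remaining unmarked nodes, then the individuals cannot be divided into two amicable groups. Note that being reachable in the complement graph does not by itself mean two people get along directly, so each group still has to be verified pairwise.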

Related

How to split vertices into two paths with restriction on which two vertices can be adjacent

I am trying to solve a problem where I have a queue of N people and M pairs of people who can't stand each other. I need to split these people into two queues such that people who can't stand each other are not adjacent within a queue, while minimizing the difference between the lengths of the two queues. All the approaches I've tried so far are either not working or extremely slow and I am not sure what to try next.

decision tree for significant variables

How can I use a decision tree graph to determine the significant variables? I know that the variable with the largest information gain should be at the root of the tree, which means it leaves the smallest entropy. This is my graph; if I want to know which variables are significant, how can I interpret it?
What does significant mean to you? At each node, the variable selected is the most significant given the context and assuming that selecting by information gain will actually work (it's not always the case). For example, at node 11, BB is the most significant discriminator given AA>20.
Clearly, AA and BB are the most useful assuming selecting by information gain gives the best way to partition the data. The rest give further refinement. C and N would be next.
What you should be asking is: Should I keep all the nodes?
The answer depends on many things and there is likely no best answer.
One way would be to use the total case count of each leaf to decide which leaves to merge.
Not sure how I would do this given your image. It's not really clear what is being shown at the leaves and what 'n' is. Also not sure what 'p' is.
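To make "selecting by information gain" concrete, here is a small sketch of how the gain of a single threshold split can be computed. The data is invented purely for illustration (it is not taken from the tree in the question), and the helper names `entropy` and `information_gain` are my own.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values, threshold):
    """Entropy reduction from splitting on feature <= threshold vs. > threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split does not separate anything
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical data: AA separates the classes perfectly, BB not at all.
labels = ["yes", "yes", "no", "no"]
aa = [10, 15, 25, 30]
bb = [1, 2, 1, 2]
gain_aa = information_gain(labels, aa, 20)
gain_bb = information_gain(labels, bb, 1)
```

With this toy data the split on AA has the larger gain, so it is the one a gain-based tree builder would place higher up.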

Find solution minimum spanning tree (with conditions) when extending graph

I have a logic question, so choose whichever of these two explanations you prefer:
Mathematical:
I have an undirected weighted complete graph over 2-14 nodes. The nodes always come in pairs (startpoint to endpoint). For this I already have the minimum spanning tree, which takes into account that each pair's startpoint always comes before its endpoint. Now I want to add another pair of nodes.
Real life explanation:
I already have an optimal taxi route for 1-7 people. Each joins (startpoint) and leaves (endpoint) at different places. Now I want to find the optimal route when I add another person to the taxi. I have already calculated the subpaths from each point to each point in my database (therefore this is a weighted graph). All calculated paths are real values, not heuristics.
Now I try to find the most performant solution to solve this. My current idea:
Find the point nearest to the new startpoint. Add it a) before and b) after this point. Choose the faster one.
Find the point nearest to the new endpoint. Add it a) before and b) after this point. Choose the faster one.
Ignoring the case that the new endpoint comes before the new start point, this seems feasible.
I expect that the general direction of the taxi is one direction, this eliminates the following edge case.
Is there any case I'm missing in which this algorithm wouldn't calculate the optimal solution?
There are definitely many cases where this algorithm (which is a First Fit construction heuristic) won't find the optimal solution. Given a reasonably sized dataset, in my experience, I would guess you'd get improvements of 10-20% by simply taking that result and adding metaheuristics (or other optimization algorithms).
Explanation:
If you have multiple taxis with a limited person capacity, it has an inherent bin packing problem, which is NP-hard (no known polynomial-time construction heuristic is guaranteed to solve it optimally).
But even if you have just 1 taxi, it is similar to TSP: if you have the optimal solution for 10 locations and add 1 location, it can create a snowball effect in the optimal solution to make the optimal solution look completely different. (sorry, no visual image of this yet)
And if you need to add any additional constraints on top of that later on, you need to be aware of these false assumptions.
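For the single-taxi case, one cheap step up from the nearest-point idea in the question is to try every pair of insertion positions for the new start and end point and keep the cheapest. The sketch below assumes the route is a list of stop names and `dist` is a nested dictionary of precomputed distances; both names, and the function `best_insertion`, are made up for illustration. It is still only a heuristic, because it never reorders the existing stops.

```python
def route_cost(route, dist):
    """Total length of a route, summing consecutive legs."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def best_insertion(route, start, end, dist):
    """Insert start and end (start first) at the cheapest pair of positions,
    keeping the order of the existing stops unchanged."""
    best_route, best_cost = None, float("inf")
    for i in range(len(route) + 1):
        with_start = route[:i] + [start] + route[i:]
        # The end point may only go somewhere after the start point.
        for j in range(i + 1, len(with_start) + 1):
            candidate = with_start[:j] + [end] + with_start[j:]
            cost = route_cost(candidate, dist)
            if cost < best_cost:
                best_route, best_cost = candidate, cost
    return best_route, best_cost

# Hypothetical stops on a straight line; distance = absolute difference.
coords = {"a": 0, "b": 10, "s": 2, "e": 8}
dist = {p: {q: abs(coords[p] - coords[q]) for q in coords} for p in coords}
new_route, new_cost = best_insertion(["a", "b"], "s", "e", dist)
```

For a route with k stops this checks on the order of k squared candidates, which is negligible for 14 nodes.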

How can I check if two graphs with LABELED vertices are isomorphic?

For example, suppose I had a graph G that had all blue nodes and one red node. I also had a graph F that had all blue and one red node.
What is an algorithm that I can run to verify that these two graphs are isomorphic with respect to their colored nodes?
I have made a few attempts at trying to create a polynomial graph isomorphism algorithm, and while I have yet to create an algorithm that is proven to be polynomial for every case, one algorithm I came up with is particularly suited for this purpose. It's based on a DFA minimization algorithm (the specific algorithm is http://en.wikipedia.org/wiki/DFA_minimization#Hopcroft.27s_algorithm ; you may want to find a description from elsewhere, since Wikipedia's is difficult to follow).
The original algorithm was initialized by organizing the vertexes into distinct groups based on degree (one group for vertexes of degree 1, one for vertexes of degree 2, etc.). For your purposes, you will want to organize the vertexes into groups based upon both degree and label; this will ensure that no two nodes will be paired if they have different labels. Each graph should have its own structure containing such groups. Check the collection of groups for both graphs; there should be the same number of groups for the two graphs, and for each group in one graph, there should be a group in the other graph containing the same number of vertexes of the same degree and label. If this isn't the case, the graphs aren't isomorphic.
At each iteration of the main algorithm, you should generate a new data structure for each of the two graphs for the vertex groups that the next step will use. For each group, generate a list for each vertex of group indices/IDs that correspond to the vertexes that are adjacent to the vertex in question (include duplicate groups in this list). Check each group to see if the sorted group index/ID list for each contained vertex is the same. If this is the case, create an unmodified copy of this group in the next step's group structure. If this isn't the case, then for each unique list of group indices/IDs within that group, create a new group for vertexes within the original group that generated that list and add this new group to the next step's group structure. If you do not subdivide any of the groups of either graph in a given iteration, stop running the main portion of this algorithm. If you subdivide at least one group, you will need to once again check to make sure the group structures of the two graphs correspond to each other. This check will be similar to the one performed at the end of the algorithm's initialization (you may even be able to use the same function for both). If this check fails, then the graphs aren't isomorphic. If the check passes, then discard/free the current group structures and start the next iteration with the freshly created ones.
To make the process of determining "corresponding groups" easier, I would highly recommend using a predictable scheme for adding groups to the structure. For example, if you add groups during initialization in (degree, label) order, subdivide groups in ascending index order, and add subdivided groups to the new structure based on the order of the group index list (i.e., sorted by first listed index, then second, etc.), then corresponding groups between the two group structures will always have the same index, which makes the process of keeping track of which groups correspond to each other much easier.
If all groups contain 3 or fewer vertexes when the algorithm completes, then the graphs are isomorphic (for corresponding groups containing 2 or 3 vertexes, any vertex pairing is valid). If this isn't the case (this always happens for graphs where all nodes have equal degree and label, and sometimes happens for subgraphs with that property), then the graphs are not yet determined to be isomorphic or non-isomorphic. To differentiate between the two cases, choose an arbitrary node of the first graph's largest group and separate it into its own group. Then, for each node of the other graph's largest group, try running the algorithm again with that node separated into its own group. In essence, you are choosing an unpaired node from the first graph and pairing it by guess-and-check to every node in the second graph that is still a plausible pairing. If any of the forked iterations returns an isomorphism, the graphs are isomorphic. If none of them do, the graphs are not isomorphic.
For general cases, this algorithm is polynomial. In corner cases, the algorithm might be exponential. Whether this is the case or not is related to how frequently the algorithm can be forced to fork in the worst case of both graph input and node selection, which I have had difficulties trying to put useful bounds on. For example, although the algorithm forks at every step when comparing two full graphs, every branch of that tree produces an isomorphism. The algorithm therefore returns in polynomial time in this case, since it only has to traverse one branch of the execution tree, even though traversing the entire tree would require exponential time.
Regardless, this algorithm should work well for your purposes. I hope my explanation of it was comprehensible; if not, I can try providing examples of the algorithm handling simple cases or expressing it as pseudocode instead.
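The refinement part of the description above (initial groups by degree and label, then repeated subdivision by the sorted list of neighbouring groups) can be sketched roughly as follows. This is my own reading, not the author's code: it leaves out the guess-and-check forking, assumes each graph is a dict mapping a vertex to its list of neighbours, assumes labels are mutually comparable (all strings, say), and uses a made-up function name.

```python
from collections import Counter

def refinement_rules_out_isomorphism(adj1, labels1, adj2, labels2):
    """True  -> refinement proves the graphs are NOT isomorphic.
    False -> undecided; further checks (e.g. the forking step) are needed."""
    graphs = [(adj1, labels1), (adj2, labels2)]
    # Initial groups: one per (degree, label) combination.
    colors = [{v: (len(adj[v]), labels[v]) for v in adj} for adj, labels in graphs]
    while True:
        # Corresponding groups must have the same sizes in both graphs.
        if Counter(colors[0].values()) != Counter(colors[1].values()):
            return True
        # Signature: own group plus sorted groups of neighbours (duplicates kept).
        sigs = [{v: (col[v], tuple(sorted(col[u] for u in adj[v]))) for v in adj}
                for (adj, _), col in zip(graphs, colors)]
        # Shared, predictable numbering so corresponding groups get the same id.
        all_sigs = sorted(set(sigs[0].values()) | set(sigs[1].values()))
        ids = {s: i for i, s in enumerate(all_sigs)}
        new_colors = [{v: ids[s] for v, s in sig.items()} for sig in sigs]
        # Stop once no group was subdivided in either graph.
        if all(len(set(n.values())) == len(set(c.values()))
               for n, c in zip(new_colors, colors)):
            return Counter(new_colors[0].values()) != Counter(new_colors[1].values())
        colors = new_colors

# Two trees with the same degree sequence that differ one refinement step later.
tree_a = {1: [2], 2: [1, 3, 6], 3: [2, 4], 4: [3, 5], 5: [4], 6: [2]}
tree_b = {1: [2], 2: [1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4], 6: [3]}
plain = {v: "x" for v in range(1, 7)}
trees_differ = refinement_rules_out_isomorphism(tree_a, plain, tree_b, plain)
```

A `False` result only means the refinement found no difference, which is exactly the situation where the forking step described above takes over.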
Years ago, I created a simple and flexible algorithm for exactly this problem (graph isomorphism with labels).
I named it "Powerhash", and creating it required two insights. The first is the power iteration graph algorithm, also used in PageRank. The second is the ability to replace power iteration's inner step function with anything that we want. I replaced it with a function that does the following on each iteration, and for each node:
Sort the hashes (from previous iteration) of the node's neighbors
Hash the concatenated sorted hashes
Replace node's hash with newly computed hash
On the first step, a node's hash is affected by its direct neighbors. On the second step, a node's hash is affected by the neighborhood 2-hops away from it. On the Nth step a node's hash will be affected by the neighborhood N-hops around it. So you only need to continue running the Powerhash for N = graph_radius steps. In the end, the graph center node's hash will have been affected by the whole graph.
To produce the final hash, sort the final step's node hashes and concatenate them together. After that, you can compare the final hashes to find if two graphs are isomorphic. If you have labels, then add them (on the first iteration) in the internal hashes that you calculate for each node.
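As a rough illustration of the steps just described, here is a small sketch. It is not the madIS implementation: the function name `powerhash`, the use of SHA-256, and running for as many steps as there are nodes (a safe upper bound on the radius) are all my assumptions. Keep in mind that different final hashes show the graphs are not isomorphic, while equal hashes are a strong hint rather than a proof, since some non-isomorphic graphs (regular ones, for example) hash identically under this kind of neighbourhood scheme.

```python
import hashlib

def _h(text):
    """Hex digest of a string; stands in for whatever hash function is used."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def powerhash(adj, labels=None, steps=None):
    """adj maps each node to a list of its neighbours; labels is optional."""
    labels = labels or {}
    # Labels enter through the initial per-node hashes.
    hashes = {v: _h(str(labels.get(v, ""))) for v in adj}
    steps = len(adj) if steps is None else steps  # assumption: upper bound on radius
    for _ in range(steps):
        # Sort the neighbours' previous hashes, concatenate, hash, replace.
        hashes = {v: _h("".join(sorted(hashes[u] for u in adj[v]))) for v in adj}
    # Final hash: sorted node hashes concatenated and hashed once more.
    return _h("".join(sorted(hashes.values())))

# Two triangles with one red node each, named differently.
tri1 = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
tri2 = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}
same = powerhash(tri1, {"a": "red"}) == powerhash(tri2, {"z": "red"})
```

Because only sorted hashes are combined, the result does not depend on how the nodes are named or ordered.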
For more on this you can look at my post here:
https://plus.google.com/114866592715069940152/posts/fmBFhjhQcZF
The algorithm above was implemented inside the "madIS" functional relational database. You can find the source code of the algorithm here:
https://github.com/madgik/madis/blob/master/src/functions/aggregate/graph.py
Just checking: do you mean strict graph isomorphism or something else? Isomorphic graphs have the same adjacency relations (i.e., if node A is adjacent to node B in one graph, then node g(A) is adjacent to node g(B) in the other graph, where g is the transformation that maps the first graph onto the second). If you just wanted to check if one graph has the same types and number of nodes as another, then you can just compare counts.

general idea and maybe example on graph

I'm looking for a general idea (and maybe some code example or at least pseudocode)
Now, this is from a problem that someone gave me, or rather showed me. I don't have to solve it, but I did most of the questions anyway. The problem that I'm having is this:
Let's say you have a directed weighted graph with the following nodes:
AB5, BC4, CD8, DC8, DE6, AD5, CE2, EB3, AE7
and the question is:
How many different routes are there from C to C with a distance of less than x (say, 10, 20, 30, 40)?
For x = 30, the different trips are: CDC, CEBC, CEBCDC, CDCEBC, CDEBC, CEBCEBC, CEBCEBCEBC.
The main problem I'm having with it is that when I do DFS or BFS, my implementation first chooses the node and marks it as visited, so I'm only able to find 2 paths, CDC and CEBC, and then my algorithm quits. If I don't mark it as visited, then on the next iteration (or recursive call) it will choose the same node and not the next available route, so I always have to mark nodes as visited. But if I do that, how can I get, for example, CEBCEBCEBC, which is pretty much bouncing between nodes?
I've looked at all the different algorithms books that I have at home, and while every one describes how to do DFS, BFS and find shortest paths (all the good stuff), none show how to iterate indefinitely and stop only when one reaches a certain total weight or hits a certain vertex a given number of times.
So why not just keep branching and branching? At each node you evaluate two things: has this particular path reached the weight limit (if so, terminate the branch), and is this node where I started (in which case log my path history to an 'acceptable solutions' list). Then make new branches, each of which takes a step in one of the possible directions.
You should not mark nodes as visited; as MikeB points out, CDCDC is a valid solution and yet it revisits D.
I'd do it like this:
Start with two lists of paths:
  Solutions (empty) and
  ActivePaths (containing one path, "C").
While ActivePaths is not empty,
  Take a path out of ActivePaths (suppose it's "CD"[8]).
  If its distance is under the limit,
    see where you are by looking at the last node in the path ("D").
    If you're at "C" (and the path is more than just the starting "C"), add a copy of this path to Solutions.
    Now for each possible next destination ("C", "E")
      make a copy of this path, ("CD"[8])
      append the destination, ("CDC"[8])
      add the weight, ("CDC"[16])
      and put it in ActivePaths
  Discard the path.
Whether this turns out to be a DFS, a BFS or something else depends on where in ActivePaths you insert and remove paths.
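The path-list procedure above translates fairly directly into code. Here is a sketch, assuming edges are given in the question's "AB5" notation (single-letter nodes followed by an integer weight); the function name `routes_under_limit` is made up. Taking paths from the front of the queue makes it a BFS, and it terminates because every edge weight is positive.

```python
from collections import deque

def routes_under_limit(edges, start, limit):
    """All routes from start back to start with total distance < limit."""
    graph = {}
    for edge in edges:  # e.g. "AB5" = edge from A to B with weight 5
        graph.setdefault(edge[0], []).append((edge[1], int(edge[2:])))

    solutions = []
    active = deque([(start, 0)])  # (path as a string, distance so far)
    while active:
        path, dist = active.popleft()
        if dist >= limit:
            continue  # over the limit: discard without branching
        if len(path) > 1 and path[-1] == start:
            solutions.append(path)  # back at the start: a valid route
        for nxt, weight in graph.get(path[-1], []):
            active.append((path + nxt, dist + weight))
    return solutions

edges = ["AB5", "BC4", "CD8", "DC8", "DE6", "AD5", "CE2", "EB3", "AE7"]
routes = routes_under_limit(edges, "C", 30)
```

Swapping `popleft()` for `pop()` turns the same loop into a DFS without changing the set of routes found.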
No offense, but this is pretty simple and you're talking about consulting a lot of books for the answer. I'd suggest playing around with the simple examples until they become more obvious.
In fact you have two different problems:
Find all distinct cycles from C to C, we will call them C_1, C_2, ..., C_n (done with a DFS)
Each C_i has a weight w_i, then you want every combination of cycles with a total weight less than N. This is a combinatorial problem (and seems to be easily solvable with dynamic programming).
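The second step can be sketched with a small dynamic program, under one assumption that I am adding: the cycles C_i are the "primitive" ones that do not pass through C in between, so every route is an ordered sequence of them. For the graph in the question those are CDC (16), CEBC (9) and CDEBC (21). The function name `count_routes` is made up, and weights are assumed to be positive integers.

```python
def count_routes(loop_weights, limit):
    """Count ordered sequences of loops whose total weight is below limit."""
    if limit <= 0:
        return 0
    # ways[t] = number of ordered loop sequences with total weight exactly t
    ways = [0] * limit
    ways[0] = 1  # the empty sequence
    for t in range(1, limit):
        ways[t] = sum(ways[t - w] for w in loop_weights if w <= t)
    return sum(ways[1:])  # exclude the empty sequence

total_under_30 = count_routes([16, 9, 21], 30)
```

This only counts the routes; listing them still needs an enumeration like the path-list approach.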

Resources