I've been studying up on graphs using the Adjacency List implementation, and I am reading that adding an edge is an O(1) operation.
This makes sense if you are just tacking an edge on to the Vertex's Linked List of edges, but I don't understand how this could be the case so long as you care about removing the old edge if one already exists. Finding that edge would take O(V) time.
If you don't do this, and you add an edge that already exists, you would have duplicate entries for that edge, which means they could have different weights, etc.
Can anyone explain what I seem to be missing? Thanks!
You're right about your complexity analysis: checking whether an edge already exists is indeed O(V). But notice that adding the edge, even if one already exists, is still O(1).
You need to remember that two edges with the same source and destination are valid input to a graph, even with different weights (arguably not despite the different weights, but because of them); that is simply a multigraph.
That way, adding an edge to an adjacency-list graph is O(1).
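As a minimal sketch (my code, not the question's), the append really is a single O(1) operation, duplicates and all:

#include <utility>
#include <vector>
using namespace std;

// Adjacency list of a weighted (multi)graph: adj[u] holds (neighbour, weight) pairs.
vector<vector<pair<int, int>>> adj;

// Amortized O(1): the list is never scanned, so a repeated (u, v) simply
// becomes a second, parallel edge.
void addEdge(int u, int v, int w) {
    adj[u].push_back({v, w});
}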
What people usually do to have both optimal search time complexity and the advantages of adjacency lists is to use an array of hashsets instead of an array of lists.
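If you go that route, a rough sketch (assuming integer vertex ids) could look like this; insertion and the duplicate check are both O(1) on average:

#include <unordered_set>
#include <vector>
using namespace std;

// One hash set of neighbours per vertex instead of a list.
vector<unordered_set<int>> adj;

// O(1) on average; the set silently ignores a duplicate edge.
void addEdge(int u, int v) {
    adj[u].insert(v);
}

// O(1) average lookup instead of an O(V) scan of a list.
bool hasEdge(int u, int v) {
    return adj[u].count(v) > 0;
}

If the edges carry weights and you want a new insertion to replace the old weight, an unordered_map from neighbour to weight does the same job.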
Alternatively,
If you want a worst-case optimal solution, use RadixSort to order the
list of all edges in O(V+E) time, remove duplicates, and then build
the adjacency list representation in the usual way.
source: https://www.quora.com/What-are-the-various-approaches-you-can-use-to-build-adjacency-list-representation-of-a-undirected-graph-having-time-complexity-better-than-O-V-*-E-and-avoiding-duplicate-edges
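A sketch of that idea (my code, assuming vertices are numbered 0..n-1 and the input is a list of directed (u, v) pairs): two stable counting-sort passes act as the radix sort, after which duplicate edges sit next to each other and can be skipped while building the lists.

#include <utility>
#include <vector>
using namespace std;

// Stable counting sort of the edge list by one of its two keys; O(V + E).
static void countingSortByKey(vector<pair<int, int>>& edges, int n, bool bySource) {
    vector<int> count(n + 1, 0);
    for (auto& e : edges) count[(bySource ? e.first : e.second) + 1]++;
    for (int i = 0; i < n; ++i) count[i + 1] += count[i];
    vector<pair<int, int>> out(edges.size());
    for (auto& e : edges) out[count[bySource ? e.first : e.second]++] = e;
    edges.swap(out);
}

// Build a duplicate-free adjacency list in O(V + E) overall.
vector<vector<int>> buildAdjacency(int n, vector<pair<int, int>> edges) {
    countingSortByKey(edges, n, false);  // least significant key: destination
    countingSortByKey(edges, n, true);   // most significant key: source (stable)
    vector<vector<int>> adj(n);
    for (size_t i = 0; i < edges.size(); ++i) {
        if (i > 0 && edges[i] == edges[i - 1]) continue;  // drop duplicate edge
        adj[edges[i].first].push_back(edges[i].second);
    }
    return adj;
}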
So let's say I have a matrix with two types of cells, 0 and 1, where 1 is not passable.
I want to find a point from which I can run paths (say, A*) to a bunch of destinations (I don't expect more than 4). And I want the lengths of these paths to be such that l1/l2/l3/l4 = 1, or as close to 1 as possible.
For two destinations it's simple: run a path between them and take the midpoint. For more destinations, I imagine I can run paths between each pair, then they will create a sort of polygon, and I could grab the centroid (or average of all path point coordinates)? Or would it be better to take all midpoints of paths between each pair and then use them as vertices in a polygon which will contain my desired point?
It seems you want to find the point with the best access to multiple endpoints. For other readers, this is like trying to find an ideal settlement site to trade with nearby cities; you want them all to be as accessible as possible. It appears to be a variant of the Weber Problem applied to pathfinding.
The best solution, as you can no longer rely on exploiting geometry (imagine a mountain path or two blocking the way), is going to be an iterative approach. I don't imagine it will be easy to find an optimal solution because you'll need to check every square; you can't guess by pathing between endpoints anymore. In nearly any large problem space, you will need to path from each possible centroid to all endpoints. A suboptimal solution will be fairly fast. I recommend these steps:
Try to estimate the centroid using geometry, forming a search area.
Use a modified A* algorithm from each point S in the search area to all your target points T to generate a perfect path from S to each T.
Add the lengths of the paths S -> T together to get the Cost of S (probably stored in a matrix for all sample points).
Select the lowest Cost from all your samples in the matrix (or from the entire population if you didn't cull the search space).
The algorithm above can also work without estimating a centroid and limiting solutions. If you choose to search the entire space, the search will be much longer, but you can find a perfect solution even in a labyrinth. If you estimate the centroid and start the search near it, you'll find good answers faster.
I mentioned earlier that you should use a modified A* algorithm... Rather than repeating a generic A* search S->Tn for every T, code A* so that it seeks multiple target locations, storing the paths to each one and stopping when it has found them all.
If you really want a perfect solution to the problem, you'll be waiting a long time, so I recommend that you use any exploit you can to reduce wasteful calculations. Even go so far as to store found paths in a lookup table for each T, and see if a point already exists along any of those paths.
To put it simply, finding the point is easy. Finding it fast-enough might take lots of clever heuristics (cost-saving measures) and stored data.
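As a rough illustration of steps 2-4 (my code and names, not the answer's): on an unweighted 0/1 grid the same totals can be collected by flood-filling (plain BFS) once from each target instead of running A* from every candidate, then summing the distance fields and taking the cheapest passable cell. This corresponds to searching the entire space rather than a culled area around an estimated centroid.

#include <climits>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

// BFS distance field from one start cell over a 0/1 grid (0 = passable,
// 1 = blocked). Unreachable cells keep the value INT_MAX.
vector<vector<int>> bfsDistances(const vector<vector<int>>& grid, pair<int, int> start) {
    int rows = grid.size(), cols = grid[0].size();
    vector<vector<int>> dist(rows, vector<int>(cols, INT_MAX));
    queue<pair<int, int>> q;
    dist[start.first][start.second] = 0;
    q.push(start);
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [r, c] = q.front();
        q.pop();
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (grid[nr][nc] == 1 || dist[nr][nc] != INT_MAX) continue;
            dist[nr][nc] = dist[r][c] + 1;
            q.push({nr, nc});
        }
    }
    return dist;
}

// Pick the passable cell minimizing the summed distance to all targets.
pair<int, int> bestMeetingPoint(const vector<vector<int>>& grid,
                                const vector<pair<int, int>>& targets) {
    vector<vector<vector<int>>> fields;
    for (auto& t : targets) fields.push_back(bfsDistances(grid, t));
    pair<int, int> best{-1, -1};
    long long bestCost = LLONG_MAX;
    for (int r = 0; r < (int)grid.size(); ++r) {
        for (int c = 0; c < (int)grid[0].size(); ++c) {
            if (grid[r][c] == 1) continue;
            long long cost = 0;
            bool reachable = true;
            for (auto& f : fields) {
                if (f[r][c] == INT_MAX) { reachable = false; break; }
                cost += f[r][c];
            }
            if (reachable && cost < bestCost) { bestCost = cost; best = {r, c}; }
        }
    }
    return best;
}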
If each node maps to a set of the nodes it has edges to, instead of a list, we gain constant-time lookup of edges instead of having to traverse the whole list. The only disadvantage I can think of is slightly more memory overhead and time to enumerate the edges of a node, but not asymptotically significantly so.
I think "list" is just a generic label, not to be taken literally. I've used a Set and it works perfectly well.
If I recall correctly, my textbook also used a Set.
I'm trying to solve the standard bipartization problem, i.e., find a subset of the edges such that the output graph is a bipartite graph.
My additional constraints are:
The number of vertices on each side must be equal.
Each vertex has exactly 1 edge.
In fact, it would suffice to know whether such a subset exists at all - I don't really need the construction itself.
Ideally, the algorithm should be fast, as I need to run it repeatedly on graphs of around 400 nodes.
If each vertex is to be incident to exactly one edge, it seems what you want is a perfect matching. If so, Edmonds's blossom algorithm will do the job. I haven't used an implementation of the algorithm myself, so I can't recommend one, but you might check out http://www.algorithmic-solutions.com/leda/ledak/index.htm
I'm reading a book about algorithms ("Data Structures and Algorithms in C++") and have come across the following exercise:
Ex. 20. Modify cycleDetectionDFS() so that it could determine whether a particular edge is part of a cycle in an undirected graph.
In the chapter about graphs, the book reads:
Let us recall from a preceding section that depth-first search
guaranteed generating a spanning tree in which no elements of edges
used by depthFirstSearch() led to a cycle with other elements of edges.
This was due to the fact that if vertices v and u belonged to edges,
then the edge(vu) was disregarded by depthFirstSearch(). A problem
arises when depthFirstSearch() is modified so that it can detect
whether a specific edge(vu) is part of a cycle (see Exercise 20).
Should such a modified depth-first search be applied to each edge
separately, then the total run would be O(E(E+V)), which could turn
into O(V^4) for dense graphs. Hence, a better method needs to be
found.
The task is to determine if two vertices are in the same set. Two
operations are needed to implement this task: finding the set to which
a vertex v belongs and uniting two sets into one if vertex v belongs
to one of them and w to another. This is known as the union-find
problem.
Later on, the author describes how to merge two sets into one when an edge passed to the function union(edge e) connects vertices in distinct sets.
However, I still don't know how to quickly check whether an edge is part of a cycle. Could someone give me a rough explanation of such an algorithm, related to the aforementioned union-find problem?
A rough explanation: check whether a link is a back edge. Whenever you have a back edge you have a cycle, and whenever you have a cycle you have a back edge (that is true for both directed and undirected graphs).
A back edge is an edge that points from a descendant to an ancestor. You should know that when traversing a graph with a DFS algorithm you build a forest, and an ancestor is a node that is marked finished later in the traversal.
I gave you some pointers on where to look; let me know if that helps clarify your problem.
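To connect this to the union-find passage quoted from the book: an edge (v, u) of an undirected graph is part of a cycle exactly when v and u are still connected once that one edge is removed (i.e. the edge is not a bridge). A minimal sketch of that check (my code, not the book's):

#include <numeric>
#include <utility>
#include <vector>
using namespace std;

// Minimal union-find (disjoint sets) with path compression.
struct DSU {
    vector<int> parent;
    explicit DSU(int n) : parent(n) { iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

// Union every edge except the one being tested, then ask whether its two
// endpoints ended up in the same set. Roughly O(E * alpha(V)) per query.
bool edgeOnCycle(int n, const vector<pair<int, int>>& edges, int edgeIndex) {
    DSU dsu(n);
    for (int i = 0; i < (int)edges.size(); ++i) {
        if (i == edgeIndex) continue;  // skip the edge under test
        dsu.unite(edges[i].first, edges[i].second);
    }
    return dsu.find(edges[edgeIndex].first) == dsu.find(edges[edgeIndex].second);
}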
There are three ways to store a graph in memory:
Nodes as objects and edges as pointers
A matrix containing all edge weights between numbered node x and node y
A list of edges between numbered nodes
I know how to write all three, but I'm not sure I've thought of all of the advantages and disadvantages of each.
What are the advantages and disadvantages of each of these ways of storing a graph in memory?
One way to analyze these is in terms of memory and time complexity (which depends on how you want to access the graph).
Storing nodes as objects with pointers to one another
The memory complexity for this approach is O(n) because you have as many objects as you have nodes. The number of pointers (to nodes) required is up to O(n^2) as each node object may contain pointers for up to n nodes.
The time complexity for this data structure is O(n) for accessing any given node.
Storing a matrix of edge weights
This would be a memory complexity of O(n^2) for the matrix.
The advantage with this data structure is that the time complexity to access any given node is O(1).
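For instance, a minimal sketch (my naming, with 0 standing in for "no edge"):

#include <vector>
using namespace std;

// Adjacency matrix: weight[u][v] holds the edge weight, with 0.0 used here
// as the "no edge" sentinel. Space is O(n^2), but any edge query is O(1).
struct MatrixGraph {
    vector<vector<double>> weight;
    explicit MatrixGraph(int n) : weight(n, vector<double>(n, 0.0)) {}
    void addEdge(int u, int v, double w) { weight[u][v] = w; }        // O(1)
    bool hasEdge(int u, int v) const { return weight[u][v] != 0.0; }  // O(1)
};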
Depending on what algorithm you run on the graph and how many nodes there are, you'll have to choose a suitable representation.
A couple more things to consider:
The matrix model lends itself more easily to graphs with weighted edges, by storing the weights in the matrix. The object/pointer model would need to store edge weights in a parallel array, which requires synchronization with the pointer array.
The object/pointer model works better with directed graphs than undirected graphs because the pointers would need to be maintained in pairs, which can become unsynchronized.
The objects-and-pointers method suffers from difficulty of search, as some have noted, but is pretty natural for doing things like building binary search trees, where there's a lot of extra structure.
I personally love adjacency matrices because they make all kinds of problems a lot easier, using tools from algebraic graph theory. (The kth power of the adjacency matrix gives the number of walks of length k from vertex i to vertex j, for example. Add an identity matrix before taking the kth power to get the number of walks of length <= k. Take a rank n-1 minor of the Laplacian to get the number of spanning trees... And so on.)
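A tiny worked instance of the first claim (my numbers, not the answer's): for the path graph on vertices {1, 2, 3} with edges 1-2 and 2-3, the adjacency matrix A has ones exactly at (1,2), (2,1), (2,3), (3,2). Squaring it gives (A^2)[1][3] = 1, the single length-2 walk 1-2-3, and (A^2)[2][2] = 2, the closed walks 2-1-2 and 2-3-2.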
But everyone says adjacency matrices are memory expensive! They're only half-right: You can get around this using sparse matrices when your graph has few edges. Sparse matrix data structures do exactly the work of just keeping an adjacency list, but still have the full gamut of standard matrix operations available, giving you the best of both worlds.
I think your first example is a little ambiguous — nodes as objects and edges as pointers. You could keep track of these by storing only a pointer to some root node, in which case accessing a given node may be inefficient (say you want node 4 — if the node object isn't provided, you may have to search for it). In this case, you'd also lose portions of the graph that aren't reachable from the root node. I think this is the case f64 rainbow is assuming when he says the time complexity for accessing a given node is O(n).
Otherwise, you could also keep an array (or hashmap) full of pointers to each node. This allows O(1) access to a given node, but increases memory usage a bit. If n is the number of nodes and e is the number of edges, the space complexity of this approach would be O(n + e).
The space complexity for the matrix approach would be along the lines of O(n^2) (assuming edges are unidirectional). If your graph is sparse, you will have a lot of empty cells in your matrix. But if your graph is fully connected (e = n^2), this compares favorably with the first approach. As RG says, you may also have fewer cache misses with this approach if you allocate the matrix as one chunk of memory, which could make following a lot of edges around the graph faster.
The third approach is probably the most space efficient for most cases — O(e) — but would make finding all the edges of a given node an O(e) chore. I can't think of a case where this would be very useful.
Take a look at the comparison table on Wikipedia. It gives a pretty good understanding of when to use each representation of graphs.
Okay, so if edges don't have weights, the matrix can be a binary array, and using binary operators can make things go really, really fast in that case.
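For example, a sketch with an assumed fixed upper bound MAXN on the number of vertices: each row of the matrix becomes a std::bitset, and a bitwise AND of two rows yields the common neighbours of two vertices one machine word at a time.

#include <bitset>
#include <cstddef>

// Unweighted graph on up to MAXN vertices, one bitset row per vertex.
const std::size_t MAXN = 1024;  // assumed compile-time bound
std::bitset<MAXN> adjacency[MAXN];

void addEdge(std::size_t u, std::size_t v) {
    adjacency[u].set(v);
    adjacency[v].set(u);  // undirected
}

// Common neighbours of u and v in O(MAXN / 64) word operations.
std::bitset<MAXN> commonNeighbours(std::size_t u, std::size_t v) {
    return adjacency[u] & adjacency[v];
}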
If the graph is sparse, the object/pointer method seems a lot more efficient. Holding the object/pointers in a data structure specifically to coax them into a single chunk of memory might also be a good plan, or any other method of getting them to stay together.
The adjacency list - simply a list of connected nodes - seems by far the most memory efficient, but probably also the slowest.
Reversing a directed graph is easy with the matrix representation, and easy with the adjacency list, but not so great with the object/pointer representation.
There is another option: nodes as objects, edges as objects too, each edge being at the same time in two doubly-linked lists: the list of all edges coming out from the same node and the list of all edges going into the same node.
struct Node {
    ... node payload ...
    Edge *first_in;   // All incoming edges
    Edge *first_out;  // All outgoing edges
};

struct Edge {
    ... edge payload ...
    Node *from, *to;
    Edge *prev_in_from, *next_in_from;  // dlist of same "from"
    Edge *prev_in_to, *next_in_to;      // dlist of same "to"
};
The memory overhead is big (2 pointers per node and 6 pointers per edge) but you get
O(1) node insertion
O(1) edge insertion (given pointers to "from" and "to" nodes)
O(1) edge deletion (given the pointer)
O(deg(n)) node deletion (given the pointer)
O(deg(n)) finding neighbors of a node
The structure also can represent a rather general graph: oriented multigraph with loops (i.e. you can have multiple distinct edges between the same two nodes including multiple distinct loops - edges going from x to x).
A more detailed explanation of this approach is available here.
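As a rough sketch of the O(1) edge insertion (my code, building on the structs above rather than taken from the linked explanation), the new edge is simply spliced onto the front of both intrusive lists:

Edge* insert_edge(Node* from, Node* to) {
    Edge* e = new Edge;
    e->from = from;
    e->to = to;

    // Splice into the list of edges leaving `from`.
    e->prev_in_from = nullptr;
    e->next_in_from = from->first_out;
    if (from->first_out) from->first_out->prev_in_from = e;
    from->first_out = e;

    // Splice into the list of edges entering `to`.
    e->prev_in_to = nullptr;
    e->next_in_to = to->first_in;
    if (to->first_in) to->first_in->prev_in_to = e;
    to->first_in = e;

    return e;
}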