Does ParMetis generate any information about the neighbors of a processor? - mpi

I am working on a parallel finite element method on moving meshes.
So I will need to call ParMETIS_V3_AdaptiveRepart from ParMetis to perform re-partitioning every time I re-mesh.
When it succeeds, the function only generates the partitioning information, i.e. which elements belong to which processor.
However, the neighbors of a process are important as well, in order to construct the ghost layers of a sub-mesh.
So I am wondering if there is any efficient way to get the information about shared (overlapped) entities and neighbors, or does ParMetis actually provide this information?

In ParMetis, the function ParMETIS_V3_AdaptiveRepart does more or less the same thing as ParMETIS_V3_PartKway.
The output of ParMETIS_V3_PartKway is part: "an array of size equal to the number of locally-stored vertices. Upon successful completion the partition vector of the locally-stored vertices is written to this array."
It also returns the number of edges that are cut (which is only part of what you want).
But METIS does not provide a way to create the "ghost layers" as you elegantly say.
However, since you have created the graph, you know how to find the neighbours of each element. You can then check whether a neighbouring element is in your current process's graph and whether part[element] == part[neighbour_element]. If the neighbour element is not in your current process, you will have to do a bit of MPI. A sketch of this post-processing follows.
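For illustration, here is a minimal C++ sketch of that scan, assuming the distributed CSR arrays (vtxdist, xadj, adjncy) in the format the ParMETIS manual describes; the names GhostInfo and findGhostCandidates are hypothetical, and idx_t must match your ParMETIS build:

#include <cstdint>
#include <vector>

using idx_t = std::int64_t; // assumption: ParMETIS built with IDXTYPEWIDTH=64

struct GhostInfo {
    std::vector<idx_t> cutLocalVertices; // local vertices with a cut edge
    std::vector<idx_t> remoteNeighbours; // global ids whose part[] must be fetched via MPI
};

// Scan the local CSR graph after ParMETIS_V3_AdaptiveRepart to classify edges.
GhostInfo findGhostCandidates(int rank,
                              const idx_t* vtxdist, // global vertex range per process
                              const idx_t* xadj,    // local CSR row pointers
                              const idx_t* adjncy,  // neighbours, in global numbering
                              const idx_t* part)    // output of AdaptiveRepart
{
    GhostInfo info;
    const idx_t first = vtxdist[rank], last = vtxdist[rank + 1];
    for (idx_t v = 0; v < last - first; ++v) {
        for (idx_t j = xadj[v]; j < xadj[v + 1]; ++j) {
            const idx_t u = adjncy[j];
            if (u >= first && u < last) {
                // Neighbour is stored locally: compare partitions directly.
                if (part[v] != part[u - first])
                    info.cutLocalVertices.push_back(v);
            } else {
                // Neighbour lives on another process: its partition must be
                // obtained with a bit of MPI, as described above.
                info.remoteNeighbours.push_back(u);
            }
        }
    }
    return info;
}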

Related

How to find a point on a 2-d weighted map which will have equidistant (as close as possible) paths to multiple endpoints?

So let's say I have a matrix with two types of cells, 0 and 1, where 1 is not passable.
I want to find a point from which I can run paths (say, A*) to a bunch of destinations (I don't expect more than 4). And I want the lengths of these paths to be equal, or as close to equal as possible (all pairwise ratios at or near 1).
For two destinations it's simple: run a path between them and take the midpoint. For more destinations, I imagine I can run paths between each pair; they will create a sort of polygon, and I could grab the centroid (or the average of all path point coordinates)? Or would it be better to take the midpoints of the paths between each pair and use them as vertices of a polygon which will contain my desired point?
It seems you want to find the point with the best access to multiple endpoints. For other readers: this is like trying to find an ideal settlement site for trading with nearby cities; you want those cities to be as accessible as possible. It appears to be a variant of the Weber problem applied to pathfinding.
The best solution, as you can no longer rely on exploiting geometry (imagine a mountain path or two blocking the way), is going to be an iterative approach. I don't imagine it will be easy to find an optimal solution because you'll need to check every square; you can't guess by pathing between endpoints anymore. In nearly any large problem space, you will need to path from each possible centroid to all endpoints. A suboptimal solution will be fairly fast. I recommend these steps:
Try to estimate the centroid using geometry, forming a search area.
Use a modified A* algorithm from each point S in the search area to all your target points T to generate a perfect path from S to each T.
Add the lengths of the paths S -> T together to get a total Cost (probably stored in a matrix for all sample points).
Select the lowest Cost from all your samples in the matrix (or the entire population if you didn't cull the search space).
The algorithm above can also work without estimating a centroid and limiting solutions. If you choose to search the entire space, the search will be much longer, but you can find a perfect solution even in a labyrinth. If you estimate the centroid and start the search near it, you'll find good answers faster.
I mentioned earlier that you should use a modified A* algorithm... Rather than repeating a generic A* search S->Tn for every T, code A* so that it seeks multiple target locations, storing the paths to each one and stopping when it has found them all.
If you really want a perfect solution to the problem, you'll be waiting a long time, so I recommend that you use any exploit you can to reduce wasteful calculations. Even go so far as to store found paths in a lookup table for each T, and see if a point already exists along any of those paths.
To put it simply, finding the point is easy. Finding it fast enough might take lots of clever heuristics (cost-saving measures) and stored data.
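To make the Cost-matrix idea concrete, here is a rough C++ sketch under simplifying assumptions: a uniform-cost grid, so plain BFS stands in for A*, and the search is flipped around (one BFS per target T instead of one search per candidate S, which yields the full Cost matrix with only |T| searches). The function names are made up for illustration:

#include <climits>
#include <queue>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>; // 0 = passable, 1 = blocked

// Distances from (sr, sc) to every reachable cell (INT_MAX = unreachable).
std::vector<std::vector<int>> bfsDistances(const Grid& g, int sr, int sc) {
    const int R = g.size(), C = g[0].size();
    std::vector<std::vector<int>> d(R, std::vector<int>(C, INT_MAX));
    std::queue<std::pair<int,int>> q;
    d[sr][sc] = 0;
    q.push({sr, sc});
    const int dr[4] = {1, -1, 0, 0}, dc[4] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [r, c] = q.front(); q.pop();
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr >= 0 && nr < R && nc >= 0 && nc < C &&
                g[nr][nc] == 0 && d[nr][nc] == INT_MAX) {
                d[nr][nc] = d[r][c] + 1;
                q.push({nr, nc});
            }
        }
    }
    return d;
}

// Passable cell minimizing the summed path length to all targets.
std::pair<int,int> bestMeetingPoint(const Grid& g,
                                    const std::vector<std::pair<int,int>>& targets) {
    const int R = g.size(), C = g[0].size();
    std::vector<std::vector<long long>> cost(R, std::vector<long long>(C, 0));
    for (const auto& [tr, tc] : targets) {
        const auto d = bfsDistances(g, tr, tc);
        for (int r = 0; r < R; ++r)
            for (int c = 0; c < C; ++c)
                cost[r][c] = (d[r][c] == INT_MAX || cost[r][c] == LLONG_MAX)
                                 ? LLONG_MAX : cost[r][c] + d[r][c];
    }
    std::pair<int,int> best{-1, -1};
    long long bestCost = LLONG_MAX;
    for (int r = 0; r < R; ++r)
        for (int c = 0; c < C; ++c)
            if (g[r][c] == 0 && cost[r][c] < bestCost) {
                bestCost = cost[r][c];
                best = {r, c};
            }
    return best; // (-1, -1) if no cell reaches every target
}

This minimizes the sum of path lengths (the Weber-style objective discussed above); if you want the lengths to be as equal as possible instead, swap the summed cost for, say, the spread between the longest and shortest distance per cell.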

Why are graphs represented using an adjacency list instead of an adjacency set?

If each node maps to a set of the nodes it has edges to, instead of a list, we would get constant-time lookup of edges instead of having to traverse the whole list. The only disadvantage I can think of is slightly more memory overhead and time to enumerate the edges of a node, but nothing asymptotically significant.
I think "list" is just a generic label, not to be taken literally. I've used a Set and it works perfectly well.
If I recall correctly, my textbook also used a Set.
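As a tiny C++ illustration of the trade-off (the graph shown is arbitrary):

#include <unordered_map>
#include <unordered_set>
#include <vector>

int main() {
    // "List" flavour: compact, fast to enumerate, O(deg) membership test.
    std::unordered_map<int, std::vector<int>> adjList{
        {0, {1, 2}}, {1, {0}}, {2, {0}}};

    // "Set" flavour: O(1) expected membership test, a bit more memory.
    std::unordered_map<int, std::unordered_set<int>> adjSet{
        {0, {1, 2}}, {1, {0}}, {2, {0}}};

    const bool hasEdge = adjSet[0].count(2) > 0; // constant-time edge lookup
    return hasEdge ? 0 : 1;
}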

Union-Find algorithm and determining whether an edge belongs to a cycle in a graph

I'm reading a book about algorithms ("Data Structures and Algorithms in C++") and have come across the following exercise:
Ex. 20. Modify cycleDetectionDFS() so that it could determine whether a particular edge is part of a cycle in an undirected graph.
In the chapter about graphs, the book reads:
Let us recall from a preceding section that depth-first search
guaranteed generating a spanning tree in which no elements of edges
used by depthFirstSearch() led to a cycle with other element of edges.
This was due to the fact that if vertices v and u belonged to edges,
then the edge(vu) was disregarded by depthFirstSearch(). A problem
arises when depthFirstSearch() is modified so that it can detect
whether a specific edge(vu) is part of a cycle (see Exercise 20).
Should such a modified depth-first search be applied to each edge
separately, then the total run would be O(E(E+V)), which could turn
into O(V^4) for dense graphs. Hence, a better method needs to be
found.
The task is to determine if two vertices are in the same set. Two
operations are needed to implement this task: finding the set to which
a vertex v belongs and uniting two sets into one if vertex v belongs
to one of them and w to another. This is known as the union-find
problem.
Later on, the author describes how to merge two sets into one when an edge passed to the function union(edge e) connects vertices in distinct sets.
However, I still don't know how to quickly check whether an edge is part of a cycle. Could someone give me a rough explanation of such an algorithm, related to the aforementioned union-find problem?
A rough explanation: check whether the edge is a back edge. Whenever you have a back edge you have a cycle, and whenever you have a cycle you have a back edge (this holds for both directed and undirected graphs).
A back edge is an edge that points from a descendant to an ancestor. You should know that when traversing a graph with a DFS algorithm you build a forest, and an ancestor is a node that is marked finished later in the traversal.
I have given you some pointers on where to look; let me know if that helps you clarify the problem.
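If you specifically want the union-find angle from the book's hint, here is a sketch (my own, not necessarily the book's intended solution, and assuming a simple graph): an undirected edge (u, v) lies on a cycle exactly when u and v are still connected after that edge is removed, which you can test by uniting all the other edges and then comparing the roots of u and v.

#include <numeric>
#include <utility>
#include <vector>

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0); // each vertex is its own set
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]]; // path halving
        return x;
    }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

// True iff edge e = (u, v) lies on some cycle: union every OTHER edge, then
// check whether u and v are still connected. Assumes a simple graph.
bool edgeOnCycle(int n, const std::vector<std::pair<int,int>>& edges,
                 std::pair<int,int> e) {
    UnionFind uf(n);
    for (const auto& [u, v] : edges)
        if (std::make_pair(u, v) != e && std::make_pair(v, u) != e)
            uf.unite(u, v);
    return uf.find(e.first) == uf.find(e.second);
}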

Traversal of directed acyclic weighted graph with constraints

I have a directed acyclic weighted graph which I want to traverse.
The constraints for a valid solution route are:
The sum of the weights of all edges traversed in the route must be the highest possible in the graph, bearing in mind the second constraint.
Exactly N vertices must have been visited in the chosen route (including the start and end vertex).
Typically the graph will have a high amount of vertices and edges, so trying all possibilities is not an option, and requires quite an efficient algorithm.
Looking for some pointers or a suitable algorithm for this problem. I know the first condition is easily fulfilled using Dijkstra's algorithm, but I am not sure how to incorporate the second condition, or even where to begin to look.
Please let me know if any additional information is needed.
I'm not sure whether you are interested in any path of length N in the graph or just a path between two specific vertices; I suspect the latter, but you did not mention that constraint in your question.
If the former, the solution should be a trivial Dijkstra-like algorithm where you sort all edges by their potential path value, which starts at the edge weight and gets adjusted by already-built adjacent paths. In each iteration, take the node with the best potential path value and add it to an adjacent path. Stop when you get a path of length N (or longer, which you cut off at the sides). There are some other technical details, especially w.r.t. creating long paths, but I won't go into them as I suspect this is not what you are interested in. :-)
If you have a fixed source and sink, I think there is no deep magic involved - just run a basic Dijkstra where a path is associated with each vertex added to the queue, but do not insert vertices with path length >= N into the queue, and do not insert the sink into the queue unless its path length is N.
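Since the graph is a DAG, another standard option (my suggestion, not part of the answer above) is dynamic programming over a topological order, where best[v][k] is the maximum weight of a path ending at v that visits exactly k vertices. A sketch in C++:

#include <algorithm>
#include <climits>
#include <queue>
#include <utility>
#include <vector>

// best[v][k] = max weight of a path ending at v visiting exactly k vertices.
// Returns LLONG_MIN when no path with exactly N vertices exists. For a fixed
// source and sink, initialize only best[source][1] and read best[sink][N].
long long maxWeightPathExactlyN(int n, int N,
        const std::vector<std::vector<std::pair<int, long long>>>& adj) {
    std::vector<int> indeg(n, 0);
    for (int u = 0; u < n; ++u)
        for (const auto& [v, w] : adj[u]) indeg[v]++;

    const long long NEG = LLONG_MIN;
    std::vector<std::vector<long long>> best(n, std::vector<long long>(N + 1, NEG));
    std::queue<int> q;
    for (int v = 0; v < n; ++v) {
        best[v][1] = 0; // a single-vertex path has no edges, weight 0
        if (indeg[v] == 0) q.push(v);
    }
    while (!q.empty()) { // Kahn's algorithm: process vertices in topological order
        const int u = q.front(); q.pop();
        for (const auto& [v, w] : adj[u]) {
            for (int k = 1; k < N; ++k)
                if (best[u][k] != NEG && best[u][k] + w > best[v][k + 1])
                    best[v][k + 1] = best[u][k] + w;
            if (--indeg[v] == 0) q.push(v);
        }
    }
    long long ans = NEG;
    for (int v = 0; v < n; ++v) ans = std::max(ans, best[v][N]);
    return ans;
}

This runs in O((V + E) * N) time and O(V * N) memory, which is efficient as long as N stays modest.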

Three ways to store a graph in memory, advantages and disadvantages

There are three ways to store a graph in memory:
Nodes as objects and edges as pointers
A matrix containing all edge weights between numbered node x and node y
A list of edges between numbered nodes
I know how to write all three, but I'm not sure I've thought of all of the advantages and disadvantages of each.
What are the advantages and disadvantages of each of these ways of storing a graph in memory?
One way to analyze these is in terms of memory and time complexity (which depends on how you want to access the graph).
Storing nodes as objects with pointers to one another
The memory complexity for this approach is O(n) because you have as many objects as you have nodes. The number of pointers (to nodes) required is up to O(n^2) as each node object may contain pointers for up to n nodes.
The time complexity for this data structure is O(n) for accessing any given node.
Storing a matrix of edge weights
This would be a memory complexity of O(n^2) for the matrix.
The advantage with this data structure is that the time complexity to access any given node is O(1).
Depending on what algorithm you run on the graph and how many nodes there are, you'll have to choose a suitable representation.
A couple more things to consider:
The matrix model lends itself more easily to graphs with weighted edges, by storing the weights in the matrix. The object/pointer model would need to store edge weights in a parallel array, which requires synchronization with the pointer array.
The object/pointer model works better with directed graphs than undirected graphs because the pointers would need to be maintained in pairs, which can become unsynchronized.
The objects-and-pointers method suffers from difficulty of search, as some have noted, but is pretty natural for doing things like building binary search trees, where there's a lot of extra structure.
I personally love adjacency matrices because they make all kinds of problems a lot easier, using tools from algebraic graph theory. (The kth power of the adjacency matrix gives the number of walks of length k from vertex i to vertex j, for example. Add an identity matrix before taking the kth power to see which vertices are reachable by walks of length <= k. Take any cofactor of the Laplacian to get the number of spanning trees, by the matrix-tree theorem... And so on.)
But everyone says adjacency matrices are memory expensive! They're only half-right: You can get around this using sparse matrices when your graph has few edges. Sparse matrix data structures do exactly the work of just keeping an adjacency list, but still have the full gamut of standard matrix operations available, giving you the best of both worlds.
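A tiny C++ demo of the matrix-power claim (dense multiplication, purely to illustrate; entries of A^k count walks, which may revisit vertices):

#include <cstddef>
#include <cstdio>
#include <vector>

using Mat = std::vector<std::vector<long long>>;

Mat multiply(const Mat& A, const Mat& B) {
    const std::size_t n = A.size();
    Mat C(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

int main() {
    // Path graph 0 - 1 - 2.
    const Mat A = {{0, 1, 0},
                   {1, 0, 1},
                   {0, 1, 0}};
    const Mat A2 = multiply(A, A);
    // A2[0][2] == 1: exactly one walk of length 2 from 0 to 2 (through 1).
    std::printf("walks of length 2 from 0 to 2: %lld\n", A2[0][2]);
    return 0;
}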
I think your first example is a little ambiguous — nodes as objects and edges as pointers. You could keep track of these by storing only a pointer to some root node, in which case accessing a given node may be inefficient (say you want node 4 — if the node object isn't provided, you may have to search for it). In this case, you'd also lose portions of the graph that aren't reachable from the root node. I think this is the case f64 rainbow is assuming when he says the time complexity for accessing a given node is O(n).
Otherwise, you could also keep an array (or hashmap) full of pointers to each node. This allows O(1) access to a given node, but increases memory usage a bit. If n is the number of nodes and e is the number of edges, the space complexity of this approach would be O(n + e).
The space complexity for the matrix approach would be along the lines of O(n^2) (assuming edges are unidirectional). If your graph is sparse, you will have a lot of empty cells in your matrix. But if your graph is fully connected (e = n^2), this compares favorably with the first approach. As RG says, you may also have fewer cache misses with this approach if you allocate the matrix as one chunk of memory, which could make following a lot of edges around the graph faster.
The third approach is probably the most space efficient for most cases — O(e) — but would make finding all the edges of a given node an O(e) chore. I can't think of a case where this would be very useful.
Take a look at the comparison table on Wikipedia. It gives a pretty good understanding of when to use each representation of a graph.
Okay, so if edges don't have weights, the matrix can be a binary array, and using binary operators can make things go really, really fast in that case.
If the graph is sparse, the object/pointer method seems a lot more efficient. Holding the object/pointers in a data structure specifically to coax them into a single chunk of memory might also be a good plan, or any other method of getting them to stay together.
The adjacency list - simply a list of connected nodes - seems by far the most memory efficient, but probably also the slowest.
Reversing a directed graph is easy with the matrix representation, and easy with the adjacency list, but not so great with the object/pointer representation.
There is another option: nodes as objects, edges as objects too, with each edge sitting in two doubly-linked lists at the same time: the list of all edges coming out of the same node and the list of all edges going into the same node.
struct Edge; // forward declaration so Node can refer to Edge

struct Node {
    // ... node payload ...
    Edge *first_in;   // head of the list of all incoming edges
    Edge *first_out;  // head of the list of all outgoing edges
};

struct Edge {
    // ... edge payload ...
    Node *from, *to;
    Edge *prev_in_from, *next_in_from; // dlist of edges with the same "from"
    Edge *prev_in_to, *next_in_to;     // dlist of edges with the same "to"
};
The memory overhead is big (2 pointers per node and 6 pointers per edge) but you get
O(1) node insertion
O(1) edge insertion (given pointers to "from" and "to" nodes)
O(1) edge deletion (given the pointer)
O(deg(n)) node deletion (given the pointer)
O(deg(n)) finding neighbors of a node
The structure also can represent a rather general graph: oriented multigraph with loops (i.e. you can have multiple distinct edges between the same two nodes including multiple distinct loops - edges going from x to x).
A more detailed explanation of this approach is available here.
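For instance, the O(1) edge insertion could look like this sketch (using the structs above; payload initialization and error handling omitted):

// Splice a new edge onto the heads of both doubly-linked lists.
Edge* insert_edge(Node* from, Node* to) {
    Edge* e = new Edge();
    e->from = from;
    e->to = to;

    // Hook into the "same from" list (from's outgoing edges).
    e->prev_in_from = nullptr;
    e->next_in_from = from->first_out;
    if (from->first_out) from->first_out->prev_in_from = e;
    from->first_out = e;

    // Hook into the "same to" list (to's incoming edges).
    e->prev_in_to = nullptr;
    e->next_in_to = to->first_in;
    if (to->first_in) to->first_in->prev_in_to = e;
    to->first_in = e;

    return e;
}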
