I have a set of disconnected graphs, each one represented by a set of triples (head, edge, tail).
Given a new graph, I want to find which graph in the forest is structurally most similar to the new graph.
I am able to create feature vectors representing the individual nodes and edges of each graph, but I don't understand how to represent and use them for the purpose of finding similarity.
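One common approach, sketched below only as an illustration (the dimensions and data are invented, and pooling is just one of many options), is to pool the per-node and per-edge feature vectors into a single graph embedding and then pick the forest graph with the highest cosine similarity:

```python
import numpy as np

def graph_embedding(node_feats, edge_feats):
    """Pool per-node and per-edge feature vectors into one fixed-size graph vector."""
    return np.concatenate([node_feats.mean(axis=0), edge_feats.mean(axis=0)])

def most_similar(query_emb, forest_embs):
    """Index of the forest graph whose embedding has the highest cosine similarity."""
    sims = [
        np.dot(query_emb, emb) / (np.linalg.norm(query_emb) * np.linalg.norm(emb))
        for emb in forest_embs
    ]
    return int(np.argmax(sims))

# Hypothetical data: 2 forest graphs, 8-dim node features, 4-dim edge features.
forest = [
    (np.random.rand(5, 8), np.random.rand(7, 4)),
    (np.random.rand(9, 8), np.random.rand(12, 4)),
]
forest_embs = [graph_embedding(n, e) for n, e in forest]
query = (np.random.rand(6, 8), np.random.rand(10, 4))
print(most_similar(graph_embedding(*query), forest_embs))
```

Mean pooling throws away structural information, so graph kernels or GNN-based graph embeddings may work better; the sketch only shows how per-node/per-edge vectors can be turned into something comparable.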
Related
I know that there are many techniques to represent graphs.
Suppose I have a massive directed 3D graph with at most 100,000 nodes.
Suppose each node of the graph has three pieces of information:
A 30-character string as a label
Floating-point values as coordinates
Three integer values
The graph is dynamic, i.e., the connections change frequently and the nodes frequently change their coordinates.
What would be the most efficient way to represent this graph in computer memory so that I can apply mathematical operations on each node?
Should I use ordinary data structures, or should I turn to big-data analytics or machine learning?
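One possible sketch, assuming Python/NumPy and assuming the coordinates are three floats x, y, z (that detail is not stated above), is a struct-of-arrays layout for the per-node attributes plus adjacency sets for the frequently changing edges:

```python
import numpy as np

N_MAX = 100_000

# Struct-of-arrays layout: per-node attributes live in contiguous NumPy arrays,
# so vectorised math over all nodes (translations, distances, ...) is cheap.
labels = np.empty(N_MAX, dtype="U30")            # 30-character label per node
coords = np.zeros((N_MAX, 3), dtype=np.float64)  # assumed x, y, z coordinates
ints   = np.zeros((N_MAX, 3), dtype=np.int64)    # the three integer values

# Adjacency as a list of sets keeps edge insertion/removal cheap,
# which suits a graph whose connections change frequently.
out_edges = [set() for _ in range(N_MAX)]

def add_edge(u, v):
    out_edges[u].add(v)

def remove_edge(u, v):
    out_edges[u].discard(v)

# Example of a bulk mathematical operation on the node data: shift every node.
coords += np.array([1.0, 0.0, -0.5])
```

At this size (10^5 nodes) plain in-memory data structures are usually enough; big-data tooling tends to pay off only when the graph no longer fits in one machine's memory.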
I have two successive graphs; they might have different vertex sets and edge sets. I want to use a GNN to learn the node positions of the second graph, in order to preserve the mental map. Which GNN model and loss could do this task?
I have found a partitioning algorithm called hMETIS that works on hypergraphs, but my input is a simple weighted graph. Is there any technique that maps a graph to a hypergraph?
In general: No.
A graph contains information on binary interactions between two vertices, and there is no way to extract information about higher-order interactions from it.
In short, if you give me a hypergraph I can use any of several methods to turn it into a graph, but that graph could be the result of multiple different hypergraphs.
There are a few exceptions to this, notably if you have more information about the vertices outside of the graph, or if the graph is bipartite.
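To make the information loss concrete, here is a small sketch of my own (not from the answer above) of the standard clique expansion: every hyperedge becomes a clique on its vertices, and two different hypergraphs can expand to the same graph, so the mapping cannot be inverted uniquely.

```python
from itertools import combinations

def clique_expansion(hyperedges):
    """Turn a hypergraph (list of vertex sets) into a simple graph (set of 2-element edges)."""
    edges = set()
    for he in hyperedges:
        for u, v in combinations(sorted(he), 2):
            edges.add((u, v))
    return edges

# Two different hypergraphs ...
h1 = [{"a", "b", "c"}]                      # one 3-vertex hyperedge
h2 = [{"a", "b"}, {"b", "c"}, {"a", "c"}]   # three ordinary edges
# ... expand to the same graph.
print(clique_expansion(h1) == clique_expansion(h2))  # True
```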
There are many methods for calculating graph similarity, such as vertex/edge overlap, Jaccard similarity, cosine similarity, edit distance, signature similarity, lambda distance, DeltaCon, and so on. These methods are based on graphs with a single edge between any pair of nodes, but many real-world graphs have multiple edges.
Given two similar graphs like the ones above, how could we calculate graph similarity?
With the previous similarity measures, the graph is a 2-dimensional matrix whose entries are just scalars, but in a graph with multiple edges the entries should be tuples, because there can be more than one kind of interaction between two nodes. The previous methods could be called a who-knows-whom scheme, while the latter kind of graph could be described as who-knows-whom-and-how. Perhaps the previous methods can be extended to multi-edge graphs easily, which would explain why I cannot find any dedicated logic or methods for them.
Thanks in advance!
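To make the tuple-entry idea concrete, here is a tiny sketch of my own (the node names and interaction types are invented) of a multi-edge adjacency structure whose entries are tuples of interactions rather than scalar weights:

```python
from collections import defaultdict

# Hypothetical multigraph: each (u, v) pair maps to a tuple of interactions,
# i.e. "who knows whom AND how", instead of a single scalar weight.
adj = defaultdict(tuple)
adj[("alice", "bob")] += ("email",)
adj[("alice", "bob")] += ("phone",)   # a second, parallel edge
adj[("bob", "carol")] += ("email",)

print(adj[("alice", "bob")])  # ('email', 'phone')
```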
There is not "the" way yo compute graph similarity.
Depending on your data and problem, very different approaches may be good. In many cases, simply merging the two edges into one makes perfect sense. For example, if I have two roads of capacity x and y to go from A to B - for many analyses this is comparable to having just one rode, with the combined capacity.
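As a minimal sketch of that merging step (my own illustration; the triple format and the weights are assumed, not taken from the question), parallel edges can be collapsed into a simple weighted graph by summing their capacities:

```python
from collections import defaultdict

def merge_parallel_edges(multi_edges):
    """Collapse a multigraph given as (u, v, weight) triples into a simple
    weighted graph by summing the weights of parallel edges."""
    merged = defaultdict(float)
    for u, v, w in multi_edges:
        merged[(u, v)] += w
    return dict(merged)

# Two roads from A to B with capacities 3 and 5 become one road of capacity 8.
print(merge_parallel_edges([("A", "B", 3.0), ("A", "B", 5.0), ("B", "C", 2.0)]))
```

After merging, any of the single-edge similarity measures listed in the question can be applied to the resulting weighted graph.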
I have found this paper so far. Is it outdated? Are there any faster and better implementations?
By the way, Wikipedia says that there can be n^{n-2} spanning trees in an undirected graph. How many spanning trees can there be in a directed graph?
If you use the terms from the paper you mentioned and define a spanning tree of a directed graph as a tree rooted at a vertex r, with a unique path from r to every other vertex, then:
It's obvious that the worst case, in which a directed graph has the greatest number of spanning trees, is the complete graph (there are edges a->b and b->a for every pair of vertices).
If we "forget" about directions we get n^{n-2} spanning trees, as in the undirected case. For each of these spanning trees we have n options for choosing a root, and this choice uniquely defines the directions of the edges we need to use. It is not hard to see that all the trees we obtain are spanning, distinct, and that there are no other options. So we get n^{n-1} spanning trees. A strict proof would take some time; I hope this simple explanation is enough.
So in the worst case this task takes time exponential in the number of vertices. Considering the size of the output (all spanning trees), I conclude that for an arbitrary graph an algorithm cannot be significantly faster or better. I think you need to somehow reformulate your original problem so that you do not have to deal with all spanning trees, and perhaps search only for the ones that satisfy some criterion.
For undirected graphs only:
n^{n-2} spanning trees are possible only for the complete graph. To find the total number of spanning trees of any graph, you can apply the following method (Kirchhoff's matrix-tree theorem):
Construct the following matrix of the graph (this is the Laplacian, i.e. the degree matrix minus the adjacency matrix):
If the column index is i and the row index is j, then:
if i = j, the entry is the degree of vertex i;
if there is a single edge between vertices v_i and v_j, the entry is -1; if there are two edges, it is -2, and so on; if there is no edge, it is 0.
After constructing this matrix, delete any one row and the corresponding column, i.e. the Nth row and the Nth column.
The determinant of the remaining matrix is the total number of spanning trees.
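For reference, here is a short sketch of this method in Python/NumPy (my own illustration; the helper name and the example graph are made up). It builds the Laplacian and takes one cofactor, which is exactly the procedure described above:

```python
import numpy as np

def count_spanning_trees(adj):
    """Kirchhoff's matrix-tree theorem: the number of spanning trees of an
    undirected (multi)graph equals any cofactor of its Laplacian L = D - A."""
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    reduced = laplacian[1:, 1:]            # drop one row and the matching column
    return int(round(np.linalg.det(reduced)))

# The complete graph K4 has 4^(4-2) = 16 spanning trees.
k4 = [[0, 1, 1, 1],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 0]]
print(count_spanning_trees(k4))  # 16
```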