What I'm currently doing is:
1. Train a GNN and see which graphs are labelled incorrectly compared to the ground truth.
2. Use a GNN explainer model to identify the minimal subgraph responsible for the mislabelling, by examining the incorrectly labelled instances.
3. Use graph_edit_distance from networkx to measure how much these subgraphs differ from one another.
4. See whether I can find clusters that help explain why the GNN mislabels some graphs.
Does this seem reasonable?
How would I go about step 4? Would I use something like sklearn_extra.cluster.KMedoids?
All help is appreciated!
"Use graph_edit_distance from networkx to see how much these graphs differ from one another."
I'm guessing this gives you a single number for any pair of graphs.
The question is: in what direction does this number point? How many dimensions (directions) are there? Suppose two graphs have the same distance from a third: does this mean that the two graphs are close together, forming a cluster at a distance from the third graph?
If you have answers to the questions in the previous paragraph, then the KMeans algorithm can find clusters in however many dimensions you might have. It is fast and easy to code, and usually gives satisfactory results. https://en.wikipedia.org/wiki/K-means_clustering
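To make step 4 concrete: since graph_edit_distance yields pairwise distances rather than coordinates, the KMedoids the asker mentioned is a natural fit, as it works directly on a precomputed distance matrix. A minimal sketch (helper name and cluster count are my own, for illustration), assuming small graphs (exact edit distance gets expensive fast; networkx's optimize_graph_edit_distance gives a cheaper upper bound) and that sklearn_extra is installed:

    import networkx as nx
    import numpy as np
    from sklearn_extra.cluster import KMedoids

    def cluster_subgraphs(graphs, n_clusters=3):
        """Cluster explainer subgraphs on a precomputed pairwise
        graph-edit-distance matrix; this sidesteps the 'how many
        dimensions?' issue, since no coordinates are needed."""
        n = len(graphs)
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                D[i, j] = D[j, i] = nx.graph_edit_distance(graphs[i], graphs[j])
        return KMedoids(n_clusters=n_clusters, metric="precomputed",
                        random_state=0).fit_predict(D)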
I want to conduct an experiment about graph drawing algorithms, and for this purpose I have to generate graphs, but I don't know what the optimal number of vertices to generate is: 100 or 200? What is the largest number of vertices that humans can understand and comprehend? How can I decide this? Do you have any ideas, or papers that would be useful to me? I searched for this topic in Google Scholar and many other paper search engines, but I did not find anything.
Thanks in advance
This is a very broad question. Size and type of graphs may depend on the research focus.
The GDToolkit (which I am not affiliated with) publishes several graph drawing test case collections from the academic literature, which might be a starting point.
In general, graph drawing gets more interesting as the number of vertices grows, especially once labelling comes into play.
A vertex count of up to 100 (maybe more for graphs with a structure to exploit geometrically) has the benefit that you can ask humans to lay out the graph and compare their results with what the tested algorithms produce.
As for the maximum number of vertices that people can 'understand', there is no fixed limit: think of a 2D or 3D lattice; the number of vertices up to which humans can grasp the essence of such a graph is virtually unlimited.
There is of course a lot of leeway in what exactly you mean by 'understand'. In general, human respondents will be able to state non-trivial properties of the graph, or form hypotheses about such properties, if some visual pattern shows up (this might be an interesting research topic in itself [I have not checked for existing work in this domain]; think of 'distorted' drawings of lattices, or drawings of projections of lattices in higher dimensions).
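If the generation part is the obstacle, a minimal sketch of a test set that sweeps vertex counts instead of committing to a single 'optimal' number, assuming networkx (the particular generators and sizes are illustrative assumptions, not recommendations):

    import networkx as nx

    # Mix structured graphs (easy for humans to grasp, per the lattice
    # argument above) with random ones (hard) across a range of sizes.
    test_cases = {}
    for n in (25, 50, 100, 200):
        test_cases[f"random_{n}"] = nx.gnm_random_graph(n, 2 * n, seed=1)
        side = round(n ** 0.5)
        test_cases[f"grid_{side}x{side}"] = nx.grid_2d_graph(side, side)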
There are many approaches to computing graph similarity, such as vertex/edge overlap, Jaccard, cosine, edit distance, signature similarity, lambda distance, DeltaCon, and so on. These are all based on graphs with a single edge between nodes, but many real-world graphs have multiple edges.
Given two similar graphs like the ones above, how could we calculate graph similarity?
With the previous similarity measures, each adjacency entry is just a scalar (a number), but in a graph with multiple edges the entry should be a tuple, because there is more than one kind of action between nodes. The previous methods could be described as a who-knows-whom scheme, whereas the latter graph is more of a who-knows-whom-and-how scheme. I don't think the previous methods can be applied to graphs with multiple edges easily, so there seem to be no established methods for this.
Thanks in advance!
There is not "the" way yo compute graph similarity.
Depending on your data and problem, very different approaches may be good. In many cases, simply merging the two edges into one makes perfect sense. For example, if I have two roads of capacity x and y to go from A to B - for many analyses this is comparable to having just one rode, with the combined capacity.
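A minimal sketch of that merging idea, assuming networkx and an additive edge attribute (here called capacity, to match the road example; the helper name is my own). After the merge, any of the single-edge similarity measures above can be applied:

    import networkx as nx

    def merge_parallel_edges(mg):
        """Collapse a MultiGraph into a simple Graph, summing the
        'capacity' attribute of parallel edges."""
        g = nx.Graph()
        g.add_nodes_from(mg.nodes(data=True))
        for u, v, data in mg.edges(data=True):
            w = data.get("capacity", 1)
            if g.has_edge(u, v):
                g[u][v]["capacity"] += w
            else:
                g.add_edge(u, v, capacity=w)
        return g

    mg = nx.MultiGraph()
    mg.add_edge("A", "B", capacity=3)
    mg.add_edge("A", "B", capacity=2)  # a second road from A to B
    print(merge_parallel_edges(mg)["A"]["B"]["capacity"])  # 5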
The detection of edges in 3D objects may be the first step for the automatic processing of particular characteristics and landmarks.
Thus, I'm looking for a method to identify such edges in some of my 3D-scanned objects.
However, none of my ideas (Hough transform, angle thresholds for neighboring vertices) have succeeded.
So I'd be quite happy if someone could point me to a solution to the edge-finding problem for 3D point clouds that can be applied using R.
There is a nice paper from last year about this topic.
Basically, you need to compute several features for each point, based on its neighbors.
I usually prefer Python over R, so I'm not aware of any point-cloud processing package in R, but implementing that paper in R should be easy.
If you can translate Python to R, you can take a look at this library that I wrote, as it already implements the computation of all the features mentioned in that paper.
If that helps, in this answer you can find example code on how to add the curvature for each point. You just have to replace the word curvature with the names of the other features.
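For readers doing the Python-to-R translation, a minimal sketch of one such per-point feature, the eigenvalue-based 'surface variation' commonly used as a curvature proxy, assuming plain numpy/scipy rather than the library linked above (helper name, k, and threshold are my own illustrative choices):

    import numpy as np
    from scipy.spatial import cKDTree

    def surface_variation(points, k=16):
        """For each point, take its k nearest neighbors, form the
        local covariance, and return lambda_min / (l1 + l2 + l3).
        Points on flat patches score near 0; points near sharp
        edges score high. k is an assumption: tune it to the scan
        density."""
        tree = cKDTree(points)
        _, idx = tree.query(points, k=k)
        feats = np.empty(len(points))
        for i, nbrs in enumerate(idx):
            nbh = points[nbrs] - points[nbrs].mean(axis=0)
            eigvals = np.linalg.eigvalsh(nbh.T @ nbh)  # ascending order
            feats[i] = eigvals[0] / eigvals.sum()
        return feats

    pts = np.random.rand(1000, 3)               # stand-in for a scan
    edge_mask = surface_variation(pts) > 0.05   # threshold is an assumption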
I have been trying to create an algorithm that can draw a graph. It is not a tree, as nodes can have multiple parents; it is more like an activity diagram. My problem is with placing nodes on the x axis so that they do not overlap each other. I have been looking around for months now, but I have been unable to find any information relevant to this kind of graph. So I was wondering if some of you might know of an algorithm that can solve this problem, or have an idea on what approach I should take.
Here you can see my problem: the red nodes overlap other nodes.
My best approach right now is to add everything to rows, as sketched below.
With this approach, the tree above will look like this:
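For what it's worth, a minimal sketch of that row-based idea, assuming networkx and a DAG; crossing reduction (e.g. the barycenter heuristic from Sugiyama-style layered layouts) is deliberately left out, and the helper name is my own:

    import networkx as nx

    def layered_positions(g, x_gap=1.0, y_gap=1.0):
        """Row-based placement: each node's row is its longest-path
        distance from a root (so a node always sits below all of its
        parents), and nodes within a row are spaced evenly so they
        cannot overlap on the x axis."""
        row = {}
        for n in nx.topological_sort(g):
            row[n] = 1 + max((row[p] for p in g.predecessors(n)), default=-1)
        rows = {}
        for n, r in row.items():
            rows.setdefault(r, []).append(n)
        pos = {}
        for r, nodes in rows.items():
            for i, n in enumerate(nodes):
                # center each row around x = 0
                pos[n] = (i * x_gap - (len(nodes) - 1) * x_gap / 2, -r * y_gap)
        return pos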
Sorry for posting this on a programming site, but there may be many programmers here who are experts in geometry and 3D geometry, so please allow it.
I have been given best-fitted planes together with the original point data. I want to model a pyramid from this data, as the data represent a pyramid. My approach to this modelling is:
1. Find the intersection lines (e.g. AB, CD, etc.) for each pair of adjacent planes.
2. Find the pyramid top (T) from the previously found lines, since these lines don't pass through a single point.
3. Intersect the available side planes with a desired horizontal plane to get the base.
In the figure, black triangles are the original best-fitted triangles; red and blue triangles are model triangles.
I want to show that the points fit the pyramid model better than they fit the given best-fitted planes. (Assume the original planes are updated as shown.)
Actually, step 2 is done using a weighted least-squares process. Each intersection line is assigned a weight, proportional to the angle between the normal vectors of the corresponding planes. In this step, I try to find the point that is closest to all the intersection lines, i.e. point T. Because of the weights, the line positions might shift under the influence of a high-weight line, which means the original planes could change a little. So I want to show that these new plane positions fit the original point data better than the original planes do.
Any idea how to show this? I am thinking of using RMSE and showing the before and after values. But then again, I think I should use a weighted RMSE, since all the planes meeting at point T are influenced, so I should treat this as a global case rather than looking at individual planes. But I can't figure out a way to show this, or maybe I should use some other measure.
So, I am confused and have no idea how to show this. Please help!
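For reference, the "point closest to all intersection lines" in step 2 has a closed-form weighted least-squares solution. A minimal sketch (helper name is my own), assuming each line is given by a point p_i, a unit direction d_i, and a weight w_i, and that numpy is available:

    import numpy as np

    def weighted_closest_point(points, dirs, weights):
        """Point T minimizing sum_i w_i * dist(T, line_i)^2, via the
        normal equations sum_i w_i (I - d_i d_i^T) (T - p_i) = 0."""
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for p, d, w in zip(points, dirs, weights):
            d = np.asarray(d) / np.linalg.norm(d)
            P = np.eye(3) - np.outer(d, d)  # projector orthogonal to the line
            A += w * P
            b += w * (P @ p)
        return np.linalg.solve(A, b)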
If you are given the best-fit planes, why not intersect the three of them to get a single unambiguous T, then determine the lines AT, BT, and CT?
This is not a rhetorical question, by the way. Your actual question seems to be a request for reassurance that your procedure yields "well-fitted" results, but you have not explained or described what kind of fit you're looking for!
Unfortunately, without this information, your question cannot be answered as asked. If you describe your goals, we may be able to help you achieve them -- or, if you have not yet articulated them for yourself, that exercise may be enough to let you answer your own question...
That said, I will mention that the only difference between the planes you started with and the planes your procedure ends up with should be due to floating point error. This is because, geometrically speaking, all three lines should intersect at the same point as the planes that generated them.
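A minimal sketch of that three-plane intersection (helper name is my own), assuming each plane is given in Hessian normal form n_i . x = c_i and that numpy is available:

    import numpy as np

    def intersect_three_planes(normals, offsets):
        """Solve the 3x3 system n_i . x = c_i for the single point
        shared by three planes; np.linalg.solve raises LinAlgError
        if any two planes are parallel."""
        return np.linalg.solve(np.asarray(normals, float),
                               np.asarray(offsets, float))

    # hypothetical side planes of a pyramid with apex at (0, 0, 1)
    T = intersect_three_planes([[1, 0, 1], [-1, 0, 1], [0, 1, 1]],
                               [1, 1, 1])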