Graph Drawing With Weighted Edges

I'm looking to build an algorithm (or reuse one) that organizes nodes and edges on a two-dimensional canvas, where edges can have corresponding weights.
Any starting material and info would be helpful.

What would the weights do to affect their placement on your canvas?
That being said, you might want to look into Graphviz and its DOT language: you describe the graph in DOT, and Graphviz's layout engines organize the nodes on a canvas for you.
Many graph visualization frameworks use a force-based simulation, in which all nodes exert a repulsive force against each other (with their mass being their size), and edges exert tension on the nodes they connect. This creates aesthetically arranged graph visualizations.
Although, again, I'm not sure where you want node "weights" to come into play. Do you want weighted nodes to be closer to the center? To be larger? Further apart?
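The force-based simulation described above can be sketched in a few dozen lines of dependency-free Python. This is a minimal Fruchterman-Reingold-style sketch under stated assumptions, not the algorithm of any particular framework: every pair of nodes repels with force k²/d, every edge attracts with force d²/k multiplied by its weight (so heavier edges pull harder), the step size cools geometrically, and node sizes/masses are ignored. The function name `force_layout` and its parameters are hypothetical.

```python
import math
import random


def force_layout(nodes, edges, width=1.0, height=1.0, iterations=200, seed=0):
    """Fruchterman-Reingold-style layout (a sketch).

    nodes: list of hashable ids
    edges: list of (u, v, weight); a larger weight pulls u and v closer together
    Returns a dict node -> (x, y) inside the width x height box.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal pairwise distance
    temperature = width / 10  # maximum displacement per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Every edge pulls its endpoints together with force weight * d^2 / k.
        for u, v, w in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = w * d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move each node at most `temperature` and keep it inside the canvas.
        for n in nodes:
            dx, dy = disp[n]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / d * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / d * step))
        temperature *= 0.95  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Usage: a heavy edge a-b and a light edge b-c.
pos = force_layout(["a", "b", "c"], [("a", "b", 5.0), ("b", "c", 0.1)])
print(pos)  # a and b end up close together; c is pushed away from b
```

Scaling only the attractive force by the weight is one of several reasonable choices; the all-pairs repulsion is O(n²) per iteration, which is fine for small graphs but is what real libraries approximate for large ones.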

Many graph/network layout algorithms are implicitly capable of handling weighted networks, but you may need to do some pre-processing and tweak the implementation to get it to work. Usually the first step is to determine whether your weights represent "similarities" (usually interpreted to mean that stronger weights should place nodes closer together) or "dissimilarities" (stronger weights = farther apart). The most common case is the former, so you will need to translate them to dissimilarities, often by subtracting each edge value from the maximum observed edge value in the network. The matrix of dissimilarity values for each edge can then be fed to the algorithm and interpreted as desired distances in the layout space for each edge (i.e. "spring lengths"), usually after multiplying by some constant to transform them to display units (pixels).
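A minimal Python sketch of that pre-processing followed by a simple distance-based layout. Assumptions, all mine rather than the answer's: edges come as `(u, v, similarity)`; dissimilarity is `max_w - w` as described above; a small offset `min_len` is added so the strongest edge does not get a zero-length spring; `pixels_per_unit` is the display scaling constant; and the layout minimizes the stress Σ(|p_u − p_v| − L_uv)² over edges only by plain gradient descent, a simplified stand-in for Kamada-Kawai or stress majorization. The names `spring_lengths` and `stress_layout` are hypothetical.

```python
import math
import random


def spring_lengths(edges, pixels_per_unit=100.0, min_len=0.1):
    """Turn similarity-weighted edges (u, v, w) into desired lengths in pixels.

    Dissimilarity = max_w - w, so the strongest edge gets the shortest spring.
    min_len (an added assumption) keeps that spring from collapsing to zero.
    """
    max_w = max(w for _, _, w in edges)
    return {(u, v): (max_w - w + min_len) * pixels_per_unit for u, v, w in edges}


def stress_layout(nodes, lengths, iterations=2000, lr=0.01, seed=0):
    """Gradient descent on sum over edges of (|p_u - p_v| - L_uv)^2."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-100, 100), rng.uniform(-100, 100)] for n in nodes}
    for _ in range(iterations):
        grad = {n: [0.0, 0.0] for n in nodes}
        for (u, v), target in lengths.items():
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            # Gradient of (d - target)^2 w.r.t. p_u is 2 (d - target) (p_u - p_v) / d.
            g = 2 * (d - target) / d
            grad[u][0] += g * dx
            grad[u][1] += g * dy
            grad[v][0] -= g * dx
            grad[v][1] -= g * dy
        for n in nodes:
            pos[n][0] -= lr * grad[n][0]
            pos[n][1] -= lr * grad[n][1]
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Usage: a strong link a-b (0.9) and a weaker link b-c (0.5).
lengths = spring_lengths([("a", "b", 0.9), ("b", "c", 0.5)])
print(lengths)  # a-b gets a 10 px spring, b-c a 50 px spring
pos = stress_layout(["a", "b", "c"], lengths)
```

Only edge pairs contribute to the stress here; Kamada-Kawai instead uses graph-theoretic distances for every pair of nodes, which also spreads out unconnected nodes.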
If you tell me what language you are using, I may be able to point you to some code examples.

Related

Why do nodes (vertices) in peripheral positions have higher betweenness centrality scores after plotting the network with igraph?

I calculated the betweenness centrality for a matrix using the 'igraph' package and obtained the scores. After plotting the network, I found that nodes (vertices) in peripheral positions of the network have higher betweenness centrality scores than the more centrally positioned nodes. Betweenness centrality is defined as "the number of geodesics (shortest paths) going through a vertex or an edge". By that definition, shouldn't more central nodes have higher betweenness centrality? The scores I am getting, with the higher centrality scores located in the peripheral positions of the network, do not fit the definition or the other betweenness centrality plots I have seen. Do you know what's happening here? The original matrix used to create the network is shared on GitHub here (https://github.com/evaliu0077/network.matrix.git). My code for plotting the network is below.
library(igraph)
matrix <- read.csv("matrix.csv")
matrix <- as.matrix(matrix)
network <- graph_from_adjacency_matrix(matrix, weighted=TRUE, mode="undirected", diag=FALSE)
network <- delete.edges(network, which(E(network)$weight <= .1)) # drop negative and weak (<= 0.1) correlations before plotting
set.seed(10)
l <- layout.fruchterman.reingold(network)
plot.igraph(network, layout=l,
  vertex.size=betweenness(network),
  edge.width=E(network)$weight*2, # rescaled by 2
  edge.color=ifelse(E(network)$weight>0.25,"blue","red"),
  main="Betweenness centrality for the sample")
Thank you!
Pay attention to the meaning of edge weights before you use them.
In the context of betweenness centrality, edge "weights" are interpreted as "lengths", and are used for determining shortest paths. The length of a path is the sum of the weights/lengths of edges along the path. Higher "length" values indicate a weaker link, not a stronger one.
Are your weight values suitable for this use? Does it make sense to add them up along a path? If they are correlations, then I would say no. You could transform them so that weaker links have higher lengths, for example by inverting the values. You will sometimes see this in the literature, but it is a rather dubious practice. It still does not make much sense to add up inverse correlation values.
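A small Python illustration of why raw correlation weights misbehave as lengths, using a hypothetical three-node graph where a-b and b-c are strongly correlated and a-c is weak. With the correlations used directly as lengths, the shortest path from a to c takes the weakest link, so b lies on no shortest path; after inverting them (1/w, with the caveats above), the path goes through b. The `dijkstra` and `undirected` helpers are a plain textbook sketch, not igraph's implementation.

```python
import heapq


def dijkstra(adj, source, target):
    """Return (total_length, path) of a shortest path; adj[u] = {v: length}.

    Assumes target is reachable and all lengths are non-negative.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def undirected(edges, transform=lambda w: w):
    """Build a symmetric adjacency dict, mapping each weight through transform."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = transform(w)
        adj.setdefault(v, {})[u] = transform(w)
    return adj


# Hypothetical correlations: a-b and b-c are strong, a-c is weak.
corr = [("a", "b", 0.9), ("b", "c", 0.9), ("a", "c", 0.2)]

raw = undirected(corr)                             # correlation used as length
inv = undirected(corr, transform=lambda w: 1 / w)  # inverted: strong link = short

print(dijkstra(raw, "a", "c"))  # (0.2, ['a', 'c']): the weakest link is "shortest"
print(dijkstra(inv, "a", "c"))  # path ['a', 'b', 'c'], length about 2.22
```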
Similarly, check if the layout function you are calling makes use of weights, and if yes, in what way. First, your graph is almost complete. Therefore, with layout methods that do not use weights, the vertex positions are completely meaningless. Generally, be careful about reading too much into any kind of network visualization unless there are very obvious effects (such as an indisputable community structure). Here you use igraph's Fruchterman-Reingold layout algorithm, which happens to draw vertices connected by a high-weight edge closer to each other, not further apart. Thus, it interprets weights in exactly the opposite way compared to betweenness calculations: high weight indicates "strong" connections. Some other layout algorithms, such as Kamada-Kawai, interpret high weights (lengths) as weak (long) connections. Yet other layout algorithms ignore weights completely. It's good to keep this in mind when trying to interpret a network visualization.
should more central nodes have higher betweenness centrality?
I think the problem is that you're mixing two notions of centrality here. There's the well defined 'betweenness centrality' and then there's 'nodes that end up in the center of the picture after doing a layout with Fruchterman-Reingold'. They are not the same.
For example, take a complete graph, then add one new node A and connect it only to node B (any node of the complete graph). Then B will have a high betweenness, but there's no reason to draw it in the middle of the graph. If I wanted to make a nice picture of this, I would draw A and B at the edge. Fruchterman-Reingold may well do that too, since it pushes A outward because A is not connected to most nodes.
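The complete-graph-plus-pendant example can be checked numerically. Below is a sketch of Brandes' algorithm for unweighted, undirected graphs in Python (my choice of implementation, not igraph's code), applied to a hypothetical complete graph on B, C, D, E, F with an extra node A attached only to B. B is the only node with non-zero betweenness, regardless of where a layout algorithm happens to draw it.

```python
from collections import deque
from itertools import combinations


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to a set of neighbours. Each unordered pair of
    endpoints is counted once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Complete graph on B, C, D, E, F plus a node A connected only to B.
core = ["B", "C", "D", "E", "F"]
adj = {v: set() for v in core + ["A"]}
for u, v in combinations(core, 2):
    adj[u].add(v)
    adj[v].add(u)
adj["A"].add("B")
adj["B"].add("A")

print(betweenness(adj))  # B -> 4.0 (paths A-C, A-D, A-E, A-F), every other node -> 0.0
```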
Betweenness-based layout algorithms do exist (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-19), but I don't think igraph has one available.

How to redistribute graph elements to maximize readability?

I have a graph that consists of nodes. Each node can have multiple parents and/or children. I want to display that graph and connections between nodes.
But I don't know how to redistribute the nodes to maximize readability. Currently I'm facing the following problems:
Node connections cross each other too much, even though this is unnecessary and could be avoided
Connections between nodes are visually too long
Some connections have the same angle, so they overlap and become one line
Connections between column i and column i-2 (and further away) sometimes go straight through elements in column i-1
Also, I can only shift nodes vertically, not horizontally, because the number of columns is limited.
To make it easier for myself, I tried to place the nodes in a grid-like pattern, and I've managed to group them by columns. But then I somehow need to iterate through the columns and compare them with other columns to rearrange things, and I don't know where to start.
Update: I may be wrong, but I feel like my graph alignment problem is somehow related to the typical shortest-path problem, except that in my case there are multiple paths that should be calculated at the same time and some nodes can be passed only once.
I made a nearly ideal arrangement just by scribbling on paper (left to right shows parent-to-child connections).
This is a graph layout and drawing problem. You can take one of the following two approaches:
Use existing libraries: there are many graph layout libraries available, for example GraphViz, Gephi, D3.js, etc. You can use their APIs directly, or you can find applications/tools built on top of them. But to get the best layout, you need to guess which family of layout fits your graph, e.g.:
Layered graph layout (good for dense but layered graphs, like flowcharts)
Tree layout (used when the graph is actually a tree or forest; there are many variants of tree layout)
Radial tree layout (again for trees, but in a polar coordinate system)
Force-directed layouts (a good starting point when you don't know what visual structure will best represent the data)
All these layouts have many customization parameters, such as spacing between nodes, spacing between nodes and edges, the overall aspect ratio of the drawing, etc.
Implement layout algorithms yourself: detailed coverage of graph drawing algorithms for different families can be found in the Graph Drawing Handbook.
If you don't want to get into the details, here are some quick starting points:
For directed acyclic graphs: do a topological sort and place nodes in layers as dictated by the topological order. This gives a very good starting point and helps you avoid unnecessary crossings. A grid can be a good idea here, but place the nodes in the grid in topological order.
Alternatively, find a spanning tree of the graph, draw the spanning tree with a tree layout, and then add the remaining edges.
For trees: use a recursive bottom-up approach for placing subtrees. For radial trees, do a rectilinear layout and then transform it to a polar coordinate system.
For an unknown family: use a force-directed method. Define a force between two nodes (e.g. a spring force) and then iterate until the system reaches an equilibrium.
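A sketch of the topological-sort starting point for a directed acyclic graph, in Python. It follows the common layered (Sugiyama-style) recipe, which goes a little beyond what the answer spells out: (1) Kahn's topological sort, (2) longest-path layering so every edge points to a later column, (3) a few barycenter sweeps that only reorder nodes vertically within their columns, matching the question's constraint. It does not insert dummy nodes for edges that skip columns, so the column i to i-2 problem is not handled, and crossings are only counted between adjacent columns. All function names are hypothetical.

```python
from collections import defaultdict, deque


def topological_order(nodes, edges):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    indeg = {n: 0 for n in nodes}
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")
    return order


def assign_columns(nodes, edges):
    """Longest-path layering: a node's column is 1 + the max column of its parents."""
    parents = defaultdict(list)
    for u, v in edges:
        parents[v].append(u)
    col = {}
    for n in topological_order(nodes, edges):
        col[n] = 1 + max((col[p] for p in parents[n]), default=-1)
    return col


def count_crossings(columns, edges):
    """Count crossings between edges that join the same pair of adjacent columns."""
    rank = {n: i for column in columns for i, n in enumerate(column)}
    col_of = {n: c for c, column in enumerate(columns) for n in column}
    spans = [(rank[u], rank[v], col_of[u]) for u, v in edges if col_of[v] == col_of[u] + 1]
    crossings = 0
    for i, (a1, b1, c1) in enumerate(spans):
        for a2, b2, c2 in spans[i + 1:]:
            if c1 == c2 and (a1 - a2) * (b1 - b2) < 0:
                crossings += 1
    return crossings


def layered_layout(nodes, edges, sweeps=4):
    """Return a list of columns, each a list of nodes ordered top to bottom."""
    col = assign_columns(nodes, edges)
    columns = [[] for _ in range(max(col.values()) + 1)]
    for n in nodes:
        columns[col[n]].append(n)
    parents, children = defaultdict(list), defaultdict(list)
    for u, v in edges:
        parents[v].append(u)
        children[u].append(v)

    def reorder(column, neighbours, ref):
        # Sort by the mean vertical rank of neighbours in the reference column;
        # nodes without neighbours there sink to the bottom (stable sort).
        rank = {n: i for i, n in enumerate(ref)}

        def barycenter(n):
            r = [rank[m] for m in neighbours[n] if m in rank]
            return sum(r) / len(r) if r else float("inf")

        column.sort(key=barycenter)

    for _ in range(sweeps):
        for c in range(1, len(columns)):  # left to right, using parents
            reorder(columns[c], parents, columns[c - 1])
        for c in range(len(columns) - 2, -1, -1):  # right to left, using children
            reorder(columns[c], children, columns[c + 1])
    return columns


# Usage: a -> y and b -> x cross when x is drawn above y.
edges = [("a", "y"), ("b", "x")]
print(count_crossings([["a", "b"], ["x", "y"]], edges))  # 1
print(layered_layout(["a", "b", "x", "y"], edges))  # [['a', 'b'], ['y', 'x']], no crossings
```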
Finding the best automatic visualization of a graph is a very interesting area, and people are trying many ML techniques here.
You could implement force directed drawing. Or you could use a graph drawing library that already supports force directed drawing, such as D3's force directed layout.

Layout for a family tree

I have a dataset of DNA relationships (as a percent match) between myself and a few hundred relatives, almost all of them distant. I also have data on the DNA relationships between each of them and certain other members of the dataset.
I'm hoping to build a network graph that shows the interrelationships and have Gephi build something that loosely resembles a family tree. But even using a small sample database I can't get the resulting graph to look anything like that.
I want each relationship (i.e. edge) to have a "force" related to the closeness of the relationship, so distant relatives (nodes) are pushed further away. I want the graph to self-assemble based on these "forces" and assume there is a layout for this, but I haven't found one.
I'm currently putting the DNA relationship in the weight column, and not using the interval column at all. But even using just 8 relatives and artificially perfect data I have to manually move nodes around to make it look remotely useful.
What layout should I use for this type of graph, and what other advice can you offer to make this work? Should the weight field increase or decrease as relationship distance increases?
… and have Gephi build something that loosely resembles a family tree. But even using a small sample database I can't get the resulting graph to look anything like that.
A family tree connects descendants (mostly). DNA similarity (as a percentage) does not conform to this structure. Related questions may be answered here.
Applying a Library > Edges > Edge Weight filter to the DNA similarity attribute may help (but will not produce "something that loosely resembles a family tree").
I want each relationship (i.e. edge) to have a "force" related to the closeness of the relationship, so distant relatives (nodes) are pushed further away. I want the graph to self-assemble based on these "forces" …
All layouts work like that. However, Gephi does not feature hierarchical positioning. 3rd party candidates include EventGraphLayout, Layered Layout and Concentric Layout.
Should the weight field increase or decrease as relationship distance increases?
The greater an edge's weight, the stronger its connection (resulting in less distance between the nodes it connects), so the weight should decrease as relationship distance increases. For a family tree, however, this is irrelevant.
I'm hoping to build a network graph that shows the interrelationships between each member …
What layout should I use for this type of graph, and what other advice can you offer to make this work?
The following steps emphasize clustering and modularity:
Calculate modularity.
Color nodes by modularity class: Appearance > Nodes > Partition > Modularity Class
Apply a layout; ForceAtlas 2 for example (with Dissuade Hubs, LinLog mode and Prevent Overlap enabled).
Apply the Contraction layout afterwards if necessary. Optionally set node size according to (for example) Eigenvector Centrality (prior to applying layout).

Representing a graph where nodes are also weights

Consider a graph that has weights on each of its nodes instead of between two nodes. Therefore the cost of traveling to a node would be the weight of that node.
1- How can we represent this graph?
2- Is there a minimum spanning path algorithm for this type of graph (or could we modify an existing algorithm)?
For example, consider a matrix. What path, when traveling from a certain number to another, would produce a minimum sum? (Keep in mind the graph must be directed)
For question 1: if you don't want to adjust existing algorithms that work on edge weights, you can transform the node weights into edge weights. For every incoming edge of node v, store the weight of v on that edge. That's the representation.
For question 2: with that representation, this is now easy to solve with well-known edge-weighted algorithms, e.g. Dijkstra's algorithm for the minimum-sum path (or an MST algorithm if you really need a spanning tree).
You could also represent the graph as you wish and keep the weight on the node. The algorithm would then simply not use Weight w = edge.weight(); but Weight w = edge.target().weight() instead.
That's easy to do; no big adjustments are necessary.
If you have to use an adjacency matrix, you need a second array with the node weights, and the adjacency matrix just holds 0 for no edge or 1 for an edge.
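A sketch of that adjacency-matrix-plus-node-weight-array representation combined with Dijkstra's algorithm, in Python. Dijkstra is my choice for the minimum-sum path question; the answer itself only says that standard algorithms apply. Assumptions: `adj[u][v] == 1` means a directed edge u → v, `node_weight[v]` is the cost of entering v (the `edge.target().weight()` idea above), the start node's own weight is included so the total is the sum of all numbers on the path, and weights are non-negative, as Dijkstra requires. The name `min_sum_path` and the example data are hypothetical.

```python
import heapq


def min_sum_path(adj, node_weight, start, goal):
    """Dijkstra where the cost of an edge u -> v is node_weight[v].

    adj is an n x n 0/1 adjacency matrix (adj[u][v] == 1 means u -> v).
    Returns (total, path); total includes node_weight[start].
    Returns (inf, []) if goal is unreachable. Weights must be non-negative.
    """
    n = len(adj)
    dist = [float("inf")] * n
    prev = [None] * n
    dist[start] = node_weight[start]
    heap = [(dist[start], start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v in range(n):
            if adj[u][v]:
                nd = d + node_weight[v]  # cost of the edge = weight of its target node
                if nd < dist[v]:
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
    if dist[goal] == float("inf"):
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical example: 4 nodes, edges 0->1, 0->2, 1->3, 2->3.
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
node_weight = [1, 5, 2, 3]
print(min_sum_path(adj, node_weight, 0, 3))  # (6, [0, 2, 3])
```

Scanning a full matrix row per node makes this O(n²) overall, which is fine for dense graphs such as the matrix example in the question; an adjacency list would be cheaper for sparse graphs.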
hope that helped

How do I calculate a normal vector based on multiple triangles sharing a vertex?

If I have a mesh of triangles, how does one go about calculating the normals at each given vertex?
I understand how to find the normal of a single triangle. If I have triangles sharing vertices, I can partially find the answer by finding each triangle's respective normal, normalizing it, adding it to the total, and then normalizing the end result. However, this obviously does not take into account proper weighting of each normal (many tiny triangles can throw off the answer when linked with a large triangle, for example).
I think a good method is to use a weighted average, but with angles instead of areas as the weights. In my opinion this is a better answer because the normal you are computing is a "local" feature, so you don't really care how big the contributing triangle is... you need a "local" measure of the contribution, and the angle between the two sides of the triangle at the given vertex is such a local measure.
With this approach, a lot of small (thin) triangles don't give you an unbalanced answer.
Using angles is the same as using an area-weighted average if you localize the computation by using the intersection of the triangles with a small sphere centered in the vertex.
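A minimal Python sketch of angle-weighted vertex normals, assuming triangles are index triples with counter-clockwise winding seen from outside. Each face's unit normal is added to each of its vertices, scaled by the interior angle of the face at that vertex, and the sum is normalized. The usage example, a hypothetical test mesh, shares one vertex between a single right-angle face and a fan of ten thin triangles that together also span a right angle; both sides end up contributing equally, which is the behaviour the answer argues for.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(a):
    length = math.sqrt(dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length) if length > 0 else a


def angle_weighted_normals(vertices, triangles):
    """Per-vertex normals: unit face normals weighted by the interior angle
    of each face at the vertex, summed, then normalized.

    vertices: list of (x, y, z); triangles: list of (i, j, k) index triples.
    """
    acc = [(0.0, 0.0, 0.0)] * len(vertices)
    for tri in triangles:
        p = [vertices[i] for i in tri]
        face_n = normalize(cross(sub(p[1], p[0]), sub(p[2], p[0])))
        for corner in range(3):
            a = p[corner]
            e1 = normalize(sub(p[(corner + 1) % 3], a))
            e2 = normalize(sub(p[(corner + 2) % 3], a))
            angle = math.acos(max(-1.0, min(1.0, dot(e1, e2))))  # clamp rounding noise
            i = tri[corner]
            acc[i] = tuple(acc[i][k] + angle * face_n[k] for k in range(3))
    return [normalize(n) for n in acc]


# Usage: one right-angle face in the xy-plane plus ten thin triangles that
# together cover a right angle in the xz-plane, all sharing vertex 0.
fan = 10
verts = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)] + [
    (math.cos(j * math.pi / 2 / fan), 0.0, math.sin(j * math.pi / 2 / fan))
    for j in range(fan + 1)
]
tris = [(0, 2, 1)] + [(0, 2 + j, 3 + j) for j in range(fan)]
print(angle_weighted_normals(verts, tris)[0])  # about (0, -0.707, 0.707)
```

An unweighted average of the eleven unit face normals would instead be dominated by the ten thin triangles.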
The weighted average appears to be the best approach.
But be aware that, depending on your application, sharp corners could still give you problems. In that case, you can compute multiple vertex normals, each by averaging only the surface normals whose pairwise cross product has a magnitude below some threshold (i.e., that are close to being parallel).
Search for Offset triangular mesh using the multiple normal vectors of a vertex by SJ Kim, et. al., for more details about this method.
This blog post outlines three different methods and gives a visual example of why the standard and simple method (area weighted average of the normals of all the faces joining at the vertex) might sometimes give poor results.
You can give more weight to big triangles by multiplying the normal by the area of the triangle.
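Multiplying each unit face normal by the triangle's area is the same as summing the raw, unnormalized cross products of two triangle edges, because that cross product's length is twice the area and the common factor of 2 disappears when the result is normalized. A minimal Python sketch under the same index-triple, counter-clockwise-winding assumption; the function name and example mesh are hypothetical.

```python
import math


def area_weighted_normals(vertices, triangles):
    """Per-vertex normals weighted by triangle area.

    The cross product of two triangle edges has length 2 * area, so summing
    the raw cross products (without normalizing them first) weights each
    face by its area.
    """
    acc = [[0.0, 0.0, 0.0] for _ in vertices]
    for i, j, k in triangles:
        p0, p1, p2 = vertices[i], vertices[j], vertices[k]
        u = [p1[c] - p0[c] for c in range(3)]
        v = [p2[c] - p0[c] for c in range(3)]
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
        for idx in (i, j, k):
            for c in range(3):
                acc[idx][c] += n[c]
    result = []
    for n in acc:
        length = math.sqrt(sum(x * x for x in n))
        result.append(tuple(x / length for x in n) if length > 0 else (0.0, 0.0, 0.0))
    return result


# Usage: a large face (area 2) in the xy-plane and a small face (area 0.5)
# in the xz-plane share vertex 0; the large face dominates the result.
verts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
tris = [(0, 1, 2), (0, 3, 4)]
print(area_weighted_normals(verts, tris)[0])  # direction (0, -1, 4) / sqrt(17)
```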
Check out this paper: Discrete Differential-Geometry Operators for Triangulated 2-Manifolds.
In particular, the "Discrete Mean Curvature Normal Operator" (Section 3.5, Equation 7) gives a robust normal that is independent of tessellation, unlike the methods in the blog post cited by another answer here.
Obviously you need to use a weighted average to get a correct normal, but using the triangles' areas won't give you what you need, since the area of each triangle has no relationship to the percentage weight that triangle's normal represents for a given vertex.
If you base it on the angle between the two sides coming into the vertex, you should get the correct weight for every triangle touching it. It might be convenient to convert this to 2D somehow, so that the weights become fractions of 360 degrees, but most likely you can just use the angle itself as the weight multiplier in 3D space, add up all the normals weighted this way, and normalize the final result to get the correct answer.
