Visualising large tree-like graphs

I want to visualize a large (tens of thousands of nodes) tree-like structure. The graph is almost a hierarchical tree, except that there can be a few extra edges (so it is not strictly a tree anymore, potentially making algorithms crash).
What is the best way I can do this?

You might be looking for something like ArcTrees (PDF) or Treemaps with Link Overlays (PDF). For the latter, edge-bundled versions (PDF) have been explored as well. Combining a space-filling base visualization of the tree with the extra edges drawn as arcs or links on top of it really helps to show where the tree property breaks down and how many extra edges there are. So, if that is what you are looking for, I'd go with these.
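Both approaches need the graph split into a base tree plus the few extra edges to overlay. One simple way to do that (my assumption, not something the ArcTrees or treemap papers prescribe) is a breadth-first spanning tree from the root: the first edge that reaches a node becomes its tree edge, and every later edge into an already-reached node is an extra edge. Because no node ever gets a second parent, a tree layout run on the result cannot trip over the extra edges. The function name below is hypothetical.

```python
from collections import deque


def split_tree_and_extra_edges(edges, root):
    """Split a directed, tree-like edge list into a BFS spanning tree
    (parent -> child edges) and the leftover 'extra' edges that break
    the tree property. Nodes unreachable from `root` are ignored."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)

    visited = {root}
    tree_edges, extra_edges = [], []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in children.get(u, []):
            if v not in visited:
                # First time v is reached: this edge becomes v's tree edge.
                visited.add(v)
                tree_edges.append((u, v))
                queue.append(v)
            else:
                # v already has a parent: draw this edge as an arc/link overlay.
                extra_edges.append((u, v))
    return tree_edges, extra_edges
```

For example, with edges r->a, r->b, a->c, b->c, c->d and root r, the tree edges are r->a, r->b, a->c, c->d and the single extra edge is b->c.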
Otherwise, you can always use a standard DAG layout - e.g., as produced by the Sugiyama framework. See this Wikipedia entry for more information on this option.
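The first phase of a Sugiyama-style layout assigns every node to a layer so that all edges point the same way. A minimal sketch of the common longest-path layering, assuming the graph is already acyclic (the cycle-removal step, which reverses a few edges, is not shown) and that `nodes` lists every node; the function name is hypothetical:

```python
from collections import defaultdict, deque


def longest_path_layers(nodes, edges):
    """Assign each node of a DAG a layer index so that every edge u -> v
    satisfies layer[v] >= layer[u] + 1. Uses Kahn's topological sort and
    raises ValueError if the graph contains a cycle."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    layer = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in succ[u]:
            # Push v one layer below its deepest parent seen so far.
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if processed != len(nodes):
        raise ValueError("graph has a cycle; reverse some edges first")
    return layer
```

Edges that end up spanning more than one layer are what Sugiyama-style layouts split with dummy nodes in the later phases, so they can be routed between, rather than through, the nodes of the intermediate layers.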

Related

How to redistribute graph elements to maximize readability?

I have a graph that consists of nodes. Each node can have multiple parents and/or children. I want to display that graph and connections between nodes.
But I don't know how to redistribute nodes to maximize readability. Currently I'm facing the following problems:
Node connections cross each other too much, even though it's unnecessary and could be avoided
Connections between nodes are too long visually
Some connections have the same angle, so they overlap and become one line
Connections between column i and column i-2 (and further away) sometimes go straight through elements in column i-1
Also, I can shift nodes only vertically, not horizontally, because the number of columns is limited.
To make it easier for myself I tried to place nodes in a grid-like pattern. And I've managed to group them by columns. But then I somehow need to iterate through columns and compare them with other columns to re-arrange stuff. And I don't know where to start.
UPD: I may be wrong, but I feel like my problem with graph alignment is somehow related to the classic shortest-path problem, except that in my case multiple paths have to be calculated at the same time and some nodes can be passed through only once.
On the image below you can see a nearly ideal redistribution that I made by just scribbling stuff on paper (direction left-to-right shows parent-to-child connections).
This is a graph layout and drawing problem. You can take one of the following two approaches:
Use existing libraries: There are many graph layout libraries available, for example GraphViz, Gephi, D3.js etc. You can use their APIs directly, or you can find applications/tools built on top of them. But to get the best layout, you need to guess which family of layout suits your graph, e.g. layered graph layout (good for dense but layered graphs like flowcharts), tree layout (used when the graph is actually a tree or forest; there are many variants), radial tree layout (again for trees, but in a polar coordinate system), or force-directed layout (a good starting point when you don't know what visual structure will best represent the data). All these layouts have many customization parameters, like spacing between nodes, spacing between nodes and edges, overall aspect ratio of the drawing, etc.
GraphViz
Gephi
Implement layout algorithms yourself
Detailed coverage of graph drawing algorithms for different families can be found here
Graph Drawing Handbook
If you don't want to get into the details, here are some quick starting points:
For graphs: do a topological sort and place nodes in layers as dictated by the topological order. This gives you a very good starting point and helps you avoid unnecessary crossings. A grid can be a good idea here, but place the nodes in the grid in topological order.
Alternatively, find a spanning tree of the graph, draw the spanning tree with a tree layout, and then add the remaining edges.
For trees: use a recursive bottom-up approach for placing subtrees. For radial trees, do a rectilinear layout and then transform it to a polar coordinate system.
For an unknown family: use a force-directed method. Define a force between two nodes (e.g. a spring force) and then iterate until you reach an equilibrium.
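Once nodes sit in layers (columns), a standard way to cut crossings while moving nodes only within their column (the constraint in the question) is the barycenter heuristic: reorder each column by the average position of each node's neighbours in the adjacent column, sweeping left to right and then right to left a few times. A minimal sketch for one pair of adjacent columns; the function names are hypothetical, and since it is a heuristic it reduces crossings without guaranteeing the minimum.

```python
def count_crossings(top, bottom, edges):
    """Count edge crossings between two adjacent layers.
    `top` and `bottom` are ordered lists of node ids; each edge is (u, v)
    with u in `top` and v in `bottom`."""
    pos_t = {n: i for i, n in enumerate(top)}
    pos_b = {n: i for i, n in enumerate(bottom)}
    e = [(pos_t[u], pos_b[v]) for u, v in edges]
    crossings = 0
    for i in range(len(e)):
        for j in range(i + 1, len(e)):
            (a1, b1), (a2, b2) = e[i], e[j]
            # Two edges cross iff their endpoints are in opposite order.
            if (a1 - a2) * (b1 - b2) < 0:
                crossings += 1
    return crossings


def barycenter_order(top, bottom, edges):
    """Reorder `bottom` by the mean index of each node's neighbours in `top`.
    Nodes without neighbours in `top` keep their current index as the key;
    Python's sort is stable, so ties keep their existing order."""
    pos_t = {n: i for i, n in enumerate(top)}
    nbrs = {n: [] for n in bottom}
    for u, v in edges:
        nbrs[v].append(pos_t[u])

    def key(item):
        idx, n = item
        return sum(nbrs[n]) / len(nbrs[n]) if nbrs[n] else idx

    return [n for _, n in sorted(enumerate(bottom), key=key)]
```

For example, with top column [a, b, c], bottom column [z, y, x] and edges a-x, b-y, c-z, the original order has 3 crossings; the barycenter order [x, y, z] has none.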
Automatically finding the best visualization of a graph is a very interesting area, and people are trying many ML techniques here.
You could implement force directed drawing. Or you could use a graph drawing library that already supports force directed drawing, such as D3's force directed layout.
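If you implement it yourself, a minimal Fruchterman-Reingold-style sketch looks like the following: every pair of nodes repels with magnitude k²/d, every edge attracts with magnitude d²/k, and a "temperature" that cools each iteration caps how far a node may move. The function name, the 0.95 cooling factor and the starting temperature are my own arbitrary choices, and `nodes` must be a list. Repulsion here is O(n²) per iteration, which is fine for hundreds of nodes; libraries typically approximate it (for example with Barnes-Hut) for large graphs.

```python
import math
import random


def force_directed(nodes, edges, width=1.0, height=1.0, iterations=200, seed=0):
    """Minimal Fruchterman-Reingold-style layout inside a width x height frame."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal pairwise distance
    temperature = width / 10                    # max move per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes: magnitude k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attraction along each edge: magnitude d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move each node along its net force, capped by the temperature,
        # and clamp it to the drawing frame.
        for n in nodes:
            dx, dy = disp[n]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / d * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / d * step))

        temperature *= 0.95  # cool down so the layout settles

    return {n: (x, y) for n, (x, y) in pos.items()}
```

Calling `force_directed(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])` returns a position for each node inside the unit square; the fixed seed makes the result reproducible.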

Layout for a family tree

I have a dataset of DNA relationships (as a percent match) between myself and a few hundred relatives, almost all distant relatives. I also have data on DNA relationships between each of them and certain other members in the dataset.
I'm hoping to build a network graph that shows the interrelationships and have Gephi build something that loosely resembles a family tree. But even using a small sample database I can't get the resulting graph to look anything like that.
I want each relationship (i.e. edge) to have a "force" related to the closeness of the relationship, so distant relatives (nodes) are pushed further away. I want the graph to self-assemble based on these "forces" and assume there is a layout for this, but I haven't found one.
I'm currently putting the DNA relationship in the weight column, and not using the interval column at all. But even using just 8 relatives and artificially perfect data I have to manually move nodes around to make it look remotely useful.
What layout should I use for this type of graph, and what other advice can you offer to make this work? Should the weight field increase or decrease as relationship distance increases?
… and have Gephi build something that loosely resembles a family tree. But even using a small sample database I can't get the resulting graph to look anything like that.
A family tree connects descendants (mostly). DNA similarity (as a percentage) does not conform to this structure. Related questions may be answered here.
Setting a Library > Edges > Edge Weight filter on the DNA similarity attribute may help (but will not produce "something that loosely resembles a family tree").
I want each relationship (i.e. edge) to have a "force" related to the closeness of the relationship, so distant relatives (nodes) are pushed further away. I want the graph to self-assemble based on these "forces" …
All layouts work like that. However, Gephi does not feature hierarchical positioning. 3rd party candidates include EventGraphLayout, Layered Layout and Concentric Layout.
Should the weight field increase or decrease as relationship distance increases?
The greater an edge's weight, the stronger its connection (resulting in less distance between the nodes it connects). To a family tree however this is irrelevant.
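To see why a larger weight means a shorter distance, consider a worked example under an assumed force model (Fruchterman-Reingold-style forces with the attraction multiplied by the edge weight w; Gephi's own layouts use different force laws, so treat this as qualitative). For two nodes joined by one edge at distance d, with ideal length k and all other nodes ignored:

$$F_{\mathrm{attr}}(d) = w\,\frac{d^{2}}{k}, \qquad F_{\mathrm{rep}}(d) = \frac{k^{2}}{d}$$

$$w\,\frac{d^{2}}{k} = \frac{k^{2}}{d} \;\Longrightarrow\; d^{3} = \frac{k^{3}}{w} \;\Longrightarrow\; d^{*} = k\,w^{-1/3}$$

Doubling w multiplies the equilibrium distance by 2^{-1/3} ≈ 0.79. The exponent depends on the force model, but in any model where weight scales an attraction that grows with distance, the direction is the same: the weight should decrease as the relationship distance increases (i.e. use the DNA match itself, not its inverse).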
I'm hoping to build a network graph that shows the interrelationships between each member …
What layout should I use for this type of graph, and what other advice can you offer to make this work?
The following steps emphasize clustering and modularity:
Calculate modularity.
Color nodes by modularity class: Appearance > Nodes > Partition > Modularity Class
Apply a layout, for example ForceAtlas 2 (with Dissuade Hubs, LinLog mode and Prevent Overlap enabled).
Apply the Contraction layout afterwards if necessary. Optionally, set node size according to (for example) Eigenvector Centrality before applying the layout.
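For intuition about what the modularity score measures, here is a minimal sketch of Newman's modularity Q for a partition you already have, on an undirected, unweighted graph: Q sums, over communities c, the fraction of edges inside c minus the fraction expected by chance, (D_c / 2m)², where D_c is the total degree of c and m the number of edges. It does not reproduce Gephi's community detection, which searches for the partition itself; the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of the partition `community` (node -> community id)
    for an undirected, unweighted edge list:
    Q = sum_c [ L_c / m - (D_c / (2m))^2 ],
    L_c = edges inside c, D_c = sum of degrees in c, m = number of edges."""
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting them into their two triangles gives Q = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0.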

Algorithm for placing nodes in a graph

I have been trying to create an algorithm that can draw a graph. It is not a tree, as nodes can have multiple parents; it is more like an activity diagram. My problem is placing nodes on the x axis so that they do not overlap each other. I have been looking around for months now, but I have been unable to find any information relevant to this kind of graph. So I was wondering if any of you might know of an algorithm that can solve this problem, or have an idea of what approach I should take.
Here you see my problem: the red nodes are overlapping other nodes.
My best approach right now is one where I add everything to a row:
With this approach, the tree above looks like this.
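One simple way to guarantee no overlap on the x axis, assuming each node already has a row (for example from a topological or longest-path layering) and a known width, is to lay each row out left to right with a fixed gap and centre it. The `place_rows` helper below is hypothetical and does only that; it does not choose the order of nodes within a row, which is where crossing reduction comes in.

```python
def place_rows(rows, widths, gap=1.0):
    """Given `rows` (lists of node ids, top to bottom) and each node's width,
    return {node: (x_center, row_index)} such that nodes in the same row
    never overlap. Each row is packed left to right and centred on x = 0."""
    pos = {}
    for r, row in enumerate(rows):
        total = sum(widths[n] for n in row) + gap * max(len(row) - 1, 0)
        x = -total / 2  # left edge of the first node in this row
        for n in row:
            pos[n] = (x + widths[n] / 2, r)  # centre of this node
            x += widths[n] + gap             # next node starts after the gap
    return pos
```

For rows [[a], [b, c]] with every width 2 and a gap of 1, node a is centred at x = 0, and b and c at x = -1.5 and x = 1.5, leaving a gap of exactly 1 between them.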

How can I produce visualizations combining network graphs and imaginary maps?

Basically, I'm looking for something like this awesome research project: Gmap, which was referenced in this related SO question.
It's a rather novel data visualization that combines a network graph with an imaginary set of regions that looks like a map. Basically, the map-ification helps humans comprehend the enormous data set better.
Cool, huh? GMap doesn't appear to be open source, though I plan to contact the authors.
I already know how to create a network graph with a force-directed layout (currently using Prefuse/Flare), so an answer could be a way to layer a mapping algorithm on top of an existing graph. I'm also not concerned about the client-side at all right now - this would be a backend process, and I am flexible about technology stack and data output at this stage.
There's also this paper that describes the algorithm backing GMap. If you have heard of Voronoi diagrams (which rock, but make my head hurt), this paper is for you. I quit after Calc 1, though, so I'm hoping to avoid remembering what sigmas and epsilons are.
As a start, could you do a simple closest-point sort of algorithm? It would look something like this: you have your force-directed layout and have computed some sort of bounding box. Now you want to render it. Shift your bounding box so it lines up with the origin, and then, as you calculate the color of each pixel, find its closest point. This should generate some semblance of regions and should be quite simple to try out. Of course, it isn't going to be as pretty as GMap, but maybe it's a start? The runtime would be awful, but... I don't know about you, but computing boundary lines directly sounds a lot harder to me.
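A minimal brute-force sketch of that closest-point idea, assuming the node positions have already been shifted to the origin and scaled to pixel coordinates, and that each node carries a label such as its cluster id (so neighbouring pixels of the same cluster merge into one region). The result is a discrete Voronoi diagram; the O(width × height × nodes) cost is the "awful runtime" mentioned above. The function name is hypothetical.

```python
def region_raster(points, labels, width, height):
    """For every pixel in a width x height grid, store the label of the
    nearest point (squared Euclidean distance). Contiguous pixels with
    the same label form the map-like regions."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2,
            )
            row.append(labels[nearest])
        grid.append(row)
    return grid
```

With two nodes at (0, 0) and (9, 0) labelled A and B on a 10 × 1 grid, the left five pixels become A and the right five B; each label can then be mapped to a fill colour.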

Clustered Graphs Visualization Techniques

I need to visualize a relatively large graph (6K nodes, 8K edges) that has the following properties:
Distinct Clusters. Approximately 50-100 Nodes per cluster and moderate interconnectivity at the cluster level
Minimal (5-10 inter-cluster edges per cluster) interconnectivity between clusters
Let global edge overlap = The edge overlaps caused by directly visualizing a graph of Clusters = {A, B, C, D, E}, Edges = {Pentagram of those clusters, which is non-planar by the way and will definitely generate edge overlap if you draw it out directly}
Let Local Edge Overlap = the above but { A, B, C, D, E } are just nodes.
I need to visualize graphs with the above in a way that satisfies the following requirements
No global edge overlap (i.e. edge overlaps caused by inter-cluster properties are not okay)
Local edge overlap within a cluster is fine
Anyone have thoughts on how to best visualize a graph with the requirements above?
One solution I've come up with to deal with the global edge overlap is to make sure a cluster A can only have a max of 1 direct edge to another cluster (B) during visualization. Any additional inter-cluster edges between cluster A -> C, A -> D, ... are disconnected and additional node/edges A -> A_C, C -> C_A, A -> A_D, D -> D_A... are created.
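The stub-node transformation described above can be sketched as a single pass over the edge list: intra-cluster edges are kept, the first inter-cluster edge touching two clusters that have not yet used their one direct edge is kept, and every other inter-cluster edge (u, v) is replaced by u -> A_C and v -> C_A, where A and C are the clusters of u and v. The stub naming follows the question; the helper name is hypothetical, and repeated edges between the same pair of clusters share the same stub nodes.

```python
def replace_extra_intercluster_edges(edges, cluster):
    """Keep at most one direct inter-cluster edge per cluster; replace every
    other inter-cluster edge (u, v) by two stub edges (u, 'A_C') and
    (v, 'C_A'), where A = cluster[u] and C = cluster[v]."""
    used = set()  # clusters that already have their one direct edge
    out = []
    for u, v in edges:
        a, c = cluster[u], cluster[v]
        if a == c:
            out.append((u, v))            # intra-cluster edges are kept
        elif a not in used and c not in used:
            used.update((a, c))
            out.append((u, v))            # the single allowed direct edge
        else:
            out.append((u, f"{a}_{c}"))   # stub inside cluster a pointing at c
            out.append((v, f"{c}_{a}"))   # matching stub inside cluster c
    return out
```

With clusters A = {a1, a2}, B = {b1}, C = {c1} and edges a1-a2, a1-b1, a2-c1, the edge a1-b1 stays direct, while a2-c1 becomes a2 -> A_C and c1 -> C_A.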
Anyone have any thoughts?
Prefuse has some good graph drawing algorithms built in, and it seems to handle fairly large graphs relatively well. You might try Flow Map Layout, which is built on top of Prefuse.
Given your objectives, I think that the Fruchterman-Reingold algorithm does a pretty decent job of preventing edge overlap. See for example this screenshot of a network consisting of multiple components drawn using the Fruchterman-Reingold algorithm. IGraph has built-in support for this algorithm (as does NetworkX, whose spring_layout uses Fruchterman-Reingold) and is really fast.
There is a program built on top of Prefuse called SocialAction. You have to request the code from the author, but it does a lot of statistical analysis on the graph for you, such as identifying subgraphs. I've used it on a graph with more than 18,000 nodes, and although it is very slow at that scale it still works.
Although it may be silly to ask at this point, have you tried out http://www.graphviz.org/ ?
I haven't seen too many graph visualization tools that support separating clusters within a graph visually. One option might be to take a look at WilmaScope. It looks to have some support for cluster based layouts.
The Organic layout in the yFiles framework handles clustered graphs fairly well. Try it first in yEd to see if it does what you need. It is probably reasonable to use nested graphs (a.k.a. groups) for each cluster. The Organic layout has a feature called Group Layout Policy, which can be used if the layout needs to follow different principles for inter-cluster and intra-cluster edges, with incremental layout. With some effort, one can translate the graph into GraphML to avoid manual work.
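A minimal sketch of that GraphML translation with Python's standard library, assuming you already know each node's cluster: every cluster becomes a node containing a nested graph with its members, which is GraphML's core nesting mechanism, and all edges are declared in the top-level graph (an ancestor of both endpoints). It writes no yFiles-specific data, so whether yEd shows these nested graphs as groups without its own graphics extensions is an assumption I have not verified. The function name and id scheme are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(edges, cluster):
    """Serialize an undirected graph as GraphML where each cluster is a node
    holding a nested <graph> of its member nodes."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")

    subgraphs = {}
    for node, c in sorted(cluster.items()):
        if c not in subgraphs:
            group = ET.SubElement(graph, "node", id=f"cluster_{c}")
            subgraphs[c] = ET.SubElement(
                group, "graph", id=f"cluster_{c}_graph", edgedefault="undirected"
            )
        ET.SubElement(subgraphs[c], "node", id=node)

    # Edges live in the top-level graph, which contains both endpoints.
    for i, (u, v) in enumerate(edges):
        ET.SubElement(graph, "edge", id=f"e{i}", source=u, target=v)

    return ET.tostring(root, encoding="unicode")
```

The returned string can be written to a `.graphml` file and opened in yEd or any other GraphML-aware tool.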

Resources