Metrics for hierarchical graph connectivity

This is my first question on Stack Overflow. It is not really a programming question, but since most of us have to deal with theoretical problems at some point and there might be some graph theory specialists around, I thought I would give it a go.
I am currently doing some research on multilingual websites and I found some interesting patterns in the website structure. The graphs below are the website graphs of two different multilingual websites. Sorry, I don't have enough rep points to post images so I leave them as links. I used the Force Atlas algorithm for the layout. Vertices are colored according to the page language. The shaded areas correspond to the subgraphs of a specific language.
Here is the graph of a website where the different language versions of the same content are very closely interlinked. Hence the areas representing the different language versions overlap.
http://www.ai.soc.i.kyoto-u.ac.jp/~julien/phd/images/tight.png
In this second graph, we have a website whose language versions are almost independent, so there is almost no overlap.
http://www.ai.soc.i.kyoto-u.ac.jp/~julien/phd/images/loose.png
So here is my question:
Is there a specific metric to quantify this overlap? If so, what is it named?
Since I used a force-based layout, the overlap is driven by the number of edges between the language subgraphs. So I guess something like taking the ratio of the number of edges within a subgraph to the number of edges going outside of/coming into that subgraph might do the trick. I am sure I am not the first to get this idea, so I was wondering whether this metric has a name. I could then Google it from there :)
Thank you in advance!

It sounds like what you're looking for is network modularity. Given a graph and a partition (a division of the graph into disjoint subgraphs), the modularity is defined as:
The fraction of the edges that fall within the given groups minus the
expected such fraction if edges were distributed at random.
Modularity was the basis of some of the first community detection algorithms on networks, which try to find sets of nodes that are densely connected. However, modularity has since been shown to be a problematic metric for community detection because of its resolution limit, which causes it to miss small groups or to break apart well-defined groups in certain cases (see this paper).
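To make the definition concrete, here is a minimal sketch of computing the modularity of a given partition in plain JavaScript (names are illustrative; libraries such as NetworkX and igraph ship modularity functions if you'd rather not roll your own):

// Minimal sketch: modularity of a partition of an undirected graph.
// edges is an array of [u, v] pairs; community maps node -> group id.
function modularity(edges, community) {
    var m = edges.length;   // total number of edges
    var degreeSum = {};     // sum of node degrees per community
    var internal = {};      // count of intra-community edges
    edges.forEach(function (e) {
        var cu = community[e[0]], cv = community[e[1]];
        degreeSum[cu] = (degreeSum[cu] || 0) + 1;
        degreeSum[cv] = (degreeSum[cv] || 0) + 1;
        if (cu === cv) internal[cu] = (internal[cu] || 0) + 1;
    });
    var q = 0;
    Object.keys(degreeSum).forEach(function (c) {
        var fracInside = (internal[c] || 0) / m;   // observed fraction
        var fracExpected = degreeSum[c] / (2 * m); // expected at random
        q += fracInside - fracExpected * fracExpected;
    });
    return q;
}

// Two triangles joined by a single bridge edge: a clearly modular graph.
var edges = [[0,1],[1,2],[0,2],[3,4],[4,5],[3,5],[2,3]];
var community = {0:'fr', 1:'fr', 2:'fr', 3:'en', 4:'en', 5:'en'};
console.log(modularity(edges, community)); // ~0.357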

There are now other approaches besides modularity, designed to overcome the limitations mentioned by job, such as Surprise, or the B- and C-scores (which were designed as significance indices).


Symmetric (or undirected) Hamiltonian Cycle data sets

I would like to test my recently created algorithm on large (50+ node) graphs. Preferably, they would be specifically challenging graphs, and known tours would exist for at least most of them.
Problem sets for this problem do not seem as easy to find as those for the TSP. I am aware of the Flinders challenge set available at http://www.flinders.edu.au/science_engineering/csem/research/programs/flinders-hamiltonian-cycle-project/fhcpcs.cfm
However, they seem to be directed. I can probably alter my algorithm to work for directed, but it will take time and likely induce bugs. I'd prefer to know if it can work for undirected first.
Does anyone know where problem sets are available? Thank you.
Quick edit:
Now I am unsure whether the Flinders set is directed or not... It doesn't say. The examples make it seem like it may actually be undirected.
Check this video:
https://www.youtube.com/watch?v=G1m7goLCJDY
Also check the in-depth sequel to that video.
You can decide for yourself how many nodes you want the graph to have.
It does require you to construct the data yourself, but that should be doable.
One note: the problem is about a path, not a cycle, but you can overcome this by connecting the start and end node.
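A sketch of that fix (the helper name is illustrative): if you know the Hamiltonian path, adding the closing edge turns the instance into a Hamiltonian cycle test case.

// Sketch: turn a graph with a known Hamiltonian path into a
// Hamiltonian-cycle instance by adding the closing edge.
// edges is an edge list; path is the known node ordering.
function closePath(edges, path) {
    var start = path[0], end = path[path.length - 1];
    return edges.concat([[start, end]]); // path + this edge = known cycle
}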

Fast volume representation, modification and polygonisation

I am looking for ideas for algorithms and data structures for representing volumetric objects. I am working on a sculpting system, like Sculptris or Mudbox, and want to find a good implementation strategy.
I currently have a very nice dynamic half-edge mesh system to collapse/subdivide faces. It works very well and is incredibly fast, but since it is a surface algorithm, it is not easy to change topology robustly.
So I want to go back to the drawing board and implement a proper volumetric system. My first idea was some kind of octree representation for the volume, plus marching cubes to polygonise it.
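To sketch what I mean by the octree idea (a rough sketch; field names are purely illustrative, not from any particular library):

// Each node covers a cubic cell; leaves are subdivided only where the
// surface needs more detail, giving adaptive resolution for free.
function OctreeNode(center, halfSize, depth) {
    this.center = center;     // [x, y, z] of the cell's midpoint
    this.halfSize = halfSize; // half the cell's edge length
    this.depth = depth;
    this.corners = null;      // 8 signed-distance samples, filled lazily
    this.children = null;     // null for a leaf, otherwise 8 children
}

OctreeNode.prototype.subdivide = function () {
    this.children = [];
    for (var i = 0; i < 8; i++) {
        var h = this.halfSize / 2;
        this.children.push(new OctreeNode([
            this.center[0] + (i & 1 ? h : -h),
            this.center[1] + (i & 2 ? h : -h),
            this.center[2] + (i & 4 ? h : -h)
        ], h, this.depth + 1));
    }
};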
However, I have a few problems with this approach. First, marching cubes often produces small or thin triangles, which is highly undesirable (I explain why below). Second, I want to polygonise the volume only in the area being edited, and at different levels of detail. For example, I may want a low-res sphere, but with a few tiny high-res bumps. I can easily get that kind of subdivision behaviour with my current surface-based system, but I can't envision how I could do it robustly with marching cubes.
Another problem is that the actual triangular mesh is further subdivided on the GPU for smooth surfaces, so I need neighbourhood information too. Again, I already have this with the current half-edge system, but with a volume polygonisation system, I imagine it would take a lot of extra processing to find that connectivity information. This is the reason thin triangles are bad.
So I have a lot of constraints, and I am asking this community for ideas or pertinent papers to read. I was thinking about surface nets to avoid the small/thin triangle problem. Also, I have a feeling kd-trees may be better for storing multiresolution volumes, since they seem more flexible than octrees.
Anyway, any ideas/suggestions very welcome.

Graph data structures in LabVIEW

What's the best way to represent graph data structures in LabVIEW?
I'm doing some basic algorithm review over the holiday, and I'd prefer to not implement all of the storage and traversals myself, if possible.
(I'm aware that there was a thread a few years ago on LAVA; is that my best bet?)
I've never had a need to do this myself, so I never really looked into it, but there are some people who have done some work on it as far as I know.
Brian K. has posted something over here, although it's been a long time since I looked at it:
https://decibel.ni.com/content/docs/DOC-12668
If that doesn't help, I would suggest you read this and then try sending a PM to Daklu there, as he's the most likely candidate to have something.
https://decibel.ni.com/content/thread/8179?tstart=0
If not, I would suggest posting a question on LAVA, as you're more likely to find the relevant people there.
Well, from a simple point of view, you don't have that many options for graphs. It really depends on the types of algorithms you are running, in order to choose the most convenient representation.
An adjacency matrix is simple, but it can be slow for some tasks and wasteful if the graph is not dense.
You can keep a couple of lists and hash maps of your edges and vertices. If each edge or vertex is assigned a unique index into its list when created, it's pretty simple to keep things under control. Each vertex can then be associated with a list of its neighbours. Depending on your needs, you could divide that neighbour list into in- and out-edges. Also, depending on your lookup needs, you could choose to index edges by their source vertex, their target vertex, both, or simply by a unique index number.
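A sketch of that index-based scheme, in JavaScript for readability (LabVIEW block diagrams can't be shown as text; the same structure maps to LabVIEW arrays and clusters):

// vertices[i] holds the payload for vertex i; edges[k] is [from, to];
// neighbors[i] lists the indices of the edges touching vertex i.
var graph = { vertices: [], edges: [], neighbors: [] };

function addVertex(g, payload) {
    g.vertices.push(payload);
    g.neighbors.push([]);
    return g.vertices.length - 1; // the new vertex's unique index
}

function addEdge(g, from, to) {
    g.edges.push([from, to]);
    var k = g.edges.length - 1;   // the new edge's unique index
    g.neighbors[from].push(k);    // out-edge at `from`
    g.neighbors[to].push(k);      // in-edge at `to` (undirected lookup)
    return k;
}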
I had a glance at the LabVIEW quick reference, and while it was not obvious from there how you would do this, as long as they have arrays of some sort, you can implement a graph. I'm sure you'll be fine.

What's the fastest force-directed network graph engine for large data sets?

We currently have a dynamically updated network graph with around 1,500 nodes and 2,000 edges, and it's ever-growing. Our current layout engine uses Prefuse - the force-directed layout in particular - and it takes about 10 minutes on a hefty server to get a nice, stable layout.
I've looked a little at GraphViz's sfdp algorithm, but haven't tested it yet...
Are there faster alternatives I should look at?
I don't care about the visual appearance of the nodes and edges - we process that separately - just putting x, y on the nodes.
We do need to be able to tinker with the layout properties for specific parts of the graph, for instance, applying special tighter or looser springs for certain nodes.
Thanks in advance, and please comment if you need more specific information to answer!
EDIT: I'm particularly looking for speed comparisons between the layout engine options. Benchmarks, specific examples, or just personal experience would suffice!
I wrote a JavaScript-based graph-drawing library, VivaGraph.js.
It calculates the layout and renders a graph with 2K+ vertices and 8.5K edges in ~10-15 seconds. If you don't need the rendering part, it should be even faster.
Here is a video demonstrating it in action: WebGL Graph Rendering With VivaGraphJS.
Online demo is available here. WebGL is required to view the demo but is not needed to calculate graphs layouts. The library also works under node.js, thus could be used as a service.
Example of API usage (layout only):
var graph = Viva.Graph.graph(),
    layout = Viva.Graph.Layout.forceDirected(graph);

graph.addLink(1, 2);
layout.run(50); // runs 50 iterations of graph layout

// print results:
graph.forEachNode(function(node) { console.log(node.position); });
Hope this helps :)
I would have a look at OGDF, specifically http://www.ogdf.net/doku.php/tech:howto:frcl
I have not used OGDF, but I do know that the Fast Multipole Multilevel method is a highly performant algorithm, and when you're dealing with the kinds of runtimes involved in force-directed layout at the number of nodes you're talking about, that matters a lot.
Why, among other reasons, that algorithm is fast: the fast multipole method. Instead of evaluating all O(n²) pairwise repulsive forces exactly, it approximates the influence of distant groups of nodes, reducing the cost of an exact force pass to near-linear time with a small, controllable error. Ideally, you'd have code from something like this: http://mgarland.org/files/papers/layoutgpu.pdf but I can't find it anywhere; maybe a CUDA solution isn't up your alley anyway.
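To illustrate just the aggregation idea behind such methods (a toy sketch, not the real FM³ or the paper's GPU implementation): distant nodes are lumped together and act as a single heavy particle, so a force pass no longer touches every pair.

// Toy approximation of n-body repulsion: bucket nodes into grid cells,
// then let each cell's centroid act on every node as one heavy particle,
// so the cost is O(n * cells) instead of O(n^2).
function approximateRepulsion(nodes, cellSize) {
    var cells = {};
    nodes.forEach(function (n) {
        var key = Math.floor(n.x / cellSize) + ',' + Math.floor(n.y / cellSize);
        var c = cells[key] || (cells[key] = { sx: 0, sy: 0, count: 0 });
        c.sx += n.x; c.sy += n.y; c.count++;
    });
    nodes.forEach(function (n) {
        n.fx = 0; n.fy = 0;
        Object.keys(cells).forEach(function (key) {
            var c = cells[key];
            var dx = n.x - c.sx / c.count, dy = n.y - c.sy / c.count;
            var d2 = dx * dx + dy * dy + 0.01; // avoid division by zero
            n.fx += c.count * dx / d2;         // heavier cells push harder
            n.fy += c.count * dy / d2;
        });
    });
}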
Good luck.
The Gephi Toolkit might be what you need; some of its layouts are very fast while still producing good quality: http://gephi.org/toolkit/
30 seconds to 2 minutes should be enough to lay out such a graph, depending on your machine.
You can use the ForceAtlas layout, or the Yifan Hu Multilevel layout.
For very large graphs (50K+ nodes and 500K+ links), the OpenOrd layout will scale better.
In a commercial scenario, you might also want to look at the family of yFiles graph layout and visualization libraries.
Even the JavaScript version of it can perform layouts for thousands of nodes and edges using different arrangement styles. The "organic" layout style is an implementation of a force-directed layout algorithm, similar in nature to the one used in Neo4j's browser application. But there are a lot more layout algorithms available that can give better visualizations for certain types of graph structures and diagrams. Depending on the settings and the structure of the problem, some of the algorithms take only seconds, while more complex implementations can also bring your JavaScript engine to its knees. The Java- and .NET-based variants still perform quite a bit better, as of today, but the JavaScript engines are catching up.
You can play with these algorithms and settings in this online demo.
Disclaimer: I work for yWorks, which is the maker of these libraries, but I do not represent my employer on SO.
I would take a look at http://neo4j.org/. It's open source, which is beneficial in your case, as you can customize it to your needs. The GitHub account can be found here.

Applications of large graph layout

I've been playing with large graph layout for a while now. What are some applications for the layout, visualization, and interaction with large graphs? Large in this case means at least tens of thousands of nodes.
I'd like to find something that doesn't just look cool but is actually useful.
I believe Tulip is designed to visualize huge graphs, and some of the screenshots might give you ideas for applications. There is also a similar conversation, but in reverse, here: there the question is which existing tools are best for huge graphs. The post by Scott has a number of links which might contain useful galleries.
The best tools for large graphs are mental mathematical tools from topology and differential geometry, such as fiber bundles, Ricci flows, or manifold surgery, as well as dimensionality reduction via either compression techniques or linear-algebraic methods. Anyone who sits there drawing a zillion nodes and edges without any analysis of structural redundancy is basically just trying to make pretty pictures with no purpose or understanding.
