Interactive alternative to dot? - graph

From time to time I need to visualize dependencies or dependent structures, for example function calls, data structures, etc.
For relatively small graphs Graphviz dot is a good match. The input format of dot is easy to generate and it produces good layouts.
But sometimes the graph contains too many vertices and dependencies to be useful as a static PDF document. For that I want to use an interactive graph viewer, where I can dynamically select a main vertex (or two, restricting the displayed connections), temporarily hide vertices/edges to make the graph more accessible, zoom in/out, etc.
What are my open-source alternatives for such an interactive tool?
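Until a suitable viewer turns up, the "select a main vertex and restrict the displayed connections" part can be approximated by regenerating the dot input for a neighbourhood of the focus vertex. This is only a minimal sketch, not an interactive tool; the function names, the hop limit and the sample call graph are assumptions made up for illustration.

```python
from collections import deque

def neighborhood(edges, focus, max_hops):
    """Return the vertices within max_hops of focus, ignoring edge direction."""
    adjacent = {}
    for src, dst in edges:
        adjacent.setdefault(src, set()).add(dst)
        adjacent.setdefault(dst, set()).add(src)
    hops = {focus: 0}
    queue = deque([focus])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in adjacent.get(vertex, ()):
            if other not in hops:
                hops[other] = hops[vertex] + 1
                queue.append(other)
    return set(hops)

def to_dot(edges, keep):
    """Emit dot source containing only the edges between kept vertices."""
    lines = ["digraph G {"]
    for src, dst in edges:
        if src in keep and dst in keep:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical call graph: (caller, callee) pairs.
calls = [("main", "parse"), ("main", "run"), ("run", "step"),
         ("step", "log"), ("parse", "log")]
focus_view = neighborhood(calls, "run", 1)
dot_source = to_dot(calls, focus_view)
print(dot_source)
```

The resulting text can be fed to dot as usual; changing the focus vertex or the hop limit and re-rendering gives a crude form of the navigation described above.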

Quoting from the long list (from Paul Sweatte's comment):
InfoVis Toolkit – The InfoVis Toolkit is an interactive graphics toolkit written in Java to ease the development of Information Visualization applications and components.
Prefuse – Prefuse is a user interface toolkit for building highly interactive visualizations of structured and unstructured data. This includes any form of data that can be represented as a set of entities (or nodes) possibly connected by any number of relations (or edges). Examples of data supported by prefuse include hierarchies (organization charts, taxonomies, file systems), networks (computer networks, social networks, web site linkage) and even non-connected collections of data (timelines, scatterplots). See also Jeff Heer, the author of Prefuse (http://jheer.org/).
Treebolic – Treebolic is a Java component (widget) whose purpose is to provide a hyperbolic rendering of hierarchical data. A tree is rendered with nodes and edges but display space is subject to a particular curvature (hence the name): more space is allocated to the focus node while the parent and children, still in the immediate visual context, appear slightly smaller. The grandparents and grandchildren are still visible but come out even smaller. As we move away from the focus node, less display space is allotted to the nodes, which gradually disappear towards the disk’s border, as though the whole hierarchy were seen through a fisheye lens. Wrapped as a Java applet, the Treebolic widget can be embedded in a web page. Nodes may then contain hypertext links and direct the browser to other web pages. The tree is dynamic (animation brings the focus node to the center) and responds to user interaction.
Walrus – Walrus is a tool for interactively visualizing large directed graphs in three-dimensional space. By employing a fisheye-like distortion, it provides a display that simultaneously shows local detail and the global context. It is technically possible to display graphs containing a million nodes or more, but visual clutter, occlusion, and other factors can diminish the effectiveness of Walrus as the number of nodes, or the degree of their connectivity, increases. Thus, in practice, Walrus is best suited to visualizing moderately sized graphs that are nearly trees. A graph with a few hundred thousand nodes and only a slightly greater number of links is likely to be the best target size.

Related

Does an Increased Number of Node Types Impact Performance of Graph DBs?

I am in the process of creating a graph database, a simple one for movies with several types of information like the actors, producers, directors and so on.
What I would like to know is, is it better to break down your nodes to a more granular level? For example, is it better to have two kinds of nodes for 'actors' and 'directors' or is it better to have one node, say 'person' and use different kinds of relationships like 'acted_in' and 'directed'? Does this even matter at all?
Further, is there any impact on the traversal queries? Does having more types of nodes mean that the traversal is slower?
Note: I intend to implement this using the Gremlin console in Amazon Neptune.
The answer really is: it depends. If I were building such a model, I would break out the key "nouns" into their own nodes. I would also label the edges appropriately, such as ACTED_IN or DIRECTED.
The performance of any graph query depends on how much data it will need to touch (the fan out factor as you go from depth to depth).
The best advice I can give you is to think about the questions you will need the graph to answer and try to design your data model so that writing those queries is as easy as possible. Also, don't be afraid to iterate multiple times on your data model. That is common and expected.
Properties can be useful when you want to add a unique piece of information to a node - perhaps the birthday of the director.
Edge properties can be useful for filtering out unneeded edges, but so can edge labels. In some cases you may find that a label such as DIRECTED-IN-2005 is a useful shortcut that avoids checking both a label and a property on an edge.
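To make the modelling choice concrete, here is a small in-memory sketch in Python. It is not Gremlin or Neptune code; the `Graph` class, the node ids and the `year` property are all assumptions invented for illustration. It shows one 'person' node with differently labelled edges, and the two ways of narrowing a traversal: a label plus a property, or a more specific label.

```python
from collections import defaultdict

class Graph:
    """Toy property graph: labelled nodes, labelled edges with properties."""

    def __init__(self):
        self.labels = {}                    # node id -> node label
        self.out_edges = defaultdict(list)  # node id -> [(edge label, props, target)]

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def add_edge(self, src, edge_label, dst, **props):
        self.out_edges[src].append((edge_label, props, dst))

    def out(self, src, edge_label, **where):
        """Follow outgoing edges with the given label and matching properties."""
        return [dst for label, props, dst in self.out_edges[src]
                if label == edge_label
                and all(props.get(key) == value for key, value in where.items())]

g = Graph()
g.add_node("p1", "person")
g.add_node("m1", "movie")
g.add_node("m2", "movie")

# One 'person' node, several relationship types.
g.add_edge("p1", "ACTED_IN", "m1", year=2003)
g.add_edge("p1", "DIRECTED", "m2", year=2005)
# The same fact again, with the year folded into the edge label.
g.add_edge("p1", "DIRECTED-IN-2005", "m2")

by_property = g.out("p1", "DIRECTED", year=2005)  # checks label and property
by_label = g.out("p1", "DIRECTED-IN-2005")        # checks the label only
print(by_property, by_label)
```

Either way the query only touches the edges leaving the starting node, which is the fan-out factor mentioned above.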

Graph database design: Should I add relationships, or just traverse

I have recently started exploring graph databases and Neo4J, and would like to work with my own data. At the moment I've hit some confusion. I've created an example image to illustrate my issue. In terms of efficiency, I'm wondering which option is better (and I want to get it right now, in the early days, before I start handling larger amounts of data).
Option A: Using only the blue relationships, I can work out whether things are related to, or come under, the Ancient group. This process will be done many, many times; however, it is unlikely to involve more than ~6 generations.
Option B: I implement the red relationships, so that it is much faster to work out if young structures belong to the Ancient group.
I'm trying not to use Labels in this scenario, as I'm trying to use labels for a specific purpose to simplify my life (linking structures across separate networks), and I'm not sure if I should have a label to represent a node that already exists.
In summary, I'm wondering whether adding a whole new bunch of relationships, whilst taking more space, is worth it, or whether traversing to find all relatives is such a simple/inexpensive task that it isn't worth doing so. Or alternatively, both options are viable and this isn't a real issue at all. Thanks for reading.
I'd go with Option A. One of the strengths of Neo4j is that it traverses relationships very efficiently and quickly, and so, there is no need to materialise relationships (sometimes, relationships are materialised in complex and/or extremely large graphs, but this is not your case).
Not sure why you don't want to use labels. Labels serve to group nodes into sets of the same type and are also index-backed; this makes it much faster to find the starting point of your query (an index lookup rather than a full database scan).
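As a rough illustration of why Option A is cheap, here is a plain-Python sketch (not Cypher, and not how Neo4j works internally). It assumes each structure has at most one parent, which may not match the image in the question; the `parent_of` mapping and the node names are made up.

```python
def belongs_to(parent_of, node, ancestor, max_depth=6):
    """Walk up the parent links, at most max_depth steps, looking for ancestor."""
    current = node
    for _ in range(max_depth):
        current = parent_of.get(current)
        if current is None:
            return False  # reached a root without meeting the ancestor
        if current == ancestor:
            return True
    return False

# Hypothetical "blue" relationships: child -> parent.
parent_of = {
    "Old": "Ancient",
    "Middle": "Old",
    "Young": "Middle",
    "Other": "Elsewhere",
}

print(belongs_to(parent_of, "Young", "Ancient"))
print(belongs_to(parent_of, "Other", "Ancient"))

# Option B would amount to storing this set up front (the "red" relationships).
materialised = {n for n in parent_of if belongs_to(parent_of, n, "Ancient")}
```

With about six generations, each check is a handful of hops, so the stored shortcut buys little here.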

graph representation tool for huge data with specific features

I'm building a JavaFX app and I want to display an interactive graph of my huge data set in it, something like placing Cytoscape in a JavaFX app and working with the graph inside my app. My graph may have up to 30,000 nodes at most, but usually it has about 200 nodes after filtering.
Key features (sorted by importance):
1. Generating a good-looking graph with the best layout, good performance, and little overlap (same as Cytoscape)
2. Selecting some nodes and marking them (same as Ctrl+L in Cytoscape)
3. Selecting the neighbours of some nodes
4. Building a new graph from number 3
5. Filtering the graph based on weights, number of edges, and ...
6. Hiding and showing some selected edges and nodes
7. Capturing an image of the built graph
Additional features:
Zoom in and zoom out
Node tagging
Multicolor nodes and edges
Changing the width of edges based on weight
Changing the color of specific nodes and edges without rebuilding the graph
Directed edge support
I have tested cytoscape.js but couldn't use it in the JavaFX browser. I'm testing WebVOWL now. Is there anything better than these for my purpose? If you suggest something that can't be placed in a JavaFX app directly, please show how to do it.
Thanks
Depending on what you're trying to do, you could use Cytoscape as the data model, and build a JavaFX renderer around it. I've wanted to do this, but it's not on the roadmap associated with our funding.
I've done a few JavaFX projects that might be good starting points, one of which is based on a great example from TESIS DYNAware GmbH. They don't integrate directly with Cytoscape, though, which has a richer model of subnetworks, groups, etc.
https://github.com/AdamStuart/appFX/tree/master2/src/main/java/diagrams
As you realize, the key issue is filtering down the network before trying to visualize it. The number of edges associated with 30,000 nodes will bog down almost any system if you try to build something interactive.
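A minimal sketch of that filtering step, independent of any rendering library. It is written in Python for brevity rather than Java, and the function names, the weight threshold and the sample edges are assumptions for illustration. It covers filtering by weight, selecting neighbours, and building a new graph from the selection.

```python
def filter_by_weight(edges, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return [(a, b, w) for a, b, w in edges if w >= min_weight]

def neighbours(edges, selected):
    """Return nodes adjacent to any selected node, excluding the selection."""
    found = set()
    for a, b, _ in edges:
        if a in selected:
            found.add(b)
        if b in selected:
            found.add(a)
    return found - set(selected)

def induced_subgraph(edges, nodes):
    """Keep only edges whose two endpoints are both in nodes."""
    return [(a, b, w) for a, b, w in edges if a in nodes and b in nodes]

# Hypothetical weighted edges: (source, target, weight).
edges = [("a", "b", 0.9), ("b", "c", 0.2), ("a", "d", 0.7),
         ("d", "e", 0.8), ("e", "f", 0.1)]

strong = filter_by_weight(edges, 0.5)
near = neighbours(strong, {"a"})
small_graph = induced_subgraph(strong, near | {"a"})
print(small_graph)
```

Only the reduced edge list would then be handed to whatever component does the layout and drawing.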

Graph exploration: does the choice to use incoming edges or outgoing edges affect performance?

I have been tinkering with Graphs for some time, with the objective that I implement appropriate portions of the server-side stack using them. I have used Scala-Graph and Neo4J, and I am learning Spark GraphX. In almost all the applications I have implemented, the model has been that of a Property Graph (Node -> Edge -> Node, with attributes).
When designing the graph (DAGs to be precise), if I spot a strong and directed relationship between two nodes, I set up an edge from one node to the other. This is obvious and intuitive. If a Person likes a Site, an edge with property 'likes' connects them. Thus:
[Nirmalya] -- (Likes) --> [StackOverFlow]
[John] -- (Likes) --> [StackOverFlow]
[Ted] -- (Likes) --> [GoogleGroups]
[Nirmalya] -- (Likes) --> [Neo4J]
Now, using outgoing edges, I can easily find out which sites Nirmalya likes.
But when I want to find out who else likes what Nirmalya likes (i.e., John), I tend to think that I should also create an edge from the Site-type Node to the Person-type Node (with property 'isLikedBy'), so that the path is obvious and the traversal is intuitive. Every Person and Site must be connected in both directions, so that I can reach the other from either to answer queries like this one.
[Nirmalya] -- (Likes) --> [StackOverFlow] -- (IsLikedBy) --> [John]
But from many examples given by experts, I see that this is not prescribed. Instead, this is achieved by making use of operators like incoming. In other words, if two Nodes have an edge set up between them, I don't need to set both directions of the edge explicitly (just 'likes' is sufficient, 'isLikedBy' is superfluous). Perhaps an adjacency-matrix implementation makes this possible, but I get a bit confused because I am being allowed to derive a contra-direction even when that direction is not explicit in the DAG.
My question is: where is the gap in my understanding? Is it that the 'IsLikedBy' direction should ideally be present, but we are optimizing? Alternatively, is it that there can be use cases where such bidirectional edges are necessary and I need to spot them? Am I completely missing a theoretical underpinning?
I will be glad to become wiser.
I think it depends on the software. I can speak for Neo4j, but not for the other tools that you mentioned ;)
In Neo4j, relationships are designed to be traversable both forwards and backwards without a performance cost. This applies both to traversing in the Java APIs and to using Cypher. You can query by specifying an incoming or outgoing direction, or query for relationships without regard to direction, and the performance characteristics should be the same.
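To illustrate the idea (this is a toy model in Python, not Neo4j's actual storage or API), a single stored 'Likes' edge can be registered at both of its endpoints, so that following it backwards needs no separate 'IsLikedBy' edge. The class and method names are assumptions for illustration; the data is the example from the question.

```python
from collections import defaultdict

class LikesGraph:
    """Each edge is stored once but is reachable from both endpoints."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # person -> sites the person likes
        self.incoming = defaultdict(set)  # site -> persons who like the site

    def add_like(self, person, site):
        self.outgoing[person].add(site)
        self.incoming[site].add(person)

    def also_like(self, person):
        """Who else likes something this person likes?"""
        others = set()
        for site in self.outgoing[person]:   # follow 'Likes' forwards
            others |= self.incoming[site]    # follow 'Likes' backwards
        others.discard(person)
        return others

graph = LikesGraph()
graph.add_like("Nirmalya", "StackOverFlow")
graph.add_like("John", "StackOverFlow")
graph.add_like("Ted", "GoogleGroups")
graph.add_like("Nirmalya", "Neo4J")

print(graph.also_like("Nirmalya"))
```

The direction is still part of the data; the store simply lets you start from either end of the same edge.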

Visualizing a hierarchy of rectangles with neo4j?

I am thinking of using neo4j to store a graph database. My data basically consists of a hierarchy of rectangular regions with fixed coordinates: the top node has R rectangles in it, each of those has Q rectangles in it, and so on. The regions do not form a rectangular subdivision. Since I have a lot of data, I would like to be able to present an interface where a user can click on a particular rectangle to see its substructures in more detail, and then be able to click on one of those rectangles to show more detail, and so on. My application would be sort of like Google Maps, where more detailed layers get loaded as a user zooms in. I was thinking of generating tiles to serve to OpenLayers or Leaflet for display, but my data has a graph structure that I would like to take advantage of, and I think using neo4j (possibly in combination with a visualization library like d3.js) may be an easier way to build my tool.
I have these questions about neo4j and the ability to visualize its data:
Can data in neo4j be organized into different layers corresponding to different levels of detail?
Can neo4j display nodes as rectangles with fixed coordinates on a 2D plane? Can these rectangles be selectable / "zoomable"?
I know neo4j has a default web interface for showing nodes but I'd like to know how customizable this is before committing a lot of time to it. The TreeMap example at https://github.com/mbostock/d3/wiki/Gallery sort of looks like what I want, but I'd like to show more detailed structure in regions that users select.
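For reference, the data structure I have in mind looks roughly like the sketch below. It is plain Python rather than anything neo4j-specific, and the `Region` class, the field names and the coordinates are made up for illustration. Each rectangle has fixed coordinates and a list of child rectangles, and clicking a point selects the child under it.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    name: str
    x: float
    y: float
    width: float
    height: float
    children: list = field(default_factory=list)

    def contains(self, px, py):
        """True if the point lies inside this rectangle (edges included)."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

def drill_down(region, px, py):
    """Return the first child rectangle under the clicked point, or None."""
    for child in region.children:
        if child.contains(px, py):
            return child
    return None

# Hypothetical three-level hierarchy; the children do not tile the parent.
a1 = Region("a1", 1, 1, 2, 2)
a = Region("a", 0, 0, 10, 10, [a1])
b = Region("b", 20, 0, 10, 10)
top = Region("top", 0, 0, 100, 100, [a, b])

first_click = drill_down(top, 5, 5)
second_click = drill_down(first_click, 2, 2)
print(first_click.name, second_click.name)
```

Each level of the tree would correspond to one level of detail in the display.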

Resources