D3.js force layout: How to isolate node groups? - graph

What I'm trying to achieve?
I would like to have groups of nodes, in a tree-like structure where each root is either the main root, or is a descendant of a leaf from another tree.
Generating what you see below is easy, but what I'd really like to see is complete circles around each root. However, since the nodes repel each other, the gaps shown below appear between the clusters. I assume the solution involves ignoring the repulsion caused by charges between leaves coming from different roots.
My ideas
Set some sort of radius around each root which repels other nodes in all directions beyond that radius, allowing the leaves to be circular within it
Use linkDistances and linkStrengths to somehow arrange the clusters in a way that they do not interact significantly
Is this possible?
Other than my vague ideas, I really have no clue how to do this!
From reading the D3 docs, I found that unlike the dynamic linkDistance and linkStrength methods, node charge manipulations seem to be universal:
"All nodes are assumed to be infinitesimal points with equal charge and mass."
If this statement is true, can one of you guys please guide me in the right direction?

I'm not sure exactly how you would approach it, but I found an example in which clusters have charge only within themselves and nodes don't interact with different ones: http://bl.ocks.org/mbostock/1804889

I think the beginning of an answer might be in this Stack Overflow question: Space out nodes evenly around root node in D3 force layout
Do know that it is possible to make each node's charge depend on some attribute of that node.
Try it with something like
.charge(function(d) {
  console.log(d); // you'll see this brings up the nodes
  if (d.something == onething) {
    return -1300;
  } else {
    return -100;
  }
})
As the linked-to answer mentions, you'll almost certainly have to experiment with "friction" and "linkDistance." Don't be afraid of trial and error--I at least have been intermittently dealing with this kind of issue for a few months now, and haven't yet found a "general" solution.
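The idea from the question, ignoring repulsion between leaves that belong to different roots, can be sketched outside of D3 as a plain function. This is only an illustration under stated assumptions: `groupRepulsion` is a hypothetical helper, not a D3 API, and `group` is an assumed node attribute naming the cluster root. How the resulting forces would be applied inside a force layout's tick loop is left open.

```javascript
// Hypothetical helper, not part of D3: pairwise repulsion restricted to nodes
// that share the same `group` value (an assumed attribute naming the cluster root).
function groupRepulsion(nodes, strength) {
  // One force accumulator per node, in the same order as `nodes`.
  const forces = nodes.map(() => ({ fx: 0, fy: 0 }));
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      if (a.group !== b.group) continue; // different clusters: no interaction
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const dist2 = dx * dx + dy * dy || 1e-6; // avoid division by zero
      const dist = Math.sqrt(dist2);
      const f = strength / dist2; // inverse-square repulsion
      // Push the two nodes apart along the line joining them.
      forces[i].fx -= (f * dx) / dist;
      forces[i].fy -= (f * dy) / dist;
      forces[j].fx += (f * dx) / dist;
      forces[j].fy += (f * dy) / dist;
    }
  }
  return forces;
}

const sampleNodes = [
  { id: "a1", group: "A", x: 0, y: 0 },
  { id: "a2", group: "A", x: 1, y: 0 },
  { id: "b1", group: "B", x: 0.5, y: 0.1 }, // sits between a1 and a2, other cluster
];
const sampleForces = groupRepulsion(sampleNodes, 1);
```

Here `a1` and `a2` push each other apart, while `b1` feels nothing even though it lies between them, which is the behaviour that would let each cluster close into a full circle.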

Related

Does an Increased Number of Node Types Impact Performance of Graph DBs?

I am in the process of creating a graph database, a simple one for movies with several types of information like the actors, producers, directors and so on.
What I would like to know is, is it better to break down your nodes to a more granular level? For example, is it better to have two kinds of nodes for 'actors' and 'directors' or is it better to have one node, say 'person' and use different kinds of relationships like 'acted_in' and 'directed'? Does this even matter at all?
Further, is there any impact on the traversal queries? Does having more types of nodes mean that the traversal is slower?
Note: I intend to implement this using the Gremlin console in Amazon Neptune.
The answer really is it depends. If I were building such a model I would break out the key "nouns" into their own nodes. I would also label the edges appropriately such as ACTED_IN or DIRECTED.
The performance of any graph query depends on how much data it will need to touch (the fan out factor as you go from depth to depth).
The best advice I can give you is think about the questions you will need the graph to answer and try to design your data model so that writing those queries is as easy as possible. Don't be afraid to iterate multiple times on your data model also. That is common and expected.
Properties can be useful when you want to add a unique piece of information to a node - perhaps the birthday of the director.
Edge properties can be useful for filtering out unneeded edges but edge labels can also. In some cases you may find a label such as DIRECTED-IN-2005 is a useful short cut to avoid checking a label and a property on an edge.

Graph database design: Should I add relationships, or just traverse

I have recently started exploring graph databases and Neo4J, and would like to work with my own data. At the moment I've hit some confusion. I've created an example image to illustrate my issue. In terms of efficiency, I'm wondering which option is better (and I want to get it right now in early days before I start handling larger amounts).
Option A: Using only the blue relationships, I can work out whether things are related to, or come under, the Ancient group. This process will be done many many times, however it is unlikely to be more than ~6 generations.
Option B: I implement the red relationships, so that it is much faster to work out if young structures belong to the Ancient group.
I'm trying not to use Labels in this scenario, as I'm trying to use labels for a specific purpose to simplify my life (linking structures across separate networks), and I'm not sure if I should have a label to represent a node that already exists.
In summary, I'm wondering whether adding a whole new bunch of relationships, whilst taking more space, is worth it, or whether traversing to find all relatives is such a simple/inexpensive task that it isn't worth doing so. Or alternatively, both options are viable and this isn't a real issue at all. Thanks for reading.
I'd go with Option A. One of the strengths of Neo4j is that it traverses relationships very efficiently and quickly, and so, there is no need to materialise relationships (sometimes, relationships are materialised in complex and/or extremely large graphs, but this is not your case).
Not sure why you don't want to use labels? Labels serve to group nodes into sets of the same type, and are also index-backed; this makes it much faster to find the starting point of your query (an index lookup rather than a full database scan).
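To make Option A concrete, here is a minimal in-memory sketch of the bounded upward traversal. It assumes each structure has a single parent, and the node names and the `parentOf` map are made up for illustration; they are not taken from the question's image. In Cypher, a variable-length pattern such as `-[*1..6]->` would express the same depth bound.

```javascript
// Hypothetical in-memory stand-in for the blue relationships: child -> parent ids.
const parentOf = new Map([
  ["young", "middle"],
  ["middle", "old"],
  ["old", "Ancient"],
  ["other", "unrelated"],
]);

// Walk parent links upward, at most `maxDepth` hops, looking for `ancestorId`.
function belongsTo(nodeId, ancestorId, maxDepth = 6) {
  let current = nodeId;
  for (let depth = 0; depth < maxDepth; depth++) {
    current = parentOf.get(current);
    if (current === undefined) return false; // reached a root without a match
    if (current === ancestorId) return true;
  }
  return false; // not found within the depth bound
}

console.log(belongsTo("young", "Ancient")); // true
console.log(belongsTo("other", "Ancient")); // false
```

Each check touches at most six links, which is why materialising the red relationships buys little at this depth.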

How can I force vis.js edges in a rendered DAG to "jump" graph levels?

I have used vis.js to draw some DAGs using the hierarchical layout option. It works well, however for my use case there are often going to be edges that must "jump vertex generations," not certain if I am saying that properly. Essentially, one branch may have 10 levels, and then a sibling of the parent of the deep branch may want to connect to the deepest leaf node.
This "works" - vis.js draws it. But it screws up my layout, shifting a large portion of the preexisting graph, and it will not be useful for a user to look at the result. I have attached a picture of what I am trying to achieve and what the current results are, can anyone point me in the right direction?
Solution turned out to be very simple, I just overlooked it. Using the hierarchical layout, it is possible to assign each node a field called level. It is an all-or-nothing option: either you let vis.js take care of the levels, or you manually assign all your nodes a level. It respects the levels very well, and when adding edges to nodes whose levels have been manually defined, the nodes no longer jump around the layout.
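Since every node needs a level once any node has one, it can help to compute the levels rather than type them in. The sketch below is one possible way to do that, not something vis.js provides: `assignLevels` is a hypothetical helper that sets each node's `level` to its longest-path distance from a root, so an edge that skips generations cannot pull its target upward. It assumes the graph is acyclic and uses the `id`, `from` and `to` field names that vis.js data uses.

```javascript
// Assign every node a `level` equal to its longest-path distance from a root.
function assignLevels(nodes, edges) {
  const level = new Map(nodes.map((n) => [n.id, 0]));
  const indegree = new Map(nodes.map((n) => [n.id, 0]));
  const children = new Map(nodes.map((n) => [n.id, []]));
  for (const e of edges) {
    children.get(e.from).push(e.to);
    indegree.set(e.to, indegree.get(e.to) + 1);
  }
  // Kahn's topological order; assumes the graph is acyclic.
  const queue = nodes.filter((n) => indegree.get(n.id) === 0).map((n) => n.id);
  while (queue.length > 0) {
    const id = queue.shift();
    for (const child of children.get(id)) {
      // A child sits at least one level below its deepest parent.
      level.set(child, Math.max(level.get(child), level.get(id) + 1));
      indegree.set(child, indegree.get(child) - 1);
      if (indegree.get(child) === 0) queue.push(child);
    }
  }
  return nodes.map((n) => ({ ...n, level: level.get(n.id) }));
}

const dagNodes = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }];
const dagEdges = [
  { from: 1, to: 2 },
  { from: 1, to: 3 },
  { from: 2, to: 4 },
  { from: 4, to: 5 },
  { from: 3, to: 5 }, // "jumping" edge: skips a generation
];
const leveledNodes = assignLevels(dagNodes, dagEdges);
```

The resulting array could then be handed to vis.js as the node data, with the hierarchical layout enabled. Node 5 keeps its deep level even though node 3 links to it directly.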

Graph exploration: does the choice to use incoming edges or outgoing edges affect performance?

I have been tinkering with Graphs for some time, with the objective that I implement appropriate portions of the server-side stack using them. I have used Scala-Graph and Neo4J, and I am learning Spark GraphX. In almost all the applications I have implemented, the model has been that of a Property Graph (Node -> Edge -> Node, with attributes).
When designing the graph (DAGs to be precise), if I spot a strong and directed relationship between two nodes, I set up an edge from one node to the other. This is obvious and intuitive. If a Person likes a Site, an edge with property 'likes' connects them. Thus:
[Nirmalya] -- (Likes) --> [StackOverFlow]
[John] -- (Likes) --> [StackOverFlow]
[Ted] -- (Likes) --> [GoogleGroups ]
[Nirmalya] -- (Likes) --> [Neo4J]
Now, using outgoing edges, I can easily find out which sites Nirmalya likes.
But when I want to find out who else likes what Nirmalya likes (i.e., John), I tend to think that I should also create an edge from the Site-type Node to the Person-type Node (with property 'isLikedBy'), so that the path is obvious and the traversal is intuitive. Every Person and Site must be connected in both directions, so that I can reach the other from either to answer queries like this one.
[Nirmalya] -- (Likes) --> [StackOverFlow] -- (IsLikedBy) --> [John]
But from many examples given by experts, I see that this is not prescribed. Instead, this is achieved by making use of operators like incoming. In other words, if two Nodes have an edge set up between them, I don't need to set both the directions of the edge explicitly (just 'likes' is sufficient, 'isLikedBy' is superfluous). Implementation of adjacency matrix makes this possible perhaps but I get a bit confused because I am being allowed to derive a contra-direction even when that direction is not explicit in the DAG.
My question is where is the gap in my understanding? Is it that 'IsLikedBy' direction should ideally be present, but we are optimizing? Alternatively, is it that there can be UseCases where such bidirectional edges are necessary and I need to spot them? Am I completely missing a theoretical underpinning?
I will be glad to become wiser.
I think it depends on the software. I can speak for Neo4j, but not for the other tools that you mentioned ;)
In Neo4j, relationships are designed to be traversable both forwards and backwards without a performance cost. This applies both to traversing in the Java APIs and to using Cypher. You can query by specifying an incoming or outgoing direction, or query for relationships without regard to direction, and the performance characteristics should be the same in each case.
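Why a single 'Likes' edge is enough can be illustrated with a toy adjacency index. This is a sketch only, and says nothing about how Neo4j stores relationships internally; `LikesGraph` and its method names are made up for the example. Each edge is recorded once but indexed from both endpoints, so following it backwards needs no separate 'IsLikedBy' edge.

```javascript
// Store each directed edge once, but index it from both endpoints.
class LikesGraph {
  constructor() {
    this.outgoing = new Map(); // person -> Set of sites
    this.incoming = new Map(); // site -> Set of persons
  }

  addLike(person, site) {
    if (!this.outgoing.has(person)) this.outgoing.set(person, new Set());
    if (!this.incoming.has(site)) this.incoming.set(site, new Set());
    this.outgoing.get(person).add(site);
    this.incoming.get(site).add(person);
  }

  // Follow Likes edges forwards, then the same edges backwards.
  whoElseLikesWhat(person) {
    const others = new Set();
    for (const site of this.outgoing.get(person) ?? []) {
      for (const other of this.incoming.get(site) ?? []) {
        if (other !== person) others.add(other);
      }
    }
    return [...others];
  }
}

const likes = new LikesGraph();
likes.addLike("Nirmalya", "StackOverFlow");
likes.addLike("John", "StackOverFlow");
likes.addLike("Ted", "GoogleGroups");
likes.addLike("Nirmalya", "Neo4J");

console.log(likes.whoElseLikesWhat("Nirmalya")); // [ 'John' ]
```

The direction stays part of the edge's meaning (a Person likes a Site), while the ability to walk it in reverse comes from the index, not from a second edge.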

Relationships with relationships in Neo4j

In Neo4j, is it possible for a relationship to have a relationship?
To illustrate: Imagine a domain model that encompasses a collection of geometric planes. Each plane has a collection of lines on it, and each line has a collection of points on it. Each point on a line is connected to the point after it by an outgoing -[NEXT]-> relationship, and to the point preceding it by an incoming one. The way I have it now, each of these NEXT relationships contains a property lineID, which identifies the line on which it exists. The node entities representing lines in the database contain only an id, and perhaps a bit of metadata, and we return line X by traversing the graph, finding all -[NEXT{lineID:X}]-> relationships, fetching the start and end nodes of each, and returning a list of them along with the line's metadata.
I was a bit more longwinded there than I intended to be, but my question is this: What if, rather than having a lineID property on each [NEXT] relationship, I wanted to create an -[ON]-> relationship between each [NEXT] and the node entity representing the line it is on?
To illustrate: Rather than doing
CREATE (:point)-[:NEXT{lineID:x}]->(:point)-[:NEXT{lineID:x}]-> ...
, what about something like:
CREATE (:point)-[z:NEXT]->(:point), (z)-[:ON]->(:line)
That's some ugly Cypher, but I hope it clarifies my point. Intuitively, it seems like this would make line traversals more efficient (because we'd be playing to Neo4j's strength by asking it to traverse all [ON] relationships from a line node rather than simply searching for a (presumably indexed) property). It would also make it easier to specify nested relationships:
(z)-[:ON]->(:line), (z)-[:ON]->(:plane)
Is this intuition misconceived? If not, would something like this be possible? I don't think it is, but am contemplating a workaround that would involve creating a node entity for each "relationship". Something like this:
(:point)<-[:FROM]-(x:next)-[:TO]->(:point), (x)-[:ON]->(:line)
, which would have the added advantage of facilitating hypergraph structures, which is something else I'm interested in. Leaving that conversation for another day (and another post), would such an approach be more trouble/expense than it's worth for the purposes elucidated here? Might there be any dis/advantages (aside from plain cost) I'm not considering? Or am I reinventing the wheel here: is there an extant solution for this situation that I'm unaware of?
Relationships cannot be linked to other relationships. I think that when you find yourself asking this kind of question, you may have a modelling problem, and the next thing to do is to try modelling the data differently. For instance, why should the relationship that links two points know which line the points are on? Wouldn't it be more natural for the point to know the line, and therefore to have the lineID property on the points? This way you can have points on several lines, which you can't model properly if the lineID is on the NEXT relationship. Perhaps even better, you can have a Line node with a CONTAINS relationship to every point on that line, instead of using a lineID property.
This is not possible.
Restructure your model so that any data on your relationship which needs linking is a node.
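As a rough illustration of turning the relationship into a node, the sketch below mirrors the (:point)<-[:FROM]-(x:next)-[:TO]->(:point), (x)-[:ON]->(:line) workaround from the question in a toy in-memory graph. All helper names (`addNode`, `addEdge`, `linkNext`, `segmentsOn`) are hypothetical, and the linear edge scans stand in for what a database would do with real traversals.

```javascript
// Toy graph: nodes have an id and a label; edges are (from, type, to) triples.
const nodeList = [];
const edgeList = [];
let nextNodeId = 0;

function addNode(label) {
  const node = { id: nextNodeId++, label };
  nodeList.push(node);
  return node;
}

function addEdge(from, type, to) {
  edgeList.push({ from: from.id, type, to: to.id });
}

// Replace (a)-[:NEXT]->(b) with an intermediate "next" node, which can then
// carry its own ON relationship to a line.
function linkNext(a, b, line) {
  const hop = addNode("next");
  addEdge(hop, "FROM", a);
  addEdge(hop, "TO", b);
  addEdge(hop, "ON", line);
  return hop;
}

// All [fromPointId, toPointId] pairs on a line, found by following ON edges backwards.
function segmentsOn(line) {
  return edgeList
    .filter((e) => e.type === "ON" && e.to === line.id)
    .map((on) => {
      const from = edgeList.find((e) => e.from === on.from && e.type === "FROM").to;
      const to = edgeList.find((e) => e.from === on.from && e.type === "TO").to;
      return [from, to];
    });
}

const lineX = addNode("line");
const p1 = addNode("point");
const p2 = addNode("point");
const p3 = addNode("point");
linkNext(p1, p2, lineX);
linkNext(p2, p3, lineX);

const segments = segmentsOn(lineX);
```

The cost is visible in the node and edge counts: two logical NEXT hops become two extra nodes and six edges, which is the trade-off to weigh against the simpler lineID property.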
