How to add multiple joinCondition in vis.js while clustering - vis.js

I am using vis.js for network diagramming, I need to cluster some nodes.
For Example: create new cluster when a node property is 'aaa' (Suppose two nodes has 'aaa' they will create cluster).
Again create new cluster when a node property is 'bbb' (Suppose two nodes has 'bbb' they will create another cluster).
I have two problems:
1) I don't want to hard-code 'aaa' or 'bbb'.
2) I don't want to create multiple clusterOptionsByData object and invoke network.cluster(clusterOptionsByData) multiple times.
Is there any way to pass multiple joinconditions while clustering in vis.js?

I'm the developer of the visjs network. The answer to 2) is no.
What is the use case of not wanting to call the method twice? You can construct your own join condition based on variables and pass that into the cluster method. You don't need to hardcore anything.
Every cluster call makes one cluster. Multiple join conditions do not make sense. It's up to you to come up with a good join condition that covers all your cases.
Next time you have a question, post it on our github issues page. We get email updates on these so we'd be able to help you quicker.
Cheers

Related

Gremlin - avoid traversing already seen edges

I've the following model which I'd like to model as a graph in Azure CosmmodDB.
So I have a user that can be in multiple groups, user can also have multiple permissions attached, groups can also have multiple permissions attached.
I want to find an efficient query that starting from User, I get all the permissions attached (either directly attached or via a group).
One thing to add is that user and group may be assigned to the same permission (and I want to get it just once).
I came up with the query:
g.V().hasLabel('user').has('userid', '0_2147483647').repeat(out().simplePath()).until(hasLabel('permission'))
This query is not very efficient when there is much data, so the question is: can we make it better ?
I don't see a reason to use repeat() here as the depth of your traversal is known. I would just do:
g.V().has('user`, 'userid', '0_2147483647').
union(out('has'),
out('isingroup').out('has')).
dedup()

How to clone a graph without modification of inital one?

How to clone a graph if:
nodes are not labeled
it's only possible to move between nodes through edges
you can't mark or any other way use nodes of original graph
More precise definition of problem:
You are navigating through graph.
At a time you see only one node and it's links. For example as simple as single number (number of links).
Graph is non-directed with unique links between each connected pair of nodes.
To move to another node you just pass a link index into step API call and get next node (number of links in next node). Function step handles moves in a way like list of links is sorted same way every time you enter the node. E.g. if you are in node A connected with node B then passing some constant number i_A_B into step always get you into node B.
At start you are in special start node that has only one connection, then you can only pass "0" into and get into start node. This is an equivalent to call special start API call without parameters and get into start node.
Is there any and what is the well-known name of this problem?
What is the algorithm that can make a clone of graph without modifying (marking, labeling, coloring) the original one?
Creating nodes during navigation and connect them to previous node is trivial.
The main difficulty then obviously would be to match already visited nodes with already created.
What minimal additional information is required?
Is marked initial node is enough?
For example without mark any cycle graph will look equivalent through this API. Also any graph with same regular structure but different size.
What types of graphs can be recognized (cloned) without marked start node?
Upd1: API provides no way to compare nodes or edges of original graph.
Upd2: This problem looks like creation of a map of a maze while walking inside it.
Or like replication of a state machine that is encapsulated in black box and provides only oracle that returns a set of acceptable inputs on call with current acceptable input.

Vis.js: How to get a cluster node from the network

Is there a way to get the Node data for a cluster by its id?
According to the documentation:
Clustered Nodes when created are not contained in the original data.nodes passed on network creation
So it's not possible to get it from the vis dataset like we do with normal nodes.
There is the method network.clustering.updateClusteredNode() to update a cluster node, but none to get the cluster node.
There is this way (quite simple).
if the node id is 7 for example.
you can get its data like this.
network.body.nodes[7]
Hope it helps you.

Can a graph database edge have multiple starting nodes?

I am designing a graph database for eligibility rules. Some eligibility rules require that a user select 2 particular products (Product A and Product B) to qualify for Product C.
Is it possible to create a graph edge with 2 starting nodes?
I would think this would break what I think is the fundamental building block of a graph db - its adjacency list. But if this was possible, it would be very powerful for my application.
Update 6/16
More specifically, I'm looking to create a directed edge with 2 starting nodes, and 1 ending node. So, in biz rules terms: IF Node=A AND Node=B THEN Node=C. The real world relationship is this: If customer buys Product A and Product B, then customer qualifies for Product C.
Usually, to model a hypergraph in Neo4j, you end up creating an intermediate "group node" that connects all of the nodes you want to connect, then bridging off of that node to the other node. It's not a true hypergraph, but rather a representation of a hypergraph using the tools provided.
Here's an example:
http://www.markhneedham.com/blog/2013/10/22/neo4j-modelling-hyper-edges-in-a-property-graph/
Yes you can have multiple starting nodes in Neo4j, not sure about other graph db.
START a=node(0), b=node(1)
RETURN a,b
You should refer to http://docs.neo4j.org/chunked/stable/query-start.html for more details. Starting from Neo4j 2.0 start node is optional, Cypher will try and infer starting points from your query based on label and where clause.
Edit
I have edited the answer based on the updated question. What you need is a hypergraph. As Wes Freeman mentioned, to model a hypergraph Neo4j you will need to create an intermediate node that connects your other two nodes and the the third node. In you scenario a user will have a PURCHASED relationship with the two products(A and B) kinda like (:User {Id: 1})-[:PURCHASED]->(:Product {Name:A}). Then you will have to create an intermediate node like ProductQualifier (I am very bad at naming things) having a relationship from user like (:User {Id:1})-[:QUALIFIES]->(:ProductQualifier {Id:1}). The Product qualifier will then have 3 relations, two to Product A and B respectively and a third to Product C,
(:Product {Name: 'B'})<-[:HAS]-(:ProductQualifier {Id:1})-[:HAS]->(:Product {Name: 'A'})
(ProductQualifier {Id:1}-[:ELIGIBLE]->(:Product {Name: 'C'})
This ought to do what you want.
A second approach that you can take is use a database that inherently supports hypergraphs, something like Hypergraphdb, thus discarding the burden of creating extra node. I haven't had any occasion to use it though I wanted to try it out for quite some time, so I don't know in much details about its API's or its limitations, however it is fairly well known graph database.
Note: As mentioned I am very bad at naming things. You should probably change the label names to more suitable to your business model.

Storing multiple graphs in Neo4J

I have an application that stores relationship information in a MySQL table (contact_id, other_contact_id, strength, recorded_at). This is fine if all I need to do is show who a contact's relationships are or even to generate a list of mutual contacts for two contacts.
But now I need to generate stats like: 'what was the total number of 2-way connections of strength 3 or better in January 2011' or (assuming that each contact is part of a group) 'which group has the most number of connections to other groups' etc.
I quickly found that the SQL for generating these stats became unwieldy real fast.
So I wrote a script that for any given date it will generate a graph in memory. I could then run whatever stat I wanted against that graph. Much easier to understand and in general, much more performant also -- except for the generating the graph part.
My next thought was to cache those graphs so I could call on them whenever I needed to run a new stat (or generate a later graph: eg for today's graph I take yesterday's graph and apply any changes that happened since yesterday). I tried memcached which worked great until the graphs grew > 1 MB.
So now I'm thinking about using a graph database like Neo4J.
Only problem is, I don't have just one graph. Or I do, but it is one that changes over time and I need to be able to query it with different reference times.
So, can I:
store multiple graphs in Neo4J and rertrieve/interact with them separately? i would then create and store separate social graphs for each date.
or
add valid to and from timestamps to each edge and filter the graph appropriately: so if i wanted a graph for "May 1st" i would only follow the newest edge between two noeds that was created before "May 1st" (and if all the edges were created after May 1st then those nodes wouldn't be connected).
I'm pretty new to graph databases so any help/pointers/hints will be appreciated.
Right now you can store just one graph database in a single Neo4j instance, but this one graphdb can contain as many different sub-graphs as you like. You only have to keep that in mind when doing global operations (like index queries) but there you can do compound queries that include timestamped properties as well to limit the results.
One way of doing that is, as you said adding temporal information to edges to represent the structure of a graph for a given date you can then traverse the structure of the graph back then.
Reference node has a different meaning in Neo4j.
Using category nodes per day (and linking them and also aggregating them for higher level timespans) is the more graphy way of categorizing nodes than indexed properties. (Effectively these are in-graph indices that you can easily include in your traversals and graph queries).
You don't have to duplicate the nodes as long as you are only interested in different temporal structures. If your nodes are also different (e.g. changing properties, you could either duplicate them, and so effectively creating different subgraphs) or create a connected list of history nodes on each node that contain just the changes (or the full snapshot depending on your requirements).
Your domain sounds very fitting for the graph database. If you have more and detailed questions feel free to join the Neo4j mailing list.
Not the easiest solution (I'm assuming you only work with one machine), but if you really want to separate your graphs, you only need to remember that a graph is a directory.
You can then create a dynamic loader class which takes the path of the database you want, load it in memory for the query, and close it after you getting your answer. You could also configure a proxy server, and send 2 parameters to your loader: your query (which I presume is a cypher query in this case) and the path of the database you want to query.
This is not adequate if you have tons of real-time queries to answer. But if it is simply for storing and doing some analytics over data sets, it can definitly answer your needs.
This is an old question, but starting with Neo4j 4.x, multi-tenancy is supported and you can have different databases within the same Neo4j server (with distinct RBAC permissions).

Resources