How can a graph be cloned if:
the nodes are not labeled
it's only possible to move between nodes through edges
you can't mark the nodes of the original graph or use them in any other way
More precise definition of problem:
You are navigating through a graph.
At any time you see only one node and its links, possibly as little as a single number (the number of links).
The graph is undirected, with a unique link between each connected pair of nodes.
To move to another node you pass a link index into the step API call and get the next node (the number of links in the next node). The step function behaves as if the list of links were sorted the same way every time you enter a node. E.g. if you are in node A, which is connected to node B, then passing some constant number i_A_B into step always gets you to node B.
At the start you are in a special start node that has only one connection, so the only value you can pass into step is "0", which gets you into the start node. This is equivalent to calling a special start API call without parameters to get into the start node.
Does this problem have a well-known name, and if so, what is it?
What algorithm can make a clone of the graph without modifying (marking, labeling, coloring) the original one?
Creating nodes during navigation and connecting them to the previous node is trivial.
The main difficulty is then matching already visited nodes with already created ones.
What minimal additional information is required?
Is a marked initial node enough?
For example, without a mark any cycle graph will look equivalent through this API, as will any graphs with the same regular structure but different sizes.
What types of graphs can be recognized (cloned) without a marked start node?
Upd1: The API provides no way to compare nodes or edges of the original graph.
Upd2: This problem looks like creating a map of a maze while walking inside it.
Or like replicating a state machine that is encapsulated in a black box and provides only an oracle that returns the set of acceptable inputs when called with a currently acceptable input.
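To make the cycle-graph remark concrete, here is a minimal Python sketch of the API as described. Everything in it is an assumption for illustration: the class BlackBoxGraph, the helpers cycle, observations and indistinguishable, and the choice to ignore the special one-link start node. It only checks, by brute force over short move sequences, that two cycles of different size produce identical observations.

```python
from itertools import product


class BlackBoxGraph:
    """Hypothetical stand-in for the API in the question: the caller only
    ever sees the number of links of the current node."""

    def __init__(self, ports, start=0):
        # ports[v] is the fixed, ordered list of neighbours of node v.
        self._ports = ports
        self._start = start
        self._current = start

    def start(self):
        """Jump back to the start node and report its number of links."""
        self._current = self._start
        return len(self._ports[self._current])

    def step(self, link_index):
        """Follow the link with the given index; report the new node's degree."""
        self._current = self._ports[self._current][link_index]
        return len(self._ports[self._current])


def cycle(n):
    """Cycle graph on n >= 3 nodes; port 0 goes 'forward', port 1 'backward'."""
    return {v: [(v + 1) % n, (v - 1) % n] for v in range(n)}


def observations(graph, moves):
    """Everything an explorer learns from one sequence of moves."""
    return [graph.start()] + [graph.step(m) for m in moves]


def indistinguishable(ports_a, ports_b, max_len=6):
    """True if no move sequence up to max_len tells the two graphs apart.

    Assumes every node has exactly two links, which holds for cycles.
    """
    a, b = BlackBoxGraph(ports_a), BlackBoxGraph(ports_b)
    for length in range(max_len + 1):
        for moves in product(range(2), repeat=length):
            if observations(a, moves) != observations(b, moves):
                return False
    return True
```

Calling indistinguishable(cycle(3), cycle(7)) returns True: every observation is just the number 2, so the explorer cannot learn the size of the cycle from the API alone.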
I am brand new to Gremlin and am using gremlin-python to traverse my graph. The graph is made up of many clusters or sub-graphs which are intra-connected, and not inter-connected with any other cluster in the graph.
A simple example of this is a graph with 5 nodes and 3 edges:
Customer_1 is connected to CreditCard_A with 1_HasCreditCard_A edge
Customer_2 is connected to CreditCard_B with 2_HasCreditCard_B edge
Customer_3 is connected to CreditCard_A with 3_HasCreditCard_A edge
I want a query that will return a sub-graph object of all nodes and edges connected (in or out) to the queried node. I can then store this sub-graph as a variable and then run different traversals on it to calculate different things.
This query would need to be recursive as these clusters could be made up of nodes which are many (inward or outward) hops away from each other. There are also many different types of nodes and edges, and they all must be returned.
For example:
If I specified Customer_1 in the query, the resulting sub-graph would contain Customer_1, Customer_3, CreditCard_A, 1_HasCreditCard_A, and 3_HasCreditCard_A.
If I specified Customer_2, the returned sub-graph would consist of Customer_2, CreditCard_B, and 2_HasCreditCard_B.
If I queried Customer_3, the exact same sub-graph object as returned from the Customer_1 query would be returned.
I have used both Neo4J with Cypher and Dgraph with GraphQL and found this task quite easy in these two languages, but am struggling a bit more with understanding Gremlin.
EDIT:
From this question, the selected answer should achieve what I want if I drop the edge type by changing .both('created') to just .both().
However, the loop syntax: .loop{true}{true} is invalid in Python of course. Is this loop function available in gremlin-python? I cannot find anything.
EDIT 2:
I have tried this and it seems to be working as expected, I think.
g.V(node_id).repeat(bothE().otherV().simplePath()).emit()
Is this a valid solution to what I am looking for? Is it also possible to include the queried node in this result?
Regarding the second edit, this looks like a valid solution that returns all the vertices connected to the starting vertex.
Some small fixes:
you can change bothE().otherV() to both()
if you also want to get the starting vertex, you need to move the emit step before the repeat
I would add a dedup step to remove duplicate vertices (there can be more than one path to a vertex)
g.V(node_id).emit().repeat(both().simplePath()).dedup()
example: https://gremlify.com/jngpuy3dwg9
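For anyone newer to Gremlin, it may help to see what the traversal computes: the connected component of the starting vertex, ignoring edge direction. Below is a minimal pure-Python sketch of that idea on the five-vertex example from the question. It is not gremlin-python code; the EDGES list and the connected_subgraph function are made up for illustration, and unlike the traversal above it also collects the edges.

```python
from collections import defaultdict, deque

# Edges of the example graph from the question: (from, edge name, to).
EDGES = [
    ("Customer_1", "1_HasCreditCard_A", "CreditCard_A"),
    ("Customer_2", "2_HasCreditCard_B", "CreditCard_B"),
    ("Customer_3", "3_HasCreditCard_A", "CreditCard_A"),
]


def connected_subgraph(edges, start):
    """Return (vertices, edge names) reachable from start, ignoring direction."""
    neighbours = defaultdict(list)
    for src, name, dst in edges:
        # Register the edge on both endpoints so it can be walked either way.
        neighbours[src].append((name, dst))
        neighbours[dst].append((name, src))

    vertices, edge_names = {start}, set()
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for name, other in neighbours[vertex]:
            edge_names.add(name)
            if other not in vertices:  # plays the role of dedup()
                vertices.add(other)
                queue.append(other)
    return vertices, edge_names
```

Starting from Customer_1 or from Customer_3 gives the same component (Customer_1, Customer_3, CreditCard_A and the two edges between them), while Customer_2 gives only Customer_2, CreditCard_B and 2_HasCreditCard_B.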
Could you please help me write a query that returns each source vertex in my traversal along with its associated edges and vertices as arrays on that source vertex? In short, I need a result set comprising an array of 3-tuples, with item 1 of each tuple being the source vertex and items 2 and 3 being the associated arrays.
Thanks!
EDIT 1: Expanded on the graph data and added my current problem query.
EDIT 2: Improved Gremlin sample graph code (apologies, didn't think anyone would actually run it.)
Sample Graph
g.addV("blueprint").property("name","Mall").
addV("blueprint").property("name","HousingComplex").
addV("blueprint").property("name","Airfield").
addV("architect").property("name","Tom").
addV("architect").property("name","Jerry").
addV("architect").property("name","Sylvester").
addV("buildingCategory").property("name","Civil").
addV("buildingCategory").property("name","Commercial").
addV("buildingCategory").property("name","Industrial").
addV("buildingCategory").property("name","Military").
addV("buildingCategory").property("name","Residential").
V().has("name","Tom").addE("designed").to(V().has("name","HousingComplex")).
V().has("name","Tom").addE("assisted").to(V().has("name","Mall")).
V().has("name","Jerry").addE("designed").to(V().has("name","Airfield")).
V().has("name","Jerry").addE("assisted").to(V().has("name","HousingComplex")).
V().has("name","Sylvester").addE("designed").to(V().has("name","Mall")).
V().has("name","Sylvester").addE("assisted").to(V().has("name","Airfield")).
V().has("name","Sylvester").addE("assisted").to(V().has("name","HousingComplex")).
V().has("name","Mall").addE("classification").to(V().has("name","Commercial")).
V().has("name","HousingComplex").addE("classification").to(V().has("name","Residential")).
V().has("name","Airfield").addE("classification").to(V().has("name","Civil"))
Please note that the above is a very simplified rendering of our data.
Needed Query Results
I need to bring back each blueprint vertex as a base with each of its associated edges / vertices as arrays.
My Current Solution
Currently I do this very cumbersome query that gets the blueprints and assigns a label, gets the architects and assigns a label, then selects both labels. The solution is ok; however, it gets messy when I need to include edges or I need to get blueprint classification vertices (industrial, military, residential, commercial, etc.). In effect, the more associated data that I need to pull back for each blueprint, the sloppier my solution becomes.
My current query looks something like this:
g.V().hasLabel("blueprint").as("blueprints").
outE().or(hasLabel("designed"),hasLabel("assisted")).inV().as("architects").
select("blueprints").coalesce(out("classification"),constant()).as("classifications").
select("blueprints","architects","classifications")
The above produces a lot of duplication. If there are b blueprints, a architects, and c classifications, the result set comprises b * a * c results. I'd like one blueprint with an array of its associated architects and an array of its associated classifications, if any.
Complications
I'm trying to do this in one query so that I can get all blueprint data from the graph to populate a filtered list. Once I have the list comprising all of the vertices, edges, and their properties, users can then click links to blobs, browse to project sites, etc. Accordingly, I've got pagination as well as filtering to think about and I'd prefer to make one trip to the server each time I get a new page or the filters change.
I figured out an answer; however, it quadruples the compute charge for the query. Not sure if this can be optimized further.
g.V().hasLabel("blueprint").
project("blueprints","architects").
by().
by(outE().or(hasLabel("designed"),hasLabel("assisted")).inV().dedup().fold())
I just solved for blueprints and architects, but classifications just needs another by(...traversal...) and projection label.
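To show the result shape that project plus fold aims for (one record per blueprint, with related vertices collected into lists instead of one row per combination), here is a minimal pure-Python sketch. It does not talk to a graph server; the EDGES triples are based on the sample graph in the question, and the names BLUEPRINTS and project_blueprints are hypothetical.

```python
from collections import defaultdict

# (source, edge label, target) triples based on the sample graph.
EDGES = [
    ("Tom", "designed", "HousingComplex"),
    ("Tom", "assisted", "Mall"),
    ("Jerry", "designed", "Airfield"),
    ("Jerry", "assisted", "HousingComplex"),
    ("Sylvester", "designed", "Mall"),
    ("Sylvester", "assisted", "Airfield"),
    ("Sylvester", "assisted", "HousingComplex"),
    ("Mall", "classification", "Commercial"),
    ("HousingComplex", "classification", "Residential"),
    ("Airfield", "classification", "Civil"),
]
BLUEPRINTS = ["Mall", "HousingComplex", "Airfield"]


def project_blueprints(edges, blueprints):
    """One record per blueprint, with related vertices folded into lists."""
    architects = defaultdict(list)
    classifications = defaultdict(list)
    for source, label, target in edges:
        if label in ("designed", "assisted"):
            # Skip repeats, like dedup() before fold().
            if source not in architects[target]:
                architects[target].append(source)
        elif label == "classification":
            classifications[source].append(target)
    return [
        {
            "blueprint": name,
            "architects": architects[name],
            "classifications": classifications[name],
        }
        for name in blueprints
    ]
```

With the sample data this yields three records, for example HousingComplex with architects Tom, Jerry and Sylvester and the single classification Residential. A blueprint without classifications would simply get an empty list.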
I may have to just get the blueprints in one query, get each of their associated items in parallel queries, then put it all together in the API. That would be very bad design for the API data layer but may be necessary for performance reasons.
I wonder if there is a way to observe changes in the whole graph, instead of subscribing to changes on one particular node. I was not able to find an answer reading the Docs/Howtos at gun.eco/docs
Let's say you build a real-time mind-mapping application, so basically a graph/tree structure.
1. If I add a new node somewhere in the graph, I want to update my UI.
2. If I remove a node or a whole subtree ...
The second scenario is a general concern:
How can I delete multiple nodes together with all related edges?
Copied from conversation with Gun Community:
First Answer:
To answer the 1st: you could have an index node that you subscribe to using gun.get(node).on(callback, changesOnlyFlag). New nodes would trigger the update function, where you check what that node might be related to in your application.
To answer the 2nd: deleting in a decentralized system is hard (search for the tombstone problem). In Gun, deletes are handled by putting null to an object, which cuts all edges from that item so that it becomes unreachable from a traversal standpoint (although you can still get the child nodes by their soul (the UUID of the node), or via the index node that you might add all children to, by default).
Second Answer:
https://gun.eco/docs/API#open describes an additional module you can require to open the whole graph, which can be used to track changes with slight modifications to the code.
I am planning to use ArangoDB and I am faced with a problem I don't know how to solve. I would like to do simple traversals, but in my case there are two requirements that I don't know how to meet:
I will not know in advance the type of vertices that an edge will connect to. I want to be able to connect an edge of one type to any vertex on either side.
For one vertex, I want to retrieve all connected vertices (depth 1) no matter the edge type.
For requirement 1, an example would be a Tag vertex (to tag some entity with some information): I want to be able to tag any vertex using e.g. a HasTag edge in a named graph. From what I currently see, I need to define the "From" collections (the "To" collection is the Tag collection), and this is limited to 10 collections. Since I could have 100 or more From collections, I don't see how to solve this with named graphs.
Option would be to use anonymous graphs but then I have a problem in the second requirement. I also want to have an option, when given a vertex, to find all connected vertices (depth = 1) no matter the type of an edge. In an anonymous graph I would need to specify all of the edge collections in a query and again, there could be 100 or more of them. I don't know if there is a limit to this number but I would assume there is one - maybe I'm mistaken since I haven't yet tried it out.
Does anyone have an idea how to solve this with ArangoDB? I really like the database, but I would like it to be more "typeless", that is, I would rather not have to define the type of vertex collection an edge can connect to.
Best regards
Tomaz
You can have more than 10 vertex collections in a named graph. The limitation of 10 only exists in the webUI. Creating the named graph over the ArangoShell or the server console will work.
I am new to Neo4j.
For a given node (say, node 'n'), I am trying to find all other nodes in the graph that are in some way dependent on it. In other words, I want to find the nodes in the graph that have edges directed towards node 'n'. I am getting the correct nodes (let's call them c, d, e) using the following query:
MATCH (depNode)-[r]->(n:AttributeNode)
WHERE n.name='testnode'
RETURN depNode
In the original graph, the nodes c and d are connected as well using a relationship. In the result of the above query, I am also receiving that relationship (edge between c and d). How do I get rid of that edge in my output?
If I get your question correctly, I think you're already getting the correct answer in tabular form but in the visualization form Neo4j shows the "extra edges". You should check out the tabular form and confirm whether it's correctly showing the desired output or not (which it would be).
What's happening here is the default behavior of the Neo4j browser. Whenever you retrieve some nodes, it shows all the relationships between those nodes as well. If you want to visualize just the nodes, you cannot do that in the current version of the Neo4j browser. You will have to use a visualization tool like Gephi on your database and filter your results accordingly.
As of Neo4j 2.2.0.RC1 you can disable the extra relationships being used by setting Autocomplete to Off. The toggle appears at the bottom-right of your result graph and seems to be remembered for future requests.