Get nodes adjacent to child node in Gremlin - graph

I'm fairly new to Gremlin and I'm trying to query a graph starting at my vertex Customer, which is related to various nodes amongst which is the Account node. And so, I want to retrieve all nodes related to the Customer node + all nodes related to the Account node connected to it.
As you can see in the image, my Customer node is related to the account node via the has_account edge. I would like to get all the nodes adjacent to that account node.
Customer node
As I said, I'm fairly new to neptune so what I've tried aside from the most basic visualizations is:
g.V('id').outE().inV().outE().inV().path()
And that gives me the nodes adjacent to the Account node but ommits the other adjacent nodes to the Customer node. I've also tried some other groupings and mappings but I can't seem to make it work.

In the query you wrote above, you are starting out a vertex id and then traversing to all of its connections (the first outE().inV(), and then the connections of the connections (the second outE().inV()). If a connection does not have any connections, then it will be filtered out.
If you would like to return both the 1st and optionally 2nd connections, then I would look at using the optional() step for the 2nd+ level connections that may or may not exist like this:
g.V('id').outE().inV().optional(outE().inV()).path()

Related

OpenCypher - Get all nodes that not connect to the center of the graph

I have a graph on Neptune and I used OpenCypher to query on it.
At the middle of the graph I have a big connected nodes,
at the edges you can see that I have some single nodes/ nodes that connected only to 1-5 other nodes. (see on the picture)
I want to get all of them, there is an option to do so?
I tried to think about options like take a random Id from the center and check all the nodes that don't have a path from them to this node, or maybe say get a table with all nodes and number of connected nodes to them, and ask for all nodes that not contain more than 10 nodes connected
but I didn't find a way to write this query,
Must to know that opencypher on Neptune not contains all the magic keys like 'all' predicate function, so need to find a way with the functions that neptune support
Ideally, you would want to run Weakly Connected Components algorithm to identify the largest component and then return all nodes that are not part of it. It seems that Neptune doesn't support that algorithm out-of-the-box, but you could implement it with gremlin as discussed in another SO question: Find largest connected components AWS Neptune

Gremlin query to find the entire sub-graph that a specific node is connected in any way to

I am brand new to Gremlin and am using gremlin-python to traverse my graph. The graph is made up of many clusters or sub-graphs which are intra-connected, and not inter-connected with any other cluster in the graph.
A simple example of this is a graph with 5 nodes and 3 edges:
Customer_1 is connected to CreditCard_A with 1_HasCreditCard_A edge
Customer_2 is connected to CreditCard_B with 2_HasCreditCard_B edge
Customer_3 is connected to CreditCard_A with 3_HasCreditCard_A edge
I want a query that will return a sub-graph object of all nodes and edges connected (in or out) to the queried node. I can then store this sub-graph as a variable and then run different traversals on it to calculate different things.
This query would need to be recursive as these clusters could be made up of nodes which are many (inward or outward) hops away from each other. There are also many different types of nodes and edges, and they all must be returned.
For example:
If I specified Customer_1 in the query, the resulting sub-graph would contain Customer_1, Customer_3, CreditCardA, 1_HasCreditCard_A, and 3_HasCreditCard_A.
If I specififed Customer_2, the returned sub-graph would consist of Customer_2, CreditCard_B, 2_HasCreditCard_B.
If I queried Customer_3, the exact same subgraph object as returned from the Customer_1 query would be returned.
I have used both Neo4J with Cypher and Dgraph with GraphQL and found this task quite easy in these two langauges, but am struggling a bit more with understanding gremlin.
EDIT:
From, this question, the selected answer should achieve what I want, but without specifying the edge type by changing .both('created') to just .both().
However, the loop syntax: .loop{true}{true} is invalid in Python of course. Is this loop function available in gremlin-python? I cannot find anything.
EDIT 2:
I have tried this and it seems to be working as expected, I think.
g.V(node_id).repeat(bothE().otherV().simplePath()).emit()
Is this a valid solution to what I am looking for? Is it also possible to include the queried node in this result?
Regarding the second edit, this looks like a valid solution that returns all the vertices connected to the starting vertex.
Some small fixes:
you can change the bothE().otherV() to both()
if you want to get also the starting vertex you need to move the emit step before the repeat
I would add a dedup step to remove all duplicate vertices (can be more than 1 path to a vertex)
g.V(node_id).emit().repeat(both().simplePath()).dedup()
exmaple: https://gremlify.com/jngpuy3dwg9

gremlin hasLabel query times out

I have a test graph with less than a million nodes and probably a slightly higher number of edges. I'm using a remote gremlin client to connect to a janusgraph/gremlin-server instance backed by 3 scylla backends.
I have various different labeled nodes i.e url, domain, host and brand. The graph contains mainly url, domain, and host nodes. I have one brand node in this entire graph. The brand node looks like this:
{
label: brand
properties: {
brand: string
}
}
I am able to do the following query in 1.5 ms. The brand property has a composite index.
g.V().hasLabel('brand').has('brand','stackoverflow');
The query below hits the 30s timeout. I expect this query to only return only one result based on the data I imported into the graph. I verified by testing with a limit
g.V().hasLabel('brand')
My questions
Why does this timeout?
Is Janusgraph scanning through all nodes in the graph to try find a single node labeled 'brand'? Is there no default index on labels?
Why does the first query execute fine when the first steps for both are the same?
Thank you
Why does this timeout?
Is Janusgraph scanning through all nodes in
the graph to try find a single node labeled 'brand'? Is there no
default index on labels?
As you have guessed this is likely timing out due to a full graph scan since vertex labels are not indexed in JanusGraph. There is an open issue for this: https://github.com/JanusGraph/janusgraph/issues/283
Why does the first query execute fine when the first steps for both are the same?
In this case I suspect that JanusGraph's optimizer is able to optimize the traversal plan to use the composite index.

How to clone a graph without modification of inital one?

How to clone a graph if:
nodes are not labeled
it's only possible to move between nodes through edges
you can't mark or any other way use nodes of original graph
More precise definition of problem:
You are navigating through graph.
At a time you see only one node and it's links. For example as simple as single number (number of links).
Graph is non-directed with unique links between each connected pair of nodes.
To move to another node you just pass a link index into step API call and get next node (number of links in next node). Function step handles moves in a way like list of links is sorted same way every time you enter the node. E.g. if you are in node A connected with node B then passing some constant number i_A_B into step always get you into node B.
At start you are in special start node that has only one connection, then you can only pass "0" into and get into start node. This is an equivalent to call special start API call without parameters and get into start node.
Is there any and what is the well-known name of this problem?
What is the algorithm that can make a clone of graph without modifying (marking, labeling, coloring) the original one?
Creating nodes during navigation and connect them to previous node is trivial.
The main difficulty then obviously would be to match already visited nodes with already created.
What minimal additional information is required?
Is marked initial node is enough?
For example without mark any cycle graph will look equivalent through this API. Also any graph with same regular structure but different size.
What types of graphs can be recognized (cloned) without marked start node?
Upd1: API provides no way to compare nodes or edges of original graph.
Upd2: This problem looks like creation of a map of a maze while walking inside it.
Or like replication of a state machine that is encapsulated in black box and provides only oracle that returns a set of acceptable inputs on call with current acceptable input.

ArangoDB anonymous graph traversal

I am planing to use ArangoDB and I am faced with a problem I don't know how to solve. I would like to do simple traversals but in my case but there are two requirements that I don't know how to solve:
I will not know in advance the type of vertices than an edge will connect to. I want to be able to connect edge of one type to any vertex on any side.
For one vertex, I want to retrieve all connected vertices (depth 1) no matter the edge type.
For the requirement 1, an example would be a Tag vertex (to tag some entity with some information) and I want to be able to tag any vertex using i.e. HasTag edge in a named graph. From what I currently see is that I need to define the "From" collections ("To" collection is the Tag collection) and this is limited to 10 collections. Since I could have 100 or more From collections I don't see how to solve this with named graphs.
Option would be to use anonymous graphs but then I have a problem in the second requirement. I also want to have an option, when given a vertex, to find all connected vertices (depth = 1) no matter the type of an edge. In an anonymous graph I would need to specify all of the edge collections in a query and again, there could be 100 or more of them. I don't know if there is a limit to this number but I would assume there is one - maybe I'm mistaken since I haven't yet tried it out.
Has anyone any idea how to solve this with ArrangoDB? I really like the database but I would like it to be more "typeless", that is, that I wouldn't have to define the type of vertex collection an edge can connect to.
Best regards
Tomaz
You can have more than 10 vertex collections in a named graph. The limitation of 10 only exists in the webUI. Creating the named graph over the ArangoShell or the server console will work.

Resources